Complexity, sure. But cost? I thought a single 1 TB RAM server is more expensive than 10x 100 GB RAM servers.
And many people don't want to deal with physical hardware. Dealing with physical hardware increases operational complexity too. They want to rent a virtual/cloud server. Which provider allows you to rent a virtual server with 1 TB RAM?
Cost follows complexity, even if it's not always immediately obvious.
A 1TB RAM server is more expensive than 10x 100GB RAM servers, but the hardware cost is often small compared to the business and technical cost of getting a solution to scale across a cluster.
Of course, generalizations are always dangerous—the take-home point here is perhaps that before going to a cluster because “that's the way big data is handled,” it's a good idea to do a proper cost-benefit analysis.
I took a look around for "high-RAM" servers, and one I can buy today is the HP ProLiant DL580 Gen9. With just 256 GB of RAM, it clocks in at 540.995,- NOK (71.5k USD). It has 96 RAM slots, and I can't seem to find anything bigger than 32 GB DDR4 DIMMs; rounding the price up, 96x32 GB comes to roughly 672.000,- NOK (~90k USD). Adding that up (throwing away the puny RAM installed) gets us to a little over double the original price, or 1.212.995,- NOK (~161k USD). This has four 18-core E7s (72 cores) clocked at 2.5 GHz, and 3 TB of RAM (half of max, because of the 32 GB DIMMs).
It is true that the jump from 256 GB to 3 TB "just" costs ~2x, and that I could get a server for 1/10 of the price of the original configuration, but only with 4 GB of RAM and nowhere near even 18 hardware threads.
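For anyone who wants to check the arithmetic above, here is a minimal sketch using only the figures quoted in this comment. The variable names are mine, and the prices are the rounded ones from the comment, not fresh quotes.

```python
# Figures quoted in the comment (NOK); variable names are illustrative only.
base_price_nok = 540_995     # DL580 Gen9 as sold, with 256 GB installed
ram_upgrade_nok = 672_000    # 96 x 32 GB DDR4 DIMMs, price rounded up
base_ram_gb = 256
dimm_slots = 96
dimm_size_gb = 32

total_nok = base_price_nok + ram_upgrade_nok
total_ram_gb = dimm_slots * dimm_size_gb

# How much more money versus how much more memory.
price_ratio = total_nok / base_price_nok
ram_ratio = total_ram_gb / base_ram_gb

print(f"total price: {total_nok:,} NOK")
print(f"total RAM:   {total_ram_gb} GB")
print(f"price grows {price_ratio:.2f}x while RAM grows {ram_ratio:.0f}x")
```

So roughly 2.2x the money buys 12x the memory on the same box.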
If you are CPU limited (even at 72 hw threads) you might need more, smaller servers.
But such a monster should scale "pretty far", I'd say. It does cost about half as much as a small apartment, or one developer-year.
Dell sells servers with 96x64GB RAM. There is a huge (7x) premium for the 64GB DIMMs instead of 32GB, so it runs around 500k, with almost the entire price going to RAM.
A Dell R920 with four E7-8880L v2 15-core processors (i.e. 60 real cores, 120 threads with HT) and 1024 GB (1 TB) of memory costs about $50,000 USD. Going to 1.5 TB of memory pushes you to $60,000 USD.
Expensive relative to a low-end server or a month of cloud usage, but that's an absurd amount of computational power.
> Complexity, sure. But cost? I thought a single 1 TB RAM server is more expensive than 10x 100 GB RAM servers.
Sure, but in the latter case you'd also have to pay for the manpower to build a cluster solution out of a formerly simple application. And people are usually more expensive than servers.
Apart from what other commenters already said about the cost of software complexity, is there a variant of Amdahl's law that could be used here? 10x 100 GB servers working on a problem together will probably never be 10x faster than 1x 100 GB server. Perhaps just the increase in the order of magnitude of the distance information needs to travel is already sufficient to set some higher bounds... So you may need to buy more than 10 of them to match 1x 1 TB server.
Although, that means the 10x setup must cost much more. I think the idea in the comments above was taking 10 cheaper, weaker servers and somehow coming out with roughly the same price...
Well, in any case, things just got too complicated :)
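To put rough numbers on the Amdahl's law question, here is a minimal sketch of the textbook formula, speedup = 1 / ((1 - p) + p / n). The parallel fractions below are made-up illustrations, not measurements of any real workload, and the formula ignores network and coordination overhead, so a real cluster would likely do worse.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Upper bound on speedup when only part of the work can be parallelized."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


# Hypothetical parallel fractions, chosen only for illustration.
for p in (0.5, 0.9, 0.99):
    print(f"p={p}: 10 servers give at most {amdahl_speedup(p, 10):.2f}x")
```

Even with 90% of the work parallelizable, 10 servers top out a bit above 5x, which is the point about needing more than 10 of them.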
SoftLayer lets you rent a server with 512 GB RAM directly from their order form. (Monthly price for the RAM is $1,444.00; a dual Xeon 2620 server you can put it in is $380/month.) It's bare metal, not virtual, but you can file tickets with them for any hardware stuff that comes up.
If you work outside the order form, you can get 768 GB, too. 1 TB is possible with their Haswell servers, but availability seems limited.
10 servers with 100 GB each will use a lot more power and will require distributing your algorithm right along with your dataset, so instead of 10 servers you will probably end up with a pretty high multiple of 10.
It's a reasonable cloud vs. on-premise argument. Obviously the scale of the data transfer to a cloud has more to do with the dataset size than the number of instances.