Why would you use that instead of bare metal? You'll come out at >$4k per month even with sustained-use discounts; for that price can't you get the same power in bare metal?
Preemptible is a bit cheaper, but is 600GB of memory really worth it for short-running applications? By the time you've loaded everything into memory, your machine has probably been destroyed..
EDIT: Not sure about the exact CPU performance, but it should be quite close to what OVH offers here? With the same memory configuration and 2TB NVMe this still costs <$1,500/month (https://www.ovh.co.uk/dedicated_servers/hg/180bhg1.xml).
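Back-of-the-envelope, a sketch using the rough figures from this thread; the hourly rate and discount below are my assumptions for illustration, not official quotes:

    # Rough monthly cost: GCE with sustained use vs an OVH dedicated box.
    # Prices are assumptions taken from this thread, not official quotes.
    HOURS_PER_MONTH = 730

    gce_on_demand_hourly = 7.80     # assumed rate, 96 vCPU high-memory instance
    sustained_use_discount = 0.30   # assumed discount for running a full month
    gce_monthly = gce_on_demand_hourly * HOURS_PER_MONTH * (1 - sustained_use_discount)

    ovh_monthly = 1500.00           # OVH HG with similar RAM and 2TB NVMe, as cited

    print(f"GCE, sustained use: ~${gce_monthly:,.0f}/month")  # ~$3,986
    print(f"OVH bare metal:     ~${ovh_monthly:,.0f}/month")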
You can find a cheaper bare metal system with comparable performance for every instance type; that argument is not new for high vCPU/memory instances.
If (monthly) price is the main concern, "the cloud" is probably not for you. Other people obviously place massive value on its benefits and pay the premium, and that is not suddenly going to stop for a new instance size.
In addition, those instances might be deployed for a short time when a lot of power is needed at once and powered off afterwards. In that case the cloud might even be cheaper, as Google offers per-minute pricing (or is it per-second now?), whereas most bare metal providers bill by the month.
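To make the burst case concrete, a minimal sketch (both rates are assumptions, same ballpark as above):

    # Burst workload: spin the big instance up for a few hours, shut it down.
    # With per-minute (now per-second) billing you pay only for the runtime.
    hourly_rate = 7.80           # assumed on-demand hourly rate
    runtime_hours = 6            # e.g. one big nightly batch job

    print(f"6-hour burst: ${hourly_rate * runtime_hours:.2f}")  # $46.80

    # A provider billing by the month charges for the whole month, busy or idle.
    monthly_dedicated = 1500.00  # assumed bare-metal price
    print(f"Dedicated month: ${monthly_dedicated:.2f}")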
"Most bare metal providers bill by the month." Maybe you mean dedicated hosting providers?
Any provider letting you spin up bare-metal through an API almost certainly bills at much finer grain than monthly, although they may quote monthly prices to make it easier for customers to assess cost.
There is overhead for space, reliability, networking, and maintenance with bare metal. If you have an existing datacenter the marginal costs are tiny, but if you are only in the cloud, or not planning to need these beasts for years on end, the net costs are much lower if you just harness the economies of scale provided by GCP, AWS, et al. As for OVH, the amount you'd save in networking between that box and the rest of your infrastructure by having everything on one cloud provider probably pays for the difference.
Add in flexible network storage options and integration with existing security infrastructure. You are thinking too small by comparing a single box to a single box.
You are thinking too big by talking about having an existing datacenter. Many companies colocate anywhere from 1U to a private room with many racks in a shared datacenter.
I used GCE to test some image processing software I wrote a while ago (it runs on a very large dataset). I configured a 64-core machine with 128GB of memory. It ran perfectly, although it cost about $200 to run the test for a day.
Sure, it wasn't the highest performance per CPU, but I didn't have to buy the bare metal, I can scale up the number of cores if need be, and I can fire one up whenever I want one.
For the price of 2-3 days of that, you could get a dedicated OVH server for a month. The 64 GCP vCPUs are 32 real cores, so a monthly rent of around $600 should get you there.
For one reason, if you only need the one system, you almost certainly need to compare the cost to owning and maintaining two.
If you can't tolerate the server being down for longer than the lead time it takes to get a new one, you need to have one on standby already. The lead time is probably at least a couple of weeks, but there are no guarantees, since you're depending on vendor availability and hundreds of other things out of your control.
Depending on what you're running on it, you'll also probably want to test software upgrades and have fallback plans when you deploy.
The thing the "cloud" version gets you is zero lead time, along with the ability to spin up a second instance (or ten, if you want) while you deploy a new version or just want to do some testing.
Maybe you need the capacity only for some hours here and there?
If all your data and other processing is in cloud A, moving some processing to B might not be feasible (moving lots of data takes time, and security requirements may complicate the setup).
One of the biggest data lock-in factors is network I/O costs. These have been kept artificially high by all cloud providers and act both to deter import/export and to subsidize other functionality.
I run semi-bandwidth-intensive applications, and DigitalOcean and Lightsail are actually better deals than EC2 for the amount of bandwidth: $5/mo for 1 TB on DO/LS vs $90 for 1 TB on EC2.
We use a mix of dedicated hardware and DO/LS to meet our needs as bandwidth on the major cloud providers was just too expensive.
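The arithmetic behind that gap (EC2's ~$0.09/GB internet egress tier was the going rate; the DO/LS figure is the plan price mentioned above):

    # Egress cost for 1 TB/month: per-GB pricing vs bundled transfer.
    gb_per_tb = 1024

    ec2_egress_per_gb = 0.09   # typical EC2 internet egress tier at the time
    ec2_cost = gb_per_tb * ec2_egress_per_gb

    do_ls_plan = 5.00          # $5/mo DO/Lightsail plan with 1 TB included

    print(f"EC2:   ${ec2_cost:,.2f} for 1 TB")  # ~$92 (thread rounds to $90)
    print(f"DO/LS: ${do_ls_plan:,.2f} for 1 TB (bundled)")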
Yeah, OVH is great, I highly recommend them, we use their cheap dedicated servers via Kimsufi and SoYouStart at a few locations.
I seem to recall a friend having stability issues with their VPSes several years ago, so we stuck to the dedicated stuff from them, but it's been extremely good especially considering the price. Have you had good stability with their VPS services?
Unfortunately all I can do right now is point to the anecdata of others as in the link above and additional pointers within that project[1][2]. If you have time and can share any more details on your experience I would tremendously appreciate it.
I am debating starting a Twitch -> YouTube video stream duplicator/archiver that would initially make money by auctioning available capacity, with the long-term goal of being acquired by Twitch since their integration is so unreliable.
I'm doing a similar (but different) kind of thing going from Twitch to YouTube. At the moment, Google Cloud Platform doesn't charge for bandwidth egress to their own services, so YouTube uploads are free if you use GCP.
Thanks for the specific tip! Link for the lazy: http://gamebot.gg/ A Show HN would probably do well (with a bit of behind-the-scenes in the comment), as would in-depth blog posts if you're doing machine learning.
I actually tried the Hearthstone one, and the very first clip in the current example 'Greatest Clips' (BJwDyxrplpo) appeared to miss the actual action (clicking "Disenchant" - which may actually have been the point, since it could have been just a tease), but the rest of the clips seemed complete (and interesting).
I've thought about stream-jumping/recording based on simple indicators like increases in chat comments, viewers, followers, etc. How much of this could be built off Twitch's own 'clip' functionality (whether initiating them yourself or aggregating the manual curation of others -- neither of which AFAIK has an API right now) and collecting them later? Separate note I'm trying to hide in this paragraph: don't overfit if you want to apply this tech to other streaming sites where real money is flowing (aka NSFW).
Personally I don't care so much about specific games on Twitch (except Street Fighter, which gets relatively little love, but your videos are a real time-saver) as about personalities. It might be worth offering this service to them, focused solely on collecting their highlights. I had a tough time with the non-English streams, but I'm not sure what options you have there.
I'm also interested to see how this will turn out for you using Twitch content if they notice that what you're doing is catching on. Twitch seems to be leaving a lot of low-hanging fruit behind for others to capitalize on.
Feature-wise: more playlists, maybe monthly ones and/or a highlights-of-the-highlights collection, ranked by the comments/views/thumbs-up on previous YouTube videos. If there were a way to incorporate chat, you should, since most streamers don't include it in their videos. https://github.com/PetterKraabol/Twitch-Chat-Downloader
Bug-wise, it seems like something is going wrong with the links at the end of this video: appx. 30 seconds of moving images but no links, in Firefox with ad block [disabled as legacy]. (8ql3id1lJoM, ilkKuvuna10)
Ahh, you found it! No machine learning at the moment, just the Twitch clips API. Machine learning would be very helpful for some problems, however, specifically to weed out clips in which the broadcaster specified the incorrect game.
You're right about offering the service to the streamers - that's definitely the way to go to make a business out of it and it's something I've considered. However, I was mostly interested in doing the project for fun, and for some passive income, and making it a service would definitely not be passive.
The clips API returns language information about the clips, which you can use to filter them. Before that, I had to manually maintain a blacklist of non-English streamers.
I do monthly highlight videos, but they're based solely on clip views on Twitch - they don't use YouTube analytics, which I'm sure would improve the videos.
It is a cool idea to include chat - another thing I've considered but haven't implemented, though I've noticed some Twitch highlight channels (that do manually edited videos) do it. Thanks for the link to the downloader.
The links at the end of the videos are tricky - there's no API for that, so currently they're populated by a Firefox macro on a desktop that's supposed to run every day - looks like there's an issue with it running! The better version would be to use a web scraper or headless browser to automate those clicks via the render server. That's what I'm supposed to be working on next, in fact...
These are still 4 to 5 figure prices. Larger companies really don't care about these tiny fees, especially compared to the licensing costs of the software running on these servers. It gets the job done faster and easier, so it's worth it, especially when everything else might already be in Google Cloud.
Cool. So, where do you put that server? Does it have sufficient power, AC, generators, UPS? How much does that cost per month?
Who monitors that PC and does preventative maintenance, etc.? If a part looks like it's going to fail, where do you migrate your workload to so you can take that server offline for repairs? (You need multiple servers.)
Since you have multiple servers, how much does your 40Gb networking cost (with 100Gb uplinks) so you aren't constrained by the network? And what kind of storage network do you have, so that you can live-migrate these running machines around to different hosts?
Lastly, if you co-locate the server somewhere, what does it cost for multiple redundant internet connections to the facility? And where is your failover facility, that is at least a few hundred miles away?
I think this is a very important point and one that many folks don't fully appreciate:
The total cost of ownership of a computing asset is several times greater than the cost of the actual asset.
Think of it this way: a dog can be obtained for a very nominal cost (or free), but the cost to house, feed, entertain, and provide healthcare for it is non-trivial.
It's not unheard of for just the cost of deploying a new device into a large organization to be something like EIGHT TIMES the cost of the actual asset. That's just to get the hardware deployed, and NOT the cost to keep it running.
Cutting down on TCO and streamlining the deployment of resources is a big part of the sell for cloud deployments. Particularly for computing assets that may otherwise spend a lot of their time idle.
Softlayer's network is crap compared to Google's and AWS's. That being said, in my previous company we used a combination of Softlayer (for dedicated machines) and AWS for cloud. There is definitely a use case for each, but as Nrsolis mentions, there are additional costs beyond the initial hardware itself.
At least with AWS you get placement groups, which can help a lot. With Softlayer we saw entirely too much packet loss on a regular basis, and they try to upsell you on things to "fix" it.
Thank you, I was looking for this comment. That is the value of the cloud: the reduction in TCO, the predictable pricing, up-time guarantees, and bandwidth availability.
...ok? As I just said, we (as a company) don't care about a few thousand per month in exchange for letting Google Cloud handle everything for us. We're definitely not interested in buying some used server from eBay and then figuring out where to run it.
How old is that server? I don't see ECC RAM, if you care about that... also, are the other pieces of the server going to fail anytime soon?
How loud is it? How much energy does it consume while running? How hard is it to configure and keep running? What kind of firmware does it have and will it be a problem updating?
These are all the questions I would have before buying a beast like that...
The whole “electricity costs a lot” argument is getting tiresome. Where I live, electricity is 11 cents per kWh. That means even if you run that machine full tilt 24x7 (which you won’t), and even if it draws 1kW (which it won’t), it’s still only $79/mo in electricity cost.
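The arithmetic, for anyone who wants to plug in their own numbers:

    # Worst-case electricity cost for one server, flat out, all month.
    rate_per_kwh = 0.11   # $/kWh (plug in your local rate)
    draw_kw = 1.0         # assumed full-tilt draw; a real box draws less
    hours = 24 * 30       # 720 hours in a month

    print(f"~${rate_per_kwh * draw_kw * hours:.2f}/month")  # ~$79.20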
Well, it's not just electricity. To run your server you at least need a network, a UPS, and probably other things. This especially gets ugly when you want a network with many servers; most dedicated server providers charge a ton of money for interconnecting servers. OVH at least actually provides a vRack for dedicated servers, but it's not always free.
Power in a lot of places is 2-3x that, and for a naively designed data centre you can double that to include the cost of air conditioning (yes, you can do a lot better, but at a small scale basic AC is going to roughly equal your workload draw).
Then you would need multiple of them (for redundancy) if you use it in a production environment, plus the additional maintenance of hard drives and so on. So what would the end price for bare metal be?
What if you don't have sustained usage? What if you are developing software for scientific data processing, and you usually work on small data sets for testing, and once in a while you have huge computing needs?
No. Scientific workloads rarely scale well when having to do a lot of communication over a commodity network. You're also assuming people would rather have a bunch of machines lying around that they had to pay for upfront than just pay for an occasional single instance? You're oversimplifying things if you literally think it never makes financial sense to use the cloud. It's the same as saying it's never cheaper to have health insurance. Objectively, that has to be true on average, yeah, but then why do so many people buy health insurance? You're managing complex risk at the expense of overhead. Even when it sucks, it's not feasible for everyone to keep enough cash lying around for when they get hit by a bus.
We used a config not quite this big for a Nominatim database rebuild. It takes weeks on an underpowered server, but hours (or a day?) on something with enough resources.
Once rebuilt, using the database is fine on a normal server.
There are quite a few problems where you need a lot of memory and CPU performance for just a relatively short amount of time, like a few hours per day or even just a few hours per week. Forecasting or complex optimization problems, for example.
In those cases the amount of money you spend on hardware, virtual or otherwise, is negligible. Depending on what you do, it might just as well be a rounding error.
Not only that, but due to the usual NUMA mismatch, additional page tables, IOMMU, poor storage connectivity/sharing, etc. between the bare metal and the VM, the VM is likely losing a significant chunk of performance vs the bare metal.
Frankly, I have a hard time understanding why the convenience of being able to call an API to get a VM (vs using an API to get bare metal) continues to be an advantage. I am reminded of the reddit articles about all the effort they went through to re-optimize their app (by batching queries) for the longer database latencies at AWS... It's like they never considered that all that work might also apply to bare metal and save them even more money...
The problem with this isn't the price of the computing power/memory; the killer is the traffic costs in the cloud, which are going to bankrupt you before this thing is even at half utilization.
> >$4k per month with sustainable use, for that price you get the same power in bare metal?
Not so sure about that. Plus, a hoster who provides such a machine as bare metal wants a setup fee, needs time to set it up, and requires a minimum contract duration much longer than one month.
I guess there are not many hosters who have such a beast in stock as bare metal and available in a few minutes (are there any at all?); they would order such a machine themselves and you would wait at least a week.
Minimum contract length: 1 month; total cost (including setup): $4,384 (ongoing month-to-month cost thereafter: ~$2,191.20).
For that you'd get an aggregate total of 4TB RAM, 7.6TB of SSD storage, 96 real Intel E5-1650 v3 cores (or 192 vCPUs), and 800TB of bandwidth.
Sprinkle with terraform/ansible/k8s/docker and you have a resilient, massively powerful compute cloud with no long-term obligation that's about half the price of GCE if you keep it around beyond 30 days. Or another way to look at it: if you needed such a platform for two years, your second year would be free compared to GCE.
One major issue with this approach (versus GCE's "all in one" box) could be network performance bottlenecks depending on what task(s) you were using such a cluster for.
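Reading between the lines, those totals look like roughly 16 small dedicated servers. A sketch of how the aggregate might break down; the per-node specs are my assumption, only the totals come from the comment above:

    # Reconstructing the aggregate from plausible per-node specs.
    nodes = 16

    cores_per_node = 6        # E5-1650 v3: 6 cores / 12 threads
    ram_gb_per_node = 256
    ssd_gb_per_node = 475
    bw_tb_per_node = 50
    price_per_node = 136.95   # ~$2,191.20 / 16, assumed

    print(f"{nodes * cores_per_node} real cores / "
          f"{nodes * cores_per_node * 2} vCPUs")              # 96 / 192
    print(f"{nodes * ram_gb_per_node / 1024:.0f}TB RAM, "
          f"{nodes * ssd_gb_per_node / 1000:.1f}TB SSD, "
          f"{nodes * bw_tb_per_node}TB bandwidth")            # 4TB, 7.6TB, 800TB
    print(f"${nodes * price_per_node:,.2f}/month")            # $2,191.20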
There's IBM/Softlayer for that. Their hourly bare metal offering goes up to just 256 GB of RAM, though. More than that and you'll have to do a monthly commitment.
Yeah, the main reason to go with Amazon in this case is if you only need the box for a few hours (ie: you're doing data science or similar). For long term high load use, bare metal, even managed bare metal like that is almost always cheaper.
Everyone jumps to OVH for dedicated server price comparisons, but what about pricing for similar US coastal datacenter locations? Are there even US dedicated server providers with pricing lower than “call us?”
I totally believe that OVH is cheapest for European companies serving European customers, but that’s apples to oranges when we’re talking about American cloud providers.
I would think lead time. Get the quote from your hardware vendor, get the PO approved by finance, get the box built and shipped, get it racked and stacked in the DC. This whole sequence can easily take longer than a month.
You can rent a bare metal server from Codero and get it provisioned in about an hour. Prices from about $100 to $1000 a month. At the high end, you get roughly what Google is offering here.
I had a big simulation to run that required lots of memory and lots of cores. I rented a machine for the roughly 10 hours it took and happily paid the 30-40 dollars that were charged.
Those machines have a lot of value for specific workloads.
But that one costs 6.4x the 16 core variant for 6x the cores. Are there any applications where you're heavily dependent on having all cores in one machine?
Anything where the workers have to synchronize their work with each other often. Having everybody talking to each other over the network quickly kills performance.
Interesting that GCP is usually one-upping AWS on some metric, but in this case it isn't touching the current largest compute/memory instance on EC2, the X1 family with up to 128 vCPUs and 4TB of memory. The blog does allude to them testing such types in a closed beta, but it's still a game of catch-up.
Yikes, I'm a little out of the loop. I didn't realize you could get 4TB of RAM in a single machine on EC2.
I've been seeing that medium data keeps getting bigger (i.e. the features of traditional RDBMS are eating away at the need for specialized / distributed stores for data analysis). But so too does it appear that small data is getting a lot bigger too—just load that dataset into memory for analysis. 4TB of memory allows for pretty big "small data."
"I remember back when we used to do gradient descent to estimate linear models; back in the long ago when we didn't have 900 exabytes of memory attached to our NVidia Matrix Crusher 9000 linear algebra accelerator unit."
Is it possible that data isn't getting bigger - but that the people who work with it just want to process larger data sets than before?
I mean before they'd train a model of 1,000 inputs and then test it against another 50 and call it a day. Now they want to train it against 1,000,000 inputs.
Am I completely off base? It's not my area, but from working with databases, my observation is that developers always want to use the most data possible even when it doesn't really provide any benefit.
Sorry, I was being a bit playful with language. What I mean is that, if you roughly define small, medium, and large data in terms of the strategies required to process, then the absolute size of the data that can be processed using simpler methods grows.
And whether or not more data is needed or collectible varies by discipline. Astrophysicists collect way more data than they used to because (1) they need it and (2) instrumentation allows it.
Some kinds of data collection haven't scaled up, however. Surveying humans is expensive and labor-intensive, and for many things you might want to study about humans, you can't simply affix a sensor to them. So what might have been accomplished only through big-data or medium-data methods a few years ago can now be loaded into memory (i.e. small-data strategies).
That is my experience recently. Developers storing 500GB in a database (pre-launch), with < 1GB of meaningful data. A bunch of JSON logs that they knew data science would want eventually, but couldn't be bothered to either pare down or put in a more sensible place.
The thing is, it didn't really matter; Postgres still had a ton of performance left over even after the product went live. If you can still fit it in RAM, why waste $$$ of dev time over the $$ cost of a bigger instance?
You only need 2 processors, since each core gives you 2 vCPUs. "For the n1 series of machine types, a virtual CPU is implemented as a single hardware hyper-thread" --https://cloud.google.com/compute/docs/machine-types
There aren't quad-socket Skylakes yet :). You can also see that the largest public single package is 28 cores (56 threads). I'll sadly have to let you figure out how many sockets that implies.
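The arithmetic, spelled out (1 vCPU = 1 hyperthread, per the GCE docs quoted above):

    import math

    # How many Skylake-SP sockets does a 96 vCPU instance imply?
    vcpus = 96
    threads_per_core = 2
    cores_needed = vcpus // threads_per_core   # 48 physical cores

    cores_per_socket = 28                      # largest public Skylake-SP package
    sockets = math.ceil(cores_needed / cores_per_socket)
    print(f"{cores_needed} cores -> at least {sockets} sockets")  # 2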
I honestly wonder why Google cited "SAP HANA" as the _reference_ workload for such setups. I'd never heard of that product. Newbie asking here: is it the _reference_ workload for such setups? And are there more demanding workloads?
AWS used it as their reference workload for similarly sized boxes too. SAP HANA is a huge deal in certain sectors. I assume there were plenty of clients they couldn't land without support for it.
Sure, legacy software is still catching up to multiple cores. Postgres is much better now, or you can use SQL Server instead. None of them are natively multi-master anyway, so scale-up is still better than scale-out unless you only need reads.
I am not sure it is correct to call Postgres or MySQL "legacy software".
To me, legacy software would be something like using Novel PeopleManager 2002 or WordPerfect 7.
Postgres 10.0 is < a week old, and is a perfectly fine choice for a brand new application. You are fooling yourself if you think you should just throw every new application into MongoDB or Cassandra because they are "web scale".
If you are using "Legacy Software" to mean "software that has existed for a long time", then I guess sure. But there are many pieces of software that are very new, and could benefit from a single instance with a lot of cores.
The most common use of "Legacy Software" is "old crusty stuff which needs to be replaced", which is NOT at all the case for MySQL or Postgres.
It seems you've misunderstood my comments. Perhaps read them again? I'm saying that it's easier to have fewer, more powerful servers than many smaller ones, but even then some software designed for single-node operation still has problems utilizing all the cores (like Postgres until recently). I never mentioned MongoDB or Cassandra.
Yes, Citus is good stuff. We use MemSQL though as it's a better fit for our (data warehouse) needs. There's also TimescaleDB, PipelineDB, CockroachDB, and more, lots of interesting options today for distributed "newsql" databases.
You can scale HANA vertically and horizontally. Vertical makes sense for ERP workloads where you want low latency (avoiding communication). The ERP workloads of most companies can fit in RAM easily nowadays. Horizontal scaling is best for data warehousing/analytical workloads, which tend to be more CPU bound due to lots of calculations.
Horizontal scaling costs more in base licensing, not to mention whatever 3rd-party stuff you're running in your HANA. Enough to encourage scaling up as far as you can before scaling out.
I would honestly like to hear of another reference workload for machines of this type. I can't find very many uses other than databases for that much memory.
I think you can store every single uncompressed frame of a Blu-ray movie in the memory of an AWS X1... but even then, so what?
Did they increase the per-instance network bandwidth caps? Previously instances had 2Gb/s/core of network bandwidth capped at 16Gb/s, which makes 8-core nodes the sweet spot for network bandwidth.
I never quite understood why this is necessary if they're cutting up larger machines. Why should the total network cap matter if I have two 8-core instances or one 16-core instance on the same physical machine?
"The egress traffic from a given VM instance is subject to maximum network egress throughput caps. These caps are dependent on the number of cores that the VM instance has. Each core is subject to a 2 Gbps cap for peak performance. Each additional core increases the network cap, up to a theoretical maximum of 16 Gbps for each instance. The actual performance you experience will vary depending on your workload. All caps are meant as maximum possible performance, and not sustained performance."
Pretty simple to understand why: they're "guaranteeing" that bandwidth. The hosts have a finite amount of network bandwidth available; they also have a finite number of cores.
Judging by the ratio, it's likely something such as 2x 56GbE network ports on the host (=112Gbps), which has 56 vCPUs (2x Xeon 14C/28T).
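In formula form, the cap those docs describe (a sketch, not an official API):

    def egress_cap_gbps(vcpus: int) -> int:
        """Per-instance egress cap: 2 Gbps per vCPU, up to 16 Gbps total."""
        return min(2 * vcpus, 16)

    # 8 cores already hit the ceiling, hence the "sweet spot" above.
    for cores in (1, 4, 8, 16, 96):
        print(cores, "vCPUs ->", egress_cap_gbps(cores), "Gbps")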
Can you really spin one of these up on demand? That obviously means that they have machines of that size (or greater) sitting idle, waiting for someone to use them. That's mind-boggling.
They can still carve them up into the smaller preemptible sizes and destroy those once a request for the big machine comes in. That way utilisation should be quite good. The margin is probably not as high (since hardware costs will be higher than for multiple small machines), but occasionally renting out the largest variant will make it worth it.
Yes, I configured a 64-vCPU machine in a matter of minutes, ran a test on it for 24 hours, then shut it down and deleted the instance. Total cost was around $200.
Intel's page says these 28-core hyperthreaded processors support 8+ socket configurations. Let's see: 8 sockets * 56 virtual processors per socket = 448 virtual processors potentially in one VM.
Does someone have experience using this (or other) cloud solutions in a scientific HPC context? The only obvious disadvantage I can think of is that every student running something incurs a certain cost whereas after one has bought the actual computers, running jobs is somewhat free.
Depends heavily on the projected utilization. If you know your compute node is going to be computing for the next 3 years with at least medium utilization, then the self hosted metal is probably going to be quite a bit cheaper.
It's amazing how much hardware you can pack into a single machine for 10k€. Last year our group bought two additional high-memory (768GB) nodes for around that price each (including support for a couple of years from the vendor).
A few years before, we bought 40 nodes with 128GB RAM each for a similar price to last year's high-memory nodes (and a fast interconnect and a lot of storage).
If you are at a larger research institution, you probably also have an IT department that can co-locate your hardware for next to nothing (compared to cloud). There you also will save a lot of ingress/egress, storage, backup, etc. costs.
Regarding the per-student costs: even with cloud instances I would consider running a traditional HPC job system (Grid Engine, LSF, Torque, ...). MIT had a nice solution with StarCluster [1] to easily deploy SGE on AWS. It looks a bit dead now, though.
> Depends heavily on the projected utilization. If you know your compute node is going to be computing for the next 3 years with at least medium utilization, then the self hosted metal is probably going to be quite a bit cheaper.
Isn't that already the case for 1 month? Bare metal doesn't mean own data centre or colocation. If you go with a hosting provider most offer dedicated hardware on a monthly contract. As long as you need them longer than 1-2 months that should be significantly cheaper than Google/AWS.
That's usually the case for almost everything on AWS/Google. If you're using them for specific features, or for very bursty work (e.g. if you use the instances less than about 6-8 hours a day), they can be cost effective, but the moment you use instances full time and don't leverage/depend on a ton of extra services, you're paying way above the odds.
The biggest downside (other than cost) I've found is that each vCPU core is quite a bit slower than what you'll get on equivalent real hardware. So any code that doesn't scale more or less linearly across an arbitrary number of cores will suffer.
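A rough break-even check on the 6-8 hours/day figure above (both rates are assumptions, same ballpark as elsewhere in the thread):

    # At what daily usage does on-demand cloud match a dedicated monthly box?
    hourly_cloud = 7.80         # assumed on-demand rate, big instance
    monthly_dedicated = 1500.0  # assumed dedicated price
    days = 30

    break_even = monthly_dedicated / (hourly_cloud * days)
    print(f"~{break_even:.1f} hours/day")  # ~6.4 h/day, inside the 6-8h range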
Scientific computing and simulation.
I have had need for machines like this - for a project that was very compute-heavy but embarrassingly parallel, and also very "chatty" - i.e., it updated the data structures a lot during compute.
I can't be too specific about it, but it involved creating a very large tree structure and updating, pruning, and traversing the tree a lot.
If the algorithm is updating and reading a large data structure a lot, it's only practical from a speed point of view to hold the whole structure in RAM.
Private companies want to do simulations as well, and with this type of solution, you can pretty much run them on demand rather than having to wait in line.
My advisor's company straddles the public / private divide, but we've definitely done some simulations for private clients on NERSC, and I assume we weren't misusing hours allocated for some other purpose.
For this type of problem, it must be done on a single node. The overhead of network communication would have killed the latency requirement; that's why we need a huge machine like this.
The code was written in C
(We also maintain a Hadoop and Cassandra cluster, and I use Spark for distributed computation - but those are different projects)
The Super Micro SuperServer 8048B-TR4FT lists support for up to 12TB DDR4 ECC RAM (which could pair with 4x E7-8890 v4 for 96 cores / 192 threads). And the 7088B-TR4FT lists support for up to 24TB DDR4 ECC RAM (with a corresponding 192 cores / 384 threads).
This is not a SKU to play with. If $20/hr is indeed the price (I don't know), that's the hourly cost of a couple of waiters. You get to run SAP on someone else's infra and have someone to support it.
Do people really enjoy not knowing what they are buying? Some providers give some info on what a vCPU is; others don't. Many people think it's an actual CPU core.