I had my formative years in programming when memory usage was something you still worried about as a programmer. And then memory expanded so much that all kinds of “optimal” patterns for programming just became nearly irrelevant. Will we start to actually consider this in software solutions again as a result?
You're right in terms of fitting your program to memory, so that it can run in the first place.
But in performance work, the speed of RAM relative to computation has dropped so much that it's common wisdom to treat today's cache as the RAM of old (and today's RAM as the disk of old, etc).
In software performance work it's been all about hitting the cache for a long time. LLMs aren't too amenable to caching though.
AFAIK, you can't explicitly allocate cache the way you allocate RAM, however. A bit like if you could only work on files, and RAM was used as a cache. Maybe I'm mistaken? (Edit: typo)
A fun fact for the people who like to go down rabbit holes: there is an x86 technique called cache-as-RAM (CAR) that allows you to explicitly allocate a range of memory to be stored directly in cache, avoiding DRAM entirely.
CAR is often used in early boot, before the DRAM is initialized. It works because the x86 cache-disable bit actually only decouples the cache from memory; the CPU will still use the cache if you primed it with valid cache lines before setting the bit.
So the technique is to mark a particular range of memory as write-back cacheable, prime the cache with valid cache lines for the entire region, and then set the bit to decouple the cache from memory. Now every access to this memory region is a cache hit that doesn't write back to DRAM.
The one downside is that while CAR is on, any cache you don't allocate as memory is wasted. You could allocate only half the cache as RAM for a particular memory region, but the disable bit is global, so the other half would just sit idle.
Same as the failure of Itanium's VLIW instructions: you don't actually want to force the decision of what is in the cache back to compile time, when the relevant information is better available at runtime.
Also, additional information on instructions costs instruction bandwidth and I-cache.
> you don't actually want to force the decision of what is in the cache back to compile time, when the relevant information is better available at runtime
That is very context-dependent. In high-performance code having explicit control over caches can be very beneficial. CUDA and similar give you that ability and it is used extensively.
Now, for general "I wrote some code and want the hardware to run it fast with little effort from my side", I agree that transparent caches are the way.
That solves the pollution problem, but it doesn't pin cache lines. It also doesn't cover the case that PowerPC does, where you want to assert a line is valid without actually fetching it.
That seems correct, but it also doesn’t account for managed languages with runtimes like JavaScript or Java or .NET, which probably have a lot of interesting runtime info they could use to influence caching behavior. There’s an amount of “who caches the cacher” if you go down this path (who manages cache lines for the V8 native code that is in turn managing cache lines for jitted JavaScript code), but it still seems like there is opportunity there?
That's a strange statement. It's certainly not black and white, but the compiler has explicit lifetime information, while the cache infrastructure is using heuristics. I worked on a project that supported region tags in the cache for compiler-directed allocation, and it showed some decent gains (in simulation).
I guess this is one place where it seems possible to allow for compiler annotations without disabling the default heuristics so you could maybe get the best of both.
There are cache control instructions already. The reason it goes no further than prefetch/invalidate hints is probably that exposing a fuller API at the chip level to control the cache would overcomplicate designs and wouldn't be a backwards-compatible, stable interface. Treating the cache as RAM would also require a controller, which then also needs to receive instructions, or the CPU has to suddenly manage the cache itself.
I can understand why they just decide to bake the cache algorithms into hardware, validate it, and be done with it. I'd love it if a hardware engineer or more well-read fellow could chime in.
Because programmers are in general worse at managing them than the basic LRU algorithm.
And because the abstraction is simple and easy enough to understand that when you do need close control, it's easy to achieve by just writing to the abstraction. Careful control of data layout and nontemporal instructions are almost always all you need.
There has! Intel has Cache Allocation Technology, and I was very peripherally involved in reviewing research projects at Boston University into this. One that I remember was allowing the operating system to divide up cache and memory bandwidth for better prioritization.
This is not applicable to most programming scenarios, since the cache gets thrashed unpredictably during context switches (including the user-level task switches involved in cooperative async patterns). It's not a true scratchpad storage, and turning it into one would slow down context switches a lot, since the scratchpad would be processor state. Maybe this can be revisited once even low-end computers have so many hardware cores/threads that context switches become rare enough that the overhead is not a big deal. But we are very far from anything of the sort.
I would say this is the main benefit of CUDA programming on GPUs: you get to control local memory. Maybe Nvidia will bring it to the CPU now that they make CPUs.
You can in CUDA. You can have shared memory which is basically L1 cache you have full control over. It's called shared memory because all threads within a block (which reside on a common SM) have fast access to it. The downside: you now have less regular L1 cache.
OTOH, LLM inference tends to have very predictable memory access patterns. So well-placed prefetch instructions that can execute predictable memory fetches in parallel with expensive compute might help CPU performance quite a bit. I assume that this is done already as part of optimized numerical primitives such as GEMM, since that's where most of the gain would be.
I've actually started to use Outlook and Teams through Chrome to free up some of my RAM; it easily saves 3-4GB. It's gotten ridiculous how much RAM basic tools are using, leaving nothing for doing actual work.
People get on me all the time about not installing programs on my computer. I run everything in the browser, if I can. Partly so I can kill it properly without it misbehaving, and partly because I don't trust their software at all. Zoom, Slack, Gmail, etc-- if I can run it in the browser, then that's the only way I'll run it.
Same for me on mobile. I don’t install the Amazon app I just use the browser where I can limit tracking and only log in when actually buying something.
Or at least improving the shared browser ui / chromeless experience for "app" installs. I think that Tauri is pretty reasonable as well, weak link being Linux currently.
On my personal desktop, I have 96GB... I've never gone over 70 or so, but that was with a lot of services running, a fairly complex system with data loaded locally. I generally don't give a f*ck about the RAM I'm using day to day. I'll run various updates and reboot between once a month and once a quarter.
I doubt it. I predict in a few years, maybe sooner, one/some of the AI companies buying up the supply will either have achieved their goal or collapsed, and then the market will be flooded with a glut of memory, driving prices low again. Or, conversely, the demand stays high for a sustained period of time and the suppliers just increase supply. There are no hard bill-of-materials or technical reasons for the memory prices to be this high, unlike 20+ years ago.
And in the meantime, major buyers (government, big orgs) adjust by extending the planned lifespan of their computers, and upping the IT wage budget a bit to support that. That adjustment probably won't go away after supply returns.
I'm always shocked how much good IT equipment is shoved into the trash bin:
At a lot of companies I could make a great deal - either using it myself or selling it on eBay later on.
Big corporations often trash IT equipment that's only 3-4 years old. And there is no recycling etc. Very sad.
Big corporations tend to send old hardware through the surplus marketplace. There's lots of 3-4 year old corporate computers for sale. Often, the company leases the computers and then the lessor will sell them when they're returned.
As long as it's working (and not gross), why do you need a new monitor? My current monitor is a 2010 model, I think I got it around 2013. I don't know what a new monitor would do for me, other than have a worse aspect ratio, cause Dell stopped making 30" 16:10 monitors.
In theory, yes. However, a lot of these monitors are still 2560x1440 and 30”+. The PPI is quite low. I'm looking for 4K and something that looks similar enough to the M4 MBP I'm working on. A lot of these just don't look as good as they used to.
1440p is good enough that you aren't going to see individual pixels - just sit far enough back from the screen and use reasonable font hinting (Mac users are sadly out of luck here, but even then 2160p/4K is overkill).
> A machine from 5 years ago feels just as fast as a brand new machine.
Except you can't install Windows 11 on it, and the org has to trash it anyway to keep up with security requirements (I know people in that line of work; they're all angry about it).
AI companies aren't buying RAM, they are buying the wafers themselves. Then they are making special AI stuff. So the RAM never exists, and there will be no glut of memory coming. Maybe some DDR5 will dribble out, but HBM isn't something we can use (at the moment).
> And then memory expanded so much that all kinds of “optimal” patterns for programming just become nearly irrelevant.
I don't think that ever happened. Using a relatively sparse amount of memory leads to better cache utilization, which in turn usually improves performance drastically.
And in embedded stuff being good with memory management can make the difference between 'works' and 'fail'.
The need to use optimal patterns didn't go away, but the techniques certainly did. Just as a quick example, it's usually a bad idea now to use lookup tables to accelerate small math workloads. The lookup table creates memory pressure on the cache, which ends up degrading performance on modern systems. Back in the 1980s, lookup tables were by far the dominant technique because math was *slow.*
> Back in the 1980s, lookup tables were by far the dominant technique because math was slow.
This actually generalizes in a rather clean way: compared to the 1980s, you now want to cheaply compress data in memory and use succinct representations as much as practicable, since the extra compute involved in translating a more succinct representation into real data is practically free compared to even one extra cacheline fetch from RAM (which is now hundreds of cycles latency, and in parallel code often has surprisingly low throughput).
It obviously never became completely irrelevant. But I think programmers spend a lot less time thinking about memory than they used to. People used to do a lot of gymnastics and crazy optimizations to fit stuff into memory. I do quite a bit of embedded programming, and most of the time it seems easier for me to simply upgrade the MCU and spend 10 cents more (or whatever) than to make any crazy optimizations. But of course there are still cases where it makes sense.
While thinking less about memory optimizations is possible since we have more memory, it was enabled by the languages and libraries we use. Forty years ago, you were probably implementing your own data structures. Sure, there were plenty of languages that offered them back then (LISP was based on linked lists, and that language is from the 1960s). Chances are you weren't using such languages unless you were on big computers or writing software that didn't handle much data. These days, pretty much any language will provide at least some data structures and their related algorithms. Even systems programming languages like C++ and Rust. Of course, there are an absurd number of libraries if you need anything more specialized.
Coincidentally, last night, and I'm not pulling your leg! But to be fair, that's the first time in much more than a decade. I don't normally work with such huge files, and this was one very rare exception. I also nearly crashed my machine by triggering the OOM killer after naively typing 'vi file' without first checking how large it had become. I'm working on a project that I probably should run on a more serious machine, but I don't feel like moving my whole work environment from the laptop that I normally use.
I never really bought into the anti-Leetcode crowd's sentiment that it's irrelevant. It has always mattered as a competitive edge: against other job candidates if you're an employee, or against the competition if you're a company. It only looked irrelevant because opportunities were everywhere during ZIRP, but good times never last.
Most developers work at banks, insurance companies and other “enterprise” jobs. Even most developers at BigTech and who are working “at scale” are building on top of scalable infrastructure and aren’t worrying about reversing a btree on a whiteboard.
Agree that the whiteboard thing is often not applicable but it's so nice when a developer has efficient code if only because it indicates that they know what's going on and also that there are fewer bugs and other bottlenecks in the system.
Those bugs don’t come from using the wrong algorithm, they come from not understanding the business case of what you’re writing. Most performance issues in the real world for most cases don’t have anything to do with the code. It’s networking, databases, etc.
Your login isn’t slow because the developer couldn’t do leetcode
No, it's because 50k reads of settings are happening with a SQL Table in memory that's queried via SQL statement instead of a key/value hashtable. (real world experience, I think it was close to 28k reads, but the point stands)
It's not like most developers are wasting memory for fun by using Electron etc. It's just the simplest way to deploy applications that require frequent multiplatform changes. Until you get Apple to approve native app changes faster and Linux users to agree on a framework, app distribution, etc., it's the most practical way to ship a product and not just a program.
Not for fun but for convenience (laziness occasionally?). Someone needed to "pay" for the app being available on all platforms. Either the programmer by coding and optimizing multiple times, or the user by using a bloated unoptimized piece of software. The choice was made to have the user pay. It's been so long I doubt recent generations of coders could even do it differently.
RAM didn't get more expensive to produce. It just got more desirable. The prices will come down again when supply responds. It may take some time, but it will happen eventually.
RAM production is highly inelastic and controlled by an oligopoly. They have little desire to increase production considering the lead time and the risk that the AI demand might be transient.
They actively prefer keeping comfortable margins to competing with each other. They have already been found guilty of collusion in the past.
New actors from China could shake things up a bit but the geopolitical situation makes that complicated. The market can stay broken for a long time.
They are increasing production as fast as they can (which is not fast at all, it's more like slowly steering a huge ship towards the correct direction) because current prices are too high even when accounting for the historical oligopoly dynamics. They can easily increase their collective profits by making more.
RAM manufacturers don't increase production as fast as possible, because they've been through enough boom and bust.
Rapid increase in capacity leads to oversupply which leads to negative margins. They've been there before, and they don't want to go there again.
RAM manufacturers do routinely set up new fabs and decommission old fabs. Maybe they're trying to hurry up new fab construction in times like these, and they would likely defer shutting down old fabs, or restart them where possible. But they're less likely to build new fabs that weren't already part of their long-term plans.
They've actually not seen such prices before. DRAM now costs as much per Gb as it did around 2006-2007 - despite around 20 years of real technical progress since then! That's genuinely unprecedented.
As far as I know, they are merely shifting capacity from the consumer market towards the data center market with minimal retooling. I am unaware of any of the three actively investing in new capacity. Some modest increases are planned, but nowhere near what you would expect given current demand.
We would have, if expensive memory were a long-term trend. It is not - eventually the supply will expand to match demand. There is no fundamental lack of raw materials underlying the issues; it is just a demand shock.
Also, it's not like we have regressed in the process itself either, which was historically the limiting factor. As you said this is purely an economics thing resulting from a greedy shift in business focus by e.g. Micron.
I just heard a podcast where they talked about how powerful our devices are today, yet they don't feel faster than they did 15 years ago, and that it's because of what you write here.
I have a 2020 Intel Mac (quad core, 16gb RAM) and it feels as slow as the Packard Bell from 2000 when I was a kid. The launchpad takes 1-2 seconds to show a bunch of icons. Absolutely insane!
When I practice leetcode problems, I remember the best solution was the one that optimised CPU (time) instead of memory - meaning adding a data index in memory instead of iterating over the main data structure. I thought: OK, that's fine, it's normal; you can (could) always buy more RAM, but you can't buy more time.
But well, I think there is no right answer; there will always be a trade-off, case by case, depending on the context.
Android's investing significantly in reducing the memory usage of the next release simply because the BOM cost of RAM for their low-end partners is becoming prohibitive.
But is that new or different because of this event? No, it's not. Android has had several initiatives to enable low-end devices, from optimizing full-fat Android to inventing new versions of Android.
Android has been talking about these kinds of things for a long time. But if they're actually meaningfully making progress on them, it's most likely because of real pressure. (He types on his phone with 6GB of ram)
In this case it is explicitly because of the RAMpocalypse. The initiatives have existed forever but they've gotten a lot more funding and a lot more exec attention because of the situation in the hardware market.
Yes, it's a nefarious plot of AI producers to attempt a monopoly with a product that no one seems capable of demonstrating has the exponential value they're betting on.
Once everybody has a decent amount of VRAM they can just run local AIs, and the need to mess with ad-laden search results will fizzle. So of course they are desperate to grab a new monopoly. People haven't realised yet that local AIs are fast and produce good results - on pretty average hardware. If they don't manage to grab a new monopoly, Google will be history.
But the price spikes don't really need a nefarious plot. There is a serious lack of VRAM deployed out there, and filling that gap will take quite some time. Add the nefarious plot on top of that, and the situation will most likely get even worse...
LLM inference is mostly read only, so high-bandwidth flash looks like it could provide huge cost savings over VRAM. It's not yet in commercial products but there are working prototypes already. Previous HN discussion:
Although their stated reason for hoarding is that they "really need it", I think it was a strategic move to make their competitors' lives more difficult with little regard for the collateral consequences to non-competitors, such as regular people or companies needing new computers.
I can never understand why so many people resort to conspiracy theories when the obvious answer is supply and demand. I know well-educated people who do this when they talk about the residential property market (including an accountant).
Supply and demand can be caused by a conspiracy. OpenAI secretly bought 40% of the world's RAM on purpose. It's only a conspiracy if Anthropic and Google did something similar, though.
Eventually new capacity will come online, and the money the DRAM companies are making is going to accelerate even more new capacity. If you can get your new capacity going before your competitors, maybe you can avoid a bubble burst. If you don't build new capacity, your competitors will, etc, etc…
They're not building any new manufacturing capacity though. They assume this is a demand bubble and they don't want supply to exceed demand after it pops.
Multiple major DRAM factories are currently being built or planned, driven by AI demand and government incentives. Micron is constructing a massive $100 billion "megafab" complex in New York, with groundbreaking occurring in January 2026, and is building new facilities in Idaho. Other projects include expansion in Singapore and Japan.
Key DRAM Factory Construction Projects:
Micron Technology (USA): Building a $100 billion, 4-fab complex in Clay, New York (first production expected around 2030) and a new $15 billion, 2-fab project in Boise, Idaho.
Micron (Global): Investing in expanding capacity in Singapore and Taiwan.
Nanya Technology (Taiwan): Previously initiated a $10.69 billion DRAM facility in New Taipei, Taiwan.
A quick search tells me the megafab in New York was announced years ago, the Singapore fab is for NAND flash, and the Taiwan fab already exists and they're buying it. So none of those are in response to the AI demand for RAM, are they?
I get that you are an AI skeptic, but you can do better than that with a quick search these days. HBM for high-end (commercial) GPUs:
SK Hynix
The current HBM market leader is fast-tracking multiple "megafabs" and packaging centers.
Cheongju, South Korea (P&T7): A new $13 billion advanced packaging and testing plant dedicated to stacking and testing HBM chips. Construction is set to begin in April 2026, with completion by late 2027.
Cheongju, South Korea (M15X): This fab is being fast-tracked for HBM4 mass production, with the first cleanroom now expected to open in February 2027.
Yongin, South Korea: SK Hynix is investing roughly $22 billion in the first fab of a massive new semiconductor cluster. Operations are planned to start in February 2027.
West Lafayette, Indiana, USA: A $3.87 billion advanced packaging site that will integrate HBM directly onto GPUs. Construction fencing was installed in February 2026, with production targeted for late 2028.
Samsung Electronics
Samsung is accelerating its "Shell First" strategy to secure production space ahead of competitors.
Pyeongtaek, South Korea (P4 & P5): Samsung has advanced the construction of the P5 cleanroom by several months, with a new operational target of late 2027. The P4 line is expected to come online even earlier, likely during 2026.
Taylor, Texas, USA: This $17 billion "megafab" is designed for advanced logic and HBM packaging. While hit by delays, it is now targeting a late 2026 opening.
Micron Technology
Micron is diversifying its HBM production across the U.S. and Asia to grow its market share.
Boise, Idaho, USA (ID1 & ID2): The ID1 fab reached a key milestone in June 2025 and is expected to start wafer output in the second half of 2027. ID2 is planned to follow shortly after.
Onondaga County, New York, USA: Micron officially broke ground in January 2026 on a $100 billion "megafab" complex, though significant supply is not expected until near 2030.
Hiroshima, Japan: A planned $9.6 billion HBM-focused fab is expected to come online between 2027 and 2028.
Singapore & Taiwan: Micron began construction on a $24 billion wafer facility in Singapore in January 2026 and acquired a fab in Taiwan for $1.8 billion to rapidly expand DRAM capacity by late 2027.
For lower-end GPUs, like what goes into Apple machines:
New LPDDR Production Facilities
Samsung (Pyeongtaek P4 & P5): Samsung is converting several NAND flash lines to DRAM and accelerating the P4 and P5 fabs in South Korea. While these fabs support HBM, they are also designed for mass-producing 6th-generation 1c DRAM, which will form the basis of the next-gen LPDDR6 modules expected to debut in 2026.
SK Hynix (Icheon & M15X): SK Hynix is planning an 8-fold increase in 1c DRAM production by the end of 2026. This capacity will be split between HBM and "general-purpose" DRAM, which includes the LPDDR variants used in mobile and laptop chips.
Micron (Boise, Idaho - ID1): Micron's new ID1 fab in Boise is currently under construction, with structural steel completion reached in late 2025. It is scheduled to begin wafer output in the second half of 2027, focusing on leading-edge DRAM that includes LPDDR for the U.S. market.
The "Memory Wall" for Apple
The primary challenge is that HBM production requires significantly more wafer area than standard LPDDR. Consequently, even as these new factories open, the shortage of commodity DRAM (LPDDR5X/LPDDR6) is expected to persist through 2028 because manufacturers find HBM far more profitable.