Why did memory prices spike late in the year?

November 25, 2025
Source: Investing.com

Investing.com -- Memory prices have surged in recent months as AI workloads force hyperscalers to buy far more DRAM and NAND than expected.

Demand tied to AI reflects more than a broad data center buildout. Software changes and model architecture shifts are creating a step change in how much memory each GPU consumes, pulling more DRAM and NAND into every cluster.

One driver is new versions of Nvidia’s CUDA software, which let GPUs tap larger pools of memory across an entire system. Features in CUDA 12.8 and 13.0 allow models to treat GPU and CPU memory as one unified space, making oversubscription easier and encouraging developers to allocate far larger working sets.

That means AI servers need more DRAM and more SSD capacity in the background to support paging and storage of model data.
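To make that spillover concrete, here is a back-of-the-envelope sketch (not from the article) of how an oversubscribed working set cascades from GPU memory into host DRAM and then SSDs; every capacity figure below is an assumption for illustration, not a spec for any real server.

```python
# Illustrative arithmetic only: a unified (oversubscribed) address space
# spills an oversized working set from GPU HBM into host DRAM, then NVMe.
# All capacities are assumptions, not specifications of any product.

hbm_per_gpu_gib = 80      # assumed HBM per GPU
gpus = 8
host_dram_gib = 2048      # assumed host DRAM
working_set_gib = 3000    # assumed model + activation working set

hbm_total = hbm_per_gpu_gib * gpus
in_hbm = min(working_set_gib, hbm_total)
in_dram = min(max(working_set_gib - hbm_total, 0), host_dram_gib)
on_ssd = max(working_set_gib - hbm_total - host_dram_gib, 0)

print(f"HBM: {in_hbm} GiB, host DRAM: {in_dram} GiB, NVMe: {on_ssd} GiB")
```

Under these assumed numbers, every gigabyte the working set grows past the DRAM tier lands directly on flash, which is the mechanism pulling SSD capacity into AI server bills of materials.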

The rapid expansion of context windows in large language models is another major shift. As models process inputs running into hundreds of thousands of tokens, memory becomes the main bottleneck.

These longer sequences require significantly more VRAM to hold intermediate attention data, and when that overflows, the system must offload to host RAM or SSDs. Hyperscalers have adopted this memory hierarchy at scale, using NVMe drives as an extension of system memory.
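The dominant per-sequence cost here is the attention key-value (KV) cache, which grows linearly with context length. A minimal sizing sketch, assuming dimensions loosely resembling a 70B-class transformer with grouped-query attention (the parameters are illustrative assumptions, not any vendor's figures):

```python
# Back-of-the-envelope KV-cache sizing: keys and values are cached for
# every layer, KV head, and token, so memory scales linearly with context.

def kv_cache_gib(seq_len, n_layers=80, n_kv_heads=8, head_dim=128,
                 dtype_bytes=2):
    """GiB of K and V cached for one sequence (assumed fp16 entries)."""
    bytes_total = 2 * n_layers * n_kv_heads * head_dim * dtype_bytes * seq_len
    return bytes_total / 1024**3

for ctx in (8_192, 131_072, 1_000_000):
    print(f"{ctx:>9} tokens -> {kv_cache_gib(ctx):7.1f} GiB of KV cache")
```

With these assumed dimensions, a 128K-token context needs roughly 40 GiB of KV cache per sequence, which is why context growth alone can exhaust a GPU's memory and force offload to host RAM or flash.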

Longer contexts also require higher throughput from storage. Reading large prompts in real time and supporting many users at once demand fast, high-capacity NAND.

Modern inference workloads involve frequent random reads across model parameters and databases, which SSDs handle far better than hard drives. This is pushing cloud providers to expand flash-based storage pools built on high-performance NAND.
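The gap is easy to see with rough arithmetic. The IOPS figures and per-request read count below are assumptions for illustration, not benchmarks of any particular drive or workload:

```python
# Rough arithmetic on why random reads push inference storage to flash:
# spinning disks are seek-limited, NVMe SSDs are not. Figures are assumed.

hdd_iops = 200           # assumed random 4 KiB IOPS for a 7200 rpm drive
nvme_iops = 1_000_000    # assumed random-read IOPS for an enterprise NVMe SSD

# Hypothetical workload: one request triggers 5,000 random reads of
# parameters/embeddings scattered across storage.
reads_per_request = 5_000

hdd_rps = hdd_iops / reads_per_request    # requests/s one HDD could sustain
nvme_rps = nvme_iops / reads_per_request  # requests/s one NVMe could sustain

print(f"HDD:  {hdd_rps:.2f} requests/s")
print(f"NVMe: {nvme_rps:.0f} requests/s ({nvme_iops // hdd_iops}x the IOPS)")
```

Even if the exact numbers differ in practice, the orders-of-magnitude gap in random-read capability is what makes hard drives a non-starter for this access pattern.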

BofA analysts say improvements in CUDA and new attention algorithms reduce some memory overhead but ultimately enable larger workloads.

As GPUs handle longer sequences, the underlying systems must support more data streaming in from SSDs. Multi-GPU designs also play a role by spreading massive models and contexts across many accelerators, which increases pooled memory needs and pushes more data to flash.
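A hypothetical sizing sketch shows why sharding a model across accelerators inflates pooled memory needs; the parameter count, precision, and per-GPU capacity below are assumptions chosen only to illustrate the arithmetic.

```python
# Assumed sizing: shard the weights of a large model evenly across GPUs
# and see how much HBM remains per GPU for KV cache and activations.

params_billions = 1_000    # assumed 1-trillion-parameter model
bytes_per_param = 1        # assumed fp8 weights
hbm_per_gpu_gib = 141      # assumed HBM per accelerator
gpus = 16

weights_gib = params_billions * 1e9 * bytes_per_param / 1024**3
weights_per_gpu = weights_gib / gpus
free_per_gpu = hbm_per_gpu_gib - weights_per_gpu

print(f"Weights total: {weights_gib:.0f} GiB")
print(f"Per GPU (of {gpus}): {weights_per_gpu:.1f} GiB,"
      f" leaving {free_per_gpu:.1f} GiB free")
```

Under these assumptions the weights alone consume over 900 GiB of pooled HBM before any KV cache is stored, so anything that does not fit the pool, including cold model data, ends up staged on flash.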

Broader electronics markets have also rebounded at the same time AI demand has tightened supply, creating a “super cycle” for memory.

With PC, phone and traditional data center spending recovering, the surge in NAND and DRAM tied to AI has left inventory thin and driven prices sharply higher.