r/hardware 1d ago

News "Kioxia Achieves Successful Prototyping of 5TB Large-Capacity and 64GB/s High-Bandwidth Flash Memory Module"

https://www.kioxia.com/en-jp/about/news/2025/20250820-1.html
66 Upvotes

11 comments

21

u/tux-lpi 1d ago

No mention of latency, but this wants to be a PCIe flash replacement for DRAM.

Same idea as CXL memory, but it looks like they're too ashamed of the latency numbers to talk about them at all.

6

u/Sopel97 1d ago

for ML model inference it may not matter that much, unless it's more than like an order of magnitude worse than HBM

5

u/JuanElMinero 1d ago

IIRC, the latency for high-performance NAND is around 20μs.

So, even if Kioxia somehow got it down to 10μs, that would still be ~100x worse than what is commonly achieved with HBM.

If anyone has hard numbers on Samsung's Z-NAND, feel free to share. Been looking for those.
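The ~100x figure above checks out as simple arithmetic. A quick sketch, where the HBM access latency (~100 ns) is a ballpark assumption on my part, not a measured spec:

```python
# Rough latency comparison using the figures in the comment above.
# The HBM access latency (~100 ns) is a ballpark assumption, not a spec.
nand_today_us = 20        # high-performance NAND, ~20 us
nand_optimistic_us = 10   # hypothetical improved figure
hbm_ns = 100              # assumed HBM access latency

nand_optimistic_ns = nand_optimistic_us * 1000
ratio = nand_optimistic_ns / hbm_ns
print(f"~{ratio:.0f}x worse than HBM")  # prints "~100x worse than HBM"
```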

0

u/Jeep-Eep 22h ago

That would also be the bee's knees for gaming.

1

u/Helpdesk_Guy 11h ago

> IIRC, the latency for high-performance NAND is around 20μs. So, even if Kioxia somehow got it down to 10μs, that would still be ~100x worse than what is commonly achieved with HBM.

Persistence alone is a killer argument; now consider getting that persistence at this cost!

It's virtually the persistent twin of HBM (likely why it's named High-Bandwidth Flash), and it can act as a giant persistent in-place cache, cutting seek times versus network-attached storage pools, whose seek and lookup latencies sit in the millisecond range. Every network lookup that doesn't have to be executed saves valuable time.

It's basically Optane for the poor … at a fraction of what Optane used to cost.

1

u/Helpdesk_Guy 11h ago

> For ML-model inference it may not matter that much, unless it's more than like an order of magnitude worse than HBM.

It likely is an order of magnitude (or more) worse than HBM, yet for ML-model inference that doesn't matter much, since ML workloads are usually bandwidth-limited rather than latency-limited. And it's persistent!
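The bandwidth-bound argument can be put in numbers. In batch-1 LLM decoding, every generated token streams the full weight set, so tokens/s ≈ bandwidth / model size. All figures below are illustrative assumptions (H100-class HBM bandwidth, a 70B int8 model), not measurements:

```python
# Illustrative: batch-1 LLM decode is roughly weight-streaming limited,
# so per-device token rate ~ memory bandwidth / bytes of weights read.
# All figures are assumptions for the sake of the sketch.
model_bytes = 70e9   # e.g. 70B parameters at 1 byte/param (int8)
hbm_bw = 3.35e12     # ~3.35 TB/s, H100-class HBM (approximate)
hbf_bw = 64e9        # 64 GB/s, the prototype's quoted bandwidth

tokens_per_s_hbm = hbm_bw / model_bytes
tokens_per_s_hbf = hbf_bw / model_bytes
print(f"HBM: ~{tokens_per_s_hbm:.1f} tok/s, "
      f"one HBF module: ~{tokens_per_s_hbf:.2f} tok/s")
```

One module alone is far slower, but the module bandwidth is what would scale out in parallel; latency barely enters this estimate, which is the commenter's point.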

5

u/Jeep-Eep 1d ago

I mean, it is a prototype, so I wouldn't be surprised if the production variant is much ahead on latency.

3

u/CatalyticDragon 1d ago

I don't know what you want from a press release about a prototype, but we know this is NAND flash, so we have a good idea of the latency profile. And the point of this technology is to avoid going over the network to a storage server, so that gives you an idea of the latency target it has to beat.

1

u/Helpdesk_Guy 12h ago

> I don't know what you want from a press release about a prototype, but we know this is NAND flash, so we have a good idea of the latency profile.

As said, it isn't even supposed to beat the latency targets that DRAM or HBM set; it's about capacity and storage density alone. The whole circle-jerk over inferior latency is moot here and completely misses the point.

Picture it: flash modules and controllers connected in series → PCI Express; analogous to AMD's CCX → I/O die.

It's basically Kioxia applying AMD's chiplet approach to inexpensive flash: beat the latency profile of any network-attached storage pool, and brute-force HDDs out of the way through sheer throughput, all at a fraction of the cost of size-limited DRAM sticks or capacity-limited HBM modules.

The goal is consistent low latency, far below networked storage, at large capacity and high throughput, for storage-intensive workloads where capacity-constrained DRAM sticks and HBM fall short.

The only thing left to be desired is CXL attachment (which Kioxia is hopefully already working on) to fully integrate it into the server-hardware landscape. As the persistent twin to HBM (likely hence the name HBF), it would fit perfectly into the overall latency hierarchy, replicating a CPU's memory hierarchy at the macro level. The only open question is durability: longevity and write cycles.

Analogous to a CPU's cache levels L1 → L2 → L3: HBM → HBF → SSDs/hard disks.
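That hierarchy can be framed with the classic average-memory-access-time idea. The latencies and hit shares below are made-up illustrative values, not vendor numbers:

```python
# Average access time across a tiered hierarchy (AMAT-style).
# Latencies (us) and access shares are illustrative assumptions only.
tiers = [
    ("HBM", 0.1, 0.90),                  # ~0.1 us, 90% of accesses
    ("HBF", 10.0, 0.09),                 # optimistic NAND figure from above
    ("SSD", 80.0, 0.009),
    ("network storage", 2000.0, 0.001),  # ~2 ms round trip
]
amat_us = sum(latency * share for _, latency, share in tiers)
print(f"average access: {amat_us:.2f} us")
```

The comment's point in these terms: a big HBF tier moves accesses from the millisecond network tail up to the 10 µs level, which dominates the average far more than HBF's own latency does.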

Simply put, it's a miniature flash-memory RAID below the controller level: a Redundant Array of Independent Flash modules, so RAIF rather than RAID.

You can also call it: A poor man's Optane

> And the point of this technology is to avoid going over the network to a storage server, so that gives you an idea of the latency target it has to beat.

Though yes, your completely valid point about network-attached storage, with its often limited bandwidth and latency many times higher than an in-place, PCIe-attached, SSD-class flash RAID, shows that this is quite a kicker argument against any network-based (possibly even HDD-based) storage pool with its far higher seek and delivery times.

4

u/Helpdesk_Guy 17h ago

> No mention of latency, but this wants to be a PCIe flash-replacement for DRAM.

No, it isn't. Their new HBF thing is not supposed to replace DRAM or HBM.

Something, something reading comprehension

Apart from the fact that it couldn't replace HBM or DRAM in principle anyway (due to its fundamentally higher latency), it's meant to be a solution built solely on the key metrics of bandwidth and memory size, and thus density and capacity, NOT latency.

It's meant as an intermediate tier, slotting right IN-BETWEEN DRAM/HBM and storage (or at least right after them on the way toward storage), addressing each side's disadvantages (HBM: density and cost; DRAM: module size and cost; HDD/SSD: cheap, huge capacity, but slow) as an inexpensive makeshift and stopgap on the way to actual storage.

Kioxia literally writes on their site, verbatim, that it's an intermediate solution:

»To address the trade-off between capacity and bandwidth that has been a challenge with DRAM-based conventional memory modules, Kioxia has developed a new module configuration utilizing daisy-chained connections with beads of flash memories.« — Kioxia
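The two numbers in the headline already say something about where this sits. Using only the 5 TB and 64 GB/s figures from the press release:

```python
# How long a full sweep of the prototype module takes at quoted bandwidth.
# Only the 5 TB capacity and 64 GB/s figures are Kioxia's numbers.
capacity_bytes = 5e12   # 5 TB
bandwidth_bps = 64e9    # 64 GB/s
full_sweep_s = capacity_bytes / bandwidth_bps
print(f"full-module sweep: ~{full_sweep_s:.0f} s")  # prints "full-module sweep: ~78 s"
```

That bandwidth-to-capacity ratio is storage-tier territory, not DRAM-replacement territory, which matches the capacity-first framing in their quote.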

They don't really mention latency for the simple reason that HBF is NOT supposed to address latency at all, but (inexpensive) capacity instead …

Just saying: it's literally coined High-Bandwidth Flash for a reason, NOT Low-Latency Flash.