
Available ECC DIMMs are often slower than non-ECC DIMMs: both lower MT/s and higher latency. At least that's true for "prosumer" ECC UDIMMs, which are what I'm familiar with.

So it doesn't seem that wild to me that turning on ECC might require running at lower bandwidth.



This is incorrect. ECC DIMMs are no slower than regular DIMMs. Instead, they have extra memory and extra memory bandwidth. An 8GB DDR4 ECC DIMM would have 9GB of memory and 9/8 the memory bandwidth. The extra memory is used to store the ECC bits, while the extra memory bandwidth prevents performance loss when reading/writing ECC alongside the rest of the memory. The memory controller will spend an extra cycle verifying the ECC, which is a negligible performance hit. In reality, there is no noticeable performance difference. However, where you would have 128 traces to a Zen 3 CPU for DDR4 without ECC, you would need 144 traces for DDR4 with ECC.
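The 8-vs-9 ratio falls right out of the standard SECDED code width: a 64-bit data word needs 8 check bits, so the bus widens from 64 to 72 lines per channel. A quick back-of-the-envelope check (function name here is illustrative, not from any library):

```python
# Smallest r such that a Hamming code covers k data bits: 2**r >= k + r + 1,
# plus one overall parity bit for double-error *detection* (SECDED).
def secded_check_bits(k: int) -> int:
    r = 0
    while 2 ** r < k + r + 1:
        r += 1
    return r + 1

data_bits = 64
ecc_bits = secded_check_bits(data_bits)        # 8 check bits
bus_width = data_bits + ecc_bits               # 72 lines instead of 64
overhead = bus_width / data_bits               # 9/8: 8GB of data needs 9GB of chips

assert ecc_bits == 8 and bus_width == 72
```

Two 72-bit channels give the 144 traces mentioned above, versus 2 x 64 = 128 without ECC.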

A similar situation occurs with GDDR6, except Nvidia was too cheap to implement the extra traces and pay for the extra chip, so instead, they emulate ECC using the existing memory and memory bandwidth, rather than adding more memory and memory bandwidth like CPU vendors do. This causes the performance hit when you turn on ECC on most Nvidia cards. The only exception should be the HBM cards, where the HBM includes ECC in the same way it is done on CPU memory, so there should be no real performance difference.


Their second point is wrong (unless the silicon is buggy), but their first point is true. I researched this when buying ECC sticks for my rig; nobody I've found makes unregistered sticks that go above 5600, while some non-ECC sticks are already at 8200, and 6400 is commonplace.

Frustratingly, it's only unregistered that's stuck in limbo; VCC makes a kit of registered 7200.


That is partly due to artificial reasons and partly due to technical reasons. The artificial reason would be that the 8200 MT/sec UDIMMs are overclocked. Notice how they run much slower if you do not enable XMP/EXPO, which simultaneously overvolts and overclocks them. These exist because a large number of people liked overclocking their memory modules to get better performance. This was unreliable, and memory manufacturers noticed that there was a market for a premium product where the overclocking results were guaranteed. Early pre-overclocked modules required people to manually enter the manufacturer-provided voltage, frequency and timings into the BIOS, but XMP and later EXPO were made to simplify this process. This idea only took off for non-ECC modules, since the market for ECC UDIMMs wants reliability above all else, so there never was quite the same market opportunity to sell ECC DIMMs that were guaranteed to overclock to a certain level outside of the memory IC maker’s specifications.

There is no technical reason why ECC UDIMMs cannot be overclocked to the same extent, and ECC actually makes them better for overclocking, since it can detect when overclocking is starting to cause problems. You might notice that non-ECC UDIMMs have pads and traces for the additional IC that is present on ECC UDIMMs. This should be because the ECC DIMMs and non-ECC DIMMs are made out of the same things. They use the same PCBs and the same chips. The main differences would be whether the extra chips to store ECC are on the module, what the SPD says it is and what the sticker says. There might also be some minor differences in which resistors are populated. Getting back to the topic of overclocking, if you are willing to go back to the days before the premium pre-overclocked kits existed, you will likely find that a number of ECC UDIMMs can and will overclock with similar parameters. There is just no guarantee of that.

As for RDIMMs having higher transfer rates, let us consider the differences between a UDIMM, a CUDIMM and an RDIMM. The UDIMM connects directly to the CPU memory controller for the clock, address, control and data signals, while the RDIMM has a register chip that buffers the clock, address and control signals, although the data signals still connect to the memory controller directly. This improves signal integrity and lets more memory ICs be attached to the memory controller. A recent development is the CUDIMM, which is a hybrid of the two. In the CUDIMM, the clock signal is buffered by a Client Clock Driver, which does exactly what the register chip does to the clock signal in RDIMMs. CUDIMMs are able to reach higher transfer rates than UDIMMs without overclocking because of the Client Clock Driver, and since RDIMMs also buffer the clock, they can similarly reach higher transfer rates.


Thanks for the explanation on CUDIMM, I never quite grokked the difference besides it being more stable with two sticks per channel. Hopefully they'll make an ECC CUDIMM at some point, but I'm not holding my breath.


If they don't, and you are up for a challenge in BGA soldering, you can make them yourself if there are pads for the chips. You'd likely have to buy an extra module to get the chips, though.


This would also need an SPD programmer and possibly some additional SMT resistors, but it is possible in theory.


Propagation delay is a thing.

Edit: at some point the memory controller gets a chunk from the lowest level write buffer and needs to compute ECC data before writing everything out to RAM.

Without ECC, that computation time isn't there. The ECC computation is done in parallel in hardware, but it's not free.


You assume that the ECC is not already calculated when the data is in the cache (and the cache line is marked dirty). Caches in CPUs are often ECC protected, regardless of whether the memory has ECC protection. The cache should already have the ECC computation done. Writes to ECC memory can simply reuse the existing ECC bytes from the cache, so no additional calculation time is needed on writes. Reads are where additional time is needed, in the form of one extra cycle, and if your cache is doing its job, you won’t notice this. If you do notice it, your cache hit rate is close to 0 and your CPU is effectively running around 50MHz due to pipeline stalls.

That said, this is tangential to whether the ECC DIMMs themselves run at lower MT/sec ratings with higher latencies, which was the original discussion. The ECC DIMM is simply memory. It has an extra IC and a wider data bus to accommodate that IC in parallel. The chips run at the same MT/sec as those on a non-ECC DIMM. The signals reach the CPU at the same time with both ECC DIMMs and non-ECC DIMMs, so latencies are the same (the ECC verification does use an extra cycle in the CPU, but cache hides this). There are simply more data lanes with ECC DIMMs due to the additional parallelism. This means that there is more memory bandwidth in the ECC DIMM, but that additional memory bandwidth is being used by the ECC bytes, so you never see it in benchmarks.
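A quick sanity check of that last point (DDR4-3200 assumed here for concreteness): the ECC module moves 72 bits per transfer instead of 64, so it has more raw bandwidth, but the extra lanes carry only ECC bytes, so the bandwidth visible to software is unchanged:

```python
transfers_per_sec = 3200e6                    # DDR4-3200, one channel

non_ecc_bytes = transfers_per_sec * 64 / 8    # 64-bit data bus
ecc_raw_bytes = transfers_per_sec * 72 / 8    # 72-bit bus: more raw bandwidth...
ecc_data_bytes = ecc_raw_bytes * 64 / 72      # ...but 8 of 72 lanes carry ECC

assert ecc_raw_bytes > non_ecc_bytes          # the DIMM moves more bits
assert ecc_data_bytes == non_ecc_bytes        # benchmarks see no difference
```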


> You assume that the ECC is not already calculated when the data is in the cache (and the cache line is marked dirty).

That was the case on the systems I worked with. Integrating it between the cache and memory controller is a great idea though, and it makes sense where you've described it.

> If you do notice it, your cache hit rate is close to 0 and your CPU is effectively running around 50MHz due to pipeline stalls.

Where memory latency hurts for us is ISRs and context switches. The hit rate is temporarily very low, and as you mentioned the IPC suffers greatly.


> Where memory latency hurts for us is ISRs and context switches. The hit rate is temporarily very low, and as you mentioned the IPC suffers greatly.

While that is true, that is infrequent and having those memory accesses take 151 cycles instead of 150 cycles is not going to make much difference. Note that those are ballpark figures.


For DDR4 it's 17 vs 16 memory cycles for the data burst phase.


If you measure memory access time in CPU cycles, you would see that 150 cycles is the ballpark. 16 cycles would be closer to L2 cache.


The kind of ECC that’s used for register file and memory protection is trivial to compute and completely in the noise in terms of area. It is essentially free.

The reason people say ECC is not free is that it adds area to every storage location, not because of the ECC-related logic.
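To illustrate why the logic is trivial (a toy sketch, not any real controller's implementation): each check bit is just an XOR reduction over a fixed subset of the data bits, a shallow XOR tree in hardware, and because the code is linear, the syndrome of a single flipped bit directly names the bit to fix:

```python
def parity(x: int) -> int:
    # XOR-reduce the set bits of x
    p = 0
    while x:
        p ^= 1
        x &= x - 1          # clear lowest set bit
    return p

def syndrome(data: int, k: int = 64) -> int:
    # Check bit i covers data bits whose (index + 1) has bit i set.
    s = 0
    for i in range(7):      # 7 check bits cover 64 data bits
        mask = sum(1 << pos for pos in range(k) if ((pos + 1) >> i) & 1)
        s |= parity(data & mask) << i
    return s

# Linearity: flipping one data bit changes the stored-vs-recomputed syndrome
# by exactly that bit's (index + 1), pointing at the bit to correct.
word = 0xDEADBEEFCAFEF00D
assert syndrome(word ^ (1 << 5)) ^ syndrome(word) == 5 + 1
```

In hardware each of those masks is a fixed XOR tree a few gate levels deep, which is why the compute side is in the noise compared to the extra storage.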


> It is essentially free.

The cycle cost is often specified in the memory controller manual.



