
You assume that the ECC is not already calculated when the data is in the cache (and the cache line is marked dirty). Caches in CPUs are often ECC protected, regardless of whether the memory has ECC protection. The cache should already have the ECC computation done. Writes to ECC memory can simply reuse the existing ECC bytes from the cache, so no additional calculation time is needed on writes. Reads are where additional time is needed, in the form of one cycle, and if your cache is doing its job, you won’t notice this. If you do notice it, your cache hit rate is close to 0 and your CPU is effectively running at around 50MHz due to pipeline stalls.
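
To make that concrete, here is a toy software model of a (72,64) SECDED computation, i.e. the 8 ECC bits kept alongside each 64-bit word. This is only an illustrative sketch: real controllers typically use Hsiao-style codes evaluated by parallel XOR trees in a single cycle, and the names below (ecc_word, ecc_encode) are made up for the example.

    #include <stdint.h>
    #include <stdio.h>

    /* Toy model of a (72,64) SECDED code: 64 data bits protected by 7
     * Hamming check bits plus one overall parity bit, i.e. the 8 extra
     * bits stored on the additional DRAM chip of an ECC DIMM. Hardware
     * computes the same thing with parallel XOR trees in one cycle;
     * this loop-based version only shows what is computed. */

    typedef struct {
        uint64_t data;   /* the 64 data bits                        */
        uint8_t  check;  /* 8 ECC bits that ride on the extra lanes */
    } ecc_word;

    static ecc_word ecc_encode(uint64_t data)
    {
        /* Scatter data bits into Hamming positions 1..71, skipping the
         * power-of-two positions reserved for check bits. */
        uint8_t code[72] = {0};
        int pos = 1;
        for (int i = 0; i < 64; i++) {
            while ((pos & (pos - 1)) == 0)   /* skip powers of two */
                pos++;
            code[pos++] = (uint8_t)((data >> i) & 1);
        }

        uint8_t check = 0;

        /* Check bit p is the parity of every position whose index has
         * bit p set. */
        for (int p = 0; p < 7; p++) {
            uint8_t parity = 0;
            for (int j = 1; j <= 71; j++)
                if (j & (1 << p))
                    parity ^= code[j];
            check |= (uint8_t)(parity << p);
        }

        /* Overall parity over data and check bits adds double-error
         * detection on top of single-error correction. */
        uint8_t overall = 0;
        for (int j = 1; j <= 71; j++)
            overall ^= code[j];
        for (int p = 0; p < 7; p++)
            overall ^= (uint8_t)((check >> p) & 1);
        check |= (uint8_t)(overall << 7);

        return (ecc_word){ .data = data, .check = check };
    }

    int main(void)
    {
        ecc_word w = ecc_encode(0x0123456789abcdefULL);
        printf("data=%016llx ecc=%02x\n",
               (unsigned long long)w.data, (unsigned)w.check);
        return 0;
    }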

That said, this is tangential to whether the ECC DIMMs themselves run at lower MT/sec ratings with higher latencies, which was the original discussion. The ECC DIMM is simply memory. It has an extra IC and a wider data bus to accommodate that IC in parallel. The chips run at the same MT/sec as those on a non-ECC DIMM. The signals reach the CPU at the same time on both ECC DIMMs and non-ECC DIMMs, so latencies are the same (the ECC verification does use an extra cycle in the CPU, but the cache hides this). There are simply more data lanes with ECC DIMMs due to the additional parallelism. This means the ECC DIMM has more memory bandwidth, but that additional bandwidth is consumed by the ECC bytes, so you never see it in benchmarks.
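
As a quick back-of-the-envelope check of the bandwidth point (DDR4-3200 used purely as an assumed example rate):

    #include <stdio.h>

    /* Same transfer rate, wider bus: the extra 8 lanes on an ECC DIMM
     * add raw bandwidth, but it is spent entirely on the ECC bytes. */
    int main(void)
    {
        const double mt_per_s   = 3200e6;              /* assumed DDR4-3200   */
        const double non_ecc_bw = 64 * mt_per_s / 8e9; /* GB/s, 64 data lanes */
        const double ecc_bw     = 72 * mt_per_s / 8e9; /* GB/s, 64 + 8 lanes  */

        printf("non-ECC: %.1f GB/s of user data\n", non_ecc_bw);
        printf("ECC:     %.1f GB/s raw, of which %.1f GB/s is user data\n",
               ecc_bw, non_ecc_bw);
        printf("the extra %.1f GB/s carries the ECC bytes\n",
               ecc_bw - non_ecc_bw);
        return 0;
    }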



> You assume that the ECC is not already calculated when the data is in the cache (and the cache line is marked dirty).

That was the case on the systems I worked with. Integrating it between the cache and the memory controller is a great idea though, and it makes sense where you've described it.

> If you do notice it, your cache hit rate is close to 0 and your CPU is effectively running around 50MHz due to pipeline stalls.

Where memory latency hurts for us is in ISRs and context switches. The hit rate is temporarily very low, and as you mentioned, IPC suffers greatly.


> Where memory latency hurts for us is in ISRs and context switches. The hit rate is temporarily very low, and as you mentioned, IPC suffers greatly.

While that is true, it is infrequent, and having those memory accesses take 151 cycles instead of 150 is not going to make much difference. Note that those are ballpark figures.


For DDR4 it's 17 vs 16 memory cycles for the data burst phase.


If you measure memory access time in CPU cycles, you would see that 150 cycles is the ballpark. 16 cycles would be closer to L2 cache.
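
For a sense of scale, converting with an assumed 3 GHz core clock (all figures rough):

    #include <stdio.h>

    /* Rough ns <-> core-cycle conversion showing why ~150 cycles is
     * DRAM territory while ~16 cycles is L2 territory. The 3 GHz
     * clock and both latencies are assumed ballpark values. */
    int main(void)
    {
        const double core_ghz = 3.0;
        const double dram_ns  = 50.0;  /* assumed DRAM access time */
        const double l2_ns    = 5.0;   /* assumed L2 hit latency   */

        printf("DRAM: %.0f ns -> ~%.0f core cycles\n", dram_ns, dram_ns * core_ghz);
        printf("L2:   %.0f ns -> ~%.0f core cycles\n", l2_ns,   l2_ns * core_ghz);
        return 0;
    }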



