marcan_42 4 years ago

Don't forget it's not just instruction sets; Intel is the reason we don't have ECC RAM on desktops. Every other high density storage technology has used error correction for a decade or two, but we're still sitting here pretending we can have 512 billion bits of perfect memory sitting around that will never go wrong, because Intel fuse it off on desktop chips. I guess only servers need to be reliable.

AMD supports ECC on their consumer chips, but without Intel support it's never taken off and some motherboards don't support it, or if they do it's not clear in the documentation. I do use ECC RAM on my Threadripper machine and it does work, but I had to look for third party info on whether it would and dig around DMI and EDAC info to convince myself it was really on. It also makes it safer to overclock RAM since you get warnings when you're pushing things too far, before outright failures. And it helps with Rowhammer mitigation.
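
For the EDAC half of that check, a minimal sketch of reading the Linux EDAC counters from sysfs could look like the following. It assumes a Linux kernel with an EDAC driver loaded for the platform; the function name `read_edac_counts` is made up for illustration. It does not cover the DMI side (for example the "Error Correction Type" field that `dmidecode` reports). Note that a registered controller is a strong hint, not proof, that ECC is active.

```python
from pathlib import Path

# Default location of the Linux EDAC memory-controller entries in sysfs.
EDAC_MC_ROOT = Path("/sys/devices/system/edac/mc")


def _read_attr(mc_dir, attr, default):
    """Read one sysfs attribute file, falling back to a default if absent."""
    f = mc_dir / attr
    return f.read_text().strip() if f.is_file() else default


def read_edac_counts(root=EDAC_MC_ROOT):
    """Return {controller: {"name", "ce", "ue"}} for each mcN under root.

    An empty dict means no EDAC memory controller is registered, which
    usually means no EDAC driver is loaded for this platform.
    """
    root = Path(root)
    result = {}
    if not root.is_dir():
        return result
    for mc_dir in sorted(root.glob("mc[0-9]*")):
        result[mc_dir.name] = {
            "name": _read_attr(mc_dir, "mc_name", "unknown"),
            "ce": int(_read_attr(mc_dir, "ce_count", "0")),  # corrected errors
            "ue": int(_read_attr(mc_dir, "ue_count", "0")),  # uncorrected errors
        }
    return result


if __name__ == "__main__":
    counts = read_edac_counts()
    if not counts:
        print("No EDAC memory controllers found")
    for mc, info in counts.items():
        print(f"{mc} ({info['name']}): corrected={info['ce']} uncorrected={info['ue']}")
```

A corrected-error count that climbs while you tighten RAM timings is the kind of early warning described above.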

Apple M1s don't do ECC in the memory controller as far as I can tell, but at least they have a good excuse: you can't sensibly do ECC with 16-bit LPDDR RAM channels. There's no such excuse for 64/72-bit DIMM modules. I do hope we work out a way to make ECC available on mobile/LPDDR architectures in the future, though. Probably with something like in-RAM-die ECC (which for all I know might already be a thing on M1s; we don't have all the details).
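
One way to see why narrow channels are awkward is to count the check bits a code would need. The sketch below assumes a classic Hamming-style SECDED code applied per channel-width word; real controllers may use a different code or spread it across a burst, so treat the numbers as illustrative only.

```python
def secded_check_bits(data_bits):
    """Check bits for a Hamming SECDED code protecting data_bits bits."""
    r = 0
    # Single-error correction needs 2^r >= data_bits + r + 1.
    while (1 << r) < data_bits + r + 1:
        r += 1
    # One extra overall parity bit adds double-error detection.
    return r + 1


for width in (16, 32, 64):
    check = secded_check_bits(width)
    print(f"{width:>2} data bits: {check} check bits, {check / width:.1%} overhead")
```

Under that assumption, a 64-bit word needs 8 check bits (the familiar 72-bit DIMM, 12.5% extra), while a 16-bit word needs 6 (37.5% extra), so the relative cost of ECC grows quickly as the word gets narrower.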

formerly_proven 4 years ago

> Don't forget it's not just instruction sets; Intel is the reason we don't have ECC RAM on desktops. Every other high density storage technology has used error correction for a decade or two, but we're still sitting here pretending we can have 512 billion bits of perfect memory sitting around that will never go wrong, because Intel fuse it off on desktop chips. I guess only servers need to be reliable.

And not just storage - the main memory bus is the only data bus in a modern computer that doesn't use some form of error correction or detection. Even USB 1.0 has a checksum. So everywhere else we use ECC/FEC or at least a checksum, be it PCIe, SATA, USB; all storage devices, as you mentioned, rely heavily on FEC, and all CPU caches use ECC. Except the main memory and its bus, which all data moves through (eventually). Duh.
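
To make the "at least a checksum" point concrete, here is a small sketch of link-level error detection. It uses CRC-32 from Python's `zlib` as a stand-in (USB itself uses shorter CRC5/CRC16 fields), and the helper names `frame`, `check` and `flip_bit` are invented for this example.

```python
import zlib


def frame(payload: bytes) -> bytes:
    """Append a CRC-32 of the payload (4 bytes, big-endian)."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def check(framed: bytes) -> bool:
    """Return True if the trailing CRC-32 matches the payload."""
    payload, crc = framed[:-4], framed[-4:]
    return zlib.crc32(payload).to_bytes(4, "big") == crc


def flip_bit(data: bytes, bit: int) -> bytes:
    """Return a copy of data with one bit inverted, as a noisy link might."""
    corrupted = bytearray(data)
    corrupted[bit // 8] ^= 1 << (bit % 8)
    return bytes(corrupted)


good = frame(b"cache line contents")
bad = flip_bit(good, 5)

print(check(good))  # intact frame passes the check
print(check(bad))   # single flipped bit is detected
```

A receiver that sees a failed check can ask for the data again. A plain non-ECC DRAM bus has no equivalent step, so a flipped bit is simply delivered as data.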

  • marcan_42 4 years ago

    Yup. PCIe will practically run over wet string, thanks to error detection and retransmits and other reasons, but try having a marginal DRAM bus and see how much fun that is...

    • vanderZwan 4 years ago

      Could be a fun way to test and demonstrate robustness of various parts of computer hardware, actually. It's already been done with ADSL for example:

      [0] https://www.revk.uk/2017/12/its-official-adsl-works-over-wet...

      • marcan_42 4 years ago

        My comment was actually a self quote from a talk I gave about PS4 hacking where I described that PCIe will happily run over bare soldered wires without much care for signal integrity, at least over short distances (unlike what you might expect of a high-speed bus like that) :)

        Not literally wet string, but definitely low tech. ADSL is special though, not many technologies can literally run over wet string :-)

        • vanderZwan 4 years ago

          Well, then we could make it a scale of what the worst transmission medium is that two pieces of hardware can (sort of) communicate across :)

      • cameron_b 4 years ago

        off topic, but I commend the article if just for the conclusion

PragmaticPulp 4 years ago

> Intel is the reason we don't have ECC RAM on desktops.

Intel has offered ECC support in a lot of their low-end i3 parts for a long time. They’re popular for budget server builds for this reason.

The real reason people don’t use ECC is because they don’t like paying extra for consumer builds. That’s all. ECC requires more chips, more traces, and more expense. Consumers can’t tell if there’s a benefit, so they skip it.

> AMD supports ECC on their consumer chips, but without Intel support it's never taken off

You’re blaming Intel’s CPU lineup for people not using ECC RAM on their AMD builds?

Let’s be honest: People aren’t interested in ECC RAM for the average build. I use ECC in my servers and workstations, but I also accept that I’m not the norm.

  • marcan_42 4 years ago

    > You’re blaming Intel’s CPU lineup for people not using ECC RAM on their AMD builds?

    I'm blaming the decade+ of Intel dominance for killing any chance of ECC becoming popular in non-server environments, just as RAM density was reaching the point where it is absolutely essential for reliability.

    > The real reason people don’t use ECC is because they don’t like paying extra for consumer builds. That’s all. ECC requires more chips, more traces, and more expense. Consumers can’t tell if there’s a benefit, so they skip it.

    Motherboard traces are ~free and the feature is in the die already, so it requires zero expense to offer it to consumers. Intel chose to artificially cripple their chips to remove that option. Yes, I know there are a few oddball lines where they did offer it. They should have offered it across the board from the get go, seeing as they were selling the same dies with ECC for workstation use.

    • Karunamon 4 years ago

      ECC memory on the other hand is always going to be more expensive.

      • marcan_42 4 years ago

        Indeed, which is why it should be an option.

        OTOH, it shouldn't be significantly more expensive. It should be ~9/8 the cost of regular memory. It's just one extra chip for every 8. Nothing more.

        • my123 4 years ago

          in-band ECC is also a thing. In that scenario, you give up some capacity for the ECC bits but stay with the same DRAM config as before.

          (in-band ECC is present on Elkhart Lake Atoms and on Tegra Xavier for example)

        • namibj 4 years ago

          Actually less, because you only need the additional memory chip and the associated trace layout, not any additional PCB manufacturing cost (beyond the minuscule yield impact of the additional traces) and no significant added distribution cost (packaging, shipping weight, etc.).

    • PragmaticPulp 4 years ago

      > I'm blaming the decade+ of Intel dominance for killing any chance of ECC becoming popular in non-server environments

      I disagree. AMD has offered ECC support for a while and it’s not catching on. It doesn’t make sense to blame this on Intel.

      > Motherboard traces are ~free and the feature is in the die already, so it requires zero expense to offer it to consumers.

      Yet it’s missing from a substantial number of AMD boards, despite being supported. You have to specifically confirm the motherboard added those traces before buying it.

      Traces aren’t entirely free. Modern boards are densely packed and manufacturers aren’t interested in spending extra time on routing for a feature that consumers aren’t interested in anyway.

      • marcan_42 4 years ago

        > Traces aren’t entirely free. Modern boards are densely packed and manufacturers aren’t interested in spending extra time on routing for a feature that consumers aren’t interested in anyway.

        Or they just don't care because it's not already popular and unbuffered ECC RAM isn't even particularly widely available. The delta design cost of routing another 8 data lines per DIMM channel is tiny. Especially on ATX boards and other larger formats. I could see some crazy packed mini-ITX layout where this might be a bit harder, but definitely not in the normal cases.

        (I've routed a rather dense 4-layer BGA credit card sized board; not exactly a motherboard, but I do have a bit of experience with this subject. It was definitely denser than a typical ATX board per layer.)

        • simoncion 4 years ago

          > ...unbuffered ECC RAM isn't even particularly widely available.

          Every time I've gone looking for unbuffered ECC RAM over the past three or five years, I've had no trouble finding it. In my experience, the trick is to shop for "server" RAM, rather than "desktop" RAM.

          Are there speeds or capacities here that you'd particularly like to see that aren't present? <https://nemixram.com/server-memory/ecc-udimm/>

          • marcan_42 4 years ago

            It's available, but not nearly as widely, and even less so at reasonable prices. Last time I had to buy ECC RAM over here in Japan, I had to go to a niche webshop to get a decent price on the capacity I was interested in. For every other PC part I'd just use Amazon and get it delivered next day, usually at the market lowest price or almost.

            • simoncion 4 years ago

              > Last time I had to buy ECC RAM over here in Japan, I had to go to a niche webshop to get a decent price on the capacity I was interested in.

              I've been quite satisfied with the three orders that I've placed with Nemix. I see no indication that they _don't_ ship to Japan, and indications that they _do_ ship internationally... so consider purchasing from them next time you have a need for such memory.

              Or, hell, I'd be _shocked_ if there wasn't a company in JP or KR or CN that also does what Nemix does (that is, yank RAM from decommissioned servers, test it, and sell the stuff that's solid).

              And, with Ryzen (and Ryzen Threadripper) becoming ever-more popular, I would expect ECC RAM to continue to drop in price when compared to non-ECC RAM. (But, let's be real here, when you spread the price out over five or ten years, it's _totally_ worth it to have RAM you can rely on.)

      • Avamander 4 years ago

        > I disagree. AMD has offered ECC support for a while and it’s not catching on. It doesn’t make sense to blame this on Intel.

        It does make sense. Imagine if only 50% of web browsers supported a feature, would you implement it in your website?

        Point being, the low market share of ECC-compatible setups means that the market demand for ECC is low, which means that the selection is low, which means the prices are higher than they could be. So yes, absolutely Intel has contributed massively to the issue.

  • formerly_proven 4 years ago

    > Intel has offered ECC support in a lot of their low-end i3 parts for a long time. They’re popular for budget server builds for this reason.

    Intel removed ECC support in the 10th gen so you have to go for Xeon nowadays.

    • jeffbee 4 years ago

      With DDR5 you can have (a form of) ECC on all current 12th-generation Core CPUs. That is, if you were able to find DDR5 DIMMs on the market, which you currently cannot.

      • temac 4 years ago

        Not really: internal ECC in DDR5 is an implementation detail that is not exposed on the bus and does not give you the reliability and monitoring capability that real ECC terminated in the memory controller does. It is only there because the error rate would be absolutely horrific without it, so you need internal ECC to get to basically the same point you were at without ECC on DDR4.

        • marcan_42 4 years ago

          I expect in-chip ECC should still be a significant improvement for RAM reliability (any ECC is going to be better than none, even if your memory array is significantly worse; I've had my share of RAM with weak bits that would absolutely be fixed with that), but it's not going to help with bus errors and isn't nearly as transparent to system software as end to end ECC is.

          • temac 4 years ago

            In theory, some weak ECC on top of particularly unreliable storage can still be less reliable than far more reliable storage that does not employ any ECC, but I also suspect this won't be the case here. However, if the target reliability is only, say, 2 or 3 times what you had with DDR4 without ECC, it is still completely unsuitable for serious applications. And really we should find another name for the internal ECC of DDR5, because the services it provides are completely different from real ECC.

            • jeffbee 4 years ago

              In the industry we already have terms of art that are better than just “ECC”. Normally we speak of EDAC, error detection and correction. We refer to them by their capabilities, such as SECDED, ChipKill, or whatever.

  • makomk 4 years ago

    As far as I can tell, Intel only offered ECC on a small handful of i3 parts that mainly seemed to be marketed to NAS manufacturers, likely because they were otherwise giving up that market entirely to competitors like AMD. They really don't seem to be interested in offering it as an option on consumer desktops.

  • rasz 4 years ago

    > You’re blaming Intel’s CPU lineup for people not using ECC RAM on their AMD builds?

    Yes. ECC was standard on the first IBM PC 5150, on the PS/2 line, on pretty much all 286 clones, etc. Intel killed ECC on the desktop when moving to Pentium; prior to that, all of their chipset products (486) supported it. 1995 artificial market segmentation shenanigans: https://www.pctechguide.com/chipsets/intels-triton-chipsets-...

  • temac 4 years ago

    They did support ECC on some i3 simply because they did not bother to double the sku, however IIRC you need the server / WS S chipset to enable it. At which point just put an entry level Xeon on that.

    In absolute terms, the cost of ECC everywhere would not be substantially greater than the prices we have now without it. The current ECC prices are high because it is not broadly used, and not really the inverse. Consumers skip it because it is fucking hard to get ECC-enabled parts for S SKUs (or H / U) in the current situation, while there are plenty of non-ECC vendors and resellers, and something like at least 3 times the number of SKUs. And consumers have not been informed they are buying unreliable shit.

ClumsyPilot 4 years ago

"pretending we can have 512 billion bits of perfect memory sitting around that will never go wrong, because Intel fuse it off on desktop chips"

I think computers are now so important to our life, we need to start regulating them like we do cars.

Start seriously slapping companies that deliberately or negligently release equipment with obsolete kernels and security holes, mandate ECC like we mandate ABS, mandate part availability for 10 years like we do with cars, etc.

Every day we let this slide, thousands of people lose precious data and the number of 'smart' toasters mining crypto increases.

  • kmeisthax 4 years ago

    My main worry with this sort of thing, is that if we start mandating legal liability, and security becomes a compliance line-item, then companies are going to start locking down everything they ship so they have a legal defense in court. The argument's going to be, "if we are liable for shipping insecure desktops then you shouldn't be allowed to install Linux onto them and then sue us when you get hacked".

    Think about how many laptops ship with Wi-Fi whitelists with the excuse of "FCC certification". It doesn't matter that the FCC doesn't actually prohibit users from swapping out Wi-Fi cards; manufacturers will do it anyway.

    • Bolkan 4 years ago

      Just add a physical seal on the product like other dumb electronics do.

rbobby 4 years ago

> AMD supports ECC on their consumer chips

And now the next desktop consumer upgrade I purchase will be AMD and will have ECC (well... unless it's way more expensive).

philistine 4 years ago

Since the Mac Pro has ECC RAM, I would expect a future Apple Silicon Mac Pro to offer it as well with its desktop M1 chip, with the functionality trickling down the line in years to come.

freemint 4 years ago

DDR5 is a form of ECC and DDR5 is only supported on Intel so far.

  • wtallis 4 years ago

    The DDR5 memory bus used by Intel's latest consumer processors does not have ECC enabled. The memory dies themselves have some internal ECC that is not exposed to the host system and is not related to the fact that they use a DDR5 interface; all state of the art DRAM now needs on-die ECC due to the high density.

    • freemint 4 years ago

      So what? It has on-die ECC, which allows it to recover from radiation-induced bitflips and stuff. Maybe, to compensate for density, the error correction is a bit more busy and can compensate for fewer errors per minute, but 0.5 ECC instead of full ECC on DDR4 (no random errors due to density) is still an improvement for most people in terms of immunity to unlucky cosmic rays.

bluedino 4 years ago

> Don't forget it's not just instruction sets; Intel is the reason we don't have ECC RAM on desktops.

Of course we do: workstations.

Non-ECC is cheaper; that's why ECC isn't everywhere.

  • marcan_42 4 years ago

    Intel's lower end workstation chips are the same silicon, and thus the same manufacturing cost, as their desktop chips. They just artificially disable features like ECC for product segmentation. It is unconscionable that something as essential as ECC is crippled out of the consumer line-up.

    • bluedino 4 years ago

      Except that the memory chips and motherboards also need to support ECC

      • marcan_42 4 years ago

        ECC costs $0 to support in motherboards (8 extra traces per DIMM slot; traces are free). Memory is where the consumer gets to choose whether to spend extra on ECC or not. There is absolutely no reason why consumers should be forced to pay extra for a CPU to get ECC when they are literally getting the same piece of silicon.

        • jeffbee 4 years ago

          It's odd how indifferent you are being about the energy costs of ECC. Memory now dominates the energy story of many systems. Filling an x86 cache line from DDR4 costs 1000x as much energy as a double-precision multiplication operation. ECC memory costs 12.5% more energy. That's a big, big difference.

          • marcan_42 4 years ago

            I'm not saying everyone should use ECC, I'm saying ECC should be an option for everyone.