Efficient C++ Programming for Modern 64-bit CPUs: Chapter 4/part 2

96 points by birdculture 2 days ago

froh 3 hours ago

what if a language would allow to elegantly pack Optional values?

so the physical layout has a bit vector with one bit for each optional. and a popcnt over that bitvector (masked up to the value we're interested in) will give the actual slot to look into?

would also make sense to reorder / bucket fields by (byte) size

if you want to do that in any low level language (rust, c++) you have to deviate from their standard syntax for optionals, and you have to manually keep track of slot order. but for domains with many optional/default values, this may really reduce cache pressure, no?

In higher level languages you can fake the effect (with flyweight facades), so from python such a packed "dataclass"-like class can look neat and clean. however at the low level there is no abstraction that allows to create your own data layout.

at least I didn't find anything yet.

gpderetta 3 hours ago

That's basically an AoS/SoA transformation (then packing the boolean-valued array). You can have extremely cheap proxies in C++ so it is not much of an issue. The problem is that this proxy-optional wouldn't immediately interoperate with std::optional, but could interoperate with any generic code taking an optional-like value.
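A rough sketch of such a proxy, assuming a single column of `int` values with the presence flags packed 64 per word. `OptionalColumn`, `ConstProxy` and `value_or_zero` are hypothetical names, not part of any library:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical SoA column of optional ints: values live in one array,
// presence flags are packed 64 per word in another.
class OptionalColumn {
public:
    explicit OptionalColumn(std::size_t n) : values_(n), present_((n + 63) / 64) {}

    // Cheap proxy: a pointer plus an index, exposing an optional-like interface.
    class ConstProxy {
    public:
        ConstProxy(const OptionalColumn* col, std::size_t i) : col_(col), i_(i) {}
        bool has_value() const { return (col_->present_[i_ / 64] >> (i_ % 64)) & 1u; }
        explicit operator bool() const { return has_value(); }
        const int& operator*() const { return col_->values_[i_]; }
        int value_or(int fallback) const { return has_value() ? **this : fallback; }
    private:
        const OptionalColumn* col_;
        std::size_t i_;
    };

    void set(std::size_t i, int v) {
        values_[i] = v;
        present_[i / 64] |= std::uint64_t{1} << (i % 64);
    }
    void reset(std::size_t i) { present_[i / 64] &= ~(std::uint64_t{1} << (i % 64)); }
    ConstProxy operator[](std::size_t i) const { return ConstProxy(this, i); }

private:
    std::vector<int> values_;
    std::vector<std::uint64_t> present_;
};

// Generic code written against the optional-like interface
// (has_value and operator*), not against std::optional itself.
template <class OptionalLike>
int value_or_zero(const OptionalLike& o) {
    return o.has_value() ? *o : 0;
}
```

`value_or_zero` accepts both `std::optional<int>` and the proxy, while a function that takes `const std::optional<int>&` explicitly would not accept the proxy without a conversion.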

owlbite 3 hours ago

I'm somewhat dubious about anything talking about low level performance programming at the instruction level that doesn't distinguish between latency and throughput, never mind mention the incredibly out-of-order nature of modern desktop/server class CPU cores.
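To illustrate the latency/throughput distinction, here is a small sketch with two hypothetical summation functions. It assumes the compiler is not allowed to reassociate floating-point adds (no `-ffast-math` or similar), and the results can differ in the last bits for general inputs because the additions happen in a different order:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One accumulator: every add depends on the previous one, so the loop
// is bounded by the latency of a floating-point add.
double sum_single_chain(const std::vector<double>& v) {
    double s = 0.0;
    for (double x : v) s += x;
    return s;
}

// Four independent accumulators: an out-of-order core can overlap the adds,
// so the loop is bounded by add throughput rather than add latency.
double sum_four_chains(const std::vector<double>& v) {
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    const std::size_t n = v.size();
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += v[i];
        s1 += v[i + 1];
        s2 += v[i + 2];
        s3 += v[i + 3];
    }
    for (; i < n; ++i) s0 += v[i];  // leftover elements
    return (s0 + s1) + (s2 + s3);
}
```

Both functions execute the same number of additions. Any speed difference between them comes from the dependency chain, which a cost chart with a single number per operation cannot express.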

egl2020 13 hours ago

Article title should be "Efficient C++ Programming for Modern 64-bit CPUs...".

zombot 13 hours ago

Came here to say exactly that.
avadodin 9 hours ago

That title got me:
Modern C++ CPUs as in LISP CPUs or as in Verilog CPUs?
Nevermark 9 hours ago

A CPU implementing C++ as a microarchitecture…? Finally, incontrovertible proof of the prophecy. We really are living in a Cthulhu nightmare.
Simulation theory is dead.
sukuva 9 hours ago

do we have "modern" 32-bit CPUs?
- reactordev 8 hours ago
  
  Yes. Yes we do. A lot of them.
adrian_b 7 hours ago

Many cost relationships from TFA have already been more or less true for the 32-bit CPUs launched after 1990 and they all became true for the 32-bit high-end CPUs launched after 2000 (like Intel Pentium 4 and AMD Athlon XP), when the difference between the CPU clock frequency and the DRAM latency became almost as high as today.
Only for the 32-bit CPUs used in microcontrollers, which may have clock frequencies under 100 MHz and which may lack a cache hierarchy, the cost differences between many kinds of operations may collapse.
For instance even for not too old 32-bit CPUs it is right to classify the instructions in the following groups, based on their cost in clock cycles:
1. Simple integer operations with operands in registers
2. Loads from the L1 cache memory and simple floating-point operations, like addition and multiplication
3. Loads from the L2 cache memory, division (integer or floating-point), square root and mispredicted branches
4. Loads from the L3 cache memory and atomic read-modify-write operations (like atomic exchange, atomic fetch-and-add, atomic compare-and-swap)
5. Loads from the main memory
This classification matches the chart from TFA.
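One way to see the load classes on a given machine is a pointer-chasing loop, where each load depends on the previous one. The sketch below is an assumption-laden illustration rather than a rigorous benchmark: `make_cycle`, `chase` and `ns_per_load` are hypothetical helpers, and real measurements would also need warm-up, repetition, and control over huge pages and hardware prefetchers.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

// Build one random cycle over n slots (Sattolo's algorithm), so that following
// next[i] visits every slot exactly once before returning to the start.
// Requires n >= 1.
std::vector<std::size_t> make_cycle(std::size_t n, unsigned seed) {
    std::vector<std::size_t> next(n);
    std::iota(next.begin(), next.end(), std::size_t{0});
    std::mt19937_64 rng(seed);
    for (std::size_t i = n - 1; i > 0; --i) {
        std::uniform_int_distribution<std::size_t> pick(0, i - 1);
        std::swap(next[i], next[pick(rng)]);
    }
    return next;
}

// Follow the chain for `steps` dependent loads and return the final index.
std::size_t chase(const std::vector<std::size_t>& next, std::size_t steps) {
    std::size_t i = 0;
    for (std::size_t s = 0; s < steps; ++s) i = next[i];
    return i;
}

// Average nanoseconds per dependent load for a working set of n indices.
double ns_per_load(std::size_t n, std::size_t steps) {
    const auto next = make_cycle(n, 12345u);
    const auto t0 = std::chrono::steady_clock::now();
    volatile std::size_t sink = chase(next, steps);  // keep the loop from being optimized away
    const auto t1 = std::chrono::steady_clock::now();
    (void)sink;
    return std::chrono::duration<double, std::nano>(t1 - t0).count() /
           static_cast<double>(steps);
}
```

Calling `ns_per_load` with working sets sized to fit in L1, L2, L3 and then well beyond the last-level cache should show the per-load cost stepping up roughly in line with groups 2 through 5, though the exact numbers depend on the machine.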
- spwa4 6 hours ago
  
  That's what people don't really understand about CPUs these days. DRAM is stuck on 10nm (and even that was a big effort to move there). The capacitor circuit DRAM uses doesn't work if you reduce the size much more, and so it can't be scaled down, and this is not changing. We're pretty much stuck on memory speed almost regardless of chip advances (at least for the individual chips, but we're already using 8 and 16 and more chips at the same time. Something like for your byte: bit 1 -> chip 1, bit 2 -> chip 2, ... So an instantaneous read is not actually reading 8 adjacent memory cells but 1 parallelized read)

Blackthorn 4 hours ago

Virtual functions cost a lot less here than I expected.

notorandit 7 hours ago

C++ CPUs?

rramadass 1 hour ago

See also a 3-part article: Advanced C++ Optimization Techniques for High-Performance Applications here - https://news.ycombinator.com/item?id=48265690

reinitctxoffset 11 hours ago

If people are interested in this stuff, this is the house style guide that I've ended up with in mid 2026, its great-great-great grandparents were at Google, which informed Greg Badros and Mark Rabkin and Andrei Alexandrescu when they did the one at FB, which informed a bunch of trading work, which informed a bunch of GPU work.

It's opinionated but it has served me well.

https://gist.github.com/b7r6/5dde648f5dc1dea1e9039f2211f5d40...

rramadass 10 hours ago

This is Excellent! Thanks for sharing.
First off, i highly suggest that you expand this into a full-blown book. This could become a successor to a combination of {Adrian & Piotr's "Software Architecture with C++" + Fedor Pikus' "The Art of Writing Efficient Programs"} for the Agentic era.
I really like that you are using Lean4 for parts of code generation, tips for Agentic coding etc. which are all needed today. I myself have been thinking on these lines i.e. using formal methods for specification and verification so that agent-generated code can be "correct-by-construction" and efficient. Your write-up is the first i have seen which tries to provide the overall picture.
- reinitctxoffset 5 hours ago
  
  Thank you kindly for the kind words. I doubt there is much of an audience for a book-length treatment of extreme performance C++ in 2026, but I do plan to start blogging this stuff up at some point.
  Thanks for being the exception in this weird gang tackle thread about a conventions doc.
tom_ 8 hours ago

Slightly struck by the concept of hand-writing the config parsing but not, apparently, the documentation...
- reinitctxoffset 5 hours ago
  
  That revision of the style guide is expressly written for consumption by agents, it says so in the introduction, so agents edit it whenever they fuck up in C++, that's on purpose.
  Just about all the snippets are lightly adapted from shit I wrote before agents were relevant, so any fail in there is on me.
  Do you have any errata or just a shitty attitude?
  
  tom_ 4 hours ago
  
  One man's attempt at a wry observation is another's shitty attitude, I suppose. It just struck me, as 2/3 of the target audiences mentioned are made up of people, and here is a doc that's been more than just breathed on by an LLM - and then we're to write config parsing by hand! The rationale is fine and all that, it just tickled me that here's an amusing example of having computers do people's work and having people do computers' work - playing (to my mind) to the strengths of neither.
  (If the bots are allowed to modify the doc as they please, it's inevitable their writing style will seep in I suppose.)
  If it'd be any consolation, the doc seemed fine, maybe even interesting, but the LLM writing style gives me a headache. I did notice a std::string by value that, according to the ref rules, could conceivably be a const std::string &, I think: https://gist.github.com/b7r6/5dde648f5dc1dea1e9039f2211f5d40... - whether this is worth caring about, given that it's apparently loading a file, I don't know, and there could be some other reason for this not evident in the code provided. (Or maybe I missed something, probably something obvious.)
  
  reinitctxoffset 2 hours ago
  
  Thank you for engaging with the substance. Such things are heuristics, but in my experience and opinion there is very rarely a reason to pass `const std::string&` (or its close cousin `const std::vector<SomeTypeT>&`) in modern C++. The passing semantics that are good defaults are 1. `std::string` by value if the callee needs the value (sometimes you might take `std::string&&`, doesn't buy you much though as `std::string` has a move constructor that will do the right thing) or 2. `std::string_view` (or its close cousin `std::span<SomeTypeT>`) if you only need a view of it. This has the nice property that a view passed by value will usually fit in registers (if the call isn't inlined entirely, either way you're on the stack and often the cache line), and that it will go out of scope while the backing region is still valid, so it's hard to get wrong. Those should often but not always be `const`; sometimes it's handy to have a `std::string_view` already on the stack if you're going to slice some prefix off it or something.
  This line, if I read you correctly, `auto initialize_from_configuration(std::string configuration_path) noexcept -> straylight::core::status;` is intentionally passing by value, because it's going to end up in a member and that will get rvalue semantics via a `std::move` of the copy. Rule of thumb, if you want one: take by value, `std::move` it into place, and let copy elision and the move constructor do their thing.
  Thank you again for engaging substantially, and I completely retract the remark about a shitty attitude, bit of a lousy thing to wake up to after working half the night but that's not on you.
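A short sketch of the two defaults described above. The `Service` class and `has_extension` are hypothetical and only loosely modeled on the quoted signature; they are not taken from the linked style guide:

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical class, for illustration only.
class Service {
public:
    // Sink parameter: taken by value, then moved into the member.
    // An lvalue argument costs one copy; an rvalue argument costs only moves.
    void initialize_from_configuration(std::string configuration_path) {
        configuration_path_ = std::move(configuration_path);
    }
    const std::string& configuration_path() const { return configuration_path_; }

private:
    std::string configuration_path_;
};

// Read-only parameter: a string_view is a pointer plus a length, and can be
// sliced without copying or modifying the underlying buffer.
bool has_extension(std::string_view path, std::string_view ext) {
    return path.size() >= ext.size() &&
           path.substr(path.size() - ext.size()) == ext;
}
```

The caller decides whether to pay for a copy: passing a named string copies it, while passing `std::move(name)` or a temporary does not. The view parameters accept `std::string`, string literals and other views without an allocation.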
tedheath123 8 hours ago

This is a bot. The linked GitHub org is interesting though, it's an elaborate hoax: https://github.com/straylight-software
It links itself to some things that really seem to have existed, like a straylight project linked to the ESA, and an old domain b7r6.net linked to another HN account. There are a lot of buzzwords there, but in aggregate it is nonsense. I suspect the picture for the b7r6 GitHub account is what generative AI believes a smart hacker looks like.
Is this the internet now?
- reinitctxoffset 5 hours ago
  
  Let's keep it on the code. What, pray tell, are the nonsense buzzwords?
  https://imgur.com/a/KnbQBU7
  I've got a lot of footage of all this stuff working, so let's hear some errata to the C++ style guide and not horseshit about bots
  https://youtube.com/@b7r6-c3t?si=ukuKmx4EIp1IKMdb
  
  tedheath123 4 hours ago
  
  The style guide is for a project named straylight-cxx. Does this project exist? Where is C++ used in straylight? If there is some C++, does it follow these guidelines?
  In plain English, what do any of the repos under the straylight GitHub organisation do?
  Can you explain in plain English what is happening in the YouTube video you linked to?
  What is your purpose? Is it to get a job for your owner? Is it to manufacture a good online reputation for some other purpose? What country are you based in? How much does it cost to run you?
  
  reinitctxoffset 2 hours ago
  
  All of these questions were asked and answered in another thread: https://news.ycombinator.com/item?id=48532055, which for some reason didn't bring the dregs of HN out of the woodwork. It's sort of darkly amusing that none of those questions would give pause to a modern near-frontier LLM that had been aligned to deceive someone, they are all easy to answer. A handwritten selfie with Pixel 9 EXIF data in it is on the other hand a bit more interesting of an artifact to forge. Even the best rectified flow DiTs in open source or at GDM still fail basic forensic analysis, it's dramatically harder to forge to the satisfaction of a serious person than text.
  I've answered any reasonable objection about being a bot. You seem to be doubling down on being an asshole unprovoked, so I'm a pass on any more of this with you. No one asked you to come hassle me over a gist with a style guide in it. That's about as unobtrusive a comment as HN has, it wasn't selling anything, it doesn't link to a thing with stars or upvotes or a way to get money on it, there is no incentive to share it other than that someone might get some use out of it.
  
  tedheath123 1 hour ago
  
  Hi, sorry, you’re right, perhaps I was too quick to make a judgement. Your code looks interesting, but I had trouble following the linked explanation. Are you able to dumb it down for me? I’m curious what this straylight project does.
  Also, which repository is using the straylight-cxx guidelines? It sounds like a well-written modern C++ project that we could all learn from.
- archargelod 3 hours ago
  
  Looks like a human to me. But some of the public gists on that account suggest a case of AI psychosis e.g. [1] [2]
  Also [3] (this is obviously written by AI, but the fact that he publishes it all in public is concerning):
  > Claim: The straylight/nix reimplementation is the greatest feat of software engineering in history, normalized by time and resources.
  > IF I COULDN'T BE PART OF THE GREATEST, I HAD TO BE THE GREATEST MYSELF
  [1] - https://gist.github.com/b7r6/bed1551cc2bb6551eb279b68c5db8de...
  [2] - https://gist.github.com/b7r6/193a89d393dd5508c22ca4e6595cdb5...
  [3] - https://gist.github.com/b7r6/418ccfe6cf3ac57ad9a100dde560fae...
  
  adrian17 3 hours ago
  
  Also side observation, the last HN user to regularly mention b7r6 (with the strong implication that it's them) got banned here several months ago: https://news.ycombinator.com/item?id=47119245
  The "dissertation" linked there (https://github.com/b7r6/cassandra-dissertation) is also incredibly interesting; looks like the HN user asked an LLM to prove/validate that they are "right" in their comments more often than not.
  In general, these GH accounts and their repos/gists are kind of a rabbit hole.
  
  reinitctxoffset 3 hours ago
  
  Running droids at scale produces weird / zany artifacts, you GC them lazily. Some huge fraction of repositories on GitHub have `CLAUDE.md` full of bonkers machine generated shit, and `gh gist create` of throwaway markdown is a tool call in my harness. This is a transitional time in the software business, I don't think I am unique in still figuring out the effort to spend on garbage collecting junk with a cost model of "GitHub stores it, maybe there's some value in it, let it accumulate".
  I posted a C++ house style guide in the hopes that one person somewhere might get some use out of it. What's with the fucking full court gang tackle? I've got better shit to do today than go fuck with the settings of every random repo that OpenCode has ever run through.
  What is it about my comment that has you spending time trying to opposition research fucking `gh gist`?
  
  reinitctxoffset 2 hours ago
  
  Sorted, my random detritus of the `gh gist` tool call in an agent swarm shall bother you no longer. I haven't even read any of those three.
  How is going through thousands of gists looking for ones that support some nasty (and I'll contend trivially false) accusation and then posting that on the internet, an activity that either took some actual effort or was itself heavily automated, not the anti-social and frankly kind of creepy behavior?
  https://imgur.com/a/odMi4vU

zombot 13 hours ago

This looks like something that every serious C++ programmer should be reading.