Every Byte Matters | Svelte Hacker News

noelwelsh 1 hour ago

The JVM is currently pretty bad for memory allocation. Every object (i.e. not a primitive) has a header that IIRC is 12 bytes. But there is good news in JVM land: this will be reduced to 8 bytes in the next JVM release, and Project Valhalla will give the tools to do away with headers entirely in some cases. Project Valhalla also has tools to manage off-heap memory, which is important in many cases.

The JVM is an odd place where it requires too much heap to compete with the AOT compiled languages, but its startup time is too slow compared to interpreted languages. I think these enhancements are essential to keep the platform relevant.

kakacik 1 hour ago

Most of real world use of Java platform has next to 0 concerns like those. Some more niche use case may benefit, good, but overall success map isn't changing anytime soon. Reasons for its long term success lie elsewhere.
- FartyMcFarter 50 minutes ago
  
  Android Java apps' memory consumption is definitely a relevant concern.
pron 1 hour ago

> Every object (i.e. not a primitive) has a header that IIRC is 12 bytes. But there is good news in JVM land: this will be reduced to 8 bytes in the next JVM release
Since JDK 25 it's already 64 bits with the `-XX:+UseCompactObjectHeaders` flag [1], but in JDK 27 it will be the default [2].
> where it requires too much heap to compete with the AOT compiled languages
Not to compete but to beat, and not too much, but the right amount. Low level languages are optimised for control, not performance (that control translates to better performance in smaller programs, and to worse performance in larger programs), and their particular constraints prevent them from enjoying certain important optimisations, especially those offered by JIT compilation and moving collectors, which remove some overheads that AOT compilers and free-list allocators incur. Their memory management is forced (by their constraints) to optimise for footprint rather than speed.
There are common misunderstandings about memory management and why moving collectors were created to reduce the CPU overheads of malloc/free, especially in large programs, in exchange for what is effectively free RAM. This is why moving collectors are chosen by the languages that are unconstrained enough to use them and have the resources to implement them (Java, .NET, V8). With the exception of Zig (and even there it requires some effort), it's hard for low level languages to use the basic optimisation that's behind moving collectors. I gave a talk about how moving collectors optimise memory management at the last Java One, and it should be available on YouTube soonish [3].
> but its startup time is too slow compared to interpreted languages
That hasn't been the case for some time. You are right, though, that startup/warmup time is worse than in AOT compiled languages, and that is the tradeoff of optimising JITs: reduce the overheads associated with AOT compilation in large program in exchange for warmup.
Both startup and warmup have already been improved thanks to Project Leyden's "AOT cache" [4], but it will never be as low as C.
In general, the tradeoff is between optimisations that help large programs vs optimisations that help small programs.
[1]: https://openjdk.org/jeps/519
[2]: https://openjdk.org/jeps/534
[3]: I can't reproduce the full talk (which goes into the maths of memory management) here but what happened with moving collectors was that until very recently (open source low-latency moving collectors are newer than ChatGPT), they required pauses and so weren't suitable for programs requiring low latencies. As a result, many developers either forgot or never learnt just how incredibly efficient moving collectors are. But the key is that because accessing RAM by necessity requires CPU, using CPU effectively captures RAM even it's not used by the program. Bringing the CPU and RAM usage into a good balance is more efficient than trying to minimise one or the other. This is also the reason why hardware (physical or virtual) is packaged within a very narrow band of RAM/core ratio.
[4]: https://www.youtube.com/watch

forinti 1 hour ago

So if you need speed, you just have to swallow your OO programmer's pride and put your data in arrays.

theandrewbailey 1 hour ago

Maybe someone can write an OO language where arrays of structs are automatically stored as structs of arrays.
mild /s
- Mizza 1 hour ago
  
  Are you talking about Zig's MultiArrayList?
  
  alex7o 57 minutes ago
  
  He is talking about jai the programing language from Jonathan Blow, which is quite cool but there is no way to access it.
- fp64 56 minutes ago
  
  Odin has some helpers, was one of the more interesting features I found, but never tried. Not sure if you want to consider Odin OO, but well https://odin-lang.org/docs/overview/#soa-struct-arrays
- tlb 46 minutes ago
  
  There's a package to do this in Julia: https://juliaarrays.github.io/StructArrays.jl/stable/
bob1029 25 minutes ago

And avoid moving said data between physical threads as much as possible.
Most of the bottlenecks I see are not due to the organization of data. Unnecessary communication of data is the #1 offender.

pron 1 hour ago

> The cost of each new field is rarely considered

Most developers, in Java and in most other languages, do not consider the cost of every field, but I can tell you that people who need micro-optimisations certainly do care, and in Java's standard library, a layout is very much a concern (except, as always, you want to optimise what really matters; there's no point in optimising something that is unlikely to be a hot spot in a real program). Sometimes, though, you want to intentionally spread out the layout to avoid cache line sharing when concurrency is involved. You will find such examples in the standard library, too.

AxelWickman 7 minutes ago

Cool read. The AoS vs SoA speaks for itself.

ssiddharth 1 hour ago

Slight tangent, but every ms, μs, and ns counts too. We've gotten awfully carefree with response times and wasted compute cycles.

coldcity_again 1 hour ago

I love to see stuff like this. And an active Vectrex gamedev and PC/Amiga sizecoder I strongly agree with the sentiment!

yas_hmaheshwari 1 hour ago

Out of course: I had thought about reading an article about Iran war or some geo political news when I read fzakaria :-)

RickJWagner 1 hour ago

That’s a great read. I wish more people wrote like that.

fdegmecic 34 minutes ago

CppCon 2014: Mike Acton "Data-Oriented Design and C++"
Andrew Kelley: A Practical Guide to Applying Data Oriented Design (DoD)
you should check these two talks out then.

coolThingsFirst 1 hour ago

Why doesn’t the machine fill up the other cache lines as well why is 64 bytes only and then a miss?

masklinn 39 minutes ago

They will absolutely do that (prefetching, they can even eagerly load what’s on the other side of a pointer).
However it requires additional hardware to recognize patterns which benefit from prefetching, and every time the CPU prefetches data which ends up not being used it has both burned energy and memory bandwidth, and evicted data from the cache which might be needed (cache pollution).
Liquid_Fire 39 minutes ago

It might sometimes prefetch the surrounding lines as well, but ultimately cache space is limited, so there is a trade-off. Every time you fill a line, you are throwing away something else that was cached there previously, which you may need again in the near future.