Matrix Multiplications on GPUs Run Faster When Given "Predictable" Data

56 points by tosh 4 days ago

jayd16 6 minutes ago

I can't tell from the blog: is this actually verified, or is it theory plus numbers showing plausibility?

I could certainly come up with alternative theories about memory compression and prefetching if we were talking about texture reads.

nzach 35 minutes ago

I went in expecting to find 'branch prediction'[0] as the answer, but apparently things are even more complex nowadays.

kangalioo 13 minutes ago

To be fair, the culprit in the article is _less complex_ than branch prediction: "with random data, bits are flipped often, and bit flips in transistors inherently draw power" is less mental gymnastics than "with random data, the CPU fails to predict the future, causing redundant speculative execution".
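
A rough way to see the bit-flip point is to count how many bits toggle between consecutive values in a stream. This is only a sketch: the helper name `toggles` is made up for illustration, and treating the toggle count as a proxy for switching power is an assumption, not a measurement from the article.

```python
import random

def toggles(values, width=16):
    """Count bits that change between consecutive values (Hamming distance)."""
    mask = (1 << width) - 1
    return sum(bin((a ^ b) & mask).count("1") for a, b in zip(values, values[1:]))

rng = random.Random(0)            # fixed seed so runs are repeatable
n = 10_000
random_data = [rng.getrandbits(16) for _ in range(n)]
constant_data = [0x3C00] * n      # an arbitrary 16-bit pattern, repeated

# Average flips per step: close to 8 of 16 bits for uniformly random data.
print(toggles(random_data) / (n - 1))
# Repeating the same value flips nothing.
print(toggles(constant_data))     # 0
```

Real hardware toggles far more than the operand bits (multiplier internals, buses, caches), so this only illustrates the direction of the effect.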

evanjrowley 17 minutes ago

Designing for predictable execution flow is one of the advantages of Tenstorrent hardware.

gdevenyi 2 hours ago

People have been noticing the effects of this in local LLM inference. Power limiting seems to improve overall performance!

gchamonlive 1 hour ago

In general, constraints force optimizations and rearchitectures. I'd also expect the RAM shortage, for instance, to have a big impact on the software industry as a whole, especially in games. They will need to make do with what people have: a PS5/Pro or a PC of similar power.
- aNoob7000 1 hour ago
  
  I actually think it is a good thing to introduce constraints to AI and the overall tech industry. Hopefully everyone will have to look at improving performance without having to add RAM or increase CPU/GPU performance.

Aurornis 19 minutes ago

This is not observable from LLM inference.
Power limiting does not improve performance but it does improve efficiency. You might be able to get 90% of the performance for only 70% of the power usage, for example. It does not make the card go faster though.
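
To put numbers on that, here is a small sketch using the illustrative 90% / 70% figures above. The function names are hypothetical and the ratios are the example's, not measurements.

```python
def perf_per_watt(perf_ratio, power_ratio):
    """Efficiency relative to the unlimited card (1.0 means unchanged)."""
    return perf_ratio / power_ratio

def energy_for_fixed_work(perf_ratio, power_ratio):
    """Relative energy to finish the same job: power times the longer runtime."""
    runtime_ratio = 1.0 / perf_ratio
    return power_ratio * runtime_ratio

print(f"{perf_per_watt(0.90, 0.70):.2f}x perf per watt")       # 1.29x perf per watt
print(f"{energy_for_fixed_work(0.90, 0.70):.2f}x energy")      # 0.78x energy
print(f"{1.0 / 0.90:.2f}x runtime")                            # 1.11x runtime
```

So the same job takes about 11% longer but uses about 22% less energy: more efficient, not faster.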