Funnily enough, pasting your comment straight into Jimmy leads to a... funnily suboptimal answer that does not answer the question.
As someone else already contributed, this is driven by a Canadian startup, Taalas, that basically makes chips that are LLMs, so everything is very fast but also baked into the chip. Once this kind of stuff is a commodity in like 10 years, our world will be very, very different.
Taalas HC1 AI uses Llama 3.1 8B, but takes up a massive 53B transistors and 815mm2 on TSMC N6 (nearly at the reticle limit of 858mm2). N2 is a little less than 3x as dense (110MTr/mm2 vs 313MTr/mm2).
This chip would still be about 272mm2 on N2, which costs an eye-watering $30k/wafer, and that is bigger than a 9950X or Nvidia 5070 die.
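To make the shrink arithmetic concrete, here is a rough sketch using only the numbers quoted above. It assumes area scales perfectly with peak logic density (real designs with SRAM, analog and I/O shrink less), a standard 300mm wafer, and the common gross-dies-per-wafer approximation; it ignores yield and packaging entirely. Using the quoted densities directly gives a slightly larger die than rounding to 3x does.

```python
import math

# Figures quoted in the comment; everything else is an assumption.
HC1_AREA_MM2 = 815.0
N6_DENSITY = 110.0          # MTr/mm2, as quoted
N2_DENSITY = 313.0          # MTr/mm2, as quoted
WAFER_COST_USD = 30_000.0   # as quoted
WAFER_DIAMETER_MM = 300.0   # assumption: standard 300mm wafer


def scaled_area(area_mm2, old_density, new_density):
    """Ideal shrink: same transistor count at a higher density."""
    return area_mm2 * old_density / new_density


def gross_dies_per_wafer(die_area_mm2, wafer_diameter_mm=WAFER_DIAMETER_MM):
    """Common approximation: wafer area / die area, minus an edge-loss term."""
    wafer_area = math.pi * (wafer_diameter_mm / 2) ** 2
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2)
    return int(wafer_area / die_area_mm2 - edge_loss)


n2_area = scaled_area(HC1_AREA_MM2, N6_DENSITY, N2_DENSITY)
dies = gross_dies_per_wafer(n2_area)
cost_per_die = WAFER_COST_USD / dies  # before yield loss, test, packaging

print(f"N2 die area: {n2_area:.0f} mm2")
print(f"Gross dies per wafer: {dies}")
print(f"Silicon cost per die (100% yield): ${cost_per_die:.0f}")
```

Even under these optimistic assumptions, that is one fixed 8B model per die, so the per-model cost multiplies quickly for anything larger.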
This just isn't feasible. Some of the latest-gen LLMs seem to have 5-10T parameters or about 1000x more. I don't know that taping out just one chip makes economic sense, let alone the 300-1000 chips required for a cutting-edge model. Things like continuing education so your model knows about the latest NPM packages or world news is super important, but seems like it would require new chips.
There are a TON of uses for 8B-parameter models on the edge, but this is WAY too big to put on the edge of anything. Something like a 10mm2, 100M-parameter voice model might be feasible on the edge, but only for expensive devices. Most edge chips are built on TSMC 28nm (up to 29MTr/mm2) or GF 22FDX (up to 40MTr/mm2), which would inflate the AI chip to the point where it would absolutely dominate the BOM.
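A quick sketch of how that edge estimate plays out on older nodes. This assumes the 10mm2 figure comes from scaling HC1 linearly with parameter count on N6 (815mm2 for 8B parameters), which is my reading rather than something the comment states, and again assumes area scales ideally with the quoted peak densities.

```python
# Figures quoted in the thread; linear scaling is an assumption.
HC1_AREA_MM2 = 815.0
HC1_PARAMS = 8e9
DENSITY_MTR_PER_MM2 = {
    "TSMC N6": 110.0,
    "TSMC 28nm": 29.0,
    "GF 22nm": 40.0,
}


def area_for_model(params, node):
    """Scale HC1 area by parameter count, then by relative node density."""
    n6_area = HC1_AREA_MM2 * params / HC1_PARAMS
    return n6_area * DENSITY_MTR_PER_MM2["TSMC N6"] / DENSITY_MTR_PER_MM2[node]


# Hypothetical 100M-parameter voice model on each node.
areas = {node: area_for_model(100e6, node) for node in DENSITY_MTR_PER_MM2}

for node, area in areas.items():
    print(f"{node}: {area:.1f} mm2")
```

Under these assumptions the same model is roughly three to four times larger on the nodes that cheap edge silicon actually uses.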
> Things like continuing education so your model knows about the latest NPM packages or world news is super important, but seems like it would require new chips.
They probably have a few ideas around that. Me, personally, I'd have one main expensive chip (replaced every 10 years, or whatever), with a secondary cheap chip in front of it that gets replaced every year or so.
The secondary chip could act the way RAG does, or perhaps both chips together can act as LoRA.
Either way, 99.999% of the knowledge is static. You just need to supply the remaining 0.001%, which could be done with RAG (new facts fed in as context) or LoRA (small adjustments on top of the fixed weights) on a much smaller (thus cheaper) disposable chip.
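For anyone unfamiliar with why LoRA would suit a small add-on chip, here is a minimal pure-Python sketch of the math only: the big matrix W stays frozen, and a low-rank pair (A, B) adds a small correction. How Taalas hardware would actually wire this up is not something I know; the 4096 layer size and rank 8 below are illustrative assumptions.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def lora_forward(w_frozen, a, b, x, alpha=1.0):
    """y = W x + (alpha / r) * B (A x), with W never modified."""
    rank = len(a)                     # A has shape (rank, d_in)
    base = matvec(w_frozen, x)        # the "baked in" part
    update = matvec(b, matvec(a, x))  # B has shape (d_out, rank)
    scale = alpha / rank
    return [y + scale * u for y, u in zip(base, update)]


def param_counts(d_out, d_in, rank):
    """Parameters in the frozen matrix vs. in the LoRA adapter."""
    return d_out * d_in, rank * (d_in + d_out)


# Toy example: 3x3 identity as the frozen weights, rank-1 adapter.
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
A = [[1.0, 1.0, 1.0]]
B = [[0.5], [0.0], [0.0]]
y = lora_forward(W, A, B, [1.0, 2.0, 3.0])
print(y)

# Hypothetical 4096x4096 layer with a rank-8 adapter.
full, adapter = param_counts(4096, 4096, 8)
print(f"adapter is {adapter / full:.2%} of the frozen matrix")
```

The adapter is a fraction of a percent of the frozen layer in this example, which is the intuition behind putting it on a cheap replaceable part.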
Yeah, they're clearly just starting out and just shipped their very first proof of concept. But to me, their plans seem generally reasonable https://taalas.com/the-path-to-ubiquitous-ai/, and like I wrote, if this kind of thing succeeds and could become some kind of cheaply producible commodity component, I think there's huge value in that. Alas, maybe not as a frontier model replacement. But say 10 years from now you can drop a cheap Raspberry Pi-like device on your LAN and have a fast local engine for things like text sentiment analysis, text summarisation, voice recognition and basic vision. That would be pretty exciting to me (but maybe, as you outlined, impossible in practice).
The flash models have shrunk in size, at least between DeepSeek models. Is there a limit to how far models can shrink?
That’s why this stuff should ultimately be a government mega project.
It is not market viable but it is sure as heck revolutionary. Like an atomic bomb but with more… peaceful uses.
That’s exactly where government should take the reins, as with the ISS etc. However, the models are advancing too rapidly for now for that to make sense.
https://taalas.com/
Taalas https://taalas.com/the-path-to-ubiquitous-ai/
Previous HN discussion: https://news.ycombinator.com/item?id=47103661