AMD hasn't signaled, in behavior or in words, that they're going to actually support ROCm on $specificdevice for more than 4-5 years after release. Sometimes it's as little as just under four years for refreshes like the consumer AMD RX 580. And ROCm support for consumer devices often isn't out until a year after release, further cutting into that window.
Meanwhile nvidia just dropped CUDA/driver support for 1xxx series cards from their most recent drivers this year.
For me ROCm's mayfly lifetime is a dealbreaker.
Last year, AMD ran a GitHub poll for ROCm complaints and received more than 1,000 responses. Many were around supporting older hardware, which is today supported either by AMD or by the community, and one year on, all 1,000 complaints have been addressed, Elangovan said. AMD has a team going through GitHub complaints, but Elangovan continues to encourage developers to reach out on X where he’s always happy to listen.
Seems like they're making some effort in that direction at least. If you have specific concerns, maybe try hitting up Anush Elangovan on Twitter?
> or by the community
Hmmm
Is it really that short? This support matrix shows ROCm 7.2.1 supporting quite old generations of GPUs, going back at least five or six years. I consider longevity important, too, but if they're actively supporting stuff released in 2020 (CDNA), I can't fault them too much. With open drivers on Linux, where all the real AI work is happening, I feel like this is a better longevity story than nvidia...where you're dependent on nvidia for kernel drivers in addition to CUDA.
https://rocm.docs.amd.com/en/latest/compatibility/compatibil...
You missed the note at the top: "GPUs listed in the following table support compute workloads (no display information or graphics)". It doesn't mean that all CDNA or RDNA2 cards are supported. That table is very misleading; it covers enterprise compute cards only - the AMD Instinct and AMD Radeon PRO series. The list for actual consumer GPUs is much worse: https://rocm.docs.amd.com/projects/radeon-ryzen/en/latest/in... - more or less the 9000 series and select 7000-series cards. Not even all of the 7000 series.
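Rather than trusting either matrix, you can check what your installed ROCm stack actually recognizes. A minimal sketch, assuming ROCm is installed on Linux (`rocminfo` ships with the ROCm base package):

```shell
# List the GPU architecture targets ROCm can see (e.g. gfx1100 for RDNA3).
# If your card's gfx target isn't in the official support list, ROCm may
# still partially work, but you're in unsupported territory.
if command -v rocminfo >/dev/null 2>&1; then
  rocminfo | grep -o 'gfx[0-9a-f]*' | sort -u
else
  echo "rocminfo not found (ROCm not installed or not on PATH)"
fi
```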
I think that speaks to them not understanding at the time the opportunity they were missing out on by not shipping a CUDA-like thing to everyone, including consumer tech. The question is what'll it look like in a few years now that they do understand AI is the biggest part of the GPU industry.
I suspect, given AMD's relative openness vs. nvidia, even consumer-level stuff released today will end up with a longer useful life than current nvidia stuff.
I could be wrong, of course. I've taken the gamble...the last nvidia GPU I bought was a 3070 several years ago. Everything recent has been AMD. It's half the price for nearly competitive performance and VRAM. If that bet turns out wrong, I'll just upgrade a little sooner and still probably end up ahead. But, I think/hope openness will win.
Also, nvidia graphics drivers on Linux are a pain in the ass that I didn't want to keep dealing with. I decided it wasn't worth the hassle, even if they're better on some metrics. I've been able to run everything I've tried on an AMD Strix Halo and an old Radeon Pro V620 (not great, but cheap, compared to other 32GB GPUs and still supported by current ROCm).
ROCm is open source and TheRock is community maintained, and the first Linux distro with native in-tree builds is imminent. It will be supported for the foreseeable future thanks to AMD's open development approach.
It is Nvidia that has the track record of closed drivers and of insisting on doing all software development without community improvements, with the expected results.
> expected results
The de facto GPU compute platform? With the best feature set?
And the worst privacy, transparency, and FOSS integration due to their insistence on a heavily proprietary stack.
Also pretty hard to beat a Strix Halo right now in TPS for the money and power consumption.
Even that aside there exist plenty like me that demand high freedom and transparency and will pay double for it if we have to.
> And the worst privacy, transparency, and FOSS integration due to their insistence on a heavily proprietary stack.
The market doesn't care about any of that. The consumer market doesn't care, and the commercial market definitely does not. The consumer market wants the most Fortnite frames per second per dollar. The commercial market cares about how much compute they can do per watt, per slot.
> there exist plenty like me that demand high freedom and transparency and will pay double for it if we have to.
The four percent share of the datacenter market and five percent of the desktop GPU market say (very strongly) otherwise.
I have a 100% AMD system in front of me so I'm hardly an NVIDIA fanboy, but you thinking you represent the market is pretty nuts.
I did not claim to represent the market as a whole, but I feel I likely represent a significant enough segment of it that AMD is going to be just fine.
I think local power efficient LLMs are going to make those datacenter numbers less relevant in the long run.
I was thinking to get 2x r9700 for a home workstation (mostly inference). It is much cheaper than a similar nvidia build. But still not sure if good value or more trouble.
Talking to friends who have fought more homelab battles than I ever will, my sense is that (1) AMD has done a better job with RDNA4 than the past generations, and (2) it seems very workload-dependent whether AMD consumer gear is "good value", "more trouble", or both at the same time.
Edit: I misread the "2x r9700" as "2 rx9700", which differs from the topic of this comment (RDNA4 consumer SKUs). I'll keep my comment up, but anyone looking to get Radeon PRO cards can (should?) disregard.
Given RDNA3 was a pathetic joke, it wouldn't be hard for them to do a better job.
I have this setup, with 2x 32GB cards. It's perfect for my needs, and cheaper than anything comparable from NV.
I own a single R9700 for the same reason you mentioned, and I'm looking into getting a second one. It was a lot of fiddling to get it working on Arch, but RDNA4 and ROCm have come a long way. Every once in a while Arch package updates break things, but that's not exclusive to ROCm.
LLMs run great on it; it's happily running gemma4 31b at the moment and I'm quite impressed. For the amount of VRAM you get it's hard to beat, apart from the Intel cards maybe. But the driver support doesn't seem to be that great there either.
Had some trouble running comfyui, but it's not my main use case, so I haven't spent a lot of time figuring that out yet.
Thanks for the answer. Brings my hope up. Looking in my local shops, I can get 3 cards for the price of one 5090.
May I ask, what kind of tok/s you are getting with the r9700? I assume you got it fully in vram?
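For the "fully in VRAM" part, a back-of-envelope check is enough: quantized weights take roughly params × bits-per-weight, and the KV cache grows with context. A sketch with illustrative numbers (the ~3.9 bits/weight for a Q3_K_M-style quant and the layer/head shape below are assumptions, not measured values):

```python
def model_vram_gb(params_b, bits_per_weight):
    """Approximate VRAM for the quantized weights alone (params_b in billions)."""
    return params_b * bits_per_weight / 8

def kv_cache_gb(n_layers, n_kv_heads, head_dim, context, bytes_per_elem=2):
    """KV cache: two tensors (K and V) per layer, fp16 by default."""
    return 2 * n_layers * n_kv_heads * head_dim * context * bytes_per_elem / 1e9

# Hypothetical 122B model at ~3.9 bits/weight:
weights = model_vram_gb(122, 3.9)   # ~59.5 GB -> needs two 32 GB cards
# Hypothetical shape: 60 layers, 8 KV heads of dim 128, 10k context:
kv = kv_cache_gb(60, 8, 128, 10_000)
print(f"weights: {weights:.1f} GB, kv cache @10k ctx: {kv:.1f} GB")
```

So a 122b-class quant only fits split across both 32 GB cards, with a few GB left over for KV cache and activations; a dense 27b at the same quant fits comfortably on one.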
Stock install, no tuning.
I have a dual R9700 machine, with both cards on PCIe gen4 x8 slots. The 256bit GDDR6 memory bandwidth is the main limiting factor and makes dense models above 9b fairly slow.
The model that is currently loaded full time for all workloads on this machine is Unsloth's Q3_K_M quant of Qwen 3.5 122b, which has 10b active parameters. With almost no context usage it will generate 59 tok/sec. At 10,000 input tokens it will prefill at about 1500 tok/sec and generate at 51 tok/sec. At 110,000 input tokens it will prefill at about 950 tok/sec and generate at 30 tok/sec.
Smaller MoE models with 3b active will push 70 tok/sec at 10,000 context. Dense models like Qwen 3.5 27b and Devstral Small 2 at 24b will only generate at around 13 - 15 tok/sec with 10,000 context.
This is all on llama.cpp with the Vulkan backend. I didn't get too far in testing / using anything that requires ROCm because there is an outstanding ROCm bug where the GPU clock stays at 100% (drawing around 60 watts) even when the model is not processing anything. The issue is now closed but multiple commenters indicate it is still a problem. Using the Vulkan backend my per-card idle draw is between 1 and 2 watts with the display outputs shut down and no kernel frame buffer.
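The bandwidth-limited numbers above can be sanity-checked with simple arithmetic: if decode is memory-bound, the ceiling is bandwidth divided by bytes read per token. A sketch assuming ~640 GB/s for a 256-bit GDDR6 card and ~3.9 bits/weight for a Q3_K_M-style quant (both my assumptions, not figures from this thread):

```python
def decode_ceiling_tok_s(bandwidth_gb_s, active_params_b, bits_per_weight):
    """Upper bound on tokens/sec if every token reads all active weights once."""
    bytes_per_token = active_params_b * 1e9 * bits_per_weight / 8
    return bandwidth_gb_s * 1e9 / bytes_per_token

# 10B active parameters -> ~4.9 GB read per token -> ~131 tok/s ceiling
print(f"MoE 10b-active ceiling: {decode_ceiling_tok_s(640, 10, 3.9):.0f} tok/s")
# A dense 27b at a fatter quant reads far more per token, hence the slowdown
print(f"dense 27b ceiling: {decode_ceiling_tok_s(640, 27, 4.5):.0f} tok/s")
```

The observed 59 tok/s is a bit under half the naive ceiling, which is plausible once attention, KV-cache reads, and cross-card overhead are accounted for, and it shows why the 3b-active and 10b-active MoE models scale the way they do on this hardware.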
I have 2 of them. I would advise against it if you want to run things like vllm. I have had the cards for months and I still have not been able to create a uv env with trl and vllm. vllm works fine in docker for some models. With one GPU, gpt-oss 20b decodes at a cumulative 600-800 tps with 32 concurrent requests, depending on context length, but I was getting trash performance out of qwen3.5 and Gemma4.
If I were to do it again, I’d probably just get a dgx spark. I don’t think it’s been worth the hassle.
FWIW I’m in love with my Asus GX10 and have been learning CUDA on it while playing with vllm and such. Qwen3.5 122B A10 at ~50tps is quite neat.
But do beware, it’s weird hardware and not really Blackwell. We are only just starting to squeeze full performance out of SM12.1 lately!
The split CDNA/RDNA architecture is a problem for AMD. The upcoming unified UDNA will solve the issue.
Driver support eats directly into driver development