Qwen3.7-Max: The Agent Frontier

153 points by kevinsimper 3 hours ago

goldenarm 17 minutes ago

The non-hallucination rate in AA-omniscience is SOTA, better than Opus 4.7, Gemini 3.1 Pro and GPT5.5! Congrats to the team

tekacs 1 hour ago

As they start to release more proprietary models, I so wish that they partnered with one of the major US hyperscalers to allow using these models through something US-domiciled.

Totally understand why it may not be reasonable or in their best interest (and that the US is _absolutely_ not doing the same reflexively). But it would be lovely to be able to try these out on production workloads in earnest.

embedding-shape 1 hour ago

Unless US hyperscalers do the same in reverse, I hope the status quo stays as it is. Either people are happy to share, and the sharing should happen both ways, or US hyperscalers can keep isolating themselves as they've done so far.
- adjejmxbdjdn 46 minutes ago
  
  I do hope The U.S. hyperscalers do the same as well.
  In an ideal world U.S. residents would use Chinese AI models and Chinese residents would use U.S. AI models.
  Governments in both countries are collecting data for nefarious reasons. But the Chinese government has far less influence on a U.S. resident and vice versa.
  We are all better off if our data is collected by a government halfway across the world instead of our own governments which hold incredible amounts of power over us.
  
  nickdothutton 32 minutes ago
  
  China is much more interested in waging a campaign against companies that represent the material of the future growth in productivity, exports, and prosperity of the US and her people, than learning about you as an individual. Unless of course you are a Chinese dissident living in the US.
  
  giancarlostoro 27 minutes ago
  
  Which is basically the current primary use for AI is programming more than anything, you hear about AI in programming more than in any other field.
  
  WarmWash 13 minutes ago
  
  China definitley wants information on all Americans. This commment is so far off the mark you it's on par with "Billionaires aren't interested in taking your money"
  As Americans go through life, some of them will become people with power. When you need to leverage that power, having the right knowledge about them can effectively transfer that power to you.
  Tiktok was a goldmine, because every 20-something on their way to a future position of power was uploading every single facit of their digital life to CCP servers everyday.
  
  giancarlostoro 28 minutes ago
  
  It would have been the world we live in if China wasn't involved in so much corporate espionage. I don't even feel comfortable using their open weight models on anything my employer makes, the only time I use Qwen is for greenfield "how good is this?" type of projects, but otherwise, how do I trust that it wont mysteriously hallucinate phoning home?
  On the other hand, there's other models where the source is 100% open, the training data is known, and people have reproduced the same model from scratch, so while those trail behind, there's definitely an effort to make models more open and capable.
  
  eloisant 12 minutes ago
  
  I agree, but the same goes for the US. Remember Echelon.
  
  CodingJeebus 15 minutes ago
  
  > We are all better off if our data is collected by a government halfway across the world instead of our own governments which hold incredible amounts of power over us.
  Sure, that is until each government's dataset is interesting enough to the other to facilitate a data-sharing agreement.
  There's gotta be an internet "law" that says something like "Eventually, the data you volunteer to a benign 3rd party eventually winds up being used against you by someone". This is short-term thinking at it's finest.
0xbadcafebee 27 minutes ago

I'm more interested in hearing specific reasons why one wouldn't use a Chinese company. Unless you're thinking Alibaba is going to ship chat logs to some government ministry that will then dole out proprietary information to new competitors (which doesn't seem logistically feasible), or you run a human rights organization, it feels a bit like FUD.
- tekacs 6 minutes ago
  
  … building and selling a product to US companies that sends company-internal data to Chinese AI providers is not a particularly good way to get people to buy it.
  Even if they weren’t individually worried about their proprietary data being shared with Chinese domestic competitors or with government… their audit / security programs likely wouldn’t allow it for a _huge_ range of types of data.
- vessenes 4 minutes ago
  
  All this data is accessible to national security agencies; this is true in every country in the world.
  China has more integration between intelligence and industry than many western countries, and it does present a higher risk of unwanted “tech transfer” to industry than running on oracle or Google or ms or Amazon does in the US.
  DHS has long staffed full time agents in California to deal with foreign IP exfiltration - using qwen is like fast/easy mode for IP exfiltration: why make anyone get a job in your palo alto office when you can just send it to them in Hanzhou?
  Upshot - If you have something proprietary you’re working on I would generally advise not to just direct send it to Alibaba.
epolanski 22 minutes ago

US hyperscalers, all of them, are financially invested in the US AI labs and have the incentives to keep the status quo.
motiw 5 minutes ago

ChatLLM support QWEN, do you consider this as US safe?

goyozi 2 hours ago

These are very good numbers. I still don’t get why they don’t compare against latest competitor versions in these posts, it’s not like we’re all not going to notice.

hmokiguess 1 hour ago

this puzzles me too, I want to know
htrp 1 hour ago

I think its part of the expectation setting (with a side of we did our distillation/ eval harness on a specific model).
if they say it's 4.7 comparable, it anchors that into your head as the model to evaluate against.
maelito 1 hour ago

Marketing.
Aurornis 1 hour ago

I think the argument is that trying to suggest that they’re close to N months from SOTA.
Realistically I assume they hope readers don’t notice the fine details.
The Qwen models are great for open weights but for every past release they haven’t performed as well as the benchmarks in my experience. They’re optimizing for benchmark numbers because they know it works.
- epolanski 21 minutes ago
  
  > Realistically I assume they hope readers don’t notice the fine details.
  The pool of people reading such articles while ignoring such details can't be big.
  
  Aurornis 18 minutes ago
  
  I disagree. Most people skim articles, not read them deeply.
  On Hacker News I wonder if most people even opened the article at all most times.
beydogan 48 minutes ago

honestly, initial version of Opus-4.6 was much better than whatever we are being served right now as 4.7. If it performs same level to that, i'm totally willing to switch.
NiloCK 46 minutes ago

I find it forgivable if it's within minor version bump. (NB that x.5 is now a defacto major-version bump for LLMs for whatever reason).
Even with LLMs, posts like this don't just fall out of a coconut tree. If you have a set of target benchmarks for your own model, then keeping "the set" of side-by-side comparable models is its own maintenance headache.

tarruda 1 hour ago

Looking forward to more open weight releases from Qwen, especially 122B and 397B.

smcleod 1 hour ago

Yeah that 60-150b~ range is such a sweet spot for current 'prosumer' hardware, I'd love to see something like a 120b-a14b or there about.
- gcr 1 hour ago
  
  What’s the price point for getting into that sweet spot?
  I’m on an M1 Max with 32GB VRAM, so I’m looking forward to the 27B or 35B-A3B models. Is dropping $5k for an RTX 6000 or a DGX Spark really the best option?
  
  tarruda 1 hour ago
  
  > What’s the price point for getting into that sweet spot?
  In October/2024 I got my Mac studio M1 ultra with 128G, IIRC it was ~$2500. With recent prices explosion, it has certainly gotten more expensive. https://frame.work/ is selling 128G strix halo mainboard for $2700, but you have to add storage and case.
  
  ttoinou 1 hour ago
  
  M5 Max 64GB (sweet spot) or 128GB (only 1000 USD, better to keep it for the future) more are the best quality price ratio, future proof, reliable, resellable and flexible workloads. Harder to use as a server might be the only drawback
  
  roger_ 1 hour ago
  
  M5 Max 128GB for $1k?
  
  smallerize 1 hour ago
  
  I think they mean the upgrade to 128GB is +$1k.
  
  tempoponet 1 hour ago
  
  The memory upgrade is $1k on a Macbook Pro. The laptop is ~$5500.
  
  throwaw12 1 hour ago
  
  What do you recommend for non-Mac setup? I am a Mac user, but its getting expensive, and not seeing reason to jump to the latest M5
  
  varispeed 14 minutes ago
  
  Probably a comparable non-Mac setup will be Threadripper, but it will become much more expensive. My view is that actually Apple products are the cheapest on the market when it comes to performance.
  
  anonym29 1 hour ago
  
  Strix Halo at $2k with similar TG and about half the PP of DGX Spark was a pretty good deal IMO, especially considering it's also a full x86 system... 16c/32t Zen 5, 40 CU RDNA 3.5, 128 GB unified memory at ~220 GB/s real-world speeds (256 GB/s theoretical) - that runs full tilt at 140W in performance mode and idles at ~10W.
  Unfortunately, the prices rose on these a lot, but unevenly. Beelink GTR 9 Pro is $4400, Framework Desktop is ~$3500, for what is basically the exact same mainboard as a Bosgame M5 for $2800.
  Apple's M5 Max is another attractive option. Apple silicon traditionally had great MBW and was good at TG, but struggled with PP, but the new neural engines in those GPU cores have made a big difference in a good way here.
  Gorgon Halo is rumored for June announcement with Q4'26 release with basically +100 MHz clocks on Strix Halo, LPDDR5X-8533 instead of LPDDR5X-8000, but more importantly, 192 GB max instead of 128 GB.
  I'd say it's better to wait for Gorgon Halo than to grab Strix Halo now. However, Medusa Halo, rumored for H2'27, is slated to have up to 26c Zen 6 (heterogeneous cores - kinds funny that AMD is heading towards these as Intel retreats from them), 48 CU of RDNA 5 instead of 40 CU RDNA 3.5, and a 384 bit bus w/ LPDDR6, which should make 256 GB at more like ~490-600 GB/s MBW, which will really make Strix and Gorgon Halo obsolete.
  Also worth keeping an eye out for Serpent Lake (intel CPU + nvidia iGPU on a single board with unified memory, rumored for 2028-2029 iirc), and on the 160 GB Crescent Island Intel dGPU.
  
  tempoponet 1 hour ago
  
  Expect to pay $4k-10k
  - Your RTX 6000 is closer to $10k now
  - Sparks are creeping into the $4-5k range
  - AMD Strix are ~3.5k
  - Apple depends on chipset and memory. Sweet spot would be 128gb M3 Ultra, probably $6-8k but admittedly haven't been tracking closely. New M5 might come in the fall. You can get a new 128gb M5 Max laptop for ~5-6k today.
  - a 4x3090 rig would take $5-6k
  Every platform has tradeoffs, but it's mostly ecosystem, memory bandwidth, and power consumption. They're all slow. The best option is likely to rent hardware on Runpod. The RIO on self-hosting is very low unless you have a specific need or you're ok treating it as a hobby.
  
  anonym29 1 hour ago
  
  Bosgame M5 (Strix Halo) w/ 128 GB still goes for $2800 right now. SH systems have surged in price dramatically but quite unevenly.
  >The best option is likely to rent hardware on Runpod.
  Vast.ai is much cheaper, but the broader point here is contestable. The only dimension in which cloud GPU rentals win is cost. You lose the confidentiality, integrity, and availability benefits of local deployments.
  
  ai_fry_ur_brain 35 minutes ago
  
  Rentals are priced to pay themselves off in 1-1.5 years (when renting them out per hour, not selling tokens). Its never a better option to rent.
  Not that I'd encourage anyone to throw large amounts of money to have access to LLMs, but you're definately going to be better off buying something that you can amortize over multiple years with a multi year warranty.
  
  ai_fry_ur_brain 39 minutes ago
  
  And for what? Spend 10-15k for the slopiest of slop code, non deterministic automations, and the ability to spawn an AI gf?
  This whole thing is really starting to remind me of the crypto hype phases of 2016-2018 when everyone thought their investment in GPUs was going to make them rich.
  
  organsnyder 37 minutes ago
  
  It is possible to get real work done with LLMs. There are plenty of ethical concerns, and they're definitely over-hyped, but they are exceptionally useful tools when used well.
  
  embedding-shape 1 hour ago
  
  If I could find a RTX Pro 6000 for $5K I'd definitively grab it, I'm running RedHatAI/Qwen3.6-35B-A3B-NVFP4 on one (I had to pay closer to $10K for it though) with 260K context and it's a blast! ds4 by antirez also works well, even IQ2XXS seems to work relatively well but Qwen3.6-35B-A3B-NVFP4 is both faster and higher quality responses (at least for coding and translations which I use them mostly for).
- tarruda 1 hour ago
  
  I have a 128G mac studio and even 397B was a happy surprise to me due to its high quantization resilience.
  I've created a 2.54BPW quant that fit on my hardware with 128k context, 20 tps tg and 200tps pp, while maintaining high scores on many benchmarks: https://huggingface.co/tarruda/Qwen3.5-397B-A17B-GGUF/discus...
  
  ttoinou 1 hour ago
  
  better than antirez ds4 ?
  
  tarruda 1 hour ago
  
  I only tried a very early version of that when it was just a llama.cpp fork and Qwen was certainly better in my tests.
  But I was not super impressed with deepseek 4 flash using it from the official API either, so it doesn't seem quantization fault. It is a good model, but nothing out of the ordinary in the few benchmarks I ran on it (with full awareness that benchmarks are biased).
  
  chrisweekly 51 minutes ago
  
  Apple store's current options for mac studio seem to max out at 96GB. I'm questioning ROI, esp. given it's not upgradeable. Curious about others' takes on new mac hardware.
  
  tarruda 40 minutes ago
  
  > I'm questioning ROI
  If by ROI you mean saving more money than using paid APIs, then I don't think it is worth it. All you gain is full sovereignty over your AI usage.
  
  drob518 32 minutes ago
  
  Currently, Apple is letting some of its models go out of stock in preparation for new models coming in a few weeks. I would expect at least 128 GB models at that time. That said, the memory crunch is hitting everyone.
mixtureoftakes 1 hour ago

I'm more excited for qwen3.7 9b and 72b, these are usually so good for their size
guitcastro 58 minutes ago

I am still waiting for qwem image-edit 2.0 open weight

bsenftner 1 hour ago

Any reports from people using their coding agent(s)?

rayboy1995 40 minutes ago

I'm running Qwen 3.6 27B Q5 K M GGUF on a Tesla P40 and koboldcpp using pi.dev as the harness, I gotta say I am impressed. Took some setup and configuring but I already have some code it has made commited and pushed. It can be slow on my hardware at >50k tokens, but the fact I bought this one P40 for like $150 back when the LLM trend started I can't complain. (I have a second one too but I couldn't physically fit the card in my server unfortunately.)
The setup I had to do was important and I had to compile koboldcpp with a few special params for my hardware, I mostly just had Claude figure it out. I don't remember everything I did now but it was very slow and would often stop mid task, it seems it was mostly a parsing issue. It made the model seem broken/dumb, but once I had all that settled I actually am able to use this how I use Claude Code. Disclaimer, I am pretty explicit with requirements, I imagine this fails more when you leave it to figure out things on its own but for my flow its pretty rad.
Currently setting it up as an automated agent now to pull Trello cards, create PRs for them, and move the card to be reviewed.
Command I am using to run: python koboldcpp.py \ --port 61514 --quiet --multiuser --gpulayers 999 --contextsize 262144 --quantkv 2 \ --usecublas normal --threads 4 --jinja --jinja_tools --jinja_kwargs '{"enable_thinking":true, "preserve_thinking":false}' \ --skiplauncher --model /data/models/Qwen3.6-27B-Q5_K_M.gguf --smartcache 5

bratao 1 hour ago

It is super strange that all last (3?) releases they keep comparing older models such as Opus-4.6.

vessenes 1 hour ago

Some of it’s probably timing. Some of it is wanting to look good. That said, I just went to the claw-eval site, and neither 4.7 nor 5.5 from oAI are listed on the benchmarks. So there’s also just the time from others to get benchmarking done and published.
varispeed 38 minutes ago

Opus-4.6 was probably the best model so far before it got nerfed. 4.7 is nowhere near experience I had. In fact I stopped using it completely because more often than not its output is just dumber than local models.

XCSme 1 hour ago

Any info on pricing and latency?

esafak 31 minutes ago

Does anyone have experience with the Alibaba Cloud Model Studio?

howmayiannoyyou 1 hour ago

I can't bring myself to use any model that trains or sends telemetry back to my country's primary competitor/adversary. I don't care how much money is saved.

Mashimo 55 minutes ago

That is understandable. Just don't do it. No need to announce it.
InsideOutSanta 47 minutes ago

As somebody in Europe, uh, that doesn't leave many options.
- avazhi 8 minutes ago
  
  This is the current European modus operandi: virtue signal and cry about tech that other countries produce, pass local laws that limit its use in their countries even though they have no viable local alternatives, brag amongst themselves about decoupling from US and Chinese tech, and then look on wistfully as the rest of the world moves on without a single fuck given.
  Europe is a market whose sense of superiority and important to external markets are assbackwards.

dfansteel 1 hour ago

Can anyone check its knowledge base for me? I’m honestly not able to run it and the Qwen models I can run censor information critical towards the Chinese government.

Tiananmen Square is the first place to start.

Mashimo 53 minutes ago

> I’m honestly not able to run it
What do you mean? This is not self hosted, it's closed source. And any website that targets China or is hosted in China will probably censor Tiananmen Square.