Gemini 3.5 Flash

blog.google

235 points by spectraldrift 2 hours ago

https://ai.google.dev/gemini-api/docs/models/gemini-3.5-flas...

simonw 57 minutes ago

The pelican is a lot: https://github.com/simonw/llm-gemini/issues/133#issuecomment...

Not a great bicycle though, it forgot the bar between the pedals and the back wheel and weirdly tangled the other bars.

Expensive too - that pelican cost 13 cents: https://www.llm-prices.com/#it=11&ot=14403&sel=gemini-3.5-fl...
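The 13 cent figure can be checked by hand. A minimal sketch, assuming the `it=11` and `ot=14403` parameters in the link are input and output token counts, and using the $1.50/$9.00 per million token prices quoted elsewhere in this thread (the function name is made up for illustration):

```python
# Prices per million tokens for Gemini 3.5 Flash, as quoted elsewhere in this thread.
INPUT_PRICE_PER_M = 1.50
OUTPUT_PRICE_PER_M = 9.00


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the dollar cost of one request at the per-million-token prices above."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000


# Token counts read from the it= and ot= parameters of the llm-prices link (assumed meaning).
pelican = request_cost(input_tokens=11, output_tokens=14_403)
print(f"${pelican:.4f}")  # about 13 cents, almost all of it output tokens
```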

hedgehog 54 minutes ago

That pelican looks like it's in Miami for a crypto conference.
- xattt 33 minutes ago
  
  It looks like it’s been partying for 60 years based on the wrinkles on its pouch.
- joseda-hg 25 minutes ago
  
  It looks like the starting soon screen of a crypto presentation
- egillie 8 minutes ago
  
  and somehow in 1992
hydra-f 48 minutes ago

Same old issue with Gemini models trying to "enrich" everything
nashashmi 42 minutes ago

Beats a human by like 10$
- unglaublich 30 minutes ago
  
  So according to Google logic, the value of the pelican is $10-eps. (They applied that reasoning to conversions via adwords)
irthomasthomas 38 minutes ago

This is a perfect illustration of something I noticed with LLM progress. Ask them to improve an SVG like this, and it never fixes the missing crossbar or disconnected limbs, it just adds more stuff. In this example they have obviously improved greatly, and it contains a ridiculous amount of detail, but they still fail to get the basic shape of the frame right.
gcgbarbosa 33 minutes ago

funny that when I try the same prompt, gemini generates an image, not an SVG. something is not right.
- simonw 17 minutes ago
  
  That's likely because you're using the Gemini app which has a tool for image generation (nano banana) - I do my tests against the API to avoid any possibility of tool use.
  
  nickmccann 7 minutes ago
  
  This question makes me wonder if you one shot each pelican or do you run it a few times to get the best one?
smcleod 31 minutes ago

I feel like it embodies Google's vibe of an uncool guy trying to stay relevant to the youth pretty well.
holtkam2 31 minutes ago

at a certain point you're gonna need to change your benchmark because this will end up in the model's training set
- simonw 17 minutes ago
  
  Gemini were the team most likely to have this in their training set - see https://x.com/JeffDean/status/2024525132266688757 - and yet their latest model still messes up the bicycle frame!
tantalor 25 minutes ago

Forgetting the chainstay is typical of asking random people to draw a bicycle.
https://www.gianlucagimini.it/portfolio-item/velocipedia/
> most ended up drawing something that was pretty far off from a regular men’s bicycle

GodelNumbering 1 hour ago

Per million input/output tokens:

Gemini 2.5 flash: $0.30/$2.50

Gemini 3.0 flash preview: $0.50/$3.00

Gemini 3.5 flash: $1.50/$9.00

Interesting pricing direction. I don't think we have ever seen a 3x price increase for the immediate next same-sized model (and lol @ 3 only ever getting a preview).

3.5 flash costs similar to Gemini 2.5 pro which was $1.25/$10
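To put numbers on the "3x" claim, here is a small sketch that computes the generation-to-generation multipliers from the prices listed above. The dictionary and function names are illustrative only.

```python
# Per-million-token prices (input, output) exactly as listed in the comment above.
PRICES = {
    "Gemini 2.5 flash": (0.30, 2.50),
    "Gemini 3.0 flash preview": (0.50, 3.00),
    "Gemini 3.5 flash": (1.50, 9.00),
}


def multiplier(old: str, new: str) -> tuple[float, float]:
    """Return (input, output) price multipliers going from `old` to `new`."""
    old_in, old_out = PRICES[old]
    new_in, new_out = PRICES[new]
    return new_in / old_in, new_out / old_out


steps = [
    ("Gemini 2.5 flash", "Gemini 3.0 flash preview"),
    ("Gemini 3.0 flash preview", "Gemini 3.5 flash"),
    ("Gemini 2.5 flash", "Gemini 3.5 flash"),
]
for old, new in steps:
    in_x, out_x = multiplier(old, new)
    print(f"{old} -> {new}: input x{in_x:.2f}, output x{out_x:.2f}")
```

On these figures the 3.0 to 3.5 step is 3x on both input and output, while the earlier 2.5 to 3.0 step was closer to 1.7x on input and 1.2x on output.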

dbbk 1 hour ago

I don't think they're really comparable. Seems they created the Flash-Lite tier to take the spot of the old Flash models.
- GodelNumbering 1 hour ago
  
  No, 2.5 had both flash and flash lite.
  
  mlmonkey 5 minutes ago
  
  It is Google, after all ....
rudedogg 1 hour ago

If Google is actually getting cheaper inference than everyone else with their TPUs, this smells like trouble to me. Maybe serving LLMs at a profit is proving difficult.
Or maybe they think because their benchmarks are good they can ramp up the prices. Seems like they don’t have the market share to justify a move like that yet to me.
- IncreasePosts 1 hour ago
  
  Maybe the margins are just very large for Google because they predict so much demand for 3.5?
  
  GodelNumbering 1 hour ago
  
  This combined with locally runnable models getting pretty good recently (e.g. Qwen 3.6) tells me that it's time to seriously consider local dev setup again
  
  MASNeo 55 minutes ago
  
  Besides the cost, you get control, transparency, and the ability to identify small language models or LoRAs that you can serve even more cost-effectively.
  
  cft 17 minutes ago
  
  This should become the new Apple's hardware and software play. I am hopeful about the new CEO
- tempaccount420 56 minutes ago
  
  This is not priced at inference cost.
  My guess: it's the price at which they make more money than if they rent the TPUs to other companies.
  The Gemini team has had trouble securing enough TPUs for their users' needs. They struggle with load and their rate limits are really bad. Maybe at a higher price, they have a better chance at getting more TPUs assigned?
- spyckie2 4 minutes ago
  
  It's probably that in 1 or 2 years local (free) models will completely take the place of cheap models, so cheap models need to move up the quality chain.
  You have free local models for most tasks, $20 subscriptions for near-frontier intelligence, and API per token costs for frontier intelligence.
  Flash seems to be targeting the near-frontier category.
fnordsensei 1 hour ago

3.5 flash is listed as stable rather than preview, or am I misreading?
https://ai.google.dev/gemini-api/docs/models/gemini-3.5-flas...
- GodelNumbering 1 hour ago
  
  ah I mistakenly wrote preview
dr_dshiv 1 hour ago

3.1 flash lite — $0.25/$1.50 — plus insanely fast.
3.1 flash lite isn’t quite as good as 3 flash preview (which is the most incredible cheap model… I really love it) — but 3.1 is half the price and the insane speed opens up different use cases.
For comparison, Opus models are $5/$25
- SwellJoe 24 minutes ago
  
  Opus 4.7 is smarter than even Gemini 3.1 Pro on nearly every metric, though. You're comparing apples to oranges. Gemini 3.1 Flash is somewhere in the neighborhood between current Haiku and Sonnet, I think? Still a better value than the Anthropic models, I guess, which are quite pricey.
  Since Gemini 3.5 Flash is raising the price to $1.50/$9.00, it's priced between Haiku and Sonnet. If it outperforms Sonnet, it remains a good value, I guess. Though DeepSeek V4 Flash is much cheaper than all of them, and seemingly competitive.
doginasuit 1 hour ago

They probably never intended to keep serving cheap models. This is a natural way to introduce the squeeze, now that they have people who built services on their API. It makes a lot of sense to have an abstraction layer where the provider doesn't matter. If you are working in Kotlin, Koog is excellent.
- hnarn 19 minutes ago
  
  > now that they have people who built services on their API
  People really can’t wait to be the next Zynga
ilia-a 1 hour ago

Yeah, it is a massive jump in price, hardly a "Flash" model anymore... I wonder if they'll release flash lite or something with a bit more affordable price point.
LetsGetTechnicl 1 hour ago

Gen AI is unprofitable, especially at the insanely cheap rates they've been offering to get people in the door. So expect more increases in the future.
- GaggiX 50 minutes ago
  
  If you don't need SOTA or near SOTA there are plenty of dirt cheap models, just look at Gemma 4 31B on Openrouter.
  
  ai_fry_ur_brain 25 minutes ago
  
  Imagine reducing yourself to using any of this. My brain is 15X more capable than Opus or the smallest models.
  Miss me with all this laziness. You and your LLMs will never compete with the work I can do all by myself.
  Have fun reducing yourself and betting your competency on the quality and quantity of tokens you can afford.
- npn 22 minutes ago
  
  It is insanely profitable though, if you cut out r&d cost, plus the marketing and loss leaders. Don't let them gaslight you.
  Even Anthropic, which does not own any hardware, still has a big margin providing Claude models.
- roadside_picnic 18 minutes ago
  
  These companies are unprofitable (as all companies at this stage and ambition should be) but I increasingly don't see any justification for the idea that it is fundamentally unprofitable.
  Inference alone is certainly profitable. I'm running models at home for free that are comparable in performance to paid models from a year or so ago. Even for much larger models the costs around inference serving are clearly manageable.
  Training is where the costs are, but I'm increasingly convinced those too could have costs dramatically reduced if necessary. Chinese companies like Moonshot.ai are doing fantastic work training frontier models for a fraction of the cost we're seeing from Anthropic/OpenAI.
  This isn't like Uber or Doordash where the economics fundamentally don't make sense (referring to the early days of these services where rates were very cheap).
  It's a compelling story that "current AI is unsustainable", but it doesn't pan out in practice for a multitude of reasons (not the least of which is that we can always fall back to what models did last year for basically free).
hei-lima 51 minutes ago

We need another "Deepseek moment" or else it will become impossible for the regular dude to use AI. It will become something that only big companies can afford.
- segmondy 43 minutes ago
  
  You can use lots of open weight models today.
  
  hei-lima 21 minutes ago
  
  That's one solution to the problem. But it still needs some good computational capabilities. Either we optimize the hell out of those models, or we wait for the hardware to become good enough for them.
- squidbeak 38 minutes ago
  
  Deepseek had another moment a few weeks ago. V4 isn't far behind the US frontier, and so far its flash variant seems a very reliable coder and costs a pittance.
  
  ai_fry_ur_brain 28 minutes ago
  
  Deepseek V4 (not flash) tripled in price too, by the way (from Deepseek). Get used to this pattern.
  This is what you get for relying on the generosity of billionaires. Keep offshoring your thinking ability to a machine and let me know how competitive you are. Hint: you won't be. There's nothing special about being able to use an LLM.
  
  npn 25 minutes ago
  
  Unlike other providers, Deepseek does promise that they will lower the price when their Huawei cards arrive in a few more months.
  
  aurareturn 24 minutes ago
  
  I think demand is too great and compute is not enough. Nothing to do with billionaires colluding to increase prices by 3x.
  
  dpoloncsak 20 minutes ago
  
  Mate, why are you so mad at people upset that the price tripled? It's a fair complaint that people built services using the cheaper ones with the expectation future models would be similarly priced. You can avoid 'offloading thinking' while still building on top of these models
  
  ls612 13 minutes ago
  
  Anyone can host Deepseek V4 on rented GPUs and sell inference on it. Price will very quickly converge to the marginal cost of inference. This is as close to a pure commodity as it gets in the AI space so competitive market economics will put in work. Same is true for any open-weights model.
  
  ai_fry_ur_brain 5 minutes ago
  
  You don't understand the costs involved in running inference at scale.
  Please go run some numbers. The hardware needed to run Deepseek V4 Flash at 20 tps for a single session is nowhere close to what is required to run it at 50 tps for 5,000 concurrent sessions.
  Imagine what it takes to be profitable when running at 150 tps for 30 cents per 1M tokens. You make less than $1k per month, and the hardware required to run that costs $10k a month to rent with hardly any concurrent session capability.
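The arithmetic in that last paragraph can be written out. This is only a sketch of the commenter's own numbers (150 tokens per second, 30 cents per million tokens, $10k a month in rental), assuming a 30-day month and a stream that never idles. It ignores batching and everything else that determines real serving throughput, so it restates the claim and does not settle the argument.

```python
SECONDS_PER_MONTH = 60 * 60 * 24 * 30  # assumes a 30-day month at 100% utilisation


def monthly_revenue(tokens_per_second: float, price_per_million: float) -> float:
    """Revenue from one always-busy stream generating tokens at a fixed rate."""
    tokens = tokens_per_second * SECONDS_PER_MONTH
    return tokens / 1_000_000 * price_per_million


# Figures from the comment: 150 tps sold at $0.30 per million tokens.
single_stream = monthly_revenue(tokens_per_second=150, price_per_million=0.30)
print(f"${single_stream:,.2f} per month from one stream")

# Number of such streams needed to cover the $10k/month rental figure in the comment.
RENTAL_PER_MONTH = 10_000
break_even_streams = RENTAL_PER_MONTH / single_stream
print(f"{break_even_streams:.1f} concurrent streams to break even")
```

One stream brings in a little under $120 a month on these assumptions, so the question becomes how many concurrent streams the rented hardware can actually sustain.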
- GeorgeOldfield 34 minutes ago
  
  gemini isn't even that good. just tested 3.5 on usual complex prompts to opus/chat 5.5. meh
  
  k8sToGo 8 minutes ago
  
  Are you really comparing flash to opus? Shouldn't you be comparing pro?
irthomasthomas 47 minutes ago

And they are using this to power search answers?
- CooCooCaCha 12 minutes ago
  
  I bet the API pricing helps pay for search users
photonair 46 minutes ago

In general, Gemini Flash is still relatively cheap compared to the "mini" versions of the other big two. However, I agree that newer versions seem to come with multiple-X price increases (similar to the new ChatGPT), and we certainly need competition from the open source models to keep these guys in check on pricing.
llm_nerd 36 minutes ago

It might be temporary pricing given that 3.5 Flash is actually superior to the existing 3.1 Pro in almost all regards, so they're in a bit of a lurch as 3.1 Pro really doesn't make sense given that 3.5 Pro has been delayed a bit.
SwellJoe 35 minutes ago

That's a lot. DeepSeek v4 Flash is just over a tenth the price, and DeepSeek v4 Pro is roughly the same price (currently heavily discounted, but will be $1.74).
I mean, the benchmarks for Gemini 3.5 Flash are very strong, but at those prices it has to be. I guess the time of subsidized tokens from the big guys is slowly coming to an end.
WhitneyLand 19 minutes ago

Their rationale might be that its size and intelligence are growing relative to the market.
Fwiw it’s beating Claude Sonnet in most benchmarking (benchmaxxing?), yet they’ve priced it almost half off on a per token basis.
Question is are you going to persuade anyone with this argument?
Are there many devs at Google who legit prefer Gemini over Claude and Codex? Would love to hear about that.

SXX 2 hours ago

  > Create animated SVG of a frog on a boat rowing through jungle river. Single page self contained HTML page with SVG

3.5 Flash: Thinking Medium - 7516 tokens

https://gistpreview.github.io/?5c9858fd2057e678b55d563d9bff0...

3.5 Flash: Thinking High - 7280 tokens

https://gistpreview.github.io/?1cab3d70064349d08cf5952cdc165...

3.1 Pro - 28,258 tokens

https://gistpreview.github.io/?6bf3da2f80487608b9525bce53018...

Though 3.1 took 3 minutes of thinking to generate, it is the only one that got animated movement.

abi 2 hours ago

Your links are broken FYI.
- John7878781 2 hours ago
  
  They work for me.
  
  TacticalCoder 1 hour ago
  
  They do work here too.
captn3m0 1 hour ago

All three links animate for me.
- NitpickLawyer 1 hour ago
  
  I think they mean the boat is moving. In the flash ones the paddles are animated but the boat is stationary for me.
  
  codazoda 1 hour ago
  
  The boat moves in all three for me
  
  Fishkins 1 hour ago
  
  The boat itself rocks, but do you see the background changing to indicate the boat is progressing through the environment? I only see that in the 3.1 Pro example. I believe that's what the OP meant.
  
  Manuel_D 1 hour ago
  
  I think this illustrates the problem with OP's prompt. If the goal is specifically to implement a scrolling background, this should have been in the prompt.
  
  SXX 1 hour ago
  
  Yup. My bad. It was just the first idea that came to my mind, since I enjoy visually comparing each new release with unique prompts.
wslh 1 hour ago

Can you try with a more complex story such as "three little pigs"? I tried but it created a storybook instead of the SVG animation. I am looking to partially imitate Godogen [1][2] which is really great, even for animations.
[1] https://github.com/htdt/godogen
[2] https://drive.google.com/file/d/1ozZmWcSwieZQG0muYjbj7Xjhhlz...
- SXX 11 minutes ago
  
  I think it's unreasonable to expect models to generate complex stories from a single prompt, since they are trained to be concise, but I tried. This is the prompt, placed on top of the story, with a request for no control buttons:
  Now think, plan how to tell this story in a cartoon, make scene outline and then generate SVG animation story for "Three Little Pigs" in self contained HTML page. Just single animation no control buttons.
  Full prompt in gist comments: https://gist.github.com/ArseniyShestakov/ed9faa53604035005ca...
  Actual results for models, one shot:
  Gemini 3.5 Flash - Three Little Pigs - 9,050 tokens:
  https://gistpreview.github.io/?ed9faa53604035005cae86c63c766...
  Gemini 3.1 Pro - Three Little Pigs - 24,272 tokens:
  https://gistpreview.github.io/?f506bbfd9b4459c8cd55d89605af8...
  Gemma 4 31B IT - Three Little Pigs - 5,494 tokens:
  https://gistpreview.github.io/?a3aa75abbe8fd7818b73f6fa55ee6...
  Gemma 4 26B A4B IT - Three Little Pigs - 6,375 tokens:
  https://gistpreview.github.io/?1e631caebeb54f9f0cd6d0e3d4d5e...
SXX 1 hour ago

Gemini 3.1 Flash Lite Thinking High - 2,526 tokens:
https://gistpreview.github.io/?3496285c5dac5ba10ebbc0b201a1a...
Gemini 2.5 Pro - 5,325 tokens:
https://gistpreview.github.io/?cc5e0fefeaaffecd228c16c95e736...
Gemini 2.5 Flash - 7,556 tokens:
https://gistpreview.github.io/?263d6058fe526a62b8f270f0620ec...
Gemma 4 31B IT - 3,261 tokens via AI Studio:
https://gistpreview.github.io/?858a42b96af864859a3b89508619d...
Gemma 4 26B A4B IT - 4,034 tokens via AI Studio:
https://gistpreview.github.io/?4adb7703897e0c6b583f9de928e4a...
- SXX 1 hour ago
  
  Gemma 4 E4B it via Edge Gallery on pixel phone:
  https://gistpreview.github.io/?da742884e5e830ce71ee4db877519...
  OFC this is just for fun, but nevertheless gave me working code on first try.
abtinf 1 hour ago

hesamation/Qwen3.6-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled-GGUF @ Q6_K
8112 tokens @ 52.97 TPS, 0.85s TTFT
https://gistpreview.github.io/?7bdefff99aca89d1bc12405323bd4...
Full session: https://gist.github.com/abtinf/7bdefff99aca89d1bc12405323bd4...
Generated with LM Studio on a Macbook Pro M2 Max
https://huggingface.co/hesamation/Qwen3.6-35B-A3B-Claude-4.6...
- SXX 1 hour ago
  
  Well, honestly this is quite impressive compared to 3.1 Flash Lite and 2.5 Pro. Considering that 2.5 Pro is actually quite good at generating massive amounts of code one shot.
- svnt 9 minutes ago
  
  It isn’t animated at all for me?
  
  SXX 7 minutes ago
  
  It is animated, just with no movement, like in my 3.5 Flash examples. Try a different browser, unless it's iOS.
franze 1 hour ago

Opus 4.7
https://claude.ai/public/artifacts/128ebe5a-add7-406a-9bce-6...
- tasuki 7 minutes ago
  
  Wow that's terrible. Any idea why?
vtail 37 minutes ago

Here is GPT 5.5 High thinking; I had to add a second follow up prompt "it's not animated though" as the first one was not animated.
https://gistpreview.github.io/?557f979c82701862bc26d24f10399...
- vtail 29 minutes ago
  
  Here is a GPT 5.5 Extra High with a modified instruction:
  > Create animated SVG of a frog on a boat rowing through jungle river. Single page self contained HTML page with SVG. Use the Brave Browser to verifty that the image is indeed animated and looks like a proper rowing frog; iterate until you are satisfied with it.
  It was able to discover and fix an animation bug, but the result is still far from perfect: https://gistpreview.github.io/?029df86d03bfe8f87df1e4d9ed2f6...
krupan 19 minutes ago

These are hilarious. 3.5 Flash Thinking High is the only one that is weirdly deformed (what is going on with the hat in 3.1 Pro??)

OhMeadhbh 30 minutes ago

Am I really so old that when someone says "Flash" my immediate response is... "consider HTML5 instead" ??

nightski 22 minutes ago

Very little of what made the Flash culture so fun made its way into HTML5.
_puk 4 minutes ago

Lol. Young uns!
Flash, ah, ah, saviour of the universe. Flash, ah, ah, he'll save every one of us!
Every time I have heard the word flash for goodness knows how many years.

reconnecting 1 hour ago

Knowledge cutoff: January 2025

Latest update: May 2026

I have a very bad feeling about this lag.

hosel 1 hour ago

Can you explain what you mean?
- nemomarx 57 minutes ago
  
  It might indicate core model training and pre training is really slowing down?
  
  mixtureoftakes 46 minutes ago
  
  also parsing is harder + so much more of the new data is being generated by ai itself.
  still the cutoff is very much concerning and inconvenient
- reconnecting 34 minutes ago
  
  LLM pre-training models risk being unable to be updated with data from after 2025, as much of it is corrupted with LLM-generated content. We might be locked into outdated knowledge, where only whitelisted sources decide what to include.
  Taking into account the sometimes blind belief that 'LLMs know everything', the outcome could be very costly, especially for technologies and businesses unfortunate enough to emerge after 2025.
yoda7marinated 42 minutes ago

I thought that was a choice that Google made?
SwellJoe 15 minutes ago

At least in some cases, there seems to be a move toward training on more synthetic data and strictly curated data, especially for smaller models where knowledge can't be extremely broad, because there just isn't enough room to store the world in tens or hundreds of gigabytes of model weights. So, to achieve higher quality reasoning, the training has to be focused and the data has to be very high quality and high density.
With strong tool use, it maybe doesn't even matter that the models are using older data. They can search for updated information. Though most models currently don't, without a little nudge in that direction.
Also, I believe the Qwen 3 series are all based on the same base model, with just fine-tuning/post-training to improve them on various metrics. Maybe everything in the Gemini 3 series is the same, and maybe they're concurrently training the Gemini 4 base model with updated knowledge as we speak.

lanewinfield 38 minutes ago

Gemini 3.5 Flash's 2000 token clocks aren't bad. https://clocks.brianmoore.com/

wg0 27 minutes ago

3x price increase for a similar model almost. And they said AI would be cheaper and ubiquitous.

alexandre_m 5 minutes ago

Ubiquitous like the crack epidemic.

s3p 1 hour ago

Yikes. I think the concept of a 'flash' model is changing, no? Google used to market this as its lower-intelligence, faster, cheaper option. I appreciate that they are delivering on both of those, but personally I would appreciate if they could create an incremental knowledge improvement while holding price steady. Fortune 500 companies have to make their money I guess.

2001zhaozhao 1 hour ago

I think flash just means "fast" now

npn 1 hour ago

The price is crazy.

And I guess Gemini 3.5 pro will have the pricing increment, too. 12 x 5 = 60?

It seems like google does want us to use Chinese models.

OsrsNeedsf2P 1 hour ago

Beats 3.1 Pro for price per token, but artificial analysis is showing it's dumber per token and costs more overall

sauwan 1 hour ago

Yeah, bummer. I was very excited for this release, but this killed it.
droidjj 1 hour ago

The pricing is an issue.
golfer 1 hour ago

Arena.ai is saying "Gemini 3.5 Flash’s pricing shifts the Pareto frontier in Text. 8 models from GoogleDeepMind dominate the Text Arena Pareto curve where only 4 labs are represented for top performance in their price tiers."
https://x.com/arena/status/2056793180998361233

asar 2 hours ago

$1.5/m input tokens $9/m output tokens

6x the price of 3.1 flash lite

himata4113 2 hours ago

I don't think input/output pricing matters, 90% of the cost is cache. $0.15 is pretty good, but still very expensive.
- minimaxir 2 hours ago
  
  10% of input pricing is standard especially compared to competition.
  
  himata4113 2 hours ago
  
  yah, which means that the input cost is the only value that should be paid attention to at the end + the cache discount (x10). If google would start offering x20 discount it would make it twice as cheap while input and output stayed the same.
- wolttam 2 hours ago
  
  It depends on the use-case. yes, 90% of cost is cache in agentic coding scenarios (actually 95% in my experience). But not when the model reasons for 200k+ tokens before answering a complex problem.
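A rough way to see both points is to split a session's bill into fresh input, cached input, and output. The sketch below uses the $1.50/$9.00 prices and the 10x cache discount mentioned in this thread. The two workloads and their token counts are entirely hypothetical, and `session_cost` is a made-up helper.

```python
def session_cost(fresh_in: int, cached_in: int, out: int,
                 input_price: float = 1.50, output_price: float = 9.00,
                 cache_discount: float = 10.0) -> dict[str, float]:
    """Split a session's dollar cost into fresh-input, cached-input and output parts."""
    cached_price = input_price / cache_discount
    parts = {
        "fresh_input": fresh_in * input_price / 1e6,
        "cached_input": cached_in * cached_price / 1e6,
        "output": out * output_price / 1e6,
    }
    parts["total"] = sum(parts.values())  # summed before "total" itself is added
    return parts


# Hypothetical agentic coding session: a long prefix re-read from cache on every turn.
agentic_x10 = session_cost(fresh_in=200_000, cached_in=50_000_000, out=100_000)
agentic_x20 = session_cost(fresh_in=200_000, cached_in=50_000_000, out=100_000,
                           cache_discount=20.0)
# Hypothetical reasoning-heavy request: short prompt, 200k tokens of output.
reasoning = session_cost(fresh_in=5_000, cached_in=0, out=200_000)

for label, cost in [("agentic, x10 cache discount", agentic_x10),
                    ("agentic, x20 cache discount", agentic_x20),
                    ("reasoning-heavy", reasoning)]:
    cached_share = cost["cached_input"] / cost["total"]
    print(f"{label}: total ${cost['total']:.2f}, cached share {cached_share:.0%}")
```

With these made-up numbers, cached reads dominate the agentic bill and doubling the discount cuts the total by a bit over 40% (not quite half, since fresh input and output are unchanged), while the reasoning-heavy request is almost entirely output cost.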
  
  himata4113 1 hour ago
  
  gemini models solve a problem in 80% less tokens so that's something to think about.
  
  johaugum 1 hour ago
  
  Source?
- __jl__ 2 hours ago
  
  In our experience, caching is not very reliable with Google. We always get random cache misses that don't happen with other providers. We find OpenAI, Anthropic and Fireworks (which we use a lot) all have higher cache hit rates. So it's not only about the cost of cached tokens but also what kind of cache hit rate you get.
  
  svachalek 1 hour ago
  
  In my experience Google is the most flaky in general, which is surprising considering the rock solid history of their search and other products. Just more likely not to respond at all, to give a response out of left field, to handle the same error in 12 different ways randomly (a rainbow of HTTP status codes and error messages), etc etc.
  
  veselin 34 minutes ago
  
  Exactly our experience too. Effectively, we catch these, and on these status codes we send the request to OpenAI. Retrying the same query on Gemini has a high chance of giving more or less the same status code.
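The pattern described here (fall back to a second provider on certain status codes instead of retrying the first) might look roughly like the sketch below. Everything in it is hypothetical: `ProviderError`, the set of status codes, and the stub providers stand in for whatever client libraries and error types are actually in use.

```python
from typing import Callable


class ProviderError(Exception):
    """Hypothetical error type carrying the HTTP status a provider returned."""

    def __init__(self, status: int):
        super().__init__(f"provider returned HTTP {status}")
        self.status = status


# Statuses treated as "do not retry here, go elsewhere". This set is an assumption.
FALLBACK_STATUSES = frozenset({429, 500, 503})


def complete_with_fallback(prompt: str,
                           primary: Callable[[str], str],
                           fallback: Callable[[str], str]) -> str:
    """Try the primary provider once; on a listed status, send the prompt to the fallback."""
    try:
        return primary(prompt)
    except ProviderError as err:
        if err.status in FALLBACK_STATUSES:
            return fallback(prompt)
        raise  # anything else (e.g. a 400) is probably our bug, so surface it


# Stub providers standing in for real API clients.
def flaky_primary(prompt: str) -> str:
    raise ProviderError(503)


def steady_fallback(prompt: str) -> str:
    return f"fallback answered: {prompt}"


print(complete_with_fallback("hello", flaky_primary, steady_fallback))
```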
- simonw 1 hour ago
  
  Gemini caching is confusing though:
  $0.15 / million tokens
  $1.00 / 1,000,000 tokens per hour (storage price)
  I much prefer the OpenAI/DeepSeek way of pricing caching where you don't have to think about storage price at all - you pay for cached tokens if you reuse the same prefix within a (loosely defined) time period.
  
  simonw 6 minutes ago
  
  As far as I can tell Gemini caching DOES work like OpenAI - see implicit caching here: https://ai.google.dev/gemini-api/docs/caching
  I confirmed this by running a bunch of prompts through Gemini 3.5 Flash without doing anything special to configure caching and noting that it comes back with a "cachedContentTokenCount" on many of the responses.
  The "storage price" quoted is for an optional Gemini feature that most people don't care about: https://ai.google.dev/gemini-api/docs/caching#explicit-cachi...
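For anyone wanting to repeat that check, a minimal sketch of reading the cached token count out of a response body follows. It assumes the JSON response carries a `usageMetadata` object with `promptTokenCount` alongside the `cachedContentTokenCount` field mentioned above. The sample payload is made up and not from a real call.

```python
def cache_hit_ratio(response: dict) -> float:
    """Fraction of prompt tokens served from cache, per the response's usage metadata."""
    usage = response.get("usageMetadata", {})
    prompt_tokens = usage.get("promptTokenCount", 0)
    # Treat a missing field as zero cached tokens (assumed behaviour on a cache miss).
    cached_tokens = usage.get("cachedContentTokenCount", 0)
    return cached_tokens / prompt_tokens if prompt_tokens else 0.0


# Made-up response bodies for illustration only.
hit = {"usageMetadata": {"promptTokenCount": 4000,
                         "cachedContentTokenCount": 3000,
                         "candidatesTokenCount": 250}}
miss = {"usageMetadata": {"promptTokenCount": 4000,
                          "candidatesTokenCount": 250}}

print(cache_hit_ratio(hit))   # 0.75
print(cache_hit_ratio(miss))  # 0.0
```

Logging this ratio per request is also one way to measure the cache hit rates other commenters are comparing across providers.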
John7878781 2 hours ago

[deleted]
- stri8ed 2 hours ago
  
  Output cost is 3x from Gemini 3 flash.
iwhalen 2 hours ago

I wonder why they didn't discuss price in the post?
Compare to the GPT-5.5 announcement: https://openai.com/index/introducing-gpt-5-5/
WarmWash 1 hour ago

I haven't used 3.5 at all yet, but previous Gemini (and Gemma) models are by far the most token-light per task of any model.
Cost per task is a more productive measure, but obviously a more difficult one to benchmark.
Aunche 1 hour ago

"Flash-Lite" is a different product from "Flash", which is more expensive. They couldn't be more confusing with their naming though, especially since they have 3.1 Pro and not 3.1 Flash non-lite.

bredren 11 minutes ago

Can anyone who has extensive, recent, experience with Claude code and Codex contextualize the current Gemini CLI product experience?

x3cca 26 minutes ago

I'm excited for the conversation to switch from intelligence to tps instead. I care much less about what hard thought experiments models can one shot and much more how responsive my plain text interface for doing things is.

golfer 1 hour ago

Arena.ai:

> Gemini 3.5 Flash’s pricing shifts the Pareto frontier in Text. 8 models from GoogleDeepMind dominate the Text Arena Pareto curve where only 4 labs are represented for top performance in their price tiers.

https://x.com/arena/status/2056793180998361233

h14h 30 minutes ago

Given how widely varying the amount of tokens each model uses for a given task, "price-per-token" is essentially meaningless when doing this sort of comparison.
Artificial Analysis's "Cost to run" model (aka num_tokens_used * price_per_token) is much better, but even that is likely problematic since it's not clear whether running a bunch of benchmarks maps cleanly to real-world token use.
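The `num_tokens_used * price_per_token` idea is easy to sketch. The two models below are invented for illustration (names, prices, and token counts are all hypothetical); the point is only that the ranking by price per token and the ranking by cost per task can disagree.

```python
def cost_to_run(tokens_used: int, price_per_million: float) -> float:
    """Dollar cost of a task: tokens used times the per-token price."""
    return tokens_used * price_per_million / 1_000_000


# Hypothetical models: names, prices and token counts are invented for illustration.
models = {
    "cheap_but_verbose": {"price_per_million": 3.00, "tokens_used": 30_000},
    "pricey_but_terse": {"price_per_million": 9.00, "tokens_used": 7_500},
}

costs = {
    name: cost_to_run(m["tokens_used"], m["price_per_million"])
    for name, m in models.items()
}
cheapest_per_token = min(models, key=lambda n: models[n]["price_per_million"])
cheapest_per_task = min(costs, key=costs.get)

print(costs)
print("cheapest per token:", cheapest_per_token)
print("cheapest per task:", cheapest_per_task)
```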

paperwork360 22 minutes ago

Google also updated Antigravity. Version 2.0 is geared more toward conversation with an agent. The previous VS Code-like IDE was much better.

merb 1 hour ago

Still no new processor version for Document AI https://docs.cloud.google.com/document-ai/docs/release-notes that is so weird. (Custom extractor)

It’s not possible to uptrain on preview releases and it did not get that much love for a while.

MASNeo 58 minutes ago

Well, available for Gemini means these days that half the time they are “Receiving a lot of requests right now.” and so sorry they couldn’t complete the task. Luckily the model supports long time horizons because that’s what’s needed. /me likes Gemini a lot just wishing Google would add the compute!

mackross 50 minutes ago

The antigravity teamwork-preview doesn't work for me -- upgraded to ultra, installed antigravity 2, ran teamwork-preview, keeps failing: "You have exhausted your capacity on this model. Your quota will reset after 0s."

aliljet 2 hours ago

Is there a good benchmark tracking hallucinations? The models are all incredibly good now, even the open ones, and my hope is that the rate of hallucinations is something that's falling off in concert with larger and larger context lengths.

Sevii 2 hours ago

I haven't been bothered by hallucinations in premier models since early last year. Still see it in smaller local models though.
- aliljet 1 hour ago
  
  I'm really running into this deep at the edges of content creation. Take, for example, a need to generate some kind of legal work. The cost of painstakingly checking and rechecking each case cited is reducing the value of these frontier models immensely.
  Coding, however, is solved like magic. Easier to add tests, to be fair.
throawayonthe 1 hour ago

well there is https://artificialanalysis.ai/evaluations/omniscience
- goldenarm 1 hour ago
  
  It's a gibberish input detection benchmark, and does not measure output hallucinations.
yieldcrv 1 hour ago

if last year's models were the ones people got familiar with in late 2022, hallucinations would be an underrepresented rumor, there would be no articles about it because it's so rare. overconfident lawyers wouldn't have messed up dockets in court with fake case law. in other domains that move faster, sources would be only partially outdated, with agentic search and mcp servers filling in the gaps
AI psychosis would be the problem people talk about more, not just outright agreement but subtle ways of making you feel confident in your ideas. "yes, buy that domain name buy these other ones for defensibility"
(the domain name is dumb and completely unmarketable)
- jampekka 1 hour ago
  
  The models still hallucinate bad when called via APIs, especially if web search is not enabled. Gemini hallucinates quite frequently even with the app and search enabled. More recent (e.g. ChatGPT 5.x and Deepseek v4) prompts/harnesses search very aggressively, which does greatly mitigate hallucinations.
majso 1 hour ago

maybe something like this? https://petergpt.github.io/bullshit-benchmark/viewer/index.v...
WarmWash 1 hour ago

People complain about them incessantly, but I can almost never get people to actually post receipts. Every provider allows sharing chats, and anyone can share a prompt that reliably produces hallucinations.
More often than not, people are using images in responses that go awry. Which is fair, the models are sold as multi-modal, but image analysis is still at gpt-4.0 text-analysis levels.
Also knowledge cutoff issues, where people forget the models exist months to a year or more in the past.
- saberience 1 hour ago
  
  I see hallucinations ALL the time. It's only obvious when you're prompting about a subject you know well.
  And when I say all the time, I mean it, and this is for Opus 4.7 Adaptive.
  I often have to say, please do searches and cite sources, as if it doesn't it will confidently give me wrong or outdated information.
  If you're often asking questions about a topic that's not in your specialist knowledge you won't notice them.
  
  droidjj 1 hour ago
  
  Hallucination is also much better controlled in the context of agentic coding because outputs can be validated by running the code (or linters/LSP). I almost never notice hallucinations when I’m coding with AI, but when using AI for legal work (my real job) it hallucinates constantly and perniciously because the hallucinations are subtle—e.g., making up a crucial fact about a real case.
  
  krupan 6 minutes ago
  
  Yes, you can catch many mistakes that LLMs make while coding, but I wouldn't necessarily call it "controlled." Every now and then the LLM will run into dead ends where it makes a certain mistake, the compiler or unit tests find the mistake, so it tries a different approach that also fails, and then it goes back to the first approach, then tries the second approach again, and gets stuck in an endless loop trying small variations on those two approaches over and over.
  If you aren't paying attention it can spend a long time (and a lot of tokens) spinning in that loop. Sometimes there might be more than two approaches in the loop, which makes it even harder to see that it's repeating itself in a loop. It's pretty frustrating to see it working away productively (so you think) for 20 minutes or so only to finally notice what's going on
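  A minimal sketch of catching that kind of loop, assuming your harness can hand you each attempted patch and the error it produced. `fingerprint` and `LoopDetector` are hypothetical names, not part of any agent framework, and this only catches repeats that are identical after whitespace cleanup, not every "small variation":

```python
import hashlib
from collections import Counter


def fingerprint(patch: str, error: str) -> str:
    """Hash a (patch, error) pair after collapsing whitespace, so trivially
    reformatted retries map to the same fingerprint."""
    normalized = " ".join(patch.split()) + "\n" + " ".join(error.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class LoopDetector:
    """Counts how often each (patch, error) fingerprint has been seen."""

    def __init__(self, max_repeats: int = 2):
        self.max_repeats = max_repeats
        self.seen = Counter()

    def record(self, patch: str, error: str) -> bool:
        """Record one attempt; return True once the same attempt has been
        seen at least `max_repeats` times."""
        key = fingerprint(patch, error)
        self.seen[key] += 1
        return self.seen[key] >= self.max_repeats
```

  When `record` returns True you could pause the agent and look at what it is doing instead of letting it burn tokens for another 20 minutes.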
- rjh29 1 hour ago
  
  "People complain about them incessantly, but I can almost never get people to actually post receipts."
  ...my chats are all pretty long and involve personal conversations, or I've deleted them. It's a lot to ask for someone to post receipts. The number of complaints is enough data.
  No matter how big the model is there will be edge cases where it has no data or is out of date. In these cases it just makes stuff up. You can detect it yourself by looking for words like usually or often when it states facts, e.g. "the mall often has a Starbucks." I asked it about a Genshin Impact character released in June 2025 and it consistently interpreted the name (Aino) as my player character because Aino wasn't in its data.
  Honestly I'm surprised you haven't encountered it if you're using it more than casually. Pro is much better but not perfect.
  
  ls612 8 minutes ago
  
  Claude has gotten good in the past month or two at recognizing when it might need to search the web for updated info rather than saying that it has no idea what I'm talking about or making stuff up.
- hibikir 57 minutes ago
  
  I see constant hallucination in claude code when using specific tooling: It thinks it knows aws cli, for instance, but there are some flags that don't exist which it attempts to use all the time in 4.6 and 4.7. When asked about it, it says that yes, the flag doesn't exist in that command, but it exists in a different command (which it does), and yet it still attempts to use it unless given extra info.
  Claude also believes it knows how AWS' KMS works, quite confidently, while getting things wrong. I have a separate "this is how KMS replication actually works" file just to deal with its misconceptions.
  For gemini, I typically use it to query information from large corpuses, but it often web searches and hallucinates instead of reading the actual corpus. On a book series, it will hallucinate chapters and events which, while reasonable and plausible, do not exist. "Go look at the files and see if your reference is correct" shows that it's not correct, and it's a mandatory step. That doesn't prevent hallucination, though; it only makes sure you catch it after the fact, just like a method in a class that doesn't exist gets found out by the compiler. The LLM still hallucinated it.
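  The "go look at the files" step can also be done outside the model. A rough sketch, assuming the corpus is already loaded as a name-to-text mapping and the model was asked to quote its source verbatim; `normalize` and `find_quote` are made-up helper names:

```python
def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so line wrapping does not matter."""
    return " ".join(text.split()).lower()


def find_quote(quote: str, documents: dict[str, str]) -> list[str]:
    """Return the names of the documents that contain `quote` verbatim
    (after normalization). An empty list means the quote was not found."""
    needle = normalize(quote)
    if not needle:
        # An empty quote would trivially match everything, so reject it.
        return []
    return [name for name, text in documents.items() if needle in normalize(text)]
```

  Like the compiler analogy, this catches a made-up reference after the fact; it does nothing to stop the model from producing it.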
- hamdingers 50 minutes ago
  
  I can reliably produce hallucinations with this genre of prompt: "write a script that does <simple task> with <well known but not too-well-known API>." Even the frontier models will hallucinate the perfect API endpoint that does exactly what I want, regardless of if it exists.
  The fix is easy enough though, a line in my global AGENTS.md instructing agents to search/ask for documentation before working on API integrations.
  
  sapneshnaik 43 minutes ago
  
  Yeah. Better to have more details in your prompt than fewer. For example, I use this:
```
Build a Nango sync that stores Figma projects.
Integration ID: figma
Connection ID for dry run: my-figma-connection
Frequency: every hour
Metadata: team_id
Records: Project with id, name, last_modified
API reference: https://www.figma.com/developers/api#projects-endpoints
```
  Note: You need a Nango account and the Nango Skill installed before it can work.
- asdfasgasdgasdg 48 minutes ago
  
  https://gemini.google.com/share/9cd8ca68025a
  I was trying to understand a game I've been playing, The Last Spell. I asked it for a tier list of omens -- which ones the community considers most important. At least a few of the names it posts are hallucinated ("omen of the sun" does not exist, and the omens that give extra gold are "savings," "fortune," and "great wealth").
  Obviously not a critical use case but issues like this do keep me on my toes regarding whether the thing is working at all. I should ask 3.5 flash to do the same job. (I did try and it once again hallucinated the omen names and some of the effects.)
- brooksc 16 minutes ago
  
  I asked gemini 3.1 Pro to search for the linkedin URLs for a list of peers. It generated a plausible list of links -- but they were all hallucinated. On a follow up it confirmed it couldn't actually search, but didn't tell me that without prompting.
FergusArgyll 1 hour ago

As long as the model uses web search, they almost never hallucinate anymore. The fast models (haiku, gpt-instant, flash) still sometimes have the problem where they don't search before answering so they can hallucinate
- goldenarm 1 hour ago
  
  I've seen chatGPT and Gemini hallucinate even from web search, it's better but not sufficient
krupan 4 minutes ago

It really depends what you are asking it. If the answer is in the training data, then the odds of it lying to you are much lower than if you are asking it for something it has never seen before.

golfer 2 hours ago

Here's the benchmark scoreboard they published:

https://storage.googleapis.com/gweb-uniblog-publish-prod/ori...

himata4113 2 hours ago

Engineers at google have publicly stated that the models are too big and are far from their potential. Glad they're being proven right with every release.

They continue to focus on smaller models while openai and anthropic are increasing compute requirements for their SOTA models.

stri8ed 2 hours ago

Given the cost increase associated with this model, and previous model releases, I think the size is trending upwards, not down.
- himata4113 2 hours ago
  
  The speed says otherwise. I think they're increasing costs since they want to start seeing ROI.
  
  JanSt 1 hour ago
  
  Those are (mostly) new, faster TPU
  
  himata4113 1 hour ago
  
  The latest TPUs appear to reach 800tok/s rather than the advertised 300tok/s.
maipen 2 hours ago

Don’t let that fool you. Google will have SOTA models as big as or even bigger than their competitors.
They are just refining their current models while they finish training the next generation.
They will all come out at about the same time. Anthropic, OpenAi, Google, xAI
- ACCount37 2 hours ago
  
  Anthropic has been sitting on Mythos for a while now. I guess they don't feel pressured to fuck it ship it until anyone else gets a 10T to work.
  
  Sevii 2 hours ago
  
  It's doubtful they have the compute to make mythos publicly available even after the SpaceX datacenter deal. And why sell it publicly if people are still willing to pay for Opus 4.7?
  
  outside1234 1 hour ago
  
  I suspect that Mythos doesn't have a business model that works
  
  throwa356262 1 hour ago
  
  According to people who have access to Mythos, it is slightly worse than GPT-5.5-xhigh. At least for security tasks.
  Hold on, I think this claim needs some hard data. Here you go gentlemen:
  https://www.aisi.gov.uk/blog/our-evaluation-of-openais-gpt-5...
  
  ACCount37 1 hour ago
  
  That claim keeps getting contradicted hard by other parties, who say Mythos beats 5.5 resoundingly on both autonomous search and discovery and creation of complex exploit chains.
  There might be a harness difference, but also, this CTF-type benchmark might not capture the capability difference fully.
  
  aesthesia 1 hour ago
  
  See the later post testing a newer Mythos checkpoint, though: https://www.aisi.gov.uk/blog/how-fast-is-autonomous-ai-cyber...
  
  abirch 1 hour ago
  
  Anthropic can sell Mythos to Fortune 500 companies and bypass the average user. I'm not sure how much is hype but I see things like this https://blog.cloudflare.com/cyber-frontier-models/
howdareme 2 hours ago

Google’s pro models are almost certainly bigger than Openai’s lol
- fikama 1 hour ago
  
  Why would that be? I am curious why do you think that.
  
  ActorNightly 45 minutes ago
  
  Because TPUs are more efficient, and it's cheaper for them to field them in higher quantity since they own the chip.
  
  mnicky 43 minutes ago
  
  E.g. because they are behind on research and so must compensate with size to achieve similar level of intelligence. At least this is what I heard.
  For intelligence/size only OpenAI and Anthropic are the frontier. Google has more compute so it can compensate for that with size of the models...
Jabbles 1 hour ago

> Engineers at google have publicly stated that the models are too big and are far from their potential
Can you link to a source?
Dinux 1 hour ago

Source please, because I don't believe that for one second
ActorNightly 41 minutes ago

I mean, yes and no.
Nobody really knows which one is more optimal:
* Large model trained on a large amount of data across multiple domains, that doesn't need any extra content to answer questions.
* Smaller model that is smart enough to go fetch extra relevant content, and then operate on essentially "reformatting" the context into an answer.

mixtureoftakes 2 hours ago

benchmarks look REALLY good, the price hike is big but it also beats sonnet 4.6 in every discipline?

eis 2 hours ago

3.5 Flash was more expensive than 3.1 Pro to run the Artificial Analysis test suite. $1551 for 3.5 Flash [0] vs $892 for 3.1 Pro [1]. That's 74% more cost while ranking lower. It's 2.5x as fast but I don't think the bang for the buck is there anymore like it was with 3.0 Flash. I'm a bit bummed out to be honest.

I did not expect such a huge (3x) price increase from 3.0 Flash and I bet many people will not just blindly upgrade as the value proposition is widely different.

One interesting point to note is that Google marked the model as Stable in contrast to nearly everything else being perpetually set as Preview.

[0] https://artificialanalysis.ai/models/gemini-3-5-flash
[1] https://artificialanalysis.ai/models/gemini-3-1-pro-preview

ls_stats 1 hour ago

>3.5 Flash was more expensive than 3.1 Pro to run the Artificial Analysis test suite
That's everything I needed to know.
ekojs 1 hour ago

Seems like the only good thing about 3.5 Flash is its speed. Not cost-competitive or benchmark-leading by any means.
mijoharas 1 hour ago

That's what I came here to check. Last model release they only put it into preview[0] at first.
Does that mean this model is production ready?
[0] https://news.ycombinator.com/item?id=47076484
pingou 56 minutes ago

How do they calculate that?
3.1 has 57M output tokens from Intelligence Index, 3.5 Flash has 73M, so not a lot more, and 3.5 is a bit cheaper, I don't get how 3.5 can be 74% more expensive.
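For what it's worth, here is the arithmetic as a quick sketch. The run totals ($1551 and $892) and the token counts (73M and 57M) are the figures quoted in this thread; the per-token prices are the standard output prices mentioned elsewhere in the thread ($9/1M for 3.5 Flash, $12/1M for 3.1 Pro) and are an assumption about what applied to the benchmark run. The function names are just for illustration:

```python
def percent_more(new: float, old: float) -> float:
    """How much more `new` is than `old`, in percent."""
    return (new / old - 1) * 100


def output_cost(output_tokens_millions: float, price_per_million: float) -> float:
    """Cost in USD of the output tokens alone, at a flat list price."""
    return output_tokens_millions * price_per_million


# Reported totals for the whole test suite (USD).
flash_total, pro_total = 1551, 892
print(round(percent_more(flash_total, pro_total)))  # 74

# Output-only estimate from the token counts and assumed list prices.
print(output_cost(73, 9.0))   # 657.0 for 3.5 Flash
print(output_cost(57, 12.0))  # 684.0 for 3.1 Pro
```

So the 74% does follow from the two reported totals, but an output-only estimate at list price puts 3.5 Flash slightly below 3.1 Pro. Something other than output tokens at list price would have to account for the gap, and nothing in the thread says what.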

noelsusman 1 hour ago

The Artificial Analysis benchmark results are pretty underwhelming. Roughly the same "intelligence" as MiMo-V2.5-Pro for over 3x the cost. We'll have to see how that translates to actual usage but it's not a great sign.

hydra-f 1 hour ago

That really depends on whether they have similar parameter counts, doesn't it? Unless you know that, the comparison is just strange
- halJordan 38 minutes ago
  
  Bad look to tell people they're not allowed to compare things just because we need to respect Google's privacy
  
  hydra-f 21 minutes ago
  
  I didn't take the price into consideration when writing that. I meant to point out that even if they have similar scores, the Flash model might be smaller than MiMo or Kimi, which would by itself be a win
  That said, haste makes waste as the price point completely invalidates that

swe_dima 2 hours ago

Flash family but costs like a Pro. $9 vs $12 for output.

ai_fry_ur_brain 23 minutes ago

Imagine reducing yourself to the worst of averages by making your competency 1:1 correlated to the tokens that you have access to (and everyone else does).

casey2 33 minutes ago

I think the field moved to agents too fast. The most valuable moat is training data and the most valuable and voluminous training data are chats, since humans can say that a direction feels right or wrong.

nightski 1 hour ago

AI being a product is not the future. It's more like an operating system that deserves to be open and free (aka Linux). Unless that happens we are in for a very dystopian future. I wish I had the intelligence, resources and/or connections to try and make that happen.

lugu 45 minutes ago

What we need today is a standard local API (think of it as a POSIX extension). So that each desktop app that needs AI to enhance a feature can simply call that. This way, those apps will need to handle the case where AI is not available. This will empower users.

alexdns 2 hours ago

It's Gemini 3.5 Flash

nerdalytics 2 hours ago

Yeah, Google chose a misleading title for the blog post.
- jader201 1 hour ago
  
  > Today, we’re introducing Gemini 3.5, our latest family of models combining frontier intelligence with action. This represents a major leap forward in building more capable, intelligent agents. We’re kicking off the series by releasing 3.5 Flash.

hubraumhugo 1 hour ago

Just updated my HN Wrapped project with it and it does well on my totally unscientific LLM humor benchmark: https://hn-wrapped.kadoa.com

amarant 37 minutes ago

Lol, nice project! I liked the xkcd-style comic the most!
I'm only gonna cry a little bit about the all-too-accurate roasts. Some of that stuff cut deep!

stan_kirdey 1 hour ago

EXPENSIVE ._.

f311a 2 hours ago

$9/1M output

explosion-s 2 hours ago

I wonder if this is because it's a larger model or maybe just because they can? Although with the latest Deepseek it's really tough to compete pricing wise. Inference speed and integration (e.g. Antigravity) might be their only hope here
- hydra-f 53 minutes ago
  
  It has to be a larger model, wouldn't make much sense otherwise. That isn't to say the price isn't artificially increased as well
  The Antigravity harness is really well done, so I do agree it's their strong suit. Can't say the same about gemini-cli (though it has a really nice interface)
  Would still choose Deepseek for the price

bakugo 2 hours ago

Triple the price of the last Flash model ($3 -> $9 per 1M output). Quickly approaching Sonnet prices.

Feels like the AI pricing noose is tightening sooner rather than later.

andrewstuart 1 hour ago

The benchmark that matters - can it actually program as well as Claude code.

If not then I’m not using it.

Cancelled my account 3 months ago, only Claude code level capability would bring me back.

simianwords 1 hour ago

No one talking about how this flash Beats Pro? Imagine what 3.5 pro looks like?

Also concerned about Gemini models being benchmaxxed generally

NitpickLawyer 1 hour ago

> concerned about Gemini models being benchmaxxed generally
I would say they are the least benchmaxxed out of all the top labs, for coding. They've always been behind opus/gpt-xhigh for agentic stuff (mostly because of poor tool use), but in raw coding tasks and ability to take a paper/blog/idea and implement it, they've been punching above their benchmarks ever since 2.5. I would still take 2.5 over all the "chinese model beats opus" if I could run that locally, tbh.

llmslave 1 hour ago

Conspiracy theory:

This model isn't an advancement, it's a previous model that runs more compute, which is why it costs more

npn 1 hour ago

Nah, it costs what you are willing to pay.

ralusek 55 minutes ago

Those prices, what a disappointment.

cesarvarela 2 hours ago

Add Flash to the title, please.

meetpateltech 1 hour ago

edited it.

HardCodedBias 1 hour ago

Oh boy.

GDM is making (or has been backed into a corner into making) the bet that high throughput, low latency, low capability models are the path forward.

That probably works for vibe coded apps by non-practitioners.

I suspect that practitioners/professionals will wait longer for better results.

brokencode 1 hour ago

Where do you see that it’s low capability?
And Google is trying to make something affordable enough for a mass market, ad-supported audience.
They aren’t hyper focused on enterprise like Anthropic is. And that’s okay. There’s room for different players in different markets.

jdw64 47 minutes ago

Honestly, I feel like the new Gemini 3.5 Flash is a failure. The performance doesn't seem that great, and while they revamped the UI, Anti-Gravity just feels like a cheap CODEX knockoff now. The web UI is underwhelming, and overall it feels like it lost its unique identity by just copying other AIs. It’s a flop in both performance and price point. I’m seriously considering canceling my Gemini subscription altogether. Using Chinese AI models might actually be a better option at this point

warthog 1 hour ago

GPT-5.5 on the benchmarks still seems to perform better than this

Plus the vibe of the gemini models is so weird, particularly when it comes to tool calling

At this point I kinda need them to shock me to make the switch

benbencodes 2 hours ago

Pricing is now live on ai.google.dev/pricing:

Gemini 3.5 Flash: $0.75 input / $4.50 output per 1M tokens, 1M context window. The output price explicitly "includes thinking tokens" — which is why it's higher than a typical flash-class model.

For comparison within the Gemini lineup:
* Gemini 2.5 Flash: $0.30 / $2.50
* Gemini 3.1 Flash-Lite: $0.25 / $1.50
* Gemini 3.1 Pro Preview: $2.00 / $12.00

So 3.5 Flash is ~2.5x more expensive input vs 2.5 Flash. The pricing and "including thinking tokens" framing position it as a reasoning-capable flash model rather than just a pure speed optimization.

conorh 2 hours ago

I think you have your pricing wrong there, Gemini 3.5 flash is $1.50 input and $9 output.
- mchusma 1 hour ago
  
  Okay, it's kind of somewhere between haiku and sonnet level pricing, at somewhere between sonnet and opus level performance. It's a great option. I was hoping to see opus class intelligence at haiku level pricing out of google, and this is close to that!
  
  mchusma 1 hour ago
  
  Never mind, after looking at more benchmarks, seems closer to sonnet level intelligence at slightly lower cost. Speed is great for latency sensitive applications, but if this was 1/2 the cost it would have been priced to win.
  If this is the big model release out of google, it's a disappointment.
jpau 2 hours ago

Standard pricing is showing for me as $1.50 / $9.
(I suspect you're viewing the "flex" pricing).
lyjackal 1 hour ago

You’re quoting the batch pricing. On demand is $1.50 per M input and $9 per M output. This is effectively comparable in cost to Gemini 2.5 Pro, in a Flash-tier model
ls_stats 1 hour ago

You are seeing batch inference, standard inference is $1.5/$9. I was excited until I saw that price.
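A small sketch of what those rates mean per request, assuming the standard $1.50 / $9 per 1M tokens quoted in the replies above and a flat rate with no caching or tiering. `request_cost` is a made-up helper, and the token counts are the ones in the llm-prices link for the pelican test earlier in the thread (11 input, 14403 output):

```python
def request_cost(
    input_tokens: int,
    output_tokens: int,
    input_price_per_million: float = 1.50,  # assumed standard input rate, USD
    output_price_per_million: float = 9.00,  # assumed standard output rate, USD
) -> float:
    """Cost in USD of one request at flat per-token rates."""
    return (
        input_tokens * input_price_per_million
        + output_tokens * output_price_per_million
    ) / 1_000_000


# The pelican request: 11 input tokens, 14403 output tokens.
print(round(request_cost(11, 14403), 2))  # 0.13
```

That lines up with the 13 cents reported for the pelican, and the $0.75 / $4.50 figures quoted above are exactly half of the standard rates.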
Tiberium 1 hour ago

Please delete/edit your AI-written and factually wrong post.
MallocVoidstar 46 minutes ago

In addition to people pointing out your LLM got the pricing wrong,
> The pricing and "including thinking tokens" framing position it as a reasoning-capable flash model rather than just a pure speed optimization
Every Gemini model starting with 2.5 has been a reasoning model.