I never want to hear from developers again that they are not susceptible to marketing. I see meet ups specifically about Claude often.
Modern tupperware party.
A colleague was convinced Claude is better so we played a game. We used the claude code and codex harness and I implemented some prs they needed with gpt5.5 and opus4.7 and asked them to identify which came from which only from the code.
Couldn’t tell.
Edit: i bet 99% of people here, if presented with a test where i gave 5 models but all of the results came from one, would not be able to discern this. Just vibes all the way down.
I think for developers the distinction is that ChatGPT is this commercial all-in-one solution for normies and Claude is specifically for developers. In reality, as you say, the results for normal developers are indistinguishable.
Last year I used a bunch of models to try to generate Rust code. They all sucked.
This February I tried again and used Claude to generate Rust code. I have never been more stunned in my life. It's just as good as I am, and 30x faster. No fluff, the code is verbatim just as I would have written.
I then tried other models. Total disappointment.
I've continued to repeat this experiment. Opus is the only model that can write Rust reasonably.
Codex produces junk to this day. It passes variables that aren't needed, it abuses pointers, it creates overly verbose monstrosities...
I don't want any single company to win. I want OpenAI to be competitive. I want open source models to win. But right now, Claude Code and Opus are it.
I recently tried with C# code and Avalonia on Linux. Total disaster. Could only get things to run after 10 attempts or so, and was only trying a very basic example. For some of the experiments I actually gave up.
> This February I tried again and used Claude to generate Rust code. I have never been more stunned in my life. It's just as good as I am, and 30x faster. No fluff, the code is verbatim just as I would have written.
Having looked at a bunch of known or suspected (based on the intent of the code and/or what I know about the developer(s)) LLM generated rust, there's only a few explanations here:
1. You're way better at prompting than (virtually) anyone else.
2. You're vastly overestimating how good the rust code it produced is.
3. You hand-held the model throughout and made lots of edits.
4. Your hand written rust code is very bad.
Because from every example I've seen, these models write horrible rust. Sure, it may technically pass all the tests, but it's horribly pessimized, badly organized, doesn't even attempt to use the type system, if there aren't bugs now there will be the second it tries to refactor or add a new feature, etc. etc.
(I also strongly suspect that the same would be true for other languages, but I can detect it in rust more easily because it's my main language)
This is like saying you gave a Taylor Swift fan sheet music from 1984 and from Michael Jackson’s Thriller and they couldn’t tell the difference.
I have a strong affinity for Claude Code because of the interaction experience and overall tone / vibe / process. I am 100% willing to believe the code it produces is identical or possibly less good than Codex.
I enjoy working with Claude in a way I just don’t get from OpenAI. YMMV, you may feel just the opposite. But it’s a mistake to look at the produced code as the only dimension of these products.
The creative output, and the time it takes to direct and deliver, will also be different because of the flow.
And it really depends on the task. Is it a typical well-defined bug, or is it simple CRUD? Or does it require research, combining different sources of data in complex and creative ways?
This is also why benches never show reality, and the only real understanding comes if you actually try to build something.
That's a weird way to look at it. Any car gets you to your destination, but some people prefer driving a sports car or an SUV. They get something out of it that isn't just a marketing delusion, but subjective joy from the interaction with one product over another.
Luxury cars are indeed a good comparison. The subjective joy is a result of the delusion. That is why so much money is spent on such marketing to begin with. The analogous comparison would be if a blindfolded passenger turned out to prefer the Sienna to the 911.
I would actually say it is a luxury car where you have your personal driver and you are free to work on other tasks, and it gets you to the destination faster. Time is, to me at least, the most valuable thing.
Claude has an "End Conversation" tool that it can trigger on its own, forcing your interaction to a close based on its own feelings towards the conversation.
I have no idea how this wasn't the end of Anthropic's positive public perception.
Same. But even worse than all that: OAI erased Anthropic's red lines with the DOW, making it socially acceptable for every other AI company to do the same, creating a "race to the bottom."
I think OAI actually legitimately increased p(doom) for us all. Very strange behavior for a company that is supposedly concerned about x-risk.
I can’t tell the difference between code written in vim or vs code but it matters substantially to the person writing the code. There’s stuff beyond just the output that goes into tool choice.
Your argument is fine but different from the claim the OP is making. You cannot simply make a claim that (model + harness) X is better than Y, but then have no discernible difference in the output. Subjectively, people might still prefer one over the other due to anything from design to marketing, but that's very different from the claim that X is better than Y for coding (see: "A colleague was convinced Claude is better"). Basically, "I prefer Claude" is a different claim than "Claude is better", and the latter has a higher bar of proof.
> You cannot simply make a claim that (model + harness) X is better than Y, but then have no discernible difference in the output.
You definitely can in principle; that’s the entire point of the comment you are responding to. If one tool completes it in 10 minutes with little hand holding, and the other does it in one hour at 4× the cost and while needing a lot of steering, the former is arguably better even if the end result is the same.
Whether that’s specifically true and demonstrable of GPT and Claude is another question, but your blanket statement doesn’t hold as a general rule.
That's a fair callout and I agree my statement was too general in just mentioning 'output', as you correctly pointed out. To define 'better' you would indeed need to agree on the dimensions you would evaluate candidates against.
I think a more appropriate rephrasing would be 'You cannot simply make a claim that (model + harness) X is better than Y, but then have no discernible difference on dimensions you care about'. In the case of the latest Claude Code vs. Codex (with GPT 5.5), both are similar enough in the dimensions people will care about when evaluating (vs. differing wildly in cost or time taken).
That’s actually what my comment was based on; raw code output isn’t the only measure of quality. Engineers write better code if they have the tools they prefer.
Claude and Codex are tools. You can't tell the difference in the output between something that was done with a ratcheting wrench vs a standard combination wrench, but your mechanic certainly knows the ratcheting wrench is better (for most tasks).
I've not used Codex to compare against, so I'm not claiming X is better than Y, but comparing tools simply on their output is naive.
I'd bet I could tell with a result somewhat better than random chance.
While there is no meaningful difference in the ability to write code, vim has earned its reputation for having a learning curve. I'd argue that predisposition, that requirement for additional investment of energy, will bias the results towards attention to detail and pure minimalism.
I use both, enough to reach Codex's highest personal sub limits, and Claude is stronger to me specifically because of how the flow of building feels. So the PR for any random task would be irrelevant to me.
It's crazy hearing devs on this site claim Claude is 10x better than all other AI solutions. I think it is fomo. Claude $LATEST_VERSION is perceived as the best and anything else is "missing out". New version comes out? Suddenly the old version is worthless, how on earth did anyone get work done with that?
Same reason people buy the RTX 4090 and 5090 cards - overpriced but they must have the "best". Never mind the diminishing returns trying to max out PC settings (3-4x performance hit for an almost imperceptible increase in graphics, ignoring DLSS) - it's the psychological cost of having to move a slider down a notch.
I've been using Google and now DeepSeek v4 and I am having absolutely no problems and it's a fraction of the cost. I'd love for Claude to be 10x better but it just isn't, for my use case anyway.
I’ve been using DeepSeek V4 in OpenCode exclusively for about a month.
I think it’s great, but coming from Claude Code it did feel like going back in time by ~6 months in model capabilities. This isn’t a big deal to me for what I do, but the difference is definitely there.
Deepseek v4 Pro is like Opus 4.5 or GPT 5.2, but costs pennies on the pound for API. Which is to say, I should definitely be using it more to let my Codex and Claude subs go further.
Opus 4.5 was definitely stronger than DeepSeek V4 for me, specifically with large context.
I’m being pedantic/splitting hairs, though. I’ve obviously switched to DeepSeek full-time because it makes more sense to me pragmatically — I spend a few more tokens to get the outcome I want, but the tokens are cheap as dirt and the API is faster.
Perhaps I should plug it into Claude Code and see how it performs? I haven’t tried that.
Opus 4.8 and GPT 5.5 are the best models, but people don't care about "best" anymore, until there is a big leap in capability I don't think anyone will care about point releases.
Vibes and tribalism will prevail until one of them emerges as clearly and unambiguously superior to the other.
You're comparing apples to oranges. Claude is a frontend overall product name, GPT5.5 is a specific model. Which model within Claude's offerings are you referring to? Opus 4.7, Sonnet 4.6, or something else?
Ah that's always SO fun. It doesn't matter how "smart" the person actually is (or thinks they are); we are ALL susceptible to influence, and blind tests are shockingly simple to implement.
Convinced you can distinguish A from B? Ok! No problem, let's try! Can be at the dinner table for fancy wine or with agents, it's all the same, you try an option, another option, maybe all options from the same, and if you reliably can't tell well kudos, you are just like the rest of us!
It's easy to "know" in retrospect, but a blind test is where genuine difference can be found. Or not.
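To put a number on "reliably can't tell", here is a minimal sketch of how one might score such a blind test. It assumes each trial is an independent two-way guess (so pure guessing is right half the time); the function name and the 10-trial session are made up for illustration, not taken from anyone's actual experiment.

```python
from math import comb

def chance_of_at_least(correct: int, trials: int, p_guess: float = 0.5) -> float:
    """Probability of getting `correct` or more right out of `trials` by pure guessing."""
    return sum(
        comb(trials, k) * p_guess**k * (1 - p_guess) ** (trials - k)
        for k in range(correct, trials + 1)
    )

# Hypothetical session: 10 blinded PR pairs, 8 labelled correctly.
print(round(chance_of_at_least(8, 10), 4))  # prints 0.0547
```

Under those assumptions, 8 out of 10 would happen by luck only about 5% of the time, while 6 out of 10 would happen about 38% of the time, so a handful of PRs cannot separate "can tell" from "got lucky".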
It’s also true in every other realm. Governments, think tanks, political parties, and activist groups use propaganda because it works.
I sometimes wonder how much of what I believe is bullshit I was fed through intentional propaganda. I do think as I’ve gotten older I’ve gradually identified and challenged some of it.
I don't think it's marketing, for quite a long time Claude was clearly better and not everyone has adapted to the new reality where they have similar capabilities.
I was really frustrated by GPT-5.4, but last night I really pulled out the stops and within a few hours I got path tracing and DLSS implemented on top of Godot, which doesn’t even support DLSS. Just to see if it could do it? And you know what, it did, which was absolutely mind blowing. It wrote like 5,000 lines of C++, I set up a mostly local asset production pipeline using GPT image gen, voiceovers using ElevenLabs API, and even background music using Suno via the chrome use extensions in Codex. I just wanted to see how far I could push this little dumb game my kids asked me to make, and my kids are like “wow our game looks so good!” These models are absolutely mind blowing. I didn’t want to go to sleep I was having so much fun.
Pretty easy to tell, depending on what the code is. GPT follows a pattern of using maybe_something names and uppercase constants by default. Claude is a little more natural but tends to include more fallbacks than gpt5.5.
Sam Altman is so cartoonishly, over-the-top sociopathically shady that he makes JD Vance look like Benjamin Franklin. I mean, honestly, tricking third world people into retinal scans in order to get a scam crypto coin? Anyone using OpenAI for anything at this point should pause and examine their ethical compass.
Calling this a "Tupperware party" seems a bit emotional; you're intentionally disregarding many things that matter for devs in order to try to claim equivalence, rather than paying attention to the actual process of software creation.
For example in your "test" you're only looking at output and ignoring the entire process of creation.
In addition to that process, you're ignoring that Claude Code was first and better for a long time, why would people switch for something that produces the same output? Claude Code has been way ahead in the process of agentic software creation for a long time, I still prefer its features. Even though I think that Opus 4.7 was a big step backwards, and I've been getting worse results seemingly every day with the churn of features at Claude Code, some of that may also be me testing the bounds of how little I can specify and still get acceptable results, so it's hard to know.
Calling all these concrete realities "marketing" is itself you trying to market Codex as "good enough" instead of paying attention to how we got where we are and where we will go in the future.
No, Tupperware is the exact analogy. As you point out though, the multi level marketing applies to all models. Anthropic is just the most aggressive, especially here.
Software developers are the most susceptible of all population groups for amplifying their employers' new whims. There are true believers and useful idiots, but many are just mediocre and know that playing along will further their career for a couple of years.
a) everyone is "susceptible" to marketing - so what
b) therefore a preference for Claude is marketing - complete bollocks
Either the tasks you chose were well below the capabilities of top models, or meaningful differences for preference are elsewhere, or both.
Your comment is probably energy-efficient and sustainable, however, because you could use it again and again when another comparison comes up, like Vim vs Emacs, or tea vs coffee
I have always found this field, especially in the last 10-15 years, to be incredibly fad driven to the point that it reminds me of things like fashion more than an engineering field.
It’s one of the things I don’t like about it. All humans are susceptible to herd behavior and influence but engineers should be at least a bit more hard nosed and reason more from first principles.
I don't think that's the only reason but you're spot on about OpenAI marketing being absolutely terrible. The primary product names of "Claude" vs "ChatGPT" highlight this remarkable difference, to the point where I'm seeing Claude completely take over as the generic term for agent.
I do think OpenAI is doomed due to bad leadership. What you said (that the marketing is relatively terrible) and what others are saying here (that the product is worse) is damning isn't it? Are they really failing on all fronts?
I don't think it's only marketing. OpenAI had the advantage of being first to the market, and in the beginning of the race it seemed that the future belongs to them. Then came the bad PR and unpredictable quality of their main product.
For general use, ChatGPT's answers have gotten worse over the last year. I abandoned it.
Isn't the experience of interacting with the models appreciably different? It's not all about the outcome. Not to mention the harnesses are increasingly the real product.
1. It's 1 in 10 failures that can take half of your time or bugs that can take a long time to surface. Plus the way they change things largely depends on the current codebase (and how it was created)
2. In my case Codex seems to be writing more solid code, but I still use Claude most of the time because it's my witty rubber ducky and I can actually sometimes force some legit insights out of it. Codex is much worse at this. And whether that matters or not depends on the project.
> i bet 99% of people here, if presented with a test where i gave 5 models but all of the results came from one, would not be able to discern this. Just vibes all the way down.
This is complicated by the way that the coding agents inject prompts that preempt and potentially undermine user instructions. I suspect that one of the reasons Codex works way better for me than Claude Code in certain projects is that the latter adds some garbage like "go ahead and write repetitive copy/paste code, keep it simple, take shortcuts" to every session. A fair test would have to hide but more or less still use the harnesses, not just the models.
A very similar thing happened when I was at a design event a couple of days ago. I’d say it’s even worse on the design end: there was a big discussion around how to optimize your usage of Claude. Not optimize your usage of AI, but Claude specifically, as it was the only model literally all of them were using. The biggest issue is they were all hitting their usage limits. I asked whether they had tried other, lighter models (e.g. Gemini or Composer), and it was like I was speaking a foreign language.
I find codex superior in speed and equal in quality, so it’s my preference. But Claude Code made prettier UIs last time I tested. Codex produces Microsoft-grade UIs. Very enterprise and ugly unless I actively steer it.
The results may be the same but I personally find Claude nicer to work with. It seems to understand my intent better than GPT and needs less guidance. Maybe it’s just personal preference.
I picked Anthropic way early on, before Claude Code even existed, because they at least pay lip service to behaving morally. That’s the most you can hope for these days really.
> Edit: i bet 99% of people here, if presented with a test where i gave 5 models but all of the results came from one, would not be able to discern this. Just vibes all the way down.
I think you're missing one (or more) of the facets individuals decide "better" is, for the subjective individual.
Early on i hopped between all the providers. Code quality for SOTA at the time was pretty decent if you didn't ask it to solve challenging problems. However the thing i found most difficult is consistency in how it listened. Eg Gemini (i forget what version, not current) was super prone to focusing solely on the functionality/goal, but not any of the directions on how to write the code. It would throw in comments everywhere, document in a manner i didn't want, use abstractions i told it not to, etc.
How well a model would follow instructions to drop their horrible "isms" was the #1 criterion for me. If i have to constantly remind the model not to do X behavior then it's a terrible model.
With that said, that is why i chose Claude for the last N months. However i've stuck with Claude because dealing with these "isms" and their little behavioral nuances is a chore in itself. I've found you have to learn the model just as much as anything, and so the idea of hopping these days when i'm just trying to get shit done is not likely.
These days for me personally, Claude has to give me a reason to switch rather than me investing even more money (i'm on the 20x plan) in other providers. I'm definitely not committed to Claude Code, but i am tired of the LLM churn, tooling churn, subscription churn, and the general fear of which providers we can trust.
edit: In short, it's the interactive UX just as much as it is the final output.
You're overestimating the extent to which individual developers have a choice here. My employer signed up for a Claude Code membership, I use Claude Code. I cannot use Codex.
Anecdotally I hear of folks with workplace Claude Code subscriptions all the time. I'm not sure I've ever heard someone talk about their workplace Codex subscription. Anthropic clearly did a far better job chasing corporate customers while OpenAI was busy chasing consumers with Sora etc.
Intellectual property. My employer has an agreement that our code will never end up as part of Claude's training data. At this point there are also now custom Claude integrations etc.
I'm sure they could also negotiate a similar deal with OpenAI, but in my outsider experience it seems that negotiations around these kinds of corporate contracts take forever, and when the selling point is "they're broadly pretty similar" I suspect the motivation isn't there.
The OP seems unaware that Claude had a lead in this space and captured market share and attention for that reason alone.
The test they (supposedly) ran with their coworkers to look at PRs from both is such a bad way to compare LLMs that I don’t think they’re very experienced with using them.
> The OP seems unaware that Claude had a lead in this space
I remember using GitHub Copilot (OpenAI "Codex" mk1) in Aug 2021 (ChatGPT would launch a year later 2 weeks after Meta's botched Galactica release). Cursor & others took it and ran a mighty good race.
Maybe some of these companies will learn to stop appointing awful leadership then.
Having a sleazy CEO like Sam Altman or Elon Musk is a business risk. Many potential customers don’t like these people and they say abrasive and alienating things publicly.
Rolling over to the DoD’s desire for fully automated weaponry is more bad marketing. How many people switched from OpenAI to Anthropic over that? I sure did. Anthropic’s willingness to burn that bridge over an ethical stance said a lot about the company to me.
I’m not going to use OpenAI products for these reasons among others.
I’m also not going to use Cursor as xAI plans to acquire Cursor.
Maybe it’s foolish of me to avoid those companies for such petty reasons, but that’s not my problem. That’s their problem.
It takes years to build trust and hours to burn that trust to the ground. Customers can hold grudges for a lifetime.
I did a pair programming comparison over 3 months on Codex 5.2 and Claude Sonnet, and my subjective experience was that, based on cost and rollbacks to a previous commit, Claude is significantly better, especially in VS Code Copilot. I wrote a long Substack post about it. I would share it, but it's in the paywalled archive by now.
> We used the claude code and codex harness and I implemented some prs they needed with gpt5.5 and opus4.7 and asked them to identify which came from which only from the code.
> Couldn’t tell.
Why would you expect them to be able to recognize the signature of a model from a pair of PRs? I don’t understand why you think this is a useful test for anything when we have numerous benchmarks that run 100s of tests on models and both GPT-5.5 and Opus-4.8 perform similarly.
I have subscriptions to both. I run both on max reasoning. It is interesting to see the relative strengths and weaknesses of each model. You won’t always see it if you’re just scanning code. Sometimes one will spin for a long time on certain problems where the other has no problem finding the appropriate parts of the codebase and getting an efficient solution.
antirez made a comment that he and others found GPT-5.5 to be better at the optimization tasks he was working on than Opus. There are other classes of tasks where GPT-5.5 consistently stumbles where Opus will get a solution quicker. Lately I’ve been working on some code where neither model comes up with a good solution. That’s just how LLMs go.
The only reason you have seen more activity about Claude is that they got there first. Codex has been a step behind and GPT couldn’t match Opus at first. You’re testing them after they’ve closed the gap.
The results are the same but I’ve found the process to get to the results is just more pleasant with Claude. I can’t put my finger on it. Overall most of these models at the highest level are about the same in many respects, but the UI/UX for some is just more enjoyable, for lack of a better term.
Codex I feel the need to be very specific and precise with. Claude… I feel like I can be lazy, which I enjoy.
Both still need to be reviewed stringently, but I feel I can be more ambiguous with Claude and get better results than with Codex.
You confuse ease of using a tool with quality of output.
A skilled carpenter can work both with high and with medium quality tools and prefer one over the other with no difference visible in the craft they produce.
Instead of only having them evaluate the final output, you ought to also have a way to have them evaluate the process and agentic aspects in getting to said output. Claude Code outshines when you look at it end-to-end, in my experience.
Benchmarking 1 or a few samples isn't ever going to yield anything but noise. The actual benchmarks use thousands of tasks.
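A rough worked example of why a few samples are noise, assuming each task is an independent pass/fail trial and using the usual normal approximation for a proportion (which is itself shaky at very small sample sizes). The 70% pass rate and the task counts are illustrative numbers, not measurements.

```python
from math import sqrt

def pass_rate_margin(pass_rate: float, n_tasks: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a pass rate measured on n_tasks."""
    return z * sqrt(pass_rate * (1 - pass_rate) / n_tasks)

# Illustrative only: the same observed 70% pass rate at three sample sizes.
for n in (4, 100, 2000):
    print(n, round(pass_rate_margin(0.7, n), 3))
```

With these assumptions the margin is roughly ±45 points at 4 tasks, ±9 points at 100, and ±2 points at 2000, so two models a few points apart cannot be separated by a handful of PRs.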
GPT 5.5 genuinely was back on top for a while there, but if you look at the past 2 years, being on Claude was better than being on OpenAI most of the time. If you're going to pick a tool and not switch constantly it was the right choice. Not to mention their tooling has always been ahead, and that gets ecosystem benefits.
Are they close and interchangeable today? Sure. But Sonnet was genuinely way better than anything OpenAI offered for a long time -- the valuation reflects that, not any given moment in time.
I think Sam Altman is an asshole and I prefer to spend my money elsewhere.
Frontier models being commoditized is inevitable. OpenAI thinks they're still competing on technology, and not on user experience and market reputation; otherwise they'd understand that the continuous negative PR generated by Altman's chaos is going to cost them everything.
Altman does appear to be an asshole, but I have bad news for you if you think Anthropic are the good guys. If anything, they might be worse than OpenAI.
Can you elaborate or give some examples as to why? I don't know much about this subject; last i heard, Anthropic declined deals with military and government agencies, while OpenAI opened their arms. But i am not
I wouldn't say Anthropic is worse than OpenAI, but there's a lot wrong with them. https://anthropic.ml/ has a collection of incidents and relevant evidence.
I mean paying for deepseek is sending money to China, and any company in China is pretty much an extension of the CCP (or can become one at the snap of their fingers). I don't think this is much better or worse than the American AI companies.
Probably.
When most people choose to chime in AGAINST the idea of funding evil people, I think their arguments are disingenuous; they are just looking for a way to excuse their own behavior, which they know is bad. They don't want to give up the convenience of, say, their nice Tesla that they would like to own, and make excuses about why it is ok to enrich a nazi even further.
Dario is constantly fearmongering to generate press, gaslighting, and contradicting himself. Mythos is the most recent example of that. It was never too powerful to release, that was a lie to generate publicity and fear, and an excuse because they didn't have the compute to serve it. People were finding the same bugs and exploits using GPT5.4, GPT5.5, and lesser models. Now all of a sudden, they do have the compute, and now they're saying that Mythos is releasing in the coming weeks.
Anthropic is constantly caught up in ethical scandals too. They pump the web full of advertising bots. They steal people's tokens, punish you for disabling telemetry, blacklist people they don't like. They had remote code execution vulns in their product for nearly a year and secretly buried that fact, no disclosures at all. Here are some of them: https://clawd.rip
It’s also a weird argument.
You can only spend your money once, and the affected employees also chose to work for a bell-end like Altman (or Zuck, or Musk)
If enough 'small' senior developers, project managers, product owners, and internal IT people take a small stand against OpenAI products, that can still sum up to a notable impact.
People can spend money how they wish. SamA is a prick, so I don’t buy from his company. I don’t buy from Microsoft or Oracle either. Giving a company your money is explicitly supporting them and everything they do. Are you going to force me to buy products from people I don’t agree with?
Sam Altman appears to represent a significant liability for OpenAI’s success from this point forward. A big portion of the driver for Anthropic’s meteoric rise over the last six months appears to be folks recognizing “it’s that AI startup not run by Sam Altman.” Anthropic has amazing tech, but its biggest asset at the moment seems to be that “it’s not OpenAI.”
Not saying that’s right or wrong, but it’s clearly a factor holding OpenAI back at this point.
At this point I think it’s more important to have a solid workflow and understanding of how [insert your favorite model here] works and its capabilities, than chasing the next shiny release and jumping back and forth between companies. I just finished my first large project with Codex and it is hard for me to believe Claude can be much better. It may be a bit better or worse, but again, they are all so good now that the user is the one driving the difference.
GPT-5.5 is the better programmer but Opus 4.8 remains the better system architect and product designer.
Codex is very "miss the forest for the trees", but is much better at successfully making large changes in large codebases. Claude Code makes more mistakes, but has more taste and a better grasp on idiomatic and elegant software development.
The roulette pockets for the model are bigger for some outputs than others. Draw a big enough black box around it and a different one around humans and it's indistinguishable.
More interesting than arguing a jumble of electrochemical reactions have taste? That may seem more readily familiar but is no less strange if you prod at it. Nonetheless it’s difficult to argue either don’t produce output that has qualities of discernment (ie taste).
Great analysis and follows my experience as well. Codex is better when you know how you want the design and the architecture and you drive the agent a lot more aggressively. Claude Code feels like more autopilot so executives and users who didn’t code before AI like it a lot more.
But I feel like an expert who can drive GPT aggressively will outperform Opus. It’s why some smart people I know are opting for GPT and have fallen off on Opus. It’s like asking an F1 driver to sit in a taxi.
This is exactly right. Claude has baked in autonomy and preferences that let it handle underspecified prompts elegantly, which makes it seem smarter to people who like to prompt that way, but it also ignores instructions and fights you on things, which makes it a bad model for people who know what they want to do and specify it.
Opus 4.7 (haven't tried 4.8) just really struggles writing correct code for complicated (i.e. valuable) work. I can handle architecture, which takes <1% of my time anyway. But writing code that's wrong is a cardinal sin. I've had much more luck with GPT 5.5 so far.
GPT 5.5 still invents facts rather than looking them up, and manages to come across both as condescending and sycophantic. It feels like talking to a used car salesman.
Funny cause I'm quite literally having this exact issue with 4.8 as we speak. I've been going back and forth with Claude since yesterday afternoon on chopping up, stabilizing and facilitating recovery on a flaky mega-pipeline. Not 5 minutes ago, I had to remind it that two of the solutions it proposed were not possible because the target technology doesn't allow what it wanted to do, despite pointing it to the very docs that say it can't be done in the first place.
As far as its tone... Both feel sycophantic as hell to me. To be honest, they all just feel that way.
> GPT 5.5 still invents facts rather than looking them up
So does Claude, what’s your point?
I used it and ChatGPT this week in trying to assist troubleshooting a complex DB related issue and Claude had to apologise no less than three times in which it admitted to talking complete shit.
Just one example of the kind of shit it dribbled:
> I need to be upfront with you. I should not have claimed X as if I knew that for a fact. That was overreach on my part.
You're using last week's model; Opus 4.7 is old news. Opus 6.9 is the new hotness; it is a better product manager than GPT, and has more X productivity. It replaced our junior dev team, and tells me my hair looks good.
I'm experiencing the same. Codex gpt-5.5 has more brilliant intuitions and writes less code, i.e. it identifies the exact point at which the modification should be made. Nevertheless, there are huge improvements in personality from opus 4.7 (it was too accommodating) to opus 4.8.
I strongly believe the reason gpt-5.x performs so well on large projects is because of the focused training they've done on their dedicated apply_patch primitive.
The official implementation of apply_patch is well thought out. It is a two-phase process that will not actually make any changes until every file in the change set is unambiguous. The pre-commit error feedback usually fixes anchoring issues with one or two additional attempts. It generally goes something like:
Reading file A L1:154
Reading file B L1:123
Attempting to apply patch...
[anchor errors for both A & B]
Reading file A L43:67
Reading file B L50:74
Attempting to apply patch...
Patch succeeded! Running compilation & unit tests...
The anchor error feedback helps massively because in this implementation it also returns the current line numbers where the problem was found.
Techniques that replace the whole file or depend on find-replace are useful in more isolated contexts. However, when you need to refactor 20+ files, something like apply_patch is what you want. Anything that depends on specific line numbers for actual replacement targets is a total dead end for complex edit scenarios.
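To make the two-phase idea concrete, here is a minimal sketch of what I mean. To be clear, this is NOT the official apply_patch code: `Hunk`, `AnchorError`, `find_anchor` and this `apply_patch` function are hypothetical names made up for illustration, and it assumes hunks in the same file don't overlap. Phase 1 checks that every anchor matches exactly once across all files; phase 2 only runs if phase 1 found no problems, and the error text reports the current line numbers of ambiguous matches.

```python
from dataclasses import dataclass


@dataclass
class Hunk:
    """Hypothetical hunk shape: context lines to find, and what replaces them."""
    path: str
    anchor: list[str]
    replacement: list[str]


class AnchorError(Exception):
    """Raised in phase 1 when any anchor is missing or ambiguous."""


def find_anchor(lines, anchor):
    """Return every 0-based start index where `anchor` matches in `lines`."""
    n = len(anchor)
    return [i for i in range(len(lines) - n + 1) if lines[i:i + n] == anchor]


def apply_patch(files, hunks):
    """Apply all hunks or none. `files` maps path -> list of lines."""
    errors = []
    planned = []

    # Phase 1: validate every hunk without touching anything.
    for hunk in hunks:
        lines = files.get(hunk.path)
        if lines is None:
            errors.append(f"{hunk.path}: file not found")
            continue
        hits = find_anchor(lines, hunk.anchor)
        if len(hits) == 1:
            planned.append((hunk, hits[0]))
        elif not hits:
            errors.append(f"{hunk.path}: anchor not found")
        else:
            where = ", ".join(f"L{i + 1}" for i in hits)
            errors.append(f"{hunk.path}: anchor is ambiguous, matches at {where}")
    if errors:
        raise AnchorError("\n".join(errors))

    # Phase 2: commit on a copy, bottom-up so earlier indices stay valid.
    result = {path: list(lines) for path, lines in files.items()}
    for hunk, start in sorted(planned, key=lambda p: p[1], reverse=True):
        result[hunk.path][start:start + len(hunk.anchor)] = hunk.replacement
    return result
```

The point of the design is that one bad anchor in file B means file A is not modified either, so the agent can re-read the reported lines and retry against a codebase that is still in its original state.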
My problem with codex/gpt is that it is too verbose (mostly js and python): a lot of helper functions, a lot of 1 or 2 line functions used in 1 place only, a lot of types or proxy-like objects.
I have specific skills for trying to avoid this, but nevertheless I spend half of the time fighting with its verbosity.
Currently, I'm trying to scaffold the functions/classes I know I need with NotImplemented and ask it to implement only inside those specific places. It's a little bit better, but I still have to fight with function-in-function definitions ...
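For anyone who wants to try the scaffolding approach, a minimal sketch of how it can look in Python. The function names and docstrings here are hypothetical examples, not from a real project. Note that in Python the thing to raise is `NotImplementedError`; `NotImplemented` is a different object, meant to be returned from binary operator methods.

```python
def parse_line(line: str) -> tuple[str, int]:
    """Split 'name=count' into (name, count). Body to be filled in by the agent."""
    raise NotImplementedError


def merge_counts(pairs: list[tuple[str, int]]) -> dict[str, int]:
    """Sum the counts per name. Body to be filled in by the agent."""
    raise NotImplementedError
```

The prompt then becomes roughly "replace each `raise NotImplementedError` with an implementation, and do not add new top-level functions or change signatures", which at least pins down the shape of the module.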
Upshot - poetry expertise does not seem to be the primary focus these days, perhaps to the detriment of the entire world. We did move on from training scaling to “test time” scaling (which I hate as a name btw), Ilya does not seem to have been needed, (although I am really curious what he’s building).
My prediction that you want to be deeply embedded and really rich and part of global infrastructure feels good. My suggestion that oAI / MS would be able to use the lead in 2024 to extend was wrong.
Neither of us talked much about coding as a product that would drive value and behavior, which is super interesting to me, we were probably six months from seeing real competence of any sort there way back in June 2024.
We both seemed to think there would be a single breakout company, or could be one, (although I did suggest buying the basket), clearly not the case with GOOG oAI and Anthropic all posting serious revenues this last quarter / year.
One area of Anthropic that was nascent in 2024, but that I have come to think is super valuable is their mechinterp group. I still don’t see work done by other labs (at least published) to nearly the quality of Anthropic. And the group has clearly moved into a period of productivity; there’s a good chance in my mind it could provide a truly enduring strategic advantage as a tool to be used by the taste makers steering the ship. In 2024, interpretability seemed almost impossible to get a handle on — today, the sustained chipping away at the problem makes a lot more look possible.
They just have no moral issues with spamming the internet with bots. They utilize blackhat tactics whenever they can to get an upper hand. Every social media platform is absolutely chock-full of Anthropic and Claude promoting bots, and you know they're bots because they all repeat the same things, in the same wording. X in particular seems to have millions of them.
I get the feeling this also means AI works very well for the general coding tasks and that's their biggest success in terms of difficulty AND people paying for it.
Of course every AI company has been overpromising and pumping the numbers as much as possible, but OpenAI has been hitting the reality wall harder, both because their people have not been able to keep improving at a faster rate and because of their whole cost structure and financial plate-spinning.
This doesn't invalidate the fact Anthropic is also overhyped to the max for their IPO.
In this game, who wins - in the long term - is who has the best model: so far OpenAI is ahead, so in the long term this is what matters. However, for the same reason, if in the future open weight models get very near the quality of the frontier labs' models, Anthropic and OpenAI will be out of business very soon. The game they play only makes sense if their SOTA models do things that other models can't do at a comparable level.
You can theoretically do most things AWS does most of the time, yet people pay premium for it and keep paying for it, even though alternatives are cheaper, simpler and more performant.
I'd bet you that after 20 years OpenAI and Anthropic would still be around and kicking.
You might have a subpar product (for the price) but the reputation and history is what makes people open their wallets.
> I'd bet you that after 20 years OpenAI and Anthropic would still be around and kicking.
Depends. The bigger the bubble, the bigger the pop.
Only a few unicorns from the dot-com bust came out the other side (Amazon, Google, ... anyone else?), and that was a piddling affair compared to this one.
I have the same impression. Strange to see this being downvoted & it was only after reading the comment that I read the username and found out it's antirez!
Now, I think that with these companies IPO'ing and Nasdaq and others bending themselves and their rules to cater to them (as in the case of SpaceX), these companies are very close to an IPO.
So for the employees, they are probably gonna get good evaluations, at least in the short term, and perhaps they are having a problem which is worth having.
But as you have suggested, I feel like the whole thing might be flaky especially given open source models. I believe that OSS models are at worst close to literal SOTA ~6 months ago.
So OpenAI & Anthropic have to somehow always be on the edge to get better models to not lose this (imo) very small time grip that they have, all while losing billions of dollars and having to worry about profitability & so many other concerns in and of itself.
I don't think that there is any other thing inside CS or any industry where two pieces of software being almost comparable, with not much moat around them except a diff of 6 months at best, is something on which trillions of dollars float around. We don't know how things will pan out, but if I have to guess, it might not be looking good for OAI and Anthropic, especially over the longer horizon.
The headline is false. First off, OpenAI hasn’t raised a recent round so you can’t compare these two companies randomly like this. Second, Anthropic is known to have accounting methods that give it more revenue. And neither of these companies is known to be doing GAAP accounting.
This is depressing. Anthropic really is the last company we want to see leading this race, given how greedy they are. Let's not forget all of the lying and gaslighting too. The creator of OpenClaw made this I believe: https://clawd.rip
Stealing people's tokens because you use a product they don't like... That shows the morals they have. Actions speak louder than words. Disabling people's caches because they disable telemetry was another juicy one that I don't believe is on this site. In fact there are far more I remember that aren't even listed here.
"Investors who have poured hundreds of billions into closed-source labs are betting on an unprovable safety moat".
Nobody is investing in closed-source labs for safety reasons. Being able to explore in more detail what and how the model is thinking is nice, but by no means a game changer. What matters to investors and most of the users is that the model gives the right answer at the end.
Pointless article (like much of the AI marketing hotness and spin room).
> The new valuation is nearly three times higher than the company’s February valuation, when Anthropic was estimated to be worth around $380 billion.
> In March, OpenAI was valued at $852 billion following a record $122 billion funding round.
Basically, today (Late May) we're declaring Anthropic the most valuable. They've nearly tripled in value since February. But also, OpenAI was $852B in March and presumably has grown since then.
In a few weeks we'll either have a new round of funding for OpenAI or they'll announce their IPO and the hype train will be abuzz that they're now the most valuable.
The models aside, my impression is that Anthropic is winning in large part because of very pragmatic and high-velocity product development on top of them; like with Claude Code.
Like actually iterating hard to make them useful. Many, many details matter here.
I haven't tested the similar OpenAI/Google tools in detail lately though. Previously I found them way too generic and unpolished to be useful.
My impression as well. OpenAI was riding the high of ChatGPT with a very confusing and seemingly unfocused offering beyond that. Anthropic was always laser focused on business use cases. Claude Code being the big one. Finance seems to be their next target.
Anthropic has much narrower capabilities. No image generation, no video generation, no 3d world models, barely any voice stuff. But they know who their target customers are, and their API has a model selection anyone can understand and pricing that rarely changes. Focus and predictability.
ChatGPT dropped the ball for long enough that most devs and technical people went to Claude for a year or more. They still probably have the most normie market share and are at least trying to win back some of that ground in their latest model, so it'll be interesting to see.
I never want to hear from developers again that they are not susceptible to marketing. I see meet ups specifically about Claude often.
Modern tupperware party.
A colleague was convinced Claude is better so we played a game. We used the claude code and codex harness and I implemented some prs they needed with gpt5.5 and opus4.7 and asked them to identify which came from which only from the code.
Couldn’t tell.
Edit: i bet 99% of people here, if presented with a test where i gave 5 models but all of the results came from one, would not be able to discern this. Just vibes all the way down.
Everyone can be propagandised. It's a matter of pushing the right buttons.
Not everyone. Some are very strong mentally and not so easily malleable.
I don’t think that applies to most on here tho.
I RAN to downvote this Dunning-Kruger of a comment.
Seeing yourself as immune to propaganda probably makes you more susceptible to propaganda.
Edit: Oh they’re trolling, nm. :-/
Or pushing the wrong ones
I think for developers the distinction is that ChatGPT is this commercial all in one solution for normies and Claude is specific for developers, in reality as you say the results for normal developers are indistinguishable.
Maybe some people think that but there’s not really any meaningful difference in their offerings
FWIW most of the normies I know are using Claude
> Couldn’t tell.
I can tell. It's night and day.
Last year I used a bunch of models to try to generate Rust code. They all sucked.
This February I tried again and used Claude to generate Rust code. I have never been more stunned in my life. It's just as good as I am, and 30x faster. No fluff, the code is verbatim just as I would have written.
I then tried other models. Total disappointment.
I've continued to repeat this experiment. Opus is the only model that can write Rust reasonably.
Codex produces junk to this day. It passes variables that aren't needed, it abuses pointers, it creates overly verbose monstrosities...
I don't want any single company to win. I want OpenAI to be competitive. I want open source models to win. But right now, Claude Code and Opus are it.
I recently tried with C# code and Avalonia on Linux. Total disaster. Could only get things to run after 10 attempts or so, and was only trying a very basic example. For some of the experiments I actually gave up.
> This February I tried again and used Claude to generate Rust code. I have never been more stunned in my life. It's just as good as I am, and 30x faster. No fluff, the code is verbatim just as I would have written.
Having looked at a bunch of known or suspected (based on the intent of the code and/or what I know about the developer(s)) LLM generated rust, there's only a few explanations here:
1. You're way better at prompting than (virtually) anyone else.
2. You're vastly overestimating how good the rust code it produced is.
3. You handheld the model throughout and made lots of edits.
4. Your hand written rust code is very bad.
Because from every example I've seen, these models write horrible rust. Sure, it may technically pass all the tests, but it's horribly pessimized, badly organized, doesn't even attempt to use the type system, if there aren't bugs now there will be the second it tries to refactor or add a new feature, etc. etc.
(I also strongly suspect that the same would be true for other languages, but I can detect it in rust more easily because it's my main language)
Been to an Anthropic event in Paris last summer.
They served caviar. It probably had good ROI.
This is like saying you gave a Taylor Swift fan sheet music from 1984 and from Michael Jackson’s thriller and they couldn’t tell the difference.
I have a strong affinity for Claude Code because of the interaction experience and overall tone / vibe / process. I am 100% willing to believe the code it produces is identical or possibly less good than Codex.
I enjoy working with Claude in a way I just don’t get from OpenAI. YMMV, you may feel just the opposite. But it’s a mistake to look at the produced code as the only dimension of these products.
This is my point. The harness itself creates feelings that are positive, but the artifacts produced are similar.
It is like the employee who is slightly worse but is a brownnoser getting promoted more often.
And what do you know, that is what is happening. It is like the coke commercial with the nice music and beautiful person in the back.
Speaking of which, remember Pepsi Challenge? Coke lovers are like the claude code lovers.
The creative output and time to direct, to deliver due to the flow will also be different.
And it really depends on the task. Is it a typical well-defined bug, or is it simple CRUD? Or does it require research, combining different sources of data in complex and creative ways?
This is also why benches never show reality, and the only real understanding comes if you actually try to build something.
But what they're pointing out is user experience, not marketing.
That's a weird way to look at it. Any car gets you to your destination, but some people prefer driving a sports car or an SUV. They get something out of it that isn't just a marketing delusion, but subjective joy from the interaction with one product over another.
Luxury cars are indeed a good comparison. The subjective joy is a result of the delusion. That is why so much money is spent on such marketing to begin with. The analogous comparison would be if a blindfolded passenger turned out to prefer the Sienna to the 911.
I would actually say it is a luxury car where you have your personal driver and you are free to work on other tasks, and it gets you faster to the destination. Time to me is at least the most valuable thing.
this site is reddit 2.0
> The subjective joy is a result of the delusion.
Repeat after me:
_Other people can experience things you do not experience and it is still valid, and not a delusion_. They are not sheeple who fell for marketing.
If it were a matter of 'enjoyment' then the OP would have made his point.
There should be a material difference between the tools.
There is.
vim / emacs / jetbrains - different tools to produce code.
Codex and Claude are different.
Claude has an "End Conversation" tool that it can trigger on its own, forcing your interaction to a close based on its own feelings towards the conversation.
I have no idea how this wasn't the end of Anthropic's positive public perception.
Luckily this doesn’t come up while writing code. It tends to be if you are chatting it up in friend mode, and ask for a bomb recipe.
Yes, which means that in the long run this looks ugly.
So much faith and money in this idea, and seeing how fragile it is, does not look good.
for me personally it's two reasons:
1) Brockman ($25M) and Altman ($1M) both personally donated to Trump/MAGA.
2) Anthropic pushed back against DOD's demand for unrestricted use of AI to kill people while OpenAI eagerly said "please use ours!".
Same. But even worse than all that: OAI erased Anthropic's red lines with the DOW, making it socially acceptable for every other AI company to do the same, creating a "race to the bottom."
I think OAI actually legitimately increased p(doom) for us all. Very strange behavior for a company that is supposedly concerned about x-risk.
I can’t tell the difference between code written in vim or vs code but it matters substantially to the person writing the code. There’s stuff beyond just the output that goes into tool choice.
> There’s stuff beyond just the output that goes into tool choice.
Yup, like billions of capex. Unlike vim.
Your argument is fine but different from the claim the OP is making. You cannot simply make a claim that (model + harness) X is better than Y, but then have no discernible difference in the output. Subjectively, people might still prefer one over the other due to anything from design to marketing, but that's very different from the claim that X is better than Y for coding (see: "A colleague was convinced Claude is better"). Basically, "I prefer Claude" is a different claim than "Claude is better", and the latter has a higher bar of proof.
> You cannot simply make a claim that (model + harness) X is better than Y, but then have no discernible difference in the output.
You definitely can in principle; that’s the entire point of the comment you are responding to. If one tool completes it in 10 minutes with little hand holding, and the other does it in one hour at 4× the cost and while needing a lot of steering, the former is arguably better even if the end result is the same.
Whether that’s specifically true and demonstrable of GPT and Claude is another question, but your blanket statement doesn’t hold as a general rule.
This obviously correct take will get pushback, so let me add some other examples:
- which tool required more detailed goal-setting in the prompt?
- did one tool ask follow-up questions up front vs spread out over implementation?
- did either tool match existing coding styles?
- did either tool remind you about potential conflicts between what you asked it to build and other parts of the codebase?
There are a lot of ways to compare agents besides just the code. (Similarly, working engineers are not evaluated just on their code output.)
That's a fair callout and I agree my statement was too general in just mentioning 'output', as you correctly pointed out. To define 'better' you would indeed need to agree on the dimensions you would evaluate candidates against.
I think a more appropriate rephrasing would be 'You cannot simply make a claim that (model + harness) X is better than Y, but then have no discernible difference on dimensions you care about'. In the case of the latest Claude Code vs Codex (with gpt 5.5), both are similar enough in the dimensions people will care about in evaluating (vs. differing wildly in cost or time taken).
> A colleague was convinced Claude is better
That’s actually what my comment was based on; raw code output isn’t the only measure of quality. Engineers write better code if they have the tools they prefer.
Claude and Codex are tools. You can't tell the difference in the output between something that was done with a ratcheting wrench vs a standard combination wrench, but your mechanic certainly knows the ratcheting wrench is better (for most tasks).
I've not used Codex to compare against, so I'm not claiming X is better than Y, but comparing tools simply on their output is naive.
" You cannot simply make a claim that (model + harness) X is better than Y, but then have no discernible difference in the output"
Sorry I think this misses the mark.
Because it's not the output but the process.
And sometimes the outcomes are not discernible.
Codex and Claude are very different.
I use them for different things.
Their behaviour difference is obvious.
Of course it'd be impossible for anyone to tell by looking at my code base 'how it was written'.
I'd bet I could tell with a result somewhat better than random chance.
While there is no meaningful difference in the ability to write code, vim has earned its reputation for having a learning curve. I'd argue that predisposition, that requirement for additional investment energy, will bias the results towards attention to detail and pure minimalism.
I use both, enough to reach Codex highest personal sub limits and Claude is stronger to me specifically because of how the flow of building feels. So the PR for any random task would be irrelevant to me.
It's crazy hearing devs on this site claim Claude is 10x better than all other AI solutions. I think it is fomo. Claude $LATEST_VERSION is perceived as the best and anything else is "missing out". New version comes out? Suddenly the old version is worthless, how on earth did anyone get work done with that?
Same reason people buy the RTX 4090 and 5090 cards - overpriced but they must have the "best". Never mind the diminishing returns trying to max out PC settings (3-4x performance hit for an almost imperceptible increase in graphics, ignoring DLSS) - it's the psychological cost of having to move a slider down a notch.
I've been using Google and now DeepSeek v4 and I am having absolutely no problems and it's a fraction of the cost. I'd love for Claude to be 10x better but it just isn't, for my use case anyway.
Hey, at least the superior performance of a 4090 or a 5090 can be objectively measured.
I’ve been using DeepSeek V4 in OpenCode exclusively for about a month.
I think it’s great, but coming from Claude Code it did feel like going back in time by ~6 months in model capabilities. This isn’t a big deal to me for what I do, but the difference is definitely there.
Deepseek v4 Pro is like Opus 4.5 or GPT 5.2, but costs pennies on the pound for API. Which is to say, I should definitely be using it more to let my Codex and Claude subs go further.
Opus 4.5 was definitely stronger than DeepSeek V4 for me, specifically with large context.
I’m being pedantic/splitting hairs, though. I’ve obviously switched to DeepSeek full-time because it makes more sense to me pragmatically — I spend a few more tokens to get the outcome I want, but the tokens are cheap as dirt and the API is faster.
Perhaps I should plug it into Claude Code and see how it performs? I haven’t tried that.
Opus 4.8 and GPT 5.5 are the best models, but people don't care about "best" anymore, until there is a big leap in capability I don't think anyone will care about point releases.
Vibes and tribalism will prevail until one of them emerges as clearly and unambiguously superior to the other.
You're projecting
> Same reason people buy the RTX 4090 and 5090 cards - overpriced but they must have the "best".
Or they need to run high VRAM apps like LLMs
Or they have 4K monitors and want smooth gameplay on them
Is this whole thread just dedicated to snark about other people’s personal preferences?
Claude was the best for the longest time. GPT5.5 challenges that, but inertia is real
You're comparing apples to oranges. Claude is a frontend overall product name, GPT5.5 is a specific model. Which model within Claude's offerings are you referring to? Opus 4.7, Sonnet 4.6, or something else?
I am not referring to one specific model, I mean the entire Claude Opus line starting from about 4.5, vs the equivalent OpenAI model at the time
Google came pretty close at times
Should've used deepseek. That would have been interesting.
Ah that's always SO fun. It doesn't matter how "smart" the person actually is (or thinks they are), we are ALL susceptible to influence and blind tests are shockingly simple to implement.
Convinced you can distinguish A from B? Ok! No problem, let's try! Can be at the dinner table for fancy wine or with agents, it's all the same, you try an option, another option, maybe all options from the same, and if you reliably can't tell well kudos, you are just like the rest of us!
It's easy to "know" in retrospect but blind test is where genuine difference can be found. Or not.
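If you want to score such a blind test rather than eyeball it, a minimal sketch: an exact one-sided binomial test for "how likely is this many correct guesses from pure guessing". The function name is hypothetical and the 0.5 default assumes a two-way choice (e.g. "Claude or Codex?") with independent trials.

```python
from math import comb


def p_value_at_least(correct: int, trials: int, chance: float = 0.5) -> float:
    """P(X >= correct) when every one of `trials` guesses is right with probability `chance`."""
    return sum(
        comb(trials, k) * chance**k * (1 - chance) ** (trials - k)
        for k in range(correct, trials + 1)
    )
```

With 10 two-way trials, 8 correct gives 56/1024 ≈ 0.055 and 9 correct gives 11/1024 ≈ 0.011, so you need 9 out of 10 before guessing stops being a comfortable explanation at the usual 0.05 threshold.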
It’s also true in every other realm. Governments, think tanks, political parties, and activist groups use propaganda because it works.
I sometimes wonder how much of what I believe is bullshit I was fed through intentional propaganda. I do think as I’ve gotten older I’ve gradually identified and challenged some of it.
Isn’t this obvious?
Over half of HN commentators visibly struggle to piece 3 or more complex ideas together.
How could anyone, who spent more than 30 minutes reading HN, expect otherwise?
Tribalism at its worst. It's like the Coke and Pepsi comparisons from years past.
I don't think it's marketing, it's the "nobody got fired for buying IBM" effect applied to software developers choosing tools.
It's the same reason why most of the software out there keeps using bloated technologies that are most of the time the wrong fit for the product.
And the same applies to tooling. Nothing new.
Which model produced code that ran faster, with fewer bugs, etc?
I don't think it's marketing, for quite a long time Claude was clearly better and not everyone has adapted to the new reality where they have similar capabilities.
I was really frustrated by GPT-5.4, but last night I really pulled out the stops and within a few hours I got path tracing and DLSS implemented on top of Godot, which doesn’t even support DLSS. Just to see if it could do it? And you know what, it did, which was absolutely mind blowing. It wrote like 5,000 lines of C++, I set up a mostly local asset production pipeline using GPT image gen, voiceovers using ElevenLabs API, and even background music using Suno via the chrome use extensions in Codex. I just wanted to see how far I could push this little dumb game my kids asked me to make, and my kids are like “wow our game looks so good!” These models are absolutely mind blowing. I didn’t want to go to sleep I was having so much fun.
Adapt to what? If they are the "same", there is no reason to move. Actually, there are reasons not to, if you care about OpenAI's behavior.
Pretty easy to tell depending on what the code is. GPT follows this pattern of using maybe_something and using uppercase constants by default. Claude is a little more natural but tends to include more fallbacks than gpt5.5
Modern Tupperware party. 100% agree! That’s the best framing I’ve heard in a long time!
Sam Altman is so cartoonishly, over-the-top sociopathically shady that he makes JD Vance look like Benjamin Franklin. I mean, honestly, tricking third world people into retinal scans in order to get a scam crypto coin? Anyone using OpenAI for anything at this point should pause and examine their ethical compass.
Calling this a "Tupperware party" seems a bit emotional, you're intentionally disregarding many things that matter for devs in order to try to claim equivalence, rather than paying attention to the actual process of software creation.
For example in your "test" you're only looking at output and ignoring the entire process of creation.
In addition to that process, you're ignoring that Claude Code was first and better for a long time, why would people switch for something that produces the same output? Claude Code has been way ahead in the process of agentic software creation for a long time, I still prefer its features. Even though I think that Opus 4.7 was a big step backwards, and I've been getting worse results seemingly every day with the churn of features at Claude Code, some of that may also be me testing the bounds of how little I can specify and still get acceptable results, so it's hard to know.
Calling all these concrete realities "marketing" is itself you trying to market Codex as "good enough" instead of paying attention to how we got where we are and where we will go in the future.
No, Tupperware is the exact analogy. As you point out though, the multi level marketing applies to all models. Anthropic is just the most aggressive, especially here.
Software developers are the most susceptible of all population groups for amplifying their employers' new whims. There are true believers and useful idiots, but many are just mediocre and know that playing along will further their career for a couple of years.
In the end they will be fired anyway of course.
a) everyone is "susceptible" to marketing - so what
b) therefore a preference for Claude is marketing - complete bollocks
Either the tasks you chose were well below the capabilities of top models, or meaningful differences for preference are elsewhere, or both.
Your comment is probably energy-efficient and sustainable, however, because you could use it again and again when another comparison comes up, like Vim vs Emacs, or tea vs coffee
I can tell the difference between tea and coffee 100% of the time.
I have always found this field, especially in the last 10-15 years, to be incredibly fad driven to the point that it reminds me of things like fashion more than an engineering field.
It’s one of the things I don’t like about it. All humans are susceptible to herd behavior and influence but engineers should be at least a bit more hard nosed and reason more from first principles.
I don't think that's the only reason but you're spot on about OpenAI marketing being absolutely terrible. The primary product names of "Claude" vs "ChatGPT" highlights this remarkable difference. To the point where I'm seeing Claude completely take over the generic term for agent.
I do think OpenAI is doomed due to bad leadership. What you said (that the marketing is relatively terrible) and what others are saying here (that the product is worse) is damning isn't it? Are they really failing on all fronts?
in my experience out of the box Claude Code is the better tool if you want to spend 0 time on config
If advertising is a multi-billion dollar industry then it has to be effective!
I don't think it's only marketing. OpenAI had the advantage of being first to the market, and in the beginning of the race it seemed that the future belongs to them. Then came the bad PR and unpredictable quality of their main product.
For general use, ChatGPT's answers have gotten worse over the last year. I abandoned it.
Isn't the experience of interacting with the models appreciably different? It's not all about the outcome. Not to mention the harnesses are increasingly the real product.
Sure, none of this is rational.
Some of it is timing: Claude Code was good before other harnesses and so behaviors (and contracts) were timed to lock in on that ecosystem.
Some of it was ethical/political: Anthropic fighting with the Trump admin about use of the model.
Some of it is social: Never overrate a CEO just being kind of perceived as a piece of shit by people who have power to influence decisions.
But switching costs are low! Because of the same models!
Let the race to the bottom commence. Hopefully before the monopoly/collusion starts.
1. It's 1 in 10 failures that can take half of your time or bugs that can take a long time to surface. Plus the way they change things largely depends on the current codebase (and how it was created)
2. In my case codex seems to be writing more solid code, but I still use claude most of the time because it's my witty rubber ducky and I can actually sometimes force some legit insights out of it. Codex is much worse at this. And whether that matters or not depends on the project.
> i bet 99% of people here, if presented with a test where i gave 5 models but all of the results came from one, would not be able to discern this. Just vibes all the way down.
This is complicated by the way that the coding agents inject prompts that preempt and potentially undermine user instructions. I suspect that one of the reasons Codex works way better for me than Claude Code in certain projects is that the latter adds some garbage like "go ahead and write repetitive copy/paste code, keep it simple, take shortcuts" to every session. A fair test would have to hide but more or less still use the harnesses, not just the models.
A very similar thing happened when I was at a design event a couple of days ago. I’d say it’s even worse on the design end - there was a big discussion around how to optimize your usage of Claude. Not optimize your usage of AI, but Claude specifically, as it was the only model literally all of them were using. The biggest issue is they were all hitting their usage limits. I asked whether they had tried other, lighter models (e.g. Gemini or Composer), and it was like I was speaking a foreign language.
I find codex superior in speed and equal in quality, so it’s my preference. But Claude Code made prettier UIs last time I tested. Codex produces Microsoft-grade UIs. Very enterprise and ugly unless I actively steer it.
The results may be the same but I personally find Claude nicer to work with. It seems to understand my intent better than GPT and needs less guidance. Maybe it’s just personal preference.
I picked Anthropic way early on, before Claude Code even existed. Because they at least pay lip service to behaving morally. That’s the most you can hope for these days really.
“…Hey but at least the tormentor in my panopticon gives you a high five after the skin harvesting”
This has to be in some far side gallery somewhere
> Edit: i bet 99% of people here, if presented with a test where i gave 5 models but all of the results came from one, would not be able to discern this. Just vibes all the way down.
I think you're missing one (or more) of the facets individuals decide "better" is, for the subjective individual.
Early on i hopped between all the providers. Code quality for SOTA at the time was pretty decent if you didn't ask it to solve challenging problems. However the thing i found most difficult is consistency in how it listened. Eg Gemini (i forget what version, not current) was super prone to focusing solely on the functionality/goal, but not any of the directions on how to write the code. It would throw in comments everywhere, document in a manner i didn't want, use abstractions i told it not to, etc.
How well a model would follow instructions to drop their horrible "isms" was the #1 criterion for me. If i have to constantly remind the model not to do X behavior then it's a terrible model.
With that said, that is why i chose Claude for the last N months. However i've stuck with Claude because dealing with these "isms" and their little behavioral nuances is a chore in itself. I've found you have to learn the model just as much as anything, and so the idea of hopping these days when i'm just trying to get shit done is not likely.
These days for me personally, Claude has to give me a reason to switch rather than me investing even more money (i'm on the 20x plan) in other providers. I'm definitely not committed to Claude Code, but i am tired of the LLM churn, tooling churn, subscription churn, and the general fear of which providers we can trust.
edit: In short, it's the interactive UX just as much as it is the final output.
You're overestimating the extent to which individual developers have a choice here. My employer signed up for a Claude Code membership, I use Claude Code. I cannot use Codex.
Anecdotally I hear of folks with workplace Claude Code subscriptions all the time. I'm not sure I've ever heard someone talk about their workplace Codex subscription. Anthropic clearly did a far better job chasing corporate customers while OpenAI was busy chasing consumers with Sora etc.
Corporate accounts pay the full api price, so I don't know what is stopping them or you from also using codex on the same terms?
Intellectual property. My employer has an agreement that our code will never end up as part of Claude's training data. At this point there are also now custom Claude integrations etc.
I'm sure they could also negotiate a similar deal with OpenAI but in my outsider experience it seems that negotiations around these kinds of corporate contracts take forever and when the selling point is "they're broadly pretty similar" I suspect the motivation isn't there.
> My employer has an agreement that our code will never end up as part of Claude's training data.
“Our competitive advantage is that we believe them,” I’ve read—wonder if that’s still a [prevailing] sentiment.
(Edit - context was probably using SotA models instead of being limited to local open source only)
I think the marketing campaign came first. Anthropic captured developer mindshare first, then they brought it to their companies.
The OP seems unaware that Claude had a lead in this space and captured market share and attention for that reason alone.
The test they (supposedly) ran with their coworkers to look at PRs from both is such a bad way to compare LLMs that I don’t think they’re very experienced with using them.
> The OP seems unaware that Claude had a lead in this space
I remember using GitHub Copilot (OpenAI "Codex" mk1) in Aug 2021 (ChatGPT would launch a year later 2 weeks after Meta's botched Galactica release). Cursor & others took it and ran a mighty good race.
Maybe some of these companies will learn to stop appointing awful leadership then.
Having a sleazy CEO like Sam Altman or Elon Musk is a business risk. Many potential customers don’t like these people and they say abrasive and alienating things publicly.
Rolling over to the DoD’s desire for fully automated weaponry is more bad marketing. How many people switched from OpenAI to Anthropic over that? I sure did. Anthropic’s willingness to burn that bridge over an ethical stance said a lot about the company to me.
I’m not going to use OpenAI products for these reasons among others.
I’m also not going to use Cursor as xAI plans to acquire Cursor.
Maybe it’s foolish of me to avoid those companies for such petty reasons, but that’s not my problem. That’s their problem.
It takes years to build trust and hours to burn that trust to the ground. Customers can hold grudges for a lifetime.
I did a pair programming comparison over 3 months on Codex 5.2 and Claude Sonnet and my subjective experience was that, based on cost and rollbacks to a previous commit, Claude is significantly better. Especially in VS Code Copilot. I wrote a long Substack post about it. I would share it, but it's in the paywalled archive by now.
IME Claude has been a bit inferior. But, yeah, the marketing is just great.
Who doesn't think they are susceptible to marketing?
That seems like a strawman.
> We used the claude code and codex harness and I implemented some prs they needed with gpt5.5 and opus4.7 and asked them to identify which came from which only from the code.
> Couldn’t tell.
Why would you expect them to be able to recognize the signature of a model from a pair of PRs? I don’t understand why you think this is a useful test for anything when we have numerous benchmarks that run 100s of tests on models and both GPT-5.5 and Opus-4.8 perform similarly.
I have subscriptions to both. I run both on max reasoning. It is interesting to see the relative strengths and weaknesses of each model. You won’t always see it if you’re just scanning code. Sometimes one will spin for a long time on certain problems where the other has no problem finding the appropriate parts of the codebase and getting an efficient solution.
antirez made a comment that he and others found GPT-5.5 to be better at the optimization tasks he was working on than Opus. There are other classes of tasks where GPT-5.5 consistently stumbles where Opus will get a solution quicker. Lately I’ve been working on some code where neither model comes up with a good solution. That’s just how LLMs go.
The only reason you have seen more activity about Claude is that they got there first. Codex has been a step behind and GPT couldn’t match Opus at first. You’re testing them after they’ve closed the gap.
I am not sure why the past matters here. I am talking about now, it is a fast moving space.
As for the test, of course the output matters. Take image models for example. Differences are clear as day.
The results are the same but I’ve found the process to get to the results is just more pleasant with Claude. I can’t put my finger on it. Overall most of these models at the highest level are about the same in many respects but the UI/UX for some is just more enjoyable, for lack of a better term.
Codex I feel the need to be very specific and precise with. Claude… I feel like I can be lazy, which I enjoy.
Both still need to be reviewed stringently but I feel I can be more ambiguous with Claude and get better results than with Codex.
You confuse ease of using a tool with quality of output. A skilled carpenter can work both with high and with medium quality tools and prefer one over the other with no difference visible in the craft they produce.
Instead of only having them evaluate the final output, you ought to also have a way to have them evaluate the process and agentic aspects in getting to said output. Claude Code outshines when you look at it end-to-end, in my experience.
Benchmarking 1 or a few samples isn't ever going to yield anything but noise. The actual benchmarks use thousands of tasks.
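To put a rough number on the noise point, here is a minimal sketch. It assumes each task is an independent pass/fail trial and uses a hypothetical true pass rate of 70%; `pass_rate_margin` is an illustrative name, and the normal approximation it relies on is crude at very small sample sizes.

```python
import math


def pass_rate_margin(p: float, n: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a pass rate measured over n tasks.

    Uses the normal approximation to the binomial: z * sqrt(p * (1 - p) / n).
    """
    return z * math.sqrt(p * (1.0 - p) / n)


if __name__ == "__main__":
    # Hypothetical true pass rate of 70%; compare a handful of PRs to a real suite.
    for n in (5, 100, 2000):
        print(f"n={n:5d}  margin=+/-{pass_rate_margin(0.7, n):.3f}")
```

Under those assumptions the margin is roughly 40 percentage points at 5 tasks, about 9 at 100, and about 2 at 2000, so a gap of a few points between two models is invisible in a handful of PRs.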
GPT 5.5 genuinely was back on top for a while there, but if you look at the past 2 years, being on Claude was better than being on OpenAI most of the time. If you're going to pick a tool and not switch constantly it was the right choice. Not to mention their tooling has always been ahead, and that gets ecosystem benefits.
Are they close and interchangeable today? Sure. But Sonnet was genuinely way better than anything OpenAI offered for a long time -- the valuation reflects that, not any given moment in time.
100%
The belief structures here are really interesting. Blind tests would likely illuminate a lot of why people think that
I think Sam Altman is an asshole and I prefer to spend my money elsewhere.
Frontier models being commoditized is inevitable. OpenAI thinks they're still competing on technology, and not on user experience and market reputation; otherwise they'd understand the continuous negative PR generated by Altman's chaos is going to cost them everything.
He must have done something personally to you.
That's... that's not how social perception works at all.
Dario is genuinely as bad as Sam.
I’ve heard this said, but why?
he pushes mysticism of the models
he's starkly anti-China with a warlike posture that I find dangerous and unappealing
Anthropic has a much more confused mission statement than OpenAI
in interviews, Dario appears to care little for the well-being of common folk, while Sam at least pretends
Altman does appear to be an asshole, but I have bad news for you if you think Anthropic are the good guys. If anything, they might be worse than OpenAI.
What makes you say that
Can you elaborate or give some examples as to why? I don't know much about this subject. Last i heard, Anthropic declined deals with military and government agencies, while OpenAI opened their arms. But i am not
I wouldn't say Anthropic is worse than OpenAI, but there's a lot wrong with them. https://anthropic.ml/ has a collection of incidents and relevant evidence.
What makes you think Anthropic might be worse than OpenAI? Anything specific, or just vibes?
How can you say this as if supporting Dario is any better.
At the top level of anything there is almost no such thing as a non-asshole.
None of them care genuinely about you they just want your money.
OpenAI’s models could be materially better than Anthropic’s and I still wouldn’t use them because I don’t want to support Altman.
Do you think Amodei is different?
The choice is not binary. I use DeepSeek (paid) for coding, and Qwen (free) for casual stuff from the browser chat UI.
I mean paying for deepseek is sending money to China, and any company in China is pretty much an extension of the CCP (or can become one at the snap of their fingers). I don't think this is much better or worse than the American AI companies.
Wait, people actually pay for DeepSeek?
Probably. When most people choose to chime in AGAINST the idea of not funding evil people, I think their arguments are disingenuous; they are just looking for a way to excuse their own behavior, which they know is bad. They don't want to give up the convenience of, say, the nice Tesla that they would like to own, and make excuses about why it is ok to enrich a nazi even further.
No. He's worse. Much worse.
Why?
Why is Altman worse?
Dario is constantly fearmongering to generate press, gaslighting, and contradicting himself. Mythos is the most recent example of that. It was never too powerful to release, that was a lie to generate publicity and fear, and an excuse because they didn't have the compute to serve it. People were finding the same bugs and exploits using GPT5.4, GPT5.5, and lesser models. Now all of a sudden, they do have the compute, and now they're saying that Mythos is releasing in the coming weeks.
Anthropic is constantly caught up in ethical scandals too. They pump the web full of advertising bots. They steal people's tokens, punish you for disabling telemetry, blacklist people they don't like. They had remote code execution vulns in their product for nearly a year and secretly buried that fact, no disclosures at all. Here are some of them: https://clawd.rip
Compared to Scam Altman? Infinitely
Do you hold any amount of power in the world? A project that people care about, or a deliverable that someone depends on?
Just curious how you can afford to care about the guy 7 levels above the men that built and support the API that you buy.
Some people care about things beyond their own immediate self interest.
Some don't, and find it hard to believe others really do.
It’s also a weird argument. You can only spend your money once, and the affected employees also chose to work for a bell-end like Altman (or Zuck, or Musk)
If enough 'small' senior developers, project managers, product owners, internal IT people take a small stand against OpenAI products, that can still sum up to a notable impact
What is this, Sam’s alt account?
People can spend money how they wish. SamA is a prick, so I don’t buy from his company. I don’t buy from Microsoft or Oracle either. Giving a company your money is explicitly supporting them and everything they do. Are you going to force me to buy products from people I don’t agree with?
why would you spend even a fraction of a second defending him
Sam Altman appears to represent a significant liability for OpenAI’s success from this point forward. A big portion of the driver for Anthropic’s meteoric rise over the last six months appears to be folks recognizing “it’s that AI startup not run by Sam Altman.” Anthropic has amazing tech, but its biggest asset at the moment seems to be that “it’s not OpenAI.”
Not saying that’s right or wrong, but it’s clearly a factor holding OpenAI back at this point.
At this point I think it’s more important to have a solid workflow and understanding of how [insert your favorite model here] works and its capabilities, than chasing the next shiny release and jumping back and forth between companies. I just finished my first large project with Codex and it is hard for me to believe Claude can be much better. It may be a bit better or worse, but again, they are all so good now that the user is the one driving the difference.
Unicorns, strapped with rockets, too busy looking at each other to realise the Earth is far gone.
They'll kill us all, or they'll kill each other. They sure as hell ain't making the world a better place, like they promised.
Dario really gives: "I'll make the world a better place after I burn it to the ground, I promise."
codex gpt-5.5 is far superior to opus 4.7 working on large projects
Not everyone is a developer...
And 4.7 is so last week..
Soon none of us will be! right?
GPT-5.5 is the better programmer but Opus 4.8 remains the better system architect and product designer.
Codex is very "miss the forest for the trees", but is much better at successfully making large changes in large codebases. Claude Code makes more mistakes, but has more taste and a better grasp on idiomatic and elegant software development.
If you can afford to, I recommend juggling both.
I find arguing that a complex weighted graph has a taste is interesting.
This is not a jab, but a genuine curiosity of mine.
The taste that the complex weighted graph was trained on was better for one than the other I think is the long winded way to say it
The roulette pockets for the model are bigger for some outputs than others. Draw a big enough black box around it and a different one around humans and it's indistinguishable.
More interesting than arguing a jumble of electrochemical reactions has taste? That may seem more readily familiar but is no less strange if you prod at it. Nonetheless it’s difficult to argue that either doesn’t produce output that has qualities of discernment (i.e. taste).
Great analysis and follows my experience as well. Codex is better when you know how you want the design and the architecture and you drive the agent a lot more aggressively. Claude Code feels like more autopilot so executives and users who didn’t code before AI like it a lot more.
But I feel like an expert who can drive GPT aggressively will outperform Opus. It’s why some smart people I know are opting for GPT and have fallen off on Opus. It’s like asking an F1 driver to sit in a taxi.
This is exactly right. Claude has baked in autonomy and preferences that let it handle underspecified prompts elegantly, which makes it seem smarter to people who like to prompt that way, but it also ignores instructions and fights you on things, which makes it a bad model for people who know what they want to do and specify it.
Opus 4.7 (haven't tried 4.8) just really struggles writing correct code for complicated (i.e. valuable) work. I can handle architecture, which takes <1% of my time anyway. But writing code that's wrong is a cardinal sin. I've had much more luck with GPT 5.5 so far.
In what ways? LM Arena has Opus 4.7 w/ 1567 -/+ 7 vs. 1505 -/+ 10 from GPT-5.5 Codex in code. I'm currently using both.
Admittedly my recent experience tilts toward Opus, now 4.8, but you and others have my interest piqued re: GPT-5.5 Codex so I'm trying that more now.
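For a sense of what a 62-point gap means in practice, here is a small sketch. It assumes the scores behave like classic Elo ratings on a 400-point logistic scale, which is an assumption about how the leaderboard numbers are scaled rather than something stated above; `expected_win_rate` is an illustrative name.

```python
def expected_win_rate(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected share of head-to-head wins for A over B under the Elo logistic model."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


if __name__ == "__main__":
    # Scores quoted in the comment above: 1567 vs. 1505.
    print(f"{expected_win_rate(1567, 1505):.3f}")
```

Under that assumption the higher-rated model would be preferred in a bit under 60% of pairwise votes, which is a real edge but far from a blowout.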
Opus 4.7 is not the current version of Opus.
GPT 5.5 still invents facts rather than looking them up, and manages to come across both as condescending and sycophantic. It feels like talking to a used car salesman.
Funny cause I'm quite literally having this exact issue with 4.8 as we speak. I've been going back and forth with Claude since yesterday afternoon on chopping up, stabilizing and facilitating recovery on a flaky mega-pipeline. Not 5 minutes ago, I had to remind it that two of the solutions it proposed were not possible because the target technology doesn't allow what it wanted to do, despite pointing it to the very docs that say it can't be done in the first place.
As far as its tone... Both feel sycophantic as hell to me. To be honest, they all just feel that way.
> GPT 5.5 still invents facts rather than looking them up
So does Claude, what’s your point?
I used it and ChatGPT this week in trying to assist troubleshooting a complex DB related issue and Claude had to apologise no less than three times in which it admitted to talking complete shit.
Just one example of the kind of shit it dribbled:
> I need to be upfront with you. I should not have claimed X as if I knew that for a fact. That was overreach on my part.
You're using last week's model; Opus 4.7 is old news. Opus 6.9 is the new hotness; it is a better product manager than GPT, and has more X productivity. It replaced our junior dev team, and tells me my hair looks good.
I'm experiencing the same. Codex gpt-5.5 has more brilliant intuitions and writes less code, i.e. it identifies the exact point at which the modification should be made. Nevertheless, huge improvements on personality from opus 4.7 (it was too accommodating) to opus 4.8
I strongly believe the reason gpt-5.x performs so well on large projects is because of the focused training they've done on their dedicated apply_patch primitive.
The official implementation of apply_patch is well thought out. It is a two-phase process that will not actually make any changes until every edit in the change set is unambiguous. The pre-commit error feedback usually fixes anchoring issues with one or two additional attempts. It generally goes something like:
The anchor error feedback helps massively because in this implementation it also returns the current line numbers where the problem was found.
Techniques that replace the whole file or depend on find-replace are useful in more isolated contexts. However, when you need to refactor 20+ files, something like apply_patch is what you want. Anything that depends on specific line numbers for actual replacement targets is a total dead end for complex edit scenarios.
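The validate-then-commit idea described above can be sketched roughly as follows. This is not OpenAI's apply_patch implementation or its patch format: `Hunk`, `PatchError`, `anchor_lines` and `apply_patch_two_phase` are hypothetical names, files are modeled as an in-memory dict, and an anchor is simply a snippet of text that must match exactly once.

```python
from dataclasses import dataclass


@dataclass
class Hunk:
    path: str         # file the hunk targets
    anchor: str       # existing text that must match exactly once in that file
    replacement: str  # text that replaces the anchor


class PatchError(Exception):
    """Raised when any hunk fails validation; nothing has been changed."""


def anchor_lines(text: str, anchor: str) -> list[int]:
    """Return the 1-based line number of every place the anchor starts."""
    hits = []
    start = text.find(anchor)
    while start != -1:
        hits.append(text.count("\n", 0, start) + 1)
        start = text.find(anchor, start + 1)
    return hits


def apply_patch_two_phase(files: dict[str, str], hunks: list[Hunk]) -> dict[str, str]:
    # Phase 1: validate every hunk against a staged copy; the input is untouched.
    staged = dict(files)
    errors = []
    for hunk in hunks:
        if hunk.path not in staged:
            errors.append(f"{hunk.path}: file not found")
            continue
        hits = anchor_lines(staged[hunk.path], hunk.anchor)
        if not hits:
            errors.append(f"{hunk.path}: anchor not found")
        elif len(hits) > 1:
            # Report current line numbers so the caller can write a tighter anchor.
            errors.append(f"{hunk.path}: anchor is ambiguous, matches at lines {hits}")
        else:
            staged[hunk.path] = staged[hunk.path].replace(hunk.anchor, hunk.replacement, 1)
    if errors:
        raise PatchError("; ".join(errors))
    # Phase 2: every hunk was unambiguous, so hand back the fully patched set.
    return staged
```

The point of the design is that one bad anchor in file 17 of 20 rejects the whole change set with actionable feedback, instead of leaving the first 16 files half-refactored.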
https://developers.openai.com/api/docs/guides/tools-apply-pa...
My problem with codex/gpt is that it is too verbose (mostly js and python): a lot of helper functions, a lot of 1 or 2 line functions used in 1 place only, a lot of types or proxy-like objects.
I have specific skills for trying to avoid this, but nevertheless I spend half of the time fighting with its verbosity.
Currently, I'm trying to scaffold the functions/classes I know I need with NotImplemented and ask it to implement only inside those specific places. It's a little bit better, but I still have to fight with function-in-function definitions ...
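A scaffold of that kind might look like the sketch below. The class and method names are made up for illustration; the idea is that signatures and docstrings are fixed by hand and the model is only asked to fill in the bodies. Note that in Python the idiomatic stub raises `NotImplementedError`; the `NotImplemented` constant is a separate thing meant for binary operator methods.

```python
class InvoiceParser:
    """Hand-written scaffold: the shape is fixed, only the bodies are left open."""

    def parse_line(self, line: str) -> tuple[str, float]:
        """Split an 'item,price' line into (item, price)."""
        raise NotImplementedError  # to be implemented in place, no extra helpers

    def total(self, lines: list[str]) -> float:
        """Sum the prices of all lines using parse_line."""
        raise NotImplementedError  # to be implemented in place, no extra helpers
```

Any call path that reaches an unfilled stub fails loudly, which makes it easy to check that only the intended places were touched.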
Ah, it’s a good time to check in with gwern on our conversation about oAI vs Anthropic: https://news.ycombinator.com/item?id=40816755 and our predictions (ca two years ago).
Upshot - poetry expertise does not seem to be the primary focus these days, perhaps to the detriment of the entire world. We did move on from training scaling to “test time” scaling (which I hate as a name btw), Ilya does not seem to have been needed, (although I am really curious what he’s building).
My prediction that you want to be deeply embedded and really rich and part of global infrastructure feels good. My suggestion that oAI / MS would be able to use the lead in 2024 to extend was wrong.
Neither of us talked much about coding as a product that would drive value and behavior, which is super interesting to me, we were probably six months from seeing real competence of any sort there way back in June 2024.
We both seemed to think there would be a single breakout company, or could be one, (although I did suggest buying the basket), clearly not the case with GOOG oAI and Anthropic all posting serious revenues this last quarter / year.
One area of Anthropic that was nascent in 2024, but that I have come to think is super valuable is their mechinterp group. I still don’t see work done by other labs (at least published) to nearly the quality of Anthropic. And the group has clearly moved into a period of productivity; there’s a good chance in my mind it could provide a truly enduring strategic advantage as a tool to be used by the taste makers steering the ship. In 2024, interpretability seemed almost impossible to get a handle on — today, the sustained chipping away at the problem makes a lot more look possible.
They are far far better at marketing than OpenAI
They just have no moral issues with spamming the internet with bots. They utilize blackhat tactics whenever they can to get an upper hand. Every social media platform is absolutely chock-full of Anthropic and Claude promoting bots, and you know they're bots because they all repeat the same things, in the same wording. X in particular seems to have millions of them.
qazinform.com seems to be shadow-banned (and posted only by OP): https://news.ycombinator.com/from?site=qazinform.com
UPD https://en.wikipedia.org/wiki/Kazinform
I get the feeling this also means AI works very well for the general coding tasks and that's their biggest success in terms of difficulty AND people paying for it.
Of course every AI company has been overpromising and pumping the numbers as much as possible, but OpenAI has been hitting the reality wall harder, both because their people haven't been able to keep improving at a faster rate and because of their whole cost structure and the financial plates they keep spinning.
This doesn't invalidate the fact Anthropic is also overhyped to the max for their IPO.
Bernie Madoff would be jealous. Stealing all open source and reselling "git clone" + "sed" for $1 trillion is something he did not achieve.
The chutzpah is remarkable.
They are selling shovels, not mining gold themselves though.
So it's more like selling a derivative on a promise to steal open source for you in a useful way.
I've seen fewer people insisting OpenAI has a moat lately, but I'm still not sold the big winner will be either of these two in the end.
In this game, who wins - in the long term - is who has the best model: so far OpenAI is ahead, so in the long term this is what matters. However, for the same reason, if in the future open weight models get very near the quality of frontier labs, Anthropic and OpenAI will be out of business very soon. The game they play only makes sense if their SOTA models do things that other models can't do at a comparable level.
IMO bad take.
You can theoretically do most things AWS does most of the time, yet people pay a premium for it and keep paying for it, even though alternatives are cheaper, simpler and more performant.
I'd bet you that after 20 years OpenAI and Anthropic would still be around and kicking.
You might have a subpar product (for the price) but the reputation and history is what makes people open their wallets.
> I'd bet you that after 20 years OpenAI and Anthropic would still be around and kicking.
Depends. The bigger the bubble, the bigger the pop.
Only a few unicorns from the dot-com bust came out the other side (Amazon, Google, ... anyone else?), and that was a piddling affair compared to this one.
I have the same impression. Strange to see this being downvoted & it was only after reading the comment that I read the username and found out it's antirez!
Now, I think that with these companies IPO'ing and Nasdaq and others bending themselves and their rules to cater to them (as in the case of SpaceX), these companies are very close to an IPO.
So for the employees, they are probably gonna get good evaluations, at least in the short term, and perhaps they are having a problem which is worth having.
But as you have suggested, I feel like the whole thing might be flaky especially given open source models. I believe that OSS models are at worst close to literal SOTA ~6 months ago.
So OpenAI & Anthropic have to somehow always be on the edge to get better models to not lose this (imo) very small time grip that they have, all while losing billions of dollars and having to worry about profitability & so many other concerns in and of itself.
I don't think that there is any other thing inside CS or any industry where two pieces of software being almost comparable, with not much moat around them except a diff of 6 months at best, is something on which trillions of dollars float around. We don't know how things will pan out, but if I have to guess, it might not be looking good for OAI and Anthropic, especially over the longer horizon.
this is like saying the car with the better engine wins, but all we're doing is commuting to work
Some say the founder worked at Baidu before. Is that true?
I dunno, the latest Opus models seem to be tuned to waste money... and Claude is kinda lazy lately?
Most people think the current valuation is for the models themselves. Actually, they're building the infrastructure for the next 50 years.
The dark fiber of our time?
> Actually, they're building the infrastructure for the next 50 years.
What infrastructure? The hardware would be outdated in 3 - 5 years, after all. What other infrastructure is needed for AI?
my take: it's because of the naming. Amodei, Claude and Mythos have this money-throwing vibe to them
The headline is false. First off, OpenAI hasn’t raised a recent round so you can’t compare these two companies randomly like this. Second, Anthropic is known to have accounting methods that give it more revenue. And neither of these companies is known to be doing GAAP accounting
All overvalued.
By an order of magnitude.
tesla has been overvalued for almost a decade with little sign of slowing down, it really doesn't seem to matter anymore
This is depressing. Anthropic really is the last company we want to see leading this race, given how greedy they are. Let's not forget all of the lying and gaslighting too. The creator of OpenClaw made this I believe: https://clawd.rip
Stealing people's tokens because you use a product they don't like... That shows the morals they have. Actions speak louder than words. Disabling people's caches because they disable telemetry was another juicy one that I don't believe is on this site. In fact there are far more I remember that aren't even listed here.
How much dilution? Who’s getting the value?
Investors of both should read this: https://open.substack.com/pub/sublius/p/srt-introspect-why-c...
"Investors who have poured hundreds of billions into closed-source labs are betting on an unprovable safety moat".
Nobody is investing in closed-source labs for safety reasons. Being able to explore in more detail what and how the model is thinking is nice, but by no means a game changer. What matters to investors and most of the users is that the model gives the right answer at the end.
they don't care, they're driving towards a cliff full speed and are all counting on jumping out at the right moment
Bummer, they are the least friendly to open source, and the most incompatible with free use of your subscription via your own tools/custom harnesses.
Pointless article (like much of the AI marketing hotness and spin room).
> The new valuation is nearly three times higher than the company’s February valuation, when Anthropic was estimated to be worth around $380 billion.
> In March, OpenAI was valued at $852 billion following a record $122 billion funding round.
Basically, today (Late May) we're declaring Anthropic the most valuable. They've nearly tripled in value since February. But also, OpenAI was $852B in March and presumably has grown since then.
In a few weeks we'll either have a new round of funding for OpenAI or they'll announce their IPO and the hype train will be abuzz that they're now the most valuable.
Either they are getting fleeced or they are getting very good terms for the investments
It’s because the programming works.
OpenAI spent its resources on AGI whilst Claude worked on making programming work.
Google Gemini is out of the race entirely; its programming AI is a joke.
It is unclear which strategy will work in the end. 3.5 flash uses fewer tokens and is cheaper.
The models aside, my impression is that Anthropic is winning in large part because of very pragmatic and high-velocity product development on top of them; like with Claude Code.
Like actually iterating hard to make them useful. Many, many details matter here.
I haven't tested the similar OpenAI/Google tools in detail lately though. Previously I found them way too generic and unpolished to be useful.
Is there something to this?
My impression as well. OpenAI was riding the high of ChatGPT with a very confusing and seemingly unfocused offering beyond that. Anthropic was always laser focused on business use cases. Claude Code being the big one. Finance seems to be their next target.
Anthropic has much narrower capabilities. No image generation, no video generation, no 3d world models, barely any voice stuff. But they know who their target customers are, and their API has a model selection anyone can understand and pricing that rarely changes. Focus and predictability.
Start what?
ChatGPT dropped the ball for long enough that most devs and technical people went to Claude for a year or more. They still probably have the most normie market share, and are at least trying to make up for some of that delay with their latest model, so it'd be interesting to see
The "normie" market doesn't pay for enterprise features though. They might cost more in inference than they make back from advertising.