Anthropic walks back policy that could have 'sabotaged' researchers using Claude

www.wired.com

68 points by ericflo 15 hours ago

https://web.archive.org/web/20260611033414/https://www.wired...

One takeaway from this is that Anthropic do not want their public-facing model to be good at advanced ML research.

With that incentive it seems reasonable to assume that future Anthropic models will not be good at ML research, at least because they aren't incentivised to make it so, and therefore using other models, perhaps open source, will be the way to go.

This does make the large assumption that they can afford to train a parallel model for themselves to assist in their own research. But given their huge valuation, and incentives, that extra cost is feasibly worth it for them.

nomorehere 6 hours ago

I think I've seen this model used by major academic publishers before: https://files.catbox.moe/mrh1ll.jpg

impulser_ 15 hours ago

It's kinda weird to think the Chinese AI labs might be more trustworthy than the US labs.

- Anthropic is run by a bunch of nut jobs.

- OpenAI is run by a guy you can't trust.

I don't even know if we should include DeepMind, Meta, or xAI in the conversation of AI labs at this point since they can't produce models better than Chinese labs.

reasonableklout 14 hours ago

To be fair, nerfing Claude on frontier research tasks is consistent with Anthropic's stated beliefs. So in that sense you can trust them to always behave consistently if strangely. But this launch was done very poorly with the lack of transparency on when the frontier research policy was violated.
- impulser_ 13 hours ago
  
  Yeah and their beliefs are fucking crazy and dangerous. They are literally sabotaging their users. They built malware into their model that kicks in if you prompt it about training a fucking AI model. It doesn't tell you; no, it literally sabotages you by editing your prompt and intentionally going against your request.
  You want fucking nut jobs like this building models?
  It's one thing to build safeguards on your model and have it prompt the user back. I'm sorry I can't help you with this request. Chinese models do this for some requests.
  It's another thing to actively try to make the model perform worse for your user on purpose because they asked the model to do something you, the model creator, didn't like.
  Imagine someone is asking a logical medical question and the model swaps the prompt, is less intelligent on purpose, and gives bad advice to this person.
  How do these people not understand they are stupid.
  
  noduerme 11 hours ago
  
  Is it really crazy to nerf a proprietary model to prevent it from training another model? I don't think that's even remotely similar to giving bad medical advice.
  
  Balinares 7 hours ago
  
  Given that the "proprietary" model is built on stolen work at an unprecedented scale, it's at the very least hypocritical to a degree that would not be possible without a fundamentally amoral mindset.
  
  preg_match 3 hours ago
  
  It’s not a nerf, it’s sabotage. That’s different. This is like if you’re driving a car and it detects you’re pulling up to a competing dealership so it cuts the brakes.
  This is, in my mind, effectively malware. We don’t know exactly what code the model will inject, and we certainly don’t know when it will happen. It could very easily introduce vulnerabilities.
  
  skeledrew 5 hours ago
  
  > You want fucking nut jobs like this building models?
  It takes "nut jobs" to advance tech like this at the speed it is. They have strong beliefs and they work hard to realize those beliefs.
redox99 2 hours ago

DeepMind is definitely frontier. They just don't care that much about code.
piyuv 1 hour ago

The thing with Chinese AI labs is that you don’t need to trust them. They publish the models, you can run them on-prem or rent a beefy VPS.

LoganDark 14 hours ago

https://archive.ph/yxYhU

SilverElfin 14 hours ago

The damage is done. They designed Fable to be dishonest and sabotage-y. Look at this report of Claude now randomly changing AI software without being asked to:

https://xcancel.com/hammer_mt/status/2064839924398825798

This is so completely dishonest. But it also shows how deeply anticompetitive Anthropic is. They will talk about safety but it’s not actually about safety: building features like this seems intended to hurt competition in the AI space. They don’t mind if AI helps YOUR competitor but if it means competition for them, they suddenly have a problem with it.

I don’t care that they walked this back. They’ve shown who they are. And what they’re capable of.

nmfisher 14 hours ago

Call me a cynic, but I don't believe this is a genuine change of heart at all. It feels much more like a panicked response to something that might undermine their IPO.

Even if you trust Anthropic today (which I don't), they clearly don't want competition and there's no telling what other shady moves they'll pull in future.

The only sustainable way forward is to support open models. I was already on the fence about whether or not to keep my Max subscription (the extra cost over something like DeepSeek V4 didn't really feel justifiable). This is the tipping point for me, I'll be cancelling my sub before it renews at the end of the month.

LoganDark 14 hours ago

I think they are legitimately convinced that this model is so dangerous it could destroy the world and that they genuinely have the responsibility to prevent it from assisting other models to destroy the world.
I don't think I agree that I should be forbidden from e.g. patching a binary to work on the latest macOS since the company behind it died and intentionally installed a time-based kill-switch (FUCK ADOBE for popularizing that practice). But ooOOooOOoo working with machine code is so cybersecurity and therefore suspicious.
- chatmasta 13 hours ago
  
  The company was founded basically out of the effective altruism movement.
  
  LoganDark 13 hours ago
  
  What is the impact of effective altruism? I looked it up, but I don't understand how it differs from simple logical consideration, i.e. how it would be responsible for any of Anthropic's eccentricities.
  
  chatmasta 13 hours ago
  
  It’s logical consideration with “logical” meaning Spock style logic, ie utilitarianism at all costs. Another prominent EA is SBF for example. It’s designed to sound innocuous and many of its cultish promoters may genuinely believe it’s innocuous, but it’s not.
  
  LoganDark 13 hours ago
  
  Can you help me understand what costs? Utilitarianism alone would not necessarily be so obsessed with these safeguards -- Anthropic seems to have much more of an obsession with moral good than utilitarianism alone would suggest. I feel utilitarianism alone would likely be more obsessed with advancing the technology, making it generally available, and more generally compensating for attackers advancing at similar rates, than with obsessively trying to avoid being the way they get there. In other words, utilitarianism alone wouldn't explain such an obsessive sense of responsibility and fear of reprehensibility over how their tools are used.
  
  cyanydeez 9 hours ago
  
  the problem is pretty simple. what do you think happens when you ignore the source of income needed to be "effectively altruistic" or the timelength needed to implement your altruism.
  if I pollute your groundwater today, but promise you in the future to give you double the clean water, because i'll make billions on some industrial process, you'd understand.
  the simple physics is about entropy: ignoring local effects and claiming you'll do global good is effective altruism.
  unless you misunderstand: the hard part of altruism is knowing whether you are actually doing good or just making yourself feel good by deluding the subject.
  much social science fails to move social improvements from lab to real world implementation because of structural variables that can't be overcome.
  so effective altruism is the belief you can reverse entropy by ignoring local effects...in practice
  
  LoganDark 8 hours ago
  
  Thank you! So, this would cover things like ruining cities with datacenters under the impression that they'll understand once the global superintelligence cures cancer. Or etc.
  I don't think Anthropic is the one aggressively expanding their datacenters to areas where literally everybody does not want them, but that's "effective altruism", right? Just justifying bads with a supposedly larger good?
  
  cyanydeez 5 hours ago
  
  I've got no idea, one of the problems we're all stuck with is American capitalism has spent about half a century ensuring no one knows what it's doing. Read through stuff like this current Uber "problem": https://consumerwatchdog.org/accountability/accountability-r...
  Essentially, all these corporations just use LLCs and random corporations to do the things they don't want on their public books, and the actual benefactors aren't required to be listed or registered anywhere. America is one of the largest owners of these poorly defined companies. Combine that with how much dark money can be used in American politics, and there's zero chance anyone can figure out who's pushing what agenda.
  But we know they all came from the same tech scene, so they could all be in any one of those cults.
  
  EnPissant 12 hours ago
  
  It's not real. It's like naming your movement "The Good People". It sprouted from the "Rationalist" community, which is even more self-aggrandizing.
  Neither has any hope of doing any good for the world as they don't understand evolutionary pressures. They are set up to reward making members feel smart, not accomplishing anything.
  And if they ever gain any real power, they will be corrupted immediately.
  
  LoganDark 12 hours ago
  
  I don't see any of that in Anthropic at all. They're not intelligence above all else, not by a long shot. They're scared of intelligence and obsessed with ensuring it can't be abused, even as they advance the frontier.
  
  Planktonne 8 hours ago
  
  That's what they claim, but it's not supported by their actions. As in the parent comment: saying you're the good guys doesn't make that real.
  
  cyanydeez 9 hours ago
  
  which is effectively a delusion about outracing entropy: guys, if we just make a lot of money, we can fix all the problems that making a lot of money creates
- SilverElfin 13 hours ago
  
  > I think they are legitimately convinced that this model is so dangerous it could destroy the world and that they genuinely have the responsibility to prevent it from assisting other models to destroy the world.
  Do they really believe that? Or do they just want to control this technology exclusively with moves like this and with pushing for regulatory capture after complaining about safety all the time? Didn’t Dario say that GPT2 or GPT3 would present a similar destroy-the-world level of danger?
reasonableklout 13 hours ago

I guess I don't understand why it's shady. It seems more like a poorly executed decision to enforce a publicly stated policy (it's been against Anthropic's ToS to use their models on frontier ML research for a while now). After all, people found out about this through their published system card.
It is definitely a bad idea to do this without notifying the user, because users who are incorrectly affected will have no way of providing feedback or getting support. And it is also anticompetitive, but if you truly believe that AI is not a normal technology, it is rational.
- nmfisher 12 hours ago
  
  It's shady because they were going to silently poison your outputs.
  It's actually worse than it sounds initially, because Fable isn't actually omniscient when it comes to safety classification. Many people (myself included) had refusals or fallback to Opus 4.8 for seemingly compliant/innocuous requests.
  Wouldn't you be pissed off if they decided to sabotage your project despite having done nothing wrong?
- dannyw 10 hours ago
  
  The trouble is the silence, not Anthropic setting guardrails. Claude saying "I'm sorry, I can't assist further because it looks like you're [XYZ]" is fine.
  We all know the false positive rates for classifiers on Fable. Imagine being an ML researcher working on any kind of ML/AI project that isn't against their ToS, and having your codebase poisoned and sabotaged silently.

rvz 15 hours ago

Too late.

We now all know that Anthropic CAN do that if they want to. The fact that they told you upfront about it shows that their arrogance on this self-sabotage against their customers is at stratospheric levels.

Believe them the first time, and they are not your friends at all.

dtj1123 13 hours ago

Great, now just let me use it for bioinformatics and we're good.