points by LiamPowell 12 hours ago

They should show this then to demonstrate that it's not something that has already been fully considered. Running LLMs over projects that I'm very familiar with will almost always have the LLM report hundreds of "vulnerabilities" that are only valid if you look at a tiny snippet of code in isolation because the program can simply never be in the state that would make those vulnerabilities exploitable. This even happens in formally verified code where there's literally proven preconditions on subprograms that show a given state can never be achieved.

As an example, I have taken a formally verified bit of code from [1] and stripped out all the assertions, which are only used to prove the code is valid. I then gave this code to Claude with some prompting towards there being a buffer overflow and it told me there's a buffer overflow. I don't have access to Opus right now, but I'm sure it would do the same thing if you push it in that direction.

For anyone wondering about this alleged vulnerability: Natural is defined by the standard as a subtype of Integer, so what Claude is saying is simply nonsense. Even if a compiler is allowed to use a different representation here (which I think is disallowed), Ada guarantees that the base type for a non-modular integer includes negative numbers IIRC.

[1]: https://github.com/AdaCore/program_proofs_in_spark/blob/fsf/...

[2]: https://claude.ai/share/88d5973a-1fab-4adf-8d29-8a922c5ac93a

SpicyLemonZest 10 hours ago

They've promised that they will show this once the responsible disclosure period expires, and pre-published SHA3 hashes for (among others) four of the Linux kernel disclosures they'll make.

> Running LLMs over projects that I'm very familiar with will almost always have the LLM report hundreds of "vulnerabilities" that are only valid if you look at a tiny snippet of code in isolation because the program can simply never be in the state that would make those vulnerabilities exploitable.

Their OpenBSD bug shows why this is not so simple. (We should note of course that this is an example they've specifically chosen to present as their first deep dive, and so it may be non-representative.)

> Mythos Preview then found a second bug. If a single SACK block simultaneously deletes the only hole in the list and also triggers the append-a-new-hole path, the append writes through a pointer that is now NULL—the walk just freed the only node and left nothing behind to link onto. This codepath is normally unreachable, because hitting it requires a SACK block whose start is simultaneously at or below the hole's start (so the hole gets deleted) and strictly above the highest byte previously acknowledged (so the append check fires).

Do you think you would be able to identify, in a routine code review or vulnerability analysis with nothing to prompt your focus on this particular paragraph, how this normally unreachable codepath enables a DoS exploit?

  • LiamPowell 9 hours ago

    I agree they found at least some real vulnerabilities. What I think is nonsense is the claim of finding thousands of real critical vulnerabilities and claims that they've found other Linux vulnerabilities that they simply can't exploit.

    There are notably no SHA-3 sums for all their out-of-bound write Linux vulnerabilities, which would be the most interesting ones.

    • SpicyLemonZest 9 hours ago

      Sure. I guess it's a question of whether this is the worst they found or a representative case among thousands. It sounds like you'd know better than me, so I'm going to provisionally hope you're right...

    • tptacek 9 hours ago

      Why is that nonsense? Do you think they exhausted all their compute finding just the few big vulnerabilities they've already discussed, and don't have a budget to just keep cranking the machine to generate more?

      They're not publishing SHAs for things that aren't confirmed vulnerabilities. They're doing exactly the thing you'd want them to do: they claim to have vulnerabilities when they have actual vulnerabilities.

      • SpicyLemonZest 9 hours ago

        If I understand Anthropic's statements correctly, they've been cranking for a while, and what they have now is the results of Mythos-enabled vulnerability scans on every important piece of software they could find. (I do want to acknowledge how crazy it is that "vulnerability scan all important software repos in the world" is even an operation that can be performed.)

        • tptacek 9 hours ago

          We talked to Nicholas Carlini on SCW and did not at all get the impression that they've hit everything they can possibly hit. They're still proving the concept one target at a time, last I heard.