Testing distributed systems with AI agents

I like the “claim-driven” framing.

For stateful systems, tests named after setup details often get weakened over time. Tests named after the claim they are trying to falsify are harder to water down.

The part I’d be most interested in is how well this works for business invariants like idempotent posting, no lost acknowledgements, and recovery after partial failure.
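
To make the naming point concrete, here is a minimal sketch of a check named after the claim it tries to falsify, in this case idempotent posting under a lost acknowledgement. Everything in it is an assumption for illustration: `Ledger`, `post_with_lost_ack`, and the key `"txn-1"` are hypothetical stand-ins for a real system under test, not part of Jepsen or any tool mentioned in this thread.

```python
class Ledger:
    """Hypothetical in-memory ledger standing in for the system under test."""

    def __init__(self):
        self.balance = 0
        self._applied = {}  # idempotency key -> amount already posted

    def post(self, key, amount):
        # Apply a posting at most once per idempotency key.
        if key not in self._applied:
            self._applied[key] = amount
            self.balance += amount
        return "ack"


def post_with_lost_ack(ledger, key, amount):
    """Simulate a partial failure: the write lands, but the ack never arrives."""
    ledger.post(key, amount)
    raise TimeoutError("ack lost")


def check_retried_post_is_applied_exactly_once():
    # Named after the claim under attack, not after the setup details.
    ledger = Ledger()
    try:
        post_with_lost_ack(ledger, "txn-1", 100)
    except TimeoutError:
        ledger.post("txn-1", 100)  # client retries with the same key
    return ledger.balance


# The claim holds only if the retry did not double-post.
assert check_retried_post_is_applied_exactly_once() == 100
```

A name like `check_retried_post_is_applied_exactly_once` is hard to keep while weakening the assertion, whereas something like `test_ledger_with_timeout` would survive almost any edit to the body.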

cyanydeez 1 hour ago

I think all these scripts fall down where they rely on context rather than actual guardrails. What we need are siloed protocols, something like an SSH protocol, that keep the harness doing its work through the protocol rather than through a bunch of loosely organized bash scripts. Plus, the harness needs to live outside the environment, so it's never something you have to install on a remote system, whether that's a container, a VM, or an SSH host. We shouldn't base everything around running bash without a secure tunnel into the location of interest.
The failure mode of these tools is self-destructive in many cases.

aphyr 41 minutes ago

Welp. Glad to see Li Shen's using the last fifteen years of my work to automate away my job. :-/

-- edit --

I've seen clients and some colleagues working on things like this, and I can't seem to put into words how disheartening it is. With the exception of some private analysis work, I've shared everything I've built, with everyone, for free. Papers like Elle took years to think through, implement, test, and write. That's free. High-quality checkers, Knossos, Jepsen itself, and the analyses I've put my life into: all public, all free. I put a lot of time into docs and support; essentially all unpaid. I teach classes and give conference talks to make these techniques broadly accessible because I want other engineers to be able to make high-quality systems.

At the same time, I've got a giant pile of debt from an old house that just won't quit throwing curveballs at me, and it's gonna be a few more decades before I can retire. The fact that my clients are willing to pay for this work is why I can invest so much time in R&D and give it all away. When I see someone roll in and just tell an LLM "Go use Jepsen and Elle and figure this out", it's like... well fuck. Is this even possible any more?

Thankfully, LLMs are still really bad at my job, but I don't know if, or how long, that will last. They also don't need to be good to be useful.

And if these LLM tools work, it's good, right? They find bugs, systems get safer. I want systems to be safer. On the other hand, I'm motivated to share what I do because I really want to help people. If it's just LLMs... it feels hollow. I think about this every time I've tried to work on open-source in the last few months. When I spend hours trying to figure out how to keep naming consistent, how to preserve compatibility over a decade, how to make complex code approachable through quality documentation... I have a person in mind. Someone I'll never meet, but they'll see that work, and their life will be a little easier, and maybe they'll smile. I've been talking with my therapist about it: how the work I used to do thinking about other human beings now feels purposeless. How the effort I put into making these tools and ideas accessible will inevitably cannibalize my own employment, because someone, somewhere, is going to tell an LLM "Hey, go do that", and I work in a very, very small niche. It feels like incipient depression.

Recently I've been thinking about taking Jepsen and its supporting libraries closed-source, and changing the way I write reports--instead of teaching people how to test and what to look for, just telling people the results. I don't want to do this. It's bad for everyone, but maybe it buys me a few years of runway. Enough to pay off the debt and figure out what I can do next with this body.

Fuck.

gkfasdfasdf 17 minutes ago

Coming from the creator of Jepsen, that is a pretty big endorsement.