"Running leading LLMs across the eleven systems shows that LLMs are great at producing correct TLA+ syntax but struggle to ensure conformance and appropriate invariants."
I think that's what author implies by this sentence in the intro:
> It’s still your responsibility to understand your system and define what “correctness” means, and you need a high-level understanding of temporal logic.
Unfortunately the benefit of TLA+ is the act of modeling your system painstakingly. The actual checker helps confirm your hypothesis, etc. But skipping the modeling and outsourcing it is not ideal. I’ve always struggled reasoning about models my team mates wrote, and will often have to mentally go through the process of arriving at the same abstractions/invariants etc before I can understand it.
I did this myself a few weeks ago and the technique that helped me the most was to compare the TLA output against the race-conditions I could construct by hand. I worked iteratively and unit-tested the model by constructing a model that didn't have certain race protection mechanisms and validating that the model generated the expected race.
You can also work backwards from the races it generates and ensure that they're real races.
https://www.sigops.org/2026/can-llms-model-real-world-system...
"Running leading LLMs across the eleven systems shows that LLMs are great at producing correct TLA+ syntax but struggle to ensure conformance and appropriate invariants."
I think that's what author implies by this sentence in the intro:
> It’s still your responsibility to understand your system and define what “correctness” means, and you need a high-level understanding of temporal logic.
Unfortunately the benefit of TLA+ is the act of modeling your system painstakingly. The actual checker helps confirm your hypothesis, etc. But skipping the modeling and outsourcing it is not ideal. I’ve always struggled reasoning about models my team mates wrote, and will often have to mentally go through the process of arriving at the same abstractions/invariants etc before I can understand it.
I did this myself a few weeks ago and the technique that helped me the most was to compare the TLA output against the race-conditions I could construct by hand. I worked iteratively and unit-tested the model by constructing a model that didn't have certain race protection mechanisms and validating that the model generated the expected race.
You can also work backwards from the races it generates and ensure that they're real races.
What did you do to get the TLA output?
I was working from a design-doc, not code.
"look at @design-document and generate a TLA+ specification for the interactions between local and remote"