We should revisit literate programming in the agent era

292 points by horseradish 4 months ago

palata 4 months ago

I am not convinced.

- Natural languages are ambiguous. That's the reason why we created programming languages. So the documentation around the code is generally ambiguous as well. Worse: it's not being executed, so it can get out of date (sometimes in subtle ways).

- LLMs are trained on tons of source code, which is arguably a smaller space than natural languages. My experience is that LLMs are really good at e.g. translating code between two programming languages. But translating my prompts to code is not working as well, because my prompts are in natural languages, and hence ambiguous.

- I wonder if it is a question of "natural languages vs programming languages" or "bad code vs good code". I could totally imagine that documenting bad code helps the LLMs (and the humans) understand the intent, while documenting good code actually adds ambiguity.

What I learned is that we write code for humans to read. Good code is code that clearly expresses the intent. If there is a need to comment the code all over the place, to me it means that the code is maybe not as good as it should be :-).

Of course there is an argument to make that the quality of code is generally getting worse every year, and therefore there is more and more a need for documentation around it because it's getting hard to understand what the hell the author wanted to do.

hosh 4 months ago

I don’t have my LLMs generate literate programming. I do ask them to talk about tradeoffs.
I have full examples of something that is heavily commented and explained, including links to any schemas or docs. I have gotten good results when I ask an LLM to use that as a template and tell it that not everything in there needs to be used; it cuts down on hallucinations by quite a bit.

bottd 4 months ago

> If there is a need to comment the code all over the place, to me it means that the code is maybe not as good as it should be :-)
If good code was enough on its own we would read the source instead of documentation. I believe part of good software is good documentation. The prose of literate source is aimed at documentation, not line-level comments about implementation.
- WillAdams 4 months ago
  
  https://diataxis.fr/
  (originally developed at: https://docs.divio.com/documentation-system/) --- divides documentation along two axes:
  - Action (Practical) vs. Cognition (Theoretical)
  - Acquisition (Studying) vs. Application (Working)
  which for my current project has resulted in:
  - readme.md --- (Overview) Explanation (understanding-oriented)
  - Templates (small source snippets) --- Tutorials (learning-oriented)
  - Literate Source (pdf) --- How-to Guides (problem-oriented)
  - Index (of the above pdf) --- Reference (information-oriented)
  
  zenoprax 4 months ago
  
  I've been trying to implement this as closely as possible from scratch in an existing FOSS project:
  https://github.com/super-productivity/super-productivity/wik...
  Even with a well-described framework it is still hard to maintain proper boundaries and there is always a temptation to mix things together.
  
  ramses0 4 months ago
  
  README => AGENTS.md
  HOWTO => SKILLS.md
  INFO => Plan/Arch/Guide
  REFERENCE => JavaDoc-ish
  I'm very near the idea that "LLMs are randomized compilers" and that the human prompts should be treated with 1000% more care. Don't (necessarily) git commit the whole megabytes of token-blathering from the LLM, but do keep the human prompts:
  "Hey, we're going to work on Feature X... now some test cases... I've done more testing and Z is not covered... ok, now we'll extend to cover Case Y..."
  Let me hover over the 50-100 character commit message and then see the raw discussion (source) that led to the AI-generated (compiled) code. Allow AI.next to review the discussion/response/diff/tests and see if it can expose any flaws with the benefit of hindsight!
- AdieuToLogic 4 months ago
  
  > If good code was enough on its own we would read the source instead of documentation.
  An axiom I have long held regarding documenting code is:
  Code answers what it does, how it does it, when it is used, and who uses it. What it cannot answer is why it exists. Comments accomplish this.
  
  eru 4 months ago
  
  An important addendum: code can sometimes, with a bit of extra thinking on the part of the reader, answer the 'why' question. But it's even harder for code to answer the 'why not' question. I.e., what other approaches did we try that didn't work? Or what business requirements preclude these other approaches?
  
  1718627440 4 months ago
  
  I don't think this is enough to completely obsolete comments, but a good chunk of that information can be encoded in a VCS. It encodes all past approaches and also contains the reasoning and why not in annotation. You can also query this per line of your project.
  
  eru 4 months ago
  
  Git history is incredibly important, yes, but also limited.
  Practically, it only encodes information that made it into `main`, not what an author just mulled over in their head or just had a brief prototype for, or ran an unrelated toy simulation over.
  
  1718627440 4 months ago
  
  If you throw away commit messages, that is on you, it is not a limitation of Git. If I am cleaning up before merging, I'm maybe rephrasing things, but I am not throwing that information away. I regularly push branches under 'draft/...' or 'fail/...' to the central project repository.
  
  kalaksi 4 months ago
  
  Sounds easier (for everybody) to just use comments.
  
  1718627440 4 months ago
  
  You put past failed implementations in comments? That sounds like a nightmare. I'd rather only include a short description in the comment that can then link to the older implementation if necessary.
  
  kalaksi 3 months ago
  
  What? No, a short explanation of why some approach doesn't work well.
  
  eru 4 months ago
  
  Sure, but you are still supposed to clean things up to make the life of the reviewer easier.
  There's an inherent tension between honest history and a polished 'lie' to make the reviewer's life easier.
  
  seba_dos1 4 months ago
  
  The "honest" historical record of when I decided to use "git commit" while working on something is 100% useless for anyone but me (for me it's 90% useless).
  git tracks revisions, not history of file changes.
  
  1718627440 4 months ago
  
  The WIP commits I initially recorded also didn't necessarily exist as such in my file system and often don't really work completely, so I don't know why the commit after a rebase is any more of a lie than the commit before the rebase.
  
  eru 3 months ago
  
  It's a 'lie' in the sense that you are optimising for telling a convenient and easy to understand story for the reviewer where each commit works atomically.
  
  necovek 4 months ago
  
  In fairness to GP, they said VCS, not Git, even if they are somewhat synonymous today. Other VCSes did support graph histories.
  Still, "3rd dimension" code reasoning (backwards in time) has never been merged well with code editing.
  
  eru 4 months ago
  
  > Other VCSes did support graph histories.
  Yes, git ain't the only one, but apart from interface difference, they are pretty much compatible in what they allow you to record in the history, I think?
  Part of the problem here is that we use git for two only weakly correlated purposes:
  - A history of the code
  - Make nice and reviewable proposals for code changes ('Pull Request')
  For the former, you want to be honest. For the latter, you want to present a polished 'lie'.
  
  necovek 4 months ago
  
  Not really. Launchpad.net does not have any public branches I could share atm as an example, but Bazaar (now breezy) allowed having a nested "merge commit": your trunk would have "flattened" merge commits ("Merge branch foo"), and under it you could easily get to each individual commit by a developer ("Prototype", "Add test"...). It would really be shown as a tree, but smartness was even richer.
  This was made possible by using a DAG for commit storage and referencing, instead of relying on file contents and series of commits per reference. Merge behaviour was much smarter in case of diverging tip or criss-cross merges. But this ultimately was harder and slower to implement, and developers did not value this enough and they instead accepted the Git trade-offs.
  So you seamlessly did both with a different VCS without splitting those up: in a sense, computers and software worried about that for us.
  
  eru 4 months ago
  
  I am not quite sure what you are describing here. Git's underlying commit graph is a DAG.
  You can use different, custom merge-drivers (or whatever it's called) for Git to get the behaviour you describe here.
  
  necovek 4 months ago
  
  Certainly, but merges are treated differently by default, and getting to this sort of output would require "custom" tooling for things like "git log".
  Whereas bzr just did the expected thing.
  
  1718627440 4 months ago
  
  You can select whether you want the diff to the first or the second parent, which is the difference between collapsing and expanding merges. You can also completely collapse merges by showing first-parent-history.
  Or I do not understand what you mean with "the expected thing".
  
  eru 3 months ago
  
  Yes, `git log --first-parent` has been a godsend for coping with our team's messy non-cleaned up history.
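To make the collapsed and expanded views discussed here concrete, below is a minimal Python sketch of a commit DAG. The commit names and the `PARENTS` table are made up for illustration, and this is only a toy model of what `git log --first-parent` selects compared with a full walk, not how Git stores or orders commits.

```python
# Toy commit graph: each commit maps to its list of parents.
# By convention the first parent is the branch that was merged *into*.
# All commit names here are hypothetical.
PARENTS = {
    "init": [],
    "prototype": ["init"],
    "add-test": ["prototype"],
    "trunk-fix": ["init"],
    "merge-foo": ["trunk-fix", "add-test"],  # first parent = trunk
}


def first_parent_history(tip):
    """Follow only the first parent of each commit (the 'collapsed' view)."""
    history = []
    current = tip
    while current is not None:
        history.append(current)
        parents = PARENTS[current]
        current = parents[0] if parents else None
    return history


def full_history(tip):
    """Visit every ancestor reachable from tip (the 'expanded' view)."""
    seen = []
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.append(commit)
        # Push parents in reverse so the first parent is visited first.
        stack.extend(reversed(PARENTS[commit]))
    return seen


print(first_parent_history("merge-foo"))
print(full_history("merge-foo"))
```

The first call returns only the trunk commits, hiding "prototype" and "add-test" behind the merge; the second reaches all five commits. Both views come from the same graph, which is the point being made about collapsing versus expanding merges.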
  
  1718627440 4 months ago
  
  > - A history of the code
  Which is a causal history, not an editing log. So I don't perceive these to be actually different.
  
  1718627440 4 months ago
  
  > In fairness to GP, they said VCS, not Git
  I did say VCS, but I also don't know what Git is missing in this relation.
  > Other VCSes did support graph histories.
  How does Git not?
  > Still, "3rd dimension" code reasoning (backwards in time) has never been merged well with code editing.
  Maybe it's not perfect, but Git seems to do that just fine for my taste. What is missing there?
  
  crazygringo 4 months ago
  
  But why would you ever put that into your VCS as opposed to code comments?
  The VCS history has to be actively pulled up and reading through it is a slog, and history becomes exceptionally difficult to retrace in certain kinds of refactoring.
  In contrast, code comments are exactly what you need and no more, you can't accidentally miss them, and you don't have to do extra work to find them.
  I have never understood the idea of relying on code history instead of code comments. It seems like it's all downsides, zero upsides.
  
  eru 4 months ago
  
  Both have their place. While I mostly agree with you, there's a clear example where git history is better: delete old or dead or unused code, rather than comment it out.
  
  1718627440 4 months ago
  
  Because comments are a bad fit to encode the evolution of code. We implemented systems to do that for a reason.
  > The VCS history has to be actively pulled up and reading through it is a slog
  Yes, but it also lets you query history, e.g. by function, which gets me to an understanding much faster than wading through the current state and trying to piece information together from the status quo and comments.
  > history becomes exceptionally difficult to retrace in certain kinds of refactoring.
  True, but these refactorings also make it more difficult to understand other properties of code that still refers to the architecture pre-refactoring.
  > I have never understood the idea of relying on code history instead of code comments. It seems like it's all downsides, zero upsides.
  Comments are inherently linear to the code. That is sometimes what you need, but for complex behaviour you'd rather comment things along another dimension, and that is what a VCS provides.
  What I write is this:
  /* This used to do X, but this causes Y and Z and also conflicts with the FOO introduced in 5d066d46a5541673d7059705ccaec8f086415102. Therefore it does now do BAR, see c7124e6c1b247b5ec713c7fb8c53d1251f31a6af */
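For readers who have not used the per-line and per-function history queries mentioned in this subthread, the sketch below builds the corresponding `git log -L` invocations from Python. The helper names and the example path and function name are hypothetical; the `-L<start>,<end>:<file>` and `-L:<funcname>:<file>` forms are the ones documented for `git log`.

```python
import subprocess


def line_history_cmd(path, start, end):
    """Command that shows how lines start..end of path evolved."""
    return ["git", "log", f"-L{start},{end}:{path}"]


def function_history_cmd(path, funcname):
    """Command that shows how the function named funcname in path evolved."""
    return ["git", "log", f"-L:{funcname}:{path}"]


def run(cmd, repo="."):
    """Run a git command inside repo and return its output (needs a checkout)."""
    result = subprocess.run(cmd, cwd=repo, capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    # Hypothetical file and function names, for illustration only.
    print(" ".join(line_history_cmd("src/parser.py", 10, 25)))
    print(" ".join(function_history_cmd("src/parser.py", "parse_header")))
```

Calling `run(...)` on either command only works inside a real repository where the file exists, so the sketch prints the commands instead of executing them.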
  
  AdieuToLogic 4 months ago
  
  > But it's even harder for code to answer the 'why not' question.
  Great point. Well-placed documentation as to why an approach was not taken can be quite valuable.
  For example, documenting that domain events are persisted in the same DB transaction as changes to corresponding entities and then picked up by a different workflow instead of being sent immediately after a commit.
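The design described here is often called a transactional outbox. As a rough sketch, assuming an SQLite database and made-up table, function, and event names, the "why not send immediately" reasoning can sit in a comment right next to the transaction it explains:

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
conn.execute(
    "CREATE TABLE outbox (id INTEGER PRIMARY KEY, payload TEXT, sent INTEGER DEFAULT 0)"
)


def place_order(order_id):
    # Why not publish the event right after the commit? If the process
    # dies between the commit and the publish, the order exists but the
    # event is lost. Writing both rows in ONE transaction means they
    # succeed or fail together; a separate relay delivers the event later.
    with conn:  # commits on success, rolls back on exception
        conn.execute("INSERT INTO orders (id, status) VALUES (?, 'placed')", (order_id,))
        conn.execute(
            "INSERT INTO outbox (payload) VALUES (?)",
            (json.dumps({"type": "OrderPlaced", "order_id": order_id}),),
        )


def relay_pending(publish):
    """Separate workflow: deliver unsent events, then mark them as sent."""
    rows = conn.execute("SELECT id, payload FROM outbox WHERE sent = 0").fetchall()
    for event_id, payload in rows:
        publish(json.loads(payload))
        with conn:
            conn.execute("UPDATE outbox SET sent = 1 WHERE id = ?", (event_id,))
    return len(rows)


place_order(42)
delivered = []
relay_pending(delivered.append)
print(delivered)
```

Nothing in the two `INSERT` statements tells a maintainer why the event is not simply sent after the commit; only the comment records the rejected alternative.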
  
  necovek 4 months ago
  
  Good naming and good tests can get you 90% of the way to "why" too.
  
  palata 4 months ago
  
  Agreed. Tests are documentation too. Tests are the "contract": "my code solves those issues. If you have to modify my tests, you have a different understanding than I had and should make sure it is what you want".
- wvenable 4 months ago
  
  > If good code was enough on its own we would read the source instead of documentation.
  That's 100% how I work -- reading the source. If the code is confusing, the code needs to be fixed.
  
  kalaksi 4 months ago
  
  Confusing code is one thing, but projects with more complex requirements or edge cases benefit from additional comments and documentation. Not everything is easily inferred from code or can be easily found in a large codebase. You can also describe e.g. chosen tradeoffs.
  
  habinero 4 months ago
  
  There's no way around just learning the codebase. I have never seen code documentation that was complete or correct, let alone both.
  
  actionfromafar 4 months ago
  
  But the documentation can really help in telling why we are doing things. That also seeps in to naming things like classes. If that were not so, we'd just name everything Class1, Class2, Method1, Method2 and so on.
  
  samplifier 4 months ago
  
  def reallyDumbIdeaByManagerWorkaroundMethodToGetCoverageToNinetyPercent(self):
      """Don't worry, this is a clear description of the method."""
      return False
  
  TuxSH 4 months ago
  
  You exaggerate, but in this situation, I think putting a link to a Jira ticket or Slack convo (or whatever) as comment is best
  
  palata 4 months ago
  
  My point is that if your code is well written, it is self-documenting. Obviously Class1 and var2 are not self-documenting.
  
  dinfinity 4 months ago
  
  The code is what it does. The comments should contain what it's supposed to do.
  Even if you give them equal roles, self-documenting code versus commented code is like having data on one disk versus having data in a RAID array.
  Remember: Redundancy is a feature. Mismatches are information. Consider this:
  // Calculate the sum of one and one
  sum = 1 + 2;
  You don't have to know anything else to see that something is wrong here. It could be that the comment is outdated, which has no direct effects and is easily solved. It could be that this is a bug in the code. In any case it is information and a great starting point for looking into a possible problem (with a simple git blame). Again, without needing any context, knowledge of the project or external documentation.
  My take on developers arguing for self-documenting code is that they are undisciplined or do not use their tools well. The arguments against copious inline comments are "but people don't update them" and "I can see less of the code".
  
  palata 4 months ago
  
  > Redundancy is a feature. Mismatches are information. Consider this:
  Respectfully, if someone wrote code like this, I wouldn't want to work with them. I mean next step is "I copy paste code instead of writing functions, and in the comment above I mention all the other copies, so that it's easy to check that they are all doing the same thing redundantly".
  > The arguments against copious inline comments are "but people don't update them" and "I can see less of the code".
  Well no, that's not my argument. I have been navigating code for 20 years and in good codebases, comments are rare and describe something "surprising". Good code is hardly surprising.
  My problem with "literate programming" (which means "add a lot of comments in the implementation details") is that I find it hard to trust developers who genuinely cannot understand unsurprising code without comments. I am fine with a junior needing more time to learn, but after a few years if a developer cannot do it, it concerns me.
  
  dinfinity 4 months ago
  
  You did not engage with my main arguments. You should still do so.
  1. Redundancy: "The code is what it does. The comments should contain what it's supposed to do. [...] You don't have to know anything else to see that something is wrong here." and specifically the concrete trivial (but effective) example.
  2. "My take on developers arguing for self-documenting code is that they are undisciplined or do not use their tools well. The arguments against copious inline comments are "but people don't update them" and "I can see less of the code"."
  > Respectfully, if someone wrote code like this, I wouldn't want to work with them. I mean next step is "I copy paste code [...]
  This is a nonsensical slippery slope fallacy. In no way does that behavior follow from placing many comments in code. It also says nothing about the clearly demonstrated value of redundancy.
  > I have been navigating code for 20 years and in good codebases, comments are rare and describe something "surprising".
  Your definition of good here is circular. No argument on why they are good codebases. Did you measure how easy they were to maintain? How easy it was to onboard new developers? How many bugs it contained? Note also that correlation != causation: it might very well be that the good codebases you encountered were solo-projects by highly capable motivated developers and the comment-rich ones were complicated multi-developer projects with lots of developer churn.
  > My problem with "literate programming" [...] is that I find it hard to trust developers who genuinely cannot understand unsurprising code without comments.
  This is gatekeeping code by making it less understandable and essentially an admission that code with comments is easier to understand. I see the logic of this, but it is solving a problem in the wrong place. Developer competence should not be ascertained by intentionally making the code worse.
  
  palata 4 months ago
  
  You talk as if you had scientific proof that literate programming is objectively better, and I was the weirdo contradicting it without bringing any scientific proof.
  Fact is, you don't have any proof at all, you just have your intuition and experience. And I have mine.
  > It also says nothing about the clearly demonstrated value of redundancy.
  Clearly demonstrated, as in your example of "Calculate the sum of one and one"? I wouldn't call that a clear demonstration.
  > This is gatekeeping code by making it less understandable
  I don't feel like I am making it less understandable. My opinion is that a professional worker should have the required level of competence (otherwise they are not a professional in that field). In software engineering, we feed code to a compiler, and we trust that the compiler makes sure that the machine executes the code we write. The role of the software engineer is to understand that code.
  Literate programming essentially says "I am incapable of writing code that is understandable, ever, so I always need to explain it in a natural language". Or "I am incapable of reading code, so I need it explained in a natural language". My experience is that good code is readable by competent software engineers without explaining everything. But not only that: code is more readable when it is more concise and not littered with comments.
  > and essentially an admission that code with comments is easier to understand.
  I disagree again. Code with comments is easier to understand for the people who cannot understand it without the comments. Now the question is, again: are those people competent to handle code professionally? Because if they don't understand the code without comments, many times they will just have to trust the comments. If they used the comments to actually understand the code, pretty quickly they would be competent enough to not require the comments. Which means that at the point where they need it, they are not yet professionals, but rather apprentices.
  
  ninalanyon 4 months ago
  
  I have written code that was correct and necessarily written the way it was, only to have it repeatedly altered by well-meaning colleagues who thought it looked wrong, inefficient, or unidiomatic. Eventually I had to fill it with warning comments and write a substantial essay explaining why it had to be the way it was.
  Code tells you what is happening but it doesn't always do it so that it is easy to understand and it almost never tells you why something is the way it is.
  
  palata 4 months ago
  
  Difficult to say without an example, but "code isn't enough" is just one possible conclusion in this case. Another one could be that the code is not actually as good as expected, and another one is that the colleagues may need to... do something about it.
  An obvious example I have is CMake. I have seen so many people complaining about CMake being incomprehensible, refactoring it to make it terrible, even wrapping it in Makefiles (and then wrapping that in Dockerfiles). But the problem wasn't the original CMakeLists or a lack of comments in it. The problem was that those developers had absolutely no clue about how CMake works, and felt like they should spend a few hours modifying it instead of spending a few hours understanding it.
  However, I do agree that sometimes there is a need for a comment because something is genuinely tricky. But that is rare enough that I call it "a comment" and not "literate programming".
  
  tonyedgecombe 4 months ago
  
  I always think the biggest mistake is using CMake in the first place. I’ve never come across a project as convoluted and poorly documented as it.
  
  palata 4 months ago
  
  What do you mean by "poorly documented"? I have been using it for 20 years, I have yet to find something that is not documented.
  As for convoluted, I don't find it harder than the other build systems I use.
  Really the problem I have with CMake is the amount of terribly-written CMakeLists. The norm seems to be to not know the basics of CMake but to still write a mess and then complain about CMake. If people wrote C the way they write CMake, we wouldn't blame the language.
  
  seba_dos1 4 months ago
  
  Exactly, that's why a good project will use comments sparingly and have them only where they matter to actually meaningfully augment the code. The rest is noise.
  
  dkersten 4 months ago
  
  Code alone can never describe intent or rationale.
  
  ithkuil 4 months ago
  
  Indeed, you need both!
  But documentation should not go too deep in the "how" otherwise it risks telling a lie after a while as the code changes but the documentation lags.
- necovek 4 months ago
  
  Having "grown up" on free software, I've always been quick to jump into code when documentation was dubious or lacking: there is only one canonical source of truth, and you need to be good at reading it.
  Though I'd note two kinds of documentation: docs on how software is built (seldom needed if you have good source code), and docs on how it is operated. When it comes to the former, I jump into code even sooner as documentation rarely answers my questions.
  Still, I do believe that literate programming is the best of both worlds, and I frequently lament the dead practice of doing "doctests" with Python (though I guess Jupyter notebooks are in a similar vein).
  Usually, the automated tests are the best documentation you can have!
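For readers who have not met doctests: the examples in a docstring are executed and compared with the real output, so this kind of documentation fails visibly when it drifts from the code. Below is a minimal sketch with two made-up functions, one whose docstring matches its behaviour and one that deliberately reuses the "sum of one and one" mismatch from earlier in the thread. The `check` helper is hypothetical; it uses the standard `doctest.DocTestParser` and `doctest.DocTestRunner` classes.

```python
import doctest


def slugify(title):
    """Turn a title into a URL-friendly slug.

    >>> slugify("Literate Programming")
    'literate-programming'
    """
    return "-".join(title.lower().split())


def sum_one_and_one():
    """Calculate the sum of one and one.

    >>> sum_one_and_one()
    2
    """
    return 1 + 2  # deliberate bug: the docstring and the code disagree


def check(func):
    """Run the examples in func's docstring and return (failed, attempted)."""
    parser = doctest.DocTestParser()
    test = parser.get_doctest(func.__doc__, {func.__name__: func}, func.__name__, None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    # Discard the failure report; only the counts matter here.
    return runner.run(test, out=lambda text: None)


print(check(slugify))
print(check(sum_one_and_one))
```

The first check reports no failures and the second reports one, which turns a stale description into a failing test rather than a silent inconsistency.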
- habinero 4 months ago
  
  > If good code was enough on its own we would read the source instead of documentation.
  Uh. We do. We, in fact, do this very thing. Lots of comments in code is a code smell. Yes, really.
  If I see lots of comments in code, I'm gonna go looking for the intern who just put up their first PR.
  > I believe part of good software is good documentation
  It is not. Docs tell you how to use the software. If you need to know what it does, you read the code.
  
  ninalanyon 4 months ago
  
  > If you need to know what it does, you read the code.
  True.
  But if you need to know why it does what it does, you read the comments. And often you need that knowledge if you are about to modify it.
  
  palata 4 months ago
  
  Do you have an example of such knowledge that you need to get from the comments? I have been programming for 20 years, and I genuinely don't see that much code that is so complex that it needs comments.
  Not that it doesn't exist; sometimes it's needed. But so rarely that I call it "comments", and not a whole discipline in itself that is apparently called "literate programming". Literate programming sounds like "you need to comment pretty much everything because code is generally hard to understand". I disagree with that. Most code is trivial, though you may need to learn about the domain.
  
  tonyedgecombe 4 months ago
  
  Most of my comments relate to the outside world not behaving quite as you would expect.
  Usually something like the spec says this but the actual behaviour is something else.
  
  EraYaN 4 months ago
  
  Because the why can be completely unrelated to the code (odd business requirements etc). The code can be known to be non-optimal but it is still the correct way because the embedded system used in product XYZ has some dumb chip in it that needs it this weird way etc. Or the CEO loves this way of doing things and fires everyone who touches it. So many possibilities, most technical projects have a huge amount of politics and weird legacy behavior that someone depends on (including on internal stuff, private methods are not guaranteed to not be used by a client for example). And comments can guard against it, both for the dev and the reviewer. Hell, we currently have clients who depend on the exact internal layout of some PDF reports, and not even the rendered layout but the actual definitions.
  
  palata 4 months ago
  
  Again, if it's a comment saying "we need this hack because the hardware doesn't support anything", I don't call it "literate programming".
  Literate programming seems to be the idea that you should write prose next to the code, because code "is difficult to understand". I disagree with that. Most good code is simple to understand (doesn't mean it's easy to write good code).
  And the comments here prove my point, I believe: whenever I ask for examples where a comment is needed, the answer is something very rare and specific (e.g. a hardware limitation). The answer to that is comments where those rare and specific situations arise. Not a whole concept of "literate programming".
  
  ninalanyon 4 months ago
  
  I've never properly tried literate programming, overkill for hobby projects and not practical for a team unless everyone agrees.
  Examples of code that needs comments in my career tend to come from projects that model the behaviour of electrical machines. The longest running such project was a large object oriented model (one of the few places where OOP really makes sense). The calculations were extremely time consuming and there were places where we were operating with small differences between large numbers.
  As team members came and went and as the project matured the team changed from one composed of electrical engineers, physicists, and mathematicians who knew the domain inside out to one where the bulk of the programmers were young computer science graduates who generally had no physical science background at all.
  This meant that they often had no idea what the various parts of the program were doing and had no intuition that would make them stop and think or ask a question before fixing a bug in what seemed the most efficient way.
  The problem in this case is that sometimes you have to sacrifice runtime speed for correctness and numerical stability. You can't always re-order operations to reduce the number of assignments say and expect to get the same answers.
  Of course you can write unit and functional tests to catch some such errors but my experience says that tests need even better comments than the code that is being tested.
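The point about operation order and small differences between large numbers can be reproduced in a few lines. This is a minimal Python sketch with made-up values, not code from the project described; it only shows why a "harmless" reordering can change the answer in IEEE 754 double precision.

```python
import math

big = 1e16
small = 1.0

# Same three terms, two evaluation orders.
naive = (big + small) - big      # small is absorbed by big before the subtraction
reordered = (big - big) + small  # the large terms cancel first

# math.fsum tracks intermediate rounding error and returns the exact sum here.
compensated = math.fsum([big, small, -big])

print(naive, reordered, compensated)  # 0.0 1.0 1.0
```

Nothing in the expression `(big - big) + small` says that its order is load-bearing, so a colleague tidying it up has no warning unless a comment or a well-explained test supplies one.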
  
  Izkata 4 months ago
  
  > Literate programming sounds like "you need to comment pretty much everything because code is generally hard to understand".
  You and I read code. Came so naturally for me that I didn't realize others don't. But over the years and with some weird chats I've realized that for a lot of developers it's more like "deciphering code", like they're slowly translating a human language they only vaguely know - and it never even crossed their mind that it was possible to learn a programming language to the point you could just read it.
  
  bottd 4 months ago
  
  Not for everything. For code you own, yes this is often the case. For the majority of the layers you still rely on documentation. Take the project you mention going straight to source, did you follow this thread all the way down through each compiler involved in building the project? Of course not.
  
  palata 4 months ago
  
  My understanding is that "literate programming" doesn't say "you should document the public API". It says "you should document the implementation details, because code is hard to understand".
  My opinion is that if whoever is interested in reading the implementation details cannot understand it, either the code is bad or they need to improve themselves. Most of the time at least. But I hear a lot of "I am very smart, so if I don't understand it without any effort, it means it's too complicated".
  
  crazygringo 4 months ago
  
  > Lots of comments in code is a code smell. Yes, really.
  No, not really. It's actually a sign of devs who are helping future devs who will maintain and extend the code, so they can understand it faster. It's professionalism and respect.
  > If I see lots of comments in code, I'm gonna go looking for the intern who just put up their first PR.
  And I'm going to find them to say good job, keep it up! You're saving us time and money in the future.
  
  palata 4 months ago
  
  > It's professionalism and respect.
  If someone gives me code full of superfluous comments, I don't consider it professional. Sounds like an intern who felt the need to comment everything because every single line seemed very complex to them.
  
  crazygringo 4 months ago
  
  Nobody said anything about "superfluous" comments.
  I'm assuming "lots of comments" means lots of meaningful comments. As complex code often requires. Nobody's talking about `i++; // increment i` here.
  
  palata 4 months ago
  
  > I'm assuming "lots of comments" means lots of meaningful comments.
  That's not what literate programming is. Literate programming says that you explain everything in a natural language.
  IMO, good code is largely unsurprising. I don't need comments for unsurprising code. I need comments for surprising code, but that is the exception, not the rule. Literate programming says that it is the rule, and I disagree.
  
  crazygringo 4 months ago
  
  > Literate programming says that you explain everything in a natural language.
  At a high level. Not line-by-line comments.
  > IMO, good code is largely unsurprising. I don't need comments for unsurprising code.
  I've never heard anything like that, and could not disagree more. Twenty different considerations might go into a single line of code. Often, one of them is something non-obvious. So you comment that thing. The idea that "good" code avoids anything non-obvious, that those are "exceptions", is frankly bizarre to me. Unless the code you write is 99% boilerplate or something.
  
  palata 4 months ago
  
  > So you comment that thing. The idea that "good" code avoids anything non-obvious, that those are "exceptions", is frankly bizarre to me.
  What I find interesting from the comments here is that there are obviously different perspectives on that. Granted, I cannot say that my way is better. Just as you cannot say that your way is better.
  But I am annoyed when I have to deal with code following your standards, and I assume you are annoyed when you have to deal with code following mine :-).
  Or maybe, I imagine that people who defend literate programming mean more comments than I think is reasonable, and people who disagree with me (like you) imagine that I mean fewer comments than you think is reasonable. And maybe in reality, given actual code samples, we would totally agree :-).
  Communication is hard.
- Verdex 4 months ago
  
  I do read the code instead of the documentation, whenever that is an option.
  Interesting factoid. The number of times I've found the code to describe what the software does more accurately than the documentation: many.
  The number of times I've found the documentation to describe what the software does more accurately than the code: never.
  
  crazygringo 4 months ago
  
  You seem to misunderstand the purpose of documentation.
  It's not to be more accurate than the code itself. That would be absurd, and is by definition impossible, of course.
  It's to save you time and clarify why's. Hopefully, reading the documentation is about 100x faster than reading the code. And explains what things are for, as opposed to just what they are.
  
  Verdex 4 months ago
  
  Clearly.
  Crazy thing.
  Number of times reading the source saved time and clarified why: many.
  Number of times reading the documentation saved time and clarified why: never.
  Perhaps I've just been unlucky?
  EDIT:
  The hilarious part to me is that everyone can talk past each other all day (reading the documentation) or we can show each other examples of good/bad documentation or good/bad code (reading the code) and understand immediately.
  
  crazygringo 4 months ago
  
  > Number of times reading the documentation saved time and clarified why: never.
  OK, so let's use an example... if you need to e.g. make a quick plot with Matplotlib. You just... what? Block off a couple weeks and read the source code start to finish? Or maybe reduce it to just a couple days, if you're trying to locate and understand the code just for the one type of plot you're trying to create? And the several function calls you need to set it up and display it in the end?
  Instead of looking at the docs and figuring out how to do it in 5 or 10 min?
  Because I am genuinely baffled here.
  
  palata 4 months ago
  
  Literate programming is not about documenting the public API, it's about documenting the implementation details, right? Otherwise no need for a new name, it's just "API documentation".
  > if you need to e.g. make a quick plot with Matplotlib. You just... what?
  Read the API documentation.
  Now if you need to fix a bug in Matplotlib, or contribute a feature to it, then you read the code.
awesome_dude 4 months ago

> Natural languages are ambiguous. That's the reason why we created programming languages. So the documentation around the code is generally ambiguous as well. Worse: it's not being executed, so it can get out of date (sometimes in subtle ways).
I loathe this take.
I have rocked up to codebases where there were specific rules banning comments because of this attitude.
Yes comments can lie, yes there are no guards ensuring they stay in lock step with the code they document, but not having them is a thousand times worse - I can always see WHAT code is doing, that's never the problem, the problem is WHY it was done in this manner.
I put comments like "This code runs in O(n) because there are only a handful of items ever going to be searched - update it when there are enough items to justify an O(log2 n) search"
That tells future developers that the author (me) KNOWS it's not the most efficient code possible, but it IS when you take into account things unknown by the person reading it
Edit: Tribal knowledge is the worst type of knowledge, it's assumed that everyone knows it and passes it along when new people onboard, but the reality (for me) has always been that the people doing the onboarding have had fragments, or incorrect assumptions on what was being conveyed to them, and just like the children's game of "telephone" the passing of the knowledge always ends in a disaster
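As a minimal sketch of the kind of WHY comment described above: the function name `find_handler`, the handler list, and the `LINEAR_SCAN_LIMIT` threshold are all hypothetical, chosen only to illustrate recording an intentional O(n) tradeoff next to the code it explains.

```python
from typing import Optional, Sequence

# Hypothetical threshold: revisit the data structure once the list grows past this.
LINEAR_SCAN_LIMIT = 32


def find_handler(handlers: Sequence[tuple[str, str]], name: str) -> Optional[str]:
    """Return the handler registered under `name`, or None if absent."""
    # WHY: linear scan on purpose. The list is assumed to hold only a handful
    # of entries, so O(n) is good enough and avoids maintaining a sorted index
    # for an O(log n) search. Revisit if len(handlers) exceeds LINEAR_SCAN_LIMIT.
    for key, handler in handlers:
        if key == name:
            return handler
    return None
```

The comment says nothing about what the loop does; it records only the assumption that would make a future rewrite worthwhile.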
- AdieuToLogic 4 months ago
  
  > Yes comments can lie ...
  Comments only lie if they are allowed to become one.
  Just like a method name can lie. Or a class name. Or ...
  
  bonesss 4 months ago
  
  Right.
  The compiler ensures that the code is valid, and what ensures that ‘// used a suboptimal sort because reasons’ is updated during a global refactor that changes the method? … some dude living in that module all day every day exercising monk-like discipline? That is unwanted for a few reasons, notably the routine failures of such efforts over time.
  Module names and namespaces and function names can lie. But they are also corrected wholesale and en-masse when first fixed, those lies are made apparent when using them. If right_pad() is updated so it’s actually left_pad() it gets caught as an error source during implementation or as an independent naming issue in working code. If that misrepresentation is the source of an emergent error it will be visible and unavoidable in debugging if it’s in code, and the subsequent correction will be validated by the compiler (and therefore amenable to automated testing).
  Lies in comments don’t reduce the potential for lies in code, but keeping inline comments minimal and focused on exceptional circumstances can meaningfully reduce the number of aggregate lies in a codebase.
  
  deathanatos 4 months ago
  
  > what ensures that ‘// used a suboptimal sort because reasons’ is updated during a global refactor that changes the method?
  And for that matter, what ensures it is even correct the first time it is written?
  (I think this is probably the far more common problem when I'm looking at a bug, newly discovered: the logic was broken on day 1, hasn't changed since; the comment, when there is one, is as wrong as the day it was written.)
  
  awesome_dude 4 months ago
  
  But, you've still got an idea of why things were done the way they were - radio silence is....
  Go ask Steve, he wrote it, oh, he left about 3 years ago... does anyone know what he was thinking?
- larusso 4 months ago
  
  I don’t disagree here. I personally like to put the why into commit messages though. It’s my longtime fight to make people write better commit messages. Most devs I see describe what they did. And in most cases that is visible from the change-set. One has to be careful here as similar to line documentation etc everything changes with size. But I prefer if the why isn’t sprinkled between source. But I’m not dogmatic about it. It really depends.
  
  awesome_dude 4 months ago
  
  https://conventionalcommits.org/en/v1.0.0/
  I <3 great (edit: improve clarity) commit comments, but I am leaning more heavily to good comments at the same level as the dev is reading - right there in the code - rather than telling them to look at git blame, find the appropriate commit message (keeping in mind that there might have been changes to the line(s) of code and commits might intertwine, thus making it a mission to find the commit holding the right message(s)).
  edit: I forgot to add - commit messages are great, assuming the people merging the PR into main aren't squashing the commits (a lot of people do this because of a lack of understanding of our friend rebase)
- palata 4 months ago
  
  IMHO, you shouldn't have to justify yourself ("yeah yeah, this is not optimal, I know it because I am not an idiot"). Just write your code in O(n) if that's good enough now. Later, a developer may see that it needs to be optimised, and they should assume that the previous developer was not an idiot and that it was fine with O(n), but now it's not anymore.
  Or do you think that your example comment brings knowledge other than "I want you to know that I know that it is not optimal, but it is fine, so don't judge me"?
  
  awesome_dude 4 months ago
  
  A little bit of "Don't judge me" and a little bit of "I nearly fell into a trap here, and started writing O(log n) search, but realised that it was a waste of time and effort (and would actually slow things down) - so to save you from that trap here's a note"
  
  palata 4 months ago
  
  The risk with that is that because it was not obvious to you does not necessarily mean it's not obvious to others.
  Over the years, I have seen many, many juniors wrapping simple CLI invocations in a script because they just learned about them and thought they weren't obvious.
  - clone_git_repo.sh
  - run_docker_container.sh
  I do agree that something actually tricky should be commented. But that's exceedingly rare.
  
  awesome_dude 4 months ago
  
  I mean, the whole point of explicit being superior to implicit is because what's obvious to some isn't necessarily obvious to everyone.
  Someone following me could look at it and go.. "well duh" and that's not going to hurt anyone, but if I didn't put that comment and someone refactored it, then we have someone redoing and then undoing, for no good reason.
  There's that meme where people are told to update the number of hours wasted because people try to refactor some code and have to undo it because it doesn't work
  
  palata 4 months ago
  
  Do you write a comment before every for loop to explain how a for loop works? Do you write a comment above that to remind the reader that the next few lines are written in, say, Go, just like in the rest of the file? Do you write a comment explaining that the text appearing on the screen is actually digital and will disappear when you turn off the computer?
  Obviously you don't, because you assume that the person reading that code has some level of knowledge. You don't say "well, it may not be obvious to everybody, so I need to explain everything".
  I guess where we differ is that to me, a professional software developer should be able to understand good code. If they aren't, they are a junior who needs practice. But I am for designing tools for the professionals, not for the apprentices. The goal of an apprentice is to become a professional, not to remain an apprentice forever.
  
  awesome_dude 4 months ago
  
  > Do you write a comment before every for loop to explain how a for loop works?
  Thank you for missing the point.
  It's not about the WHAT, it's about the WHY.
  For loops are obvious. O(n) being intentional instead of 'lazy' isn't obvious without context. That's what comments preserve - the decision rationale, not the syntax explanation.
  A professional developer can read code. But they can't read the mind of the author who made a non obvious tradeoff. That's what comments preserve.
  > I guess where we differ is that to me, a professional software developer should be able to understand good code. If they aren't, they are a junior who needs practice. But I am for designing tools for the professionals, not for the apprentices. The goal of an apprentice is to become a professional, not to remain an apprentice forever.
  If you are going to make personal attacks, you should know that I work with actual professionals, and they understand that future maintainers, myself included, cannot read their mind on why they chose the path they did.
  
  palata 4 months ago
  
  > It's not about the WHAT, it's about the WHY.
  And my point is that I don't care what it is about, I care about whether or not it is useful. I disagree with the literate programming idea that it's always useful to explain why you wrote the code the way you did, and your one example (justifying the O(n)) actually proves to me that I really don't care about your explanation in this particular case. So obviously your one example that I don't find useful won't convince me that all WHY comments are useful.
  > O(n) being intentional instead of 'lazy' isn't obvious without context.
  What does such a comment tell me?
  - That you chose the O(n): it's the "please don't judge me, I know what I am doing" part. It's superfluous, because by default I assume that you know what you are doing.
  - That you tried to do better and failed. If I believe that we don't need better than O(n), I don't care. If I believe that we need better than O(n), I will reason about doing it myself (no matter what you wrote).
  - ... I can't see anything else.
  Now sometimes, of course, there is real knowledge that needs to go into a comment. Like "This is a workaround due to a bug in version 1.4.2 of this proprietary dependency". But that's an exception. I can also totally imagine that some files implement something really tricky and deserve a lot of comments. But in my experience reading and contributing to a lot of open source code from many different projects, most code is not like that. The concept of "literate programming" doesn't say "be pragmatic about comments, use them when it matters", it says "comment the code because it always helps".
  > If you are going to make personal attacks
  I am not making personal attacks, I genuinely believe that you are perfectly able to read and understand code that does not follow the "literate programming" paradigm. And if you are not, I still don't see that as a personal attack: with experience you will definitely get there.
  > cannot read their mind on why they chose the path they did.
  I just want to repeat it here: it does not matter at the implementation detail level. You may want to document the architecture (including technology choices) of course, but that's not what literate programming is about. You probably want to document the public API (because using an API generally does not require reading the code, and the implementation may be proprietary), but again that's not what literate programming is about. But the implementation details? Unless it's surprising (e.g. a necessary workaround), I don't care about why it was written the way it was, I just care about understanding what it does such that I can reason about it.
  
  awesome_dude 3 months ago
  
  You make a lot of comments for someone that thinks it should be obvious and there's no need for comments.
  
  palata 3 months ago
  
  Again you prove my point: natural languages are ambiguous and communication is hard.
  And maybe also that you don't seem to make the difference between natural languages and programming languages: I have not been commenting code here. If you can't make the difference, maybe it explains why you want to mix them.
  
  awesome_dude 3 months ago
  
  And silence doesn't achieve the goal at all, as you continually prove.
k32k 4 months ago

"But translating my prompts to code is not working as well, because my prompts are in natural languages, and hence ambiguous."
Not only that, but there's something very annoying and deeply dissatisfying about typing a bunch of text into a thing for which you have no control over how it's producing an output, nor can an output be reproduced even if the input is identical.
Agreed, natural language is very ambiguous and becoming more ambiguous by the day: "what exactly does 'vibe' mean?"
People spoke in a particular way, say 60 years ago, that left very little room for interpretation of what they meant. The same cannot be said today.
- caseyohara 4 months ago
  
  > People spoke in a particular way, say 60 years ago, that left very little room for interpretation of what they meant. The same cannot be said today.
  Surely you don’t mean everyone in the 1960s spoke directly, free of metaphor or euphemism or nuance or doublespeak or dog whistle or any other kind of ambiguity? Then why are there people who dedicate their entire life to interpreting religious texts and the Constitution?
  
  k32k 4 months ago
  
  Compared with today, on average, they did.
  There's a generation of people that 'typ lyk dis'.
  So yes.
  
  jyounker 4 months ago
  
  Your point is less persuasive than you intended. You complain about linguistic ambiguity, but then you show an example of sensible spelling reform.
  
  ChrisGreenHeur 4 months ago
  
  that example is regarding syntax, and is actually no worse than any other
casey2 4 months ago

Programming languages are natural and ambiguous too, what does READ mean? you have to look it up to see the types. The power comes from the fact that it's audit-able, but that you don't need to audit it every time you want to write some code. You think you write good code? try to prove it after the compiler gets through with it.
Natural languages are richer in ideas, it may be harder to get working code going from a purely natural description to code, than code to code, but you don't gain much from just translating code. One is only limited by your imagination the other already exists, you could just call it as a routine.
You only have a SENSE for good code because it's a natural language with conventions and shared meaning. If the goal of programming is to learn to communicate better as humans then we should be fighting ambiguity not running from it. 100 years from now nobody is going to understand that your conventions were actually "good code".
- musicale 4 months ago
  
  > Programming languages are natural and ambiguous too
  Programming languages work because they are artificial (small, constrained, often based on algebraic and arithmetic expressions, boolean logic, etc.) and have generally well-defined semantics. This is what enables reliable compilers and interpreters to be constructed.
  
  mexicocitinluez 4 months ago
  
  Exactly. Programming is the art of removing ambiguity and making it formal. And it's why the timelines between getting an EXACT plan of what I need to implement vs hazy requirements are so out of whack.
- mexicocitinluez 4 months ago
  
  > Programming languages are natural and ambiguous too, what does READ mean?
  Not nearly in the same sense actual language is ambiguous.
  And ambiguity in programming is usually a bad thing, whereas in language it can usually be intended.
  Good code, whatever that means, can read like a book. Event-driven architectures are a good example because the context of how something came to be is right in the event name itself.
- palata 4 months ago
  
  > Programming languages are natural and ambiguous too, what does READ mean?
  "READ" is part of the "documentation in natural language". The compiler ignores it entirely, it's not part of the programming language per se. It is pure documentation for the developers, and it is ambiguous.
  But the part that the compiler actually reads is non-ambiguous. It cannot deal with ambiguity, fundamentally. It cannot infer from the context that you wrote a line of code that is actually ironic, and it should therefore execute the opposite.
- LEDThereBeLight 4 months ago
  
  What is good code now is only good code because of the bad programming languages we’ve had to accept for the last hundred years because we’re tied to incremental improvements. We’re tied to static brittle types. But look at natural systems - they all use dynamic “languages.” When you get a cut, your flesh doesn’t throw an exception because it’s connected to the wrong “thing.” Maybe AI will redefine what good code means, because it’s better able to handle ambiguity.
pdntspa 4 months ago

> because my prompts are in natural languages, and hence ambiguous.
Legalese developed specifically because natural language was too ambiguous. A similar level of specificity for prompting works wonders
One of the issues with specifying directions to the computer with code is that you are very narrowly describing how something can be done. But sometimes I don't always know the best 'how', I just know what I know. With natural language prompting the AI can tap into its training knowledge and come up with better ways of doing things. It still needs lots of steering (usually) but a lot of times you can end up with a superior result.
- vnorilo 4 months ago
  
  Yes. LLMs are search engines into the (latent) space of source code. Stuff you put into the context window is the "query". I've had some good results by minimizing the conversational aspect, and thinking in terms of shaping the context: asking the LLM to analyze relevant files, not because I want the analysis, but because I want a good reading in the context. LLMs will work hard to stay in that "landscape", even with vague prompts. Often better than with weirdly specific or conflicting instructions.
  
  ptx 4 months ago
  
  But search engines are not a good interface when you already know what you want and need to specify it exactly.
  See for example the new Windows start menu compared to the old-school run dialog – if I directly run "notepad", then I always get Notepad; but if I search for "notepad" then, after quite a bit of chugging and loading and layout shifting, I might get Notepad or I might get something from Bing or something entirely different at different times.
  
  vnorilo 4 months ago
  
  Indeed, which is not all that different from LLM code generation, to be honest.
baq 4 months ago

Docs and code work together as mutually error correcting codes. You can’t have the benefits of error detection and correction without redundant information.
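One concrete way to get that redundancy checked mechanically is an executable docstring example. The sketch below uses Python's standard `doctest` module; the `right_pad` function and the `docs_agree_with_code` helper are hypothetical names for illustration, not something from the thread.

```python
import doctest


def right_pad(text: str, width: int, fill: str = " ") -> str:
    """Pad `text` on the right with `fill` until it is at least `width` long.

    >>> right_pad("ab", 4, ".")
    'ab..'
    >>> right_pad("abcdef", 4)
    'abcdef'
    """
    return text.ljust(width, fill)


def docs_agree_with_code() -> bool:
    """Run the examples in right_pad's docstring and report whether all pass."""
    parser = doctest.DocTestParser()
    # Build a test from the docstring; the globals give the examples access
    # to the function they document.
    test = parser.get_doctest(
        right_pad.__doc__, {"right_pad": right_pad}, "right_pad", None, 0
    )
    runner = doctest.DocTestRunner(verbose=False)
    results = runner.run(test)
    return results.attempted > 0 and results.failed == 0
```

If someone changes `right_pad` to pad on the left without touching the docstring, the first example fails, so the disagreement between prose and code is detected rather than silently shipped. This only covers the parts of the documentation that can be phrased as examples; free-form rationale still needs a reader.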
- ghywertelling 4 months ago
  
  > With agents, does it become practical to have large codebases that can be read like a narrative, whose prose is kept in sync with changes to the code by tireless machines?
  I think this is true. Your point supports it. If either the explanation / intention or the code changes, the other can be brought into sync. Beautiful post. I always hated the fact that research papers don't read like novels, eg "ohk, we tried this which was unsuccessful but then we found another adjacent approach and it helped."
  Computer Scientist Explains One Concept in 5 Levels of Difficulty | WIRED
  https://www.youtube.com/watch?v=fOGdb1CTu5c
  Computer scientist Amit Sahai, PhD, is asked to explain the concept of zero-knowledge proofs to 5 different people; a child, a teen, a college student, a grad student, and an expert. Using a variety of techniques, Amit breaks down what zero-knowledge proofs are and why it's so exciting in the world of cryptography.
alkonaut 4 months ago

Maybe if we had a really terse and unambiguous form of English? Whenever there is ambiguity we insert parentheses and operators to really make it clear what we mean. We can enclose different sentences in brackets to make clear the scope of a logical condition, and so on. Oh wait
gwbas1c 4 months ago

> That's the reason why we created programming languages.
No, we created programming languages because when computers were invented:
1: They (computers) were incapable of understanding natural language.
2: Programming languages are easier to use than assembly or writing out machine code by hand.
LLMs are a quite recent invention, and require significantly more computing power than early computers had.
psychoslave 4 months ago

>Natural languages are ambiguous. That's the reason why we created programming languages.
Programming languages can be ambiguous too. The thing with formal languages is more that they put a stricter and narrower interpretation freedom as a convention where it's used. If anything they are a subset of human expression space. Sometimes they are the best tool for the job. Sometimes a metaphor is more apt. Sometimes you need some humour. Sometimes you better stay in ambiguity to play the game at its finest.
- palata 4 months ago
  
  Programming languages are non-ambiguous, in the sense that there is no doubt what will be executed. It's deterministic. If the program crashes, you can't say "no but this line was a joke, you should have ignored it". Your code was wrong, period.

CharlieDigital 4 months ago

The easiest thing to do is to have the LLM leave its own comments.

This has several benefits because the LLM is going to encounter its own comments when it passes this code again.

    > - Apply comments to code in all code paths and use idiomatic C# XML comments
    > - <summary> be brief, concise, to the point
    > - <remarks> add details and explain "why"; document reasoning and chain of thought, related files, business context, key decisions.
    > - <params> constraints and additional notes on usage
    > - inline comments in code sparingly where it helps clarify behavior

(I have something similar for JSDoc for JS and TS)

Several things I've observed:

1. The LLM is very good at then updating these comments when it passes it again in the future.

2. Because the LLM is updating this, I can deduce by proxy that it is therefore reading this. It becomes a "free" way to embed the past reasoning into the code. Now when it reads it again, it picks up the original chain-of-thought and basically gets "long term memory" that is just-in-time and in-context with the code it is working on. Whatever original constraints were in the plan or the prompt -- which may be long gone or otherwise out of date -- are now there next to the actual call site.

3. When I'm reviewing the PR, I can now see what the LLM is "thinking" and understand its reasoning to see if it aligns with what I wanted from this code path. If it interprets something incorrectly, it shows up in the `<remarks>`. Through the LLM's own changes to the comments, I can see in future passes if it correctly understood the objective of the change or if it made incorrect assumptions.
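
The comment above describes C# XML doc comments; as a rough analogue only, the same summary / remarks / parameter split can be sketched in a Python docstring. The function, the 30-day rule, and the "support policy" rationale below are hypothetical and exist only to show where the WHY would live.

```python
from datetime import date


def is_refund_eligible(purchased_on: date, today: date, window_days: int = 30) -> bool:
    """Return True if a purchase can still be refunded.

    Remarks:
        WHY: the window is inclusive of the final day. (Hypothetical business
        context: the support policy promises "30 full days", so day 30 still
        counts.) This reasoning came from the original task description and
        is recorded here so it survives after the prompt is gone.

    Args:
        purchased_on: Date of purchase; must not be later than `today`.
        today: The date to evaluate against, passed in to keep the function
            deterministic and testable.
        window_days: Length of the refund window in days.
    """
    if today < purchased_on:
        raise ValueError("today must not be before purchased_on")
    return (today - purchased_on).days <= window_days
```

A reviewer reading the diff can check the remarks against what was actually asked for, which is the "see what the LLM is thinking" effect described in point 3.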

solarkraft 4 months ago

How do you deal with the comments sometimes being relatively noisy for humans? I tend to be annoyed by comments overly referring to a past correction prompt and not really making sense by themselves, but then again this IS probably the highest value information because these are exactly the things the LLM will stumble on again.
- CharlieDigital 4 months ago
  
  > How do you deal with the comments sometimes being relatively noisy for humans?
  To an extent, that is a function of tweaking the prompt to get the level of detail desired and the signal vs. noise produced by the LLM, e.g. constraining the word count it can use for comments.
  We have a small team of approvers that are reviewing every PR and for us, not being able to see the original prompt and flow of interactions with the agent, this approach lets us kind of see that by proxy when reviewing the PR so it is immensely useful.
  Even for things like enum values, for example. Why is this enum here? What is its use case? Is it needed? Having the reasoning dumped out allows us to understand what the LLM is "thinking".
  (Of course, the biggest benefit is still that the LLM sees the reasoning from an earlier session again when reading the code weeks or months later).
- stingraycharles 4 months ago
  
  Inline comments in function body: for humans.
  Function docs: for AI, with clear trigger (“use when X or Y”) and usage examples.
- JamesSwift 4 months ago
  
  I really hate its tendency to leave those comments as well. I seem to have coached it out with some claude.md instructions but they still happen on occasion.
ulrikrasmussen 4 months ago

Interesting observation. After a human is done writing code, they still have a memory of why they made the choices they made. With an LLM, the context window is severely limited compared to a brain, so this information is usually thrown away when the feature is done, and so you cannot go back and ask the LLM why something is the way it is.
- CharlieDigital 4 months ago
  
  Yup; in the moment, you can just have the LLM dump its reasoning into the comments (we use idiomatic `<remarks></remarks>` for C# and JSDoc `@remarks`).
  Future agents see the past reasoning as it `greps` through code. Good especially for non-obvious context like business and domain-level decisions that were in the prompt, but may not show in the code.
  I can't prove this, but I'm also guessing that this improves the LLM's output since it writes the comment first and then writes the code so it is writing a mini-spec right before it outputs the tokens for the function (would make an interesting research paper)
zozbot234 4 months ago

In my experience, LLM-added comments are too silly and verbose. It's going to pollute its own context with nonsense and its already limited ability to make sense of things will collapse. LLMs have plenty of random knowledge which is occasionally helpful, but they're nowhere near the standard of proper literacy of even an ordinary skilled coder, let alone Dr. Knuth who defined literate programming in the first place.
- CharlieDigital 4 months ago
  
  The output of an LLM is a reflection of the input and instructions. If you have silly and verbose comments, then consider improving your prompt.
  
  astrange 4 months ago
  
  Almost nothing in a Claude Code session has to do with "your prompt", it works for an hour afterwards and mostly talks to itself. I've noticed if you give it small corrections it will leave nonsensical comments referring to your small correction as if it's something everyone knows.
  
  CharlieDigital 4 months ago
  
  It has everything to do with your prompt and why Claude Code has a plan mode: because the quality of your planning, prompting, and inputs significantly affects the output.
  Your assertion, then, is that even a 1 sentence prompt is as good as a 5 section markdown spec with detailed coding style guidance and feature-by-feature specification. This is simply not true; the detailed spec and guidance will always outperform the 1 sentence prompt.
  
  astrange 4 months ago
  
  No, I use plan mode and have several rounds of conversation with it, but lately I've been doing tasks where it does tons of independent research and finds complicated conclusions in an existing old codebase. I don't really feel like either of those count as "a prompt".
  The plan mode is useful because if you do corrections during development mode it does that silly thing where it leaves comments referring to your corrections.
  
  just6979 3 months ago
  
  "then consider improving all your training data and reinforcement feedback"
  Fixed that for you.
  The input is sooo much more than your prompt, that's kind of the point.
3371 4 months ago

Somehow made me think I should enforce a rule that agents should sign their comments so they're identifiable at first glance
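A rough sketch of what such a rule could look like in practice, assuming a made-up signature convention of `[agent:<name>]` at the start of a comment. The tag format, the `AGENT_TAG` pattern, and the `agent_comments` helper are all hypothetical; no existing tool is being described.

```python
import re

# Hypothetical convention: a signed comment starts with "[agent:<name>]".
AGENT_TAG = re.compile(r"#\s*\[agent:(?P<name>[\w.-]+)\]\s*(?P<text>.*)")


def agent_comments(source: str) -> list[tuple[int, str, str]]:
    """Return (line_number, agent_name, comment_text) for each signed comment."""
    found = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        match = AGENT_TAG.search(line)
        if match:
            found.append((lineno, match.group("name"), match.group("text")))
    return found
```

This is a plain line scan, so it would also match a tag that appears inside a string literal; a stricter check would go through the language's tokenizer instead.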

rustybolt 4 months ago

I have noticed a trend recently that some practices (writing a decent README or architecture, being precise and unambiguous with language, providing context, literate programming) that were meant to help humans were not broadly adopted with the argument that it's too much effort. But when done to help an LLM instead of a human a lot of people suddenly seem to be a lot more motivated to put in the effort.

zdragnar 4 months ago

In my years of programming, I find that humans rarely give documentation more than a cursory glance up until they have specific questions. Then they ask another person if one is available rather than read for the answer.
The biggest problem is that humans don't need the documentation until they do. I recall one project that extensively used docblock style comments. You could open any file in the project and find at least one error, either in the natural language or the annotations.
If the LLM actually uses the documentation in every task it performs- or if it isn't capable of adequate output without it- then that's a far better motivation to document than we actually ever had for day to day work.
- ijk 4 months ago
  
  I have discovered that the measure of good documentation is not whether your team writes documentation, but is instead determined by whether they read it.
- suzzer99 4 months ago
  
  The other problem is that documentation is always out of date, and one wrong answer can waste more time than 10 "I don't knows".
- 1718627440 4 months ago
  
  I think this really depends on culture. If you target OS APIs or the libc, the documentation is stellar. You have several standards and then conceptual documentation and information about particular methods all with historic and current and implementation notes, then there is also an interactive hypertext system. I solve 80% of my questions with just looking at the official documentation, which is also installed on my computer. For the remaining I often try to use the WWW, but these are often so specific, that it is more successful to just read the code.
  Once I step out of that ecosystem, I wonder how people even cope with the lack of good documentation.
jpollock 4 months ago

Documentation rots a lot more quickly than the code - it doesn't need to be correct for the code to work. You are usually better off ignoring the comments (even more so the design document) and going straight to the code.
- hinkley 4 months ago
  
  I maintain you’re either grossly misappropriating the time and energy of new and junior devs if this is the case on your project, or you have gone too long since hiring a new dev and your project is stagnating because of it.
  New eyes don’t have the curse of knowledge. They don’t filter out the bullshit bits. And one of the advantages of creating reusable modules is you get more new eyes on your code regularly.
  This may also be a place where AI can help. Some of the review tools are already calling us out on making the code not match the documentation.
  
  habinero 4 months ago
  
  No, they're 100% correct. This has been my experience at every place I've worked at in SV, from startup to FAANG.
  You write the code so you can scan it easily, and you build tools to help, and you ask for help when you need it, but you still gotta build that mental map out
hinkley 4 months ago

Paraphrasing an observation I stole many years ago:
A bunch of us thought learning to talk to computers would get us out of learning to talk to humans, and so we spent 4 of the most important years of emotional growth engaging in that, only to graduate and discover we are even farther behind everyone else in that area.
- analog31 4 months ago
  
  This raises an interesting point. I've speculated that if someone has a hard time expressing themselves to other humans verbally or in writing, they're also going to have a hard time writing human-readable code. The two things are rooted in the same basic abilities. Writing documentation or comments in the code at least gives someone two slim chances at understanding them, instead of just one.
  I have the opposite problem. Granted, I'm not a software developer, but only use code as a problem solving tool. But once again, adding comments to my code gives me two slim chances of understanding it later, instead of one.
  
  1718627440 4 months ago
  
  > I've speculated that if someone has a hard time expressing themselves to other humans verbally or in writing
  I don't think they actually have problems expressing themselves; code is also just a language with a very formal grammar, and if you use that approach to structure your prose, it's also understandable. The struggle is more to mentally encode non-technical domain knowledge, like office politics or emotions.
  
  analog31 4 months ago
  
  That's true. But people have had formal language for millennia, so why don't we use it?
  Here's my hunch. Formal specification is so inefficient that cynics suspect it of being a form of obstructionism, while pragmatic people realize that they can solve a problem themselves, quicker than they can specify their requirements.
  
  1718627440 4 months ago
  
  > But people have had formal language for millennia, so why don't we use it?
  In case you don't refer to the mathematical notion of formal, then we use formal language all the time. Every subject has its formal terms, contracts are all written in a formal way, specifications use formal language. Anything that really matters or is read by a large audience is written in formal language.
  
  hinkley 4 months ago
  
  I think there’s some of that, but it’s also probably a thing where people who make good tutors/mentors tend to write clearer code as well, and the Venn diagram for that is a bit complicated.
  Concise code is going to be difficult if you can’t distill a concept. And that’s more than just verbal intelligence. Though I’m not sure how you’d manage it with low verbal intelligence.
cmrdporcupine 4 months ago

I've had LLMs proactively fix my inline documentation. Rather pleasant surprise: "I noticed the comment is out of date and does not reflect the actual implementation" even asking me if it should fix it.
- jimbokun 4 months ago
  
  I find LLMs more diligent about keeping the documentation than any human developer, including myself.
jimbokun 4 months ago

Well maybe if those people were managing one or more programmers and not writing the code themselves, they would have worked similarly.
what 4 months ago

The difference is that they’re using the LLM to write those readmes and architecture and whatever else documents. They’re not putting any effort in.

rednafi 4 months ago

I think a lighter version of literate programming, coupled with languages that have a small API surface but are heavy on convention, is going to thrive in this age of agentic programming.

A lighter API footprint probably also means a higher amount of boilerplate code, but these models love cranking out boilerplate.

I’ve been doing a lot more Go instead of dynamic languages like Python or TypeScript these days. Mostly because if agents are writing the program, they might as well write it in a language that’s fast enough. Fast compilation means agents can quickly iterate on a design, execute it, and loop back.

The Go ecosystem is heavy on style guides, design patterns, and canonical ways of doing things. Mostly because the language doesn’t prevent obvious footguns like nil pointer errors, subtle race conditions in concurrent code, or context cancellation issues. So people rely heavily on patterns, and agents are quite good at picking those up.

My version of literate programming is ensuring that each package has enough top-level docs and that all public APIs have good docstrings. I also point agents to read the Google Go style guide [1] each time before working on my codebase. This yields surprisingly good results most of the time.

[1] https://google.github.io/styleguide/go/

username223 4 months ago

> The Go ecosystem is heavy on style guides, design patterns, and canonical ways of doing things.
Go was designed based on Rob Pike's contempt for his coworkers (https://news.ycombinator.com/item?id=16143918), so it seems suitable for LLMs.

perrygeo 4 months ago

Considering LLMs are models of language, investing in the clarity of the written word pays off in spades.

I don't know whether "literate programming" per se is required. Good names, docstrings, type signatures, strategic comments re: "why", a good README, and thoughtfully-designed abstractions are enough to establish a solid pattern.

Going full "literate programming" may not be necessary. I'd maybe reframe it as a focus on communication. Notebooks, examples, scripts and such can go a long way to reinforcing the patterns.

Ultimately that's what it's about: establishing patterns for both your human readers and your LLMs to follow.

crazygringo 4 months ago

Yeah, I think what is needed is somewhere between docstrings+strategic comments, and literate programming.
Basically, it's incredibly helpful to document the higher-level structure of the code, almost like extensive docstrings at the file level and subdirectory level and project level.
The problem is that major architectural concepts and decisions are often cross-cutting across files and directories, so those aren't always the right places. And there's also the question of what properly belongs in code files, vs. what belongs in design documents, and how to ensure they are kept in sync.
- amelius 4 months ago
  
  Also:
  "Bad programmers worry about the code. Good programmers worry about data structures and their relationships."
  -- Linus Torvalds
  
  Swizec 4 months ago
  
  > "Bad programmers worry about the code. Good programmers worry about data structures and their relationships."
  If you get the architecture wrong, everyone complains. If you get it right, nobody notices it's there.
  
  esafak 4 months ago
  
  The SRE's Lament.
  
  Terr_ 4 months ago
  
  "Nothing needs fixing, so what do we pay you for?"
  "Everything's broken! What do we even pay you for!?"
  
  k32k 4 months ago
  
  Doesn't this apply with the hysteria of LLMs?
  The question being - are LLMs 'good' at interpreting and making choices/decisions about data structures and relationships?
  I do not write code for a living but I studied comp sci. My impression was always that the good software engineers did not worry about the code, not nearly as much as the data structures and so on.
  
  skydhash 4 months ago
  
  The only use of code is to process data, aka information. And any knowledge worker knows that success in processing information relies mostly on how it's organized (try operating a library without an index).
  Most of the time is spent researching what data is available and learning what data should be returned after the processing. Then you spend a bit of brain power to connect the two. The code is always trivial. I don't remember ever discussing code in the workplace since I started my career. It was always about plans (hypotheses), information (data inquiry), and specifications (especially when collaborating).
  If the code is worrying you, it would be better to buy a book on whatever technology you're using and refresh your knowledge. I keep bookmarks in my web browser and have a few books on my shelf that I occasionally page through.
jimbokun 4 months ago

Notebooks are an example of literate programming.

jph00 4 months ago

Nearly all my coding for the last decade or so has used literate programming. I built nbdev, which has let me write, document, and test my software using notebooks. Over the last couple of years we integrated LLMs with notebooks and nbdev to create Solveit, which everyone at our company uses for nearly all our work (even our lawyers, HR, etc).

It turns out literate programming is useful for a lot more than just programming!

mkl 4 months ago

This seems to be the best link? https://solve.it.com/
The name is quite hard to search for, as it's used by a lot of different things.
Jeremy it's pretty hard to understand what this is from the descriptions, and the two videos are each ~1 hour long. Please consider showing screenshots and one or two short videos.

cfiggers 4 months ago

Interesting and semi-related idea: use LLMs to flag when comments/docs have come out of sync with the code.

The big problem with documentation is that if it was accurate when it was written, it's just a matter of time before it goes stale compared to the code it's documenting. And while compilers can tell you if your types and your implementation have come out of sync, before now there's been nothing automated that can check whether your comments are still telling the truth.

Somebody could make a startup out of this.

spawarotti 4 months ago

There is at least one startup doing it already (I'm not affiliated with it in any way): https://promptless.ai/
- cfiggers 4 months ago
  
  Thanks for the pointer. That looks more to me like it's totally synthesizing the docs for me. I can see someone somewhere wanting that. I would want a UX more like a compiler warning. "Comment on line 447 may no longer be accurate." And then I go fix it my own dang self.
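A compiler-warning-style check like that can be approximated without any LLM for one narrow case. The sketch below is only an illustration, not how any of the tools mentioned here work: it assumes comments refer to identifiers in backticks, and the function name `stale_comment_warnings` is made up. It flags comments that mention a name no longer present anywhere in the code.

```python
import io
import re
import tokenize


def stale_comment_warnings(source: str) -> list[str]:
    """Warn about comments that mention `identifiers` absent from the code."""
    names, comments = set(), []
    # Collect every identifier and every comment (with its line number).
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME:
            names.add(tok.string)
        elif tok.type == tokenize.COMMENT:
            comments.append((tok.start[0], tok.string))

    warnings = []
    for lineno, text in comments:
        # Assumption: comments put code identifiers in backticks.
        for mentioned in re.findall(r"`(\w+)`", text):
            if mentioned not in names:
                warnings.append(
                    f"Comment on line {lineno} may no longer be accurate: "
                    f"`{mentioned}` not found in code"
                )
    return warnings
```

This catches renames and deletions only. A comment that names real identifiers but describes the wrong behavior passes untouched, which is the part that would need a model or a human.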
  
  gogopromptless 3 months ago
  
  Ha, this is funny (also sad for me because I failed to explain on website clearly) because you have described exactly what it does as an example of what it can't do.
  The core loop is more like a truffle-hunting pig than a ghostwriter. Promptless watches for signal that your product is behaving differently from the live documentation. It watches PRs opened/merging, Slack threads, support tickets. Then like a pig alerting on a truffle it shows up like "hey, this section over here doesn't match what the code/product does anymore."
  Now of course we'll also generate a first draft of a suggested fix, but I want to say 40% of tech writers just like knowing when things changed.
  It's a proper union-find algorithm, where every suggestion links back to the source that triggered it, but multiple sources do get linked up to just a single canonical suggestion. So you don't get duplicate alerts if people keep talking for weeks about a fix going out in the next release.
  Obviously I've got some more work to do on the website again but c'est la vie.
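For readers unfamiliar with union-find, here is a minimal sketch of the deduplication idea in Python. It is not Promptless's actual implementation; the `SuggestionIndex` class and the identifier strings used with it are hypothetical. Each signal (a PR, a Slack thread, a ticket) is unioned with the doc section it concerns, so many sources collapse into one group while every source stays traceable.

```python
from collections import defaultdict


class SuggestionIndex:
    """Hypothetical sketch: group change signals into canonical suggestions."""

    def __init__(self):
        self.parent = {}

    def find(self, item):
        # Register unseen items as their own root.
        self.parent.setdefault(item, item)
        # Walk up to the root, halving the path along the way.
        while self.parent[item] != item:
            self.parent[item] = self.parent[self.parent[item]]
            item = self.parent[item]
        return item

    def union(self, a, b):
        # Merge the sets containing a and b; a's root becomes the canonical one.
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a

    def groups(self):
        # Map each canonical root to the sorted members of its set.
        grouped = defaultdict(list)
        for item in list(self.parent):
            grouped[self.find(item)].append(item)
        return {root: sorted(members) for root, members in grouped.items()}
```

Calling `union("docs/auth#tokens", "pr:auth-change")` and later `union("pr:auth-change", "slack:auth-thread")` leaves all three in one group, so a second alert for the same doc section is never raised.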
esafak 4 months ago

If you have CI hooked up to AI you could just use an SLM to do that in a periodic job with https://github.github.com/gh-aw/ or https://www.continue.dev/. You could also have it detect architectural drift.
andyhasit 4 months ago

I once had a mad idea of creating an automated documentation-driven paradigm where every directory/module/class/function has to have a DocString/JSDoc, with the higher level ones (directory/module) essentially being the documentation of features and architecture. A ticket starts by someone opening a PR with suggested changes to the docs, the idea being that a non-technical person like a PM or tester could do it. The PR then passes to a dev who changes the code to match the doc changes. Before merging, the tool shows the doc next to every modified piece of code and the reviewer must explicitly check a box to say it's still valid. And docstrings would be able to link to other docstrings, so you could find out what other bits of code are connected to what you're working on (as that link doesn't always exist in code, e.g. across APIs) and read their docs to find the larger context and gotchas.
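The "everything must have a docstring" part of that idea is easy to check mechanically. A minimal sketch using Python's standard `ast` module follows; the function name `missing_docstrings` is made up for illustration, and it covers only the enforcement step, not the PR workflow or the cross-linking between docstrings.

```python
import ast


def missing_docstrings(source: str, filename: str = "<string>") -> list[str]:
    """Return the module, classes, and functions in `source` that lack a docstring."""
    tree = ast.parse(source, filename=filename)
    missing = []
    # The module itself counts as a documentable unit.
    if ast.get_docstring(tree) is None:
        missing.append("<module>")
    # Check every class and function, including nested ones.
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            if ast.get_docstring(node) is None:
                missing.append(f"{node.name} (line {node.lineno})")
    return missing
```

A CI job could fail the build whenever this list is non-empty, which gives the docs-first PR something concrete to be checked against.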
kaycebasques 4 months ago

I'm a technical writer. Off the top of my head I reckon at least 10 startups have … started up … in this space since 2023.
amelius 4 months ago

Why would you need comments from an AI if you can just ask it what the code is doing?
- melagonster 4 months ago
  
  Because only a human writer can explain why they made a given decision. But nobody wants to update comments each time.
- jimbokun 4 months ago
  
  Because the human needs to tell the AI whether it’s the code or the comment that’s wrong.

cadamsdotcom 4 months ago

Test code and production code in a symmetrical pair has lots of benefits. It’s a bit like double entry accounting - you can view the code’s behavior through a lens of the code itself, or the code that proves it does what it seems to do.

You can change the code by changing either tests or production code, and letting the other follow.

Code reviews are a breeze because if you’re confused by the production code, the test code often holds an explanation - and vice versa. So just switch from one to the other as needed.

Lots of benefits. The downside is how much extra code you end up with of course - up to you if the gains in readability make up for it.

sublinear 4 months ago

> This is especially important if the primary role of engineers is shifting from writing to reading.

This was always the primary role. The only people who ever said it was about writing just wanted an easy sales pitch aimed at everyone else.

Literate programming failed to take off because with that much prose it inevitably misrepresents the actual code. Most normal comments are bad enough.

It's hard to maintain any writing that doesn't actually change the result. You can't "test" comments. The author doesn't even need to know why the code works to write comments that are convincing at first glance. If we want to read lies influenced by office politics, we already have the rest of the docs.

c0rp4s 4 months ago

You're right that you can't test comments, but you can test the code they describe. That's what reproducibility bundles do in scientific computing: the prose says "we filtered variants with MAF < 0.01", and the bundle includes the exact shell command, environment, and checksums so anyone can verify the prose matches reality. The prose becomes a testable claim rather than a decorative comment. That said, I agree the failure mode of literate programming is prose that drifts from code. The question is whether agents reduce that drift enough to change the calculus.
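As a rough illustration of prose as a testable claim, here is a small Python sketch. The `Claim` structure and `verify` function are hypothetical and not taken from any particular reproducibility tool; a real bundle would also pin the environment and input files. The idea is just that each sentence carries the command that backs it plus a checksum of the output recorded when the sentence was written.

```python
import hashlib
import subprocess
from dataclasses import dataclass


@dataclass
class Claim:
    """A prose statement paired with the command that backs it (hypothetical)."""

    prose: str
    command: list[str]
    expected_sha256: str  # checksum of stdout, recorded when the prose was written


def verify(claim: Claim) -> bool:
    """Re-run the command and check its output against the recorded checksum."""
    result = subprocess.run(claim.command, capture_output=True, check=True)
    return hashlib.sha256(result.stdout).hexdigest() == claim.expected_sha256
```

If the pipeline changes and the output no longer matches, `verify` returns `False`, which is the signal that the sentence needs to be re-read.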
ares623 4 months ago

I don't buy that. Writing is taking a bad rap from all this. Writing _is_ a form of more intense reading. Reading on steroids, as they say. If reading is considered good, writing should be considered better.
- bigyabai 4 months ago
  
  Writing in that draft style is really only useful because a) you read the results and b) you write an improved version at the end. Drafting forever is not considered "better" because someone (usually you) has to sift through the crap to find the good parts.
  This is especially pronounced in the programming workplace, where the most "senior" programmers are asked to stop programming so they can review PRs.
8note 4 months ago

> You can't "test" comments.
I'm thinking that we're approaching a world where you can both test for comments and test the comments themselves.
- senderista 4 months ago
  
  Now that would be really interesting: prompt an LLM to find comments that misrepresent the code! I wonder how many false positives that would bring up?
  
  ccosky 4 months ago
  
  I have a Claude Code skill for adding, deleting and improving comments. It does a decent job at detecting when comments are out of date with the code and updating them. It's not perfect, but it's something.

macey 4 months ago

I agree it's worth revisiting. Actually I wrote about this recently, I didn't realise there was a precedent here. https://tessl.io/blog/how-to-capture-intent-with-coding-agen...

> As a benefit, the code base can now be exported into many formats for comfortable reading. This is especially important if the primary role of engineers is shifting from writing to reading.

Underrated point. Also, whether we like it or not, people without engineering backgrounds will be closer to code in the future. That trend isn't slowing down. The inclusion of natural language will make it easier for them to be productive and learn.

librasteve 4 months ago

I don't know Org, but Rakudoc https://docs.raku.org/language/pod is useful for literate programming (put the docs in the code source) and for LLMs (the code is "self documenting" so that in the LLM inversion of control, the LLM can determine how to call the code).

https://podlite.org does this in a language-neutral way (Perl, JS/TS and Raku for now).

Here's an example:

  #!/usr/bin/env raku
  =begin pod
  =head1 NAME
  Stats::Simple - Simple statistical utilities written in Raku

  =head1 SYNOPSIS
      use Stats::Simple;

      my @numbers = 10, 20, 30, 40;

      say mean(@numbers);     # 25
      say median(@numbers);   # 25

  =head1 DESCRIPTION
  This module provides a few simple statistical helper functions
  such as mean and median. It is meant as a small example showing
  how Rakudoc documentation can be embedded directly inside Raku
  source code.

  =end pod

  unit module Stats::Simple;

  =begin pod
  =head2 mean

      mean(@values --> Numeric)

  Returns the arithmetic mean (average) of a list of numeric values.

  =head3 Parameters
  =over 4
  =item @values
  A list of numeric values.

  =back

  =head3 Example
      say mean(1, 2, 3, 4);  # 2.5
  =end pod
  sub mean(*@values --> Numeric) is export {
      die "No values supplied" if @values.elems == 0;
      @values.sum / @values.elems;
  }

  =begin pod
  =head2 median

      median(@values --> Numeric)

  Returns the median value of a list of numbers.

  If the list length is even, the function returns the mean of
  the two middle values.

  =head3 Example
      say median(1, 5, 3);     # 3
      say median(1, 2, 3, 4);  # 2.5
  =end pod
  sub median(*@values --> Numeric) is export {
      die "No values supplied" if @values.elems == 0;

      my @sorted = @values.sort;
      my $n = @sorted.elems;

      return @sorted[$n div 2] if $n % 2;

      (@sorted[$n/2 - 1] + @sorted[$n/2]) / 2;
  }

  =begin pod
  =head1 AUTHOR
  Example written to demonstrate Rakudoc usage.

  =head1 LICENSE
  Public domain / example code.
  =end pod

beernet 4 months ago

Literate programming sounds great in a blog post, but it falls apart the moment an agent starts hallucinating between the prose and the actual implementation. We’re already struggling with docstrings getting out of sync; adding a layer of philosophical "intent" just gives the agent more room to confidently output garbage. If you need a wall of text to make an agent understand your repo, your abstractions are probably just bad. It feels like we're trying to fix a lack of structural clarity with more tokens.

trixn86 4 months ago

I don't think that agents actually benefit from comments that describe what the code does at all. In my experience in the best case they don't really improve response quality and in the worst case they drastically reduce it. This is just noise that does not help the AI understand the context any better. This has already been true for a trained developer and it is even more so true for AI agents. Natural language is in almost every way less efficient in providing context and AI has no problem at all to infer intent from good code. The challenge is rather to make the AI produce good code which needs a strict harness and rules. Another good addition is semantic indexing of the codebase to help the AI find code using semantic search (which is what some agents already do quite successfully).

The only context I consistently found to be useful is about project-specific tool calling. Trying to provide natural language context about the project itself always proved to be ambiguous, inaccurate and out-of-date. Agents are very good at reading code and code is the best way to express context unambiguously.

empath75 4 months ago

You can have perfectly good code, which is perfectly easy to understand which nevertheless _does not do what you intended to do_. That is why tests exist, after all.

jarnm0 4 months ago

I agree that we should revisit literate programming, but I don't think using LLMs to generate or summarize code is ever going to be the ultimate solution. You want something that is unambiguous and computable but that also non-technical people can work with - a programming language which reads like natural language.

In 2021 I started to "solve programming in natural language" by building a platform which enables creating these kinds of domain-specific (projectional) programming languages which can look exactly like (structured) natural language. The idea was to enable domain/business experts to manage the business rules in different kinds of systems. The platform works and the use-cases are there, but I haven't been able to commercialize it yet.

I didn't initially build it for LLMs, but after the release of GPT 3.5 it became obvious that these structured natural languages would be the perfect link between non-technical people, LLMs and deterministic logic. So now I have enabled the platform to instruct LLMs to work with the languages, with very good results, and am trying to commercialize it for LLM use-cases. There absolutely are synergies in combining literate programming and LLMs!

I've written a bit more about it here: - https://www.linkedin.com/pulse/how-i-accidentally-built-cont... - https://www.linkedin.com/pulse/llms-structured-natural-langu...

(P.S. Looking for a co-founder, feel free to reach out in LinkedIn if this resonates!)

teleforce 4 months ago

Not sure if the author knows about CUE; here's the HN post from early this year on literate programming with CUE [1].

CUE is based on value-lattice logic, which is the LLM's NLP cousin but deterministic rather than stochastic [2].

LLMs are notoriously prone to generating syntactically valid but semantically broken configurations thus it should be used with CUE for improving literate programming for configs and guardrailing [3].

[1] CUE Does It All, But Can It Literate? (22 comments)

https://news.ycombinator.com/item?id=46588607

[2] The Logic of CUE:

https://cuelang.org/docs/concept/the-logic-of-cue/

[3] Guardrailing Intuition: Towards Reliable AI:

https://cue.dev/blog/guardrailing-intuition-towards-reliable...

gervwyk 4 months ago

For me this is where a config layer shines. Develop a decent framework and then let the agents spin out the configuration.

This allows a trusted and tested abstraction layer that does not shift and makes maintenance easier, while making the code that the agents generate easier to review. It also uses far fewer tokens.

So as always, just build better abstractions.

cyanydeez 4 months ago

when do you think we'll get to build real software?
jauntywundrkind 4 months ago

I fully agree. (Seeing how good Figment2 is for layered config in rust is wildly eye opening, has been a revelatory experience.)
Sometimes what we manage with config is itself processing pipelines. A tool like darktable has a series of processing steps that are run. Each of those has config, but the outer layer is itself a config of those inner configs. And the outer layer is a programmable pipeline; it's not that far apart from thinking of each user coming in and building their own http handler pipeline, making their own bespoke computational flow.
I guess my point is that computation itself is configuration. XSLT probably came closest to that sun. But we see similar lessons everywhere we look.
macintux 4 months ago

I work with a project that is heavily configuration-driven. It seems promising, but in reality:
- Configuration is massively duplicated, across repositories
- No one is willing to rip out redundancy, because comprehensive testing is not practical
- In order to understand the configuration, you have to read lots of code, again across multiple repositories (this in particular is a problem for LLM assistance, at least the way we currently use it)
I love the idea, but in practice it’s currently a nightmare. I think if we took a week we could clean things up a fair bit, but we don’t have a week (at least as far as management is concerned), and again, without full functional testing, it’s difficult to know when you’ve accidentally broken someone else’s subsystem
- macintux 4 months ago
  
  Now that I've returned to working on the project tonight, I just remembered another failing of our code. (I'm not in any way claiming these are universal problems, just that they are something to be wary of.)
  Naming is so incredibly important. The wrong name for a configuration key can have cascading impacts, especially when there is "magic" involved, like stripping out or adding common prefixes to configuration values.
  We have a concept called a "domain" which is treated as a magic value everywhere, such as adding a prefix or suffix. But domain isn't well-defined, and in different contexts it is used different ways, and figuring out what the impact is of choosing a domain string is typically a matter of trial and error.
jimbokun 4 months ago

All of that is just code.
Frameworks are just overly brittle and fragile libraries that overly restrict how you can use them.

jauntywundrkind 4 months ago

One of the things I love most about WebMCP is the idea that it's an MCP session that exists on the page, which the user already knows.

Most of these LLM things are kind of separate systems, with their own UI. The idea of agency being inlayed to existing systems the user knows like this, with immediate bidirectional feedback as the user and LLM work the page, is incredibly incredibly compelling to me.

Series of submissions (descending in time): https://news.ycombinator.com/item?id=47211249 https://news.ycombinator.com/item?id=47037501 https://news.ycombinator.com/item?id=45622604

trane_project 4 months ago

I think full literate programming is overkill but I've been doing a lighter version of this:

- Module level comments with explanations of the purpose of the module and how it fits into the whole codebase.

- Document all methods, constants, and variables, public and private. A single terse sentence is enough, no need to go crazy.

- Document each block of code. Again, a single sentence is enough. The goal is to be able to know what that block does in plain English without having to "read" code. Reading code is a misnomer because it is a different ability from reading human language.

Example from one of my open-source projects: https://github.com/trane-project/trane/blob/master/src/sched...
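For anyone who doesn't want to click through, the three levels look roughly like this. The linked project is in Rust; this is a made-up Python module (the names `BATCH_SIZE` and `next_batch` are hypothetical) that only shows the shape of the comments, not real project code.

```python
"""Scheduling helpers (hypothetical module).

Picks which exercises to show next; the rest of the codebase calls `next_batch`.
"""

# Maximum number of exercises returned in one batch.
BATCH_SIZE = 3


def next_batch(scores: dict[str, float]) -> list[str]:
    """Return up to BATCH_SIZE exercise ids, lowest score first."""
    # Order exercises so the weakest ones come first.
    ranked = sorted(scores, key=lambda exercise_id: scores[exercise_id])

    # Trim the ranking to the batch size.
    return ranked[:BATCH_SIZE]
```

Each comment is one terse sentence, so the file can be skimmed in plain English without reading the code itself.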

robertwer 4 months ago

There seems to be some evidence that literate programming style comments help humans to comprehend code they don't know. I found a paper investigating this.

Some folks from Google tested 1) how well LLMs can update existing code with LP style comments and 2) if that helps humans to better understand that enhanced code. (see the 2024 arXiv paper "Natural Language Outlines for Code").

If I remember correctly they had systematically tested how well humans could understand the enhanced code compared to no comments at all and they also tested different flavours of comments (line level, block level etc.).

The conclusion was: if you use the right amount of comments in the right style (intent explaining the purpose of the code on block level, not every line), it's very beneficial.

jimbokun 4 months ago

This does seem exciting at first glance. Just write the narrative part of literate programming and an LLM generates the code, then keep the narrative and voila! Literate programming without the work of generating both.

However I see two major issues:

Narrative is meant to be consumed linearly. But code is consumed as a graph. We navigate from a symbol to its definition, or from definition to its uses, jumping from place to place in the code to understand it better. The narrative part of literate programming really only works for notebooks where the story being told is dominant and the code serves the story.

Second is that when I use an LLM to write code, the changes I describe usually require modifying several files at once. Where does this “narrative” go relative to the code?

And yes, these two issues are closely related.

melisgl 4 months ago

- For many languages, we can get away with Untangled LP. See, e.g. https://quotenil.com/untangling-literate-programming.html

- Introducing redundancies (of code, tests, documentation) is our primary tool to increase our confidence in the correctness of the solution: See, e.g. https://quotenil.com/multifaceted-development.html

- Untangled LP has been a good idea even before LLMs. It's even better now, as LLMs can maintain documentation and check it against the code.

cmontella 4 months ago

I agree with this. I've been a fan of literate programming for a long time, I just think it is a really nice mode of development, but since its inception it hasn't lived up to its promise because the tooling around the concept is lacking. Two of the biggest issues have been 1) having to learn a whole new toolchain outside of the compiler to generate the documents 2) the prose and code can "drift" meaning as the codebase evolves, what's described by the code isn't expressed by the prose and vice versa. Better languages and tooling design can solve the first problem, but I think AI potentially solves the second.

Here's the current version of my literate programming ideas, Mechdown: https://mech-lang.org/post/2025-11-12-mechdown/

It's a literate coding tool that is co-designed with the host language Mech, so the prose can co-exist in the program AST. The plan is to make the whole document queryable and available at runtime.

As a live coding environment, you would co-write the program with AI, and it would have access to your whole document tree, as well as live type information and values (even intermediate ones) for your whole program. This rich context should help it make better decisions about the code it writes, hopefully leading to better synthesized program.

You could send the AI a prompt, then it could generate the code using live type information; execute it live within the context of your program in a safe environment to make sure it type checks, runs, and produces the expected values; and then you can integrate it into your codebase with a reference to the AI conversation that generated it, which itself is a valid Mechdown document.

That's the current work anyway -- the basis of this is the literate programming environment, which is already done.

The docs show off some more examples of the code, which I anticipate will be mostly written by AIs in the future: https://docs.mech-lang.org/getting-started/introduction.html

catlifeonmars 4 months ago

We actually have had literate programming for a while, it just doesn’t look exactly how it was envisioned: Nowadays, it’s common for many libraries to have extensive documentation, including hyperlinks and testable examples directly inline in the form of comments. There’s usually a well-defined convention for these comments to be converted into HTML, and some of them link directly back to the relevant source code.
This isn’t to say they’re exactly what is meant by literate programming, but I gotta say we’re pretty damn close. Probably not much more than a pull request away for your preferred language’s blessed documentation generator, in fact.
(The two examples I’m using to draw my conclusions are Rust and Go).
- cmontella 4 months ago
  
  I think that's exactly what is meant, and it's a great example. The two places where literate programming has shone most are 1) documentation, because it's a natural fit there and you can get away with having little programs rather than focusing on a book-length narrative as Knuth had originally intended, and 2) notebook programming environments, especially Jupyter and Org mode. I think programs structured in these notebooks really are perfectly situated for LLM analysis and extension, which is where the opportunity lies today.

frakt0x90 4 months ago

Maybe for literate programming, we can switch from common, ambiguous human languages like English and Spanish to [Lojban](https://en.wikipedia.org/wiki/Lojban)! That way our human language will be unambiguous which will translate to machine code much better. We'll call this the de facto "language for programming". Improvements and other variants may pop up in the future as new needs arise. All that is old is new again.

rorylaitila 4 months ago

Even on the latest models, LLMs are not deterministic between "don't do this thing" and "do this thing". They are both related to "this thing" and, depending on other content in the context and the seed, may randomly do the thing or not. So to get the best results, I want my context to be the smallest possible truthful input, not the most elaborated. More is not better. I think good names on executable source code and the tightest possible documentation are best for LLMs, and probably for people too.

Arubis 4 months ago

Anecdotally, Claude Opus is at least okay at literate emacs. Sometimes takes a few rounds to fix its own syntax errors, but it gets the idea. Requiring it to TDD its way in with Buttercup helps.

ajkjk 4 months ago

I've had the same thought, maybe more grandiosely. The idea is that LLM prompts are code -- after all they are text that gets 'compiled' (by the LLM) into a lower-level language (the actual code). The compile process is more involved because it might involve some back-and-forth, but on the other hand it is much higher level. The goal is to have a web of prompts become the source of truth for the software: sort of like the flowchart that describes the codebase 'is' the codebase.

Copyrightest 4 months ago

One problem with this is that there isn't really a "current prompt" that completely describes the current source code; each source file is accompanied by a full chat log, including false starts and misunderstandings. It's sort of like reading a git history instead of the actual file.
- ajkjk 4 months ago
  
  true, but that just means that's the problem to solve. probably the ideal architecture isn't possible right now. But I sorta imagine that you could later on take the full transcript of that conversation and expect any LLM to implement more or less the same thing based on it, so that eventually it becomes a full 'spec'.
  And maybe there is a way to trim the parts out of it that are not needed... like to automatically produce an initial prompt which is equivalent to the results of a longer session, but is precise enough so as to not need clarification upon reprocessing it. Something like that? I'm not sure if that's something that already exists.
  
  sarchertech 4 months ago
  
  > But I sorta imagine that you could later on take the full transcript of that conversation and expect any LLM to implement more or less the same thing based on it
  Why would you think this though? There are an infinite number of programs that can satisfy any non-trivial spec.
  We have theoretical solutions to LLM non-determinism, we have no theoretical solutions to prompt instability especially when we can’t even measure what correct is.
  
  ajkjk 4 months ago
  
  yeah but all of the infinite programs are valid if they satisfy the spec (well, within reason). That's kinda the point. Implementation details like how the code is structured or what language it's in are swept under the rug, akin to how today you don't really care what register layout the compiler chooses for some code.
  
  sarchertech 4 months ago
  
  There has never been a non trivial program in the history of the world that could just “sweep all the implementation details under the rug”.
  Compilers use rigorous modeling to guarantee semantic equality and that is only possible because they are translating between formal languages.
  A natural language spec can never be precise enough to specify all possible observable behaviors, so your bot swarm trying to satisfy the spec is guaranteed to constantly change observable behaviors.
  This gets exposed to users as churn, jank, and workflow-breaking bugs.
- 1718627440 4 months ago
  
  > each source file is accompanied by a full chat log, including false starts and misunderstandings. It's sort of like reading a git history instead of the actual file.
  My Git history contains links between the false starts and misunderstandings and the corrections, which then also include a paragraph on why this was a misunderstanding or false start. It is a lot better than just a single linear log from LLMs.
sarchertech 4 months ago

No it doesn’t get compiled. Compilation is a translation from one formal language to another that can be rigorously modeled and is generally reproducible.
Translating from a natural language spec to code involves a truly massive amount of decision making because it’s ambiguous. For a non trivial program, 2 implementations of the same natural language spec will have thousands of observable differences.
Where we are today, with agents requiring guardrails to keep from spinning out, there is no way to let agents work on code autonomously or constantly recompile specs without all of those observable differences constantly shifting, resulting in unusable software.
Tests can’t prevent this because for a test suite to cover all observable behavior, it would need to be more complex than the code. In which case, it wouldn’t be any easier for machine or human to understand. The only solution to this problem is that LLMs get better.
Personally I think at the point they can pull this off, they can do any white collar job, and there’s no point in planning for that future because it results in either Mad Max or Star Trek.
- ajkjk 4 months ago
  
  well you have to expand your definition of "compile" a bit. There is clearly a similarity, whether or not you want to call it the same word. Maybe it needs a neologism akin to 'transpiled'.
  other than that you seem to be arguing against someone other than me. I certainly agree that agents / existing options would be chaotic hell to use this way. But I think the high-level idea has some potential, independent of that.
  
  sarchertech 4 months ago
  
  I fundamentally don’t think the higher level idea has any potential because of the ambiguity of natural language. And I certainly don’t think it has anything in common with compilation unless you want to stretch the definition so far as to say that engineers are compilers. It’s delegation not abstraction.
  I think we’ll either get to the point where AI is so advanced it replaces the manager, the PM, the engineer, the designer, and the CEO, or we’ll keep using formal languages to specify how computers should work.

yuppiemephisto 4 months ago

I do a form of literate programming for code review to help read AI code. I use [Lean 4](lean-lang.org) and its doc tool [Verso](https://github.com/leanprover/verso/) and have it explain the code through a literate essay. It is integrated with Lean and gets proper typechecking etc which I find helpful.

jasfi 4 months ago

I wrote something similar where you specify the intent in Markdown at the file level. That can also be done by an AI agent. Each intent file compiles to a source file.

It works, but needs improvement. Any feedback is welcome!

https://intentcode.dev

https://github.com/jfilby/intentcode

avatardeejay 4 months ago

Something in this realm covers my practice. I just keep a master prompt for the whole program, and sparsely documented code. When it's time to use LLMs in the dev process, they always get a copy of both, and it makes the whole process like 10x as coherent and continuous. Obvi when a change is made that deviates from or greatly expands on the spec, I update the spec.

grapheneposter 4 months ago

I do something similar with quality gates. I have a bunch of markdown files at the ready to point agents to for various purposes. It lets me leverage LLMs at any stage of the dev process and my clients get docs in their format without much maintenance from myself. As you said once you get it down it becomes a very coherent process that can be iterated on in its own right.
I am currently fighting the recursive improvement loop part of working with agents.

threethirtytwo 4 months ago

Should be extremely low effort to try this out with an agent.

The thing is, I feel an agent can read code as if it were English. It doesn't find one hard and the other much more readable, as we do. So it could end up just increasing token burn to get through a program, because it has to run through the literate part as well as the actual code part.

arikrahman 4 months ago

I have instructed my LLMs to provide at least one comment per function, and additionally to comment when they take things out and to explain why they opted for a particular solution. DistroTube also loves the declarative literate programming approach, often citing how his one-document configuration with nix configures his whole system.

stephbook 4 months ago

Take it to the logical conclusion. Track the intended behavior in a proper issue tracking software like Jira. Reference the ticket in your version control system.

Boring and reliable, I know.

If you need guides to the code base beyond what the programming language provides, just write a directory level readme.md where necessary.

andyferris 4 months ago

I think the externality of issue tracking systems like Jira (or even GitHub) causes friction. Literate programming has everything in one place.
I’d like to have a good issue tracking system inside git. I think the SQLite version management system has this functionality but I never used it.
One thing to solve is that different kinds of users need to interact with it in different kinds of ways. Non-programmers can use Jira, for example. Issues are often treated as mutable text boxes rather than versioned specification (and git is designed for the latter). It’s tricky!

fhub 4 months ago

We were taught Literate Programming and xtUML at university. In both courses, the lecturers (independently) tried to convince us that these technologies were the future. I also did an AI/ML course. That lecturer lamented that the golden era was in the past.

eisbaw 4 months ago

https://github.com/eisbaw/litterate_bitorrent 800 pages, noweb extracts rust. Made by claude in a ralph loop over 1-2 days.

yes, it downloads actual torrents.

nailer 4 months ago

> Literate programming is the idea that code should be intermingled with prose such that an uninformed reader could read a code base as a narrative

Have you tried naming things properly? A reader that knows your language could then read your code base as a narrative.

ljlolel 4 months ago

Everyone is circling getting rid of the code and just having Englishscript https://jperla.com/blog/claude-electron-not-claudevm

gwbas1c 4 months ago

One thing I've discovered with an LLM is that I can ask it to search through my codebase and explain things to me. It saves a lot of time when I need to understand concepts that would otherwise require a few hours of reading and digging.

charcircuit 4 months ago

>I don't have data to support this

Given that there is data showing that context files which explain code reduce agent performance, it is not obvious that literate programming is better, so without data this article is useless.

prpl 4 months ago

It has always been possible to program literately in programming languages - not to the extent that you can in Web, but good code can read like a story and obviate comments.

wewewedxfgdf 4 months ago

What we need is comments that LLMs simply do not delete.

We need metadata in source code that LLMs don't delete and interpreters/compilers/linters don't barf on.
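
One way to approximate this with today's tools is a check that runs after an agent has edited a file. The sketch below assumes a hypothetical `# KEEP:` marker (not an existing convention or linter feature) and reports every marked comment that disappeared between two versions of a source file. It is a plain text scan, so a string literal that happens to contain the marker would be counted too.

```python
# Hypothetical marker for comments that must survive automated edits.
KEEP_MARKER = "# KEEP:"


def protected_comments(source: str) -> list[str]:
    """Collect every marked comment, whether on its own line or trailing code."""
    found = []
    for line in source.splitlines():
        index = line.find(KEEP_MARKER)
        if index != -1:
            found.append(line[index:].rstrip())
    return found


def dropped_comments(before: str, after: str) -> list[str]:
    """Return the marked comments present in `before` but missing from `after`."""
    remaining = set(protected_comments(after))
    return [c for c in protected_comments(before) if c not in remaining]


if __name__ == "__main__":
    original = "timeout = 30  # KEEP: units are seconds\n# KEEP: see design note\nretries = 3\n"
    edited = "timeout = 30\n# KEEP: see design note\nretries = 3\n"
    for comment in dropped_comments(original, edited):
        print("agent removed:", comment)
```

A check like this could run in a pre-commit hook or as a step in the agent loop, rejecting the edit when the returned list is non-empty.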

ontouchstart 4 months ago

Literate programming in the sense of Donald Knuth is more about the chain of thoughts of the programmer than documenting code with comments or doc strings.

pjmlp 4 months ago

I'd rather go with formal specifications and proofs.

DennisL123 4 months ago

If agents can already read and rewrite code, literate programming might actually be unnecessary. Instead of maintaining prose alongside code, you could let agents generate explanations on demand. The real requirement becomes writing code in a form that is easily interpretable and transformable by the next agent in the chain. In that model, code itself becomes the stable interface, while prose is just an ephemeral view generated whenever a human (or another agent) needs it.

tacone 4 months ago

I am already doing that. For performance, I am just caching the latest explanation alongside the code.
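
A minimal sketch of what such a cache could look like, assuming the explanation is keyed by a hash of the source text so that it is regenerated only when the code changes. `ExplanationCache` and the `explain` callable are hypothetical names; the callable stands in for whatever agent call actually produces the prose.

```python
import hashlib
from typing import Callable


class ExplanationCache:
    """Keep the latest generated explanation for each distinct source text."""

    def __init__(self, explain: Callable[[str], str]) -> None:
        self._explain = explain  # e.g. a wrapper around an LLM request
        self._entries: dict[str, str] = {}

    def get(self, source: str) -> str:
        # The hash changes whenever the code changes, which invalidates
        # the old explanation without any explicit bookkeeping.
        key = hashlib.sha256(source.encode("utf-8")).hexdigest()
        if key not in self._entries:
            self._entries[key] = self._explain(source)
        return self._entries[key]


if __name__ == "__main__":
    cache = ExplanationCache(lambda src: f"(explanation of {len(src)} characters of code)")
    print(cache.get("def add(a, b): return a + b"))
    print(cache.get("def add(a, b): return a + b"))  # served from the cache
```

To keep the explanation "alongside the code" in a literal sense, the dictionary could be swapped for a sidecar file stored next to each source file; that detail is left out here.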

ChicagoDave 4 months ago

I think we’re on the verge of readable code and human-edited code disappearing.

There is a paradigm shift coming. Ephemeral code.

monsieurbanana 4 months ago

Those two are not linked. I could buy that maybe human-readable code will be the minority.
But what does ephemeral code even mean? That we will throw everything out of the window at every release cycle and recreate it from scratch with LLMs based on specs? That's not happening.
- gombosg 4 months ago
  
  I think you're right, ephemeral code would be the concept that you have (I'm hand-waving) "the spec", that specifies what the code should be doing and the AI could regenerate the code any time based on it.
  I'm also baffled by this concept and fundamentally believe that code _should be_ the ground truth (the spec), hence it should be human readable. That's what "clean code" would be about, choosing tools and abstractions so that code is consumable for humans and easy to reason about, debug and extend.
  If we let go of that and rely on LLMs entirely... not sure where that would land, since computers ultimately execute the code - and the company is liable for the results of that code being executed -, not the plain language "specs".
- ChicagoDave 4 months ago
  
  By ephemeral I mean we no longer care about code as an asset. If a feature is broken or requires changes, we can perform a clean organ transplant. The actual code doesn’t matter anymore. Its testable functionality is what matters.

senderista 4 months ago

The "test runbook" approach that TFA describes sounds like doctest comments in Python or Rust.
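
For readers who have not used them, here is a small sketch of the doctest idea in Python, using only the standard-library `doctest` module. The function `is_even` and the helper `check_examples` are illustrative names, not something from the article.

```python
import doctest


def is_even(number: int) -> bool:
    """Return True when `number` is divisible by two.

    The examples below are documentation and tests at the same time:

    >>> is_even(4)
    True
    >>> is_even(7)
    False
    """
    return number % 2 == 0


def check_examples(func) -> doctest.TestResults:
    """Run the examples embedded in `func`'s docstring and report the counts."""
    parser = doctest.DocTestParser()
    test = parser.get_doctest(
        func.__doc__, {func.__name__: func}, func.__name__, "<sketch>", 0
    )
    runner = doctest.DocTestRunner(verbose=False)
    return runner.run(test)


if __name__ == "__main__":
    print(check_examples(is_even))
```

If the prose example in the docstring drifts away from the behavior, the run reports a failure, which is the property the runbook approach is after.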

whatgoodisaroad 4 months ago

it could be fun to make a toy compiler that takes an arbitrary literate prompt as input and uses an LLM to output a machine code executable (no intermediate structured language). could call it llmllvm. perhaps it would be tremendously dangerous

rudhdb773b 4 months ago

I'd love to see what Tim Daly could do with LLMs on Axiom's code base.

koolala 4 months ago

Left to right APL style code seems like it could be words instead of symbols.

s3anw3 4 months ago

I think the tension between natural language and code is fundamentally about information compression. Code is maximally compressed intent — minimal redundancy, precise semantics. Prose is deliberately less compressed — redundant, contextual, forgiving — because human cognition benefits from that slack.

Literate programming asks you to maintain both compression levels in parallel, which has always been the problem: it's real work to keep a compressed and an uncompressed representation in sync, with no compiler to enforce consistency between them.

What's interesting about your observation is that LLMs are essentially compression/decompression engines. They're great at expanding code into prose (explaining) and condensing prose into code (implementing). The "fundamental extra labor" you describe — translating between these two levels — is exactly what they're best at.

So I agree with your conclusion: the economics have changed. The cost of maintaining both representations just dropped to near zero. Whether that makes literate programming practical at scale is still an open question, but the bottleneck was always cost, not value.

hsaliak 4 months ago

I explored this in std::slop (my clanker) https://github.com/hsaliak/std_slop. One of the differentiating features of this clanker is that it has only a single tool call, run_js. The LLM produces JS scripts to do its work. Naturally, I tried to teach it to add comments to these scripts and incorporate literate programming elements. This was interesting because every tool call now 'hydrated' some free-form thinking, but it comes at an output token cost.

Output Tokens are expensive! In GPT-5.4 it's ~180 dollars per Million tokens! I've settled for brief descriptions that communicate 'why' as a result. The code is documentation after all.

amelius 4 months ago

We need an append-only programming language.

anotheryou 4 months ago

but doesn't "the code is documentation" work better for machines?

and don't we have doc-blocks?

zdragnar 4 months ago

Code doesn't express intent, only the implementation. Docblocks are fine for specifying local behavior, but are terrible for big picture things.
- anotheryou 4 months ago
  
  right you are :)
  does literate code have a place for big pic though?
- palata 4 months ago
  
  Well many times it does.
  bool isEven(number: Int) { return number % 2 == 0 }
  I would say this expresses the intent, no need for a comment saying "check if the number is even".
  Most of the code I read (at work) is not documented, still I understand the intent. In open source projects, I used to go read the source code because the documentation is inexistent or out-of-date. To the point where now I actually go directly to the source code, because if the code is well written, I can actually understand it.
  
  zdragnar 4 months ago
  
  In your example, the implementation matches the intention. That is not the same thing.
  bool isWeekday(number: Int) { return number % 2 == 0 }
  With this small change, all we have are questions:
  Is the name wrong, or the behavior? Is this a copy / paste error? Where is the specification that tells me which is right, the name or the body? Where are the tests located that should verify the expected behavior?
  Did the implementation initially match the intent, but some business rule changed that necessitated a change to the implementation and the maintainer didn't bother to update the name?
  Both of our examples are rather trite- I agree that I wouldn't bother documenting the local behavior of an "isEven" function. I probably would want a bit of documentation at the callsite stating why the evenness of a given number is useful to know. Generally speaking, this is why I tend to dislike docblock style comments and prefer bigger picture documentation instead- because it better captures intent.
  
  palata 4 months ago
  
  I would call your example "bad code". Do you disagree with that?
  
  zdragnar 4 months ago
  
  Not at all. I'm just pointing out that code does not intrinsically convey intent, only implementation.
  To use a less trite example, I'd probably find some case where a word or name can have different meanings in different contexts, and how that can be confusing rather than clarifying without further documentation or knowledge of the problem space.
  Really though, any bug in the code you write is a deviation between intent and implementation. That's why documentation can be a useful supplement to code. If you haven't, take a look at the underhanded C contests- there's some fantastically good old gems in there that demonstrate how a plain reading of the code may not convey intent correctly.
  The winner of this contest might be a good example: https://www.underhanded-c.org/_page_id_26.html
  
  palata 4 months ago
  
  I feel like we're going from "literate programming" to "sometimes it makes sense to add comments". I agree with the latter. Good code is mostly unsurprising, and when it is surprising it deserves a comment. But that is more the exception than the rule.
  Literate programming makes it the rule.
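
The isEven / isWeekday exchange above can be made concrete with a short sketch. It assumes one plausible reading of the intent (days numbered 0 for Monday through 6 for Sunday, weekdays being Monday to Friday); that numbering and all function names here are assumptions made for illustration, since the thread never states what the function was supposed to do.

```python
# Assumption for this sketch: days are numbered 0 (Monday) through 6 (Sunday),
# the same convention as Python's datetime.date.weekday().


def is_weekday_as_written(number: int) -> bool:
    # Body taken from the thread's example: it really checks evenness.
    return number % 2 == 0


def is_weekday_intended(number: int) -> bool:
    # One plausible statement of intent: Monday through Friday.
    return 0 <= number <= 4


def disagreements(implementation, specification, inputs) -> list[int]:
    """List the inputs on which the implementation deviates from the stated intent."""
    return [n for n in inputs if implementation(n) != specification(n)]


if __name__ == "__main__":
    print(disagreements(is_weekday_as_written, is_weekday_intended, range(7)))
```

The implementation alone cannot tell a reader which of the two functions is "right"; only a second, independent statement of intent (a test, a spec, or prose) makes the mismatch visible.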

xmcqdpt2 4 months ago

I'd much, much rather have the model write the code blocks and write the prose myself. In my experience LLMs can produce pretty decent code, but the writing is horrible. If anything I would prefer an agentic tool where you don't even see the slop. I definitely would rather it not be committed.

akater 4 months ago

The question posed is, “With agents, does it become practical to have large codebases that can be read like a narrative, whose prose is kept in sync with changes to the code by tireless machines?”

It's not practical to have codebases that can be read like a narrative, because that's not how we want to read them when we deal with the source code. We jump to definitions, arriving at different pieces of code in different paths, for different reasons, and presuming there is one universal, linear, book-style way to read that code, is frankly just absurd from this perspective. A programming language should be expressive enough to make code read easily, and tools should make it easy to navigate.

I believe my opinion on this matters more than an opinion of an average admirer of LP. By their own admission, they still mostly write code in boring plain text files. I write programs in org-mode all the time. Literally (no pun intended) all my libraries, outside of those written for a day job, are written in Org. I think it's important to note that they are all Lisp libraries, as my workflow might not be as great for something like C. The documentation in my Org files is mostly reduced to examples. I do like docstrings, but I appreciate an exhaustive (or at least a rich enough) set of examples more, and writing them is much easier: I write them naturally as tests while I'm implementing a function. The examples are written in Org blocks, and when I install a library or push an important commit, I run all tests, of which examples are but special cases. The effect is, this part of the documentation is always in sync with the code (of course, some tests fail, and they are marked as such when tests run). I know how to sync this with docstrings, too, if necessary; I haven't: it takes time to implement and I'm not sure the benefit will be that great.

My (limited, so far) experience with LLMs in this setting is nice: a set of pre-written examples provides a good entry point, and an LLM is often capable of producing a very satisfactory solution, immediately testable, of course. The general structure of my Org files with code is also quite strict.

I don't call this “literate programming”, however (I think LP is a mess of mostly wrong ideas); my approach is just a “notebook interface” to a program, inspired by Mathematica Notebooks, popularly (but not in a representative way) imitated by the now-famous Jupyter notebooks. The terminology doesn't matter much: what I'm describing is what the silly.business blogpost is largely about. The author of nbdev is in the comments here; we're basically implementing the same idea.

silly.business mentions tangling, which is a fundamental concept in LP and is a good example of what I dislike about LP: tangling, like several concepts behind LP, is only a thing due to limitations of the programming systems that Donald Knuth was using. When I write Common Lisp in Org, I do not need to tangle, because Common Lisp does not have many of the limitations that apparently influenced the concepts of LP. Likewise, the “reading like a narrative” idea is misguided, for reasons I outlined at the beginning. Lisp is expressive enough to read like prose (or like anything else) to as large a degree as required, and, more generally, to have code organized as non-linearly as required. This argument, however, is irrelevant if we want LLMs, rather than us, to read codebases like a book; but that's a different topic.

melagonster 4 months ago

This is a good opinion. Maybe humans do not really know how to teach this skill of reading code. We do not have a good, exact protocol because people rely on their personal heuristic methods.