This does seem exciting at first glance. Just write the narrative part of literate programming and an LLM generates the code, then keep the narrative and voila! Literate programming without the work of generating both.
However I see two major issues:
Narrative is meant to be consumed linearly. But code is consumed as a graph. We navigate from a symbol to its definition, or from definition to its uses, jumping from place to place in the code to understand it better. The narrative part of literate programming really only works for notebooks where the story being told is dominant and the code serves the story.
Second is that when I use an LLM to write code, the changes I describe usually require modifying several files at once. Where does this “narrative” go relative to the code?
- Natural languages are ambiguous. That's the reason why we created programming languages. So the documentation around the code is generally ambiguous as well. Worse: it's not being executed, so it can get out of date (sometimes in subtle ways).
- LLMs are trained on tons of source code, which is arguably a smaller space than natural languages. My experience is that LLMs are really good at e.g. translating code between two programming languages. But translating my prompts to code is not working as well, because my prompts are in natural languages, and hence ambiguous.
- I wonder if it is a question of "natural languages vs programming languages" or "bad code vs good code". I could totally imagine that documenting bad code helps the LLMs (and the humans) understand the intent, while documenting good code actually adds ambiguity.
What I learned is that we write code for humans to read. Good code is code that clearly expresses the intent. If there is a need to comment the code all over the place, to me it means that the code is maybe not as good as it should be :-).
Of course there is an argument to make that the quality of code is generally getting worse every year, and therefore there is more and more a need for documentation around it because it's getting hard to understand what the hell the author wanted to do.
> If there is a need to comment the code all over the place, to me it means that the code is maybe not as good as it should be :-)
If good code was enough on its own we would read the source instead of documentation. I believe part of good software is good documentation. The prose of literate source is aimed at documentation, not line-level comments about implementation.
I don’t have my LLMs generate literate programming. I do ask them to talk about tradeoffs.
I have full examples of something that is heavily commented and explained, including links to any schemas or docs. I have gotten good results when I ask an LLM to use that as a template, with the caveat that not everything in there needs to be used; it cuts down on hallucinations by quite a bit.
> Natural languages are ambiguous. That's the reason why we created programming languages. So the documentation around the code is generally ambiguous as well. Worse: it's not being executed, so it can get out of date (sometimes in subtle ways).
I loathe this take.
I have rocked up to codebases where there were specific rules banning comments because of this attitude.
Yes, comments can lie, and yes, there are no guards ensuring they stay in lock step with the code they document, but not having them is a thousand times worse - I can always see WHAT code is doing, that's never the problem, the problem is WHY it was done in this manner.
I put comments like "This code runs in O(n) because there are only a handful of items ever going to be searched - update it when there are enough items to justify an O(log2 n) search"
That tells future developers that the author (me) KNOWS it's not the most efficient code possible, but it IS when you take into account things unknown by the person reading it
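A minimal sketch of what such a "why" comment might look like, in Python. The registry contents and the name `find_handler` are hypothetical and only serve the illustration:

```python
from typing import Optional

# Hypothetical registry; in this sketch it only ever holds a handful of entries.
HANDLERS = [("csv", "parse_csv"), ("json", "parse_json"), ("xml", "parse_xml")]


def find_handler(extension: str) -> Optional[str]:
    """Return the handler name registered for a file extension, or None."""
    # WHY: this is deliberately a linear O(n) scan. The registry holds only a
    # handful of items, so a sorted structure with O(log n) lookup would add
    # complexity for no measurable gain. Revisit if the registry grows large.
    for ext, handler in HANDLERS:
        if ext == extension:
            return handler
    return None
```

The code alone shows WHAT happens (a loop over a list); only the comment records that the author considered a faster search and rejected it on purpose.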
Edit: Tribal knowledge is the worst type of knowledge. It's assumed that everyone knows it and passes it along when new people onboard, but the reality (for me) has always been that the people doing the onboarding have had fragments, or incorrect assumptions about what was being conveyed to them, and just like the children's game of "telephone" the passing of the knowledge always ends in a disaster.
"But translating my prompts to code is not working as well, because my prompts are in natural languages, and hence ambiguous."
Not only that, but there's something very annoying and deeply dissatisfying about typing a bunch of text into a thing for which you have no control over how it's producing an output, nor can an output be reproduced even if the input is identical.
Agreed, natural language is very ambiguous and becoming more ambiguous by the day: what exactly does "vibe" mean?
People spoke in a particular way, say 60 years ago, that left very little room for interpretation of what they meant. The same cannot be said today.
> People spoke in a particular way, say 60 years ago, that left very little room for interpretation of what they meant. The same cannot be said today.
Surely you don’t mean everyone in the 1960s spoke directly, free of metaphor or euphemism or nuance or doublespeak or dog whistle or any other kind of ambiguity? Then why are there people who dedicate their entire life to interpreting religious texts and the Constitution?
Not sure if the author knows about CUE, here's the HN post from early this year on literate programming with CUE [1].
CUE is based on value-latticed logic that's LLM's NLP cousin but deterministic rather than stochastic [2].
LLMs are notoriously prone to generating syntactically valid but semantically broken configurations thus it should be used with CUE for improving literate programming for configs and guardrailing [3].
[1] CUE Does It All, But Can It Literate? (22 comments)
I think a lighter version of literate programming, coupled with languages that have a small API surface but are heavy on convention, is going to thrive in this age of agentic programming.
A lighter API footprint probably also means a higher amount of boilerplate code, but these models love cranking out boilerplate.
I’ve been doing a lot more Go instead of dynamic languages like Python or TypeScript these days. Mostly because if agents are writing the program, they might as well write it in a language that’s fast enough. Fast compilation means agents can quickly iterate on a design, execute it, and loop back.
The Go ecosystem is heavy on style guides, design patterns, and canonical ways of doing things. Mostly because the language doesn’t prevent obvious footguns like nil pointer errors, subtle race conditions in concurrent code, or context cancellation issues. So people rely heavily on patterns, and agents are quite good at picking those up.
My version of literate programming is ensuring that each package has enough top-level docs and that all public APIs have good docstrings. I also point agents to read the Google Go style guide [1] each time before working on my codebase. This yields surprisingly good results most of the time.
I agree with this. I've been a fan of literate programming for a long time, I just think it is a really nice mode of development, but since its inception it hasn't lived up to its promise because the tooling around the concept is lacking. Two of the biggest issues have been 1) having to learn a whole new toolchain outside of the compiler to generate the documents 2) the prose and code can "drift" meaning as the codebase evolves, what's described by the code isn't expressed by the prose and vice versa. Better languages and tooling design can solve the first problem, but I think AI potentially solves the second.
It's a literate coding tool that is co-designed with the host language Mech, so the prose can co-exist in the program AST. The plan is to make the whole document queryable and available at runtime.
As a live coding environment, you would co-write the program with AI, and it would have access to your whole document tree, as well as live type information and values (even intermediate ones) for your whole program. This rich context should help it make better decisions about the code it writes, hopefully leading to better synthesized programs.
You could send the AI a prompt, then it could generate the code using live type information; execute it live within the context of your program in a safe environment to make sure it type checks, runs, and produces the expected values; and then you can integrate it into your codebase with a reference to the AI conversation that generated it, which itself is a valid Mechdown document.
That's the current work anyway -- the basis of this is the literate programming environment, which is already done.
We actually have had literate programming for a while, it just doesn’t look exactly how it was envisioned: Nowadays, it’s common for many libraries to have extensive documentation, including hyperlinks and testable examples directly inline in the form of comments. There’s usually a well defined convention for these comments to be converted into HTML and some of them link directly back to the relevant source code.
This isn’t to say they’re exactly what is meant by literate programming, but I gotta say we’re pretty damn close. Probably not much more than a pull request away for your preferred languages’ blessed documentation generator in fact.
(The two examples I’m using to draw my conclusions are Rust and Go).
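For readers who have not seen inline testable examples, here is a small sketch of the same idea using Python's standard `doctest` module. This is a stand-in chosen for brevity, not the Rust or Go tooling the comment refers to, and the `mean` function is made up for illustration:

```python
import doctest


def mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers.

    The examples below are documentation and tests at the same time:

    >>> mean([1, 2, 3, 4])
    2.5
    >>> mean([10])
    10.0
    """
    if not values:
        raise ValueError("no values supplied")
    return sum(values) / len(values)


if __name__ == "__main__":
    # Runs every `>>>` example found in this module's docstrings and
    # reports any whose actual output differs from the documented output.
    print(doctest.testmod())
```

If someone changes `mean` without updating the examples, the documented output stops matching and the run reports a failure, which is the property that keeps this kind of prose from drifting.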
I think that's exactly what is meant, and it's a great example. The two places where literate programming has shined most are 1) documentation because it's a natural fit there and you can get away with having little programs rather than focusing on a book-length narrative as Knuth had originally purposed it for. But also 2) notebook programming environments especially Jupyter and Org mode. I think programs structured in these notebooks really are perfectly situated for LLM analysis and extension, which is where the opportunity lies today.
I have noticed a trend recently that some practices (writing a decent README or architecture, being precise and unambiguous with language, providing context, literate programming) that were meant to help humans were not broadly adopted with the argument that it's too much effort. But when done to help an LLM instead of a human a lot of people suddenly seem to be a lot more motivated to put in the effort.
In my years of programming, I find that humans rarely give documentation more than a cursory glance up until they have specific questions. Then they ask another person if one is available rather than read for the answer.
The biggest problem is that humans don't need the documentation until they do. I recall one project that extensively used docblock style comments. You could open any file in the project and find at least one error, either in the natural language or the annotations.
If the LLM actually uses the documentation in every task it performs- or if it isn't capable of adequate output without it- then that's a far better motivation to document than we actually ever had for day to day work.
I have discovered that the measure of good documentation is not whether your team writes documentation, but is instead determined by whether they read it.
Paraphrasing an observation I stole many years ago:
A bunch of us thought learning to talk to computers would get us out of learning to talk to humans, and so we spent 4 of the most important years of emotional growth engaging in that, only to graduate and discover we were even farther behind everyone else in that area.
Documentation rots a lot more quickly than the code - it doesn't need to be correct for the code to work. You are usually better off ignoring the comments (even more so the design document) and going straight to the code.
I maintain you’re either grossly misappropriating the time and energy of new and junior devs if this is the case on your project, or you have gone too long since hiring a new dev and your project is stagnating because of it.
New eyes don’t have the curse of knowledge. They don’t filter out the bullshit bits. And one of the advantages of creating reusable modules is you get more new eyes on your code regularly.
This may also be a place where AI can help. Some of the review tools are already calling us out on making the code not match the documentation.
I've had LLMs proactively fix my inline documentation. Rather pleasant surprise: "I noticed the comment is out of date and does not reflect the actual implementation" even asking me if it should fix it.
Considering LLMs are models of language, investing in the clarity of the written word pays off in spades.
I don't know whether "literate programming" per se is required. Good names, docstrings, type signatures, strategic comments re: "why", a good README, and thoughtfully-designed abstractions are enough to establish a solid pattern.
Going full "literate programming" may not be necessary. I'd maybe reframe it as a focus on communication. Notebooks, examples, scripts and such can go a long way to reinforcing the patterns.
Ultimately that's what it's about: establishing patterns for both your human readers and your LLMs to follow.
Yeah, I think what is needed is somewhere between docstrings+strategic comments, and literate programming.
Basically, it's incredibly helpful to document the higher-level structure of the code, almost like extensive docstrings at the file level and subdirectory level and project level.
The problem is that major architectural concepts and decisions are often cross-cutting across files and directories, so those aren't always the right places. And there's also the question of what properly belongs in code files, vs. what belongs in design documents, and how to ensure they are kept in sync.
The question being - are LLMs 'good' at interpreting and making choices/decisions about data structures and relationships?
I do not write code for a living but I studied comp sci. My impression was always that the good software engineers did not worry about the code, not nearly as much as the data structures and so on.
The only use of code is to process data, aka information. And any knowledge worker knows that success in processing information relies mostly on how it's organized (try operating a library without an index).
Most of the time is spent researching what data is available and learning what data should be returned after the processing. Then you spend a bit of brain power to connect the two. The code is always trivial. I don't remember ever discussing code in the workplace since I started my career. It was always about plans (hypotheses), information (data inquiry), and specifications (especially when collaborating).
If the code is worrying you, it would be better to buy a book on whatever technology you're using and refresh your knowledge. I keep bookmarks in my web browser and have a few books on my shelf that I occasionally page through.
Nearly all my coding for the last decade or so has used literate programming. I built nbdev, which has let me write, document, and test my software using notebooks. Over the last couple of years we integrated LLMs with notebooks and nbdev to create Solveit, which everyone at our company uses for nearly all our work (even our lawyers, HR, etc).
It turns out literate programming is useful for a lot more than just programming!
Interesting and semi-related idea: use LLMs to flag when comments/docs have come out of sync with the code.
The big problem with documentation is that if it was accurate when it was written, it's just a matter of time before it goes stale compared to the code it's documenting. And while compilers can tell you if your types and your implementation have come out of sync, before now there's been nothing automated that can check whether your comments are still telling the truth.
I once had a mad idea of creating an automated documentation-driven paradigm where every directory/module/class/function has to have a DocString/JSDoc, with the higher level ones (directory/module) essentially being the documentation of features and architecture.
A ticket starts by someone opening a PR with suggested changes to the docs, the idea being that a non-technical person like a PM or tester could do it. The PR then passes to a dev who changes the code to match the doc changes.
Before merging, the tool shows the doc next to every modified piece of code and the reviewer must explicitly check a box to say it's still valid.
And docstrings would be able to link to other docstrings, so you could find out what other bits of code are connected to what you're working on (as that link doesn't always exist in code, e.g. across APIs) and read their docs to find the larger context and gotchas.
Thanks for the pointer. That looks more to me like it's totally synthesizing the docs for me. I can see someone somewhere wanting that. I would want a UX more like a compiler warning. "Comment on line 447 may no longer be accurate." And then I go fix it my own dang self.
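A compiler-warning-style check of that kind could be sketched as below. This is a deterministic heuristic, not an LLM: it only catches one narrow kind of drift, and it assumes docstrings use Sphinx-style `:param name:` fields. The function name `stale_param_warnings` and the sample source are hypothetical:

```python
import ast
import re

# Assumption: parameters are documented with Sphinx-style ":param name:" fields.
PARAM_RE = re.compile(r":param\s+(\w+):")


def stale_param_warnings(source: str, filename: str = "<source>") -> list[str]:
    """Flag docstrings that document parameters the function no longer has."""
    warnings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        doc = ast.get_docstring(node) or ""
        args = node.args
        # Collect every parameter name that actually appears in the signature.
        actual = {a.arg for a in args.posonlyargs + args.args + args.kwonlyargs}
        if args.vararg:
            actual.add(args.vararg.arg)
        if args.kwarg:
            actual.add(args.kwarg.arg)
        for name in PARAM_RE.findall(doc):
            if name not in actual:
                warnings.append(
                    f"{filename}:{node.lineno}: docstring of '{node.name}' "
                    f"mentions parameter '{name}' that is not in the signature; "
                    "comment may no longer be accurate"
                )
    return warnings


# Sample input: the signature was changed from `size` to `width, height`,
# but the docstring was left behind.
EXAMPLE = '''
def resize(image, width, height):
    """Resize an image.

    :param image: the image to resize
    :param size: target (width, height) tuple
    """
    return image
'''

for warning in stale_param_warnings(EXAMPLE, "example.py"):
    print(warning)
```

The tool only points at the suspicious docstring; rewriting it stays with the human, which matches the "go fix it my own dang self" workflow.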
It has always been possible to program literately in programming languages - not to the extent that you can in WEB, but good code can read like a story and obviate comments.
Test code and production code in a symmetrical pair has lots of benefits. It’s a bit like double entry accounting - you can view the code’s behavior through a lens of the code itself, or the code that proves it does what it seems to do.
You can change the code by changing either tests or production code, and letting the other follow.
Code reviews are a breeze because if you’re confused by the production code, the test code often holds an explanation - and vice versa. So just switch from one to the other as needed.
Lots of benefits. The downside is how much extra code you end up with of course - up to you if the gains in readability make up for it.
I have instructed my LLMs to provide at least one comment per function, and I additionally prompt them to comment when they take something out and to explain why they opted for a particular solution. DistroTube also loves the declarative literate programming approach, often citing how his one-document configuration with nix configures his whole system.
I don't know Org, but Rakudoc https://docs.raku.org/language/pod is useful for literate programming (put the docs in the code source) and for LLMs (the code is "self documenting" so that in the LLM inversion of control, the LLM can determine how to call the code).
https://podlite.org does this in a language-neutral way (Perl, JS/TS and Raku for now).
Here's an example:
#!/usr/bin/env raku

=begin pod

=head1 NAME

Stats::Simple - Simple statistical utilities written in Raku

=head1 SYNOPSIS

    use Stats::Simple;

    my @numbers = 10, 20, 30, 40;

    say mean(@numbers);    # 25
    say median(@numbers);  # 25

=head1 DESCRIPTION

This module provides a few simple statistical helper functions
such as mean and median. It is meant as a small example showing
how Rakudoc documentation can be embedded directly inside Raku
source code.

=end pod

unit module Stats::Simple;

=begin pod

=head2 mean

    mean(@values --> Numeric)

Returns the arithmetic mean (average) of a list of numeric values.

=head3 Parameters

=over 4

=item @values

A list of numeric values.

=back

=head3 Example

    say mean(1, 2, 3, 4);  # 2.5

=end pod

sub mean(*@values --> Numeric) is export {
    die "No values supplied" if @values.elems == 0;
    @values.sum / @values.elems;
}

=begin pod

=head2 median

    median(@values --> Numeric)

Returns the median value of a list of numbers.
If the list length is even, the function returns the mean of
the two middle values.

=head3 Example

    say median(1, 5, 3);     # 3
    say median(1, 2, 3, 4);  # 2.5

=end pod

sub median(*@values --> Numeric) is export {
    die "No values supplied" if @values.elems == 0;
    my @sorted = @values.sort;
    my $n = @sorted.elems;
    return @sorted[$n div 2] if $n % 2;
    (@sorted[$n/2 - 1] + @sorted[$n/2]) / 2;
}

=begin pod

=head1 AUTHOR

Example written to demonstrate Rakudoc usage.

=head1 LICENSE

Public domain / example code.

=end pod
I've had the same thought, maybe more grandiosely. The idea is that LLM prompts are code -- after all they are text that gets 'compiled' (by the LLM) into a lower-level language (the actual code). The compile process is more involved because it might involve some back-and-forth, but on the other hand it is much higher level. The goal is to have a web of prompts become the source of truth for the software: sort of like the flowchart that describes the codebase 'is' the codebase.
No it doesn’t get compiled. Compilation is a translation from one formal language to another that can be rigorously modeled and is generally reproducible.
Translating from a natural language spec to code involves a truly massive amount of decision making because it’s ambiguous. For a non trivial program, 2 implementations of the same natural language spec will have thousands of observable differences.
Where we are today, agents still require guardrails to keep from spinning out. There is no way to let agents work on code autonomously, or to constantly recompile specs, without all of those observable differences constantly shifting, resulting in unusable software.
Tests can’t prevent this because for a test suite to cover all observable behavior, it would need to be more complex than the code. In which case, it wouldn’t be any easier for machine or human to understand.
The only solution to this problem is that LLMs get better.
Personally I think at the point they can pull this off, they can do any white collar job, and there’s no point in planning for that future because it results in either Mad Max or Star Trek.
well you have to expand your definition of "compile" a bit. There is clearly a similarity, whether or not you want to call it the same word. Maybe it needs a neologism akin to 'transpiled'.
other than that you seem to be arguing against someone other than me. I certainly agree that agents / existing options would be chaotic hell to use this way. But I think the high-level idea has some potential, independent of that.
One problem with this is that there isn't really a "current prompt" that completely describes the current source code; each source file is accompanied by a full chat log, including false starts and misunderstandings. It's sort of like reading a git history instead of the actual file.
true, but that just means that's the problem to solve. probably the ideal architecture isn't possible right now. But I sorta imagine that you could later on take the full transcript of that conversation and expect any LLM to implement more or less the same thing based on it, so that eventually it becomes a full 'spec'.
And maybe there is a way to trim the parts out of it that are not needed... like to automatically produce an initial prompt which is equivalent to the results of a longer session, but is precise enough so as to not need clarification upon reprocessing it. Something like that? I'm not sure if that's something that already exists.
I think full literate programming is overkill but I've been doing a lighter version of this:
- Module level comments with explanations of the purpose of the module and how it fits into the whole codebase.
- Document all methods, constants, and variables, public and private. A single terse sentence is enough, no need to go crazy.
- Document each block of code. Again, a single sentence is enough. The goal is to be able to know what that block does in plain English without having to "read" code. Reading code is a misnomer because it is a different ability from reading human language.
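The three levels above might look like this in a small Python module. The checkout domain, the names, and the tax rate are all invented for the sketch:

```python
"""Order totals for a (hypothetical) checkout service.

This module only computes prices; persistence and payment live elsewhere.
"""

from decimal import Decimal

# Flat sales tax rate applied to every order in this sketch.
TAX_RATE = Decimal("0.08")


def order_total(prices: list[Decimal], discount: Decimal = Decimal("0")) -> Decimal:
    """Return the taxed total for a list of item prices after a flat discount."""
    # Sum the item prices and apply the discount, never going below zero.
    subtotal = max(sum(prices, Decimal("0")) - discount, Decimal("0"))

    # Add tax and round to whole cents.
    return (subtotal * (1 + TAX_RATE)).quantize(Decimal("0.01"))
```

Each comment is one terse sentence, so a reader (human or LLM) can skim the plain-English lines and only drop into the expressions when something looks off.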
For me this is where a config layer shines. Develop a decent framework and then let the agents spin out the configuration.
This allows a trusted and tested abstraction layer that does not shift and makes maintenance easier, while making the code that the agents generate easier to review, and it also uses far fewer tokens.
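One way to picture that split, as a rough sketch: the framework side is a small, tested loader that rejects anything it does not understand, and the agent only writes the data passed to it. The step names, the `strength` field, and `load_pipeline` are all hypothetical:

```python
from dataclasses import dataclass

# Hypothetical set of processing steps the trusted framework implements.
ALLOWED_STEPS = {"resize", "sharpen", "denoise"}


@dataclass(frozen=True)
class StepConfig:
    """One validated pipeline step."""
    name: str
    strength: float


def load_pipeline(raw: list[dict]) -> list[StepConfig]:
    """Validate agent-written config and turn it into typed step objects."""
    steps = []
    for index, entry in enumerate(raw):
        # Reject keys the framework does not know, so typos fail loudly.
        unknown = set(entry) - {"name", "strength"}
        if unknown:
            raise ValueError(f"step {index}: unknown keys {sorted(unknown)}")
        name = entry.get("name")
        if name not in ALLOWED_STEPS:
            raise ValueError(f"step {index}: unknown step {name!r}")
        # Strength defaults to 1.0 and must stay within [0, 1].
        strength = float(entry.get("strength", 1.0))
        if not 0.0 <= strength <= 1.0:
            raise ValueError(f"step {index}: strength must be in [0, 1]")
        steps.append(StepConfig(name, strength))
    return steps


# What an agent would generate: plain data, short to write and to review.
pipeline = load_pipeline([
    {"name": "denoise", "strength": 0.3},
    {"name": "resize"},
])
print(pipeline)
```

The reviewer reads a few lines of data instead of new control flow, and a bad value is caught by the loader rather than by a downstream failure.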
I work with a project that is heavily configuration-driven. It seems promising, but in reality:
- Configuration is massively duplicated, across repositories
- No one is willing to rip out redundancy, because comprehensive testing is not practical
- In order to understand the configuration, you have to read lots of code, again across multiple repositories (this in particular is a problem for LLM assistance, at least the way we currently use it)
I love the idea, but in practice it’s currently a nightmare. I think if we took a week we could clean things up a fair bit, but we don’t have a week (at least as far as management is concerned), and again, without full functional testing, it’s difficult to know when you’ve accidentally broken someone else’s subsystem
Now that I've returned to working on the project tonight, I just remembered another failing of our code. (I'm not in any way claiming these are universal problems, just that they are something to be wary of.)
Naming is so incredibly important. The wrong name for a configuration key can have cascading impacts, especially when there is "magic" involved, like stripping out or adding common prefixes to configuration values.
We have a concept called a "domain" which is treated as a magic value everywhere, such as adding a prefix or suffix. But domain isn't well-defined, and in different contexts it is used different ways, and figuring out what the impact is of choosing a domain string is typically a matter of trial and error.
I fully agree. (Seeing how good Figment2 is for layered config in Rust is wildly eye opening, has been a revelatory experience.)
Sometimes what we manage with config is itself processing pipelines. A tool like darktable has a series of processing steps that are run. Each of those has config, but the outer layer is itself a config of those inner configs. And the outer layer is a programmable pipeline; it's not that far apart from thinking of each user coming in and building their own http handler pipeline, making their own bespoke computational flow.
I guess my point is that computation itself is configuration. XSLT probably came closest to that sun. But we see similar lessons everywhere we look.
I explored this in std::slop (my clanker) https://github.com/hsaliak/std_slop. One of the differentiating features of this clanker is that it only has a single tool call, run_js.
The LLM produces js scripts to do its work. Naturally, I tried to teach it to add comments for these scripts and incorporate literate programming elements. This was interesting because every tool call now 'hydrated' some free form thinking, but it comes at output token cost.
Output Tokens are expensive! In GPT-5.4 it's ~180 dollars per Million tokens!
I've settled for brief descriptions that communicate 'why' as a result. The code is documentation after all.
Something in this realm covers my practice. I just keep a master prompt for the whole program, and sparsely documented code. When it's time to use LLMs in the dev process, they always get a copy of both and it makes the whole process like 10x as coherent and continuous. Obvi when a change is made that deviates or greatly expands on the spec, I update the spec.
I do something similar with quality gates. I have a bunch of markdown files at the ready to point agents to for various purposes. It lets me leverage LLMs at any stage of the dev process and my clients get docs in their format without much maintenance from myself. As you said once you get it down it becomes a very coherent process that can be iterated on in its own right.
I am currently fighting the recursive improvement loop part of working with agents.
Anecdotally, Claude Opus is at least okay at literate emacs. Sometimes takes a few rounds to fix its own syntax errors, but it gets the idea. Requiring it to TDD its way in with Buttercup helps.
Take it to the logical conclusion. Track the intended behavior in a proper issue tracking software like Jira. Reference the ticket in your version control system.
Boring and reliable, I know.
If you need guides to the code base beyond what the programming language provides, just write a directory level readme.md where necessary.
I think the externality of issue tracking systems like Jira (or even GitHub) causes friction. Literate programming has everything in one place.
I’d like to have a good issue tracking system inside git. I think the SQLite version management system has this functionality but I never used it.
One thing to solve is that different kinds of users need to interact with it in different kinds of ways. Non-programmers can use Jira, for example. Issues are often treated as mutable text boxes rather than versioned specification (and git is designed for the latter). It’s tricky!
it could be fun to make a toy compiler that takes an arbitrary literate prompt as input and uses an LLM to output a machine code executable (no intermediate structured language). could call it llmllvm. perhaps it would be tremendously dangerous
> This is especially important if the primary role of engineers is shifting from writing to reading.
This was always the primary role. The only people who ever said it was about writing just wanted an easy sales pitch aimed at everyone else.
Literate programming failed to take off because with that much prose it inevitably misrepresents the actual code. Most normal comments are bad enough.
It's hard to maintain any writing that doesn't actually change the result. You can't "test" comments. The author doesn't even need to know why the code works to write comments that are convincing at first glance. If we want to read lies influenced by office politics, we already have the rest of the docs.
Now that would be really interesting: prompt an LLM to find comments that misrepresent the code! I wonder how many false positives that would bring up?
I have a Claude Code skill for adding, deleting and improving comments. It does a decent job at detecting when comments are out of date with the code and updating them. It's not perfect, but it's something.
I don't buy that. Writing is taking a bad rap from all this. Writing _is_ a form of more intense reading. Reading on steroids, as they say. If reading is considered good, writing should be considered better.
Writing in that draft style is really only useful because a) you read the results and b) you write an improved version at the end. Drafting forever is not considered "better" because someone (usually you) has to sift through the crap to find the good parts.
This is especially pronounced in the programming workplace, where the most "senior" programmers are asked to stop programming so they can review PRs.
You're right that you can't test comments, but you can test the code they describe. That's what reproducibility bundles do in scientific computing: the prose says "we filtered variants with MAF < 0.01", and the bundle includes the exact shell command, environment, and checksums so anyone can verify the prose matches reality. The prose becomes a testable claim rather than a decorative comment. That said, I agree the failure mode of literate programming is prose that drifts from code. The question is whether agents reduce that drift enough to change the calculus.
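A toy version of that idea, as a sketch under stated assumptions: instead of a shell command and environment, the bundle here records a threshold and a checksum for a pure Python step, and "filtered" is read as "removed". The bundle layout and all names are hypothetical:

```python
import hashlib
import json


def keep_common_variants(variants: list[dict], min_maf: float = 0.01) -> list[dict]:
    """Drop variants whose minor allele frequency (MAF) is below min_maf."""
    return [v for v in variants if v["maf"] >= min_maf]


def checksum(records: list[dict]) -> str:
    """Return a SHA-256 digest of the records in a canonical JSON form."""
    canonical = json.dumps(records, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def verify_claim(bundle: dict) -> bool:
    """Re-run the recorded step and compare against the recorded checksum."""
    result = keep_common_variants(bundle["input"], bundle["min_maf"])
    return checksum(result) == bundle["expected_sha256"]


# Authoring time: the prose claim is stored next to what is needed to check it.
variants = [{"id": "rs1", "maf": 0.20}, {"id": "rs2", "maf": 0.004}]
bundle = {
    "prose": "we filtered variants with MAF < 0.01",
    "input": variants,
    "min_maf": 0.01,
    "expected_sha256": checksum(keep_common_variants(variants)),
}

# Review time: anyone can re-run the step and see whether the claim holds.
print(verify_claim(bundle))
```

If the threshold or the data is later changed without updating the recorded checksum, `verify_claim` returns `False`, so the prose and the result can no longer silently disagree.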
I would say this expresses the intent, no need for a comment saying "check if the number is even".
Most of the code I read (at work) is not documented, still I understand the intent. In open source projects, I used to go read the source code because the documentation is nonexistent or out-of-date. To the point where now I actually go directly to the source code, because if the code is well written, I can actually understand it.
There is data showing that context files which explain the code reduce the performance of the models, so it is not obvious that literate programming is better; without data, this article is useless.
One of the things I love most about WebMCP is the idea that it's an MCP session that exists on the page, which the user already knows.
Most of these LLM things are kind of separate systems, with their own UI. The idea of agency being inlaid into existing systems the user knows like this, with immediate bidirectional feedback as the user and LLM work the page, is incredibly incredibly compelling to me.
The question posed is, “With agents, does it become practical to have large codebases that can be read like a narrative, whose prose is kept in sync with changes to the code by tireless machines?”
It's not practical to have codebases that can be read like a narrative, because that's not how we want to read them when we deal with the source code. We jump to definitions, arriving at different pieces of code in different paths, for different reasons, and presuming there is one universal, linear, book-style way to read that code, is frankly just absurd from this perspective. A programming language should be expressive enough to make code read easily, and tools should make it easy to navigate.
I believe my opinion on this matters more than an opinion of an average admirer of LP. By their own admission, they still mostly write code in boring plain text files. I write programs in org-mode all the time. Literally (no pun intended) all my libraries, outside of those written for a day job, are written in Org. I think it's important to note that they are all Lisp libraries, as my workflow might not be as great for something like C. The documentation in my Org files is mostly reduced to examples: I do like docstrings, but I appreciate an exhaustive (or at least a rich enough) set of examples more, and writing them is much easier, since I write them naturally as tests while I'm implementing a function. The examples are written in Org blocks, and when I install a library or push an important commit, I run all tests, of which examples are but special cases. The effect is that this part of the documentation is always in sync with the code (of course, some tests fail, and they are marked as such when tests run). I know how to sync this with docstrings, too, if necessary; I haven't, because it takes time to implement and I'm not sure the benefit will be that great.
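The workflow above is Lisp in Org blocks. A rough analogue of "examples that are also tests", sketched in Python with the standard `doctest` module (the `chunk` function is a made-up example, not from any of the libraries mentioned):

```python
import doctest


def chunk(items, size):
    """Split items into consecutive lists of at most `size` elements.

    >>> chunk([1, 2, 3, 4, 5], 2)
    [[1, 2], [3, 4], [5]]
    >>> chunk([], 3)
    []
    """
    return [items[i:i + size] for i in range(0, len(items), size)]


def run_examples(func):
    """Run the docstring examples of one function; return (failed, attempted)."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    # module=False skips module lookup, so the examples only see the globs given here.
    for test in finder.find(func, name=func.__name__, module=False,
                            globs={func.__name__: func}):
        runner.run(test)
    return runner.summarize(verbose=False)
```

Calling `run_examples(chunk)` on every install or important commit keeps this part of the documentation from silently drifting: an example that no longer matches the code shows up as a failure.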
My (limited, so far) experience with LLMs in this setting is nice: a set of pre-written examples provides a good entry point, and an LLM is often capable of producing a very satisfactory solution, immediately testable, of course. The general structure of my Org files with code is also quite strict.
I don't call this “literate programming”, however (I think LP is a mess of mostly wrong ideas); my approach is just a “notebook interface” to a program, inspired by Mathematica Notebooks, popularly (but not in a representative way) imitated by the now-famous Jupyter notebooks. The terminology doesn't matter much: what I'm describing is what the silly.business blogpost is largely about. The author of nbdev is in the comments here; we're basically implementing the same idea.
silly.business mentions tangling, which is a fundamental concept in LP and a good example of what I dislike about LP: tangling, like several concepts behind LP, is only a thing due to limitations of the programming systems that Donald Knuth was using. When I write Common Lisp in Org, I do not need to tangle, because Common Lisp does not have many of the limitations that apparently influenced the concepts of LP. The “reading like a narrative” idea is similarly misguided, for reasons I outlined in the beginning. Lisp is expressive enough to read like prose (or like anything else) to as large a degree as required, and, more generally, to have code organized as non-linearly as required. This argument, however, is irrelevant if we want LLMs, rather than us, to read codebases like a book; but that's a different topic.
This is a good opinion. Maybe humans do not really know how to teach this skill of reading code. We do not have a good, exact protocol because people rely on their personal heuristic methods.
This does seem exciting at first glance. Just write the narrative part of literate programming and an LLM generates the code, then keep the narrative and voila! Literate programming without the work of generating both.
However I see two major issues:
Narrative is meant to be consumed linearly. But code is consumed as a graph. We navigate from a symbol to its definition, or from definition to its uses, jumping from place to place in the code to understand it better. The narrative part of literate programming really only works for notebooks where the story being told is dominant and the code serves the story.
Second is that when I use an LLM to write code, the changes I describe usually require modifying several files at once. Where does this “narrative” go relative to the code.
And yes, these two issues are closely related.
I am not convinced.
- Natural languages are ambiguous. That's the reason why we created programming languages. So the documentation around the code is generally ambiguous as well. Worse: it's not being executed, so it can get out of date (sometimes in subtle ways).
- LLMs are trained on tons of source code, which is arguably a smaller space than natural languages. My experience is that LLMs are really good at e.g. translating code between two programming languages. But translating my prompts to code is not working as well, because my prompts are in natural languages, and hence ambiguous.
- I wonder if it is a question of "natural languages vs programming languages" or "bad code vs good code". I could totally imagine that documenting bad code helps the LLMs (and the humans) understand the intent, while documenting good code actually adds ambiguity.
What I learned is that we write code for humans to read. Good code is code that clearly expresses the intent. If there is a need to comment the code all over the place, to me it means that the code is maybe not as good as it should be :-).
Of course there is an argument to make that the quality of code is generally getting worse every year, and therefore there is more and more a need for documentation around it because it's getting hard to understand what the hell the author wanted to do.
> If there is a need to comment the code all over the place, to me it means that the code is maybe not as good as it should be :-)
If good code was enough on its own we would read the source instead of documentation. I believe part of good software is good documentation. The prose of literate source is aimed at documentation, not line-level comments about implementation.
https://diataxis.fr/
(originally developed at: https://docs.divio.com/documentation-system/) --- divides documentation along two axes:
- Action (Practical) vs. Cognition (Theoretical)
- Acquisition (Studying) vs. Application (Working)
which for my current project has resulted in:
- readme.md --- (Overview) Explanation (understanding-oriented)
- Templates (small source snippets) --- Tutorials (learning-oriented)
- Literate Source (pdf) --- How-to Guides (problem-oriented)
- Index (of the above pdf) --- Reference (information-oriented)
I don’t have my LLMs generate literate programming. I do ask it to talk about tradeoffs.
I have full examples of something that is heavily commented and explained, including links to any schemas or docs. I have gotten good results when I ask an LLM to use that as a template, that not everything in there needs to be used, and it cuts down on hallucinations by quite a bit.
> Natural languages are ambiguous. That's the reason why we created programming languages. So the documentation around the code is generally ambiguous as well. Worse: it's not being executed, so it can get out of date (sometimes in subtle ways).
I loathe this take.
I have rocked up to codebases where there were specific rules banning comments because of this attitude.
Yes, comments can lie; yes, there are no guards ensuring they stay in lock step with the code they document; but not having them is a thousand times worse. I can always see WHAT code is doing, that's never the problem; the problem is WHY it was done in this manner.
I put comments like "This code runs in O(n) because there are only a handful of items ever going to be searched - update it when there are enough items to justify an O(log2 n) search"
That tells future developers that the author (me) KNOWS it's not the most efficient code possible, but it IS when you take into account things unknown by the person reading it
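A small sketch of what such a "why" comment can look like in place. The registry and its contents are hypothetical, chosen only to show the comment style:

```python
# Hypothetical registry; the names and the size are illustrative assumptions.
ENABLED_PLUGINS = ["auth", "cache", "metrics", "tracing"]


def is_enabled(plugin: str) -> bool:
    """Return True if the named plugin is in the enabled list."""
    # WHY: deliberately a linear O(n) scan. The list only ever holds a handful
    # of entries, so a sorted structure with O(log n) lookup would add code
    # without a measurable gain. Revisit if the list grows to hundreds of items.
    for name in ENABLED_PLUGINS:
        if name == plugin:
            return True
    return False
```

The code alone says what happens; only the comment records that the simple approach was a choice and names the condition under which it stops being the right one.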
Edit: Tribal knowledge is the worst type of knowledge. It's assumed that everyone knows it and passes it along when new people onboard, but the reality (for me) has always been that the people doing the onboarding have had fragments of, or incorrect assumptions about, what was conveyed to them, and just like the children's game of "telephone", the passing of the knowledge always ends in a disaster.
"But translating my prompts to code is not working as well, because my prompts are in natural languages, and hence ambiguous."
Not only that, but there's something very annoying and deeply dissatisfying about typing a bunch of text into a thing for which you have no control over how its producing an output, nor can an output be reproduced even if the input is identical.
Agreed, natural language is very ambiguous and becoming more ambiguous by the day: what exactly does 'vibe' mean?
People spoke in a particular way, say 60 years ago, that left very little room for interpretation of what they meant. The same cannot be said today.
> People spoke in a particular way, say 60 years ago, that left very little room for interpretation of what they meant. The same cannot be said today.
Surely you don’t mean everyone in the 1960s spoke directly, free of metaphor or euphemism or nuance or doublespeak or dog whistle or any other kind of ambiguity? Then why are there people who dedicate their entire life to interpreting religious texts and the Constitution?
Compared with today, on average, they did.
There's a generation of people that 'typ lyk dis'.
So yes.
Not sure if the author knows about CUE; here's the HN post from earlier this year on literate programming with CUE [1].
CUE is based on value-lattice logic, a cousin of LLM-style NLP that is deterministic rather than stochastic [2].
LLMs are notoriously prone to generating syntactically valid but semantically broken configurations, so they should be paired with CUE to improve literate programming for configs and to add guardrails [3].
[1] CUE Does It All, But Can It Literate? (22 comments)
https://news.ycombinator.com/item?id=46588607
[2] The Logic of CUE:
https://cuelang.org/docs/concept/the-logic-of-cue/
[3] Guardrailing Intuition: Towards Reliable AI:
https://cue.dev/blog/guardrailing-intuition-towards-reliable...
I think a lighter version of literate programming, coupled with languages that have a small API surface but are heavy on convention, is going to thrive in this age of agentic programming.
A lighter API footprint probably also means a higher amount of boilerplate code, but these models love cranking out boilerplate.
I’ve been doing a lot more Go instead of dynamic languages like Python or TypeScript these days. Mostly because if agents are writing the program, they might as well write it in a language that’s fast enough. Fast compilation means agents can quickly iterate on a design, execute it, and loop back.
The Go ecosystem is heavy on style guides, design patterns, and canonical ways of doing things. Mostly because the language doesn’t prevent obvious footguns like nil pointer errors, subtle race conditions in concurrent code, or context cancellation issues. So people rely heavily on patterns, and agents are quite good at picking those up.
My version of literate programming is ensuring that each package has enough top-level docs and that all public APIs have good docstrings. I also point agents to read the Google Go style guide [1] each time before working on my codebase. This yields surprisingly good results most of the time.
[1] https://google.github.io/styleguide/go/
I agree with this. I've been a fan of literate programming for a long time, I just think it is a really nice mode of development, but since its inception it hasn't lived up to its promise because the tooling around the concept is lacking. Two of the biggest issues have been 1) having to learn a whole new toolchain outside of the compiler to generate the documents 2) the prose and code can "drift" meaning as the codebase evolves, what's described by the code isn't expressed by the prose and vice versa. Better languages and tooling design can solve the first problem, but I think AI potentially solves the second.
Here's the current version of my literate programming ideas, Mechdown: https://mech-lang.org/post/2025-11-12-mechdown/
It's a literate coding tool that is co-designed with the host language Mech, so the prose can co-exist in the program AST. The plan is to make the whole document queryable and available at runtime.
As a live coding environment, you would co-write the program with AI, and it would have access to your whole document tree, as well as live type information and values (even intermediate ones) for your whole program. This rich context should help it make better decisions about the code it writes, hopefully leading to better synthesized programs.
You could send the AI a prompt, then it could generate the code using live type information; execute it live within the context of your program in a safe environment to make sure it type checks, runs, and produces the expected values; and then you can integrate it into your codebase with a reference to the AI conversation that generated it, which itself is a valid Mechdown document.
That's the current work anyway -- the basis of this is the literate programming environment, which is already done.
The docs show off some more examples of the code, which I anticipate will be mostly written by AIs in the future: https://docs.mech-lang.org/getting-started/introduction.html
We actually have had literate programming for a while; it just doesn’t look exactly how it was envisioned. Nowadays, it’s common for many libraries to have extensive documentation, including hyperlinks and testable examples, directly inline in the form of comments. There’s usually a well-defined convention for these comments to be converted into HTML, and some of them link directly back to the relevant source code.
This isn’t to say they’re exactly what is meant by literate programming, but I gotta say we’re pretty damn close. Probably not much more than a pull request away for your preferred language’s blessed documentation generator, in fact.
(The two examples I’m using to draw my conclusions are Rust and Go).
I think that's exactly what is meant, and it's a great example. The two places where literate programming has shone most are 1) documentation, because it's a natural fit there and you can get away with having little programs rather than focusing on a book-length narrative as Knuth had originally purposed it for, and 2) notebook programming environments, especially Jupyter and Org mode. I think programs structured in these notebooks really are perfectly situated for LLM analysis and extension, which is where the opportunity lies today.
I have noticed a trend recently that some practices (writing a decent README or architecture, being precise and unambiguous with language, providing context, literate programming) that were meant to help humans were not broadly adopted with the argument that it's too much effort. But when done to help an LLM instead of a human a lot of people suddenly seem to be a lot more motivated to put in the effort.
In my years of programming, I find that humans rarely give documentation more than a cursory glance up until they have specific questions. Then they ask another person if one is available rather than read for the answer.
The biggest problem is that humans don't need the documentation until they do. I recall one project that extensively used docblock style comments. You could open any file in the project and find at least one error, either in the natural language or the annotations.
If the LLM actually uses the documentation in every task it performs- or if it isn't capable of adequate output without it- then that's a far better motivation to document than we actually ever had for day to day work.
The other problem is that documentation is always out of date, and one wrong answer can waste more time than 10 "I don't knows".
I have discovered that the measure of good documentation is not whether your team writes documentation, but is instead determined by whether they read it.
Well maybe if those people were managing one or more programmers and not writing the code themselves, they would have worked similarly.
Paraphrasing an observation I stole many years ago:
A bunch of us thought learning to talk to computers would get us out of learning to talk to humans, and so we spent 4 of the most important years of emotional growth engaging in that, only to graduate and discover we were even farther behind everyone else in that area.
Documentation rots a lot more quickly than the code - it doesn't need to be correct for the code to work. You are usually better off ignoring the comments (even more so the design document) and going straight to the code.
I maintain you’re either grossly misappropriating the time and energy of new and junior devs if this is the case on your project, or you have gone too long since hiring a new dev and your project is stagnating because of it.
New eyes don’t have the curse of knowledge. They don’t filter out the bullshit bits. And one of the advantages of creating reusable modules is you get more new eyes on your code regularly.
This may also be a place where AI can help. Some of the review tools are already calling us out on making the code not match the documentation.
I've had LLMs proactively fix my inline documentation. Rather pleasant surprise: "I noticed the comment is out of date and does not reflect the actual implementation" even asking me if it should fix it.
I find LLMs more diligent about keeping the documentation than any human developer, including myself.
Considering LLMs are models of language, investing in the clarity of the written word pays off in spades.
I don't know whether "literate programming" per se is required. Good names, docstrings, type signatures, strategic comments re: "why", a good README, and thoughtfully-designed abstractions are enough to establish a solid pattern.
Going full "literate programming" may not be necessary. I'd maybe reframe it as a focus on communication. Notebooks, examples, scripts and such can go a long way to reinforcing the patterns.
Ultimately that's what it's about: establishing patterns for both your human readers and your LLMs to follow.
Notebooks are an example of literate programming.
Yeah, I think what is needed is somewhere between docstrings+strategic comments, and literate programming.
Basically, it's incredibly helpful to document the higher-level structure of the code, almost like extensive docstrings at the file level and subdirectory level and project level.
The problem is that major architectural concepts and decisions are often cross-cutting across files and directories, so those aren't always the right places. And there's also the question of what properly belongs in code files, vs. what belongs in design documents, and how to ensure they are kept in sync.
Also:
"Bad programmers worry about the code. Good programmers worry about data structures and their relationships."
-- Linus Torvalds
> "Bad programmers worry about the code. Good programmers worry about data structures and their relationships."
If you get the architecture wrong, everyone complains. If you get it right, nobody notices it's there.
The SRE's Lament.
"Nothing needs fixing, so what do we pay you for?"
"Everything's broken! What do we even pay you for!?"
Doesn't this apply to the hysteria around LLMs?
The question being - are LLMs 'good' at interpreting and making choices/decisions about data structures and relationships?
I do not write code for a living but I studied comp sci. My impression was always that the good software engineers did not worry about the code, not nearly as much as the data structures and so on.
The only use of code is to process data, aka information. And any knowledge worker knows that success in processing information mostly relies on how it's organized (try operating a library without an index).
Most of the time is spent researching what data is available and learning what data should be returned after the processing. Then you spend a bit of brain power to connect the two. The code is always trivial. I don't remember ever discussing code in the workplace since I started my career. It was always about plans (hypotheses), information (data inquiry), and specifications (especially when collaborating).
If the code is worrying you, it would be better to buy a book on whatever technology you're using and refresh your knowledge. I keep bookmarks in my web browser and have a few books on my shelf that I occasionally page through.
Nearly all my coding for the last decade or so has used literate programming. I built nbdev, which has let me write, document, and test my software using notebooks. Over the last couple of years we integrated LLMs with notebooks and nbdev to create Solveit, which everyone at our company uses for nearly all our work (even our lawyers, HR, etc).
It turns out literate programming is useful for a lot more than just programming!
Interesting and semi-related idea: use LLMs to flag when comments/docs have come out of sync with the code.
The big problem with documentation is that if it was accurate when it was written, it's just a matter of time before it goes stale compared to the code it's documenting. And while compilers can tell you if your types and your implementation have come out of sync, before now there's been nothing automated that can check whether your comments are still telling the truth.
Somebody could make a startup out of this.
I'm a technical writer. Off the top of my head I reckon at least 10 startups have … started up … in this space since 2023.
I once had a mad idea of creating an automated documentation-driven paradigm where every directory/module/class/function has to have a DocString/JSDoc, with the higher level ones (directory/module) essentially being the documentation of features and architecture. A ticket starts by someone opening a PR with suggested changes to the docs, the idea being that a non-technical person like a PM or tester could do it. The PR then passes to a dev who changes the code to match the doc changes. Before merging, the tool shows the doc next to every modified piece of code and the reviewer must explicitly check a box to say it's still valid. And docstrings would be able to link to other docstrings, so you could find out what other bits of code are connected to what you're working on (as that link doesn't always exist in code, e.g. across APIs) and read their docs to find the larger context and gotchas.
There is at least one startup doing it already (I'm not affiliated with it in any way): https://promptless.ai/
Thanks for the pointer. That looks more to me like it's totally synthesizing the docs for me. I can see someone somewhere wanting that. I would want a UX more like a compiler warning. "Comment on line 447 may no longer be accurate." And then I go fix it my own dang self.
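To make the "compiler warning" UX concrete, here is a deliberately crude sketch. It uses no LLM at all: it only flags docstrings that mention a backticked name which is not a parameter of the function. That heuristic, the function name `stale_doc_warnings`, and the sample source are all my assumptions; a real tool would need something much smarter.

```python
import ast
import re

SAMPLE = '''
def resize(image, width, height):
    """Scale `image` to `width` by `size` pixels."""
    return image
'''


def stale_doc_warnings(source: str) -> list:
    """Return warning strings for docstrings that mention unknown parameters."""
    warnings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        doc = ast.get_docstring(node) or ""
        args = node.args
        params = {a.arg for a in args.posonlyargs + args.args + args.kwonlyargs}
        # *args and **kwargs are optional, so add them only when present.
        for extra in (args.vararg, args.kwarg):
            if extra is not None:
                params.add(extra.arg)
        # Heuristic: any `name` in backticks is assumed to refer to a parameter.
        for name in re.findall(r"`(\w+)`", doc):
            if name not in params:
                warnings.append(
                    f"line {node.lineno}: docstring of {node.name}() mentions "
                    f"`{name}`, which is not a parameter"
                )
    return warnings
```

Run on `SAMPLE`, this reports the leftover `size` and nothing else. It only warns and never rewrites, so the human still decides whether the comment or the code is the one that's wrong.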
Why would you need comments from an AI if you can just ask it what the code is doing?
Because the human needs to tell the AI whether it’s the code or the comment that’s wrong.
Because only a human writer can explain why they chose a particular solution. But nobody wants to update comments each time.
If you have CI hooked up to AI, you could just use an SLM to do that in a periodic job with https://github.github.com/gh-aw/ or https://www.continue.dev/. You could also have it detect architectural drift.
It has always been possible to program literately in programming languages - not to the extent that you can in Web, but good code can read like a story and obviate comments.
Test code and production code in a symmetrical pair has lots of benefits. It’s a bit like double entry accounting - you can view the code’s behavior through a lens of the code itself, or the code that proves it does what it seems to do.
You can change the code by changing either tests or production code, and letting the other follow.
Code reviews are a breeze because if you’re confused by the production code, the test code often holds an explanation - and vice versa. So just switch from one to the other as needed.
Lots of benefits. The downside is how much extra code you end up with of course - up to you if the gains in readability make up for it.
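A tiny sketch of such a pair, using Python's standard `unittest`. The `slugify` function is a made-up example; the point is only that each side can be read as an explanation of the other:

```python
import unittest


def slugify(title: str) -> str:
    """Production side: turn a title into a lowercase, hyphen-separated slug."""
    # Replace anything that is not a letter or digit with a space, then split.
    words = "".join(c if c.isalnum() else " " for c in title).split()
    return "-".join(word.lower() for word in words)


class SlugifyTest(unittest.TestCase):
    """Test side: each method name states one behavior the code promises."""

    def test_spaces_become_hyphens(self):
        self.assertEqual(slugify("Literate Programming"), "literate-programming")

    def test_punctuation_is_dropped(self):
        self.assertEqual(slugify("Hello, World!"), "hello-world")

    def test_empty_title_gives_empty_slug(self):
        self.assertEqual(slugify(""), "")
```

If the `join`/`split` line is confusing, the test names say what it is for; if a test looks odd, the one-line implementation shows what it is checking. The cost is visible too: the test side is already longer than the production side.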
I have instructed my LLMs to provide at least one comment per function, and additionally prompt them to comment when they take things out and to explain why they chose a particular solution. DistroTube also loves the declarative literate programming approach, often citing how his single-document configuration with Nix configures his whole system.
I don't know Org, but Rakudoc https://docs.raku.org/language/pod is useful for literate programming (put the docs in the code source) and for LLMs (the code is "self documenting", so that in the LLM inversion of control, the LLM can determine how to call the code).
https://podlite.org does this in a language-neutral way (Perl, JS/TS and Raku for now).
Here's an example:
Everyone is circling getting rid of the code and just having Englishscript https://jperla.com/blog/claude-electron-not-claudevm
I've had the same thought, maybe more grandiosely. The idea is that LLM prompts are code -- after all they are text that gets 'compiled' (by the LLM) into a lower-level language (the actual code). The compile process is more involved because it might involve some back-and-forth, but on the other hand it is much higher level. The goal is to have a web of prompts become the source of truth for the software: sort of like the flowchart that describes the codebase 'is' the codebase.
No it doesn’t get compiled. Compilation is a translation from one formal language to another that can be rigorously modeled and is generally reproducible.
Translating from a natural language spec to code involves a truly massive amount of decision making because it’s ambiguous. For a non trivial program, 2 implementations of the same natural language spec will have thousands of observable differences.
Where we are today, with agents requiring guardrails to keep from spinning out, there is no way to let agents work on code autonomously or constantly recompile specs without all of those observable differences constantly shifting, resulting in unusable software.
Tests can’t prevent this because for a test suite to cover all observable behavior, it would need to be more complex than the code. In which case, it wouldn’t be any easier for machine or human to understand. The only solution to this problem is that LLMs get better.
Personally I think at the point they can pull this off, they can do any white collar job, and there’s no point in planning for that future because it results in either Mad Max or Star Trek.
well you have to expand your definition of "compile" a bit. There is clearly a similarity, whether or not you want to call it the same word. Maybe it needs a neologism akin to 'transpiled'.
other than that you seem to be arguing against someone other than me. I certainly agree that agents / existing options would be chaotic hell to use this way. But I think the high-level idea has some potential, independent of that.
One problem with this is that there isn't really a "current prompt" that completely describes the current source code; each source file is accompanied by a full chat log, including false starts and misunderstandings. It's sort of like reading a git history instead of the actual file.
true, but that just means that's the problem to solve. probably the ideal architecture isn't possible right now. But I sorta imagine that you could later on take the full transcript of that conversation and expect any LLM to implement more or less the same thing based on it, so that eventually it becomes a full 'spec'.
And maybe there is a way to trim the parts out of it that are not needed... like to automatically produce an initial prompt which is equivalent to the results of a longer session, but is precise enough so as to not need clarification upon reprocessing it. Something like that? I'm not sure if that's something that already exists.
I think full literate programming is overkill but I've been doing a lighter version of this:
- Module level comments with explanations of the purpose of the module and how it fits into the whole codebase.
- Document all methods, constants, and variables, public and private. A single terse sentence is enough, no need to go crazy.
- Document each block of code. Again, a single sentence is enough. The goal is to be able to know what that block does in plain English without having to "read" code. Reading code is a misnomer because it is a different ability from reading human language.
Example from one of my open-source projects: https://github.com/trane-project/trane/blob/master/src/sched...
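For readers who don't want to follow the link, here is a small sketch of the three levels in one file. The linked project is not reproduced here; this retry helper is a made-up Python illustration of the commenting style only:

```python
"""Retry helpers: decide how long to wait between attempts.

Hypothetical module. In a larger codebase it would be used by a network
client to space out retries after a failed request.
"""

# Longest delay, in seconds, that any single retry may wait.
MAX_DELAY = 30.0


def backoff_delays(attempts: int, base: float = 0.5) -> list:
    """Return the delay before each retry, doubling each time up to MAX_DELAY."""
    # Reject negative counts early so callers get a clear error.
    if attempts < 0:
        raise ValueError("attempts must be non-negative")

    # Double the base delay for each attempt, capped at MAX_DELAY.
    return [min(base * 2 ** i, MAX_DELAY) for i in range(attempts)]
```

Reading only the comments gives the plain-English outline: what the module is for, what the constant means, what the function returns, and what each block does.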
What we need is comments that LLMs simply do not delete.
We need metadata in source code that LLMs don't delete and interpreters/compilers/linters don't barf on.
For me this is where a config layer shines. Develop a decent framework and then let the agents spin out the configuration.
This allows a trusted and tested abstraction layer that does not shift and makes maintenance easier, while making the code that the agents generate easier to review and it also uses much less tokens.
So as always, just build better abstractions.
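A minimal sketch of that split, with hypothetical step names and a toy text pipeline. The "framework" is the small, reviewed part that does not shift; the config list at the bottom is the part an agent would generate:

```python
from typing import Callable, Dict, List

# The trusted layer: a fixed set of reviewed steps. Each takes the text and the
# step's own options and returns the transformed text.
STEPS: Dict[str, Callable[[str, dict], str]] = {
    "strip": lambda text, opts: text.strip(),
    "lower": lambda text, opts: text.lower(),
    "replace": lambda text, opts: text.replace(opts["old"], opts["new"]),
}


def run_pipeline(config: List[dict], text: str) -> str:
    """Apply each configured step in order, rejecting unknown step names."""
    for step in config:
        name = step["step"]
        if name not in STEPS:
            raise ValueError(f"unknown step: {name!r}")
        text = STEPS[name](text, step)
    return text


# The generated layer: plain data, short to review, and cheap in tokens.
CONFIG = [
    {"step": "strip"},
    {"step": "lower"},
    {"step": "replace", "old": " ", "new": "_"},
]
```

Reviewing the agent's output then means reading three lines of data rather than new control flow, and a misspelled step fails loudly instead of doing something surprising.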
All of that is just code.
Frameworks are just overly brittle and fragile libraries that overly restrict how you can use them.
I work with a project that is heavily configuration-driven. It seems promising, but in reality:
- Configuration is massively duplicated, across repositories
- No one is willing to rip out redundancy, because comprehensive testing is not practical
- In order to understand the configuration, you have to read lots of code, again across multiple repositories (this in particular is a problem for LLM assistance, at least the way we currently use it)
I love the idea, but in practice it’s currently a nightmare. I think if we took a week we could clean things up a fair bit, but we don’t have a week (at least as far as management is concerned), and again, without full functional testing, it’s difficult to know when you’ve accidentally broken someone else’s subsystem
Now that I've returned to working on the project tonight, I just remembered another failing of our code. (I'm not in any way claiming these are universal problems, just that they are something to be wary of.)
Naming is so incredibly important. The wrong name for a configuration key can have cascading impacts, especially when there is "magic" involved, like stripping out or adding common prefixes to configuration values.
We have a concept called a "domain" which is treated as a magic value everywhere, such as adding a prefix or suffix. But domain isn't well-defined, and in different contexts it is used different ways, and figuring out what the impact is of choosing a domain string is typically a matter of trial and error.
I fully agree. (Seeing how good Figment2 is for layered config in rust is wildly eye opening, has been a revelatory experience.)
Sometimes what we manage with config is itself processing pipelines. A tool like darktable has a series of processing steps that are run. Each of those has config, but the outer layer is itself a config of those inner configs. And the outer layer is a programmable pipeline; it's not that far apart from thinking of each user coming in and building their own http handler pipeline, making their own bespoke computational flow.
I guess my point is that computation itself is configuration. XSLT probably came closest to that sun. But we see similar lessons everywhere we look.
when do you think we'll get to build real software?
I explored this in std::slop (my clanker) https://github.com/hsaliak/std_slop. One of the differentiating features of this clanker is that it only has a single tool call, run_js. The LLM produces JS scripts to do its work. Naturally, I tried to teach it to add comments to these scripts and incorporate literate programming elements. This was interesting because every tool call now 'hydrated' some free-form thinking, but it comes at an output token cost.
Output Tokens are expensive! In GPT-5.4 it's ~180 dollars per Million tokens! I've settled for brief descriptions that communicate 'why' as a result. The code is documentation after all.
Something in this realm covers my practice. I just keep a master prompt for the whole program, and sparsely documented code. When it's time to use LLM's in the dev process, they always get a copy of both and it makes the whole process like 10x as coherent and continuous. Obvi when a change is made that deviates or greatly expands on the spec, I update the spec.
I do something similar with quality gates. I have a bunch of markdown files at the ready to point agents to for various purposes. It lets me leverage LLMs at any stage of the dev process and my clients get docs in their format without much maintenance from myself. As you said once you get it down it becomes a very coherent process that can be iterated on in its own right.
I am currently fighting the recursive improvement loop part of working with agents.
I'd love to see what Tim Daly could do with LLMs on Axiom's code base.
Anecdotally, Claude Opus is at least okay at literate emacs. Sometimes takes a few rounds to fix its own syntax errors, but it gets the idea. Requiring it to TDD its way in with Buttercup helps.
Take it to the logical conclusion. Track the intended behavior in a proper issue tracking software like Jira. Reference the ticket in your version control system.
Boring and reliable, I know.
If you need guides to the code base beyond what the programming language provides, just write a directory level readme.md where necessary.
I think the externality of issue tracking systems like Jira (or even GitHub) causes friction. Literate programming has everything in one place.
I’d like to have a good issue tracking system inside git. I think the SQLite version management system has this functionality but I never used it.
One thing to solve is that different kinds of users need to interact with it in different kinds of ways. Non-programmers can use Jira, for example. Issues are often treated as mutable text boxes rather than versioned specification (and git is designed for the latter). It’s tricky!
it could be fun to make a toy compiler that takes an arbitrary literate prompt as input and uses an LLM to output a machine code executable (no intermediate structured language). could call it llmllvm. perhaps it would be tremendously dangerous
I'd rather go with formal specifications and proofs.
Left to right APL style code seems like it could be words instead of symbols.
The "test runbook" approach that TFA describes sounds like doctest comments in Python or Rust.
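For anyone who hasn't used them, this is roughly what a Python doctest looks like. The `clamp` function is just a made-up example; the part that matters is that the examples in the docstring are executed by the standard `doctest` module, so documentation that drifts from the code shows up as a failing test.

```python
import doctest


def clamp(value: int, low: int, high: int) -> int:
    """Limit `value` to the inclusive range [low, high].

    The examples below are documentation and tests at the same time:

    >>> clamp(15, 0, 10)
    10
    >>> clamp(-3, 0, 10)
    0
    >>> clamp(7, 0, 10)
    7
    """
    return max(low, min(value, high))


if __name__ == "__main__":
    # Runs every docstring example in this module and reports mismatches.
    doctest.testmod()
```

Rust's documentation tests work on the same principle: code blocks inside doc comments are compiled and run by `cargo test`.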
> This is especially important if the primary role of engineers is shifting from writing to reading.
This was always the primary role. The only people who ever said it was about writing just wanted an easy sales pitch aimed at everyone else.
Literate programming failed to take off because with that much prose it inevitably misrepresents the actual code. Most normal comments are bad enough.
It's hard to maintain any writing that doesn't actually change the result. You can't "test" comments. The author doesn't even need to know why the code works to write comments that are convincing at first glance. If we want to read lies influenced by office politics, we already have the rest of the docs.
> You can't "test" comments.
I'm thinking that we're approaching a world where you can both test for comments and test the comments themselves.
Now that would be really interesting: prompt an LLM to find comments that misrepresent the code! I wonder how many false positives that would bring up?
I have a Claude Code skill for adding, deleting and improving comments. It does a decent job at detecting when comments are out of date with the code and updating them. It's not perfect, but it's something.
I don't buy that. Writing is getting a bad rap from all this. Writing _is_ a form of more intense reading. Reading on steroids, as they say. If reading is considered good, writing should be considered better.
Writing in that draft style is really only useful because a) you read the results and b) you write an improved version at the end. Drafting forever is not considered "better" because someone (usually you) has to sift through the crap to find the good parts.
This is especially pronounced in the programming workplace, where the most "senior" programmers are asked to stop programming so they can review PRs.
You're right that you can't test comments, but you can test the code they describe. That's what reproducibility bundles do in scientific computing: the prose says "we filtered variants with MAF < 0.01", and the bundle includes the exact shell command, environment, and checksums so anyone can verify the prose matches reality. The prose becomes a testable claim rather than a decorative comment. That said, I agree the failure mode of literate programming is prose that drifts from code. The question is whether agents reduce that drift enough to change the calculus.
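A rough sketch of the "prose as a testable claim" idea, in pure Python rather than a shell command. Everything here is hypothetical: the bundle layout, the helper names, and the sample data are invented for illustration, and I'm assuming the quoted claim means variants with MAF below 0.01 were dropped.

```python
import hashlib
import json


def filter_variants(variants: list[dict], maf_threshold: float) -> list[dict]:
    """Keep variants whose minor allele frequency is at or above the threshold."""
    return [v for v in variants if v["maf"] >= maf_threshold]


def checksum(records: list[dict]) -> str:
    """SHA-256 of a canonical JSON encoding, so equal data gives equal digests."""
    payload = json.dumps(records, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Made-up input data.
variants = [
    {"id": "rs1", "maf": 0.20},
    {"id": "rs2", "maf": 0.005},
    {"id": "rs3", "maf": 0.01},
]

# What the author records alongside the prose when the analysis is run.
bundle = {
    "claim": "we filtered variants with MAF < 0.01",
    "maf_threshold": 0.01,
    "expected_sha256": checksum(filter_variants(variants, 0.01)),
}


def verify(bundle: dict, variants: list[dict]) -> bool:
    """Re-run the recorded step and check that the output matches the recorded digest."""
    actual = checksum(filter_variants(variants, bundle["maf_threshold"]))
    return actual == bundle["expected_sha256"]


print(verify(bundle, variants))
```

If someone later changes the threshold or the data without updating the bundle, `verify` returns `False`, so the mismatch between prose and reality is detectable rather than silent.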
We need an append-only programming language.
but doesn't "the code is documentation" work better for machines?
and don't we have doc-blocks?
Code doesn't express intent, only the implementation. Docblocks are fine for specifying local behavior, but are terrible for big picture things.
Well many times it does.
fun isEven(number: Int): Boolean { return number % 2 == 0 }
I would say this expresses the intent, no need for a comment saying "check if the number is even".
Most of the code I read (at work) is not documented, and still I understand the intent. In open source projects, I used to go read the source code because the documentation was nonexistent or out of date. To the point where now I actually go directly to the source code, because if the code is well written, I can actually understand it.
right you are :)
does literate code have a place for big pic though?
> Literate programming is the idea that code should be intermingled with prose such that an uninformed reader could read a code base as a narrative
Have you tried naming things properly? A reader that knows your language could then read your code base as a narrative.
>I don't have data to support this
There is data showing that context files which explain code reduce model performance, so it is not obvious that literate programming is better. Without data, this article is useless.
One of the things I love most about WebMCP is the idea that it's an MCP session that exists on the page, which the user already knows.
Most of these LLM things are kind of separate systems, with their own UI. The idea of agency being inlaid into existing systems the user knows like this, with immediate bidirectional feedback as the user and LLM work the page, is incredibly incredibly compelling to me.
Series of submissions (descending in time): https://news.ycombinator.com/item?id=47211249 https://news.ycombinator.com/item?id=47037501 https://news.ycombinator.com/item?id=45622604
The question posed is, “With agents, does it become practical to have large codebases that can be read like a narrative, whose prose is kept in sync with changes to the code by tireless machines?”
It's not practical to have codebases that can be read like a narrative, because that's not how we want to read them when we deal with the source code. We jump to definitions, arriving at different pieces of code in different paths, for different reasons, and presuming there is one universal, linear, book-style way to read that code, is frankly just absurd from this perspective. A programming language should be expressive enough to make code read easily, and tools should make it easy to navigate.
I believe my opinion on this matters more than the opinion of the average admirer of LP. By their own admission, they still mostly write code in boring plain text files. I write programs in org-mode all the time. Literally (no pun intended) all my libraries, outside of those written for a day job, are written in Org. I think it's important to note that they are all Lisp libraries, as my workflow might not be as great for something like C. The documentation in my Org files is mostly reduced to examples. I do like docstrings, but I appreciate an exhaustive (or at least a rich enough) set of examples more, and writing them is much easier: I write them naturally as tests while I'm implementing a function. The examples are written in Org blocks, and when I install a library or push an important commit, I run all tests, of which examples are but special cases. The effect is that this part of the documentation is always in sync with the code (of course, some tests fail, and they are marked as such when tests run). I know how to sync this with docstrings, too, if necessary; I haven't: it takes time to implement and I'm not sure the benefit will be that great.
My (limited, so far) experience with LLMs in this setting is nice: a set of pre-written examples provides a good entry point, and an LLM is often capable of producing a very satisfactory solution, immediately testable, of course. The general structure of my Org files with code is also quite strict.
I don't call this “literate programming”, however (I think LP is a mess of mostly wrong ideas); my approach is just a “notebook interface” to a program, inspired by Mathematica Notebooks, popularly (but not in a representative way) imitated by the now-famous Jupyter notebooks. The terminology doesn't matter much: what I'm describing is what the silly.business blogpost is largely about. The author of nbdev is in the comments here; we're basically implementing the same idea.
silly.business mentions tangling, which is a fundamental concept in LP and a good example of what I dislike about LP: tangling, like several concepts behind LP, is only a thing due to limitations of the programming systems that Donald Knuth was using. When I write Common Lisp in Org, I do not need to tangle, because Common Lisp does not have many of the limitations that apparently influenced the concepts of LP. The “reading like a narrative” idea is similarly misguided, for the reasons I outlined at the beginning. Lisp is expressive enough to read like prose (or like anything else) to as large a degree as required and, more generally, to have code organized as non-linearly as required. This argument, however, is irrelevant if we want LLMs, rather than us, to read codebases like a book; but that's a different topic.
This is a good opinion. Maybe humans do not really know how to teach this skill of reading code. We do not have a good, exact protocol because people rely on their personal heuristic methods.