I like that the article mentions Manfred von Thun; his iteration on FORTH was a joy to use, and I spent many hours writing code with it. It's a breeze. It was the true mirror reflection of Lisp in FORTH, and it helps one reap the benefits of concatenation without getting bogged down in the nitty-gritty details one often ends up dealing with in FORTH, at least in the beginning, until a mature vocabulary has been built for the problem.
I'm still so far behind on how LLM tech actually works that I don't know what to think of this article, but my experience using LLMs to generate FORTH code was often a failure. They just can't get it right, most likely due to a lack of training data. OTOH, I also found writing FORTH as a human way more difficult than any other language I've used, even more so than hand-written assembly. But the effort amortizes fairly quickly, and things get a lot easier after a point (what I called vocabulary maturity earlier). More importantly, writing FORTH code is somehow way more fun and satisfying to me.
From the title alone I thought it would be another FORTH interpreter implementation article, but I was happy to see someone actually using it for something besides proving out their interpreter with a Fibonacci calculation.
There's another front page article right now with someone using it in a very cool way.
https://news.ycombinator.com/item?id=46918824
thanks, somehow I missed that.
Yep, given that implementing Forth is so easy (easier even than implementing Lisp), pretty soon nearly every Forth programmer decides to take their turn doing it themselves.
I suspect, for many, that implementing a forth is more interesting than using a forth.
Once you start writing really complex programs, the system gets painful and hard. But trivial things are easy, and the consistency is so appealing.
It is the bootstrap that makes it interesting.
Creating the required primitives in Assembly, and then building the remaining userspace out from them.
Afterwards it is programming like most languages.
I have done it with Lisps though.
Also, on 8-bit home computers it gave you the feeling of coding close to Assembly while staying close enough to BASIC as a high-level language.
Which reminds me that it's time to dust off my old FORTH and make a proper calculator out of it.
My own baby started out as a Forth dialect, but now sits somewhere between Logo and Common Lisp on the complexity scale. Forth is a good starting point IMO; you don't waste any time on non-essentials.
https://gitlab.com/codr7/shik
The observation that concatenative programming languages have nearly ideal properties for efficient universal learning on silicon is very old. You can show that the resource footprint required for these algorithms to effectively learn a programming language is much lower than other common types of programming models. There is a natural mechanical sympathy with the theory around universal learning. It was my main motivation to learn concatenative languages in the 1990s.
This doesn't mean you should write AI in these languages, just that it is unusually cheap and easy for AI to reason about code written in these languages on silicon.
It sounds like you’re referring to a proof. Where can one find it, and what background prepares one for it?
Muxleq and Subleq:
https://github.com/howerj/subleq
https://github.com/howerj/muxleq
Muxleq because of performance:
Edit muxleq.fth and set these values like this:

Then run:

    ./muxleq ./muxleq.dec < muxleq.fth > new.dec

new.dec is the enhanced SUBLEQ eForth image.

To run it:

For the available words, run WORDS inside the interpreter. Enter BYE to exit.

Get Starting Forth, but the ANS version (there's some PDF in search engines), and Thinking Forth.
Diffusion text models to the rescue! :)
Looking to discuss with people whether LLMs would do better if the language had properties similar to postfix notation.
I have just spent a month writing about 2000 lines of Forth. My answer is no, at least with respect to generating something that looks like the by-hand code I wrote. LLMs coast by on being able to reproduce idiomatic syntax and on having other forms of tooling (type checkers, linters, unit tests, etc.) back them up.
But Forth, taken holistically, is a do-anything-anytime imperative language, not just "concatenative" or "postfix". It has a stack, but the stack is an implementation detail, not a robust abstraction. If you want to do larger-scale things you don't pile more things on the stack; you start doing loads and stores and random access, inventing the idioms as you go along to load more and store more (a small sketch of that idiom follows below). This breaks all kinds of tooling models that rely on robust abstractions with compiler-enforced boundaries. I briefly tested to see what LLMs would do with it and gave up quickly, because it was a complete rewrite every single time.
Now, if we were talking about a simplistic stack machine it might be more relevant, but that wouldn't be the same model of computation.
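A minimal sketch of the memory-based idiom described above, using hypothetical word and variable names: larger Forth programs tend to park state at named addresses with ! (store) and @ (fetch) rather than juggling it on the stack.

    \ Hypothetical example: the state lives in memory, not on the stack.
    variable width   variable height
    : set-size ( w h -- )  height !  width ! ;    \ store the top two stack items
    : area     ( -- n )    width @  height @ * ;  \ fetch both and multiply

Nothing enforces who stores into those variables or when, which is exactly the kind of ad-hoc boundary that stays invisible to tooling built around compiler-enforced abstractions.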
> It has a stack but the stack is an implementation detail, not a robust abstraction.
Not exactly. Not only is the stack central to the design of Forth (see my comment over there [1]).
It seems to me that a point-free language like Forth would be highly problematic for an LLM, because it has to work with things that literally are not in the text. I suppose it has to make a lot of guesses to build a theory of the semantics of the words it can see (a small sketch follows below).
Nearly every time the topic of Forth is discussed on HN, someone points out that the cognitive overload of full point-free style is not viable.
[1] https://news.ycombinator.com/item?id=46918824#46921815
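A minimal sketch of what "not in the text" means here, with a hypothetical word: a colon definition never names its inputs, and the stack-effect comment is just a comment the system ignores, so a reader (or a model) has to simulate the stack to recover the data flow.

    \ Hypothetical word: sum of squares. x and y never appear in the body;
    \ ( x y -- x*x+y*y ) is only a comment.
    : sumsq ( x y -- x*x+y*y )  dup *  swap dup *  + ;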
Most models are multi-paradigm, and so they get... fixated on procedural language design. Concepts like the stack, backtracking, etc. violate the logic they've absorbed, leading to... burning tokens while they correct themselves.
This won't show up in a smaller benchmark, because the clutching at straws tends to happen nearer the edge of the context window: the place where you can get the model to give up on obvious things that don't work and actually try the problem space you've given it.
I haven’t tried the extremes. Context rot says it’ll likely degrade there anyway.
What I’m investigating is whether more compact languages work for querying data.
What makes you think it’s going to clutch at straws more? What makes you think it won’t do better with a more compact, localized representation?
Prefix notation would probably work better, but I suspect there would be a stronger effect from predictable and regular declensions/suffixes/prefixes at a grammatical level.
Even though I really like postfix from an elegance standpoint, and I use an RPN calculator, IMO it's harder to reason about subexpressions with postfix. Being able to decompose an expression into independent parts is what allows us to understand it. If you randomly scan a complex expression in infix and you see parentheses or a +, you know that what's outside the parentheses or on the other side of the + can't affect the part you're looking at (a small example follows below).
If you're executing the operations interactively, you're seeing what's happening on the stack, and so it's easy to keep track of where you are, but if you're reading postfix expressions, it's significantly harder.
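A tiny worked example of that point, sketched here for illustration: in infix the parentheses delimit the independent subexpressions in the text itself, while in postfix the same grouping exists only on the stack.

    \ Infix:   (2 + 3) * (7 - 4)   - the parentheses mark the independent parts
    \ Postfix:  2 3 +  7 4 -  *    - the same grouping is invisible in the text
    2 3 + 7 4 - * .   \ prints 15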
I find some calculations easier to reason about using either RPN or algebraic notation; it's entirely context-driven.
Playing with APL has really changed the way I look at both.
The claim seems extremely unlikely to me. LLM comprehension is very sophisticated by any metric, the idea that something as trivial as concatenative syntactic structure would make a significant difference is implausible.
LLMs handle deeply nested syntax just fine - parentheses and indentation are not the hard part. Linearization is not a meaningful advantage.
In fact, it’s much more likely to be a disadvantage, much as it is for humans. Stack effects are implicit, so correct composition requires global reasoning. A single missing dup breaks everything downstream. LLMs, and humans, are much more effective when constraints are named and localized, not implicit and global.
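A minimal sketch of that failure mode, with hypothetical word names: dropping a single DUP doesn't produce a local error, it silently makes the following operation consume whatever the caller left underneath on the stack.

    \ Intended: n -> n*n + n
    : sq+self ( n -- m )  dup dup *  + ;  \ correct: keeps a copy of n for the final +
    : broken  ( n -- ? )  dup *      + ;  \ missing DUP: + now eats the caller's data below n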
I’m not claiming Forth should be used as-is. I’ve opened the benchmark so others can reproduce the result I share in the post: https://github.com/rescrv/stack-bench