Show HN: Remoroo – Trying to fix memory in long-running coding agents
www.remoroo.com

I built Remoroo because most coding agents fall apart once the work stops being a short edit-and-run loop.
A real engineering experiment can run for hours. Along the way, the agent reads files, runs commands, checks logs, compares metrics, tries ideas that fail, and needs to remember what already happened. Once context starts slipping, it forgets the goal, loses track of the baseline, and retries bad ideas.
Remoroo is my attempt to solve that problem.
You point it at a repo and give it a measurable goal. It runs locally, tries changes, executes experiments, measures the result, keeps what helps, and throws away what does not.
A big part of the system is memory. Long runs generate far more context than a model can hold, so I built a demand-paging memory system inspired by OS virtual memory to keep the run coherent over time.
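The linked writeup has the details, but the demand-paging idea can be sketched roughly like this. Everything below is a hypothetical illustration (class and method names are mine, not Remoroo's actual API): keep a small in-context working set of "pages" of run history, evict the least-recently-used page to an on-disk store when the budget is exceeded, and page it back in on demand.

```python
import json
import os
from collections import OrderedDict

class PagedMemory:
    """Hypothetical sketch of demand-paged agent memory: a bounded
    in-context working set backed by an on-disk page store, with
    LRU eviction -- loosely mirroring OS virtual memory."""

    def __init__(self, store_dir, max_resident=4):
        self.store_dir = store_dir
        self.max_resident = max_resident
        self.resident = OrderedDict()  # page_id -> text; order = recency
        os.makedirs(store_dir, exist_ok=True)

    def _path(self, page_id):
        return os.path.join(self.store_dir, f"{page_id}.json")

    def write(self, page_id, text):
        self.resident[page_id] = text
        self.resident.move_to_end(page_id)
        self._evict_if_needed()

    def read(self, page_id):
        if page_id not in self.resident:          # page fault:
            with open(self._path(page_id)) as f:  # load from backing store
                self.resident[page_id] = json.load(f)
            self._evict_if_needed()
        self.resident.move_to_end(page_id)        # mark recently used
        return self.resident[page_id]

    def _evict_if_needed(self):
        while len(self.resident) > self.max_resident:
            victim, text = self.resident.popitem(last=False)  # LRU victim
            with open(self._path(victim), "w") as f:
                json.dump(text, f)

    def context(self):
        """The text the model actually sees: only resident pages."""
        return "\n".join(self.resident.values())
```

The point of the design is that the model's prompt stays bounded regardless of run length, while nothing is truly lost: an old baseline measurement can be paged back in hours later when the agent needs to compare against it.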
There is a technical writeup here: https://www.remoroo.com/blog/how-remoroo-works
Would love feedback from people working on long-running agents, training loops, eval harnesses, or similar workflows.
> Would love feedback from people working on long-running agents, training loops, eval harnesses, or similar workflows.
I haven't needed a service for this kind of optimization at work, though work gives me unbounded access to Claude 4.x-1m (substitute x with whatever is available), so I often ask it to do this kind of task.
I've found that when I just specify the goal, the AI will sometimes optimize to the point that it breaks other existing functional requirements in the same codebase. So I have to steer it with invariants. That's where the bulk of my effort goes: monitoring to make sure the agent didn't suddenly scramble the infra or delete valid use cases.
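The "steer it with invariants" workflow can be made mechanical. A minimal sketch, assuming each invariant is encoded as a shell command that must exit 0 after every agent-proposed change (the example commands are placeholders, not real Remoroo hooks):

```python
import subprocess
import sys

def invariants_hold(invariants, cwd="."):
    """Return True only if every invariant command exits 0.
    Each invariant is an argv list encoding a property the
    agent's change must not break; any failure rejects the change."""
    for cmd in invariants:
        result = subprocess.run(cmd, cwd=cwd, capture_output=True)
        if result.returncode != 0:
            return False
    return True

# Hypothetical invariant set: the existing test suite plus an explicit
# infra sanity check -- both paths are illustrative placeholders.
EXAMPLE_INVARIANTS = [
    ["pytest", "-q", "tests/"],
    [sys.executable, "scripts/check_infra.py"],
]
```

Gating every kept change on a call like `invariants_hold(EXAMPLE_INVARIANTS)` turns the human monitoring burden into an automatic accept/reject step.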
1. How do you address the [paperclip problem](https://en.wikipedia.org/wiki/Instrumental_convergence) in Remoroo? Can we define invariants?

2. Why is there a whole orchestration system? Was there some limitation that prompted this architecture, e.g. did workers die frequently? It looks like Temporal/AWS SWF with the brain/worker/control architecture. The existence of `q (quit): Kills the Worker, aborts the run. The run is marked FAILED.` makes me think there's only one worker... so why? It'd make more sense if the brain dispatched multiple hypotheses to multiple workers to test in parallel (e.g. if optimizing SQL, try several different joins at once, and discard queries still running X minutes after the first one completes).
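The parallel-hypotheses idea in question 2 can be sketched with a plain thread pool. This is purely illustrative (not how Remoroo actually works): dispatch each candidate, keep whichever finishes first, and stop waiting on the stragglers.

```python
import concurrent.futures as cf

def first_success(candidates, run):
    """Dispatch every candidate hypothesis to its own worker, return the
    candidate and result that finish first, and stop waiting on the rest.
    `run` is a callable that evaluates one candidate (e.g. times one SQL
    join variant). Threads already running are abandoned, not killed --
    hard cancellation would need process workers or query timeouts."""
    pool = cf.ThreadPoolExecutor(max_workers=len(candidates))
    futures = {pool.submit(run, c): c for c in candidates}
    done, _pending = cf.wait(futures, return_when=cf.FIRST_COMPLETED)
    winner = next(iter(done))
    pool.shutdown(wait=False, cancel_futures=True)  # discard stragglers
    return futures[winner], winner.result()
```

For a single shared machine running heavy local experiments, serializing work through one worker is a defensible choice too; the parallel variant mostly pays off when candidates are I/O-bound or run on separate hardware.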
Congrats on shipping and thanks for sharing. Sorry for the silly question, but can you help me understand what’s the difference between this and https://github.com/snarktank/ralph
(I didn’t dive too deep into either, so my apologies if it’s apples to oranges.)
I am doing something similar with https://github.com/holoduke/myagent. It's an autonomous system that reads papers and tries to implement them. A fun experiment, I'd say.
I have no comments on the product and wish you all the best with it, but the AI copy on the page is so painful to read.
"Not a coding agent. An autonomous research engine."
"It didn't guess. It proved."
"▸ 30 experiments completed ▸ 8 kept · 22 discarded ▸ val_bpb: 2.24 → 1.55 ▸ VERIFIED · REPRODUCIBLE You slept through it."
God that is cringe…