Obviously I'm an AI-tools skeptic, but this is hilarious:
> 1. Prepare issues with sufficient context
> Start by ensuring each GitHub issue contains enough context for agents to understand what needs to be built and how it integrates with the system. This might include details about feature behavior, file locations, database structure, or specific requirements such as displaying certain fields or handling edge cases.
> You can’t do half-hearted prompts and fix as you go, because those fixes come an hour later.
> Skills That Become More Important
> Full-stack understanding
> Problem decomposition
> Good writting skills
> QA and Code Review skills
This is just software engineering?!?
edit: On the other hand, maybe I can convince people in my org to get better at software engineering by telling them it's for the AI to work better.
I tend to agree with English being the new programming language. Those with English communication struggles will struggle to code this way.
> This is just software engineering?!?
Absolutely. The existence of vibe coding does not mean production code is going to be engineered without the same principles we've always used - even if we're using AI to generate a lot more code than before.
Any crowd suggesting that this was not the case has lost the plot, imo.
People find it a lot more palatable when the AI requires all this information than when software engineers do, though. If I ask for clear requirements I'm asked to just figure it out. But if the AI implements nonsense without clear requirements, that's the fault of the specs.
I am amazed at how suddenly people are on board with writing clear design documentation now that it means AI can generate the code rather than humans.
I wonder how much better humans would be at generating code given the same abundance of clearly-written design documentation?
My workplace was always pretty good at writing requirements so I call myself a chatgpt wrapper now.
Well, that’s because the software engineers are irritating when they push back and say ‘no’ or ‘wtf’.
When the AI does it, it’s being polite and stuff. /s, kinda.
You're right. That's an excellent observation! I will make sure to use those language patterns in all my professional communications going forward.
/s but not really?
I consider it my job to figure out the requirements. The fact that they aren't specified in detail allows me to do what I think is best rather than being bound by often arbitrary specifications.
I judge which decisions I make and which ones I bring up to my team/PO/whatever. Most of the time I just do what I think is best, sometimes I'll do something and then bring it up later like "I did this this way but if that doesn't work I can change it", typically for things that will be easy to change later. Some things I ask about before I do them because they won't be easy to change later.
I'll often take technical liberties with frontend designs, for example I'll use an HTML select rather than reinventing the drop-down just to be able to put rounded corners on the options. I'll style scrollbars using the very limited options CSS provides rather than reinvent scrollbars just to follow the design exactly. Most of the time nobody cares, we can always go back later and do these types of things if we really want a certain aesthetic.
I have never had the impression that my questions bother people, rather the opposite. I've had multiple designers say they appreciate the way I interact with them, I respect their work and their designs but I ask them if something looks like an oversight or I'm not exactly sure what their intention is. POs and such are always happy to answer simple questions, I make it easy for them: here's a decision we need to make, I want you to make it. Maybe I have a suggestion for what I would prefer and some reasons why I prefer that solution.
I don't expect them to think of everything and answer all my potential questions in advance, that's just unnecessary and difficult work.
> On the other hand, maybe I can convince people in my org to get better at software engineering by telling them its for the AI to work better.
Really good engineering practices are fundamental to get the most out of AI tooling. Convoluted documentation, code fragmentation, etc all pollute context when working with AI tools.
In my experience, just having one outdated word (especially if it's a library name) anywhere in code or documentation can create major ongoing headaches.
The worst part of it is trying to avoid negative assertions. When the AI tooling keeps trying to do "the wrong thing" it's sometimes a challenge to rephrase the instruction as a positive assertion about "the right thing": say "always use the existing date helpers" rather than "don't hand-roll date parsing".
Yes. AI assisted software engineering is still software engineering. I don't see that part changing anytime soon.
> This is just software engineering?!?
Indeed yes. Although most places have been shipping software in a "software development" and/or "programming" fashion for many years.
Many, many places certainly do not do the engineering part, even though the resulting product is software.
Yeah, it’s funny, we may finally have a way to get developers to write documentation for other developers, it’s just that the other developers aren’t human!
Yes, the ability to clearly and unambiguously communicate what's required works on both humans and machines.
lmao, "good writting skills" =)
[sic]
I lol'ed too but then thought - at least he actually wrote this!
Heh, damn. Made a typo at the worst spot
The author is lying. My team and I are heavy users of Claude Code and other agents and it ain't like this. You need to manage an AI coding agent carefully and course correct frequently. There are cases for parallel agents but they are tasks like parallel document fetches and summarization, and other tasks that don't require supervision.
The idea of having multiple parallel agents merge pull requests or resolve issues in parallel is still just an idea.
Please don’t post or upvote attention seeking crap like this. It gives a very exciting and promising technology a bad name.
Your comment is disproportionately rude. Just because your team can't leverage multiple coding agents doesn't mean no one else can.
And even if OP also can't, this is a good place to discuss possible problems and solutions for parallel development using coding agents.
Please refrain from gatekeeping.
No. Unfortunately there’s a problem now of people blatantly lying about the ability of LLMs to get attention. And it’s extremely effective.
I say this as someone who uses them every day for programming and is also excited now and for the future possibilities. The blatant lying just needs to stop, though, and needs to be called out.
“With this approach, I can manage to have 10–20 pull requests open at once, each handled by a dedicated agent.”
A quote from the post. No, I think my post is calibrated quite well considering what OP's post does to our industry.
Having 20 PRs open at once doesn't necessarily mean managing 20 agents simultaneously.
It can mean, for example, that 2 agents worked for some time in a list of 20 TODO features and produced 20 PRs to be reviewed. They could have worked overnight even.
You're seemingly judging from the least generous interpretation, which is not constructive and is also against HN guidelines fyi.
I’m not ok with someone self promoting here at the cost of thousands of people thinking they’re either not smart enough, or are doing something incorrectly. We saw this same pattern during the dot com boom a quarter century ago with self promoters creating a “you just don’t get it” culture which eventually collapsed like a house of cards. What we share should be reproducible by others and we should avoid hand wavey excitement without substance. Especially here on HN where many of the next great companies and ideas will be born.
Technology evolves. At some point there are going to be things that other people are doing that you can't replicate yet. That doesn't mean you're not smart enough. But it might mean that you are doing something wrong. Often, though, you've just got to try different things or wait for methodologies to consolidate and become mainstream.
Even if parallel agents is not something easily done currently, debating about ways to do it is constructive enough for me.
Exactly! To make it more clear, here is how I approach my day:
9-10am: I comb through our issue list, figuring out which issues are well defined and which need more input or a design decision. => I pick a handful, let's say 10, that I kick off to run in the background, and let's say another 10 for further specification.
10am-2pm: I tinker with the 10 issues to figure out the exact specs and to expand the requirement list.
2pm-6pm: I review the code written by the agents, one by one. I kick off further work on things that need more input, or merge things that look good.
Can this guy, or someone else, post a full day's (4-8 hours, or whatever is spent in the weeds) stream of work to YouTube or something? I just want to watch the process to see what I'm missing. Or if there is anyone that already does that, can they recommend it to me? I would appreciate it.
https://youtu.be/xAKVi_jvvg4
Two hours of Web Dev Cody.
Got through about 45 min at 2x speed / some skipping ahead out of pure fascination. Man that's something else. It's like bug-driven-development. Get the LLM to churn out a huge chunk of text, then skim the code for about 10 seconds and say it looks good. Then spend a while testing and hitting one error after the next until it finally seems to work. Repeat.
Wow, I didn't expect dystopias to be so boring.
If somebody producing code of that low quality worked with me, I could see myself spilling coffee or acid on them or their laptop.
Web Dev Cody is great. I recommend him.
I (author) sometimes stream my work here as well https://www.youtube.com/@operatelybackstage.
Are you saying that because you're also skeptical? I haven't had the best time switching to agent coding. I mean for throwaway work it's fine, but it's kind of boring and aider still messes up from time to time.
I probably lean on the sceptical side of the spectrum. I'm not against giving it a go if I can get value out of it but I'm not having the wonderful experience that these people are having.
- The asynchronous nature of it slows me down and it feels the opposite of what this bloke is saying around getting into a flow.
- I miss things because I'm not thinking it all the way through.
- The issues with errors or hallucinations.
- It does not feel faster (I might blow through a couple of things really fast, but the issues created elsewhere sometimes eat all that saved time up).
- The quality of work is all over the shop. Bigger projects just fall apart after a while.
I also wonder if the way I think is hindering me. I don't like natural language. I struggle to communicate at the best of times. All my emails are dot points. If someone asks me for a diagram I write it in plantuml or using a python library. I work in DevOps and love declarative manifests and templates.
Try as an initial step having the agentic AI improve your prompt for you. I have a "prompt improvement prompt template", which is a standardized document (customized for each project I'm working on), that has a bunch of boilerplate instructions in it, along with a section where I paste in my first-draft prompt. I then feed this document (boilerplate + crappy prompt) into the AI and it creates a way better prompt for me. Then I edit that to ensure it's correct, and then that becomes the prompt I use.
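For anyone who wants to picture the mechanics, here's a minimal sketch of that assembly step in Python; the file names and the {{DRAFT_PROMPT}} placeholder are assumptions, not the commenter's actual template:

    # Sketch of the "prompt improvement" assembly step described above.
    # File names and the placeholder token are assumptions; adjust per project.
    from pathlib import Path

    BOILERPLATE = Path("prompt_improvement_template.md")  # per-project boilerplate
    DRAFT = Path("draft_prompt.txt")  # your rough first-draft prompt

    def build_improvement_request() -> str:
        """Combine boilerplate instructions with a first-draft prompt into the
        document you paste into the AI, which then returns an improved prompt."""
        template = BOILERPLATE.read_text()
        draft = DRAFT.read_text().strip()
        return template.replace("{{DRAFT_PROMPT}}", draft)

    if __name__ == "__main__":
        print(build_improvement_request())

You then review and edit the AI's improved prompt before using it, as described above.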
Is it just me, or does the post sound like (show!) they haven't tried the mentioned approach in real life?
Because in real life, one agent tries to fix a build issue with rm -rf node_modules while the other is already running a server (i.e. npm server), and they conflict with each other nearly all the time! (Even if it's not a destructive action, the second npm server will most likely fail due to port-allocation conflicts.)
Meanwhile, what I found helpful is that:
1. Clone the same repo twice or three times
2. In each terminal or whatever, `cd` into it
3. Create a branch, run your ~commands~ prompts (each is their own session with their own repo)
4. commit/push then merge/rebase (also resolve conflicts if needed, use LLM again if you want)
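If you want to script that, here's a rough sketch of steps 1-3; the repo URL and branch names are hypothetical, and step 4 (merge/rebase) stays manual or goes to a later agent session:

    # Rough automation of the clone-per-agent workflow above (steps 1-3).
    # REPO_URL and the branch naming are hypothetical placeholders.
    # Note: this isolates files on disk, not ports, so dev servers can still clash.
    import subprocess
    from pathlib import Path

    REPO_URL = "git@github.com:example/project.git"  # assumption: your repo
    N_CLONES = 3

    for i in range(1, N_CLONES + 1):
        workdir = Path(f"project-agent-{i}")
        if not workdir.exists():
            subprocess.run(["git", "clone", REPO_URL, str(workdir)], check=True)
        # Each agent session then runs inside its own clone on its own branch,
        # so file edits never collide.
        subprocess.run(
            ["git", "-C", str(workdir), "checkout", "-B", f"agent-task-{i}"],
            check=True,
        )

git worktrees (mentioned downthread) get you the same on-disk isolation while sharing one object store.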
Any other way multiple agents work in harmony in a single repo (filesystem/directory) at the same time is a pipe-dream with the current status of the MCP and agents.
Let alone being aware of each other (agents), they don't even have a proper locking mechanism. As soon as you make a new change, most of the editing (search/replace) functionality in the MCPs fails miserably. Then they re-read the entire file, just creating context rot or over-filling with already-existing stuff. Soon you run out of tokens (or just pay extra for no reason).
> edit: comments mentioned that each agent runs in a VM isolated from others, kinda makes sense but still, there will be massive merge-conflicts unless each agent runs in a completely different set of service/code-base (ie frontend vs backend or couple of micro-services)
The post is about using GitHub's integrated Copilot tooling, where each issue gets its own instance presumably running in a sandbox. This sidesteps the issues you're talking about here.
I don't claim to have lots of experience with this, I've only been doing it for a couple of weeks, but I do feel that some of your comments are disingenuous.
---
> Any other way multiple agents work in harmony in a single repo (filesystem/directory) at the same time is a pipe-dream with the current status of the MCP and agents.
Every agent runs in a separate VM on GitHub.
> Let alone being aware of each other (agents), they don't even have a proper locking mechanism.
Never claimed this. Feels like a strawman argument.
git workspaces?
u mean worktrees
The sweet spot for me is 2 agents on different projects. Surprisingly the context switch is easy. It's harder when doing 2 tasks on the same project.
> on different projects
This seems like an important caveat the author of the article failed to mention when they described this:
> you can have several agents running simultaneously - one building a user interface, another writing API endpoints, and a third creating database schemas.
If these are all in the same project then there has to be some required ordering to it, or you get a frontend written against endpoints the backend doesn't have, and a backend that uses a different database schema than the separately generated one.
This is just project management. Teams of software devs have been doing this for decades. And it’s easier with agents because there is no harm in letting one sit idle.
On the same project you can use worktrees or otherwise separate clones of the repo - that part is not that bad. My comment was just about my own context switch.
A technique I have found that works well is to have it working on one feature and then to have another session planning the next. Whilst it's busy generating some code, I open up another instance, tell it the next task and instruct it to create a Gherkin feature file with an implementation plan. I then go back and forth between reviewing the code for the current feature and the plan for the next one.
It still amuses me how literally people took Karpathy's famous tweet around vibe coding https://x.com/karpathy/status/1886192184808149383
If people were to actually read beyond the first sentence, it would become clear very quickly that this was meant to be tongue in cheek.
Because most people have a context window of 10 tokens, they do not read further than the first sentence (or two).
I don't think it's tongue-in-cheek at all. It refers to a specific type of LLM coding, where you literally don't care about how bad the code is and just code stuff and hope it works. That's how I use the term, and that's why I use it rarely.
People took it seriously because that's exactly how a lot of LLM users think and exactly what they want 'coding' to be. Honestly I'm not even certain it is satire.
I'm starting to think that the rather slow nature of Claude Code is a feature. In fact if they suddenly sped things up by 10X, I would want an option to slow it back down. Sometimes I am fine with it working unsupervised while I empty the dishwasher or take a shower, but a lot of the time I watch it work. Not only does this help me stop it from going down rabbit holes / chewing through all of my Opus usage cap, but I have a much better understanding of what it's built, in the same way I might if I was pair-programming with someone and they were driving.
The idea of having multiple instances working in parallel sounds like a nightmare to me. I find this technology works best when it is guided, ideally in real time.
I find Codex and Claude Code to have different strengths/weaknesses and wanted to be able to use them from a single interface. Currently hacking on https://devfleet.ai to make agent management easier for myself.
Briefly mentioned in the article, but async agents really thrive on small and scoped issues. Imagine hooking them up to your feedback tool (e.g. Canny) and automatically having a PR ready as you review the customer feedback. Now, this would likely not work for large asks, but for smaller asks you can just accept the PR and ship it really fast!
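As a sketch of that hookup (the feedback payload shape and the "agent-ok" label convention are assumptions; how issues get routed to Copilot or your agent depends on your setup):

    # Turn a customer-feedback item into a small, scoped GitHub issue that an
    # async agent can pick up. Field names and labels here are assumptions.
    import os
    import requests

    GITHUB_TOKEN = os.environ["GITHUB_TOKEN"]
    REPO = "example-org/example-app"  # assumption: your repository

    def file_issue_from_feedback(feedback: dict) -> str:
        """Create a labeled issue via GitHub's REST API; returns the issue URL."""
        resp = requests.post(
            f"https://api.github.com/repos/{REPO}/issues",
            headers={
                "Authorization": f"Bearer {GITHUB_TOKEN}",
                "Accept": "application/vnd.github+json",
            },
            json={
                "title": feedback["title"],
                # Small, well-scoped context in the body is what lets the
                # agent produce a PR you can just review and accept.
                "body": feedback["details"] + "\n\nSource: customer feedback",
                "labels": ["agent-ok", "small-scope"],
            },
            timeout=30,
        )
        resp.raise_for_status()
        return resp.json()["html_url"]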
Cool project! Do you think a lot of Codex's strengths are just from using GPT-5 as the model?
The Codex model is trained differently than the normal models. It has extra training on how to use the CLI, and I find it to be better at project-scoped tasks (e.g. running tests, migrations, etc.), whereas in my experience Claude is the better coding model.
When I read the title, I thought you were referring to https://parallel.ai, which also is a game changer in my opinion :)
PS: I have no affiliation with Parallel the company
Look, I like AI coding but we're already way past the need for parallelism.
LLMs write so much code in such a short time that the bottleneck is already the human having to review, correct, rewrite.
Parallel agents working on different parts of the application just compound this problem; it's impossible to catch up.
The only far-fetched use case I can see is swarming hundreds of solutions against a properly designed test suite and spec documents and having an agent select the best solutions.
Still, I'm quite convinced humans would be the bottleneck.
You are the main thread:
https://www.claudelog.com/mechanics/you-are-the-main-thread/
It really depends on the project. For example, there's a lot of thorny devops debugging where I can just let Claude spin for 30 minutes and it'll solve the problem (or fail) with a relatively short final answer.
The sweet spot for me tends to be running one of these slower projects on a worktree in the background, and one more active coding project.
Yeah sure, I mean, there will always be problems you can swarm...
Exactly. With other models that are not Claude, the code generation for an issue takes a minute at most, whereas writing the detailed specification for it as a human takes me days or longer. Parallel code generation is as relevant to me as having a fast car stuck in traffic at a red light.
So:
(1) I feel like most people call these async agents, though maybe "parallel" is the term that will stick.
(2) Async is great for reasons other than concurrent execution.
(3) Concurrent execution is tricky, at least for tightly defined projects, because the PRs will step on each other, and (maybe this is just me) I would rather rewrite an entire project than try to pick through a complicated merge conflict.
Nah, I saw this problem a while ago and already spec'd out the solution. First, agents need to be doing atomic commits, and second, you can just have a massive merge queue with bisection. If you're using Bazel you can handle CI gating on thousands of PRs with very little overhead, and when a merge batch fails you find the bad patch set in O(log n) time and dispatch it to an agent for reconciliation. I even built a prototype; it works great in benchmarks, but I don't have a need for it over merge trains in GitLab yet.
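The bisection half of that is just binary search over patch batches. A minimal sketch, assuming exactly one culprit per failing batch and patches that apply independently; ci_passes() stands in for the real gated build:

    # Find the one bad patch in a failing merge batch with O(log n) CI runs
    # instead of one run per patch.
    from typing import Callable, Sequence

    def find_bad_patch(
        patches: Sequence[str],
        ci_passes: Callable[[Sequence[str]], bool],
    ) -> str:
        """Assumes the full batch fails CI and contains exactly one bad patch;
        halves the candidate range until one patch remains."""
        lo, hi = 0, len(patches)
        while hi - lo > 1:
            mid = (lo + hi) // 2
            if ci_passes(patches[lo:mid]):
                lo = mid  # first half is clean; culprit is in the second half
            else:
                hi = mid  # failure reproduces with just the first half
        return patches[lo]

    # The culprit is dispatched to an agent for reconciliation while the rest
    # of the batch re-enters the merge queue.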
I agree, async does feel like a better description. I wish I used that term for the title.
My experience with parallel agents is that the bottleneck is not how fast we can produce code but the speed at which we can review it and context switch. Realistically, I don’t think most people have the mental capacity to supervise more than one simultaneous task of any real complexity.
To those who have worked with autonomous background agents techniques, can you describe the stack and the workflow?
Has anyone set up a local only autonomous agent, using an open source model from somewhere like huggingface?
Still a bit confused on the details of implementing the technique. Would appreciate any explanations (thanks in advance).
Context switching between more than 2 threads of work is untenable if you want to really review code in depth. And with AI, you need to go through everything with a fine-toothed comb.
The same work as any senior software engineer reviewing his team's work, imho.
Eh, with humans you quickly learn who you can trust and who needs the super-skeptical detailed look. With LLMs you have to be super skeptical of everything.
Counterpoint, you need to develop more robust automated systems so you don't have to go through everything with a fine toothed comb.
Yeah, I find I need to interrupt Claude at least once every two turns to prevent it from going off into the wrong rabbit hole.
Very strange that Devin and Claude code weren’t in the list of systems that support these workflows.
Less convincing when you open by gushing about every other lame AI tech and then proceed to insist that this new thing is the real revolution.
Comes across as someone who just wants to shill for AI for some reason.
So the solution that gets upvoted during this hype cycle is the one that requires throwing more money at these companies? Curious.
I use Claude every day. When I give it a program that does something straightforward in one file and it writes it from scratch, it does great. When I have it fix issues or add functionality to small to medium sized apps that have a mostly simple design, it does great. When I point it at codebases that are 20 years old and have a lot of indirection in their design due to years of hard lessons learned and a lot (like a LOT) of cases covered, it really struggles (just read my profile to know what codebase this is). I mostly try to get it to write changelog messages, docs and tests, where it works, but I have to really wrestle with it. I can't imagine doing anything on "vibes" and it all seems quite ridiculous if you are working on hardcore library oriented software with tens of thousands of users.
If we're going to say "who cares, with LLMs we'll never need 20-year-old codebases, we'll just keep writing new stuff", OK, you do you.
One thing that I find works really well is to ask it to research things in the codebase and write a plan first. Codex with GPT-5 is exceedingly good at doing this. Then ask it to write a plan for what it would do with that information, i.e., "I want you to research the codebase for <goal>. Then write a plan for how you would achieve <goal> given what you have learned."
Claude writes out plans and all that, it's good about that.
Sure would be great if AI agents could learn from conversations. That would really make things better. I tell Claude to capture things in the CLAUDE.md file, but I have to manually tend to that quite a lot.
Counteropinion: neither parallel nor single-threaded agents are a gamechanger.
I mean, they might be changing the game into producing more hard-to-maintain software, faster, but if that is the game you are playing, I don't want to participate.
I totally disagree about the monorepo argument. Mono-repos exist as a reaction to poor modularization and tight coupling which creates a need to update dependencies frequently.
The reason for having dependencies in the same repo as the trunk of the project code is precisely because the dependencies aren't sufficiently generic, too dependent on the project's business domain and so they require constant maintenance.
This tight coupling means that the agent requires more context to solve problems and implement simple features. The need for more context is a problem for agents, not a benefit. Agents benefit from modularization, loose coupling and well-chosen abstractions. These attributes do not correspond to the kinds of complex, tightly integrated codebases which benefit from having a monorepo structure.
Dependencies should be like tools. If you think of a hammer, you can do a lot of different jobs with the same hammer... You can debate whether or not a hammer is the right tool for any given job, but for those jobs where a hammer is the right tool, how often do you need to tweak the hammer itself? A hammer solves a very specific problem but that problem can be generalized to countless different use cases. A hammer would make a good module.
Now if you did a project for a candle factory and let the project business domain leak into the design of your tools/modules, you might build a hammer out of wax to straighten out candles... Then in your next project, building a house, you will find that this hammer doesn't work for that case and needs to be modified. This is a failure of separation of concerns. The tool was not originally optimized for the specific task of applying blunt force to a limited area; it couldn't do that narrow job very well and that's why it needs to be changed. Had you built a hammer out of steel, it would likely have solved both problems even though it's a completely different use case.
Wouldn't the first "AI" use in coding be the code suggestions that IDEs have already had since before LLMs?
Or UML tools that generate code?