Obviously I'm an AI-tools skeptic, but this is hilarious:
> 1. Prepare issues with sufficient context
> Start by ensuring each GitHub issue contains enough context for agents to understand what needs to be built and how it integrates with the system. This might include details about feature behavior, file locations, database structure, or specific requirements such as displaying certain fields or handling edge cases.
> You can’t do half-hearted prompts and fix as you go, because those fixes come an hour later.
> Skills That Become More Important
> Full-stack understanding
> Problem decomposition
> Good writting skills
> QA and Code Review skills
This is just software engineering?!?
edit: On the other hand, maybe I can convince people in my org to get better at software engineering by telling them it's for the AI to work better.
I tend to agree with English being the new programming language. Those with English communication struggles will struggle to code this way.
> This is just software engineering?!?
Absolutely. The existence of vibe coding does not mean production code is going to be engineered without the same principles we've always used - even if we're using AI to generate a lot more code than before.
Any crowd suggesting that this was not the case has lost the plot, imo.
People find it a lot more palatable when the AI requires all this information than when software engineers do, though. If I ask for clear requirements I'm asked to just figure it out. But if the AI implements nonsense without clear requirements, that's the fault of the specs.
I am amazed at how suddenly people are on board with writing clear design documentation now that it means AI can generate the code rather than humans.
I wonder how much better humans would be at generating code given the same abundance of clearly-written design documentation?
My workplace was always pretty good at writing requirements so I call myself a chatgpt wrapper now.
Well, that’s because the software engineers are irritating when they push back and say ‘no’ or ‘wtf’.
When the AI does it, it’s being polite and stuff. /s, kinda.
You're right. That's an excellent observation! I will make sure to use those language patterns in all my professional communications going forward.
/s but not really?
I consider it my job to figure out the requirements. The fact that they aren't specified in detail allows me to do what I think is best rather than being bound by often arbitrary specifications.
I judge which decisions I make and which ones I bring up to my team/PO/whatever. Most of the time I just do what I think is best, sometimes I'll do something and then bring it up later like "I did this this way but if that doesn't work I can change it", typically for things that will be easy to change later. Some things I ask about before I do them because they won't be easy to change later.
I'll often take technical liberties with frontend designs, for example I'll use an HTML select rather than reinventing the drop-down just to be able to put rounded corners on the options. I'll style scrollbars using the very limited options CSS provides rather than reinvent scrollbars just to follow the design exactly. Most of the time nobody cares, we can always go back later and do these types of things if we really want a certain aesthetic.
I have never had the impression that my questions bother people, rather the opposite. I've had multiple designers say they appreciate the way I interact with them, I respect their work and their designs but I ask them if something looks like an oversight or I'm not exactly sure what their intention is. POs and such are always happy to answer simple questions, I make it easy for them: here's a decision we need to make, I want you to make it. Maybe I have a suggestion for what I would prefer and some reasons why I prefer that solution.
I don't expect them to think of everything and answer all my potential questions in advance, that's just unnecessary and difficult work.
> On the other hand, maybe I can convince people in my org to get better at software engineering by telling them its for the AI to work better.
Really good engineering practices are fundamental to get the most out of AI tooling. Convoluted documentation, code fragmentation, etc all pollute context when working with AI tools.
In my experience, just having one outdated word (especially if it's a library name) anywhere in code or documentation can create major ongoing headaches.
The worst part of it is trying to avoid negative assertions. When the AI tooling keeps trying to do "the wrong thing" it's sometimes a challenge to rephrase the instruction as a positive assertion about "the right thing": say "always use the existing date helpers" rather than "don't hand-roll date parsing".
Yes. AI assisted software engineering is still software engineering. I don't see that part changing anytime soon.
> This is just software engineering?!?
Indeed yes. Although most places have been shipping software in a "software development" and/or "programming" fashion for many years.
Many, many places certainly do not do the engineering part, even though the resulting product is software.
Yeah, it’s funny, we may finally have a way to get developers to write documentation for other developers, it’s just that the other developers aren’t human!
Yes, the ability to clearly and unambiguously communicate what's required works on both humans and machines.
lmao, "good writting skills" =)
[sic]
I lol'ed too but then thought - at least he actually wrote this!
Heh, damn. Made a typo at the worst spot
The author is lying. My team and I are heavy users of Claude Code and other agents and it ain't like this. You need to manage an AI coding agent carefully and course correct frequently. There are cases for parallel agents but they are tasks like parallel document fetches and summarization, and other tasks that don't require supervision.
The idea of having multiple parallel agents merge pull requests or resolve issues in parallel is still just an idea.
Please don’t post or upvote attention seeking crap like this. It gives a very exciting and promising technology a bad name.
Your comment is disproportionately rude. Just because your team can't leverage multiple coding agents doesn't mean no one else can.
And even if OP also can't, this is a good place to discuss possible problems and solutions for parallel development using coding agents.
Please refrain from gatekeeping.
No. Unfortunately there’s a problem now of people blatantly lying about the ability of LLMs to get attention. And it’s extremely effective.
I say this as someone who uses them every day for programming and is also excited now and for the future possibilities. The blatant lying just needs to stop, though, and needs to be called out.
“With this approach, I can manage to have 10–20 pull requests open at once, each handled by a dedicated agent.”
A quote from the post. No, I think my post is calibrated quite well considering what OP's post does to our industry.
Having 20 PRs open at once doesn't necessarily mean managing 20 agents simultaneously.
It can mean, for example, that 2 agents worked for some time in a list of 20 TODO features and produced 20 PRs to be reviewed. They could have worked overnight even.
You're seemingly judging from the least generous interpretation, which is not constructive and is also against HN guidelines fyi.
I’m not ok with someone self promoting here at the cost of thousands of people thinking they’re either not smart enough, or are doing something incorrectly. We saw this same pattern during the dot com boom a quarter century ago with self promoters creating a “you just don’t get it” culture which eventually collapsed like a house of cards. What we share should be reproducible by others and we should avoid hand wavey excitement without substance. Especially here on HN where many of the next great companies and ideas will be born.
Technology evolves. At some point there are going to be things that other people are doing that you can't replicate yet. That doesn't mean you're not smart enough. But it might mean that you are doing something wrong. Often, though, you've just got to try different things or wait for methodologies to consolidate and become mainstream.
Even if parallel agents is not something easily done currently, debating about ways to do it is constructive enough for me.
Exactly! To make it more clear, here is how I approach my day:
9-10am: I comb through our issue list, figuring out which issues are well defined and which need more input or a design decision. => I pick a handful, let's say 10, that I kick off to run in the background, and let's say another 10 for further specification.
10am-2pm: I tinker with the 10 issues to figure out the exact specs and to expand the requirement list.
2pm-6pm: I review the code written by the agents, one by one. I kick off further work on things that need more input, or merge things that look good.
Can this guy, or someone else, post a full day's (4-8 hours, or whatever is spent in the weeds) stream of work to YouTube or something? I just want to watch the process to see what I'm missing. Or if there is anyone that already does that, can they recommend it to me? I would appreciate it.
https://youtu.be/xAKVi_jvvg4
Two hours of Web Dev Cody.
Got through about 45 min at 2x speed / some skipping ahead out of pure fascination. Man that's something else. It's like bug-driven-development. Get the LLM to churn out a huge chunk of text, then skim the code for about 10 seconds and say it looks good. Then spend a while testing and hitting one error after the next until it finally seems to work. Repeat.
Wow, I didn't expect dystopias to be so boring.
If somebody producing code of that low quality worked with me, I could see myself spilling coffee or acid on them or their laptop.
Web Dev Cody is great. I recommend him.
I (author) sometimes stream my work here as well https://www.youtube.com/@operatelybackstage.
Are you saying that because you're also skeptical? I haven't had the best time switching to agent coding. I mean for throwaway work it's fine, but it's kind of boring and aider still messes up from time to time.
I probably lean on the sceptical side of the spectrum. I'm not against giving it a go if I can get value out of it but I'm not having the wonderful experience that these people are having.
- The asynchronous nature of it slows me down and it feels the opposite of what this bloke is saying around getting into a flow.
- I miss things because I'm not thinking it all the way through.
- The issues with errors or hallucinations.
- It does not feel faster (I might blow through a couple of things really fast, but the issues created elsewhere sometimes eat all that saved time up).
- The quality of work is all over the shop. Bigger projects just fall apart after a while.
I also wonder if the way I think is hindering me. I don't like natural language. I struggle to communicate at the best of times. All my emails are dot points. If someone asks me for a diagram I write it in plantuml or using a python library. I work in DevOps and love declarative manifests and templates.
Try as an initial step having the agentic AI improve your prompt for you. I have a "prompt improvement prompt template", which is a standardized document (customized for each project I'm working on), that has a bunch of boilerplate instructions in it, along with a section where I paste in my first-draft prompt. I then feed this document (boilerplate + crappy prompt) into the AI and it creates a way better prompt for me. Then I edit that to ensure it's correct, and then that becomes the prompt I use.
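For anyone who wants to picture the mechanics, here's a minimal sketch of that assembly step in Python; the file names and the {{DRAFT_PROMPT}} placeholder are assumptions, not the commenter's actual template:

    # Sketch of the "prompt improvement" assembly step described above.
    # File names and the placeholder token are assumptions; adjust per project.
    from pathlib import Path

    BOILERPLATE = Path("prompt_improvement_template.md")  # per-project boilerplate
    DRAFT = Path("draft_prompt.txt")  # your rough first-draft prompt

    def build_improvement_request() -> str:
        """Combine boilerplate instructions with a first-draft prompt into the
        document you paste into the AI, which then returns an improved prompt."""
        template = BOILERPLATE.read_text()
        draft = DRAFT.read_text().strip()
        return template.replace("{{DRAFT_PROMPT}}", draft)

    if __name__ == "__main__":
        print(build_improvement_request())

You then review and edit the AI's improved prompt before using it, as described above.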
Is it just me, or does the post sound like (show!) they haven't tried the mentioned approach in real life?
Because in real life, one agent tries to fix a build issue with rm -rf node_modules while the other is already running a server (i.e. npm server), and they conflict with each other nearly all the time! (Even if it's not a destructive action, the second npm server will most likely fail due to port-allocation conflicts.)
Meanwhile, what I found helpful is that:
1. Clone the same repo twice or three times
2. In each terminal or whatever, `cd` into it
3. Create a branch, run your ~commands~ prompts (each is their own session with their own repo)
4. commit/push then merge/rebase (also resolve conflicts if needed, use LLM again if you want)
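If you want to script that, here's a rough sketch of steps 1-3; the repo URL and branch names are hypothetical, and step 4 (merge/rebase) stays manual or goes to a later agent session:

    # Rough automation of the clone-per-agent workflow above (steps 1-3).
    # REPO_URL and the branch naming are hypothetical placeholders.
    # Note: this isolates files on disk, not ports, so dev servers can still clash.
    import subprocess
    from pathlib import Path

    REPO_URL = "git@github.com:example/project.git"  # assumption: your repo
    N_CLONES = 3

    for i in range(1, N_CLONES + 1):
        workdir = Path(f"project-agent-{i}")
        if not workdir.exists():
            subprocess.run(["git", "clone", REPO_URL, str(workdir)], check=True)
        # Each agent session then runs inside its own clone on its own branch,
        # so file edits never collide.
        subprocess.run(
            ["git", "-C", str(workdir), "checkout", "-B", f"agent-task-{i}"],
            check=True,
        )

git worktrees (mentioned downthread) get you the same on-disk isolation while sharing one object store.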
Any other way multiple agents work in harmony in a single repo (filesystem/directory) at the same time is a pipe-dream with the current status of the MCP and agents.
Let alone being aware of each other (agents), they don't even have a proper locking mechanism. As soon as you make a new change, most of the editing (search/replace) functionality in the MCPs fails miserably. Then they re-read the entire file, just creating context rot or over-filling with already-existing stuff. Soon you run out of tokens (or just pay extra for no reason).
> edit: comments mentioned that each agent runs in a VM isolated from others, kinda makes sense but still, there will be massive merge-conflicts unless each agent runs in a completely different set of service/code-base (ie frontend vs backend or couple of micro-services)
The post is about using GitHub's integrated Copilot tooling, where each issue gets its own instance presumably running in a sandbox. This sidesteps the issues you're talking about here.
I don't claim to have lots of experience with this, I've only been doing it for a couple of weeks, but I do feel that some of your comments are disingenuous.
---
> Any other way multiple agents work in harmony in a single repo (filesystem/directory) at the same time is a pipe-dream with the current status of the MCP and agents.
Every agent runs in a separate VM on GitHub.
> Let alone being aware of each other (agents), they don't even have a proper locking mechanism.
Never claimed this. Feels like a strawman argument.
git workspaces?
u mean worktrees
The sweet spot for me is 2 agents on different projects. Surprisingly the context switch is easy. It's harder when doing 2 tasks on the same project.
> on different projects
This seems like an important caveat the author of the article failed to mention when they described this:
> you can have several agents running simultaneously - one building a user interface, another writing API endpoints, and a third creating database schemas.
If these are all in the same project then there has to be some required ordering to it, or you get a frontend written against endpoints the backend doesn't have, and a backend that uses a different database schema than the separately generated one.
This is just project management. Teams of software devs have been doing this for decades. And it’s easier with agents because there is no harm in letting one sit idle.
On the same project you can use worktrees or otherwise separate clones of the repo - that part is not that bad. My comment was just about my own context switch.
A technique I have found that works well is to have it working on one feature and then to have another session planning the next. Whilst it's busy generating some code, I open up another instance, tell it the next task and instruct it to create a Gherkin feature file with an implementation plan. I then go back and forth between reviewing the code for the current feature and the plan for the next one.
It still amuses me how literally people took Karpathy's famous tweet around vibe coding https://x.com/karpathy/status/1886192184808149383
If people were to actually read beyond the first sentence, it would become clear very quickly that this was meant to be tongue in cheek.
Because most people have a context window of 10 tokens, they do not read further than the first sentence (or two).
I don't think it's tongue-in-cheek at all. It refers to a specific type of LLM coding, where you literally don't care about how bad the code is and just code stuff and hope it works. That's how I use the term, and that's why I use it rarely.
People took it seriously because that's exactly how a lot of LLM users think and exactly what they want 'coding' to be. Honestly I'm not even certain it is satire.
I'm starting to think that the rather slow nature of Claude Code is a feature. In fact if they suddenly sped things up by 10X, I would want an option to slow it back down. Sometimes I am fine with it working unsupervised while I empty the dishwasher or take a shower, but a lot of the time I watch it work. Not only does this help me stop it from going down rabbit holes / chewing through all of my Opus usage cap, but I have a much better understanding of what it's built, in the same way I might if I was pair-programming with someone and they were driving.
The idea of having multiple instances working in parallel sounds like a nightmare to me. I find this technology works best when it is guided, ideally in real time.
I find Codex and Claude Code to have different strengths/weaknesses and wanted to be able to use them from a single interface. Currently hacking on https://devfleet.ai to make agent management easier for myself.
Briefly mentioned in the article, but async agents really thrive on small and scoped issues. Imagine hooking them up to your feedback tool (e.g. Canny) and automatically having a PR ready as you review the customer feedback. Now, this would likely not work for large asks, but for smaller asks you can just accept the PR and ship it really fast!
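As a sketch of that hookup (the feedback payload shape and the "agent-ok" label convention are assumptions; how issues get routed to Copilot or your agent depends on your setup):

    # Turn a customer-feedback item into a small, scoped GitHub issue that an
    # async agent can pick up. Field names and labels here are assumptions.
    import os
    import requests

    GITHUB_TOKEN = os.environ["GITHUB_TOKEN"]
    REPO = "example-org/example-app"  # assumption: your repository

    def file_issue_from_feedback(feedback: dict) -> str:
        """Create a labeled issue via GitHub's REST API; returns the issue URL."""
        resp = requests.post(
            f"https://api.github.com/repos/{REPO}/issues",
            headers={
                "Authorization": f"Bearer {GITHUB_TOKEN}",
                "Accept": "application/vnd.github+json",
            },
            json={
                "title": feedback["title"],
                # Small, well-scoped context in the body is what lets the
                # agent produce a PR you can just review and accept.
                "body": feedback["details"] + "\n\nSource: customer feedback",
                "labels": ["agent-ok", "small-scope"],
            },
            timeout=30,
        )
        resp.raise_for_status()
        return resp.json()["html_url"]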
Cool project! Do you think a lot of Codex's strengths are just from using GPT-5 as the model?
The Codex model is trained differently than the normal models. It has extra training on how to use the CLI, and I find it to be better at project-scoped tasks (e.g. running tests, migrations, etc.), whereas in my experience Claude is the better coding model.
When I read the title, I thought you were referring to https://parallel.ai, which also is a game changer in my opinion :)
PS: I have no affiliation with Parallel the company
Look, I like AI coding but we're already way past the need for parallelism.
LLMs write so much code in such a short time that the bottleneck is already the human having to review, correct, rewrite.
Parallel agents working on different parts of the application just compound this problem; it's impossible to catch up.
The only far-fetched use case I can see is swarming hundreds of solutions against a properly designed test suite and spec documents and having an agent select the best solutions.
Still, I'm quite convinced humans would be the bottleneck.
You are the main thread:
https://www.claudelog.com/mechanics/you-are-the-main-thread/
It really depends on the project. For example, there's a lot of thorny devops debugging where I can just let Claude spin for 30 minutes and it'll solve the problem (or fail) with a relatively short final answer.
The sweet spot for me tends to be running one of these slower projects on a worktree in the background, and one more active coding project.
Yeah sure, I mean, there will always be problems you can swarm...
Exactly. With other models that are not Claude, the code generation for an issue takes a minute at most, whereas writing the detailed specification for it as a human takes me days or longer. Parallel code generation is as relevant to me as having a fast car stuck in traffic at a red light.
So:
(1) I feel like most people call these async agents, though maybe "parallel" is the term that will stick.
(2) Async is great for reasons other than concurrent execution.
(3) Concurrent execution is tricky, at least for tightly defined projects, because the PRs will step on each other, and (maybe this is just me) I would rather rewrite an entire project than try to pick through a complicated merge conflict.
Nah, I saw this problem a while ago and already spec'd out the solution. First, agents need to be doing atomic commits, and second, you can just have a massive merge queue with bisection. If you're using Bazel you can handle CI gating on thousands of PRs with very little overhead, and when a merge batch fails you find the bad patch set in O(log n) time and dispatch it to an agent for reconciliation. I even built a prototype; it works great in benchmarks, but I don't have a need for it over merge trains in GitLab yet.
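The bisection half of that is just binary search over patch batches. A minimal sketch, assuming exactly one culprit per failing batch and patches that apply independently; ci_passes() stands in for the real gated build:

    # Find the one bad patch in a failing merge batch with O(log n) CI runs
    # instead of one run per patch.
    from typing import Callable, Sequence

    def find_bad_patch(
        patches: Sequence[str],
        ci_passes: Callable[[Sequence[str]], bool],
    ) -> str:
        """Assumes the full batch fails CI and contains exactly one bad patch;
        halves the candidate range until one patch remains."""
        lo, hi = 0, len(patches)
        while hi - lo > 1:
            mid = (lo + hi) // 2
            if ci_passes(patches[lo:mid]):
                lo = mid  # first half is clean; culprit is in the second half
            else:
                hi = mid  # failure reproduces with just the first half
        return patches[lo]

    # The culprit is dispatched to an agent for reconciliation while the rest
    # of the batch re-enters the merge queue.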
I agree, async does feel like a better description. I wish I used that term for the title.
My experience with parallel agents is that the bottleneck is not how fast we can produce code but the speed at which we can review it and context switch. Realistically, I don’t think most people have the mental capacity to supervise more than one simultaneous task of any real complexity.
To those who have worked with autonomous background agents techniques, can you describe the stack and the workflow?
Has anyone set up a local only autonomous agent, using an open source model from somewhere like huggingface?
Still a bit confused on the details of implementing the technique. Would appreciate any explanations (thanks in advance).
Context switching between more than 2 threads of work is untenable if you want to really review code in depth. And with AI, you need to go through everything with a fine-toothed comb.
The same work as any senior software engineer reviewing his team's work, imho.
Eh, with humans you quickly learn who you can trust and who needs the super-skeptical detailed look. With LLMs you have to be super skeptical of everything.
Counterpoint, you need to develop more robust automated systems so you don't have to go through everything with a fine toothed comb.
Yeah, I find I need to interrupt Claude at least once every two turns to prevent it from going off into the wrong rabbit hole.
Very strange that Devin and Claude code weren’t in the list of systems that support these workflows.
Less convincing when you open by gushing about every other lame AI tech and then proceed to insist that this new thing is the real revolution.
Comes across as someone who just wants to shill for AI for some reason.
So the solution that gets upvoted during this hype cycle is the one that requires throwing more money at these companies? Curious.
I use Claude every day. When I give it a program that does something straightforward in one file and it writes it from scratch, it does great. When I have it fix issues or add functionality to small to medium sized apps that have a mostly simple design, it does great. When I point it at codebases that are 20 years old and have a lot of indirection in their design due to years of hard lessons learned and a lot (like a LOT) of cases covered, it really struggles (just read my profile to know what codebase this is). I mostly try to get it to write changelog messages, docs and tests, where it works, but I have to really wrestle with it. I can't imagine doing anything on "vibes" and it all seems quite ridiculous if you are working on hardcore library oriented software with tens of thousands of users.
If we're going to say "who cares, with LLMs we'll never need 20-year-old codebases, we'll just keep writing new stuff", OK, you do you.
One thing that I find works really well is to ask it to research things in the codebase and write a plan first. Codex with GPT-5 is exceedingly good at doing this. Then ask it to write a plan for what it would do with that information, i.e., "I want you to research the codebase for <goal>. Then write a plan for how you would achieve <goal> given what you have learned."
Claude writes out plans and all that, it's good about that.
Sure would be great if AI agents could learn from conversations. That would really make things better. I tell Claude to capture things in the CLAUDE.md file, but I have to manually tend to that quite a lot.
Counteropinion: neither parallel nor single-threaded agents are a gamechanger.
I mean, they might be changing the game into producing more hard-to-maintain software, faster, but if that is the game you are playing, I don't want to participate.
I totally disagree about the monorepo argument. Mono-repos exist as a reaction to poor modularization and tight coupling which creates a need to update dependencies frequently.
The reason for having dependencies in the same repo as the trunk of the project code is precisely because the dependencies aren't sufficiently generic, too dependent on the project's business domain and so they require constant maintenance.
This tight coupling means that the agent requires more context to solve problems and implement simple features. The need for more context is a problem for agents, not a benefit. Agents benefit from modularization, loose coupling and well-chosen abstractions. These attributes do not correspond to the kinds of complex, tightly integrated codebases which benefit from having a monorepo structure.
Dependencies should be like tools. If you think of a hammer, you can do a lot of different jobs with the same hammer... You can debate whether or not a hammer is the right tool for any given job, but for those jobs where a hammer is the right tool, how often do you need to tweak the hammer itself? A hammer solves a very specific problem but that problem can be generalized to countless different use cases. A hammer would make a good module.
Now if you did a project for a candle factory and let the project business domain leak into the design of your tools/modules, you might build a hammer out of wax to straighten out candles... Then in your next project, building a house, you will find that this hammer doesn't work for that case and needs to be modified. This is a failure of separation of concerns. The tool was not originally optimized for the specific task of applying blunt force to a limited area; it couldn't do that narrow job very well and that's why it needs to be changed. Had you built a hammer out of steel, it would likely have solved both problems even though it's a completely different use case.
Wouldn't the first "AI" use in coding be the code suggestions that IDEs have already had since before LLMs?
Or UML tools that generate code?