Ask HN: Are you using Spec Driven Development?
Developers working with LLMs in a professional environment, is Spec Driven Development really the way forward with LLM coding agents?
Over the past few months I've really tried to lean into SDD. After trying SpecKit and GSD, I finally settled on OpenSpec as my preferred spec generator.
However I'm not quite ready to give up on understanding the actual code that the LLM is producing.
To help maintain my comprehension of the generated code, I've tried a few different methods, such as generating smaller specs for more granular features and pausing after each major task block to review the code. Approaching this via "/ospx-apply Run until task 3.5 is complete and then pause for review" gives me the ability to vet the code being produced and make corrections if needed. But honestly, the major issue I have is that coding style and good code hygiene are often lacking, even with an appropriate AGENTS.md and Skills loaded (the usual issues of multiple helper methods and inconsistent naming).
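For illustration, the "run until task 3.5, then pause" idea can be sketched in a few lines of Python. This is only a sketch under assumptions: the dotted task numbering comes from the prompt quoted above, but `task_key` and `tasks_until` are hypothetical helpers, and nothing here reflects how OpenSpec itself stores or orders tasks.

```python
def task_key(task_id):
    """Turn '3.10' into (3, 10) so IDs compare numerically, not as strings."""
    return tuple(int(part) for part in task_id.split("."))


def tasks_until(task_ids, checkpoint):
    """Return the tasks up to and including the checkpoint, in numeric order.

    Hypothetical helper: the agent would work through this list and then
    stop so a human can review the code before the remaining tasks run.
    """
    limit = task_key(checkpoint)
    return [t for t in sorted(task_ids, key=task_key) if task_key(t) <= limit]
```

Comparing the IDs as integer tuples matters because a plain string sort would place "3.10" before "3.5".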
I can't help but wonder what developers are doing in their workplaces. Is SDD actually used as part of the developer toolkit? Am I trying to keep too much control over the code and should I just let the LLM be free within the bounds of a spec to guide it? Or should I abandon SDD and look at another path to ensure that code being produced can at least be maintained long term?
Might I suggest that you're making the whole process a little more difficult than it needs to be with all those specific tools and frameworks?
Example: I'm building a music player in Swift for macOS. I want to have visualizers that use Metal to render directly with the GPU. I don't understand GPU rendering. I never will. So, what do I do?
I tell a strong model like Opus or Fable to write a spec first. Tell it that I want "lesser/cheaper models" to be able to do the actual implementation without needing to guess or think too hard about it. It builds something exactly like this: https://github.com/bocan/bocan-music/blob/main/docs/design-s...
As you can see, I don't just tell it to build software. I tell it to build tests. My CLAUDE.md has guardrails. It knows that if it's told to build anything, it has to format, lint, run the new tests and old tests. It knows to keep my coverage above 80%. When it's done, it commits with a highly detailed commit message.
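The guardrails described above amount to a gate that must pass before a commit. Here is a minimal Python sketch of that gate, not the contents of the actual CLAUDE.md: the step names, `CheckResult`, and `ready_to_commit` are hypothetical, and only the 80% coverage figure comes from the post.

```python
from dataclasses import dataclass

# Threshold taken from the post; everything else is an illustrative assumption.
MIN_COVERAGE = 80.0

# Hypothetical step names mirroring "format, lint, run the new tests and old tests".
REQUIRED_STEPS = ("format", "lint", "new_tests", "old_tests")


@dataclass
class CheckResult:
    step: str
    passed: bool


def ready_to_commit(results, coverage_percent):
    """Return (ok, reasons).

    ok is True only if every required step passed and coverage is at least
    MIN_COVERAGE. Treating exactly 80% as acceptable is an assumption.
    """
    reasons = []
    passed = {r.step for r in results if r.passed}
    for step in REQUIRED_STEPS:
        if step not in passed:
            reasons.append(f"{step} did not pass")
    if coverage_percent < MIN_COVERAGE:
        reasons.append(
            f"coverage {coverage_percent:.1f}% is below {MIN_COVERAGE:.0f}%"
        )
    return (not reasons, reasons)
```

Returning the list of reasons, rather than a bare boolean, gives the agent something concrete to fix before it retries.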
When all that's done, I test the software extensively by hand. Does it do exactly what I want? If so, great. Next feature. If not, I go back and have my lesser/cheaper model fix the code and add tests. I rarely if ever update the specs when it finds bugs, as there's no need.
Cool idea, I hadn't considered having lesser models pick up the implementation from the strong model's plan.
Question regarding the actual code generated: is this a priority to you beyond checking that it meets the design doc? For example, if the LLM goes ahead and builds a feature that passes the tests, but the code contains duplicate functions, abstractions, etc., would you steer the LLM to fix that, or even dive in yourself to rearchitect?
With that particular project, no. Mostly because I applied a world of SKILLS to it at the start, reviewed the code as it was being built, and never found any such problem, so eventually I just started trusting it.
In other projects (my work life revolves around Terraform and infrastructure as code) I absolutely check and re-steer on a regular basis. I _suspect_ I could resolve those issues with better Terraform SKILLs, but writing Terraform is more natural to me than dealing with Swift, so I haven't bothered.
I've used SDD since Feb for all my mid+ size projects. This works great for me from many angles:
- two levels of task decomposition (first into multiple workflow steps, then each task into multiple subtasks) help keep the session context clean and focused, improving adherence and reducing cost
- multiple levels of verifying the model's understanding: you verify the requirements step first, then the design step, and only later the code implementation. This helps ensure that even if a misimplementation happens, it is tactical (at the code level) rather than strategic (at the design level)
- the agent works as a helper to analyze what you need to build, interviews you, and builds a spec, which is better than just a short prompt
- works nicely with coding agents orchestrated around task queue
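A rough sketch of how the two levels of decomposition and the staged verification could feed a task queue, in Python. The names are assumptions for illustration only: `Subtask`, `build_queue`, and `run` are hypothetical, and human verification is modeled simply as a set of steps that have been cleared to run.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Subtask:
    step: str  # first level: the workflow step (e.g. requirements, design)
    name: str  # second level: a subtask inside that step


def build_queue(workflow):
    """Flatten {step: [subtask names]} into a FIFO queue, preserving step order."""
    queue = deque()
    for step, names in workflow.items():
        for name in names:
            queue.append(Subtask(step, name))
    return queue


def run(queue, approved_steps):
    """Process subtasks in order, stopping at the first step not yet cleared.

    Returns the names of the subtasks handled; the rest stay in the queue
    until a human verifies the earlier output and clears the next step.
    """
    done = []
    while queue and queue[0].step in approved_steps:
        done.append(queue.popleft().name)
    return done
```

Each subtask carries only its own step and name, which is one way to keep the context an agent sees for a given unit of work small.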
Started with GSD and later implemented my own to better fit my typical feature size.
I use https://github.com/EveryInc/compound-engineering-plugin/blob... from https://every.to.
Here's a write-up on it: https://every.to/guides/compound-engineering
There's so much better out there. I use https://github.com/ChristopherAlphonse/adadex with the gsd skill.