With the rise of Large Language Models (LLMs), coding has never been more accessible, but also never more misunderstood. The promise of AI-generated software is intoxicating—until you hit a wall. You try a few prompts, the AI generates something that looks great, and then, as you attempt to scale or maintain it, the cracks appear. It forgets. It breaks. It misinterprets.
The fundamental issue is not the capability of AI, but how we use it. AI is great at generating convincing answers, but the best engineers start by asking great questions before giving any answer.
Why does AI still fail to deliver full-fledged applications?
We believe there are three fundamental reasons:
- Lack of semantic context: AI makes assumptions based on its training data, which rarely match the exact nuances of your project. This is partly the human's fault, a result of what we call the "lazy prompting syndrome": because LLMs are so good at generating answers, we expect incredible outputs from very short prompts.
- Forgetting and Fragility: Even when writing small snippets of code, LLMs can take steps backwards in the implementation, forgetting lines of code or breaking working algorithms while trying to improve them.
- Limited understanding of the existing codebase: This leads to changes that unintentionally break other parts of the codebase. Even when all the pieces of the puzzle are functional on their own, they may not assemble well together.
In programming, the bottleneck isn't typing—it's thinking. Defining what the code should do is far more important than focusing on how it should do it. Without clear guidance, we leave too much to the AI, expecting it to determine both the "what" and the "how." Our role is to define the "what," while AI can handle the "how."
The Future of AI-Assisted Development: A New Approach
To build reliable software using AI, we must fundamentally change our approach. Instead of asking AI to generate entire applications in a few prompts, we must break the process down into atomic, well-defined steps.
Here's how we believe it should be done:
1. Separation of Concerns
Let's use an analogy. For a democracy to work well, it is established that the three main powers should be held by separate bodies. To build software with AI, we must separate the AI's tasks along the same principles:
- One body writes the laws (specifications & acceptance criteria).
- Another enforces them (writing the implementation).
- A third interprets and audits them (writing the tests and asserting them).
Such an architecture acts as a guardrail, ensuring that the AI cannot cut corners on either the requirements or the implementation itself.
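As a minimal sketch of this separation (the email-validation example and all names are our own illustration, not a prescribed framework), the three bodies can live as three distinct artifacts: acceptance criteria written first, an implementation written against them, and an independent audit that checks one against the other:

```python
# Body 1: the "law" -- acceptance criteria, agreed upon before any code exists.
# Each entry maps an input to the behaviour the user should observe.
ACCEPTANCE_CRITERIA = [
    {"input": "hello@example.com", "is_valid": True},
    {"input": "not-an-email", "is_valid": False},
    {"input": "", "is_valid": False},
]

# Body 2: the "enforcer" -- an implementation (human- or AI-written) produced
# only after the criteria above were fixed. Deliberately simple for illustration.
def is_valid_email(address: str) -> bool:
    return "@" in address and "." in address.split("@")[-1] and bool(address)

# Body 3: the "auditor" -- tests derived from the criteria, so the
# implementation cannot quietly redefine what "valid" means.
def audit() -> None:
    for case in ACCEPTANCE_CRITERIA:
        assert is_valid_email(case["input"]) == case["is_valid"], case

audit()  # raises AssertionError if any body cut corners
```

Because the audit reads the criteria rather than the implementation, changing the requirements means changing one artifact, and the other two are forced to follow.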
2. TDD is the Future and AI Makes It Feasible
TDD (Test-Driven Development) is a development approach where automated tests are written first to define functionality before writing the actual code.
Aircraft are the safest way to travel today, and yet we don't rely on trust alone that they are built according to specification. We keep testing every single part and system more regularly than in any other industry.
Our belief is that the same should be true for an app served to millions of users. AI can generate impressive-looking code at breakneck speed, but without rigorous testing, it remains untrustworthy. Accepting AI-produced code without the reassurance of thorough tests is simply not acceptable.
In programming, TDD has long been recognized as the gold standard, but it has often been neglected due to time constraints. AI changes this. The cost of writing tests is now so low that it is inexcusable not to practice TDD rigorously.
With AI:
- We can generate comprehensive test suites effortlessly.
- We can demand that AI writes implementation only after tests exist.
- We can achieve robustness that was previously too expensive to justify.
- We can rewrite tests dynamically as requirements change, before updating the implementation.
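A hypothetical red-green cycle, in the order TDD prescribes (the shopping-cart example is ours, chosen only for brevity): the test is written first and would fail against an empty module; the implementation is then filled in just enough to make it pass:

```python
# Step 1 (red): the test exists before the implementation does.
# It encodes the requirement: a cart total is the sum of qty * unit price.
def test_cart_total():
    assert cart_total([]) == 0
    assert cart_total([("apple", 2, 1.50)]) == 3.00  # (name, qty, unit price)
    assert cart_total([("apple", 2, 1.50), ("pear", 1, 0.75)]) == 3.75

# Step 2 (green): the smallest implementation that satisfies the test.
def cart_total(items):
    return sum(qty * price for _name, qty, price in items)

test_cart_total()  # passes; the requirement, not the code, came first
```

If the requirement later changes (say, adding a discount), the test changes first, fails, and only then is the implementation updated, exactly the loop described above.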
3. AI Should Work on Small Isolated Units
Current AI models are great at handling small, specific tasks but struggle when expected to generate large, complex applications holistically. The solution is to break features and behaviours into atomic components. Each piece should be:
- Clearly defined in terms of the expected behaviour for the user.
- Thoroughly tested before moving forward.
- Designed to integrate seamlessly into a larger system.
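A sketch of what "atomic" can mean in practice (the slugify feature and helper names are our own illustration): each unit has one clearly defined behaviour, is tested in isolation, and only then composed into the larger feature:

```python
import re

# Unit 1: lowercase the text and strip surrounding whitespace.
def normalize(text: str) -> str:
    return text.strip().lower()

# Unit 2: replace any run of non-alphanumeric characters with a single dash.
def dashify(text: str) -> str:
    return re.sub(r"[^a-z0-9]+", "-", text).strip("-")

# Each unit is asserted on its own before moving forward.
assert normalize("  Hello World ") == "hello world"
assert dashify("hello world") == "hello-world"

# Integration: the larger behaviour is just the composition of tested parts.
def slugify(title: str) -> str:
    return dashify(normalize(title))

assert slugify("  Hello, World!  ") == "hello-world"
```

A task scoped this narrowly is exactly the kind an AI handles reliably, and a failure in the composed `slugify` points directly at one small, already-tested unit.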
4. Seniority as a Guardrail
Such a system is not meant to make junior engineers more productive; it's meant to be put in the hands of highly experienced developers who can both run this process and review the output.
We don't believe the future is a legion of junior engineers armed with AI-generated snippets; we believe it will be made by Navy SEAL-style teams of senior engineers who act as architects, reviewers, and quality enforcers. We may put most of the coding on autopilot, but we still need a pilot in the plane.
There is no doubt AI will change software development, but our take is that if we want it to work on large-scale projects, we must enforce separation of concerns, leverage TDD, break problems down into small, isolated units, and ensure senior oversight. Only then can we create a system where AI accelerates development without sacrificing reliability.
The goal is not to replace skilled engineers but to elevate their role, shifting from manual coding to guiding, reviewing, and enforcing quality.
The Autopilot Analogy
Forty years ago, commercial aircraft required two pilots actively managing every aspect of flight. Today, those same cockpits still have two pilots, but autopilot systems handle 90% of the actual flying. The pilots haven't disappeared—their role has evolved to focus on critical decision-making, monitoring systems, and handling exceptional situations.
Software development is following a similar trajectory. We will always need skilled engineers, but AI is becoming our autopilot. Engineers will spend less time writing boilerplate code and more time architecting solutions, making critical decisions, and ensuring quality. The result? Development velocity will increase dramatically while maintaining—or even improving—reliability and safety. Just as we still trust human pilots with our lives despite autopilot handling most of the flight, we'll continue to need expert engineers at the helm of software development, even as AI handles more of the routine coding.