Having been fully immersed in AI-powered development for over 2 years, people often ask me how it works in practice. This article aims to provide a few insights.

Since the start of 2026, AI has effectively surpassed humans at writing the vast majority of code required in typical business systems. You may have heard about hallucinations and other ways AI can fail, but these have largely been overcome by the most recent models, particularly Anthropic's Claude Opus 4.6.

Furthermore, given the overall speed gains provided by AI, any issues that do arise are easily resolved in a timeframe that's still much shorter than hand coding. And besides, didn't human developers ever make errors when building software?

AI can now generate the entire codebase for a typical application in a matter of hours or days, rather than weeks or months. The main task now is to manage that process to the required outcome.

At Provanta we primarily use Claude Code CLI from Anthropic as our development tool of choice, coupled with the Claude Opus model. We also use OpenAI's Codex tool (and model) along with a handful of other tools such as Google's Gemini for tasks like image generation.

Some developers prefer to use Codex, others like Google's Gemini, but these are matters of personal preference. We use Claude because, historically, it has produced better code across complete projects than any other model we have tried.

Claude Code CLI is a command-line tool, i.e. it runs in an old-school DOS-style window where you can type in prompts. You run it from a folder, and it can then read and write the files in that folder directly.

There is also a Windows version available, but as a developer, the CLI version feels more natural - for now at least!

We can use these tools with any programming language and technology platform, and we use a variety depending on the task at hand. Currently we are working in TypeScript, C#/.NET and Python, along with React, PostgreSQL, SQL Server and of course HTML/CSS and JavaScript.

We split our time between preparing the materials required to support code generation and conversing with Claude Code to manage its work. Before AI we lived in IDEs (integrated development environments), editing code by hand, but we find we do this far less now: it's quicker to do everything conversationally through Claude Code.

Claude Code is not a chatbot that generates snippets of code for you to copy and paste. It is an agentic coding tool — an AI that reads the codebase, understands context, makes changes across multiple files, runs commands, and executes builds and tests, all within a structured environment that gives experienced AI developers full visibility of what's going on and keeps them in control.

But Claude Code on its own is just a harness for Anthropic's models. What makes our process effective is the system and content we have built around it: the combination of Claude plug-ins, examples, prototyping, specification and automated testing that together produce commercial-grade software so much more quickly and accurately than traditional coding methods.

It starts with the prototype

Before any production code is written, we build a fully-working prototype of the target system. This is an AI-generated, interactive HTML, CSS and JavaScript application that runs in your browser and looks and feels just like the finished system. Users can click through screens, see their data laid out, and experience the workflow as it will be in production. (See my previous blog post "Rapid Solution Prototyping in the age of AI-Powered Development".)

The prototype serves two critical purposes. First, stakeholders can see and interact with what they are getting, which eliminates the ambiguity that plagues traditional written specifications. Second, it becomes a detailed reference that directly feeds the AI code generation process.

We iterate the prototype with stakeholders until it is an accurate representation of the desired system. At this point it effectively becomes the contractual specification — you approve what you see before the build begins and everyone knows exactly what to expect.

From prototype to specification documents

The prototype is now a key input to the code generation process and it is complemented by traditional specification documents that cover the aspects a prototype cannot show: business rules, architecture, data models, integration requirements, security constraints, and non-functional requirements such as performance and scalability targets.

These documents are not lengthy dissertations written for the sake of process. They are precise, structured references that Claude Code reads alongside the prototype to understand exactly what needs to be built. The combination of a visual prototype and written technical specifications gives the AI far richer context than either would provide alone, and it is this combination that drives the quality of the output.

Claude Code: skills, hooks and agents

Claude Code provides several plug-in mechanisms that allow us to customise and extend its behaviour to match our development process. Understanding these explains how AI-powered development moves beyond simple code generation into a structured, repeatable engineering workflow.

Skills

Skills are reusable prompt templates that encode domain knowledge and standard procedures. We maintain a library of skills that cover common patterns across our projects: setting up a new API endpoint with our standard error handling and authentication, creating database migration scripts that follow our naming conventions, generating UI components that match the design system established in the prototype, and many more.

When Claude Code is given a task, relevant skills are referenced and loaded into its context so it follows our established patterns automatically rather than inventing its own approach for each task. This helps to ensure consistency across the codebase. Skills effectively turn our accumulated development experience into something the AI can apply instantly and consistently.
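To make the idea concrete, a skill library can be pictured as a set of named templates with trigger conditions that determine when each one is loaded. The sketch below is hypothetical Python, not Claude Code's actual skill format (real skills are plain instruction files); the skill names, triggers and templates are invented for illustration:

```python
# Hypothetical sketch of matching a skill library to an incoming task.
# Skill names, trigger words and template text are all invented examples.

SKILLS = {
    "api-endpoint": {
        "triggers": {"api", "endpoint", "route"},
        "template": "Use our standard error handling and auth middleware...",
    },
    "db-migration": {
        "triggers": {"migration", "schema", "table"},
        "template": "Name migrations <timestamp>_<verb>_<noun>.sql...",
    },
}

def relevant_skills(task: str) -> list[str]:
    """Return the names of skills whose trigger words appear in the task."""
    words = set(task.lower().split())
    return sorted(name for name, s in SKILLS.items() if s["triggers"] & words)
```

The point is the selection step: only the skills relevant to the task at hand enter the context, rather than the whole library.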

Hooks

Hooks are automated actions that execute at specific points in Claude Code’s workflow. They allow us to enforce quality gates without manual intervention and trigger repetitive project management tasks. For example, we configure hooks that automatically run code reviews and formatting checks before any code is committed, that trigger tests after changes are made to critical modules, and that validate database migrations before they are applied.

Hooks are particularly powerful because they catch issues at the moment they are introduced, rather than leaving them to be discovered during a later review cycle. If Claude Code generates a function that does not meet our coding standards, the hook flags it immediately and Claude Code corrects it before moving on. This tight feedback loop is one of the reasons AI-powered development produces fewer bugs than traditional methods — problems are caught and fixed in seconds rather than days.
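As a concrete (and deliberately simplified) illustration, here is the kind of check a pre-commit hook might run. The two rules it enforces, line length and trailing whitespace, are invented examples standing in for a real coding standard:

```python
# Hedged sketch of a pre-commit quality gate. The rules checked here are
# invented examples; real hooks would run formatters, linters and tests.

def quality_gate(source: str, max_line_length: int = 100) -> list[str]:
    """Return a list of violations; an empty list means the commit may proceed."""
    violations = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if len(line) > max_line_length:
            violations.append(f"line {lineno}: exceeds {max_line_length} chars")
        if line != line.rstrip():
            violations.append(f"line {lineno}: trailing whitespace")
    return violations
```

An empty result lets the commit through; anything else is fed straight back to Claude Code to correct before it moves on.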

Agents and sub-teams

Agents in Claude Code are like members of a traditional dev team. They represent specialisations, and each has specific skills. Claude Code can run agents as sub-processes that have their own context window and handle complex, multi-step tasks autonomously. Rather than asking a single AI instance to do everything, we configure agents for specific roles such as front-end design, API design, implementation or review. This has the added benefit of helping to optimise context usage for each task (see "Context is king" below).

We take this further with agent sub-teams — coordinated groups of agents that work together on larger tasks. These agents communicate with each other, passing context and results, which means complex features can be built more quickly while maintaining the same rigour as if a team of experienced developers were collaborating on the work.
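The division of labour can be sketched roughly as follows. This is illustrative Python, not how Claude Code actually defines agents (they are configured declaratively); the roles, skills and dispatch logic are invented. The key property the sketch shows is that each agent accumulates only its own context:

```python
# Illustrative sketch of role-specialised agents with isolated contexts.
# Roles, skills and the dispatch function are invented for this example.
from dataclasses import dataclass, field

@dataclass
class Agent:
    role: str
    skills: list[str]
    context: list[str] = field(default_factory=list)  # this agent's private context

    def handle(self, task: str) -> str:
        self.context.append(task)  # only this agent's context grows
        return f"[{self.role}] {task}"

def dispatch(agents: dict[str, Agent], role: str, task: str) -> str:
    """Route a task to the agent specialised for that role."""
    return agents[role].handle(task)
```

Because the review agent never sees the front-end agent's working context (and vice versa), each stays focused on its own task.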

Examples lead the way

One of the most powerful techniques we use to guide AI code generation is providing examples. When we want Claude Code to implement something, we give it a basic example of the pattern that we would like it to follow. We iterate on this until we have the exact approach we need and then save it to the project's example library, which is then linked to one or more skills. This is far more effective than giving a narrative description because it shows the AI exactly how we want the code to look and behave, including the structure, naming conventions, error handling, and even comments. The AI can then adapt that example to the new context rather than trying to invent something from scratch. We find this is the best way to ensure consistency.

Test-driven development with AI

Testing is not an afterthought in our process — it is considered at the outset and then woven into every stage. We follow a test-driven approach where test expectations are defined before implementation begins, and AI generates both the tests and the code that satisfies them.

The specification documents include acceptance criteria for every feature, expressed in clear, testable terms. Before Claude Code writes a line of production code, it generates test cases that encode those acceptance criteria. Claude Code then writes the implementation, runs the tests, and iterates until they all pass.
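A minimal illustration of that loop, with an invented pricing rule standing in for a real acceptance criterion (prices are kept in integer cents so the arithmetic is exact):

```python
# Test-first in miniature. The bulk-discount rule is an invented example;
# the test encodes the acceptance criterion before any implementation exists.

# 1. Acceptance criterion, written first:
def test_bulk_discount():
    assert price(quantity=10, unit_cents=200) == 1800  # 10+ items get 10% off
    assert price(quantity=5, unit_cents=200) == 1000   # below threshold: full price

# 2. Implementation, written (and iterated) until the test passes:
def price(quantity: int, unit_cents: int) -> int:
    total = quantity * unit_cents
    return total * 9 // 10 if quantity >= 10 else total
```

In practice the scale is larger, but the ordering is the same: the tests exist before the code, and the code is iterated until they all pass.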

Layers of automated testing

We implement multiple layers of testing, all automated:

Unit tests verify that individual functions and methods behave correctly in isolation. Claude Code generates these alongside the code it writes, ensuring every significant piece of logic has test coverage from the moment it is created.

Integration tests verify that components work together correctly — that API endpoints return the right data, that database queries produce the expected results, and that services communicate properly. These tests run against real databases and real service instances, not mocks, because we have learned that mocked tests can mask issues that only appear in production.

End-to-end tests verify complete user workflows by simulating real user interactions through the application. These are driven by Claude's ability to automate browser interactions. These tests follow the same paths that users will take in production, confirming that the system works as a whole rather than just in pieces.

Regression tests are maintained as a growing suite that runs on every change, ensuring that new features do not break existing functionality. Because AI can generate and run these tests so quickly, we maintain far more comprehensive regression coverage than would be practical with manual testing.
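Conceptually, a change triggers every layer, and a failure in one layer does not stop the others from reporting, so the full picture is available after each run. A sketch, with invented suite names:

```python
# Illustrative sketch: run every test layer on each change and record
# per-suite results. Suite names and contents are invented examples.
from typing import Callable

def run_suites(suites: dict[str, Callable[[], None]]) -> dict[str, bool]:
    """Run each suite; a failure in one layer still lets later layers report."""
    results: dict[str, bool] = {}
    for name, suite in suites.items():
        try:
            suite()
            results[name] = True
        except AssertionError:
            results[name] = False
    return results
```

A real runner would also capture failure detail for Claude Code to act on; the sketch shows only the layered pass/fail structure.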

Continuous integration

All tests run automatically through our continuous integration pipeline. Every change that Claude Code makes is committed, tested, and validated before it is merged into the main codebase. If any test fails, the change is rejected and Claude Code is directed to fix the issue. This means the main codebase is always in a working state — there are no “it works on my machine” surprises.

The CI pipeline also runs static analysis tools that check for security vulnerabilities, code quality issues, and potential performance problems. These checks happen automatically on every change, providing an additional layer of quality assurance beyond what the tests themselves cover.
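The merge decision itself reduces to a simple rule: a change goes in only when every test passes and static analysis reports nothing. A sketch, with invented inputs standing in for what a real pipeline would collect:

```python
# Sketch of the merge gate: all tests green and no static-analysis findings.
# The inputs are invented; a real CI system gathers these from its pipeline.

def ci_gate(test_results: dict[str, bool], analysis_findings: list[str]) -> bool:
    """Return True only if the change may be merged into the main codebase."""
    return all(test_results.values()) and not analysis_findings
```

Anything that fails this gate is rejected and handed back to Claude Code to fix, which is what keeps the main codebase in a working state.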

Context is king

A major key to success when developing with AI is learning to provide it with the right context for any given prompt. Even with the huge context windows of current models, the issue of "context rot" is still a problem. Context rot occurs when the AI loses track of what information in its memory is relevant as the conversation grows longer, causing the quality of its responses to deteriorate.

We counter this by providing concise, relevant context for each prompt, along with progressive disclosure, which allows the model itself to decide what information it needs to load into its context window, minimising overload and context rot. Using agents allows us to provide the specific information for the specific task each agent is handling, rather than supplying a wider context to a less specific agent. We also use techniques like summarising previous interactions and maintaining a consistent style guide to help the AI stay on track.

When the context is managed correctly, the model is able to focus clearly on the task in hand using its context window effectively. It can build code that works as specified while aligning correctly and consistently with the required architectural approach and interfacing correctly with other solution components.
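One way to picture progressive disclosure: the prompt carries a short index of available references, and full documents are pulled in only when the model asks for them, within a rough budget. The document store, document names and character budget below are all invented for illustration:

```python
# Sketch of progressive disclosure. DOCS and the budget are invented; the
# point is that the prompt starts with an index, not the full documents.

DOCS = {
    "auth-spec": "Full authentication spec: token lifetimes, refresh rules ...",
    "db-schema": "Full schema reference: tables, indexes, constraints ...",
}

def build_context(requested: list[str], budget: int = 500) -> str:
    """Start with an index of available docs; include full text only for the
    docs the model asked for, staying within a rough character budget."""
    parts = ["Available references: " + ", ".join(sorted(DOCS))]
    used = len(parts[0])
    for name in requested:
        body = DOCS.get(name, "")
        if body and used + len(body) <= budget:
            parts.append(body)
            used += len(body)
    return "\n\n".join(parts)
```

The model sees that `db-schema` exists without paying for its full text until it actually needs it, which is exactly the overload the technique avoids.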

Human oversight throughout

It is worth emphasising that none of this happens unsupervised. Our experienced human developers control all of the above steps and receive notifications as tasks complete. We define the architecture, review the specifications, configure the skills and hooks, design the test strategies, and review the output at every stage. Claude Code is an extraordinarily capable tool, but it operates within boundaries that we set and monitor.

We review every significant piece of generated code. We validate that the architecture decisions are sound. We confirm that security requirements are met. We check that the implementation matches the approved prototype and specification. The AI does the heavy lifting of writing and testing code, but the engineering judgement — the decisions about what to build, how to structure it, and whether the result is good enough — remains firmly with our team.

This is what separates professional AI-powered development from vibe coding. The AI is faster and more consistent than manual coding, but it needs experienced professionals to direct it, validate its output, and make the architectural and design decisions that determine whether a system will be maintainable, secure, and fit for purpose over the long term.

The result

The combination of Claude Code, prototypes, specifications and examples produces software that is delivered in days or weeks rather than months, costs 80–90% less than traditional development, and arrives with comprehensive test coverage and consistent code quality from day one.

As a seasoned developer, steeped in the old ways of coding, being AI-powered still feels like a superpower. However, it is a well-engineered process that uses the best available tools in a disciplined way, guided by decades of experience in building commercial software. The AI handles the volume and velocity of coding. The team handles the thinking, the quality, and the relationship with your business. Your team will spend less time and money working with us on the project and can focus on user acceptance and migration.

Having spent decades toiling to get features built on time and to meet expectations, having the power of AI at our fingertips is incredibly liberating. We can build features so quickly and easily now that it's hard to stop once we get started!

If you would like to see this process in action, get in touch. We are always happy to walk through how we would approach your specific project.