Having been fully immersed in AI-powered development for over two years, I am often asked how it works in practice. This article aims to give some insight into just that.

Since the start of 2026, AI has effectively surpassed human developers at writing the vast majority of code required in typical business systems. You may have heard about hallucinations and other ways AI can fail, but these have largely been overcome in the most recent models, particularly Anthropic's Claude Opus 4.6.

AI can now generate the entire codebase for a typical application in a matter of hours or days, rather than weeks or months. The task at hand is to manage that process to the required outcome.

At Provanta we primarily use Anthropic's Claude Code CLI as our development tool of choice. We also use OpenAI's Codex tool (and model), along with a handful of others such as Google's Gemini for tasks like image generation.

Some developers prefer Codex, others like Google's AntiGravity, but these are matters of personal preference. We use Claude because, historically, it has produced better code across a whole project than any other model we have tried.

We can use these tools with any programming language and technology platform and we use a variety depending on the task at hand. Currently we are working in TypeScript, C#.Net and Python along with React, PostgreSQL, SQL Server and of course HTML/CSS and JavaScript.

We split our time between preparing the materials required to support code generation and conversing with Claude Code to manage its work. Before AI we lived in IDEs (integrated development environments), editing code by hand, but we do this far less now; it is quicker to do almost everything conversationally with Claude.

Claude Code is not a chatbot that generates snippets of code for you to copy and paste. It is an agentic coding tool — an AI that reads the codebase, understands context, makes changes across multiple files, runs commands, and executes tests, all within a structured environment that gives experienced developers full visibility of what is going on and full control.

But Claude Code on its own is just a harness for Anthropic's models. What makes our process effective is the system we have built around it: the combination of skills, agents, hooks, sub-teams, prototyping, specification and automated testing that together produce commercial-grade software quickly and more accurately than traditional methods.

It starts with the prototype

Before any production code is written, we build a fully working prototype. This is a fully interactive HTML, CSS and JavaScript application that runs in your browser and looks and feels like the finished system. Users can click through screens, see their data laid out, and experience the workflow as it will be in production. (see my previous blog post "Rapid Solution Prototyping in the age of AI-Powered Development")

The prototype serves two critical purposes. First, it aligns everyone — stakeholders can see and interact with what they are getting, which eliminates the ambiguity that plagues traditional written specifications. Second, it becomes a detailed reference that guides AI code generation. When Claude Code can see exactly what the finished screens look like and how the user flows work, it generates far more accurate and complete production code than it would from a written description alone.

We iterate the prototype with stakeholders until it is an accurate facsimile of the desired system. At that point it effectively becomes the contractual specification — you approve what you see before the build begins.

From prototype to specification documents

The prototype is now a key input to the code generation process and it is complemented by traditional specification documents that cover the aspects a prototype cannot show: business rules, architecture, data models, integration requirements, security constraints, and non-functional requirements such as performance and scalability targets.

These documents are not lengthy dissertations written for the sake of process. They are precise, structured references that Claude Code reads alongside the prototype to understand exactly what needs to be built. The combination of a visual prototype and written technical specifications gives the AI far richer context than either would provide alone, and it is this combination that drives the quality of the output.

Claude Code: skills, hooks and agents

Claude Code provides several mechanisms that allow us to customise and extend its behaviour to match our development process. Understanding these explains how AI-powered development moves beyond simple code generation into a structured, repeatable engineering workflow.

Skills

Skills are reusable prompt templates that encode domain knowledge and standard procedures. We maintain a library of skills that cover common patterns across our projects: setting up a new API endpoint with our standard error handling and authentication, creating database migration scripts that follow our naming conventions, generating UI components that match the design system established in the prototype, and many more.

When Claude Code is given a task, relevant skills are referenced and loaded into its context so it follows our established patterns automatically rather than inventing its own approach each time. This helps to ensure consistency across the codebase. Skills effectively turn our accumulated development experience into something the AI can apply instantly and consistently.
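To make this concrete, a skill is typically a small markdown file with a short frontmatter description that tells Claude Code when to load it. The file below is an invented sketch, not one of our actual skills; the names and conventions are illustrative assumptions:

```markdown
---
name: api-endpoint
description: Create a new REST endpoint following our standard error handling and authentication patterns.
---

# Creating an API endpoint

1. Define the route in the feature's router module.
2. Validate the request body before any business logic runs.
3. Wrap the handler in the standard error-handling middleware.
4. Require authentication unless the endpoint is explicitly public.
5. Return errors in the shared `{ code, message }` envelope.
```

Because the skill is just text, refining it is as easy as editing a document, and every future task that loads it benefits immediately.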

Hooks

Hooks are automated actions that execute at specific points in Claude Code’s workflow. They allow us to enforce quality gates without manual intervention and trigger repetitive project management tasks. For example, we configure hooks that automatically run code reviews and formatting checks before any code is committed, that trigger tests after changes are made to critical modules, and that validate database migrations before they are applied.

Hooks are particularly powerful because they catch issues at the moment they are introduced, rather than leaving them to be discovered during a later review cycle. If Claude Code generates a function that does not meet our coding standards, the hook flags it immediately and Claude Code corrects it before moving on. This tight feedback loop is one of the reasons AI-powered development produces fewer bugs than traditional methods — problems are caught and fixed in seconds rather than days.
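As a rough illustration, hooks are declared in Claude Code's settings file and run shell commands at defined points in the workflow. The sketch below shows the general shape only; the matcher and commands are assumptions for the example, not our actual configuration:

```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          { "type": "command", "command": "npx prettier --check ." },
          { "type": "command", "command": "npm run lint" }
        ]
      }
    ]
  }
}
```

With a configuration like this, every file Claude Code edits or writes is immediately checked for formatting and lint issues, and failures are fed straight back to the model to fix.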

Agents and sub-teams

Agents in Claude Code are like members of a traditional dev team. They represent specialisations, and each has specific skills. Claude Code can run agents as sub-processes that have their own context window and handle complex, multi-step tasks autonomously. Rather than asking a single AI instance to do everything, we configure agents for specific roles: one might explore the codebase to understand existing patterns before making changes, another might plan the implementation strategy for a complex feature, and another might execute the implementation itself or perform a review.

We take this further with agent sub-teams — coordinated groups of agents that work together on larger tasks. A typical sub-team for building a new feature might include a planning agent that designs the implementation approach, an implementation agent that writes the code, a test agent that generates and runs tests, and a review agent that checks the output against our quality standards and the specification. These agents communicate with each other, passing context and results, which means complex features can be built with the same rigour as if a team of experienced developers were collaborating on the work.

The sub-team approach is what allows us to maintain quality at speed. Instead of a single AI ploughing through a task in a linear manner, multiple specialised agents each focus on what they do well, with built-in checkpoints between them. The planning agent ensures the approach is sound before any code is written. The test agent verifies that what was built actually works. The review agent confirms it meets our standards. If any agent identifies an issue, the team iterates until it is resolved.
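For a sense of what defining an agent involves, each one is a small markdown file with frontmatter describing its role and the tools it may use. The file below is an invented sketch for the test agent described above; the name, tool list and wording are illustrative assumptions:

```markdown
---
name: test-agent
description: Generates and runs tests for newly implemented features and reports failures back to the sub-team.
tools: Read, Bash
---

You are the test specialist on a sub-team. Given a feature and its
acceptance criteria, write tests that encode those criteria, run
them, and report any failures with enough detail for the
implementation agent to act on.
```

The planning, implementation and review agents are defined the same way, each with a narrow brief, which is what keeps the sub-team's checkpoints meaningful.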

Examples lead the way

One of the most powerful techniques we use to guide AI code generation is providing examples. When we want Claude Code to implement something, we give it a basic example of the pattern we would like it to follow. We iterate on this until we have the exact approach we need, then save it to the project's example library. This is far more effective than a narrative description because it shows the AI exactly how we want the code to look and behave, including the structure, naming conventions, error handling, and even comments. The AI can then adapt that example to the new context rather than trying to invent something from scratch. We find this is the best way to ensure consistency.
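As a flavour of what such a library entry might look like, here is a minimal sketch of an error-handling pattern in TypeScript. The `Result` type and the function are invented for illustration, not taken from our actual library:

```typescript
// Example-library entry (illustrative): our preferred error-handling shape.
// Functions return a Result rather than throwing, so every caller is
// forced to deal with failure explicitly.
type Result<T> =
  | { ok: true; value: T }
  | { ok: false; error: string };

function parsePositiveInt(input: string): Result<number> {
  const n = Number(input);
  if (!Number.isInteger(n) || n <= 0) {
    return { ok: false, error: `not a positive integer: ${input}` };
  }
  return { ok: true, value: n };
}
```

Given a saved pattern like this, Claude Code reproduces the same shape — the union type, the explicit failure branch, the error message style — in every new function it writes, rather than mixing thrown exceptions and return codes across the codebase.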

Test-driven development with AI

Testing is not an afterthought in our process — it is considered at the outset and then woven into every stage. We follow a test-driven approach where test expectations are defined before implementation begins, and AI generates both the tests and the code that satisfies them.

The specification documents include acceptance criteria for every feature, expressed in clear, testable terms. Before Claude Code writes a line of production code, it generates test cases that encode those acceptance criteria. The tests initially fail, of course, because the feature has not been built yet. Claude Code then writes the implementation, runs the tests, and iterates until they all pass. This is the classic red-green-refactor cycle, except that AI executes it in minutes rather than hours.

Layers of automated testing

We implement multiple layers of testing, all automated:

Unit tests verify that individual functions and methods behave correctly in isolation. Claude Code generates these alongside the code it writes, ensuring every significant piece of logic has test coverage from the moment it is created.

Integration tests verify that components work together correctly — that API endpoints return the right data, that database queries produce the expected results, and that services communicate properly. These tests run against real databases and real service instances, not mocks, because we have learned that mocked tests can mask issues that only appear in production.

End-to-end tests verify complete user workflows by simulating real user interactions through the application. These are driven by Claude's ability to automate browser interactions. These tests follow the same paths that users will take in production, confirming that the system works as a whole rather than just in pieces.

Regression tests are maintained as a growing suite that runs on every change, ensuring that new features do not break existing functionality. Because AI can generate and run these tests so quickly, we maintain far more comprehensive regression coverage than would be practical with manual testing.

Continuous integration

All tests run automatically through our continuous integration pipeline. Every change that Claude Code makes is committed, tested, and validated before it is merged into the main codebase. If any test fails, the change is rejected and Claude Code is directed to fix the issue. This means the main codebase is always in a working state — there are no “it works on my machine” surprises.

The CI pipeline also runs static analysis tools that check for security vulnerabilities, code quality issues, and potential performance problems. These checks happen automatically on every change, providing an additional layer of quality assurance beyond what the tests themselves cover.
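The pipeline itself is ordinary CI configuration. The sketch below uses GitHub Actions syntax purely as an example; the job names and commands are assumptions, not our actual pipeline:

```yaml
# Illustrative CI workflow: every change is built, linted and tested
# before it can be merged.
name: ci
on: [push, pull_request]
jobs:
  build-and-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - run: npm run lint        # static analysis and style checks
      - run: npm test            # unit and integration tests
      - run: npm run test:e2e    # end-to-end tests
```

What matters is less the tooling than the rule it enforces: nothing reaches the main branch without passing every layer of checks.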

Context is king

A major key to success with AI is learning to provide the right context for any given prompt. Even with today's massive context windows, "context rot" is still an issue: the AI loses track of relevant information as a conversation grows longer, and the quality of its responses deteriorates.

We counter this by providing concise, relevant context for each prompt, and by using progressive disclosure, which allows the model itself to decide what information to load into its context window, minimising overload and context rot. We also use techniques such as summarising previous interactions and maintaining a consistent style guide to help the AI stay on track.

When the context is managed correctly, the model is able to focus clearly on the task in hand using its context window effectively. It can build code that aligns correctly and consistently with the required architectural approach and that interfaces correctly with other solution components.

Human oversight throughout

It is worth emphasising that none of this happens unsupervised. Our experienced team of human developers controls all of the above steps and receives notifications as tasks complete. They define the architecture, review the specifications, configure the skills and hooks, design the test strategies, and review the output at every stage. Claude Code is an extraordinarily capable tool, but it operates within boundaries that our team sets and monitors.

We review every significant piece of generated code. We validate that the architecture decisions are sound. We confirm that security requirements are met. We check that the implementation matches the approved prototype and specification. The AI does the heavy lifting of writing and testing code, but the engineering judgement — the decisions about what to build, how to structure it, and whether the result is good enough — remains firmly with our team.

This is what separates professional AI-powered development from vibe coding. The AI is faster and more consistent than manual coding, but it needs experienced professionals to direct it, validate its output, and make the architectural and design decisions that determine whether a system will be maintainable, secure, and fit for purpose over the long term.

The result

The combination of prototypes, specifications, Claude Code with skills, hooks, agents and sub-teams, test-driven development, and continuous integration produces software that is delivered in weeks rather than months, costs 80–90% less than traditional development, and arrives with comprehensive test coverage and consistent code quality from day one.

It is not magic. It is a well-engineered process that uses the best available tools in a disciplined way, guided by decades of experience in building commercial software. The AI handles the volume and velocity of coding. Our team handles the thinking, the quality, and the relationship with your business. Your team will spend less time and money working with us on the project and can focus on user acceptance and migration.

If you would like to see this process in action, get in touch. We are always happy to walk through how we would approach your specific project.