Engineering Notes · Agentic QA

The Atomic Developer: Maintaining Balance in the Age of AI Agents

AI agents can multiply what one developer ships in a day. Your attention span hasn't moved an inch. This is a field report on the working rhythms, the safety gates, and the instincts I had to earn the hard way — the things that let me work at agent speed without being wrecked by Friday afternoon.

12 June 2026 · 10 min read

TL;DR

The job stopped being about typing faster. It became about holding the whole system in your head while the machines do the actual building. I learned this the day an audit showed me nine feature directories that all looked finished and were all completely empty.
The shower principle is real. With agents running in the background and the rules written down, my most useful thinking often happens away from the desk. The walk isn't time off. The walk is the work.
Working this way without burning out comes down to three habits: guarding your attention before something else eats it, running an honest weekly look back at what actually went wrong, and learning a part of the system yourself before you ever hand it to an agent.
Verification gates — the local checks, the numbers you commit to before you run, and a reviewer agent that never sees the implementer's reasoning — aren't red tape. They are the only reason the fast pace doesn't collapse on you a week later.

The Job Quietly Changed Under Us

Something big shifted in software work over the past two years, and most of us felt it before we could name it. One developer can now do the work of a small team. The tools got dramatically more capable almost overnight. The person using them did not. We still have the same attention span, the same need to sleep, the same brain that can only really hold one hard problem at a time.

So the bottleneck moved. It used to be how fast your fingers could turn an idea into code. Now the code is the cheap part. The expensive part is your judgement — knowing what to build, spotting when something is quietly wrong, deciding what 'good' even means here.

Day to day, that means I spend far less time writing loops and boilerplate and far more time doing something closer to direction. I set the goal. I write down how we'll know it worked. I make the calls that need taste rather than typing. The agents handle the rest.

The moment this really landed for me was during an architecture audit on a privacy-first health app I was building. The project had a clear, documented structure on paper — a tidy feature-sliced layout with view models, repositories, and services neatly scoped to each part of the app. So I pointed a small team of agents at the actual code and asked one simple question: does what we built match what we said we'd build?

The answer was brutal. Nine feature directories existed. Every single one held nothing but a placeholder file. The real screens had been quietly thrown together up in the routing layer, with fake data hardcoded straight into the components — little arrays and constants sitting inline where a real backend should have been. There was no backend wired in at all. The gap between the plan and the reality wasn't a crack. It was the whole floor.
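The core of that audit question can be sketched in a few lines. This is a minimal illustration, not the audit the agents actually ran: the `find_hollow_features` helper, the placeholder file names, and the one-directory-per-feature layout are all assumptions made for the example.

```python
from pathlib import Path

# Hypothetical placeholder names; adjust to whatever your scaffold generates.
PLACEHOLDER_NAMES = frozenset({".gitkeep", ".keep", "placeholder.txt"})


def find_hollow_features(
    features_root: Path,
    placeholders: frozenset = PLACEHOLDER_NAMES,
) -> list:
    """Return the names of feature directories that hold no real files.

    A directory counts as hollow when every file under it (at any depth)
    is a known placeholder, or when it contains no files at all.
    """
    hollow = []
    for feature_dir in sorted(p for p in features_root.iterdir() if p.is_dir()):
        real_files = [
            f
            for f in feature_dir.rglob("*")
            if f.is_file() and f.name not in placeholders
        ]
        if not real_files:
            hollow.append(feature_dir.name)
    return hollow
```

Calling `find_hollow_features(Path("src/features"))` (the path is illustrative) returns the list of directories that look finished from the outside and contain nothing. A check like this only catches the emptiness; spotting screens assembled in the wrong layer still needs a reviewer reading the code against the design docs.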

Here's the part that stuck with me. The agents didn't find that because they were cleverer than me. They found it because I hadn't been looking. I'd had my nose pressed against the code for weeks, shipping small wins, and I'd lost sight of the shape of the thing. Stepping back to write the spec, run the audit and read the findings — that was my actual job. The reading and the diffing were theirs.

That's the whole shift in one sentence. My job is no longer to write the code. It's to keep the system honest with itself, and to let the machines do the checking.

Your Best Ideas Don't Arrive at the Desk

For years I believed the same thing most developers believe: that peak productivity meant being glued to the editor, deep in focus mode, headphones on. And yes, intense focus does get things done. It also gives you tunnel vision. Looking back over the past year, almost every good architectural decision I made arrived in the twenty quiet minutes after I shut the laptop — not during the eight hours in front of it.

There's a name I use for this: the shower principle. It's that maddening thing where the answer to a problem you've been chewing on all morning turns up while you're walking the dog or waiting for the kettle. The reason it works now is simple. Because the agent runs on its own while I'm gone, I no longer have to choose between stepping away and getting work done. I get both.

Hand the agent a real, meaty task — and write down exactly what 'done' looks like before you go.
Physically get up and leave the desk. No half-measures, no watching from the sofa.
Come back to a finished pull request, read what happened, and steer it from there.

One of the hardest stretches of a recent project was a full machine-learning validation run. I had to train a probabilistic time-series model, convert it into a format that runs across platforms, prove the numbers still matched on two different device simulators, and then test it against data it had never seen. Each step had to clear a bar before the next one was allowed to start.

The old version of me would have sat in the terminal for eight hours, babysitting training runs and squinting at logs. Instead, I wrote the pass criteria into a sprint file, told the agent to go, and went for a long walk. I came back to a pull request. The agent had got the numbers matching across both platforms, updated the sprint file, and flagged two odd edge cases it wanted me to look at. All I had to do was read the report, sanity-check that its reasoning held up, and approve the model.

That walk wasn't a break I felt guilty about. It was where I worked out the scope for the next two weeks. I came back clear-headed — and I would not have been if I'd spent those hours watching logs scroll past.

Protect Your Attention Before Something Else Spends It

The fastest way to ruin a day is to let it get chopped into pieces. Gloria Mark's research at UC Irvine found that it takes over twenty minutes, on average, to get back into deep focus after an interruption. Now count how many times a normal developer switches tasks in a day: it's dozens. Do the maths and you don't get a productive day. You get an exhausting one that produces almost nothing you're proud of.

What works for me is delegating the triage. Before I sit down for a real block of work, I hand the agent my backlog, my open issues, the pile of messages waiting on me. Then I ask it one thing: what here genuinely needs my brain today, and what can wait or be handled with a quick template reply?

On the health app, this took the shape of a sprint folder where the agent kept one live status file. Before any session, it read that file, found the most important task that wasn't blocked, and started there — not on whatever we happened to be talking about last. The file was the filter. My job was to update it when priorities moved, not to re-explain the entire project from scratch every single morning.
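Here is a rough sketch of what "find the most important task that wasn't blocked" can look like in code. The JSON layout, the field names (`id`, `status`, `priority`, `blocked_by`), and the convention that priority 1 is the most important are all assumptions for illustration; the article does not describe the real file's format.

```python
import json
from typing import Optional


def next_task(status_json: str) -> Optional[dict]:
    """Pick the highest-priority task that is still to do and not blocked.

    Assumed (hypothetical) file shape:
    {"tasks": [{"id": "...", "status": "todo" | "done",
                "priority": 1, "blocked_by": ["other-id"]}]}
    Lower priority numbers are more important.
    """
    tasks = json.loads(status_json)["tasks"]
    done_ids = {t["id"] for t in tasks if t["status"] == "done"}
    ready = [
        t
        for t in tasks
        if t["status"] == "todo"
        and all(dep in done_ids for dep in t.get("blocked_by", []))
    ]
    # default=None covers the case where nothing is ready to start.
    return min(ready, key=lambda t: t["priority"], default=None)
```

The useful property is that the selection rule lives in the file and a few lines of logic, not in the chat history, so every session starts from the same answer.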

The thing to hold on to is this: the agent is the filter, not the decision-maker. You decide what matters. It sorts everything else against that decision and keeps the noise off your desk.

Talk to Your Agents Instead of Typing at Them

If you're still typing out every prompt by hand, you're working at a fraction of your real speed. Voice input isn't just quicker — it changes what you say. When you talk, you naturally explain why you want something, not just what you want. You ramble in a useful way. And that messy, high-context, thinking-out-loud instruction is exactly the kind of input that gets you a better result from the agent.

The bigger win is that your hands come free. I can keep three agent windows moving at once — one building, one half-written, one being reviewed — without the constant little tax of clicking between them and finding my place again. The limit stops being how fast I can type. It becomes how clearly I can think, which is where the limit should have been all along.

Your Chat Logs Are a Mirror — Read Them

Your AI conversation logs are quietly one of the most useful things you own. They're a recording of how you actually think: where you got stuck, where you were vague, where the agent went off and you had to drag it back. Every correction you made is a little flag planted over a spot where your instructions weren't clear or your picture of the system didn't match the real thing.

So every week I run a short retrospective, and I use the agent to do it. I feed it the week's logs and ask three plain questions. Where did we burn the most time just figuring things out, which usually means the spec was fuzzy or the problem was genuinely new? Where did the agent build something I then had to heavily rewrite, which usually means I briefed it badly? And what kept coming up again and again that I could turn into a reusable skill?

The answers are reliably humbling. Several of the worst debugging slogs of the past year traced straight back to one woolly line in a spec — a line I wrote myself and never reread once the work kicked off. So the retro isn't really a productivity trick. It's an honesty check. And it pays for itself twice over, because the patterns that keep costing me time get turned into named skills that cost nothing the next time round.

Earn Your Scars Before You Hand It Over

The worry I hear most from developers circling this way of working is always the same one: if the machine writes all the code, don't I lose the ability to write it myself? It's a fair worry. The fix is more boring than people expect. Don't hand a task to an agent until you've done it by hand enough times to know, in your gut, what good looks like.

You need that gut feel for one reason. It's the only thing that tells you when the agent is making things up, chasing the wrong goal, or proposing something that's elegant on paper and completely wrong for your particular system.

I learned this the hard way on the machine-learning side of the health app. The model at the heart of it uses a Kalman filter — a very specific bit of maths with its own quirks around how it settles, how it handles noise, and how it updates its guesses. During a review, the agent described how the algorithm behaved and quietly used the wrong name for it. Not wildly wrong. A cousin of the right answer, a similar kind of estimator. But wrong enough that anyone trusting it would have gone off and read the wrong textbooks and walked away convinced they understood a system they didn't.

I only caught it because I'd sat down and worked through the maths myself. If I'd handed the whole thing to the agent without ever learning it, I'd have nodded along and signed off a confident, plausible, completely incorrect description of my own system's core. That's the trap. The wrong answer didn't look wrong. It looked great.

And the good news sits right next to the bad. Once you do have that mental model, using AI to learn faster is one of the best things you can do with it. Ask it to quiz you. Ask it to explain why a solution works, not just hand you the solution. Ask it to throw awkward edge cases at you to find the holes in your understanding. Use it to learn faster — just don't use it to skip the learning.

Earn the mental model before you hand over the domain. It's the only thing that lets you catch a fast, confident agent being fast, confident, and wrong.

Move Fast, But Build the Gates First

The more powerful the tools get, the more your safety net has to keep up. The pull to just go faster is strong and real. So is the size of the mess a confident agent can make when it has misread a single constraint and barrelled ahead anyway.

What's held up for me is a set of gates, layered one behind the other, with the rules agreed before any work starts — not invented after the fact to explain why the broken thing is fine.

Gate one, the local one. Linting, type checks, unit tests, a clean build. These pass before anything else is even on the table. The word that matters is 'before' — not 'usually', not 'when I get a minute'. On the ML pipeline this also meant a hard check that the exported model and the trained model agreed down to a tight decimal. That check ran on every single export, not now and then.
Gate two, the functional one. The agent actually drives the real thing end to end. For a mobile app that means running on a real device or simulator and walking the critical paths a user would. The trick is to write down what counts as a pass before you run it. Decide 'good enough' after you've seen the number and you'll always, somehow, find a reason the number is good enough.
Gate three, the review one. A second agent audits the first agent's work against a fixed set of rules. On the health app this was a code-reviewer that ran after every sprint, starting from a blank slate with none of the implementer's context — so it didn't inherit the implementer's blind spots. Anything serious it flagged had to be fixed before a single line could land.
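To make the "agree the rules first" idea concrete, here is a small sketch of a parity gate whose tolerance is fixed before the run, plus a runner that stops at the first failing gate. Everything named here is hypothetical: `PARITY_TOLERANCE`, its value, `parity_gate`, and `run_gates` are illustrations, not the pipeline from the health app.

```python
from dataclasses import dataclass
from typing import Callable, List

# Committed before the run. The value is illustrative, not from the article.
PARITY_TOLERANCE = 1e-5


@dataclass(frozen=True)
class GateResult:
    name: str
    passed: bool
    detail: str


def parity_gate(
    trained: List[float],
    exported: List[float],
    tol: float = PARITY_TOLERANCE,
) -> GateResult:
    """Check that exported-model outputs match trained-model outputs within tol."""
    if len(trained) != len(exported):
        return GateResult("parity", False, "output lengths differ")
    worst = max((abs(a - b) for a, b in zip(trained, exported)), default=0.0)
    detail = f"max abs diff {worst:.3g} (limit {tol:.3g})"
    return GateResult("parity", worst <= tol, detail)


def run_gates(gates: List[Callable[[], GateResult]]) -> List[GateResult]:
    """Run gates in order and stop at the first failure."""
    results = []
    for gate in gates:
        result = gate()
        results.append(result)
        if not result.passed:
            break  # later gates never start until this one clears
    return results
```

Keeping the tolerance as a named constant in version control is the point of the design: moving the bar then shows up as a diff with an author and a commit message, rather than as a quiet judgement call made after seeing the number.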

None of this comes from distrust. It comes from wanting a pace I can actually keep up. Because the other option — sprinting with no gates and then losing three days to debugging whatever you shipped with such confidence last week — was never fast in the first place. It just felt fast at the time.

The Point Was Never More Hours

The future of this work isn't longer days. It's working with more intent.

The developers who'll do well in this era aren't the ones who found a way to be at the keyboard the longest. They're the ones who learned how to step away from it without losing the thread — who treat their agents as trusted teammates with clear rules, not as magic that works right up until the moment it doesn't. They put the time into the boring durable stuff: the specs, the decision records, the weekly retros. The artefacts that make the whole setup get stronger over time instead of slowly rotting.

The tools are new. The idea underneath them is ancient. Doing good work for the long haul has always meant guarding your attention, building systems that don't lean on you remembering everything, and being honest with yourself about where you went wrong.

What's different now is the deal on offer. The cost of getting these habits right has never been lower, and the payoff has never been higher. That's exactly the moment to slow down for a second and be deliberate about how you work.

Developer balance stack: an implementation guide

Five steps to get the working rhythms in place before agent speed starts outrunning your ability to steer it.

Audit the gap between the plan and the code. Before starting any new sprint, point a team of agents at your design docs and have them diff the intent against what's actually in the codebase. The gap is almost always bigger than you'd guess, and it's the single biggest reason agent work drifts off course.
Write down 'done' before you hand anything over. For every task you delegate, put the acceptance criteria in a file the agent reads, not buried in the chat. If you can't write them clearly in five minutes, the task isn't sharp enough to hand off safely yet. That's a signal, not a nuisance.
Commit to the numbers before you run. For any gate with a number on it (a pass rate, an error bound, a performance threshold), write the bar down before you run the benchmark. Only move it for a reason you've documented, and never after you've seen the result.
Run the weekly look-back. At the end of each week, hand the session logs to an agent and ask for three things: the three biggest time sinks, the three moments it built the wrong thing, and one pattern worth turning into a reusable skill. It's not a performance review. It's an honesty check.
Add a clean-slate reviewer. After every sprint, run a review agent that sees only the code and the rules, never the implementer's reasoning. Anything serious gets fixed before it lands. This is the highest-impact gate you have, precisely because it doesn't share the builder's blind spots.


Key takeaways

Once agents join your workflow, the bottleneck stops being how fast you type and becomes how well you judge. Your real job is to hold the system together — set the bar, write the gates, and make the calls that are genuinely yours to make.
The shower principle works. Hand the agent a task with the rules written down, get up and leave, come back to a pull request. The walk isn't slacking — it's where you do the thinking the desk was crowding out.
Before you delegate any part of the system, learn it by hand first. That gut-level model is the only thing that catches a confident agent being wrong — a misnamed algorithm or an invented constraint is invisible without it.
A weekly look-back over your session logs is the best improvement loop you've got. The worst debugging slogs almost always trace back to one fuzzy line in a spec. The retro catches it; the day-to-day never will.
Three gates (local checks, functional runs judged against numbers you commit to up front, and a clean-slate reviewer) are what make all this speed safe. Decide what counts as a pass before you run, because if you decide afterwards you'll always talk yourself into it.

FAQs

Won't I lose my coding skills if I let agents write most of the code?

Only if you skip the part where you earn it first. The rule is simple: do a task by hand enough times to know what good looks like before you ever hand it over. That instinct is what lets you catch a misnamed algorithm, an invented API, or an architecture that's wrong for your system specifically. Use agents to learn faster — ask for the 'why' behind a solution, get them to quiz you — not to dodge the learning altogether.

How do I stop the agent shipping something wrong while I'm away from the desk?

Three things. Write the pass criteria into a file before you leave, not into the chat. Run a review agent that starts from a blank slate and sees only the code and the rules, never the builder's reasoning. And make it a hard rule that anything serious gets fixed before a single line lands. None of that needs you watching. It just needs the gates standing before you walk off.

What does a good weekly look-back over session logs actually look like?

Three questions, half an hour, done. Where did the most time disappear into just figuring things out, which points at fuzzy specs or genuinely new problems? Where did the agent build something you had to heavily rewrite, which points at a weak brief? And what kept recurring that you could turn into a reusable skill? The output is three things to tighten next week — not a report card.

How much of the day should the agent be running versus me being in the editor?

There's no magic ratio, but here's a useful test. If you can write the full acceptance criteria for a task in five minutes, the agent can probably run it without you hovering. And keep an eye on how often you catch yourself correcting a brief you already wrote: that number tells you whether your briefs are sharp enough to hand off safely.

Isn't a separate review gate just too slow for a fast-moving project?

It's only slow if you sit and wait for it. The reviewer runs alongside everything else — you're writing the next brief while it reads the last sprint. The real bottleneck is your attention, and a clean-slate review is one of the few things that genuinely protects it. A reviewer that shares the builder's blind spots isn't faster, by the way. It's just less useful.


About the author: Venkata Kari · Founder, GVK Technologies

The architecture audit, ML pipeline, and Kalman filter anecdotes are from a real cross-stack build run with an AI coding agent as a standing engineering team. Product, brand, and dataset names are anonymised; the patterns, failures, and lessons are real.
