Engineering notes on agentic QA
How we actually build, operate, and trust agentic test fleets in production — written from inside live engagements, with the numbers on.
Data Privacy and Agentic AI in Testing: Aligning Your QA Stack With UK ICO Guidance
9 min read
Data privacy is the top barrier to agentic AI in QA, cited by 67%. How to align your agentic AI testing stack with the UK ICO's direction of travel.
Don't Break Checkout: Agentic QA for Revenue-Critical Retail Funnels
12 min read
On a retail site the checkout funnel is revenue, and it's fragile — payment gateways you can't hit for real, inventory and pricing that change constantly, promo-rule combinatorics, and silent funnel regressions that still pass. Here's how an agent protects the path to purchase.
Self-Healing Mobile Test Automation in CI: What Actually Works for iOS, Android, and React Native
10 min read
Self-healing mobile test automation in CI is harder than web. What actually works for iOS, Android, and React Native in 2026: architecture and feedback timing.
Ten Thousand Article Templates and Three Ad Networks: Agentic QA for Publishers
11 min read
Publisher sites render thousands of article permutations through a handful of templates, gate content behind paywalls, and load third-party ad scripts that break layout and Core Web Vitals. Here's how an agent tests content at scale — permutations, metering personas, ad determinism, and structured-data correctness.
What the Big 4's Agentic Testing Playbook Gets Wrong for UK Mid-Market Teams
8 min read
Deloitte, KPMG, PwC and EY built agentic testing for eight-figure budgets. Why UK mid-market teams need a different model — and the leaner alternative.
The Atomic Developer: Maintaining Balance in the Age of AI Agents
10 min read
AI agents multiply your output overnight, but your attention doesn't. The workflows, gates and habits that keep agent-speed work sustainable instead of leading to burnout.
Hallucination, Flakiness, and Trust: How to Evaluate an Agentic AI Test Agent in 2026
9 min read
Agentic test agents are harder to buy than traditional tools. A vendor-neutral twelve-point checklist to evaluate an agentic AI test agent in 2026.
The CRM That's Different in Every Org: Agentic QA for Configurable Enterprise Apps
12 min read
Enterprise CRMs are configured differently in every org, render differently per role, and fire workflows you can't see. Selector-based tests can't keep up. Here's how an agent that reasons by intent tests a CRM — across personas, test data, integrations, and the side-effects that hide in passing runs.
Test Maintenance Is Eating Your QA Budget. Here's Where Self-Healing Actually Pays Back
8 min read
Test maintenance eats 60–80% of automation effort. Here's where self-healing test automation ROI is real, where it isn't, and the preconditions that decide.
The Self-Driving Codebase
18 min read
Running an AI coding agent as a near-autonomous engineering team for a year — the artefacts, eval gates, and adversarial review that make the autonomy safe.
Write Once, Break Twice: Agentic QA Across React Native's Two Runtimes
11 min read
One React Native codebase, two native runtimes, and bugs that show up on only one of them. Here's how we run an agent across iOS and Android together — the testID contract, bridge synchronization, and detecting silent platform divergence before users do.
Pilot Purgatory: Why Most Agentic AI QA Projects Stall Before Production
8 min read
Only 15% of organisations have scaled agentic AI in QA. Here's why teams stall when taking an agentic AI QA pilot to production, and how UK software teams escape.
Espresso, UI Automator, and an Agent: Taming the Android Device Matrix
11 min read
Android's device matrix breaks deterministic test suites in ways iOS never does. Here's how we engineer an agent driving Espresso and UI Automator across fragmentation: state reset, OEM divergence, the resource-id contract, and separating device-specific failures from real ones.
Pointing an Agent at XCUITest: The Seven Things That Decide Signal From Noise
11 min read
iOS is a harder target for agentic QA than the browser. Here's how we engineer the seven things that decide whether an agent driving XCUITest is signal or noise — state, network mocking, accessibility identifiers, screen actions, anomaly watching, versioned config, and the xcresult bundle.
Making an Agentic Test Run Boring: Determinism, Retries, and the Flake Budget
9 min read
Agentic tests fail in a different shape from traditional end-to-end tests. Here's how to engineer a flake budget, a failure taxonomy, and the determinism levers that actually move the number.
Evals Are the Test Suite for Your Test Suite: Running Agentic QA in Production
10 min read
Once you ship agentic QA, you have two systems that can regress — the product, and the agent. Most teams only instrument the first. Here's the eval harness, golden traces, and model-upgrade protocol that keep an agentic fleet honest in production.