
Write Once, Break Twice: Agentic QA Across React Native's Two Runtimes

React Native sells you one codebase. It does not sell you one runtime. Your JavaScript runs on top of two different native stacks, and the bugs that matter most are the ones that appear on exactly one of them. The agentic win here isn't driving the app — it's running the same intent on both platforms and noticing when they quietly disagree.

5 June 2026 · 11 min read

TL;DR

React Native's defining failure mode is platform divergence: the same code, the same intent, a different result on iOS than on Android. A suite that tests one platform and assumes the other is fine ships the bug.
testID is the one locator contract that spans both platforms — it maps to the accessibility identifier on iOS and the resource-id/test tag on Android. Keep it honest in one place and both platforms benefit.
The bridge is the flake source. Detox synchronizes against native and JS idleness; where it still flakes, the agent must watch for it rather than retry blindly.
Reset state once, mock the network once, run the same agent intent on both targets, and diff the behaviour. Versioned config carries shared screens plus per-platform overrides; artifacts come back per platform.

The promise and the catch

React Native's promise is one codebase. The catch every team learns the hard way is that 'one codebase' is not 'one app'. Your JavaScript executes on top of two native runtimes (UIKit and the iOS rendering path on one side, the Android View hierarchy on the other), connected by a layer that behaves differently on each: the legacy bridge, or the New Architecture's JSI, Fabric, and TurboModules. A date picker, a keyboard-avoiding view, list-scroll momentum, a permission prompt: each can be subtly or completely different across the two.

So the bugs that hurt are platform-specific. The feature works on the iOS simulator the developer built it on, and breaks on a Samsung in a way nobody saw because nobody looked. A test suite that exercises one platform and trusts the other to match is, structurally, a suite that ships those bugs.

If you've read our iOS and Android posts, the seven-lever spine carries straight over — state, boundary, locators, maintained actions, anomaly watching, versioned config, artifacts. What React Native adds is a job none of the single-platform posts had: running the same thing twice and caring about the difference.

1. testID is the one contract that spans both platforms

On iOS the agent resolves elements through the accessibility identifier. On Android it leans on resource-id and content-description. React Native gives you a single prop that resolves to both: testID. Set it once on a component and, configured correctly, it surfaces as the accessibility identifier on iOS and as the resource-id (or test tag) on Android. That single prop is the most valuable locator contract in cross-platform mobile, because it lets the agent address the same control by the same name on both runtimes.

The discipline is to treat testID as a first-class part of the component API, not an afterthought sprinkled in when a test fails. A component without a testID is a control the agent can only find by visible text — localised, copy-dependent, and prone to collision — and it's a control a screen-reader user on either platform may struggle with too.

The agent's job is to keep the contract honest on both sides at once. Before a run it walks the tree on each platform and flags testIDs that are missing, that resolve on iOS but not Android (a frequent asymmetry, often a misconfigured native mapping), or that have drifted. Because the prop is shared, a fix lands once in the component and both platforms benefit — which is exactly the leverage React Native is supposed to give you, finally applied to testing.
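
The pre-run check described above can be sketched as a plain comparison between the testIDs the config expects and the ones each platform's tree actually exposed. This is a minimal illustration, not our production audit: `TestIdSnapshot`, `TestIdReport`, and `auditTestIds` are hypothetical names, and how the IDs are collected from each tree is assumed to happen elsewhere.

```typescript
// Hypothetical shape: the testIDs found while walking each platform's tree.
interface TestIdSnapshot {
  ios: string[];
  android: string[];
}

interface TestIdReport {
  missingOnIos: string[];
  missingOnAndroid: string[];
  missingOnBoth: string[];
}

// Compare the testIDs the config expects against what each platform exposed.
function auditTestIds(expected: string[], found: TestIdSnapshot): TestIdReport {
  const ios = new Set(found.ios);
  const android = new Set(found.android);
  const report: TestIdReport = { missingOnIos: [], missingOnAndroid: [], missingOnBoth: [] };

  for (const id of expected) {
    const onIos = ios.has(id);
    const onAndroid = android.has(id);
    if (!onIos && !onAndroid) {
      report.missingOnBoth.push(id); // likely never set on the component
    } else if (!onIos) {
      report.missingOnIos.push(id);
    } else if (!onAndroid) {
      report.missingOnAndroid.push(id); // the asymmetry called out above
    }
  }
  return report;
}
```

An ID in `missingOnBoth` points at the shared component; an ID missing on only one side points at that platform's native mapping, which is why the report keeps the three cases apart.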

One testID, two platforms. The leverage of a shared codebase only reaches your tests if the locator contract is shared too — and the most common asymmetry we find is an identifier that resolves on iOS and silently doesn't on Android.

2. The bridge is where the flake lives

A scripted React Native test usually flakes for one specific reason: it acted before the app was ready. A network call resolved on the JS thread but the native view hadn't re-rendered yet; an animation was mid-flight; the bridge hadn't drained its queue. Detox, the framework we most often drive here, exists largely to solve this: it synchronizes against native and JavaScript idleness so the test waits until the app has genuinely settled before it acts.

It mostly works, and where it works the agent inherits stable interaction for free. But synchronization isn't total: timers, certain animations, long-running JS, and some third-party native modules can leave Detox thinking the app is busy when it's idle, or idle when it's busy. That gap is where the agent's flake hides.

So the agent doesn't treat 'the tap did nothing' as a cue to tap again. A blind retry on a bridge-synchronization gap is how you get a double action: the second tap lands on a now-ready control and reads as success while hiding a real responsiveness bug. Instead the agent observes: did the tree change after the action settled? If not, that's a tool or synchronization anomaly to surface, not a perception problem to reason through again. The same screenshot-diff gate we use elsewhere applies: confirm the screen changed before spending another model call.
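
A minimal sketch of that observe-instead-of-retry step, assuming the harness can supply two hooks: `act`, which performs the action, and `snapshot`, which returns a stable fingerprint of the UI tree (a hash of the serialized hierarchy, say). Both hooks and the `StepResult` type are illustrative names, not Detox APIs.

```typescript
// Outcome of one agent step: the tree changed, or we surface an anomaly.
type StepResult =
  | { kind: "changed"; before: string; after: string }
  | { kind: "sync-anomaly"; detail: string };

// `act` and `snapshot` are hypothetical hooks provided by the harness.
async function actAndObserve(
  act: () => Promise<void>,
  snapshot: () => Promise<string>,
): Promise<StepResult> {
  const before = await snapshot();
  await act();
  const after = await snapshot();

  if (before === after) {
    // Deliberately no retry here: report the anomaly and let the run decide.
    return { kind: "sync-anomaly", detail: "tree unchanged after action settled" };
  }
  return { kind: "changed", before, after };
}
```

The point of the design is what the function leaves out: there is no loop, so a second tap can never land on a now-ready control and turn a responsiveness bug into a green run.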

3. Reset state and mock the network once, for both

Everything the single-platform posts said about state still applies — a fresh process is not a fresh app, and AsyncStorage, the keychain/keystore, SQLite, and cached files survive between tests. The React Native advantage is that much of that state lives behind JavaScript abstractions you can reset from one place. A test-only reset path in JS — clear AsyncStorage, wipe the secure store, reset the navigation stack to a known seed — works identically on both platforms and is the cleanest per-test reset you can give the agent.
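
One way to shape that test-only reset path is a list of small adapters with a common interface, run in order. The sketch below assumes such adapters exist for AsyncStorage, the secure store, and the navigation stack; the `Resettable` interface and `resetAppState` are hypothetical names, and the adapters themselves are not shown.

```typescript
// Minimal interface each resettable store must satisfy. Adapters wrapping
// AsyncStorage, the secure store, or navigation are assumed, not shown.
interface Resettable {
  name: string;
  reset(): Promise<void>;
}

// Reset every store, collecting failures instead of stopping at the first,
// so the agent learns about every store that did not come back clean.
async function resetAppState(stores: Resettable[]): Promise<string[]> {
  const failed: string[] = [];
  for (const store of stores) {
    try {
      await store.reset();
    } catch {
      failed.push(store.name);
    }
  }
  return failed;
}
```

A non-empty return value is worth treating as a hard stop for the test: a run that starts from partially reset state is exactly the kind of run that produces a one-platform mystery later.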

Network mocking gets the same leverage. Because the app makes its requests through a JS networking layer, you can intercept and serve fixtures once — at the fetch/XHR boundary — and have it apply to both runtimes. Key the fixtures to named scenarios ('empty feed', 'payment declined', 'server 500') and let the agent select one via launch config, so it drives the same reproducible journey on iOS and Android from a single set of fixtures.
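
As a rough illustration of scenario-keyed fixtures, the sketch below builds a fetch-like function from a named scenario. It is deliberately simplified: the paths, status codes, and bodies are invented for the example, and `MockResponse` models only the small part of a real fetch response that the example needs.

```typescript
type Scenario = "empty feed" | "payment declined" | "server 500";

interface Fixture {
  status: number;
  body: unknown;
}

// Hypothetical fixture table: scenario -> request path -> canned response.
const fixtures: Record<Scenario, Record<string, Fixture>> = {
  "empty feed": { "/feed": { status: 200, body: { items: [] } } },
  "payment declined": { "/pay": { status: 402, body: { error: "card_declined" } } },
  "server 500": { "/feed": { status: 500, body: { error: "internal" } } },
};

// A reduced stand-in for a fetch response.
interface MockResponse {
  status: number;
  ok: boolean;
  json(): Promise<unknown>;
}

// Build a fetch-like function that serves only the chosen scenario's fixtures.
function createMockFetch(scenario: Scenario): (path: string) => Promise<MockResponse> {
  const table = fixtures[scenario];
  return async (path: string): Promise<MockResponse> => {
    const fixture = table[path];
    if (!fixture) {
      // Fail loudly: an unmocked request would make the run non-deterministic.
      throw new Error(`No fixture for ${path} in scenario "${scenario}"`);
    }
    return {
      status: fixture.status,
      ok: fixture.status >= 200 && fixture.status < 300,
      json: async () => fixture.body,
    };
  };
}
```

Because the scenario is just a string, it can travel in launch config unchanged to both targets, which is what keeps the iOS and Android journeys comparable.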

The principle from the other posts holds: mock at the boundary, not in the business logic, so every screen, parse path, and transition stays real. The React Native bonus is that 'the boundary' is one JS seam instead of two native ones — write it once, get determinism on both platforms.

Write the reset and the mocks once in JavaScript and they cover both runtimes. The shared-codebase leverage that makes React Native attractive for shipping is the same leverage that makes its test harness cheaper — if you put the seams in the JS layer.

4. Screen actions: shared intent, per-platform reality

Screen actions — 'sign in as a returning user', 'open the third feed item', 'pull to refresh' — are written at the level of user intent, and most of them are genuinely shared across platforms because the component tree is shared. That's the ideal: the agent calls one named action and it does the right thing on both runtimes.

But some actions can't be shared, because the platforms genuinely differ. A native date picker is a wheel on iOS and a calendar dialog on Android. A permission prompt is a different dialog with different copy on each. Back navigation is a gesture or system button on Android and an edge swipe or a navigation-bar back button on iOS. The screen action layer handles this with per-platform overrides: a shared action with a small platform-specific branch where reality forces one.
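
The shared-with-override shape can be sketched in a few lines. The `ScreenAction` type, the step strings, and `resolveSteps` are illustrative assumptions rather than our actual action format; the point is only that the shared steps are the default and a platform entry replaces them when present.

```typescript
type Platform = "ios" | "android";

interface ScreenAction {
  name: string;
  steps: string[]; // shared default, used when no override exists
  overrides?: Partial<Record<Platform, string[]>>; // per-platform replacement
}

// Pick the platform-specific steps if present, otherwise the shared ones.
function resolveSteps(action: ScreenAction, platform: Platform): string[] {
  return action.overrides?.[platform] ?? action.steps;
}

// Example: back navigation differs, so Android gets an explicit override.
const goBack: ScreenAction = {
  name: "go back",
  steps: ["tap the navigation-bar back button"],
  overrides: { android: ["press the system back button"] },
};
```

Here `resolveSteps(goBack, "ios")` falls through to the shared steps, while `resolveSteps(goBack, "android")` returns the override, so the divergence is visible in the data rather than buried in a conditional.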

When a screen changes and an action breaks, the agent does the same thing it does everywhere — detects the break, walks the new screen, proposes an updated action — but now with an extra check: did this change affect one platform or both? An override that drifts on Android while iOS stays put is itself a signal. The agent proposes the diff; a human approves it and records a new golden trace per platform before it ships.

5. Watch for divergence — the failure that only happens on one side

Every anomaly the single-platform posts watch for still applies here, per platform. But React Native adds the headline check, the one that justifies running the suite twice: platform divergence. We run the same agent intent on both runtimes and diff the outcome, and we flag it when they disagree in a way the scenario didn't expect:

Behavioural divergence. The same action produces a different result — a form submits on iOS and silently no-ops on Android, a deep link lands on the right screen on one platform and the wrong one on the other.
Rendering divergence. A layout that fits on iOS clips on Android (or vice versa) at the same logical size — caught by a cross-platform screenshot diff even when the target element still resolves on both.
Timing divergence. A transition that settles promptly on one runtime and drags on the other, often a sign of a bridge or native-module cost that only bites one platform.
Capability divergence. A native module or permission that behaves differently — a camera, a biometric prompt, a notification — where the JS code is identical but the native reality isn't.

None of these necessarily fails a single-platform run; the iOS test can be perfectly green while Android is broken. That's the entire point of testing both together. The divergence report — same intent, different outcome — is the artifact that catches the class of bug React Native is most prone to and most likely to ship, because the developer only ran one platform.
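
A simplified sketch of the diff itself, covering three of the four classes above (capability divergence usually needs scenario-specific checks and is left out). The `Outcome` fields and the default `timingRatio` of 2 are assumptions made for the example, not measured thresholds.

```typescript
// What one platform's run of an intent produced (hypothetical fields).
interface Outcome {
  passed: boolean;
  finalScreen: string; // testID of the screen the intent ended on
  settleMs: number; // time for the last transition to settle
  clippedTestIds: string[]; // elements the screenshot diff flagged as clipped
}

type Divergence = "behavioural" | "rendering" | "timing";

// Compare the same intent's outcome on both platforms.
function diffOutcomes(ios: Outcome, android: Outcome, timingRatio = 2): Divergence[] {
  const found: Divergence[] = [];

  if (ios.passed !== android.passed || ios.finalScreen !== android.finalScreen) {
    found.push("behavioural");
  }

  // Order-insensitive comparison of the clipped element sets.
  const key = (ids: string[]) => ids.slice().sort().join("\n");
  if (key(ios.clippedTestIds) !== key(android.clippedTestIds)) {
    found.push("rendering");
  }

  // Flag timing only when one side is slower by more than the given ratio.
  const slow = Math.max(ios.settleMs, android.settleMs);
  const fast = Math.min(ios.settleMs, android.settleMs);
  if (slow > fast * timingRatio) {
    found.push("timing");
  }
  return found;
}
```

An empty result means the two runs agreed on these three axes, not that both are correct; both platforms can still fail the scenario in the same way, which the per-platform checks catch.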

6. One config, two targets

The agent's declarative map — screens, testIDs, named actions with their per-platform overrides, mock scenarios, the platform matrix, bounded step counts — lives in one versioned config. Treat it as code: in the repo, reviewed, every change a traceable diff. The structure mirrors the codebase it tests: mostly shared, with explicit per-platform overrides where the runtimes force them.

This closes the loop with divergence watching. When a run surfaces an unexpected platform difference, the agent proposes a versioned change — here's the action that now needs an Android override, here's the testID that stopped resolving on one side, here's the golden trace per platform that now passes. A human reviews it exactly like a code change.

Versioning answers the React Native version of the universal question: did the suite change, did the shared code change, or did one platform's native reality change underneath us? With the config versioned alongside the app, the diff tells you which — and the per-platform override structure tells you whether the change was shared or one-sided. Pin the model version, pin the config version, pin both platform targets, and a regression has nowhere ambiguous to hide.
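
The pinning idea can be made concrete with a small run manifest recorded alongside every run. The field names below are hypothetical, and the target strings would be whatever identifies your simulator and emulator images; the sketch only shows how comparing two manifests narrows down what moved.

```typescript
// Everything pinned for one run (hypothetical field names).
interface RunManifest {
  modelVersion: string;
  configVersion: string;
  appCommit: string;
  targets: { ios: string; android: string }; // device or OS image identifiers
}

// List which pins differ between two runs, so a regression can be attributed.
function changedPins(prev: RunManifest, curr: RunManifest): string[] {
  const changed: string[] = [];
  if (prev.modelVersion !== curr.modelVersion) changed.push("modelVersion");
  if (prev.configVersion !== curr.configVersion) changed.push("configVersion");
  if (prev.appCommit !== curr.appCommit) changed.push("appCommit");
  if (prev.targets.ios !== curr.targets.ios) changed.push("targets.ios");
  if (prev.targets.android !== curr.targets.android) changed.push("targets.android");
  return changed;
}
```

If a run regresses and `changedPins` returns only `targets.android`, the suite and the shared code are ruled out before anyone opens a debugger.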

Structure the agent config the way React Native structures the app: shared by default, with explicit per-platform overrides. When something diverges, the config diff shows you immediately whether it was a shared change or a one-platform surprise.

7. Artifacts come back per platform

Because you ran on both runtimes, you capture the artifact set twice, and the comparison between the two is itself the most valuable output. On iOS that's the xcresult bundle (screenshots, timeline, logs); on Android it's a set you assemble yourself (logcat, screenshots, failure video). On both, attach the agent's-eye screenshot and the natural-language plan it emitted per step.

The cross-platform pairing is what makes divergence debuggable. When the same intent passed on iOS and failed on Android, putting the two artifact sets side by side — the screenshot the agent saw on each, the plan it wrote on each — turns 'it works on my machine' into a concrete, reproducible difference you can hand to whoever owns the native side. It also lets you reconstruct a one-platform anomaly without needing both devices in front of you.
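
To make the side-by-side concrete, here is a small sketch that pairs per-step artifacts from both runs and returns the first step where they part ways. It assumes each run yields an ordered array with one entry per agent step; `StepArtifact`, `PairedStep`, and `firstDivergentStep` are illustrative names.

```typescript
// One agent step's artifacts on one platform (hypothetical shape).
interface StepArtifact {
  screenshotPath: string;
  plan: string; // the natural-language plan emitted for this step
  ok: boolean;
}

interface PairedStep {
  index: number; // zero-based position in the run
  ios: StepArtifact | undefined;
  android: StepArtifact | undefined;
}

// Find the first step where one platform succeeded and the other did not,
// or where one run ended early. Returns null when the runs agree throughout.
function firstDivergentStep(ios: StepArtifact[], android: StepArtifact[]): PairedStep | null {
  const length = Math.max(ios.length, android.length);
  for (let i = 0; i < length; i++) {
    const a = ios[i];
    const b = android[i];
    if (!a || !b || a.ok !== b.ok) {
      return { index: i, ios: a, android: b };
    }
  }
  return null;
}
```

The returned pair is the thing to hand over: two screenshots and two plans for the same step, one from each runtime.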

And the screenshot-plus-plan pair does the usual job per platform: separating a perception error (the plan describes a screen the screenshot doesn't show) from a reasoning error (the plan reads the screen right but picks the wrong move). Run twice, capture twice, and the divergence you most need to catch is sitting in the diff between the two.

What this adds up to

React Native doesn't change the seven levers; it changes what you do with them. The locator contract gets cheaper — one testID for both platforms. The state reset and network mocks get cheaper — one JS seam for both. But the watching gets a new and essential job, because the bug class React Native is most likely to ship is the one that appears on exactly one runtime, and you only catch it by running both and diffing.

The shared codebase is a real advantage, and it extends to testing — but only if you put the seams in the JS layer and only if you actually run both platforms instead of trusting one to stand in for the other. Run once and you've tested half your app while believing you tested all of it. Run twice, diff the difference, and you've caught the thing your users would have caught for you.

React Native gives you one codebase and two runtimes. The bug that ships is almost always the one that only breaks on the platform the developer didn't open — so the whole job is running both and caring about the difference.

Key takeaways

Platform divergence is React Native's defining failure mode. Run the same agent intent on both runtimes and diff the outcome, or you ship the bug that only appears on one.
testID is the single cross-platform locator contract — it maps to the iOS accessibility identifier and the Android resource-id. Keep it honest once and both platforms benefit.
The bridge is the flake source. Detox synchronizes against native and JS idleness; where it still flakes, the agent watches for an unchanged tree rather than retrying blindly.
Put reset and network mocks in the JS layer so one seam covers both platforms; structure screen actions and the agent config as shared-with-per-platform-overrides.
Capture artifacts per platform and compare them — the side-by-side of two screenshot-plus-plan pairs is what turns 'works on iOS' into a concrete, reproducible Android bug.

FAQs

Detox or Appium for the underlying driver?

We default to Detox for React Native because its grey-box synchronization against native and JS idleness removes a large class of bridge-timing flake that Appium, as a black-box driver, can't see. Appium earns its place when you need to drive flows Detox can't reach or share a harness with non-React-Native apps. Either way the agent reasons in screen actions; the driver is an implementation detail underneath.

Does this cover the New Architecture (Fabric, TurboModules) and Hermes?

Yes, and it matters more there, not less. The New Architecture changes the rendering and native-module paths, which is exactly where platform divergence and timing differences originate. The testID contract and the divergence-diffing approach are unchanged; what changes is that the anomalies the New Architecture introduces tend to be one-platform timing or rendering differences — precisely what running both and diffing is built to catch.

Why run both platforms every time instead of alternating?

Because the bug you're hunting is the one that appears on only one platform, and alternating means half your runs can't see it. The marginal cost of the second platform is real but small once the harness is shared — one testID contract, one set of JS-layer mocks, one config with overrides. The cost of shipping a one-platform bug to a store and waiting on a review cycle to fix it is much larger.

We share business logic but the UIs diverge a lot. Does the shared-action model still work?

Yes — that's what the per-platform override structure is for. Actions are shared by default and branch only where reality forces it (native pickers, permission sheets, back navigation). If your UIs diverge heavily, you'll have more overrides, and that's fine: the config makes the divergence explicit and reviewable rather than hidden. The override count is itself a useful signal about how 'cross-platform' the app really is.

Can we keep our existing Jest/component tests?

Absolutely. Those test logic and rendering in isolation and stay as fast, cheap coverage. The agentic layer sits at the end-to-end level on real builds across both platforms, adding the state discipline, divergence watching, and artifact capture that unit and component tests structurally can't provide. We run them side by side.

Shipping React Native to both stores?

We scope agentic React Native QA to run both platforms in one pass — shared testID contract, JS-layer state and network seams, and a divergence report that catches the one-platform bug before your users do. No retries-to-green theatre.


About the author: Venkata Kari · Founder, GVK Technologies

Twenty years in QA leadership, most of it spent watching teams ship around a red dashboard. GVK Technologies builds and operates agentic test suites for product engineering teams across web, mobile, and API — see the case studies for measured runs against real apps.
