
Beyond XCTSkip: Stop Flaky iOS Tests from Blocking Your Team

By Ventsi Tsachev, April 24, 2025

For many iOS development teams, this scene plays out far too often: a critical pull request is good to go, the merge queue starts rolling, and... CI breaks. The culprit? Not a genuine regression, but that one unpredictable test – the one that usually passes.

Flaky tests in Continuous Integration (CI) aren't just annoying; they actively sabotage your team's productivity and slow down feature releases.

Many teams attempt workarounds like retries or detection dashboards, but the specific hurdles of the iOS ecosystem mean these fixes often fall short. This pain isn't just theoretical; iOS teams feel it acutely, and it calls for a more fundamental solution. We'll explore why flaky tests are particularly challenging for iOS teams and how shifting from reactive patches to automated mitigation can bring back reliable CI and boost your team's speed.

Why Flakes Hurt More in iOS Projects

Mobile projects have inherent complexities that make flaky tests especially painful:

  • The High Cost of Execution: Launching simulators or managing physical device farms is significantly slower and demands far more resources than typical backend test environments. Even with parallelization, each test run carries substantial overhead, so every flaky retry wastes valuable compute time and pushes feedback further down the line.

  • XCTest's Fragile Nature: UI testing, particularly with XCTest, is notoriously flaky. Things like asynchronous operations, unexpected system pop-ups (hello, location permissions!), animation timings, network glitches, or slight variations in interaction coordinates can easily cause tests to fail unpredictably.

  • Environment Volatility: A new iOS version, an Xcode update, or even a change in a third-party dependency can suddenly break existing tests, even if your app code hasn't changed. Debugging these environment-specific flakes takes a frustrating amount of time.

  • Merge Queues Magnify the Blockage: In active repositories, a single flaky test failing in the merge queue doesn't just affect one developer; it blocks everyone waiting in line, slowing productivity to a crawl for completely unrelated changes.

  • The "Works on My Machine" Nightmare: Flakes that only appear in the specific context of CI, but pass reliably locally, are incredibly tough to reproduce and fix, often demanding painful deep dives into CI logs and environment setups.
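
The timing-related flakiness above is usually tamed by replacing fixed sleeps with explicit, condition-based waits (in XCTest UI tests, `XCUIElement.waitForExistence(timeout:)` plays this role). As a language-level sketch of the idea in plain Swift, with illustrative names and timings:

```swift
import Foundation

// A generic "explicit wait" helper: poll a condition until it holds or a
// deadline passes. This is a sketch of the idea, not XCTest API; in real UI
// tests, prefer XCUIElement.waitForExistence(timeout:) and XCTestExpectation.
func waitUntil(timeout: TimeInterval,
               pollingEvery interval: TimeInterval = 0.05,
               condition: () -> Bool) -> Bool {
    let deadline = Date().addingTimeInterval(timeout)
    while Date() < deadline {
        if condition() { return true }
        Thread.sleep(forTimeInterval: interval)
    }
    return condition() // one final check at the deadline
}

// Simulated async work: the "login finished" signal becomes true after ~0.1s.
let start = Date()
let loginFinished = { Date().timeIntervalSince(start) >= 0.1 }

let ok = waitUntil(timeout: 1.0, condition: loginFinished)  // true
let timedOut = waitUntil(timeout: 0.2) { false }            // false
```

An explicit wait succeeds as soon as the condition holds and fails deterministically at the deadline, instead of guessing how long an animation or network call will take.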

When Common Tactics Aren't Enough

Recognizing the pain, many teams implement measures like:

  • Test Retries: Automatically re-running failed tests or entire jobs.

  • Detection Dashboards: Identifying tests that fail frequently.

  • Manual Triage: Alerting teams via Slack or requiring manual investigation/disabling of failed tests.

While these steps provide some visibility, they fundamentally fail to solve the core problem in iOS workflows:

  • Retries Mask, Don't Fix (and Cost): Automatic retries cost a lot in slow iOS CI environments. Worse, they often just mask underlying instability in the test or application code, allowing potential problems to linger and grow.

  • Detection Isn't Prevention: Simply knowing a test is flaky doesn't prevent it from failing right now and jamming the merge queue. This inevitably leads to alert fatigue (engineers start ignoring warnings) or requires constant manual fixes (like temporarily slapping on XCTSkip directives).

  • Manual Triage Doesn't Scale: Relying on platform teams or QA to manually disable/re-enable tests creates bottlenecks, especially when a merge queue is blocked and pressure is high. It often results in developers scattering XCTSkip flags throughout the codebase as a quick fix, piling up technical debt and potentially masking real bugs. Skipping tests might unblock the queue, but it blinds teams to the very issues those tests were written to catch, leading to long-term signal loss.

  • Erosion of Trust: When CI constantly fails because of flakes, developers naturally start distrusting the test suite. Real bugs might get dismissed as 'just another flake,' delaying crucial fixes.
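
For concreteness, the "quick fix" described above usually looks like this in a test file: a real XCTSkip directive dropped in to unblock the queue (the test and ticket names here are hypothetical):

```swift
import XCTest

final class CheckoutFlowTests: XCTestCase {
    func testApplyCouponUpdatesTotal() throws {
        // Quick fix to unblock the merge queue: skip unconditionally.
        // The behavior this test guarded is now invisible to CI.
        throw XCTSkip("Flaky on CI, see MOBILE-1234")

        // ...the original assertions below this line never run again...
    }
}
```

Each of these skips is easy to add and easy to forget; they accumulate quietly until the suite no longer covers what it claims to.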

The Shift: From Reactive Fixes to Automated Mitigation

To truly tackle flaky tests, teams need to move past simple retries and detection, embracing automated, system-level mitigation. An effective system should:

  1. Detect Flakes Reliably and Automatically: Identify flaky tests with high precision using historical data and run context, before they cause widespread disruption.

  2. Quarantine Without Code Changes: Automatically isolate known flaky tests from blocking CI runs (especially on critical paths like merge queues) without forcing developers to tweak test files (XCTSkip) or commit temporary hacks. Flake management shifts to being an infrastructure issue, decoupled from the code itself. This approach avoids technical debt and keeps the codebase clean while allowing infrastructure to respond dynamically to instability.

  3. Provide Contextual Visibility & Auditability: Offer clear insights into which tests are quarantined, why, when, and on which branches or CI jobs. Maintain a history for transparency and tracking fixes.

  4. Integrate Seamlessly with CI/CD & Workflows: Connect directly with systems like GitHub Actions, GitLab CI, Jenkins, etc., and tools like Slack, Linear and Jira. Developers need clear visibility into flake status and the ability to manage quarantines easily, right within their existing workflows.
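
To make step 2 concrete: xcodebuild already supports excluding tests at invocation time via `-skip-testing:` arguments, so a quarantine list can live entirely in infrastructure and be applied per run without touching the test files. A minimal sketch, assuming a hypothetical quarantine list fetched by CI (the service, target, and test names are all illustrative):

```swift
import Foundation

// Hypothetical sketch: CI fetches the current quarantine list from a
// flake-management service (an infrastructure concern, not a code change)
// and translates it into xcodebuild -skip-testing: arguments, so no test
// file ever needs an XCTSkip.
func skipArguments(quarantined: [String], testTarget: String) -> [String] {
    quarantined
        .sorted() // deterministic ordering for reproducible CI invocations
        .map { "-skip-testing:\(testTarget)/\($0)" }
}

let quarantine = ["LoginUITests/testFaceIDPrompt",
                  "CheckoutUITests/testApplePaySheet"]
let args = skipArguments(quarantined: quarantine, testTarget: "MyAppUITests")
// CI appends `args` to its `xcodebuild test ...` invocation for this run.
```

Lifting a quarantine then becomes a data change in the flake-management system rather than a commit, which is what keeps the codebase clean.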

Reclaiming Velocity: The Benefits of Automated Flake Mitigation

Implementing a system that automatically mitigates flaky tests delivers compounding returns:

  • Reliable CI, Faster Merges: Engineers waste far less time chasing down unpredictable failures. Builds get faster and become more reliable.

  • Smooth Merge Trains: Known flakes are automatically prevented from blocking unrelated changes, reducing developer frustration and wasted CI cycles.

  • Empowered Platform Teams: Manual triage vanishes, replaced by an automated, trackable system. This frees up platform engineers to focus on strategic infrastructure work instead of constant firefighting.

  • Restored Confidence in Tests: When CI fails, it's far more likely to be a real issue, encouraging prompt investigation and better testing practices.

The Real-World Impact: Consider a large mobile organization that implemented automated flake quarantining with Trunk. In just three months, the system automatically identified and quarantined over 135,000 flaky test executions within their CI jobs. Based only on average CI job time and typical retry counts, this saved at least 4,000 developer hours. But when they factored in the reduced context switching, investigation time, and merge queue jams, their internal estimates put the real productivity gain closer to 12,000 hours. That's like getting back the work of 2-3 full-time engineers previously bogged down by flake-related headaches.

Quick Check: Is Your Team Drowning in Flakes?

Does this sound like your team?

  • Is your codebase littered with XCTSkip or similar flags added solely to unblock merges?

  • Do you have dashboards identifying flakes, but those tests still regularly block CI?

  • Do platform or QA engineers spend significant time manually disabling/re-enabling tests?

  • Is "rerun and pray" a common reaction to CI failures?

  • Is it hard to answer questions like: 'How many different flaky tests are hitting main?' or 'Is this failure something new, or just a known flake acting up again?'

If several of these sound familiar, it's a clear sign your manual processes can't keep up and you need a solution that tackles the root cause.

Treat Flakes Like the Infrastructure Problem They Are

Flaky tests might be an unavoidable side effect of complex iOS development, but they don't have to constantly disrupt your team and waste effort. By recognizing them as an infrastructure problem – one that needs automation, predictability, and clear tracking – teams can finally move beyond flimsy patches.

Putting systems in place that smartly detect and automatically sideline flakes before they derail developers turns CI from a bottleneck into a reliable engine for shipping faster. This reclaims precious engineering time, lifts team morale, and ultimately speeds up delivery.

The good news? You don't have to build this capability from scratch anymore. Tools like Trunk Flaky Tests are built specifically to deliver the automated detection, code-free quarantining, and smooth integration required to truly solve the flaky test problem at its source.

Ready to stop losing hours to flaky tests? Learn more about Trunk Flaky Tests.
