Keeping CI Green with Metabase

How Metabase manages one of the largest Cypress test suites


Challenges

Metabase runs an extremely large Cypress test suite of 4,000 end-to-end tests. For every pull request, the team runs approximately 70 CI runners in parallel, and the suite takes 30-40 minutes to complete. When a test fails because of flakiness, developers lose another 20-30 minutes to reruns.

Solution

Trunk Flaky Tests shows the team how often each test flakes across the whole organization. As Ryan explains, "It's really nice to be able to go into the dashboard and see—oh, this has flaked on 15 different PRs today. This is actually impacting the organization and needs to be either quarantined or fixed quickly."

Results

While flaky tests remain what Ryan calls "a garden that needs tending," Trunk provides the tools to tend it effectively. "It gives us high-level observability into the fact that tests are flaky and why they are flaky," he summarizes.

"It's really nice to be able to go into the dashboard and see—oh, this has flaked on 15 different PRs today. This is actually impacting the organization and needs to be either quarantined or fixed quickly."
Ryan Laurie, Software Engineer

Who is Metabase

Metabase is an open-source analytics and data visualization tool that lets anyone in a company explore and understand their data. With thousands of companies using it in production, Metabase connects to dozens of different database engines (from MySQL and Postgres to BigQuery and Databricks) and makes it easy to create dashboards and visualizations.

Testing at Massive Scale

"All of our merges to master are gated behind a huge suite of front-end, back-end and end-to-end tests," explains Ryan Laurie, a front-end engineer at Metabase. "If something's flaky, it blocks merge, which makes engineers unhappy."

The complexity comes from Metabase connecting to dozens of different database engines while maintaining a consistent user experience across all integrations.

"Metabase is a very user experience-heavy application," notes Eliot Daigneault-Mestre. "Some things are very difficult to test without fully setting up an end-to-end test. It's much easier to set up an end-to-end test than to mock all dependencies at the integration test layer."

Build vs Buy

When Metabase's previous CI analysis provider broke because of Cypress API changes, the team considered building an internal solution, but quickly recognized how much engineering overhead that would involve.

"We have a data pipeline that collects test results into our own database, but it's pretty limited," Ryan shares. "We'd rather have you handle that analysis than build it ourselves—we have other things to focus on."

Trunk's flaky test detection offered statistical analysis and resilience to API changes, without the cost of building it in-house.

Data Over Guesswork

Before Trunk, developers couldn't tell whether a test failure was their own problem or a systemic issue affecting the entire team. Ryan says this data keeps developers from "gaslighting themselves" about whether a failure is related to their code.

"As a developer, I've seen this test flake twice on my PRs today," Ryan explains. "It's really nice to be able to go into the dashboard and see—oh, this has flaked on 15 different PRs today. This is actually impacting the organization and needs to be either quarantined or fixed quickly."

This visibility is crucial for prioritization. The team can now distinguish tests that fail once in a hundred thousand runs from those that flaked 45 times in a single day.
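To make that distinction concrete, here is a minimal Python sketch of the kind of aggregation a flaky-test dashboard performs. It is not Trunk's implementation: the `CiRun` record, its `flaked` flag (assumed to mean "failed, then passed on retry"), and the `summarize` function are all hypothetical. It ranks tests by how many distinct PRs they flaked on during a given day and reports each test's long-run flake rate alongside.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CiRun:
    """One CI execution of one test (hypothetical record shape)."""

    test_name: str
    pr_number: int
    day: date
    flaked: bool  # assumed: True if the test failed, then passed on retry


def summarize(runs: list[CiRun], on: date) -> list[dict]:
    """Rank tests by how many distinct PRs they flaked on during `on`."""
    total_runs: dict[str, int] = defaultdict(int)
    total_flakes: dict[str, int] = defaultdict(int)
    flakes_on_day: dict[str, int] = defaultdict(int)
    prs_on_day: dict[str, set[int]] = defaultdict(set)

    for run in runs:
        total_runs[run.test_name] += 1
        if run.flaked:
            total_flakes[run.test_name] += 1
            if run.day == on:
                flakes_on_day[run.test_name] += 1
                prs_on_day[run.test_name].add(run.pr_number)

    rows = [
        {
            "test": name,
            "flakes_on_day": flakes_on_day[name],
            "prs_affected": len(prs_on_day[name]),
            # Long-run rate separates rare flakes from chronic ones.
            "flake_rate": total_flakes[name] / total_runs[name],
        }
        for name in total_runs
    ]
    # Tests that disrupted the most PRs that day come first,
    # with the raw flake count as a tie-breaker.
    rows.sort(key=lambda r: (r["prs_affected"], r["flakes_on_day"]), reverse=True)
    return rows
```

Sorting by distinct PRs rather than raw failure count mirrors Ryan's point: a test that flakes on 15 different PRs in one day is blocking many engineers, while a once-in-a-hundred-thousand flake can wait.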

Debugging at Scale

Trunk's aggregation of test failures proves particularly valuable for debugging issues that can't be reproduced locally.

"When a test flakes with the same error message repeatedly, I can identify patterns—especially for flakes that don't reproduce locally due to CI runner speed or environment differences," Ryan notes. "Having access to aggregated CI errors is really helpful."
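Spotting "the same error message" across many CI runs only works if messages that differ in incidental details, such as durations or element IDs, are treated as one. The Python sketch below is an assumption about how such grouping could work, not a description of Trunk's internals; the noise rules, `normalize`, and `top_signatures` are hypothetical.

```python
import re
from collections import Counter

# Hypothetical rules that strip run-to-run noise so repeated failures
# collapse into one signature. Hex IDs are replaced before plain digits.
_NOISE = [
    (re.compile(r"\b0x[0-9a-fA-F]+\b"), "<hex>"),
    (re.compile(r"\d+"), "<n>"),
]


def normalize(message: str) -> str:
    """Reduce an error message to a stable signature."""
    signature = message.strip()
    for pattern, replacement in _NOISE:
        signature = pattern.sub(replacement, signature)
    return signature


def top_signatures(messages: list[str], limit: int = 5) -> list[tuple[str, int]]:
    """Count failures per signature, most frequent first."""
    return Counter(normalize(m) for m in messages).most_common(limit)
```

Which details count as noise depends on the suite; rules that are too aggressive can merge genuinely different failures into one bucket.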

Trunk also fits into Metabase's existing workflow. Through webhooks and the Linear integration, flaky tests become trackable issues, and the PR comment feature gives developers feedback about test failures directly in their pull requests.

Iterating Together

As an early beta partner, Metabase shaped the product's evolution. "It was great to have feedback and then a week later have a feature," Ryan recalls. "We worked pretty closely on that PR comment feature... it's been fun to see this integration evolve."

The Results

For teams facing similar challenges, both engineers advise: understand why your tests are flaking before fixing them, and treat flaky tests with high priority, even if that means deleting tests that aren't providing value. As Metabase continues to scale its testing infrastructure, Trunk Flaky Tests remains an essential tool for maintaining developer productivity and code quality across their organization.