If your company has Playwright for newer flows, Selenium for legacy coverage, and Cypress for the team that moved fastest last year, you are not alone. A mixed UI test stack usually starts as a sensible compromise. It lets teams ship, modernize gradually, and avoid a risky rewrite. The problem is that the bill for this compromise shows up later, and it rarely appears in one place.

The real cost of maintaining a mixed UI test stack is not just license spend or the number of test files. It is the combination of duplicated coverage, framework-specific debugging, CI runtime, flaky failures, test suite ownership, and the ongoing tax of keeping three mental models alive at once. If you are a CTO, QA director, engineering manager, or founder trying to decide whether to keep the stack as-is, consolidate, or migrate, you need a way to estimate that cost in practical terms.

This article breaks down the hidden maintenance costs of a split automation stack, shows a simple model for estimating them, and explains where consolidation can create leverage without forcing a big-bang rewrite. For teams already considering a migration, a platform like Endtest can be a simpler path because it uses agentic AI, imports existing tests, and reduces the ongoing burden of maintaining multiple frameworks in parallel.

Why mixed UI test stacks become expensive

Most teams do not choose a mixed stack because they love complexity. They end up there because the organization changed faster than the automation strategy.

Common causes include:

  • A legacy Selenium suite that still covers critical flows
  • New product areas started in Playwright because it is faster to author and better for modern browser automation
  • A Cypress suite built by a frontend team that preferred its local developer experience
  • Acquisitions, team splits, or vendor-driven migrations that never fully completed
  • Different testing goals, such as cross-browser coverage, component-adjacent checks, and end-to-end regression, being solved with different tools

Each framework has strengths. Selenium has broad ecosystem support and a long history. Playwright has strong browser automation ergonomics and modern selector handling. Cypress has a developer-friendly runner and tight feedback loops for certain frontend workflows. The issue is not that any one of them is bad. The issue is that every additional framework multiplies maintenance work.

A test suite does not just measure product quality. It also measures how many frameworks, patterns, and failure modes your organization is willing to pay for.

The main cost centers you should measure

To estimate the cost of maintaining a mixed UI test stack, track five buckets.

1. Duplication of coverage

Mixed stacks often test the same user journeys more than once. Login, checkout, signup, search, account settings, and other critical paths may exist in multiple frameworks because different teams wrote them at different times.

Duplication creates several forms of waste:

  • The same bug may fail in multiple suites, creating noisy triage
  • The same UI change requires updates in multiple codebases
  • No one is fully sure which framework is authoritative for a given flow
  • You may be paying for redundant CI time without additional confidence

To estimate duplication, map your top flows and ask:

  • Which flows are covered by more than one framework?
  • Which suites are required for release gating?
  • Which suites are effectively shadow coverage and rarely reviewed?

A practical metric is duplicate critical flow percentage, meaning the share of your most important flows that are covered in at least two frameworks.
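As a rough sketch, this metric can be computed from a simple flow-to-framework mapping. The `FlowCoverage` shape, the flow names, and the framework assignments below are illustrative assumptions, not data from a real suite.

```typescript
// Hypothetical inventory shape: one entry per critical flow,
// listing every framework that currently covers it.
type Framework = "playwright" | "selenium" | "cypress";

interface FlowCoverage {
  flow: string;
  frameworks: Framework[];
}

// Share of critical flows covered by at least two distinct frameworks, as a percentage.
function duplicateCriticalFlowPercentage(flows: FlowCoverage[]): number {
  if (flows.length === 0) return 0;
  const duplicated = flows.filter((f) => {
    // Count distinct frameworks so a repeated entry is not treated as duplication.
    const distinct = f.frameworks.filter((fw, i, all) => all.indexOf(fw) === i);
    return distinct.length >= 2;
  }).length;
  return (duplicated / flows.length) * 100;
}

// Illustrative data only.
const criticalFlows: FlowCoverage[] = [
  { flow: "login", frameworks: ["selenium", "playwright", "cypress"] },
  { flow: "checkout", frameworks: ["selenium", "playwright"] },
  { flow: "search", frameworks: ["playwright"] },
  { flow: "account settings", frameworks: ["cypress"] },
];

const duplicatePct = duplicateCriticalFlowPercentage(criticalFlows);
console.log(`${duplicatePct.toFixed(1)}% of critical flows are duplicated`);
```

Applied to the example profile later in this article (35 of 120 critical flows duplicated), the same calculation gives roughly 29%.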

2. Maintenance time per framework

This includes all the time spent writing and updating tests, not just the time spent on failures. It usually includes:

  • Locator updates after UI changes
  • Wait and timeout adjustments
  • Browser or driver upgrades
  • Changes to test utilities and shared helpers
  • Refactoring flaky or brittle tests
  • Updating assertions after product behavior changes

This is where Playwright and Selenium maintenance costs often diverge in practice. Playwright can reduce some friction with modern APIs, but it still needs attention. Selenium can be highly capable, but larger suites often accumulate more custom glue and browser compatibility work. Cypress can be pleasant to author, but maintaining stable test architecture still requires discipline.

The important point is that each framework has its own maintenance surface area. If one team owns Playwright, another owns Selenium, and a third owns Cypress, you are not maintaining one system. You are maintaining three.

3. Debugging and triage time

Broken UI tests are expensive because they interrupt engineering flow. Debugging a failure across a mixed stack is slower than debugging within a single standard.

Why?

  • Different reporters and logs
  • Different retry behavior
  • Different wait semantics
  • Different browser execution models
  • Different locator strategies
  • Different ways to reproduce locally

One failing checkout test might be a real product regression, a selector change, a timing issue, or a framework-specific edge case. If your team needs to remember how to diagnose failures in three tools, every red build has a higher handling cost.

The cleanest way to measure this is to track average triage minutes per failing test by framework and then multiply by monthly failure volume.
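A minimal sketch of that calculation follows. The `TriageStats` shape and every number in it are assumptions chosen for illustration; replace them with figures from your own tracker or CI history.

```typescript
// Hypothetical per-framework triage stats.
interface TriageStats {
  framework: string;
  avgTriageMinutesPerFailure: number;
  failuresPerMonth: number;
}

// Monthly triage hours = average minutes per failing test × monthly failure volume / 60.
function monthlyTriageHours(s: TriageStats): number {
  return (s.avgTriageMinutesPerFailure * s.failuresPerMonth) / 60;
}

// Illustrative numbers, not measurements.
const triageStats: TriageStats[] = [
  { framework: "selenium", avgTriageMinutesPerFailure: 45, failuresPerMonth: 16 },
  { framework: "playwright", avgTriageMinutesPerFailure: 20, failuresPerMonth: 15 },
  { framework: "cypress", avgTriageMinutesPerFailure: 20, failuresPerMonth: 9 },
];

for (const s of triageStats) {
  console.log(`${s.framework}: ${monthlyTriageHours(s).toFixed(1)} h/month`);
}

const totalTriageHours = triageStats.reduce((sum, s) => sum + monthlyTriageHours(s), 0);
console.log(`total: ${totalTriageHours.toFixed(1)} h/month`);
```

Breaking the total out by framework is the useful part: it shows whether one suite is responsible for most of the triage load.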

4. CI runtime and compute spend

Mixed stacks often execute the same or similar scenarios more than once. Even if compute cost is not the largest line item, it is real, especially at scale.

You should measure:

  • Total test runtime by framework
  • Number of parallel agents required
  • Container or VM minutes consumed per run
  • Retry overhead
  • Browser matrix duplication

Runtime also has an indirect cost, because slow feedback delays merges and stretches developer wait time. If your release pipeline has to run three browser automation stacks, the total cycle time may become the operational bottleneck rather than the test code itself.
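One way to turn those measurements into a number is to count agent-minutes per month. The sketch below assumes a flat price per agent-minute and models retries as a fractional overhead; the `CiProfile` fields, the per-framework figures, and the price are all hypothetical.

```typescript
// Hypothetical per-framework CI profile. Field names and values are assumptions.
interface CiProfile {
  framework: string;
  wallClockMinutesPerRun: number; // duration of one suite run
  parallelAgents: number;         // agents occupied for that duration
  retryOverhead: number;          // extra fraction of time spent on retries, e.g. 0.25 = 25%
  runsPerMonth: number;
}

// Agent-minutes consumed per month, including retry overhead.
function monthlyAgentMinutes(p: CiProfile): number {
  return p.wallClockMinutesPerRun * p.parallelAgents * (1 + p.retryOverhead) * p.runsPerMonth;
}

// Total monthly compute cost across all frameworks at a flat per-minute price.
function monthlyComputeCost(profiles: CiProfile[], costPerAgentMinute: number): number {
  const minutes = profiles.reduce((sum, p) => sum + monthlyAgentMinutes(p), 0);
  return minutes * costPerAgentMinute;
}

// Illustrative numbers only.
const ciProfiles: CiProfile[] = [
  { framework: "selenium", wallClockMinutesPerRun: 30, parallelAgents: 4, retryOverhead: 0.25, runsPerMonth: 200 },
  { framework: "playwright", wallClockMinutesPerRun: 10, parallelAgents: 4, retryOverhead: 0, runsPerMonth: 200 },
  { framework: "cypress", wallClockMinutesPerRun: 15, parallelAgents: 2, retryOverhead: 0.5, runsPerMonth: 200 },
];

const monthlyCost = monthlyComputeCost(ciProfiles, 0.01); // assumed $0.01 per agent-minute
console.log(`estimated CI compute: $${monthlyCost.toFixed(2)} per month`);
```

This covers only direct compute. The indirect cost of slower feedback does not show up in agent-minutes and has to be estimated separately.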

5. Test suite ownership and coordination

This is the hidden cost that many leaders underestimate.

A test suite needs an owner, even if nobody formally assigned one. In a mixed stack, ownership usually fragments across teams:

  • QA maintains legacy Selenium coverage
  • Frontend engineers own Cypress tests for their area
  • Platform or automation engineers own Playwright infrastructure
  • Release managers try to interpret the combined signal

Fragmented ownership creates gaps:

  • Nobody knows which suite should be updated after a product change
  • Teams duplicate effort because they do not trust the other suite
  • Failures linger because fixes are outside the local team’s scope
  • Framework knowledge becomes siloed

If you care about test suite ownership, mixed frameworks often convert a technical problem into an organizational one.

A simple formula for estimating annual cost

You do not need a perfect financial model to get a useful answer. You need a model that is good enough to compare options.

Use this structure:

Annual cost = maintenance labor + triage labor + duplicate coverage labor + CI compute + coordination overhead

Break each part down like this:

maintenance labor = hours spent updating tests × blended hourly cost
triage labor = failing test hours × blended hourly cost
duplicate coverage labor = duplicated flows × update frequency × hours per update × blended hourly cost
CI compute = test runtime cost per run × runs per year
coordination overhead = meetings, handoffs, ownership churn, framework training, and review time

You can estimate labor cost using your fully loaded internal cost rate, not just salary. For many orgs, this means salary plus benefits, overhead, and management cost.

Example cost model

Imagine a team with the following profile:

  • 3 frameworks in active use
  • 120 critical UI flows
  • 35 flows duplicated across at least two frameworks
  • 40 test failures per month across all suites
  • 20 monthly hours spent on triage across frameworks
  • 60 monthly hours spent updating locators, waits, and helpers
  • 25 monthly hours spent on coordination, ownership handoffs, and review
  • 600 CI runs per month for UI tests

Now ask the more important question: not what the tests cost to run, but what it costs to keep them trustworthy.

A useful estimating method is to calculate monthly hours first, then annualize:

monthly labor hours = 20 + 60 + 25 = 105
annual labor hours = 105 × 12 = 1,260

At a blended rate of $100 per hour, that is $126,000 per year in labor alone, before CI compute. If the failure rate and duplication are high, that number can climb quickly.

The point is not that every org will hit that exact figure. The point is that split ownership and duplicated maintenance create real staffing costs, even when the tools themselves seem inexpensive.
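The same arithmetic can be kept in a small script so the estimate is easy to rerun as inputs change. This is a sketch using the example profile above; the CI cost per run is a hypothetical placeholder because the example does not specify one, and duplicate coverage labor is assumed to be included in the maintenance hours rather than tracked separately.

```typescript
interface MonthlyProfile {
  triageHours: number;
  maintenanceHours: number;
  coordinationHours: number;
  ciRunsPerMonth: number;
}

interface CostAssumptions {
  blendedHourlyRate: number; // fully loaded rate, in dollars
  ciCostPerRun: number;      // assumption: replace with your measured value
}

function annualCost(p: MonthlyProfile, a: CostAssumptions) {
  const monthlyLaborHours = p.triageHours + p.maintenanceHours + p.coordinationHours;
  const annualLaborHours = monthlyLaborHours * 12;
  const annualLaborCost = annualLaborHours * a.blendedHourlyRate;
  const annualCiCost = p.ciRunsPerMonth * 12 * a.ciCostPerRun;
  return {
    annualLaborHours,
    annualLaborCost,
    annualCiCost,
    total: annualLaborCost + annualCiCost,
  };
}

const estimate = annualCost(
  { triageHours: 20, maintenanceHours: 60, coordinationHours: 25, ciRunsPerMonth: 600 },
  { blendedHourlyRate: 100, ciCostPerRun: 2 }, // $2 per run is a made-up figure
);

console.log(estimate);
```

With the assumed $2 per run, CI adds $14,400 to the $126,000 in labor. The exact split matters less than seeing which input dominates when you vary it.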

How to gather the data without overengineering the exercise

You can estimate the cost of maintaining a mixed UI test stack in one or two weeks if you are disciplined.

Start with your test inventory

Create a table with columns like:

  • Flow name
  • Framework
  • Owner
  • Runs per day or per release
  • Historical failure count
  • Average triage time
  • Last meaningful update date
  • Whether the flow overlaps with another framework

You do not need perfect data. You need enough to identify the worst offenders.
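If the inventory lives in a spreadsheet export, a few lines of code are enough to surface the worst offenders. The sketch below mirrors the columns listed above; the field names, the sample rows, and the ranking rule (historical failures × average triage time) are assumptions, and other rankings may suit your data better.

```typescript
// One row of the inventory table. Field names are illustrative.
interface InventoryRow {
  flow: string;
  framework: string;
  owner: string;
  runsPerDay: number;
  failureCount: number;         // historical failures over the sampling window
  avgTriageMinutes: number;
  lastMeaningfulUpdate: string; // ISO date
  overlapsOtherFramework: boolean;
}

// Rank rows by estimated triage minutes spent in the window.
function worstOffenders(rows: InventoryRow[], topN: number): InventoryRow[] {
  const burden = (r: InventoryRow) => r.failureCount * r.avgTriageMinutes;
  return rows.slice().sort((a, b) => burden(b) - burden(a)).slice(0, topN);
}

// Illustrative rows only.
const inventory: InventoryRow[] = [
  { flow: "login", framework: "cypress", owner: "Frontend", runsPerDay: 30, failureCount: 12, avgTriageMinutes: 10, lastMeaningfulUpdate: "2024-02-10", overlapsOtherFramework: true },
  { flow: "checkout", framework: "selenium", owner: "QA", runsPerDay: 12, failureCount: 9, avgTriageMinutes: 40, lastMeaningfulUpdate: "2023-11-02", overlapsOtherFramework: true },
  { flow: "search", framework: "playwright", owner: "Platform", runsPerDay: 30, failureCount: 2, avgTriageMinutes: 15, lastMeaningfulUpdate: "2024-03-01", overlapsOtherFramework: false },
];

for (const row of worstOffenders(inventory, 2)) {
  console.log(`${row.flow} (${row.framework}, owner: ${row.owner})`);
}
```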

Separate flaky failures from product failures

This distinction matters because flaky tests are expensive in a different way than legitimate bug detection. Flakiness creates reruns, alerts, and mistrust. Product failures are valuable because they reveal regressions.

A framework with fewer false failures may still cost more if it requires more manual upkeep. Conversely, a framework with a steep onboarding curve may be acceptable if it dramatically lowers ongoing support time.
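One simple heuristic for making this split, offered as an assumption rather than a standard definition: if the same test both fails and passes on the same commit, treat it as flaky; if it fails on every attempt, treat it as a candidate product failure or a test that needs updating. The `Attempt` record below is a hypothetical shape for CI history.

```typescript
// Hypothetical record of a single test attempt in CI.
interface Attempt {
  testId: string;
  commit: string;
  passed: boolean;
}

type Verdict = "pass" | "flaky" | "consistent-failure";

// Group attempts by test and commit, then classify each group.
function classify(attempts: Attempt[]): Record<string, Verdict> {
  const counts: Record<string, { passes: number; failures: number }> = {};
  for (const a of attempts) {
    const key = `${a.testId}@${a.commit}`;
    const c = counts[key] ?? (counts[key] = { passes: 0, failures: 0 });
    if (a.passed) c.passes += 1;
    else c.failures += 1;
  }

  const verdicts: Record<string, Verdict> = {};
  for (const key of Object.keys(counts)) {
    const c = counts[key];
    if (!c) continue;
    // Mixed outcomes on one commit suggest flakiness, not a code change.
    verdicts[key] = c.failures === 0 ? "pass" : c.passes > 0 ? "flaky" : "consistent-failure";
  }
  return verdicts;
}

// Illustrative history only.
const verdicts = classify([
  { testId: "checkout", commit: "abc123", passed: false },
  { testId: "checkout", commit: "abc123", passed: true },
  { testId: "login", commit: "abc123", passed: false },
  { testId: "login", commit: "abc123", passed: false },
  { testId: "search", commit: "abc123", passed: true },
]);

console.log(verdicts);
```

A consistent failure still needs a human to decide whether the product or the test is wrong, so this only narrows the triage queue.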

Measure change frequency in the application, not just in the test code

The most expensive suites are usually attached to parts of the UI that change often:

  • Auth flows
  • Checkout steps
  • Navigation and layout chrome
  • Feature flags and personalization
  • Dynamic tables and dashboards

If a framework makes these areas expensive to maintain, its real cost rises even if the test code looks tidy.

What mixed stacks hide in day-to-day practice

Framework-specific debugging knowledge

In Selenium, a stale element, timing issue, or driver mismatch may require one style of diagnosis. In Playwright, locator strictness and auto-waiting help, but failures still need context. In Cypress, command queueing and app-under-test constraints can change how a failure manifests.

That means every engineer who touches UI automation has to learn three failure grammars.

Different selector philosophies

Some teams use CSS selectors heavily in one framework and accessibility selectors in another. Others standardize on data attributes in one suite, then gradually drift in another.

The result is inconsistent maintenance cost. A test suite written with brittle selectors may appear cheap until the next redesign.

Here is a simple Playwright pattern that is usually easier to maintain than raw CSS chains:

import { test, expect } from '@playwright/test';

test('can sign in', async ({ page }) => {
  await page.goto('https://example.com/login');
  // Locate fields by their accessible label rather than by CSS structure.
  await page.getByLabel('Email').fill('user@example.com');
  await page.getByLabel('Password').fill('secret');
  // Role-based locators usually survive class and DOM reshuffles better than CSS chains.
  await page.getByRole('button', { name: 'Sign in' }).click();
  await expect(page.getByRole('heading', { name: 'Dashboard' })).toBeVisible();
});

That style still requires maintenance, but usually less than a long brittle selector path. Across three frameworks, consistency matters more than the isolated syntax choice.

Divergent retries and waits

One stack might retry aggressively, another might not. One might hide transient issues, another might expose them immediately. This is useful when intentionally configured, but confusing when it is accidental.

You need to know whether failure rates are comparable across frameworks; otherwise, you will misread the signal.
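One way to make rates comparable is to report the first-attempt failure rate alongside the rate after retries, so a suite with aggressive retries cannot hide its instability. The sketch below assumes you can export per-execution attempt outcomes; the `Execution` shape and the suite labels are hypothetical.

```typescript
// Hypothetical summary of one test execution, including any retries.
interface Execution {
  suite: string;
  attempts: boolean[]; // outcome of each attempt in order; true = passed
}

interface FailureRates {
  firstAttempt: number; // share of executions whose first attempt failed
  afterRetries: number; // share of executions where every attempt failed
}

function failureRates(executions: Execution[], suite: string): FailureRates {
  const own = executions.filter((e) => e.suite === suite && e.attempts.length > 0);
  if (own.length === 0) return { firstAttempt: 0, afterRetries: 0 };
  const firstFailed = own.filter((e) => e.attempts[0] === false).length;
  const allFailed = own.filter((e) => e.attempts.every((passed) => !passed)).length;
  return { firstAttempt: firstFailed / own.length, afterRetries: allFailed / own.length };
}

// Illustrative data: one suite retries once, the other never retries.
const executions: Execution[] = [
  { suite: "withRetries", attempts: [false, true] },
  { suite: "withRetries", attempts: [true] },
  { suite: "withRetries", attempts: [true] },
  { suite: "withRetries", attempts: [false, false] },
  { suite: "noRetries", attempts: [false] },
  { suite: "noRetries", attempts: [true] },
  { suite: "noRetries", attempts: [true] },
  { suite: "noRetries", attempts: [true] },
];

console.log("withRetries", failureRates(executions, "withRetries"));
console.log("noRetries", failureRates(executions, "noRetries"));
```

In this made-up data both suites report the same failure rate after retries, yet the suite with retries fails twice as often on the first attempt. Comparing only the final result would miss that.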

When keeping multiple frameworks is justified

A mixed stack is not always a mistake. Sometimes it is the correct short-term answer.

You may want to keep multiple frameworks if:

  • You have a large legacy suite that is still providing value
  • You need to support different ownership models during a transition
  • One framework is best for a specific use case, such as component-adjacent browser checks versus broader end-to-end coverage
  • Regulatory or release risk makes a gradual migration safer than a rewrite
  • You do not yet have a clear consolidation target

The key is to treat mixed-stack maintenance as a planned cost, not an accidental byproduct.

If you cannot explain why a test exists in a specific framework, you probably cannot defend the cost of keeping it there.

When consolidation starts to pay for itself

Consolidation becomes attractive when one or more of these are true:

  • More than one framework covers the same top-tier flows
  • Failures take too long to triage because the team is context-switching
  • The organization has one or two people who deeply understand one framework, but everyone else is mostly guessing
  • CI time is growing faster than product coverage
  • Test maintenance is blocking feature delivery or migration work
  • Teams are rewriting the same scenarios in different tools instead of improving coverage quality

At that point, the question is not whether one framework is objectively better. It is whether the organization can afford the overhead of keeping all three alive.

A practical migration path if you want to reduce the stack

The worst migration plan is the one that requires a complete rewrite before any value appears. That is where projects stall.

A better approach is incremental:

  1. Identify the most duplicated or brittle flows
  2. Pick a target platform or framework standard
  3. Move one slice of coverage at a time
  4. Keep the old framework running until the new coverage is trusted
  5. Retire the old suite only after stable overlap and sign-off
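The last two steps can be made explicit with a small retirement gate. The sketch below assumes "stable overlap" means the old and new coverage produced the same result for a fixed number of consecutive recent runs; that definition, the `OverlapRun` shape, and the threshold are assumptions to tune per team.

```typescript
// Hypothetical record of one overlap run in which the old and new suites
// both exercised the same migrated flow.
interface OverlapRun {
  oldSuitePassed: boolean;
  newSuitePassed: boolean;
}

// The old coverage is a retirement candidate only when the most recent
// `requiredStableRuns` overlap runs all agree and sign-off has been given.
function readyToRetire(
  history: OverlapRun[],
  requiredStableRuns: number,
  signedOff: boolean,
): boolean {
  if (!signedOff || requiredStableRuns <= 0 || history.length < requiredStableRuns) {
    return false;
  }
  const recent = history.slice(-requiredStableRuns);
  return recent.every((r) => r.oldSuitePassed === r.newSuitePassed);
}

// Illustrative history, oldest first: two early disagreements, then three agreeing runs.
const overlapHistory: OverlapRun[] = [
  { oldSuitePassed: true, newSuitePassed: false },
  { oldSuitePassed: false, newSuitePassed: true },
  { oldSuitePassed: true, newSuitePassed: true },
  { oldSuitePassed: false, newSuitePassed: false },
  { oldSuitePassed: true, newSuitePassed: true },
];

console.log(readyToRetire(overlapHistory, 3, true));  // last three runs agree
console.log(readyToRetire(overlapHistory, 5, true));  // window includes disagreements
console.log(readyToRetire(overlapHistory, 3, false)); // no sign-off yet
```

Agreement counts both suites failing together as a match, since the goal is for the new coverage to give the same signal as the old, including on real regressions.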

For teams with a lot of existing Selenium, Playwright, or Cypress assets, Endtest AI Test Import is designed to reduce rewrite friction by converting existing tests into editable, platform-native tests. It supports importing Selenium, Playwright, Cypress, JSON, or CSV files, then maps selectors and steps into runnable tests inside the platform. That makes it a credible option when the main problem is not test creation, but the operational cost of maintaining fragmented suites.

If you are specifically moving away from Selenium, the migration documentation is worth reviewing because it shows a path for bringing Java, Python, and C# suites over in minutes rather than rebuilding everything by hand.

Where self-healing changes the cost equation

A major part of UI maintenance is locator drift. If your UI changes often, even a well-designed suite accumulates ongoing repair work.

That is where self-healing can materially reduce maintenance burden. Endtest’s self-healing tests use agentic AI to recover when a locator no longer resolves, choose a replacement from surrounding context, and keep the run moving. The practical impact is not that tests become magic, but that common UI changes stop turning into constant manual repairs.

This matters most for teams with one of these profiles:

  • Frequent class or DOM reshuffles
  • Legacy suites with many brittle locators
  • Mixed stacks where a significant portion of effort goes into babysitting failures
  • Teams trying to preserve coverage while reducing active maintenance

Self-healing is not a reason to ignore test design. It is a way to reduce the maintenance tax caused by unavoidable application change.

A decision checklist for leaders

Use this checklist to decide whether to keep, simplify, or migrate a mixed stack:

  • Do we know how much duplicated coverage we have?
  • Can we estimate monthly triage time by framework?
  • Are we paying for three browser automation cultures instead of one?
  • Is any framework a legacy holdover with declining ownership?
  • Do our CI runtimes slow down release decisions?
  • Would a smaller standard set of tests reduce coordination cost?
  • Can we migrate incrementally, without a rewrite freeze?

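If several of these questions expose a gap, such as duplication you cannot quantify, triage time nobody tracks, or a framework with fading ownership, the cost of maintaining a mixed UI test stack is probably higher than it looks on paper.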

The bottom line

The cost of maintaining a mixed UI test stack is rarely visible in a single budget line. It is distributed across engineering time, QA triage, CI minutes, duplicated flows, and ownership drift. That is why teams often underestimate it until the test suite becomes a second product of its own.

If your organization is carrying Playwright, Selenium, and Cypress in parallel, you should measure the maintenance cost as a first-class operational expense, not an afterthought. Start with duplicated coverage, triage hours, and CI runtime, then add the cost of ownership fragmentation. Once you have that number, you can make a rational decision about whether to keep the mix or consolidate.

For teams that want to reduce the burden without throwing away existing coverage, a platform like Endtest can be a practical consolidation path because it supports AI Test Import, self-healing, and incremental migration from existing Selenium, Playwright, and Cypress assets. That makes it easier to turn scattered, fragile, or duplicated UI automation into a more manageable test suite with clearer ownership.

If you are evaluating your next step, compare the operational cost of what you have now against the cost of a smaller, more unified system. In many cases, that comparison is where the real answer becomes obvious.