Authentication tests are where otherwise clean browser suites start behaving strangely. A login passes on your laptop, fails in CI, then mysteriously passes again after a rerun. A logout test leaks state into the next file. An MFA prompt appears for one account but not another. The root problem is usually not the login form itself; it is session state that survives longer than the test author intended.

If you want to test authentication flows in browser automation reliably, you need to treat browser storage, cookies, identities, and backend session lifetimes as part of the test design, not as incidental implementation details. That means separating the flow you are validating from the state you are trying to preserve, and being deliberate about where each test gets its authenticated context.

This article walks through login, logout, MFA, and session reuse pitfalls across local runs and CI, then shows practical ways to isolate auth state so your tests stay reproducible.

What actually counts as session state

When people say “the browser stayed logged in”, they often mean one of several different things:

  • Cookies, especially session cookies or refresh token cookies
  • Local storage, where SPAs often keep tokens or user metadata
  • Session storage, which survives reloads in the same tab
  • IndexedDB, used by some auth libraries and offline-capable apps
  • In-memory app state, which disappears on reload but can affect test timing
  • Server-side session records, which may outlive the browser entirely

A test can be “clean” from the browser’s point of view and still reuse an account that the backend considers authenticated. The reverse is also true: a browser may retain cookies that point to a server session that has already expired.

The safest mental model is this: authentication is not one thing. It is a contract between browser storage, application logic, and backend session management.

That contract changes across environments. Local runs often have persistent browser profiles, saved extensions, or cached credentials. CI usually starts cleaner, but runners may reuse workspaces, containers, or authenticated artifacts if the suite is configured that way.

The testing goal: validate behavior, not accidentally depend on prior state

Not every auth-related test should log in from scratch.

You usually need to cover at least four different behaviors:

  1. Login flow: credentials, redirects, MFA, remember-me, error handling
  2. Logout flow: clearing session state and invalidating access
  3. Session reuse: persisted login across reloads or new tabs, if the product supports it
  4. Session expiry and revocation: timeout behavior, forced reauthentication, role changes

The trap is to collapse all four into a single long end-to-end test that starts with a fresh browser, logs in, performs actions, logs out, and then verifies access is gone. That sounds comprehensive, but it often creates false confidence because the test only covers one exact path and one exact timing window.

Instead, separate concerns:

  • One test validates successful login
  • One test validates invalid credentials
  • One test validates MFA challenge handling
  • One test validates logout clears access
  • One test validates a session survives a reload, if that is a requirement
  • One test validates a stale session forces a new login

This separation gives you better failure localization and makes it easier to isolate state.

Choose the right level for auth coverage

Authentication is often best tested at multiple layers:

  • UI/browser automation for the login experience, redirects, MFA prompts, and session continuity
  • API tests for token issuance, revocation, and session expiry rules
  • Unit tests for client-side auth helpers, route guards, and token parsing

Browser automation is the most fragile layer, so use it where it provides value. If you only need to verify that the backend invalidates refresh tokens, API tests are cheaper and more deterministic. If you need to verify that the login page handles an MFA challenge, browser automation is the right tool.

It also helps to understand how browser automation and continuous integration interact in general, because auth tests tend to behave differently once they are executed repeatedly in pipelines.

A practical pattern for isolating auth state

The simplest reliable pattern is one test, one browser context, one account state.

In Playwright, that means using an isolated browser context per test, and avoiding shared storageState unless you intentionally want to reuse it. In Selenium, it means creating a new browser profile or a fresh session for each test and not depending on long-lived user data in the same profile directory. In Cypress, test isolation (on by default since Cypress 12) clears cookies and storage between tests; on older versions, or with the testIsolation option turned off, clear them explicitly, and be careful not to rely on accidental persistence from a prior spec file.

Playwright example: logging in without reusing state

import { test, expect } from '@playwright/test';
test('user can log in', async ({ page }) => {
  await page.goto('https://app.example.com/login');
  await page.getByLabel('Email').fill('user@example.com');
  await page.getByLabel('Password').fill('correct-horse-battery-staple');
  await page.getByRole('button', { name: 'Sign in' }).click();

  await expect(page).toHaveURL(/dashboard/);
  await expect(page.getByText('Welcome back')).toBeVisible();
});

This is intentionally boring. It creates no reusable state, it proves the login route works, and it avoids assuming anything about previous tests.

Selenium Python example: fresh profile per run

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
options.add_argument('--incognito')

driver = webdriver.Chrome(options=options)
try:
    driver.get('https://app.example.com/login')
    # fill form, submit, assert dashboard
finally:
    driver.quit()

Incognito is not a universal answer, but it is a decent baseline for making sure local sessions do not silently survive from one run to the next.

Why login tests fail in local runs but pass in CI

Local and CI failures often come from different kinds of leakage.

Common local-run leaks

  • Reusing the same browser profile across test runs
  • Signed-in Google, Microsoft, or enterprise accounts in the browser profile
  • Autofill and password manager overlays changing the DOM or focus flow
  • Persistent cookies from a previous manual login
  • Developer tools or extensions affecting redirects, headers, or storage

Common CI leaks

  • Reusing a workspace or a browser profile cache between jobs
  • Reusing the same test account across parallel jobs, causing state collisions
  • Saving auth artifacts to disk and accidentally restoring them in the next run
  • Running tests in parallel against the same account, which can invalidate sessions mid-run
  • Time-based expiry that hits only in slower CI environments

A login that depends on the timing of a session cookie expiring after 29 minutes might be stable locally and flaky in CI, because the job starts later, retries later, or takes longer to reach the assertion.
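One way to surface this before it turns into a flaky assertion is to check how much lifetime the auth cookie has left when a test starts. The sketch below reads the cookie fields that Playwright includes in a storageState snapshot (expires is Unix time in seconds, and -1 marks a session cookie); the cookie name session_id and the five-minute margin are assumptions to replace with your own.

```typescript
// Subset of the cookie shape Playwright writes into a storageState snapshot.
interface StoredCookie {
  name: string;
  value: string;
  domain: string;
  path: string;
  expires: number; // Unix time in seconds; -1 marks a session cookie
}

// Seconds left before the named cookie expires, Infinity for a session
// cookie, or null when the cookie is missing entirely.
function secondsUntilExpiry(
  cookies: StoredCookie[],
  name: string,
  nowMs: number = Date.now(),
): number | null {
  const cookie = cookies.find((c) => c.name === name);
  if (!cookie) return null;
  if (cookie.expires === -1) return Infinity;
  return cookie.expires - nowMs / 1000;
}

// True when the cookie is absent or expires within the safety margin,
// meaning the test should log in again instead of trusting the snapshot.
function needsFreshLogin(
  cookies: StoredCookie[],
  name: string,
  marginSeconds = 300,
  nowMs: number = Date.now(),
): boolean {
  const left = secondsUntilExpiry(cookies, name, nowMs);
  return left === null || left < marginSeconds;
}
```

In a Playwright test the input could come from (await context.storageState()).cookies. A session cookie reports Infinity here, but its server-side record can still expire, so this check narrows the problem rather than eliminating it.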

Make test data and test identities explicit

The cleanest auth suites treat accounts like fixtures, not like hidden globals.

You should know for each test:

  • Which user identity it runs as
  • Whether the account is new, active, locked, or MFA-enabled
  • Whether the account has role-specific permissions
  • Whether the account has a pre-existing session or must start fresh
  • Whether the test mutates account state, such as password reset or MFA enrollment

A good test setup creates the account state it needs, then tears it down or resets it. If your app supports it, use a dedicated test tenant or environment where auth policies are deterministic. If you cannot create users dynamically, at least use a controlled pool of accounts with known characteristics.

Avoid using one “shared test user” for every auth scenario. That account becomes a dependency magnet, and any state change (a password reset, MFA enrollment, lockout, or profile update) can break unrelated tests.
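A controlled pool can be as simple as a typed list whose characteristics are explicit, plus a lookup that fails loudly when no account fits. The sketch below assumes a dedicated test tenant; the field names and email addresses are hypothetical placeholders, and real passwords or MFA secrets would live in a secret store rather than in this file.

```typescript
type Role = 'admin' | 'member' | 'viewer';

// The characteristics each test should be able to state up front.
interface TestAccount {
  email: string;
  role: Role;
  mfaEnabled: boolean;
  locked: boolean;
}

// Hypothetical pool for a dedicated test tenant.
const ACCOUNT_POOL: readonly TestAccount[] = [
  { email: 'member-basic@example.test', role: 'member', mfaEnabled: false, locked: false },
  { email: 'member-mfa@example.test', role: 'member', mfaEnabled: true, locked: false },
  { email: 'admin-mfa@example.test', role: 'admin', mfaEnabled: true, locked: false },
  { email: 'member-locked@example.test', role: 'member', mfaEnabled: false, locked: true },
];

// Return the first account matching every requested characteristic.
// Throwing, instead of falling back to a default user, keeps a test from
// silently running as an identity it did not ask for.
function pickAccount(
  requirements: Partial<TestAccount>,
  pool: readonly TestAccount[] = ACCOUNT_POOL,
): TestAccount {
  const keys = Object.keys(requirements) as (keyof TestAccount)[];
  const match = pool.find((account) => keys.every((key) => account[key] === requirements[key]));
  if (!match) {
    throw new Error(`No test account matches ${JSON.stringify(requirements)}`);
  }
  return match;
}
```

A test then reads as pickAccount({ role: 'member', mfaEnabled: true }), which documents the identity it depends on at the call site.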

Testing logout correctly

Logout is deceptively easy to test badly.

Many apps do one of the following after logout:

  • Clear cookies and local storage
  • Invalidate a server session
  • Revoke refresh tokens
  • Redirect to a public page

A browser-only assertion that the UI shows “Signed out” is not enough. You need to verify that privileged access is actually gone.

A solid logout test usually checks:

  1. The user can access a protected page before logout
  2. Logout action completes successfully
  3. The browser is redirected or the UI reflects signed-out state
  4. Refreshing a protected page no longer grants access
  5. Navigating back does not resurrect protected UI from cache

That last point matters more than people expect. Some applications display stale content from browser history or cached SPA state even after the session is invalidated.

Example pattern for logout verification in Playwright

import { test, expect } from '@playwright/test';
test('logout clears access', async ({ page }) => {
  // assume the context is already authenticated by this test's setup
  await page.goto('https://app.example.com/dashboard');
  await expect(page).toHaveURL(/dashboard/);

  await page.getByRole('button', { name: 'Logout' }).click();
  await expect(page).toHaveURL(/login/);

  // loading a protected route again must no longer grant access
  await page.goto('https://app.example.com/dashboard');
  await expect(page).toHaveURL(/login/);
});

If your application stores auth state in local storage, a more complete check may also verify that the relevant entries have been cleared. Be cautious, though, because over-asserting internal implementation details can make tests brittle when the auth architecture changes.
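If you do add a storage check, one way to limit brittleness is to assert only on the names you know carry auth state, using the snapshot that Playwright's context.storageState() returns. The cookie and localStorage key names below are assumptions; substitute the ones your auth library actually writes.

```typescript
// Shape of a Playwright storageState snapshot (only the fields used here).
interface StorageSnapshot {
  cookies: { name: string; domain: string }[];
  origins: { origin: string; localStorage: { name: string; value: string }[] }[];
}

// Hypothetical names; replace with the cookies and keys your app really uses.
const AUTH_COOKIE_NAMES = ['session_id', 'refresh_token'];
const AUTH_STORAGE_KEYS = ['access_token', 'auth.user'];

// List every known auth cookie or localStorage entry still present.
// An empty result means the snapshot holds no known auth state.
function leftoverAuthState(snapshot: StorageSnapshot): string[] {
  const leftovers: string[] = [];
  for (const cookie of snapshot.cookies) {
    if (AUTH_COOKIE_NAMES.includes(cookie.name)) {
      leftovers.push(`cookie ${cookie.name} (${cookie.domain})`);
    }
  }
  for (const origin of snapshot.origins) {
    for (const entry of origin.localStorage) {
      if (AUTH_STORAGE_KEYS.includes(entry.name)) {
        leftovers.push(`localStorage ${entry.name} (${origin.origin})`);
      }
    }
  }
  return leftovers;
}
```

After logout, a test could assert that leftoverAuthState(await page.context().storageState()) is an empty array. This complements, rather than replaces, the check that a protected page redirects to login, and it does not see sessionStorage, which storageState does not capture.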

MFA testing without turning your suite into a one-time-code factory

MFA is where auth automation becomes either very careful or very messy.

There are several common MFA styles:

  • Time-based one-time passwords (TOTP)
  • Email one-time links or codes
  • SMS codes
  • Push-based approval
  • WebAuthn / passkeys

Each has different automation tradeoffs.

TOTP is usually the easiest to automate

TOTP can be generated in test code if you have the shared secret for a test account. That secret should be stored securely and limited to non-production accounts.

import { authenticator } from 'otplib';

const code = authenticator.generate(process.env.TEST_MFA_SECRET!);

The key risk is time drift. If the CI environment has clock issues or the test waits too long between code generation and submission, the code can expire. Generate the code as late as possible, and avoid unnecessary waits.
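To see why timing matters, it helps to look at what a TOTP library does under the hood. The block below is a dependency-free sketch of the RFC 6238 algorithm (it is not otplib's code), assuming the common defaults: a base32-encoded secret, HMAC-SHA1, 30-second steps, and 6 digits. The freshTotp helper, a hypothetical name, waits for the next window when too little time remains, so the code is generated as late as possible and submitted well before it rolls over.

```typescript
import { createHmac } from 'node:crypto';

const BASE32_ALPHABET = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ234567';

// Decode an RFC 4648 base32 string, the usual format for TOTP secrets.
function base32Decode(input: string): Buffer {
  const clean = input.toUpperCase().replace(/\s+/g, '').replace(/=+$/, '');
  const bytes: number[] = [];
  let buffer = 0;
  let bits = 0;
  for (const ch of clean) {
    const index = BASE32_ALPHABET.indexOf(ch);
    if (index === -1) throw new Error(`Invalid base32 character: ${ch}`);
    buffer = ((buffer << 5) | index) & 0xffff; // only the low bits are ever needed
    bits += 5;
    if (bits >= 8) {
      bytes.push((buffer >>> (bits - 8)) & 0xff);
      bits -= 8;
    }
  }
  return Buffer.from(bytes);
}

// RFC 4226 HOTP: HMAC-SHA1 over an 8-byte big-endian counter, then dynamic truncation.
function hotp(key: Buffer, counter: number, digits = 6): string {
  const message = Buffer.alloc(8);
  message.writeBigUInt64BE(BigInt(counter));
  const hmac = createHmac('sha1', key).update(message).digest();
  const offset = hmac[hmac.length - 1] & 0x0f;
  const binary =
    ((hmac[offset] & 0x7f) << 24) |
    ((hmac[offset + 1] & 0xff) << 16) |
    ((hmac[offset + 2] & 0xff) << 8) |
    (hmac[offset + 3] & 0xff);
  return (binary % 10 ** digits).toString().padStart(digits, '0');
}

// RFC 6238 TOTP: the HOTP counter is the number of whole steps since the Unix epoch.
function totp(secretBase32: string, nowMs = Date.now(), stepSeconds = 30, digits = 6): string {
  const counter = Math.floor(nowMs / 1000 / stepSeconds);
  return hotp(base32Decode(secretBase32), counter, digits);
}

// Whole seconds left before the current code rolls over.
function secondsLeftInStep(nowMs = Date.now(), stepSeconds = 30): number {
  return stepSeconds - (Math.floor(nowMs / 1000) % stepSeconds);
}

// Generate the code right before submitting it; if the window is nearly
// over, wait for the next one instead of racing the rollover.
async function freshTotp(secretBase32: string, minSecondsLeft = 5): Promise<string> {
  const left = secondsLeftInStep();
  if (left < minSecondsLeft) {
    await new Promise((resolve) => setTimeout(resolve, left * 1000 + 250));
  }
  return totp(secretBase32);
}
```

In a test, await freshTotp(process.env.TEST_MFA_SECRET!) would run immediately before filling the code field. The five-second threshold is a judgment call; if the server accepts only the current step, even a few seconds of CI clock drift can still cause rejections.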

Email and SMS MFA need test-specific plumbing

For email codes, use a test mailbox you can query through an API or IMAP in a controlled environment. For SMS, use a provider sandbox or a test hook, not a real phone number.

Do not depend on a human reading an inbox and manually entering codes, unless the test is explicitly exploratory rather than automated.
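The polling side of that plumbing can stay generic if you inject whatever function reads your test mailbox. In the sketch below, the fetchLatestMessage adapter is hypothetical (it stands in for your mailbox API or IMAP client), and the six-digit pattern is an assumption about your email template. Filtering on receivedAfter keeps the test from picking up a code sent to an earlier run.

```typescript
interface MailMessage {
  to: string;
  body: string;
  receivedAt: number; // epoch milliseconds
}

// Hypothetical adapter: return the newest message sent to `to`, or null.
type FetchLatestMessage = (to: string) => Promise<MailMessage | null>;

// Pull the first standalone six-digit number out of an email body.
function extractCode(body: string): string | null {
  const match = body.match(/\b(\d{6})\b/);
  return match ? match[1] : null;
}

// Poll until a message newer than `receivedAfter` arrives and contains a
// code, or fail with a clear error once the timeout elapses.
async function waitForEmailCode(
  fetchLatestMessage: FetchLatestMessage,
  to: string,
  receivedAfter: number,
  timeoutMs = 30_000,
  intervalMs = 1_000,
): Promise<string> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const message = await fetchLatestMessage(to);
    if (message && message.receivedAt > receivedAfter) {
      const code = extractCode(message.body);
      if (code) return code;
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`No MFA code for ${to} within ${timeoutMs} ms`);
}
```

Recording Date.now() just before submitting the password and passing it as receivedAfter ties each code to the login attempt that requested it.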

WebAuthn deserves a separate strategy

Passkeys and platform authenticators are harder to automate in standard browser runners because they involve device-bound or OS-bound flows. For these cases, it is often better to test:

  • The enrollment path with manual validation in a controlled environment
  • The fallback path, such as password plus recovery code
  • Backend enforcement of authentication state through API or contract tests

If your product requires WebAuthn, your test strategy should explicitly document whether automated browser coverage is possible, stubbed, or intentionally limited.

Session reuse, storage snapshots, and when they help

A common optimization is to reuse a logged-in session to avoid running the full login flow before every test. This can be a good idea, but only if you control the boundaries.

In Playwright, storageState can speed up test suites by reusing cookies and local storage. That is useful for non-auth-focused tests, but it becomes dangerous when the stored state is stale, tied to a specific environment, or shared between parallel runs.

Use session reuse when:

  • The test suite needs speed more than exhaustive login coverage
  • The auth state is deterministic and short-lived enough to refresh often
  • Each worker gets its own state file
  • The application accepts restored state across the target environment

Avoid session reuse when:

  • You are testing login behavior itself
  • Sessions are sensitive to user role or tenant context
  • Parallel tests might overwrite each other’s state
  • Expiry or MFA behavior is under test

A practical rule is to separate auth setup tests from authenticated behavior tests. The first group creates reusable session state; the second group consumes it.
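Combined with one state file per worker, that split could be sketched as follows: the setup side writes its snapshot to a path scoped by environment, role, and worker, and the consuming side refuses to reuse a file older than a chosen maximum age. The directory layout, naming scheme, and 15-minute limit are assumptions; in Playwright, the worker number could come from test.info().parallelIndex.

```typescript
import { existsSync, statSync } from 'node:fs';
import { join } from 'node:path';

// Build a state-file path that cannot collide across environments, roles, or workers.
function authStatePath(baseDir: string, env: string, role: string, workerIndex: number): string {
  const safe = (part: string) => part.replace(/[^a-zA-Z0-9_-]/g, '_');
  return join(baseDir, safe(env), `${safe(role)}-worker-${workerIndex}.json`);
}

// Reuse the snapshot only if it exists and is younger than maxAgeMs;
// otherwise the setup step should log in again and overwrite it.
function canReuseState(filePath: string, maxAgeMs = 15 * 60 * 1000, nowMs = Date.now()): boolean {
  if (!existsSync(filePath)) return false;
  const ageMs = nowMs - statSync(filePath).mtimeMs;
  return ageMs < maxAgeMs;
}
```

The setup test would call context.storageState({ path }) after logging in, and consuming tests would pass the same path to the storageState option, so a stale or missing file forces a real login instead of a confusing mid-test redirect.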

Keep auth state isolated across parallel runs

Parallel execution makes auth leakage much more visible.

If ten workers share one account, one test logging out or changing permissions can affect nine others. If your application invalidates old sessions when a new one is created, logging in from worker A may boot worker B out of its session.

Safer options include:

  • One account per worker
  • One tenant per worker
  • One browser context per test, with no shared persistent profile
  • One session artifact per test file or worker process

When that is not possible, mark auth-sensitive tests as serial and keep them out of broad parallel pools. It is better to run a few tests sequentially than to chase nondeterministic failures for weeks.
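For the one-account-per-worker option, a deterministic mapping from worker index to account rules out collisions by construction, and it should fail fast when the suite runs with more workers than accounts. The pool and the secretRef naming below are hypothetical; in Playwright, the index would typically come from test.info().parallelIndex, which ranges from 0 to the worker count minus one.

```typescript
interface WorkerAccount {
  email: string;
  secretRef: string; // name of the secret holding the password, not the password itself
}

// Hypothetical pool: one pre-provisioned account per parallel worker.
const WORKER_ACCOUNTS: readonly WorkerAccount[] = [
  { email: 'qa-worker-0@example.test', secretRef: 'QA_WORKER_0_PASSWORD' },
  { email: 'qa-worker-1@example.test', secretRef: 'QA_WORKER_1_PASSWORD' },
  { email: 'qa-worker-2@example.test', secretRef: 'QA_WORKER_2_PASSWORD' },
];

// Map a worker index to its own account. Wrapping around with a modulo
// would quietly reintroduce shared sessions, so out-of-range is an error.
function accountForWorker(
  parallelIndex: number,
  pool: readonly WorkerAccount[] = WORKER_ACCOUNTS,
): WorkerAccount {
  if (!Number.isInteger(parallelIndex) || parallelIndex < 0) {
    throw new Error(`Invalid worker index ${parallelIndex}`);
  }
  if (parallelIndex >= pool.length) {
    throw new Error(
      `Worker ${parallelIndex} has no dedicated account; the pool holds ${pool.length}. ` +
        'Reduce workers or add accounts.',
    );
  }
  return pool[parallelIndex];
}
```

Using parallelIndex rather than workerIndex matters here: Playwright's workerIndex keeps increasing when a worker restarts, while parallelIndex is reused, so a replacement worker inherits the same account instead of running off the end of the pool.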

CI setup tips that prevent auth leakage

A few configuration choices make a big difference.

Start from a disposable browser environment

Use a fresh container, VM, or browser profile for each job. If the browser runner supports a clean user data directory, point it at a temporary directory that is deleted after the run.
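A minimal sketch of that pattern with Node's standard library: create a uniquely named temporary directory, hand it to the browser launcher, and delete it in a finally block so a failed run does not leave a signed-in profile behind. The launch callback is a placeholder for your runner; with Playwright it could wrap chromium.launchPersistentContext(userDataDir), and with Chrome directly the directory would go into a --user-data-dir argument.

```typescript
import { mkdtempSync, rmSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { join } from 'node:path';

// Run `launch` with a brand-new, empty profile directory that is always removed afterwards.
async function withThrowawayProfile<T>(
  launch: (userDataDir: string) => Promise<T>,
): Promise<T> {
  const userDataDir = mkdtempSync(join(tmpdir(), 'auth-test-profile-'));
  try {
    return await launch(userDataDir);
  } finally {
    // Remove cookies, storage, and cached credentials written during the run,
    // even when the test inside `launch` threw.
    rmSync(userDataDir, { recursive: true, force: true });
  }
}
```

Because mkdtempSync appends random characters to the prefix, parallel jobs on the same runner each get their own directory.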

Never cache auth artifacts casually

Do not store cookies or storage snapshots in general build caches. If you do reuse them, treat them like secrets and scope them tightly to the exact environment, branch, and account type.
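One way to enforce that scoping is to derive the cache key or artifact name from every dimension the state depends on, so a snapshot from one environment, branch, or account type can never be restored into another. The field names are illustrative, and hashing an auth configuration string into the key is an assumption: it simply forces a new key whenever login settings change.

```typescript
import { createHash } from 'node:crypto';

interface AuthArtifactScope {
  environment: string; // e.g. "staging"
  branch: string;
  accountType: string; // e.g. "member-mfa"
  authConfig: string; // anything whose change should invalidate saved sessions
}

// Build a key that changes whenever any scoping dimension changes.
// Components are cleaned to [a-zA-Z0-9-] and joined with "_", so two
// different scopes cannot produce the same key by shifting a separator.
function authArtifactKey(scope: AuthArtifactScope): string {
  const clean = (value: string) => value.replace(/[^a-zA-Z0-9-]/g, '-');
  const configHash = createHash('sha256').update(scope.authConfig).digest('hex').slice(0, 12);
  return ['auth-state', clean(scope.environment), clean(scope.branch), clean(scope.accountType), configHash].join('_');
}
```

The resulting string would be used as the cache key or artifact name in CI, with the artifact itself kept out of general build caches as described above.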

Make session-expiry tests time-aware

If a test depends on token expiry, ensure the CI job has predictable timing. A suite that sleeps for 31 minutes to prove a 30-minute timeout is usually a poor fit for standard pipeline runs. Consider shifting expiry verification to API or contract tests, and keep browser coverage focused on the visible user experience.
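Moving the rule out of the browser suite also removes the need to sleep: if whatever decides whether a session is still valid accepts the current time as input, a 30-minute idle timeout can be checked in milliseconds. The SessionRecord shape and isSessionActive function below are hypothetical stand-ins for wherever your backend makes that decision.

```typescript
// Hypothetical server-side session record; only the field this rule needs.
interface SessionRecord {
  lastSeenAt: number; // epoch milliseconds of the last authenticated request
}

// Assumed idle timeout, matching the 30-minute example in the text.
const IDLE_TIMEOUT_MS = 30 * 60 * 1000;

// The current time is a parameter instead of a call to Date.now(), so a
// test can jump straight past the timeout without waiting for it.
function isSessionActive(
  session: SessionRecord,
  nowMs: number,
  idleTimeoutMs = IDLE_TIMEOUT_MS,
): boolean {
  return nowMs - session.lastSeenAt < idleTimeoutMs;
}
```

Checks at 29, 30, and 31 minutes then cover the boundary, and the browser suite only needs a short test that an already-invalid session, for example one revoked before the page loads, sends the user back to login.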

Record why a test is using reusable state

If a test loads storageState, name the fixture clearly and comment why it exists. Future maintainers should know whether it is a speed optimization or a required setup for a downstream scenario.

Assertions that matter for auth flows

Auth tests often fail because they assert the wrong thing.

Good assertions include:

  • Redirect lands on the expected route
  • Protected UI is visible only after authentication
  • Logout removes access, not just the signed-in indicator in the UI
  • MFA challenge appears for the correct account type
  • Error messages are specific enough for users, but not overly revealing
  • Session state is cleared or rotated after sensitive actions

Weak assertions include:

  • The page loaded without checking what state it is in
  • A single element existed at some point
  • The button was clicked, therefore login succeeded
  • A token was stored somewhere, without verifying the app uses it correctly

For stateful apps, also test that privileged actions force revalidation where appropriate, for example after password change, role update, or session revocation.

A maintainable test structure

A clean auth suite usually follows this structure:

  • auth.login.spec.ts: validates sign-in behavior
  • auth.logout.spec.ts: validates sign-out and revocation
  • auth.mfa.spec.ts: validates second-factor flows
  • auth.session.spec.ts: validates reuse and expiry behavior
  • protected-routes.spec.ts: validates route guards and redirect logic

Shared helpers such as loginAs(user) or logout() should be small and explicit; avoid burying too much logic in opaque helpers. If a helper clears cookies, reads email codes, or mutates account settings, that behavior should be obvious from the test setup.

Where Endtest, an agentic AI test automation platform, can fit in

If you want a more standardized way to run auth-flow regression across environments, Endtest’s AI Test Creation Agent can be useful because it turns a plain-English scenario into editable, platform-native test steps. That can help teams keep login and logout checks consistent across browsers and environments without hand-maintaining fragile scripts for every path.

For teams migrating existing suites, AI Test Import can bring in Selenium, Playwright, or Cypress assets and convert them into runnable tests, which is handy when you want to centralize a few high-value auth flows without rewriting everything at once.

If your auth pages vary by locale, role, or runtime data, Endtest’s AI Variables and AI Assertions can help reduce brittle checks by letting you reason over cookies, variables, or logs instead of hard-coding every string. That said, the core testing principles stay the same: isolate state, keep identities explicit, and do not let one flow contaminate another.

A checklist you can apply before merging auth tests

Before you merge a browser auth test, ask:

  • Does this test start from a known browser state?
  • Does it use a dedicated account or a clearly isolated account pool?
  • Does it verify user-visible behavior, not just implementation details?
  • If it reuses session state, is that reuse intentional and scoped?
  • Does it work both locally and in CI without depending on previous runs?
  • If MFA is involved, is the factor deterministic for automation?
  • Does logout prove access is gone, not only that the UI changed?
  • Does the test avoid sharing one session across parallel workers?

If you cannot answer yes to most of those, the test may still be useful, but it is not yet reproducible enough to trust.

Final thought

To test authentication flows in browser automation well, treat session state as a first-class test dependency. Login, logout, MFA, and session reuse all need different handling, and the safest suites are the ones that make state boundaries visible. Once your tests stop inheriting stray cookies and storage from previous runs, the failures you see become much easier to understand, and much more likely to represent real product issues.

That is the real goal: not just passing a login form once, but proving that authentication behaves predictably when the browser, the backend, and the CI system all try to complicate the story.