What is Sample Ratio Mismatch?

Sample Ratio Mismatch (SRM) is when the observed allocation of users to variants differs significantly from the planned allocation.
Example: You configured a 50/50 split, but after 10,000 users you see 5,300 in A and 4,700 in B. That’s likely SRM.

SRM means the randomization or eligibility pipeline is biased (or data capture is broken), so any effect estimates (lift, p-values, etc.) can’t be trusted.

How SRM Works (Conceptually)

When you specify a target split like 50/50 or 33/33/34, each incoming unit (user, device, session, etc.) should be randomly bucketed so that the observed distribution matches your target in expectation.

Formally, for a test with k variants and N total assigned units, the expected count for variant i is:

E_i = p_i × N

where p_i is the target proportion for variant i and N is the total sample size.

If the observed counts O_i differ from the expected counts by more than chance alone would allow, you have an SRM.
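A quick sketch of the expected-count calculation (the helper function and arm proportions are illustrative):

```python
def expected_counts(n, proportions):
    """Expected per-arm counts E_i = p_i * N for a planned split."""
    assert abs(sum(proportions) - 1.0) < 1e-9, "proportions must sum to 1"
    return [round(p * n) for p in proportions]

# A 33/33/34 three-arm test with 10,000 assigned units:
print(expected_counts(10_000, [0.33, 0.33, 0.34]))  # [3300, 3300, 3400]
```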

How to Identify SRM (Step-by-Step)

1) Use a Chi-Square Goodness-of-Fit Test (recommended)

For k variants, compute:

χ² = Σ_i (O_i − E_i)² / E_i

with degrees of freedom df = k − 1. Compute the p-value from the chi-square distribution. If the p-value is very small (common thresholds: 10⁻³ to 10⁻⁶), you likely have an SRM.

Example (two-arm 50/50):
N = 10,000,  O_A = 5,300,  O_B = 4,700,  E_A = E_B = 5,000

χ² = (5300 − 5000)²/5000 + (4700 − 5000)²/5000 = 18 + 18 = 36

With df = 1, p ≈ 1.97 × 10⁻⁹. This triggers SRM.
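The worked example above can be reproduced with a few lines of Python. For a two-arm test (df = 1), the chi-square survival function has the closed form erfc(√(χ²/2)), so the standard library suffices; the function name is illustrative:

```python
import math

def srm_check_two_arm(obs_a, obs_b, p_a=0.5):
    """Chi-square goodness-of-fit test for a two-arm split (df = 1).

    For df = 1 the chi-square p-value equals erfc(sqrt(chi2 / 2)),
    so no stats library is required.
    """
    n = obs_a + obs_b
    exp_a, exp_b = p_a * n, (1 - p_a) * n
    chi2 = (obs_a - exp_a) ** 2 / exp_a + (obs_b - exp_b) ** 2 / exp_b
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

chi2, p = srm_check_two_arm(5300, 4700)
print(f"chi2 = {chi2:.1f}, p = {p:.3g}")  # chi2 = 36.0, p = 1.97e-09
```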

2) Visual/Operational Checks

  • Live split dashboard: Show observed vs. expected % by variant.
  • Stratified checks: Repeat the chi-square by country, device, browser, app version, traffic source, time-of-day to find where the skew originates.
  • Time series: Plot cumulative allocation over time—SRM that “drifts” may indicate a rollout, caching, or traffic-mix issue.

3) Early-Warning Rule of Thumb

If your observed proportion deviates from the target by more than a few standard errors early in the test, investigate. For two arms with target p = 0.5, the standard error of the observed proportion under perfect randomization is:

σ_p = √( p(1 − p) / N )

Large persistent deviations → likely SRM.
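A minimal sketch of this rule of thumb (function name illustrative): compute how many standard errors the observed share of one arm sits from its target:

```python
import math

def allocation_z_score(obs_a, n, target=0.5):
    """How many standard errors is arm A's observed share from its
    target proportion, assuming perfect randomization?"""
    se = math.sqrt(target * (1 - target) / n)
    return (obs_a / n - target) / se

# The 5,300 / 4,700 example: a 6-sigma deviation, well worth investigating.
print(f"z = {allocation_z_score(5300, 10_000):.1f}")  # z = 6.0
```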

Common Causes of SRM

  1. Eligibility asymmetry: Filters (geo, device, login state, new vs. returning) applied after assignment or applied differently per variant.
  2. Randomization at the wrong unit: Assigning by session but analyzing by user (or vice versa); cross-device users collide.
  3. Inconsistent hashing/salts: Different hash salt/seed per service or per page; some code paths skip/override the assignment.
  4. Sticky sessions / caching / CDNs: Edge caching or load balancer stickiness pinning certain users to one variant.
  5. Traffic shaping / rollouts: Feature flags, canary releases, or time-based rollouts inadvertently biasing traffic into one arm.
  6. Bot or test traffic: Non-human or QA traffic not evenly distributed (or filtered in one arm only).
  7. Telemetry loss / logging gaps: Events dropped more in one arm (ad-blockers, blocked endpoints, CORS, mobile SDK bugs).
  8. User-ID vs. device-ID mismatch: Some users bucketed by cookie, others by account ID; cookie churn changes ratios.
  9. Late triggers: Assignment happens at “conversion event” time in one arm but at page load in another.
  10. Geo or platform routing differences: App vs. web, iOS vs. Android, or specific regions routed to different infrastructure.

How to Prevent SRM (Design & Implementation)

  • Choose the right unit of randomization (usually user). Keep it consistent from assignment through analysis.
  • Server-side assignment with deterministic hashing on a stable ID (e.g., user_id). Example mapping:
b = A if (H(user_id || salt) mod M) < p·M, otherwise B

where H is a stable hash, M is a large modulus (e.g., 10⁶), and p is the target proportion for A.

  • Single source of truth for assignment (SDKs/services call the same bucketing service).
  • Pre-exposure assignment: Decide the variant before any UI/network differences occur.
  • Symmetric eligibility: Apply identical inclusion/exclusion filters before assignment.
  • Consistent rollout & flags: If you use gradual rollouts, do it outside the experiment or symmetrically across arms.
  • Bot/QA filtering: Detect and exclude bots and internal IPs equally for all arms.
  • Observability: Log (unit_id, assigned_arm, timestamp, eligibility_flags, platform, geo) to a central stream. Monitor split, by segment, in real time.
  • Fail-fast alerts: Trigger alerts when the SRM p-value falls below a strict threshold (e.g., p < 10⁻⁴).
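The deterministic-hashing scheme above can be sketched as follows; SHA-256 and the "|" separator are arbitrary choices here, not a prescribed scheme:

```python
import hashlib

def assign_variant(user_id: str, salt: str, p_a: float = 0.5,
                   modulus: int = 10**6) -> str:
    """Deterministic bucketing: hash a stable ID with a shared salt,
    map the digest onto [0, modulus), and compare against p_a * modulus.
    The same (user_id, salt) pair always yields the same arm."""
    digest = hashlib.sha256(f"{user_id}|{salt}".encode()).hexdigest()
    bucket = int(digest, 16) % modulus
    return "A" if bucket < p_a * modulus else "B"

# Assignment is stable across calls and across services sharing the salt.
assert assign_variant("user-42", "exp-checkout") == assign_variant("user-42", "exp-checkout")
```

Because the mapping depends only on (user_id, salt), every service that uses the same salt buckets a user identically, which is what makes a single source of truth for assignment feasible.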

How to Fix SRM (Triage & Remediation)

  1. Pause the experiment immediately. Do not interpret effect estimates from an SRM-affected test.
  2. Localize the bias. Recompute chi-square by segment (geo, device, source). The segment with the strongest SRM often points to the root cause.
  3. Audit the assignment path.
    • Verify the unit ID is consistent (user_id vs. cookie).
    • Check hash function + salt are identical everywhere.
    • Ensure assignment occurs pre-render and isn’t skipped due to timeouts.
  4. Check eligibility filters. Confirm identical filters are applied before assignment and in both arms.
  5. Review infra & delivery. Look for sticky sessions, CDN cache keys, or feature flag rollouts that differ by arm.
  6. Inspect telemetry. Compare event loss rates by arm/platform. Fix SDK/network issues (e.g., batch size, retry logic, CORS).
  7. Sanitize traffic. Exclude bots/internal traffic uniformly; re-run SRM checks.
  8. Rerun a smoke test. After fixes, run a small, short dry-run experiment to confirm the split is healthy (no SRM) before relaunching the real test.
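Step 2 (localizing the bias by segment) can be sketched as follows; the segment names and counts are hypothetical:

```python
import math

def srm_p_value(obs_a, obs_b, p_a=0.5):
    """Two-arm chi-square goodness-of-fit p-value (df = 1)."""
    n = obs_a + obs_b
    exp_a, exp_b = p_a * n, (1 - p_a) * n
    chi2 = (obs_a - exp_a) ** 2 / exp_a + (obs_b - exp_b) ** 2 / exp_b
    return math.erfc(math.sqrt(chi2 / 2))

# Hypothetical per-segment counts: (arm A, arm B)
segments = {
    "web":     (2500, 2480),
    "ios":     (1400, 1390),
    "android": (1400,  830),  # suspicious skew
}

# Sort segments by SRM p-value; the strongest offender usually
# points at the root cause (here, an Android-specific issue).
for name, (a, b) in sorted(segments.items(), key=lambda kv: srm_p_value(*kv[1])):
    print(f"{name:8s} p = {srm_p_value(a, b):.2e}")
```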

Analyst’s Toolkit (Ready-to-Use)

  • SRM chi-square (two-arm 50/50):
χ² = (O_A − N/2)²/(N/2) + (O_B − N/2)²/(N/2)
  • General k-arm expected counts:
E_i = p_i × N
  • Standard error for a two-arm proportion (target p):
σ_p = √( p(1 − p) / N )

Practical Checklist

  • Confirm unit of randomization and use stable IDs.
  • Perform server-side deterministic hashing with shared salt.
  • Apply eligibility before assignment, symmetrically.
  • Exclude bots/QA consistently.
  • Instrument SRM alerts (e.g., chi-square p < 10⁻⁴).
  • Segment SRM monitoring by platform/geo/source/time.
  • Pause & investigate immediately if SRM triggers.

Summary

SRM isn’t a minor annoyance—it’s a stop sign. It tells you that the randomization or measurement is broken, which can fabricate uplifts or hide regressions. Detect it early with a chi-square test, design your experiments to prevent it (stable IDs, deterministic hashing, symmetric eligibility), and never ship decisions from an SRM-affected test.