What is a unit of randomization?

What is a “Unit of Randomization”?

The unit of randomization is the entity you randomly assign to variants (A or B). It’s the “thing” that receives the treatment: a user, a session, a device, a household, a store, a geographic region, etc.

Choosing this unit determines:

  • Who gets which experience
  • How independence assumptions hold (or break)
  • How you compute statistics and sample size
  • How actionable and unbiased your results are

How It Works (at a high level)

  1. Define exposure: decide what entity must see a consistent experience (e.g., “Logged-in user must always see the same variant across visits.”).
  2. Create an ID: select an identifier for that unit (e.g., user_id, device_id, household_id, store_id).
  3. Hash & assign: use a stable hashing function to map each ID into variant A or B with desired split (e.g., 50/50).
  4. Persist: ensure the unit sticks to its assigned variant on every exposure (stable bucketing).
  5. Analyze accordingly: aggregate metrics at or above the unit level; use the right variance model (especially for clusters).
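Steps 2–4 above can be sketched in a few lines. This is a minimal illustration of stable hashing, not a production assignment service; the function and parameter names (`assign_variant`, `unit_id`, `experiment`) are placeholders for your own identifiers. Hashing the unit ID together with the experiment name keeps assignments sticky per unit while staying independent across experiments.

```python
import hashlib

def assign_variant(unit_id: str, experiment: str, split: float = 0.5) -> str:
    """Deterministically map a unit ID to variant A or B via stable hashing."""
    digest = hashlib.sha256(f"{unit_id}:{experiment}".encode()).hexdigest()
    # Map the hash to a (approximately) uniform number in [0, 1)
    bucket = int(digest, 16) / 16 ** len(digest)
    return "A" if bucket < split else "B"
```

Because the mapping depends only on the inputs, the same unit always lands in the same variant on every exposure, which is exactly the "persist" requirement in step 4.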

Common Units of Randomization (with pros/cons and when to use)

1) User-Level (Account ID or Login ID)

  • What it is: Each unique user/account is assigned to a variant.
  • Use when: Logged-in products; experiences should persist across devices and sessions.
  • Pros: Clean independence between users; avoids cross-device contamination for logged-in flows.
  • Cons: Requires reliable, unique IDs; guest traffic may be excluded or need fallback logic.

2) Device-Level (Device ID / Mobile Advertiser ID)

  • What it is: Each physical device is assigned.
  • Use when: Native apps; no login, but device ID is stable.
  • Pros: Better than cookies for persistence; good for app experiments.
  • Cons: Same human on multiple devices may see different variants; may bias human-level metrics.

3) Cookie-Level (Browser Cookie)

  • What it is: Each browser cookie gets a variant.
  • Use when: Anonymous web traffic without login.
  • Pros: Simple to implement.
  • Cons: Cookies expire/clear; users have multiple browsers/devices → contamination and assignment churn.

4) Session-Level

  • What it is: Each session is randomized; the same user may see different variants across sessions.
  • Use when: You intentionally want short-lived treatment (e.g., page layout in a one-off landing funnel).
  • Pros: Fast ramp, lots of independent observations.
  • Cons: Violates persistence; learning/carryover effects make interpretation tricky for longer journeys.

5) Pageview/Request-Level

  • What it is: Every pageview or API request is randomized.
  • Use when: Low-stakes UI tweaks with negligible carryover; ads/creative rotation tests.
  • Pros: Maximum volume quickly.
  • Cons: Massive contamination; not suitable when the experience should be consistent within a visit.

6) Household-Level

  • What it is: All members/devices of a household share the same assignment (derived from address or shared account).
  • Use when: TV/streaming, grocery delivery, multi-user homes.
  • Pros: Limits within-home interference; aligns with purchase behavior.
  • Cons: Hard to define reliably; potential privacy constraints.

7) Network/Team/Organization-Level

  • What it is: Randomize at a group/organization level (e.g., company admin sets a feature; all employees see it).
  • Use when: B2B products; settings that affect the whole group.
  • Pros: Avoids spillovers inside an org.
  • Cons: Fewer units → lower statistical power; requires cluster-aware analysis.

8) Geographic/Store/Region-Level (Cluster Randomization)

  • What it is: Entire locations are assigned (cities, stores, countries, data centers).
  • Use when: Pricing, inventory, logistics, or features tied to physical/geo constraints.
  • Pros: Realistic operational measurement, cleaner separation across regions.
  • Cons: Correlated outcomes within a cluster; requires cluster-robust analysis and typically larger sample sizes.

Why the Unit of Randomization Matters

1) Validity (Independence & Interference)

Statistical tests assume independent observations. If people in the control are affected by those in treatment (interference), estimates are biased. Picking a unit large enough to contain the spillovers (e.g., randomizing at the org or store level) preserves validity.

2) Power & Sample Size (Design Effect)

Clustered units (households, stores, orgs) share similarities—captured by the intra-class correlation (ICC), often denoted ρ. This inflates variance via the design effect:

DE = 1 + (m − 1) ρ

Where m is the average cluster size. Your effective sample size becomes:

n_eff = n / DE

Larger clusters or higher ρ → bigger DE → less power for the same raw n.
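The two formulas above can be wired into a quick planning helper. The function names and example numbers here are illustrative, not prescriptive:

```python
def design_effect(m: float, icc: float) -> float:
    """DE = 1 + (m - 1) * icc, where m is the average cluster size."""
    return 1 + (m - 1) * icc

def effective_sample_size(n: int, m: float, icc: float) -> float:
    """n_eff = n / DE: the independent-observation equivalent of n clustered units."""
    return n / design_effect(m, icc)

# Example: 10,000 users spread over stores averaging 200 users each, ICC = 0.05
# DE = 1 + 199 * 0.05 = 10.95, so n_eff ≈ 913 — roughly a 10x power loss.
```

Even a modest ICC bites hard when clusters are large, which is why store- or org-level experiments usually need many more raw observations than user-level ones.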

3) Consistency of Experience

Randomizing at the user level with stable bucketing ensures a user's experience doesn't flip between variants, avoiding dilution and confusion.

4) Interpretability & Actionability

If you sell at the store level, store-level randomization makes metrics easier to translate into operational decisions. If you optimize user engagement, user-level makes more sense.

How to Choose the Right Unit (Decision Checklist)

  • Where do spillovers happen?
    Pick the smallest unit that contains meaningful interference (user ↔ household ↔ org ↔ region).
  • What is the primary decision maker?
    If rollouts happen per account/org/region, align the unit with that boundary.
  • Can you persist assignment?
    Use stable identifiers and hashing (e.g., SHA-256 on user_id + experiment_name) to keep assignments sticky.
  • How will you analyze it?
    • User/cookie/device: standard two-sample tests aggregated per unit.
    • Cluster (org/store/geo): use cluster-robust standard errors or mixed-effects models; adjust for design effect in planning.
  • Is the ID reliable & unique?
    Prefer user_id over cookie when possible. If only cookies exist, add fallbacks and measure churn.

Practical Implementation Tips

  • Stable Bucketing: Hash the chosen unit ID to a uniform number in [0,1); map ranges to variants (e.g., <0.5 → A, ≥0.5 → B). Store assignment server-side for reliability.
  • Cross-Device Consistency: If the same human might use multiple devices, prefer user-level (requires login) or implement a linking strategy (e.g., email capture) before randomization.
  • Exposure Control: Ensure treatment is only applied after assignment; log exposures to avoid partial-treatment bias.
  • Metric Aggregation: Aggregate outcomes per randomized unit first (e.g., user-level conversion), then compare arms. Avoid pageview-level analysis when randomizing at user level.
  • Bot & Duplicate Filtering: Scrub bots and detect duplicate IDs (e.g., shared cookies) to reduce contamination.
  • Pre-Experiment Checks: Verify balance on key covariates (traffic source, device, geography) across variants for the chosen unit.
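The "Metric Aggregation" tip above can be sketched as follows. This assumes user-level randomization with event-level logs; the `(user_id, variant, converted)` tuple schema is an illustrative assumption, not a fixed API:

```python
from collections import defaultdict

def user_level_conversion(events):
    """Collapse event-level logs to one observation per randomized unit,
    then compute per-variant conversion rates.

    A user counts as converted if any of their events converted.
    """
    per_user = {}  # user_id -> [variant, converted]
    for user_id, variant, converted in events:
        entry = per_user.setdefault(user_id, [variant, False])
        entry[1] = entry[1] or converted
    rates = {}
    by_variant = defaultdict(list)
    for variant, converted in per_user.values():
        by_variant[variant].append(converted)
    for variant, outcomes in by_variant.items():
        rates[variant] = sum(outcomes) / len(outcomes)
    return rates
```

Comparing these per-user rates (rather than raw pageview rates) keeps the analysis unit aligned with the randomization unit, so heavy users don't get counted many times in one arm.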

Examples

  • Pricing test in retail chain → randomize at store level; compute sales per store; analyze with cluster-robust errors; account for region seasonality.
  • New signup flow on a web app → randomize at user level (or cookie if anonymous); ensure users see the same variant across sessions.
  • Homepage hero image rotation for paid ads landing page → potentially session or pageview level; keep awareness of contamination if users return.

Common Pitfalls (and how to avoid them)

  • Using too granular a unit (pageview) for features with memory/carryover → inconsistent experiences and biased results.
    Fix: move to session or user level.
  • Ignoring clustering when randomizing stores/teams → inflated false positives.
    Fix: use cluster-aware analysis and plan for design effect.
  • Cookie churn breaks persistence → variant switching mid-experiment.
    Fix: server-side assignment with long-lived identifiers; encourage login.
  • Interference across units (social/network effects) → contamination.
    Fix: enlarge the unit (household/org/region) or use geo-experiments with guard zones.