
What is a “Unit of Randomization”?
The unit of randomization is the entity you randomly assign to variants (A or B). It’s the “thing” that receives the treatment: a user, a session, a device, a household, a store, a geographic region, etc.
Choosing this unit determines:
- Who gets which experience
- How independence assumptions hold (or break)
- How you compute statistics and sample size
- How actionable and unbiased your results are
How It Works (at a high level)
- Define exposure: decide what entity must see a consistent experience (e.g., “Logged-in user must always see the same variant across visits.”).
- Create an ID: select an identifier for that unit (e.g., user_id, device_id, household_id, store_id).
- Hash & assign: use a stable hashing function to map each ID into variant A or B with the desired split (e.g., 50/50); see the sketch after this list.
- Persist: ensure the unit sticks to its assigned variant on every exposure (stable bucketing).
- Analyze accordingly: aggregate metrics at or above the unit level; use the right variance model (especially for clusters).
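To make the hash-and-assign step concrete, here is a minimal Python sketch of stable bucketing. It assumes a 50/50 split and hashes the experiment name together with the unit ID; the function and names are illustrative, not from any particular framework:

```python
import hashlib

def assign_variant(unit_id: str, experiment: str, split: float = 0.5) -> str:
    """Deterministically map a unit ID to a variant.

    Hashing experiment + unit_id means the same unit always gets the
    same variant within an experiment, while assignments stay
    independent across experiments.
    """
    key = f"{experiment}:{unit_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    # First 8 hex chars -> uniform integer in [0, 2**32), scaled to [0, 1).
    bucket = int(digest[:8], 16) / 2**32
    return "A" if bucket < split else "B"

# The same user lands in the same bucket on every call:
print(assign_variant("user_42", "new_signup_flow"))
```

Including the experiment name in the hash key keeps a unit's buckets uncorrelated across experiments, so one test's assignment never leaks into another's.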
Common Units of Randomization (with pros/cons and when to use)
1) User-Level (Account ID or Login ID)
- What it is: Each unique user/account is assigned to a variant.
- Use when: Logged-in products; experiences should persist across devices and sessions.
- Pros: Clean independence between users; avoids cross-device contamination for logged-in flows.
- Cons: Requires reliable, unique IDs; guest traffic may be excluded or need fallback logic.
2) Device-Level (Device ID / Mobile Advertiser ID)
- What it is: Each physical device is assigned.
- Use when: Native apps; no login, but device ID is stable.
- Pros: Better than cookies for persistence; good for app experiments.
- Cons: Same human on multiple devices may see different variants; may bias human-level metrics.
3) Cookie-Level (Browser Cookie)
- What it is: Each browser cookie gets a variant.
- Use when: Anonymous web traffic without login.
- Pros: Simple to implement.
- Cons: Cookies expire/clear; users have multiple browsers/devices → contamination and assignment churn.
4) Session-Level
- What it is: Each session is randomized; the same user may see different variants across sessions.
- Use when: You intentionally want short-lived treatment (e.g., page layout in a one-off landing funnel).
- Pros: Fast ramp, lots of independent observations.
- Cons: Violates persistence; learning/carryover effects make interpretation tricky for longer journeys.
5) Pageview/Request-Level
- What it is: Every pageview or API request is randomized.
- Use when: Low-stakes UI tweaks with negligible carryover; ads/creative rotation tests.
- Pros: Maximum volume quickly.
- Cons: Massive contamination; not suitable when the experience should be consistent within a visit.
6) Household-Level
- What it is: All members/devices of a household share the same assignment (derived from address or shared account).
- Use when: TV/streaming, grocery delivery, multi-user homes.
- Pros: Limits within-home interference; aligns with purchase behavior.
- Cons: Hard to define reliably; potential privacy constraints.
7) Network/Team/Organization-Level
- What it is: Randomize at a group/organization level (e.g., company admin sets a feature; all employees see it).
- Use when: B2B products; settings that affect the whole group.
- Pros: Avoids spillovers inside an org.
- Cons: Fewer units → lower statistical power; requires cluster-aware analysis.
8) Geographic/Store/Region-Level (Cluster Randomization)
- What it is: Entire locations are assigned (cities, stores, countries, data centers).
- Use when: Pricing, inventory, logistics, or features tied to physical/geo constraints.
- Pros: Realistic operational measurement, cleaner separation across regions.
- Cons: Correlated outcomes within a cluster; requires cluster-robust analysis and typically larger sample sizes.
Why the Unit of Randomization Matters
1) Validity (Independence & Interference)
Statistical tests assume independent observations. If people in the control are affected by those in treatment (interference), estimates are biased. Picking a unit that contains spillovers (e.g., randomize at org or store level) preserves validity.
2) Power & Sample Size (Design Effect)
Clustered units (households, stores, orgs) share similarities, captured by the intra-class correlation (ICC), often denoted ρ. This inflates variance via the design effect:

DE = 1 + (m − 1)ρ

where m is the average cluster size. Your effective sample size becomes:

n_eff = n / DE

Larger clusters or higher ρ → bigger DE → less power for the same raw n.
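As a worked illustration with hypothetical numbers:

```python
def design_effect(m: float, rho: float) -> float:
    """Design effect for clusters of average size m with ICC rho."""
    return 1 + (m - 1) * rho

# Hypothetical: 200 stores of ~50 customers each, ICC = 0.05.
n_raw = 200 * 50
de = design_effect(m=50, rho=0.05)  # 1 + 49 * 0.05 = 3.45
print(n_raw / de)                   # effective n ≈ 2899, not 10000
```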
3) Consistency of Experience
Randomizing at the user level with stable bucketing ensures a user’s experience doesn’t flip between variants, avoiding dilution and confusion.
4) Interpretability & Actionability
If you sell at the store level, store-level randomization makes metrics easier to translate into operational decisions. If you optimize user engagement, user-level makes more sense.
How to Choose the Right Unit (Decision Checklist)
- Where do spillovers happen?
Pick the smallest unit that contains meaningful interference (user ↔ household ↔ org ↔ region).
- What is the primary decision maker?
If rollouts happen per account/org/region, align the unit with that boundary.
- Can you persist assignment?
Use stable identifiers and hashing (e.g., SHA-256 on user_id + experiment_name) to keep assignments sticky.
- How will you analyze it?
- User/cookie/device: standard two-sample tests aggregated per unit.
- Cluster (org/store/geo): use cluster-robust standard errors or mixed-effects models; adjust for the design effect in planning (see the sketch after this list).
- Is the ID reliable & unique?
Prefer user_id over cookie when possible. If only cookies exist, add fallbacks and measure churn.
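For the cluster case, here is a minimal sketch of cluster-robust analysis with statsmodels, assuming a pandas DataFrame with one row per customer and illustrative columns sales, variant, and store_id (the data is simulated only to make the example self-contained):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)

# Simulated data: 40 stores, 50 customers each, store-level assignment.
stores = pd.DataFrame({
    "store_id": range(40),
    "variant": rng.permutation(["A"] * 20 + ["B"] * 20),
})
df = stores.loc[stores.index.repeat(50)].reset_index(drop=True)
store_effect = df["store_id"].map(
    dict(zip(range(40), rng.normal(0, 2, size=40)))  # shared store effect → ICC
)
df["sales"] = (
    10
    + 2 * (df["variant"] == "B")      # true treatment effect
    + store_effect
    + rng.normal(0, 3, size=len(df))  # customer-level noise
)

# OLS with standard errors clustered on the unit of randomization.
model = smf.ols("sales ~ variant", data=df).fit(
    cov_type="cluster", cov_kwds={"groups": df["store_id"]}
)
print(model.summary().tables[1])
```

Dropping the groups argument would treat 2,000 customers as independent observations when there are really only 40 randomized units, which is exactly the inflated-false-positive pitfall described later.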
Practical Implementation Tips
- Stable Bucketing: Hash the chosen unit ID to a uniform number in [0,1); map ranges to variants (e.g., <0.5 → A, ≥0.5 → B). Store assignment server-side for reliability.
- Cross-Device Consistency: If the same human might use multiple devices, prefer user-level (requires login) or implement a linking strategy (e.g., email capture) before randomization.
- Exposure Control: Ensure treatment is only applied after assignment; log exposures to avoid partial-treatment bias.
- Metric Aggregation: Aggregate outcomes per randomized unit first (e.g., user-level conversion), then compare arms; see the sketch after this list. Avoid pageview-level analysis when randomizing at user level.
- Bot & Duplicate Filtering: Scrub bots and detect duplicate IDs (e.g., shared cookies) to reduce contamination.
- Pre-Experiment Checks: Verify balance on key covariates (traffic source, device, geography) across variants for the chosen unit.
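Here is a minimal sketch of unit-level aggregation plus a simple sample-ratio check, assuming a pandas DataFrame of event-level rows with illustrative columns user_id, variant, and converted (0/1):

```python
import pandas as pd
from scipy import stats

def analyze(events: pd.DataFrame) -> None:
    # Collapse to one row per randomized unit: did this user ever convert?
    per_user = (
        events.groupby(["user_id", "variant"], as_index=False)["converted"].max()
    )

    # Sample-ratio check: does the observed split match the intended 50/50?
    counts = per_user["variant"].value_counts()
    _, p_srm = stats.chisquare(counts)
    if p_srm < 0.001:
        print(f"Warning: possible sample-ratio mismatch (p = {p_srm:.2g})")

    # Compare conversion between arms, one observation per user.
    a = per_user.loc[per_user["variant"] == "A", "converted"]
    b = per_user.loc[per_user["variant"] == "B", "converted"]
    _, p = stats.ttest_ind(a, b)
    print(f"A: {a.mean():.3f}  B: {b.mean():.3f}  p = {p:.3f}")
```

A two-proportion z-test would be equivalent at this scale; the key point is that the test sees one observation per randomized unit, not per pageview.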
Examples
- Pricing test in retail chain → randomize at store level; compute sales per store; analyze with cluster-robust errors; account for region seasonality.
- New signup flow on a web app → randomize at user level (or cookie if anonymous); ensure users see the same variant across sessions.
- Homepage hero image rotation for paid ads landing page → potentially session or pageview level; keep awareness of contamination if users return.
Common Pitfalls (and how to avoid them)
- Using too granular a unit (pageview) for features with memory/carryover → inconsistent experiences and biased results.
Fix: move to session or user level.
- Ignoring clustering when randomizing stores/teams → inflated false positives.
Fix: use cluster-aware analysis and plan for the design effect.
- Cookie churn breaks persistence → variant switching mid-experiment.
Fix: server-side assignment with long-lived identifiers; encourage login.
- Interference across units (social/network effects) → contamination.
Fix: enlarge the unit (household/org/region) or use geo-experiments with guard zones.