Sequentially-Rerandomized Switchback Experiments

Imagine you are the manager of a massive ride-sharing app or an online marketplace. You want to test a new feature (like a new way to match drivers with riders) to see if it makes money or improves efficiency.

In the old days, you might have done a simple A/B test: You pick a bunch of cities, flip a coin for each, and tell half to use the new feature and half to keep the old one. You wait a week, look at the results, and decide.

The Problem:
This simple approach often fails in the real world for three reasons:

Too Few Cities: You might only have 50 cities to test. With so few units, it is hard to rule out that a difference between the groups is just luck.
Uneven Playing Field: One city (like Paris or New York) is huge and wealthy, while another is small and rural. If you accidentally put the "new feature" group in the big city and the "old feature" group in the small one, your results will be skewed.
The "Hangover" Effect: What happens today affects tomorrow. If you show a driver a new app feature today, they might get used to it, and their behavior changes for the next few days. A simple weekly test can't account for this "carryover."

The Solution: The "Smart Switchback" (SRSB)
The authors of this paper propose a new, smarter way to run these experiments called Sequentially-Rerandomized Switchback Experiments (SRSB).

Here is how it works, using a few analogies:

1. The "Switchback" (The Road Trip)

Instead of assigning a city to "New" or "Old" for the whole month, imagine you are driving a car on a winding mountain road. You switch back and forth between the left lane (New Feature) and the right lane (Old Feature) every hour.

Why? This lets you test both options in the same city under similar conditions. It reduces the "big city vs. small city" problem because every city gets to try both.

2. The "Smart Rerandomization" (The Fair Coin)

Here is the tricky part. In a standard experiment, you flip a coin to decide which lane to switch to next. But even a fair coin can produce a lopsided split: the cities sent to one lane may happen to have had better traffic in the last hour, which makes that lane look better just by chance.

The SRSB method is like having a super-smart referee who watches the game before making a call.

The Old Way: The referee flips a coin. If the "New Feature" group happens to get all the rich cities by luck, the test is ruined.
The SRSB Way: The referee looks at the scoreboard from the last hour (past data). If the "New Feature" group would start out ahead just because it got lucky with the traffic, the referee re-flips the coin until the two groups are balanced closely enough based on what happened before.
The Result: You keep re-flipping until the two groups are closely matched in terms of their past performance. This makes it much more likely that any difference you see now is because of the new feature, not because of bad luck.

3. Handling the "Hangover" (The Blocked Design)

Sometimes, the effect of a feature lasts longer than one hour. If you switch from "New" to "Old" too quickly, the drivers might still be acting like they are on the "New" feature. This is the Carryover Effect.

To fix this, the authors introduce a Blocked Design:

Imagine you are testing a new diet. If you switch from "Diet A" to "Diet B" instantly, your body is still digesting Diet A.
The Blocked SRSB says: "Let's group people who were on Diet A yesterday and keep them together. Let's group people who were on Diet B yesterday and keep them together."
Then, within each of those groups, we carefully decide who stays on the same diet and who switches. This creates stable "stay" groups (people who stayed on the same diet for two days in a row) that we can compare fairly, ensuring the "hangover" doesn't mess up the math.

Why Does This Matter?

Think of it like tuning a radio.

Standard A/B Testing is like trying to tune a radio in a storm. You get static, and you can't hear the music clearly.
SRSB is like having a noise-canceling headset that listens to the storm, predicts the static, and cancels it out in real-time.

The Bottom Line:
This paper gives companies a mathematical "cheat code" to run experiments faster, with fewer cities, and with much more confidence. By constantly checking the past and re-balancing the groups, they can spot the true effect of a new policy even when the world is chaotic, changing, and full of "hangovers."

In short: Don't just flip a coin. Watch the scoreboard, check the history, and only make a move when the playing field is perfectly level.

1. Problem Statement

Large-scale online platforms (e.g., ride-sharing, advertising) often evaluate policies using switchback experiments, where operational units (e.g., geographic regions) switch between treatment and control over time. Standard A/B testing and simple switchback designs face four critical challenges in these settings:

Small Sample Size: The number of units ( $N$ ) is often small (tens to hundreds), making asymptotic inference based on $N \to \infty$ unreliable.
Heterogeneity: Units vary significantly (e.g., Paris vs. rural France), leading to imbalance that affects precision.
Non-Stationarity: Outcomes exhibit seasonality, trends, and serial correlation.
Carryover Effects: Treatments in one period may persist and affect outcomes in subsequent periods.

Standard complete randomization often fails to balance prognostic variables (like lagged outcomes) across treatment and control groups, leading to high variance and unreliable estimates.

2. Methodology: Sequentially-Rerandomized Switchback (SRSB)

The authors propose SRSB, an adaptive experimental design that rerandomizes treatment assignments at each time period $t$ to enforce balance on pre-specified prognostic variables constructed from past observations.

Core Framework

Design-Based Perspective: The paper adopts a finite-population framework where potential outcomes and covariates are fixed; randomness arises solely from the treatment assignment.
Sequential Rerandomization: At each time $t$ , the algorithm draws candidate assignments. An assignment is accepted only if the Mahalanobis distance between the treatment and control groups' balancing variables ( $H_{i,t}$ ) is below a threshold $c$ .
Balancing Variables: $H_{i,t}$ typically includes contemporaneous covariates ( $X_{i,t}$ ) and lagged outcomes ( $Y_{i,t-1}$ ), leveraging temporal dependence to reduce variance.
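
The acceptance rule for a single period can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`mahalanobis_imbalance`, `rerandomize`), the equal split between treatment and control, the `max_draws` cap, and the scaling of the covariance by $1/n_1 + 1/n_0$ are all choices made for this example, and the paper's exact distance and threshold may differ.

```python
import numpy as np

def mahalanobis_imbalance(H, w):
    """Mahalanobis distance between treated and control means of H.

    H: (N, d) array of balancing variables for one period.
    w: (N,) array of 0/1 treatment indicators.
    """
    n1, n0 = w.sum(), (1 - w).sum()
    diff = H[w == 1].mean(axis=0) - H[w == 0].mean(axis=0)
    # Covariance of the difference in means under complete randomization
    # (assumed scaling; the paper's normalization may differ).
    cov = np.atleast_2d(np.cov(H, rowvar=False)) * (1.0 / n1 + 1.0 / n0)
    return float(diff @ np.linalg.solve(cov, diff))

def rerandomize(H, c, rng, max_draws=10_000):
    """Draw candidate assignments until the imbalance is at most c."""
    n = H.shape[0]
    base = np.zeros(n, dtype=int)
    base[: n // 2] = 1  # hypothetical design choice: half the units treated
    for _ in range(max_draws):
        w = rng.permutation(base)
        if mahalanobis_imbalance(H, w) <= c:
            return w
    raise RuntimeError("no acceptable assignment found; consider a larger c")
```

In a sequential run, `H` would be rebuilt each period from $X_{i,t}$ and $Y_{i,t-1}$ before calling `rerandomize`, so each period's assignment depends on the outcomes observed so far.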

Two Scenarios

The paper addresses two distinct settings regarding carryover effects:

A. No Carryover Effects (Assumption 3a)

Mechanism: Outcomes at time $t$ depend only on the current treatment $W_{i,t}$ .
Estimator: The Sample Average Treatment Effect (SATE) is estimated by averaging period-specific difference-in-means estimators ( $\hat{\tau}_t$ ).
Inference:
1. Randomization Inference: Exact finite-sample inference under a sharp null hypothesis ( $H_0: Y_{i,t}(1) - Y_{i,t}(0) = \delta$ for all $i, t$ ) using Monte Carlo simulation of SRSB paths.
2. Asymptotic Inference: As $T \to \infty$ , the estimator satisfies a Martingale Central Limit Theorem (CLT). The sequence of estimation errors forms a martingale difference sequence due to the symmetry of the acceptance rule.
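
The estimator and the randomization test can be sketched together. The code below is an illustration under simplifying assumptions rather than the paper's procedure: it balances only on the lagged observed outcome, treats half the units each period, uses complete randomization in the first period (no history yet), and uses $|\hat{\tau} - \delta|$ as the test statistic. All function names are hypothetical. The key idea it shows is that under the sharp null every potential outcome can be imputed, so whole SRSB assignment paths can be re-simulated, including their dependence on past outcomes.

```python
import numpy as np

def imbalance(H, w):
    """Mahalanobis-type imbalance between treated and control means of H."""
    n1, n0 = w.sum(), (1 - w).sum()
    diff = H[w == 1].mean(axis=0) - H[w == 0].mean(axis=0)
    cov = np.atleast_2d(np.cov(H, rowvar=False)) * (1.0 / n1 + 1.0 / n0)
    return float(diff @ np.linalg.solve(cov, diff))

def draw(H, c, rng, max_draws=10_000):
    """Rerandomize one period: redraw until imbalance is at most c."""
    base = np.arange(H.shape[0]) % 2
    for _ in range(max_draws):
        w = rng.permutation(base)
        if imbalance(H, w) <= c:
            return w
    raise RuntimeError("no acceptable assignment found")

def srsb_path(Y0, delta, c, rng):
    """Simulate one assignment path given control outcomes Y0 (T x N)
    and a constant effect delta, balancing on the lagged observed outcome."""
    T, N = Y0.shape
    W = np.zeros((T, N), dtype=int)
    W[0] = rng.permutation(np.arange(N) % 2)  # no history in the first period
    for t in range(1, T):
        y_prev = Y0[t - 1] + delta * W[t - 1]  # outcome observed on this path
        W[t] = draw(y_prev[:, None], c, rng)
    return W

def sate_hat(Y, W):
    """Average of period-specific difference-in-means estimates."""
    tau_t = [Y[t][W[t] == 1].mean() - Y[t][W[t] == 0].mean()
             for t in range(len(Y))]
    return float(np.mean(tau_t))

def randomization_pvalue(Y_obs, W_obs, delta, c, rng, draws=500):
    """Monte Carlo p-value for the sharp null of a constant effect delta."""
    Y0 = Y_obs - delta * W_obs  # control outcomes imputed under the null
    stat_obs = abs(sate_hat(Y_obs, W_obs) - delta)
    count = 0
    for _ in range(draws):
        W = srsb_path(Y0, delta, c, rng)
        Y = Y0 + delta * W
        count += int(abs(sate_hat(Y, W) - delta) >= stat_obs)
    return (1 + count) / (1 + draws)
```

Re-simulating the full path, rather than permuting each period independently, matters here because the accepted assignment at time $t$ depends on outcomes that were themselves shaped by earlier assignments.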

B. First-Order Carryover Effects (Assumption 3b)

Mechanism: Outcomes at time $t$ depend on $W_{i,t-1}$ and $W_{i,t}$ .
Challenge: Standard rerandomization balances $W_{i,t}=1$ vs. $W_{i,t}=0$ , but the estimand of interest compares the two "stay" groups ( $W_{i,t-1}=W_{i,t}=1$ vs. $W_{i,t-1}=W_{i,t}=0$ ). These groups may be unbalanced if not handled carefully.
Solution: Blocked SRSB:
- At time $t$ , units are partitioned into two blocks based on the previous treatment: $G_t^{(1)} = \{i: W_{i,t-1}=1\}$ and $G_t^{(0)} = \{i: W_{i,t-1}=0\}$ .
- Rerandomization is performed within each block to ensure the "stay" groups are comparable and representative.
- This ensures the sizes of the "stay" groups are fixed (approx. $N/4$ ), stabilizing estimation.
Inference:
- The estimation errors no longer form a martingale difference sequence relative to the immediate past ( $F_{t-1}$ ), but the estimator satisfies a lag-two unbiasedness property ( $E[\hat{\tau}_t | F_{t-2}] = \tau_t$ ).
- Asymptotic normality is established using mixingale arguments and "Bernstein sums" (blocking the time series into large blocks to approximate independence).
- A prediction-based conservative variance estimator is proposed, using residuals from predictors measurable at $t-1$ .
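
The blocking step can be sketched as follows. This is a hedged illustration, not the authors' algorithm: it assumes that half of each block is treated and that balance is checked between treated and control units within each block, which may differ from the paper's exact criterion. The names `blocked_draw` and `stay_estimate` are hypothetical.

```python
import numpy as np

def imbalance(H, w):
    """Mahalanobis-type imbalance between treated and control means of H."""
    n1, n0 = w.sum(), (1 - w).sum()
    diff = H[w == 1].mean(axis=0) - H[w == 0].mean(axis=0)
    cov = np.atleast_2d(np.cov(H, rowvar=False)) * (1.0 / n1 + 1.0 / n0)
    return float(diff @ np.linalg.solve(cov, diff))

def blocked_draw(H, w_prev, c, rng, max_draws=10_000):
    """Rerandomize separately within the blocks defined by w_prev."""
    w = np.zeros_like(w_prev)
    for g in (0, 1):
        idx = np.flatnonzero(w_prev == g)   # block G_t^(g)
        base = np.arange(idx.size) % 2      # assumed: half of each block treated
        for _ in range(max_draws):
            cand = rng.permutation(base)
            if imbalance(H[idx], cand) <= c:
                w[idx] = cand
                break
        else:
            raise RuntimeError("no acceptable assignment found in block")
    return w

def stay_estimate(y, w_prev, w):
    """Difference in mean outcomes between the two 'stay' groups."""
    stay1 = (w_prev == 1) & (w == 1)
    stay0 = (w_prev == 0) & (w == 0)
    return float(y[stay1].mean() - y[stay0].mean())
```

With $N$ units split evenly in the previous period, each block has about $N/2$ units and each "stay" group about $N/4$, which is the fixed group size the blocked design is meant to guarantee.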

3. Key Contributions

Novel Design: Introduction of SRSB, which adapts rerandomization to the sequential nature of switchback experiments, balancing lagged outcomes and covariates dynamically.
Theoretical Guarantees:
- Proved that SRSB reduces variance compared to complete randomization when balancing variables are prognostic.
- Developed finite-sample randomization inference and asymptotic normality results (Martingale CLT for no-carryover; Mixingale CLT for carryover).
Blocked Variant for Carryover: Proposed a specific blocking strategy based on lagged treatment to handle first-order carryover, ensuring stable "stay" group sizes and comparability.
Robust Variance Estimation: Developed a conservative variance estimator for the carryover setting that relies only on observable data, avoiding the need for unobserved potential outcomes.

4. Results

Extensive simulations and semi-synthetic experiments (using Penn World Table GDP data) demonstrate:

Variance Reduction: SRSB consistently achieves lower Root Mean Squared Error (RMSE) than complete randomization (SB) and standard switchback designs.
Dependence on Predictability: The gains from SRSB increase as the correlation ( $\rho$ ) between lagged outcomes/covariates and future outcomes increases.
Carryover Handling:
- In first-order carryover settings, the Blocked SRSB significantly outperforms unblocked SRSB and complete randomization.
- In settings with higher-order (infinite) carryover (Markovian latent state model), SRSB remains robust for small persistence parameters ( $\rho$ ), though bias increases as persistence grows (a known limitation of finite-order approximations).
Scalability: The method performs well as $N$ and $T$ vary, maintaining parametric convergence rates ( $O(1/\sqrt{NT})$ ).

5. Significance

This paper bridges the gap between rerandomization (typically used in static cross-sectional settings) and switchback experiments (dynamic time-series settings).

Practical Impact: It provides a rigorous, implementable framework for companies (like Airbnb, the authors' affiliation) to run more efficient experiments with limited units and dynamic environments.
Theoretical Advancement: It extends the theory of causal inference to adaptive designs with dependent data, establishing new CLTs for martingales and mixingales in the context of experimental design.
Robustness: By explicitly addressing carryover effects through blocking, it offers a solution to a common source of bias in time-series experimentation that standard methods often ignore.

In summary, SRSB offers a statistically superior alternative to standard switchback designs by leveraging historical data to dynamically balance treatment groups, thereby significantly improving the precision of causal effect estimates in complex, non-stationary environments.
