E-values, Sequential Testing, and Concentrated Differential Privacy: Theory and Practice
Introduction to E-values and E-processes¶
An E-value (evidence value) is a non-negative random variable $E$ with expected value at most 1 under the null hypothesis:
$$ \mathbb{E}_{H_0}[E] \leq 1 $$
E-values generalize likelihood ratios and provide a universal framework for hypothesis testing without p-values. For a single E-value, Markov's inequality gives $P_{H_0}(E \geq 1/\alpha) \leq \alpha$ for any $\alpha > 0$. The property enabling sequential analysis is Ville's inequality, which extends this to the running supremum of an E-process (defined next):
$$ P_{H_0}\left(\sup_{t \geq 1} E_t \geq 1/\alpha\right) \leq \alpha $$
An E-process $(E_t)_{t=1}^{\infty}$ is a non-negative process adapted to a filtration $(\mathcal{F}_t)_{t \geq 1}$ such that $E_\tau$ is an E-value for every stopping time $\tau$. A sufficient condition is the supermartingale property under $H_0$ (with $E_0 = 1$):
$$ \mathbb{E}_{H_0}[E_t \mid \mathcal{F}_{t-1}] \leq E_{t-1} $$
Such a test supermartingale enables anytime-valid inference via optional stopping.
Mathematical Framework of Sequential Testing¶
Test Martingales and Supermartingales¶
Let $X_1, X_2, \ldots$ be observations. A test martingale $M_t$ for testing $H_0: \theta = \theta_0$ versus $H_1: \theta = \theta_1$ is:
$$ M_t = \prod_{i=1}^{t} \frac{f_{\theta_1}(X_i | X_1, \ldots, X_{i-1})}{f_{\theta_0}(X_i | X_1, \ldots, X_{i-1})} $$
Under $H_0$, $(M_t)$ is a martingale with $\mathbb{E}_{H_0}[M_t] = 1$.
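To make this concrete, here is a minimal Python sketch of the likelihood-ratio test martingale for a simple Gaussian mean test; the parameter values and the Gaussian model are illustrative assumptions, not prescribed by the text.

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
theta0, theta1, sigma = 0.0, 0.5, 1.0      # illustrative null and alternative means
x = rng.normal(theta0, sigma, size=1000)   # observations drawn under H_0

# M_t = prod_i f_{theta1}(X_i) / f_{theta0}(X_i), accumulated in log space
log_M = np.cumsum(norm.logpdf(x, theta1, sigma) - norm.logpdf(x, theta0, sigma))

alpha = 0.05
# Ville's inequality: sup_t M_t rarely exceeds 1/alpha under H_0
print("max M_t =", np.exp(log_M.max()), "  threshold =", 1 / alpha)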
For composite hypotheses $H_1: \theta \in \Theta_1$, we construct test supermartingales via mixture methods:
$$ S_t = \int_{\Theta_1} M_t(\theta) \, d\pi(\theta) $$
where $\pi$ is a prior over $\Theta_1$, or via the plug-in (predictable MLE) method, in which the alternative is estimated from past data:
$$ S_t = \prod_{i=1}^{t} \frac{f_{\hat{\theta}_{i-1}}(X_i \mid X_1, \ldots, X_{i-1})}{f_{\theta_0}(X_i \mid X_1, \ldots, X_{i-1})} $$
where $\hat{\theta}_{i-1}$ is $\mathcal{F}_{i-1}$-measurable. (The unpenalized supremum $\sup_{\theta \in \Theta_1} M_t(\theta)$ is in general not a supermartingale and cannot be used directly.)
Anytime-Valid Confidence Sequences¶
A $(1-\alpha)$-confidence sequence $(C_t)_{t=1}^{\infty}$ satisfies:
$$ P_{\theta}(\forall t \geq 1: \theta \in C_t) \geq 1 - \alpha $$
Confidence sequences are dual to E-processes via the fundamental relationship:
$$ C_t = \{\theta : E_t(\theta) < 1/\alpha\} $$
where $E_t(\theta)$ is an E-process for testing the point null $H_0: \theta^* = \theta$, with $\theta^*$ the true parameter.
Exponential Tilting and Optimal E-processes¶
Basic Exponential Tilting¶
For bounded observations $X_i \in [a, b]$, exponential tilting creates E-values via:
$$ E_{\lambda} = \exp\left(\lambda \sum_{i=1}^{n} (X_i - \mu_0) - n\psi(\lambda)\right) $$
where $\psi(\lambda) = \log \mathbb{E}_{H_0}[e^{\lambda(X - \mu_0)}]$ is the cumulant generating function.
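As a sketch of this construction, assume Bernoulli($\mu_0$) observations, for which the CGF $\psi(\lambda) = \log(1 - \mu_0 + \mu_0 e^{\lambda}) - \lambda\mu_0$ has a closed form (the Bernoulli choice is an illustrative assumption):

import numpy as np

def tilted_e_value(x, mu0, lam):
    # psi(lam) = log E_{H_0}[exp(lam * (X - mu0))] for X ~ Bernoulli(mu0)
    psi = np.log(1 - mu0 + mu0 * np.exp(lam)) - lam * mu0
    return np.exp(lam * np.sum(x - mu0) - len(x) * psi)

rng = np.random.default_rng(1)
x = rng.binomial(1, 0.5, size=200)            # data generated under H_0
print(tilted_e_value(x, mu0=0.5, lam=0.3))    # has expectation exactly 1 under H_0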
Mixture E-values and Optimization¶
The mixture E-value over tilting parameters:
$$ E_{\text{mix}} = \int_{\Lambda} E_{\lambda} \, d\pi(\lambda) $$
achieves near-optimal performance for suitable priors $\pi$. A convenient conjugate choice for $\sigma$-sub-Gaussian observations is the Gaussian prior
$$ \pi^*(\lambda) \propto \exp\left(-\frac{\lambda^2 \sigma^2}{2}\right) $$
i.e., $\lambda \sim N(0, 1/\sigma^2)$, which yields the closed-form mixture E-value:
$$ E_t = \frac{1}{\sqrt{t+1}} \exp\left(\frac{S_t^2}{2\sigma^2(t+1)}\right) $$
where $S_t = \sum_{i=1}^{t} (X_i - \mu_0)$.
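A minimal numeric sketch of this closed form, assuming $\sigma$ is known and simulating standard normal data under the null (both assumptions are for illustration):

import numpy as np

rng = np.random.default_rng(2)
sigma, mu0 = 1.0, 0.0
x = rng.normal(mu0, sigma, size=500)   # data under H_0

t = np.arange(1, len(x) + 1)
S = np.cumsum(x - mu0)
# E_t = (t+1)^(-1/2) * exp(S_t^2 / (2 sigma^2 (t+1)))
log_E = -0.5 * np.log(t + 1) + S**2 / (2 * sigma**2 * (t + 1))

alpha = 0.05
print("ever rejected:", bool(np.any(log_E >= np.log(1 / alpha))))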
Predictable Mixture Martingales¶
For adaptive testing, we use predictable mixtures:
$$ M_t = \prod_{i=1}^{t} \int \frac{dP_{\theta}(X_i \mid \mathcal{F}_{i-1})}{dP_0(X_i \mid \mathcal{F}_{i-1})} \, d\pi_i(\theta) $$
where $\pi_i$ can depend on past observations through $\mathcal{F}_{i-1}$.
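One common instance is the plug-in predictable mixture, where $\pi_i$ is a point mass at an estimate built from past data only; the Gaussian model below is an illustrative assumption:

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
x = rng.normal(0.3, 1.0, size=500)   # data with a small true effect

log_M, mean_est = 0.0, 0.0
for i, xi in enumerate(x):
    theta_hat = mean_est             # predictable: uses only X_1, ..., X_{i-1}
    log_M += norm.logpdf(xi, theta_hat, 1.0) - norm.logpdf(xi, 0.0, 1.0)
    mean_est = (mean_est * i + xi) / (i + 1)   # update only after using x_i
print("log M_t =", log_M)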
Connection to Concentrated Differential Privacy¶
Concentrated Differential Privacy (CDP)¶
A randomized mechanism $\mathcal{M}$ satisfies $\rho$-concentrated differential privacy (zero-concentrated DP, zCDP) if for all neighboring datasets $D, D'$ and all $\alpha > 1$:
$$ D_{\alpha}(\mathcal{M}(D) \,\|\, \mathcal{M}(D')) \leq \rho\alpha $$
where $D_{\alpha}$ is the Rényi divergence of order $\alpha$.
Privacy Loss as a Martingale¶
The privacy loss random variable for a single query, evaluated at the realized output $y \sim \mathcal{M}(D)$:
$$ Z = \log\frac{P(\mathcal{M}(D) = y)}{P(\mathcal{M}(D') = y)} $$
For sequential (possibly adaptively composed) mechanisms, the cumulative privacy loss:
$$ L_t = \sum_{i=1}^{t} Z_i $$
is not itself a martingale, but under CDP its increments have uniformly bounded conditional moment generating functions, which is exactly the structure the E-process construction below exploits for sequential analysis.
E-processes for Privacy Accounting¶
Define the privacy E-process:
$$ E_t^{\text{priv}} = \exp(\lambda L_t - t\psi(\lambda)) $$
where $\psi(\lambda) = \log \mathbb{E}[e^{\lambda Z}]$ is the privacy loss CGF. Under $\rho$-CDP, setting $\lambda = \alpha - 1$ in the Rényi bound gives:
$$ \psi(\lambda) \leq \lambda(\lambda + 1)\rho $$
This E-process yields anytime-valid privacy guarantees via Ville's inequality: for any $\lambda > 0$ and $\delta \in (0, 1)$,
$$ P\left(\exists t \geq 1 : L_t \geq \frac{\log(1/\delta) + t\psi(\lambda)}{\lambda}\right) \leq \delta $$
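For the Gaussian mechanism with sensitivity $\Delta$ and noise scale $\sigma_n$ (so $\rho = \Delta^2 / 2\sigma_n^2$), the privacy loss is exactly $N(\rho, 2\rho)$ and the CGF bound holds with equality; a quick Monte Carlo check (parameter values illustrative):

import numpy as np

rng = np.random.default_rng(4)
sensitivity, noise = 1.0, 2.0
rho = sensitivity**2 / (2 * noise**2)     # Gaussian mechanism satisfies rho-CDP

z = rng.normal(rho, np.sqrt(2 * rho), size=200_000)   # privacy loss ~ N(rho, 2*rho)
lam = 1.5
psi_mc = np.log(np.mean(np.exp(lam * z)))             # Monte Carlo estimate of psi(lam)
print(psi_mc, "<=", lam * (lam + 1) * rho)            # equality up to MC error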
Composition Theorems in Sequential Testing¶
Basic Composition for E-processes¶
Theorem (Product Composition): If $E^{(1)}, \ldots, E^{(k)}$ are independent E-values for testing $H_0^{(1)}, \ldots, H_0^{(k)}$, then:
$$ E = \prod_{i=1}^{k} E^{(i)} $$
is an E-value for testing $H_0 = \cap_{i=1}^{k} H_0^{(i)}$.
Advanced Composition via Rényi Divergence¶
Theorem (Rényi Composition): For sequential tests with Rényi divergences bounded by $\rho_i$:
$$ D_{\alpha}\left(\prod_{i=1}^{t} P_i || \prod_{i=1}^{t} Q_i\right) \leq \sum_{i=1}^{t} \rho_i $$
This enables tight privacy composition in sequential settings.
Adaptive Composition¶
Theorem (Adaptive E-process Composition): Let $(E_t^{(i)})$ be E-processes for hypotheses $H_0^{(i)}$. The adaptively weighted combination:
$$ E_t = \sum_{i=1}^{k} w_i(t) E_t^{(i)} $$
remains an E-process if the weights are predictable, non-negative, and non-increasing in $t$ with $\sum_{i=1}^{k} w_i(1) \leq 1$; fixed weights with $\sum_{i=1}^{k} w_i \leq 1$ are the most common case.
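A sketch of the simplest valid case, fixed non-negative weights summing to at most 1 (the logsumexp trick is an implementation choice for numerical stability):

import numpy as np

def combine_e_processes(log_E, weights):
    # log_E: (T, k) array of log E-process values; returns log of the combination
    w = np.asarray(weights, dtype=float)
    assert (w >= 0).all() and w.sum() <= 1 + 1e-12
    m = log_E.max(axis=1)
    return m + np.log(np.exp(log_E - m[:, None]) @ w)   # log(sum_i w_i * E_t^(i))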
Optimal Composition Rates¶
For $k$ independent sub-Gaussian tests with variance proxy $\sigma^2$ run to time $t$, multiplying the Gaussian-mixture E-processes from above gives the composed E-process:
$$ E_{\text{composed}} = \frac{1}{(t+1)^{k/2}} \exp\left(\frac{\|S\|_2^2}{2\sigma^2(t+1)}\right) $$
where $S = (S_1, \ldots, S_k)$ collects the individual test statistics; the $(t+1)^{-k/2}$ factor is the price of composing $k$ tests.
Practical Implementation of Composition Theorems¶
Algorithm 1: Sequential Privacy Accounting¶
Initialize: E_0 = 1, privacy_budget = ε_total
For each query t = 1, 2, ...:
1. Compute privacy loss Z_t for query t
2. Update E-process:
E_t = E_{t-1} · exp(λZ_t - ψ(λ))
3. Check stopping condition:
if E_t ≥ 1/α:
STOP (privacy budget exceeded)
4. Compute remaining budget:
ε_remaining = (log(1/α) - log(E_t))/λ
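A runnable sketch of Algorithm 1 in Python, assuming the $\rho$-CDP bound $\psi(\lambda) \leq \lambda(\lambda+1)\rho$ from the previous section (the parameter defaults are illustrative):

import numpy as np

def sequential_privacy_accounting(privacy_losses, rho, lam=1.0, alpha=0.05):
    psi = lam * (lam + 1) * rho             # CGF bound under rho-CDP
    log_E, threshold = 0.0, np.log(1 / alpha)
    for t, z in enumerate(privacy_losses, start=1):
        log_E += lam * z - psi              # E_t = E_{t-1} * exp(lam*z - psi(lam))
        if log_E >= threshold:
            return t, 0.0                   # stop: privacy budget exceeded
    return None, (threshold - log_E) / lam  # remaining budget eps_remaining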
Algorithm 2: Adaptive Sequential Testing¶
Initialize: E_0^{(i)} = 1 for all hypotheses i
For each observation t = 1, 2, ...:
1. Compute likelihood ratios L_t^{(i)}
2. Update E-processes:
E_t^{(i)} = E_{t-1}^{(i)} · L_t^{(i)}
3. Compute composite E-value for the intersection null:
E_t = (1/k) · max_i E_t^{(i)} (Bonferroni-style max merger)
or
E_t = ((1/k) ∑_i (E_t^{(i)})^{1/r})^r (power-mean merger, r ≥ 1)
4. Make decision:
if E_t ≥ 1/α:
REJECT H_0
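A runnable sketch of Algorithm 2, with per-step likelihood ratios supplied as a (T, k) NumPy array and the merger options mirroring step 3:

import numpy as np

def adaptive_sequential_test(lr, alpha=0.05, merger="max", r=2.0):
    T, k = lr.shape
    log_E = np.zeros(k)
    for t in range(T):
        log_E += np.log(lr[t])                     # E_t^(i) = E_{t-1}^(i) * LR_t^(i)
        E = np.exp(log_E)
        if merger == "max":
            e_comp = E.max() / k                   # Bonferroni-style max merger
        else:
            e_comp = np.mean(E ** (1 / r)) ** r    # power-mean merger, r >= 1
        if e_comp >= 1 / alpha:
            return t + 1                           # reject the intersection null
    return None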
Algorithm 3: Confidence Sequence Construction¶
For each time t:
1. Define E-process for each θ:
E_t(θ) = exp(λ(S_t - tθ) - tψ_θ(λ))
2. Choose λ = λ_t(θ) predictably or via a mixture:
λ_t(θ) must use only data observed before time t (or integrate λ out with a prior as above); tuning λ on the current S_t breaks anytime validity
3. Construct confidence set:
C_t = {θ : E_t(θ) < 1/α}
4. Simplify to interval [L_t, U_t] via monotonicity
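For the Gaussian-mixture E-process derived earlier, steps 1-4 collapse to a closed-form interval; a minimal sketch, assuming a known variance proxy σ (the simulated data are illustrative):

import numpy as np

def mixture_confidence_sequence(x, sigma, alpha=0.05):
    # E_t(theta) < 1/alpha  <=>  (S_t - t*theta)^2 < 2 sigma^2 (t+1) log(sqrt(t+1)/alpha)
    t = np.arange(1, len(x) + 1)
    S = np.cumsum(x)
    radius = sigma * np.sqrt(2 * (t + 1) * np.log(np.sqrt(t + 1) / alpha)) / t
    return S / t - radius, S / t + radius

x = np.random.default_rng(5).normal(0.2, 1.0, size=1000)
lo, hi = mixture_confidence_sequence(x, sigma=1.0)
print(lo[-1], hi[-1])   # the interval shrinks at rate roughly sqrt(log(t) / t)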
Practical Considerations and Best Practices¶
Choosing the Right E-process¶
- For bounded observations: Use exponential tilting with mixture priors
- For unbounded observations: Use truncation or robust E-values
- For composite nulls: Use reverse information projection (RIPr)
Optimizing Composition¶
Strategy 1: Hierarchical Testing Organize hypotheses hierarchically and use:
$$ E_{\text{tree}} = \min_{v \in \text{tree}} \left\{ |A(v)| \cdot E_v \right\} $$
where $|A(v)|$ is the number of ancestors of node $v$.
Strategy 2: Weighted Composition For hypotheses with different importance weights $w_i \geq 0$, $\sum_i w_i \leq 1$:
$$ E_{\text{weighted}} = \left(\sum_{i=1}^{k} w_i E_i^{1/r}\right)^r $$
This is a valid E-value for any $r \geq 1$, since the power-mean inequality bounds it by $\sum_i w_i E_i$. Choose $r$ based on the correlation structure of the tests.
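A one-function sketch of this weighted power-mean merger under the stated conditions ($w_i \geq 0$, $\sum_i w_i \leq 1$, $r \geq 1$); the example values are illustrative:

import numpy as np

def weighted_power_merger(E, w, r=2.0):
    E, w = np.asarray(E, dtype=float), np.asarray(w, dtype=float)
    assert (w >= 0).all() and w.sum() <= 1 + 1e-12 and r >= 1
    return float(np.sum(w * E ** (1 / r)) ** r)

# a single large E-value dominates less as r grows
print(weighted_power_merger([30.0, 0.5, 1.2], w=[0.5, 0.25, 0.25], r=2.0))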
Computational Efficiency¶
- Use logarithmic representation: Work with $\log E_t$ to avoid numerical overflow
- Batch updates: Update E-processes in blocks for vectorization
- Pruning: Remove hypotheses with $E_t^{(i)} < \epsilon$ from active set
Handling Dependencies¶
For dependent observations, replace the fixed CGF penalty with a variance-adaptive (self-normalized) one:
$$ E_t^{\text{dep}} = \exp\left(\lambda S_t - \frac{\lambda^2 \hat{V}_t}{2}\right) $$
where $\hat{V}_t$ is an accumulated conditional-variance proxy, chosen so that the right-hand side is a supermartingale under $H_0$ despite the dependence.
Advanced Topics¶
Asymptotic Optimality¶
The growth-rate optimal (GRO) E-process satisfies, under the alternative $P_1$:
$$ \lim_{t \to \infty} \frac{\log E_t}{t} = D_{\mathrm{KL}}(P_1 \,\|\, P_0) $$
accumulating evidence at the Kullback-Leibler rate, the fastest achievable exponential growth rate.
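A quick numeric check of this growth rate using the Gaussian likelihood-ratio E-process, where $D_{\mathrm{KL}}(N(0.5, 1) \,\|\, N(0, 1)) = 0.125$ (the parameter values are illustrative):

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(6)
x = rng.normal(0.5, 1.0, size=100_000)   # data drawn from the alternative P_1
log_E = np.sum(norm.logpdf(x, 0.5, 1.0) - norm.logpdf(x, 0.0, 1.0))
print(log_E / len(x), "vs KL =", 0.125)  # KL(N(0.5,1) || N(0,1)) = 0.5^2 / 2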
Connection to Information Theory¶
The e-power of a test:
$$ \text{POW}_e = \inf_{P \in H_1} \mathbb{E}_{P}[\log^+ E] $$
relates to the information projection of $H_1$ onto $H_0$.
Future Directions¶
- Nonparametric E-processes: Kernel-based methods for complex hypotheses
- Causal E-values: Sequential testing in causal inference
- Distributed testing: E-processes in federated settings
- Quantum E-values: Extension to quantum hypothesis testing
Conclusion¶
Sequential testing via E-processes provides a powerful framework for anytime-valid inference with strong connections to differential privacy. The composition theorems enable practical implementation in complex settings while maintaining theoretical guarantees. Key advantages include:
- Anytime validity: Can stop at any time without alpha inflation
- Optimal composition: Tight bounds via Rényi divergence
- Privacy accounting: Natural connection to CDP mechanisms
- Practical algorithms: Efficient implementation strategies
The framework continues to evolve with applications in online experimentation, privacy-preserving analytics, and adaptive clinical trials.