Privacy noise may negate the benefits of using adaptive optimizers in
differentially private model training. Prior works typically address this issue
by using auxiliary information (e.g., public data) to boost the effectiveness
of adaptive optimization. In this work, we explore techniques to estimate and
efficiently adapt to gradient geometry in private adaptive optimization without
auxiliary data. Motivated by the observation that adaptive methods can tolerate
stale preconditioners, we propose differentially private adaptive training with
delayed preconditioners (DP^2), a simple method that constructs delayed but
less noisy preconditioners to better realize the benefits of adaptivity.
Theoretically, we provide convergence guarantees for our method for both convex
and non-convex problems, and analyze trade-offs between delay and privacy noise
reduction. Empirically, we explore DP^2 across several real-world datasets,
demonstrating that it can improve convergence speed by as much as 4x relative
to non-adaptive baselines and match the performance of state-of-the-art
optimization methods that require auxiliary data.
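To make the core idea concrete, below is a minimal toy sketch of a delayed-preconditioner scheme in the spirit the abstract describes: privatized gradients are accumulated over a delay window and only then folded into a diagonal preconditioner, so the preconditioner's noise is averaged down while individual steps still use the (stale) estimate. This is only an illustrative assumption of how such a method could look; the function names, hyperparameters, and update rules here (`dp2_sketch`, `delay`, the RMSProp-style second-moment estimate) are hypothetical and are not the authors' DP^2 implementation.

```python
import numpy as np

def clip(g, C):
    """Clip a gradient to l2 norm at most C."""
    norm = np.linalg.norm(g)
    return g * min(1.0, C / norm) if norm > 0 else g

def dp2_sketch(grad_fn, w0, steps=1000, lr=0.1, clip_norm=1.0,
               noise_mult=1.0, delay=100, eps=1e-8, rng=None):
    """Toy DP training loop with a delayed diagonal preconditioner.

    The preconditioner is rebuilt only every `delay` steps from the average
    of the privatized gradients seen since the last rebuild, so its noise is
    averaged down; in between, the stale preconditioner rescales each step.
    (Illustrative sketch only, not the DP^2 algorithm from the paper.)
    """
    rng = np.random.default_rng() if rng is None else rng
    w = np.array(w0, dtype=float)
    precond = np.ones_like(w)          # identity at first: plain DP-SGD steps
    buffer = np.zeros_like(w)          # accumulates privatized gradients
    for t in range(1, steps + 1):
        g = clip(grad_fn(w), clip_norm)
        g_priv = g + rng.normal(0.0, noise_mult * clip_norm, size=w.shape)
        buffer += g_priv
        # Precondition the private gradient with the (possibly stale) estimate.
        w -= lr * g_priv / (np.sqrt(precond) + eps)
        if t % delay == 0:
            # Delayed update: averaging over `delay` noisy gradients
            # reduces the noise entering the preconditioner.
            avg = buffer / delay
            precond = avg ** 2
            buffer = np.zeros_like(w)
    return w
```

As a quick sanity check, `dp2_sketch(lambda w: 2 * (w - 3.0), w0=np.zeros(5))` minimizes a simple quadratic; the interesting trade-off, as the abstract notes, is between the staleness introduced by `delay` and the noise reduction gained from averaging.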