Simultaneous Perturbation Stochastic Approximation (SPSA)

SPSA is a gradient-like minimization routine, $$\hat{\theta}_{k+1} = \hat{\theta}_k - a_k \hat{g}_k(\hat{\theta}_k),$$ driven by a finite-difference-like stochastic perturbation gradient estimator, $$\hat{g}_{ki}(\hat{\theta}_k) = \frac{y(\hat{\theta}_k+c_k\Delta_k)-y(\hat{\theta}_k-c_k\Delta_k)}{2c_k\Delta_{ki}},$$ where the random perturbation vector $\Delta_k = (\Delta_{k1}, \Delta_{k2},\dots,\Delta_{kp})$ has i.i.d. symmetric Bernoulli $\pm 1$ components $\Delta_{ki}$. Note that every component of the gradient estimate is formed from the same two noisy loss evaluations, so the cost per iteration is independent of the dimension $p$.

Noisy quadratic loss function:
$y(\theta) = \theta_x^2+\theta_y^2+\varepsilon$, with measurement noise $\varepsilon \sim \mathcal{N}(0,\sigma^2)$

Initial guess (click on plot):

Diminishing gain sequences:
$a_k = a/(A+k)^\alpha$,
$c_k = c/k^\gamma$.
(Spall (1998) recommends $\alpha = 0.602$ and $\gamma = 0.101$ as practical defaults.)
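The full iteration can be sketched in a few lines of NumPy. This is an illustrative implementation on the demo's noisy quadratic loss; the gain constants `a`, `A`, `c`, the noise level `sigma`, and the initial guess are assumed values, not taken from the page.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 0.1  # measurement-noise level (assumed)

def y(theta):
    # Noisy quadratic loss; the true minimum is at the origin.
    return theta[0]**2 + theta[1]**2 + rng.normal(0.0, sigma)

# Gain-sequence constants (illustrative; exponents follow Spall's
# recommended alpha = 0.602, gamma = 0.101).
a, A, alpha = 0.5, 10.0, 0.602
c, gamma = 0.1, 0.101

theta = np.array([1.0, 1.5])  # initial guess (assumed)
for k in range(1, 201):
    a_k = a / (A + k)**alpha
    c_k = c / k**gamma
    # Symmetric Bernoulli (+1/-1) perturbation vector Delta_k.
    delta = rng.choice([-1.0, 1.0], size=theta.shape)
    # Two loss evaluations yield the entire gradient estimate:
    # elementwise division by delta gives each component g_hat_i.
    g_hat = (y(theta + c_k * delta) - y(theta - c_k * delta)) / (2 * c_k * delta)
    theta = theta - a_k * g_hat

print(theta)  # should end up near the origin
```

Note the elementwise division by `delta`: unlike standard finite differences, which would need $2p$ evaluations, both components of the estimate come from the same pair of measurements.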

Loss $y(\hat{\theta}_k)$ vs. iteration $k$


Spall, J. C. (1998), "An Overview of the Simultaneous Perturbation Method for Efficient Optimization," Johns Hopkins APL Technical Digest, vol. 19, no. 4, pp. 482-492.