Gradient-based optimisation for variational inference: CAVI baseline, Gibbs MCMC gold standard, and course methods (gradient ascent, Newton, BFGS) applied to maximise the ELBO.
Gradient-based optimisation applied to variational inference across five Bayesian regression settings: linear, quadratic, logistic, hierarchical linear, and hierarchical logistic regression. The unifying objective is maximising the Evidence Lower Bound (ELBO) using CAVI, gradient ascent (Armijo backtracking), Newton's method (regularised finite-difference Hessian), and BFGS (Wolfe conditions). Reference posteriors from Gibbs samplers. CAVI is the most efficient method across all five settings; BFGS is the recommended gradient-based alternative when closed-form CAVI updates are unavailable.
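As a concrete illustration of the gradient-based route, here is a minimal sketch of gradient ascent with Armijo backtracking on a generic ELBO. The `elbo` and `grad` callables, parameter names, and constants are illustrative assumptions, not the repository's actual API.

```python
import numpy as np

def gradient_ascent_armijo(elbo, grad, x0, max_iter=500, tol=1e-8,
                           alpha0=1.0, c1=1e-4, shrink=0.5):
    """Maximise `elbo` by steepest ascent with Armijo backtracking."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:        # gradient small enough: stop
            break
        alpha, f0 = alpha0, elbo(x)
        # Backtrack until the Armijo sufficient-increase condition holds.
        while elbo(x + alpha * g) < f0 + c1 * alpha * (g @ g):
            alpha *= shrink
            if alpha < 1e-12:
                break
        x = x + alpha * g
    return x
```

For the BFGS variant, the same objective can be negated and handed to `scipy.optimize.minimize(..., method='BFGS')`, whose internal line search enforces Wolfe-type conditions.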
Mean-field VI turns Bayesian posterior computation into an optimisation problem, but can produce posterior variances that are too small — posterior variance collapse. This case study investigates that issue in a controlled Bayesian linear regression setting (n=50, p=2) with Gaussian–Gamma conjugate priors. CAVI provides closed-form updates and an analytical baseline; gradient ascent, Newton's method, and BFGS are compared on ELBO convergence, runtime, and posterior standard-deviation ratios against a Gibbs sampler reference. A secondary experiment tests entropy regularisation as a remedy for variance collapse.
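A minimal sketch of the closed-form CAVI updates for a Gaussian–Gamma model of this kind, assuming a fixed Gaussian prior precision `alpha` on the weights and a Gamma(a0, b0) prior on the noise precision; function and variable names are illustrative rather than the case study's actual code.

```python
import numpy as np

def cavi_linear(X, y, alpha=1.0, a0=1.0, b0=1.0, n_iter=100):
    """Mean-field CAVI for Bayesian linear regression:
    q(w) = N(m, S), q(tau) = Gamma(a_n, b_n)."""
    n, p = X.shape
    a_n = a0 + 0.5 * n                       # fixed by conjugacy
    E_tau = a0 / b0                          # initial E[tau]
    for _ in range(n_iter):
        # Update q(w) given the current E[tau].
        S = np.linalg.inv(alpha * np.eye(p) + E_tau * X.T @ X)
        m = E_tau * S @ X.T @ y
        # Update q(tau) given q(w): uses E[||y - Xw||^2] under q(w).
        resid = y - X @ m
        b_n = b0 + 0.5 * (resid @ resid + np.trace(X @ S @ X.T))
        E_tau = a_n / b_n
    return m, S, a_n, b_n
```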
Extends Case Study 1 to Bayesian quadratic regression (n=100, p=3) via feature expansion. Mean-field VI can systematically underestimate posterior uncertainty; this report investigates this variance collapse using CAVI, gradient ascent, Newton's method, and BFGS applied to the ELBO. Methods are evaluated on both optimisation diagnostics (convergence, final ELBO, step-size behaviour, computational cost) and posterior standard-deviation ratios relative to a Gibbs sampler reference.
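Because the quadratic model is obtained purely by feature expansion, the same conjugate updates apply to the expanded design matrix. A hypothetical usage example, reusing the `cavi_linear` sketch above (the data here are simulated for illustration only):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, size=100)
y = 1.0 - 0.5 * x + 0.3 * x**2 + rng.normal(scale=0.5, size=100)

X = np.column_stack([np.ones_like(x), x, x**2])   # quadratic feature expansion (p = 3)
m, S, a_n, b_n = cavi_linear(X, y)                # same CAVI updates as the linear case
```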
Bayesian logistic regression has no conjugate variational family; the Jaakkola–Jordan variational bound provides a tractable ELBO (n=200, p=2). Four methods are compared: CAVI, gradient ascent, Newton's method, and BFGS. The reference posterior is from a Pólya–Gamma Gibbs sampler; diagnostics include ESS and ACF plots. All four VI methods recover posterior means within 0.02 of the Gibbs reference, with CAVI converging in fewer than 50 iterations and running several orders of magnitude faster than the PG sampler.
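A minimal sketch of the Jaakkola–Jordan updates for logistic regression, assuming a zero-mean Gaussian prior on the weights with variance `prior_var`; `lam` is the bound's λ(ξ) = tanh(ξ/2)/(4ξ), and all names are illustrative assumptions.

```python
import numpy as np

def lam(xi):
    """JJ bound's lambda(xi) = tanh(xi/2) / (4 xi), with the xi -> 0 limit of 1/8."""
    xi = np.asarray(xi, dtype=float)
    out = np.full_like(xi, 0.125)
    nz = np.abs(xi) > 1e-8
    out[nz] = np.tanh(xi[nz] / 2.0) / (4.0 * xi[nz])
    return out

def cavi_logistic_jj(X, y, prior_var=10.0, n_iter=50):
    """Jaakkola–Jordan variational updates: q(w) = N(m, S) with local xi_i."""
    n, p = X.shape
    S0_inv = np.eye(p) / prior_var
    xi = np.ones(n)                           # local variational parameters
    for _ in range(n_iter):
        L = lam(xi)
        S = np.linalg.inv(S0_inv + 2.0 * (X.T * L) @ X)
        m = S @ (X.T @ (y - 0.5))
        # Update xi_i^2 = x_i^T (S + m m^T) x_i for each observation.
        xi = np.sqrt(np.einsum('ij,jk,ik->i', X, S + np.outer(m, m), X))
    return m, S, xi
```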
Extends the linear-regression setting to a hierarchical model with J=5 group random effects and Gamma precision priors (N=150, p=2, D=19 unconstrained parameters). The conjugate ELBO admits closed-form CAVI updates but also serves as a smooth objective for BFGS, gradient ascent, and Newton's method. A striking finding is severe posterior variance collapse for β0 (SD ratio ≈ 0.13), driven by the mean-field factorisation absorbing group-level variance into the random effects. Newton's method fails to converge to the correct optimum in this high-dimensional hierarchical setting.
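The SD-ratio diagnostic quoted above (per-parameter VI posterior SD divided by Gibbs posterior SD) is simple to compute; the following is a hedged sketch with hypothetical inputs, not the repository's reporting code.

```python
import numpy as np

def sd_ratio(vi_sd, gibbs_samples):
    """Per-parameter ratio of VI posterior SD to Gibbs posterior SD.
    Values well below 1 flag variance collapse (e.g. ~0.13 for beta0 here).
    `gibbs_samples` has shape (n_draws, n_params)."""
    gibbs_sd = np.std(gibbs_samples, axis=0, ddof=1)
    return np.asarray(vi_sd) / gibbs_sd
```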
The most challenging setting: it combines the non-conjugate logistic likelihood (Case Study 3) with hierarchical random effects (Case Study 4). The Jaakkola–Jordan bound retains a tractable ELBO (J=5 groups, N=150, p=2, D=17). Reference posteriors from a Pólya–Gamma blocked Gibbs sampler. Pronounced variance collapse appears in both β0 and β1 (SD ratios ≈ 0.46), reflecting the mean-field family's failure to capture posterior correlation between fixed and random effects. CAVI and BFGS are 1050× and 81× faster than the PG sampler; Newton's method again converges to a suboptimal point.
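One way to see how the two ingredients combine, sketched under the same assumptions as the Case Study 3 block (it reuses `lam` defined there): fold the J group intercepts into the design matrix as one-hot columns and run the same Jaakkola–Jordan updates with a block-diagonal prior precision. The random-effect precision is held fixed here for brevity, whereas the actual model places a Gamma prior on it; all names are hypothetical.

```python
import numpy as np

def hier_logistic_jj(X, groups, y, J=5, prior_var_fixed=10.0,
                     prior_var_re=1.0, n_iter=50):
    """JJ-bound VI for a logistic model with fixed effects plus J group intercepts."""
    n, p = X.shape
    Z = np.eye(J)[groups]                 # one-hot group-membership matrix (n, J)
    XZ = np.hstack([X, Z])                # augmented design: fixed + random effects
    S0_inv = np.diag(np.r_[np.full(p, 1.0 / prior_var_fixed),
                           np.full(J, 1.0 / prior_var_re)])
    xi = np.ones(n)
    for _ in range(n_iter):
        L = lam(xi)                       # lam() from the Case Study 3 sketch above
        S = np.linalg.inv(S0_inv + 2.0 * (XZ.T * L) @ XZ)
        m = S @ (XZ.T @ (y - 0.5))
        xi = np.sqrt(np.einsum('ij,jk,ik->i', XZ, S + np.outer(m, m), XZ))
    return m, S
```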