SahaBose-KFAC: Making KFAC Stable and Fast via Spectral Annealing and Curvature Condensation

¹PragyaLab, BITS Pilani Goa; ²Apple; ³Google

Figure 2: Optimization diagnostic on a complex loss landscape. SABER-Full enters a stable descent channel earlier than Vanilla KFAC or SGD, avoiding tail-driven excursions in rugged regions.

Abstract

Kronecker-factored Approximate Curvature (KFAC) offers a powerful second-order optimization framework, yet it often suffers from instability and high computational overhead in large-scale deep learning. We present SahaBose-KFAC, a novel approach that decouples stability from efficiency through two core operators: Saha Spectral Annealing and Bose Curvature Condensation.

Saha Spectral Annealing introduces a time-varying spectral transformation \(\tilde{\sigma}_i(t) = (\sigma_i + \epsilon)^{\alpha_t}\) that flattens the curvature spectrum early in training, suppressing spurious spikes that lead to divergence. Bose Curvature Condensation identifies the smallest robust "head" subspace capturing the majority of curvature mass and caps the tail inverse-gain, significantly reducing the cost of inversion without sacrificing performance.

Our methodology provides a theoretical bridge between first-order smoothness and second-order convergence, ensuring stable descent even in highly non-convex landscapes. We evaluate SahaBose-KFAC on benchmarks including AG_NEWS, SQuAD, and SNLI, where it achieves competitive accuracy with improved stability metrics: on AG_NEWS, Mean Kappa falls from 3129.89 (Vanilla KFAC) to 1075.24 (SABER), roughly a 2.9x reduction.

Methodology: SABER-KFAC Operators

Figure 1: SahaBose composition flowchart. Saha reshapes the factor spectrum while Bose caps the inverse tail, composing the layer preconditioner via Kronecker product.


Saha Spectral Annealing

The Saha operator introduces a time-varying spectral transformation that allows a smooth transition from first-order exploration to second-order convergence. Each element \(\sigma_i\) of the factor spectrum is mapped to \[ \tilde{\sigma}_i(t) = (\sigma_i + \epsilon)^{\alpha_t} \] where \(\alpha_t\) is an annealing schedule. This flattens the curvature spectrum early in training and suppresses the spurious eigenvalues that often make KFAC unstable during the initial phase of optimization.
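
To make the transformation concrete, the following minimal NumPy sketch applies the exponent to the eigenvalues of a symmetric positive semi-definite factor. The linear schedule `alpha_schedule`, its endpoints (0 to 1), the function names, and the default `eps` are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def alpha_schedule(step, total_steps, alpha_start=0.0, alpha_end=1.0):
    """Hypothetical linear schedule for alpha_t (assumed, not from the paper).

    alpha_t = 0 maps every eigenvalue to 1 (identity-like preconditioning);
    alpha_t = 1 leaves the spectrum unchanged apart from the eps offset.
    """
    frac = min(max(step / max(total_steps, 1), 0.0), 1.0)
    return alpha_start + frac * (alpha_end - alpha_start)


def saha_anneal(factor, alpha_t, eps=1e-8):
    """Map each eigenvalue sigma_i of a symmetric PSD factor to (sigma_i + eps)**alpha_t.

    Returns the reconstructed factor and the transformed eigenvalues
    (ascending order, as produced by np.linalg.eigh).
    """
    sigma, vecs = np.linalg.eigh(factor)
    # Guard against tiny negative eigenvalues caused by round-off.
    sigma = np.clip(sigma, 0.0, None)
    sigma_tilde = (sigma + eps) ** alpha_t
    # vecs * sigma_tilde scales column i by sigma_tilde[i], i.e. V diag(s) V^T.
    return (vecs * sigma_tilde) @ vecs.T, sigma_tilde


if __name__ == "__main__":
    factor = np.diag([1.0, 100.0, 10000.0])
    for step in (0, 50, 100):
        alpha_t = alpha_schedule(step, total_steps=100)
        _, s = saha_anneal(factor, alpha_t)
        print(f"step={step:3d} alpha_t={alpha_t:.2f} cond={s.max() / s.min():.1f}")
```

With \(\alpha_t = 0.5\), a spectrum of \(\{1, 10^2, 10^4\}\) maps to approximately \(\{1, 10, 10^2\}\), so the condition number drops from \(10^4\) to about \(10^2\); as \(\alpha_t \to 1\) the original spectrum is recovered.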

Bose Curvature Condensation

Bose Condensation identifies the smallest robust subspace that captures the dominant curvature information. We determine the optimal rank \(k^*\) such that: \[ k^* = \min\left\{k : \sum_{i=1}^k p_i \ge m\right\} \] where \(p_i\) is the contribution of the \(i\)-th eigenpair to the overall curvature mass and \(m\) is the target mass threshold. Capping the tail inverse-gain keeps the preconditioner \( \mathbf{P} \approx \mathbf{A} \otimes \mathbf{G} \) well-conditioned while reducing computational overhead.
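
The rank selection and tail cap can be sketched as follows. This is a hedged illustration: it assumes \(p_i = \sigma_i / \sum_j \sigma_j\) with eigenvalues sorted in descending order, and it assumes the tail inverse-gain is capped at the inverse gain of the smallest head eigenvalue. The paper's exact definitions of \(p_i\) and of the cap may differ, and the names `bose_rank`, `bose_inverse`, `mass`, and `damping` are hypothetical.

```python
import numpy as np


def bose_rank(sigma_desc, mass=0.9):
    """k* = min{k : sum_{i<=k} p_i >= mass}.

    Assumes p_i = sigma_i / sum(sigma) with sigma sorted in descending order.
    """
    total = sigma_desc.sum()
    if total <= 0.0:
        return len(sigma_desc)
    cum = np.cumsum(sigma_desc / total)
    # First index whose cumulative mass reaches the threshold, as a 1-based rank.
    k = int(np.searchsorted(cum, mass, side="left")) + 1
    return min(k, len(sigma_desc))


def bose_inverse(factor, mass=0.9, damping=1e-3):
    """Approximate inverse of a symmetric PSD factor with a capped tail.

    Head directions are inverted exactly; every tail direction receives the
    same capped inverse gain (assumed here: the gain of the last head mode).
    """
    sigma, vecs = np.linalg.eigh(factor)
    sigma, vecs = sigma[::-1], vecs[:, ::-1]  # reorder to descending
    sigma = np.clip(sigma, 0.0, None)
    k = bose_rank(sigma, mass)
    head_vecs = vecs[:, :k]
    head_gain = 1.0 / (sigma[:k] + damping)
    tail_cap = head_gain[-1]
    n = factor.shape[0]
    # V_h diag(g) V_h^T + c (I - V_h V_h^T) == V_h diag(g - c) V_h^T + c I
    inv = (head_vecs * (head_gain - tail_cap)) @ head_vecs.T + tail_cap * np.eye(n)
    return inv, k


if __name__ == "__main__":
    factor = np.diag([50.0, 30.0, 15.0, 5.0])
    inv, k = bose_inverse(factor, mass=0.9, damping=0.0)
    print("k* =", k)
    print("inverse gains =", np.diag(inv))
```

For eigenvalues \(\{50, 30, 15, 5\}\) and \(m = 0.9\), the cumulative masses are 0.5, 0.8, 0.95, and 1.0, so \(k^* = 3\) and the single tail direction is capped at an inverse gain of \(1/15\) rather than \(1/5\). The sketch uses a full eigendecomposition for clarity; any cost saving would come from a truncated solver that computes only the top \(k^*\) eigenpairs, which is not shown here.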

Quantitative Results

AG_NEWS Benchmarks

| Optimizer | Accuracy | Eval Loss | Train Loss | Mean \(k^*\) | Time (s) |
|---|---|---|---|---|---|
| SahaBose (saber) | 0.9993 | 2.2526 | 2.4511 | 1249.01 | 2608.9 |
| AdamW | 0.9995 | 2.0409 | 2.1045 | 0.00 | 3316 |
| Vanilla KFAC | 0.9978 | 3.4076 | 3.6899 | 1231.88 | 2598.0 |
| SGD | 0.9981 | 3.1339 | 3.1371 | 0.00 | 3285 |

SQuAD Benchmarks

| Optimizer | Train Loss | Eval Loss | Time (s) | Peak Mem (MB) | Mean \(k^*\) |
|---|---|---|---|---|---|
| AdamW | 2.1753 | 2.1289 | 2648.62 | 44264 | 0.00 |
| SABER-KFAC | 2.1035 | 2.1802 | 3602.33 | 44655 | 1029.66 |
| Vanilla KFAC | 2.0864 | 2.1517 | 3420.16 | 44660 | 1112.32 |
| SGD | 3.1371 | 3.1339 | 3285.00 | - | 0.00 |

SNLI Benchmarks

| Optimizer | Time (s) | Peak Mem (MB) | Train Loss | Eval Loss |
|---|---|---|---|---|
| AdamW | 141.13 | 18503.53 | 1.5160 | 1.9852 |
| SGD | 135.15 | 18469.40 | 4.5142 | 4.3085 |
| Vanilla KFAC | 176.50 | 18590.40 | 1.7485 | 1.9083 |
| SABER-KFAC | 190.16 | 18590.40 | 1.9371 | 1.8965 |

VLMS Image Captioning

| Config | Optimizer | BLEU-4 | BLEU-1 | Mean Kappa | Mean \(k^*\) | Time (s) |
|---|---|---|---|---|---|---|
| Baseline (full data, damping=2.1) | AdamW | 22.1% | 58.8% | - | - | 2589 |
| | SABER (no BOSE) | 16.5% | 49.2% | 8.1 | 428 | 2796 |
| | KFAC | 16.5% | 49.2% | 8.1 | 656 | 2799 |
| Damping fix + BOSE on (5k, damping=0.01) | AdamW | 27.2% | 67.9% | - | - | 561 |
| | SABER (BOSE) | 29.6% | 74.7% | 1498 | 485 | 667 |
| | KFAC | 32.2% | 76.4% | 1481 | 663 | 667 |

BibTeX

@inproceedings{sathyanarayanan2026sahabose,
  author    = {Sathyanarayanan, Anish and Mallagundla, Rishikesh and Narang, Visheshe and Chadha, Aman and Jain, Vinija and Das, Amitava},
  title     = {SahaBose-KFAC: Making KFAC Stable and Fast via Spectral Annealing and Curvature Condensation},
  booktitle = {NeurIPS},
  year      = {2026},
}