How to Monitor, Diagnose, and Solve Gradient Issues in Foundation Models
Klea Ziu

As foundation models scale to billions or even trillions of parameters, they often exhibit training instabilities, particularly vanishing and exploding gradients. During the initial training phase (pre-training), it is common to observe loss spikes, which can degrade the model’s performance or...