[ICML 24] A novel framework for computing GraB updates layer-wise. It integrates seamlessly into the PyTorch Automatic Differentiation (AD) engine, making it possible to attain optimal convergence rates when pre-training Large Language Models (LLMs).
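The following is a rough, hypothetical illustration of why a layer-wise computation is possible. It is plain Python rather than PyTorch, and it is not this framework's implementation. It sketches the greedy sign-balancing step from the original GraB algorithm. Each example's gradient is centred on a stale mean from the previous epoch. The example is then assigned a sign by comparing ||s + c|| with ||s - c||, and it is placed at the front or the back of the next epoch's order. Because ||s + c||² - ||s - c||² = 4⟨s, c⟩, and ⟨s, c⟩ is a sum of per-layer inner products, the decision can be accumulated one layer at a time. The gradient representation (a dict of layer name to a flat list of floats), the layer names, and the function names are all assumptions made for this sketch.

```python
from typing import Dict, List, Sequence

# Assumption: each per-example gradient is a dict from a (hypothetical)
# layer name to a flat list of floats, standing in for per-layer tensors.
Grad = Dict[str, List[float]]


def layer_dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of one layer's flattened gradient pieces."""
    return sum(x * y for x, y in zip(a, b))


def grab_epoch(order: List[int], grads: Dict[int, Grad], stale_mean: Grad) -> List[int]:
    """Greedy GraB-style sign balancing over one epoch (illustrative sketch).

    `order` is this epoch's example order, `grads[i]` is example i's gradient,
    and `stale_mean` is the mean gradient from the previous epoch.
    Returns the example order to use in the next epoch.
    """
    layers = list(stale_mean)
    running = {name: [0.0] * len(stale_mean[name]) for name in layers}
    front: List[int] = []
    back: List[int] = []
    for idx in order:
        # Centre each layer's gradient on the stale mean.
        centred = {
            name: [g - m for g, m in zip(grads[idx][name], stale_mean[name])]
            for name in layers
        }
        # ||s + c||^2 - ||s - c||^2 = 4 <s, c>, and <s, c> is a sum of
        # per-layer inner products, so the decision can be accumulated
        # one layer at a time instead of on a concatenated gradient.
        inner = sum(layer_dot(running[name], centred[name]) for name in layers)
        sign = 1.0 if inner < 0 else -1.0  # +1 iff ||s + c|| < ||s - c||
        for name in layers:
            running[name] = [s + sign * c for s, c in zip(running[name], centred[name])]
        if sign > 0:
            front.append(idx)  # "+1" examples fill the next order from the front
        else:
            back.append(idx)   # "-1" examples fill it from the back
    return front + back[::-1]


# Toy usage with two hypothetical layers, "w" and "b".
grads = {
    0: {"w": [1.0], "b": [0.0]},
    1: {"w": [1.0], "b": [0.0]},
    2: {"w": [-1.0], "b": [0.0]},
    3: {"w": [-1.0], "b": [0.0]},
}
stale_mean = {"w": [0.0], "b": [0.0]}
print(grab_epoch([0, 1, 2, 3], grads, stale_mean))  # [1, 3, 2, 0]
```

In an AD engine, the per-layer inner products could plausibly be accumulated as each layer's gradient becomes available during the backward pass. This remains an assumption about the general idea, not a description of how this framework hooks into PyTorch.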