LLM Training: Google Hardware and Software Stack

Ranko Mosic
3 min read · Oct 29, 2023
Google Stack¹

Why Does Specialized Hardware Make Sense for Deep Learning Models?
Deep learning models have three properties that make them different from many other kinds of more general-purpose computations. First, they are very tolerant of reduced-precision computations. Second, the computations performed by most models are simply different compositions of a relatively small handful of operations like matrix multiplies, vector operations, application of convolutional kernels, and other dense linear-algebra calculations [Vanhoucke et al. 2011]. Third, many of the mechanisms developed over the past 40 years to enable general-purpose programs to run with high performance on modern CPUs, such as branch predictors, speculative execution, hyperthreaded processing cores, and deep cache-memory hierarchies and TLB subsystems, are unnecessary for machine learning computations. So the opportunity exists to build computational hardware that is specialized for dense, low-precision linear algebra, and not much else, but is still programmable at the level of specifying programs as different compositions of mostly linear-algebra-style operations. This confluence of characteristics is not dissimilar from the observations that led to the development of specialized digital signal processors (DSPs) for telecom applications starting in the 1980s [en.wikipedia.org/wiki/Digital_signal_processor]. A key difference, though, is that because of the broad applicability of deep learning to huge swaths of computational problems across many domains and fields of endeavor, this hardware, despite its narrow set of supported operations, can be used for a wide variety of important computations, rather than the more narrowly tailored uses of DSPs.
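The first property, tolerance of reduced precision, can be sketched concretely. Below is a minimal example in JAX (JAX is an assumption here; any framework with reduced-precision dtypes would illustrate the same point) that keeps a dot product entirely in bfloat16, the 16-bit format TPUs are built around:

```python
import jax.numpy as jnp

# Dense linear algebra runs fine at reduced precision; TPUs exploit
# this with hardware bfloat16 matrix units.
x = jnp.linspace(0.0, 1.0, 5, dtype=jnp.bfloat16)
w = jnp.ones((5,), dtype=jnp.bfloat16)
y = jnp.dot(x, w)
print(y.dtype)  # bfloat16
```

The computation stays in bfloat16 end to end; no conversion back to float32 is needed for the result.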

Google has not been shy about staying a node or two behind with its TPU designs, and that is absolutely on purpose, to keep the cost of chip design and production low.

TPU v4
TPU v4 chip

The TensorFlow ecosystem contains a number of compilers and optimizers that operate at multiple levels of the software and hardware stack.

It’s actually more complicated than this

In this diagram, we can see that TensorFlow graphs can be run in a number of different ways. This includes:

  • Sending them to the TensorFlow executor that invokes hand-written op-kernels
  • Converting them to XLA High-Level Optimizer representation (XLA HLO), which in turn can invoke the LLVM compiler for CPU or GPU, or else continue to use XLA for TPU. (Or some combination of the two!)
  • Converting them to TensorRT, nGraph, or another compiler format for a hardware-specific instruction set
  • Converting graphs to TensorFlow Lite format, which is then executed inside the TensorFlow Lite runtime, or else further converted to run on GPUs or DSPs via the Android Neural Networks API (NNAPI) or related tech.
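Exercising these conversion paths end to end requires the corresponding runtimes, but the XLA HLO stage in the second bullet can be inspected directly. A minimal sketch using JAX (an assumption here, since the diagram is about TensorFlow; JAX feeds the same XLA compiler) that lowers a function to the compiler's input IR without executing it:

```python
import jax
import jax.numpy as jnp

def f(x):
    return jnp.sum(x * x)

# jit(...).lower(...) stops at the XLA input IR instead of executing,
# so we can print the module the compiler would consume.
hlo_text = jax.jit(f).lower(jnp.arange(4.0)).as_text()
print(hlo_text.splitlines()[0])
```

The printed text is the module XLA optimizes and then code-generates for CPU, GPU, or TPU.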

XLA (Accelerated Linear Algebra) is a domain-specific compiler for linear algebra that can accelerate TensorFlow models with potentially no source code changes.
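In JAX, the same compiler is one decorator away: the function body below is plain NumPy-style code with no XLA-specific changes, and `jax.jit` hands the whole composition of ops to XLA for fusion and compilation (a sketch; actual speedups depend on the backend):

```python
import jax
import jax.numpy as jnp

@jax.jit  # compile the whole composition of ops with XLA
def predict(w, b, x):
    return jnp.tanh(x @ w + b)

w = jnp.ones((3, 2))
b = jnp.zeros(2)
x = jnp.ones((4, 3))
y = predict(w, b, x)
print(y.shape)  # (4, 2)
```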

JAX
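JAX pairs a NumPy-like API with composable function transformations such as `grad` (automatic differentiation) and `jit` (XLA compilation). A minimal sketch (a hypothetical example, not from the original post) showing `grad`:

```python
import jax
import jax.numpy as jnp

# grad transforms a scalar-valued function into its gradient function.
def loss(w):
    return jnp.sum(w ** 2)

grad_fn = jax.grad(loss)
print(grad_fn(jnp.array([1.0, 2.0])))  # gradient of sum(w^2) is 2w
```

These transformations compose, e.g. `jax.jit(jax.grad(loss))` compiles the gradient computation with XLA.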

Flax

Flax is a high-performance neural network library and ecosystem for JAX that is designed for flexibility: Try new forms of training by forking an example and by modifying the training loop, not by adding features to a framework.

Flax is being developed in close collaboration with the JAX team and comes with everything you need to start your research, including:

  • Neural network API (flax.linen): Dense, Conv, {Batch|Layer|Group} Norm, Attention, Pooling, {LSTM|GRU} Cell, Dropout
  • Utilities and patterns: replicated training, serialization and checkpointing, metrics, prefetching on device
  • Educational examples that work out of the box: MNIST, LSTM seq2seq, Graph Neural Networks, Sequence Tagging
  • Fast, tuned large-scale end-to-end examples: CIFAR10, ResNet on ImageNet, Transformer LM1b

¹ TensorFlow is suspiciously missing from this Google schema. TF is still quite popular on GitHub, though.

Ranko Mosic

Applied AI Consultant Full Stack. GLG Network Expert https://glginsights.com/ . AI tech advisor for VCs, investors, startups.