A Course in Public

The Mathematics ML Runs On

The number system underneath every model, taught from the silicon up.

Most machine-learning courses start with the math and trust the hardware to keep up. This one starts where the trust breaks. Every class opens on a line of C that prints something that should be impossible, then builds the mathematics that explains it, and proves it on a real chip with no floating-point unit. Unit 1 settles what a number is doing on the silicon; Unit 2 builds the language of learning on top of it (functions, sums, counting, and limits); and Unit 3 opens on the exponential and the constant e.

Units 1 & 2 complete · Unit 3 in progress · 10 classes

What this is

A from-scratch course on the math machine learning actually runs on, taught through hardware. Each class begins with a surprising result on a real bench, derives the math behind it with no hand-waving, and backs every number with code you can run. Interactive widgets let you poke at each idea until it clicks.

Who it's for

Engineers, students, and curious people who want to know what is really happening when a model computes. Comfort with high-school algebra and a little C is plenty. You do not need to already know any machine learning; Unit 1 is the foundation everything else stands on.

The question Unit 1 answers

What is a number, really, once it has to live on a chip with finite memory, no infinities, and sometimes no floating-point unit at all?

Unit 1 · The Number System

Five classes, read in order. Each opens on a line of C that prints something impossible, then builds the math to explain it. The sidebar inside each class tracks where you are.

Part one · The float, built and then broken

Class 1 · The finite machine

The Real Line and the Finite Machine

when 0.1 + 0.2 is not 0.3

The real line is infinite and gapless; a chip holds only a finite list of numbers. So every value you name snaps to the nearest entry the machine owns, and the error you carry forward starts there.

Why it matters: Every later surprise grows from this one fact. The machine never has your number, only the closest one on its list.

Class 2 · Anatomy

How a Floating-Point Number Is Built

when 16,777,216 + 1 = 16,777,216

Crack a float open and find three fields and one clever lie: a sign, an exponent, and a mantissa with a hidden leading one. Out of them falls the gap between numbers, and why the grid bunches near zero.

Why it matters: The structure of a single number explains why precision is relative and why a double is finer than a float.

Class 3 · Rounding

Rounding and Machine Epsilon

when adding one, two million times, changes nothing

The rule the machine uses to land on a grid point, the tie-break that rounds to even, and the small lawful lie measured by machine epsilon. Why summation order changes a result, and how Kahan summation claws the lost bits back.

Why it matters: Rounding is a bias you can predict and defend against, not noise. It is the difference between a sensor total that keeps counting and one that quietly freezes.

Class 4 · The two cliffs

Overflow and Underflow

when a number is not equal to itself

The two ends of the line: overflow saturating to infinity, underflow ramping gently to zero through the subnormals, and NaN, the value that fails its own equality check. One reserved code at each end does all the work.

Why it matters: The edges are where training runs die. Softmax overflow, vanishing gradients, loss-scaling, and the reason bfloat16 exists all live here.

Part two · Underneath the float

Class 5 · Fixed-point

Integer, Fixed-Point, and No FPU

when 100 + 50 is negative

The two's complement clock, the silent wrap, and the hidden scale that gives integers fractions back. Then the exact arithmetic that lets a network run on a chip with no floating-point unit at all: quantization.

Why it matters: This is the world edge ML actually runs in. Multiply small, accumulate wide, and the whole number system closes the loop.

Unit 2 · The Language of Learning

Functions, sums, counting, and limits: the four words the rest of the course speaks. Each is pushed until the hardware shows through: a product that underflows, a space too big to search, an infinite sum that freezes.

Class 6 · Functions

The Function That Has No Formula

when a lookup table is a function too

A function is not a formula with an x in it; it is a promise of one definite output per input. A computed gamma curve, a stored table, a trained network, and a coin flip are all the same kind of object: a map from one set to another.

Why it matters: Read every layer by its signature, set in and set out, and a model stops being a wall of code and becomes a few maps composed.

Class 7 · Sums & products

The Sum, the Product, and the Number That Vanishes

when a product of probabilities hits a hard zero

Two symbols, sigma and pi, fold a list into one number by adding or multiplying. Sums appear everywhere in the field; products give the likelihood, which underflows to exactly zero on a real float once enough small probabilities are multiplied, often within a few dozen terms.

Why it matters: The logarithm turns the doomed product back into a safe sum. The pi becomes a sigma, and that sum is the loss every classifier is built from.

Class 8 · Counting

The Space Too Big to Search

when a toy model has more settings than atoms

How many things can happen? Count it with the sample space, the multiplication rule, permutations, and combinations, and the answer explodes past the largest integer and past the atoms in the universe.

Why it matters: That explosion is why a model is found by following a slope, not by searching, and why outcomes are handled with probability, not by listing them.

Class 9 · Limits

The Endless Sum That Stops

when an infinite sum freezes at 15.4037

A repeated process is a sequence; the whole question is whether it converges to a limit. On a real machine "stopped moving" is ambiguous: it may have arrived, or only fallen below the resolution.

Why it matters: Training is that sequence. The learning rate decides whether the loss settles, oscillates, or explodes.

Unit 3 · Exponentials and Logarithms

The exponential and the logarithm: the pair of functions that convert between multiplying and adding, and the reason a model can compute a probability at all.

Class 10 · The exponential & e

What Happens If You Just Keep Multiplying?

when one grain of rice buries a kingdom

One move, multiplying and then multiplying again, made continuous, settles on a single number, e ≈ 2.718. As a function, e to the x is positive everywhere and turns a sum in the exponent into a product of values, which is why it hides inside every sigmoid, softmax, and bell curve.

Why it matters: It is also among the most expensive operations on a chip with no FPU, and it overflows a float32 at e^89, which is why softmax subtracts its max before exponentiating.

Classes 11 to 13 · in progress

The logarithm, log-space, and the cost of exp() and log()

The exponential run backwards becomes the logarithm and its product-to-sum identity; then log-space as plain arithmetic, and a lab measuring what exp() and log() actually cost on the ESP32-C3. Added here as they are written.

What Unit 1 gave you

A complete, ground-up picture of the numbers every model computes with.

Starting from "a chip cannot hold the real line," five classes later you know how a float is built, how it rounds, where it cliffs into infinity and zero, and what lives underneath it. You have watched 0.1 + 0.2 miss, a counter freeze, a value run off both ends of the line, and a positive sum turn negative, and you can now explain every one of them from the silicon up.

That foundation is load-bearing for everything ahead. When a training run produces a NaN, when a quantized model drifts, when changing the order of a sum changes the answer, you will not be guessing. You will know exactly which property of the number system is showing through, because you built that system yourself, one obstacle at a time.

Unit 2 has since stacked the mathematics of learning on top of this foundation (functions, sums, counting, and limits), and Unit 3 is underway on the exponential and the logarithm. But the move underneath it all is the one Unit 1 made concrete: numbers are physical, the hardware always shows through, and the honest chip is the one that shows its work.

If you're new here

Start with Class 1 and read in order. Each one opens on a real, runnable result and builds the math to explain it. The interactives reward pausing and playing, and every number is verified on the bench.

The bench

Examples run on an ESP32-C3 (a RISC-V chip with no floating-point unit), alongside an Arduino Nano BLE Sense and an STM32H7. The no-FPU chip is the star: it is the one that forces every hidden cost into the open.