The Mathematics ML Runs On · Class 1

When 0.1 + 0.2 Isn't 0.3

Before any model, any gradient, any tensor, there is the number. And here is the thing nobody warns you about: the numbers on a chip are not the numbers you learned in school. This is the first place the maths and the silicon quietly part ways.

A sum a child could do

Let me start with a program so simple it looks like a trick question. A handful of lines, nothing clever anywhere in it.

#include <stdio.h>

int main(void) {
    printf("%.17g\n", 0.1 + 0.2);   // you expect 0.3
    return 0;
}

You would bet money on the answer, wouldn't you? One tenth plus two tenths is three tenths. Every seven-year-old knows this. So run it, and the machine prints 0.30000000000000004.

That trailing 4 is not your screen glitching, and it is not printf rounding badly. The number actually sitting in memory really is a hair more than 0.3. So the machine, the thing we trust to fly planes and price options and steer cars, just fumbled a sum off a primary school worksheet.

The rest of this post is one long answer to "why". And the strange part of that answer is that it has nothing to do with addition. The machine added perfectly. The trouble started one step earlier, the moment you wrote 0.1.

The machine added correctly. The numbers were already wrong before it ever started.

The numbers you grew up with

Think back to the number line from school. It runs forever in both directions, smooth and unbroken, no gaps anywhere. Pick any two numbers sitting close together, say 0.1 and 0.1000001, and there are infinitely many more crammed between them. You can always zoom in and find more. You never hit a smallest step, and you never run out.

That is what a real number is. 0.1, one third, pi, the square root of two, each one is a single point sitting somewhere on that endless, gapless line. There are infinitely many of them, packed together with no space in between. This is the world your maths lessons lived in, and it is a world of pure abstraction. It does not fit on anything physical.

So before we put the machine on trial for the missing 0.3, let me ask a much smaller question first. When you type 0.1 into a program, what does the machine actually keep? Type something in below and look at what comes back.

Type a number. See what the machine keeps.


Try 0.5 and 0.25. They come back perfectly, because they are built from halves and quarters, and halves of halves are the only currency the machine deals in. Now try 0.1, 0.2 or 0.3. None of them survive the trip. And try 1000000.1 as a float: it lands on 1000000.125, off by 0.025, because that far out the numbers the chip owns are a sixteenth apart, and 1000000.125 is the closest one.

What the machine actually has

So why can't it keep 0.1? Start from the one fact everything else grows out of. A computer is a physical object, and every number it stores takes up a fixed amount of room, decided in advance: 32 bits for a float, 64 for a double. And a fixed amount of room can only hold a fixed number of distinct patterns. That sounds too obvious to matter. It turns out to be the whole story.

Picture a combination lock with four dials, each running from 0 to 9. It has exactly ten thousand settings: a lot, but a countable, finite lot. There is no ten-thousand-and-first combination, because there is nowhere to put it. Bits behave the same way. So the set of numbers a machine can actually represent is not that smooth infinite line. It is a finite list. An enormous one, around four billion entries for a float and billions of billions for a double, but a list all the same, with a first entry, a last entry, and a definite count, exactly like the lock.

Now hold the two pictures side by side and you can already feel the collision coming. The real line has infinitely many numbers. The list has finitely many. So almost every number on the line is simply not on the list. Not stored a little roughly. Not there at all, the same way the lock has no setting for "two and a half".

The full line, and the list with gaps

The blue line is continuous: pick any two points and there are always more between them. The amber dots are everything the machine actually owns. Almost nothing you type sits on a dot, so it drops to the nearest one, and that small jump is where every later surprise in this series begins.

The machine does not store your number. It stores the nearest number it already owns.

So where did the 0.3 go?

Here is what actually happens when your program says 0.1. The machine checks its list, finds that 0.1 is not on it, and does the only thing it can. It picks the entry closest to 0.1 and uses that instead. No warning, no error. From that instant, your variable does not hold one tenth. It holds a stand-in.

So the sum was never 0.1 plus 0.2. It was the stand-in for 0.1, plus the stand-in for 0.2. Add those two together and the exact result is itself a number that is also not on the list, so it too gets swapped for its nearest entry. And that entry is not the same one the machine keeps as the stand-in for 0.3. Three swaps, none of them quite lining up. Seen this way, 0.30000000000000004 is not a glitch at all, it is exactly what you should expect. The machine never made an arithmetic mistake. It just never had your numbers in the first place.

You can watch the swap happen. Slide the marker to any value on the real line. The machine cannot land on it, so it drops to the nearest entry on its list, and whatever distance is left over is the error you carry forward into everything you compute after.

The real line is full. The list has gaps.


This is a simplified, evenly spaced list, just enough to see the snapping. On a real chip the entries are not evenly spaced. They sit tightly packed near zero and spread further apart as the numbers grow, which is why 0.1 missed by almost nothing while 1000000.1, as a float, missed by 0.025. Why the spacing looks like that is the entire subject of the next post.

The float that looked right

Now let me show you a result that trips up almost everyone. In double precision, 0.1 + 0.2 == 0.3 comes out false, as we just saw. But switch to a float, the smaller 32-bit number, and the very same test comes out true. So did the float get it right and the double get it wrong?

Not even close. The float did not get it more right. It got it blurrier.

Think about why. A float keeps a shorter, coarser list than a double, fewer entries, spaced further apart. On that coarse list, the stand-in for the computed sum and the stand-in for 0.3 happen to land on the very same entry, so the machine sees them as identical and reports true. On the double's finer list, with far more entries packed in, those two values land on different entries, and the machine correctly reports that they are not the same. The double is not failing. It has enough resolution to notice a difference the float is simply too coarse to see. The smaller type passed the test by being unable to tell the two numbers apart.

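Same sum, two lists, watch the verdict flip.

double · 64 bits

stores 0.1 as 0.1000000000000000055511151231257827021181583404541015625

stores 0.2 as 0.200000000000000011102230246251565404236316680908203125

sum becomes 0.3000000000000000444089209850062616169452667236328125

stores 0.3 as 0.299999999999999988897769753748434595763683319091796875

Fine list. The sum and 0.3 land on two different entries.

float · 32 bits

stores 0.1 as 0.100000001490116119384765625

stores 0.2 as 0.20000000298023223876953125

sum becomes 0.300000011920928955078125

stores 0.3 as 0.300000011920928955078125

Coarse list. The sum and 0.3 land on the same entry, so the chip cannot tell them apart.

Nothing here is rounded for show. These are the exact values the two formats keep. The float and the double are looking at the same arithmetic. They only disagree because one of them can see more detail than the other.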

Equality between these numbers never meant equal. It only ever meant they landed on the same entry of the list.

This is the silicon, not sloppy code

At this point you might be thinking a careful enough programmer could sidestep all of this and make 0.1 come out as exactly 0.1. You can't, and the reason is the whole point of this series. The list is finite because the memory is finite, and the memory is finite because it is a physical thing with a fixed number of switches. No amount of clever code can place 0.1 onto a list that has no slot for it. The gap between the smooth infinite line and the finite list is not a bug someone forgot to fix. It is the shape of the hardware showing through the maths. This series has a name for that, a hardware shadow, and this is the first one, the one every later surprise grows out of.

On a big processor you rarely notice any of this, because it has dedicated floating-point hardware that does the swapping and the arithmetic in a handful of cycles, completely out of sight. But move to the kind of small chip that runs machine learning at the edge, something like an ESP32-C3 with no floating-point unit at all, and the chip has to do every bit of this stand-in bookkeeping in plain software. Now the cost is not hidden. You can count it in cycles. Same finite list, same swaps, same shadow. The only difference is whether you can see the machine paying for it, and on the small chips, you always can. That is exactly why this course works close to the metal: the honest hardware is the one that shows its work.

Where this is going

We leaned on a single idea for this whole post: the machine keeps a finite list, and every number you name gets swapped for the nearest entry. But we never said how that list is built, why a float's list is coarser than a double's, or why the entries bunch up near zero and thin out as you climb. That is the next post, where we pry a single number open and find the three fields packed inside it. After that comes the exact rule the machine uses to choose "nearest", and a small, famous quantity called machine epsilon that measures the size of the gap. And when we get there, this same 0.1 + 0.2 will be waiting for us, and we will finally be able to predict its last digit by hand.

For now, one line to carry with you.

The numbers on a chip are a finite list. Every real you name is swapped for the nearest entry. That swap, not the arithmetic, is where the surprises come from.

Check yourself before Class 2

Six quick questions, every one answerable from this post alone. Tap an answer and it tells you straight away whether it holds up, and why. Nothing here needs the next class.
