The Mathematics ML Runs On · Class 2

When 16,777,216 + 1 = 16,777,216

Crack a single number open and you find three fields and one clever lie. Inside them is every answer Class 1 left hanging: why the list bunches near zero, why a float is coarser than a double, and why adding one can sometimes do absolutely nothing.

A number that refuses to move

Run this on a machine and watch the second line:

#include <stdio.h>
int main(void) {
    float a = 16777216.0f;         // this is 2^24
    printf("%.1f\n", a);           // 16777216.0
    printf("%.1f\n", a + 1);       // 16777216.0  ... still?
}

You asked for sixteen million and change, added one, and the machine handed back the same number. No error, no overflow warning, nothing. The one you added simply disappeared. Class 1 already told us why this kind of thing is even possible: the machine keeps a finite list of numbers and your value snaps to the nearest one, so 16,777,217 must simply not be on the list. Fine. But that answer raises three sharper questions, and the lovely thing is that all three turn out to have the same answer.

How is the list actually laid out? Why are the entries jammed together near zero but miles apart out here? And why does a double have a finer list than a float? Every one of those answers is hiding inside the structure of a single number. So let's open one up, slowly.

First, a warm-up in plain decimal

Forget computers for a second. Let me invent a tiny number format by hand, and the rule is this: you get exactly three significant digits, plus a power of ten. Every number looks like d.dd × 10^k. That is the whole format.

Store 12.3. That is 1.23 × 10¹, digits one-two-three, easy. Now try to store 12.34. You only have three digits, so the 4 has nowhere to live, and the closest you can write is 1.23 × 10¹, which is 12.3. The extra digit is just gone. That is Class 1's "snap to the nearest entry," except now you can see exactly why it happens: you ran out of digits.

Here is the part worth holding onto, and I want you to notice it yourself before I say it. What is the gap between storable numbers in this toy format? Near 12, you can store 12.3, then 12.4, a gap of 0.1. Near 1200, you can store 1230, then 1240, a gap of 10. Same three digits both times, but the gap near 12 is a tenth and the gap near 1200 is ten. The gap grew with the size of the number, while the digit budget never moved. That single sentence is the engine of this whole post.

A fixed number of digits gives you the same relative precision everywhere, which means a bigger gap the bigger the number gets.

The same idea, now in binary

A computer does exactly this, just in base two. Same three pieces: a sign, some significant digits we call the mantissa, and an exponent, except the exponent is a power of two instead of ten. And base two hands us one lovely gift we are going to lean on hard. When you write a binary number in normalised form, 1.something × 2^k, the leading digit is always 1. It has to be, because the only nonzero digit in binary is 1. Every number, every time, a 1 out front.

So here is a question worth asking: if that first digit is always going to be a 1, why bother storing it? Writing it down tells you nothing you didn't already know. The format agrees with you. It simply assumes a 1. in front and stores only the digits after the point. This is the hidden bit, and it is free precision: you get 24 significant bits while only paying to store 23. Read that twice if it feels slippery, because it is the one trick almost everyone misses.

The actual 32 bits

So how does a 32-bit float spend its bits? It chops them into three fields: 1 sign bit, then 8 exponent bits, then 23 mantissa bits. The sign bit is the easy one, 0 for positive and 1 for negative. The mantissa holds the digits after that implied 1., so the full significand is 1. followed by those 23 bits. And the exponent is the power of two. Put it together and you get the rulebook for reading any float at all:

value = (−1)^sign × 1.mantissa × 2^{(exponent − 127)}

Don't just take the formula on faith, play with it. Flip any bit below and watch the value rebuild itself. Load 1.0, then turn on the very last mantissa bit, and you jump to the float right next door. The size of that jump is the gap we keep circling back to.

[Interactive bit playground: Flip the bits. Watch the number. Fields: sign · 1 bit; exponent · 8 bits (stored value, then minus 127); mantissa · 23 bits (the digits after the implied 1.)]

Read 1.0 first: exponent bits are 127, and 127 minus 127 is 0, so the scale is 2 to the power 0, which is 1. Load 0.5 and the exponent drops to 126, giving 2 to the power minus 1. Load 0.1 and the mantissa fills with that repeating 1001 pattern, the binary version of a repeating decimal, which is exactly why 0.1 never lands cleanly.

The exponent, and the “subtract 127” business

The exponent is just a power of two, a multiplier. A positive one multiplies, pushing the number above 1. A negative one divides, pulling it down into the fractions below 1. Exponent zero leaves the significand alone. And notice that a negative exponent does not mean a negative number. It means a small one: 0.1 is 1.6 × 2⁻⁴, so it needs a negative exponent simply because it is less than 1.

But that raises a real problem. The exponent field is 8 bits, and 8 bits naturally count from 0 to 255, all non-negative. There is no minus sign anywhere in there. So how on earth do you store a negative exponent like −4? The answer is a trick called bias: you never store the real exponent, you store it with 127 added on. To read a float, subtract 127. To build one, add 127. The stored field stays comfortably non-negative, and the subtraction is where negative exponents come from. Picture a dial that only shows 0 to 255, where everyone has agreed that 127 means zero. Read 130 and that is plus three; read 120 and that is minus seven. The dial never displays a negative, but the shared agreement lets it mean one.

Slide through it and watch everything pivot around 127.

[Interactive slider: The exponent is stored shifted by 127. Stored exponent, with 127 = zero.]

Left of 127 the exponent is negative and the number is a fraction below 1. Right of 127 it is positive and the number is above 1. The bits themselves never go negative; the meaning does, the instant you subtract 127.

Now the gap appears on its own

Here is the payoff, and the satisfying part is that nobody had to invent the gap. It just falls out of two facts you already have in hand. First, the significand is a fixed ruler: 23 mantissa bits cut the range from 1 to 2 into 2²³ evenly spaced marks, and the smallest possible step is the last bit, worth 2⁻²³. That ruler never changes. Second, the exponent k multiplies the whole ruler by 2ᵏ, laying it down across the octave from 2ᵏ to 2ᵏ⁺¹.

So take two neighbouring floats. In the significand they differ by 2⁻²³. But both get multiplied by 2ᵏ to sit in their octave, so the actual numbers differ by 2⁻²³ × 2ᵏ, which is:

gap in the octave at exponent k = 2^{(k − 23)}

That is the whole thing. The ruler's fixed step, stretched by the octave's scale. And because each octave bumps k up by one, the gap doubles every time you cross a power of two. Type any number into the explorer and it finds the octave, computes the gap straight from that formula, and shows you the two floats your number is stuck between.

[Interactive gap explorer: Where does your number land, and how wide is the gap?]

Push the number up across 2, 4, 8, 1024, a million, and watch the gap jump exactly at each power of two. Near a million the gap is already 0.0625, which is why a number like 1000000.1 cannot be stored and lands on 1000000.125 instead.

And now the opening mystery is fully mechanical, no hand-waving left. 16,777,216 is 2²⁴, so its octave has exponent k = 24, and the gap there is 2⁽²⁴⁻²³⁾ = 2¹ = 2. The two floats bracketing 16,777,217 are 16,777,216 and 16,777,218, sitting two apart, and your number lands dead in the empty middle. So +1 produces an answer with no float to live on, and it snaps right back to where it started. Adding one did nothing because, that far out, one is smaller than the space between neighbours.

The smallest walk and the biggest walk

Since the gap is 2^(k-23), the smallest gaps must sit at the most negative exponents, down near zero, and the biggest at the most positive, out near the top. At the bottom of the normal range, k = -126, the step shrinks to 2^-149, roughly 1.4 × 10^-45, so floats are packed unbelievably tightly near zero. Out near the top, around k = 127, that same step balloons to 2^104, about 2 × 10^31, so two neighbouring giant floats sit twenty million trillion trillion apart. Add anything much smaller than that gap to such a number and nothing happens at all, the same vanishing act as the +1 from before, just taken to an extreme.

So why does it bunch up near zero? Let me draw it out. Picture a toy float with only a couple of mantissa bits, so each octave holds just a small handful of values. Every octave holds the same handful. But each octave you step toward zero is half as wide as the one before it, so that same handful gets squeezed into half the space and the marks pile up. Step the other way and each octave is twice as wide, so the marks spread thin.

[Figure: Why floats crowd near zero]

Same number of floats in every octave, but the octaves nearer zero are narrower, so the marks bunch together, while the octaves further out are wider, so they spread apart. A real float does this with 2^23 marks per octave instead of four, running from a step of 2^-149 near zero out to 2^104 near the top.

What this means in practice

Notice one more thing the gap formula is quietly telling you: the gap is always about one ten-millionth of the number sitting beside it. Divide the gap 2⁽ᵏ⁻²³⁾ by the number it sits next to, roughly 2ᵏ, and the 2ᵏ cancels, leaving about 2⁻²³, which is roughly 1.2 × 10⁻⁷. So a float pins down any number to about 7 significant digits, whether it is 0.001 or a million. That is what "precision is relative" really means: the relative error stays fixed while the absolute gap grows with the number. A float knows a coffee price to the cent and a national budget in the tens of billions only to the nearest few thousand.

Two edges close the whole thing off. The exponent is only 8 bits, so it reaches a largest finite float of about 3.4 × 10³⁸ and a smallest normal float of about 1.2 × 10⁻³⁸. Past those edges lie overflow and underflow, the two cliffs we save for Class 4. And here, finally, is the Class 1 mystery settled: a double spends 52 bits on the mantissa and 11 on the exponent, against a float's 23 and 8. More mantissa bits means finer gaps, about 16 significant digits instead of 7. More exponent bits means a wider reach. So a double's list is both finer and longer, which is the entire reason 0.1f + 0.2f == 0.3f came out true while the double version came out false. Same arithmetic, different resolution.

The hardware shadow

Everything you just did in that bit playground, pulling a value apart into sign, exponent and mantissa and packing it back together, is exactly what a chip without a floating-point unit does in software, for every single add and multiply. On a processor with an FPU the surgery is buried in silicon and you never see it. On something like an ESP32-C3, which has no FPU, the software float library pulls those three fields apart as plain integers, does integer work, and packs them back, every time. That is why one float operation on the C3 costs real, countable cycles, and why the price of a single exp() or log() becomes something you measure later in this course rather than wave away. On that chip, "a packed integer with a rulebook" is not a metaphor. It is the source code.

Where this is going

We built where the entries on the list sit, their exact positions, and the recipe to encode or decode any one of them by hand. What we have not built yet is the rule the machine follows when your number lands between two entries: how it picks which neighbour to snap to, and the small, famous quantity called machine epsilon that measures the step right around 1.0. That rule and that number are the next post. We know the grid now. Next we learn how the machine lands on it, and we will finally be able to predict the last digit of 0.1 + 0.2 by hand.

A float is a packed integer read through a rulebook: sign, then 1.mantissa, then a power of two shifted by 127. Because the mantissa budget is fixed, the gap is 2 to the power (k minus 23), which is why precision is relative and the list bunches near zero.

Check yourself before Class 3

[Interactive quiz: six self-marking questions, all answerable from this post. The playground and the gap explorer above are there if you want to check your working afterwards.]
