When Adding One, Two Million Times, Changes Nothing
A counter that freezes, a sum whose answer depends on the order you add it, and the last digit of 0.1 + 0.2 finally predicted by hand. Class 2 showed where the grid points sit. This is the rule the machine uses to land on one, and the small, lawful lie it tells every single time.
Two things that shouldn't happen
Run a float counter up by one and, at a certain point, it stops climbing. Not slows down. Stops, dead, forever:
And here is a second one to sit beside it. Take the same pile of numbers, one big value and two million ones, and add them up in two different orders:
Both of these should be impossible. In real arithmetic, order never changes a sum, and adding one always adds one. Class 2 got us close to the first: at 16,777,216 = 2²⁴ the gap between floats is 2, so 16,777,217 is simply not on the grid. But notice that only tells us the answer can't be 16,777,217. It doesn't tell us why the result falls back to 16,777,216 instead of jumping up to 16,777,218, and it says nothing at all about the two million missing ones. Both need the one thing Class 2 deliberately left out: the rule for what happens to a value that lands in the gap.
The rule: round to the nearest
So what does the machine do with a result stuck between two grid points? Call them lo just below and hi just above, one gap apart. The machine can only hand back a grid point, so it hands back the nearer of the two. The dividing line is the midpoint of the gap: below it you are closer to lo and round down, above it you round up. Every float owns a little catchment area, half a gap wide on each side, and any real value that falls inside it becomes that float.
The error you pay is whatever distance was left over, and it is never more than half a gap, because the worst place to land is right at the midpoint.
For example, 100.7 sits below the midpoint between its two float neighbours, so it rounds down. 16777217, by contrast, lands exactly on the midpoint between 16777216 and 16777218. That is a tie, and it is the special case the next section is about.
The tie, and why it rounds to even
But what happens when the true result lands exactly on the midpoint, the same distance from both neighbours? That is a tie, and "nearest" has stopped deciding anything. Your first instinct is probably "just always round up." And that is exactly what the format refuses to do, for a reason that is worth slowing down for. At a tie, both neighbours are equally wrong, off by exactly half a gap either way, so the choice cannot make that one operation any more accurate. What it can do is sneak in a direction. Always rounding ties the same way nudges every single tie toward the same side, and over millions of operations those little nudges pile up into a steady drift.
So the rule is round half to even: on a tie, pick the neighbour whose last mantissa bit is 0. Since consecutive floats alternate their last bit 0, 1, 0, 1, the even neighbour sits below you half the time and above you the other half, so ties round down about as often as up, and the errors cancel instead of marching in one direction. The same rule is widely known as banker's rounding.
This is why the counter freezes. 16777216 + 1 is the tie at 16777217, whose even neighbour (16777216, last bit 0) is below, so it rounds back down and never moves. 16777218 + 1 is the tie at 16777219, whose even neighbour (16777220) is above, so that one rounds up.
The size of the lie: machine epsilon
If every operation rounds, the natural next question is: how big is the damage? One rounding costs at most half a gap, but the gap is different at every magnitude, so "the rounding error" isn't a single number you can quote. The fix is to measure the gap at one fixed reference point, 1.0. That gap has a name, machine epsilon, and it equals 2⁻²³ ≈ 1.2 × 10⁻⁷ for a float and 2⁻⁵² ≈ 2.2 × 10⁻¹⁶ for a double. Why 1.0? Because that is the bottom of the octave where the scale is 2⁰ = 1, so the gap there is the bare significand step with nothing multiplied on top.
But here is the question that really matters: how can one number describe the precision everywhere, when the gap itself changes constantly? Take a float near magnitude 2ⁿ. The gap there is 2ⁿ⁻²³, half of it is 2ⁿ⁻²⁴, and if you divide that by the value itself, about 2ⁿ, you get:

$$\frac{2^{n-24}}{2^{n}} = 2^{-24} = \frac{\text{eps}}{2} \approx 6 \times 10^{-8}$$
Watch what happened there. The 2ⁿ appears on top, because the gap grows with magnitude, and on the bottom, because the number is that magnitude, so it cancels and disappears completely. That cancellation is the whole point. The absolute gap explodes from a ten-millionth near 1 to a full 2 near sixteen million, but the instant you measure the error relative to the number, the magnitude drops out and you are left with a fixed eps/2. A float holds the same roughly seven significant figures whether the number is a coffee price or a national budget. Keep the two ideas apart and you will never be confused again: the worst-case error is half a gap in absolute terms, and about eps/2 in relative terms.
On top, the gap really does explode as the numbers grow. Underneath, once you measure that gap against the number beside it, the size cancels and the bars are identical. That flat bottom row is what machine epsilon captures: one number for the precision you get anywhere on the line.
When the lie bites: a small number meets a large one
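Now let me show you the most common floating-point trap there is. The nice thing is that it is nothing new: it is just round-to-nearest applied to addition. Take a large positive y and a small positive x. To actually move y by adding x, the true sum has to get past the midpoint to y's next neighbour up, which is half a gap away. So the rule comes out sharp:

$$x < \tfrac{1}{2}\,\text{gap}(y) \;\Longrightarrow\; y + x \text{ rounds back to } y, \qquad x > \tfrac{1}{2}\,\text{gap}(y) \;\Longrightarrow\; y \text{ moves},$$

and x exactly equal to half the gap is a tie, settled by round half to even.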
If x is smaller than half the local gap, the true sum lands inside y's own catchment area and rounds straight back to y. The addition genuinely happened; the rounding quietly erased it. And that is exactly the frozen counter. Just below 2²⁴ (from 2²³ up), the gap is 1, so +1 clears the half-gap threshold of 0.5 and the counter climbs. At 2²⁴ the gap doubles to 2 and the threshold becomes 1. Now +1 is exactly the threshold, a tie that rounds back down.
At a million the gap is 0.0625, so the threshold is 0.03125. Adding 0.02 vanishes; adding 0.05 survives. A running total stops growing the moment its increments fall below half its own gap.
When the lie bites: the order you add in
And this is where the second mystery cracks open. Because survival depends on how large the running total is at the moment each value arrives, the order of a sum changes the answer. Add the big number first, and every one of the two million ones meets a total of 16,777,216 (gap 2, threshold 1) and dies on arrival, leaving 16,777,216. Add the ones first, and they accumulate among themselves while the total is still tiny and the grid is still fine, reaching an exact 2,000,000, and only then meet the big number for the true 18,777,216. Same numbers, same additions, two million apart. The cheap defence is to sum from smallest to largest, so the small values add up before they ever meet anything big enough to swallow them.
The robust defence is Kahan summation, and the cleanest way to picture it is a bank balance that can only show whole dollars. You deposit 40 cents and it rounds away to nothing. So you keep a second number on the side, a sticky note of the cents that didn't fit, and you fold it back into the next deposit. The cents pile up on the note until they are worth a whole dollar, and at that point the balance finally records it. The four lines below are exactly that idea:
The clever line is the third, and it is worth staring at. In ordinary algebra (t - sum) - y is just zero, since t = sum + y. It is not zero here, and only for one reason: t got rounded to the grid, so the balance actually moved by a slightly different amount than you pushed in, and that difference is precisely the bits that fell off the edge. Stepped through one deposit at a time, the naive sum freezes while the note c quietly catches the falling cents.
Naive freezes at 100, losing every deposit. Kahan reaches 102, the correctly rounded true total, because the dropped 0.4s live on the note until they are large enough for the balance to show them.
The payoff: 0.1 + 0.2, predicted by hand
And now, at last, we can settle the number that opened this whole series. Three roundings happen, not one. 0.1 is rounded to its nearest double the very moment you write it, and so is 0.2. Then those two stored doubles are added, and that exact result is rounded a third time. On the grid near 0.3, the double that stands in for 0.3 sits just below the true value. The exact sum of the stored 0.1 and the stored 0.2 lands exactly half a step above that double, precisely on the midpoint between it and the next double up. That is a tie, the very case from earlier. The double for 0.3 has last mantissa bit 1 and its upper neighbour has last bit 0, so round half to even sends the sum up. The result is one full step past the double for 0.3, and it reads in decimal as 0.30000000000000004. So that trailing ...04 was never noise. It is exactly one step of the grid, chosen by the tie-break rule. No mystery, just the rule doing its job.
The hardware shadow
Every rounding decision in this post (the nearest-neighbour choice, the tie-break, the half-gap survival check) happens in silicon on a chip with a floating-point unit, and in plain software on a chip without one. On an ESP32-C3, which has no FPU, the runtime carries out these steps as ordinary integer instructions on every add and multiply. The rounding rules are identical either way. A long float accumulation can silently flatline on such a device once the total grows large compared to the increments, exactly as it would on a desktop. Four lines of Kahan, costing you one extra variable, can be the whole difference between a sensor total that keeps counting and one that quietly freezes. The lie is cheap to tell and cheap to defend against, but only if you know it is there in the first place.
Where this is going
Everything here was about rounding in the middle of the number line, where a value always has two real neighbours to choose between. But what happens at the ends, when a result is too big to have an upper neighbour, or too small to have a lower one? That is a different kind of failure entirely. That is overflow and underflow, the two cliffs of Class 4.