Part 1 · Architecting Intelligence: Building LLMs From First Principles
Encodings
Foundations
LLMs
Telegraphs, Morse, ASCII, the encoding wars, Unicode, and UTF-8: a tour of how
we went from “machines can only do arithmetic” to “machines can at least store text”, and why
even that doesn’t give us meaning. This is where the need for tokenisation starts
to become unavoidable.
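The endpoint of that story, “machines can at least store text”, can be sketched in a few lines of Python (a toy illustration, not code from the series itself): characters become code points, code points become bytes, and ASCII survives as a subset of UTF-8.

```python
# The same text across three representations: characters, code points, bytes.
text = "Hi"
ascii_codes = [ord(c) for c in text]   # ASCII code points for each character
utf8_bytes = text.encode("utf-8")      # the same values, stored as raw bytes
snowman = "☃".encode("utf-8")          # beyond ASCII, UTF-8 needs multiple bytes

print(ascii_codes)       # [72, 105]
print(list(utf8_bytes))  # [72, 105]  (ASCII is a strict subset of UTF-8)
print(list(snowman))     # [226, 152, 131]
```

Note that the bytes `[72, 105]` tell the machine nothing about what “Hi” means; they are pure storage, which is exactly the gap tokenisation and embeddings set out to close.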
Part 2 · Architecting Intelligence: Building LLMs From First Principles
Geometry
Similarity
LLMs
We warm up with MNIST digits, Euclidean distance, and cosine similarity to see how numbers can
carry structure and meaning in a vector space. Then we ask the slightly uncomfortable question:
if this works so nicely for images, can we coax text into behaving this way too?
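The two measures named above can be sketched in plain Python (a toy illustration with made-up vectors, not the series’ own MNIST code): Euclidean distance compares points by straight-line separation, while cosine similarity compares them by direction alone.

```python
import math

def euclidean(u, v):
    # Straight-line distance between two points in a vector space.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def cosine_similarity(u, v):
    # Angle-based similarity: 1.0 means same direction, 0.0 means orthogonal.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Two vectors pointing the same way but with different lengths ...
u, v = [1.0, 0.0], [3.0, 0.0]
print(euclidean(u, v))          # 2.0  (far apart as points)
print(cosine_similarity(u, v))  # 1.0  (identical in direction)

# ... versus two orthogonal vectors of the same length.
w = [0.0, 1.0]
print(cosine_similarity(u, w))  # 0.0
```

The contrast is the point: a scaled-up copy of an image stays cosine-similar to the original even as its Euclidean distance grows, which is why direction, not position, often carries the meaning.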