graph LR
A[Input Text]:::hl --> B[Tokenizer]:::hl --> C[Token IDs]:::hl --> D[Embedding]:::hl --> E["Layers ×80"]:::hl --> F[Predict]:::hl --> G[Output Token]:::hl
G -.->|decode loop| E
classDef hl fill:#2d6a4f,stroke:#1b4332,color:#d8f3dc
classDef default fill:#1a1a2e,stroke:#16213e,color:#e0e0e0
click B "/llms/what-happens/tokens/"
click C "/llms/what-happens/tokens/"
click D "/llms/what-happens/embeddings/"
click E "/llms/what-happens/embeddings/model-layers/"
click F "/llms/what-happens/embeddings/model-layers/final-vector-to-token/"
When you send a message to an LLM like ChatGPT or Claude, here’s what happens at a high level:
- Your message, along with any previous messages in the conversation (the “context”), gets converted from human-readable text into numbers — specifically, a sequence of numerical vectors. This step is called tokenization and embedding.
- Those numbers flow through the model — a stack of mathematical layers that transform them, over and over, each layer refining the model’s internal representation of what you said and what should come next.
- The final layer outputs a probability distribution: a ranked list of every possible next word (technically “token”) the model could produce, with a score for how likely each one is.
- A token is selected from that distribution, appended to the sequence, and the whole process repeats — the model now takes everything so far (your messages + its own partial response) and predicts the next token again. This loop continues until the model produces a stop signal.
That’s it. Every response you’ve ever gotten from an LLM was generated one token at a time, left to right, by a system that only knows how to do one thing: predict what comes next.