microGPT Playground

This is a single-layer transformer, a scaled-down version of the same basic architecture behind GPT-4, Claude, and most other large language models, running entirely in your browser. It starts with random weights and learns to generate plausible human names by training on 500 real names, one character at a time. Each card below visualizes a different component of the model as it trains: embeddings, attention patterns, MLP activations, and output probabilities.

Read the full explanation of how every piece works.

How to use: Hit Train to start training automatically, or Step to advance one step at a time and watch each piece update. The cards below show what happens inside the model at every step. Tweak the knobs, hit Reset, and train again to see how they change the result.
What is the model doing?
Press Train or Step to start
Forward Pass Data flows through the model to make a prediction
1
Turning Characters into Numbers The model represents each character as a list of 16 numbers called an embedding; a second table encodes where in the name the character sits. te = wte[token_id]; pe = wpe[position]
Combined embedding feeds into attention. x = token_emb + pos_emb
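In NumPy terms, this lookup-and-add step might look like the sketch below. The 16-dimensional embeddings match the card above; the 27-entry vocabulary (26 letters plus an end token) and the context length are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, block_size, n_embd = 27, 16, 16   # assumed sizes for this sketch

wte = rng.normal(size=(vocab_size, n_embd))   # token embedding table
wpe = rng.normal(size=(block_size, n_embd))   # positional embedding table

token_ids = np.array([1, 2, 3])               # e.g. "abc" as character indices
positions = np.arange(len(token_ids))         # 0, 1, 2

# each row of x is the sum of a token embedding and a position embedding
x = wte[token_ids] + wpe[positions]           # shape (3, 16)
```

An embedding "lookup" is just row indexing into a learned matrix, which is why the card can write it as `wte[token_id]`.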
2
Which Characters Matter? The attention mechanism decides which earlier characters are relevant for predicting the next one. score = (Q @ K.T) / sqrt(d); attn = softmax(score)
Attention output + residual feeds into MLP and then to output. x = attn(x) + x
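A minimal single-head causal attention sketch in NumPy. The causal mask is what restricts each character to attending only to itself and earlier characters; the 4-token context and the weight scaling are illustrative assumptions:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # stabilize before exponentiating
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def causal_attention(x, Wq, Wk, Wv):
    T, d = x.shape
    Q, K, V = x @ Wq, x @ Wk, x @ Wv
    score = (Q @ K.T) / np.sqrt(d)            # how much each pair "matches"
    mask = np.tril(np.ones((T, T), dtype=bool))
    score = np.where(mask, score, -np.inf)    # hide future positions
    attn = softmax(score)                     # each row sums to 1
    return attn @ V                           # weighted mix of earlier values

rng = np.random.default_rng(0)
d = 16
x = rng.normal(size=(4, d))                   # 4 characters, 16-dim embeddings
Wq, Wk, Wv = (rng.normal(size=(d, d)) * 0.1 for _ in range(3))
out = causal_attention(x, Wq, Wk, Wv)
```

Because of the mask, the first position can only attend to itself, so its output is just its own value vector.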
3
Thinking It Through A small MLP processes each position's attended information through a wider hidden layer. h = relu(x @ W1); out = h @ W2
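The MLP step is two matrix multiplies with a ReLU in between, exactly as the card's formula says. A NumPy sketch (the hidden width of 64 is an assumption; 4x the embedding size is a common convention):

```python
import numpy as np

def mlp(x, W1, W2):
    h = np.maximum(0.0, x @ W1)    # relu: keep positives, zero out negatives
    return h @ W2                  # project back down to embedding size

rng = np.random.default_rng(0)
d, hidden = 16, 64                 # hidden width is an assumption (often 4x d)
x = rng.normal(size=(3, d))
W1 = rng.normal(size=(d, hidden)) * 0.1
W2 = rng.normal(size=(hidden, d)) * 0.1
out = mlp(x, W1, W2)               # same shape as the input
```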
4
Making a Prediction The output layer scores every character in the vocabulary, and softmax turns those scores into probabilities. probs = softmax(x @ lm_head)
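In NumPy, the prediction step for a single position might look like this sketch (the 27-character vocabulary is an assumption: 26 letters plus an end token):

```python
import numpy as np

def softmax(z):
    z = z - z.max()                # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(0)
d, vocab_size = 16, 27             # assumed sizes for this sketch
x = rng.normal(size=(d,))          # final hidden state for one position
lm_head = rng.normal(size=(d, vocab_size)) * 0.1
probs = softmax(x @ lm_head)       # one probability per character, summing to 1
```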
Backward Pass Compare prediction to truth. loss = -log(prob[correct]). Error flows back through every layer.
5
How Wrong Was It? loss.backward() computes gradients for all weights
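The card's `loss = -log(prob[correct])` pairs with a famously simple gradient: for softmax followed by cross-entropy, the gradient with respect to the logits is just the probabilities with 1 subtracted at the correct index. A sketch (the target index is arbitrary):

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(0)
logits = rng.normal(size=(27,))    # raw scores over a 27-character vocabulary
probs = softmax(logits)
target = 3                         # index of the correct next character

loss = -np.log(probs[target])      # cross-entropy for one prediction

# gradient of the loss w.r.t. the logits: probs minus a one-hot target
dlogits = probs.copy()
dlogits[target] -= 1.0
```

This is the seed that `loss.backward()` propagates through every earlier layer.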
6
Learning from the Mistake Each weight is nudged against its gradient, scaled adaptively. param -= lr * grad / (sqrt(v) + eps) across weights
Result Weights updated. The model tries again next step, slightly less wrong.
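The update formula on the card is the core of an Adam-style optimizer: a running average of squared gradients (`v`) rescales each weight's step. A self-contained sketch of the full Adam update, minimizing a toy function (the hyperparameters are common defaults, not necessarily the playground's):

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=0.01, b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * m + (1 - b1) * grad          # running mean of gradients
    v = b2 * v + (1 - b2) * grad ** 2     # running mean of squared gradients
    m_hat = m / (1 - b1 ** t)             # bias correction for early steps
    v_hat = v / (1 - b2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v

# minimize f(w) = w^2; the gradient is 2w
w = np.array(5.0)
m = v = np.array(0.0)
for t in range(1, 1001):
    w, m, v = adam_step(w, 2.0 * w, m, v, t)
```

Dividing by `sqrt(v)` means weights with consistently large gradients take smaller steps, which keeps training stable across very differently scaled parameters.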
Generated Names Invented from scratch, not copied from training data
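Generation is autoregressive: sample one character from `probs`, append it to the context, and repeat until an end token. A sketch with a stand-in model, where the `.` end token and the uniform probabilities are placeholders for the real trained network:

```python
import numpy as np

rng = np.random.default_rng(0)
chars = ".abcdefghijklmnopqrstuvwxyz"   # '.' as the end token (an assumption)

def stand_in_model(context):
    # placeholder for the trained transformer: uniform next-char probabilities
    return np.full(len(chars), 1.0 / len(chars))

def generate(max_len=10):
    out = []
    while len(out) < max_len:
        probs = stand_in_model(out)
        i = rng.choice(len(chars), p=probs)
        if chars[i] == ".":             # end token terminates the name
            break
        out.append(chars[i])
    return "".join(out)
```

Because each character is sampled from a learned distribution rather than copied, the names are new strings that merely share statistical patterns with the training set.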
 
Model Stats