This project is a minimal implementation of a character-level language model inspired by the makemore series by Andrej Karpathy. The goal is to build an intuitive understanding of how neural networks can learn patterns in text and generate new sequences—in this case, human-like names.
The model operates at the character level, meaning it learns to predict the next character given a fixed-length context of previous characters. Despite its simplicity, it captures interesting statistical patterns in names and can generate realistic outputs.
The pipeline consists of:
- Building a vocabulary of characters
- Encoding characters into integers
- Creating context-target training pairs
- Training a simple neural network
- Sampling new names from the learned distribution
This implementation uses a fixed context window (block size) and a feedforward neural network with embeddings.
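As a rough illustration of the context-target pairs mentioned above, the sketch below slides a window of three characters over a single name. It assumes the context is padded with `.` at the start of a name, which the text does not state explicitly; `build_pairs` is a hypothetical helper name.

```python
def build_pairs(name, block_size=3, pad="."):
    """Return (context, target) pairs for one name, as strings."""
    context = pad * block_size      # assumption: start from an all-padding context
    pairs = []
    for ch in name + pad:           # the trailing "." marks the end of the name
        pairs.append((context, ch))
        context = context[1:] + ch  # slide the window one character to the right
    return pairs


for context, target in build_pairs("emma"):
    print(f"{context} -> {target}")
# ... -> e
# ..e -> m
# .em -> m
# emm -> a
# mma -> .
```

Each name of length n therefore contributes n + 1 training examples, the last of which teaches the model when to stop.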
The dataset (`names.txt`) contains ~32,000 names. Each name is treated as a sequence of characters, with a special token `.` representing the end of a word.
Example:
emma → [e, m, m, a, .]
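A minimal sketch of the vocabulary and integer encoding follows. It assumes the names contain only lowercase a-z (consistent with the stated vocabulary of 26 letters plus `.`) and that `.` is assigned index 0; the names `stoi`, `itos`, `encode`, and `decode` are illustrative choices, not confirmed by the source.

```python
import string

# In practice the character set would be collected from names.txt,
# e.g. sorted(set("".join(names))). Here a-z is assumed directly.
chars = string.ascii_lowercase
stoi = {ch: i + 1 for i, ch in enumerate(chars)}  # letters get indices 1..26
stoi["."] = 0                                     # assumption: end token is index 0
itos = {i: ch for ch, i in stoi.items()}          # inverse mapping for decoding


def encode(name):
    """Map a name to a list of integers, appending the end token."""
    return [stoi[ch] for ch in name + "."]


def decode(ids):
    """Map a list of integers back to characters."""
    return "".join(itos[i] for i in ids)


print(len(stoi))       # 27
print(encode("emma"))  # [5, 13, 13, 1, 0]
```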
The model is a simple neural network with the following components:
- Embedding Layer: Maps each character index to a dense vector representation.
- Hidden Layer: Fully connected layer with `tanh` activation.
- Output Layer: Produces logits over the vocabulary.
Hyperparameters:
- Vocabulary size: 27 (26 letters + `.`)
- Embedding size: 10
- Context length (block size): 3
- Hidden layer size: 300
- Total parameters: 17,697
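The sketch below wires these components together in PyTorch and checks the parameter count. The variable names, the random initialization, and the seed are assumptions made for illustration; the layout with a bias on both the hidden and output layers is inferred from the fact that it reproduces the stated total of 17,697.

```python
import torch

vocab_size, emb_dim, block_size, hidden = 27, 10, 3, 300
g = torch.Generator().manual_seed(0)  # arbitrary seed, only for reproducibility

C = torch.randn((vocab_size, emb_dim), generator=g)            # embedding table: 270
W1 = torch.randn((block_size * emb_dim, hidden), generator=g)  # hidden weights: 9,000
b1 = torch.randn(hidden, generator=g)                          # hidden bias: 300
W2 = torch.randn((hidden, vocab_size), generator=g)            # output weights: 8,100
b2 = torch.randn(vocab_size, generator=g)                      # output bias: 27
params = [C, W1, b1, W2, b2]


def forward(X):
    """X: (batch, block_size) integer tensor -> (batch, vocab_size) logits."""
    emb = C[X]                                          # (batch, block_size, emb_dim)
    h = torch.tanh(emb.view(X.shape[0], -1) @ W1 + b1)  # concatenate embeddings, then tanh
    return h @ W2 + b2


X = torch.randint(0, vocab_size, (32, block_size), generator=g)  # dummy batch of contexts
logits = forward(X)
n_params = sum(p.numel() for p in params)
print(tuple(logits.shape), n_params)  # (32, 27) 17697
```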
Training is performed using mini-batch gradient descent.
- Batch size: 32
- Iterations: 200,000
- Learning rate:
  - 0.1 for first 100k steps
  - 0.01 thereafter
- Loss function: Cross-entropy
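The settings above could be combined into a training loop along the following lines. This is a sketch under stated assumptions: random integer tensors stand in for the real dataset, the loop runs for only 100 steps instead of 200,000, and `lr_at` is a hypothetical helper for the step-decay schedule.

```python
import torch
import torch.nn.functional as F


def lr_at(step, switch=100_000):
    """Step decay: 0.1 for the first 100k steps, 0.01 afterwards."""
    return 0.1 if step < switch else 0.01


g = torch.Generator().manual_seed(0)
# Stand-in data (assumption): random contexts and targets instead of names.txt.
X = torch.randint(0, 27, (1000, 3), generator=g)
Y = torch.randint(0, 27, (1000,), generator=g)

C = torch.randn((27, 10), generator=g)
W1 = torch.randn((30, 300), generator=g)
b1 = torch.randn(300, generator=g)
W2 = torch.randn((300, 27), generator=g)
b2 = torch.randn(27, generator=g)
params = [C, W1, b1, W2, b2]
for p in params:
    p.requires_grad = True

for step in range(100):  # shortened for illustration
    ix = torch.randint(0, X.shape[0], (32,), generator=g)  # mini-batch of 32 indices
    emb = C[X[ix]]                                          # (32, 3, 10)
    h = torch.tanh(emb.view(32, -1) @ W1 + b1)              # (32, 300)
    logits = h @ W2 + b2                                    # (32, 27)
    loss = F.cross_entropy(logits, Y[ix])

    for p in params:
        p.grad = None                                       # reset gradients
    loss.backward()
    for p in params:
        p.data += -lr_at(step) * p.grad                     # plain gradient descent update

print(loss.item())
```

On random targets the loss cannot fall much below the uniform baseline; the loop is only meant to show the order of operations in one update.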
The model learns to predict the next character given the previous three characters.
Final losses:
- Training loss: ~2.11
- Validation loss: ~2.17
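For reference, a model that guesses uniformly over the 27-character vocabulary would score ln 27 ≈ 3.30, so these values indicate the model has learned meaningful structure, though it is still a relatively simple architecture. The small gap between training and validation loss suggests only mild overfitting.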
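One way to read these losses is as a per-character perplexity, the exponential of the cross-entropy, which can be thought of as the effective number of equally likely choices at each step. The short calculation below assumes the losses are reported in nats, as is the default for PyTorch's cross-entropy.

```python
import math

vocab_size = 27
uniform_loss = math.log(vocab_size)  # loss of a model that guesses uniformly

rows = [("uniform baseline", uniform_loss), ("training", 2.11), ("validation", 2.17)]
for label, loss in rows:
    print(f"{label:17s} loss={loss:.2f}  perplexity={math.exp(loss):.1f}")
```

The trained model narrows the choice from 27 characters down to roughly 8 to 9 per step.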
After training, the model can generate new names by sampling one character at a time.
Example outputs:
ter
maleah
makilah
tah
mallissana
nalusan
katha
samiyah
javer
gotti
The model starts with an empty context and keeps generating characters until it produces the end token `.`.
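The sampling loop described above might look like the following sketch. To keep it runnable without a trained network, `toy_next_char_probs` is a hypothetical stand-in that returns a fixed distribution; with the real model, it would be replaced by a softmax over the logits for the current context. The all-`.` starting context and the `max_len` safety cap are also assumptions.

```python
import random
import string

ALPHABET = "." + string.ascii_lowercase  # assumption: index 0 is the end token


def toy_next_char_probs(context):
    """Hypothetical stand-in for the trained model: returns 27 probabilities."""
    # Never end on the initial all-padding context; otherwise end with p = 0.2.
    p_end = 0.0 if context.endswith(".") else 0.2
    p_letter = (1.0 - p_end) / 26
    return [p_end] + [p_letter] * 26


def sample_name(next_char_probs, block_size=3, max_len=20, rng=random):
    """Sample characters one at a time until the end token appears."""
    context = "." * block_size           # "empty" context, represented as padding
    out = []
    while len(out) < max_len:            # safety cap, not part of the source
        probs = next_char_probs(context)
        ch = rng.choices(ALPHABET, weights=probs, k=1)[0]
        if ch == ".":                    # end token: stop generating
            break
        out.append(ch)
        context = context[1:] + ch       # slide the context window
    return "".join(out)


rng = random.Random(0)
print([sample_name(toy_next_char_probs, rng=rng) for _ in range(5)])
```

Because the stand-in picks letters uniformly, its output is gibberish; the name-like samples listed above come from the learned distribution.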
This project demonstrates several important ideas:
- Character-level modeling
- Embedding representations
- Context-based prediction
- Neural network training with backpropagation
- Probabilistic sampling
- Install dependencies: `pip install torch matplotlib`
- Place `names.txt` in the root directory.
- Run the notebook or converted Python script: `python makemore.py`
This project is based on the teachings and ideas from Andrej Karpathy’s makemore project. His work provides one of the clearest and most intuitive introductions to building neural networks from scratch, particularly by emphasizing first principles over heavy abstraction. Instead of relying on high-level frameworks, the approach breaks down every component—data processing, embeddings, forward pass, loss computation, and gradient updates—into explicit, understandable steps.
The makemore series is especially valuable because it demonstrates how seemingly complex systems like language models can be constructed incrementally from simple building blocks. By starting with basic probabilistic models and gradually introducing neural networks, it helps bridge the gap between theory and practical implementation. This philosophy strongly influences this project, which aims to remain minimal while still capturing the core mechanics of modern language modeling.
A significant aspect of Karpathy’s teaching style is his focus on intuition. Concepts such as embeddings, context windows, and non-linear transformations are not just implemented but also explained in a way that builds mental models for how and why they work. This project follows the same spirit—prioritizing clarity and transparency over optimization or scale—so that each part of the model can be inspected, modified, and extended with ease.
If you are interested in developing a deeper understanding of neural networks, language models, or PyTorch-based implementations, exploring his original work is highly recommended.
Possible extensions:
- Increase context window size
- Use deeper architectures
- Replace tanh with modern activations (ReLU, GELU)
- Add batch normalization
- Train on larger or different datasets
- Move to transformer-based models
This project is for educational purposes. Refer to the original makemore repository for licensing details.
