makemore: Character-Level Language Model

This project is a minimal implementation of a character-level language model inspired by the makemore series by Andrej Karpathy. The goal is to build an intuitive understanding of how neural networks can learn patterns in text and generate new sequences—in this case, human-like names.

The model operates at the character level, meaning it learns to predict the next character given a fixed-length context of previous characters. Despite its simplicity, it captures interesting statistical patterns in names and can generate realistic outputs.

Overview

The pipeline consists of:

Building a vocabulary of characters
Encoding characters into integers
Creating context-target training pairs
Training a simple neural network
Sampling new names from the learned distribution

This implementation uses a fixed context window (block size) and a feedforward neural network with embeddings.
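
As a minimal sketch of how context-target pairs can be produced with a fixed window, the snippet below slides a three-character window over a single name. The helper name `build_pairs` and the choice to left-pad the context with "." are assumptions made for illustration; the notebook may organise this step differently.

```python
def build_pairs(word, block_size=3, pad="."):
    """Return (context, target) pairs for one word.

    Assumption: the context is left-padded with the end token.
    """
    context = [pad] * block_size
    pairs = []
    for ch in word + pad:                  # the end token is the final target
        pairs.append(("".join(context), ch))
        context = context[1:] + [ch]       # slide the window by one character
    return pairs


pairs = build_pairs("emma")
for ctx, target in pairs:
    print(ctx, "->", target)
```

Each name of length n yields n + 1 training examples, because the end token is also predicted.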

Dataset

The dataset (names.txt) contains ~32,000 names. Each name is treated as a sequence of characters, with the special token "." marking the end of a word.

Example:

emma → [e, m, m, a, .]
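
The character-to-integer encoding might look like the sketch below. The three-word list stands in for the contents of names.txt, and the names `stoi`, `itos`, `encode`, and `decode`, as well as reserving index 0 for ".", are assumptions rather than details confirmed by this README.

```python
# Assumption: a tiny in-memory sample stands in for names.txt.
words = ["emma", "olivia", "ava"]

# Collect the distinct characters and reserve index 0 for the end token.
chars = sorted(set("".join(words)))
stoi = {ch: i + 1 for i, ch in enumerate(chars)}
stoi["."] = 0
itos = {i: ch for ch, i in stoi.items()}


def encode(word):
    """Map a word to integer indices, appending the end token."""
    return [stoi[ch] for ch in word + "."]


def decode(indices):
    """Map integer indices back to characters."""
    return "".join(itos[i] for i in indices)


encoded = encode("emma")
print(encoded)
print(decode(encoded))
```

On the full dataset the same construction would give 27 entries: 26 letters plus the end token.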

Model Architecture

The model is a simple neural network with the following components:

Embedding Layer: Maps each character index to a dense vector representation.
Hidden Layer: Fully connected layer with tanh activation.
Output Layer: Produces logits over the vocabulary.
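
To make the data flow through the three components concrete, here is a rough forward pass for a single context. It deliberately uses plain Python lists instead of PyTorch tensors so that every shape is explicit. The parameter names (`C`, `W1`, `b1`, `W2`, `b2`), the initialisation scale, and the presence of bias terms are assumptions, and the weights are random rather than trained.

```python
import math
import random

random.seed(0)

VOCAB, EMB, BLOCK, HIDDEN = 27, 10, 3, 300


def rand_matrix(rows, cols, scale=0.1):
    return [[random.gauss(0, scale) for _ in range(cols)] for _ in range(rows)]


# Randomly initialised parameters (untrained, for shape illustration only).
C = rand_matrix(VOCAB, EMB)              # embedding table: 27 x 10
W1 = rand_matrix(BLOCK * EMB, HIDDEN)    # hidden layer weights: 30 x 300
b1 = [0.0] * HIDDEN
W2 = rand_matrix(HIDDEN, VOCAB)          # output layer weights: 300 x 27
b2 = [0.0] * VOCAB


def affine(x, W, b):
    """Compute x @ W + b for a single input vector x."""
    return [sum(xi * W[i][j] for i, xi in enumerate(x)) + b[j]
            for j in range(len(b))]


def forward(context):
    """Map a list of BLOCK character indices to VOCAB logits."""
    emb = [v for idx in context for v in C[idx]]          # look up, then concatenate
    hidden = [math.tanh(h) for h in affine(emb, W1, b1)]  # tanh hidden layer
    return affine(hidden, W2, b2)                         # one logit per character


logits = forward([0, 0, 0])
print(len(logits))
```

The three 10-dimensional embeddings are concatenated into one 30-dimensional input, which is why the hidden layer has 30 input features.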

Configuration

Vocabulary size: 27 (26 letters + .)
Embedding size: 10
Context length (block size): 3
Hidden layer size: 300
Total parameters: 17,697
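
The parameter total can be checked by hand. The sketch below assumes that both fully connected layers carry a bias vector and that there are no other trainable parameters; under that assumption the arithmetic reproduces the figure above.

```python
vocab_size, emb_size, block_size, hidden_size = 27, 10, 3, 300

embedding = vocab_size * emb_size                            # 270
hidden = block_size * emb_size * hidden_size + hidden_size   # 9,000 weights + 300 biases
output = hidden_size * vocab_size + vocab_size               # 8,100 weights + 27 biases

total = embedding + hidden + output
print(total)  # 17697
```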

Training

Training is performed using mini-batch gradient descent.

Batch size: 32
Iterations: 200,000
Learning rate:
- 0.1 for first 100k steps
- 0.01 thereafter
Loss function: Cross-entropy
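
The two-stage learning rate listed above amounts to a simple step decay. A minimal sketch, assuming a zero-based step counter and a hypothetical helper named `learning_rate`:

```python
def learning_rate(step):
    """Step decay: 0.1 for steps 0..99,999, then 0.01."""
    return 0.1 if step < 100_000 else 0.01


print(learning_rate(0), learning_rate(150_000))
```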

The model learns to predict the next character given the previous three characters.
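
For a single training example, cross-entropy is the negative log-probability that the softmax of the logits assigns to the correct next character. The sketch below computes it with the standard library only; the function name and the max-subtraction trick for numerical stability are illustrative choices, not details taken from the notebook.

```python
import math


def cross_entropy(logits, target):
    """Negative log-probability of `target` under softmax(logits)."""
    m = max(logits)                      # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


# With all-equal logits every character is equally likely,
# so the loss equals ln(27).
uniform_loss = cross_entropy([0.0] * 27, target=5)
print(round(uniform_loss, 4))  # 3.2958
```

In training, this quantity is averaged over the 32 examples in each mini-batch.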

Results

Final losses:

Training loss: ~2.11
Validation loss: ~2.17

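Both values are well below the loss of about 3.30 (ln 27) that a uniform guess over the 27 characters would give, so the model has learned meaningful structure. The small gap between training and validation loss suggests little overfitting, though the architecture is still relatively simple.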

Sampling

After training, the model generates new names by sampling one character at a time from its predicted distribution.

Example outputs:

ter
maleah
makilah
tah
mallissana
nalusan
katha
samiyah
javer
gotti

Generation starts from an empty context and continues until the end token "." is produced.
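
The sampling loop can be sketched as follows. `next_char_probs` is a hypothetical stand-in for the trained network and returns a fixed toy distribution, so the generated strings are not realistic names. The character ordering in `ALPHABET`, the all-zero starting context, and the `max_len` safety cap are likewise assumptions.

```python
import random

ALPHABET = ".abcdefghijklmnopqrstuvwxyz"    # assumption: index 0 is the end token
BLOCK_SIZE = 3


def next_char_probs(context):
    """Hypothetical stand-in for the trained model.

    Returns one probability per character. This toy version ignores
    the context entirely.
    """
    weights = [4.0] + [1.0] * 26            # favour the end token so names stay short
    total = sum(weights)
    return [w / total for w in weights]


def sample_name(rng, max_len=20):
    context = [0] * BLOCK_SIZE              # "empty" context: all end tokens
    out = []
    while len(out) < max_len:
        probs = next_char_probs(context)
        idx = rng.choices(range(len(ALPHABET)), weights=probs, k=1)[0]
        if idx == 0:                        # end token drawn: stop
            break
        out.append(ALPHABET[idx])
        context = context[1:] + [idx]       # slide the window forward
    return "".join(out)


rng = random.Random(42)
names = [sample_name(rng) for _ in range(5)]
print(names)
```

Replacing `next_char_probs` with a softmax over the trained model's logits would turn this into a real sampler.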

Key Concepts

This project demonstrates several important ideas:

Character-level modeling
Embedding representations
Context-based prediction
Neural network training with backpropagation
Probabilistic sampling

How to Run

Install dependencies:
```
pip install torch matplotlib
```
Place names.txt in the root directory.
Run the notebook or converted Python script:
```
python makemore.py
```

Credits

This project is based on the teachings and ideas from Andrej Karpathy’s makemore project. His work provides one of the clearest and most intuitive introductions to building neural networks from scratch, particularly by emphasizing first principles over heavy abstraction. Instead of relying on high-level frameworks, the approach breaks down every component—data processing, embeddings, forward pass, loss computation, and gradient updates—into explicit, understandable steps.

The makemore series is especially valuable because it demonstrates how seemingly complex systems like language models can be constructed incrementally from simple building blocks. By starting with basic probabilistic models and gradually introducing neural networks, it helps bridge the gap between theory and practical implementation. This philosophy strongly influences this project, which aims to remain minimal while still capturing the core mechanics of modern language modeling.

A significant aspect of Karpathy’s teaching style is his focus on intuition. Concepts such as embeddings, context windows, and non-linear transformations are not just implemented but also explained in a way that builds mental models for how and why they work. This project follows the same spirit—prioritizing clarity and transparency over optimization or scale—so that each part of the model can be inspected, modified, and extended with ease.

If you are interested in developing a deeper understanding of neural networks, language models, or PyTorch-based implementations, exploring his original work is highly recommended.

Possible Extensions

Increase context window size
Use deeper architectures
Replace tanh with modern activations (ReLU, GELU)
Add batch normalization
Train on larger or different datasets
Move to transformer-based models

License

This project is for educational purposes. Refer to the original makemore repository for licensing details.
