kanavgoyal898/makemore

makemore: Character-Level Language Model

This project is a minimal implementation of a character-level language model inspired by the makemore series by Andrej Karpathy. The goal is to build an intuitive understanding of how neural networks can learn patterns in text and generate new sequences—in this case, human-like names.

The model operates at the character level, meaning it learns to predict the next character given a fixed-length context of previous characters. Despite its simplicity, it captures interesting statistical patterns in names and can generate realistic outputs.

Multi-Layer Perceptron

Overview

The pipeline consists of:

  • Building a vocabulary of characters
  • Encoding characters into integers
  • Creating context-target training pairs
  • Training a simple neural network
  • Sampling new names from the learned distribution

This implementation uses a fixed context window (block size) and a feedforward neural network with embeddings.

Dataset

The dataset (names.txt) contains ~32,000 names. Each name is treated as a sequence of characters, with a special token . representing the end of a word.

Example:

emma → [e, m, m, a, .]
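The vocabulary and encoding steps can be sketched as follows. This is a minimal illustration, not the project's actual code: the three-name `words` list is a hypothetical stand-in for names.txt, and the names `stoi`, `itos`, `encode`, and `decode`, as well as the choice of index 0 for the `.` token, are assumptions.

```python
# Hypothetical stand-in for the contents of names.txt (one name per line).
words = ["emma", "olivia", "ava"]

# Build the character vocabulary; "." is the special end token.
# Assigning it index 0 is an assumption made for this sketch.
chars = sorted(set("".join(words)))
stoi = {ch: i + 1 for i, ch in enumerate(chars)}
stoi["."] = 0
itos = {i: ch for ch, i in stoi.items()}

def encode(word):
    """Turn a name into a list of integer indices, appending the end token."""
    return [stoi[ch] for ch in word + "."]

def decode(indices):
    """Turn a list of integer indices back into a string."""
    return "".join(itos[i] for i in indices)

print(encode("emma"))          # [2, 5, 5, 1, 0]
print(decode(encode("emma")))  # emma.
```

With the full dataset, the same construction would yield 27 entries: the 26 lowercase letters plus the `.` token.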

Model Architecture

The model is a simple neural network with the following components:

  • Embedding Layer: Maps each character index to a dense vector representation.

  • Hidden Layer: Fully connected layer with tanh activation.

  • Output Layer: Produces logits over the vocabulary.
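A forward pass through these three components might look like the sketch below. It assumes PyTorch and the sizes listed under Configuration; the tensor names (`C`, `W1`, `b1`, `W2`, `b2`), the `forward` helper, and the random initialization are illustrative assumptions rather than the project's actual code.

```python
import torch

vocab_size, emb_dim, block_size, hidden = 27, 10, 3, 300

g = torch.Generator().manual_seed(0)  # arbitrary seed, only for reproducibility
C = torch.randn((vocab_size, emb_dim), generator=g)            # embedding table
W1 = torch.randn((block_size * emb_dim, hidden), generator=g)  # hidden layer weights
b1 = torch.randn(hidden, generator=g)                          # hidden layer biases
W2 = torch.randn((hidden, vocab_size), generator=g)            # output layer weights
b2 = torch.randn(vocab_size, generator=g)                      # output layer biases

def forward(X):
    """Map a (batch, block_size) tensor of character indices to logits."""
    emb = C[X]                                          # (batch, block_size, emb_dim)
    h = torch.tanh(emb.view(X.shape[0], -1) @ W1 + b1)  # (batch, hidden)
    return h @ W2 + b2                                  # (batch, vocab_size)

X = torch.zeros((4, block_size), dtype=torch.long)  # dummy batch of 4 contexts
logits = forward(X)
print(logits.shape)  # torch.Size([4, 27])
```

The embeddings of the three context characters are concatenated into a single 30-dimensional vector before the hidden layer, which is why `W1` has `block_size * emb_dim` rows.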

Configuration

  • Vocabulary size: 27 (26 letters + .)
  • Embedding size: 10
  • Context length (block size): 3
  • Hidden layer size: 300
  • Total parameters: 17,697
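The parameter total can be checked by hand. The breakdown below assumes that both the hidden and output layers carry bias vectors; under that assumption the arithmetic reproduces the stated figure.

```python
vocab_size, emb_dim, block_size, hidden = 27, 10, 3, 300

embedding = vocab_size * emb_dim                        # 27 * 10 = 270
hidden_layer = block_size * emb_dim * hidden + hidden   # 9,000 weights + 300 biases
output_layer = hidden * vocab_size + vocab_size         # 8,100 weights + 27 biases

total = embedding + hidden_layer + output_layer
print(total)  # 17697
```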

Training

Training is performed using mini-batch gradient descent.

  • Batch size: 32

  • Iterations: 200,000

  • Learning rate:

    • 0.1 for first 100k steps
    • 0.01 thereafter
  • Loss function: Cross-entropy
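The step-wise learning rate could be expressed as a small function. This is a sketch: the name `learning_rate` and the zero-based step index are assumptions, so the exact step at which the rate drops may differ by one from the project's code.

```python
TOTAL_STEPS = 200_000
DECAY_STEP = 100_000

def learning_rate(step):
    """Return 0.1 for the first 100k steps (0-based) and 0.01 afterwards."""
    return 0.1 if step < DECAY_STEP else 0.01

print(learning_rate(0))                # 0.1
print(learning_rate(DECAY_STEP - 1))   # 0.1
print(learning_rate(DECAY_STEP))       # 0.01
print(learning_rate(TOTAL_STEPS - 1))  # 0.01
```

Inside a training loop, each parameter would then be updated by subtracting `learning_rate(step)` times its gradient.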

The model learns to predict the next character given the previous three characters.
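To make the training pairs concrete, the sketch below slides a window of three characters over a single name. It assumes the context is padded with the `.` token before the first letter, as in Karpathy's original series; the function name `context_target_pairs` is hypothetical.

```python
block_size = 3

def context_target_pairs(word, block_size=block_size):
    """Return (context, target) pairs for one name, ending with the "." token."""
    context = ["."] * block_size          # assumed padding before the first letter
    pairs = []
    for ch in word + ".":
        pairs.append(("".join(context), ch))
        context = context[1:] + [ch]      # drop the oldest character, append the new one
    return pairs

for ctx, target in context_target_pairs("emma"):
    print(ctx, "->", target)
# ... -> e
# ..e -> m
# .em -> m
# emm -> a
# mma -> .
```

A name with n letters therefore contributes n + 1 training examples, the last of which teaches the model when to stop.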

Results

Final losses:

  • Training loss: ~2.11
  • Validation loss: ~2.17

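These values are well below the loss of about 3.30 (ln 27) that a uniform guess over 27 characters would give, which indicates the model has learned meaningful structure, though the architecture remains relatively simple.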
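For reference, the losses can be put in context with a short calculation. The sketch below computes the cross-entropy of a uniform guess over the vocabulary and converts the reported validation loss into a perplexity; it assumes the reported losses use the natural logarithm, as PyTorch's cross-entropy does.

```python
import math

vocab_size = 27
val_loss = 2.17  # reported validation loss

# Cross-entropy of a model that assigns equal probability to every character.
uniform_loss = -math.log(1 / vocab_size)
print(round(uniform_loss, 2))  # 3.3

# Perplexity: the effective number of equally likely choices per character.
val_perplexity = math.exp(val_loss)
print(f"{val_perplexity:.2f}")
```

A perplexity well below 27 means the model is, on average, far less uncertain about the next character than a uniform guess.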

Sampling

After training, the model can generate new names by sampling one character at a time.

Example outputs:

ter
maleah
makilah
tah
mallissana
nalusan
katha
samiyah
javer
gotti

The model starts with an empty context and keeps generating characters until it produces the end token (.).
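The sampling loop can be sketched as follows, assuming PyTorch. The weights here are random rather than trained, so the output is gibberish; the point is the loop structure. The parameter names, the `sample_name` helper, the use of index 0 for `.`, the all-`.` starting context, and the `max_len` safety cap are all assumptions of this sketch.

```python
import torch
import torch.nn.functional as F

vocab_size, emb_dim, block_size, hidden = 27, 10, 3, 300
itos = {0: ".", **{i + 1: chr(ord("a") + i) for i in range(26)}}

# Untrained, randomly initialized parameters (stand-ins for the trained model).
g = torch.Generator().manual_seed(0)
C = torch.randn((vocab_size, emb_dim), generator=g)
W1 = torch.randn((block_size * emb_dim, hidden), generator=g) * 0.1
b1 = torch.zeros(hidden)
W2 = torch.randn((hidden, vocab_size), generator=g) * 0.1
b2 = torch.zeros(vocab_size)

def sample_name(max_len=20):
    """Sample characters one at a time until "." is drawn or max_len is hit."""
    context = [0] * block_size                  # assumed start: all "." tokens
    out = []
    for _ in range(max_len):
        emb = C[torch.tensor([context])]        # (1, block_size, emb_dim)
        h = torch.tanh(emb.view(1, -1) @ W1 + b1)
        probs = F.softmax(h @ W2 + b2, dim=1)   # distribution over next character
        ix = torch.multinomial(probs, num_samples=1, generator=g).item()
        if ix == 0:                             # end token: stop generating
            break
        out.append(itos[ix])
        context = context[1:] + [ix]            # slide the context window
    return "".join(out)

print(sample_name())
```

Drawing from the full distribution with `torch.multinomial`, rather than always taking the most likely character, is what lets repeated calls produce different names.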

Key Concepts

This project demonstrates several important ideas:

  • Character-level modeling
  • Embedding representations
  • Context-based prediction
  • Neural network training with backpropagation
  • Probabilistic sampling

How to Run

  1. Install dependencies:

    pip install torch matplotlib
    
  2. Place names.txt in the root directory.

  3. Run the notebook or converted Python script:

    python makemore.py
    

Credits

This project is based on the teachings and ideas from Andrej Karpathy’s makemore project. His work provides one of the clearest and most intuitive introductions to building neural networks from scratch, particularly by emphasizing first principles over heavy abstraction. Instead of relying on high-level frameworks, the approach breaks down every component—data processing, embeddings, forward pass, loss computation, and gradient updates—into explicit, understandable steps.

The makemore series is especially valuable because it demonstrates how seemingly complex systems like language models can be constructed incrementally from simple building blocks. By starting with basic probabilistic models and gradually introducing neural networks, it helps bridge the gap between theory and practical implementation. This philosophy strongly influences this project, which aims to remain minimal while still capturing the core mechanics of modern language modeling.

A significant aspect of Karpathy’s teaching style is his focus on intuition. Concepts such as embeddings, context windows, and non-linear transformations are not just implemented but also explained in a way that builds mental models for how and why they work. This project follows the same spirit—prioritizing clarity and transparency over optimization or scale—so that each part of the model can be inspected, modified, and extended with ease.

If you are interested in developing a deeper understanding of neural networks, language models, or PyTorch-based implementations, exploring his original work is highly recommended.

Possible Extensions

  • Increase context window size
  • Use deeper architectures
  • Replace tanh with modern activations (ReLU, GELU)
  • Add batch normalization
  • Train on larger or different datasets
  • Move to transformer-based models

License

This project is for educational purposes. Refer to the original makemore repository for licensing details.
