A complete creative AI system built entirely from scratch using PyTorch. Three specialized transformers collaborate to generate stories, melodies, and images, with no pretrained models and no black boxes. Pure transformer implementations.

| # | Transformer | Output | Architecture |
|---|-------------|--------|--------------|
| 1 | Storyformer | Text narrative | Decoder-only (GPT-style) |
| 2 | Melodyformer | MIDI/music sequence | Encoder-decoder with relative attention |
| 3 | Imageformer | Pixel images | Vision transformer (ViT) + diffusion |

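
To make the "decoder-only (GPT-style)" entry concrete, here is a minimal sketch of causal multi-head self-attention written in plain PyTorch. It is an illustration only, not this project's actual code: the class name `CausalSelfAttention` and its parameters `embed_dim` and `num_heads` are assumptions made for the example.

```python
import math

import torch
import torch.nn as nn


class CausalSelfAttention(nn.Module):
    """Hypothetical example module; names are not taken from the project."""

    def __init__(self, embed_dim: int, num_heads: int):
        super().__init__()
        if embed_dim % num_heads != 0:
            raise ValueError("embed_dim must be divisible by num_heads")
        self.num_heads = num_heads
        self.head_dim = embed_dim // num_heads
        # One linear layer produces queries, keys, and values together.
        self.qkv = nn.Linear(embed_dim, 3 * embed_dim)
        self.proj = nn.Linear(embed_dim, embed_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, c = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)

        # (batch, time, embed) -> (batch, heads, time, head_dim)
        def split_heads(z: torch.Tensor) -> torch.Tensor:
            return z.reshape(b, t, self.num_heads, self.head_dim).transpose(1, 2)

        q, k, v = split_heads(q), split_heads(k), split_heads(v)

        # Scaled dot-product scores between every pair of positions.
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.head_dim)

        # Causal mask: position i may not attend to positions j > i.
        future = torch.triu(
            torch.ones(t, t, dtype=torch.bool, device=x.device), diagonal=1
        )
        scores = scores.masked_fill(future, float("-inf"))

        weights = torch.softmax(scores, dim=-1)
        out = (weights @ v).transpose(1, 2).reshape(b, t, c)
        return self.proj(out)


torch.manual_seed(0)
attn = CausalSelfAttention(embed_dim=32, num_heads=4)
x = torch.randn(2, 5, 32)  # (batch, sequence length, embedding)
y = attn(x)
```

The mask is what makes the layer suitable for left-to-right text generation: each output position depends only on the current and earlier tokens, so changing a later token leaves earlier outputs untouched.
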
- From Scratch: No HuggingFace, no pretrained weights. Learn and control every detail
- Multi-Modal: Text, music, and image generation in one unified framework
- PyTorch Native: Pure PyTorch with custom attention implementations
- Lightweight: Designed to train on consumer GPUs (tested on 8 GB VRAM)
- Fully Configurable: Embedding dimensions, layers, and heads are all tunable
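
As a rough illustration of what "fully configurable" could look like in practice, the sketch below collects the tunable sizes in a small dataclass. The class name `TransformerConfig`, its field names, and every default value are hypothetical and chosen only for this example; the project's real configuration interface may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransformerConfig:
    """Hypothetical config object; field names and defaults are assumptions."""

    embed_dim: int = 256   # width of token embeddings
    num_layers: int = 6    # number of stacked transformer blocks
    num_heads: int = 8     # attention heads per block

    def __post_init__(self) -> None:
        # Each head gets an equal slice of the embedding, so the split must be exact.
        if self.embed_dim % self.num_heads != 0:
            raise ValueError("embed_dim must be divisible by num_heads")

    @property
    def head_dim(self) -> int:
        return self.embed_dim // self.num_heads


# A smaller and a larger setup built from the same config type.
small_cfg = TransformerConfig(embed_dim=128, num_layers=4, num_heads=4)
large_cfg = TransformerConfig(embed_dim=384, num_layers=8, num_heads=6)
```

Validating the head split when the config is created surfaces a mismatched `embed_dim` and `num_heads` before any model is built, which is useful when experimenting with sizes on a memory-limited GPU.
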