Where Learning Meets Play — In Two Languages!
About • Features • Tech Stack • Get Started • Contribute
Remember how we all learned as kids? By touching things, playing games, and asking "what's that?" a hundred times a day. That's exactly what VisionLearn brings to the digital world.
We've built an app that turns your phone into a magical learning companion for little ones aged 3 to 7. Instead of boring flashcards and repetitive drills, kids can:
- 📷 Point their camera at a banana and hear "Banana! کیلا!" in both English and Urdu
- ✋ Wave their hand and watch the app recognize their gestures
- 🎨 Trace letters, numbers, and shapes with their fingers and get instant AI feedback
- 🔍 Go on treasure hunts to find objects around the house
Everything works in both English and Urdu, helping children from bilingual families feel right at home.
Most educational apps treat kids like tiny robots: tap here, repeat that, watch this animation. But that's not how children actually learn. They learn by exploring, making mistakes, and having fun.
VisionLearn uses real camera-based interactions and gesture recognition to make learning feel like play. No passive screen time, just active, engaging exploration, with parental controls that let grown-ups stay in charge of when and how long.
Four core learning modules with beautiful visuals and bilingual narration (English + Urdu via on-device text-to-speech):
| Module | What's Included |
|---|---|
| Alphabets | A–Z with phonics, example words, and fun facts |
| Numbers | 1–10 with quantity visuals and pronunciation |
| Shapes | 10 everyday shapes with names, sides, and examples |
| Colors | 8 colors with real-world examples |
Items unlock sequentially as the child progresses, and each module ends with a quiz that awards stars.
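The sequential unlocking described above can be sketched roughly as follows. This is a minimal illustration, not the app's actual code (the client is TypeScript): the `ModuleProgress` class and its method names are hypothetical, and it assumes "unlocked" means every completed item plus the first unfinished one.

```python
from dataclasses import dataclass, field


@dataclass
class ModuleProgress:
    """Hypothetical model of one learning module's progress."""

    items: list[str]
    completed: set[str] = field(default_factory=set)

    def unlocked(self) -> list[str]:
        """Return all completed items plus the first unfinished one."""
        result = []
        for item in self.items:
            result.append(item)
            if item not in self.completed:
                break  # everything after the first unfinished item stays locked
        return result

    def complete(self, item: str) -> None:
        """Mark an item as done; locked items cannot be completed."""
        if item not in self.unlocked():
            raise ValueError(f"{item!r} is still locked")
        self.completed.add(item)

    def quiz_available(self) -> bool:
        """Assumption: the end-of-module quiz opens once every item is done."""
        return all(item in self.completed for item in self.items)
```

For example, a fresh `ModuleProgress(items=["A", "B", "C"])` exposes only `"A"`, and completing `"A"` exposes `"B"`.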
Four AI-powered activities that turn practice into playtime:
| Activity | How It Works | AI Behind It |
|---|---|---|
| Drawing Fun | Trace a letter, number, or shape; the app classifies it and tells the child how they did | PyTorch CNN (MNIST / EMNIST) + OpenCV contour analysis |
| Gesture Play | Show one of 8 hand gestures (thumbs up/down, peace, fist, open palm, pointing up, OK, rock) | MediaPipe Hand Landmarker + rule-based classifier |
| Object Hunt | "Find a cup!" The child points the camera and the app verifies that it found the right object | Ultralytics YOLOv8 over 40 allow-listed everyday objects |
| Name Game | Point the camera at an object and the app generates a 4-option bilingual quiz | YOLOv8 + on-server quiz generator |
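To make the Object Hunt and Name Game rows more concrete, here is a rough sketch of the post-detection logic, assuming the detector has already produced `(label, confidence)` pairs. Everything here is illustrative: the five-entry `ALLOWED_OBJECTS` excerpt, the `0.5` threshold, and the function names are assumptions, not the backend's actual code.

```python
import random

# Hypothetical excerpt of the allow-list: COCO class name -> Urdu label.
ALLOWED_OBJECTS = {
    "banana": "کیلا",
    "apple": "سیب",
    "cup": "کپ",
    "book": "کتاب",
    "chair": "کرسی",
}


def filter_detections(detections, min_confidence=0.5):
    """Keep only allow-listed labels at or above the (assumed) threshold."""
    return [
        (label, conf)
        for label, conf in detections
        if label in ALLOWED_OBJECTS and conf >= min_confidence
    ]


def found_target(detections, target, min_confidence=0.5):
    """Object Hunt: was the requested object confidently detected?"""
    return any(
        label == target
        for label, _ in filter_detections(detections, min_confidence)
    )


def build_quiz(detected_label, rng=None, n_options=4):
    """Name Game: one correct answer plus distractors, in both languages."""
    rng = rng or random.Random()
    if detected_label not in ALLOWED_OBJECTS:
        raise KeyError(detected_label)
    others = [label for label in ALLOWED_OBJECTS if label != detected_label]
    options = rng.sample(others, n_options - 1) + [detected_label]
    rng.shuffle(options)
    return {
        "answer": detected_label,
        "options": [{"en": o, "ur": ALLOWED_OBJECTS[o]} for o in options],
    }
```

Filtering before matching means a confident detection of a non-allow-listed class (for example `person`) never reaches the child-facing response.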
- 🌐 True Bilingual Experience: Every prompt, hint, and success message is authored in both English and Urdu, with native TTS pronunciation
- 📷 Smart Camera Recognition: 40 allow-listed everyday objects from YOLOv8's 80 COCO classes
- 👋 Gesture Magic: 8 distinct hand gestures recognized from MediaPipe's 21 landmarks
- 🏆 Rewards That Motivate: Stars, badges, rewards, XP levels, and a daily streak system
- 📊 Parent Dashboard: Watch progress per module, see daily activity time, and review session history
- 🔒 PIN-Protected Controls: Parental dashboard, settings, and PIN updates are all gated by a device-local PIN and a math challenge
- ⏱ Daily Time Limits: Configurable per-day cap with per-screen time tracking; gameplay routes lock when the cap is reached
- 🎨 Free Canvas: A separate drawing space for creative play with save-to-gallery and share
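The daily time limit in the list above boils down to simple bookkeeping: add up the seconds spent per screen and lock gameplay once the total reaches the cap. The sketch below shows that logic in Python for brevity; the real client is TypeScript, and the `DailyUsage` class, its fields, and the screen names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DailyUsage:
    """Hypothetical per-day usage record with per-screen time tracking."""

    limit_minutes: int
    seconds_by_screen: dict[str, int] = field(default_factory=dict)

    def record(self, screen: str, seconds: int) -> None:
        """Add time spent on one screen to today's totals."""
        if seconds < 0:
            raise ValueError("seconds must be non-negative")
        self.seconds_by_screen[screen] = (
            self.seconds_by_screen.get(screen, 0) + seconds
        )

    @property
    def total_seconds(self) -> int:
        return sum(self.seconds_by_screen.values())

    def remaining_seconds(self) -> int:
        """Time left before the cap, never negative."""
        return max(0, self.limit_minutes * 60 - self.total_seconds)

    def gameplay_locked(self) -> bool:
        """Gameplay routes lock once the daily cap is reached."""
        return self.remaining_seconds() == 0
```

Keeping the per-screen breakdown alongside the total lets the same record feed both the lock check and a per-activity view for parents.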
💾 About data: Sign-in, profile, progress, achievements, streaks, activity history, and time tracking are stored in Firebase Auth + Firestore. The vision endpoints receive only the frame being analyzed and discard it after inference.
| Package | Stack |
|---|---|
| 📱 `app/` mobile client | Expo SDK 54 + Expo Router, React Native 0.81 (New Architecture), React 19, TypeScript 5.9, Firebase JS SDK, Google Sign-In |
| 🔧 `api/` backend | FastAPI 0.124+, uv, Firebase Admin SDK, Ultralytics YOLOv8, MediaPipe Tasks, PyTorch (CPU), OpenCV, Pydantic v2 |
Full dependency lists live in `app/README.md` and `api/README.md`.
| Model | What It Recognizes |
|---|---|
| YOLOv8s | 40 allow-listed everyday objects (fruits, toys, animals, household items) from the 80 COCO classes |
| MediaPipe Hand Landmarker | 21-point hand landmarks fed into a rule-based classifier for 8 gestures |
| CharCNN (digits) | MNIST-trained classifier for digits 0–9 |
| CharCNN (letters) | EMNIST-Letters-trained classifier for A–Z |
| OpenCV shape detector | 6 shapes (circle, square, triangle, rectangle, star, diamond) via contour analysis |
Training & evaluation scripts live in api/scripts/ and reproducible metrics (confusion matrices, classification reports) are saved to api/evaluation/.
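As an illustration of how a rule-based classifier can sit on top of the 21 hand landmarks listed in the table above, here is a simplified sketch covering four of the eight gestures. It uses MediaPipe's documented landmark indices, but the rules themselves are assumptions: they expect an upright hand, `(x, y)` image coordinates with `y` growing downward, and they are not the project's actual classifier.

```python
import math

# MediaPipe hand landmark indices: (fingertip, PIP joint) per finger.
FINGERS = {
    "index": (8, 6),
    "middle": (12, 10),
    "ring": (16, 14),
    "pinky": (20, 18),
}
THUMB_TIP, THUMB_IP, PINKY_MCP = 4, 3, 17


def extended_fingers(landmarks):
    """Return the set of fingers that look extended.

    `landmarks` is a list of 21 (x, y) pairs. A finger counts as extended
    when its tip is above its PIP joint (smaller y). The thumb heuristic
    (tip farther from the pinky base than the thumb's IP joint) is an
    assumption chosen for this sketch.
    """
    if len(landmarks) != 21:
        raise ValueError("expected 21 landmarks")
    up = {
        name
        for name, (tip, pip) in FINGERS.items()
        if landmarks[tip][1] < landmarks[pip][1]
    }
    base = landmarks[PINKY_MCP]
    if math.dist(landmarks[THUMB_TIP], base) > math.dist(landmarks[THUMB_IP], base):
        up.add("thumb")
    return up


def classify(landmarks):
    """Map the extended-finger set to a gesture name (illustrative rules)."""
    up = extended_fingers(landmarks)
    if not up:
        return "fist"
    if up == {"thumb", "index", "middle", "ring", "pinky"}:
        return "open_palm"
    if up == {"index", "middle"}:
        return "peace"
    if up == {"index"}:
        return "pointing_up"
    return "unknown"
```

Gestures such as thumbs up/down, OK, and rock would need extra rules (thumb direction, thumb-to-index distance), which this sketch leaves out.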
Each package is self-contained and has its own detailed setup guide. Pick the package you need and follow its README:
| I want to… | Go to |
|---|---|
| Run the mobile app (Expo, EAS, Firebase config, dev client) | 📱 app/README.md |
| Run the backend (FastAPI, models, Firebase Admin, training scripts) | 🔧 api/README.md |
- Node.js 18+ · Python 3.12+ · uv
- A Firebase project with Auth + Firestore enabled (free tier is fine) and a service-account JSON for the backend Admin SDK
- A development build of the mobile app. Expo Go can't run the native modules used here (camera, Google Sign-In, canvas capture)
git clone https://github.com/developer-ayyaz/vision-learn.git
cd vision-learn
# Terminal 1 — backend
cd api && uv sync && ./run dev # → http://localhost:8000 (Swagger at /docs)
# Terminal 2 — mobile app
cd app && npm install && npx expo start --dev-client

Before either command works, you'll need to drop your `.env` (and, for the backend, a Firebase service-account JSON) into each package. The full variable list, model file locations, and platform-specific notes are documented in the per-package READMEs linked above.
vision-learn/
├── 📱 app/ Expo / React Native mobile client → see app/README.md
├── 🔧 api/ FastAPI backend with ML pipelines → see api/README.md
├── LICENSE
└── README.md (you are here)
┌──────────────────────────┐ HTTPS + Firebase ID Token ┌────────────────────────────┐
│ React Native (Expo) │ ───────────────────────────────────► │ FastAPI Backend │
│ app/ │ │ api/ │
│ │ ◄────── JSON (bilingual feedback) ─── │ /object-hunt /name-game │
│ │ │ /gesture-play /drawing-fun│
└──────────────────────────┘ └────────────────────────────┘
│ │
│ Firebase SDK │ Firebase Admin
▼ ▼
┌─────────────────────────────────────────────────────────────────────────┐
│ Firebase Auth (email/password + Google) + Firestore (per-user data) │
└─────────────────────────────────────────────────────────────────────────┘
- The app authenticates with Firebase, then calls FastAPI endpoints with a fresh ID token (`Bearer …`).
- FastAPI verifies the token with the Admin SDK and runs YOLOv8 / MediaPipe / the PyTorch CNN against the submitted image.
- Gameplay state (stars, badges, streaks, history, time on task) is written directly from the app to Firestore, so the API stays stateless.
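The token check in the flow above can be sketched without any web framework. In this minimal example the `verify` callable stands in for the Admin SDK's `firebase_admin.auth.verify_id_token`, which returns the decoded claims including `uid`; the `AuthError` class and both function names are hypothetical, and the real backend presumably wires this into a FastAPI dependency instead.

```python
from typing import Any, Callable, Mapping


class AuthError(Exception):
    """Raised when the request carries no usable ID token (hypothetical)."""


def extract_bearer_token(headers: Mapping[str, str]) -> str:
    """Pull the token out of an 'Authorization: Bearer <token>' header.

    Note: a plain dict lookup is case-sensitive; real frameworks
    normalize header names for you.
    """
    value = headers.get("Authorization", "")
    scheme, _, token = value.partition(" ")
    if scheme.lower() != "bearer" or not token.strip():
        raise AuthError("expected 'Authorization: Bearer <token>'")
    return token.strip()


def authenticate(
    headers: Mapping[str, str],
    verify: Callable[[str], Mapping[str, Any]],
) -> str:
    """Return the caller's uid, or raise AuthError.

    `verify` is injected so the sketch runs without Firebase credentials;
    in production it would be firebase_admin.auth.verify_id_token.
    """
    token = extract_bearer_token(headers)
    try:
        claims = verify(token)
    except Exception as exc:
        raise AuthError("invalid or expired ID token") from exc
    return claims["uid"]
```

Because nothing is stored server-side between requests, each call repeats this check, which is what lets the API stay stateless.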
We chose colors that are easy on young eyes but still fun and engaging. The full design system lives in app/constants/colors.ts.
| Name | Color | Used For |
|---|---|---|
| Primary | #2B4D84 | Buttons, active elements |
| On Primary | #FFFFFF | Text on buttons |
| Background | #E6EFFF | Screen backgrounds |
| Text | #1C3D74 | Headings, body text |
VisionLearn, including its product design, UI, and implementation, is built and maintained by:
Fahad Ayyaz (@developer-ayyaz) · Muhammad Fahad (@muhammadfahad9) · Jawad Ahmad (@Jdahmad313)
- Logo design by Shujaat Ali — shujaatdesigns.framer.website
We'd love your help making VisionLearn even better! Here's how to jump in:
- Fork this repo
- Create a branch for your feature (`git checkout -b feature/cool-idea`)
- Make your changes. Please match the existing code style (TypeScript on the app, type-annotated Python on the API)
- Run the linter (`npm run lint` in `app/`) and verify the API starts (`./run dev` in `api/`)
- Submit a pull request with a clear description
💡 Please follow clean code principles and the conventions already established in `app/README.md` and `api/README.md` to keep the codebase consistent.
Whether it's fixing a typo, adding a feature, or improving the AI, every contribution counts.
VisionLearn is open source under the MIT License. Use it, learn from it, build upon it.
Got questions, ideas, or just want to say hi?
- 🐛 Open an issue for bugs or feature requests
- 💡 Start a discussion for questions
- ⭐ Star the repo if you find it useful
Built with ❤️ for curious little minds
VisionLearn — A new way to see, play, and learn


