VisionLearn

Where Learning Meets Play — In Two Languages!

About • Features • Tech Stack • Get Started • Contribute

🌟 What is VisionLearn?

Remember how we all learned as kids? By touching things, playing games, and asking "what's that?" a hundred times a day. That's exactly what VisionLearn brings to the digital world.

We've built an app that turns your phone into a magical learning companion for little ones aged 3 to 7. Instead of boring flashcards and repetitive drills, kids can:

📷 Point their camera at a banana and hear "Banana! کیلا!" in both English and Urdu
✋ Wave their hand and watch the app recognize their gestures
🎨 Trace letters, numbers, and shapes with their fingers and get instant AI feedback
🔍 Go on treasure hunts to find objects around the house

Everything works in both English and Urdu, helping children from bilingual families feel right at home.

💡 Why We Built This

Most educational apps treat kids like tiny robots: tap here, repeat that, watch this animation. But that's not how children actually learn. They learn by exploring, making mistakes, and having fun.

VisionLearn uses real camera-based interactions and gesture recognition to make learning feel like play. No passive screen time, just active, engaging exploration, with parental controls that let grown-ups stay in charge of when and how long children play.

✨ Features

📚 Learn the Basics

Four core learning modules with beautiful visuals and bilingual narration (English + Urdu via on-device text-to-speech):

Module	What's Included
Alphabets	A–Z with phonics, example words, and fun facts
Numbers	1–10 with quantity visuals and pronunciation
Shapes	10 everyday shapes with names, sides, and examples
Colors	8 colors with real-world examples

Items unlock sequentially as the child progresses, and each module ends with a quiz that awards stars.

🎮 Play & Practice

Four AI-powered activities that turn practice into playtime:

Activity	How It Works	AI Behind It
Drawing Fun	Trace a letter, number, or shape; the app classifies it and tells the child how they did	PyTorch CNN (MNIST / EMNIST) + OpenCV contour analysis
Gesture Play	Show one of 8 hand gestures (thumbs up/down, peace, fist, open palm, pointing up, OK, rock)	MediaPipe Hand Landmarker + rule-based classifier
Object Hunt	"Find a cup!" The child points the camera, and the app verifies that it found the right object	Ultralytics YOLOv8 over 40 allow-listed everyday objects
Name Game	Point the camera at an object and the app generates a 4-option bilingual quiz	YOLOv8 + on-server quiz generator
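
To give a feel for what a rule-based classifier over hand landmarks can look like, here is a minimal sketch. It is not the project's actual implementation: the rules, the thumb heuristic, and the names `extended_fingers`, `classify_gesture`, and `RULES` are assumptions made for illustration, and only four of the eight gestures are covered. It assumes an upright hand and the standard MediaPipe 21-point layout, with normalized coordinates where y grows downward.

```python
from math import dist

# MediaPipe hand landmark indices (21 points per hand).
THUMB_IP, THUMB_TIP, PINKY_MCP = 3, 4, 17
# (PIP joint, fingertip) index pairs for the four non-thumb fingers.
FINGERS = {"index": (6, 8), "middle": (10, 12), "ring": (14, 16), "pinky": (18, 20)}


def extended_fingers(landmarks):
    """Return the set of finger names that look extended.

    landmarks: sequence of 21 (x, y) tuples in normalized image coordinates.
    """
    if len(landmarks) != 21:
        raise ValueError("expected 21 landmarks")
    # A finger counts as extended when its tip sits above its PIP joint.
    up = {
        name
        for name, (pip, tip) in FINGERS.items()
        if landmarks[tip][1] < landmarks[pip][1]
    }
    # Assumed thumb heuristic: the tip is farther from the pinky base
    # than the thumb's IP joint is.
    pinky_base = landmarks[PINKY_MCP]
    if dist(landmarks[THUMB_TIP], pinky_base) > dist(landmarks[THUMB_IP], pinky_base):
        up.add("thumb")
    return up


# Hypothetical rule table: which set of extended fingers maps to which gesture.
RULES = {
    frozenset(): "fist",
    frozenset({"index"}): "pointing_up",
    frozenset({"index", "middle"}): "peace",
    frozenset({"thumb", "index", "middle", "ring", "pinky"}): "open_palm",
}


def classify_gesture(landmarks):
    """Map 21 landmarks to a gesture label, or "unknown" if no rule matches."""
    return RULES.get(frozenset(extended_fingers(landmarks)), "unknown")
```

Gestures such as thumbs up/down or OK would need extra rules (for example, thumb direction or fingertip distances) that this sketch leaves out.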

🎁 What Makes It Special

🌐 True Bilingual Experience: Every prompt, hint, and success message is authored in both English and Urdu, with native TTS pronunciation
📷 Smart Camera Recognition: 40 allow-listed everyday objects from YOLOv8's 80 COCO classes
👋 Gesture Magic: 8 distinct hand gestures recognized from MediaPipe's 21 landmarks
🏆 Rewards That Motivate: Stars, badges, rewards, XP levels, and a daily streak system
📊 Parent Dashboard: Watch progress per module, see daily activity time, and review session history
🔒 PIN-Protected Controls: Parental dashboard, settings, and PIN updates are all gated by a device-local PIN and a math challenge
⏱ Daily Time Limits: Configurable per-day cap with per-screen time tracking; gameplay routes lock when the cap is reached
🎨 Free Canvas: A separate drawing space for creative play with save-to-gallery and share
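
As a rough illustration of the daily time limit idea, the sketch below tracks seconds per screen and reports when the cap is reached. The real logic lives in the mobile app; the class name `DailyUsage`, its fields, and the screen names are hypothetical, and resetting the counters at the start of a new day is left out.

```python
from dataclasses import dataclass, field


@dataclass
class DailyUsage:
    """Per-day usage tracker (illustrative only, not the app's actual model)."""

    cap_seconds: int
    seconds_by_screen: dict = field(default_factory=dict)

    def record(self, screen, seconds):
        """Add time spent on one screen to today's total."""
        if seconds < 0:
            raise ValueError("seconds must be non-negative")
        self.seconds_by_screen[screen] = self.seconds_by_screen.get(screen, 0) + seconds

    @property
    def total_seconds(self):
        return sum(self.seconds_by_screen.values())

    def is_locked(self):
        """Gameplay should lock once the total reaches the configured cap."""
        return self.total_seconds >= self.cap_seconds


usage = DailyUsage(cap_seconds=600)
usage.record("drawing-fun", 400)
usage.record("object-hunt", 200)
print(usage.is_locked())
```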

💾 About data: Sign-in, profile, progress, achievements, streaks, activity history, and time tracking are stored in Firebase Auth + Firestore. The vision endpoints receive only the frame being analyzed and discard it after inference.

🛠 Tech Stack

Package	Stack
📱 `app/` mobile client	Expo SDK 54 + Expo Router, React Native 0.81 (New Architecture), React 19, TypeScript 5.9, Firebase JS SDK, Google Sign-In
🔧 `api/` backend	FastAPI 0.124+, uv, Firebase Admin SDK, Ultralytics YOLOv8, MediaPipe Tasks, PyTorch (CPU), OpenCV, Pydantic v2

Full dependency lists live in app/README.md and api/README.md.

🧠 Inference Models

Model	What It Recognizes
YOLOv8s	40 allow-listed everyday objects (fruits, toys, animals, household items) from the 80 COCO classes
MediaPipe Hand Landmarker	21-point hand landmarks fed into a rule-based classifier for 8 gestures
CharCNN (digits)	MNIST-trained classifier for digits 0–9
CharCNN (letters)	EMNIST-Letters-trained classifier for A–Z
OpenCV shape detector	6 shapes (circle, square, triangle, rectangle, star, diamond) via contour analysis

Training & evaluation scripts live in api/scripts/ and reproducible metrics (confusion matrices, classification reports) are saved to api/evaluation/.
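
The allow-listing step can be pictured as a simple filter over detector output. The sketch below is an assumption about how such a filter might work, not the code in api/: the `ALLOWED` set is a small illustrative subset of COCO labels rather than the project's real list of 40, and the 0.5 confidence threshold and the function names are made up for the example. Detections are plain `(label, confidence)` pairs so the snippet runs without any ML dependencies.

```python
# Illustrative subset of COCO class names; the project's real allow-list has 40 entries.
ALLOWED = {"banana", "apple", "cup", "teddy bear", "book"}


def filter_allowed(detections, min_conf=0.5):
    """Keep only allow-listed detections at or above the confidence threshold.

    detections: iterable of (label, confidence) pairs from an object detector.
    """
    return [
        (label, conf)
        for label, conf in detections
        if label in ALLOWED and conf >= min_conf
    ]


def found_target(detections, target, min_conf=0.5):
    """Return True if the requested object appears among the filtered detections."""
    if target not in ALLOWED:
        raise ValueError(f"{target!r} is not an allow-listed object")
    return any(label == target for label, _ in filter_allowed(detections, min_conf))


frame_detections = [("cup", 0.91), ("person", 0.97), ("banana", 0.32)]
print(filter_allowed(frame_detections))
print(found_target(frame_detections, "cup"))
```

Filtering by label keeps classes that are irrelevant or unsuitable for a children's game (such as "person") out of the results, regardless of how confident the detector is.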

🚀 Getting Started

Each package is self-contained and has its own detailed setup guide. Pick the package you need and follow its README:

I want to…	Go to
Run the mobile app (Expo, EAS, Firebase config, dev client)	📱 `app/README.md`
Run the backend (FastAPI, models, Firebase Admin, training scripts)	🔧 `api/README.md`

Prerequisites

Node.js 18+ · Python 3.12+ · uv
A Firebase project with Auth + Firestore enabled (free tier is fine) and a service-account JSON for the backend Admin SDK
A development build of the mobile app. Expo Go can't run the native modules used here (camera, Google Sign-In, canvas capture)

Sixty-second walkthrough

git clone https://github.com/developer-ayyaz/vision-learn.git
cd vision-learn

# Terminal 1 — backend
cd api && uv sync && ./run dev                # → http://localhost:8000 (Swagger at /docs)

# Terminal 2 — mobile app
cd app && npm install && npx expo start --dev-client

Before either command works you'll need to drop your .env (and, for the backend, a Firebase service-account JSON) into each package. The full variable list, model file locations, and platform-specific notes are documented in the per-package READMEs linked above.

📁 Repository Layout

vision-learn/
├── 📱 app/        Expo / React Native mobile client     →  see app/README.md
├── 🔧 api/        FastAPI backend with ML pipelines     →  see api/README.md
├── LICENSE
└── README.md     (you are here)

🔌 How the Pieces Fit Together

┌──────────────────────────┐       HTTPS + Firebase ID Token       ┌────────────────────────────┐
│  React Native (Expo)     │  ───────────────────────────────────► │   FastAPI Backend          │
│  app/                    │                                       │   api/                     │
│                          │ ◄────── JSON (bilingual feedback) ─── │  /object-hunt /name-game   │
│                          │                                       │  /gesture-play /drawing-fun│
└──────────────────────────┘                                       └────────────────────────────┘
            │                                                                  │
            │ Firebase SDK                                                     │ Firebase Admin
            ▼                                                                  ▼
        ┌─────────────────────────────────────────────────────────────────────────┐
        │  Firebase Auth (email/password + Google)  +  Firestore (per-user data)  │
        └─────────────────────────────────────────────────────────────────────────┘

The app authenticates with Firebase, then calls FastAPI endpoints with a fresh ID token (Bearer …).
FastAPI verifies the token with the Admin SDK and runs YOLOv8 / MediaPipe / the PyTorch CNN against the submitted image.
Gameplay state (stars, badges, streaks, history, time on task) is written directly from the app to Firestore — the API stays stateless.
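
The token check described above can be sketched roughly as follows. This is a simplified illustration, not the backend's actual code: `extract_bearer_token` and `authenticate` are hypothetical helpers, and the verifier is passed in as a callable so the snippet runs without Firebase credentials. In a real deployment that callable would presumably be `firebase_admin.auth.verify_id_token`, which returns the decoded claims including `uid`.

```python
def extract_bearer_token(authorization):
    """Return the token from a 'Bearer <token>' header value, or None if malformed."""
    if not authorization:
        return None
    scheme, _, token = authorization.partition(" ")
    token = token.strip()
    if scheme.lower() != "bearer" or not token:
        return None
    return token


def authenticate(authorization, verify):
    """Return the caller's uid, or raise PermissionError.

    verify: callable that takes an ID token string and returns a dict of
    decoded claims (with a "uid" key), raising an exception if it is invalid.
    """
    token = extract_bearer_token(authorization)
    if token is None:
        raise PermissionError("missing or malformed Authorization header")
    try:
        claims = verify(token)
    except Exception as exc:
        raise PermissionError("invalid ID token") from exc
    return claims["uid"]
```

Because the API keeps no state of its own, a check like this would run on every request before the submitted image reaches a model.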

🎨 Our Colors

We chose colors that are easy on young eyes but still fun and engaging. The full design system lives in app/constants/colors.ts.

Name	Color	Used For
Primary	#2B4D84	Buttons, active elements
On Primary	#FFFFFF	Text on buttons
Background	#E6EFFF	Screen backgrounds
Text	#1C3D74	Headings, body text

👥 Contributors

VisionLearn, including its product design, UI, and implementation, is built and maintained by:

Fahad Ayyaz (@developer-ayyaz)
Muhammad Fahad (@muhammadfahad9)
Jawad Ahmad (@Jdahmad313)

Acknowledgements

Logo design by Shujaat Ali — shujaatdesigns.framer.website

🤝 Contributing

We'd love your help making VisionLearn even better! Here's how to jump in:

Fork this repo
Create a branch for your feature (git checkout -b feature/cool-idea)
Make your changes. Please match the existing code style (TypeScript on the app, type-annotated Python on the API)
Run the linter (npm run lint in app/) and verify the API starts (./run dev in api/)
Submit a pull request with a clear description

💡 Please follow clean code principles and the conventions already established in app/README.md and api/README.md to keep the codebase consistent.

Whether it's fixing a typo, adding a feature, or improving the AI, every contribution counts.

📄 License

VisionLearn is open source under the MIT License. Use it, learn from it, build upon it.

💬 Questions?

Got questions, ideas, or just want to say hi?

🐛 Open an issue for bugs or feature requests
💡 Start a discussion for questions
⭐ Star the repo if you find it useful

Built with ❤️ for curious little minds

VisionLearn — A new way to see, play, and learn

Back to top ⬆️
