VisionLearn


Where Learning Meets Play — In Two Languages!



🌟 What is VisionLearn?

Remember how we all learned as kids? By touching things, playing games, and asking "what's that?" a hundred times a day. That's exactly what VisionLearn brings to the digital world.

We've built an app that turns your phone into a magical learning companion for little ones aged 3 to 7. Instead of boring flashcards and repetitive drills, kids can:

  • 📷 Point their camera at a banana and hear "Banana! کیلا!" in both English and Urdu
  • 👋 Wave their hand and watch the app recognize their gestures
  • 🎨 Trace letters, numbers, and shapes with their fingers and get instant AI feedback
  • 🔍 Go on treasure hunts to find objects around the house

Everything works in both English and Urdu, helping children from bilingual families feel right at home.

💡 Why We Built This

Most educational apps treat kids like tiny robots: tap here, repeat that, watch this animation. But that's not how children actually learn. They learn by exploring, making mistakes, and having fun.

VisionLearn uses real camera-based interactions and gesture recognition to make learning feel like play. No passive screen time, just active, engaging exploration, with parental controls that let grown-ups stay in charge of when and how long.


✨ Features

📚 Learn the Basics

Four core learning modules with beautiful visuals and bilingual narration (English + Urdu via on-device text-to-speech):

| Module | What's Included |
| --- | --- |
| Alphabets | A–Z with phonics, example words, and fun facts |
| Numbers | 1–10 with quantity visuals and pronunciation |
| Shapes | 10 everyday shapes with names, sides, and examples |
| Colors | 8 colors with real-world examples |

Items unlock sequentially as the child progresses, and each module ends with a quiz that awards stars.
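
The sequential unlocking rule can be sketched in a few lines. This is an illustration only, written in Python for brevity; the real logic lives in the mobile client, and `ModuleProgress` and `unlocked_items` are hypothetical names, not identifiers from the codebase.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModuleProgress:
    """Hypothetical progress record for one learning module."""
    items: tuple          # ordered lesson items, e.g. ("A", "B", "C")
    completed: frozenset  # items the child has already finished


def unlocked_items(progress):
    """Return the leading run of completed items plus the first unfinished one."""
    unlocked = []
    for item in progress.items:
        unlocked.append(item)
        if item not in progress.completed:
            break  # everything after the first unfinished item stays locked
    return unlocked
```

With items `("A", "B", "C")` and only `"A"` completed, the child can open `"A"` and `"B"`, while `"C"` stays locked.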

🎮 Play & Practice

Four AI-powered activities that turn practice into playtime:

| Activity | How It Works | AI Behind It |
| --- | --- | --- |
| Drawing Fun | Trace a letter, number, or shape; the app classifies it and tells the child how they did | PyTorch CNN (MNIST / EMNIST) + OpenCV contour analysis |
| Gesture Play | Show one of 8 hand gestures (thumbs up/down, peace, fist, open palm, pointing up, OK, rock) | MediaPipe Hand Landmarker + rule-based classifier |
| Object Hunt | "Find a cup!" The child points the camera, and the app verifies that it found the right object | Ultralytics YOLOv8 over 40 allow-listed everyday objects |
| Name Game | Point the camera at an object, and the app generates a 4-option bilingual quiz | YOLOv8 + on-server quiz generator |
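
As a rough sketch of how the Name Game could turn a detection into a 4-option bilingual quiz, assuming the detector has already produced a single label: the `ALLOWED_OBJECTS` table below is a hypothetical five-entry stand-in for the real 40-item allow-list, and `build_quiz` is an illustrative name rather than the backend's actual function.

```python
import random

# Hypothetical slice of the allow-list: COCO label -> Urdu name.
ALLOWED_OBJECTS = {
    "banana": "کیلا",
    "apple": "سیب",
    "cup": "کپ",
    "book": "کتاب",
    "chair": "کرسی",
}


def build_quiz(detected_label, rng=None):
    """Return a 4-option quiz for an allow-listed label, or None otherwise."""
    if detected_label not in ALLOWED_OBJECTS:
        return None  # labels outside the allow-list never reach the child
    rng = rng or random.Random()
    others = [label for label in ALLOWED_OBJECTS if label != detected_label]
    options = rng.sample(others, 3) + [detected_label]
    rng.shuffle(options)  # avoid a predictable position for the answer
    return {
        "answer": detected_label,
        "options": [{"en": label, "ur": ALLOWED_OBJECTS[label]} for label in options],
    }
```

Passing a seeded `random.Random` makes the option order reproducible, which is convenient in tests.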

🎁 What Makes It Special

  • 🌐 True Bilingual Experience: Every prompt, hint, and success message is authored in both English and Urdu, with native TTS pronunciation
  • 📷 Smart Camera Recognition: 40 allow-listed everyday objects from YOLOv8's 80 COCO classes
  • 👋 Gesture Magic: 8 distinct hand gestures recognized from MediaPipe's 21 landmarks
  • 🏆 Rewards That Motivate: Stars, badges, rewards, XP levels, and a daily streak system
  • 📊 Parent Dashboard: Watch progress per module, see daily activity time, and review session history
  • 🔒 PIN-Protected Controls: Parental dashboard, settings, and PIN updates are all gated by a device-local PIN and a math challenge
  • ⏱ Daily Time Limits: Configurable per-day cap with per-screen time tracking; gameplay routes lock when the cap is reached
  • 🎨 Free Canvas: A separate drawing space for creative play with save-to-gallery and share
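
The daily time limit boils down to comparing tracked time against the cap. A minimal sketch, assuming time is tracked as seconds per screen; the function names and the data shape are hypothetical, and the app itself implements this on the client.

```python
def remaining_seconds(seconds_by_screen, daily_cap_minutes):
    """Seconds of play left today, never negative."""
    used = sum(seconds_by_screen.values())
    return max(0, daily_cap_minutes * 60 - used)


def gameplay_locked(seconds_by_screen, daily_cap_minutes):
    """Gameplay routes lock once the daily cap has been reached."""
    return remaining_seconds(seconds_by_screen, daily_cap_minutes) == 0
```

For example, with a 30-minute cap and 20 minutes already spent across two screens, 600 seconds remain and gameplay stays open.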

💾 About data: Sign-in, profile, progress, achievements, streaks, activity history, and time tracking are stored in Firebase Auth + Firestore. The vision endpoints receive only the frame being analyzed and discard it after inference.


🛠 Tech Stack

| Package | Stack |
| --- | --- |
| 📱 app/ (mobile client) | Expo SDK 54 + Expo Router, React Native 0.81 (New Architecture), React 19, TypeScript 5.9, Firebase JS SDK, Google Sign-In |
| 🔧 api/ (backend) | FastAPI 0.124+, uv, Firebase Admin SDK, Ultralytics YOLOv8, MediaPipe Tasks, PyTorch (CPU), OpenCV, Pydantic v2 |

Full dependency lists live in app/README.md and api/README.md.

🧠 Inference Models

| Model | What It Recognizes |
| --- | --- |
| YOLOv8s | 40 allow-listed everyday objects (fruits, toys, animals, household items) from the 80 COCO classes |
| MediaPipe Hand Landmarker | 21-point hand landmarks fed into a rule-based classifier for 8 gestures |
| CharCNN (digits) | MNIST-trained classifier for digits 0–9 |
| CharCNN (letters) | EMNIST-Letters-trained classifier for A–Z |
| OpenCV shape detector | 6 shapes (circle, square, triangle, rectangle, star, diamond) via contour analysis |
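
To make "rule-based classifier" concrete, here is a simplified sketch that works on MediaPipe's 21 hand landmarks. It is not the project's classifier: it covers only the gestures that can be told apart by which of the four non-thumb fingers are extended, it assumes an upright hand in image coordinates (y grows downward), and it ignores the thumb, so thumbs up/down and OK are out of scope. The landmark indices follow MediaPipe's hand model; the rule table and function names are assumptions.

```python
# MediaPipe hand landmark indices: (fingertip, PIP joint) per finger.
FINGER_TIP_PIP = {
    "index": (8, 6),
    "middle": (12, 10),
    "ring": (16, 14),
    "pinky": (20, 18),
}

# Hypothetical rule table: set of extended fingers -> gesture name.
RULES = {
    frozenset(): "fist",
    frozenset({"index"}): "pointing_up",
    frozenset({"index", "middle"}): "peace",
    frozenset({"index", "pinky"}): "rock",
    frozenset(FINGER_TIP_PIP): "open_palm",
}


def extended_fingers(landmarks):
    """A finger counts as extended when its tip is above its PIP joint."""
    if len(landmarks) != 21:
        raise ValueError("expected 21 (x, y) landmarks")
    return frozenset(
        name
        for name, (tip, pip) in FINGER_TIP_PIP.items()
        if landmarks[tip][1] < landmarks[pip][1]  # smaller y means higher up
    )


def classify_gesture(landmarks):
    """Return a gesture name, or None when no rule matches."""
    return RULES.get(extended_fingers(landmarks))


def synthetic_hand(extended):
    """Build fake landmarks for demos: tips above or below their PIP joints."""
    landmarks = [(0.5, 0.5)] * 21
    for name, (tip, pip) in FINGER_TIP_PIP.items():
        landmarks[pip] = (0.5, 0.4)
        landmarks[tip] = (0.5, 0.2) if name in extended else (0.5, 0.6)
    return landmarks
```

`classify_gesture(synthetic_hand({"index", "middle"}))` matches the peace rule. A production classifier would also need thumb rules and some tolerance for rotated hands.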

Training & evaluation scripts live in api/scripts/, and reproducible metrics (confusion matrices, classification reports) are saved to api/evaluation/.


🚀 Getting Started

Each package is self-contained and has its own detailed setup guide. Pick the package you need and follow its README:

| I want to… | Go to |
| --- | --- |
| Run the mobile app (Expo, EAS, Firebase config, dev client) | 📱 app/README.md |
| Run the backend (FastAPI, models, Firebase Admin, training scripts) | 🔧 api/README.md |

Prerequisites

  • Node.js 18+ · Python 3.12+ · uv
  • A Firebase project with Auth + Firestore enabled (free tier is fine) and a service-account JSON for the backend Admin SDK
  • A development build of the mobile app. Expo Go can't run the native modules used here (camera, Google Sign-In, canvas capture)

Sixty-second walkthrough

```bash
git clone https://github.com/developer-ayyaz/vision-learn.git
cd vision-learn

# Terminal 1: backend
cd api && uv sync && ./run dev                # → http://localhost:8000 (Swagger at /docs)

# Terminal 2: mobile app
cd app && npm install && npx expo start --dev-client
```

Before either command works, you'll need to drop your .env (and, for the backend, a Firebase service-account JSON) into each package. The full variable list, model file locations, and platform-specific notes are documented in the per-package READMEs linked above.


📁 Repository Layout

vision-learn/
├── 📱 app/        Expo / React Native mobile client     →  see app/README.md
├── 🔧 api/        FastAPI backend with ML pipelines     →  see api/README.md
├── LICENSE
└── README.md     (you are here)

🔌 How the Pieces Fit Together

┌──────────────────────────┐       HTTPS + Firebase ID Token       ┌────────────────────────────┐
│  React Native (Expo)     │  ───────────────────────────────────► │   FastAPI Backend          │
│  app/                    │                                       │   api/                     │
│                          │ ◄────── JSON (bilingual feedback) ─── │  /object-hunt /name-game   │
│                          │                                       │  /gesture-play /drawing-fun│
└──────────────────────────┘                                       └────────────────────────────┘
            │                                                                  │
            │ Firebase SDK                                                     │ Firebase Admin
            ▼                                                                  ▼
        ┌─────────────────────────────────────────────────────────────────────────┐
        │  Firebase Auth (email/password + Google)  +  Firestore (per-user data)  │
        └─────────────────────────────────────────────────────────────────────────┘
  • The app authenticates with Firebase, then calls FastAPI endpoints with a fresh ID token (Bearer …).
  • FastAPI verifies the token with the Admin SDK and runs YOLOv8 / MediaPipe / the PyTorch CNN against the submitted image.
  • Gameplay state (stars, badges, streaks, history, time on task) is written directly from the app to Firestore — the API stays stateless.
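
The token check described above can be sketched without any framework code. This is an assumption-laden outline, not the backend's implementation: `extract_bearer_token` and `authenticate` are hypothetical helpers, and the verifier is injected so the example runs without Firebase credentials. In a real deployment the verifier would typically be `firebase_admin.auth.verify_id_token`, which returns the decoded claims as a dict that includes `uid`.

```python
from typing import Callable, Optional


class AuthError(Exception):
    """Raised when a request carries no usable Bearer token."""


def extract_bearer_token(authorization: Optional[str]) -> str:
    """Pull the token out of an 'Authorization: Bearer <token>' header value."""
    if not authorization:
        raise AuthError("missing Authorization header")
    scheme, _, token = authorization.partition(" ")
    if scheme.lower() != "bearer" or not token.strip():
        raise AuthError("expected 'Bearer <token>'")
    return token.strip()


def authenticate(authorization: Optional[str], verify: Callable[[str], dict]) -> str:
    """Verify the ID token and return the caller's uid.

    `verify` is injected; it should raise on invalid or expired tokens.
    """
    claims = verify(extract_bearer_token(authorization))
    return claims["uid"]
```

Because the API keeps no gameplay state, the uid is only needed to confirm that the caller is signed in before inference runs.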

🎨 Our Colors

We chose colors that are easy on young eyes but still fun and engaging. The full design system lives in app/constants/colors.ts.

| Name | Color | Used For |
| --- | --- | --- |
| Primary | #2B4D84 | Buttons, active elements |
| On Primary | #FFFFFF | Text on buttons |
| Background | #E6EFFF | Screen backgrounds |
| Text | #1C3D74 | Headings, body text |

👥 Contributors

VisionLearn, including its product design, UI, and implementation, is built and maintained by:

  • Fahad Ayyaz (@developer-ayyaz)
  • Muhammad Fahad (@muhammadfahad9)
  • Jawad Ahmad (@Jdahmad313)



🤝 Contributing

We'd love your help making VisionLearn even better! Here's how to jump in:

  1. Fork this repo
  2. Create a branch for your feature (git checkout -b feature/cool-idea)
  3. Make your changes, matching the existing code style (TypeScript on the app, type-annotated Python on the API)
  4. Run the linter (npm run lint in app/) and verify the API starts (./run dev in api/)
  5. Submit a pull request with a clear description

💡 Please follow clean code principles and the conventions already established in app/README.md and api/README.md to keep the codebase consistent.

Whether it's fixing a typo, adding a feature, or improving the AI, every contribution counts.


📄 License

VisionLearn is open source under the MIT License. Use it, learn from it, build upon it.


💬 Questions?

Got questions, ideas, or just want to say hi?

  • 🐛 Open an issue for bugs or feature requests
  • 💡 Start a discussion for questions
  • ⭐ Star the repo if you find it useful

Built with ❤️ for curious little minds

VisionLearn — A new way to see, play, and learn
