Darsh29/README.md
Hi, I'm Darsh Vora



ML/AI Engineer with 2+ years building production LLM systems, deep learning models, and scalable data pipelines delivering $229K+ verified business impact across robotics, marketing intelligence, and e-commerce. Proficient in Python, PyTorch, and TensorFlow with hands-on experience in GenAI, RAG, end-to-end MLOps, and distributed data engineering.

I came to ML through Electronics Engineering. Hardware teaches you that latency compounds, systems degrade under load, and the gap between a prototype and something production-ready is almost never just a code problem. That mindset is in everything I build.

Currently at Tatum Robotics, I work on production ASR at <200 ms latency, a 3x edge inference speedup via INT8 quantization, and a text-to-ASL engine covering 3,000+ phrases.

"Always in Beta. Always Compounding."


🟢  Open to new opportunities

I'm actively looking for my next role. If you're working on something hard and care about what gets shipped, let's talk.

Roles I'm targeting:

ML Engineer · AI Engineer · Data Scientist · Data Analyst · Data Engineer

Industries: I have worked across FinTech, robotics, marketing, and AI, and I'm always looking for new domains to dive into.


🛠  When I code, I rely on

| Domain | Tools |
| --- | --- |
| Languages | Python · C# (.NET) · C++ · R · Go · SQL (PostgreSQL · MySQL) · React · TypeScript · Git |
| GenAI / NLP | LangChain · LangGraph · LlamaIndex · RAG · Whisper ASR · BERT · Transformers · LoRA · QLoRA · PEFT · spaCy · NLTK · Vector DBs · FAISS · ChromaDB |
| ML / DL | PyTorch · TensorFlow · TFLite · Keras · XGBoost · LightGBM · SHAP · CUDA · CNN · LSTM · RNN · GANs · BLIP-2 · statsmodels · SciPy · Hugging Face |
| Computer Vision | YOLOv8 · ByteTrack · MediaPipe · ResNet · EfficientNet · Tesseract OCR · OpenCV |
| MLOps / Cloud | Docker · Kubernetes · MLflow · Airflow · AWS SageMaker · AWS Lambda · Azure · GCP · Terraform · FastAPI · Kafka · CI/CD |
| Data Engineering | Spark · PySpark · dbt · Snowflake · ETL · Redis · MySQL · PostgreSQL · MongoDB |
| Analytics / BI | Tableau · Power BI · Looker · Plotly · Streamlit · A/B Testing · Causal Inference · Hypothesis Testing · Bayesian Methods · Propensity Score Matching |

⚡  What I've built and shipped

Real systems, real constraints, real production.

Tatum Robotics - AI Software Engineer (Aug 2025 – Present)

Building production AI systems for robotic communication at the intersection of speech, language, and gesture.

  • 🎙️  Whisper ASR pipeline containerized with CI/CD version control and automated quality validation, processing 500+ daily utterances at 95%+ accuracy and <200ms latency
  • 🤟  Text-to-ASL translation engine on a C# (.NET) backend, with a gesture mapping layer that maps 3,000+ phrases to 26 hand configurations across diverse signing contexts
  • ⚡  Post-training quantization (FP32 to INT8) benchmarked across GPU (CUDA) vs. CPU latency profiles, delivering 3x on-device inference speedup, 70% model compression, and <1% accuracy loss
  • 📉  Redesigned the gesture-to-phrase mapping pipeline, reducing ASL interpretation latency by 40% and improving response consistency across varying input conditions
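
The FP32-to-INT8 step mentioned in the quantization bullet above can be illustrated with a minimal sketch. This is not the Tatum Robotics implementation: it assumes symmetric per-tensor quantization, and the names `quantize_int8` and `dequantize` and the sample weights are hypothetical, chosen only to show the arithmetic.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    # One scale for the whole tensor; fall back to 1.0 for an all-zero tensor.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float values from the INT8 codes."""
    return [v * scale for v in q]


# Hypothetical weights, for illustration only.
weights = [0.5, -1.27, 0.0, 0.9]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# Rounding to the nearest code bounds the per-weight error by half a step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, max_error)
```

Each weight is stored in 1 byte instead of 4, which is where the size reduction comes from; real toolchains typically add calibration data and per-channel scales on top of this basic idea.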

Crewasis AI - ML Engineer Intern (Jan 2025 - Jun 2025)

Built multimodal ML infrastructure for marketing intelligence at scale across social media platforms.

  • 🧠  Fine-tuned BLIP-2 with LoRA adapters and deployed a multimodal RAG system over audio, video, and text, containerized with Docker, processing 5K+ daily social media assets
  • 🚀  Scaled ETL pipeline throughput 60x (30 min to 30 sec) by deploying Python workers on AWS Lambda with Airflow triggers and automated data quality checks, saving $19K+ annually
  • 🔍  Built a vector search system across 1.6M+ records integrating REST APIs (YouTube, Instagram, TikTok) with FAISS vector retrieval at sub-3s query latency, orchestrated with Kubernetes
  • 📊  Validated a 29% cost advantage across 20+ A/B experiments using MLflow tracking, translating results into deployment decisions for senior leadership

Red Moments Pvt Ltd - Jr. Data Scientist (Jun 2022 - May 2023)

Data science and analytics across manufacturing and e-commerce operations in Mumbai.

  • 📈  Built time-series forecasting models (Prophet + XGBoost) on 75K+ transactions with SQL-driven feature engineering, improving production planning by 23%
  • 💰  Designed A/B testing frameworks translating business questions into structured recommendations, generating $100K annually with 16% inventory reduction
  • 🏗️  Constructed ETL pipelines with dbt transformation workflows and CI/CD schema validation, lifting margins by 9% and producing $80K in revenue
  • ⏱️  Built Tableau and Power BI dashboards with documented KPI definitions for cross-functional stakeholders, cutting reporting turnaround from 3 days to real time and saving $30K annually

🔬  Selected projects

  • 🔍  FinSight RAG - Hybrid RAG pipeline with MiniLM embeddings, dense/sparse retrieval, and semantic reranking over SEC 10-K filings. Benchmarks 7 retrieval strategies via an LLM-as-judge framework. 94% query success · 4.25/5 relevance · 42% latency cut · 40% API cost reduction
  • 🎵  Speech Emotion Recognition - CNN-LSTM with MFCC, mel-spectrogram, and chroma feature extraction on 15K+ audio samples. 90.5% accuracy · 90.4 F1 across 8 classes · outperformed InceptionV3 baseline by 3% while training 25% faster
  • 🦅  Bird Species Classification - 4 CNN architectures benchmarked on 89,885 images across 100 species. Deep VGG-style CNN vs. InceptionV3 transfer learning. 90.5% accuracy · 90.4% F1
  • 🧬  NeuroDigest AI - Agentic LLM pipeline using LangChain reasoning chains to ingest multi-format sources and generate structured digests
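
The FinSight RAG entry above combines dense and sparse retrieval but does not say how the two result lists are merged. One common option is reciprocal rank fusion (RRF); the sketch below assumes that technique purely for illustration. The function name `reciprocal_rank_fusion`, the document IDs, and the constant `k=60` are hypothetical and not taken from the project.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores 1 / (k + rank) in every list it appears in;
    the scores are summed and documents are sorted by total score.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score, breaking ties by ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical result lists from a dense (embedding) and a sparse (keyword) retriever.
dense_hits = ["doc_b", "doc_a", "doc_c"]
sparse_hits = ["doc_a", "doc_d", "doc_b"]

fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
print(fused)
```

Documents that rank well in both lists rise to the top, and the fused list can then be passed to a reranker such as the semantic reranking stage the entry mentions.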


🤝  Let's connect

I'm always open to conversations about interesting problems and the right opportunities. Reach out through any of the channels below.

Portfolio · LinkedIn · Email · AWS Certified


MS Data Analytics Engineering · Northeastern University · GPA 3.94  ·  AWS ML Engineer Certified  ·  Published Research · ISBN 978-93-5777-300-3

Popular repositories

  1. Bird-Species-Image-Classification Public

    4 CNN architectures benchmarked on 89,885 images across 100 bird species. Deep VGG-style CNN achieves 90.5% accuracy and 90.4% F1 score. Includes InceptionV3 transfer learning comparison.

    Jupyter Notebook 1

  2. Treatlive_Healthcare_Database Public

    As the lead developer of 'TreatLive,' I integrated 50+ U.S. healthcare entities, enhancing access for 10,000+ users. By engineering a hybrid MySQL/MongoDB database, I boosted processing efficiency …

  3. Bankruptcy_Prediction Public

    Jupyter Notebook

  4. speech-emotion-recognition Public

    Forked from x4nth055/emotion-recognition-using-speech

    Building and training a speech emotion recognizer that predicts human emotions using Python, scikit-learn, and Keras

    Python

  5. NeuroDigest-AI Public

    Python

  6. darshvora_portfolio Public

    HTML