I am a Data Scientist focused on building production-grade machine learning pipelines, scalable data tools, and intelligent systems. I believe in writing modular, object-oriented code and designing self-service data solutions that empower cross-functional teams.
- 📍 Based in Germany
- ⚡ Tech Philosophy: Clean OOP architecture, model explainability, and breaking data bottlenecks.
- 🌱 Currently exploring: Advanced MLOps workflows and AI Agents.
Click a domain button below to expand and explore the repositories:
👁️ Computer Vision & Multimodal AI
- Tech Stack: Python, Vision Transformer (ViT), PyTorch, MLflow
- Core Focus: AgroVision-Dx2 is a specialized deep learning tool designed to identify plant pathologies from leaf images. Unlike traditional Convolutional Neural Networks (CNNs), which focus on local pixel neighborhoods, this project uses a Vision Transformer (ViT), whose self-attention relates image patches across the whole leaf.
- Tech Stack: Python, Scikit-learn, PyTorch, Docker, CLIP, ResNet50, EfficientNet
- Core Focus: Engineered a zero-shot image classifier using CLIP prompting. It can be used to label images in unlabelled datasets. Compared its performance against ResNet and EfficientNet.
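
The zero-shot idea behind the CLIP project can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the repository's code: the embedding vectors are hand-written placeholders standing in for what CLIP's image and text encoders would produce, and `zero_shot_classify`, the prompts, and the `temperature` value are hypothetical names and choices made for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_classify(image_embedding, prompt_embeddings, temperature=100.0):
    """Pick the text prompt whose embedding is closest to the image embedding.

    `prompt_embeddings` maps each prompt string to its (placeholder) embedding.
    `temperature` scales similarities before the softmax; the value here is an
    assumption for illustration.
    """
    labels = list(prompt_embeddings)
    sims = [cosine_similarity(image_embedding, prompt_embeddings[label])
            for label in labels]
    probs = softmax([temperature * s for s in sims])
    best = max(range(len(labels)), key=lambda i: probs[i])
    return labels[best], dict(zip(labels, probs))


# Placeholder embeddings (assumption): a real pipeline would obtain these
# from CLIP's text encoder and image encoder respectively.
prompt_embeddings = {
    "a photo of a cat": [0.9, 0.1, 0.0],
    "a photo of a dog": [0.1, 0.9, 0.0],
    "a photo of a car": [0.0, 0.1, 0.9],
}
image_embedding = [0.8, 0.2, 0.1]

label, probs = zero_shot_classify(image_embedding, prompt_embeddings)
print(label)
```

No labelled training images are needed in this setup: adding a class only means adding another prompt, which is what makes the approach useful for labelling unlabelled data.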
📊 Tabular Data & Explainable AI
- Tech Stack: Python, Scikit-learn, XGBoost, LightGBM, CatBoost, Isolation Trees
- Core Focus: Developed an end-to-end anomaly detection framework that compares supervised and unsupervised ML approaches and investigates how each model behaves on this task.
- Tech Stack: Python, XGBoost, LightGBM, Pandas
- Core Focus: Built a business-driven predictive modeling system achieving 94% recall to proactively capture high-risk customer churn. Emphasized structured evaluation metrics and rigorous feature engineering pipelines.
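
For readers less familiar with the recall metric mentioned for the churn model, here is a small sketch of how it is computed. The labels below are toy values invented for illustration, not the project's data or results, and `recall_from_labels` is a hypothetical helper name.

```python
def recall_from_labels(y_true, y_pred, positive=1):
    """Recall = true positives / (true positives + false negatives)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp + fn == 0:
        raise ValueError("recall is undefined without positive examples")
    return tp / (tp + fn)


# Toy labels (assumption): 1 = customer churned, 0 = customer stayed.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]

churn_recall = recall_from_labels(y_true, y_pred)
print(churn_recall)  # 0.75: three of the four churners were caught
```

Recall is a natural headline metric for churn because a missed churner (false negative) usually costs more than an unnecessary retention offer (false positive).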
📊 NLP & LLMs
- Tech Stack: Python, Scikit-learn, PyTorch, Hugging Face (DistilBERT), NLP Pipelines, FastAPI
- Core Focus: Engineered a natural language processing tool utilizing DistilBERT to perform large-scale sentiment analysis on engagement data, identifying bot activity and quantifying audience authenticity.
- Tech Stack: Python, Scikit-learn, PyTorch, Hugging Face (BERT), NLP Pipelines
- Core Focus: Implements sentiment analysis using BERT (Bidirectional Encoder Representations from Transformers). The goal is to classify text as positive or negative, leveraging BERT's ability to model deep contextual relations in text.
- Tech Stack: Python, Scikit-learn, PyTorch, Hugging Face (DistilBERT), NLP Pipelines
- Core Focus: Implements sentiment analysis using a Bidirectional LSTM (BiLSTM) neural network. The goal is to classify text as positive or negative, leveraging the ability of BiLSTMs to capture contextual information from both past and future tokens in a sequence.
| Category | Technologies |
|---|---|
| Languages | Python (Pandas, NumPy, PySpark), SQL, R |
| Classic ML & Tuning | XGBoost, LightGBM, CatBoost, Random Forest, Scikit-learn, MLflow, Optuna |
| Deep & Multimodal Learning | PyTorch, TensorFlow, Keras, Hugging Face Transformers, CLIP, OpenCV |
| MLOps & Infrastructure | Docker, CI/CD (GitHub Actions), FastAPI, Streamlit, Git, Bitbucket, GCP (Cloud Run, Dataflow, BigQuery) |
| BI & Analytics | Looker, Power BI, Tableau, Matplotlib, Seaborn, Excel |
- Technical Mentorship: Former Data Science Teaching Instructor who guided end-to-end student capstone analytics projects, focusing on robust problem framing, hypothesis testing, and interactive reporting using Power BI and Tableau.
- Tech Leadership: Actively involved in driving developer communities, organizing regional tech initiatives, managing speaker acquisitions, and structuring agile hackathons.
- LinkedIn: linkedin.com/in/anne-mburu
- Kaggle: kaggle.com/annemburu
- Email: annemburu11@gmail.com

