🎓 BSc Data Science & AI @ Leiden University | 📍 Leiden (open to relocate)
- 🛠️ Tech stack: Python · SQL · R · scikit-learn · pandas · NumPy · statsmodels · Power BI · Tableau · Excel · Git
- 📊 Direction: Data Cleaning & Visualization, Machine Learning Modeling, Risk Analytics, Business Analytics, Data Engineering
- BSc Data Science & Artificial Intelligence, Leiden University, NL | Sep 2023 – Jun 2026
- MBA in Professional Accounting, Rutgers University, USA | May 2019 – Oct 2020
- BA Business Administration, Beijing Normal–Hong Kong Baptist University, CN | Sep 2014 – Jun 2018
🐍 Python · pandas · NumPy · scikit-learn
- Exploratory analysis and visualization of datasets
- RFM feature engineering and an end-to-end scikit-learn Pipeline (StandardScaler → OneHotEncoder → LogisticRegression)
- Packaged as a Jupyter Notebook demo
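
A minimal sketch of the RFM-plus-Pipeline shape described above, not the project's actual code. The transaction table, the `channel` column, and the `responded` label are all hypothetical and exist only to make the example runnable.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical transaction log; column names are illustrative only.
tx = pd.DataFrame({
    "customer_id": [1, 1, 2, 3, 3, 3, 4, 5, 5, 6],
    "date": pd.to_datetime([
        "2024-01-05", "2024-03-01", "2024-02-10", "2024-01-20", "2024-02-15",
        "2024-03-10", "2023-12-01", "2024-01-15", "2024-02-28", "2023-11-11",
    ]),
    "amount": [50.0, 20.0, 200.0, 30.0, 45.0, 60.0, 15.0, 80.0, 120.0, 10.0],
})

# Recency is measured in days back from the day after the last transaction.
snapshot = tx["date"].max() + pd.Timedelta(days=1)
rfm = tx.groupby("customer_id").agg(
    recency=("date", lambda d: (snapshot - d.max()).days),
    frequency=("date", "count"),
    monetary=("amount", "sum"),
).reset_index()

# Assumed extras: one categorical feature and a binary target.
rfm["channel"] = ["web", "store", "web", "app", "store", "app"]
rfm["responded"] = [1, 0, 1, 1, 0, 0]

# Scale the numeric RFM columns, one-hot encode the categorical column.
preprocess = ColumnTransformer([
    ("num", StandardScaler(), ["recency", "frequency", "monetary"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["channel"]),
])
model = Pipeline([
    ("preprocess", preprocess),
    ("clf", LogisticRegression(max_iter=1000)),
])

X = rfm.drop(columns=["customer_id", "responded"])
model.fit(X, rfm["responded"])
proba = model.predict_proba(X)[:, 1]  # in-sample probabilities, demo only
```

Keeping the scaler and encoder inside the `Pipeline` means they are fitted only on whatever data `fit` receives, which avoids leaking statistics from held-out rows.
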
🐍 Python · pandas · scikit-learn · SHAP · Matplotlib · Seaborn
- Built a leakage-aware customer churn prediction pipeline for bank customer retention analysis
- Diagnosed unrealistic near-perfect model performance caused by a complaint-related feature and rebuilt the model under a more realistic setting
- Compared Logistic Regression, Random Forest, and Gradient Boosting models using precision, recall, F1, ROC-AUC, and PR-AUC
- Applied threshold tuning, cost-sensitive analysis, and customer segmentation to translate model outputs into retention strategy insights
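
As an illustration of cost-sensitive threshold tuning, the sketch below picks the decision threshold that minimizes total misclassification cost on a validation set. The data is synthetic, and the two cost constants are assumptions rather than figures from the project.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in for a churn dataset (about 20% positives).
X, y = make_classification(
    n_samples=2000, n_features=10, weights=[0.8, 0.2], random_state=0
)
X_tr, X_val, y_tr, y_val = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
p_val = clf.predict_proba(X_val)[:, 1]

# Assumed business costs: a missed churner is far more expensive
# than an unnecessary retention offer.
COST_FN = 500.0
COST_FP = 50.0

def total_cost(y_true, proba, threshold):
    """Total cost of false negatives and false positives at a threshold."""
    pred = proba >= threshold
    fn = np.sum((y_true == 1) & ~pred)
    fp = np.sum((y_true == 0) & pred)
    return COST_FN * fn + COST_FP * fp

thresholds = np.linspace(0.05, 0.95, 19)
costs = np.array([total_cost(y_val, p_val, t) for t in thresholds])
best_threshold = thresholds[costs.argmin()]
```

With asymmetric costs like these, the cost-minimizing threshold usually sits below the default 0.5, trading some precision for recall.
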
🐍 Python · pandas · NumPy · Power BI
- Preprocessing of high-volume credit card transaction data
- Anomaly detection model training and evaluation
- Visual dashboards to show anomaly trends
🐍 Python · pandas · NumPy · scikit-learn · statsmodels · matplotlib · seaborn
- Built an interpretable credit risk scorecard using LendingClub loan data with manually implemented WOE, IV, PSI, and score scaling
- Designed a leakage-aware modeling pipeline by removing post-origination variables and using time-based train / validation / test splitting
- Trained a logistic regression scorecard with 0.663 test ROC-AUC, 0.412 PR-AUC, and 0.234 KS on out-of-time test data
- Converted predicted default probabilities into credit scores and risk bands, showing around 5x bad-rate difference between highest-risk and lowest-risk bands
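
The WOE, IV, and score-scaling steps can be sketched as follows. This uses one common convention (WOE as the log ratio of the good share to the bad share, and points-to-double-odds scaling). The toy loan table and the `base_score`, `base_odds`, and `pdo` values are assumptions, not the project's settings.

```python
import numpy as np
import pandas as pd

def woe_iv(df, feature, target):
    """Per-bin WOE and total IV for a binned feature (target: 1 = bad)."""
    grouped = df.groupby(feature)[target].agg(bad="sum", total="count")
    grouped["good"] = grouped["total"] - grouped["bad"]
    dist_good = grouped["good"] / grouped["good"].sum()
    dist_bad = grouped["bad"] / grouped["bad"].sum()
    # Real data needs smoothing for bins with zero goods or zero bads.
    grouped["woe"] = np.log(dist_good / dist_bad)
    grouped["iv"] = (dist_good - dist_bad) * grouped["woe"]
    return grouped, grouped["iv"].sum()

def probability_to_score(p_bad, base_score=600.0, base_odds=20.0, pdo=50.0):
    """Map a default probability to a score.

    base_odds is the good:bad odds at base_score; every doubling of the
    good:bad odds adds pdo points. All three defaults are hypothetical.
    """
    factor = pdo / np.log(2.0)
    odds_good = (1.0 - p_bad) / p_bad
    return base_score + factor * (np.log(odds_good) - np.log(base_odds))

# Toy data: three grades with rising default rates (5%, 10%, 25%).
loans = pd.DataFrame({
    "grade": np.repeat(["A", "B", "C"], 100),
    "default": np.concatenate([
        np.repeat([1, 0], [5, 95]),
        np.repeat([1, 0], [10, 90]),
        np.repeat([1, 0], [25, 75]),
    ]),
})
table, iv = woe_iv(loans, "grade", "default")
scores = probability_to_score(np.array([1 / 21, 1 / 41]))
```

Under this sign convention a higher WOE marks a safer bin, and a higher score marks a lower predicted default probability.
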
🐍 Python · pandas · NumPy · Darts · scikit-learn · Matplotlib
- Built a 561-component hierarchical retail forecasting pipeline across total, store, item, and store-item sales levels
- Compared Seasonal Naive and Linear Regression models, reducing Total MAPE from 34.98% to 6.80%
- Applied forecast reconciliation, with Top-Down improving Store-item MAPE from 20.05% to 15.50%
- Identified high-risk store-item demand segments to support inventory replenishment analysis
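
A simplified NumPy sketch of top-down reconciliation and the MAPE metric, assuming proportions taken from historical shares of the total. The project itself lists Darts in its stack; this stand-in does not use the Darts API, and all numbers are made up.

```python
import numpy as np

def mape(actual, forecast):
    """Mean absolute percentage error, in percent (actuals must be nonzero)."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return 100.0 * np.mean(np.abs((actual - forecast) / actual))

# Hypothetical history: rows are time steps, columns are store-item series.
history = np.array([
    [20.0, 30.0, 50.0],
    [22.0, 28.0, 50.0],
    [18.0, 32.0, 50.0],
])

# Each bottom-level series receives its historical share of the total.
proportions = history.sum(axis=0) / history.sum()

# Forecast for the aggregated total over the next two steps (made up).
total_forecast = np.array([110.0, 120.0])

# Top-down: split every total forecast across the bottom-level series.
reconciled = np.outer(total_forecast, proportions)
```

By construction each row of `reconciled` sums back to the total forecast, so the hierarchy stays coherent.
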
🐍 Python · pandas · NumPy · scikit-learn · Matplotlib · Jupyter
- Built a profit-aware pricing optimization workflow using 2,800 historical sales records
- Compared Ridge, Random Forest, and Gradient Boosting demand models, selecting Random Forest with around 25.60% SMAPE
- Recommended product-level prices under observed price ranges and a ±15% price-change guardrail
- Identified major pricing opportunities for Carretera, Paseo, and Velo based on predicted profit uplift
- Added segment-level pricing, discount simulation, robust optimization, and A/B test rollout planning
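
The guardrailed price search can be sketched as a grid search over prices that stay within both the observed price range and ±15% of the current price. The `toy_demand` curve is a hypothetical stand-in for a fitted demand model, and every number here is illustrative.

```python
import numpy as np

def candidate_prices(current, observed_min, observed_max, max_change=0.15, n=31):
    """Price grid inside the observed range and the price-change guardrail."""
    low = max(current * (1 - max_change), observed_min)
    high = min(current * (1 + max_change), observed_max)
    if low > high:
        raise ValueError("guardrail and observed range do not overlap")
    return np.linspace(low, high, n)

def best_price(current, unit_cost, observed_min, observed_max, predict_demand):
    """Return the candidate price with the highest predicted profit."""
    prices = candidate_prices(current, observed_min, observed_max)
    profit = (prices - unit_cost) * predict_demand(prices)
    i = int(np.argmax(profit))
    return prices[i], profit[i]

def toy_demand(price):
    """Hypothetical linear demand curve; a trained model would go here."""
    return np.maximum(0.0, 1000.0 - 40.0 * price)

price, profit = best_price(
    current=10.0, unit_cost=4.0,
    observed_min=8.0, observed_max=11.0,
    predict_demand=toy_demand,
)
```

In this toy case the unconstrained optimum lies above the allowed range, so the search stops at the upper bound of 11.0.
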
Thanks for visiting! 🚀