Zhemin.Xie Ayrie741

ZHEMIN_XIE

🎓 BSc Data Science & AI @ Leiden University | 📍 Leiden (open to relocate)

🛠️ Tech stack: Python · SQL · R · scikit-learn · pandas · NumPy · statsmodels · Power BI · Tableau · Excel · Git
📊 Direction: Data Cleaning & Visualization, Machine Learning Modeling, Risk Analytics, Business Analytics, Data Engineering

BSc Data Science & Artificial Intelligence
Leiden University, NL | Sep 2023 – Jun 2026
MBA in Professional Accounting
Rutgers University, USA | May 2019 – Oct 2020
BA Business Administration
Beijing Normal–Hong Kong Baptist University, CN | Sep 2014 – Jun 2018

🐍 Python · pandas · NumPy · scikit-learn

Exploratory analysis and visualization of datasets
RFM feature engineering and an end-to-end scikit-learn Pipeline (StandardScaler → OneHotEncoder → LogisticRegression)
Delivered as a Jupyter Notebook demo
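
As a rough illustration of the RFM step, the sketch below derives recency, frequency, and monetary values per customer using only the standard library. The function name `rfm_features`, the tuple layout, and the toy transactions are assumptions made for this example; the project itself uses pandas and may compute these features differently.

```python
from collections import defaultdict
from datetime import date


def rfm_features(transactions, snapshot_date):
    """Return {customer_id: (recency_days, frequency, monetary)}.

    transactions: iterable of (customer_id, purchase_date, amount) tuples.
    This layout is an assumption for the sketch, not the project's schema.
    """
    last_purchase = {}
    frequency = defaultdict(int)
    monetary = defaultdict(float)
    for customer_id, purchase_date, amount in transactions:
        # Track the most recent purchase date per customer.
        if customer_id not in last_purchase or purchase_date > last_purchase[customer_id]:
            last_purchase[customer_id] = purchase_date
        frequency[customer_id] += 1
        monetary[customer_id] += amount
    return {
        cid: ((snapshot_date - last_purchase[cid]).days, frequency[cid], monetary[cid])
        for cid in last_purchase
    }


# Hypothetical toy transactions, for illustration only.
toy = [
    ("A", date(2024, 1, 5), 20.0),
    ("A", date(2024, 2, 1), 35.0),
    ("B", date(2023, 12, 15), 10.0),
]
features = rfm_features(toy, snapshot_date=date(2024, 3, 1))
```

The resulting numeric columns are the kind of input a StandardScaler step would rescale before LogisticRegression.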

🐍 Python · pandas · scikit-learn · SHAP · Matplotlib · Seaborn

Built a leakage-aware customer churn prediction pipeline for bank customer retention analysis
Diagnosed unrealistic near-perfect model performance caused by a complaint-related feature and rebuilt the model under a more realistic setting
Compared Logistic Regression, Random Forest, and Gradient Boosting models using precision, recall, F1, ROC-AUC, and PR-AUC
Applied threshold tuning, cost-sensitive analysis, and customer segmentation to translate model outputs into retention strategy insights
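
The threshold tuning and cost-sensitive analysis mentioned above can be sketched as a search for the cutoff that minimizes total misclassification cost. The cost values, labels, probabilities, and candidate thresholds below are hypothetical; they are not taken from the project.

```python
def total_cost(y_true, y_prob, threshold, cost_fp, cost_fn):
    """Sum misclassification costs when predicting churn for prob >= threshold."""
    cost = 0.0
    for label, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= threshold else 0
        if predicted == 1 and label == 0:
            cost += cost_fp  # retention offer sent to a customer who would have stayed
        elif predicted == 0 and label == 1:
            cost += cost_fn  # churner missed entirely
    return cost


def best_threshold(y_true, y_prob, cost_fp, cost_fn, candidates):
    """Return the candidate threshold with the lowest total cost."""
    return min(candidates, key=lambda t: total_cost(y_true, y_prob, t, cost_fp, cost_fn))


# Hypothetical labels and predicted churn probabilities.
y_true = [0, 0, 1, 0, 1, 1]
y_prob = [0.10, 0.35, 0.40, 0.55, 0.70, 0.90]
candidates = [0.3, 0.5, 0.6, 0.8]

# Assumed costs: a missed churner is five times as costly as an unneeded offer.
chosen = best_threshold(y_true, y_prob, cost_fp=1.0, cost_fn=5.0, candidates=candidates)
```

With a missed churner priced well above a wasted offer, the search favors a low cutoff, which is the usual motivation for moving away from the default 0.5.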

🐍 Python · pandas · NumPy · Power BI

🐍 Python · pandas · NumPy · scikit-learn · statsmodels · Matplotlib · Seaborn

Built an interpretable credit risk scorecard using LendingClub loan data with manually implemented WOE, IV, PSI, and score scaling
Designed a leakage-aware modeling pipeline by removing post-origination variables and using time-based train / validation / test splitting
Trained a logistic regression scorecard with 0.663 test ROC-AUC, 0.412 PR-AUC, and 0.234 KS on out-of-time test data
Converted predicted default probabilities into credit scores and risk bands, showing a roughly 5x difference in bad rate between the highest-risk and lowest-risk bands
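
For readers unfamiliar with the scorecard terms above, here is a minimal sketch of WOE, IV, PSI, and score scaling under common textbook conventions. It assumes WOE = ln(%good / %bad), no empty bins, and a "points to double the odds" scaling; the parameters `base_score=600`, `base_odds=50`, and `pdo=20` are hypothetical and may not match the project's settings.

```python
import math


def woe_iv(bins):
    """bins: list of (good_count, bad_count) per bin. Returns (woe_list, iv).

    Assumes every bin holds at least one good and one bad account.
    """
    total_good = sum(good for good, _ in bins)
    total_bad = sum(bad for _, bad in bins)
    woes, iv = [], 0.0
    for good, bad in bins:
        pct_good = good / total_good
        pct_bad = bad / total_bad
        woe = math.log(pct_good / pct_bad)
        woes.append(woe)
        iv += (pct_good - pct_bad) * woe
    return woes, iv


def psi(expected_pct, actual_pct):
    """Population Stability Index between two binned distributions (shares per bin)."""
    return sum((a - e) * math.log(a / e) for e, a in zip(expected_pct, actual_pct))


def prob_to_score(p_default, base_score=600.0, base_odds=50.0, pdo=20.0):
    """Map a default probability to a score; odds are good:bad = (1 - p) / p."""
    factor = pdo / math.log(2)
    offset = base_score - factor * math.log(base_odds)
    odds = (1.0 - p_default) / p_default
    return offset + factor * math.log(odds)


# Hypothetical two-bin feature: 80 good / 20 bad, then 20 good / 80 bad.
woes, iv = woe_iv([(80, 20), (20, 80)])
```

Under this scaling, an applicant at 50:1 good-to-bad odds scores 600, and each doubling of the odds adds 20 points.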

🐍 Python · pandas · NumPy · Darts · scikit-learn · Matplotlib

Built a 561-component hierarchical retail forecasting pipeline across total, store, item, and store-item sales levels
Compared Seasonal Naive and Linear Regression models, reducing total-level MAPE from 34.98% to 6.80%
Applied forecast reconciliation, with Top-Down reconciliation improving store-item MAPE from 20.05% to 15.50%
Identified high-risk store-item demand segments to support inventory replenishment analysis
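
The forecasting ideas above can be sketched in plain Python: a seasonal naive baseline, the MAPE metric, and a simple top-down split of a total forecast. The project uses Darts, so this is not its implementation; in particular, splitting by historical sales share is an assumed proportion method, and all series here are toy values.

```python
def mape(actual, forecast):
    """Mean absolute percentage error in percent; assumes no zero actuals."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


def seasonal_naive(history, season_length, horizon):
    """Repeat the last observed season across the forecast horizon."""
    last_season = history[-season_length:]
    return [last_season[i % season_length] for i in range(horizon)]


def top_down(total_forecast, historical_bottom):
    """Split a total-level forecast by each bottom series' share of historical sales."""
    sums = {name: sum(values) for name, values in historical_bottom.items()}
    grand_total = sum(sums.values())
    return {
        name: [f * s / grand_total for f in total_forecast]
        for name, s in sums.items()
    }


# Hypothetical toy series.
baseline = seasonal_naive([10, 20, 30, 40], season_length=2, horizon=3)
split = top_down([100.0], {"store1_item1": [30, 30], "store1_item2": [10, 10]})
```

Because each bottom-level forecast is a fixed share of the total, the reconciled forecasts add up to the total by construction, which is the coherence property reconciliation is meant to guarantee.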

🐍 Python · pandas · NumPy · scikit-learn · Matplotlib · Jupyter

Built a profit-aware pricing optimization workflow using 2,800 historical sales records
Compared Ridge, Random Forest, and Gradient Boosting demand models, selecting Random Forest with around 25.60% SMAPE
Recommended product-level prices under observed price ranges and a ±15% price-change guardrail
Identified major pricing opportunities for Carretera, Paseo, and Velo based on predicted profit uplift
Added segment-level pricing, discount simulation, robust optimization, and A/B test rollout planning
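
The price recommendation step can be illustrated with a small grid search that respects both the observed price range and the ±15% guardrail. The linear `toy_demand` function stands in for the fitted demand model, and the prices, cost, and grid size are hypothetical; the sketch also assumes the guardrail window overlaps the observed range.

```python
def candidate_prices(current_price, observed_min, observed_max, max_change=0.15, steps=7):
    """Evenly spaced prices inside both the observed range and the guardrail."""
    low = max(observed_min, current_price * (1 - max_change))
    high = min(observed_max, current_price * (1 + max_change))
    if steps < 2 or high <= low:
        return [low]
    step = (high - low) / (steps - 1)
    return [low + i * step for i in range(steps)]


def best_price(current_price, unit_cost, observed_min, observed_max, demand_fn, max_change=0.15):
    """Pick the candidate price with the highest predicted profit."""
    grid = candidate_prices(current_price, observed_min, observed_max, max_change)
    return max(grid, key=lambda p: (p - unit_cost) * demand_fn(p))


def toy_demand(price):
    """Hypothetical demand curve standing in for a trained model's prediction."""
    return 100.0 - 2.0 * price


recommended = best_price(
    current_price=20.0,
    unit_cost=5.0,
    observed_min=15.0,
    observed_max=30.0,
    demand_fn=toy_demand,
)
```

In this toy case the unconstrained profit maximum lies above the guardrail, so the recommendation lands at the upper bound of the allowed window, about 15% above the current price.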

Thanks for visiting! 🚀