Production-grade MLOps pipeline and analytics dashboard for the Superstore retail dataset. Combines XGBoost (Optuna-tuned) with Prophet in a walk-forward validated ensemble, served via FastAPI, tracked in MLflow, and visualised in a professional Streamlit dashboard.
flowchart LR
A[(SampleSuperstore.csv)] -->|ETL| B[data_prep.py\nClean · Aggregate · Feature Eng.]
B --> C[(clean.parquet\nprocessed.parquet)]
C -->|Train| D[train.py\nTimeSeriesSplit CV\nOptuna 50 trials]
D --> E[XGBoost\nOptimal Hyperparams]
D --> F[Prophet\nUS Holidays + Seasonality]
E --> G{Ensemble Weight\nOptimisation}
F --> G
G --> H[(models/\nxgb_model.pkl\nprophet_model.pkl\nmetadata.json)]
G --> I[MLflow\nTracking Server :5000]
H -->|Artefact load| J[predict.py\n30-Day Iterative Forecast]
J --> K[(data/forecast.parquet)]
H --> L[FastAPI :8000\n/forecast /health /metrics]
C --> M[Streamlit Dashboard :8501]
K --> M
L --> M
I --> M
superstore-mlops-forecast/
├── src/
│ ├── __init__.py
│ ├── data_prep.py # ETL: load → clean → daily agg → feature engineering
│ ├── train.py # Training: MLflow + Optuna + XGBoost + Prophet + ensemble
│ ├── predict.py # Inference: ForecastEngine + generate_forecast()
│ └── api.py # FastAPI: /forecast /health /metrics
├── dashboard/
│ ├── __init__.py
│ └── app.py # Streamlit: KPIs · 3-D charts · forecast · export
├── models/ # Saved artefacts (auto-generated by make train)
│ ├── xgb_model.pkl
│ ├── prophet_model.pkl
│ └── metadata.json
├── data/ # Raw and processed datasets
│ ├── SampleSuperstore.csv # ← place here (download from Kaggle)
│ ├── clean.parquet # generated by make prepare
│ ├── processed.parquet # generated by make prepare
│ └── forecast.parquet # generated by make train
├── requirements.txt
├── Dockerfile
├── mlflow.Dockerfile
├── docker-entrypoint.sh
├── docker-compose.yml
└── Makefile
- Python 3.11+
- pip
- (Optional) Docker & Docker Compose
pip install -r requirements.txt
Download SampleSuperstore.csv from Kaggle:
https://www.kaggle.com/datasets/vivek468/superstore-dataset-final
Place the file at data/SampleSuperstore.csv.
make prepare
# Generates: data/clean.parquet data/processed.parquet
make train # Full training: 50 Optuna trials (recommended)
# or
make quick-train # 10 Optuna trials, for rapid prototyping
What this does:
- Walk-forward cross-validation (5 folds, TimeSeriesSplit)
- Optuna hyperparameter search for XGBoost (100–500 estimators, depth 3–10, etc.)
- Prophet training with US holiday regressors
- Scipy-based ensemble weight optimisation (minimises MAPE)
- Logs all metrics and artefacts to MLflow (SQLite backend)
- Generates data/forecast.parquet (30-day predictions)
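The ensemble weighting step above can be illustrated with a small sketch. It assumes the ensemble is a convex blend `alpha * xgb + (1 - alpha) * prophet` with a single weight (the logged `ensemble_alpha` parameter suggests this, but which model `alpha` applies to is an assumption). The pipeline uses a SciPy optimiser; to stay dependency-free, this sketch swaps in a plain grid search over `alpha`. The function names and toy data are hypothetical.

```python
def mape(actual, predicted):
    """Mean absolute percentage error in percent. Assumes no zero actuals."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def best_blend_weight(actual, xgb_pred, prophet_pred, steps=100):
    """Grid-search alpha in [0, 1] to minimise MAPE of the blended forecast.

    Stand-in for the SciPy-based optimisation; hypothetical helper.
    """
    best_alpha, best_score = 0.0, float("inf")
    for i in range(steps + 1):
        alpha = i / steps
        blended = [alpha * x + (1 - alpha) * p for x, p in zip(xgb_pred, prophet_pred)]
        score = mape(actual, blended)
        if score < best_score:
            best_alpha, best_score = alpha, score
    return best_alpha, best_score


# Toy validation data, made up for illustration only.
actual = [100.0, 120.0, 90.0, 110.0]
xgb_pred = [110.0, 130.0, 100.0, 120.0]    # consistently 10 too high
prophet_pred = [90.0, 110.0, 80.0, 100.0]  # consistently 10 too low

alpha, score = best_blend_weight(actual, xgb_pred, prophet_pred)
print(f"alpha={alpha:.2f}  ensemble MAPE={score:.2f}%")
```

On this toy data the two models err in opposite directions by the same amount, so the search settles on `alpha = 0.5` and the blended error drops to zero. Real validation data will rarely be this tidy.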
make dashboard
# → http://localhost:8501
make serve
# → http://localhost:8000/docs (Swagger UI)
# → http://localhost:8000/forecast?days=30
make mlflow-ui
# → http://localhost:5000
Bring up all three services (MLflow + FastAPI + Streamlit) with a single command:
# Build images and start in detached mode
docker-compose up --build -d
# Service URLs
# Dashboard : http://localhost:8501
# API : http://localhost:8000
# MLflow : http://localhost:5000
# View logs
make docker-logs
# Shut down
make docker-down
| Metric | Target | Typical Result |
|---|---|---|
| Ensemble MAPE (validation) | < 12% | ~9–10% |
| XGBoost MAPE | — | ~10–12% |
| Prophet MAPE | — | ~12–15% |
| Walk-Forward CV MAPE (5-fold) | < 15% | ~11–13% |
| RMSE (validation) | — | Logged in MLflow |
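For readers unfamiliar with how the walk-forward CV figure in the table is produced, here is a minimal sketch of the idea. It reproduces the default expanding-window layout of scikit-learn's `TimeSeriesSplit` in plain Python and scores a naive last-value forecaster on each fold. The helper names, the toy series, and the naive forecaster are hypothetical stand-ins, not the project's `train.py`.

```python
def mape(actual, predicted):
    """Mean absolute percentage error in percent. Assumes no zero actuals."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def walk_forward_splits(n_samples, n_splits=5):
    """Yield (train_idx, test_idx) pairs with an expanding training window.

    Same layout as TimeSeriesSplit defaults: equal-sized test blocks at the
    end of the series, each preceded by all earlier observations.
    """
    test_size = n_samples // (n_splits + 1)
    if test_size == 0:
        raise ValueError("not enough samples for the requested number of splits")
    for k in range(n_splits):
        test_start = n_samples - (n_splits - k) * test_size
        yield list(range(test_start)), list(range(test_start, test_start + test_size))


def cv_mape(series, fit_predict, n_splits=5):
    """Return the per-fold MAPE of fit_predict(train, horizon)."""
    scores = []
    for train_idx, test_idx in walk_forward_splits(len(series), n_splits):
        train = [series[i] for i in train_idx]
        actual = [series[i] for i in test_idx]
        scores.append(mape(actual, fit_predict(train, len(actual))))
    return scores


def last_value_forecast(train, horizon):
    """Naive baseline: repeat the last observed value (illustration only)."""
    return [train[-1]] * horizon


series = [float(v) for v in range(10, 34)]  # 24 toy daily totals
folds = list(walk_forward_splits(len(series), n_splits=5))
fold_scores = cv_mape(series, last_value_forecast)
print(f"cv_mape_mean={sum(fold_scores) / len(fold_scores):.2f}%")
```

Because every fold trains only on observations that precede its test block, no future data leaks into the model, which is why this scheme is preferred over shuffled k-fold for time series.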
| Group | Features |
|---|---|
| Calendar | day_of_week, day_of_month, month, quarter, year, week_of_year, day_of_year |
| Binary flags | is_weekend, is_month_start, is_month_end, is_quarter_start, is_quarter_end, is_holiday |
| Cyclical | sin/cos encodings for month, day-of-week, day-of-year |
| Holiday proximity | days_to_next_holiday, days_since_last_holiday |
| Lag features | sales_lag_1/2/3/7/14/21/28 |
| Rolling stats | 7/14/28-day rolling mean, std, min, max |
| EWM | sales_ewm_7/14/28 |
| Contextual | discount, quantity, orders |
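A few of the feature groups in the table can be sketched in plain Python to show how they are derived. This is a simplified illustration, not the code in `data_prep.py`: the column names `dow_sin`, `dow_cos`, and `sales_roll_mean_7` are hypothetical, and it assumes the rolling window excludes the current day so the target does not leak into its own features.

```python
import math
from datetime import date, timedelta


def cyclical(value, period):
    """Map a periodic value onto the unit circle as (sin, cos)."""
    angle = 2.0 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


def build_features(dates, sales, lags=(1, 7), window=7):
    """Return one feature dict per day; None marks unavailable history."""
    rows = []
    for i, day in enumerate(dates):
        dow = day.weekday()  # Monday = 0
        dow_sin, dow_cos = cyclical(dow, 7)
        row = {
            "date": day,
            "day_of_week": dow,
            "is_weekend": int(dow >= 5),
            "dow_sin": dow_sin,
            "dow_cos": dow_cos,
        }
        for lag in lags:
            row[f"sales_lag_{lag}"] = sales[i - lag] if i >= lag else None
        past = sales[max(0, i - window):i]  # strictly before today
        row[f"sales_roll_mean_{window}"] = (
            sum(past) / window if len(past) == window else None
        )
        rows.append(row)
    return rows


# Toy data: ten days starting on a Monday, sales rising by 10 per day.
dates = [date(2017, 1, 2) + timedelta(days=i) for i in range(10)]
sales = [100.0 + 10 * i for i in range(10)]
rows = build_features(dates, sales)
print(rows[7])
```

The sin/cos pair places Sunday next to Monday, which a raw `day_of_week` integer cannot express. The `None` entries correspond to the warm-up rows that a real pipeline would typically drop before training.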
Experiment name: superstore_sales_forecast
Logged per run:
- Parameters: all XGBoost hyperparams (Optuna best), ensemble_alpha, dataset sizes
- Metrics: xgb_val_mape, xgb_val_rmse, prophet_val_mape, ensemble_val_mape, cv_mape_mean, cv_mape_std
- Artefacts: xgb_model.pkl, metadata.json, forecast.parquet
Live MLOps Sales Platform:
https://sales-predictor-7vfxujqpva6pe3zvwvffnm.streamlit.app
GitHub: https://github.com/your-org/superstore-mlops-forecast
Achieved ~9.5% MAPE on 30-day Superstore sales forecasting using an XGBoost + Prophet ensemble with Optuna tuning, MLflow experiment tracking, FastAPI serving, and Docker Compose deployment. Walk-forward CV validated across 5 temporal folds. Dashboard features 3-D Plotly visuals, live KPI cards, filterable product metrics, and one-click CSV export.