Superstore Sales Analytics & Forecasting Platform

MLOps pipeline and analytics dashboard for the Superstore retail dataset. It combines an Optuna-tuned XGBoost model with Prophet in an ensemble validated by walk-forward cross-validation, serves forecasts via FastAPI, tracks experiments in MLflow, and visualises results in a Streamlit dashboard.

Architecture Overview

flowchart LR
    A[(SampleSuperstore.csv)] -->|ETL| B[data_prep.py\nClean · Aggregate · Feature Eng.]
    B --> C[(clean.parquet\nprocessed.parquet)]
    C -->|Train| D[train.py\nTimeSeriesSplit CV\nOptuna 50 trials]
    D --> E[XGBoost\nOptimal Hyperparams]
    D --> F[Prophet\nUS Holidays + Seasonality]
    E --> G{Ensemble Weight\nOptimisation}
    F --> G
    G --> H[(models/\nxgb_model.pkl\nprophet_model.pkl\nmetadata.json)]
    G --> I[MLflow\nTracking Server :5000]
    H -->|Artefact load| J[predict.py\n30-Day Iterative Forecast]
    J --> K[(data/forecast.parquet)]
    H --> L[FastAPI :8000\n/forecast  /health  /metrics]
    C --> M[Streamlit Dashboard :8501]
    K --> M
    L --> M
    I --> M
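
The predict.py node above is labelled as a 30-day iterative forecast. As a rough illustration of what "iterative" usually means for a lag-based model, the sketch below predicts one day at a time and feeds each prediction back in as the newest observation. The function names, the 28-value window, and the stand-in model are assumptions made for this example and are not taken from predict.py.

```python
from collections import deque
from typing import Callable, Sequence


def iterative_forecast(
    history: Sequence[float],
    predict_one: Callable[[Sequence[float]], float],
    horizon: int = 30,
    max_lag: int = 28,
) -> list[float]:
    """Forecast `horizon` steps, feeding each prediction back in as history."""
    if len(history) < max_lag:
        raise ValueError("need at least max_lag observations")
    # Keep only the most recent max_lag values; older ones fall off the left.
    window = deque(history[-max_lag:], maxlen=max_lag)
    forecasts = []
    for _ in range(horizon):
        y_hat = predict_one(list(window))  # the model sees past values only
        forecasts.append(y_hat)
        window.append(y_hat)  # the prediction becomes the newest "observation"
    return forecasts


def last_week_mean(window: Sequence[float]) -> float:
    """Hypothetical stand-in model: mean of the 7 most recent values."""
    return sum(window[-7:]) / 7


# Synthetic daily sales with a weekly pattern (illustrative data only).
history = [100.0 + (i % 7) for i in range(56)]
forecast = iterative_forecast(history, last_week_mean, horizon=30)
```

Because later steps depend on earlier predictions, errors can compound over the horizon, which is one reason to validate on multi-step windows rather than one-step-ahead only.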

Repository Structure

superstore-mlops-forecast/
├── src/
│   ├── __init__.py
│   ├── data_prep.py        # ETL: load → clean → daily agg → feature engineering
│   ├── train.py            # Training: MLflow + Optuna + XGBoost + Prophet + ensemble
│   ├── predict.py          # Inference: ForecastEngine + generate_forecast()
│   └── api.py              # FastAPI: /forecast  /health  /metrics
├── dashboard/
│   ├── __init__.py
│   └── app.py              # Streamlit: KPIs · 3-D charts · forecast · export
├── models/                 # Saved artefacts (auto-generated by make train)
│   ├── xgb_model.pkl
│   ├── prophet_model.pkl
│   └── metadata.json
├── data/                   # Raw and processed datasets
│   ├── SampleSuperstore.csv        # ← place here (download from Kaggle)
│   ├── clean.parquet               # generated by make prepare
│   ├── processed.parquet           # generated by make prepare
│   └── forecast.parquet            # generated by make train
├── requirements.txt
├── Dockerfile
├── mlflow.Dockerfile
├── docker-entrypoint.sh
├── docker-compose.yml
└── Makefile

Quick Start

Prerequisites

Python 3.11+
pip
(Optional) Docker & Docker Compose

1 — Install dependencies

pip install -r requirements.txt

2 — Download the dataset

Download SampleSuperstore.csv from Kaggle:

https://www.kaggle.com/datasets/vivek468/superstore-dataset-final

Place the file at data/SampleSuperstore.csv.

3 — Prepare data

make prepare
# Generates: data/clean.parquet  data/processed.parquet

4 — Train the ensemble model

make train          # Full training — 50 Optuna trials (recommended)
# or
make quick-train    # 10 Optuna trials — for rapid prototyping

What this does:

Walk-forward cross-validation (5 folds, TimeSeriesSplit)
Optuna hyperparameter search for XGBoost (100–500 estimators, depth 3–10, etc.)
Prophet training with US holidays and seasonality components
Scipy-based ensemble weight optimisation (minimises MAPE)
Logs all metrics and artefacts to MLflow (SQLite backend)
Generates data/forecast.parquet (30-day predictions)
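
The ensemble weight step in the list above can be pictured as a one-dimensional search for the blend weight that minimises validation MAPE. The sketch below is a dependency-free stand-in: it uses a plain grid search instead of the SciPy optimiser the project uses, and it assumes the blend has the form alpha * XGBoost + (1 - alpha) * Prophet. That form, the function names, and the toy numbers are assumptions, not taken from train.py.

```python
def mape(actual, predicted):
    """Mean absolute percentage error in percent. Assumes no zero actuals."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def blend(xgb_pred, prophet_pred, alpha):
    """Convex combination of the two forecasts (assumed form of the ensemble)."""
    return [alpha * x + (1.0 - alpha) * p for x, p in zip(xgb_pred, prophet_pred)]


def best_alpha(actual, xgb_pred, prophet_pred, steps=100):
    """Grid search over alpha in [0, 1]; stand-in for a SciPy bounded minimiser."""
    candidates = [i / steps for i in range(steps + 1)]
    return min(candidates, key=lambda a: mape(actual, blend(xgb_pred, prophet_pred, a)))


# Toy validation data: one model runs 10% high, the other 10% low.
actual = [100.0, 120.0, 90.0, 110.0]
xgb_pred = [110.0, 132.0, 99.0, 121.0]
prophet_pred = [90.0, 108.0, 81.0, 99.0]

alpha = best_alpha(actual, xgb_pred, prophet_pred)
ensemble_mape = mape(actual, blend(xgb_pred, prophet_pred, alpha))
```

In this toy case the two models' errors cancel at an equal blend, so the search settles on alpha = 0.5. On real data the optimum depends on how the models' errors correlate.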

5 — Launch the dashboard

make dashboard
# → http://localhost:8501

6 — Start the prediction API

make serve
# → http://localhost:8000/docs  (Swagger UI)
# → http://localhost:8000/forecast?days=30
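
Once the API is up, the /forecast endpoint can also be called from Python. The helper below is a minimal client sketch using only the standard library. The function names are hypothetical, and because the response schema is not documented here, the parsed JSON is returned as-is.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def forecast_url(base_url: str = "http://localhost:8000", days: int = 30) -> str:
    """Build the /forecast URL with the `days` query parameter."""
    return f"{base_url.rstrip('/')}/forecast?{urlencode({'days': days})}"


def fetch_forecast(base_url: str = "http://localhost:8000", days: int = 30,
                   timeout: float = 10.0):
    """GET the forecast and return the decoded JSON payload unchanged."""
    with urlopen(forecast_url(base_url, days), timeout=timeout) as resp:
        return json.load(resp)


# Usage (requires `make serve` to be running):
#   payload = fetch_forecast(days=30)
```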

7 — View MLflow experiment tracking

make mlflow-ui
# → http://localhost:5000

Docker Deployment

Bring up all three services (MLflow + FastAPI + Streamlit) with a single command:

# Build images and start in detached mode
docker-compose up --build -d

# Service URLs
#   Dashboard : http://localhost:8501
#   API       : http://localhost:8000
#   MLflow    : http://localhost:5000

# View logs
make docker-logs

# Shut down
make docker-down

Model Performance

Metric	Target	Typical Result
Ensemble MAPE (validation)	< 12%	~9–10%
XGBoost MAPE	—	~10–12%
Prophet MAPE	—	~12–15%
Walk-Forward CV MAPE (5-fold)	< 15%	~11–13%
RMSE (validation)	—	Logged in MLflow
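
To make the walk-forward CV row concrete, here is a small sketch of expanding-window splitting and per-fold MAPE. The split logic is intended to mirror the default behaviour of scikit-learn's TimeSeriesSplit (training data always precedes the test window), but it is written in plain Python for illustration. The naive last-value model, the synthetic series, and the use of the population standard deviation are assumptions, not the project's actual setup.

```python
from statistics import mean, pstdev


def walk_forward_splits(n_samples, n_splits=5):
    """Yield (train_indices, test_indices) with an expanding training window."""
    test_size = n_samples // (n_splits + 1)
    if test_size == 0:
        raise ValueError("too few samples for the requested number of folds")
    first_test = n_samples - n_splits * test_size
    for start in range(first_test, n_samples, test_size):
        yield range(0, start), range(start, start + test_size)


def mape(actual, predicted):
    """Mean absolute percentage error in percent. Assumes no zero actuals."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def naive_forecast(train, horizon):
    """Hypothetical baseline: repeat the last observed value."""
    return [train[-1]] * horizon


# Synthetic upward-trending series (illustrative data only).
series = [100.0 + 2.0 * i for i in range(60)]

fold_scores = []
for train_idx, test_idx in walk_forward_splits(len(series), n_splits=5):
    train = [series[i] for i in train_idx]
    test = [series[i] for i in test_idx]
    fold_scores.append(mape(test, naive_forecast(train, len(test))))

cv_mape_mean = mean(fold_scores)
cv_mape_std = pstdev(fold_scores)
```

Unlike shuffled k-fold, no fold ever trains on data from after its test window, so the CV score is a fairer estimate of how the model behaves on genuinely unseen future dates.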

Feature Engineering Summary

Group	Features
Calendar	day_of_week, day_of_month, month, quarter, year, week_of_year, day_of_year
Binary flags	is_weekend, is_month_start, is_month_end, is_quarter_start, is_quarter_end, is_holiday
Cyclical	sin/cos encodings for month, day-of-week, day-of-year
Holiday proximity	days_to_next_holiday, days_since_last_holiday
Lag features	sales_lag_1/2/3/7/14/21/28
Rolling stats	7/14/28-day rolling mean, std, min, max
EWM	sales_ewm_7/14/28
Contextual	discount, quantity, orders
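
A few of the feature groups above can be sketched in a handful of lines. The example below builds a cyclical day-of-week encoding, a weekend flag, a one-day lag, and a short rolling mean using only the standard library. data_prep.py may well do this differently (for instance with pandas), and the 3-day window, the toy sales values, the choice to exclude the current day from the rolling window, and the dow_sin / dow_cos / sales_roll_mean_3 column names are assumptions made for illustration.

```python
import math
from datetime import date, timedelta
from statistics import mean


def cyclical(value, period):
    """Map a periodic value onto the unit circle so the ends of the cycle meet."""
    angle = 2.0 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


def lag(series, k):
    """Value k steps back (k >= 1); None where history is too short."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return [None] * k + list(series[:-k])


def rolling_mean(series, window):
    """Mean of the `window` values before each position (current value excluded)."""
    return [mean(series[i - window:i]) if i >= window else None
            for i in range(len(series))]


start = date(2017, 1, 1)  # a Sunday
sales = [float(10 * (i + 1)) for i in range(10)]  # toy daily sales

lag_1 = lag(sales, 1)
roll_3 = rolling_mean(sales, 3)

rows = []
for i, y in enumerate(sales):
    day = start + timedelta(days=i)
    dow = day.weekday()  # Monday = 0 ... Sunday = 6
    dow_sin, dow_cos = cyclical(dow, 7)
    rows.append({
        "date": day,
        "sales": y,
        "day_of_week": dow,
        "is_weekend": dow >= 5,
        "dow_sin": dow_sin,
        "dow_cos": dow_cos,
        "sales_lag_1": lag_1[i],
        "sales_roll_mean_3": roll_3[i],
    })
```

Computing lag and rolling features from strictly earlier values keeps the target out of its own features, which matters for honest walk-forward validation.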

MLflow Experiment Structure

Experiment name: superstore_sales_forecast

Logged per run:

Parameters: all XGBoost hyperparams (Optuna best), ensemble_alpha, dataset sizes
Metrics: xgb_val_mape, xgb_val_rmse, prophet_val_mape, ensemble_val_mape, cv_mape_mean, cv_mape_std
Artefacts: xgb_model.pkl, metadata.json, forecast.parquet

Summary

Live MLOps Sales Platform: https://sales-predictor-7vfxujqpva6pe3zvwvffnm.streamlit.app
GitHub: https://github.com/your-org/superstore-mlops-forecast

Achieved ~9.5% MAPE on 30-day Superstore sales forecasting using an XGBoost + Prophet ensemble with Optuna tuning, MLflow experiment tracking, FastAPI serving, and Docker Compose deployment. Walk-forward CV validated across 5 temporal folds. Dashboard features 3-D Plotly visuals, live KPI cards, filterable product metrics, and one-click CSV export.
