Predictive Maintenance Engine — Enterprise Edition

An end-to-end production ML pipeline that predicts industrial machine failures before they happen.
Optimizes for total business cost in dollars — not accuracy, not F1.

🚀 Live Demo

Try it now — no setup, no API key required:

👉 https://predictive-maintenance-deep-shah.streamlit.app/

The live application allows you to:

Adjust real-time sensor sliders and watch the failure probability gauge update instantly
Upload a CSV of machine readings and get a full fleet risk assessment in seconds
Explore the business dashboard — compare reactive vs preventive vs AI-driven maintenance costs
Drag the decision threshold slider and watch FP/FN counts and total cost update live
Inspect SHAP waterfall charts that explain exactly why the model flagged any specific machine

📸 Application Screenshots

Live Prediction — Safe Machine (H-Type, Fresh Tool)

18.9% failure probability — SAFE. H-type machine with 25 minutes of tool wear, 2000 RPM, and 30 Nm torque. The gauge, risk badge, and cost analysis update in real time as sliders are adjusted. Cost if ignored: $1,891. Preventive maintenance: $500. Model recommendation: Save $1,391. All four tabs — Live Prediction, Batch Analysis, Business Dashboard, and Model Explainability — are visible in the tab bar.

Cost Impact Analysis — Safe Machine

Physics-derived features at the bottom — Temp Differential: 9.50 K (above the 8.6 K Heat Dissipation threshold), Mechanical Power: 60,842 W (within safe operating range), Force Ratio: 0.01509 (well below the 0.035 Overstrain threshold). All three engineered features confirm the machine is operating within healthy parameters.

Live Prediction — Critical Danger (L-Type, Multiple Failure Modes)

81.5% failure probability — DANGER. L-type machine with Temp_Diff of 7.40 K (below the 8.6 K Heat Dissipation threshold), 1,208 RPM (low), and 65 Nm torque. DANGER badge fires immediately. Multiple simultaneous failure modes detected — the model identifies the specific physical mechanisms, not just a probability score.

Cost Impact Analysis — Critical Machine

Expected cost if ignored: $8,151. Preventive maintenance cost: $500. Model recommendation: Save $7,651 — act immediately. Physics features confirm the failure signal: Temp Differential at 7.40 K (below the 8.6 K threshold). The 20:1 cost asymmetry ($10,000 failure vs $500 inspection) makes the maintenance decision unambiguous.

Batch Analysis — Fleet-Wide Risk Assessment

12 machines analyzed in one CSV upload. Fleet summary: 2 CRITICAL (16.7%), 3 MONITOR, 7 SAFE — $39,965 total cost at risk. The failure probability distribution chart separates the healthy cluster (left) from the at-risk machines (right of the DANGER threshold line). Maintenance teams get an immediate prioritized action list.

Batch Analysis — Risk-Ranked Machine Table

Machines sorted by failure probability descending. MACHINE-003 and MACHINE-004 flagged DANGER in red (81.2% and 80.9% — both L-type with tool wear 240+ minutes). Three MONITOR machines follow in orange. Color-coded Risk_Level column and Expected_Cost_$ give maintenance teams an immediate dollar-ranked action list. Full results downloadable as CSV.

Business Dashboard — Annual Cost Comparison

1,000-machine fleet simulation: Reactive maintenance costs $340,000/year. Full preventive costs $500,000/year. This model costs $79,000/year — catching 32 of 34 failures (94% recall). Savings vs reactive: $261,000 (76.8%). Savings vs full preventive: $421,000 (84.2%). Fleet size, failure rate, and cost parameters are all adjustable.

Model Leaderboard — 9-Model Benchmark

LightGBM selected as champion via 5-fold cross-validated F1 mean (0.7857) — not by test-set score. CatBoost ranks second with lower CV std (0.051 vs 0.063), indicating more stable folds. Champion selection by CV score prevents the model selection bias that occurs when the test set is used to pick between models.

SHAP Explainability — Safe Machine Waterfall

SHAP waterfall for a healthy H-type machine (18.9% failure probability). Blue bars dominate — high RPM (2000), low tool wear (25 min), and H-type quality tier all push the prediction strongly away from failure. The baseline probability (~3.4%, the dataset failure rate) is adjusted downward by each safe feature. This tells the operator not just "safe" but exactly which sensor readings are responsible for the clean health signal.

SHAP Explainability — Critical Danger Waterfall

SHAP waterfall for a critical L-type machine (81%+ failure probability). Red bars dominate — Tool Wear at 240 min, high Force Ratio from low RPM + high torque, and collapsed Temp_Diff all push the prediction strongly toward failure. The global feature importance chart below confirms these are the model's top features across all training data, not just this one prediction. An interviewer or business stakeholder can verify exactly which physical signals drove the alarm.

1. The Business Problem

An hour of unplanned downtime in heavy manufacturing can cost anywhere from $10,000 to $250,000, depending on the industry. Yet the two standard maintenance strategies both have serious drawbacks:

Strategy	What Goes Wrong	Hidden Cost
Reactive	Wait for failure, then fix it	Emergency repair + full production halt
Preventive (fixed schedule)	Service everything on a calendar	Replacing healthy components, unnecessary labor

Predictive maintenance aims to avoid both problems. It uses real-time sensor data to raise a maintenance alert only when a specific machine shows signs of imminent failure, so most failures are caught before they happen and healthy machines are mostly left alone.

This project builds a full production-structured ML pipeline on the AI4I 2020 Predictive Maintenance Dataset (UCI / Kaggle) — a realistic simulation of CNC machine sensor telemetry across 10,000 operating cycles with a 97:3 healthy-to-failure class ratio.

2. What Makes This Different

The majority of ML classification projects optimize for accuracy. Accuracy is the wrong metric for this problem. On a factory floor, errors are not symmetric:

A missed failure (False Negative) = unplanned downtime, possible safety incident → $10,000
A false alarm (False Positive) = a technician dispatched unnecessarily → $500

That is a 20:1 cost asymmetry. Every decision in this pipeline flows from that single insight.

Side-by-side comparison

What a standard ML project does	What this pipeline does
Optimize accuracy or generic F1	Optimize total dollar cost: `(FP × $500) + (FN × $10,000)`
Single train/test split	3-way stratified split — train (60%) / val (20%) / test (20%)
Decision threshold fixed at 0.5	Threshold searched on validation set, reported on test set
`GridSearchCV` on F1	`GridSearchCV` on a custom business-cost scorer
SMOTE applied to the full dataset	SMOTE inside CV folds only — no synthetic leakage
Pick champion by test-set F1	Pick champion by 5-fold cross-validated F1 mean
No unit tests	14 pytest unit tests covering all core functions
Black-box predictions only	SHAP waterfall explains every individual prediction
Notebook only	Streamlit app + FastAPI + Docker + drift monitoring

3. System Architecture

3.1 ML Training Pipeline

Raw CSV (Google Drive / local cache)
        │
        ▼
┌─────────────────────────────────────┐
│         data_ingestion.py           │
│  Download → Schema validation       │
│  Deduplication → Null audit         │
│  Target column sanity check         │
└──────────────────┬──────────────────┘
                   │
                   ▼
┌─────────────────────────────────────┐
│       feature_engineering.py        │
│  Physics feature creation           │
│  Drop leakage columns               │
│  3-way stratified split (60/20/20)  │
└───────┬─────────────┬───────────────┘
        │             │
   X_train        X_val, X_test
   y_train        y_val, y_test
        │             │
        ▼             │
┌─────────────────────────────────────┐
│           modeling.py               │
│  9-model zoo benchmarked via        │
│  5-fold StratifiedKFold CV          │
│                                     │
│  Each fold pipeline:                │
│    preprocessor (fit on fold only)  │
│    → SMOTE (train fold only)        │
│    → classifier                     │
│                                     │
│  Champion = highest CV_F1_Mean      │
│  GridSearchCV on business-cost      │
└──────────────────┬──────────────────┘
                   │
                   ▼
┌─────────────────────────────────────┐
│          evaluation.py              │
│  optimize_threshold(X_val, y_val)   │  ← val set ONLY
│  Final report on (X_test, y_test)   │  ← test set, first touch here
│  Confusion matrix · ROC · Features  │
│  Save model → artifacts/models/     │
└─────────────────────────────────────┘

3.2 Full Production Stack

┌──────────────────────────────────────────────────────────────────┐
│                     PRODUCTION SYSTEM                            │
│                                                                  │
│  ┌──────────────────────────────────────────────────────────┐   │
│  │                  streamlit_app.py                        │   │
│  │                                                          │   │
│  │  Tab 1: Live Prediction  — gauge + risk + cost           │   │
│  │  Tab 2: Batch Analysis   — fleet CSV upload              │   │
│  │  Tab 3: Business Dashboard — cost comparison             │   │
│  │  Tab 4: Model Explainability — SHAP waterfall            │   │
│  └──────────────────────────┬───────────────────────────────┘   │
│                             │                                    │
│  ┌──────────────────────┐   │   ┌──────────────────────────┐   │
│  │  api/main.py         │   │   │     monitoring.py        │   │
│  │  POST /predict       │───┘   │  KS drift detection      │   │
│  │  POST /predict-batch │       │  → drift_alerts.csv      │   │
│  │  GET  /health        │       └──────────────────────────┘   │
│  └──────────────────────┘                                       │
│                             │                                    │
│                             ▼                                    │
│              ┌────────────────────────┐                         │
│              │  lightgbm_champion.pkl │                         │
│              │  + SHAP TreeExplainer  │                         │
│              └────────────────────────┘                         │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │            Docker / docker-compose (port 8000)            │  │
│  └───────────────────────────────────────────────────────────┘  │
└──────────────────────────────────────────────────────────────────┘

4. Technical Decisions & Rationale

4.1 Physics-Based Feature Engineering

Three features were engineered from first principles of thermodynamics and rotational mechanics rather than feeding raw sensor readings directly into the model.

Feature	Formula	Physical Interpretation
`Temp_Diff`	Process Temp − Air Temp	Thermal gradient: a narrow gap (below about 8.6 K) signals poor heat dissipation preceding thermal failure
`Power`	Torque [Nm] × RPM	Proportional to mechanical power at the spindle (multiply by 2π/60 to get watts): sustained peaks accelerate tool wear
`Force_Ratio`	Torque / (RPM + ε)	Load per revolution: high ratio at low speed indicates heavy cutting conditions

The ε = 1e-5 guard in Force_Ratio prevents division by zero. The SHAP global feature importance chart ranks Power 2nd and Temp_Diff 3rd, above every raw sensor reading except tool wear. The domain-driven features carry more signal than most raw sensor inputs, and SHAP makes this empirically verifiable on any individual prediction.
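The three formulas in the table can be sketched in a few lines of pandas. This is a minimal illustration, not the project's `feature_engineering.py`: the function name `add_physics_features` is hypothetical, and the column names are assumed to follow the public AI4I 2020 CSV headers.

```python
import pandas as pd

EPSILON = 1e-5  # division-by-zero guard for Force_Ratio


def add_physics_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with Temp_Diff, Power and Force_Ratio added.

    Column names are an assumption based on the AI4I 2020 CSV headers.
    """
    out = df.copy()
    out["Temp_Diff"] = out["Process temperature [K]"] - out["Air temperature [K]"]
    out["Power"] = out["Torque [Nm]"] * out["Rotational speed [rpm]"]
    out["Force_Ratio"] = out["Torque [Nm]"] / (out["Rotational speed [rpm]"] + EPSILON)
    return out


# One illustrative reading: a hot, slow, heavily loaded machine
reading = pd.DataFrame(
    {
        "Air temperature [K]": [302.0],
        "Process temperature [K]": [309.0],
        "Rotational speed [rpm]": [1200],
        "Torque [Nm]": [65.0],
    }
)
features = add_physics_features(reading)
print(features[["Temp_Diff", "Power", "Force_Ratio"]].round(6))
```

Working on a copy keeps the raw frame untouched, which makes it easier to unit-test the derived columns in isolation.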

4.2 Class Imbalance — SMOTE in the Right Place

The dataset is 96.6% healthy machines and 3.4% failures. Three decisions handle this correctly:

Stratified splits preserve the 3.4% failure rate across all three subsets.
SMOTE runs inside CV folds via imblearn.Pipeline, so synthetic minority samples are generated from training data only. The common mistake of applying SMOTE before CV inflates CV metrics, because synthetic points interpolated from validation samples leak into the training folds.
A business-cost scorer explicitly encodes the 20:1 cost asymmetry into the hyperparameter search.

4.3 Why Three Splits (Train / Val / Test)?

If the decision threshold were optimised on the test set and then reported on the same set, the reported cost would be the minimum achievable on that specific sample — overly optimistic and non-generalising. The validation set is used exclusively for threshold search. The test set is touched exactly once — in evaluation.py — for the final unbiased report.
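A 60/20/20 stratified split like the one described here can be produced with two calls to scikit-learn's `train_test_split`. The sketch below uses synthetic data with a 3.4% positive rate; the variable names and the random seed are illustrative assumptions, not the project's actual code.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 10,000 rows, 3.4% positives
rng = np.random.default_rng(42)
X = rng.normal(size=(10_000, 4))
y = np.zeros(10_000, dtype=int)
y[:340] = 1

# Step 1: hold out 20% as the test set
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=42
)

# Step 2: 25% of the remaining 80% is 20% of the total, used for validation
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=0.25, stratify=y_trainval, random_state=42
)

for name, labels in [("train", y_train), ("val", y_val), ("test", y_test)]:
    print(f"{name}: n={len(labels)}, failure rate={labels.mean():.3%}")
```

Stratifying both calls keeps the failure rate close to 3.4% in every subset, so validation-set threshold costs are comparable to test-set costs.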

4.4 Champion Selection by CV F1, Not Test F1

Selecting the champion model by test-set score introduces model selection bias. Once you use the test set to make a decision, it is no longer a clean estimate of generalisation. All 9 models are ranked by 5-fold cross-validated F1 mean. The test set is only used for the final report after both champion and threshold are locked in.

4.5 Hyperparameter Tuning Objective

GridSearchCV minimizes (FP × $500) + (FN × $10,000) via a custom make_scorer with greater_is_better=False. The tuner directly searches for the configuration that saves the most money — not the one that maximises an abstract metric.
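A cost-based scorer of this kind might look like the sketch below. The function `total_cost`, the synthetic data, the logistic-regression estimator, and the parameter grid are all assumptions chosen to keep the example small; the project tunes its own champion model.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, make_scorer
from sklearn.model_selection import GridSearchCV, StratifiedKFold

COST_FP = 500     # unnecessary inspection
COST_FN = 10_000  # missed failure


def total_cost(y_true, y_pred) -> float:
    """Dollar cost of a set of predictions: FP * $500 + FN * $10,000."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return float(fp * COST_FP + fn * COST_FN)


# greater_is_better=False makes sklearn negate the cost, so that
# "maximising the score" is the same as minimising dollars.
cost_scorer = make_scorer(total_cost, greater_is_better=False)

X, y = make_classification(
    n_samples=2_000, n_features=6, weights=[0.966, 0.034], random_state=0
)
search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"class_weight": [None, "balanced"], "C": [0.1, 1.0]},
    scoring=cost_scorer,
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
)
search.fit(X, y)
print("best params:", search.best_params_)
print("mean CV cost per fold: $", -search.best_score_)
```

Note that `best_score_` is the negated cost, so it must be flipped back before being reported in dollars.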

4.6 OrdinalEncoder for Machine Type

Type encodes a genuine quality tier: L (Low) < M (Medium) < H (High). OrdinalEncoder with categories=[['L', 'M', 'H']] preserves this ordering as integers (0, 1, 2). OneHotEncoder would discard the ordinal structure. The handle_unknown='use_encoded_value', unknown_value=-1 guard ensures the pipeline never crashes on unseen categories at inference time.
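The encoder configuration described above behaves as in this short sketch. The unseen category `"X"` is a made-up value used only to show the fallback.

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder

encoder = OrdinalEncoder(
    categories=[["L", "M", "H"]],        # explicit order: L < M < H
    handle_unknown="use_encoded_value",  # do not raise on unseen categories
    unknown_value=-1,
)
encoder.fit(np.array([["L"], ["M"], ["H"]], dtype=object))

# "X" is a hypothetical category that never appeared during fit
codes = encoder.transform(np.array([["H"], ["L"], ["M"], ["X"]], dtype=object)).ravel()
print(codes)
```

One design consequence worth keeping in mind: an unknown type is encoded as -1, which a tree model will treat as sitting below L on the quality scale.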

4.7 SHAP Explainability — Why It Matters Here

Most production ML deployments are black boxes. A maintenance technician who receives a "DANGER" alert needs to know which sensor triggered it — not just the probability. The SHAP TreeExplainer is initialised once per session (cached via st.cache_resource) and computes exact Shapley values for any input in milliseconds.

The implementation uses the LightGBM classifier extracted from the sklearn Pipeline:

import shap

# `model` is the fitted sklearn Pipeline; `X_transformed` is the preprocessed input
classifier = model.named_steps["model"]
explainer  = shap.TreeExplainer(classifier)
shap_values = explainer.shap_values(X_transformed)  # X already preprocessed

The SHAP tab includes four quick-load presets (Safe, Critical Danger, Heat Dissipation Risk, Tool Wear Limit) so any visitor to the live app can see the explainability working within seconds — no manual slider adjustment required.

5. Results

5.1 Model Leaderboard — 5-Fold Stratified CV

Rank	Model	CV F1 Mean	CV F1 Std	CV AUC	Test F1	Test AUC
🥇	LightGBM	0.7857	0.0626	0.9707	0.7808	0.9847
🥈	CatBoost	0.7758	0.0512	0.9709	0.7200	0.9782
🥉	XGBoost	0.7543	0.0615	0.9638	0.7125	0.9799
4	Random Forest	0.7346	0.0522	0.9698	0.7355	0.9727
5	Gradient Boosting	0.6227	0.0217	0.9726	0.5957	0.9794
6	Decision Tree	0.5953	0.0370	0.8653	0.6067	0.8826
7	SVC	0.4972	0.0263	0.9621	0.4917	0.9731
8	Logistic Regression	0.2857	0.0147	0.9191	0.3021	0.9316
9	Gaussian NB	0.2654	0.0200	0.9075	0.2821	0.9038

LightGBM vs CatBoost: LightGBM wins on CV F1 mean (0.786 vs 0.776). CatBoost has lower CV std (0.051 vs 0.063) — more stable across folds. In production, an ensemble of both would be the natural next step.

5.2 Champion: LightGBM — Final Test-Set Report

Threshold optimized on validation set: 0.32

              precision    recall  f1-score   support

           0     0.9977    0.9063    0.9498      1932
           1     0.2612    0.9412    0.4089        68

    accuracy                         0.9075      2000
   macro avg     0.6295    0.9238    0.6794      2000
weighted avg     0.9652    0.9075    0.9320      2000

The model catches 64 of 68 actual failures (94.1% recall). 4 failures missed. 181 false alarms — a deliberate trade-off given a missed failure costs 20× more than a false alarm.
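The 0.32 threshold comes from a cost-minimising search on the validation set. A minimal sketch of such a search is shown below; `search_threshold` is a hypothetical helper, and the validation labels and probabilities are synthetic, so the threshold it finds will not match the project's value.

```python
import numpy as np

COST_FP, COST_FN = 500, 10_000


def total_cost(y_true, y_prob, threshold: float) -> int:
    """Dollar cost when every probability >= threshold is flagged as failure."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_prob) >= threshold
    fp = np.sum(y_pred & (y_true == 0))
    fn = np.sum(~y_pred & (y_true == 1))
    return int(fp * COST_FP + fn * COST_FN)


def search_threshold(y_true, y_prob, grid=None):
    """Return (threshold, cost) with the lowest cost over the grid."""
    if grid is None:
        grid = np.round(np.linspace(0.05, 0.95, 91), 2)  # 0.05, 0.06, ..., 0.95
    costs = [total_cost(y_true, y_prob, t) for t in grid]
    best = int(np.argmin(costs))
    return float(grid[best]), costs[best]


# Synthetic validation set: 3.4% failures, imperfectly separated scores
rng = np.random.default_rng(7)
y_val = (rng.random(2_000) < 0.034).astype(int)
y_prob = np.where(
    y_val == 1, rng.normal(0.7, 0.2, 2_000), rng.normal(0.1, 0.1, 2_000)
).clip(0, 1)

best_t, best_cost = search_threshold(y_val, y_prob)
print(f"best threshold={best_t:.2f}, cost=${best_cost:,}")
print(f"cost at default 0.50: ${total_cost(y_val, y_prob, 0.5):,}")
```

With a 20:1 cost ratio the search usually settles below 0.5, trading extra false alarms for fewer missed failures.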

5.3 Diagnostic Plots

Confusion Matrix

64 failures correctly flagged. 4 missed at $10,000 each ($40,000). 181 false alarms at $500 each ($90,500). Total projected test-set cost: $130,500.

ROC Curve

AUC = 0.9847. The curve immediately reaches ~80% True Positive Rate at near-zero False Positive Rate.

Feature Importance

Tool wear [min] ranks first. Power and Temp_Diff, both engineered features, rank 2nd and 3rd, above every other raw sensor reading. This supports the domain-engineering approach. The SHAP tab in the live app shows these same rankings at the individual-prediction level.

6. Business Impact

Cost breakdown on the test set (2,000 machine cycles)

Outcome	Count	Unit Cost	Total
False Negatives — missed failures	4	$10,000	$40,000
False Positives — unnecessary inspections	181	$500	$90,500
Total projected cost			$130,500

Comparison against standard maintenance strategies (1,000-machine fleet)

Strategy	Failures Caught	Annual Cost	Saving vs Reactive
Reactive — wait for breakdown	0%	$340,000	—
Preventive — fixed schedule	100%	$500,000	−$160,000
This Model — LightGBM, threshold 0.32	94%	$79,000	$261,000 (76.8%)
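The reactive and preventive figures in the table follow directly from the fleet parameters, and the savings follow from the model's annual cost. The sketch below reproduces that arithmetic; the model cost of $79,000 is taken from the table as given rather than recomputed, and the helper name `saving` is illustrative.

```python
FLEET_SIZE = 1_000
FAILURE_RATE = 0.034
COST_FAILURE = 10_000
COST_INSPECTION = 500
MODEL_ANNUAL_COST = 79_000  # taken from the table, not derived here

expected_failures = round(FLEET_SIZE * FAILURE_RATE)

# Reactive: every failure happens and is paid for in full
reactive_cost = expected_failures * COST_FAILURE

# Preventive: every machine is serviced once, no failures
preventive_cost = FLEET_SIZE * COST_INSPECTION


def saving(baseline: int, model_cost: int):
    """Absolute and percentage saving of the model against a baseline strategy."""
    absolute = baseline - model_cost
    return absolute, round(100 * absolute / baseline, 1)


print("reactive:", reactive_cost, "preventive:", preventive_cost)
print("saving vs reactive:", saving(reactive_cost, MODEL_ANNUAL_COST))
print("saving vs preventive:", saving(preventive_cost, MODEL_ANNUAL_COST))
```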

7. Repository Structure

predictive-maintenance-engine/
│
├── assets/
│   └── screenshots/
│       ├── 01_live_prediction_safe.png
│       ├── 02_cost_analysis_safe.png
│       ├── 03_live_prediction_danger.png
│       ├── 04_cost_analysis_danger.png
│       ├── 05_batch_analysis_summary.png
│       ├── 06_batch_analysis_table.png
│       ├── 07_business_dashboard.png
│       ├── 08_model_leaderboard.png
│       ├── 09_shap_safe_machine.png
│       └── 10_shap_critical_danger.png
│
├── artifacts/                         # Auto-generated — gitignored
│   ├── graphs/
│   │   ├── confusion_matrix.png
│   │   ├── roc_curve.png
│   │   └── feature_importance.png
│   └── model_leaderboard.csv
│
├── api/
│   ├── __init__.py
│   └── main.py                        # /predict, /predict-batch, /health
│
├── src/
│   ├── __init__.py
│   ├── config.py
│   ├── data_ingestion.py
│   ├── feature_engineering.py
│   ├── modeling.py
│   └── evaluation.py
│
├── tests/
│   └── test_pipeline.py               # 14 pytest unit tests
│
├── main_execution.ipynb               # Training pipeline (Colab)
├── run_pipeline.py                    # Training pipeline (local)
├── streamlit_app.py                   # Streamlit dashboard (4 tabs)
├── monitoring.py                      # KS drift detection
├── Dockerfile
├── docker-compose.yml
├── requirements.txt
└── README.md

8. Quickstart

Option A — Live App (No Installation)

Visit https://predictive-maintenance-deep-shah.streamlit.app/ directly in your browser.

Option B — Google Colab (Recommended for Training)

1. Upload the project to Google Drive:

MyDrive/
└── predictive-maintenance-engine/
    ├── src/
    ├── api/
    ├── tests/
    ├── streamlit_app.py
    ├── monitoring.py
    └── requirements.txt

2. Open main_execution.ipynb in Google Colab and run all cells.

The pipeline mounts Drive, downloads the dataset automatically via gdown, trains all 9 models, tunes the champion, and saves every artifact back to Drive.

Option C — Local

git clone https://github.com/DeepShah111/predictive-maintenance-engine.git
cd predictive-maintenance-engine

python -m venv venv
venv\Scripts\activate        # Windows
# source venv/bin/activate   # Mac/Linux

pip install -r requirements.txt
python run_pipeline.py

9. Streamlit App

streamlit run streamlit_app.py
# → http://localhost:8501

Tab	What it does
⚡ Live Prediction	Sensor sliders → real-time failure probability gauge + risk level + cost impact
📂 Batch Analysis	Upload CSV → ranked fleet risk table + distribution chart + downloadable results
📊 Business Dashboard	Strategy cost comparison + live threshold slider with FP/FN/cost update
🔍 Model Explainability	SHAP waterfall chart + global feature importance — explains any prediction in plain English

SHAP Explainability Tab

The Model Explainability tab uses shap.TreeExplainer on the LightGBM classifier to compute exact Shapley values for any sensor reading. Four quick-load presets are included so any user can immediately see the explainability working:

Preset	What it demonstrates
Safe Machine (H-type)	Blue bars dominate — high RPM, fresh tool, H-tier push prediction toward safe
Critical Danger (L-type)	Red bars dominate — Tool Wear, Force Ratio, Temp_Diff all fire simultaneously
Heat Dissipation Risk	Temp_Diff below 8.6 K threshold as the primary red bar
Tool Wear Limit	Tool Wear at 253 min (maximum) as the single dominant red bar

Live deployment: https://predictive-maintenance-deep-shah.streamlit.app/

10. FastAPI — REST Endpoints

uvicorn api.main:app --reload --port 8000
# → http://localhost:8000/docs

Method	Endpoint	Description
`GET`	`/health`	Model loaded status, threshold, version
`POST`	`/predict`	Single reading → probability + risk level + recommended action
`POST`	`/predict-batch`	List of readings → predictions + fleet summary

Example — Single Prediction:

curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{
    "machine_type": "L",
    "air_temperature_K": 302.0,
    "process_temperature_K": 309.0,
    "rotational_speed_rpm": 1200,
    "torque_Nm": 65.0,
    "tool_wear_min": 240,
    "machine_id": "MACHINE-001"
  }'

Expected response:

{
  "machine_id": "MACHINE-001",
  "failure_probability": 0.812,
  "failure_probability_pct": 81.2,
  "risk_level": "DANGER",
  "recommended_action": "IMMEDIATE maintenance required. Take machine offline.",
  "expected_cost_if_ignored": 8120.0,
  "physics_features": {
    "Temp_Diff": 7.0,
    "Power": 78000.0,
    "Force_Ratio": 0.054167
  },
  "model_name": "Lightgbm",
  "threshold_used": 0.32
}

11. Docker Deployment

# Build and run
docker compose up --build
# → API available at http://localhost:8000

# Stop
docker compose down

The artifacts/ directory is mounted as a read-only volume so the container always uses the latest trained model without a rebuild.

12. Drift Detection & Monitoring

The monitoring.py module detects covariate shift between training and production data using the Kolmogorov-Smirnov test (α = 0.05).

from monitoring import DriftMonitor
import pandas as pd

monitor = DriftMonitor()
alerts = monitor.check_drift(pd.read_csv("new_readings.csv"), tag="production_batch_1")

if alerts:
    for a in alerts:
        print(f"DRIFT: {a['feature']} — shift {a['mean_shift_pct']:.1f}%")

CLI usage:

python monitoring.py --csv new_sensor_data.csv --tag production_jan_2025

All alerts logged to artifacts/drift_alerts.csv with timestamp, KS statistic, p-value, and mean shift percentage.
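The core of a KS-based drift check can be sketched with `scipy.stats.ks_2samp`. This is not the `DriftMonitor` implementation: the function `detect_drift`, the alert fields other than `feature` and `mean_shift_pct`, and the synthetic sensor values are assumptions made for illustration.

```python
import numpy as np
from scipy.stats import ks_2samp

ALPHA = 0.05  # significance level quoted above


def detect_drift(reference: dict, current: dict, alpha: float = ALPHA) -> list:
    """Compare each feature's reference and current samples with a two-sample KS test."""
    alerts = []
    for feature, ref_values in reference.items():
        ref = np.asarray(ref_values, dtype=float)
        cur = np.asarray(current[feature], dtype=float)
        result = ks_2samp(ref, cur)
        if result.pvalue < alpha:
            ref_mean = ref.mean()
            shift = 100.0 * (cur.mean() - ref_mean) / abs(ref_mean) if ref_mean else float("nan")
            alerts.append(
                {
                    "feature": feature,
                    "ks_statistic": float(result.statistic),
                    "p_value": float(result.pvalue),
                    "mean_shift_pct": float(shift),
                }
            )
    return alerts


# Synthetic example: torque drifts upward, tool wear is unchanged
rng = np.random.default_rng(0)
reference = {
    "Torque [Nm]": rng.normal(40.0, 10.0, size=2_000),
    "Tool wear [min]": rng.uniform(0, 250, size=2_000),
}
current = {
    "Torque [Nm]": reference["Torque [Nm]"] + 8.0,
    "Tool wear [min]": reference["Tool wear [min]"].copy(),
}

alerts = detect_drift(reference, current)
for a in alerts:
    print(f"DRIFT: {a['feature']}, shift {a['mean_shift_pct']:.1f}%")
```

When many features are tested at α = 0.05, some false alerts are expected by chance, so a multiple-comparison correction may be worth considering.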

13. Running Tests

python -m pytest tests/ -v

collected 14 items

tests/test_pipeline.py::test_physics_features_columns_created             PASSED
tests/test_pipeline.py::test_physics_features_temp_diff_value             PASSED
tests/test_pipeline.py::test_physics_features_power_value                 PASSED
tests/test_pipeline.py::test_physics_features_no_infinities               PASSED
tests/test_pipeline.py::test_leakage_cols_dropped_after_split             PASSED
tests/test_pipeline.py::test_get_preprocessor_returns_column_transformer  PASSED
tests/test_pipeline.py::test_clean_data_removes_duplicates                PASSED
tests/test_pipeline.py::test_clean_data_index_is_contiguous               PASSED
tests/test_pipeline.py::test_build_features_and_split_returns_six_objects PASSED
tests/test_pipeline.py::test_build_features_and_split_sizes               PASSED
tests/test_pipeline.py::test_build_features_and_split_class_balance       PASSED
tests/test_pipeline.py::test_total_cost_metric_correct_value              PASSED
tests/test_pipeline.py::test_total_cost_metric_degenerate_returns_inf     PASSED
tests/test_pipeline.py::test_schema_validation_raises_on_missing_columns  PASSED

14 passed in ~18s

14. Dataset

AI4I 2020 Predictive Maintenance Dataset

Property	Value
Source	UCI ML Repository · Kaggle
Rows	10,000
Features used	9 (5 raw numerical + 1 categorical + 3 physics-derived)
Target	`Machine failure` (binary: 0 = healthy, 1 = failure)
Class distribution	96.6% healthy / 3.4% failure
Leakage columns dropped	`UDI`, `Product ID`, `TWF`, `HDF`, `PWF`, `OSF`, `RNF`

The failure-mode columns (TWF through RNF) are the individual sub-flags from which Machine failure is derived, so keeping them would let the model read the answer directly. UDI and Product ID are identifiers rather than sensor signals. All seven columns are dropped before any modelling step. The dataset downloads automatically on first run via gdown.
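Dropping the leakage columns and separating the target can be sketched as below. The helper `split_features_target` is hypothetical and the two-row frame is toy data; only the column names come from the table above.

```python
import pandas as pd

LEAKAGE_COLS = ["UDI", "Product ID", "TWF", "HDF", "PWF", "OSF", "RNF"]
TARGET = "Machine failure"


def split_features_target(df: pd.DataFrame):
    """Drop leakage columns and return (X, y); fail loudly if the schema is off."""
    missing = [c for c in LEAKAGE_COLS + [TARGET] if c not in df.columns]
    if missing:
        raise ValueError(f"Missing expected columns: {missing}")
    X = df.drop(columns=LEAKAGE_COLS + [TARGET])
    y = df[TARGET].astype(int)
    return X, y


# Toy frame with the same column layout (values are made up)
toy = pd.DataFrame(
    {
        "UDI": [1, 2],
        "Product ID": ["M14860", "L47181"],
        "Type": ["M", "L"],
        "Torque [Nm]": [42.8, 65.0],
        "Machine failure": [0, 1],
        "TWF": [0, 1],
        "HDF": [0, 0],
        "PWF": [0, 0],
        "OSF": [0, 0],
        "RNF": [0, 0],
    }
)
X_demo, y_demo = split_features_target(toy)
print(list(X_demo.columns))
```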

Built as a portfolio project demonstrating production ML engineering practices.
Structured for clarity, correctness, and interview-readiness.

🚀 Live Demo | 📁 GitHub
