Autorma — Automated Refund Item Classification

An end-to-end MLOps system for classifying returned e-commerce items using computer vision. Built with production-grade practices: model versioning, containerised services, batch inference, and real-time monitoring.

System Overview

Service	Purpose	Port
MLflow	Model registry & experiment tracking	5000
Model Service	FastAPI inference API	8000
Orchestrator	Batch inference runner	—
Prometheus	Metrics collection	9090
Pushgateway	Batch metrics ingestion	9091
Grafana	Dashboards	3000
Streamlit UI	Manual batch interface	8501

Model: EfficientNet-B0 fine-tuned on 5 return categories — Shirts, Watches, Casual Shoes, Tops, Handbags. Test accuracy: 96.53%

Asset Management

The following files are not in Git and must be downloaded before running the system:

Asset	Size	Location	Download	Required
Training Dataset	~1GB	`data/processed/`	Google Drive	Yes
Trained Model v1	~50MB	`models/v1/`	Google Drive	Yes

See docs/ASSETS.md for detailed download and verification instructions.

Quick Start

Prerequisites

Docker & Docker Compose
Python 3.12+ with uv
8GB RAM minimum
Downloaded assets (see above)

1. Clone and install

git clone https://github.com/DanielPopoola/autorma.git
cd autorma
uv sync

2. Create required directories

mkdir -p data/inference/{input,output,checkpoints} mlflow_data/artifacts logs
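
The brace expansion in the command above is a bash/zsh feature and may not work in other shells. As a portable alternative, the same layout can be created from Python. This is a minimal sketch: the function name `create_required_dirs` is hypothetical and not part of the repository; only the directory names are taken from the command above.

```python
from pathlib import Path

# Directory layout copied from the mkdir command above.
REQUIRED_DIRS = [
    "data/inference/input",
    "data/inference/output",
    "data/inference/checkpoints",
    "mlflow_data/artifacts",
    "logs",
]


def create_required_dirs(root: Path) -> list[Path]:
    """Create each required directory under root, like `mkdir -p`."""
    created = []
    for relative in REQUIRED_DIRS:
        path = root / relative
        # parents=True builds intermediate folders; exist_ok=True makes reruns safe.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```

Calling `create_required_dirs(Path.cwd())` from the repository root should give the same result as the shell command.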

3. Start core services with Docker

MLflow and the Model Service run in Docker. Start them together:

docker compose up --build -d

Wait until both services report healthy. The first build takes about 60 seconds because the torch dependency is large:

docker compose ps  # Both should show "healthy"

Services are accessible at:

MLflow UI: http://localhost:5000
Model Service: http://localhost:8000/docs

4. Register the model

This only needs to be done once (or after clearing the MLflow database). Run while Docker services are running:

MLFLOW_TRACKING_URI=http://localhost:5000 python scripts/register_model.py
MLFLOW_TRACKING_URI=http://localhost:5000 python scripts/set_production.py

⚠️ Always register while the Dockerised MLflow server is running. Registering against a locally-run MLflow instance records host-absolute artifact paths that containers cannot resolve. See docs/DEVELOPMENT.md for the full explanation of why this matters.

After registering, restart the model service to load the model:

docker compose restart model-service

Verify it loaded: curl http://localhost:8000/health
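
When scripting the setup, the manual curl check can be replaced by a small polling loop. The sketch below assumes that an HTTP 200 from /health means the model is loaded; the README does not document the response body, so it is not inspected. The names `fetch_status` and `wait_until_healthy` are hypothetical helpers, not part of the repository.

```python
import time
import urllib.error
import urllib.request

HEALTH_URL = "http://localhost:8000/health"  # endpoint from the step above


def fetch_status(url: str, timeout: float = 2.0) -> int:
    """Return the HTTP status code for url, or 0 if the request fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        return exc.code  # server answered, but with an error status
    except (urllib.error.URLError, OSError):
        return 0  # connection refused, timeout, DNS failure, ...


def wait_until_healthy(url=HEALTH_URL, attempts=30, delay=2.0,
                       fetch=fetch_status, sleep=time.sleep) -> bool:
    """Poll url until it returns 200 (assumed healthy) or attempts run out."""
    for attempt in range(attempts):
        if fetch(url) == 200:
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

`fetch` and `sleep` are parameters so the loop can be exercised without a running service.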

5. Start the Streamlit UI (optional)

streamlit run streamlit-ui/app.py

Access at: http://localhost:8501

6. Start the monitoring stack (optional)

cd monitoring && docker compose up -d

Access Grafana at http://localhost:3000 (admin/admin).

Running Batch Jobs

The orchestrator runs as a one-shot container — triggered manually or by cron.

Manual run

# Populate the input directory with test images
find data/processed/test -name "*.jpg" | shuf -n 50 | xargs -I {} cp {} data/inference/input/

# Run the orchestrator
docker compose --profile manual up orchestrator

Results are written to data/inference/output/ as JSON.
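
The sampling step of the manual run can also be done in Python, which helps on systems without shuf. The sketch below assumes the test set is organised into one subfolder per class (the README does not state this) and prefixes each copied file with its parent folder name, so that files with the same name in different folders do not overwrite each other. `sample_images` is a hypothetical helper, and whether the orchestrator depends on the original file names is not known.

```python
import random
import shutil
from pathlib import Path
from typing import Optional


def sample_images(source: Path, dest: Path, count: int = 50,
                  seed: Optional[int] = None) -> list[Path]:
    """Copy up to `count` random .jpg files from source (recursively) into dest."""
    images = sorted(source.rglob("*.jpg"))
    rng = random.Random(seed)  # pass a seed for a reproducible sample
    chosen = rng.sample(images, min(count, len(images)))
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for image in chosen:
        # Prefix with the parent folder name to avoid name collisions in dest.
        target = dest / f"{image.parent.name}_{image.name}"
        shutil.copy2(image, target)
        copied.append(target)
    return copied
```

For example, `sample_images(Path("data/processed/test"), Path("data/inference/input"))` mirrors the shell pipeline shown earlier.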

Scheduled via cron

crontab -e

# Add: run nightly at 2 AM
0 2 * * * cd /home/youruser/autorma && docker compose --profile manual run --rm orchestrator >> logs/cron.log 2>&1

Idempotency

The orchestrator checkpoints after each batch to data/inference/checkpoints/checkpoint.json. Re-running will skip already-processed images. To force a full rerun:

rm data/inference/checkpoints/checkpoint.json
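
To illustrate the checkpoint-and-skip behaviour described above, here is a minimal sketch of how such a loop could work. It is not the orchestrator's actual code: the checkpoint schema (a JSON object with a `processed` list), the batch size, and the function names are all assumptions made for the example.

```python
import json
from pathlib import Path
from typing import Callable, Iterable


def load_processed(checkpoint: Path) -> set[str]:
    """Return the set of already-processed image names (empty if no checkpoint)."""
    if not checkpoint.exists():
        return set()
    return set(json.loads(checkpoint.read_text())["processed"])


def save_processed(checkpoint: Path, processed: set[str]) -> None:
    """Write the checkpoint via a temp file so a crash cannot leave it half-written."""
    checkpoint.parent.mkdir(parents=True, exist_ok=True)
    tmp = checkpoint.with_suffix(".tmp")
    tmp.write_text(json.dumps({"processed": sorted(processed)}))
    tmp.replace(checkpoint)


def run_batches(images: Iterable[str], checkpoint: Path,
                classify_batch: Callable[[list[str]], None],
                batch_size: int = 32) -> int:
    """Classify only images not in the checkpoint; return how many were pending."""
    processed = load_processed(checkpoint)
    pending = [name for name in images if name not in processed]
    for start in range(0, len(pending), batch_size):
        batch = pending[start:start + batch_size]
        classify_batch(batch)
        # Checkpoint after every batch so a rerun resumes from here.
        processed.update(batch)
        save_processed(checkpoint, processed)
    return len(pending)
```

Under this scheme, deleting the checkpoint file empties the processed set, which is why removing checkpoint.json forces a full rerun.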

Model Management

Register a new model version:

MLFLOW_TRACKING_URI=http://localhost:5000 python scripts/register_model.py
MLFLOW_TRACKING_URI=http://localhost:5000 python scripts/set_production.py

Roll back to a previous version:

import mlflow
mlflow.set_tracking_uri("http://localhost:5000")
client = mlflow.MlflowClient()
client.set_registered_model_alias("refund-classifier", "production", "1")  # version number

Then restart the model service: docker compose restart model-service
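
If rollbacks are frequent, the alias update can be wrapped in a helper that looks up the current production version and moves the alias one version back. This is a sketch under two assumptions: version numbers are consecutive integers with no deleted versions in between, and the installed MLflow release provides `get_model_version_by_alias` (available in recent 2.x releases). `roll_back_alias` is a hypothetical name.

```python
def roll_back_alias(client, model_name: str = "refund-classifier",
                    alias: str = "production") -> str:
    """Point `alias` at the version just before the one it currently targets.

    `client` is expected to behave like mlflow.MlflowClient.
    Assumes the previous version number exists in the registry.
    """
    current = int(client.get_model_version_by_alias(model_name, alias).version)
    if current <= 1:
        raise ValueError(f"{model_name} has no version before {current}")
    target = str(current - 1)
    client.set_registered_model_alias(model_name, alias, target)
    return target
```

In practice this would be called with `mlflow.MlflowClient()` after `mlflow.set_tracking_uri("http://localhost:5000")`, and the model service would still need a restart afterwards to load the older version.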

Monitoring

With the monitoring stack running:

Prometheus: http://localhost:9090
Pushgateway: http://localhost:9091
Grafana: http://localhost:3000 — request rate, latency, class distribution, batch success rate

Key metrics exposed by the model service at /metrics:

api_requests_total{endpoint, status}
api_request_duration_seconds
prediction_confidence
predictions_by_class_total{class_name}
images_processed_total
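
For a quick check without Grafana, the /metrics output can be parsed directly, since Prometheus exposes metrics as plain text with one sample per line. The sketch below is a simplified parser that covers the common `name{labels} value` form only; it is not a full implementation of the exposition format. The helper names are hypothetical, and the metric and label names are the ones listed above.

```python
import re
from collections import defaultdict

# name, optional {labels}, then the sample value
SAMPLE_LINE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text: str) -> list[tuple[str, dict, float]]:
    """Parse Prometheus text output into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_LINE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(LABEL.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


def predictions_by_class(text: str) -> dict[str, float]:
    """Sum predictions_by_class_total per class_name label."""
    totals = defaultdict(float)
    for name, labels, value in parse_metrics(text):
        if name == "predictions_by_class_total":
            totals[labels.get("class_name", "")] += value
    return dict(totals)
```

The input text could come from `urllib.request.urlopen("http://localhost:8000/metrics").read().decode()` while the model service is running.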

Troubleshooting

Model service fails with "No such file or directory" on an artifact path

The model was registered against a non-Docker MLflow instance. The artifact path was recorded as a host-absolute path containers can't reach. Fix:

1. Delete the registered model and its experiment in the MLflow UI (http://localhost:5000)
2. Ensure the Docker services are running: docker compose up -d
3. Re-register: MLFLOW_TRACKING_URI=http://localhost:5000 python scripts/register_model.py
4. Restart the model service: docker compose restart model-service

Model service exits immediately

Check the logs with docker compose logs model-service. The most likely cause is that MLflow wasn't healthy when the service started; if so, run docker compose restart model-service.

Orchestrator can't find images

data/inference/input/ on your host is mounted into the container. Confirm images are there: ls data/inference/input/.

Grafana shows no data

Run a batch job to generate traffic first, then expand the time range to "Last 6 hours".

👤 Author

Built as a final-year project demonstrating end-to-end ML systems engineering.

Stack: Python · PyTorch · FastAPI · MLflow · Docker · Prometheus · Grafana · Streamlit

📄 License

MIT — see LICENSE file.
