A minimal FastAPI service that accepts CSV uploads and returns a structured JSON data quality report.
CSV files are still common in analytics, operations, and internal business workflows. Before a CSV file is used in a pipeline, it is useful to run simple quality checks for missing values, duplicate rows, empty columns, and schema mismatches.
This API demonstrates a complete but lightweight backend workflow:
- upload a CSV file
- validate basic file constraints
- analyze the CSV with pandas
- return a typed JSON response with Pydantic models
- test the API with pytest
- package the service with Docker
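The pandas analysis step in the workflow above can be sketched as a plain function. This is a minimal sketch, not the code in app/analyzer.py. The name analyze_csv_bytes is hypothetical, the function assumes UTF-8 input, and it covers only the row, missing-value, duplicate, and empty-column metrics. The warning strings mirror the ones in the example response for bad_sample.csv.

```python
import io

import pandas as pd


def analyze_csv_bytes(content: bytes, filename: str = "upload.csv") -> dict:
    """Compute basic data quality metrics for raw CSV bytes (hypothetical helper)."""
    # Assumption: UTF-8 input; the real service may detect or reject other encodings.
    df = pd.read_csv(io.BytesIO(content), encoding="utf-8")
    row_count = len(df)

    # pandas reads empty CSV cells as NaN, so isna() counts them as missing.
    missing_by_column = {str(col): int(n) for col, n in df.isna().sum().items()}
    missing_ratio_by_column = {
        col: round(n / row_count, 4) if row_count else 0.0
        for col, n in missing_by_column.items()
    }

    # duplicated() flags every row identical to an earlier row (first occurrence kept).
    duplicate_row_count = int(df.duplicated().sum())

    # A column is "empty" when the file has rows but every cell in it is missing.
    empty_columns = [str(col) for col in df.columns if row_count and df[col].isna().all()]

    warnings = []
    total_missing = sum(missing_by_column.values())
    if total_missing:
        warnings.append(f"The CSV file contains {total_missing} missing value(s).")
    if duplicate_row_count:
        warnings.append(f"The CSV file contains {duplicate_row_count} duplicate row(s).")
    if empty_columns:
        warnings.append(f"The CSV file contains empty column(s): {', '.join(empty_columns)}.")

    return {
        "filename": filename,
        "row_count": row_count,
        "column_count": len(df.columns),
        "column_names": [str(col) for col in df.columns],
        "missing_values_by_column": missing_by_column,
        "missing_value_ratio_by_column": missing_ratio_by_column,
        "duplicate_row_count": duplicate_row_count,
        "duplicate_row_ratio": round(duplicate_row_count / row_count, 4) if row_count else 0.0,
        "empty_columns": empty_columns,
        "warnings": warnings,
    }


report = analyze_csv_bytes(b"id,name,notes\n1,Ann,\n2,Bob,\n2,Bob,\n3,,\n", "demo.csv")
print(report["duplicate_row_count"], report["empty_columns"])  # 1 ['notes']
```

pandas treats rows with missing values in the same positions as equal. As a result, the two `2,Bob,` rows count as one duplicate even though their notes cells are NaN.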
The POST /analyze endpoint returns:
- row_count
- column_count
- column_names
- missing_values_by_column
- missing_value_ratio_by_column
- duplicate_row_count
- duplicate_row_ratio
- empty_columns
- column_name_issues
- optional schema_validation
- warnings
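A Pydantic response model for these fields might look like the sketch below. Field names and types follow the example response in this README. The class names are hypothetical. The README only shows "schema_validation": null, so the SchemaValidation fields are an assumption.

```python
from typing import Optional

from pydantic import BaseModel, Field


class ColumnNameIssues(BaseModel):
    """Header problems, mirroring the column_name_issues object in the example response."""

    duplicate_columns: list[str] = Field(default_factory=list)
    unnamed_columns: list[str] = Field(default_factory=list)
    columns_with_leading_or_trailing_spaces: list[str] = Field(default_factory=list)
    empty_column_names: list[str] = Field(default_factory=list)


class SchemaValidation(BaseModel):
    """Hypothetical shape: the README only shows schema_validation as null."""

    expected_columns: list[str]
    missing_columns: list[str]
    unexpected_columns: list[str]
    is_valid: bool


class AnalysisReport(BaseModel):
    """Top-level response body for POST /analyze (class name is hypothetical)."""

    filename: str
    row_count: int
    column_count: int
    column_names: list[str]
    missing_values_by_column: dict[str, int]
    missing_value_ratio_by_column: dict[str, float]
    duplicate_row_count: int
    duplicate_row_ratio: float
    empty_columns: list[str]
    column_name_issues: ColumnNameIssues
    # Assumption: stays None unless expected_columns was sent with the upload.
    schema_validation: Optional[SchemaValidation] = None
    warnings: list[str] = Field(default_factory=list)


# Example: a report for a clean two-row file with a single "id" column.
report = AnalysisReport(
    filename="demo.csv",
    row_count=2,
    column_count=1,
    column_names=["id"],
    missing_values_by_column={"id": 0},
    missing_value_ratio_by_column={"id": 0.0},
    duplicate_row_count=0,
    duplicate_row_ratio=0.0,
    empty_columns=[],
    column_name_issues=ColumnNameIssues(),
)
```

When a model like this is passed as the endpoint's response_model, FastAPI validates the outgoing data against it and documents the schema in Swagger UI.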
The API also includes structured errors for:
- non-CSV files
- empty uploads
- parse errors
- unsupported encodings
- files larger than 5 MB
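The file-level checks above can run before pandas reads the upload. The sketch below assumes a hypothetical validate_upload helper and a CSVUploadError exception. Only the invalid_file_type code and its message come from this README; the other error codes are illustrative. The 5 MB limit is read here as 5 * 1024 * 1024 bytes. Parse errors are not covered, because they would surface later when pandas reads the decoded text (pandas.errors.ParserError).

```python
from typing import Optional

# Assumption: "5 MB" is read as 5 MiB; the real limit may be 5_000_000 bytes.
MAX_UPLOAD_BYTES = 5 * 1024 * 1024


class CSVUploadError(Exception):
    """Carries a machine-readable code, a message, and details for the error envelope."""

    def __init__(self, code: str, message: str, details: Optional[dict] = None) -> None:
        super().__init__(message)
        self.code = code
        self.message = message
        self.details = details or {}

    def to_response(self) -> dict:
        # Same {"error": {...}} shape as the README's invalid_file_type example.
        return {"error": {"code": self.code, "message": self.message, "details": self.details}}


def validate_upload(filename: str, content: bytes) -> str:
    """Run file-level checks and return the decoded CSV text (hypothetical helper)."""
    if not filename.lower().endswith(".csv"):
        raise CSVUploadError(
            "invalid_file_type", "Only .csv files are supported.", {"filename": filename}
        )
    if not content:
        # Hypothetical code name.
        raise CSVUploadError("empty_file", "The uploaded file is empty.", {"filename": filename})
    if len(content) > MAX_UPLOAD_BYTES:
        raise CSVUploadError(
            "file_too_large",  # hypothetical code name
            "The uploaded file exceeds the 5 MB limit.",
            {"filename": filename, "size_bytes": len(content), "max_bytes": MAX_UPLOAD_BYTES},
        )
    try:
        # utf-8-sig also strips a leading byte order mark, which some spreadsheet exports add.
        return content.decode("utf-8-sig")
    except UnicodeDecodeError as exc:
        raise CSVUploadError(
            "unsupported_encoding",  # hypothetical code name
            "The file could not be decoded as UTF-8.",
            {"filename": filename, "byte_offset": exc.start},
        ) from exc


try:
    validate_upload("not_csv.txt", b"a,b\n1,2\n")
except CSVUploadError as err:
    print(err.to_response())
```

This prints an envelope with the same shape as the invalid_file_type example response. Raising one exception type and converting it in a single place keeps every error in the same shape. In FastAPI, that conversion would typically live in a handler registered with app.exception_handler.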
Tech stack:
- Python 3.12
- FastAPI
- Pydantic
- pandas
- pytest
- Uvicorn
- Docker
Project structure:
fastapi-csv-quality-api/
├── README.md
├── LICENSE
├── requirements.txt
├── Dockerfile
├── docker-compose.yml
├── .gitignore
├── app/
│ ├── __init__.py
│ ├── main.py
│ ├── models.py
│ ├── analyzer.py
│ └── errors.py
├── tests/
│ ├── __init__.py
│ ├── test_health.py
│ ├── test_analyze.py
│ └── fixtures/
├── sample_data/
├── screenshots/
├── docs/
└── article_assets/
Run locally on Windows (PowerShell):
python -m venv .venv
.\.venv\Scripts\Activate.ps1
python -m pip install --upgrade pip
pip install -r requirements.txt
uvicorn app.main:app --reload
If PowerShell blocks virtual environment activation, run:
Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser
Then activate again:
.\.venv\Scripts\Activate.ps1
Run locally on macOS or Linux (bash):
python -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
pip install -r requirements.txt
uvicorn app.main:app --reload
Open Swagger UI:
http://127.0.0.1:8000/docs
Check the health endpoint:
curl http://127.0.0.1:8000/health
Expected response:
{
"status": "ok",
"service": "fastapi-csv-quality-api",
"version": "0.1.0"
}
Analyze a clean sample file.
PowerShell:
curl.exe -X POST "http://127.0.0.1:8000/analyze" `
-F "file=@sample_data/good_sample.csv"
bash:
curl -X POST "http://127.0.0.1:8000/analyze" \
-F "file=@sample_data/good_sample.csv"
Analyze a sample file with data quality issues.
PowerShell:
curl.exe -X POST "http://127.0.0.1:8000/analyze" `
-F "file=@sample_data/bad_sample.csv"
bash:
curl -X POST "http://127.0.0.1:8000/analyze" \
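-F "file=@sample_data/bad_sample.csv"
Validate the header against an expected column list with the optional expected_columns form field.
PowerShell:
curl.exe -X POST "http://127.0.0.1:8000/analyze" `
-F "file=@sample_data/good_sample.csv" `
-F "expected_columns=id,name,email,age,signup_date"
bash: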
curl -X POST "http://127.0.0.1:8000/analyze" \
-F "file=@sample_data/good_sample.csv" \
-F "expected_columns=id,name,email,age,signup_date"
Example response for bad_sample.csv:
{
"filename": "bad_sample.csv",
"row_count": 6,
"column_count": 6,
"column_names": [
"id",
"name",
"email",
"age",
"signup_date",
"notes"
],
"missing_values_by_column": {
"id": 0,
"name": 1,
"email": 2,
"age": 1,
"signup_date": 2,
"notes": 6
},
"missing_value_ratio_by_column": {
"id": 0.0,
"name": 0.1667,
"email": 0.3333,
"age": 0.1667,
"signup_date": 0.3333,
"notes": 1.0
},
"duplicate_row_count": 1,
"duplicate_row_ratio": 0.1667,
"empty_columns": [
"notes"
],
"column_name_issues": {
"duplicate_columns": [],
"unnamed_columns": [],
"columns_with_leading_or_trailing_spaces": [],
"empty_column_names": []
},
"schema_validation": null,
"warnings": [
"The CSV file contains 12 missing value(s).",
"The CSV file contains 1 duplicate row(s).",
"The CSV file contains empty column(s): notes."
]
}
Example error response for a non-CSV upload:
{
"error": {
"code": "invalid_file_type",
"message": "Only .csv files are supported.",
"details": {
"filename": "not_csv.txt"
}
}
}
Run the tests:
pytest
The minimum test suite covers:
- /health returns 200
- normal CSV analysis
- missing value detection
- duplicate row detection
- non-CSV error handling
- expected column schema validation
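Expected column schema validation can also be tested without going through HTTP. This sketch rests on several assumptions. The validate_schema helper and the keys it returns are hypothetical, because the README shows schema_validation only as null. The comparison also ignores column order. The project's own tests may instead call the endpoint through FastAPI's TestClient.

```python
from typing import Optional


def validate_schema(actual_columns: list[str], expected_columns: Optional[str]) -> Optional[dict]:
    """Compare CSV headers with a comma-separated expected list (hypothetical helper)."""
    if not expected_columns:
        return None  # no expected_columns sent, so no schema check is reported
    # Split the form value on commas and ignore stray spaces and empty entries.
    expected = [name.strip() for name in expected_columns.split(",") if name.strip()]
    missing = [name for name in expected if name not in actual_columns]
    unexpected = [name for name in actual_columns if name not in expected]
    return {
        "expected_columns": expected,
        "missing_columns": missing,
        "unexpected_columns": unexpected,
        "is_valid": not missing and not unexpected,
    }


# pytest collects functions named test_* and reports plain assert failures with diffs.
def test_exact_match_is_valid():
    result = validate_schema(["id", "name"], "id, name")
    assert result["is_valid"] is True


def test_missing_and_unexpected_columns_are_reported():
    result = validate_schema(["id", "notes"], "id,name")
    assert result["missing_columns"] == ["name"]
    assert result["unexpected_columns"] == ["notes"]
    assert result["is_valid"] is False


def test_no_expected_columns_skips_validation():
    assert validate_schema(["id"], None) is None
```

Saved as a test module, for example a hypothetical tests/test_schema.py, these functions are picked up by the same pytest command.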
Build and run with Docker:
docker build -t fastapi-csv-quality-api .
docker run --rm -p 8000:8000 fastapi-csv-quality-api
Open:
http://127.0.0.1:8000/docs
The container listens on 0.0.0.0:8000. The -p 8000:8000 port mapping makes it reachable from your machine at 127.0.0.1:8000.
Or run with Docker Compose:
docker compose up --build
Stop:
docker compose down
Common Windows, Docker, and CSV parsing issues are documented in docs/troubleshooting.md.
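In Windows PowerShell, use curl.exe for the multipart upload examples. Some PowerShell environments, including Windows PowerShell 5.1, define curl as an alias for Invoke-WebRequest, which does not accept curl's options.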
If port 8000 is already in use, list the running containers:
docker psStop the container using the port:
docker stop <container_id>Or map to a different local port:
docker run --rm -p 8001:8000 fastapi-csv-quality-apiThen open:
http://127.0.0.1:8001/docs
This MVP intentionally avoids overengineering. Possible future improvements:
- configurable file size limit
- optional date format checks
- optional numeric column checks
- JSON schema export
- CI workflow with GitHub Actions
- deployment tutorial for a small cloud VM or Kubernetes platform
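Two of these improvements, a configurable file size limit and optional numeric or date checks, could be approached roughly as follows. This is a sketch, not a planned design. The CSV_MAX_UPLOAD_MB environment variable and both helper names are hypothetical, and dates are assumed to use the YYYY-MM-DD format.

```python
import os

import pandas as pd


def max_upload_bytes(default_mb: float = 5.0) -> int:
    """Read the upload limit from a hypothetical CSV_MAX_UPLOAD_MB variable."""
    return int(float(os.environ.get("CSV_MAX_UPLOAD_MB", default_mb)) * 1024 * 1024)


def count_invalid_values(series: pd.Series, kind: str, date_format: str = "%Y-%m-%d") -> int:
    """Count non-missing cells that fail to parse as numbers or dates."""
    present = series.dropna()  # missing cells are already reported separately
    if kind == "numeric":
        parsed = pd.to_numeric(present, errors="coerce")
    elif kind == "date":
        parsed = pd.to_datetime(present, format=date_format, errors="coerce")
    else:
        raise ValueError(f"Unsupported kind: {kind!r}")
    # Values that were present but became NaN/NaT during coercion are invalid.
    return int(parsed.isna().sum())


df = pd.DataFrame({"age": ["34", "abc", None], "signup_date": ["2024-01-05", "05/01/2024", None]})
print(count_invalid_values(df["age"], "numeric"))  # 1
print(count_invalid_values(df["signup_date"], "date"))  # 1
```

Counting coerced failures keeps these checks separate from the existing missing-value metric. A blank cell is reported once as missing, not twice.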
Not included in this MVP:
- authentication
- database storage
- frontend UI
- background jobs
- large-file streaming
- Kubernetes deployment
- production cloud infrastructure
Related articles:
- Build a CSV Data Quality API with FastAPI, Pandas, Pytest, and Docker
- AI-Assisted Development Is Not Autopilot
License: MIT




