Audio Processor

Quality & Security

CI/CD Status

Project Info


Author	Byron Williams
Created	2025-12-04
Repository	ByronWilliamsCPA/audio-processor

Overview

Audio file conversion and processing for RAG content pipelines

This project provides:

Core audio file conversion and processing for RAG content pipelines
Production-ready code with comprehensive testing
Well-documented API and architecture
Security-first development practices

Features

High Quality: 80%+ test coverage enforced via CI
Type Safe: Full type hints with BasedPyright strict mode
Well Documented: Clear docstrings and comprehensive guides
Developer Friendly: Pre-commit hooks, automated formatting, linting
Security First: Dependency scanning, security analysis, SBOM generation
CLI Tool: Command-line interface via audio_processor
ML Ready: Optional ML dependencies with PyTorch support

Quick Start

Prerequisites

Python 3.10+ (tested with 3.12)
UV for dependency management

Install UV:

# macOS and Linux
curl -LsSf https://astral.sh/uv/install.sh | sh

# Windows
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"

# Or with pip/pipx
pip install uv
# or
pipx install uv

Installation

# Clone repository
git clone https://github.com/ByronWilliamsCPA/audio-processor.git
cd audio-processor

# Install dependencies (includes dev tools - REQUIRED for development)
uv sync --all-extras
# Or install only the optional ML extra (already covered by --all-extras)
uv sync --extra ml

# Setup pre-commit hooks (required)
uv run pre-commit install

Basic Usage

# Import and use the package
from audio_processor import YourModule

# Example: Create an instance and use it
module = YourModule()
result = module.process()
print(result)

CLI Usage

# Display help
uv run audio_processor --help

# Use the CLI tool
uv run audio_processor command --option value

# Example: Process input file
uv run audio_processor process input.txt --output result.json

Supply Chain Security

This project implements enterprise-grade supply chain security with a multi-tier package index strategy and centralized secrets management.

Security Architecture

┌─────────────────────────────────────────────────────────────────┐
│                    Package Index Priority                        │
├─────────────────────────────────────────────────────────────────┤
│  1. Google Assured OSS (SLSA Level 3) - Third-party packages    │
│  2. Internal Artifact Registry - Organization packages           │
│  3. PyPI (fallback) - Packages not in tier 1 or 2               │
└─────────────────────────────────────────────────────────────────┘

Quick Start

# Run the setup script
./scripts/setup-supply-chain.sh

# Or manually configure
gcloud auth login
gcloud auth application-default login
pip install keyrings.google-artifactregistry-auth

Package Indexes

Index	SLSA Level	Purpose	Default
PyPI	-	Standard packages	Yes (default)
Google Assured OSS	3	Verified third-party packages	Opt-in
Internal Registry	2+	Organization-maintained packages	Opt-in

How It Works:

By default, all packages resolve from PyPI. After configuring GCP authentication, you can opt-in specific packages to use Assured OSS by uncommenting entries in pyproject.toml:

[tool.uv.sources]
numpy = { index = "assured-oss" }
pandas = { index = "assured-oss" }
requests = { index = "assured-oss" }
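
The opt-in behavior can be pictured as a lookup with a default. The sketch below is illustrative only: `resolve_index` and the `SOURCES` mapping are hypothetical names that mirror the `[tool.uv.sources]` entries, not part of uv's API. uv itself performs the real resolution.

```python
from collections.abc import Mapping

# Hypothetical model of the opt-in mapping; uv does the real resolution.
DEFAULT_INDEX = "pypi"

# Mirrors the [tool.uv.sources] entries shown above.
SOURCES = {
    "numpy": {"index": "assured-oss"},
    "pandas": {"index": "assured-oss"},
    "requests": {"index": "assured-oss"},
}


def resolve_index(
    package: str, sources: Mapping[str, Mapping[str, str]] = SOURCES
) -> str:
    """Return the index a package is pinned to, or the default when not opted in."""
    entry = sources.get(package.lower())
    return entry["index"] if entry else DEFAULT_INDEX


print(resolve_index("numpy"))    # opted in to Assured OSS
print(resolve_index("librosa"))  # not listed, so the default index applies
```

Keeping PyPI as the default means a package only changes index when it is named explicitly, which matches the opt-in column in the table above.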

Why This Matters:

SLSA Level 3: Build integrity, provenance, and tamper-proof artifacts
Supply Chain Protection: Reduced risk of dependency confusion attacks
Compliance: Meets enterprise security and audit requirements
Graceful Fallback: Works without authentication, opt-in when ready

Secrets Management with Infisical

Secrets are managed via Infisical instead of environment variables or GitHub Secrets.

Local Development:

# Login to Infisical
infisical login

# Initialize project connection
infisical init

# Run commands with secrets injected
infisical run --env=dev -- uv run python main.py

# Or export secrets to local file
infisical export --env=dev > .env.local
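
Because `infisical run` injects secrets into the child process as environment variables, application code can read them with the standard library. A minimal sketch, assuming a placeholder secret named `EXAMPLE_API_KEY`; `require_secret` is an illustrative helper, not something this package is known to provide:

```python
import os


def require_secret(name: str) -> str:
    """Return an injected secret, failing loudly if it is missing or empty."""
    value = os.environ.get(name)
    if not value:
        msg = f"Secret {name!r} is not set; run the command via `infisical run`."
        raise RuntimeError(msg)
    return value


# EXAMPLE_API_KEY is a placeholder name, not a secret this project defines.
os.environ.setdefault("EXAMPLE_API_KEY", "dummy-value-for-demo")
api_key = require_secret("EXAMPLE_API_KEY")
print(f"Loaded a secret of length {len(api_key)}")
```

Failing at startup when a secret is absent tends to be easier to debug than a later authentication error.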

CI/CD Integration:

GitHub Actions use Infisical's Machine Identity authentication
Secrets are injected at runtime, never stored in repositories
Environment mapping: main → prod, develop → staging, * → dev
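
The branch-to-environment mapping above behaves like an ordered rule list in which the first match wins. The following is a sketch of that logic only; the names `BRANCH_ENVIRONMENTS` and `environment_for` are hypothetical, and the actual workflow configuration may express the mapping differently.

```python
from fnmatch import fnmatchcase

# Ordered rules: the first matching pattern wins, so the "*" catch-all goes last.
BRANCH_ENVIRONMENTS = [
    ("main", "prod"),
    ("develop", "staging"),
    ("*", "dev"),
]


def environment_for(branch: str) -> str:
    """Map a Git branch name to the Infisical environment listed above."""
    for pattern, environment in BRANCH_ENVIRONMENTS:
        if fnmatchcase(branch, pattern):
            return environment
    msg = f"No environment rule matches branch {branch!r}"
    raise ValueError(msg)


for branch in ("main", "develop", "feature/add-flac-support"):
    print(branch, "->", environment_for(branch))
```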

SBOM & Attestation

Software Bill of Materials (SBOM) is generated on every release:

# Generate SBOM locally
uv run cyclonedx-py environment -o sbom.json

# Audit dependencies for known vulnerabilities, requiring pinned hashes
pip-audit --require-hashes

Automated via CI:

CycloneDX SBOM generated in JSON and XML formats
Attestation attached to GitHub releases
Vulnerability scanning with OSV database
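
A generated CycloneDX JSON file can be inspected with a few lines of Python. This sketch assumes only the standard top-level `bomFormat` and `components` fields; the sample document and its versions are hand-written for illustration, and `list_components` is a hypothetical helper.

```python
import json


def list_components(sbom_text: str) -> list[tuple[str, str]]:
    """Return sorted (name, version) pairs from a CycloneDX JSON document."""
    sbom = json.loads(sbom_text)
    if sbom.get("bomFormat") != "CycloneDX":
        msg = "Not a CycloneDX document"
        raise ValueError(msg)
    return sorted(
        (component["name"], component.get("version", ""))
        for component in sbom.get("components", [])
    )


# Tiny hand-written sample; a real sbom.json would be read from disk instead.
SAMPLE = json.dumps(
    {
        "bomFormat": "CycloneDX",
        "specVersion": "1.5",
        "components": [
            {"type": "library", "name": "requests", "version": "2.32.0"},
            {"type": "library", "name": "numpy", "version": "2.0.0"},
        ],
    }
)

for name, version in list_components(SAMPLE):
    print(f"{name}=={version}")
```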

Setup Instructions

Run the setup script (recommended):
```
./scripts/setup-supply-chain.sh
```

Or configure manually:

Google Cloud Authentication:

gcloud auth login
gcloud auth application-default login
pip install keyrings.google-artifactregistry-auth

Infisical Setup:

# Install Infisical CLI
# macOS
brew install infisical/get-cli/infisical

# Linux
curl -1sLf 'https://dl.cloudsmith.io/public/infisical/infisical-cli/setup.deb.sh' | sudo -E bash
sudo apt-get install infisical

# Connect to project
infisical login
infisical init

Configure CI/CD secrets in Infisical:
- GCP_SA_KEY_BASE64: Base64-encoded GCP service account key
- CODECOV_TOKEN: Codecov upload token (if using Codecov)
- SONAR_TOKEN: SonarCloud token (if using SonarCloud)

Required GCP Permissions

Role	Purpose
`roles/artifactregistry.reader`	Read from Assured OSS and internal registry
`roles/artifactregistry.writer`	Publish to internal registry (CI only)

Troubleshooting

Q: Packages not found in Assured OSS?

UV automatically falls back to PyPI - no action needed
Check available packages: Assured OSS Supported Packages

Q: Authentication errors with Artifact Registry?

Run gcloud auth application-default login to refresh credentials
Verify service account has Artifact Registry Reader role
Check keyring is installed: pip install keyrings.google-artifactregistry-auth

Q: Infisical connection issues?

Verify .infisical.json has correct workspaceId
Check your Infisical organization permissions
For CI: Ensure INFISICAL_CLIENT_ID and INFISICAL_CLIENT_SECRET are set

Q: How to verify supply chain setup?

# Test package index access
./scripts/setup-supply-chain.sh  # Re-run to verify all checks pass

Development

Setup Development Environment

# Install all dependencies including dev tools
uv sync --all-extras

# Setup pre-commit hooks
uv run pre-commit install

# Install Qlty CLI for unified code quality checks
curl https://qlty.sh | bash

# Run tests
uv run pytest -v

# Run with coverage
uv run pytest --cov=audio_processor --cov-report=html

# Run all quality checks (using Qlty)
qlty check

# Or use pre-commit
uv run pre-commit run --all-files

Code Quality Standards

All code must meet these requirements:

Formatting: Ruff (88 char limit)
Linting: Ruff with PyStrict-aligned rules (see below)
Type Checking: BasedPyright strict mode
Testing: Pytest with 80%+ coverage
Security: Bandit + dependency scanning
Documentation: Docstrings on all public APIs

Unified Quality Tool: This project uses Qlty to consolidate all quality checks into a single fast tool. See .qlty/qlty.toml for configuration.

PyStrict-Aligned Ruff Configuration

This project uses PyStrict-aligned Ruff rules for stricter code quality enforcement beyond standard Python linting:

Rule	Category	Purpose
BLE	Blind except	Prevent bare `except:` clauses
EM	Error messages	Enforce descriptive error messages
SLF	Private access	Prevent access to private members
INP	Implicit packages	Require explicit `__init__.py`
ISC	Implicit concatenation	Prevent implicit string concatenation
PGH	Pygrep hooks	Advanced pattern-based checks
RSE	Raise statement	Proper exception raising
TID	Tidy imports	Clean import organization
YTT	sys.version	Safe version checking
FA	Future annotations	Modern annotation syntax
T10	Debugger	No debugger statements in production
G	Logging format	Safe logging string formatting

These rules catch bugs that standard linting misses and enforce production-quality code patterns.
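
As an illustration of three of these rule families (BLE, EM, and G), here is a small function written to satisfy them. `parse_sample_rate` is a hypothetical helper used only for this example, not part of the package.

```python
import logging

logger = logging.getLogger(__name__)


def parse_sample_rate(raw: str) -> int:
    """Parse a sample rate in Hz from a string."""
    try:
        rate = int(raw)
    except ValueError as exc:  # BLE: catch the specific exception, not everything
        msg = f"Invalid sample rate: {raw!r}"  # EM: build the message first
        raise ValueError(msg) from exc
    if rate <= 0:
        msg = f"Sample rate must be positive, got {rate}"
        raise ValueError(msg)
    logger.info("Parsed sample rate %d Hz", rate)  # G: pass args, no f-string
    return rate


print(parse_sample_rate("44100"))
```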

Claude Code Standards

This project includes standardized Claude Code configuration via git subtree:

Directory Structure:

.claude/
├── claude.md          # Project-specific Claude guidelines
└── standard/          # Standard Claude configuration (git subtree)
    ├── CLAUDE.md      # Universal development standards
    ├── commands/      # Custom slash commands
    ├── skills/        # Reusable skills
    └── agents/        # Specialized agents

Updating Standards:

# Pull latest standards from upstream
./scripts/update-claude-standards.sh

# Or manually
git subtree pull --prefix .claude/standard \
    https://github.com/williaby/.claude.git main --squash

What's Included:

Universal development best practices
Response-Aware Development (RAD) system for assumption tagging
Agent assignment patterns and workflow
Security requirements and pre-commit standards
Git workflow and commit conventions

Project-Specific Overrides: Edit .claude/claude.md for project-specific guidelines. See .claude/README.md for details.

Running Tests

# Run all tests
uv run pytest -v

# Run specific test file
uv run pytest tests/unit/test_module.py -v

# Run with coverage report
uv run pytest --cov=audio_processor --cov-report=term-missing

# Run tests in parallel
uv run pytest -n auto

Quality Checks with Qlty

Recommended: Use Qlty CLI for unified code quality checks.

# Run all quality checks (fast!)
qlty check

# Run checks on only changed files (fastest)
qlty check --filter=diff

# Run specific plugins only
qlty check --plugin ruff --plugin pyright

# Auto-format code
qlty fmt

# View current configuration
qlty config show

Qlty runs all these tools in a single pass:

Python Quality:

Ruff (linting + formatting)
BasedPyright (type checking)
Bandit (security scanning)

Security & Secrets:

Gitleaks (secrets detection)
TruffleHog (entropy-based secrets detection)
OSV Scanner (dependency vulnerabilities)
Semgrep (advanced SAST)

File & Configuration:

Markdownlint (markdown linting)
Yamllint (YAML linting)
Prettier (JSON, YAML, Markdown formatting)
Actionlint (GitHub Actions workflows)
Shellcheck (shell script linting)

Container & Infrastructure (if Docker enabled):

Hadolint (Dockerfile linting)
Trivy (container security scanning)
Checkov (infrastructure as code security)

Code Quality Metrics:

Complexity analysis (cyclomatic, cognitive)
Code smells detection
Maintainability scoring

Individual Tool Commands (if needed)

# Format code
uv run ruff format src tests

# Lint and auto-fix
uv run ruff check --fix src tests

# Type checking
uv run basedpyright src

# Security scanning
uv run bandit -r src

# Dependency vulnerabilities
qlty check --plugin osv_scanner

Project Structure

audio_processor/
├── src/audio_processor/     # Main package
│   ├── __init__.py
│   ├── core.py                           # Core functionality
│   └── utils/                            # Utility modules
├── tests/                                # Test suite
│   ├── unit/                             # Unit tests
│   └── integration/                      # Integration tests
├── docs/                                 # Documentation
│   ├── ADRs/                             # Architecture Decision Records
│   ├── planning/                         # Project planning docs
│   └── guides/                           # User guides
├── pyproject.toml                        # Dependencies & tool config
├── README.md                             # This file
├── CONTRIBUTING.md                       # Contribution guidelines
└── LICENSE                               # License

Documentation

CONTRIBUTING.md: How to contribute to the project
docs/ADRs/README.md: Architecture Decision Records documentation
docs/planning/project-plan-template.md: Project planning guide

Writing Documentation

Use Markdown for all documentation
Include code examples for clarity
Update README.md when adding major features
Maintain architecture documentation (see docs/ADRs/)

Testing

Testing Policy

All new functionality must include tests:

Unit tests: Test individual functions/classes
Integration tests: Test component interactions
Coverage: Maintain 80%+ coverage
Markers: Use pytest markers (@pytest.mark.unit, @pytest.mark.integration)
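
A test module following this policy might look like the sketch below. It assumes the `unit` and `integration` markers are registered in the pytest configuration; `has_audio_extension` is a hypothetical function under test, not part of the package's documented API.

```python
import pytest


def has_audio_extension(filename: str) -> bool:
    """Hypothetical helper under test: check for a known audio suffix."""
    return filename.lower().endswith((".wav", ".mp3", ".flac"))


@pytest.mark.unit
def test_recognizes_wav() -> None:
    assert has_audio_extension("clip.WAV")


@pytest.mark.unit
def test_rejects_text_file() -> None:
    assert not has_audio_extension("notes.txt")


@pytest.mark.integration
def test_filters_mixed_listing() -> None:
    listing = ["a.mp3", "b.txt", "c.flac"]
    assert [f for f in listing if has_audio_extension(f)] == ["a.mp3", "c.flac"]
```

With markers in place, `-m unit` and `-m integration` select each group independently.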

Test Guidelines

# Run all tests
uv run pytest -v

# Run only unit tests
uv run pytest -v -m unit

# Run only integration tests
uv run pytest -v -m integration

# Run with coverage requirements
uv run pytest --cov=audio_processor --cov-fail-under=80

Security

Security-First Development

Validate all inputs
Use secure defaults
Scan dependencies regularly
Report vulnerabilities responsibly

Reporting Security Issues

Please report security vulnerabilities to byron@williamshome.family rather than using the public issue tracker.

See the ByronWilliamsCPA Security Policy for complete disclosure policy and response timelines.

Contributing

Contributions are welcome! Please see CONTRIBUTING.md for:

Development setup
Code quality standards
Testing requirements
Git workflow and commit conventions
Pull request process

Quick Checklist Before Submitting PR

Code follows style guide (Ruff format + lint)
All tests pass with 80%+ coverage
BasedPyright type checking passes
Docstrings added for new public APIs
CHANGELOG.md updated (if significant change)
Commits follow conventional commit format

Versioning

This project uses Semantic Versioning:

MAJOR version: Incompatible API changes
MINOR version: Backwards-compatible functionality additions
PATCH version: Backwards-compatible bug fixes

Current version: 0.1.0

Automated Releases with Semantic Release

This project uses python-semantic-release for automated versioning based on Conventional Commits.

How it works:

Commit messages determine version bumps:
- fix: commits trigger a PATCH release (1.0.0 → 1.0.1)
- feat: commits trigger a MINOR release (1.0.0 → 1.1.0)
- BREAKING CHANGE: in commit body or ! after type triggers MAJOR release (1.0.0 → 2.0.0)
On merge to main:
- Analyzes commits since last release
- Determines appropriate version bump
- Updates version in pyproject.toml
- Generates/updates CHANGELOG.md
- Creates Git tag and GitHub Release
- Publishes to PyPI (if configured)
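
The bump rules above can be sketched in a few lines. This is a simplified illustration, not python-semantic-release's actual parser: it ignores configuration (for example, how 0.x versions are treated), and `bump_for` and `next_version` are hypothetical names.

```python
import re
from typing import Optional

# Simplified header pattern: type, optional (scope), optional "!", then ": ".
HEADER = re.compile(r"^(?P<type>[a-z]+)(\([^)]*\))?(?P<bang>!)?: ")


def bump_for(message: str) -> Optional[str]:
    """Classify one commit message as 'major', 'minor', 'patch', or None."""
    match = HEADER.match(message)
    if match is None:
        return None
    if match["bang"] or "\nBREAKING CHANGE: " in message:
        return "major"
    return {"feat": "minor", "fix": "patch"}.get(match["type"])


def next_version(current: str, messages: list[str]) -> str:
    """Apply the highest-ranked bump found in the messages to MAJOR.MINOR.PATCH."""
    rank = {"patch": 1, "minor": 2, "major": 3}
    bumps = [bump for bump in map(bump_for, messages) if bump]
    if not bumps:
        return current
    major, minor, patch = (int(part) for part in current.split("."))
    top = max(bumps, key=rank.__getitem__)
    if top == "major":
        return f"{major + 1}.0.0"
    if top == "minor":
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"


print(next_version("1.0.0", ["fix: resolve null pointer in data parser"]))
print(next_version("1.0.0", ["docs: update guide", "feat: add CSV export functionality"]))
```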

Commit message examples:

# Patch release (bug fix)
git commit -m "fix: resolve null pointer in data parser"

# Minor release (new feature)
git commit -m "feat: add CSV export functionality"

# Major release (breaking change)
git commit -m "feat!: redesign API for better ergonomics

BREAKING CHANGE: API has been redesigned for improved usability.
See migration guide in docs/migration/v2.0.0.md"

Configuration: See [tool.semantic_release] in pyproject.toml for settings.

Template Maintenance

This project was generated from a cookiecutter template and is managed with cruft.

Updating from Template

To sync with the latest template changes:

# Preview changes first
cruft diff

# Apply updates (recommended: use the wrapper script)
./scripts/cruft-update.sh

# Or use cruft directly (requires manual cleanup)
cruft update
python scripts/cleanup_conditional_files.py

Important: Cruft Update Limitations

Cruft only syncs file contents - it does NOT re-run post-generation hooks that clean up conditional files.

When you change feature flags in .cruft.json (e.g., disabling include_api_framework), the corresponding files are NOT automatically removed. You must run the cleanup script:

# Check for orphaned files
python scripts/check_orphaned_files.py

# Remove orphaned files
python scripts/cleanup_conditional_files.py

# Or preview what would be removed
python scripts/cleanup_conditional_files.py --dry-run

Conditional Files

Files that may need cleanup when features are disabled:

Feature	Files to Remove
`include_api_framework: no`	`src/*/api/`, `src/*/middleware/`
`include_sentry: no`	`src/*/core/sentry.py`
`include_background_jobs: no`	`src/*/jobs/`
`include_caching: no`	`src/*/core/cache.py`
`include_docker: no`	`Dockerfile`, `docker-compose*.yml`
`use_mkdocs: no`	`mkdocs.yml`, `docs/`

The CI pipeline includes automated checks for orphaned files to prevent this issue.
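
The idea behind an orphaned-file check can be sketched as follows. This is not the project's `scripts/check_orphaned_files.py`; it is an assumed, minimal version that takes the feature flags as a flat dict and uses a subset of the glob patterns from the table above.

```python
import tempfile
from pathlib import Path

# Glob patterns per feature flag, copied from a subset of the table above.
CONDITIONAL_FILES = {
    "include_sentry": ["src/*/core/sentry.py"],
    "include_caching": ["src/*/core/cache.py"],
    "include_docker": ["Dockerfile", "docker-compose*.yml"],
}


def find_orphans(root: Path, flags: dict[str, str]) -> list[Path]:
    """List files that still exist although their feature flag is 'no'."""
    orphans: list[Path] = []
    for feature, patterns in CONDITIONAL_FILES.items():
        if flags.get(feature) != "no":
            continue
        for pattern in patterns:
            orphans.extend(root.glob(pattern))
    return sorted(orphans)


# Demonstrate against a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "src" / "pkg" / "core").mkdir(parents=True)
    (root / "src" / "pkg" / "core" / "sentry.py").touch()
    (root / "Dockerfile").touch()
    flags = {"include_sentry": "no", "include_docker": "yes"}
    demo = [p.relative_to(root).as_posix() for p in find_orphans(root, flags)]

print(demo)
```

Only the Sentry module is reported here because Docker remains enabled in the sample flags.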

License

MIT License - see LICENSE for details.

Support

Issues: GitHub Issues
Discussions: GitHub Discussions
Email: byron@williamshome.family

Acknowledgments

Thank you to all contributors and the open-source community!

Made with ❤️ by Byron Williams
