| Author | Byron Williams |
| Created | 2025-12-04 |
| Repository | ByronWilliamsCPA/audio-processor |
Audio file conversion and processing for RAG content pipelines
This project provides:
- Core functionality for converting and processing audio files for RAG content pipelines
- Production-ready code with comprehensive testing
- Well-documented API and architecture
- Security-first development practices
- High Quality: 80%+ test coverage enforced via CI
- Type Safe: Full type hints with BasedPyright strict mode
- Well Documented: Clear docstrings and comprehensive guides
- Developer Friendly: Pre-commit hooks, automated formatting, linting
- Security First: Dependency scanning, security analysis, SBOM generation
- CLI Tool: Command-line interface via audio_processor
- ML Ready: Optional ML dependencies with PyTorch support
- Python 3.10+ (tested with 3.12)
- UV for dependency management
Install UV:
# macOS and Linux
curl -LsSf https://astral.sh/uv/install.sh | sh
# Windows
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
# Or with pip/pipx
pip install uv
# or
pipx install uv
# Clone repository
git clone https://github.com/ByronWilliamsCPA/audio-processor.git
cd audio-processor
# Install dependencies (includes dev tools - REQUIRED for development)
uv sync --all-extras
# Install with ML dependencies
uv sync --extra ml
# Setup pre-commit hooks (required)
uv run pre-commit install
# Import and use the package
from audio_processor import YourModule
# Example: Create an instance and use it
module = YourModule()
result = module.process()
print(result)
# Display help
uv run audio_processor --help
# Use the CLI tool
uv run audio_processor command --option value
# Example: Process input file
uv run audio_processor process input.txt --output result.json
This project implements enterprise-grade supply chain security with a multi-tier package index strategy and centralized secrets management.
┌─────────────────────────────────────────────────────────────────┐
│ Package Index Priority │
├─────────────────────────────────────────────────────────────────┤
│ 1. Google Assured OSS (SLSA Level 3) - Third-party packages │
│ 2. Internal Artifact Registry - Organization packages │
│ 3. PyPI (fallback) - Packages not in tier 1 or 2 │
└─────────────────────────────────────────────────────────────────┘
# Run the setup script
./scripts/setup-supply-chain.sh
# Or manually configure
gcloud auth login
gcloud auth application-default login
pip install keyrings.google-artifactregistry-auth
| Index | SLSA Level | Purpose | Default |
|---|---|---|---|
| PyPI | - | Standard packages | Yes (default) |
| Google Assured OSS | 3 | Verified third-party packages | Opt-in |
| Internal Registry | 2+ | Organization-maintained packages | Opt-in |
How It Works:
By default, all packages resolve from PyPI. After configuring GCP authentication, you can opt specific packages in to Assured OSS by uncommenting their entries in pyproject.toml:
[tool.uv.sources]
numpy = { index = "assured-oss" }
pandas = { index = "assured-oss" }
requests = { index = "assured-oss" }
Why This Matters:
- SLSA Level 3: Build integrity, provenance, and tamper-proof artifacts
- Supply Chain Protection: Reduced risk of dependency confusion attacks
- Compliance: Meets enterprise security and audit requirements
- Graceful Fallback: Works without authentication, opt-in when ready
Secrets are managed via Infisical instead of environment variables or GitHub Secrets.
Local Development:
# Login to Infisical
infisical login
# Initialize project connection
infisical init
# Run commands with secrets injected
infisical run --env=dev -- uv run python main.py
# Or export secrets to local file
infisical export --env=dev > .env.local
CI/CD Integration:
- GitHub Actions use Infisical's Machine Identity authentication
- Secrets are injected at runtime, never stored in repositories
- Environment mapping: main → prod, develop → staging, * → dev
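The branch-to-environment mapping can be expressed as a small lookup with a default. This is only an illustrative sketch; the function name is hypothetical and the actual workflow may implement the mapping differently.

```python
def infisical_env_for_branch(branch: str) -> str:
    """Map a Git branch to an environment name: main -> prod, develop -> staging, anything else -> dev."""
    mapping = {"main": "prod", "develop": "staging"}
    return mapping.get(branch, "dev")


if __name__ == "__main__":
    for branch in ("main", "develop", "feature/new-codec"):
        print(branch, "->", infisical_env_for_branch(branch))
```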
Software Bill of Materials (SBOM) is generated on every release:
# Generate SBOM locally
uv run cyclonedx-py environment -o sbom.json
# Verify package attestation
pip-audit --require-hashes
Automated via CI:
- CycloneDX SBOM generated in JSON and XML formats
- Attestation attached to GitHub releases
- Vulnerability scanning with OSV database
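Once sbom.json exists, its contents can be inspected with a few lines of Python. The sketch below assumes only the standard CycloneDX JSON layout, in which a top-level components array holds entries with name and version fields. The helper name and the sample document are illustrative, not output from this project.

```python
import json


def list_components(sbom_text: str) -> list[tuple[str, str]]:
    """Return (name, version) pairs for each component in a CycloneDX JSON document."""
    sbom = json.loads(sbom_text)
    return [
        (component.get("name", ""), component.get("version", ""))
        for component in sbom.get("components", [])
    ]


# Illustrative sample, not a real SBOM from this project.
SAMPLE = json.dumps(
    {
        "bomFormat": "CycloneDX",
        "components": [
            {"type": "library", "name": "numpy", "version": "1.26.4"},
            {"type": "library", "name": "requests", "version": "2.31.0"},
        ],
    }
)

if __name__ == "__main__":
    for name, version in list_components(SAMPLE):
        print(f"{name}=={version}")
```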
- Run the setup script (recommended):
  ./scripts/setup-supply-chain.sh
- Or configure manually:
  Google Cloud Authentication:
  gcloud auth login
  gcloud auth application-default login
  pip install keyrings.google-artifactregistry-auth
  Infisical Setup:
  # Install Infisical CLI
  # macOS
  brew install infisical/get-cli/infisical
  # Linux
  curl -1sLf 'https://dl.cloudsmith.io/public/infisical/infisical-cli/setup.deb.sh' | sudo -E bash
  sudo apt-get install infisical
  # Connect to project
  infisical login
  infisical init
- Configure CI/CD secrets in Infisical:
  - GCP_SA_KEY_BASE64: Base64-encoded GCP service account key
  - CODECOV_TOKEN: Codecov upload token (if using Codecov)
  - SONAR_TOKEN: SonarCloud token (if using SonarCloud)
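The GCP_SA_KEY_BASE64 value is the service account key file encoded as Base64. One way to produce it is sketched below using only the standard library. The function name and the key content in the demo are placeholders, not real credentials.

```python
import base64


def encode_key(raw_key: bytes) -> str:
    """Base64-encode the raw bytes of a service account key file as an ASCII string."""
    return base64.b64encode(raw_key).decode("ascii")


if __name__ == "__main__":
    # Placeholder content; in practice, read the bytes from your downloaded key file.
    placeholder = b'{"type": "service_account"}'
    print(encode_key(placeholder))
```

Treat the encoded string as a secret: paste it into Infisical directly rather than writing it to a file in the repository.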
| Role | Purpose |
|---|---|
| roles/artifactregistry.reader | Read from Assured OSS and internal registry |
| roles/artifactregistry.writer | Publish to internal registry (CI only) |
Q: Packages not found in Assured OSS?
- UV automatically falls back to PyPI - no action needed
- Check available packages: Assured OSS Supported Packages
Q: Authentication errors with Artifact Registry?
- Run gcloud auth application-default login to refresh credentials
- Verify the service account has the Artifact Registry Reader role
- Check the keyring is installed: pip install keyrings.google-artifactregistry-auth
Q: Infisical connection issues?
- Verify .infisical.json has the correct workspaceId
- Check your Infisical organization permissions
- For CI: Ensure INFISICAL_CLIENT_ID and INFISICAL_CLIENT_SECRET are set
Q: How to verify supply chain setup?
# Test package index access
./scripts/setup-supply-chain.sh # Re-run to verify all checks pass
# Install all dependencies including dev tools
uv sync --all-extras
# Setup pre-commit hooks
uv run pre-commit install
# Install Qlty CLI for unified code quality checks
curl https://qlty.sh | bash
# Run tests
uv run pytest -v
# Run with coverage
uv run pytest --cov=audio_processor --cov-report=html
# Run all quality checks (using Qlty)
qlty check
# Or use pre-commit
uv run pre-commit run --all-files
All code must meet these requirements:
- Formatting: Ruff (88 char limit)
- Linting: Ruff with PyStrict-aligned rules (see below)
- Type Checking: BasedPyright strict mode
- Testing: Pytest with 80%+ coverage
- Security: Bandit + dependency scanning
- Documentation: Docstrings on all public APIs
Unified Quality Tool: This project uses Qlty to consolidate all quality checks into a single fast tool. See .qlty/qlty.toml for configuration.
This project uses PyStrict-aligned Ruff rules for stricter code quality enforcement beyond standard Python linting:
| Rule | Category | Purpose |
|---|---|---|
| BLE | Blind except | Prevent bare except: clauses |
| EM | Error messages | Enforce descriptive error messages |
| SLF | Private access | Prevent access to private members |
| INP | Implicit packages | Require explicit __init__.py |
| ISC | Implicit concatenation | Prevent implicit string concatenation |
| PGH | Pygrep hooks | Advanced pattern-based checks |
| RSE | Raise statement | Proper exception raising |
| TID | Tidy imports | Clean import organization |
| YTT | sys.version | Safe version checking |
| FA | Future annotations | Modern annotation syntax |
| T10 | Debugger | No debugger statements in production |
| G | Logging format | Safe logging string formatting |
These rules catch bugs that standard linting misses and enforce production-quality code patterns.
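As an illustration of what three of these rule families expect (BLE, EM, and G), the sketch below shows a function written to satisfy them. The function itself is hypothetical and not part of the audio_processor package.

```python
import logging

logger = logging.getLogger(__name__)


def parse_sample_rate(raw: str) -> int:
    """Parse a sample rate in Hz, raising ValueError with a descriptive message."""
    try:
        rate = int(raw)
    except ValueError as exc:  # BLE: catch the specific exception, not a blanket except
        msg = f"Sample rate must be an integer, got {raw!r}"  # EM: build the message first
        raise ValueError(msg) from exc
    if rate <= 0:
        msg = f"Sample rate must be positive, got {rate}"
        raise ValueError(msg)
    # G: pass arguments to the logger instead of pre-formatting the string
    logger.info("Parsed sample rate: %d Hz", rate)
    return rate


if __name__ == "__main__":
    print(parse_sample_rate("44100"))
```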
This project includes standardized Claude Code configuration via git subtree:
Directory Structure:
.claude/
├── claude.md # Project-specific Claude guidelines
└── standard/ # Standard Claude configuration (git subtree)
├── CLAUDE.md # Universal development standards
├── commands/ # Custom slash commands
├── skills/ # Reusable skills
└── agents/ # Specialized agents
Updating Standards:
# Pull latest standards from upstream
./scripts/update-claude-standards.sh
# Or manually
git subtree pull --prefix .claude/standard \
https://github.com/williaby/.claude.git main --squash
What's Included:
- Universal development best practices
- Response-Aware Development (RAD) system for assumption tagging
- Agent assignment patterns and workflow
- Security requirements and pre-commit standards
- Git workflow and commit conventions
Project-Specific Overrides: Edit .claude/claude.md for project-specific guidelines. See .claude/README.md for details.
# Run all tests
uv run pytest -v
# Run specific test file
uv run pytest tests/unit/test_module.py -v
# Run with coverage report
uv run pytest --cov=audio_processor --cov-report=term-missing
# Run tests in parallel
uv run pytest -n auto
Recommended: Use Qlty CLI for unified code quality checks.
# Run all quality checks (fast!)
qlty check
# Run checks on only changed files (fastest)
qlty check --filter=diff
# Run specific plugins only
qlty check --plugin ruff --plugin pyright
# Auto-format code
qlty fmt
# View current configuration
qlty config show
Qlty runs all these tools in a single pass:
Python Quality:
- Ruff (linting + formatting)
- BasedPyright (type checking)
- Bandit (security scanning)
Security & Secrets:
- Gitleaks (secrets detection)
- TruffleHog (entropy-based secrets detection)
- OSV Scanner (dependency vulnerabilities)
- Semgrep (advanced SAST)
File & Configuration:
- Markdownlint (markdown linting)
- Yamllint (YAML linting)
- Prettier (JSON, YAML, Markdown formatting)
- Actionlint (GitHub Actions workflows)
- Shellcheck (shell script linting)
Container & Infrastructure (if Docker enabled):
- Hadolint (Dockerfile linting)
- Trivy (container security scanning)
- Checkov (infrastructure as code security)
Code Quality Metrics:
- Complexity analysis (cyclomatic, cognitive)
- Code smells detection
- Maintainability scoring
# Format code
uv run ruff format src tests
# Lint and auto-fix
uv run ruff check --fix src tests
# Type checking
uv run basedpyright src
# Security scanning
uv run bandit -r src
# Dependency vulnerabilities
qlty check --plugin osv_scanner
audio_processor/
├── src/audio_processor/ # Main package
│ ├── __init__.py
│ ├── core.py # Core functionality
│ └── utils/ # Utility modules
├── tests/ # Test suite
│ ├── unit/ # Unit tests
│ └── integration/ # Integration tests
├── docs/ # Documentation
│ ├── ADRs/ # Architecture Decision Records
│ ├── planning/ # Project planning docs
│ └── guides/ # User guides
├── pyproject.toml # Dependencies & tool config
├── README.md # This file
├── CONTRIBUTING.md # Contribution guidelines
└── LICENSE # License
- CONTRIBUTING.md: How to contribute to the project
- docs/ADRs/README.md: Architecture Decision Records documentation
- docs/planning/project-plan-template.md: Project planning guide
- Use Markdown for all documentation
- Include code examples for clarity
- Update README.md when adding major features
- Maintain architecture documentation (see docs/ADRs/)
All new functionality must include tests:
- Unit tests: Test individual functions/classes
- Integration tests: Test component interactions
- Coverage: Maintain 80%+ coverage
- Markers: Use pytest markers (@pytest.mark.unit, @pytest.mark.integration)
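A test module using these markers might look like the sketch below. The helper seconds_to_samples is hypothetical and defined inline so the example is self-contained. It also assumes the unit and integration markers are registered in the project's pytest configuration, since pytest warns about unregistered custom markers.

```python
import pytest


def seconds_to_samples(seconds: float, sample_rate: int) -> int:
    """Hypothetical helper under test: convert a duration to a sample count."""
    return round(seconds * sample_rate)


@pytest.mark.unit
def test_seconds_to_samples_whole_second() -> None:
    assert seconds_to_samples(1.0, 16_000) == 16_000


@pytest.mark.integration
def test_round_trip_duration() -> None:
    # Stands in for a test that would exercise several components together.
    samples = seconds_to_samples(2.5, 16_000)
    assert samples / 16_000 == pytest.approx(2.5)
```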
# Run all tests
uv run pytest -v
# Run only unit tests
uv run pytest -v -m unit
# Run only integration tests
uv run pytest -v -m integration
# Run with coverage requirements
uv run pytest --cov=audio_processor --cov-fail-under=80
- Validate all inputs
- Use secure defaults
- Scan dependencies regularly
- Report vulnerabilities responsibly
Please report security vulnerabilities to byron@williamshome.family rather than using the public issue tracker.
See the ByronWilliamsCPA Security Policy for complete disclosure policy and response timelines.
Contributions are welcome! Please see CONTRIBUTING.md for:
- Development setup
- Code quality standards
- Testing requirements
- Git workflow and commit conventions
- Pull request process
- Code follows style guide (Ruff format + lint)
- All tests pass with 80%+ coverage
- BasedPyright type checking passes
- Docstrings added for new public APIs
- CHANGELOG.md updated (if significant change)
- Commits follow conventional commit format
This project uses Semantic Versioning:
- MAJOR version: Incompatible API changes
- MINOR version: Backwards-compatible functionality additions
- PATCH version: Backwards-compatible bug fixes
Current version: 0.1.0
This project uses python-semantic-release for automated versioning based on Conventional Commits.
How it works:
- Commit messages determine version bumps:
  - fix: commits trigger a PATCH release (1.0.0 → 1.0.1)
  - feat: commits trigger a MINOR release (1.0.0 → 1.1.0)
  - BREAKING CHANGE: in the commit body, or ! after the type, triggers a MAJOR release (1.0.0 → 2.0.0)
- On merge to main:
  - Analyzes commits since last release
  - Determines appropriate version bump
  - Updates version in pyproject.toml
  - Generates/updates CHANGELOG.md
  - Creates Git tag and GitHub Release
  - Publishes to PyPI (if configured)
Commit message examples:
# Patch release (bug fix)
git commit -m "fix: resolve null pointer in data parser"
# Minor release (new feature)
git commit -m "feat: add CSV export functionality"
# Major release (breaking change)
git commit -m "feat!: redesign API for better ergonomics

BREAKING CHANGE: API has been redesigned for improved usability.
See migration guide in docs/migration/v2.0.0.md"
Configuration: See [tool.semantic_release] in pyproject.toml for settings.
This project was generated from a cookiecutter template and is managed with cruft.
To sync with the latest template changes:
# Preview changes first
cruft diff
# Apply updates (recommended: use the wrapper script)
./scripts/cruft-update.sh
# Or use cruft directly (requires manual cleanup)
cruft update
python scripts/cleanup_conditional_files.py
Cruft only syncs file contents - it does NOT re-run post-generation hooks that clean up conditional files.
When you change feature flags in .cruft.json (e.g., disabling include_api_framework), the corresponding files are NOT automatically removed. You must run the cleanup script:
# Check for orphaned files
python scripts/check_orphaned_files.py
# Remove orphaned files
python scripts/cleanup_conditional_files.py
# Or preview what would be removed
python scripts/cleanup_conditional_files.py --dry-run
Files that may need cleanup when features are disabled:
| Feature | Files to Remove |
|---|---|
| include_api_framework: no | src/*/api/, src/*/middleware/ |
| include_sentry: no | src/*/core/sentry.py |
| include_background_jobs: no | src/*/jobs/ |
| include_caching: no | src/*/core/cache.py |
| include_docker: no | Dockerfile, docker-compose*.yml |
| use_mkdocs: no | mkdocs.yml, docs/ |
The CI pipeline includes automated checks for orphaned files to prevent this issue.
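The idea behind an orphaned-file check can be sketched from the table above: for each feature set to no, glob for its files and report any that still exist. This is an illustration only; the real scripts/check_orphaned_files.py is not shown here and may work differently, and the names below are hypothetical.

```python
from pathlib import Path

# Glob patterns per feature flag, copied from the table above.
CONDITIONAL_FILES: dict[str, list[str]] = {
    "include_api_framework": ["src/*/api", "src/*/middleware"],
    "include_sentry": ["src/*/core/sentry.py"],
    "include_background_jobs": ["src/*/jobs"],
    "include_caching": ["src/*/core/cache.py"],
    "include_docker": ["Dockerfile", "docker-compose*.yml"],
    "use_mkdocs": ["mkdocs.yml", "docs"],
}


def find_orphans(root: Path, flags: dict[str, str]) -> list[Path]:
    """Return existing paths under root that belong to features whose flag is "no"."""
    orphans: list[Path] = []
    for feature, patterns in CONDITIONAL_FILES.items():
        if flags.get(feature) != "no":
            continue  # feature enabled or flag absent: nothing to report
        for pattern in patterns:
            orphans.extend(root.glob(pattern))
    return sorted(orphans)


if __name__ == "__main__":
    # Example flags; in practice these would be read from .cruft.json.
    for path in find_orphans(Path("."), {"include_docker": "no"}):
        print("orphaned:", path)
```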
MIT License - see LICENSE for details.
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Email: byron@williamshome.family
Thank you to all contributors and the open-source community!
Made by Byron Williams