Thank you for your interest in improving NullifyPDF! This guide explains how to contribute code, report bugs, and help grow this project.
Important
All contributions must follow the standards outlined in this document. Questions? Open a discussion.
# Fork on GitHub, then:
git clone https://github.com/YOUR_USERNAME/NullifyPDF.git
cd NullifyPDF
git remote add upstream https://github.com/overwrite00/NullifyPDF.gitpython setup_env.pyThis creates isolated venv, installs dependencies, and downloads NLP models.
# Update main branch
git fetch upstream
git checkout -b feature/your-feature-name develop
# Example names:
# feature/ai-entity-detection
# fix/export-memory-leak
# docs/installation-guide# Activate venv
source .venv/bin/activate # or .venv\Scripts\Activate.ps1 on Windows
# Make your changes...
# Run tests
pytest tests/ -v
# Build to verify compilation
python build_local.pySee Commit Message Format below.
git push origin feature/your-feature-nameThen open Pull Request on GitHub with description from template.
✅ Bug Fixes — Crashes, memory leaks, incorrect behavior
✅ Performance — Optimization, reduced memory usage
✅ Features — New functionality (discuss first in issues)
✅ Documentation — README, guides, API docs, examples
✅ Tests — Improved coverage, edge cases
✅ Build/CI — GitHub Actions improvements, cross-platform fixes
❌ Major Architecture Changes — Discuss first via issue
❌ New Dependencies — Heavy libraries (keep slim & local)
❌ OCR Implementation — Out of scope (use Blindfold Mode instead)
❌ Cloud Features — Must remain 100% local
Tip
Want to work on something big? Open an issue first to discuss approach and get buy-in.
All functions must have complete type hints. This improves IDE support and catches bugs early.
# ✅ GOOD
def extract_text(pdf_path: str, page_num: int) -> str:
"""Extract text from PDF page.
Args:
pdf_path: Path to PDF file
page_num: Page number (0-indexed)
Returns:
Extracted text as string
Raises:
FileNotFoundError: If PDF not found
ValueError: If page_num out of range
"""
...
# ❌ BAD
def extract_text(pdf_path, page_num):
# Extract text from PDF
...Use Google-style docstrings with Args, Returns, Raises sections.
def redact_entities(text: str, entities: list[str]) -> str:
"""Replace entities with redaction markers.
Args:
text: Input text to redact
entities: List of entity strings to hide
Returns:
Text with entities replaced by [REDACTED]
"""- Imports: Use isort for automatic sorting
- Functions: Group related functions by feature
- Classes: Use dataclasses where applicable
- Comments: Only for non-obvious logic (see examples in code)
- Line Length: Max 100 characters
- New features must include tests
- All tests must pass:
pytest tests/ -v - Aim for >80% code coverage
- Test edge cases, not just happy path
def test_extract_text_empty_pdf():
"""Verify extraction handles empty PDFs."""
result = extract_text("empty.pdf", 0)
assert result == ""
def test_extract_text_invalid_page():
"""Verify ValueError on page out of range."""
with pytest.raises(ValueError):
extract_text("test.pdf", 999)Use descriptive, concise commit messages following this pattern:
type(scope): description
Body with more details (optional)
- Type:
feat,fix,docs,refactor,test,chore,perf,ci - Scope: Component affected (e.g.,
ai,pdf,ui,logging) - Description: Imperative mood, lowercase, no period
- Body: Explain why, not what (body is optional)
# Feature
git commit -m "feat(ai): add IBAN detection to Presidio pipeline"
# Bug fix
git commit -m "fix(export): resolve memory doubling in forensic scrubbing"
# Documentation
git commit -m "docs: update installation guide for Python 3.12"
# Performance
git commit -m "perf(allowlist): implement O(1) exact-match lookup"
# With detailed explanation
git commit -m "fix(threading): add QMutex serialization for document access
Prevents race conditions between UI thread (render) and AI worker
thread when scanning large PDFs concurrently."- ✅ Run tests:
pytest tests/ -v - ✅ Build locally:
python build_local.py - ✅ Update CHANGELOG.md with your changes
- ✅ Self-review your code (would you understand this in 6 months?)
- ✅ Check links in documentation
## 📝 Description
Brief explanation of what changed and why.
## 🎯 Type of Change
- [ ] Bug fix (non-breaking, fixes issue)
- [ ] New feature (non-breaking)
- [ ] Breaking change (requires version bump)
- [ ] Documentation update
## ✅ Testing
- [ ] Unit tests added/updated
- [ ] Manual testing completed
- [ ] All tests pass locally
## 📋 Checklist
- [ ] Code follows style guidelines
- [ ] No new dependencies added
- [ ] Documentation updated
- [ ] CHANGELOG.md updated
- [ ] Commits have clear messages- CI/CD checks must pass (GitHub Actions)
- Code review by maintainer
- Approval → merge to
developbranch - Included in next release
Note
Major changes may require discussion before implementation. Open an issue first if uncertain.
Found a bug? Open an issue on GitHub with:
- Title: Clear, specific description
- Environment: Python version, OS, GUI version
- Steps to Reproduce: Exact steps to trigger bug
- Expected Behavior: What should happen
- Actual Behavior: What actually happens
- Logs: Paste relevant logs from
~/.nullifypdf/logs/
Enable debug mode for detailed logs:
# Windows (PowerShell)
$env:NULLIFYPDF_DEBUG = "true"
python NullifyPDF.py
# macOS/Linux (Bash)
export NULLIFYPDF_DEBUG=true
python3.12 NullifyPDF.pyHave an idea? Open an issue with:
- Title: Feature description
- Use Case: Why do you need this?
- Proposed Solution: How should it work?
- Alternatives Considered: Other approaches?
Feature discussion happens in issues before implementation. No PRs without prior discussion.
See ARCHITECTURE.md for detailed system design:
- Core Components — PDF engine, AI pipeline, UI layer
- Threading Model — QThread + QMutex architecture
- Data Flow — Load → Extract → Analyze → Redact → Export
- Key Files — Entry point, worker classes, utility modules
Understanding architecture helps choose right place for changes.
NullifyPDF prioritizes staying lean and local. Before adding a dependency:
- Justify why it's needed
- Check if functionality exists in stdlib
- Verify cross-platform support (Windows/macOS/Linux)
- Consider maintenance burden
- Discuss in issue first
Heavy dependencies (like large ML frameworks) are not accepted unless truly essential.
Found a security vulnerability? Do not open a public issue. Please follow the responsible disclosure guidelines in SECURITY.md to report it safely and privately.
See SECURITY.md for the full responsible disclosure policy and reporting instructions.
main (stable releases)
↓
develop (integration branch)
↓
feature/your-feature (your work)
Branch strategy:
main— Production-ready code onlydevelop— Integration branch for next release- Feature branches — Created from
develop, merged back todevelop
- Python 3.12 — docs.python.org
- PySide6 — doc.qt.io/qtforpython
- PyMuPDF — pymupdf.io
- spaCy — spacy.io
- Presidio — microsoft.github.io/presidio
- 💬 Discussions: GitHub Discussions — General questions and ideas
- 🐛 Bug Reports: GitHub Issues — Report bugs and request features
Every contribution — code, docs, bug reports, ideas — helps make NullifyPDF better. We appreciate your time and effort!
Last updated: 2026-06-06
← Back to README | Architecture →