file-processing-analytics

file-processing-analytics is a Python library for gathering and analyzing metadata from a collection of files. It builds on the file-processing suite to extract metadata from each file and store the results in CSV format. It is suited to data discovery, auditing, and understanding the contents of file directories or file lists.

Features

Directory and List Input Support: Accepts file input via directories (with optional recursive search) or predefined lists of file paths.
Metadata Extraction: Leverages the file-processing library to gather metadata from each file.
Error Logging: Captures processing errors per file, logging them into the CSV output for easy diagnostics.
Progress Tracking: Records processed files in SQLite so that long-running tasks can resume after an interruption.
Output to CSV: Aggregates results in a CSV format, making it easy to view, share, or further analyze.
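
To illustrate the two input modes listed above, here is a minimal sketch of how a directory (with optional recursion) or an explicit list might be resolved into file paths. The `collect_files` helper is hypothetical and is not part of file-processing-analytics; it only shows the idea using the standard library.

```python
from pathlib import Path
from typing import Iterable, List, Union


def collect_files(
    source: Union[str, Path, Iterable[str]], recursive: bool = False
) -> List[Path]:
    """Hypothetical helper: resolve a directory or a list of paths into files."""
    if isinstance(source, (str, Path)):
        root = Path(source)
        # rglob descends into subdirectories; glob stays at the top level.
        walker = root.rglob("*") if recursive else root.glob("*")
        return sorted(p for p in walker if p.is_file())
    # Anything else is treated as an iterable of individual file paths.
    return [Path(p) for p in source]
```

The library's own DirectoryInput and ListInput classes may behave differently, for example in ordering or in how they treat hidden files.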

Installation

Install file-processing-analytics via GitHub:

pip install git+https://github.com/hc-sc-ocdo-bdpd/file-processing-analytics.git

Quick Start

Here’s how to start using file-processing-analytics:

from file_processing_analytics import AnalyticsProcessor

# Initialize an AnalyticsProcessor
processor = AnalyticsProcessor(
    input_collection="path/to/directory",
    output_csv_path="output/results.csv"
)

# Process files and save metadata to CSV
processor.process_files()

Using List Inputs

Alternatively, provide a list of file paths for processing:

file_list = ["file1.pdf", "file2.docx", "file3.jpg"]
processor = AnalyticsProcessor(input_collection=file_list, output_csv_path="output/results.csv")
processor.process_files()

Architecture

Key Components

AnalyticsProcessor: Core class that orchestrates metadata extraction and error handling.
Input Collections: DirectoryInput (for directories) and ListInput (for custom file lists).
ProgressTracker: Uses SQLite to record which files have been processed, so that an interrupted run can resume without repeating work.
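
As a rough illustration of SQLite-backed progress tracking, the sketch below stores each processed path in a single table. The class name `SimpleProgressTracker`, its methods, and the table layout are assumptions made for this example; the library's actual ProgressTracker may use a different schema and interface.

```python
import sqlite3


class SimpleProgressTracker:
    """Hypothetical stand-in, not the library's ProgressTracker."""

    def __init__(self, db_path: str = ":memory:") -> None:
        # Pass a file path instead of ":memory:" to persist progress across runs.
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS processed (path TEXT PRIMARY KEY)"
        )
        self.conn.commit()

    def is_processed(self, path: str) -> bool:
        row = self.conn.execute(
            "SELECT 1 FROM processed WHERE path = ?", (path,)
        ).fetchone()
        return row is not None

    def mark_processed(self, path: str) -> None:
        # INSERT OR IGNORE keeps the call idempotent if a file is seen twice.
        self.conn.execute(
            "INSERT OR IGNORE INTO processed (path) VALUES (?)", (path,)
        )
        self.conn.commit()
```

On restart, a processing loop would skip any path for which `is_processed` returns True and call `mark_processed` after each file completes.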

Error Handling

Errors encountered during processing are logged directly into the CSV file, capturing both the file name and the error description.
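
The pattern described here, one CSV row per file with an error column, can be sketched as follows. The column names (`file_name`, `size`, `error`) and the `write_results` function are illustrative assumptions; the columns the library actually writes are not documented in this README.

```python
import csv
import io
from typing import Callable, Dict, Iterable, TextIO


def write_results(
    paths: Iterable[str], extract: Callable[[str], Dict], out: TextIO
) -> None:
    """Hypothetical sketch: write one row per file, recording any error inline."""
    writer = csv.DictWriter(
        out, fieldnames=["file_name", "size", "error"], extrasaction="ignore"
    )
    writer.writeheader()
    for path in paths:
        row = {"file_name": path, "size": "", "error": ""}
        try:
            row.update(extract(path))
        except Exception as exc:
            # A failure on one file is recorded and does not stop the run.
            row["error"] = f"{type(exc).__name__}: {exc}"
        writer.writerow(row)
```

Keeping errors in the same file as the metadata means a single filter on the error column separates failed files from successful ones.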

Extending the Library

You can customize or extend functionality by creating custom InputCollections or by adding new processing behaviors in conjunction with file-processing.
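
As one possible shape for a custom input collection, the sketch below yields only files with selected extensions. It assumes, hypothetically, that an input collection only needs to be iterable over file paths; the class name `ExtensionFilteredInput` is invented for this example, and the base class and methods the library really expects should be checked in its source before adapting this.

```python
from pathlib import Path
from typing import Iterable, Iterator, Union


class ExtensionFilteredInput:
    """Hypothetical input collection: files under a root with given suffixes."""

    def __init__(self, root: Union[str, Path], extensions: Iterable[str]) -> None:
        self.root = Path(root)
        # Normalize to lowercase so ".PDF" and ".pdf" match the same filter.
        self.extensions = {ext.lower() for ext in extensions}

    def __iter__(self) -> Iterator[Path]:
        for path in sorted(self.root.rglob("*")):
            if path.is_file() and path.suffix.lower() in self.extensions:
                yield path
```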

Contributing

We welcome contributions! To get involved:

Fork the repository: Create your own fork on GitHub.
Create a feature branch: Work on your feature or bug fix in a new branch.
Write tests: Ensure your changes are thoroughly tested.
Submit a Pull Request: When ready, submit a PR for review.

License

This project is licensed under the MIT License.

Contact

For questions, support, or collaboration inquiries:

Email: ocdo-bdpd@hc-sc.gc.ca

Empowering data discovery and metadata analysis. Explore our repository or contribute to enhance its capabilities!
