
Parallel Hashing Performance Upgrade #372

Merged
pwgit-create merged 9 commits into master from develop
May 24, 2026

@pwgit-create (Contributor)

Parallel Hashing Performance Upgrade, Dependency Update, and Documentation Improvements

Overview

This release introduces a significant performance enhancement to the file integrity scanning engine through the addition of parallel hashing for large files. It also includes dependency updates, scan lifecycle improvements, logging configuration adjustments, and README restructuring.

The primary focus of this change is improving large-file scan performance and reducing redundant I/O operations during multi-algorithm hashing.


Key Changes

Hashing Engine Performance Improvement

  • Introduced ParallelFileHashHandler to handle large file hashing using parallel execution.

  • Refactored FileHashComputer to route large file processing through the parallel hashing implementation.

  • Added explicit lifecycle control:

    • initializeParallelHashing() sets up the thread pool and handler before the scan starts.
    • shutdownParallelHashProcessor() releases those resources after the scan completes.
  • Improved fallback handling:

    • An OutOfMemoryError now triggers a fallback to parallel hashing instead of single-threaded big-file processing.
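
The routing and fallback described above can be sketched roughly as follows. This is a minimal illustration, not the code in FileHashComputer: `HashRouter`, `LARGE_FILE_THRESHOLD`, and the 1 GiB cutoff are hypothetical, and the two `Function` parameters merely stand in for FileHashHandler and ParallelFileHashHandler, whose real signatures are not shown in this PR.

```java
import java.util.function.Function;

// Hypothetical sketch: class name, threshold, and handler signatures are assumptions.
final class HashRouter {
    // Assumed cutoff between "small" and "large" files (1 GiB, matching the 1GB+ test cases).
    static final long LARGE_FILE_THRESHOLD = 1L << 30;

    private final Function<String, String> smallFileHasher;    // stands in for FileHashHandler
    private final Function<String, String> parallelFileHasher; // stands in for ParallelFileHashHandler

    HashRouter(Function<String, String> smallFileHasher, Function<String, String> parallelFileHasher) {
        this.smallFileHasher = smallFileHasher;
        this.parallelFileHasher = parallelFileHasher;
    }

    String hash(String path, long sizeBytes) {
        // Large files go straight to the parallel, single-read handler.
        if (sizeBytes >= LARGE_FILE_THRESHOLD) {
            return parallelFileHasher.apply(path);
        }
        try {
            return smallFileHasher.apply(path);
        } catch (OutOfMemoryError e) {
            // Fallback described in the PR: retry with the parallel handler
            // rather than a single-threaded big-file path.
            return parallelFileHasher.apply(path);
        }
    }
}
```

Catching `OutOfMemoryError` is normally a last resort; it is only reasonable here because the fallback path streams the file in bounded chunks instead of holding it in memory.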

Performance Impact

  • In local testing, parallel hashing delivered a major performance improvement, with speedups of more than 2x observed on large-file scans.

  • For files of 1 GB or larger, the improvement is especially significant:

    • The file is read only once, regardless of how many hashing algorithms are applied.
    • Previously, each algorithm required a separate read of the file.
    • SHA-256, SHA-3, and BLAKE2b digests are now computed in parallel from a single read stream.
  • Performance gains scale with file size and are increasingly beneficial in environments with many large files, where I/O reduction becomes the dominant factor.
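
The single-read idea can be illustrated with JDK classes only. This is a hedged sketch, not the algorithm-hash-extraction implementation: `SingleReadMultiDigest`, `hashOnce`, the 1 MiB chunk size, and the task cap are assumptions, and BLAKE2b is omitted because the JDK's default providers do not include it (SHA-256 and SHA3-256 are used instead). The file is read once; each chunk is handed to one single-threaded worker per algorithm, which keeps every digest's updates in file order while the algorithms run in parallel.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HexFormat;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;

// Hypothetical sketch of "read once, hash with several algorithms in parallel".
final class SingleReadMultiDigest {
    private static final int CHUNK_SIZE = 1 << 20;  // 1 MiB per read (assumed)
    private static final int MAX_QUEUED_TASKS = 16; // caps buffered chunk tasks to bound memory (assumed)

    static Map<String, String> hashOnce(Path file, List<String> algorithms) throws Exception {
        List<MessageDigest> digests = new ArrayList<>();
        List<ExecutorService> workers = new ArrayList<>();
        try {
            for (String alg : algorithms) {
                digests.add(MessageDigest.getInstance(alg));
                // MessageDigest is not thread-safe: confine each one to its own single thread.
                workers.add(Executors.newSingleThreadExecutor());
            }
            Semaphore slots = new Semaphore(MAX_QUEUED_TASKS);
            try (InputStream in = Files.newInputStream(file)) {
                byte[] buf = new byte[CHUNK_SIZE];
                int n;
                // Single pass over the file: each chunk is read exactly once.
                while ((n = in.readNBytes(buf, 0, buf.length)) > 0) {
                    byte[] chunk = Arrays.copyOf(buf, n); // shared read-only by all workers
                    for (int i = 0; i < digests.size(); i++) {
                        MessageDigest md = digests.get(i);
                        slots.acquire(); // back-pressure if hashing falls behind reading
                        workers.get(i).submit(() -> {
                            try {
                                md.update(chunk);
                            } finally {
                                slots.release();
                            }
                        });
                    }
                }
            }
            Map<String, String> result = new LinkedHashMap<>();
            for (int i = 0; i < digests.size(); i++) {
                // Submitted last, so it runs after all queued updates for this digest.
                Callable<byte[]> finish = digests.get(i)::digest;
                Future<byte[]> hash = workers.get(i).submit(finish);
                result.put(algorithms.get(i), HexFormat.of().formatHex(hash.get()));
            }
            return result;
        } finally {
            workers.forEach(ExecutorService::shutdownNow);
        }
    }
}
```

With three algorithms, a 1 GB file costs roughly 1 GB of reads instead of about 3 GB, which is where the I/O savings described above come from.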

Scan Service Integration

  • Parallel hashing initialization added at scan start for both:

    • scanAllDirectories
    • scanSingleDirectory
  • Thread pool shutdown added during scan finalization to prevent resource leaks.
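
One way to guarantee the shutdown in the scan methods is a try/finally around the scan body. The sketch below reuses the method names initializeParallelHashing() and shutdownParallelHashProcessor() from this PR, but their bodies, the fixed-size pool, the 30-second grace period, and the `ScanLifecycleSketch` class are assumptions for illustration, not the ScanServiceImpl code.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical lifecycle sketch; only the two method names come from the PR.
final class ScanLifecycleSketch {
    private ExecutorService hashPool;

    void initializeParallelHashing() {
        // Create a fresh pool per scan (pool size is an assumption).
        if (hashPool == null || hashPool.isShutdown()) {
            hashPool = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
        }
    }

    void shutdownParallelHashProcessor() {
        if (hashPool == null) {
            return;
        }
        hashPool.shutdown(); // stop accepting work, let queued tasks finish
        try {
            if (!hashPool.awaitTermination(30, TimeUnit.SECONDS)) {
                hashPool.shutdownNow(); // give up on stragglers
            }
        } catch (InterruptedException e) {
            hashPool.shutdownNow();
            Thread.currentThread().interrupt();
        }
    }

    boolean isPoolTerminated() {
        return hashPool != null && hashPool.isTerminated();
    }

    void scanSingleDirectory(Runnable scanBody) {
        initializeParallelHashing();
        try {
            scanBody.run();
        } finally {
            shutdownParallelHashProcessor(); // runs even if the scan throws, so no thread leak
        }
    }
}
```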

Dependency Update

  • Updated algorithm-hash-extraction from 1.2.8 to 1.2.9.

Logging Updates

  • Added dedicated logging configuration for:

    • lib.pwss.hash.file_hash_handler.parallel

Documentation Updates (README)

  • Renamed the project title to: File-Integrity Scanner Backend (FIM Engine)
  • Improved the technical clarity and structure of the documentation.
  • Expanded the explanation of cryptographic hashing and integrity verification.
  • Added a Related Repositories section for end-user distribution.
  • Improved the system architecture description and component breakdown.
  • Updated setup instructions for developers.

Architecture / Behavioral Impact

  • Large file processing is now optimized for parallel execution.

  • File I/O is reduced significantly when multiple hashing algorithms are used.

  • Clear separation of concerns between:

    • Small file hashing (FileHashHandler)
    • Large file parallel hashing (ParallelFileHashHandler)
  • Better scalability for environments with high-volume or large-size file systems.


Notes

  • Parallel hashing must be initialized before scan execution and properly shut down after completion to avoid thread pool leaks.
  • Requires algorithm-hash-extraction:1.2.9.

Testing

  • Verified scan execution for:

    • Single directory scans
    • Full directory scans
    • Large file handling (1GB+ test cases)
  • Confirmed:

    • Correct parallel execution of hashing algorithms
    • Proper resource cleanup after scan completion
    • Stable fallback behavior under memory pressure

pwgit-create and others added 7 commits May 23, 2026 12:53
…h computation

* Upgraded the dependency algorithm-hash-extraction from version 1.2.8 to 1.2.9
* Added ParallelFileHashHandler for parallel hash computation of large files
* Updated FileHashComputer to use parallel processing when computing hashes
* Modified ScanServiceImpl to initialize and shutdown parallel hash processors
* Added DEBUG log level for the lib.pwss.hash.file_hash_handler.parallel package

This change improves performance by using parallel processing for file hash computations, especially for larger files.
* Updated project version from 1.8.5 to 1.9 in pom.xml
* Changed log level for lib.pwss.hash.file_hash_handler.parallel package from DEBUG to ERROR in logback.xml

This change prepares the project for a new release with improved logging configuration.
* Removed unused `import lib.pwss.hash.ParallelFileHash;` in FileHashComputer.java

This cleanup removes an unnecessary import to keep the codebase tidy and improve maintainability.
Improved README structure and clarity for the backend (FIM Engine).

Focused on better architecture explanation, reduced redundancy, and clearer separation of system components.

No functional changes.
Added a section explaining cryptographic hashes and their importance in file integrity.
pwgit-create requested a review from lilstiffy May 23, 2026 22:08
pwgit-create added the enhancement, Spring, Java, and hash labels May 23, 2026
pwgit-create and others added 2 commits May 24, 2026 00:35
Fix security vulnerabilities reported by Snyk in Tomcat (11.0.21 → 11…
@lilstiffy (Collaborator) left a comment

I really like the addition to the README :D

pwgit-create merged commit 8f5e382 into master May 24, 2026
3 checks passed