
Pallavrai/DocCraft


📝 DocCraft: AI-Powered Document Formatter

Transform your unformatted documents into professionally styled documents using the power of AI. DocCraft learns formatting patterns from your reference documents and applies them intelligently to your raw content.

🚀 Features

  • AI-Powered Style Learning: Uses Google Gemini AI to understand and replicate document formatting patterns
  • Intelligent Block Classification: Automatically categorizes text blocks (headings, paragraphs, lists, etc.)
  • Seamless DOCX Processing: Works with Microsoft Word documents (.docx format)
  • User-Friendly Interface: Built with Streamlit for an intuitive web-based experience
  • Batch Processing: Format entire documents in seconds
  • Style Preservation: Maintains font sizes, bold, italic, underline, and other formatting attributes

🛠️ How It Works

  1. Upload Raw Document: Provide your unformatted DOCX file
  2. Upload Reference Document: Provide a formatted DOCX file as a style template
  3. AI Analysis: DocCraft analyzes the formatting patterns in your reference document
  4. Smart Application: The AI applies similar formatting to your raw document
  5. Download Result: Get your professionally formatted document instantly
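The steps above can be pictured as a small learn-then-apply pipeline. The following is a minimal sketch of that data flow, not DocCraft's actual implementation: the `Style` class, the function names, and the rule-based `classify_block` heuristic are all hypothetical, and the heuristic merely stands in for the Gemini-based classification so the example runs without an API key.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Style:
    """Formatting attributes of the kind listed under Features (hypothetical type)."""
    font_size_pt: float
    bold: bool = False
    italic: bool = False
    underline: bool = False


def classify_block(text: str) -> str:
    """Toy stand-in for the AI classification step (assumption, not the real logic)."""
    stripped = text.strip()
    if stripped.startswith(("-", "*", "•")):
        return "list_item"
    # Short lines without closing punctuation are treated as headings.
    if len(stripped) <= 60 and not stripped.endswith((".", ":", ";")):
        return "heading"
    return "paragraph"


def learn_styles(reference_blocks):
    """Map each block type to the first style seen for it in the reference."""
    profile = {}
    for text, style in reference_blocks:
        profile.setdefault(classify_block(text), style)
    return profile


def apply_styles(raw_blocks, profile, default):
    """Pair every raw block with the learned style for its block type."""
    return [(text, profile.get(classify_block(text), default)) for text in raw_blocks]


reference = [
    ("Quarterly Report", Style(16, bold=True)),
    ("Revenue grew steadily over the period.", Style(11)),
    ("- Reduce churn", Style(11, italic=True)),
]
raw = [
    "Annual Summary",
    "Costs fell during the second half of the year.",
    "- Hire two engineers",
]
profile = learn_styles(reference)
formatted = apply_styles(raw, profile, default=Style(11))
```

Here `formatted` pairs "Annual Summary" with the bold 16 pt heading style learned from the reference, while the body sentence and the list item pick up their own learned styles.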

📋 Prerequisites

  • Python 3.13+
  • Google Gemini API key
  • Google Cloud Service Account (optional, for enhanced features)

⚑ Quick Start

1. Clone the Repository

git clone <repository-url>
cd DocCraft

2. Install Dependencies

pip install -r requirements.txt

3. Set Up Environment Variables

Create a .env file in the project root:

export GEMINI_API_KEY=your_gemini_api_key_here
export GOOGLE_SERVICE_ACCOUNT_JSON='{"type": "service_account", ...}'

4. Run the Application

streamlit run app.py

5. Open Your Browser

Navigate to http://localhost:8501 to start using DocCraft!

🔧 Installation

Using pip

pip install -e .

Dependencies

  • streamlit - Web interface
  • python-docx - DOCX file processing
  • langchain - AI framework
  • langchain-google-genai - Google Gemini integration
  • google-api-python-client - Google API client
  • python-dotenv - Environment variable management
  • watchdog - File monitoring

📖 Usage Examples

Basic Usage

from doccraft import DocCraft

# Initialize DocCraft
formatter = DocCraft(api_key="your_gemini_key")

# Format a document
formatted_doc = formatter.format_document(
    raw_file="unformatted.docx",
    reference_file="template.docx"
)

# Save the result
formatted_doc.save("formatted_output.docx")

Web Interface

  1. Start the Streamlit app: streamlit run app.py
  2. Upload your raw DOCX file
  3. Upload your reference/template DOCX file
  4. Click "Format and Download DOCX"
  5. Download your formatted document

🎯 Use Cases

  • Academic Papers: Apply consistent formatting to research documents
  • Business Reports: Maintain corporate style guidelines across documents
  • Legal Documents: Ensure uniform formatting for legal briefs and contracts
  • Technical Documentation: Standardize formatting for manuals and guides
  • Content Migration: Convert documents between different style formats

⚙️ Configuration

Environment Variables

| Variable | Description | Required |
|----------|-------------|----------|
| GEMINI_API_KEY | Your Google Gemini API key | Yes |
| GOOGLE_SERVICE_ACCOUNT_JSON | Google Cloud service account JSON | Optional |
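As an illustration of how these two variables might be read at startup, here is a minimal sketch using only the standard library. The `load_settings` function and the shape of its return value are assumptions made for this example, not DocCraft's actual code. In a real run, the listed python-dotenv dependency could populate the environment from the `.env` file first.

```python
import json
import os


def load_settings(env=None):
    """Read DocCraft-style settings from a mapping (defaults to os.environ).

    Hypothetical helper: the required key must be present, the optional
    service-account JSON is parsed only when it is set.
    """
    env = os.environ if env is None else env

    api_key = env.get("GEMINI_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("Gemini API key not found: set GEMINI_API_KEY")

    raw_account = env.get("GOOGLE_SERVICE_ACCOUNT_JSON", "").strip()
    service_account = json.loads(raw_account) if raw_account else None

    return {"api_key": api_key, "service_account": service_account}
```

Passing a plain dictionary instead of the real environment, for example `load_settings({"GEMINI_API_KEY": "test-key"})`, makes the behavior easy to check without touching the shell.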

Supported Formats

  • Input: Microsoft Word (.docx)
  • Output: Microsoft Word (.docx)
  • Styling: Font size, bold, italic, underline, colors, alignment

🔒 API Rate Limits

DocCraft includes built-in rate limiting and retry logic for the Google Gemini API:

  • Automatic retry when the rate limit is exceeded
  • 60-second backoff on resource exhaustion
  • Optimized prompt engineering to minimize API calls
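The retry behavior described above could look roughly like the sketch below. It is an illustration under stated assumptions rather than DocCraft's real code: `ResourceExhaustedError` is a hypothetical stand-in for whatever rate-limit error the API client raises, and `call_with_retry`, `max_attempts`, and the injectable `sleep` parameter are names invented for this example. Only the 60-second wait comes from the list above.

```python
import time


class ResourceExhaustedError(Exception):
    """Hypothetical stand-in for the client's rate-limit / quota error."""


def call_with_retry(func, max_attempts=3, backoff_seconds=60.0, sleep=time.sleep):
    """Call func(), waiting a fixed backoff after each rate-limit failure.

    The last failure is re-raised once max_attempts calls have been made.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except ResourceExhaustedError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the error to the caller
            sleep(backoff_seconds)  # fixed 60-second pause before retrying
```

Accepting `sleep` as a parameter is a design choice for testability: a test can pass a recording function instead of actually waiting a minute per retry.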

🚨 Troubleshooting

Common Issues

API Key Not Found

Error: Gemini API key not found
Solution: Ensure GEMINI_API_KEY is set in your environment variables

File Upload Issues

Error: Could not process DOCX file
Solution: Ensure the file is a valid .docx file (not the legacy .doc format)
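One way to check an upload before processing it is sketched below. It relies on the fact that a .docx file is a ZIP archive containing a `word/document.xml` part, whereas a legacy .doc file is not a ZIP archive at all. The `looks_like_docx` helper is hypothetical and not part of DocCraft; it is a quick sanity check, not a full validation of the document.

```python
import io
import zipfile


def looks_like_docx(data: bytes) -> bool:
    """Return True if the bytes are a ZIP archive with a main Word document part."""
    buffer = io.BytesIO(data)
    if not zipfile.is_zipfile(buffer):
        return False  # legacy .doc files and other non-ZIP data end up here
    buffer.seek(0)
    with zipfile.ZipFile(buffer) as archive:
        return "word/document.xml" in archive.namelist()
```

For a file on disk, `looks_like_docx(open(path, "rb").read())` gives a quick answer before the document is handed to the formatter.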

Memory Issues with Large Documents

Solution: Break large documents into smaller sections
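Splitting can be done by grouping paragraphs until a size budget is reached. The sketch below shows the idea. The `split_into_sections` function and the default `max_chars` budget of 8000 characters are assumptions chosen for illustration, not limits documented by DocCraft.

```python
def split_into_sections(paragraphs, max_chars=8000):
    """Group paragraphs into sections of at most max_chars characters each.

    A single paragraph longer than max_chars is kept whole in its own section.
    """
    sections, current, size = [], [], 0
    for paragraph in paragraphs:
        # Start a new section when adding this paragraph would exceed the budget.
        if current and size + len(paragraph) > max_chars:
            sections.append(current)
            current, size = [], 0
        current.append(paragraph)
        size += len(paragraph)
    if current:
        sections.append(current)
    return sections
```

Each resulting section can then be formatted separately and the outputs combined afterwards.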

🛣️ Roadmap

  • Support for PDF files
  • Advanced style customization
  • Bulk document processing
  • Integration with Google Docs
  • Custom style templates
  • API endpoint for programmatic access

🀝 Contributing

This is a proprietary project. For collaboration opportunities or feature requests, please contact the project maintainer.

📄 License

This project is proprietary software. See the LICENSE file for details.

🆘 Support

For support, feature requests, or licensing inquiries, please contact the project maintainer.

🏆 Why DocCraft?

  • Time-Saving: Format documents in seconds, not hours
  • Consistency: Ensure uniform styling across all documents
  • AI-Powered: Leverage cutting-edge AI for intelligent formatting
  • Professional: Create polished, professional-looking documents
  • Easy to Use: No technical expertise required

Made with ❤️ for document formatting excellence
