parinaB/FinTax

FinTax

FinTax is a tax and accounting assistant based on Retrieval-Augmented Generation (RAG), built with LangChain, FAISS, and large language models. Users query accounting and taxation documents in natural language and receive context-aware answers grounded in the project's document knowledge base.

Features

  • Document-based question answering
  • Retrieval-Augmented Generation (RAG) pipeline
  • Semantic search using FAISS vector database
  • Support for multiple PDF documents
  • Context-aware responses from an LLM
  • Interactive web interface
  • Source-grounded answers based on uploaded documents

Tech Stack

  • Python
  • LangChain
  • FAISS
  • Mistral AI
  • Streamlit
  • PyPDF
  • Hugging Face Embeddings

Project Structure

FinTax/
│
├── documents/
│   ├── Chapter-2-Accounting-Process.pdf
│   ├── Chapter-6-Bills-of-Exchange-and-Promissory-Notes.pdf
│   ├── faq.pdf
│   └── interplay_transition.pdf
│
├── faiss_index/
│   ├── index.faiss
│   └── index.pkl
│
├── UITAX.py
├── main.py
├── requirements.txt
├── pyproject.toml
├── uv.lock
├── README.md
└── .gitignore

How It Works

  1. Documents are loaded from the documents/ directory.
  2. The text is extracted from each PDF and split into chunks.
  3. An embedding is generated for each chunk.
  4. The embeddings are stored in a FAISS vector index (the faiss_index/ directory).
  5. Each user query is converted into an embedding.
  6. The document chunks most similar to the query are retrieved from FAISS.
  7. The retrieved chunks are sent to the language model as context, together with the query.
  8. The model generates an answer grounded in the retrieved information.
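
The steps above can be illustrated with a minimal, dependency-free sketch. It is not the project's code: the real pipeline uses LangChain, Hugging Face embeddings, and FAISS, whereas this stand-in uses a toy bag-of-words "embedding" and a plain Python list as the index. All function names (split_into_chunks, embed, retrieve, build_prompt) and the sample chunks are hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter


def split_into_chunks(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Step 2: split text into overlapping character windows."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


def embed(text: str) -> Counter:
    """Steps 3 and 5: toy bag-of-words vector (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Steps 4 and 6: the 'index' is just the chunk list; return the k most similar chunks."""
    query_vec = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, context_chunks: list[str]) -> str:
    """Step 7: combine retrieved context and the question into one prompt for the LLM."""
    context = "\n\n".join(context_chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    )


# Hypothetical sample chunks standing in for text extracted from the PDFs.
chunks = [
    "A bill of exchange is a written order to pay a sum of money.",
    "The accounting process begins with recording transactions in a journal.",
    "A promissory note is a written promise to pay a sum of money.",
]
question = "What is the accounting process?"
top_chunks = retrieve(question, chunks, k=1)
prompt = build_prompt(question, top_chunks)
print(prompt)
```

Step 8 is omitted on purpose: the prompt would be sent to the language model, and that call depends on the provider and client library in use. A real embedding model captures meaning beyond shared words, which this word-count version does not.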

Installation

Clone the repository:

git clone https://github.com/parinaB/FinTax.git
cd FinTax

Create a virtual environment:

python -m venv .venv
source .venv/bin/activate

Install dependencies:

pip install -r requirements.txt

Environment Variables

Create a .env file in the project root:

MISTRAL_API_KEY=your_api_key

Add any additional environment variables required by your model provider.
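
As a rough sketch of how the key could be read at startup, the snippet below parses a .env file with the standard library only and falls back to it when MISTRAL_API_KEY is not already exported. This is an assumption about usage, not the project's actual loading code (which may rely on a helper package such as python-dotenv); load_env_file and get_mistral_api_key are hypothetical names.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict[str, str]:
    """Parse KEY=VALUE lines from a .env file; blank lines and # comments are skipped."""
    values: dict[str, str] = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_mistral_api_key(env_path: str = ".env") -> str:
    """Prefer the process environment, fall back to the .env file, fail loudly otherwise."""
    key = os.environ.get("MISTRAL_API_KEY") or load_env_file(env_path).get("MISTRAL_API_KEY")
    if not key:
        raise RuntimeError("MISTRAL_API_KEY is not set; add it to .env or export it.")
    return key
```

Failing early with a clear message avoids a less obvious authentication error later, when the first query reaches the model provider.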

Running the Application

Launch the application:

streamlit run UITAX.py

Or, to use main.py as the entry point instead:

python main.py

Example Queries

  • What is the accounting process?
  • Explain bills of exchange and promissory notes.
  • What are the key provisions discussed in the FAQ document?
  • Explain the concept of interplay and transition rules.
  • Summarize the main topics covered in Chapter 2.

Future Improvements

  • Multi-document source citations
  • Conversation memory
  • Hybrid search (keyword + vector retrieval)
  • Support for additional accounting and taxation datasets
  • Advanced reranking for improved retrieval quality
  • Deployment on cloud infrastructure

Learning Outcomes

This project demonstrates:

  • Retrieval-Augmented Generation (RAG)
  • Vector databases and semantic search
  • Embedding generation and retrieval pipelines
  • Prompt engineering
  • LLM application development
  • End-to-end document question answering systems

License

This project is intended for educational purposes.
