FinTax

FinTax is a tax and accounting assistant built on Retrieval-Augmented Generation (RAG) using LangChain, FAISS, and a large language model. Users query accounting and taxation documents in natural language and receive answers grounded in the uploaded knowledge base.

Features

Document-based question answering
Retrieval-Augmented Generation (RAG) pipeline
Semantic search using FAISS vector database
Support for multiple PDF documents
Context-aware responses from an LLM
Interactive web interface
Source-grounded answers based on uploaded documents

Tech Stack

Python
LangChain
FAISS
Mistral AI
Streamlit
PyPDF
Hugging Face Embeddings

Project Structure

FinTax/
│
├── documents/
│   ├── Chapter-2-Accounting-Process.pdf
│   ├── Chapter-6-Bills-of-Exchange-and-Promissory-Notes.pdf
│   ├── faq.pdf
│   └── interplay_transition.pdf
│
├── faiss_index/
│   ├── index.faiss
│   └── index.pkl
│
├── UITAX.py
├── main.py
├── requirements.txt
├── pyproject.toml
├── uv.lock
├── README.md
└── .gitignore

How It Works

Documents are loaded from the documents directory.
The text is extracted and split into chunks.
Embeddings are generated for each chunk.
The embeddings are stored in a FAISS vector database.
User queries are converted into embeddings.
Relevant document chunks are retrieved from FAISS.
Retrieved context is sent to the language model.
The model generates an answer grounded in the retrieved information.
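
The steps above can be sketched in plain Python. This is a minimal illustration, not the code in main.py or UITAX.py: it assumes a bag-of-words vector in place of the Hugging Face embeddings, a brute-force cosine-similarity scan in place of the FAISS index, and it stops at building the prompt rather than calling the language model. All function names and the sample sentences are hypothetical.

```python
import math
import re
from collections import Counter


def split_into_chunks(text: str, chunk_size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into overlapping windows of words (stand-in for a real text splitter)."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    if not words:
        return []
    step = chunk_size - overlap
    # Stop once the remaining words are already covered by the previous window.
    starts = range(0, max(len(words) - overlap, 1), step)
    return [" ".join(words[i:i + chunk_size]) for i in starts]


def embed(text: str) -> Counter:
    """Toy embedding: term counts. A real pipeline would use a dense embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, index: list[tuple[str, Counter]], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query (brute force, unlike FAISS)."""
    query_vec = embed(query)
    ranked = sorted(index, key=lambda item: cosine_similarity(query_vec, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(query: str, context_chunks: list[str]) -> str:
    """Assemble the text that would be sent to the language model."""
    context = "\n\n".join(context_chunks)
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}"


# Illustrative sentences only; FinTax reads the PDFs in documents/.
documents = [
    "A bill of exchange is a written order directing one party to pay a fixed sum to another.",
    "A journal records transactions in chronological order before posting to the ledger.",
    "A trial balance lists ledger balances to check arithmetic accuracy.",
]
chunks = [chunk for doc in documents for chunk in split_into_chunks(doc)]
index = [(chunk, embed(chunk)) for chunk in chunks]

query = "What is a bill of exchange?"
top_chunks = retrieve(query, index, k=1)
prompt = build_prompt(query, top_chunks)
print(prompt)
```

The sketch keeps indexing (chunk, embed, store) separate from querying (embed the query, retrieve, build the prompt), which mirrors the order of the steps listed above.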

Installation

Clone the repository:

git clone https://github.com/your-username/FinTax.git
cd FinTax

Create and activate a virtual environment (on Windows, activate with .venv\Scripts\activate instead):

python -m venv .venv
source .venv/bin/activate

Install dependencies:

pip install -r requirements.txt

Environment Variables

Create a .env file in the project root:

MISTRAL_API_KEY=your_api_key

Add any additional environment variables required by your model provider.
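
As a rough sketch of how the key could be read at startup, the snippet below checks the process environment first and falls back to the .env file. This README does not say how FinTax loads the file (a helper library may well be used), so the functions load_env_file and get_api_key are hypothetical and use only the standard library.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for line in env_path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("\"'")
    return values


def get_api_key(env_file: str = ".env") -> str:
    """Return MISTRAL_API_KEY from the environment, falling back to the .env file."""
    key = os.environ.get("MISTRAL_API_KEY") or load_env_file(env_file).get("MISTRAL_API_KEY")
    if not key:
        raise RuntimeError("MISTRAL_API_KEY is not set; add it to .env or export it.")
    return key
```

Calling get_api_key() before the first model request makes a missing key fail with a clear message instead of an authentication error later on.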

Running the Application

Launch the Streamlit web interface:

streamlit run UITAX.py

Alternatively, run the main.py entry point directly:

python main.py

Example Queries

What is the accounting process?
Explain bills of exchange and promissory notes.
What are the key provisions discussed in the FAQ document?
Explain the concept of interplay and transition rules.
Summarize the main topics covered in Chapter 2.

Screenshots

Future Improvements

Multi-document source citations
Conversation memory
Hybrid search (keyword + vector retrieval)
Support for additional accounting and taxation datasets
Advanced reranking for improved retrieval quality
Deployment on cloud infrastructure

Learning Outcomes

This project demonstrates:

Retrieval-Augmented Generation (RAG)
Vector databases and semantic search
Embedding generation and retrieval pipelines
Prompt engineering
LLM application development
End-to-end document question answering systems

License

This project is intended for educational purposes.
