FinTax is a Retrieval-Augmented Generation (RAG)-based tax and accounting assistant built with LangChain, FAISS, and large language models. The application enables users to query accounting and taxation documents in natural language and receive context-aware answers grounded in the uploaded knowledge base.
- Document-based question answering
- Retrieval-Augmented Generation (RAG) pipeline
- Semantic search using FAISS vector database
- Support for multiple PDF documents
- Context-aware responses from an LLM
- Interactive web interface
- Source-grounded answers based on uploaded documents
- Python
- LangChain
- FAISS
- Mistral AI
- Streamlit
- PyPDF
- Hugging Face Embeddings
FinTax/
│
├── documents/
│ ├── Chapter-2-Accounting-Process.pdf
│ ├── Chapter-6-Bills-of-Exchange-and-Promissory-Notes.pdf
│ ├── faq.pdf
│ └── interplay_transition.pdf
│
├── faiss_index/
│ ├── index.faiss
│ └── index.pkl
│
├── UITAX.py
├── main.py
├── requirements.txt
├── pyproject.toml
├── uv.lock
├── README.md
└── .gitignore
- Documents are loaded from the documents/ directory.
- The text is extracted and split into chunks.
- Embeddings are generated for each chunk.
- The embeddings are stored in a FAISS vector database.
- User queries are converted into embeddings.
- Relevant document chunks are retrieved from FAISS.
- Retrieved context is sent to the language model.
- The model generates an answer grounded in the retrieved information.
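The steps above can be illustrated with a standard-library-only sketch. It is not FinTax's code: FAISS is replaced by a linear cosine-similarity scan, the Hugging Face embedding model by normalised term counts, and PyPDF extraction by hard-coded placeholder text. The chunk sizes, function names (split_into_chunks, embed, InMemoryIndex, build_prompt), prompt wording and sample sentences are all hypothetical. It only shows how the indexing steps (chunk, embed, store) and the query steps (embed, retrieve, build a grounded prompt) fit together.

```python
import math
import re


def split_into_chunks(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping character windows (sizes are hypothetical)."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    # Stop early enough that the last window is not just a repeat of the overlap.
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]


def embed(text: str) -> dict[str, float]:
    """Toy embedding: L2-normalised term counts, standing in for a Hugging Face model."""
    counts: dict[str, float] = {}
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        counts[token] = counts.get(token, 0.0) + 1.0
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {token: c / norm for token, c in counts.items()}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity; vectors from embed() are unit length, so a dot product suffices."""
    return sum(weight * b.get(token, 0.0) for token, weight in a.items())


class InMemoryIndex:
    """Linear-scan store of (source, chunk, vector) rows; FAISS fills this role in FinTax."""

    def __init__(self) -> None:
        self.rows: list[tuple[str, str, dict[str, float]]] = []

    def add_document(self, source: str, text: str) -> None:
        # Indexing: chunk the extracted text, embed each chunk, store the vectors.
        for chunk in split_into_chunks(text):
            self.rows.append((source, chunk, embed(chunk)))

    def search(self, query: str, k: int = 2) -> list[tuple[float, str, str]]:
        # Querying: embed the question and return the k most similar chunks.
        query_vec = embed(query)
        scored = [(cosine(query_vec, vec), source, chunk) for source, chunk, vec in self.rows]
        scored.sort(key=lambda row: row[0], reverse=True)
        return scored[:k]


def build_prompt(question: str, hits: list[tuple[float, str, str]]) -> str:
    """Assemble a grounded prompt from retrieved chunks (wording is an assumption)."""
    context = "\n\n".join(f"[{source}] {chunk}" for _, source, chunk in hits)
    return (
        "Answer using only the context below. If the answer is not there, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


if __name__ == "__main__":
    index = InMemoryIndex()
    # Placeholder text, not excerpts from the PDFs in documents/.
    index.add_document(
        "chapter-2.txt",
        "The accounting process begins with identifying transactions, recording them "
        "in the journal, posting them to the ledger and preparing a trial balance.",
    )
    index.add_document(
        "chapter-6.txt",
        "A bill of exchange is a written order directing a person to pay a certain sum "
        "of money, while a promissory note is a written promise to pay.",
    )
    question = "What is a promissory note?"
    print(build_prompt(question, index.search(question, k=1)))
```

In the real application the assembled context goes to the Mistral model instead of being printed; that call is left out here because it needs an API key and network access. A real embedding model also matches paraphrases that share no words with the question, which term counts cannot do.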
Clone the repository:
git clone https://github.com/your-username/FinTax.git
cd FinTax
Create a virtual environment:
python -m venv .venv
source .venv/bin/activate
Install dependencies:
pip install -r requirements.txt
Create a .env file in the project root:
MISTRAL_API_KEY=your_api_key
Add any additional environment variables required by your model provider.
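The application presumably reads this file at startup, for example with python-dotenv's load_dotenv() (an assumption; check requirements.txt). The sketch below shows the same idea with the standard library only, plus an early check that the key is present and is not the README placeholder. The function names load_env_file and require_api_key are hypothetical.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict[str, str]:
    """Read simple KEY=value lines and export them without overriding existing variables."""
    found: dict[str, str] = {}
    env_path = Path(path)
    if not env_path.is_file():
        return found  # no .env file: rely on variables already set in the shell
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments and malformed lines
        key, value = line.split("=", 1)
        key, value = key.strip(), value.strip().strip("\"'")
        found[key] = value
        os.environ.setdefault(key, value)  # a variable already set in the shell wins
    return found


def require_api_key(name: str = "MISTRAL_API_KEY") -> str:
    """Fail early with a clear message instead of on the first model call."""
    value = os.environ.get(name, "")
    if not value or value == "your_api_key":  # placeholder value shown in the README
        raise RuntimeError(f"{name} is missing; add it to the .env file in the project root")
    return value


if __name__ == "__main__":
    load_env_file()
    try:
        require_api_key()
        print("MISTRAL_API_KEY found")
    except RuntimeError as err:
        print(err)
```

Letting shell variables take precedence mirrors load_dotenv's default behaviour (override=False), so a key exported in the terminal is not silently replaced by the file.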
Launch the application:
streamlit run UITAX.py
or
python main.py
Use UITAX.py for the Streamlit web interface or main.py directly, depending on the entry point you intend to use.
- What is the accounting process?
- Explain bills of exchange and promissory notes.
- What are the key provisions discussed in the FAQ document?
- Explain the concept of interplay and transition rules.
- Summarize the main topics covered in Chapter 2.
- Multi-document source citations
- Conversation memory
- Hybrid search (keyword + vector retrieval)
- Support for additional accounting and taxation datasets
- Advanced reranking for improved retrieval quality
- Deployment on cloud infrastructure
This project demonstrates:
- Retrieval-Augmented Generation (RAG)
- Vector databases and semantic search
- Embedding generation and retrieval pipelines
- Prompt engineering
- LLM application development
- End-to-end document question answering systems
This project is intended for educational and learning purposes.