This project implements an image caption generation model using deep learning. A Convolutional Neural Network (CNN) extracts features from each image, and a Recurrent Neural Network (RNN) decodes those features into a descriptive caption. Generated captions are evaluated with the BLEU (Bilingual Evaluation Understudy) score, a widely used metric that measures n-gram overlap between machine-generated text and human reference text.
- Uses CNN (e.g., InceptionV3) for image feature extraction.
- Implements an LSTM-based decoder for caption generation.
- Handles tokenization and vocabulary with a custom word index.
- Evaluates the quality of generated captions with the BLEU score.
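The custom word index mentioned above can be illustrated with a minimal, dependency-free sketch. The special tokens (`<pad>`, `<start>`, `<end>`, `<unk>`), the function names, and the `min_count` and `max_len` parameters are all assumptions made for illustration; the notebook's actual tokenizer may differ.

```python
from collections import Counter

# Hypothetical special tokens; the project's actual markers may differ.
PAD, START, END, UNK = "<pad>", "<start>", "<end>", "<unk>"


def tokenize(caption):
    """Lowercase a caption, replace punctuation with spaces, split on whitespace."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in caption.lower())
    return cleaned.split()


def build_word_index(captions, min_count=1):
    """Map each sufficiently frequent word to an integer id; id 0 is padding."""
    counts = Counter(word for caption in captions for word in tokenize(caption))
    word_index = {PAD: 0, START: 1, END: 2, UNK: 3}
    # Sort by descending frequency, then alphabetically, so ids are deterministic.
    for word, count in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if count >= min_count:
            word_index[word] = len(word_index)
    return word_index


def encode(caption, word_index, max_len):
    """Turn a caption into a fixed-length list of ids with start/end markers."""
    ids = [word_index[START]]
    ids += [word_index.get(word, word_index[UNK]) for word in tokenize(caption)]
    ids.append(word_index[END])
    # Note: truncation may cut off the end marker for very long captions.
    ids = ids[:max_len]
    return ids + [word_index[PAD]] * (max_len - len(ids))


captions = ["A dog runs on the grass.", "A cat sits on the mat."]
word_index = build_word_index(captions)
encoded = encode("A dog sits on the sofa", word_index, max_len=10)
print(encoded)  # "sofa" is out of vocabulary, so it maps to the <unk> id
```

Reserving id 0 for padding is a common choice because sequence models can then mask padded positions; whether this project does so is not stated in the source.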
To set up the project, install the required dependencies:

    pip install tensorflow numpy matplotlib nltk

Run the Jupyter Notebook to train and test the model:

    jupyter notebook image-caption-generation.ipynb

Model performance is measured using the BLEU score, which evaluates how closely the generated captions match the reference captions. Higher BLEU scores indicate closer agreement with the references.
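To make the metric concrete, here is a simplified, standalone sketch of sentence-level BLEU: clipped n-gram precisions for n = 1 to 4, combined by a geometric mean and scaled by a brevity penalty. It uses uniform weights and no smoothing, and the function names are hypothetical. Since `nltk` is among the dependencies, the notebook presumably relies on `nltk.translate.bleu_score` rather than code like this; the sketch only illustrates what the score measures.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(references, candidate, n):
    """Clipped n-gram precision: each candidate n-gram is credited at most as
    many times as it appears in any single reference."""
    cand_counts = ngrams(candidate, n)
    if not cand_counts:
        return 0.0
    max_ref_counts = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref_counts[gram] = max(max_ref_counts[gram], count)
    clipped = sum(min(count, max_ref_counts[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


def bleu(references, candidate, max_n=4):
    """Sentence-level BLEU with uniform weights and no smoothing."""
    precisions = [modified_precision(references, candidate, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0  # without smoothing, any zero precision zeroes the score
    log_mean = sum(math.log(p) for p in precisions) / max_n
    # Brevity penalty uses the reference length closest to the candidate length.
    c = len(candidate)
    r = min((len(ref) for ref in references), key=lambda ref_len: (abs(ref_len - c), ref_len))
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_mean)


reference = "a dog runs on the grass".split()
exact = bleu([reference], "a dog runs on the grass".split())
close = bleu([reference], "a dog runs on the green grass".split())
unrelated = bleu([reference], "two birds fly over water".split())
print(exact, close, unrelated)
```

An exact match scores 1.0 and a caption sharing no words with the reference scores 0.0; the near match falls in between. Because short captions often lack any matching 4-gram, practical evaluations usually apply smoothing or report BLEU-1 through BLEU-4 separately.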
This project is licensed under the MIT License.
