NLP Text Classifier

A simple sentiment analysis project built on the IMDB movie review dataset. It classifies text as Positive or Negative using TF‑IDF features and a Logistic Regression model.

Prerequisites

Python (recommended: 3.10+)
Git (optional; only needed to clone the repository with git clone)

Setup (clone + virtual environment)

1) Clone the repository

git clone https://github.com/JoeWat2005/nlp-text-classifier.git
cd nlp-text-classifier

2) Create a virtual environment called `env`

python -m venv env

3) Activate the environment

Windows (PowerShell):

.\env\Scripts\Activate.ps1

If PowerShell blocks activation, run the following in the same PowerShell window and try again. Because of -Scope Process, the change applies only to the current session:

Set-ExecutionPolicy -Scope Process -ExecutionPolicy Bypass

Windows (CMD):

env\Scripts\activate

macOS / Linux:

source env/bin/activate
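
To confirm that activation worked, you can ask Python itself. The snippet below is a minimal sketch, not part of the repository; the helper name in_virtualenv is made up for illustration. It relies on the standard behaviour that sys.prefix and sys.base_prefix differ inside a venv.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("virtual environment active:", in_virtualenv())
print("interpreter prefix:", sys.prefix)
```

If the first line prints False, the environment is not active in the current shell.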

4) Install dependencies

python -m pip install --upgrade pip
python -m pip install -r requirements.txt

Dependencies are pinned in requirements.txt, including scikit-learn, datasets, pandas, matplotlib, and seaborn.
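
A quick way to check the installation is to see whether the main packages can be found by the active interpreter. This is a hedged sketch: the helper missing_packages is hypothetical, and the import names are assumptions based on the packages named above (scikit-learn is imported as sklearn).

```python
from importlib.util import find_spec

# Import names assumed for the packages the README lists;
# scikit-learn is imported as "sklearn".
REQUIRED = ["sklearn", "datasets", "pandas", "matplotlib", "seaborn"]


def missing_packages(names):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Not installed in this environment:", ", ".join(missing))
else:
    print("All listed packages are importable.")
```

If anything is reported missing, make sure the environment is activated and rerun the install command.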

Run the project

1) Download the dataset

Downloads the IMDB dataset from Hugging Face and writes it to data/data.csv.

python download_data.py
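
For orientation, the download step can be sketched roughly as follows. This is not the contents of download_data.py: the function names, the text/label column names, and the choice to combine the train and test splits are all assumptions. The load_dataset("imdb") call is the standard Hugging Face datasets entry point and needs network access on first use.

```python
import csv
from pathlib import Path


def write_reviews_csv(rows, path="data/data.csv"):
    """Write (text, label) pairs to a CSV file with a header row."""
    out = Path(path)
    out.parent.mkdir(parents=True, exist_ok=True)
    count = 0
    with open(out, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["text", "label"])  # assumed column names
        for text, label in rows:
            writer.writerow([text, label])
            count += 1
    return count


def download_imdb(path="data/data.csv"):
    """Fetch the IMDB train and test splits and save them as one CSV."""
    # Imported lazily so the CSV helper above works without the package installed.
    from datasets import load_dataset

    dataset = load_dataset("imdb")  # needs an internet connection on first run
    rows = (
        (example["text"], example["label"])
        for split in ("train", "test")
        for example in dataset[split]
    )
    return write_reviews_csv(rows, path)
```

Calling download_imdb() returns the number of rows written. In the Hugging Face copy of the dataset, label 0 marks a negative review and 1 a positive one.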

2) Train the model

Trains a TF‑IDF vectorizer + Logistic Regression model, prints evaluation metrics, and saves artifacts under models/.

python train.py
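
The training step described above can be approximated with a few scikit-learn calls. The sketch below is illustrative only and is not the repository's train.py: the function name train_and_save, the default hyperparameters, the 75/25 split, and the toy sentences are assumptions, and plotting is left out. The demo writes to a temporary directory so it does not touch real artifacts in models/.

```python
import pickle
import tempfile
from pathlib import Path

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def train_and_save(texts, labels, model_dir="models"):
    """Fit TF-IDF + Logistic Regression, measure accuracy, and pickle both objects."""
    x_train, x_test, y_train, y_test = train_test_split(
        texts, labels, test_size=0.25, random_state=0, stratify=labels
    )
    vectorizer = TfidfVectorizer()
    model = LogisticRegression(max_iter=1000)
    # Fit the vocabulary on the training split only, then reuse it for the test split.
    model.fit(vectorizer.fit_transform(x_train), y_train)
    accuracy = accuracy_score(y_test, model.predict(vectorizer.transform(x_test)))

    out = Path(model_dir)
    out.mkdir(parents=True, exist_ok=True)
    with open(out / "model.pkl", "wb") as f:
        pickle.dump(model, f)
    with open(out / "vectorizer.pkl", "wb") as f:
        pickle.dump(vectorizer, f)
    return model, vectorizer, accuracy


# Toy stand-in for data/data.csv (1 = positive, 0 = negative).
TEXTS = [
    "a wonderful and moving film",
    "great acting and a great story",
    "I loved every minute of it",
    "brilliant, funny and touching",
    "a dull and boring mess",
    "terrible acting and a weak plot",
    "I hated every minute of it",
    "slow, awful and forgettable",
]
LABELS = [1, 1, 1, 1, 0, 0, 0, 0]

demo_dir = tempfile.mkdtemp()  # avoids overwriting real artifacts in models/
model, vectorizer, accuracy = train_and_save(TEXTS, LABELS, demo_dir)
print(f"held-out accuracy on the toy data: {accuracy:.2f}")
```

With only eight sentences the reported accuracy is not meaningful; the point is the order of operations: split, fit the vectorizer on the training data, fit the classifier, evaluate, save.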

3) Make predictions (interactive)

Loads models/model.pkl + models/vectorizer.pkl and lets you type sentences to classify.

python predict.py

Type quit to exit.
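
The prediction loop amounts to: load the two pickles, vectorize each typed sentence, and print the predicted label. The following is a minimal sketch under assumptions, not the actual predict.py: the function names, the prompt text, and the 0/1 to Negative/Positive mapping are hypothetical.

```python
import pickle
from pathlib import Path

LABELS = {0: "Negative", 1: "Positive"}  # assumed mapping, matching the IMDB labels


def load_artifacts(model_dir="models"):
    """Unpickle the classifier and vectorizer saved by the training step."""
    root = Path(model_dir)
    with open(root / "model.pkl", "rb") as f:
        model = pickle.load(f)
    with open(root / "vectorizer.pkl", "rb") as f:
        vectorizer = pickle.load(f)
    return model, vectorizer


def classify(text, model, vectorizer):
    """Vectorize one sentence and map the predicted class to a label name."""
    features = vectorizer.transform([text])
    return LABELS[int(model.predict(features)[0])]


def run_prompt(model, vectorizer, read=input, write=print):
    """Classify typed lines until the user enters quit."""
    while True:
        text = read("Enter a sentence (or quit): ").strip()
        if text.lower() == "quit":
            break
        if text:  # skip empty lines instead of classifying them
            write(classify(text, model, vectorizer))
```

After training, run_prompt(*load_artifacts()) starts the loop. Only unpickle files you created yourself or otherwise trust, since loading a pickle can execute arbitrary code.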

Folder structure

data/ — downloaded dataset (data.csv)
models/ — saved model/vectorizer + plots
utils/ — helper modules

Troubleshooting

`FileNotFoundError: ... models/confusion_matrix.png`

If you see an error about saving plots into models/, create the directory and rerun:

mkdir models
python train.py
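
The same fix can be applied in code, so the directory is created before anything is saved into it. This is a hedged sketch; prepare_output_path is a hypothetical helper and not something the repository is known to contain.

```python
from pathlib import Path


def prepare_output_path(path):
    """Create the parent directory of path if needed and return it as a Path."""
    target = Path(path)
    # exist_ok=True makes this a no-op when the directory is already there.
    target.parent.mkdir(parents=True, exist_ok=True)
    return target


# Hypothetical use before saving a matplotlib figure:
# plt.savefig(prepare_output_path("models/confusion_matrix.png"))
```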

Dataset download issues

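download_data.py requires an internet connection and the Hugging Face datasets package. If the download fails, check your connection and confirm that the dependencies from requirements.txt are installed in the active environment.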

License

MIT License
