feat(karthik_model): neural network training on NBA CSV data #13

Open: Kravi001 wants to merge 1 commit into main from karthik-neural-networks
Conversation

@Kravi001

Collaborator

What I changed

  • Added karthik_model.py in src/mini_nn/ to train a neural network on NBA CSV data.
  • Loads data from Data/PlayerStatistics.csv, preprocesses features, handles NaNs, and manages class imbalance.
  • Trains a model to predict whether a player scores ≥20 points and saves predictions (gitignored).

Why I changed it

  • Provides a fully working neural network pipeline for NBA CSV data.
  • Allows generating predictions without connecting to a database.

How did I test it

  • Ran the script locally on the full dataset (1,655,736 rows, 19 features).
  • Observed class imbalance: 0 → 0.8699, 1 → 0.1301; positive class weight = 6.69.
  • Training metrics:
    • Epoch 10 | Train Loss: 0.1343 | Val F1: 0.9847
    • Epoch 20 | Train Loss: 0.0942 | Val F1: 0.9400
    • Epoch 30 | Train Loss: 0.0690 | Val F1: 0.9919
    • Epoch 40 | Train Loss: 0.0624 | Val F1: 0.9946
    • Epoch 50 | Train Loss: 0.0418 | Val F1: 0.9990
    • Epoch 60 | Train Loss: 0.0330 | Val F1: 0.9997
    • Epoch 70 | Train Loss: 0.0264 | Val F1: 1.0000
  • Early stopping triggered at epoch 72.
  • Test results:
    • Best Epoch: 57
    • Test Accuracy: 0.9999
    • Test F1 Score: 0.9997
    • Test ROC-AUC: 1.0000
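
The reported positive class weight matches the ratio of the two class frequencies listed above. A minimal check, assuming the weight is computed as negative fraction divided by positive fraction (a common convention; the script's actual formula is not shown in this conversation):

```python
# Class frequencies as reported in the PR description.
neg_frac = 0.8699  # share of rows with label 0 (under 20 points)
pos_frac = 0.1301  # share of rows with label 1 (20 or more points)

# Assumed formula: weight the positive class by how much rarer it is.
pos_weight = neg_frac / pos_frac
print(round(pos_weight, 2))
```

Rounded to two decimals this gives 6.69, the value reported above.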

@Kravi001 force-pushed the karthik-neural-networks branch from 22dc60f to 1cbbb6d on February 10, 2026 08:36
@JonathanPLev changed the title from "Add karthik_model.py: neural network training on NBA CSV data" to "feat(karthik_model): neural network training on NBA CSV data" on Feb 10, 2026

@JonathanPLev left a comment

Owner

Can you add your results? I'm curious to know how well this model did.

Edit: sorry, I see them now. How are you getting nearly 100% test accuracy? I think you have either data leakage or overfitting.

# -------------------------
X_temp, X_test, y_temp, y_test = train_test_split(X, y, test_size=0.15, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X_temp, y_temp, test_size=0.176, random_state=42)

Owner

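You are definitely leaking data here because you are using a random split: with a random split, games played later can end up in training while earlier games land in the test set.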
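
One way to avoid this is to split by time instead of at random. A minimal sketch, assuming the data has a game date column; the name `gameDate`, the helper `chronological_split`, and the toy rows are hypothetical and not taken from `karthik_model.py`:

```python
import pandas as pd

def chronological_split(df, date_col, val_frac=0.15, test_frac=0.15):
    """Oldest rows go to train, the next slice to val, the newest to test."""
    df = df.sort_values(date_col, kind="stable").reset_index(drop=True)
    n = len(df)
    n_test = int(round(n * test_frac))
    n_val = int(round(n * val_frac))
    n_train = n - n_val - n_test
    train = df.iloc[:n_train]
    val = df.iloc[n_train:n_train + n_val]
    test = df.iloc[n_train + n_val:]
    return train, val, test

# Toy data standing in for the real CSV (hypothetical column names).
games = pd.DataFrame({
    "gameDate": pd.date_range("2024-01-01", periods=20, freq="D"),
    "points": range(20),
})

train, val, test = chronological_split(games, "gameDate")
print(len(train), len(val), len(test))
```

If several rows share a date, a cut by row count can put the same day on both sides of a boundary, so cutting on a date threshold may be safer for the real dataset.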

# -------------------------
# SPLIT
# -------------------------
X_temp, X_test, y_temp, y_test = train_test_split(X, y, test_size=0.15, random_state=42)

Owner

You're also not dropping any of the columns that leak the outcome. For example, you're not removing stats like points or anything else that comes from the current game.
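
A possible fix is to build the label from the current game, then replace every current-game stat with a version computed only from that player's earlier games. A minimal sketch, assuming hypothetical column names (`playerId`, `gameDate`, `points`, `assists`) and toy rows; the real CSV schema may differ:

```python
import pandas as pd

# Toy rows standing in for the real CSV (hypothetical columns).
df = pd.DataFrame({
    "playerId": [1, 1, 1, 2, 2, 2],
    "gameDate": pd.to_datetime(["2024-01-01", "2024-01-03", "2024-01-05"] * 2),
    "points": [25, 10, 30, 8, 22, 15],
    "assists": [5, 7, 2, 1, 3, 4],
})
df = df.sort_values(["playerId", "gameDate"]).reset_index(drop=True)

# The label is derived from the current game's points.
df["scored_20_plus"] = (df["points"] >= 20).astype(int)

# Stats recorded during the game being predicted must not be features.
current_game_cols = ["points", "assists"]
for col in current_game_cols:
    # shift(1) ensures the running average only sees earlier games.
    df[f"{col}_prev_avg"] = df.groupby("playerId")[col].transform(
        lambda s: s.shift(1).expanding().mean()
    )

# Drop the raw current-game stats and the label from the feature matrix.
features = df.drop(columns=current_game_cols + ["scored_20_plus"])
labels = df["scored_20_plus"]
print(features.columns.tolist())
```

Each player's first game has no history, so its `_prev_avg` values are NaN and need to be dropped or imputed before training.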


3 participants