This guide explains how to use cross-validation and hyperparameter tuning to optimize your Intrusion Detection System's Random Forest classifier for better performance.
Model optimization helps you:
- Improve accuracy: Find the best hyperparameters for your specific dataset
- Reduce overfitting: Validate model performance across multiple data splits
- Increase reliability: Ensure consistent performance on unseen data
- Optimize trade-offs: Balance precision, recall, and computational cost
Cross-validation is a technique for evaluating model performance by:
- Splitting the training data into K folds (typically 5)
- Training the model K times, each time holding out a different fold for validation and training on the remaining K-1 folds
- Averaging the K validation scores to get a more reliable performance estimate
Benefits:
- Detects overfitting (model memorizing training data)
- Provides confidence intervals for performance metrics
- Uses all data for both training and validation (see the example below)
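For example, a 5-fold cross-validation run takes only a few lines with scikit-learn; in this sketch, synthetic data stands in for the preprocessed IDS data:

```
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the preprocessed training data
X, y = make_classification(n_samples=2000, n_features=20, random_state=42)

model = RandomForestClassifier(random_state=42, n_jobs=-1)

# 5-fold CV: each fold serves once as the validation set
scores = cross_val_score(model, X, y, cv=5, scoring='accuracy')
print(f"Mean CV accuracy: {scores.mean():.4f} (+/- {scores.std() * 2:.4f})")
```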
Hyperparameters are settings that control how the model learns. For Random Forest:
- n_estimators: Number of trees in the forest
- max_depth: Maximum depth of each tree
- min_samples_split: Minimum samples required to split a node
- min_samples_leaf: Minimum samples required at a leaf node
- max_features: Number of features to consider when searching for the best split
- bootstrap: Whether to use bootstrap sampling
Grid Search systematically tests every combination of candidate values to find the best settings (a minimal sketch follows).
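Here is how `GridSearchCV` ties these hyperparameters together; the small grid and variable names are illustrative, not the script's actual settings:

```
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# A deliberately small, illustrative grid: 2 x 2 x 2 = 8 combinations
param_grid = {
    'n_estimators': [100, 200],
    'max_depth': [10, None],
    'min_samples_split': [2, 5],
}

grid_search = GridSearchCV(
    estimator=RandomForestClassifier(random_state=42, n_jobs=-1),
    param_grid=param_grid,
    cv=5,                 # every combination is cross-validated on 5 folds
    scoring='accuracy',
    n_jobs=-1,
)
# grid_search.fit(X_train, y_train)  # X_train, y_train: your preprocessed data
# print(grid_search.best_params_)
```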
You must first run the main notebook to:
- Generate the preprocessed data (`train_processed.csv`, `test_processed.csv`)
- Train the baseline model (`intrusion_detection_model_unsw.pkl`)

Verify your setup:

```
python3 test_setup.py
```

This checks that all required files and packages are available.

Then run the optimization script:

```
python3 model_optimization.py
```

What it does:
- Loads the preprocessed training and test data
- Evaluates the baseline model performance
- Performs 5-fold cross-validation on the baseline
- Runs GridSearchCV to test parameter combinations
- Trains the optimized model with best parameters
- Compares baseline vs optimized performance
- Saves the optimized model and results
Expected runtime: 10-30 minutes, depending on dataset size and CPU. The sketch below condenses the script's main steps.
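A hedged sketch of the flow: the `label` column name, the use of joblib for the .pkl files, and the tiny grid are assumptions here; check `model_optimization.py` for the real details:

```
import joblib
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, cross_val_score

# Assumption: the target lives in a 'label' column; verify against the script
train = pd.read_csv('train_processed.csv')
test = pd.read_csv('test_processed.csv')
X_train, y_train = train.drop(columns=['label']), train['label']
X_test, y_test = test.drop(columns=['label']), test['label']

# Baseline evaluation plus 5-fold cross-validation
baseline = joblib.load('intrusion_detection_model_unsw.pkl')
print('Baseline accuracy:', accuracy_score(y_test, baseline.predict(X_test)))
print('CV accuracy:', cross_val_score(baseline, X_train, y_train, cv=5).mean())

# Grid search over a small illustrative grid, then save the best model
param_grid = {'n_estimators': [100, 200], 'max_depth': [10, None]}
grid = GridSearchCV(baseline, param_grid, cv=5, n_jobs=-1)
grid.fit(X_train, y_train)
joblib.dump(grid.best_estimator_, 'intrusion_detection_model_optimized.pkl')
```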
Output:

```
INTRUSION DETECTION SYSTEM - MODEL OPTIMIZATION
================================================================
Step 1: Loading preprocessed data...
✓ Training data loaded: (X, Y)
✓ Test data loaded: (X, Y)
Step 2: Evaluating baseline model...
Baseline Model Performance:
Accuracy: 0.9009
Precision: 0.9876
Recall: 0.7982
F1-Score: 0.8829
ROC-AUC: 0.9654
Step 3: Performing K-Fold Cross-Validation...
Cross-Validation Results:
Mean CV Accuracy: 0.8995 (+/- 0.0123)
Step 4: Hyperparameter Tuning with GridSearchCV...
Parameter grid size: 288 combinations
[GridSearchCV progress output...]
Best Parameters:
n_estimators: 200
max_depth: 30
min_samples_split: 2
min_samples_leaf: 1
max_features: sqrt
bootstrap: True
Step 5: Evaluating optimized model...
Optimized Model Performance:
Accuracy: 0.9156
Precision: 0.9901
Recall: 0.8301
F1-Score: 0.9029
ROC-AUC: 0.9721
Performance Improvements:
Accuracy : +1.47%
Precision : +0.25%
Recall : +3.19%
F1 : +2.00%
Roc_auc : +0.67%
✓ Optimized model saved
✓ Results saved
```
Next, generate the comparison charts:

```
python3 model_optimization_visualizations.py
```

What it does:
- Loads both baseline and optimized models
- Generates 5 comparison visualizations
- Saves all charts as PNG files
Expected runtime: 2-5 minutes. A minimal example of one of these charts is sketched below.
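For a rough idea of how the metrics-comparison chart could be drawn (using the sample scores from the run above; the script's actual plotting code may differ):

```
import matplotlib.pyplot as plt
import numpy as np

# Sample scores from the run shown earlier; the script computes these itself
metrics = ['Accuracy', 'Precision', 'Recall', 'F1', 'ROC-AUC']
baseline = [0.9009, 0.9876, 0.7982, 0.8829, 0.9654]
optimized = [0.9156, 0.9901, 0.8301, 0.9029, 0.9721]

x = np.arange(len(metrics))
width = 0.35
fig, ax = plt.subplots(figsize=(10, 5))
ax.bar(x - width / 2, baseline, width, label='Baseline')
ax.bar(x + width / 2, optimized, width, label='Optimized')
ax.set_xticks(x)
ax.set_xticklabels(metrics)
ax.set_ylabel('Score')
ax.legend()
fig.savefig('optimization_metrics_comparison.png', dpi=150)
```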
- `intrusion_detection_model_optimized.pkl`
  - Random Forest model with tuned hyperparameters
  - Ready for deployment
  - Better performance than baseline
- `optimization_results.csv`
  - Comparison table with all metrics
  - Columns: Metric, Baseline, Optimized, Improvement (%)
  - Easy to import into Excel or other tools
- `best_hyperparameters.csv`
  - Optimal parameter values found by GridSearch
  - Use these settings for future training
  - Documents the model configuration
- `optimization_metrics_comparison.png`
  - Side-by-side bar chart
  - Compares all 5 metrics (accuracy, precision, recall, F1, ROC-AUC)
  - Shows exact values on each bar
  - Best for: Presentations, reports
- `optimization_improvement.png`
  - Horizontal bar chart
  - Shows percentage improvement for each metric
  - Green bars = improvement, red bars = degradation
  - Best for: Quick assessment of optimization impact
- `optimization_confusion_matrices.png`
  - Two heatmaps side by side
  - Left: baseline model confusion matrix
  - Right: optimized model confusion matrix
  - Best for: Understanding prediction errors
- `optimization_roc_curves.png`
  - ROC curves for both models
  - Includes AUC scores
  - Diagonal line shows a random classifier
  - Best for: Evaluating classification quality
- `optimization_learning_curves.png`
  - Two plots showing training vs. validation scores
  - Left: baseline model learning curve
  - Right: optimized model learning curve
  - Shaded areas show standard deviation
  - Best for: Diagnosing overfitting/underfitting (see the sketch below)
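A learning curve like the one in the last chart can be generated with scikit-learn's `learning_curve` helper; here is a minimal sketch with synthetic stand-in data:

```
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import learning_curve

# Synthetic stand-in for the preprocessed training data
X, y = make_classification(n_samples=2000, n_features=20, random_state=42)
sizes, train_scores, val_scores = learning_curve(
    RandomForestClassifier(random_state=42), X, y,
    cv=5, train_sizes=np.linspace(0.1, 1.0, 5), n_jobs=-1)

# Plot mean scores; shade one standard deviation around the validation curve
plt.plot(sizes, train_scores.mean(axis=1), label='Training score')
plt.plot(sizes, val_scores.mean(axis=1), label='Validation score')
plt.fill_between(sizes,
                 val_scores.mean(axis=1) - val_scores.std(axis=1),
                 val_scores.mean(axis=1) + val_scores.std(axis=1), alpha=0.2)
plt.xlabel('Training set size')
plt.ylabel('Accuracy')
plt.legend()
plt.savefig('learning_curve_example.png', dpi=150)
```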
Accuracy: Overall correctness
- Higher is better
- Can be misleading with imbalanced datasets
Precision: Of all predicted attacks, how many were real?
- Important for minimizing false alarms
- High precision = fewer false positives
Recall: Of all real attacks, how many did we detect?
- Important for catching all threats
- High recall = fewer missed attacks
F1-Score: Harmonic mean of precision and recall
- Balances both metrics
- Good overall performance indicator
ROC-AUC: Area under ROC curve
- Measures classification quality across all thresholds
- 1.0 = perfect, 0.5 = random
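All five metrics come straight from `sklearn.metrics`; here is a self-contained example with synthetic stand-in data:

```
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the preprocessed IDS data
X, y = make_classification(n_samples=2000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

model = RandomForestClassifier(random_state=42).fit(X_train, y_train)
y_pred = model.predict(X_test)
y_prob = model.predict_proba(X_test)[:, 1]  # score for the positive (attack) class

print(f"Accuracy : {accuracy_score(y_test, y_pred):.4f}")
print(f"Precision: {precision_score(y_test, y_pred):.4f}")
print(f"Recall   : {recall_score(y_test, y_pred):.4f}")
print(f"F1-Score : {f1_score(y_test, y_pred):.4f}")
print(f"ROC-AUC  : {roc_auc_score(y_test, y_prob):.4f}")
```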
Good Signs:
- ✓ All metrics improved or stayed the same
- ✓ Cross-validation scores are consistent (low std dev)
- ✓ Learning curves converge (training and validation close)
- ✓ ROC-AUC close to 1.0
Warning Signs:
- ⚠ Large gap between training and validation scores (overfitting)
- ⚠ High variance in cross-validation scores (unstable model)
- ⚠ Some metrics improved but others degraded significantly
- ⚠ Very long training time with minimal improvement
For network intrusion detection tasks like this one, you might see:
- Accuracy: +1-3% improvement
- Recall: +2-5% improvement (better attack detection)
- F1-Score: +1-3% improvement
- ROC-AUC: +0.5-1% improvement
Even small improvements are valuable in security applications!
Use the optimized model for any of the following (a loading sketch follows the list):
- Real-time intrusion detection
- Integration with Snort IDS
- REST API deployment
- Automated threat response
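A minimal loading sketch, assuming the model was saved with joblib; `features` is a placeholder for your live feature rows:

```
import joblib

# Assumption: the optimized model was saved with joblib (adjust if pickle was used)
model = joblib.load('intrusion_detection_model_optimized.pkl')

def flag_traffic(features):
    """Return attack probabilities for a 2-D feature array whose columns
    match the training schema. `features` is a placeholder for live data."""
    return model.predict_proba(features)[:, 1]
```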
The results provide:
- Proof of model validation
- Performance benchmarks
- Configuration documentation
- Comparison baselines
Use insights to:
- Identify which metrics need improvement
- Decide if more data is needed
- Determine if feature engineering would help
- Guide selection of alternative algorithms
Use visualizations to:
- Demonstrate model improvements
- Justify computational costs
- Explain model reliability
- Support deployment decisions
If `GridSearchCV` takes too long, try these solutions:
- Reduce parameter grid size (fewer values to test)
- Use `RandomizedSearchCV` instead of `GridSearchCV`
- Reduce the `cv` parameter (use 3-fold instead of 5-fold)
- Use a smaller subset of training data for tuning
If tuning runs out of memory, try these solutions:
- Reduce `n_estimators` in the parameter grid
- Limit `max_depth` values
- Use fewer cross-validation folds
- Process on a machine with more RAM
If optimization yields little or no improvement, possible reasons include:
- Baseline model already well-tuned
- Dataset too small for complex models
- Features not informative enough
- Need different algorithm (try XGBoost, Neural Networks)
Next steps:
- Try feature engineering
- Collect more training data
- Experiment with different algorithms
- Consider ensemble methods
If the optimized model performs worse than the baseline on the test set, possible causes include:
- Overfitting to validation set
- Random variation in data splits
- Parameter grid doesn't include optimal values
Solutions:
- Use nested cross-validation (see the sketch after this list)
- Expand parameter search space
- Increase cross-validation folds
- Check for data leakage
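Nested cross-validation wraps the grid search inside an outer cross-validation loop, so parameters are chosen independently within each outer fold. A minimal sketch with synthetic data:

```
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, cross_val_score

X, y = make_classification(n_samples=2000, n_features=20, random_state=42)

# Inner loop: 3-fold grid search selects hyperparameters
inner = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid={'n_estimators': [100, 200], 'max_depth': [10, None]},
    cv=3, n_jobs=-1)

# Outer loop: 5-fold CV estimates the tuned model's performance without bias
outer_scores = cross_val_score(inner, X, y, cv=5)
print(f"Nested CV accuracy: {outer_scores.mean():.4f} (+/- {outer_scores.std() * 2:.4f})")
```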
Edit `model_optimization.py` to test different parameters:

```
param_grid = {
    'n_estimators': [50, 100, 150, 200, 250, 300],
    'max_depth': [5, 10, 15, 20, 25, 30, None],
    'min_samples_split': [2, 5, 10, 20],
    'min_samples_leaf': [1, 2, 4, 8],
    'max_features': ['sqrt', 'log2', None],
    'bootstrap': [True, False],
    'class_weight': ['balanced', None]  # For imbalanced datasets
}
```

For faster tuning with large parameter spaces:
```
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

random_search = RandomizedSearchCV(
    estimator=RandomForestClassifier(random_state=42, n_jobs=-1),
    param_distributions=param_grid,
    n_iter=50,  # Test 50 random combinations
    cv=3,
    scoring='accuracy',
    n_jobs=-1,
    verbose=2,
    random_state=42
)
```

Change the `scoring` parameter to optimize for specific goals:
```
# Optimize for recall (catch more attacks)
grid_search = GridSearchCV(..., scoring='recall')

# Optimize for precision (fewer false alarms)
grid_search = GridSearchCV(..., scoring='precision')

# Optimize for F1 (balance)
grid_search = GridSearchCV(..., scoring='f1')

# Optimize for ROC-AUC
grid_search = GridSearchCV(..., scoring='roc_auc')
```

- Always use cross-validation: Never tune on test data
- Document parameters: Save `best_hyperparameters.csv` with the model (see the sketch after this list)
- Version models: Keep both baseline and optimized for comparison
- Monitor production: Track if optimized model performs as expected
- Retune periodically: As data changes, optimal parameters may change
- Consider trade-offs: Faster models vs more accurate models
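One way to write `best_hyperparameters.csv` from a finished search, assuming a fitted `grid_search` object as in the examples above:

```
import pandas as pd

# grid_search is the fitted GridSearchCV object from the tuning step
best = pd.DataFrame(
    {'parameter': list(grid_search.best_params_.keys()),
     'value': list(grid_search.best_params_.values())})
best.to_csv('best_hyperparameters.csv', index=False)
```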
After optimization, consider:
- Feature Selection: Use the optimized model with a reduced feature set
- Ensemble Methods: Combine multiple optimized models
- Multi-class Classification: Optimize for specific attack types
- Real-time Deployment: Deploy optimized model to production
- Continuous Learning: Retrain periodically with new data
For issues or questions:
- Check this guide first
- Review the main README.md
- Check console output for error messages
- Verify all prerequisites are met
Happy Optimizing! 📊