This guide explains how to use the feature importance analysis tools to understand which network features are most critical for detecting intrusions.
Feature importance analysis helps you:
- Identify critical features: Understand which network traffic characteristics are most indicative of attacks
- Optimize the model: Reduce dimensionality by focusing on the most important features
- Gain insights: Learn about attack patterns and network behavior
- Improve performance: Potentially speed up predictions by using fewer features
Install the required packages:
pip install -r requirements.txt
Or verify your setup:
python3 test_setup.py
If you haven't already, run the main notebook to generate the preprocessed training data:
- Open the Jupyter notebook (the filename is quoted because most shells treat unquoted parentheses as syntax):
  jupyter notebook "Intrusion_Detection_System(IDS).ipynb"
- Run all cells up to and including the "Model Training" section
- This will create:
  - train_processed.csv: Preprocessed training data
  - test_processed.csv: Preprocessed test data
  - intrusion_detection_model_unsw.pkl: Trained model
- Open the notebook:
  jupyter notebook "Intrusion_Detection_System(IDS).ipynb"
- Navigate to the "Feature Importance Analysis" section (near the end)
- Run all cells in that section
- View the visualizations inline and interact with the data
Run the standalone script:
python3 feature_importance_analysis.py
This will:
- Load the trained model and preprocessed data
- Extract feature importances
- Generate all visualizations
- Save results to files
- Print summary statistics
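The extraction and ranking step can be pictured with a minimal sketch. It is not the actual code of feature_importance_analysis.py; the function name `rank_importances` and the scores are hypothetical, and plain Python lists stand in for the model's `feature_importances_` array and the training column names.

```python
def rank_importances(feature_names, importances):
    """Pair each feature with its score and sort from highest to lowest."""
    if len(feature_names) != len(importances):
        raise ValueError("feature_names and importances must have the same length")
    pairs = zip(feature_names, importances)
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# Hypothetical scores for illustration only; real values come from
# model.feature_importances_ after loading the trained model.
names = ['sbytes', 'dbytes', 'rate', 'sttl', 'dttl']
scores = [0.30, 0.10, 0.15, 0.40, 0.05]

ranked = rank_importances(names, scores)
for name, score in ranked:
    print(f"{name:<8} {score:.2f}")
```

The order of `names` has to match the column order used during training, otherwise scores end up attached to the wrong features.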
- feature_importance_top20.png
  - Horizontal bar chart of the 20 most important features
  - Easy to read and compare feature importance scores
  - Best for presentations and reports
- feature_importance_top15_vertical.png
  - Vertical bar chart with exact importance values
  - Shows the top 15 features with numerical labels
  - Useful for detailed analysis
- cumulative_feature_importance.png
  - Line plot showing cumulative importance
  - Indicates how many features capture 90% and 95% of predictive power
  - Helps determine optimal feature subset size
- top10_features_correlation.png
  - Heatmap showing correlations between top 10 features
  - Identifies redundant or complementary features
  - Useful for feature engineering
- feature_importance_full.csv
  - Complete ranking of all features with importance scores
  - Two columns: Feature name and Importance score
  - Sorted by importance (highest to lowest)
  - Can be used for further analysis in Excel or other tools
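The 90% and 95% markers on the cumulative plot could be computed roughly as follows. This is a sketch under assumptions: `features_needed` is a hypothetical helper, the scores are made up, and the real script may derive the thresholds differently.

```python
def features_needed(importances, threshold):
    """Return how many top-ranked features are needed to reach the threshold."""
    ranked = sorted(importances, reverse=True)
    running_total = 0.0
    for count, score in enumerate(ranked, start=1):
        running_total += score
        if running_total >= threshold:
            return count
    # Threshold never reached (for example, scores do not sum to 1.0)
    return len(ranked)


# Hypothetical importance scores that sum to 1.0
scores = [0.42, 0.25, 0.15, 0.10, 0.05, 0.03]

n90 = features_needed(scores, 0.90)
n95 = features_needed(scores, 0.95)
print(f"Features for 90%: {n90}, for 95%: {n95}")
```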
- Range: 0.0 to 1.0 (all scores sum to 1.0)
- Interpretation: Higher score = more important for predictions
- Example: A score of 0.15 means the feature accounts for 15% of the total importance the model assigns across all features
The analysis provides several insights:
- Top Features: Which features are most critical
- Cumulative Importance: How many features you really need
- Feature Correlations: Which features are related
- Distribution: How importance is spread across features
In network intrusion detection models, you might find that:
- Flow-based features (duration, bytes, packets) are often highly important
- TCP connection features (window size, TTL) can be critical
- Rate-based features (packets per second) help detect anomalies
- Service and protocol information provides context
If the analysis shows that 20 features capture 95% of importance:
- Retrain the model using only those 20 features
- Reduce computational cost
- Potentially improve generalization
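Selecting the reduced feature set before retraining might look like the sketch below. The helpers `top_k_names` and `keep_columns` are hypothetical, the scores and rows are invented, and the `label` column name is an assumption about the preprocessed data. Retraining itself is not shown.

```python
def top_k_names(importance_by_feature, k):
    """Return the names of the k features with the highest importance."""
    ranked = sorted(importance_by_feature.items(),
                    key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:k]]


def keep_columns(rows, columns):
    """Project each row (a dict) onto the given columns, in that order."""
    return [{col: row[col] for col in columns} for row in rows]


# Hypothetical scores and rows for illustration only
importance_by_feature = {'sbytes': 0.30, 'dbytes': 0.10, 'rate': 0.15,
                         'sttl': 0.40, 'dttl': 0.05}
rows = [
    {'sbytes': 496, 'dbytes': 0, 'rate': 90909.1, 'sttl': 254, 'dttl': 0, 'label': 1},
    {'sbytes': 364, 'dbytes': 13186, 'rate': 74.1, 'sttl': 31, 'dttl': 29, 'label': 0},
]

selected = top_k_names(importance_by_feature, k=3)
# 'label' is an assumed name for the target column
reduced_rows = keep_columns(rows, selected + ['label'])
print(selected)
```

With a pandas DataFrame the projection is simply `train_data[selected + ['label']]`, after which the model can be refit on the smaller frame and compared against the full-feature baseline.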
If highly correlated features are both important:
- Consider creating combined features
- Remove redundant features
- Engineer new features based on relationships
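One way to spot redundant pairs among the top features is to flag any pair whose absolute Pearson correlation exceeds a cutoff. The sketch below uses plain Python and invented values; the 0.9 cutoff and the helper names are assumptions, not settings taken from the project.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column has no defined correlation
    return cov / math.sqrt(var_x * var_y)


def correlated_pairs(columns, threshold=0.9):
    """Return (feature_a, feature_b, r) for every pair with |r| >= threshold."""
    pairs = []
    for a, b in combinations(columns, 2):
        r = pearson(columns[a], columns[b])
        if abs(r) >= threshold:
            pairs.append((a, b, r))
    return pairs


# Invented values for illustration only
columns = {
    'sbytes': [100, 200, 300, 400, 500],
    'dbytes': [210, 390, 610, 800, 990],
    'sttl': [254, 31, 254, 62, 31],
}

redundant = correlated_pairs(columns, threshold=0.9)
for a, b, r in redundant:
    print(f"{a} and {b}: r = {r:.3f}")
```

On a DataFrame, `train_data[top_features].corr()` gives the same matrix that the heatmap visualizes.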
Use the results to:
- Validate that important features make sense for intrusion detection
- Identify unexpected patterns
- Guide data collection priorities
Use the visualizations to:
- Explain the model to stakeholders
- Justify feature selection decisions
- Document model behavior
Solution: Run the main notebook first to generate preprocessed data.
Solution: Install required packages:
pip install -r requirements.txt
Solution: Add this at the top of the notebook:
%matplotlib inline
Solution: Check write permissions in the current directory.
Modify the script to analyze specific features:
# In feature_importance_analysis.py or notebook
# train_data is assumed to be the DataFrame loaded from train_processed.csv
specific_features = ['sbytes', 'dbytes', 'rate', 'sttl', 'dttl']
subset_data = train_data[specific_features]
# Analyze correlations, distributions, etc.
If you train different models, compare their feature importances:
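import joblib
import pandas as pd

# Both models must be trained on the same columns; feature_names lists
# those columns in training order
model1 = joblib.load('model1.pkl')
model2 = joblib.load('model2.pkl')
importance_comparison = pd.DataFrame({
    'Feature': feature_names,
    'Model1': model1.feature_importances_,
    'Model2': model2.feature_importances_
})
The CSV file can be imported into: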
- Excel: For custom charts and analysis
- Tableau/Power BI: For interactive dashboards
- R: For statistical analysis
- Python notebooks: For further exploration
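For the Python route, reading the ranking back in could look like this sketch. The header names `Feature` and `Importance` are an assumption based on the two-column description above; check the first line of feature_importance_full.csv before relying on them. An in-memory string stands in for the file so the example runs on its own.

```python
import csv
import io


def load_importances(handle):
    """Read (feature, importance) rows from an open CSV file handle."""
    reader = csv.DictReader(handle)
    # Column names below are assumed, not confirmed from the script
    return [(row['Feature'], float(row['Importance'])) for row in reader]


# Stand-in for open('feature_importance_full.csv', newline='')
sample_csv = io.StringIO(
    "Feature,Importance\n"
    "sttl,0.40\n"
    "sbytes,0.30\n"
    "rate,0.15\n"
    "dbytes,0.10\n"
    "dttl,0.05\n"
)

loaded = load_importances(sample_csv)
total = sum(score for _, score in loaded)
print(f"{len(loaded)} features, total importance {total:.2f}")
```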
- Run after model training: Always generate fresh importance scores after retraining
- Compare across datasets: Check if importance is consistent across different data splits
- Validate findings: Ensure important features make domain sense
- Document insights: Keep notes on what you learn from the analysis
- Version control: Save importance scores with model versions
After analyzing feature importance, consider:
- Cross-validation: Verify importance scores are stable across folds
- Hyperparameter tuning: Optimize model with important features
- Feature selection: Retrain with reduced feature set
- Multi-class classification: Analyze importance for specific attack types
- Real-time deployment: Use insights to optimize production systems
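The cross-validation check could be sketched as below: collect one importance vector per fold, then look at the mean and spread per feature. The fold values and the helper name `importance_stability` are hypothetical; in practice each vector would come from a model fitted on one fold.

```python
from statistics import mean, pstdev


def importance_stability(fold_importances, feature_names):
    """Return {feature: (mean, population std dev)} across folds."""
    summary = {}
    for index, name in enumerate(feature_names):
        values = [fold[index] for fold in fold_importances]
        summary[name] = (mean(values), pstdev(values))
    return summary


# Hypothetical importances from three folds, one list per fold
feature_names = ['sttl', 'sbytes', 'rate']
fold_importances = [
    [0.50, 0.30, 0.20],
    [0.48, 0.32, 0.20],
    [0.52, 0.28, 0.20],
]

stability = importance_stability(fold_importances, feature_names)
for name, (avg, spread) in stability.items():
    print(f"{name:<8} mean={avg:.3f} std={spread:.3f}")
```

A feature whose spread is large relative to its mean is a weaker basis for feature selection than one that ranks consistently in every fold.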
For issues or questions:
- Check this guide first
- Review the main README.md
- Open an issue on GitHub
- Check the Jupyter notebook comments
Happy Analyzing! 📊