This repository provides an end-to-end pipeline for extracting features from Android malware/benign app benchmarks and building machine learning models to classify apps.
It is inspired by the droidxp/benchmark framework for automated Android security evaluation.
Below is a walkthrough of:
- How to set up DroidXP and collect benchmark data
- How to extract ML features from results
- How to train, evaluate ML models and automate the testing using this repository
To generate analysis results, you must first set up DroidXP and collect benign/malicious app execution traces.
Steps:
-
Clone the DroidXP repository:
git clone https://github.com/droidxp/benchmark
-
Clone the configuration repository:
git clone https://github.com/droidxp/benchmark-vm
-
Create a folder called
scriptin your home directory:mkdir ~/script -
Copy the 3 files from
benchmark-vm/ubuntu_20.04_python3_x86/to~/script:cp benchmark-vm/ubuntu_20.04_python3_x86/* ~/script/
-
Run
./config.shin~/scriptto install and configure DroidXP with all required tools:
This includes dependencies like DroidBot, Monkey, Java 7, Android SDK 23+, Python 2/3, Pixel 2 emulator, and ensures commands (adb,java,jarsigner,emulator,aapt) are in yourPATH.cd ~/script ./config.sh
-
Use apps from
LargeI.csvin the DroidXP repo'sdata/input/folder:
(Exclude app-25, app-32, app-36, app-41, app-88, app-93 for 96 pairs; or useSmallE.csvfor a smaller experiment.) -
Get an AndroZoo access key and insert it into
getApps.py(indata/input/):
Replace the placeholder inkey="Insert here your key". -
Edit
getApps.pyto select the CSV:
For example, set:with open('LargeI.csv') as csvfile: -
Run the script to download and format apps as malicious/benign:
Store them inbenchmark/data/input/.python getApps.py
- Main experiment (disable static analysis, 120s timeout, 3 repetitions, with tools Monkey, DroidBot, DroidMate, Humanoid):
e.g., python3 main.py -tools monkey droidbot droidmate humanoid -t time -r repetitions --disable-static
- Results appear in:
benchmark/results/<timestamp>/report/
(e.g.,benchmark/results/20210220212608/report/)
AutoDroid/
│
├── src_parse.py # Parses src.txt files to CSV (API features)
├── logs_parser.py # Parses security_report.log using Gemini AI (dynamic log features)
├── training_small.py # for smaller API-based features
├── train_big.py # for large log-based features
├── model.py # Predicts label for a single src.txt using trained model
├── automate.py # GUI automation: select APK, benchmark, predict
├── android-malware-analysis-4.ipynb # Jupyter notebook for other training/evaluation models
├── requirements.txt # Python package requirements
│
├── src_features.csv # Example output: extracted API features
├── securitylog_features.xlsx # Example output: extracted log-based features
├── benchmark_ml_features.csv # Unified ML features CSV from src_parse.py
│
├── README.md # This documentation
│
└── (other files/folders as needed, e.g., logs, models, data)
This proposal builds on the benchmark outputs (e.g., logs from DroidXP) to extract features, train models, and predict malware.
Sequential steps:
Generate logs via benchmarking (do this before feature extraction), extract features next (requires benchmark outputs), train models after that (uses extracted features), and finally predict or automate.
Customize paths (e.g., BASE_DIR, model paths, API keys) in all scripts before running.
- Use Python 3.6+ (Specifically 3.6.6 used in the notebook).
- Create a virtual environment:
python -m venv autodroid_env source autodroid_env/bin/activate # or autodroid_env\Scripts\activate on Windows
- Install libraries:
pip install pandas scikit-learn imbalanced-learn xgboost lightgbm joblib google-generativeai textual tkinter numpy ast
- For AI parsing in
logs_parser.py, replace"GEMINI_API_KEY"using Gemini API key. - Jupyter/Kaggle for the notebook:
- Gather benign/malicious APK's security_logs or src.txt files (assuming they are either replicated or referenced from here). [Recommended - Kaggle for training]
-To produce structured logs (src.txt and security_report.log) in folders like:
RESULTS_DIR/execution/timeout/repetition/tool/app/security_report/
-
To extract API features from
src.txt:
Use suitable inp/output dirs (BASE_DIR,RESULTS_DIR,EXECUTIONS) insrc_parse.pyand run:python3 src_parse.py
Outputs
benchmark_ml_features.csv. -
For detailed log features from
security_report.log:
Edit paths,OUTPUT_PATH, and API key inlogs_parser.pyand run:python3 logs_parser.py
Outputs a CSV (convert to Excel if needed, like
securitylog_features.xlsx).
-
For small API-based model (XGBoost):
Edit dataset path intraining_small.py(e.g., tobenchmark_ml_features.csvorsrc_features.csv) and run:python3 training_small.py
Outputs (e.g.,
detection_MODEL.pkl(model) andscaler_MODEL.pkl.) -
For larger features:
Edit path intrain_big.py(e.g., tosecuritylog_features.xlsx) and run:python3 train_big.py
-
For reference visit
android-malware-analysis-4.ipynb(includes version checks and use cases).
-
Basic prediction:
Edit paths inmodel.py(to trained model/scaler and asrc.txtfile) and run:python model.py
Outputs benign/malicious label and probability.
-
Full automation:
Edit paths inautomate.py(APK_INPUT_DIR,RESULTS_DIR,DROIDXP_PATH, model/scaler) and run:python automate.py
Use the TUI to select APK, run benchmark (via subprocess to DroidXP), parse, and predict.
- Test with samples like
src_features.csvorsecuritylog_features.xlsx. - Check metrics (e.g., F1-score ~0.74-0.82) in console outputs.
- Common issues: Path mismatches or API errors—verify logs.
Here's a breakdown of each file's purpose, key functions, and how they work (based on the project files):
-
src_parse.py:
Parsessrc.txtfiles (API category counts) from benchmark dirs.
Functionparse_src_txtreads tab-separated data, sums counts (e.g., "VIDEO": sum of values).
It traverses dirs to collect benign/malicious pairs, outputs CSV with "App", "Label" (0=benign, 1=malicious), and features.
Dependencies: Pandas, os. Run after benchmarking for small datasets. -
logs_parser.py:
Parsessecurity_report.logusing Gemini AI.
Functionparse_security_log_with_geminisends log to AI with a prompt, parses response as dict (e.g.,{"Callback_onCreate": 4}) viaast.literal_eval.
Traverses dirs like above, outputs CSV with dynamic features.
Dependencies: Pandas, os, google-generativeai, ast. Requires API key; handles errors in AI responses. -
training_small.py:
Trains XGBoost on small API datasets.
Loads CSV, splits train/test (stratified 75/25), scales with StandardScaler, applies SMOTE for imbalance, grid searches params (e.g., max_depth, learning_rate).
Evaluates with classification report, confusion matrix, tunes F1 threshold. Saves model/scaler via joblib.
Dependencies: Pandas, Scikit-learn, imblearn, XGBoost, joblib. -
train_big.py:
Similar to above but for large log datasets with LightGBM (more efficient).
Loads Excel/CSV, same preprocessing/grid search/evaluation.
No default save—uncomment joblib lines.
Dependencies: Same as above plus LightGBM. -
src_features.csv:
Sample output from src_parse.py. Columns: "App", "Label", API features (e.g., "NETWORK" sums). Use for quick training tests. -
securitylog_features.xlsx:
Sample from logs_parser.py. Columns: "App", "Label", detailed features (e.g., callback counts). For large-scale training. -
model.py:
Predicts on single src.txt. Functionparse_inconsistent_src_txtsums features flexibly.
Loads model/scaler, scales input, predicts label/probability (e.g., >0.5 = malicious).
Dependencies: Pandas, joblib, numpy. Interactive via input prompt. -
automate.py:
TUI for end-to-end workflow. Uses Textual for interface (buttons for APK select, benchmark, predict).
Runs subprocess for benchmarking (customize MAIN_COMMAND to DroidXP), parses like src_parse.py, loads model for prediction.
Logs in console via threading.
Dependencies: os, glob, shutil, subprocess, threading, tkinter, datetime, pandas, numpy, joblib, xgboost, textual. -
android-malware-analysis-4.ipynb:
Notebook version of training. Cells: Load data, train XGBoost/LightGBM with grid search, evaluate/save.
For testing on Kaggle (e.g., with GPU). Dependencies: Same as scripts plus Jupyter.
Questions or contributions?
Open an issue or pull request!