VulGNN

This repository contains the code for VulGNN's model and data processing pipeline.

Getting Started

Environment

The primary dependencies of this project are:

- PyTorch Geometric (version 2.3.1)
- PyTorch
- NumPy
- scikit-learn
- Joern (we used version 2.0.107; many versions may work, but versions significantly older than this may fail to parse some samples)
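One quick way to confirm these dependencies before running anything is sketched below. It is a minimal sketch that assumes the distribution names `torch`, `torch_geometric`, `numpy`, and `scikit-learn`, and it only checks that a `joern` executable is on the `PATH` (it does not verify Joern's version). The helper name `check_environment` is hypothetical and not part of this repository.

```python
import shutil
from importlib.metadata import PackageNotFoundError, version

# Assumed distribution names; only PyTorch Geometric has a pinned version in this README.
REQUIRED = {
    "torch": None,
    "torch_geometric": "2.3.1",
    "numpy": None,
    "scikit-learn": None,
}


def check_environment(required=REQUIRED):
    """Hypothetical helper: map each dependency to (installed version or path, ok flag)."""
    report = {}
    for name, pinned in required.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        ok = installed is not None and (pinned is None or installed == pinned)
        report[name] = (installed, ok)
    # Joern is a standalone CLI rather than a Python package, so only check that it is on PATH.
    joern_path = shutil.which("joern")
    report["joern"] = (joern_path, joern_path is not None)
    return report


if __name__ == "__main__":
    for name, (found, ok) in check_environment().items():
        print(f"{name:16} {str(found):40} {'OK' if ok else 'CHECK'}")
```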

It is built for a CUDA-enabled environment, but it can easily be adapted to run on CPU by removing the calls to "cuda()" or "to()" on tensors.
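Instead of deleting the `cuda()` and `to()` calls, a common PyTorch pattern is to choose the device once and pass it to every `.to()` call, so the same code runs on either CUDA or CPU. The sketch below assumes you are willing to refactor the scripts this way; `pick_device` is a hypothetical helper, not a function in this repository.

```python
import torch


def pick_device(prefer_cuda: bool = True) -> torch.device:
    """Hypothetical helper: use the GPU when one is available, otherwise fall back to CPU."""
    if prefer_cuda and torch.cuda.is_available():
        return torch.device("cuda")
    return torch.device("cpu")


device = pick_device()
# Replace calls such as `tensor.cuda()` or `model.cuda()` with `.to(device)`.
model = torch.nn.Linear(3, 2).to(device)
x = torch.randn(4, 3, device=device)
print(model(x).shape, device)
```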

Execution

1. Extract the normalized Juliet source code from the archive in the Data directory.
2. Follow the data processing steps in the DataPreparation readme, starting at generate_cpgs.py and skipping cpg_normalizer.py (this applies to our normalized Juliet dataset; for other datasets, run all necessary scripts). Make sure to change any relative paths in the code to point to the locations you want.
3. Run main.py to start training.
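The first step only requires unpacking the archive in the Data directory. Its file name is not given here, so any path you pass is yours to fill in; `shutil.unpack_archive` infers the format (zip, tar, tar.gz, and so on) from the file extension. This is a minimal sketch, and `extract_juliet` is a hypothetical helper.

```python
import shutil
from pathlib import Path


def extract_juliet(archive: str, dest: str) -> list:
    """Hypothetical helper: unpack the Juliet archive and list the extracted files."""
    dest_path = Path(dest)
    dest_path.mkdir(parents=True, exist_ok=True)
    # The archive format is inferred from the file extension.
    shutil.unpack_archive(str(archive), extract_dir=str(dest_path))
    return sorted(p.relative_to(dest_path).as_posix() for p in dest_path.rglob("*") if p.is_file())


# Usage (placeholder paths): extract_juliet("Data/<archive name>", "juliet_src")
```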

Repository Structure

Top level
- main.py - Main training execution script
- open_data.py - Functions used in main to load the dataset
- network.py - Contains the GNN models
- DataPreparation
  - Contains scripts used to prepare data for the model. See readme in directory for more information.
- sent2vec
  - A dockerized version of sent2vec. Can be used to (manually) generate CPGs using sent2vec tokenization. This processing is not automated in any way - the container is purely an environment to run sent2vec.
- Data
  - Contains an archive of our normalized subset of Juliet. It is the same as the one used in VulCNN.

Model Architecture

- Five attentional message-passing layers, each containing:
  - A 160-node message-passing operation with 4 attention heads, using GeneralConv for most tests or RGATConv for heterogeneous data
  - Parametric ReLU with a learnable parameter for each node, as opposed to one for the entire layer
  - Graph normalization
  - Random 3.5% dropout
- Global mean pooling
- Random 3.5% dropout
- A final linear layer with two nodes representing the binary classification

GeneralConv is configured with mean aggregation and dot-product attention, while RGATConv is configured with within-relation attention, F-scaled cardinality preservation, and concatenation disabled. Parameters not mentioned here or above are left at their defaults. Training uses weighted cross-entropy loss (except in the weighted loss experiment), the Adam optimizer with default parameters, 350 epochs, and a batch size of 256.

Hardware Used for Training

We provide the following general hardware information, notes, and timings to help estimate feasibility and run times on other hardware.

Hardware and Notes

- One RTX 8000 (48 GB, PCIe)
  - Note that the heterogeneous network uses nearly all of this GPU's VRAM; it will not run on GPUs with less than ~44-48 GB of VRAM unless the network is modified.
- Dual Xeon E5-2630 v3 (16 cores in total)
  - Multicore performance is important in most data processing steps: the more cores you have, the faster most of these steps complete.
- 540 GB of RAM
  - Relatively little system memory is used for this application.
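To compare a machine against the hardware above before training, a check like the following may help. It is a minimal sketch: the 44 GB threshold comes from the note on the heterogeneous network, PyTorch is optional (without it, only the CPU core count is reported), and `hardware_report` is a hypothetical helper.

```python
import os

HETERO_MIN_VRAM_GB = 44  # approximate lower bound quoted for the heterogeneous network


def hardware_report():
    """Hypothetical helper: report CPU cores and, if PyTorch sees a GPU, its VRAM in GiB."""
    report = {"cpu_cores": os.cpu_count(), "gpu": None, "vram_gb": None, "hetero_ok": None}
    try:
        import torch
    except ImportError:
        return report  # PyTorch not installed: GPU fields stay None
    if torch.cuda.is_available():
        props = torch.cuda.get_device_properties(0)
        report["gpu"] = props.name
        report["vram_gb"] = props.total_memory / 1024**3
        report["hetero_ok"] = report["vram_gb"] >= HETERO_MIN_VRAM_GB
    return report


if __name__ == "__main__":
    print(hardware_report())
```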

Timings

Training the standard, 64-length, homogeneous network for 350 epochs takes about 70-80 minutes on this hardware with no other intensive software running concurrently.
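As a rough worked example, 70-80 minutes for 350 epochs is about 12-14 seconds per epoch on this hardware. The helper below scales that range linearly to another epoch count; linear scaling is an assumption (it ignores startup and data-loading overhead), the `speedup` factor is a guess you supply for your own hardware, and `estimate_minutes` is hypothetical.

```python
REPORTED_EPOCHS = 350
REPORTED_MINUTES = (70, 80)  # range reported for the 64-length homogeneous network on the RTX 8000


def estimate_minutes(epochs, speedup=1.0):
    """Hypothetical helper: linearly scale the reported training time to `epochs`.

    `speedup` is how many times faster your hardware is than the reference machine.
    """
    low, high = REPORTED_MINUTES
    scale = epochs / REPORTED_EPOCHS / speedup
    return (low * scale, high * scale)


print(estimate_minutes(100))
```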

Referencing This Work

If you use something from this repository or reference this work, please use the following citation:

@article{farmer2026software,
  title={Software Vulnerability Detection Using a Lightweight Graph Neural Network},
  author={Farmer, Miles and Ufuktepe, Ekincan and Watson, Anne and Carvalho, Hialo Muniz and Okun, Vadim and Maasaoui, Zineb and Palaniappan, Kannappan},
  journal={arXiv preprint arXiv:2603.29216},
  year={2026}
}
