HassounLab/SpectraClassifiers

SpectraClassifiers: Transformer-Based Spectra-to-Class Prediction for Improved Molecular Retrieval

Frederick Zhang, Yan Zhou Chen, Soha Hassoun

Department of Computer Science, Tufts University

The SpectraClassifiers repository contains the working code used in the <...> paper for molecular classification from LC/MS2 spectra.

Install & setup

  1. Clone the repository:
git clone git@github.com:HassounLab/SpectraClassifiers.git
  2. Install the environment:
conda env create -f env.yml -p /path/to/env
conda activate /path/to/env

# install msgym with python3.9 compatibility
pip install "git+ssh://git@github.com/HassounLab/SpectraClassifiers@msgym_setup#egg=massspecgym" --extra-index-url https://download.pytorch.org/whl/cu117 -f https://data.pyg.org/whl/torch-1.13.0+cu117.html
  3. Add the project to your Python path:
export PYTHONPATH=/path/to/classification/repo:$PYTHONPATH

Data prep

Run download_data.py to download the data from Zenodo, along with the other data resources needed to run the project:

python download_data.py

Once the download has finished, delete the downloaded zip file.

Running Models on Test

To replicate this work, we provide checkpoints for the main models we evaluate against, as well as configs to run each model on the test set. These configs are stored in config_test.

We recommend using run_configs.py to run them all in a batch:

python run_configs.py --test --config_path config_test --dst_dir model_preds

This creates a model_preds subdirectory in the results folder, and each model's results are stored in that subdirectory.

Alternatively, you can run each test config one at a time:

python test.py --param_pth config_test/config_file.yaml
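
If you prefer to run the configs individually but do not want to type each command, the one-at-a-time call above can be wrapped in a small loop. The following is a minimal sketch, not part of the repository: the helper name build_test_commands is hypothetical, and it assumes the configs are .yaml files sitting directly inside config_test and that it is run from the repository root.

```python
import subprocess
import sys
from pathlib import Path


def build_test_commands(config_dir="config_test", script="test.py"):
    """Return one command (as an argument list) per YAML config in config_dir.

    Hypothetical helper: assumes configs are *.yaml files directly in config_dir.
    """
    configs = sorted(Path(config_dir).glob("*.yaml"))
    return [[sys.executable, script, "--param_pth", str(cfg)] for cfg in configs]


def run_all(config_dir="config_test"):
    """Run test.py once per config, stopping at the first failure."""
    for cmd in build_test_commands(config_dir):
        print("Running:", " ".join(cmd))
        # check=True raises CalledProcessError if a run exits with a non-zero status.
        subprocess.run(cmd, check=True)
```

Calling run_all() from the repository root then runs the configs in alphabetical order. Unlike run_configs.py, this sketch does not take a --dst_dir, so the results location is whatever test.py and the config decide.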

Experiments

Experiments are run from the paper_notebooks folder, which contains Jupyter notebooks for the various experiments conducted on our model predictions. Start a notebook server with:

python -m jupyter notebook

Training model

We additionally provide the configs used to train each model in config_train.

You can train multiple models in a batch using run_configs.py:

python run_configs.py --config_path config_train --dst_dir name_of_folder

Alternatively, you can train each model one at a time as follows:

python train.py --param_pth config_train/config_file.yaml

The checkpoint for each model is stored in the results folder under the corresponding model run name. To run a newly trained model on the test set, edit its test config so that the checkpoint points to the one stored inside that run, then run each test config individually.
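
Locating the checkpoint for each run by hand gets tedious with many models. The sketch below shows one way to find it. It is not part of the repository: the helper name latest_checkpoint is hypothetical, and the .ckpt extension and the results/<run_name> layout in the usage comment are assumptions you should adjust to match what your training runs actually write.

```python
from pathlib import Path


def latest_checkpoint(run_dir, pattern="*.ckpt"):
    """Return the most recently modified checkpoint under run_dir, or None.

    Hypothetical helper: the '*.ckpt' pattern is an assumption about how
    checkpoints are named; pass a different pattern if yours differ.
    """
    candidates = list(Path(run_dir).rglob(pattern))
    if not candidates:
        return None
    # Pick the file with the newest modification time.
    return max(candidates, key=lambda p: p.stat().st_mtime)


# Example usage (the run name is a placeholder):
# print(latest_checkpoint("results/name_of_run"))
```

The returned path can then be copied into the checkpoint entry of the matching test config. Note that the most recently written checkpoint is not necessarily the best-scoring one, so check how your run saves checkpoints before relying on this.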

Acknowledgements

The code for this work was built on the MVP codebase.

References

Paper link here

Contact

Soha.Hassoun@tufts.edu
