SpectraClassifiers: Transformer-Based Spectra-to-Class Prediction for Improved Molecular Retrieval

Frederick Zhang, Yan Zhou Chen, Soha Hassoun

Department of Computer Science, Tufts University

The SpectraClassifiers repository contains the working code used in the <...> paper for molecular classification from LC/MS2 spectra.

Install & setup

Clone the repository:

git clone git@github.com:HassounLab/SpectraClassifiers.git

Create and activate the conda environment:

conda env create -f env.yml -p /path/to/env
conda activate /path/to/env

# install msgym with python3.9 compatibility
pip install "git+ssh://git@github.com/HassounLab/SpectraClassifiers@msgym_setup#egg=massspecgym" --extra-index-url https://download.pytorch.org/whl/cu117 -f https://data.pyg.org/whl/torch-1.13.0+cu117.html

Add the project to your Python path:

export PYTHONPATH=/path/to/classification/repo:$PYTHONPATH

Data prep

Run download_data.py to obtain the data from the Zenodo link, along with the other data resources needed to run the project:

python download_data.py

Once the download has finished, delete the downloaded zip file.
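
The cleanup step can also be scripted. The following is a minimal sketch, not part of the repository: the helper name remove_zip_archives and the data_dir argument are hypothetical, and it assumes the archive sits directly inside a folder you pass in, since the README does not say where download_data.py saves it. It defaults to a dry run so that nothing is deleted until you have checked the list.

```python
from pathlib import Path


def remove_zip_archives(data_dir, dry_run=True):
    """List the .zip files directly under data_dir and delete them unless dry_run.

    data_dir is a placeholder: point it at wherever the archive was saved.
    """
    archives = sorted(Path(data_dir).glob("*.zip"))
    if not dry_run:
        for archive in archives:
            archive.unlink()  # removes only the archive, not extracted data
    return archives
```

Calling remove_zip_archives("/path/to/download/folder") first shows what would be removed; passing dry_run=False performs the deletion.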

Running Models on Test

To replicate this work, we provide checkpoints for the main models we evaluate against, as well as configs to run each model on the test set. These configs are stored in config_test.

We recommend using run_configs.py to run them all in a batch:

python run_configs.py --test --config_path config_test --dst_dir model_preds

This creates a model_preds subdirectory in the results folder, and each model's results are stored in that subdirectory.

Alternatively, you can run each test config one at a time:

python test.py --param_pth config_test/config_file.yaml
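
If you prefer the one-at-a-time route but do not want to type each command, a small wrapper can loop over the configs. This is a hedged sketch rather than a replacement for run_configs.py: the function names build_test_commands and run_all are hypothetical, and it assumes every config in config_test ends in .yaml. The only command-line flag used is --param_pth, as shown above.

```python
import subprocess
import sys
from pathlib import Path


def build_test_commands(config_dir="config_test", script="test.py"):
    """Build one `python test.py --param_pth <config>` command per YAML file."""
    configs = sorted(Path(config_dir).glob("*.yaml"))  # assumes .yaml configs
    return [[sys.executable, script, "--param_pth", str(cfg)] for cfg in configs]


def run_all(commands):
    """Run each command in order, stopping at the first one that fails."""
    for cmd in commands:
        print("Running:", " ".join(cmd))
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
```

From the repository root, run_all(build_test_commands()) would then run every test config in turn.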

Experiments

Experiments are run from the paper_notebooks folder, which contains Jupyter notebooks for the various experiments conducted on our model predictions. Start the notebook server with:

python -m jupyter notebook

Training model

We additionally provide the configs used to train each model in config_train.

You can train multiple models in a batch using run_configs.py:

python run_configs.py --config_path config_train --dst_dir name_of_folder

Alternatively, you can train each model one at a time:

python train.py --param_pth config_train/config_file.yaml

The checkpoint for each model is stored in the results folder under the corresponding model run name. To run a trained model on test, run its test config individually after changing the checkpoint in that config to the one stored inside the run's folder.
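
To find which checkpoint path to put in each test config, it may help to list what training produced. The sketch below is illustrative only: the helper latest_checkpoints is hypothetical, and it assumes one subfolder per run under results with checkpoint files ending in .ckpt. Adjust the pattern if your checkpoints use a different extension. It does not edit any config file, because the name of the checkpoint field in the configs is not described here.

```python
from pathlib import Path


def latest_checkpoints(results_dir="results", pattern="*.ckpt"):
    """Map each run folder under results_dir to its most recently modified checkpoint.

    The "*.ckpt" pattern is an assumption about how checkpoints are named.
    """
    found = {}
    for run_dir in sorted(Path(results_dir).iterdir()):
        if not run_dir.is_dir():
            continue
        ckpts = list(run_dir.rglob(pattern))  # search nested folders too
        if ckpts:
            found[run_dir.name] = max(ckpts, key=lambda p: p.stat().st_mtime)
    return found
```

Printing the returned dictionary gives one path per run name, which you can then copy into the matching test config.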

Acknowledgements

The code for this work was built from the MVP codebase.

References

Paper link here

Contact

Soha.Hassoun@tufts.edu
