A multimodal approach to domain-specific drug-kinase assignment using QR code embedding and image-based transfer learning
To get started, you need the `run_qrxvision.py` script, the `requirements.txt` file, and the three CSV data files. Cloning the repository gives you all of them:
```bash
git clone https://github.com/mitkeng/QRxVision.git
cd QRxVision
```

The repository contains:

- 📜 Scripts: `run_qrxvision.py`, `requirements.txt`
- 📊 Datasets: `Train_data.csv`, `Kinase_drug.csv`, `Total_dataset.csv`
```
QRxVision/
├── run_qrxvision.py    # Main execution script
├── requirements.txt    # Package dependencies
├── Train_data.csv      # Dataset 1
├── Kinase_drug.csv     # Dataset 2
└── Total_dataset.csv   # Dataset 3
```
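Before installing anything, it can help to confirm that every file in the layout above is actually present. The snippet below is a minimal sketch, not part of the repository: `REQUIRED_FILES` and `missing_files` are hypothetical names, and the check only looks for the file names listed above in a given directory (the current directory by default).

```python
from pathlib import Path

# File names taken from the repository layout listed above.
REQUIRED_FILES = [
    "run_qrxvision.py",
    "requirements.txt",
    "Train_data.csv",
    "Kinase_drug.csv",
    "Total_dataset.csv",
]


def missing_files(root: str = ".") -> list[str]:
    """Return the expected files that are not present as regular files under `root`."""
    base = Path(root)
    return [name for name in REQUIRED_FILES if not (base / name).is_file()]


if __name__ == "__main__":
    missing = missing_files()
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All expected files are present.")
```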
Open a terminal or command prompt, change to the directory that contains these files, and install the required Python libraries with `uv pip` (or plain `pip` if uv is not installed):
```bash
# Using uv (recommended for speed)
uv pip install -r requirements.txt

# Or using standard pip
pip install -r requirements.txt
```
`run_qrxvision.py` is run from the command line and uses `argparse` to handle two input modes: a single compound or a batch of compounds from a CSV file. The basic command structure is:

```bash
python run_qrxvision.py [arguments]
```
- `--smile <SMILE_STRING>`: (Required for single-compound processing) A single SMILES string to analyze.
- `--name <COMPOUND_NAME>`: (Optional, single compound only) A name for the compound being analyzed.
- `--csv_file <PATH_TO_CSV>`: (Required for batch processing) Path to a CSV file of compounds. The CSV must have a column named `smile` containing SMILES strings and may optionally have a `name` column.
- `--output_file <OUTPUT_PATH.csv>`: (Optional) Path of a CSV file to which the similarity results are saved. If omitted, results are only printed to the console.
- `--top_n <NUMBER>`: (Optional) Number of top similar compounds to display and save. Defaults to 10.
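For readers unfamiliar with `argparse`, the following is a hypothetical sketch of how the flags above could be declared. It is not taken from `run_qrxvision.py`; in particular, treating `--smile` and `--csv_file` as a required, mutually exclusive pair is an assumption based on the "required for single / required for batch" wording, and the real script may validate its inputs differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags documented for run_qrxvision.py."""
    parser = argparse.ArgumentParser(
        description="Find the most similar reference compounds for query molecules."
    )
    # Assumption: exactly one input mode, a single SMILES string or a CSV of compounds.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--smile", help="SMILES string of a single compound")
    source.add_argument("--csv_file", help="CSV with a 'smile' column and optional 'name' column")
    parser.add_argument("--name", help="Name of the single compound")
    parser.add_argument("--output_file", help="Optional path for the results CSV")
    parser.add_argument("--top_n", type=int, default=10,
                        help="Number of top similar compounds (default: 10)")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

With this layout, `argparse` rejects a command that passes both `--smile` and `--csv_file`, or neither, before any processing starts.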
To find the top 7 similar compounds for Ripretinib and save the results:
```bash
python run_qrxvision.py \
  --smile "CCN1C2=CC(=NC=C2C=C(C1=O)C3=CC(=C(C=C3Br)F)NC(=O)NC4=CC=CC=C4)NC" \
  --name "Ripretinib" \
  --output_file single_compound_results.csv \
  --top_n 7
```

The script prints its processing status and the top N similar compounds directly to your terminal.
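Batch mode works the same way, but reads compounds from a CSV file passed with `--csv_file`. The sketch below prepares such a file with the required `smile` column and the optional `name` column, then prints a matching command rather than running it. The file names `batch_compounds.csv` and `batch_results.csv`, the `write_batch_csv` helper, the choice of `--top_n 5`, and the added imatinib entry are all illustrative assumptions, not part of the repository.

```python
import csv
import shlex

# Illustrative compounds; run_qrxvision.py requires a 'smile' column, 'name' is optional.
compounds = [
    {"name": "Ripretinib",
     "smile": "CCN1C2=CC(=NC=C2C=C(C1=O)C3=CC(=C(C=C3Br)F)NC(=O)NC4=CC=CC=C4)NC"},
    {"name": "Imatinib",
     "smile": "CC1=C(C=C(C=C1)NC(=O)C2=CC=C(C=C2)CN3CCN(CC3)C)NC4=NC=CC(=N4)C5=CN=CC=C5"},
]


def write_batch_csv(path: str, rows: list[dict]) -> None:
    """Write a batch input file with 'name' and 'smile' columns."""
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=["name", "smile"])
        writer.writeheader()
        writer.writerows(rows)


if __name__ == "__main__":
    write_batch_csv("batch_compounds.csv", compounds)
    # Print the corresponding batch command instead of executing it.
    command = ["python", "run_qrxvision.py", "--csv_file", "batch_compounds.csv",
               "--output_file", "batch_results.csv", "--top_n", "5"]
    print(shlex.join(command))
```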
If you pass `--output_file`, the script also writes a CSV file with one row per query compound and reference match. Each row gives the query compound's name, SMILES string, and test image filename, followed by the reference compound (its image filename without the `.png` extension) and the similarity score.
Example structure for single_compound_results.csv:
| TestCompound | TestCompoundSMILE | TestImage | ReferenceCompound | SimilarityScore |
|---|---|---|---|---|
| Ripretinib | CCN1C2=CC(...) | test_4.png | Cenerimod | 0.9729 |
| Ripretinib | CCN1C2=CC(...) | test_161.png | Erdafitinib | 0.9711 |
| ... | ... | ... | ... | ... |
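Once a results file exists, it can be post-processed with the standard library alone. The sketch below assumes the output CSV uses exactly the column headers shown in the example table; the `load_results` function is a hypothetical helper, not part of QRxVision. It groups matches by query compound and sorts them from highest to lowest similarity score.

```python
import csv
from collections import defaultdict


def load_results(path: str) -> dict[str, list[tuple[str, float]]]:
    """Group reference matches by query compound, highest similarity first."""
    hits: dict[str, list[tuple[str, float]]] = defaultdict(list)
    with open(path, newline="") as handle:
        for row in csv.DictReader(handle):
            hits[row["TestCompound"]].append(
                (row["ReferenceCompound"], float(row["SimilarityScore"]))
            )
    # Sort each query's matches by score, descending.
    for matches in hits.values():
        matches.sort(key=lambda item: item[1], reverse=True)
    return dict(hits)


if __name__ == "__main__":
    for query, matches in load_results("single_compound_results.csv").items():
        best_name, best_score = matches[0]
        print(f"{query}: best match {best_name} ({best_score:.4f})")
```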