Pcleavage: Prediction of Proteasome Cleavage Sites in Antigenic Sequences

Overview

Pcleavage is a Support Vector Machine (SVM)-based method for predicting two types of cleavage sites in antigenic protein sequences:

  • Constitutive proteasome cleavage sites
  • Immunoproteasome cleavage sites

Proteasomal cleavage during intracellular protein degradation determines which peptides become available for MHC class I presentation, so the predicted cleavage positions are directly relevant to T-cell epitope generation.

The study evaluated three families of learners, with the SVM models forming the basis of the predictions:

  • Support Vector Machine (SVM)
  • PEBLS (Parallel Exemplar-Based Learning System)
  • Weka machine learning algorithms

The web server was developed to provide user-friendly prediction of proteasomal cleavage sites for immunoinformatics and vaccine design applications.


Research Paper

Title

Pcleavage: an SVM based method for prediction of constitutive proteasome and immunoproteasome cleavage sites in antigenic sequences

Authors

  • Manoj Bhasin
  • G. P. S. Raghava

Journal

Nucleic Acids Research

Volume

33

Issue

Web Server Issue

Pages

W202–W207

Published Date

2005

DOI

https://doi.org/10.1093/nar/gki587


Background

Proteasomes are cellular protein complexes responsible for intracellular protein degradation.

They play major roles in:

  • Protein turnover
  • Antigen processing
  • Generation of MHC class I ligands
  • T-cell epitope generation

There are two major forms of proteasomes:

Constitutive Proteasome

Present in normal cells and involved in general protein degradation.

Immunoproteasome

Induced by interferon-gamma (IFN-γ) and involved in generating peptides for MHC class I presentation.

Prediction of proteasome cleavage sites is important for:

  • Vaccine design
  • Immunoinformatics
  • T-cell epitope prediction
  • Antigen processing analysis

Objectives

The study aimed to:

  • Predict proteasome cleavage sites in protein sequences
  • Develop classifiers for constitutive proteasomes
  • Develop classifiers for immunoproteasomes
  • Improve antigen processing prediction
  • Create an accessible web server for researchers

Dataset Information

In Vitro Digested Dataset

Proteasome cleavage data were collected from:

  • Yeast enolase I
  • β-casein digestion studies

In each digest, the residue at the P1 position (immediately N-terminal to the cleaved bond) was labeled as the cleavage site.

MHC Ligand Dataset

The MHC ligand dataset was collected from:

  • MHCBN database

Dataset Statistics

  • 1288 HLA-A and HLA-B restricted ligands
  • Final processed dataset:
    • 506 ligands
    • Derived from more than 250 proteins

Natural MHC ligands were assumed to contain major cleavage sites at their C-termini.
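
The C-terminus assumption above can be turned into labeled sequence windows roughly as follows. This is a minimal sketch, not the authors' pipeline: the function name `cterm_window`, the padding character `X`, the choice to center the window on the C-terminal residue, and the toy sequences are all assumptions made for illustration. The 19-residue default is the window size this README reports for MHC ligand data.

```python
from typing import Optional


def cterm_window(protein: str, ligand: str, window: int = 19,
                 pad: str = "X") -> Optional[str]:
    """Return the window centered on the ligand's C-terminal residue.

    Returns None when the ligand does not occur in the source protein.
    """
    start = protein.find(ligand)
    if start == -1:
        return None
    center = start + len(ligand) - 1  # index of the ligand's last residue
    half = window // 2
    # Pad both ends so windows near the protein termini keep full length.
    padded = pad * half + protein + pad * half
    # Original index i sits at i + half after padding, so the window
    # covering center-half .. center+half starts at padded index `center`.
    return padded[center:center + window]


example = cterm_window("ACDEFGHIKLMNPQRSTVWY", "FGHIK", window=7)
```

Here `example` holds the 7-residue window whose middle residue is the `K` that ends the toy ligand `FGHIK`.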


Independent Test Dataset

Independent datasets were obtained from:

  • Saxova et al.

The dataset included:

In Vitro Digestion Data

  • SSX-2 protein
  • HIV-1 Nef protein
  • RU1 protein

MHC Ligand Dataset

  • 231 unique ligands
  • Derived from 135 proteins

Machine Learning Approaches

Support Vector Machine (SVM)

SVM classifiers were implemented using:

  • SVM_light
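
SVM_light reads each training example as one text line of the form `<target> <feature>:<value> ...`, with feature indices starting at 1 and listed in increasing order. The sketch below writes such a line from a sparse feature dictionary. The helper name `to_svmlight_line` and the example values are hypothetical, and this README does not describe how the authors prepared their input files.

```python
from typing import Dict


def to_svmlight_line(label: int, features: Dict[int, float]) -> str:
    """Format one example as an SVM_light input line.

    `label` is +1 (cleavage site) or -1 (non-cleavage site);
    `features` maps 1-based feature indices to values.
    """
    if label not in (1, -1):
        raise ValueError("label must be +1 or -1")
    if any(idx < 1 for idx in features):
        raise ValueError("feature indices must start at 1")
    # Zero-valued features are omitted; indices must be ascending.
    parts = [f"{idx}:{val:g}" for idx, val in sorted(features.items()) if val != 0]
    return " ".join([f"{label:+d}"] + parts)


line = to_svmlight_line(1, {25: 1.0, 3: 1.0, 40: 0.0})
```

Because the binary encoding is sparse, leaving out zero-valued features keeps the files small.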

Input Encoding

Each amino acid was encoded using:

  • 21-dimensional binary representation

Window sizes:

  • 7 amino acids for in vitro digestion data
  • 19 amino acids for MHC ligand data
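
With 21 symbols per residue, a 7-residue window becomes a 147-dimensional binary vector and a 19-residue window a 399-dimensional one. The sketch below shows one way to build such a vector. The symbol order in `ALPHABET` and the use of `X` as the 21st symbol (for padding or unknown residues) are assumptions; this README only states that the encoding is 21-dimensional.

```python
from typing import List

# 20 standard amino acids plus one extra symbol (assumed here to be "X").
ALPHABET = "ACDEFGHIKLMNPQRSTVWYX"


def encode_window(window: str) -> List[int]:
    """One-hot encode a sequence window, 21 bits per residue."""
    vector: List[int] = []
    for residue in window.upper():
        index = ALPHABET.find(residue)
        if index == -1:
            index = ALPHABET.index("X")  # map unexpected symbols to X
        one_hot = [0] * len(ALPHABET)
        one_hot[index] = 1
        vector.extend(one_hot)
    return vector


in_vitro_vector = encode_window("GHIKLMN")           # 7-residue window
ligand_vector = encode_window("XXACDEFGHIKLMNPQRST")  # 19-residue window
```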

PEBLS

PEBLS (Parallel Exemplar-Based Learning System) was used as a nearest-neighbor learner that operates directly on symbolic features such as amino acid letters.
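
To illustrate how a nearest-neighbor learner can compare symbolic features, the sketch below computes a value-difference distance between two windows: two residues at a position are close when they co-occur with the classes at similar rates in the training data. This follows the general idea of the modified value difference metric associated with PEBLS, but it is a simplified illustration with hypothetical function names and toy data, not the PEBLS implementation, and it omits exemplar weighting.

```python
from collections import Counter, defaultdict


def class_counts(examples, labels):
    """Count, per window position, how often each symbol appears with each class."""
    counts = [defaultdict(Counter) for _ in range(len(examples[0]))]
    for window, label in zip(examples, labels):
        for position, symbol in enumerate(window):
            counts[position][symbol][label] += 1
    return counts, sorted(set(labels))


def value_difference(a, b, counts, classes, k=1):
    """Sum over positions and classes of |P(class|a_i) - P(class|b_i)|**k."""
    total = 0.0
    for position, (x, y) in enumerate(zip(a, b)):
        cx, cy = counts[position][x], counts[position][y]
        nx, ny = sum(cx.values()), sum(cy.values())
        for c in classes:
            px = cx[c] / nx if nx else 0.0
            py = cy[c] / ny if ny else 0.0
            total += abs(px - py) ** k
    return total


train = ["AL", "AV", "GL", "GV"]
train_labels = [1, 1, 0, 0]
counts, classes = class_counts(train, train_labels)
d_far = value_difference("AL", "GL", counts, classes)   # A and G predict different classes
d_near = value_difference("AL", "AV", counts, classes)  # L and V are uninformative here
```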


Weka Algorithms

The following Weka algorithms were evaluated:

  • Logistic Regression
  • Naive Bayes
  • J48.PART

Cost-sensitive classification was applied because of imbalanced datasets.
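
This README does not give the cost settings that were used. As a hedged illustration of the idea, the sketch below derives per-class weights that are inversely proportional to class frequency, a common heuristic for making errors on the rare class more expensive. The function name `balanced_class_weights` and the 10:90 toy class ratio are assumptions.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Toy data: 10 cleavage sites (+1) against 90 non-cleavage sites (-1).
toy_labels = [1] * 10 + [-1] * 90
weights = balanced_class_weights(toy_labels)
# Relative cost of misclassifying a positive versus a negative example.
cost_ratio = weights[1] / weights[-1]
```

A ratio of this kind could be supplied to a cost-sensitive learner as the relative penalty for missing a cleavage site.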


Cross Validation Strategy

The models were evaluated using:

  • Five-fold cross-validation
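
In five-fold cross-validation the data are split into five parts, and each part is used once for testing while the other four are used for training. A minimal sketch of generating such splits is shown below. The shuffling, the seed, and the function name `k_fold_indices` are assumptions, since this README does not describe how the folds were built.

```python
import random


def k_fold_indices(n_examples, k=5, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_examples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, test


splits = list(k_fold_indices(23, k=5, seed=1))
```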

Performance metrics included:

  • Sensitivity
  • Specificity
  • Accuracy
  • Matthews correlation coefficient (MCC)
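
These four metrics can all be computed from the counts of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). The sketch below uses the standard definitions; the function name `binary_metrics` and the example counts are made up for illustration and are not taken from the study.

```python
import math


def binary_metrics(tp, tn, fp, fn):
    """Return sensitivity, specificity, accuracy and MCC.

    Assumes at least one positive and one negative example.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when the denominator vanishes.
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return {"sensitivity": sensitivity, "specificity": specificity,
            "accuracy": accuracy, "mcc": mcc}


metrics = binary_metrics(tp=80, tn=60, fp=40, fn=20)
```

Unlike accuracy, MCC stays informative when one class is much larger than the other.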

Performance on In Vitro Data

SVM Performance

| Kernel     | Sensitivity | Specificity | Accuracy | MCC  |
|------------|-------------|-------------|----------|------|
| RBF        | 86.4%       | 50.7%       | 68.6%    | 0.42 |
| Polynomial | 84.6%       | 55.6%       | 70.0%    | 0.43 |

Performance on MHC Ligand Data

SVM Performance

| Kernel     | Sensitivity | Specificity | Accuracy | MCC  |
|------------|-------------|-------------|----------|------|
| RBF        | 84.3%       | 69.0%       | 76.7%    | 0.54 |
| Polynomial | 86.2%       | 65.4%       | 75.8%    | 0.53 |

The SVM classifier outperformed:

  • PEBLS
  • Naive Bayes
  • J48.PART
  • Logistic Regression

Independent Dataset Performance

In Vitro Dataset

| Metric      | Value |
|-------------|-------|
| Sensitivity | 86.9% |
| Specificity | 60.9% |
| Accuracy    | 68.0% |
| MCC         | 0.43  |

MHC Ligand Dataset

| Metric      | Value |
|-------------|-------|
| Sensitivity | 82.3% |
| Specificity | 45.0% |
| Accuracy    | 63.9% |
| MCC         | 0.29  |

ROC Analysis

Threshold-independent ROC analysis demonstrated:

In Vitro Dataset

| Method    | AUC   |
|-----------|-------|
| Pcleavage | 0.790 |
| NetChop   | 0.805 |

MHC Ligand Dataset

| Method    | AUC   |
|-----------|-------|
| Pcleavage | 0.615 |
| NetChop   | 0.609 |

The performance of Pcleavage was comparable to NetChop.
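
The AUC values above summarize performance across all thresholds: the AUC equals the probability that a randomly chosen cleavage site receives a higher score than a randomly chosen non-cleavage site. The sketch below computes it by direct pairwise comparison, counting ties as one half. The function name `roc_auc` and the toy scores are illustrative assumptions, not data from the study.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y != 1]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(positives) * len(negatives))


auc = roc_auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0])
```

The pairwise loop is quadratic in the number of examples, which is fine for a small illustration; a rank-based formulation scales better.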


Web Server Features

The Pcleavage server allows users to:

  • Submit antigenic protein sequences
  • Predict constitutive proteasome cleavage sites
  • Predict immunoproteasome cleavage sites
  • Select prediction thresholds
  • Upload sequence files
  • Visualize cleavage positions

Supported formats include:

  • FASTA
  • EMBL
  • GCG
  • Plain text
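
For readers preparing input locally, the sketch below parses FASTA-formatted text into (header, sequence) pairs before submission. It is a generic minimal parser with a hypothetical name, `parse_fasta`, and is unrelated to the server's own input handling.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


fasta_text = """>toy_protein_1
acdefghik
LMNPQ
>toy_protein_2
RSTVWY
"""
records = parse_fasta(fasta_text)
```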

Output Features

The server provides:

  • Cleavage site positions
  • Prediction scores
  • Cleavage/non-cleavage state
  • Graphical sequence mapping

Predicted cleavage residues are displayed in larger, red letters.
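
The relationship between scores, the selected threshold, and the reported cleavage state can be sketched as follows. The rule `score >= threshold`, the function names, and the toy scores are assumptions. Uppercase letters stand in for the server's larger red letters.

```python
def call_cleavage_sites(sequence, scores, threshold=0.0):
    """Return (position, residue, score, is_cleavage_site) for each residue."""
    if len(sequence) != len(scores):
        raise ValueError("need exactly one score per residue")
    return [
        (position, residue, score, score >= threshold)
        for position, (residue, score) in enumerate(zip(sequence, scores), start=1)
    ]


def mark_sites(rows):
    """Render predicted cleavage residues in uppercase, all others in lowercase."""
    return "".join(
        residue.upper() if cleaved else residue.lower()
        for _, residue, _, cleaved in rows
    )


rows = call_cleavage_sites("ACDK", [-0.5, 0.2, 0.0, 1.3], threshold=0.1)
marked = mark_sites(rows)
positions = [position for position, _, _, cleaved in rows if cleaved]
```

Raising the threshold reports fewer sites, trading sensitivity for specificity.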

Applications

Pcleavage can be used for:

  • T-cell epitope prediction
  • Vaccine design
  • Immunoinformatics
  • Antigen processing analysis
  • Proteasome cleavage analysis
  • Computational immunology

Technologies Used

  • Support Vector Machine (SVM)
  • SVM_light
  • PEBLS
  • Weka
  • Machine Learning
  • Binary Sequence Encoding
  • ROC Analysis

Important Findings

The study demonstrated that:

  • SVM-based classifiers outperformed PEBLS and the Weka classifiers (Logistic Regression, Naive Bayes, J48.PART) on these datasets
  • Proteasome cleavage prediction is feasible using sequence patterns
  • Pcleavage performs comparably to NetChop
  • MHC ligand data improve immunoproteasome prediction

Conclusion

Pcleavage provides an efficient computational framework for predicting constitutive proteasome and immunoproteasome cleavage sites in antigenic proteins.

In five-fold cross-validation, the SVM-based models reached an MCC of up to 0.43 on in vitro digestion data and up to 0.54 on MHC ligand data. They provide a useful resource for:

  • Vaccine design
  • Antigen processing analysis
  • T-cell epitope identification
  • Immunoinformatics research

Web Server

http://www.imtech.res.in/raghava/pcleavage/

Mirror Server:

http://bioinformatics.uams.edu/mirror/pcleavage/


Contact

Dr. G. P. S. Raghava

Email: raghava@iiitd.ac.in

Address:
Indraprastha Institute of Information Technology Delhi


License

This project is intended for academic and research purposes only.
