ZJUFanLab/Subtypist

Subtypist

R >= 4.0 Seurat >= 4.0.0

Reference-free identification of cell subtypes for single-cell transcriptomic data

Subtypist is an R toolkit for reference-free discovery and annotation of cell subtypes from single-cell transcriptomic data. It is designed primarily for subtype discovery within a predefined major cell type, such as macrophages, T cells, fibroblasts, or epithelial cells. Subtypist evaluates clustering results across multiple resolutions, merges clusters with insufficient marker specificity, and reports subtype-associated phenotypic molecules.

Installation

Install devtools if needed:

install.packages("devtools")

Install Subtypist from GitHub:

devtools::install_github("ZJUFanLab/Subtypist")

Or install from a local copy of the source (a downloaded archive or an unpacked directory):

devtools::install_local("/path/to/Subtypist-main.zip")

Usage

1. Load or subset a major cell type

Subtypist takes a predefined major cell type as input. For subtype analysis, provide only the cells belonging to the major cell type of interest, subset from the full dataset.

library(Seurat)
library(Subtypist)

# Subset a major cell type from a full Seurat object as input for Subtypist
FullObject <- readRDS("your_seurat_object.rds")
Seu_sub <- subset(FullObject, idents = "T cells")

Before running Subtypist_merge(), the input object should be a processed Seurat object with:

  • normalized expression data;
  • scaled data;
  • PCA or another appropriate reduction;
  • a nearest-neighbor graph generated by FindNeighbors().

The cluster_assay should match the assay used to build the nearest-neighbor graph. The examples below use "RNA" as a broadly applicable default. If your object was integrated and FindNeighbors() was run on the integrated assay, use cluster_assay = "integrated" instead.
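
As a rough illustration of the prerequisites listed above, the following sketch runs a standard Seurat preprocessing sequence on the subset object. The helper name prepare_for_subtypist and the values chosen for nfeatures, npcs, and dims are illustrative assumptions, not part of Subtypist; adapt them to your data.

```r
# Hypothetical helper (not part of Subtypist): re-process a subset Seurat
# object so that it carries normalized data, scaled data, a PCA reduction,
# and a nearest-neighbor graph.
prepare_for_subtypist <- function(object, assay = "RNA",
                                  nfeatures = 2000, npcs = 30, dims = 1:20) {
  Seurat::DefaultAssay(object) <- assay
  object <- Seurat::NormalizeData(object, verbose = FALSE)
  object <- Seurat::FindVariableFeatures(object, nfeatures = nfeatures, verbose = FALSE)
  object <- Seurat::ScaleData(object, verbose = FALSE)
  object <- Seurat::RunPCA(object, npcs = npcs, verbose = FALSE)
  # dims must not exceed npcs; the graph is built on the PCA reduction
  object <- Seurat::FindNeighbors(object, reduction = "pca", dims = dims, verbose = FALSE)
  object
}

# Usage (assumes Seu_sub from the step above):
# Seu_sub <- prepare_for_subtypist(Seu_sub, assay = "RNA")
# names(Seu_sub@graphs)  # with default graph names this should include "RNA_snn"
```

Re-running variable feature selection, scaling, and PCA after subsetting keeps the graph specific to the selected cell type. Whether this is appropriate for an integrated object depends on how the integration was done.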

2. Run Subtypist across clustering resolutions

Choose the termination.mode according to the dataset:

  • For datasets characterized by continuous or transitional cellular states, the elbow mode is recommended.

  • For general datasets without clear transitional-state structure, the default mode is recommended.

  • For datasets focused on rare subtype detection, the default and elbow modes generally exhibit comparable robustness.

result_default <- Subtypist_merge(
  object = Seu_sub,
  min.resolution = 0.3,
  max.resolution = 1.5,
  by = 0.1,
  marker_assay = "RNA",
  cluster_assay = "RNA",
  n_candidate_markers = 300,
  top_k = 3,
  termination.mode = "default",
  prefix = "Subtypist"
)

result_elbow <- Subtypist_merge(
  object = Seu_sub,
  min.resolution = 0.3,
  max.resolution = 1.5,
  by = 0.1,
  marker_assay = "RNA",
  cluster_assay = "RNA",
  n_candidate_markers = 300,
  top_k = 3,
  termination.mode = "elbow",
  prefix = "Subtypist_elbow"
)

Subtypist_merge() returns a list with two main elements:

  • Object: the Seurat object with Subtypist metadata columns for evaluated resolutions.
  • result.table: a table summarizing subtype-supporting phenotypic molecules and specificity scores.

In Object, each evaluated resolution is stored as a separate metadata column. With the default prefix, these columns are named Subtypist_snn_res.<resolution>, for example Subtypist_snn_res.0.4.
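
The naming pattern can be reproduced in base R to look up a column programmatically. This is a minimal sketch: the helper subtypist_column is hypothetical, and it assumes that Subtypist_merge() evaluates an evenly spaced grid from min.resolution to max.resolution, which is not confirmed here.

```r
# Assumption: the evaluated resolutions form an evenly spaced grid like this one.
resolutions <- seq(0.3, 1.5, by = 0.1)
length(resolutions)   # 13 candidate resolutions

# Hypothetical helper: build the metadata column name for a resolution,
# following the documented pattern <prefix>_snn_res.<resolution>.
subtypist_column <- function(resolution, prefix = "Subtypist") {
  paste0(prefix, "_snn_res.", resolution)
}

subtypist_column(0.4)   # "Subtypist_snn_res.0.4"

# Values produced by seq() may not compare exactly equal to a typed literal,
# so match a resolution with a small tolerance instead of ==.
idx <- which(abs(resolutions - 0.9) < 1e-8)
idx   # 7

# Usage (assumes result_default from the call above):
# table(result_default$Object@meta.data[[subtypist_column(0.9)]])
```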

result.table contains the following columns:

Column                 Description
resolution             Clustering resolution.
merged_cluster         Cluster index after Subtypist merging.
initial_cluster        Original cluster(s) contributing to the merged cluster.
phenotypic_molecules   Top subtype-supporting marker genes.
Score                  Cluster-level specificity score based on the selected marker genes.

Example:

result_default$result.table[result_default$result.table$resolution == 0.9, ]
     resolution merged_cluster initial_cluster  phenotypic_molecules             Score
          <dbl>          <dbl> <list>           <list>                           <dbl>
  55        0.9              0 0                HCST, NKG7, GZMA             1.3664346
  56        0.9              1 1, 5             HLA-DRA, MS4A1, CD79A        3.7787186
  57        0.9              2 2                IL7R, LDHB, SARAF            0.9614802
  58        0.9              3 3                IL7R, ANXA1, CCL5            0.6511924
  59        0.9              4 4                NMNAT3, S100A6, FXYD2        0.0000000
  60        0.9              5 6                CMC1, NKG7, GZMK             3.2758660
  61        0.9              6 7                IL32, TNFRSF4, TIGIT         1.4888952
  62        0.9              7 8                JCHAIN, MZB1, XBP1           8.4554747
  63        0.9              8 9, 12            STMN1, TUBA1B, TUBB          2.3954183
  64        0.9              9 10               MS4A1, HLA-DRA, NMNAT3       0.5784272
  65        0.9             10 11               OXNAD1, KLRB1, IL7R          0.9706570

3. Rank resolutions by specificity score

Use sortScore() to summarize cluster-level scores at each resolution.

resolution_rank <- sortScore(result_default$result.table, mean)
resolution_rank <- resolution_rank[order(resolution_rank$value, decreasing = TRUE), ]
resolution_rank

Select the top-ranked resolution:

best_resolution <- resolution_rank$resolution[1]
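
To make the ranking step concrete, here is a base R analogue that averages cluster-level scores per resolution and sorts the result. It is only a sketch of the idea: the internals of sortScore() are not shown in this document, and the toy scores below are made up for illustration.

```r
# Toy table with made-up scores (illustrative only, not real Subtypist output)
toy_table <- data.frame(
  resolution = c(0.8, 0.8, 0.9, 0.9, 0.9, 1.0, 1.0),
  Score      = c(1.0, 2.0, 3.0, 1.0, 2.0, 0.5, 1.5)
)

# Mean cluster-level score per resolution; columns are renamed to mirror
# the resolution/value layout used in the ranking code above
toy_rank <- aggregate(Score ~ resolution, data = toy_table, FUN = mean)
names(toy_rank) <- c("resolution", "value")

# Highest mean score first
toy_rank <- toy_rank[order(toy_rank$value, decreasing = TRUE), ]
toy_rank$resolution[1]   # 0.9 for this toy input (mean score 2.0)
```

Averaging rewards resolutions whose clusters are well supported on the whole; a single high-scoring cluster cannot dominate the ranking the way it could with max.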

4. Add Subtypist annotations to the Seurat object

Choose a resolution based on the specificity score ranking, or manually set a preferred resolution after inspecting the result table.

best_resolution <- resolution_rank$resolution[1]
# best_resolution <- 0.9

Seu_sub <- AddSubtypist(
  object = result_default$Object,
  result.table = result_default$result.table,
  resolution = best_resolution,
  prefix = "Subtypist",
  meta.prefix = "phenotypic_molecules_"
)

To manually select one marker from each cluster's phenotypic_molecules, use select_index with one index per merged_cluster.

selected_rows <- result_default$result.table[
  result_default$result.table$resolution == best_resolution,
]
select_index <- stats::setNames(
  rep(1, nrow(selected_rows)),
  as.character(selected_rows$merged_cluster)
)

Seu_sub <- AddSubtypist(
  object = result_default$Object,
  result.table = result_default$result.table,
  resolution = best_resolution,
  prefix = "Subtypist",
  meta.prefix = "phenotypic_molecules_",
  value.suffix = "+",
  select_index = select_index
)
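
The select_index vector does not have to use the same index for every cluster. The sketch below builds a per-cluster selection on a toy table and previews which marker each index would pick. The marker names come from the example output above; the label format (marker name followed by the suffix) is an assumption about how AddSubtypist() uses value.suffix, not confirmed behavior.

```r
# Toy rows mimicking one resolution of result.table (structure assumed)
toy_rows <- data.frame(merged_cluster = c(0, 1, 2))
toy_rows$phenotypic_molecules <- list(
  c("HCST", "NKG7", "GZMA"),
  c("HLA-DRA", "MS4A1", "CD79A"),
  c("IL7R", "LDHB", "SARAF")
)

# One index per merged cluster, named by cluster:
# 2nd marker for cluster 0, 1st marker for clusters 1 and 2
select_index <- stats::setNames(c(2, 1, 1), as.character(toy_rows$merged_cluster))

# Preview the marker picked for each cluster, with a "+" suffix appended
picked <- mapply(
  function(genes, i) genes[i],
  toy_rows$phenotypic_molecules,
  select_index[as.character(toy_rows$merged_cluster)]
)
labels <- paste0(unname(picked), "+")
labels   # "NKG7+" "HLA-DRA+" "IL7R+"
```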

5. Visualize Subtypist results

p <- Subtypist_Dimplot(
  object = Seu_sub,
  result.table = result_default$result.table,
  resolution = best_resolution,
  show = "molecular_phenotype",
  prefix = "Subtypist",
  meta.prefix = "phenotypic_molecules_"
)
p

6. Save results

saveRDS(result_default, file = "path_to_save.rds")
# saveResults(result_default$result.table, path = "output", name = "subtype_results.csv")
# saveResults(result_default$result.table, path = "output", name = "subtype_results.xlsx")
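
The example output above shows initial_cluster and phenotypic_molecules as list columns, which base R's write.csv() cannot write directly. If you prefer to export the table without saveResults(), one option is to collapse list columns into strings first. The helper flatten_list_columns is hypothetical and is shown only as a sketch; how saveResults() handles these columns is not described here.

```r
# Hypothetical helper: collapse every list column into comma-separated strings
flatten_list_columns <- function(tab) {
  tab <- as.data.frame(tab)
  is_list <- vapply(tab, is.list, logical(1))
  for (nm in names(tab)[is_list]) {
    tab[[nm]] <- vapply(
      tab[[nm]],
      function(x) paste(x, collapse = ", "),
      character(1)
    )
  }
  tab
}

# Toy check with a list column shaped like initial_cluster
toy <- data.frame(merged_cluster = c(0, 1))
toy$initial_cluster <- list(0, c(1, 5))
flat <- flatten_list_columns(toy)
flat$initial_cluster   # "0" "1, 5"

# Usage (assumes result_default from the steps above):
# flat <- flatten_list_columns(result_default$result.table)
# write.csv(flat, "subtype_results.csv", row.names = FALSE)
```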

7. Inter-resolution consensus analysis

Inter-resolution consensus analysis can be used to assess whether the merged clusters are stable across neighboring resolutions. Here, resolution 0.9 is used as an example. Subtypist_consensus() compares clusters at this resolution with clusters from nearby resolutions and adds consensus metrics to the result table. The neighbor_clusters column reports the matched cluster at each neighboring resolution, for example 0.8: 1; 1.0: 2.

consensus_table <- Subtypist_consensus(
  result.list = result_default,
  selected.resolution = 0.9,
  evaluate.all = FALSE,
  window.size = 1,
  prefix = "Subtypist"
)

consensus_table[
  consensus_table$resolution == 0.9,
  c(
    "resolution",
    "merged_cluster",
    "initial_cluster",
    "phenotypic_molecules",
    "Score",
    "neighbor_clusters",
    "consensus_score",
    "preservation_score",
    "cluster_fraction"
  )
]

Example output (selected rows):

resolution  merged_cluster  initial_cluster  phenotypic_molecules   Score  neighbor_clusters  consensus_score  preservation_score  cluster_fraction
0.9         0               0                HCST, NKG7, GZMA       1.37   0.8: 0; 1.0: 0     0.90             0.92                0.219
0.9         1               1, 5             HLA-DRA, MS4A1, CD79A  3.78   0.8: 1; 1.0: 1     0.94             0.99                0.213
0.9         2               2                IL7R, LDHB, SARAF      0.96   0.8: 2; 1.0: 2     0.87             0.96                0.118
0.9         7               8                JCHAIN, MZB1, XBP1     8.46   0.8: 7; 1.0: 8     1.00             1.00                0.035

The full per-neighbor matching table, including matched cluster size, intersection size, Jaccard score, and preservation score, is also available from attr(consensus_table, "consensus.match.table").
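
For intuition about the Jaccard score reported in the matching table, the sketch below matches one cluster against the clusters of a neighboring resolution using toy assignments. It illustrates the Jaccard index only; how Subtypist_consensus() derives consensus_score and preservation_score is not described here, and the helper name and toy data are assumptions.

```r
# Hypothetical helper: Jaccard index between two sets of cell barcodes
jaccard <- function(a, b) {
  length(intersect(a, b)) / length(union(a, b))
}

# Toy cluster assignments for six cells at two neighboring resolutions
cells   <- paste0("cell", 1:6)
res_0.9 <- c("0", "0", "0", "1", "1", "1")
res_1.0 <- c("0", "0", "1", "1", "1", "2")

# For cluster "0" at resolution 0.9, score every cluster at resolution 1.0
target <- cells[res_0.9 == "0"]
scores <- vapply(split(cells, res_1.0), jaccard, numeric(1), b = target)
scores                      # "0": 2/3, "1": 1/5, "2": 0
names(which.max(scores))    # best match: "0"
```

A cluster that keeps a high Jaccard score against its best match at each neighboring resolution is one whose membership changes little as the resolution shifts.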

Notes

  • Subtypist does not require an external reference atlas or predefined subtype label set.

  • The current implementation is built on Seurat graph-based clustering and differential expression.

  • The main use case is subtype discovery within a selected major cell type.

  • phenotypic_molecules currently refers to subtype-supporting marker genes from the expression assay used for marker detection.

  • Metadata columns for merged Subtypist clusters are named with the selected prefix, for example Subtypist_snn_res.0.4.

About

Subtypist was developed by Yue Yao. For questions, please contact Yue Yao at yuey@zju.edu.cn.
