Subtypist is an R toolkit for reference-free discovery and annotation of cell subtypes from single-cell transcriptomic data. It is designed primarily for subtype discovery within a predefined major cell type, such as macrophages, T cells, fibroblasts, or epithelial cells. Subtypist evaluates clustering results across multiple resolutions, merges clusters with insufficient marker specificity, and reports subtype-associated phenotypic molecules.
Install devtools if needed:

```r
install.packages("devtools")
```

Install Subtypist from GitHub:

```r
devtools::install_github("ZJUFanLab/Subtypist")
```

Or install from a local source archive or directory:

```r
devtools::install_local("/path/to/Subtypist-main.zip")
```

Subtypist takes a predefined major cell type as input. For subtype analysis, provide only the cells subsetted from the major cell type of interest.
```r
library(Seurat)
library(Subtypist)

# Subset a major cell type from a full Seurat object as input for Subtypist
FullObject <- readRDS("your_seurat_object.rds")
Seu_sub <- subset(FullObject, idents = "T cells")
```

Before running `Subtypist_merge()`, the input object should be a processed Seurat object with:
- normalized expression data;
- scaled data;
- PCA or another appropriate reduction;
- a nearest-neighbor graph generated by `FindNeighbors()`.

The cluster_assay should match the assay used to build the nearest-neighbor graph. The examples below use "RNA" as a broadly applicable default. If your object was integrated and FindNeighbors() was run on the integrated assay, use cluster_assay = "integrated" instead.
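
The prerequisites above correspond to a standard Seurat preprocessing workflow. The following is a minimal sketch, not part of Subtypist: the helper name `prepare_for_subtypist` and the choice of 30 principal components are assumptions made for illustration, and the Seurat calls are used with otherwise default settings.

```r
# Hypothetical helper: runs a standard Seurat workflow so that the object
# carries normalized data, scaled data, a PCA reduction, and a neighbor graph.
prepare_for_subtypist <- function(object, assay = "RNA", dims = 1:30) {
  object <- Seurat::NormalizeData(object, assay = assay)
  object <- Seurat::FindVariableFeatures(object, assay = assay)
  object <- Seurat::ScaleData(object, assay = assay)
  object <- Seurat::RunPCA(object, assay = assay)
  # The graph is built from the PCA of `assay`; pass the same assay
  # as cluster_assay when calling Subtypist_merge()
  object <- Seurat::FindNeighbors(object, reduction = "pca", dims = dims)
  object
}

# Example call (requires Seurat and a Seurat object named Seu_sub):
# Seu_sub <- prepare_for_subtypist(Seu_sub, assay = "RNA")
```

If the object has already been processed, for example as part of an integration workflow, these steps can be skipped as long as the neighbor graph matches the assay given as `cluster_assay`.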

Choose `termination.mode` according to the dataset:

- For datasets characterized by continuous or transitional cellular states, the elbow mode is recommended.
- For general datasets without clear transitional-state structure, the default mode is recommended.
- For datasets focused on rare subtype detection, the default and elbow modes generally exhibit comparable robustness.

```r
# Default termination mode
result_default <- Subtypist_merge(
  object = Seu_sub,
  min.resolution = 0.3,
  max.resolution = 1.5,
  by = 0.1,
  marker_assay = "RNA",
  cluster_assay = "RNA",
  n_candidate_markers = 300,
  top_k = 3,
  termination.mode = "default",
  prefix = "Subtypist"
)

# Elbow termination mode
result_elbow <- Subtypist_merge(
  object = Seu_sub,
  min.resolution = 0.3,
  max.resolution = 1.5,
  by = 0.1,
  marker_assay = "RNA",
  cluster_assay = "RNA",
  n_candidate_markers = 300,
  top_k = 3,
  termination.mode = "elbow",
  prefix = "Subtypist_elbow"
)
```

`Subtypist_merge()` returns a list with two main elements:

- `Object`: the Seurat object with Subtypist metadata columns for evaluated resolutions.
- `result.table`: a table summarizing subtype-supporting phenotypic molecules and specificity scores.

In `Object`, each evaluated resolution is stored as a separate metadata column. By default, these columns are named `Subtypist_snn_res.<resolution>`, for example `Subtypist_snn_res.0.4`.
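
To see which resolutions were written to the metadata, the column names can be matched against the prefix. The sketch below uses a toy data frame with made-up values as a stand-in for `result_default$Object@meta.data`; only the column naming pattern is taken from this README.

```r
# Toy stand-in for result_default$Object@meta.data (hypothetical values)
meta <- data.frame(
  nCount_RNA = c(1200, 950, 1810),
  Subtypist_snn_res.0.3 = c(0, 1, 0),
  Subtypist_snn_res.0.4 = c(0, 1, 2),
  check.names = FALSE
)

prefix <- "Subtypist"
pattern <- paste0("^", prefix, "_snn_res\\.")

# Metadata columns holding the cluster assignment of each evaluated resolution
res_cols <- grep(pattern, colnames(meta), value = TRUE)

# Recover the numeric resolution from each column name
resolutions <- as.numeric(sub(pattern, "", res_cols))

print(res_cols)
print(resolutions)
```

On a real result, replace `meta` with `result_default$Object@meta.data` and set `prefix` to the value passed to `Subtypist_merge()`.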

`result.table` contains the following columns:

| Column | Description |
|---|---|
| `resolution` | Clustering resolution. |
| `merged_cluster` | Cluster index after Subtypist merging. |
| `initial_cluster` | Original cluster(s) contributing to the merged cluster. |
| `phenotypic_molecules` | Top subtype-supporting marker genes. |
| `Score` | Cluster-level specificity score based on selected marker genes. |

Example:

```r
head(result_default$result.table)
```

```
   resolution merged_cluster initial_cluster phenotypic_molecules       Score
        <dbl>          <dbl> <list>          <list>                     <dbl>
55        0.9              0 0               HCST, NKG7, GZMA       1.3664346
56        0.9              1 1, 5            HLA-DRA, MS4A1, CD79A  3.7787186
57        0.9              2 2               IL7R, LDHB, SARAF      0.9614802
58        0.9              3 3               IL7R, ANXA1, CCL5      0.6511924
59        0.9              4 4               NMNAT3, S100A6, FXYD2  0.0000000
60        0.9              5 6               CMC1, NKG7, GZMK       3.2758660
61        0.9              6 7               IL32, TNFRSF4, TIGIT   1.4888952
62        0.9              7 8               JCHAIN, MZB1, XBP1     8.4554747
63        0.9              8 9, 12           STMN1, TUBA1B, TUBB    2.3954183
64        0.9              9 10              MS4A1, HLA-DRA, NMNAT3 0.5784272
65        0.9             10 11              OXNAD1, KLRB1, IL7R    0.9706570
```

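
Because `initial_cluster` and `phenotypic_molecules` are list columns, a few base R idioms help when inspecting the table. The sketch below rebuilds three rows of the example output as a toy table; the element types inside the list columns (numeric cluster indices, character gene names) are an assumption.

```r
# Toy table mirroring three rows of the example output (rows 56, 59, 63)
tbl <- data.frame(
  resolution = c(0.9, 0.9, 0.9),
  merged_cluster = c(1, 4, 8),
  Score = c(3.7787186, 0, 2.3954183)
)
tbl$initial_cluster <- list(c(1, 5), 4, c(9, 12))
tbl$phenotypic_molecules <- list(
  c("HLA-DRA", "MS4A1", "CD79A"),
  c("NMNAT3", "S100A6", "FXYD2"),
  c("STMN1", "TUBA1B", "TUBB")
)

# Merged clusters built from more than one initial cluster
was_merged <- lengths(tbl$initial_cluster) > 1
merged_ids <- tbl$merged_cluster[was_merged]

# Merged clusters whose specificity score is zero
zero_score_ids <- tbl$merged_cluster[tbl$Score == 0]

# Collapse the list column into plain strings for printing or export
tbl$markers <- vapply(tbl$phenotypic_molecules, paste, character(1), collapse = ", ")

print(merged_ids)
print(zero_score_ids)
print(tbl$markers)
```

The same expressions can be applied to `result_default$result.table` after filtering to a single resolution.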

Use `sortScore()` to summarize cluster-level scores at each resolution.

```r
resolution_rank <- sortScore(result_default$result.table, mean)
resolution_rank <- resolution_rank[order(resolution_rank$value, decreasing = TRUE), ]
resolution_rank
```

Select the top-ranked resolution:

```r
best_resolution <- resolution_rank$resolution[1]
```

Choose a resolution based on the specificity score ranking, or manually set a preferred resolution after inspecting the result table.

```r
best_resolution <- resolution_rank$resolution[1]
# best_resolution <- 0.9

Seu_sub <- AddSubtypist(
  object = result_default$Object,
  result.table = result_default$result.table,
  resolution = best_resolution,
  prefix = "Subtypist",
  meta.prefix = "phenotypic_molecules_"
)
```

To manually select one marker from each cluster's `phenotypic_molecules`, use `select_index` with one index per `merged_cluster`.

```r
selected_rows <- result_default$result.table[
  result_default$result.table$resolution == best_resolution,
]

# One index per merged cluster; here the first marker is used for every cluster
select_index <- stats::setNames(
  rep(1, nrow(selected_rows)),
  as.character(selected_rows$merged_cluster)
)

Seu_sub <- AddSubtypist(
  object = result_default$Object,
  result.table = result_default$result.table,
  resolution = best_resolution,
  prefix = "Subtypist",
  meta.prefix = "phenotypic_molecules_",
  value.suffix = "+",
  select_index = select_index
)
```

Visualize the annotated subtypes:

```r
p <- Subtypist_Dimplot(
  object = Seu_sub,
  result.table = result_default$result.table,
  resolution = best_resolution,
  show = "molecular_phenotype",
  prefix = "Subtypist",
  meta.prefix = "phenotypic_molecules_"
)
p
```

Save the results:

```r
saveRDS(result_default, "path_to_save.rds")
# saveResults(result_default$result.table, path = "output", name = "subtype_results.csv")
# saveResults(result_default$result.table, path = "output", name = "subtype_results.xlsx")
```

Inter-resolution consensus analysis can be used to assess whether the merged clusters are stable across neighboring resolutions. Here, resolution 0.9 is used as an example. `Subtypist_consensus()` compares clusters at this resolution with clusters from nearby resolutions and adds consensus metrics to the result table. The `neighbor_clusters` column reports the matched cluster at each neighboring resolution, for example `0.8: 1; 1.0: 2`.

```r
consensus_table <- Subtypist_consensus(
  result.list = result_default,
  selected.resolution = 0.9,
  evaluate.all = FALSE,
  window.size = 1,
  prefix = "Subtypist"
)

consensus_table[
  consensus_table$resolution == 0.9,
  c(
    "resolution",
    "merged_cluster",
    "initial_cluster",
    "phenotypic_molecules",
    "Score",
    "neighbor_clusters",
    "consensus_score",
    "preservation_score",
    "cluster_fraction"
  )
]
```

This returns:

| resolution | merged_cluster | initial_cluster | phenotypic_molecules | Score | neighbor_clusters | consensus_score | preservation_score | cluster_fraction |
|---|---|---|---|---|---|---|---|---|
| 0.9 | 0 | 0 | HCST, NKG7, GZMA | 1.37 | 0.8: 0; 1.0: 0 | 0.90 | 0.92 | 0.219 |
| 0.9 | 1 | 1, 5 | HLA-DRA, MS4A1, CD79A | 3.78 | 0.8: 1; 1.0: 1 | 0.94 | 0.99 | 0.213 |
| 0.9 | 2 | 2 | IL7R, LDHB, SARAF | 0.96 | 0.8: 2; 1.0: 2 | 0.87 | 0.96 | 0.118 |
| 0.9 | 7 | 8 | JCHAIN, MZB1, XBP1 | 8.46 | 0.8: 7; 1.0: 8 | 1.00 | 1.00 | 0.035 |
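
The row filter above compares `resolution` to the literal `0.9` with `==`. If the stored resolutions were produced by floating-point arithmetic (an assumption about how the grid is built, not something this README states), an exact comparison could miss the intended rows. A tolerance-based comparison is a defensive alternative, sketched here on a hypothetical grid that matches `min.resolution = 0.3`, `max.resolution = 1.5`, and `by = 0.1`.

```r
# Hypothetical resolution grid built arithmetically (assumption)
grid <- seq(0.3, 1.5, by = 0.1)

# Match the target resolution within a small tolerance instead of using ==
target <- 0.9
is_selected <- abs(grid - target) < 1e-8
selected <- grid[is_selected]

print(length(grid))
print(sum(is_selected))
```

The same condition, `abs(consensus_table$resolution - 0.9) < 1e-8`, can replace the `==` test when subsetting a result table.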
The full per-neighbor matching table, including matched cluster size, intersection size, Jaccard score, and preservation score, is also available from attr(consensus_table, "consensus.match.table").
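
For readers unfamiliar with the Jaccard score reported in the matching table, the sketch below shows the standard definition (intersection size divided by union size) applied to cell sets, and how a best-matching neighbor cluster could be picked with it. The cell barcodes and cluster labels are made up, and this is not a description of how Subtypist computes `consensus_score` or `preservation_score`.

```r
# Jaccard index between two sets of cell barcodes
jaccard <- function(a, b) {
  a <- unique(a)
  b <- unique(b)
  length(intersect(a, b)) / length(union(a, b))
}

# Hypothetical cells of one merged cluster at the selected resolution
focal <- c("c1", "c2", "c3", "c4")

# Hypothetical clusters at a neighboring resolution
neighbor <- list(
  "1" = c("c1", "c2", "c3", "c5"),
  "2" = c("c4", "c6")
)

# Score every neighboring cluster against the focal cluster
scores <- vapply(neighbor, jaccard, numeric(1), b = focal)

# The neighboring cluster with the highest overlap
best <- names(which.max(scores))

print(scores)
print(best)
```

Here cluster `"1"` shares three of five distinct cells with the focal cluster, so it is selected as the match.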

- Subtypist does not require an external reference atlas or predefined subtype label set.
- The current implementation is built on Seurat graph-based clustering and differential expression.
- The main use case is subtype discovery within a selected major cell type.
- `phenotypic_molecules` currently refers to subtype-supporting marker genes from the expression assay used for marker detection.
- Metadata columns for merged Subtypist clusters are named with the selected `prefix`, for example `Subtypist_snn_res.0.4`.

Subtypist was developed by Yue Yao. For questions, please contact Yue Yao at yuey@zju.edu.cn.

