Subtypist

Reference-free identification of cell subtypes for single-cell transcriptomic data

Subtypist is an R toolkit for reference-free discovery and annotation of cell subtypes from single-cell transcriptomic data. It is designed primarily for subtype discovery within a predefined major cell type, such as macrophages, T cells, fibroblasts, or epithelial cells. Subtypist evaluates clustering results across multiple resolutions, merges clusters with insufficient marker specificity, and reports subtype-associated phenotypic molecules.

Installation

Install devtools if needed:

install.packages("devtools")

Install Subtypist from GitHub:

devtools::install_github("ZJUFanLab/Subtypist")

Or install from a locally downloaded source archive:

devtools::install_local("/path/to/Subtypist-main.zip")

Usage

1. Load or subset a major cell type

Subtypist takes cells from a predefined major cell type as input. For subtype analysis, provide a Seurat object subset to the major cell type of interest.

library(Seurat)
library(Subtypist)

# Subset a major cell type from a full Seurat object as input for Subtypist
FullObject <- readRDS("your_seurat_object.rds")
# Assumes the active identities (Idents) hold the major cell type labels
Seu_sub <- subset(FullObject, idents = "T cells")

Before running Subtypist_merge(), the input object should be a processed Seurat object with:

normalized expression data;
scaled data;
PCA or another appropriate reduction;
a nearest-neighbor graph generated by FindNeighbors().

The cluster_assay should match the assay used to build the nearest-neighbor graph. The examples below use "RNA" as a broadly applicable default. If your object was integrated and FindNeighbors() was run on the integrated assay, use cluster_assay = "integrated" instead.
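The prerequisites above correspond to a standard Seurat workflow. The following is a minimal sketch, assuming an RNA-based object and default Seurat parameters. The helper name prepare_for_subtypist and the choice of 30 principal components are illustrative and are not part of Subtypist.

```r
# Hypothetical helper (not part of Subtypist): runs the standard Seurat
# preprocessing steps listed above on a subsetted object.
prepare_for_subtypist <- function(object, assay = "RNA", dims = 1:30) {
  Seurat::DefaultAssay(object) <- assay
  object <- Seurat::NormalizeData(object)               # normalized expression data
  object <- Seurat::FindVariableFeatures(object)        # features used for PCA
  object <- Seurat::ScaleData(object)                   # scaled data
  object <- Seurat::RunPCA(object)                      # PCA reduction
  object <- Seurat::FindNeighbors(object, dims = dims)  # nearest-neighbor graph
  object
}

# Usage (requires Seurat and a subsetted object such as Seu_sub):
# Seu_sub <- prepare_for_subtypist(Seu_sub)
# names(Seu_sub@graphs)  # typically "RNA_nn" and "RNA_snn" when built on the RNA assay
```

If the object was integrated, the same steps would be run on the integrated assay, and cluster_assay should then be set to match.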

2. Run Subtypist across clustering resolutions

Choose the termination mode (termination.mode) according to the dataset:
For datasets with continuous or transitional cellular states, the elbow mode ("elbow") is recommended.
For general datasets without a clear transitional-state structure, the default mode ("default") is recommended.
For rare subtype detection, the default and elbow modes are generally comparably robust.

result_default <- Subtypist_merge(
  object = Seu_sub,
  min.resolution = 0.3,
  max.resolution = 1.5,
  by = 0.1,
  marker_assay = "RNA",
  cluster_assay = "RNA",
  n_candidate_markers = 300,
  top_k = 3,
  termination.mode = "default",
  prefix = "Subtypist"
)

result_elbow <- Subtypist_merge(
  object = Seu_sub,
  min.resolution = 0.3,
  max.resolution = 1.5,
  by = 0.1,
  marker_assay = "RNA",
  cluster_assay = "RNA",
  n_candidate_markers = 300,
  top_k = 3,
  termination.mode = "elbow",
  prefix = "Subtypist_elbow"
)

Subtypist_merge() returns a list with two main elements:

Object: the Seurat object with Subtypist metadata columns for evaluated resolutions.
result.table: a table summarizing subtype-supporting phenotypic molecules and specificity scores.

In Object, each evaluated resolution is stored as a separate metadata column. By default, these columns are named as: Subtypist_snn_res.<resolution>
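As a quick check, the expected metadata column names can be derived from the resolution settings. This sketch assumes that the evaluated resolutions form the inclusive grid from min.resolution to max.resolution in steps of by, which the source does not state explicitly.

```r
# Resolution grid implied by min.resolution = 0.3, max.resolution = 1.5, by = 0.1.
# Assumption: both endpoints are included, as with seq().
resolutions <- round(seq(0.3, 1.5, by = 0.1), 10)  # round() removes floating-point noise

# Expected metadata column names for prefix = "Subtypist"
expected_cols <- paste0("Subtypist_snn_res.", resolutions)
print(expected_cols)

# With a real result, check which of these columns are present:
# intersect(expected_cols, colnames(result_default$Object@meta.data))
```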

Columns of result.table:

Column	Description
`resolution`	Clustering resolution.
`merged_cluster`	Cluster index after Subtypist merging.
`initial_cluster`	Original cluster(s) contributing to the merged cluster.
`phenotypic_molecules`	Top subtype-supporting marker genes.
`Score`	Cluster-level specificity score based on selected marker genes.

Example (rows at resolution 0.9):

subset(result_default$result.table, resolution == 0.9)

     resolution merged_cluster initial_cluster  phenotypic_molecules             Score
          <dbl>          <dbl> <list>           <list>                           <dbl>
  55        0.9              0 0                HCST, NKG7, GZMA             1.3664346
  56        0.9              1 1, 5             HLA-DRA, MS4A1, CD79A        3.7787186
  57        0.9              2 2                IL7R, LDHB, SARAF            0.9614802
  58        0.9              3 3                IL7R, ANXA1, CCL5            0.6511924
  59        0.9              4 4                NMNAT3, S100A6, FXYD2        0.0000000
  60        0.9              5 6                CMC1, NKG7, GZMK             3.2758660
  61        0.9              6 7                IL32, TNFRSF4, TIGIT         1.4888952
  62        0.9              7 8                JCHAIN, MZB1, XBP1           8.4554747
  63        0.9              8 9, 12            STMN1, TUBA1B, TUBB          2.3954183
  64        0.9              9 10               MS4A1, HLA-DRA, NMNAT3       0.5784272
  65        0.9             10 11               OXNAD1, KLRB1, IL7R          0.9706570
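Because initial_cluster and phenotypic_molecules are list columns, base R helpers such as lengths() are convenient for inspecting the table. The sketch below uses a small mock table that copies three rows from the example output. It assumes the list columns hold character vectors, which may differ from the actual element types.

```r
# Mock table mirroring three rows of the example output
# (a stand-in for result_default$result.table).
tbl <- data.frame(resolution = c(0.9, 0.9, 0.9), merged_cluster = c(0, 1, 4))
tbl$initial_cluster <- list("0", c("1", "5"), "4")
tbl$phenotypic_molecules <- list(
  c("HCST", "NKG7", "GZMA"),
  c("HLA-DRA", "MS4A1", "CD79A"),
  c("NMNAT3", "S100A6", "FXYD2")
)
tbl$Score <- c(1.3664346, 3.7787186, 0)

# Merged clusters formed from more than one initial cluster
merged_rows <- tbl[lengths(tbl$initial_cluster) > 1, ]
print(merged_rows$merged_cluster)

# Merged clusters whose specificity score is zero
zero_score_clusters <- tbl$merged_cluster[tbl$Score == 0]
print(zero_score_clusters)
```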

3. Rank resolutions by specificity score

Use sortScore() to summarize cluster-level scores at each resolution.

resolution_rank <- sortScore(result_default$result.table, mean)
resolution_rank <- resolution_rank[order(resolution_rank$value, decreasing = TRUE), ]
resolution_rank

Select the top-ranked resolution:

best_resolution <- resolution_rank$resolution[1]
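To make the ranking step concrete, the sketch below computes a per-resolution mean score with base R on made-up values. It illustrates the idea of summarizing cluster-level scores by resolution and is not the implementation of sortScore(). The column names of its output are also not those returned by sortScore().

```r
# Made-up cluster-level scores for three resolutions (illustration only)
scores <- data.frame(
  resolution = c(0.8, 0.8, 0.9, 0.9, 1.0, 1.0),
  Score      = c(1.0, 2.0, 3.0, 5.0, 0.5, 1.5)
)

# Mean score per resolution, sorted from highest to lowest
per_res <- aggregate(Score ~ resolution, data = scores, FUN = mean)
per_res <- per_res[order(per_res$Score, decreasing = TRUE), ]
print(per_res)

# Top-ranked resolution in this toy example
best <- per_res$resolution[1]
```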

4. Add Subtypist annotations to the Seurat object

Choose a resolution based on the specificity score ranking, or manually set a preferred resolution after inspecting the result table.

best_resolution <- resolution_rank$resolution[1]
# best_resolution <- 0.9

Seu_sub <- AddSubtypist(
  object = result_default$Object,
  result.table = result_default$result.table,
  resolution = best_resolution,
  prefix = "Subtypist",
  meta.prefix = "phenotypic_molecules_"
)

To manually select one marker from each cluster's phenotypic_molecules, use select_index with one index per merged_cluster.

selected_rows <- result_default$result.table[
  result_default$result.table$resolution == best_resolution,
]
select_index <- stats::setNames(
  rep(1, nrow(selected_rows)),
  as.character(selected_rows$merged_cluster)
)

Seu_sub <- AddSubtypist(
  object = result_default$Object,
  result.table = result_default$result.table,
  resolution = best_resolution,
  prefix = "Subtypist",
  meta.prefix = "phenotypic_molecules_",
  value.suffix = "+",
  select_index = select_index
)
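To choose a different marker for particular clusters, individual entries of select_index can be changed before calling AddSubtypist(). The sketch below uses mock rows and assumes that each index is the 1-based position within that cluster's phenotypic_molecules. This interpretation is an assumption and should be checked against the package documentation.

```r
# Mock rows for one resolution (a stand-in for selected_rows above)
selected_rows <- data.frame(merged_cluster = c(0, 1, 7))
selected_rows$phenotypic_molecules <- list(
  c("HCST", "NKG7", "GZMA"),
  c("HLA-DRA", "MS4A1", "CD79A"),
  c("JCHAIN", "MZB1", "XBP1")
)

# Start with the first marker for every cluster, then override cluster "1"
select_index <- stats::setNames(
  rep(1, nrow(selected_rows)),
  as.character(selected_rows$merged_cluster)
)
select_index["1"] <- 2

# Preview which marker each index points to (assuming 1-based positions)
chosen <- mapply(
  function(genes, i) genes[i],
  selected_rows$phenotypic_molecules,
  select_index
)
names(chosen) <- names(select_index)
print(chosen)
```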

5. Visualize Subtypist results

p <- Subtypist_Dimplot(
  object = Seu_sub,
  result.table = result_default$result.table,
  resolution = best_resolution,
  show = "molecular_phenotype",
  prefix = "Subtypist",
  meta.prefix = "phenotypic_molecules_"
)
p

6. Save results

saveRDS(result_default, "path_to_save.rds")
# saveResults(result_default$result.table, path = "output", name = "subtype_results.csv")
# saveResults(result_default$result.table, path = "output", name = "subtype_results.xlsx")
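If the table is exported with base R instead, note that write.csv() cannot write list columns directly, so they need to be collapsed to strings first. The following is a minimal sketch on a mock table. It is an alternative to saveResults(), not a description of how that function works, and the output file name is hypothetical.

```r
# Mock table with list columns (a stand-in for result_default$result.table)
tbl <- data.frame(resolution = c(0.9, 0.9), merged_cluster = c(0, 1), Score = c(1.37, 3.78))
tbl$initial_cluster <- list("0", c("1", "5"))
tbl$phenotypic_molecules <- list(
  c("HCST", "NKG7", "GZMA"),
  c("HLA-DRA", "MS4A1", "CD79A")
)

# Collapse every list column into a comma-separated character column
flat <- tbl
list_cols <- vapply(flat, is.list, logical(1))
flat[list_cols] <- lapply(flat[list_cols], function(col) {
  vapply(col, paste, character(1), collapse = ", ")
})

# Write to a temporary location (replace with your own path)
out_file <- file.path(tempdir(), "subtype_results_flat.csv")
write.csv(flat, out_file, row.names = FALSE)
```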

7. Inter-resolution consensus analysis

Inter-resolution consensus analysis can be used to assess whether the merged clusters are stable across neighboring resolutions. Here, resolution 0.9 is used as an example. Subtypist_consensus() compares clusters at this resolution with clusters from nearby resolutions and adds consensus metrics to the result table. The neighbor_clusters column reports the matched cluster at each neighboring resolution, for example 0.8: 1; 1.0: 2.

consensus_table <- Subtypist_consensus(
  result.list = result_default,
  selected.resolution = 0.9,
  evaluate.all = FALSE,
  window.size = 1,
  prefix = "Subtypist"
)

consensus_table[
  consensus_table$resolution == 0.9,
  c(
    "resolution",
    "merged_cluster",
    "initial_cluster",
    "phenotypic_molecules",
    "Score",
    "neighbor_clusters",
    "consensus_score",
    "preservation_score",
    "cluster_fraction"
  )
]

Example output (selected rows):

resolution	merged_cluster	initial_cluster	phenotypic_molecules	Score	neighbor_clusters	consensus_score	preservation_score	cluster_fraction
0.9	0	0	HCST, NKG7, GZMA	1.37	0.8: 0; 1.0: 0	0.90	0.92	0.219
0.9	1	1, 5	HLA-DRA, MS4A1, CD79A	3.78	0.8: 1; 1.0: 1	0.94	0.99	0.213
0.9	2	2	IL7R, LDHB, SARAF	0.96	0.8: 2; 1.0: 2	0.87	0.96	0.118
0.9	7	8	JCHAIN, MZB1, XBP1	8.46	0.8: 7; 1.0: 8	1.00	1.00	0.035

The full per-neighbor matching table, including matched cluster size, intersection size, Jaccard score, and preservation score, is also available from attr(consensus_table, "consensus.match.table").
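For readers unfamiliar with the Jaccard score mentioned above, the sketch below shows the standard Jaccard index between the cell sets of two clusters, using made-up cell names. It illustrates the general measure only. How Subtypist derives consensus_score and preservation_score from such matches is not specified here.

```r
# Standard Jaccard index: size of the intersection divided by size of the union
jaccard <- function(a, b) {
  length(intersect(a, b)) / length(union(a, b))
}

# Made-up cell barcodes for a cluster at one resolution and its match at a neighbor
cells_a <- c("cell1", "cell2", "cell3", "cell4")
cells_b <- c("cell2", "cell3", "cell4", "cell5")

# Three shared cells out of five distinct cells
overlap <- jaccard(cells_a, cells_b)
print(overlap)

# With a real result, the per-neighbor details are stored as an attribute:
# match_table <- attr(consensus_table, "consensus.match.table")
```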

Notes

Subtypist does not require an external reference atlas or predefined subtype label set.
The current implementation is built on Seurat graph-based clustering and differential expression.
The main use case is subtype discovery within a selected major cell type.
phenotypic_molecules currently refers to subtype-supporting marker genes from the expression assay used for marker detection.
Metadata columns for merged Subtypist clusters are named with the selected prefix, for example Subtypist_snn_res.0.4.

About

Subtypist was developed by Yue Yao. For questions, please contact Yue Yao at yuey@zju.edu.cn.
