The reproducible, equity-aware, question-driven, AI-assisted bibliometric toolkit for working biomedical researchers.
scimapR is a comprehensive R package for bibliometric and scientometric analysis. It provides unified ingestion from 12+ bibliographic sources, classical and modern science-mapping analytics, embedding-based research cluster discovery, and a publication-ready Shiny application – all in one CRAN-compatible package with tibble-based outputs and viridis-themed visualisations.
- Live corpus refresh. Your corpus knows when it was last refreshed,
  what is stale, and how to update itself
  (`sm_refresh()`, `sm_staleness()`, `sm_lock()`).
- Research questions as first-class objects. Build structured
  PICO/PECO questions, auto-generate search queries, and screen with
  optional LLM grounding
  (`sm_question()`, `sm_screen_against_question()`).
- Reproducible-by-construction corpus certificates. A YAML document
  that another researcher can use to re-derive your exact corpus
  (`sm_certificate()`, `sm_rebuild_from_cert()`).
- Author trajectory analysis. Track topic pivots, collaborator
  turnover, and productivity curves across a career
  (`sm_author_trajectory()`).
- Equity and representation auditing. Geographic, gender, funding,
  and OA audits with built-in confidence reporting and epistemic
  humility (`sm_audit_geographic()`, `sm_audit_gender()`).
- LLM-grounded corpus chat. Ask questions about your corpus with
  every claim anchored to actual works – no hallucinated references
  (`sm_chat()`).
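
To make the "research questions as first-class objects" idea concrete, here is a minimal base R sketch of turning a structured PICO question into a boolean search string. It does not call or reproduce `sm_question()`; the helper `build_pico_query()` and the named-list layout are assumptions made purely for illustration.

```r
# Hypothetical helper, NOT part of scimapR's API: builds a boolean search
# string from a PICO question stored as a plain named list of term vectors.
build_pico_query <- function(question) {
  # Drop empty components so they do not produce empty "()" groups
  parts <- Filter(function(terms) length(terms) > 0, question)
  # Quote multi-word terms, then OR the synonyms within each component
  groups <- vapply(parts, function(terms) {
    quoted <- ifelse(grepl(" ", terms, fixed = TRUE),
                     paste0('"', terms, '"'), terms)
    paste0("(", paste(quoted, collapse = " OR "), ")")
  }, character(1))
  # AND the components together
  paste(unname(groups), collapse = " AND ")
}

question <- list(
  population   = c("adults", "type 2 diabetes"),
  intervention = c("metformin"),
  comparator   = character(0),   # left empty on purpose
  outcome      = c("mortality", "cardiovascular events")
)

query <- build_pico_query(question)
cat(query, "\n")
```

Synonyms within one component are joined with OR and the components with AND, which is the usual shape of a database search string; real search syntax differs between sources, so treat the output as a starting point.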
scimapR is inspired by and designed as a complement to the excellent bibliometrix package by Massimo Aria and Corrado Cuccurullo (2017, Journal of Informetrics, doi:10.1016/j.joi.2017.08.007).
bibliometrix is the foundational R package for science mapping. It pioneered many of the analyses that scimapR also provides. scimapR is not a fork, not a derivative, and contains no code copied or adapted from bibliometrix. For shared bibliographic formats, scimapR ships clean-room parsers written from public format specifications.
First-class round-trip interop is provided:
```r
M <- sm_to_bibliometrix(corpus)  # use with bibliometrix
corpus <- as_sm_corpus(M)        # come back to scimapR
```

See `vignette("relationship-to-bibliometrix")` for details.
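
For readers unfamiliar with what such a round trip involves, the sketch below shows the general idea in base R: bibliometrix data frames are keyed by Web of Science style field tags (for example `AU`, `TI`, `SO`, `PY`, `TC`), with multi-valued fields such as authors stored as `;`-separated strings. The input column names and the helpers `to_tagged()` and `from_tagged()` are hypothetical; this is not how `sm_to_bibliometrix()` is implemented.

```r
# Hypothetical input: one row per work, authors held in a list column.
# Column names are illustrative, not scimapR's actual schema.
works <- data.frame(
  title    = c("Mapping science with R", "Citation networks in oncology"),
  year     = c(2019L, 2021L),
  journal  = c("Journal A", "Journal B"),
  cited_by = c(12L, 3L),
  stringsAsFactors = FALSE
)
works$authors <- list(c("Smith J", "Lee K"), c("Garcia M"))

# Flatten to a field-tagged frame: authors become one ";"-separated string
to_tagged <- function(works) {
  data.frame(
    AU = vapply(works$authors, paste, character(1), collapse = ";"),
    TI = works$title,
    SO = works$journal,
    PY = works$year,
    TC = works$cited_by,
    stringsAsFactors = FALSE
  )
}

# Reverse direction: split the author string back into a list column
from_tagged <- function(M) {
  out <- data.frame(
    title = M$TI, year = M$PY, journal = M$SO, cited_by = M$TC,
    stringsAsFactors = FALSE
  )
  out$authors <- strsplit(M$AU, ";", fixed = TRUE)
  out
}

M    <- to_tagged(works)
back <- from_tagged(M)
print(M$AU)
```

The point of the sketch is only that the conversion is lossless for these fields: splitting `AU` on `;` recovers the original author vectors.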
```r
# Install from GitHub (development version)
# install.packages("pak")
pak::pak("CTTIR/scimapR")
```

```r
library(scimapR)
# Generate a synthetic corpus
corpus <- sm_example_corpus(n_works = 100, seed = 42)
print(corpus)
#>
#> ── <sm_corpus> ─────────────────────────────────────────────────────────────────
#> Works: 100 | Authors: 80 | Institutions: 0
#> Years: 2015 - 2024
#> Sources (journals): 10
#> Embeddings: 100 x 64
#> Provenance: synthetic (100)
#> Status: Unlocked (last refreshed: 2026-05-09 19:13:44)
# Visualise production
sm_plot_production(corpus)
```

| Module | Functions |
|---|---|
| File ingestion | sm_read_bib(), sm_read_ris(), sm_read_wos(), sm_read_scopus(), sm_read_pubmed_xml(), … |
| API ingestion | sm_fetch_openalex(), sm_fetch_crossref(), sm_fetch_pubmed(), sm_fetch_semantic_scholar(), … |
| Enrichment | sm_enrich_unpaywall(), sm_enrich_altmetric(), sm_enrich_concepts(), … |
| Networks | sm_network_citation(), sm_network_cocitation(), sm_network_coupling(), sm_network_collab(), sm_network_coword() |
| Embeddings | sm_embed_works(), sm_cluster_hdbscan(), sm_cluster_leiden(), sm_cluster_label() |
| Indicators | sm_metric_h_index(), sm_metric_disruption(), sm_metric_rcr(), sm_metric_fnci(), sm_metric_novelty() |
| Visualisation | sm_plot_landscape(), sm_plot_thematic_map(), sm_plot_production(), sm_plot_equity_dashboard(), … |
| Export | sm_export_figure(), sm_export_table(), sm_export_zip(), sm_export_gephi() |
| Shiny app | sm_run_app() |
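
As a reminder of what the simplest of the indicators above measures, here is a minimal base R sketch of the h-index: the largest h such that h works have at least h citations each. The function `h_index()` and the toy data are illustrative assumptions; this is not the code behind `sm_metric_h_index()`, whose arguments are not shown here.

```r
# Illustrative h-index: NOT scimapR's implementation.
h_index <- function(citations) {
  # Ignore missing counts and rank works from most to least cited
  citations <- sort(citations[!is.na(citations)], decreasing = TRUE)
  # Count the ranks at which the citation count still meets the rank
  sum(citations >= seq_along(citations))
}

# Toy data: one row per (author, work) pair
papers <- data.frame(
  author = c("A", "A", "A", "B", "B"),
  cites  = c(10, 3, 1, 0, 0)
)

single     <- h_index(c(10, 8, 5, 4, 3))
per_author <- tapply(papers$cites, papers$author, h_index)
print(single)
print(per_author)
```

Because the counts are sorted in decreasing order, the comparison against rank is true for a prefix of the works and false afterwards, so summing it gives h directly.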
- `vignette("scimapR")` – Getting started
- `vignette("ingestion")` – Building a corpus
- `vignette("embeddings-and-clusters")` – Semantic landscape
- `vignette("modern-indicators")` – CD/RCR/FNCI/novelty
- `vignette("question-driven-reviews")` – Research questions + screening
- `vignette("reproducibility-and-certificates")` – Corpus certificates
- `vignette("equity-trajectory-and-chat")` – Equity audit + trajectories
- `vignette("relationship-to-bibliometrix")` – Interop and credit
scimapR stands on the shoulders of the bibliometrix project. We are deeply grateful to Massimo Aria and Corrado Cuccurullo for creating the foundational R package for science mapping, and for their landmark 2017 paper which defined the field of R-based bibliometrics.
We also acknowledge the many data sources that make scimapR possible: OpenAlex, Crossref, PubMed, Semantic Scholar, Unpaywall, and others.
If you use scimapR in your research, please cite both scimapR and the foundational bibliometrix package:
```r
citation("scimapR")
```

BibTeX entries:

```bibtex
@Manual{scimapR,
  title = {scimapR: Reproducible, Question-Driven, Embedding-Aware Science Mapping},
  author = {Raban Heller},
  year = {2026},
  note = {R package version 0.1.0},
  url = {https://github.com/CTTIR/scimapR},
}
@Article{bibliometrix,
  title = {bibliometrix: An R-tool for comprehensive science mapping analysis},
  author = {Massimo Aria and Corrado Cuccurullo},
  journal = {Journal of Informetrics},
  year = {2017},
  volume = {11},
  number = {4},
  pages = {959--975},
  doi = {10.1016/j.joi.2017.08.007},
}
```

For a complete citation block including each data source used in your
corpus, run `sm_cite_corpus(your_corpus)`.
License: MIT
Issues and pull requests are welcome at github.com/CTTIR/scimapR.
