WikiParser-Node is an offline Wikitext parser developed by Bhsd for the Node.js environment. It can parse almost all wiki syntax and generate an Abstract Syntax Tree (AST) (Try it online). It also allows for easy querying and modification of the AST, and returns the modified Wikitext.
Although WikiParser-Node is not primarily designed to convert Wikitext to HTML, it provides pragmatic HTML rendering for many situations. Here is a list of example HTML pages from MediaWiki.org rendered using this package.
WikiParser-Node has been extensively tested against the official MediaWiki PHP parser tests with ~3,000 test cases, covering various edge cases and peculiarities of Wikitext. These tests are available here.
- Round-trip editing for bots and automation: parse Wikitext into an AST, query and modify nodes, then write back valid Wikitext.
- LSP and linting ready for Node.js tooling: powers WikiLint and Wikitext LSP.
- Browser/editor integration: works with CodeMirror, Monaco, and MediaWiki's official CodeMirror extension.
- Large-scale usage evidence: full-dump parsing and linting on English Wikipedia scale is practical on consumer hardware.
- Transparent quality signals: CI, CodeQL, public parser-test results, and coverage are all visible in this repository.
WikiLint provides a CLI but retains only the parsing and linting functionality; the parsed AST cannot be modified. It powers the Wikitext LSP, which provides multiple language services for editors such as VS Code, Sublime Text, and Helix.
A list of available linting rules can be found here.
The browser-compatible version can be used for code highlighting or as an LSP plugin in conjunction with editors such as CodeMirror and Monaco (Usage example). It has been integrated into the official MediaWiki CodeMirror extension since Release 1.45.
A lightweight version that only supports parsing and manipulation of templates. This version is designed for use cases where only template processing is needed, such as certain types of bots or web tools (e.g., GANReviewTool) that focus on template manipulation.
Please install the corresponding version as needed (WikiParser-Node or WikiLint), for example:
npm i wikiparser-node
or
npm i wikilint
You can download the code via CDN, for example:
<script src="//cdn.jsdelivr.net/npm/wikiparser-node"></script>
or
<script src="//unpkg.com/wikiparser-node/bundle/bundle-lsp.min.js"></script>
For more browser extensions, please refer to the corresponding documentation.
For MediaWiki sites with the CodeMirror extension installed, such as different language editions of Wikipedia and other Wikimedia Foundation-hosted sites, you can use the following command to obtain the parser configuration:
npx getParserConfig <site> <script path> [user] [force]
# For example:
npx getParserConfig frwiki https://fr.wikipedia.org/w user@example.net
The generated configuration file will be saved in the config directory. You can then use the site name for Parser.config.
// For example:
Parser.config = "frwiki";
Please refer to the Wiki. In particular, there are some usage examples that demonstrate how to use this package to complete various tasks.
import assert from "node:assert";
import Parser from "wikiparser-node";
import type {TranscludeToken} from "wikiparser-node";
Parser.config = "enwiki";
// Parse the page, locate the Infobox template, and change one parameter
const root = Parser.parse("{{Infobox|name=Old}}\nText"),
	template = root.querySelector<TranscludeToken>("template#Template:Infobox");
template?.setValue("name", "New");
// Serializing the root returns the modified Wikitext
const wikitext = String(root);
assert.strictEqual(wikitext, "{{Infobox|name=New}}\nText");
Scanning a full database dump (*.xml.bz2) of English Wikipedia's ~19 million articles, with both parsing and linting, takes about 5 hours on a personal MacBook Air.
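To put the dump-scan figure in per-article terms, the arithmetic can be sketched as follows. This is only a back-of-the-envelope calculation from the two approximate numbers quoted above (~19 million articles, about 5 hours); the function name `throughput` is illustrative and not part of the package.

```typescript
// Approximate figures quoted in the text above.
const ARTICLES = 19_000_000;
const HOURS = 5;

/** Convert a corpus size and wall-clock time into average throughput. */
function throughput(articles: number, hours: number): {
	articlesPerSecond: number;
	millisecondsPerArticle: number;
} {
	const seconds = hours * 3600;
	return {
		articlesPerSecond: articles / seconds,
		millisecondsPerArticle: seconds * 1000 / articles,
	};
}

const {articlesPerSecond, millisecondsPerArticle} = throughput(ARTICLES, HOURS);
console.log(articlesPerSecond.toFixed(0)); // "1056"
console.log(millisecondsPerArticle.toFixed(2)); // "0.95"
```

In other words, the quoted scan averages roughly a thousand articles per second, or a little under one millisecond per article, including both parsing and linting.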
- MediaWiki bot workflows that require robust AST manipulation and round-trip-safe edits.
- Node.js pipelines for linting and refactoring Wikitext.
- LSP-based language tooling.
- Browser-side editing helpers, and gadgets/user scripts that require Wikitext parsing.
The following limitations are documented for transparency.
- Memory leaks may occur in rare cases.
- Invalid page names with Unicode characters are treated like valid ones (Example).
- Preformatted text with a leading space is only processed by Token#toHtml.
- BCP 47 language codes are not supported in language conversion (Example).
- Many extensions are not supported, such as <indicator> and <ref>.
- & needs to be escaped in <syntaxhighlight> (Example).
- Some parser functions are not supported.
- New lines in {{localurl:}} are not handled correctly (Example).
- The table of contents (TOC) is not supported.
- Style sanitization is sometimes different (Example).
- Table fostered content from <table> HTML tags (Example).
- <caption> elements are wrapped in <tbody> elements (Example).
- Unclosed HTML tags in the table fostered content (Example).
- <tr> elements should not be fostered (Example).
- Link trail is not supported (Example).
- Block elements inside a link should break it into multiple links (Example).
- Invalid or missing images (Examples 1, 2).
- Link starting with ../ on a subpage (Example).
- External images are not supported (Examples 1, 2).
- No percent-encoding in displayed free external links (Example).
- Incomplete <p> wrapping when there are block elements (e.g., <pre>, <div> or even closing tags).
- Mixed lists (Example).
- Automatic language conversion is not supported.
- Support for manual language conversion is minimal (Example).
- Illegal HTML entities (Example).
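For readers unfamiliar with the link trail item in the list above: in MediaWiki, letters that directly follow a closing ]] are absorbed into the link's displayed text, so [[apple]]s is shown as a single link reading "apples". The sketch below illustrates that behaviour for the default English rule only (lowercase a-z); other languages use different rules. `applyLinkTrail` and `RenderedLink` are hypothetical names for illustration and are not part of WikiParser-Node.

```typescript
// Default English link trail: lowercase ASCII letters directly after "]]".
// Other content languages define different patterns; this is an assumption
// made to keep the sketch small.
const LINK_TRAIL = /^[a-z]+/u;

interface RenderedLink {
	target: string; // page the link points to
	text: string; // displayed text, including any absorbed trail
	rest: string; // text that remains after the link
}

/** Absorb a link trail from the text following a link into its label. */
function applyLinkTrail(target: string, label: string, following: string): RenderedLink {
	const trail = LINK_TRAIL.exec(following)?.[0] ?? "";
	return {target, text: label + trail, rest: following.slice(trail.length)};
}

// "[[apple]]s and pears" => one link reading "apples", then " and pears"
console.log(applyLinkTrail("apple", "apple", "s and pears"));
```

Because the HTML rendering in this package does not apply a link trail, output that depends on it will differ from MediaWiki's in cases like this one.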
