WikiParser-Node

Other Languages

简体中文 (Simplified Chinese)

Introduction

WikiParser-Node is an offline Wikitext parser developed by Bhsd for the Node.js environment. It can parse almost all wiki syntax and generate an Abstract Syntax Tree (AST) (Try it online). It also allows for easy querying and modification of the AST, and returns the modified Wikitext.

Although WikiParser-Node is not primarily designed to convert Wikitext to HTML, it provides pragmatic HTML rendering for many situations. Here is a list of example HTML pages from MediaWiki.org rendered using this package.

WikiParser-Node has been extensively tested against the official MediaWiki PHP parser tests with ~3,000 test cases, covering various edge cases and peculiarities of Wikitext. These tests are available here.

Why WikiParser-Node

Round-trip editing for bots and automation: parse Wikitext into an AST, query and modify nodes, then write back valid Wikitext.
LSP and linting ready for Node.js tooling: powers WikiLint and Wikitext LSP.
Browser/editor integration: works with CodeMirror, Monaco, and MediaWiki's official CodeMirror extension.
Large-scale usage evidence: parsing and linting a full dump at English Wikipedia scale is practical on consumer hardware.
Transparent quality signals: CI, CodeQL, public parser-test results, and coverage are all visible in this repository.

Other Versions

WikiLint

This version provides a CLI, but only retains the parsing and linting functionality. The parsed AST cannot be modified. It powers the Wikitext LSP, which provides multiple language services for editors such as VS Code, Sublime Text, and Helix.

A list of available linting rules can be found here.

Browser-compatible

This browser-compatible version can be used for code highlighting or as an LSP plugin in conjunction with editors such as CodeMirror and Monaco (Usage example). It has been integrated into MediaWiki's official CodeMirror extension since Release 1.45.

WikiParser-Template

A lightweight version that only supports parsing and manipulation of templates. This version is designed for use cases where only template processing is needed, such as certain types of bots or web tools (e.g., GANReviewTool) that focus on template manipulation.

Installation

Node.js

Please install the corresponding version as needed (WikiParser-Node or WikiLint), for example:

npm i wikiparser-node

or

npm i wikilint

Browser

You can download the code via CDN, for example:

<script src="//cdn.jsdelivr.net/npm/wikiparser-node"></script>

or

<script src="//unpkg.com/wikiparser-node/bundle/bundle-lsp.min.js"></script>

For more browser extensions, please refer to the corresponding documentation.

Usage

CLI usage

For MediaWiki sites with the CodeMirror extension installed, such as different language editions of Wikipedia and other Wikimedia Foundation-hosted sites, you can use the following command to obtain the parser configuration:

npx getParserConfig <site> <script path> [user] [force]
# For example:
npx getParserConfig frwiki https://fr.wikipedia.org/w user@example.net

The generated configuration file will be saved in the config directory. You can then use the site name for Parser.config.

// For example:
Parser.config = "frwiki";

API usage

Please refer to the Wiki. In particular, there are some usage examples that demonstrate how to use this package to complete various tasks.

Round-trip editing quickstart (TypeScript)

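import assert from "node:assert";
import Parser from "wikiparser-node";
import type {TranscludeToken} from "wikiparser-node";
// Use the parser configuration for English Wikipedia.
Parser.config = "enwiki";
// Parse the Wikitext, then select the {{Infobox}} template node from the AST.
const root = Parser.parse("{{Infobox|name=Old}}\nText"),
	template = root.querySelector<TranscludeToken>("template#Template:Infobox");
// Change the `name` parameter in place.
template?.setValue("name", "New");
// Serialize the modified AST back to Wikitext.
const wikitext = String(root);
assert.strictEqual(wikitext, "{{Infobox|name=New}}\nText");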
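
Before a bot writes an edit back, it can be worth checking that the page survives an unmodified parse-and-serialize cycle. The following is a minimal sketch of such a guard and is not part of the package: `WikitextParser`, `isRoundTripSafe`, and `editIfRoundTripSafe` are hypothetical names introduced here. The only behaviour assumed of the parser is a `parse()` method whose result converts to a string, as in the quickstart above.

```typescript
// Hypothetical structural type: just the parser surface this sketch needs.
// The quickstart's `Parser.parse(...)` and `String(root)` are assumed to fit it.
interface WikitextParser {
	parse(wikitext: string): {toString(): string};
}

// True when parsing and serializing the untouched tree reproduces the input.
function isRoundTripSafe(parser: WikitextParser, wikitext: string): boolean {
	return String(parser.parse(wikitext)) === wikitext;
}

// Applies `edit` only when the page round-trips; otherwise returns the input unchanged.
function editIfRoundTripSafe(
	parser: WikitextParser,
	wikitext: string,
	edit: (wikitext: string) => string,
): string {
	return isRoundTripSafe(parser, wikitext) ? edit(wikitext) : wikitext;
}
```

With the package installed, passing the imported `Parser` as the first argument should satisfy the interface. Skipping pages that fail the check is a conservative choice: the bot leaves them untouched rather than risk unrelated changes in the diff.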

Performance

Scanning a full database dump (*.xml.bz2) of English Wikipedia's ~19 million articles (parsing and linting) takes about 5 hours on a personal MacBook Air.
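
To put these figures in per-page terms, the arithmetic can be sketched as follows. Both inputs are the approximate numbers quoted above. The variable names and the `estimateHours` helper are illustrative only, and the estimate assumes throughput scales linearly with page count, which the source does not state.

```typescript
// Approximate figures quoted in this section.
const dumpPages = 19_000_000;
const scanHours = 5;

const scanSeconds = scanHours * 3600;                 // 18,000 s
const pagesPerSecond = dumpPages / scanSeconds;       // roughly 1,056 pages/s
const msPerPage = (scanSeconds * 1000) / dumpPages;   // roughly 0.95 ms per page

// Assumption: the average rate stays constant regardless of dump size.
function estimateHours(pageCount: number): number {
	return pageCount / pagesPerSecond / 3600;
}

console.log(`${Math.round(pagesPerSecond)} pages/s, ${msPerPage.toFixed(2)} ms/page`);
console.log(`1 million pages: about ${estimateHours(1_000_000).toFixed(2)} h`);
```

Under that assumption, a wiki with one million pages would take roughly a quarter of an hour on comparable hardware.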

Best fit

MediaWiki bot workflows that require robust AST manipulation and round-trip-safe edits.
Node.js pipelines for linting and refactoring Wikitext.
LSP-based language tooling.
Browser-side editing helpers, and gadgets/user scripts that require Wikitext parsing.

Known issues

The following limitations are documented for transparency.

Parser

Memory leaks may occur in rare cases.
Invalid page names containing Unicode characters are treated as valid (Example).
Preformatted text with a leading space is only processed by Token#toHtml.
BCP 47 language codes are not supported in language conversion (Example).

HTML conversion

Extension

Many extensions are not supported, such as <indicator> and <ref>.
& needs to be escaped in <syntaxhighlight> (Example).

Transclusion

Some parser functions are not supported.
New lines in {{localurl:}} are not handled correctly (Example).

Heading

The table of contents (TOC) is not supported.

HTML tag

Style sanitization is sometimes different (Example).
Table fostered content from <table> HTML tags (Example).

Table

<caption> elements are wrapped in <tbody> elements (Example).
Unclosed HTML tags in the table fostered content (Example).
<tr> elements should not be fostered (Example).

Link

Link trail is not supported (Example).
Block elements inside a link should break it into multiple links (Example).
Invalid or missing images (Examples 1, 2).
Link starting with ../ on a subpage (Example).

External link

External images are not supported (Examples 1, 2).
No percent-encoding in displayed free external links (Example).

Block element

Incomplete <p> wrapping when there are block elements (e.g., <pre>, <div> or even closing tags).
Mixed lists (Example).

Language conversion

Automatic language conversion is not supported.
Support for manual language conversion is minimal (Example).

Miscellaneous

Illegal HTML entities (Example).
