wikiparser-node
1.18.3 • Public • Published

Introduction

WikiParser-Node is an offline wikitext parser developed by Bhsd for the Node.js environment. It parses almost all wiki syntax into an Abstract Syntax Tree (AST) (Try it online). The AST can be queried and modified, and the modified tree can be converted back to wikitext.

Other Versions

Mini (also known as WikiLint)

This version provides a CLI but retains only the parsing and linting functionality; the parsed AST cannot be modified. It is used by the WikiParser Language Server VSCode extension.

Browser-compatible

This version runs in the browser and can be used for code highlighting or as a linting plugin with editors such as CodeMirror and Monaco. (Usage example)

Installation

Node.js

Please install the corresponding version as needed (WikiParser-Node or WikiLint), for example:

npm i wikiparser-node

or

npm i wikilint

Browser

You can load the browser bundle from a CDN, for example:

<script src="//cdn.jsdelivr.net/npm/wikiparser-node@browser/bundle/bundle.min.js"></script>

or

<script src="//unpkg.com/wikiparser-node@browser/bundle/bundle.min.js"></script>

For more browser extensions, please refer to the corresponding documentation.

Usage

CLI usage

For MediaWiki sites with the CodeMirror extension installed, such as different language editions of Wikipedia and other Wikimedia Foundation-hosted sites, you can use the following command to obtain the parser configuration:

npx getParserConfig <site> <script path> [force]
# For example:
npx getParserConfig jawiki https://ja.wikipedia.org/w

The generated configuration file is saved in the config directory. You can then assign the site name to Parser.config.

// For example:
Parser.config = 'jawiki';
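
As a rough illustration of how the configuration step fits into a parse call, the sketch below wraps it in a small helper. The helper name parseForSite is hypothetical, and the calls it makes beyond Parser.config (Parser.parse(wikitext) returning the root node, and String(root) yielding the wikitext) are assumptions not stated in this README; check the Wiki for the actual API. The parser object is passed in as an argument so the helper can be tried with a stub before the package is installed.

```javascript
// Sketch only. `Parser` is injected; in real use it would be the package
// export, e.g. `const Parser = require('wikiparser-node');`.
function parseForSite(Parser, siteName, wikitext) {
  // Select the configuration generated by getParserConfig, as shown above.
  Parser.config = siteName;
  // Assumption: Parser.parse(wikitext) returns the root node of the AST.
  const root = Parser.parse(wikitext);
  // Assumption: converting the root to a string gives the wikitext back.
  return { root, wikitext: String(root) };
}

// Possible usage, left commented out because it needs the package and a
// generated 'jawiki' configuration file:
// const Parser = require('wikiparser-node');
// const { root, wikitext } = parseForSite(Parser, 'jawiki', "''Hello'' [[world]]");
```

Keeping the configuration assignment next to the parse call makes it harder to parse with a stale or default configuration by accident.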

API usage

Please refer to the Wiki.

Performance

A full dump scan of Chinese Wikipedia's ~2.8 million articles (parsing and linting) on a personal MacBook Air takes about 4 hours.
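
To put that figure in per-article terms, the following sketch converts it into an average rate. It uses only the two approximate numbers quoted above; the function name throughput is illustrative, and the result is an average over wall-clock time, since the README does not say how the scan was parallelised or how article sizes are distributed.

```javascript
// Both inputs are the approximate figures from the sentence above.
const articles = 2.8e6; // ~2.8 million articles
const hours = 4;        // ~4 hours of wall-clock time

// Convert a total count and duration into average per-article figures.
function throughput(articleCount, elapsedHours) {
  const seconds = elapsedHours * 3600;
  return {
    articlesPerSecond: articleCount / seconds,
    msPerArticle: (seconds * 1000) / articleCount,
  };
}

const { articlesPerSecond, msPerArticle } = throughput(articles, hours);
console.log(
  `${articlesPerSecond.toFixed(0)} articles/s, ${msPerArticle.toFixed(1)} ms per article`
);
// prints "194 articles/s, 5.1 ms per article"
```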

Known issues

Parser

  1. Memory leaks may occur in rare cases.
  2. Preformatted text with a leading space is only processed by Token.prototype.toHtml.

HTML conversion

  1. Many extensions are not supported, such as <indicator> and <ref>.
  2. Most parser functions are not supported.
  3. TOC is not supported.
  4. Link trail is not supported (Example).
  5. Incomplete <p> wrapping (Examples 1, 2).
  6. URI encoding in free external links (Example).
  7. When the entire table content is fostered, the table does not have an empty <td> (Examples 1, 2, 3, 4).

License

GPL-3.0

Unpacked Size

1.65 MB

Total Files

208

Collaborators

  • bhsd