pandoc-wasm

0.0.2 • Public • Published

Pandoc on WebAssembly

The universal document converter, compiled for WebAssembly and running in the browser.

Demo application: https://georgestagg.github.io/pandoc-wasm

What's included

This repository builds an npm package that wraps Pandoc, compiled for WebAssembly using the Asterius Haskell-to-Wasm compiler. A demo application that converts between various document types is also included.

Warning: Running Pandoc under WebAssembly with this library is currently fragile (see the Extra Notes section below for details). Small documents seem to convert well, but conversion is less stable for larger documents and for documents that include images.

How to use

First, install the pandoc-wasm package using npm:

npm install --save pandoc-wasm

Import the module, run init() to download the Wasm binary, and convert documents using run():

import { Pandoc } from "pandoc-wasm";

const pandoc = new Pandoc();
// init() downloads the Wasm binary before any conversion can run.
pandoc.init().then(async (pandoc) => {
  // Convert Markdown input text to HTML.
  const result = await pandoc.run({
    text: "Some input text",
    options: { from: "markdown", to: "html" },
  });
  console.log(result);
});

See the example and src/app directories for more detailed examples.
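
Because conversion can fail on larger inputs (see the warning above), it may help to wrap run() so that a failure comes back as a value rather than an unhandled rejection. The following is a minimal sketch, not part of pandoc-wasm: `Runner`, `ConvertOutcome`, and `convertSafely` are hypothetical names, and the sketch assumes that failures surface as a rejected promise or a thrown exception. The result type of run() is left as a type parameter because it is not documented here.

```typescript
// Hypothetical structural type: anything with a promise-returning run().
// An initialised Pandoc instance is assumed (not confirmed) to fit this shape.
interface Runner<R> {
  run(args: { text: string; options: { from: string; to: string } }): Promise<R>;
}

// Outcome of one conversion attempt: either the result or an error message.
type ConvertOutcome<R> =
  | { ok: true; result: R }
  | { ok: false; error: string };

async function convertSafely<R>(
  runner: Runner<R>,
  text: string,
  from: string,
  to: string
): Promise<ConvertOutcome<R>> {
  try {
    const result = await runner.run({ text, options: { from, to } });
    return { ok: true, result };
  } catch (err) {
    // Normalise whatever was thrown into a plain message string.
    const error = err instanceof Error ? err.message : String(err);
    return { ok: false, error };
  }
}
```

With the instance from the example above, this could be called as `convertSafely(pandoc, "Some input text", "markdown", "html")`, and the caller then checks the `ok` field before using the result.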

Pandoc options

The Haskell code that powers the run() function is a modified version of Pandoc's own built-in pandoc-server code, and takes a similar options object to control how documents are converted. See Pandoc's server documentation for details on the options that can be passed to Pandoc.
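
As an illustration, the sketch below builds the argument to run() from a set of defaults plus per-call overrides. The option names `standalone` and `table-of-contents` are taken from Pandoc's server API; whether this package honours every such option is an assumption to check against the server documentation. `RunOptions`, `RunRequest`, `defaultOptions`, and `buildRequest` are hypothetical helper names, not part of pandoc-wasm.

```typescript
// Hypothetical option type: a small subset of pandoc-server style fields.
// Hyphenated names must be quoted when used as object keys.
interface RunOptions {
  from: string;
  to: string;
  standalone?: boolean;
  "table-of-contents"?: boolean;
  "embed-resources"?: boolean;
}

interface RunRequest {
  text: string;
  options: RunOptions;
}

// Defaults applied to every conversion unless overridden per call.
const defaultOptions: RunOptions = { from: "markdown", to: "html" };

function buildRequest(
  text: string,
  overrides: Partial<RunOptions> = {}
): RunRequest {
  // Later spreads win, so overrides replace defaults key by key.
  return { text, options: { ...defaultOptions, ...overrides } };
}

const request = buildRequest("# Title\n\nSome input text", {
  standalone: true,
  "table-of-contents": true,
});
// `request` has the { text, options } shape passed to pandoc.run() above.
```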

Including additional files

Supplemental files, such as images, can be included in the argument to the Pandoc .run() function. The files property should be a mapping from paths to file content, given either as a Uint8Array or as a base64-encoded string:

pandoc.run({
  text: "![An image](images/test.png)",
  options: {
    'from': "markdown",
    'to': "html",
    'embed-resources': true
  },
  files: {
    'images/test.png': "iVBORw0KGgoAAAANSUhEUgAAADAAAAAlAQAAAAAsYlcCAAAACklEQVR4AWMYBQABAwABRUEDtQAAAABJRU5ErkJggg=="
  }
});
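
When file contents are available as bytes but base64 strings are preferred (for example, to keep the whole request JSON-serialisable), the bytes can be encoded before being placed in the files mapping. The sketch below is a plain, dependency-free base64 encoder; `toBase64`, `encodeFiles`, and `pngSignature` are hypothetical names, not part of pandoc-wasm. Passing the Uint8Array directly, as described above, avoids the need for it.

```typescript
const ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Encode bytes as standard (RFC 4648) base64 with "=" padding.
function toBase64(bytes: Uint8Array): string {
  let out = "";
  for (let i = 0; i < bytes.length; i += 3) {
    const hasB1 = i + 1 < bytes.length;
    const hasB2 = i + 2 < bytes.length;
    const b0 = bytes[i] ?? 0;
    const b1 = bytes[i + 1] ?? 0; // 0 when past the end of the input
    const b2 = bytes[i + 2] ?? 0;
    // Split each 3-byte group into four 6-bit indices.
    out += ALPHABET.charAt(b0 >> 2);
    out += ALPHABET.charAt(((b0 & 0x03) << 4) | (b1 >> 4));
    out += hasB1 ? ALPHABET.charAt(((b1 & 0x0f) << 2) | (b2 >> 6)) : "=";
    out += hasB2 ? ALPHABET.charAt(b2 & 0x3f) : "=";
  }
  return out;
}

// Convert a path-to-bytes mapping into a path-to-base64 mapping.
function encodeFiles(raw: Record<string, Uint8Array>): Record<string, string> {
  const files: Record<string, string> = {};
  for (const [path, bytes] of Object.entries(raw)) {
    files[path] = toBase64(bytes);
  }
  return files;
}

// The eight-byte PNG signature, used here only as sample data.
const pngSignature = new Uint8Array([137, 80, 78, 71, 13, 10, 26, 10]);
const encoded = toBase64(pngSignature); // "iVBORw0KGgo="
```

The encoded sample matches the leading characters of the base64 image string in the example above, which is what a PNG file is expected to start with.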

Using a JS Web Worker

Pandoc can be run from inside a Web Worker. I recommend the Comlink library for handling communication between the main and worker threads. For an example, see the demo application in the src/app directory, which uses this method.
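
One way to structure the worker side is a small class that initialises Pandoc once, on first use, and forwards each request to run(). The sketch below is not the demo application's code: `PandocService`, `RunnerLike`, and the `create` factory are hypothetical names, and the Pandoc instance is supplied through a factory so that the class itself needs no imports. The Comlink calls in the trailing comments (`Comlink.expose` in the worker, `Comlink.wrap` on the main thread) are that library's usual entry points; the file name `worker.js` and the exact wiring are assumptions.

```typescript
// Hypothetical structural type for an initialised Pandoc instance.
interface RunnerLike {
  run(args: unknown): Promise<unknown>;
}

class PandocService {
  private ready: Promise<RunnerLike> | null = null;
  private create: () => Promise<RunnerLike>;

  // `create` should construct and initialise Pandoc, for example
  // () => new Pandoc().init()
  constructor(create: () => Promise<RunnerLike>) {
    this.create = create;
  }

  // Initialise on the first call, then reuse the same instance afterwards.
  async convert(args: unknown): Promise<unknown> {
    if (this.ready === null) {
      this.ready = this.create();
    }
    const runner = await this.ready;
    return runner.run(args);
  }
}

// Worker thread (assumed wiring):
//   Comlink.expose(new PandocService(() => new Pandoc().init()));
// Main thread (assumed wiring):
//   const service = Comlink.wrap<PandocService>(new Worker("worker.js"));
//   const html = await service.convert({
//     text: "Some input text",
//     options: { from: "markdown", to: "html" },
//   });
```

Storing the initialisation promise, rather than the instance, means that calls arriving while the binary is still downloading wait on the same promise instead of starting a second download.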

Extra Notes

  • Asterius is deprecated in favour of the newer ghc-wasm-meta version of the GHC compiler. Once there is a simple way to use Template Haskell with ghc-wasm-meta, I'll switch to compiling with that toolchain.

  • Pandoc relies on some Haskell libraries that use external C sources (e.g. zlib), which is not supported when compiling with Asterius. For those libraries, the functionality is instead replicated with JavaScript libraries called through Asterius's JS FFI.

  • Asterius's --yolo mode has been used to avoid GC issues. Smaller documents seem to work OK, but it is easy to trigger "out of memory" errors. This should be less of a problem once we've switched to using ghc-wasm-meta.

  • Running under Node or Deno should be possible, but does not work right now because of the way fetch() is used to download the Pandoc binary and support files.

  • Lua filters are not supported right now. I think it should be possible to compile a C Lua interpreter using Emscripten (or something similar), then hook it up to the Pandoc Wasm binary through a JS FFI.

Related Projects

We use the same general process as used in the above projects to build Pandoc, but this project has the following advantages:

  • Provides a newer version of Pandoc.
  • Supports more readers and writers, including binary formats such as docx.
  • Supports more options and extensions, including parsing YAML headers.
  • Supports adding and embedding supplemental files, such as images, to document output.

Package details

  • Version: 0.0.2
  • License: GPL-2.0-or-later
  • Unpacked size: 19.1 MB
  • Total files: 9
  • Weekly downloads: 24
  • Collaborators: georgestagg