baa-lexer

0.3.1 • Public • Published

Original image by lemmling on OpenClipArt.org

Baa!

Baa is a highly-optimised tokenizer/lexer written in TypeScript. It is inspired by moo, but completely rewritten.

It accepts most of moo's configurations, but lacks some features:

  • No support for arrays of keywords.
  • No support for rules that are arrays of rule definitions.
  • No support for regular expressions with the `unicode` flag.
  • Fewer dynamic checks (e.g. it silently drops all provided regex flags).

Advantages:

  • Compiles to a reusable, concurrency-safe lexer instead of creating an iterable object directly (see "Usage").
  • Different token format.
  • Slightly faster than moo (or at least not much slower).
  • About 2.2 kB in size.
  • Strong typings, including state names and token types.
  • Understandable code.

Note: This was mostly an exercise for me to practice test-driven development and think about architecture a bit. In the end, I tried to optimize speed and build size. I don't think it makes a lot of difference whether you use moo or baa. moo is more popular and may be better supported in the long run. I will use baa in handlebars-ng though.

Installation

Install the baa-lexer with

npm install baa-lexer

Usage

The examples/ directory shows how to use baa. One of the simple examples is this:

import { baa } from "baa-lexer";

const lexer = baa({
  main: {
    A: "a",
    FALLBACK: { fallback: true },
    B: "b",
  },
});

for (const token of lexer.lex("a b")) {
  console.log(token);
}

This will print the following tokens:

{ type: 'A',  original: 'a', value: 'a', start: { line: 1, column: 0 }, end: { line: 1, column: 1 } }
{ type: 'FALLBACK', original: ' ', value: ' ', start: { line: 1, column: 1 }, end: { line: 1, column: 2 } }
{ type: 'B', original: 'b', value: 'b', start: { line: 1, column: 2 }, end: { line: 1, column: 3 } }
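The shape of these tokens can be described with a TypeScript interface. This is a sketch inferred from the output above; the interface names (`Location`, `BaaToken`) are assumptions, not necessarily the library's actual exports:

```typescript
// Sketch of the token shape inferred from the output above.
// Names "Location" and "BaaToken" are illustrative assumptions.
interface Location {
  line: number;   // 1-based line number
  column: number; // 0-based column number
}

interface BaaToken {
  type: string;     // the rule name, e.g. "A" or "FALLBACK"
  original: string; // the matched substring as it appeared in the input
  value: string;    // the (possibly transformed) token value
  start: Location;  // location of the first matched character
  end: Location;    // location just past the last matched character
}

// Example: the first token from the output above.
const token: BaaToken = {
  type: "A",
  original: "a",
  value: "a",
  start: { line: 1, column: 0 },
  end: { line: 1, column: 1 },
};
```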

For a complete list of rules, have a look at the tests.

Using types

If you create a type

interface Typings {
  tokenType: "my" | "token" | "types";
  stateName: "my" | "state" | "names";
}

and pass it as generic to the baa function, you will get auto-completion for types within the configuration as well as for the "type" field in the created tokens. The following screenshot highlights all places that are type-checked and auto-completed.
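Since the screenshot is not reproduced here, the following self-contained sketch illustrates what the generic constrains. The `Config` mapped type is a hypothetical stand-in written for this example, not baa's real internal type; with the actual library you would write `baa<Typings>(config)`:

```typescript
// Illustrative sketch (not baa's real internals): a minimal config type in
// the spirit of baa<Typings>(config), showing what the generic constrains.
interface Typings {
  tokenType: "A" | "B" | "FALLBACK";
  stateName: "main";
}

// Hypothetical stand-in: state names become required keys,
// token types become the allowed rule names within each state.
type Config<T extends { tokenType: string; stateName: string }> = {
  [S in T["stateName"]]: {
    [R in T["tokenType"]]?: string | { fallback: boolean };
  };
};

// TypeScript now auto-completes and checks "main", "A", "B", "FALLBACK".
const config: Config<Typings> = {
  main: {
    A: "a",
    FALLBACK: { fallback: true },
    B: "b",
  },
};
```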

Benchmarks

See performance/ for the exact tests and run them yourself with

yarn perf

These are the results, but be aware that results may vary a lot:

 BENCH  Summary

  moo - performance/moo-baa.bench.ts > moo-baa test: './tests/abab.ts' (+0)
    1.07x faster than baa

  baa - performance/moo-baa.bench.ts > moo-baa test: './tests/fallback.ts' (+0)
    1.19x faster than moo

  baa - performance/moo-baa.bench.ts > moo-baa test: './tests/handlears-ng.ts' (+0)
    1.50x faster than moo

  baa - performance/moo-baa.bench.ts > moo-baa test: './tests/handlears-ng.ts' (1)
    1.25x faster than moo

  baa - performance/moo-baa.bench.ts > moo-baa test: './tests/handlears-ng.ts' (2)
    1.19x faster than moo

  baa - performance/moo-baa.bench.ts > moo-baa test: './tests/json-regex.ts' (+0)
    1.15x faster than moo

  moo - performance/moo-baa.bench.ts > moo-baa test: './tests/json.ts' (+0)
    1.04x faster than baa

Readable / Extendable code

What bothered me most about moo was that it is just one large JavaScript file, and it took me a long while to understand all the optimizations they implemented.

I tried to take a modular approach. Basically, the whole program is divided into:

  • The Lexer: Responsible for creating an IterableIterator of tokens and managing state transitions. Uses the TokenFactory to create the actual tokens.
  • The Matcher: Finds the next token match. There are different strategies
    • RegexMatcher: Creates a large regex to find the next match.
    • StickySingleCharMatcher: Uses an array to map char codes to rules. It can only find single-char tokens, but does so much faster than a regex.
  • The StateProcessor: Uses the Matcher to find the next match, interleaves matches for fallback and error rules.
  • The TokenFactory: Keeps track of the current location and creates tokens from matches.
  • The mooAdapter: Takes a moo config and combines all those components so that they do what they should.
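To make the StickySingleCharMatcher idea concrete, here is a minimal sketch of a char-code lookup table. The function name and return shape are illustrative assumptions, not baa's actual internals:

```typescript
// Sketch of the StickySingleCharMatcher idea: map char codes directly to
// rule names in an array, so single-char tokens are matched without a regex.
// createSingleCharMatcher and its return shape are illustrative assumptions.
function createSingleCharMatcher(rules: Record<string, string>) {
  // Sparse array indexed by char code; holds the matching rule name.
  const table: (string | undefined)[] = [];
  for (const [type, char] of Object.entries(rules)) {
    table[char.charCodeAt(0)] = type;
  }
  // A single array lookup replaces regex matching at the given offset.
  return (input: string, offset: number) => {
    const type = table[input.charCodeAt(offset)];
    return type == null ? null : { type, value: input[offset] };
  };
}

const match = createSingleCharMatcher({ A: "a", B: "b" });
// match("ab", 1) yields the rule "B" for the character "b".
```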

Advanced usage

You do not have to use the mooAdapter though: most of the internal components are exposed, so you can use them yourself. You can create a StateProcessor and pass your own Matcher instance to it, or create a completely new StateProcessor with completely custom logic.

The program could also be extended to allow a custom TokenFactory that applies whatever token format you need (but I won't do this unless somebody needs it).

License

MIT
Collaborators

  • knappi