An implementation of the Unicode Line Breaking Algorithm (UAX #14). This implementation started as a refresh of the linebreak package, and still shares a small amount of test driver code with that project. The rest has been rewritten to support a fully rules-based approach that implements UAX #14 from Unicode version 15.0. From that document:
Line breaking, also known as word wrapping, is the process of breaking a section of text into lines such that it will fit in the available width of a page, window or other display area. The Unicode Line Breaking Algorithm performs part of this process. Given an input text, it produces a set of positions called "break opportunities" that are appropriate points to begin a new line. The selection of actual line break positions from the set of break opportunities is not covered by the Unicode Line Breaking Algorithm, but is in the domain of higher level software with knowledge of the available width and the display size of the text.
npm install @cto.af/linebreak
Create and use a new Rules object:
import {Rules} from '@cto.af/linebreak'
const r = new Rules({string: true});
for (const brk of r.breaks('my input string')) {
  console.log(brk.string); // "my ", "input ", "string"
  console.log(brk.pos); // 3, 9, 15
  console.log(brk.required); // false, false, true
}
The string option in the constructor chops the input into strings for you, so that you don't have to do the slicing yourself. It is off by default because you may only need the positions of the breaks.
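If you leave the `string` option off, the slicing can be done from the positions alone. The following is a minimal sketch, not part of this package: `sliceAtBreaks` is a hypothetical helper, the positions are hard-coded from the example above so that the snippet runs on its own, and it assumes each `pos` is an offset usable with `String.prototype.slice`.

```javascript
// Hypothetical helper (not part of @cto.af/linebreak): cut `text` at each
// break position, assuming positions are increasing offsets into `text`.
function sliceAtBreaks(text, positions) {
  const pieces = [];
  let start = 0;
  for (const pos of positions) {
    pieces.push(text.slice(start, pos));
    start = pos;
  }
  return pieces;
}

// Positions hard-coded from the example above to keep the sketch standalone.
const slicedPieces = sliceAtBreaks('my input string', [3, 9, 15]);
console.log(slicedPieces); // [ 'my ', 'input ', 'string' ]
```

In real use, the positions would presumably be collected from the iterator, for example by reading `brk.pos` inside the loop shown earlier.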
Each iterated Break object also has a required field, which is true for a mandatory break, such as the one at the end of the text in the example above.
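Choosing the actual line breaks from the opportunities is left to the caller. One possible approach is greedy filling, sketched below under several assumptions: `wrap` is a hypothetical helper rather than part of this package, the break list is hard-coded from the example above, and width is measured in UTF-16 code units as a stand-in for real text measurement.

```javascript
// Hypothetical helper (not part of @cto.af/linebreak): greedy line filling.
// `breaks` is an array of {pos, required} in increasing order of `pos`.
// `width` is a maximum line length in UTF-16 code units (a simplification).
function wrap(text, breaks, width) {
  const lines = [];
  let start = 0;       // offset where the current line begins
  let candidate = -1;  // last optional break seen on this line, -1 if none
  for (const {pos, required} of breaks) {
    // Trailing whitespace does not count against the width.
    const fits = text.slice(start, pos).trimEnd().length <= width;
    if (!fits && candidate > start) {
      // Overflow: end the line at the previous opportunity.
      lines.push(text.slice(start, candidate).trimEnd());
      start = candidate;
    }
    if (required) {
      // A required break always ends the line, even if it is short.
      lines.push(text.slice(start, pos).trimEnd());
      start = pos;
      candidate = -1;
    } else {
      candidate = pos;
    }
  }
  if (start < text.length) {
    lines.push(text.slice(start).trimEnd());
  }
  return lines;
}

const exampleBreaks = [
  {pos: 3, required: false},
  {pos: 9, required: false},
  {pos: 15, required: true},
];
console.log(wrap('my input string', exampleBreaks, 8)); // [ 'my input', 'string' ]
```

A segment with no opportunity inside it is emitted as an overlong line rather than split, since splitting it would break at a position the algorithm did not offer.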
You can tailor the rules that will be applied:
import {Rules, PASS} from '@cto.af/linebreak'
const r = new Rules();
r.replaceRule('LB25', (state) => PASS); // Do something more interesting than this!
There are a few other convenience functions available for modifying rules. A few of the rules interact with one another due to idiosyncrasies of the specification text. Comments have been left at these points in the source. If you are going to replace or remove an existing rule, please make sure to account for those interactions.
To make the conformance tests pass, enable the expanded number definition from UAX #14, Example 7:
const r = new Rules({example7: true});
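For orientation, the number shape that UAX #14 describes in this context is `( PR | PO )? ( OP | HY )? NU ( NU | SY | IS )* ( CL | CP )? ( PR | PO )?`, written over line break classes. The sketch below only illustrates that shape and is not how this package implements the option: `isNumberSequence` is a hypothetical helper, and the class names for the sample input are assigned by hand. Check the pattern against the specification before relying on it.

```javascript
// Hypothetical helper (not part of @cto.af/linebreak): test whether a
// sequence of line break class names matches the number shape from UAX #14.
// Each class name is followed by a space so that names cannot run together.
const NUMBER_SHAPE =
  /^(?:(?:PR|PO) )?(?:(?:OP|HY) )?NU (?:(?:NU|SY|IS) )*(?:(?:CL|CP) )?(?:(?:PR|PO) )?$/;

function isNumberSequence(classes) {
  const encoded = classes.map((name) => name + ' ').join('');
  return NUMBER_SHAPE.test(encoded);
}

// "$(12.35)" with classes assigned by hand:
// "$" is PR, "(" is OP, digits are NU, "." is IS, ")" is CP.
const dollarAmount = ['PR', 'OP', 'NU', 'NU', 'IS', 'NU', 'NU', 'CP'];
console.log(isNumberSequence(dollarAmount)); // true

// A letter (AL) followed by a digit does not match the shape.
console.log(isNumberSequence(['AL', 'NU'])); // false
```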
Full API documentation is available.
This package intends to be fully conformant with UAX #14. It currently passes all of the tests published by Unicode when the example7 option is enabled in the constructor.
Other tailoring is possible by adding and removing rules.
MIT