multibyte

multibyte provides common string functions that respect multibyte Unicode characters.

npm install multibyte

The problem and the solution

JavaScript strings are encoded as UTF-16, and indexing, length, and most String methods operate on 16-bit code units rather than code points. Characters outside the Basic Multilingual Plane (such as most emoji) need two code units, called a surrogate pair, so many string operations split them in half.
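To make the split concrete, here is a small sketch that uses only built-in string methods (nothing from multibyte is involved). The rocket emoji is the single code point U+1F680, which UTF-16 stores as the two code units 0xD83D and 0xDE80.

```typescript
// The rocket emoji is one code point: U+1F680.
const rocket = '🚀';

// Collect the 16-bit code units that make up the string.
const codeUnits: number[] = [];
for (let i = 0; i < rocket.length; i++) {
  codeUnits.push(rocket.charCodeAt(i));
}

// Array.from iterates by code point, so the surrogate pair stays together.
const codePoints: number[] = Array.from(rocket, (ch) => ch.codePointAt(0) ?? 0);

console.log(rocket.length); // 2
console.log(codeUnits.map((u) => u.toString(16)).join(' ')); // d83d de80
console.log(codePoints.map((p) => p.toString(16)).join(' ')); // 1f680
```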

If you display Unicode text, for example text decoded from a UTF-8 source, these functions help: they rely on the fact that Array.from(string) iterates by code point, so surrogate pairs stay intact.
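As a rough illustration of the Array.from approach, the sketch below builds three code-point-aware helpers. The names cpLength, cpCharAt, and cpSlice are hypothetical and exist only for this example; this is not multibyte's actual source, and the real functions may handle edge cases differently.

```typescript
// Hypothetical helpers for illustration only; not multibyte's actual source.

// Count code points instead of UTF-16 code units.
function cpLength(str: string): number {
  return Array.from(str).length;
}

// Return the whole code point at the given code point index ('' if out of range).
function cpCharAt(str: string, index: number): string {
  return Array.from(str)[index] ?? '';
}

// Slice by code point positions, then join the pieces back into a string.
function cpSlice(str: string, start?: number, end?: number): string {
  return Array.from(str).slice(start, end).join('');
}

console.log(cpLength('a🚀c')); // 3
console.log(cpCharAt('a🚀c', 1)); // 🚀
console.log(cpSlice('a🚀cdef', 2, 3)); // c
```

Code points are still not the same as user-perceived characters: a sequence such as "e" followed by the combining accent U+0301 renders as one character but counts as two code points with this approach.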

import {
  charAt,
  codePointAt,
  length,
  slice,
  split,
  truncateBytes,
} from 'multibyte';

// JavaScript String.prototype.charAt() can return a UTF-16 surrogate
'a🚀c'.charAt(1); //  ❌ "\ud83d" (half a rocket)
charAt('a🚀c', 1); // ✅ "🚀"

// JavaScript String.prototype.codePointAt() indexes by UTF-16 code unit, so it can return a lone surrogate
'🚀abc'.codePointAt(1); //  ❌ 56960 (0xDE80, the low surrogate of the rocket emoji)
codePointAt('🚀abc', 1); // ✅ 97 (the letter a)

// JavaScript returns length in UTF-16 code units, not Unicode characters
'a🚀c'.length; //  ❌ 4
length('a🚀c'); // ✅ 3

// JavaScript slices along UTF-16 boundaries, not Unicode characters
'a🚀cdef'.slice(2, 3); //  ❌ "\ude80" (half a rocket)
slice('a🚀cdef', 2, 3); // ✅ "c"

// JavaScript splits along UTF-16 boundaries, not Unicode characters
'a🚀c'.split(''); //  ❌ ["a", "\ud83d", "\ude80", "c"]
split('a🚀c', ''); // ✅ ["a", "🚀", "c"]

// String.prototype.slice() can cut a character in half; truncateBytes() keeps only whole characters
'a🚀cdef'.slice(0, 2); //       ❌ "a\ud83d" (half a rocket)
truncateBytes('a🚀cdef', 2); // ✅ "a" (the rocket does not fit within the 2-byte limit, so it is dropped whole)
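The example above does not spell out how truncateBytes counts bytes. The sketch below shows one way to truncate without splitting a character, under the assumption that the limit is measured in UTF-8 bytes. The names utf8ByteLength and truncateToUtf8Bytes are hypothetical, and the package itself may count differently.

```typescript
// Number of bytes a single code point occupies when encoded as UTF-8.
function utf8ByteLength(codePoint: number): number {
  if (codePoint < 0x80) return 1;
  if (codePoint < 0x800) return 2;
  if (codePoint < 0x10000) return 3;
  return 4;
}

// Hypothetical helper (assumes UTF-8 byte counting): keep whole code points
// for as long as the running byte total stays within maxBytes.
function truncateToUtf8Bytes(str: string, maxBytes: number): string {
  let used = 0;
  let result = '';
  for (const ch of Array.from(str)) {
    const size = utf8ByteLength(ch.codePointAt(0) ?? 0);
    if (used + size > maxBytes) break;
    used += size;
    result += ch;
  }
  return result;
}

console.log(truncateToUtf8Bytes('a🚀cdef', 2)); // a
console.log(truncateToUtf8Bytes('a🚀cdef', 5)); // a🚀
```

Under this assumption the rocket costs 4 bytes, so it only fits once the limit reaches 5 (1 byte for "a" plus 4 for the emoji).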

BOM (Byte order mark) - U+FEFF

Under the hood, all these functions strip a leading BOM if present.
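A leading BOM matters because it is an ordinary code point as far as JavaScript is concerned, so it shifts every index and inflates the length by one. The sketch below shows the effect with a hypothetical stripLeadingBom helper; it is an assumption about how such stripping could be done, not the package's own code.

```typescript
const BOM = '\uFEFF';

// Hypothetical helper: drop a single leading byte order mark, if present.
function stripLeadingBom(str: string): string {
  return str.charCodeAt(0) === 0xfeff ? str.slice(1) : str;
}

// Text read from a file often starts with a BOM.
const fromFile = BOM + 'a🚀c';

console.log(Array.from(fromFile).length); // 4 (the BOM counts as a code point)
console.log(Array.from(stripLeadingBom(fromFile)).length); // 3
console.log(stripLeadingBom('abc')); // abc (unchanged when there is no BOM)
```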

Package details

Version: 1.0.5
License: ISC
Unpacked size: 11.8 kB
Total files: 6
Collaborators: kensnyder