msgpack-nodejs
Yet another JavaScript/Node.js implementation of the MessagePack spec.
This project is a learning-by-doing exercise that focuses on modern tools and techniques in the Node.js/TypeScript ecosystem.
Usage
npm i msgpack-nodejs
npm test
Example
Please check example.md
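A minimal round trip looks like this (a sketch that assumes the named exports described in the API section below):

```typescript
import { encode, decode } from "msgpack-nodejs"

// encode(): any => Uint8Array
const encoded = encode({ compact: true, value: 42 })

// decode(): Uint8Array => plain JS value
console.log(decode(encoded)) // { compact: true, value: 42 }
```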
API
- encode(): any => Uint8Array
- decode(): Uint8Array => Exclude<any, Map>
- EncodeStream class: Exclude<any, null> => Buffer (a stream sketch follows this list)
- DecodeStream class: Buffer => Exclude<any, null> [1]
- registerExtension(): Register your own extension
- stringBufferStat(): Show string-buffer copy count and size
- lruCacheStat(): Show cache hit/miss counts
- bufferAllocatorStat(): Show how the byte array allocates new buffers
- prefixTrieStat(): Show prefix-trie hit/miss counts
- applyOptions(): Manually control caching
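The stream classes can be chained like ordinary Node.js streams. A sketch, assuming EncodeStream and DecodeStream behave as object-mode Transform streams:

```typescript
import { EncodeStream, DecodeStream } from "msgpack-nodejs"
import { Readable } from "node:stream"

// Objects in => MessagePack Buffers => objects out again.
Readable.from([{ hello: "world" }, { answer: 42 }])
  .pipe(new EncodeStream())  // Exclude<any, null> => Buffer
  .pipe(new DecodeStream())  // Buffer => Exclude<any, null>
  .on("data", (obj) => console.log(obj))
```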
Options
You can apply options like this:
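(A sketch, assuming applyOptions() accepts a nested object whose shape mirrors the dotted keys in the table below.)

```typescript
import { applyOptions } from "msgpack-nodejs"

// Assumed option shape: nested objects matching the dotted keys documented below.
applyOptions({
  encoder: {
    mapKeyCache: { enabled: true, size: 30 },
    stringCache: { enabled: true, size: 100 },
    byteArray: { base: 1024 },
  },
  decoder: {
    shortStringCache: { enabled: true, lessThan: 10 },
    jsUtf8Decode: { enabled: true, lessThan: 200 },
  },
})
```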
Key | Type | Default | Description |
---|---|---|---|
encoder.mapKeyCache.enabled | boolean | true | Cache map-key or not |
encoder.mapKeyCache.size | number | 30 | How big is the mapKeyCache |
encoder.stringCache.enabled | boolean | true | Cache any string except map-key or not |
encoder.stringCache.size | number | 100 | How big is the stringCache |
encoder.byteArray.base | number | 1024 | How many bytes are allocated initially for each execution. Increasing this can improve performance when handling many large JSON payloads |
decoder.shortStringCache.enabled | boolean | true | Use prefix-trie or not |
decoder.shortStringCache.lessThan | number | 10 | Only cache if string is shorter than this value |
decoder.jsUtf8Decode.enabled | boolean | true | Use JS utf8-decode or not |
decoder.jsUtf8Decode.lessThan | number | 200 | Only use JS utf8-decode if string is shorter than this value |
Project status
Compatibility
Env | Executable? |
---|---|
Node.js 18 | |
Node.js 16 | |
Node.js 14 | |
Node.js 12 | |
Limitation
- Float 32 encoding is not supported, because JavaScript numbers are always 64-bit floats.
TODO
- Ext tests
- Map 16/32 tests
Benchmark
Using the excellent benchmark tool from msgpack-lite, the performance of this project turned out not to be disappointing.
Run on Node.js 16, on a laptop with a Ryzen 5 5625U.
operation | op | ms | op/s |
---|---|---|---|
buf = Buffer(JSON.stringify(obj)); | 1021200 | 5000 | 204240 |
obj = JSON.parse(buf); | 1279500 | 5000 | 255900 |
buf = require("msgpack-lite").encode(obj); | 685800 | 5000 | 137160 |
obj = require("msgpack-lite").decode(buf); | 389800 | 5001 | 77944 |
buf = Buffer(require("msgpack.codec").msgpack.pack(obj)); | 713600 | 5000 | 142720 |
obj = require("msgpack.codec").msgpack.unpack(buf); | 401300 | 5001 | 80243 |
buf = require("msgpack-js-v5").encode(obj); | 284400 | 5000 | 56880 |
obj = require("msgpack-js-v5").decode(buf); | 544600 | 5000 | 108920 |
buf = require("msgpack-js").encode(obj); | 277100 | 5001 | 55408 |
obj = require("msgpack-js").decode(buf); | 559800 | 5000 | 111960 |
buf = require("msgpack5")().encode(obj); | 147700 | 5001 | 29534 |
obj = require("msgpack5")().decode(buf); | 239500 | 5000 | 47900 |
buf = require("notepack").encode(obj); | 1041500 | 5000 | 208300 |
obj = require("notepack").decode(buf); | 671300 | 5000 | 134260 |
obj = require("msgpack-unpack").decode(buf); | 163400 | 5001 | 32673 |
buf = require("msgpack-nodejs").encode(obj); (Run in sequence) | 1148900 | 5000 | 229780 |
obj = require("msgpack-nodejs").decode(buf); (Run in sequence) | 777500 | 5000 | 155500 |
buf = require("msgpack-nodejs").encode(obj); (Run exclusively) | 1321900 | 5000 | 264380 |
obj = require("msgpack-nodejs").decode(buf); (Run exclusively) | 805400 | 5000 | 161080 |
Implementation detail
Encode
The encoder uses a recursive function match() to walk the JSON structure (primitive values, objects, arrays, and anything nested), and pushes everything it encodes into a ByteArray that is responsible for buffer allocation. Encoded strings are written into a StringBuffer first and then cached in an LruCache.
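For illustration only, here is a self-contained sketch of that recursive dispatch, limited to the fixint/fixstr/fixarray/fixmap formats (the real match() covers the whole MessagePack spec and writes into ByteArray):

```typescript
function matchSketch(value: unknown, out: number[]): void {
  if (value === null) {
    out.push(0xc0)                                        // nil
  } else if (typeof value === "boolean") {
    out.push(value ? 0xc3 : 0xc2)                         // true / false
  } else if (typeof value === "number" && Number.isInteger(value) && value >= 0 && value < 0x80) {
    out.push(value)                                       // positive fixint
  } else if (typeof value === "string") {
    const utf8 = new TextEncoder().encode(value)
    out.push(0xa0 | utf8.length, ...utf8)                 // fixstr (< 32 bytes)
  } else if (Array.isArray(value)) {
    out.push(0x90 | value.length)                         // fixarray header (< 16 items)
    for (const item of value) matchSketch(item, out)      // recurse into elements
  } else if (typeof value === "object") {
    const entries = Object.entries(value)
    out.push(0x80 | entries.length)                       // fixmap header (< 16 pairs)
    for (const [k, v] of entries) {
      matchSketch(k, out)                                  // key
      matchSketch(v, out)                                  // value (may recurse again)
    }
  } else {
    throw new Error("outside the scope of this sketch")
  }
}

const out: number[] = []
matchSketch({ ok: true, tags: ["a", "b"] }, out)
console.log(Uint8Array.from(out))
```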
Decode
The decoder uses parseBuffer() to read every value out and pushes them into a StructBuilder that rebuilds the whole JSON object. Strings shorter than 200 bytes are decoded with a pure-JS utf8Decode() and then cached in a prefix trie.
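A sketch of that short-string fast path (the cutoff matches decoder.jsUtf8Decode.lessThan; the real utf8Decode() also handles 4-byte sequences and feeds the prefix trie):

```typescript
// Pure-JS decode below the cutoff, TextDecoder above it. Handles 1-3 byte sequences.
const textDecoder = new TextDecoder()

function decodeStringSketch(bytes: Uint8Array): string {
  if (bytes.length >= 200) return textDecoder.decode(bytes)
  let result = ""
  let i = 0
  while (i < bytes.length) {
    const b = bytes[i]
    if (b < 0x80) {
      result += String.fromCharCode(b)                    // 1-byte (ASCII)
      i += 1
    } else if (b < 0xe0) {
      result += String.fromCharCode(((b & 0x1f) << 6) | (bytes[i + 1] & 0x3f))
      i += 2                                              // 2-byte sequence
    } else {
      result += String.fromCharCode(
        ((b & 0x0f) << 12) | ((bytes[i + 1] & 0x3f) << 6) | (bytes[i + 2] & 0x3f)
      )
      i += 3                                              // 3-byte sequence
    }
  }
  return result
}

console.log(decodeStringSketch(new TextEncoder().encode("héllo"))) // "héllo"
```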
Optimization strategies:
Cache
- To improve encoding performance, an LruCache is used to cache encoded strings together with their headers (a minimal LRU sketch follows this list).
- To improve decoding performance, a prefix trie is used for Uint8Array-keyed caching of decoded strings.
- To keep the two workloads from evicting each other's entries, map-key caching and general string caching are kept separate.
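A minimal Map-based sketch of the LRU idea (the library's LruCache and the decoder's prefix trie are more involved than this):

```typescript
class LruSketch<K, V> {
  private map = new Map<K, V>()
  constructor(private capacity: number) {}

  get(key: K): V | undefined {
    const value = this.map.get(key)
    if (value !== undefined) {
      this.map.delete(key)      // re-insert so the key becomes most recently used
      this.map.set(key, value)
    }
    return value
  }

  set(key: K, value: V): void {
    if (this.map.size >= this.capacity) {
      // Map keeps insertion order, so the first key is the least recently used.
      this.map.delete(this.map.keys().next().value as K)
    }
    this.map.set(key, value)
  }
}

// One cache per workload, mirroring the separated caches above.
const stringCache = new LruSketch<string, Uint8Array>(100) // encoder.stringCache.size
const mapKeyCache = new LruSketch<string, Uint8Array>(30)  // encoder.mapKeyCache.size
```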
ArrayBuffer / TypedArray
- To allocate new buffers efficiently, every ByteArray starts with a small buffer (1 KB). [2]
- To handle unpredictably large JSON efficiently, ByteArray grows its buffer exponentially.
- To reduce write overhead, ByteArray uses DataView calls as much as possible (both ideas are sketched after this list).
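A sketch of that allocation strategy: start at the 1 KB base, grow exponentially, and write through a DataView. Names are illustrative, not the library's actual ByteArray:

```typescript
class GrowableByteArray {
  private buffer = new ArrayBuffer(1024)        // encoder.byteArray.base
  private view = new DataView(this.buffer)
  private length = 0

  private ensure(extra: number): void {
    if (this.length + extra <= this.buffer.byteLength) return
    let next = this.buffer.byteLength
    while (next < this.length + extra) next *= 2 // exponential growth
    const grown = new ArrayBuffer(next)
    new Uint8Array(grown).set(new Uint8Array(this.buffer, 0, this.length))
    this.buffer = grown
    this.view = new DataView(grown)
  }

  writeUint8(value: number): void {
    this.ensure(1)
    this.view.setUint8(this.length, value)
    this.length += 1
  }

  writeFloat64(value: number): void {
    this.ensure(8)
    this.view.setFloat64(this.length, value)     // big-endian by default, as MessagePack expects
    this.length += 8
  }

  done(): Uint8Array {
    return new Uint8Array(this.buffer, 0, this.length)
  }
}

const bytes = new GrowableByteArray()
bytes.writeUint8(0xcb)          // float 64 marker
bytes.writeFloat64(3.14)
console.log(bytes.done())
```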
Node.js
- To maximize array performance, pre-allocated arrays are used. [3]
- To maximize string-encoding performance, strings are encoded into a StringBuffer with encodeInto() to avoid unnecessary copying; the encoded content is then referenced via subarray() for writing and caching (sketched after this list). [4]
- To avoid the overhead of TextDecoder(), UTF-8 bytes are decoded with pure JS when the string is shorter than 200 bytes. [3]
- To avoid GC overhead in the decoder, every parsed value is passed into builder.insertValue() directly. [5]
- To avoid the syntax penalty of private class fields under Node.js 18, TypeScript's private syntax is used instead.
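A sketch of the encodeInto()/subarray() idea: encode strings into a shared scratch buffer and hand out views instead of copies. The scratch size and reset logic here are simplified assumptions; the real StringBuffer also feeds the LruCache and the stringBufferStat() counters:

```typescript
const textEncoder = new TextEncoder()
const scratch = new Uint8Array(8192)
let offset = 0

function encodeStringView(str: string): Uint8Array {
  const maxBytes = str.length * 3                      // safe UTF-8 upper bound
  if (offset + maxBytes > scratch.length) offset = 0   // naive reset (assumption)
  const written = textEncoder.encodeInto(str, scratch.subarray(offset)).written ?? 0
  const view = scratch.subarray(offset, offset + written) // a view, not a copy
  offset += written
  return view
}

console.log(encodeStringView("hello")) // Uint8Array(5) [ 104, 101, 108, 108, 111 ]
```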
Lessons learned
1. After attaching another stream that does not expect an object as its input, you may encounter an error.
2. Thanks to kriszyp/msgpackr for the inspiration on a better buffer-allocation strategy.
3. Thanks to AppSpector; this article gives very practical advice, including pre-allocated arrays and manual decoding of strings under 200 characters.
4. Thanks to msgpack/msgpack-javascript for techniques including UTF-8 byte-length calculation and the usage of encodeInto(), which led me to the ultra optimization strategy.
5. "Using ESLint, Prettier, Husky, lint-staged, and Commitizen to improve project quality and consistency" (article in Chinese).