qb-bithelp
Functions for viewing and understanding binary encodings.
Complies with the 100% test coverage and minimum dependency requirements of qb-standard .
Install:
npm install qb-bithelp
Usage
utfinfo( string_or_code_points, options )
Print hex/bit information for a raw (code-point), UTF-8, and UTF-16 encodings.
string_or_code_points: a single UTF code point (integer), an array of code points, or a string
options:
log: (default is console.log) set this to a function (takes single string input) to provide your own logger to capture output.
To highlight special cases, utfinfo only prints the 'interesting' stuff. That is, it will print UTF-8 only if it is different from the code-point (e.g. non-ascii) and will print UTF-16 only if it is different from UTF-8 ()has hi/lo surrogates).
Again, utfinfo() takes a variety of inputs - array of code points, single code-point or string:
var bhelp = require('qb-bithelp');
bhelp.utfinfo('e', 'simple ascii');
bhelp.utfinfo('\u00E6', 'a ligature');
bhelp.utfinfo([ 0x10009, 0x10010 ], 'large (4-byte) characters' );
bhelp.utfinfo('做法方。', 'some chinese');
... gives the output:
==============================================================================
1 code point for " e ", simple ascii
-----------
(1/1) " e ":
point : 00.00.00.65 | 0000 0000 : 0000 0000 : 0000 0000 : 0110 0101
-----------
==============================================================================
1 code point for " æ ", a ligature
-----------
(1/1) " æ ":
point : 00.00.00.E6 | 0000 0000 : 0000 0000 : 0000 0000 : 1110 0110
utf8 : 00.00.C3.A6 | 0000 0000 : 0000 0000 : 1100 0011 : 1010 0110
-----------
==============================================================================
2 code points for " 𐀉𐀐 ", large (4-byte) characters
-----------
(1/2) " 𐀉 ":
point : 00.01.00.09 | 0000 0000 : 0000 0001 : 0000 0000 : 0000 1001
utf8 : F0.90.80.89 | 1111 0000 : 1001 0000 : 1000 0000 : 1000 1001
utf16 : D8.00.DC.09 | 1101 1000 : 0000 0000 : 1101 1100 : 0000 1001
-----------
(2/2) " 𐀐 ":
point : 00.01.00.10 | 0000 0000 : 0000 0001 : 0000 0000 : 0001 0000
utf8 : F0.90.80.90 | 1111 0000 : 1001 0000 : 1000 0000 : 1001 0000
utf16 : D8.00.DC.10 | 1101 1000 : 0000 0000 : 1101 1100 : 0001 0000
-----------
==============================================================================
4 code points for " 做法方。 ", some chinese
-----------
(1/4) " 做 ":
point : 00.00.50.5A | 0000 0000 : 0000 0000 : 0101 0000 : 0101 1010
utf8 : 00.E5.81.9A | 0000 0000 : 1110 0101 : 1000 0001 : 1001 1010
-----------
(2/4) " 法 ":
point : 00.00.6C.D5 | 0000 0000 : 0000 0000 : 0110 1100 : 1101 0101
utf8 : 00.E6.B3.95 | 0000 0000 : 1110 0110 : 1011 0011 : 1001 0101
-----------
(3/4) " 方 ":
point : 00.00.65.B9 | 0000 0000 : 0000 0000 : 0110 0101 : 1011 1001
utf8 : 00.E6.96.B9 | 0000 0000 : 1110 0110 : 1001 0110 : 1011 1001
-----------
(4/4) " 。 ":
point : 00.00.30.02 | 0000 0000 : 0000 0000 : 0011 0000 : 0000 0010
utf8 : 00.E3.80.82 | 0000 0000 : 1110 0011 : 1000 0000 : 1000 0010
-----------
Note that if you see quoted " ????? " or similar output in your browser or editor, it is because it doesn't handle those characters correctly. A viewer that supports encodings will show characters.
bitinfo( string_or_code_points, options )
Takes same types of input and options
as utfinfo()
but just prints the raw code-point
information:
var bhelp = require('qb-bithelp');
bhelp.bitinfo( 'abcé', 'simple string');
bhelp.bitinfo( '\u00E6', 'a ligature');
bhelp.bitinfo( [ 0x10009, 0x10010 ], 'large (4-byte) characters' );
bhelp.bitinfo( '做法方便上手。', 'some chinese' );
... gives the output:
point : 00.00.00.61 | 0000 0000 : 0000 0000 : 0000 0000 : 0110 0001 | " a " : simple string
point : 00.00.00.62 | 0000 0000 : 0000 0000 : 0000 0000 : 0110 0010 | " b " : simple string
point : 00.00.00.63 | 0000 0000 : 0000 0000 : 0000 0000 : 0110 0011 | " c " : simple string
point : 00.00.00.E9 | 0000 0000 : 0000 0000 : 0000 0000 : 1110 1001 | " é " : simple string
point : 00.00.00.E6 | 0000 0000 : 0000 0000 : 0000 0000 : 1110 0110 | " æ " : a ligature
point : 00.01.00.09 | 0000 0000 : 0000 0001 : 0000 0000 : 0000 1001 | " 𐀉 " : large (4-byte) characters
point : 00.01.00.10 | 0000 0000 : 0000 0001 : 0000 0000 : 0001 0000 | " 𐀐 " : large (4-byte) characters
point : 00.00.50.5A | 0000 0000 : 0000 0000 : 0101 0000 : 0101 1010 | " 做 " : some chinese
point : 00.00.6C.D5 | 0000 0000 : 0000 0000 : 0110 1100 : 1101 0101 | " 法 " : some chinese
point : 00.00.65.B9 | 0000 0000 : 0000 0000 : 0110 0101 : 1011 1001 | " 方 " : some chinese
point : 00.00.4F.BF | 0000 0000 : 0000 0000 : 0100 1111 : 1011 1111 | " 便 " : some chinese
point : 00.00.4E.0A | 0000 0000 : 0000 0000 : 0100 1110 : 0000 1010 | " 上 " : some chinese
point : 00.00.62.4B | 0000 0000 : 0000 0000 : 0110 0010 : 0100 1011 | " 手 " : some chinese
point : 00.00.30.02 | 0000 0000 : 0000 0000 : 0011 0000 : 0000 0010 | " 。 " : some chinese