fm-index.jsx
Synopsis
FM-index is the fastest full text search algorithm using a compressed index file. This is FM-index for JSX/JS/AMD/Common.js.
Motivation
FM-index is the alternate search algorithm of an inverse index algorithm. FM-index has the following advantages:
- It doesn't need to split word (like N-gram). It is good for CJK languages.
- It can recreate original document from the index file
- Index file is compressed.
- Easy to control the performance and the index file size.
Code Example
Use from JSX
; static : void { var fm = ; fm; fm; thisfm; console; // -> [5] }
Use from node.js
var FMIndex = FMIndex;
Use from require.js
// use fm-index.amd.js;
Use via standard JSX function
Use via global variables
Installation
$ npm install fm-index.jsx
You should add the following modules to package.json
if you want to use from JSX:
- burrows-wheeler-transform.jsx (0.3.x)
- wavelet-matrix.jsx (0.3.x)
- binary-io.jsx (0.3.x)
- bit-vector.jsx (0.4.x)
- binary-support.jsx (0.2.x)
If you want to use this library from other JSX project, install like the following:
$ npm install fm-index.jsx --save-dev
or add like these lines to your parent project's package.json
:
devDependencies: "fm-index.jsx": "~0.3.0" peerDepenencies: "fm-index.jsx": "~0.3.0"
And add node_modules/fm-index.jsx/src
as a search path.
You should add to peerDepenencies
if your product is library.
API Reference
class FMIndex()
Constructor.
FMIndex.push(str : string) : void
Append string.
FMIndex.contentSize()
Return total length of pushed string. It is available before build()
.
FMIndex.build(ddic : int, maxChar : int = 65535) : void
Build search index. ddic
is a cache density. (1 / ddic) * 100
% is a actual cache rate.
If ddic == 1, densty = 100%, it provides maximum speed but it use match memory and storage.
Initial recommendation value is 50.
maxChar
is a maximum character code. If you reduce this, you can save memory.
FMIndex.size()
Return contetn size. It is available after build()
.
FMIndex.search(keyword : string) : int[]
Return position list that includes keyword
.
FMIndex.getSubstring(pos : int, len : int) : string
Return original document content.
FMIndex.dump(output : BinaryOutput) : void
Export bit-vector.
FMIndex.load(input : BinaryInput) : void
Import bit-vector.
Development
JSX
Don't be afraid JSX! If you have an experience of JavaScript, you can learn JSX quickly.
- Static type system and unified class syntax.
- All variables and methods belong to class.
- JSX includes optimizer. You don't have to write tricky unreadalbe code for speed.
- You can use almost all JavaScript API as you know. Some functions become static class functions. See reference.
Setup
To create development environment, call following command:
$ npm install
Repository
- Repository: git://github.com/shibukawa/fm-index.jsx.git
- Issues: https://github.com/shibukawa/fm-index.jsx/issues
Run Test
$ grunt test
Build
$ grunt build
Generate API reference
$ grunt doc
Author
- shibukawa / yoshiki@shibu.jp
License
MIT
Complete license is written in LICENSE.md
.