A JavaScript API for Securibox Parse
Securibox Parse JavaScript API is open source software released under the LGPL-3.0 license.
You are welcome to report bugs or open pull requests on GitHub.
The easiest way to install sbx-parse-api is with npm:
npm install sbx-parse-api
Alternatively, clone the source from GitHub:
git clone https://github.com/Securibox/parse-api-js.git
import {Parse, AuthMethods} from "sbx-parse-api";
// Placeholder: replace with the base URL of your Securibox Parse API endpoint
const url = "https://example.com/parse";
// Using JWT authentication
const jwt = "thisIsMyEncodedToken";
const authMethod = AuthMethods.JWT;
const parser = new Parse(url, authMethod, jwt);
// OR, with basic authentication:
// const user = "MyUsername";
// const password = "MySecretPassword";
// const authMethod = AuthMethods.BASIC;
// const parser = new Parse(url, authMethod, user, password);
Supported authentication methods:
- AuthMethods.BASIC: basic authentication using a username and password
- AuthMethods.JWT: authentication using a JSON Web Token
After you have instantiated a Parse object, you can use it to call the API. Every call returns a Promise. Only requests that return an HTTP 200 status code fulfill the promise and trigger the .then() handler; any other outcome rejects the promise, triggers the .catch() handler, and yields an error structured as {"error": [Error Object]}.
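The rejection shape described above can be handled with a small helper. This is a minimal sketch: describeRejection and failingCall are hypothetical names introduced for illustration, and failingCall only stands in for a real call such as parser.classify(docs) so that the snippet runs on its own.

```javascript
// Hypothetical helper (not part of sbx-parse-api): turns a rejection value
// into a printable message, assuming the {"error": [Error Object]} shape.
function describeRejection(rejection) {
  const err = rejection && rejection.error;
  if (err instanceof Error) return err.message;
  if (err !== undefined && err !== null) return String(err);
  return "Unknown error";
}

// Stand-in for an API call that does not return HTTP 200.
// With a real parser this would be, for example, parser.classify(docs).
function failingCall() {
  return Promise.reject({ error: new Error("HTTP 401") });
}

failingCall()
  .then((docs) => console.log("Received", docs.length, "documents"))
  .catch((rejection) => console.error("Request failed:", describeRejection(rejection)));
```

Keeping the unwrapping in one place means every .catch() handler in the application can report failures the same way.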
The API has four methods:
- classify(docs, take=5): takes a set of documents and labels them. Internally, classification is done in two steps: first, a fast algorithm returns a list of candidate labels; then a slower, high-precision algorithm chooses among the take most probable labels and determines the document's specific layout. The optional take parameter is a number between 1 and 9 (default: 5).
- parse(docs, take=5, mode=undefined): takes a set of documents, then classifies and parses them. Along with the take parameter (same as in classify), it accepts an optional mode parameter, which can be one of the following:
  - undefined (default): handles every document as it is
  - "split": splits the document into pages and handles every page as a separate document
- guess(docs): takes a set of partially parsed documents with a similar layout and tries to infer the missing data. This method can be used to speed up data entry when the parse method fails.
- feed(docs): takes a set of documents and stores them for the next training cycles. Use this method with wrongly classified or wrongly parsed documents after the user has corrected the errors; it allows the application to learn and improve over time.
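The constraints on take and mode listed above can be checked on the client before a request is sent. The sketch below is an assumption-laden illustration: checkParseOptions is a hypothetical helper, not part of sbx-parse-api, and it treats take as an integer, which the description ("a number between 1 and 9") does not state explicitly. Whether the library performs its own validation is not covered here.

```javascript
// Hypothetical client-side check for the options accepted by parse().
// Assumption: take must be an integer; the README only says "a number
// between 1 and 9".
function checkParseOptions(take = 5, mode = undefined) {
  if (!Number.isInteger(take) || take < 1 || take > 9) {
    throw new RangeError("take must be an integer between 1 and 9");
  }
  if (mode !== undefined && mode !== "split") {
    throw new TypeError('mode must be undefined or "split"');
  }
  return { take, mode };
}

// Defaults mirror the documented signature parse(docs, take=5, mode=undefined).
const defaults = checkParseOptions();
const perPage = checkParseOptions(3, "split");
console.log(defaults.take, perPage.mode);
```

With a real parser instance, the validated values would then be passed along as parser.parse(docs, perPage.take, perPage.mode).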
The docs object is used in both requests and responses. It is always an array of dictionaries with the following fields:
- id: the document identifier; must be unique within the set
- buffer, bytes, or content: the content of the PDF document. buffer expects an ArrayBuffer, bytes expects an array of bytes, and content expects the PDF content encoded in base64. Present only in requests.
- labelId (optional): the document label identifier
  - parse() and classify(): if filled, the document will only be layout-classified
  - feed(): used to train the models
  - Response: filled with the best matching label
- detailedLabelId (optional): the document layout identifier
  - parse() and classify(): if filled, the document will not be classified
  - Response: filled with the best matching layout
- extractedData: the extracted data fields, as an array in which every item contains a name and a value field. Returned by parse() and guess(); should be filled for feed() and (for some documents) for guess().
- errors: an array containing the processing errors for the specific document. Storing errors per document lets the rest of the batch be processed successfully.
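To illustrate the request side of this structure, the sketch below builds a docs array that uses the content field and enforces unique ids. It assumes a Node.js environment (for Buffer); buildDocs and the input shape {id, data} are hypothetical names chosen for this example, not part of sbx-parse-api.

```javascript
// Hypothetical helper: builds a request docs array from raw PDF bytes.
// Assumes Node.js, where Buffer can encode bytes as base64.
function buildDocs(files) {
  const seen = new Set();
  return files.map(({ id, data }) => {
    // The id must be unique within the set.
    if (seen.has(id)) throw new Error(`Duplicate document id: ${id}`);
    seen.add(id);
    // content carries the PDF encoded in base64.
    return { id, content: Buffer.from(data).toString("base64") };
  });
}

// The four bytes below spell "%PDF", the start of a PDF file header;
// a real document would supply the full file contents.
const requestDocs = buildDocs([
  { id: "Doc_01", data: new Uint8Array([0x25, 0x50, 0x44, 0x46]) },
]);
console.log(requestDocs[0].content); // prints "JVBERg=="
```

In a browser, where an ArrayBuffer is usually already at hand, passing it through the buffer field instead would avoid the base64 step.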
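let docs = [];
let doc = {id: "Doc_01", content: "Base64ContentMustGoHere"};
docs.push(doc);
parser.parse(docs).then(function(parsedDocs){
// parsedDocs is an array of documents; extractedData is an array of {name, value} items
alert("The doc contains " + JSON.stringify(parsedDocs[0].extractedData));
}).catch(function(err){
// Rejections are structured as {"error": [Error Object]}
alert("Parsing failed: " + err.error);
});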
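Because errors are stored per document, a response can mix successfully parsed documents with failed ones. The following sketch separates the two and flattens extractedData into a plain object. summarizeParsedDocs is a hypothetical helper, and the sample response (label "invoice", field "total", the error text) is made-up data used only to exercise the code; the actual shape of the items in errors is not specified here.

```javascript
// Hypothetical helper: splits a parse() response into parsed and failed documents.
function summarizeParsedDocs(parsedDocs) {
  const parsed = [];
  const failed = [];
  for (const doc of parsedDocs) {
    // A non-empty errors array marks the document as failed.
    if (Array.isArray(doc.errors) && doc.errors.length > 0) {
      failed.push(doc.id);
      continue;
    }
    // Turn the array of {name, value} items into a name -> value object.
    const fields = {};
    for (const { name, value } of doc.extractedData || []) {
      fields[name] = value;
    }
    parsed.push({ id: doc.id, labelId: doc.labelId, fields });
  }
  return { parsed, failed };
}

// Made-up response data for illustration only.
const sampleResponse = [
  { id: "Doc_01", labelId: "invoice", extractedData: [{ name: "total", value: "42.00" }], errors: [] },
  { id: "Doc_02", errors: ["unreadable PDF"] },
];

const summary = summarizeParsedDocs(sampleResponse);
console.log(summary.parsed[0].fields.total); // prints 42.00
console.log("Failed documents:", summary.failed.join(", "));
```

With a real parser, the same helper could be applied inside the .then() handler of parser.parse(docs).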