This project investigates the usefulness and performance of client-side autocompletion using http requests over the web.
The datastructure used to provide this functionality is a distributed patricia tree. The patricia tree is created in a streaming fashion. The nodes of the tree are saved in linked data fragments. Adding nodes to the tree when written will parse the needed fragments, update them and write them back.
Nodes in the tree contain the data objects that were inserted with a given representation. Multiple different data objects can be contained in a single node if they have the same representation. These objects are stored as hydra:member elements in the jsonld fragments generated by the tree.
Nodes in the tree also contain suggestions (also provided as hydra:member elements in the jsonld fragments). These suggestions are data points from deeper in the tree, with the highest given score. These permit the client to stop querying the tree earlyer when an answer is found to the question in one of these suggestions.
The used ontology for the description of linked data trees can be found at https://w3id.org/tree# Querying the data fragments can be done using the linked data tree browser npm package 'ldtree-browser'.
github + node
git clone https://gitlab.ilabt.imec.be/lodi/RDF-autocomplete.git
npm install
node bin/main.js {data source folder} {data folder} {fragment folder} {tree file folder} {tree file name} {fragment size} {max fragment cache elements}
npm
npm install rdf-patricia-tree
const treeManager = require('rdf-patricia-tree');
The tree is managed with a TreeManager object .
This object allows the creation, reading and writing of a linked data tree.
The folder layout in these methods is as follows.
{sourceDirectory}/{dataLocation}/fragment{id}.jsonld<br/>
{sourceDirectory}/{treeLocation}/{treeFile}.jsonld<br/>
The sourceDirectory can be published online as base-folder for the data.
The interface of the treeManager object is:
// creating a tree using the treeManager object.
var tree = treeManager.createTree(sourceDirectory, dataFolder, maxCachedFragments = 10000, maxFragmentSize = 50, customBalancer = null)
// writing a tree using the treeManager object.
treeManager.writeTree(tree, treeLocation, treeFile);
// reading a tree using the treeManager object.
var readTree = treeManager.readTree(sourceDirectory, treeLocation, treeFile, dataLocation, maxCachedFragments)
The maxCachedFragments arguments defaults to 10000. This is the maximal amount of fragments that can be held in cache at the same time while creating the tree. Picking this number too low will result in frequent cache flushes, resulting in decreased performance.
The maxFragmentSize arguments defaults to 5O. This is the maximal amount of nodes that can be contained in a single tree fragment.
Reading a tree from a file or creating a new linked data tree returns a Tree object. Adding data to the tree is done by calling the following method on the tree object.
tree.addData(representation, data);
The representation argument is the searchstring on which the data will be placed in the Patricia Tree. Multiple different data objects can be stored in the same node in the tree. The data object can be any object except for undefined or null.
After all data has been added to the tree, the cache has to be flushed to make sure that all the latest data is written back into the tree files. We need to call:
tree.doneAdding();
after we have added all our data. This will ensure all the data has been written back from the cache.
npm install
node bin/testing_main.js {data source folder} {data folder} {fragment folder} {tree file folder} {tree file name} {fragment size} {max fragment cache elements}
This test file will test if all data has been added successfully, as well if there Statistics are printed in the file statistics.csv
Custom balancer classes can be passed to the treeManager.createTree() method. A custom balancer class needs to extend the lib/fragment_balancers/TreeBalancer.js class and follow its interface.
module.exports = class DefaultBalancer extends TreeBalancer{
balance(fragment);
}
The balance method is called when a fragment is bigger than the maximal fragment size provided. Care should be taken to only replace nodes to a different fragment, and to not remove connections between different nodes. Documentation on Node and Fragment methods can be found in the files /lib/Node.js and /lib/Fragment.js, or on the github page. https://gitlab.ilabt.imec.be/lodi/RDF-autocomplete/blob/master/lib/Node.js https://gitlab.ilabt.imec.be/lodi/RDF-autocomplete/blob/master/lib/Fragment.js
Inspiration can be taken from the default balancer implementation: https://gitlab.ilabt.imec.be/lodi/RDF-autocomplete/blob/master/lib/fragment_balancers/DefaultBalancer.js
The use of the root_node in the Fragment objects is only for balancing purposes, and only have to be updated if you want to use them during rebalancing.
Take note that this does not happen automatically, and should be manually set using fragment.set_root_node(node)
.