npm module for inferring the semantic types of tabular data fields. Includes support for inferring Frictionless Data Package JSON and incorporating the semantic inference results into that JSON.
There are 3 ways to use this package: semantic_infer (if you don't need data packages), datapackage_infer (for browser-client-based data package and semantic inference), and datapackage_infer_filesystem (for file-system-based data package and semantic inference).
Starting with version 1.2.0, you can use a config file to override default values.
semantic_infer only
semantic_infer takes a column name, an array of values, and a data type as input. It returns an object if a match is found; otherwise it returns 'None'.
Example usage:
const semanticinfer = require('semantic_infer');
const val_arr = ['V8r 1g7', 'V8X 5m2'];
const result = semanticinfer.semantic_infer.semantically_classify_field('Post_CD', val_arr, 'string', true);
// result:
// {
//   name: 'Postal code',
//   rdfType: '',
//   var_class: 'indirect_identifier'
// }
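Because the return value is either an object or the string 'None', callers usually need to branch on it. A minimal sketch of that check follows; `describeClassification` is a hypothetical helper written for illustration and is not part of semantic_infer, and the sample object simply mirrors the output shown above.

```javascript
// Hypothetical helper (not part of semantic_infer): turns the classifier's
// return value into a short human-readable description.
function describeClassification(result) {
  // Per the description above, a failed match is reported as the string 'None'.
  if (result === 'None' || result === null || typeof result !== 'object') {
    return 'no semantic type matched';
  }
  return `${result.name} (${result.var_class})`;
}

// Sample value shaped like the example output above.
const matched = { name: 'Postal code', rdfType: '', var_class: 'indirect_identifier' };
console.log(describeClassification(matched)); // Postal code (indirect_identifier)
console.log(describeClassification('None'));  // no semantic type matched
```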
datapackage_infer takes a data package with sample data in it and infers the fields, field data types (e.g., integer, string), and semantic types (e.g., postal code).
DataPackage rules:
- Datapackage object must have a "resources" array.
- Each resource must have a "name" field.
- Each resource must have a "data" or "path" field (but not both).
Semantic inference rules:
- Only resources with a "data" field will be semantically inferred.
- Providing a "SAVED_PATH_ATTR" attribute for data resources will result in the "data" field being replaced by a "path" field.
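The rules above can be checked before calling the package. The sketch below is one way to do that; `checkDescriptor` and `semanticallyInferable` are hypothetical helpers written for illustration (they are not part of semantic_infer), and the package's own validation may differ.

```javascript
// Hypothetical helper (not part of semantic_infer): checks a descriptor
// against the data package rules listed above and returns a list of problems.
function checkDescriptor(descriptor) {
  if (!descriptor || !Array.isArray(descriptor.resources)) {
    return ['descriptor must have a "resources" array'];
  }
  const problems = [];
  descriptor.resources.forEach((resource, i) => {
    const r = resource && typeof resource === 'object' ? resource : {};
    const label = r.name ? r.name : `#${i}`;
    if (!r.name) {
      problems.push(`resource ${label}: missing "name"`);
    }
    // Exactly one of "data" or "path" must be present.
    if (('data' in r) === ('path' in r)) {
      problems.push(`resource ${label}: needs exactly one of "data" or "path"`);
    }
  });
  return problems;
}

// Hypothetical helper: names of resources that carry inline "data" and are
// therefore candidates for semantic inference.
function semanticallyInferable(descriptor) {
  return descriptor.resources.filter((r) => 'data' in r).map((r) => r.name);
}

const sample = {
  resources: [
    { name: 'inline', data: [['age'], ['18']] },
    { name: 'on_disk', path: 'example.csv' },
    { name: 'both', data: [], path: 'x.csv' },
  ],
};
console.log(checkDescriptor(sample));
console.log(semanticallyInferable(sample));
```

Returning a list of problems rather than throwing on the first one lets a caller report every rule violation in a single pass.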
Example usage:
const semanticinfer = require('semantic_infer');
const descriptor = {
  resources: [
    {
      name: 'example',
      saved_path: 'example.csv',
      data: [
        ['height', 'age', 'name'],
        ['180', '18', 'V8R1G6'],
        ['192', '32', 'B4D 4G1'],
      ]
    }
  ]
};
const results = semanticinfer.datapackage_infer.infer_datapackage(descriptor, true);
results.then(function(result) {
  /* result:
  {
    "resources": [
      {
        "name": "example",
        "profile": "tabular-data-resource",
        "encoding": "utf-8",
        "schema": { "fields": [
            { "name": "height", "type": "integer", "format": "default" },
            { "name": "age", "type": "integer", "format": "default" },
            {
              "name": "name",
              "type": "string",
              "format": "default",
              "var_class": "indirect_identifier",
              "rdfType": ""
            }
          ],
          "missingValues": [ "" ]
        },
        "path": "example.csv"
      }
    ],
    "profile": "data-package"
  }
  */
});
datapackage_infer_filesystem infers data package JSON (including semantic inference) for all csv and txt files in the current directory and its sub-directories.
Example usage:
const semanticinfer = require('./datapackage_infer_filesystem');
You may optionally pass in an object whose properties are added to the data package as top-level attributes.
const source = {"sources": [{
  "title": "my source location",
  "path": "path/to/my/datafile"
}]};
Resulting data package:
{
  "resources": [ ... ],
  "profile": "data-package",
  "sources": [{
    "title": "my source location",
    "path": "path/to/my/datafile"
  }]
}
How to override default settings
Overriding the default settings is supported through the config npm module. Create a "config" directory in your project folder and, within it, a "default.json" file containing the settings you wish to override.
See the semantic_settings.js and datapackage_settings.js files for all the settings that can be overridden. If you override semantic settings, make sure each label has a corresponding pattern.
Example contents of default.json:
{"name":"Phone number","rdfType":"","var_class":"direct_identifier"},
{"name":"First name","rdfType":"","var_class":"direct_identifier"},
{"name":"Last name","rdfType":"","var_class":"direct_identifier"},
{"name":"Middle name","rdfType":"","var_class":"direct_identifier"},
{"name":"Full name","var_class":"direct_identifier"},
{"name":"Postal code","rdfType":"","var_class":"indirect_identifier"},
{"name":"Street address","rdfType":"","var_class":"direct_identifier"},
Optional calculation of the number of records in tabular resources
You can optionally calculate the number of records in a CSV file by setting DATA_PACKAGE_FILE_RECORD_NUM_RECORDS=1 in your config file. This works only in Linux environments.