JSON Table Schema
A utility library for working with JSON Table Schema in Javascript.
Version v0.2.0 introduces a renewed API in a non-backward-compatible manner. The previous version can be found here.
Table of Contents
- Schema - a javascript model of a JSON Table Schema with useful methods for interaction
- Field - a javascript model of a JSON Table Schema field
- Infer - a utility that creates a JSON Table Schema based on a data sample
- Validate - a utility to validate that a schema is valid according to the current spec
- Table
Goals
Contributing
Installation
npm install jsontableschema
The library requires `Promise` to work properly, so you need to make sure that `Promise` is available globally. You are free to choose any Promise polyfill.
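For example, a polyfill can be installed only when the environment lacks a native `Promise` (the `es6-promise` package here is just one common choice, not a requirement of this library):

```javascript
// Install a Promise polyfill only when the environment lacks a native one.
// `es6-promise` is one common choice; any spec-compliant polyfill works.
if (typeof Promise === 'undefined') {
  require('es6-promise').polyfill();
}

// From here on, Promise can be used as usual.
var ready = Promise.resolve('Promise is available globally');
```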
Components
Let's look at each of the components in more detail.
Schema
A model of a schema with helpful methods for working with the schema and supported data. Schema instances can be initialized with a schema source: either a URL to a JSON file or a JSON object. The schema is validated on initialization (see validate below) and will raise an exception if it is not a valid JSON Table Schema.
```javascript
var Schema = require('jsontableschema').Schema;

var model = new Schema('http://someurl.com/remote.json')
// or
var model = new Schema(JSON_OBJECT)
```

The instance always returns a `Promise`:

```javascript
model.then(function(schema) {
  // use the schema methods here
}).catch(function(error) {
  // uh oh, something went wrong
})
```
The following methods are available on `Schema` instances:

- `castRow(items, failFast = false, skipConstraints = false)` - converts the given items to the types of the current schema 1
- `descriptor` - JSON representation of the `Schema` description
- `fields` - returns an array of `Field` instances for the schema's fields
- `foreignKeys` - returns the foreign key property for the schema
- `getField(fieldName, index = 0)` - returns an instance of `Field` by field name (`fieldName`) 2
- `hasField(fieldName)` - checks if a field with the given name exists in the schema. Returns a boolean
- `headers` - returns an array of the schema headers
- `primaryKey` - returns the primary key field for the schema as an array
- `save(path)` - saves the schema JSON to the provided local `path`. Returns a `Promise`

1 If the `failFast` option is given, the first error encountered is raised; otherwise an array of errors is thrown (if any errors occur)

2 The optional `index` argument can be used as a positional argument if the schema has multiple fields with the same name
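The error-handling contract from note 1 can be sketched in plain JavaScript. This is only an illustration of the `failFast` behavior, not the library's actual implementation; `castCell` is a hypothetical stand-in for per-value casting:

```javascript
// Hypothetical per-value cast: integers parse or throw, everything
// else is passed through as a string.
function castCell(value, type) {
  if (type === 'integer') {
    var n = Number(value);
    if (!Number.isInteger(n)) throw new Error('Invalid integer: ' + value);
    return n;
  }
  return String(value);
}

// Sketch of the failFast contract for castRow (note 1 above).
function castRow(items, types, failFast) {
  var errors = [];
  var result = items.map(function (value, i) {
    try {
      return castCell(value, types[i]);
    } catch (e) {
      if (failFast) throw e;  // raise the first error encountered
      errors.push(e);         // otherwise collect every error...
      return null;
    }
  });
  if (errors.length) throw errors; // ...and throw them as an array
  return result;
}
```

With `failFast` off, `castRow(['x', 'y'], ['integer', 'integer'], false)` throws an array containing both errors rather than stopping at the first one.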
Field
A class representing a field in the `Schema`.

- `castValue(value, skipConstraints)` - returns the value cast against the field's type and constraints 1
- `constraints` - returns the constraints object for the field
- `format` - returns the format of the field
- `name` - returns the name of the field
- `required` - returns a boolean
- `testValue(value, skipConstraints)` - returns a boolean indicating whether the value can be cast against the field's type and constraints 1
- `type` - returns the type of the field

1 If `skipConstraints` is set to `false`, all constraints set for the field are checked while casting or testing the value
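The relationship between `castValue` and `testValue` can be sketched as follows. This is an illustration of the contract only, not the library's code; the minimal `castValue` below handles just a date field with a `minimum` constraint:

```javascript
// Minimal castValue sketch for a date field with a 'minimum' constraint.
function castValue(field, value, skipConstraints) {
  var date = new Date(value);
  if (isNaN(date.getTime())) throw new Error('Not a date: ' + value);
  if (!skipConstraints && field.constraints && field.constraints.minimum) {
    if (date < new Date(field.constraints.minimum)) {
      throw new Error('Below minimum: ' + value);
    }
  }
  return date;
}

// testValue is essentially a boolean wrapper around castValue.
function testValue(field, value, skipConstraints) {
  try {
    castValue(field, value, skipConstraints);
    return true;
  } catch (e) {
    return false;
  }
}

var birthday = {
  name: 'birthday',
  type: 'date',
  constraints: { required: true, minimum: '2015-05-30' }
};

testValue(birthday, '2014-05-29', true);  // true: constraints skipped
testValue(birthday, '2014-05-29', false); // false: below the minimum
```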
Field types
Data values can be cast to native Javascript types. Casting a value will check the value is of the expected type, is in the correct format, and complies with any constraints imposed by a schema.
```javascript
{
  'name': 'birthday',
  'type': 'date',
  'format': 'default',
  'constraints': {
    'required': true,
    'minimum': '2015-05-30'
  }
}
```
The following code will not raise an exception, despite the fact that our date is less than the `minimum` constraint of the field, because we do not check the constraints of the field descriptor:

```javascript
var dateType = field.castValue('2014-05-29')
```
The following example will raise an exception, because we set the skip-constraints flag to `false` and our date is less than allowed by the `minimum` constraint of the field. An exception will also be raised when trying to cast values in a non-date format, or empty values:

```javascript
try {
  var dateType = field.castValue('2014-05-29', false)
} catch(e) {
  // uh oh, something went wrong
}
```
Values that can't be cast will raise an Error
exception.
Casting a value that doesn't meet the constraints will raise an Error
exception.
Note: the unique
constraint is not currently supported.
Available types, formats and resultant value of the cast:
| Type | Formats | Casting result |
|---|---|---|
| string | default1, uri, email, binary | String |
| integer | default | Number |
| number | default, currency | Number2 |
| boolean | default | Boolean |
| array | default | Array |
| object | default | Object |
| date | default, any, fmt | Date object |
| time | default, any, fmt | Date object |
| datetime | default, any, fmt | Date object |
| geopoint | default, array, object | According to format3 |
| geojson | default, topojson | According to format3, 4 |
1 The `default` format may be omitted from the field descriptor

2 If the value has trailing zeros after the decimal point (e.g. 1.00), the cast returns `Number(1).toFixed(2)`, which is actually the String `'1.00'`

3 The default format returns a String

4 topojson is not implemented
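Note 2 can be verified directly: a `Number` cannot represent trailing zeros, so `toFixed` is used to preserve them, and `toFixed` always returns a string:

```javascript
// Why a value like 1.00 comes back as the String '1.00' (note 2):
var asNumber = Number('1.00');       // 1 -- trailing zeros are lost
var asFixed = Number(1).toFixed(2);  // '1.00' -- a String, not a Number

console.log(typeof asNumber); // 'number'
console.log(typeof asFixed);  // 'string'
```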
Infer
Given headers and data, `infer` will return a JSON Table Schema as a JSON object based on the data values. Given the data file, example.csv:

```
id,age,name
1,39,Paul
2,23,Jimmy
3,36,Jane
4,28,Judy
```
Call `infer` with headers and values from the data file:

```javascript
var parse = require('csv-parse');
var fs = require('fs');
var infer = require('jsontableschema').infer;

fs.readFile('/path/to/example.csv', function(err, data) {
  parse(data, function(error, values) {
    var headers = values.shift()
      , schema = infer(headers, values);
  });
});
```
The `schema` variable is now a JSON object:

```javascript
{
  fields: [
    {
      name: 'id',
      title: '',
      description: '',
      type: 'integer',
      format: 'default'
    },
    {
      name: 'age',
      title: '',
      description: '',
      type: 'integer',
      format: 'default'
    },
    {
      name: 'name',
      title: '',
      description: '',
      type: 'string',
      format: 'default'
    }
  ]
}
```
It is possible to provide additional options for building the JSON schema as the third argument of the `infer` function. It is an object with the following possible values:

- `rowLimit` (integer) - limit the number of rows used by `infer`
- `explicit` (boolean) - add `required` constraints to fields
- `primaryKey` (string, array) - add `primary key` constraints
- `cast` (object) - an object with cast instructions for types in the schema. For example:
```javascript
var parse = require('csv-parse');
var fs = require('fs');
var infer = require('jsontableschema').infer;

fs.readFile('/path/to/example.csv', function(err, data) {
  parse(data, function(error, values) {
    var headers = values.shift()
      , options = {
          rowLimit: 2,
          explicit: true,
          primaryKey: ['id', 'name'],
          cast: {
            string: { format: 'email' },
            number: { format: 'currency' },
            date: { format: 'any' }
          }
        }
      , schema = infer(headers, values, options);
  });
});
```
The `schema` variable will look as follows:

```javascript
{
  fields: [
    {
      name: 'id',
      title: '',
      description: '',
      type: 'integer',
      format: 'default',
      required: true
    },
    {
      name: 'age',
      title: '',
      description: '',
      type: 'integer',
      format: 'default',
      required: true
    },
    {
      name: 'name',
      title: '',
      description: '',
      type: 'string',
      format: 'default',
      required: true
    }
  ],
  primaryKey: ['id', 'name']
}
```
In this example:

- `rowLimit`: only two rows of values from `example.csv` will be used to determine the field types. This can be useful when the data in the CSV file is not normalized and the value types differ from row to row. Consider the following example:

```
id,age,name
1,39,Paul
2,23,Jimmy
3,thirty six,Jane
four,28,Judy
```

In this case, by limiting the rows to 2, we can build a schema structure with the correct field types.
- `cast`: every `string` value will be cast using the `email` format, every `number` will be tried as the `currency` format, and every `date` as the `any` format
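The effect of `rowLimit` on type inference can be sketched with a toy inferrer. This is only an illustration of the idea; the real `infer` is more sophisticated:

```javascript
// Toy type inference: a column is 'integer' only if every sampled
// value parses as an integer, otherwise it falls back to 'string'.
function inferTypes(rows, rowLimit) {
  var sample = rowLimit ? rows.slice(0, rowLimit) : rows;
  var types = [];
  for (var col = 0; col < sample[0].length; col++) {
    var allIntegers = sample.every(function (row) {
      return /^-?\d+$/.test(row[col]);
    });
    types.push(allIntegers ? 'integer' : 'string');
  }
  return types;
}

// Rows from the malformed example.csv above:
var rows = [
  ['1', '39', 'Paul'],
  ['2', '23', 'Jimmy'],
  ['3', 'thirty six', 'Jane'],
  ['four', '28', 'Judy']
];

inferTypes(rows);    // ['string', 'string', 'string'] -- malformed rows win
inferTypes(rows, 2); // ['integer', 'integer', 'string'] -- clean sample
```

Sampling only the first two (clean) rows recovers `integer` for `id` and `age`, which is exactly what `rowLimit: 2` achieves in the `infer` call above.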
Validate
Given a schema as a JSON object, `validate` returns a `Promise`, which resolves for a valid JSON Table Schema, or rejects with an array of errors.
```javascript
var validate = require('jsontableschema').validate;

var schema = {
  fields: [
    {
      name: 'id',
      title: '',
      description: '',
      type: 'integer',
      format: 'default'
    },
    {
      name: 'age',
      title: '',
      description: '',
      type: 'integer',
      format: 'default'
    },
    {
      name: 'name',
      title: '',
      description: '',
      type: 'string',
      format: 'default'
    }
  ]
};

validate(schema).then(function() {
  // schema is a valid JSON Table Schema
}).catch(function(errors) {
  // an array of validation errors
});
```
Note: `validate()` validates whether a schema is a valid JSON Table Schema according to the [specification](http://schemas.datapackages.org/json-table-schema.json). It does not validate data against a schema.
Table
A javascript model of a table (schema + source of data).

The instance always returns a `Promise`. If the schema object is not valid, the promise is rejected.
Source of data can be:
- an array of objects with values, representing the rows
- local CSV file
- remote CSV file (URL)
- readable stream
The following methods are available on `Table` instances:

- `iter(callback, failFast, skipConstraints)` 1, 2 - iterates through the dataset provided in the constructor and returns the converted data
- `read(keyed, extended, limit)` - reads part of or the full source into an array
  - `keyed`: each row looks like `{header1: value1, header2: value2}`
  - `extended`: each row looks like `[row_number, [header1, header2], [value1, value2]]`
    - Low-level usage: when you need all the information about a row from the stream but there is no guarantee that it is not malformed. For example, in goodtables you cannot use `keyed` because there is no guarantee that it will not fail - https://github.com/frictionlessdata/goodtables-py/blob/master/goodtables/inspector.py#L205
    - High-level usage: useful when you need the row plus its row number. The row number is the exact row number of the source stream row, not a counter. So if you skip the first 9 rows using `skipRows`, the first row number from `iter(extended=True)` will be 10. It is not possible to get this information at the client-code level any other way - the `iter()` index in that case would start from 0.
  - `limit`: limit the number of rows returned to `limit`
- `save(path)` - saves the source to a local file in CSV format with a `,` (comma) delimiter. Returns a `Promise`

1 If `failFast` is set to `true`, the first error encountered is raised; otherwise an array of errors is thrown (if any errors occur). Default is `false`

2 If `skipConstraints` is set to `true`, the constraints set for the field are not checked while casting or testing the value. Default is `false`
```javascript
var jts = require('jsontableschema');
var Table = jts.Table;

var model = new Table(SCHEMA, SOURCE);

model.then(function(table) {
  table.iter(function(items) {
    // ... do something with converted items
    // the iter method converts values row by row from the source
  });
}).catch(function(error) {
  // schema is not valid, the promise was rejected
});
```
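The `extended` row shape and its source-exact row numbering (described for `read` above) can be sketched like this. It is an illustration of the format only, not the library's code; `skipRows` here is a hypothetical parameter:

```javascript
// Sketch of the extended row format: [row_number, headers, values].
// Row numbers refer to positions in the source, so skipped rows still
// advance the counter: skip 9 rows and the first yielded number is 10.
function extendedRows(headers, rows, skipRows) {
  skipRows = skipRows || 0;
  return rows.slice(skipRows).map(function (values, i) {
    return [skipRows + i + 1, headers, values];
  });
}

var headers = ['id', 'name'];
var rows = [['1', 'Paul'], ['2', 'Jimmy'], ['3', 'Jane']];

extendedRows(headers, rows, 1);
// [[2, ['id', 'name'], ['2', 'Jimmy']],
//  [3, ['id', 'name'], ['3', 'Jane']]]
```

Note how skipping one row makes the first emitted row number 2, matching the source position rather than a zero-based iteration index.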
Goals
- A core set of utilities for working with JSON Table Schema
- Use in other packages that deal with actual validation of data, or other 'higher level' use cases around JSON Table Schema (e.g. Tabular Validator)
- Be 100% compliant with the JSON Table Schema specification (we are not there yet)
Contributing
Please read the contribution guidelines.
Thanks!