json-grammar - a grammar-based validator for JSON structures
JSON Grammar, or JSG, is a language for describing the structure of JSON documents. It can be used for documentation (describing what a service or tool consumes or emits) and for validation (testing conformance of some data to that description).
See a simple online demo.
Language
A JSG schema is composed of objects, rules and values.
Objects are represented by a production name followed by a "{", some named members or rule references, and a "}".
These describe JSON objects like { "street":"Elm", "number":"123b" }.
A member is composed of an attribute name, a ":", and a type: { "street":NAME, "number":NUMBER }.
A type can be a constant, a value pattern, a rule name, or a list of types.
By convention, value patterns are named in ALL CAPS.
JSON Grammar | matching JSON | |
---|---|---|
doc { status:"ready" } |
|
|
doc { street:NAME no:NUM } NAME : .*; NUM : [0-9]+[a-e]?; |
|
|
doc { street:(NAME|"*"|TEMPLATE) } NAME : .*; TEMPLATE : '{' .* '}'; |
|
|
doc { street:nameOrTemplate } nameOrTemplate = NAME | "*" | TEMPLATE NAME : .*; TEMPLATE : '{' .* '}'; |
|
|
doc { street:[nameOrTemplate{2,}] } nameOrTemplate = NAME | "*" | TEMPLATE NAME : .*; TEMPLATE : '{' .* '}'; |
|
A schema can be composed with no rules but rule names can help with:
- factoring common patterns
- shortening member types
- applying semantic names to patterns.
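To make the relationship between members and value patterns concrete, here is a minimal JavaScript sketch that checks an object by hand against the second grammar in the table, doc { street:NAME no:NUM }. It is not how JSG is implemented. The helper name matchesDoc is hypothetical, and the sketch assumes that a value pattern must match the whole string and that every declared member is required.

```javascript
// Hand-written stand-ins for  NAME : .*;  and  NUM : [0-9]+[a-e]?;
// Assumption: a value pattern must match the entire string, hence ^...$.
const NAME = /^.*$/;
const NUM = /^[0-9]+[a-e]?$/;

// Hypothetical helper mirroring  doc { street:NAME no:NUM }
function matchesDoc(obj) {
  const members = { street: NAME, no: NUM };
  // Assumption: every declared member must be present and match its pattern.
  const declaredOk = Object.entries(members).every(
    ([name, pattern]) => typeof obj[name] === "string" && pattern.test(obj[name])
  );
  // Undeclared members are rejected.
  const noExtras = Object.keys(obj).every((k) =>
    Object.prototype.hasOwnProperty.call(members, k)
  );
  return declaredOk && noExtras;
}

console.log(matchesDoc({ street: "Elm", no: "123b" })); // true
console.log(matchesDoc({ street: "Elm", no: "b123" })); // false
```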
Values
Values are represented by a terminal name followed by a ":", a regular pattern (cf. lex), and a ";". They can reference each other (but not circularly), allowing a value to be composed of other values. The syntax is reminiscent of EBNF or W3C language specifications, e.g.:
Value pattern | matching JSON | |
---|---|---|
'@' START+ ('-' MIDCHAR+)* START : [a-zA-Z]; MIDCHAR : START | [0-9]; NUM : [0-9]+[a-e]?; |
|
Code points in values can be specified by:
- a symbol in a quoted string ('-', "x-", '"'),
- a symbol in a character range ([a-z]),
- "\u" followed by a hexadecimal numeric Unicode code point. These can appear in quoted strings and character ranges.
If we had a disdain for writing the letter 'a' and the symbol '@', we could write the above value pattern as:
\u0040 START+ ('-' MIDCHAR+)* START : [\u0061-z\u0041-Z]; MIDCHAR : START | [0-9]; NUM : [0-9]+[\u0061-e]?;
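As a rough cross-check that the escaped form describes the same strings, the two value patterns can be approximated with JavaScript regular expressions. This is only an analogy in JavaScript's regex syntax, not JSG syntax, and the sample strings are made up for illustration.

```javascript
// JavaScript approximations of the two value patterns.
// START : [a-zA-Z];  MIDCHAR : START | [0-9];
const plain = /^@[a-zA-Z]+(-[a-zA-Z0-9]+)*$/;
// The same pattern with 'a', 'A' and '@' written as \u escapes.
const escaped = /^\u0040[\u0061-z\u0041-Z]+(-[\u0061-z\u0041-Z0-9]+)*$/;

// Illustrative inputs: the first three should match, the last three should not.
const samples = ["@en", "@en-US", "@zh-Hant-1", "en", "@-US", "@en-"];
for (const s of samples) {
  console.log(s, plain.test(s), escaped.test(s));
}
```

Both expressions give the same answer for every sample, which is the point of the \u notation: it changes how a code point is written, not what it matches.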
Directives
- .IGNORE takes a list of properties to globally ignore.
- .TYPE takes a single property to act as a type discriminator which must match the production name.
JSON Grammar | Result | JSON
---|---|---
doc { a:STRING } STRING=".*" | passes | { "a":"hi" }
doc { a:STRING } STRING=".*" | fails | { "type":"doc", "a":"hi" }
.IGNORE type; doc { a:STRING } STRING=".*" | passes | { "type":"doc", "a":"hi" }
doc { a:STRING, type:STRING } STRING=".*" | passes | { "type":"doc", "a":"hi" }
.TYPE type; doc { a:STRING } STRING=".*" | passes | { "type":"doc", "a":"hi" }
.TYPE type; doc { a:STRING } STRING=".*" | fails | { "type":"docXXX", "a":"hi" }

You can push the .TYPE property into each object if you want (and have to if it's not universal).
Error reports on schemas with a .TYPE directive tend to be terser, as failing a discriminator check shortcuts the tests of all the other object properties.
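The behaviour in the directives table can be modelled with a small JavaScript sketch. This is not JSG's code: checkObject and its options (ignore, typeProp) are hypothetical names, values are not pattern-checked because STRING accepts anything, and treating a missing discriminator as a failure is an assumption.

```javascript
// Hypothetical model of one object production plus the two directives.
// members: declared property names; ignore / typeProp stand for .IGNORE / .TYPE.
function checkObject(production, members, obj, { ignore = [], typeProp = null } = {}) {
  // .TYPE: the discriminator must equal the production name.
  // Returning early here is what keeps the error report short.
  // Assumption: a missing discriminator also counts as a mismatch.
  if (typeProp !== null && obj[typeProp] !== production) {
    return `expected ${typeProp} to be "${production}"`;
  }
  for (const key of Object.keys(obj)) {
    const allowed =
      members.includes(key) || ignore.includes(key) || key === typeProp;
    if (!allowed) return `unexpected property "${key}"`;
  }
  for (const m of members) {
    if (!(m in obj)) return `missing property "${m}"`;
  }
  return null; // null means the object passes
}

// The six rows of the table, in order.
const withType = { type: "doc", a: "hi" };
const rows = [
  checkObject("doc", ["a"], { a: "hi" }),
  checkObject("doc", ["a"], withType),
  checkObject("doc", ["a"], withType, { ignore: ["type"] }),
  checkObject("doc", ["a", "type"], withType),
  checkObject("doc", ["a"], withType, { typeProp: "type" }),
  checkObject("doc", ["a"], { type: "docXXX", a: "hi" }, { typeProp: "type" }),
];
rows.forEach((r) => console.log(r === null ? "passes" : `fails: ${r}`));
```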
Contributing
All PRs welcome. Please run tests first:
Testing
JSG has a set of built-in tests. It also tests JSON structures from the ShEx and SPARQL.js repositories. This presumes specific relative paths between the places where these repositories are checked out. You can accomplish this by checking everything out under one directory, e.g. github:
mkdir github
cd github
git clone git@github.com:shexSpec/shexTest shexSpec/shexTest
git clone git@github.com:RubenVerborgh/SPARQL.js RubenVerborgh/SPARQL.js
# now to get JSG, initialize it and run the tests:
git clone git@github.com:ericprud/jsg ericprud/jsg
cd ericprud/jsg
npm install
npm run test-all
test/test.js has an easy way to enter passing and failing tests, e.g.
["ShExJ.jsg", "empty.json", true],
["ShExJ.jsg", "bad-noType.json", "type"],
["ShExJ.jsg", "bad-wrongType.json", false],
which tests that empty.json passes, bad-noType.json fails with an error mentioning "type", and bad-wrongType.json fails for some reason.
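One way to read the third element of each entry is sketched below in JavaScript. This is an illustration of the convention described above, not the code in test/test.js. The function meetsExpectation and the { passed, error } result shape are hypothetical, and the error strings are invented for the example.

```javascript
// Hypothetical interpretation of the third element of a test entry.
// result: { passed: boolean, error: string } from some validator (assumed shape).
function meetsExpectation(expected, result) {
  if (expected === true) return result.passed;   // data must validate
  if (expected === false) return !result.passed; // data must fail, for any reason
  // A string: data must fail and the error must mention that string.
  return !result.passed && result.error.includes(expected);
}

// Simulated validator outcomes for the three entries shown above.
const outcomes = [
  [true, { passed: true, error: "" }],
  ["type", { passed: false, error: "missing property type" }],
  [false, { passed: false, error: "anything at all" }],
];
for (const [expected, result] of outcomes) {
  console.log(JSON.stringify(expected), meetsExpectation(expected, result));
}
```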