Schema Analyzer
An Open Source joint by Dan Levy ✨
Analyze column type & size summary from any input JSON array
Schema Analyzer is the core library behind Dan's Schema Generator.
Features
The primary goal is to support any input JSON/CSV and infer as much as possible. More data will generally yield better results.
- Heuristic type analysis for arrays of objects.
- Browser-based (local, no server necessary)
- Automatic type detection:
- ID - Identifier column, by name and unique Integer check (detects BigInteger)
- ObjectId (MongoDB's 96 bit/12 Byte ID. 32bit timestamp + 24bit MachineID + 16bit ProcessID + 24bit Counter)
- UUID/GUID (Common 128 bit/16 Byte ID. Stored as a hex string, dash delimited in parts: 8, 4, 4, 4, 12)
- Boolean (detects obvious strings
true
,false
,Y
,N
) - Date (Smart detection via comprehensive regex pattern)
- Timestamp (integer, number of milliseconds since unix epoch)
- Currency (62 currency symbols supported)
- Float (w/ scale & precision measurements)
- Number (Integers)
- Null (sparse column data helps w/ certain inferences)
- Email (falls back to string)
- String (big text and varchar awareness)
- Array (includes min/max/avg length)
- Object
- Detects column size minimum, maximum and average
- Includes data points at the 30th, 60th and 90th percentiles (for detecting outliers and enum types!)
- Handles some error/outliers
- Quantify # of unique values per column
- Identify
enum
Fields w/ Values - Identify
Not Null
fields - Nested data structure & multi-table relational output.
Getting Started
npm install schema-analyzer
: TypeSummary
Preview Analysis Results
What does this library's analysis look like?
It consists of 3 key top-level properties:
totalRows
- # of rows analyzed.fields: FieldTypeSummary
- a map of field names with all detected types (includes meta-data for each type detected, with possible overlaps. e.g. anEmail
is also aString
,"42"
is a String and Number)
Review the raw results below
Details about each field can be found below.
Sample input dataset for the example results above
id | name | role | createdAt | accountConfirmed | |
---|---|---|---|---|---|
1 | Eve | poweruser | eve@example.com |
01/20/2020 | undefined |
2 | Alice | user | ali@example.com |
02/02/2020 | true |
3 | Bob | user | robert@example.com |
12/31/2019 | true |
4 | Elliot Alderson | admin | falkensmaze@protonmail.com |
01/01/2001 | false |
5 | Sam Sepiol | admin | falkensmaze@hotmail.com |
9/9/99 | true |
AggregateSummary
Numeric and String types include a summary of the observed field sizes:
Number & String Range Object Details
Properties
min
the minimum number or string lengthmax
the maximum number or string lengthmean
the average number or string lengthpercentiles[25th, 33th, 50th, 66th, 75th, 99th]
values from theNth
percentile number or string length
Percentile is based on input data, as-is with out sorting.
Length Range Data
Range data for the length
of a String
field type:
"rank": 11 "count": 5 "length": "min": 15 "mean": 194 "max": 26 "p25": 15 "p33": 15 "p50": 18 "p66": 23 "p75": 23 "p99": 26
This is useful for defining strict length limits or minimums, for example as SQL servers often require..
Range data for a Date
fields value
:
"rank": 4 "count": 4 "value": "min": "2001-01-01T00:00:00.000Z" "mean": "2015-04-14T18:00:00.000Z" "max": "2020-02-02T00:00:00.000Z" "p25": "2020-02-02T00:00:00.000Z" "p33": "2020-02-02T00:00:00.000Z" "p50": "2019-12-31T00:00:00.000Z" "p66": "2019-12-31T00:00:00.000Z" "p75": "2001-01-01T00:00:00.000Z" "p99": "2001-01-01T00:00:00.000Z"
Notes
We recommend you provide at least 100+ rows. Accuracy increases greatly with 1,000 rows.
The following features require a certain minimum # of records:
- Enumeration detection.
- 100+ Rows Required.
- Number of unique values must not exceed 20 or 5% of the total number of records. (100 records will identify as Enum w/ 5 values. Up to 20 are possible given 400 or 1,000+.)
Not Null
detection.- where rowCount === field count
Full List of Detected Types
Unknown
ObjectId
UUID
Boolean
Date
Timestamp
Currency
Float
Number
Email
String
Array
Object
Null