Normalization
The purpose of the normalization modules is to take data as it is returned by an API call to a source and extract the fields we are interested in.
The Schema
{
  id: String,
  ts: Number, // unix milliseconds
  text: String,
  href: String,
  is_bad: Boolean,
  direction: String,
  source: String,
  stream: String,
  _raw: String,
  _version: String,
  user_info: {
    user_id: String, // User ID reported by our clients
    source_user_id: String, // User ID generated by the source of this event
    name: String,
    email: String,
    ip: String,
    avatar: String,
    nickname: String,
    phone: String,
    city: String,
    continent_code: String,
    country: String,
    postal_code: String,
    region: String,
    address_line1: String,
    address_line2: String,
    timezone: String,
    country_code: String,
    user_agent: String,
    is_employee: Boolean,
    title: String, // Title of the person at their company (ex: VP Product)
    birthday: String,
    company: {
      name: String,
      id: String,
      industry: String,
      employee_count: Number,
      plan: String // The plan this company is on (ex: "premium", "enterprise")
    },
    created_at: Number, // Unix ms
    age: Number,
    gender: String,
    website: String, // The user's website if they have one and you know it
    custom_attributes: String // A stringified JSON blob of custom user attributes
  }
}
id
A unique identifier for an event in Windsor. It is usually obtained by concatenating the name of the source with the event's unique identifier within that source. This ensures that even if IDs from different sources collide, each event still has a unique identifier in Windsor.
Example: event.id
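A minimal sketch of the concatenation described above. The helper name `makeEventId` and the `:` separator are assumptions for illustration, not the normalizers' actual convention; the point is that the same source ID from two different sources yields two different Windsor IDs.

```javascript
// Hypothetical helper: build a Windsor event ID from the source name and the
// event's ID within that source. The ":" separator is an assumption.
function makeEventId(source, sourceEventId) {
  if (sourceEventId === undefined || sourceEventId === null) {
    throw new Error(`Event from ${source} has no ID`);
  }
  return `${source}:${sourceEventId}`;
}

// The same raw ID from two sources does not collide once prefixed.
console.log(makeEventId("sentry", 42)); // sentry:42
console.log(makeEventId("mixpanel", 42)); // mixpanel:42
```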
ts
The time at which the event was generated (not when it was last updated), in Unix milliseconds.
Example: event.date_created
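Sources report creation times in different formats, so a normalizer usually has to convert to Unix milliseconds. This sketch assumes the raw value is either an ISO 8601 string or a number whose unit the caller states; `toUnixMs` is a hypothetical helper.

```javascript
// Convert a raw source timestamp to Unix milliseconds.
// Assumption: strings are ISO 8601; numbers are ms unless numericUnit is "s".
function toUnixMs(value, numericUnit = "ms") {
  if (typeof value === "number") {
    return numericUnit === "s" ? value * 1000 : value;
  }
  const ms = Date.parse(value);
  if (Number.isNaN(ms)) {
    throw new Error(`Unparseable timestamp: ${value}`);
  }
  return ms;
}

console.log(toUnixMs("2020-01-01T00:00:00Z")); // 1577836800000
console.log(toUnixMs(1577836800, "s")); // 1577836800000
```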
text
The text the user will see when viewing this event. Usually some custom text combined with a few fields from the object.
Example: `${event.direction ? "Up" : "Down"}vote for board ${event.board.name}`
href
A link to the event, if the source provides one or one can be generated from the data provided.
Example: event.board.url
is_bad
true if this event is an error, false if it is not, and null if there is no way to determine whether it is an error.
Example: Boolean(event.errorCode)
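The example `Boolean(event.errorCode)` always yields true or false, so on its own it never produces the null case. A minimal sketch of the three-way rule, assuming (hypothetically) that a source without error information simply omits the `errorCode` field:

```javascript
// Three-valued is_bad: true, false, or null when the source gives no signal.
// Assumption: `errorCode` is absent (not just falsy) when status is unknown.
function isBad(event) {
  if (!Object.prototype.hasOwnProperty.call(event, "errorCode")) {
    return null; // no way to tell whether this event is an error
  }
  return Boolean(event.errorCode);
}

console.log(isBad({ errorCode: 500 })); // true
console.log(isBad({ errorCode: 0 })); // false
console.log(isBad({})); // null
```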
direction
"inbound" if the event was triggered by a user action (ex: any Segment.track event).
"outbound" if the event was triggered by the company (ex: sending users a marketing email via Sendgrid).
null if ambiguous (ex: a person opening an email, or any Segment.identify event).
Example: "inbound"
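The rules above can be written as small per-source mapping functions. This sketch assumes Segment payloads expose the call type in `type` and Sendgrid webhook payloads expose the event name in `event`; the function names and the choice to treat every unlisted type as ambiguous (null) are assumptions, not the existing normalizers' behavior.

```javascript
// Segment: track calls are user actions; identify (and, by assumption,
// every other call type) is ambiguous.
function segmentDirection(event) {
  return event.type === "track" ? "inbound" : null;
}

// Sendgrid: a delivered email was sent by the company; an open is ambiguous.
// Assumption: all other Sendgrid event names are treated as ambiguous too.
function sendgridDirection(event) {
  if (event.event === "processed" || event.event === "delivered") {
    return "outbound";
  }
  return null;
}

console.log(segmentDirection({ type: "track" })); // inbound
console.log(sendgridDirection({ event: "open" })); // null
```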
source
The source this event comes from.
Example: Sentry
stream
The endpoint of the source that this event came from.
Example: conversations
_raw
The raw event, stringified.
Example: JSON.stringify(event)
_version
The version of the normalizer that produced this event.
Example: require("./package").version
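The bookkeeping fields `source`, `stream`, `_raw` and `_version` are the same for every stream, so one option is to fill them in with a shared helper. `withMetadata` is a hypothetical name; the version is passed in so the sketch runs without a `package.json` (in the module it would come from `require("./package").version`).

```javascript
// Hypothetical helper that adds the fields every normalized event shares.
function withMetadata(fields, rawEvent, { source, stream, version }) {
  return {
    ...fields,
    source,
    stream,
    _raw: JSON.stringify(rawEvent), // keep the original payload for debugging
    _version: version, // which normalizer version produced this event
  };
}

const normalized = withMetadata({ id: "sentry:1" }, { level: "error" }, {
  source: "Sentry",
  stream: "issues",
  version: "1.2.3",
});
console.log(normalized._raw); // {"level":"error"}
```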
user_info
All the data we can find about the user from an event. The subfields are self-describing.
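A sketch of building `user_info` from a raw event. The raw field names (`userId`, `distinctId`, `countryCode`, ...) are hypothetical stand-ins for whatever a given source uses, and dropping missing fields instead of setting them to null is a design choice assumed here, not one the schema prescribes.

```javascript
// Remove keys whose value the source did not provide.
function compact(obj) {
  return Object.fromEntries(
    Object.entries(obj).filter(([, value]) => value !== undefined && value !== null)
  );
}

// Hypothetical mapping from a raw payload to a subset of user_info fields.
function buildUserInfo(raw) {
  const company = raw.company
    ? compact({ name: raw.company.name, id: raw.company.id, plan: raw.company.plan })
    : undefined;
  return compact({
    user_id: raw.userId, // ID reported by our client
    source_user_id: raw.distinctId, // ID generated by the source
    name: raw.name,
    email: raw.email,
    ip: raw.ip,
    country_code: raw.countryCode,
    company,
  });
}

console.log(buildUserInfo({ distinctId: "d1", email: "a@example.com" }));
```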
Architecture
index.js
The export of this directory is a function that takes a source and returns a stream that knows how to normalize all events from that source. The stream expects every event to arrive as a RECORD message in the Singer format, and it also emits events as RECORD messages in the Singer format.
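Singer messages are newline-delimited JSON, so the stream has to split its input into lines before it can look at each message. A minimal sketch of that shape using Node's `stream.Transform`; `createNormalizerStream`, the `sources` lookup object and the injected `handleLine` callback are hypothetical names, not the module's real exports.

```javascript
const { Transform } = require("stream");
const { StringDecoder } = require("string_decoder");

// Hypothetical index.js export. `sources` maps a source name to its
// stream -> normalizer map; `handleLine(line, normalizers)` returns the
// output line for one input line, or null to drop it.
function createNormalizerStream(source, sources, handleLine) {
  const normalizers = sources[source];
  if (!normalizers) {
    throw new Error(`No normalizers for source "${source}"`);
  }
  const decoder = new StringDecoder("utf8"); // safe across split multibyte chars
  let buffered = ""; // text after the last newline, not yet a full message

  const emit = (stream, line) => {
    const out = handleLine(line, normalizers);
    if (out !== null) stream.push(out + "\n");
  };

  return new Transform({
    transform(chunk, _encoding, callback) {
      buffered += decoder.write(chunk);
      const lines = buffered.split("\n");
      buffered = lines.pop(); // the last piece may be an incomplete line
      for (const line of lines) {
        if (line.trim()) emit(this, line);
      }
      callback();
    },
    flush(callback) {
      buffered += decoder.end();
      if (buffered.trim()) emit(this, buffered);
      callback();
    },
  });
}

module.exports = createNormalizerStream;
```

Wiring it up might look like `process.stdin.pipe(createNormalizerStream("mixpanel", sources, handleLine)).pipe(process.stdout)`, with `sources` and `handleLine` supplied by the rest of the module.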
The stream
If the stream encounters a SCHEMA message, it ignores that message and prints the standard schema instead.
If the stream encounters a RECORD message, it checks whether it knows about the stream the record came from. If it does, it normalizes the record; otherwise, it prints a WARNING message to the error stream and discards the message.
If the stream encounters any other line (a STATE message, for example), it ignores the line.
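The per-line rules above can be sketched as a pure function, which keeps them testable apart from any stream plumbing. `makeLineHandler` is a hypothetical name; emitting the standard schema with `key_properties: ["id"]`, keeping the original stream name on output messages, and silently dropping non-JSON lines are all assumptions of this sketch.

```javascript
// Hypothetical factory for the per-line handler. `standardSchema` is the
// normalized JSON Schema; `warn` receives warnings (stderr by default).
function makeLineHandler(standardSchema, warn = (msg) => console.error(msg)) {
  return function handleLine(line, normalizers) {
    let message;
    try {
      message = JSON.parse(line);
    } catch (err) {
      return null; // assumption: a line that is not JSON is ignored
    }
    if (message === null || typeof message !== "object") return null;

    if (message.type === "SCHEMA") {
      // Ignore the source's schema and emit the standard one in its place.
      return JSON.stringify({
        type: "SCHEMA",
        stream: message.stream,
        schema: standardSchema,
        key_properties: ["id"], // assumption
      });
    }

    if (message.type === "RECORD") {
      const known = Object.prototype.hasOwnProperty.call(normalizers, message.stream);
      if (!known) {
        warn(`WARNING: no normalizer for stream "${message.stream}", dropping record`);
        return null;
      }
      const normalize = normalizers[message.stream];
      return JSON.stringify({ ...message, record: normalize(message.record) });
    }

    return null; // STATE messages and any other lines are ignored
  };
}

module.exports = makeLineHandler;
```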
Sources
Each source has its own directory. The export of a source's directory is a map from each supported stream of that source to a function that normalizes events from that stream, i.e. puts them into the format specified in The Schema above. The stream references these maps when it encounters events.
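A sketch of what one source directory's export could look like, using the raw field names from the examples in The Schema section (`id`, `date_created`, `direction`, `board`). The source name `example-source`, the single `votes` stream, the placeholder version string and the choices for `is_bad` and `user_info` are all hypothetical.

```javascript
// Hypothetical sources/example-source/index.js with one supported stream.
const SOURCE = "example-source"; // placeholder source name
const NORMALIZER_VERSION = "0.0.0"; // placeholder for the package version

function normalizeVote(event) {
  return {
    id: `${SOURCE}:${event.id}`,
    ts: Date.parse(event.date_created), // assumes an ISO 8601 string
    text: `${event.direction ? "Up" : "Down"}vote for board ${event.board.name}`,
    href: event.board.url || null,
    is_bad: null, // assumption: vote events carry no error information
    direction: "inbound", // a vote is a user action
    source: SOURCE,
    stream: "votes",
    _raw: JSON.stringify(event),
    _version: NORMALIZER_VERSION,
    user_info: {},
  };
}

// Map from each supported stream name to its normalizer.
module.exports = { votes: normalizeVote };
```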
Schema
A JSON Schema version of the normalized schema (see The Schema above).
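A partial sketch of the top-level properties in JSON Schema (draft-07). It is not the module's actual schema file; allowing `"null"` for `is_bad`, `direction` and `href` follows the field descriptions above, which say those fields can be null or unavailable.

```javascript
// Partial, hypothetical JSON Schema for a normalized event (top level only).
const normalizedEventSchema = {
  $schema: "http://json-schema.org/draft-07/schema#",
  type: "object",
  properties: {
    id: { type: "string" },
    ts: { type: "number", description: "Unix milliseconds" },
    text: { type: "string" },
    href: { type: ["string", "null"] },
    is_bad: { type: ["boolean", "null"] },
    direction: { type: ["string", "null"], enum: ["inbound", "outbound", null] },
    source: { type: "string" },
    stream: { type: "string" },
    _raw: { type: "string" },
    _version: { type: "string" },
    user_info: { type: "object" },
  },
};

module.exports = normalizedEventSchema;
```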
Explain
explain.js is a script meant to help debug normalizers. For instance, by running `doppler local "node ./explain --source mixpanel"`, you can see Mixpanel events from a mix of different teams with the current normalizer applied.
explain.js accepts a source argument (required; the source whose events you want to see), a team argument (to see events for a specific team), and a stream argument (to see events from only one stream).
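A sketch of how these arguments could be parsed with Node's built-in `util.parseArgs` (Node 18.3+ or 16.17+). Only `--source` is confirmed by the command above; the `--team` and `--stream` flag spellings and both function names are assumptions. The stream filter works on Singer RECORD messages, which carry a `stream` field.

```javascript
const { parseArgs } = require("util");

// Hypothetical flags: only --source is confirmed; --team and --stream assumed.
function parseExplainArgs(argv) {
  const { values } = parseArgs({
    args: argv,
    options: {
      source: { type: "string" },
      team: { type: "string" },
      stream: { type: "string" },
    },
  });
  if (!values.source) {
    throw new Error("--source is required (e.g. --source mixpanel)");
  }
  return { source: values.source, team: values.team, stream: values.stream };
}

// Keep RECORD messages, restricted to one stream when --stream is given.
function selectRecords(messages, { stream }) {
  return messages.filter((m) => m.type === "RECORD" && (!stream || m.stream === stream));
}

const options = parseExplainArgs(["--source", "mixpanel", "--stream", "events"]);
console.log(options.source, options.stream); // mixpanel events
```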