Filtering Introduction.

This project contains all the logic for processing filtering in EPCC. This document provides an introduction into how this conceptually works.

A problem we often have is that we have a string, and want to classify it as being valid or invalid. For instance here are some things you might care about strings:

Does a string start with a capital letter.
Is a string a valid email address.
Is a string a palindrome (written the same forwards and backwards like redivider, civic, radar, level)
Is a string a valid Java program.

There are different techniques that can be used to check the above, including writing code directly and using a regular expression. But some things are more complicated, like case #3 and #4.

In the case of the last one, you would likely define a grammar which is a set of rules that say what a valid string looks like. A common technique is to proceed in two phases:

Lexing - Converting the string into chunks of things called tokens, each token representing a some part of the string (e.g., in a java program you might have a token for each keyword). More details are in the lexer.js file
Parsing - Write rules for checking the output of the lexer. We use a recursive decent parser, which means that each grammar rule calls a function that checks for a match and may recursively call other functions.

The grammar that we are attempting to parse in Augmented Backus-Naur Form (https://www.rfc-editor.org/rfc/rfc5234) is the following:

(Some of the way this is structured is to make the code easier to read)

root = or_expression eof

# Or support
or_expression = and_expression *( "|" and_expression )  

# And support
and_expression = group_expression  *( ":" group_expression )
and_expression =/ group_expression  *( "," group_expression ) // Legacy chaining operator

# Parenthesis support
group_expression =/ search_term
group_expression =/ "(" or_expression ")"


search_term = binary_term 
search_term =/ unary_term 
search_term =/ vararg_term 

binary_term = binary_operator "(" literal "," literal ")"
unary_term = unary_operator "(" literal ")"

# The code slightly deviates from this to handle weird cases for backwards compatability reasons, I hate
vararg_term = vararg_operator "(" literal "," literal *( "," literal ) ")"

# Everything below is largely handled as a function of the lexer
binary_operator = "eq" / "like" / "gt" / "ge " / "lt" / "le" / "text"
unary_operator = "is_null"
vararg_operator = "in"

; these literal represent standard unencoded literals.
literal = 1*(ALPHA / DIGIT / "$" / "-" / "_" / "*" / "." / " " / "{" / "}" / "@" / "|")

; we treat binary, unary & vararg operators as literals in cases where people want to do something like eq(foo, ne).
literal =/ binary_operator
literal =/ unary_operator
literal =/ vararg_operator

; 0x22 is a quotation mark
literal =/ 0x22 *string_characters 0x22
; there is probably a better way to write this in ABNF involving hex like 0x01-0x21 0x23-0x5B 0x5D-0x7E
string_character = any_character but not " or \
string_character =/ escape_sequence
escape_sequence = "\" 0x22

A more interested reader is encouraged to read: Domain-Specific Languages, by Fowler, the two techniques we use here are: (Regex Table Lexer p.239-244, Recursive Descent Parser p.245-255).

Released Package

https://www.npmjs.com/package/@elasticpath/epcc-search-ast-generator-js

@elasticpath/epcc-search-ast-generator-js

Filtering Introduction.

Released Package

Readme

Keywords

Package Sidebar

Install

Weekly Downloads

Version

License

Unpacked Size

Total Files

Last publish

Collaborators

@elasticpath/epcc-search-ast-generator-js

Filtering Introduction.

Released Package

Readme

Keywords

Package Sidebar

Install

DownloadsWeekly Downloads

Version

License

Unpacked Size

Total Files

Last publish

Collaborators

Weekly Downloads