@elasticpath/epcc-search-ast-generator-js
TypeScript icon, indicating that this package has built-in type declarations

1.6.0 • Public • Published

Filtering Introduction.

This project contains all the logic for processing filtering in EPCC. This document provides an introduction into how this conceptually works.

A problem we often have is that we have a string, and want to classify it as being valid or invalid. For instance here are some things you might care about strings:

  1. Does a string start with a capital letter.
  2. Is a string a valid email address.
  3. Is a string a palindrome (written the same forwards and backwards like redivider, civic, radar, level)
  4. Is a string a valid Java program.

There are different techniques that can be used to check the above, including writing code directly and using a regular expression. But some things are more complicated, like case #3 and #4.

In the case of the last one, you would likely define a grammar which is a set of rules that say what a valid string looks like. A common technique is to proceed in two phases:

  1. Lexing - Converting the string into chunks of things called tokens, each token representing a some part of the string (e.g., in a java program you might have a token for each keyword). More details are in the lexer.js file
  2. Parsing - Write rules for checking the output of the lexer. We use a recursive decent parser, which means that each grammar rule calls a function that checks for a match and may recursively call other functions.

The grammar that we are attempting to parse in Augmented Backus-Naur Form (https://www.rfc-editor.org/rfc/rfc5234) is the following:

(Some of the way this is structured is to make the code easier to read)

root = term  *( ":" term )
term /= binary_term 
term /= unary_term 
term /= vararg_term 

binary_term = binary_operator "(" literal "," literal ")"
unary_term = unary_operator "(" literal ")"
vararg_term = vararg_operator "(" literal "," literal *( "," literal ) ")"

binary_operator = "eq" / "like" / "gt" / "ge " / "lt" / "le" / "text"
unary_operator = "is_null"
vararg_operator = "in"

; these literal represent standard unencoded literals.
literal = 1*(ALPHA / DIGIT / "$" / "-" / "_" / "*" / "." / " " / "{" / "}" / "@" / "|")

; we treat binary, unary & vararg operators as literals in cases where people want to do something like eq(foo, ne).
literal /= binary_operator
literal /= unary_operator
literal /= vararg_operator

; 0x22 is a quotation mark
literal /= 0x22 *string_characters 0x22
; there is probably a better way to write this in ABNF involving hex like 0x01-0x21 0x23-0x5B 0x5D-0x7E
string_character = any_character but not " or \
string_character /= escape_sequence
escape_sequence = "\" 0x22

A more interested reader is encouraged to read: Domain-Specific Languages, by Fowler, the two techniques we use here are: (Regex Table Lexer p.239-244, Recursive Descent Parser p.245-255).

Readme

Keywords

none

Package Sidebar

Install

npm i @elasticpath/epcc-search-ast-generator-js

Weekly Downloads

647

Version

1.6.0

License

MIT

Unpacked Size

73.7 kB

Total Files

27

Last publish

Collaborators

  • bsteinbach.ep
  • samblacklock
  • field123
  • itaccounts-ep