PaCo
JavaScript monadic parser combinators
A tool for building parsers and parsing with them, without having to be a parser expert.
const myParser=
  skip(char('#'))
  .then(many(letter).join())
  .skip(char('-'))
  .then(digits.join().as(parseInt))
parse(">")(myParser)("#AN-123")
outputs:
Right { value: [ 'AN', 123 ] }
All parsers can be chained or grouped to form new parsers, which can themselves be chained and grouped.
Metaparsers such as many(), some() and skip() accept other parsers or metaparsers as arguments.
Some parsers are themselves compositions built with metaparsers; for example, digits performs many(digit).
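To make the idea of metaparsers concrete, here is a minimal, self-contained sketch written from scratch (not PaCo's code). A parser is assumed to be a plain function from (input, pos) to a result object; satisfy, many, some, digit and digits are illustrative stand-ins for the PaCo names, assumed to behave as described above. The point is that digits adds nothing new: it is just many(digit).

```javascript
// Illustrative sketch, not PaCo's implementation.
// A parser is a function (input, pos) -> result, where result is
// {ok: true, value, pos} on success or {ok: false, pos, expected} on failure.
const satisfy = (pred, expected) => (input, pos) =>
  pos < input.length && pred(input[pos])
    ? { ok: true, value: input[pos], pos: pos + 1 }
    : { ok: false, pos, expected };

// many(p): zero or more occurrences of p; never fails (may return []).
const many = (p) => (input, pos) => {
  const values = [];
  for (let r = p(input, pos); r.ok && r.pos > pos; r = p(input, pos)) {
    values.push(r.value);
    pos = r.pos; // advance past the accepted occurrence
  }
  return { ok: true, value: values, pos };
};

// some(p): one or more occurrences; fails with p's own error if the first one fails.
const some = (p) => (input, pos) => {
  const first = p(input, pos);
  if (!first.ok) return first;
  const rest = many(p)(input, first.pos);
  return { ok: true, value: [first.value, ...rest.value], pos: rest.pos };
};

const digit = satisfy((c) => c >= "0" && c <= "9", "digit");
// digits is not a new primitive: it is the metaparser many applied to digit.
const digits = many(digit);

console.log(digits("123a", 0).value); // [ '1', '2', '3' ]
console.log(some(digit)("#123", 0).ok); // false
```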
abbreviations
A plain string can now be used in place of a non-chaining parser; it is translated into either a char or a string parser.
digits.then('.') is equivalent to digits.then(char('.'))
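A minimal sketch of how such an abbreviation could be normalised, written from scratch and not taken from PaCo. The helper toParser is hypothetical, and so is the rule it uses (one-character strings become char parsers, longer strings become string parsers); the text above only says a string turns into one or the other. Normalising once, when the chain is built, keeps the check out of parse time.

```javascript
// Hypothetical sketch of the string abbreviation, not PaCo's code.
// A parser is (input, pos) -> {ok: true, value, pos} | {ok: false, pos, expected}.
const char = (c) => (input, pos) =>
  input[pos] === c
    ? { ok: true, value: c, pos: pos + 1 }
    : { ok: false, pos, expected: "character " + c };

const string = (s) => (input, pos) =>
  input.startsWith(s, pos)
    ? { ok: true, value: s, pos: pos + s.length }
    : { ok: false, pos, expected: "string " + s };

// Assumption: one-character strings become char parsers, longer ones string parsers.
// Anything that is not a string is assumed to already be a parser.
const toParser = (x) =>
  typeof x === "string" ? (x.length === 1 ? char(x) : string(x)) : x;

// Chaining helper: both operands are normalised once, at construction time.
const then = (p, q) => {
  const first = toParser(p);
  const second = toParser(q);
  return (input, pos) => {
    const a = first(input, pos);
    if (!a.ok) return a;
    const b = second(input, a.pos);
    return b.ok ? { ok: true, value: [a.value, b.value], pos: b.pos } : b;
  };
};

console.log(then("1", ".")("1.5", 0).value); // [ '1', '.' ]
console.log(then("temp", ":")("temp:", 0).value); // [ 'temp', ':' ]
```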
Building objects
.to(tag)
This extender takes the result of the current parsing group and stores it under the key tag of an object.
If no object exists yet, one is created; if the tail of the results already holds an object, that object is used.
const kchk=
  string("temp: ")
  .then(
    option("",oneOf("-+"))
    .then(digits)
    .join().as(parseInt).to("temp")
    .then(char('K').to("unit"))
    .verify(o=>o[0].unit==='K'&&o[0].temp>=0)
    .failMsg("positive Kelvin!")
  )
#>res(">")(kchk.parse("temp: 12K")).value
[ 'temp: ', { temp: 12, unit: 'K' } ]
or failing:
#>res(">")(kchk.parse("temp: -12K")).value
'>error, expecting positive Kelvin! but found `-` here->-12K...'
This, along with .verify, .post and .as, allows event callbacks and all sorts of automation during parsing; if something is missing, let me know.
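The object-building rule of .to(tag) can be sketched as a small pure function. This is a hypothetical illustration of the bookkeeping described above, not PaCo's code: storeTag is an invented name, and it returns a new list instead of mutating, which is a design choice of the sketch. Replaying the two tags by hand yields the same shape as the temperature example's output.

```javascript
// Hypothetical sketch of the .to(tag) bookkeeping, not PaCo's code.
// results: output list built so far; value: the current group's result.
const isPlainObject = (x) =>
  x !== null && typeof x === "object" && !Array.isArray(x);

function storeTag(results, tag, value) {
  const tail = results[results.length - 1];
  if (isPlainObject(tail)) {
    // An object already sits at the tail of the results: reuse it (copied here).
    return [...results.slice(0, -1), { ...tail, [tag]: value }];
  }
  // Otherwise create a new object holding the tagged value.
  return [...results, { [tag]: value }];
}

// Replaying the temperature example by hand:
let out = ["temp: "];
out = storeTag(out, "temp", 12);
out = storeTag(out, "unit", "K");
console.log(out); // [ 'temp: ', { temp: 12, unit: 'K' } ]
```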
It is now possible to parse the following; this behavior is enabled or disabled by config.
Enable it with config.backtrackExclusions=true
#>config.optimize=true//turn on optimizations on construction
#>config.backtrackExclusions=true//track exclusions on optimization
#>digits.join().as(parseInt).then(count(2,digit).join()).parse("12345")
Right { value: Pair { a: [ 123, '45' ], b: '' } }
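PaCo obtains this result with exclusion checks injected at construction time. For intuition only, the same outcome can be reproduced with a different, simpler technique: plain run-time backtracking, where a greedy repetition gives back one match at a time until the continuation succeeds. The helpers digit, count and manyThen below are hypothetical and written from scratch.

```javascript
// Greedy repetition with backtracking into the continuation.
// This is ordinary backtracking, NOT PaCo's construction-time exclusion mechanism.
const digit = (input, pos) =>
  pos < input.length && input[pos] >= "0" && input[pos] <= "9"
    ? { ok: true, value: input[pos], pos: pos + 1 }
    : { ok: false, pos, expected: "digit" };

// count(n, p): exactly n occurrences of p.
const count = (n, p) => (input, pos) => {
  const values = [];
  for (let i = 0; i < n; i++) {
    const r = p(input, pos);
    if (!r.ok) return r;
    values.push(r.value);
    pos = r.pos;
  }
  return { ok: true, value: values, pos };
};

// manyThen(p, next): match as many p as possible, then run next;
// if next fails, retry with one fewer p, down to zero.
const manyThen = (p, next) => (input, pos) => {
  const ends = [pos]; // ends[k] = position after k matches of p
  const values = [];
  for (let r = p(input, pos); r.ok && r.pos > ends[ends.length - 1]; r = p(input, r.pos)) {
    values.push(r.value);
    ends.push(r.pos);
  }
  let failure;
  for (let k = values.length; k >= 0; k--) {
    const r = next(input, ends[k]);
    if (r.ok) return { ok: true, value: [values.slice(0, k), r.value], pos: r.pos };
    failure = failure || r; // keep the failure from the longest attempt
  }
  return failure;
};

const res = manyThen(digit, count(2, digit))("12345", 0);
console.log(parseInt(res.value[0].join(""), 10), res.value[1].join("")); // 123 45
```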
.then, .skip and other combinators can inject exclusion checks into the chain at construction time.
The parser base may be rewritten at construction time, which keeps all such checks out of parse time.
many peeks at these injected parameters and may exclude them from its sequence match.
One can still call .optim even with optimizations turned off; however, backtracking still respects its own flag.
The optimization chain is still sparsely populated; there are many things left to fit in...
Config
Module-exported variable:
var config={
  optimize:false,//all optimizations
  backtrackExclusions: false//exclude next selector root from current loop match
}
- optimize: disables all optimizations when false
- backtrackExclusions: excludes the next parser's root from the current selection; backtracking can be dispensed with for well-written parsers (there is still a long way to go here)
.then | .skip
Chaining is done with .then or .skip: the first combines the output, while the second drops it.
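A small sketch of the difference, using a hypothetical chainable wrapper class P written from scratch (not PaCo's classes, and its run(input, pos) signature differs from PaCo's run(Pair(...))). Outputs are kept as lists so that .then can combine them by concatenation, while .skip still requires its parser to match but discards what it produced.

```javascript
// Hypothetical chainable wrapper, not PaCo's classes.
// run: (input, pos) -> {ok: true, value: [...], pos} | {ok: false, pos}
class P {
  constructor(run) { this.run = run; }
  then(q) { return new P(seq(this, q, true)); }  // keep q's output
  skip(q) { return new P(seq(this, q, false)); } // match q, drop its output
}

// Run p, then q from where p stopped; keep q's output only when keep is true.
const seq = (p, q, keep) => (input, pos) => {
  const a = p.run(input, pos);
  if (!a.ok) return a;
  const b = q.run(input, a.pos);
  if (!b.ok) return b;
  return { ok: true, value: keep ? [...a.value, ...b.value] : a.value, pos: b.pos };
};

const char = (c) => new P((input, pos) =>
  input[pos] === c ? { ok: true, value: [c], pos: pos + 1 } : { ok: false, pos });

const p = char("a").skip(char("-")).then(char("b"));
console.log(p.run("a-b", 0).value); // [ 'a', 'b' ]
```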
.or
Parsers can be alternated with .or: p.or(q) tries p and, if it fails, tries q.
.notFollowedBy(p)
The parser succeeds only if p fails.
.lookAhead(p)
Checks p before parsing, without consuming input; if p fails, the parsing fails.
.excluding(p)
Checks p before parsing; if p succeeds, the parsing fails.
This could also be achieved by grouping parsers instead of separating them, but some grammars are written this way.
The exclusion must apply to a parser of the same level: using .excluding(char(..)) at character level on a string-level parser has no effect.
digits.excluding(oneOf("89"))//this will have no effect
many(digit.excluding(oneOf("89")))//but this will
When optimizing with exclusion backtracking, the first form also takes effect, as PaCo rewrites the base to be exactly the second.
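The alternation and predicate combinators, and why the level of an exclusion matters, can be sketched with plain functions. These are hypothetical stand-ins written from scratch, not PaCo's code; in particular, notFollowedBy here checks x right after p, which is the usual Parsec reading and an assumption on our part.

```javascript
// Hypothetical sketch, not PaCo's code.
// A parser is (input, pos) -> {ok: true, value, pos} | {ok: false, pos}.
const satisfy = (pred) => (input, pos) =>
  pos < input.length && pred(input[pos])
    ? { ok: true, value: input[pos], pos: pos + 1 }
    : { ok: false, pos };
const oneOf = (chars) => satisfy((c) => chars.includes(c));
const digit = satisfy((c) => c >= "0" && c <= "9");

// p.or(q): try p; on failure, try q from the same position.
const or = (p, q) => (input, pos) => {
  const r = p(input, pos);
  return r.ok ? r : q(input, pos);
};
// p.lookAhead(x): x must succeed at pos (consuming nothing), then p runs.
const lookAhead = (p, x) => (input, pos) =>
  x(input, pos).ok ? p(input, pos) : { ok: false, pos };
// p.excluding(x): x must fail at pos, then p runs.
const excluding = (p, x) => (input, pos) =>
  x(input, pos).ok ? { ok: false, pos } : p(input, pos);
// p.notFollowedBy(x): assumption: p succeeds only if x fails right after it.
const notFollowedBy = (p, x) => (input, pos) => {
  const r = p(input, pos);
  return r.ok && !x(input, r.pos).ok ? r : { ok: false, pos };
};

const many = (p) => (input, pos) => {
  const values = [];
  for (let r = p(input, pos); r.ok && r.pos > pos; r = p(input, pos)) {
    values.push(r.value);
    pos = r.pos;
  }
  return { ok: true, value: values, pos };
};
const digits = many(digit);

// String level: the exclusion is checked once, at the start only.
console.log(excluding(digits, oneOf("89"))("1289", 0).value.join("")); // 1289
// Character level: the exclusion is checked before every digit.
console.log(many(excluding(digit, oneOf("89")))("1289", 0).value.join("")); // 12
```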
.as
Parse output can be formatted with .as, which applies to the parser or group where it is inserted. .as accepts an output transformer function.
Output transformations can stack up.
.join
.join() and .join(«sep») are shortcuts for .as(mappend) and .as(o=>o.join(«sep»)) respectively.
Parsers can be grouped by nesting, e.g. x.then( y.then(z).join() ); here the join applies only to the (y.z) results.
TODO: grouping is not fully generalized yet
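A sketch of how output transformers can stack, written from scratch with hypothetical helpers as and join (not PaCo's code). Each as wraps the previous parser and rewrites its value only on success, so transformations apply from the inside out. Treating mappend as string concatenation of a list is an assumption made for this sketch.

```javascript
// Hypothetical sketch of stacked output transformers, not PaCo's code.
// A parser is (input, pos) -> {ok, value, pos}; `as` rewrites value on success.
const as = (p, f) => (input, pos) => {
  const r = p(input, pos);
  return r.ok ? { ...r, value: f(r.value) } : r;
};
// Assumption: mappend concatenates the items of a list of strings.
const mappend = (xs) => xs.reduce((a, b) => a + b, "");
const join = (p, sep) => as(p, sep === undefined ? mappend : (o) => o.join(sep));

const digit = (input, pos) =>
  pos < input.length && input[pos] >= "0" && input[pos] <= "9"
    ? { ok: true, value: input[pos], pos: pos + 1 }
    : { ok: false, pos };
const digits = (input, pos) => {
  const value = [];
  for (let r = digit(input, pos); r.ok; r = digit(input, pos)) {
    value.push(r.value);
    pos = r.pos;
  }
  return { ok: true, value, pos };
};

// Transformations stack: join first, then parseInt, then double.
const number = as(as(join(digits), (s) => parseInt(s, 10)), (n) => n * 2);
console.log(number("21x", 0).value); // 42
console.log(join(digits, "-")("123", 0).value); // 1-2-3
```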
.verify
.verify(func,msg)
The function func receives the parse group result (a list) and should return true if the result is approved, or false to fail with the message msg.
.post
.post(f)
Post-processes the result; this is still part of the static parser definition. The return value of f replaces the previous result.
.failMsg
.failMsg(msg)
Provides a message for a failing parser.
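The three modifiers can be sketched as wrappers around a parser. This is a hypothetical, from-scratch illustration (verify, post, failMsg and integer here are not PaCo's code), mirroring the Kelvin example: a rejected result is reported as a failure at the position where the group started.

```javascript
// Hypothetical sketch, not PaCo's code.
// A parser is (input, pos) -> {ok: true, value, pos} | {ok: false, pos, expected}.
const verify = (p, func, msg) => (input, pos) => {
  const r = p(input, pos);
  if (!r.ok) return r;
  // func sees the group result; false turns success into failure at the start position.
  return func(r.value) ? r : { ok: false, pos, expected: msg };
};
// post: the return value of f replaces the previous result.
const post = (p, f) => (input, pos) => {
  const r = p(input, pos);
  return r.ok ? { ...r, value: f(r.value) } : r;
};
// failMsg: replace the failure message, leave successes untouched.
const failMsg = (p, msg) => (input, pos) => {
  const r = p(input, pos);
  return r.ok ? r : { ...r, expected: msg };
};

// A signed integer parser whose group result is a one-element list.
const integer = (input, pos) => {
  const m = /^[-+]?[0-9]+/.exec(input.slice(pos));
  return m
    ? { ok: true, value: [parseInt(m[0], 10)], pos: pos + m[0].length }
    : { ok: false, pos, expected: "integer" };
};

const kelvin = failMsg(verify(integer, (o) => o[0] >= 0), "positive Kelvin!");
console.log(kelvin("12", 0).value); // [ 12 ]
console.log(kelvin("-12", 0).expected); // positive Kelvin!
```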
.parse
.parse("...")
can be used to quickly feed a string to any parser.
The result includes both the input and the output state.
ex:
digits.parse("123a")
Use the parse function to get only the output.
All transformation definitions should be applied to the parser and not to the result, so .parse should be the last item of the group.
A parser can be stored, combined, passed around and used to parse many contents many times; all transitory state is kept outside of it.
-- failing --
This parse will fail, as it expects at least one digit:
#>parse(">")(some(digit))("#123")
Left { value: 'error, expecting digit but found `#` here->#123' }
Composition examples
parse(">")(
  many(
    some(digit.or(letter)).join()
    .skip(spaces)
  ).join("-")
)("As armas e os baroes")
expected result
Right { value: [ 'As-armas-e-os-baroes' ] }
const nr=
  skip(spaces)
  .then(digits).join().as(parseInt)//get first digits as number
  .then(many(//then seek many separated by `,` or '|'
    skip(spaces)
    .skip(char(',').or(char('|')))//drop the separators (not included in output)
    .skip(spaces)
    .then(digits.join().as(parseInt))
  )).as(foldr1(a=>b=>a+b))//transform output by adding all values
parse(">")(nr)(" 12 , 2 | 1")
expected result
Right { value: [ 15 ] }
The parser above could be written using sepBy; we were just emphasizing composition.
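For comparison, here is a from-scratch sketch of the sepBy version, assuming curried sepBy(p)(sep) and foldr1(f) as listed in this README. The mini-combinators regexP, spaces, token, number and separator are hypothetical helpers of this sketch, not PaCo functions. foldr1 folds from the right, so the sum is 12 + (2 + 1).

```javascript
// Hypothetical sketch, not PaCo's code.
// A parser is (input, pos) -> {ok: true, value, pos} | {ok: false, pos}.
const regexP = (re) => (input, pos) => {
  const m = re.exec(input.slice(pos)); // re must be anchored with ^
  return m ? { ok: true, value: m[0], pos: pos + m[0].length } : { ok: false, pos };
};
const spaces = regexP(/^ */); // never fails: may match the empty string
const token = (p) => (input, pos) => p(input, spaces(input, pos).pos); // skip leading spaces

const number = token((input, pos) => {
  const r = regexP(/^[0-9]+/)(input, pos);
  return r.ok ? { ...r, value: parseInt(r.value, 10) } : r;
});
const separator = token(regexP(/^[,|]/));

// sepBy(p)(sep): zero or more p separated by sep, separators dropped; never fails.
const sepBy = (p) => (sep) => (input, pos) => {
  const first = p(input, pos);
  if (!first.ok) return { ok: true, value: [], pos };
  const values = [first.value];
  pos = first.pos;
  for (;;) {
    const s = sep(input, pos);
    if (!s.ok) break;
    const r = p(input, s.pos);
    if (!r.ok) break; // a dangling separator is left unconsumed
    values.push(r.value);
    pos = r.pos;
  }
  return { ok: true, value: values, pos };
};

// foldr1(f)(xs): right fold over a non-empty list with a curried f.
const foldr1 = (f) => (xs) =>
  xs.slice(0, -1).reduceRight((acc, x) => f(x)(acc), xs[xs.length - 1]);

const r = sepBy(number)(separator)(" 12 , 2 | 1", 0);
console.log(foldr1((a) => (b) => a + b)(r.value)); // 15
```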
Parsers
- satisfy(f) uses a function char->bool to test a character
- char(c) matches the character c
- cases(c) case-insensitive match of the character c
- oneOf("...") matches any character of the given string
- noneOf("...") matches any character not included in the given string
- range(a,z) matches characters between the given ones (inclusive)
- digit any digit 0-9
- lower lowercase letters a-z
- upper uppercase letters A-Z
- letter any letter a-z or A-Z
- alphaNum letter or digit
- hexDigit hexadecimal digit
- octDigit octal digit
- space single space
- tab single tab
- nl newline
- cr carriage return
- blank tab or space
- spaces zero or more spaces
- blanks zero or more whitespace characters
- spaces1 one or more spaces
- blanks1 one or more whitespace characters
- digits zero or more digits
- eof end of file
- string("...") matches the given string
- caseInsensitive("...") case-insensitive string match
- regex(expr) matches a regular expression
#>parse(">")(regex("#([a-zA-Z]+)[ -]([0-9]+)"))("#an-123...")
Right { value: [ 'an', '123' ] }
- skip(...) ignores the group/parser output
- many(p) zero or more occurrences of parser p; this parser never fails, as it can return an empty list
- some(p) one or more occurrences of parser p
- manyTill(p,end) one or more occurrences of parser p, terminated by parser end
- optional(p) parses p if present; otherwise it is ignored and parsing continues
- choice[ps] parses from a list of alternative parsers; this is just an abbreviation of an .or sequence
- count(n)(p) parses n occurrences of p
- between(open)(close)(p) parses p surrounded by open and close, dropping the delimiters.
Make sure the delimiters are excluded from the content, or provide some other way to detect the end of the content.
#>parse(">")(between(space,space,some(noneOf(" "))).join())(" ab.12 ")
Right { value: [ 'ab.12' ] }
- option(x)(p) parses p, or returns x if p fails; this parser never fails.
#>parse(">")(option(["0"])(digit))("1")
Right { value: [ '1' ] }
#>parse(">")(option(["0"])(digit))("")
Right { value: [ '0' ] }
#>parse(">")(option(["0"])(digit))("#")
Right { value: [ '0' ] }
- optionMaybe(p) parses p and returns Just the result, or Nothing if it fails; this parser never fails
- sepBy(p)(sep) parses zero or more occurrences of p separated by sep, dropping the separators; this parser never fails
- sepBy1(p)(sep) parses one or more occurrences of p separated by sep, dropping the separators
- endBy(p)(sep)(end) parses zero or more occurrences of p separated by sep, dropping the separators and terminating with end
- endBy1(p)(sep)(end) parses one or more occurrences of p separated by sep, dropping the separators and terminating with end
- none non-consuming happy parser.
none is an identity parser: it just outputs the given input as a successful parse, so it never fails or consumes.
We use it to turn binary combinators into unary metaparsers. That is the case of .skip(...): the none parser is what makes it available as the unary modifier skip().
none can do this for any binary combinator, and can appear wherever you want to disable a part.
Using none as sep with endBy(p,sep,end) will silently skip the need for sep.
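A few primitives from the list above, sketched from scratch to show how little each one needs. These are hypothetical stand-ins, not PaCo's code: in particular none here outputs an empty list, whereas PaCo's none passes the given state through unchanged. The last line shows none disabling the delimiters of between.

```javascript
// Hypothetical sketch, not PaCo's code.
// A parser is (input, pos) -> {ok: true, value, pos} | {ok: false, pos}.
const satisfy = (pred) => (input, pos) =>
  pos < input.length && pred(input[pos])
    ? { ok: true, value: input[pos], pos: pos + 1 }
    : { ok: false, pos };
const char = (c) => satisfy((x) => x === c);
const oneOf = (s) => satisfy((x) => s.includes(x));
const noneOf = (s) => satisfy((x) => !s.includes(x));
const range = (a, z) => satisfy((x) => x >= a && x <= z); // inclusive
const digit = range("0", "9");
const space = char(" ");

// some(p): one or more occurrences of p.
const some = (p) => (input, pos) => {
  const values = [];
  for (let r = p(input, pos); r.ok && r.pos > pos; r = p(input, pos)) {
    values.push(r.value);
    pos = r.pos;
  }
  return values.length ? { ok: true, value: values, pos } : { ok: false, pos };
};
// option(x)(p): p's result, or x without consuming; never fails.
const option = (x) => (p) => (input, pos) => {
  const r = p(input, pos);
  return r.ok ? r : { ok: true, value: x, pos };
};
// between(open)(close)(p): p surrounded by open and close, delimiters dropped.
const between = (open) => (close) => (p) => (input, pos) => {
  const o = open(input, pos);
  if (!o.ok) return o;
  const r = p(input, o.pos);
  if (!r.ok) return r;
  const c = close(input, r.pos);
  return c.ok ? { ok: true, value: r.value, pos: c.pos } : c;
};
// none: never fails, consumes nothing.
const none = (input, pos) => ({ ok: true, value: [], pos });

const word = between(space)(space)(some(noneOf(" ")));
console.log(word(" ab.12 ", 0).value.join("")); // ab.12
console.log(option("0")(digit)("#", 0).value); // 0
console.log(between(none)(none)(some(digit))("12", 0).value.join("")); // 12
```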
try and consume
So far, failing parsers do not consume... let's see... as long as that holds, there is no need to implement try.
To be more accurate, failing parsers do consume, since the failure point is needed for the reports; however, the enclosing parser may pick its own starting point to move on, ignoring that consumption (as try does).
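The behaviour described above can be sketched as follows, written from scratch and not taken from PaCo: a failing branch reports how far it got, so errors can point at the failure, while the alternation still restarts the next branch from its own start. string and or are hypothetical helpers; reporting the branch that got further is a design choice of this sketch.

```javascript
// Hypothetical sketch, not PaCo's code.
// A failing parser reports the position where it failed (it "consumed" up to there),
// but the alternation below ignores that and restarts from its own start position.
const string = (s) => (input, pos) => {
  let i = 0;
  while (i < s.length && input[pos + i] === s[i]) i++;
  return i === s.length
    ? { ok: true, value: s, pos: pos + i }
    : { ok: false, pos: pos + i, expected: JSON.stringify(s) }; // failure point
};

const or = (p, q) => (input, pos) => {
  const a = p(input, pos);
  if (a.ok) return a;
  const b = q(input, pos); // restart from pos, ignoring how far p got
  if (b.ok) return b;
  return a.pos >= b.pos ? a : b; // report the branch that got further
};

const keyword = or(string("let"), string("lambda"));
console.log(keyword("lambda x", 0).value); // lambda
console.log(keyword("lab", 0).pos); // 2
```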
Parsers basic IO
For now, parsers accept a state pair of (input,output) and return an Either:
- on error: a pair of an error and the input state.
- on success: a pair of the parsed content and the input state.
Expect changes to this argument format (it changed in v1.1).
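A from-scratch sketch of this state-passing style. The classes Pair, Left and Right and the run method below are hypothetical stand-ins; the convention used (a state Pair(input, output), success as Right(Pair(rest, output)), failure as Left(Pair(input, expected))) is an assumption chosen to match the "desugared parse" transcripts further down, since the argument format has changed between versions.

```javascript
// Hypothetical stand-ins for the state and Either types, not PaCo's classes.
class Pair { constructor(a, b) { this.a = a; this.b = b; } }
class Left { constructor(value) { this.value = value; } }
class Right { constructor(value) { this.value = value; } }

// Assumed convention: run(Pair(input, output)) returns
// Right(Pair(rest, newOutput)) on success, Left(Pair(input, expected)) on failure.
const satisfy = (pred, expected) => ({
  expect: expected,
  run: ({ a: input, b: output }) =>
    input.length > 0 && pred(input[0])
      ? new Right(new Pair(input.slice(1), [...output, input[0]]))
      : new Left(new Pair(input, expected)),
});

const or = (p, q) => {
  const expect = `${p.expect} or ${q.expect}`;
  return {
    expect,
    run: (state) => {
      const r = p.run(state);
      if (r instanceof Right) return r;
      const s = q.run(state); // same input state: nothing consumed by p
      return s instanceof Right ? s : new Left(new Pair(state.a, expect));
    },
  };
};

const letter = satisfy((c) => /[a-zA-Z]/.test(c), "letter");
const digit = satisfy((c) => c >= "0" && c <= "9", "digit");

console.log(or(letter, digit).run(new Pair("1", [])));
// Right { value: Pair { a: '', b: [ '1' ] } }
console.log(or(letter, digit).run(new Pair("#123", [])));
// Left { value: Pair { a: '#123', b: 'letter or digit' } }
```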
testing a simple parser
#>digits.run(Pair([],"123"))
Right { value: Pair { a: [ '1', '2', '3' ], b: '' } }
This is the basic form of parsing (feeding a parser).
However, a parse function is also available; it performs like the former but returns only the output state, or a formatted error message.
#>parse(">")(digits)("123")
Right { value: [ '1', '2', '3' ] }
Same with
#>digits.parse("123")
Right { value: Pair { a: [ '1', '2', '3' ], b: '' } }
The only difference is that this last one, like the first, gives the full output, including the input state.
utility
parse
parse(filename)(parser)(input string or stream)
The filename is merely decorative here; it is used in error reports.
#>parse(">")(letter.or(digit))("1")
Right { value: [ '1' ] }
#>parse(">")(letter.or(digit))("a")
Right { value: [ 'a' ] }
#>parse(">")(letter.or(digit))("#123")
Left {
value: 'error, expecting letter or digit but found `#` here->#123' }
direct parse
#>letter.or(digit).parse("1")
Right { value: Pair { a: '', b: [ '1' ] } }
#>letter.or(digit).parse("a")
Right { value: Pair { a: '', b: [ 'a' ] } }
#>letter.or(digit).parse("#123")
Left { value: Pair { a: '#123', b: 'letter or digit' } }
desugared parse
#>letter.or(digit).run(Pair("1",[]))
Right { value: Pair { a: '', b: [ '1' ] } }
#>letter.or(digit).run(Pair("a",[]))
Right { value: Pair { a: '', b: [ 'a' ] } }
#>letter.or(digit).run(Pair("#123",[]))
Left { value: Pair { a: '#123', b: 'letter or digit' } }
res(r)
Processes a parser's return value to produce a result or an error message, discarding the input state description.
#>res(">")(letter.then(digits).parse("123"))
Left { value: '>error, expecting letter but found `1` here->1...' }
without res() processing
#>letter.then(digits).parse("123")
Left { value: Pair { a: '123', b: 'letter' } }
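A from-scratch sketch of res-style post-processing: a Left carrying (remaining input, expectation) becomes a message, and a Right carrying (rest, output) keeps only the output. Pair, Left, Right and res below are hypothetical stand-ins, and the message layout (including how much of the remaining input is shown) only approximates the transcripts above; PaCo's exact format may differ.

```javascript
// Hypothetical sketch, not PaCo's code; the exact message format is an approximation.
class Pair { constructor(a, b) { this.a = a; this.b = b; } }
class Left { constructor(value) { this.value = value; } }
class Right { constructor(value) { this.value = value; } }

const res = (filename) => (r) => {
  if (r instanceof Right) return new Right(r.value.b); // keep only the output
  const { a: rest, b: expected } = r.value;
  const found = rest.length ? `\`${rest[0]}\`` : "end of input";
  // Show a short excerpt of the remaining input where the failure happened.
  return new Left(`${filename}error, expecting ${expected} but found ${found} here->${rest.slice(0, 10)}...`);
};

console.log(res(">")(new Left(new Pair("123", "letter"))).value);
// >error, expecting letter but found `1` here->123...
console.log(res(">")(new Right(new Pair("", ["a"]))));
// Right { value: [ 'a' ] }
```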
.expect
As a consequence of the error reporting system, we got a parser description for free, though no great effort was put into it.
const p=
  optional(skip(char('#')))
  .then(some(letter).join())
  .skip(char('-').or(spaces1))
  .then(digits.join().as(parseInt))
description:
#>console.log(p.expect)
optional skip character `#`
then (at least one letter)->join()
skip character `-` or at least one space
then ((digits)->join())->as(parseInt)
using:
#>console.log(parse(">")(p)("#AN-123"))
Right { value: [ 'AN', 123 ] }
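Why the description comes almost for free can be sketched by modelling descriptions alone: each combinator builds its text from its parts' texts, the same strings an error report needs anyway. This is a hypothetical, from-scratch illustration (parsing is not modelled), and its wording only resembles the output above.

```javascript
// Hypothetical sketch: only the .expect descriptions are modelled, not parsing.
const char = (c) => ({ expect: `character \`${c}\`` });
const letter = { expect: "letter" };
const space = { expect: "space" };
const digits = { expect: "digits" };
const some = (p) => ({ expect: `at least one ${p.expect}` });
const or = (p, q) => ({ expect: `${p.expect} or ${q.expect}` });
const then = (p, q) => ({ expect: `${p.expect}\nthen ${q.expect}` });
const skip = (p, q) => ({ expect: `${p.expect}\nskip ${q.expect}` });
const as = (p, name) => ({ expect: `(${p.expect})->${name}` });

const p = then(skip(as(some(letter), "join()"), or(char("-"), some(space))), digits);
console.log(p.expect);
// (at least one letter)->join()
// skip character `-` or at least one space
// then digits
```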
Chronology
1.1
Uses character domain analysis to detect parser overlap:
[0-9] ∩ ([0-9] ∪ [a-z])
<=> ([0-9] ∩ [0-9]) ∪ ([0-9] ∩ [a-z])
<=> ([0-9]) ∪ (∅)
<=> [0-9] ∪ ∅
<=> [0-9]
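The derivation above can be reproduced mechanically by representing a character domain as a sorted list of inclusive code-point ranges. The representation and the helpers set, union, intersect and show are hypothetical and written from scratch; PaCo's internal representation may differ. Intersecting range by range is exactly the distribution of ∩ over ∪ used in the second step.

```javascript
// Hypothetical sketch: character domains as lists of inclusive code-point ranges.
const cp = (c) => c.codePointAt(0);
const set = (...ranges) => ranges.map(([lo, hi]) => [cp(lo), cp(hi)]);

// Union: sort, then merge overlapping or adjacent ranges.
const union = (x, y) => {
  const all = [...x, ...y].sort((r, s) => r[0] - s[0]);
  const out = [];
  for (const [lo, hi] of all) {
    const last = out[out.length - 1];
    if (last && lo <= last[1] + 1) last[1] = Math.max(last[1], hi);
    else out.push([lo, hi]);
  }
  return out;
};

// Intersection: distribute over the ranges of both sides, keep non-empty overlaps.
const intersect = (x, y) => {
  const out = [];
  for (const [a, b] of x)
    for (const [c, d] of y) {
      const lo = Math.max(a, c);
      const hi = Math.min(b, d);
      if (lo <= hi) out.push([lo, hi]);
    }
  return union(out, []); // normalise
};

const show = (s) =>
  s.map(([lo, hi]) => `[${String.fromCodePoint(lo)}-${String.fromCodePoint(hi)}]`).join("") || "∅";

const digitSet = set(["0", "9"]);
const lowerSet = set(["a", "z"]);
console.log(show(intersect(digitSet, union(digitSet, lowerSet)))); // [0-9]
console.log(show(intersect(digitSet, lowerSet))); // ∅
```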
Version 1.1 is a full rewrite focused on speed:
- output pair contents swapped
- many1 replaced by some
- onFailMsg replaced by failMsg
- parsers are no longer functions (they are classes and no longer derive from Function), so they must be called with .run instead of a direct function call.
< 1.1
Some experiments with composition and parser analysis; coding was easy, with no attention to performance.
This parser is inspired by, but does not follow, "parsec".