CSpell Grammar is used to generate a parser. The Parser is used to add context / scope to parts of a document, making it easier to define the parts to spell spell checked.
This is to address the issues and limitations related to ignoreRegExpList
and includeRegExpList
.
The parser is use to add scope
to sections of a document. The scope
can then be used to apply spell checking rules.
Example: Only check comments and strings
rules:
'*': false
comment: true
string: true
It can be even more powerful like controlling the language settings based upon scope.
rules:
comment:
language: en
string:
language: en,fr
dictionaries: ['marketing-terms'],
caseSensitive: true
string.javascript:
caseSensitive: false
Rules are applied in the order they match the scope of the text.
When checking JavaScript files with the above example rules:
- strings will:
- use the locale
en,fr
- the
marketing-terms
dictionary will be enabled -
caseSensitive
will betrue
- use the locale
- everything else:
- locale:
en
-
caseSensitive
will befalse
- locale:
At its core, cspell-grammar
uses a simplified form of the TextMate grammar.
Why use a grammar parser? Couldn't a colorizer / highlighter or a language AST be used? At one level, needs of the spell checker are simpler and different from colorizers or language AST parsers. The goal of a spell checker is to spell check relevant text. The spell check does not need to care about the syntactical correctness of a document or presentation.
The goal of a grammar parser for the spell checker is to allow the user to decide:
- What text should be checked.
- Which dictionaries (or languages) should be used.
- Are accents and case important
Note: CSpell is a pure JavaScript application, so including the Oniguruma is not an option.
- Parsing a document should be fast - meaning the grammar should be as simple as possible to meet the needs of the spell checker and not focus on scope detail. This is where a colorizer grammar is not a good fit to be used.
- AST's are a bit of an overkill for a spell checker. They provide too much detail while not bringing much benefit from the detail.
Consider the following bit of LaTeX:
k\"{o}nnen
können
For the spell checker to work correctly, the \"{o}
should be transformed into ö
before it is checked against the German dictionary.
This creates a few challenges.
Possible options:
- Simple whole document substitution
- Challenges
- It is not context aware and might replace the wrong text.
- It changes the location of the words and messes up issue reporting (some sort of Map would be needed to get the correct line / character offset).
- Challenges
- Advantages
- Easy to implement except for the context and mapping.
- Scope level substitution
Transformations occur at the scope level.
- Challenges
- offset mapping is still and issue (maybe)
- need a way to merge text with adjacent scopes after transformation
- Advantages
- it is context aware
- Challenges