html5parser
A simple and fast HTML5 parser. The result can be manipulated like an ECMAScript ESTree, especially the attributes.
Introduction
Currently, the public parsers, such as htmlparser2, parse5, etc., cannot be used to manipulate attributes. For example, htmlparser2 has startIndex and endIndex for tags and texts, but no range information about attribute names and values. This project exists to resolve that problem: it adds ranges for tags, texts, and attribute names and values, and also records the quote type of each attribute value (unquoted, ' or ").
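To illustrate why value ranges and quote types matter, here is a minimal sketch of an in-place attribute rewrite. The `ValueRange` shape and the `replaceAttributeValue` helper are hypothetical names made up for this example, not the library's API, and the ranges are located by hand rather than produced by a parser. As noted later in this document, the range is assumed to include the quotes.

```typescript
// Hypothetical shape for illustration only; not copied from the library.
interface ValueRange {
  start: number; // offset of the first character, including the opening quote if any
  end: number; // offset just past the last character, including the closing quote if any
  quote: "'" | '"' | undefined; // quote character wrapping the value, if any
}

// Replace one attribute value in `source`, keeping the original quote style.
function replaceAttributeValue(source: string, range: ValueRange, newValue: string): string {
  const q = range.quote ?? "";
  return source.slice(0, range.start) + q + newValue + q + source.slice(range.end);
}

const source = `<a href="/old" class=link>x</a>`;

// Ranges found by hand for this sketch; a range-aware parser would supply them.
const hrefStart = source.indexOf(`"/old"`);
const hrefRange: ValueRange = { start: hrefStart, end: hrefStart + `"/old"`.length, quote: '"' };

const classStart = source.indexOf("link");
const classRange: ValueRange = { start: classStart, end: classStart + "link".length, quote: undefined };

const updatedHref = replaceAttributeValue(source, hrefRange, "/new");
const updatedClass = replaceAttributeValue(source, classRange, "btn");
console.log(updatedHref);
console.log(updatedClass);
```

Because everything outside the range is copied verbatim, the rest of the markup (whitespace, casing, other attributes) is left untouched, which is the point of carrying ranges on attributes.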
Install
    # via npm
    npm install html5parser -S

    # via yarn
    yarn add html5parser
Quick Start
    …
    html.walk(ast, …);

    // Should output:
    // hello
API
- Top level API: parse HTML to an AST tree
- Low level API: get tokens
- Utils API: walk the AST tree
Abstract Syntax Tree Spec
- IBaseNode: the base struct for all the nodes
- IText: the text node struct
- ITag: the tag node struct
- IAttribute: the attribute struct
- IAttributeValue: the attribute value struct. NOTE: the range start and end include the quotes.
- INode: the exposed nodes
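As a rough illustration of how a tree of ranged nodes can be traversed, the sketch below walks a hand-built tree depth-first with enter/leave callbacks. The `SketchText`, `SketchTag`, `Visitor`, and `walkSketch` names and all field names are assumptions made for this example; they are not the struct definitions listed above.

```typescript
// Hypothetical node shapes, assumed for this sketch only.
interface SketchText { kind: "text"; start: number; end: number; value: string; }
interface SketchTag { kind: "tag"; start: number; end: number; name: string; children: SketchNode[]; }
type SketchNode = SketchText | SketchTag;

interface Visitor {
  enter?(node: SketchNode, parent: SketchTag | undefined): void;
  leave?(node: SketchNode, parent: SketchTag | undefined): void;
}

// Depth-first traversal: enter a node, visit its children, then leave it.
function walkSketch(nodes: SketchNode[], visitor: Visitor, parent?: SketchTag): void {
  for (const node of nodes) {
    visitor.enter?.(node, parent);
    if (node.kind === "tag") {
      walkSketch(node.children, visitor, node);
    }
    visitor.leave?.(node, parent);
  }
}

const src = "<p>hi<b>!</b></p>";

// Tree and offsets written by hand to match `src`.
const tree: SketchNode[] = [
  {
    kind: "tag", start: 0, end: 17, name: "p",
    children: [
      { kind: "text", start: 3, end: 5, value: "hi" },
      {
        kind: "tag", start: 5, end: 13, name: "b",
        children: [{ kind: "text", start: 8, end: 9, value: "!" }],
      },
    ],
  },
];

// Collect the source slice covered by each node, in visiting order.
const visited: string[] = [];
walkSketch(tree, {
  enter: (node) => { visited.push(src.slice(node.start, node.end)); },
});
console.log(visited);
```

Each node's range maps straight back to a slice of the input, so the source text for any node can be recovered without re-serializing the tree.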
Warnings
This parser targets HTML5, which means:
- All tags like <? ... ?> and <! ... > (except for <!doctype ...>, case insensitive) are treated as Comment, which means a CDATASection is treated as a comment.
- Special tag names:
  - "!doctype" (case insensitive): the doctype declaration
  - "!": short comment
  - "!--": normal comment
  - "" (empty string): short comment, for <? ... >; the leading ? is treated as comment content
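The naming rules above can be sketched as a small classifier. The `specialTagName` function is a hypothetical helper written for this example, not part of the library, and it only inspects the opening characters of the markup; the parser's real tokenizer may apply further checks.

```typescript
// Hypothetical helper: map the opening of a construct to the special tag name
// described in the list above. Returns undefined for anything else.
function specialTagName(markup: string): string | undefined {
  if (/^<!doctype/i.test(markup)) return "!doctype"; // doctype declaration, case insensitive
  if (markup.startsWith("<!--")) return "!--"; // normal comment
  if (markup.startsWith("<!")) return "!"; // short comment, e.g. a CDATA section
  if (markup.startsWith("<?")) return ""; // short comment; "?" stays in the content
  return undefined; // ordinary tag or text
}

const samples = ["<!DOCTYPE html>", "<!-- note -->", "<![CDATA[x]]>", "<?xml version='1.0'?>", "<div>"];
for (const sample of samples) {
  console.log(JSON.stringify(sample), "->", JSON.stringify(specialTagName(sample)));
}
```

The order of the checks matters: the doctype and "<!--" tests must run before the generic "<!" test, otherwise every declaration would be classified as a short comment.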
Benchmark
Thanks to htmlparser-benchmark. I created a pull request at pulls/7, and its result on my MacBook Pro is:

    $ npm test

    > htmlparser-benchmark@1.1.3 test ~/htmlparser-benchmark
    > node execute.js

    gumbo-parser failed
    high5 failed
    html-parser        : 28.6524 ms/file ± 21.4282
    html5              : 130.423 ms/file ± 161.478
    html5parser        : 2.37975 ms/file ± 3.30717
    htmlparser         : 16.6576 ms/file ± 109.840
    htmlparser2-dom    : 3.45602 ms/file ± 5.05830
    htmlparser2        : 2.61135 ms/file ± 4.33535
    hubbub failed
    libxmljs failed
    neutron-html5parser: 2.89331 ms/file ± 2.94316
    parse5 failed
    sax                : 10.2110 ms/file ± 13.5204