als-document

1.1.2 • Public • Published

als-document: HTML Parser & DOM Manipulation Library

Overview

als-document is a library for parsing HTML and manipulating the resulting DOM structure on both the backend and the frontend. It provides an API for querying and interacting with DOM elements using selectors.

Release notes

  • als-document is still in alpha testing. All tested features work, but real-world use continues to uncover bugs and behavior that should change. For example, this release changes how attributes with an empty value are stored.
  • This release also adds a powerful new feature: building a cache that stores the DOM tree as JSON, and rebuilding the DOM from that cache.

Installation

To install the als-document library, use the following npm command:

npm i als-document

Including the Library

The library provides three different files to cater to different module systems:

  1. index.js: This file uses the CommonJS module system. It suits Node.js projects and bundlers such as Browserify or Webpack. It is the "main" entry point in package.json.
const { parseHTML, Node, Query, TextNode, SingleNode, Root } = require('als-document');
  2. index.mjs: This file uses the ES Modules (ESM) system. It suits modern JavaScript environments that support ESM. It is the "module" entry point in package.json.
import { parseHTML, Node, Query, TextNode, SingleNode, Root } from 'als-document';
  3. document.js: Including this file creates a constant named alsDocument that wraps all the exports.
<script src="/node_modules/als-document/document.js"></script>
<script>
   const { parseHTML, Node, Query, TextNode, SingleNode, buildFromCache, cacheDoc, Root } = alsDocument
</script>

parseHTML

parseHTML is a function that takes an HTML string and constructs a DOM tree representation from it. It recognizes various HTML elements, such as comments, scripts, styles, and CDATA, and organizes them into nodes that can be manipulated and queried.

API:

parseHTML(html: string) -> Node

Parses an HTML string and returns a tree structure representing its content.

  • html: The HTML string to parse.
  • Returns: A Node object representing the root of the parsed HTML content tree.

Expected Outcome:

When using the parseHTML function, the output will be a tree of nodes representing the HTML content. Each node can be one of the following:

  • Node: A standard HTML element node with tag name, attributes, and child nodes.
  • SingleNode: Represents self-closing or void HTML elements.
  • TextNode: Represents text content in the HTML.

Each node will have a tag name, a dictionary of attributes, and a list of child nodes (if applicable).
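
As a rough illustration of the shape described above, the sketch below walks a tree of plain objects that stand in for parsed nodes. The object literals and the walk helper are hypothetical; only the tagName, attributes, and childNodes names come from the description above, and modeling text nodes as objects with a textContent field is an assumption. Real nodes returned by parseHTML may carry more fields.

```javascript
// Stand-in tree built from plain objects (NOT real als-document nodes).
// Assumption: text nodes are modeled as { textContent } with no tagName.
const tree = {
  tagName: 'div',
  attributes: { class: 'container' },
  childNodes: [
    { tagName: 'img', attributes: { src: 'image.jpg', alt: 'Image' } },
    { tagName: 'p', attributes: {}, childNodes: [{ textContent: 'Hello, world!' }] },
  ],
};

// Depth-first walk that collects one indented line per node.
function walk(node, depth = 0, lines = []) {
  const indent = '  '.repeat(depth);
  if (node.tagName === undefined) {
    lines.push(`${indent}#text "${node.textContent}"`);
    return lines;
  }
  lines.push(`${indent}<${node.tagName}>`);
  // Nodes without a childNodes list (such as <img> here) are treated as leaves.
  for (const child of node.childNodes || []) walk(child, depth + 1, lines);
  return lines;
}

const outline = walk(tree);
console.log(outline.join('\n'));
```

The same recursive pattern should apply to any tree whose nodes expose a list of children, whatever extra properties the nodes have.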

Examples

const parsedHTML = parseHTML('<div class="container"><img src="image.jpg" alt="Image"/><p>Hello, world!</p></div>');

// The returned `parsedHTML` object will be a tree-like structure. 
// For instance, parsedHTML.childNodes[0] would represent the <div> element, 
// and parsedHTML.childNodes[0].childNodes[0] would represent the <img> element inside it.

const parsedScript = parseHTML('<script>console.log("Hello, world!");</script>');

// The returned `parsedScript` object will contain a `script` Node with a child node 
// holding the JavaScript code as text content.

Remember, the actual tree structure will be more complex and detailed, but the provided examples give you a basic understanding of how to navigate through the parsed result.

Node

Node is a fundamental class that represents an element node in the DOM tree. It provides functionality similar to the native DOM API in browsers, but with its own implementation.

Properties:

  • tagName: Represents the tag name of the element.
  • attributes: A dictionary of attributes and their values.
  • childNodes: An array of child nodes for the element.
  • isSingle: Boolean indicating whether the node is a self-closing tag.
  • parentNode, previousElementSibling, nextElementSibling, children: Navigation properties to move through the DOM tree.
  • dataset, classList, style: Special properties for interacting with data-* attributes, classes, and inline styles.

Methods:

  • getAttribute, setAttribute, removeAttribute: Manipulate element's attributes.
  • remove: Removes the element from its parent.
  • innerHTML, outerHTML: Get and set the inner or entire HTML of the element.
  • querySelector, querySelectorAll: Find elements within the node based on CSS-like selectors.
    • Limitation: pseudo-selectors such as :first-of-type and :checked are not supported.
    • Namespaced tags such as some:namespace are supported.
    • The shorthand methods $ (for querySelector) and $$ (for querySelectorAll) are also available.
  • getElementsByClassName, getElementsByTagName, getElementById: Get elements by class, tag, or id respectively.
  • insertAdjacentElement, insertAdjacentHTML, insertAdjacentText: Insert content relative to the element.
  • appendChild: Add a child node to the element.
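  • insert(place, element): Inserts content relative to the element. place is either an index (0-3) or a position name (beforebegin, afterbegin, ...); element is either a raw HTML string or an element.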
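
To make the two forms of the place argument concrete, here is a minimal sketch of how an index could be normalized to a position name. The resolvePlace helper is hypothetical and not part of als-document. The four names are the standard DOM insertAdjacent positions, while the index order (0 for beforebegin through 3 for afterend) is an assumption the library may not share.

```javascript
// Standard DOM position names used by insertAdjacentElement/HTML/Text.
const POSITIONS = ['beforebegin', 'afterbegin', 'beforeend', 'afterend'];

// Hypothetical helper: accepts an index (0-3) or a position name.
// Assumption: indexes follow the order of POSITIONS above.
function resolvePlace(place) {
  const name = typeof place === 'number'
    ? POSITIONS[place]
    : String(place).toLowerCase();
  if (!POSITIONS.includes(name)) {
    throw new RangeError(`Unknown place: ${place}`);
  }
  return name;
}

console.log(resolvePlace(1));           // afterbegin
console.log(resolvePlace('beforeend')); // beforeend
```

In standard DOM terms, beforebegin and afterend place content outside the element as siblings, while afterbegin and beforeend place it inside as the first or last child.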

SingleNode

SingleNode extends the Node class and represents elements that have no closing tag (self-closing tags) in HTML. Examples include <img>, <br>, and <!DOCTYPE>. This class has a restricted set of methods and properties, since these elements can't have child nodes.

TextNode

TextNode is a class that represents text content within the DOM. A TextNode holds raw text data and does not have child nodes.

Root node (extends Node)

Has additional getters and setters:

  • getter root.title
  • setter root.title
  • getter root.body
  • getter root.head

Examples:

const div = new Node('div');
div.setAttribute('class', 'container');

const img = new SingleNode('img', { src: 'image.jpg', alt: 'An image' });
div.appendChild(img);

console.log(div.outerHTML);  // Outputs: <div class="container"><img src="image.jpg" alt="An image"></div>

const p = new Node('p',{},div); // adding as last child to parent div
p.textContent = "Hello, world!";

const foundP = div.querySelector('p');
console.log(foundP.textContent);  // Outputs: Hello, world!

Query

The Query class is designed to parse CSS selector strings and transform them into a structured object format, providing detailed insights into each selector and its components.

By using the class, one can expect to transform a CSS selector string into an array of objects.

Each object will represent a selector, containing detailed information such as its tag, identifier, classes, attributes, and associated selectors if any. This can be useful for further processing or analysis of CSS selectors in an application.

Example

let q1 = 'html>body>div.tabs~.some[type $= "radio and some"]>p+div>.some-id .tab-content~input[disabled] div.some'
let result = new Query(q1).selectors
let result1 = Query.get(q1)
// result and result1 should be identical
console.log(result)

Result:

[
   {
      "query": "div.some",
      "tag": "div",
      "classList": [
         "some"
      ],
      "ancestors": [
         {
            "query": ".some-id",
            "classList": [
               "some-id"
            ],
            "parents": [
               {
                  "query": "div",
                  "tag": "div"
               }
            ],
            "prev": {
               "query": "p",
               "tag": "p",
               "parents": [
                  {
                     "query": ".some[0]",
                     "classList": [
                        "some"
                     ],
                     "attribs": [
                        {
                           check:(f),
                           "query": "[type$=\"radio and some\"]",
                           "name": "type",
                           "value": "radio and some",
                           "sign": "$="
                        }
                     ]
                  }
               ],
               "prevAny": {
                  "query": "div.tabs",
                  "tag": "div",
                  "classList": [
                     "tabs"
                  ],
                  "parents": [
                     {
                        "query": "html",
                        "tag": "html"
                     },
                     {
                        "query": "body",
                        "tag": "body"
                     }
                  ]
               },
               "group": "html>body>div.tabs~.some[0]>p"
            },
            "group": "html>body>div.tabs~.some[0]>p+div>.some-id"
         },
         {
            "query": "input[1]",
            "tag": "input",
            "attribs": [
               {
                  "query": "[disabled]",
                  "name": "disabled"
               }
            ],
            "prevAny": {
               "query": ".tab-content",
               "classList": [
                  "tab-content"
               ]
            },
            "group": ".tab-content~input[1]"
         }
      ],
      "group": "html>body>div.tabs~.some[type $= \"radio and some\"]>p+div>.some-id .tab-content~input[disabled] div.some"
   }
]

Attribs and check function

If an attribute selector specifies a value, its attrib object contains a check function that takes one parameter: the value to test.

let s = Query.get('[test^="some"]')[0]
console.log(s.attribs[0].check('some value test')) // true

buildFromCache and cacheDoc

Building the DOM from raw HTML usually takes tens of milliseconds. Now you can build the DOM once and save its cache as a regular stringified JSON. Caching and rebuilding from the cache each take less than 5 ms and require very few resources.

How it works

const html = `` // some real html 255KB
const root = parseHTML(html); // 31.9ms
const cache = cacheDoc(root); // 2.4ms
const root1 = buildFromCache(cache); // 1.2ms
console.log(root.innerHTML === root1.innerHTML) // true
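
The general idea of caching a tree as JSON can be sketched as follows. This is not the format cacheDoc produces: toCache, fromCache, and the tag/attrs/kids/text field names are all hypothetical, and the nodes are plain stand-in objects rather than als-document nodes. The sketch only shows why a round trip through JSON avoids reparsing the HTML.

```javascript
// Flatten a node tree into plain data that JSON.stringify can handle.
// Parent links are dropped because they would create circular references.
function toCache(node) {
  if (node.tagName === undefined) return { text: node.textContent };
  return {
    tag: node.tagName,
    attrs: node.attributes,
    kids: node.childNodes.map(toCache),
  };
}

// Rebuild the tree from cached data and restore the parent links.
function fromCache(data, parentNode = null) {
  if (data.tag === undefined) return { textContent: data.text, parentNode };
  const node = {
    tagName: data.tag,
    attributes: { ...data.attrs },
    childNodes: [],
    parentNode,
  };
  node.childNodes = data.kids.map((kid) => fromCache(kid, node));
  return node;
}

// Stand-in tree: <div class="container"><p>Hello, world!</p></div>
const p = { tagName: 'p', attributes: {}, childNodes: [], parentNode: null };
p.childNodes.push({ textContent: 'Hello, world!', parentNode: p });
const div = { tagName: 'div', attributes: { class: 'container' }, childNodes: [p], parentNode: null };
p.parentNode = div;

const json = JSON.stringify(toCache(div));
const rebuilt = fromCache(JSON.parse(json));
console.log(rebuilt.childNodes[0].parentNode === rebuilt); // true
```

Rebuilding from structured data skips tokenizing and tag matching entirely, which is one plausible reason a cache-based rebuild can be much faster than parsing the original HTML string.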

License

ISC

Collaborators

  • alexsorkin