hd-html-parser
2.1.0 • Public • Published

HD HTML Parser

A fast and simple HTML parser that returns a Document object with various methods to manipulate and query the HTML elements.

Features

  • [x] React Native support
  • [x] Support for all browsers
  • [x] Node.js support

Installation

To install the hd-html-parser package, run the following command:

npm i hd-html-parser

Usage

To use the hd-html-parser package, import its default export (named HDHtmlParser in the example) and call it with an HTML string as the argument. It returns a Promise that resolves to a Document object, or to null if the HTML string is invalid. For example:

import HDHtmlParser from "hd-html-parser";

// declare an HTML string
const html = `
<html>
<head>
  <title>Example</title>
</head>
<body>
  <h1>Hello, world!</h1>
  <p>This is a paragraph.</p>
  <ul>
    <li>Item 1</li>
    <li>Item 2</li>
    <li>Item 3</li>
  </ul>
</body>
</html>
`;

// parse the HTML string
HDHtmlParser(html).then((document) => {
    // the promise resolves to null when the HTML string is invalid
    if (!document) return;
    console.log(document.getHtml()); // prints the whole HTML string
    // querySelector can return null, so guard the call with ?.
    console.log(document.querySelector("h1")?.getText()); // prints "Hello, world!"
    console.log(document.querySelectorAll("li").length); // prints 3
});
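
Because both the parsed document and the result of querySelector can be null, it can help to wrap lookups in a small null-safe helper. The sketch below is only an illustration: DocumentLike and textOf are hypothetical names declared locally, not exports of hd-html-parser, and the interface covers just the two documented methods the helper uses.

```typescript
// Minimal structural type covering only the documented methods used here.
// Declared locally for this sketch; it is NOT imported from hd-html-parser.
interface DocumentLike {
  querySelector(selector: string): DocumentLike | null;
  getText(): string | null;
}

// Hypothetical helper: returns the text of the first match, or a fallback
// when the document, the matched element, or its text is null.
function textOf(
  doc: DocumentLike | null,
  selector: string,
  fallback: string = ""
): string {
  return doc?.querySelector(selector)?.getText() ?? fallback;
}
```

Assuming the object returned by the parser exposes these two methods as documented, it could be passed to the helper directly, for instance inside the then callback as textOf(document, "h1").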

Document Methods

The Document object returned by the hd-html-parser package has the following methods:

  • querySelector(selector: string): Document | null - Returns the first element that matches the given CSS selector, or null if none is found.
  • querySelectorAll(selector: string): (Document | null)[] - Returns an array of all elements that match the given CSS selector, or an empty array if none are found.
  • getHtml(): string | null - Returns the HTML string of the element, or null if the element is not valid.
  • getAttribute(name: string): string | null | undefined - Returns the value of the attribute with the given name, null if the attribute does not exist, or undefined if the element is not valid.
  • getText(): string | null - Returns the text content of the element, or null if the element is not valid.
  • getParent(): Document - Returns the element's parent, or the element itself if it has no parent.
  • getChildren(): Array<Document> - Returns an array of the element's children, or an empty array if it has none.
  • getOuterHTML(): string | null - Returns the HTML string of the element including its opening and closing tags, or null if the element is not valid.
  • getInnerHTML(): string | null - Returns the HTML string of the element excluding its opening and closing tags, or null if the element is not valid.
  • getNext(): Document | null - Returns the element's next sibling, or null if it has none.
  • getPrev(): Document | null - Returns the element's previous sibling, or null if it has none.
  • getListData(selector: string, itemsSelector: any, meta?: any): Array<object> - Returns an array of objects representing the data of the list elements that match the given selector. The itemsSelector parameter is an object that maps the keys of the data objects to the CSS selectors of the list items. The optional meta parameter is an object that maps the keys of the data objects to the values of the meta attributes of the list elements.
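
The description of getListData leaves its exact behavior open, so the following is a sketch of one plausible reading, not the package's implementation. It builds the same kind of result from the documented querySelectorAll, querySelector, and getText methods. ElementLike, ItemSelectors, and listData are hypothetical names declared locally, and the meta parameter is left out because its semantics are not clear from the description.

```typescript
// Local structural type with only the documented methods this sketch needs.
// It is an assumption for illustration, not a type exported by the package.
interface ElementLike {
  querySelectorAll(selector: string): (ElementLike | null)[];
  querySelector(selector: string): ElementLike | null;
  getText(): string | null;
}

// Maps each key of a result object to a CSS selector, as the
// itemsSelector parameter is described.
type ItemSelectors = Record<string, string>;

// Hypothetical stand-in for getListData(selector, itemsSelector), without meta.
function listData(
  root: ElementLike,
  selector: string,
  items: ItemSelectors
): Record<string, string | null>[] {
  const rows: Record<string, string | null>[] = [];
  for (const el of root.querySelectorAll(selector)) {
    if (el === null) continue; // the documented return type allows null entries
    const row: Record<string, string | null> = {};
    for (const key of Object.keys(items)) {
      // Resolve each key's selector inside the current list element.
      row[key] = el.querySelector(items[key])?.getText() ?? null;
    }
    rows.push(row);
  }
  return rows;
}
```

Under this reading, a call such as listData(doc, "li.product", { title: ".title", price: ".price" }) would yield one object per matching list element, with null for any key whose selector matches nothing. The class names in that call are placeholders.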

License

This project is licensed under the terms of the MIT license. See the LICENSE file for details.

Package Sidebar

License

ISC

Unpacked Size

14.8 kB

Total Files

4

Collaborators

  • hoangdaicntt