HtmlToJson is a package for converting HTML to JSON and JSON to HTML using Rust and WebAssembly. The package is still fairly raw, but it gets the job done.
This package is built on top of the HtmlEditor library.
To use HtmlToJson in your project, you can install it using npm:
npm install html-to-json-rs
After installing the package, you can use it in your project by importing it into your JavaScript or TypeScript code:
import { init, jsonToHtml, htmlToJson, NODES } from 'html-to-json-rs';

const main = async () => {
  /**
   * Call init first.
   * init is an async function; the other functions are synchronous.
   */
  await init();

  // Constants for all node types, useful for checking the obj_type field
  console.log(NODES);

  // Example: Convert HTML to JSON
  const htmlString = '<p>Hello, </p><span>World!</span>';
  const jsonResult = htmlToJson(htmlString);
  console.log(jsonResult);

  // Example: Convert JSON to HTML
  const jsonObject = [
    {
      obj_type: 'Element',
      name: 'p',
      attrs: [],
      children: [{ Text: 'World!'}]
    }
  ];
  const htmlResult = jsonToHtml(JSON.stringify(jsonObject));
  console.log(htmlResult);
};

main();
- htmlToJson(content: string, trim: boolean = true): Converts an HTML string into a JSON string. The trim parameter (default: true) controls whether whitespace in the HTML is removed to reduce the number of elements in the resulting JSON (e.g., eliminating empty text nodes like Text(" \n ")). Whitespace inside <pre> elements is preserved regardless of the trim setting.
- jsonToHtml(content: string): Converts a JSON string into an HTML string.
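The effect of trim can be pictured as a filter over the parsed tree. The sketch below works on plain objects and is not the package's Rust implementation: the TreeNode type and the dropBlankText helper are hypothetical names, and the node shape is assumed from the JsonObj description in this README. It only models the removal of whitespace-only text nodes outside <pre>.

```typescript
// Assumed minimal node shape (hypothetical type, not exported by the package).
interface TreeNode {
  obj_type: string;
  name: string;
  text?: string;
  children?: TreeNode[];
}

// Hypothetical helper: drop whitespace-only text nodes, except inside <pre>.
function dropBlankText(nodes: TreeNode[], insidePre = false): TreeNode[] {
  const kept: TreeNode[] = [];
  for (const node of nodes) {
    const isBlankText =
      node.obj_type === 'Text' && (node.text ?? '').trim() === '';
    if (isBlankText && !insidePre) {
      continue; // this is the kind of node trim = true is described as removing
    }
    const pre =
      insidePre || (node.obj_type === 'Element' && node.name === 'pre');
    kept.push(
      node.children
        ? { ...node, children: dropBlankText(node.children, pre) }
        : node
    );
  }
  return kept;
}

const tree: TreeNode[] = [
  { obj_type: 'Text', name: 'Text', text: ' \n ' },
  { obj_type: 'Element', name: 'p', children: [{ obj_type: 'Text', name: 'Text', text: 'Hi' }] },
  { obj_type: 'Element', name: 'pre', children: [{ obj_type: 'Text', name: 'Text', text: '  ' }] },
];

const cleaned = dropBlankText(tree);
console.log(cleaned.length);               // 2: the top-level blank text node is gone
console.log(cleaned[1].children?.length);  // 1: whitespace inside <pre> is kept
```

If you need the whitespace-only nodes, for example to reproduce the source formatting exactly, pass false as the second argument to htmlToJson.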
The result of calling htmlToJson is a JSON string containing an array of JsonObj. If you want to render it yourself, the definition of this struct is given below.
The JsonObj type represents the core structure of the generated JSON. It has the following fields:
- obj_type (String): Indicates the type of the JSON object. Possible values include:
  - "Element": Represents an HTML element.
  - "Text": Represents text content.
  - "Comment": Represents a comment in the HTML.
  - "Doctype": Represents the document type declaration.
- text (String): The content of a Text or Comment node.
- name (String): The name of the node. For elements, this corresponds to the tag name.
- attrs (Array of Tuples): The attributes of the HTML element. Each attribute is a [key, value] tuple.
- children (Array of JsonObj): Contains the child nodes if the current object is an HTML element. It represents the nested structure of the HTML.
- id (String): The tag's id, also present in the attrs array (in the future it may no longer appear in attrs).
- class (String): The tag's classes, also present in the attrs array (in the future they may no longer appear in attrs).
The text, attrs, children, id, and class fields are optional. For example, if the HTML tag has no id attribute, the id field will be undefined.
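For TypeScript users, the field list can be written down as a type. The following is a sketch transcribed from the description above, not a declaration shipped with the package; the JsonObj interface, the NodeType alias, and the attrsToRecord helper are assumptions made for illustration.

```typescript
// Assumed shape of one node, transcribed from the field list in this README.
type NodeType = 'Element' | 'Text' | 'Comment' | 'Doctype';

interface JsonObj {
  obj_type: NodeType;
  name: string;
  text?: string;
  attrs?: [string, string][]; // each attribute is a [key, value] tuple
  children?: JsonObj[];
  id?: string;
  class?: string;
}

// Hypothetical helper: turn the attrs tuples into a lookup object.
function attrsToRecord(node: JsonObj): Record<string, string> {
  const out: Record<string, string> = {};
  for (const [key, value] of node.attrs ?? []) {
    out[key] = value;
  }
  return out;
}

// In a real project this string would come from htmlToJson(...).
const sample =
  '[{"obj_type":"Element","name":"span",' +
  '"attrs":[["style","color: red;"],["id","fav"]],"id":"fav","children":[]}]';

const parsed = JSON.parse(sample) as JsonObj[];
const firstNode = parsed[0];

console.log(attrsToRecord(firstNode).style); // "color: red;"
console.log(firstNode.id);                   // "fav"
console.log(firstNode.class === undefined);  // true: no class attribute on the tag
```

Because htmlToJson returns a string, JSON.parse is needed before the fields can be read; the cast to JsonObj[] is unchecked, so validate the data if it comes from an untrusted source.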
Here's a simple HTML fragment, followed by the JSON structure generated for it:
<p class="ql-align-center">
Hello <span id="fav" style="color: rgb(230, 0, 0);">World</span>
</p>
[
  {
    "obj_type": "Element",
    "name": "p",
    "attrs": [
      ["class", "ql-align-center"]
    ],
    "class": "ql-align-center",
    "children": [
      {
        "obj_type": "Text",
        "name": "Text",
        "text": "Hello ",
        "attrs": [],
        "children": []
      },
      {
        "obj_type": "Element",
        "name": "span",
        "attrs": [
          ["style", "color: rgb(230, 0, 0);"],
          ["id", "fav"]
        ],
        "id": "fav",
        "children": [
          {
            "obj_type": "Text",
            "name": "Text",
            "text": "World",
            "attrs": [],
            "children": []
          }
        ]
      }
    ]
  }
]
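If you want to render this structure yourself instead of calling jsonToHtml, a recursive walk over the array is enough. The sketch below is one possible approach, not the package's own renderer: renderNodes, renderNode, and VOID_TAGS are hypothetical names, the JsonObj interface is assumed from the field list above, and text and attribute values are assumed to be stored exactly as they appeared in the source HTML (so no escaping is applied).

```typescript
// Node shape assumed from the JsonObj field list in this README.
interface JsonObj {
  obj_type: 'Element' | 'Text' | 'Comment' | 'Doctype';
  name: string;
  text?: string;
  attrs?: [string, string][];
  children?: JsonObj[];
  id?: string;
  class?: string;
}

// Elements that never take a closing tag (a partial list, for illustration).
const VOID_TAGS = ['br', 'hr', 'img', 'input', 'link', 'meta'];

// Hypothetical renderer: values are emitted exactly as stored.
function renderNodes(nodes: JsonObj[]): string {
  return nodes.map((node) => renderNode(node)).join('');
}

function renderNode(node: JsonObj): string {
  switch (node.obj_type) {
    case 'Text':
      return node.text ?? '';
    case 'Comment':
      return `<!--${node.text ?? ''}-->`;
    case 'Element': {
      // id and class are read from attrs only, since they are duplicated there.
      const attrs = (node.attrs ?? [])
        .map(([key, value]) => ` ${key}="${value}"`)
        .join('');
      if (VOID_TAGS.indexOf(node.name) !== -1) {
        return `<${node.name}${attrs}>`;
      }
      return `<${node.name}${attrs}>${renderNodes(node.children ?? [])}</${node.name}>`;
    }
    default:
      // Doctype nodes are skipped: their field layout is not documented here.
      return '';
  }
}

// The example structure from above, as it would look after JSON.parse.
const exampleTree: JsonObj[] = [
  {
    obj_type: 'Element',
    name: 'p',
    attrs: [['class', 'ql-align-center']],
    class: 'ql-align-center',
    children: [
      { obj_type: 'Text', name: 'Text', text: 'Hello ', attrs: [], children: [] },
      {
        obj_type: 'Element',
        name: 'span',
        attrs: [['style', 'color: rgb(230, 0, 0);'], ['id', 'fav']],
        id: 'fav',
        children: [
          { obj_type: 'Text', name: 'Text', text: 'World', attrs: [], children: [] },
        ],
      },
    ],
  },
];

const renderedHtml = renderNodes(exampleTree);
console.log(renderedHtml);
// <p class="ql-align-center">Hello <span style="color: rgb(230, 0, 0);" id="fav">World</span></p>
```

Attributes are written in the order they appear in attrs, which is why style precedes id in the output. If the data may contain unescaped characters or comes from an untrusted source, add escaping before inserting the result into a page.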