npm install compromise-output
Demo
const nlp = require('compromise')
nlp.extend(require('compromise-output'))
let doc = nlp('The Children are right to laugh at you, Ralph')
// generate an md5 hash for the document
doc.hash()
// 'KD83KH3L2B39_UI3N1X'
// create an html rendering of the document
doc.html({ '#Person+': 'red', '#Money+': 'blue' })
/*
<pre>
<span>The Children are right to laugh at you, </span><span class="red">Ralph</span>
</pre>
*/
.hash()
This hash function incorporates each term's part-of-speech tags and whitespace, so tagging or normalizing the document will change the hash.
MD5 is not considered a very secure hash, so heads-up if you're doing some top-secret work.
It can, though, be used successfully to compare two documents without looping through their tags:
let docA = nlp('hello there')
let docB = nlp('hello there')
console.log(docA.hash() === docB.hash())
// true
docB.match('hello').tag('Greeting')
console.log(docA.hash() === docB.hash())
// false
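To illustrate why adding a tag changes the hash, here is a minimal sketch using Node's built-in crypto module. It is not this plugin's implementation: the term objects (pre, text, post, tags) and the sketchHash function are hypothetical stand-ins, and the serialization format is an assumption made only for this example.

```javascript
const crypto = require('crypto')

// Hypothetical stand-in for a tagged document: a list of terms,
// each with its text, surrounding whitespace, and tags.
function sketchHash(terms) {
  // Serialize text, whitespace, and sorted tags so all three affect the digest
  const input = terms
    .map((t) => [t.pre, t.text, t.post, [...t.tags].sort().join(',')].join('|'))
    .join('\n')
  return crypto.createHash('md5').update(input).digest('hex')
}

const docA = [
  { pre: '', text: 'hello', post: ' ', tags: [] },
  { pre: '', text: 'there', post: '', tags: [] },
]
// Same words, but with one extra tag on the first term
const docB = docA.map((t, i) => (i === 0 ? { ...t, tags: ['Greeting'] } : t))

console.log(sketchHash(docA) === sketchHash(docA)) // true
console.log(sketchHash(docA) === sketchHash(docB)) // false
```

Because the tags are part of the hashed input, two documents with identical words but different tagging produce different digests.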
If you're looking for insensitivity to punctuation or case, you can normalize or transform your document before making the hash.
let doc = nlp(`He isn't... working `)
doc.normalize({
case: true,
punctuation: true,
contractions: true,
})
nlp('he is not working').hash() === doc.hash()
// true
.html({segments}, {options})
This turns the document into easy-to-display html output.
Special html characters within the document get escaped, in a simple way. Be extra careful about XSS and other forms of sneaky html when rendering untrusted input. This library is not considered a battle-tested guard against these security vulnerabilities.
let doc = nlp('i <3 you')
doc.html()
// <div>i &lt;3 you</div>
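For a sense of what simple escaping can look like, here is a minimal sketch. The escapeHtml helper is hypothetical and is not the plugin's own routine; like the plugin, it should not be treated as a complete defence against XSS.

```javascript
// Hypothetical helper, not the library's own escaping routine.
// Replaces the five characters that are significant in html text and attributes.
function escapeHtml(str) {
  const entities = {
    '&': '&amp;',
    '<': '&lt;',
    '>': '&gt;',
    '"': '&quot;',
    "'": '&#39;',
  }
  return str.replace(/[&<>"']/g, (ch) => entities[ch])
}

console.log(escapeHtml('i <3 you'))
// i &lt;3 you
```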
You can pass in a mapping of tags to html classes, so that document metadata can be styled with css.
let doc = nlp('made by Spencer Kelly')
doc.html({
'#Person+': 'red',
})
// <pre><span>made by </span><span class="red">Spencer Kelly</span></pre>
The library uses the .segment() method, which is documented here.
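As a rough sketch of the idea, splitting a document into segments and wrapping each one in a span could look like the following. The segment shape ({ text, className }) and the renderSegments function are assumptions made for illustration, not the actual return value of .segment() or the plugin's rendering code.

```javascript
// Hypothetical segment shape: { text, className }, where className is optional.
// Note: this sketch does not escape the text, so it is only safe for trusted input.
function renderSegments(segments) {
  const spans = segments.map((seg) =>
    seg.className
      ? `<span class="${seg.className}">${seg.text}</span>`
      : `<span>${seg.text}</span>`
  )
  return `<pre>${spans.join('')}</pre>`
}

const html = renderSegments([
  { text: 'made by ' },
  { text: 'Spencer Kelly', className: 'red' },
])
console.log(html)
// <pre><span>made by </span><span class="red">Spencer Kelly</span></pre>
```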
By default, whitespace and punctuation are kept outside the html tag. This is sometimes awkward, and not ideal.
The method returns html strings by default, but the library uses Jason Miller's htm library, so you can return React components, or anything else:
doc.html(
{},
{
bind: React.createElement,
}
)
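htm works with any function that has the same (type, props, ...children) signature as React.createElement. As a sketch, and assuming the bind option accepts any such function (this has not been verified against the plugin), a small hand-written function that returns plain objects could stand in for React.createElement. The h function below is hypothetical, and the tree is built by hand only to show the shape it produces.

```javascript
// Hypothetical hyperscript-style function with the same
// (type, props, ...children) signature as React.createElement
function h(type, props, ...children) {
  return { type, props: props || {}, children }
}

// Built by hand here, to show the shape such a function would produce
const tree = h(
  'pre',
  null,
  h('span', null, 'made by '),
  h('span', { class: 'red' }, 'Spencer Kelly')
)

console.log(JSON.stringify(tree, null, 2))
```

Returning plain objects like this can be handy for testing or for feeding a renderer other than React.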
MIT