partsley
- A tool for parsing the web
Usage
Install from npm/yarn
$ npm install partsley
Use a "parselet" as a recipe/filter to parse a website.
Parselets are plain JS objects, so they can be serialized as, e.g., YAML or JSON. The examples here are shown in YAML for brevity.
Here is an example of a parselet for grabbing business data from a Yelp page:
name: h1
phone: .biz-phone
address: address
reviews(.review):
- date: meta[itemprop=datePublished] @content
  name: .user-name a
  comment: .review-content p
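Since parselets are plain objects, the same recipe can also be written directly in JavaScript. The sketch below assumes the YAML maps one-to-one onto object keys and string values; the variable names (yelpParselet, asJson, restored) are illustrative and not part of partsley.

```javascript
// The Yelp parselet, written as a plain JavaScript object (illustrative names).
const yelpParselet = {
  name: 'h1',
  phone: '.biz-phone',
  address: 'address',
  // The key carries the reference selector; the one-element array holds the
  // recipe that describes each matching review.
  'reviews(.review)': [
    {
      date: 'meta[itemprop=datePublished] @content',
      name: '.user-name a',
      comment: '.review-content p',
    },
  ],
};

// A plain object of strings and arrays survives a JSON round trip unchanged.
const asJson = JSON.stringify(yelpParselet, null, 2);
const restored = JSON.parse(asJson);
console.log(asJson);
```

Keeping the parselet as data means it can be stored in a file, sent over the network, or generated by another program.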
As a module
You can also use partsley as a module from your own code.
Tips
partsley is a general-purpose, flexible tool. Here are some tips for getting started.
Grabbing a list of data
Use a reference selector in the key and an Array as the value.
users(.user):
- name: .name
  age: .age
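To make the key convention concrete, here is a minimal sketch of how a key such as users(.user) could be split into a field name and a reference selector. The helper splitReferenceKey is hypothetical and is not taken from partsley; it only illustrates the shape of the key.

```javascript
// Hypothetical helper (not partsley's API): split "name(selector)" keys.
function splitReferenceKey(key) {
  // Field name, then the reference selector inside the outer parentheses.
  const match = /^([^()]+)\((.+)\)$/.exec(key.trim());
  if (!match) {
    // A plain key such as "name" has no reference selector.
    return { name: key.trim(), selector: null };
  }
  return { name: match[1].trim(), selector: match[2].trim() };
}

console.log(splitReferenceKey('users(.user)'));
console.log(splitReferenceKey('name'));
```

Under this reading, each element matching .user would yield one entry in the users array, with name and age resolved relative to that element.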
Use transformation functions on data
Add a pipe (|) and the transformation name after the data selector.
user:
  name: .name
  age: .age|parseInt
  worth: .age|parseFloat
  someNumber: .age|Math.floor
By default, the functions in scope include any standard library functions. However, you are encouraged to bring your own functions into scope. For example, curried libraries such as Ramda or Lodash FP can expose transforms like toLower and split(',').
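The pipe convention can be sketched as a lookup of the transform name in a scope object. Everything below is an assumption made for illustration: the helpers splitTransform, resolveTransform and applyTransform, the scope object, and the name splitOnComma are hypothetical and do not describe how partsley itself registers or resolves functions. The curried helpers are written by hand so the sketch has no dependencies; Ramda or Lodash FP functions could fill the same role.

```javascript
// Hand-written curried helpers in the spirit of Ramda or Lodash FP.
const toLower = (text) => text.toLowerCase();
const split = (separator) => (text) => text.split(separator);

// Hypothetical scope: transform names mapped to functions.
const scope = {
  parseInt: (text) => parseInt(text, 10),
  parseFloat,
  Math,
  toLower,
  splitOnComma: split(','),
};

// Split "selector|transform" at the last pipe in the string.
function splitTransform(value) {
  const pipeIndex = value.lastIndexOf('|');
  if (pipeIndex === -1) {
    return { selector: value.trim(), transform: null };
  }
  return {
    selector: value.slice(0, pipeIndex).trim(),
    transform: value.slice(pipeIndex + 1).trim(),
  };
}

// Walk dotted names such as "Math.floor" through the scope object.
function resolveTransform(path, fnScope) {
  const fn = path
    .split('.')
    .reduce((obj, key) => (obj == null ? undefined : obj[key]), fnScope);
  if (typeof fn !== 'function') {
    throw new Error(`Unknown transform: ${path}`);
  }
  return fn;
}

// Apply the transform named in the parselet value to the extracted text.
function applyTransform(value, rawText, fnScope) {
  const { transform } = splitTransform(value);
  return transform ? resolveTransform(transform, fnScope)(rawText) : rawText;
}

console.log(applyTransform('.age|parseInt', '42', scope));
console.log(applyTransform('.tags|splitOnComma', 'a,b', scope));
```

Looking names up in an explicit scope object keeps the parselet itself pure data while letting the caller decide which functions are available.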
Grabbing an attribute
Use an at sign (@) after the selector to reference an attribute.
user:
  name: .name
  nickname: .name@data-nickname
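As a rough illustration of the @ convention, the sketch below splits a value into its selector and attribute name. The helper splitAttribute is hypothetical and not part of partsley; it assumes the attribute name follows the last @ in the string, with optional whitespace before it as in the Yelp example.

```javascript
// Hypothetical helper (not partsley's API): split "selector@attribute" values.
function splitAttribute(value) {
  const atIndex = value.lastIndexOf('@');
  if (atIndex === -1) {
    // No attribute reference: the whole value is a selector.
    return { selector: value.trim(), attribute: null };
  }
  return {
    selector: value.slice(0, atIndex).trim(),
    attribute: value.slice(atIndex + 1).trim(),
  };
}

console.log(splitAttribute('.name@data-nickname'));
console.log(splitAttribute('meta[itemprop=datePublished] @content'));
```

With this split, the nickname field would be read from the data-nickname attribute of the element matching .name rather than from its text.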
Have fun!