crawler-url-parser
A URL parser for crawling purposes.
Installation
npm install crawler-url-parser
Usage
Parse
const cup = require("crawler-url-parser");

// parse(current_url[, base_url])
let result = cup.parse(current_url);

// Values logged from result, in order:
// http://question.stackoverflow.com/aaa/bbb/ddd?q1=query1&q2=query2   (URL)
// null                                                                (base URL, none passed)
// http://question.stackoverflow.com/aaa/bbb/ddd?q1=query1&q2=query2
// question.stackoverflow.com                                          (host)
// stackoverflow.com                                                   (domain)
// question                                                            (subdomain)
// http:                                                               (protocol)
// /aaa/bbb/ddd                                                        (path)
// q1=query1&q2=query2                                                 (query string)
// 2                                                                   (number of query parameters)
Parse with baseURL
const cup = require("crawler-url-parser");

// parse(current_url[, base_url])
let result = cup.parse(current_url, base_url);

// Values logged from result, in order:
// http://question.stackoverflow.com/aaa/bbb/ddd?q1=query1&q2=query2   (URL)
// http://question.stackoverflow.com/aaa/bbb/ccc                       (base URL)
// http://question.stackoverflow.com/aaa/bbb/ddd?q1=query1&q2=query2
// question.stackoverflow.com                                          (host)
// stackoverflow.com                                                   (domain)
// question                                                            (subdomain)
// http:                                                               (protocol)
// /aaa/bbb/ddd                                                        (path)
// q1=query1&q2=query2                                                 (query string)
// 2                                                                   (number of query parameters)
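To illustrate what resolving a link against a base URL involves, here is a minimal sketch that uses the WHATWG `URL` class built into Node.js. It is not crawler-url-parser's implementation; `describeUrl` and the field names it returns are hypothetical, and the relative link `ddd?q1=query1&q2=query2` is an assumed input chosen to reproduce the URL shown above. Splitting the host into domain and subdomain is left out because that needs a public-suffix list.

```javascript
// Sketch only: built on Node's WHATWG URL class, not on crawler-url-parser.
// describeUrl and its field names are hypothetical.
function describeUrl(currentUrl, baseUrl) {
  // new URL(relative, base) resolves a relative link the way a browser would.
  const u = baseUrl ? new URL(currentUrl, baseUrl) : new URL(currentUrl);
  return {
    href: u.href,
    host: u.host,
    protocol: u.protocol,
    path: u.pathname,
    // Drop the leading "?" so only the query string itself remains.
    search: u.search.replace(/^\?/, ""),
    queryCount: [...u.searchParams.keys()].length,
  };
}

const info = describeUrl(
  "ddd?q1=query1&q2=query2", // assumed relative link
  "http://question.stackoverflow.com/aaa/bbb/ccc"
);
console.log(info.href);
// http://question.stackoverflow.com/aaa/bbb/ddd?q1=query1&q2=query2
console.log(info.path);       // /aaa/bbb/ddd
console.log(info.queryCount); // 2
```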
Extract
const cup = require("crawler-url-parser");

// extract(html_str, current_url)
let htmlStr = '<html><body> \
    <a href="http://best.question.stackoverflow.com">subdomain</a><br /> \
    <a href="http://faq.stackoverflow.com">subdomain</a><br /> \
    <a href="http://stackoverflow.com">updomain</a><br /> \
    <a href="http://www.google.com">external</a><br /> \
    <a href="http://www.facebook.com">external</a><br /> \
    <a href="http://question.stackoverflow.com/aaa/bbb/ccc">sublevel</a><br /> \
    <a href="http://question.stackoverflow.com/aaa/bbb/zzz">sublevel</a><br /> \
    <a href="http://question.stackoverflow.com/aaa/">uplevel</a><br /> \
    <a href="http://question.stackoverflow.com/aaa/ddd">samelevel</a><br /> \
    <a href="http://question.stackoverflow.com/aaa/eee">samelevel</a><br /> \
    <a href="http://question.stackoverflow.com/aaa/ddd/eee">internal</a><br /> \
    <a href="http://question.stackoverflow.com/zzz">internal</a><br /> \
</body></html>';

let currentUrl = "http://question.stackoverflow.com/aaa/bbb";
let urls = cup.extract(htmlStr, currentUrl);

// Type logged for each extracted URL, in order:
// subdomain
// subdomain
// updomain
// external
// external
// sublevel
// sublevel
// uplevel
// samelevel
// samelevel
// internal
// subdomain
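As a rough illustration of the extraction step, the sketch below pulls `href` values out of an HTML string and resolves them against the page URL. It is an assumption-laden stand-in, not the library's code: `extractLinks` is a hypothetical helper, the regular expression only handles double-quoted `href` attributes, and a real crawler would use a proper HTML parser.

```javascript
// Sketch only: a hypothetical extractLinks helper, not crawler-url-parser's extract().
// The regex covers simple <a href="..."> tags; it is not a full HTML parser.
function extractLinks(html, pageUrl) {
  const links = [];
  const anchor = /<a\s[^>]*href="([^"]*)"[^>]*>([\s\S]*?)<\/a>/gi;
  for (const match of html.matchAll(anchor)) {
    try {
      // Resolve relative hrefs against the page they were found on.
      links.push({ url: new URL(match[1], pageUrl).href, text: match[2].trim() });
    } catch {
      // Skip hrefs that cannot be resolved to a valid URL.
    }
  }
  return links;
}

const sample =
  '<a href="/aaa/">uplevel</a><br />' +
  '<a href="http://www.google.com">external</a>';
const found = extractLinks(sample, "http://question.stackoverflow.com/aaa/bbb");
console.log(found.map((l) => l.url));
// [ 'http://question.stackoverflow.com/aaa/', 'http://www.google.com/' ]
```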
Level
const cup = require("crawler-url-parser");

// gettype(current_url, base_url)
let level = cup.gettype(current_url, base_url);
console.log(level);

// The four example calls log, in order:
// sublevel
// uplevel
// samelevel
// external
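The path-based labels can be read off the Extract example, where links are classified relative to `http://question.stackoverflow.com/aaa/bbb`. The sketch below encodes one possible reading of those labels by comparing path segments. It is a guess at the rules, not the library's algorithm: `pathLevel` is a hypothetical helper, and the `other-host` label is invented here because telling `subdomain`, `updomain`, and `external` apart needs registrable-domain logic that this sketch does not attempt.

```javascript
// Sketch only: a hypothetical pathLevel helper, not crawler-url-parser's gettype().
// The rules are inferred from the labels in the Extract example.
function pathLevel(linkUrl, pageUrl) {
  const link = new URL(linkUrl);
  const page = new URL(pageUrl);
  if (link.host !== page.host) return "other-host"; // invented label

  const segments = (u) => u.pathname.split("/").filter(Boolean);
  const a = segments(link);
  const b = segments(page);
  // True when `prefix` is a strictly shorter leading run of `full`.
  const isPrefix = (prefix, full) =>
    prefix.length < full.length && prefix.every((s, i) => s === full[i]);

  if (isPrefix(b, a)) return "sublevel"; // link lies below the page's path
  if (isPrefix(a, b)) return "uplevel";  // link is an ancestor of the page's path
  const sameParent =
    a.length === b.length &&
    a.slice(0, -1).join("/") === b.slice(0, -1).join("/");
  return sameParent ? "samelevel" : "internal";
}

const page = "http://question.stackoverflow.com/aaa/bbb";
console.log(pathLevel("http://question.stackoverflow.com/aaa/bbb/ccc", page)); // sublevel
console.log(pathLevel("http://question.stackoverflow.com/aaa/", page));        // uplevel
console.log(pathLevel("http://question.stackoverflow.com/aaa/ddd", page));     // samelevel
console.log(pathLevel("http://question.stackoverflow.com/aaa/ddd/eee", page)); // internal
```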
Test
mocha
or
npm test
- More than 200 unit test cases.
- See the test folder and quickstart.js for more usage examples.
Support
I use this package actively myself, so it has my top priority. You can chat with me on WhatsApp about any questions, ideas, or suggestions.
Submitting an Issue
If you find a bug or a mistake, you can help by submitting an issue to the GitLab repository.
Creating a Merge Request
GitLab calls it a merge request rather than a pull request.
License
MIT licensed. All of its dependencies are MIT or BSD licensed.