WebSpy
My name is Spy. WebSpy.
Description
WebSpy silently sends its agents out there, to the darkest, creepiest corners of the internet,
to keep an eye out and let you know if someone has messed with web pages you care about...
(Or, to be less dramatic: use WebSpy if you want to be notified when a web page has changed.)
How To Use WebSpy
- Create a new directory for the project. The recommended structure is:
/project-root
|-- results
|-- agents
|   |-- agent-1.js
|   |-- agent-2.js
|   |-- ...
|-- operator.js
|-- operator-a.js
- Install WebSpy node package:
npm install webspy --save
- Create as many WebSpy Agent files as you wish, one per URL (agents are described below)
- (Optional) Create a WebSpy Operator to manage your agents
- Run the agent file directly or use the operator (the same way you run any other node file):
node ./agents/agent-1.js
WebSpy Components
Agent
The agent is WebSpy's most important building block, and it is merely a configuration file. Each agent collects data from a single URL.
Agent Sample File
'use strict'; var Agent = Agent; module.exports = Agent;
Agent Hooks
Hooks are methods that let you run code at specific points in a component's life cycle. Most WebSpy hooks also let you modify the arguments relevant to that hook; for example, you may change the output file path before WebSpy saves it.
IMPORTANT TO KNOW: If you modify a hook's arguments, return them at the end of the method for the modification to take effect. When a hook has more than one argument, return all arguments as an object, i.e.
{ arg1 = 'WebSpy'; arg2 = 'Is Awesome'; return { arg1: arg1, arg2: arg2 }; }
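The return convention can be sketched as follows. This is only an illustration, not WebSpy's actual implementation: `applyHook` is a hypothetical helper, the file paths are made up, and two behaviours are assumptions inferred from the note above (a hook that returns nothing leaves the arguments untouched, and a single-argument hook returns the value itself rather than an object).

```javascript
'use strict';

// Hypothetical helper (not part of WebSpy): calls a hook with named
// arguments and merges whatever the hook returns back into them.
function applyHook(hook, args) {
  var names = Object.keys(args);
  var values = names.map(function (name) { return args[name]; });
  var result = hook.apply(null, values);

  // Nothing returned: assume the original arguments stay as they were.
  if (result === undefined) {
    return args;
  }
  // Single-argument hook: assume the returned value replaces that argument.
  if (names.length === 1) {
    var single = {};
    single[names[0]] = result;
    return single;
  }
  // Multi-argument hook: the returned object overrides matching names.
  return Object.assign({}, args, result);
}

// A willSave-style hook that redirects the output file (paths are examples).
function willSave(file, current) {
  file = './results/custom.json';
  return { file: file, current: current };
}

var updated = applyHook(willSave, {
  file: './results/out.json',
  current: { title: 'Hello' }
});
console.log(updated.file); // ./results/custom.json
```

Without the `return` statement in `willSave`, the reassignment of `file` would stay local to the hook and the original path would be used.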
- willScrape(url, selectors)
  - Occurs before the agent begins to scrape data. Arguments can be modified.
  - url {String} The web page URL from which data will be scraped
  - selectors {Object} The data selectors object
- didScrape(current)
  - Occurs after the agent finishes scraping. Argument can be modified.
  - current {Object} Current execution results
- willCompare(previous, current)
  - Occurs before the agent begins to compare execution results. Arguments can be modified.
  - previous {Object} Previous execution results
  - current {Object} Current execution results
- didCompare(comparison)
  - Occurs after the agent finishes the comparison. Argument can be modified.
  - comparison {Object} Comparison results
- willNotify(slack, comparison)
  - Occurs before the agent sends comparison results. Arguments can be modified.
  - slack {Object} Slack instance configuration
  - comparison {Object} Comparison results
- didNotify(status)
  - Occurs after the agent sends comparison results.
  - status {Object} Slack sending status
- willSave(file, current)
  - Occurs before the agent saves current execution results to a file. Arguments can be modified.
  - file {String} Execution results file path
  - current {Object} Current execution results
- didSave(file)
  - Occurs after the agent saves current execution results to a file.
  - file {String} Execution results file path
Running an Agent
Running an agent is as simple as running any other node module:
node ./my-agent.js
However, you might want to check out WebSpy Operator, which offers advanced
features such as agent scheduling (keep reading).
Operator
WebSpy Operator is an optional component that runs the agents for you at a
specific date and time, or on a recurring basis.
Note that you can define multiple operators in the same file: just clone the operator
variable and extend the clone. This comes in handy when you want to run your agents on
different schedules.
Operator Sample File
'use strict'; var Operator = Operator; module.exports = Operator;
CRON Syntax
When specifying cron values, make sure they fall within the ranges below. For instance, some cron implementations use a 0-7 range for the day of week, where both 0 and 7 represent Sunday. We do not:
- Units
  - Seconds: 0-59
  - Minutes: 0-59
  - Hours: 0-23
  - Day of Month: 1-31
  - Months: 0-11
  - Day of Week: 0-6
- Patterns
  - * - any
  - 1-4,7 - range
  - */2 - steps
Examples
- * * * * * * - run every second
- */30 * * * * * - run every 30 seconds
- 0 */5 * * * * - run every 5 minutes
- 0 0 * * * 1-5 - run every hour, only on working days
- 0 30 14 1 6 * - run on July 1st, 14:30 (month is a 0-based index)
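To make the six-field layout and the pattern types concrete, here is a minimal sketch of how an expression could be checked against a date. It is not the parser WebSpy uses: `fieldMatches` and `cronMatches` are hypothetical names, and the sketch simply requires every field to match, which ignores edge cases that real cron libraries handle (names such as `MON`, time zones, special day-of-month/day-of-week rules).

```javascript
'use strict';

// Returns true when `value` satisfies one cron field, e.g. '*/5' or '1-4,7'.
// `min` is the lowest legal value of the unit (0 for seconds, 1 for day of month).
function fieldMatches(field, value, min) {
  return field.split(',').some(function (part) {
    var pieces = part.split('/');
    var range = pieces[0];
    var step = pieces.length > 1 ? parseInt(pieces[1], 10) : 1;
    var start, end;

    if (range === '*') {
      start = min;
      end = Infinity;
    } else if (range.indexOf('-') !== -1) {
      var bounds = range.split('-');
      start = parseInt(bounds[0], 10);
      end = parseInt(bounds[1], 10);
    } else {
      start = end = parseInt(range, 10);
    }
    return value >= start && value <= end && (value - start) % step === 0;
  });
}

// Checks a six-field expression (seconds first) against a Date, in local time.
function cronMatches(expression, date) {
  var fields = expression.trim().split(/\s+/);
  var values = [
    date.getSeconds(), date.getMinutes(), date.getHours(),
    date.getDate(), date.getMonth(), date.getDay()
  ];
  var minimums = [0, 0, 0, 1, 0, 0];
  return fields.length === 6 && fields.every(function (field, i) {
    return fieldMatches(field, values[i], minimums[i]);
  });
}

var july1 = new Date(2024, 6, 1, 14, 30, 0); // 1 July 2024, 14:30:00 local time
console.log(cronMatches('0 30 14 1 6 *', july1)); // true
console.log(cronMatches('0 */5 * * * *', july1)); // true (minute 30 is a multiple of 5)
console.log(cronMatches('30 * * * * *', july1));  // false (the second is 0, not 30)
```

Note that JavaScript's `Date#getMonth()` is also 0-based, which is why month `6` lines up with July here.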
Glob Syntax Examples (for path definition)
- * Matches 0 or more characters in a single path portion.
- ? Matches 1 character.
- [...] Matches a range of characters, similar to a RegExp range. If the first character of the range is ! or ^ then it matches any character not in the range.
- !(pattern|pattern|pattern) Matches anything that does not match any of the patterns provided.
- ?(pattern|pattern|pattern) Matches zero or one occurrence of the patterns provided.
- +(pattern|pattern|pattern) Matches one or more occurrences of the patterns provided.
- *(a|b|c) Matches zero or more occurrences of the patterns provided.
- @(pattern|pat*|pat?erN) Matches exactly one of the patterns provided.
- ** If a "globstar" is alone in a path portion, then it matches zero or more directories and subdirectories searching for matches. It does not crawl symlinked directories.
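As a rough illustration of the three most common tokens, the sketch below translates a glob into a regular expression. `globToRegExp` is a hypothetical helper written for this example; it covers only `*`, `?` and `**`, treats everything else (including `[...]` and the `(a|b)` forms) as literal text, and does not touch the file system. A real project would more likely rely on a glob library.

```javascript
'use strict';

// Converts a glob that uses only *, ? and ** into an anchored RegExp.
function globToRegExp(glob) {
  var source = '';
  var segments = glob.split('/');

  segments.forEach(function (segment, i) {
    var last = i === segments.length - 1;
    if (segment === '**') {
      // Globstar alone in a path portion: zero or more whole directories.
      source += last ? '.*' : '(?:[^/]+/)*';
      return;
    }
    source += segment
      .replace(/[.+^${}()|[\]\\]/g, '\\$&') // escape RegExp metacharacters
      .replace(/\*/g, '[^/]*')               // * stays inside one path portion
      .replace(/\?/g, '[^/]');               // ? is exactly one character
    if (!last) {
      source += '/';
    }
  });
  return new RegExp('^' + source + '$');
}

var pattern = globToRegExp('agents/**/*.js');
console.log(pattern.test('agents/agent-1.js'));    // true
console.log(pattern.test('agents/eu/agent-2.js')); // true
console.log(pattern.test('operator.js'));          // false
```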
Tips
- WebSpy shines as a productivity tool when it runs constantly. If you have your own server, you might want to try PM2 as your operators' process manager. Otherwise, there are some really cool free hosting services, such as Heroku.
Useful Links
- Mustache Template Engine
- json-query
- Slack Webhooks
- CRON syntax
- Valid timezones
- Free node.js hosting services: