A simple way to play with a site's requests and responses.
Spiritual heir of puppeteer-page-proxy: the same behavior, but extended and built around promises.
Using a proxy is optional.
Supported proxies (through proxy-agent):
- http://proxy-server-over-tcp.com:3128
- https://proxy-server-over-tls.com:3129
- socks://username:password@some-socks-proxy.com:9050 (username & password are optional)
- socks5://username:password@some-socks-proxy.com:9050 (username & password are optional)
- socks4://some-socks-proxy.com:9050
- pac+http://www.example.com/proxy.pac
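The proxy strings above are ordinary URLs, so their parts can be inspected before they are handed over. The sketch below is a hypothetical helper (describeProxy is not part of this package; proxy-agent does the real parsing and connecting) that uses the WHATWG URL class built into Node.js to check the scheme against the list above and to show that credentials are optional.

```javascript
// Hypothetical helper, NOT part of automation-extra-interception-proxy.
// It only mirrors the scheme list above; proxy-agent does the real work.
const SUPPORTED_PROXY_PROTOCOLS = new Set([
  'http:', 'https:', 'socks:', 'socks4:', 'socks5:', 'pac+http:',
]);

function describeProxy(proxyUrl) {
  // The global WHATWG URL parser also accepts custom schemes like socks5: and pac+http:
  const url = new URL(proxyUrl);
  if (!SUPPORTED_PROXY_PROTOCOLS.has(url.protocol)) {
    throw new Error(`Unsupported proxy protocol: ${url.protocol}`);
  }
  return {
    protocol: url.protocol.slice(0, -1), // drop the trailing ':'
    host: url.hostname,
    port: url.port, // empty string when the URL has no explicit port
    hasCredentials: url.username !== '' || url.password !== '',
  };
}

console.log(describeProxy('socks5://username:password@some-socks-proxy.com:9050'));
console.log(describeProxy('socks4://some-socks-proxy.com:9050'));
```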
Tested in Puppeteer/Chromium only! Known limitations:
- GREASE ciphers are missing.
- Some headers are missing.
- [CORS] OPTIONS (preflight) requests are not sent before the actual request is executed.
- [CORS] Headers are close to correct, but not exactly right.
- WebSockets are handled by the browser (an IP leak may occur if you use a proxy in this package but not in Puppeteer itself).
- Performance can be poor under high load.
Using npm
npm install automation-extra-interception-proxy
Using yarn
yarn add automation-extra-interception-proxy
This package solves the following problems:
Sometimes you need to reach information from a browser request. By default, only header information is easy to reach. You can also read all responses, but from time to time this throws errors for one of the following reasons.
First, the page may already be closed, and then your code will throw an error.
Second, some sites use service workers to request information. Unfortunately, you can't handle this situation without requesting the data manually and then converting it for Puppeteer.
If you just want to adjust some requests or responses, you have to do that manually.
For example, you want to get the original request/response and make some adjustments. This package makes that easy: you get what you want with a single function call.
Yes, Puppeteer already has proxy support through additional process arguments. But you have to maintain proxy credentials manually for each request (?, not sure). Also, you can't use a SOCKS proxy (?, not sure).
Even in cooperative mode you cannot make your decisions asynchronously. Here you can chain handlers that process the request decision one by one. A handler can also declare its decision final, so no further handlers in the chain are asked. A handler can also adjust the request/response for the next one.
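The handler chain idea can be sketched in plain JavaScript. Everything here is hypothetical (runHandlerChain and the { request, final } decision shape are illustrations, not this package's API); it only shows the behavior described above: handlers run one by one, each may be async, each may adjust the request for the next one, and any of them may stop the chain by marking its decision as final.

```javascript
// Hypothetical sketch, NOT the package's real API.
// Each handler receives the current request and may return:
//   undefined                -> no opinion, ask the next handler
//   { request }              -> adjusted request for the next handler
//   { request, final: true } -> adjusted request, stop asking further handlers
async function runHandlerChain(handlers, request) {
  let current = request;
  for (const handler of handlers) {
    // Handlers may be async: wait for each decision before moving on.
    const decision = await handler(current);
    if (decision === undefined) continue;
    if (decision.request !== undefined) current = decision.request;
    if (decision.final) break;
  }
  return current;
}

// Usage: add a header asynchronously, then rewrite the URL and end the chain.
const handlers = [
  async (req) => {
    await new Promise((resolve) => setTimeout(resolve, 10)); // simulated async work
    return { request: { ...req, headers: { ...req.headers, 'x-demo': '1' } } };
  },
  (req) => ({ request: { ...req, url: req.url.replace('http:', 'https:') }, final: true }),
  () => { throw new Error('never reached: the previous decision was final'); },
];

runHandlerChain(handlers, { url: 'http://example.com/', headers: {} })
  .then((result) => console.log(result));
```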
We live in a world where almost every website has an internal API. When you look at the Network tab in Chrome DevTools, it's easy to see where and what. The data is already yours, but you can't just take what you want: you have to fight for the information you need. So let's fight together!
- wrapPage
- IConfig
- continue
- ignoreResponseBodyIfPossible
- flushLocal
- recordError
- recordInternalError
- recordWarning
- RequestMode
- RequestStage
- IRequestOptions
- _bodyError
- IAbortReason
- InterceptionProxyRequest
wrapPage: adds interception ability to the page (see the examples below).
Parameters:
- page (Puppeteer.Page): page for future interceptions
- config (IConfig, optional)
Returns Promise<InterceptionProxyPageConfig>
IConfig: plugin configuration object.
Puppeteer's "Cooperative Intercept Mode" priority.
This package uses its own way to manage cooperation.
Use only if you know what it does.
requestMode:
- ignore: the plugin does nothing with the original request.
- native: the plugin only listens to the original request/response data, and all requests are fulfilled by Puppeteer itself. Some plugin functionality may be unavailable.
- managed: the plugin performs all requests through requestHandlers or by itself. All plugin features are available.
Default: managed
Type: RequestMode
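A minimal sketch of how a configuration could resolve requestMode, assuming the modes are the plain strings "ignore", "native" and "managed" (as in the examples further down). resolveRequestMode is a hypothetical helper, not part of the package; it applies the documented default (managed) and rejects unknown values early.

```javascript
// Hypothetical sketch: mode strings from the docs above; the helper is illustrative only.
const RequestMode = Object.freeze({
  ignore: 'ignore',   // plugin does nothing with the original request
  native: 'native',   // plugin only listens; Puppeteer fulfills requests
  managed: 'managed', // plugin performs requests itself; all features available
});

function resolveRequestMode(config = {}) {
  const mode = config.requestMode ?? RequestMode.managed; // documented default
  if (!Object.values(RequestMode).includes(mode)) {
    throw new Error(`Unknown requestMode: ${mode}`);
  }
  return mode;
}

console.log(resolveRequestMode());
console.log(resolveRequestMode({ requestMode: 'native' }));
```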
proxy: proxy for the request.
Automatically sets the agent property using proxy-agent.
Examples:
- http://proxy-server-over-tcp.com:3128
- https://proxy-server-over-tls.com:3129
- socks://username:password@some-socks-proxy.com:9050 (username & password are optional)
- socks5://username:password@some-socks-proxy.com:9050 (username & password are optional)
- socks4://some-socks-proxy.com:9050
- pac+http://www.example.com/proxy.pac
Default: null
Type: (string | null)
agent: your agent for handling requests.
Set by the proxy property; setting it directly clears the proxy property.
Default: null
Type: (Agent | null)
Meta:
- deprecated: use the proxy property instead. Deprecated because of a possible upcoming rework of request handling.
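The coupling between proxy and agent described above (setting proxy creates an agent, setting agent directly clears proxy) can be sketched with accessor properties. This is a hypothetical illustration: ProxySettingsSketch and the injected createAgent factory are assumptions; in the package itself the agent comes from proxy-agent.

```javascript
// Hypothetical sketch of the documented proxy/agent behavior; not the package's code.
class ProxySettingsSketch {
  #proxy = null;
  #agent = null;

  // createAgent stands in for proxy-agent so the sketch runs without dependencies.
  constructor(createAgent) {
    this.createAgent = createAgent;
  }

  get proxy() { return this.#proxy; }
  set proxy(value) {
    this.#proxy = value;
    // Setting proxy (re)creates the agent, or clears it when proxy is null.
    this.#agent = value === null ? null : this.createAgent(value);
  }

  get agent() { return this.#agent; }
  set agent(value) {
    // Setting agent directly clears proxy, as the docs describe.
    this.#agent = value;
    this.#proxy = null;
  }
}

const settings = new ProxySettingsSketch((url) => ({ fakeAgentFor: url }));
settings.proxy = 'socks5://some-socks-proxy.com:9050';
console.log(settings.agent); // agent derived from the proxy URL
settings.agent = { custom: true };
console.log(settings.proxy); // null: proxy was cleared
```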
You can handle all plugin messages.
Type: any
Request timeout in milliseconds (actual execution only).
Type: number
If you didn't change the request or response, let Puppeteer handle this request by itself.
Default: false
Type: boolean
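The option above comes down to a change check: if none of the modifiable request fields differ from what Puppeteer sent, the request can be handed back untouched. A hypothetical sketch, assuming the modifiable fields are the ones documented for the request options (method, url, headers, body); isUnchanged is not a package function, and response-side changes are left out here.

```javascript
// Hypothetical sketch: decide whether a request can be left to Puppeteer as-is.
const { isDeepStrictEqual } = require('util');

function isUnchanged(original, current) {
  // Compare only the fields the docs describe as modifiable; Buffers compare by content.
  return ['method', 'url', 'headers', 'body'].every((key) =>
    isDeepStrictEqual(original[key], current[key]));
}

const original = { method: 'GET', url: 'https://example.com/', headers: { accept: '*/*' }, body: undefined };
console.log(isUnchanged(original, { ...original }));
console.log(isUnchanged(original, { ...original, headers: { accept: 'text/html' } }));
```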
If you do not use the plugin's response object, the plugin will not retrieve the response from Puppeteer, for better performance.
Applies to native mode only.
Type: boolean
For old versions of Puppeteer, the plugin has to handle cookies by itself.
Enable this option if you have an issue with cookies.
Upgrading your Puppeteer version is recommended instead.
Type: boolean
Not recommended; use other library properties to do this.
Modify requests in a more advanced way by interacting with got.
Type: Hooks
Sends the gathered response back to Puppeteer immediately.
If the response has not been collected yet, calls getResponse first.
Returns Promise<void>
If you use this specific option, the global ignoreResponseBodyIfPossible setting is ignored.
Type: boolean
Flush local configuration.
Parameters:
- key (any, optional): if provided, flushes only this specific parameter at the local level
Returns void
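A local configuration layered over a global one, where a flush either drops a single local override or all of them, can be sketched as follows. LayeredConfigSketch, get and setLocal are hypothetical names; only flushLocal(key?) mirrors the documented signature, and the package's actual storage may differ.

```javascript
// Hypothetical sketch; only flushLocal(key?) follows the documented signature.
class LayeredConfigSketch {
  constructor(globalConfig) {
    this.globalConfig = globalConfig; // shared, page-level settings
    this.localConfig = {};            // per-request overrides
  }

  setLocal(key, value) {
    this.localConfig[key] = value;
  }

  get(key) {
    // A local value wins; otherwise fall back to the global one.
    return Object.prototype.hasOwnProperty.call(this.localConfig, key)
      ? this.localConfig[key]
      : this.globalConfig[key];
  }

  flushLocal(key) {
    if (key === undefined) {
      this.localConfig = {};        // flush every local override
    } else {
      delete this.localConfig[key]; // flush only the given parameter
    }
  }
}

const config = new LayeredConfigSketch({ ignoreResponseBodyIfPossible: false });
config.setLocal('ignoreResponseBodyIfPossible', true);
console.log(config.get('ignoreResponseBodyIfPossible'));
config.flushLocal('ignoreResponseBodyIfPossible');
console.log(config.get('ignoreResponseBodyIfPossible'));
```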
Pass an error to the logger.
Parameters:
- message (any): flow description
- error (any, optional): original error object
- meta (...any): non-specific meta information
Returns void
Pass an internal error to the logger.
Parameters:
- message (any): flow/error description
- meta (...any): non-specific meta information
Returns void
Pass a warning to the logger.
Parameters:
- message (any): flow/error description
- meta (...any): non-specific meta information
Returns void
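A hypothetical sketch of the three logging methods above forwarding everything to one user-supplied handler, in the spirit of handling all plugin messages yourself. The argument order (message, error?, ...meta) follows the documented signatures; createRecorder and the record shape { level, message, error, meta } are assumptions, not the package's API.

```javascript
// Hypothetical sketch; argument order mirrors recordError/recordInternalError/recordWarning.
function createRecorder(handler) {
  return {
    recordError(message, error, ...meta) {
      handler({ level: 'error', message, error, meta });
    },
    recordInternalError(message, ...meta) {
      handler({ level: 'internalError', message, meta });
    },
    recordWarning(message, ...meta) {
      handler({ level: 'warning', message, meta });
    },
  };
}

// Usage: collect every record in memory, then print a short summary.
const records = [];
const recorder = createRecorder((record) => records.push(record));
recorder.recordError('fetching response', new Error('socket hang up'), { url: 'https://example.com/' });
recorder.recordWarning('page already closed');
console.log(records.map((r) => `${r.level}: ${r.message}`));
```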
RequestMode: plugin mode for handling requests.
ignore: the plugin does nothing with the original request.
Type: string
native: the plugin only listens to the original request/response data, and all requests are fulfilled by Puppeteer itself. Some plugin functionality may be unavailable.
Type: string
managed: the plugin performs all requests by itself. All plugin features are available.
Type: string
RequestStage: current stage of the request.
We got a new request from Puppeteer, which includes all the necessary information about it.
At this stage we can adjust the request.
Type: string
The request is in progress.
At this stage we can no longer adjust the request, but there is no response to move forward with yet.
Type: string
We got the response to the request (which was possibly modified by the user), and now the user can adjust the response.
At this stage we can adjust the response, but the user can no longer override the request.
Type: string
sentResponse: we sent the final response of the request to the browser.
It's too late to adjust the request or response.
Type: string
The page was closed and we are unable to do anything.
From a technical perspective, this looks just the same as sentResponse.
Type: string
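The stage descriptions above can be summarized as a small state machine. A minimal sketch, assuming hypothetical stage labels (newRequest, requesting, gotResponse, pageClosed); only sentResponse is named in these docs, and the rule that a closed page can end any non-final stage is inferred from the description, not taken from the package code.

```javascript
// Hypothetical stage labels; only 'sentResponse' is named in the docs.
const NEXT_STAGE = Object.freeze({
  newRequest: 'requesting',    // request may still be adjusted
  requesting: 'gotResponse',   // in flight: nothing can be adjusted
  gotResponse: 'sentResponse', // response may be adjusted, request may not
  sentResponse: null,          // final
  pageClosed: null,            // final, technically looks like sentResponse
});

const canAdjustRequest = (stage) => stage === 'newRequest';
const canAdjustResponse = (stage) => stage === 'gotResponse';

function advance(stage, { pageClosed = false } = {}) {
  if (!Object.prototype.hasOwnProperty.call(NEXT_STAGE, stage)) {
    throw new Error(`Unknown stage: ${stage}`);
  }
  const next = NEXT_STAGE[stage];
  if (next === null) return stage; // final stages do not move
  return pageClosed ? 'pageClosed' : next;
}

// Walk the normal path and show what may be adjusted at each stage.
let stage = 'newRequest';
while (NEXT_STAGE[stage] !== null) {
  console.log(stage, { request: canAdjustRequest(stage), response: canAdjustResponse(stage) });
  stage = advance(stage);
}
console.log(stage);
```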
IRequestOptions: the plugin's request options. The plugin's request differs significantly from Puppeteer's request.
These options can be modified: all changes will be applied to the actual Puppeteer request and executed.
Request method.
Once the request has been executed, you can no longer change this property.
Type: Method
Request URL.
Once the request has been executed, you can no longer change this property.
Type: string
Request headers.
Once the request has been executed, you can no longer change this property.
Type: Headers
Request body.
Once the request has been executed, you can no longer change this property.
Type: (string | Buffer | undefined)
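The repeated rule above (method, url, headers and body can only be changed before the request is executed) can be sketched as an options object that locks itself. createRequestOptions and markExecuted are hypothetical names; the package may enforce the rule differently, for example by ignoring late changes instead of throwing.

```javascript
// Hypothetical sketch: editable request options that lock once the request is executed.
function createRequestOptions(initial) {
  const state = { ...initial };
  let executed = false;
  const options = {
    markExecuted() { executed = true; },
  };
  for (const key of ['method', 'url', 'headers', 'body']) {
    Object.defineProperty(options, key, {
      enumerable: true,
      get: () => state[key],
      // Only reassignment is guarded; mutating the headers object in place is not.
      set: (value) => {
        if (executed) throw new Error(`Cannot change "${key}": request already executed`);
        state[key] = value;
      },
    });
  }
  return options;
}

const options = createRequestOptions({ method: 'GET', url: 'https://example.com/', headers: {}, body: undefined });
options.method = 'POST'; // allowed before execution
options.body = '{"q":"test"}';
options.markExecuted();
try {
  options.url = 'https://example.org/';
} catch (error) {
  console.log(error.message);
}
```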
Type: string
This option will override the response. Possible values:
- aborted: An operation was aborted (due to user action).
- accessdenied: Permission to access a resource, other than the network, was denied.
- addressunreachable: The IP address is unreachable. This usually means that there is no route to the specified host or network.
- blockedbyclient: The client chose to block the request.
- blockedbyresponse: The request failed because the response was delivered along with requirements which are not met ('X-Frame-Options' and 'Content-Security-Policy' ancestor checks, for instance).
- connectionaborted: A connection timed out as a result of not receiving an ACK for data sent.
- connectionclosed: A connection was closed (corresponding to a TCP FIN).
- connectionfailed: A connection attempt failed.
- connectionrefused: A connection attempt was refused.
- connectionreset: A connection was reset (corresponding to a TCP RST).
- internetdisconnected: The Internet connection has been lost.
- namenotresolved: The host name could not be resolved.
- timedout: An operation timed out.
- failed: A generic failure occurred.
Type: ErrorCode
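The values above match the error codes Puppeteer accepts when aborting a request (its ErrorCode type). A small hypothetical guard (toAbortReason is not part of the package) can catch typos before a value is passed on, falling back to the generic failed code, which is also Puppeteer's default for request.abort().

```javascript
// Hypothetical guard; the code list is copied from the documentation above.
const ABORT_ERROR_CODES = new Set([
  'aborted', 'accessdenied', 'addressunreachable', 'blockedbyclient',
  'blockedbyresponse', 'connectionaborted', 'connectionclosed', 'connectionfailed',
  'connectionrefused', 'connectionreset', 'internetdisconnected', 'namenotresolved',
  'timedout', 'failed',
]);

function toAbortReason(code = 'failed') {
  // 'failed' is the generic fallback when no code is given.
  if (!ABORT_ERROR_CODES.has(code)) {
    throw new TypeError(`Unknown abort error code: ${code}`);
  }
  return code;
}

console.log(toAbortReason('blockedbyclient'));
console.log(toAbortReason());
```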
InterceptionProxyRequest (extends RequestBase)
The plugin's request. It differs significantly from Puppeteer's request.
Parameters:
- initial (INewRequestInitialArgs)
- requestOptions (IRequestOptions)
/**
 * This example shows how to enable a proxy for a single page.
 */
// require libs
const puppeteer = require('puppeteer');
const InterceptionUtils = require('automation-extra-interception-proxy');
// do everything async
(async () => {
  // launch a browser
  const browser = await puppeteer.launch({
    headless: false,
  });
  try {
    // get a page
    const page = await browser.newPage();
    // attach interception commands
    await InterceptionUtils.wrapPage(page, {
      requestMode: "managed",
      // optional, will be handled by https://www.npmjs.com/package/proxy-agent
      proxy: "socks5://username:password@some-socks-proxy.com:9050"
    });
    // go to our destination and wait for the response
    await page.goto('https://www.npmjs.com/package/automation-extra-interception-proxy');
  } finally {
    // close the browser even if something above failed
    await browser.close();
  }
})(); // end of our async flow
/**
 * This example shows how to enable interceptions for a single page.
 *
 * This code gets some wallpaper image urls from bing.com.
 *
 * This code could break if Bing changes its behavior.
 */
// require libs
const puppeteer = require('puppeteer');
const InterceptionUtils = require('automation-extra-interception-proxy');
// do everything async
(async () => {
  // launch a browser
  const browser = await puppeteer.launch({
    headless: false,
  });
  try {
    // get a page
    const page = await browser.newPage();
    // attach interception commands
    await InterceptionUtils.wrapPage(page, {
      requestMode: "managed",
      // optional, will be handled by https://www.npmjs.com/package/proxy-agent
      // proxy: "socks5://username:password@some-socks-proxy.com:9050"
    });
    // create a promise whose resolver the listener can call later
    let callback;
    const promise = new Promise((resolve) => { callback = resolve; });
    // add a listener
    page.interceptions.addRequestListener('bing-images', async request => {
      // skip everything else
      if (request.url !== 'https://www.bing.com/hp/api/model') {
        // just letting you know that we got something else here
        console.log('Ignoring', request.url.slice(0, 50));
        return;
      }
      // get response data
      const response = await request.getResponse();
      // grab data directly from their api response
      const apiData = response.json;
      // do anything you like
      const imageUrls = apiData.MediaContents.map(({ ImageContent }) =>
        `https://www.bing.com${ImageContent.Image.Url}`);
      // hand the result back to the main async flow
      callback(imageUrls);
    }); // end of listener
    // go to our destination and wait for the response
    const [imageUrls] = await Promise.all([
      promise,
      page.goto('https://www.bing.com/'),
    ]);
    // print our image urls
    console.log('imageUrls', imageUrls);
    // not necessary: clean up our listener
    page.interceptions.deleteLocalRequestListener('bing-images');
  } finally {
    // close the browser even if something above failed
    await browser.close();
  }
})(); // end of our async flow
You're probably using an old version of Puppeteer. Try upgrading first.
If you don't want to upgrade, or cookies still don't work, enable enableLegacyCookieHandling.
Yes, the implementation is still raw.
- finalize CORS for managed requests (needs to pass the CORS test)
- add tests
  - plugin flow
  - documentation
- improve docs command
- describe wrapPage
- describe InterceptionProxyPlugin class
- add more proxy API
  - waitRequest
- websocket support
- migrate to automation-extra-plugin
- support GREASE cipher
Copyright © 2021 - 2023, Utyfua. Released under the MIT License.