A tiny, fast, and customizable Node.js library to crawl websites and save all pages as compact, AI-ready PDFs. Use it from the command line or as a module in your Node.js scripts. Perfect for data archiving, offline analysis, and feeding content to AI tools.
- Blazing Fast: Optimized for speed and performance.
- Lightweight: Minimal resource usage for crawling and PDF generation.
- Customizable: Full control over PDF formatting and crawling behavior.
- AI-Optimized PDFs: Compact and structured for AI consumption.
- Dual Usage: Use via CLI or integrate into Node.js scripts.
Star this repository and share it with your friends.
Install using pnpm, npm, or yarn:
pnpm add e2pdf
or
npm install e2pdf
or
yarn add e2pdf
To use e2pdf from the command line:
e2pdf <website-url>
For example:
e2pdf https://example.com
This will crawl the website and save all pages as PDFs in the current directory.
Here’s an example of using e2pdf in a Node.js script:
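import e2pdf from "e2pdf";

(async () => {
  // Crawl the site starting at the given URL and write one PDF per page.
  await e2pdf("https://example.com", {
    out: "./pdfs", // output directory for the generated PDFs
    pdf: {
      format: "A4",
      printBackground: true,
      margin: { top: "20px", bottom: "20px" },
    },
    // Stop after at most 100 requests.
    crawlerOptions: { maxRequestsPerCrawl: 100 },
  });
  console.log("Crawling completed! PDFs saved to ./pdfs");
})();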
The e2pdf function accepts two arguments:
- startUrl (string): The URL to start crawling from.
- options (E2PdfOptions): Configuration object for crawling and PDF generation.
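Because a crawl can run for a long time, it can help to check both arguments before starting. The following is a minimal sketch of such a pre-flight check. The checkArgs helper is hypothetical and is not part of e2pdf; it only assumes what is stated above, namely that startUrl is a string URL and options is an object.

```javascript
/**
 * Hypothetical pre-flight check (not provided by e2pdf).
 * Validates the two arguments described above and returns them,
 * with the URL normalized by the WHATWG URL parser.
 */
function checkArgs(startUrl, options = {}) {
  if (typeof startUrl !== "string") {
    throw new TypeError("startUrl must be a string");
  }
  // new URL() throws a TypeError when the string is not a valid absolute URL.
  const parsed = new URL(startUrl);
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    throw new TypeError("startUrl must use http or https");
  }
  if (options === null || typeof options !== "object" || Array.isArray(options)) {
    throw new TypeError("options must be a plain object");
  }
  return { startUrl: parsed.href, options };
}

const checked = checkArgs("https://example.com", { out: "./pdfs" });
console.log(checked.startUrl); // prints "https://example.com/"
```

The returned values can then be passed on as e2pdf(checked.startUrl, checked.options).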
out (the output directory, as used in the example above):
- Type: string
- Default: process.cwd()
- Directory to save the generated PDFs.
PDF generation options (compatible with Playwright’s PDF options):
- displayHeaderFooter: Display header and footer. Defaults to false.
- footerTemplate: HTML template for the footer.
- format: Paper format (e.g., A4, Letter). Defaults to Letter.
- headerTemplate: HTML template for the header.
- landscape: Paper orientation. Defaults to false.
- margin: Margins for the PDF (top, right, bottom, left).
- printBackground: Print background graphics. Defaults to false.
- ...and many more options for fine-tuning PDFs.
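As an illustration of how these options combine, here is a small sketch that makes the documented defaults explicit and adds a page-number footer. The buildPdfOptions helper and the PDF_DEFAULTS constant are hypothetical names, not part of e2pdf. The pageNumber and totalPages classes are the placeholders Playwright fills in inside header and footer templates.

```javascript
// Defaults as listed above; this constant is illustrative, not exported by e2pdf.
const PDF_DEFAULTS = Object.freeze({
  displayHeaderFooter: false,
  format: "Letter",
  landscape: false,
  printBackground: false,
});

// Hypothetical helper: overlay caller overrides on the documented defaults.
function buildPdfOptions(overrides = {}) {
  return { ...PDF_DEFAULTS, ...overrides };
}

const pdf = buildPdfOptions({
  format: "A4",
  displayHeaderFooter: true,
  // An empty header keeps the top of the page clean.
  headerTemplate: "<span></span>",
  // Playwright injects values into elements with these class names.
  footerTemplate:
    '<div style="font-size:10px; width:100%; text-align:center;">' +
    'Page <span class="pageNumber"></span> of <span class="totalPages"></span>' +
    "</div>",
  // Leave room at the bottom so the footer does not overlap the content.
  margin: { top: "20px", bottom: "40px" },
});

console.log(pdf.format, pdf.landscape); // prints "A4 false"
```

The resulting object would be passed as the pdf key of the options argument.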
crawlerOptions: Options for the Crawlee PlaywrightCrawler (for example, maxRequestsPerCrawl).
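For a small, bounded crawl, the crawler options might look like the sketch below. The boundedCrawlerOptions helper is hypothetical. maxRequestsPerCrawl, maxConcurrency, and maxRequestRetries are existing Crawlee crawler options, but the specific values are only illustrative, and it is an assumption here that e2pdf forwards them to the crawler unchanged.

```javascript
// Hypothetical helper: build conservative Crawlee options for a small crawl.
function boundedCrawlerOptions(maxPages) {
  if (!Number.isInteger(maxPages) || maxPages < 1) {
    throw new RangeError("maxPages must be a positive integer");
  }
  return {
    maxRequestsPerCrawl: maxPages, // stop after this many requests
    maxConcurrency: 2, // keep the load on the target site low
    maxRequestRetries: 1, // retry a failed page once before giving up
  };
}

const crawlerOptions = boundedCrawlerOptions(100);
console.log(crawlerOptions.maxRequestsPerCrawl); // prints 100
```

The resulting object would be passed as the crawlerOptions key of the options argument.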
Configuration for Crawlee’s Configuration object.
We welcome contributions! Please fork the repository and submit a pull request.
This library is licensed under the MPL-2.0 open-source license.
If you encounter any issues or have suggestions, please open an issue or contact us. We’d love to hear from you!
Please enroll in our courses or sponsor our work.
with 💖 by Mayank Kumar Chaudhari