Crawlee is a Node.js library for web scraping, maintained by Apify.
RISING STAR: Crawlee is one of the most complete Node.js web scraping libraries. Although it is a relatively new player in the scraping scene, Crawlee is quickly growing in popularity thanks to its extensive feature set and its focus on mimicking real-user behavior to get past websites' anti-bot protections.
Crawlee comes with three main crawler classes: CheerioCrawler, PuppeteerCrawler and PlaywrightCrawler. All three share the same interface, so you can switch between them with only small code changes.
CheerioCrawler: A plain HTTP crawler that parses HTML using the Cheerio library. It is very fast and efficient, but it cannot render JavaScript.
PuppeteerCrawler: A headless browser crawler controlled by the Puppeteer library. It can control Chromium or Chrome. Puppeteer is the de facto standard for headless browser automation in Node.js.
PlaywrightCrawler: Playwright can be considered the successor to Puppeteer. It can control Chromium, Chrome, Firefox, WebKit and other browsers. If you are not already familiar with Puppeteer and you need a headless browser, we recommend going with Playwright.
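As an illustration of the crawler interface, here is a minimal sketch of CheerioCrawler, assuming Crawlee v3 (installed with npm install crawlee) and an ESM TypeScript setup. To keep it self-contained it crawls a tiny local HTTP server instead of a real website; the helper names startDemoServer, scrapeTitles and serverUrl are hypothetical and exist only for this example.

```typescript
import http from 'node:http';
import type { AddressInfo } from 'node:net';
import { CheerioCrawler, Configuration } from 'crawlee';

// Hypothetical helper: serves a tiny HTML page on a random local port,
// so the example does not depend on any external website.
export function startDemoServer(): Promise<http.Server> {
    const server = http.createServer((req, res) => {
        res.writeHead(200, { 'Content-Type': 'text/html; charset=utf-8' });
        res.end(`<html><head><title>Page ${req.url}</title></head><body></body></html>`);
    });
    return new Promise((resolve) => {
        server.listen(0, '127.0.0.1', () => resolve(server));
    });
}

// Hypothetical helper: builds a URL pointing at the demo server.
export function serverUrl(server: http.Server, path: string): string {
    const { port } = server.address() as AddressInfo;
    return `http://127.0.0.1:${port}${path}`;
}

// Crawls the given URLs with CheerioCrawler (plain HTTP + Cheerio, no browser)
// and returns the <title> text of every page it fetched.
export async function scrapeTitles(urls: string[]): Promise<string[]> {
    const titles: string[] = [];
    const crawler = new CheerioCrawler(
        {
            maxRequestsPerCrawl: 20,
            // `$` is the Cheerio handle for the downloaded HTML document.
            async requestHandler({ $ }) {
                titles.push($('title').text());
            },
        },
        // Keep crawl state in memory instead of persisting it to ./storage.
        new Configuration({ persistStorage: false }),
    );
    await crawler.run(urls);
    return titles;
}
```

Because the crawler classes share the same interface, moving this to PuppeteerCrawler or PlaywrightCrawler keeps the same constructor options and run() call. The main difference is that the requestHandler then receives a browser page object instead of a Cheerio $, and the matching browser package (puppeteer or playwright) must be installed.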
Crawlee builds on top of popular Node.js libraries such as Cheerio, Puppeteer and Playwright, and adds extra functionality such as out-of-the-box anti-blocking features, including proxy rotation, session management and browser-like request headers and fingerprints.
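To make the anti-blocking features more concrete, here is a sketch of a CheerioCrawler configured with proxy rotation and a session pool, assuming Crawlee v3. The proxy URLs are hypothetical placeholders, and the pool size and retry count are arbitrary values chosen for illustration, not recommendations.

```typescript
import { CheerioCrawler, ProxyConfiguration } from 'crawlee';

// Hypothetical proxy endpoints: replace them with real proxy URLs.
export const proxyConfiguration = new ProxyConfiguration({
    proxyUrls: [
        'http://proxy-1.example.com:8000',
        'http://proxy-2.example.com:8000',
    ],
});

export const crawler = new CheerioCrawler({
    // Send requests through the proxies configured above.
    proxyConfiguration,
    // A session ties cookies to a proxy; sessions that look blocked get retired.
    useSessionPool: true,
    persistCookiesPerSession: true,
    sessionPoolOptions: { maxPoolSize: 20 },
    // Retry failed requests (for example, blocked ones) up to 3 times.
    maxRequestRetries: 3,
    async requestHandler({ request, $, log }) {
        log.info(`${request.url}: ${$('title').text()}`);
    },
});
```

Calling await crawler.run([...]) with real start URLs would then route each request through one of the configured proxies, while the session pool tracks which proxy and cookie combinations keep working.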
This makes Crawlee a great choice for Node.js developers who want a flexible, complete scraping library that removes much of the complexity of configuring scrapers to get past modern anti-bot protections.
Building an Amazon Scraper in Node.js with Crawlee - Written tutorial
Building an Amazon Scraper in Node.js with Crawlee - Video tutorial