How to take bulk screenshots with Puppeteer

Learn how to take screenshots of multiple URLs with Puppeteer, including concurrency management, error handling, retries, and proxy support.

Written by Dmytro Krasun

Taking screenshots of multiple URLs is a common requirement for building website directories, monitoring tools, SEO analyzers, and archiving systems. While taking a single screenshot with Puppeteer is straightforward, processing hundreds or thousands of URLs requires careful consideration of concurrency, error handling, and resource management.

In this guide, I will walk you through building a robust bulk screenshot solution with Puppeteer, from a basic sequential approach to a production-ready implementation with retries and proxy support.

Setting up the project

You can skip this section if you already have a project where you want to add Puppeteer or if it is already installed.

First, create a new Node.js project and install the required dependencies:

mkdir bulk-screenshots
cd bulk-screenshots
npm init -y
npm install puppeteer typescript ts-node @types/node

Create a tsconfig.json:

{
    "compilerOptions": {
        "target": "ES2020",
        "module": "commonjs",
        "strict": true,
        "esModuleInterop": true,
        "outDir": "./dist"
    }
}

Sequential processing

Make it work first. Make it fast. And then make it simple.

The simplest way to take bulk screenshots is to process URLs one by one:

import puppeteer from "puppeteer";

const urls = ["https://example.com", "https://screenshotone.com"];

async function takeScreenshots() {
    const browser = await puppeteer.launch();

    for (const url of urls) {
        const page = await browser.newPage();
        await page.setViewport({ width: 1280, height: 800 });
        await page.goto(url, { waitUntil: "networkidle0" });

        const filename = url.replace(/[^a-z0-9]/gi, "_") + ".png";
        await page.screenshot({ path: filename });
        await page.close();

        console.log(`Screenshot saved: ${filename}`);
    }

    await browser.close();
}

takeScreenshots();

This approach works but has significant limitations:

  1. Slow execution — URLs are processed one at a time, wasting resources while waiting for pages to load.
  2. No error handling — A single failed URL stops the entire process.
  3. No retry mechanism — Temporary network issues cause permanent failures.

Concurrent processing

To speed up bulk screenshots, we can process multiple URLs in parallel using a worker pool pattern:

import puppeteer, { Browser } from "puppeteer";

const urls = ["https://example.com", "https://screenshotone.com"];
const CONCURRENCY = 3;

async function takeScreenshot(browser: Browser, url: string): Promise<void> {
    const page = await browser.newPage();
    try {
        await page.setViewport({ width: 1280, height: 800 });
        await page.goto(url, { waitUntil: "networkidle0", timeout: 30000 });

        const filename = url.replace(/[^a-z0-9]/gi, "_") + ".png";
        await page.screenshot({ path: filename });

        console.log(`Success: ${url}`);
    } finally {
        await page.close();
    }
}

async function processWithConcurrency() {
    const browser = await puppeteer.launch();
    const queue = [...urls];

    async function worker() {
        while (queue.length > 0) {
            const url = queue.shift();
            if (url) {
                try {
                    await takeScreenshot(browser, url);
                } catch (error) {
                    console.error(`Failed: ${url}`, error);
                }
            }
        }
    }

    const workers = Array(CONCURRENCY)
        .fill(null)
        .map(() => worker());

    await Promise.all(workers);
    await browser.close();
}

processWithConcurrency();

This implementation creates a pool of workers that continuously pull URLs from a shared queue. The CONCURRENCY constant controls how many screenshots are taken simultaneously.

Be careful with concurrency limits. Too many concurrent pages can exhaust system memory and cause crashes. Start with 3-5 concurrent workers and adjust based on your system resources.
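
To make tuning easier, you can make the worker count configurable and keep an eye on memory while the job runs. Here is a minimal sketch (the SCREENSHOT_CONCURRENCY variable name is my own example, not a convention):

// A sketch: configurable concurrency and basic memory logging for tuning.
// SCREENSHOT_CONCURRENCY is an example environment variable name.
const CONCURRENCY = Number(process.env.SCREENSHOT_CONCURRENCY ?? 3);

function logMemoryUsage(label: string): void {
    const { rss, heapUsed } = process.memoryUsage();
    const toMb = (bytes: number) => Math.round(bytes / 1024 / 1024);
    console.log(`[${label}] rss: ${toMb(rss)} MB, heap: ${toMb(heapUsed)} MB`);
}

// For example, call logMemoryUsage(url) after each screenshot in the worker loop
// and lower CONCURRENCY if the resident set size keeps growing.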

Error handling and retries

If you plan to deploy it to production, consider handling the following errors and issues:

  • Network timeouts;
  • Pages returning error status codes (403, 429, 503);
  • Memory issues.

Here is an implementation with retry logic:

import puppeteer, { Browser } from "puppeteer";

interface ScreenshotResult {
    url: string;
    success: boolean;
    filepath?: string;
    error?: string;
}

const MAX_RETRIES = 3;
const RETRY_DELAY = 1000;

async function delay(ms: number): Promise<void> {
    return new Promise((resolve) => setTimeout(resolve, ms));
}

function isRetryableError(error: unknown): boolean {
    if (error instanceof Error) {
        const retryableMessages = [
            "net::ERR_CONNECTION_RESET",
            "net::ERR_CONNECTION_REFUSED",
            "net::ERR_TIMED_OUT",
            "Navigation timeout",
        ];
        return retryableMessages.some((msg) => error.message.includes(msg));
    }

    return false;
}

async function takeScreenshotWithRetry(browser: Browser, url: string): Promise<ScreenshotResult> {
    let lastError: unknown;

    for (let attempt = 0; attempt <= MAX_RETRIES; attempt++) {
        const page = await browser.newPage();
        try {
            await page.setViewport({ width: 1280, height: 800 });
            await page.goto(url, {
                waitUntil: "networkidle0",
                timeout: 30000,
            });

            // The screenshots directory must already exist, for example,
            // created with fs.mkdir(..., { recursive: true }).
            const filename = `screenshots/${url.replace(/[^a-z0-9]/gi, "_")}.png`;
            await page.screenshot({ path: filename });

            return {
                url,
                success: true,
                filepath: filename,
            };
        } catch (error) {
            lastError = error;
            if (!isRetryableError(error) || attempt === MAX_RETRIES) {
                break;
            }

            console.log(`Retry ${attempt + 1}/${MAX_RETRIES} for ${url}`);
            await delay(RETRY_DELAY * (attempt + 1));
        } finally {
            await page.close();
        }
    }

    return {
        url,
        success: false,
        error: lastError instanceof Error ? lastError.message : String(lastError),
    };
}

The retry logic waits longer after each failed attempt (a simple linear backoff: RETRY_DELAY multiplied by the attempt number) and only retries on specific error types that are likely to be transient.
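
The code above only reacts to network-level errors, while the list of issues earlier also mentions status codes like 429 and 503. Since page.goto resolves even for 4xx and 5xx responses, you can inspect the returned response and raise a retryable error yourself. A sketch of that check (the helper name and the error message prefix are mine; the prefix would also need to be added to isRetryableError):

import { Page } from "puppeteer";

// Sketch: treat some HTTP status codes as retryable failures.
const RETRYABLE_STATUS_CODES = [429, 502, 503, 504];

async function gotoAndCheck(page: Page, url: string, timeout: number): Promise<void> {
    const response = await page.goto(url, { waitUntil: "networkidle0", timeout });
    const status = response?.status();
    if (status && RETRYABLE_STATUS_CODES.includes(status)) {
        // "Retryable status" must also be listed in isRetryableError for retries to kick in.
        throw new Error(`Retryable status ${status} for ${url}`);
    }
}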

Using proxies for failed requests

Some websites block requests from datacenter IPs or rate-limit aggressive crawlers. Using proxies can help bypass these restrictions. Check out how to use proxy per page with Puppeteer for detailed proxy configuration.
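
If a single proxy for the whole browser is enough, you do not need any extra packages: Chromium accepts a proxy via a launch flag, and credentials can be provided per page. A minimal sketch with a placeholder proxy address and credentials:

import puppeteer from "puppeteer";

// Sketch: one proxy for the whole browser via Chromium's --proxy-server flag.
async function launchWithProxy() {
    const browser = await puppeteer.launch({
        args: ["--proxy-server=http://proxy1.example.com:8080"],
    });

    const page = await browser.newPage();
    // If the proxy requires authentication, provide the credentials per page.
    await page.authenticate({ username: "user", password: "pass" });

    return { browser, page };
}

The limitation is that every page shares the same proxy, which is not enough when you want to rotate proxies per URL.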

Here is how proxy rotation can plug into the screenshot function; the proxyIndex argument can come from the retry attempt number:

import puppeteer, { Browser } from "puppeteer";

// The ScreenshotResult interface is the same as in the previous example.
const proxies = [
    "http://user:pass@proxy1.example.com:8080",
    "http://user:pass@proxy2.example.com:8080",
];

async function takeScreenshotWithProxy(
    browser: Browser,
    url: string,
    proxyIndex: number
): Promise<ScreenshotResult> {
    // Rotate through the proxy list based on the attempt number.
    const proxy = proxies[proxyIndex % proxies.length];
    const page = await browser.newPage();
    try {
        await page.setViewport({ width: 1280, height: 800 });

        // Note: request interception alone does not route traffic through the proxy.
        // request.continue() sends the request directly; to actually apply the selected
        // proxy per page, use the puppeteer-page-proxy package (see below).
        await page.setRequestInterception(true);
        page.on("request", (request) => {
            request.continue();
        });

        await page.goto(url, {
            waitUntil: "networkidle0",
            timeout: 30000,
        });

        const filename = `screenshots/${url.replace(/[^a-z0-9]/gi, "_")}.png`;
        await page.screenshot({ path: filename });

        return { url, success: true, filepath: filename };
    } catch (error) {
        return {
            url,
            success: false,
            error: error instanceof Error ? error.message : String(error),
        };
    } finally {
        await page.close();
    }
}

For proper per-page proxy support, you will need the puppeteer-page-proxy package as described in the proxy guide.
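
As a rough sketch of how that looks — assuming puppeteer-page-proxy is installed, and with the applyProxy helper name being my own — the request handler is where the selected proxy is actually applied:

import { Page } from "puppeteer";
// puppeteer-page-proxy exposes a default function; depending on your TypeScript setup
// you may need a small type declaration or const useProxy = require("puppeteer-page-proxy").
import useProxy from "puppeteer-page-proxy";

// Sketch: route every request of a page through the given proxy.
async function applyProxy(page: Page, proxy: string): Promise<void> {
    await page.setRequestInterception(true);
    page.on("request", (request) => {
        // Forward the request through the proxy instead of a bare request.continue().
        useProxy(request, proxy).catch((error) => {
            console.error(`Proxy error for ${request.url()}:`, error);
        });
    });
}

In takeScreenshotWithProxy, you would call applyProxy(page, proxy) right after creating the page and drop the plain setRequestInterception and request.continue() calls.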

Complete working example

Here is a production-ready implementation that combines all the concepts:

import puppeteer, { Browser } from "puppeteer";
import * as fs from "node:fs/promises";
import * as path from "node:path";

interface Config {
    concurrency: number;
    maxRetries: number;
    outputDirectory: string;
    viewport: {
        width: number;
        height: number;
    };
    timeout: number;
}

interface ScreenshotResult {
    url: string;
    success: boolean;
    filepath?: string;
    error?: string;
    attempts: number;
}

const config: Config = {
    concurrency: 3,
    maxRetries: 3,
    outputDirectory: "./screenshots",
    viewport: {
        width: 1280,
        height: 800,
    },
    timeout: 30000,
};

const urls = ["https://example.com", "https://screenshotone.com"];

function getFilename(url: string): string {
    const urlObj = new URL(url);
    const hostname = urlObj.hostname.replace(/\./g, "_");
    const pathname = urlObj.pathname.replace(/\//g, "_").replace(/^_/, "");

    return pathname ? `${hostname}${pathname}.png` : `${hostname}.png`;
}

function isRetryableError(error: unknown): boolean {
    if (!(error instanceof Error)) return false;

    const retryable = [
        "net::ERR_CONNECTION",
        "net::ERR_TIMED_OUT",
        "Navigation timeout",
        "Protocol error",
    ];

    return retryable.some((msg) => error.message.includes(msg));
}

async function delay(ms: number): Promise<void> {
    return new Promise((resolve) => setTimeout(resolve, ms));
}

async function takeScreenshot(browser: Browser, url: string): Promise<ScreenshotResult> {
    let lastError: unknown;
    let attempts = 0;

    for (let attempt = 0; attempt <= config.maxRetries; attempt++) {
        attempts = attempt + 1;
        const page = await browser.newPage();
        try {
            await page.setViewport(config.viewport);
            await page.goto(url, {
                waitUntil: "networkidle0",
                timeout: config.timeout,
            });

            const filename = getFilename(url);
            const filepath = path.join(config.outputDirectory, filename);
            await page.screenshot({ path: filepath });

            return {
                url,
                success: true,
                filepath,
                attempts,
            };
        } catch (error) {
            lastError = error;
            if (!isRetryableError(error) || attempt === config.maxRetries) {
                break;
            }

            await delay(1000 * (attempt + 1));
        } finally {
            await page.close();
        }
    }

    return {
        url,
        success: false,
        error: lastError instanceof Error ? lastError.message : String(lastError),
        attempts,
    };
}

async function processUrls(urls: string[]): Promise<ScreenshotResult[]> {
    const browser = await puppeteer.launch({
        args: ["--disable-setuid-sandbox", "--disable-dev-shm-usage", "--no-first-run"],
    });

    await fs.mkdir(config.outputDirectory, { recursive: true });

    const queue = [...urls];
    const results: ScreenshotResult[] = [];

    async function worker(): Promise<void> {
        while (queue.length > 0) {
            const url = queue.shift();
            if (!url) continue;

            console.log(`Processing: ${url}`);
            const result = await takeScreenshot(browser, url);
            results.push(result);

            if (result.success) {
                console.log(`Success: ${url} -> ${result.filepath}`);
            } else {
                console.log(`Failed: ${url} - ${result.error}`);
            }
        }
    }

    const workers = Array(config.concurrency)
        .fill(null)
        .map(() => worker());

    await Promise.all(workers);
    await browser.close();

    return results;
}

async function main() {
    console.log(`Processing ${urls.length} URLs with concurrency ${config.concurrency}`);

    const results = await processUrls(urls);

    const successful = results.filter((r) => r.success).length;
    const failed = results.filter((r) => !r.success).length;

    console.log(`\nCompleted: ${successful} successful, ${failed} failed`);

    if (failed > 0) {
        console.log("\nFailed URLs:");
        results.filter((r) => !r.success).forEach((r) => console.log(`  ${r.url}: ${r.error}`));
    }
}

main().catch(console.error);

Save the code as index.ts and run it with npx ts-node index.ts. It will process all URLs concurrently with retry support.
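
In a real bulk job, the URL list rarely lives in the source code. A small sketch of loading it from a text file with one URL per line (the urls.txt name is just an example):

import * as fs from "node:fs/promises";

// Sketch: read URLs from a plain text file, one per line, skipping comments and blanks.
async function loadUrls(filepath: string): Promise<string[]> {
    const content = await fs.readFile(filepath, "utf-8");

    return content
        .split("\n")
        .map((line) => line.trim())
        .filter((line) => line.length > 0 && !line.startsWith("#"));
}

// Usage inside main(): const urls = await loadUrls("urls.txt");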

If you plan to render full-page screenshots, check out the complete guide on how to take full page screenshots with Puppeteer, Playwright, or Selenium for detailed instructions on handling lazy loading and other corner cases.
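
For simple pages, Puppeteer only needs one extra option to capture the whole scrollable page; replace the screenshot call in any of the examples above with:

// fullPage captures the entire scrollable page instead of just the viewport.
await page.screenshot({ path: filepath, fullPage: true });

Pages with lazy-loaded content usually also need scrolling before the screenshot, which is what the guide above covers.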

ScreenshotOne API as an alternative

Building and maintaining your own bulk screenshot infrastructure requires handling many edge cases: cookie banners, anti-bot protection, proxy management, browser crashes, memory leaks, and more. ScreenshotOne provides a managed API that handles all of this complexity.

You can use the bulk screenshots endpoint to process multiple URLs in a single request, or, for more complex bulk processing with retries and concurrency management, check out our bulk screenshots guide.
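
For a rough idea of what that looks like from code, here is a sketch that renders each URL through the take endpoint (it assumes Node.js 18+ for the built-in fetch; the access key is a placeholder, and the API documentation lists all available parameters):

import * as fs from "node:fs/promises";

// Sketch: render a URL through the ScreenshotOne take endpoint and save the image.
// The access key is a placeholder; see the API documentation for all parameters.
const ACCESS_KEY = "<your access key>";

async function renderWithApi(url: string): Promise<void> {
    const apiUrl = new URL("https://api.screenshotone.com/take");
    apiUrl.searchParams.set("access_key", ACCESS_KEY);
    apiUrl.searchParams.set("url", url);

    const response = await fetch(apiUrl.toString());
    if (!response.ok) {
        throw new Error(`API returned ${response.status} for ${url}`);
    }

    const buffer = Buffer.from(await response.arrayBuffer());
    await fs.writeFile(`${new URL(url).hostname}.png`, buffer);
}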

A few notes on why and when to use ScreenshotOne:

  • No infrastructure to manage: no need to run and maintain headless browsers, handle crashes, or manage server resources.
  • Built-in caching: screenshots are cached if requested, reducing costs for repeated requests.
  • Cookie banner and ad blocking: built-in features to hide cookie banners and block ads without additional configuration, unlike with plain Puppeteer.
  • S3 storage integration: you can upload screenshots directly to any S3-compatible storage.
  • Concurrency management: the API manages concurrency limits and queuing automatically.
  • SDKs for multiple languages and many more integrations.

But:

  • Monthly cost: unlike a self-hosted solution, there is a recurring cost, though it is often cheaper than running your own infrastructure at scale.
  • Third-party dependency: your application depends on the availability of an external service.
  • Less browser control: some advanced browser configurations may not be available. But you can reach out to our support at support@screenshotone.com and we will try to help you as fast as possible.

Summary

Taking bulk screenshots with Puppeteer requires careful consideration of:

  1. Concurrency: process multiple URLs in parallel but respect system limits.
  2. Error handling: implement retry logic with an increasing (backoff) delay between attempts.
  3. Proxies: use proxy rotation for blocked or rate-limited sites.
  4. Resource management: close pages properly and monitor memory usage.

For production workloads, consider using the ScreenshotOne API, which handles all these complexities out of the box, letting you focus on your application logic instead of infrastructure management.
