Taking one screenshot is easy. Taking thousands while handling failures, managing memory, and maintaining speed is another challenge entirely. Let me show you how to build a robust bulk screenshot system.
Basic Batch Processing
Let’s start simple—processing a list of URLs:
```python
from playwright.sync_api import sync_playwright

urls = [
    'https://example.com',
    'https://github.com',
    'https://stackoverflow.com',
]

with sync_playwright() as p:
    browser = p.chromium.launch()

    for i, url in enumerate(urls):
        page = browser.new_page()
        try:
            page.goto(url, timeout=30000)
            page.screenshot(path=f'screenshot_{i}.png')
        except Exception as e:
            print(f'Failed: {url} - {e}')
        finally:
            page.close()

    browser.close()
```

This works, but it’s slow. Each screenshot waits for the previous one to complete.
Concurrent Processing with Async
The real power comes from async processing:
```python
import asyncio
from playwright.async_api import async_playwright

async def take_screenshot(browser, url, output_path):
    """Take a single screenshot."""
    page = await browser.new_page()
    try:
        await page.goto(url, timeout=30000)
        await page.wait_for_load_state('networkidle', timeout=10000)
        await page.screenshot(path=output_path, full_page=True)
        return {'url': url, 'status': 'success', 'path': output_path}
    except Exception as e:
        return {'url': url, 'status': 'failed', 'error': str(e)}
    finally:
        await page.close()

async def process_urls(urls):
    """Process multiple URLs concurrently."""
    async with async_playwright() as p:
        browser = await p.chromium.launch()

        tasks = [
            take_screenshot(browser, url, f'screenshots/{i}.png')
            for i, url in enumerate(urls)
        ]

        results = await asyncio.gather(*tasks)
        await browser.close()

        return results

# Run it
urls = ['https://example.com', 'https://github.com', 'https://stackoverflow.com']
results = asyncio.run(process_urls(urls))

for r in results:
    print(f"{r['url']}: {r['status']}")
```

This processes all URLs concurrently. On my machine, it’s about 5x faster than sequential processing.
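One caveat: asyncio.gather launches every URL at once, so a long list means an equally long list of open pages. If you want to keep the single-pass approach but cap how many pages are open at a time, one option is an asyncio.Semaphore. The helper below is a sketch, not part of the original example; the name process_urls_capped and the max_concurrency parameter are my own:

```python
async def process_urls_capped(urls, max_concurrency=10):
    """Like process_urls, but never opens more than max_concurrency pages."""
    async with async_playwright() as p:
        browser = await p.chromium.launch()
        semaphore = asyncio.Semaphore(max_concurrency)

        async def capped(i, url):
            # The semaphore blocks here until a slot frees up.
            async with semaphore:
                return await take_screenshot(browser, url, f'screenshots/{i}.png')

        results = await asyncio.gather(*[capped(i, url) for i, url in enumerate(urls)])
        await browser.close()
        return results
```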
Reading URLs from CSV
For real-world use, you’ll often read URLs from a file.
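For example, here’s what a minimal urls.csv could look like. The header text itself is arbitrary, because the script below simply skips the first row and reads the first column of every remaining row:

```
url
https://example.com
https://github.com
https://stackoverflow.com
```

The script reads the file, screenshots every URL concurrently, and derives each filename from the page’s domain: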
```python
import asyncio
import csv
from playwright.async_api import async_playwright
from pathlib import Path
from urllib.parse import urlparse

def sanitize_filename(url):
    """Convert URL to safe filename."""
    parsed = urlparse(url)
    name = parsed.netloc.replace('.', '_')
    return f"{name}.png"

async def take_screenshot(browser, url, output_dir):
    """Take screenshot with safe filename."""
    filename = sanitize_filename(url)
    output_path = output_dir / filename

    page = await browser.new_page()
    try:
        await page.goto(url, timeout=30000)
        await page.wait_for_load_state('networkidle', timeout=10000)
        await page.screenshot(path=str(output_path), full_page=True)
        return {'url': url, 'status': 'success', 'path': str(output_path)}
    except Exception as e:
        return {'url': url, 'status': 'failed', 'error': str(e)}
    finally:
        await page.close()

async def process_csv(csv_path, output_dir):
    """Process URLs from CSV file."""
    output_dir = Path(output_dir)
    output_dir.mkdir(exist_ok=True)

    # Read URLs from CSV
    urls = []
    with open(csv_path, 'r') as f:
        reader = csv.reader(f)
        next(reader)  # Skip header
        urls = [row[0] for row in reader if row]

    print(f"Processing {len(urls)} URLs...")

    async with async_playwright() as p:
        browser = await p.chromium.launch()

        tasks = [take_screenshot(browser, url, output_dir) for url in urls]
        results = await asyncio.gather(*tasks)

        await browser.close()

    # Summary
    success = sum(1 for r in results if r['status'] == 'success')
    failed = len(results) - success
    print(f"Complete: {success} success, {failed} failed")

    return results

# Usage
asyncio.run(process_csv('urls.csv', 'screenshots'))
```

Batch Processing for Large Volumes
Processing thousands of URLs at once will exhaust memory. Use batching:
```python
import asyncio
from playwright.async_api import async_playwright

async def take_screenshot(browser, url, output_path):
    page = await browser.new_page()
    try:
        await page.goto(url, timeout=30000)
        await page.screenshot(path=output_path, full_page=True)
        return {'url': url, 'status': 'success'}
    except Exception as e:
        return {'url': url, 'status': 'failed', 'error': str(e)}
    finally:
        await page.close()

async def process_batch(browser, batch, start_index):
    """Process a batch of URLs."""
    tasks = [
        take_screenshot(browser, url, f'screenshots/{start_index + i}.png')
        for i, url in enumerate(batch)
    ]
    return await asyncio.gather(*tasks)

async def process_all_urls(urls, batch_size=10):
    """Process all URLs in batches."""
    all_results = []

    async with async_playwright() as p:
        browser = await p.chromium.launch()

        for i in range(0, len(urls), batch_size):
            batch = urls[i:i + batch_size]
            print(f"Processing batch {i//batch_size + 1} ({len(batch)} URLs)...")

            results = await process_batch(browser, batch, i)
            all_results.extend(results)

            # Optional: brief pause between batches
            await asyncio.sleep(0.5)

        await browser.close()

    return all_results

# Usage
urls = [f'https://example{i}.com' for i in range(100)]
results = asyncio.run(process_all_urls(urls, batch_size=10))
```

Rate Limiting
Some servers may rate-limit or block rapid requests. Add delays:
```python
import asyncio
import time
from playwright.async_api import async_playwright

class RateLimiter:
    def __init__(self, requests_per_second=2):
        self.delay = 1.0 / requests_per_second
        self.last_request = 0

    async def wait(self):
        now = time.time()
        wait_time = self.last_request + self.delay - now
        if wait_time > 0:
            await asyncio.sleep(wait_time)
        self.last_request = time.time()

async def take_screenshot_with_rate_limit(browser, url, output_path, limiter):
    await limiter.wait()
    page = await browser.new_page()
    try:
        await page.goto(url, timeout=30000)
        await page.screenshot(path=output_path)
        return {'url': url, 'status': 'success'}
    except Exception as e:
        return {'url': url, 'status': 'failed', 'error': str(e)}
    finally:
        await page.close()
```
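The class above isn’t wired into a runner yet, so here’s one way to use it. This is a sketch with a name of my own (run_with_limit); note that the limiter keeps simple shared state without a lock, so it’s called from a single sequential loop rather than from many concurrent tasks:

```python
async def run_with_limit(urls, requests_per_second=2):
    limiter = RateLimiter(requests_per_second=requests_per_second)
    results = []

    async with async_playwright() as p:
        browser = await p.chromium.launch()

        # Sequential loop: the limiter's state isn't lock-protected, so calling
        # it from many concurrent tasks would let requests slip through.
        for i, url in enumerate(urls):
            result = await take_screenshot_with_rate_limit(
                browser, url, f'screenshots/{i}.png', limiter
            )
            results.append(result)

        await browser.close()

    return results

# Usage
results = asyncio.run(run_with_limit(['https://example.com', 'https://github.com']))
```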
Progress Tracking
For long-running jobs, track progress:
```python
import asyncio
from playwright.async_api import async_playwright
from tqdm import tqdm

async def process_with_progress(urls):
    results = []

    async with async_playwright() as p:
        browser = await p.chromium.launch()

        with tqdm(total=len(urls), desc="Screenshots") as pbar:
            for i, url in enumerate(urls):
                result = await take_screenshot(browser, url, f'screenshots/{i}.png')
                results.append(result)
                pbar.update(1)
                pbar.set_postfix({
                    'success': sum(1 for r in results if r['status'] == 'success'),
                    'failed': sum(1 for r in results if r['status'] == 'failed')
                })

        await browser.close()

    return results
```
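This loop takes one screenshot at a time, which keeps the progress bar simple but gives up concurrency. If you want both, one option is to update the bar as tasks finish with asyncio.as_completed. The sketch below reuses the semaphore idea from earlier; the function name and max_concurrency parameter are my own, not part of the original example:

```python
async def process_with_progress_concurrent(urls, max_concurrency=10):
    results = []

    async with async_playwright() as p:
        browser = await p.chromium.launch()
        semaphore = asyncio.Semaphore(max_concurrency)

        async def capped(i, url):
            # At most max_concurrency pages are open at once.
            async with semaphore:
                return await take_screenshot(browser, url, f'screenshots/{i}.png')

        tasks = [capped(i, url) for i, url in enumerate(urls)]

        with tqdm(total=len(urls), desc="Screenshots") as pbar:
            # Results arrive in completion order, not submission order.
            for finished in asyncio.as_completed(tasks):
                results.append(await finished)
                pbar.update(1)

        await browser.close()

    return results
```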
Retry Failed URLs
Don’t lose failed screenshots—retry them:
```python
import asyncio
from playwright.async_api import async_playwright

async def take_screenshot_with_retry(browser, url, output_path, max_retries=3):
    """Take screenshot with automatic retry."""
    for attempt in range(max_retries):
        page = await browser.new_page()
        try:
            await page.goto(url, timeout=30000)
            await page.screenshot(path=output_path)
            await page.close()
            return {'url': url, 'status': 'success', 'attempts': attempt + 1}
        except Exception as e:
            await page.close()
            if attempt == max_retries - 1:
                return {'url': url, 'status': 'failed', 'error': str(e)}
            await asyncio.sleep(2 ** attempt)  # Exponential backoff

async def process_with_retry(urls, batch_size=10):
    results = []

    async with async_playwright() as p:
        browser = await p.chromium.launch()

        # First pass
        for i in range(0, len(urls), batch_size):
            batch = urls[i:i + batch_size]
            batch_results = await asyncio.gather(*[
                take_screenshot_with_retry(browser, url, f'screenshots/{i+j}.png')
                for j, url in enumerate(batch)
            ])
            results.extend(batch_results)

        await browser.close()

    # Log failures
    failed = [r for r in results if r['status'] == 'failed']
    if failed:
        with open('failed_urls.txt', 'w') as f:
            for r in failed:
                f.write(f"{r['url']}\t{r.get('error', 'Unknown')}\n")

    return results
```
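Since the failures end up in failed_urls.txt, you can feed that file back through the same retry helper later. Here’s a sketch; the rerun_failed name is mine, and it assumes take_screenshot_with_retry from above is in scope:

```python
async def rerun_failed(path='failed_urls.txt'):
    """Re-run URLs that were logged as failed in a previous pass."""
    # Each line is "<url>\t<error>"; only the URL column is needed.
    with open(path) as f:
        urls = [line.split('\t')[0].strip() for line in f if line.strip()]

    async with async_playwright() as p:
        browser = await p.chromium.launch()
        results = await asyncio.gather(*[
            take_screenshot_with_retry(browser, url, f'screenshots/retry_{i}.png')
            for i, url in enumerate(urls)
        ])
        await browser.close()

    return results
```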
Memory Management
Long-running screenshot jobs can leak memory. Tips:
- Close pages after each screenshot
- Restart browser periodically
- Process in batches
```python
async def process_with_browser_restart(urls, restart_every=200):
    """Restart browser periodically to prevent memory leaks."""
    results = []

    async with async_playwright() as p:
        browser = await p.chromium.launch()
        screenshots_since_restart = 0

        for i, url in enumerate(urls):
            # Restart browser if needed
            if screenshots_since_restart >= restart_every:
                await browser.close()
                browser = await p.chromium.launch()
                screenshots_since_restart = 0
                print(f"Browser restarted at screenshot {i}")

            result = await take_screenshot(browser, url, f'screenshots/{i}.png')
            results.append(result)
            screenshots_since_restart += 1

        await browser.close()

    return results
```

When to Use an API Instead
Playwright is great for moderate volumes, but at scale you’ll face:
- Server costs for compute resources
- Memory management complexity
- Browser crashes and recovery
- Rate limiting from target sites
For high-volume screenshot automation, a screenshot API like ScreenshotOne handles the infrastructure. See the comparison in our bulk screenshots API guide.
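For a rough sense of what that looks like in code, here’s a minimal sketch of offloading screenshots to an HTTP screenshot API. The endpoint, parameter names, and the httpx client are my assumptions for illustration; check your provider’s documentation for the real request format:

```python
import asyncio
import httpx  # assumed async HTTP client; any HTTP library works

API_URL = 'https://api.screenshotone.com/take'  # assumed endpoint, verify in the docs
ACCESS_KEY = 'your_access_key'

async def api_screenshot(client, url, output_path):
    # The API renders the page remotely, so no local browser is needed.
    params = {'access_key': ACCESS_KEY, 'url': url, 'full_page': 'true'}
    response = await client.get(API_URL, params=params, timeout=60)
    response.raise_for_status()
    with open(output_path, 'wb') as f:
        f.write(response.content)

async def main(urls):
    async with httpx.AsyncClient() as client:
        await asyncio.gather(*[
            api_screenshot(client, url, f'screenshots/api_{i}.png')
            for i, url in enumerate(urls)
        ])

asyncio.run(main(['https://example.com', 'https://github.com']))
```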
Complete Production Script
Here’s a production-ready script combining all techniques:
```python
import asyncio
import csv
import json
from pathlib import Path
from datetime import datetime
from playwright.async_api import async_playwright
from urllib.parse import urlparse

class BulkScreenshotter:
    def __init__(self, output_dir='screenshots', batch_size=10, max_retries=3):
        self.output_dir = Path(output_dir)
        self.output_dir.mkdir(exist_ok=True)
        self.batch_size = batch_size
        self.max_retries = max_retries
        self.results = []

    def _get_filename(self, url, index):
        parsed = urlparse(url)
        safe_name = parsed.netloc.replace('.', '_').replace(':', '_')
        return f"{index:05d}_{safe_name}.png"

    async def _screenshot(self, browser, url, index):
        filename = self._get_filename(url, index)
        output_path = self.output_dir / filename

        for attempt in range(self.max_retries):
            page = await browser.new_page()
            try:
                await page.goto(url, timeout=30000)
                await page.wait_for_load_state('networkidle', timeout=10000)
                await page.screenshot(path=str(output_path), full_page=True)
                await page.close()
                return {
                    'url': url,
                    'status': 'success',
                    'path': str(output_path),
                    'attempts': attempt + 1
                }
            except Exception as e:
                await page.close()
                if attempt == self.max_retries - 1:
                    return {
                        'url': url,
                        'status': 'failed',
                        'error': str(e),
                        'attempts': attempt + 1
                    }
                await asyncio.sleep(2 ** attempt)

    async def process(self, urls):
        async with async_playwright() as p:
            browser = await p.chromium.launch()

            for i in range(0, len(urls), self.batch_size):
                batch = urls[i:i + self.batch_size]
                tasks = [
                    self._screenshot(browser, url, i + j)
                    for j, url in enumerate(batch)
                ]
                batch_results = await asyncio.gather(*tasks)
                self.results.extend(batch_results)

                success = sum(1 for r in self.results if r['status'] == 'success')
                print(f"Progress: {len(self.results)}/{len(urls)} ({success} success)")

            await browser.close()

        self._save_report()
        return self.results

    def _save_report(self):
        report = {
            'timestamp': datetime.now().isoformat(),
            'total': len(self.results),
            'success': sum(1 for r in self.results if r['status'] == 'success'),
            'failed': sum(1 for r in self.results if r['status'] == 'failed'),
            'results': self.results
        }
        with open(self.output_dir / 'report.json', 'w') as f:
            json.dump(report, f, indent=2)

# Usage
urls = ['https://example.com', 'https://github.com']
screenshotter = BulkScreenshotter(batch_size=5)
results = asyncio.run(screenshotter.process(urls))
```

Summary
Building a bulk screenshot system:
- Use async Playwright for concurrent processing
- Process in batches to manage memory
- Implement retry logic for resilience
- Track progress and save reports
- Restart browser periodically for long jobs
Frequently Asked Questions
If you’ve read the article but still have questions, check the most frequently asked ones below. And if you still have questions, feel free to reach out at support@screenshotone.com.
How to automatically screenshot multiple websites in Python?
Use Playwright's async API with asyncio.gather() to process multiple URLs concurrently. Read URLs from a CSV or list, create async tasks for each, and process them in batches to manage memory.
How to handle failed screenshots in bulk processing?
Implement try-except blocks around each screenshot, log failures to a separate file, and optionally retry failed URLs. Keep track of success/failure counts for monitoring.
How to speed up bulk screenshot processing?
Use async Playwright with concurrent processing, reuse browser contexts, process in batches, and consider using a screenshot API for very large volumes. Concurrent processing can be 5-10x faster than sequential.