I posted working Puppeteer examples so that you can better understand the context of each solution and copy it if needed.
Meet Puppeteer
It is a Node library that interacts with browsers that support the Chrome DevTools Protocol (CDP). That means not only Chrome and Chromium: Firefox also has partial support for CDP.
The Chrome DevTools Protocol was developed to manage, debug and inspect Chromium and Chrome at the low level.
So, think of Puppeteer as a high-level API over the Chrome DevTools Protocol that allows you to do everything in the browser that you can do manually:
- Extract data from a SPA, submit a form, type text, perform end-to-end UI testing and other automation-related tasks.
- Debug performance issues.
- Run, debug and test Chrome Extensions.
- Pre-render a SPA to make a static site. For Google SEO, though, it does not matter since Google renders JavaScript for every page nowadays.
- And guess what? Make screenshots and PDFs of pages.
Generating Screenshots and PDFs with Puppeteer is the main focus of the post.
Puppeteer architecture and internals for the curious
You can skip this section. It is not required to start using the library. But I love to explore the internals of the libraries I use, and so might you.
Lightweight option of Puppeteer
First of all, there are two versions of the library available: puppeteer-core and puppeteer. Use puppeteer-core when you are going to manage browser instances yourself or do not need a bundled browser at all; otherwise, stick to puppeteer.
Three simple examples that come to my mind with puppeteer-core:
- You are using CDP from the extension, so you do not have to download Chrome or Chromium.
- You want to use a different Chrome, Chromium, or Firefox build.
- You have a running cluster of browsers or a separate browser instance on another machine.
When you use puppeteer-core, you must ensure that you use a compatible browser version. The puppeteer library, on the other hand, downloads and runs a compatible version of Chromium for you, so you do not have to worry about it.
Puppeteer Alternatives
There are a lot more, but the most popular two are:
- The oldest alternative for taking screenshots is the Selenium WebDriver protocol.
- The second one is Playwright, and it is a good one. It is a direct competitor to Puppeteer.
Playwright and Puppeteer have similar APIs, but Playwright supports more browsers. So, if you must take screenshots in different browsers, prefer Playwright. By the way, top contributors to Puppeteer now work on Playwright. But the library is still considered new.
Practical Examples of using Puppeteer to take screenshots
Before starting to work with Puppeteer, let’s install it using npm:
npm i puppeteer
A simple screenshot
To take a simple screenshot with Puppeteer and save it into the file, you can use the following code:
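A minimal sketch of such a script follows. The function name, the target URL and the file name example.png are placeholders chosen for illustration; puppeteer is required inside the function so that nothing is loaded until you call it.

```javascript
// Options passed to page.screenshot(); `path` is where the PNG is written.
const screenshotOptions = { path: 'example.png' };

async function takeScreenshot(url) {
  const puppeteer = require('puppeteer'); // loaded on first call
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.goto(url);
    await page.screenshot(screenshotOptions);
  } finally {
    // Close the browser even if navigation or capture fails.
    await browser.close();
  }
}

// Usage: takeScreenshot('https://example.com');
```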
You use the path parameter in Puppeteer to save the screenshot. And always close the browser to avoid leaking resources!
You can use our reliable and scalable screenshot API with myriad options to avoid the burden of setting up and managing Puppeteer.
Resolution and Retina Display
To avoid blurred images on a high-resolution display like a Retina Display, you can change the viewport properties width, height and deviceScaleFactor:
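As a sketch, assuming a 1440x900 viewport and a scale factor of 2 (both values are only examples), the viewport can be set with page.setViewport() before navigating:

```javascript
// Example viewport: 1440x900 CSS pixels rendered at 2x pixel density.
const retinaViewport = { width: 1440, height: 900, deviceScaleFactor: 2 };

async function takeRetinaScreenshot(url, path) {
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.setViewport(retinaViewport);
    await page.goto(url);
    await page.screenshot({ path });
  } finally {
    await browser.close();
  }
}

// The saved image is width * deviceScaleFactor pixels wide.
const imageWidth = retinaViewport.width * retinaViewport.deviceScaleFactor;
```

With these values the resulting image is 2880 pixels wide, twice the CSS width of the viewport.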
That’s called pixel-perfect screenshots.
A full page screenshot
Puppeteer knows how to take a screenshot of the whole scrollable page. Use the fullPage option:
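A short sketch, with a placeholder file name and a hypothetical function name:

```javascript
// fullPage: true captures the whole scrollable page, not just the viewport.
const fullPageOptions = { path: 'full-page.png', fullPage: true };

async function takeFullPageScreenshot(url) {
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.goto(url);
    await page.screenshot(fullPageOptions);
  } finally {
    await browser.close();
  }
}
```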
But it won’t work well with lazy-loaded images. I wrote a brief guide on how to take full page screenshots with Puppeteer properly.
Wait until the page is completely loaded
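It is good practice to wait until the page is completely loaded before taking a screenshot:
It is a little bit of magic, but the networkidle2 event is a heuristic to determine the page load state: navigation is considered finished when there have been no more than two network connections for at least 500 ms. It works quite well for many real-world use cases.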
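One way to do it, sketched with a hypothetical function name, is to pass the waitUntil option to page.goto():

```javascript
// Navigation options: resolve goto() only once the network is nearly idle.
const navigationOptions = { waitUntil: 'networkidle2' };

async function screenshotWhenLoaded(url, path) {
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.goto(url, navigationOptions);
    await page.screenshot({ path });
  } finally {
    await browser.close();
  }
}
```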
But if you need to wait until some element is rendered and visible, you need to add Page.waitForSelector():
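A sketch of that, where the selector is whatever element your page must show before the capture (the function name is hypothetical):

```javascript
// visible: true waits until the element is in the DOM and actually displayed.
const waitOptions = { visible: true };

async function screenshotWhenVisible(url, selector, path) {
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.goto(url);
    await page.waitForSelector(selector, waitOptions);
    await page.screenshot({ path });
  } finally {
    await browser.close();
  }
}

// Usage: screenshotWhenVisible('https://example.com', 'h1', 'example.png');
```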
You can also wait:
- for selector or function or timeout;
- for file chooser;
- for frame;
- for function;
- for navigation;
- for network idle;
- for request;
- for response;
- for selector;
- for timeout;
- and for XPath.
A screenshot of the page area
To take a screenshot of a page area, use the clip option:
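For instance, assuming you want a 400x300 rectangle starting at the top-left corner of the page (the coordinates are illustrative only):

```javascript
// The clip rectangle is given in CSS pixels relative to the page.
const clipRegion = { x: 0, y: 0, width: 400, height: 300 };

async function screenshotArea(url, path) {
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.goto(url);
    await page.screenshot({ path, clip: clipRegion });
  } finally {
    await browser.close();
  }
}
```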
But if you need to take a screenshot of the element, there is a better approach.
A screenshot of the specific element
Puppeteer allows you to take a screenshot of any element on the web page:
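A sketch using the handle returned by page.waitForSelector(), which also ensures the element exists before the capture; the function name is hypothetical:

```javascript
async function screenshotElement(url, selector, path) {
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.goto(url);
    // Wait for the element, then capture only its bounding box.
    const element = await page.waitForSelector(selector);
    await element.screenshot({ path });
  } finally {
    await browser.close();
  }
}

// Usage: screenshotElement('https://example.com', 'h1', 'heading.png');
```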
As you see, it is essential to make sure that the element is ready.
In ScreenshotOne screenshot API, you can take the screenshot of the element by specifying the selector parameter.
A screenshot with transparent background
Puppeteer provides a useful option to omit the background of the site. Just set omitBackground to true:
Have you run the code? If so, you may have noticed that the screenshot does not have a transparent background. That happens because omitting the background only removes the browser's default white background; it has no effect if the page sets its own background.
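In code, with a placeholder file name and a hypothetical function name, that looks roughly like this:

```javascript
// omitBackground drops the default white page background from the PNG.
const transparentOptions = { path: 'transparent.png', omitBackground: true };

async function screenshotWithoutBackground(url) {
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.goto(url);
    await page.screenshot(transparentOptions);
  } finally {
    await browser.close();
  }
}
```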
So if your target site does not have a transparent background and you want to force it, you can use JavaScript to accomplish the task. Change the background of the body in the evaluate function:
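A sketch of that approach. It assumes the page background is set on the body element; sites that paint the background on html or on a wrapper element would need a different selector.

```javascript
// Runs inside the page: clears whatever background the site set on <body>.
const clearBodyBackground = () => {
  document.body.style.background = 'transparent';
};

async function screenshotForcedTransparent(url, path) {
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.goto(url);
    await page.evaluate(clearBodyBackground);
    await page.screenshot({ path, omitBackground: true });
  } finally {
    await browser.close();
  }
}
```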
Screenshot as Base64
I also wrote about rendering PNG, JPEG, and WebP in Base64 encoding with Puppeteer.
Suppose you build Puppeteer as a service and do not want to store screenshot files. You can choose to return the screenshot in Base64 encoding format:
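A sketch using the encoding option of page.screenshot(). The toDataUrl helper is not part of Puppeteer; it is added here only to show one way to consume the string.

```javascript
async function screenshotAsBase64(url) {
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.goto(url);
    // No `path`: nothing is written to disk, the image comes back as a string.
    return await page.screenshot({ encoding: 'base64' });
  } finally {
    await browser.close();
  }
}

// Hypothetical helper: wraps the Base64 PNG in a data URL for an <img> tag.
function toDataUrl(base64Png) {
  return `data:image/png;base64,${base64Png}`;
}
```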
You will receive a string that you can share with another service or even store somewhere.
Generate JPEG or WebP instead of PNG
It is super easy to generate JPEG or WebP instead of PNG:
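A sketch with the type option; the quality value of 80 and the file name are arbitrary examples. Replace 'jpeg' with 'webp' to get WebP output.

```javascript
// `quality` (0-100) applies to JPEG and WebP only, not to PNG.
const jpegOptions = { path: 'example.jpeg', type: 'jpeg', quality: 80 };

async function takeJpegScreenshot(url) {
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.goto(url);
    await page.screenshot(jpegOptions);
  } finally {
    await browser.close();
  }
}
```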
When you generate JPEG or WebP, you can specify the quality of the screenshot.
If JPEG, WebP or PNG is not enough for you, you can render a URL or HTML in GIF, JP2, TIFF, AVIF or HEIF format with Puppeteer.
Generate PDF instead of PNG
It is relatively easy to generate PDF instead of PNG:
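A sketch using page.pdf() instead of page.screenshot(); the A4 format and the file name are example values.

```javascript
// Options for page.pdf(); `format` selects the paper size.
const pdfOptions = { path: 'example.pdf', format: 'A4' };

async function savePageAsPdf(url) {
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    // Wait for the network to settle so late-loading content is included.
    await page.goto(url, { waitUntil: 'networkidle2' });
    await page.pdf(pdfOptions);
  } finally {
    await browser.close();
  }
}
```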
Look at all possible Puppeteer PDF options. It is an exciting and complex problem, which deserves a separate post.
It depends on your use case, but also consider using PDFKit for programmatic PDF generation.
Blocking ads when using Puppeteer
In our screenshot API, you can block ads by setting blockAds=true.
I do not use any ad blocking extension because life is tough, and everybody needs some way to earn money. If I can help sites sustain and survive by non-blocking the ads, I will do it.
But when you test your site or your customer site, you might need to block the ads. There are 2 ways to do it:
- Intercept and block the requests that load ads into the site.
- Use an extension that is optimized exactly to solve this problem.
The first one is tricky and highly depends on the site you are taking screenshots of. But using an extension is a highly-scalable approach that works out of the box.
Install puppeteer-extra and puppeteer-extra-plugin-adblocker in addition to the puppeteer package:
npm i puppeteer-extra puppeteer-extra-plugin-adblocker
And then use it:
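A sketch based on the plugin's documented usage. The getPuppeteer helper is an assumption of this example; it only makes sure the plugin is registered once.

```javascript
let puppeteerWithAdblock = null;

// Registers the ad blocker once and reuses the same puppeteer-extra instance.
function getPuppeteer() {
  if (!puppeteerWithAdblock) {
    const puppeteer = require('puppeteer-extra');
    const AdblockerPlugin = require('puppeteer-extra-plugin-adblocker');
    puppeteer.use(AdblockerPlugin());
    puppeteerWithAdblock = puppeteer;
  }
  return puppeteerWithAdblock;
}

async function screenshotWithoutAds(url, path) {
  const browser = await getPuppeteer().launch();
  try {
    const page = await browser.newPage();
    await page.goto(url);
    await page.screenshot({ path });
  } finally {
    await browser.close();
  }
}
```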
Most pages include ads and trackers, which consume a lot of bandwidth and take a long time to load. Because fewer requests are made and less JavaScript is executed when advertisements and trackers are blocked, pages load substantially quicker.
Block trackers
To take screenshots faster you might block trackers. It will help to speed up rendering. The ad blocking plugin can help us with this issue.
Do not forget to install puppeteer-extra and puppeteer-extra-plugin-adblocker in addition to the puppeteer package:
npm i puppeteer-extra puppeteer-extra-plugin-adblocker
And then use it:
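A sketch using the plugin's blockTrackers option; the helper and function names are hypothetical.

```javascript
// blockTrackers tells the ad blocker plugin to block trackers as well as ads.
const adblockOptions = { blockTrackers: true };

let puppeteerWithTrackerBlocking = null;

// Registers the plugin once and reuses the same puppeteer-extra instance.
function getPuppeteer() {
  if (!puppeteerWithTrackerBlocking) {
    const puppeteer = require('puppeteer-extra');
    const AdblockerPlugin = require('puppeteer-extra-plugin-adblocker');
    puppeteer.use(AdblockerPlugin(adblockOptions));
    puppeteerWithTrackerBlocking = puppeteer;
  }
  return puppeteerWithTrackerBlocking;
}

async function screenshotWithoutTrackers(url, path) {
  const browser = await getPuppeteer().launch();
  try {
    const page = await browser.newPage();
    await page.goto(url);
    await page.screenshot({ path });
  } finally {
    await browser.close();
  }
}
```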
If you need to block only trackers, but not ads, use request interception instead.
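A rough sketch of request interception. The host list is a tiny hypothetical example, not a complete tracker list; a real setup would need a maintained one.

```javascript
// Hypothetical, deliberately short list of tracker hosts to block.
const blockedHosts = ['www.google-analytics.com', 'www.googletagmanager.com'];

function isBlocked(requestUrl) {
  const { hostname } = new URL(requestUrl);
  return blockedHosts.includes(hostname);
}

async function screenshotBlockingHosts(url, path) {
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    // Once interception is on, every request must be aborted or continued.
    await page.setRequestInterception(true);
    page.on('request', (request) => {
      if (isBlocked(request.url())) {
        request.abort();
      } else {
        request.continue();
      }
    });
    await page.goto(url);
    await page.screenshot({ path });
  } finally {
    await browser.close();
  }
}
```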
Preventing Puppeteer detection
Some sites might block your Puppeteer script because of the user agent, and it is easy to fix:
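A sketch using page.setUserAgent(). The user agent string below is only an example of a desktop Chrome value; substitute one that matches the browser you want to imitate.

```javascript
// Example desktop Chrome user agent (an assumption, pick your own).
const userAgent =
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 ' +
  '(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36';

async function screenshotWithUserAgent(url, path) {
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    // Must be set before navigation so the first request already carries it.
    await page.setUserAgent(userAgent);
    await page.goto(url);
    await page.screenshot({ path });
  } finally {
    await browser.close();
  }
}
```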
There are also many other hacks to ensure that Puppeteer is not detected, but you can save time by using the ready-made puppeteer-extra-plugin-stealth plugin for the stealth mode. Install it in addition to the puppeteer package:
npm i puppeteer-extra puppeteer-extra-plugin-stealth
And then use:
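A sketch of the setup, including removal of the webdriver property before any page script runs. The helper names are hypothetical, and deleting the property from the navigator prototype is one common way to do it, assumed here.

```javascript
// Runs in the page before its scripts: removes navigator.webdriver.
// The default parameter resolves to the page's own navigator object.
function removeWebdriverFlag(nav = navigator) {
  delete Object.getPrototypeOf(nav).webdriver;
}

let stealthPuppeteer = null;

// Registers the stealth plugin once and reuses the same instance.
function getPuppeteer() {
  if (!stealthPuppeteer) {
    const puppeteer = require('puppeteer-extra');
    const StealthPlugin = require('puppeteer-extra-plugin-stealth');
    puppeteer.use(StealthPlugin());
    stealthPuppeteer = puppeteer;
  }
  return stealthPuppeteer;
}

async function screenshotInStealthMode(url, path) {
  const browser = await getPuppeteer().launch();
  try {
    const page = await browser.newPage();
    await page.evaluateOnNewDocument(removeWebdriverFlag);
    await page.goto(url);
    await page.screenshot({ path });
  } finally {
    await browser.close();
  }
}
```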
Important! As you see, I remove the webdriver property, since the stealth plugin misses this hack and Puppeteer can be detected through the webdriver property.
Hide cookie banners
It is a tricky task to implement generically, but you can dismiss a cookie banner by finding the selector of the Accept or Reject button and clicking on it.
I wrote an ultimate guide on how to block cookie banners, GDPR overlay windows and other privacy-related notices when taking a screenshot with Puppeteer.
Using basic access authentication with Puppeteer
If your page is protected by HTTP basic access authentication, the only thing you need to do is to specify username and password before loading and taking the screenshot of the page:
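A sketch using page.authenticate(); the credentials below are placeholders.

```javascript
// Placeholder credentials; in practice read them from configuration.
const credentials = { username: 'user', password: 'secret' };

async function screenshotProtectedPage(url, path) {
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    // Must be called before goto() so the first request is authenticated.
    await page.authenticate(credentials);
    await page.goto(url);
    await page.screenshot({ path });
  } finally {
    await browser.close();
  }
}
```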
Using a proxy for Puppeteer
In our screenshot API you just need to specify proxy option to take screenshots through proxy.
If you need to use a proxy to take a screenshot with Puppeteer, you can specify a browser-wide proxy:
But in some cases, you might want to use a page-wide proxy without recreating the browser instance. In this case, you can install puppeteer-page-proxy:
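A sketch that passes Chromium's --proxy-server flag at launch. The proxy address is a placeholder, and the proxyArgs helper exists only for this example.

```javascript
// Builds the Chromium launch flag for a browser-wide proxy.
function proxyArgs(proxyAddress) {
  return [`--proxy-server=${proxyAddress}`];
}

async function screenshotThroughProxy(url, path, proxyAddress) {
  const puppeteer = require('puppeteer');
  // Every page opened by this browser instance goes through the proxy.
  const browser = await puppeteer.launch({ args: proxyArgs(proxyAddress) });
  try {
    const page = await browser.newPage();
    await page.goto(url);
    await page.screenshot({ path });
  } finally {
    await browser.close();
  }
}

// Usage: screenshotThroughProxy('https://example.com', 'example.png', '127.0.0.1:9876');
```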
npm i puppeteer-page-proxy
And use it to specify a proxy on a per-page basis:
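A sketch following the usage shown in the package's README, where the module exports a function that takes the page and a proxy URL. The proxy URL and the function name are placeholders.

```javascript
async function screenshotThroughPageProxy(url, path, proxyUrl) {
  const puppeteer = require('puppeteer');
  const useProxy = require('puppeteer-page-proxy');
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    // Only this page uses the proxy; other pages in the browser are unaffected.
    await useProxy(page, proxyUrl);
    await page.goto(url);
    await page.screenshot({ path });
  } finally {
    await browser.close();
  }
}

// Usage: screenshotThroughPageProxy('https://example.com', 'example.png', 'http://127.0.0.1:80');
```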
I also wrote in more detail and with more examples about using authenticated proxy with Puppeteer and proxy on a per request basis.
Add support of emojis, Japanese, Arabic and other non-Latin languages to Puppeteer
If you run Puppeteer on an OS without emoji support, you need to install OS-wide fonts to support emojis. The same can happen with non-Latin characters like Chinese, Japanese, Korean, Arabic, Hebrew, etc.
To get Puppeteer to render emojis, you can use Noto Fonts published under the SIL Open Font License (OFL) v1.1.
You need to look up how to install fonts for your host OS.
Have a nice day 👋
I posted a lot of Puppeteer examples, and I hope I helped you solve your screenshot problems with Puppeteer. I described the problems I encountered and the solutions to them.
You also might find useful: