A short and quick answer
Waiting for a fixed period of time is bad practice, but in the real world, it is hard to find a solution that works well in all cases.
To take a screenshot once the page is fully loaded and rendered, one of the combinations that works most often is to set waitUntil to domcontentloaded in the page.goto() call and then wait a bit before taking the screenshot.
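A minimal sketch of this combination might look as follows. The helper name screenshotAfterDomAndDelay, its parameters, and the 6-second default are assumptions made for illustration, not recommended values.

```javascript
// Sketch: wait for DOMContentLoaded, pause for a fixed time, then capture.
// `page` is expected to be a Puppeteer Page; the helper name, `path`,
// and the default `delayMs` are illustrative assumptions.
async function screenshotAfterDomAndDelay(page, url, path, delayMs = 6000) {
  // Resolve navigation as soon as the DOMContentLoaded event fires.
  await page.goto(url, { waitUntil: 'domcontentloaded' });

  // Fixed pause to give asynchronously loaded widgets time to render.
  await new Promise((resolve) => setTimeout(resolve, delayMs));

  await page.screenshot({ path });
}

// Usage with a real browser (assumes the puppeteer package is installed):
//   const puppeteer = require('puppeteer');
//   const browser = await puppeteer.launch();
//   const page = await browser.newPage();
//   await screenshotAfterDomAndDelay(page, 'https://example.com', 'example.png');
//   await browser.close();
```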
While setting waitUntil to networkidle0 or networkidle2 might work for most scenarios, there are caveats.
Without waiting
If you are interested, there is a deep dive guide on how to take screenshots with Puppeteer.
Let’s take a simple screenshot without waiting for any event and see what happens.
First, install Puppeteer from npm (npm install puppeteer).
As an example, I will take a screenshot of the Yahoo Finance site. It has a lot of widgets that are loaded asynchronously, so it illustrates well why we can’t take the screenshot right away.
Let’s take a screenshot without any waiting options:
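A sketch of such a naive capture is shown here. The helper name screenshotImmediately and the output path are hypothetical, and the Yahoo Finance URL in the usage comment is an assumption, since the article does not state the exact address.

```javascript
// Sketch: navigate with default options and capture right away.
// `page` is expected to be a Puppeteer Page; the helper name is hypothetical.
async function screenshotImmediately(page, url, path) {
  // No options are passed, so Puppeteer falls back to its default,
  // which is to wait for the `load` event only.
  await page.goto(url);
  await page.screenshot({ path });
}

// Usage with a real browser (assumes the puppeteer package is installed;
// the URL is an assumed address for the Yahoo Finance site):
//   const puppeteer = require('puppeteer');
//   const browser = await puppeteer.launch();
//   const page = await browser.newPage();
//   await screenshotImmediately(page, 'https://finance.yahoo.com/', 'yahoo.png');
//   await browser.close();
```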
In the resulting screenshot, the widgets on the right are not loaded, but we take the screenshot anyway. It is half-baked, and that is not good. Let’s improve it.
Delay
The simplest, but not the best solution is to wait for some amount of time before taking a screenshot:
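One way to sketch a fixed delay is with a plain setTimeout-based promise, so that the code does not depend on any version-specific Puppeteer helper. The names sleep and screenshotAfterDelay, as well as the 5-second default, are assumptions for illustration.

```javascript
// Sketch: a generic sleep helper built on setTimeout.
function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Navigate with default options, wait a fixed time, then capture.
// `page` is expected to be a Puppeteer Page; the default delay is arbitrary.
async function screenshotAfterDelay(page, url, path, delayMs = 5000) {
  await page.goto(url);
  await sleep(delayMs); // fixed pause, regardless of how the page behaves
  await page.screenshot({ path });
}
```

The delay is a parameter on purpose: as discussed in this section, the right value depends on the site, the connection, and the machine load.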
The result is OK-ish: now we see the widgets on the right.
By the way, the ads are probably not loaded because headless browsers might block them, and the video is not loaded because Puppeteer uses Chromium, which does not support rendering MP4 videos.
Why is a fixed delay before taking a screenshot not a good idea? It does not scale:
- loading time depends on your Internet connection;
- different sites have different loading times;
- rendering time also depends on the machine load.
You can safely use this simple approach if you need to take one or two screenshots occasionally for well-known sites and with big enough delays.
Wait until an event occurs
With some exceptions, the most reliable approach is to specify the waitUntil parameter when calling page.goto().
The page.goto() function accepts an options object of the WaitForOptions type. The two fields that matter here are waitUntil and timeout.
Let’s look at each value accepted by the waitUntil property of the WaitForOptions type:
- load: the navigation is finished when the load event is fired;
- domcontentloaded: the navigation is finished when the DOMContentLoaded event is fired;
- networkidle0: the navigation is finished when there are no more than 0 network connections for at least 500 ms;
- networkidle2: the navigation is finished when there are no more than 2 network connections for at least 500 ms.
You can also specify an array of events. In that case, page.goto() resolves only after all of them have fired.
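To make the accepted values concrete, here is a small sketch of a helper that builds the options object for page.goto(). The helper buildWaitOptions and the constant LIFECYCLE_EVENTS are hypothetical and not part of Puppeteer; only the four event names come from the list above.

```javascript
// The four waitUntil values described above.
const LIFECYCLE_EVENTS = ['load', 'domcontentloaded', 'networkidle0', 'networkidle2'];

// Hypothetical helper: accepts one event name or an array of them and
// returns an options object that can be passed to page.goto().
function buildWaitOptions(events) {
  const list = Array.isArray(events) ? events : [events];
  for (const event of list) {
    if (!LIFECYCLE_EVENTS.includes(event)) {
      throw new TypeError(`Unknown waitUntil value: ${event}`);
    }
  }
  return { waitUntil: list };
}

// Usage (inside an async function, with a Puppeteer `page`):
//   await page.goto(url, buildWaitOptions(['load', 'networkidle0']));
```

Validating the names up front is a design choice of this sketch: a typo in an event name fails immediately instead of surfacing later during navigation.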
Specifying the timeout option is critical. By default, it is 30000 milliseconds (30 seconds). If the events do not fire within this time, page.goto() throws an error.
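Since a timeout ends in a thrown error, it usually needs to be handled explicitly. The sketch below assumes that Puppeteer's timeout error can be recognized by its name property being 'TimeoutError'; the helper name gotoWithTimeout and the 15-second default are illustrative.

```javascript
// Sketch: navigate with an explicit timeout and report whether the page settled.
// Returns true if navigation finished in time, false if it timed out.
// Assumption: Puppeteer's timeout error has `name === 'TimeoutError'`.
async function gotoWithTimeout(page, url, timeoutMs = 15000) {
  try {
    await page.goto(url, { waitUntil: 'networkidle2', timeout: timeoutMs });
    return true;
  } catch (error) {
    if (error && error.name === 'TimeoutError') {
      // The page never became idle in time; the caller decides what to do,
      // for example take the screenshot anyway or retry.
      return false;
    }
    throw error; // anything else (DNS failure, invalid URL, ...) is re-thrown
  }
}
```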
Difference between networkidle0 and networkidle2
With both options, Puppeteer waits for the network to become idle.
Use networkidle0 for sites that load once and then don’t send any more requests. An example is a SPA without any background activity.
networkidle2, in turn, is suitable for applications that keep connections open and send requests after the page is loaded. Imagine observing a trading graph in real time on an exchange site.
There is also a separate method in Puppeteer to wait for the network to become idle: page.waitForNetworkIdle().
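A sketch of how this method could be combined with a screenshot is shown below. The helper name screenshotWhenNetworkIdle is hypothetical, and the idleTime and timeout values simply restate the 500 ms and 30-second figures mentioned earlier.

```javascript
// Sketch: start with a cheap navigation wait, then wait for the network
// to go quiet using the separate page.waitForNetworkIdle() method.
// `page` is expected to be a Puppeteer Page.
async function screenshotWhenNetworkIdle(page, url, path) {
  await page.goto(url, { waitUntil: 'domcontentloaded' });

  // Resolves once there has been no network activity for `idleTime` ms,
  // or throws if that does not happen within `timeout` ms.
  await page.waitForNetworkIdle({ idleTime: 500, timeout: 30000 });

  await page.screenshot({ path });
}
```

Keeping the idle wait separate from page.goto() also makes it possible to call it again later, for example after scrolling or clicking.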
Wait until DOMContentLoaded
Let’s wait until the DOMContentLoaded event occurs and see if it helps to render the Yahoo Finance page correctly.
It does not help much, probably because after the DOMContentLoaded event fired, the page sent more requests for the widgets. Let’s try with both domcontentloaded and networkidle2.
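Both attempts from this section can be sketched with one helper that takes the waitUntil value as a parameter. The helper name screenshotWhenSettled is hypothetical; passing 'domcontentloaded' alone corresponds to the first attempt, and the default array corresponds to the second.

```javascript
// Sketch: navigate, wait for the given lifecycle events, then capture.
// `page` is expected to be a Puppeteer Page. By default the navigation
// resolves only after both DOMContentLoaded has fired and the network
// has had at most 2 connections for at least 500 ms.
async function screenshotWhenSettled(
  page,
  url,
  path,
  waitUntil = ['domcontentloaded', 'networkidle2']
) {
  await page.goto(url, { waitUntil });
  await page.screenshot({ path });
}

// Usage (inside an async function, with a Puppeteer `page`):
//   await screenshotWhenSettled(page, url, 'dom-only.png', 'domcontentloaded');
//   await screenshotWhenSettled(page, url, 'settled.png');
```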
And here we go: it works fast and gives the result we need. I chose networkidle2 instead of networkidle0 because the page constantly sends requests, so with networkidle0 the network would never become fully idle and page.goto() would throw a timeout error.
Caveats
The combination of options like domcontentloaded and networkidle2 might work well in many cases, but not in all of them.
You still might have pages with lazy-loaded images, so you need to scroll to the bottom of the page and then wait until the images are loaded. Sometimes you get trapped in infinite scrolling. Some pages eventually stop sending network requests, and some never do.
You can write standard code to handle these issues if you are working with a known set of sites. But you might run into new problems, so test your code repeatedly and on many sites.
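For lazy-loaded images, one possible sketch is to scroll in steps from the Node side and stop either at the bottom of the page or after a fixed number of steps, which guards against infinite scrolling. The helper name scrollToBottom and all default values are assumptions for illustration.

```javascript
// Sketch: scroll down step by step so lazy-loaded content gets triggered.
// Returns the number of scroll steps performed. `maxSteps` caps the loop
// so that pages with infinite scrolling cannot trap the script.
async function scrollToBottom(page, { stepPx = 500, pauseMs = 200, maxSteps = 50 } = {}) {
  for (let step = 1; step <= maxSteps; step++) {
    // This callback runs inside the browser page, not in Node.
    const atBottom = await page.evaluate((distance) => {
      window.scrollBy(0, distance);
      return window.innerHeight + window.scrollY >= document.body.scrollHeight - 1;
    }, stepPx);

    if (atBottom) return step;

    // Short pause so newly revealed images can start loading.
    await new Promise((resolve) => setTimeout(resolve, pauseMs));
  }
  return maxSteps;
}

// Usage (inside an async function, with a Puppeteer `page`):
//   await scrollToBottom(page);
//   await page.screenshot({ path: 'full.png', fullPage: true });
```

After scrolling, it may still be necessary to wait for the images themselves, for example with one of the network idle approaches described above.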
Wait for the page ready after a button click or a form submit
If you need to wait for the page to be ready after a button click or a form submit, use page.waitForNavigation().
The Puppeteer API documentation suggests using Promise.all() to prevent a race condition.
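The pattern can be sketched as follows. The helper name clickAndWaitForNavigation and the choice of networkidle2 are illustrative assumptions. The important part is that the navigation wait is registered before the click is sent.

```javascript
// Sketch: click an element and wait for the navigation it triggers.
// `page` is expected to be a Puppeteer Page.
async function clickAndWaitForNavigation(page, selector) {
  // Both promises are created before either is awaited, so the navigation
  // listener is already in place when the click happens. Awaiting the click
  // first could miss a navigation that starts and finishes quickly.
  const [response] = await Promise.all([
    page.waitForNavigation({ waitUntil: 'networkidle2' }),
    page.click(selector),
  ]);
  return response;
}

// Usage (inside an async function; the selector is a made-up example):
//   await clickAndWaitForNavigation(page, 'button[type="submit"]');
//   await page.screenshot({ path: 'after-submit.png' });
```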
A third-party API to take screenshots
A shameless promotion! If you don’t want to waste time handling all the Puppeteer issues and scaling, feel free to use ScreenshotOne.com as a screenshot API.
All described options are supported! And you can start for free.
Afterword and recommendations
I hope I helped you solve your problem today. Have a nice day 👋