Improving performance and stability by consolidating validation and access check logic in the API gateway
Posted July 30, 2023 by Dmytro Krasun ‐ 2 min read
For the past few days, I have been working on improving the stability and performance of the ScreenshotOne API. I started with the low-hanging fruit: moving validation and access key management from the rendering services to ScreenshotOne's API gateway. The API will be more stable and performant as a result. If you are curious why, please continue reading.
A few words about how the ScreenshotOne API worked before the refactoring and why it was built that way:
An API gateway service (built on top of the Cloudflare Workers platform) received all requests and forwarded them to the rendering services (Google Cloud Run). Each rendering service then called the access management services on its own to validate the request, check quotas, verify signatures, and so on.
That architecture had evolved organically:
- The API started as one simple service hosted on one server.
- Then, for scalability, it was moved to Render and eventually to Google Cloud Run, which required decoupling key management and other common logic from the rendering services.
- Later, the API gateway was added to use Cloudflare Caching and Storage.
It worked nicely, but in the past few months, ScreenshotOne grew dramatically. And the setup revealed a few bottlenecks:
Each request to the ScreenshotOne API spins up a new Google Cloud Run instance, and the number of instances is limited. So, when an error happens in the validation and key management services, it has to propagate back through the Google Cloud Run instance, which takes time and increases the response time. In addition, on rare occasions, there might be no available instances.
The solution is obvious, isn't it? And, yes, it was simple: I moved the validation and access check logic to the API gateway.
With the new approach, invalid requests are rejected at the gateway, so the response is very fast and no rendering instance is used. For ScreenshotOne customers, that means improved stability.
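To make the idea concrete, here is a minimal sketch of gateway-side checks in TypeScript. It is not ScreenshotOne's actual code: the parameter names (`access_key`, `url`), the in-memory key store, the status codes, and the `render` callback that stands in for a rendering service are all assumptions made for illustration.

```typescript
// All names, parameters, and status codes below are illustrative assumptions,
// not ScreenshotOne's actual implementation.

interface AccessKey {
  quotaLimit: number; // requests allowed in the current period
  quotaUsed: number; // requests already consumed
}

type Decision =
  | { kind: "forward" }
  | { kind: "reject"; status: number; error: string };

// Runs entirely in the gateway: no rendering instance is involved.
function checkAtGateway(
  params: { access_key?: string; url?: string },
  keys: Map<string, AccessKey>
): Decision {
  if (!params.access_key) {
    return { kind: "reject", status: 400, error: "access_key is required" };
  }
  // Deliberately simplistic URL check, good enough for a sketch.
  if (!params.url || !/^https?:\/\/\S+$/i.test(params.url)) {
    return { kind: "reject", status: 400, error: "url must be an http(s) URL" };
  }
  const key = keys.get(params.access_key);
  if (key === undefined) {
    return { kind: "reject", status: 401, error: "unknown access key" };
  }
  if (key.quotaUsed >= key.quotaLimit) {
    return { kind: "reject", status: 429, error: "quota exceeded" };
  }
  return { kind: "forward" };
}

// `render` stands in for the call to a rendering service; it is only
// invoked once the request has passed every gateway check.
function handleAtGateway(
  params: { access_key?: string; url?: string },
  keys: Map<string, AccessKey>,
  render: (url: string) => string
): { status: number; body: string } {
  const decision = checkAtGateway(params, keys);
  if (decision.kind === "reject") {
    return { status: decision.status, body: decision.error };
  }
  return { status: 200, body: render(params.url as string) };
}
```

In this sketch, a request with a missing parameter, an unknown key, or an exhausted quota gets its error response straight from the gateway, and `render` is never called for it. In a real Cloudflare Worker, the same checks would sit at the top of the request handler, before the request is forwarded to Google Cloud Run.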
ScreenshotOne has a public roadmap, but I also invest time in improving what the API already has to offer.
I am now improving observability, and the quality of rendering is one of the core metrics for the API. I am eliminating the errors and issues that I find or that customers report, one by one, to make the API as great as possible.
If you have any questions or want to share some ideas, feel free to write at