Cloudflare Workers as an API gateway

By Dmytro Krasun · 7 min read
Check out how ScreenshotOne relies on Cloudflare Workers as an API gateway to enhance performance, reliability, and cost-efficiency.

Since Cloudflare featured ScreenshotOne in their “Built With” series, I have often been asked about Workers. This post answers many of those frequently asked questions and shares my overall experience with the Cloudflare platform, in the context of Workers.

How it all started

I was just proxying all API requests through Cloudflare (without any code) to cache screenshots with their Edge Cache.

But because the edge cache is temporary and eviction is unpredictable, I was getting a lot of cache misses.

In the case of the ScreenshotOne API, that meant using expensive resources to re-render screenshots for every missed cache request. And since cached requests are free for my customers, I was simply losing money.

So when Cloudflare launched its object storage, R2, I decided to use it as a second-level cache to increase the cache hit rate.

Now you can enable R2 caching in a few clicks, but at the time I needed a second-level cache, the only option was to use it through Workers.

That’s how ScreenshotOne first adopted Cloudflare Workers: as a super-thin API gateway, used only for better caching.

Cloudflare Workers as an API gateway

Today, Workers act as both guard and servant of the ScreenshotOne API, serving 100K+ heavy API requests daily on average. I use them for request validation, access key checks, rate-limit checks, caching, load balancing, and even for switching between data centers in case of failures.

Overview

Validations

I share my request validation library between the API gateway (the Workers) and the rendering service itself.

It means that when a request is not valid, it never reaches the rendering servers and doesn’t cause any additional load on them. That was super important when I was fully reliant on Google Cloud Run. Now it is less critical, but I still admire how fast error messages are returned.
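A minimal sketch of that shared-validation idea: the same function is bundled into both the Worker (gateway) and the rendering service, so invalid requests are rejected at the edge. The parameter names, limits, and the origin URL here are all made up for illustration.

```javascript
// Hypothetical shared validation module, usable both in the Worker
// and on the rendering servers.
export function validateScreenshotParams(params) {
  const errors = [];
  if (!params.url) {
    errors.push("url is required");
  } else {
    try {
      new URL(params.url);
    } catch {
      errors.push("url must be a valid URL");
    }
  }
  const width = Number(params.width ?? 1280);
  if (!Number.isInteger(width) || width < 1 || width > 5000) {
    errors.push("width must be an integer between 1 and 5000");
  }
  return errors;
}

export default {
  async fetch(request) {
    const params = Object.fromEntries(new URL(request.url).searchParams);
    const errors = validateScreenshotParams(params);
    if (errors.length > 0) {
      // Invalid requests never reach the rendering servers.
      return new Response(JSON.stringify({ errors }), {
        status: 400,
        headers: { "Content-Type": "application/json" },
      });
    }
    return fetch("https://render.example.com/take", request); // hypothetical origin
  },
};
```

Because the validation function is pure, the same module can be unit-tested and reused anywhere, not only inside the Worker runtime.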

Caching

I have two levels of cache: screenshots are cached on the edge and in long-term storage. The Worker algorithm to render a screenshot:

  1. Check if the screenshot is already rendered and stored in the edge cache. If not, check the storage.
  2. Check if the screenshot is already rendered and stored in the long-term storage. If not, render the screenshot.
  3. Once rendered, store it both in the storage and in the edge cache.
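The steps above can be sketched roughly as follows, assuming an R2 bucket bound as env.SCREENSHOTS and a renderScreenshot() helper (both names are hypothetical, not the actual ScreenshotOne implementation):

```javascript
// Normalize the request URL into a deterministic storage key, so the
// same parameters in a different order hit the same cached object.
export function storageKey(requestUrl) {
  const url = new URL(requestUrl);
  url.searchParams.sort();
  return `${url.pathname}?${url.searchParams}`;
}

export default {
  async fetch(request, env, ctx) {
    const cache = caches.default;

    // 1. Edge cache first.
    let response = await cache.match(request);
    if (response) return response;

    // 2. Then long-term storage (R2).
    const key = storageKey(request.url);
    const object = await env.SCREENSHOTS.get(key);
    if (object) {
      response = new Response(object.body, {
        headers: { "Content-Type": "image/png", "Cache-Control": "public, max-age=86400" },
      });
    } else {
      // 3. Miss on both levels: render, then store in R2.
      const image = await renderScreenshot(request); // hypothetical helper
      ctx.waitUntil(env.SCREENSHOTS.put(key, image));
      response = new Response(image, {
        headers: { "Content-Type": "image/png", "Cache-Control": "public, max-age=86400" },
      });
    }

    // Backfill the edge cache in the background.
    ctx.waitUntil(cache.put(request, response.clone()));
    return response;
  },
};
```

Writing to both cache levels via ctx.waitUntil keeps the response fast: the client gets the image immediately while the backfill happens in the background.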

As it turns out, some customers now even use it as a CDN for screenshots!

Access key checks and rate limits

The same idea as with validation: why not check API keys and rate limits before sending requests to the rendering services?
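A rough sketch of how such a check could look at the edge, assuming an access-key KV namespace (env.ACCESS_KEYS), a counter namespace (env.RATE_LIMITS), and a fixed-window rate limit; all of these names and limits are illustrative, not ScreenshotOne's actual scheme:

```javascript
// One counter key per access key per time window.
export function windowKey(accessKey, nowMs, windowSeconds = 60) {
  return `${accessKey}:${Math.floor(nowMs / 1000 / windowSeconds)}`;
}

export default {
  async fetch(request, env) {
    const accessKey = new URL(request.url).searchParams.get("access_key");
    if (!accessKey || !(await env.ACCESS_KEYS.get(accessKey))) {
      return new Response("Invalid access key", { status: 401 });
    }

    const key = windowKey(accessKey, Date.now());
    const used = Number((await env.RATE_LIMITS.get(key)) ?? 0);
    if (used >= 100) {
      return new Response("Rate limit exceeded", { status: 429 });
    }
    // Note: KV counters are not atomic, so this is an approximate limit;
    // Durable Objects would be the stricter option.
    await env.RATE_LIMITS.put(key, String(used + 1), { expirationTtl: 120 });

    return fetch("https://render.example.com/take", request); // hypothetical origin
  },
};
```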

Routing requests and retrying

If rendering fails, or the main server cluster reports that it is overloaded, I just route requests to another data center. With Workers, it is super simple.
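A minimal failover sketch: try the primary data center, and on a network error or an overload response fall back to the next one. The origin hostnames and the exact status codes treated as "overloaded" are assumptions for illustration:

```javascript
// Hypothetical origins, tried in order.
const ORIGINS = ["https://eu.render.example.com", "https://us.render.example.com"];

// Retry on overload / upstream-failure signals.
export function shouldFailover(response) {
  return response.status === 503 || response.status === 429 || response.status >= 520;
}

export default {
  async fetch(request) {
    const url = new URL(request.url);
    for (const origin of ORIGINS) {
      const target = new URL(url.pathname + url.search, origin);
      try {
        const response = await fetch(target, request);
        if (!shouldFailover(response)) return response;
      } catch {
        // Network failure: try the next origin.
      }
    }
    return new Response("All rendering clusters are unavailable", { status: 503 });
  },
};
```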

Developing and debugging Workers

You can run Workers locally with Miniflare without any problems, and even proxy requests through them to your endpoints.

If you want to debug Workers in production, you can do that either locally with the Wrangler CLI or through the Cloudflare dashboard:

Logs

Observability

The default Cloudflare dashboard does not provide much information or analytics about the Workers runtime:

Workers in the Cloudflare dashboard

If you want more data than the chart above provides, you will need to push metrics and logs to an external provider and build your own analytics dashboards.

Logging

Workers have a native option for pushing logs: Logpush. But it supports only a limited set of destinations.

And it has a latency issue: Logpush delivers logs not immediately but in batches, with some delay. That might be critical in some situations.

I use Grafana Loki for logs, which wasn’t supported by default. However, Cloudflare recently introduced Tail Workers, and I immediately jumped on the opportunity to use them for logging.

A Tail Worker is a Worker that can be attached to any other Worker to read its output stream and any events happening in the Worker you listen to. You can then do a lot of different things with that data, and it is super easy to implement:

```javascript
export default {
  async tail(events, env, ctx) {
    // Keep the Worker alive until the log batch has been delivered;
    // an un-awaited fetch could otherwise be cancelled.
    ctx.waitUntil(
      fetch("https://example.com/endpoint", {
        method: "POST",
        body: JSON.stringify(events),
      })
    );
  },
};
```

I also wrote a post about how to push Cloudflare Worker logs to Grafana Loki. And I open-sourced the Tail Worker to push logs to Loki.

Performance

I really enjoy the speed of Workers, especially for the validation use cases (not “happy path” requests): Workers execute immediately, reject the invalid request, and users get almost instant feedback.

Also, in my experience, Cloudflare Workers is significantly faster than other serverless platforms. And Cloudflare has eliminated cold starts for Workers, a common performance issue elsewhere: code executes immediately.

Workers are executed on the edge, meaning they run on a global network of data centers distributed across hundreds of cities worldwide. This brings computation and data storage closer to your end users, which significantly improves response times and reduces bandwidth usage.

Smart placement

Smart Placement is a feature designed to optimize the performance of Workers by automatically placing workloads in optimal locations to minimize latency and speed up applications.

I tried it, but in my case it doesn’t make much difference in performance. However, if I had a few more data centers around the world, running Workers close to them would boost performance significantly.

Cost

Workers are suspiciously cheap. Here is a screenshot of how much I pay for Workers while serving millions of requests monthly:

My invoice for Workers

Yes, it is $0 for usage, apart from the $5 subscription.

They have a generous free plan with 100,000 requests per day and a 10-millisecond CPU time limit per invocation; beyond that, it is still not that costly.

I use the Standard plan: a $5 subscription + $0.30 per million requests + $0.02 per million CPU milliseconds. But 10 million requests and 30 million CPU milliseconds are included per month.
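As a quick sanity check on those numbers (the plan details are taken from the text above; verify against Cloudflare's current pricing page before relying on them):

```javascript
// Included monthly quotas and overage rates on the Standard plan,
// as described in the text.
const INCLUDED = { requests: 10_000_000, cpuMs: 30_000_000 };
const RATES = { perMillionRequests: 0.3, perMillionCpuMs: 0.02 };

export function monthlyCostUsd(requests, cpuMs) {
  const extraRequests = Math.max(0, requests - INCLUDED.requests);
  const extraCpuMs = Math.max(0, cpuMs - INCLUDED.cpuMs);
  return (
    5 + // subscription
    (extraRequests / 1_000_000) * RATES.perMillionRequests +
    (extraCpuMs / 1_000_000) * RATES.perMillionCpuMs
  );
}

// ~100K requests/day is ~3M/month, well within the included quota,
// which is why the usage line on the invoice above is $0.
```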

Downsides

There are not a lot of downsides. Workers is an established and popular technology with a huge community of developers. But I want to share a few things from my experience that I wish I had known when I started:

Local development

While Workers have a superb developer experience, and there is even a local environment for them (Miniflare), you will encounter some services (e.g., Cloudflare Browser Rendering) that are not available locally.

But it is not a huge problem most of the time.

Timeout

Keep in mind that the maximum timeout for any proxied HTTP request, including Workers, is 100 seconds. If you need more, you have to upgrade to the Enterprise plan, and Enterprise prices will be way higher than the ones you are used to.

However, if you don’t have any long-running requests, it shouldn’t be a problem. You can always add asynchronous API methods that return a response immediately but process everything with background jobs.

You depend on Cloudflare

I have never encountered any serious issues with Cloudflare for the past 2+ years. And I am a happy customer of them.

But you have probably heard about how Cloudflare took down a company’s website after trying to force the company to pay $120,000 within 24 hours.

You need to be careful with your infrastructure dependencies. One way to get the benefits of Cloudflare while making sure you can move if something happens is to use wrappers around their services, e.g., Hono as a web framework. And register your domain with a different provider.

Summary

Considering all the advantages and disadvantages of using Workers, I will continue to use them and even move more complex logic into them, in order to provide a better API experience for my customers.

I want to provide the best rendering performance possible for screenshots, and Workers help me do that.

However, to mitigate risks, I might consider writing code that is agnostic to the Cloudflare platform, so that if I need to reduce costs or gain more control over my vendors, I can easily move.

I hope sharing my experience of using Workers for the ScreenshotOne API helped you understand whether it is a good fit for your use case, too.

If you want to discuss more or have any additional questions, don’t hesitate to reach out at hey@screenshotone.com.