Published: April 03, 2025 by PerfGrid
Pulse: Week 14, 2025
Welcome to the fifth edition of our Pulse series, where we share insights into our infrastructure changes, service improvements, and other behind-the-scenes activities at PerfGrid.
Varnish 7.7 Rollout in Progress
With the release of Varnish Cache 7.7 on March 17th, we began preparing our builds to upgrade our Photon Optimizer setup, including compiling a few custom packages against Varnish 7.7.
The release doesn't introduce many new features that benefit us; instead, it primarily fixes bugs, one of which affects the logging we use for statistics.
Prior to Varnish 7.7, you could still access response headers in varnishncsa even if they had been unset in vcl_deliver.
Our log format is slightly customized to include details like the percentage of savings when going through our Optimizer, as well as information such as the upstream domain ID, the "path" taken within our OpenResty setup, and metrics like TTFB.
For example, the Domain ID could previously be accessed in varnishncsa using something like %{x-dom-id}o to read the Domain ID response header. However, Varnish 7.7 now correctly applies set and unset changes made in VCL to varnishncsa as well, meaning that if a header is unset, it is no longer available within varnishncsa. To adapt, we modified our configuration to reference these headers via the VSL context, e.g., %{VSL:RespHeader:x-dom-id}x.
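To make the difference concrete, here is a deliberately simplified Python model of one client transaction's log records. It is not how varnishncsa is implemented: the record list, the Domain ID value 4711, the helper functions, and the "first matching RespHeader record wins" rule for the VSL lookup are all assumptions for illustration. The point it shows is that an unset in VCL is logged as a separate RespUnset record, so replaying set/unset hides the header, while a lookup of the raw RespHeader record still finds it.

```python
# Simplified, hypothetical model of one client transaction's VSL records.
# This is NOT varnishncsa's implementation; it only illustrates why a header
# unset in vcl_deliver disappears from %{...}o but stays visible via VSL.
RECORDS = [
    ("RespHeader", "Content-Type: image/webp"),
    ("RespHeader", "x-dom-id: 4711"),   # hypothetical Domain ID value
    ("RespUnset", "x-dom-id: 4711"),    # logged when VCL unsets the header
]


def _matches(content, name):
    # Header names are case-insensitive; require "name:" so "x-dom-id-2" won't match.
    return content.lower().startswith(name.lower() + ":")


def _value(content):
    return content.split(":", 1)[1].strip()


def effective_resp_header(records, name):
    """Replay set/unset in order: roughly what %{name}o reflects since 7.7."""
    value = None
    for tag, content in records:
        if not _matches(content, name):
            continue
        if tag == "RespHeader":
            value = _value(content)
        elif tag == "RespUnset":
            value = None
    return value


def vsl_resp_header(records, name):
    """Assumed rule for %{VSL:RespHeader:name}x: first matching RespHeader record."""
    for tag, content in records:
        if tag == "RespHeader" and _matches(content, name):
            return _value(content)
    return None


if __name__ == "__main__":
    print(effective_resp_header(RECORDS, "x-dom-id"))  # None (varnishncsa would log "-")
    print(vsl_resp_header(RECORDS, "x-dom-id"))        # 4711
```

The model keeps only one value per header for brevity; real responses can carry repeated headers.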
We caught this during testing and were able to fix it before rolling it out to the first few locations.
Photon Optimizer Improvements
In certain cases, when converting images on the fly to WebP or AVIF, the result can be larger than the original format, such as JPEG or PNG. This leads to negative savings, defeating the purpose of the conversion. This issue typically occurs with very small images, usually under 2 kilobytes.
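As a quick illustration of what "negative savings" means, the sketch below assumes savings are expressed as a percentage of the original file size; the post doesn't spell out the exact formula used in the log format, and the function names and byte sizes are hypothetical.

```python
def savings_percent(original_bytes: int, optimized_bytes: int) -> float:
    """Share of bytes saved by the conversion, in percent.

    Negative when the converted image (e.g. WebP/AVIF) is larger than the original.
    """
    if original_bytes <= 0:
        raise ValueError("original_bytes must be positive")
    return (original_bytes - optimized_bytes) / original_bytes * 100.0


def is_negative_saving(original_bytes: int, optimized_bytes: int) -> bool:
    return savings_percent(original_bytes, optimized_bytes) < 0


if __name__ == "__main__":
    # Hypothetical sizes: a 1,800-byte PNG whose WebP version is 2,000 bytes.
    print(f"{savings_percent(1800, 2000):.1f}%")  # -11.1%
    print(is_negative_saving(1800, 2000))          # True
```

For small images, fixed container and header overhead in the new format can easily outweigh any gain in compression, which is why the sign flips at these sizes.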
We naturally wanted a clean solution to this. Our initial idea was to intercept the response, check the savings, and, if they were negative, internally redirect the request within OpenResty to return the original image.
While this isn't difficult (OpenResty's ngx.exec call performs exactly this kind of internal redirect), it still means we're spending time optimizing an image for no benefit, only to then revert to the original.
So, we needed another approach: perform the initial optimization, detect negative savings, and then remember the result for future requests. There are many ways to do this, such as maintaining a list of URLs with negative savings and bypassing optimization based on this list. This would work well in our case since the number of affected images is very small (hundreds), but as we scale, this could become a bottleneck and would involve more Valkey lookups, which we'd like to avoid.
Varnish to the Rescue
One great thing about Varnish is that it can read the HTTP headers of a cached object and use them when revalidating that object against the Photon Optimizer origin. When we detect negative savings, we now do two things:
- Set a low TTL on the object but increase its grace TTL, so Varnish will revalidate the image much sooner.
- Add a special request header during revalidation to instruct OpenResty to serve the original image.
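The Varnish side of this can be modeled as two small decisions: how long to cache a fresh response, and which extra header to send when revalidating a stale one. The Python sketch below stands in for VCL and is not PerfGrid's configuration; the TTL and grace values, the x-savings, x-serve-original and x-original-served header names, and the helper functions are all hypothetical.

```python
from dataclasses import dataclass

# All names and numbers below are hypothetical; the post only says
# "a low TTL" and "an increased grace" for negative-savings objects.
NORMAL_TTL, NORMAL_GRACE = 86_400, 3_600        # seconds
NEGATIVE_TTL, NEGATIVE_GRACE = 30, 86_400       # revalidate within a minute
SERVE_ORIGINAL = "x-serve-original"             # request header sent to the origin
ORIGINAL_MARKER = "x-original-served"           # response header set by the origin


@dataclass
class CachedObject:
    headers: dict
    ttl: int
    grace: int


def _negative(headers: dict) -> bool:
    return float(headers.get("x-savings", "0")) < 0


def cache_backend_response(headers: dict) -> CachedObject:
    """Analogous to vcl_backend_response: short TTL, long grace on negative savings."""
    if _negative(headers):
        return CachedObject(headers, NEGATIVE_TTL, NEGATIVE_GRACE)
    return CachedObject(headers, NORMAL_TTL, NORMAL_GRACE)


def revalidation_headers(stale: CachedObject) -> dict:
    """Extra request headers for revalidating a stale object, from its own metadata."""
    if _negative(stale.headers) or stale.headers.get(ORIGINAL_MARKER) == "1":
        return {SERVE_ORIGINAL: "1"}
    return {}
```

The ORIGINAL_MARKER check is an added assumption so that, once the original has been served, later revalidations keep asking for it instead of re-optimizing; the post doesn't describe how that detail is handled. No state lives outside the cached object itself.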
Then, within OpenResty, we simply check for this special request header and handle the logic accordingly.
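On the origin side, the check amounts to a single branch before any conversion work. The sketch below uses Python in place of the Lua that would run in OpenResty; the handler name, the convert callback, and the x-savings, x-serve-original and x-original-served header names are hypothetical.

```python
from typing import Callable, Dict, Tuple

SERVE_ORIGINAL = "x-serve-original"    # hypothetical request header from Varnish
ORIGINAL_MARKER = "x-original-served"  # hypothetical response marker


def handle_image(
    request_headers: Dict[str, str],
    original: bytes,
    convert: Callable[[bytes], bytes],
) -> Tuple[bytes, Dict[str, str]]:
    """Return (body, response headers) for one image request.

    Assumes header names are already lower-case and `original` is non-empty.
    """
    if request_headers.get(SERVE_ORIGINAL) == "1":
        # Varnish asked for the original: skip conversion entirely.
        return original, {"x-savings": "0.0", ORIGINAL_MARKER: "1"}
    optimized = convert(original)
    savings = (len(original) - len(optimized)) / len(original) * 100
    # Report savings so the cache can react to negative values.
    return optimized, {"x-savings": f"{savings:.1f}"}
```

Because the branch runs before convert(), an image flagged this way costs no optimization work on later revalidations.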
As a result, we may serve an image with negative savings briefly (less than a minute), but from then on, we'll serve the original image, which is the optimal choice in terms of size.
This approach allows us to build the logic without maintaining any state or list of affected URLs. We simply leverage the available object metadata in Varnish.
nlsh02 Migrations Completed
We've completed the migration from nlsh02 to nlsh05. Most websites experienced only 10-15 seconds of downtime, while the site with the longest downtime was offline for 7 minutes due to a large database that needed careful handling to ensure consistency during the transfer.
Overall, we've observed roughly a 15% improvement in Time to First Byte (TTFB) for the migrated sites, with some seeing slightly larger gains and others slightly smaller ones.
As part of the migration, we also identified a few improvements for our internal tooling for Grid Hosting plan migrations. We will continue implementing these changes over the next few weeks, aiming to have most of them ready for our next scheduled migration.