some practical caching notes

Cache-Control header

  • public
  • private

  • max-age

  • must-revalidate — "If the response includes the 'must-revalidate' cache-control directive, the cache MAY use that response in replying to a subsequent request. But if the response is stale, all caches MUST first revalidate it with the origin server, using the request-headers from the new request to allow the origin server to authenticate the new request" HTTP 1.1 spec
  • no-cache — when no-cache, "...a cache MUST NOT use the response to satisfy a subsequent request without successful revalidation with the origin server. This allows an origin server to prevent caching even by caches that have been configured to return stale responses to client requests" HTTP 1.1 spec

    "...in practice, IE and Firefox have started treating the no-cache directive as if it instructs the browser not to even cache the page. We started observing this behavior about a year ago. We suspect that this change was prompted by the widespread (and incorrect) use of this directive to prevent caching." Cache Control Directives Demystified

    In our experience, McAfee Web Gateway did not revalidate responses sent with "public, max-age=31557600, no-cache", so corporate clients behind it did not see changes. The McAfee cache appeared to honor max-age and ignore no-cache. We changed the header to "public, max-age=0, must-revalidate": Firefox and IE still cache the response, and intermediary caches see that it is stale and that they must revalidate.
  • no-store


The default Rails cache-control header is "max-age=0, private, must-revalidate" presumably to consistently achieve the intended effect of no-cache.
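How the directives above interact can be sketched in plain Ruby. This is a simplified model of a well-behaved cache, not how any particular browser or proxy is implemented; the method names parse_cache_control and cache_decision are made up for illustration, and the model ignores public/private and serving stale content.

```ruby
# Parse a Cache-Control header value into a hash of directives.
# Directives without a value (e.g. "no-cache") map to true.
def parse_cache_control(value)
  value.split(",").each_with_object({}) do |part, directives|
    name, arg = part.strip.split("=", 2)
    next if name.nil? || name.empty?
    directives[name.downcase] = arg ? arg.delete('"') : true
  end
end

# Decide what a cache may do with a stored response of the given age (seconds).
# Returns :do_not_store, :revalidate, or :serve_from_cache.
def cache_decision(header_value, age_in_seconds)
  directives = parse_cache_control(header_value)
  return :do_not_store if directives["no-store"]
  return :revalidate if directives["no-cache"]

  # A missing or valueless max-age is treated as 0 in this sketch.
  max_age = directives["max-age"].to_s.to_i
  age_in_seconds < max_age ? :serve_from_cache : :revalidate
end

cache_decision("public, max-age=31557600, no-cache", 10)  # => :revalidate
cache_decision("public, max-age=0, must-revalidate", 0)   # => :revalidate
cache_decision("public, max-age=60", 10)                  # => :serve_from_cache
```

Under this model both the original header and its replacement lead to revalidation; the replacement simply does not rely on the cache honoring no-cache.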

demo

Last-Modified & ETag

When the browser determines that it needs to revalidate a cached copy, the browser can send:

  • If-Modified-Since (when it received Last-Modified)
  • If-None-Match (when it received an ETag)
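The server side of that exchange can be sketched as follows. This is a minimal illustration, assuming an MD5 hash of the body as the ETag and exact (strong) comparison only; conditional_response and modified_since? are hypothetical helpers, not part of Rails or Rack.

```ruby
require "time"
require "digest"

# True when the resource changed after the date the browser sent.
def modified_since?(last_modified, header_value)
  last_modified.to_i > Time.httpdate(header_value).to_i
rescue ArgumentError
  true # unparseable date: treat as modified and send the full response
end

# Decide between 200 (full body) and 304 Not Modified for a request.
# request_headers may contain "If-None-Match" and/or "If-Modified-Since".
def conditional_response(request_headers, body, last_modified)
  etag = %("#{Digest::MD5.hexdigest(body)}")
  validators = { "ETag" => etag, "Last-Modified" => last_modified.httpdate }

  not_modified =
    if (candidates = request_headers["If-None-Match"])
      # If-None-Match takes precedence when both headers are present.
      candidates.strip == "*" || candidates.split(",").map(&:strip).include?(etag)
    elsif (since = request_headers["If-Modified-Since"])
      !modified_since?(last_modified, since)
    else
      false
    end

  not_modified ? [304, validators, ""] : [200, validators, body]
end
```

The first request has no validators and gets a 200 with ETag and Last-Modified; replaying either value in the matching request header yields a 304 with an empty body until the content changes.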

demo

checking cache settings in Chrome

  • Press return in address bar = refresh = revalidate using previous header values; response may be new content or 304 Not Modified

  • Refresh + shift = send a request without the If-Modified-Since or If-None-Match, and including no-cache headers, to force the server to send content again

  • Only navigation via clicks seems to use local cache without revalidating

reverse proxy caches (web accelerators)

A reverse proxy cache sits between the internet and the origin server. It receives incoming requests and only sends to the origin server those requests that it cannot fulfill (based on the headers defined when an object was returned by the origin server).

As an added benefit, the reverse proxy cache can terminate SSL to reduce load on the origin server.
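A toy version of that request flow, in plain Ruby. It is only a sketch: ToyReverseProxyCache is a made-up class, the origin is any callable returning a body and a max-age in seconds, and a real reverse proxy would also revalidate, vary on request headers, and evict entries.

```ruby
# Answers from memory while a stored response is fresh, and only calls the
# origin when it cannot fulfill the request itself.
class ToyReverseProxyCache
  Entry = Struct.new(:body, :stored_at, :max_age)

  # origin: responds to call(path) and returns [body, max_age_in_seconds]
  # clock:  callable returning the current time in seconds (injectable for tests)
  def initialize(origin, clock: -> { Time.now.to_f })
    @origin = origin
    @clock = clock
    @entries = {}
  end

  def get(path)
    entry = @entries[path]
    return entry.body if entry && fresh?(entry)

    # Miss or expired: go to the origin and remember what it said.
    body, max_age = @origin.call(path)
    @entries[path] = Entry.new(body, @clock.call, max_age)
    body
  end

  private

  def fresh?(entry)
    @clock.call - entry.stored_at < entry.max_age
  end
end
```

With an origin that returns max-age 60, repeated requests for the same path within a minute reach the origin once.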

CloudFront

Amazon's globally distributed reverse proxy cache

Of note:
  • CloudFront prioritizes max-age
  • when backed by S3, there is no Content-Encoding negotiation: S3 serves each object as it was stored, regardless of what the client accepts
  • invalidating a resource cached in CloudFront takes on the order of 10 minutes
  • our experience with videos from S3

Varnish

An open source web accelerator

Of note:
  • configurable via VCL (Varnish Configuration Language) example
  • run your own Varnish instances (there is also Fastly, a globally distributed Varnish as a service, including VCL support)
  • instantly purge items from Varnish
  • health-check the back-end and protect from traffic when down
  • grace period: when an object has expired but is requested again within the grace period, Varnish can fulfill the request immediately with the expired version rather than making the requester wait; it still fetches the fresh version, just not while the requester waits. Make sure max-age > 0, or this grace period can be confusing
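The grace behavior is easier to see in a small model. This is a plain-Ruby sketch, not Varnish's implementation: GraceCache, its ttl/grace parameters, and refresh_pending (a stand-in for Varnish's background fetch) are all assumptions for illustration.

```ruby
class GraceCache
  Entry = Struct.new(:body, :expires_at)

  # origin: responds to call(path) and returns a body
  # ttl, grace: seconds; clock is injectable so the example is deterministic
  def initialize(origin, ttl:, grace:, clock: -> { Time.now.to_f })
    @origin, @ttl, @grace, @clock = origin, ttl, grace, clock
    @entries = {}
    @pending = []
  end

  # Returns [body, state]; state is :fresh, :stale_within_grace, or :fetched.
  def get(path)
    entry = @entries[path]
    now = @clock.call
    if entry && now < entry.expires_at
      [entry.body, :fresh]
    elsif entry && now < entry.expires_at + @grace
      # Expired but within grace: answer now, refresh later.
      @pending << path unless @pending.include?(path)
      [entry.body, :stale_within_grace]
    else
      [fetch(path), :fetched]
    end
  end

  # Stand-in for the background fetch of everything served stale.
  def refresh_pending
    @pending.each { |path| fetch(path) }
    @pending.clear
  end

  private

  def fetch(path)
    body = @origin.call(path)
    @entries[path] = Entry.new(body, @clock.call + @ttl)
    body
  end
end
```

In this model, with ttl: 0 every request inside the grace window is answered with the previous version, which is one way the max-age = 0 case can become confusing.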

caching Paperclip files in CloudFront

Since CloudFront supports S3 buckets as distributions, this is trivial:

class Image < ActiveRecord::Base
  attr_accessible :source

  has_attached_file :source,
                    :styles => { :medium => "300x300>", :thumb => "100x100>" },
                    :storage => :s3,
                    :s3_credentials => "#{Rails.root}/config/s3.yml",
                    :path => "images/:id/:"

  def source_url(style = nil)
    "//d3l5bx7ow11yzt.cloudfront.net/#{source.url(style)}"
  end
end


This is only suitable for assets that never change (or in some other, specific use cases) because invalidating CloudFront objects takes on the order of 10 minutes.

Note our experience with videos

caching the asset pipeline in CloudFront

The Rails asset pipeline is a perfect candidate for CloudFront because the precompiled assets have file names fingerprinted based on their contents. For this to work, you must adhere strictly to the asset pipeline when including assets—that is, always use asset_path & co. for images, etc. The below approach involves precompiling assets locally (well, at least not on the production server) and moving them to S3:

  • assets:move_to_s3 rake task

  • precompile_and_deploy bash script (since we are always using the fingerprinted file names, we can cut the precompilation time in half by doing rake assets:precompile:primary, which only computes the hashed version of the files)

  • since S3 does not negotiate the Content-Encoding header, you can either serve everything gzipped or you can back CloudFront with a server that handles content negotiation, in which case CloudFront will properly deliver subsequent requests based on the Content-Encoding header; the example VCL file shows how to do this using Varnish

  • in order for asset pipeline helpers to generate URLs pointing at CloudFront, add the following lines to config/environments/production.rb:

    # Enable serving of images, stylesheets, and JavaScripts from an asset server
    # make asset_path generate // format URLs in web pages so they take on the protocol of the page
    config.action_controller.asset_host = "//d3l5bx7ow11yzt.cloudfront.net"
    # action_mailer line needs a protocol because email clients don't have a protocol to inherit
    config.action_mailer.asset_host = "https://d3l5bx7ow11yzt.cloudfront.net"
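Why fingerprinted names make the pipeline safe to cache for a long time can be shown with a few lines of plain Ruby. This is an illustration only: fingerprinted_name and asset_url are hypothetical helpers, and using a bare MD5 of the contents is an assumption, not the exact digest the asset pipeline computes.

```ruby
require "digest"

# Insert a content hash between the base name and the extension.
def fingerprinted_name(logical_name, contents)
  digest = Digest::MD5.hexdigest(contents)
  ext = File.extname(logical_name)
  base = logical_name.chomp(ext)
  "#{base}-#{digest}#{ext}"
end

# Build the URL a helper would emit when an asset host is configured.
def asset_url(asset_host, logical_name, contents)
  "#{asset_host}/assets/#{fingerprinted_name(logical_name, contents)}"
end

old_css = fingerprinted_name("application.css", "body { color: red }")
new_css = fingerprinted_name("application.css", "body { color: blue }")
old_css == new_css  # => false
```

Changed contents produce a new file name and therefore a new URL, so nothing cached in CloudFront ever needs to be invalidated.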


WARNING: you must ensure that your VCL deletes the Set-Cookie header from asset responses. Otherwise a cached asset can hand one user's session cookie to other visitors, who can suddenly be signed in as that user simply by visiting your site
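The fix described here belongs in VCL. Purely to illustrate the idea in Ruby, the same stripping can be sketched as a Rack-style middleware; this is a different layer from the one recommended above, and StripAssetCookies and its "/assets/" prefix are assumptions, not code from our setup.

```ruby
# Rack-style middleware (plain Ruby, no gems needed to run the sketch) that
# removes Set-Cookie from responses for asset paths so that a shared cache
# never stores a user's session cookie alongside an asset.
class StripAssetCookies
  def initialize(app, prefix: "/assets/")
    @app = app
    @prefix = prefix
  end

  def call(env)
    status, headers, body = @app.call(env)
    if env["PATH_INFO"].to_s.start_with?(@prefix)
      # Header names are case-insensitive, so compare without regard to case.
      headers = headers.reject { |name, _| name.to_s.casecmp("set-cookie").zero? }
    end
    [status, headers, body]
  end
end
```

Responses for other paths keep their Set-Cookie header, so sessions keep working for regular pages.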

how we are using this at Populr

  • precompiled assets served via a Varnish-backed (for Content-Encoding negotiation) CloudFront distribution

  • an image-transformer-backed CloudFront distribution

  • published pops, which need to be cached in Varnish and the browser, but re-validated by the browser every time they are opened