Cache

Caching is a multilayer technique. Each type of content requires its own caching approach.

Static content

Static files, images, and documents could be cached on the server or CDN level by filename or meta-information. Cache invalidation could be done via conditional requests or filename param modification.

Dynamic content

Dynamic content is much harder to cache as it requires clear invalidation criteria. Even harder is personalized content (that is unique per user) - be careful with it, sometimes caching and invalidating this cache could be even more resource consumption, than having no cache at all!

The most obvious solution for a personalized content is to cache on application level: whole pages, fragments or long queries. It is up to you to distinguish and measure what part of long page generation process is the slowest.

Nginx page microcache

The whole page could be cached on the web server level. Nginx server has a “microcache” feature that could be applied to the whole site or selected locations and cache even dynamic content for a short period of time (about 1 minute). For example, even if you have a highly dynamic page with ranked posts, that changes each 10 seconds (according to votes and number of comments), it seems that you need to generate it on each request to show actual data. But if you have a highload site with 1000 simultaneous visitors (that refresh a page each minute) you can do only 1 page generation instead of 1000+. Of course, it depends on the sensitivity of the data (you won’t cache stock data) and you need to adjust appropriate cache lifetime while your users would not notice that data is “dramatically outdated”.

Application level key-value cache

A more correct way to do caching, without tricks with total microcaching, is to have a strict cache key and to regenerate content when this key changes. Very often in-memory key-value storages like Redis and Memcached are used to store cached data (due to high speed of data access).

But even a file-cache could be productive if you cache operation that takes tens of seconds (the only thing to remember: file-system operation could be a performance bottleneck as disk read speed is very limited, so it should not be very often).

Another thing to remember - saving Ruby data structures like Hash or Array to the external cache (file or memory based) requires to serialize it into text representation (and vice versa on reading). This operation takes CPU resources and time. Usually, it is a JSON or YAML representation and not all data structures could be represented into the string automatically. Also Ruby provides a special Marshal class to serialize Objects.

Let’s imagine the same page with the ranged list of posts, you want to cache just a plain data of ranged list and automatically rebuild cache when needed. You can make a cache key that includes data like last post date (when a new post is added, you cache key is changed and the page is regenerated). As you see here, to determine cache key you still need to perform one DB query (to get latest post date). It is a reasonable tradeoff of you spare 10-20 or more queries with this caching.

The main rule here is - cache key computation should be as simple as possible. Don’t make too many dependencies in the cache key.