Web Site Caching without a Reverse Proxy: how to cache web pages using Apache, mod_cache and mod_cache_disk

Over the past few days we’ve published a number of Nginx-related posts that might be useful for those who want to implement a FastCGI-Cache or a Proxy-Cache solution to speed up their website. These techniques will greatly benefit any WordPress-based (or Joomla-based) web site in terms of page load, response time and system resource usage, and are definitely a better solution than caching plugins such as W3 Total Cache, WP Super Cache and so on: although those plugins have the benefit of integrating nicely with the administration interface thanks to their stunning GUI, they still have to pass the request off to a PHP handler. This basically means that there will always be some overhead in the request-handling phase: consequently, the caching process won’t be as lightweight as it would be with a caching mechanism configured at a lower level of the stack.

The proper tool to achieve that is clearly a reverse proxy with caching features sitting in front of our web server: the big players in that field currently are Nginx, Squid and Varnish, each one coming with its own set of pros and cons… Well, in the case of Nginx, the cons are not that many. If you want to follow that route, we strongly suggest reading our recent posts about FastCGI-Cache and Proxy-Cache techniques with Nginx: you’ll see how easy it is to make Nginx sit in front of Apache with its full set of caching and buffering features, or even replace Apache as a web server and do the whole job by itself.

However, in case you don’t want to switch out Apache and don’t want to install and configure a reverse-proxy, you can make good use of some Apache modules to achieve the same result. Although you won’t have all the flexibility granted by Nginx, you’ll still be able to cache your content just like a reverse-proxy does… Without installing anything!

These modules are mod_cache and mod_cache_disk and are shipped with the Apache installation package for all the major Linux distributions, including CentOS 7 – which we’ll be using for this tutorial.

A quick benchmark

The first thing we should do when implementing a caching feature is to set up a benchmark tool that will tell us how fast our requests are actually handled. Here’s a simple home-made script that will hit a chosen webpage 10 consecutive times with a 1 second delay between each request and output an average response time:
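The sketch below is one way such a script could look; it is not the original one. The function names `average` and `bench_url` are made up for this example, and it assumes that `curl` and `awk` are available on the machine running the test.

```shell
# Hypothetical benchmark helpers (names and defaults are assumptions).

# Print the arithmetic mean of the numbers given as arguments (3 decimals).
average() {
    printf '%s\n' "$@" | awk '{ sum += $1 } END { if (NR > 0) printf "%.3f\n", sum / NR }'
}

# Request URL $1 a total of $2 times (default 10), pausing $3 seconds
# (default 1) between requests, then print the average response time.
bench_url() {
    url=$1
    runs=${2:-10}
    delay=${3:-1}
    times=""
    i=1
    while [ "$i" -le "$runs" ]; do
        # -w '%{time_total}' prints the total time in seconds; the body is discarded.
        t=$(curl -s -o /dev/null -w '%{time_total}' "$url")
        times="$times $t"
        if [ "$i" -lt "$runs" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    # $times is left unquoted on purpose so that each sample becomes one argument.
    echo "Average response time over $runs requests: $(average $times)s"
}
```

Once the functions are loaded, a call such as `bench_url http://www.example.com/ 10 1` (with your own URL in place of the example.com placeholder) runs the ten requests and prints the average.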

Alternatively you can install Siege, a great HTTP load testing and benchmarking tool, with the following command:

In case your CentOS installation doesn’t find anything, you can also download and compile siege manually with the following set of commands:

Once installed, you’ll be able to stress-test a web site in multiple ways with highly-configurable shell commands such as the following:
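A sketch of such a command follows. The URL list file name (`urls.txt`), the example.com addresses and the `run_siege` wrapper are assumptions made for illustration; the switches themselves (`-c`, `-d`, `-t`, `-i`, `-f`) are regular siege options.

```shell
# Hypothetical URL list: replace the example.com addresses with your own pages.
URLS_FILE="${URLS_FILE:-./urls.txt}"
cat > "$URLS_FILE" <<'EOF'
http://www.example.com/
http://www.example.com/sample-page/
EOF

# 50 concurrent users (-c), random delay of up to 3 seconds (-d),
# for 1 minute (-t), random URL per request (-i), URLs read from a file (-f).
run_siege() {
    siege -c 50 -d 3 -t 1M -i -f "$URLS_FILE"
}
```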

This will tell siege to simulate 50 concurrent users with a random access delay of 1-3 seconds for 1 minute against the URL addresses listed in a text file: the -i switch instructs siege to pick a random URL for each request, while the -f switch tells it to read them from the specified file.

Regardless of the benchmark technique you choose, it’s really important to run it on our non-cached website first and record the results, so that we’ll be able to compare them with the cached version and see how much we gained.

Configuring mod_cache

Here’s a good mod_cache configuration that will suit a general-purpose WordPress website:
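The original snippet is not reproduced here, so what follows is only a sketch built from the directives discussed in this post. It writes a sample configuration to a file so that it can be reviewed before being merged into Apache. The output file name, the cache path and every numeric value are assumptions, not the settings originally used.

```shell
# Where to write the sample; the file name and every value below are
# assumptions made for illustration, not the article's original settings.
CONF_OUT="${CONF_OUT:-./mod_cache.conf.sample}"
cat > "$CONF_OUT" <<'EOF'
<IfModule mod_cache.c>
    # Run the cache as a normal handler so the CACHE filter below is honored
    CacheQuickHandler off
    CacheIgnoreNoLastMod On
    # Fallback lifetime in seconds (assumed value)
    CacheDefaultExpire 7200
    CacheIgnoreCacheControl On
    CacheLastModifiedFactor 0.5
    CacheIgnoreHeaders Set-Cookie
    CacheHeader on
    CacheLock on
    # WordPress admin and system URLs that should never be cached
    CacheDisable /wp-admin
    CacheDisable /wp-login.php
    CacheDisable /wp-cron.php

    # Cache first, compress afterwards (needs mod_filter and mod_deflate)
    SetOutputFilter CACHE
    AddOutputFilterByType DEFLATE text/html text/plain text/css application/javascript

    <IfModule mod_cache_disk.c>
        # Hypothetical location; must be writable by the web service user
        CacheRoot /var/cache/httpd/mod_cache_disk
        CacheEnable disk /
        CacheDirLevels 2
        CacheDirLength 1
        # Largest cacheable response in bytes (assumed value)
        CacheMaxFileSize 2000000
    </IfModule>
</IfModule>
EOF
echo "Sample written to $CONF_OUT"
```

After merging the content of the generated file into the Apache configuration, running `apachectl configtest` before restarting the service is a cheap way to catch typos.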

This snippet can be placed within the Apache configuration file, either at root level, if we want to cache everything, or inside a virtual host block, if we want to cache a single virtual host only.

Configuration Parameters

Let’s try to understand the meaning of the configuration settings we’ve used above:

mod_cache

  • CacheQuickHandler: The CacheQuickHandler directive controls the phase in which the cache is handled. This parameter is On by default, meaning that the cache will operate within the quick handler phase. This phase short circuits the majority of server processing, and represents the most performant mode of operation for a typical server: the cache bolts onto the front of the server, and the majority of server processing is avoided. When this option is disabled (Off) the cache operates as a normal handler, and is subject to the full set of phases when handling a server request. While this mode is slower than the default, it allows the cache to be used in cases where full processing is required, such as when content is subject to authorization.
  • CacheIgnoreNoLastMod and CacheDefaultExpire: The CacheIgnoreNoLastMod directive provides a way to specify that documents without last-modified dates should be considered for caching, even without a last-modified date. If neither a last-modified date nor an expiry date are provided with the document then the value specified by the CacheDefaultExpire directive will be used to generate an expiration date.
  • CacheIgnoreCacheControl: Tells the server to attempt to serve the resource from the cache even if the request from a client contains a no-cache header value.
  • CacheLastModifiedFactor: In the event that a document does not provide an expiry date but does provide a last-modified date, an expiry date can be calculated based on the time since the document was last modified with the CacheLastModifiedFactor directive.
  • CacheIgnoreHeaders: Specifies additional HTTP headers that should not be stored in the cache: for example, it makes sense in some cases to prevent cookies from being stored in the cache.
  • CacheHeader: When the CacheHeader directive is switched on, an extra header reporting the cache status of the response will be added to it.
  • CacheLock: Enables the thundering herd lock for the given URL space.
  • CacheDisable: The CacheDisable directive instructs mod_cache not to cache URLs at or below url-string. The values we’ve put here are specific to WordPress and will prevent caching of some admin-related and system-related folders and pages.
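With CacheHeader switched on, a quick way to see whether a page is actually served from the cache is to look at the response headers. The helper below is a sketch: the `cache_status` name is made up for this example, and it relies on the header being called `X-Cache`, which is the name used in Apache's mod_cache documentation.

```shell
# Print the cache-status header (X-Cache) returned for URL $1, if any.
# -D - dumps the response headers to stdout, -o /dev/null discards the body.
cache_status() {
    curl -s -o /dev/null -D - "$1" | tr -d '\r' | grep -i '^x-cache:'
}
```

Calling `cache_status http://www.example.com/` twice in a row (example.com being a placeholder) should typically report a MISS for the first request and a HIT for the second one, provided that the page is cacheable.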

mod_cache_disk

  • CacheRoot: This option simply defines where on disk the cache will be stored: the installation default is acceptable if we enable mod_cache for the whole web server: conversely, if we choose to enable it for some Virtual Hosts only, it would be wiser to create a dedicated sub-folder for each of them, in order to have better control over it – for example, if we want to clear out the cache for a specific web site without affecting the others. IMPORTANT: regardless of our choice, we need to ensure that the apache user (or www-data, or www, or whoever the web service user actually is) has read/write permissions on that folder, otherwise the caching feature won’t work.
  • CacheEnable: This option simply enables the cache for any URL at or below the given path: we pointed it at the site root because we wanted to cache the whole website content. To cache a subfolder only, specify that subfolder’s path instead.
  • CacheDirLevels, CacheDirLength and CacheMaxFileSize: This set of options controls how the cache files are laid out on disk and the maximum file size (in bytes) that can be committed to cache. CacheDirLevels specifies how many levels of subdirectory there should be, while CacheDirLength specifies how many characters should be in each directory name. With CacheDirLevels set to 2 and CacheDirLength set to 1, the leading characters of the URL hash become two nested one-character directories in front of the file name. The overall aim of this technique is to reduce the number of subdirectories or files that may be in a particular directory, as most file-systems slow down as this number increases. With a setting of 1 for CacheDirLength there can be at most 64 subdirectories at any particular level. With a setting of 2 there can be 64 * 64 subdirectories, and so on: unless you have a good reason not to, using a setting of 1 for CacheDirLength is recommended. Setting CacheDirLevels depends on how many files you anticipate storing in the cache. With the setting of 2 used in the above example, a grand total of 4096 subdirectories can ultimately be created: with 1 million files cached, this works out at roughly 244 cached URLs per directory. CacheMaxFileSize controls the maximum size (in bytes) of a file that can be stored.
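The directory arithmetic described above can be checked with a few lines of shell. This is only a worked example: the helper names are made up, and it assumes, as stated above, that each character of a directory name can take 64 different values.

```shell
# Number of leaf directories for a given CacheDirLevels ($1) and
# CacheDirLength ($2): 64 possibilities per character, i.e. 64^(levels*length).
cache_leaf_dirs() {
    levels=$1
    length=$2
    n=1
    i=0
    while [ "$i" -lt $((levels * length)) ]; do
        n=$((n * 64))
        i=$((i + 1))
    done
    echo "$n"
}

# Cached files per leaf directory (rounded down) for $1 files in total,
# with CacheDirLevels $2 and CacheDirLength $3.
cache_files_per_dir() {
    echo $(( $1 / $(cache_leaf_dirs "$2" "$3") ))
}
```

For instance, `cache_leaf_dirs 2 1` gives 4096 and `cache_files_per_dir 1000000 2 1` gives 244, which matches the figures mentioned in the list.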

The last two settings, SetOutputFilter and AddOutputFilterByType, are a workaround required to avoid caching multiple copies of the same page – one for each different encoding-related header value received from the clients. In a nutshell, the SetOutputFilter setting will force all output to be passed through the CACHE filter, ensuring that the content will be saved to our cache before further processing: right after that, all text-based content will pass through the DEFLATE filter so that it will be compressed with the encoding format(s) accepted by each client.

However, this workaround comes with a nasty caveat: the headers set by the mod_expires module won’t be honored by mod_cache because the latter will actually kick in first, meaning that all content will be cached for the duration of the CacheDefaultExpire value specified above. If you don’t mind caching a different version of the same page for each different value of that header, you could just as well skip these two parameters and get rid of that issue too.

Post-Implementation Benchmarks

It’s now time to use our self-made script (or launch Siege) again and see how much we gained in terms of average response time: the performance gain with mod_cache is usually much better (between 2x and 4x faster) than what you can get using WordPress caching plugins – and very similar to what we can get with the top two Nginx alternatives (FastCGI-Cache and Proxy-Cache).
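To turn the two recorded averages into a single figure, a tiny helper like the following can be used. It is only a sketch: the `speedup` name is made up, and it assumes `awk` is available.

```shell
# Print how many times faster the cached site is, given the average response
# time before ($1) and after ($2) enabling the cache, both in the same unit.
speedup() {
    awk -v before="$1" -v after="$2" 'BEGIN { if (after > 0) printf "%.1fx\n", before / after }'
}
```

For example, with placeholder values of 2 seconds before and 0.5 seconds after, `speedup 2 0.5` prints `4.0x`.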

That’s about it: we sincerely hope that this tutorial will help our readers to increase the speed, response time and overall performance of their web sites.

 

About Ryan

IT Project Manager, Web Interface Architect and Lead Developer for many high-traffic web sites & services hosted in Italy and Europe. Since 2010 he has also been a lead designer of many apps and games for Android, iOS and Windows Phone mobile devices for a number of Italian companies.
