Web Site Caching without a Reverse Proxy: how to cache web pages using Apache, mod_cache and mod_cache_disk

Table of Contents

A quick benchmark
Configuring mod_cache
Configuration Parameters
- mod_cache
- mod_cache_disk
Post-Implementation Benchmarks

In recent days we've published a number of Nginx-related posts that might be useful for those who want to implement a FastCGI-Cache or a Proxy-Cache solution to speed up their website. These techniques will greatly benefit any WordPress-based (or Joomla-based) web site in terms of page load, response time and system resource usage, and are generally a better solution than caching plugins such as W3 Total Cache, WP Super Cache and so on: although those plugins integrate nicely with the administration interface thanks to their GUI, they still have to pass the request off to a PHP handler. This means that there will always be some overhead in the request-handling phase: consequently, the caching process won't be as lightweight as it would be with a caching mechanism configured at a lower level of the stack.

The proper tool to achieve that is a reverse proxy with caching features sitting in front of our web server: the big players in that field currently are Nginx, Squid and Varnish, each one coming with its own set of pros and cons... Well, in the case of Nginx, the cons are few. If you want to follow that route, we strongly suggest reading our recent posts about FastCGI-Cache and Proxy-Cache techniques with Nginx: you'll see how easy it is to make Nginx sit in front of Apache with its full set of caching and buffering features, or even replace Apache as a web server and do the whole job by itself.

However, in case you don't want to switch out Apache and don't want to install and configure a reverse-proxy, you can make good use of some Apache modules to achieve the same result. Although you won't have all the flexibility granted by Nginx, you'll still be able to cache your content just like a reverse-proxy does... Without installing anything!

These modules are mod_cache and mod_cache_disk and are shipped with the Apache installation package for all the major Linux distributions, including CentOS 7 - which we'll be using for this tutorial.

A quick benchmark

The first thing we should do when implementing a caching feature is to set up a benchmark tool that will tell us how fast our requests are actually handled. Here's a simple home-made script that will hit a chosen webpage 10 consecutive times with a 1 second delay between each request and output an average response time:

i=0; while [ $i -lt 10 ]; do time -p curl -s "http://www.example.com/" > /dev/null; sleep 1; i=$((i+1)); done 2>&1 | grep real | awk '{print $2}' | awk '{avg += ($1 - avg) / NR;} END {print "Average: " avg "s";}'
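
The same measurement can also be written as a small reusable function. The following is only a sketch, not a replacement for the one-liner: it assumes curl is installed and relies on its -w '%{time_total}' output option instead of the shell's time keyword. The function names average and benchmark_url are made up for this example, and the URL is a placeholder.

```shell
# Print the arithmetic mean of the numbers read from stdin (one per line).
average() {
    awk '{ sum += $1 } END { if (NR > 0) printf "%.3f\n", sum / NR }'
}

# Request the URL given as $1 for $2 times (default: 10), pausing one
# second between requests, and print the average total time in seconds.
benchmark_url() {
    bench_url="$1"
    bench_runs="${2:-10}"
    bench_i=0
    while [ "$bench_i" -lt "$bench_runs" ]; do
        # -s silences the progress meter, -o /dev/null discards the body,
        # -w '%{time_total}\n' prints the total transfer time in seconds.
        curl -s -o /dev/null -w '%{time_total}\n' "$bench_url"
        sleep 1
        bench_i=$((bench_i + 1))
    done | average
}

# Usage (placeholder URL):
#   benchmark_url "http://www.example.com/" 10
```

Keeping the averaging step in its own function makes it easy to reuse it with timings collected in other ways.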

Alternatively, you can install Siege, a great HTTP load testing and benchmarking tool, with the following command:

yum install siege

In case your CentOS installation doesn't find anything (Siege is usually distributed through the EPEL repository, which isn't enabled by default), you can also download and compile Siege manually with the following set of commands:

yum install wget nano zlib zlib-devel openssl-devel -y
yum group install "Development Tools" -y
wget http://download.joedog.org/siege/siege-latest.tar.gz
tar -zxvf siege-latest.tar.gz
cd siege-*/
./configure --with-ssl
make && make install
# Generate a default configuration file for the current user
siege.config
# Print the current configuration
siege -C

Once installed, you'll be able to stress-test a web site in multiple ways with highly-configurable shell commands such as the following:

siege -c50 -d3 -t1M -i -f url_list.txt

This tells Siege to simulate 50 concurrent users for 1 minute, each waiting a random interval of up to 3 seconds between requests, against the URLs listed in the url_list.txt file: the -i switch instructs Siege to pick a random URL from the list for each request, while the -f switch tells it which file to read the URLs from.
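
To make the command above concrete, here's a minimal sketch that prepares a URL list and then runs the same test. The three URLs are placeholders to be replaced with pages of your own website, and the helper names write_url_list and run_siege are made up for this example.

```shell
# Write a URL list for Siege to the file given as $1, one URL per line.
# The URLs below are placeholders.
write_url_list() {
    cat > "$1" <<'EOF'
http://www.example.com/
http://www.example.com/about/
http://www.example.com/blog/
EOF
}

# Run the stress test shown above against the URL list given as $1.
run_siege() {
    if ! command -v siege >/dev/null 2>&1; then
        echo "siege not found: install it first" >&2
        return 1
    fi
    siege -c50 -d3 -t1M -i -f "$1"
}

# Usage:
#   write_url_list url_list.txt && run_siege url_list.txt
```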

Regardless of the benchmark technique you choose, it's really important to run it against the non-cached website first and record the results, so that you'll be able to compare them with the cached version and see how much you gained.

Configuring mod_cache

Here's a good mod_cache configuration that will suit a general-purpose WordPress website:

# The following line could be required or not depending on your Apache installation
LoadModule cache_module modules/mod_cache.so

<IfModule mod_cache.c>
    CacheQuickHandler off

    CacheIgnoreNoLastMod On
    CacheDefaultExpire 7200

    CacheIgnoreCacheControl On
    CacheLastModifiedFactor 0.5
    CacheIgnoreHeaders Set-Cookie Cookie
    CacheHeader on
    CacheLock on
    CacheDisable /wp-admin
    CacheDisable /wp-login.php
    CacheDisable /wp-cron.php

    SetOutputFilter CACHE
    AddOutputFilterByType DEFLATE text/html text/plain text/css application/javascript application/rss+xml text/xml image/svg+xml

    # The following line could be required or not depending on your Apache installation
    LoadModule cache_disk_module modules/mod_cache_disk.so

    <IfModule mod_cache_disk.c>
        CacheRoot /var/cache/apache2/mod_cache_disk
        CacheEnable disk /
        CacheDirLevels 2
        CacheDirLength 1
        CacheMaxFileSize 2000000
    </IfModule>
</IfModule>

This snippet can be placed within the Apache configuration file - the httpd.conf file - either at root level, if we want to cache everything, or inside a <VirtualHost> block, if we want to cache a single virtual host only.
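
Before the new configuration can work, the folder set in CacheRoot must exist and be writable by the web service user. The following sketch shows one way to prepare it: the helper name prepare_cache_root is made up for this example, while the apache user and the httpd service name are assumptions that match a default CentOS 7 installation and might differ on yours.

```shell
# Create the cache folder given as $1 and assign it to the user:group
# given as $2, leaving it inaccessible to other users.
prepare_cache_root() {
    mkdir -p "$1" && chown -R "$2" "$1" && chmod 750 "$1"
}

# Usage on CentOS 7 (run as root; "apache" and "httpd" are assumptions):
#   prepare_cache_root /var/cache/apache2/mod_cache_disk apache:apache
#   apachectl configtest && systemctl restart httpd
```

Running apachectl configtest before the restart helps to catch syntax errors in the new block without taking the web site down.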

Configuration Parameters

Let's try to understand the meaning of the configuration settings we've used above:

mod_cache

CacheQuickHandler: The CacheQuickHandler directive controls the phase in which the cache is handled. This parameter is On by default, meaning that the cache will operate within the quick handler phase. This phase short-circuits the majority of server processing, and represents the most performant mode of operation for a typical server: the cache bolts onto the front of the server, and the majority of server processing is avoided. When this option is disabled (Off) the cache operates as a normal handler, and is subject to the full set of phases when handling a server request. While this mode is slower than the default, it allows the cache to be used in cases where full processing is required, such as when content is subject to authorization.
CacheIgnoreNoLastMod and CacheDefaultExpire: The CacheIgnoreNoLastMod directive specifies that documents without a last-modified date should be considered for caching anyway. If neither a last-modified date nor an expiry date is provided with the document, the value specified by the CacheDefaultExpire directive (in seconds) will be used to generate an expiration date.
CacheIgnoreCacheControl: Tells the server to attempt to serve the resource from the cache even if the client request contains a no-cache header value (Cache-Control: no-cache or Pragma: no-cache).
CacheLastModifiedFactor: In the event that a document does not provide an expiry date but does provide a last-modified date, an expiry date can be calculated from the time elapsed since the document was last modified, multiplied by the CacheLastModifiedFactor value: with the 0.5 factor used above, a document last modified 10 hours ago would expire in 5 hours.
CacheIgnoreHeaders: Specifies additional HTTP headers that should not be stored in the cache: for example, it makes sense in some cases to prevent cookies from being stored in the cache.
CacheHeader: When the CacheHeader directive is switched on, an X-Cache header will be added to the response with the cache status of this response.
CacheLock: Enables the thundering herd lock for the given URL space: while a missing or stale entry is being fetched from the backend, concurrent requests for the same URL won't all try to refresh the cache at the same time.
CacheDisable: The CacheDisable directive instructs mod_cache not to cache URLs at or below the given url-string. The values we've put here are specific to WordPress and will prevent caching of some admin-related and system-related folders and pages.
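
The CacheLastModifiedFactor calculation described above can be sketched with some quick arithmetic. The helper name expiry_period is made up for this example; the optional third argument models the upper limit set by the CacheMaxExpire directive, which isn't part of the configuration above and is assumed here to be at its documented default of 86400 seconds.

```shell
# Print the expiry period (in seconds) of a document last modified $1
# seconds ago, using the factor $2 and capping the result at $3 seconds
# (assumption: 86400, the documented CacheMaxExpire default).
expiry_period() {
    awk -v age="$1" -v factor="$2" -v cap="${3:-86400}" 'BEGIN {
        period = age * factor
        if (period > cap) period = cap
        printf "%d\n", period
    }'
}

# A document modified 10 hours (36000 seconds) ago with a 0.5 factor:
expiry_period 36000 0.5    # prints 18000, which is 5 hours
```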

mod_cache_disk

CacheRoot: This option defines where on disk the cache will be stored. The path used above, /var/cache/apache2/mod_cache_disk , is the default on Debian-based installations (on CentOS you may prefer a folder under /var/cache/httpd , which is typically where the httpd package keeps its cache) and is acceptable if we enable mod_cache for the whole web server: conversely, if we choose to enable it for some Virtual Hosts only, it would be wiser to create a dedicated sub-folder, such as /var/cache/apache2/mod_cache_disk/example.com , in order to have better control over it - for example, if we want to clear out the cache for a specific web site without affecting the others. IMPORTANT: regardless of our choice, we need to ensure that read/write permissions for that folder are granted to the apache user (or www-data, or www, or whoever the web service user actually is), otherwise the caching feature won't work.
CacheEnable: This option enables the cache for any URL at or below the given path: we put / there because we wanted to cache the whole website content. To cache a subfolder only, type /subfolder-name instead.
CacheDirLevels, CacheDirLength and CacheMaxFileSize: These options control how the cache is laid out on disk and the maximum size (in bytes) of a file that can be committed to the cache. CacheDirLevels specifies how many levels of subdirectory there should be, while CacheDirLength specifies how many characters should be in each directory name. With the example settings given above, the hash of each cached URL would be turned into a filename prefix such as /var/cache/apache2/mod_cache_disk/x/y/<unique_identifier> . The overall aim of this technique is to reduce the number of subdirectories or files that may be in a particular directory, as most file systems slow down as this number increases. With a setting of 1 for CacheDirLength there can be at most 64 subdirectories at any particular level. With a setting of 2 there can be 64 * 64 subdirectories, and so on: unless you have a good reason not to, using a setting of 1 for CacheDirLength is recommended. Setting CacheDirLevels depends on how many files you anticipate storing in the cache. With the setting of 2 used in the above example, a grand total of 4096 subdirectories can ultimately be created: with 1 million files cached, this works out at roughly 244 cached URLs per directory. CacheMaxFileSize controls the maximum size of a file that can be stored (in bytes).
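
The directory arithmetic described above can be double-checked with a few lines of shell. This is just a worked example of the figures given in the text: the helper names cache_leaf_dirs and files_per_dir are made up, and the 64 comes from the number of characters that can appear in each position of a directory name.

```shell
# Print the maximum number of lowest-level cache directories for the
# given CacheDirLevels ($1) and CacheDirLength ($2): 64^(levels * length).
cache_leaf_dirs() {
    awk -v levels="$1" -v len="$2" 'BEGIN { printf "%d\n", 64 ^ (levels * len) }'
}

# Print how many cached files ($1) end up, on average, in each directory
# for the given CacheDirLevels ($2) and CacheDirLength ($3), rounded down.
files_per_dir() {
    echo $(( $1 / $(cache_leaf_dirs "$2" "$3") ))
}

cache_leaf_dirs 2 1          # prints 4096
files_per_dir 1000000 2 1    # prints 244
```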

The two remaining settings, SetOutputFilter and AddOutputFilterByType, are a workaround required to avoid caching multiple copies of the same page - one for each different Accept-Encoding header value received from the clients. In a nutshell, the SetOutputFilter setting will force all output to be passed through the CACHE filter, ensuring that the content will be saved to our cache before further processing: right after that, all text-based content will pass through the DEFLATE filter so that it will be compressed with the encoding format(s) accepted by each client.

However, this workaround comes with a nasty caveat: the headers set by the mod_expires module won't be honored by mod_cache because the latter will actually kick in first, meaning that all content will be cached for the duration of the CacheDefaultExpire value specified above. If you don't mind caching a different version of the same page for each different Accept-Encoding header value, you can skip these two parameters and get rid of that issue as well.

Post-Implementation Benchmarks

It's now time to use our self-made script (or launch Siege) again and see how much we gained in terms of average response time: the performance gain with mod_cache is usually much better (between 2x and 4x faster) than what you can get using WordPress caching plugins - and very similar to what we can get with the top two Nginx alternatives (FastCGI-Cache and Proxy-Cache).
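
Before comparing the timings, it can be useful to verify that the pages are actually being served from the cache. Since we switched CacheHeader on, each response should carry an X-Cache header (for example, X-Cache: HIT from localhost): the following sketch extracts its status from a set of response headers. The helper name cache_status is made up for this example and the URL is a placeholder.

```shell
# Read HTTP response headers from stdin and print the cache status found
# in the X-Cache header (such as HIT or MISS), or NONE if the header is
# missing.
cache_status() {
    awk 'tolower($1) == "x-cache:" { print $2; found = 1; exit }
         END { if (!found) print "NONE" }'
}

# Usage (placeholder URL): -D - dumps the response headers to stdout,
# while -o /dev/null discards the page body.
#   curl -s -o /dev/null -D - "http://www.example.com/" | cache_status
```

Requesting the same page twice in a row should typically report MISS the first time and HIT the second time.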

That's about it: we sincerely hope that this tutorial will help our readers to increase the speed, response time and overall performance of their web sites.

2 Comments on “Web Site Caching without a Reverse Proxy: how to cache web pages using Apache, mod_cache and mod_cache_disk”

Thomas says:

July 22, 2019 at 11:25

Hi Ryan,

Great article, just remember that those caching plugins have advanced modes where you can enable static html files to be written to disk which then can be used with .htaccess to gain full performance without any php scripts.

Waqass Khalid says:

October 24, 2019 at 16:58

Hi Thomas…you mean to say this will only tweak WP? we should go for other plugins like varnish or nginx reverse proxy over apache in order to cover all the other aspects of cache?
