Varnish and Pressflow (Drupal) - Improve HIT rate and SEO - 301 redirects using Varnish

Submitted by Janak on Mon, 10/04/2010 - 18:52

Following on from the previous post about VCL tweaks to improve hit rate: there are occasions when a website should not be served from both http://www.foobar.com and http://foobar.com. In some instances Google may treat the two as duplicate copies of the same content, and the site can suffer a duplicate-content penalty.

In such cases it is often best to issue a permanent (301) redirect that automatically adds or removes the www prefix on the domain name. For example, if a request comes in for http://foo.com, you can have Apache redirect it to http://www.foo.com with something like:

Apache redirect example

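<IfModule mod_rewrite.c>
   Options +FollowSymLinks
   RewriteEngine On
   RewriteBase /
   # Match requests for the bare domain (case-insensitive)
   RewriteCond %{HTTP_HOST} ^foo\.com$ [NC]
   # Permanently redirect to the www host, keeping the requested path
   RewriteRule ^(.*)$ http://www.foo.com/$1 [R=301,L]
</IfModule>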

This works great, BUT every redirect is generated by Apache, which wastes precious Apache resources. Wouldn't it be great if Varnish could issue the redirect itself, so that the follow-up request for the canonical URL is looked up in cache without waking Apache? Here is how:

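The concept is simple:
- Varnish inspects the incoming request
- match the Host header against the domain to redirect
- throw an error (error 301) in vcl_recv
- catch the error in vcl_error
- set the Location header and deliver the 301 redirect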

Varnish 301 Redirect VCL example (Varnish 2.x syntax)


sub vcl_recv {

    // Bare domain: redirect to the www host (add www)
    if (req.http.host == "foo.com") {
        error 301;
    }

    // www host: redirect to the bare domain (remove www)
    if (req.http.host == "www.barbaz.com") {
        error 301;
    }
}

sub vcl_error {

    // foo.com: send a 301 to the www host, preserving path and query string
    if (obj.status == 301 && req.http.host == "foo.com") {
        // Varnish 2.x concatenates strings by juxtaposition
        set obj.http.Location = "http://www.foo.com" req.url;
        set obj.status = 301;
        return(deliver);
    }

    // www.barbaz.com: send a 301 to the bare domain
    if (obj.status == 301 && req.http.host == "www.barbaz.com") {
        set obj.http.Location = "http://barbaz.com" req.url;
        set obj.status = 301;
        return(deliver);
    }
}
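The error / vcl_error mechanism above is Varnish 2.x syntax. Both were removed in Varnish 4.0, where the equivalent is return (synth(...)) handled in vcl_synth.

Below is a minimal sketch of the same two redirects for Varnish 4.0 or later. It rests on a few assumptions:
- Apache listens on 127.0.0.1:8080. The backend block is a placeholder.
- The private status codes 750 and 751 are arbitrary. They let vcl_synth tell the two cases apart without re-checking the host.

```vcl
vcl 4.0;

# Placeholder backend (assumption): Apache on localhost port 8080
backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

sub vcl_recv {
    # Bare domain: hand off to vcl_synth with private code 750 (add www)
    if (req.http.host == "foo.com") {
        return (synth(750, "Add www"));
    }
    # www host: hand off to vcl_synth with private code 751 (remove www)
    if (req.http.host == "www.barbaz.com") {
        return (synth(751, "Remove www"));
    }
}

sub vcl_synth {
    if (resp.status == 750) {
        # Varnish 4+ uses + for string concatenation
        set resp.http.Location = "http://www.foo.com" + req.url;
        set resp.status = 301;
        set resp.reason = "Moved Permanently";
        return (deliver);
    }
    if (resp.status == 751) {
        set resp.http.Location = "http://barbaz.com" + req.url;
        set resp.status = 301;
        set resp.reason = "Moved Permanently";
        return (deliver);
    }
}
```

As in the 2.x version, the redirect never reaches Apache. Only the follow-up request for the canonical host goes through normal cache lookup.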