Stephen Newey


Cheap, resilient hosting

10 February 2015

One of my clients is a digital agency, producing Django CMS powered websites for a range of clients, many in the legal industry. These clients care about the uptime of their sites but don’t have big budgets.

My client wants to offer redundancy both in physical location and in service provider. They’re especially keen that hardware failure doesn’t result in all their sites going down at the same time.

The budget available doesn’t extend to the costs involved in purchasing duplicate sets of hardware, hosting it at two locations and provisioning BGP routable IP addresses to allow automatic failover.

A more affordable alternative

Typically these aren’t highly trafficked sites and need no more than 1–2GB of RAM and a couple of CPU cores to offer acceptable performance. Our policy is that each site uses TLS encryption, both for the privacy benefits and for improved Google ranking. For now this means that, to support as broad a user base as possible (some older browsers still lack SNI support), we need to dedicate an IPv4 address to each site.

Given these requirements, hosting with Virtual Private Server providers like DigitalOcean¹ and Linode is ideal. Each site has its own dedicated pair of VPSes, one with each provider, limiting the impact of any single hardware or network failure. We balance things so that half of the sites are live on ‘primary’ VPSes with DigitalOcean and the other half with Linode.

I’ve built a deployment system using Ansible and Fabric that sets up each new host pair for database replication and near-realtime synchronisation of uploaded media. The system takes a Git URL and a few settings and deploys the site on both hosts. A single command deploys future changes to both. I’ll go into more detail about the deployment process in a future post.

An important detail is that the end clients understandably don’t want to hand over control of their DNS. In most cases the process of getting DNS changes made is a difficult and slow one involving internal IT departments and external providers.

These circumstances mean a failover procedure would take a considerable amount of time and involve many people. The common solution is to ask the clients to set their DNS records to CNAME to a domain name with a short TTL that we control. And this is what we do for “www” records.

We host a domain for this purpose on Amazon’s Route 53 service and the deployment process can automatically create records for new projects and change records in the event of a failover.
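As a sketch of that record-creation step, using Ansible’s route53 module with an invented failover domain and record names (the real ones differ):

```yaml
# Hypothetical example: create a short-TTL CNAME for a new project
# under the failover domain we control. The 60-second TTL is what
# keeps a DNS-based failover fast.
- name: create project record on Route 53
  local_action: >
    route53 command=create zone=ourfailover.net
    record=clientsite.ourfailover.net type=CNAME ttl=60
    value=primary-do.ourfailover.net overwrite=yes
```

The client’s own “www” record then CNAMEs to clientsite.ourfailover.net, which we control.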

Solving the root CNAME problem by sidestepping it

Whilst CNAMEs solve our problem for www.client.com, we still have an issue with visitors to client.com. CNAMEs should not be used at the root level of a domain². All kinds of things can and will break if you try.

Some DNS providers offer a solution to this in the form of custom record types like ALIAS or ANAME, where they do the work to return the correct address at the time of the query. Whilst this is ideal, getting clients to change DNS providers is definitely not.

You may know that for SEO purposes it’s important to have a single, canonical URL for your web pages. Serving the same content at client.com and www.client.com is harmful to your ranking. Given that, any request to client.com should return a redirect to the relevant page on www.client.com.

What we need is a server with a static IP address that can easily be shifted onto another host should there be a hardware/network failure. That server only needs to know what domains we’re hosting to perform the correct redirects. It doesn’t need to be particularly powerful because it won’t experience much traffic and nginx makes light work of redirecting.
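As a minimal sketch of the nginx configuration involved (domain names invented; we assume the www sites are served over HTTPS, per the TLS policy above):

```nginx
server {
    listen 80;
    # One server block can cover every hosted apex domain.
    server_name client.com otherclient.co.uk;

    # A permanent redirect preserves the request path and query string,
    # keeping a single canonical URL for SEO.
    return 301 https://www.$host$request_uri;
}
```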

An Amazon EC2 micro instance with an Elastic IP address is ideal.

We allocate an Elastic IP in the AWS console and give this address to our clients to use as the root A record in their DNS. In the event that an instance or availability zone fails, a simple Ansible playbook creates a new instance and binds our Elastic IP to it in less than five minutes (note that an Elastic IP can be used in any availability zone within a region).

---
- hosts: localhost
  gather_facts: false

  tasks:
    # t2.micro: 1 burstable vCPU, 1GB RAM, approx 10 USD/month; Ubuntu 14.04 HVM AMI
    - name: start new instance
      local_action: ec2 key_name="MYKEY" instance_type="t2.micro" image="ami-f0b11187" wait=yes zone="{{ lookup('env', 'AWS_ZONE') }}" region="eu-west-1" group="webserver"
      register: ec2_info

    - name: assign elastic IP
      local_action: ec2_eip instance_id={{ item }} ip=192.168.100.1 reuse_existing_ip_allowed=yes region="eu-west-1"
      with_items: "{{ ec2_info.instance_ids }}"

A second Ansible playbook collates the domain names used in our projects, generates the appropriate nginx configuration and sets everything up on the new instance. Again, within a couple of minutes.
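That second playbook might be sketched roughly as follows, assuming a Jinja2 template and an inventory group for the redirect instance (the names and variable layout here are invented for illustration):

```yaml
---
- hosts: redirector
  vars:
    # Hypothetical: in practice this list is collated from project settings.
    redirect_domains:
      - client.com
      - otherclient.co.uk
  tasks:
    - name: generate nginx redirect configuration
      template: src=redirects.conf.j2 dest=/etc/nginx/conf.d/redirects.conf
      notify: reload nginx
  handlers:
    - name: reload nginx
      service: name=nginx state=reloaded
```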

Conclusion

Aided by cheap, yet dependable hosting packages and some simple automation we’re able to offer clients resilient hosting across two physical locations, network connections and providers for under 50 USD per month.

The failover process does require manual intervention. That intervention is simply executing another playbook that handles updating a few configuration options on the secondary instances and switching where our CNAMEd addresses point.
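The DNS half of that intervention amounts to overwriting the short-TTL record so it points at the secondary provider. With invented names for illustration, that is a single Ansible task:

```yaml
# Hypothetical example: repoint the failover CNAME at the secondary
# host; overwrite=yes replaces the existing record in place.
- name: switch record to secondary provider
  local_action: >
    route53 command=create zone=ourfailover.net
    record=clientsite.ourfailover.net type=CNAME ttl=60
    value=secondary-linode.ourfailover.net overwrite=yes
```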

I’m reluctant to ever fully automate the failover process. The issues around the secondary site knowing with certainty that the primary has failed, and then performing the switchover, are difficult to overcome.

The solution outlined above generally has sites back up and running in well under 30 minutes of notification that there’s a problem, and that seems to be good enough.

  1. Note: these links include my referral code and give me a little kickback in the form of hosting credit if you become a paying customer.
  2. RFC covering the issue. Search it for “CNAME RR”.