Tag: nginx

Scaling my blog for high traffic – WordPress + Apache + Varnish + Nginx

Monday, November 12th, 2012

On the 3rd of October, after some quite high load, my website crashed and went offline. Given I’d just gone to Sweden, this was a bit awkward.

Looking at the stats – on the 2nd of October it was getting on average 5 page loads *per second* – except the traffic wasn’t evenly spread – it peaked much higher, with 9% of the whole day’s traffic happening between 7pm and 8pm.

At the time, the site was running on a simple Debian BigV VM with 1GB of RAM, SATA disk storage, 2 cores and 1GB of swap. I thought it might be interesting to look at the architecture I was using, which let a relatively low specification machine handle 25,000 page views in 24 hours with 9,000 unique visitors.

This website's architecture

When a request is sent for a page on the website, the first thing it hits is nginx. If it’s a static asset (an uploaded image like the one above, part of the theme, etc.) it is served directly from disk by nginx. If it’s anything else, it’s sent to Varnish, which keeps a cache of pages it has previously loaded in memory (malloc). If the page isn’t found in the Varnish cache, the request is passed back to Apache, and the WordPress/PHP/MySQL stack sorts it out and sends it back. The next time that page is asked for, Varnish will send the cached version.
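As a rough sketch of that front tier (not the exact config this site runs), something like the following would do the job. The document root, the list of file extensions, the server name and Varnish listening on 127.0.0.1:6081 are all assumptions, and `write_frontend_conf` is just a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: write an example nginx front-end config to the given file.
# Assumptions (not from the original setup): WordPress files live under
# /usr/share/wordpress and Varnish listens on 127.0.0.1:6081.
write_frontend_conf() {
    cat > "$1" <<'EOF'
server {
    listen 80;
    server_name blog.example.com;
    root /usr/share/wordpress;

    # Static assets (theme files, uploads): served straight from disk.
    # If the file is missing, fall through to Varnish.
    location ~* \.(?:png|jpe?g|gif|ico|css|js)$ {
        expires 7d;
        try_files $uri @varnish;
    }

    # Everything else: hand to Varnish, which asks Apache on a cache miss.
    location / {
        proxy_pass http://127.0.0.1:6081;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }

    location @varnish {
        proxy_pass http://127.0.0.1:6081;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
EOF
}

# Usage: write_frontend_conf /etc/nginx/sites-enabled/blog.conf
```

The point of the split is that nginx only ever does cheap work (reading files, forwarding requests), so PHP and MySQL only get involved when Varnish has a cache miss.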

I installed WordPress from the Debian package, for ease of upgrades, and there are two performance-related WordPress plugins installed – W3 Total Cache and WP Varnish. W3 Total Cache does a bit of caching into memcache, and a few other tweaks, but the majority of the load is handled in Varnish. Using Apache to do the webserver-ing makes life simpler, because we don’t have to mess around rewriting .htaccess rules into a different syntax.

WP Varnish is basically the hook we need to flush the Varnish cache whenever something changes – when someone comments, when a page is updated, when a new page is added – WP Varnish will send Varnish a “flush”, which ensures that viewers see the most up to date page.
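To give a feel for what such a “flush” can look like, here is a minimal sketch of purging a single page by hand. It is not taken from WP Varnish itself; an HTTP PURGE request is simply one common convention. The address 127.0.0.1:6081 and the helper name `purge_page` are assumptions, and Varnish only honours PURGE if its VCL has been written to allow it.

```shell
#!/bin/sh
# Hypothetical helper: ask Varnish to drop its cached copy of one page.
# Assumes Varnish listens on 127.0.0.1:6081 and its VCL has been set up
# to accept PURGE requests from localhost (it does not by default).
VARNISH_ADDR="${VARNISH_ADDR:-127.0.0.1:6081}"

purge_page() {
    site_host="$1"   # e.g. blog.example.com
    page_path="$2"   # e.g. /2012/11/some-post/
    curl -s -o /dev/null -X PURGE -H "Host: $site_host" "http://$VARNISH_ADDR$page_path"
}

# Usage: purge_page blog.example.com /2012/11/some-post/
```

After a purge, the next visitor to that page triggers one trip to Apache, and everyone after that gets the fresh copy from RAM again.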

The nice thing about this is that users can look at pages without ever touching Apache or the database, with the page being dumped out over the network port from RAM, and the images etc. simply being read off the disk – resulting in fast server response times, and great scaling.

When Varnish crashed on the third, it was dealing with an amount of traffic I hadn’t ever envisaged – in fact I’d deliberately made the server relatively small/underspecified to see how it would perform under pressure.

Arguably, it’s overcomplicated, with some unnecessary parts (W3 Total Cache + Memcache seem like recipes for confusion and pain), and I could just use nginx, php-fpm and Varnish. But this setup gives me the flexibility and known quantity that is Apache, whilst still letting me scale my site to reddit-proof-like proportions.

I think the real lesson I learnt is that all the services should be running under runit or monit, so that if a service stops responding, it’ll be automatically restarted.
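The idea behind that kind of supervision can be sketched in a few lines of shell. This is a crude stand-in for monit or runit, not a replacement for either; the probe URL, the init script path and the helper name `ensure_up` are assumptions for illustration.

```shell
#!/bin/sh
# Crude stand-in for monit/runit: probe a service and restart it on failure.
# The URL and init script in the example below are assumptions.

# ensure_up PROBE_CMD RESTART_CMD
# Runs PROBE_CMD; if it fails, prints "restarting" and runs RESTART_CMD.
ensure_up() {
    probe_cmd="$1"
    restart_cmd="$2"
    if sh -c "$probe_cmd" >/dev/null 2>&1; then
        echo "ok"
    else
        echo "restarting"
        sh -c "$restart_cmd"
    fi
}

# Example wiring (run from cron every minute, say):
# ensure_up "curl -fs -m 5 -o /dev/null http://127.0.0.1:6081/" "/etc/init.d/varnish restart"
```

A proper supervisor adds the things this lacks, such as backing off after repeated failures and alerting someone, which is why runit or monit is the better long-term answer.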

How to set up a proxy for a website like the Pirate Bay

Wednesday, May 2nd, 2012

You may have read recently about attempts to block the Pirate Bay.

There are a variety of reasons I think this is a bad idea, and perhaps I’ll write a post about them, but this post is simply about how to quickly and easily deploy a web proxy for a specific website, on a server which could be anywhere in the world.

This is REALLY quick and simple. Let’s go!

1) Go to LowEndBox.com and buy a cheap VM of your choice.

  • The more exotic the location the better, though even the UK should work.
  • The specification doesn’t matter, though 128MB of RAM or more will be best.
  • Don’t accept anything with less than 15GB monthly bandwidth.
  • I’d expect, even at peak, your proxy to use less than 500MB/month – well within most limits.
  • Be aware of your provider’s T’s & C’s. They may not like you.

NOTE: For other uses, I’d recommend networks with more reliable reputations than simply “is cheap” – ability to reimage, console access, awesome support – this time none of those are required.

2) Request Debian Squeeze or Ubuntu Lucid 10.04 as the server OS

  • You could use other things, we’re going to use Debuntu.

3) Login as root. If you’re not root, you can always “sudo -s” for root.

Let’s update the machine and install nginx – the program that we’re going to be using.

  • apt-get update
  • apt-get dist-upgrade -y
  • apt-get install nginx -y

4) Let’s configure your DNS before we go further. I’m assuming you have a domain – yourdomain.com. Go to your domain’s DNS records and create an “A” record called tpb.yourdomain.com, with your server’s IP address as the details. The TTL doesn’t matter, but generally you’ll prefer smaller to larger. Save that, and let’s get back to the server!
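Before relying on the new record, it can be worth checking that it resolves to your server. A minimal sketch, assuming `dig` is installed (on Debian it comes in the dnsutils package); `dns_points_to` is a hypothetical helper name and 203.0.113.10 is a placeholder address:

```shell
#!/bin/sh
# dns_points_to NAME IP
# Succeeds if NAME currently has an A record equal to IP.
# Needs dig (Debian package: dnsutils).
dns_points_to() {
    dig +short "$1" A | grep -qxF "$2"
}

# Usage (203.0.113.10 is a placeholder address):
# dns_points_to tpb.yourdomain.com 203.0.113.10 && echo "DNS is ready"
```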

5) Let’s configure nginx:

  • nano /etc/nginx/sites-enabled/tpb.config
  • Paste in:
    server {
        listen THESERVERSIPADDRESS:80;
        server_name tpb.yourdomain.com;

        location / {
            proxy_pass http://thepiratebay.se/;
        }

        access_log /dev/null;
        error_log /dev/null;
    }

Obviously, you’ll need to change tpb.yourdomain.com and THESERVERSIPADDRESS to what they actually are.

To save this, type ctrl-o, *enter* ctrl-x.

6) You can now configure SSL if you want, or leave it unconfigured as it is. I’m not going to cover this here, right now, but it’s a nice touch.

7) Run:

  • /etc/init.d/nginx restart

Hopefully nginx should restart without errors. If there are errors, look at them carefully and try and understand where you might need to go back to.
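A slightly more cautious variant of this step is to have nginx check the config first with `nginx -t`, and only restart if the check passes. A minimal sketch; `restart_if_valid` is a hypothetical helper name, and the two variables just hold the paths used elsewhere in this post:

```shell
#!/bin/sh
# Only restart nginx if the new config parses cleanly.
NGINX_BIN="${NGINX_BIN:-nginx}"
NGINX_INIT="${NGINX_INIT:-/etc/init.d/nginx}"

restart_if_valid() {
    if "$NGINX_BIN" -t; then
        "$NGINX_INIT" restart
    else
        echo "config test failed; nginx has not been restarted" >&2
        return 1
    fi
}

# Usage (as root): restart_if_valid
```

When the test fails, nginx prints the file and line number it choked on, which is usually enough to find the typo.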

8) Go to tpb.yourdomain.com – hopefully your DNS changes will have been noticed by now and that should work nicely.

9) Publicise your URL to your friends and family.

10) Introduce someone else to these instructions. :)

Building reddit-frontpage-proof websites?

Tuesday, February 7th, 2012

Rick Falkvinge’s website Falkvinge.net recently frontpaged reddit.

Falkvinge.net

Actually, in the default setup, he had three articles (#16, #22, #24), which, as he says, is a record for him.

Why is this a big deal? Well, with reddit being Alexa-ranked 133 and getting about 8.7 million visitors every day, being on the front page 3 times at once means you’re going to get a lot of traffic in a relatively short space of time. Think of 3 phone numbers being read out concurrently on 3 TV stations that all point to the same call centre – that’s falkvinge.net.

This is pretty much a nightmare scenario to prepare for from a systems administration point of view, as you have to prepare for lots of traffic in a short window of time. In addition, with social media, you don’t have the foggiest clue how popular something is going to be. Something posted to reddit is much more likely to generate only a smallish amount of traffic, or not much at all, than it is to cripple your webserver, so you actually need to be constantly prepared for lots of traffic in a short window of time.

Stats for the 24 hours when I had 3 articles on Reddit’s front: 421 gigs of data served, 21.7M HTTP requests, peak 630 reqs/sec

- @Falkvinge
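To put those numbers in perspective, here is a quick back-of-the-envelope calculation. It assumes “gigs” means decimal gigabytes (10^9 bytes) and that the traffic is averaged over exactly 24 hours; `traffic_summary` is just a hypothetical helper name.

```shell
#!/bin/sh
# traffic_summary REQUESTS GIGABYTES PEAK_REQS_PER_SEC
# Derives averages from 24-hour totals (assumes 1 GB = 10^9 bytes).
traffic_summary() {
    awk -v reqs="$1" -v gb="$2" -v peak="$3" 'BEGIN {
        secs = 24 * 60 * 60
        avg = reqs / secs
        printf "average req/s: %.0f\n", avg
        printf "peak/average: %.1f\n", peak / avg
        printf "kB per request: %.1f\n", gb * 1e9 / reqs / 1000
        printf "average Mbit/s: %.0f\n", gb * 1e9 * 8 / secs / 1e6
    }'
}

traffic_summary 21700000 421 630
```

That works out at roughly 251 requests per second on average, with the peak about 2.5 times that, around 19.4 kB per request and about 39 Mbit/s sustained across the whole day.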

Rick has a somewhat customised WordPress setup with the W3 Total Cache plugin on the latest version of Ubuntu, with Apache from what I can tell. It’s anyone’s best guess what hardware it’s running on (UPDATE: this is the hardware he’s running). Fairly standard as far as I can see – it’s mainly static content and not outrageously interactive or personalised. There are some images, but they don’t form the main part of the site.

Again, I could not have survived that traffic peak without @CloudFlare (see previous tweet)

- @Falkvinge

Rick’s solution to the problem is to use the “cloud” Infrastructure-As-A-Service provider Cloudflare, which is essentially a caching reverse proxy/CDN combined with a distributed DNS service. What this means in practice is that they’re able to use Cloudflare to handle these unexpected large peaks in traffic without changing their infrastructure.

Using a blackbox called Cloudflare to scale one’s website is all very well, but doesn’t suit everyone and presents an interesting sysadmin challenge:

How would you build a setup for a simple-ish WordPress instance, like Rick’s, to cope with the levels of traffic he mentions?