Building reddit-frontpage-proof websites?

Rick Falkvinge’s website Falkvinge.net recently frontpaged reddit.

Falkvinge.net

Actually, on reddit's default front page he had three articles at once (#16, #22 and #24), which, as he says, is a record for him.

Why is this a big deal? With reddit Alexa-ranked 133 and getting about 8.7 million visitors every day, being on the front page three times at once means a lot of traffic in a relatively short space of time. Think of three phone numbers being read out at the same time on three TV stations, all ringing the same call centre: that call centre is falkvinge.net.

This is pretty much a nightmare scenario from a systems administration point of view, because you have to prepare for lots of traffic arriving in a short window of time. Worse, with social media you don’t have the foggiest idea how popular something is going to be. Something posted to reddit is far more likely to generate little or modest traffic than to cripple your webserver, but you can’t tell in advance which it will be, so you need to be prepared for a surge at any time.

Stats for the 24 hours when I had 3 articles on Reddit’s front: 421 gigs of data served, 21.7M HTTP requests, peak 630 reqs/sec

- @Falkvinge
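To put those figures in perspective, here is the back-of-envelope arithmetic as a small Python sketch. It uses only the numbers from the tweet, plus two labelled assumptions: that "421 gigs" means decimal gigabytes, and that a request at the peak has the same average response size as the rest of the day. The 100 Mbit/s uplink is also an assumption, used only as a yardstick.

```python
# Back-of-envelope load figures from the tweet above (one 24-hour window).
SECONDS_PER_DAY = 24 * 60 * 60

total_requests = 21_700_000   # "21.7M HTTP requests"
total_bytes = 421e9           # "421 gigs", assumed to mean decimal gigabytes
peak_rps = 630                # "peak 630 reqs/sec"
uplink_mbit = 100             # assumption: a 100 Mbit/s uplink as a yardstick


def to_mbit_per_s(bytes_per_s: float) -> float:
    """Convert bytes per second to megabits per second (1 Mbit = 10**6 bits)."""
    return bytes_per_s * 8 / 1e6


avg_rps = total_requests / SECONDS_PER_DAY
avg_mbit = to_mbit_per_s(total_bytes / SECONDS_PER_DAY)

# Assumption: requests at the peak have the day's average response size.
bytes_per_request = total_bytes / total_requests
peak_mbit = to_mbit_per_s(bytes_per_request * peak_rps)

print(f"average load: {avg_rps:.0f} req/s, {avg_mbit:.0f} Mbit/s")
print(f"peak is {peak_rps / avg_rps:.1f}x the average")
print(f"estimated peak bandwidth: {peak_mbit:.0f} of {uplink_mbit} Mbit/s")
```

With these inputs it reports an average of about 251 req/s and 39 Mbit/s, a peak roughly 2.5 times the average, and an estimated peak of about 98 Mbit/s. So the daily average fits comfortably in a 100 Mbit/s link, but under the same-size assumption the peak comes close to filling it.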

Rick has a somewhat customised WordPress setup with the W3 Total Cache plugin on the latest version of Ubuntu, almost certainly with Apache from what I can tell. It’s anyone’s guess what hardware it’s running on (UPDATE: Rick describes his hardware in the comments below). It’s fairly standard as far as I can see: mainly static content, not outrageously interactive or personalised. There are some images, but they don’t form the main part of the site.

Again, I could not have survived that traffic peak without @CloudFlare (see previous tweet)

- @Falkvinge

Rick’s solution to the problem is the “cloud” Infrastructure-as-a-Service provider Cloudflare, which is essentially a caching reverse proxy/CDN combined with a distributed DNS service. In practice, this means he can use Cloudflare to absorb these unexpected large peaks in traffic without changing his own infrastructure.
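To make "caching reverse proxy" a little more concrete, here is a toy sketch in Python. It is not how Cloudflare works internally; the `PageCache` class, the `origin` function and the `/some-article/` path are all hypothetical. It only shows why a cache in front of the origin absorbs a spike: identical requests within the cache lifetime reach the backend once.

```python
import time
from typing import Callable, Dict, Tuple


class PageCache:
    """Toy in-memory page cache sitting in front of a slow backend.

    Illustrates the idea behind a caching reverse proxy: identical requests
    within the TTL are answered from memory instead of the origin server.
    """

    def __init__(self, backend: Callable[[str], str], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.backend = backend
        self.ttl = ttl_seconds
        self.clock = clock
        self._store: Dict[str, Tuple[float, str]] = {}  # path -> (expires_at, body)
        self.hits = 0
        self.misses = 0

    def get(self, path: str) -> str:
        now = self.clock()
        entry = self._store.get(path)
        if entry is not None and entry[0] > now:
            self.hits += 1
            return entry[1]
        # Miss or expired entry: ask the origin once, then serve from memory.
        self.misses += 1
        body = self.backend(path)
        self._store[path] = (now + self.ttl, body)
        return body


def origin(path: str) -> str:
    """Hypothetical stand-in for WordPress rendering a page."""
    return f"<html><body>{path}</body></html>"


cache = PageCache(origin, ttl_seconds=60)
for _ in range(630):  # one second at the quoted peak rate, all for one article
    cache.get("/some-article/")
print(cache.hits, cache.misses)  # 629 1
```

With a 60-second TTL, a full second of requests at the quoted 630 req/s peak costs the origin a single render. The trade-off is staleness: anything that changes on the page within the TTL is not seen until the entry expires.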

Using a black box like Cloudflare to scale one’s website is all very well, but it doesn’t suit everyone, and it presents an interesting sysadmin challenge:

How would you build a setup for a simple-ish WordPress instance, like Rick’s, to cope with the levels of traffic he mentions?

3 Responses to “Building reddit-frontpage-proof websites?”

  1. A few obvious things:

    Clearly bandwidth, as impressive as 421 gigs in 24 hours sounds, isn’t going to be an issue. Assuming his server has at least a 100Mb/s uplink, it should be able to cope fine. If his hosting company were likely to cut him off for using up his bandwidth allowance, or couldn’t cope with traffic at these levels, then he probably shouldn’t be using them.

    He is going to want something caching web pages: varnish or nginx are obvious choices, but by no means the only ones. These need tuning and testing. The more he can keep these caches in RAM or on fast storage, the better performance he should get.

    Getting cache expiration right, so that comments appear in near real time, is a little trickier, but not at all impossible. Mainly, I guess, it would take a lot of testing, benchmarking, and finding and removing bottlenecks.

  2. Just filling in the hardware details:

    My server is standing at home on a 100 Mbit/s connection (on the balcony, actually — the winter serves well to cool it right now). Hardware-wise, it is an old 2-by-2 Opteron server (dual cores, two processors) with 6 gigs of memory running 64-bit Ubuntu 11.10. And yes, it runs Apache.

    The stories floated up and down the front page over the day, by the way; one peaked at #2, another around #10, and I don’t know how high the third got, but I saw it at #4, at least. At one point, they were all on the front page simultaneously.

    Cheers,
    Rick

  3. [...] Arguably, it’s overcomplicated (W3 Total Cache + Memcache seem like recipes for confusion and pain), and I could just use nginx, php-fpm and varnish, but this setup gives me the flexibility and known quantity that is Apache, while still letting me scale my site to reddit-proof proportions. [...]
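The first comment points out that getting cache expiration right, so that new comments show up quickly, is the slightly trickier part. One common way to handle it is a long cache lifetime plus an explicit purge of just the affected article whenever a comment is posted. The Python sketch below illustrates that idea only; it does not use varnish, nginx or W3 Total Cache, and `PurgingPageCache`, `render_article`, `post_comment` and the fake clock are hypothetical names chosen for illustration.

```python
from typing import Callable, Dict, List, Tuple


class PurgingPageCache:
    """Page cache with a TTL plus an explicit purge when content changes."""

    def __init__(self, render: Callable[[str], str], ttl_seconds: float,
                 clock: Callable[[], float]) -> None:
        self.render = render
        self.ttl = ttl_seconds
        self.clock = clock
        self._store: Dict[str, Tuple[float, str]] = {}  # path -> (expires_at, body)

    def get(self, path: str) -> str:
        now = self.clock()
        entry = self._store.get(path)
        if entry is None or entry[0] <= now:
            # Missing or expired: re-render once and cache the result.
            entry = (now + self.ttl, self.render(path))
            self._store[path] = entry
        return entry[1]

    def purge(self, path: str) -> None:
        """Drop one page so the next request re-renders it."""
        self._store.pop(path, None)


# Hypothetical toy backend: an article page that shows its comment count.
comments: Dict[str, List[str]] = {"/article/": []}


def render_article(path: str) -> str:
    return f"{path} comments={len(comments[path])}"


def post_comment(cache: PurgingPageCache, path: str, text: str) -> None:
    comments[path].append(text)
    cache.purge(path)  # invalidate only the page that changed


fake_now = [0.0]  # fake clock so expiry is deterministic
cache = PurgingPageCache(render_article, ttl_seconds=300,
                         clock=lambda: fake_now[0])
print(cache.get("/article/"))  # /article/ comments=0
post_comment(cache, "/article/", "First!")
print(cache.get("/article/"))  # /article/ comments=1
```

Without the purge, the new comment would only appear once the 300-second TTL ran out; with it, the backend re-renders one page per comment rather than one page per visitor.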
