Large platform architecture in (mostly) perl - an illustrated tour
Tomas (t0m) Doran
São Paulo.pm perl workshop 2010
YAPC::EU Pisa 2010

This talk

• Is mostly a ramble
• About what I do for a living
• Good bits
• and bad bits (probably mostly bad bits)
• And when I say ‘illustrated’, I’m not very
good at diagrams, sorry...

Making money from
independent music

• IMPOSSIBLE
• No, no it isn’t. But we’re very lucky to have
people who know the music industry
• A startup would tank
• Last.fm guys “keep losing less money”

The state51 conspiracy
Consolidated Independent
Media Service Provider
• Several (largely profitable) businesses based
on the same technology platform
• East London (Brick Lane), a warehouse.
• > 60% of UK independent content goes
through us somewhere

Being S3 on the cheap
• WAV files are big. Videos are bigger.
• Transcodes aren’t small, especially when
you have 15 of them.
• My music collection is several hundred
terabytes
• Need to be able to serve this stuff fast and
concurrently.

MogileFS

• Is free.
• Runs on cheap hardware
• Cheaper than S3.
• Not so awesome if you aren’t LiveJournal

Data center design
• 8 amp racks. Seriously, WTF!?!?!
• Electricity is more expensive than servers,
ergo rolling hardware upgrades trivially pay
for themselves.
• Transit is really, really expensive.
• Worth buying fiber to other locations to
peer if you need lots of bandwidth.

Platform overview
[Diagram: three identical web hosts. Each has a <VIP> in front of Varnish (ESI:), which talks to Apache + FCGI apps and to nginx + mogile + custom + FCGI file auth.]

Also: Encoding (bare metal), Encoding (VMWare), Encoding SOAP service, Memcached, Mogile Tracker

[Diagram: many Storage nodes, plus a MySQL object store Master with Replication to a MySQL object store Slave.]

Web architecture
• App servers apache, apps FastCGI, port 81
• Varnish + ESI, caching, port 80
• 1 varnish per host, talks to all the apaches
• 1 VIP per host
• Host fail: VIP transfer
• Apache/app fail (or overload), varnish
rebalances/retries.

Web architecture (cont)

• Varnish doesn’t cache media, just provides
failover.
• nginx sends the hit to FastCGI app.
• Returns X-Accel-Redirect.
• nginx talks to MogileFS, handles delivery.
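
The delivery flow above can be sketched as a tiny WSGI app. This is only an illustration, not the code behind the platform: the `ALLOWED` table, the `X-User` request header and the `/internal/mogile/` location are all hypothetical. The one real piece is the `X-Accel-Redirect` response header, which tells nginx to serve an internal location itself.

```python
# Hypothetical table standing in for the real auth check and file lookup.
ALLOWED = {("alice", "/media/track1.mp3"): "/internal/mogile/track1.mp3"}


def media_auth_app(environ, start_response):
    """Authorise a media request, then hand delivery back to nginx."""
    user = environ.get("HTTP_X_USER", "")
    path = environ.get("PATH_INFO", "")
    internal = ALLOWED.get((user, path))
    if internal is None:
        start_response("403 Forbidden", [("Content-Type", "text/plain")])
        return [b"forbidden\n"]
    # No body is sent: nginx sees X-Accel-Redirect and pushes the bytes itself.
    start_response("200 OK", [("X-Accel-Redirect", internal),
                              ("Content-Length", "0")])
    return [b""]
```

The app process only ever answers with headers, so it is free again as soon as the auth decision is made.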

Storage architecture
• Lots of boxes with lots of disk.
• Many additional roles to storage. (Mogile
tracker, memcache node, metal encoding,
VMWare, SOAP Service)
• Not all the boxes do all the roles.
• All the roles can safely fall over and die.
• Which is good, as they do. Or the box falls
over. Or a, then b.

WAV files

• WAV is a container format.
• Loosely defined.
• You can stuff XML documents in WAV files
• Some encoders (oh hai flac) very picky.
• ‘dirty’ and ‘clean’ WAV files.
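
A rough sketch of what "container format" means in practice: a WAV file is a RIFF header followed by a sequence of tagged chunks, and nothing stops extra chunks from turning up. The `is_clean` rule below (only `fmt ` and `data` chunks) is an assumption made for illustration, not the definition used in the talk.

```python
import io
import struct


def list_riff_chunks(stream):
    """Return [(chunk_id, size), ...] for the top-level chunks of a WAV stream."""
    header = stream.read(12)
    if len(header) < 12:
        raise ValueError("too short to be a RIFF file")
    riff, _total, wave = struct.unpack("<4sI4s", header)
    if riff != b"RIFF" or wave != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    chunks = []
    while True:
        head = stream.read(8)
        if len(head) < 8:
            break
        chunk_id, size = struct.unpack("<4sI", head)
        chunks.append((chunk_id.decode("ascii", "replace"), size))
        # Chunk bodies are padded to an even number of bytes.
        stream.seek(size + (size & 1), io.SEEK_CUR)
    return chunks


def is_clean(chunks):
    # Assumption: "clean" means only format and sample-data chunks are present.
    return [chunk_id for chunk_id, _ in chunks] == ["fmt ", "data"]
```

Running `list_riff_chunks` over incoming masters is one cheap way to spot files carrying extra chunks before a picky encoder trips over them.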

Transcoding everything

• Lots of different formats
• WMA - GNARGGH%$@*&!!

Win32

• We’re running ActiveState for hysterical
raisins.
• No XS modules
• Thin as possible

Encoding
[Diagram. Components shown: HTTP Nodes (x3), Encoding Service, Uploading Service, and an encoder host (Win32 & Unix) containing Downloader, Local Disk, Encoder (mp3), Encoder (wma) and Uploader. Labelled links: GET & PUT media, SOAP.]

Snakes On A Plane

• SOAP actually works ok here, as we
control both ends.
• Old version of SOAP::Lite
• Wouldn’t recommend interoperating

Logging
• Used to be terribly hard to debug
• Push logs into syslog
• Aggregate in splunk - time correlated from
encoding machines, web service machines,
etc.
• Much easier to work out what happened.

Hardware is shit

• When you have several hundred TB, the
undetected bit error rate of magnetic media
is actually significant.
• See also networks, memory, etc.
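
One common defence, sketched here as an assumption rather than as what the platform does: record a checksum when a file is stored and re-check it whenever the file is read or copied. The function names are made up for the example.

```python
import hashlib


def sha256_of(path, block_size=1 << 20):
    """Hash a file in blocks so large media files never sit in RAM whole."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def verify(path, expected_hex):
    """True if the file still matches the checksum recorded at store time."""
    return sha256_of(path) == expected_hex
```

With replicated storage, a failed `verify` on one copy can be repaired from another copy that still matches.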

Things will always fail

• If you need reliability, you have to design it
in from the start.
• Not only will you have (a lot of) hardware
failures, all the software will break in
unexpected ways. Let’s not talk about
networks...
• Maybe you don’t need this..

Queueing

• We have work queues of different types of
media (e.g. mp3/wma/aac etc)
• In the database.
• Don’t do this.

MySQL sucks

• 1 type of JOIN
• No query rewriting
• Not enough stats for the planner to be
sane

This can hurt
• File Transform table:
• Master (File)
• Result (File)
• Status (pending/complete/failed/running)
• TransformStep (from/to)
• Leads to bad join order, massive fail

How to fail
• SELECT all file transforms that lead to wma
(millions).
• JOIN all files, ever (millions). Filter to find
those in state ‘pending’
• All pending looks like a bad bet - cardinality
of ‘all wmas’ looks better than cardinality of
‘all pending’.
• JOIN in the wrong order, nested loop,
screwed..
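
A toy illustration of why the order matters, using made-up data rather than anything measured on the real schema. Both orders return the same rows; what differs is how many rows have to be walked first.

```python
def rows_examined(transforms, filter_first):
    """Return (rows kept after the first filter, final matches)."""
    if filter_first == "target":
        candidates = [t for t in transforms if t["target"] == "wma"]
        matches = [t for t in candidates if t["status"] == "pending"]
    else:
        candidates = [t for t in transforms if t["status"] == "pending"]
        matches = [t for t in candidates if t["target"] == "wma"]
    return len(candidates), len(matches)


# Synthetic table: 9000 transforms spread over three formats, 30 still pending.
FORMATS = ("mp3", "wma", "aac")
TRANSFORMS = [
    {"target": FORMATS[i % 3], "status": "pending" if i < 30 else "complete"}
    for i in range(9000)
]
```

Here `rows_examined(TRANSFORMS, "target")` keeps 3000 candidates to find 10 matches, while `rows_examined(TRANSFORMS, "status")` keeps only 30. Scale that to millions of rows and a nested loop per candidate, and the wrong choice is the difference between a fast query and a dead one.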

Queueing
• Did I mention queues in the DB suck?
• Even if you’re not screwing it up.
• Get a Message Queue (or at least an async
job server)
• If your problem is simple - Gearman.
Harder or you need interop - RabbitMQ.

Mutable state

• Mutable state is the enemy
• Too many things rw.
• No idea how an object got to this state

Anemic domain model
Object-oriented programming (OOP) is a
programming paradigm that uses "objects" –
data structures consisting of data fields and
methods together with their
interactions – to design applications and
computer programs. Programming techniques
may include features such as data
abstraction, encapsulation, modularity,
polymorphism, and inheritance.

Anemic domain model
• Superset of too much mutable state
• Able to create invalid objects
• Able to make previously valid objects
invalid
• Violation of the encapsulation and
information hiding principles.
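
A minimal sketch of the opposite approach: an object that cannot be constructed invalid and cannot be mutated into an invalid state. The status names come from the File Transform table earlier; the transition table, class and field names are assumptions for the example.

```python
from dataclasses import dataclass, replace

# Assumed state machine: which status may follow which.
ALLOWED = {
    "pending": {"running"},
    "running": {"complete", "failed"},
    "complete": set(),
    "failed": {"pending"},
}


@dataclass(frozen=True)
class FileTransform:
    master: str
    target_format: str
    status: str = "pending"

    def __post_init__(self):
        # Refuse to build an object that is already invalid.
        if not self.master:
            raise ValueError("master file is required")
        if self.status not in ALLOWED:
            raise ValueError(f"unknown status {self.status!r}")

    def move_to(self, new_status):
        """Return a new object in new_status; the original is never changed."""
        if new_status not in ALLOWED[self.status]:
            raise ValueError(f"cannot go from {self.status} to {new_status}")
        return replace(self, status=new_status)
```

Because every change goes through `move_to`, "how did this object get into this state" has only one possible answer.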

scripts

• Lots of our business logic was in scripts
that manipulated objects
• You need people to run scripts (in screen
sessions)
• Ewwww, ewwwww.

Jobs
• Moved to a job based approach
• Jobs started by file creation, or changing
state of something in a web app
• Jobs sent via message queueing.
• Results go via message queueing
• Jobs trigger other jobs

Jobs Example
• Validate XLS file supplied with order.
• Valid files trigger another job to create
objects for each thing in the XLS
• This then triggers another job to create
transforms, which are then done...
• ... etc ...
• Can’t do this workflow in a web request.
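
The chain above might look roughly like this. An in-memory `deque` stands in for the real message queue, and the handler names and payload shapes are invented for the example.

```python
from collections import deque


def validate_order(payload):
    """A valid sheet triggers one create_object job per row; an invalid one triggers nothing."""
    rows = payload["rows"]
    if not rows or any(not row.get("title") for row in rows):
        return []
    return [("create_object", row) for row in rows]


def create_object(row):
    """Each created object triggers one transform job per target format."""
    return [("create_transform", {"title": row["title"], "format": fmt})
            for fmt in ("mp3", "wma")]


def create_transform(spec):
    """Leaf job: the actual encoding would be queued here."""
    return []


HANDLERS = {
    "validate_order": validate_order,
    "create_object": create_object,
    "create_transform": create_transform,
}


def run(queue):
    """Drain the queue, letting each job enqueue its follow-up jobs."""
    done = []
    while queue:
        name, payload = queue.popleft()
        done.append(name)
        queue.extend(HANDLERS[name](payload))
    return done
```

A two-row order fans out into seven jobs (one validation, two objects, four transforms), which is why none of this belongs inside a single web request.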

Jobs Future

• More automation of things people run
scripts for.
• Automatic job regeneration (you will lose
messages).

Lava flow

• Old (possibly unclean/invalid) data
• Old (unused/unmaintained) code
• “What harm does it do”

Relational integrity

• Seems to be a pipe dream more often than
not in the real world.
• Why?
• It’s not hard

Data consistency

• This should theoretically be the same thing
as relational integrity.
• In practice...

Mumble View Crap

• Too much logic in templates
• Copy & paste
• Business objects viewed as unchangeable
• Deleted 3000 lines from 2 simple
workflows. This fixed a dozen bugs.

Tangram
• No LEFT JOIN
• Displaying a product list becomes an x n
problem.
• OUCH
• Keep stupid - put the entire DB hot in
memcache!

Don’t do web design

• You are a programmer
• Make people pay for a design/CSS/HTML
person
• Work with them
• Be happy

Love your sysadmins
• Help them out.
• Build packages, or local::libs or something
• Keep everything in revision control
• Allow things to be sensibly configured.
• DOCUMENT THE POSSIBLE SETTINGS
• Use systems management - Puppet?

Love your logs

• Active feedback
• Aggregate in splunk
• Actively prune useless stuff
• Actively add useful stuff after a production
incident

ESI

• Is really awesome
• Makes the pain go away
• PURGE requests
• Keep everything hot all the time
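
A sketch of sending a PURGE from application code when an object changes. It assumes the Varnish VCL has been written to accept the PURGE method (this has to be configured; it is not on by default), and the host, port and path passed in are placeholders.

```python
import http.client


def purge(host, port, path, timeout=5):
    """Ask the cache at host:port to drop its copy of path; returns the HTTP status."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("PURGE", path)
        response = conn.getresponse()
        response.read()
        return response.status
    finally:
        conn.close()
```

With one Varnish per host, the same call would need to go to every host, followed by a normal GET if the page should be hot again straight away.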

memcache everything

• Keep the entire database hot in memcache
• We mostly ask trivial questions, so just
cache those paths.
• 30 GB of RAM isn’t actually much (3
boxes..)
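
A minimal read-through sketch of "cache the trivial questions". A plain dict stands in for the memcache client, and `loader` stands in for the database query; both are assumptions for the example.

```python
class ReadThroughCache:
    """Answer from the cache when possible, fall back to the loader otherwise."""

    def __init__(self, cache, loader):
        self.cache = cache      # dict here; a real memcache client in production
        self.loader = loader    # function that asks the database
        self.misses = 0

    def get(self, key):
        value = self.cache.get(key)
        if value is None:
            self.misses += 1
            value = self.loader(key)
            self.cache[key] = value
        return value

    def warm(self, keys):
        """Pre-load every key so the whole working set is hot."""
        for key in keys:
            self.get(key)
```

The loader fallback matters: if the cache is emptied, the next `get` is slower but still correct.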

memcache
• IS A CACHE
• Use sequential port numbers and CNAMEs
• E.g. cache0:11210, cache1:11211,
cache2:11212 etc..
• Run several per machine
• Allows you to scale capacity and rebalance
without entire cache flush.
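
Why this works, as a sketch: clients hash keys over a fixed list of logical names, and only DNS decides which box each name lives on. The node list mirrors the example above; the box names and the CRC-modulo hashing are assumptions, since real clients have their own key distribution schemes.

```python
import zlib

# Logical nodes: the list clients are configured with. It never changes.
LOGICAL_NODES = ["cache0:11210", "cache1:11211", "cache2:11212"]


def node_for(key, nodes=LOGICAL_NODES):
    """Pick the logical node for a key; depends only on the key and the list."""
    return nodes[zlib.crc32(key.encode("utf-8")) % len(nodes)]


def physical_layout(cname_targets, nodes=LOGICAL_NODES):
    """Resolve each logical node to the box its CNAME currently points at."""
    return {node: cname_targets[node.split(":")[0]] for node in nodes}
```

Repointing `cache2` from one box to another changes `physical_layout` but not `node_for`, so only the keys on that one instance go cold.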

Don’t push bytes

• X-Sendfile and X-Accel-Redirect
• I already talked about file delivery like this
• Using 100 MB of RAM to proxy web
requests does not scale.

Test everything

• Redundant systems need testing
• You’ll still die unexpectedly in production
• If you can manage it, make responsibility for
deployment SEP.

• Thanks for listening
• Questions?
