How To Make Life Suck
       Less!
    (when building scalable systems)

       Bradford Stephens
    c: www.DrawnToScaleHQ.com
       b: www.roadtofailure.com
           t: @lusciouspear
About Me

• Founder, Drawn to Scale. Lead Engineer,
  Visible Technologies
• CS Degree, University of North FL
• Former careers in politics, music, finance,
  consulting
Drawn to Scale

• Building the “Big Data” platform: ingestion,
  processing, storage, search
• Products coming: Big Log, Big Search
  (faceted), Big Message...
Topics

• Overview
• Operations
• Engineering
• Process
Everything Changes
      with Big Data

• Bar is set higher: a previously niche field,
  few standard stacks (like LAMP)
• You need to have better engineering for
  minimum success
Scalability Matters

• “Web-Scale” data is unstructured and
  exponentially interconnected
• Social Media: Catalyst
• All data is important
• Data Size != Business Size
The Traditional DB
• Excels with highly structured, normalizable
  data
• Non-Linear Scale Cost
• More data = fewer features
• Optimized for single-node
• 90% of utility is 5% of capability
Ergo, Distributed

• Optimize for the problem, not a Swiss-Army
  knife
• Shared-nothing, commodity boxes
• Linear scale cost
The State of Things

• Order changed from 20 years ago:
• Cust. Experience is paramount
• Engineers are precious
• Fast I/O is expensive
• Storage is cheap
Recovery-Oriented
      Computing

1. Seamlessly Partitioned
2. Synchronously Redundant
3. Heavily Monitored
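The "heavily monitored" principle can be sketched as a simple heartbeat check. This is a minimal illustration, not a production monitor; node names and the timeout are hypothetical:

```python
import time

# Hypothetical cluster state: node name -> timestamp of its last heartbeat.
last_heartbeat = {"node-1": time.time(), "node-2": time.time() - 120}

HEARTBEAT_TIMEOUT = 30  # seconds of silence before a node is suspect

def suspect_nodes(heartbeats, now=None, timeout=HEARTBEAT_TIMEOUT):
    """Return the nodes whose last heartbeat is older than the timeout."""
    now = time.time() if now is None else now
    return [node for node, ts in heartbeats.items() if now - ts > timeout]

# node-2 last reported two minutes ago, so it gets flagged for recovery.
print(suspect_nodes(last_heartbeat))  # ['node-2']
```

In a real cluster the flagged node would be fenced or restarted automatically; the point is that recovery starts from detection, not from a pager.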
Operations

Moving the Box: Sysadmin ratio from 2:1 to
            200:1 to 2000:1


   (yes devs, you’ll care about this too)
Ops vs. Eng

• Engineers build, Ops manages
• Fixing problems: devs code+automate, ops
  hire
• Want something fixed? Call devs at 2 AM.
Config is Important

• Configuration is not 2nd-class anymore
• Needs to be tackled by Engineers
• New frameworks = months of
  configuration and experimentation
• Chef is a good start, but...
Production = Test

• Surprise! You don’t have a Test environment
  any more.
• Test Cost => Prod Cost
• Anything that’s not your data center is an
  approximation. Switches, cable, power,
  boxes, etc...
You’re Always Testing

• Constantly simulate failures and brownouts
  of boxes, racks, switches...
• “Canary in the Coal Mine”: run a box and
  rack at 175% current load.
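One way to run those constant failure drills is to pick victims at random each round. A minimal sketch; the fleet names and the "kill" action are placeholders for real hosts and a real stop command:

```python
import random

# Hypothetical fleet; in practice these would be real boxes, racks, switches.
fleet = ["rack1-box1", "rack1-box2", "rack2-box1", "rack2-box2"]

def pick_victims(nodes, fraction=0.25, rng=random):
    """Choose a random subset of nodes to kill or brown out in a drill."""
    count = max(1, int(len(nodes) * fraction))
    return rng.sample(nodes, count)

for victim in pick_victims(fleet):
    # Here you would actually stop the process or drop the link.
    print(f"simulating failure of {victim}")
```

Running this on a schedule, rather than waiting for real failures, is what turns "built to fail" from a slogan into a tested property.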
Deployment


• Deploy gradually: 1 box, 2 boxes, 1 rack...
• Code granularly, backwards-compatible
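The staged rollout above can be sketched as a loop that widens the deployment and checks health between stages. Stage sizes and the health check are hypothetical placeholders:

```python
# Boxes per stage: 1 box, 2 boxes, roughly a rack, then the rest.
STAGES = [1, 2, 10, 40]

def deploy(boxes, stages, deploy_fn, healthy_fn):
    """Roll out stage by stage, halting if any stage looks unhealthy."""
    done = 0
    for size in stages:
        batch = boxes[done:done + size]
        if not batch:
            break
        for box in batch:
            deploy_fn(box)
        done += len(batch)
        if not healthy_fn(batch):
            raise RuntimeError(f"rollout halted after {done} boxes")
    return done

deployed = []
total = deploy([f"box-{i}" for i in range(5)], STAGES,
               deploy_fn=deployed.append, healthy_fn=lambda batch: True)
print(total)  # 5
```

The backwards-compatibility bullet matters here: mid-rollout, old and new code are serving traffic side by side, so every change has to tolerate its predecessor.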
Built to Fail
• “It’s working” isn’t binary
• Acting weird? Shoot it.
• Multi-system failure is common: be
  topology aware
• Avoid false negatives: when something’s
  wrong and you don’t know it, you lose
  customer data
• This is empowering!
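Topology awareness can be as simple as never co-locating replicas of the same data on one rack, so a rack-level failure can't take out every copy. A toy sketch with hypothetical node and rack names:

```python
# Hypothetical topology: node name -> rack it lives in.
nodes = {"n1": "rack1", "n2": "rack1", "n3": "rack2", "n4": "rack3"}

def place_replicas(nodes_by_rack, replicas=3):
    """Pick one node per rack until we have enough replicas."""
    chosen, used_racks = [], set()
    for node, rack in nodes_by_rack.items():
        if rack not in used_racks:
            chosen.append(node)
            used_racks.add(rack)
        if len(chosen) == replicas:
            break
    return chosen

print(place_replicas(nodes))  # ['n1', 'n3', 'n4']
```

Real systems (HDFS, HBase) have richer placement policies, but the principle is the same: assume correlated, multi-system failure and spread accordingly.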
Engineering


This is Systems Software, not Applications
                 Software
This is Hard :(
• Engineering at scale is very different from
  writing a 3-tier webapp
• Care about garbage collection, election
  algorithms, data structures, access patterns,
  etc...
• CS knowledge is required, not a luxury
• DBA/RDBMS skills pretty useless
• CAP is law
Not Everything’s a Table

• Structure your data according to how it
  needs to be used
• Unstructured massive files, graphs, KV-
  stores
• The more your problem narrows, the
  easier it is to scale
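"Structure your data according to how it needs to be used" often means designing keys around the query rather than normalizing into tables. A sketch against a plain dict standing in for an ordered KV store like HBase; the keys and values are hypothetical:

```python
store = {}  # stand-in for an ordered key-value store

def put_post(user, ts, text):
    # Invert the timestamp so a forward scan returns newest posts first.
    store[(user, 10**12 - ts)] = text

def recent_posts(user, limit=10):
    keys = sorted(k for k in store if k[0] == user)
    return [store[k] for k in keys[:limit]]

put_post("alice", 1000, "older post")
put_post("alice", 2000, "newer post")
print(recent_posts("alice"))  # ['newer post', 'older post']
```

The narrower the query ("recent posts for one user"), the simpler the key design, and the easier the system is to shard and scale.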
Big Data is BIG

• Imagine a single test pass taking hours
• What works at 1.5 TB may fail at 10 MB or
  2 TB
• Many tests, simple code
• Soft Delete Only
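"Soft delete only" means rows get flagged, never physically removed, so an operator error or a bad job can be undone. A minimal sketch with a hypothetical schema:

```python
import time

# Rows keep a deleted_at marker instead of being destroyed.
rows = {"order-1": {"total": 42, "deleted_at": None}}

def soft_delete(key):
    rows[key]["deleted_at"] = time.time()

def live_rows():
    """Reads see only rows that have not been soft-deleted."""
    return {k: v for k, v in rows.items() if v["deleted_at"] is None}

soft_delete("order-1")
print(live_rows())        # {} -- hidden from reads
print("order-1" in rows)  # True -- still recoverable
```

At scale, a hard delete that turns out to be wrong is unrecoverable; a tombstone costs only cheap storage.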
“No, I won’t give you a
        repro”

• Often impossible to repro a bug on
  demand in a cluster
• Either fix your logging or your bug
• Log everything (we have a product for this!)
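If the log is the only repro you will ever get, make it machine-parseable so events from hundreds of nodes can be correlated after the fact. A sketch of structured JSON logging; the event and field names are hypothetical:

```python
import json
import logging
import sys

logging.basicConfig(stream=sys.stdout, level=logging.INFO, format="%(message)s")
log = logging.getLogger("node")

def log_event(event, **fields):
    """Emit one self-describing JSON line per event."""
    line = json.dumps({"event": event, **fields})
    log.info(line)
    return line

log_event("region_split", region="users,0040", node="n7", duration_ms=812)
```

One JSON object per line is boring on purpose: it greps, it parses, and it survives being shipped to a central store.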
Avoiding Impedance
       Mismatch

• High vs. Low Latency vs. Throughput
• A lot of data eventually, or a little now
• MapReduce vs. Sharding/Indexing
Simple Workflow

• Hadoop: Collect → Semantic Analysis,
  Unstructured Analysis
• Hadoop + HBase: Store in HBase →
  Structured Analysis; Indexing → Store in
  Hadoop
• Lucene + Solr + Katta: Pull Indexes →
  Load/Replicate Shards → Search
Biz + Process


The softer side of distributed computing
Hiring


• Plan for more engineers, fewer ops
• Be aware of “context switch cost” when
  training RDBMS-folks
It’s Not Just Coding
• Be aware of research cost
• Much more time spent experimenting, not
  coding
• Coding all this from scratch is horrific
• Nailing together 10+ OSS projects is a pain
• Open source anything not “Secret sauce”
Solve your Core
         Problem

• “Making your own electricity doesn’t create
  better tasting beer”
• Plan to use an end-to-end platform in the
  future (hint: ours!)
In Summary

• Plan for everything to fail
• Test constantly in production
• Systems Software requires Computer
  Science
• Don’t build it if you don’t have to
Thanks!

• Y’all
• Road to Failure Readers
• James Hamilton, Amazon/MS
• Bradford Cross, Flightcaster
• Ryan Rawson, HBase/Stumbleupon
Useful Resources

• www.roadtofailure.com
• www.highscalability.com
• perspectives.mvdirona.com
