How To Make Life Suck
       Less!
    (when building scalable systems)

       Bradford Stephens
    c: www.DrawnToScaleH...
About Me

• Founder, Drawn to Scale. Lead Engineer,
  Visible Technologies
• CS Degree, University of North FL
• Former ca...
Drawn to Scale

• Building the “Big Data” platform: ingestion,
  processing, storage, search
• Products coming: Big Log, B...
Topics

• Overview
• Operations
• Engineering
• Process
Everything Changes
      with Big Data

• Bar is set higher: a previously niche field,
  few standard stacks (like LAMP)
• ...
Scalability Matters

• “Web-Scale” data is unstructured and
  exponentially interconnected
• Social Media: Catalyst
• All ...
The Traditional DB
• Excel with highly structured, normalizable
  data
• Non-Linear Scale Cost
• More data = less features...
Ergo, Distributed

• Optimize for the problems, no Swiss-Army
  knife
• Shared-nothing, commodity boxes
• Linear scale cost
The State of Things

• Order changed from 20 years ago:
• Cust. Experience is paramount
• Engineers are precious
• Fast I/...
Recovery-Oriented
      Computing

1. Seamlessly Partitioned
2. Synchronously Redundant
3. Heavily Monitored
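
The three properties above can be illustrated with a toy in-memory store. This is only a sketch under stated assumptions, not something from the talk: the class name `PartitionedStore`, the hash-modulo placement, and the `stats` counters are all hypothetical, and a real system would replicate over the network and handle node membership changes.

```python
import hashlib
from typing import Dict, FrozenSet, Iterable, List


class PartitionedStore:
    """Toy store (hypothetical): hash-partitioned keys, synchronous replicas."""

    def __init__(self, node_names: Iterable[str], replicas: int = 2) -> None:
        self.names: List[str] = sorted(node_names)
        if not 1 <= replicas <= len(self.names):
            raise ValueError("replicas must be between 1 and the node count")
        self.replicas = replicas
        # Each "node" is just a dict here; real nodes would be separate boxes.
        self.nodes: Dict[str, Dict[str, str]] = {n: {} for n in self.names}
        # Minimal monitoring: count writes and reads served by a fallback replica.
        self.stats = {"writes": 0, "failovers": 0}

    def owners(self, key: str) -> List[str]:
        """Partitioning: the key's hash picks the first owner, neighbours follow."""
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        first = int(digest, 16) % len(self.names)
        return [self.names[(first + i) % len(self.names)]
                for i in range(self.replicas)]

    def put(self, key: str, value: str) -> None:
        """Synchronous redundancy: return only after every replica has the value."""
        for name in self.owners(key):
            self.nodes[name][key] = value
        self.stats["writes"] += 1

    def get(self, key: str, down: FrozenSet[str] = frozenset()) -> str:
        """Read from the first owner that is not in the `down` set."""
        for i, name in enumerate(self.owners(key)):
            if name not in down:
                if i > 0:
                    self.stats["failovers"] += 1
                return self.nodes[name][key]
        raise KeyError(f"all replicas for {key!r} are unavailable")
```

Because every write lands on all of a key's owners before `put` returns, a read can fall back to a second owner when the first is marked down, and the failover counter makes that event visible.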
Operations

Moving the Box: Sysadmin ratio from 2:1 to
            200:1 to 2000:1


   (yes devs, you’ll care about this ...
Ops vs. Eng

• Engineers build, Ops manages
• Fixing problems: devs code+automate, ops
  hire
• Want something fixed? Call ...
Config is Important

• Configuration is not 2nd-class anymore
• Needs to be tackled by Engineers
• New frameworks = months o...
Production = Test

• Surprise! You don’t have a Test environment
  any more.
• Test Cost => Prod Cost
• Anything that’s no...
You’re Always Testing

• Constantly simulate failures and brownouts
  of boxes, racks, switches...
• “Canary in the Coal M...
Deployment


• Deploy gradually: 1 box, 2 boxes, 1 rack...
• Code granularly, backwards-compatible
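
A gradual rollout like the one above might be sketched as follows. The function names, the wave sizes, and the `deploy` and `healthy` callbacks are assumptions made for illustration; the slides do not prescribe a tool or a health check.

```python
from typing import Callable, Iterable, List


def rollout_waves(hosts: List[str], wave_sizes: Iterable[int]) -> List[List[str]]:
    """Split hosts into successive waves; leftover hosts form a final wave."""
    waves: List[List[str]] = []
    start = 0
    for size in wave_sizes:
        if start >= len(hosts):
            break
        waves.append(hosts[start:start + size])
        start += size
    if start < len(hosts):
        waves.append(hosts[start:])
    return waves


def deploy_gradually(
    hosts: List[str],
    wave_sizes: Iterable[int],
    deploy: Callable[[str], None],
    healthy: Callable[[str], bool],
) -> List[str]:
    """Deploy wave by wave and stop after the first wave with an unhealthy host.

    Returns the hosts that received the new version, so the caller knows
    what to roll back.
    """
    done: List[str] = []
    for wave in rollout_waves(hosts, wave_sizes):
        for host in wave:
            deploy(host)
            done.append(host)
        if not all(healthy(host) for host in wave):
            break
    return done
```

Stopping between waves only helps if old and new versions can run side by side, which is why the slide pairs gradual deployment with backwards-compatible code.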
Built to Fail
• “It’s working” isn’t binary
• Acting weird? Shoot it.
• Multi-system failure is common: be
  topology awar...
Engineering


This is Systems Software, not Applications
                 Software
This is Hard :(
• Engineering at scale is very different than
  writing a 3-tier webapp
• Care about garbage collection, e...
Not Everything’s a Table

• Structure your data according to how it
  needs to be used
• Unstructured massive files, graphs...
Big Data is BIG

• Imagine your test passes taking hours
• What works at 1.5 TB may fail at 10MB or
  2 TB
• Many tests, s...
“No, I won’t give you a
        repro”

• Often impossible to repro a bug on
  demand in a cluster
• Either fix your loggin...
Avoiding Impedance
       Mismatch

• High vs. Low Latency vs. Throughput
• A lot of data eventually, or a little now
• Ma...
Simple Workflow
[Diagram: Hadoop-based collection and analysis workflow. Layout lost in extraction; the full label text appears under slide 24 of the transcript.]
Biz + Process


The softer side of distributed computing
Hiring


• Plan for more engineers, less ops
• Be aware of “context switch cost” when
  training RDBMS-folks
It’s Not Just Coding
• Be aware of research cost
• Much more time spent experimenting, not
  coding
• Coding all this from...
Solve your Core
         Problem

• “Making your own electricity doesn’t create
  better tasting beer”
• Plan to use an en...
In Summary

• Plan for everything to fail
• Test constantly in production
• Systems Software requires Computer
  Science
•...
Thanks!

• Y’all
• Road to Failure Readers
• James Hamilton, Amazon/MS
• Bradford Cross, Flightcaster
• Ryan Rawson, HBase...
Useful Resources

• www.roadtofailure.com
• www.highscalability.com
• perspectives.mvdirona.com

Make Life Suck Less (Building Scalable Systems)

This presentation was given at LinkedIn. It is a collection of guidelines and wisdom for re-thinking how we do engineering for massively scalable systems. Useful for anyone who cares about Big Data, Distributed Computing, Hadoop, and more.

Published in: Technology, Education
  • Make Life Suck Less (Building Scalable Systems)

    1. How To Make Life Suck Less! (when building scalable systems) Bradford Stephens c: www.DrawnToScaleHQ.com b: www.roadtofailure.com t: @lusciouspear
    2. About Me • Founder, Drawn to Scale. Lead Engineer, Visible Technologies • CS Degree, University of North FL • Former careers in politics, music, finance, consulting
    3. Drawn to Scale • Building the “Big Data” platform: ingestion, processing, storage, search • Products coming: Big Log, Big Search (faceted), Big Message...
    4. Topics • Overview • Operations • Engineering • Process
    5. Everything Changes with Big Data • Bar is set higher: a previously niche field, few standard stacks (like LAMP) • You need to have better engineering for minimum success
    6. Scalability Matters • “Web-Scale” data is unstructured and exponentially interconnected • Social Media: Catalyst • All data is important • Data Size != Business Size
    7. The Traditional DB • Excel with highly structured, normalizable data • Non-Linear Scale Cost • More data = less features • Optimized for single-node • 90% of utility is 5% of capability
    8. Ergo, Distributed • Optimize for the problems, no Swiss-Army knife • Shared-nothing, commodity boxes • Linear scale cost
    9. The State of Things • Order changed from 20 years ago: • Cust. Experience is paramount • Engineers are precious • Fast I/O is expensive • Storage is cheap
    10. Recovery-Oriented Computing 1. Seamlessly Partitioned 2. Synchronously Redundant 3. Heavily Monitored
    11. Operations Moving the Box: Sysadmin ratio from 2:1 to 200:1 to 2000:1 (yes devs, you’ll care about this too)
    12. Ops vs. Eng • Engineers build, Ops manages • Fixing problems: devs code+automate, ops hire • Want something fixed? Call devs at 2 AM.
    13. Config is Important • Configuration is not 2nd-class anymore • Needs to be tackled by Engineers • New frameworks = months of configuration and experimentation • Chef is a good start, but...
    14. Production = Test • Surprise! You don’t have a Test environment any more. • Test Cost => Prod Cost • Anything that’s not your data center is an approximation. Switches, cable, power, boxes, etc...
    15. You’re Always Testing • Constantly simulate failures and brownouts of boxes, racks, switches... • “Canary in the Coal Mine”: run a box and rack at 175% current load.
    16. Deployment • Deploy gradually: 1 box, 2 boxes, 1 rack... • Code granularly, backwards-compatible
    17. Built to Fail • “It’s working” isn’t binary • Acting weird? Shoot it. • Multi-system failure is common: be topology aware • Avoid false negative: something’s wrong and you don’t know it, lose customer data • This is empowering!
    18. Engineering This is Systems Software, not Applications Software
    19. This is Hard :( • Engineering at scale is very different than writing a 3-tier webapp • Care about garbage collection, election algorithms, data structures, access patterns, etc... • CS knowledge is required, not a luxury • DBA/RDBMS skills pretty useless • CAP is law
    20. Not Everything’s a Table • Structure your data according to how it needs to be used • Unstructured massive files, graphs, KV-stores • The more your problem narrows, the easier it is to scale
    21. Big Data is BIG • Imagine your test passes taking hours • What works at 1.5 TB may fail at 10MB or 2 TB • Many tests, simple code • Soft Delete Only
    22. “No, I won’t give you a repro” • Often impossible to repro a bug on demand in a cluster • Either fix your logging or your bug • Log everything (we have a product for this!)
    23. Avoiding Impedance Mismatch • High vs. Low Latency vs. Throughput • A lot of data eventually, or a little now • MapReduce vs. Sharding/Indexing
    24. Simple Workflow (diagram; labels as extracted, layout lost): Semantic Unstructured Hadoop Collect Analysis Analysis Structured Analysis Hadoop + Store in HBase HBase Store in Indexing Hadoop Lucene+ Load/ Pull Solr+ Replicate Indexes Katta Shards Search
    25. Biz + Process The softer side of distributed computing
    26. Hiring • Plan for more engineers, less ops • Be aware of “context switch cost” when training RDBMS-folks
    27. It’s Not Just Coding • Be aware of research cost • Much more time spent experimenting, not coding • Coding all this from scratch is horrific • Nailing together 10+ OSS projects is a pain • Open source anything not “Secret sauce”
    28. Solve your Core Problem • “Making your own electricity doesn’t create better tasting beer” • Plan to use an end-to-end platform in the future (hint: ours!)
    29. In Summary • Plan for everything to fail • Test constantly in production • Systems Software requires Computer Science • Don’t build it if you don’t have to
    30. Thanks! • Y’all • Road to Failure Readers • James Hamilton, Amazon/MS • Bradford Cross, Flightcaster • Ryan Rawson, HBase/Stumbleupon
    31. Useful Resources • www.roadtofailure.com • www.highscalability.com • perspectives.mvdirona.com
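
The “Soft Delete Only” point from the “Big Data is BIG” slide can be sketched as a store that marks records deleted instead of removing them. The class and method names here are hypothetical, and the in-memory dict stands in for whatever storage layer is actually used; the slides do not describe an implementation.

```python
import time
from typing import Dict, Optional


class SoftDeleteStore:
    """Toy store (hypothetical) where delete only sets a tombstone timestamp."""

    def __init__(self) -> None:
        self._rows: Dict[str, dict] = {}

    def put(self, key: str, value: str) -> None:
        self._rows[key] = {"value": value, "deleted_at": None}

    def delete(self, key: str) -> None:
        # Keep the data; record when it was logically removed.
        self._rows[key]["deleted_at"] = time.time()

    def undelete(self, key: str) -> None:
        # Recovery is a metadata change, not a restore from backup.
        self._rows[key]["deleted_at"] = None

    def get(self, key: str) -> Optional[str]:
        row = self._rows.get(key)
        if row is None or row["deleted_at"] is not None:
            return None
        return row["value"]
```

With storage cheap and test passes taking hours, keeping the bytes and hiding them from reads makes a mistaken delete cheap to reverse.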