Austin Cloud Users Group - August 23rd, 2011
 

This is a short slide deck from a talk given by Eric Anderson of CopperEgg to the Austin Cloud Users Group on August 23rd, 2011. The talk covered the company's migration across Amazon's services to Rackspace Cloud, and then back again. The importance of super real-time monitoring becomes clear by the end.

Usage Rights

© All Rights Reserved


    Presentation Transcript

    • copperegg Austin CUG - August 23rd, 2011 (presented by Eric Anderson) anderson@copperegg.com Wednesday, August 24, 11
    • About Us. CopperEgg: • Founded spring 2010 • Super real-time monitoring and analytics. About me (Eric Anderson): • SysAdmin - Centaur - 1999-2007 • 1400 compute nodes, ~50-100 file servers, ~200 misc. systems, hundreds of TBs • Software Engineer - StorSpeed - 2007-2010 • built a distributed file system cache for a NAS acceleration product • Co-Founder/COO - CopperEgg - 2010-Present
    • Why Cloud? Important Differences: • Installs in seconds – copy/paste systems • No configuration required - anyone can do it
    • Why Cloud? Important Differences
      All reliable and business-worthy systems need something like this:
      Physical:
        • Physical security
        • Redundant power
        • Redundant AC
        • Redundant & fast network
        • Peak hardware
        • Spare equipment
        • Physical space (storage of spare stuff too)
        • People to manage physical infrastructure
        • Hardware repairs
      Cloud:
        • Redundant infrastructure
        • Multi-AZ, Regions, storage, etc.
        • Resilient Applications
        • Designed for failure
        • Performance measurement
        • Automatic failover/recovery
        • Security of your infrastructure
        • Monitoring - up/down/status
        • Visibility into system as a whole
        • Don't rely on cloud vendor!
        • Delayed, inaccurate
    • Why Cloud? (for CopperEgg) Why did we go cloud? • Needed to get building fast • We didn't know what we needed • Just-in-time scaling • Keep costs low and still provide awesome service levels • Easy deployment for developers • Test different scenarios, try new setups, etc. • We use it for everything! • code repositories, tickets, email, phone, alerting, etc.
    • What we were building: Storage analytics product • visualize network-attached storage in real time • massive amounts of data • analyzing 10 billion ops/day in beta, in real time • super real-time (seconds vs. minutes) Requirements: • highly available • super responsive • gobble large amounts of analytics data in real time • historical data for 2 years • great UI
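As a rough sense of scale (plain arithmetic on the beta figure above, assuming the load were spread evenly across the day, which real traffic rarely is), 10 billion operations per day averages out to more than a hundred thousand operations every second:

$$
\frac{10 \times 10^{9}\ \text{ops/day}}{24 \times 3600\ \text{s/day}}
= \frac{10^{10}}{86\,400}\ \text{ops/s}
\approx 1.16 \times 10^{5}\ \text{ops/s}
$$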
    • Where we started [Diagram: SimpleDB] Bad: • Outgrew it before we outgrew it • Slow! So then what?
    • Amazon RDS to save the day! [Diagram: SimpleDB → RDS] Good: • Faster than SimpleDB • Could scale the storage Bad: • Realized it still would not handle our dataset • Inserts were too slow So then what?
    • MySQL on EC2 to save the day! [Diagram: SimpleDB → RDS → EC2 + MySQL] Good: • Faster than RDS • Increased insert performance • Using some cheats to get the insert rate up Bad: • Insert performance still not good enough. So then what?
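The slide doesn't say which "cheats" raised the insert rate. One widely used technique is to send many rows per statement and commit once per batch rather than once per row, so the per-transaction overhead is paid far less often. The sketch below illustrates that general idea only; it uses Python's built-in sqlite3 module as a stand-in for MySQL, and the `samples` table, its columns, and the batch size are hypothetical.

```python
import sqlite3
import time

# Hypothetical table for illustration; not CopperEgg's schema.
INSERT_SQL = "INSERT INTO samples (ts, value) VALUES (?, ?)"


def insert_one_by_one(conn, rows):
    """One INSERT and one commit per row: transaction overhead on every row."""
    for row in rows:
        conn.execute(INSERT_SQL, row)
        conn.commit()


def insert_batched(conn, rows, batch_size=1000):
    """executemany() over a slice of rows, then a single commit per batch."""
    for start in range(0, len(rows), batch_size):
        conn.executemany(INSERT_SQL, rows[start:start + batch_size])
        conn.commit()


def timed(insert_fn, rows):
    """Load rows into a fresh in-memory database; return (row count, seconds)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE samples (ts INTEGER, value REAL)")
    start = time.perf_counter()
    insert_fn(conn, rows)
    elapsed = time.perf_counter() - start
    count = conn.execute("SELECT COUNT(*) FROM samples").fetchone()[0]
    conn.close()
    return count, elapsed


if __name__ == "__main__":
    rows = [(ts, ts * 0.5) for ts in range(20_000)]
    for fn in (insert_one_by_one, insert_batched):
        count, elapsed = timed(fn, rows)
        print(f"{fn.__name__}: {count} rows in {elapsed:.3f}s")
```

An in-memory database understates the gap, since committing to disk is where per-row transactions hurt most. With MySQL, the analogous moves are multi-row `INSERT ... VALUES (...), (...)` statements and fewer, larger transactions.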
    • MySQL on Rackspace Cloud [Diagram: SimpleDB → RDS → EC2 + MySQL → Rackspace + MySQL] Good: • Faster than Amazon (CPU) • Seemed cheaper Bad: • No easy way to scale across different zones or regions • No way to expand storage per instance (whole instance only - costly!) • Then we got the bill: they charge for data transfer between instances - OUCH So then what?
    • Back to Amazon! [Diagram: SimpleDB → RDS → EC2 + MySQL → Rackspace + MySQL → EC2, EBS, MongoDB] Why did we move back? • Lots of great services: S3, EC2, EBS, Route 53, ELB (we use all of these) • Even more: SQS, SES, etc. • Multiple regions and availability zones • Scale-as-you-need: storage, memory, CPU, redundancy • Documentation. We're still happy with this (9 months and running).
    • What's this NoSQL thing? Realized maybe MySQL was not the best choice • How about a NoSQL database? • So we tested and measured every one we thought was worth looking at: • Redis • Tokyo Tyrant, Kyoto Cabinet • Cassandra • MongoDB • etc, etc, etc (there are a lot)
    • MongoDB won: MongoDB won the award - why? • Redundant • Scalable • Persistent data store • Handles large amounts of data • Awesome user community • Vendor support • Open source • Lots of momentum
    • Where are we now? Needed a way to monitor our site: • Requirements: • Know right away when problems occur • See into the performance of the system • See historical trends as we grow the business • Super real-time product needs super real-time monitoring • Not satisfied with existing solutions • slow updates (1m or 5m is way too slow - not real-time) • not 'cloud friendly' • pain to maintain • some are pricey
    • Not real-time? Then what *is* real-time? • The amount of time you can comfortably have poor service before someone notices and changes their behavior. • Example: • A web site can only be slow/unavailable for a few seconds before people leave • Email can be slow for tens of seconds before people get grumpy (or less, depending on the people!) • Twitter - well, we'll leave that one for you to decide. So, if seconds are the yardstick for measuring poor performance, why do we monitor every 1 or 5 minutes?
    • CPU Usage: 5 minute sampling [Chart: CPU usage (%) from 5:00 PM to 5:05 PM] Here's what a 5-minute sample provides: • Doesn't look like much is happening • Users should not be complaining, right?
    • CPU Usage: 1 minute sampling [Chart: CPU usage (%) from 5:00 PM to 5:05 PM] Same data - 1-minute sampling • Looks like there was some kind of CPU activity at 5:01pm - 5:02pm • Still no issue though, right?
    • CPU Usage: 5 second sampling [Chart: CPU usage (%) from 5:00 PM to 5:05 PM] Same data - 5s sampling • It becomes clear there was something happening: • between 5:01:10pm and 5:01:25pm
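The difference between the three CPU charts above comes from averaging: each sample reports the mean over its interval, so a short spike is diluted by everything around it. A small sketch with made-up numbers (a 10% baseline and a 15-second burst at 100%, placed to mirror the 5:01:10 to 5:01:25 window; this is not the data behind the slides) reproduces the same pattern:

```python
def cpu_trace(seconds=300, baseline=10.0, spike=100.0, spike_start=70, spike_end=85):
    """Hypothetical 1-second CPU readings (percent) covering 5:00:00 to 5:04:59.

    Flat baseline plus one burst from spike_start (inclusive) to spike_end
    (exclusive), i.e. 5:01:10 to 5:01:25 with the default arguments.
    """
    return [spike if spike_start <= t < spike_end else baseline for t in range(seconds)]


def downsample(samples, window):
    """Mean of each consecutive, non-overlapping window of `window` readings.

    This mimics a poller that reports one averaged value per interval.
    """
    points = []
    for start in range(0, len(samples), window):
        chunk = samples[start:start + window]
        points.append(sum(chunk) / len(chunk))
    return points


if __name__ == "__main__":
    trace = cpu_trace()
    for label, window in (("5 min", 300), ("1 min", 60), ("5 s", 5)):
        points = downsample(trace, window)
        print(f"{label:>5}: {len(points):3d} points, peak {max(points):5.1f}%")
```

With these numbers, the 5-minute view is a single 14.5% point, the 1-minute view peaks at 32.5% for the 5:01 minute, and only the 5-second view shows the CPU pinned at 100% (three consecutive points).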
    • So we rolled our own: RevealCloud • Turns out a lot of people agreed with us • Highlights: • Built on our super real-time analytics engine • Updates in seconds vs. minutes • Easy to install, no config required • Great looking and usable interface • Works anywhere (public/private cloud, VM, bare metal)
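How RevealCloud collects its data isn't covered in the deck. As a generic illustration of "updates in seconds" on a Linux host, the sketch below reads the aggregate `cpu` line of `/proc/stat` every few seconds and turns the difference between two snapshots into a CPU-busy percentage. The 5-second default interval and all function names are assumptions for this example, not part of the product.

```python
import time


def parse_cpu_line(text):
    """Return (idle_ticks, total_ticks) from the aggregate 'cpu' line of /proc/stat.

    Field order: user nice system idle iowait irq softirq steal guest guest_nice.
    guest and guest_nice are already counted in user and nice, so only the
    first eight fields are summed. Idle time is idle + iowait.
    """
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            values = [int(v) for v in fields[1:9]]
            idle = values[3] + values[4]
            return idle, sum(values)
    raise ValueError("no aggregate 'cpu' line found")


def cpu_percent(before, after):
    """Busy percentage between two (idle, total) snapshots."""
    idle_delta = after[0] - before[0]
    total_delta = after[1] - before[1]
    if total_delta <= 0:
        return 0.0
    return 100.0 * (total_delta - idle_delta) / total_delta


def read_proc_stat(path="/proc/stat"):
    """Take one snapshot from the live kernel counters (Linux only)."""
    with open(path) as f:
        return parse_cpu_line(f.read())


def watch(interval=5.0, rounds=3):
    """Print one CPU reading every `interval` seconds, `rounds` times."""
    prev = read_proc_stat()
    for _ in range(rounds):
        time.sleep(interval)
        cur = read_proc_stat()
        print(f"cpu busy: {cpu_percent(prev, cur):5.1f}%")
        prev = cur
```

On Linux, `watch(interval=5.0, rounds=12)` prints one reading every 5 seconds for a minute. `parse_cpu_line` and `cpu_percent` are kept separate from file access so they can be checked against fixed text without a live `/proc`.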
    • copperegg Questions
    • copperegg Demo
    • Demo Screenshots