A journey in the public clouds
Presented at the NYC IASA Chapter, 6/28/2011.

    A journey in the public clouds: Presentation Transcript

    • A Journey In The Public Clouds With Datadog Alexis Lê-Quôc (Product Guy) at Datadog IASA New York Chapter June 28th, 2011
    • What I’m going to talk about ‣What we do and for whom ‣The kind of data we deal with ‣Our architecture ‣Our architecture in a public cloud (AWS) ‣What we learned ‣Q+A
    • What we do: SaaS Platform for Aggregation, Correlation, Collaboration, for Dev & Ops
    • The Mess
      ‣ Too many data streams, too many silos
      ‣ Too many choices to make, too often
      ‣ Only getting worse as SaaS silos multiply
      ‣ Separate Dev and Ops teams, looking at separate data streams
      ‣ Diagram labels: Usage Analytics, IaaS / PaaS, Issue Resolution, Servers and Devices, Dev team, Ops team, Applications, Cap. Planning, SDLC support, Monitoring, Hosting, Asset Mgmt, CDNs; flows labeled metrics, events, changes, choices, billing, advice + feedback
      ‣ Data-driven decision making in IT is rarely happening: too slow, too expensive, requires too much discipline.
    • Datadog to the rescue: We Simplify
      ‣ Data in: system metrics, quality metrics, SaaS data, capacity metrics, usage analytics, cloud billing, code metrics, config changes, IaaS pricing, business metrics, perf. data, vendors info, curated metadata
      ‣ Delivered: key metrics, visibility, recommendations
      ‣ To: Alice (Dev), Bob (Ops), Charlie (CEO)
      ‣ Aggregation, Correlation, Collaboration
    • Concretely
    • etc. Aggregation
    • Aggregation
    • Correlation https://app.datad0g.com/dash/dash/1000#/date_range/1308057152698-1308143552698
    • Collaboration
    • What Architecture For What Kind Of Data?
    • Events and Metrics
      ‣ Events: User comments, Alert, Build, Batch job
      ‣ Metrics: Unique visitors, Load, Transaction duration
      ‣ etc.
    • Taxonomy
    • CLASSICS http://en.wikipedia.org/wiki/Eventual_consistency
      ‣ ACID: Atomicity, Consistency, Isolation, Durability (e.g. SQL DBs)
      ‣ BASE: Basically Available, Soft-state, Eventual consistency (e.g. DNS)
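To make the "eventual consistency" part of BASE concrete, here is a minimal sketch, assuming a last-write-wins rule and a manual sync step. The `Replica` class and its methods are hypothetical names for illustration only; they are not taken from the talk or from any particular datastore.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Toy replica holding key -> (timestamp, value). Hypothetical, for illustration."""
    store: dict = field(default_factory=dict)

    def write(self, key, value, timestamp):
        # Last-write-wins: keep whichever entry has the newest timestamp.
        current = self.store.get(key)
        if current is None or timestamp > current[0]:
            self.store[key] = (timestamp, value)

    def read(self, key):
        entry = self.store.get(key)
        return None if entry is None else entry[1]

    def sync_from(self, other):
        # Anti-entropy pass: replay every entry the other replica holds.
        for key, (timestamp, value) in other.store.items():
            self.write(key, value, timestamp)


a, b = Replica(), Replica()
a.write("load", 0.7, timestamp=1)   # an older write lands on replica a
b.write("load", 0.9, timestamp=2)   # a newer write lands on replica b
diverged = a.read("load") != b.read("load")   # replicas disagree for a while

a.sync_from(b)
b.sync_from(a)
converged = a.read("load") == b.read("load") == 0.9   # both settle on the newest write
```

Reads between the two writes and the sync can return stale values; that window is the trade-off BASE systems accept in exchange for availability.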
    • NEWCOMER
      ‣ DIRT: Data-Intensive Real-Time (e.g. real-time web)
      ‣ Bryan Cantrill: http://dtrace.org/resources/bmc/DIRT.pdf
    • Datadog = DIRT + BASE + a tiny bit of ACID
      ‣ Aggregation (BASE, DIRT): Constant data influx, Large data sets
      ‣ Correlation (BASE): On-demand visualization, Background data analysis
      ‣ Collaboration (DIRT): Real-time updates, On-the-fly data analysis
    • How It All Fits Together http://www.flickr.com/photos/tom-margie/1253798184/
    • Architecture Simplified (diagram with components labeled BASE, DIRT, and ACID)
    • The Environment
    • 4 Dimensions: Compute, Storage, Network, Management
    • ON-PREMISE TRAITS http://www.flickr.com/photos/theplanetdotcom/4879419788/sizes/l/in/photostream/
      ‣ Compute: Fast, Inelastic
      ‣ Storage: Fast, Centralized, Redundant
      ‣ Network: Fast, Localized
      ‣ Management: People-based, Full access
    • CLOUD TRAITS
      ‣ Compute: Slow, Elastic
      ‣ Storage: Slow, Jittery, Maybe durable, Low memory
      ‣ Network: “Fast”, Geo-distributed
      ‣ Management: No bare-metal, “Magic” API
    • What We Have Found
    • Network
    • Network: It Works!
      ‣ Layer 2: Virtual Domain
      ‣ Layer 3: Crude Edge Filtering
      ‣ Layer 7: Crude Load Balancing
      ‣ DNS
      ‣ CDN
    • Storage
    • Storage (chart, axes: Latency vs. Capacity)
      ‣ Amazon S3 (BASE)
      ‣ Apache Cassandra (BASE)
      ‣ PostgreSQL (ACID)
      ‣ Redis (DIRT)
      ‣ Constraints called out on the chart: Latency, Jittery, Limited throughput, Low memory
    • Low Memory http://aws.amazon.com/ec2/#instance
    • Jittery, Limited Throughput Network Block Storage (EBS) https://app.datad0g.com/dash/dash/1032#/date_range/1308608717016-1309213517016
    • Limited Throughput In Numbers: RAID 0 EBS Volumes, m1.large instances
                   DEV       tps     rd_sec/s  wr_sec/s  avgrq-sz  avgqu-sz  await   svctm  %util
      03:35:02 PM  dev8-80   375.95  23614.08  5.70      62.83     47.21     125.58  1.26   47.34
      03:35:02 PM  dev8-96   373.63  23749.65  5.64      63.58     45.55     121.91  1.22   45.72
      03:35:02 PM  dev8-112  375.28  23693.47  5.52      63.15     45.52     121.22  1.23   46.31
      03:35:02 PM  dev8-128  375.31  23721.57  7.19      63.22     56.00     148.96  1.34   50.35
      ‣ rd_sec/s: read throughput in sectors/s (Total: 368Mb/s)
      ‣ await: average wait in ms
      ‣ svctm: average service time in ms
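The total on the slide can be sanity-checked from the four `rd_sec/s` readings. The sketch below assumes 512-byte sectors (the unit sysstat uses for this column) and decimal megabits; the function and variable names are made up for this example. Under those assumptions the sum comes out within a few percent of the 368Mb/s quoted on the slide, and the exact figure depends on the unit convention and on which sample was used.

```python
SECTOR_BYTES = 512  # assumption: sar reports rd_sec/s in 512-byte sectors

# rd_sec/s per EBS volume, copied from the sar sample on the slide
read_sectors_per_sec = {
    "dev8-80": 23614.08,
    "dev8-96": 23749.65,
    "dev8-112": 23693.47,
    "dev8-128": 23721.57,
}


def total_read_megabits(sectors_by_device, sector_bytes=SECTOR_BYTES):
    """Convert summed sectors/s into decimal megabits per second."""
    total_sectors = sum(sectors_by_device.values())
    return total_sectors * sector_bytes * 8 / 1e6


total_mbps = total_read_megabits(read_sectors_per_sec)
per_volume_mbps = total_mbps / len(read_sectors_per_sec)
print(f"total: {total_mbps:.0f} Mb/s, per volume: {per_volume_mbps:.0f} Mb/s")
```

Each volume contributes roughly a quarter of the total, which is what an evenly striped RAID 0 set would be expected to show.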
    • Some Tricks
      ‣ Software RAID (RAID 0, offsite backups): limited by slowest volume
      ‣ Ephemeral volumes and offsite backups: complexity, Recovery Time Objective
      ‣ Streaming replication, S3 backups: Recovery Point Objective
      ‣ Database service (MySQL/Oracle RDS): trust; RDS outage 2 months ago
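The "limited by slowest volume" caveat for RAID 0 can be illustrated with a rough model. This is a simplified sketch, assuming data is striped evenly so that a large sequential request finishes only when every volume has delivered its share. The throughput figures are hypothetical, not measurements from the talk.

```python
def raid0_throughput(volume_throughputs):
    """Approximate sustained throughput of an evenly striped RAID 0 set.

    Every volume serves an equal share of each large request, so the set
    moves at (number of volumes) x (throughput of the slowest volume).
    """
    if not volume_throughputs:
        raise ValueError("need at least one volume")
    return len(volume_throughputs) * min(volume_throughputs)


# Hypothetical per-volume throughput in MB/s.
healthy = raid0_throughput([100, 100, 100, 100])   # all volumes behave
one_slow = raid0_throughput([100, 100, 100, 40])   # one jittery volume
```

In this model a single volume dropping from 100 to 40 pulls the whole stripe set from 400 down to 160, which is why jitter on one networked volume matters so much.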
    • Network Block Storage Is The Dark Side
      ‣ Bait For Enterprise Customers
      ‣ Hard Problem For Cloud Providers
    • Some Do’s And Don’ts
      ‣ Don’t rely on networked block storage: small data sets only, if you have to
      ‣ Don’t trust data-at-rest: copy, replicate, back up
      ‣ Do use S3 if you can: object semantics a limitation; slow but durable
    • Compute
    • Compute (chart, axes: “Performance” vs. Number of Nodes)
      ‣ ACID nodes: scale up, shard
      ‣ BASE and DIRT nodes: add more nodes
    • Some Do’s And Don’ts
      ‣ Don’t rely on scale-ups: low memory a hard limit for DBs; noisy neighbors; individual performance poor and jittery
      ‣ Scale out: first scale up, then shard; parallelize across machines; vector-processing via GPUs
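One common way to "shard" and "parallelize across machines" is to route each key to a node with a stable hash. The sketch below is only an illustration under that assumption. The talk does not say how Datadog shards its data, and `shard_for`, `partition`, and the host names are hypothetical.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index with a hash that is stable across processes."""
    # Python's built-in hash() is randomized per process for strings,
    # so a cryptographic digest is used here to get a repeatable mapping.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def partition(keys, num_shards):
    """Group keys by shard so each group can be handled by a separate machine."""
    shards = [[] for _ in range(num_shards)]
    for key in keys:
        shards[shard_for(key, num_shards)].append(key)
    return shards


hosts = [f"host-{i}.metrics" for i in range(1000)]
shards = partition(hosts, 4)
```

Plain modulo sharding like this reshuffles most keys when the shard count changes; schemes such as consistent hashing exist to limit that, at the cost of extra complexity.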
    • Management
    • An API for everything: Compute, Storage, Network, Management
    • Some Do’s And Don’ts
      ‣ Do use the AWS APIs: almost like magic; rich libraries; ever expanding
      ‣ Do use tools: e.g. Chef, Puppet, cfengine, etc.; Datadog
      ‣ Do Kill and Respawn: low-level debugging impossible; instance creation is cheap
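The "Kill and Respawn" advice amounts to a small control loop: find unhealthy instances, terminate them, and launch replacements instead of debugging in place. The sketch below uses an in-memory stand-in, `FakeCloud`, which is entirely hypothetical and makes no real AWS calls. An actual setup would go through the provider's API or a configuration tool.

```python
import itertools


class FakeCloud:
    """In-memory stand-in for a cloud API (hypothetical, not a real AWS client)."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.instances = {}  # instance id -> healthy flag

    def launch(self):
        instance_id = f"i-{next(self._ids):04d}"
        self.instances[instance_id] = True
        return instance_id

    def terminate(self, instance_id):
        del self.instances[instance_id]


def kill_and_respawn(cloud):
    """Replace every unhealthy instance with a fresh one; return the new ids."""
    unhealthy = [iid for iid, ok in cloud.instances.items() if not ok]
    replacements = []
    for iid in unhealthy:
        cloud.terminate(iid)                 # do not debug in place
        replacements.append(cloud.launch())  # instance creation is cheap
    return replacements


cloud = FakeCloud()
fleet = [cloud.launch() for _ in range(3)]
cloud.instances[fleet[1]] = False  # simulate a misbehaving node
replaced = kill_and_respawn(cloud)
```

The fleet size stays constant while the faulty node disappears, which only works well when instances are stateless or their data is replicated elsewhere.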
    • New Rules, New Tools, New Playbook, Same Fundamentals
    • Questions! http://datadoghq.com twitter: @alq