Architecture to Scale. DONN ROCHETTE at Big Data Spain 2012

Session presented at Big Data Spain 2012 Conference
16th Nov 2012
ETSI Telecomunicacion UPM Madrid
www.bigdataspain.org
More info: http://www.bigdataspain.org/es-2012/conference/architecture-to-scale/donn-rochette

Transcript

  • 1. Big Data Spain 2012: The International Big Data Conference in Spain. Madrid, 16th Nov 2012, ETSI Telecomunicacion UPM, www.bigdataspain.org
  • 2. Architecture for Scale: A Case Study
  • 3. Who am I?
      • CTO and Co-Founder of AppFirst
      • Application Virtualization
          ◦ UNIX server applications
          ◦ Solaris 2.6 applications on Solaris 10
      • Real-time Operating Systems
          ◦ Hubble Space Telescope
          ◦ Under Wing Armaments
          ◦ Medical Instruments
      • Launch Processing System
          ◦ NASA Kennedy Space Center
          ◦ Hardware and Software Design of Ground-based Launch Control Systems
  • 4. AppFirst Collects, Aggregates and Correlates Information from Production Applications
      • NYC-based software start-up
      • Application
          ◦ Aggregate and summarize data from tens of thousands of remote servers
          ◦ Provide information for web apps and APIs
      • A Few Metrics
          ◦ 45k to 50k summaries per minute
          ◦ GBs per remote server per day
          ◦ TBs of new data daily
          ◦ Query and retrieve information in < 100 ms
          ◦ Data stored for up to 1 year
  • 5. Simplified Architecture
  • 6. Design for Scale
      • Micro scale
          ◦ Application components
      • Macro scale
          ◦ The entire service
  • 7. Micro Scale: Data Processing
      Requirements:
      • Process a constant stream of data
          ◦ 3 snapshots per minute, per remote server
      • Create summaries in real time
          ◦ Up to 1 minute behind wall-clock time
      • Provide query results in < 100 ms
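As a rough check on the stream this slide describes, the sketch below turns the stated rate of 3 snapshots per minute per remote server into an average arrival rate for the fleet sizes listed on the capacity slide. It is plain arithmetic under one assumption: arrivals are spread evenly across each minute, so real bursts (for example, many servers reporting in the same second) would peak higher.

```python
# Average ingest rate for the data stream described on the slide.
# Assumes snapshots are spread evenly over each minute (no bursts).
SNAPSHOTS_PER_MINUTE_PER_SERVER = 3  # figure from the slide


def snapshots_per_second(servers: int) -> float:
    """Average snapshot arrivals per second for a fleet of remote servers."""
    return servers * SNAPSHOTS_PER_MINUTE_PER_SERVER / 60


# Fleet sizes from the "Application Capacity" slide.
for servers in (100, 1_000, 10_000, 100_000):
    print(f"{servers:>7,} servers -> {snapshots_per_second(servers):>6,.0f} snapshots/s")
```

At 10,000 remote servers this averages 500 snapshots per second, and at 100,000 it is 5,000, all of which must be summarized within a minute of wall-clock time.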
  • 8. Micro Scale: Efficiency
      We found that:
      • Summaries of the data were needed in order to keep queries < 100 ms
          ◦ Server
          ◦ Process
          ◦ Process sets
          ◦ Topology
      • Time series were needed for each summary type
          ◦ Minute
          ◦ Hour
          ◦ Day
      We tried:
      • Flat files
      • Network file systems
      • Distributed file systems
      • Relational databases
      • NoSQL key-value store
      • Memory-based SQL databases
      • Distributed shared memory
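The minute/hour/day time series on this slide can be sketched as a rollup over keyed samples. This is a minimal illustration, not AppFirst's code: the `(summary type, id)` keys, the `rollup` helper, and the choice to sum values are all assumptions. Sums are used because they compose, so the hourly series can be built from minute totals without rereading raw data; an average would also need a count carried alongside it.

```python
from collections import defaultdict
from datetime import datetime, timezone


def bucket_start(ts: datetime, granularity: str) -> datetime:
    """Truncate a timestamp to the start of its minute, hour, or day."""
    if granularity == "minute":
        return ts.replace(second=0, microsecond=0)
    if granularity == "hour":
        return ts.replace(minute=0, second=0, microsecond=0)
    if granularity == "day":
        return ts.replace(hour=0, minute=0, second=0, microsecond=0)
    raise ValueError(f"unknown granularity: {granularity!r}")


def rollup(samples, granularity):
    """Sum (key, timestamp, value) samples into {(key, bucket_start): total}."""
    totals = defaultdict(float)
    for key, ts, value in samples:
        totals[(key, bucket_start(ts, granularity))] += value
    return dict(totals)


# Hypothetical summary key: (summary type, identifier).
web01 = ("server", "web-01")
utc = timezone.utc
samples = [
    (web01, datetime(2012, 11, 16, 10, 0, 5, tzinfo=utc), 1.0),
    (web01, datetime(2012, 11, 16, 10, 0, 45, tzinfo=utc), 2.0),
    (web01, datetime(2012, 11, 16, 10, 1, 10, tzinfo=utc), 4.0),
]
by_minute = rollup(samples, "minute")
# Build the hourly series from minute totals, not from raw samples.
by_hour = rollup([(k, ts, v) for (k, ts), v in by_minute.items()], "hour")
print(by_minute)
print(by_hour)
```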
  • 9. Micro Scale: We learned the hard way
      "Tape is Dead. Disk is Tape. Flash is Disk. RAM Locality is King." (Jim Gray, Microsoft, December 2006)
  • 10. Micro Scale: Solution
      Aggregation:
      • HPC pipeline processing model
      • RAM-based data model
      • Queues as message bus
      • Stateless processing
      • Adaptive control
      • Queries are fully abstracted
      Horizontal scale may require that you revisit your design.
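A minimal sketch of the pipeline model on this slide, assuming Python threads and in-process `queue.Queue` objects as a stand-in for a networked message bus such as RabbitMQ. Each stage applies a stateless function to each message and forwards the result; the `parse` and `summarize` functions and the comma-separated snapshot format are hypothetical.

```python
import queue
import threading

STOP = object()  # sentinel that tells a stage to shut down


def run_stage(fn, inbox: queue.Queue, outbox: queue.Queue) -> None:
    """Apply a stateless function to every message; keep nothing between messages."""
    while True:
        msg = inbox.get()
        if msg is STOP:
            outbox.put(STOP)  # propagate shutdown downstream
            return
        outbox.put(fn(msg))


def parse(raw: str) -> dict:
    # Hypothetical snapshot format: "server,metric,value"
    server, metric, value = raw.split(",")
    return {"server": server, "metric": metric, "value": float(value)}


def summarize(snapshot: dict) -> tuple:
    # Hypothetical summary: key the value by (server, metric)
    return ((snapshot["server"], snapshot["metric"]), snapshot["value"])


raw_q, parsed_q, summary_q = queue.Queue(), queue.Queue(), queue.Queue()
stages = [
    threading.Thread(target=run_stage, args=(parse, raw_q, parsed_q)),
    threading.Thread(target=run_stage, args=(summarize, parsed_q, summary_q)),
]
for t in stages:
    t.start()

for line in ["web-01,cpu,0.5", "web-02,cpu,0.7"]:
    raw_q.put(line)
raw_q.put(STOP)

pipeline_results = []
while (item := summary_q.get()) is not STOP:
    pipeline_results.append(item)
for t in stages:
    t.join()
print(pipeline_results)
```

Because a stage holds no state between messages, adding capacity means starting more copies of a stage on the same queues. With more than one copy per stage, output order is no longer guaranteed and shutdown needs one `STOP` per copy.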
  • 11. Micro Scale: We all know we need to scale horizontally
      Cluster:
      • Use components that cluster
      • Don't do backups; use replication
      • Redis, memcached, RabbitMQ, and HBase can be clustered
      • PostgreSQL and MySQL don't really cluster
      Stateless:
      • Any data processing with any time constraint
      • Processes can be run on any server
      • Processes can be migrated
      • Multiple processes can be added as load varies
      • All data stored in distributed shared memory
      • Message passing between components
      • Send keys and not data
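"Send keys and not data" can be illustrated as follows, with a plain dict standing in for a clustered store such as Redis or memcached. The producer writes the payload once and enqueues only its key; a worker fetches what it needs by key, so it carries no state and any worker can take any message. The key format, `publish_snapshot`, and `summarize_worker` are hypothetical.

```python
# Plain dict standing in for a clustered shared-memory store (e.g. Redis).
shared_store = {}


def publish_snapshot(key: str, values: list) -> str:
    """Write the payload once to the shared store; return only the key to enqueue."""
    shared_store[key] = values
    return key


def summarize_worker(worker_id: str, key: str) -> tuple:
    """Stateless: everything needed is fetched by key, so any worker can run it."""
    values = shared_store[key]
    return worker_id, key, sum(values) / len(values)


# The "messages" on the bus are just keys.
messages = [
    publish_snapshot("web-01:2012-11-16T10:00", [0.25, 0.75]),
    publish_snapshot("web-02:2012-11-16T10:00", [1.0, 3.0]),
]
workers = ["worker-a", "worker-b"]
# Round-robin keys across workers; adding a worker only changes this list.
key_results = [
    summarize_worker(workers[i % len(workers)], key) for i, key in enumerate(messages)
]
print(key_results)
```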
  • 12. Macro Scale: Application Capacity
      Load:
      • The most significant load comes from remote servers
      • User interaction, APIs, and queries do not load the system as much as remote servers
      • Support 100, 1,000, 10,000, and 100,000 remote servers
      Will a design that supports 10,000 remote servers scale to support 100,000 remote servers?
  • 13. Infinite Scale
      • Paralyzes the design team
      • Fosters bad behavior
      • Unrealistic expectations
      • Developers forced to take unrealistic action
      But... you don't want to say no to the business:
      • The whole purpose is to add users
      • When the business brings a customer with 10,000 servers, you want to say: bring it on
  • 14. Macro Scale: Capacity
      We started with a snapshot:
      • Supported 1,000 remote servers
      • Micro scale results made it possible to scale out
          ◦ Fairly flexible application component design
      • Scale out to 10,000 remote servers
          ◦ This is a financial calculation
      • Scaled out in linear fashion
          ◦ Data processing
          ◦ Storage
      • Started in linear fashion, then determined actual requirements
  • 15. Macro Scale Solution: The Pod
      (Diagram: Pod 0, Pod 1)
      Pod architecture:
      • Segmented infrastructure along the lines of load sources
      • Create infrastructure to support specific load
      • Instantiate additional infrastructure with additional load
      • When a pod gets to 85-90% capacity, spin out a new pod
      • Capacity of a pod is a financial calculation
      • Scale within a pod in 1,000-server increments
      • Need to automate the deployment of a pod
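The pod rules on this slide can be expressed as a small placement policy. This sketch assumes a fixed `POD_CAPACITY` of 10,000 remote servers (the slide says pod capacity is a financial calculation, so the number is only a placeholder) and takes the low end of the 85-90% spin-out range. Infrastructure grows within a pod in 1,000-server increments, and new remote servers go to the newest pod. `Pod`, `Fleet`, and `assign` are hypothetical names.

```python
from dataclasses import dataclass, field

POD_CAPACITY = 10_000   # hypothetical: remote servers one pod is sized for
SPIN_OUT_PERCENT = 85   # spin out a new pod at 85% of capacity
INCREMENT = 1_000       # grow infrastructure within a pod in 1,000-server steps


@dataclass
class Pod:
    name: str
    servers: int = 0              # remote servers assigned to this pod
    provisioned: int = INCREMENT  # servers the deployed infrastructure can handle

    def add_server(self) -> None:
        self.servers += 1
        if self.servers > self.provisioned:
            # Scale within the pod by one increment, never past its capacity.
            self.provisioned = min(self.provisioned + INCREMENT, POD_CAPACITY)


@dataclass
class Fleet:
    pods: list = field(default_factory=lambda: [Pod("pod-0")])

    def assign(self) -> Pod:
        """Place a new remote server, spinning out a pod when the newest is full enough."""
        pod = self.pods[-1]
        if pod.servers * 100 >= SPIN_OUT_PERCENT * POD_CAPACITY:
            # In a real system this is the automated pod deployment step.
            pod = Pod(f"pod-{len(self.pods)}")
            self.pods.append(pod)
        pod.add_server()
        return pod


fleet = Fleet()
for _ in range(12_000):
    fleet.assign()
print([(p.name, p.servers, p.provisioned) for p in fleet.pods])
# [('pod-0', 8500, 9000), ('pod-1', 3500, 4000)]
```

Integer arithmetic is used for the threshold check to avoid floating-point edge cases at exactly 85%.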
  • 16. Adaptive Control
      • You can't react fast enough
      • Scale out
      • Scale back
      • Migrate
      The Pod Rocks:
      • Isolated
      • Distributed
      • Located where needed
      • Behind the firewall
      Metrics are king:
      • Business metrics
      • Application metrics
      Time Series Data:
      • Issues relate to a specific time
      • Complete state information for any given minute
      • You don't know what info is needed before a problem occurs: keep all data, every minute
      Don't trust the data:
      • Clocks are skewed
      • Encodings fail
      • Save all bad data and replay
      • Think defensively
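The "Don't trust the data" points can be sketched as a defensive ingest step: parse each snapshot, reject timestamps too far from the receive time, and keep every rejected record, raw, for later replay instead of dropping it. The `server,iso_timestamp,value` record format, the `ingest` function, and the 5-minute `MAX_SKEW` tolerance are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone

MAX_SKEW = timedelta(minutes=5)  # hypothetical tolerance for remote clock skew

good = {}        # (server, minute bucket) -> list of values
quarantine = []  # (raw bytes, received_at, reason): kept so it can be replayed


def ingest(raw: bytes, received_at: datetime) -> None:
    """Accept one 'server,iso_timestamp,value' snapshot or quarantine it."""
    try:
        server, ts_text, value_text = raw.decode("utf-8").split(",")
        ts = datetime.fromisoformat(ts_text)
        value = float(value_text)
        skew = abs(ts - received_at)  # TypeError if ts carries no timezone
    except (ValueError, TypeError) as exc:  # UnicodeDecodeError is a ValueError
        quarantine.append((raw, received_at, f"parse: {exc}"))
        return
    if skew > MAX_SKEW:
        quarantine.append((raw, received_at, "clock skew"))
        return
    minute = ts.replace(second=0, microsecond=0)
    good.setdefault((server, minute), []).append(value)


now = datetime(2012, 11, 16, 10, 0, 30, tzinfo=timezone.utc)
ingest(b"web-01,2012-11-16T10:00:05+00:00,0.5", now)  # accepted
ingest(b"web-02,2012-11-16T09:00:00+00:00,0.7", now)  # an hour off: clock skew
ingest(b"\xff\xfe", now)                              # bad encoding
print(good)
print([reason for _, _, reason in quarantine])
```

Replay would then mean feeding the quarantined raw bytes back through `ingest` once the parser or clock problem has been fixed.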
  • 17. Conclusions
      • Stateless data
          ◦ Key to horizontal scale
      • Disk is tape
          ◦ RAM-based design is critical, not optional
      • Cluster
          ◦ Use components that cluster, not just master/slave
      • Design for infinite scale does not work
      • The pod approach is an answer for infinite scale
  • 18. Thank You! Donn Rochette, donn@appfirst.com, www.appfirst.com
