Hadoop World 2011: Architecting a Business-Critical Application in Hadoop - Stephen Daniel, NetApp
 


NetApp is in the process of moving a petabyte-scale database of customer support information from a traditional relational data warehouse to a Hadoop-based application stack. This talk will explore the application requirements and the resulting hardware and software architecture. Particular attention will be paid to trade-offs in the storage stack, along with data on the various approaches considered, benchmarked, and the resulting final architecture. Attendees will learn a range of architectures available when contemplating a large Hadoop project and some of the process used by NetApp to choose amongst the alternatives.



  • AutoSupport (resident in Data ONTAP, the operating system of every NetApp storage system) constantly monitors, troubleshoots, and reports on the health of NetApp systems. In addition to using AutoSupport for case generation and part dispatch, NetApp's risk-prognosis ecosystem (developed through innovations in people, process, and technology) delivers exemplary storage uptime and customer satisfaction. Risks handled include issues in configuration, interoperability, and other errors introduced into the storage system by unintentional operations. The NetApp support site has knowledge-base articles and support bulletins to help SAMs (Support Account Managers) and FSEs (Field Support Engineers) drive adoption and awareness and to help customers actively mitigate risks.
  • The current data warehouse will reach the limits of its capacity and processing capabilities for future Data ONTAP releases. SLAs are being missed. The current environment has limited reporting capabilities, despite a large demand for ASUP reporting. Processing all performance data for analysis is not possible due to the size and scale of the data. Data is doubling every 16 months.
  • Proactive support: predict failure probabilities from text events, performance changes, and lifetime usage. Product analysis: feature usage; per-segment variations. Capacity planning: growth trends; seasonality factors. Up-sell and cross-sell models.

Hadoop World 2011: Architecting a Business-Critical Application in Hadoop - Stephen Daniel, NetApp Presentation Transcript

  • 1. Architecting a Business-Critical Application in Hadoop. Stephen Daniel, Technical Director; Marty Mayer, Sr. Manager, AutoSupport
  • 2. Agenda NetApp: Drowning in Data Technology Assessment Business Drivers to Choose E-Series Solution Architecture Performance Benchmarks Best Practices Questions 2
  • 3. The AutoSupport Family: the foundation of NetApp support strategies. Catch issues before they become critical; secure, automated "call-home" service; system monitoring and nonintrusive alerting; RMA requests without customer action; enables faster incident management. Customer quote: "My AutoSupport Upgrade Advisor tool does all the hard work for me, saving me 4 to 5 hours of work per storage system and providing an upgrade plan that's complete and easy to follow."
  • 4. AutoSupport Capabilities (architecture diagram): NetApp storage systems send AutoSupport messages (HTTPS or email) into the AutoSupport database. Downstream consumers: auto replacement parts and auto case creation (reactive); customer assessment and optimization of environments (proactive); the risk detection and automation engine; sizing and modeling; and the My AutoSupport customer portal for storage administrators (proactive and predictive). Inputs cover the customer install base and NetApp and partner usage.
  • 5. Business Challenges (pipeline: gateways, ETL, data warehouse, reporting). Gateways: 600K ASUPs every week; 40% arriving over the weekend; 0.5% growth week over week. ETL: data must be parsed and loaded within 15 minutes. Data warehouse: only 5% of the data goes into the warehouse; the rest is unstructured, yet it is growing 6-8 TB per month; the Oracle DBMS is struggling to scale, and maintenance and backups are challenging. Reporting: numerous mining requests are currently unsatisfied; huge untapped potential of valuable information for lead generation, supportability, and BI; no easy way to access the unstructured content. Finally, the incoming load doubles every 16 months!
  • 6. Incoming AutoSupport Volumes and TB Consumption (chart: actual and projected flat-file storage requirement, Jan 2005 to Jan 2016). At the projected current rate of growth, the total storage requirement will double every 16 months. As of June 2011: ~600,000 events archived each week; ~3 TB of disk space used each week; events growing at 40% year over year; disk use growing faster still; products and features expanding. Cost model: more than $15M per year in ecosystem costs.
  • 7. Big Data Is Expensive. Growth rates (CAGR): data +68%; cost per byte -30%; net cost +30%. The budget is flat.
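The 16-month doubling quoted on slide 6 follows directly from the 68% CAGR on this slide; a quick sanity check (a sketch, not from the deck):

```python
import math

# Assumption restated from slide 7: data volume grows 68% per year.
data_growth = 1.68  # yearly multiplier for data volume

# Time to double at that compound rate, converted to months.
doubling_months = 12 * math.log(2) / math.log(data_growth)
print(f"data doubles every {doubling_months:.0f} months")  # -> 16 months
```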
  • 8. Problem Summary. 1. Data is growing at 68% CAGR. 2. The current implementation will not survive much longer: we will fail to meet SLAs on ingest of new data, and to meet business-critical SLAs we will have to limit the scope of the data warehouse. 3. Many new opportunities and requirements.
  • 9. New Functionality Needed (chart plotting workloads by turnaround time, seconds to weeks, and data volume, gigabytes to petabytes): product analysis, service performance planning, cross-sell and up-sell, customer intelligence, sales, license management, proactive support, customer self-service, product development.
  • 10. Predictive Analytics Examples. Proactive support: predict failure probabilities from text events, performance changes, and lifetime usage. Product analysis: feature usage; per-segment variations. Capacity planning: growth trends; seasonality factors. Up-sell and cross-sell models.
  • 11. Technology Assessment
  • 12. Requirements Used for POC and RFP: cost-effective; highly scalable; adaptive; new analytical capabilities.
  • 13. POC Tests. Log data: report analysis for an event across the entire install base (25% of the install base and 2 months of data used for benchmarks); 6 months to 1 year of history; I/O bound. Counter Manager: analysis generally restricted to one system's or one cluster's data for a single month (2 days of data from 25% of the install base used for the benchmark); trending across the install base is generally rare and ad hoc; more CPU bound (some tools query large numbers of counters).
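The log-data test (find an event pattern across the install base) maps naturally onto MapReduce. A minimal sketch of the map and reduce steps in plain Python; the log format (system ID followed by the EMS message) and the `disk.failed` event name are hypothetical, not from the deck:

```python
import re
from collections import Counter

def map_events(lines, pattern=re.compile(r"disk\.failed")):
    """Map step: emit (system_id, 1) for each log line matching the event."""
    for line in lines:
        system_id, _, message = line.partition(" ")
        if pattern.search(message):
            yield system_id, 1

def reduce_counts(pairs):
    """Reduce step: sum occurrences per system."""
    totals = Counter()
    for system_id, n in pairs:
        totals[system_id] += n
    return dict(totals)

# Tiny illustrative log sample (invented for the sketch).
logs = [
    "sys-001 kern.disk.failed: disk 0b.43 failed",
    "sys-001 kern.shelf.ok: shelf online",
    "sys-002 kern.disk.failed: disk 1a.12 failed",
]
print(reduce_counts(map_events(logs)))  # {'sys-001': 1, 'sys-002': 1}
```

In a real deployment the same two functions would run as a Hadoop Streaming mapper and reducer over HDFS files rather than an in-memory list.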
  • 14. POC Environment 15
  • 15-16. Prime Hadoop Use Cases in POC (table: use case, workload type, current capabilities, how Hadoop can help). Logs (EMS), I/O bound: find occurrences of a pattern across all log files in the last 6 months. Current capabilities: one month of data is worth 24 billion records; of these, some 100 million records are loaded into the data warehouse per month, and loading a week takes 4 days; no ad hoc capability exists to mine the pending records. How Hadoop can help: the POC shows a 10-node cluster could process one month of data within 20 minutes. CM (Counter Manager), CPU bound: find hot disks by disk type, system model, etc. Current capabilities: up to 10 million records in a single CM file; 200 billion records in a month; no capability exists today in the backend infrastructure to process these. How Hadoop can help: the POC achieved a throughput of 3 million records per second; a 100-node cluster is projected to process one month of data in 1.8 hours.
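The 1.8-hour projection for the CM workload is a straight linear extrapolation from the 10-node POC; a back-of-envelope check (assuming perfectly linear scaling, which real clusters rarely achieve):

```python
# Figures from the slide above.
records_per_month = 200e9  # 200 billion CM records per month
poc_throughput = 3e6       # 3 million records/s on the 10-node POC cluster
poc_nodes, target_nodes = 10, 100

poc_hours = records_per_month / poc_throughput / 3600
target_hours = poc_hours * poc_nodes / target_nodes
print(f"10 nodes: {poc_hours:.1f} h; 100 nodes: {target_hours:.2f} h")
```

This lands at roughly 18.5 hours on the POC cluster and about 1.85 hours at 100 nodes, matching the deck's 1.8-hour projection to rounding.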
  • 17. Solution Architecture
  • 18. ASUP.Next Hadoop Architecture (diagram): AsupIngest tools feed logs, performance data, and raw config through Flume and a REST interface into HDFS, alongside config and lookup ingest; Pig jobs subscribe to and analyze the data, feeding metrics, analytics, and BI.
  • 19. NetApp Open Solution for Hadoop: easy to deploy, manage, and scale; performance, resilience, density. Performance: bandwidth for streaming; IOPS for metadata; reduced cluster network congestion. Capacity and density: 4 servers and 120 TB fit in 8U; fully serviceable storage system. Reliability: hardware RAID and hot swap prevent job restarts in case of media failure; reliable metadata (NameNode); enterprise-class fit and finish. Enterprise-class Hadoop.
  • 21. NetApp Storage Solution Architecture. Key attributes: storage is protected by in-box RAID (a shared spare pool defers drive replacement; rebuilds do not consume network bandwidth); storage is striped (maximizing performance by minimizing unequal storage utilization); with reliable storage, the HDFS replication count can be 2 (fewer disks; less space, power, cooling, cost, ...).
  • 22. NetApp Storage Solution Architecture. Primary questions: performance? cost?
  • 23. NetApp Storage Solution Architecture (results are preliminary). Performance concerns: initial testing has focused on using TestDFSIO.
  • 24. NetApp Storage Solution Architecture (results are preliminary). Performance concerns: initial testing has focused on using TestDFSIO. Per-disk comparison: 14 disks per server in the array configuration; 6 disks per server direct-attached.
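TestDFSIO ships in the Hadoop test jar; a typical invocation from the Hadoop 1.x era looks like the following (the file count and size here are illustrative, not the parameters used in this benchmark):

```shell
# Write 16 files of 1000 MB each, read them back, then clean up.
hadoop jar $HADOOP_HOME/hadoop-*test*.jar TestDFSIO -write -nrFiles 16 -fileSize 1000
hadoop jar $HADOOP_HOME/hadoop-*test*.jar TestDFSIO -read  -nrFiles 16 -fileSize 1000
hadoop jar $HADOOP_HOME/hadoop-*test*.jar TestDFSIO -clean
```

Aggregate throughput and average I/O rate are reported in the job output and appended to TestDFSIO_results.log.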
  • 25. NetApp Storage Solution Architecture: minimizing TCO. Disk rebuild: handled in the controller; minimal impact on performance; no network bandwidth consumed. Server uptime: very high. Hardware maintenance: swapping out dead disks is routine, not exceptional; swap-out of stateless servers is painless.
  • 26. Conclusions
  • 27. Take Aways NetApp Assessed multiple traditional DB technologies to solve it’s Big Data problem and determined Hadoop was the best fit Moved from direct attach disks to array-based storage to improve TCO The overall architecture supports scale out growth 30
  • 28. © 2011 NetApp, Inc. All rights reserved. No portions of this document may be reproduced without prior written consent of NetApp, Inc. Specifications are subject to change without notice. NetApp, the NetApp logo, Go further, faster, AutoSupport, Data ONTAP, NOW, and Snapshot are trademarks or registered trademarks of NetApp, Inc. in the United States and/or other countries. Symantec is a registered trademark of Symantec Corporation. All other brands or products are trademarks or registered trademarks of their respective holders and should be treated as such.