Hadoop World 2011: Architecting a Business-Critical Application in Hadoop - Stephen Daniel, NetApp
NetApp is in the process of moving a petabyte-scale database of customer support information from a traditional relational data warehouse to a Hadoop-based application stack. This talk explores the application requirements and the resulting hardware and software architecture. Particular attention is paid to trade-offs in the storage stack, along with benchmark data on the various approaches considered and the resulting final architecture. Attendees will learn about the range of architectures available when contemplating a large Hadoop project and the process NetApp used to choose among the alternatives.

Speaker notes:
  • AutoSupport (resident in Data ONTAP, the OS of every NetApp storage system) constantly monitors, troubleshoots, and reports on the health of NetApp systems. In addition to using AutoSupport for case generation and part dispatch, NetApp's risk prognosis ecosystem (developed through innovations in people, process, and technology) delivers exemplary storage uptime and customer satisfaction. Risks handled include issues in areas of configuration, interoperability, and other errors induced in the storage system by unintentional operations. The NetApp support site has knowledge-base articles and support bulletins to help SAMs (Support Account Managers) and FSEs (Field Support Engineers) drive adoption and awareness and help customers actively mitigate risks.
  • The current data warehouse will reach its limits of capacity, as well as of processing capability, for future Data ONTAP releases, leading to missed SLAs. The current environment has limited reporting capabilities, against a large demand for ASUP reporting. Processing all performance data for analysis is not feasible due to the size and scale of the data. Data is doubling every 16 months.
  • Proactive support: predict failure probabilities; text events, performance changes, lifetime usage. Product analysis: feature usage; per-segment variations. Capacity planning: growth trends; seasonality factors. Up-sell, cross-sell models.
    1. Architecting a Business-Critical Application in Hadoop. Stephen Daniel, Technical Director; Marty Mayer, Sr. Manager, AutoSupport
    2. Agenda: NetApp: Drowning in Data; Technology Assessment; Business Drivers to Choose E-Series; Solution Architecture; Performance Benchmarks; Best Practices; Questions
    3. The AutoSupport Family. The foundation of NetApp support strategies: catch issues before they become critical; secure automated "call-home" service; system monitoring and nonintrusive alerting; RMA requests without customer action; enables faster incident management. Customer quote: "My AutoSupport Upgrade Advisor tool does all the hard work for me, saving me 4 to 5 hours of work per storage system and providing an upgrade plan that's complete and easy to follow."
    4. AutoSupport Capabilities [diagram: NetApp storage systems send AutoSupport messages (HTTPS) and customer messages (email) to the AutoSupport database, covering the customer install base and NetApp and partner usage; a risk detection & automation engine drives auto replacement parts (reactive), auto case creation (reactive), customer assess & optimize environments (proactive), sizing and modeling (proactive), and the My AutoSupport customer/administrator portal (proactive and predictive)]
    5. Business Challenges
       – Gateways: 600K ASUPs every week; 40% coming over the weekend; 0.5% growth week over week
       – ETL: data needs to be parsed and loaded in 15 mins
       – Data Warehouse: only 5% of data goes into the data warehouse, the rest is unstructured, yet it is growing 6-8 TB per month; Oracle DBMS struggling to scale, maintenance and backups challenging
       – Reporting: numerous mining requests are not satisfied currently; huge untapped potential of valuable information for lead generation, supportability, and BI; no easy way to access this unstructured content
       Finally, the incoming load doubles every 16 months!
    6. Incoming AutoSupport Volumes and TB Consumption [chart: flat-file storage requirement, actual and projected total usage in TB, Jan 2005 to Jan 2016; doubles every 16 months]
       As of June 2011: ~600,000 events archived each week; ~3 TB of disk space used each week; events growing at 40% year over year; disk use growing faster; expanding products & features
       At the projected current rate of growth, the total storage requirement will double every 16 months
       Cost model: > $15M per year ecosystem costs
    7. Big Data is Expensive
       Growth rates (CAGR): data: +68%; cost/byte: -30%; net cost: +30%
       Budget is flat
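The growth figures above can be cross-checked: a 68% CAGR implies a doubling time of about 16 months, consistent with the doubling claim on slide 6. A minimal sketch; the rates are taken from the slides, and only the arithmetic is added:

```python
import math

# 68% compound annual growth rate, per slide 7
cagr = 0.68

# Doubling time: solve (1 + cagr) ** t = 2 for t in years, then convert to months
doubling_years = math.log(2) / math.log(1 + cagr)
doubling_months = 12 * doubling_years

print(round(doubling_months, 1))  # ~16.0 months, matching slide 6
```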
    8. Problem Summary
       1. Data growing at 68% CAGR
       2. Current implementation will not survive much longer
          – We will fail to meet SLAs on ingest of new data
          – To meet business-critical SLAs we will limit the scope of the data warehouse
       3. Many new opportunities / requirements
    9. New Functionality Needed [diagram: desired capabilities plotted by response time (weeks down to seconds) and data volume (gigabytes up to petabytes): product analysis, service performance planning, cross-sell & up-sell, customer intelligence, sales license management, proactive support, customer self-service, product development]
    10. Predictive Analytics Examples
        – Proactive support: predict failure probabilities; text events, performance changes, lifetime usage
        – Product analysis: feature usage; per-segment variations
        – Capacity planning: growth trends; seasonality factors
        – Up-sell, cross-sell models
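As a sketch of the capacity-planning item above (growth trends), a least-squares fit on log-scale usage recovers the monthly growth rate and implied doubling time. The usage figures below are synthetic, generated to match the deck's doubling-every-16-months regime; they are not NetApp data:

```python
import math

# Synthetic monthly storage usage (TB), growing at the monthly rate
# implied by a 16-month doubling time (~4.4% per month)
rate = 2 ** (1 / 16) - 1
usage = [100 * (1 + rate) ** m for m in range(24)]

# Ordinary least squares on (month, ln(usage)); the slope is ln(1 + monthly rate)
xs = list(range(len(usage)))
ys = [math.log(u) for u in usage]
n = len(xs)
mean_x, mean_y = sum(xs) / n, sum(ys) / n
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
        sum((x - mean_x) ** 2 for x in xs)

doubling_months = math.log(2) / slope
print(round(doubling_months, 1))  # ~16.0
```

On real data the fit would be run per customer segment or per product line, with seasonality handled separately.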
    11. Technology Assessment
    12. Requirements Used for POC & RFP: cost effective; highly scalable; adaptive; new analytical capabilities
    13. POC Tests
        Log data: report analysis for an event across the entire install base (benchmarks used 25% of the install base and 2 months of data)
        – Production scope: 6 months to 1 year of data
        – I/O bound
        Counter Manager: analysis generally restricted to one system's or one cluster's data for a single month (benchmarks used 2 days of data from 25% of the install base)
        – Trending across the install base is generally rare and ad hoc
        – More CPU bound (some tools query large numbers of counters)
    14. POC Environment [diagram]
    15. Prime Hadoop Use Cases in POC (build slide; the Logs (EMS) use case shown here is repeated in the full table on slide 16)
    16. Prime Hadoop Use Cases in POC
        Use case: Logs (EMS); find occurrences of a pattern across all log files in the last 6 months
        – Workload type: I/O bound
        – Current capabilities: one month of data is worth 24 B records; of this, some 100 M records are loaded per month into the data warehouse, and it takes 4 days to load a week; no ad hoc capability exists to mine the pending records
        – How Hadoop can help: POC shows a 10-node cluster could process one month of data within 20 minutes
        Use case: CM (Counter Manager); find hot disks by disk type, system model, etc.
        – Workload type: CPU bound
        – Current capabilities: up to 10 M records in a single CM file; 200 B records in a month; no capability exists today in the backend infrastructure to process these
        – How Hadoop can help: achieved throughput of 3 M records per second during POC; a 100-node cluster is projected to process one month of data in 1.8 hours
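The EMS use case above (find occurrences of a pattern across all log files) maps naturally onto MapReduce. A minimal local sketch in the style of Hadoop Streaming; the event name and log lines are made up for illustration:

```python
import re
from itertools import groupby

# Hypothetical EMS event pattern to search for across log files
PATTERN = re.compile(r"disk\.failure")

def mapper(lines):
    """Map phase: emit (event, 1) for each log line matching the pattern."""
    for line in lines:
        if PATTERN.search(line):
            yield ("disk.failure", 1)

def reducer(pairs):
    """Reduce phase: sum counts per key, as happens after the shuffle."""
    for key, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
        yield (key, sum(count for _, count in group))

logs = [
    "Mon Jun 01 ems: disk.failure shelf=2 bay=7",
    "Mon Jun 01 ems: raid.rebuild.start",
    "Tue Jun 02 ems: disk.failure shelf=1 bay=3",
]
result = dict(reducer(mapper(logs)))
print(result)  # {'disk.failure': 2}
```

On a real cluster the mapper and reducer would read stdin and write tab-separated key/value pairs, with HDFS and the shuffle replacing the in-memory lists.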
    17. Solution Architecture
    18. ASUP.Next Hadoop Architecture [diagram: ingest tools send ASUPs through Flume into HDFS, which holds logs, performance data, and raw config; lookup and config services are exposed over REST; Pig drives the analyze and subscribe stages, feeding metrics, analytics, and BI]
    19. NetApp Open Solution for Hadoop. Easy to deploy, manage, scale; performance, resilience, density.
         Performance: bandwidth for streaming; IOPS for metadata; reduced cluster network congestion
         Capacity and density: 4 servers and 120 TB fit in 8U; fully serviceable storage system
         Reliability: hardware RAID and hot swap prevent job restart in case of media failure; reliable metadata (NameNode); enterprise-class fit and finish
        Enterprise-class Hadoop
    20. NetApp Open Solution for Hadoop (build slide; content identical to slide 19)
    21. NetApp Storage Solution Architecture. Key attributes:
        – Storage is protected by in-box RAID: a shared spare pool defers replacement of drives; rebuild does not consume network bandwidth
        – Storage is striped: maximize performance by minimizing unequal storage utilization
        – Reliable storage allows an HDFS replication count of 2: fewer disks; less space, power, cooling, cost
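The replication-count-2 attribute above is an HDFS-level setting. A sketch of the relevant hdfs-site.xml fragment; the property name is standard HDFS, and whether 2 is appropriate depends on the array-side RAID protection the slide describes:

```xml
<!-- hdfs-site.xml -->
<property>
  <name>dfs.replication</name>
  <!-- Default is 3; this design relies on in-box RAID to tolerate
       media failures, allowing block replication to drop to 2 -->
  <value>2</value>
</property>
```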
    22. NetApp Storage Solution Architecture. Primary questions: performance? cost?
    23. NetApp Storage Solution Architecture (RESULTS ARE PRELIMINARY). Performance concerns: initial testing has focused on using TestDFSIO
    24. NetApp Storage Solution Architecture (RESULTS ARE PRELIMINARY). Performance concerns: initial testing has focused on using TestDFSIO. Per-disk comparison: 14 disks/server in array; 6 disks/server direct-attach
    25. NetApp Storage Solution Architecture. Minimizing TCO:
        – Disk rebuild: handled in the controller; minimal impact to performance; no network bandwidth consumed
        – Server uptime: very high
        – Hardware maintenance: swap out dead disks as routine, not exception; swap-out of stateless servers is painless
    26. Conclusions
    27. Takeaways: NetApp assessed multiple traditional database technologies to solve its big-data problem and determined Hadoop was the best fit; moved from direct-attached disks to array-based storage to improve TCO; the overall architecture supports scale-out growth
    28. © 2011 NetApp, Inc. All rights reserved. No portions of this document may be reproduced without prior written consent of NetApp, Inc. Specifications are subject to change without notice. NetApp, the NetApp logo, Go further, faster, AutoSupport, Data ONTAP, NOW, and Snapshot are trademarks or registered trademarks of NetApp, Inc. in the United States and/or other countries. Symantec is a registered trademark of Symantec Corporation. All other brands or products are trademarks or registered trademarks of their respective holders and should be treated as such.
