Transcript of "Architecting Business Critical Enterprise Apps-NetApp"

1. Architecting a Business-Critical Enterprise Application: Automated Support
   Kumar Palaniappan, Enterprise Architect, NetApp
2. Agenda
   • NetApp's Business Challenge
   • Solution Architecture
   • Best Practices
   • Performance Benchmarks
   • Questions
3. The AutoSupport Family: The Foundation of NetApp Support Strategies
   • Catch issues before they become critical
   • Secure automated "call-home" service
   • System monitoring and nonintrusive alerting
   • RMA requests without customer action
   • Enables faster incident management
   "My AutoSupport Upgrade Advisor tool does all the hard work for me, saving me 4 to 5 hours of work per storage system and providing an upgrade plan that's complete and easy to follow."
4. AutoSupport – Why Does It Matter?
   [Matrix: AutoSupport value for Customers, Partners, and NetApp across the product lifecycle]
   • Product Planning & Development: product adoption and usage, data mining
   • Pre-Sales: install base management, lead generation, stickiness measurements, "what if" scenarios and capacity planning
   • Deployment: establish initial call home, measure implementation effectiveness, storage usage monitoring and billing (NAFS)
   • Technical Support: event-based triggers and alerts, automated E2E case handling, automated case creation, automated parts and support dispatch
   • Proactive Planning & Optimization: SAM services (proactive health checks, upgrade planning), storage efficiency measurements and recommendations, PS consulting (performance analysis and optimization recommendations, storage capacity planning)
   • Feedback: critical-to-quality metrics, product adoption and usage metrics, quality and reliability metrics
5. Business Challenges
   Gateways:
   • 600K ASUPs every week
   • 40% coming over the weekend
   • 0.5% growth week over week
   ETL:
   • Data needs to be parsed and loaded in 15 minutes
   Data Warehouse:
   • Only 5% of data goes into the data warehouse; the rest is unstructured and growing 6-8 TB per month
   • Oracle DBMS struggling to scale; maintenance and backups challenging
   Reporting:
   • Numerous mining requests are not satisfied currently
   • Huge untapped potential of valuable information for lead generation, supportability, and BI
   • No easy way to access this unstructured content
   Finally, the incoming load doubles every 16 months!
6. Incoming AutoSupport Volumes and TB Consumption
   [Chart: actual vs. projected TB consumption, Jan-00 through Jan-17, with high and low count/size projections; y-axis 0 to 6,000 TB]
   • At the projected current rate of growth, total storage requirements continue doubling every 16 months
   • Cost model: > $15M per year in ecosystem costs
7. New Functionality Needed
   [Diagram: use cases plotted by response time (seconds to weeks) and data volume (gigabytes to petabytes): product analysis, service performance planning, cross-sell and up-sell, customer intelligence, sales license management, proactive support, customer self service, product development]
8. Solution Architecture
9. Hadoop Architecture
   [Diagram: ASUP logs, config, and performance data ingested through Flume into HDFS and raw config tools; MapReduce and Pig analyze the data; REST lookup and subscribe interfaces serve metrics, analytics, and EBI]
10. Solution Architecture
11. Data Ingestion
   • Use of Flume (v1) to consume large XML objects, up to 20 MB compressed each
   • 4 agents feed 2 collectors in production
   • Basic process control using supervisord (ZooKeeper in R2?)
   • Reliability mode: disk failover (store on failure)
   • Separate sinks for text and binary sections
   • Arrival-time bucketing by minute
   • Snappy-compressed SequenceFiles with JSON values (a sketch of this output format follows the list)
   • Evaluating Flume NG
   • Ingesting 4.5 TB uncompressed per week; 80% arrives in an 8-hour window
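As a rough illustration of that sink output format (not the production Flume sink code), here is a minimal sketch that writes a minute-bucketed, block-compressed Snappy SequenceFile with JSON values using the Hadoop 1.x-era API; the /ingest/asup path layout, key scheme, and payload are assumptions for the example.

```java
import java.io.IOException;
import java.text.SimpleDateFormat;
import java.util.Date;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.compress.SnappyCodec;

public class MinuteBucketWriterSketch {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Bucket by arrival time: one directory per minute (hypothetical layout).
        String minute = new SimpleDateFormat("yyyy/MM/dd/HHmm").format(new Date());
        Path out = new Path("/ingest/asup/" + minute + "/events.seq");

        // Block-compressed SequenceFile with the Snappy codec; values are JSON strings.
        SequenceFile.Writer writer = SequenceFile.createWriter(
                fs, conf, out, Text.class, Text.class,
                SequenceFile.CompressionType.BLOCK, new SnappyCodec());
        try {
            writer.append(new Text("asup-000001"),
                    new Text("{\"system_id\":\"000001\",\"section\":\"config\"}"));
        } finally {
            writer.close();
        }
    }
}
```

Block compression lets Snappy compress runs of records together rather than each small JSON value on its own, which is what makes the format space-efficient at this volume.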
12. Data Transformation
   • Ingested data processed every 1 min. (with a 5 min. lag)
     – Relies on the Fair Scheduler to meet the SLA
     – Oozie (R0) -> Pentaho PDI (R1) for scheduling
   • Configuration data written to HBase using Avro (see the sketch after this list)
   • Duplicate data written to HDFS as Hive / JSON for ad hoc queries
   • User scans of HBase for ad hoc queries avoided to meet the SLA
   • This also simplifies data access
     – Query tools don't yet have support for Avro serialization in HBase
     – They all assume String keys and values (evolving to support Avro)
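A hedged sketch of the HBase write path: serialize one configuration record with Avro and store the bytes in a single cell. The schema, table name ("asup_config"), and column family ("d") are assumptions, not the deck's actual names; the client API is the pre-1.0 HBase interface current in 2011.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.EncoderFactory;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class AvroToHBaseSketch {
    // Hypothetical schema for one configuration section.
    private static final Schema SCHEMA = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"Config\",\"fields\":["
          + "{\"name\":\"system_id\",\"type\":\"string\"},"
          + "{\"name\":\"ontap_version\",\"type\":\"string\"}]}");

    public static void main(String[] args) throws IOException {
        // Serialize one record to Avro binary.
        GenericRecord rec = new GenericData.Record(SCHEMA);
        rec.put("system_id", "000001");
        rec.put("ontap_version", "8.1");

        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        BinaryEncoder enc = EncoderFactory.get().binaryEncoder(buf, null);
        new GenericDatumWriter<GenericRecord>(SCHEMA).write(rec, enc);
        enc.flush();

        // Store the serialized bytes in a single cell (table and family names assumed).
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "asup_config");
        try {
            Put put = new Put(Bytes.toBytes("000001"));
            put.add(Bytes.toBytes("d"), Bytes.toBytes("config"), buf.toByteArray());
            table.put(put);
        } finally {
            table.close();
        }
    }
}
```

Storing opaque Avro blobs per cell is exactly why generic query tools struggle here, and why the slide's duplicate Hive/JSON copy exists for ad hoc work.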
13. Low-Latency Application Data Access
   • High-performance REST lookups
   • Data stored as Avro-serialized objects for performance and versioning
   • Solr used to search for objects (one core per region)
   • Details then pulled from HBase (the lookup pattern is sketched below)
   • Large objects (logs) indexed and pulled from HDFS
   • ~100 HBase regions (500 GB each)
     – No splitting
     – Snappy-compressed tables
   • Future: HBase coprocessors to keep Solr indexes up to date
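The search-then-fetch pattern might look like the following sketch: query Solr for matching row keys, then pull the full Avro payload from HBase. The core URL, field names, and table layout are hypothetical; SolrJ's CommonsHttpSolrServer reflects the Solr 3.x era of the deck.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;

public class LookupSketch {
    public static void main(String[] args) throws Exception {
        // 1) Search Solr for matching objects (core URL and field names assumed).
        CommonsHttpSolrServer solr =
                new CommonsHttpSolrServer("http://solr-host:8983/solr/asup");
        QueryResponse rsp = solr.query(new SolrQuery("system_id:000001"));

        // 2) Fetch the full Avro-serialized details from HBase by row key.
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "asup_config");
        try {
            for (SolrDocument doc : rsp.getResults()) {
                String rowKey = (String) doc.getFieldValue("id");
                Result r = table.get(new Get(Bytes.toBytes(rowKey)));
                byte[] avroBytes = r.getValue(Bytes.toBytes("d"), Bytes.toBytes("config"));
                System.out.println(rowKey + " -> " + avroBytes.length + " bytes");
            }
        } finally {
            table.close();
        }
    }
}
```

Keeping the index in Solr and the payload in HBase is what lets lookups stay fast without scanning HBase, per the SLA constraint on the previous slide.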
14. Export to Oracle DSS
   • Pentaho pulls data from HBase and HDFS
   • Pushes into an Oracle star schema (a minimal JDBC sketch follows)
   • Daily export
     – 530 million rows and 350 GB on peak days
   • Runs on 2 VMs
     – 64 GB RAM, 12 cores
   • Enables existing BI tools (OBIEE) to query the DSS database
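The deck uses Pentaho for the actual export; as a rough picture of what a batch load into the star schema involves, here is a generic JDBC sketch. The fact-table layout, connection string, and credentials are illustrative only.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class OracleExportSketch {
    public static void main(String[] args) throws Exception {
        // Oracle JDBC driver; connection details and schema are illustrative.
        Class.forName("oracle.jdbc.driver.OracleDriver");
        Connection conn = DriverManager.getConnection(
                "jdbc:oracle:thin:@dss-host:1521:DSS", "etl_user", "secret");
        conn.setAutoCommit(false); // commit per batch, not per row

        PreparedStatement ps = conn.prepareStatement(
                "INSERT INTO asup_fact (system_key, date_key, capacity_gb) VALUES (?, ?, ?)");
        try {
            // In the real flow these rows would come from an HBase scan or HDFS read.
            ps.setLong(1, 42L);
            ps.setLong(2, 20110901L);
            ps.setDouble(3, 1024.0);
            ps.addBatch();

            ps.executeBatch(); // batched inserts keep half-billion-row exports tractable
            conn.commit();
        } finally {
            ps.close();
            conn.close();
        }
    }
}
```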
15. Disaster Recovery
   • DR cluster with 75% of production capacity
     – Coming in Release 2
   • Active/active from Flume back
     – Primary cluster is the one HTTP/SMTP responder
   • SLA: cannot lose > 1 hour of data
     – Data can be lost in a front-end switchover
   • HBase incremental backups
   • Staging is used frequently for engineering test and is operationally expensive, so it is not used for DR
16. NetApp Open Solution for Hadoop (NOSH)
17. HDFS Storage: Key Needs

   Attribute: Performance
   Key Drivers: Fast response time for search, ad hoc, and real-time queries; high replication counts impact throughput
   Requirement: Minimize network bottlenecks; optimize server workload; leverage storage hardware to increase cluster performance

   Attribute: Opex
   Key Drivers: Lower operational costs for managing huge amounts of data; controlling staff costs and cluster management costs as clusters scale
   Requirement: Optimize usable storage capacity; decouple storage from compute nodes to decrease the need to add more compute nodes

   Attribute: Enterprise Robustness
   Key Drivers: SPOF at the Hadoop NameNode; minimize cluster rebuild
   Requirement: Protect cluster metadata from SPOF; minimize risk where equipment tends to fail
18. NetApp Open Solution for Hadoop
   [Diagram: FAS2040 serves the NameNode and Secondary NameNode over NFS on 1GbE; E2660 arrays direct-connect to DataNodes/TaskTrackers over 6 Gb/s SAS (1 per DataNode), with 4 separate shared-nothing partitions per DataNode and 10GbE links (1 per node); JobTracker coordinates MapReduce]
   • Easy to deploy, manage, and scale
   • Uses high-performance storage
     – Resilient and compact
     – RAID protection of data
     – Less network congestion
   • Raw capacity and density
     – 120 TB or 180 TB in 4U
     – Fully serviceable storage system
   • Reliability
     – Hardware RAID and hot swap prevent job restarts when a node goes offline due to media failure
     – Reliable metadata (NameNode)
   Enterprise-class Hadoop
19. Performance and Scaling
20. Linear Throughput Scaling as DataNode Count Increases
   [Chart: total read throughput and total write throughput (MB/s, y-axis 0 to 6,000) for configurations of 4, 8, 12, and 24 DataNodes, showing near-linear scaling]
21. Summary
22. Takeaways
   • A Hadoop-based Big Data architecture enables
     – Cost-effective scaling
     – Low-latency access to data
     – Ad hoc issue and pattern detection
     – Predictive modeling in the future
   • Using our own innovative Hadoop storage technology, NOSH
   • An enterprise transformation
23. Kumar Palaniappan, @megamda
   © 2011 NetApp, Inc. All rights reserved. No portions of this document may be reproduced without prior written consent of NetApp, Inc. Specifications are subject to change without notice. NetApp, the NetApp logo, and Go further, faster, are trademarks or registered trademarks of NetApp, Inc. in the United States and/or other countries. All other brands or products are trademarks or registered trademarks of their respective holders and should be treated as such.
