Architecting a
Business-Critical
Application in Hadoop
Stephen Daniel
Technical Director
Marty Mayer
Sr. Manager, AutoSupport
Agenda

 NetApp: Drowning in Data
 Technology Assessment
 Business Drivers to Choose E-Series
 Solution Architecture
 Performance Benchmarks
 Best Practices
 Questions

The AutoSupport Family
The foundation of NetApp Support strategies



              Catch issues before they become critical
              Secure automated “call-home” service
              System monitoring and nonintrusive
               alerting
              RMA requests without customer action
              Enables faster incident management


       “My AutoSupport Upgrade Advisor tool does all the hard work for
       me, saving me 4 to 5 hours of work per storage system and
       providing an upgrade plan that’s complete and easy to follow.”



AutoSupport Capabilities
[Diagram: AutoSupport data flow]

Customer Install Base
 • NetApp storage systems in customer environments send AutoSupport messages (HTTPS) to the AutoSupport database
 • Customer messages (email) go to the storage administrator

NetApp and Partner Usage (fed by the AutoSupport database and the Risk Detection & Automation Engine)
 • Auto replacement parts (reactive)
 • Auto case creation (reactive)
 • Assess & optimize (proactive)
 • Sizing and modeling (proactive)
 • My AutoSupport customer portal (proactive and predictive)




Business Challenges




Gateways
 • 600K ASUPs every week
 • 40% coming over the weekend
 • 0.5% growth week over week

ETL
 • Data needs to be parsed and loaded in 15 mins

Data Warehouse
 • Only 5% of data goes into the data warehouse; the rest is unstructured, yet it's growing 6-8 TB per month
 • Oracle DBMS struggling to scale; maintenance and backups challenging
 • No easy way to access this unstructured content

Reporting
 • Numerous mining requests are not satisfied currently
 • Huge untapped potential of valuable information for lead generation, supportability, and BI

         Finally, the incoming load doubles every 16 months!
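
The growth figures on this slide can be put on a common yearly basis. The sketch below is only illustrative arithmetic, assuming steady compound growth; the function names are made up for this example. A 16-month doubling period works out to roughly the 68% CAGR quoted elsewhere in the deck, while 0.5% week over week compounds to a little under 30% per year.

```python
def annual_growth_from_doubling(months_to_double):
    """Compound annual growth rate implied by a fixed doubling period."""
    return 2 ** (12 / months_to_double) - 1


def annual_growth_from_weekly(weekly_rate, weeks_per_year=52):
    """Compound a constant week-over-week rate into a yearly rate."""
    return (1 + weekly_rate) ** weeks_per_year - 1


# Figures taken from the slide; steady compounding is an assumption.
print(f"doubling every 16 months -> {annual_growth_from_doubling(16):.1%} per year")
print(f"0.5% week over week      -> {annual_growth_from_weekly(0.005):.1%} per year")
```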

Incoming AutoSupport Volumes
and TB Consumption
[Chart: Flat-File Storage Requirement. Total usage (TB) and projected total usage (TB), Jan-05 through Jan-16, on an axis from 0 to 3,500 TB, with an annotation marking where usage doubles.]


 • At the current projected rate of growth, the total storage requirement will double every 16 months
 • Cost model: more than $15M per year in ecosystem costs

As of June 2011:
 • ~600,000 events archived each week
 • ~3 TB of disk space used each week
 • Events growing at 40% year over year
 • Disk use growing faster
 • Expanding products & features


Big Data is Expensive


Growth Rates (CAGR)

  – Data: +68%

  – Cost/byte: -30%

  – Net cost: +30%

Budget is flat




Problem Summary

1. Data Growing at 68% CAGR
2. Current implementation will not survive
   much longer
  –   We will fail to meet SLAs on ingest of new
      data
  –   To meet business critical SLAs we will limit
      the scope of the data warehouse
3. Many new opportunities / requirements




New Functionality Needed


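[Chart: new functionality plotted on a vertical axis running from seconds to weeks and a horizontal axis running from gigabytes to petabytes.]
 • Items plotted: Customer Self Service, License Management, Cross Sell & Up Sell, Customer Intelligence, Proactive Support, Performance Planning, Product Analysis
 • Groups labeled on the chart: Service, Sales, Product Development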


Predictive Analytics Examples
 Proactive Support
 – Predict failure probabilities
 – Text events, performance changes, lifetime
   usage
 Product Analysis
 – Feature usage
 – Per segment variations
 Capacity Planning
 – Growth trends
 – Seasonality factors
 Up-sell, cross-sell models

Technology Assessment
Requirements used for POC & RFP

 Cost Effective
 Highly Scalable
 Adaptive
 New Analytical Capabilities




POC Tests
 Log data: report analysis for an event across the entire install
  base (25% of the install base and 2 months of data
  used for benchmarks)
  – 6 months to 1 year
  – I/O bound
 Counter Manager: analysis generally restricted to the data of 1
  system or 1 cluster for a single month (2 days of data from
  25% of the install base used for the benchmark)
  – Trend analyses across the install base are generally rare and
    ad hoc
  – More CPU bound (some tools query large
    numbers of counters)


POC Environment




Prime Hadoop Use Cases in POC

Use case: Logs (EMS). Find occurrence of a pattern across all log files in last 6 months
Workload type: I/O bound
Current capabilities:
 • One month of data is worth 24 B records
 • Out of this, some 100 M records are loaded per month in DW. Takes 4 days to load a week
 • No ad-hoc capability exists to mine the pending records
How Hadoop can help:
 • POC shows a 10-node cluster could process one month of data within 20 minutes

Use case: CM. Find hot disks by disk types, sys model etc.
Workload type: CPU bound
Current capabilities:
 • Up to 10 M records in a single CM file
 • 200 B records in a month
 • No capability exists today in backend infrastructure to process these
How Hadoop can help:
 • Achieved throughput of 3M records per second during POC
 • 100-node cluster is projected to process one month of data in 1.8 hours
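
The CM projection can be sanity-checked with back-of-the-envelope arithmetic. The sketch below is not from the deck: it assumes the 3M records per second was measured on the same 10-node cluster as the log test and that throughput scales linearly with node count, neither of which the slide states. Under those assumptions, one month of CM data takes about 18.5 hours at POC throughput and about 1.85 hours on 100 nodes, in line with the 1.8 hours projected above.

```python
# Figures from the slide.
CM_RECORDS_PER_MONTH = 200e9   # 200 B Counter Manager records in a month
POC_RECORDS_PER_SEC = 3e6      # throughput achieved during the POC

# Assumptions, not stated on the slide.
POC_NODES = 10                 # assumed: same 10-node cluster as the log test
TARGET_NODES = 100


def hours_to_process(records, records_per_sec):
    """Wall-clock hours needed to scan `records` at a constant rate."""
    return records / records_per_sec / 3600.0


def scaled_throughput(base_rate, base_nodes, target_nodes):
    """Assumed linear scale-out: throughput proportional to node count."""
    return base_rate * target_nodes / base_nodes


poc_hours = hours_to_process(CM_RECORDS_PER_MONTH, POC_RECORDS_PER_SEC)
target_rate = scaled_throughput(POC_RECORDS_PER_SEC, POC_NODES, TARGET_NODES)
target_hours = hours_to_process(CM_RECORDS_PER_MONTH, target_rate)

print(f"{poc_hours:.1f} h at POC throughput, "
      f"{target_hours:.2f} h on {TARGET_NODES} nodes")
```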




Solution Architecture
ASUP.Next Hadoop Architecture

[Architecture diagram. Components shown: Ingest, Flume, HDFS (ingest of logs, performance, and raw config), ASUP Config Data, Lookup, REST, Tools, Pig, Analyze, Subscribe, and Metrics, Analytics, EBI.]

NetApp Open Solution for Hadoop
                 Easy to Deploy, Manage, Scale
                 Performance; Resilience; Density
                  Performance
                      Bandwidth for streaming
                      IOPs for metadata
                      Reduced cluster network congestion
                  Capacity and density
                      4 servers and 120TB fit in 8U
                      Fully serviceable storage system
                  Reliability
                      Hardware RAID and hot swap prevent job
                        restart in case of media failure
                      Reliable metadata (Name Node)
                  Enterprise-class fit and finish


                        Enterprise Class Hadoop


NetApp Storage Solution Architecture

 Key Attributes:
 – Storage is protected by in-box RAID
     Shared spare pool defers replacement of drives
     Rebuild does not consume network bandwidth
 – Storage is striped
     Maximize performance by minimizing unequal
      storage utilization
 – Reliable storage: HDFS replication count 2
     Fewer disks
     Less space, power, cooling, cost, …
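
To illustrate why a lower replication count means fewer disks, the sketch below compares how much unique data fits in one building block at the stock HDFS default of 3 copies and at the count of 2 used here. It reuses the "4 servers and 120TB fit in 8U" figure from the earlier slide and assumes, purely for illustration, that the full 120 TB is available to HDFS after RAID; the function name is hypothetical.

```python
def unique_data_tb(hdfs_capacity_tb, replication):
    """Unique data that fits when every block is stored `replication` times."""
    return hdfs_capacity_tb / replication


BUILDING_BLOCK_TB = 120   # from the deck: 4 servers and 120TB in 8U
RACK_UNITS = 8            # assumption: all 120 TB is usable by HDFS

for replication in (3, 2):  # 3 = HDFS default, 2 = count on this slide
    data = unique_data_tb(BUILDING_BLOCK_TB, replication)
    print(f"replication={replication}: {data:.0f} TB of data per {RACK_UNITS}U")
```

Under that assumption the same hardware holds 60 TB of data at replication 2 versus 40 TB at replication 3.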


NetApp Storage Solution Architecture

 Primary Questions:
 – Performance?
 – Cost?




NetApp Storage Solution Architecture

 RESULTS ARE
 PRELIMINARY
 Performance
  Concerns
 – Initial testing has
   focused on using
   TestDFSIO




 Per-Disk:
 – 14 disks/server in array
 – 6 disks/server direct-
   attach

NetApp Storage Solution Architecture

 Minimizing TCO
 – Disk rebuild
    Handled in the controller
    Minimal impact to performance
    No network bandwidth consumed
 – Server uptime
    Very high
 – Hardware maintenance
    Swap out dead disks as routine, not exception
    Swap out of stateless servers is painless


Conclusions
Takeaways

 NetApp assessed multiple traditional DB
  technologies to solve its Big Data problem and
  determined Hadoop was the best fit

 Moved from direct-attached disks to array-based
  storage to improve TCO

 The overall architecture supports scale-out
  growth


© 2011 NetApp, Inc. All rights reserved. No portions of this document may be reproduced without prior written consent of
NetApp, Inc. Specifications are subject to change without notice. NetApp, the NetApp logo, Go
further, faster, AutoSupport, Data ONTAP, NOW, and Snapshot are trademarks or registered trademarks of NetApp, Inc. in
the United States and/or other countries. Symantec is a registered trademark of Symantec Corporation. All other brands or
products are trademarks or registered trademarks of their respective holders and should be treated as such.

More Related Content

What's hot

Moodle Moot Spain: Moodle Available and Scalable with MySQL HA - InnoDB Clust...
Moodle Moot Spain: Moodle Available and Scalable with MySQL HA - InnoDB Clust...Moodle Moot Spain: Moodle Available and Scalable with MySQL HA - InnoDB Clust...
Moodle Moot Spain: Moodle Available and Scalable with MySQL HA - InnoDB Clust...Keith Hollman
 
Google Cloud Anthos on HPE Simplivity
Google Cloud Anthos on HPE SimplivityGoogle Cloud Anthos on HPE Simplivity
Google Cloud Anthos on HPE SimplivityTanawit Chansuchai
 
Veeam Using cloud connect in 3 unexpected, awesome ways
Veeam Using cloud connect in 3 unexpected, awesome waysVeeam Using cloud connect in 3 unexpected, awesome ways
Veeam Using cloud connect in 3 unexpected, awesome waysTanawit Chansuchai
 
Netbackup advantages features and benefits Netbackup classes may help hands o...
Netbackup advantages features and benefits Netbackup classes may help hands o...Netbackup advantages features and benefits Netbackup classes may help hands o...
Netbackup advantages features and benefits Netbackup classes may help hands o...Vidhyalive
 
GAB 2016 Hybrid Storage
GAB 2016 Hybrid StorageGAB 2016 Hybrid Storage
GAB 2016 Hybrid StorageCarlos Mayol
 
DRaaS on Microsoft Azure with Veeam Software
DRaaS on Microsoft Azure with Veeam SoftwareDRaaS on Microsoft Azure with Veeam Software
DRaaS on Microsoft Azure with Veeam SoftwareTanawit Chansuchai
 
Veeam - Fast Secure Cloud base Disaster Recovery with Veeam Cloud Connect
Veeam - Fast Secure Cloud base Disaster Recovery with Veeam Cloud ConnectVeeam - Fast Secure Cloud base Disaster Recovery with Veeam Cloud Connect
Veeam - Fast Secure Cloud base Disaster Recovery with Veeam Cloud ConnectTanawit Chansuchai
 
Veeam Availability Suite version 10
Veeam Availability Suite version 10Veeam Availability Suite version 10
Veeam Availability Suite version 10Tanawit Chansuchai
 
Simplivity webinar presentation
Simplivity webinar presentationSimplivity webinar presentation
Simplivity webinar presentationRyan Hadden
 
A Winning Combination: IBM Storage and VMware
A Winning Combination: IBM Storage and VMwareA Winning Combination: IBM Storage and VMware
A Winning Combination: IBM Storage and VMwarePaula Koziol
 
FlexPod as a Competitive Edge
FlexPod as a Competitive EdgeFlexPod as a Competitive Edge
FlexPod as a Competitive Edge NetApp
 
FlexPod Datacenter for Oracle’s JD Edwards EnterpriseOne
FlexPod Datacenter for Oracle’s JD Edwards EnterpriseOneFlexPod Datacenter for Oracle’s JD Edwards EnterpriseOne
FlexPod Datacenter for Oracle’s JD Edwards EnterpriseOneNetApp
 
MySQL Technology Overview
MySQL Technology OverviewMySQL Technology Overview
MySQL Technology OverviewKeith Hollman
 
Scylla Summit 2018: Grow small, Get big — Experiences with Scylla
Scylla Summit 2018: Grow small, Get big — Experiences with ScyllaScylla Summit 2018: Grow small, Get big — Experiences with Scylla
Scylla Summit 2018: Grow small, Get big — Experiences with ScyllaScyllaDB
 
Blue Medora - VMware vROps Management Pack for VCE Vblock Overview
Blue Medora - VMware vROps Management Pack for VCE Vblock OverviewBlue Medora - VMware vROps Management Pack for VCE Vblock Overview
Blue Medora - VMware vROps Management Pack for VCE Vblock OverviewBlue Medora
 
EDBT 2013 - Near Realtime Analytics with IBM DB2 Analytics Accelerator
EDBT 2013 - Near Realtime Analytics with IBM DB2 Analytics AcceleratorEDBT 2013 - Near Realtime Analytics with IBM DB2 Analytics Accelerator
EDBT 2013 - Near Realtime Analytics with IBM DB2 Analytics AcceleratorDaniel Martin
 
Oracle Database Consolidation with FlexPod on Cisco UCS
Oracle Database Consolidation with FlexPod on Cisco UCSOracle Database Consolidation with FlexPod on Cisco UCS
Oracle Database Consolidation with FlexPod on Cisco UCSNetApp
 
FAS2240: An Inside Look
FAS2240: An Inside LookFAS2240: An Inside Look
FAS2240: An Inside LookNetApp
 
MySQL 8.0 InnoDB Cluster demo
MySQL 8.0 InnoDB Cluster demoMySQL 8.0 InnoDB Cluster demo
MySQL 8.0 InnoDB Cluster demoKeith Hollman
 

What's hot (20)

Moodle Moot Spain: Moodle Available and Scalable with MySQL HA - InnoDB Clust...
Moodle Moot Spain: Moodle Available and Scalable with MySQL HA - InnoDB Clust...Moodle Moot Spain: Moodle Available and Scalable with MySQL HA - InnoDB Clust...
Moodle Moot Spain: Moodle Available and Scalable with MySQL HA - InnoDB Clust...
 
Google Cloud Anthos on HPE Simplivity
Google Cloud Anthos on HPE SimplivityGoogle Cloud Anthos on HPE Simplivity
Google Cloud Anthos on HPE Simplivity
 
Veeam Using cloud connect in 3 unexpected, awesome ways
Veeam Using cloud connect in 3 unexpected, awesome waysVeeam Using cloud connect in 3 unexpected, awesome ways
Veeam Using cloud connect in 3 unexpected, awesome ways
 
GAB 2016 ASR
GAB 2016 ASRGAB 2016 ASR
GAB 2016 ASR
 
Netbackup advantages features and benefits Netbackup classes may help hands o...
Netbackup advantages features and benefits Netbackup classes may help hands o...Netbackup advantages features and benefits Netbackup classes may help hands o...
Netbackup advantages features and benefits Netbackup classes may help hands o...
 
GAB 2016 Hybrid Storage
GAB 2016 Hybrid StorageGAB 2016 Hybrid Storage
GAB 2016 Hybrid Storage
 
DRaaS on Microsoft Azure with Veeam Software
DRaaS on Microsoft Azure with Veeam SoftwareDRaaS on Microsoft Azure with Veeam Software
DRaaS on Microsoft Azure with Veeam Software
 
Veeam - Fast Secure Cloud base Disaster Recovery with Veeam Cloud Connect
Veeam - Fast Secure Cloud base Disaster Recovery with Veeam Cloud ConnectVeeam - Fast Secure Cloud base Disaster Recovery with Veeam Cloud Connect
Veeam - Fast Secure Cloud base Disaster Recovery with Veeam Cloud Connect
 
Veeam Availability Suite version 10
Veeam Availability Suite version 10Veeam Availability Suite version 10
Veeam Availability Suite version 10
 
Simplivity webinar presentation
Simplivity webinar presentationSimplivity webinar presentation
Simplivity webinar presentation
 
A Winning Combination: IBM Storage and VMware
A Winning Combination: IBM Storage and VMwareA Winning Combination: IBM Storage and VMware
A Winning Combination: IBM Storage and VMware
 
FlexPod as a Competitive Edge
FlexPod as a Competitive EdgeFlexPod as a Competitive Edge
FlexPod as a Competitive Edge
 
FlexPod Datacenter for Oracle’s JD Edwards EnterpriseOne
FlexPod Datacenter for Oracle’s JD Edwards EnterpriseOneFlexPod Datacenter for Oracle’s JD Edwards EnterpriseOne
FlexPod Datacenter for Oracle’s JD Edwards EnterpriseOne
 
MySQL Technology Overview
MySQL Technology OverviewMySQL Technology Overview
MySQL Technology Overview
 
Scylla Summit 2018: Grow small, Get big — Experiences with Scylla
Scylla Summit 2018: Grow small, Get big — Experiences with ScyllaScylla Summit 2018: Grow small, Get big — Experiences with Scylla
Scylla Summit 2018: Grow small, Get big — Experiences with Scylla
 
Blue Medora - VMware vROps Management Pack for VCE Vblock Overview
Blue Medora - VMware vROps Management Pack for VCE Vblock OverviewBlue Medora - VMware vROps Management Pack for VCE Vblock Overview
Blue Medora - VMware vROps Management Pack for VCE Vblock Overview
 
EDBT 2013 - Near Realtime Analytics with IBM DB2 Analytics Accelerator
EDBT 2013 - Near Realtime Analytics with IBM DB2 Analytics AcceleratorEDBT 2013 - Near Realtime Analytics with IBM DB2 Analytics Accelerator
EDBT 2013 - Near Realtime Analytics with IBM DB2 Analytics Accelerator
 
Oracle Database Consolidation with FlexPod on Cisco UCS
Oracle Database Consolidation with FlexPod on Cisco UCSOracle Database Consolidation with FlexPod on Cisco UCS
Oracle Database Consolidation with FlexPod on Cisco UCS
 
FAS2240: An Inside Look
FAS2240: An Inside LookFAS2240: An Inside Look
FAS2240: An Inside Look
 
MySQL 8.0 InnoDB Cluster demo
MySQL 8.0 InnoDB Cluster demoMySQL 8.0 InnoDB Cluster demo
MySQL 8.0 InnoDB Cluster demo
 

Similar to Architecting Business-Critical Hadoop Application for AutoSupport Data

Architecting BigData Enterprise Application-HadoopSummit2012
Architecting BigData Enterprise Application-HadoopSummit2012Architecting BigData Enterprise Application-HadoopSummit2012
Architecting BigData Enterprise Application-HadoopSummit2012Kumar Palaniappan
 
Architecting Business Critical Enterprise Apps-NetApp
Architecting Business Critical Enterprise Apps-NetAppArchitecting Business Critical Enterprise Apps-NetApp
Architecting Business Critical Enterprise Apps-NetAppDataWorks Summit
 
ScaleBase Webinar: Methods and Challenges to Scale Out a MySQL Database
ScaleBase Webinar: Methods and Challenges to Scale Out a MySQL DatabaseScaleBase Webinar: Methods and Challenges to Scale Out a MySQL Database
ScaleBase Webinar: Methods and Challenges to Scale Out a MySQL DatabaseScaleBase
 
Java Batch for Cost Optimized Efficiency
Java Batch for Cost Optimized EfficiencyJava Batch for Cost Optimized Efficiency
Java Batch for Cost Optimized EfficiencySridharSudarsan
 
ScaleBase Webinar 8.16: ScaleUp vs. ScaleOut
ScaleBase Webinar 8.16: ScaleUp vs. ScaleOutScaleBase Webinar 8.16: ScaleUp vs. ScaleOut
ScaleBase Webinar 8.16: ScaleUp vs. ScaleOutScaleBase
 
Anexinet Big Data Solutions
Anexinet Big Data SolutionsAnexinet Big Data Solutions
Anexinet Big Data SolutionsMark Kromer
 
IBM Software Day 2013. Smarter analytics and big data. building the next gene...
IBM Software Day 2013. Smarter analytics and big data. building the next gene...IBM Software Day 2013. Smarter analytics and big data. building the next gene...
IBM Software Day 2013. Smarter analytics and big data. building the next gene...IBM (Middle East and Africa)
 
EDF2013: Selected Talk: Bryan Drexler: The 80/20 Rule and Big Data
EDF2013: Selected Talk: Bryan Drexler: The 80/20 Rule and Big Data EDF2013: Selected Talk: Bryan Drexler: The 80/20 Rule and Big Data
EDF2013: Selected Talk: Bryan Drexler: The 80/20 Rule and Big Data European Data Forum
 
Scaling MySQL: Benefits of Automatic Data Distribution
Scaling MySQL: Benefits of Automatic Data DistributionScaling MySQL: Benefits of Automatic Data Distribution
Scaling MySQL: Benefits of Automatic Data DistributionScaleBase
 
Microsoft StreamInsight
Microsoft StreamInsight Microsoft StreamInsight
Microsoft StreamInsight Mark Ginnebaugh
 
Oracle Quality of Service Management - Meeting SLAs in a Grid Environment
Oracle Quality of Service Management - Meeting SLAs in a Grid EnvironmentOracle Quality of Service Management - Meeting SLAs in a Grid Environment
Oracle Quality of Service Management - Meeting SLAs in a Grid EnvironmentAris Prassinos
 
Using real time big data analytics for competitive advantage
 Using real time big data analytics for competitive advantage Using real time big data analytics for competitive advantage
Using real time big data analytics for competitive advantageAmazon Web Services
 
GOTO Aarhus 2014: Making Enterprise Data Available in Real Time with elastics...
GOTO Aarhus 2014: Making Enterprise Data Available in Real Time with elastics...GOTO Aarhus 2014: Making Enterprise Data Available in Real Time with elastics...
GOTO Aarhus 2014: Making Enterprise Data Available in Real Time with elastics...Yann Cluchey
 
Mesh Labs Introduction June 2012
Mesh Labs Introduction June 2012Mesh Labs Introduction June 2012
Mesh Labs Introduction June 2012Umesh Ramalingachar
 
16h00 globant - aws globant-big-data_summit2012
16h00   globant - aws globant-big-data_summit201216h00   globant - aws globant-big-data_summit2012
16h00 globant - aws globant-big-data_summit2012infolive
 
Revenue and Spend Insights from Vistex and IBM Whitepaper
Revenue and Spend Insights from Vistex and IBM Whitepaper Revenue and Spend Insights from Vistex and IBM Whitepaper
Revenue and Spend Insights from Vistex and IBM Whitepaper SAP Solution Extensions
 
Hadoop Summit San Diego Feb2013
Hadoop Summit San Diego Feb2013Hadoop Summit San Diego Feb2013
Hadoop Summit San Diego Feb2013Narayan Bharadwaj
 

Similar to Architecting Business-Critical Hadoop Application for AutoSupport Data (20)

Architecting BigData Enterprise Application-HadoopSummit2012
Architecting BigData Enterprise Application-HadoopSummit2012Architecting BigData Enterprise Application-HadoopSummit2012
Architecting BigData Enterprise Application-HadoopSummit2012
 
Architecting Business Critical Enterprise Apps-NetApp
Architecting Business Critical Enterprise Apps-NetAppArchitecting Business Critical Enterprise Apps-NetApp
Architecting Business Critical Enterprise Apps-NetApp
 
ScaleBase Webinar: Methods and Challenges to Scale Out a MySQL Database
ScaleBase Webinar: Methods and Challenges to Scale Out a MySQL DatabaseScaleBase Webinar: Methods and Challenges to Scale Out a MySQL Database
ScaleBase Webinar: Methods and Challenges to Scale Out a MySQL Database
 
Java Batch for Cost Optimized Efficiency
Java Batch for Cost Optimized EfficiencyJava Batch for Cost Optimized Efficiency
Java Batch for Cost Optimized Efficiency
 
Ams Webinar 25 March 2010 Jf Final[1]
Ams Webinar 25 March 2010 Jf Final[1]Ams Webinar 25 March 2010 Jf Final[1]
Ams Webinar 25 March 2010 Jf Final[1]
 
ScaleBase Webinar 8.16: ScaleUp vs. ScaleOut
ScaleBase Webinar 8.16: ScaleUp vs. ScaleOutScaleBase Webinar 8.16: ScaleUp vs. ScaleOut
ScaleBase Webinar 8.16: ScaleUp vs. ScaleOut
 
Anexinet Big Data Solutions
Anexinet Big Data SolutionsAnexinet Big Data Solutions
Anexinet Big Data Solutions
 
IBM Software Day 2013. Smarter analytics and big data. building the next gene...
IBM Software Day 2013. Smarter analytics and big data. building the next gene...IBM Software Day 2013. Smarter analytics and big data. building the next gene...
IBM Software Day 2013. Smarter analytics and big data. building the next gene...
 
EDF2013: Selected Talk: Bryan Drexler: The 80/20 Rule and Big Data
EDF2013: Selected Talk: Bryan Drexler: The 80/20 Rule and Big Data EDF2013: Selected Talk: Bryan Drexler: The 80/20 Rule and Big Data
EDF2013: Selected Talk: Bryan Drexler: The 80/20 Rule and Big Data
 
Scaling MySQL: Benefits of Automatic Data Distribution
Scaling MySQL: Benefits of Automatic Data DistributionScaling MySQL: Benefits of Automatic Data Distribution
Scaling MySQL: Benefits of Automatic Data Distribution
 
Microsoft StreamInsight
Microsoft StreamInsight Microsoft StreamInsight
Microsoft StreamInsight
 
Oracle Quality of Service Management - Meeting SLAs in a Grid Environment
Oracle Quality of Service Management - Meeting SLAs in a Grid EnvironmentOracle Quality of Service Management - Meeting SLAs in a Grid Environment
Oracle Quality of Service Management - Meeting SLAs in a Grid Environment
 
Using real time big data analytics for competitive advantage
 Using real time big data analytics for competitive advantage Using real time big data analytics for competitive advantage
Using real time big data analytics for competitive advantage
 
GOTO Aarhus 2014: Making Enterprise Data Available in Real Time with elastics...
GOTO Aarhus 2014: Making Enterprise Data Available in Real Time with elastics...GOTO Aarhus 2014: Making Enterprise Data Available in Real Time with elastics...
GOTO Aarhus 2014: Making Enterprise Data Available in Real Time with elastics...
 
Mesh Labs Introduction June 2012
Mesh Labs Introduction June 2012Mesh Labs Introduction June 2012
Mesh Labs Introduction June 2012
 
Globant and Big Data on AWS
Globant and Big Data on AWSGlobant and Big Data on AWS
Globant and Big Data on AWS
 
16h00 globant - aws globant-big-data_summit2012
16h00   globant - aws globant-big-data_summit201216h00   globant - aws globant-big-data_summit2012
16h00 globant - aws globant-big-data_summit2012
 
Revenue and Spend Insights from Vistex and IBM Whitepaper
Revenue and Spend Insights from Vistex and IBM Whitepaper Revenue and Spend Insights from Vistex and IBM Whitepaper
Revenue and Spend Insights from Vistex and IBM Whitepaper
 
Hadoop Summit San Diego Feb2013
Hadoop Summit San Diego Feb2013Hadoop Summit San Diego Feb2013
Hadoop Summit San Diego Feb2013
 
Sap hana Overview
Sap hana OverviewSap hana Overview
Sap hana Overview
 


Architecting Business-Critical Hadoop Application for AutoSupport Data

  • 1. Architecting a Business-Critical Application in Hadoop Stephen Daniel Technical Director Marty Mayer Sr. Manager, AutoSupport
  • 2. Agenda
      – NetApp: Drowning in Data
      – Technology Assessment
      – Business Drivers to Choose E-Series
      – Solution Architecture
      – Performance Benchmarks
      – Best Practices
      – Questions
  • 3. The AutoSupport Family: the foundation of NetApp Support strategies
      – Catch issues before they become critical
      – Secure automated “call-home” service
      – System monitoring and nonintrusive alerting
      – RMA requests without customer action
      – Enables faster incident management
      “My AutoSupport Upgrade Advisor tool does all the hard work for me, saving me 4 to 5 hours of work per storage system and providing an upgrade plan that’s complete and easy to follow.”
  • 4. AutoSupport Capabilities
      [Diagram. Column headings: Customer Install Base; NetApp and Partner Usage]
      Labels: Customer Environments; NetApp Storage System; Storage Administrator; AutoSupport Messages (HTTPS); Customer Messages (Email); AutoSupport Database; Risk Detection & Automation Engine; Auto Replacement Parts (Reactive); Auto Case Creation (Reactive); Assess & Optimize (Proactive); Sizing and modeling (Proactive); My AutoSupport – Customer Portal (Proactive and Predictive)
  • 5. Business Challenges
      – Gateways
          – 600K ASUPs every week
          – 40% coming over the weekend
          – 0.5% growth week over week
      – ETL
          – Data needs to be parsed and loaded in 15 mins
      – Data Warehouse
          – Only 5% of data goes into the data warehouse, rest unstructured, yet it’s growing 6-8TB per month
          – Oracle DBMS struggling to scale, maintenance and backups challenging
          – No easy way to access this unstructured content
      – Reporting
          – Numerous mining requests are not satisfied currently
          – Huge untapped potential of valuable information for lead generation, supportability, and BI
      Finally, the incoming load doubles every 16 months!
  • 6. Incoming AutoSupport Volumes and TB Consumption
      [Chart: Flat-File Storage Requirement, Jan-05 to Jan-16, y-axis 0 to 3500; series: Total Usage (TB) and Projected Total Usage (TB); annotation: “Doubles”]
      – At projected current rate of growth, total storage requirement will double every 16 months
      – Cost Model: > $15M per year ecosystem costs
      – As of June 2011:
          – ~600,000 events archived each week
          – ~3 TB disk space used each week
          – Events growing at 40% year over year
          – Disk use growing faster
          – Expanding products & features
  • 7. Big Data is Expensive
      – Growth Rates (CAGR)
          – Data: +68%
          – Cost/byte: -30%
          – Net cost: +30%
      – Budget is flat
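The growth figures quoted on these slides can be cross-checked with a short calculation. The sketch below converts the 16-month doubling time into an annual growth rate, then combines it with the quoted change in cost per byte. The multiplicative cost model and the function names are assumptions made for illustration; they are not taken from the deck.

```python
def cagr_from_doubling(months_to_double: float) -> float:
    """Annual growth rate implied by a doubling time given in months."""
    return 2 ** (12 / months_to_double) - 1


def net_cost_growth(data_growth: float, cost_per_byte_change: float) -> float:
    """Assumed model: spend = bytes stored x cost per byte, so the rates compound."""
    return (1 + data_growth) * (1 + cost_per_byte_change) - 1


data_cagr = cagr_from_doubling(16)        # load doubles every 16 months
net = net_cost_growth(data_cagr, -0.30)   # cost per byte falling 30% a year

print(f"data growth: {data_cagr:+.0%} per year")
print(f"net cost under the assumed model: {net:+.0%} per year")
```

A 16-month doubling time works out to roughly 68% a year, which matches the CAGR on the slide. Under this simple model the net cost grows more slowly than the +30% the slide reports, so the deck's figure presumably reflects costs beyond raw capacity.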
  • 8. Problem Summary
      1. Data growing at 68% CAGR
      2. Current implementation will not survive much longer
          – We will fail to meet SLAs on ingest of new data
          – To meet business-critical SLAs we will limit the scope of the data warehouse
      3. Many new opportunities / requirements
  • 9. New Functionality Needed
      [Chart: one axis runs from Seconds to Weeks, the other from Gigabytes to Petabytes]
      Plotted items: Product Analysis; Service Performance Planning; Cross Sell & Up Sell; Customer Intelligence; Sales License Management; Proactive Support; Customer Self Service; Product Development
  • 10. Predictive Analytics Examples
      – Proactive Support
          – Predict failure probabilities
          – Text events, performance changes, lifetime usage
      – Product Analysis
          – Feature usage
          – Per segment variations
      – Capacity Planning
          – Growth trends
          – Seasonality factors
      – Up-sell, cross-sell models
  • 12. Requirements used for POC & RFP
      – Cost Effective
      – Highly Scalable
      – Adaptive
      – New Analytical Capabilities
  • 13. POC Tests
      – Log Data: report analysis for an event across the entire install base (25% of the install base and 2 months of data used for benchmarks)
          – 6 months to 1 year
          – I/O bound
      – Counter Manager: analysis generally restricted to the data of 1 system or 1 cluster for a single month (2 days of data from 25% of the install base used for benchmark)
          – Trending across the install base is generally rare and ad hoc
          – More CPU bound (some tools will query large numbers of counters)
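The log-data test is essentially a scan: every record is read once, matched against a pattern, and counted, which is why it is I/O bound. A minimal sketch of that scan in map/reduce style follows, written in plain Python so it runs on its own. The tab-separated record layout, the event names, and the sample lines are all hypothetical; the deck does not describe the real log format.

```python
import re
from collections import Counter
from typing import Iterable, Iterator, Tuple

# Hypothetical pattern and layout: system_id <TAB> timestamp <TAB> message.
PATTERN = re.compile(r"disk\.fail", re.IGNORECASE)


def map_lines(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Map step: emit (system_id, 1) for each line whose message matches."""
    for line in lines:
        parts = line.rstrip("\n").split("\t", 2)
        if len(parts) == 3 and PATTERN.search(parts[2]):
            yield parts[0], 1


def reduce_counts(pairs: Iterable[Tuple[str, int]]) -> Counter:
    """Reduce step: sum the emitted counts per system."""
    totals: Counter = Counter()
    for system_id, n in pairs:
        totals[system_id] += n
    return totals


# Invented sample records, for illustration only.
sample = [
    "sys-001\t2011-06-01T00:00:00\tdisk.fail: shelf 2 bay 7",
    "sys-002\t2011-06-01T00:00:05\traid.rebuild.start",
    "sys-001\t2011-06-02T10:12:00\tDisk.Fail: shelf 1 bay 3",
]
hits = reduce_counts(map_lines(sample))
print(dict(hits))
```

On a cluster the map step would run in parallel over blocks of the log files and only the small per-system counts would cross the network, so the job is limited mainly by how fast the disks can stream the data.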
  • 16. Prime Hadoop Use Cases in POC
      – Logs (EMS): find occurrence of a pattern across all log files in last 6 months
          – Workload type: I/O bound
          – Current capabilities:
              – One month of data is worth 24 B records
              – Out of this, some 100 M records are loaded per month in DW; takes 4 days to load a week
              – No ad-hoc capability exists to mine the pending records
          – How Hadoop can help:
              – POC shows a 10 node cluster could process one month of data within 20 minutes
      – CM: find hot disks by disk types, sys model etc.
          – Workload type: CPU bound
          – Current capabilities:
              – Up to 10 M records in single CM file
              – 200 B records in a month
              – No capability exists today in backend infrastructure to process these
          – How Hadoop can help:
              – Achieved throughput of 3M records per second during POC
              – 100 node cluster is projected to process one month of data in 1.8 hours
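The CM projection can be reproduced from the numbers on this slide. The sketch below assumes the 3M records per second was measured on a 10 node cluster (the slide states 10 nodes only for the EMS test) and that throughput scales linearly with node count; both are assumptions, and the names are illustrative.

```python
def hours_to_process(records: float, records_per_second: float) -> float:
    """Wall-clock hours needed to process a record count at a steady rate."""
    return records / records_per_second / 3600


CM_RECORDS_PER_MONTH = 200e9   # from the slide
POC_RATE = 3e6                 # records per second achieved during the POC
ASSUMED_POC_NODES = 10         # assumption: not stated for the CM test
TARGET_NODES = 100

poc_hours = hours_to_process(CM_RECORDS_PER_MONTH, POC_RATE)
# Assumes throughput grows linearly with the number of nodes.
projected_hours = poc_hours * ASSUMED_POC_NODES / TARGET_NODES

# EMS case for comparison: 24 B records processed in 20 minutes.
ems_rate = 24e9 / (20 * 60)

print(f"CM at POC rate: {poc_hours:.2f} h")
print(f"CM on {TARGET_NODES} nodes: {projected_hours:.2f} h")
print(f"EMS scan rate: {ems_rate / 1e6:.0f} M records/s")
```

Under these assumptions one month of CM data takes about 18.5 hours at the POC rate and about 1.85 hours on 100 nodes, in line with the 1.8 hours quoted on the slide.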
  • 18. ASUP.Next Hadoop Architecture
      [Architecture diagram. Recoverable labels: Flume; REST; HDFS; Lookup; Ingest (logs, performance data, raw config, ASUP config); Tools; Pig; Subscribe; Analyze; Metrics, Analytics, BI]
  • 19. NetApp Open Solution for Hadoop: Enterprise Class Hadoop
      Easy to Deploy, Manage, Scale. Performance; Resilience; Density
      – Performance
          – Bandwidth for streaming
          – IOPS for metadata
          – Reduced cluster network congestion
      – Capacity and density
          – 4 servers and 120TB fit in 8U
          – Fully serviceable storage system
      – Reliability
          – Hardware RAID and hot swap prevent job restart in case of media failure
          – Reliable metadata (Name Node)
      – Enterprise-class fit and finish
  • 21. NetApp Storage Solution Architecture
      – Key Attributes:
          – Storage is protected by in-box RAID
              – Shared spare pool defers replacement of drives
              – Rebuild does not consume network bandwidth
          – Storage is striped
              – Maximize performance by minimizing unequal storage utilization
          – Reliable storage: HDFS replication count 2
              – Fewer disks
              – Less space, power, cooling, cost, …
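The "fewer disks" claim follows from the replication count. HDFS keeps three copies of each block by default (the `dfs.replication` property), so dropping to two copies on RAID-protected storage changes the raw capacity needed. The sketch below compares the two layouts; the usable capacity and the RAID efficiency of 0.8 are hypothetical values chosen for illustration, since the deck does not give the RAID geometry.

```python
def raw_tb_needed(usable_tb: float, replication: int,
                  raid_efficiency: float = 1.0) -> float:
    """Raw disk capacity required to hold usable_tb of HDFS data.

    raid_efficiency is the fraction of each RAID group that stores data
    (1.0 means no RAID, as with plain direct-attached disks).
    """
    return usable_tb * replication / raid_efficiency


USABLE_TB = 100.0               # hypothetical data set size
ASSUMED_RAID_EFFICIENCY = 0.8   # hypothetical, e.g. 4 data disks + 1 parity

# Direct-attached disks with the HDFS default of three copies.
direct_attach = raw_tb_needed(USABLE_TB, replication=3)
# Array-based storage with RAID protection and two HDFS copies.
array_based = raw_tb_needed(USABLE_TB, replication=2,
                            raid_efficiency=ASSUMED_RAID_EFFICIENCY)

print(f"direct attach, 3 copies: {direct_attach:.0f} TB raw")
print(f"array, RAID + 2 copies:  {array_based:.0f} TB raw")
```

With these assumed values the array layout needs about 250 TB of raw disk against 300 TB for direct attach. The gap narrows as RAID parity overhead rises, so the saving depends on the actual RAID group size.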
  • 22. NetApp Storage Solution Architecture
      – Primary Questions:
          – Performance?
          – Cost?
  • 24. NetApp Storage Solution Architecture (RESULTS ARE PRELIMINARY)
      – Performance Concerns
          – Initial testing has focused on using TestDFSIO
      – Per-Disk:
          – 14 disks/server in array
          – 6 disks/server direct-attach
  • 25. NetApp Storage Solution Architecture
      – Minimizing TCO
          – Disk rebuild
              – Handled in the controller
              – Minimal impact to performance
              – No network bandwidth consumed
          – Server uptime
              – Very high
          – Hardware maintenance
              – Swap out dead disks as routine, not exception
              – Swap out of stateless servers is painless
  • 27. Takeaways
      – NetApp assessed multiple traditional DB technologies to solve its Big Data problem and determined Hadoop was the best fit
      – Moved from direct-attach disks to array-based storage to improve TCO
      – The overall architecture supports scale-out growth
  • 29. © 2011 NetApp, Inc. All rights reserved. No portions of this document may be reproduced without prior written consent of NetApp, Inc. Specifications are subject to change without notice. NetApp, the NetApp logo, Go further, faster, AutoSupport, Data ONTAP, NOW, and Snapshot are trademarks or registered trademarks of NetApp, Inc. in the United States and/or other countries. Symantec is a registered trademark of Symantec Corporation. All other brands or products are trademarks or registered trademarks of their respective holders and should be treated as such.

Editor's Notes

  1. AutoSupport (resident in Data ONTAP, the OS of every NetApp storage system) constantly monitors, troubleshoots, and reports on the health of NetApp systems. In addition to using AutoSupport for case generation and part dispatch, NetApp’s risk prognosis ecosystem (developed through innovations in people, process, and technology) delivers exemplary storage uptime and customer satisfaction. Risks handled include issues in areas of configuration, interoperability, and other errors induced in the storage system from unintentional operations. The NetApp support site has knowledgebase articles and support bulletins to help SAMs (Support Account Managers) and FSEs (Field Support Engineers) drive adoption and awareness and help customers actively mitigate risks.
  2. The current data warehouse will reach the limits of its capacity and processing capabilities with future Data ONTAP releases. Missed SLAs. The current environment has limited reporting capabilities, with a large demand for ASUP reporting. Processing all performance data for analysis is not feasible due to the size and scale of the data. Data is doubling every 16 months.
  3. Proactive Support: predict failure probabilities; text events, performance changes, lifetime usage. Product Analysis: feature usage; per segment variations. Capacity Planning: growth trends; seasonality factors. Up-sell, cross-sell models.