Analyzing Logs/Configs of 200'000 Systems with Hadoop
christoph.schnidrig@netapp.com




What is AutoSupport?


•  AutoSupport is NetApp's 'phone home' mechanism

•  Collection of
  –    Logfiles
  –    XML files
  –    Command output capture
  –    Counter Manager output


Business Challenges




Gateways
•  600K ASUPs every week
•  40% coming over the weekend
•  2TB growth per week

ETL
•  Data needs to be parsed and loaded in 15 mins

Data Warehouse
•  Only 5% of data goes into the data warehouse
•  Oracle DBMS struggling to scale, maintenance and backups challenging
•  No easy way to access this unstructured content

Reporting
•  Numerous mining requests are not satisfied currently
•  Huge untapped potential of valuable information for lead generation, supportability, and BI




        Finally, the incoming load doubles every 16 months!
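The gateway figures above can be turned into average arrival rates with a little arithmetic. The following is a minimal sketch, assuming a two-day weekend, decimal units (1 TB = 1,000,000 MB) and evenly spread arrivals within each period; the function and constant names are illustrative and not part of the original deck.

```python
# Figures quoted on the slide.
ASUPS_PER_WEEK = 600_000
WEEKEND_SHARE = 0.40          # "40% coming over the weekend"
GROWTH_TB_PER_WEEK = 2.0

MINUTES_PER_DAY = 24 * 60


def arrival_rates(asups_per_week=ASUPS_PER_WEEK, weekend_share=WEEKEND_SHARE):
    """Return average ASUPs per minute for (whole week, weekend, weekdays).

    Assumption: the weekend is 2 days and weekdays are 5 days.
    """
    overall = asups_per_week / (7 * MINUTES_PER_DAY)
    weekend = asups_per_week * weekend_share / (2 * MINUTES_PER_DAY)
    weekdays = asups_per_week * (1 - weekend_share) / (5 * MINUTES_PER_DAY)
    return overall, weekend, weekdays


def avg_asup_size_mb(growth_tb=GROWTH_TB_PER_WEEK, asups=ASUPS_PER_WEEK):
    """Average stored size per ASUP in MB (decimal units, an assumption)."""
    return growth_tb * 1_000_000 / asups


overall, weekend, weekdays = arrival_rates()
print(f"overall:  {overall:.1f} ASUPs/min")
print(f"weekend:  {weekend:.1f} ASUPs/min")
print(f"weekdays: {weekdays:.1f} ASUPs/min")
print(f"average size: {avg_asup_size_mb():.2f} MB per ASUP")
```

Under these assumptions the weekend rate is noticeably higher than the weekday rate, which is why the weekend share matters for sizing the ETL window.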
Hadoop Architecture




Solution Architecture




Client Apps – how the customer sees it




Physical Architecture
[Diagram labels: FAS2040 (A, B) on 1 GB Ethernet; Job Tracker, Name Node, Secondary Name Node; 10 GB/s Ethernet; E2600 Storage Array]

12 data nodes: 12 cores, 48 GB RAM each
3 E-series storage arrays (~600TB)

Some performance numbers

Metrics                                 Hadoop

Raw ASUP ingest throughput              1000 ASUPs/min or 1.5 GB/min
ASUP configuration data parse & load    1000 ASUPs/min
Event messages (EMS) process & load     < 1 hour for 2 billion records ~= > 200 GB/hour
EMS ad-hoc analysis                     4-6M records/sec ~= 200 MB/sec on compressed (LZO) data
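As a rough cross-check, the quoted rates can be converted into per-item figures. This is a sketch only: it assumes decimal units, treats "< 1 hour" and "> 200 GB/hour" as exact bounds, and borrows the 600K ASUPs per week volume from the Business Challenges slide; all names are illustrative.

```python
# Figures quoted on the slide.
INGEST_ASUPS_PER_MIN = 1000
INGEST_GB_PER_MIN = 1.5
EMS_RECORDS = 2_000_000_000
EMS_LOAD_HOURS = 1.0           # slide says "< 1 hour": treated as an upper bound
EMS_LOAD_GB_PER_HOUR = 200     # slide says "> 200 GB/hour": treated as a lower bound
ADHOC_MB_PER_SEC = 200
ADHOC_RECORDS_PER_SEC = (4_000_000, 6_000_000)

# Assumption: weekly volume taken from the Business Challenges slide.
WEEKLY_ASUPS = 600_000


def hours_to_ingest(asups, asups_per_min=INGEST_ASUPS_PER_MIN):
    """Hours needed to ingest `asups` at the quoted ingest rate."""
    return asups / asups_per_min / 60


def mb_per_asup(gb_per_min=INGEST_GB_PER_MIN, asups_per_min=INGEST_ASUPS_PER_MIN):
    """Average ASUP size implied by the two ingest figures (1 GB = 1000 MB)."""
    return gb_per_min * 1000 / asups_per_min


def ems_records_per_sec(records=EMS_RECORDS, hours=EMS_LOAD_HOURS):
    """Minimum EMS load rate implied by '2 billion records in < 1 hour'."""
    return records / (hours * 3600)


def adhoc_bytes_per_record(mb_per_sec=ADHOC_MB_PER_SEC, rates=ADHOC_RECORDS_PER_SEC):
    """Compressed bytes per record at the fast and slow ends of the range."""
    low_rate, high_rate = rates
    return mb_per_sec * 1_000_000 / high_rate, mb_per_sec * 1_000_000 / low_rate


print(f"one week of ASUPs ingests in {hours_to_ingest(WEEKLY_ASUPS):.1f} hours")
print(f"implied ASUP size: {mb_per_asup():.1f} MB")
print(f"EMS load rate: at least {ems_records_per_sec():,.0f} records/sec")
small, large = adhoc_bytes_per_record()
print(f"ad-hoc scan: {small:.0f} to {large:.0f} compressed bytes per record")
```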



New possibilities with Hadoop

•  Correlate disk latency (hot) with disk type
  –  24 billion records
  –  4 weeks to run query
  –  Hadoop implementation 10.5 hours
•  Bug detection through pattern matching
  –  240 billion records – Too large to run
  –  Hadoop implementation 18 hours
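The slides do not show how the latency correlation was implemented. The sketch below only illustrates the general map, shuffle and reduce shape such a job could take, written in plain Python rather than against the Hadoop API. The record layout, the sample values and the 20 ms "hot" threshold are made-up assumptions for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (disk_type, latency_ms). Values are illustrative only.
RECORDS = [
    ("SATA", 25.0),
    ("SATA", 8.0),
    ("SAS", 6.0),
    ("SAS", 22.0),
    ("SAS", 5.0),
    ("SSD", 1.0),
]
HOT_THRESHOLD_MS = 20.0  # assumed definition of a "hot" disk sample


def map_record(record):
    """Map step: emit (disk_type, (is_hot, latency_ms)) for one record."""
    disk_type, latency_ms = record
    yield disk_type, (latency_ms >= HOT_THRESHOLD_MS, latency_ms)


def shuffle(pairs):
    """Group mapped values by key, as a MapReduce framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_group(disk_type, values):
    """Reduce step: sample count, share of hot samples, mean latency per type."""
    hot = sum(1 for is_hot, _ in values if is_hot)
    return disk_type, {
        "samples": len(values),
        "hot_share": hot / len(values),
        "mean_latency_ms": mean(latency for _, latency in values),
    }


def run(records):
    """Run the three phases in-process over an iterable of records."""
    mapped = (pair for record in records for pair in map_record(record))
    return dict(reduce_group(key, values) for key, values in shuffle(mapped).items())


result = run(RECORDS)
for disk_type, stats in sorted(result.items()):
    print(disk_type, stats)
```

Because each record is mapped independently and each disk type is reduced independently, this shape spreads naturally across many nodes, which is the property that makes queries over billions of records feasible.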




Incoming AutoSupport Volumes and TB Consumption

[Chart: Flat-File Storage Requirement. Series: Total Usage (TB), Projected Total Usage (TB), Doubles. Y-axis 0 to 3500; x-axis Jan-05 to Jan-16.]


•  At the current projected rate of growth, total storage requirements continue doubling every 16 months
•  Cost Model:
    > $15M per year Ecosystem costs
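A 16-month doubling period corresponds to exponential growth, usage(t) = usage(0) * 2^(t / 16) with t in months. The sketch below applies that formula; it assumes the doubling period stays constant, and the function names are illustrative.

```python
import math

DOUBLING_MONTHS = 16.0  # from the slide


def projected(current, months_ahead, doubling_months=DOUBLING_MONTHS):
    """Projected value after `months_ahead` months of steady doubling."""
    return current * 2 ** (months_ahead / doubling_months)


def months_until(factor, doubling_months=DOUBLING_MONTHS):
    """Months needed for the value to grow by `factor`."""
    return doubling_months * math.log2(factor)


annual_factor = projected(1.0, 12)
print(f"growth per year: x{annual_factor:.2f}")
print(f"after 4 years:   x{projected(1.0, 48):.0f}")
print(f"10x growth takes about {months_until(10):.0f} months")
```

In other words, a 16-month doubling period is roughly 68% growth per year, or an eightfold increase every four years.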


References
•  NetApp Accelerates AutoSupport Analytics with NetApp Open Solution for Hadoop
    http://media.netapp.com/documents/asup-hadoop.pdf

•  NetApp Open Solution for Hadoop Solutions Guide
    http://media.netapp.com/documents/tr-3969.pdf

•  ESG: Lab Validation Report
    http://media.netapp.com/documents/ar-esg-netapp-open-solution.pdf
GenCyber Cyber Security Day Presentation
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 

16.07.12 Analyzing Logs/Configs of 200'000 Systems with Hadoop (Christoph Schnidrig, NetApp)

  • 1. Analyzing Logs/Configs of 200'000 Systems with Hadoop (christoph.schnidrig@netapp.com)
  • 2. What is AutoSupport?
      – AutoSupport is NetApp's 'phone home' mechanism
      – Collection of:
          – Logfiles
          – XML files
          – Command output capture
          – Counter Manager output
  • 3. Business Challenges
      – Gateways
          – 600K ASUPs every week
          – 40% coming over the weekend
          – 2 TB growth per week
      – ETL
          – Data needs to be parsed and loaded in 15 minutes
      – Data Warehouse
          – Only 5% of data goes into the data warehouse
          – Oracle DBMS struggling to scale; maintenance and backups challenging
          – No easy way to access this unstructured content
      – Reporting
          – Numerous mining requests are currently not satisfied
          – Huge untapped potential of valuable information for lead generation, supportability, and BI
      – Finally, the incoming load doubles every 16 months!
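The Gateways figures imply an average arrival rate that can be worked out directly. The following is a minimal sketch, assuming that "the weekend" means two full days and that arrivals are spread evenly within each period; neither assumption is stated on the slide, and the function name is made up for illustration.

```python
MINUTES_PER_DAY = 24 * 60

def arrival_rates(asups_per_week=600_000, weekend_share=0.40, weekend_days=2):
    """Return average (weekend, weekday) ASUP arrivals per minute.

    Assumes an even spread within each period, which the slide does not claim.
    """
    weekend_asups = asups_per_week * weekend_share
    weekday_asups = asups_per_week - weekend_asups
    weekend_rate = weekend_asups / (weekend_days * MINUTES_PER_DAY)
    weekday_rate = weekday_asups / ((7 - weekend_days) * MINUTES_PER_DAY)
    return weekend_rate, weekday_rate

weekend_rate, weekday_rate = arrival_rates()
print(f"weekend: {weekend_rate:.1f} ASUPs/min, weekdays: {weekday_rate:.1f} ASUPs/min")
```

Under these assumptions the weekend average is roughly 83 ASUPs per minute against 50 on weekdays. Real traffic is likely burstier than an average suggests.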
  • 6. Client Apps – how the customer sees it
  • 7. Physical Architecture
      [Figure: cluster diagram. Labels: FAS2040; 1 GB Ethernet; 10 GB/s Ethernet; Name Node; Secondary Name Node; Job Tracker; E2600 Storage Array]
      – 12 data nodes: 12 cores, 48 GB RAM each
      – 3 E-series storage arrays (~600TB)
  • 8. Some performance numbers (metric: Hadoop result)
      – Raw ASUP ingest throughput: 1000 ASUPs/min or 1.5 GB/min
      – ASUP configuration data parse & load: 1000 ASUPs/min
      – Event messages (EMS) process & load: < 1 hour for 2 billion records (~= > 200 GB/hour)
      – EMS ad-hoc analysis: 4-6M records/sec ~= 200 MB/sec on compressed (LZO) data
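The performance figures can be cross-checked with simple arithmetic. This is a rough sketch, assuming decimal units (1 GB = 1000 MB = 10^9 bytes) and treating the quoted numbers as exact; the derived per-ASUP and per-record sizes are averages inferred here, not values given in the deck.

```python
def mb_per_asup(asups_per_min=1000, gb_per_min=1.5):
    """Average raw ASUP size in MB implied by the ingest figures."""
    return gb_per_min * 1000 / asups_per_min

def ems_records_per_sec(records=2_000_000_000, seconds=3600):
    """Lower bound on EMS load rate, since the run takes less than one hour."""
    return records / seconds

def ems_bytes_per_record(gb_per_hour=200, records=2_000_000_000):
    """Average EMS record size in bytes implied by 200 GB per 2 billion records."""
    return gb_per_hour * 1e9 / records

print(f"{mb_per_asup():.1f} MB per ASUP")
print(f"{ems_records_per_sec():,.0f} EMS records/sec (at least)")
print(f"{ems_bytes_per_record():.0f} bytes per EMS record")
```

The results (about 1.5 MB per ASUP, over 555,000 EMS records per second, and about 100 bytes per EMS record) are only as precise as the rounded slide figures.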
  • 9. New possibilities with Hadoop
      – Correlate disk latency (hot) with disk type
          – 24 billion records
          – 4 weeks to run query
          – Hadoop implementation: 10.5 hours
      – Bug detection through pattern matching
          – 240 billion records
          – Too large to run
          – Hadoop implementation: 18 hours
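The deck does not show how the pattern-matching job is written. As an illustration only, the map and reduce steps of such a job could look like the sketch below, run here in plain Python rather than on a cluster. The log line layout (`<system-id> <message>`), the sample lines, and the `disk.ioFailed` signature are all hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical bug signature; the slides do not show the real patterns.
BUG_PATTERN = re.compile(r"disk\.ioFailed")

def map_phase(lines):
    """Emit (system_id, 1) for every log line whose message matches the pattern."""
    for line in lines:
        system_id, _, message = line.partition(" ")
        if BUG_PATTERN.search(message):
            yield system_id, 1

def reduce_phase(pairs):
    """Sum the match counts per system, as a reducer would after the shuffle."""
    totals = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)

# Made-up sample input in the assumed "<system-id> <message>" layout.
sample = [
    "sys-001 disk.ioFailed: device 0a.17",
    "sys-002 raid.rebuild.start",
    "sys-001 disk.ioFailed: device 0a.18",
]
matches = reduce_phase(map_phase(sample))
print(matches)
```

Because each line is matched independently, the map step can be spread across many data nodes, which is what makes a scan of this size feasible. For scale, 4 weeks is 672 hours, so the 10.5-hour Hadoop run of the disk latency query is about 64 times faster.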
  • 10. Incoming AutoSupport Volumes and TB Consumption
      [Chart: Flat-File Storage Requirement. Series: Total Usage (TB) and Projected Total Usage (TB); y-axis 0 to 3500; x-axis Jan-05 to Jan-16; annotation: "Doubles"]
      – At the projected current rate of growth, total storage requirements continue doubling every 16 months
      – Cost model: > $15M per year ecosystem costs
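Doubling every 16 months corresponds to exponential growth of the form usage(t) = usage(0) * 2^(t/16), with t in months. A small sketch of that projection follows; the 500 TB starting value is a made-up example, not a figure read from the chart.

```python
def projected_usage_tb(current_tb, months_ahead, doubling_months=16):
    """Project storage usage assuming it doubles every `doubling_months`."""
    return current_tb * 2 ** (months_ahead / doubling_months)

# Yearly growth factor implied by a 16-month doubling period.
annual_factor = projected_usage_tb(1.0, 12)
print(f"annual growth factor: {annual_factor:.2f}x")

# Hypothetical starting point of 500 TB, projected 48 months (three doublings) ahead.
print(f"500 TB after 48 months: {projected_usage_tb(500, 48):.0f} TB")
```

A 16-month doubling period works out to roughly 68% growth per year.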
  • 11. References
      – NetApp Accelerates AutoSupport Analytics with NetApp Open Solution for Hadoop: http://media.netapp.com/documents/asup-hadoop.pdf
      – NetApp Open Solution for Hadoop Solutions Guide: http://media.netapp.com/documents/tr-3969.pdf
      – ESG: Lab Validation Report: http://media.netapp.com/documents/ar-esg-netapp-open-solution.pdf