v7.0 – 09/07/2012




Accelerating Decisions Through
Enterprise Hadoop
Evolving Hadoop to support Enterprise Computing




Joey Jablonski
Practice Director, Analytic Services




           ©2012 DataDirect Networks. All Rights Reserved.                                       ddn.com
Agenda for The Data Challenge

►   Overview of DataDirect Networks

►   What is Storage Fusion Processing™,
                      its advantages & applications

►   Overview of Analytics

►   Introduction to Apache Hadoop

►   An overview of the DDN hScaler solution

►   Conclusion


DDN | We Accelerate Information Insight

     DDN provides a competitive advantage by maximizing your
     datacenter investment while mitigating growth challenges
     throughout your discovery process.
 ►   Established: 1998
 ►   Revenue: $226M (2011) – Profitable, Fast Growth
 ►   Main Office: Sunnyvale, California, USA
 ►   Employees: 600+ Worldwide
 ►   Worldwide Presence: 16 Countries
 ►   Installed Base: 1,000+ End Customers; 50+ Countries
 ►   Go To Market: Global Partners, Resellers, Direct




 World-Renowned & Award-Winning



DDN | 15 Years in HPC
  Investment In Scale & Innovation

[Figure: company timeline, 1998 to 2012]
 ►   Milestones: DDN founded and incorporated; first HPC customer (NASA); SFA project inception; WOS project inception; largest private storage company (IDC); 500+ employees
 ►   Product generations: S2A3000, S2A6000, S2A8000, S2A9550, S2A9900, 6620, 10K, 12K
 ►   Awards
Storage Fusion Processing™

[Figure: DDN’s Storage Fusion Architecture. Labels: Storage Media, SAS Interface, RAID Controller, Storage Server, Network Interface (x2), Compute Resource, Applications, GRIDScaler™]

      • Driving Imperatives = Improved OPEX
             Massive bandwidth and low latency to storage media
             Multi-core processors + Big DRAMs
             Virtualization / Hypervisor
DDN | Appliance Portfolio

  Appliances: GRIDScaler™, EXAScaler™

  System      Bandwidth   Flash IOPS   Scales to      In-Storage Processing
  SFA12K-E    40GB/s      1.4M         1680 drives    Yes
  SFA10K-E    15GB/s      840K         1200 drives    Yes
  SFA10K-M    2GB/s       840K         120 drives     Yes

  WOS6000: 4U, 60-Drive System; 8 x GbE per Node; 2PB/Rack, 23PB/Cluster; 25B Objects/Rack

 ►   Maximize Value: Best-In-Class Performance to Accelerate Applications
 ►   Minimize OPEX: >2x More Data Center Efficient Than Competing Systems
 ►   Minimize Overhead: Autonomous System Fault Management & Recovery
Storage Fusion Processing™
A Unique DDN Vision

Embedded Data-Intensive Applications
Within Storage Infrastructure

► Reduce complexity, infrastructure, administration, TCO
► Reduce infrastructure & OPEX
► Increase performance for latency-sensitive applications
► Success today with: File-Systems, iRODS, Hadoop, BWA, FASTA/SAM/BAM
► Work with your research teams to:
  • Identify application candidates
  • Port to our VMs/Hypervisor and Benchmark
  • Deploy to your community
► Candidate applications: Gap Aligners? Molecular Dynamics? Deep and wide search? Query engine?
Why Is Data Analytics So Hard?

[Figure: two skill diagrams, one Technical and one Business]
 ►   Technical (Data Science): Hacking Skills, Math & Statistics knowledge, Substantive Expertise, Traditional Research
 ►   Business (Analytics): Business Acumen, Decisioning, Poor, Communications, Curiosity
Analytics | Looking for Actionable Data



Billions of Data Points to Consider



•   Consumer purchasing trends
•   Product perception
•   Drug Discovery
•   Genomics
•   Surveillance
•   Financial Analysis

How do I leverage Analytics?

[Figure labels: Insight, Modify Behavior, Improved Results]
Data Gravity
Warps the Application Space

[Figure labels: Applications, DATA, Services]
Today's Enterprise Picture

[Figure labels: Empowered Users, Enabled Users, Aware Users, The Cloud]
The Tools of the Trade

[Figure: the Hadoop Ecosystem and Core Apache Hadoop. Input blocks 1 2 3 4 5 6 appear distributed in two rows, (4, 3, 5) and (2, 6, 1), alongside Map and Reduce]
Hadoop & HPC Compared

[Figure: two columns, Data Locality and Inter-process Communication]
 ►   HPC: blocks 1 2 3 4 5 6 shown in order; job input divided into Slice 1 ... Slice n
 ►   Hadoop: blocks 1 2 3 4 5 6 shown distributed as (4, 3, 5) and (2, 6, 1); job input divided into Slice 1 ... Slice n
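One reading of the data-locality comparison can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the node names, the two-blocks-per-node layout, and the helper functions are hypothetical and are not taken from Hadoop or from DDN software. It contrasts scheduling each task on the node that already stores its block with scheduling against blocks that sit on shared storage.

```python
# Hypothetical block layout read off the slide: three nodes, two blocks each.
BLOCK_LOCATIONS = {
    "node-a": [4, 2],
    "node-b": [3, 6],
    "node-c": [5, 1],
}


def schedule_local(block_locations):
    """Assign each block's task to the node that already stores that block."""
    assignments = {}
    for node, blocks in block_locations.items():
        for block in blocks:
            assignments[block] = node
    return assignments


def schedule_shared(blocks, compute_nodes):
    """Assign blocks round-robin; models input held on shared storage."""
    ordered = sorted(blocks)
    return {b: compute_nodes[i % len(compute_nodes)] for i, b in enumerate(ordered)}


def remote_reads(assignments, block_locations):
    """Count tasks whose input block is not stored on the assigned node."""
    stored = {b: n for n, bs in block_locations.items() for b in bs}
    return sum(1 for b, n in assignments.items() if stored.get(b) != n)


if __name__ == "__main__":
    nodes = list(BLOCK_LOCATIONS)
    local = schedule_local(BLOCK_LOCATIONS)
    shared = schedule_shared(range(1, 7), nodes)
    print("remote reads, locality-aware:", remote_reads(local, BLOCK_LOCATIONS))
    # With shared storage no block is local to any compute node.
    print("remote reads, shared storage:", remote_reads(shared, {}))
```

In this toy model the locality-aware schedule needs no remote reads, while every task in the shared-storage schedule pulls its input over the network.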
Organizational Scalability
Higher is Better

[Figure: Adoption (vertical axis) versus Capacity (horizontal axis), annotated "Goal for Human Costs"]
Hadoop Cluster Lifecycle

[Figure: lifecycle stages Deploy, Manage, Monitor, Respond, Upgrade, shown across the Software Platform and the Hardware Platform]
Infrastructure Chargeback

[Figure: Site Overview screen]

• Visibility to Trends
• Actionable Reporting
• Limits & Enforcement
Analytics Services Portfolio

►   Architect
    • Data Transformation
    • Data & Analytics Strategy
    • Security Strategy in shared-data Environments
    • DR&BC
    • Data Curation
    • Solution Sizing
    • Data Center Preparation
    • Process Integration
    • ETL planning
    • Compliance Planning

►   Deploy
    • hScaler Installation
    • hScaler Upgrade
    • Environment Integration
    • Performance Testing
    • Operational Validation
    • Factory Build

►   Manage
    • Data Curation
    • hScaler Administration
    • System Tuning
    • Health Checks

►   Customize
    • Data Migration
    • DR&BC
    • Application Integration
    • Data Curation
    • Application Development
    • Data Cleansing

►   Support
    • Phone/Email
    • Phone Home Monitoring
    • Patches & Upgrades
    • Remote Diagnostics
Apache Hadoop
Genomics Application Examples

 ►    Conditions for efficient Apache Hadoop™ MapReduce™ computing:
      • Algorithm performance should scale with CPU count
      • The algorithm should be embarrassingly parallel
      • There should be no dependence on how the data is distributed
      • The data should be static

 ►    Example genomics applications that work well within Hadoop:
      • Crossbow. Whole genome re-sequencing & SNP genotyping (short reads)
      • Contrail. De novo assembly from short sequencing reads.
      • Myrna. Fast short-read alignment & differential gene expression (RNA-seq)
      • PeakRanger. Cloud-enabled peak caller for ChIP-seq data.
      • Quake. Quality-aware detection and correction of sequencing errors.
      • BlastReduce. High-performance short read mapping.
      • CloudBLAST. Hadoop implementation of NCBI’s BLAST.
      • MrsRF. Algorithm for analyzing large evolutionary trees.
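
The criteria above can be illustrated with a minimal sketch in plain Python rather than Hadoop itself. The base-counting job, the sample reads, and all function names are assumptions chosen for illustration; none of the listed tools works this way internally. Each read is mapped without reference to any other read (embarrassingly parallel), and the merge step is associative and commutative, so the answer does not depend on how the reads are partitioned.

```python
from collections import Counter
from functools import reduce


def map_read(read):
    """Map step: count the bases in one read; no other read is needed."""
    return Counter(read)


def reduce_counts(left, right):
    """Reduce step: merge two partial counts (order does not matter)."""
    return left + right


def run_job(reads, n_partitions):
    """Split reads into partitions, process each independently, then merge."""
    partitions = [reads[i::n_partitions] for i in range(n_partitions)]
    partials = [
        reduce(reduce_counts, map(map_read, part), Counter())
        for part in partitions
    ]
    return reduce(reduce_counts, partials, Counter())


# Hypothetical sample data, small enough to check by hand.
READS = ["ACGT", "GGCA", "TTAC", "ACGG"]

if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, "partition(s):", dict(sorted(run_job(READS, n).items())))
```

Running the job with one, two, or three partitions gives identical totals, which is the property the third bullet asks for.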
CloudBLAST Application Example

     CloudBLAST is a MapReduce version of the commonly used
     bioinformatics application NCBI BLAST.

     [Figure: StreamInputFormat feeds sequence sets S = {s1, s2, … sk} to tasks on CPU-0 through CPU-N; a Data Merger collects their outputs]

     1. Stream Input Formatted data is split
        into “960 long chunks” based on new
        line.
     2. Data “chunks” are split into sequences, which serve as
        keys for the MapReduce job.
     3. BLAST output is written to a local file.

Based on work by Andréa Matsunaga, Maurício Tsugawa and José Fortes - University of Florida
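
The three steps and the final merge can be mimicked with a small, self-contained Python sketch. It is not CloudBLAST and does not call NCBI BLAST: `placeholder_search` is a hypothetical stand-in that only reports sequence length, and the file names, task count, and FASTA sample are assumptions made for illustration.

```python
import tempfile
from pathlib import Path


def split_sequences(fasta_text):
    """Split FASTA text into (header, sequence) records on line boundaries."""
    records, header, parts = [], None, []
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(parts)))
            header, parts = line[1:], []
        else:
            parts.append(line)
    if header is not None:
        records.append((header, "".join(parts)))
    return records


def placeholder_search(header, sequence):
    """Hypothetical stand-in for a BLAST run: reports the sequence length."""
    return f"{header}\t{len(sequence)}"


def map_chunk(records, out_dir, task_id):
    """Map task: process one chunk and write its results to a local file."""
    out_path = Path(out_dir) / f"part-{task_id:05d}.txt"
    lines = [placeholder_search(h, s) + "\n" for h, s in records]
    out_path.write_text("".join(lines))
    return out_path


def merge_outputs(paths):
    """Data merger: concatenate the per-task output files in task order."""
    return "".join(Path(p).read_text() for p in sorted(paths))


def run(fasta_text, n_tasks=2):
    """Split the input, run one map task per chunk, then merge the outputs."""
    records = split_sequences(fasta_text)
    with tempfile.TemporaryDirectory() as out_dir:
        paths = [map_chunk(records[i::n_tasks], out_dir, i) for i in range(n_tasks)]
        return merge_outputs(paths)


if __name__ == "__main__":
    sample = ">s1\nACGT\nAC\n>s2\nGG\n>s3\nTTT\n"
    print(run(sample), end="")
```

Because each map task writes its own local file, the tasks never share state; only the merge step touches all of the outputs.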
How DDN Can Accelerate Your Analytics
►   Lower Total Cost of Ownership and Improved OPEX:
    • Scale – Dynamically add capacity to match your complex workloads
    • Value – Grow storage capacity economically: Access, Solve, Archive
    • High Availability - Always running with world-class 24/7 service & support

►   Drive Innovation:
    • Performance at Scale – A homogeneous platform that performs at scale
    • Eloquent - Leverage virtualization to deliver an analytics platform that provides the
      quickest answers to your most complex questions
    • Collaboration – Centralize & share discoveries across the globe, securely

►   Deliver Experience:
    • Fifteen Years of HPC – Government Labs, DoE, and Universities trust DDN
    • The HPC community relies on DDN – 60% of the Top 500 supercomputers & growing
    • Single-vendor solution - OEMs provide DDN with their datacenter solutions.



Thank you – Questions?



DataDirect Networks, Information in Motion, Silicon Storage Appliance, S2A, Storage Fusion Architecture, SFA, Storage Fusion Fabric, Web Object Scaler, WOS, EXAScaler, GRIDScaler,
       xSTREAMScaler, NAS Scaler, ReAct, ObjectAssure, In-Storage Processing and SATAssure are all trademarks of DataDirect Networks. Any unauthorized use is prohibited.

                       ©2012 DataDirect Networks. All Rights Reserved.                                                                                              ddn.com

More Related Content

What's hot

01 data quality-international challenge
01 data quality-international challenge01 data quality-international challenge
01 data quality-international challengePiLog
 
Maximize the Business Value of Your Information
Maximize the Business Value of Your Information Maximize the Business Value of Your Information
Maximize the Business Value of Your Information Iron Mountain
 
2011 As Corporate Overview Linked In
2011 As Corporate Overview Linked In2011 As Corporate Overview Linked In
2011 As Corporate Overview Linked Infrancisxsmith
 
2011 As Corporate Overview Linked In
2011 As Corporate Overview Linked In2011 As Corporate Overview Linked In
2011 As Corporate Overview Linked InMichelle Josephson
 
2011 As Corporate Overview Linked In
2011 As Corporate Overview Linked In2011 As Corporate Overview Linked In
2011 As Corporate Overview Linked Invickimason
 
2011 As Corporate Overview Linked In
2011 As Corporate Overview Linked In2011 As Corporate Overview Linked In
2011 As Corporate Overview Linked Inamrobbins17
 
Hadoop World 2011: Big Data Analytics – Data Professionals: The New Enterpris...
Hadoop World 2011: Big Data Analytics – Data Professionals: The New Enterpris...Hadoop World 2011: Big Data Analytics – Data Professionals: The New Enterpris...
Hadoop World 2011: Big Data Analytics – Data Professionals: The New Enterpris...Cloudera, Inc.
 
STPCon fall 2012: The Testing Renaissance Has Arrived
STPCon fall 2012: The Testing Renaissance Has ArrivedSTPCon fall 2012: The Testing Renaissance Has Arrived
STPCon fall 2012: The Testing Renaissance Has ArrivedSOASTA
 
SQL-H a new way to enable SQL analytics
SQL-H a new way to enable SQL analyticsSQL-H a new way to enable SQL analytics
SQL-H a new way to enable SQL analyticsDataWorks Summit
 
Infosüsteemide infrastruktuuri haldus ja monitooring Oracle Enterprise Manage...
Infosüsteemide infrastruktuuri haldus ja monitooring Oracle Enterprise Manage...Infosüsteemide infrastruktuuri haldus ja monitooring Oracle Enterprise Manage...
Infosüsteemide infrastruktuuri haldus ja monitooring Oracle Enterprise Manage...ORACLE USER GROUP ESTONIA
 
Tech Talk SQL Server 2012 Business Intelligence
Tech Talk SQL Server 2012 Business IntelligenceTech Talk SQL Server 2012 Business Intelligence
Tech Talk SQL Server 2012 Business IntelligenceRay Cochrane
 
The CIOs Guide to NoSQL 2012
The CIOs Guide to NoSQL 2012The CIOs Guide to NoSQL 2012
The CIOs Guide to NoSQL 2012DATAVERSITY
 
Patterns of Data Distribution
Patterns of Data DistributionPatterns of Data Distribution
Patterns of Data DistributionRick Warren
 
Services and Models in a Large IT System
Services and Models in a Large IT SystemServices and Models in a Large IT System
Services and Models in a Large IT SystemCHOOSE
 
HP Storage Works -Clemes Esser
HP Storage Works -Clemes EsserHP Storage Works -Clemes Esser
HP Storage Works -Clemes EsserHPDutchWorld
 
Introduction to the Interoperability Reference Architecture
Introduction to the Interoperability Reference ArchitectureIntroduction to the Interoperability Reference Architecture
Introduction to the Interoperability Reference ArchitectureHealth Informatics New Zealand
 
Microsoft Data Mining 2012
Microsoft Data Mining 2012Microsoft Data Mining 2012
Microsoft Data Mining 2012Mark Ginnebaugh
 

What's hot (19)

01 data quality-international challenge
01 data quality-international challenge01 data quality-international challenge
01 data quality-international challenge
 
Maximize the Business Value of Your Information
Maximize the Business Value of Your Information Maximize the Business Value of Your Information
Maximize the Business Value of Your Information
 
UPES-First Indian University to implement SAP
UPES-First Indian University to implement SAPUPES-First Indian University to implement SAP
UPES-First Indian University to implement SAP
 
2011 As Corporate Overview Linked In
2011 As Corporate Overview Linked In2011 As Corporate Overview Linked In
2011 As Corporate Overview Linked In
 
2011 As Corporate Overview Linked In
2011 As Corporate Overview Linked In2011 As Corporate Overview Linked In
2011 As Corporate Overview Linked In
 
2011 As Corporate Overview Linked In
2011 As Corporate Overview Linked In2011 As Corporate Overview Linked In
2011 As Corporate Overview Linked In
 
2011 As Corporate Overview Linked In
2011 As Corporate Overview Linked In2011 As Corporate Overview Linked In
2011 As Corporate Overview Linked In
 
Hadoop World 2011: Big Data Analytics – Data Professionals: The New Enterpris...
Hadoop World 2011: Big Data Analytics – Data Professionals: The New Enterpris...Hadoop World 2011: Big Data Analytics – Data Professionals: The New Enterpris...
Hadoop World 2011: Big Data Analytics – Data Professionals: The New Enterpris...
 
STPCon fall 2012: The Testing Renaissance Has Arrived
STPCon fall 2012: The Testing Renaissance Has ArrivedSTPCon fall 2012: The Testing Renaissance Has Arrived
STPCon fall 2012: The Testing Renaissance Has Arrived
 
SQL-H a new way to enable SQL analytics
SQL-H a new way to enable SQL analyticsSQL-H a new way to enable SQL analytics
SQL-H a new way to enable SQL analytics
 
Infosüsteemide infrastruktuuri haldus ja monitooring Oracle Enterprise Manage...
Infosüsteemide infrastruktuuri haldus ja monitooring Oracle Enterprise Manage...Infosüsteemide infrastruktuuri haldus ja monitooring Oracle Enterprise Manage...
Infosüsteemide infrastruktuuri haldus ja monitooring Oracle Enterprise Manage...
 
Tech Talk SQL Server 2012 Business Intelligence
Tech Talk SQL Server 2012 Business IntelligenceTech Talk SQL Server 2012 Business Intelligence
Tech Talk SQL Server 2012 Business Intelligence
 
The CIOs Guide to NoSQL 2012
The CIOs Guide to NoSQL 2012The CIOs Guide to NoSQL 2012
The CIOs Guide to NoSQL 2012
 
Business Models for Interoperability
Business Models for InteroperabilityBusiness Models for Interoperability
Business Models for Interoperability
 
Patterns of Data Distribution
Patterns of Data DistributionPatterns of Data Distribution
Patterns of Data Distribution
 
Services and Models in a Large IT System
Services and Models in a Large IT SystemServices and Models in a Large IT System
Services and Models in a Large IT System
 
HP Storage Works -Clemes Esser
HP Storage Works -Clemes EsserHP Storage Works -Clemes Esser
HP Storage Works -Clemes Esser
 
Introduction to the Interoperability Reference Architecture
Introduction to the Interoperability Reference ArchitectureIntroduction to the Interoperability Reference Architecture
Introduction to the Interoperability Reference Architecture
 
Microsoft Data Mining 2012
Microsoft Data Mining 2012Microsoft Data Mining 2012
Microsoft Data Mining 2012
 

Viewers also liked

SNIA 2012 - Creating an Enterprise Hadoop Platform
SNIA 2012 - Creating an Enterprise Hadoop PlatformSNIA 2012 - Creating an Enterprise Hadoop Platform
SNIA 2012 - Creating an Enterprise Hadoop PlatformJoey Jablonski
 
DDN and Intel: Partnered for Exascale
DDN and Intel: Partnered for ExascaleDDN and Intel: Partnered for Exascale
DDN and Intel: Partnered for ExascaleIntel IT Center
 
DDN GS7K - Easy-to-deploy, High Performance Scale-Out Parallel File System Ap...
DDN GS7K - Easy-to-deploy, High Performance Scale-Out Parallel File System Ap...DDN GS7K - Easy-to-deploy, High Performance Scale-Out Parallel File System Ap...
DDN GS7K - Easy-to-deploy, High Performance Scale-Out Parallel File System Ap...inside-BigData.com
 
Phan tich co phieu JVC, DNM, DDN (fintzone)
Phan tich co phieu JVC, DNM, DDN  (fintzone)Phan tich co phieu JVC, DNM, DDN  (fintzone)
Phan tich co phieu JVC, DNM, DDN (fintzone)Tony Auditor
 
DDN: Protecting Your Data, Protecting Your Hardware
DDN: Protecting Your Data, Protecting Your HardwareDDN: Protecting Your Data, Protecting Your Hardware
DDN: Protecting Your Data, Protecting Your Hardwareinside-BigData.com
 
IBM general parallel file system - introduction
IBM general parallel file system - introductionIBM general parallel file system - introduction
IBM general parallel file system - introductionIBM Danmark
 
Optimizing Lustre and GPFS with DDN
Optimizing Lustre and GPFS with DDNOptimizing Lustre and GPFS with DDN
Optimizing Lustre and GPFS with DDNinside-BigData.com
 
DDN: Massively-Scalable Platforms and Solutions Engineered for the Big Data a...
DDN: Massively-Scalable Platforms and Solutions Engineered for the Big Data a...DDN: Massively-Scalable Platforms and Solutions Engineered for the Big Data a...
DDN: Massively-Scalable Platforms and Solutions Engineered for the Big Data a...inside-BigData.com
 
Academic Workflows with iRODS FINAL
Academic Workflows with iRODS FINALAcademic Workflows with iRODS FINAL
Academic Workflows with iRODS FINALRandy Splinter
 

Viewers also liked (13)

SNIA 2012 - Creating an Enterprise Hadoop Platform
SNIA 2012 - Creating an Enterprise Hadoop PlatformSNIA 2012 - Creating an Enterprise Hadoop Platform
SNIA 2012 - Creating an Enterprise Hadoop Platform
 
Corralling Big Data at TACC
Corralling Big Data at TACCCorralling Big Data at TACC
Corralling Big Data at TACC
 
DDN Service Strategy
DDN Service StrategyDDN Service Strategy
DDN Service Strategy
 
DDN and Intel: Partnered for Exascale
DDN and Intel: Partnered for ExascaleDDN and Intel: Partnered for Exascale
DDN and Intel: Partnered for Exascale
 
DDN GS7K - Easy-to-deploy, High Performance Scale-Out Parallel File System Ap...
DDN GS7K - Easy-to-deploy, High Performance Scale-Out Parallel File System Ap...DDN GS7K - Easy-to-deploy, High Performance Scale-Out Parallel File System Ap...
DDN GS7K - Easy-to-deploy, High Performance Scale-Out Parallel File System Ap...
 
Phan tich co phieu JVC, DNM, DDN (fintzone)
Phan tich co phieu JVC, DNM, DDN  (fintzone)Phan tich co phieu JVC, DNM, DDN  (fintzone)
Phan tich co phieu JVC, DNM, DDN (fintzone)
 
Ddn Vision
Ddn VisionDdn Vision
Ddn Vision
 
DDN: Protecting Your Data, Protecting Your Hardware
DDN: Protecting Your Data, Protecting Your HardwareDDN: Protecting Your Data, Protecting Your Hardware
DDN: Protecting Your Data, Protecting Your Hardware
 
IBM general parallel file system - introduction
IBM general parallel file system - introductionIBM general parallel file system - introduction
IBM general parallel file system - introduction
 
Optimizing Lustre and GPFS with DDN
Optimizing Lustre and GPFS with DDNOptimizing Lustre and GPFS with DDN
Optimizing Lustre and GPFS with DDN
 
DDN: Massively-Scalable Platforms and Solutions Engineered for the Big Data a...
DDN: Massively-Scalable Platforms and Solutions Engineered for the Big Data a...DDN: Massively-Scalable Platforms and Solutions Engineered for the Big Data a...
DDN: Massively-Scalable Platforms and Solutions Engineered for the Big Data a...
 
DDN Product Update from SC13
DDN Product Update from SC13DDN Product Update from SC13
DDN Product Update from SC13
 
Academic Workflows with iRODS FINAL
Academic Workflows with iRODS FINALAcademic Workflows with iRODS FINAL
Academic Workflows with iRODS FINAL
 

DDN Accelerating-Decisions-Through-Enterprise-Hadoop-final

  • 1. Accelerating Decisions Through Enterprise Hadoop: Evolving Hadoop to support Enterprise Computing. v7.0 – 09/07/2012. Joey Jablonski, Practice Director, Analytic Services. ©2012 DataDirect Networks. All Rights Reserved. ddn.com
  • 2. Agenda for The Data Challenge ► Overview of DataDirect Networks ► What is Storage Fusion Processing™, its advantages & applications ► Overview of Analytics ► Introduction to Apache Hadoop ► An overview of DDN hScaler solution ► Conclusion
  • 3. DDN | We Accelerate Information Insight. DDN provides a competitive advantage by maximizing your datacenter investment while mitigating growth challenges over your discovery process. ► Established: 1998 ► Revenue: $226M (2011) – Profitable, Fast Growth ► Main Office: Sunnyvale, California, USA ► Employees: 600+ Worldwide ► Worldwide Presence: 16 Countries ► Installed Base: 1,000+ End Customers; 50+ Countries ► Go To Market: Global Partners, Resellers, Direct. World-Renowned & Award-Winning
  • 4. DDN | 15 Years in HPC: Investment In Scale & Innovation. [Timeline, 1998–2012. Milestones: DDN founded/incorporated; first HPC customer (NASA); SFA project inception; WOS project inception; largest private storage co. (IDC); 500+ employees; awards. Products shown: S2A3000, S2A6000, S2A8000, S2A9550, S2A9900, 6620, 10K, 12K.]
  • 5. Agenda for The Data Challenge ► Overview of DataDirect Networks ► What is Storage Fusion Processing™, its advantages & applications ► Overview of Analytics ► Introduction to Apache Hadoop ► An overview of DDN hScaler solution ► Conclusion
  • 6. Storage Fusion Processing™. [Diagram labels: Applications; DDN’s Storage Fusion Architecture; GRIDScaler™; Network Interface; Storage Server; SAS Interface; RAID Controller; Compute Resource; Storage Media.] Driving Imperatives = Improved OPEX: • Massive bandwidth and low latency to storage media • Multi-core processors + Big DRAMs • Virtualization / Hypervisor
  • 7. DDN | Appliance Portfolio (GRIDScaler™, EXAScaler™, WOS)
      ► SFA12K-E: Bandwidth 40GB/s; Flash IOPS 1.4M; Scales to 1680 Drives; In-Storage Processing
      ► SFA10K-E: Bandwidth 15GB/s; Flash IOPS 840K; Scales to 1200 Drives; In-Storage Processing
      ► SFA10K-M: Bandwidth 2GB/s; Flash IOPS 840K; Scales to 120 Drives; In-Storage Processing
      ► WOS6000: 4U, 60-Drive System; 8 x GbE per Node; 2PB/Rack, 23PB/Cluster; 25B Objects/Rack
      Maximize Value: Best-In-Class Performance to Accelerate Applications. Minimize OPEX: >2x More Data Center Efficient Than Competing Systems. Minimize Overhead: Autonomous System Fault Management & Recovery.
  • 8. Storage Fusion Processing™: A Unique DDN Vision. Embedded Data-Intensive Applications Within Storage Infrastructure ► Reduce complexity, infrastructure, administration, TCO ► Reduce infrastructure & OPEX ► Increase performance for latency-sensitive applications ► Success today with: File-Systems, iRODS, Hadoop, BWA, FASTA/SAM/BAM ► Work with your research teams to: • Identify application candidates • Port to our VMs/Hypervisor and Benchmark • Deploy to your community. [Callouts: Gap Aligners? Molecular Dynamics? Deep and wide search? Query engine?]
  • 9. Agenda for The Data Challenge ► Overview of DataDirect Networks ► What is Storage Fusion Processing™, its advantages & applications ► Overview of Analytics ► Introduction to Apache Hadoop ► An overview of DDN hScaler solution ► Conclusion
  • 10. Why Is Data Analytics So Hard? [Venn diagram; labels include: Technical; Business; Hacking Skills; Math & Statistics knowledge; Substantive Expertise; Data Science; Traditional Research; Business Acumen; Analytics; Decisioning; Curiosity; Poor Communications.]
  • 11. Analytics | Looking for Actionable Data. Billions of Data Points to Consider • Consumer purchasing trends • Product perception • Drug Discovery • Genomics • Surveillance • Financial Analysis
  • 12. How do I leverage Analytics? [Cycle diagram: Insight; Modify Behavior; Improved Results.]
  • 13. Data Gravity Warps the Application Space. [Diagram labels: Applications; DATA; Services.]
  • 14. Today’s Enterprise Picture. [Diagram labels: Empowered Users; Enabled Users; Aware Users; The Cloud.]
  • 15. Agenda for The Data Challenge ► Overview of DataDirect Networks ► What is Storage Fusion Processing™, its advantages & applications ► Overview of Analytics ► Introduction to Apache Hadoop ► An overview of DDN hScaler solution ► Conclusion
  • 16. The Tools of the Trade. [Diagram labels: Hadoop Ecosystem; Core Apache Hadoop; MapReduce; data blocks numbered 1 to 6.]
  • 17. Hadoop & HPC Compared: Data Locality, Inter-process Communication. [Diagram: for both HPC and Hadoop, the job input is divided into Slice 1 through Slice n over data blocks numbered 1 to 6.]
  • 18. Organizational Scalability. [Chart labels: Higher is Better; Adoption; Goal for Human Costs; Capacity.]
  • 19. Agenda for The Data Challenge ► Overview of DataDirect Networks ► What is Storage Fusion Processing™, its advantages & applications ► Overview of Analytics ► Introduction to Apache Hadoop ► An overview of DDN hScaler solution ► Conclusion
  • 20. Hadoop Cluster Lifecycle. [Cycle diagram: Deploy; Upgrade; Manage; Respond; Monitor. Layers: Software Platform; Hardware Platform.]
  • 21. Infrastructure Chargeback • Visibility to Trends • Actionable Reporting • Limits & Enforcement. [Screenshot: Site Overview.]
  • 22. Analytics Services Portfolio Architect Deploy Manage Customize • Data Transformation • hScaler Installation • Data Curation • Data Migration • Data & Analytics • hScaler Upgrade • hScaler Administration • DR&BC Strategy • Environment Integration • System Tuning • Application Integration • Security Strategy in • Performance Testing • Health Checks • Data Curation shared-data • Operational Validation • Application Development Environments • Factory Build • Data Cleansing • DR&BC • Data Curation • Solution Sizing • Data Center Preparation Support • Process Integration • Phone/Email • ETL planning • Phone Home Monitoring • Compliance Planning • Patches & Upgrades • Remote Diagnostics
  • 23. Apache Hadoop Genomics Application Examples
      ► Apache Hadoop™ MapReduce™ computing efficiency: • Algorithm performance should scale with CPU count • The algorithm should be embarrassingly parallel • There should be no dependence on how the data is distributed • The data should be static
      ► Example genomics applications that work well within Hadoop: • Crossbow. Whole genome re-sequencing & SNP genotyping (short reads). • Contrail. De novo assembly from short sequencing reads. • Myrna. Fast short-read & differential gene expression aligner (RNA-seq). • PeakRanger. Cloud-enabled peak caller for ChIP-seq data. • Quake. Quality-aware detection and sequencing error correction tool. • BlastReduce. High-performance short read mapping. • CloudBLAST. Hadoop implementation of NCBI’s BLAST. • MrsRF. Algorithm for analyzing large evolutionary trees.
  • 24. CloudBLAST Application Example. CloudBLAST is a MapReduce version of the commonly used bioinformatics application NCBI BLAST. 1. StreamInputFormat: formatted input data is split into “960 long chunks” based on new lines. 2. Data “chunks” are split into sequences as keys for the MapReduce job. 3. BLAST output is written to a local file. [Diagram: sequence sets S = {s1, s2, … sk} assigned to CPU-0 through CPU-N, followed by a Data Merger.] Based on work by Andréa Matsunaga, Maurício Tsugawa and José Fortes, University of Florida.
  • 25. Agenda for The Data Challenge ► Overview of DataDirect Networks ► What is Storage Fusion Processing™, its advantages & applications ► Overview of Analytics ► Introduction to Apache Hadoop ► An overview of DDN hScaler solution ► Conclusion
  • 26. How DDN Can Accelerate Your Analytics
      ► Lower Total Cost of Ownership and Improved OPEX: • Scale – Dynamically add capacity to match your complex workloads • Value – Grow storage capacity economically: Access, Solve, Archive • High Availability – Always running, with world-class 24/7 service & support
      ► Drive Innovation: • Performance at Scale – A homogeneous platform that performs at scale • Eloquent – Leverage virtualization to deliver an analytics platform that provides the quickest answers to your most complex questions • Collaboration – Centralize & share discoveries across the globe, securely
      ► Deliver Experience: • Fifteen Years of HPC – Government Labs, DoE, and Universities trust DDN • The HPC community relies on DDN – 60% of the Top 500 supercomputers & growing • Single vendor solution – OEMs provide DDN with their datacenter solutions
  • 27. Thank you – Questions? DataDirect Networks, Information in Motion, Silicon Storage Appliance, S2A, Storage Fusion Architecture, SFA, Storage Fusion Fabric, Web Object Scaler, WOS, EXAScaler, GRIDScaler, xSTREAMScaler, NAS Scaler, ReAct, ObjectAssure, In-Storage Processing and SATAssure are all trademarks of DataDirect Networks. Any unauthorized use is prohibited.
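
The CloudBLAST flow on slide 24 (split the input, map each chunk of sequences independently, then merge the outputs) can be illustrated with a small, single-machine Python sketch. This is not CloudBLAST or Hadoop code: the function names, the chunk size, and the thread pool are assumptions made for illustration, and `map_chunk` is a hypothetical stand-in that counts motif occurrences instead of running NCBI BLAST.

```python
from concurrent.futures import ThreadPoolExecutor


def split_fasta(text):
    """Split FASTA-formatted text into (header, sequence) records."""
    records = []
    header, parts = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(parts)))
            header, parts = line[1:], []
        elif header is not None:
            parts.append(line)
    if header is not None:
        records.append((header, "".join(parts)))
    return records


def chunk(items, size):
    """Group items into consecutive chunks of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def map_chunk(records, motif):
    """Hypothetical stand-in for BLAST: count non-overlapping motif hits."""
    return [(header, seq.count(motif)) for header, seq in records]


def run(text, motif, chunk_size=2, workers=4):
    """Split, map each chunk independently, then merge the partial outputs."""
    chunks = chunk(split_fasta(text), chunk_size)
    # Chunks share no state, so they can be processed in any order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(lambda c: map_chunk(c, motif), chunks))
    merged = {}
    for part in partials:  # the "data merger" step
        for header, hits in part:
            merged[header] = hits
    return merged


if __name__ == "__main__":
    sample = ">s1\nACGTACGT\n>s2\nTTTT\n>s3\nAC\nGT\n"
    print(run(sample, "ACGT"))
```

Because each chunk is handled without reference to any other, the sketch meets the criteria listed on slide 23: the work is embarrassingly parallel and does not depend on how the data is distributed.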