DDN Accelerating-Decisions-Through-Enterprise-Hadoop-final
- 1. Accelerating Decisions Through Enterprise Hadoop
Evolving Hadoop to Support Enterprise Computing
v7.0 – 09/07/2012
Joey Jablonski, Practice Director, Analytic Services
©2012 DataDirect Networks. All Rights Reserved. ddn.com
- 2. Agenda for The Data Challenge
► Overview of DataDirect Networks
► What is Storage Fusion Processing™, its advantages & applications
► Overview of Analytics
► Introduction to Apache Hadoop
► An overview of DDN hScaler solution
► Conclusion
- 3. DDN | We Accelerate Information Insight
DDN provides a competitive advantage by maximizing your datacenter investment while mitigating growth challenges throughout your discovery process.
► Established: 1998
► Revenue: $226M (2011) – Profitable, Fast Growth
► Main Office: Sunnyvale, California, USA
► Employees: 600+ Worldwide
► Worldwide Presence: 16 Countries
► Installed Base: 1,000+ End Customers; 50+ Countries
► Go To Market: Global Partners, Resellers, Direct
World-Renowned & Award-Winning
- 4. DDN | 15 Years in HPC
Investment In Scale & Innovation
[Figure: timeline, 1998 to 2012. Milestones: DDN founded; DDN incorporated; 1st customer, NASA; first HPC customer; SFA project inception; WOS project inception; largest private storage company (IDC); 500+ employees; awards. Products: S2A3000, S2A6000, S2A8000, S2A9550, S2A9900, 6620, 10K, 12K.]
- 6. Storage Fusion Processing™
[Figure: DDN’s Storage Fusion Architecture. Diagram labels: Applications, GRIDScaler™, Network Interface, Storage Server, Compute Resource, RAID Controller, SAS Interface, Storage Media.]
• Driving Imperatives = Improved OPEX
Massive bandwidth and low latency to storage media
Multi-core processors + Big DRAMs
Virtualization / Hypervisor
- 7. DDN | Appliance Portfolio
GRIDScaler™ EXAScaler™
► SFA12K-E: Bandwidth 40GB/s; Flash IOPS 1.4M; scales to 1680 drives; In-Storage Processing
► SFA10K-E: Bandwidth 15GB/s; Flash IOPS 840K; scales to 1200 drives; In-Storage Processing
► SFA10K-M: Bandwidth 2GB/s; Flash IOPS 840K; scales to 120 drives; In-Storage Processing
► WOS6000: 4U, 60-drive system; 8 x GbE per node; 2PB/rack, 23PB/cluster; 25B objects/rack
Maximize Value: Best-In-Class Performance to Accelerate Applications
Minimize OPEX: >2x More Data Center Efficient Than Competing Systems
Minimize Overhead: Autonomous System Fault Management & Recovery
- 8. Storage Fusion Processing™
A Unique DDN Vision
Embedded Data-Intensive Applications Within Storage Infrastructure
► Reduce complexity, infrastructure, administration, TCO
► Reduce infrastructure & OPEX
► Increase performance for latency sensitive applications
► Success today with: File-Systems, iRODS, Hadoop, BWA, FASTA/SAM/BAM
► Work with your research teams to:
• Identify application candidates
• Port to our VMs/Hypervisor and Benchmark
• Deploy to your community
Candidate applications: Gap Aligners? Molecular Dynamics? Deep and wide search? Query engine?
- 10. Why Is Data Analytics So Hard?
[Figure: Venn-style diagram spanning Technical and Business domains. Labels: Hacking Skills, Business Acumen, Math & Statistics knowledge, Substantive Expertise, Data Science, Analytics, Decisioning, Traditional Research, Poor Communications, Curiosity.]
- 11. Analytics | Looking for Actionable Data
Billions of Data Points to Consider
• Consumer purchasing trends
• Product perception
• Drug Discovery
• Genomics
• Surveillance
• Financial Analysis
- 12. How do I leverage Analytics?
[Figure: cycle linking Insight, Modify Behavior, and Improved Results.]
- 13. Data Gravity
Warps the Application Space
Applications
DATA
Services
- 14. Today’s Enterprise Picture
[Figure labels: Empowered Users, Enabled Users, Aware Users, The Cloud.]
- 16. The Tools of the Trade
[Figure: layers labeled Ecosystem, Hadoop, Core Apache Hadoop, and MapReduce. The MapReduce illustration shows the unordered inputs 4 3 5 and 2 6 1 alongside the ordered output 1 2 3 4 5 6.]
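The MapReduce illustration on this slide (inputs 4 3 5 and 2 6 1, output 1 2 3 4 5 6) can be sketched in plain Python. This is only a minimal, single-process illustration of the map, shuffle, and reduce stages; it does not use Hadoop's API, and the function names `map_phase`, `shuffle`, `reduce_phase`, and `run_job` are hypothetical names chosen for this example.

```python
from __future__ import annotations

from collections import defaultdict


def map_phase(input_slice: list[int]) -> list[tuple[int, int]]:
    """Map step: emit one (key, 1) pair per number in a single input slice."""
    return [(number, 1) for number in input_slice]


def shuffle(pairs: list[tuple[int, int]]) -> list[tuple[int, list[int]]]:
    """Shuffle step: group values by key and return the groups in key order."""
    grouped: dict[int, list[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return sorted(grouped.items())


def reduce_phase(key: int, values: list[int]) -> list[int]:
    """Reduce step: output the key once for every time it was emitted."""
    return [key] * sum(values)


def run_job(slices: list[list[int]]) -> list[int]:
    """Run map over every slice, shuffle the pairs, then reduce each group."""
    mapped = [pair for s in slices for pair in map_phase(s)]
    output: list[int] = []
    for key, values in shuffle(mapped):
        output.extend(reduce_phase(key, values))
    return output


result = run_job([[4, 3, 5], [2, 6, 1]])
print(result)  # [1, 2, 3, 4, 5, 6]
```

In a real cluster each slice would be mapped on a different node and the shuffle would move data over the network; here everything runs in one process to keep the stages visible.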
- 17. Hadoop & HPC Compared
Compared on: Data Locality, Inter-process Communication
[Figure: two panels, HPC and Hadoop. In each, the job input is divided into slices (Slice 1 … Slice n); the example values 4 3 5 and 2 6 1 appear alongside the ordered result 1 2 3 4 5 6.]
- 20. Hadoop Cluster Lifecycle
[Figure: lifecycle stages Deploy, Manage, Monitor, Respond, Upgrade, covering both the Software Platform and the Hardware Platform.]
- 21. Infrastructure Chargeback
• Visibility to Trends
• Actionable Reporting
• Limits & Enforcement
Site Overview
- 22. Analytics Services Portfolio
► Architect
• Data Transformation
• Data & Analytics Strategy
• Security Strategy in shared-data Environments
• DR&BC
• Data Curation
• Solution Sizing
• Data Center Preparation
• Process Integration
• ETL planning
• Compliance Planning
► Deploy
• hScaler Installation
• hScaler Upgrade
• Environment Integration
• Performance Testing
• Operational Validation
• Factory Build
► Manage
• Data Curation
• hScaler Administration
• System Tuning
• Health Checks
► Customize
• Data Migration
• DR&BC
• Application Integration
• Data Curation
• Application Development
• Data Cleansing
► Support
• Phone/Email
• Phone Home Monitoring
• Patches & Upgrades
• Remote Diagnostics
- 23. Apache Hadoop Genomics Application Examples
► Conditions for efficient Apache Hadoop™ MapReduce™ computing:
• Algorithm performance should scale with CPU count
• The algorithm should be embarrassingly parallel
• There should be no dependence on how the data is distributed
• The data should be static
► Example genomics applications that work well within Hadoop:
• Crossbow. Whole genome re-sequencing & SNP genotyping (short reads)
• Contrail. De novo assembly from short sequencing reads.
• Myrna. Fast short-read & differential gene expression aligner (RNA-seq)
• PeakRanger. Cloud-enabled peak caller for ChIP-seq data.
• Quake. Quality-aware detection and correction of sequencing errors.
• BlastReduce. High-performance short read mapping.
• CloudBLAST. Hadoop implementation of NCBI’s BLAST.
• MrsRF. Algorithm for analyzing large evolutionary trees.
- 24. CloudBLAST Application Example
CloudBLAST is a MapReduce version of the commonly used bioinformatics application NCBI BLAST.
[Figure: StreamInputFormat feeds sequence sets S = {s1, s2, … sk} to workers labeled CPU - 0 through CPU - N; a Data Merger combines their output.]
1. StreamInputFormat data is split into “960 long chunks” based on newlines.
2. Data “chunks” are split into sequences that serve as keys for the MapReduce job.
3. BLAST output is written to a local file.
Based on work by Andréa Matsunaga, Maurício Tsugawa and José Fortes - University of Florida
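The three steps on this slide, plus the Data Merger, can be sketched as follows. This is a rough single-process illustration under stated assumptions, not CloudBLAST itself: the input is assumed to be FASTA-formatted text, `chunk_size` is a hypothetical parameter (the slide's “960 long chunks” is not interpreted here), and `placeholder_search` is a stand-in that only reports sequence length where the real system would invoke NCBI BLAST.

```python
from __future__ import annotations

from typing import Callable, Iterable, Iterator

Record = tuple[str, str]  # (sequence header, sequence)


def read_sequences(fasta_text: str) -> Iterator[Record]:
    """Split FASTA text on newlines and yield one (header, sequence) per record."""
    header, parts = None, []
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(parts)
            header, parts = line[1:], []
        else:
            parts.append(line)
    if header is not None:
        yield header, "".join(parts)


def chunked(records: Iterable[Record], chunk_size: int) -> Iterator[list[Record]]:
    """Group records into chunks of at most chunk_size (hypothetical parameter)."""
    chunk: list[Record] = []
    for record in records:
        chunk.append(record)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


def map_chunk(chunk: list[Record], search: Callable[[str], str]) -> list[Record]:
    """Map step: run the search on each sequence, keyed by its header."""
    return [(header, search(sequence)) for header, sequence in chunk]


def merge(outputs: Iterable[list[Record]]) -> list[Record]:
    """Data-merger step: concatenate the per-chunk outputs in order."""
    merged: list[Record] = []
    for output in outputs:
        merged.extend(output)
    return merged


def placeholder_search(sequence: str) -> str:
    """Stand-in for a BLAST call (assumption): reports only the sequence length."""
    return f"length={len(sequence)}"


fasta = ">s1\nACGT\nAC\n>s2\nGG\n>s3\nTTT\n"
chunks = list(chunked(read_sequences(fasta), chunk_size=2))
results = merge(map_chunk(chunk, placeholder_search) for chunk in chunks)
print(results)
```

Each chunk is independent of the others, which is what lets the map step be spread across CPUs; only the final merge needs to see every chunk's output.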
- 26. How DDN Can Accelerate Your Analytics
► Lower Total Cost of Ownership and Improved OPEX:
• Scale – Dynamically add capacity to match your complex workloads
• Value – Grow storage capacity economically: Access, Solve, Archive
• High Availability – Always running, with world-class 24/7 service & support
► Drive Innovation:
• Performance at Scale – A homogeneous platform that performs at scale
• Eloquent – Leverage virtualization to deliver an analytics platform that provides the quickest answers to your most complex questions
• Collaboration – Centralize & share discoveries across the globe, securely
► Deliver Experience:
• Fifteen Years of HPC – Government labs, DoE, and universities trust DDN
• The HPC community relies on DDN – 60% of the Top 500 supercomputers & growing
• Single vendor solution - OEMs provide DDN with their datacenter solutions.
- 27. Thank you – Questions?
DataDirect Networks, Information in Motion, Silicon Storage Appliance, S2A, Storage Fusion Architecture, SFA, Storage Fusion Fabric, Web Object Scaler, WOS, EXAScaler, GRIDScaler,
xSTREAMScaler, NAS Scaler, ReAct, ObjectAssure, In-Storage Processing and SATAssure are all trademarks of DataDirect Networks. Any unauthorized use is prohibited.