DDN Accelerating-Decisions-Through-Enterprise-Hadoop-final


  1. Accelerating Decisions Through Enterprise Hadoop
     Evolving Hadoop to Support Enterprise Computing
     v7.0 – 09/07/2012
     Joey Jablonski, Practice Director, Analytic Services
     ©2012 DataDirect Networks. All Rights Reserved. ddn.com
  2. Agenda for The Data Challenge
     ► Overview of DataDirect Networks
     ► What is Storage Fusion Processing™, its advantages & applications
     ► Overview of Analytics
     ► Introduction to Apache Hadoop
     ► An overview of the DDN hScaler solution
     ► Conclusion
  3. DDN | We Accelerate Information Insight
     DDN provides a competitive advantage by maximizing your datacenter investment while mitigating growth challenges over your discovery process.
     ► Established: 1998
     ► Revenue: $226M (2011) – Profitable, Fast Growth
     ► Main Office: Sunnyvale, California, USA
     ► Employees: 600+ Worldwide
     ► Worldwide Presence: 16 Countries
     ► Installed Base: 1,000+ End Customers; 50+ Countries
     ► Go To Market: Global Partners, Resellers, Direct
     World-Renowned & Award-Winning
  4. DDN | 15 Years in HPC: Investment in Scale & Innovation
     [Timeline, 1998–2012. Milestone labels: DDN Founded; DDN Incorporated; First HPC Customer; DDN 1st Customer; NASA; SFA Project Inception; WOS Project Inception; Largest private storage co. (IDC); 500+ Employees. Product labels: S2A3000, S2A6000, S2A8000, S2A9550, S2A9900, 6620, 10K, 12K. Awards.]
  6. Storage Fusion Processing™
     [Diagram: DDN's Storage Fusion Architecture. Labels: Applications; GRIDScaler™; Network Interface; Storage Server; SAS Interface; Compute Resource; RAID Controller; Storage Media.]
     • Driving Imperatives = Improved OPEX
        • Massive bandwidth and low latency to storage media
        • Multi-core processors + Big DRAMs
        • Virtualization / Hypervisor
  7. DDN | Appliance Portfolio: GRIDScaler™, EXAScaler™
     • SFA12K-E: Bandwidth 40GB/s; Flash IOPS 1.4M; scales to 1680 drives; In-Storage Processing
     • SFA10K-E: Bandwidth 15GB/s; Flash IOPS 840K; scales to 1200 drives; In-Storage Processing
     • SFA10K-M: Bandwidth 2GB/s; Flash IOPS 840K; scales to 120 drives; In-Storage Processing
     • WOS6000: 4U, 60-drive system; 8 x GbE per node; 2PB/rack, 23PB/cluster; 25B objects/rack
     Maximize Value: Best-In-Class Performance to Accelerate Applications
     Minimize OPEX: >2x More Data Center Efficient Than Competing Systems
     Minimize Overhead: Autonomous System Fault Management & Recovery
  8. Storage Fusion Processing™: A Unique DDN Vision
     Embedded Data-Intensive Applications Within Storage Infrastructure
     ► Reduce complexity, infrastructure, administration, TCO
     ► Reduce infrastructure & OPEX
     ► Increase performance for latency-sensitive applications
     ► Success today with: File-Systems, iRODS, Hadoop, BWA, FASTA/SAM/BAM
     ► Work with your research teams to:
        • Identify application candidates
        • Port to our VMs/Hypervisor and benchmark
        • Deploy to your community
     Gap aligners? Molecular dynamics? Deep and wide search? Query engine?
  10. Why Is Data Analytics So Hard?
     [Venn diagram. Labels: Technical; Business; Hacking Skills; Business Acumen; Data Science; Analytics; Decisioning; Math & Statistics Knowledge; Traditional Research; Substantive Expertise; Poor Communications; Curiosity.]
  11. Analytics | Looking for Actionable Data
     Billions of Data Points to Consider
     • Consumer purchasing trends
     • Product perception
     • Drug Discovery
     • Genomics
     • Surveillance
     • Financial Analysis
  12. How do I leverage Analytics?
     [Cycle diagram. Labels: Insight; Modify Behavior; Improved Results.]
  13. Data Gravity Warps the Application Space
     [Diagram. Labels: Applications; DATA; Services.]
  14. Today's Enterprise Picture
     [Diagram. Labels: Empowered Users; Enabled Users; Aware Users; The Cloud.]
  16. The Tools of the Trade
     [Diagram. Labels: Hadoop Ecosystem; Core Apache Hadoop; Map Reduce; nodes numbered 1 to 6.]
  17. Hadoop & HPC Compared
     [Diagram with two panels, HPC and Hadoop, compared on Data Locality and Inter-process Communication. In each panel a Job Input is divided into Slice 1 through Slice n across nodes numbered 1 to 6.]
  18. Organizational Scalability: Higher is Better
     [Chart. Labels: Adoption; Goal for Human Costs; Capacity.]
  20. Hadoop Cluster Lifecycle
     [Cycle diagram. Stages: Deploy; Upgrade; Manage; Respond; Monitor. Layers: Software Platform; Hardware Platform.]
  21. Infrastructure Chargeback
     • Visibility to Trends
     • Actionable Reporting
     • Limits & Enforcement
     [Screenshot: Site Overview]
  22. Analytics Services Portfolio
     ► Architect: Data Transformation; Data & Analytics Strategy; Security Strategy in shared-data Environments; DR&BC; Data Curation; Solution Sizing; Data Center Preparation; Process Integration; ETL planning; Compliance Planning
     ► Deploy: hScaler Installation; hScaler Upgrade; Environment Integration; Performance Testing; Operational Validation; Factory Build
     ► Manage: Data Curation; hScaler Administration; System Tuning; Health Checks
     ► Customize: Data Migration; DR&BC; Application Integration; Data Curation; Application Development; Data Cleansing
     ► Support: Phone/Email; Phone Home Monitoring; Patches & Upgrades; Remote Diagnostics
  23. Apache Hadoop Genomics Application Examples
     ► Apache Hadoop™ MapReduce™ computing efficiency:
        • Algorithm performance should scale with CPU count
        • The algorithm should be embarrassingly parallel
        • There should be no dependence on how the data is distributed
        • The data should be static
     ► Example genomics applications that work well within Hadoop:
        • Crossbow. Whole genome re-sequencing & SNP genotyping (short reads)
        • Contrail. De novo assembly from short sequencing reads.
        • Myrna. Fast short-read & differential gene expression aligner (RNA-seq)
        • PeakRanger. Cloud-enabled peak caller for ChIP-seq data.
        • Quake. Quality-aware detection and sequencing error correction tool.
        • BlastReduce. High-performance short read mapping.
        • CloudBLAST. Hadoop implementation of NCBI’s BLAST.
        • MrsRF. Algorithm for analyzing large evolutionary trees.
  24. CloudBLAST Application Example
     CloudBLAST is a MapReduce version of the commonly used bioinformatics application NCBI BLAST.
     [Diagram: StreamInputFormat distributes sequence sets S = {s1, s2, … sk} to CPU-0 through CPU-N, followed by a Data Merger.]
     1. Stream Input Formatted data is split into “960 long chunks” based on new line.
     2. Data “chunks” are split into sequences as keys for the MapReduce.
     3. BLAST output is written to a local file.
     Based on work by Andréa Matsunaga, Maurício Tsugawa and José Fortes, University of Florida
  26. How DDN Can Accelerate Your Analytics
     ► Lower Total Cost of Ownership and Improved OPEX:
        • Scale – Dynamically add capacity to match your complex workloads
        • Value – Grow storage capacity economically: Access, Solve, Archive
        • High Availability – Always running with world-class 24/7 service & support
     ► Drive Innovation:
        • Performance at Scale – A homogeneous platform that performs at scale
        • Eloquent – Leverage virtualization to deliver an analytics platform that provides the quickest answers to your most complex questions
        • Collaboration – Centralize & share discoveries across the globe, securely
     ► Deliver Experience:
        • Fifteen Years of HPC – Government Labs, DoE, and Universities trust DDN
        • The HPC community relies on DDN – 60% of the Top 500 supercomputers & growing
        • Single vendor solution – OEMs provide DDN with their datacenter solutions.
  27. Thank you – Questions?
     DataDirect Networks, Information in Motion, Silicon Storage Appliance, S2A, Storage Fusion Architecture, SFA, Storage Fusion Fabric, Web Object Scaler, WOS, EXAScaler, GRIDScaler, xSTREAMScaler, NAS Scaler, ReAct, ObjectAssure, In-Storage Processing and SATAssure are all trademarks of DataDirect Networks. Any unauthorized use is prohibited.
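
The CloudBLAST flow on slide 24 (split the input on line boundaries, key each sequence, run the search per chunk, merge the outputs) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not CloudBLAST or Hadoop code: `toy_search` is a hypothetical stand-in for BLAST that only counts motif occurrences, a thread pool stands in for the cluster's map tasks, and every function name and the chunk size are invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (sequence id, sequence)


def parse_fasta(text: str) -> Iterator[Record]:
    """Yield (id, sequence) pairs from FASTA text, splitting on new lines."""
    seq_id = None
    parts: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A header line closes the previous record and opens a new one.
            if seq_id is not None:
                yield seq_id, "".join(parts)
            fields = line[1:].split()
            seq_id = fields[0] if fields else ""
            parts = []
        elif seq_id is not None:
            parts.append(line)
    if seq_id is not None:
        yield seq_id, "".join(parts)


def split_into_chunks(records: Iterable[Record], size: int) -> Iterator[List[Record]]:
    """Group records into lists of at most `size` records each."""
    batch: List[Record] = []
    for record in records:
        batch.append(record)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def toy_search(chunk: List[Record], motif: str) -> Dict[str, int]:
    """Hypothetical stand-in for BLAST: count non-overlapping motif hits per sequence."""
    return {seq_id: seq.count(motif) for seq_id, seq in chunk}


def run_job(text: str, motif: str, chunk_size: int = 2, workers: int = 4) -> Dict[str, int]:
    """Map toy_search over independent chunks, then merge the partial results."""
    chunks = list(split_into_chunks(parse_fasta(text), chunk_size))
    merged: Dict[str, int] = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each chunk is processed on its own; no chunk reads another's data.
        for partial in pool.map(lambda c: toy_search(c, motif), chunks):
            merged.update(partial)  # the "data merger" step
    return merged


if __name__ == "__main__":
    sample = ">s1 first\nACGTAC\nGT\n>s2\nTTTT\n>s3\nACGACG\n"
    print(run_job(sample, "ACG"))
```

Because each chunk is processed without reference to any other, the map step has the embarrassingly parallel shape that slide 23 lists as a precondition for MapReduce efficiency.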
