
Research and technology explosion in scale-out storage

A view of the directions storage is taking in science and technology, from Ryan Sayre, technical strategist in the Office of the CTO for EMC Isilon, using examples from recent work in life-science genomics and in other industries that take advantage of the combination of extreme computing (HPC) and big data. As presented at the Bull-sponsored Science & Innovation 2013 conference, Westminster.

  1. Research and Technology Explosion in the Scale-Out Storage Era
     Exploring the new frontier of perpetual data growth and how it will affect us
     Ryan Sayre, Technical Strategist, EMEA, EMC ISD Office of the CTO, June 2013
     © Copyright 2013 EMC Corporation. All rights reserved.
  2. What Is Big Data?
     “Data that challenges the capabilities of a system to capture, manage, and process it within an acceptable elapsed time.” (Wikipedia)
  3. The Big Data Challenge
     [Chart: storage capacity in exabytes, 2009–2014, file-based vs block-based]
     By 2013, 80% of all storage capacity sold will be for file-based data
     • File-based: 61.8% CAGR
     • Block-based: 23.7% CAGR
     Segments: Media & Entertainment, Design & Simulation, Healthcare, Bioinformatics, Data Analytics, File Shares & Archives
     Source: “Scale Out Storage in the Content Driven Enterprise: Unleashing the Value of Information Assets,” IDC White Paper (2010 Enterprise Disk Storage Consumption Model), June 2011
  4. Genomics Size
     [Chart: relative data volume of EMR, Radiology, and Genomics, labelled “1000” and “Volume 50GB”*]
     88 million outpatient visits to NHS hospitals in 2010/2011
     *Finished data
     Sources: Dr. Halamka, BIDMC; S. Joshi, internal research; HIMSS; internal EMC data
  5. [Image-only slide; no text recovered]
  6. Bioinformatics: A “data tsunami”
     • Already a cliché in 2006: “Data Deluge”, “Data Tsunami” …
     • What changed starting in 2007: terabyte-scale laboratory instruments
       – “Next Generation” DNA sequencers
       – Confocal microscopy & live-cell imaging
       – Other imaging (fMRI, CT, ultrasound, etc.)
     • 2010: Faster adoption of next-generation sequencing
     • 2013: Scale-out storage is the only way to keep surviving!
  7. Vast quantities of data
     • Terabyte-scale issues have traditionally been “lab” or “workgroup” problems
     • Individual researchers & lab instruments can generate terabytes of data per experiment
       – Average of 40 TB of storage for each Solexa instrument
       – A recent “100 TB single-namespace” project was for a lab with a single 454 instrument
  8. Sequencing throughput over time (data from one vendor’s platform)
     [Chart: gigabases of sequence per run over time, annotated “15x”]
  9. Throughput Outpacing Moore’s Law
     • 1000 Genomes Project: could generate 90 Tbase of raw data (@ 30x coverage)
     • International Cancer Genome Consortium: 50,000+ samples could generate 5,000 Tbase of raw data
     [Chart: sequencing throughput (kb/day) vs CPU performance, log scale, 1996 to today]
  10. Sequencer capacity is growing enormously
      [Chart: gigabases (G) per instrument over time]
      Dependent infrastructure has become a significant and critical factor
      • Home-grown storage and compute resources are capable of supporting data reduction and alignment
      • Specialized HPC and storage architectures are required to meet aggregate throughput and processing demands
      • Current HPC architectures can be resource-prohibitive at the quantity required to manage data output
  11. Broad Institute Sequencing Data
  12. Big Data Apps Need Big Data Storage
      Data-intensive HPC workflows: Medical Imaging, Gene Sequencing, Seismic Exploration, Media & Entertainment, Product Development, Satellite Images
  13. Big Data Archive Challenge
      • Relentless data growth: primary storage overloaded with unstructured files
        – Constant upgrade requirements
      • Performance issues
        – Hinders regulatory responses and e-discovery applications
      • Storage islands
        – Many systems or 2-way clusters and points of management
        – Numerous file systems/volumes
  14. My own Big Data Growth Story…
      • Started out with 1 terabyte of shared storage in 2004
        – Image processing and visualisation
        – Quickly grew to 5 terabytes within 5 months
        – Was worrying about storage every day; needed a way out!
        – Transitioned to scale-out; scaled to 300 TB within 3 years
      • Current organisation has over 2 petabytes of storage
        – No dedicated storage administrator
        – I/O patterns are now managed by policy and tier
  15. UK Case Study: Life Sciences Institute
      • Bioinformatics organisation needing not only to store but to cross-reference multiple genome types, creating a mega-database of genomic structural variants across all species
      • Sharing across multiple organisations across the UK and into greater Europe
      • Need to grow to 20 petabytes and beyond
  16. UK Case Study: Engineering Design Automation
      • Performance requirement of over 1 million operations per second to simulate complex electrical pathways
      • Time to market required more rapid simulations to advance the technology roadmap
      • Multiple protocols across Windows and Linux systems
      • Growing in both performance and capacity (petabytes)
  17. The Scale-Out / Scale-Up Dilemma
      Scale-out (Isilon OneFS) vs scale-up (other storage platforms)
      • Scalability – scale-out: performance, capacity, or both; scale-up: capacity only, limited performance options
      • Performance – scale-out: true linear predictability; scale-up: degradation of performance & capacity at scale
  18. What does this look like?
  19. Isilon Scale-Out NAS Architecture
      [Diagram layers: Client/Application Layer (servers); Ethernet Layer; OneFS Operating Environment presenting a single file system/volume over CIFS, NFS, FTP, HTTP, and HDFS for Hadoop; Intra-cluster Communication Layer]
  20. Isilon Scale-Out Innovation: a single storage pool for application consolidation
      • Simple to scale – manage 20+ PB like a 1 TB drive
      • Predictable performance – grows linearly
      • Efficient and easy to operate – maximize utilization to 80%+; automated tiering
      • Highly resilient – survives multiple failures
      • Enterprise proven – the management and protection tools that customers expect
      • No data migrations
  21. Largest and Most Scalable File System: more scalable than traditional storage systems
      • OneFS scales from 18 TB to more than 20 PB in a single file system, single volume
      • Under 60 seconds to scale, with no downtime
      • World’s fastest performance and capacity scaling: over 100 GB/s of throughput
  22. Gain New Levels of Efficiency: Isilon AutoBalance
      • AutoBalance automatically moves content to new storage nodes while the system is online and in production
      • Eliminates “hot spots”
      • Enables unmatched storage capacity utilization of more than 80%
      Automated data balancing across nodes reduces costs, complexity, and risks for scaling storage
      [Diagram: FULL existing nodes and EMPTY new nodes end up BALANCED]
  23. Optimize Resources with Automated Tiering: Isilon SmartPools
      • Single point of management
        – Single file system/single volume
        – Multiple performance tiers
      • Automatic data movement
        – Policy-based tiering management
        – Transparent reallocation
        – NO application changes
      • Optimize storage resources
        – Automatically match storage resources with data requirements
        – Eliminate data migration
      [Diagram: files placed on S-Series (performance), X-Series (collaboration), and NL-Series (active archives) tiers, with an arrow labelled “reduced cost/TB”]
  24. Unmatched Data Protection and Availability: highly resilient, clustered architecture
      • With N+1 protection, data is 100% available even if a single drive or node fails
      • With N+2, N+3, and N+4 protection, data is 100% available if multiple drives or nodes fail
      • With Isilon, the more nodes in the cluster, the faster the drive rebuild
      [Diagram: nodes remain at 100% availability while drives or nodes are marked FAILED]
  25. Interoperability for Operational Flexibility
      • Platform REST API
        – Simplify management and integration
        – Third-party application integration
      • VMware integration
        – VAAI: vStorage APIs for Array Integration
        – VASA: vSphere APIs for Storage Awareness
        – Virtual server writeable clones
      • Multi-protocol support
        – Integrated support for industry-standard protocols
        – Native HDFS support
  26. The Cost Advantage of Scale-Out: ease of use and management simplicity
      IDC: Isilon improves IT productivity by 48% and reduces OPEX*
      [Chart: FTE hours per TB in use, Isilon vs traditional, for storage allocation, storage provisioning, managing capacity, managing backup, space reclamation, adding new applications, and uploading or re-loading data]
      * Source: “Quantifying the Business Benefits of Scale-Out NAS Solutions,” IDC White Paper, November 2011
  27. The Cost Advantage of Scale-Out: reduces big data storage costs by 40%
      [Chart: average annual cost per TB in use, traditional vs Isilon, split into OPEX, IT staff, and CAPEX]
      Source: “Quantifying the Business Benefits of Scale-Out NAS Solutions,” IDC White Paper, November 2011
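
The two rates on the Big Data Challenge slide (61.8% CAGR for file-based capacity, 23.7% for block-based) diverge quickly once compounded. A minimal sketch, assuming a constant rate and an arbitrary starting value of 1.0, since the chart's absolute exabyte figures did not survive the transcript; the helper names are illustrative:

```python
def compound_growth(start: float, cagr: float, years: int) -> float:
    """Value after `years` of growth at a constant compound annual growth rate."""
    return start * (1.0 + cagr) ** years


def implied_cagr(start: float, end: float, years: float) -> float:
    """Constant annual rate that turns `start` into `end` over `years`."""
    return (end / start) ** (1.0 / years) - 1.0


# Rates quoted on the slide (IDC, June 2011); the starting value 1.0 is illustrative.
FILE_CAGR = 0.618
BLOCK_CAGR = 0.237

for year in range(6):
    file_growth = compound_growth(1.0, FILE_CAGR, year)
    block_growth = compound_growth(1.0, BLOCK_CAGR, year)
    print(f"year {year}: file x{file_growth:5.2f}  block x{block_growth:4.2f}")
```

Compounded over five years, the file-based rate multiplies capacity roughly elevenfold (1.618^5 ≈ 11.1) against roughly threefold (1.237^5 ≈ 2.9) for block-based storage.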
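
The raw-data figures on the Throughput Outpacing Moore’s Law slide follow from samples × genome length × fold coverage. A minimal sketch, assuming a human genome of about 3 Gbase (an assumption, not stated on the slide); the function names are illustrative:

```python
GENOME_BASES = 3.0e9  # assumption: ~3 Gbase human genome
TERA = 1e12


def raw_bases(samples: int, coverage: float, genome_bases: float = GENOME_BASES) -> float:
    """Raw sequenced bases needed to cover `samples` genomes at the given fold coverage."""
    return samples * genome_bases * coverage


def implied_coverage(total_bases: float, samples: int, genome_bases: float = GENOME_BASES) -> float:
    """Fold coverage per sample implied by a total raw-base budget."""
    return total_bases / (samples * genome_bases)


# 1000 Genomes Project figure from the slide: 1,000 samples at 30x.
print(raw_bases(1000, 30) / TERA, "Tbase")
# ICGC figure from the slide: 5,000 Tbase across 50,000 samples.
print(round(implied_coverage(5000 * TERA, 50_000), 1), "x per sample")
```

With that assumption, 1,000 genomes at 30x reproduces the slide’s 90 Tbase, and the ICGC estimate of 5,000 Tbase across 50,000 samples works out to about 100 Gbase, or roughly 33x coverage, per sample.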
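
The growth in My own Big Data Growth Story (1 TB to 5 TB in five months) can be restated as a monthly rate and a doubling time. A minimal sketch, assuming steady exponential growth between the two readings; the helper names are illustrative:

```python
import math


def monthly_growth_rate(start_tb: float, end_tb: float, months: float) -> float:
    """Constant month-over-month rate that turns start_tb into end_tb."""
    return (end_tb / start_tb) ** (1.0 / months) - 1.0


def doubling_time(rate: float) -> float:
    """Periods needed to double at a constant per-period growth rate."""
    return math.log(2.0) / math.log1p(rate)


rate = monthly_growth_rate(1.0, 5.0, 5)  # 1 TB -> 5 TB in 5 months, per the slide
print(f"{rate:.1%} per month; doubles every {doubling_time(rate):.2f} months")
print(f"after 12 months at that rate: {(1 + rate) ** 12:.1f} TB")
```

That is about 38% per month, doubling roughly every 2.2 months; sustained for a full year, the same rate would take 1 TB to nearly 48 TB.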
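
The AutoBalance slide describes content moving onto newly added nodes while the cluster stays online. The sketch below is a hypothetical, greatly simplified greedy rebalancer, not the OneFS AutoBalance algorithm: it assumes equal-capacity nodes, treats data as whole blocks, and moves one block at a time from the fullest node to the emptiest until no two nodes differ by more than one block. `rebalance` and the node/block names are illustrative.

```python
def rebalance(nodes: dict[str, list[str]]) -> list[tuple[str, str, str]]:
    """Greedily level block counts across equal-capacity nodes (hypothetical sketch).

    Repeatedly moves one block from the node holding the most blocks to the
    node holding the fewest, until no two nodes differ by more than one block.
    Mutates `nodes` in place and returns the moves as (block, source, target).
    Requires at least one node.
    """
    moves = []
    while True:
        fullest = max(nodes, key=lambda name: len(nodes[name]))
        emptiest = min(nodes, key=lambda name: len(nodes[name]))
        if len(nodes[fullest]) - len(nodes[emptiest]) <= 1:
            return moves
        block = nodes[fullest].pop()
        nodes[emptiest].append(block)
        moves.append((block, fullest, emptiest))


# Four full nodes plus one newly added, empty node.
cluster = {f"node{i}": [f"node{i}-blk{j}" for j in range(10)] for i in range(4)}
cluster["node4"] = []
moves = rebalance(cluster)
print(len(moves), {name: len(blocks) for name, blocks in cluster.items()})
```

Adding one empty node to four nodes holding ten blocks each triggers eight moves in this toy model, all onto the new node, after which every node holds eight blocks.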
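
The SmartPools slide describes policy-based tiering with transparent reallocation across S-Series, X-Series, and NL-Series tiers. A minimal sketch of such a policy, assuming tier placement is decided only by time since last access; the thresholds, `TierRule`, `choose_tier`, `plan_moves`, and the example paths are hypothetical and are not SmartPools configuration or defaults:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class TierRule:
    tier: str
    max_idle: timedelta  # files idle for at most this long belong on this tier


# Hypothetical policy, ordered fastest tier first; thresholds are illustrative.
POLICY = [
    TierRule("S-Series", timedelta(days=7)),
    TierRule("X-Series", timedelta(days=90)),
]
FALLBACK_TIER = "NL-Series"


def choose_tier(last_access: datetime, now: datetime) -> str:
    """Pick the first (fastest) tier whose idle threshold the file still meets."""
    idle = now - last_access
    for rule in POLICY:
        if idle <= rule.max_idle:
            return rule.tier
    return FALLBACK_TIER


def plan_moves(files: dict[str, tuple[str, datetime]], now: datetime) -> dict[str, tuple[str, str]]:
    """Return {path: (current_tier, target_tier)} for files whose tier should change.

    The path itself never changes in the plan; only the tier assignment does.
    """
    moves = {}
    for path, (current, last_access) in files.items():
        target = choose_tier(last_access, now)
        if target != current:
            moves[path] = (current, target)
    return moves


now = datetime(2013, 6, 1)
files = {
    "/data/run1.bam": ("S-Series", now - timedelta(days=1)),
    "/data/run2.bam": ("S-Series", now - timedelta(days=30)),
    "/data/run3.bam": ("X-Series", now - timedelta(days=400)),
}
print(plan_moves(files, now))
```

A file idle for 30 days on S-Series is planned for X-Series, and one idle for over a year moves to NL-Series; the path key never changes, mirroring the slide’s point that applications need no changes.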
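
The N+1 level on the Data Protection and Availability slide means one drive or node can fail without data loss. A minimal sketch of that idea using generic single XOR parity, the simplest erasure code that tolerates one loss per stripe; it is not a description of how OneFS lays out or protects data, and the N+2 to N+4 levels need a stronger code such as Reed-Solomon. The function names are illustrative.

```python
from functools import reduce
from typing import Optional


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode_stripe(data_blocks: list[bytes]) -> list[bytes]:
    """Append one parity block, so N data blocks become N+1 blocks."""
    return data_blocks + [xor_blocks(data_blocks)]


def recover(stripe: list[Optional[bytes]]) -> list[bytes]:
    """Rebuild a stripe in which at most one block (data or parity) is None."""
    missing = [i for i, block in enumerate(stripe) if block is None]
    if len(missing) > 1:
        raise ValueError("single XOR parity (N+1) survives only one lost block per stripe")
    rebuilt = list(stripe)
    if missing:
        survivors = [block for block in stripe if block is not None]
        # The XOR of all surviving blocks equals the missing one.
        rebuilt[missing[0]] = xor_blocks(survivors)
    return rebuilt


stripe = encode_stripe([b"node", b"fail", b"test"])  # 3 data blocks + 1 parity
damaged = list(stripe)
damaged[1] = None  # simulate one failed node
print(recover(damaged)[1])  # b'fail'
```

Any one block of the stripe, data or parity, can be lost and rebuilt from the others; a second loss in the same stripe is unrecoverable with single parity, which is why tolerating two to four simultaneous failures requires a stronger erasure code.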
