7. EMC Isilon HDFS: Enterprise Storage for Hadoop

Published in: Technology, Business
  • <Note to speakers: The EMC Isilon presenter will cover the first half of the presentation, through slide 24. The EMC Greenplum presenter will cover the second half, slides 25–37. Both presenters will participate in the Q+A (with backup from other EMC team members attending the event).> <To kick off the presentation:> Welcome the audience and thank them for joining us. Introduce yourself and the EMC Greenplum presenter.
  • Here’s what we’re going to cover in today’s session. (Walk through the agenda.)
  • Isilon has been a leading innovator in scale-out NAS for more than 10 years. Isilon scale-out storage is used today across a wide range of organizations. Data-intensive, high performance computing (HPC) environments: Life Sciences, Electronic Design Automation, and Media & Entertainment, to name a few examples. Traditional enterprise IT environments: Isilon’s storage systems support a variety of large-scale use cases including archiving, home directories and file shares; virtualization (Tier 3 and Tier 4); and business analytics (Hadoop). In total, Isilon’s scale-out storage solutions are used by over 3,000 organizations around the world and, thanks to the success that customers have enjoyed, the business is growing rapidly: about 100 percent last year. The key engine of customers’ success is the Isilon OneFS operating system, which is instrumental in providing customers with an innovative, scale-out data environment. Note to Presenter: Here are some additional facts that you may want to point out about Isilon. Isilon was founded more than 10 years ago (as Isilon Systems) and is now recognized as the industry leader in scale-out NAS storage solutions. Isilon joined the EMC team in December 2010 (when EMC acquired Isilon Systems). Since then, Isilon’s scale-out storage business has continued to grow rapidly and has been adopted in large enterprises across a wide range of industries. The Gartner report can be found here: http://www.gartner.com/id=1960515 (abstract only).
  • This slide shows just a sampling of customers who are benefiting from Isilon scale-out storage.
  • One reason Hadoop has emerged as an important technology is that it is an innovative Big Data analytics engine designed specifically for very large data volumes. With it, organizations can greatly reduce the time required to derive valuable insight from an enterprise’s dataset. By adopting Hadoop to store and analyze massive data volumes, enterprises gain an agile new platform to deliver new insights and identify new opportunities to accelerate their business. Hadoop has also been designed to tackle analytics for unstructured data. This is significant because unstructured data is the dominant area of data growth projected for the foreseeable future. Now let’s look at how the adoption of Hadoop is evolving.
  • The Isilon OneFS operating system provides the intelligence behind all Isilon scale-out storage systems. It combines the three layers of traditional storage architectures (file system, volume manager, and data protection) into one unified software layer, creating a single intelligent file system that spans all nodes within an Isilon cluster. Note to Presenter: Click now in Slide Show mode for animation. OneFS provides a number of important advantages: a single file system for great ease of management; unmatched efficiency, with over 80 percent storage utilization plus automated storage tiering to gain additional efficiencies; high-performance NAS; easy, “grow as you go” flexibility; and linear scalability that lets you scale performance and capacity to over 15 PB.
  • Putting It All Together. The Isilon IQ X-Series, powered by the OneFS® operating system, uses Isilon's scale-out storage architecture to speed access to massive amounts of critical data while dramatically reducing cost and complexity. Isilon delivers a flexible solution to accelerate your high-concurrency and sequential-throughput applications. With SSD technology for file-system metadata, the Isilon X-Series significantly accelerates namespace-intensive operations. S-Series nodes provide balanced throughput and performance, and the NL nodes form the foundation for nearline and archive storage. Isilon’s modular architecture and intelligent software make deployment and management simple. You can have an Isilon cluster online in less than 10 minutes, without time-consuming, expensive integration services. You can scale a cluster in performance and capacity in about one minute, all within a single pool of storage with a global namespace, eliminating the need to support multiple volumes and file systems. Isilon’s suite of applications then works together to provide the data management and protection capabilities required by corporate IT, from the front-end intelligence that eliminates client and data migration to quota management for file shares. SnapshotIQ and SyncIQ work in concert to protect and replicate important data for local and remote archive, while SnapLock provides for the immutability of data. Finally, Backup Accelerator speeds file replication to tape with a scalable, parallel infrastructure that ensures backup windows and recovery time objectives are always met.
  • In this section, we’re going to identify and describe the key technology challenges of Hadoop, especially when deployed using direct-attached storage (DAS).
  • There are 5 basic roles in every Hadoop environment. HDFS is made up of the NameNode, Secondary NameNode, and DataNode roles. MapReduce comprises the Job Tracker and Task Tracker.
  • The Job Tracker is effectively the queue master of a Hadoop MapReduce environment. It schedules jobs, distributes tasks across available Task Trackers, and gives administrators a view into the overall activity of a Hadoop environment.
  • To go into more detail, the NameNode is effectively the metadata server for all HDFS data and data blocks. In large Hadoop clusters, this role runs on a dedicated host, typically with a large amount of DRAM, because all metadata for the entire HDFS namespace is stored in local DRAM on this host. As such, traditional Hadoop architectures limit the number of objects that can be stored within each HDFS namespace. The NameNode is contacted for every block request, both reads and writes, and is responsible for making sure data blocks are mirrored to multiple DataNodes, spanning multiple racks.
  • One challenge associated with traditional deployments of Hadoop is that it has largely been deployed on dedicated infrastructure, not integrated with or connected to any other applications. In effect, it is a siloed environment, often outside the realm of the IT team. This poses a number of inefficiencies and risks. <click> A well-recognized issue with traditional Hadoop deployments is the “single point of failure” problem with the Hadoop NameNode. In a Hadoop environment, a single NameNode manages the Hadoop file system. If it goes down, the Hadoop environment immediately goes offline. If the NameNode does not come back online, the data stored within all of HDFS is lost and cannot be reconstructed. <Click to next build slide>
  • Another issue with traditional Hadoop environments is the lack of enterprise-level data protection. Typical Hadoop deployments do not have rigorous backup and recovery capabilities such as snapshots, or data replication for disaster recovery (DR) purposes. <click> Traditional Hadoop deployments on direct-attached storage (DAS) are also extremely inefficient. It’s not unusual for a DAS environment to operate with a 30-35% storage utilization rate (or less). Compounding this inefficiency is the fact that data is often mirrored (the default is 3 times). In addition to storage inefficiency, this type of infrastructure is very management-intensive. <click> Another issue with Hadoop running on direct-attached storage is that server and storage resources must be increased together in lock-step. For example, if more storage resources are required, a new server must be deployed (and vice versa). This rigidity adds further inefficiency. Another issue is the manual import/export of data that is required in a traditional Hadoop environment. In addition to consuming time and resources (bandwidth), the Hadoop data in typical environments cannot be accessed by or shared with other enterprise applications due to the lack of industry-standard protocol support. To address these challenges and to enable enterprises to begin realizing the benefits of Hadoop quickly and easily, EMC has recently introduced an exciting new Hadoop solution. <click to advance to next slide>
  • Isilon is able to “pretend” to be an HDFS cluster: it mimics the NameNode and DataNode protocols to host data. The underlying system is OneFS and does not follow the traditional HDFS scheme. Point HDFS clients (MapReduce, command line, etc.) to the DNS name of the Isilon cluster.
  • The new EMC solution also eliminates the “single point of failure” issue. We do this by enabling all nodes in an EMC Isilon storage cluster to become, in effect, NameNodes. This greatly improves the resiliency of your Hadoop environment. The EMC solution for Hadoop also provides reliable, end-to-end data protection for Hadoop data, including snapshotting for backup and recovery and data replication (with SyncIQ) for disaster recovery. Our new Hadoop solution also takes advantage of the outstanding efficiency of EMC Isilon storage systems. With our solutions, customers can achieve 80% or more storage utilization. EMC Hadoop solutions can also scale easily and independently. This means that if you need to add more storage capacity, you don’t need to add another server (and vice versa). With EMC Isilon, you also get the added benefit of linear increases in performance as the scale increases. EMC also recently announced that we are the first vendor to integrate HDFS (the Hadoop Distributed File System) into our storage solutions. This means that with EMC Isilon storage, you can readily use your Hadoop data with other enterprise applications and workloads while eliminating the need to manually move data around as you would with direct-attached storage.
  • Math logic behind the 28 hours: 100 TB = 100,000,000 MB. A 10 Gb link can transfer approximately 1 GB per second (spindle speeds are not included in the calculation). So 100 TB ÷ 1 GB per second = 100,000 seconds; dividing by 60 seconds and again by 60 minutes gives roughly 28 hours.
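The 28-hour estimate can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the function name and the 0.8 efficiency factor are assumptions, chosen so that a 10 Gbit/s link delivers the roughly 1 GB per second quoted in the notes. Disk (spindle) speed is ignored, as in the original calculation.

```python
def transfer_hours(data_tb: float, link_gbit_per_s: float, efficiency: float = 0.8) -> float:
    """Estimate the hours needed to copy data_tb terabytes over a network link.

    efficiency is an assumed factor for protocol overhead; 0.8 turns the
    theoretical 1.25 GB/s of a 10 Gbit/s link into about 1 GB/s.
    """
    data_mb = data_tb * 1_000_000                       # decimal units: 1 TB = 1,000,000 MB
    mb_per_s = link_gbit_per_s * 1000 / 8 * efficiency  # 8 bits per byte
    seconds = data_mb / mb_per_s
    return seconds / 3600                               # 60 seconds x 60 minutes


hours = transfer_hours(100, 10)
print(f"{hours:.1f} hours")  # prints "27.8 hours"
```

Raising `efficiency` to 1.0 shortens the estimate to a little over 22 hours, so the result is sensitive to how much of the link is actually usable.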
  • Customer Profile: http://www.emc.com/collateral/customer-profiles/h11528-return-path-cp.pdf Company background (www.returnpath.com): Return Path is the worldwide leader in email intelligence, serving Internet service providers (ISPs), businesses, and individuals. The company’s email intelligence solutions process and analyze massive volumes of data to maximize email performance, ensure email delivery, and protect users from spam and other abuse. Previous Environment & Existing Applications: Previously a hodge-podge of more than 25 different storage systems, including server-attached storage, shared Oracle appliances, and NetApp and Hewlett-Packard systems. Company Challenges: Data growing 25–50 terabytes per year; limited performance and capacity to support intensive Hadoop analytics; disparate systems lacked performance and capacity. EMC Solution & Important Benefits to Customer: EMC Isilon X-series; Hadoop and internally developed email intelligence solutions; SmartPools, SmartConnect, SmartQuotas, InsightIQ. Results: Enables unconstrained access to email data for analysis; reduces shared storage data center footprint by 30 percent; improves availability and reliability for Hadoop analytics; achieves faster development and time to market of new products; estimates five-year cost savings of $350,000 from lower power, cooling, and maintenance; shortens weekly administration time by more than 35 percent. Quotes: “Isilon serves NFS data across multiple product suites and makes it easily accessible to our Hadoop analytics team. That’s a significant business enabler, allowing Return Path to develop customer solutions much faster.” Diz Carter, Vice President of Infrastructure Operations, Return Path. “Considering our projected growth, we were able to make a strong business case for Isilon,” says Carter. “Looking out over five years, we estimate greater than $350,000 in savings from lower power, cooling, and maintenance requirements.” “We went from having boxes on the dock to serving up 180 terabytes in just over three hours,” says Carter. “I’ve never come across another solution as easy to implement as Isilon.”
  • With Isilon, Return Path now has a single repository for all its Big Data, accessible to email analysts, product development teams and external customers. Previously, performing analytics on email data residing in shared storage required making a separate copy of the data set and manually moving it to the Hadoop environment. Today, Isilon delivers real-time data to Return Path’s end-user applications while providing seamless integration with Hadoop for back-end data analytics, boosting customer satisfaction and business productivity. “To have all this data being generated by our email intelligence products, but no way to access it directly by Hadoop, was a major hindrance,” Carter remarks. “Now, Isilon serves NFS data across multiple product suites and makes it easily accessible to our Hadoop analytics team. That’s a huge business enabler because we're able to develop products much faster.” Pam, please add a placeholder for the time savings from the old process of manually creating multiple copies compared with the current Isilon workflow.
  • Thank you

    1. EMC Isilon HDFS – Enterprise Storage for Hadoop. Featuring EMC Isilon Scale-Out NAS Storage. Shai Harmelin, EMC System Engineer – Isilon Specialist. May 21, 2013. © Copyright 2011 EMC Corporation. All rights reserved.
    2. Today’s Agenda: • EMC Isilon Background • HDFS Architectural Challenges • Isilon HDFS Benefits • Performance Comparison • Customer Case Study • Q+A
    3. EMC Isilon: Setting the standard for scale-out NAS • Founded in 2000; named the leader in scale-out NAS (Gartner 2010) • Broad adoption across many markets – High Performance Computing (HPC): Life Sciences, Oil & Gas, Electronic Design Automation, Media & Entertainment, Financial Services – Enterprise IT: Archive, Home Directories, File Shares, Virtualization, Business Analytics • Acquired by EMC in 2011 for $2.5B • Over 3,500 global customers • Isilon OneFS: seventh-generation, industry-proven, innovative scale-out operating environment • 2012 – EMC Isilon is the industry’s first scale-out NAS system with native HDFS support
    4. Isilon Growing Momentum: 3,500+ customers
    5. Why Hadoop is Important to EMC Isilon Customers. Pragmatic approach to analytics on a very large scale – Opens up new ways of gaining insights and identifying opportunities for businesses. Designed to address the rise of unstructured data – Enterprise data to grow by 650% over the next 5 years – More than 80% of this growth will be unstructured data. Hadoop is only ONE component of the Enterprise Big Data Analytics PIPELINE.
    6. Isilon Scale-Out NAS Architecture. Diagram labels: Servers; Client/Application Layer; Ethernet Layer; CIFS, NFS, FTP, HTTP, HDFS for Hadoop; OneFS Operating Environment; Single FS/Volume; Intra-cluster Communication Layer.
    7. Isilon Core Innovation: OneFS scale-out operating system. Single File System Simplicity; Leadership Efficiency; High Performance; Easy Growth; Automated Tiering; Linear Scalability.
    8. Largest and Most Scalable File System: 500X more scalable than traditional storage systems. OneFS™ can scale from 18 TB to over 20,000 TB in a single file system.
    9. AutoBalance: Automated data balancing across nodes reduces costs, complexity and risks for scaling storage. “Using Software to do Work Unfit for Humans” • AutoBalance migrates content to new storage nodes while the system is online and in production • Requires NO manual intervention, NO reconfiguration, NO server or client mount point or application changes • Eliminates “hot spots”. Diagram labels: EMPTY; FULL; BALANCED.
    10. Back to Navigation
    11. Isilon, Scale-Out NAS for Big Data: Single File System, Single Volume Simplicity for Active, Persistent, and Archive Data. Capabilities: • Load balancing • Seamless failover • Performance zones • Quota management • Thin provisioning • Automated storage tiering • High speed replication • Disaster recovery • Business continuance • Instant recovery • Data protection • File immutability • Protection from deletion/change. Diagram labels: Client/Application Layer; Clients; Virtualized Servers; Network; WAN/LAN; Primary & Nearline Storage; Local/Remote Archive; S-series; X-series; NL-series; Backup Accelerator.
    12. Easiest Storage System to Manage: Single Level of Management. Manage an 18 TB to 10 PB single file system from one intuitive console. “Isilon has made some very bold claims with respect to its clustered storage products - not least the idea of genuinely revolutionizing the ease and speed with which mass storage - over 500 Terabytes - can be added and managed thereafter. We have conducted rigorous testing and unanimously agree with their assertions. This stuff is almost frighteningly simple to use.” Steve Broadhead, Founder, Broadband-Testing Laboratories
    13. HDFS Overview
    14. Core Hadoop Components: NameNode; Secondary NameNode; Job Tracker; DataNode / Task Tracker
    15. Compute Components. Job Tracker: Manages all the jobs submitted to the cluster; tracks and reports the status of jobs and tasks; provides job queuing functionality; communicates with the NameNode and tries to align Task Trackers to DataNodes. Task Tracker: The compute workhorse; serves read/write requests from the clients; executes Map/Reduce tasks; typically performs I/O against local or remote DataNodes.
    16. File System Components. NameNode: Manages the file system namespace; stores all the metadata in RAM, a limitation on file system size (filenames, owners, group, access info); knows associated blocks; manages block replication across DataNodes. Secondary NameNode: Manages the edit log and check-pointing of NameNode metadata; does not provide NameNode hot failover (CDH4 has a solution for this, but it is not in full-scale production in most environments). DataNode: Stores blocks of files on top of the native host OS file system (e.g. EXT3, XFS, ZFS); the same block is stored on multiple DataNodes for redundancy; has no “awareness” of data blocks living elsewhere (only the NameNode does).
    17. Enterprise Challenges of Hadoop (Hadoop DAS Environment). 1. Dedicated Storage Infrastructure – One-off for Hadoop only. 2. Single Point of Failure – NameNode. 3. Lacking Enterprise Data Protection – No snapshots, replication, backup. 4. Poor Storage Efficiency – 3X mirroring. 5. Fixed Scalability – Rigid compute to storage ratio. 6. Manual Import/Export – No protocol interoperability support. Diagram label: NameNode.
    18. Enterprise Challenges of Hadoop (Hadoop DAS Environment), build slide. 1. Dedicated Storage Infrastructure – One-off for Hadoop only. 2. Single Point of Failure – NameNode. 3. Lacking Enterprise Data Protection – No snapshots, replication, backup. 4. Poor Storage Efficiency – 3X mirroring. 5. Fixed Scalability – Rigid compute to storage ratio. 6. Manual Import/Export – No protocol support. Diagram labels: block copies 1x, 2x, 3x; NameNode.
    19. Isilon HDFS Support: Isilon supports the HDFS interfaces for the NameNode and DataNode to host metadata and data. The underlying file system is OneFS. As simple as pointing the Hadoop nodes to the DNS name of the Isilon cluster!
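As a rough illustration of "pointing the Hadoop nodes to the DNS name of the Isilon cluster", the sketch below generates the default-filesystem property for a Hadoop client's core-site.xml. Several details are assumptions rather than anything stated in the deck: the host name `isilon.example.com` is hypothetical, port 8020 is the conventional NameNode RPC port, and `fs.defaultFS` is the property name used by Hadoop 2.x-era distributions (older releases use `fs.default.name`).

```python
import xml.etree.ElementTree as ET


def core_site_xml(isilon_dns_name: str, port: int = 8020) -> str:
    """Build a minimal core-site.xml that points HDFS clients at one DNS name."""
    configuration = ET.Element("configuration")
    prop = ET.SubElement(configuration, "property")
    ET.SubElement(prop, "name").text = "fs.defaultFS"
    # Hypothetical endpoint: clients resolve this single name to the storage cluster.
    ET.SubElement(prop, "value").text = f"hdfs://{isilon_dns_name}:{port}"
    return ET.tostring(configuration, encoding="unicode")


xml_text = core_site_xml("isilon.example.com")
print(xml_text)
```

Because every client resolves the same name, no per-node NameNode address has to be distributed to the compute cluster.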
    20. HDFS is a protocol! Each Isilon node now “speaks” the HDFS NameNode and DataNode protocol. We eliminate the need to run these services on the Hadoop compute cluster. Every Isilon node acts as both a NameNode and a DataNode (isi_hdfs_d). Data is laid out within OneFS exactly the same as for NFS, SMB, etc. Data is protected just like any other data in the Isilon file system: no mirroring, only parity = 80% utilization. All Isilon enterprise features are applied to Hadoop data: Snapshots, Replication, SmartCache, SmartLock, etc.
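To make the efficiency comparison concrete, the sketch below estimates how much raw disk is needed to hold the same usable data under 3X mirroring versus parity protection at 80% utilization. The function name and the 100 TB workload are illustrative assumptions; the 3X and 80% figures come from the slides.

```python
def raw_tb_needed(usable_tb: float, copies: int = 1, utilization: float = 1.0) -> float:
    """Raw capacity required to store usable_tb of data.

    copies      -- full copies kept of every block (3 for default HDFS mirroring)
    utilization -- fraction of raw capacity left for data after protection overhead
    """
    return usable_tb * copies / utilization


usable = 100  # TB of data to store (hypothetical workload)
das_raw = raw_tb_needed(usable, copies=3)           # 3X mirroring on DAS
isilon_raw = raw_tb_needed(usable, utilization=0.8)  # parity protection, 80% utilization

print(f"DAS with 3X mirroring: {das_raw:.0f} TB raw")
print(f"Parity at 80% utilization: {isilon_raw:.0f} TB raw")
```

Under these assumptions the mirrored layout needs about 300 TB of raw disk and the parity layout about 125 TB, before counting the low utilization rates that the notes attribute to DAS environments.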
    21. HDFS Writes on Isilon. The Job Tracker asks the Isilon NameNode (isi_hdfs_d) “tell me where to place /path/file”. OneFS isi_hdfs_d hands the Job Tracker a list of 3 “DataNode” addresses for each block (aligned to the block size defined on the Hadoop cluster). The Job Tracker assigns a Task Tracker to communicate with the DataNode (isi_hdfs_d) to write each data block (an abstraction in our case). When complete, isi_hdfs_d responds by saying the block is replicated (a lie), because data is striped like any other file, written over any protocol. HDFS files are laid out on the Isilon file system (IFS) similarly to any other protocol (NFS, CIFS, FTP). A file can be written over NFS (nfsd) or CIFS (lwiod) and accessed over HDFS (isi_hdfs_d).
    22. HDFS Reads on Isilon. The Job Tracker asks the Isilon NameNode (isi_hdfs_d) “tell me where /path/file lives”. isi_hdfs_d responds with a list of block addresses (3 DataNode IPs per block). Note that the block size in this case is configurable on Isilon (default 64 MB). The Job Tracker assigns Task Trackers to read each block (first address out of 3 for each). Tasks within each Task Tracker ask the NameNode (again) for block locations, then initiate I/O transactions to read the data over the network. The concept of locality is eliminated except for rack awareness.
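The read path above can be pictured with a toy model of the block-location lookup: a file is divided into 64 MB blocks, and each block comes back with three node addresses, the first of which is the one a task reads from. This is a sketch under stated assumptions, not how isi_hdfs_d is implemented. The round-robin rotation, the function name, and the IP addresses are all hypothetical; the slides only state the block size and the three addresses per block.

```python
import math

BLOCK_SIZE_MB = 64  # default block size named on the slide


def block_locations(file_size_mb: float, node_ips: list, addresses_per_block: int = 3) -> list:
    """Return, for each block of the file, the node addresses offered to the client."""
    n_blocks = math.ceil(file_size_mb / BLOCK_SIZE_MB)
    locations = []
    for block in range(n_blocks):
        # Assumed policy: rotate the starting node so reads spread across the cluster.
        start = block % len(node_ips)
        rotated = node_ips[start:] + node_ips[:start]
        locations.append(rotated[:addresses_per_block])
    return locations


nodes = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]  # hypothetical node IPs
locations = block_locations(200, nodes)        # a 200 MB file spans 4 blocks
first_choice = [addrs[0] for addrs in locations]  # address each task reads from
print(first_choice)
```

Since every node can serve every block, the list of addresses acts as a load-balancing hint rather than a record of where replicas physically live, which is why data locality no longer applies.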
    23. Isilon HDFS Settings
    24. How EMC Isilon Addresses the Hadoop Challenge. 1. Challenge: Dedicated Storage Infrastructure (one-off for Hadoop only). Isilon: Scale-Out Storage Platform (multiple applications & workflows). 2. Challenge: Single Point of Failure (NameNode). Isilon: No Single Point of Failure (distributed NameNode). 3. Challenge: Lacking Enterprise Data Protection (no snapshots, replication, backup). Isilon: End-to-End Data Protection (SnapshotIQ, SyncIQ, NDMP backup). 4. Challenge: Poor Storage Efficiency (3X mirroring). Isilon: Industry-Leading Storage Efficiency (>80% storage utilization). 5. Challenge: Fixed Scalability (rigid compute to storage ratio). Isilon: Independent Scalability (add compute & storage separately). 6. Challenge: Manual Import/Export (no protocol support). Isilon: Multi-Protocol (industry standard protocols: NFS, CIFS, FTP, HTTP, HDFS).
    25. Distributed (Clustered) NameNode When Using Isilon. MTTDL = 5,000 years. Metadata is stored across systems the same way as standard file metadata. Built-in clustered redundancy across many nodes. Clustering the NameNode on Isilon allows for the failure protection level Isilon already provides. Diagram labels: NameNode; Clustered NameNode.
    26. Fixed Scaling / Independent Scaling. Hadoop: Storage to compute ratio is fixed; scaling compute means scaling capacity; difficult to provide QoS; compute upgrade is a forklift. Isilon: Scale compute independent of storage; achieve optimal performance balance even as workloads evolve; no data migrations, ever!; add new performance as hardware evolves. Chart labels: storage; compute; desired performance/capacity.
    27. Protocol Support. Before: HDFS is not natively visible to Windows, Unix, Linux, Apple, or any other file system; Big Data is only used for Big Data. After: Inherent multi-protocol support in Isilon allows ubiquitous access to all file systems including Hadoop; Big Data is actual data! Diagram label: Servers.
    28. Time-to-Results: Data Copy Analysis vs. In-Place Analysis. Data Copy Analysis (“Hadoop on a Stick”): Have you ever copied 100 TB from primary storage to a Hadoop system? How long does it take to copy 100 TB from one place to another over a 10 Gb link? More than 24 hours. In-Place Analysis: Hadoop processing nodes read the relevant data for analysis. Diagram labels: Existing Primary Storage; Data Center Network; Hadoop Processing Nodes.
    29. Snapshot/Version Control. Before: Traditional HDFS does not have replication; no snapshotting of data; loss of version control; not designed for mission-critical use. After: Full SnapshotIQ™ integration identifies changes; multi-threaded, multi-node scale-out replication; improved RPO/RTO for business continuity; geo-replicated Hadoop!
    30. Hadoop Distributions Support on Isilon HDFS • Available now in 7.0.1.5 • Multiple hdfs:// namespaces – hdfs://DAS + hdfs://isilon – Potential for archive/tiering – Hadoop cluster version mixing • Distributions: – Cloudera CDH4.x – Hortonworks HDP-2 – PivotalHD 1.0 (aka GPHD 2.0) – Apache 0.23 / Apache 2.0. Diagram labels: HDFS v2; HDFS v1.
    31. Performance
    32. Test Used HiBench. Developed by Intel and open sourced – Collection of standard Hadoop jobs – Our tests focused on TeraSort and TestDFSIO. All results are normalized as throughput per node to allow comparison of differing configurations. TestDFSIO tests were uncompressed, which shows actual I/O efficiency – Compressed gives much higher performance, but is not actual I/O.
    33. GPHD-Isilon is Highly Competitive
    34. TeraSort Performance is Comparable Between Configurations
    35. I/O Performance Scales As Isilon Nodes Are Added
    36. For Typical Workloads, 1.5 Compute Nodes Per Isilon X400 Node is Good. (4) Isilon X400 nodes tested.
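The 1.5:1 guideline translates into a simple sizing rule. The helper below is a hypothetical sketch: the function name and the decision to round up to whole servers are assumptions, and the ratio applies only to the "typical workloads" and X400 nodes named on the slide.

```python
import math

COMPUTE_NODES_PER_ISILON_NODE = 1.5  # guideline from the slide, typical workloads


def compute_nodes_for(isilon_nodes: int, ratio: float = COMPUTE_NODES_PER_ISILON_NODE) -> int:
    """Suggested number of Hadoop compute nodes for a given Isilon node count."""
    # Round up: a fractional compute node still means one more physical server.
    return math.ceil(isilon_nodes * ratio)


print(compute_nodes_for(4))  # the tested 4-node X400 cluster -> 6 compute nodes
```

Because compute and storage scale independently in this design, the ratio is a starting point that can be adjusted as workloads change.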
    37. Return Path (http://www.emc.com/collateral/customer-profiles/h11528-return-path-cp.pdf). Company background: • Return Path is the worldwide leader in email intelligence, serving Internet service providers (ISPs), businesses, and individuals. • The company’s email intelligence solutions process and analyze massive volumes of data to maximize email performance, ensure email delivery, and protect users from spam and other abuse. • Developed Hadoop-based email intelligence solutions combined with NAS-based data access. Challenges: Limited performance and capacity to support intensive Hadoop analytics. NFS and Hadoop environments struggled to handle unique data sets comprising hundreds of millions of small email files and large analytics files, which hindered analytics and delivery of customer solutions. 25 different DAS and NAS storage systems lacked performance and capacity. Storage projected to increase from 150 TB to 2 PB over the next 5 years.
    38. Return Path: Isilon Solution and Benefits. Solution: Isilon X400 scale-out NAS, approx. 200 TB capacity; SmartConnect, SmartQuotas, InsightIQ software suite; NFS and HDFS data access protocols. Results: Return Path now has a single repository for all its Big Data, accessible to email analysts, product development teams and external customers. Isilon delivers real-time data to Return Path’s end-user applications while providing seamless integration with Hadoop for back-end data analytics. Reduces shared storage data center footprint by 30 percent. Shortens weekly administration time by more than 35 percent. Improves availability and reliability for Hadoop analytics. Savings of $350,000 from lower power, cooling, and maintenance.
    39. Return Path: Customer Quotes. “To have all this data being generated by our email intelligence products, but no way to access it directly by Hadoop, was a major hindrance.” “Isilon serves NFS data across multiple product suites and makes it easily accessible to our Hadoop analytics team. That’s a significant business enabler, allowing Return Path to develop customer solutions much faster.” “Isilon InsightIQ software has been invaluable, providing visibility into our infrastructure and managing our space efficiently as we grow.” Diz Carter, VP Infrastructure Operations
    40. Questions?
    41. Thank You!
