Big Data Benchmarking with RDMA solutions


Published on

Presented by Eyal Gutkind of Mellanox in the Mellanox theater, during Oracle OpenWorld 2013

Published in: Technology
  • Be the first to comment

No Downloads
Total views
On SlideShare
From Embeds
Number of Embeds
Embeds 0
No embeds

No notes for slide
  • HDFS is the underlying file system for Hadoop.WE have a project ongoing with OSU – stay tuned for the availability schedule.
  • Test configuration:5 nodes, Apache Hadoop 1.1.2 E5-2670, 64GB DRAM.1 Name Node, 4 Data Nodes.HDDs: 5x 1TB, 10K per NodeSSDs: 2x 960MB, PCIe Gen II x4.HiBench 2.2 test suite
  • Hadoop Filesystem Agnostic API
  • Recipe on how to build a big data solution is available on Mellanox web site.Everything is there, components, scripts, tradeoffs – USE IT, it works.
  • Ask your customers to login to the AWB.It is for their use and try, it is deployed over Mellanox e2e FDR network utilizing UDA and UFM
  • Big Data Benchmarking with RDMA solutions

    1. 1. © 2013 Mellanox Technologies 1 Big Data Benchmarking with RDMA solutions Oracle Open World 2013
    2. 2. © 2013 Mellanox Technologies 2 Leading Supplier of End-to-End Interconnect Solutions Host/Fabric SoftwareICs Switches/GatewaysAdapter Cards Cables Comprehensive End-to-End InfiniBand and Ethernet Portfolio Virtual Protocol Interconnect Storage Front / Back-End Server / Compute Switch / Gateway 56G IB & FCoIB 56G InfiniBand 10/40/56GbE & FCoE 10/40/56GbE Fibre Channel Virtual Protocol Interconnect
    3. 3. © 2013 Mellanox Technologies 3  A scalable fault-tolerant distributed system for data storage and processing  Hadoop has two main systems • Hadoop Distributed File System: self-healing high-bandwidth clustered storage. • MapReduce: distributed fault-tolerant resource management and scheduling coupled with a scalable data programming abstraction.  Key values • Flexibility – Store any data, Run any analysis. • Scalability – Start at 1TB/3-nodes grow to petabytes/1000s of nodes. • Economics – Cost per TB at a fraction of traditional options. Hadoop Framework HDFS™ (Hadoop Distributed File System) Map Reduce HBase DISK DISK DISK DISK DISK DISK Hive Pig Map Reduce HDFS™ (Hadoop Distributed File System)
    4. 4. © 2013 Mellanox Technologies 4 Three Areas for Accelerations  Data Analytics • Explore inefficiencies in existing analytics frameworks and systems • Accelerate data processing to deliver faster results  Storage • Explore ways to refine dominant file system • Take advantage for direct attached disk to accelerate data access  Distributed Storage • Leverage popular distributed storage systems with Big Data applications • Use existing systems for usage with Big Data frameworks
    5. 5. © 2013 Mellanox Technologies 5 ~88% CPU Utilization I/O Offload Frees Up CPU for Application Processing UserSpaceSystemSpace ~53% CPU Utilization ~47% CPU Overhead/Idle ~12% CPU Overhead/Idle Without RDMA With RDMA and Offload UserSpaceSystemSpace
    6. 6. © 2013 Mellanox Technologies 6  Plug-in architecture • Open-source, latest GA version 3.1 (6/10/2013) • Google code repository at:  Accelerates Map Reduce Jobs • Accelerated merge sort  Efficient Shuffle Provider • Data transfer over RDMA • Supports InfiniBand and Ethernet  Supported Hadoop Distributions • Apache 3.0 – In the main trunk! • Apache 2.0.3 – In the main trunk! • Apache Hadoop 1.0.x ; 1.1.x • Cloudera Distribution Hadoop 3 &4 • Hortonworks HDP 1.x • GPHD 1.2  Supported Hardware • ConnectX®-3 VPI • SwitchX-2 based systems Unstructured Data Accelerator - UDA HDFS™ (Hadoop Distributed File System) Map Reduce HBase DISK DISK DISK DISK DISK DISK Hive Pig Map Reduce
    7. 7. © 2013 Mellanox Technologies 7 Double Map Reduce Performance with UDA *TeraSort is a popular benchmark used to measure the performance of Hadoop cluster ~50%Disk Access CPU Efficiency 2.5X **1TB Data Set, 20x dual X5670 (Westmere) Machines, 10x HDD Base; Vanilla GPHD1.2; UDA  GPHD1.2+UDA 2X Faster Job Completion! Increase the Value of Data! 54%
    8. 8. © 2013 Mellanox Technologies 8  HDFS is the Hadoop File System • The underlying File system for HBase and other NoSQL Data Bases  More Drives, Higher Throughput is Needed  SSDs Solutions Must use Higher Throughput • Bounded by 1GbE and 10GbE HDFS Acceleration; Joint Project With Ohio State University HDFS™ (Hadoop Distributed File System) Map Reduce HBase DISK DISK DISK DISK DISK DISK Hive Pig
    9. 9. © 2013 Mellanox Technologies 9  SSDs Become De-Facto standard in HDFS deployment • Read capability is a critical factor for application performance  E-DFSIO, Part of Intel’s HiBench test suite, profiles aggregated throughput on the cluster • 1GbE network impede any performance benefit from SSD deployment Unlocking the Power SSDs In Hadoop Environment E-DFSIO, Showing the Power of SSD @ HDFS
    10. 10. © 2013 Mellanox Technologies 10 OrangeFS as Hadoop Storage Solution
    11. 11. © 2013 Mellanox Technologies 11  Mellanox VPI Card • MCX354A-FCBT  Mellanox Edge Switches • MSX10xx; MSX60xx Cloudera Certified – CDH3 and CDH4
    12. 12. © 2013 Mellanox Technologies 12  E5-26x0 (Sandy Bridge) Machines • Dual Socket • 4+ cores each socket • 32GB+ of DRAM  Disk Drives • At least 5 x 1TB, SAS, 10K RPM  Hadoop Configuration • At least one Name Node + Job Tracker • At least 4 Data Nodes  Installation: • Your selection of Hadoop Distribution or other Big Data solution (Such as Cassandra)  Networking • ConnectX-3 VPI card, FDR, 40GbE and 10GbE • SwitchX based systems: MSX6036F, MSX1036B and MSX1016 • Mellanox’s FDR, 40GbE and 10GbE Cable Solutions  Simple Building Block for Big Data Solution
    13. 13. © 2013 Mellanox Technologies 13  EMC 1000-Node Analytic Platform  Accelerates Industry's Hadoop Development  24 PetaByte of physical storage • Half of every written word since inception of mankind  Mellanox VPI Solutions Test Drive Your Big Data 2X Faster Hadoop Job Run-Time Hadoop Acceleration High Throughput, Low Latency, RDMA Critical for ROI
    14. 14. © 2013 Mellanox Technologies 14 Thank You