Your SlideShare is downloading. ×

Thanks for flagging this SlideShare!

Oops! An error has occurred.

Saving this for later? Get the SlideShare app to save on your phone or tablet. Read anywhere, anytime – even offline.
Text the download link to your phone
Standard text messaging rates apply

Big Data Benchmarking with RDMA solutions


Published on

Presented by Eyal Gutkind of Mellanox in the Mellanox theater, during Oracle OpenWorld 2013

Presented by Eyal Gutkind of Mellanox in the Mellanox theater, during Oracle OpenWorld 2013

Published in: Technology
  • Be the first to comment

No Downloads
Total Views
On Slideshare
From Embeds
Number of Embeds
Embeds 0
No embeds

Report content
Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

No notes for slide
  • HDFS is the underlying file system for Hadoop.WE have a project ongoing with OSU – stay tuned for the availability schedule.
  • Test configuration:5 nodes, Apache Hadoop 1.1.2 E5-2670, 64GB DRAM.1 Name Node, 4 Data Nodes.HDDs: 5x 1TB, 10K per NodeSSDs: 2x 960MB, PCIe Gen II x4.HiBench 2.2 test suite
  • Hadoop Filesystem Agnostic API
  • Recipe on how to build a big data solution is available on Mellanox web site.Everything is there, components, scripts, tradeoffs – USE IT, it works.
  • Ask your customers to login to the AWB.It is for their use and try, it is deployed over Mellanox e2e FDR network utilizing UDA and UFM
  • Transcript

    • 1. © 2013 Mellanox Technologies 1 Big Data Benchmarking with RDMA solutions Oracle Open World 2013
    • 2. © 2013 Mellanox Technologies 2 Leading Supplier of End-to-End Interconnect Solutions Host/Fabric SoftwareICs Switches/GatewaysAdapter Cards Cables Comprehensive End-to-End InfiniBand and Ethernet Portfolio Virtual Protocol Interconnect Storage Front / Back-End Server / Compute Switch / Gateway 56G IB & FCoIB 56G InfiniBand 10/40/56GbE & FCoE 10/40/56GbE Fibre Channel Virtual Protocol Interconnect
    • 3. © 2013 Mellanox Technologies 3  A scalable fault-tolerant distributed system for data storage and processing  Hadoop has two main systems • Hadoop Distributed File System: self-healing high-bandwidth clustered storage. • MapReduce: distributed fault-tolerant resource management and scheduling coupled with a scalable data programming abstraction.  Key values • Flexibility – Store any data, Run any analysis. • Scalability – Start at 1TB/3-nodes grow to petabytes/1000s of nodes. • Economics – Cost per TB at a fraction of traditional options. Hadoop Framework HDFS™ (Hadoop Distributed File System) Map Reduce HBase DISK DISK DISK DISK DISK DISK Hive Pig Map Reduce HDFS™ (Hadoop Distributed File System)
    • 4. © 2013 Mellanox Technologies 4 Three Areas for Accelerations  Data Analytics • Explore inefficiencies in existing analytics frameworks and systems • Accelerate data processing to deliver faster results  Storage • Explore ways to refine dominant file system • Take advantage for direct attached disk to accelerate data access  Distributed Storage • Leverage popular distributed storage systems with Big Data applications • Use existing systems for usage with Big Data frameworks
    • 5. © 2013 Mellanox Technologies 5 ~88% CPU Utilization I/O Offload Frees Up CPU for Application Processing UserSpaceSystemSpace ~53% CPU Utilization ~47% CPU Overhead/Idle ~12% CPU Overhead/Idle Without RDMA With RDMA and Offload UserSpaceSystemSpace
    • 6. © 2013 Mellanox Technologies 6  Plug-in architecture • Open-source, latest GA version 3.1 (6/10/2013) • Google code repository at:  Accelerates Map Reduce Jobs • Accelerated merge sort  Efficient Shuffle Provider • Data transfer over RDMA • Supports InfiniBand and Ethernet  Supported Hadoop Distributions • Apache 3.0 – In the main trunk! • Apache 2.0.3 – In the main trunk! • Apache Hadoop 1.0.x ; 1.1.x • Cloudera Distribution Hadoop 3 &4 • Hortonworks HDP 1.x • GPHD 1.2  Supported Hardware • ConnectX®-3 VPI • SwitchX-2 based systems Unstructured Data Accelerator - UDA HDFS™ (Hadoop Distributed File System) Map Reduce HBase DISK DISK DISK DISK DISK DISK Hive Pig Map Reduce
    • 7. © 2013 Mellanox Technologies 7 Double Map Reduce Performance with UDA *TeraSort is a popular benchmark used to measure the performance of Hadoop cluster ~50%Disk Access CPU Efficiency 2.5X **1TB Data Set, 20x dual X5670 (Westmere) Machines, 10x HDD Base; Vanilla GPHD1.2; UDA  GPHD1.2+UDA 2X Faster Job Completion! Increase the Value of Data! 54%
    • 8. © 2013 Mellanox Technologies 8  HDFS is the Hadoop File System • The underlying File system for HBase and other NoSQL Data Bases  More Drives, Higher Throughput is Needed  SSDs Solutions Must use Higher Throughput • Bounded by 1GbE and 10GbE HDFS Acceleration; Joint Project With Ohio State University HDFS™ (Hadoop Distributed File System) Map Reduce HBase DISK DISK DISK DISK DISK DISK Hive Pig
    • 9. © 2013 Mellanox Technologies 9  SSDs Become De-Facto standard in HDFS deployment • Read capability is a critical factor for application performance  E-DFSIO, Part of Intel’s HiBench test suite, profiles aggregated throughput on the cluster • 1GbE network impede any performance benefit from SSD deployment Unlocking the Power SSDs In Hadoop Environment E-DFSIO, Showing the Power of SSD @ HDFS
    • 10. © 2013 Mellanox Technologies 10 OrangeFS as Hadoop Storage Solution
    • 11. © 2013 Mellanox Technologies 11  Mellanox VPI Card • MCX354A-FCBT  Mellanox Edge Switches • MSX10xx; MSX60xx Cloudera Certified – CDH3 and CDH4
    • 12. © 2013 Mellanox Technologies 12  E5-26x0 (Sandy Bridge) Machines • Dual Socket • 4+ cores each socket • 32GB+ of DRAM  Disk Drives • At least 5 x 1TB, SAS, 10K RPM  Hadoop Configuration • At least one Name Node + Job Tracker • At least 4 Data Nodes  Installation: • Your selection of Hadoop Distribution or other Big Data solution (Such as Cassandra)  Networking • ConnectX-3 VPI card, FDR, 40GbE and 10GbE • SwitchX based systems: MSX6036F, MSX1036B and MSX1016 • Mellanox’s FDR, 40GbE and 10GbE Cable Solutions  Simple Building Block for Big Data Solution
    • 13. © 2013 Mellanox Technologies 13  EMC 1000-Node Analytic Platform  Accelerates Industry's Hadoop Development  24 PetaByte of physical storage • Half of every written word since inception of mankind  Mellanox VPI Solutions Test Drive Your Big Data 2X Faster Hadoop Job Run-Time Hadoop Acceleration High Throughput, Low Latency, RDMA Critical for ROI
    • 14. © 2013 Mellanox Technologies 14 Thank You