Shift into High Gear: Dramatically Improve Hadoop & NoSQL Performance
 


MapR Architecture Presentation given at Strata + Hadoop World 2013 by MapR CTO & Co-Founder M.C. Srivas


Prior to co-founding MapR, Srivas ran one of the major search infrastructure teams at Google, where GFS, BigTable and MapReduce were used extensively. He wanted to provide that powerful capability to everyone, and started MapR on his vision to build the next-generation platform for Big Data. His strategy was to evolve Hadoop and bring simplicity of use, extreme speed and complete reliability to Hadoop users everywhere, and make it seamlessly easy for enterprises to use this powerful new way to get deep insights. Srivas brings to MapR his experience at Google, Spinnaker Networks, and Transarc, building game-changing products that advance the state of the art.

  • MapR provides a complete platform for Apache Hadoop. MapR has integrated, tested and hardened a broad array of packages as part of this distribution: Hive, Pig, Oozie, Sqoop, plus additional packages such as Cascading. All of these packages, with MapR updates, are available on GitHub. We have spent more than two years of well-funded effort on deep architectural improvements to create the next-generation distribution for Hadoop. This is in stark contrast with the alternative distributions from Cloudera, Hortonworks and Apache, which are all largely equivalent.
  • Another major advantage of MapR is the distributed NameNode. The NameNode in Hadoop today is a single point of failure, a scalability limitation, and a performance bottleneck. With MapR there is no dedicated NameNode: the NameNode function is distributed across the cluster. This provides major advantages in terms of HA, data-loss avoidance, scalability and performance. Other distributions have this bottleneck regardless of the number of nodes in the cluster, and top out at roughly 70 to 200 million files, even on an extremely high-end server. Half of Hadoop's processing at Facebook goes to packing and unpacking files to work around this limitation. MapR scales uniformly.
  • With MapR’s Direct Access NFS we use an industry-standard protocol to load data into the cluster. No clients or special agents need to be deployed. Is writing over NFS highly available? We have that. Do you need compression? We do it automatically. And we support random read/write.
  • Another aspect of ease of use is the support for volumes: a unique idea that dramatically simplifies the management of your big data. Only MapR provides the logical volume construct to group content. Policies can be used to enforce user access, admin permissions, data protection, and so on, and data placement can also be controlled (for example, pinning data to the nodes that have SSD drives). Volumes let you define data-management policies efficiently: other Hadoop distributions require you to set policies either for the entire cluster or for individual files. Setting policies by file or directory is too granular (there are just too many files), while cluster-wide policies are not granular enough. A volume is a logical construct that lets you collect and refer to data files regardless of where each file is physically located, and with no concern for the relationships between the data files.

Presentation Transcript

  • Shift into High Gear: Dramatically Improve Hadoop and NoSQL Performance. M. C. Srivas, CTO/Founder.
  • MapR History (timeline slide): Google publishes the MapReduce paper and uses it extensively while the Nutch project struggles; in '06 Nutch copies the MR paper's design and calls it Hadoop; more companies adopt Hadoop through '07-'09. MapR is founded in '09, as the Dow drops from 14,300 to 6,800. Through '11-'12 MapR launches and adds partners; in '13 the ultra-reliable MapR M7 launches, the fastest NoSQL on the planet, breaking all world records, with a 2,500-node cluster as the largest non-Web installation.
  • MapR Distribution for Apache Hadoop (platform diagram): vertical apps, tools, Web 2.0, enterprise apps, analytics, operations, and OLTP sit on top of batch, interactive, and real-time workloads over a NoSQL data platform, with hardening, testing, integrating, innovating, and contributing as part of the Apache open-source community. Platform qualities: 99.999% HA, data protection, disaster recovery, enterprise integration, scalability & performance, multi-tenancy, and management.
  • MapR Distribution for Apache Hadoop  100% Apache Hadoop  With significant enterprisegrade enhancements  Comprehensive management  Industry-standard interfaces  Higher performance 4
  • The Cloud Leaders Pick MapR: Amazon EMR is the largest Hadoop provider in revenue and number of clusters; Google chose MapR to provide Hadoop on Google Compute Engine; MapR's partnership with Canonical and Mirantis provides advantages for OpenStack deployments.
  • What Makes MapR so Reliable?
  • How to Make a Cluster Reliable? (A checkpoint/restore sketch follows this list.)
    1. Make the storage reliable: recover from disk and node failures.
    2. Make services reliable: services need to checkpoint their state rapidly; restart a failed service, possibly on another node; move the checkpointed state to the restarted service, using (1) above.
    3. Do it fast: instant-on, meaning (1) and (2) must happen very, very fast, without maintenance windows; no compactions (e.g., Cassandra, Apache HBase); no "anti-entropy" that periodically wipes out the cluster (e.g., Cassandra).
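
A minimal sketch of step 2's checkpoint-and-restart idea, in plain Java: serialize the service state atomically to a path every node can reach, then reload it wherever the service restarts. The shared path and the map-shaped state are illustrative assumptions, not MapR APIs.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.HashMap;

public class CheckpointedService {
    // Illustrative shared location: in this architecture it would live on
    // storage made reliable by step 1, reachable from every node.
    static final Path CHECKPOINT = Paths.get("/shared/service.state");

    // Step 2a: checkpoint rapidly. Write to a temp file, then rename
    // atomically, so a crash mid-checkpoint never corrupts the last good state.
    static void checkpoint(HashMap<String, Long> state) throws IOException {
        Path tmp = CHECKPOINT.resolveSibling("service.state.tmp");
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(tmp))) {
            out.writeObject(state);
        }
        Files.move(tmp, CHECKPOINT, StandardCopyOption.ATOMIC_MOVE);
    }

    // Step 2b/2c: after a restart, possibly on another node, reload the
    // last checkpointed state and carry on.
    @SuppressWarnings("unchecked")
    static HashMap<String, Long> recover() throws IOException, ClassNotFoundException {
        if (!Files.exists(CHECKPOINT)) {
            return new HashMap<>(); // first start: empty state
        }
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(CHECKPOINT))) {
            return (HashMap<String, Long>) in.readObject();
        }
    }
}
```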
  • Reliability with Commodity Hardware: no NVRAM; cannot assume special connectivity (no separate data paths for "online" vs. replica traffic); cannot even assume more than one drive per node, so no RAID is possible. Use replication, but you cannot assume peers have equal drive sizes; what if the drive on the first machine is 10x larger than the drive on the other? No choice but to replicate for reliability.
  • Reliability via Replication. Replication is easy, right? All we have to do is send the same bits to the master and the replica. (Diagrams: normal replication, where clients send to the primary server and the primary forwards to the replica, versus Cassandra-style replication, where clients send to the primary and the replica themselves.)
  • But Crashes Occur… When the replica comes back, it is stale: it must be brought up to date, and until then the data is exposed to further failure. Who re-syncs? (Diagrams: either the primary re-syncs the replica, or the replica remains stale until an "anti-entropy" process is kicked off by the administrator.)
  • Unless it’s HDFS… HDFS solves the problem a third way: it makes everything read-only, and static data is trivial to re-sync. There is a single writer, and no reads are allowed while writing; closing the file is the transaction that allows readers to see the data. Unclosed files are lost, and you cannot write any further to a closed file, so real-time is not possible with HDFS: to make data visible, you must close the file immediately after writing, and too many files is a serious problem with HDFS (a well-documented limitation). HDFS therefore cannot do NFS, ever: there is no "close" in NFS, so data could be lost at any time. (The sketch below shows this close-to-publish semantics.)
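
A minimal sketch of the single-writer, close-to-publish semantics described above, using the standard HDFS client API; the path is illustrative, and the behavior shown is the classic HDFS model the slide refers to:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsCloseToPublish {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();   // reads core-site.xml / hdfs-site.xml
        FileSystem fs = FileSystem.get(conf);
        Path path = new Path("/tmp/events.log");    // illustrative path

        // Single writer: create() grants this stream an exclusive lease.
        FSDataOutputStream out = fs.create(path);
        out.writeBytes("event-1\n");
        // Readers cannot count on seeing "event-1" yet: the write is not
        // committed until the file is closed. If this process crashes
        // here, the unclosed data is lost.
        out.close();   // the "transaction" that makes the data visible

        // The file is now readable but effectively immutable; forcing
        // visibility means closing early and often, which produces the
        // "too many files" problem the slide mentions.
        fs.close();
    }
}
```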
  • This is the 21st Century… To support normal apps, you need full read/write support. So let's return to the issue: re-syncing the replica when it comes back. (Same diagrams as before: who re-syncs, the primary or an administrator-triggered "anti-entropy" process?)
  • How Long to Re-sync? With 24 TB per server, at 1000 MB/s it takes about 7 hours; in practical terms, at 200 MB/s, about 35 hours. Did you say you want to do this online? Throttle the re-sync rate to 1/10th and it takes about 350 hours to re-sync (roughly 15 days). What is your Mean Time To Data Loss (MTTDL)? How long before a double disk failure? A triple disk failure? (The arithmetic is spelled out below.)
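
Spelling out the slide's arithmetic, using decimal units (1 TB = 10^6 MB) and rounding the way the slide does:

```latex
\frac{24 \times 10^{6}\,\mathrm{MB}}{1000\,\mathrm{MB/s}} = 24{,}000\,\mathrm{s} \approx 6.7\,\mathrm{h} \;(\approx 7\,\mathrm{h}),
\qquad
\frac{24 \times 10^{6}\,\mathrm{MB}}{200\,\mathrm{MB/s}} = 120{,}000\,\mathrm{s} \approx 33\,\mathrm{h} \;(\approx 35\,\mathrm{h})
```

Throttled to one tenth of the practical rate (20 MB/s):

```latex
\frac{24 \times 10^{6}\,\mathrm{MB}}{20\,\mathrm{MB/s}} = 1.2 \times 10^{6}\,\mathrm{s} \approx 333\,\mathrm{h} \approx 14\ \text{days} \;(\approx 15\ \text{days})
```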
  • Traditional Solutions side-step the problem with dual-ported disk arrays: servers use NVRAM, RAID-6 with idle spares, large purchase contracts, and a 5-year spare-parts plan. That trades away commodity hardware and large-scale clustering.
  • Forget Performance? (Diagram: the Hadoop architecture co-locates functions with data on every node, while the traditional architecture keeps apps and an RDBMS separate from the data on a SAN/NAS.) Geographically dispersed also?
  • What MapR Does: chop the data on each node into 1000s of pieces (not millions of pieces, only 1000s); the pieces are called containers; spread replicas of each container across the cluster.
  • Why Does It Improve Things?
  • MapR Replication Example: a 100-node cluster, where each node holds 1/100th of every node's data. When a server dies, the entire cluster re-syncs the dead node's data.
  • MapR Re-sync Speed: 99 nodes re-sync'ing in parallel, each re-sync'ing 1/100th of the data, with 99x the drives, 99x the Ethernet ports, and 99x the CPUs. The net speed-up is about 100x: 350 hours vs. 3.5, and MTTDL is 100x better. (Worked numbers below.)
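
Continuing the earlier arithmetic under the same assumed throttled rate of 20 MB/s per node:

```latex
t_{\text{parallel}} \approx \frac{24\,\mathrm{TB} / 100}{20\,\mathrm{MB/s}}
= \frac{240{,}000\,\mathrm{MB}}{20\,\mathrm{MB/s}} = 12{,}000\,\mathrm{s} \approx 3.3\,\mathrm{h},
\qquad \frac{350\,\mathrm{h}}{3.5\,\mathrm{h}} = 100\times
```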
  • MapR Reliability: Mean Time To Data Loss (MTTDL) is far better, and it improves as cluster size increases. No idle spare drives are required: the rest of the cluster has sufficient spare capacity to absorb one node's data (on a 100-node cluster, one node's data == 1% of cluster capacity). All resources are utilized: no wasted "master-slave" nodes, no wasted idle spare drives, all spindles put to use. Better reliability with fewer resources, on commodity hardware!
  • Why Is This So Difficult?
  • MapR's Read-write Replication: writes are synchronous and visible immediately. Data is replicated in a "chain" fashion, which utilizes the full-duplex network; metadata is replicated in a "star" manner, which gives better response time. (Diagrams show clients 1..N writing through each topology.)
  • Container Balancing: servers keep a bunch of containers "ready to go"; writes get distributed around the cluster; as data size increases, writes spread more (like dropping a pebble in a pond, where larger pebbles spread the ripples farther); space is balanced by moving idle containers.
  • MapR Container Re-sync: MapR is 100% random write, using distributed transactions. On a complete crash, all replicas diverge from each other; on recovery, which one should be master?
  • MapR can detect exactly where the replicas diverged, even at a 2000 MB/s update rate. Re-sync means rolling each replica back to the divergence point, then rolling it forward to converge with the newly chosen master. This is done while online, with very little impact on normal operations.
  • MapR Does Automatic Re-sync Throttling: re-sync traffic is "secondary". Each node continuously measures the RTT to all its peers and throttles more to slower peers; an idle system runs at full speed. All automatic.
  • But Where Do Containers Fit In?
  • MapR's Containers: files and directories are sharded and placed into containers. Each container holds directories & files and data blocks, is replicated on servers, and is automatically managed. Containers are 16-32 GB segments of disk, placed on nodes.
  • MapR’s Architectural Params, from smallest to largest unit (the slide's 10^3 / 10^6 / 10^9 markers are the scale steps between them):
    – Unit of I/O: 4K/8K (8K in MapR)
    – Unit of chunking (a MapReduce split, the HDFS 'block'): 10s-100s of megabytes
    – Unit of re-sync (a replica): 10s-100s of gigabytes; a container in MapR, automatically managed
    – Unit of administration (snapshots, replication, mirrors, quotas, backup): 1 gigabyte to 1000s of terabytes; a volume in MapR, answering "what data is affected by my missing blocks?"
  • Where & How Does MapR Exploit This Unique Advantage?
  • MapR's No NameNode HA™ Architecture vs. HDFS federation (diagram: federation pairs multiple NameNodes with a NAS appliance, while MapR distributes the metadata for containers A-F across ordinary data nodes).
    – HDFS federation: multiple single points of failure; limited to 50-200 million files; performance bottleneck; commercial NAS required.
    – MapR (distributed metadata): HA with automatic failover; instant cluster restart; up to 1T files (a >5000x advantage); 10-20x higher performance; 100% commodity hardware.
  • Relative Performance and Scale. Benchmark: 100-byte file creates. Hardware: 10 nodes, each 2 x 4 cores, 24 GB RAM, 12 x 1 TB 7200 RPM, 1 GigE NIC.
    – Rate (creates/s): MapR 14-16K vs. other distribution 335-360, a 40x advantage.
    – Scale (files): MapR 6B vs. other distribution 1.3M, a 4615x advantage.
    (The slide's charts of file creates/s against files created are omitted; only their axis labels survived extraction.)
  • Where & How Does MapR Exploit This Unique Advantage?
  • MapR’s NFS Allows Direct Deposit: web servers, database servers, and application servers write straight into the cluster, with random read/write, compression, and distributed HA. Connectors are not needed, and there are no extra scripts or clusters to deploy and maintain. (A sketch of what this looks like from an application follows.)
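
Because the cluster is exposed as an ordinary NFS mount, any program can write into it with plain file I/O and no Hadoop client at all. A minimal sketch; the /mapr/my.cluster.com mount point and log path are illustrative assumptions:

```java
import java.io.FileWriter;
import java.io.IOException;

public class NfsDirectDeposit {
    public static void main(String[] args) throws IOException {
        // Plain POSIX-style append over NFS: no connector, no agent,
        // no staging script. Substitute your own cluster's mount point.
        try (FileWriter log = new FileWriter("/mapr/my.cluster.com/apps/web/access.log", true)) {
            log.write("GET /index.html 200\n"); // readable by MapReduce jobs as soon as it lands
        }
    }
}
```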
  • Where & How Does MapR Exploit This Unique Advantage?
  • MapR Volumes & Snapshots (example tree: /projects/tahoe, /projects/yosemite, /user/msmith, /user/bjohnson). Volumes dramatically simplify the management of Big Data; per-volume policies include replication factor, scheduled mirroring, scheduled snapshots, data placement control, user access and tracking, and administrative permissions. 100K volumes are OK; create as many as desired!
  • Where & How Does MapR Exploit This Unique Advantage?
  • MapR M7 Tables are binary compatible with Apache HBase: no recompilation is needed to access M7 tables, just set the CLASSPATH. M7 tables are accessed via pathname (see the sketch after this list):
    – openTable("hello") … uses HBase
    – openTable("/hello") … uses M7
    – openTable("/user/srivas/hello") … uses M7
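
A minimal sketch of the pathname dispatch from unmodified HBase-0.94-era client code. HTable is the stock Apache HBase class, consistent with the slide's binary-compatibility claim; the table names, column family, and values are illustrative:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class M7PathnameDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create(); // unchanged HBase client config

        // No leading slash: resolved as a regular Apache HBase table.
        HTable hbaseTable = new HTable(conf, "hello");

        // Leading slash: resolved as an M7 table at that filesystem path.
        HTable m7Table = new HTable(conf, "/user/srivas/hello");

        Put row = new Put(Bytes.toBytes("row1"));
        row.add(Bytes.toBytes("cf"), Bytes.toBytes("col"), Bytes.toBytes("value"));
        m7Table.put(row); // same bytecode, same API, different backend

        hbaseTable.close();
        m7Table.close();
    }
}
```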
  • M7 Tables are integrated into the storage layer: always available on every node, zero admin. Unlimited number of tables (Apache HBase is typically 10-20 tables, 100 at most). No compactions; updates are in place. Instant-on, with zero recovery time. 5-10x better performance, and consistent low latency at the 95th and 99th percentiles.
  • YCSB 50-50 Mix, throughput (chart: M7 ops/sec vs. HBase 0.94.9 ops/sec; only the axis labels survived extraction).
  • YCSB 50-50 Mix, latency (chart: M7 latency in usec vs. HBase 0.94.9 latency in usec).
  • Where & How Does MapR Exploit This Unique Advantage?
  • MapR Makes Hadoop Truly HA: ALL Hadoop components have high availability. In YARN, for example, the ApplicationMaster (the old JobTracker role) and the TaskTracker record their state in MapR; on node failure, the AM recovers its state from MapR and all jobs resume from where they were, even if the entire cluster is restarted. Only from MapR. This also allows preemption: MapR can preempt any job without losing its progress, which the ExpressLane™ feature in MapR exploits.
  • MapR Does MapReduce (Fast!): TeraSort record, 1 TB in 54 seconds on 1003 nodes; MinuteSort record, 1.5 TB in 59 seconds on 2103 nodes.
  • MapR Does MapReduce (Faster!): the MinuteSort record improves to 1.65 TB in 59 seconds on 300 nodes (the slide strikes through the earlier 1.5 TB / 2103-node figures).
  • YOUR CODE Can Exploit MapR's Unique Advantages. All your code can easily be scale-out HA: save service state in MapR, save data in MapR, use ZooKeeper to notice service failure, and restart anywhere; data + state will move there automatically. That's what we did! Only from MapR: HA for Impala, Hive, Oozie, Storm, MySQL, SOLR/Lucene, HCatalog, Kafka, … (The failure-detection half of the recipe is sketched below.)
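
A minimal sketch of the ZooKeeper half of this recipe, using the standard ZooKeeper client API; the ensemble address and znode path are illustrative assumptions, and the state itself would live on shared cluster storage as described above:

```java
import java.util.concurrent.CountDownLatch;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

public class ServiceFailureWatcher {
    public static void main(String[] args) throws Exception {
        final CountDownLatch failed = new CountDownLatch(1);

        ZooKeeper zk = new ZooKeeper("zk:2181", 5000, event -> { });
        String znode = "/services/worker";

        // The service itself would register an EPHEMERAL znode at `znode`;
        // ZooKeeper deletes it automatically when the service's session dies.

        // A standby watches that znode: deletion == failure detected.
        zk.exists(znode, event -> {
            if (event.getType() == Watcher.Event.EventType.NodeDeleted) {
                failed.countDown();
            }
        });

        failed.await();
        // Failure noticed: restart the service here, on this node, and
        // reload its checkpointed state from the shared cluster storage.
        System.out.println("worker died; restarting with saved state");
        zk.close();
    }
}
```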
  • MapR: Unlimited Hadoop. Reliable compute, dependable storage. Build the cluster brick by brick, one node at a time; use commodity hardware at rock-bottom prices; get enterprise-class reliability (instant restart, snapshots, mirrors, no single point of failure, …); export via NFS, ODBC, Hadoop, and other standard protocols.