Seattle Scalability Meetup - Ted Dunning - MapR

MapR is an amazing new distributed filesystem modeled after Hadoop. It maintains API compatibility with Hadoop, but far exceeds it in performance, manageability, and more.

/* Ted's MapR meeting slides incorporated here */

Usage Rights

CC Attribution-NoDerivs License

  • A constant doubling time implies a constant factor of growth. Thus the accumulation of all of history before 10 time units ago is less than half the accumulation in the last 10 units alone. This is true at all times.
  • Startups use this fact to their advantage: they develop time-efficiently at first and convert to computer-efficient systems later.
  • Here the later history is shown after the initial exponential growth phase. This changes the economics of the company dramatically.
  • The startup can throw away history because it is so small. That means that the startup has almost no compatibility requirement because the data lost due to lack of compatibility is a small fraction of the total data.
  • A large enterprise cannot do that. They have to have access to the old data and have to share between old data and Hadoop-accessible data. This doesn’t have to happen at the proof-of-concept level, but it really must happen when Hadoop first goes to production.
  • But stock Hadoop does not handle this well.
  • This is because Hadoop and other data silos have different foundations. What is worse, there is a semantic wall that separates HDFS from normal resources.
  • Here is a picture that shows how MapR can replace the foundation and provide compatibility. Of course, MapR provides much more than just the base, but the foundation is what provides the fundamental limitation, or lack of limit in MapR’s case.

Presentation Transcript

  • Seattle Monthly Hadoop / Scalability / NoSQL Meetup. Ted Dunning, MapR.
  • Agenda• Lightning talks / community announcements• Main Speaker• Bier @ Feierabend - 422 Yale Ave North• Hashtags #Seattle #Hadoop
  • Fast & Frugal: Running a Lean Startup with AWS – Oct 27th, 10am-2pm – http://aws.amazon.com/about-aws/events/
  • Seattle AWS User Group – November 9th, 2011, 6:30-9pm• In November we’re going to hear from Amy Woodward from EngineYard about keeping your systems live through outages and other problems using EngineYard atop AWS. Come check out this great talk and learn a thing or three about EngineYard and keeping high availability for your systems!• http://www.nwcloud.org
  • www.mapr.com• MapR is an amazing new distributed filesystem modeled after Hadoop. It maintains API compatibility with Hadoop, but far exceeds it in performance, manageability, and more.
  • MapR, Scaling, Machine Learning
  • Outline• Philosophy• Architecture• Applications
  • Physics of startup companies
  • For startups• History is always small• The future is huge• Must adopt new technology to survive• Compatibility is not as important – In fact, incompatibility is assumed
  • Physics of large companies [chart: growth curve; after the startup phase, absolute growth is still very large]
  • For large businesses• Present state is always large• Relative growth is much smaller• Absolute growth rate can be very large• Must adopt new technology to survive – Cautiously! – But must integrate technology with legacy• Compatibility is crucial
  • The startup technology picture [diagram: old computers and software (no compatibility requirement) → current computers and software → expected hardware and software growth]
  • The large enterprise picture [diagram: current hardware and software must work together with the proof-of-concept Hadoop cluster and the long-term Hadoop cluster; the link is marked “?”]
  • What does this mean?• Hadoop is very, very good at streaming through things in batch jobs• HBase is good at persisting data in very write-heavy workloads• Unfortunately, the foundation of both systems is HDFS, which does not export or import well
  • Narrow Foundations [diagram: Pig, Hive, web services, sequential file processing, map/reduce, OLAP, OLTP, HBase, RDBMS, and NAS each sit in their own silo, with only some resting on HDFS] Big data is heavy and expensive to move.
  • Narrow Foundations• Because big data has inertia, it is difficult to move – It costs time to move – It costs reliability because of more moving parts• The result is many duplicate copies
  • One Possible Answer• Widen the foundation• Use standard communication protocols• Allow conventional processing to share with parallel processing
  • Broad Foundation [diagram: the same stack (Pig, Hive, web services, sequential file processing, map/reduce, OLAP, OLTP, HBase, RDBMS, NAS, HDFS) now resting on MapR as a common base]
  • Broad Foundation• Having a broad foundation allows many kinds of computation to work together• It is no longer necessary to throw data over a wall• Performance much higher for map-reduce• Enterprise grade feature sets such as snapshots and mirrors can be integrated• Operations more familiar to admin staff
  • Map-Reduce [diagram: input → map function → shuffle → reduce function → output]
  • Map-reduce key details• User supplies f1 (map) and f2 (reduce) – Both are pure functions, no side effect• Framework supplies input, shuffle, output• Framework will re-run f1 and f2 on failure• Redundant task completion is OK
  • Map-Reduce [diagram, built up over two slides: input split across parallel map tasks (f1), each spilling to local disk, then merged by reduce tasks (f2) into the output]
  • Example – WordCount• Mapper – read line, tokenize into words – emit (word, 1)• Reducer – read (word, [k1, …, kn]) – emit (word, Σki)
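That slide translates almost directly into code. Here is a toy, in-process sketch; the function names and the tiny driver that simulates the shuffle are invented for illustration (a real job would run under the Hadoop framework, which supplies input, shuffle, and output):

```python
from collections import defaultdict

def mapper(line):
    # read line, tokenize into words, emit (word, 1)
    for word in line.split():
        yield (word, 1)

def reducer(word, counts):
    # read (word, [k1, ..., kn]), emit (word, sum of the ki)
    yield (word, sum(counts))

def run_wordcount(lines):
    # a stand-in for the framework's shuffle: group mapper output by key
    groups = defaultdict(list)
    for line in lines:
        for word, count in mapper(line):
            groups[word].append(count)
    return dict(kv for word, counts in groups.items()
                for kv in reducer(word, counts))

print(run_wordcount(["the quick fox", "the lazy dog"]))
```

Because both functions are pure, the framework is free to re-run them on failure or run redundant copies, exactly as the “key details” slide says.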
  • Example – Map Tiles• Input is set of objects – Roads (polyline) – Towns (polygon) – Lakes (polygon)• Output is set of map-tiles – Graphic image of part of map
  • Bottlenecks and Issues• Read-only files• Many copies in I/O path• Shuffle based on HTTP – Can’t use new technologies – Eats file descriptors• Spills go to local file space – Bad for skewed distribution of sizes
  • MapR Areas of Development [diagram: map-reduce, HBase, and the ecosystem layered over storage, management, and services]
  • MapR Improvements• Faster file system – Fewer copies – Multiple NICS – No file descriptor or page-buf competition• Faster map-reduce – Uses distributed file system – Direct RPC to receiver – Very wide merges
  • MapR Innovations• Volumes – Distributed management – Data placement• Read/write random access file system – Allows distributed meta-data – Improved scaling – Enables NFS access• Application-level NIC bonding• Transactionally correct snapshots and mirrors
  • MapR’s Containers [diagram] Files/directories are sharded into blocks, which are placed into mini NNs (containers) on disk. Each container contains directories & files and data blocks, and is replicated on servers. Containers are 16-32 GB segments of disk, placed on nodes; there is no need to manage them directly.
  • MapR’s Containers• Each container has a replication chain• Updates are transactional• Failures are handled by rearranging replication
  • Container locations and replication [diagram: nodes N1, N2, N3, each hosting containers whose replicas live on the other nodes] The container location database (CLDB) keeps track of the nodes hosting each container and the order of each replication chain.
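The CLDB’s bookkeeping can be sketched as a plain mapping from container to replication chain. The container ids, node names, and repair policy below are invented for illustration, not MapR’s actual data structures:

```python
# toy container-location database: container id -> replication chain
cldb = {
    "c1": ["N1", "N2"],
    "c2": ["N1", "N3"],
    "c3": ["N3", "N2"],
}

def handle_node_failure(cldb, dead, spares):
    # drop the dead node from every chain it appears in, then extend the
    # chain from a spare node so the replication factor is restored --
    # i.e., "failures are handled by rearranging replication"
    for cid, chain in cldb.items():
        if dead in chain:
            chain.remove(dead)
            for node in spares:
                if node not in chain:
                    chain.append(node)
                    break
    return cldb

handle_node_failure(cldb, "N1", ["N4", "N5"])
```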
  • MapR Scaling• Containers represent 16-32 GB of data – Each can hold up to 1 billion files and directories – 100M containers = ~2 exabytes (a very large cluster)• 250 bytes of DRAM to cache a container – 25 GB to cache all containers for a 2 EB cluster, but not necessary; can page to disk – A typical large 10 PB cluster needs 2 GB• Container reports are 100x-1000x smaller than HDFS block reports – Serve 100x more data nodes – Increase container size to 64 GB to serve a 4 EB cluster – Map/reduce not affected
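The slide’s arithmetic is easy to check with a quick back-of-the-envelope script (container sizes and the 250-byte figure come from the slide; everything else is just unit conversion):

```python
GiB, EiB = 2**30, 2**60
containers = 100_000_000                    # 100M containers

for size_gb in (16, 32):                    # container size range from the slide
    capacity = containers * size_gb * GiB
    print(f"{size_gb} GB containers -> {capacity / EiB:.1f} EiB")

cache = containers * 250                    # 250 bytes of DRAM per container
print(f"DRAM to cache all container locations: {cache / 1e9:.0f} GB")
```

The range works out to roughly 1.5-3 EiB, consistent with “~2 exabytes”, and the cache comes to 25 GB, matching the slide.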
  • MapR’s Streaming Performance [chart: read and write MB per second on 11 x 7200 rpm SATA and 11 x 15K rpm SAS disks; hardware limit vs. MapR vs. Hadoop; higher is better] Tests: (i) 16 streams x 120 GB, (ii) 2000 streams x 1 GB
  • Terasort on MapR [chart: elapsed time in minutes for 1.0 TB and 3.5 TB sorts, MapR vs. Hadoop; lower is better] 10+1 nodes: 8 core, 24 GB DRAM, 11 x 1 TB SATA 7200 rpm
  • HBase on MapR [chart: YCSB random read, records per second under Zipfian and uniform distributions, MapR vs. Apache; higher is better] 1 billion 1K records; 10+1 node cluster: 8 core, 24 GB DRAM, 11 x 1 TB 7200 RPM
  • Small Files (Apache Hadoop, 10 nodes) [chart: create rate (files/sec) vs. number of files (millions), out-of-the-box and tuned] Op: create file, write 100 bytes, close. Notes: NN not replicated; NN uses 20 GB DRAM; DN uses 2 GB DRAM
  • MUCH faster for some operations [chart: create rate vs. number of files (millions) on the same 10 nodes]
  • What MapR is not• Volumes != federation – MapR supports > 10,000 volumes all with independent placement and defaults – Volumes support snapshots and mirroring• NFS != FUSE – Checksum and compress at gateway – IP fail-over – Read/write/update semantics at full speed• MapR != maprfs
  • Not Your Father’s NFS• Multiple architectures possible• Export to the world – NFS gateway runs on selected gateway hosts• Local server – NFS gateway runs on local host – Enables local compression and checksumming• Export to self – NFS gateway runs on all data nodes, mounted from localhost
  • Export to the world [diagram: one NFS client talking to several NFS server gateways]
  • Local server [diagram: application and NFS server co-located on the client host, talking to the cluster nodes]
  • Universal export to self [diagram: each cluster node runs a task and an NFS server, mounted from localhost]
  • Nodes are identical [diagram: every cluster node runs the same task-plus-NFS-server stack]
  • Application architecture• High performance map-reduce is nice• But algorithmic flexibility is even nicer
  • Sharded text indexing [diagram: input documents → map (assign documents to shards) → reducer (index text to local disk, then copy the index to the clustered index store) → search engine, which typically requires a copy to local disk before the index can be loaded]
  • Sharded text indexing• Mapper assigns document to shard – Shard is usually hash of document id• Reducer indexes all documents for a shard – Indexes created on local disk – On success, copy index to DFS – On failure, delete local files• Must avoid directory collisions – can’t use shard id!• Must manage and reclaim local disk space
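The shard-assignment and collision-avoidance points can be made concrete. The shard count, hash choice, and directory naming below are illustrative assumptions, not the actual scheme from the talk:

```python
import hashlib
import os
import tempfile

NUM_SHARDS = 16  # assumed shard count

def shard_for(doc_id: str) -> int:
    # shard is a hash of the document id, stable across mapper tasks
    digest = hashlib.md5(doc_id.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

def local_index_dir(shard: int, attempt_id: str) -> str:
    # include the task attempt id in the path so a re-run reducer never
    # collides with a failed attempt's leftovers -- this is why the slide
    # says you can't name the directory after the shard id alone
    path = os.path.join(tempfile.gettempdir(), f"index-{shard}-{attempt_id}")
    os.makedirs(path, exist_ok=True)
    return path

print(shard_for("doc-42"), local_index_dir(shard_for("doc-42"), "attempt_0"))
```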
  • Conventional data flow [diagram: input documents → map → reducer (local disk) → clustered index storage → local disk → search engine. Failure of a reducer causes garbage to accumulate on the local disk; failure of a search engine requires another download of the index from clustered storage]
  • Simplified NFS data flows [diagram: input documents → map → reducer → clustered index storage, which the search engine reads directly. Failure of a reducer is cleaned up by the map-reduce framework; the search engine reads the mirrored index directly]
  • Simplified NFS data flows [diagram: mirroring allows exact placement of index data; arbitrary levels of replication are also possible via mirrors feeding multiple search engines]
  • How about another one?
  • K-means• Classic E-M based algorithm• Given cluster centroids, – Assign each data point to nearest centroid – Accumulate new centroids – Rinse, lather, repeat
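The E-M loop on that slide fits in a few lines. This sketch uses 2-D points and a deterministic “first k points” initialization for brevity (random initialization is more typical); none of the specifics come from the talk:

```python
def kmeans(points, k, iters=10):
    # classic E-M: assign each point to its nearest centroid (E step),
    # then recompute each centroid as the mean of its points (M step)
    centroids = [tuple(p) for p in points[:k]]  # deterministic init for brevity
    for _ in range(iters):
        sums = [[0.0, 0.0] for _ in range(k)]   # 2-D points for brevity
        counts = [0] * k
        for x, y in points:
            c = min(range(k),
                    key=lambda i: (x - centroids[i][0]) ** 2 +
                                  (y - centroids[i][1]) ** 2)
            sums[c][0] += x
            sums[c][1] += y
            counts[c] += 1
        centroids = [(s[0] / n, s[1] / n) if n else centroids[i]
                     for i, (s, n) in enumerate(zip(sums, counts))]
    return centroids

# two obvious clusters near (0, 0) and (10, 10)
data = [(0, 0), (10, 10), (0.5, 0), (10, 10.5), (0, 0.5), (10.5, 10)]
print(kmeans(data, 2))
```

The assign/accumulate split is exactly what maps onto mapper and combiner/reducer in the “old tricks, new dogs” slides below.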
  • K-means, the movie [diagram: input → assign to nearest centroid → aggregate new centroids, with the centroids feeding back into assignment]
  • But …
  • Parallel Stochastic Gradient Descent [diagram: input → train sub-models → average models]
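The train-then-average pattern in that diagram can be sketched in a few lines. The 1-D linear model, learning rate, and synthetic splits below are all illustrative assumptions, not anything from the talk:

```python
def train_submodel(examples, lr=0.1, epochs=50):
    # one "mapper": plain SGD fitting a 1-D linear model y = w * x
    w = 0.0
    for _ in range(epochs):
        for x, y in examples:
            w += lr * (y - w * x) * x  # gradient step on squared error
    return w

def average_models(models):
    # the "reducer": parameter averaging across the sub-models
    return sum(models) / len(models)

# two input splits drawn from the same relationship, y = 2x
splits = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0), (1.5, 3.0)]]
w = average_models([train_submodel(s) for s in splits])
print(w)
```

Each split trains independently (embarrassingly parallel), and only the small model parameters cross the network for averaging.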
  • Variational Dirichlet Assignment [diagram: input → gather sufficient statistics → update model]
  • Old tricks, new dogs• Mapper – Assign point to cluster – Emit cluster id, (1, point)• Combiner and reducer – Sum counts, weighted sum of points – Emit cluster id, (n, sum/n)• Output to HDFS [annotations: centroids are read from local disk via the distributed cache, which copies them from HDFS; output is written by map-reduce]
  • Old tricks, new dogs• Mapper – Assign point to cluster – Emit cluster id, (1, point)• Combiner and reducer – Sum counts, weighted sum of points – Emit cluster id, (n, sum/n)• Output to MapR FS [annotations: centroids are read directly from NFS; output is written by map-reduce]
  • Poor man’s Pregel• Mapper: while not done: read and accumulate input models; for each input: accumulate model; write model; synchronize; reset input format; emit summary• Lines in bold can use conventional I/O via NFS
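A runnable toy version of that loop, with the model kept in a JSON file that is read and written through ordinary file I/O (which is exactly what an NFS mount of the cluster enables). The file layout and the trivial running-average “model” are invented for illustration:

```python
import json
import os
import tempfile

def pregel_like(model_path, inputs, iterations=3):
    # each pass: read the current model with plain file I/O (as over NFS),
    # accumulate the inputs into it, write it back (the synchronize step),
    # and emit a summary of the current state
    summaries = []
    for _ in range(iterations):
        with open(model_path) as f:          # conventional read
            model = json.load(f)
        for x in inputs:                     # accumulate model
            model["sum"] += x
            model["n"] += 1
        with open(model_path, "w") as f:     # conventional write / synchronize
            json.dump(model, f)
        summaries.append(model["sum"] / model["n"])   # emit summary
    return summaries

path = os.path.join(tempfile.gettempdir(), "model.json")
with open(path, "w") as f:
    json.dump({"sum": 0.0, "n": 0}, f)
summaries = pregel_like(path, [1.0, 2.0, 3.0])
print(summaries)
```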
  • Click modeling architecture [diagram: input → feature extraction, join, and down-sampling (map-reduce) → sequential SGD learning; side-data now arrives via NFS]
  • Click modeling architecture [diagram: input → feature extraction, join, and down-sampling (map-reduce) → several sequential SGD learners running in parallel; map-reduce cooperates with NFS to supply side-data]
  • And another…
  • Hybrid model flow [diagram: feature extraction and down-sampling (map-reduce) → SVD (spectral) and PageRank-style (map-reduce) stages → downstream modeling → deployed model, with the hand-off between stages marked “??”]
  • Hybrid model flow [diagram: the same pipeline, with the hand-off resolved and the sequential and map-reduce stages labeled]
  • And visualization…
  • Trivial visualization interface• Map-reduce output is visible via NFS: $ R > x <- read.csv("/mapr/my.cluster/home/ted/data/foo.out") > plot(error ~ t, x) > q(save='n')• Legacy visualization just works
  • Conclusions• We used to know all this• Tab completion used to work• 5 years of work-arounds have clouded our memories• We just have to remember the future