NYC Hadoop Meetup - MapR, Architecture, Philosophy and Applications
 

Slides from: http://www.meetup.com/Hadoop-NYC/events/34411232/

There are a number of assumptions that come with using standard Hadoop that are based on Hadoop's initial architecture. Many of these assumptions can be relaxed with more advanced architectures such as those provided by MapR. These changes in assumptions have ripple effects throughout the system architecture. This is significant because many systems like Mahout provide multiple implementations of various algorithms with very different performance and scaling implications.

I will describe several case studies and use these examples to show how these changes can simplify systems or, in some cases, make certain classes of programs run an order of magnitude faster.

About the speaker: Ted Dunning - Chief Application Architect (MapR)

Ted has held Chief Scientist positions at Veoh Networks, ID Analytics and MusicMatch (now Yahoo Music). Ted is responsible for building the most advanced identity theft detection system on the planet, as well as one of the largest peer-assisted video distribution systems and ground-breaking music and video recommendation systems. Ted has 15 issued and 15 pending patents and contributes to several Apache open source projects including Hadoop, ZooKeeper and HBase. He is also a committer for Apache Mahout. Ted earned a BS degree in electrical engineering from the University of Colorado, an MS degree in computer science from New Mexico State University, and a Ph.D. in computing science from Sheffield University in the United Kingdom. Ted also bought the drinks at one of the very first Hadoop User Group meetings.

  • Growth by a constant factor in constant time (i.e. exponential growth) means that the accumulation of all history from before 10 time units ago is less than half the accumulation in the last 10 units alone. This is true at all times (see the sketch after these notes).
  • Startups use this fact to their advantage and completely change everything to allow time-efficient development initially with conversion to computer-efficient systems later.
  • Here the later history is shown after the initial exponential growth phase. This changes the economics of the company dramatically.
  • The startup can throw away history because it is so small. That means that the startup has almost no compatibility requirement because the data lost due to lack of compatibility is a small fraction of the total data.
  • A large enterprise cannot do that. They have to have access to the old data and have to share between old data and Hadoop-accessible data. This doesn’t have to happen at the proof-of-concept stage, but it really must happen when Hadoop first goes to production.
  • But stock Hadoop does not handle this well.
  • This is because Hadoop and other data silos have different foundations. What is worse, there is a semantic wall that separates HDFS from normal resources.
  • Here is a picture that shows how MapR can replace the foundation and provide compatibility. Of course, MapR provides much more than just the base, but the foundation is what imposes the fundamental limitation, or removes it in MapR’s case.
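To make the growth claim in the first note concrete, here is a minimal sketch in Python (assuming, purely for illustration, that data volume doubles every 5 time units; that figure is not from the talk) comparing everything accumulated before the last 10-unit window with the accumulation inside that window:

    import math

    # Illustrative only: arrival rate r(t) grows exponentially, doubling every
    # `doubling_time` units (an assumed figure, not taken from the presentation).
    doubling_time = 5.0
    window = 10.0
    k = math.log(2) / doubling_time   # continuous growth rate for r(t) = exp(k * t)

    # Closed-form integrals of r(t) up to "now" (t = 0):
    history_before_window = math.exp(-k * window) / k    # integral over (-inf, -window]
    recent_window = (1 - math.exp(-k * window)) / k      # integral over (-window, 0]

    print(history_before_window / recent_window)   # ~0.33: old history < half the last window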

NYC Hadoop Meetup - MapR, Architecture, Philosophy and Applications: Presentation Transcript

  • MapR, Architecture, Philosophy and Applications
    NY HUG – October 2011
  • Outline
    Architecture (MapR)
    Philosophy
    Applications (Machine learning)
  • Map-Reduce, the Original Mission
    [Diagram: input → shuffle → output]
  • Bottlenecks and Issues
    Read-only files
    Many copies in I/O path
    Shuffle based on HTTP
    Can’t use new technologies
    Eats file descriptors
    Spills go to local file space
    Bad for skewed distribution of sizes
  • MapR Areas of Development
  • MapR Improvements
    Faster file system
    Fewer copies
    Multiple NICs
    No file descriptor or page-buf competition
    Faster map-reduce
    Uses distributed file system
    Direct RPC to receiver
    Very wide merges
  • MapR Innovations
    Volumes
    Distributed management
    Data placement
    Read/write random access file system
    Allows distributed meta-data
    Improved scaling
    Enables NFS access
    Application-level NIC bonding
    Transactionally correct snapshots and mirrors
  • MapR's Containers
    Files/directories are sharded into blocks, which are placed into mini NNs (containers) on disks
    • Each container contains
    • Directories & files
    • Data blocks
    • Replicated on servers
    • No need to manage directly
    Containers are 16-32 GB segments of disk, placed on nodes
  • MapR's Containers
    • Each container has a replication chain
    • Updates are transactional
    • Failures are handled by rearranging replication
  • Container locations and replication
    The container location database (CLDB) keeps track of the nodes hosting each container and the order of each container's replication chain
    [Diagram: CLDB mapping containers to replication chains such as N1, N2 or N3, N2 across nodes N1-N3]
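As a rough mental model only (the names and data structures below are invented for illustration, not MapR's implementation), the CLDB can be pictured as a map from container id to an ordered replication chain, with failures handled by rearranging the chains:

    # Toy "container location database": container id -> ordered replication chain
    # (head of the chain first). Purely illustrative.
    cldb = {
        1: ["N1", "N2"],
        2: ["N3", "N2"],
        3: ["N1", "N3"],
    }

    def locate(container_id):
        """Return the nodes holding a container, in replication-chain order."""
        return cldb[container_id]

    def handle_node_failure(failed_node, spare_nodes):
        """Rearrange replication chains so no chain relies on the failed node."""
        for chain in cldb.values():
            if failed_node in chain:
                chain.remove(failed_node)
                replacement = next(n for n in spare_nodes if n not in chain)
                chain.append(replacement)   # re-replicate onto a spare node

    print(locate(2))                         # ['N3', 'N2']
    handle_node_failure("N2", ["N4", "N5"])
    print(cldb[2])                           # ['N3', 'N4']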
  • MapR Scaling
    Containers represent 16 - 32GB of data
    • Each can hold up to 1 Billion files and directories
    • 100M containers = ~ 2 Exabytes (a very large cluster)
    250 bytes DRAM to cache a container
    • 25GB to cache all containers for 2EB cluster
    • But not necessary, can page to disk
    • Typical large 10PB cluster needs 2GB
    Container-reports are 100x - 1000x < HDFS block-reports
    • Serve 100x more data-nodes
    • Increase container size to 64G to serve 4EB cluster
    • Map/reduce not affected
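A quick back-of-the-envelope check of those scaling numbers (treating 1 EB as 10^18 bytes and picking ~20 GB as a representative container size; both are assumptions made here for illustration):

    # Rough arithmetic behind the scaling claims above (decimal units).
    GB = 10**9
    EB = 10**18

    container_size = 20 * GB     # containers are 16-32 GB; assume ~20 GB each
    containers = 100 * 10**6     # 100M containers

    print(containers * container_size / EB)   # ~2.0 exabytes of data
    print(containers * 250 / GB)              # ~25 GB of DRAM at 250 bytes per container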
  • MapR's Streaming Performance
    [Bar charts: throughput in MB per second, higher is better, on 11 x 7200 rpm SATA and on 11 x 15K rpm SAS disks]
    Tests: i. 16 streams x 120GB ii. 2000 streams x 1GB
  • Terasort on MapR
    10+1 nodes: 8 core, 24GB DRAM, 11 x 1TB SATA 7200 rpm
    [Bar chart: elapsed time in minutes, lower is better]
  • HBase on MapR
    YCSB Random Read with 1 billion 1K records
    10+1 node cluster: 8 core, 24GB DRAM, 11 x 1TB 7200 RPM
    [Bar chart: records per second, higher is better]
  • Small Files (Apache Hadoop, 10 nodes)
    Op: create file, write 100 bytes, close
    Notes: NN not replicated; NN uses 20G DRAM; DN uses 2G DRAM
    [Chart: rate (files/sec) vs. # of files (millions) for out-of-box and tuned configurations]
  • MUCH faster for some operations
    Same 10 nodes as above
    [Chart: file create rate vs. # of files (millions)]
  • What MapR is not
    Volumes != federation
    MapR supports > 10,000 volumes all with independent placement and defaults
    Volumes support snapshots and mirroring
    NFS != FUSE
    Checksum and compress at gateway
    IP fail-over
    Read/write/update semantics at full speed
    MapR != maprfs
  • Philosophy
  • Physics of startup companies
  • For startups
    History is always small
    The future is huge
    Must adopt new technology to survive
    Compatibility is not as important
    In fact, incompatibility is assumed
  • Physics of large companies
    [Growth curve: after the startup phase, relative growth slows but absolute growth is still very large]
  • For large businesses
    Present state is always large
    Relative growth is much smaller
    Absolute growth rate can be very large
    Must adopt new technology to survive
    Cautiously!
    But must integrate technology with legacy
    Compatibility is crucial
  • The startup technology picture
    No compatibility requirement
    [Diagram: old computers and software, current computers and software, and expected hardware and software growth]
  • The large enterprise picture
    [Diagram: current hardware and software, a proof-of-concept Hadoop cluster, and a long-term Hadoop cluster must all work together; the question mark is how]
  • What does this mean?
    Hadoop is very, very good at streaming through things in batch jobs
    HBase is good at persisting data in very write-heavy workloads
    Unfortunately, the foundation of both systems is HDFS which does not export or import well
  • Narrow Foundations
    Big data is heavy and expensive to move.
    [Diagram: Pig, Hive, web services, sequential file processing, OLAP, OLTP, map/reduce, HBase and RDBMS each sitting on separate storage silos (NAS and HDFS)]
  • Narrow Foundations
    Because big data has inertia, it is difficult to move
    It costs time to move
    It costs reliability because of more moving parts
    The result is many duplicate copies
  • One Possible Answer
    Widen the foundation
    Use standard communication protocols
    Allow conventional processing to share with parallel processing
  • Broad Foundation
    [Diagram: the same stack of Pig, Hive, web services, sequential file processing, OLAP, OLTP, map/reduce, HBase and RDBMS, now sharing a single MapR foundation that underlies both NAS and HDFS]
  • New Capabilities
  • Export to the world
    [Diagram: several NFS servers in the cluster exporting data to an external NFS client]
  • Local server
    [Diagram: a client application talking to an NFS server running on the cluster nodes]
  • Universal export to self
    [Diagram: a cluster node whose task reads and writes through an NFS server on the same node]
  • Nodes are identical
    [Diagram: every cluster node runs the same task-plus-NFS-server arrangement]
  • Application architecture
    High performance map-reduce is nice
    But algorithmic flexibility is even nicer
  • Hybrid model flow
    [Diagram: map-reduce feature extraction and down sampling feeds an SVD / PageRank / spectral step (marked "??"), which feeds down-stream modeling and finally a deployed model]
  • Hybrid model flow
    [Diagram: the same flow, with feature extraction and down sampling done as map-reduce and the SVD / PageRank / spectral and down-stream modeling steps done as sequential programs, feeding the deployed model]
  • Sharded text indexing
    Mapper assigns document to shard
    Shard is usually hash of document id
    Reducer indexes all documents for a shard
    Indexes created on local disk
    On success, copy index to DFS
    On failure, delete local files
    Must avoid directory collisions
    can’t use shard id!
    Must manage and reclaim local disk space
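A minimal sketch of the shard-assignment step described above, written in plain Python rather than against the real Hadoop API (the shard count and document format are invented for illustration):

    import hashlib

    NUM_SHARDS = 16   # illustrative shard count

    def shard_for(doc_id):
        """Assign a document to a shard by hashing its id (the mapper's job)."""
        digest = hashlib.md5(doc_id.encode("utf-8")).hexdigest()
        return int(digest, 16) % NUM_SHARDS

    def map_documents(docs):
        """Emit (shard, document) pairs; one reducer per shard then builds an index."""
        for doc_id, text in docs:
            yield shard_for(doc_id), (doc_id, text)

    docs = [("doc-1", "hadoop mapr nfs"), ("doc-2", "k-means clustering")]
    for shard, doc in map_documents(docs):
        print(shard, doc)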
  • Sharded text indexing
    Index text to local disk and then copy the index to the distributed file store
    [Diagram: input documents → map (assign documents to shards) → reducer (index on local disk) → clustered index storage → copy to local disk, typically required before the search engine can load the index]
  • Conventional data flow
    [Diagram: the same flow; failure of a reducer causes garbage to accumulate on local disk, and failure of a search engine requires another download of the index from clustered storage]
  • Simplified NFS data flows
    Index to the task work directory via NFS
    [Diagram: input documents → map → reducer → clustered index storage; the search engine reads the mirrored index directly, and failure of a reducer is cleaned up by the map-reduce framework]
  • Simplified NFS data flows
    Mirroring allows exact placement of index data
    Arbitrary levels of replication are also possible
    [Diagram: map → reducer → mirrors → search engines]
  • K-means
    Classic E-M based algorithm
    Given cluster centroids,
    Assign each data point to nearest centroid
    Accumulate new centroids
    Rinse, lather, repeat
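A minimal, in-memory sketch of that E-M loop (plain Python over made-up 2-D points; a real run distributes the assignment step as the following slides describe):

    import random

    def kmeans(points, k, iterations=10):
        """Classic E-M style k-means: assign to nearest centroid, then re-average."""
        centroids = random.sample(points, k)
        for _ in range(iterations):
            # E-step: assign each data point to the nearest centroid
            clusters = [[] for _ in range(k)]
            for x, y in points:
                nearest = min(range(k), key=lambda i: (x - centroids[i][0]) ** 2
                                                      + (y - centroids[i][1]) ** 2)
                clusters[nearest].append((x, y))
            # M-step: accumulate new centroids (keep the old one if a cluster empties)
            centroids = [
                (sum(p[0] for p in c) / len(c), sum(p[1] for p in c) / len(c)) if c else centroids[i]
                for i, c in enumerate(clusters)
            ]
        return centroids

    points = [(random.gauss(cx, 0.3), random.gauss(cy, 0.3))
              for cx, cy in [(0, 0), (5, 5)] for _ in range(100)]
    print(kmeans(points, k=2))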
  • K-means, the movie
    [Animation: input → assign each point to the nearest centroid → aggregate new centroids → back to centroids, repeatedly]
  • Parallel Stochastic Gradient Descent
    [Diagram: input → train sub-models in parallel → average the models → model]
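A toy sketch of that parallel-SGD pattern (a one-dimensional least-squares model on synthetic data; the data split and model averaging are the point here, not the model itself):

    import random

    def sgd_submodel(samples, lr=0.01, epochs=5):
        """Train one sub-model with plain SGD on a slice of the data (y ~ w * x)."""
        w = 0.0
        for _ in range(epochs):
            for x, y in samples:
                w -= lr * 2 * (w * x - y) * x   # gradient of the squared error
        return w

    # Synthetic data with true w = 3, split into partitions as a map-reduce job would
    data = [(x, 3 * x + random.gauss(0, 0.1))
            for x in [random.uniform(-1, 1) for _ in range(4000)]]
    partitions = [data[i::4] for i in range(4)]

    sub_models = [sgd_submodel(part) for part in partitions]   # "map": train sub-models
    model = sum(sub_models) / len(sub_models)                  # "reduce": average models
    print(model)   # ~3.0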
  • Variational Dirichlet Assignment
    [Diagram: input → gather sufficient statistics in parallel → update the model → model]
  • Old tricks, new dogs
    Mapper
    Assign point to cluster
    Emit cluster id, (1, point)
    Combiner and reducer
    Sum counts, weighted sum of points
    Emit cluster id, (n, sum/n)
    [Diagram notes: centroids are read from HDFS to local disk by the distributed cache and then read from local disk by the mappers; output is written to HDFS by map-reduce]
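A compact sketch of the mapper / combiner-reducer contract above, as plain Python over (key, value) pairs rather than the real Hadoop API; the toy driver stands in for the framework's shuffle:

    from collections import defaultdict

    def nearest(point, centroids):
        """Index of the centroid closest to a point (squared Euclidean distance)."""
        return min(range(len(centroids)),
                   key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])))

    def mapper(points, centroids):
        """Assign each point to a cluster and emit (cluster id, (1, point))."""
        for point in points:
            yield nearest(point, centroids), (1, point)

    def combine_or_reduce(values):
        """Sum counts and the count-weighted sum of points; return (n, sum/n)."""
        n = sum(count for count, _ in values)
        dims = len(values[0][1])
        weighted = [sum(count * point[d] for count, point in values) for d in range(dims)]
        return n, tuple(w / n for w in weighted)

    centroids = [(0.0, 0.0), (5.0, 5.0)]   # would arrive via the distributed cache or NFS
    points = [(0.1, -0.2), (4.9, 5.1), (0.0, 0.3), (5.2, 4.8)]

    grouped = defaultdict(list)
    for cluster_id, value in mapper(points, centroids):
        grouped[cluster_id].append(value)
    for cluster_id, values in grouped.items():
        print(cluster_id, combine_or_reduce(values))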
  • Old tricks, new dogs
    Mapper
    Assign point to cluster
    Emit cluster id, (1, point)
    Combiner and reducer
    Sum counts, weighted sum of points
    Emit cluster id, (n, sum/n)
    [Diagram notes: centroids are read directly from MapR FS via NFS; output is written by map-reduce through the HDFS API to the same MapR FS]
  • Poor man’s Pregel
    Mapper (the lines that were bold on the slide can use conventional I/O via NFS):
    while not done:
        read and accumulate input models
        for each input:
            accumulate model
        write model
        synchronize
        reset input format
    emit summary
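A runnable sketch of that loop in plain Python, assuming (purely for illustration) that each task exchanges its model as a small file under an NFS-mounted cluster path; the directory layout, task ids and barrier below are invented here, not part of the talk:

    import glob
    import os
    import time

    MODEL_DIR = "/mapr/my.cluster/models"               # hypothetical NFS-mounted directory
    NUM_TASKS = int(os.environ.get("NUM_TASKS", "4"))   # hypothetical task count
    TASK_ID = os.environ.get("TASK_ID", "0")            # hypothetical per-task identifier

    def read_models(iteration):
        """Read and accumulate (here: average) the models from the previous iteration."""
        total, count = 0.0, 0
        for path in glob.glob(os.path.join(MODEL_DIR, "iter-%d-*.txt" % iteration)):
            with open(path) as f:
                total += float(f.read())
                count += 1
        return total / count if count else 0.0

    def synchronize(iteration):
        """Barrier: wait until every task has written its model for this iteration."""
        pattern = os.path.join(MODEL_DIR, "iter-%d-*.txt" % iteration)
        while len(glob.glob(pattern)) < NUM_TASKS:
            time.sleep(0.5)

    def run(my_inputs, iterations=5):
        model = 0.0
        for iteration in range(1, iterations + 1):       # while not done
            model = read_models(iteration - 1)           # read and accumulate input models
            for x in my_inputs:                          # for each input:
                model += x                               #     accumulate model
            out = os.path.join(MODEL_DIR, "iter-%d-%s.txt" % (iteration, TASK_ID))
            with open(out, "w") as f:                    # write model (plain file I/O via NFS)
                f.write(str(model))
            synchronize(iteration)                       # synchronize
        return model                                     # emit summary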
  • Click modeling architecture
    Side-data now via NFS
    [Diagram: input → map-reduce feature extraction and down sampling → data join → sequential SGD learning]
  • Click modeling architecture
    Map-reduce cooperates with NFS
    [Diagram: input → map-reduce feature extraction and down sampling → map-reduce data join with side-data → several sequential SGD learning processes running in parallel]
  • Trivial visualization interface
    Map-reduce output is visible via NFS
    Legacy visualization just works
    $ R
    > x <- read.csv("/mapr/my.cluster/home/ted/data/foo.out")
    > plot(error ~ t, x)
    > q(save='n')
  • Conclusions
    We used to know all this
    Tab completion used to work
    5 years of work-arounds have clouded our memories
    We just have to remember the future