Ted Dunning - Whither Hadoop


Usage Rights

© All Rights Reserved

  • ----- Meeting Notes (3/13/12 19:23) -----

Ted Dunning - Whither Hadoop: Presentation Transcript

  • Whither Hadoop? (but not wither)
  • Questions I Have Been Asked
      • I have a system that currently has 10 billion files
      • Files range from 10K to 100MB and average about 1MB
      • Access is from a legacy parallel app currently running on thousands of machines
      • I want to expand to 200 billion files of about the same size range
  • World-wide Grid
      • I have 100,000 machines spread across 10 data centers around the world
      • I have a current grid architecture local to each data center (it works OK)
      • But I want to load-level by mirroring data from data center to data center
      • And I want to run both legacy and Hadoop programs
  • Little Records in Real Time
      • I have a high-rate stream of incoming small records
      • I need to have several years of data on-line
          – about 10-30 PB
      • But I have to have aggregates and alarms in real time
          – maximum response should be less than 10s end to end
          – typical response should be much faster (200ms)
  • Model Deployment
      • I am building recommendation models off-line and would like to scale that using Hadoop
      • But these models need to be deployed transactionally to machines around the world
      • And I need to keep reference copies of every model and the exact data it was trained on
      • But I can’t afford to keep many copies of my entire training data
  • Video Repository
      • I have video output from thousands of sources indexed by source and time
      • Each source wanders around a lot, I know where
      • Each source has about 100,000 video snippets, each of which is 10-100MB
      • I need to run map-reduce-like programs on all video for particular locations
      • 10,000 x 100,000 x 100MB = 10^17 B = 100PB
      • 10,000 x 100,000 = 10^9 files
  • And then they say
      • Oh, yeah… we need to have backups, too
      • Say, every 10 minutes for the last day or so
      • And every hour for the last month
      • You know… like Time Machine on my Mac
  • So what’s a poor guy to do?
  • Scenario 1
      • 10 billion files x 1MB average
          – 100 federated name nodes?
      • Legacy code access
      • Expand to 200 billion files
          – 2,000 name nodes?! Let’s not talk about HA
      • Or 400 node MapR
          – no special adaptation
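The name node counts on this slide can be reproduced with simple arithmetic. The sketch below assumes roughly 100 million files per name node, a figure that is only implied by the slide's "10 billion files, 100 federated name nodes" and is not stated as a limit anywhere in the talk. The function name is hypothetical.

```python
# Assumption: ~100 million files per name node, implied by the slide's
# "10 billion files -> 100 federated name nodes"; not a documented limit.
FILES_PER_NAME_NODE = 100_000_000


def name_nodes_needed(total_files: int, per_node: int = FILES_PER_NAME_NODE) -> int:
    """Return how many federated name nodes are needed, rounding up."""
    return -(-total_files // per_node)  # integer ceiling division


current = name_nodes_needed(10_000_000_000)     # today's 10 billion files
expanded = name_nodes_needed(200_000_000_000)   # planned 200 billion files
print(current, expanded)
```

Under that assumption, the 20x growth in file count translates directly into 20x as many name nodes, which is the point the slide makes.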
  • World Wide Grid
      • 10 x 10,000 machines + legacy code
      • 10 x 10,000 or 10 x 10 x 1000 node cluster
      • NFS for legacy apps
      • Scheduled mirrors move data at end of shift
  • Little Records in Real Time
      • Real-time + 10-30 PB
      • Storm with pluggable services
      • Or Ultra-messaging on the commercial side
      • Key requirement is that real-time processors need distributed, mutable state storage
  • Model Deployment
      • Snapshots
          – of training data at start of training
          – of models at the end of training
      • Mirrors and data placement allow precise deployment
      • Redundant snapshots require no space
  • Video Repository
      • Media repositories typically have low average bandwidth
      • This allows very high density machines
          – 36 x 3 TB ≈ 100TB raw per 4U = 25TB net per 4U
      • 100PB = 4,000 nodes = 400 racks
      • Can be organized as one cluster or several pods
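The sizing figures for the video repository can be checked in a few lines. This is a sketch using only the numbers quoted on the slides, with decimal units (1 PB = 10^15 bytes). The 10 nodes per rack is an assumption derived from "4,000 nodes = 400 racks".

```python
# All inputs are taken from the slides; decimal units throughout.
SOURCES = 10_000
SNIPPETS_PER_SOURCE = 100_000
MAX_SNIPPET_BYTES = 100 * 10**6     # 100 MB, the upper end of the 10-100MB range
NET_BYTES_PER_NODE = 25 * 10**12    # 25 TB net per 4U node
NODES_PER_RACK = 10                 # assumption implied by 4,000 nodes in 400 racks

total_files = SOURCES * SNIPPETS_PER_SOURCE
total_bytes = total_files * MAX_SNIPPET_BYTES
nodes = total_bytes // NET_BYTES_PER_NODE
racks = nodes // NODES_PER_RACK

print(f"{total_files:.0e} files, {total_bytes / 10**15:.0f} PB, "
      f"{nodes} nodes, {racks} racks")
```

Because the calculation uses the 100MB upper bound for every snippet, the 100PB figure is a worst case rather than an expected size.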
  • And Backups?
      • Snapshots can be scheduled at high frequency
      • Expiration allows complex retention schedules
      • You know… like Time Machine on my Mac
      • (and off-site backups work as well)
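A retention schedule like the one requested earlier (every 10 minutes for the last day, every hour for the last month) can be expressed as an expiration rule. The sketch below is illustrative only: `keep_snapshot` is a hypothetical helper, not a MapR API, and it assumes snapshots are taken on 10-minute boundaries and that "a month" means 30 days.

```python
from datetime import datetime, timedelta


def keep_snapshot(taken: datetime, now: datetime) -> bool:
    """Hypothetical expiration rule for a snapshot taken at `taken`.

    Keep every snapshot for one day, then only on-the-hour snapshots
    for 30 days, then expire everything.
    """
    age = now - taken
    if age <= timedelta(days=1):
        return True
    if age <= timedelta(days=30):
        return taken.minute == 0
    return False


# Usage: simulate 31 days of 10-minute snapshots and apply the rule.
now = datetime(2012, 3, 13, 19, 20)
snapshots = [now - timedelta(minutes=10 * i) for i in range(6 * 24 * 31)]
kept = [s for s in snapshots if keep_snapshot(s, now)]
print(len(snapshots), "taken,", len(kept), "retained")
```

Expiring by rule rather than by hand is what makes layered schedules of this kind practical at high snapshot frequency.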
  • How does this work?
  • MapR Areas of Development HBase Map Reduce Ecosystem Storage Management Services
  • MapR Improvements
      • Faster file system
          – Fewer copies
          – Multiple NICs
          – No file descriptor or page-buf competition
      • Faster map-reduce
          – Uses distributed file system
          – Direct RPC to receiver
          – Very wide merges
  • MapR Innovations
      • Volumes
          – Distributed management
          – Data placement
      • Read/write random access file system
          – Allows distributed meta-data
          – Improved scaling
          – Enables NFS access
      • Application-level NIC bonding
      • Transactionally correct snapshots and mirrors
  • MapR’s Containers
      • Files/directories are sharded into blocks, which are placed into mini NNs (containers) on disks
      • Each container contains
          – Directories & files
          – Data blocks
      • Replicated on servers
      • No need to manage directly
      • Containers are 16-32 GB segments of disk, placed on nodes
  • MapRs Containers  Each container has a replication chain  Updates are transactional  Failures are handled by rearranging replication
  • Container locations and replication
      • Container location database (CLDB) keeps track of nodes hosting each container and replication chain order
      • [Diagram: the CLDB alongside containers labeled with their hosting nodes, e.g. "N1, N2", "N3, N2", "N1, N3"]
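Conceptually, the CLDB described on this slide is a map from container to an ordered list of hosting nodes. The sketch below shows that data structure only. The class, its methods, and the container IDs are hypothetical and do not reflect the real CLDB interface.

```python
class ContainerLocationDB:
    """Toy container location database: container id -> replication chain."""

    def __init__(self):
        self._chains = {}   # container id -> list of nodes, in chain order

    def register(self, container_id, chain):
        # Record (or replace) the replication chain for a container.
        self._chains[container_id] = list(chain)

    def lookup(self, container_id):
        # Return the nodes hosting a container, head of the chain first.
        return list(self._chains[container_id])

    def containers_on(self, node):
        # List the containers that have a replica on the given node.
        return sorted(cid for cid, chain in self._chains.items() if node in chain)


cldb = ContainerLocationDB()
cldb.register(1, ["N1", "N2"])
cldb.register(2, ["N3", "N2"])
cldb.register(3, ["N1", "N3"])
print(cldb.lookup(2), cldb.containers_on("N2"))
```

Tracking locations per container rather than per block is what keeps this table small, since each container covers many gigabytes of data.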
  • MapR Scaling
      • Containers represent 16-32GB of data
          – Each can hold up to 1 billion files and directories
          – 100M containers = ~2 exabytes (a very large cluster)
      • 250 bytes DRAM to cache a container
          – 25GB to cache all containers for 2EB cluster
          – But not necessary, can page to disk
          – Typical large 10PB cluster needs 2GB
      • Container-reports are 100x-1000x smaller than HDFS block-reports
          – Serve 100x more data-nodes
          – Increase container size to 64G to serve 4EB cluster
          – Map/reduce not affected
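The headline scaling numbers follow from two multiplications. This sketch assumes an average container size of 20 GB, a value chosen from within the quoted 16-32GB range so that the result matches the slide's "~2 exabytes". The slide does not give an average.

```python
CONTAINERS = 100_000_000            # 100M containers, from the slide
CONTAINER_BYTES = 20 * 10**9        # assumption: 20 GB, within the 16-32GB range
CACHE_BYTES_PER_CONTAINER = 250     # DRAM per cached container, from the slide

capacity_bytes = CONTAINERS * CONTAINER_BYTES
cache_bytes = CONTAINERS * CACHE_BYTES_PER_CONTAINER

print(f"capacity: {capacity_bytes / 10**18:.0f} EB, "
      f"location cache: {cache_bytes / 10**9:.0f} GB")
```

At the ends of the range, 16 GB and 32 GB containers would give 1.6 EB and 3.2 EB respectively, so "~2 exabytes" is an order-of-magnitude figure.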
  • Export to the world
      • [Diagram: an NFS client connecting to several NFS servers]
  • Local server
      • [Diagram: an application and an NFS server running together on the client machine, connected to the cluster nodes]
  • Universal export to self
      • [Diagram: a task and an NFS server running on the same cluster node, connected to the cluster nodes]
  • Nodes are identical
      • [Diagram: several cluster nodes, each running a task and an NFS server]