Hadoop - Simple. Scalable.
 

    Presentation Transcript

    • Hadoop Simple. Scalable.
    • @markgunnels mark@catamorphiclabs.com
    • Java. Clojure. Ruby. Cloudera Certified
    • posscon.org April 15, 16, and 17
    • Agenda: Overview. Massively Large Data Sets and the problems therein. Distributed File System. MapReduce. Pig.
    • Overview
    • Doug Cutting: Genius
    • Favorite Hadoop Story: New York Times
    • 4 Terabytes of Source Articles.
    • 24 Hours.
    • 5.5 Terabytes of PDFs.
    • Did it again.
    • $240.
    • Infoporn from Yahoo: 73 hours. 490 TB shuffling. 280 TB output. 4000 nodes. 16 PB disk space. 32K cores. 64 TB RAM.
    • Hadoop solves...
    • Analyzing Massively Large Datasets
    • Two Problems: You have to distribute.
    • Data Storage: Disk capacity has grown far faster than read speeds. Datasets won't fit on one disk. Tolerate node failure.
    • Data Analysis: Combine data from many machines. Tolerate node failure.
    • How Hadoop solves these problems.
    • Send Code to Data. Not Data to Code.
    • Data Storage: HDFS
    • Name Node. Data Nodes. Master - Slave Relationship
    • Shard massive files across multiple machines. MB, GB, and TB
    • Tolerant of Node Failure: each block is replicated on 3 nodes by default.
    • HDFS behaves like a normal file system. No true appends yet.
    • Demonstration.
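The HDFS slides above describe sharding a file across machines and replicating each piece so a node can fail. A minimal sketch of that idea in Python follows. It is a toy model, not HDFS code: the function names, the tiny block size, the node names, and the round-robin placement are all assumptions made for illustration (real HDFS uses much larger blocks and a rack-aware placement policy).

```python
def split_into_blocks(data, block_size):
    """Cut a byte string into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(num_blocks, nodes, replication=3):
    """Toy name node: map each block index to `replication` distinct data nodes.

    Placement is simple round-robin, which is an assumption of this sketch.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        block: [nodes[(block + r) % len(nodes)] for r in range(replication)]
        for block in range(num_blocks)
    }


def readable_after_failure(placement, failed_nodes):
    """True if every block still has at least one replica on a live node."""
    return all(
        any(node not in failed_nodes for node in replicas)
        for replicas in placement.values()
    )
```

With three replicas per block, losing any two nodes in this model still leaves every block readable; losing all three nodes that hold the same block does not.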
    • Data Analysis: MapReduce
    • JobTracker. TaskTrackers. Master - Slave Relationship.
    • map
    • Demonstration
    • pmap
    • Demonstration
    • reduce
    • Demonstration
    • (reduce (pmap))
    • Demonstration.
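The map, pmap, reduce, and (reduce (pmap)) slides above were live demonstrations, presumably in Clojure, and the transcript does not record them. As a rough stand-in, here is a Python sketch of the same progression. The `pmap` helper and the word-counting example are assumptions of this sketch; a thread pool is used only to keep it portable, and a real Hadoop job would spread the map step across machines rather than threads.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def pmap(fn, items, workers=4):
    """Order-preserving parallel map, loosely modelled on Clojure's pmap."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fn, items))


def count_words(line):
    """Map step: turn one line of text into its word count."""
    return len(line.split())


def total_words(lines):
    """(reduce (pmap)): count each line in parallel, then sum the counts."""
    return reduce(lambda a, b: a + b, pmap(count_words, lines), 0)
```

The map step has no shared state, so it can run anywhere and in any order; only the reduce step needs to see all the partial results. That independence is what lets the code be sent to the data.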
    • MapReduce: Java
    • Nobody likes it. :-)
    • MapReduce: Ruby. Python. Unix Utilities.
    • MapReduce: Clojure
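The "Ruby. Python. Unix Utilities." slide most likely refers to Hadoop Streaming, where the mapper and reducer are ordinary programs that read lines and write tab-separated key/value lines, with the framework sorting by key in between. The sketch below imitates that contract for a word count. The function names are hypothetical, and `run_local` replaces the cluster's shuffle with a plain in-memory sort, so it shows the shape of a streaming job rather than an actual Hadoop invocation.

```python
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' line per word, as a streaming mapper would."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(sorted_lines):
    """Sum the counts for each word; input must already be sorted by key."""
    pairs = (line.split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


def run_local(lines):
    """Stand-in for the cluster: map, sort (the shuffle), then reduce."""
    return list(reducer(sorted(mapper(lines))))
```

In a real streaming job the two functions would live in separate scripts wired to standard input and output; keeping them as generators here makes the pipeline easy to try on a few lines of text first.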
    • Hadoop Ecosystem: Pig. ZooKeeper. Hive. Cascading.
    • Pig
    • HBase