Hadoop introduction

  • Transcript of "Hadoop introduction"

    1. Hadoop Introduction: Background, Installation, Hello World, Related
    2. Outline (12/20/12)
       • Background
       • Hello World
       • Installation
       • Related
    3. Background
       • Why Hadoop?
          • Accessible: runs on cloud services such as AWS
          • Robust: gracefully handles most hardware failures
          • Scalable: scales linearly
          • Simple: 1 == 1w (10,000)
       • Key points:
          • Scale-out
          • Moving code to data
    4. Background: History
       • Apache top-level project, created by Doug Cutting
       • Lucene -> Nutch -> Hadoop (2004)
          • Yahoo (1w, i.e. 10,000)
          • Facebook (Hive, HBase, …)
          • Hulu (HBase)
          • Baidu (3000 TB, one week)
          • Twitter (tweet data)
    5. Background
       • Comparing SQL databases and Hadoop
          • Structure:
             • SQL: structured data with a specific schema
             • Hadoop: key-value pairs, suited to less-structured data such as text and pictures
          • Scale-out instead of scale-up
          • Key-value pairs instead of relational tables
          • Functional programming instead of declarative queries
          • Offline batch processing (write once, read many times) instead of online processing
    6. Background: Understanding
       • Word count
          • As the file size grows, an in-memory count runs out of memory
          • A disk-based hash table works, but is more complex
          • Distributed:
             • Phase 1: each machine processes its part of the input
             • Phase 2: merge the partial results
             • Shuffle the partitions to the appropriate machines (e.g. by alphabet)
       • At this point we have essentially built a minimal Hadoop.
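
The two-phase idea on slide 6 can be sketched in plain Python. This is only an illustration under stated assumptions, not how Hadoop itself is implemented: the function names are hypothetical, "machines" are simulated by list entries, and partitioning by the first character of a word is one illustrative reading of "by alphabet".

```python
from collections import Counter, defaultdict


def count_part(lines):
    """Phase 1: count the words in one chunk of the input."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def partition_for(word, num_partitions):
    """Pick the partition that owns a word, based on its first character."""
    return ord(word[0]) % num_partitions


def shuffle(partials, num_partitions):
    """Route every (word, partial count) pair to the partition owning the word."""
    partitions = [defaultdict(list) for _ in range(num_partitions)]
    for partial in partials:
        for word, count in partial.items():
            partitions[partition_for(word, num_partitions)][word].append(count)
    return partitions


def merge(partition):
    """Phase 2: sum the partial counts of each word within one partition."""
    return {word: sum(counts) for word, counts in partition.items()}


def word_count(chunks, num_partitions=2):
    """Run phase 1 on every chunk, shuffle, then run phase 2 on every partition."""
    partials = [count_part(chunk) for chunk in chunks]
    result = {}
    for partition in shuffle(partials, num_partitions):
        result.update(merge(partition))
    return result
```

Because every occurrence of a word lands in the same partition, each partition can be merged independently, which is what lets phase 2 run on separate machines.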
    7. Hello World: Word Count
       • Two phases:
          • Mapping: takes the input data and feeds it to the mappers
          • Reducing: processes all output from the mappers and produces the final result
       • 1.1 list(filename, file content)
       • 1.2 list(word, 1)
       • 2.1 list(word, list(1))
       • 2.2 list(word, count)
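
The four data shapes on slide 7 can be traced with a small in-memory sketch. It assumes step 2.1 means each word paired with the list of 1s emitted for it; the function names are hypothetical and nothing here uses a Hadoop API.

```python
from collections import defaultdict


def map_phase(files):
    """1.1 -> 1.2: turn (filename, content) pairs into (word, 1) pairs."""
    pairs = []
    for _filename, content in files:
        for word in content.split():
            pairs.append((word, 1))
    return pairs


def group_by_word(pairs):
    """1.2 -> 2.1: collect the 1s of each word into a list, sorted by word."""
    grouped = defaultdict(list)
    for word, one in pairs:
        grouped[word].append(one)
    return sorted(grouped.items())


def reduce_phase(grouped):
    """2.1 -> 2.2: sum each list of 1s to get (word, count)."""
    return [(word, sum(ones)) for word, ones in grouped]
```

For example, `reduce_phase(group_by_word(map_phase([("a.txt", "hello world"), ("b.txt", "hello hadoop")])))` walks one small input through all four shapes.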
    8. Hello World
       • mapper.py
       • Reducer.py
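
The transcript does not include the contents of the two scripts named on slide 8. As a rough stand-in, and not the author's code, a word-count mapper and reducer in the Hadoop Streaming style might look like the sketch below. It assumes tab-separated `word<TAB>count` lines and that the reducer receives its input sorted by word, which is what the streaming framework provides between the two stages. Both functions take any iterable of lines so they can be tried without a cluster.

```python
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' line per word, as a mapper script would print."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(sorted_lines):
    """Sum the counts of consecutive lines that share a word.

    The input must already be sorted by word, otherwise the same word
    would be reported more than once.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        total = sum(int(count) for _word, count in group)
        yield f"{word}\t{total}"
```

In real scripts each function would be driven by a loop over `sys.stdin` that prints every yielded line. The pair can then be checked locally with a shell pipeline of the form `cat input.txt | python mapper.py | sort | python Reducer.py` before submitting it as a streaming job.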
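    9. Installation
       • Modes:
          • Standalone mode (default)
          • Pseudo-distributed mode (recommended for development and debugging)
          • Fully distributed mode
       • Configuration:
          • Basic configuration
          • SSH configuration
          • Ubuntu configuration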
    10. Hadoop Framework
       • HDFS:
          • NameNode: tracks, directs, and records
          • DataNode: low-level I/O operations
          • Secondary NameNode
       • MapReduce:
          • JobTracker
          • TaskTracker
    11. Related
       • Programming:
          • Java
          • Python
             • Jython (translates Python)
             • Hadoop Streaming (stdin, stdout)
             • Dumbo
             • Happy
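    12. Related
       • Pig: high-level data-flow language
       • Hive: SQL data warehouse
       • HBase: column-oriented database modeled on Google BigTable
       • ZooKeeper: coordination system for shared state
       • Chukwa: data collection system
       • Mahout: data mining and machine learning
       • Hama: matrix computation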
    13. Resources
       • Books:
          • Hadoop in Action
          • Hadoop 实战 (2nd edition)
       • Video and Google course
       • URL:
          • Bookmarked resources
    14. Thanks