Map reduce & HDFS with Hadoop

Published in: Technology

  1. Big Data: Hadoop 2.0 Map Reduce / HDFS 2.0 @diego_pacheco Software Architect | Agile Coach
  2. Big Data
  3. Hadoop - Cases
  4. Hadoop
  5. Hadoop Distributed File System. 4000 nodes: 14 PB storage
  6. HDFS – Assumptions and Goals
     • Hardware Failure: hundreds or thousands of machines; expect them to fail.
     • Streaming Data Access: batch processing; high throughput, not low latency.
     • Large Data Sets: terabytes; runs on a cluster, scales, millions of files in a single instance.
     • Simple Coherency Model: write-once-read-many (create, write, close, no changes) maximizes coherency and throughput; a good fit for Map/Reduce.
     • Moving Computation instead of Moving Data: far cheaper when the data is huge, and it minimizes network traffic. HDFS moves the computation close to the data.
     • Software and Hardware Portability: easily portable.
  7. HDFS
     • Very large distributed FS
     • 10k nodes, 100M files, 10 PB
     • Works with commodity hardware
     • File replication
     • Detects and recovers from failures
     • Optimized for batch processing
     • Files are split into 128 MB blocks
     • Blocks are replicated across N DataNodes
     • Data coherency
     • Write once, read many
     • Existing files can only be appended to
  8. HDFS - Architecture
  9. HDFS 2.0 - Federation
  10. Hadoop
  11. Map Reduce
  12. Today: Parallelism per file. A single LARGE file is read by a single thread: no parallelism.
  13. Map/Reduce: Unit of data. Task 0: 0..64 MB; Task 1: 64..128 MB; Task 2: 128..192 MB; Task 3: 192..256 MB. Each task processes a unit of data.
  14. Today: Network issue
  15. Map/Reduce: Local Read. Task 0 (0..64 MB) on Node 0; Task 1 (64..128 MB) on Node 1; Task 2 (128..192 MB) on Node 2; Task 3 (192..256 MB) on Node 3.
     • Local read, no need for a network copy
     • Data is read from many disks in parallel
  16. Map/Reduce: The Magic! A single hard drive reads 75 MB/second. 12 hard drives per machine. 12 × 75 MB/second × 4k machines ≈ 3.4 TB/second
  17. Map
  18. Reduce
  19. Big Data: Hadoop 2.0 Map Reduce / HDFS 2.0 Obrigado! Thank You! @diego_pacheco Software Architect | Agile Coach
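
The split layout on slide 13 and the throughput figure on slide 16 can be checked with a short Python sketch. The function names are hypothetical, and reading "4k" as 4,000 machines is an assumption (it matches the 4000 nodes on slide 5). The 3.4 figure comes out when 1 TB is counted as 1024 × 1024 MB; in decimal units the same product is 3.6 TB/second.

```python
def split_ranges(file_size_mb, split_mb=64):
    """Return one (start, end) range in MB per task, as on slide 13."""
    return [
        (start, min(start + split_mb, file_size_mb))
        for start in range(0, file_size_mb, split_mb)
    ]


def aggregate_read_rate_mb(disk_mb_per_s=75, disks_per_machine=12, machines=4000):
    """Aggregate read rate in MB/second if every disk reads in parallel.

    machines=4000 is an assumed reading of the "4k" on slide 16.
    """
    return disk_mb_per_s * disks_per_machine * machines


if __name__ == "__main__":
    for task_id, (start, end) in enumerate(split_ranges(256)):
        print(f"Task {task_id}: {start}..{end} MB")

    rate_mb = aggregate_read_rate_mb()
    print(f"{rate_mb} MB/second")
    print(f"{rate_mb / (1024 * 1024):.1f} TB/second (binary units)")
```

This is only the arithmetic: it assumes perfectly local reads with no contention, which is the idealized case the slides describe.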
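
Slides 17 and 18 carry only the titles Map and Reduce. As a rough illustration of the two phases, here is a minimal single-process word count in Python. Word count is an assumed example rather than one taken from the deck, the function names are hypothetical, and nothing here uses Hadoop's own API: the sketch only mimics the map, shuffle, and reduce steps in memory.

```python
from collections import defaultdict


def map_phase(split):
    """Map: emit a (word, 1) pair for every word in one input split."""
    for word in split.split():
        yield word.lower(), 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(splits):
    """Run map over every split, then shuffle, then reduce each key."""
    pairs = [pair for split in splits for pair in map_phase(split)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    # Each string stands in for one split of a larger file.
    print(word_count(["big data", "Big Hadoop data"]))
```

On a real cluster each call to the map step would run as a separate task next to the block it reads, and the shuffle would move data between nodes; here everything runs in one process to keep the shape of the computation visible.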
