Hadoop

Published in: Software, Technology, Business


Transcript

  • 1. © 2013 KMS Technology
  • 2. HADOOP. Cuong Luu, KMS Technology, 21/04/2014
  • 3. AGENDA: (1) Big data; (2) Hadoop v1: Hadoop file system (HDFS), MapReduce programming model; (3) Hadoop v2; (4) Hadoop ecosystem; (5) Q&A
  • 4. Big Data BIG DATA ACCOMPLISHMENTS
  • 5. “Big Data” Definition: “Big data is high volume, high velocity, and/or high variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization” (Gartner, 2012)
  • 6. Problem & Solutions. Storage: HDFS, NoSQL databases, Google File System, Bigtable. Data processing: MapReduce; stream processing (Storm); Dremel and Drill.
  • 7. Big Data Examples: Twitter. Data in 2010: “1 trillion tweets. Today, we are seeing 50 million tweets per day”. Platform: Hadoop, Pig, Protocol Buffers. Types of applications: analysis, people search, … See full: http://goo.gl/y7rEw7
  • 8. Big Data Examples: Facebook data warehouse. Data: 200 GB per day in March 2008; 12+ TB (compressed) of raw data per day in 2010. Platform: Hadoop, Hive. Types of applications: reporting, analysis, machine learning, … See full: http://goo.gl/XUHD9k
  • 9. Hadoop V1
  • 10. Hadoop
  • 11. Hadoop File System (HDFS)
  • 12. HDFS High Level: HDFS is a distributed file system designed for storing very large files with streaming data access patterns, running on clusters of commodity hardware.
  • 13. HDFS High Level: HDFS is a file system, not a database.
  • 14. HDFS Block: In a file system, a block is the minimum amount of data that can be read or written. Disk blocks are normally 512 bytes and local file system blocks are typically a few kilobytes, whereas an HDFS block is 64 MB by default (in Hadoop v1).
  • 15. HDFS Architecture
  • 16. HDFS Archives: Hadoop Archives (HAR) are a file archiving facility that packs files into HDFS blocks more efficiently, thereby reducing namenode memory usage.
  • 17. HDFS Permission
  • 18. MapReduce programming model
  • 19. MapReduce: A programming model and an associated implementation for processing and generating large data sets. It hides the messy details of parallelization, fault tolerance, data distribution and load balancing.
  • 20. Programming Model
  • 21. Programming Model
  • 22. Programming Model
  • 23. Example 1: Word Count
  • 24. Example 1: Word Count
  • 25. Example 2: Google Flu Trends
  • 26. Example 2: Google Flu Trends: SELECT month(date), avg(california), avg(new_york) FROM google_flu_trends GROUP BY month(date);
  • 27. Example 2: Google Flu Trends
  • 28. Hadoop V2
  • 29. Hadoop V2
  • 30. Why YARN? MapReduce V1 Limitations. Scalability: maximum cluster size of 4,000 nodes; maximum of 40,000 concurrent tasks; the JobTracker was overloaded by map tasks. Availability: a JobTracker failure kills all queued and running jobs. Low resource utilization. Lacks support for alternative paradigms and services.
  • 31. YARN - Concept. Application: a job submitted to the framework (e.g. a MapReduce job). Container: the basic unit of allocation, giving fine-grained resource allocation across multiple resource types (memory, CPU, disk, network, GPU, …); e.g. container = 2 GB, 1 CPU.
  • 32. YARN - Architecture
  • 33. Hadoop V2
  • 34. Hadoop Ecosystem
  • 35. Hadoop Ecosystem
  • 36. HADOOP
  • 37. THANK YOU © 2013 KMS Technology
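
To make the block size on slide 14 concrete, the following sketch counts how many HDFS blocks a file would be split into. Only the 64 MB default comes from the slides; the function name `blocks_needed` and the 1 GB example file are assumptions chosen for illustration.

```python
# Hadoop v1 default block size quoted on slide 14 (64 MB).
HDFS_BLOCK_SIZE = 64 * 1024 * 1024


def blocks_needed(file_size_bytes, block_size=HDFS_BLOCK_SIZE):
    """Return how many blocks a file of the given size is split into.

    Hypothetical helper for illustration; it is not part of any Hadoop API.
    """
    if file_size_bytes <= 0:
        return 0
    # Integer ceiling division: a partially filled last block still counts.
    return (file_size_bytes + block_size - 1) // block_size


one_gb = 1024 ** 3
print(blocks_needed(one_gb))      # a 1 GB file fills whole blocks exactly
print(blocks_needed(one_gb + 1))  # one extra byte needs one more block
```

A large block size keeps the number of blocks per file small, which is consistent with the "very large files" design goal stated on slide 12.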
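
The word count example on slides 23 and 24 survives only as slide titles, so the sketch below is one possible reconstruction rather than the presenter's code. It simulates the map, shuffle and reduce phases of the programming model in a single Python process; the function names and the sample input are assumptions, and a real Hadoop job would distribute these phases across a cluster.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group emitted values by key, as the framework would do."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(word, counts):
    """Reduce: sum the counts collected for one word."""
    return word, sum(counts)


def word_count(lines):
    """Run the three phases in sequence over an iterable of text lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(pairs)
    return dict(reduce_phase(word, counts) for word, counts in grouped.items())


# Made-up sample input for illustration.
print(word_count(["hello hadoop", "hello mapreduce"]))
```

The programmer writes only `map_phase` and `reduce_phase`; grouping by key is the part that, per slide 19, the framework hides along with parallelization and fault tolerance.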
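
The query on slide 26 can also be read as a MapReduce job: the map step keys each row by month, and the reduce step averages each column within a month. The sketch below follows that reading in plain Python. The column names come from the query; the sample rows are invented, and the function names are hypothetical.

```python
from collections import defaultdict
from datetime import date


def map_row(row):
    """Map: key each row by the month of its date (GROUP BY month(date))."""
    return row["date"].month, (row["california"], row["new_york"])


def reduce_month(values):
    """Reduce: average each column over all rows that share one month."""
    n = len(values)
    avg_california = sum(v[0] for v in values) / n
    avg_new_york = sum(v[1] for v in values) / n
    return avg_california, avg_new_york


def monthly_averages(rows):
    """Group mapped rows by month, then reduce each group to its averages."""
    grouped = defaultdict(list)
    for row in rows:
        month, value = map_row(row)
        grouped[month].append(value)
    return {month: reduce_month(values) for month, values in sorted(grouped.items())}


# Invented sample rows; these are not real Google Flu Trends figures.
rows = [
    {"date": date(2013, 1, 6), "california": 100, "new_york": 200},
    {"date": date(2013, 1, 13), "california": 300, "new_york": 400},
    {"date": date(2013, 2, 3), "california": 50, "new_york": 70},
]
print(monthly_averages(rows))
```

Like the SQL query, this groups by month number only, so rows from the same month in different years fall into the same group.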