Hadoop
Published in: Software, Technology, Business
Transcript

  • 1. © 2013 KMS Technology
  • 2. HADOOP. Cuong Luu, KMS Technology, 21 April 2014
  • 3. AGENDA: 1. Big data; 2. Hadoop v1 (Hadoop file system (HDFS), MapReduce programming model); 3. Hadoop v2; 4. Hadoop ecosystem; 5. Q&A
  • 4. Big Data BIG DATA ACCOMPLISHMENTS
  • 5. “Big Data” Definition: “Big data is high volume, high velocity, and/or high variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization” (Gartner, 2012)
  • 6. Problem & Solutions. Storage: HDFS, NoSQL databases, Google File System, Bigtable. Data processing: MapReduce, stream processing (Storm), Dremel, Drill.
  • 7. Big Data Examples: Twitter. Data in 2010: “1 trillion tweets. Today, we are seeing 50 million tweets per day”. Platform: Hadoop, Pig, Protocol Buffers. Type of applications: analysis, people search … See full: http://goo.gl/y7rEw7
  • 8. Big Data Examples: Facebook data warehouse. Data: 200 GB per day in March 2008; 12+ TB (compressed) raw data per day in 2010. Platform: Hadoop, Hive. Type of applications: reporting, analysis, machine learning … See full: http://goo.gl/XUHD9k
  • 9. Hadoop V1
  • 10. Hadoop
  • 11. Hadoop File System (HDFS)
  • 12. HDFS High Level: HDFS is a distributed file system designed for storing very large files with streaming data access patterns, running on clusters of commodity hardware.
  • 13. HDFS High Level: HDFS is a file system, not a database.
  • 14. HDFS Block: In a file system, a block is the minimum amount of data that can be read or written. Local file system: normally 512 bytes (the size of a disk block). HDFS: 64 MB by default.
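To make the block size concrete, here is a minimal sketch of the arithmetic, assuming the 64 MB default from the slide. The function names are hypothetical and not part of any Hadoop API. It also reflects that the last block of a file only holds the remaining data, not a full 64 MB.

```python
import math

BLOCK_SIZE_MB = 64  # HDFS default block size quoted on the slide (Hadoop v1)


def hdfs_block_count(file_size_mb, block_size_mb=BLOCK_SIZE_MB):
    """Number of HDFS blocks needed to store a file of the given size."""
    if file_size_mb <= 0:
        return 0
    return math.ceil(file_size_mb / block_size_mb)


def last_block_size_mb(file_size_mb, block_size_mb=BLOCK_SIZE_MB):
    """Data held by the final block; it is only as large as what remains."""
    if file_size_mb <= 0:
        return 0
    remainder = file_size_mb % block_size_mb
    return remainder if remainder else block_size_mb


if __name__ == "__main__":
    # A 1 GB file and a 200 MB file, split with the default block size
    print(hdfs_block_count(1024), last_block_size_mb(1024))
    print(hdfs_block_count(200), last_block_size_mb(200))
```

Under these assumptions, a 200 MB file occupies three full blocks plus one block holding the remaining 8 MB.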
  • 15. HDFS Architecture
  • 16. HDFS Archives: Hadoop Archives (HAR) are a file archiving facility that packs files into HDFS blocks more efficiently, thereby reducing namenode memory usage.
  • 17. HDFS Permission
  • 18. MapReduce programming model
  • 19. MapReduce: A programming model and an associated implementation for processing and generating large data sets that hides the messy details of parallelization, fault tolerance, data distribution and load balancing.
  • 20. Programming Model
  • 21. Programming Model
  • 22. Programming Model
  • 23. Example1: Word Count
  • 24. Example1: Word Count
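The word count slides are images that did not survive in the transcript. As a rough illustration of the programming model, here is a sketch that runs the map, shuffle and reduce phases in memory in plain Python. It is not the Hadoop API and not the presenter's code, and all function names are hypothetical.

```python
from collections import defaultdict


def map_words(line):
    """Map phase: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle phase: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(word, counts):
    """Reduce phase: sum the values collected for one word."""
    return word, sum(counts)


def word_count(lines):
    """Run the three phases in sequence over an iterable of text lines."""
    pairs = (pair for line in lines for pair in map_words(line))
    grouped = shuffle(pairs)
    return dict(reduce_counts(word, counts) for word, counts in grouped.items())


if __name__ == "__main__":
    print(word_count(["hello hadoop", "hello world"]))
```

In a real cluster the map and reduce functions run on different nodes and the framework performs the shuffle. Here everything runs in one process to keep the data flow visible.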
  • 25. Example2: Google Flu Trends
  • 26. Example2: Google Flu Trends: select month(date), avg(california), avg(new_york) from google_flu_trends group by month(date);
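The query above can be read as a map step that keys each row by month and a reduce step that averages each column per key. The following sketch shows that correspondence in memory. It assumes each row is a `(date, california, new_york)` tuple, which is a guess at the table layout, and the function names are hypothetical.

```python
from collections import defaultdict
from datetime import date


def map_row(row):
    """Map: key each row by its month, keeping the two state columns."""
    day, california, new_york = row
    return day.month, (california, new_york)


def reduce_month(month, values):
    """Reduce: average each column over all rows that share the month."""
    n = len(values)
    avg_california = sum(v[0] for v in values) / n
    avg_new_york = sum(v[1] for v in values) / n
    return month, avg_california, avg_new_york


def monthly_averages(rows):
    """Group mapped rows by month, then reduce each group."""
    groups = defaultdict(list)
    for row in rows:
        month, value = map_row(row)
        groups[month].append(value)
    return [reduce_month(month, groups[month]) for month in sorted(groups)]


if __name__ == "__main__":
    sample = [  # made-up numbers, for illustration only
        (date(2013, 1, 6), 100, 200),
        (date(2013, 1, 13), 300, 400),
        (date(2014, 2, 2), 10, 20),
    ]
    print(monthly_averages(sample))
```

Like the query, this groups by month number only, so rows from the same month in different years fall into the same group.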
  • 27. Example2: Google Flu Trends
  • 28. Hadoop V2
  • 29. Hadoop V2
  • 30. Why YARN? MapReduce V1 Limitations. Scalability: maximum cluster size of 4,000 nodes; maximum of 40,000 concurrent tasks; the JobTracker was overloaded by map tasks. Availability: a JobTracker failure kills all queued and running jobs. Low resource utilization. Lacks support for alternative paradigms and services.
  • 31. YARN - Concept. Application: a job submitted to the framework (e.g. a MapReduce job). Container: the basic unit of allocation, with fine-grained resource allocation across multiple resource types (memory, CPU, disk, network, GPU, ...), e.g. container = 2 GB, 1 CPU.
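As a rough illustration of containers as a multi-resource unit of allocation, the sketch below counts how many containers of a given shape fit on one node when only memory and CPU are considered. The `Resources` class and `containers_that_fit` function are hypothetical and unrelated to the actual YARN scheduler API. The node size in the example is made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    """A bundle of resources; only memory and CPU are modelled here."""
    memory_gb: int
    cpus: int


def containers_that_fit(node, container):
    """How many identical containers a node can host at once.

    The scarcest resource decides: the answer is the minimum over all
    resource types of (node capacity // container requirement).
    """
    if container.memory_gb <= 0 or container.cpus <= 0:
        raise ValueError("container must request a positive amount of each resource")
    return min(node.memory_gb // container.memory_gb, node.cpus // container.cpus)


if __name__ == "__main__":
    container = Resources(memory_gb=2, cpus=1)  # the slide's example container
    node = Resources(memory_gb=64, cpus=16)     # hypothetical node capacity
    print(containers_that_fit(node, container))
```

With these numbers CPU is the limiting resource. A node with 16 GB and 16 CPUs would instead be limited by memory.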
  • 32. YARN - Architecture
  • 33. Hadoop V2
  • 34. Hadoop Ecosystem
  • 35. Hadoop Ecosystem
  • 36. HADOOP
  • 37. THANK YOU © 2013 KMS Technology
