
Hadoop


Published in: Software, Technology, Business

Transcript of "Hadoop"

  1. © 2013 KMS Technology
  2. HADOOP. Cuong Luu, KMS Technology, 21/04/2014
  3. AGENDA
     1. Big data
     2. Hadoop v1
        – Hadoop file system (HDFS)
        – MapReduce programming model
     3. Hadoop v2
     4. Hadoop ecosystem
     5. Q&A
  4. Big Data BIG DATA ACCOMPLISHMENTS
  5. “Big Data” Definition
     “Big data is high volume, high velocity, and/or high variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization” (Gartner, 2012)
  6. Problem & Solutions
     – Storage
       • HDFS, NoSQL databases, Google File System, Bigtable
     – Data Processing
       • MapReduce; stream processing (Storm); Dremel – Drill
  7. Big Data Examples: Twitter
     - Data in 2010: “1 trillion tweets. Today, we are seeing 50 million tweets per day”.
     - Platform: Hadoop, Pig, Protocol Buffers
     - Type of applications: analysis, people search, …
     See full: http://goo.gl/y7rEw7
  8. Big Data Examples: Facebook data warehouse
     - Data: 200 GB per day in March 2008; 12+ TB (compressed) raw data per day in 2010.
     - Platform: Hadoop, Hive
     - Type of applications: reporting, analysis, machine learning, …
     See full: http://goo.gl/XUHD9k
  9. Hadoop V1
  10. Hadoop
  11. Hadoop File System (HDFS)
  12. HDFS High Level
      HDFS is a distributed file system designed for storing very large files with streaming data access patterns, running on clusters of commodity hardware.
  13. HDFS High Level
      HDFS is a file system, not a database.
  14. HDFS Block
      In a file system, a block is the minimum amount of data that can be read or written.
      Local file system | HDFS
      normally 512 bytes | 64 MB by default
  15. HDFS Architecture
  16. HDFS Archives
      Hadoop Archives (HAR) are a file archiving facility that packs files into HDFS blocks more efficiently, thereby reducing namenode memory usage.
  17. HDFS Permission
  18. MapReduce programming model
  19. MapReduce
      Programming model and an associated implementation for processing and generating large data sets that hides the messy details of parallelization, fault-tolerance, data distribution and load balancing.
  20. Programming Model
  21. Programming Model
  22. Programming Model
  23. Example1: Word Count
  24. Example1: Word Count
  25. Example2: Google Flu Trends
  26. Example2: Google Flu Trends
      select month(date), avg(california), avg(new_york) from google_flu_trends group by month(date);
  27. Example2: Google Flu Trends
  28. Hadoop V2
  29. Hadoop V2
  30. Why YARN? – MapReduce V1 Limitations
      – Scalability
        • Maximum cluster size: 4,000 nodes
        • Maximum concurrent tasks: 4,000
        • The JobTracker was overloaded by map tasks.
      – Availability
        • A JobTracker failure kills all queued and running jobs.
      – Low resource utilization
      – Lacks support for alternative paradigms and services
  31. YARN - Concept
      – Application
        • An application is a job submitted to the framework
        • Ex: MapReduce job
      – Container
        • Basic unit of allocation
        • Fine-grained resource allocation across multiple resource types (memory, CPU, disk, network, GPU, …)
        • Ex: container = 2 GB, 1 CPU
  32. YARN - Architecture
  33. Hadoop V2
  34. Hadoop Ecosystem
  35. Hadoop Ecosystem
  36. HADOOP
  37. THANK YOU © 2013 KMS Technology
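
The block size on the "HDFS Block" slide (slide 14) determines how a large file is cut up for storage. The following is a minimal sketch of that arithmetic, assuming the 64 MB default quoted on the slide and that the last block holds only the remaining bytes. The function name and the 200 MB file size are made up for illustration; this is not an HDFS API.

```python
BLOCK_SIZE = 64 * 1024 * 1024  # 64 MB, the default quoted on the slide


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the sizes (in bytes) of the blocks a file would be cut into.

    Hypothetical helper for illustration only, not part of any Hadoop API.
    """
    if file_size < 0:
        raise ValueError("file_size must be non-negative")
    full_blocks, remainder = divmod(file_size, block_size)
    sizes = [block_size] * full_blocks
    if remainder:
        # Assumption: the last block holds only the bytes that are left.
        sizes.append(remainder)
    return sizes


if __name__ == "__main__":
    blocks = split_into_blocks(200 * 1024 * 1024)  # a hypothetical 200 MB file
    print(len(blocks), "blocks; last block holds", blocks[-1], "bytes")
```

With these assumptions a 200 MB file becomes three full 64 MB blocks plus one 8 MB block. The namenode tracks each block, which is why many small files cost more namenode memory than a few large ones.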
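
The "Example1: Word Count" slides (23 and 24) survive in this transcript only as titles. As a rough stand-in, here is a single-process sketch of the usual map, shuffle and reduce steps for word count. It is plain Python rather than the Hadoop API, and every function name in it is an assumption chosen for illustration.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word, counts):
    """Reduce step: sum the counts collected for one word."""
    return word, sum(counts)


def word_count(lines):
    """Run map, shuffle and reduce in sequence inside one process."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(pairs)
    return dict(reduce_phase(word, counts) for word, counts in grouped.items())


if __name__ == "__main__":
    print(word_count(["hello hadoop", "hello world"]))
```

In a real cluster the map and reduce functions run on different machines and the framework performs the shuffle; the sketch only shows how the data flows between the three steps.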
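
The query on the "Example2: Google Flu Trends" slide (slide 26) groups rows by month and averages two columns. One way to picture it as a MapReduce job is sketched below: the map step emits the month as the key, and the reduce step averages the values gathered for that key. The column names come from the query; the record layout, the ISO date format and the sample numbers are assumptions made up for this sketch.

```python
from collections import defaultdict
from datetime import date


def map_record(record):
    """Map step: emit (month, (california, new_york)) for one input row.

    Assumes each record is (iso_date_string, california, new_york).
    """
    day, california, new_york = record
    yield date.fromisoformat(day).month, (california, new_york)


def reduce_month(month, values):
    """Reduce step: average both columns over all rows of one month."""
    count = len(values)
    avg_california = sum(v[0] for v in values) / count
    avg_new_york = sum(v[1] for v in values) / count
    return month, avg_california, avg_new_york


def monthly_averages(records):
    """Group mapped rows by month (the shuffle), then reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for month, value in map_record(record):
            groups[month].append(value)
    return [reduce_month(month, groups[month]) for month in sorted(groups)]


if __name__ == "__main__":
    sample = [  # hypothetical rows, not real Flu Trends data
        ("2013-01-06", 100, 200),
        ("2013-01-13", 300, 400),
        ("2013-02-03", 50, 70),
    ]
    for row in monthly_averages(sample):
        print(row)
```

Like the SQL, the sketch keys on the month number alone, so the same month from different years falls into one group.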