
Introduction to the Hadoop Ecosystem (SEACON Edition)

Talk held at the SEACON 2013 on 17.05.2013 in Hamburg


Transcript of "Introduction to the Hadoop Ecosystem (SEACON Edition)"

  1. Introduction to the Hadoop ecosystem
  2. About me
  3. About us
  4. Why Hadoop?
  5. Why Hadoop?
  6. Why Hadoop?
  7. Why Hadoop?
  8. Why Hadoop?
  9. Why Hadoop?
  10. Why Hadoop?
  11. How to scale data? (diagram labels: w1, w2, w3, r1, r2, r3)
  12. But…
  13. But…
  14. What is Hadoop?
  15. What is Hadoop?
  16. What is Hadoop?
  17. What is Hadoop?
  18. The Hadoop App Store: HDFS, MapRed, HCat, Pig, Hive, HBase, Ambari, Avro, Cassandra, Chukwa, IntelSync, Flume, Hana, HyperT, Impala, Mahout, Nutch, Oozie, Sqoop, Scribe, Tez, Vertica, Whirr, ZooKee, Horton, Cloudera, MapR, EMC, IBM, Talend, Teradata, Pivotal, Informat, Microsoft, Pentaho, Jasper, Kognitio, Tableau, Splunk, Platfora, Rack, Karma, Actuate, MicStrat
  19. Data Storage
  20. Data Storage
  21. Hadoop Distributed File System
  22. Hadoop Distributed File System
  23. HDFS Architecture
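HDFS stores every file as a sequence of large, fixed-size blocks and replicates each block on several DataNodes. The arithmetic below is a minimal sketch of that idea. It assumes the Hadoop 1.x defaults of a 64 MB block size (`dfs.block.size`) and a replication factor of 3 (`dfs.replication`). The class `HdfsBlockMath` and its methods are hypothetical, and nothing here calls the real HDFS API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: models how a file is cut into HDFS-style blocks.
class HdfsBlockMath {
    static final long MB = 1024L * 1024L;
    static final long BLOCK_SIZE = 64 * MB;   // assumed Hadoop 1.x default block size
    static final int REPLICATION = 3;         // assumed default replication factor

    // Sizes of the blocks a file of fileSize bytes is split into.
    // The last block holds only the remainder; it is not padded to full size.
    static List<Long> blockSizes(long fileSize, long blockSize) {
        List<Long> blocks = new ArrayList<>();
        for (long offset = 0; offset < fileSize; offset += blockSize) {
            blocks.add(Math.min(blockSize, fileSize - offset));
        }
        return blocks;
    }

    // Raw disk space used across the cluster once every block is replicated.
    static long rawStorage(long fileSize, int replication) {
        return fileSize * replication;
    }

    public static void main(String[] args) {
        long fileSize = 200 * MB;
        List<Long> blocks = blockSizes(fileSize, BLOCK_SIZE);
        System.out.println("blocks: " + blocks.size());
        for (long b : blocks) {
            System.out.println("  " + b / MB + " MB");
        }
        System.out.println("raw storage: " + rawStorage(fileSize, REPLICATION) / MB + " MB");
    }
}
```

For the 200 MB file in `main`, this yields four blocks (three of 64 MB and one of 8 MB) and 600 MB of raw storage across the cluster.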
  24. Data Processing
  25. Data Processing
  26. MapReduce
  27. Typical large-data problem
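  28. MapReduce Flow (diagram)
      Input: (k1, v1), (k2, v2), (k3, v3), (k4, v4), (k5, v5), (k6, v6)
      Map output: (a, 1), (b, 2), (c, 3), (c, 6), (a, 3), (c, 2), (b, 7), (c, 8)
      Combine: (a, 1), (b, 2), (c, 9), (a, 3), (c, 2), (b, 7), (c, 8)
      Shuffle and sort: a → [1, 3], b → [2, 7], c → [2, 8, 9]
      Reduce output: (a, 4), (b, 9), (c, 19)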
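The flow above can be replayed on a single machine to check the numbers. This sketch is plain Java with no Hadoop dependency. It assumes the eight intermediate pairs come from four map tasks, grouped as (a,1),(b,2) | (c,3),(c,6) | (a,3),(c,2) | (b,7),(c,8), which is one grouping consistent with the diagram. The partition step uses the same formula as Hadoop's default `HashPartitioner`, `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`. Here it is applied to `String.hashCode()` rather than to Hadoop's `Text`. The class name, the choice of two reducers, and the helper methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory replay of map -> combine -> partition -> shuffle/sort -> reduce.
class MapReduceFlow {
    // Combiner: sums values per key within one map task's output (local aggregation).
    static Map<String, Integer> combine(List<Map.Entry<String, Integer>> mapOutput) {
        Map<String, Integer> combined = new TreeMap<>();
        for (Map.Entry<String, Integer> e : mapOutput) {
            combined.merge(e.getKey(), e.getValue(), Integer::sum);
        }
        return combined;
    }

    // Same formula as Hadoop's default HashPartitioner, applied to String.hashCode().
    static int partition(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    static Map<String, Integer> run(List<List<Map.Entry<String, Integer>>> mapOutputs, int numReduceTasks) {
        // Shuffle: route every combined pair to its reducer; TreeMap keeps keys sorted.
        List<TreeMap<String, List<Integer>>> reducerInputs = new ArrayList<>();
        for (int r = 0; r < numReduceTasks; r++) {
            reducerInputs.add(new TreeMap<>());
        }
        for (List<Map.Entry<String, Integer>> mapOutput : mapOutputs) {
            for (Map.Entry<String, Integer> e : combine(mapOutput).entrySet()) {
                reducerInputs.get(partition(e.getKey(), numReduceTasks))
                        .computeIfAbsent(e.getKey(), k -> new ArrayList<>())
                        .add(e.getValue());
            }
        }
        // Reduce: each reducer sums the value list of every key it received.
        Map<String, Integer> result = new TreeMap<>();
        for (TreeMap<String, List<Integer>> input : reducerInputs) {
            for (Map.Entry<String, List<Integer>> e : input.entrySet()) {
                int sum = 0;
                for (int v : e.getValue()) {
                    sum += v;
                }
                result.put(e.getKey(), sum);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<Map.Entry<String, Integer>>> mapOutputs = List.of(
                List.of(Map.entry("a", 1), Map.entry("b", 2)),
                List.of(Map.entry("c", 3), Map.entry("c", 6)),
                List.of(Map.entry("a", 3), Map.entry("c", 2)),
                List.of(Map.entry("b", 7), Map.entry("c", 8)));
        System.out.println(run(mapOutputs, 2)); // {a=4, b=9, c=19}
    }
}
```

Because the partitioner is a pure function of the key, all values for one key meet in the same reducer. This is why the final sums match the diagram regardless of how many reducers are used.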
  29. Combined Hadoop Architecture
  30. Word Count Mapper in Java
      import java.io.IOException;
      import java.util.StringTokenizer;
      import org.apache.hadoop.io.*;
      import org.apache.hadoop.mapred.*;

      // Emits (word, 1) for every whitespace-separated token of an input line
      public class WordCountMapper extends MapReduceBase
          implements Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(LongWritable key, Text value,
            OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
          String line = value.toString();
          StringTokenizer tokenizer = new StringTokenizer(line);
          while (tokenizer.hasMoreTokens()) {
            word.set(tokenizer.nextToken());
            output.collect(word, one);
          }
        }
      }
  31. Word Count Reducer in Java
      import java.io.IOException;
      import java.util.Iterator;
      import org.apache.hadoop.io.*;
      import org.apache.hadoop.mapred.*;

      // Sums all counts emitted for the same word
      public class WordCountReducer extends MapReduceBase
          implements Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterator<IntWritable> values,
            OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
          int sum = 0;
          while (values.hasNext()) {
            sum += values.next().get();
          }
          output.collect(key, new IntWritable(sum));
        }
      }
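To check the mapper and reducer logic without a cluster, the two steps can be run in memory. This sketch uses only the JDK. It tokenizes each line with `StringTokenizer` the same way `WordCountMapper` does, then groups and sums the pairs the way `WordCountReducer` does. The class name `LocalWordCount` is hypothetical. Unlike Hadoop, it keeps all intermediate pairs in a single `TreeMap` instead of shuffling them between machines.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical single-JVM stand-in for WordCountMapper + WordCountReducer.
class LocalWordCount {
    // Map phase: emit (word, 1) per whitespace-separated token, like WordCountMapper.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            out.add(Map.entry(tokenizer.nextToken(), 1));
        }
        return out;
    }

    // Shuffle + reduce: group pairs by word (sorted by key) and sum the counts.
    static Map<String, Integer> count(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            int sum = 0; // same summing loop as WordCountReducer.reduce
            for (int v : e.getValue()) {
                sum += v;
            }
            result.put(e.getKey(), sum);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(count(List.of("to be or not to be", "to do")));
        // {be=2, do=1, not=1, or=1, to=3}
    }
}
```

`StringTokenizer` splits only on whitespace, so "Hadoop," and "Hadoop" are counted as different words. The mapper above behaves the same way.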
  32. Scripting for Hadoop
  33. Scripting for Hadoop
  34. Apache Pig
  35. Pig in the Hadoop ecosystem (diagram layers: Hadoop Distributed File System, Distributed Programming Framework, Metadata Management, Scripting)
  36. Pig Latin
      users = LOAD 'users.txt' USING PigStorage(',') AS (name, age);
      pages = LOAD 'pages.txt' USING PigStorage(',') AS (user, url);
      filteredUsers = FILTER users BY age >= 18 AND age <= 50;
      joinResult = JOIN filteredUsers BY name, pages BY user;
      grouped = GROUP joinResult BY url;
      summed = FOREACH grouped GENERATE group, COUNT(joinResult) AS clicks;
      sorted = ORDER summed BY clicks DESC;
      top10 = LIMIT sorted 10;
      STORE top10 INTO 'top10sites';
  37. Pig Execution Plan
  38. Try that with Java…
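As a point of reference for slide 38, here is the logic of the Pig Latin script from slide 36 written with Java streams over in-memory data. This is a minimal sketch under several assumptions. It assumes `users.txt` lines look like `name,age` and `pages.txt` lines like `user,url`, as the `PigStorage(',')` schemas imply, and that every line is well-formed. It runs in one JVM and does not reproduce the chain of MapReduce jobs that Pig compiles the script into. The class and method names are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical single-JVM equivalent of the Pig Latin "top 10 sites" script.
class TopSites {
    static List<Map.Entry<String, Long>> top10(List<String> userLines, List<String> pageLines) {
        // LOAD users; FILTER users BY age >= 18 AND age <= 50.
        // Keep a count per name, because a JOIN emits one row per matching user row.
        Map<String, Long> adults = userLines.stream()
                .map(line -> line.split(","))
                .filter(f -> {
                    int age = Integer.parseInt(f[1].trim());
                    return age >= 18 && age <= 50;
                })
                .collect(Collectors.groupingBy(f -> f[0].trim(), Collectors.counting()));

        // JOIN filteredUsers BY name, pages BY user; GROUP BY url; COUNT -> clicks
        Map<String, Long> clicks = pageLines.stream()
                .map(line -> line.split(","))
                .filter(f -> adults.containsKey(f[0].trim()))
                .collect(Collectors.groupingBy(f -> f[1].trim(),
                        Collectors.summingLong(f -> adults.get(f[0].trim()))));

        // ORDER BY clicks DESC; LIMIT 10
        return clicks.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                .limit(10)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> users = List.of("alice,25", "bob,17", "carol,40");
        List<String> pages = List.of("alice,a.com", "bob,a.com", "carol,a.com", "alice,b.com");
        System.out.println(top10(users, pages)); // [a.com=2, b.com=1]
    }
}
```

Even this single-machine version needs explicit bookkeeping for the join multiplicity. A distributed MapReduce version would additionally have to split the join, the grouping and the global sort across separate jobs.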
  39. SQL for Hadoop
  40. SQL for Hadoop
  41. Apache Hive
  42. Hive in the Hadoop ecosystem (diagram layers: Hadoop Distributed File System, Distributed Programming Framework, Metadata Management, Scripting, Query)
  43. Hive Architecture
  44. Hive Example
      CREATE TABLE users (name STRING, age INT)
        ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
      CREATE TABLE pages (user STRING, url STRING)
        ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
      LOAD DATA INPATH '/user/sandbox/users.txt' INTO TABLE users;
      LOAD DATA INPATH '/user/sandbox/pages.txt' INTO TABLE pages;
      SELECT pages.url, count(*) AS clicks
      FROM users JOIN pages ON (users.name = pages.user)
      WHERE users.age >= 18 AND users.age <= 50
      GROUP BY pages.url
      ORDER BY clicks DESC
      LIMIT 10;
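Hive offers two sorting clauses. `SORT BY` orders rows only within each reducer, while `ORDER BY` produces one total order by sending all rows through a single reducer. This sketch simulates both on a list of click counts. The class name is hypothetical, and so is the round-robin distribution of rows to reducers, which only serves to make the example deterministic. Hive's actual row distribution for `SORT BY` differs.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical simulation of Hive's SORT BY (per-reducer order) vs ORDER BY (total order).
class SortByVsOrderBy {
    // SORT BY: each reducer sorts only its own rows (descending); outputs are concatenated.
    static List<Integer> sortBy(List<Integer> clicks, int reducers) {
        List<List<Integer>> parts = new ArrayList<>();
        for (int r = 0; r < reducers; r++) {
            parts.add(new ArrayList<>());
        }
        for (int i = 0; i < clicks.size(); i++) {
            parts.get(i % reducers).add(clicks.get(i)); // assumed round-robin distribution
        }
        List<Integer> out = new ArrayList<>();
        for (List<Integer> part : parts) {
            part.sort(Comparator.reverseOrder());
            out.addAll(part);
        }
        return out;
    }

    // ORDER BY: all rows pass through a single reducer, giving one total order.
    static List<Integer> orderBy(List<Integer> clicks) {
        List<Integer> out = new ArrayList<>(clicks);
        out.sort(Comparator.reverseOrder());
        return out;
    }

    public static void main(String[] args) {
        List<Integer> clicks = List.of(5, 9, 1, 7, 3, 8);
        System.out.println("SORT BY : " + sortBy(clicks, 2)); // [5, 3, 1, 9, 8, 7]
        System.out.println("ORDER BY: " + orderBy(clicks));   // [9, 8, 7, 5, 3, 1]
    }
}
```

The `SORT BY` output starts with 5, 3, 1 even though 9 is the largest value. Each reducer's slice is sorted, but their concatenation is not. A query that needs the overall highest click counts should therefore rely on the total order of `ORDER BY`.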
  45. Bringing it all together…
  46. Online AdServing
  47. AdServing Architecture
  48. Getting started…
  49. Hortonworks Sandbox
  50. Hadoop Training