Introduction to the Hadoop Ecosystem (codemotion Edition)

Talk given at codemotion Berlin 2013 on 10.05.2013 in Berlin, Germany

  1. Introduction to the Hadoop ecosystem
  2. About me
  3. About us
  4. Why Hadoop?
  5. Why Hadoop?
  6. Why Hadoop?
  7. Why Hadoop?
  8. Why Hadoop?
  9. Why Hadoop?
  10. Why Hadoop?
  11. How to scale data? (figure labels: w1, w2, w3, r1, r2, r3)
  12. But…
  13. But…
  14. What is Hadoop?
  15. What is Hadoop?
  16. What is Hadoop?
  17. What is Hadoop?
  18. The Hadoop App Store: HDFS, MapRed, HCat, Pig, Hive, HBase, Ambari, Avro, Cassandra, Chukwa, Intel, Sync, Flume, Hana, HyperT, Impala, Mahout, Nutch, Oozie, Sqoop, Scribe, Tez, Vertica, Whirr, ZooKee, Cloudera, Horton, MapR, EMC, IBM, Talend, TeraData, Pivotal, Informat, Microsoft, Pentaho, Jasper, Kognitio, Tableau, Splunk, Platfora, Rack, Karma, Actuate, MicStrat
  19. Data Storage
  20. Data Storage
  21. Hadoop Distributed File System
  22. Hadoop Distributed File System
  23. HDFS Architecture
  24. Data Processing
  25. Data Processing
  26. MapReduce
  27. Typical large-data problem
  28. MapReduce Flow (figure: input pairs (k1, v1) to (k6, v6); map output: a 1, b 2, c 3, c 6, a 3, c 2, b 7, c 8; after combining: a 1, b 2, c 9, a 3, c 2, b 7, c 8; after shuffle and sort: a [1, 3], b [2, 7], c [2, 8, 9]; reduce output: a 4, b 9, c 19)
  29. Jobs & Tasks
  30. Combined Hadoop Architecture
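  31. Word Count Mapper in Java

        // Uses the original org.apache.hadoop.mapred API
        import java.io.IOException;
        import java.util.StringTokenizer;

        import org.apache.hadoop.io.IntWritable;
        import org.apache.hadoop.io.LongWritable;
        import org.apache.hadoop.io.Text;
        import org.apache.hadoop.mapred.MapReduceBase;
        import org.apache.hadoop.mapred.Mapper;
        import org.apache.hadoop.mapred.OutputCollector;
        import org.apache.hadoop.mapred.Reporter;

        public class WordCountMapper extends MapReduceBase
                implements Mapper<LongWritable, Text, Text, IntWritable> {

            private final static IntWritable one = new IntWritable(1);
            private Text word = new Text();

            // Called once per input line (with the default TextInputFormat,
            // key is the line's byte offset and value is the line's text)
            public void map(LongWritable key, Text value,
                            OutputCollector<Text, IntWritable> output, Reporter reporter)
                    throws IOException {
                String line = value.toString();
                StringTokenizer tokenizer = new StringTokenizer(line);
                while (tokenizer.hasMoreTokens()) {
                    word.set(tokenizer.nextToken());
                    output.collect(word, one); // emit (word, 1) for every token
                }
            }
        }
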
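  32. Word Count Reducer in Java

        import java.io.IOException;
        import java.util.Iterator;

        import org.apache.hadoop.io.IntWritable;
        import org.apache.hadoop.io.Text;
        import org.apache.hadoop.mapred.MapReduceBase;
        import org.apache.hadoop.mapred.OutputCollector;
        import org.apache.hadoop.mapred.Reducer;
        import org.apache.hadoop.mapred.Reporter;

        public class WordCountReducer extends MapReduceBase
                implements Reducer<Text, IntWritable, Text, IntWritable> {

            // Called once per word with all the counts emitted for that word
            public void reduce(Text key, Iterator<IntWritable> values,
                               OutputCollector<Text, IntWritable> output, Reporter reporter)
                    throws IOException {
                int sum = 0;
                while (values.hasNext()) {
                    sum += values.next().get();
                }
                output.collect(key, new IntWritable(sum)); // emit (word, total)
            }
        }
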
  33. Scripting for Hadoop
  34. Scripting for Hadoop
  35. Apache Pig
  36. Pig in the Hadoop ecosystem (diagram layers: Hadoop Distributed File System, Distributed Programming Framework, Metadata Management, Scripting)
  37. Pig Latin

        -- Load two comma-separated files
        users = LOAD 'users.txt' USING PigStorage(',') AS (name, age);
        pages = LOAD 'pages.txt' USING PigStorage(',') AS (user, url);
        filteredUsers = FILTER users BY age >= 18 AND age <= 50;
        joinResult = JOIN filteredUsers BY name, pages BY user;
        -- Count visits per URL and keep the ten most visited
        grouped = GROUP joinResult BY url;
        summed = FOREACH grouped GENERATE group, COUNT(joinResult) AS clicks;
        sorted = ORDER summed BY clicks DESC;
        top10 = LIMIT sorted 10;
        STORE top10 INTO 'top10sites';

  38. Pig Execution Plan
  39. Try that with Java…
  40. SQL for Hadoop
  41. SQL for Hadoop
  42. Apache Hive
  43. Hive in the Hadoop ecosystem (diagram layers: Hadoop Distributed File System, Distributed Programming Framework, Metadata Management, Scripting, Query)
  44. Hive Architecture
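  45. Hive Example

        -- Comma-separated text files, as in the Pig example; `user` is
        -- back-quoted because newer Hive releases reserve it as a keyword
        CREATE TABLE users (name STRING, age INT)
          ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
        CREATE TABLE pages (`user` STRING, url STRING)
          ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

        -- Moves the files from these HDFS paths into the tables
        LOAD DATA INPATH '/user/sandbox/users.txt' INTO TABLE users;
        LOAD DATA INPATH '/user/sandbox/pages.txt' INTO TABLE pages;

        -- ORDER BY gives a total order; SORT BY would only sort within each reducer
        SELECT pages.url, count(*) AS clicks
        FROM users JOIN pages ON (users.name = pages.`user`)
        WHERE users.age >= 18 AND users.age <= 50
        GROUP BY pages.url
        ORDER BY clicks DESC
        LIMIT 10;
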
  46. Bringing it all together…
  47. Online Advertising
  48. Getting started…
  49. Hortonworks Sandbox
  50. Hadoop Training
  51. The end… or the beginning?
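The MapReduce Flow slide (28) traces a small example. It goes from map output through a combine step and the shuffle to the final sums a 4, b 9, c 19. The same flow can be replayed in plain Java.

This is a single-JVM simulation, not Hadoop code, and it rests on a few assumptions:

- The eight map-output pairs on the slide come from four mappers with two pairs each.
- `Pair` is a hypothetical record standing in for Hadoop's writable types.
- Records require Java 16 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class MapReduceFlowDemo {

    // Hypothetical key/value pair for the simulation (not a Hadoop type)
    record Pair(String key, int value) {
        @Override
        public String toString() { return key + " " + value; }
    }

    // Map output of four mappers, two pairs each (assumed split of the slide's values)
    static final List<List<Pair>> SLIDE_MAP_OUTPUTS = List.of(
            List.of(new Pair("a", 1), new Pair("b", 2)),
            List.of(new Pair("c", 3), new Pair("c", 6)),
            List.of(new Pair("a", 3), new Pair("c", 2)),
            List.of(new Pair("b", 7), new Pair("c", 8)));

    // Sums values per key within one mapper's output (the combiner's job)
    static Map<String, Integer> sumByKey(List<Pair> pairs) {
        Map<String, Integer> sums = new TreeMap<>();
        for (Pair p : pairs) {
            sums.merge(p.key(), p.value(), Integer::sum);
        }
        return sums;
    }

    // Combine step: each mapper pre-aggregates its own output locally
    static List<Pair> combine(List<List<Pair>> mapOutputs) {
        List<Pair> combined = new ArrayList<>();
        for (List<Pair> out : mapOutputs) {
            sumByKey(out).forEach((k, v) -> combined.add(new Pair(k, v)));
        }
        return combined;
    }

    // Shuffle and sort step: group all values by key, keys kept in sorted order
    static Map<String, List<Integer>> shuffle(List<Pair> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Pair p : pairs) {
            grouped.computeIfAbsent(p.key(), k -> new ArrayList<>()).add(p.value());
        }
        return grouped;
    }

    // Reduce step: one call per key, summing that key's values
    static Map<String, Integer> reduce(Map<String, List<Integer>> grouped) {
        Map<String, Integer> result = new TreeMap<>();
        grouped.forEach((k, vs) ->
                result.put(k, vs.stream().mapToInt(Integer::intValue).sum()));
        return result;
    }

    public static void main(String[] args) {
        List<Pair> combined = combine(SLIDE_MAP_OUTPUTS);
        Map<String, List<Integer>> grouped = shuffle(combined);
        System.out.println("combined: " + combined);
        System.out.println("grouped:  " + grouped);
        System.out.println("result:   " + reduce(grouped));
    }
}
```

Running `main` prints the result `{a=4, b=9, c=19}`, which matches the last row of the slide. Summing inside each mapper first is only safe because addition is associative and commutative. That is also why a word count reducer can double as a combiner.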
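The logic of the Word Count Mapper and Reducer (slides 31 and 32) can be tried without a cluster by running the same steps in one JVM. This is a hypothetical local sketch, not part of the Hadoop API:

- `map` tokenizes a line with `StringTokenizer` in the same way as `WordCountMapper`.
- A `TreeMap` stands in for the shuffle and sort.
- `reduce` sums the ones in the same way as `WordCountReducer`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

public class LocalWordCount {

    // Map phase: same tokenization as WordCountMapper, one (word, 1) per token
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            out.add(Map.entry(tokenizer.nextToken(), 1));
        }
        return out;
    }

    // Reduce phase: same logic as WordCountReducer, sum the counts for one word
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    static Map<String, Integer> run(List<String> lines) {
        // Shuffle: group the mapper output by word (TreeMap keeps words sorted)
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> e : map(line)) {
                grouped.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
            }
        }
        Map<String, Integer> counts = new TreeMap<>();
        grouped.forEach((word, ones) -> counts.put(word, reduce(ones)));
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("to be or not to be", "to do")));
    }
}
```

For the two sample lines in `main`, `run` returns `{be=2, do=1, not=1, or=1, to=3}`.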
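Slide 39 ("Try that with Java…") invites a comparison. The Pig script (slide 37) and the Hive example (slide 45) both express the same query:

- Keep users aged 18 to 50.
- Join them with page visits on user name.
- Count visits per URL.
- Return the ten most visited URLs.

A minimal in-memory Java version of that query follows. It assumes both inputs fit in memory and uses hypothetical `User`, `Page` and `SiteClicks` records. Ties on the click count are broken by URL for a deterministic result, which the original scripts do not specify. On a cluster, Pig and Hive instead compile this dataflow into a chain of MapReduce jobs.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class TopSites {

    // Hypothetical in-memory versions of the users.txt and pages.txt rows
    record User(String name, int age) {}
    record Page(String user, String url) {}
    record SiteClicks(String url, long clicks) {}

    static List<SiteClicks> top10(List<User> users, List<Page> pages) {
        // FILTER users BY age >= 18 AND age <= 50; count matching rows per name
        // so the join below keeps the multiplicity of an inner join
        Map<String, Long> matchingUsers = users.stream()
                .filter(u -> u.age() >= 18 && u.age() <= 50)
                .collect(Collectors.groupingBy(User::name, Collectors.counting()));

        // JOIN on name = user, then GROUP BY url and COUNT the joined rows
        Map<String, Long> clicksPerUrl = new HashMap<>();
        for (Page p : pages) {
            Long n = matchingUsers.get(p.user());
            if (n != null) {
                clicksPerUrl.merge(p.url(), n, Long::sum);
            }
        }

        // ORDER BY clicks DESC (URL as an added tie-breaker), then LIMIT 10
        return clicksPerUrl.entrySet().stream()
                .map(e -> new SiteClicks(e.getKey(), e.getValue()))
                .sorted(Comparator.comparingLong(SiteClicks::clicks).reversed()
                        .thenComparing(SiteClicks::url))
                .limit(10)
                .toList();
    }

    public static void main(String[] args) {
        // Hypothetical sample data
        List<User> users = List.of(
                new User("alice", 25), new User("bob", 15), new User("carol", 40));
        List<Page> pages = List.of(
                new Page("alice", "a.com"), new Page("bob", "a.com"),
                new Page("carol", "a.com"), new Page("carol", "b.com"));
        top10(users, pages).forEach(s -> System.out.println(s.url() + "\t" + s.clicks()));
    }
}
```

With the sample data, bob is filtered out by age, so `a.com` gets 2 clicks and `b.com` gets 1.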