Introduction to Analytics and Big Data - Hadoop
The University of British Columbia
Computer Science Alumni/Industry Lecture Series
Geoff Fawkes
November 2013
© 2013 Geoff Fawkes. All Rights Reserved. 1 / 450

Who am I?
- Director of Engineering, Teradata
- HSBC, Pivotal/Aptean, Newbridge/Alcatel, etc.: various engineering roles
- Technology executive, mentor, software engineer
- B.Sc. Comp Sci (UBC), Executive MBA (SFU)
- Interruptive (disruptive?) personality
  - Please ask questions of me or of each other as we go along. I don’t have all the answers; you do!
- Credits: Rob Peglar, SNIA Education
  - Storage Networking Industry Association, 2012
- Who’s paying attention? A 450-slide page count?
  - Not that “big”: about 50

Big Data and Hadoop
- History
- Data Challenges
- Why Hadoop?

Customer Challenges: The Data Deluge

Big Data Is Different from Business Intelligence

Questions From Business Will Vary

Web 2.0 is “Data Driven”

The World of Data-Driven Applications

Attributes of Big Data

Top Ten Common Big Data Problems

Industries Are Embracing Big Data

Why Hadoop?

Why Hadoop?

Storage and Memory Bandwidth Lagging CPU

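The claim that storage bandwidth lags CPU is often made concrete by the time needed simply to read a large dataset. The sketch below assumes a 1 TB dataset and a sustained read rate of 100 MB/s per disk (both illustrative figures, not taken from the slides) and compares one disk with 100 disks read in parallel, which is the access pattern Hadoop is built around.

```python
# Back-of-the-envelope scan time: one disk versus many disks read in parallel.
# The dataset size and disk speed are illustrative assumptions, not measurements.

def scan_seconds(dataset_bytes: float, disk_mb_per_s: float, disks: int = 1) -> float:
    """Seconds to read a dataset spread evenly over `disks` identical disks."""
    per_disk_bytes = dataset_bytes / disks
    return per_disk_bytes / (disk_mb_per_s * 1_000_000)

ONE_TB = 1_000_000_000_000   # decimal terabyte
ASSUMED_DISK_MB_PER_S = 100  # assumed sustained sequential read rate per disk

single = scan_seconds(ONE_TB, ASSUMED_DISK_MB_PER_S)
parallel = scan_seconds(ONE_TB, ASSUMED_DISK_MB_PER_S, disks=100)
print(f"1 disk:    {single:,.0f} s (~{single / 3600:.1f} h)")
print(f"100 disks: {parallel:,.0f} s")
```

Under these assumptions one disk needs 10,000 s (about 2.8 hours), while 100 disks, each reading a hundredth of the data, finish in about 100 s, ignoring coordination overhead.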
Commodity Hardware Economics

What is Hadoop?
- Hadoop Adoption
- HDFS
- MapReduce
- Examples
- Ecosystem Projects

Hadoop Adoption in the Industry

What is Hadoop?

What is Hadoop?

HDFS 101 – The Data Set System

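HDFS stores each file as a sequence of large, fixed-size blocks spread across DataNodes. As a minimal sketch, the function below computes the (offset, length) of every block for a file of a given size. The 128 MiB block size is an assumption (the Hadoop 2.x default; Hadoop 1.x defaulted to 64 MiB), and `split_into_blocks` is a hypothetical helper, not a Hadoop API.

```python
from typing import List, Tuple

# Assumed block size: 128 MiB (Hadoop 2.x default; Hadoop 1.x used 64 MiB).
BLOCK_SIZE = 128 * 1024 * 1024
MiB = 1024 * 1024

def split_into_blocks(file_size: int, block_size: int = BLOCK_SIZE) -> List[Tuple[int, int]]:
    """Return (offset, length) for each HDFS-style block of a file of `file_size` bytes."""
    blocks: List[Tuple[int, int]] = []
    offset = 0
    while offset < file_size:
        # Every block is full-sized except possibly the last one.
        length = min(block_size, file_size - offset)
        blocks.append((offset, length))
        offset += length
    return blocks

for offset, length in split_into_blocks(300 * MiB):
    print(f"offset {offset // MiB:>4} MiB, length {length // MiB:>4} MiB")
```

For a 300 MiB file this yields two full 128 MiB blocks and a final 44 MiB block; HDFS does not pad the last block out to the full block size on disk.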
HDFS Organization and Replication

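Each block is replicated, three times by default (the `dfs.replication` setting). The HDFS documentation describes a rack-aware default placement for three replicas: one on the writer's node, one on a node in a different (remote) rack, and one on another node in that same remote rack. The sketch below is a simplified version of that policy. The `{rack: [nodes]}` topology is hypothetical, and a writer that is not a DataNode simply gets a random first node, which is a simplification of the real behaviour.

```python
import random
from typing import Dict, List, Optional

def place_replicas(topology: Dict[str, List[str]],
                   writer_node: Optional[str],
                   rng: random.Random) -> List[str]:
    """Choose three DataNodes in the spirit of HDFS's default rack-aware policy.

    1st replica: the writer's node if it is a DataNode, else a random node (simplified).
    2nd replica: a node on a different (remote) rack.
    3rd replica: a different node on that same remote rack.
    """
    rack_of = {node: rack for rack, nodes in topology.items() for node in nodes}
    first = writer_node if writer_node in rack_of else rng.choice(sorted(rack_of))
    # The remote rack needs two nodes so the 2nd and 3rd replicas can share it.
    remote_racks = [r for r in sorted(topology)
                    if r != rack_of[first] and len(topology[r]) >= 2]
    if not remote_racks:
        raise ValueError("this sketch needs a remote rack with at least two nodes")
    remote = rng.choice(remote_racks)
    second, third = rng.sample(topology[remote], 2)
    return [first, second, third]

# Hypothetical two-rack cluster.
cluster = {"rack1": ["n1", "n2"], "rack2": ["n3", "n4", "n5"]}
print(place_replicas(cluster, "n1", random.Random(42)))
```

The layout survives the loss of a whole rack (at least one copy is elsewhere), while keeping two of the three replicas on one rack limits cross-rack write traffic.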
Hadoop Server Roles - Multiple

Hadoop Cluster

HDFS File Write Operation - Instance

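When a client writes a block, it does not send a copy to every replica itself: it streams the data to the first DataNode in a pipeline, each DataNode stores the data and forwards it to the next, and acknowledgements travel back up the chain. The toy model below shows that relay in memory. `DataNode`, `receive` and `write_block` are hypothetical names, and the sketch sends a whole block at once, whereas HDFS streams each block as a series of small packets.

```python
from typing import Dict, List

class DataNode:
    """Toy in-memory DataNode (hypothetical; not the HDFS implementation)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.blocks: Dict[str, bytes] = {}

    def receive(self, block_id: str, data: bytes, downstream: List["DataNode"]) -> List[str]:
        # Store the block locally, then forward it to the next node in the pipeline.
        self.blocks[block_id] = data
        acks = downstream[0].receive(block_id, data, downstream[1:]) if downstream else []
        # Acknowledgements flow back upstream: this node's ack plus every later one.
        return [self.name] + acks

def write_block(block_id: str, data: bytes, pipeline: List[DataNode]) -> List[str]:
    """Client side: send the block only to the first DataNode, which relays it onward."""
    return pipeline[0].receive(block_id, data, pipeline[1:])

nodes = [DataNode("dn1"), DataNode("dn2"), DataNode("dn3")]
print(write_block("blk_0001", b"example bytes", nodes))
```

The client's outbound bandwidth is spent once per block regardless of the replication factor; the DataNodes carry the extra copies between themselves.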
HDFS File Read Operation - Instance

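For a read, the client asks the NameNode where each block's replicas live and then reads each block directly from the closest DataNode. The sketch below picks the nearest replica using the distance convention of Hadoop's network topology for a simple (rack, node) layout: 0 for the same node, 2 for the same rack, 4 for a different rack. The location map and block IDs are hypothetical stand-ins for what the NameNode would return.

```python
from typing import Dict, List, Tuple

Location = Tuple[str, str]  # (rack, node)

def distance(a: Location, b: Location) -> int:
    """Hadoop-style network distance for a two-level (rack, node) layout."""
    if a == b:
        return 0  # same node
    if a[0] == b[0]:
        return 2  # same rack, different node
    return 4      # different rack

def choose_replicas(block_locations: Dict[str, List[Location]],
                    reader: Location) -> Dict[str, Location]:
    """For each block, pick the replica closest to the reading client."""
    return {block: min(replicas, key=lambda r: distance(r, reader))
            for block, replicas in block_locations.items()}

# Hypothetical NameNode answer for a two-block file.
locations = {
    "blk_1": [("rack2", "n3"), ("rack1", "n1")],
    "blk_2": [("rack2", "n3"), ("rack2", "n4")],
}
print(choose_replicas(locations, reader=("rack1", "n2")))
```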
HDFS File Operation R/W Replication

MapReduce 101 – Functional Programming Meets Distributed Processing

What is MapReduce?

Key MapReduce Terminology

MapReduce Basic Concepts

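The basic contract can be shown without a cluster: a map function turns each input record into (key, value) pairs, the framework groups and sorts those pairs by key (the shuffle), and a reduce function turns each key's list of values into output. The single-process sketch below runs those steps in order; `run_mapreduce` is a hypothetical helper and the `key,number` records are made-up sample input.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, Iterator, List, Tuple

Pair = Tuple[Any, Any]

def run_mapreduce(records: Iterable[Any],
                  mapper: Callable[[Any], Iterable[Pair]],
                  reducer: Callable[[Any, List[Any]], Iterable[Pair]]) -> List[Pair]:
    """Single-process illustration of map, shuffle/sort, and reduce."""
    groups: Dict[Any, List[Any]] = defaultdict(list)
    for record in records:
        # Map: each record yields zero or more (key, value) pairs.
        for key, value in mapper(record):
            # Shuffle: collect every value emitted for the same key.
            groups[key].append(value)
    output: List[Pair] = []
    # Sort: keys reach the reducer in sorted order.
    for key in sorted(groups):
        # Reduce: one call per key with all of that key's values.
        output.extend(reducer(key, groups[key]))
    return output

def max_mapper(record: str) -> Iterator[Pair]:
    # Hypothetical input format: "key,number".
    key, value = record.split(",")
    yield key, int(value)

def max_reducer(key: str, values: List[int]) -> Iterator[Pair]:
    yield key, max(values)

print(run_mapreduce(["a,3", "b,7", "a,5"], max_mapper, max_reducer))  # [('a', 5), ('b', 7)]
```

On a real cluster the map calls run in parallel across input splits and the reduce calls run in parallel across keys; only the contract between the two functions stays the same.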
Example 1: MapReduce Operation

Example 2: Sample Dataset

MapReduce Paradigm – UNIX Command

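The UNIX analogy is usually written as `cat input | mapper | sort | reducer`, which is also how Hadoop Streaming works: any program that reads lines on stdin and writes tab-separated key/value lines on stdout can act as a mapper or reducer, and the framework sorts mapper output by key before the reducer sees it. The sketch below imitates that pipeline in-process to compute a per-key average; the `key,value` input format and the function names are assumptions for illustration.

```python
from itertools import groupby
from typing import Iterable, Iterator, List

def mapper(lines: Iterable[str]) -> Iterator[str]:
    # Streaming-style mapper: parse "key,value" text and emit "key<TAB>value" lines.
    for line in lines:
        key, value = line.strip().split(",")
        yield f"{key}\t{value}"

def reducer(sorted_lines: Iterable[str]) -> Iterator[str]:
    # Streaming-style reducer: input is sorted, so lines with the same key are adjacent.
    pairs = (line.split("\t", 1) for line in sorted_lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        values = [float(v) for _, v in group]
        yield f"{key}\t{sum(values) / len(values)}"

def run_pipeline(lines: Iterable[str]) -> List[str]:
    # In spirit: cat input | mapper | sort | reducer
    return list(reducer(sorted(mapper(lines))))

print(run_pipeline(["b,4", "a,1", "b,6", "a,3"]))
```

Because the reducer relies only on equal keys arriving together, it holds at most one key's values at a time, which is what lets reducers consume sorted input much larger than memory.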
Example 3: Count Words

Ex. 3: Lifecycle of a MapReduce Job
- Map function
- Reduce function
- Run this program as a MapReduce job

Ex. 3: Lifecycle of a MapReduce Job
- Map function
- Reduce function
- Run this program as a MapReduce job

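The word-count slides pair a map function with a reduce function and then run them as a job. The sketch below follows that shape in plain Python and adds the step that decides which reduce task receives each word: Hadoop's default hash partitioner takes the key's hash modulo the number of reduce tasks, and `zlib.crc32` stands in for Java's `hashCode()` here so the assignment is stable between runs. `map_fn`, `reduce_fn`, `partition` and `word_count` are hypothetical names, and splitting words with a simple regular expression is an assumption.

```python
import re
import zlib
from typing import Dict, Iterator, List, Tuple

def map_fn(line: str) -> Iterator[Tuple[str, int]]:
    # Map function: emit (word, 1) for every word in one line of input.
    for word in re.findall(r"[a-z']+", line.lower()):
        yield word, 1

def reduce_fn(word: str, counts: List[int]) -> Tuple[str, int]:
    # Reduce function: add up all the 1s emitted for one word.
    return word, sum(counts)

def partition(word: str, num_reducers: int) -> int:
    # Hash partitioning; crc32 stands in for Java's hashCode() and is stable across runs.
    return zlib.crc32(word.encode("utf-8")) % num_reducers

def word_count(lines: List[str], num_reducers: int = 2) -> Dict[str, int]:
    # One bucket per reduce task; each bucket maps word -> list of counts.
    buckets: List[Dict[str, List[int]]] = [{} for _ in range(num_reducers)]
    for line in lines:
        for word, one in map_fn(line):
            buckets[partition(word, num_reducers)].setdefault(word, []).append(one)
    result: Dict[str, int] = {}
    # On a cluster these reduce tasks would run in parallel on different nodes.
    for bucket in buckets:
        for word in sorted(bucket):
            key, total = reduce_fn(word, bucket[word])
            result[key] = total
    return result

print(word_count(["the cat sat", "The cat ran"]))
```

Every occurrence of a word hashes to the same partition, so each reduce task sees complete counts for its words and no cross-reducer merging is needed.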
Ex. 3: Lifecycle of a MapReduce Job
[Figure: timeline of input splits, map waves 1 and 2, and reduce waves 1 and 2]
How are the number of splits, the number of map and reduce tasks, the memory allocated to tasks, etc., determined?

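The question on this slide has a partly arithmetic answer. Hadoop's `FileInputFormat` sizes splits as `max(minSize, min(maxSize, blockSize))`, so by default one split equals one HDFS block and each split gets one map task, while the number of reduce tasks is simply a job setting. The sketch below turns that into task and wave counts for a hypothetical cluster with a fixed number of map and reduce slots. It ignores details such as the real code's allowance for a slightly oversized final split, and `plan_job` is an illustrative helper, not a Hadoop API.

```python
from typing import Dict

def ceil_div(a: int, b: int) -> int:
    # Integer ceiling division (avoids float rounding on large byte counts).
    return -(-a // b)

def compute_split_size(block_size: int, min_size: int = 1,
                       max_size: int = 2**63 - 1) -> int:
    # Same rule as FileInputFormat: max(minSize, min(maxSize, blockSize)).
    return max(min_size, min(max_size, block_size))

def plan_job(file_size: int, block_size: int, map_slots: int,
             num_reducers: int, reduce_slots: int) -> Dict[str, int]:
    """Rough task and wave counts for a single input file (hypothetical helper)."""
    split_size = compute_split_size(block_size)
    map_tasks = ceil_div(file_size, split_size)  # one map task per input split
    return {
        "split_size": split_size,
        "map_tasks": map_tasks,
        # Tasks beyond the available slots wait for a later wave.
        "map_waves": ceil_div(map_tasks, map_slots),
        # The reducer count is a job setting, not derived from the input size.
        "reduce_tasks": num_reducers,
        "reduce_waves": ceil_div(num_reducers, reduce_slots),
    }

GiB, MiB = 1024**3, 1024**2
print(plan_job(10 * GiB, 128 * MiB, map_slots=20, num_reducers=8, reduce_slots=5))
```

For a 10 GiB input with 128 MiB blocks, 20 map slots, 8 reducers and 5 reduce slots, this gives 80 map tasks in 4 waves and 8 reduce tasks in 2 waves.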
MapReduce Job Configuration Parameters
- 190+ parameters in Hadoop
- Set manually, or defaults are used

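Hadoop resolves each parameter by layering configuration: values shipped in files such as `hdfs-default.xml` and `mapred-default.xml` are overridden by site files (`*-site.xml`) and by settings made on the job itself. The sketch below models only the "set manually, or the default is used" lookup with Python's `ChainMap`. The four property names and defaults are believed to match the Hadoop 2.x shipped defaults (check them against your version's `*-default.xml`), and the override is a made-up example.

```python
from collections import ChainMap
from typing import Dict

# Shipped defaults for a few Hadoop 2.x properties (verify against your version).
DEFAULTS: Dict[str, str] = {
    "dfs.replication": "3",
    "dfs.blocksize": "134217728",
    "mapreduce.job.reduces": "1",
    "mapreduce.task.io.sort.mb": "100",
}

def effective_config(job_settings: Dict[str, str]) -> ChainMap:
    # Lookups try the job's explicit settings first, then fall back to the defaults.
    return ChainMap(job_settings, DEFAULTS)

conf = effective_config({"mapreduce.job.reduces": "8"})  # hypothetical manual setting
print(conf["mapreduce.job.reduces"])  # 8 (set manually)
print(conf["dfs.replication"])        # 3 (default used)
```

Values are kept as strings because Hadoop's XML configuration stores them that way; typed getters convert them when they are read.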
Putting It All Together: MapReduce + HDFS

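The payoff of running MapReduce on top of HDFS is data locality: because the NameNode knows which DataNodes hold each block, the scheduler can try to run each map task on a node that already stores its input split, moving computation to the data rather than data to the computation. The greedy sketch below assigns splits to free nodes, preferring a node with a local replica and otherwise falling back to any free node. It is a simplification (Hadoop's schedulers also consider rack-local placement), and the split and node names are hypothetical.

```python
from typing import Dict, List, Set

def assign_map_tasks(split_hosts: Dict[str, Set[str]],
                     free_nodes: List[str]) -> Dict[str, str]:
    """Greedy sketch: give each split to a free node holding a replica of it if possible."""
    free = list(free_nodes)
    assignment: Dict[str, str] = {}
    for split, hosts in split_hosts.items():
        if not free:
            break  # remaining splits wait for the next wave
        local = [n for n in free if n in hosts]
        node = local[0] if local else free[0]  # fall back to a non-local node
        assignment[split] = node
        free.remove(node)
    return assignment

# Hypothetical splits and the DataNodes holding their blocks.
splits = {"s1": {"n2", "n3"}, "s2": {"n1"}, "s3": {"n9"}}
print(assign_map_tasks(splits, ["n1", "n2", "n3"]))
```

Here `s1` and `s2` run on nodes that already hold their data, while `s3`, whose only replica is on a busy node, is read over the network.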
Hadoop Ecosystem Projects
- Interactive SQL Query & Modeling
- Data flow for tedious MapReduce Jobs
- Columnar NoSQL Store

Compare: Hadoop, SQL, Massively Parallel Processing (MPP)

Compare: RDBMS and MapReduce

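One way to see the contrast between the two models is to compute the same aggregate both ways: declaratively, as a SQL `GROUP BY` that the database plans and executes, and procedurally, as explicit map and reduce steps that the programmer writes. The sketch below uses Python's built-in `sqlite3` module as a stand-in for an RDBMS; the `sales` table and its rows are hypothetical.

```python
import sqlite3
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical toy data: (region, sale amount).
SALES: List[Tuple[str, int]] = [("east", 10), ("west", 5), ("east", 7)]

def totals_sql(rows: List[Tuple[str, int]]) -> Dict[str, int]:
    # Declarative: the database decides how to scan, group and aggregate.
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
    con.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    result = dict(con.execute("SELECT region, SUM(amount) FROM sales GROUP BY region"))
    con.close()
    return result

def totals_mapreduce(rows: List[Tuple[str, int]]) -> Dict[str, int]:
    # Procedural: the programmer writes the map/shuffle and reduce steps explicitly.
    groups: Dict[str, List[int]] = defaultdict(list)
    for region, amount in rows:  # map: emit (region, amount); shuffle: group by region
        groups[region].append(amount)
    return {region: sum(amounts) for region, amounts in groups.items()}  # reduce

print(totals_sql(SALES), totals_mapreduce(SALES))
```

The SQL version needs a schema before loading but leaves execution to the database; the MapReduce version works on whatever records it is handed but makes the programmer spell out every step.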
Hadoop Use Cases
- Set Top Cable TV Boxes
- Pay Per View Advertising
- Bank Risk Modelling
- Product Sentiment Analysis

Example 1: Set Top Cable TV Boxes

Example 2: Pay Per View Advertising

Example 3: Bank Risk Modelling

Example 4: Product Sentiment Analysis

More Reading?
- World Economic Forum: “Personal Data: The Emergence of a New Asset Class”, 2011
- McKinsey Global Institute: “Big Data: The Next Frontier for Innovation, Competition, and Productivity”
- “Big Data: Harnessing a Game-Changing Asset”
- IDC: “2011 Digital Universe Study: Extracting Value from Chaos”
- The Economist: “Data, Data Everywhere”
- “Data Science Revealed: A Data-Driven Glimpse into the Burgeoning New Field”
- O’Reilly: “What Is Data Science?”
- O’Reilly: “Building Data Science Teams”
- O’Reilly: “Data for the Public Good”
- Obama Administration: “Big Data Research and Development Initiative”

Introduction to Analytics and Big Data – Hadoop
Q&A
Geoff Fawkes
http://www.linkedin.com/pub/geoff-fawkes/1/269/202
@gfawkes
November 2013
