MK99 – Big Data
Big data & cross-platform analytics
MOOC lectures, Pr. Clement Levallois
What is Hadoop?

Slides of the course on big data by C. Levallois from EMLYON Business School.
For business students. Check the online video connected with these slides.

-> Basic definition of Hadoop in relation to cloud computing and big data.

Published in: Education
What is Hadoop?

  1. MK99 – Big Data 1: Big data & cross-platform analytics. MOOC lectures, Pr. Clement Levallois
  2. MK99 – Big Data 2: Focus on “Hadoop”
     • Frequently mentioned in relation to big data.
     • The available definitions are often vague and the talk around it inflated.
     • This short video will clarify it.
  3. MK99 – Big Data 3: Note on the terminology
     – “Computers” are called “servers” when they are used only for computing, processing or storing data.
     – They usually have no screen, mouse or keyboard because none is needed.
     – But they are basically computers!
  4. MK99 – Big Data 4: “Hadoop”
     • Created around 2005 by Doug Cutting and Mike Cafarella, and developed heavily at Yahoo! in the following years. Named after the toy elephant of Cutting's son.
     • Made open source and now developed under the Apache Software Foundation, a major open source organization, so you will sometimes see it called “Apache Hadoop”.
     • In simple words:
       – Hadoop is free, open source software.
       – It serves to connect several servers, so that a single task can be accomplished in parallel on them.
       – So, with Hadoop and 5 servers, a data-crunching task can finish up to about 5 times sooner than if you had used just one server (in the ideal case, where the work splits evenly and coordination costs little).
       – That’s it!
  5. MK99 – Big Data 5: Why are Hadoop, cloud computing and big data often discussed together?
     – Imagine that you are Walmart and want to compute something on your CRM: say, which customers are most profitable for each store, based on their purchase history.
     – You will need many servers to store the data, and many servers to do the computations.
     – Instead of purchasing a farm of servers for this (expensive! time consuming!), you can pay for a cloud computing service (such as Amazon EC2, part of AWS) to rent servers just for this task,
     – and install Hadoop on these servers to divide the task among them and run it in parallel, cutting computation time.
     – You will get the results in minutes or hours, instead of days.
  6. MK99 – Big Data 6: And map/reduce?
     – “Map/reduce” (also written “MapReduce”) is another expression often heard in relation to cloud computing and Hadoop.
     – It is a programming principle described by Google engineers in a paper published in 2004. Open source implementations, such as the one in Hadoop, followed.
     – It solves this problem: when my data is spread across 500 different servers, how do I search for something on all of them? Checking the servers one by one (sequential search) would take a very long time. MapReduce dispatches the search to all servers at once (the “map” step) and then combines their partial answers into one result (the “reduce” step). In the ideal case this is up to roughly 500 times quicker than a sequential search.
     – Any software can use this programming principle. MapReduce is at the heart of Hadoop, which is one of the most popular pieces of software using it.
  7. MK99 – Big Data 7: What is the business relevance of Hadoop?
     • Hadoop made it possible to process large amounts of data quickly, using free software.
     • It enables business models where intensive data crunching is necessary to create value.
     • Examples:
       – Amazon computing book recommendations for you,
       – Walmart offering personalized coupons,
       – The New York Times (NYT) showing personalized display ads,
       – Waze (driving app) showing the state of traffic on your road in real time,
       – your electric utility computing how much electricity should be generated at peak hours.
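
For readers who want to see the map/reduce idea from slides 4 to 6 in action, here is a minimal Python sketch. It is not Hadoop code and does not use any Hadoop API: three in-memory lists stand in for three servers, threads stand in for separate machines, and all names (`SERVERS`, `map_step`, `reduce_step`, `total_spend`) and figures are made up for illustration. Each "server" totals the spending of the customers in its own slice of the data (map), then the partial totals are merged into one answer (reduce).

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

# Hypothetical data: each "server" holds a slice of the purchase records,
# stored as (customer, amount) pairs. Names and numbers are invented.
SERVERS = {
    "server-1": [("alice", 30), ("bob", 12)],
    "server-2": [("alice", 5), ("carol", 40)],
    "server-3": [("bob", 20), ("carol", 7)],
}


def map_step(records):
    """Runs on one server: total the spending per customer in its own slice."""
    totals = Counter()
    for customer, amount in records:
        totals[customer] += amount
    return totals


def reduce_step(left, right):
    """Merge two partial results into a single set of totals."""
    return left + right


def total_spend(servers):
    # Threads stand in for separate machines (an assumption of this sketch):
    # the map step is dispatched to every slice at once, not one by one.
    with ThreadPoolExecutor(max_workers=len(servers)) as pool:
        partials = list(pool.map(map_step, servers.values()))
    # The reduce step combines the partial answers into the final one.
    return reduce(reduce_step, partials, Counter())


totals = total_spend(SERVERS)
best_customer, best_amount = totals.most_common(1)[0]
print(best_customer, best_amount)  # prints: carol 47
```

On a real cluster, the slices would live on different machines and a framework such as Hadoop would handle shipping the map step to each machine and collecting the results; the division of labor is the same.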