
What is Hadoop?

Slides from C. Levallois's course on big data at EMLYON Business School, intended for business students. See the online video that accompanies these slides.

A basic definition of Hadoop and of how it relates to cloud computing and big data.

  1. MK99 – Big Data: Big data & cross-platform analytics. MOOC lectures, Prof. Clement Levallois.
  2. Focus on “Hadoop”
     • Frequently mentioned in relation to big data.
     • The available definitions are vague, and talk about it is often inflated.
     • This short video clarifies what it is.
  3. Note on the terminology:
     – “Computers” are called “servers” when they are used only for computing, processing or storing data.
     – They have no screen, mouse or keyboard, because none is needed.
     – But they are basically computers!
  4. “Hadoop”
     • Created by Yahoo! engineers around 2005. Named after the toy elephant of one of the engineers' children.
     • Made open source and now developed by the main open-source developer community, the Apache Software Foundation. This is why it is sometimes called “Apache Hadoop”.
     • In simple words:
       – Hadoop is free, open-source software.
       – It connects several servers so that a single task can be carried out in parallel on all of them.
       – So, with Hadoop and 5 servers, a data-crunching task can finish up to 5 times sooner than on a single server (that is the ideal case; coordinating the servers adds some overhead).
       – That's it!
  5. Why are Hadoop, cloud computing and big data often discussed together?
     – Imagine that you are Walmart and want to compute something from your CRM data: say, which clients are the most profitable for each store, based on their purchase history.
     – You will need many servers to store the data, and many servers to do the computations.
     – Instead of purchasing a farm of servers for this (expensive and time-consuming!), you can pay a cloud computing service (such as Amazon EC2, part of Amazon Web Services) to rent servers just for this task,
     – and install Hadoop on these servers to divide the task among them and run it in parallel, which speeds up computation.
     – You will get the results in minutes or hours instead of days.
  6. And map/reduce?
     – “Map/reduce” (usually written MapReduce) is another expression often discussed in relation to cloud computing and Hadoop.
     – It is a programming principle developed by Google engineers and described publicly around 2004; Hadoop provides an open-source implementation of it.
     – It solves this problem: when my data is spread over 500 different servers, how do I search it on all of them? Checking the servers one by one (sequential search) would take a very long time. MapReduce dispatches the search to all servers at once: each server runs the “map” step on its own share of the data, and a “reduce” step then combines the partial results. In the ideal case, this is up to 500 times quicker than a sequential search.
     – Any software can use this programming principle. MapReduce is at the heart of Hadoop, which is one of the most popular pieces of software using it.
  7. What is the business relevance of Hadoop?
     • Hadoop made it possible to process large amounts of data quickly, using free software.
     • It enables business models where intensive data crunching is needed to create value.
     • Examples:
       – Amazon computing book recommendations for you,
       – Walmart offering personalized coupons,
       – the NYT showing personalized display ads,
       – Waze (a driving app) showing the state of traffic on your road in real time,
       – your electricity utility computing how much electricity should be generated at peak hours.
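
Slide 4's core idea, connecting several servers so that one task runs in parallel on all of them, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Hadoop's API: threads on one machine stand in for the servers, the toy task (a sum of squares) is made up, and the helper names `split_into_chunks`, `crunch` and `run_in_parallel` are hypothetical. Because of Python's global interpreter lock, these threads show how the work is divided and recombined, not the actual 5× speed-up.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(items, n_chunks):
    """Split a list into n_chunks roughly equal slices, one per (hypothetical) server."""
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks get one additional item so nothing is left over.
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def crunch(chunk):
    """The per-server task; here a toy computation (sum of squares)."""
    return sum(x * x for x in chunk)


def run_in_parallel(items, n_servers):
    """Give each worker one chunk, run them at the same time, then combine the results."""
    chunks = split_into_chunks(items, n_servers)
    # Threads stand in for servers in this sketch.
    with ThreadPoolExecutor(max_workers=n_servers) as pool:
        partial_results = list(pool.map(crunch, chunks))
    return sum(partial_results)


if __name__ == "__main__":
    data = list(range(1_000))
    print(run_in_parallel(data, n_servers=5) == crunch(data))
```

The parallel result matches the single-worker result; only the way the work is divided changes. On a real cluster, each chunk would typically be stored on and processed by a separate machine.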
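
The Walmart scenario from slide 5 can be sketched the same way: split the purchase history across rented servers, let each one compute partial profit totals for its share, then merge the totals and pick the top client per store. The record format `(store, client, profit)`, the sample data and the function names are hypothetical, and threads again stand in for cloud servers.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical purchase-history records: (store, client, profit on the purchase).
PURCHASES = [
    ("store_A", "alice", 12.0),
    ("store_A", "bob", 30.0),
    ("store_A", "alice", 25.0),
    ("store_B", "carol", 8.0),
    ("store_B", "dave", 15.0),
    ("store_B", "carol", 9.5),
]


def partial_totals(shard):
    """Work done by one rented server: total profit per (store, client) in its shard."""
    totals = Counter()
    for store, client, profit in shard:
        totals[(store, client)] += profit
    return totals


def most_profitable_clients(purchases, n_servers):
    """Split the history across servers, merge partial totals, pick the top client per store."""
    shards = [purchases[i::n_servers] for i in range(n_servers)]  # round-robin split
    merged = Counter()
    with ThreadPoolExecutor(max_workers=n_servers) as pool:
        for totals in pool.map(partial_totals, shards):
            merged.update(totals)  # Counter.update adds values instead of replacing them
    best = {}
    for (store, client), total in merged.items():
        if store not in best or total > best[store][1]:
            best[store] = (client, total)
    return best


if __name__ == "__main__":
    print(most_profitable_clients(PURCHASES, n_servers=3))
```

The answer does not depend on how many servers are used; adding servers only changes how the work is shared out.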
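
The map and reduce steps from slide 6 can be illustrated with a toy, single-machine sketch of the map → shuffle → reduce pattern. It is not Hadoop's actual (Java) MapReduce API: `map_reduce`, the mappers, the reducers and the sample shards are hypothetical, and each thread plays the role of one server. The first demo is the search described on the slide; the second counts purchases per client to show the reduce step actually combining values.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_reduce(shards, mapper, reducer):
    """Toy MapReduce on one machine; each thread plays the role of one server."""

    def map_shard(shard):
        # Map phase: a "server" applies the mapper to its own records only.
        return [pair for record in shard for pair in mapper(record)]

    # All shards are dispatched at once instead of being checked one by one.
    with ThreadPoolExecutor(max_workers=max(1, len(shards))) as pool:
        mapped = list(pool.map(map_shard, shards))

    # Shuffle phase: group every emitted value under its key.
    groups = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)

    # Reduce phase: combine all values that share a key into one result.
    return {key: reducer(key, values) for key, values in groups.items()}


# Hypothetical data: each inner list is the purchase log stored on one server.
SHARDS = [
    ["C001 bought milk", "C042 bought bread"],
    ["C007 bought eggs"],
    ["C042 bought coffee", "C013 bought tea"],
]
QUERY = "C042"  # hypothetical client ID to search for


def search_mapper(record):
    # Emit the record under the query key only when it matches.
    if QUERY in record:
        yield QUERY, record


def collect_reducer(key, values):
    # Gather the matches found on all servers into one sorted list.
    return sorted(values)


def count_mapper(record):
    # Emit (client ID, 1) for every purchase record.
    yield record.split()[0], 1


def sum_reducer(key, values):
    # Add up the 1s emitted for each client.
    return sum(values)


if __name__ == "__main__":
    print(map_reduce(SHARDS, search_mapper, collect_reducer))
    print(map_reduce(SHARDS, count_mapper, sum_reducer))
```

The search returns the two C042 records, found on the first and third servers, without any server scanning another server's data. Swapping the mapper and reducer is all it takes to turn the same machinery into a counting job, which is why the principle is reusable by any software.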
