2. Do you know?
The total volume of electronic data stored is approximately 2 zettabytes (1
Do you know how many photos Facebook hosts?
10 billion photos, nearly 1PB
Do you know how much data the Internet Archive stores every day?
Around 2PB of data, and it is growing at a rate of 20PB of data
15 million smart meters (US) generating data at the rate of 3GB per second
Events collected through user interaction from sites are generated at the rate of
1.5GB per second.
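To put the per-second rates above in perspective, they can be converted to daily volumes. This is only a back-of-the-envelope sketch: it assumes the rates are sustained around the clock and uses decimal units (1 TB = 1000 GB). The function name is illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def daily_volume_tb(rate_gb_per_second):
    """Convert a sustained GB/s rate into TB per day (assumes 1 TB = 1000 GB)."""
    return rate_gb_per_second * SECONDS_PER_DAY / 1000


# Rates quoted on the slide
smart_meters_tb = daily_volume_tb(3)    # smart meters: 3 GB per second
site_events_tb = daily_volume_tb(1.5)   # site events: 1.5 GB per second

print(f"Smart meters: {smart_meters_tb:.1f} TB/day")
print(f"Site events:  {site_events_tb:.1f} TB/day")
```

Under these assumptions the smart meters alone would produce roughly 259 TB per day, and the site events roughly 130 TB per day.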
3. What is BIG DATA?
Near real time Analytics
5. Who is using Hadoop?
New York Times
6. What makes Hadoop special?
No high end or expensive systems are required
– Built on commodity hardware
Can run on Linux, Mac OS X, Windows, Solaris
Fault tolerant system
– Execution of the job continues even if nodes fail
Highly reliable and efficient storage system
Built-in intelligence to speed up the application
– Speculative execution
Fits a lot of applications:
– Web log processing
– Page indexing, page ranking
– Complex event processing
7. Overview of HDFS architecture
8. Overview of MapReduce
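The MapReduce model can be pictured as three phases: map, shuffle (group by key), and reduce. The following is a minimal single-process sketch of that idea in Python, not the Hadoop API. The log format ("<client> <page>") and all function names are assumptions made for illustration.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Apply the mapper to every input record and collect (key, value) pairs."""
    return [pair for record in records for pair in mapper(record)]


def shuffle_phase(pairs):
    """Group values by key, mimicking what happens between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped, reducer):
    """Apply the reducer once per key to that key's list of values."""
    return {key: reducer(key, values) for key, values in grouped.items()}


def hit_mapper(line):
    """Hypothetical log line '<client> <page>': emit (page, 1) per hit."""
    _client, page = line.split()
    yield page, 1


def hit_reducer(_page, counts):
    """Sum the hit counts for one page."""
    return sum(counts)


def count_hits(log_lines):
    pairs = map_phase(log_lines, hit_mapper)
    return reduce_phase(shuffle_phase(pairs), hit_reducer)


logs = ["10.0.0.1 /home", "10.0.0.2 /about", "10.0.0.3 /home"]
print(count_hits(logs))
```

In a real cluster the map and reduce calls run in parallel on different nodes and the shuffle moves data over the network; the sketch only shows the data flow.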
9. Does Hadoop solve everyone's problem?
• I am a DB guy. I am proficient in writing SQL and I try very
hard to optimize my queries, but I am still not able to do so.
Moreover, I am not a Java geek. Will this solve my problem?
• Hadoop is written in Java, and I come purely from a C++
background. How can I use Hadoop for my big data problems?
Use Hadoop Pipes
• I am a statistician and I know only R, how can I write MR
jobs in R?
Use RHIPE Package
• Well, how about Python, Scala, Ruby, etc. programmers?
Does Hadoop support all these?
Use Hadoop streaming
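Hadoop Streaming lets any program that reads lines from standard input and writes tab-separated key/value lines to standard output act as a mapper or reducer. Below is a minimal word-count sketch in Python. The single-script layout and the "map"/"reduce" command-line argument are illustrative conventions, not something Hadoop requires.

```python
import sys
from itertools import groupby


def map_lines(lines):
    """Mapper: emit one 'word<TAB>1' record per word."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reduce_lines(sorted_lines):
    """Reducer: input arrives sorted by key, so equal keys are adjacent."""
    pairs = (
        line.rstrip("\n").split("\t", 1)
        for line in sorted_lines
        if line.strip()  # skip blank lines
    )
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


# Hypothetical convention: the first argument selects the role.
if __name__ == "__main__" and len(sys.argv) > 1:
    step = map_lines if sys.argv[1] == "map" else reduce_lines
    for record in step(sys.stdin):
        print(record)
```

Assuming the script is saved as wc.py (a hypothetical name), the data flow can be tried locally without a cluster using `cat input.txt | python wc.py map | sort | python wc.py reduce`, where `sort` stands in for the shuffle step.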
10. Training Links
Hadoop Installation lab: (3000+ YouTube hits)
Hadoop HDFS File system Lab:
LinkedIn group (real-time discussion)
Please join the LinkedIn group for regular updates on my learning in Hadoop / Big Data real-time work.
11. Course Material
Recordings – All sessions - 40 Hours
Exercises – 30+ Fully solved
Certification questions – 2 sets
Resumes – 2 sets
Online Case Study – Insurance Domain
Virtual Machine – Red Hat OS (Oracle VirtualBox
LinkedIn group discussion – Online Hadoop Learning