Hadoop big data introduction and training

Hadoop Big Data Introduction and Training Details

Published in: Education, Technology
Transcript of "Hadoop big data introduction and training"

  1. Hadoop – BIG DATA
  2. Do you know?
     • The total volume of electronic data stored is approximately 2 zettabytes (1 zettabyte = 1 billion TB).
     • Do you know how many photos Facebook hosts? 10 billion photos, nearly 1 PB.
     • Do you know how much data the Internet Archive stores every day? Around 2 PB of data, and it is growing at a rate of 20 PB of data.
     • 15 million smart meters (US) generate data at the rate of 3 GB per second.
     • Events collected through user interaction from sites are generated at the rate of 1.5 GB per second.
  3. What is BIG DATA?
     • Volume
     • Velocity
     • Variety
  4. Challenges?
     • Complex
     • Near real-time analytics
     • Storage
     • Computation
  5. Who is using Hadoop?
     • Amazon
     • Facebook
     • Google
     • IBM
     • Yahoo!
     • Last.fm
     • New York Times
     • PowerSet
     • Veoh
  6. What makes Hadoop special?
     • No high-end or expensive systems are required
       – Built on commodity hardware
     • Can run on Linux, Mac OS X, Windows, Solaris
     • Fault-tolerant system
       – Execution of the job continues even if nodes fail
     • Highly reliable and efficient storage system
     • Built-in intelligence to speed up the application
       – Speculative execution
     • Fit for a lot of applications:
       – Web log processing
       – Page indexing, page ranking
       – Complex event processing
  7. Overview of HDFS architecture
  8. Overview of MapReduce Programming Model
  9. Does Hadoop solve everyone's problem?
     • I am a DB guy. I am proficient in writing SQL and I try very hard to optimize my queries, but I am still not able to do so. Moreover, I am not a Java geek. Will this solve my problem?
       – Use Hive/HBase
     • Hadoop is written in Java, and I am purely from a C++ background. How can I use Hadoop for my big data problems?
       – Use Hadoop Pipes
     • I am a statistician and I know only R. How can I write MR jobs in R?
       – Use the RHIPE package
     • Well, how about Python, Scala, Ruby, etc. programmers? Does Hadoop support all these?
       – Use Hadoop Streaming
  10. Training Links
     • Course Details: http://onlinetraining2011.blogspot.com/2012/12/apache-hadoop-and-aws-mapreduce-training.html
     • Sample Session:
       – Hadoop Installation lab (3000+ YouTube hits): http://www.youtube.com/watch?v=i9yckEduQBE
       – Hadoop HDFS File system Lab: http://www.youtube.com/watch?v=Pp8SV50S9HM
     • Case Study: http://www.linkedin.com/groups/Insurance-Company-Case-Study-Hadoop4838165.S.256068004?qid=19f108a9-f563-4f99-9287-c19a1375ecf4&trk=groups_most_recent-0-bttl&goback=%2Egmr_4838165
     • LinkedIn group (real-time discussion): Please join the LinkedIn group for regular updates on my learning in Hadoop / Big Data real-time work. http://www.linkedin.com/groups/Online-Hadoop-Training-4838165
  11. Course Material
     • Recordings – All sessions – 40 hours
     • Exercises – 30+ fully solved
     • Certification questions – 2 sets
     • Resumes – 2 sets
     • Online Case Study – Insurance domain
     • Virtual Machine – Red Hat OS (Oracle VirtualBox Manager)
     • LinkedIn group discussion – Online Hadoop Learning
  12. Training Details
     • GotoMeeting
     • 40 – 45 hours
     • 1 hour on weekdays / weekends
     • Contact: onlinetraining2011@gmail.com
  13. Thank You
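
Slide 8 names the MapReduce programming model without showing it. As a rough illustration only, the sketch below simulates the three phases (map, shuffle, reduce) of a word count in plain Python, inside a single process. It does not use Hadoop or any Hadoop API. The function names (map_words, shuffle, reduce_counts, word_count) and the sample input are hypothetical and chosen for this example; they do not come from the slides.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_words(line: str) -> Iterator[Tuple[str, int]]:
    """Map phase: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle phase: group all emitted values by their key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(word: str, counts: List[int]) -> Tuple[str, int]:
    """Reduce phase: collapse the values for one key into a single total."""
    return word, sum(counts)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence (single process, no cluster)."""
    pairs = (pair for line in lines for pair in map_words(line))
    grouped = shuffle(pairs)
    return dict(reduce_counts(word, counts) for word, counts in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input, standing in for lines of a large file.
    sample = ["big data needs big storage", "hadoop stores big data"]
    for word, total in sorted(word_count(sample).items()):
        print(word, total)
```

On a real cluster the map and reduce steps would run in parallel on different nodes and the framework would perform the shuffle. This sketch only shows how the data flows between the phases.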