Hadoop big data introduction and training

Hadoop Big Data Introduction and Training Details


  • 1. Hadoop – BIG DATA
  • 2. Do you know?
      • The total volume of electronic data stored is approximately 2 zettabytes (1 zettabyte = 1 billion TB).
      • How many photos does Facebook host? About 10 billion photos, nearly 1 PB.
      • How much data does the Internet Archive store? Around 2 PB, growing at a rate of about 20 TB per month.
      • 15 million smart meters (US) generate data at a rate of 3 GB per second.
      • Events collected through user interaction on websites are generated at a rate of 1.5 GB per second.
  • 3. What is BIG DATA?
      • Volume
      • Velocity
      • Variety
  • 4. Challenges?
      • Complex
      • Near real-time analytics
      • Storage
      • Computation
  • 5. Who is using Hadoop?
      • Amazon
      • Facebook
      • Google
      • IBM
      • Yahoo!
      • Last.fm
      • New York Times
      • PowerSet
      • Veoh
  • 6. What makes Hadoop special?
      • No high-end or expensive systems are required
          – Built on commodity hardware
      • Can run on Linux, Mac OS X, Windows, Solaris
      • Fault-tolerant system
          – Execution of the job continues even if nodes fail
      • Highly reliable and efficient storage system
      • Built-in intelligence to speed up the application
          – Speculative execution
      • Suited to many applications:
          – Web log processing
          – Page indexing, page ranking
          – Complex event processing
  • 7. Overview of HDFS architecture
  • 8. Overview of MapReduce Programming Model
  • 9. Does Hadoop solve everyone's problem?
      • I am a DB guy. I am proficient in writing SQL and I try very hard to optimize my queries, but I am still not able to do so. Moreover, I am not a Java geek. Will this solve my problem?
          – Use Hive/HBase
      • Hadoop is written in Java, and I come purely from a C++ background. How can I use Hadoop for my big data problems?
          – Use Hadoop Pipes
      • I am a statistician and I know only R. How can I write MR jobs in R?
          – Use the RHIPE package
      • What about Python, Scala, Ruby, etc. programmers? Does Hadoop support all of these?
          – Use Hadoop Streaming
  • 10. Training Links
      • Course details: http://onlinetraining2011.blogspot.com/2012/12/apache-hadoop-and-aws-mapreduce-training.html
      • Sample sessions:
          – Hadoop installation lab (3000+ YouTube hits): http://www.youtube.com/watch?v=i9yckEduQBE
          – Hadoop HDFS file system lab: http://www.youtube.com/watch?v=Pp8SV50S9HM
      • Case study: http://www.linkedin.com/groups/Insurance-Company-Case-Study-Hadoop4838165.S.256068004?qid=19f108a9-f563-4f99-9287-c19a1375ecf4&trk=groups_most_recent-0-bttl&goback=%2Egmr_4838165
      • LinkedIn group (real-time discussion): please join the LinkedIn group for regular updates on my learning from real-time Hadoop / Big Data work. http://www.linkedin.com/groups/Online-Hadoop-Training-4838165
  • 11. Course Material
      • Recordings – all sessions, 40 hours
      • Exercises – 30+, fully solved
      • Certification questions – 2 sets
      • Resumes – 2 sets
      • Online case study – insurance domain
      • Virtual machine – Red Hat OS (Oracle VirtualBox Manager)
      • LinkedIn group discussion – Online Hadoop Learning
  • 12. Training Details
      • GoToMeeting
      • 40–45 hours
      • 1 hour on weekdays / weekends
      • Contact: onlinetraining2011@gmail.com
  • 13. Thank You
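
To put the per-second rates quoted on slide 2 in perspective, the short sketch below converts them to a daily volume. It assumes decimal units (1 TB = 1,000 GB) and a constant rate over the whole day; the function and variable names are made up for this illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400 seconds
GB_PER_TB = 1_000                # assumption: decimal (SI) units


def tb_per_day(gb_per_second: float) -> float:
    """Convert a sustained rate in GB/s into a daily volume in TB."""
    return gb_per_second * SECONDS_PER_DAY / GB_PER_TB


# Rates taken from slide 2
smart_meters_tb = tb_per_day(3.0)   # 15 million US smart meters
site_events_tb = tb_per_day(1.5)    # user-interaction events from sites

print(f"Smart meters: {smart_meters_tb:.1f} TB/day")
print(f"Site events:  {site_events_tb:.1f} TB/day")
```

At these rates a single source produces on the order of a hundred or more terabytes per day, which is the kind of volume the storage and computation challenges on slide 4 refer to.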
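
Slide 8 names the MapReduce programming model without showing it. As a rough illustration only, the sketch below simulates the three phases (map, shuffle, reduce) in a single Python process with a word count. It does not use any Hadoop API; `run_mapreduce`, `map_words`, and `sum_counts` are hypothetical names chosen for this example.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def map_words(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def sum_counts(word: str, counts: List[int]) -> Tuple[str, int]:
    """Reduce step: collapse all values seen for one key into a total."""
    return word, sum(counts)


def run_mapreduce(
    records: Iterable[str],
    mapper: Callable[[str], Iterator[Tuple[str, int]]],
    reducer: Callable[[str, List[int]], Tuple[str, int]],
) -> Dict[str, int]:
    """Toy, single-process stand-in for the map -> shuffle -> reduce flow."""
    # Shuffle: collect every mapped value under its key
    groups: Dict[str, List[int]] = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    # Reduce: one reducer call per key, visiting keys in sorted order
    return dict(reducer(key, groups[key]) for key in sorted(groups))


word_counts = run_mapreduce(
    ["big data", "big clusters store big data"], map_words, sum_counts
)
print(word_counts)
```

On a real cluster the map and reduce calls run on many nodes in parallel and the shuffle moves data across the network; the sketch keeps only the data flow.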
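
Slide 9 points Python programmers to Hadoop Streaming, which exchanges records with the mapper and reducer programs as lines of text on standard input and output, with a tab separating key and value by default. The sketch below shows what a streaming-style word count could look like. It is an assumption-laden illustration: the functions take any iterable of lines so they can be tried locally, whereas in an actual job each would live in its own script and read `sys.stdin`. The `sorted` call merely stands in for the sort that the framework performs between the two phases.

```python
from itertools import groupby
from typing import Iterable, Iterator


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Emit one 'word<TAB>1' line per word, as a streaming mapper would print."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(sorted_lines: Iterable[str]) -> Iterator[str]:
    """Sum the counts per word; relies on the input being sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


# Local dry run on made-up sample input
sample = ["hadoop streams text", "hadoop sorts keys"]
shuffled = sorted(mapper(sample))   # stand-in for the framework's sort phase
result = list(reducer(shuffled))
for line in result:
    print(line)
```

Because the reducer only sees a sorted stream rather than pre-grouped values, it has to detect key boundaries itself, which is what `groupby` does here.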