Hadoop for beginners free course ppt

This is a PowerPoint presentation on Hadoop and Big Data. It covers the essential knowledge one should have when stepping into the world of Big Data.

This course is available on hadoop-skills.com for free!

This course builds a basic understanding of Big Data problems and of Hadoop as a solution. It takes you through:
• Big Data problems, with easy-to-understand examples and illustrations.
• The history and advent of Hadoop, right from when Hadoop wasn’t even named Hadoop and was still part of Nutch.
• The Hadoop magic that makes it so unique and powerful.
• The difference between data science and data engineering, one of the big confusions when choosing a career or understanding a job role.
• Most importantly, Hadoop vendors such as Cloudera, MapR and Hortonworks, demystified.

Published in: Technology, Business

  1. Facebook, Twitter and Google generate petabytes of data every day. The Hadron Collider project discards large amounts of data because it cannot analyse them, hoping that nothing valuable has been thrown away. Interesting facts, but... why is Big Data important? Let's understand via an example.
  2. Bank; Optimal Price?; Maximise Profit; Insurance; 3rd Party Survey; Expert Debates; Optimal Price
  3. Bank; Optimal Price?; Maximise Profit; Insurance; Optimal Price; Data Warehousing; Repository; Web Activity; Transaction; Competitors Pricing; Market Trends; Statistics; Data Warehouse; Run Statistical Algorithms; Decision Support System
  4. Volume, Velocity, Variety
  5. Bank; Optimal Price?; Maximise Profit; Insurance; Optimal Price; Data Warehousing; Repository; Web Activity; Transaction; Competitors Pricing; Market Trends; Statistics; Data Warehouse; Run Statistical Algorithms; Decision Support System
  6. Decision Support System and Digital Nervous System: data is the fundamental block to each. Business @ the speed of thought. Sense, Interpret, Decide, Act. Organisations behaving like a biological nervous system. Avatar; Skynet
  7. Bank; Repository; Web Activity; Transaction; Competitors Pricing; Market Trends; Statistics; Optimal Price; Mobile Alert with Travel Insurance
  8. International Data Corporation’s (IDC) 6th annual study: From 2005 to 2020, the digital universe will grow by a factor of 300, from 130 exabytes to 40,000 exabytes, or 40 trillion gigabytes. More than 5,200 gigabytes for every man, woman, and child in 2020. From now until 2020, the digital universe will about double every two years. 33% of the digital data might be valuable if analysed, compared with 25% today. From Gartner: 4.4 million IT jobs globally to support Big Data by 2015.
  9. 1996-2000, 2003-04, 2005-06, 2010, 2013: Google File System and MapReduce papers; YARN/MapReduce 2/Next Generation Hadoop; Hadoop spawns off Nutch; Big Data problem faced by all search engines; and Mike Dreadnaught; Doug joins Cloudera; 0.xx releases of Hadoop
  10. Price advantage: (1) Clusters use commodity hardware, which is cheaper than one expensive server. (2) The software licence is free.
  11. HDFS; MapReduce; Google File System; Google MapReduce; file1; Name node; Data nodes; map, map, map, map, map; Reduce; User
  12. HDFS; MapReduce; HBase; Pig; Hive; Sqoop/Flume; Log collection; Yahoo; Facebook; Storm; Chukwa; Kafka; Structured Stores; Message broker; Oozie
  13. Complex algorithm on a small dataset vs. simple algorithm on a large dataset: (1) Complex algorithms need to be correctly sensitive to weak correlations. (2) Complex algorithms are thus difficult to code and design.
  14. Data Engineer. Role: to engineer software solutions. Skills: more programming and technical skills, and the ability to architect technical solutions. Data Scientist. Role: to solve business problems using data. Skills: strong mathematical skills and an understanding of statistical models.
  15. -> Skeleton version. -> All the ecosystem members need to be additionally installed. -> Important ecosystem members included. -> A few proprietary tools like Enterprise Manager. -> Proprietary Hadoop code written in C. -> Integrated with Hadoop ecosystem members. -> Based on Apache Hadoop. -> Supports the .NET framework. -> Launches Hadoop distribution: Pivotal HD.
  16. Thank You!!!
  17. Superstar Doug!!! A small fan: me. And the real Hadoop.
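
The map and reduce boxes on slide 11 can be illustrated with a small word count. This is only a sketch in plain Python, assuming the input file has already been split into text chunks; it does not use Hadoop or its API, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical names chosen for illustration.

```python
from collections import defaultdict
from itertools import chain


def map_phase(chunk):
    """Map step: emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group the emitted values by key, as happens between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce step: sum the counts collected for each word."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(chunks):
    """Run map over every chunk, then shuffle and reduce the combined output."""
    mapped = chain.from_iterable(map_phase(chunk) for chunk in chunks)
    return reduce_phase(shuffle(mapped))


if __name__ == "__main__":
    # Each string stands in for one block of a file stored on a data node.
    chunks = ["big data needs hadoop", "hadoop stores big files"]
    print(word_count(chunks))
```

In a real cluster each map call would run on the node that holds its chunk, which is why a simple algorithm can be applied to a very large dataset.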
