Introduction to Big Data and Hadoop
Agenda
• What is Big Data?
• Facts about Big Data
• Need for Big Data
• Hadoop Overview
• Big Data Course at GreyCampus
• Understanding HDFS and MapReduce
• Enterprise Hadoop
• Hadoop Career Opportunities
• Prerequisites for Learning Big Data
What is Big Data?
• Big data is a buzzword describing a volume of data so massive that it is difficult to process using traditional database and software techniques
• In most enterprise scenarios, the data is too big, moves too fast, or exceeds current processing capacity
• Despite these problems, big data has the potential to help companies
improve operations and make faster, more intelligent decisions
• Big Data has 3 attributes – Volume, Velocity and Variety
• Applications generate massive amounts of data, measured in terabytes and petabytes
• The stock market generates about 1 terabyte of data per day
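To put the stock market figure in perspective, here is a back-of-the-envelope sketch that converts 1 terabyte per day into an average sustained rate and a yearly total. It assumes decimal units (1 TB = 10^12 bytes) and a perfectly even spread over 24 hours. Real market data likely arrives in bursts during trading hours, so peak rates would be higher than this average.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds
TERABYTE = 10**12               # assumed decimal unit: 1 TB = 10^12 bytes
PETABYTE = 10**15               # assumed decimal unit: 1 PB = 10^15 bytes


def average_rate_mb_per_s(bytes_per_day: float) -> float:
    """Average rate in megabytes (10^6 bytes) per second, assuming an even spread."""
    return bytes_per_day / SECONDS_PER_DAY / 10**6


rate = average_rate_mb_per_s(1 * TERABYTE)
yearly_pb = 365 * TERABYTE / PETABYTE  # one year of data at 1 TB/day

print(f"{rate:.2f} MB/s")          # 11.57 MB/s
print(f"{yearly_pb:.3f} PB/year")  # 0.365 PB/year
```

A single source at this rate alone approaches the petabyte range within a few years, which is where the Volume and Velocity attributes start to overlap.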
Facts about Big Data
Need for Big Data
• The digital world currently holds about 4 zettabytes of data
• Predictions are that it might reach 40 zettabytes by 2020
• The volume of data doubles every 2 years
• We need mechanisms to store this data
• We need to process this data in near real time to help businesses make informed decisions
• Fact: We are generating big data and we need faster information
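The growth figures above are related by simple exponential arithmetic. The sketch below computes how long a tenfold increase takes if volume doubles every 2 years, and how large the data would be after a given number of years. The function names are hypothetical, and the constant doubling period is the deck's own simplifying assumption.

```python
import math


def years_to_reach(start_zb: float, target_zb: float, doubling_years: float = 2.0) -> float:
    """Years needed to grow from start_zb to target_zb at a fixed doubling period."""
    return doubling_years * math.log2(target_zb / start_zb)


def projected_size(start_zb: float, years: float, doubling_years: float = 2.0) -> float:
    """Size after `years` if the volume doubles every `doubling_years`."""
    return start_zb * 2 ** (years / doubling_years)


print(round(years_to_reach(4, 40), 2))  # 6.64
print(projected_size(4, 6))             # 32.0
```

Going from 4 to 40 zettabytes takes about 3.3 doublings, or roughly six and a half years. The two figures are therefore mutually consistent only if the 4 zettabyte estimate dates from around 2013.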
Overview of Hadoop
Big Data Course at GreyCampus
Course Topics:
• Module 1
– Introduction to Big Data
• Module 2
– HDFS Architecture
• Module 3
– MapReduce
• Module 4
– Advanced MapReduce
• Module 5
– Hive
• Module 6
– Pig
• Module 7
– HBase and ZooKeeper
• Module 8
– Sqoop and Flume: Moving Data to and from HDFS
• Module 9
– Hadoop Ecosystem and Components: Introduction
• Module 10
– Commercial Distributions of Hadoop
Features of HDFS
• When a dataset outgrows the storage capacity of a single physical machine, it
becomes necessary to partition it across a number of separate machines
• Filesystems that manage storage across a network of machines are called distributed filesystems
• HDFS is a filesystem designed for storing very large files with streaming data
access patterns, running on clusters of commodity hardware
• There are Hadoop clusters running today that store petabytes of data
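The partitioning idea can be illustrated with a small simulation that cuts a large file into fixed-size blocks and copies each block to several machines. The 128 MB block size and replication factor of 3 are assumptions taken from common Hadoop 2.x defaults (`dfs.blocksize`, `dfs.replication`), not from this deck. The round-robin placement is a deliberate simplification: real HDFS uses a rack-aware placement policy decided by the NameNode. The function and node names are hypothetical.

```python
# Assumed values: common Hadoop 2.x defaults, used here only for illustration.
BLOCK_SIZE = 128 * 1024 * 1024  # bytes per block (dfs.blocksize)
REPLICATION = 3                 # copies of each block (dfs.replication)
MB = 1024 * 1024                # binary megabyte, for readable output


def split_into_blocks(file_size: int, block_size: int = BLOCK_SIZE) -> list[tuple[int, int]]:
    """Return (offset, length) pairs; the final block may be shorter than block_size."""
    return [(offset, min(block_size, file_size - offset))
            for offset in range(0, file_size, block_size)]


def place_replicas(num_blocks: int, nodes: list[str],
                   replication: int = REPLICATION) -> dict[int, list[str]]:
    """Hypothetical round-robin placement: each block lands on `replication` distinct nodes."""
    if replication > len(nodes):
        raise ValueError("replication factor exceeds the number of nodes")
    return {block: [nodes[(block + r) % len(nodes)] for r in range(replication)]
            for block in range(num_blocks)}


blocks = split_into_blocks(300 * MB)
print([length // MB for _, length in blocks])  # [128, 128, 44]
print(place_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4"]))
# {0: ['dn1', 'dn2', 'dn3'], 1: ['dn2', 'dn3', 'dn4'], 2: ['dn3', 'dn4', 'dn1']}
```

In this example, losing any single node still leaves at least two copies of every block. That property is what allows a cluster of commodity hardware to tolerate individual machine failures.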
Features of MapReduce
• Developers don’t have to worry about the plumbing for their jobs
• There are no threads, inter-process communication or semaphores to program
• Developers just write programs that each process part of the input files and produce output
• The mappers and reducers share nothing: each mapper is independent of what the other mappers do, and each reducer is independent of the other reducers
• As a result, the mappers and reducers can run massively in parallel
• The MapReduce system is built to handle failure
• The system is robust: it handles failures automatically, so users don’t have to take any action
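To make the programming model concrete, here is a single-process simulation of the classic word-count job. It is written in plain Python rather than against the Hadoop Java API. `mapper`, `reducer`, `run_task` and `run_job` are hypothetical names, the shuffle is a dictionary, and the retry loop in `run_task` only imitates how the framework re-executes a failed task. A real cluster would reschedule the task, often on a different node. The limit of four attempts is an illustrative choice.

```python
from collections import defaultdict
from typing import Callable, Iterable, Iterator


def mapper(line: str) -> Iterator[tuple[str, int]]:
    """Emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def reducer(word: str, counts: Iterable[int]) -> tuple[str, int]:
    """Sum all counts emitted for a single word."""
    return word, sum(counts)


def run_task(task: Callable[[], list], max_attempts: int = 4) -> list:
    """Re-run a failed task, imitating the framework, up to max_attempts times."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return task()
        except Exception as err:  # a real framework would reschedule the task elsewhere
            last_error = err
    raise RuntimeError(f"task failed after {max_attempts} attempts") from last_error


def run_job(splits: list[list[str]]) -> dict[str, int]:
    # Map phase: each split is processed independently (share-nothing),
    # so on a cluster these tasks could run in parallel on different machines.
    map_outputs = []
    for split in splits:
        map_outputs.extend(run_task(lambda s=split: [kv for line in s for kv in mapper(line)]))
    # Shuffle: group all intermediate values by key.
    groups = defaultdict(list)
    for key, value in map_outputs:
        groups[key].append(value)
    # Reduce phase: each key is reduced independently of every other key.
    return dict(reducer(key, values) for key, values in sorted(groups.items()))


result = run_job([["the quick fox", "the lazy dog"], ["the end"]])
print(result)  # {'dog': 1, 'end': 1, 'fox': 1, 'lazy': 1, 'quick': 1, 'the': 3}

# Simulated failure: the first attempt raises, and the retry succeeds.
calls = []
def flaky_task():
    calls.append(1)
    if len(calls) == 1:
        raise OSError("simulated node failure")
    return ["recovered"]

print(run_task(flaky_task), len(calls))  # ['recovered'] 2
```

The developer writes only `mapper` and `reducer`. Splitting, grouping and retrying live in the framework-like code around them, which is the "plumbing" the slide refers to.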
Enterprise Hadoop
Hadoop Career Advantages
More Job Opportunities!
Look who is hiring!
Hadoop means a higher salary!
Transform your career
Future of Big Data
Careers in Big Data
• “By 2015, 4.4 million IT jobs globally will be created to support big
data, generating 1.9 million IT jobs in the United States,” said Peter
Sondergaard, senior vice president at Gartner and global head of
Research. “In addition, every big data-related role in the U.S. will
create employment for three people outside of IT, so over the next
four years a total of 6 million jobs in the U.S. will be generated by
the information economy.”
• “But there is a challenge. There is not enough talent in the industry. Our
public and private education systems are failing us. Therefore, only one-third of the IT jobs will be filled. Data experts will be a scarce, valuable
commodity,” Mr. Sondergaard said. “IT leaders will need immediate focus
on how their organization develops and attracts the skills required. These
jobs will be needed to grow your business. These jobs are the future of
the new information economy.”
Prerequisites
• Good programming skills
• Basic understanding of database management systems
• Knowledge of core Java (an added advantage)

