2. Agenda
• What is Big Data?
• Facts about Big Data
• Need for Big Data
• Hadoop Overview
• Big Data Course at GreyCampus
• Understanding HDFS and MapReduce
• Enterprise Hadoop
• Hadoop Career Opportunities
• Prerequisites to Learn Big Data
3. What is Big Data?
• Big data is a buzzword describing volumes of data so large that they are
difficult to process using traditional database and software techniques
• In most enterprise scenarios the volume of data is too big, it moves
too fast, or it exceeds current processing capacity
• Despite these problems, big data has the potential to help companies
improve operations and make faster, more intelligent decisions
• Big Data has three attributes: Volume, Velocity, and Variety
4. What is Big Data?
• Applications generate massive data in terabytes and petabytes
• The stock market generates 1 terabyte of data per day
6. Need for Big Data
• The digital world currently holds about 4 zettabytes of data
• Predictions suggest it may reach 40 zettabytes by 2020
• The amount of data doubles every two years
• We need mechanisms to store this data
• We need to process this data in near real time to help businesses
make informed decisions
• Fact: we are generating big data, and we need faster information
9. Big Data Course at GreyCampus
• Module 1
– Introduction to Big Data
• Module 2
– HDFS Architecture
• Module 3
– MapReduce
• Module 4
– Advanced MapReduce
• Module 5
– Hive
• Module 6
– PIG
• Module 7
– HBase and Zookeeper
• Module 8
– Sqoop and Flume – Moving Data to and from HDFS
• Module 9
– Hadoop Ecosystem and Components – Introduction
• Module 10
– Commercial Distributions of Hadoop
10. Features of HDFS
• When a dataset outgrows the storage capacity of a single physical machine, it
becomes necessary to partition it across a number of separate machines
• File systems that manage the storage across a network of machines are called
distributed filesystems
• HDFS is a filesystem designed for storing very large files with streaming data
access patterns, running on clusters of commodity hardware
• There are Hadoop clusters running today that store petabytes of data
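The partitioning idea above can be sketched in a few lines of plain Python. This is an illustration, not Hadoop's actual code: it shows how a distributed filesystem like HDFS splits a large file into fixed-size blocks and assigns each block to several machines so a single machine failure loses no data. The block size and replication factor mirror common HDFS defaults; the round-robin placement is a simplification (real HDFS also considers rack topology).

```python
BLOCK_SIZE = 128 * 1024 * 1024   # 128 MB, the default HDFS block size
REPLICATION = 3                  # default HDFS replication factor

def partition(file_size_bytes, nodes):
    """Return a list of (block_index, replica_nodes) assignments."""
    num_blocks = -(-file_size_bytes // BLOCK_SIZE)  # ceiling division
    placement = []
    for i in range(num_blocks):
        # Simplified round-robin placement across the cluster's nodes.
        replicas = [nodes[(i + r) % len(nodes)] for r in range(REPLICATION)]
        placement.append((i, replicas))
    return placement

# A 1 GB file on a 5-node cluster: 8 blocks, each stored on 3 nodes.
layout = partition(1024 * 1024 * 1024,
                   ["node1", "node2", "node3", "node4", "node5"])
print(len(layout))   # 8
print(layout[0])     # (0, ['node1', 'node2', 'node3'])
```

Because every block lives on multiple machines, reads can be served by whichever replica is closest, and a lost node only requires re-replicating its blocks from the surviving copies.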
11. Features of MapReduce
• Developers don’t have to worry about the plumbing for their jobs
• There are no threads, inter-process communication, or semaphores to program
• You just write programs that process part of the input files and produce the output
• The mappers and reducers share nothing: each mapper is independent of
what the other mappers do, and each reducer is independent of the other reducers
• So the mappers and reducers can run massively in parallel
• The MapReduce system is built to handle failure
• The system is robust, so users don’t have to take any action; it handles
failures automatically
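The model described above can be sketched in plain Python (this is not the Hadoop API, just an illustration of the programming model): each mapper processes one piece of input on its own, the framework groups the emitted values by key (the "shuffle" phase, which Hadoop does for you across machines), and each reducer then processes one key's values independently.

```python
from collections import defaultdict

def mapper(line):
    # Emit a (word, 1) pair for every word in one line of input.
    for word in line.split():
        yield (word.lower(), 1)

def reducer(word, counts):
    # Sum all counts emitted for a single word.
    return (word, sum(counts))

def map_reduce(lines):
    # Shuffle phase: group mapper output by key. In Hadoop, this grouping
    # and the distribution of work across machines are handled for you.
    groups = defaultdict(list)
    for line in lines:                      # each mapper shares nothing
        for key, value in mapper(line):
            groups[key].append(value)
    return dict(reducer(k, v) for k, v in groups.items())

print(map_reduce(["big data needs big tools", "data everywhere"]))
# {'big': 2, 'data': 2, 'needs': 1, 'tools': 1, 'everywhere': 1}
```

Because no mapper or reducer depends on any other, the framework can run thousands of them in parallel and simply re-run any that fail, which is exactly the fault-tolerance property described above.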
19. Careers in Big Data
• “By 2015, 4.4 million IT jobs globally will be created to support big
data, generating 1.9 million IT jobs in the United States,” said Peter
Sondergaard, senior vice president at Gartner and global head of
Research. “In addition, every big data-related role in the U.S. will
create employment for three people outside of IT, so over the next
four years a total of 6 million jobs in the U.S. will be generated by
the information economy."
20. Careers in Big Data
• “But there is a challenge. There is not enough talent in the industry. Our
public and private education systems are failing us. Therefore, only one-third
of the IT jobs will be filled. Data experts will be a scarce, valuable
commodity,” Mr. Sondergaard said. “IT leaders will need immediate focus
on how their organization develops and attracts the skills required. These
jobs will be needed to grow your business. These jobs are the future of
the new information economy.”
21. Prerequisites
• Good programming skills
• Basic understanding of database management systems
• Knowledge of core Java (an added advantage)