Slide 1
What is Big Data
and
Why learn Hadoop
View Hadoop Courses at : www.edureka.in/hadoop
Twitter @edurekaIN, Facebook /edurekaIN, use #AskEdureka for Questions
Slide 2
Objectives of this Session
• Un
• What is Big Data
• Traditional Warehouse vs. Hadoop – Sears Case Study
• Why Should I Learn Hadoop & Related Technologies
• Jobs and Trends in Big Data
• Hadoop Architecture and Eco-System
For Queries during the session and class recording:
Post on Twitter @edurekaIN: #askEdureka
Post on Facebook /edurekaIN
Slide 3
Big Data
• Lots of data (terabytes or petabytes)
• Big data is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications
• The challenges include capture, curation, storage, search, sharing, transfer, analysis, and visualization
(Figure: word cloud of Big Data terms, including cloud, tools, statistics, NoSQL, compression, storage, support, database, analyze, information, terabytes, processing, and mobile)
Slide 4
Unstructured Data is Exploding
• 2,500 exabytes of new information in 2012, with the internet as the primary driver
• “Digital universe grew by 62% last year to 800K petabytes and will grow to 1.2 zettabytes” this year
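The figures on this slide mix petabytes, exabytes, and zettabytes, so it can help to put them on one scale. The sketch below assumes decimal (SI) units, where each step is a factor of 1,000; the helper name `convert` is illustrative only.

```python
# Decimal (SI) storage units, expressed in bytes (an assumption; the slide
# does not say whether decimal or binary units are meant).
PETABYTE = 10**15
EXABYTE = 10**18
ZETTABYTE = 10**21

def convert(value, from_unit, to_unit):
    """Convert a quantity between two units, each given in bytes per unit."""
    return value * from_unit / to_unit

# Figures quoted on the slide, converted to zettabytes.
last_year_zb = convert(800_000, PETABYTE, ZETTABYTE)   # 800K petabytes
new_info_2012_zb = convert(2_500, EXABYTE, ZETTABYTE)  # 2,500 exabytes
projected_zb = 1.2                                     # 1.2 zettabytes

# Growth implied by going from 800K PB to 1.2 ZB.
implied_growth_pct = (projected_zb / last_year_zb - 1) * 100

print(f"800K PB  = {last_year_zb:.1f} ZB")
print(f"2,500 EB = {new_info_2012_zb:.1f} ZB")
print(f"Implied growth to 1.2 ZB: {implied_growth_pct:.0f}%")
```

Under these assumptions, 800K petabytes is 0.8 zettabytes and 2,500 exabytes is 2.5 zettabytes, which makes the year-over-year jump easier to compare.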
Slide 5
Big Data - Challenges
Increasing Data Volumes
New data sources and types:
• Email and documents
• Social Media, Web Logs
• Machine Device (Scientific)
• Transactions, OLTP, OLAP
Slide 6
Big Data - Challenges (Contd.)
Good News: Big Data is here.
Bad News: We are struggling to store, process and analyze it.
Slide 7
Common Big Data Customer Scenarios
• Banks and Financial Services
   • Modeling True Risk
   • Threat Analysis
   • Fraud Detection
   • Trade Surveillance
   • Credit Scoring and Analysis
• Retail
   • Point of Sales Transaction Analysis
   • Customer Churn Analysis
   • Sentiment Analysis
http://wiki.apache.org/hadoop/PoweredBy
Slide 8
Hidden Treasure – Case Study
Case Study: Sears Holdings Corporation
*Sears was using traditional systems such as Oracle Exadata, Teradata, and SAS to store and process its customer activity and sales data.
• Insight into data can provide a business advantage.
• Some key early indicators can mean fortunes to a business.
• More precise analysis with more data.
Slide 9
Limitations of Existing Data Analytics Architecture
Diagram components:
• Instrumentation
• Collection
• Mostly Append
• Storage: Storage-only Grid (original raw data)
• Processing: ETL Compute Grid
• RDBMS (Aggregated Data)
• BI Reports + Interactive Apps
Data availability:
• 90% of the ~2PB of data is archived
• A meagre 10% of the ~2PB of data is available for BI
Limitations:
1. Can’t explore original high-fidelity raw data
2. Moving data to compute doesn’t scale
3. Premature data death
http://www.informationweek.com/it-leadership/why-sears-is-going-all-in-on-hadoop/d/d-id/1107038?
Slide 10
Solution: A Combined Storage Compute Layer
*Sears moved to a 300-node Hadoop cluster to keep 100% of its data available for processing, rather than the meagre 10% available with its existing non-Hadoop solutions.
Diagram components:
• Instrumentation
• Collection
• Mostly Append
• Hadoop: Storage + Compute Grid (both storage and processing)
• RDBMS (Aggregated Data)
• BI Reports + Interactive Apps
Data availability:
• No data archiving
• Entire ~2PB of data is available for processing
Benefits:
1. Data exploration & advanced analytics
2. Scalable throughput for ETL & aggregation
3. Keep data alive forever
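The claim that moving data to compute doesn't scale can be made concrete with rough arithmetic. The sketch below uses the ~2PB and 300-node figures from the slides. The link speed and per-node disk throughput are assumptions chosen only for illustration and are not taken from the Sears case study.

```python
# Rough, illustrative arithmetic only.
DATA_BYTES = 2 * 10**15            # ~2 PB, from the slides
NODES = 300                        # cluster size, from the slides

LINK_BITS_PER_SEC = 10 * 10**9     # ASSUMED: one shared 10 Gbit/s network link
DISK_BYTES_PER_SEC = 100 * 10**6   # ASSUMED: 100 MB/s sequential read per node

def hours(seconds):
    """Convert seconds to hours."""
    return seconds / 3600

# Option A: ship all raw data over one link to a separate compute grid.
move_seconds = DATA_BYTES * 8 / LINK_BITS_PER_SEC

# Option B: every node scans only its own local share of the data, in parallel.
local_seconds = DATA_BYTES / (NODES * DISK_BYTES_PER_SEC)

print(f"Move over one link: {hours(move_seconds):.0f} hours")
print(f"Scan in place on {NODES} nodes: {hours(local_seconds):.1f} hours")
print(f"Ratio: {move_seconds / local_seconds:.0f}x")
```

With different assumed speeds the absolute numbers change, but the shape of the comparison stays the same: a single pipe is a fixed bottleneck, while local scans speed up as nodes are added.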
Slide 11
Why move to Hadoop?
Hadoop is red-hot as it:
• allows distributed processing of large data sets across clusters of computers using a simple programming model.
• has become the de facto standard for storing, processing, and analyzing hundreds of terabytes and petabytes of data.
• is cheaper to use than traditional proprietary technologies from vendors such as Oracle and IBM, and it can run on low-cost commodity hardware.
• can handle all types of data from disparate systems, such as server logs, emails, sensor data, pictures, and videos.
Slide 12
Hadoop: Growth and Job Opportunities (Contd.)
As per the 2012-13 Salary Survey by Dice, a leading career site for technology and engineering professionals:
• Out of the big three (mobile, cloud and data), there’s one that is having a disproportionate impact on salaries: it’s big data.
• Salaries reported by those who regularly use Hadoop, NoSQL, and MongoDB are all north of $100,000. By comparison, average salaries for technologies closely associated with cloud and virtualization are just under $90,000.
http://media.dice.com/report/2013-2012-dice-salary-survey/
“We’ve heard it’s a fad, heard it’s hyped and heard it’s fleeting, yet it’s clear that data professionals are in demand and well paid. Tech professionals who analyse large data streams and strategically impact the overall business goals of a firm have an opportunity to write their own ticket,” said Alice Hill, Managing Director of Dice.com.
Slide 13
Hadoop is in Demand!
Filled jobs vs. unfilled jobs in big data (%)
Role                         Filled   Unfilled
Big Data Analyst             50       50
Big Data Architect           43       57
Big Data Engineer            44       56
Big Data Research Analyst    31       69
Big Data Visualizer          23       77
Data Scientist               18       82
Gartner Says Big Data Creates Big Jobs: 4.4 Million IT Jobs Globally to Support Big Data By 2015
http://www.gartner.com/newsroom/id/2207915
Slide 14
Hadoop: Growth and Job Opportunities (Contd.)
(Chart: Salary – Other Technologies vs Hadoop; axis: Salaries (USD), from 60,000 to 110,000)
Slide 15
Hadoop for Big Data
• Apache Hadoop is a framework that allows for the distributed processing of large data sets across clusters of commodity computers using a simple programming model.
• It is an open-source data management framework with scale-out storage and distributed processing.
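One way to picture scale-out storage is a file cut into fixed-size blocks, with each block copied to several machines. The sketch below is a simplified illustration, not HDFS code. It assumes a 128 MB block size and a replication factor of 3 (the usual Hadoop 2 defaults), and it uses plain round-robin placement as a stand-in for the rack-aware policy that real HDFS applies. The function and node names are hypothetical.

```python
import math

BLOCK_SIZE = 128 * 1024**2   # ASSUMED: 128 MB blocks
REPLICATION = 3              # ASSUMED: three copies of every block

def place_blocks(file_size, nodes, block_size=BLOCK_SIZE, replication=REPLICATION):
    """Return {block_index: [node, ...]} using simple round-robin placement.

    Real HDFS uses a rack-aware placement policy; round-robin is a stand-in.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    num_blocks = math.ceil(file_size / block_size)
    return {
        block: [nodes[(block + r) % len(nodes)] for r in range(replication)]
        for block in range(num_blocks)
    }

def surviving_blocks(placement, failed_node):
    """Blocks that still have at least one replica after one node fails."""
    return [
        block for block, replicas in placement.items()
        if any(node != failed_node for node in replicas)
    ]

nodes = ["node1", "node2", "node3", "node4"]   # hypothetical cluster
layout = place_blocks(1024**3, nodes)          # a 1 GB file
print(len(layout), "blocks")
print("block 0 replicas:", layout[0])
print(len(surviving_blocks(layout, "node1")), "blocks readable after node1 fails")
```

Adding nodes to the list spreads blocks over more machines, which is the scale-out part. Keeping several replicas is what lets every block stay readable when a single node is lost.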
Slide 16
Hadoop Eco-System
• Apache Oozie (Workflow)
• Pig Latin (Data Analysis)
• Mahout (Machine Learning)
• Hive (DW System)
• MapReduce Framework
• HBase
• HDFS (Hadoop Distributed File System)
• Import or export: Flume (unstructured or semi-structured data), Sqoop (structured data)
Audience: ETL/DW Professionals, Developers / Programmers, DBA / Administrators
Slide 17
Hadoop and MapReduce
Hadoop is a system for large-scale data processing. It has two main components:
• HDFS – Hadoop Distributed File System (Storage)
   • Highly fault-tolerant
   • High-throughput access to application data
   • Suitable for applications that have large data sets
   • Natively redundant
• MapReduce (Processing)
   • Software framework for easily writing applications that process vast amounts of data (multi-terabyte data sets) in parallel on large clusters (thousands of nodes) in a reliable, fault-tolerant manner
   • Splits a task across processors
(Diagram: Map-Reduce operating on key/value pairs)
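The key/value idea behind MapReduce can be shown with a word count, the customary first example. The sketch below runs in a single Python process and only imitates the three stages (map, shuffle, reduce). It does not use the Hadoop API, and the function names are illustrative.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) key/value pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1

def shuffle(pairs):
    """Shuffle: group all values by key, as a framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)

def word_count(lines):
    """Run map, shuffle, and reduce in sequence over an iterable of lines."""
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())

if __name__ == "__main__":
    sample = ["big data is here", "hadoop processes big data"]
    print(word_count(sample))
```

On a real cluster the map calls run on many nodes at once, each over its local blocks, and the shuffle moves pairs across the network so that all values for one key reach the same reducer.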
Slide 18
Hadoop 2.0: Much More is Possible
• BATCH (MapReduce)
• INTERACTIVE (Tez)
• ONLINE (HBase)
• STREAMING (Storm, S4, …)
• GRAPH (Giraph)
• IN-MEMORY (Spark)
• HPC MPI (OpenMPI)
• OTHER (Search, Weave, …)
http://hadoop.apache.org/docs/stable2/hadoop-yarn/hadoop-yarn-site/YARN.html
Further Reading
• Big Prospects for Big Data
http://www.edureka.in/blog/big-prospects-for-big-data/
• Hadoop Learners Profile
http://www.edureka.in/blog/hadoop-learners-profile/
• Big Bucks for Big Data
http://www.edureka.in/blog/big-bucks-for-big-data/
• 5 Reasons to Learn Hadoop
http://www.edureka.in/blog/5-reasons-to-learn-hadoop/
• Increasing Demand for ‘Hadoop and NoSQL skills’
http://www.edureka.in/blog/increasing-demand-for-hadoop-and-nosql-skills/
Slide 20
Questions?
Enroll for the Complete Course at : www.edureka.in/hadoop
Type Enroll in the questions window if you want edureka to contact you
Class Recording and Presentation will be available in 24 hours at:
http://www.edureka.in/blog/what-is-big-data-and-why-learn-hadoop/

Editor's Notes

  1. - 2 PB of data, mostly structured and unstructured data such as customer transactions, point of sale, and supply chain. - Because of the need for archiving, 90% of the ~2PB of data is not available for BI.