With the increasing amount of digital data, it has become essential to find a technology that can be used to analyse this data. The real big question is 'Why should Big Data matter to you?'
You can watch the video for more information on this.
Topics Included in this Presentation:
Big Data
Big Data and Hadoop
Why Hadoop?
Hadoop: The future of Data Management
Hadoop: Job Roles
Hadoop: Growth and Job Opportunities
For more information: http://www.edureka.in/big-data-and-hadoop
=============================================
Experience instructor-led online training with 24x7 support at Edureka.
Edureka provides online training courses for Big Data and Hadoop, Hadoop Admin, Cassandra, Data Science, Cloud Computing, and Android Development.
Please write back to us at sales@edureka.in or call us at +91-8880862004 for more information.
http://www.edureka.in
What is Big Data and Why Learn Hadoop
1. Slide 1
What is Big Data and Why Learn Hadoop
View Hadoop Courses at: www.edureka.in/hadoop
Twitter @edurekaIN, Facebook /edurekaIN, use #AskEdureka for Questions
2. Slide 2
Objectives of this Session
• Un
• What is Big Data
• Traditional Warehouse vs. Hadoop – Sears Case Study
• Why Should I Learn Hadoop & Related Technologies
• Jobs and Trends in Big Data
• Hadoop Architecture and Eco-System
For Queries during the session and class recording:
Post on Twitter @edurekaIN: #askEdureka
Post on Facebook /edurekaIN
3. Slide 3
Big Data
Lots of Data (Terabytes or Petabytes)
Big data is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications.
The challenges include capture, curation, storage, search, sharing, transfer, analysis, and visualization.
[Figure: "Big Data" word cloud with the terms cloud, tools, statistics, NoSQL, compression, storage, support, database, analyze, information, terabytes, processing, mobile]
4. Slide 4
Unstructured Data is Exploding
2,500 exabytes of new information in 2012, with the internet as the primary driver
“Digital universe grew by 62% last year to 800K petabytes and will grow to 1.2 zettabytes” this year
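The figures on this slide use three different units, so it can help to put them on one scale. The following is a minimal sketch, assuming decimal (SI) units where 1 zettabyte = 1,000 exabytes = 1,000,000 petabytes. The `UNITS` table and the `convert` helper are illustrative names and are not part of any library.

```python
# Decimal (SI) storage units expressed in bytes (assumption: the slide
# uses decimal units rather than binary ones such as PiB or EiB).
UNITS = {
    "TB": 10**12,
    "PB": 10**15,
    "EB": 10**18,
    "ZB": 10**21,
}


def convert(value, from_unit, to_unit):
    """Convert a storage quantity from one decimal unit to another."""
    return value * UNITS[from_unit] / UNITS[to_unit]


# Figures quoted on the slide, converted to zettabytes.
digital_universe_zb = convert(800_000, "PB", "ZB")  # "800K petabytes"
new_info_2012_zb = convert(2_500, "EB", "ZB")       # "2,500 exabytes"

# Growth implied by going from 800K petabytes to 1.2 zettabytes.
implied_growth = 1.2 / digital_universe_zb - 1

print(f"800K PB  = {digital_universe_zb:.1f} ZB")
print(f"2,500 EB = {new_info_2012_zb:.1f} ZB")
print(f"Implied growth to 1.2 ZB: {implied_growth:.0%}")
```

Expressed this way, the two quotes can be compared directly even though they come from different years and sources.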
5. Slide 5
Big Data - Challenges
Increasing Data Volumes
New data sources and types
Email and documents
Social Media, Web Logs
Machine Device (Scientific)
Transactions, OLTP, OLAP
6. Slide 6
Big Data - Challenges (Contd.)
Good News: Big Data is here.
Bad News: We are struggling to store, process and analyze it.
7. Slide 7
Common Big Data Customer Scenarios
Banks and Financial services
Modeling True Risk
Threat Analysis
Fraud Detection
Trade Surveillance
Credit Scoring and Analysis
Retail
Point of Sales Transaction Analysis
Customer Churn Analysis
Sentiment Analysis
http://wiki.apache.org/hadoop/PoweredBy
8. Slide 8
Hidden Treasure – Case Study
Case Study: Sears Holding Corporation
*Sears was using traditional systems such as Oracle Exadata, Teradata, and SAS to store and process customer activity and sales data.
Insight into data can provide a business advantage.
Some key early indicators can mean fortunes to a business.
More data allows more precise analysis.
9. Slide 9
Limitations of Existing Data Analytics Architecture
Data flow: Instrumentation, Collection (mostly append), Storage only Grid (original Raw Data) for storage, ETL Compute Grid for processing, RDBMS (Aggregated Data), BI Reports + Interactive Apps
90% of the ~2PB is archived; a meagre 10% of the ~2PB data is available for BI.
Limitations:
1. Can’t explore original high fidelity raw data
2. Moving data to compute doesn’t scale
3. Premature data death
Source: http://www.informationweek.com/it-leadership/why-sears-is-going-all-in-on-hadoop/d/d-id/1107038?
10. Slide 10
Solution: A Combined Storage and Compute Layer
*Sears moved to a 300-node Hadoop cluster to keep 100% of its data available for processing, rather than the meagre 10% available with its existing non-Hadoop solutions.
Data flow: Instrumentation, Collection (mostly append), Hadoop: Storage + Compute Grid (both storage and processing), RDBMS (Aggregated Data), BI Reports + Interactive Apps
No data archiving: the entire ~2PB of data is available for processing.
Benefits:
1. Data exploration & advanced analytics
2. Scalable throughput for ETL & aggregation
3. Keep data alive forever
11. Slide 11
Why move to Hadoop?
Hadoop is red-hot as it:
• Allows distributed processing of large data sets across clusters of computers using a simple programming model.
• Has become the de facto standard for storing, processing, and analyzing hundreds of terabytes and petabytes of data.
• Is cheaper to use than traditional proprietary technologies such as those from Oracle and IBM. It can run on low-cost commodity hardware.
• Can handle all types of data from disparate systems, such as server logs, emails, sensor data, pictures, and videos.
12. Slide 12
Hadoop: Growth and Job Opportunities (Contd.)
According to the 2012-13 Salary Survey by Dice, a leading career site for technology and engineering professionals:
Out of the big three (mobile, cloud and data), there’s one that is having a disproportionate impact on salaries: it’s big data.
Salaries reported by those who regularly use Hadoop, NoSQL, and MongoDB are all north of $100,000.
By comparison, average salaries for technologies closely associated with cloud and virtualization are just under $90,000.
http://media.dice.com/report/2013-2012-dice-salary-survey/
“We’ve heard it’s a fad, heard it’s hyped and heard it’s fleeting, yet it’s clear that data professionals are in demand and well paid. Tech professionals who analyse large data streams and strategically impact the overall business goals of a firm have an opportunity to write their own ticket,” said Alice Hill, Managing Director of Dice.com.
13. Slide 13
Hadoop is in Demand!
Filled vs. unfilled jobs in big data (%)
Role                         Filled   Unfilled
Big Data Analyst             50       50
Big Data Architect           43       57
Big Data Engineer            44       56
Big Data Research Analyst    31       69
Big Data Visualizer          23       77
Data Scientist               18       82
Gartner Says Big Data Creates Big Jobs: 4.4 Million IT Jobs Globally to Support Big Data By 2015
http://www.gartner.com/newsroom/id/2207915
14. Slide 14
Hadoop: Growth and Job Opportunities (Contd.)
[Chart: Salary – Other Technologies vs Hadoop; axis: Salaries (USD), 60,000 to 110,000]
15. Slide 15
Hadoop for Big Data
Apache Hadoop is a framework that allows for the distributed processing of large data sets across clusters of commodity computers using a simple programming model.
It is an open-source data management framework with scale-out storage and distributed processing.
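To make "scale-out storage" concrete, here is a minimal sketch of how a file might be split into fixed-size blocks with each block copied to several machines. The 128 MB block size and replication factor of 3 are assumptions (both are common HDFS defaults). The node names, the `plan_blocks` helper, and the round-robin placement are hypothetical simplifications for illustration; they are not HDFS's actual placement policy.

```python
import math


def plan_blocks(file_size_mb, block_size_mb=128, replication=3,
                nodes=("node1", "node2", "node3", "node4")):
    """Return {block_index: [nodes holding a replica]}.

    Placement is plain round-robin, chosen only to keep the sketch short.
    """
    if replication > len(nodes):
        raise ValueError("replication factor cannot exceed the number of nodes")

    # The last block may be only partly full, so round up.
    num_blocks = math.ceil(file_size_mb / block_size_mb)

    placement = {}
    for block in range(num_blocks):
        # Pick `replication` consecutive nodes, wrapping around the cluster.
        placement[block] = [nodes[(block + r) % len(nodes)]
                            for r in range(replication)]
    return placement


plan = plan_blocks(file_size_mb=500)
for block, replicas in plan.items():
    print(f"block {block}: {replicas}")
```

Because every block lives on more than one node, losing a single machine does not lose data, and adding machines adds both capacity and places where processing can run.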
16. Slide 16
Hadoop Eco-System
Apache Oozie (Workflow)
Pig Latin (Data Analysis)
Mahout (Machine Learning)
Hive (DW System)
MapReduce Framework
HBase
HDFS (Hadoop Distributed File System)
Flume (import or export of unstructured or semi-structured data)
Sqoop (import or export of structured data)
Audience: ETL/DW Professionals, Developers / Programmers, DBA / Administrators
17. Slide 17
Hadoop and MapReduce
Hadoop is a system for large-scale data processing.
It has two main components:
• HDFS – Hadoop Distributed File System (Storage)
  - Highly fault-tolerant
  - High-throughput access to application data
  - Suitable for applications that have large data sets
  - Natively redundant
• MapReduce (Processing)
  - Software framework for easily writing applications which process vast amounts of data (multi-terabyte data sets) in parallel on large clusters (thousands of nodes) in a reliable, fault-tolerant manner
  - Splits a task across processors
[Diagram: Map-Reduce operating on key/value pairs]
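The key/value model behind MapReduce can be sketched in a few lines of plain Python. This is an in-memory illustration of the map, shuffle, and reduce steps using the classic word-count example; it does not use the Hadoop API, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical. On a real cluster, the map and reduce calls would run in parallel on different nodes.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) key/value pair for every word in one line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values under their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    # Each line could be mapped independently (and so in parallel).
    pairs = [pair for line in lines for pair in map_phase(line)]
    # Each key could likewise be reduced independently.
    return dict(reduce_phase(key, values)
                for key, values in shuffle(pairs).items())


counts = word_count(["big data is big", "hadoop processes big data"])
print(counts)
```

The important property is that neither the map function nor the reduce function needs to know about any other line or key, which is what lets the framework split the work across many processors.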
19. Further Reading
Big Prospects for Big Data
http://www.edureka.in/blog/big-prospects-for-big-data/
Hadoop Learners Profile
http://www.edureka.in/blog/hadoop-learners-profile/
Big Bucks for Big Data
http://www.edureka.in/blog/big-bucks-for-big-data/
5 Reasons to Learn Hadoop
http://www.edureka.in/blog/5-reasons-to-learn-hadoop/
Increasing Demand for ‘Hadoop and NoSQL skills’
http://www.edureka.in/blog/increasing-demand-for-hadoop-and-nosql-skills/
20. Slide 20
Questions?
Enroll for the Complete Course at : www.edureka.in/hadoop
Type Enroll in the questions window if you want edureka to contact you
Class Recording and Presentation will be available in 24 hours at:
http://www.edureka.in/blog/what-is-big-data-and-why-learn-hadoop/
Editor's Notes
- 2 PB of data, mostly structured and unstructured data such as customer transactions, point of sale, and supply chain.
- Because of the need to archive, 90% of the ~2 PB of data is not available for BI.