www.edureka.co/r-for-analytics
www.edureka.co/big-data-and-hadoop
Is Hadoop a Necessity for Data Science?
Slide 2
AGENDA
Today we will take you through the following:
• What is Big Data & Hadoop?
• What is a Data Product?
• What is Data Science?
• Why Hadoop for Data Science?
• Is Hadoop a necessity for Data Science?
Slide 3
What is Big Data & Hadoop?
Slide 4
BIG DATA
Big data is a popular term used to describe the exponential growth of data.
Big data can be structured, unstructured, or a combination of both.
Slide 5
BIG DATA
The 3 V’s (Volume, Variety and Velocity) are the three defining properties, or dimensions, of Big Data.
Slide 6
HADOOP
Hadoop is a programming framework that supports the processing of large data sets in a distributed computing environment.
Hadoop was one of the first widely adopted tools for handling Big Data and is still widely used.
Slide 7
A BRIEF HISTORY OF HADOOP
Slide 8
HADOOP: HDFS & MAPREDUCE
Most efficient for large-scale storage & processing
• HDFS: Distributed file system
  Self-healing data store
• MapReduce: Distributed computation framework that handles the complexities of distributed programming
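To make the MapReduce model concrete, the following is a minimal single-process sketch of the map, shuffle, and reduce phases for a word count. It is plain Python, not Hadoop code: the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the sample input are illustrative assumptions, and a real Hadoop job would run these phases across many nodes with the framework handling distribution and failures.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key (done by the framework in Hadoop)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence on an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input, standing in for files stored in HDFS.
    sample = ["big data needs big storage", "hadoop stores big data"]
    print(word_count(sample))
```

Because each call to `map_phase` depends only on its own line, and each call to `reduce_phase` only on its own key, both phases can be run in parallel, which is what lets the framework spread the work across nodes.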
Slide 9
KEY TO HADOOP’S POWER
• Computation co-located with data
  Data and computation systems are co-designed and co-developed to work together
• Data processed in parallel across thousands of “commodity” hardware nodes
  Self-healing; failures are handled by software
• Designed for write-once, read-many access
  There are no random writes
  Optimized for minimum seeks on hard drives
Slide 10
What is a Data Product?
“A software system whose core functionality depends on the application of statistical analysis and machine learning to data.”
Slide 11
Example #1: People you may know
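The slide does not say how a "People you may know" feature is built. As one possible illustration, the sketch below ranks friends-of-friends by the number of mutual friends, which is a common heuristic for this kind of data product. The function name `people_you_may_know`, the `top_n` parameter, and the sample graph are all hypothetical.

```python
from collections import Counter


def people_you_may_know(friends, user, top_n=3):
    """Suggest non-friends of `user`, ranked by number of mutual friends.

    `friends` maps each person to the set of people they are connected to.
    This is an assumed heuristic, not the method of any particular product.
    """
    direct = friends.get(user, set())
    mutual_counts = Counter()
    for friend in direct:
        for candidate in friends.get(friend, set()):
            # Skip the user themselves and people they already know.
            if candidate != user and candidate not in direct:
                mutual_counts[candidate] += 1
    return mutual_counts.most_common(top_n)


if __name__ == "__main__":
    # Hypothetical friendship graph.
    graph = {
        "ann": {"bob", "cara"},
        "bob": {"ann", "dan", "eve"},
        "cara": {"ann", "dan"},
        "dan": {"bob", "cara"},
        "eve": {"bob"},
    }
    print(people_you_may_know(graph, "ann"))
```

On a real social graph the same counting step has to run over every user's friend list, which is the kind of large batch computation the rest of this deck argues Hadoop is suited for.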
Slide 12
Example #2: Spell Correction
Slide 13
What is Data Science?
Slide 14
DATA SCIENCE
#1: Extracting deep meaning from data (data mining; finding “gems” in data)
Slide 15
Common Data Science tasks
Slide 16
DATA SCIENCE
#2: Building Data Products (delivering “gems” on a regular basis)
Slide 17
Why HADOOP for DATA SCIENCE?
Reason #1:
Explore full datasets
Slide 18
#1: Exploration of Data sets
Slide 19
Why HADOOP for DATA SCIENCE?
Reason #2:
Mining of larger datasets
Slide 20
#2: Mining of larger data sets
More Data ---> Better Outcomes
Slide 21
Why HADOOP for DATA SCIENCE?
Reason #3:
Large-scale data preparation
Slide 22
#3: Large-scale data preparation
By a commonly cited estimate, about 80% of data science work is data preparation
Slide 23
Why HADOOP for DATA SCIENCE?
Reason #4:
Accelerate data-driven innovation
Slide 24
Speed Barriers of traditional Data Architectures
Slide 25
“Schema on read” means faster time-to-innovation
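As a rough illustration of "schema on read", the sketch below keeps records in their raw form and applies a schema only when a question is asked of them. It is a plain-Python sketch under stated assumptions: the raw events, the field names, and the helper `read_with_schema` are hypothetical, and in a Hadoop setting the raw lines would typically live in files on HDFS rather than in a list.

```python
import json

# Hypothetical raw events, stored exactly as they arrived (no upfront schema).
RAW_EVENTS = [
    '{"user": "ann", "amount": "19.99", "country": "IN"}',
    '{"user": "bob", "amount": "5", "coupon": "WELCOME"}',
    "not valid json",
]


def read_with_schema(raw_lines, schema):
    """Apply a schema (field name -> converter) while reading raw records.

    Malformed lines are skipped; fields missing from a record become None.
    """
    for line in raw_lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate bad input instead of rejecting it at load time
        if not isinstance(record, dict):
            continue
        row = {}
        for field, convert in schema.items():
            value = record.get(field)
            row[field] = convert(value) if value is not None else None
        yield row


if __name__ == "__main__":
    # Two different questions, two different schemas, same untouched raw data.
    spend_schema = {"user": str, "amount": float}
    coupon_schema = {"user": str, "coupon": str}
    print(list(read_with_schema(RAW_EVENTS, spend_schema)))
    print(list(read_with_schema(RAW_EVENTS, coupon_schema)))
```

The point of the design is that a new question only requires a new schema dictionary, not a reload of the data into a redefined table, which is the "faster time-to-innovation" the slide refers to.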
Demo
Questions
Slide 27
Slide 28
SURVEY
Your feedback is vital to us, be it a compliment, a suggestion or a complaint. It helps us make your experience better!
Please spare a few minutes to take the survey after the webinar.