Data Science
Introduction to Data Science
LIVE On-line Class
Class Recording in LMS
24/7 Post Class Support
Module Wise Quiz
Project Work on Large Data Base
Verifiable Certificate
How it Works?
Slide 2 www.edureka.in/data-science
Topics for the Day
Slide 3 www.edureka.in/data-science
 Big Data
 Big Data Scenarios
 Big Data Challenges
 Introduction to Data Science
 Data Science: Components
 Types of Data Scientists
 Data Science: Core Components
 Use-Cases
 Introduction to Hadoop and R
 R and Hadoop Integration
 Machine Learning with Mahout
 References
Objectives
At the end of this module, you will be able to
 Understand Big Data and its challenges
 Implement Big Data in real time scenarios
 List and explain the components and prospects of Data Science
 Learn the implementation of Hadoop on Big data
 Analyze some real world use-cases with the help of R programming Language
 Understand machine learning concepts
Data Science
Slide 5 www.edureka.in/data-science
Big Data
Slide 6 www.edureka.in/data-science
What is Big Data?
Lots of Data
(Terabytes or
Petabytes)
Systems and enterprises generate huge amounts of data,
from terabytes to even petabytes of information
Slide 8 www.edureka.in/data-science http://www.today.mccombs.utexas.edu/2012/04/the-big-data-machine
Big Data Scenarios
Slide 9 www.edureka.in/data-science http://www.clker.com/clipart-13967.html
Big Data Scenarios: Sports
Slide 9 www.edureka.in/data-science http://www.espncricinfo.com/
Big Data Scenarios: Sports
Sports teams are using data for tracking ticket
sales and even for tracking team strategies.
Advertising and marketing agencies are tracking
social media to understand responsiveness to
campaigns, promotions, and other advertising
mediums
Slide 10 www.edureka.in/data-science http://www.espncricinfo.com/
Big Data Scenarios : Hospital Care
Slide 12 www.edureka.in/data-science http://www.majorprojects.vic.gov.au/our-projects/our-past-projects/austin-hospital
Big Data Scenarios : Hospital Care
Hospitals are analyzing medical data and patient
records to predict which patients are likely to seek
readmission within a few months of discharge. The
hospital can then intervene in the hope of preventing
another costly hospital stay.
A medical diagnostics company analyzed millions of lines
of data to develop the first non-intrusive test for
predicting coronary artery disease. To do so,
researchers at the company analyzed over 100 million
gene samples to ultimately identify the 23 primary
predictive genes for coronary artery disease.
Slide 13 www.edureka.in/data-science
Big Data Scenarios : Amazon.com
Slide 13 www.edureka.in/data-science http://wp.streetwise.co/wp-content/uploads/2012/08/Amazon-Recommendations.png
Amazon has an unrivalled bank of data on online consumer
purchasing behaviour that it can mine from its 152 million
customer accounts.
Amazon also uses Big Data to monitor, track and secure the 1.5
billion items in its retail store that are lying around its 200
fulfilment centres around the world. Amazon stores the
product catalogue data in S3.
S3 can write, read and delete objects of up to 5 TB each.
The catalogue stored in S3 receives more than 50 million
updates a week and every 30 minutes all data received is
crunched and reported back to the different warehouses and
the website.
Big Data Scenarios : Amazon.com
Slide 14 www.edureka.in/data-science http://wp.streetwise.co/wp-content/uploads/2012/08/Amazon-Recommendations.png
Big Data Scenarios: NetFlix
Slide 15 www.edureka.in/data-science http://smhttp.23575.nexcesscdn.net/80ABE1/sbmedia/blog/wp-content/uploads/2013/03/netflix-in-asia.png
Netflix uses 1 petabyte to store the videos for streaming.
BitTorrent Sync has transferred over 30 petabytes of data
since its pre-alpha release in January 2013.
The 2009 movie Avatar is reported to have taken over 1
petabyte of local storage at Weta Digital for the rendering
of the 3D CGI effects.
One petabyte of average MP3-encoded songs (for mobile,
roughly one megabyte per minute), would require 2000
years to play.
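The playback figure above can be checked with a short calculation. This is only a back-of-the-envelope sketch: it assumes decimal units (1 PB = 10^9 MB), the slide's rate of one megabyte per minute, and 365-day years. The function name `playback_years` is made up for illustration.

```python
# Rough check of the "one petabyte of MP3s takes about 2000 years to play" claim.
MB_PER_PETABYTE = 10**9            # assumption: decimal units, 1 PB = 10^15 bytes
MB_PER_MINUTE = 1                  # rate quoted on the slide
MINUTES_PER_YEAR = 60 * 24 * 365   # assumption: 365-day years


def playback_years(petabytes: float) -> float:
    """Return how many years it takes to play the given volume of MP3 audio."""
    minutes = petabytes * MB_PER_PETABYTE / MB_PER_MINUTE
    return minutes / MINUTES_PER_YEAR


years = playback_years(1)
print(round(years))  # roughly 1900, consistent with the slide's "2000 years"
```

Under these assumptions the result is a little over 1900 years, so the slide's figure is a reasonable rounding.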
Big Data Scenarios: NetFlix
Slide 16 www.edureka.in/data-science http://smhttp.23575.nexcesscdn.net/80ABE1/sbmedia/blog/wp-content/uploads/2013/03/netflix-in-asia.png
Big Data Scenarios: The Large Hadron Collider
Slide 18 www.edureka.in/data-science http://www.crowdsourcing.org/article/-nasa-tries-to-free-creativity-with-big-data-challenge/19984
The experiments in the Large Hadron Collider produce
about 15 petabytes of data per year, which are
distributed over the Worldwide LHC Computing Grid.
One petabyte is enough to store the DNA of the
entire population of the USA, and then clone it twice.
Big Data Scenarios: The Large Hadron Collider
Slide 19 www.edureka.in/data-science http://en.wikipedia.org/wiki/Large_Hadron_Collider
IBM’s Definition – Big Data Characteristics
http://www-01.ibm.com/software/data/bigdata/
Data sources: web logs, audio, images, videos, sensor data
Slide 19 www.edureka.in/data-science
3 Vs of Big Data
 Volume: terabytes, records, transactions, tables, files
 Velocity: batch, near time, real time, streams
 Variety: structured, unstructured, semi-structured, all the above
Slide 20 www.edureka.in/data-science
Slide 22 www.edureka.in/data-science http://whatsthebigdata.files.wordpress.com/2013/11/batman-on-big-data.jpg
What about ‘Veracity’?
Hello There!!
My name is Annie.
I love quizzes and
puzzles and I am here to make
you guys think and answer my
questions.
Slide 22 www.edureka.in/data-science
Annie’s Introduction
Map the following to the corresponding type: Structured / Unstructured / Semi-structured.
- XML Files
- Word Docs, PDF files, Text files
- E-Mail body
- Data from Enterprise systems (ERP, CRM etc.)
Slide 23 www.edureka.in/data-science
Annie’s Question
XML Files -> Semi-structured data
Word Docs, PDF files, Text files -> Unstructured Data
E-Mail body -> Unstructured Data
Data from Enterprise systems (ERP, CRM etc.) -> Structured Data
Slide 24 www.edureka.in/data-science
Annie’s Answer
Big Data: Challenges
Slide 26 www.edureka.in/data-science http://spinnakr.com/blog/wp-content/uploads/2013/08/Using-Big-Data-.jpg
Big Data Challenges
 Data security and privacy
 High variety of information
 High veracity of data
 Data acquisition
 High velocity of processed data
 Information search and analytics
 High volume of data
 Information storage and analytics
Slide 27 www.edureka.in/data-science
Big Data: Challenges
Slide 28 www.edureka.in/data-science http://thesocietypages.org/sociologylens/files/2013/09/BIgDataDilbert_Cartoon.jpg
Data Science
Slide 29 www.edureka.in/data-science http://escience.washington.edu/blog/uw-berkeley-nyu-collaborate-378m-data-science-initiative
Data Science
“More data usually beats better algorithms,”
Such as: Recommending movies or music based on past preferences.
Slide 29 www.edureka.in/data-science
No matter how sophisticated your algorithm is, it can often be beaten simply by having
more data (and a less sophisticated algorithm).
Good News: Big Data is here.
Bad News: We are struggling to store and analyze it.
Data Science
Slide 30 www.edureka.in/data-science
Data Science: Components
Slide 32 www.edureka.in/data-science http://abstrusegoose.com/55
Data Science
Visualization
Advanced Computing
Domain Expertise
Statistics
Data Engineering
Data Science: Components
Slide 32 www.edureka.in/data-science
Data Science: Prospects
Slide 33 www.edureka.in/data-science
Types of Data Scientists
Based on clustering the ways that data is handled by Data Scientists, the following 4 categories can be created:
 Data Businesspeople are the product and profit-focused data scientists. They’re leaders, managers, and
entrepreneurs, but with a technical bent. A common educational path is an engineering degree paired with an
MBA.
 Data Creatives are eclectic jacks-of-all-trades, able to work with a broad range of data and tools. They may
think of themselves as artists or hackers, and excel at visualization and open source technologies.
 Data Developers are focused on writing software to do analytic, statistical, and machine learning tasks, often
in production environments. They often have computer science degrees, and often work with so-called “big
data”.
 Data Researchers apply their scientific training, and the tools and techniques they learned in academia, to
organizational data. They may have PhDs, and their creative applications of mathematical tools yield valuable
insights and products.
Slide 35 www.edureka.in/data-science http://datacommunitydc.org/blog/2013/06/there-is-more-than-one-kind-of-data-scientist/
Relationships - Four Categories and the Five Skill Groups
Slide 36 www.edureka.in/data-science http://datacommunitydc.org/blog/wp-content/uploads/2012/08/SkillsSelfIDMosaic-edit-500px.png
Data Science: Core Components
 Data Architecture (Tool: Hadoop)
 Machine Learning (Tool: Mahout)
 Analytics (Tool: R)
Slide 36 www.edureka.in/data-science
Use-Cases
Slide 37 www.edureka.in/data-science
No one Knows How to Use it
Slide 38 www.edureka.in/data-science
Use-Case Implementation: Techniques Used
A Problem
Dataset
Analysis
Results
Slide 39 www.edureka.in/data-science
Use-Case Implementation: Process Flow Diagram
1. Understanding the problem statement and defining the solution
2. Understanding the Machine Learning algorithm to be used
3. Implementing the Machine Learning algorithm in R on the smaller dataset
4. Exploring ways to integrate R with Hadoop
5. Implementing Machine Learning in Hadoop on Big Data
6. Visualisation of the analysis
Slide 40 www.edureka.in/data-science
Domain of the Dataset:
Communications and Media. However, the
application of the algorithm is not limited to only
Communications and Media. The technique is
useful for any domain which requires organizing
documents to improve retrieval and support
browsing.
Problem Statement:
A top media company wants to browse through
the popular news from a collection that appeared
on the Reuters newswire in 1987.
Clustering / Grouping documents based on their
contents will make the analysis easier.
Media Use-Case
The Reuters-21578 data set composition
Slide 41 www.edureka.in/data-science
Media Use-Case: K-means Clustering
Input: Communications and Media dataset to be clustered based on its contents
Machine Learning Implementation: K-Means Clustering can be implemented on this dataset
R Implementation: First we will understand the implementation of the technique in R on a smaller dataset
Hadoop Implementation: Then we will understand how to achieve document clustering on Big Data using Mahout libraries on Hadoop
Output: Content-wise clustered/grouped documents
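To make the idea of clustering documents by content concrete, here is a minimal K-means sketch in plain Python. It is not the course's R or Mahout implementation. The four toy "news" snippets, the word-count representation, and the choice of starting centroids are all assumptions made for illustration.

```python
from collections import Counter


def vectorize(docs):
    """Turn each document into a word-count vector over a shared vocabulary."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    vectors = []
    for d in docs:
        counts = Counter(d.lower().split())
        vectors.append([float(counts[w]) for w in vocab])
    return vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, init_centroids, iterations=10):
    """Plain K-means: alternate assignment and centroid-update steps."""
    centroids = [list(c) for c in init_centroids]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical toy documents standing in for newswire stories.
docs = [
    "oil prices rise crude oil",
    "crude oil exports fall",
    "wheat harvest grain exports",
    "grain prices wheat",
]
vectors = vectorize(docs)
# Assumption: seed the two clusters with the first and third documents.
labels = kmeans(vectors, init_centroids=[vectors[0], vectors[2]])
print(labels)  # [0, 0, 1, 1]: the oil stories and the grain stories separate
```

Real document clustering would normally weight terms (for example with TF-IDF) and choose starting centroids more carefully; raw counts and fixed seeds keep the sketch short and deterministic.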
Slide 42 www.edureka.in/data-science
Domain of the Dataset:
Products and Retail. However, the application of the
algorithm is not limited to only Products and Retail. The
technique can be applied wherever we want to discover
the co-occurrence relationship amongst various
activities.
Problem Statement:
Market Basket Analysis.
A retail outlet wants to understand the purchase behavior
of a buyer. This information will enable the retailer to
understand the buyer's needs.
The analysis might tell a retailer that customers often
purchase shampoo and conditioner together, so putting
both items on promotion at the same time would not create
a significant increase in profit, while a promotion
involving just one of the items would likely drive sales of
the other.
Market Basket Use-Case
Market Basket Analysis
98% of people
who purchased
items A and B
also purchased
item C
Slide 43 www.edureka.in/data-science
Market Basket Use-Case: Association Rule Mining
Input: Product and Retail dataset
Machine Learning Implementation: The technique used is Affinity Analysis or Association Rule Mining
R Implementation: Understand the implementation of the technique on a smaller dataset
Hadoop Implementation: Understand how to achieve the same on Big Data using Mahout libraries on Hadoop
Output: Market Basket Analysis
Slide 44 www.edureka.in/data-science
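A statement such as "98% of people who purchased items A and B also purchased item C" is the confidence of an association rule. The sketch below computes support and confidence on a handful of made-up baskets. The basket contents and the helper names `support` and `confidence` are assumptions for illustration, not part of the course material.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Of the transactions containing the antecedent, the share that also contain the consequent."""
    both = support(transactions, set(antecedent) | set(consequent))
    return both / support(transactions, antecedent)


# Hypothetical baskets; each set is one customer's purchase.
baskets = [
    {"shampoo", "conditioner", "soap"},
    {"shampoo", "conditioner"},
    {"shampoo", "soap"},
    {"conditioner", "soap"},
    {"shampoo", "conditioner", "soap"},
]

rule_support = support(baskets, {"shampoo", "conditioner"})
rule_confidence = confidence(baskets, {"shampoo", "conditioner"}, {"soap"})
print(rule_support)     # 0.6: three of five baskets hold both items
print(rule_confidence)  # about 0.67: two of those three also hold soap
```

The sketch does not guard against an antecedent that never occurs, which would divide by zero. Association rule mining algorithms typically avoid this by only considering itemsets above a minimum support.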
Slide 46 www.edureka.in/data-science
Domain of the Dataset:
Life Science and Health Care. However, the application
of the algorithm is not limited to Life Science and
Health Care. The technique can be applied wherever
we want to forecast the occurrence of an event on the
basis of certain conditions.
Problem Statement:
A health care organization wants to forecast the onset
of diabetes mellitus in Indians using a certain set of
attributes of patients as input, such as:
 Plasma glucose concentration
 Diastolic blood pressure
 Triceps skin fold thickness
etc.
Health Care Use-Case
http://www.thenewstribe.com/2013/11/15/diabetes-is-killing-one-patient-every-six-seconds/
Slide 47 www.edureka.in/data-science
Health Care Use-Case: Parallel Processing
Input: Life Science and Health Care dataset with some attributes of patients as input
Machine Learning Implementation: The technique used is Affinity Analysis or Association Rule Mining.
R Implementation: Understand the basic implementation of the technique on a smaller dataset using R. Achieve parallel processing on the same algorithm using a parallel processing library provided by Revolution R.
Hadoop Implementation: Understand how to achieve the same on Big Data using Mahout libraries on Hadoop
Output: Forecast the onset of diabetes mellitus in Indians
Slide 48 www.edureka.in/data-science
Domain of the Dataset:
Social Media. However, the application of the
algorithm is not limited to only Social Media. The
technique can be applied wherever we want to put
documents into categories without going through
the contents of all the documents.
Problem Statement:
A Social Media research firm wants to know the
trends of topics discussed on Twitter. For easy
analysis it wants to classify them in the following
categories:
 apparel (clothes, shoes, watches, …)
 art (Book, DVD, Music, …)
 camera
 event (travel, concert, …)
 health (beauty, spa, …)
 home (kitchen, furniture, garden, …)
 tech (computer, laptop, tablet, …)
http://www.mobigyaan.com/images/stories/Miscellaneous/mobigyaan-twitter-chat.jpg
Social Media Use-Case
Social Media Use-Case: Naïve Bayes Classifier
Input: Social Media dataset
Machine Learning Implementation: The technique used is Naïve Bayes Classifier.
R Implementation: Understand the basic implementation of the technique on a smaller dataset using R.
Hadoop Implementation: Understand how to achieve the same on Big Data using Mahout libraries on Hadoop.
Output: Categorical classification of the tweets
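As a rough illustration of how a Naïve Bayes classifier can sort tweets into categories, here is a small multinomial Naïve Bayes sketch with add-one (Laplace) smoothing in plain Python. The four training tweets, the two categories used, and the function names are assumptions for illustration. The course itself uses R and Mahout for this step.

```python
import math
from collections import Counter, defaultdict


def train(labelled_docs):
    """Count documents per class and words per class from (text, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in labelled_docs:
        class_counts[label] += 1
        for w in text.lower().split():
            word_counts[label][w] += 1
            vocab.add(w)
    return class_counts, word_counts, vocab


def classify(text, model):
    """Return the class with the highest log-posterior under Naive Bayes."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        # Log prior: share of training documents in this class.
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for w in text.lower().split():
            if w not in vocab:
                continue  # ignore words never seen in training
            # Log likelihood with add-one smoothing.
            score += math.log((word_counts[label][w] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical labelled tweets for two of the categories on the slide.
tweets = [
    ("new laptop and tablet deals", "tech"),
    ("my laptop screen is broken", "tech"),
    ("spa day and beauty tips", "health"),
    ("beauty sleep and spa weekend", "health"),
]
model = train(tweets)
print(classify("cheap tablet or laptop", model))  # tech
print(classify("relaxing spa", model))            # health
```

Working in log space avoids numeric underflow when many word probabilities are multiplied, and the smoothing keeps a single unseen word-class pair from zeroing out a class.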
Slide 48 www.edureka.in/data-science
Going forward with the class, we will shed some light on the concepts of
Hadoop, R and Machine Learning respectively.
These topics will be covered in detail in their respective modules during the course.
Data Science: Core Components
Slide 49 www.edureka.in/data-science
Introduction to Hadoop
Slide 50 www.edureka.in/data-science
 Apache Hadoop is a framework that allows for the distributed
processing of large data sets across clusters of commodity
computers using a simple programming model.
 It is an open-source data management framework with scale-out
storage and distributed processing.
 In 2004, Google published a paper on a process called
MapReduce.
 The MapReduce framework provides a parallel processing model
and an associated implementation to process huge amounts of
data.
 An implementation of the MapReduce framework was later
adopted by an Apache open source project named Hadoop.
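The MapReduce programming model can be sketched on a single machine with the classic word-count example. The code below only imitates the map, shuffle and reduce phases in plain Python. It does not use Hadoop's API, and the function names are made up for illustration.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


counts = word_count(["big data is big", "data science uses data"])
print(counts["data"])  # 3
print(counts["big"])   # 2
```

In a real cluster the map calls run in parallel on the nodes that hold the data blocks and the reduce calls run in parallel per key. The sketch only shows the data flow.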
Introduction to Hadoop
Slide 51 www.edureka.in/data-science
Hadoop Key Characteristics
Scalable
Reliable
Economical
Flexible
Robust
Ecosystem
Hadoop Key
Characteristics
Slide 52 www.edureka.in/data-science
Hadoop Core Components
HDFS Cluster: Name node (Admin Node) and Data Nodes
MapReduce Engine: Job Tracker (Admin Node) and Task Trackers
Each worker machine runs a Data Node and a Task Tracker
Slide 53 www.edureka.in/data-science
Hadoop is a framework that allows for the distributed processing of:
- Small Data Sets
- Large Data Sets
Slide 54 www.edureka.in/data-science
Annie’s Question
Large Data Sets. Hadoop can also process small data sets; however, to
experience its true power one needs data in the terabyte range, because
this is where an RDBMS can take hours or fail, whereas Hadoop can do the same work in
a couple of minutes.
Slide 55 www.edureka.in/data-science
Annie’s Answer
For setting-up Hadoop on your system you can follow the “Hadoop Installation Guide” present in the LMS.
Slide 56 www.edureka.in/data-science
Analytics with R
Slide 57 www.edureka.in/data-science
Analytics with R
Slide 59 www.edureka.in/data-science http://www.r-project.org/
R : Characteristics
Slide 59 www.edureka.in/data-science
 R is open source and free.
 R has lots of packages and multiple ways of doing the same thing.
 By default, R stores data in RAM.
 R has the most advanced graphics. You need much better programming skills.
 R has GUIs to help make learning easier.
 Customization needs the command line.
 R can connect to many databases and data types.
Comparing R and others
http://r4stats.com/articles/popularity/
Comparing R
Slide 60 www.edureka.in/data-science
Comparing R with Base SAS* / SAS Stat*
R | Base SAS* / SAS Stat*
R is open source and free | Base SAS*, SAS/Stat*, SAS/ET*, SAS/OR*, SAS/Graph* are relatively expensive because of annual licenses
Open source R has support from email lists, Twitter, Stack Overflow | SAS Institute* products have dedicated support and extensive documentation
R is slower on the desktop than Base SAS for datasets of ~4-5 GB | By default R stores data in RAM, so we can use the cloud
R has much better graphics | You need much better programming skills
You can create custom functions in R easily | Customization needs command line
R has multiple GUIs that are free | SAS GUIs are more expensive
Slide 62 www.edureka.in/data-science *Copyright © 2012 SAS Institute Inc., SAS Campus Drive, Cary, North Carolina 27513, USA. All rights reserved.
Annie’s Question
R Provides support in terms of:
1. Dedicated Support and Documentation
2. Email-lists, twitter, etc.
Slide 62 www.edureka.in/data-science
Annie’s Answer
Answer:
2. Email-lists, twitter, etc.
Slide 63 www.edureka.in/data-science
Annie’s Question
Custom functions can be easily created in :
1. SAS
2. R
Slide 64 www.edureka.in/data-science
Annie’s Answer
Answer:
2. R
Slide 65 www.edureka.in/data-science
Annie’s Question
Most of the functions in R are written in :
- Java
- R
- C
- Fortran
Slide 66 www.edureka.in/data-science
Annie’s Answer
Most of the user-visible functions in R are written in R.
It is possible for the user to interface to procedures written in the C, C++, or
FORTRAN languages for efficiency.
Slide 67 www.edureka.in/data-science
Introduction to R Programming language
www.r-project.org/about.html
 History
 Evolution
 Current State
Slide 68 www.edureka.in/data-science
 Open Source
 Free
 Widely Recognized
 Official Website
 R Core
 Creators
 R Journal
R and Hadoop Integration
 R and Hadoop are a natural match in Big Data analytics and visualization.
 One of the most well-known R packages to support Hadoop functionalities is RHadoop.
 RHadoop was developed by Revolution Analytics.
 RHadoop is a collection of three R packages: rmr, rhdfs and rhbase.
 The rmr package provides Hadoop MapReduce functionality in R, rhdfs provides HDFS
management in R and rhbase provides HBase database management from within R.
Slide 69 www.edureka.in/data-science
For setting-up R on your system you can follow the “R Installation Guide” present in the LMS under
module 1.
Slide 70 www.edureka.in/data-science
Machine Learning
Slide 71 www.edureka.in/data-science
Slide 73 www.edureka.in/data-science
Machine Learning: Mahout
 Machine Learning is a class of algorithms which is data-driven, i.e. unlike "normal" algorithms, it is
the data that "tells" what the "good answer" is.
Example:
A hypothetical non-machine-learning algorithm for face recognition in images
would try to define what a face is (a round, skin-colored disk, with dark areas where you expect the
eyes, etc.).
A machine learning algorithm would not have such a coded definition, but would
"learn by examples": you show it several images of faces and non-faces, and a good
algorithm will eventually learn and be able to predict
whether or not an unseen image is a face.
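The "learn by examples" idea can be shown with one of the simplest data-driven methods, a nearest-neighbour classifier: it holds no coded definition of a face and just labels a new item like its closest known example. The two numeric features and all example values below are invented for illustration. Real face recognition works on far richer image features.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_neighbour(examples, query):
    """Return the label of the labelled example closest to `query`."""
    _, label = min(examples, key=lambda ex: sq_dist(ex[0], query))
    return label


# Hypothetical features per image: (roundness, share of skin-toned pixels).
examples = [
    ((0.9, 0.8), "face"),
    ((0.8, 0.9), "face"),
    ((0.2, 0.1), "not-face"),
    ((0.1, 0.3), "not-face"),
]

print(nearest_neighbour(examples, (0.85, 0.70)))  # face
print(nearest_neighbour(examples, (0.15, 0.20)))  # not-face
```

Changing the labelled examples changes the predictions without touching the code, which is the sense in which the data "tells" the algorithm what the good answer is.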
http://endthelie.com/2012/08/24/fbi-sharing-facial-recognition-software-with-police-departments-across-america/
Mahout Overview
Mahout is about scalable Machine Learning
Mahout has functionality for many of today’s common machine learning tasks
Machine Learning is all over the web today
Hadoop and MapReduce magic in action
Slide 73 www.edureka.in/data-science
https://cwiki.apache.org/confluence/display/MAHOUT/Powered+By+Mahout
Write intelligent applications using Apache Mahout
LinkedIn Recommendations
Machine Learning: LinkedIn Recommendations
Slide 74 www.edureka.in/data-science
Annie’s Question
Mahout Algorithms for clustering, classification and collaborative filtering are
implemented on top of Apache Hadoop using :
- Flume
- MapReduce
- Sqoop
- Hive
Slide 75 www.edureka.in/data-science
Annie’s Answer
Mahout Algorithms are implemented on top of Apache Hadoop using the
Map/Reduce paradigm.
Slide 76 www.edureka.in/data-science
1. Install R with the help of “R Installation Steps” guide in the LMS. This is a step wise guide which will help you in
installing and setting up R on your system
Slide 77 www.edureka.in/data-science
Assignment
Agenda for Next Class
Slide 78 www.edureka.in/data-science
In the next class you will be able to
 Understand what R is
 Describe why R is used
 Implement R Programming Concepts
 Learn Data Import Techniques
 Analyze the Processing of Data
Pre-work
Go through the “R Essentials for Data Science” section in the LMS. Watch the recordings present in the
section to gain an understanding of the R environment.
Slide 79 www.edureka.in/data-science
What’s Within the LMS?
Slide 80 www.edureka.in/data-science
What’s Within the LMS?
Recording
of the Class
Presentation
Quiz
Slide 81 www.edureka.in/data-science
What’s Within the LMS?
Assignment
Installation
Guide
Pre-work
Slide 82 www.edureka.in/data-science
References
Slide 83 www.edureka.in/data-science
http://www.today.mccombs.utexas.edu/2012/04/the-big-data-machine
http://www.espncricinfo.com/
http://www.majorprojects.vic.gov.au/our-projects/our-past-projects/austin-hospital
http://wp.streetwise.co/wp-content/uploads/2012/08/Amazon-Recommendations.png
http://smhttp.23575.nexcesscdn.net/80ABE1/sbmedia/blog/wp-content/uploads/2013/03/netflix-in-asia.png
http://www.crowdsourcing.org/article/-nasa-tries-to-free-creativity-with-big-data-challenge/19984
http://whatsthebigdata.files.wordpress.com/2013/11/batman-on-big-data.jpg
http://spinnakr.com/blog/wp-content/uploads/2013/08/Using-Big-Data-.jpg
http://thesocietypages.org/sociologylens/files/2013/09/BIgDataDilbert_Cartoon.jpg
http://abstrusegoose.com/55
http://www.thenewstribe.com/2013/11/15/diabetes-is-killing-one-patient-every-six-seconds/
http://www.mobigyaan.com/images/stories/Miscellaneous/mobigyaan-twitter-chat.jpg
http://www.r-project.org/
http://endthelie.com/2012/08/24/fbi-sharing-facial-recognition-software-with-police-departments-across-america/
 
Introduction To TensorFlow | Deep Learning Using TensorFlow | TensorFlow Tuto...
Introduction To TensorFlow | Deep Learning Using TensorFlow | TensorFlow Tuto...Introduction To TensorFlow | Deep Learning Using TensorFlow | TensorFlow Tuto...
Introduction To TensorFlow | Deep Learning Using TensorFlow | TensorFlow Tuto...Edureka!
 
Azure Interview Questions And Answers | Azure Tutorial For Beginners | Azure ...
Azure Interview Questions And Answers | Azure Tutorial For Beginners | Azure ...Azure Interview Questions And Answers | Azure Tutorial For Beginners | Azure ...
Azure Interview Questions And Answers | Azure Tutorial For Beginners | Azure ...Edureka!
 
React Components Lifecycle | React Tutorial for Beginners | ReactJS Training ...
React Components Lifecycle | React Tutorial for Beginners | ReactJS Training ...React Components Lifecycle | React Tutorial for Beginners | ReactJS Training ...
React Components Lifecycle | React Tutorial for Beginners | ReactJS Training ...Edureka!
 
Machine Learning In Python | Python Machine Learning Tutorial | Deep Learning...
Machine Learning In Python | Python Machine Learning Tutorial | Deep Learning...Machine Learning In Python | Python Machine Learning Tutorial | Deep Learning...
Machine Learning In Python | Python Machine Learning Tutorial | Deep Learning...Edureka!
 
ReactJS Tutorial For Beginners | ReactJS Redux Training For Beginners | React...
ReactJS Tutorial For Beginners | ReactJS Redux Training For Beginners | React...ReactJS Tutorial For Beginners | ReactJS Redux Training For Beginners | React...
ReactJS Tutorial For Beginners | ReactJS Redux Training For Beginners | React...Edureka!
 
Big Data Tutorial For Beginners | What Is Big Data | Big Data Tutorial | Hado...
Big Data Tutorial For Beginners | What Is Big Data | Big Data Tutorial | Hado...Big Data Tutorial For Beginners | What Is Big Data | Big Data Tutorial | Hado...
Big Data Tutorial For Beginners | What Is Big Data | Big Data Tutorial | Hado...Edureka!
 


Introduction to Data Science

  • 2. LIVE On-line Class Class Recording in LMS 24/7 Post Class Support Module Wise Quiz Project Work on Large Data Base Verifiable Certificate How it Works? Slide 2 www.edureka.in/data-science
  • 3. Topics for the Day
      - Big Data
      - Big Data Scenarios
      - Big Data Challenges
      - Introduction to Data Science
      - Data Science: Components
      - Types of Data Scientists
      - Data Science: Core Components
      - Use-Cases
      - Introduction to Hadoop and R
      - R and Hadoop Integration
      - Machine Learning with Mahout
      - References
  • 4. Objectives: At the end of this module, you will be able to
      - Understand Big Data and its challenges
      - Implement Big Data in real time scenarios
      - List and explain the components and prospects of Data Science
      - Learn the implementation of Hadoop on Big Data
      - Analyze some real world use-cases with the help of the R programming language
      - Understand machine learning concepts
  • 5. Data Science
  • 6. Big Data
  • 7. What is Big Data? Lots of data (terabytes or petabytes): systems and enterprises generate huge amounts of data, from terabytes up to petabytes of information. Source: http://www.today.mccombs.utexas.edu/2012/04/the-big-data-machine
  • 8. Big Data Scenarios. Source: http://www.clker.com/clipart-13967.html
  • 9. Big Data Scenarios: Sports. Source: http://www.espncricinfo.com/
  • 10. Big Data Scenarios: Sports. Sports teams are using data to track ticket sales and even team strategies. Advertising and marketing agencies are tracking social media to understand responsiveness to campaigns, promotions, and other advertising media. Source: http://www.espncricinfo.com/
  • 11. Big Data Scenarios: Hospital Care. Source: http://www.majorprojects.vic.gov.au/our-projects/our-past-projects/austin-hospital
  • 12. Big Data Scenarios: Hospital Care. Hospitals are analyzing medical data and patient records to predict which patients are likely to seek readmission within a few months of discharge. The hospital can then intervene in hopes of preventing another costly hospital stay. A medical diagnostics company analyzed millions of lines of data to develop the first non-intrusive test for predicting coronary artery disease. To do so, researchers at the company analyzed over 100 million gene samples to ultimately identify the 23 primary predictive genes for coronary artery disease.
  • 13. Big Data Scenarios: Amazon.com. Source: http://wp.streetwise.co/wp-content/uploads/2012/08/Amazon-Recommendations.png
  • 14. Big Data Scenarios: Amazon.com. Amazon has an unrivalled bank of data on online consumer purchasing behaviour that it can mine from its 152 million customer accounts. Amazon also uses Big Data to monitor, track and secure the 1.5 billion items in its retail store that are lying around its 200 fulfilment centres around the world. Amazon stores the product catalogue data in S3, which can write, read and delete objects of up to 5 TB each. The catalogue stored in S3 receives more than 50 million updates a week, and every 30 minutes all data received is crunched and reported back to the different warehouses and the website. Source: http://wp.streetwise.co/wp-content/uploads/2012/08/Amazon-Recommendations.png
  • 15. Big Data Scenarios: Netflix. Source: http://smhttp.23575.nexcesscdn.net/80ABE1/sbmedia/blog/wp-content/uploads/2013/03/netflix-in-asia.png
  • 16. Big Data Scenarios: Netflix. Netflix uses 1 petabyte to store the videos for streaming. BitTorrent Sync has transferred over 30 petabytes of data since its pre-alpha release in January 2013. The 2009 movie Avatar is reported to have taken over 1 petabyte of local storage at Weta Digital for the rendering of the 3D CGI effects. One petabyte of average MP3-encoded songs (for mobile, roughly one megabyte per minute) would take about 2000 years to play. Source: http://smhttp.23575.nexcesscdn.net/80ABE1/sbmedia/blog/wp-content/uploads/2013/03/netflix-in-asia.png
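The MP3 playback figure on the slide above can be checked with a little arithmetic. The following is a rough Python sketch, assuming decimal units (1 MB = 10^6 bytes, 1 PB = 10^15 bytes) and 365-day years; neither assumption is stated in the deck, and `playback_years` is a name made up for this illustration.

```python
# Rough check of the "one petabyte of MP3 audio" figure.
# Assumptions (not from the slides): decimal units and 365-day years.
BYTES_PER_MB = 10**6
BYTES_PER_PB = 10**15
MB_PER_MINUTE = 1  # the slide's "roughly one megabyte per minute"


def playback_years(total_bytes, mb_per_minute=MB_PER_MINUTE):
    """Return how many years it takes to play total_bytes of audio."""
    minutes = total_bytes / (BYTES_PER_MB * mb_per_minute)
    return minutes / (60 * 24 * 365)


years = playback_years(BYTES_PER_PB)
print(f"{years:.0f} years")
```

Under these assumptions the result lands a little above 1,900 years, which is consistent with the slide's rounded "2000 years".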
  • 17. Big Data Scenarios: The Large Hadron Collider. Source: http://www.crowdsourcing.org/article/-nasa-tries-to-free-creativity-with-big-data-challenge/19984
  • 18. Big Data Scenarios: The Large Hadron Collider. The experiments in the Large Hadron Collider produce about 15 petabytes of data per year, which are distributed over the Worldwide LHC Computing Grid. One petabyte is enough to store the DNA of the entire population of the USA, and then clone it twice. Source: http://en.wikipedia.org/wiki/Large_Hadron_Collider
  • 19. IBM’s Definition – Big Data Characteristics: Volume, Velocity, Variety. Example data types: web logs, audio, images, videos, sensor data. Source: http://www-01.ibm.com/software/data/bigdata/
  • 20. IBM’s Definition – Big Data Characteristics: the 3 Vs of Big Data
      - Volume: terabytes; records; transactions; tables, files
      - Velocity: batch; near time; real time; streams
      - Variety: structured; unstructured; semi-structured; all the above
      Source: http://www-01.ibm.com/software/data/bigdata/
  • 22. Annie’s Introduction: Hello there! My name is Annie. I love quizzes and puzzles, and I am here to make you think and answer my questions.
  • 23. Annie’s Question: Map the following to the corresponding type (structured, unstructured, or semi-structured):
      - XML files
      - Word docs, PDF files, text files
      - E-mail body
      - Data from enterprise systems (ERP, CRM etc.)
  • 24. Annie’s Answer
      - XML files -> semi-structured data
      - Word docs, PDF files, text files -> unstructured data
      - E-mail body -> unstructured data
      - Data from enterprise systems (ERP, CRM etc.) -> structured data
  • 25. Big Data: Challenges. Source: http://spinnakr.com/blog/wp-content/uploads/2013/08/Using-Big-Data-.jpg
  • 26. Big Data: Challenges
      - Data security and privacy
      - High variety of information
      - High veracity of data
      - Data acquisition
      - High velocity of processed data
      - Information search and analytics
      - High volume of data
      - Information storage and analytics
  • 28. Data Science. Source: http://escience.washington.edu/blog/uw-berkeley-nyu-collaborate-378m-data-science-initiative
  • 29. Data Science: “More data usually beats better algorithms.” Example: recommending movies or music based on past preferences.
  • 30. Data Science: No matter how sophisticated your algorithm is, it can often be beaten simply by having more data (and a less sophisticated algorithm). Good news: Big Data is here. Bad news: we are struggling to store and analyze it.
  • 31. Data Science: Components. Source: http://abstrusegoose.com/55
  • 32. Data Science: Components
      - Visualization
      - Advanced Computing
      - Domain Expertise
      - Statistics
      - Data Engineering
  • 33. Data Science: Prospects
  • 34. Types of Data Scientists. Based on clustering the ways that data is handled by data scientists, the following four categories can be created:
      - Data Businesspeople are the product- and profit-focused data scientists. They’re leaders, managers, and entrepreneurs, but with a technical bent. A common educational path is an engineering degree paired with an MBA.
      - Data Creatives are eclectic jacks-of-all-trades, able to work with a broad range of data and tools. They may think of themselves as artists or hackers, and excel at visualization and open source technologies.
      - Data Developers are focused on writing software to do analytic, statistical, and machine learning tasks, often in production environments. They often have computer science degrees, and often work with so-called “big data”.
      - Data Researchers apply their scientific training, and the tools and techniques they learned in academia, to organizational data. They may have PhDs, and their creative applications of mathematical tools yield valuable insights and products.
      Source: http://datacommunitydc.org/blog/2013/06/there-is-more-than-one-kind-of-data-scientist/
  • 35. Relationships: Four Categories and the Five Skill Groups. Source: http://datacommunitydc.org/blog/wp-content/uploads/2012/08/SkillsSelfIDMosaic-edit-500px.png
  • 36. Data Science: Core Components
      - Data Architecture (tool: Hadoop)
      - Machine Learning (tool: Mahout)
      - Analytics (tool: R)
  • 38. No One Knows How to Use It
  • 39. Use-Case Implementation: Techniques Used. A problem, a dataset, analysis, results.
  • 40. Use-Case Implementation: Process Flow Diagram
      1. Understanding the problem statement and defining the solution
      2. Understanding the Machine Learning algorithm to be used
      3. Implementing the Machine Learning algorithm in R on the smaller dataset
      4. Exploring ways to integrate R with Hadoop
      5. Implementing Machine Learning in Hadoop on Big Data
      6. Visualisation of the analysis
  • 41. Media Use-Case
      - Domain of the dataset: Communications and Media. However, the application of the algorithm is not limited to Communications and Media. The technique is useful for any domain which requires organizing documents to improve retrieval and support browsing.
      - Problem statement: A top media company wants to browse through the popular news from a collection that appeared on the Reuters newswire in 1987. Clustering (grouping) documents based on their contents will make the analysis easier.
      - Figure: the Reuters-21578 data set composition
  • 42. Media Use-Case: K-means Clustering
      - Dataset: Communications and Media documents, to be clustered based on their contents
      - Machine Learning implementation: K-means clustering can be implemented on this dataset
      - R implementation: first we will understand the implementation of the technique in R on a smaller dataset
      - Hadoop implementation: then we will understand how to achieve document clustering on Big Data using Mahout libraries on Hadoop
      - Result: content-wise clustered/grouped documents
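To make the document-clustering idea concrete, here is a minimal K-means sketch in plain Python. It is not the R or Mahout implementation covered in the course: the four toy headlines, the term-count vectors, and the choice of initial centroids are all assumptions made for illustration, and the helper names (`vectorize`, `kmeans`) are hypothetical.

```python
from collections import Counter


def vectorize(docs):
    """Turn each document into a term-count vector over a shared vocabulary."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    vectors = []
    for d in docs:
        counts = Counter(d.lower().split())
        vectors.append([float(counts[w]) for w in vocab])
    return vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def kmeans(points, init_centroids, max_iter=100):
    """Alternate assignment and centroid update until labels stop changing."""
    centroids = [list(c) for c in init_centroids]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = mean(members)
    return labels, centroids


# Toy stand-ins for newswire headlines (made up for this sketch).
docs = [
    "oil prices rise",
    "oil prices fall",
    "wheat harvest grows",
    "wheat harvest shrinks",
]
vectors = vectorize(docs)
labels, centroids = kmeans(vectors, init_centroids=[vectors[0], vectors[2]])
print(labels)  # [0, 0, 1, 1]
```

The two oil headlines end up in one cluster and the two wheat headlines in the other. Real text clustering would normally use weighted features (for example TF-IDF) and several random initialisations, which this sketch leaves out.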
  • 43. Market Basket Use-Case
      - Domain of the dataset: Products and Retail. However, the application of the algorithm is not limited to Products and Retail. The technique can be applied wherever we want to discover the co-occurrence relationship amongst various activities.
      - Problem statement: Market Basket Analysis. A retail outlet wants to understand the purchase behavior of a buyer. This information will enable the retailer to understand the buyer's needs. The analysis might tell a retailer that customers often purchase shampoo and conditioner together, so putting both items on promotion at the same time would not create a significant increase in profit, while a promotion involving just one of the items would likely drive sales of the other.
      - Example of a Market Basket Analysis finding: 98% of people who purchased items A and B also purchased item C.
  • 44. Market Basket Use-Case: Association Rule Mining
      - Dataset: Product and Retail
      - Machine Learning implementation: the technique used is Affinity Analysis or Association Rule Mining
      - R implementation: understand the implementation of the technique on a smaller dataset
      - Hadoop implementation: understand how to achieve the same on Big Data using Mahout libraries on Hadoop
      - Result: Market Basket Analysis
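Association rules such as "people who buy A and B also buy C" are usually scored by support and confidence. The sketch below computes both in plain Python on a made-up list of five baskets; the data and the function names are assumptions for illustration, not part of the course material, which uses R and Mahout.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate P(consequent | antecedent) from the transactions."""
    both = support(transactions, set(antecedent) | set(consequent))
    base = support(transactions, antecedent)
    return both / base if base else 0.0


# Hypothetical shopping baskets, made up for this sketch.
baskets = [
    {"shampoo", "conditioner", "soap"},
    {"shampoo", "conditioner"},
    {"shampoo", "soap"},
    {"conditioner", "toothpaste"},
    {"shampoo", "conditioner", "toothpaste"},
]

sup = support(baskets, {"shampoo", "conditioner"})
conf = confidence(baskets, {"shampoo"}, {"conditioner"})
print(f"support={sup:.2f} confidence={conf:.2f}")
```

Here three of the five baskets contain both items (support 0.60), and three of the four baskets with shampoo also contain conditioner (confidence 0.75). Algorithms such as Apriori build on these two measures to search many candidate itemsets efficiently.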
  • 45. Health Care Use-Case
      - Domain of the dataset: Life Science and Health Care. However, the application of the algorithm is not limited to Life Science and Health Care. The technique can be applied wherever we want to forecast the occurrence of an event on the basis of certain conditions.
      - Problem statement: A health care organization wants to forecast the onset of diabetes mellitus in Indians using a certain set of patient attributes as input, such as plasma glucose concentration, diastolic blood pressure, triceps skin fold thickness, etc.
      Source: http://www.thenewstribe.com/2013/11/15/diabetes-is-killing-one-patient-every-six-seconds/
  • 46. Health Care Use-Case: Parallel Processing
      - Dataset: Life Science and Health Care dataset with some attributes of patients as input
      - Machine Learning implementation: the technique used is Affinity Analysis or Association Rule Mining
      - R implementation: understand the basic implementation of the technique on a smaller dataset using R, then achieve parallel processing on the same algorithm using a parallel processing library provided by Revolution R
      - Hadoop implementation: understand how to achieve the same on Big Data using Mahout libraries on Hadoop
      - Result: forecast the onset of diabetes mellitus in Indians
  • 47. Social Media Use-Case
      - Domain of the dataset: Social Media. However, the application of the algorithm is not limited to Social Media. The technique can be applied wherever we want to put documents into categories without going through the contents of all the documents.
      - Problem statement: A social media research firm wants to know the trends of topics discussed on Twitter. For easy analysis it wants to classify them in the following categories:
          - apparel (clothes, shoes, watches, …)
          - art (book, DVD, music, …)
          - camera
          - event (travel, concert, …)
          - health (beauty, spa, …)
          - home (kitchen, furniture, garden, …)
          - tech (computer, laptop, tablet, …)
      Source: http://www.mobigyaan.com/images/stories/Miscellaneous/mobigyaan-twitter-chat.jpg
  • 48. Social Media Use-Case: Naïve Bayes Classifier
      - Dataset: Social Media dataset
      - Machine Learning implementation: the technique used is the Naïve Bayes Classifier
      - R implementation: understand the basic implementation of the technique on a smaller dataset using R
      - Hadoop implementation: understand how to achieve the same on Big Data using Mahout libraries on Hadoop
      - Result: categorical classification of the tweets
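As a rough illustration of how a Naïve Bayes classifier assigns a tweet to a category, here is a small multinomial Naïve Bayes sketch in plain Python with add-one (Laplace) smoothing. The four training tweets are invented, only two of the slide's categories are used, and `train` and `predict` are hypothetical helper names; the course itself uses R and Mahout for this.

```python
import math
from collections import Counter, defaultdict


def train(examples):
    """Collect the counts a multinomial Naive Bayes model needs.

    examples is a list of (text, label) pairs.
    """
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        for w in text.lower().split():
            word_counts[label][w] += 1
            vocab.add(w)
    return label_counts, word_counts, vocab


def predict(model, text):
    """Return the label with the highest log-posterior for text."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        score = math.log(label_counts[label] / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for w in text.lower().split():
            if w not in vocab:
                continue  # ignore words never seen in training
            # Add-one smoothing avoids zero probabilities.
            score += math.log(
                (word_counts[label][w] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented training tweets for two of the categories on the slide.
tweets = [
    ("new laptop and tablet deals", "tech"),
    ("my laptop screen broke", "tech"),
    ("bought new shoes and a watch", "apparel"),
    ("these shoes match my watch", "apparel"),
]
model = train(tweets)
print(predict(model, "cheap tablet and laptop"))  # tech
print(predict(model, "love these shoes"))         # apparel
```

The "naïve" part is the assumption that words are independent given the category, which is why the per-word log probabilities can simply be added.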
  • 49. Data Science: Core Components. Going forward with the class, we will throw some light on the concepts of Hadoop, R and Machine Learning respectively. These topics will be covered in detail in their respective modules during the course.
  • 50. Introduction to Hadoop
  • 51. Introduction to Hadoop
      - Apache Hadoop is a framework that allows for the distributed processing of large data sets across clusters of commodity computers using a simple programming model.
      - It is an open-source data management system with scale-out storage and distributed processing.
      - In 2004, Google published a paper on a process called MapReduce.
      - The MapReduce framework provides a parallel processing model and an associated implementation to process huge amounts of data.
      - An implementation of the MapReduce framework was subsequently adopted by an Apache open source project named Hadoop.
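The MapReduce programming model mentioned above can be sketched on a single machine. The Python example below imitates the map, shuffle, and reduce phases for a word count; it is only an in-memory illustration of the model under that assumption, not Hadoop's actual API, and the function names are made up for this sketch.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence on an in-memory list of lines."""
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


counts = word_count(["big data is big", "data science uses data"])
print(counts)
```

In a real cluster the map and reduce calls run in parallel on different nodes and the shuffle moves data across the network; the appeal of the model is that the programmer only writes the two small functions.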
  • 52. Hadoop Key Characteristics: Scalable, Reliable, Economical, Flexible, Robust Ecosystem
  • 53. Hadoop Core Components
      - HDFS cluster: Name Node on the admin node, plus Data Nodes
      - MapReduce engine: Job Tracker on the admin node, plus Task Trackers
      - Each of the four worker nodes shown runs a Data Node and a Task Tracker
  • 54. Annie’s Question: Hadoop is a framework that allows for the distributed processing of:
      - Small data sets
      - Large data sets
  • 55. Annie’s Answer: Large data sets. Hadoop can also process small data sets; however, to experience its true power one needs data in the terabyte range, because this is where an RDBMS takes hours and fails, whereas Hadoop does the same in a couple of minutes.
  • 56. For setting up Hadoop on your system, you can follow the “Hadoop Installation Guide” present in the LMS.
  • 57. Analytics with R
  • 58. Analytics with R. Source: http://www.r-project.org/
  • 59. R: Characteristics
      - R is open source and free.
      - R has lots of packages and multiple ways of doing the same thing.
      - By default, R stores data in memory (RAM).
      - R has the most advanced graphics. You need much better programming skills.
      - R has GUIs to help make learning easier.
      - Customization needs the command line.
      - R can connect to many databases and data types.
  • 60. Comparing R and others. Source: http://r4stats.com/articles/popularity/
  • 61. Comparing R with Base SAS*/SAS Stat*
      - R is open source and free. Base SAS*, SAS/Stat*, SAS/ET*, SAS/OR* and SAS/Graph* are relatively expensive because of annual licenses.
      - Open source R has support from email lists, Twitter and Stack Overflow. SAS Institute* products have dedicated support and extensive documentation.
      - R is slower on the desktop than Base SAS for datasets of about 4-5 GB. By default R stores data in memory (RAM), so we can use the cloud.
      - R has much better graphics. You need much better programming skills.
      - You can create custom functions in R easily. Customization needs the command line.
      - R has multiple GUIs that are free. SAS GUIs are more expensive.
      *Copyright © 2012 SAS Institute Inc., SAS Campus Drive, Cary, North Carolina 27513, USA. All rights reserved.
  • 62. Annie’s Question: R provides support in terms of: 1. Dedicated support and documentation 2. Email lists, Twitter, etc.
  • 63. Annie’s Answer: 2. Email lists, Twitter, etc.
  • 64. Annie’s Question: Custom functions can be easily created in: 1. SAS 2. R
  • 65. Annie’s Answer: 2. R
  • 66. Annie’s Question: Most of the functions in R are written in:
      - Java
      - R
      - C
      - Fortran
  • 67. Annie’s Answer: Most of the user-visible functions in R are written in R. It is possible for the user to interface to procedures written in the C, C++, or FORTRAN languages for efficiency.
  • 68. Introduction to R Programming Language (www.r-project.org/about.html)
      - History; Evolution; Current State
      - Open Source; Free; Widely Recognized
      - Official Website; R Core; Creators; R Journal
  • 69. R and Hadoop Integration
      - R and Hadoop are a natural match in Big Data analytics and visualization.
      - One of the most well-known R packages to support Hadoop functionality is RHadoop.
      - RHadoop was developed by Revolution Analytics.
      - RHadoop is a collection of three R packages: rmr, rhdfs and rhbase. The rmr package provides Hadoop MapReduce functionality in R, rhdfs provides HDFS file management in R, and rhbase provides HBase database management from within R.
  • 70. For setting up R on your system, you can follow the “R Installation Guide” present in the LMS under module 1.
  • 71. Machine Learning
  • 72. Machine Learning: Mahout. Machine Learning is a class of algorithms which is data-driven, i.e. unlike "normal" algorithms, it is the data that "tells" what the "good answer" is. Example: a hypothetical non-machine-learning algorithm for face recognition in images would try to define what a face is (a round, skin-coloured disk with dark areas where you expect the eyes, etc.). A machine learning algorithm would not have such a coded definition, but would "learn by examples": you show it several images of faces and non-faces, and a good algorithm will eventually learn and be able to predict whether or not an unseen image is a face. Source: http://endthelie.com/2012/08/24/fbi-sharing-facial-recognition-software-with-police-departments-across-america/
  • 73. Mahout Overview
      - Mahout is about scalable Machine Learning
      - Mahout has functionality for many of today’s common machine learning tasks
      - Machine Learning is all over the web today
      - MapReduce magic in action
  • 74. Machine Learning: LinkedIn Recommendations. Hadoop and MapReduce magic in action: write intelligent applications using Apache Mahout. Source: https://cwiki.apache.org/confluence/display/MAHOUT/Powered+By+Mahout
  • 75. Annie’s Question: Mahout algorithms for clustering, classification and collaborative filtering are implemented on top of Apache Hadoop using:
      - Flume
      - MapReduce
      - Sqoop
      - Hive
  • 76. Annie’s Answer: Mahout algorithms are implemented on top of Apache Hadoop using the Map/Reduce paradigm.
  • 77. Assignment: 1. Install R with the help of the “R Installation Steps” guide in the LMS. This is a step-wise guide which will help you in installing and setting up R on your system.
  • 78. Agenda for Next Class: In the next class you will be able to
      - Understand what R is
      - Describe why R is used
      - Implement R programming concepts
      - Learn data import techniques
      - Analyze the processing of data
  • 79. Pre-work: Go through the “R Essentials for Data Science” section in the LMS. Watch the recordings present in the section to gain an understanding of the R environment.
  • 80. What’s Within the LMS?
  • 81. What’s Within the LMS? Recording of the Class, Presentation, Quiz
  • 82. What’s Within the LMS? Assignment, Installation Guide, Pre-work
  • 83. References
      - http://www.today.mccombs.utexas.edu/2012/04/the-big-data-machine
      - http://www.espncricinfo.com/
      - http://www.majorprojects.vic.gov.au/our-projects/our-past-projects/austin-hospital
      - http://wp.streetwise.co/wp-content/uploads/2012/08/Amazon-Recommendations.png
      - http://smhttp.23575.nexcesscdn.net/80ABE1/sbmedia/blog/wp-content/uploads/2013/03/netflix-in-asia.png
      - http://www.crowdsourcing.org/article/-nasa-tries-to-free-creativity-with-big-data-challenge/19984
      - http://whatsthebigdata.files.wordpress.com/2013/11/batman-on-big-data.jpg
      - http://spinnakr.com/blog/wp-content/uploads/2013/08/Using-Big-Data-.jpg
      - http://thesocietypages.org/sociologylens/files/2013/09/BIgDataDilbert_Cartoon.jpg
      - http://abstrusegoose.com/55
      - http://www.thenewstribe.com/2013/11/15/diabetes-is-killing-one-patient-every-six-seconds/
      - http://www.mobigyaan.com/images/stories/Miscellaneous/mobigyaan-twitter-chat.jpg
      - http://www.r-project.org/
      - http://endthelie.com/2012/08/24/fbi-sharing-facial-recognition-software-with-police-departments-across-america/