DataFlair Web Services Pvt. Ltd,
Call: +91-84510-97879
+91-91111-33369
​https://data-flair.training
Table of Contents:
What is Data Science?
Why Data Science?
Role of a Data Scientist
Solving Problems with Data Science
Tools for Data Science
i. R
ii. Python
iii. SQL
iv. Hadoop
v. Tableau
vi. Weka
Applications of Data Science
i. Data Science in Healthcare
ii. Data Science in E-commerce
iii. Data Science in Manufacturing
iv. Data Science as Conversational Agents
v. Data Science in Transport
Summary
Data Science has become one of the most in-demand career fields of the 21st
century. It has become a buzzword that almost everyone talks about these days.
But what is Data Science? In this article, we will demystify Data Science and
the role of a Data Scientist, and look at the tools required to master Data
Science.
So, let’s start the Data Science tutorial.
What is Data Science?
“Data Science is about extraction, preparation, analysis, visualization, and
maintenance of information. It is a cross-disciplinary field which uses
scientific methods and processes to draw insights from data.”
With the emergence of new technologies, there has been an exponential
increase in data. This has created an opportunity to analyze data and derive
meaningful insights from it. Doing so requires the expertise of a ‘Data
Scientist’, who can use various statistical and machine learning tools to
understand and analyze data. A Data Scientist not only analyzes the data but
also uses machine learning algorithms to predict future occurrences of an
event. Therefore, we can understand Data Science as a field that deals with
data processing, analysis, and extraction of insights from data using various
statistical methods and computer algorithms. It is a multidisciplinary field
that combines mathematics, statistics, and computer science.
Why Data Science?
Now that you know what Data Science is, you should explore why it is
important. Data has become the fuel of industries. It is the new
electricity. Companies require data to function, grow and improve their
businesses. Data Scientists work with this data to help companies make sound
decisions. Companies take this data-driven approach with the help of Data
Scientists, who analyze large amounts of data to derive meaningful insights.
These insights are helpful for companies that wish to assess themselves and
their performance in the market. Beyond commercial industries, the healthcare
industry also uses Data Science, where the technology is in high demand for
recognizing microscopic tumors and deformities at an early stage of diagnosis.
The number of roles for Data Scientists has grown by 650% since 2012. About
11.5 million jobs will be created by 2026, according to the U.S. Bureau of
Labor Statistics. The job of Data Scientist also ranks among the top emerging
jobs on LinkedIn. All of these statistics point towards the growing demand for
Data Scientists.
Role of a Data Scientist
You might want to know who a Data Scientist is and what their roles are in
different fields. A Data Scientist deals with both unstructured and structured
data. Unstructured data is present in a raw format that requires extensive
pre-processing, cleaning and organization in order to give the dataset a
meaningful structure. The Data Scientist then investigates this organized data
and analyzes it thoroughly to derive information from it using various
statistical methodologies. These statistical methods are used to describe and
visualize the data and to form hypotheses about it. Then, using advanced
machine learning algorithms, the Data Scientist predicts the occurrence of
events and makes data-driven decisions.
A Data Scientist deploys a vast array of tools and practices to recognize
recurring patterns within the data. These tools range from SQL and Hadoop to
Weka, R, and Python. Data Scientists often act as consultants employed by
companies, where they participate in various decision-making processes and
the creation of strategies. In other words, Data Scientists use meaningful
insights from data to help companies make smarter business decisions. For
example, companies like Netflix, Google and Amazon use Data Science to
develop powerful recommendation systems for their users. Similarly, various
financial companies use predictive analytics and forecasting methods to
predict stock prices. Data Science has helped to create smarter systems that
can make autonomous decisions based on historical datasets. Combined with
emerging technologies like Computer Vision, Natural Language Processing and
Reinforcement Learning, it forms part of the larger picture of Artificial
Intelligence.
Solving Problems with Data Science
When solving a real-world problem with Data Science, the first step is Data
Cleaning and Preprocessing. When a Data Scientist is provided with a dataset,
it may be in an unstructured format with various inconsistencies. Organizing
the data and removing erroneous information makes it easier to analyze and
draw insights. This process involves the removal of redundant data, the
transformation of data into a prescribed format, and the handling of missing
values.
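The cleaning steps described above can be sketched in a few lines of Python.
This is only a minimal illustration under stated assumptions: the records, the
field names and the choice to fill missing ages with the mean are all
hypothetical and not part of the original tutorial.

```python
# Hypothetical raw records; "" and None stand for missing values.
raw_rows = [
    {"customer": " Asha ", "age": "34", "city": "Indore"},
    {"customer": "Ravi", "age": "", "city": "pune"},
    {"customer": " Asha ", "age": "34", "city": "Indore"},  # exact duplicate
    {"customer": "Meera", "age": "30", "city": None},
]


def clean(rows, default_city="Unknown"):
    """Remove duplicates, normalise formats, and fill in missing values."""
    seen = set()
    cleaned = []
    for row in rows:
        # Transform each field into a prescribed format.
        name = row["customer"].strip()
        age = int(row["age"]) if row["age"] else None
        city = (row["city"] or default_city).strip().title()
        key = (name, age, city)
        if key in seen:  # redundant record, skip it
            continue
        seen.add(key)
        cleaned.append({"customer": name, "age": age, "city": city})

    # Handle missing ages with the mean of the known ages
    # (one simple imputation choice among many).
    known = [r["age"] for r in cleaned if r["age"] is not None]
    mean_age = round(sum(known) / len(known)) if known else None
    for r in cleaned:
        if r["age"] is None:
            r["age"] = mean_age
    return cleaned


cleaned_rows = clean(raw_rows)
for r in cleaned_rows:
    print(r)
```

In practice a library such as pandas is commonly used for this kind of work;
plain Python is used here only to keep the sketch self-contained.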
A Data Scientist analyzes the data through various statistical procedures. In
particular, two types of procedures are used:
● Descriptive Statistics
● Inferential Statistics
Assume that you are a Data Scientist working for a company that
manufactures cell phones. You have to analyze customers using the mobile
phones of your company. In order to do so, you will first take a thorough look
at the data and understand the various trends and patterns involved. In the
end, you will summarize the data and present it in the form of a graph or a
chart. You therefore apply Descriptive Statistics to solve the problem.
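As a rough illustration of Descriptive Statistics, the sketch below summarizes
a small customer dataset and prints a text bar chart. The customer records and
phone model names are made up for the example.

```python
import statistics
from collections import Counter

# Hypothetical customer records: (age, phone model owned)
customers = [(23, "A1"), (35, "A1"), (41, "B2"), (29, "A1"), (52, "C3"), (35, "B2")]
ages = [age for age, _ in customers]

# Summarize the numeric column with standard descriptive measures.
summary = {
    "count": len(ages),
    "mean_age": statistics.mean(ages),
    "median_age": statistics.median(ages),
    "stdev_age": statistics.stdev(ages),  # sample standard deviation
}
print(summary)

# Summarize the categorical column as a frequency table and a text bar chart.
model_counts = Counter(model for _, model in customers)
for model, n in model_counts.most_common():
    print(f"{model:>3} | {'#' * n} ({n})")
```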
You will then draw ‘inferences’ or conclusions from the data. We will
understand inferential statistics through the following example. Assume that
you wish to find out the number of defects that occurred during manufacturing.
However, testing every mobile phone individually can take time. Therefore, you
will test a sample of the phones and generalize from it to estimate the number
of defective phones in the entire batch.
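One common way to make such a generalization is to estimate the defect rate
from the sample and attach a confidence interval to it. The sketch below uses
the normal approximation for a proportion; the sample size, defect count and
batch size are hypothetical numbers chosen for illustration.

```python
import math


def defect_rate_estimate(sample_size, defects_in_sample, z=1.96):
    """Return the sample defect rate and an approximate 95% confidence interval.

    Uses the normal approximation to the binomial, which is reasonable only
    when the sample is large enough and the rate is not too close to 0 or 1.
    """
    p = defects_in_sample / sample_size
    standard_error = math.sqrt(p * (1 - p) / sample_size)
    low = max(0.0, p - z * standard_error)
    high = min(1.0, p + z * standard_error)
    return p, low, high


# Hypothetical numbers: 400 phones tested, 20 found defective.
p, low, high = defect_rate_estimate(sample_size=400, defects_in_sample=20)
batch_size = 10_000

print(f"estimated defect rate: {p:.1%} (95% CI {low:.1%} to {high:.1%})")
print(f"expected defective phones in the batch: about {p * batch_size:.0f}")
```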
Now suppose you have to predict the sales of mobile phones over a period of
two years. For this, you will use Regression Algorithms, which learn from the
given historical sales to predict sales over time.
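A minimal sketch of this idea is simple linear regression fitted by ordinary
least squares. The quarterly sales figures below are invented, and a straight
line is only one of many possible regression models.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical historical sales (thousands of units) for six past quarters.
quarters = [1, 2, 3, 4, 5, 6]
units_sold = [110, 125, 135, 150, 160, 175]

slope, intercept = fit_line(quarters, units_sold)

# Forecast the next eight quarters, that is, the next two years.
forecast = [slope * q + intercept for q in range(7, 15)]
for q, value in zip(range(7, 15), forecast):
    print(f"quarter {q}: {value:.1f}")
```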
Furthermore, you wish to analyze whether customers will purchase the product
based on their annual salary, age, gender, and credit score. You will use
historical data to find out whether customers will buy (1) or not (0). Since
there are two outputs or ‘classes’, you will use a Binary Classification
Algorithm. If there are more than two output classes, you use a Multiclass
Classification Algorithm instead. Both regression and classification are part
of ‘Supervised Learning’, where the historical data carries known outputs.
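To make binary classification concrete, here is a small sketch using
k-nearest neighbours, which is just one simple classifier chosen for
illustration; the tutorial does not prescribe a specific algorithm. The
customer history is hypothetical, and gender is left out to keep the features
numeric.

```python
import math
from collections import Counter

# Hypothetical history:
# (annual salary in thousands, age, credit score) -> bought (1) or not (0)
history = [
    ((30, 22, 580), 0),
    ((35, 25, 600), 0),
    ((40, 31, 620), 0),
    ((85, 35, 720), 1),
    ((95, 41, 750), 1),
    ((110, 38, 780), 1),
]

# Min-max scaling so that no single feature dominates the distance.
columns = list(zip(*(features for features, _ in history)))
lows = [min(c) for c in columns]
highs = [max(c) for c in columns]


def scale(features):
    return tuple((f - lo) / (hi - lo) for f, lo, hi in zip(features, lows, highs))


def predict(features, k=3):
    """Return the majority label among the k most similar past customers."""
    target = scale(features)
    by_distance = sorted(
        history, key=lambda item: math.dist(scale(item[0]), target)
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


print(predict((90, 40, 740)))  # a customer resembling past buyers
print(predict((32, 24, 590)))  # a customer resembling past non-buyers
```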
There are also instances of ‘unlabeled’ data, where the output is not
segregated into fixed classes as described above. Suppose that you have to
find clusters of potential customers and leads based on their socio-economic
background. Since you do not have a fixed set of classes in your historical
data, you will use a Clustering Algorithm to identify clusters or sets of
potential clients. Clustering is an ‘Unsupervised Learning’ technique.
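As an illustration, the sketch below runs k-means, one widely used clustering
algorithm, on a handful of made-up leads. The two features (income and years
of education) and the starting centroids are assumptions for the example.

```python
import math


def k_means(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then move
    each centroid to the mean of its assigned points, and repeat."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: math.dist(point, centroids[i]),
            )
            clusters[nearest].append(point)
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster
            else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical leads: (annual income in thousands, years of education)
leads = [(25, 12), (28, 10), (30, 14), (90, 18), (95, 20), (100, 17)]

# Fixed starting centroids keep the run deterministic; real implementations
# usually pick them at random or with a smarter seeding scheme.
centroids, clusters = k_means(leads, centroids=[leads[0], leads[3]])

for centroid, members in zip(centroids, clusters):
    print(centroid, members)
```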
Self-driving cars have become a trending technology. The principle behind the
self-driving car is autonomy, that is, being able to make decisions without
human interference. Traditional computers require human input to yield
output. Reinforcement Learning reduces this dependence on humans.
Reinforcement Learning is about taking specific actions to accumulate the
maximum reward. You can understand this with the following example: assume
that you are training a dog to fetch a ball. You reward the dog with a treat
each time it fetches the ball, and you do not give it a treat if it does not
fetch the ball. The dog learns that it earns a treat when it fetches the ball.
Reinforcement Learning uses the same principle. We give a reward to the agent
based on its action, and the agent tries to maximize the reward.
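The dog example can be sketched as a very small value-learning loop. This is a
simplified, single-state illustration (closer to a bandit problem than to a
full Reinforcement Learning setup); the action names, reward values, learning
rate and exploration rate are all assumptions made for the example.

```python
import random

ACTIONS = ["fetch", "ignore"]


def reward_for(action):
    """The trainer gives a treat (reward 1) only when the ball is fetched."""
    return 1.0 if action == "fetch" else 0.0


def train(episodes=100, epsilon=0.1, learning_rate=0.5, seed=0):
    rng = random.Random(seed)
    value = {action: 0.0 for action in ACTIONS}  # estimated reward per action
    for episode in range(episodes):
        if episode < len(ACTIONS):
            action = ACTIONS[episode]  # try every action once first
        elif rng.random() < epsilon:
            action = rng.choice(ACTIONS)  # explore occasionally
        else:
            action = max(ACTIONS, key=value.get)  # exploit the best estimate
        # Move the estimate a step towards the reward actually received.
        value[action] += learning_rate * (reward_for(action) - value[action])
    return value


value = train()
best_action = max(ACTIONS, key=value.get)
print(value, best_action)
```

The agent ends up preferring the action that earned treats, which mirrors how
the dog learns to fetch.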
A Data Scientist will require tools and software to tackle the above-mentioned
problems. We will now take a look at some of the tools that a Data Scientist
uses to solve those problems.
Tools for Data Science
Data Scientists use traditional statistical methodologies that form the
backbone of Machine Learning algorithms. They also use Deep Learning
algorithms to generate robust predictions. Data Scientists use the following
tools and programming languages:
i. R
R is a scripting language that is specifically tailored for statistical
computing. It is widely used for data analysis, statistical modeling,
time-series forecasting, clustering and similar statistical work. It also
supports object-oriented programming. R is an interpreted language and is
widely popular across multiple industries.
ii. Python
Like R, Python is an interpreted, high-level programming language. Python is
a versatile language that is used for both Data Science and software
development. It has gained popularity due to its ease of use and code
readability. As a result, Python is widely used for Data Analysis, Natural
Language Processing, and Computer Vision. Its ecosystem offers graphical and
statistical packages such as Matplotlib, NumPy and SciPy, as well as more
advanced packages for Deep Learning such as TensorFlow, PyTorch and Keras.
Python can be used for data mining, wrangling, visualization and developing
predictive models, which makes it a very flexible programming language.
iii. SQL
SQL stands for Structured Query Language. Data Scientists use SQL for
managing and querying data stored in databases. Being able to extract
information from databases is the first step towards analyzing the data.
Relational Databases are collections of data organized in tables. We use SQL
for extracting, managing and manipulating the data. For example, a Data
Scientist working in the banking industry uses SQL to extract customer
information. While Relational Databases use SQL, ‘NoSQL’ is a popular choice
for non-relational or distributed databases. NoSQL has been gaining
popularity due to its flexible scalability, dynamic schema design, and the
open-source nature of many of its implementations. MongoDB, Redis, and
Cassandra are some of the popular NoSQL databases.
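To give a feel for the banking example, the sketch below uses Python's
built-in sqlite3 module with an in-memory database. The table name, columns
and customer rows are hypothetical and exist only for the illustration.

```python
import sqlite3

# An in-memory database so that the example leaves nothing behind on disk.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers ("
    "id INTEGER PRIMARY KEY, name TEXT, city TEXT, balance REAL)"
)
conn.executemany(
    "INSERT INTO customers (name, city, balance) VALUES (?, ?, ?)",
    [("Asha", "Indore", 52000.0), ("Ravi", "Pune", 8500.0), ("Meera", "Indore", 19000.0)],
)

# Extract: customers whose balance exceeds a threshold, largest first.
high_balance = conn.execute(
    "SELECT name, balance FROM customers WHERE balance > ? ORDER BY balance DESC",
    (10000,),
).fetchall()

# Aggregate: number of customers and average balance per city.
per_city = conn.execute(
    "SELECT city, COUNT(*), AVG(balance) FROM customers GROUP BY city ORDER BY city"
).fetchall()

conn.close()
print(high_balance)
print(per_city)
```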
iv. Hadoop
Big data is another trending term; it deals with the management and storage
of huge amounts of data. This data can be structured or unstructured. A Data
Scientist must be familiar with complex data and must know the tools that
manage the storage of massive datasets. One such tool is Hadoop. Hadoop is
open-source software that stores data in a distributed file system (HDFS) and
processes it with a programming model called ‘MapReduce’. Several tools in the
Hadoop ecosystem, such as Apache Pig, Hive and HBase, build on it. Due to its
ability to process colossal data quickly, its scalable architecture and
low-cost deployment, Hadoop has grown to become one of the most popular
software frameworks for Big Data.
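The MapReduce idea can be illustrated with the classic word-count example.
The sketch below runs in a single Python process and does not use any Hadoop
API; the function names and the sample documents are assumptions, and a real
cluster would run the map and reduce steps in parallel on many machines.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine the values of one key into a single result."""
    return key, sum(values)


documents = ["big data needs big storage", "data drives decisions"]

mapped = [pair for doc in documents for pair in map_phase(doc)]
word_counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(word_counts)
```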
v. Tableau
Tableau is Data Visualization software specializing in graphical analysis of
data. It allows its users to create interactive visualizations and dashboards.
This makes Tableau an ideal choice for showing various trends and insights
from the data in the form of interactive charts such as Treemaps, Histograms
and Box plots. An important feature of Tableau is its ability to connect with
spreadsheets, relational databases, and cloud platforms. This allows Tableau
to work with the data directly, making it easier for users.
vi. Weka
For Data Scientists who want to get familiar with Machine Learning in action,
Weka can be an ideal option. Weka is generally used for Data Mining but also
includes various tools required for Machine Learning operations. It is
completely open-source software that provides a graphical user interface
(GUI), so users can work with it without writing a single line of code.
Applications of Data Science
Data Science has established a strong foothold in several industries such as
medicine, banking, manufacturing and transportation. It has a wide variety of
uses. Some of the applications of Data Science are:
i. Data Science in Healthcare
Data Science plays a pivotal role in the healthcare industry. With the help of
classification algorithms, doctors are able to detect cancer and tumors at an
early stage using Image Recognition software. The genomics industry uses Data
Science for analyzing and classifying patterns in genomic sequences. Various
virtual assistants also help patients manage their physical and mental
ailments.
ii. Data Science in E-commerce
Amazon uses a recommendation system that suggests products to users based on
their purchase history. Data Scientists have developed recommendation systems
that predict user preferences using Machine Learning.
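A very simple way to recommend products from purchase history is to count
which items are bought together. The sketch below is only a toy illustration
of that idea with invented orders; it is not how Amazon's system works, and
production recommenders rely on far richer Machine Learning models.

```python
from collections import Counter
from itertools import combinations

# Hypothetical order history: each set is one customer's basket.
orders = [
    {"phone", "case", "charger"},
    {"phone", "case"},
    {"laptop", "mouse"},
    {"phone", "charger"},
    {"laptop", "mouse", "case"},
]

# Count how often every pair of items appears in the same basket.
co_bought = Counter()
for basket in orders:
    for a, b in combinations(sorted(basket), 2):
        co_bought[(a, b)] += 1
        co_bought[(b, a)] += 1


def recommend(purchased, top_n=2):
    """Suggest items most often bought together with what the user already has."""
    scores = Counter()
    for (a, b), n in co_bought.items():
        if a in purchased and b not in purchased:
            scores[b] += n
    return [item for item, _ in scores.most_common(top_n)]


print(recommend({"phone"}))
print(recommend({"laptop"}, top_n=1))
```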
iii. Data Science in Manufacturing
Industrial robots have taken over mundane and repetitive roles in
manufacturing units. These industrial robots are autonomous in nature and use
Data Science technologies such as Reinforcement Learning and Image
Recognition.
iv. Data Science as Conversational Agents
Amazon’s Alexa and Apple’s Siri use Speech Recognition to understand users.
Data Scientists develop the speech recognition systems that convert human
speech into textual data. These assistants also use various Machine Learning
algorithms to classify user queries and provide an appropriate response.
v. Data Science in Transport
Self-driving cars use autonomous agents that rely on Reinforcement Learning
and detection algorithms. Self-driving cars are no longer fiction, thanks to
advancements in Data Science.
Summary
While Data Science is a vast subject, being an aggregate of several
technologies and disciplines, it is possible to acquire these skills with the
right approach. In the end, Data Science is a field that best fits people who
have a knack for experimentation and problem-solving. With its large number of
applications, Data Science has become one of the most versatile careers.

Bhaskar Mitra
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
OnBoard
 

Recently uploaded (20)

FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
ODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User Group
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
Search and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical FuturesSearch and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical Futures
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 

Data science tutorial

  • 1.
  • 2. Table of Contents: What is Data Science?; Why Data Science?; Role of a Data Scientist; Solving Problems with Data Science; Tools for Data Science (i. R, ii. Python, iii. SQL, iv. Hadoop, v. Tableau, vi. Weka); Applications of Data Science (i. Data Science in Healthcare, ii. Data Science in E-commerce, iii. Data Science in Manufacturing, iv. Data Science as Conversational Agents, v. Data Science in Transport); Summary
  • 3. Data Science has become one of the most in-demand professions of the 21st century. It has become a buzzword that almost everyone talks about these days. But what is Data Science? In this article, we will demystify Data Science and the role of a Data Scientist, and look at the tools required to master Data Science. So, let’s start the Data Science tutorial. What is Data Science? “Data Science is about extraction, preparation, analysis, visualization, and maintenance of information. It is a cross-disciplinary field which uses scientific methods and processes to draw insights from data.”
  • 4. With the emergence of new technologies, there has been an exponential increase in data. This has created an opportunity to analyze data and derive meaningful insights from it. Doing so requires the special expertise of a ‘Data Scientist’ who can use various statistical and machine learning tools to understand and analyze data. A Data Scientist not only analyzes the data but also uses machine learning algorithms to predict future occurrences of an event. Therefore, we can understand Data Science as a field that deals with data processing, analysis, and extraction of insights from data using various statistical methods and computer algorithms. It is a multidisciplinary field that combines mathematics, statistics, and computer science. Why Data Science? Now that you know what Data Science is, you should explore why it is important. Data has become the fuel of industries. It is the new
  • 5. electricity. Companies require data to function, grow and improve their businesses. Data Scientists work with this data to help companies make sound decisions. Companies take a data-driven approach with the help of Data Scientists, who analyze large amounts of data to derive meaningful insights. These insights help companies that wish to assess themselves and their performance in the market. Besides commercial industries, the healthcare industry also uses Data Science, where the technology is in huge demand for recognizing microscopic tumors and deformities at an early stage of diagnosis. The number of roles for Data Scientists has grown by 650% since 2012. About 11.5 million jobs will be created by 2026 according to the U.S. Bureau of Labor Statistics. Also, the job of Data Scientist ranks among the top emerging jobs on LinkedIn. All these statistics point towards the growing demand for Data Scientists. Role of a Data Scientist You might want to know who a Data Scientist is and what his or her roles are in different fields. A Data Scientist deals with both unstructured and structured data. Unstructured data is present in a raw format that requires extensive pre-processing, cleaning and organization in order to impart a meaningful structure to a dataset. The Data Scientist then investigates this organized data and analyzes it thoroughly to derive information from it using various statistical methodologies. These statistical methods are used to describe and visualize the data and to form hypotheses about it. Then, using advanced machine learning algorithms, the Data Scientist predicts the occurrence of events and makes data-driven decisions. A Data Scientist deploys a vast array of tools and practices to recognize recurring patterns within the data. These tools range from SQL and Hadoop to
  • 6. Weka, R, and Python. Data Scientists usually act as consultants employed by companies, where they participate in various decision-making processes and the creation of strategies. In other words, Data Scientists use meaningful insights from data to help companies make smarter business decisions. For example, companies like Netflix, Google and Amazon use Data Science to develop powerful recommendation systems for their users. Similarly, various financial companies use predictive analytics and forecasting methods to predict stock prices. Data Science has helped to create smarter systems that can make autonomous decisions based on historical datasets. Combined with emerging technologies like Computer Vision, Natural Language Processing and Reinforcement Learning, it forms part of the larger picture of Artificial Intelligence. Solving Problems with Data Science When solving a real-world problem with Data Science, the first step is Data Cleaning and Preprocessing. When a Data Scientist is provided with a dataset, it may be in an unstructured format with various inconsistencies. Organizing the data and removing erroneous information makes it easier to analyze and draw insights. This process involves removing redundant data, transforming data into a prescribed format, handling missing values etc. A Data Scientist then analyzes the data through various statistical procedures. In particular, two types of procedures are used: ● Descriptive Statistics ● Inferential Statistics Assume that you are a Data Scientist working for a company that manufactures cell phones. You have to analyze customers using the mobile
  • 7. phones of your company. To do so, you will first take a thorough look at the data and understand the various trends and patterns involved. In the end, you will summarize the data and present it in the form of a graph or a chart. You therefore apply Descriptive Statistics to solve the problem. You will then draw ‘inferences’ or conclusions from the data. We will understand inferential statistics through the following example: assume that you wish to find out the number of defects that occurred during manufacturing. However, testing every mobile phone individually can take time. Therefore, you will consider a sample of the phones and generalize from it to the number of defective phones in the whole population. Now, you have to predict the sales of mobile phones over a period of two years. Based on the given historical sales, you will use Regression Algorithms to predict the sales over time. Furthermore, you wish to analyze whether customers will purchase the product based on their annual salary, age, gender, and credit score. You will use historical data to find out whether customers will buy (1) or not (0). Since there are two outputs or ‘classes’, you will use a Binary Classification Algorithm. If there are more than two output classes, we use a Multiclass Classification Algorithm to solve the problem. Both of the above-stated problems are part of ‘Supervised Learning’. There are also instances of ‘unlabeled’ data, in which the output is not segregated into fixed classes as mentioned above. Suppose that you have to find clusters of potential customers and leads based on their socio-economic background. Since you do not have a fixed set of classes in your historical data, you will use a Clustering Algorithm to identify clusters or sets of potential clients.
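As a rough illustration of the clustering idea described above, here is a minimal k-means sketch in plain Python. It is only one possible clustering algorithm, not necessarily the one the tutorial has in mind. The customer figures, the `kmeans` helper, and the choice of the first `k` points as starting centroids are all assumptions made for this example.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain k-means: returns (centroids, labels) for a list of numeric tuples."""
    # Assumption: the first k points serve as the initial centroids. This keeps
    # the example deterministic; real projects usually pick them more carefully.
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # centroids stopped moving, so the clustering has converged
        centroids = new_centroids
    return centroids, labels


# Hypothetical customers: (annual income in thousands, age in years).
customers = [(25, 22), (90, 45), (28, 25), (95, 50), (30, 24), (88, 48)]
centroids, labels = kmeans(customers, k=2)
print(labels)     # cluster index assigned to each customer
print(centroids)  # centre of each discovered cluster
```

No class labels are supplied here: the two groups emerge from the data alone, which is what separates this from the classification examples. In practice the features would usually be scaled first so that no single attribute dominates the distance.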
Clustering is an ‘Unsupervised Learning’ technique. Self-driving cars have become a trending technology. The principle behind the self-driving car is autonomy, that is, being able to make decisions without
  • 8. human interference. Traditional computers required human input to yield output. Reinforcement Learning addresses this dependence on humans. Reinforcement Learning is about taking specific actions to accumulate maximum reward. You can understand this with the following example: assume that you are training a dog to fetch a ball. You reward the dog with a treat each time it fetches the ball, and you do not give it a treat if it does not. The dog learns that it gets treats if it fetches the ball back. Reinforcement Learning uses the same principle. We give a reward to the agent based on its action, and it tries to maximize the reward. A Data Scientist requires tools and software to tackle the above-mentioned problems. We will now take a look at some of the tools that a Data Scientist uses to solve those problems. Tools for Data Science Data Scientists use traditional statistical methodologies that form the backbone of Machine Learning algorithms. They also use Deep Learning algorithms to generate robust predictions. Data Scientists use the following tools and programming languages: i. R R is a scripting language that is specifically tailored for statistical computing. It is widely used for data analysis, statistical modeling, time-series forecasting, clustering etc. It also possesses the features of an object-oriented programming language. R is an interpreted language and is widely popular across multiple industries.
  • 9. ii. Python Like R, Python is an interpreted, high-level programming language. Python is a versatile language, used mostly for Data Science and Software Development. It has gained popularity due to its ease of use and code readability. As a result, Python is widely used for Data Analysis, Natural Language Processing, and Computer Vision. Python has various graphical and statistical packages like Matplotlib, NumPy and SciPy, and more advanced packages for Deep Learning such as TensorFlow, PyTorch, Keras etc. We use Python for data mining, wrangling, visualization and developing predictive models, which makes it a very flexible programming language. iii. SQL SQL stands for Structured Query Language. Data Scientists use SQL for managing and querying data stored in databases. Being able to extract information from databases is the first step towards analyzing the data. Relational Databases are collections of data organized in tables. We use SQL for extracting, managing and manipulating the data. For example, a Data Scientist working in the banking industry uses SQL to extract customer information. While Relational Databases use SQL, ‘NoSQL’ is a popular choice for non-relational or distributed databases. Recently NoSQL has been gaining popularity due to its flexible scalability, dynamic design, and open-source nature. MongoDB, Redis, and Cassandra are some of the popular NoSQL databases. iv. Hadoop
  • 10. Big data is another trending term; it refers to the management and storage of huge amounts of data. Data is either structured or unstructured. A Data Scientist must be familiar with complex data and must know the tools that manage the storage of massive datasets. One such tool is Hadoop. Hadoop is open-source software that stores data in a distributed file system (HDFS) and processes it using a model called ‘MapReduce’. The Hadoop ecosystem includes several tools such as Apache Pig, Hive, HBase etc. Due to its ability to process colossal data quickly, its scalable architecture and low-cost deployment, Hadoop has grown to become one of the most popular software platforms for Big Data. v. Tableau Tableau is Data Visualization software specializing in graphical analysis of data. It allows its users to create interactive visualizations and dashboards. This makes Tableau an ideal choice for showing various trends and insights in the data in the form of interactive charts such as Treemaps, Histograms, Box plots etc. An important feature of Tableau is its ability to connect with spreadsheets, relational databases, and cloud platforms. This allows Tableau to process data directly, making it easier for the users. vi. Weka For Data Scientists looking to get familiar with Machine Learning in action, Weka can be an ideal option. Weka is generally used for Data Mining but also includes various tools required for Machine Learning operations. It is completely open-source software with a graphical user interface, which makes it easy to use without writing a single line of code.
  • 11. Applications of Data Science Data Science has created a strong foothold in several industries such as medicine, banking, manufacturing, transportation etc. It has a wide variety of applications, some of which are: i. Data Science in Healthcare Data Science has been playing a pivotal role in the Healthcare Industry. With the help of classification algorithms, doctors are able to detect cancer and tumors at an early stage using Image Recognition software. Genomics companies use Data Science for analyzing and classifying patterns in genomic sequences. Various virtual assistants are also helping patients with their physical and mental ailments. ii. Data Science in E-commerce Amazon uses a recommendation system that recommends various products to users based on their purchase history. Data Scientists have developed recommendation systems that predict user preferences using Machine Learning. iii. Data Science in Manufacturing Industrial robots have taken over mundane and repetitive roles in manufacturing units. These industrial robots are autonomous in nature and use Data Science technologies such as Reinforcement Learning and Image Recognition.
  • 12. iv. Data Science as Conversational Agents Amazon’s Alexa and Apple’s Siri use Speech Recognition to understand users. Data Scientists develop these speech recognition systems, which convert human speech into textual data. They also use various Machine Learning algorithms to classify user queries and provide an appropriate response. v. Data Science in Transport Self-driving cars use autonomous agents that utilize Reinforcement Learning and Detection algorithms. Self-driving cars are no longer fiction due to advancements in Data Science. Summary While Data Science is a vast subject, being an aggregate of several technologies and disciplines, it is possible to acquire these skills with the right approach. In the end, Data Science is a very robust field that best fits people who have a knack for experimentation and problem-solving. With its large number of applications, Data Science has become one of the most versatile careers.