Introduction to
DATA SCIENCE
Challenges deep-dive
Why the Hype Around
Data Science?
● The demand for data scientists is projected to grow 28% by 2023
● Data scientist roles have grown over 650% since 2012, but
currently only 35,000 people in the US have data science skills,
while hundreds of companies are hiring for those roles.
● Software engineering is a common starting point for
professionals who are in the top five fastest-growing jobs today.
● Data science gives you career flexibility
Who Are Data Scientists?
What is Machine
Learning?
Machine learning teaches computers to do what comes naturally to
humans and animals: learn from experience. Machine learning
algorithms use computational methods to “learn” information directly
from data without relying on a predetermined equation as a model.
The algorithms adaptively improve their performance as the number
of samples available for learning increases.
A Definition
A computer program is said to learn from experience E with
respect to some task T and some performance measure P if its
performance on T, as measured by P, improves with experience E.
-Tom Mitchell
A Small Question
Suppose we feed a learning algorithm a lot of historical weather
data, and have it learn to predict weather. In this setting, what is
T, P, and E?
More Data,
More Questions,
Better Answers
Real World
Applications
With the rise in big data, machine learning has become particularly
important for solving problems in areas like these:
● Image processing and computer vision, for face recognition,
motion detection, and object detection
● Computational biology, for tumor detection, drug discovery, and
DNA sequencing
● Energy production, for price and load forecasting
● Automotive, aerospace, and manufacturing, for predictive
maintenance
● Natural language processing
How Machine
Learning Works
Machine learning uses two types of techniques:
● Supervised learning, which trains a model on known input and
output data so that it can predict future outputs
● Unsupervised learning, which finds hidden patterns or intrinsic
structures in input data.
Machine Learning
Techniques
Supervised
Learning
The aim of supervised machine learning is to build a model that
makes predictions based on evidence in the presence of
uncertainty. A supervised learning algorithm takes a known set of
input data and known responses to the data (output) and trains a
model to generate reasonable predictions for the response to new
data.
Classification - predict discrete responses
Classification models classify input data into categories, for
example, whether an email is genuine or spam, or whether a tumor
is cancerous or benign.
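To make the idea of predicting a discrete label concrete, here is a minimal sketch of a 1-nearest-neighbor classifier. The two features (number of links, number of all-caps words), the toy emails, and the function name are all made up for illustration; the slides do not prescribe a particular algorithm.

```python
import math

def nearest_neighbor_predict(train_X, train_y, x):
    """Return the label of the training point closest to x (1-nearest-neighbor)."""
    distances = [math.dist(row, x) for row in train_X]
    return train_y[distances.index(min(distances))]

# Hypothetical toy data: [number of links, number of ALL-CAPS words] per email.
emails = [[0, 1], [1, 0], [8, 9], [7, 12]]
labels = ["genuine", "genuine", "spam", "spam"]

# Classify a new, unseen email by the label of its closest known example.
prediction = nearest_neighbor_predict(emails, labels, [6, 10])
print(prediction)
```

The known input/output pairs play the role of the labeled training data, and the output is always one of a fixed set of categories.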
Regression - predict continuous responses
Regression models predict continuous values, for example,
changes in temperature or fluctuations in power demand. Typical
applications include electricity load forecasting and
algorithmic trading.
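As a sketch of predicting a continuous response, the snippet below fits a straight line by ordinary least squares. The temperature and demand numbers are invented, and a single-feature line is only the simplest possible regression model, assumed here for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: outdoor temperature (degrees C) vs. power demand (MW).
temps = [20, 25, 30, 35]
demand = [100, 120, 140, 160]

slope, intercept = fit_line(temps, demand)
forecast = slope * 28 + intercept  # predicted demand at 28 degrees C
print(slope, intercept, forecast)
```

Unlike classification, the prediction can take any numeric value, not just one of a few labels.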
Unsupervised
Learning
Unsupervised learning finds hidden patterns or intrinsic structures in
data. It is used to draw inferences from datasets consisting of input
data without labeled responses.
Clustering is the most common unsupervised learning technique. It
is used for exploratory data analysis to find hidden patterns or
groupings in data. Applications for clustering include gene sequence
analysis, market research, and object recognition.
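Clustering can be sketched with k-means, one common clustering algorithm (the slides do not name a specific one). The points and starting centroids below are hypothetical, and the starting centroids are fixed so the result is reproducible.

```python
import math

def k_means(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then recompute centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster; keep it if the cluster is empty.
        centroids = [
            [sum(c) / len(cluster) for c in zip(*cluster)] if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
    return centroids, clusters

# Hypothetical unlabeled data with two visible groups.
points = [[1, 1], [1, 2], [2, 1], [8, 8], [9, 8], [8, 9]]
centroids, clusters = k_means(points, centroids=[[1, 1], [8, 8]])
print(centroids)
print(clusters)
```

No labels are supplied anywhere: the grouping comes only from the structure of the input data.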
Knowledge Test
Which of the following would you apply supervised learning to?
1. Given genetic (DNA) data from a person, predict the odds of him/her developing
diabetes over the next 10 years.
2. Given a large dataset of medical records from patients suffering from heart
disease, try to learn whether there might be different clusters of such patients for
which we might tailor separate treatments.
3. Given data on how 1000 medical patients respond to an experimental drug (such
as effectiveness of the treatment, side effects, etc.), discover whether there are
different categories or "types" of patients in terms of how they respond to the
drug, and if so what these categories are.
4. Have a computer examine an audio clip of a piece of music, and classify whether
or not there are vocals (i.e., a human voice singing) in that audio clip, or if it is a
clip of only musical instruments (and no vocals).
Knowledge Test
Which of the following questions can be answered using a
classification algorithm?
1. How does the exchange rate depend on the GDP?
2. Does a document contain the handwritten letter S?
3. How can I group supermarket products using purchase
frequency?
Knowledge Test
1. Suppose you are working on weather prediction, and you
would like to predict whether or not it will be raining at 5pm
tomorrow. You want to use a learning algorithm for this. Would
you treat this as a classification or a regression problem?
2. Suppose you are working on stock market prediction. You
would like to predict whether or not a certain company will
declare bankruptcy within the next 7 days (by training on data
of similar companies that had previously been at risk of
bankruptcy). Would you treat this as a classification or a
regression problem?
How Do You
Decide Which
Algorithm
to Use?
Choosing the right algorithm can seem overwhelming.
There are dozens of supervised and unsupervised machine
learning algorithms, and each takes a different approach to
learning.
There is no best method or one size fits all. Finding the right
algorithm is partly just trial and error.
But algorithm selection also depends on the size and type of data
you’re working with, the insights you want to get from the data, and
how those insights will be used.
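The trial-and-error part can be sketched as a small loop: fit several candidate models on the same training data and compare their error on data they have not seen. The two candidates (a mean baseline and a straight line), the data, and all names here are illustrative assumptions, not a recommended shortlist.

```python
def mean_model(train):
    """Baseline: always predict the average training response."""
    avg = sum(y for _, y in train) / len(train)
    return lambda x: avg

def line_model(train):
    """Least-squares straight line through the training points."""
    n = len(train)
    mx = sum(x for x, _ in train) / n
    my = sum(y for _, y in train) / n
    slope = sum((x - mx) * (y - my) for x, y in train) / sum((x - mx) ** 2 for x, _ in train)
    return lambda x: my + slope * (x - mx)

def mean_abs_error(model, data):
    return sum(abs(model(x) - y) for x, y in data) / len(data)

# Hypothetical (input, response) pairs; the last two are held out for comparison.
train = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 7.8)]
held_out = [(5, 10.1), (6, 11.9)]

candidates = {"mean baseline": mean_model, "straight line": line_model}
scores = {name: mean_abs_error(build(train), held_out) for name, build in candidates.items()}
best = min(scores, key=scores.get)
print(scores, best)
```

Which candidate wins depends entirely on the data, which is why the comparison has to be run rather than assumed.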
Two - Class Classification
Multi - Class Classification
Anomaly Detection
Regression
Clustering
When Should We Use
Machine Learning?
Consider using machine learning when you have a complex task or
problem involving a large amount of data and lots of variables, but
no existing formula or equation.
Knowledge Test
Have a look at the statements below and identify the one which
is not a machine learning problem
1. Given a viewer's shopping habits, recommend a product to
purchase the next time she visits your website.
2. Given the symptoms of a patient, identify her illness.
3. Predict the USD/EUR exchange rate for February 2023.
4. Compute the mean wage of 10 employees for your company.
Knowledge Test
Which of the following statements uses a machine learning
model?
1. Determine whether an incoming email is spam or not
2. Obtain the name of last year's FIFA Ballon d’Or winner
3. Automatically tag your new Facebook photos
4. Select the student with the highest grade on a statistics course
Getting
Started
There is NO
Straight Line
With machine learning there’s rarely a straight line from start to
finish. You’ll find yourself constantly iterating and trying different
ideas and approaches.
Machine Learning
Challenges
● Data comes in all shapes and sizes
● Preprocessing your data might require specialized knowledge
and tools
● It takes time to find the best model to fit the data.
Questions to Ask
Before Starting
Every machine learning workflow begins with three questions:
● What kind of data are you working with?
● What insights do you want to get from it?
● How and where will those insights be applied?
Your answers to these questions help you decide whether to use
supervised or unsupervised learning.
Data Science -
Five Questions
There are only five questions that data science answers:
● Is this A or B?
● Is this weird?
● How much – or – How many?
● How is this organized?
● What should I do next?
Workflow at a Glance
Step 1 -
Load the Data
We store the labeled data sets in a text file. A flat file format such as
text or CSV is easy to work with and makes it straightforward to
import data.
Machine learning algorithms aren’t smart enough to tell the
difference between noise and valuable information. Before using the
data for training, we need to make sure it’s clean and complete.
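Loading a flat file can be sketched with Python's built-in csv module. The column names and values are invented, and an in-memory string stands in for a file on disk so the example runs as-is.

```python
import csv
import io

# Stand-in for a file on disk; with a real file, use open("data.csv", newline="").
raw = io.StringIO(
    "temperature,humidity,rained\n"
    "21.5,0.60,no\n"
    "18.0,0.85,yes\n"
    ",0.70,no\n"
)

# Each row becomes a dict keyed by the header line.
rows = list(csv.DictReader(raw))

# A first completeness check: keep only rows where every field is filled in.
complete = [r for r in rows if all(value != "" for value in r.values())]
print(len(rows), "rows loaded,", len(complete), "complete")
```

Values arrive as strings, so numeric columns still need converting before training.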
Step 2 -
Preprocess the Data
To preprocess the data we do the following:
● Look for outliers–data points that lie outside the rest of the data
● Check for missing values
● Divide the data into two sets
○ We save part of the data for testing (the test set) and use
the rest (the training set) to build models. This is referred
to as holdout, and is a useful cross-validation technique.
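The three preprocessing steps above can be sketched in a few lines. The readings are hypothetical, the outlier rule (distance from the median in units of the median absolute deviation, with a factor of 5) is just one of many possible choices, and the 75/25 split ratio is likewise an assumption.

```python
import random
import statistics

# Hypothetical sensor readings; None marks a missing value.
values = [10.2, 9.8, None, 10.5, 55.0, 9.9, 10.1, None, 10.4, 9.7]

# 1. Check for missing values.
missing = [i for i, v in enumerate(values) if v is None]
present = [v for v in values if v is not None]

# 2. Flag outliers: points far from the median, measured in units of the
#    median absolute deviation (the factor 5 is an arbitrary choice).
median = statistics.median(present)
mad = statistics.median(abs(v - median) for v in present)
outliers = [v for v in present if abs(v - median) > 5 * mad]
clean = [v for v in present if v not in outliers]

# 3. Holdout: shuffle with a fixed seed, keep 75% for training, 25% for testing.
shuffled = clean[:]
random.Random(0).shuffle(shuffled)
cut = int(0.75 * len(shuffled))
train, test = shuffled[:cut], shuffled[cut:]
print(missing, outliers, len(train), len(test))
```

Whether flagged outliers should be dropped, corrected, or kept is a judgment call that depends on the data; this sketch simply drops them.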
Step 3 -
Derive Features
Deriving features (also known as feature engineering or feature
extraction) turns raw data into information that a machine learning
algorithm can use.
Use feature selection to:
• Improve the accuracy of a machine learning algorithm
• Boost model performance for high-dimensional data sets
• Improve model interpretability
• Prevent overfitting
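One simple way to select features, sketched below, is to score each feature by the strength of its correlation with the response and keep the top few. This "filter" approach is only one option, and the housing-style feature names and numbers are made up for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))

# Hypothetical candidate features for predicting a price.
features = {
    "size_m2": [50, 70, 90, 110, 130],
    "rooms": [2, 3, 3, 4, 5],
    "door_number": [7, 3, 9, 1, 5],
}
price = [150, 210, 265, 330, 390]

# Score each feature by |correlation with the response| and keep the best two.
scores = {name: abs(pearson(col, price)) for name, col in features.items()}
selected = sorted(scores, key=scores.get, reverse=True)[:2]
print(scores, selected)
```

Dropping the weakly related column leaves a smaller model that is easier to interpret and has less noise to overfit.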
Step 4 -
Build and Train Model
● The predefined algorithms and the training data are used for
building and training the model.
● The test data is used to evaluate the model.
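A minimal sketch of this step: fit a model on the training set only, then measure its accuracy on the held-out test set. The nearest-centroid classifier, the two-feature tumor data, and the function names are all assumptions chosen to keep the example short.

```python
import math

def train_centroids(X, y):
    """'Training' here means computing the mean point (centroid) of each class."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for i, v in enumerate(row):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def predict(model, row):
    """Predict the class whose centroid is closest to the row."""
    return min(model, key=lambda label: math.dist(model[label], row))

# Hypothetical labeled data, already split into training and test sets.
train_X = [[1, 1], [2, 1], [1, 2], [8, 9], [9, 8], [9, 9]]
train_y = ["benign", "benign", "benign", "malignant", "malignant", "malignant"]
test_X = [[2, 2], [8, 8], [1, 3]]
test_y = ["benign", "malignant", "benign"]

model = train_centroids(train_X, train_y)  # uses training data only
accuracy = sum(predict(model, x) == y for x, y in zip(test_X, test_y)) / len(test_y)
print(accuracy)
```

Keeping the test rows out of training is what makes the accuracy figure an honest estimate of performance on new data.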
Step 5 -
Improve the Model
Improving a model can take two different directions: make the
model simpler or add complexity.
Simplify - reduce the number of features
Add Complexity - make it more fine-tuned
Simplify
Popular feature reduction techniques include:
● Correlation matrix – shows the relationship between
variables, so that variables (or features) that are not highly
correlated with the response can be removed.
● Principal component analysis (PCA) - eliminates redundancy
by finding a combination of features that captures key
distinctions between the original features and brings out strong
patterns in the dataset.
● Sequential feature reduction – reduces features iteratively on
the model until there is no improvement in performance.
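To illustrate how PCA removes redundancy, the sketch below finds the first principal component of a tiny dataset by power iteration on the covariance matrix. The data are hypothetical (the second feature is exactly twice the first), and power iteration is a simple stand-in for the eigendecomposition or SVD routines a numerical library would normally use.

```python
import math

def first_principal_component(rows, steps=100):
    """Leading eigenvector of the sample covariance matrix, via power iteration."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)] for a in range(d)]
    v = [1.0] * d  # arbitrary starting direction
    for _ in range(steps):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v, means

# Hypothetical data: the second feature is always 2x the first, so it is redundant.
rows = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
direction, means = first_principal_component(rows)

# Project each 2-feature row onto the single principal direction.
projected = [sum((r[j] - means[j]) * direction[j] for j in range(2)) for r in rows]
print(direction, projected)
```

Two correlated columns collapse into one derived feature with no loss of information in this example; real data usually keeps several components and loses a little.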
Add Complexity
● Use model combination – merge multiple simpler models into
a larger model that is better able to represent the trends in the
data than any of the simpler models could on their own.
● Add more data sources
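Model combination can be sketched as a majority vote over several simple models. The three rule-of-thumb spam detectors, their thresholds, and the email fields below are hypothetical; real ensembles usually combine trained models rather than hand-written rules.

```python
from collections import Counter

def majority_vote(models, x):
    """Combine several models by returning the most common prediction."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]

# Three hypothetical, deliberately simple spam detectors.
def many_links(email):
    return "spam" if email["links"] > 5 else "genuine"

def shouting(email):
    return "spam" if email["caps_words"] > 10 else "genuine"

def unknown_sender(email):
    return "spam" if not email["known_sender"] else "genuine"

email = {"links": 7, "caps_words": 3, "known_sender": False}
combined = majority_vote([many_links, shouting, unknown_sender], email)
print(combined)
```

Here one detector disagrees with the other two, and the vote lets the combined model override that single mistake-prone rule.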
TO DO
● Getting Started
● Familiarize yourself with the math and
algorithms
● Select the Infrastructure or
Tool
● Create your profile and
participate in competitions
Christy Abraham Joy
Email - christyabrahamjoy@gmail.com
Mob - +91 94000 95273
Feel Free to Contact!

一比一原版莱斯大学毕业证(rice毕业证)如何办理
一比一原版莱斯大学毕业证(rice毕业证)如何办理一比一原版莱斯大学毕业证(rice毕业证)如何办理
一比一原版莱斯大学毕业证(rice毕业证)如何办理
 
Senior Software Profiles Backend Sample - Sheet1.pdf
Senior Software Profiles  Backend Sample - Sheet1.pdfSenior Software Profiles  Backend Sample - Sheet1.pdf
Senior Software Profiles Backend Sample - Sheet1.pdf
 
一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理
一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理
一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理
 
06-18-2024-Princeton Meetup-Introduction to Milvus
06-18-2024-Princeton Meetup-Introduction to Milvus06-18-2024-Princeton Meetup-Introduction to Milvus
06-18-2024-Princeton Meetup-Introduction to Milvus
 
CAP Excel Formulas & Functions July - Copy (4).pdf
CAP Excel Formulas & Functions July - Copy (4).pdfCAP Excel Formulas & Functions July - Copy (4).pdf
CAP Excel Formulas & Functions July - Copy (4).pdf
 
[VCOSA] Monthly Report - Cotton & Yarn Statistics May 2024
[VCOSA] Monthly Report - Cotton & Yarn Statistics May 2024[VCOSA] Monthly Report - Cotton & Yarn Statistics May 2024
[VCOSA] Monthly Report - Cotton & Yarn Statistics May 2024
 
reading_sample_sap_press_operational_data_provisioning_with_sap_bw4hana (1).pdf
reading_sample_sap_press_operational_data_provisioning_with_sap_bw4hana (1).pdfreading_sample_sap_press_operational_data_provisioning_with_sap_bw4hana (1).pdf
reading_sample_sap_press_operational_data_provisioning_with_sap_bw4hana (1).pdf
 
Do People Really Know Their Fertility Intentions? Correspondence between Sel...
Do People Really Know Their Fertility Intentions?  Correspondence between Sel...Do People Really Know Their Fertility Intentions?  Correspondence between Sel...
Do People Really Know Their Fertility Intentions? Correspondence between Sel...
 
Digital Marketing Performance Marketing Sample .pdf
Digital Marketing Performance Marketing  Sample .pdfDigital Marketing Performance Marketing  Sample .pdf
Digital Marketing Performance Marketing Sample .pdf
 

Introduction to Data Science

  • 2.
  • 3.
  • 4.
  • 5. Challenges deep-dive Why the Hype Around Data Science? ● The demand for data scientists will soar by 28% by 2023 ● Data scientist roles have grown over 650% since 2012, but currently, 35,000 people in the US have data science skills, while hundreds of companies are hiring for those roles. ● Software engineering is a common starting point for professionals who are in the top five fastest-growing jobs today. ● Data Science gives you career flexibility
  • 6. Who Are Data Scientists?
  • 7.
  • 8. What is Machine Learning? Machine learning teaches computers to do what comes naturally to humans and animals: learn from experience. Machine learning algorithms use computational methods to “learn” information directly from data without relying on a predetermined equation as a model. The algorithms adaptively improve their performance as the number of samples available for learning increases.
  • 9. A Definition A computer program is said to learn from experience E with respect to some task T and some performance measure P if its performance on T, as measured by P, improves with experience E. - Tom Mitchell
  • 10. A Small Question Suppose we feed a learning algorithm a lot of historical weather data, and have it learn to predict weather. In this setting, what are T, P, and E?
  • 11.
  • 13. Real World Applications With the rise in big data, machine learning has become particularly important for solving problems in areas like these: ● Image processing and computer vision, for face recognition, motion detection, and object detection ● Computational biology, for tumor detection, drug discovery, and DNA sequencing ● Energy production, for price and load forecasting ● Automotive, aerospace, and manufacturing, for predictive maintenance ● Natural language processing
  • 14. How Machine Learning Works Machine learning uses two types of techniques: ● Supervised learning, which trains a model on known input and output data so that it can predict future outputs ● Unsupervised learning, which finds hidden patterns or intrinsic structures in input data.
  • 16. Supervised Learning The aim of supervised machine learning is to build a model that makes predictions based on evidence in the presence of uncertainty. A supervised learning algorithm takes a known set of input data and known responses to the data (output) and trains a model to generate reasonable predictions for the response to new data.
  • 17. Classification - predict discrete responses. Classification models classify input data into categories, for example, whether an email is genuine or spam, or whether a tumor is cancerous or benign. Regression - predict continuous responses, for example, changes in temperature or fluctuations in power demand. Typical applications include electricity load forecasting and algorithmic trading.
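The difference between the two kinds of supervised prediction can be sketched in a few lines of plain Python. This is only an illustration: the data, the helper names `fit_line` and `nearest_neighbor_label`, and the choice of least squares and 1-nearest-neighbor are assumptions made for the example, not methods named in the slides.

```python
# Toy data and helper names are invented for illustration only.

def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def nearest_neighbor_label(train, query):
    """1-nearest-neighbor: return the label of the closest training point."""
    closest = min(train, key=lambda pair: abs(pair[0] - query))
    return closest[1]


# Regression: hour of day -> power demand (a continuous response).
hours = [1.0, 2.0, 3.0, 4.0]
demand = [12.0, 14.0, 16.0, 18.0]
slope, intercept = fit_line(hours, demand)
predicted_demand = slope * 5.0 + intercept

# Classification: number of suspicious words -> label (a discrete response).
emails = [(0, "genuine"), (1, "genuine"), (7, "spam"), (9, "spam")]
predicted_label = nearest_neighbor_label(emails, 6)

print(predicted_demand, predicted_label)
```

The regression model returns a number on a continuous scale, while the classifier can only return one of the labels it saw during training.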
  • 18. Unsupervised Learning Unsupervised learning finds hidden patterns or intrinsic structures in data. It is used to draw inferences from datasets consisting of input data without labeled responses.
  • 19. Clustering is the most common unsupervised learning technique. It is used for exploratory data analysis to find hidden patterns or groupings in data. Applications for clustering include gene sequence analysis, market research, and object recognition.
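As a rough illustration of clustering, the sketch below runs k-means on one-dimensional data. The slides do not name a specific clustering algorithm, so k-means, the function name `kmeans_1d`, the starting centers, and the purchase counts are all assumptions chosen for the example.

```python
def kmeans_1d(values, centers, iterations=10):
    """Plain k-means on 1-D data with caller-supplied starting centers."""
    centers = list(centers)
    groups = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: attach each value to its nearest center.
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Update step: move each center to the mean of its group
        # (an empty group keeps its previous center).
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups


# Hypothetical monthly purchase counts for six supermarket products.
purchases = [1, 2, 3, 20, 22, 24]
centers, groups = kmeans_1d(purchases, centers=[1, 24])
print(centers, groups)
```

No labels are supplied anywhere: the two groups emerge only from how close the values are to one another.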
  • 20. Knowledge Test Which of the following would you apply supervised learning to? 1. Given genetic (DNA) data from a person, predict the odds of him/her developing diabetes over the next 10 years. 2. Given a large dataset of medical records from patients suffering from heart disease, try to learn whether there might be different clusters of such patients for which we might tailor separate treatments. 3. Given data on how 1000 medical patients respond to an experimental drug (such as effectiveness of the treatment, side effects, etc.), discover whether there are different categories or "types" of patients in terms of how they respond to the drug, and if so what these categories are. 4. Have a computer examine an audio clip of a piece of music, and classify whether or not there are vocals (i.e., a human voice singing) in that audio clip, or if it is a clip of only musical instruments (and no vocals).
  • 21. Knowledge Test Which of the following questions can be answered using a classification algorithm? 1. How does the exchange rate depend on the GDP? 2. Does a document contain the handwritten letter S? 3. How can I group supermarket products using purchase frequency?
  • 22. Knowledge Test 1. Suppose you are working on weather prediction, and you would like to predict whether or not it will be raining at 5pm tomorrow. You want to use a learning algorithm for this. Would you treat this as a classification or a regression problem? 2. Suppose you are working on stock market prediction. You would like to predict whether or not a certain company will declare bankruptcy within the next 7 days (by training on data of similar companies that had previously been at risk of bankruptcy). Would you treat this as a classification or a regression problem?
  • 23. How Do You Decide Which Algorithm to Use?
  • 24. Choosing the right algorithm can seem overwhelming. There are dozens of supervised and unsupervised machine learning algorithms, and each takes a different approach to learning.
  • 25. There is no best method or one size fits all. Finding the right algorithm is partly just trial and error. But algorithm selection also depends on the size and type of data you’re working with, the insights you want to get from the data, and how those insights will be used.
  • 26. Two-Class Classification
  • 27. Multi-Class Classification
  • 31. When Should We Use Machine Learning? Consider using machine learning when you have a complex task or problem involving a large amount of data and lots of variables, but no existing formula or equation.
  • 32.
  • 33. Knowledge Test Have a look at the statements below and identify the one which is not a machine learning problem 1. Given a viewer's shopping habits, recommend a product to purchase the next time she visits your website. 2. Given the symptoms of a patient, identify her illness. 3. Predict the USD/EUR exchange rate for February 2023. 4. Compute the mean wage of 10 employees for your company.
  • 34. Knowledge Test Which of the following statements uses a machine learning model? 1. Determine whether an incoming email is spam or not 2. Obtain the name of last year's FIFA Ballon d’Or champion 3. Automatically tagging your new Facebook photos 4. Select the student with the highest grade on a statistics course
  • 36. There is NO Straight Line With machine learning there’s rarely a straight line from start to finish. You’ll find yourself constantly iterating and trying different ideas and approaches.
  • 37. Machine Learning Challenges ● Data comes in all shapes and sizes ● Preprocessing your data might require specialized knowledge and tools ● It takes time to find the best model to fit the data.
  • 38. Questions to Ask Before Starting Every machine learning workflow begins with three questions: ● What kind of data are you working with? ● What insights do you want to get from it? ● How and where will those insights be applied? Your answers to these questions help you decide whether to use supervised or unsupervised learning.
  • 39. Data Science - Five Questions There are only five questions that data science answers: ● Is this A or B? ● Is this weird? ● How much – or – How many? ● How is this organized? ● What should I do next?
  • 40. Knowledge Test Which of the following questions can be answered using a classification algorithm? 1. How does the exchange rate depend on the GDP? 2. Does a document contain the handwritten letter S? 3. How can I group supermarket products using purchase frequency?
  • 41.
  • 42. Workflow at a Glance
  • 43. Step 1 - Load the Data We store the labeled data sets in a text file. A flat file format such as text or CSV is easy to work with and makes it straightforward to import data. Machine learning algorithms aren’t smart enough to tell the difference between noise and valuable information. Before using the data for training, we need to make sure it’s clean and complete.
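Loading a flat file and checking that it is complete might look like the sketch below. The column names, the values, and the helper `load_rows` are hypothetical, and an in-memory string stands in for a CSV file on disk so that the example is self-contained.

```python
import csv
import io

# Stand-in for a CSV file on disk; the columns and values are made up.
RAW_CSV = """temperature,humidity,rained
21.5,0.40,no
19.0,0.85,yes
,0.70,yes
23.1,0.35,no
"""


def load_rows(handle):
    """Read CSV rows into dicts, converting numeric columns to float.

    Empty cells become None so that later steps can detect missing values.
    """
    rows = []
    for record in csv.DictReader(handle):
        rows.append({
            "temperature": float(record["temperature"]) if record["temperature"] else None,
            "humidity": float(record["humidity"]) if record["humidity"] else None,
            "rained": record["rained"],
        })
    return rows


rows = load_rows(io.StringIO(RAW_CSV))
# Keep only the rows in which every column has a value.
complete_rows = [r for r in rows if None not in r.values()]
print(len(rows), len(complete_rows))
```

With a real file, `io.StringIO(RAW_CSV)` would be replaced by an opened file handle, for example `open(path, newline="")`.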
  • 44. Step 2 - Preprocess the Data To preprocess the data we do the following: ● Look for outliers: data points that lie outside the rest of the data ● Check for missing values ● Divide the data into two sets ○ We save part of the data for testing (the test set) and use the rest (the training set) to build models. This is referred to as holdout, and is a useful cross-validation technique.
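The three preprocessing steps can be sketched with the standard library alone. The interquartile-range rule for outliers, the 25% test fraction, the fixed random seed, and the sensor readings are all assumptions made for this example; the slides do not prescribe a particular outlier test or split ratio.

```python
import random
import statistics


def drop_missing(values):
    """Remove entries recorded as None (missing)."""
    return [v for v in values if v is not None]


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]


def holdout_split(values, test_fraction=0.25, seed=0):
    """Shuffle a copy of the data and hold out a fraction for testing."""
    shuffled = list(values)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical sensor readings with one gap and one suspicious spike.
readings = [10, 12, None, 11, 13, 12, 95, 11, 12]

present = drop_missing(readings)
outliers = iqr_outliers(present)
clean = [v for v in present if v not in outliers]
training_set, test_set = holdout_split(clean)
print(outliers, len(training_set), len(test_set))
```

Whether an outlier should be dropped, capped, or kept is a judgment call that depends on the data; here it is simply removed to keep the sketch short.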
  • 45. Step 3 - Derive Features Deriving features (also known as feature engineering or feature extraction) turns raw data into information that a machine learning algorithm can use. Use feature selection to: • Improve the accuracy of a machine learning algorithm • Boost model performance for high-dimensional data sets • Improve model interpretability • Prevent overfitting
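As one small illustration of deriving features, the sketch below turns a raw timestamp string into two values a model could use directly. The electricity-load records, the field names, and the helper `derive_features` are hypothetical and chosen only to echo the load-forecasting application mentioned earlier.

```python
from datetime import datetime

# Hypothetical raw records: (timestamp text, electricity load in kW).
raw_records = [
    ("2024-06-17 08:00", 310.0),
    ("2024-06-17 18:00", 420.0),
    ("2024-06-22 18:00", 350.0),
]


def derive_features(timestamp_text, load_kw):
    """Turn a raw timestamp into features an algorithm can use."""
    moment = datetime.strptime(timestamp_text, "%Y-%m-%d %H:%M")
    return {
        "hour": moment.hour,                  # time-of-day effect
        "is_weekend": moment.weekday() >= 5,  # Saturday is 5, Sunday is 6
        "load_kw": load_kw,
    }


features = [derive_features(text, load) for text, load in raw_records]
print(features)
```

The raw string carries the same information, but a learning algorithm cannot exploit it until it is expressed as numbers or flags like these.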
  • 46. Step 4 - Build and Train Model ● The predefined algorithms and the training data are used for building and training the model. ● The test data, held out in Step 2, is used to evaluate the trained model.
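A minimal sketch of building a model and then checking it, assuming the usual holdout convention (fit on the training set, score on the held-out test set). The nearest-centroid classifier, the function names, and the tumor-size numbers are assumptions chosen to keep the example short; they are not taken from the slides.

```python
def train_centroids(samples):
    """Fit a nearest-centroid model: the mean feature value per label."""
    sums, counts = {}, {}
    for x, label in samples:
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def predict(centroids, x):
    """Return the label whose centroid is closest to x."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))


def accuracy(centroids, samples):
    """Fraction of samples whose predicted label matches the true label."""
    hits = sum(1 for x, label in samples if predict(centroids, x) == label)
    return hits / len(samples)


# Invented one-feature data: (measurement, label).
training_set = [(1.0, "benign"), (2.0, "benign"), (3.0, "benign"),
                (7.0, "malignant"), (8.0, "malignant"), (9.0, "malignant")]
test_set = [(2.5, "benign"), (4.0, "benign"),
            (6.5, "malignant"), (5.5, "benign")]

model = train_centroids(training_set)      # build using training data only
test_accuracy = accuracy(model, test_set)  # evaluate on unseen data
print(model, test_accuracy)
```

Scoring on data the model has never seen gives a more honest estimate of how it will behave on new inputs than scoring on the training set would.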
  • 47. Step 5 - Improve the Model Improving a model can take two different directions: make the model simpler or add complexity. Simplify - reduce the number of features Add Complexity - make it more fine-tuned
  • 48. Simplify Popular feature reduction techniques include: ● Correlation matrix - shows the relationship between variables, so that variables (or features) that are not highly correlated with the response can be removed. ● Principal component analysis (PCA) - eliminates redundancy by finding a combination of features that captures key distinctions between the original features and brings out strong patterns in the dataset. ● Sequential feature reduction - reduces features iteratively on the model until there is no improvement in performance.
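The correlation-matrix idea can be sketched as follows, under one reading of the bullet: features that are only weakly correlated with the response are candidates for removal. The housing data, the column names, the helper `pearson`, and the 0.5 cut-off are all assumptions made for the example.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Invented housing data: three candidate features and the response (price).
columns = {
    "size_m2": [50, 60, 70, 80, 90],
    "size_ft2": [538, 646, 753, 861, 969],  # same quantity in other units
    "door_color_code": [3, 1, 4, 1, 5],
    "price": [150, 180, 210, 240, 270],
}

# Correlation matrix: every column against every other column.
matrix = {a: {b: pearson(columns[a], columns[b]) for b in columns}
          for a in columns}

# Features weakly correlated with the response are candidates for removal.
weak = [name for name in columns
        if name != "price" and abs(matrix[name]["price"]) < 0.5]
print(weak)
```

The same matrix also shows that `size_m2` and `size_ft2` are almost perfectly correlated with each other, which is the kind of redundancy that PCA is meant to eliminate.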
  • 49. Add Complexity ● Use model combination – merge multiple simpler models into a larger model that is better able to represent the trends in the data than any of the simpler models could on their own. ● Add more data sources
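Model combination can be sketched with a simple majority vote over three hand-written rules. The rules, their thresholds, and the weather fields are invented for illustration, and majority voting is only one of several ways to combine models.

```python
from collections import Counter


# Three deliberately simple "models"; the thresholds are made up.
def rule_humidity(day):
    return "rain" if day["humidity"] > 0.7 else "dry"


def rule_pressure(day):
    return "rain" if day["pressure_hpa"] < 1005 else "dry"


def rule_clouds(day):
    return "rain" if day["cloud_cover"] > 0.6 else "dry"


def majority_vote(models, day):
    """Combine the models by returning the most common prediction."""
    votes = Counter(model(day) for model in models)
    return votes.most_common(1)[0][0]


models = [rule_humidity, rule_pressure, rule_clouds]

humid_day = {"humidity": 0.8, "pressure_hpa": 1010, "cloud_cover": 0.9}
clear_day = {"humidity": 0.5, "pressure_hpa": 1000, "cloud_cover": 0.2}

print(majority_vote(models, humid_day), majority_vote(models, clear_day))
```

Each rule is wrong on one of these two days when judged against the combined result, yet the vote still settles on a single answer, which is the intuition behind merging simpler models.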
  • 50. TO DO ● Get started ● Familiarize yourself with the maths and algorithms ● Select the infrastructure or tool ● Create your profile and participate in competitions
  • 51. Christy Abraham Joy Email - christyabrahamjoy@gmail.com Mob - +91 94000 95273 Feel Free to Contact!