3. The Machine Learning Workflow
The job of the data scientist
The processes a data scientist follows to provide feedback to decision makers
The machine learning process in a business environment
4. Goals of Machine Learning Workflow
Goals
• Derive answers to business challenges
• Derive meaningful conclusions for complicated issues
• Identify actionable steps with a given set of variables
5. 7 steps of Machine Learning
1. Get more data
2. Ask a sharp question
3. Add the data to the table
4. Check for quality
5. Transform features
6. Answer the question
7. Use the answer
6. Step 1 : Get More Data
Data can be collected in different formats
Investigate a business challenge
The quality of the model depends on the quality and quantity of the data gathered
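As a minimal sketch of collecting data that arrives in different formats, the snippet below reads the same kind of record from CSV text and from JSON text and merges both into one list. The sample values and the field names `date` and `units` are hypothetical and chosen only for illustration.

```python
import csv
import io
import json

# Hypothetical sample data: the same kind of sales record in two formats.
CSV_TEXT = "date,units\n2024-01-01,12\n2024-01-02,15\n"
JSON_TEXT = '[{"date": "2024-01-03", "units": 9}]'

def load_csv(text):
    """Parse CSV text into a list of dict records with integer units."""
    return [{"date": row["date"], "units": int(row["units"])}
            for row in csv.DictReader(io.StringIO(text))]

def load_json(text):
    """Parse a JSON array into the same record shape."""
    return [{"date": item["date"], "units": int(item["units"])}
            for item in json.loads(text)]

# Both sources end up in one uniform list of records.
records = load_csv(CSV_TEXT) + load_json(JSON_TEXT)
print(len(records), "records collected")
```

Converting every source to one record shape early keeps the later steps independent of where the data came from.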
8. Step 2 : Ask a sharp question
Need for a sharp question
It is direct and specific
It focuses on a single topic
It helps you get clear answers
It focuses on the exact need and requirement
9. Step 2 : Ask a sharp question (Vague vs Sharp)
Vague questions
1. What should you do?
2. How should you live your life?
3. Which career path should you take?
4. Which data can tell you about your business?
Sharp questions
1. Which route will get you to work faster?
2. When are you planning to join the company?
3. How many times will a user use the new product features?
4. Where did you go to college?
10. Step 2 : Ask a sharp question (Example)
✦Study different tables of data in the database and analyse your company’s monthly sales performance
✦Understand how the company is doing in terms of market share
✦Analyse the historical data and predict the stock price for a future date
11. Step 3 : Add data to the table
Data analysts arrange data in database tables in a systematic manner.
Systematic arrangement of data helps in detailed analysis.
Data is stored in the table in the form of columns and rows.
Table columns represent data of a single type, and rows represent records pertaining to one entity.
Aggregate, distribute, compute, or measure the data to derive the analysis.
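The column-and-row layout described above can be sketched with Python's built-in `sqlite3` module. The table name `stock_prices`, its columns, and the values are hypothetical. Each column holds one type of data and each row is one record.

```python
import sqlite3

# In-memory database; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stock_prices (trade_date TEXT, price REAL)")

# Each tuple is one row (one record); each position is one column.
rows = [("2024-01-01", 101.5), ("2024-01-02", 103.0), ("2024-01-03", 99.75)]
conn.executemany("INSERT INTO stock_prices VALUES (?, ?)", rows)

# Read the table back in date order.
stored = conn.execute(
    "SELECT trade_date, price FROM stock_prices ORDER BY trade_date"
).fetchall()
for trade_date, price in stored:
    print(trade_date, price)
conn.close()
```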
12. Data Analysis in Machine Learning
The process of deriving new findings from historical data
Focuses on aggregating table data to find answers to business problems
Performed by data analysts to build machine learning algorithms
13. Example : Add Data to the Table
• The stock price column shows the stock value across different dates
• Each table row represents observations across given attributes
14. Example : Data Analysis
Aggregate and distribute the data as shown here:
15. Example : Aggregate
• You can aggregate the data in the table to derive answers
• This process is called data analysis and involves counting total observations in a table or combining data from multiple tables
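The two aggregations mentioned above, counting observations and combining tables, might look like the following in plain Python. The `sales` and `products` tables and their fields are hypothetical stand-ins for the tables on the slide.

```python
from collections import defaultdict

# Hypothetical tables: one row per sale, and one row per product.
sales = [
    {"product_id": 1, "amount": 20.0},
    {"product_id": 2, "amount": 35.0},
    {"product_id": 1, "amount": 15.0},
]
products = [
    {"product_id": 1, "name": "notebook"},
    {"product_id": 2, "name": "backpack"},
]

# Aggregate 1: count the total observations in the sales table.
total_observations = len(sales)

# Aggregate 2: combine both tables on product_id and sum sales per product.
name_by_id = {p["product_id"]: p["name"] for p in products}
revenue_by_product = defaultdict(float)
for sale in sales:
    revenue_by_product[name_by_id[sale["product_id"]]] += sale["amount"]

print(total_observations)
print(dict(revenue_by_product))
```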
16. Example : Distribute, Compute, Measure
• An example of performing aggregate, distribute, compute and measure operations on data in tables
• Each feature and its observations are distributed across the table and then combined
17. Example : Estimate
• The market share column shows the estimated stock price values of the company that are derived from the previous steps
18. Step 4 : Check for Quality
Determine if the data is acceptable for further investigation
Ensure the data in a column is in a consistent format
20. Check for Quality : Example
• There are inconsistencies in the data format of the Birth year column of the table
• Dates in the column need to be converted to a consistent format to make them readable for the ML algorithm
21. Check for Quality : Example
• Store the Birth year column values as plain numbers, without any special characters
• Checking data quality is a critical step
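One way the Birth year clean-up could be done is sketched below. The raw values are hypothetical examples of inconsistent formats, not the ones from the slide, and the two-digit-year cutoff is an assumption that would need to be set for the real data.

```python
import re

# Hypothetical raw values showing the kind of inconsistency described above.
raw_birth_years = ["1985", "'92", "12/03/1978", "1990.", "born 2001", "n/a"]

def normalize_birth_year(value, century_cutoff=30):
    """Return a four-digit year as int, or None if no year can be recovered."""
    # Prefer an explicit four-digit year anywhere in the string.
    match = re.search(r"(19|20)\d{2}", value)
    if match:
        return int(match.group(0))
    # Fall back to a lone two-digit year such as '92.
    match = re.fullmatch(r"\D*(\d{2})\D*", value)
    if match:
        two_digits = int(match.group(1))
        # Assumption: values above the cutoff belong to the 1900s.
        return (1900 if two_digits > century_cutoff else 2000) + two_digits
    return None

clean = [normalize_birth_year(v) for v in raw_birth_years]
print(clean)  # [1985, 1992, 1978, 1990, 2001, None]
```

Values that cannot be recovered come back as `None`, so they can be reviewed or dropped explicitly instead of silently passing into the model.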
22. Step 5 : Transform Features
Feature Engineering
•Enables you to make sense of the data, especially when there are multiple features
•Helps overcome challenges where some features may not give useful information for the model, whereas other features may be combined to derive meaningful information
23. Tricks of Feature Engineering
Data Specific
•Scale-Invariant Feature Transform (SIFT): Images
•Term Frequency-Inverse Document Frequency (TF-IDF): Text
Domain Specific
•Econometric, technological, agricultural and sociological data engineering
Deep Learning
•Images, text and audio data engineering
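To make the TF-IDF entry above concrete, here is a minimal sketch of one common variant: term frequency multiplied by the natural log of (number of documents / documents containing the term). Libraries often use smoothed variants, so treat this as illustrative. The three-document corpus is hypothetical.

```python
import math
from collections import Counter

# Hypothetical three-document corpus.
docs = [
    "stock price rises",
    "stock price falls",
    "weather forecast sunny",
]
tokenized = [d.split() for d in docs]

def tf_idf(term, doc_tokens, corpus):
    """One common TF-IDF variant: term frequency times log(N / document frequency)."""
    tf = Counter(doc_tokens)[term] / len(doc_tokens)
    df = sum(1 for tokens in corpus if term in tokens)
    if df == 0:
        return 0.0
    return tf * math.log(len(corpus) / df)

# "stock" appears in two of three documents, "rises" in only one,
# so "rises" gets the higher weight within the first document.
score_stock = tf_idf("stock", tokenized[0], tokenized)
score_rises = tf_idf("rises", tokenized[0], tokenized)
print(score_stock, score_rises)
```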
24. Transform Features : Example
• There are 3 columns and 65670 rows
• Features 0 and 1 have similar values
• The numbers are meaningless and scattered
25. Transform Features : Example
• Values of feature column 0 are multiplied with every observation in feature column 1
• These values are plotted in image 2
26. Transform Features : Example
• By plotting the values obtained by subtracting feature 0 from feature 1, a curve is formed
• This curve is a normal (Gaussian) distribution, also called a bell-shaped curve
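The subtraction step can be imitated on synthetic data. The sketch below does not use the slide's 65670-row table. It assumes, purely for illustration, that feature 1 equals feature 0 plus Gaussian noise, which is one situation in which the difference of two similar-looking columns produces a bell-shaped curve.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Synthetic stand-in for the slide's table: feature 1 tracks feature 0
# plus Gaussian noise, so the raw columns look similar and uninformative.
feature_0 = [random.uniform(0, 100) for _ in range(5000)]
feature_1 = [x + random.gauss(0, 1) for x in feature_0]

# Derived feature: feature 1 minus feature 0.
difference = [b - a for a, b in zip(feature_0, feature_1)]

# Crude text histogram of the derived feature, one row per unit-wide bin.
for low in range(-3, 3):
    count = sum(1 for d in difference if low <= d < low + 1)
    print(f"{low:+d} to {low + 1:+d}: {'#' * (count // 50)}")

diff_mean = statistics.mean(difference)
diff_stdev = statistics.stdev(difference)
print(round(diff_mean, 3), round(diff_stdev, 3))
```

The histogram rows should be longest near zero and taper off on both sides, which is the bell shape the slide describes.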
27. Step 6 : Answer the question
Helps to analyse if the obtained answers are clear
Questions
How much or how many?
Which category?
Which group?
Does this look strange?
Which action?
28. Answer the Question: Type 1
How much or how many?
What will be the temperature this Friday?
How many people will like your post?
What will be your product sales next month?
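"How much or how many?" questions are commonly answered with regression. As a hedged sketch, the snippet below fits a straight line by ordinary least squares to a hypothetical five-month sales history and extrapolates to month 6. The numbers are invented so the arithmetic is easy to follow.

```python
# Hypothetical history: month number and units sold.
months = [1, 2, 3, 4, 5]
sales = [100.0, 110.0, 120.0, 130.0, 140.0]

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

slope, intercept = fit_line(months, sales)
next_month_sales = slope * 6 + intercept
print(next_month_sales)  # 150.0 for this perfectly linear history
```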
29. Answer the Question: Type 2
Which category?
Is this an image of a dog?
What is the topic of this news article?
Which hotel in your area offers free Wi-Fi?
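"Which category?" questions are commonly answered with classification. The sketch below uses a 1-nearest-neighbour rule, chosen here only because it is short. The weight and height measurements and the labels are hypothetical.

```python
import math

# Hypothetical labelled examples: (weight_kg, height_cm) -> category.
training = [
    ((4.0, 25.0), "cat"),
    ((5.0, 28.0), "cat"),
    ((20.0, 55.0), "dog"),
    ((30.0, 60.0), "dog"),
]

def classify(point):
    """Return the label of the closest training example (1-nearest neighbour)."""
    _, label = min(training, key=lambda item: math.dist(item[0], point))
    return label

print(classify((6.0, 30.0)))   # cat
print(classify((25.0, 58.0)))  # dog
```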
30. Answer the Question: Type 3
Which group?
Which group of shoppers purchase similar products?
Which group of viewers like horror movies?
How best can you divide this book into ten topics?
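"Which group?" questions are commonly answered with clustering. As one possible sketch, the code below runs plain k-means on one-dimensional data. The basket values and the starting centers are hypothetical, and real data would usually have more dimensions.

```python
# Hypothetical data: average basket value per shopper.
values = [10.0, 12.0, 11.0, 50.0, 52.0, 49.0]

def k_means_1d(points, centers, iterations=10):
    """Plain k-means on one-dimensional data with given starting centers."""
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            groups[nearest].append(p)
        # Move each center to the mean of its group; keep it if the group is empty.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups

centers, groups = k_means_1d(values, centers=[0.0, 100.0])
print(centers)
print(groups)  # [[10.0, 12.0, 11.0], [50.0, 52.0, 49.0]]
```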
31. Answer the Question: Type 4
Does this look strange?
Is this internet message typical?
Is this heart beat reading abnormal?
Do these transactions look unusual compared with the customer's usual transactions?
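"Does this look strange?" questions are commonly answered with anomaly detection. A minimal sketch, assuming a simple rule that flags values more than three standard deviations from the customer's past mean, is shown below. The transaction amounts and the threshold are hypothetical.

```python
import statistics

# Hypothetical past transaction amounts for one customer.
history = [42.0, 39.5, 45.0, 41.0, 40.5, 43.0, 38.0, 44.0]

def looks_strange(value, past, threshold=3.0):
    """Flag a value more than `threshold` standard deviations from the past mean."""
    mean = statistics.mean(past)
    stdev = statistics.stdev(past)
    return abs(value - mean) > threshold * stdev

print(looks_strange(400.0, history))  # True
print(looks_strange(41.5, history))   # False
```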
32. Answer the Question: Type 5
Which action?
Should you vacuum again or not?
Should you beat the red light?
Should you raise or lower the temperature?
33. Step 7 : Use the Answer
There are plenty of ways to use the answer derived from the previous step.
Making a decision
Proposing the price of an item
Publishing the results obtained as part of a research paper
Constructing a dashboard on a visualisation tool
Making changes to product features
34. Key Takeaway
✦The machine learning workflow involves seven steps.
✦The first step of the machine learning workflow is to collect data to answer different business questions.
✦To get the desired response, always ask sharp questions and avoid vague ones.
✦Arrange raw data in tables for better data analysis.
✦To ensure data consistency, data scientists must check the quality of the data.
✦Transforming features is used to increase the efficiency of the machine learning model.
✦The answer received by the model helps in solving business challenges.
✦Learn from the answer received by the model and implement it as a solution to the problem.
Editor's Notes
The business must understand the various areas involved in machine learning