Introduction to DS, ML and IBM Tools
Qamar Un Nisa
-Lead Developer Advocate, IBM Pakistan
-Women in Data Science (WiDS) Ambassador
-MS in Data Science, NLP Thesis
January 29, 2021
Contents
• Introduction to AI and Data
• Data Analysis and its types
• Data Science Flow
• Data Science Methodology
• AI / Data Science Use-Cases
• Data Science Tools and Frameworks
Around 2.5 quintillion bytes of data are generated every day.
What is it all about?
[Figure: diagrams relating Data Science, Artificial Intelligence, Machine Learning and Deep Learning; one region is labelled "Danger Zone"]
• Artificial Intelligence: any technique that enables computers to mimic human intelligence.
• Machine Learning: AI techniques that give computers the ability to learn from previous data and patterns, mimicking human learning.
• Deep Learning: machine learning techniques that make the computation of multi-layer neural networks feasible, mimicking the human brain.
Types of Data Analysis
• Descriptive Analysis – What happened: uncovers insights. Example: dashboards.
• Diagnostic Analysis – Why it happened: uncovers patterns and dependencies. Example: social media marketing campaign.
• Predictive Analysis – What will happen: uses past patterns to predict future trends. Example: predicting future sales.
• Prescriptive Analysis – What to do: eliminate possible future problems or take advantage of promising trends. Example: recommendation systems.
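The difference between descriptive and predictive analysis can be sketched in a few lines of Python. The monthly sales figures below are invented for illustration, and the straight-line trend is only one simple assumption about how past patterns might continue; real sales forecasting would normally use richer models.

```python
from statistics import mean

# Hypothetical monthly sales figures (made-up illustration data).
sales = [100.0, 110.0, 120.0, 130.0, 140.0]
months = list(range(1, len(sales) + 1))

# Descriptive analysis: summarise what happened.
average_sales = mean(sales)
best_month = months[sales.index(max(sales))]


def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    x_bar, y_bar = mean(xs), mean(ys)
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    variance = sum((x - x_bar) ** 2 for x in xs)
    slope = covariance / variance
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Predictive analysis: extend the past trend one month into the future.
slope, intercept = fit_line(months, sales)
next_month_forecast = slope * (len(sales) + 1) + intercept

print(f"Average sales so far: {average_sales}")
print(f"Best month so far: {best_month}")
print(f"Forecast for month {len(sales) + 1}: {next_month_forecast}")
```

A prescriptive step would go one stage further and turn the forecast into an action, for example adjusting stock levels.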
A data scientist wears different types of hats.
Data Science Flow Pipeline
Stages:
• Raw Data
• 1. Data Cleaning / Processing
• 2. Exploratory Data Analysis / Data Visualization
• 3. Machine Learning Model / Data Modelling
• Model Deployment
Activities:
• Process Data
• Explore (basic stats, plots, trends, pattern mining)
• Design Model, e.g. feature engineering
• Learn Model
• Update/Improve Model
• Verify Experiment (A/B testing)
Supporting inputs: Domain Knowledge, Expert Knowledge
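As a rough sketch of how the pipeline stages chain together, the Python below passes a tiny dataset through cleaning, exploration, model learning and verification. The data, the function names and the choice of a mean-value baseline as the "model" are all assumptions made for illustration, not part of the original slides.

```python
from statistics import mean

# Hypothetical raw data: (hours_studied, exam_score); None marks a missing value.
raw_data = [(1, 52.0), (2, 55.0), (None, 60.0), (4, 61.0), (5, None), (3, 58.0)]


def clean(rows):
    """1. Data cleaning: drop rows that contain a missing value."""
    return [row for row in rows if None not in row]


def explore(rows):
    """2. Exploratory analysis: basic statistics for each column."""
    hours, scores = zip(*rows)
    return {"rows": len(rows), "mean_hours": mean(hours), "mean_score": mean(scores)}


def learn_model(rows):
    """3. Modelling: a deliberately simple baseline that always predicts the mean score."""
    baseline = mean(score for _, score in rows)
    return lambda hours: baseline


def verify(model, rows):
    """Verification: mean absolute error of the model's predictions."""
    return mean(abs(model(hours) - score) for hours, score in rows)


cleaned = clean(raw_data)
summary = explore(cleaned)
model = learn_model(cleaned)
error = verify(model, cleaned)

print(summary)
print(f"Mean absolute error of baseline: {error}")
```

In practice the error would be measured on data the model has not seen, and a better model would replace the baseline in the update/improve step.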
Researching the Data - Types of Data
• Numerical
  • Discrete: number of audiences, age
  • Continuous: height, salary
• Categorical
  • Nominal: name, gender
  • Ordinal: rating, grades
• Other data shown: images, text data
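The distinction between nominal and ordinal data matters when values are turned into numbers for a model. The following Python sketch shows one common approach, under the assumption that grades run from D (lowest) to A (highest); the category lists and function names are hypothetical.

```python
# Assumed grade ordering from lowest to highest (illustrative only).
GRADE_ORDER = ["D", "C", "B", "A"]

# Nominal categories have no inherent order.
GENDER_CATEGORIES = ["Female", "Male"]


def encode_ordinal(value, order):
    """Ordinal data: map each category to its rank so the order is preserved."""
    return order.index(value)


def one_hot(value, categories):
    """Nominal data: one indicator column per category, with no implied order."""
    return [1 if value == category else 0 for category in categories]


print(encode_ordinal("B", GRADE_ORDER))     # rank of grade B
print(one_hot("Male", GENDER_CATEGORIES))   # indicator vector for "Male"
```

Encoding a nominal variable as ranks would suggest an order that does not exist, which is why the two cases are handled differently here.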
Preparing the Data
Data Cleaning
• Bad formats: ignore or treat as missing data
• Missing data: replace the value or remove the row
• Useless variables: remove
• Wrong data: dummy or rubbish values such as aaa, bbb, 21ad, etc.
Data Transformation
• Transform variables: data formats, column data type
• Create derived variables: Country from IP, age from ID Card
• Normalize strings: different spellings and nicknames (Male, M, Men etc.)
• Feature value rescaling: many ML algorithms work better when feature values are rescaled to a common range such as 0 to 1
• Enrich: look up and add age from profile
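Several of the cleaning and transformation steps above can be sketched in plain Python. The records, field names and alias table below are invented for illustration, and filling gaps with the median is just one possible replacement strategy.

```python
from statistics import median

# Hypothetical raw records (illustrative values only).
records = [
    {"gender": "Male", "age": "34", "salary": 50000.0},
    {"gender": "M", "age": "aaa", "salary": 65000.0},
    {"gender": "female", "age": "29", "salary": None},
    {"gender": "F", "age": "41", "salary": 80000.0},
]

# Normalize strings: map different spellings to one canonical value.
GENDER_ALIASES = {"male": "Male", "m": "Male", "men": "Male",
                  "female": "Female", "f": "Female"}


def parse_age(text):
    """Wrong data / bad formats: return None so the value is treated as missing."""
    return int(text) if text.isdigit() else None


def fill_missing(values):
    """Missing data: replace None with the median of the known values."""
    known = [v for v in values if v is not None]
    fill = median(known)
    return [fill if v is None else v for v in values]


def min_max_scale(values):
    """Feature rescaling to the range 0 to 1 (assumes max > min)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


genders = [GENDER_ALIASES[r["gender"].strip().lower()] for r in records]
ages = fill_missing([parse_age(r["age"]) for r in records])
scaled_salaries = min_max_scale(fill_missing([r["salary"] for r in records]))

print(genders)
print(ages)
print(scaled_salaries)
```

Removing the affected rows instead of filling them is the other option the slide mentions; which one is appropriate depends on how much data is missing.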
Data Exploration – Data Visualization
Applying Models on the Data – Data Modelling
Machine Learning
• Supervised Learning: Classification, Regression
• Unsupervised Learning: Clustering
• Reinforcement Learning
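To illustrate the split between supervised and unsupervised learning, here is a minimal Python sketch using made-up height values. The nearest-neighbour classifier and the two-group k-means routine are simple stand-ins chosen for brevity; they are assumptions for illustration, not the specific algorithms covered in the talk.

```python
def nearest_neighbour(train, query):
    """Supervised classification: return the label of the closest labelled example."""
    closest = min(train, key=lambda example: abs(example[0] - query))
    return closest[1]


def two_means(values, iterations=10):
    """Unsupervised clustering: split 1-D values into two groups (k-means with k=2).

    Starts from the smallest and largest value as centres and assumes
    both groups stay non-empty, which holds for well-separated data.
    """
    low, high = min(values), max(values)
    for _ in range(iterations):
        group_low = [v for v in values if abs(v - low) <= abs(v - high)]
        group_high = [v for v in values if abs(v - low) > abs(v - high)]
        low = sum(group_low) / len(group_low)
        high = sum(group_high) / len(group_high)
    return sorted(group_low), sorted(group_high)


# Supervised: labels are provided with the training data (hypothetical heights in cm).
labelled = [(150, "short"), (155, "short"), (180, "tall"), (185, "tall")]
prediction = nearest_neighbour(labelled, 178)

# Unsupervised: no labels, the algorithm finds the groups on its own.
unlabelled = [150, 152, 155, 180, 182, 185]
clusters = two_means(unlabelled)

print(prediction)
print(clusters)
```

Regression, the other supervised task on the slide, would predict a number rather than a label.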
Ways to excel in Data Science
• Build Foundations – Cognitiveclass.ai | Coursera | DataCamp, etc.
• Tools – Open source, company-specific, etc.
• Kaggle – Participate in competitions, maintain a Kaggle profile.
• Competitions – Local events, webinars, hackathons.
• Internship – Companies, telcos, banks, startups working with AI.
AI / Data Science Use-Cases
For more, visit: Developer.ibm.com
Data Science Tools – Libraries and Frameworks
IBM Data Science & Analytics Tools
• Cloud Pak for Data (Watson Studio)
• SPSS Modeler
• AutoAI
• IBM Cognos Analytics
• Watson Machine Learning
• Watson AI OpenScale
• Other Watson Services – Chatbot, Natural Language Processing, etc.
Create free IBM Cloud account: http://ibm.biz/intro-ds
Thank you
Qamar Un Nisa
Lead Developer Advocate, IBM Pakistan
Data & AI Focal, IBM MEA
Email: qamar.n@ibm.com
LinkedIn: linkedin.com/in/qamarnisa/
Survey: https://ibm.biz/BdfXaR
Upcoming Events
• Workshop 2:
6 February 2021 – Introduction to Machine Learning and Different Models
Link: https://ibm.webex.com/ibm/j.php?MTID=mb8227fc22282ed57c4856d45d860b3db
• Workshop 3:
13 February 2021 – Automated Model Building using Watson AutoAI: Build and Deploy top-performing models in minutes
Link: https://ibm.webex.com/ibm/j.php?MTID=m8f70391f2e893c4f8eb6d6e3f356a641
