SlideShare a Scribd company logo
1
Categorical Data Analysis in Python
By
Jaidev Deshpande
Data Scientist, DataCulture Analytics
twitter.com/jaidevd
2
Problem: Who's likely to attend the next
meetup?
●
Who comes often?
●
Men / Women?
●
Where do you live? How far from the venue?
●
Proficiency with Python
(Beginner / Intermediate / Advanced)?
●
Area of interest?
3
Something like..
Attendees Features
Attendance
(%)
Gender Pincode Proficiency in
Python
Interest ...
attendee_1 80 M 411013 Intermediate Web ...
attendee_2 30 F 411040 Advanced Test /
Automation
...
attendee_3 55 M 411001 Beginners Scientific ...
... ... ... ... ... ... ...
● 1. Numerical features – continuous and quantitative
● 2. Categorical features – discrete and qualitative
4
Common Numerical Operations on Data
●
Obviously – add, subtract, multiply divide
●
Statistical moments
●
Operations in vector spaces
– Distance measures
– Slicing
5
Comparison of Operations
Numerical Data
Addition, subtract, multiply, divide
Mean, Variance, Standard Deviation
Vector Spaces – the very idea of
'measuring'
Categorical Data (Strings, etc)
What's the product of two strings?
The average pincode of two areas?
&%%#&$$*&!!!!
At least get some numbers!
6
One-hot Encoding
●
[Apples,
Oranges,
Mangoes]
● sklearn.preprocessing.OneHotEncoder
● sklearn.feature_extraction.DictVectorizer
[0, 0, 1;
0, 1, 0;
1, 0, 0]
7
Original Data
Attendees Features
Attendance
(%)
Gender Pincode Proficiency in
Python
Interest ...
attendee_1 80 [0 1] [1 0 0 … 0] [0 1 0] [1 0 0 0 0 0] ...
attendee_2 30 [1 0] [0 1 0 … 0] [1 0 0] [0 1 0 0 0 0] ...
attendee_3 55 [0 1] [0 0 1 … 0] [0 0 1] [0 0 1 0 0 0] ...
... ... ... ... ... ... ...
8
Curse of Dimensionality
9
Correspondence Analysis
●
Contingency tables (pandas.crosstab)
profeciency advanced beginner intermediate
gender
F 1 0 0
M 0 1 1
●
Different numerical measures
●
Perceptual maps
10
Correspondence Analysis
●
How are proficiencies related w.r.t gender? (Row profiles)
●
How are genders related w.r.t proficiency? (Column profiles)
– Cosine similarity
– Correlation / Covariance
●
How are they interrelated?
– Weighted chi-squared distance
●
Can the dimensionality be reduced?
– Singular value decomposition / PCA
– sklearn.decomposition.PCA
– sklearn.decomposition.TruncatedSVD
11
Sample Problem
●
Consider the proficiency and interest features from the original
problem
●
Fake data with 100 observations
●
Contingency matrix:
automation scientific web
advanced 8 1 7
beginner 13 9 35
intermediate 7 1 19
12
Results
13
Source and Tutorials
●
http://github.com/motherbox/mca

More Related Content

Similar to Categorical Data Analysis in Python

Analysing & interpreting data.ppt
Analysing & interpreting data.pptAnalysing & interpreting data.ppt
Analysing & interpreting data.ppt
manaswidebbarma1
 
Guide to wall street quant jobs for IITians
Guide to wall street quant jobs for IITiansGuide to wall street quant jobs for IITians
Guide to wall street quant jobs for IITiansPratik Poddar
 
Model evaluation in the land of deep learning
Model evaluation in the land of deep learningModel evaluation in the land of deep learning
Model evaluation in the land of deep learning
Pramit Choudhary
 
Explainable algorithm evaluation from lessons in education
Explainable algorithm evaluation from lessons in educationExplainable algorithm evaluation from lessons in education
Explainable algorithm evaluation from lessons in education
CSIRO
 
Machine learning- key concepts
Machine learning- key conceptsMachine learning- key concepts
Machine learning- key conceptsAmir Ziai
 
Data Science in Industry - Applying Machine Learning to Real-world Challenges
Data Science in Industry - Applying Machine Learning to Real-world ChallengesData Science in Industry - Applying Machine Learning to Real-world Challenges
Data Science in Industry - Applying Machine Learning to Real-world Challenges
Yuchen Zhao
 
林守德/Practical Issues in Machine Learning
林守德/Practical Issues in Machine Learning林守德/Practical Issues in Machine Learning
林守德/Practical Issues in Machine Learning
台灣資料科學年會
 
LR2. Summary Day 2
LR2. Summary Day 2LR2. Summary Day 2
LR2. Summary Day 2
Machine Learning Valencia
 
Mini datathon
Mini datathonMini datathon
Mini datathon
Kunal Jain
 
Explainable algorithm evaluation.pptx
Explainable algorithm evaluation.pptxExplainable algorithm evaluation.pptx
Explainable algorithm evaluation.pptx
CSIRO
 
Multimodal Learning Analytics
Multimodal Learning AnalyticsMultimodal Learning Analytics
Multimodal Learning Analytics
Xavier Ochoa
 
Machine Learning from Statistical Point of View
Machine Learning from Statistical Point of ViewMachine Learning from Statistical Point of View
Machine Learning from Statistical Point of View
Yury Gubman
 
Intro to data science
Intro to data scienceIntro to data science
Intro to data science
ANURAG SINGH
 
Introduction To Data Science Using R
Introduction To Data Science Using RIntroduction To Data Science Using R
Introduction To Data Science Using R
ANURAG SINGH
 
Ml - A shallow dive
Ml  - A shallow diveMl  - A shallow dive
Ml - A shallow dive
Gopi Krishna Nuti
 
Factor Analysis and Correspondence Analysis Composite and Indicator Scores of...
Factor Analysis and Correspondence Analysis Composite and Indicator Scores of...Factor Analysis and Correspondence Analysis Composite and Indicator Scores of...
Factor Analysis and Correspondence Analysis Composite and Indicator Scores of...
Matthew Powers
 
Mini datathon - Bengaluru
Mini datathon - BengaluruMini datathon - Bengaluru
Mini datathon - Bengaluru
Kunal Jain
 
Teaching Constraint Programming, Patrick Prosser
Teaching Constraint Programming,  Patrick ProsserTeaching Constraint Programming,  Patrick Prosser
Teaching Constraint Programming, Patrick Prosser
Pierre Schaus
 
MLEARN 210 B Autumn 2018: Lecture 1
MLEARN 210 B Autumn 2018: Lecture 1MLEARN 210 B Autumn 2018: Lecture 1
MLEARN 210 B Autumn 2018: Lecture 1
heinestien
 

Similar to Categorical Data Analysis in Python (20)

Analysing & interpreting data.ppt
Analysing & interpreting data.pptAnalysing & interpreting data.ppt
Analysing & interpreting data.ppt
 
Guide to wall street quant jobs for IITians
Guide to wall street quant jobs for IITiansGuide to wall street quant jobs for IITians
Guide to wall street quant jobs for IITians
 
Model evaluation in the land of deep learning
Model evaluation in the land of deep learningModel evaluation in the land of deep learning
Model evaluation in the land of deep learning
 
Explainable algorithm evaluation from lessons in education
Explainable algorithm evaluation from lessons in educationExplainable algorithm evaluation from lessons in education
Explainable algorithm evaluation from lessons in education
 
Machine learning- key concepts
Machine learning- key conceptsMachine learning- key concepts
Machine learning- key concepts
 
Data Science in Industry - Applying Machine Learning to Real-world Challenges
Data Science in Industry - Applying Machine Learning to Real-world ChallengesData Science in Industry - Applying Machine Learning to Real-world Challenges
Data Science in Industry - Applying Machine Learning to Real-world Challenges
 
林守德/Practical Issues in Machine Learning
林守德/Practical Issues in Machine Learning林守德/Practical Issues in Machine Learning
林守德/Practical Issues in Machine Learning
 
LR2. Summary Day 2
LR2. Summary Day 2LR2. Summary Day 2
LR2. Summary Day 2
 
Mini datathon
Mini datathonMini datathon
Mini datathon
 
Explainable algorithm evaluation.pptx
Explainable algorithm evaluation.pptxExplainable algorithm evaluation.pptx
Explainable algorithm evaluation.pptx
 
Multimodal Learning Analytics
Multimodal Learning AnalyticsMultimodal Learning Analytics
Multimodal Learning Analytics
 
Machine Learning from Statistical Point of View
Machine Learning from Statistical Point of ViewMachine Learning from Statistical Point of View
Machine Learning from Statistical Point of View
 
Intro to data science
Intro to data scienceIntro to data science
Intro to data science
 
Introduction To Data Science Using R
Introduction To Data Science Using RIntroduction To Data Science Using R
Introduction To Data Science Using R
 
Ml - A shallow dive
Ml  - A shallow diveMl  - A shallow dive
Ml - A shallow dive
 
Factor Analysis and Correspondence Analysis Composite and Indicator Scores of...
Factor Analysis and Correspondence Analysis Composite and Indicator Scores of...Factor Analysis and Correspondence Analysis Composite and Indicator Scores of...
Factor Analysis and Correspondence Analysis Composite and Indicator Scores of...
 
Mini datathon - Bengaluru
Mini datathon - BengaluruMini datathon - Bengaluru
Mini datathon - Bengaluru
 
Teaching Constraint Programming, Patrick Prosser
Teaching Constraint Programming,  Patrick ProsserTeaching Constraint Programming,  Patrick Prosser
Teaching Constraint Programming, Patrick Prosser
 
0 introduction
0  introduction0  introduction
0 introduction
 
MLEARN 210 B Autumn 2018: Lecture 1
MLEARN 210 B Autumn 2018: Lecture 1MLEARN 210 B Autumn 2018: Lecture 1
MLEARN 210 B Autumn 2018: Lecture 1
 

Recently uploaded

Literature Review Basics and Understanding Reference Management.pptx
Literature Review Basics and Understanding Reference Management.pptxLiterature Review Basics and Understanding Reference Management.pptx
Literature Review Basics and Understanding Reference Management.pptx
Dr Ramhari Poudyal
 
basic-wireline-operations-course-mahmoud-f-radwan.pdf
basic-wireline-operations-course-mahmoud-f-radwan.pdfbasic-wireline-operations-course-mahmoud-f-radwan.pdf
basic-wireline-operations-course-mahmoud-f-radwan.pdf
NidhalKahouli2
 
Fundamentals of Electric Drives and its applications.pptx
Fundamentals of Electric Drives and its applications.pptxFundamentals of Electric Drives and its applications.pptx
Fundamentals of Electric Drives and its applications.pptx
manasideore6
 
Forklift Classes Overview by Intella Parts
Forklift Classes Overview by Intella PartsForklift Classes Overview by Intella Parts
Forklift Classes Overview by Intella Parts
Intella Parts
 
Online aptitude test management system project report.pdf
Online aptitude test management system project report.pdfOnline aptitude test management system project report.pdf
Online aptitude test management system project report.pdf
Kamal Acharya
 
NUMERICAL SIMULATIONS OF HEAT AND MASS TRANSFER IN CONDENSING HEAT EXCHANGERS...
NUMERICAL SIMULATIONS OF HEAT AND MASS TRANSFER IN CONDENSING HEAT EXCHANGERS...NUMERICAL SIMULATIONS OF HEAT AND MASS TRANSFER IN CONDENSING HEAT EXCHANGERS...
NUMERICAL SIMULATIONS OF HEAT AND MASS TRANSFER IN CONDENSING HEAT EXCHANGERS...
ssuser7dcef0
 
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdfAKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
SamSarthak3
 
Recycled Concrete Aggregate in Construction Part III
Recycled Concrete Aggregate in Construction Part IIIRecycled Concrete Aggregate in Construction Part III
Recycled Concrete Aggregate in Construction Part III
Aditya Rajan Patra
 
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressions
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressionsKuberTENes Birthday Bash Guadalajara - K8sGPT first impressions
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressions
Victor Morales
 
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单专业办理
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单专业办理一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单专业办理
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单专业办理
zwunae
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理
zwunae
 
Water Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation and Control Monthly - May 2024.pdfWater Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation & Control
 
An Approach to Detecting Writing Styles Based on Clustering Techniques
An Approach to Detecting Writing Styles Based on Clustering TechniquesAn Approach to Detecting Writing Styles Based on Clustering Techniques
An Approach to Detecting Writing Styles Based on Clustering Techniques
ambekarshweta25
 
Water billing management system project report.pdf
Water billing management system project report.pdfWater billing management system project report.pdf
Water billing management system project report.pdf
Kamal Acharya
 
Unbalanced Three Phase Systems and circuits.pptx
Unbalanced Three Phase Systems and circuits.pptxUnbalanced Three Phase Systems and circuits.pptx
Unbalanced Three Phase Systems and circuits.pptx
ChristineTorrepenida1
 
Hierarchical Digital Twin of a Naval Power System
Hierarchical Digital Twin of a Naval Power SystemHierarchical Digital Twin of a Naval Power System
Hierarchical Digital Twin of a Naval Power System
Kerry Sado
 
Harnessing WebAssembly for Real-time Stateless Streaming Pipelines
Harnessing WebAssembly for Real-time Stateless Streaming PipelinesHarnessing WebAssembly for Real-time Stateless Streaming Pipelines
Harnessing WebAssembly for Real-time Stateless Streaming Pipelines
Christina Lin
 
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...
Amil Baba Dawood bangali
 
一比一原版(UofT毕业证)多伦多大学毕业证成绩单如何办理
一比一原版(UofT毕业证)多伦多大学毕业证成绩单如何办理一比一原版(UofT毕业证)多伦多大学毕业证成绩单如何办理
一比一原版(UofT毕业证)多伦多大学毕业证成绩单如何办理
ydteq
 
Nuclear Power Economics and Structuring 2024
Nuclear Power Economics and Structuring 2024Nuclear Power Economics and Structuring 2024
Nuclear Power Economics and Structuring 2024
Massimo Talia
 

Recently uploaded (20)

Literature Review Basics and Understanding Reference Management.pptx
Literature Review Basics and Understanding Reference Management.pptxLiterature Review Basics and Understanding Reference Management.pptx
Literature Review Basics and Understanding Reference Management.pptx
 
basic-wireline-operations-course-mahmoud-f-radwan.pdf
basic-wireline-operations-course-mahmoud-f-radwan.pdfbasic-wireline-operations-course-mahmoud-f-radwan.pdf
basic-wireline-operations-course-mahmoud-f-radwan.pdf
 
Fundamentals of Electric Drives and its applications.pptx
Fundamentals of Electric Drives and its applications.pptxFundamentals of Electric Drives and its applications.pptx
Fundamentals of Electric Drives and its applications.pptx
 
Forklift Classes Overview by Intella Parts
Forklift Classes Overview by Intella PartsForklift Classes Overview by Intella Parts
Forklift Classes Overview by Intella Parts
 
Online aptitude test management system project report.pdf
Online aptitude test management system project report.pdfOnline aptitude test management system project report.pdf
Online aptitude test management system project report.pdf
 
NUMERICAL SIMULATIONS OF HEAT AND MASS TRANSFER IN CONDENSING HEAT EXCHANGERS...
NUMERICAL SIMULATIONS OF HEAT AND MASS TRANSFER IN CONDENSING HEAT EXCHANGERS...NUMERICAL SIMULATIONS OF HEAT AND MASS TRANSFER IN CONDENSING HEAT EXCHANGERS...
NUMERICAL SIMULATIONS OF HEAT AND MASS TRANSFER IN CONDENSING HEAT EXCHANGERS...
 
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdfAKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
 
Recycled Concrete Aggregate in Construction Part III
Recycled Concrete Aggregate in Construction Part IIIRecycled Concrete Aggregate in Construction Part III
Recycled Concrete Aggregate in Construction Part III
 
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressions
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressionsKuberTENes Birthday Bash Guadalajara - K8sGPT first impressions
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressions
 
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单专业办理
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单专业办理一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单专业办理
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单专业办理
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理
 
Water Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation and Control Monthly - May 2024.pdfWater Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation and Control Monthly - May 2024.pdf
 
An Approach to Detecting Writing Styles Based on Clustering Techniques
An Approach to Detecting Writing Styles Based on Clustering TechniquesAn Approach to Detecting Writing Styles Based on Clustering Techniques
An Approach to Detecting Writing Styles Based on Clustering Techniques
 
Water billing management system project report.pdf
Water billing management system project report.pdfWater billing management system project report.pdf
Water billing management system project report.pdf
 
Unbalanced Three Phase Systems and circuits.pptx
Unbalanced Three Phase Systems and circuits.pptxUnbalanced Three Phase Systems and circuits.pptx
Unbalanced Three Phase Systems and circuits.pptx
 
Hierarchical Digital Twin of a Naval Power System
Hierarchical Digital Twin of a Naval Power SystemHierarchical Digital Twin of a Naval Power System
Hierarchical Digital Twin of a Naval Power System
 
Harnessing WebAssembly for Real-time Stateless Streaming Pipelines
Harnessing WebAssembly for Real-time Stateless Streaming PipelinesHarnessing WebAssembly for Real-time Stateless Streaming Pipelines
Harnessing WebAssembly for Real-time Stateless Streaming Pipelines
 
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...
 
一比一原版(UofT毕业证)多伦多大学毕业证成绩单如何办理
一比一原版(UofT毕业证)多伦多大学毕业证成绩单如何办理一比一原版(UofT毕业证)多伦多大学毕业证成绩单如何办理
一比一原版(UofT毕业证)多伦多大学毕业证成绩单如何办理
 
Nuclear Power Economics and Structuring 2024
Nuclear Power Economics and Structuring 2024Nuclear Power Economics and Structuring 2024
Nuclear Power Economics and Structuring 2024
 

Categorical Data Analysis in Python