Applied Python for correlation on churn & stocks datasets
PRESENTED BY MAHMOUD FOUAD DARWISH
Correlation
• Correlation is a statistic that measures the degree to which two variables move in relation to each other.
• Correlation measures association, but it does not show whether x causes y or vice versa.
Correlation Types
• Positive correlation: when x goes up or down, we expect y to move in the same direction.
• Negative correlation: when x goes up or down, we expect y to move in the opposite direction.
• Zero correlation: x and y show no linear relationship, so the movement of one says nothing about the direction of the other.
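The three cases above can be illustrated with a minimal sketch. It uses made-up numbers (not from the churn or stocks data) and NumPy's `np.corrcoef`, which returns the Pearson coefficient. The third case shows that a coefficient of zero rules out only a *linear* relationship. Here `y = x**2` depends entirely on x, yet it scores 0 because x is symmetric around zero.

```python
import numpy as np

# Toy data for illustration only (not from the churn or stocks datasets).
x = np.arange(-3.0, 4.0)  # [-3, -2, -1, 0, 1, 2, 3]

y_positive = 2 * x + 1     # moves in the same direction as x
y_negative = -0.5 * x + 4  # moves in the opposite direction
y_curved = x ** 2          # fully determined by x, but not linearly


def pearson(a, b):
    """Pearson correlation coefficient of two 1-D arrays."""
    return np.corrcoef(a, b)[0, 1]


print(round(pearson(x, y_positive), 6))  # 1.0
print(round(pearson(x, y_negative), 6))  # -1.0
print(round(pearson(x, y_curved), 6))    # 0.0
```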
Correlation Formula
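The formula image for this slide did not survive extraction. Assuming it showed the usual Pearson coefficient, which is the default method of pandas `DataFrame.corr()`, the formula is r = Σ(xᵢ − x̄)(yᵢ − ȳ) / √(Σ(xᵢ − x̄)² · Σ(yᵢ − ȳ)²). The sketch below computes that definition directly in plain Python. The function name `pearson_r` and the sample values are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation computed directly from its definition (hypothetical helper)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Numerator: sum of products of deviations from the means (co-movement).
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator: square root of the product of the sums of squared deviations.
    den = math.sqrt(sum((x - mean_x) ** 2 for x in xs) * sum((y - mean_y) ** 2 for y in ys))
    if den == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return num / den


# Toy data, for illustration only.
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]
print(round(pearson_r(hours, scores), 4))  # 0.7746, i.e. 6 / sqrt(60)
```

The result always lies between −1 and 1. The sign gives the direction described on the previous slide, and the magnitude gives the strength of the linear association.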
Churn Dataset
• The churn dataset used here is publicly available and is mentioned in the book [*Discovering Knowledge in Data*](https://www.amazon.com/dp/0470908742/) by Daniel T. Larose. The author attributes the dataset to the University of California Irvine Repository of Machine Learning Datasets.
• Mobile phone service providers keep historical records on customers who churn, that is, who leave for another provider. These records are useful for identifying such customers before they leave, so the provider can try to keep them.
• The dataset file contains 3,333 records. Each record uses 21 attributes to describe the profile of a customer of an unknown US mobile phone service provider.
Load Dataset & Display Head Sample
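The loading code on this slide is an image. Below is a minimal sketch of the usual pandas approach, assuming the data is a CSV file. To keep the example self-contained, it reads two made-up rows from an in-memory string with a subset of the columns described below. The file name `churn.txt` in the comment is hypothetical.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real data file: two made-up rows with a subset
# of the columns described on the next slides. With the real file you would
# pass its path instead, e.g. pd.read_csv("churn.txt") (file name assumed).
csv_text = """State,Account Length,Area Code,Phone,Int'l Plan,VMail Plan,Day Mins,CustServ Calls,Churn?
OH,100,415,555-0100,no,yes,161.6,1,False
NJ,137,415,555-0101,yes,no,243.4,4,True
"""
churn = pd.read_csv(io.StringIO(csv_text))

print(churn.shape)   # (number of records, number of attributes)
print(churn.head())  # first five rows; here there are only two
```

In a Jupyter notebook, `display(churn.head())` renders the same rows as a formatted table.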
Dataset Description
• State: The US state in which the customer resides, indicated by a two-letter abbreviation, for example OH or NJ
• Account Length: The number of days that this account has been active
• Area Code: The three-digit area code of the customer’s phone number
• Phone: The seven-digit phone number
• Int’l Plan: Whether the customer has an international calling plan: yes/no
• VMail Plan: Whether the customer has a voice mail feature: yes/no
• VMail Message: The average number of voice mail messages per month
• Day Mins: The total number of calling minutes used during the day
Dataset Description Cont.
• Day Calls: The total number of calls placed during the day
• Day Charge: The billed cost of daytime calls
• Eve Mins, Eve Calls, Eve Charge: The minutes used, the number of calls placed, and the billed cost for calls during the evening
• Night Mins, Night Calls, Night Charge: The minutes used, the number of calls placed, and the billed cost for calls during nighttime
• Intl Mins, Intl Calls, Intl Charge: The minutes used, the number of calls placed, and the billed cost for international calls
• CustServ Calls: The number of calls placed to Customer Service
• Churn?: Whether the customer left the service: true/false
Data Exploration - Describe
• The first step is to use the describe() function to see how the values of individual attributes are distributed. It also computes summary statistics for numeric attributes, such as the mean, minimum, maximum and standard deviation.
• display(churn.describe())   (display() is available in Jupyter notebooks; in a plain Python script, use print(churn.describe()))
Data Exploration - Histogram
• A histogram is a plot that shows the underlying frequency distribution (shape) of a set of continuous data. It lets you inspect the data for its distribution (e.g., normal), outliers, skewness, etc.
• hist = churn.hist(bins=30, sharey=True, figsize=(10, 10))
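The bar heights that `churn.hist(bins=30)` draws are per-bin counts. A minimal sketch of the numbers behind such a plot is shown below, using made-up values that stand in for one numeric column. NumPy's `np.histogram` computes the bin counts without plotting. pandas' `Series.skew()` measures the skewness you would otherwise judge by eye. Both are named here as companions to the plot, not as what the deck used.

```python
import numpy as np
import pandas as pd

# Made-up values standing in for one numeric column such as "CustServ Calls".
calls = pd.Series([0, 1, 1, 1, 2, 2, 2, 2, 3, 3, 4, 9])

# Bin counts: the bar heights a histogram would draw (5 equal-width bins here).
counts, edges = np.histogram(calls, bins=5)
print(counts)  # number of values falling in each bin
print(edges)   # the 6 bin boundaries from min to max

# Positive skew means a long right tail, e.g. a few customers who call a lot.
print(round(calls.skew(), 3))
```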
Data Exploration - crosstab
• We use the crosstab function to show a frequency table for each categorical (text) feature, together with its number of unique values:

    import pandas as pd
    from IPython.display import display  # display() comes with Jupyter/IPython
    for column in churn.select_dtypes(include=['object']).columns:
        display(pd.crosstab(index=churn[column], columns='% observations', normalize='columns'))
        print("# of unique values {}".format(churn[column].nunique()))
CrossTab Code Example
Crosstab – Feature Relation to Churn
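The crosstab from the previous slide can also relate a categorical feature to churn: pass the churn column as `columns` instead of a constant label. Below is a minimal sketch on made-up rows. With `normalize="index"`, each row becomes proportions, so the churn rate for customers with and without an international plan can be read off directly.

```python
import pandas as pd

# Made-up rows, for illustration only.
churn = pd.DataFrame({
    "Int'l Plan": ["no", "no", "yes", "no", "yes", "no", "yes", "no"],
    "Churn?":     [False, False, True, False, False, True, True, False],
})

# Rows: plan no/yes; columns: churned False/True. normalize="index" divides each
# row by its total, so each row sums to 1 and the rows hold per-group churn rates.
table = pd.crosstab(index=churn["Int'l Plan"], columns=churn["Churn?"], normalize="index")
print(table)
```

In this toy data, 1 of 5 customers without a plan churned (0.2), versus 2 of 3 with a plan.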
Hist – Feature Relation to Churn
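For a numeric feature, the comparison is between its distribution for churners and for non-churners. With matplotlib installed, `churn.hist(column="Day Mins", by="Churn?")` draws one histogram per class. The minimal sketch below, on made-up rows, produces the same comparison as numbers with `groupby`, which is easier to check than a plot.

```python
import pandas as pd

# Made-up rows, for illustration only.
churn = pd.DataFrame({
    "Day Mins": [120.0, 150.5, 180.0, 260.3, 290.1, 140.2, 275.0, 160.4],
    "Churn?":   [False, False, False, True, True, False, True, False],
})

# Count, mean, std and quartiles of "Day Mins" for each churn class.
print(churn.groupby("Churn?")["Day Mins"].describe())

# Mean day minutes of non-churners (False) vs. churners (True).
means = churn.groupby("Churn?")["Day Mins"].mean()
print(means)
```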
Corr() - pairwise relationships between attributes
Scatter() - pairwise relationships between attributes
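`DataFrame.corr()` only works on numeric columns. The yes/no and true/false attributes must be encoded first, and text columns such as State must be excluded. In pandas 2.x, a non-numeric column raises an error unless `numeric_only=True` is passed. Below is a minimal sketch on made-up rows; the 0/1 encoding shown is one simple choice, not necessarily what the deck used. In this toy data, Day Charge is proportional to Day Mins, so the pair correlates at essentially 1, the kind of redundancy a correlation matrix exposes.

```python
import pandas as pd

# Made-up rows, for illustration only.
churn = pd.DataFrame({
    "State": ["OH", "NJ", "OH", "CA", "NJ", "CA"],
    "Day Mins": [120.0, 260.0, 150.0, 280.0, 140.0, 230.0],
    "Day Charge": [20.4, 44.2, 25.5, 47.6, 23.8, 39.1],
    "CustServ Calls": [1, 4, 0, 5, 2, 1],
    "Int'l Plan": ["no", "yes", "no", "yes", "no", "yes"],
    "Churn?": [False, True, False, True, False, False],
})

# Encode the yes/no and True/False columns as 0/1 so corr() can use them.
encoded = churn.assign(**{
    "Int'l Plan": churn["Int'l Plan"].map({"no": 0, "yes": 1}),
    "Churn?": churn["Churn?"].astype(int),
})

# numeric_only=True skips text columns such as "State".
corr_matrix = encoded.corr(numeric_only=True)
print(corr_matrix.round(2))

# Features ranked by the strength of their linear association with churn.
print(corr_matrix["Churn?"].drop("Churn?").abs().sort_values(ascending=False))
```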
seaborn heatmap - pairwise relationships between attributes
seaborn heatmap Cont.
seaborn heatmap Cont.
seaborn heatmap Cont.
Historical stock prices dataset loading
• To read historical prices for the US stock market, we rely on the pandas-datareader library to load stock information from Yahoo Finance.
Historical stock prices dataset loading Cont.
Yahoo Finance Data Example
Stock prices Correlation – Corr()
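The slides do not show whether raw prices or returns were correlated. The minimal sketch below uses daily percentage returns (`pct_change()`), a common choice, because two price series that both trend upward can show a high price correlation even when their day-to-day moves are unrelated. The tickers AAA, BBB and CCC and all prices are made up. With real data, each column would come from the loaded Yahoo Finance frames, for example their "Adj Close" column.

```python
import pandas as pd

# Made-up closing prices for three hypothetical tickers over six business days.
prices = pd.DataFrame(
    {
        "AAA": [100.0, 101.0, 102.5, 101.5, 103.0, 104.0],
        "BBB": [50.0, 50.6, 51.4, 50.9, 51.8, 52.3],
        "CCC": [20.0, 19.8, 19.5, 19.9, 19.4, 19.2],
    },
    index=pd.date_range("2024-01-01", periods=6, freq="B"),
)

# Daily percentage returns; the first row has no previous day, so it is dropped.
returns = prices.pct_change().dropna()

# Pairwise Pearson correlation between the tickers' daily returns.
print(returns.corr().round(2))
```

Here AAA and BBB move together (strongly positive), while CCC tends to move against them (strongly negative).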
Correlation between stocks and SP500 Index
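To correlate each stock with the index rather than with every other stock, pandas' `DataFrame.corrwith()` can take a single Series and correlate every column with it, aligning rows on the date index. Below is a minimal sketch with made-up daily returns; the tickers are hypothetical and the series named SP500 is only a stand-in for the real index data.

```python
import pandas as pd

# Made-up daily returns for two hypothetical stocks and an index.
dates = pd.date_range("2024-01-02", periods=5, freq="B")
stock_returns = pd.DataFrame(
    {
        "AAA": [0.010, 0.015, -0.010, 0.015, 0.010],
        "CCC": [-0.010, -0.015, 0.020, -0.025, -0.010],
    },
    index=dates,
)
index_returns = pd.Series([0.008, 0.012, -0.006, 0.010, 0.007], index=dates, name="SP500")

# One Pearson coefficient per stock column, computed against the index series.
print(stock_returns.corrwith(index_returns).round(2))
```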
SP500 Data Sample
Stock Correlation seaborn heatmap
Stock Correlation seaborn heatmap Cont.
Stock Correlation – Cont.
Visualize Stocks Correlation
Conclusion
• We can use Python to compute correlations between attributes with corr(), scatter matrices and seaborn heatmaps.
• We applied Python correlation functions to two different datasets: the churn dataset and the stock price datasets.
• We can automatically read financial stock prices and load the data using the pandas and pandas-datareader libraries.
• We can describe datasets using describe(), histograms and other Python functions.
• We can plot graphs using the plot function in the matplotlib library.
• We used a notebook environment and Anaconda to run all the Python code in this presentation, with no issues.
Future work
• Apply machine learning models to the datasets, taking the correlation information into account.
Thank You
