Applied Python for correlation on churn and stocks datasets

PRESENTED BY MAHMOUD FOUAD DARWISH

Correlation
 Correlation is a statistic that measures the degree to which two variables move in relation to each other.
 Correlation measures association, but doesn’t show if x causes y or vice versa.

Correlation Types
 Positive correlation: when x goes up or down, we expect y to move in the same direction.
 Negative correlation: when x goes up or down, we expect y to move in the opposite direction.
 Zero correlation: a change in x tells us nothing about the direction of y; there is no linear relationship between them.
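
The three cases above can be illustrated with a small sketch. The values are hand-picked toy numbers (not taken from either dataset) so that each coefficient comes out exactly, and the variable names are purely illustrative.

```python
import pandas as pd

# Toy values chosen by hand; they are NOT from the churn or stocks data.
x = pd.Series([-2, -1, 0, 1, 2], dtype=float)

y_positive = 2 * x + 1   # moves in the same direction as x
y_negative = -3 * x      # moves in the opposite direction to x
y_zero = x ** 2          # depends on x, but not linearly (symmetric around 0)

# Series.corr computes the Pearson coefficient by default.
r_positive = x.corr(y_positive)
r_negative = x.corr(y_negative)
r_zero = x.corr(y_zero)

print(r_positive, r_negative, r_zero)
```

The last case shows why zero correlation only rules out a linear relationship: y_zero is fully determined by x, yet its Pearson coefficient with x is zero.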

Churn Dataset
 The churn dataset used here is publicly available and is mentioned in the book [*Discovering Knowledge in
Data*](https://www.amazon.com/dp/0470908742/) by Daniel T. Larose. The author attributes the dataset
to the University of California Irvine Repository of Machine Learning Datasets.
 Mobile phone service providers keep historical records on customers who churn, that is, leave for
another provider. These records help identify at-risk customers before they leave so the provider can try
to retain them.
 The dataset file contains 3,333 records. Each record uses 21 attributes to describe the profile of a
customer of an unknown US mobile phone service provider.

Load Dataset & Display Head Sample
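
The loading code for this slide is not present in the transcript. A minimal sketch of the step, assuming the data is a CSV file read with pandas, might look like the following. The three rows are invented stand-ins and only a subset of the described columns is used; with the real file, its path would be passed to read_csv in place of the in-memory sample.

```python
import io
import pandas as pd

# Stand-in for the real CSV file: three invented rows, using a subset of
# the column names from the dataset description.
sample_csv = io.StringIO(
    "State,Account Length,Int'l Plan,Day Mins,CustServ Calls,Churn?\n"
    "OH,107,no,161.6,1,false\n"
    "NJ,137,no,243.4,0,false\n"
    "KS,65,yes,129.1,4,true\n"
)

# With the real data, pass the file path instead of sample_csv.
churn = pd.read_csv(sample_csv)

# head() returns the first rows (five by default) as a quick sanity check.
print(churn.head())
```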

Dataset Description
 State: The US state in which the customer resides, indicated by a two-letter abbreviation, for example OH
or NJ
 Account Length: The number of days that this account has been active
 Area Code: The three-digit area code of the corresponding customer’s phone number
 Phone: The seven-digit phone number
 Int’l Plan: Whether the customer has an international calling plan: yes/no
 VMail Plan: Whether the customer has a voice mail feature: yes/no
 VMail Message: The average number of voice mail messages per month
 Day Mins: The total number of calling minutes used during the day

Dataset Description Cont.
 Day Calls: The total number of calls placed during the day
 Day Charge: The billed cost of daytime calls
 Eve Mins, Eve Calls, Eve Charge: The minutes used, number of calls placed, and billed cost for calls during the evening
 Night Mins, Night Calls, Night Charge: The minutes used, number of calls placed, and billed cost for calls during nighttime
 Intl Mins, Intl Calls, Intl Charge: The minutes used, number of calls placed, and billed cost for international calls
 CustServ Calls: The number of calls placed to Customer Service
 Churn?: Whether the customer left the service: true/false

Data Exploration - Describe
 The first step is to use the describe function to compute summary statistics for the numeric attributes
(count, mean, standard deviation, min and max values, quartiles) and see how the values of each attribute
are distributed.
 display(churn.describe())

Data Exploration - Histogram
 A histogram is a plot that lets you discover, and show, the underlying frequency distribution (shape) of a
set of continuous data. This allows the inspection of the data for its underlying distribution (e.g., normal
distribution), outliers, skewness, etc.
 hist = churn.hist(bins=30, sharey=True, figsize=(10, 10))

Data Exploration - crosstab
 We use the crosstab function to show a frequency table for each categorical feature, along with its count of
unique values.
 for column in churn.select_dtypes(include=['object']).columns:
     display(pd.crosstab(index=churn[column], columns='% observations', normalize='columns'))
     print("# of unique values {}".format(churn[column].nunique()))

Crosstab – Feature Relation to Churn
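
The code for this slide is not in the transcript. One plausible way to relate a categorical feature to churn is a two-way crosstab normalized by row, sketched below. The rows are invented, and encoding Churn? as the strings "true"/"false" is an assumption made for the example.

```python
import pandas as pd

# Invented rows for illustration; column names follow the dataset description.
churn = pd.DataFrame({
    "Int'l Plan": ["yes", "yes", "no", "no", "no", "no"],
    "Churn?":     ["true", "false", "false", "false", "true", "false"],
})

# normalize="index" turns each row into proportions that sum to 1,
# i.e. the share of churners within each category of the feature.
rates = pd.crosstab(
    index=churn["Int'l Plan"],
    columns=churn["Churn?"],
    normalize="index",
)

print(rates)
```

Reading across a row then answers the question "of the customers in this category, what fraction churned?", which is easier to compare between categories than raw counts.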

Hist – Feature Relation to Churn

Corr()- pairwise relationships between attributes
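
The slide's code is not in the transcript. A minimal sketch of a pairwise correlation matrix with pandas follows, using invented rows. The Day Charge column is deliberately built from Day Mins with a hypothetical flat rate of 0.17 per minute, so the example shows what a perfectly correlated (redundant) pair looks like.

```python
import pandas as pd

# Invented rows; column names follow the dataset description.
churn = pd.DataFrame({
    "State": ["OH", "NJ", "OH", "KS", "NJ"],
    "Day Mins": [120.0, 265.1, 180.4, 95.0, 300.2],
    "CustServ Calls": [1, 4, 0, 2, 3],
})

# Hypothetical flat per-minute rate, so charge is an exact linear
# function of minutes in this toy data.
churn["Day Charge"] = churn["Day Mins"] * 0.17

# Keep only numeric columns; correlation is undefined for text columns.
numeric = churn.select_dtypes(include="number")

# DataFrame.corr returns the pairwise Pearson correlation matrix.
corr_matrix = numeric.corr()

print(corr_matrix.round(2))
```

A pair with a coefficient at or near 1 carries the same information twice, which is worth knowing before feeding the attributes to a model.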

Scatter()- pairwise relationships between attributes

seaborn heatmap - pairwise relationships between attributes

Historical stock prices dataset loading
 To read historical prices for the US stock market, we rely on the pandas-datareader library to load stock
data from Yahoo Finance.

Historical stock prices dataset loading Cont.

Stock prices Correlation – Corr()

Correlation between stocks and SP500 Index
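
The code for this slide is not in the transcript. The sketch below shows one way to correlate each stock with an index column. The tickers AAA, BBB and INDEX and all prices are invented, and correlating daily percentage returns rather than raw prices is an assumption of this example, not something the slides confirm.

```python
import pandas as pd

# Invented closing prices; AAA, BBB and INDEX are hypothetical names.
prices = pd.DataFrame(
    {
        "AAA": [100.0, 102.0, 101.0, 105.0, 107.0],
        "BBB": [50.0, 49.0, 49.5, 48.0, 47.5],
        "INDEX": [3000.0, 3030.0, 3015.0, 3075.0, 3100.0],
    },
    index=pd.date_range("2020-01-01", periods=5, freq="D"),
)

# Daily percentage change; the first row is NaN and is dropped.
returns = prices.pct_change().dropna()

# corrwith computes the correlation of every stock column with the index.
stock_vs_index = returns.drop(columns="INDEX").corrwith(returns["INDEX"])

print(stock_vs_index)
```

In this toy data AAA moves with the index and BBB moves against it, so the two coefficients come out with opposite signs.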

Stock Correlation seaborn heatmap

Stock Correlation seaborn heatmap Cont.

Conclusion
 We can use Python to compute and visualize correlation between attributes using corr, scatter_matrix, and a
seaborn heatmap.
 We applied Python correlation functions to two different datasets: the churn dataset and the stocks
dataset.
 We can automatically read historical stock prices and load them properly using the pandas and
pandas-datareader libraries.
 We can describe datasets using describe, histograms, and other Python functions.
 We can plot graphs using the plot function in the matplotlib library.
 We used a notebook and Anaconda to run all the Python code in this presentation successfully, with no
issues.

Future work
 Apply Machine Learning Models to datasets after considering correlation information.
