Customer churn has become a major issue for many banks because acquiring a new customer costs far more than retaining an existing one. A customer churn prediction model can identify likely churners, allowing the bank to act before they leave. Setting up such a model in a bank in Iceland requires deciding a few things: how a churner is defined, and which variables and methods to use. We propose that a churner for that Icelandic bank be defined as a customer who has not been active for the last three months, based on the bank's definition of an active customer. Behavioral and demographic variables should be used as inputs to the model, with either a decision tree or logistic regression as the technique.
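The proposed definition can be sketched in code. This is a minimal illustration, assuming each customer has a known date of last activity (under whatever definition of "active" the bank uses) and approximating "three months" as 90 days. The function name `is_churner` and the sample customers are hypothetical.

```python
from datetime import date, timedelta

# Assumption: "three months" is approximated as 90 days.
INACTIVITY_WINDOW = timedelta(days=90)


def is_churner(last_active: date, as_of: date) -> bool:
    """Return True if the customer's last activity is more than 90 days before as_of."""
    return as_of - last_active > INACTIVITY_WINDOW


# Hypothetical customers with the date of their last qualifying activity.
last_activity = {
    "customer_a": date(2024, 1, 5),
    "customer_b": date(2024, 5, 20),
}
as_of = date(2024, 6, 1)
labels = {cid: is_churner(d, as_of) for cid, d in last_activity.items()}
print(labels)
```

Labels produced this way would serve as the target variable for a decision tree or logistic regression; the cutoff itself is a business decision rather than something the model learns.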
Customer churn occurs when customers or subscribers stop doing business with a company or service.
Also known as customer attrition, customer churn is a critical metric because it is much less expensive to retain existing customers than it is to acquire new customers – earning business from new customers means working leads all the way through the sales funnel, utilizing your marketing and sales resources throughout the process.
BigData Republic teamed up with VodafoneZiggo and hosted a meetup on churn prediction.
Telecom companies like VodafoneZiggo have long benefited from the fine art/science of predicting churn. Currently, in the booming age of subscription-based business models (e.g. Netflix, Spotify, HelloFresh), the importance of predicting churn has become widespread. During this event, VodafoneZiggo shared some of its wisdom with the public, after which BDR Data Scientist Tom de Ruijter presented an overview of the modeling tools at hand, covering both classical and novel approaches. Finally, the participants engaged in a hands-on session showcasing the implementation of different approaches.
PART 1 — Churn Prediction in Practice by Florian Maas
At VodafoneZiggo we are incredibly excited about Advanced Analytics and the enormous potential for progress and innovation. In our state-of-the-art open source platform we store the tremendous amount of data that is generated every single second in our mobile and fixed networks. This means that we have a vast body of rich information which, if unlocked, can lead to something very special. As a company with a primarily subscription-based service model, churn plays a vital role in the daily business. Not only is the churn rate a good indicator of customer (dis)satisfaction, it is also one of two factors that determine the steady-state level of active customers. During this talk, we will show how data science provides added value in the process of churn prevention at VodafoneZiggo. We will talk about the data and the modeling approach we use, and the pitfalls and shortcomings that we have encountered while building the model. We will also briefly discuss potential improvements to the current approach, which brings us to talk #2.
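The talk description does not name the second factor. A common simplification, assumed here, is that it is the number of new customers gained per period. With a constant churn rate c and constant gross additions A per period, the active base settles where inflow equals outflow, at A / c. The sketch below checks this by simulation; all figures and function names are hypothetical.

```python
def steady_state(new_per_period: float, churn_rate: float) -> float:
    """Level at which customers lost per period equal customers gained."""
    return new_per_period / churn_rate


def simulate(start: float, new_per_period: float, churn_rate: float, periods: int) -> float:
    """Apply churn and then acquisition for the given number of periods."""
    active = start
    for _ in range(periods):
        active = active * (1.0 - churn_rate) + new_per_period
    return active


# Hypothetical figures: 2% monthly churn, 10,000 new customers per month.
target = steady_state(10_000, 0.02)
after_600_months = simulate(start=100_000, new_per_period=10_000,
                            churn_rate=0.02, periods=600)
print(target, round(after_600_months))
```

Under these assumptions, halving the churn rate doubles the steady-state customer base for the same acquisition effort.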
PART 2 — The Churn Prediction Toolbox by Tom de Ruijter
The second talk will show you the fine intricacies of predicting churn through different approaches. We’ll start off with an overview of different modeling strategies for describing the problem of churn, both as a classification problem and as a regression problem. Secondly, Tom will give you insight into how to evaluate a churn model in such a way that business stakeholders know how to act upon the model results. Finally, we’ll work towards the hands-on session demonstrating different model approaches for churn prediction, ranging from classical time series prediction to recurrent neural networks.
This type of research matters in the telecom market because it helps companies increase profit.
Churn prediction has come to be regarded as one of the most important sources of income for telecom companies.
Hence, this research aimed to build a system that predicts customer churn in a telecom company.
These prediction models need to achieve high AUC values. To train and test the model, the sample data is divided into 70% for training and 30% for testing.
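The 70/30 split and the AUC metric can be illustrated with the standard library alone. This is a sketch, not the study's actual pipeline: the helper names are hypothetical, and AUC is computed directly from its pairwise definition (the probability that a randomly chosen churner receives a higher score than a randomly chosen non-churner).

```python
import random


def train_test_split(rows, train_fraction=0.7, seed=0):
    """Shuffle rows and split them into training and test portions."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def auc(labels, scores):
    """AUC via pairwise comparison; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one example of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical data: 100 customer ids, and four scored test customers.
train, test = train_test_split(range(100))
example_auc = auc([1, 0, 1, 0], [0.9, 0.1, 0.4, 0.6])
print(len(train), len(test), example_auc)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking. The pairwise loop is quadratic, which is fine for illustration but slow on large samples.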
Slides from the presentation at this NYC meetup: http://www.meetup.com/Data-Modeling/events/224554990/
I talked about how to model churn before even thinking about the machine learning model.
Introducing Customer Churn Prevention Powerpoint Presentation Slides. Discuss various ways through which a company can manage customer churn with this PPT slide deck. Showcase methods by which a company can prevent customers from reducing their purchases of products and services. Our readily available PPT slide deck helps to present the types of customer churn, methods to handle customer attrition, the impact of successful implementation of churn management, a dashboard, a churn propensity model, etc. Use the customer churn management PPT slideshow to depict several ways in which a firm can experience customer churn, such as when customers stop spending or churn due to product quality. Showcase the four stages of customer churn management, which allow the company to handle customer attrition. Present how the firm can prevent customer churn by using customer churn analysis PPT infographics. You can easily highlight information about the various marketing campaigns intended to keep customers from churning. Provide ways to prevent churn through predictive analysis by incorporating our professionally designed customer churn prediction PPT presentation. https://bit.ly/3p6AR7S
I did this analysis using SAS on a dataset with 5000 records. I used CART and logistic regression to build a predictive model that identifies customers who are likely to switch to a competitor's network.
Large volumes of stored data can help organizations of all types learn more about the needs and trends of the current market. Using modern information technology to analyze the relationship between social trends and market insights is a useful way to connect indirectly with customers and their interests through unstructured and semi-structured data. Such analysis gives organizations a broader view of customers' practical needs; once the banking industry, or any industry, knows its customers, it can serve them better and with more flexibility. In this presentation, the team has created a platform and designed a big data architecture for the banking industry to maximize the number of credit card users.
Customer churn prediction for telecom data set.
Kuldeep Mahani
Customer churn prediction and relevant recommendations as per DSN telecom data analysis. Random forest and logistic regression were applied to predict customer churn.
Prediction of customer propensity to churn - Telecom Industry
Pranov Mishra
The aim of this project is to give a telecom company insights into customer behavior that are useful for customer retention. The specific goals are:
1. Identify the top variables driving the likelihood of churn.
2. Build a predictive model to identify the customers with the highest probability of terminating services with the company.
3. Build a lift chart to optimize effort by reaching the most potential churners with the fewest contacts. Here, contacting 30% of the total customer pool captures 33% of all potential churners.
The models tried in order to arrive at the best one are:
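The quantity behind a lift chart can be sketched as follows. This is an illustration with hypothetical data and function names, not the project's code: customers are ranked by predicted churn score, the top fraction is contacted, and the share of all churners found in that group is compared with the share contacted.

```python
def captured_share(labels, scores, contact_fraction):
    """Share of all churners found among the top-scored fraction of customers."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n_contacted = int(round(contact_fraction * len(ranked)))
    found = sum(label for _, label in ranked[:n_contacted])
    return found / sum(labels)


def lift(labels, scores, contact_fraction):
    """Captured share relative to the share expected from random targeting."""
    return captured_share(labels, scores, contact_fraction) / contact_fraction


# Hypothetical scores for ten customers; 1 marks an actual churner.
labels = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
share = captured_share(labels, scores, 0.3)
print(share, lift(labels, scores, 0.3))
```

Applying the same ratio to the figures quoted above, capturing 33% of churners by contacting 30% of customers corresponds to a lift of 0.33 / 0.30 = 1.1.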
1. Simple Models like Logistic Regression & Discriminant Analysis with different thresholds for classification
2. Random Forest after balancing the dataset using Synthetic Minority Oversampling Technique (SMOTE)
3. Ensemble of five individual models and predicting the output by averaging the individual output probabilities
4. XGBoost algorithm
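Two ideas from the list above, averaging the output probabilities of several models and varying the classification threshold, can be sketched together. The example uses three hypothetical models rather than the five in the project, and the probabilities are made up for illustration.

```python
def average_probabilities(model_outputs):
    """Average per-customer churn probabilities across models."""
    n_models = len(model_outputs)
    return [sum(column) / n_models for column in zip(*model_outputs)]


def classify(probabilities, threshold=0.5):
    """Flag a customer as a likely churner when the probability reaches the threshold."""
    return [1 if p >= threshold else 0 for p in probabilities]


# Hypothetical outputs of three models for the same four customers.
model_outputs = [
    [0.9, 0.2, 0.6, 0.1],
    [0.7, 0.4, 0.4, 0.2],
    [0.8, 0.3, 0.2, 0.3],
]
ensemble = average_probabilities(model_outputs)
default_flags = classify(ensemble, threshold=0.5)
lenient_flags = classify(ensemble, threshold=0.35)
print(default_flags, lenient_flags)
```

Lowering the threshold flags more customers, which typically trades precision for recall; with an imbalanced churn dataset, the default of 0.5 is often not the most useful cutoff.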
Fighting financial fraud at Danske Bank with artificial intelligence
Ron Bodkin
Danske Bank, the leader in mobile payments in Denmark, is innovating with AI. Danske Bank’s existing fraud detection engine is being enhanced with deep learning algorithms that can analyze potentially tens of thousands of latent features. Danske Bank’s current system is largely based on handcrafted rules created by the business, based on intuition and some light analysis. The system is effective at blocking fraud, but it has a high rate of false positives, which is expensive and inconvenient, and it has proved impractical to update and maintain as fraudsters evolve their capabilities. Moreover, the bank understands that fraud is getting worse in the near- and long-term future due to the increased digitization of banking and the prevalence of mobile banking applications and recognizes the need to use cutting-edge techniques to engage fraudsters not where they are today but where they will be tomorrow.
Application fraud is an important emerging trend, in which machines fill in transaction forms. There is evidence that criminals are employing sophisticated machine-learning techniques to attack, so it’s critical to use sophisticated machine learning to catch fraud in banking and mobile payment transactions.
Ron Bodkin and Nadeem Gulzar explore how Danske Bank uses deep learning for better fraud detection. Danske Bank’s multistep program first productionizes “classic” machine learning techniques (boosted decision trees) while in parallel developing deep learning models with TensorFlow as a “challenger” to test. The system was first tested in shadow production and then in full production in a champion-challenger setup against live transactions. Ron and Nadeem explain how the bank is integrating the models with the efforts already running, giving the bank and its investigation team the ability to adapt to new patterns faster than before and taking on complex highly varying functions not present in the training examples.
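The shadow-production stage described above can be pictured with a small sketch. It is an illustration of the general pattern only, not Danske Bank's system: both models score every transaction, only the champion's decision is acted on, and the challenger's decision is logged for later comparison. The class name, the stand-in models, and the transaction fields are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Transaction = Dict[str, float]
Model = Callable[[Transaction], float]  # returns a fraud score between 0 and 1


@dataclass
class ShadowRunner:
    champion: Model
    challenger: Model
    threshold: float = 0.5
    log: List[Dict[str, bool]] = field(default_factory=list)

    def decide(self, tx: Transaction) -> bool:
        """Return the champion's decision; record the challenger's for comparison."""
        champion_flag = self.champion(tx) >= self.threshold
        challenger_flag = self.challenger(tx) >= self.threshold
        self.log.append({"champion": champion_flag, "challenger": challenger_flag})
        return champion_flag

    def disagreement_rate(self) -> float:
        """Fraction of logged transactions on which the two models disagreed."""
        if not self.log:
            return 0.0
        differing = sum(1 for e in self.log if e["champion"] != e["challenger"])
        return differing / len(self.log)


def rule_based(tx: Transaction) -> float:
    # Stand-in for a handcrafted rule: flag large amounts.
    return 1.0 if tx["amount"] > 1000 else 0.0


def learned(tx: Transaction) -> float:
    # Stand-in for a trained model's score.
    return min(1.0, tx["amount"] / 2000 + 0.5 * tx["foreign"])


runner = ShadowRunner(champion=rule_based, challenger=learned)
transactions = [
    {"amount": 1500.0, "foreign": 0.0},
    {"amount": 200.0, "foreign": 1.0},
    {"amount": 100.0, "foreign": 0.0},
]
decisions = [runner.decide(tx) for tx in transactions]
print(decisions, runner.disagreement_rate())
```

The logged disagreements are the cases worth investigating before a challenger is promoted, since they show where the two models would have treated customers differently.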
Real Estate Executive Summary (MKT460 Lab #5)
Mira McKee
This is a lab report I wrote for my Marketing 460: Information & Analysis class. Utilizing SPSS, a statistical analysis program, to analyze real estate data, I wrote this report detailing my research steps (including regression analysis, CHAID analysis, customer profiles, etc.) and conclusion. I designed the Lab Report in Canva.
The term “inflection point” has multiple definitions. In differential calculus, an inflection point is a point on a curve at which the concavity changes from positive curvature to negative curvature, or vice versa. In political science, an inflection point is a moment in history that dramatically alters a geopolitical situation, for better or worse. In business, Intel co-founder Andy Grove has described a strategic inflection point as “an event that changes the way we think and act.” Each of these definitions describes a moment at which our fortunes change, and in many cases we can’t recognize the moment until after it’s passed.
What is the most important thing that changes during times of recession? Customer behavior. Building a good customer experience through experimentation can help retain existing customers, strengthen brand loyalty, and gain valuable insights.
Summary Of the Business Model Canvas
The Business Model Canvas is a tool that helps entrepreneurs and innovators create and communicate their business models. It is a visual representation of the critical components of a business, including the value proposition, key partners, key activities, essential resources, cost structure, customer relationships, customer segments, and revenue streams. The Business Model Canvas can be used to create new businesses or improve existing ones (Fisher et al., 2020). The value proposition is the core value that a business delivers to its customers. In this case, the value proposition is the ability to make better financial decisions. This is delivered by helping customers understand and predict the financial impact of different scenarios. The business also helps customers manage their money in a way that improves their financial well-being.
The key partners are the people outside the organization that the business needs to work with to be successful. In this case, the key partners are the Federal Reserve Bank, banks, and academic institutions. The Federal Reserve Bank is the best source for data on inflation rates, interest rates, and treasury bonds. Banks are a good data source on consumer prices (Pratama & Iijima, 2018). Academic institutions are a good source of research on economics and inflation. The key activities are the activities that are most important for the business to do in order to be successful. In this case, the key activities are data collection, analysis, and research. Data collection is necessary in order to test the hypothesis. Data analysis is necessary to see if there is a correlation between inflation and the cost of living. Research is necessary to understand the phenomenon better.
The essential resources are those the company most needs in order to ensure its continued success. In this instance, the most critical resources are the data and the research. Gathering the appropriate data to test the hypothesis is essential, and research is required to get a better grasp of the phenomenon. The cost structure comprises the most significant costs in the operation. In this scenario, the expenditures on data, time, and research are among the most significant. Collecting and analyzing data is an expensive endeavor. Because there is a lot of data to collect and analyze, time is a precious resource (Pratama & Iijima, 2018). The cost of carrying out research can be high. Customer relationships are the relationships an organization has with its clients or patrons. In this situation, they consist of providing financial education and guidance. The company offers these services to help consumers make wiser choices regarding their finances.
The client segments represent the dif.
The banking industry is data-intensive, notably in its ATM and credit processing data. As banks face increasing pressure to stay successful, understanding customer needs and preferences becomes a critical success factor. With data mining and advanced analytics techniques, banks are equipped to manage market uncertainty, minimize fraud, and control exposure risk.
ANALYTICS PLAN TO REDUCE CUSTOMER CHURN AT YORE BLENDS
Himabindu Aratikatla
University of the Cumberlands
March 22, 2020
Introduction
Yore Blends (YB) is a fictional online company dedicated to selling subscription-based traditional spice blends coupled with additional complementary products.
Yore Blends (YB) aspires to grow through mergers and acquisitions.
To do this, they need a strong customer base and steady revenue.
Yore Blends is concerned with the rate of customer churn.
Company’s Problem
Yore Blends has been in existence for years.
Nonetheless, the company is considering expanding through mergers and acquisitions.
However, they are experiencing customer churn.
A considerable percentage of its clients don’t purchase their goods anymore.
As a result, the company needs to reduce customer attrition by at least 16%.
Causes for Customer Churn
Poor customer care service:
The company treated customer service as a cost to be minimized rather than an investment to be maximized.
Bad onboarding:
Yore Blends clients failed to get value from the purchased products.
Clients might have lost interest in the company’s products.
Many companies think of customer service as a cost to be minimized, rather than an investment to be maximized. Here’s the issue with that: if you think of support as a cost center, then it will be. That is, if you don’t prioritize support and work to deliver excellent service to your customers, then it’s only going to cost you money…and customers. A disproportionate amount of your customer churn will take place between (1) and (2).
That’s where customers abandon your product because they get lost, don’t understand something, don’t get value from the product, or simply lose interest.
Bad onboarding – the process by which you help a customer go from (1) to (2) – can crush your retention rate, and undo all of that hard work you did to get your customers to convert in the first place.
Causes for Customer Churn (Cont.)
Limited customer success:
Lack of updates regarding new products
Extended absence of company-client communication
Natural Causes:
Customers may have grown out of the products.
Churn may have resulted from vendor switches.
While onboarding gets your customer to their initial success, your job isn’t done there. Hundreds of variables – including changing needs, confusion about new features and product updates, extended absences from the product and competitor marketing – could lead your customers away. If your customers stop hearing from you, and you stop helping them get value from your product throughout their entire lifecycle, then you risk making that lifecycle much, much shorter. Furthermore, not every customer that abandons you does so because you failed. Sometimes, customers go out of business. Sometimes, operational or staff changes lead to vendor switches. Sometimes, they simply outgrow your product or service. (Salloum, 2016)
REASONS TO ANALYZE CUSTOMER CHURN
The company will be in a position to understand c ...
Identifying the causes of customer risk and churn, and then applying approaches for prospective winback, is tremendously important to any company. The content of this presentation enables organizations to optimize customer loyalty behavior.
A lot of time and money is being spent by banks in making their organization customer focused and improving the customer experience. According to Gartner’s 2012 CIO Survey, CIOs regard customer experience as the greatest opportunity for IT innovation.
Good performance alone cannot crack the complex code that governs the strength of your customer relationships and the sustainability of your business. As competition intensifies, it is essential to get smarter about the experiences that matter, and deliver return on the bottom line.
2. WE CARE BECAUSE CUSTOMER CHURN IMPACTS EVERY BUSINESS:
Lost jobs
Lost Profit
Increased Sales and Marketing Expenses
Additional Work Created/Lost Productivity
Increased stress
3. JUST A SMALL SAMPLE OF “BETTER KNOWN” BUSINESSES THAT FILED BANKRUPTCY IN 2019! (WE ALL KNOW MANY MORE!)
4. WE CARE BECAUSE CUSTOMER CHURN
IMPACTS EVERY BUSINESS
According to the research firm CallMiner, avoidable customer churn is
costing US businesses $136 billion a year. CallMiner surveyed 1,000
US adults to find out why consumers contact suppliers, how they
were feeling when they contacted a call center and which
communications channels they preferred. What the survey uncovered is a
switching epidemic, and that call centers play a pivotal role in
whether consumers stay loyal or decide to switch.
5. OTHER FRIGHTENING FACTS THAT HAVE
US RUNNING TO EXAMINE CUSTOMER
CHURN:
U.S. companies lose $136.8 billion per year due to avoidable consumer switching. (CallMiner)
More than half of Americans have scrapped a planned purchase or transaction because of bad
service. (American Express)
33% of Americans say they’ll consider switching companies after just a single instance of poor
service. (American Express)
Companies with great experiences have a 16% price premium on products and services. (PWC)
63% of U.S. consumers say they’d share more personal information with a company that offers a
great experience. (PWC)
After having a positive experience with a company, 77% of customers would recommend it to a
friend. (Temkin Group)
6. 2017 TO 2018 CUSTOMER CHURN BY
SECTOR
We see Banks lead in terms of customers
leaving. What can they do to prevent
this?
7. LET’S TAKE THE RIGHT
ACTION!
We attempt to predict the churn rate and beat the accuracy of other
potentially ineffective approaches such as:
Faulty business changes, such as advertising that produces few
measurable results
Basic marketing and competitive analysis, which overlooks deeper
analytical insights
8. PROJECT STEPS
Analyze a Customer Churn Data Set to:
Discuss/determine goals and metrics for the project. Is reducing
churn the right problem to solve? How specifically shall we
measure performance improvement? What is our success goal?
(Assume here: it is the right problem, we measure performance
overall by reducing customer churn, success is reducing customer
churn by 10% in next 6 months).
Understand what deliverables are useful for internal
stakeholders (Assume it is churn prediction factors, later a
spreadsheet of customer churn predictions, production pipeline
and perhaps an internal dashboard).
9. THE 5 STEP PROCESS TO ANALYZE WORLD BANK DATA SET
•Classifiers and Evaluation
•Exploring the data and visualization
•Data pre-processing
•Splitting Data
•Resampling Training Data
•Tuning the parameters
•Connecting to meaningful accurate data sources
10. PROCESS 1: COLLECTING DATA
The World Bank Data provided me with an 11-year .csv file to work with.
The data set below has 13 variables from which to develop a predictive churn model, with 10,000
customer observations.
12. PROCESS 2: - ANALYZING & EXPLORING THE
DATA
I notice the data is incomplete and leaves a lot of unanswered questions.
Would it be possible to obtain balances over a period of time as opposed to a single date?
What date did the customer exit?
What types of products are the customers in?
Could they have exited from a product and not the bank? What is an “Active Member”?
For this exercise, we proceed to model without context even though typically having context
and better understanding of the data extraction process would give better insight and
possibly lead to better and contextual results of the modelling process
13. PROCESS 2: - ANALYZING & EXPLORING
THE DATA
To begin, I need to verify the type of data, what % is usable, and look for patterns. I notice that of the 10,000 customers
of World Bank, 2,037 have churned in the past 11 years, which is a 20% churn rate (2,037 / 10,000 ≈ 20.4%); this is considered an
average to high rate among industry experts. I also review the averages in the variables where customers have churned.
14. PROCESS 2: - EXPLORING THE “CATEGORICAL DATA”
VARIABLES
I Compare the Churn rates by Gender, Geography, Credit Card Holder, and Product Participation:
15. PROCESS 2: - EXPLORING THE “CATEGORICAL DATA”
VARIABLES
I Compare the Churn rates by Gender, Geography, Credit Card Holder, and Product Participation:
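A minimal sketch of how such a comparison could be computed, assuming each customer is a record with a 0/1 `Exited` flag. The column names and the toy records are illustrative assumptions, not the actual bank data or the author's code:

```python
from collections import defaultdict

def churn_rate_by(rows, key):
    """Return {category: share of customers in that category with Exited == 1}."""
    totals = defaultdict(int)
    churned = defaultdict(int)
    for row in rows:
        totals[row[key]] += 1
        churned[row[key]] += row["Exited"]
    return {category: churned[category] / totals[category] for category in totals}

# Hypothetical toy records, for illustration only.
customers = [
    {"Geography": "France", "Gender": "Female", "Exited": 0},
    {"Geography": "France", "Gender": "Male", "Exited": 0},
    {"Geography": "Germany", "Gender": "Female", "Exited": 1},
    {"Geography": "Germany", "Gender": "Male", "Exited": 0},
    {"Geography": "Spain", "Gender": "Female", "Exited": 1},
    {"Geography": "Spain", "Gender": "Male", "Exited": 0},
]

rates_by_country = churn_rate_by(customers, "Geography")
rates_by_gender = churn_rate_by(customers, "Gender")
print(rates_by_country)
print(rates_by_gender)
```

The same helper can be reused for any categorical column, such as credit card ownership or active membership.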
16. FINDINGS FROM CATEGORICAL
VARIABLES
We discovered the following:
•The country of France has the lowest churn rate and Germany the highest. However, the proportion of
churned customers is inversely related to the population of customers, suggesting that the bank may
have a problem (maybe not enough customer service resources allocated) in the areas where it has
fewer clients.
•The proportion of female customers churning is slightly higher than that of male customers
•Oddly, the majority of the customers that churned are those with credit cards. The majority of customers
have credit cards so this could just be a coincidence.
•The inactive members have a greater churn. The overall proportion of inactive members is quite high
suggesting that the bank may need a program implemented to turn this group to active customers as this
can have a positive impact on the customer churn.
17. LET’S MAKE OUR “NULL HYPOTHESIS”
My “Null Hypothesis” (H0) is that, since the average churn of banks according
to CallMiner is 20% annually, there are few factors that can be
determined that lead to whether a customer stays with the bank or not.
My “Alternative Hypothesis” (HA) is that there are distinct factors and
characteristics that lead a customer to leave a bank.
Through my analysis I will reach a conclusion on my hypothesis.
19. FINDINGS FOR “CONTINUOUS DATA” VARIABLES
Interestingly, neither the product nor the salary has an impactful effect on the likelihood to churn.
There is no significant difference in the credit score distribution between retained and churned customers.
With regard to tenure, clients with the average tenure of 5 years were less likely to churn, whereas
customers at either end of the spectrum (those who had spent very little or a lot of time with the bank) were
more likely to churn. The highest churn years in the 11 years examined were years 2, 4 and 10.
Worryingly, the bank is losing customers with significant bank balances which is likely to hit their available capital
for lending. The average bank balance for a churned customer is $91,000 with an average bank balance of
$101,465.
One interesting finding is that older customers are churning at a higher rate than younger ones, which suggests
that the bank may not have adequate service standards that meet the customer service expectations of older clients.
Another possible explanation is that they are retiring and consolidating their assets elsewhere. The bank may gain
from creating additional service plans for this client base.
20. AGE GROUPING AND TREND INVESTIGATION
Customer Churn by %
We notice the highest churn group is between 40 and 59
21. WE SEE A CLEAR TREND IN OUR BANK MEMBER AGES AND CHURN
BAR CHART OF EXITED BANK MEMBERS GROUPS BY THEIR BANK BALANCE
22. WE SEE A CLEAR TREND IN OUR BANK MEMBER AGES AND CHURN
BAR CHART OF EXITED BANK MEMBERS GROUPS BY THEIR ESTIMATED SALARY
23. GERMANY HAD THE HIGHEST CHURN RATE
AND FRANCE THE LOWEST AS FRANCE HAS THE HIGHEST % OF MEMBERS
24. COUNTRY COMPARISON BY BANK BALANCES AND CHURN
We notice that the customers who left the bank in Germany had the largest balances
Avg Bank Balance by Country, Churn (1) versus No Churn (0)
Churn  Country  Avg Bank Balance
0      France    60339.275678
0      Germany  119427.106696
0      Spain     59678.070470
1      France    71192.795728
1      Germany  120361.075590
1      Spain     72513.352446
Churn by Country and Bank Balance
25. CUSTOMER CHURN BY YEAR 1 THROUGH 11
We notice that our graph shows highest Churn rates in years labeled 1, 3 & 9
26. DOES THIS CONNECT? LET’S LOOK AT OUR RELATIONSHIPS USING CORRELATION
In this heatmap, I used Pearson's correlation coefficient, a statistic that
measures the strength of the linear relationship, or association, between two
continuous variables. It is widely used for measuring the association
between variables of interest because it is based on the method of covariance.
The strongest correlation I noticed was with the Age variable (.29), which
corresponds to my Age box plot.
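As a rough illustration of what each cell of such a heatmap contains, the sketch below computes Pearson's r for one pair of variables from its covariance-based definition. The ages and churn flags are made-up toy values, so the result will not match the .29 reported above:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: covariance of x and y divided by the product of their spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical toy values: customer ages and whether each customer churned (1) or not (0).
ages = [25, 32, 41, 47, 55, 63]
exited = [0, 0, 0, 1, 1, 1]

r_age_exited = pearson_r(ages, exited)
print(round(r_age_exited, 3))
```

A full heatmap simply repeats this calculation for every pair of numeric columns.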
27. STEP 3 – DATA WRANGLING
My dataset had no “Null” values, so there was no need to fill in or add mean data.
From my data set I dropped columns that had zero impact on my results:
28. STEP 3 – DATA WRANGLING
I introduce 3 new variables by combining 6 existing variables, as
they were correlated with each other and would improve my analysis.
I created the 3 new variables by transforming pairs of existing
variables into ratios: Balance/Salary, TenurebyAge, and
CreditScore Given Age
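One plausible reading of these three ratios is sketched below. The input column names (`Balance`, `EstimatedSalary`, `Tenure`, `Age`, `CreditScore`) and the exact numerator/denominator pairing are assumptions inferred from the variable names, not taken from the author's code:

```python
def add_ratio_features(customer):
    """Return a copy of the customer record with three ratio features added.

    Assumes EstimatedSalary and Age are strictly positive.
    """
    enriched = dict(customer)
    enriched["BalanceSalaryRatio"] = customer["Balance"] / customer["EstimatedSalary"]
    enriched["TenureByAge"] = customer["Tenure"] / customer["Age"]
    enriched["CreditScoreGivenAge"] = customer["CreditScore"] / customer["Age"]
    return enriched

# Hypothetical customer record, for illustration only.
example = {
    "Balance": 50000.0,
    "EstimatedSalary": 100000.0,
    "Tenure": 5,
    "Age": 40,
    "CreditScore": 600,
}
features = add_ratio_features(example)
print(features["BalanceSalaryRatio"], features["TenureByAge"], features["CreditScoreGivenAge"])
```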
29. RESULTS OF 3 NEW COMBINED
VARIABLES
We see visual evidence
of a slightly higher
Churn rate among
those with a higher
Balance/Salary Ratio
30. I use the min/max operation to scale my “continuous variables” so that they share a common range. Min/max scaling is also known as “Normalization”. The formula behind this is: x' = (x - min(x)) / (max(x) - min(x)), which maps each value into the range 0 to 1. These “normalization” techniques help in comparing corresponding normalized values from two or more different data sets in a way that eliminates the effects of the variation in the scale of the data sets, i.e. a data set with large values can be easily compared with a data set of smaller values.
Step 3 – Data Wrangling – Normalization
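A minimal sketch of min/max scaling in plain Python, using made-up balance values. In practice the minimum and maximum would normally be taken from the training data only and then reused for the test data:

```python
def min_max_scale(values):
    """Rescale values linearly so the smallest maps to 0.0 and the largest to 1.0."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        # A constant column carries no information; map everything to 0.0.
        return [0.0 for _ in values]
    return [(value - lowest) / (highest - lowest) for value in values]

# Hypothetical balances, for illustration only.
balances = [0.0, 50000.0, 125000.0, 250000.0]
scaled = min_max_scale(balances)
print(scaled)
```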
31. One-hot encoding (“Hot Key” encoding): In the dataset, there are some variables with numerical values, some variables with categories and some variables with binary values (0 and 1). For numerical and binary variables, we do not worry about labeling. However, we perform label encoding for the categorical variables. This step is carried out on the whole dataset. I one-hot encoded the following variables: Gender and Geography, to transform them to binary using a “for”/“if” statement. I also recoded the two binary variables “Has Credit Card” and “Is Active Member”, changing the value “0” (no) to -1 to show a negative relationship more clearly.
Step 3 – Data Wrangling
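The encoding described here could look roughly like the following sketch. The column names (`HasCrCard`, `IsActiveMember`) and the category lists are assumptions for illustration, not the author's actual code:

```python
def one_hot(value, categories, prefix):
    """Return one 0/1 indicator per category, e.g. Gender_Female, Gender_Male."""
    return {f"{prefix}_{category}": 1 if value == category else 0 for category in categories}

def encode_customer(customer):
    """One-hot encode Gender and Geography, and recode the 0/1 flags as -1/1."""
    encoded = {}
    encoded.update(one_hot(customer["Gender"], ["Female", "Male"], "Gender"))
    encoded.update(one_hot(customer["Geography"], ["France", "Germany", "Spain"], "Geography"))
    for flag in ("HasCrCard", "IsActiveMember"):
        encoded[flag] = 1 if customer[flag] == 1 else -1
    return encoded

# Hypothetical customer record, for illustration only.
encoded = encode_customer(
    {"Gender": "Female", "Geography": "Spain", "HasCrCard": 0, "IsActiveMember": 1}
)
print(encoded)
```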
32. Data splitting: This involves splitting the label encoded dataset into train and test datasets. In this project I separated the data in a 70/30 ratio. The fractions of both classes remain the same in the train (70) and test (30) datasets. Holding out test data lets me check for what is known as “overfitting”: after I train my models on my “train” data set, I apply my newly created Churn Model to general unused data (the test data) to see whether it still performs well.
Step 3 – Data Wrangling – Data Splitting into Test/Train
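A stratified 70/30 split, where each class keeps the same share in both parts, can be sketched as follows. This is a plain-Python illustration on synthetic rows with a 20% churn rate; the function name and seed are arbitrary choices, not the author's implementation:

```python
import random

def stratified_split(rows, label_key, test_fraction=0.3, seed=42):
    """Split rows into train/test so each class keeps roughly the same share in both."""
    rng = random.Random(seed)
    by_class = {}
    for row in rows:
        by_class.setdefault(row[label_key], []).append(row)
    train, test = [], []
    for members in by_class.values():
        members = members[:]          # copy so the caller's list is not reordered
        rng.shuffle(members)
        n_test = round(len(members) * test_fraction)
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return train, test

# Synthetic rows: 20 churned and 80 retained customers.
rows = [{"id": i, "Exited": 1 if i < 20 else 0} for i in range(100)]
train, test = stratified_split(rows, "Exited")
print(len(train), len(test))
```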
33. Modeling Pipeline
Next, I build a “Data Pipeline” in Python,
which allows me to transform data
from one representation to another
through a series of steps. In other words,
it ensures that my encoding and min/max
normalization are applied consistently to both
my categorical and continuous variables in the
train and test modeling.
Step 4 – Train and Test the Data
34. Step 4 – Train and Test the Data – Feeding the Data into an
Algorithm
35. Logistic Regression: Logistic Regression is one of the basic and popular algorithms for solving a classification problem. It is named ‘Logistic Regression’ because its underlying technique is much the same as Linear Regression. The term “Logistic” is taken from the Logit function used in this method of classification, whose inverse is the Sigmoid function. Since our target variable is binary (either the customer churned or they did not churn), I chose Logistic Regression.
Step 4 – Train and Test the Data – Logistic Regression – Model 1
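To make the role of the Sigmoid function concrete, the sketch below turns a linear score into a churn probability. The weights, bias and feature values are hypothetical numbers chosen for illustration; they are not fitted to the bank data:

```python
import math

def sigmoid(z):
    """Map any real-valued score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_churn_probability(features, weights, bias):
    """Logistic regression prediction: sigmoid of a weighted sum of the features."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)

# Hypothetical weights for two scaled features (age, balance); not fitted values.
weights = [2.0, 0.5]
bias = -1.5

probability = predict_churn_probability([0.8, 1.0], weights, bias)
label = 1 if probability >= 0.5 else 0   # 1 = churn, 0 = no churn
print(round(probability, 3), label)
```

Training a logistic regression model amounts to choosing the weights and bias that make these probabilities agree with the observed churn labels as closely as possible.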
36. Support Vector Machine (SVM) is a supervised machine learning algorithm which can be used for both classification and regression challenges. SVM is mostly used in classification problems. In this algorithm, we plot each data item as a point in n-dimensional space (where n is the number of features you have), with the value of each feature being the value of a coordinate. SVM is well suited here, as I need a way to separate my data into CHURN or NO CHURN, and Logistic Regression uses a straight line (a linear decision boundary). SVM also maximizes the margin, so the model is slightly more robust, but more importantly, SVM supports kernels, so you can model even non-linear relations. Given a set of training examples, each marked as belonging to one of two categories, an SVM training algorithm builds a model that assigns new examples to one category or the other, making it a non-probabilistic binary linear classifier, which my data requires to build the Churn/No Churn model as mentioned earlier.
Step 4 – Train and Test the Data – SVM – Model 2
Source: Wikipedia
37. Random Forest works well with a mixture of numerical and categorical features, which the Bank Churn data has. It is also fine when features are on different scales. Roughly speaking, with Random Forest you can use the data as it is. Random Forest uses a large number of trees, works with missing values and is often considered to be a highly accurate model for both regression and classification problems.
Source: Wikipedia
Step 4 – Train and Test the Data – Random Forest – Model 3
38. Step 4 – Let’s see both Train and Test Results Side by Side (TEST DATA RESULTS, TRAIN DATA RESULTS)
39. Step 4 – Let’s see both Train and Test Results Side by Side (TEST DATA RESULTS, TRAIN DATA RESULTS)
40. Step 4 – Let’s see both Train and Test Results Side by Side (TEST DATA RESULTS, TRAIN DATA RESULTS)
41. A ROC curve plots TPR vs. FPR at different classification thresholds.
Lowering the classification threshold classifies more items as positive,
thus increasing both False Positives and True Positives.
The AUC-ROC curve is a performance measurement for classification problems
at various threshold settings. ROC is a probability curve and the AUC
represents the degree or measure of separability. It tells how capable the
model is of distinguishing between classes. The higher the AUC, the better the
model is at predicting 0s as 0s and 1s as 1s. By analogy, the higher the AUC,
the better the model is at distinguishing between patients with disease and
no disease.
The ROC curve is plotted with TPR against FPR, where TPR is on the y-axis
and FPR is on the x-axis.
Source: Towards Data Science
Step 5 – Accuracy Check – Did the 3 models make it?
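The ROC/AUC idea can be sketched in a few lines: sweep the threshold over the predicted scores, record (FPR, TPR) at each step, and integrate with the trapezoid rule. The labels and scores below are invented for illustration and are not model output from this project:

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) points obtained by lowering the threshold over the scores."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points

def auc(points):
    """Area under the ROC curve using the trapezoid rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

# Hypothetical true labels (1 = churn) and predicted churn scores.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

roc = roc_points(labels, scores)
area = auc(roc)
print(round(area, 3))
```

In this toy example one retained customer is scored above one churned customer, so 8 of the 9 churned/retained pairs are ranked correctly and the AUC is 8/9.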
42. Step 5 – Accuracy Check – Bank Churn Results
We see that the RF (Random Forest) model
had the best results with a score of
.77, which means it is able
to correctly classify the
client data into a “churn”
or “no churn” category .77
of the time.
44. STEP 5 – FEATURE
IMPORTANCE– RANK
WHAT MATTERS
We see that
Age played a key
component in
Customer Churn
along with
Balance.
45. STEP 5 – NULL
HYPOTHESIS
STATEMENT – ACCEPT
OR REJECT?
The Chi-square test is intended to test how likely
it is that an observed distribution is due to
chance. It is also called a "goodness of fit" statistic,
because it measures how well the observed
distribution of data fits with the distribution that is
expected if the variables are independent.
“I believe that the Gender of each customer has
little impact on whether the bank customer will
leave or not.”
46. CHI-SQUARE TEST
RESULTS
We see from the results below that the Chi-Square statistic is high
and exceeds the critical value. The p-value is less than our
significance level of .05, so we reject H0 and go with HA,
which is “Gender is a determining factor in whether the customer
leaves the bank”
REJECTED!
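For readers who want to see the mechanics, here is a sketch of a Chi-square test of independence on a 2x2 table of gender by churn. The counts are hypothetical and are not the project's actual results; the critical value 3.841 is the standard value for 1 degree of freedom at the .05 level, and the p-value shortcut used here is valid only for 1 degree of freedom:

```python
import math

def chi_square_statistic(table):
    """Sum of (observed - expected)^2 / expected over all cells of a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    return statistic

# Hypothetical counts. Rows: Female, Male. Columns: retained, churned.
observed = [[330, 120], [470, 80]]

statistic = chi_square_statistic(observed)
critical_value = 3.841                             # df = 1, significance level .05
p_value = math.erfc(math.sqrt(statistic / 2.0))    # exact only for df = 1
reject_h0 = statistic > critical_value
print(round(statistic, 2), p_value, reject_h0)
```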
47. STEP 5 – A 2ND NULL
HYPOTHESIS STATEMENT –
ACCEPT OR REJECT?
Using the “p-value”
test I will set .05 as my
threshold and run a test
to verify that I am
correct about my Null
Hypothesis (H0)
“I believe that Bank Balance has no
impact to whether the bank customer
will leave or not.”
48. NOW LET’S LOOK DEEPER
AT BANK BALANCE
T-Test
I performed an unpaired t-test to compare the mean Bank Balance
for Retained vs. Churned customers. Retained customers have an
average bank balance of $91,000, vs. $101,465 among Churned
customers. The test yielded a t-statistic of 4.10 with a p-value of
0.00067. The null hypothesis is that the two groups have the same
mean, and the t-statistic and p-value tell us it is very unlikely that we
would observe such a large difference in the means if the two
groups had the same mean. Thus, with a p-value < 0.05, we reject
the null hypothesis in favor of the alternative hypothesis that there is
a relationship between bank balance and whether the customer
churned or not.
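The unpaired comparison above can be illustrated with Welch's version of the t-test, which does not assume equal variances; whether the original analysis used Welch's or the pooled-variance form is not stated, so this choice is an assumption. The balances below are invented toy values (in thousands), and converting the statistic to a p-value would additionally require the t-distribution, which a statistics library normally provides:

```python
import math
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Return Welch's t-statistic and its approximate degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    var_a, var_b = variance(sample_a), variance(sample_b)   # sample variances
    se_squared = var_a / n_a + var_b / n_b
    t_statistic = (mean(sample_a) - mean(sample_b)) / math.sqrt(se_squared)
    dof = se_squared ** 2 / (
        (var_a / n_a) ** 2 / (n_a - 1) + (var_b / n_b) ** 2 / (n_b - 1)
    )
    return t_statistic, dof

# Hypothetical balances in thousands of dollars, for illustration only.
churned_balances = [120, 95, 110, 130, 105]
retained_balances = [80, 70, 100, 60, 90]

t_statistic, dof = welch_t(churned_balances, retained_balances)
print(round(t_statistic, 2), round(dof, 1))
```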
49. FINAL THOUGHTS AND
CONCLUSIONS
Through my examination of the small data set I did discover a few
significant findings:
There is no significant difference in the credit score distribution
between retained and churned customers.
The older customers (over 35) are churning at a higher rate than
the younger ones alluding to a difference in service preference in
the age categories. The bank may need to review their target
market or review the strategy for retention between the different
age groups
Bank members with an average tenure are slightly less likely to
churn than those with either low or high number of tenure years.
50. FINAL THOUGHTS AND
CONCLUSIONS
Continued….
The data shows that customers with higher balances
are churning at a higher rate, which is cause for
concern for the bank's lending capability. The bank could
benefit from offering special programs when, say, a
balance reaches $75,000, such as a higher rate of interest
on a savings account or special investment privileges.
Neither the product nor the salary has a significant
effect on the likelihood to churn.
More females have churned than males
More credit card holders churn though most of the
bank customers possess credit cards. The bank can
benefit from increasing incentives in keeping credit
card holders.
51. FINAL THOUGHTS AND
CONCLUSIONS
Model Review
My Random Forest Model performed the best, as it was able to capture
both the True Positive and True Negative outcomes with a score of .86 (86 out of 100). This is
a strong result, and in this model an analyst is not required to
scale/normalize the data.
According to my Feature Selection results from my Random Forest model,
the most important determinants of Churn are Age, Gender and Bank
Balance followed by Tenure and Geography.
I think developing an accurate predictive model should be incorporated in
every bank’s business plan as the closer a bank can get to modeling the
characteristics of those who may leave, the more detailed a retention plan
the bank can create and implement. Insights carefully extracted from data
are invaluable to the health of the competitive nature of banks, especially
credit cards, bank-balance specific programs and other key ancillary
features a bank provides to keep their clients.
52. OTHER MODELS
I would like to train banking data using the Naïve
Bayes Classifier, as it assumes independence between
every pair of features, accounts for new changes
relating to the features and is quick to run. I would also
like to try K nearest neighbors, a type of lazy learning
that does not attempt to construct a
general internal model but simply stores instances of the
training data. Classification is computed from a simple
majority vote of the k nearest neighbors of each point and
is robust against noisy data. It will also work well for a
Bank Data set with over 100,000 observations.
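The majority-vote idea behind K nearest neighbors can be sketched as follows. The two features (imagined as scaled age and scaled balance), the toy points and the choice of k = 3 are all assumptions for illustration:

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Classify the query by a majority vote among its k nearest training points."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# Hypothetical scaled features (age, balance) and churn labels (1 = churn).
train_points = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.3), (0.8, 0.9), (0.9, 0.8), (0.85, 0.7)]
train_labels = [0, 0, 0, 1, 1, 1]

prediction_high = knn_predict(train_points, train_labels, (0.8, 0.8))
prediction_low = knn_predict(train_points, train_labels, (0.2, 0.2))
print(prediction_high, prediction_low)
```

Because the model only stores the training points, all of the work happens at prediction time, which is what "lazy learning" refers to.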
53. FURTHER RESEARCH….
The study can be greatly improved with the following data since there are many
unanswered questions:
Would it be possible to obtain balances over time as opposed to a single date?
What date did the customer exit?
What types of products are the customers in?
Could they have exited from a product and not the bank?
Does the bank have an investment division?
Did the customer retire and consolidate assets elsewhere?
Are there NPS scores to factor into the data?
Of course, every business needs to perform analysis and take measures to prevent
Customer Churn; considering the cost of acquiring each customer, a study should
be an annual requirement and perhaps creating ABM (Accounts Based Marketing)
plans to target key client groups.
54. Carolyn Massa
Personal site: revinformatics.com
GitHub: https://github.com/CarolynMassa
Email: carolyn@revinformatics.com
LinkedIn: https://www.linkedin.com/in/carolynmassa
Portfolio
● Data Analyst/Project Manager
entrepreneur
● Masters in Information Technology