Here are the key requirements for the project:
- The software must analyze customer data from multiple sources, such as customer records, call detail records, and network logs, and must merge and clean this data.
- It must apply machine learning algorithms such as logistic regression to build predictive models that estimate, from demographic and usage attributes, whether a customer will churn.
- The models must be trained on historical customer data and then tested to evaluate accuracy; metrics such as the confusion matrix are used to assess model performance.
- The software should prioritize the customers most likely to churn so that retention efforts can be focused, producing a ranked list of potential defectors (see the sketch after this list).
- It is developed using
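To make the requirements above concrete, here is a minimal sketch of such a pipeline, assuming Python with pandas and scikit-learn. The file names, the customer_id join key, and the 0/1 churn column are hypothetical placeholders rather than the report's actual artifacts, and the report's own implementation may differ.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Merge customer data from multiple sources on a shared customer key
# (file and column names here are illustrative placeholders).
customers = pd.read_csv("customer_records.csv")
cdrs = pd.read_csv("call_detail_records.csv")
logs = pd.read_csv("network_logs.csv")
data = customers.merge(cdrs, on="customer_id").merge(logs, on="customer_id")

# Basic cleaning: drop duplicate customers, impute missing numeric values.
data = data.drop_duplicates(subset="customer_id")
num_cols = data.select_dtypes("number").columns
data[num_cols] = data[num_cols].fillna(data[num_cols].median())

# Demographic and usage attributes as features; a 0/1 churn flag as target.
X = pd.get_dummies(data.drop(columns=["customer_id", "churn"]))
y = data["churn"]

# Train on historical data and hold out a test set for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Assess performance with a confusion matrix and accuracy on the test set.
y_pred = model.predict(X_test)
print(confusion_matrix(y_test, y_pred))
print("accuracy:", accuracy_score(y_test, y_pred))

# Prioritize retention: rank all customers by predicted churn probability.
data["churn_probability"] = model.predict_proba(X)[:, 1]
defectors = data.sort_values("churn_probability", ascending=False)
print(defectors[["customer_id", "churn_probability"]].head(10))
```

Churn datasets are usually imbalanced (churners are a minority), so in practice stratified splitting, class weights, or resampling would matter as much as the choice of algorithm.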
major documentation(Telecom churn Based on ML).docx
A MAJOR PROJECT REPORT ON
“TELECOM CHURN BASED ON ML”
In partial fulfillment of the requirements for the award of the degree of
BACHELOR OF TECHNOLOGY
IN
INFORMATION TECHNOLOGY
Submitted by
Hemanth Pasula (16911A1246)
D Vedanth (16911A1210)
G. Yashwanth Venkat Samrat (16911A1217)
Under the Esteemed Guidance of
M. Suresh Babu
Assistant Professor
DEPARTMENT OF INFORMATION TECHNOLOGY
VIDYA JYOTHI INSTITUTE OF TECHNOLOGY
(Accredited by NBA, Approved by AICTE, Autonomous VJIT Hyderabad)
Aziz Nagar Gate, C.B.Post, Chilkur Road, Hyderabad – 500075
2019 - 2020
VIDYA JYOTHI INSTITUTE OF TECHNOLOGY
(Accredited by NBA, Approved by AICTE, Autonomous VJIT Hyderabad)
Aziz Nagar Gate, C.B.Post, Chilkur Road, Hyderabad - 500075
DEPARTMENT OF INFORMATION TECHNOLOGY
CERTIFICATE
This is to certify that the Project Report on “TELECOM CHURN BASED ON ML” is a
bonafide work by D Vedanth(16911A1210), Hemanth Pasula(16911A1246),
G.Yashwanth venkat Samrat (16911A1217) in partial fulfillment of the requirement for
the award of the degree of Bachelor of Technology in “INFORMATION
TECHNOLOGY” VJIT Hyderabad during the year 2019 - 2020.
Project Guide: M. Suresh Babu, M.Tech, Assistant Professor
Head of the Department: Mr. B. Srinivasulu, M.E., Professor
External Examiner
VIDYA JYOTHI INSTITUTE OF TECHNOLOGY
(Accredited by NBA, Approved by AICTE, Autonomous VJIT Hyderabad)
Aziz Nagar Gate, C.B.Post, Chilkur Road, Hyderabad - 500075
2019 - 2020
DECLARATION
We, D Vedanth (16911A1210), Hemanth Pasula (16911A1246), and G. Yashwanth
Venkat Samrat (16911A1217), hereby declare that the Project Report entitled "TELECOM
CHURN BASED ON ML", submitted in partial fulfillment of the requirement for
the award of Bachelor of Technology in Information Technology to Vidya Jyothi
Institute of Technology, Autonomous VJIT - Hyderabad, is an authentic work and has not
been submitted to any other university or institute for a degree.
D Vedanth (16911A1210)
Hemanth Pasula (16911A1246)
G.Yashwanth venkat Samrat(16911A1217)
ACKNOWLEDGEMENT
It is a great pleasure to express our deepest sense of gratitude and indebtedness to
our internal guide M. Suresh Babu, Assistant Professor, Department of IT, VJIT, for having
been a source of constant inspiration, precious guidance and generous assistance during the
project work. We deem it a privilege to have worked under his able guidance. Without his
close monitoring and valuable suggestions this work would not have taken this shape. We feel
that this help is irreplaceable and unforgettable.
We wish to express our sincere thanks to Dr. P. Venugopal Reddy, Director VJIT, for
providing the college facilities for the completion of the project. We are profoundly thankful to
Mr.B.Srinivasulu, Professor and Head of Department of IT, for his cooperation and
encouragement. Finally, we thank all the faculty members, supporting staff of IT Department and
friends for their kind co-operation and valuable help for completing the project.
TEAM
MEMBERS
D Vedanth (16911A1210)
Hemanth Pasula (16911A1246)
G.Yashwanth venkat Samrat (16911A1217)
6. TESTING & VALIDATION
6.1 Introduction
6.2 Design of test cases and scenarios
6.3 Validation
6.4 Conclusion
7. CONCLUSION AND FUTURE ENHANCEMENTS
Project Conclusion
Future Enhancements
REFERENCES
LIST OF FIGURES
Fig-1 UML Views
Fig-2 Merging data sets
Fig-3 Imputing missing values
Fig-4 Dividing into training set
Fig-5 Correlation Matrix
Fig-6 Histogram
Fig-7 Unit testing life cycle
Fig-8 White box testing
Fig-9 Black box testing
Fig-10 Grey box testing
Fig-11 Grey box testing feature
Fig-12 Software verification
LIST OF OUTPUT SCREENSHOTS
Fig-1 Structure of data
Fig-2 Histogram
Fig-3 Box plot
Fig-4 Graph plot
Fig-5 Conversion of data types
Fig-6 Missing values
Fig-7 Converting the data sets to 1 and 0
Fig-8 Naive Bayesian
Fig-9 Optimal cut-off point
Fig-10 Correlation matrix
Fig-11 ROC curve
Fig-12 Customers who stay
Fig-13 Customers who leave
Chapter-1
Introduction
1.1 Motivation
In the present project we try to help service providers avoid losing their
customers. The system helps providers understand their customers better and identify
which customers are at risk of leaving.
1.2 Problem definition
Telecommunication is the exchange of signs, signals, messages, words, writings, images,
sounds, or information of any nature by wire, radio, optical or other electromagnetic
systems. Telecommunication occurs when the exchange of information between communication
participants includes the use of technology. Information is transmitted through a
transmission medium, such as a physical medium (for example, an electrical cable) or
electromagnetic radiation through space (such as radio or light). Such transmission paths
are often divided into communication channels, which afford the advantages of multiplexing.
Since the Latin term communicatio denotes the social process of information exchange, the
term telecommunications is often used in its plural form because it involves many different
technologies.
The churn rate, also known as the rate of attrition or customer churn, is the rate at
which customers stop doing business with an entity. It is most commonly expressed as the
percentage of service subscribers who discontinue their subscriptions within a given time
period. It is also the rate at which employees leave their jobs within a certain period. For a
company to expand its clientele, its growth rate (measured by the number of new customers)
must exceed its churn rate.
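As a quick illustrative calculation (the numbers below are invented for the example), the sketch compares a quarterly churn rate against the growth rate:

```python
# Hypothetical quarter for one provider: churn rate is the share of
# subscribers who discontinued their subscriptions in the period.
subscribers_at_start = 1000
cancelled = 50
new_customers = 80

churn_rate = cancelled / subscribers_at_start       # 0.05 -> 5%
growth_rate = new_customers / subscribers_at_start  # 0.08 -> 8%

# The clientele expands only while the growth rate exceeds the churn rate.
print(growth_rate > churn_rate)  # True
```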
Machine learning (ML) is the study of computer algorithms that improve automatically
through experience. It is seen as a subset of artificial intelligence. Machine learning algorithms build
a mathematical model based on sample data, known as "training data", in order to make predictions
or decisions without being explicitly programmed to do so. Machine learning algorithms are used in a
wide variety of applications, such as email filtering and computer vision, where it is difficult or
infeasible to develop conventional algorithms to perform the needed tasks.
1.3 Objective of Project
1.3.1 General Objective
Customer attrition, also known as customer churn, customer turnover, or customer
defection, is the loss of clients or customers.
Telephone service companies, Internet service providers, pay TV companies, insurance
firms, and alarm monitoring services, often use customer attrition analysis and customer
attrition rates as one of their key business metrics because the cost of retaining an existing
customer is far less than acquiring a new one. Companies from these sectors often have
customer service branches which attempt to win back defecting clients, because recovered
long-term customers can be worth much more to a company than newly recruited clients.
1.3.2 Specific Objective
Companies usually make a distinction between voluntary churn and involuntary
churn. Voluntary churn occurs due to a decision by the customer to switch to another
company or service provider, while involuntary churn occurs due to circumstances such as a
customer's relocation to a long-term care facility, death, or relocation to a distant
location. In most applications, involuntary reasons for churn are excluded from the
analytical models. Analysts tend to concentrate on voluntary churn, because it typically
occurs due to factors of the company-customer relationship which companies control, such as
how billing interactions are handled or how after-sales help is provided.
Predictive analytics uses churn prediction models that predict customer churn by assessing
each customer's propensity to churn. Since these models generate a small prioritized list of
potential defectors, they are effective at focusing customer retention marketing programs on
the subset of the customer base who are most vulnerable to churn.
1.4 Limitations of Project
Acquiring a customer's data is difficult. These days it is very hard to know what a person
is thinking, so we can only judge from the data in front of us, which may be only 80-90%
reliable. We should therefore make sure that our predictions are good.
Chapter-2
LITERATURE SURVEY
2.1 Introduction
LOGISTIC REGRESSION
Problem Statement :
"You have a telecom firm which has collected data of all its customers"
The main types of attributes are:
1. Demographics (age, gender, etc.)
2. Services availed (internet packs purchased, special offers, etc.)
3. Expenses (amount of recharge done per month, etc.)
Based on all this past information, you want to build a model which will predict whether a
particular customer will churn or not.
So the variable of interest, i.e. the target variable here, is 'Churn', which tells us
whether or not a particular customer has churned. It is a binary variable: 1 means that the
customer has churned and 0 means the customer has not churned.
With 21 predictor variables we need to predict whether a particular customer will switch to
another telecom provider or not.
The data sets were taken from the Kaggle website.
DATA PROCEDURE (a runnable sketch of these steps follows the list)
Import the required libraries, then:
1. Import all datasets
2. Merge all datasets on the key "customer_id"
3. Data cleaning - check for null values
4. Check for missing values and replace them
5. Model building
• Binary encoding
• One-hot encoding
• Creating dummy variables and removing the extra columns
6. Feature selection using RFE - Recursive Feature Elimination
7. Getting the predicted values on the train set
8. Creating a new column 'predicted' with 1 if the churn probability > 0.5, else 0
9. Creating a confusion matrix on the train and test sets
10. Checking the overall accuracy
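The following is a minimal sketch of the procedure above, assuming two hypothetical CSV exports ("customers.csv", "usage.csv") that share a customer_id key and a Yes/No Churn column; all file and column names are illustrative, not from the report.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.feature_selection import RFE
from sklearn.metrics import confusion_matrix, accuracy_score

# 1-2. Import the datasets and merge them on customer_id.
customers = pd.read_csv("customers.csv")
usage = pd.read_csv("usage.csv")
df = customers.merge(usage, on="customer_id")

# 3-4. Data cleaning: check nulls, then impute numeric gaps with the median
#      (categorical gaps would need their own handling).
print(df.isnull().sum())
df = df.fillna(df.median(numeric_only=True))

# 5. Encoding: binary target plus dummy variables for categoricals,
#    dropping one level per column to avoid redundant columns.
df["Churn"] = df["Churn"].map({"Yes": 1, "No": 0})
X = pd.get_dummies(df.drop(columns=["Churn", "customer_id"]), drop_first=True)
y = df["Churn"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# 6. Feature selection with Recursive Feature Elimination (RFE).
model = LogisticRegression(max_iter=1000)
rfe = RFE(estimator=model, n_features_to_select=15)
rfe.fit(X_train, y_train)
selected = X_train.columns[rfe.support_]

# 7-8. Predicted probabilities on the train set, thresholded at 0.5.
model.fit(X_train[selected], y_train)
train_pred = (model.predict_proba(X_train[selected])[:, 1] > 0.5).astype(int)

# 9-10. Confusion matrices and overall accuracy on the train and test sets.
test_pred = (model.predict_proba(X_test[selected])[:, 1] > 0.5).astype(int)
print(confusion_matrix(y_train, train_pred))
print(confusion_matrix(y_test, test_pred))
print("train acc:", accuracy_score(y_train, train_pred))
print("test acc:", accuracy_score(y_test, test_pred))
```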
Chapter-3
ANALYSIS
3.1 Introduction
The field of chemistry uses analysis in at least three ways: to identify the components
of a particular chemical compound (qualitative analysis), to identify the proportions of
components in a mixture (quantitative analysis), and to break down chemical processes and
examine chemical reactions between elements of matter. For an example of its use, analysis
of the concentration of elements is important in managing a nuclear reactor, so nuclear
scientists will analyse neutron activation to develop discrete measurements within vast
samples. A matrix can have a considerable effect on the way a chemical analysis is
conducted and the quality of its results. Analysis can be done manually or with a device.
Chemical analysis is an important element of national security among the major world
powers, with materials measurement and signature intelligence (MASINT) capabilities.
3.2 Software Requirement Specification
3.2.1 Functional requirements
Functional requirements may involve calculations, technical details, data
manipulation and processing, and other specific functionality that define what a system is
supposed to accomplish. Behavioural requirements describe all the cases where the system
uses the functional requirements, these are captured in use cases. Functional requirements are
supported by non-functional requirements (also known as "quality requirements"), which
impose constraints on the design or implementation (such as performance requirements,
security, or reliability). Generally, functional requirements are expressed in the form "the
system shall do <requirement>," while non-functional requirements take the form "the system
shall be <requirement>." The plan for implementing functional requirements is detailed in
the system design, whereas non-functional requirements are detailed in the system
architecture.
Customer data: contains all data related to the customer's services and contract
information, in addition to all offers, packages, and services subscribed to by the
customer. It also contains information generated by the CRM system, such as all customer
GSMs, type of subscription, birthday, gender, location of residence, and more.
Network logs data: contains the internal session records for internet, calls, and SMS
for each transaction in the telecom operator, such as the time needed to open an internet
session and the call ending status. It can indicate whether a session dropped due to an
error in the internal network.
Call detail records (CDRs): contain all charging information about calls, SMS,
MMS, and internet transactions made by customers. This data source is generated as text
files. The data is large and very detailed; we spent a lot of time understanding it and
learning its sources and storage format. In addition, these records must be linked to the
detailed customer data stored in relational databases.
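As a hedged sketch of how these sources might be loaded and linked (the file names, delimiter, database URL, and customer_id key are hypothetical, not from the report):

```python
import glob
import pandas as pd
from sqlalchemy import create_engine

# CDRs are generated as large text files; read and concatenate them.
cdr_files = glob.glob("cdrs/*.txt")
cdrs = pd.concat(
    (pd.read_csv(f, sep="|") for f in cdr_files), ignore_index=True)

# Customer records live in a relational database (URL is illustrative).
engine = create_engine("postgresql://user:pass@dbhost/telecom")
customers = pd.read_sql("SELECT * FROM customers", engine)

# Link the charging records to the detailed customer data on the shared key.
linked = cdrs.merge(customers, on="customer_id", how="left")
```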
The quality of the system is maintained in such a way that it is very user
friendly to all users.
The software quality attributes are assumed as under:
1. Accurate and hence reliable.
In simpler terms, given a set of data points from repeated measurements of the same
quantity, the set can be said to be accurate if their average is close to the true value of the
quantity being measured, while the set can be said to be precise if the values are close to
each other. In the first, more common definition of "accuracy" above, the two concepts are
independent of each other, so a particular set of data can be said to be either accurate, or
precise, or both, or neither.
In everyday language, we use the word reliable to mean that something is dependable
and that it will behave predictably every time.
2.Secured.
Security is freedom from, or resilience against, potential harm caused by others.
The word entered the English language in the 16th century. It is derived from the Latin
securus, meaning freedom from anxiety: se (without) + cura (care, anxiety).
3.Fast speed.
In everyday use and in kinematics, the speed of an object is the magnitude of the
rate of change of its position; it is thus a scalar quantity. The average speed of an object
in an interval of time is the distance travelled by the object divided by the duration of
the interval; the instantaneous speed is the limit of the average speed as the duration of
the time interval approaches zero.
4. Compatibility.
A state in which two things are able to exist or occur together without problems or
conflict.
3.2.2 Non Functional requirements
In systems engineering and requirements engineering, a non-functional
requirement (NFR) is a requirement that specifies criteria that can be used to judge the
operation of a system, rather than specific behaviours. They are contrasted with functional
requirements that define specific behaviour or functions. The plan for
implementing functional requirements is detailed in the system design. The plan for
implementing non-functional requirements is detailed in the system architecture, because
they are usually architecturally significant requirements.
3.2.3 Software requirement
1.Python
Introduction
Python is an interpreted, high-level, general-purpose programming language. Created
by Guido van Rossum and first released in 1991, Python's design philosophy
emphasizes code readability with its notable use of significant whitespace. Its language
constructs and object-oriented approach aim to help programmers write clear, logical code
for small and large-scale projects.
Python is dynamically typed and garbage-collected. It supports multiple programming
paradigms, including structured (particularly, procedural), object-oriented, and functional
programming. Python is often described as a "batteries included" language due to its
comprehensive standard library.
Python was conceived in the late 1980s as a successor to the ABC language. Python 2.0,
released in 2000, introduced features like list comprehensions and a garbage collection
system capable of collecting reference cycles. Python 3.0, released in 2008, was a major
revision of the language that is not completely backward-compatible, and much Python 2
code does not run unmodified on Python 3.
The Python 2 language was officially discontinued in 2020 (first planned for 2015), and
"Python 2.7.18 is the last Python 2.7 release and therefore the last Python 2 release." No
more security patches or other improvements will be released for it. With Python 2's
end-of-life, only Python 3.5.x and later are supported.
Python interpreters are available for many operating systems. A global community of
programmers develops and maintains CPython, an open source reference implementation.
A non-profit organization, the Python Software Foundation, manages and directs resources
for Python and CPython development.
History of Python
Python was conceived in the late 1980s by Guido van Rossum at Centrum Wiskunde
& Informatica (CWI) in the Netherlands as a successor to the ABC language (itself inspired
by SETL), capable of exception handling and interfacing with the Amoeba operating
system. Its implementation began in December 1989. Van Rossum shouldered sole
responsibility for the project, as the lead developer, until 12 July 2018, when he announced
his "permanent vacation" from his responsibilities as Python's Benevolent Dictator For Life,
a title the Python community bestowed upon him to reflect his long-term commitment as the
project's chief decision-maker. He now shares his leadership as a member of a five-person
steering council. In January 2019, active Python core developers elected Brett Cannon, Nick
Coghlan, Barry Warsaw, Carol Willing and Van Rossum to a five-member "Steering
Council" to lead the project.
Python 2.0 was released on 16 October 2000 with many major new features, including
a cycle-detecting garbage collector and support for Unicode.
Python 3.0 was released on 3 December 2008. It was a major revision of the language that is
not completely backward-compatible. Many of its major features were backported to the
Python 2.6.x and 2.7.x version series. Releases of Python 3 include the 2to3 utility, which
automates (at least partially) the translation of Python 2 code to Python 3.
Python 2.7's end-of-life date was initially set at 2015 then postponed to 2020 out of concern
that a large body of existing code could not easily be forward-ported to Python 3.
2.Anaconda 3
Introduction
Anaconda is a free and open-source distribution of the Python and R programming
languages for scientific computing (data science, machine learning applications, large-scale
data processing, predictive analytics, etc.), that aims to simplify package management and
deployment. The distribution includes data-science packages suitable for Windows, Linux,
and macOS. It is developed and maintained by Anaconda, Inc., which was founded by Peter
Wang and Travis Oliphant in 2012. As an Anaconda, Inc. product, it is also known
as Anaconda Distribution or Anaconda Individual Edition, while other products from the
company are Anaconda Team Edition and Anaconda Enterprise Edition, which are both not
free.
Package versions in Anaconda are managed by the package management system conda. This
package manager was spun out as a separate open-source package as it ended up being useful
on its own and for other things than Python. There is also a small, bootstrap version of
Anaconda called Miniconda, which includes only conda, Python, the packages they depend
on, and a small number of other packages.
History of Anaconda 3
Anaconda distribution comes with 1,500 packages selected from PyPI as well as
the conda package and virtual environment manager. It also includes a GUI, Anaconda
Navigator, as a graphical alternative to the command line interface (CLI).
The big difference between conda and the pip package manager is in how package
dependencies are managed, which is a significant challenge for Python data science and the
reason conda exists.
When pip installs a package, it automatically installs any dependent Python packages
without checking if these conflict with previously installed packages. It will install a
package and any of its dependencies regardless of the state of the existing installation.
Because of this, a user with a working installation of, for example, Google Tensorflow can
find that it stops working after using pip to install a different package that requires a
different version of the dependent numpy library than the one used by Tensorflow. In some
cases, the package may appear to work but produce different results in detail.
In contrast, conda analyses the current environment, including everything currently
installed and any version limitations specified (e.g. the user may wish to have Tensorflow
version 2.0 or higher), works out how to install a compatible set of dependencies, and
shows a warning if this cannot be done.
Open source packages can be individually installed from the Anaconda repository, Anaconda
Cloud (anaconda.org), or the user's own private repository or mirror, using the conda
install command.
Anaconda, Inc. compiles and builds the packages available in the Anaconda repository itself,
and provides binaries for Windows 32/64 bit, Linux 64 bit and MacOS 64-bit. Anything
available on PyPI may be installed into a conda environment using pip, and conda will keep
track of what it has installed itself and what pip has installed.
Custom packages can be made using the conda build command, and can be shared with others by
uploading them to Anaconda Cloud, PyPI or other repositories.
The default installation of Anaconda2 includes Python 2.7 and Anaconda3 includes Python
3.7. However, it is possible to create new environments that include any version of Python
packaged with conda.
Conda allows users to easily install different versions of binary software packages and any
required libraries appropriate for their computing platform. Also, it allows users to switch
between package versions and download and install updates from a software repository.
Conda is written in the Python programming language, but can manage projects containing
code written in any language (e.g., R), including multi-language projects. Conda can
install Python, while similar Python-based cross-platform package managers (such
as wheel or pip) cannot.
A popular conda channel for bioinformatics software is Bioconda, which provides multiple
software distributions for computational biology. In fact, the conda package and environment
manager is included in all versions of Anaconda, Miniconda and Anaconda Repository.
Anaconda Navigator
Anaconda Navigator is a desktop graphical user interface included in the Anaconda
distribution that allows users to launch applications and manage conda packages,
environments and channels without using command-line commands. Navigator can search for
packages on Anaconda Cloud or in a local Anaconda Repository, install them in an
environment, run the packages and update them. It is available for Windows, macOS and
Linux.
The following applications are available by default in Navigator: JupyterLab, Jupyter
Notebook, Qt Console, Spyder, Glueviz, Orange, RStudio, and Visual Studio Code.
Conda is an open source, language-agnostic package and environment management system that
installs, runs, and updates packages and their dependencies. It was created for Python
programs, but it can package and distribute software for any language (e.g., R). R is a
programming language and free software environment for statistical computing and graphics
supported by the R Foundation for Statistical Computing. The R language is widely used
among statisticians and data miners for developing statistical software and data analysis.
Polls, data mining surveys, and studies of scholarly literature databases show substantial
increases in its popularity; as of February 2020, R ranks 13th in the TIOBE index, a
measure of the popularity of programming languages.
A GNU package, the official R software environment is written primarily in C, FORTRAN,
and R itself (thus, it is partially self-hosting) and is freely available under the GNU General
Public License. Pre-compiled executables are provided for various operating systems.
Although R has a command line interface, there are several third-party graphical user
interfaces, such as RStudio, an integrated development environment, and Jupyter, a notebook
interface.
Anaconda cloud:
Anaconda Cloud is a package management service by Anaconda where you can find, access,
store and share public and private notebooks, environments, and conda and PyPI packages.
Cloud hosts useful Python packages, notebooks and environments for a wide variety of
applications. You do not need to log in, or even have a Cloud account, to search for public
packages, and to download and install them.
Project Jupyter:
Project Jupyter is a nonprofit organization created to "develop open-source software, open
standards, and services for interactive computing across dozens of programming languages".
Spun off from IPython in 2014 by Fernando Pérez, Project Jupyter supports execution
environments in several dozen languages. Project Jupyter's name is a reference to the three
core programming languages supported by Jupyter, which are Julia, Python and R, and also a
homage to Galileo's notebooks recording the discovery of the moons of Jupiter. Project
Jupyter has developed and supported the interactive computing products Jupyter Notebook,
JupyterHub, and JupyterLab, the next-generation version of Jupyter Notebook.
In 2014, Fernando Pérez announced a spin-off project from IPython called Project Jupyter.
IPython continued to exist as a Python shell and a kernel for Jupyter, while the notebook
and other language-agnostic parts of IPython moved under the Jupyter name. Jupyter is
language-agnostic and supports execution environments (aka kernels) in several dozen
languages, among which are Julia, R, Haskell, Ruby, and of course Python (via the IPython
kernel).
In 2015, GitHub and the Jupyter Project announced native rendering of the Jupyter notebook
file format (.ipynb files) on the GitHub platform.
Jupyter Notebook
Jupyter Notebook (formerly IPython Notebook) is a web-based computational environment
for creating Jupyter notebook documents. The "notebook" term can colloquially refer to many
different entities, mainly the Jupyter web application, Jupyter Python web server, or
Jupyter document format, depending on context. A Jupyter Notebook document is a JSON
document, following a versioned schema, containing an ordered list of input/output cells
which can contain code, text, mathematics, plots and rich media, usually ending with the
".ipynb" extension.
A Jupyter Notebook can be converted to a number of open source output formats
(HTML, presentation slides, PDF, Markdown, Python) through "Download As" in the
web interface, via the nbconvert library, or with the "jupyter nbconvert" command line
interface in a shell. To simplify visualisation of Jupyter notebook documents on the web,
the library is provided as a service through NbViewer, which can take a URL to any publicly
available notebook document, convert it to HTML on the fly and display it to the user.
Jupyter Notebook interface
Jupyter Notebook can connect to many kernels to allow programming in many languages.
By default Jupyter Notebook ships with the IPython kernel. As of the 2.3 release (October
2014), there are 49 Jupyter-compatible kernels for many programming languages,
including Python, R, Julia and Haskell.
The Notebook interface was added to IPython in the 0.12 release (December 2011), renamed
to Jupyter notebook in 2015 (IPython 4.0 – Jupyter 1.0). Jupyter Notebook is similar to the
notebook interface of other programs such as Maple, Mathematica, and SageMath, a
computational interface style that originated with Mathematica in the 1980s.
According
to The Atlantic, Jupyter interest overtook the popularity of the Mathematica notebook
interface in early 2018.
Jupyter kernels
A Jupyter kernel is a program responsible for handling various types of requests (code
execution, code completions, inspection), and providing a reply. Kernels talk to the other
components of Jupyter using ZeroMQ over the network, and thus can be on the same or
remote machines. Unlike many other Notebook-like interfaces, in Jupyter, kernels are not
aware that they are attached to a specific document, and can be connected to many clients at
once. Usually kernels allow execution of only a single language, but there are a couple of
exceptions.
By default Jupyter ships with IPython as the default kernel, with a reference
implementation via the ipykernel wrapper. Kernels for many languages, of varying quality
and features, are available.
JupyterHub
JupyterHub is a multi-user server for Jupyter Notebooks. It is designed to support many users
by spawning, managing, and proxying many singular Jupyter Notebook servers. While
JupyterHub requires managing servers, third-party services like Jupyo provide an alternative
to JupyterHub by hosting and managing multi-user Jupyter notebooks in the cloud.
JupyterLab
JupyterLab is the next-generation user interface for Project Jupyter. It offers all the familiar
building blocks of the classic Jupyter Notebook (notebook, terminal, text editor, file browser,
rich outputs, etc.) in a flexible and powerful user interface. The first stable release was
announced on February 20, 2018.
Web application
It has also added user- as well as system-based web application enhancements to support
deployment across a variety of environments, and it manages sessions as well as
applications across the network.
A number of additional components may be used with Apache Tomcat. These components may be
built by users should they need them, or they can be downloaded from one of the mirrors.
Features:
Tomcat 7.x implements the Servlet 3.0 and JSP 2.2 specifications. It requires Java version
1.6, although previous versions ran on Java 1.1 through 1.5. Versions 5 through 6 saw
improvements in garbage collection, JSP parsing, performance and scalability. Native
wrappers, known as "Tomcat Native", are available for Microsoft Windows and Unix for
platform integration. Tomcat 8.x implements the Servlet 3.1 and JSP 2.3 specifications.
Apache Tomcat 8.5.x is intended to replace 8.0.x and includes new features pulled forward
from Tomcat 9.0.x. The minimum Java version and implemented specification versions
remain unchanged.
WINDOWS 10
Windows 10 is a series of personal computer operating systems produced by Microsoft as
part of its Windows NT family of operating systems. It is the successor to Windows 8.1, and
was released to manufacturing on July 15, 2015, and broadly released for retail sale on July
29, 2015. Windows 10 receives new builds on an ongoing basis, which are available at no
additional cost to users, as well as test builds of Windows 10 which are
available to Windows Insiders. Devices in enterprise environments can receive these updates
at a slower pace, or use long-term support milestones that only receive critical updates, such
as security patches, over their ten-year lifespan of extended support.
One of Windows 10's most notable features is support for universal apps, an expansion of
the Metro-style apps first introduced in Windows 8. Universal apps can be designed to run
across multiple Microsoft product families with nearly identical code—
including PCs, tablets, smartphones, embedded systems, Xbox One, Surface Hub and Mixed
Reality. The Windows user interface was revised to handle transitions between a mouse-
oriented interface and a touchscreen-optimized interface based on available input devices—
particularly on 2-in-1 PCs, both interfaces include an updated Start menu which incorporates
elements of Windows 7's traditional Start menu with the tiles of Windows 8. Windows 10
also introduced the Microsoft Edge web browser, a virtual desktop system, a window and
desktop management feature called Task View, support for fingerprint and face
recognition login, new security features for enterprise environments, and DirectX 12.
Windows 10 received mostly positive reviews upon its original release in July 2015. Critics
praised Microsoft's decision to provide a desktop-oriented interface in line with previous
versions of Windows, contrasting the tablet-oriented approach of 8, although Windows 10's
touch-oriented user interface mode was criticized for containing regressions upon the touch-
oriented interface of Windows 8. Critics also praised the improvements to Windows 10's
bundled software over Windows 8.1, Xbox Live integration, as well as the functionality and
capabilities of the Cortana personal assistant and the replacement of Internet Explorer with
Edge. However, media outlets have been critical of changes to operating system behaviours,
including mandatory update installation, privacy concerns over data collection performed by
the OS for Microsoft and its partners and the adware-like tactics used to promote the
operating system on its release.
Although Microsoft missed its goal of having Windows 10 installed on over a billion devices
within three years of its release, it still had an estimated usage share of 60% of all
Windows versions on traditional PCs, and thus 47% of traditional PCs were running
Windows 10 by September 2019. Across all platforms (PC, mobile, tablet and console), 35% of
devices run some version of Windows, whether Windows 10 or older.
Microsoft Excel
Microsoft Excel is a spreadsheet developed by Microsoft for Windows, macOS, Android
and iOS. It features calculation, graphing tools, pivot tables, and a macro programming
language called Visual Basic for Applications. It has been a very widely applied spreadsheet
for these platforms, especially since version 5 in 1993, and it has replaced Lotus 1-2-3 as the
industry standard for spreadsheets. Excel forms part of the Microsoft Office suite of
software.
Number of rows and columns: versions of Excel up to 7.0 had a limitation in the size of
their data sets of 16K (2^14 = 16,384) rows. Versions 8.0 through 11.0 could handle 64K
(2^16 = 65,536) rows and 256 columns (2^8, the last labeled 'IV'). Version 12.0 onwards,
including the current Version 16.x, can handle over 1M (2^20 = 1,048,576) rows and 16,384
(2^14, the last labeled 'XFD') columns.
Microsoft Excel up until 2007 version used a proprietary binary file format called Excel
Binary File Format (.XLS) as its primary format. Excel 2007 uses Office Open XML as its
primary file format, an XML-based format that followed after a previous XML-based format
called "XML Spreadsheet" ("XMLSS"), first introduced in Excel 2002.
Although supporting and encouraging the use of new XML-based formats as replacements,
Excel 2007 remained backwards-compatible with the traditional, binary formats. In addition,
most versions of Microsoft Excel can read CSV, DBF, SYLK, DIF, and other legacy
formats. Support for some older file formats was removed in Excel 2007. The file formats
were mainly from DOS-based programs.
Microsoft originally marketed a spreadsheet program called Multiplan in 1982. Multiplan
became very popular on CP/M systems, but on MS-DOS systems it lost popularity to Lotus
1-2-3. Microsoft released the first version of Excel for the Macintosh on September 30,
1985, and the first Windows version was 2.05 (to synchronize with the Macintosh version
2.2) in November 1987. Lotus was slow to bring 1-2-3 to Windows and by the early 1990s,
Excel had started to outsell 1-2-3 and helped Microsoft achieve its position as a leading PC
software developer. This accomplishment solidified Microsoft as a valid competitor and
showed its future of developing GUI software. Microsoft maintained its advantage with
regular new releases, every two years or so.
Excel 2.0 is the first version of Excel for the Intel platform. Versions prior to 2.0 were only
available on the Apple Macintosh.
Excel 2.0 (1987) The first Windows version was labeled "2" to correspond to the Mac
version. This included a run-time version of Windows.
BYTE in 1989 listed Excel for Windows as among the "Distinction" winners of the BYTE
Awards. The magazine stated that the port of the "extraordinary" Macintosh version
"shines", with a user interface as good as or better than the original.
Excel 3.0 (1990) Included toolbars, drawing capabilities, outlining, add-in support, 3D
charts, and many more new features.
Also, an easter egg in Excel 4.0 reveals a hidden animation of a dancing set of numbers 1
through 3, representing Lotus 1-2-3, which was then crushed by an Excel logo.
Excel 5.0 (1993) With version 5.0, Excel has included Visual Basic for Applications (VBA),
a programming language based on Visual Basic which adds the ability to automate tasks in
Excel and to provide user-defined functions (UDF) for use in worksheets. VBA is a powerful
addition to the application and includes a fully featured integrated development
environment (IDE). Macro recording can produce VBA code replicating user actions, thus
allowing simple automation of regular tasks. VBA allows the creation of forms and
in-worksheet controls to communicate with the user. The language supports use (but not
creation) of ActiveX (COM) DLLs; later versions add support for class modules allowing
the use of basic object-oriented programming techniques.
The automation functionality provided by VBA made Excel a target for macro viruses. This
caused serious problems until antivirus products began to detect these
viruses. Microsoft belatedly took steps to prevent the misuse by adding the ability to disable
macros completely, to enable macros when opening a workbook or to trust all macros signed
using a trusted certificate.
Versions 5.0 to 9.0 of Excel contain various Easter eggs, including a "Hall of Tortured
Souls", although since version 10 Microsoft has taken measures to eliminate such
undocumented features from their products.
5.0 was released in a 16-bit x86 version for Windows 3.1 and later in a 32-bit version for NT
3.51 (x86/Alpha/PowerPC)
Excel 2007 (version 12.0) was included in Office 2007. This release was a major upgrade
from the previous version. Similar to other updated Office products, Excel in 2007 used the
new Ribbon menu system.
This was different from what users were used to, and was met with mixed reactions. One
study reported fairly good acceptance by users except highly experienced users and users of
word processing applications with a classical WIMP interface, but was less convinced in
terms of efficiency and organization. However, an online survey reported that a majority of
respondents had a negative opinion of the change, with advanced users being "somewhat
more negative" than intermediate users, and users reporting a self-estimated reduction in
productivity.
Added functionality included the SmartArt set of editable business diagrams. Also added
was an improved management of named variables through the Name Manager, and much-
improved flexibility in formatting graphs, which allow (x, y) coordinate labeling and lines of
arbitrary weight. Several improvements to pivot tables were introduced.
Also like other office products, the Office Open XML file formats were introduced,
including .xlsm for a workbook with macros and .xlsx for a workbook without macros.
Specifically, many of the size limitations of previous versions were greatly increased. To
illustrate, the number of rows was now 1,048,576 (2^20) and the number of columns was
16,384 (2^14; the far-right column is XFD). This changes what is a valid A1 reference
versus a named range. This version made more extensive use of multiple cores for the
calculation of spreadsheets; however, VBA macros are not handled in parallel, and XLL
add-ins were only executed in parallel if they were thread-safe and this was indicated at
registration.
3.2.4 Hardware requirement
Server-side hardware
1. Hardware recommended by all software needed
2. Communication hardware to serve client requests
Client-side hardware
1. Hardware recommended by the respective client's operating system and web browser
2. Communication hardware to communicate with the server
Others
An Intel Core 2 Duo CPU, a RAM of 4 GB (minimum), a hard disk of any capacity, a 105-key
keyboard, an optical mouse and a colour display are required.
Operating System
As our system is platform-independent, there are no operating system constraints on using
this software. Windows and Linux operating systems are considered to be extremely stable
platforms. One of the advantages of using the Linux operating system is the speed at which
performance enhancements and security patches are made available, since it is an open
source product.
Web Server
The available web server platforms are IIS and Apache. Since Apache is open source,
software security patches tend to be released faster. One of the downsides to IIS is its
vulnerability to virus attacks, something that Apache rarely has problems with. Apache
offers PHP, a language considered to be the C of the web. The next part is the end-user
environment. As explained before, we assume that the end users are not very familiar with
computers and are not very interested in diverting their activities from the manual system
to the automated system.
Web Browser
Let's play word association, just like when a psychologist asks you what comes to mind
when you hear certain words: What do you think when you hear the words "Opera. Safari.
Chrome. Firefox."
If you think of the Broadway play version of "The Lion King," maybe it is time to see a
psychologist. However, if you said, "Internet browsers," you're spot on. That's because the
leading Internet Browsers are:
Google Chrome
Mozilla Firefox
Apple Safari
Microsoft Internet Explorer
Microsoft Edge
Opera
Maxthon
And that order pretty much lines up with how they're ranked in terms of market share and
popularity...today. Browsers come and go. Ten years ago, Netscape Navigator was a well-
known browser: Netscape is long gone today. Another, called Mosaic, is considered the first
modern browser—it was discontinued in 1997.
So, what exactly is a browser?
Definition:
A browser, short for web browser, is the software application (a program) that you're using
right now to search for, reach and explore websites. Whereas Excel is a program for
spreadsheets and Word® a program for writing documents, a browser is a program for
Internet exploring (which is where that name came from).
Browsers don't get talked about much. A lot of people simply click on the "icon" on their
computers that takes them to the Internet, and that's as far as it goes. And in a way,
that's enough. Most of us simply get in a car and turn the key; we don't know what kind of
engine we have or what features it has; it takes us where we want to go. That's why, when
it comes to computers:
There are some computer users that can't name more than one or two browsers
Many of them don't know they can switch to another browser for free
There are some who go to Google's webpage to "google" a topic and think that
Google is their browser.
Chapter-4
DESIGN
4.1 Introduction
The UML may be used to visualize, specify, construct and document the artifacts of a
software-intensive system.
The UML is only a language and so is just one part of a software development method. The
UML is process independent, although optimally it should be used in a process that is use
case driven, architecture-centric, iterative, and incremental.
What is UML:
The Unified Modeling Language (UML) is a robust notation that we can use to build OOAD
models. It is called so since it was the unification of ideas from the different
methodologies of the three amigos: Booch, Rumbaugh and Jacobson.
The UML is the standard language for visualizing, specifying, constructing, and
documenting the artifacts of a software-intensive system. It can be used with all processes,
throughout the development life cycle, and across different implementation technologies.
The UML combines the best from:
o Data Modeling concepts
o Business Modeling (work flow)
o Object Modeling
o Component Modeling
The UML may be used to:
o Display the boundary of a system and its major functions using use cases and
actors
o Illustrate use case realizations with interaction diagrams
o Represent a static structure of a system using class diagrams
o Model the behavior of objects with state transition diagrams
o Reveal the physical implementation architecture with component &
deployment diagrams
o Extend the functionality of the system with stereotypes
Evolution of UML
One of the methods was the Object Modeling Technique (OMT), devised by James
Rumbaugh and others at General Electric. It consists of a series of models - use case,
object, dynamic, and functional - that combine to give a full view of a system.
The Booch method was devised by Grady Booch and developed the practice of analyzing a
system as a series of views. It emphasizes analyzing the system from both a macro
development view and a micro development view, and it was accompanied by a very detailed
notation.
The Object-Oriented Software Engineering (OOSE) method was devised by Ivar Jacobson
and focused on the analysis of system behavior. It advocated that at each stage of the process
there should be a check to see that the requirements of the user were being met.
The UML is composed of three different parts:
1. Model elements 2. Diagrams 3. Views
Model Elements:
o The model elements represent basic object-oriented concepts such as classes, objects,
and relationships.
o Each model element has a corresponding graphical symbol to represent it in the
diagrams.
Diagrams:
o Diagrams portray different combinations of model elements.
o For example, the class diagram represents a group of classes and the relationships,
such as association and inheritance, between them.
o The UML provides nine types of diagram - use case, class, object, state chart,
sequence, collaboration, activity, component, and deployment
Views:
o Views provide the highest level of abstraction for analyzing the system.
o Each view is an aspect of the system that is abstracted to a number of related UML
diagrams.
o Taken together, the views of a system provide a picture of the system in its entirety.
o In the UML, the five main views of the system are: User, Structural, Behavioral,
Implementation, and Environment.
Fig-1 UML Views
In addition to model elements, diagrams, and views, the UML provides mechanisms for
adding comments, information, or semantics to diagrams. And it provides mechanisms to
adapt or extend itself to a particular method, software system, or organization.
Extensions of UML:
o Stereotypes can be used to extend the UML notational elements
o Stereotypes may be used to classify and extend associations, inheritance
relationships, classes, and components
Examples:
Class stereotypes : boundary, control, entity
Inheritance stereotypes : extend
Dependency stereotypes : uses
Component stereotypes : subsystem
Use Case Diagrams:
The use case diagram presents an outside view of the system
The use-case model consists of use-case diagrams.
o The use-case diagrams illustrate the actors, the use cases, and their relationships.
o Use cases also require a textual description (use case specification), as the visual
diagrams can't contain all of the information that is necessary.
o The customers, the end-users, the domain experts, and the developers all have an
input into the development of the use-case model.
o Creating a use-case model involves the following steps:
1. defining the system
2. identifying the actors and the use cases
3. describing the use cases
4. defining the relationships between use cases and actors.
5. defining the relationships between the use cases
Use Case:
Definition: a use case is a sequence of actions a system performs that yields an observable
result of value to a particular actor.
Use Case Naming:
Use Case should always be named in business terms, picking the words from the vocabulary
of the particular domain for which we are modeling the system. It should be meaningful to
the user, because use case analysis is always done from the user's perspective. Names will
usually be verbs or short verb phrases.
Use Case Specification shall document the following:
o Brief Description
o Precondition
o Main Flow
o Alternate Flow
o Exceptional flows
o Post Condition
o Special Requirements
Notation of use case
Actor:
Definition: someone or something outside the system that interacts with the
system
o An actor is external - it is not actually part of what we are building, but an interface
needed to support or use it.
o It represents anything that interacts with the system.
Notation for Actor
Relation: two important types of relation are used in a Use Case Diagram.
Include: an include relationship shows behavior that is common to
one or more use cases (mandatory).
An include relation results when we extract the common sub-flows and
make them a use case.
Extend: an extend relationship shows optional behavior (optional).
An extend relation usually results when we add a more specialized
feature to an already existing one; we say use case B extends its
functionality to use case A.
A system boundary rectangle separates the system from the external actors.
An extend relationship indicates that one use case is a variation of another. Extend
notation is a dotted line, labeled <<extend>>, with an arrow toward the base case. The
extension point, which determines when the extended case is appropriate, is written inside
the base case.
Class Diagrams:
Class diagram shows the existence of classes and their relationships in the structural view of
a system.
UML modeling elements in class diagram:
o Classes and their structure and behavior
o Relationships
Association
Aggregation
Composition
Dependency
Generalization / Specialization (inheritance relationships)
Multiplicity and navigation indicators
Role names
A class describes properties and behavior of a type of object.
o Classes are found by examining the objects in sequence and collaboration diagram
o A class is drawn as a rectangle with three compartments
o Classes should be named using the vocabulary of the domain. Naming standards
should be created, e.g., all classes are singular nouns starting with a capital letter.
o The behavior of a class is represented by its operations. Operations may be found by
examining interaction diagrams
o The structure of a class is represented by its attributes. Attributes may be found by
examining class definitions, the problem requirements, and by applying domain
knowledge
Notation:
Class information: visibility and scope
The class notation is a 3-piece rectangle with the class name, attributes, and operations.
Attributes and operations can be labeled according to access and scope.
Relationship:
Association:
o Association represents the physical or conceptual connection between two or more
objects
o An association is a bi-directional connection between classes
o An association is shown as a line connecting the related classes
Aggregation:
o An aggregation is a stronger form of relationship where the relationship is between a
whole and its parts
o It is entirely conceptual and does nothing more than distinguishing a ‘whole’ from
the ‘part’.
o It doesn’t link the lifetime of the whole and its parts
o An aggregation is shown as a line connecting the related classes with a diamond next
to the class representing the whole
Composition:
Composition is a form of aggregation with strong ownership and coincident
lifetime as the part of the whole.
Multiplicity
o Multiplicity defines how many objects participate in relationships
o Multiplicity is the number of instances of one class related to ONE instance of the
other class
The most common multiplicities are: 1 (exactly one), 0..1 (zero or one), 0..* or * (zero or
more), and 1..* (one or more). A code sketch of these class relationships follows this
section.
For each association and aggregation, there are two multiplicity decisions to make:
one for each end of the relationship
Although associations and aggregations are bi-directional by default, it is often
desirable to restrict navigation to one direction
If navigation is restricted, an arrowhead is added to indicate the direction of the
navigation
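As promised above, here is a hedged, minimal Python sketch contrasting association, aggregation, and composition; the class names are invented for illustration and are not from the report.

```python
# Association: a Customer is linked to a Plan; each exists independently.
class Plan:
    def __init__(self, name):
        self.name = name

class Customer:
    def __init__(self, name, plan):
        self.name = name
        self.plan = plan  # the association; Plan objects live on their own

# Aggregation: a whole/part link where the parts outlive the whole.
class Tower:
    def __init__(self, tower_id):
        self.tower_id = tower_id

class Network:
    def __init__(self, towers):
        # towers are created elsewhere and survive if the Network is deleted
        self.towers = towers

# Composition: strong ownership with coincident lifetimes; the whole
# creates its parts, and they are discarded together with it.
class LineItem:
    def __init__(self, description, amount):
        self.description = description
        self.amount = amount

class Invoice:
    def __init__(self, charges):
        # parts are built by the whole and not shared outside of it
        self.items = [LineItem(d, a) for d, a in charges.items()]

invoice = Invoice({"monthly plan": 399.0, "data pack": 99.0})
print([item.description for item in invoice.items])
```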
Dependency:
o A dependency relationship is a weaker form of relationship showing a relationship
between a client and a supplier where the client does not have semantic knowledge of
the supplier.
o A dependency is shown as a dashed line pointing from the client to the supplier.
Generalization / Specialization (inheritance relationships):
o It is a relationship between a general thing (called the super class or the
parent) and a more specific kind of that thing (called the subclass(es) or the
child).
o An association class is an association that also has class properties (or a class
has association properties)
o A constraint is a semantic relationship among model elements that specifies
conditions and propositions that must be maintained as true: otherwise, the
system described by the model is invalid.
o An interface is a specifier for the externally visible operations of a class
without specification of internal structure. An interface is formally equivalent
to an abstract class with no attributes and no methods, only abstract operations.
o A qualifier is an attribute or set of attributes whose values serve to partition
the set of instances associated with an instance across an association.
Interaction Diagrams:
o Interaction diagrams are used to model the dynamic behavior of the system
o Interaction diagrams help us to identify the classes and their methods
o Interaction diagrams describe how use cases are realized as interactions
among objects
o Show classes, objects, actors and messages between them to achieve the
functionality of a Use Case
There are two types of interaction diagrams:
1. Sequence Diagram
2. Collaboration Diagram
1. Sequence Diagram:
A sequence diagram simply depicts interaction between objects in sequential order, i.e. the
order in which these interactions take place. We can also use the terms event diagrams or
event scenarios to refer to a sequence diagram. Sequence diagrams describe how and in what
order the objects in a system function. These diagrams are widely used by business people
and software developers to document and understand requirements for new and existing
systems.
Sequence Diagram Notations:
1. Actors – An actor in a UML diagram represents a type of role where it interacts with
the system and its objects. It is important to note here that an actor is always outside the
scope of the system we aim to model using the UML diagram.
We use actors to depict various roles including human users and other external
subjects. We represent an actor in a UML diagram using a stick person notation. We
can have multiple actors in a sequence diagram.
For example – the user in a seat reservation system is shown as an actor; the user
exists outside the system and is not a part of the system.
2. Lifelines – A lifeline is a named element which depicts an individual participant in a
sequence diagram. So basically, each instance in a sequence diagram is represented by
a lifeline. Lifeline elements are located at the top of a sequence diagram. The standard
UML format for naming a lifeline is – Instance Name : Class Name
3. Messages – Communication between objects is depicted using messages. The messages
appear in a sequential order on the lifeline. We represent messages using arrows.
Lifelines and messages form the core of a sequence diagram.
Messages can be broadly classified into the following categories:
Synchronous messages – A synchronous message waits for a reply before the
interaction can move forward. The sender waits until the receiver has completed
the processing of the message. The caller continues only when it knows that the
receiver has processed the previous message i.e. it receives a reply message. A
large number of calls in object-oriented programming are synchronous. We use a
solid arrow head to represent a synchronous message.
Asynchronous Messages – An asynchronous message does not wait for a reply
from the receiver. The interaction moves forward irrespective of the receiver
processing the previous message or not. We use an open (line) arrowhead to represent
an asynchronous message.
Create message – We use a Create message to instantiate a new object in the
sequence diagram. There are situations when a particular message call requires
the creation of an object. It is represented with a dotted arrow labelled with the
word 'create' to specify that it is the create message symbol.
For example – The creation of a new order on an e-commerce website would
require a new object of the Order class to be created.
Delete Message – We use a delete message to delete an object. When an object's
memory is deallocated or the object is destroyed within the system, we use the
delete message symbol. It destroys the occurrence of the object in the system. It is
represented by an arrow terminating with an x.
For example – In the scenario below when the order is received by the user, the
object of order class can be destroyed.
Self-Message – Certain scenarios might arise where an object needs to send a
message to itself. Such messages are called self messages and are
represented with a U-shaped arrow
Reply Message – Reply messages are used to show the message being sent from
the receiver to the sender. We represent a return/reply message using an open
arrowhead with a dotted line. The interaction moves forward only when a reply
message is sent by the receiver.
2. Collaboration diagram:
It shows the interaction of the objects and also the group of all messages sent or received
by an object. This allows us to see the complete set of services that an object must
provide.
The following collaboration diagram realizes a scenario of reserving a copy of a book in a
library
Difference between the Sequence Diagram and Collaboration Diagram
o Sequence diagrams emphasize the temporal aspect of a scenario - they
focus on time.
o Collaboration diagrams emphasize the spatial aspect of a scenario - they focus
on how objects are linked.
Activity Diagram:
An activity diagram is essentially a flowchart. It focuses on the flow of activities involved
in a single process and shows how those activities depend on one another.
1. Initial State – The starting state before an activity takes place is depicted using the
initial state.
2. Action or Activity State – An activity represents execution of an action on objects or
by objects. We represent an activity using a rectangle with rounded corners. Basically
any action or event that takes place is represented using an activity.
3. Action Flow or Control flows – Action flows or Control flows are also referred to as
paths and edges. They are used to show the transition from one activity state to another.
4. Decision node and Branching – When we need to make a decision before deciding the
flow of control, we use the decision node.
5. Guards – A guard is a condition written next to a decision node on an outgoing arrow,
usually within square brackets.
6. Fork – Fork nodes are used to support concurrent activities.
7. Join – Join nodes are used to support concurrent activities converging into one. For join
notations we have two or more incoming edges and one outgoing edge.
8. Merge or Merge Event – Scenarios arise when activities which are not being executed
concurrently have to be merged. We use the merge notation for such scenarios. We can
merge two or more activities into one if the control proceeds onto the next activity
irrespective of the path chosen.
9. Swim lanes – We use swim lanes to group related activities into one column or one
row. Swim lanes can be vertical or horizontal, and they are used to add modularity to
the activity diagram. It is not mandatory to use swim lanes. They usually give more
clarity to the activity diagram.
It’s similar to creating a function in a program. It’s not mandatory to do so, but it is a
recommended practice.
Component Diagram:
Describes the organization of, and dependencies between, the software implementation
components. Components are distributable physical units, e.g. source code or object code.
Deployment Diagram:
Describes the configuration of processing resource elements and the mapping of software
implementation components onto them. Contains components (e.g. object code, source
code) and nodes (e.g. printer, database, client machine).
4.2 UML Diagram
Chapter-5
IMPLEMENTATION METHODS & RESULTS
5.1 Introduction:
We implemented our project using Anaconda 3 and Jupyter Notebook. Anaconda is a free
and open-source distribution of the Python and R programming languages for scientific
computing (data science, machine learning applications, large-scale data processing,
predictive analytics, etc.) that aims to simplify package management and deployment.
Package versions are managed by the package management system conda. The Anaconda
distribution includes data-science packages suitable for Windows, Linux, and macOS.
The Anaconda distribution comes with 1,500 packages selected from PyPI, as well as the conda
package and virtual environment manager. It also includes a GUI, Anaconda Navigator, as
a graphical alternative to the command line interface (CLI).
Conda analyzes your current environment: everything you have installed and any version
limitations you specify (e.g. you only want tensorflow >= 2.0). It then figures out how to
install compatible dependencies, or tells you that what you want cannot be done. pip, by
contrast, will simply install the package you specify and any dependencies, even if that
breaks other packages.
Conda allows users to easily install different versions of binary software packages and any
required libraries appropriate for their computing platform. Also, it allows users to switch
between package versions and download and install updates from a software repository.
Conda is written in the Python programming language, but can manage projects containing
code written in any language (e.g. R), including multi-language projects. Conda can
install Python itself, while similar Python-based cross-platform package managers (such
as wheel or pip) cannot.
A popular conda channel for bioinformatics software is Bioconda, which provides multiple
software distributions for computational biology. In fact, the conda package and environment
manager is included in all versions of Anaconda, Miniconda and Anaconda Repository.
The big difference between conda and the pip package manager is in how package
dependencies are managed, which is a significant challenge for Python data science and the
reason conda exists.
When pip installs a package, it automatically installs any dependent Python packages
without checking if these conflict with previously installed packages. It will install a package
and any of its dependencies regardless of the state of the existing installation. Because of
this, a user with a working installation of, for example, Google TensorFlow can find that it
stops working after using pip to install a different package that requires a different version
of the dependent numpy library than the one used by TensorFlow. In some cases, the
package may appear to work but produce different results in detail.
5.2 Explanation of Key functions
1.Importing and Merging Data
Python code in one module gains access to the code in another module by the process
of importing it. The import statement is the most common way of invoking the import
machinery, but it is not the only way. Functions such as importlib.import_module() and
the built-in __import__() can also be used to invoke the import machinery.
The import statement combines two operations; it searches for the named module, then it
binds the results of that search to a name in the local scope. The search operation of
the import statement is defined as a call to the __import__() function, with the appropriate
arguments. The return value of __import__() is used to perform the name binding operation
of the import statement. See the import statement for the exact details of that name binding
operation.
Contrary to when you merge new cases, merging in new variables requires the IDs
for each case in the two files to be the same, but the variable names should be different. In
this scenario, which is sometimes referred to as augmenting your data (or in SQL, “joins”) or
merging data by columns (i.e. you’re adding new columns of data to each row), you’re
adding in new variables with information for each existing case in your data file. As with
merging new cases where not all variables are present, the same thing applies if you merge in
new variables where some cases are missing – these should simply be given blank values.
Fig-2 Merging data sets
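The following is a minimal pandas sketch of merging in new variables by columns. The two
small frames and their values are hypothetical; only the shared 'customerID' key mirrors the
project datasets used in section 5.3.

import pandas as pd

usage = pd.DataFrame({'customerID': ['C1', 'C2', 'C3'],
                      'MonthlyCharges': [29.85, 56.95, 53.85]})
profile = pd.DataFrame({'customerID': ['C1', 'C2', 'C3'],
                        'gender': ['Female', 'Male', 'Female']})

# An inner join keeps only the customers present in both frames and adds
# the new columns to each matching row
merged = pd.merge(usage, profile, how='inner', on='customerID')
print(merged)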
2.Data volume
Since we don’t know the features that could be useful to predict the churn, we had to work
on all the data that reflect the customer behavior in general. We used data sets related to
calls, SMS, MMS, and the internet, with all related information such as complaints, network
data, IMEI, charging, and others. The data contained transactions for all customers during
the nine months before the prediction baseline. The size of this data was more than 70
terabytes, and we could not perform the needed feature engineering phase using traditional
databases.
3.Data variety
The data used in this research is collected from multiple systems and databases. Each source
generates the data in a different type of file: structured, semi-structured (XML, JSON), or
unstructured (CSV, text). Dealing with these kinds of data types is very hard without a big
data platform, since with such a platform we can work on all the previous data types without
making any modification or transformation. By using the big data platform, we no longer
have any problem with the size of these data or the format in which the data are represented.
4.Unbalanced data set
The generated data set was unbalanced. Class imbalance is a special case of the
classification problem in which the distribution of one class is far from homogeneous with
the other classes. The
dominant class is called the basic class, and the other is called the secondary class. The data
set is unbalanced if one of its categories is 10% or less compared to the other one [18].
Although machine learning algorithms are usually designed to improve accuracy by reducing
error, not all of them take into account the class balance, and that may give bad results [18].
In general, classes are considered to be balanced in order to be given the same importance in
training.
We found that the SyriaTel data set was unbalanced, since the secondary class that
represents churned customers is about 5% of the whole data set. A minimal balance check
is sketched below.
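The class balance of a data set can be checked with value_counts; the toy frame below is
hypothetical and simply mirrors the 5% churn ratio mentioned above.

import pandas as pd

telecom = pd.DataFrame({'Churn': [0]*95 + [1]*5})  # toy data: 5% churners
counts = telecom['Churn'].value_counts()
minority_pct = 100 * counts.min() / counts.sum()
print(counts)
print(f"Minority (churn) class: {minority_pct:.1f}% of the data set")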
5.Extensive features
The collected data was full of columns, since there is a column for each service, product, and
offer related to calls, SMS, MMS, and internet, in addition to columns related to personnel
and demographic information. If we use all these data sources, the number of columns for
each customer before the data is processed will exceed ten thousand.
6.Missing values
There is a representation of each service and product for each customer. Missing values may
occur because not all customers have the same subscription. Some of them may have a
number of services and others may have something different. In addition, there are some
columns related to system configurations and these columns have only null value for all
customers.
Fig-3 Imputing Missing Values
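Below is a minimal sketch of the two treatments described above, on a hypothetical frame:
service columns with missing entries are filled with an explicit label, and configuration
columns that are null for every customer are dropped.

import numpy as np
import pandas as pd

df = pd.DataFrame({'customerID': ['C1', 'C2'],
                   'StreamingTV': ['Yes', np.nan],   # not every customer subscribes
                   'sys_config': [np.nan, np.nan]})  # system column, always null

df['StreamingTV'] = df['StreamingTV'].fillna('No service')
df = df.dropna(axis=1, how='all')  # drop columns that are null for all customers
print(df)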
7.Checking the Churn Rate
The most basic way you can calculate churn for a given month is to check how many
customers you have at the start of the month and how many of those customers leave by the
end of the month. Once you have both numbers, divide the number of customers that left by
the number of customers at the start of the month.
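In code, this formula is a one-liner; the month-start count and the number of leavers below
are hypothetical figures.

customers_at_start = 1000   # customers at the start of the month
customers_lost = 50         # customers who left by the end of the month
churn_rate = customers_lost / customers_at_start
print(f"Monthly churn rate: {churn_rate:.1%}")  # 5.0%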
8.Splitting Data into Training and Test Sets
As I said before, the data we use is usually split into training data and test data. The training
set contains a known output and the model learns on this data in order to be generalized to
other data later on. We have the test data set (or subset) in order to test our model’s prediction
on this subset.
Fig-4 Dividing data into training and testing sets
9.Correlation Matrix
A correlation matrix is a table showing correlation coefficients between variables.
Each cell in the table shows the correlation between two variables. A correlation matrix is
used to summarize data, as an input into a more advanced analysis, and as a diagnostic for
advanced analyses.
Typically, a correlation matrix is “square”, with the same variables shown in the rows and
columns. An example is shown in Fig-5: correlations between the stated importance of
various things to people. The line of 1.00s going from the top left to the bottom right is the
main diagonal, which shows that each variable always perfectly correlates with itself. The
matrix is symmetrical: the correlations shown above the main diagonal are a mirror image of
those below it.
Fig-5 Correlation Matrix
10.Feature Selection Using RFE
Recursive Feature Elimination (RFE) is a feature selection technique. It fits a model
(here, logistic regression) on the full feature set, ranks the features by importance (for
example, by the magnitude of their coefficients), removes the weakest feature, and repeats
this process recursively until the desired number of features remains. This reduces the
feature dimensions and keeps the attributes most predictive of churn.
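A minimal scikit-learn sketch of RFE follows, using a synthetic data set in place of the
prepared feature matrix; the choice of 13 features matches the code in section 5.3.

from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=20, random_state=0)

estimator = LogisticRegression(max_iter=1000)
rfe = RFE(estimator, n_features_to_select=13)  # keep the 13 strongest features
rfe.fit(X, y)
print(rfe.support_)   # boolean mask over the features: True = selected
print(rfe.ranking_)   # rank 1 = selected; higher ranks were eliminated earlier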
11.Making Predictions
Once the model has been trained on the training set, it is applied to the held-out test
set to make predictions. For churn prediction, the model outputs a probability of churning
for each customer (predict_proba in scikit-learn); customers whose probability exceeds a
chosen cut-off (0.5 in the code that follows) are predicted as churners. These predictions
are then compared with the actual churn labels to evaluate the model, and the predicted
probabilities can be used to rank customers so that retention efforts focus on the most
likely defectors.
12.Model Evaluation
Model evaluation is an integral part of the model development process. It helps to
find the model that best represents our data and shows how well the chosen model will work
in the future. To avoid overfitting, evaluation methods such as hold-out and cross-validation
use a test set (not seen by the model) to assess model performance.
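The sketch below evaluates predictions on a held-out test set using the confusion matrix and
accuracy; the labels are toy values standing in for real test results.

from sklearn.metrics import accuracy_score, confusion_matrix

y_test = [0, 0, 1, 1, 0, 1, 0, 0]  # actual churn labels
y_pred = [0, 0, 1, 0, 0, 1, 1, 0]  # predicted churn labels

print(confusion_matrix(y_test, y_pred))  # rows: actual, columns: predicted
print(accuracy_score(y_test, y_pred))    # fraction of correct predictions (0.75 here)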
13.ROC Curve
A receiver operating characteristic curve, or ROC curve, is a graphical plot that
illustrates the diagnostic ability of a binary classifier system as its discrimination threshold is
varied.
The ROC curve is created by plotting the true positive rate (TPR) against the false positive
rate (FPR) at various threshold settings. The true-positive rate is also known
as sensitivity, recall or probability of detection in machine learning. The false-positive rate is
also known as probability of false alarm and can be calculated as (1 − specificity). It can also
be thought of as a plot of the power as a function of the Type I Error of the decision rule
(when the performance is calculated from just a sample of the population, it can be thought
of as estimators of these quantities). The ROC curve is thus the sensitivity or recall as a
function of fall-out. In general, if the probability distributions for both detection and false
alarm are known, the ROC curve can be generated by plotting the cumulative distribution
function (the area under the probability distribution from −∞ to the discrimination
threshold) of the detection probability on the y-axis versus the cumulative distribution
function of the false-alarm probability on the x-axis.
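The sketch below plots a ROC curve from predicted churn probabilities; y_test and y_score
are toy stand-ins for the model output.

import matplotlib.pyplot as plt
from sklearn.metrics import roc_auc_score, roc_curve

y_test = [0, 0, 1, 1, 0, 1, 0, 1]                    # actual labels
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.5, 0.6]  # predicted P(churn)

fpr, tpr, thresholds = roc_curve(y_test, y_score)
auc = roc_auc_score(y_test, y_score)

plt.plot(fpr, tpr, label=f"AUC = {auc:.2f}")
plt.plot([0, 1], [0, 1], linestyle='--')  # chance (random classifier) line
plt.xlabel("False positive rate")
plt.ylabel("True positive rate (recall)")
plt.legend()
plt.show()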
Histogram:
A histogram is a display of statistical information that uses rectangles to show the frequency
of data items in successive numerical intervals of equal size. In the most common form of
histogram, the independent variable is plotted along the horizontal axis and the dependent
variable is plotted along the vertical axis. The data appears as colored or shaded rectangles of
variable area.
The illustration, below, is a histogram showing the results of a final exam given to a
hypothetical class of students. Each score range is denoted by a bar of a certain color. If this
histogram were compared with those of classes from other years that received the same test
from the same professor, conclusions might be drawn about intelligence changes among
students over the years. Conclusions might also be drawn concerning the improvement or
decline of the professor's teaching ability with the passage of time. If this histogram were
compared with those of other classes in the same semester who had received the same final
exam but who had taken the course from different professors, one might draw conclusions
about the relative competence of the professors.
Fig-6 Histogram
Some histograms are presented with the independent variable along the vertical axis and the
dependent variable along the horizontal axis. That format is less common than the one shown
here.
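As a small sketch, a histogram like Fig-6 can be produced with matplotlib; the exam scores
below are hypothetical.

import matplotlib.pyplot as plt

scores = [52, 55, 61, 64, 68, 70, 71, 73, 75, 78, 80, 82, 85, 88, 93]
plt.hist(scores, bins=5, edgecolor='black')  # five equal-width score ranges
plt.xlabel("Score range")
plt.ylabel("Number of students")
plt.show()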
5.3 Screenshots or output screens
1.Code
# In[1]:
import pandas as pd
import numpy as np
# In[2]:
churn_data = pd.read_csv("OneDrive/Desktop/project datasets/churn_data.csv")
customer_data = pd.read_csv("OneDrive/Desktop/project datasets/customer_data.csv")
internet_data = pd.read_csv("OneDrive/Desktop/project datasets/internet_data.csv")
# In[3]:
df_1 = pd.merge(churn_data, customer_data, how='inner', on='customerID')
# In[4]:
telecom = pd.merge(df_1, internet_data, how='inner', on='customerID')
# In[5]:
telecom.head()
# In[6]:
telecom.describe()
# In[7]:
telecom.info()
# In[8]:
telecom['PhoneService'] = telecom['PhoneService'].map({'Yes': 1, 'No': 0})
telecom['PaperlessBilling'] = telecom['PaperlessBilling'].map({'Yes': 1, 'No': 0})
telecom['Churn'] = telecom['Churn'].map({'Yes': 1, 'No': 0})
telecom['Partner'] = telecom['Partner'].map({'Yes': 1, 'No': 0})
telecom['Dependents'] = telecom['Dependents'].map({'Yes': 1, 'No': 0})
# In[9]:
# Creating a dummy variable for the variable 'Contract' and dropping the first one.
cont = pd.get_dummies(telecom['Contract'],prefix='Contract',drop_first=True)
#Adding the results to the master dataframe
telecom = pd.concat([telecom,cont],axis=1)
# Creating a dummy variable for the variable 'PaymentMethod' and dropping the first one.
pm = pd.get_dummies(telecom['PaymentMethod'],prefix='PaymentMethod',drop_first=True)
#Adding the results to the master dataframe
telecom = pd.concat([telecom,pm],axis=1)
# Creating a dummy variable for the variable 'gender' and dropping the first one.
gen = pd.get_dummies(telecom['gender'],prefix='gender',drop_first=True)
#Adding the results to the master dataframe
telecom = pd.concat([telecom,gen],axis=1)
# Creating a dummy variable for the variable 'MultipleLines' and dropping the first one.
ml = pd.get_dummies(telecom['MultipleLines'],prefix='MultipleLines')
# dropping MultipleLines_No phone service column
ml1 = ml.drop(['MultipleLines_No phone service'], axis=1)
#Adding the results to the master dataframe
telecom = pd.concat([telecom,ml1],axis=1)
# Creating a dummy variable for the variable 'InternetService' and dropping the first one.
iser = pd.get_dummies(telecom['InternetService'],prefix='InternetService',drop_first=True)
#Adding the results to the master dataframe
telecom = pd.concat([telecom,iser],axis=1)
# Creating a dummy variable for the variable 'OnlineSecurity'.
os = pd.get_dummies(telecom['OnlineSecurity'],prefix='OnlineSecurity')
os1 = os.drop(['OnlineSecurity_No internet service'], axis=1)
#Adding the results to the master dataframe
telecom = pd.concat([telecom,os1],axis=1)
# Creating a dummy variable for the variable 'OnlineBackup'.
ob = pd.get_dummies(telecom['OnlineBackup'],prefix='OnlineBackup')
ob1 = ob.drop(['OnlineBackup_No internet service'], axis=1)
#Adding the results to the master dataframe
telecom = pd.concat([telecom,ob1],axis=1)
# Creating a dummy variable for the variable 'DeviceProtection'.
dp = pd.get_dummies(telecom['DeviceProtection'],prefix='DeviceProtection')
dp1 = dp.drop(['DeviceProtection_No internet service'], axis=1)
#Adding the results to the master dataframe
telecom = pd.concat([telecom,dp1],axis=1)
# Creating a dummy variable for the variable 'TechSupport'.
ts = pd.get_dummies(telecom['TechSupport'],prefix='TechSupport')
ts1 = ts.drop(['TechSupport_No internet service'], axis=1)
#Adding the results to the master dataframe
telecom = pd.concat([telecom,ts1],axis=1)
# Creating a dummy variable for the variable 'StreamingTV'.
st = pd.get_dummies(telecom['StreamingTV'],prefix='StreamingTV')
st1 = st.drop(['StreamingTV_No internet service'], axis=1)
#Adding the results to the master dataframe
telecom = pd.concat([telecom,st1],axis=1)
# Creating a dummy variable for the variable 'StreamingMovies'.
sm = pd.get_dummies(telecom['StreamingMovies'],prefix='StreamingMovies')
sm1 = sm.drop(['StreamingMovies_No internet service'], axis=1)
#Adding the results to the master dataframe
telecom = pd.concat([telecom,sm1],axis=1)
churn = (sum(telecom['Churn'])/len(telecom['Churn'].index))*100
# In[26]:
telecom
# In[27]:
churn
# In[28]:
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.naive_bayes import GaussianNB
import numpy as np
from sklearn.metrics import mean_squared_error
# In[29]:
features = telecom.drop(['Churn','customerID'],axis=1).values
target = telecom['Churn'].values
# In[30]:
# toy slices for a quick sanity check of the workflow before the proper split
features_train, target_train = features[20:30], target[20:30]
features_test, target_test = features[40:50], target[40:50]
print(features_test)
# In[31]:
model = GaussianNB()
model.fit(features_train,target_train)
print(model.predict(features_test))
# In[32]:
from sklearn.model_selection import train_test_split
# In[33]:
# Putting feature variable to X
X = telecom.drop(['Churn','customerID'],axis=1)
# Putting response variable to y
y = telecom['Churn']
# In[34]:
y.head()
# In[35]:
# Splitting the data into train and test
X_train, X_test, y_train, y_test = train_test_split(X,y,
train_size=0.7,test_size=0.3,random_state=100)
# In[36]:
import statsmodels.api as sm
# In[37]:
# Logistic regression model
logm1 = sm.GLM(y_train,(sm.add_constant(X_train)), family = sm.families.Binomial())
logm1.fit().summary()
# In[38]:
# Importing matplotlib and seaborn
import matplotlib.pyplot as plt
import seaborn as sns
get_ipython().run_line_magic('matplotlib', 'inline')
# In[39]:
# Let's see the correlation matrix
plt.figure(figsize = (20,10)) # Size of the figure
sns.heatmap(telecom.corr(),annot = True)
# In[40]:
X_test2 = X_test.drop(['MultipleLines_No','OnlineSecurity_No','OnlineBackup_No',
'DeviceProtection_No','TechSupport_No','StreamingTV_No','StreamingMovies_No'], axis=1)
X_train2 = X_train.drop(['MultipleLines_No','OnlineSecurity_No','OnlineBackup_No',
'DeviceProtection_No','TechSupport_No','StreamingTV_No','StreamingMovies_No'], axis=1)
# In[41]:
plt.figure(figsize = (20,10))
sns.heatmap(X_train2.corr(),annot = True)
# In[42]:
logm2 = sm.GLM(y_train,(sm.add_constant(X_train2)), family = sm.families.Binomial())
logm2.fit().summary()
# In[43]:
from sklearn.linear_model import LogisticRegression
logreg = LogisticRegression()
from sklearn.feature_selection import RFE
rfe = RFE(logreg, n_features_to_select=13) # running RFE with 13 variables as output
rfe = rfe.fit(X,y)
print(rfe.support_) # Printing the boolean results
print(rfe.ranking_) # Printing the ranking
# In[44]:
col = ['PhoneService', 'PaperlessBilling', 'Contract_One year', 'Contract_Two year',
'PaymentMethod_Electronic check','MultipleLines_No','InternetService_Fiber optic',
'InternetService_No',
'OnlineSecurity_Yes','TechSupport_Yes','StreamingMovies_No','tenure','TotalCharges']
# In[45]:
# Let's run the model using the selected variables
from sklearn.linear_model import LogisticRegression
from sklearn import metrics
logsk = LogisticRegression()
logsk.fit(X_train[col], y_train)
# In[46]:
#Comparing the model with StatsModels
logm4 = sm.GLM(y_train, (sm.add_constant(X_train[col])), family=sm.families.Binomial())
logm4.fit().summary()
# In[47]:
# UDF for calculating vif value
def vif_cal(input_data, dependent_col):
    # Computes a variance inflation factor (VIF) for each independent variable
    vif_df = pd.DataFrame(columns=['Var', 'Vif'])
    x_vars = input_data.drop([dependent_col], axis=1)
    xvar_names = x_vars.columns
    for i in range(0, xvar_names.shape[0]):
        y = x_vars[xvar_names[i]]
        x = x_vars[xvar_names.drop(xvar_names[i])]
        rsq = sm.OLS(y, x).fit().rsquared
        vif = round(1/(1-rsq), 2)
        vif_df.loc[i] = [xvar_names[i], vif]
    return vif_df.sort_values(by='Vif', axis=0, ascending=False, inplace=False)
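# Example usage (hypothetical subset: the selected columns plus the target):
# vif_cal(input_data=telecom[col + ['Churn']], dependent_col='Churn')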
# In[48]:
telecom.columns
# Selected features, re-stated
col = ['PhoneService', 'PaperlessBilling', 'Contract_One year', 'Contract_Two year',
'PaymentMethod_Electronic check','MultipleLines_No','InternetService_Fiber optic',
'InternetService_No',
'OnlineSecurity_Yes','TechSupport_Yes','StreamingMovies_No','tenure','TotalCharges']
# Let's run the model using the selected variables
from sklearn.linear_model import LogisticRegression
from sklearn import metrics
logsk = LogisticRegression()
logsk.fit(X_train[col], y_train)
# In[54]:
y_pred = logsk.predict_proba(X_test[col])
# In[55]:
y_pred_df = pd.DataFrame(y_pred)
# In[56]:
y_pred_1 = y_pred_df.iloc[:,[1]]
# In[57]:
y_pred_1.head()
# In[58]:
y_test_df = pd.DataFrame(y_test)
# In[59]:
y_test_df['CustID'] = y_test_df.index  # give each row a CustID for the rearrangement below
# In[60]:
# Removing index for both dataframes to append them side by side
y_pred_1.reset_index(drop=True, inplace=True)
y_test_df.reset_index(drop=True, inplace=True)
# In[61]:
y_pred_final = pd.concat([y_test_df,y_pred_1],axis=1)
# In[62]:
y_pred_final= y_pred_final.rename(columns={ 1 : 'Churn_Prob'})
# In[63]:
# Rearranging the columns
y_pred_final = y_pred_final.reindex(['CustID','Churn','Churn_Prob'], axis=1)
# In[64]:
y_pred_final.head()
# In[65]:
# Creating new column 'predicted' with 1 if Churn_Prob>0.5 else 0
y_pred_final['predicted'] = y_pred_final.Churn_Prob.map( lambda x: 1 if x > 0.5 else 0)
# In[66]:
y_pred_final.head()
# In[67]:
from sklearn import metrics
# In[68]:
Chapter-6
TESTING & VALIDATION
6.1 Introduction:
Functional vs. Non-functional Testing
The goal of utilizing numerous testing methodologies in your development process is to
make sure your software can successfully operate in multiple environments and across
different platforms. These can typically be broken down between functional and non-
functional testing. Functional testing involves testing the application against the business
requirements. It incorporates all test types designed to guarantee each part of a piece of
software behaves as expected by using use cases provided by the design team or business
analyst. These testing methods are usually conducted in order and include:
Unit testing
Integration testing
System testing
Acceptance testing
Non-functional testing methods incorporate all test types focused on the operational aspects
of a piece of software. These include:
Performance testing
Security testing
Usability testing
Compatibility testing
The key to releasing high quality software that can be easily adopted by your end users is to
build a robust testing framework that implements both functional and non-functional
software testing methodologies.
Unit Testing
Unit testing is the first level of testing and is often performed by the developers themselves.
It is the process of ensuring individual components of a piece of software at the code level
are functional and work as they were designed to. Developers in a test-driven environment
will typically write and run the tests prior to the software or feature being passed over to the
test team. Unit testing can be conducted manually, but automating the process will speed up
delivery cycles and expand test coverage. Unit testing will also make debugging easier
because finding issues earlier means they take less time to fix than if they were discovered
later in the testing process. TestLeft is a tool that allows advanced testers and developers to
shift left with the fastest test automation tool embedded in any IDE.
Fig-7 Unit Testing Life cycle
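Below is a hedged sketch of a unit test written with Python's built-in unittest module;
churn_rate() is a hypothetical helper implementing the churn formula from Chapter 5, not a
function from the project code.

import unittest

def churn_rate(customers_at_start, customers_lost):
    # churn = customers lost / customers at the start of the month
    return customers_lost / customers_at_start

class ChurnRateTest(unittest.TestCase):
    def test_five_percent(self):
        self.assertAlmostEqual(churn_rate(1000, 50), 0.05)

    def test_no_churn(self):
        self.assertEqual(churn_rate(1000, 0), 0.0)

if __name__ == '__main__':
    unittest.main()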
Unit Testing Techniques:
White Box Testing - used to test the behaviour of each function
Black Box Testing - used to test the user interface, inputs, and outputs
Gray Box Testing - a combination of the two, used to execute tests based on risk and
assessment methods
White box testing:
The box testing approach to software testing consists of black box testing and white box
testing. We are discussing here white box testing, which is also known as glass box testing,
structural testing, clear box testing, open box testing, and transparent box testing. It
tests the internal coding and infrastructure of a piece of software, focusing on checking
predefined inputs against expected and desired outputs. It is based on the inner workings of
an application and revolves around internal structure testing. In this type of testing,
programming skills are required to design test cases. The primary goal of white box testing
is to focus on the flow of inputs and outputs through the software and on strengthening the
security of the software.
The term 'white box' is used because of the internal perspective of the system. The clear
box, white box, or transparent box names denote the ability to see through the software's
outer shell into its inner workings.
Test cases for white box testing are derived from the design phase of the software
development lifecycle. Data flow testing, control flow testing, path testing, branch testing,
and statement and decision coverage are all techniques used by white box testing as
guidelines to create error-free software.
Fig-8 White Box Testing
White box testing follows a sequence of working steps that keep testing manageable and
make it easy to understand what the next task is. There are some basic steps to perform
white box testing.
Generic steps of white box testing:
o Design all test scenarios and test cases, and prioritize them according to priority
number.
o This step involves the study of code at runtime to examine the resource utilization,
areas of the code that are not accessed, the time taken by various methods and
operations, and so on.
o In this step, testing of internal subroutines takes place: it checks whether internal
subroutines such as non-public methods and interfaces are able to handle all types
of data appropriately.
o This step focuses on testing of control statements like loops and conditional
statements to check the efficiency and accuracy for different data inputs.
o In the last step, white box testing includes security testing to check all possible
security loopholes by looking at how the code handles security.
Reasons for white box testing:
o It identifies internal security holes.
o To check the flow of inputs through the code.
o To check the functionality of conditional loops.
o To test functions, objects, and statements at an individual level.
Advantages of white box testing:
o White box testing optimizes code so hidden errors can be identified.
o Test cases of white box testing can be easily automated.
o This testing is more thorough than other testing approaches as it covers all code
paths.
o It can be started early in the SDLC, even before a GUI exists.
Disadvantages of white box testing:
o White box testing is very time-consuming when it comes to large-scale
programming applications.
o White box testing is expensive and complex.
o It can still let defects reach production, because tests derived from the existing
code cannot reveal missing functionality.
o White box testing needs professional programmers who have detailed knowledge
and understanding of the programming language and implementation.
Black box testing:
Black box testing is a technique of software testing which examines the functionality of
software without peering into its internal structure or coding. The primary source of black
box testing is a specification of requirements that is stated by the customer.
In this method, the tester selects a function, gives input values to examine its functionality,
and checks whether the function produces the expected output or not. If the function
produces correct output, it passes the test; otherwise it fails. The test team reports the
result to the development team and then tests the next function. If severe problems remain
after all functions have been tested, the software is given back to the development team
for correction.
Fig-9 Black Box Testing
Generic steps of black box testing: