The KNN (K Nearest Neighbors) algorithm keeps all available labeled data points and classifies each new case by the majority class among the k existing points closest to it. It is useful for recognizing patterns and for estimation, and KNN classification helps determine probable outcomes and forecast results when multiple variables are involved.
What is KNN Classification and How Can This Analysis Help an Enterprise?
1. Master the Art of Analytics
A Simplistic Explainer Series For Citizen Data Scientists
Journey Towards Augmented Analytics
3. Basic Terminology
The target variable, usually denoted by Y, is the variable being predicted. It is also called the dependent variable, output variable, response variable or outcome variable (e.g., Satisfaction level in the table below).
A predictor, sometimes called an independent variable, is a variable used to predict the target variable (e.g., Age, Marital Status and Gender in the table below).
Age  Marital Status  Gender  Satisfaction Level
58   married         Female  High
44   single          Female  Low
33   married         Male    Medium
47   married         Female  High
33   single          Female  Medium
4. Introduction
• An instance (data record or case) is assigned the class that is most common among its k nearest neighbors
• Here, k is a positive integer, typically odd (which avoids tied votes when there are two classes) and between 1 and 10
For instance, for k = 3, the majority class of the 3 nearest neighbors of the center point (shown as a star in the image) is Class B (two of the three circles are purple, i.e. class B), whereas for k = 6 the majority is Class A (four of the six circles are yellow, i.e. class A)
Note: k is identified automatically by choosing the value that gives the highest classification accuracy
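The note above says k is chosen by classification accuracy but does not say how that accuracy is measured. The sketch below assumes leave-one-out accuracy (each record is predicted from all the other records) over the odd candidates 1, 3, 5, 7, 9; the function names are hypothetical and this is not the tool's actual procedure.

```python
import math
from collections import Counter


def knn_predict(train, query, k):
    """Majority class among the k training rows closest to query (Euclidean distance)."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def loo_accuracy(data, k):
    """Leave-one-out accuracy: predict each row from all the other rows."""
    hits = 0
    for i, (features, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        hits += knn_predict(rest, features, k) == label
    return hits / len(data)


def choose_k(data, candidates=range(1, 11, 2)):
    """Return (k, accuracy) for the candidate with the highest accuracy.

    Candidates must be smaller than len(data); ties go to the smallest k.
    """
    scored = [(k, loo_accuracy(data, k)) for k in candidates if k < len(data)]
    return max(scored, key=lambda pair: pair[1])


# Two well-separated, made-up clusters for illustration
data = [((0, 0), "A"), ((0, 1), "A"), ((1, 0), "A"), ((1, 1), "A"),
        ((10, 10), "B"), ((10, 11), "B"), ((11, 10), "B"), ((11, 11), "B")]
print(choose_k(data))  # (1, 1.0)
```

Here k = 1, 3 and 5 all reach 100% leave-one-out accuracy, while k = 7 pulls in more points from the other cluster than from the record's own and gets every record wrong; the tie-break keeps the smallest k.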
5. Steps:
• Determine k = the number of nearest neighbors (in terms of distance) to check for class assignment
• Calculate the distance between the new instance and all the training instances
• Rank the training instances by distance and find the k nearest neighbors, i.e. those with the shortest distance from the new instance
• Gather the classes of the nearest neighbors to find the majority class
• Use this majority class as the final predicted class
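The five steps above map almost line for line onto code. This is a minimal from-scratch sketch using Euclidean distance and a simple majority vote; the function name knn_classify is hypothetical, and ties are broken by whichever tied class appears first among the ranked neighbors.

```python
import math
from collections import Counter


def knn_classify(train_x, train_y, new_x, k=3):
    """Predict a class for new_x by majority vote of its k nearest neighbors."""
    # Step 1: k is supplied by the caller
    # Step 2: distance from the new instance to every training instance
    distances = [math.dist(x, new_x) for x in train_x]
    # Step 3: rank training instances by distance and keep the k closest
    nearest = sorted(range(len(train_x)), key=lambda i: distances[i])[:k]
    # Step 4: gather the classes of those neighbors
    votes = Counter(train_y[i] for i in nearest)
    # Step 5: the majority class is the prediction
    return votes.most_common(1)[0][0]


# Paper tissue data: (acid durability in s, strength in kg/m²) -> quality
train_x = [(7, 7), (7, 4), (3, 4), (1, 4)]
train_y = ["Good", "Good", "Bad", "Good"]
print(knn_classify(train_x, train_y, (3, 7), k=3))  # Good
print(knn_classify(train_x, train_y, (3, 7), k=1))  # Bad
```

The second call shows how sensitive KNN is to k: the single nearest record is Bad, but two of the three nearest are Good.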
6. Example: Input
• Based on two attributes, acid durability and strength, we want to classify a paper tissue into good/bad quality classes:

Acid durability (s)  Strength (kg/m²)  Paper tissue quality
7                    7                 Good
7                    4                 Good
3                    4                 Bad
1                    4                 Good

(Acid durability and strength are the independent variables/predictors; paper tissue quality is the target variable Y.)
7. Example: Steps
Calculate distances and rank them to find the k nearest neighbors
• For instance, if a paper tissue's acid durability = 3 and strength = 7, take the following steps:
• Step 1: Decide the value of k; say it is 3 (based on classification accuracy)
• Step 2, 3: Calculate the distance between each input instance and the new instance, and rank each input instance by distance to find the k nearest neighbors. The values below are squared Euclidean distances; skipping the square root does not change the ranking.

Acid durability (s)  Strength (kg/m²)  Quality  Distance to new instance    Rank (nearest = 1)
7                    7                 Good     (7-3)² + (7-7)² = 16        3
7                    4                 Good     (7-3)² + (4-7)² = 25        4
3                    4                 Bad      (3-3)² + (4-7)² = 9         1
1                    4                 Good     (1-3)² + (4-7)² = 13        2

The first three columns are the input dataset; the last two are derived results used to find the majority class of the k nearest neighbors.
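The distance and rank columns above can be reproduced in a few lines. This sketch uses the example's own four records and new instance (acid durability = 3, strength = 7) and, like the table, works with squared Euclidean distances; the variable names are illustrative.

```python
# Paper tissue training data: (acid durability in s, strength in kg/m², quality)
rows = [(7, 7, "Good"), (7, 4, "Good"), (3, 4, "Bad"), (1, 4, "Good")]
new_acid, new_strength = 3, 7
k = 3

# Squared Euclidean distance, matching the "Distance to new instance" column
sq_dist = [(a - new_acid) ** 2 + (s - new_strength) ** 2 for a, s, _ in rows]

# Indices sorted from nearest to farthest, then converted to 1-based ranks
order = sorted(range(len(rows)), key=lambda i: sq_dist[i])
rank = [0] * len(rows)
for position, i in enumerate(order, start=1):
    rank[i] = position

for (a, s, q), d, r in zip(rows, sq_dist, rank):
    print(f"{a:>4} {s:>4} {q:<5} d2={d:<3} rank={r}")

# Classes of the k nearest neighbors and their majority
neighbors = [rows[i][2] for i in order[:k]]
predicted = max(set(neighbors), key=neighbors.count)
print("k nearest classes:", neighbors, "-> predicted:", predicted)
```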
8. Example: Steps
Select the majority class of the k nearest neighbors as the predicted class
• Step 4, 5: The majority class of the three nearest neighbors is Good (two of the three records have class = Good), so the predicted class of the new instance is Good, i.e. a paper tissue with acid durability = 3 and strength = 7 is of good quality.

Final output:
Acid durability (s)  Strength (kg/m²)  Paper tissue quality
7                    7                 Good
7                    4                 Good
3                    4                 Bad
1                    4                 Good
3                    7                 Good (predicted)
9. Example : Steps
Find out accuracy
Confusion matrix (rows = actual class, columns = predicted class):
Actual \ Predicted | Good | Bad
Good | 35 | 4
Bad | 4 | 70
CLASSIFICATION ACCURACY: (35 + 70) / (35 + 70 + 4 + 4) = 105 / 113 ≈ 92.9%
• Prediction accuracy is a useful criterion for assessing model performance
• As a rule of thumb, a model with prediction accuracy >= 70% is considered useful
CLASSIFICATION ERROR = 100% - accuracy ≈ 7.1%
About 7.1% of the records (8 of 113) are misclassified
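Accuracy is the share of records on the diagonal of the confusion matrix. The sketch below recomputes it from the example counts. The exact ratio is 105/113, which is about 92.9%, and the error is its complement.

```python
# Confusion matrix from the example: outer key = actual class, inner key = predicted class
confusion = {
    "Good": {"Good": 35, "Bad": 4},
    "Bad": {"Good": 4, "Bad": 70},
}

correct = sum(confusion[c][c] for c in confusion)               # diagonal
total = sum(sum(row.values()) for row in confusion.values())    # all records
accuracy = correct / total
error = 1 - accuracy

print(f"accuracy = {accuracy:.1%}, error = {error:.1%}")
```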
11. Standard output 1 : Model Summary
ACTUAL VERSUS PREDICTED
Actual \ Predicted | Good | Bad
Good | 35 | 4
Bad | 4 | 70
PROFILE OF CLASSES
• Good quality class has average acid durability = 6 and Strength = 7
• Bad quality class has average acid durability = 3 and Strength = 4
13. Sample output 3 : Classification plot
• The less the two classes overlap in the plot, the better the classification done by the model
Thus, the output contains the predicted class and probability columns, a confusion matrix and a classification plot
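The slide does not say how the probability columns are computed. A common choice for KNN is the share of the k neighbors' votes that each class receives. The sketch below assumes that definition, which may differ from what the tool actually uses. The function name knn_class_probabilities is hypothetical.

```python
import math
from collections import Counter


def knn_class_probabilities(train_X, train_y, new_x, k):
    """Share of the k nearest neighbors' votes per class (assumed definition)."""
    ranked = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], new_x))
    votes = Counter(label for _, label in ranked[:k])
    # Report every known class, including those that received no votes
    return {c: votes[c] / k for c in sorted(set(train_y))}


# Paper tissue example: (acid durability, strength) -> quality
X = [(7, 7), (7, 4), (3, 4), (1, 4)]
y = ["Good", "Good", "Bad", "Good"]
probs = knn_class_probabilities(X, y, (3, 7), k=3)
predicted = max(probs, key=probs.get)
print(probs, predicted)
```

The predicted class column is then simply the class with the highest probability.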
14. Limitations :
• Data needs to be scaled, e.g. with min-max scaling (x - min(x)) / (max(x) - min(x)), before it is input to the algorithm. Otherwise, variables with large ranges dominate the distance, which can lead to a high % of misclassification and, in turn, low accuracy
• Not well suited to categorical predictor variables, since the distance between categories is not naturally defined
• Individual variable importance cannot be measured (i.e. which variable(s) are most important or contribute most to the classification model)
• For instance, age/income might be impactful variables, or determinant factors, when classifying applicants into likely defaulters/non-defaulters, but the model does not report this
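Min-max scaling maps each column to the range [0, 1]. The sketch below applies it to the loan amount and debt to income ratio of the first three applicants in Use case 1. The function name min_max_scale is illustrative.

```python
def min_max_scale(column):
    """Rescale one numeric column to [0, 1] using (x - min) / (max - min)."""
    lo, hi = min(column), max(column)
    span = hi - lo
    # A constant column carries no distance information, so map it to 0
    return [(x - lo) / span if span else 0.0 for x in column]


# Loan amount and debt to income ratio of the first three applicants in Use case 1
loan_amount = [21000, 15000, 25600]
debt_to_income = [9, 11, 10]

print(min_max_scale(loan_amount))     # approx. [0.566, 0.0, 1.0]
print(min_max_scale(debt_to_income))  # [0.0, 1.0, 0.5]
```

Before scaling, the loan amounts differ by thousands while the ratios differ by at most 2. An unscaled Euclidean distance would therefore be driven almost entirely by loan amount. After scaling, both columns lie in [0, 1] and contribute comparably.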
15. General applications
• Credit/loan approval analysis: Given a client's transactional attributes, predict whether the client will default on a bank loan
• Weather prediction: Based on temperature, humidity, pressure etc., predict whether the weather will be rainy/sunny/cold
• Rain forecasting: Based on temperature, humidity, pressure etc., predict whether it will rain
• Fraud analysis: Based on the bills an employee submits for reimbursement of food, travel, medical expenses etc., predict the likelihood that the employee is committing fraud
16. Use case 1
Business problem:
• A bank loan officer wants to predict whether a loan applicant will be a defaulter or a non-defaulter based on attributes such as loan amount, monthly installment, employment tenure, times delinquent, annual income, debt to income ratio etc.
• Here the target variable is 'past default status', and the predicted class contains the values 'Yes' or 'No', representing the 'likely to default' and 'unlikely to default' classes respectively
Business benefit:
• Once classes are assigned, the bank will have a dataset of loan applicants, each labeled as "likely/unlikely to default"
• Based on these labels, the bank can decide whether to give a loan to an applicant and, if so, what credit limit and interest rate the applicant is eligible for, given the risk involved
17. Use case 1 : Input Dataset
Customer ID | Loan amount | Monthly installment | Annual income | Debt to income ratio | Times delinquent | Employment tenure | Past default status
1039153 | 21000 | 701.73 | 105000 | 9 | 5 | 4 | No
1069697 | 15000 | 483.38 | 92000 | 11 | 5 | 2 | No
1068120 | 25600 | 824.96 | 110000 | 10 | 9 | 2 | No
563175 | 23000 | 534.94 | 80000 | 9 | 2 | 12 | No
562842 | 19750 | 483.65 | 57228 | 11 | 3 | 21 | Yes
562681 | 25000 | 571.78 | 113000 | 10 | 0 | 9 | No
562404 | 21250 | 471.2 | 31008 | 12 | 1 | 12 | Yes
700159 | 14400 | 448.99 | 82000 | 20 | 6 | 6 | No
696484 | 10000 | 241.33 | 45000 | 18 | 8 | 2 | Yes
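The sketch below applies the earlier KNN steps and min-max scaling to the nine sample rows. It treats the six numeric attributes as predictors and past default status as the target. Customer ID is left out because it is an identifier, not a characteristic. The applicant values and the helper names scale and predict are hypothetical. A real model would be trained on the full dataset, not just these nine rows.

```python
import math
from collections import Counter

# (loan amount, monthly installment, annual income, debt to income ratio,
#  times delinquent, employment tenure), past default status
rows = [
    ((21000, 701.73, 105000, 9, 5, 4), "No"),
    ((15000, 483.38, 92000, 11, 5, 2), "No"),
    ((25600, 824.96, 110000, 10, 9, 2), "No"),
    ((23000, 534.94, 80000, 9, 2, 12), "No"),
    ((19750, 483.65, 57228, 11, 3, 21), "Yes"),
    ((25000, 571.78, 113000, 10, 0, 9), "No"),
    ((21250, 471.2, 31008, 12, 1, 12), "Yes"),
    ((14400, 448.99, 82000, 20, 6, 6), "No"),
    ((10000, 241.33, 45000, 18, 8, 2), "Yes"),
]
features = [x for x, _ in rows]
labels = [y for _, y in rows]

# Min-max bounds per attribute, learned from the training rows only
lows = [min(col) for col in zip(*features)]
highs = [max(col) for col in zip(*features)]


def scale(x):
    # (x - min) / (max - min) for each attribute; constant attributes map to 0
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(x, lows, highs)]


scaled = [scale(x) for x in features]


def predict(new_x, k=3):
    q = scale(new_x)
    # Indices of the k training rows closest to the scaled query
    nearest = sorted(range(len(scaled)), key=lambda i: math.dist(scaled[i], q))[:k]
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]


applicant = (18000, 520.0, 60000, 15, 4, 5)  # hypothetical new applicant
print(predict(applicant))
```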
18. Use case 1 : Output : Predicted Class
Output: Each record is assigned a predicted class, as shown below (column: Predicted class):
Customer ID | Loan amount | Monthly installment | Annual income | Debt to income ratio | Times delinquent | Employment tenure | Past default status | Predicted class
1039153 | 21000 | 701.73 | 105000 | 9 | 5 | 4 | No | No
1069697 | 15000 | 483.38 | 92000 | 11 | 5 | 2 | No | No
1068120 | 25600 | 824.96 | 110000 | 10 | 9 | 2 | No | No
563175 | 23000 | 534.94 | 80000 | 9 | 2 | 12 | No | No
562842 | 19750 | 483.65 | 57228 | 11 | 3 | 21 | Yes | No
562681 | 25000 | 571.78 | 113000 | 10 | 0 | 9 | No | No
562404 | 21250 | 471.2 | 31008 | 12 | 1 | 12 | Yes | Yes
700159 | 14400 | 448.99 | 82000 | 20 | 6 | 6 | No | No
696484 | 10000 | 241.33 | 45000 | 18 | 8 | 2 | Yes | Yes
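Comparing the Past default status and Predicted class columns gives a small confusion matrix for these nine records. This is only a sample, so the result illustrates the calculation rather than the model's overall accuracy.

```python
from collections import Counter

# (past default status, predicted class) for the nine customers in the table
pairs = [
    ("No", "No"), ("No", "No"), ("No", "No"), ("No", "No"),
    ("Yes", "No"), ("No", "No"), ("Yes", "Yes"), ("No", "No"), ("Yes", "Yes"),
]

confusion = Counter(pairs)  # keys are (actual, predicted)
correct = sum(n for (actual, predicted), n in confusion.items() if actual == predicted)
accuracy = correct / len(pairs)

for (actual, predicted), n in sorted(confusion.items()):
    print(f"actual={actual:<3} predicted={predicted:<3} count={n}")
print(f"sample accuracy = {accuracy:.1%}")
```

The single miss is customer 562842, a past defaulter whom the model labels as unlikely to default.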
19. Use case 1 : Output : Class profile
Class (likely to default) | Average loan amount | Average monthly installment | Average annual income | Average debt to income ratio | Average times delinquent | Average employment tenure
No | 10447.30 | 304.87 | 66467.74 | 9.58 | 1.69 | 16.82
Yes | 7521.32 | 227.43 | 60935.28 | 16.55 | 6.91 | 4.01
As the table above shows, defaulters (class: Yes) and non-defaulters (class: No) have distinctive characteristics.
Compared with non-defaulters, defaulters have been delinquent more often, have a higher debt to income ratio and have shorter employment tenure.
Hence, delinquency, employment tenure and debt to income ratio are the determinant factors when classifying loan applicants into likely defaulters/non-defaulters.
20. Use case 2
Business problem:
• A doctor/pharmacist wants to predict the likelihood that a new patient's disease will be cured, based on patient attributes such as blood pressure, hemoglobin level, sugar level, the name of the drug given to the patient, the name of the treatment given to the patient etc.
• Here the target variable is 'past cure status', and the predicted class contains the values 'Yes' or 'No', meaning 'likely to be cured' and 'not likely to be cured' respectively
Business benefit:
• Given a patient's body profile and the recent treatments and drugs he or she has taken, the probability of a cure can be predicted, and changes in treatment/drugs can be suggested if required
21. Use case 3
Business problem:
• An accountant/human resources manager wants to predict the likelihood that an employee is committing fraud against the company, based on the bills he or she has submitted so far, such as food, travel and medical bills
• The target variable in this case is 'past fraud status', and the predicted class contains the values 'Yes' or 'No', representing likely fraud and no fraud respectively
Business benefit:
• Such classification can detect fraud beforehand, which prevents the company from spending unreasonably on any employee and in turn saves the company budget
22. Want to Learn More?
Get in touch with us at support@Smarten.com, and check out the Learning section on Smarten.com
June 2018