1. Detection Of Fraudulent Behavior In Water
Consumption Using A Data Mining Based Model
By Gedela Pradeep, PG212206010, M.Sc (CS), 4th sem
Under the guidance of Sri P. Venkata Rao Sir, Associate Professor
By N. Naveen Kumar, PG-212202042, MCA, 4th sem
Under the guidance of Sri P. Venkata Rao Sir, Associate Professor
2. CONTENTS
• 1. Abstract
• 2. Existing system
• 3. Proposed system
• 4. Hardware and software requirement
• 5. Algorithm and examples
• 6. Block diagram
• 7. Conclusion
3. ABSTRACT
Fraud in drinking water consumption is a major issue for water supply companies and
authorities. This behaviour accounts for a very high percentage of non-technical losses and
causes large revenue losses. Developing efficient methods for detecting fraudulent
activities has become an active area of research in recent years. By using intelligent data
mining techniques, water supply companies can identify these fraudulent activities and
reduce their losses. This study examines the use of two classification techniques, Decision
Tree and Bayesian classification, to detect suspicious water customers. The Decision
Tree-based technique uses customer load profile attributes to expose abnormal behaviour
that is known to be linked to non-technical losses.
4. EXISTING SYSTEM
Water fraud causes significant losses for water supply corporations. Water loss
falls into two categories. Technical loss (TL) is related to problems in the
production system, the transfer of water through the network, and network washout.
Non-technical loss (NTL) is the amount of water delivered to consumers but not
billed, which causes a loss of income. This study examines the use of two
classification techniques, Decision Tree and Bayesian classification.
5. PROPOSED SYSTEM
This project focuses on customers' historical data. Its major goal is to
use the well-known data mining techniques Decision Tree (DT) and
Bayesian Classifier to build a suitable model for identifying suspected
fraudulent customers based on their historical metered water consumption.
The study was carried out following the CRISP-DM methodology.
8. DECISION TREE:
Decision Tree is a supervised learning method used in data mining for classification
and regression tasks. It builds a classification or regression model in the form of a tree
structure that supports decision-making. It splits a data set into smaller and smaller
subsets while the tree is developed step by step. The final tree consists of decision
nodes and leaf nodes. A decision node has at least two branches. A leaf node represents a
classification or decision and cannot be split any further. The uppermost decision node in
the tree, which corresponds to the best predictor, is called the root node. Decision trees
can handle both categorical and numerical data.
9. ALGORITHM
• Step-1: Begin the tree with the root node, say C, which contains the
complete dataset.
• Step-2: Find the best attribute in the dataset using an Attribute Selection
Measure (ASM).
• Step-3: Divide C into subsets that contain the possible values of the best
attribute.
• Step-4: Generate the decision tree node, which contains the best attribute.
• Step-5: Recursively build new decision trees using the subsets of the
dataset created in Step-3. Continue this process until a stage is reached
where the nodes cannot be classified further; such a final node is called a
leaf node.
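The five steps of the algorithm can be sketched in plain Python. This is a minimal illustration, not the model used in the project: it assumes information gain as the Attribute Selection Measure, handles categorical attributes only, and runs on a tiny made-up customer table. The attribute names `sudden_drop` and `zero_readings` and the labels `fraud` and `normal` are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Entropy reduction from splitting on attr (the ASM assumed here)."""
    total = len(rows)
    remainder = 0.0
    for value in set(row[attr] for row in rows):
        subset = [lab for row, lab in zip(rows, labels) if row[attr] == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder

def build_tree(rows, labels, attrs):
    # Leaf node: every label agrees, or no attribute is left to split on.
    if len(set(labels)) == 1 or not attrs:
        return Counter(labels).most_common(1)[0][0]
    # Step-2: pick the best attribute according to the ASM.
    best = max(attrs, key=lambda a: information_gain(rows, labels, a))
    # Step-4: create the decision node for that attribute.
    node = {"attr": best, "children": {}}
    remaining = [a for a in attrs if a != best]
    # Step-3 and Step-5: split into subsets and recurse on each one.
    for value in sorted(set(row[best] for row in rows)):
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        node["children"][value] = build_tree(
            [rows[i] for i in idx], [labels[i] for i in idx], remaining
        )
    return node

def predict(tree, row):
    """Walk from the root to a leaf (raises KeyError on an unseen value)."""
    while isinstance(tree, dict):
        tree = tree["children"][row[tree["attr"]]]
    return tree

# Step-1: the root node C starts with the complete (hypothetical) dataset.
rows = [
    {"sudden_drop": "yes", "zero_readings": "yes"},
    {"sudden_drop": "yes", "zero_readings": "no"},
    {"sudden_drop": "no", "zero_readings": "yes"},
    {"sudden_drop": "no", "zero_readings": "no"},
    {"sudden_drop": "yes", "zero_readings": "yes"},
    {"sudden_drop": "no", "zero_readings": "no"},
]
labels = ["fraud", "fraud", "normal", "normal", "fraud", "normal"]

tree = build_tree(rows, labels, ["sudden_drop", "zero_readings"])
print(tree)
print(predict(tree, {"sudden_drop": "yes", "zero_readings": "no"}))
```

In this toy table `sudden_drop` separates the two classes perfectly, so it is chosen as the root and both of its branches become leaf nodes. Real consumption data would need numerical splits and some form of pruning, which this sketch leaves out.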
10. • Decision Tree Example:
Consider the example of a factory that must decide whether to expand:
• Expanding the factory costs $3 million. The probability of a good economy is 0.6 (60%), which
leads to $8 million profit, and the probability of a bad economy is 0.4 (40%), which leads to
$6 million profit.
• Not expanding the factory costs $0. The probability of a good economy is 0.6 (60%), which
leads to $4 million profit, and the probability of a bad economy is 0.4 (40%), which leads to $2
million profit.
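The factory example can be resolved by comparing the expected value of each branch. The sketch below assumes that the stated profits do not yet include the expansion cost, so the cost is subtracted from the expected profit; the slide does not say this explicitly. The function name `expected_net_value` is made up for illustration.

```python
def expected_net_value(cost, outcomes):
    """Expected profit minus up-front cost.

    outcomes is a list of (probability, profit) pairs, in $ million.
    """
    # The probabilities of one decision branch must sum to 1.
    assert abs(sum(p for p, _ in outcomes) - 1.0) < 1e-9
    return sum(p * profit for p, profit in outcomes) - cost

# Figures taken from the factory example, in $ million.
expand = expected_net_value(3, [(0.6, 8), (0.4, 6)])
no_expand = expected_net_value(0, [(0.6, 4), (0.4, 2)])

decision = "expand" if expand > no_expand else "do not expand"
print(f"expand: {expand:.1f}, do not expand: {no_expand:.1f} -> {decision}")
```

Under that assumption, expanding has an expected net value of about $4.2 million (0.6 × 8 + 0.4 × 6 − 3) against about $3.2 million for not expanding (0.6 × 4 + 0.4 × 2), so the tree favours expansion.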
11. NAIVE BAYES:
• Naive Bayes classifiers are a collection of classification algorithms based on Bayes'
Theorem. Naive Bayes is not a single algorithm but a family of algorithms that all share
a common principle: every pair of features is assumed to be independent of each other,
given the class.
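The shared independence principle can be written as a formula. In the sketch below, $y$ denotes the class and $x_1, \dots, x_n$ the feature values; these symbols are notation chosen here, not taken from the slides.

```latex
P(y \mid x_1, \dots, x_n)
  = \frac{P(y)\, \prod_{i=1}^{n} P(x_i \mid y)}{P(x_1, \dots, x_n)},
\qquad
\hat{y} = \arg\max_{y} \; P(y) \prod_{i=1}^{n} P(x_i \mid y)
```

Because the denominator is the same for every class, the classifier only needs to compare the numerators.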
12. EXAMPLE
• Naïve Bayes' Classifier:
• The Naïve Bayes' Classifier can be understood with the help of the example below:
• Suppose we have a dataset of weather conditions and a corresponding target
variable "Play". Using this dataset, we need to decide whether or not we should play
on a particular day according to the weather conditions. To solve this
problem, we follow these steps:
1. Convert the given dataset into frequency tables.
2. Generate a likelihood table by finding the probabilities of the given features.
3. Use Bayes' theorem to calculate the posterior probability.
• Problem: If the weather is sunny, should the player play or not?
14. Applying Bayes' theorem:
P(Yes|Sunny) = P(Sunny|Yes) * P(Yes) / P(Sunny)
P(Sunny|Yes) = 3/10 = 0.3
P(Sunny) = 0.35
P(Yes) = 0.71
So P(Yes|Sunny) = 0.3 * 0.71 / 0.35 ≈ 0.60
P(No|Sunny) = P(Sunny|No) * P(No) / P(Sunny)
P(Sunny|No) = 2/4 = 0.5
P(No) = 0.29
P(Sunny) = 0.35
So P(No|Sunny) = 0.5 * 0.29 / 0.35 ≈ 0.41
The inputs are rounded to two decimals, which is why the two posteriors sum to
slightly more than 1.
As the calculation shows, P(Yes|Sunny) > P(No|Sunny).
Hence, on a sunny day, the player can play the game.
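The same posterior calculation can be reproduced with exact fractions, which avoids the rounding in the hand calculation. The counts of 3 sunny days out of 10 "Yes" days and 2 out of 4 "No" days come from the fractions in the calculation; the total of 14 days is inferred from those counts and is an assumption, since the frequency table itself is not shown.

```python
from fractions import Fraction

# Counts per class: days in total and days that were sunny.
# 3/10 and 2/4 appear in the calculation; 14 days overall is inferred.
counts = {
    "Yes": {"total": 10, "sunny": 3},
    "No": {"total": 4, "sunny": 2},
}

n_days = sum(c["total"] for c in counts.values())
p_sunny = Fraction(sum(c["sunny"] for c in counts.values()), n_days)

def posterior(label):
    """P(label | Sunny) by Bayes' theorem."""
    likelihood = Fraction(counts[label]["sunny"], counts[label]["total"])
    prior = Fraction(counts[label]["total"], n_days)
    return likelihood * prior / p_sunny

for label in counts:
    print(label, posterior(label), float(posterior(label)))

# Pick the class with the larger posterior.
prediction = max(counts, key=posterior)
print("Prediction on a sunny day:", prediction)
```

With these counts the exact posteriors are 3/5 for "Yes" and 2/5 for "No", which agrees with the rounded hand calculation and leads to the same decision.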
16. CONCLUSION
Decision Tree and Bayesian classification models help to reduce
fraudulent behaviour in water consumption. Compared with other methods, these
data mining models have several advantages, and they can increase profits by
cutting the losses caused by fraud. For this reason, we used these data mining
techniques to detect fraudulent behaviour.