Electricity theft detection using image processing
1. Dr. D. Y. Patil Institute of Technology, Pimpri, Pune-411018
Department of Computer Engineering
2019-2020
PROJECT REPORT ON
Electricity Theft Detection using Machine
Learning Algorithm
Submitted by :
BCOB23 Anwar Patel
BCOB24 Onkar Yadav
BCOB25 Ankush Maratkar
BCOB26 Nilesh Maher
Project Guide:
Prof. Rajesh Bharati
2. Contents
Abstract
Introduction
Motivation
Existing System
Mathematical Model
Advantage and Disadvantage
Proposed System
Goals and Objective
System Architecture & Explanation
Literature Review
Hardware requirement and Software requirement
Conclusion
References
3. Abstract
Electricity theft is one of the major problems facing electric utilities and
causes them financial loss. Manually inspecting for such theft is not
feasible when the volume of meter data is large. To detect electricity
theft, we introduce an OCR (Optical Character Recognition) algorithm,
which converts a meter image to text data, together with some
mathematical formulas. We first apply preprocessing to the meter data and
then perform feature selection. Theft is detected from anomalous usage
behavior of the user. Electricity data from different societies and
commercial users is given as input to find the thefts.
4. Introduction
Many electric utilities suffer financial loss due to electricity theft. There
are various types of electrical power theft, including tapping a line and
bypassing the energy meter.
According to a study, 80% of worldwide theft occurs in private dwellings
and 20% on commercial and industrial premises. Detecting theft manually
is not feasible because of the large amount of data involved.
To improve the accuracy of the input data we use the Google-OCR image
processing algorithm. Image processing here operates on images of the
meter; in short, we use meter images to get the exact reading of each user.
We implement a supervised ML-based theft detection model that identifies
whether an abnormal or fraudulent usage pattern has occurred at the meter.
Here we use meter image data to detect theft. Using electricity
consumption data, we detect theft across all available societies. We use
boosting, an ensemble technique, which requires less time and gives
higher accuracy.
5. Motivation
Many electric utilities suffer financial loss due to electricity theft. Theft
is detected from the dataset using the user ID and the meter unit image,
which is converted into a text file using Google-OCR. The aim is to
recover the losses of electricity utilities.
There are various types of electrical power theft, including tapping a
line and bypassing the energy meter.
Honest consumers, poor people and those without connections
bear the burden of high tariffs, system inefficiencies, and inadequate
and unreliable power supply.
6. Existing System
In the existing system, utilities have to send an employee to check the
smart meters of society and commercial or industrial users.
Theft is found only if the employee happens to find a user's meter
switched off at the time of the visit; otherwise people continue
stealing electricity undetected.
To minimize electricity theft we are developing a project that
overcomes these problems of the existing system.
7. Mathematical Model
Let S be the whole system, which consists of:
S = {Dataset}.
Where,
Meter data is the input of the system.
OP is the output, which labels each user as fraudulent or trustworthy.
8. Input:
IP = {I}
Where I is the dataset provided as input.
Procedure:
Step 1: Take the dataset with user ID and meter image.
Step 2: Convert the meter image to text data.
Step 3: Preprocess the converted data.
Step 4: Use the average formula to find the user's average meter
reading over twelve months.
Step 5: Show the result based on the comparison.
Output:
Whether or not theft is present in the dataset.
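Steps 4 and 5 of the procedure can be sketched as follows. This is a minimal illustration, not the project's actual code: the function names, the 50% drop threshold and the sample readings are assumptions, since the slides do not specify how the comparison against the average is made.

```python
from statistics import mean


def average_monthly_usage(readings):
    """Return the mean of twelve monthly consumption values (in units)."""
    if len(readings) != 12:
        raise ValueError("expected twelve monthly readings")
    return mean(readings)


def flag_suspicious(history, current, drop_ratio=0.5):
    """Flag a user whose current usage is far below their 12-month average.

    drop_ratio is a hypothetical threshold chosen only for illustration.
    """
    return current < drop_ratio * average_monthly_usage(history)


# Made-up monthly consumption for one user.
history = [310, 295, 320, 305, 300, 315, 290, 310, 300, 305, 295, 315]
print(average_monthly_usage(history))
print(flag_suspicious(history, 120))  # a sharp drop in usage
print(flag_suspicious(history, 298))  # usage close to the average
```

A sudden drop relative to a user's own history is only one possible comparison rule; a real deployment would need a rule validated against labeled data.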
9. Advantage
Detecting electricity theft helps avoid financial loss for electric utilities.
It improves analysis and prediction accuracy.
The Google-OCR algorithm is used for meter image processing.
It reduces manpower and increases automation.
It reduces the time complexity of detecting theft.
10. Disadvantage
Applicable only to large datasets.
As with any machine learning project, accuracy is a limiting factor; 100%
accuracy cannot be achieved.
Data must contain variations.
11. Proposed System
• India has the highest electricity theft in the world, at $16.2 billion.
Brazil is second, at $10.5 billion.
• Within India, the most theft occurs in Maharashtra, with
Mumbai alone accounting for $2.8 billion.
• The proposed system uses a dataset containing the electricity usage of
smart meters and industrial meters.
• We perform feature selection and preprocessing on this
dataset; feature selection gives us higher accuracy.
• After preprocessing, we use the Google-OCR algorithm, in preference to
other ML algorithms, to detect the theft.
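OCR output for a meter image is raw text, so it has to be cleaned into a numeric reading before any comparison can be made. A minimal sketch of that cleaning step follows. The `parse_meter_reading` helper, the letter-to-digit substitutions and the sample strings are illustrative assumptions; they are not Google-OCR output and not part of the original project.

```python
import re

# Letters that OCR engines often confuse with digits (assumed mapping).
_CONFUSIONS = str.maketrans("OolISB", "001158")


def parse_meter_reading(ocr_text):
    """Return the first numeric reading found in raw OCR text, or None."""
    # Candidate tokens are runs of digits, confusable letters and dots.
    for token in re.findall(r"[0-9OolISB.]+", ocr_text):
        # Skip tokens without a real digit, e.g. the "B" and "ll" in "Bill".
        if not any(ch.isdigit() for ch in token):
            continue
        cleaned = token.translate(_CONFUSIONS)
        match = re.search(r"\d+(?:\.\d+)?", cleaned)
        if match:
            return float(match.group())
    return None


print(parse_meter_reading("Reading: 0O482.5 kWh"))
print(parse_meter_reading("Meter No Bill"))
```

Requiring at least one true digit per token keeps ordinary words from being turned into spurious readings.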
12. Goal and Objective
The main goal of this project is to detect electricity theft.
Power theft is one of the most prevalent issues; it causes not only
economic losses but also irregular supply of electricity.
To reduce electricity theft in societies and commercial industries.
To increase data accuracy using machine learning and
automation.
18. Literature Review
Sr. No. | Authors | Title of the Paper | Advantage | Advantage we used in our project
1 | Jeyaranjani J, Devaraj D | Machine Learning Algorithm for Efficient Power Theft Detection | The trustworthiness of customers is verified and is selected for theft program. This analysis is carried out by tweaking the actual smart meter data to create fraudulent data. | Use of a neural network, which gives high accuracy.
2 | P. Jokar, N. Arianpoo, and V. C. M. Leung | Electricity theft detection in AMI using customers' consumption patterns | We present a novel consumption pattern-based energy theft detector, which leverages the predictability property of customers' normal and malicious consumption patterns. | Application of appropriate classification and clustering techniques, as well as concurrent use of transformer meters and anomaly detectors.
19. Literature Review (continued)
Sr. No. | Authors | Title of the Paper | Advantage | Advantage we used in our project
3 | M. Buzau, J. Aguilera, P. Romero, and A. Expósito | Detection of Non-Technical Losses Using Smart Meter Data and Supervised Learning | It can obtain an in-depth analysis of the customer's consumption behavior. | It can obtain an in-depth analysis of the customer's consumption behavior.
21. What is OCR ?
Machine replication of human functions, like reading, is an ancient
dream. However, over the last five decades, machine reading has grown
from a dream to reality.
Optical character recognition has become one of the most successful
applications of technology in the field of pattern recognition and
artificial intelligence.
22. Baseline Fitting: -
Once the text lines have been found, the baselines are fitted more
precisely using a quadratic spline. This was another first for an OCR
system, and it enabled Tesseract to handle pages with curved baselines,
which are a common artifact of scanning.
The baselines are fitted by partitioning the blobs into groups with a
reasonably continuous displacement from the original straight baseline.
A quadratic spline is fitted to the most populous partition by a least
squares fit.
A more conventional cubic spline might work better. The figure shows an
example of a line of text with a fitted baseline, descender line, mean
line and ascender line.
All these lines are parallel and slightly curved. The ascender line is
light gray (cyan) and the black line above it is straight. Close
inspection shows that the cyan line is curved relative to the straight
black line.
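The least squares fit at the core of this step can be illustrated with a small example. The sketch below fits a single quadratic polynomial y = a*x^2 + b*x + c to hypothetical blob-bottom coordinates by solving the normal equations. It is a simplification under stated assumptions: it is not Tesseract's code, it fits one polynomial rather than a multi-segment spline, and the sample points are made up.

```python
def fit_quadratic(points):
    """Least-squares fit of y = a*x**2 + b*x + c; returns (a, b, c)."""
    if len(points) < 3:
        raise ValueError("need at least three points")
    s = [sum(x ** k for x, _ in points) for k in range(5)]      # sums of x^k
    t = [sum(y * x ** k for x, y in points) for k in range(3)]  # sums of y*x^k
    # Augmented normal-equation matrix for the unknowns (a, b, c).
    m = [
        [s[4], s[3], s[2], t[2]],
        [s[3], s[2], s[1], t[1]],
        [s[2], s[1], s[0], t[0]],
    ]
    # Gaussian elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        if abs(m[col][col]) < 1e-12:
            raise ValueError("points do not determine a quadratic")
        for row in range(col + 1, 3):
            factor = m[row][col] / m[col][col]
            for k in range(col, 4):
                m[row][k] -= factor * m[col][k]
    # Back substitution.
    coeffs = [0.0, 0.0, 0.0]
    for row in range(2, -1, -1):
        rest = sum(m[row][k] * coeffs[k] for k in range(row + 1, 3))
        coeffs[row] = (m[row][3] - rest) / m[row][row]
    return tuple(coeffs)


# Hypothetical blob bottoms (x, y) lying along a gently curved baseline.
bottoms = [(x, 0.002 * x * x - 0.1 * x + 50.0) for x in range(0, 101, 10)]
a, b, c = fit_quadratic(bottoms)
print(round(a, 6), round(b, 6), round(c, 6))
```

A nonzero `a` is what lets the fitted baseline follow page curvature that a straight-line fit would miss.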
23. Line Finding:-
Tesseract is an open-source OCR engine originally developed at HP
between 1984 and 1994. The line finding algorithm is one of the few parts
of Tesseract that has been published. It is designed so that a skewed page
can be recognized without having to be de-skewed, thus avoiding loss of
image quality. The key parts of line finding are blob filtering and line
construction.
Assuming that page layout analysis has already provided the text regions,
a simple percentile height filter removes drop caps and vertically
touching characters. The median height approximates the text size in the
region, so it is safe to filter out blobs that are smaller than some
fraction of the median height, these being most likely punctuation, noise
and diacritical marks.
The filtered blobs are more likely to fit a model of non-overlapping,
parallel but sloping lines. Sorting and processing the blobs by
x-coordinate makes it possible to assign each blob to a unique text line
while tracking the slope across the page.
Once the filtered blobs have been assigned to lines, a least median of
squares fit is used to estimate the baselines.
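The blob filtering step can be sketched as follows. This is a simplified illustration rather than Tesseract's implementation: blobs are assumed to be `(x, y, width, height)` tuples, both fraction thresholds are hypothetical values, and an upper bound relative to the median stands in for the percentile height filter.

```python
from statistics import median


def filter_blobs(blobs, min_fraction=0.5, max_fraction=2.0):
    """Split blobs into (kept, rejected) by height relative to the median.

    Very small blobs are likely punctuation, noise or diacritical marks;
    very tall ones are likely drop caps or vertically touching characters.
    """
    med = median(h for _, _, _, h in blobs)
    kept, rejected = [], []
    for blob in blobs:
        height = blob[3]
        if min_fraction * med <= height <= max_fraction * med:
            kept.append(blob)
        else:
            rejected.append(blob)
    # Sort by x-coordinate, ready for assignment to text lines.
    kept.sort(key=lambda b: b[0])
    return kept, rejected


# Made-up blobs: three letters, one dot-sized blob and one drop cap.
blobs = [(40, 100, 12, 20), (10, 100, 12, 22), (25, 118, 3, 4),
         (60, 80, 30, 60), (55, 100, 12, 21)]
kept, rejected = filter_blobs(blobs)
print(kept)
print(rejected)
```

The rejected blobs are kept in a separate list rather than discarded, so that they can be fitted back into lines once the baselines are known.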
24. Fixed Pitch Detection and Chopping:-
Tesseract tests the text lines to determine whether they are fixed
pitch. Where it finds fixed-pitch text, Tesseract chops the words
into characters using the pitch, and disables the chopper and associator
on these words for the word recognition step.
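Chopping by pitch can be pictured with a short example. The sketch below divides a word's horizontal extent into equal-width character cells; the function name and the pixel values are assumptions for illustration, and the pitch is taken as already known rather than detected.

```python
def chop_fixed_pitch(word_left, word_right, pitch):
    """Return (left, right) x-ranges of character cells in a fixed-pitch word."""
    if pitch <= 0:
        raise ValueError("pitch must be positive")
    # Number of whole character cells that fit in the word, at least one.
    n_chars = max(1, round((word_right - word_left) / pitch))
    return [(word_left + i * pitch, word_left + (i + 1) * pitch)
            for i in range(n_chars)]


# A hypothetical word spanning x = 100..160 pixels at a pitch of 12 pixels.
print(chop_fixed_pitch(100, 160, 12))
```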
25. Proportional Word Finding:-
Detecting word spacing in non-fixed-pitch (proportional) text is a highly
non-trivial task. In the example figure, the gap between the tens and
units of "11.9%" is similar in size to a normal word space, and it is
certainly larger than the kerned space between "erated" and "junk".
There is no horizontal gap at all between the bounding boxes of "of" and
"financial". Tesseract solves most of these problems by measuring gaps in
a limited vertical range between the baseline and the mean line. Spaces
that are close to the threshold at this stage are made fuzzy, so that the
final decision can be made after word recognition.
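The idea of deferring borderline decisions can be sketched as a three-way classification of gaps. The function name, the labels and the 15% fuzz band below are hypothetical choices for illustration; the slides do not state how close to the threshold a gap must be to count as fuzzy.

```python
def classify_gaps(gaps, threshold, fuzz=0.15):
    """Label each gap as 'space', 'no_space' or 'fuzzy'.

    Gaps within fuzz * threshold of the threshold are left undecided so
    that word recognition can settle them later.
    """
    labels = []
    for gap in gaps:
        if abs(gap - threshold) <= fuzz * threshold:
            labels.append("fuzzy")
        elif gap > threshold:
            labels.append("space")
        else:
            labels.append("no_space")
    return labels


# Gap widths in pixels, measured between the baseline and the mean line.
print(classify_gaps([2, 9, 11, 20], threshold=10))
```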
26. Hardware Requirement:
System Processors : Core i3, i5, i7
Speed : 2.4 GHz
RAM : 2 GB
Hard Disk : 500 GB
Software Requirement:
Operating system : 64-bit Windows 7, 8, 10
Coding Language : Python
IDE : Spyder and PyCharm
27. Conclusion
The proposed system detects electricity theft using the Google-OCR
method and basic mathematical formulae. The system helps electricity
utilities detect electricity theft and thereby avoid financial loss.
Electricity theft also hampers the functioning of industries and
factories, due to the shortage of power supplied to them.
28. References
Tripathi, Bhasker (26 March 2018). "Now, India is the third largest
electricity producer ahead of Russia, Japan". Business Standard India.
Retrieved 27 September 2019.
"One Nation-One Grid". Power Grid Corporation of India. Retrieved 2
December 2016.
P. Jokar, N. Arianpoo, and V. C. M. Leung, “Electricity theft detection
in AMI using customers’ consumption patterns,” IEEE Trans. Smart
Grid, vol. 7, no. 1, pp. 216-226, Jan. 2016
M. Buzau, J. Aguilera, P. Romero, and A. Expósito, “Detection of Non-
Technical Losses Using Smart Meter Data and Supervised Learning,”
IEEE Trans. Smart Grid, Feb. 2018. [DOI:
10.1109/TSG.2018.2807925]
P. J. Rousseeuw, A. M. Leroy, Robust Regression and Outlier
Detection. (Wiley- IEEE, 2003)