This document presents a method to detect corruption using machine learning and natural language processing. Users provide anonymous feedback about the public services they receive. The feedback is clustered with a static centroid k-means algorithm that groups employees as honest, less honest, or corrupted based on the averages of their responses. The result is an ethical distribution of corruption within an organization that helps identify problematic individuals.
1. Corruption Detection Using Machine Learning and Natural Language Processing
Supervised by
Dr. M.M.A. Hashem
Professor
Dept of CSE
Presented by
Md Zabirul Islam
Roll: 1507110
Anik Pramanik
Roll: 1507103
2. Outline
Objective
Introduction
Methodology
i. Creating Datasets
ii. Data Preprocessing
iii. Clustering Technique
Results
Conclusion & Future work
References
Department of Computer Science and Engineering 1
3. Objective
Corruption in Bangladesh has been a continuing problem [2].
In general, corruption means “the abuse of entrusted power for private gain” [1].
Corruption is especially prevalent in some common sectors, e.g., public services, land administration, tax administration, and customs administration [2].
4. Introduction
In this work, an intelligent system has been developed around a user feedback interface.
When a person receives a service from an organization, he or she can anonymously submit an opinion about a specific employee or about the organization.
Clustering and NLP techniques are then applied to the feedback, and the employees of the organization are sorted according to their corruption level.
5. Flowchart
The service receiver provides two types of opinion.
The feedback is stored in a database.
The proposed clustering and NLP techniques are then executed separately to determine each employee's corruption level.
Figure 1: System Flowchart
6. Methodology
Creating Datasets:
When users visit the evaluation portal to provide their opinion, they get a form to evaluate the employee.
The form contains 5 psychological statements about the employee.
Each statement is rated from 1 to 5 points.
Figure 2: Sample questions for employee A
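The form described above can be modelled as a small record type. This is a minimal sketch, assuming a hypothetical schema in which each submission stores only the employee identifier and the five ratings (no information about the user, which keeps the feedback anonymous); the class and constant names are illustrative, not taken from the original system.

```python
from dataclasses import dataclass
from typing import Tuple

NUM_STATEMENTS = 5              # 5 psychological statements per form
MIN_POINTS, MAX_POINTS = 1, 5   # each statement is rated from 1 to 5 points


@dataclass(frozen=True)
class Feedback:
    """One anonymous form submission about a single employee (hypothetical schema)."""
    employee_id: str
    ratings: Tuple[int, ...]

    def __post_init__(self) -> None:
        # Reject forms that do not answer exactly 5 statements.
        if len(self.ratings) != NUM_STATEMENTS:
            raise ValueError(f"expected {NUM_STATEMENTS} ratings, got {len(self.ratings)}")
        # Reject ratings outside the 1..5 scale.
        for r in self.ratings:
            if not MIN_POINTS <= r <= MAX_POINTS:
                raise ValueError(f"rating {r} outside {MIN_POINTS}..{MAX_POINTS}")
```

For example, `Feedback("A", (4, 5, 3, 4, 5))` is accepted, while a form with only three ratings or a rating of 7 raises `ValueError` before it reaches the database.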
7. Methodology (cont.)
Data Preprocessing:
We store the feedback of all users about employee A.
We preprocess the data by averaging the users' ratings for each question about employee A.
We do the same for the other employees (e.g., B, C, D) and store the results in the database.
Figure 3: Feedback for employee A
Figure 4: Average of each question of employee A
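The preprocessing step averages each question's ratings separately for every employee. The sketch below assumes the stored feedback is available as a list of `(employee_id, ratings)` pairs; the function name `average_per_question` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def average_per_question(
    feedback: Sequence[Tuple[str, Sequence[int]]],
) -> Dict[str, List[float]]:
    """Average the users' ratings of each question, separately for each employee.

    `feedback` holds one (employee_id, ratings) pair per submitted form.
    Returns {employee_id: [avg_q1, ..., avg_q5]}.
    """
    # Group all submitted forms by the employee they evaluate.
    by_employee: Dict[str, List[Sequence[int]]] = defaultdict(list)
    for employee_id, ratings in feedback:
        by_employee[employee_id].append(ratings)

    # zip(*forms) yields one column per question; average each column.
    return {
        emp: [sum(column) / len(column) for column in zip(*forms)]
        for emp, forms in by_employee.items()
    }
```

With two forms for employee A, `(4, 5, 3, 4, 5)` and `(2, 5, 3, 4, 3)`, the result for A is `[3.0, 5.0, 3.0, 4.0, 4.0]`: one 5-dimensional row per employee, which is the shape of the dataset used for clustering.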
8. Methodology (cont.)
Clustering Algorithm:
Our dataset has 5 features (attributes), one for each question (Figure 5).
This complete 5-dimensional dataset is used to separate the corrupted people with the proposed “Static Centroid k-means Clustering” algorithm.
Our proposed static centroid k-means clustering is almost identical to standard k-means clustering.
Figure 5: Data for each employee
9. Methodology (cont.)
The difference is that the centroid values are defined manually and stay fixed; they are not recomputed during clustering.
Before applying static centroid k-means clustering, we execute traditional k-means clustering, which groups the employees into three clusters:
Cluster 1: honest employees
Cluster 2: less honest employees
Cluster 3: corrupted employees
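The traditional k-means step can be sketched as standard Lloyd's iterations (assign each point to its nearest center, then move each center to the mean of its points) with k = 3. This is a minimal pure-Python sketch, not the authors' code: the caller supplies the initial centers, and `name_clusters` assumes that a higher average rating means more honest behaviour, which the slides do not state explicitly.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Sequence[float]
NAMES = ("honest", "less honest", "corrupted")


def kmeans(points: Sequence[Point], initial: Sequence[Point],
           max_iter: int = 100) -> Tuple[List[List[float]], List[int]]:
    """Standard (Lloyd's) k-means with caller-supplied initial centers."""
    centers = [list(c) for c in initial]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center (Euclidean).
        new_labels = [min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
                      for p in points]
        if new_labels == labels:
            break  # no point changed cluster: converged
        labels = new_labels
        # Update step: move each center to the mean of its assigned points.
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old center if a cluster becomes empty
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels


def name_clusters(centers: Sequence[Point]) -> Dict[int, str]:
    """Name clusters by total rating of their center (assumption: higher = more honest)."""
    if len(centers) != len(NAMES):
        raise ValueError("expected exactly 3 clusters")
    order = sorted(range(len(centers)), key=lambda k: sum(centers[k]), reverse=True)
    return {k: NAMES[rank] for rank, k in enumerate(order)}
```

Because plain k-means moves its centers on every run, the resulting groups depend on the data and the initialization; fixing the centers afterwards is what the static centroid variant changes.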
Figure 7: Clustered Data
Figure 8: Static Centers
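With static centroids, classification reduces to the assignment step of k-means with centers that are never updated. The sketch below illustrates this under stated assumptions: the center values are hypothetical placeholders (the actual values in Figure 8 are not given in the text), and Euclidean distance is assumed, as in standard k-means.

```python
import math
from typing import Dict, Sequence

# Hypothetical static centers: placeholders, NOT the values from Figure 8.
STATIC_CENTERS: Dict[str, Sequence[float]] = {
    "honest":      (4.5, 4.5, 4.5, 4.5, 4.5),
    "less honest": (3.0, 3.0, 3.0, 3.0, 3.0),
    "corrupted":   (1.5, 1.5, 1.5, 1.5, 1.5),
}


def classify(averages: Sequence[float],
             centers: Dict[str, Sequence[float]] = STATIC_CENTERS) -> str:
    """Assign one employee's 5 averaged ratings to the nearest fixed center.

    The centers are never recomputed, so the label of a new employee depends
    only on that employee's own averages, not on the rest of the dataset.
    """
    return min(centers, key=lambda name: math.dist(averages, centers[name]))
```

For instance, `classify([5, 4, 5, 4, 5])` returns `"honest"`. Keeping the centers fixed means adding new employees never shifts the labels of existing ones.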
10. Overall Methodology
Figure: Overall methodology (sample questions for employee A → feedback for employee A → average of each question of employee A → data for each employee → clustered data → static centers → ethical distribution of an organization)
11. Expected Result
Using the static centroids, we can easily calculate the class label of a new point by assigning it to the nearest center.
By evaluating the class of every employee, we get a general idea of the ethical standard of the whole organization.
Figure 9: Ethical Distribution of an organization
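The ethical distribution summarizes the class labels of all employees as percentages. A minimal sketch, assuming the labels are the three class names used above (for example, the output of a nearest-center assignment); the function name `ethical_distribution` is hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable

CLASSES = ("honest", "less honest", "corrupted")


def ethical_distribution(labels: Iterable[str]) -> Dict[str, float]:
    """Percentage of employees in each class (classes with no members get 0.0)."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {c: 0.0 for c in CLASSES}
    return {c: 100.0 * counts[c] / total for c in CLASSES}
```

For four employees labelled honest, honest, less honest and corrupted, the distribution is 50% honest, 25% less honest and 25% corrupted.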
12. CONCLUSION AND FUTURE WORK
This model will be effective in society if corrupted people face punishment based on the independent feedback about them.
There is a lot of future scope for this model. The proposed model can be upgraded by adding a comment section.
An automatic email option that notifies the concerned organizations after a specific period can be developed.
A history graph could be generated to track the improvement of employees over time.
13. References
[1] Transparency International e.V. (2018). What is Corruption? [online] Transparency.org. Available at: https://www.transparency.org/what-is-corruption [Accessed 12 May 2018].
[2] Bliss, B. (2018). Bangladesh Corruption Report. [online] Business Anti-Corruption Portal. Available at: https://www.business-anti-corruption.com/country-profiles/bangladesh/ [Accessed 12 May 2018].