This talk is about devising indicators of tax evasion from big data obtained by the Tax Administration. It also presents the path of the project, including problems in data acquisition and data transformation, mutual understanding between researchers from different scientific fields and professions, and the need to include a wide range of profiles and skills in order to obtain good results. Finally, we present ideas for future work, including greater use of machine learning techniques.
Using Big Data Analytics to improve efficiency of tax collection in the tax administration of the Republic of Serbia
1. Introduction Team and skillset Communication and exchange Research approach Risk indicators Conclusions
USING BIG DATA ANALYTICS TO IMPROVE
EFFICIENCY OF TAX COLLECTION IN THE TAX
ADMINISTRATION OF THE REPUBLIC OF SERBIA
Jasna Atanasijević, Dušan Jakovetić, Nataša Krejić,
Nataša Krklec Jerinkić, Dragana Marković
Faculty of Sciences, University of Novi Sad, Serbia
Tax Administration of the Republic of Serbia
DSC 5.0
Belgrade 2019
Contents
Research setting
Example of a risk indicator using big data analytics
Lessons learned
Research setting
Team and skillset
Communication and exchange
Research approach
Team and skillset
Skills
Economics
Database management (Oracle)
Statistics (Stata)
Numerical optimization, algorithms (Matlab)
Big data computing software (Python)
Communication
Administration of tax collection
Taxes regulation
Data science team
Team at the Faculty of Sciences (PMF UNS): 8 persons
(researchers, research assistants, IT staff)
Team in the Tax Administration (TA), Department for risk analysis
and Sector for transformation of the TA: 5 persons
(TA employees)
Communication and exchange
Initialization is important:
It takes time and effort to bring the team to the same level
of understanding of the goal, data, and context →
meetings, meetings, meetings
To understand the data and its information value, it is important
to read the regulations and the existing guidelines for users
To understand the context and the problem (tax evasion):
read the existing studies and discuss with the
users (TA), while respecting their everyday priorities
(which do not include doing research)
Communication and exchange
Regular briefings on progress and feedback
Every 3 months, a joint meeting with the TA team to report on
progress, get feedback, inform hypotheses with practitioners'
experience, and ensure the dataset is used properly (there are always
some caveats that can be clarified only by practitioners who
"live close to the field and the data")
Exchange and consultations within the research team - the right way
to fruitful ideas
Research approach
Understanding the context and the problem (various
sources of insight: literature, discussions, news)
Tax evasion is high in Serbia, as reflected in a shadow
economy estimated at about 15% to 20% of GDP
The highest level of grey economy is related to personal
income tax - it shows up as undeclared labour or partly
declared earnings (part of the pay is unregistered and paid in cash)
Research approach
Database transformation for analytical usage
Transforming the original dataset (at the level of tax
declaration) to the analytical table (at the level of individual
monthly income by type of income and by payer of income)
Transforming the set of attributes referring to the fields in
the tax declaration to a meaningful list of variables of
interest for analysis (sum of all monthly income by person
from different tranches of salary or different sources of
income, delay in payment etc.)
A large effort, but a one-off investment for the duration of the
research.
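The reshaping described on this slide can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's actual pipeline: the record layout and the field names (`person_id`, `payer_id`, `month`, `income_type`, `amount`) are hypothetical, and the rows are made up.

```python
from collections import defaultdict

# Hypothetical declaration-level rows: one row per income item on a tax declaration.
declarations = [
    {"person_id": "P1", "payer_id": "C1", "month": "2018-01", "income_type": "salary", "amount": 30000.0},
    {"person_id": "P1", "payer_id": "C1", "month": "2018-01", "income_type": "salary", "amount": 12000.0},
    {"person_id": "P1", "payer_id": "C2", "month": "2018-01", "income_type": "contract", "amount": 8000.0},
    {"person_id": "P2", "payer_id": "C1", "month": "2018-01", "income_type": "salary", "amount": 25000.0},
]


def to_analytical_table(rows):
    """Aggregate to one record per (person, month, income type, payer)."""
    table = defaultdict(float)
    for r in rows:
        key = (r["person_id"], r["month"], r["income_type"], r["payer_id"])
        table[key] += r["amount"]
    return dict(table)


def monthly_income_by_person(table):
    """Derived variable: total monthly income per person over all sources."""
    totals = defaultdict(float)
    for (person, month, _income_type, _payer), amount in table.items():
        totals[(person, month)] += amount
    return dict(totals)


analytical = to_analytical_table(declarations)
totals = monthly_income_by_person(analytical)
```

Keeping the aggregation in two steps mirrors the slide: first the change of granularity, then the derived variables of interest.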
Research approach
Database description
Working with samples (of about 1 million lines) is more "handy",
but selecting the appropriate type of sample is always a
challenge (e.g. random, a whole business sector, time
series for a set of firms, a section by time of input, etc.)
Forming hypotheses based on identified "deviations", always
with reference to some theoretical framework (e.g. why people
pay taxes; theory of income and wealth distribution; wage
equation, etc.)
Research approach
Selecting the potential risk indicators to develop
Indicators are based on evidence on the deviation of
behavior from the expected pattern (relying on theory and
experience)
Intuition helps a lot!
It is very useful to test the idea and set priority indicators for
development with practitioners (TA team)
Mathematical modeling (on sample) and computation on
the entire population (example follows)
Testing in practice (tax control based on a priority list
obtained using the risk indicator(s)) →
can be done in parallel for several different indicators.
Data distribution
$M$ - monthly income of an individual
Categories of income:
$(22000, 23000], (23000, 24000], \dots, (150000, 151000]$
In general, $B_i = ((i-1) \cdot 1000,\ i \cdot 1000]$
$n_i$ - number of individuals in $B_i$
Empirical probability:
$$P(M \in B_i) = \frac{n_i}{n}, \quad \text{where } n = \sum_{i=23}^{151} n_i.$$
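The empirical probabilities on this slide can be computed with a short sketch like the one below. It assumes incomes are plain numbers; the function names are illustrative, the sample incomes are made up, and incomes outside the categories 23 to 151 are simply ignored (an assumption, since the slide does not say how they are handled).

```python
import math
from collections import Counter

FIRST_BIN, LAST_BIN = 23, 151  # B_23 = (22000, 23000], ..., B_151 = (150000, 151000]


def bin_index(income):
    """Index i such that income lies in B_i = ((i - 1) * 1000, i * 1000]."""
    return math.ceil(income / 1000)


def empirical_distribution(incomes):
    """Return the counts n_i and the empirical probabilities n_i / n."""
    counts = Counter()
    for m in incomes:
        i = bin_index(m)
        if FIRST_BIN <= i <= LAST_BIN:  # assumed: incomes outside the range are ignored
            counts[i] += 1
    n = sum(counts.values())
    if n == 0:
        raise ValueError("no income falls into the observed categories")
    probs = {i: counts[i] / n for i in range(FIRST_BIN, LAST_BIN + 1)}
    return counts, probs


# Made-up incomes, for illustration only.
counts, probs = empirical_distribution([22500, 23000, 23001, 45000, 150500, 10000])
```

Using `math.ceil` keeps the intervals closed on the right, as in the definition of the categories.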
Number of individuals $n_i$ vs. category $B_i$
[Figure: empirical number of individuals per category; x-axis: category index $i$; y-axis: number of individuals]
Theoretical distribution
Previous results → log-normal distribution
$M \sim \log N(\mu, \sigma)$
Data → parameters $\mu$, $\sigma$
Theoretical probability
$p_i := P(M \in B_i)$
Scaling with $n$ → theoretical number of individuals in $B_i$
$m_i := p_i n$
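A minimal sketch of the theoretical counts follows. The slide does not say how the parameters are estimated from the data; here they are taken, as an assumption, to be the mean and standard deviation of the log-incomes. The probability of a category is obtained from the log-normal CDF evaluated at its two endpoints. All function names and the sample incomes are illustrative.

```python
import math
import statistics


def fit_lognormal(incomes):
    """Estimate (mu, sigma) as mean and std of log-incomes (an assumed choice)."""
    logs = [math.log(m) for m in incomes]
    return statistics.fmean(logs), statistics.pstdev(logs)


def lognormal_cdf(x, mu, sigma):
    """P(M <= x) for ln(M) normally distributed with mean mu and std sigma."""
    z = (math.log(x) - mu) / (sigma * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))


def theoretical_counts(mu, sigma, n, first_bin=23, last_bin=151):
    """m_i = p_i * n with p_i = P(M in B_i), B_i = ((i - 1) * 1000, i * 1000]."""
    m = {}
    for i in range(first_bin, last_bin + 1):
        upper = lognormal_cdf(i * 1000, mu, sigma)
        lower = lognormal_cdf((i - 1) * 1000, mu, sigma)
        m[i] = (upper - lower) * n
    return m


# Made-up incomes, for illustration only.
sample = [25000, 32000, 41000, 58000, 76000, 120000]
mu_hat, sigma_hat = fit_lognormal(sample)
m_theory = theoretical_counts(mu_hat, sigma_hat, n=len(sample))
```

The values `m_theory[i]` can then be compared with the empirical counts category by category.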
Theoretical distribution
[Figure: theoretical number of individuals per category; x-axis: category index $i$; y-axis: number of individuals]
Disagreement
[Figure: empirical vs. theoretical number of individuals per category; x-axis: category index $i$; y-axis: number of individuals]
Around 22% of individuals
[Figure: number of individuals per category; x-axis: category index $i$; y-axis: number of individuals]
Sectors
[Figure: number of salaries by salary level (axis from 0K to 150K), in four panels: Construction Sector; Financial and Insurance Sector; Beverage and Food Production; Computer Programming]
Risk indicators
Observe only companies with 10 or more employees
Observe a company’s monthly income distribution
Measure deviation from the expected (benchmark)
distribution
Benchmark - business industry
ρ1 - deviation for a fixed month
ρ2 - deviation for a certain time period
Risk indicator ρ1
Start from the minimum salary, $E_0 = 15000$
Form bins (categories)
$B_i = [E_{i-1}, E_i)$
Bin width
$l_i = E_i - E_{i-1}$
$l_1 = 1500 \rightarrow B_1 = [15000, 16500)$
Increase the bin width by 10% →
$l_{i+1} = 1.1\, l_i$
27 bins:
$[15000, 16500), [16500, 18150), \dots, [178773, 196650), [196650, \infty)$
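The bin edges can be generated with a few lines of code. Because the widths form a geometric sequence with $l_1 = 0.1\,E_0$, the edges satisfy $E_i = 15000 \cdot 1.1^i$, which reproduces the listed values 178773 ($i = 26$) and 196650 ($i = 27$) after rounding. The function and its parameter names are illustrative; `n_finite` is a hypothetical parameter giving the number of finite bins generated, and the open-ended last bin is not represented in the returned list.

```python
def salary_bin_edges(e0=15000.0, first_width=1500.0, growth=1.1, n_finite=27):
    """Edges E_0 < E_1 < ... with l_1 = first_width and l_{i+1} = growth * l_i."""
    edges = [e0]
    width = first_width
    for _ in range(n_finite):
        edges.append(edges[-1] + width)
        width *= growth
    return edges


edges = salary_bin_edges()
rounded = [round(e) for e in edges]
```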
Risk indicator ρ1
Benchmark distribution
Observe the entities within a business industry
$D_i$ - number of individuals in category $B_i$
Number of relevant observed entities
$$D = \sum_{i=1}^{27} D_i$$
Form
$$d = (d_1, \dots, d_{27})^T, \quad \text{where } d_i = \frac{D_i}{D} \approx P(M \in B_i)$$
Risk indicator ρ1
Company’s distribution
Observe the entities within a company
$A_i$ - number of individuals in category $B_i$
Number of relevant observed entities
$$A = \sum_{i=1}^{27} A_i$$
Form
$$a = (a_1, \dots, a_{27})^T, \quad \text{where } a_i = \frac{A_i}{A}$$
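The benchmark vector and the company vector are built the same way: count salaries per bin, then divide by the total. The sketch below is illustrative and uses a toy set of three bins rather than the full 27. It assumes the last bin is open-ended and that salaries below the first edge are skipped; function names and data are hypothetical.

```python
from bisect import bisect_right


def bin_counts(salaries, edges):
    """Count salaries per bin [edges[k], edges[k+1]); the last bin is [edges[-1], inf)."""
    counts = [0] * len(edges)
    for s in salaries:
        k = bisect_right(edges, s) - 1
        if k >= 0:  # assumed: salaries below edges[0] are skipped
            counts[k] += 1
    return counts


def shares(counts):
    """Normalise counts to shares, e.g. a_i = A_i / A or d_i = D_i / D."""
    total = sum(counts)
    if total == 0:
        raise ValueError("no observations in any bin")
    return [c / total for c in counts]


# Toy edges and salaries, for illustration only.
toy_edges = [15000, 16500, 18150]
industry = [15000, 15500, 16499, 16500, 17000, 20000, 14000]
company = [15200, 15900, 16000, 19000]
d = shares(bin_counts(industry, toy_edges))
a = shares(bin_counts(company, toy_edges))
```

`bisect_right` places a salary equal to an edge into the bin that starts at that edge, matching the half-open intervals $[E_{i-1}, E_i)$.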
Risk indicator ρ1
Measures the deviation of $a$ from $d$.
Many possible choices: $\|a - d\|_2$, $\|a - d\|_1$, ...
Example:
$$\|a - d\|_1 = \sum_{i=1}^{27} |a_i - d_i|$$
Nonuniform treatment of categories → weighted norm
$$\sum_{i=1}^{27} w_i |a_i - d_i|$$
Putting more weight on deviations in smaller salaries:
$$\rho_1 = \sum_{i=1}^{27} \frac{|a_i - d_i|}{E_i^2}$$
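As a small worked sketch, the plain 1-norm and the weighted indicator can be written as below. The weights $1/E_i^2$ use the upper edge $E_i$ of each bin, following $B_i = [E_{i-1}, E_i)$; the function names and the two-bin example are illustrative only.

```python
def l1_deviation(a, d):
    """Unweighted deviation: sum_i |a_i - d_i|."""
    return sum(abs(ai - di) for ai, di in zip(a, d))


def rho1(a, d, upper_edges):
    """rho_1 = sum_i |a_i - d_i| / E_i**2, with E_i the upper edge of bin B_i."""
    if not (len(a) == len(d) == len(upper_edges)):
        raise ValueError("a, d and upper_edges must have the same length")
    return sum(abs(ai - di) / e ** 2 for ai, di, e in zip(a, d, upper_edges))


# Two-bin toy example: the same absolute deviation (0.25) in both bins.
a_toy = [0.5, 0.5]
d_toy = [0.25, 0.75]
plain = l1_deviation(a_toy, d_toy)           # 0.25 + 0.25
weighted = rho1(a_toy, d_toy, [1.0, 2.0])    # 0.25 / 1 + 0.25 / 4
```

In the toy example the lower bin contributes four times as much as the upper one, which is the intended emphasis on deviations at smaller salaries.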
Risk indicator ρ2
Including the time component →
$$\rho_2 = \frac{1}{m} \sum_{j=1}^{m} \rho_1(j)$$
$\rho_1(j)$ - risk indicator for month $j$
Risk indicators → comparable
Calculate risk indicators for all sectors
Rank the companies by risk indicator $\rho$
Risk categories: first 33% - high; last 33% - low; the rest -
medium.
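The averaging over months and the split into risk categories can be sketched as follows. Two points are assumptions rather than statements from the slide: companies are ranked from the largest indicator to the smallest, so that "first 33%" means the largest deviations, and the size of each third is rounded down. Function names and the example values are hypothetical.

```python
def rho2(monthly_rho1):
    """Average of rho_1(j) over the m observed months."""
    if not monthly_rho1:
        raise ValueError("at least one month is required")
    return sum(monthly_rho1) / len(monthly_rho1)


def risk_categories(rho_by_company):
    """Rank by decreasing rho; top third 'high', bottom third 'low', rest 'medium'."""
    ranked = sorted(rho_by_company, key=rho_by_company.get, reverse=True)
    n = len(ranked)
    third = n // 3  # assumed rounding rule; the slide only says 33%
    labels = {}
    for pos, company in enumerate(ranked):
        if pos < third:
            labels[company] = "high"
        elif pos >= n - third:
            labels[company] = "low"
        else:
            labels[company] = "medium"
    return labels


# Made-up indicator values, for illustration only.
example = {"A": 6.0, "B": 5.0, "C": 4.0, "D": 3.0, "E": 2.0, "F": 1.0}
labels = risk_categories(example)
```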
High risk
[Figure: high-risk category]
Medium risk
[Figure: medium-risk category]
Low risk
[Figure: low-risk category]
Lessons learned
There is no such person as a DATA SCIENTIST: there is a
data science team
Communication, learning, and exchange of opinions and
insights are essential for achieving good data analytics
results
Since the dataset was not created for the specific goal of
the research, it is crucial to understand the
principal purpose of and rules behind the database (tax
regulation, tax behaviour in practice, and the experience
of TA employees)
Relying on existing theoretical and empirical findings when
setting hypotheses (on tax-related behaviour) is very helpful
in conducting research on big datasets