Models15

A Model-Based Framework for Probabilistic Simulation of Legal Policies
Ghanem Soltana, Nicolas Sannier, Mehrdad Sabetzadeh, and Lionel Briand
SnT Centre for Security, Reliability and Trust
University of Luxembourg, Luxembourg

How did this work come about?
• Collaboration with the Government of Luxembourg
  - CTIE: Government’s IT Centre
  - ACD: Tax Administration Department
• New tax system under development
• Develop tailored solutions for decision support and software verification

Context
[Figure: first page of an earlier paper]
Using UML for Modeling Procedural Legal Rules: Approach and a Study of Luxembourg’s Tax Law
Ghanem Soltana, Elizabeta Fourneret, Morayo Adedjouma, Mehrdad Sabetzadeh, and Lionel Briand
SnT Centre for Security, Reliability and Trust, University of Luxembourg
{firstname.lastname}@uni.lu
Abstract. Many laws, e.g., those concerning taxes and social benefits, need to be operationalized and implemented into public administration procedures and eGovernment applications. Where such operationalization is warranted, the legal frameworks that interpret the underlying

Context
[Figure: models of legal policies and simulation data are the input to the simulation. The simulation data is optional and can be generated from the models. The simulation yields the impact of legal policy changes on variables of interest. Example plots: percentage of households and contribution to revenue by gross annual income (in Euros, from 0-10.000 up to >1.000.000); percentage of households by annual income taxes due (in Euros, from 0 up to >30.000), before change vs. after change.]

Objectives
• Simulating the impact of legal policy changes
• Enabling simulation even when simulation data is not available
[Figure: same overview as on the preceding Context slide: models of legal policies and (optional, possibly generated) simulation data are the input to the simulation, which yields the impact of legal policy changes on variables of interest.]

Legal policy simulation in practice
Some existing simulation tools focused on taxation and social security:
• ASSERT: Assessing the effects of reforms in taxation
• SYSIFF: A micro-simulation model for the French tax system
• POLIMOD: A national static tax-benefit model for the UK
• EUROMOD: European benefit-tax model and social integration

EUROMOD example
[Screenshot of EUROMOD; recoverable labels: Dependent age range, Dependent count]

Limitations of current simulation frameworks
• Legal policies are hard to validate
• Single-purpose models
• Unusable when simulation data is not available

Desiderata
• Legal policies should be captured in a precise and yet easy-to-understand manner
• Automated simulation/analysis should be possible even when data is not available

Working assumptions
• Legal policies are derived from prescriptive laws
  - Taxation and social benefits
• No change in human behavior due to legal policy modifications

Our policy simulation framework
[Figure: framework overview]
1. Model legal policies: relevant legal texts → domain model and policy models
2. Annotate domain model with probabilities: domain model → annotated domain model
3. Generate simulation data: annotated domain model → generated simulation data
4. Perform simulation: policy models and simulation data → simulation results
Is simulation data available? Yes: use the existing simulation data. No: go through steps 2 and 3.

Legal policy models
• A legal policy model captures the procedure envisaged by law for performing a certain activity
• Notation: Extended Activity Diagrams (ADs)
• Facilitates communication between legal and IT experts
ADs: expressive, visual, precise, executable
[Soltana et al., 2014]

Excerpt from the income tax law
Art. 105bis […] The commuting expenses deduction is defined as a function over the distance between the principal towns of the municipalities of a taxpayer's home and his place of work.
The distance is measured in units of distance expressing the kilometric distance between [principal] towns. A ministerial regulation provides these distances.
The amount of the deduction is calculated as follows:
• If the distance exceeds 4 units but is less than 30 units, the deduction is 99€ per unit of distance.
• The first 4 units are not taken into account and the deduction for a distance exceeding 30 units is limited to 2,574€.
* Translation from French text
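The rule in the excerpt can be sketched as a small function. This is an illustrative reading, not the activity diagram used in the framework: it assumes that the 99€ rate applies only to the units beyond the fourth, which agrees with the stated limit (99 × 26 = 2,574). The function name is hypothetical.

```python
def commuting_deduction(distance_units: int) -> int:
    """Commuting expenses deduction in euros, for a distance in units.

    Assumed reading of Art. 105bis: the first 4 units are ignored,
    each further unit counts for 99 euros, and the total is capped.
    """
    rate_per_unit = 99   # euros per counted unit (from the excerpt)
    exempt_units = 4     # first 4 units are not taken into account
    cap = 2574           # limit for distances of 30 units and more
    counted_units = max(0, distance_units - exempt_units)
    return min(rate_per_unit * counted_units, cap)


for distance in (3, 10, 30, 45):
    print(distance, commuting_deduction(distance))
```

A distance of 10 units gives 6 counted units, hence 594€; from 30 units on, the deduction stays at 2,574€.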

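Example policy model
[Figure: example policy model (activity diagram), shown on three slides with the following callouts: the procedure defined by the legal policy; inputs from the legal policy; inputs from the simulation data, taken from the (partial) domain model.]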

Simulation framework overview
[Figure: the framework overview diagram shown earlier (model legal policies, annotate domain model with probabilities, generate simulation data, perform simulation).]

Related work on instance generation
• Exhaustive search:
- UML2CSP [Cabot et al., 2014]
- Alloy [Jackson, 2009]
• Non-exhaustive techniques:
- Metaheuristic-search [Ali et al., 2013]
- Predefined patterns [Gogolla et al., 2005]
- Mutation analysis [Di Nardo et al., 2015]
- Configurable random generation [Hartmann et al., 2014]

Limitations in existing work
Existing techniques cannot generate data that is suitable for our analysis needs
Limitations: scalability, representativeness

Our solution to generate simulation data
Random generation, guided by a profile for capturing probabilistic characteristics of the real population
• Random generation addresses the scalability limitation
• The profile addresses the representativeness limitation

Relative frequencies
* Source: STATEC, Luxembourg
60% of income types are Employment, 20% are Pension,
and the remaining 20% are Other
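A minimal sketch of how such relative frequencies could drive random generation, using only the Python standard library. The 60/20/20 split is the one on the slide; the function name and the sample size are assumptions for illustration.

```python
import random
from collections import Counter

# Relative frequencies of income types, as stated on the slide.
INCOME_TYPE_FREQUENCIES = {"Employment": 0.60, "Pension": 0.20, "Other": 0.20}


def sample_income_types(n, rng):
    """Draw n income types according to the relative frequencies."""
    types = list(INCOME_TYPE_FREQUENCIES)
    weights = list(INCOME_TYPE_FREQUENCIES.values())
    return rng.choices(types, weights=weights, k=n)


rng = random.Random(42)  # fixed seed so the run is repeatable
income_sample = sample_income_types(10_000, rng)
counts = Counter(income_sample)
observed = {t: counts[t] / len(income_sample) for t in INCOME_TYPE_FREQUENCIES}
print(observed)  # observed shares should be close to 0.60 / 0.20 / 0.20
```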

Histograms
[Class diagram: class TaxPayer with the annotated attribute below]
«from histogram» birthYear: Integer [1]
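One way to draw an attribute value from a histogram is to pick a bin according to its relative frequency and then pick a value uniformly inside that bin. The sketch below assumes this two-step scheme; the bins and their weights are hypothetical and do not come from the slides.

```python
import random

# Hypothetical histogram for birthYear: (first year, last year, relative frequency).
BIRTH_YEAR_BINS = [
    (1930, 1949, 0.15),
    (1950, 1969, 0.35),
    (1970, 1989, 0.40),
    (1990, 1997, 0.10),
]


def sample_birth_year(rng):
    """Pick a bin by its weight, then a year uniformly within the bin."""
    weights = [weight for _, _, weight in BIRTH_YEAR_BINS]
    first, last, _ = rng.choices(BIRTH_YEAR_BINS, weights=weights, k=1)[0]
    return rng.randint(first, last)  # both bounds are inclusive


rng = random.Random(1)
birth_years = [sample_birth_year(rng) for _ in range(1000)]
print(min(birth_years), max(birth_years))
```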

Distributions
[Figure: distribution specified through an OCL query]
* Source: Synthesized data

Probabilistic multiplicities
[Class diagram: TaxPayer (abstract), role "taxpayer" with multiplicity 1, associated with Income, role "incomes" with multiplicity 1..*]
«multiplicity» {relativeTo: Income; source: «from barchart»}
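A probabilistic multiplicity can be read as a distribution over the number of linked objects. The sketch below draws the number of incomes per taxpayer from a bar chart; the bar chart values and all names are hypothetical, chosen only to respect the 1..* lower bound.

```python
import random

# Hypothetical bar chart: number of incomes per taxpayer -> relative frequency.
INCOMES_PER_TAXPAYER = {1: 0.75, 2: 0.20, 3: 0.05}


def sample_income_count(rng):
    """Draw how many Income objects to attach to one taxpayer."""
    counts = list(INCOMES_PER_TAXPAYER)
    weights = list(INCOMES_PER_TAXPAYER.values())
    return rng.choices(counts, weights=weights, k=1)[0]


rng = random.Random(7)
taxpayers = []
for taxpayer_id in range(1000):
    income_count = sample_income_count(rng)
    incomes = [f"income-{taxpayer_id}-{i}" for i in range(income_count)]
    taxpayers.append({"id": taxpayer_id, "incomes": incomes})

print(sum(len(t["incomes"]) for t in taxpayers) / len(taxpayers))
```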

Conditional probabilities
[Class diagram: TaxPayer (abstract), role "taxpayer" with multiplicity 1, associated with Income, role "incomes" with multiplicity 1..*]
«type dependency» {relativeTo: Income; condition: self.getAge() >= 60; source: «from barchart»}
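The condition self.getAge() >= 60 suggests that the income type is drawn from a different bar chart when the condition holds. A minimal sketch under that assumption follows; both bar charts are hypothetical, and the Python function stands in for the OCL condition.

```python
import random

# Hypothetical bar charts; neither comes from the slides.
INCOME_TYPES_AGE_60_PLUS = {"Employment": 0.15, "Pension": 0.75, "Other": 0.10}
INCOME_TYPES_UNDER_60 = {"Employment": 0.80, "Pension": 0.05, "Other": 0.15}


def sample_income_type(age, rng):
    """Choose the bar chart from the condition, then draw an income type."""
    chart = INCOME_TYPES_AGE_60_PLUS if age >= 60 else INCOME_TYPES_UNDER_60
    return rng.choices(list(chart), weights=list(chart.values()), k=1)[0]


rng = random.Random(3)
print(sample_income_type(72, rng), sample_income_type(35, rng))
```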

Consistency constraints
The sound application of the profile’s stereotypes is enforced by several consistency constraints:
• Completeness of the probabilistic information
• Well-formedness of the probabilistic information
• Mutually exclusive application of certain stereotypes
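The slides do not spell out the constraints, so the following is only one plausible reading of the first two: each probability lies in [0, 1] (well-formedness) and the probabilities of a distribution sum to 1 (completeness). The function name and the messages are hypothetical.

```python
import math


def check_distribution(name, probabilities):
    """Return a list of problems found in one category -> probability mapping."""
    if not probabilities:
        return [f"{name}: no probabilistic information given"]
    problems = []
    # Assumed well-formedness check: every value is a probability.
    for category, p in probabilities.items():
        if not 0.0 <= p <= 1.0:
            problems.append(f"{name}: probability of {category} is outside [0, 1]")
    # Assumed completeness check: the categories cover the whole population.
    total = sum(probabilities.values())
    if not math.isclose(total, 1.0, abs_tol=1e-9):
        problems.append(f"{name}: probabilities sum to {total}, expected 1")
    return problems


print(check_distribution("income type", {"Employment": 0.6, "Pension": 0.2, "Other": 0.2}))
print(check_distribution("income type", {"Employment": 0.7, "Pension": 0.2}))
```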


Fully automated data generation
[Figure: data generation process]
Inputs: policy models (set), annotated domain model, simulation unit (class), sample size
1. Slice domain model → slice model
2. Identify traversal order → traversal order
3. Classify path segments → segments classification
4. Instantiate slice model → simulation data (instance of slice model)


Simulation process
[Figure: simulation code is generated from the activity diagram(s) (legal rules) and the domain model, for the original and modified sets of legal policies. The simulator runs this code over the simulation data and produces simulation results, which are visualized and analyzed. The analysis provides feedback on the models.]
Research questions
• RQ1: Do data generation and simulation run in reasonable time?
• RQ2: Does our data generator produce data that is consistent with the specified characteristics of the population?
• RQ3: Are the results of different data generation runs consistent (up to random variation)?

Case study
• Models for personal income taxes were created (domain model + policy models)
• Six representative policy models were selected (out of 18 policy models)
• All models used in this evaluation were validated by legal experts

Probabilistic information
Statistic          Description
Age                Distribution of taxpayers by age
Income type        Relative distribution of different income types (employment, agriculture, business and trade, etc.)
Income range       Distribution of the annual income ranges for taxpayers
Invalidity rate    Percentage of invalid taxpayers
Invalidity type    Relative distribution of different invalidity types
Residence status   Relative distribution of resident versus non-resident taxpayers
…
15 distributions (from census and synthesized data) were used to specify the characteristics of Luxembourg’s population
* Source: STATEC, Luxembourg

RQ1: Do data generation and simulation run in reasonable time?
Results for the generator
[Plot: execution time (in minutes, axis from 0 to 30) against the number of generated tax cases (0 to 10k), with one curve per set of policy models: ID; ID + CIS; ID + CIS + PE; ID + CIS + PE + FD; ID + CIS + PE + FD + LD; ID + CIS + PE + FD + LD + CIP]
- Deduction for invalidity (ID)
- Credit for salaried workers (CIS)
- Deduction for permanent expenses (PE)
- Deduction for commuting expenses (FD)
- Deduction for long-term debts (LD)
- Credits for pensioners (CIP)

RQ1: Do data generation and simulation run in reasonable time?
Results for the simulator
[Plot not recoverable from the extraction; only the legend survives]
- Deduction for invalidity (ID)
- Credit for salaried workers (CIS)
- Deduction for permanent expenses (PE)
- Deduction for commuting expenses (FD)
- Deduction for long-term debts (LD)
- Credits for pensioners (CIP)

RQ2: Does our data generator produce data that is consistent with the specified characteristics?
The generated sample becomes representative at sizes above 2,000 units

RQ3: Are the results of different data generation runs consistent?
• 5 samples of 5000 tax cases
• Pairwise comparison of the generated samples using the Kolmogorov-Smirnov test
No counter-evidence that the samples come from different populations
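The pairwise comparison can be sketched with a plain two-sample Kolmogorov-Smirnov statistic, written here with the standard library only. The samples are synthetic stand-ins (normally distributed values), not the generated tax cases, and the critical value uses the usual large-sample approximation at the 5% level (coefficient 1.358).

```python
import bisect
import itertools
import math
import random


def ks_statistic(sample_a, sample_b):
    """Largest gap between the two empirical distribution functions."""
    a_sorted, b_sorted = sorted(sample_a), sorted(sample_b)
    largest_gap = 0.0
    for x in a_sorted + b_sorted:
        cdf_a = bisect.bisect_right(a_sorted, x) / len(a_sorted)
        cdf_b = bisect.bisect_right(b_sorted, x) / len(b_sorted)
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap


def ks_critical_value(n, m, c_alpha=1.358):
    """Large-sample critical value; 1.358 corresponds to alpha = 0.05."""
    return c_alpha * math.sqrt((n + m) / (n * m))


rng = random.Random(2015)
# Stand-in data: 5 samples of 5000 values drawn from the same distribution.
samples = [[rng.gauss(40_000, 15_000) for _ in range(5000)] for _ in range(5)]

results = []
for (i, a), (j, b) in itertools.combinations(enumerate(samples), 2):
    d = ks_statistic(a, b)
    results.append((i, j, d, d > ks_critical_value(len(a), len(b))))

for i, j, d, rejected in results:
    print(f"samples {i} and {j}: D = {d:.4f}, different at 5% level: {rejected}")
```

Five samples give ten pairs. A pair counts as counter-evidence only when its statistic exceeds the critical value.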

Ongoing work
• Decision-support for the Government’s actual tax reforms
• Evaluating the accuracy of the simulation results
[Plot: percentage of taxpayers in Tax class 1, Tax class 1.a, and Tax class 2, before change vs. after change]
[Plot: percentage of households by annual decrease / increase in taxes due (in Euros), from less taxes to pay to more taxes to pay, in ranges from 0 up to >21.001 on each side]

Summary
• Model-based simulation framework for legal policies
• A profile for expressing probabilistic characteristics of a population
• An automated stochastic data generator
• Preliminary evaluation of scalability, representativeness, and reproducibility is promising
• Applied to assess actual tax reforms

Model sizes
• The domain model has: 64 classes, 43 generalizations, 344 attributes, and 53 associations
• The six policy models have an average of 35 elements

Path segments classification illustration
[Figure: example model with a sample unit and elements labeled 1 to 3; path segments are marked as Safe or Unsafe]

Traversal order illustration
[Figure: example model with a sample unit and elements labeled 1 to 3, indicating the traversal order]

Simulation results
Taxpayer                  AEP (old)  AEP (new)  Old Tax Class  New Tax Class  Income Type  Gross     Taxable  Taxes (new)  Taxes (old)
Resident_Tax_Payer 1      0          0          One_A          One_A          Other        21535,32  19150    0            0
Resident_Tax_Payer 2      0          0          Two            One            Pension      21588     21550    1218         0
Non_Resident_Tax_Payer 3  0          0          Two            Two            Employment   21600     19200    0            0
Resident_Tax_Payer 4      0          0          Two            One            Employment   21600     19200    790          14124 (with spouse)
Resident_Tax_Payer 5      4500       0          Two            One_A          Employment   21600     19200    0            3146 (with spouse)
Resident_Tax_Payer 6      0          0          Two            One            Employment   21612     19200    790          10283 (with spouse)
…
[Plot: percentage of households (0% to 50%) per range from 0 up to >30.000, before change vs. after change]
