Eric Torreborre / FP-Syd
Data generation
The hard parts
NOT SO SIMPLE
Recursive data structures
Polymorphic functions
Constrained data
Recursive data structures
Agile Estimating and Planning
Agile Management
Agile Product Management with Scrum
Agile Product Planning and Analysis
Agile Project Management: Creating Innovative Products
Agile Project Management For Dummies
Agile Software Development, Principles, Patterns, and Practices
Agile Software Development with Scrum
Agile Software Development with Distributed Teams
Agile Testing
Agile
+ Estimating and Planning
+ Management
+ Product
+ Management with Scrum
+ Planning and Analysis
+ Project Management
+ Creating Innovative Products
+ For Dummies
+ Software Development
+ Principles, Patterns, and Practices
+ with Scrum
+ with Distributed Teams
+ Testing
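The tree above is a prefix tree over the titles, which is a recursive data structure. Here is a minimal sketch of how such a tree could be built, assuming titles are split word by word (the slide groups by phrase instead) and using hypothetical helper names `insert`, `build` and `size`:

```python
from typing import Dict, List

# A prefix tree is recursive: each node maps a word to a sub-tree.
Trie = Dict[str, "Trie"]

def insert(trie: Trie, title: str) -> None:
    """Insert a title word by word, sharing common prefixes."""
    node = trie
    for word in title.split():
        node = node.setdefault(word, {})

def build(titles: List[str]) -> Trie:
    """Build a prefix tree from a list of titles."""
    trie: Trie = {}
    for title in titles:
        insert(trie, title)
    return trie

def size(trie: Trie) -> int:
    """Number of nodes below the (implicit) root."""
    return sum(1 + size(child) for child in trie.values())

titles = [
    "Agile Estimating and Planning",
    "Agile Management",
    "Agile Product Management with Scrum",
    "Agile Product Planning and Analysis",
]
trie = build(titles)
```

Generating data of this shape means generating arbitrarily nested nodes, which is where the questions of depth and width come from.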
Generate trees
Depth?
Width?
Balanced?
Coverage?
Composition?
Uniformity?
Constraints?
Performance?
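To make these questions concrete, here is a sketch of the naive approach: pick a width at random and recurse, with a hard depth limit to guarantee termination. This is an illustrative assumption, not code from the talk; `Tree`, `gen_tree`, `max_depth` and `max_width` are hypothetical names.

```python
import random
from dataclasses import dataclass, field
from typing import List

@dataclass
class Tree:
    children: List["Tree"] = field(default_factory=list)

def gen_tree(rng: random.Random, max_depth: int, max_width: int) -> Tree:
    """Naive generator: choose a width uniformly, recurse until max_depth is 0."""
    if max_depth == 0:
        return Tree()  # forced leaf: without this bound the recursion may not end
    width = rng.randint(0, max_width)
    return Tree([gen_tree(rng, max_depth - 1, max_width) for _ in range(width)])

def size(t: Tree) -> int:
    """Number of nodes in the tree."""
    return 1 + sum(size(c) for c in t.children)

def depth(t: Tree) -> int:
    """Number of levels; a single node has depth 1."""
    return 1 + max((depth(c) for c in t.children), default=0)

rng = random.Random(42)
samples = [gen_tree(rng, max_depth=4, max_width=3) for _ in range(1000)]
sizes = [size(t) for t in samples]
```

The bounds keep generation finite, but they say nothing about balance, coverage or uniformity: the distribution of `sizes` is an accident of the two parameters rather than something chosen.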
Well-typed programs
Size and dimension
Boltzmann model
Combinatorial species
Same size bound
505 values on average for n=100
18 constructors
P = 1/8^9
P = 1/4
P = 1/9
generating function
System of equations
solution + singularity
size in O(n)
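As an illustration of the Boltzmann approach, here is a sketch for one specific data type chosen for this example: binary trees B = 1 + X × B × B, where size counts internal nodes. Its generating function satisfies B(x) = 1 + x B(x)^2, with singularity at x = 1/4. The names `gf`, `sample`, `probability` and `trees` are hypothetical.

```python
import math
import random

def gf(x: float) -> float:
    """B(x) = (1 - sqrt(1 - 4x)) / (2x), valid for 0 < x <= 1/4."""
    return (1 - math.sqrt(1 - 4 * x)) / (2 * x)

def sample(rng: random.Random, x: float):
    """Boltzmann sampler: a leaf is None, a node is a pair of subtrees."""
    if rng.random() < 1 / gf(x):   # leaf with probability 1 / B(x)
        return None
    return (sample(rng, x), sample(rng, x))

def size(t) -> int:
    """Number of internal nodes."""
    return 0 if t is None else 1 + size(t[0]) + size(t[1])

def probability(t, x: float) -> float:
    """Probability that sample() returns exactly the tree t."""
    if t is None:
        return 1 / gf(x)
    return (1 - 1 / gf(x)) * probability(t[0], x) * probability(t[1], x)

def trees(n: int):
    """All binary trees with n internal nodes."""
    if n == 0:
        return [None]
    return [(l, r) for k in range(n) for l in trees(k) for r in trees(n - 1 - k)]
```

Every tree of size n comes out with the same probability x^n / B(x), so the sampler is uniform once the size is fixed. The size itself is random, and moving x towards 1/4 shifts it towards larger trees.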
Enumerate structures
Sample uniformly
Set of labels
Family of structures
[Figures: structures drawn on the labels 1 to 6 and on the labels a to f]
Regular species
0
1
X
F + G
F × G
L = 1 + X × L
[Figures: the species 0, 1 and X; the sum F + G and the product F × G on the labels 1 to 5; the list species L and its structures on the labels 1 to 4]
No symmetries
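One way to read the regular species (0, 1, X, sum, product, fixed point) in code is as functions from a tuple of distinct labels to the list of structures built on exactly those labels. This is a minimal sketch under that assumption. The names `zero`, `one`, `singleton`, `add`, `mul` and `lists` are hypothetical and not taken from any library.

```python
from itertools import combinations

def zero(labels):
    """0: no structure on any label set."""
    return []

def one(labels):
    """1: a single structure, on the empty label set only."""
    return [()] if not labels else []

def singleton(labels):
    """X: a single structure, on a one-label set only."""
    return [labels[0]] if len(labels) == 1 else []

def add(f, g):
    """F + G: either an F-structure or a G-structure (tagged)."""
    return lambda labels: ([("inl", s) for s in f(labels)]
                           + [("inr", s) for s in g(labels)])

def splits(labels):
    """All ways to split the labels into a left part and a right part."""
    for k in range(len(labels) + 1):
        for left in combinations(labels, k):
            yield left, tuple(l for l in labels if l not in left)

def mul(f, g):
    """F × G: an F-structure on one part, a G-structure on the rest."""
    return lambda labels: [(a, b)
                           for left, right in splits(labels)
                           for a in f(left)
                           for b in g(right)]

def lists(labels):
    """L = 1 + X × L, defined as a fixed point by plain recursion."""
    return add(one, mul(singleton, lists))(labels)

maybe = add(one, singleton)        # 1 + X
boolean = add(one, one)            # 1 + 1
pairs = mul(singleton, singleton)  # X × X, ordered pairs
```

For example, `lists((1, 2, 3))` enumerates the 6 linear orderings of three labels, and `boolean(())` has exactly 2 structures.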
F ∘ G
[Figures: composition, an F-structure of G-structures on the labels 1 to 5]
R = X × (L ∘ R)
[Figure: a structure of R on the labels 1 to 7]
F′
[Figures: the derivative F′ on the labels 1 to 5]
L′ = L × L
C′ = L
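The two derivative identities can at least be sanity-checked by counting. This sketch assumes the usual definition that an F′-structure on n labels is an F-structure on n + 1 labels (one of them being a hole), and it only compares cardinalities, which is weaker than an isomorphism of species. The function names are hypothetical.

```python
from math import comb, factorial

def lists_count(n: int) -> int:
    """Number of linear orderings of n labels."""
    return factorial(n)

def cycles_count(n: int) -> int:
    """Number of cyclic orderings of n labels (none on the empty set)."""
    return factorial(n - 1) if n >= 1 else 0

def derivative(count):
    """F' on n labels has as many structures as F on n + 1 labels."""
    return lambda n: count(n + 1)

def product(f, g):
    """F × G: choose which k labels go to F, the rest go to G."""
    return lambda n: sum(comb(n, k) * f(k) * g(n - k) for k in range(n + 1))

# L' = L × L and C' = L, checked on the first few sizes
checks = [
    (derivative(lists_count)(n), product(lists_count, lists_count)(n),
     derivative(cycles_count)(n), lists_count(n))
    for n in range(8)
]
```

Each tuple in `checks` holds the counts for L′, L × L, C′ and L at the same size n.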
F |n|
[Figures: F-structures on the labels 1 to 5 and on n labels]
Non regular species
E
[Figures: the species E of sets on the labels 1 to 5]
C
[Figures: the species C of cycles on the labels 1 to 5]
P = E ∘ C
C′ = L
L ≠ P
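Lists and permutations have the same number of structures on n labels, yet they behave differently when the labels are renamed. The sketch below, with hypothetical helper names, counts the structures left unchanged by swapping two labels. A list is relabelled element by element, a permutation by conjugation.

```python
from itertools import permutations

def relabel_list(sigma, xs):
    """Rename every label in a linear ordering."""
    return tuple(sigma[v] for v in xs)

def relabel_perm(sigma, p):
    """Transport a permutation (dict label -> label) along sigma."""
    return {sigma[k]: sigma[v] for k, v in p.items()}

def all_lists(labels):
    """All linear orderings of the labels."""
    return list(permutations(labels))

def all_perms(labels):
    """All permutations of the labels, as dicts."""
    return [dict(zip(labels, image)) for image in permutations(labels)]

def fixed_points(structures, relabel, sigma):
    """How many structures are unchanged by the relabelling sigma."""
    return sum(1 for s in structures if relabel(sigma, s) == s)

labels = (1, 2)
swap = {1: 2, 2: 1}
fixed_lists = fixed_points(all_lists(labels), relabel_list, swap)
fixed_perms = fixed_points(all_perms(labels), relabel_perm, swap)
```

Both species have 2 structures on two labels, but swapping the labels fixes no list and fixes both permutations, so no relabelling-compatible bijection between them can exist.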
[Figures: functor composition of F and G on the labels 1 to 5]
EE |2|
EX|2|
In code?
Maths…
"seems it is doable to find such a function, but needs work."
"we have a noneasy question"
My strategy
number of partitions having p sets
…
int partitions of 6
change of representation
3-int partitions of n
Given an index k
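For comparison, here is one standard way to turn an index k into a set partition with p sets. It uses the Stirling numbers of the second kind and their recurrence S(n, p) = S(n-1, p-1) + p × S(n-1, p). This is a swapped-in textbook technique, not the integer-partition strategy outlined on the slides, and `stirling2` and `unrank` are hypothetical names.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def stirling2(n: int, p: int) -> int:
    """Number of partitions of n labelled elements into p non-empty sets."""
    if n == p:
        return 1
    if p == 0 or p > n:
        return 0
    return stirling2(n - 1, p - 1) + p * stirling2(n - 1, p)

def unrank(n: int, p: int, k: int):
    """The k-th partition of {1..n} into p sets, for 0 <= k < stirling2(n, p)."""
    if n == 0:
        return []
    alone = stirling2(n - 1, p - 1)
    if k < alone:
        # element n sits in a set of its own
        return unrank(n - 1, p - 1, k) + [[n]]
    # otherwise element n joins one of the p sets of a smaller partition
    k -= alone
    blocks = unrank(n - 1, p, k // p)
    blocks[k % p].append(n)
    return blocks

all_partitions = [unrank(4, 2, k) for k in range(stirling2(4, 2))]
```

Picking k uniformly in the range and unranking it gives a uniform sample among the partitions of n elements into p sets.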
Proper notion of size
Uniformity
Species combinators for constraints
Thanks!


Editor's Notes

  • #22 Unbounded recursion == non-termination
  • #23 High number of dimensions == very large values
  • #24 A very small element (just 18 constructors), but lost in an ocean of elements: P = 1/8^9, that is 1/8 for a list with 8 elements, times 1/8 for each [] list afterwards
  • #25 Irregular probabilities for elements of the same size
  • #27 c_n = number of objects of size n in the data type C. This gives a system of equations on the generating function C(x); the solution x must be <= ρ, with x = ρ as the limit value. From it we can derive a uniform generator: two trees of size n have the same probability of being generated. With the Boltzmann method, the size of generated trees is random, with a distribution that depends on the specification C and a mean value that goes from 0 to infinity when the parameter x goes from 0 to ρ, and is equal to $x C'(x) / C(x)$. The probability for the result to be of size n is $c_n x^n / C(x)$, which for most tree specifications and for large n is proportional to $n^{-3/2} x^n \rho^{-n}$. In all cases, the closer x is to ρ, the higher the probability of generating large trees.
  • #28 Enumeration: with a clear notion of size = number of labels. Sampling: the tricky part
  • #31 Let's talk about labels
  • #35 Let's talk about labels
  • #40 Zero element
  • #41 Maybe
  • #42 True/False
  • #49 Ordered pairs
  • #50 Ordered pairs
  • #53 Linear orderings
  • #54 Regular species = (+, x, fixed-point)
  • #55 No symmetries = +, x, fixed point
  • #56 Virtual species No symmetries
  • #57 Composition
  • #58 Composition
  • #59 Composition
  • #60 Composition
  • #61 Composition
  • #63 Composition
  • #64 Composition
  • #65 P = Permutations. L ≠ P even though they enumerate the same values!
  • #66 L ≠ P even though they enumerate the same values!
  • #67 Functor composition: F-structures over the set of all G-structures on U
  • #68 Functor composition: S is the species of subsets = E × E
  • #76 How to index set partitions?
  • #77 My strategy