This document describes adapting a folklore algorithm for counting and sampling binary contingency tables so that it samples student enrollment data while satisfying degree program rules and semester balances. It covers representing enrollment data as two-way tables, implementing the components of the folklore algorithm, and modifying it in two steps: first splitting tables into semester sub-tables to preserve balances, then introducing compressed partial row sums to represent class selection rules. The adapted algorithm is applied to sample data for two example degree programs, and Conditional Volume Testing is proposed to evaluate the results.
FinalPres
1. Counting and Sampling Two-Way Tables
Representing Degree Table Enrollments
Jordan Gillies
Supervisor: Dr. Mary Cryan
2. Introduction
Project Scope
“The problem outlined in this dissertation is that of implementing an
algorithm which can correctly sample tables conforming to the rules of a
degree program, and the semester-splits of its enrolled students...”
“Pearson's chi-squared test and Conditional Volume Testing as outlined by
Diaconis and Efron is used to evaluate collected results”
3. Introduction
Project Scope
● Implement a well-known “folklore” algorithm used to count
and sample binary contingency tables.
● Modify this algorithm to correctly sample data representing
students enrolled in a particular degree and their
respective class selections, ensuring that the semester
splits of each student and rules of class selections outlined
in the respective DPT are satisfied.
● Demonstrate an application of such an algorithm –
Conditional Volume Testing.
4. Implementing A Folklore Algorithm
Implementing The Folklore Algorithm
5. Implementing A Folklore Algorithm
The Binary Contingency Table Problem
The task of approximately counting the number of
realisable {0,1} matrices with:
– Row sums: r = (r_1, …, r_m)
– Column sums: c = (c_1, …, c_n)
is known as the Binary Contingency Table Problem.
6. Implementing A Folklore Algorithm
Example Binary Contingency Table
Here:
● r = (4, 3, 4, 4, 2)
● c = (5, 2, 1, 2, 2, 1, 4)
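Whether a pair of margins is realisable at all can be checked before any counting is attempted. The sketch below uses the Gale–Ryser condition, a standard result that is not taken from these slides; the function name is_realisable is hypothetical.

```python
def is_realisable(r, c):
    """Gale-Ryser test: does some {0,1} matrix have row sums r and column sums c?"""
    if sum(r) != sum(c):
        return False
    r_sorted = sorted(r, reverse=True)
    for k in range(1, len(r_sorted) + 1):
        # The k largest row sums can use at most min(c_j, k) ones from column j.
        if sum(r_sorted[:k]) > sum(min(c_j, k) for c_j in c):
            return False
    return True

# Margins from the example slide above.
print(is_realisable((4, 3, 4, 4, 2), (5, 2, 1, 2, 2, 1, 4)))
```

The check only says that at least one table exists; counting how many is the job of the folklore algorithm.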
7. Implementing A Folklore Algorithm
Representing Enrollment Data
● If we let each row denote a student from a particular year
of a degree program, and each column a class in that same
year and program, we can represent the table of enrollments
as a two-way table in a similar fashion.
– Each cell e_{i,j} in an enrollment table E is either 0,
denoting that student i is not enrolled in class j, or
non-zero if they are enrolled.
– In our case the non-zero value equals the credit value of
the respective column (class).
8. Implementing A Folklore Algorithm
Example Enrollment Table
● 20 credit courses: Logic & Spanish
● 10 credit courses: English, Maths, History, Art, PE
● Notice that:
row totals → Total credits a student has taken in classes
column totals / credit value → no. of students enrolled in a class
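A minimal sketch of this representation follows. The course names and credit values come from the slide; the two students and their selections are made up for illustration, since the original example table is not reproduced in the text.

```python
# Credit values as listed on the slide.
credits = {"Logic": 20, "Spanish": 20, "English": 10, "Maths": 10,
           "History": 10, "Art": 10, "PE": 10}
courses = list(credits)

# Hypothetical selections (not from the original example table).
selections = [
    {"Logic", "English", "Maths"},        # student 0
    {"Spanish", "History", "Art", "PE"},  # student 1
]

# Cell E[i][j] holds the credit value of course j if student i takes it, else 0.
E = [[credits[c] if c in chosen else 0 for c in courses] for chosen in selections]

row_totals = [sum(row) for row in E]                    # credits per student
col_totals = [sum(col) for col in zip(*E)]              # credits per class
enrolled = [t // credits[c] for t, c in zip(col_totals, courses)]  # students per class

# Replacing each non-zero cell by 1 recovers a binary contingency table.
B = [[1 if cell else 0 for cell in row] for row in E]
```

Because every non-zero cell in a column carries the same credit value, the binary table B and the credit list together determine E.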
9. Implementing A Folklore Algorithm
Folklore Algorithm Components
● Input: row and column sums r and c.
● Counting algorithm: Dynamically computes the number
of m x k tables for k increasing from 0 to n, using
compressed partial row sums and a Hash Structure.
● Sampling algorithm: Uses the Hash Structure generated
for the m x n table to uniformly sample tables with row
sums r and column sums c.
10. Implementing A Folklore Algorithm
PRS and Dynamic Counting
Where c[k] is the column set (c_1, …, c_k)
11. Implementing A Folklore Algorithm
PRS and Dynamic Counting
● This allows us to define a dynamic counting algorithm in
which we compute N(p', c[k+1]) from N(p, c[k]) and
iteratively increase k until N(r,c) is computed.
● For each partial row sum p in P_k, we compute how many
ways we can decompose the column value c_{k+1} across the
m rows. This creates the new set of PRS P_{k+1}.
Once P_n is computed, we will have N(r,c).
12. Implementing A Folklore Algorithm
Compressed PRS and Shifting
● Throughout the algorithm we will be using a compressed
representation of partial row sums p' = (p'_0, …, p'_c), in which
p'_i is the number of rows j with p_j = i. The number of ways to
decompose m into c+1 non-negative parts is the binomial
coefficient (m+c choose c).
● For every valid decomposition of column c_{k+1}, we will
calculate the resulting shifted compressed partial row sum
p*.
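The compression and shifting steps can be sketched as follows. This is an illustration under assumptions: the names compress and shift are hypothetical, top stands for the largest value a partial row sum may take (the upper index of p'), and d[i] is taken to be the number of rows with partial sum i that receive a 1 in the new column.

```python
from math import comb

def compress(p, top):
    """p'_i = number of rows whose partial row sum equals i, for i = 0..top."""
    counts = [0] * (top + 1)
    for value in p:
        counts[value] += 1
    return tuple(counts)

def shift(p_comp, d):
    """Move d[i] rows from bucket i to bucket i+1 (they gained a 1)."""
    new = list(p_comp)
    for i, moved in enumerate(d):
        new[i] -= moved
        new[i + 1] += moved
    return tuple(new)

p = (3, 2, 4, 3)                    # made-up partial row sums for m = 4 rows
p_comp = compress(p, top=4)         # (0, 0, 1, 2, 1)
p_star = shift(p_comp, (0, 0, 1, 1))  # one row at 2 and one row at 3 gain a 1

# Number of possible compressed vectors for m rows and top + 1 buckets.
states = comb(4 + 4, 4)
```

The point of the compression is that rows with equal partial sums are interchangeable, so only the bucket sizes need to be stored.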
13. Implementing A Folklore Algorithm
Hash Structure
● To store the compressed partial row sums induced by each
column we decompose, we make use of a Hash Structure.
● Entries have the format (k, H(p'), v), in which k is the
current column, H(p') is a hashed value used to look up a
compressed PRS p', and v is the total number of binary
contingency matrices so far with compressed PRS p' and
column sums c[k], that is, N(p', c[k]).
15. Implementing A Folklore Algorithm
Counting Algorithm
Over all possible tables (p_k, c[k])
16. Implementing A Folklore Algorithm
Counting Algorithm
Calculate every possible way we
can add 1s to the table represented
by p', and for each:
17. Implementing A Folklore Algorithm
Counting Algorithm
Calculate the new CPRS entailed
by allocating 1s w.r.t d
18. Implementing A Folklore Algorithm
Counting Algorithm
Calculate N(p*, c[k+1])
That is :- x *
19. Implementing A Folklore Algorithm
Counting Algorithm
Update count if already
added p* to index [k+1],
else create a new entry
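The counting loop described on these slides can be sketched in Python as below. It is a sketch under stated assumptions, not the author's implementation: all function names are hypothetical, a Python dict keyed by the compressed vector stands in for the Hash Structure, and each row is bucketed by the number of 1s it still needs (r_i minus its partial row sum) so that rows in the same bucket are interchangeable. The slides may organise the partial row sums differently.

```python
from math import comb
from collections import Counter

def compress(residuals, top):
    """Bucket i holds the number of rows that still need i ones."""
    counts = [0] * (top + 1)
    for x in residuals:
        counts[x] += 1
    return tuple(counts)

def decompositions(state, ones):
    """Yield d with 0 <= d[i] <= state[i], d[0] == 0 and sum(d) == ones."""
    def rec(i, left):
        if i == len(state):
            if left == 0:
                yield ()
            return
        upper = 0 if i == 0 else min(state[i], left)
        for take in range(upper + 1):
            for rest in rec(i + 1, left - take):
                yield (take,) + rest
    return rec(0, ones)

def shift(state, d):
    """Rows that receive a 1 move from bucket i to bucket i-1."""
    new = list(state)
    for i in range(1, len(state)):
        new[i] -= d[i]
        new[i - 1] += d[i]
    return tuple(new)

def count_tables(r, c):
    """Number of {0,1} matrices with row sums r and column sums c."""
    if sum(r) != sum(c):
        return 0
    top = max(r, default=0)
    table = [Counter() for _ in range(len(c) + 1)]   # one hash table per k
    table[0][compress(r, top)] = 1
    for k, ones in enumerate(c):
        for state, v in table[k].items():
            for d in decompositions(state, ones):
                ways = 1                              # ways to pick the rows for d
                for s_i, d_i in zip(state, d):
                    ways *= comb(s_i, d_i)
                # Update the count if the entry exists, else create it.
                table[k + 1][shift(state, d)] += v * ways
    return table[len(c)][compress([0] * len(r), top)]

print(count_tables((4, 3, 4, 4, 2), (5, 2, 1, 2, 2, 1, 4)))
```

Small cases are easy to check by hand: margins (1,1,1) and (1,1,1) admit exactly the six 3 x 3 permutation matrices.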
21. Implementing A Folklore Algorithm
Sampling Algorithm
Starting from
Hash Table[n]:
Select a CPRS induced
on the column set c[k-1]
based on how many
tables it contributes to
the count of root. Store
the associated
decomposition.
22. Implementing A Folklore Algorithm
Sampling Algorithm
Starting from
Hash Table[n]:
Repeat for n-1 columns.
We now have a randomly
generated set of
decompositions.
23. Implementing A Folklore Algorithm
Sampling Algorithm
Starting from
Hash Table[n]:
Repeat for n-1 columns.
We now have a randomly
generated set of
decompositions. M is
initially a set of empty
rows, which we will build
to reach r using our
randomly generated
decompositions.
24. Implementing A Folklore Algorithm
Sampling Algorithm
Starting from
Hash Table[n]:
Allocating 1s w.r.t each
decomposition
guarantees we will reach
the goal r.
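The backward selection and forward rebuilding steps might look like the sketch below. Assumptions to flag: every name is hypothetical, each hash-table entry also stores its predecessors (previous state, decomposition, weight) so that the walk back from Hash Table[n] is a simple lookup, and rows are bucketed by the 1s they still need, which may differ from the convention on the slides.

```python
import random
from math import comb

def compress(residuals, top):
    counts = [0] * (top + 1)
    for x in residuals:
        counts[x] += 1
    return tuple(counts)

def decompositions(state, ones):
    def rec(i, left):
        if i == len(state):
            if left == 0:
                yield ()
            return
        upper = 0 if i == 0 else min(state[i], left)
        for take in range(upper + 1):
            for rest in rec(i + 1, left - take):
                yield (take,) + rest
    return rec(0, ones)

def shift(state, d):
    new = list(state)
    for i in range(1, len(state)):
        new[i] -= d[i]
        new[i - 1] += d[i]
    return tuple(new)

def build_hash_tables(r, c):
    """tables[k][state] = [count, [(previous state, decomposition, weight), ...]]."""
    top = max(r)
    tables = [dict() for _ in range(len(c) + 1)]
    tables[0][compress(r, top)] = [1, []]
    for k, ones in enumerate(c):
        for state, (v, _) in tables[k].items():
            for d in decompositions(state, ones):
                ways = 1
                for s_i, d_i in zip(state, d):
                    ways *= comb(s_i, d_i)
                entry = tables[k + 1].setdefault(shift(state, d), [0, []])
                entry[0] += v * ways
                entry[1].append((state, d, v * ways))
    return tables

def sample_table(r, c, rng=random):
    m, n, top = len(r), len(c), max(r)
    tables = build_hash_tables(r, c)
    state = compress([0] * m, top)            # root: every row has reached its goal
    if state not in tables[n]:
        raise ValueError("no binary table has these row and column sums")
    decomps = []
    for k in range(n, 0, -1):                 # backward pass through the columns
        preds = tables[k][state][1]
        # Weights are converted to floats internally, fine for small examples.
        state, d, _ = rng.choices(preds, weights=[w for _, _, w in preds])[0]
        decomps.append(d)
    decomps.reverse()
    residual = list(r)                        # forward pass: M starts empty
    M = [[0] * n for _ in range(m)]
    for k, d in enumerate(decomps):
        picked = []
        for level in range(1, top + 1):
            rows = [i for i in range(m) if residual[i] == level]
            picked += rng.sample(rows, d[level])
        for i in picked:
            M[i][k] = 1
            residual[i] -= 1
    return M

M = sample_table((4, 3, 4, 4, 2), (5, 2, 1, 2, 2, 1, 4), rng=random.Random(1))
```

Choosing each predecessor in proportion to its contribution to the count, then choosing the rows within each bucket uniformly, gives every table with the required margins the same probability.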
25. Implementing A Folklore Algorithm
Why we need to adapt
● The folklore algorithm samples tables with given row and
column sums. That's all.
● Student enrollment tables adhere to certain structures.
● The problems are outlined as follows:
– DPT 1: 3rd Year - Philosophy (MA Hons.):
● Semester balances
– DPT 2: 3rd Year - History of Art (MA Hons.):
● Enrollment rules
26. Adapting to Change
Implementing The Folklore Algorithm
27. Adapting to Change
DPT 1: Philosophy
• No rules, simply select 5 or 6 courses.
• Columns (classes) now have an associated semester.
• In order to sample tables corresponding to a set of
student class selections, we will want to preserve the
semester balances of each student.
• Luckily, we can extract this information from our input
table by counting the courses in semesters 1 and 2 for each
student, and separating the table into 2 sub-tables, 1 for
each semester. We call this semesterisation.
28. Adapting to Change
Semester-Balanced Algorithm
Split the table into 2 sub-tables, each with the columns
of a particular semester x.
The row totals (goals) become the semester x totals
for each student.
29. Adapting to Change
Semester-Balanced Algorithm
Count each sub-table and generate a Hash Table
for each.
30. Adapting to Change
Semester-Balanced Algorithm
Sample 2 sub-tables, and
merge into a valid
Enrollment table.
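The split and merge steps can be sketched as follows. The function names and the small binary table are made up for illustration; in the full procedure each sub-table would be replaced by a freshly sampled table with the same margins before merging.

```python
def split_by_semester(table, semester_of_column):
    """Return {semester: (column indices, sub-table, per-student row goals)}."""
    parts = {}
    for sem in sorted(set(semester_of_column)):
        cols = [j for j, s in enumerate(semester_of_column) if s == sem]
        sub = [[row[j] for j in cols] for row in table]
        goals = [sum(row) for row in sub]     # semester totals for each student
        parts[sem] = (cols, sub, goals)
    return parts

def merge(parts, n_rows, n_cols):
    """Put the columns of every sub-table back in their original positions."""
    merged = [[0] * n_cols for _ in range(n_rows)]
    for cols, sub, _ in parts.values():
        for i, row in enumerate(sub):
            for j, value in zip(cols, row):
                merged[i][j] = value
    return merged

# Hypothetical input: 2 students, 4 classes, two classes per semester.
table = [[1, 0, 1, 1],
         [1, 1, 0, 1]]
semesters = [1, 1, 2, 2]

parts = split_by_semester(table, semesters)
rebuilt = merge(parts, n_rows=2, n_cols=4)
```

Since the sub-tables share no columns, any pair of tables that meets the per-semester goals merges into an enrollment table with the original row totals, column totals and semester balances.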
31. Adapting to Change
DPT 2: History of Art
We will now introduce rules governing
which combinations of classes
(columns) can be selected for a
particular student (row).
32. Adapting to Change
Compressed PRS to SOPRS
Equivalent compressed PRS = (0,0,1,2,1)
33. Adapting to Change
DPT 2: History of Art
[Diagram: the count passes through Checkpoint 1, Checkpoint 2 and Checkpoint 3 before reaching End, with checks applied along the way. Ancestor forms shown: (0,a,b,0....), (0,a,0,b,....), (0,a,0,0,b,....) and (0,...), where a + b = m.]
34. Adapting to Change
The Final Counting Algorithm
The new 'C': max credit value
in each semester (minus the
terminal value)
The checking process, to
ensure no invalid SOPRS
contributes to the final count
35. Results
Results
● Created an efficient algorithm to count and sample student
enrollment tables, semester balanced and rule conforming,
for a particular family of DPTs.
● The algorithm does not yet handle any arbitrary DPT,
although it's certainly not far off...
36. Applications
Application (DEMO)
Chi-squared statistic for a table p with table total n:
CVT score of a table with
Chi-squared value S:
As we cannot enumerate all N(r,c) possible tables, we will
use a suitably large sample instead.
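The formulas from this slide did not survive in the text, so the sketch below rests on two assumptions. The statistic is taken to be the standard Pearson form, the sum over cells of (observed - expected)^2 / expected with expected = r_i * c_j / n. The CVT score is taken to be the fraction of sampled tables with the same margins whose chi-squared value is strictly below S. The function names are hypothetical.

```python
def chi_squared(table):
    """Pearson chi-squared statistic of a two-way table against independence."""
    n = sum(map(sum, table))
    rows = [sum(row) for row in table]
    cols = [sum(col) for col in zip(*table)]
    stat = 0.0
    for i, r_i in enumerate(rows):
        for j, c_j in enumerate(cols):
            expected = r_i * c_j / n
            if expected > 0:                 # skip empty rows or columns
                stat += (table[i][j] - expected) ** 2 / expected
    return stat

def cvt_score(observed, samples):
    """Estimated share of tables with a chi-squared value below the observed one."""
    s = chi_squared(observed)
    return sum(chi_squared(t) < s for t in samples) / len(samples)

# Made-up tables that all share row sums (2, 2) and column sums (2, 2).
observed = [[2, 0], [0, 2]]
samples = [[[1, 1], [1, 1]], [[2, 0], [0, 2]], [[0, 2], [2, 0]]]
score = cvt_score(observed, samples)
```

In practice the samples would be drawn uniformly from the tables with the observed margins, so the score estimates the proportion over all N(r,c) tables.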