FinalPres

Counting and Sampling Two-Way Tables
Representing Degree Table Enrollments
Jordan Gillies
Supervisor: Dr. Mary Cryan

Introduction
Project Scope
“The problem outlined in this dissertation is that of implementing an
algorithm which can correctly sample tables conforming to the rules of a
degree program, and the semester-splits of its enrolled students...”
“Pearson’s chi-squared test and Conditional Volume Testing as outlined by
Diaconis and Efron are used to evaluate collected results”

● Implement a well-known “folklore” algorithm used to count
and sample binary contingency tables.
● Modify this algorithm to correctly sample data representing
students enrolled in a particular degree and their
respective class selections, ensuring that the semester
splits of each student and rules of class selections outlined
in the respective DPT are satisfied.
● Demonstrate an application of such an algorithm:
Conditional Volume Testing.

Implementing the Folklore Algorithm

The Binary Contingency Table Problem
The task of approximately counting the number of
realisable {0,1} matrices with:
– Row sums: r = (r_1, …, r_m)
– Column sums: c = (c_1, …, c_n)
is known as the Binary Contingency Table Problem.

Example Binary Contingency Table
Here:
● r = (4, 3, 4, 4, 2)
● c = (5, 2, 1, 2, 2, 1, 4)
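
The table image from this slide is not reproduced here. As an illustration only, the sketch below checks one 5 x 7 binary table against these margins. The matrix `example` is a hypothetical table constructed for this purpose and is not necessarily the one shown in the presentation; `margins` is likewise an illustrative helper name.

```python
def margins(table):
    """Return (row sums, column sums) of a table given as a list of rows."""
    rows = [sum(row) for row in table]
    cols = [sum(col) for col in zip(*table)]
    return rows, cols

# Hypothetical 0/1 table with r = (4, 3, 4, 4, 2) and c = (5, 2, 1, 2, 2, 1, 4).
example = [
    [1, 1, 0, 1, 0, 0, 1],
    [1, 0, 0, 0, 1, 0, 1],
    [1, 1, 0, 0, 1, 0, 1],
    [1, 0, 1, 1, 0, 0, 1],
    [1, 0, 0, 0, 0, 1, 0],
]

r, c = margins(example)
print(r)  # [4, 3, 4, 4, 2]
print(c)  # [5, 2, 1, 2, 2, 1, 4]
```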

Representing Enrollment Data
● If we let each row denote a different student from a
particular year in a degree program, and each column a
class in that same year and program, we can easily
represent this table of enrollments as a two-way table in
a similar fashion.
– Each cell e_{i,j} in an enrollment table E is either 0,
denoting that student i is not enrolled in course j, or
non-zero if they are enrolled.
– In our case the non-zero value equals the
credit value of the respective column (class).

Example Enrollment Table
● 20 credit courses: Logic & Spanish
● 10 credit courses: English, Maths, History, Art, PE
● Notice that:
row totals → Total credits a student has taken in classes
column totals / credit value → no. of students enrolled in a class
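
The two observations above can be checked on a small table. This is a minimal sketch: the course names and credit values come from the slide, while the three student rows in `E` are hypothetical, since the original table image is not available.

```python
# Credit values from the slide: two 20-credit and five 10-credit courses.
credits = {"Logic": 20, "Spanish": 20, "English": 10, "Maths": 10,
           "History": 10, "Art": 10, "PE": 10}
courses = list(credits)

# Hypothetical enrollment table: one row per student, one column per course.
# A cell is 0 (not enrolled) or the credit value of the course (enrolled).
E = [
    [20,  0, 10, 10,  0,  0,  0],
    [ 0, 20, 10,  0, 10, 10, 10],
    [20, 20,  0,  0,  0, 10, 10],
]

def student_credits(table):
    """Row totals: total credits each student has taken."""
    return [sum(row) for row in table]

def class_sizes(table, courses, credits):
    """Column total divided by credit value: students enrolled per class."""
    return {course: sum(row[j] for row in table) // credits[course]
            for j, course in enumerate(courses)}

print(student_credits(E))                      # [40, 60, 60]
print(class_sizes(E, courses, credits)["Maths"])  # 1
```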

Folklore Algorithm Components
● Input: row and column sums r and c.
● Counting algorithm: dynamically computes the number
of m x k tables for increasing k in 0...n, using
compressed partial row sums and a Hash Structure.
● Sampling algorithm: Uses the Hash Structure generated
for the m x n table to uniformly sample tables with row
sums = r and column sums = c.

PRS and Dynamic Counting
Where c[k] is the column set (c_1, …, c_k)

PRS and Dynamic Counting
● This allows us to define a dynamic counting algorithm in
which we can compute N(p', c[k+1]) from N(p, c[k]) and
iteratively increase k until N(r,c) is computed.
● For each partial row sum p in P_k, we will compute how
many ways we can decompose the column value c_{k+1}
across the m rows. This creates the new set of PRS P_{k+1}.
Once P_n is computed, we will have N(r,c).

Compressed PRS and Shifting
● Throughout the algorithm we will be using a compressed
representation of partial row sums p' = (p'_0, …, p'_c), in which
p'_i = |{ j : p_j = i }| is the number of rows whose partial sum
equals i, so the entries of p' sum to m. The number of ways to
decompose m into c+1 parts is 2mc.
● For every valid decomposition of column c_{k+1}, we will
calculate the resulting shifted compressed partial row sum
p*.
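
A minimal sketch of compression and shifting, under stated assumptions: `compress`, `shift` and the parameter name `cap` (the largest partial sum tracked, written c on the slide) are illustrative names, and a decomposition `d` is assumed to list, for each partial-sum value i, how many of the rows currently at i receive a 1 in the next column.

```python
def compress(p, cap):
    """Compressed PRS: entry i counts the rows whose partial sum equals i."""
    counts = [0] * (cap + 1)
    for s in p:
        counts[s] += 1
    return tuple(counts)

def shift(state, d):
    """Move d[i] rows from partial sum i to partial sum i + 1.

    Assumes d[i] <= state[i] and d[-1] == 0 (rows at the cap get no more 1s).
    """
    new = list(state)
    for i, take in enumerate(d):
        if take:
            new[i] -= take
            new[i + 1] += take
    return tuple(new)

print(compress((3, 2, 4, 3), cap=4))   # (0, 0, 1, 2, 1)
print(shift((2, 1, 0), (1, 1, 0)))     # (1, 1, 1)
```

Because rows with the same partial sum are interchangeable, the compressed tuple is all the counting step needs to remember about a partial table.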

Hash Structure
● In order to store the induced compressed partial row sums
for each column we decompose, we will make use of a
Hash Structure.
● Entries have format (k, H(p'), v), in which k represents the
current column, H(p') represents a hashed value used to
look up a compressed PRS p', and v represents the total
number of binary contingency matrices so far with
compressed PRS p' and column sums c[k], that is, N(p',c[k]).

Counting Algorithm
● Decomposition of column 1.
● Over all possible tables (p_k, c[k]):
– Calculate every possible way we can add 1s to the table
represented by p', and for each:
– Calculate the new CPRS entailed by allocating 1s w.r.t. d.
– Calculate N(p*, c[k+1]). That is: x *
– Update the count if p* has already been added to index [k+1],
else create a new entry.
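
The steps above can be sketched as follows. This is an illustrative reconstruction rather than the presenter's code: a Python dict per column stands in for the Hash Structure, and the function names are hypothetical. It assumes the contribution of a state p' under a decomposition d is its stored count multiplied by the product of binomials C(p'_i, d_i). One further assumption is the final normalisation: because the build starts from empty rows, the stored count for the compressed form of r covers every reordering of r, so it is divided by the number of distinct reorderings.

```python
from math import comb, factorial

def decompositions(state, ones):
    """Yield each d with d[i] <= state[i], d[-1] == 0 and sum(d) == ones."""
    def rec(i, left, prefix):
        if i == len(state) - 1:      # rows at the cap receive no further 1s
            if left == 0:
                yield tuple(prefix) + (0,)
            return
        for take in range(min(state[i], left) + 1):
            yield from rec(i + 1, left - take, prefix + [take])
    yield from rec(0, ones, [])

def shift(state, d):
    """Move d[i] rows from partial sum i to partial sum i + 1."""
    new = list(state)
    for i, take in enumerate(d):
        if take:
            new[i] -= take
            new[i + 1] += take
    return tuple(new)

def build_levels(r, c):
    """levels[k] maps each compressed PRS to its count over the first k columns."""
    levels = [{(len(r),) + (0,) * max(r): 1}]   # all m rows start at sum 0
    for ones in c:
        nxt = {}
        for state, count in levels[-1].items():
            for d in decompositions(state, ones):
                ways = count
                for have, take in zip(state, d):
                    ways *= comb(have, take)     # choose which rows get a 1
                star = shift(state, d)
                nxt[star] = nxt.get(star, 0) + ways  # update or create entry
        levels.append(nxt)
    return levels

def count_tables(r, c):
    """Number of 0/1 tables with row sums r and column sums c."""
    goal = tuple(sum(1 for s in r if s == v) for v in range(max(r) + 1))
    total = build_levels(r, c)[-1].get(goal, 0)
    orderings = factorial(len(r))
    for g in goal:
        orderings //= factorial(g)               # distinct reorderings of r
    return total // orderings

print(count_tables((2, 1), (1, 1, 1)))  # 3
```

The sketch only prunes rows that would exceed the largest target in r, so it visits more states than a tuned implementation would.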

Sampling Algorithm
Starting from Hash Table[n]:
● Select a CPRS induced on the column set c[k-1]
based on how many tables it contributes to
the count of the root. Store the associated
decomposition.
● Repeat for n-1 columns. We now have a randomly
generated set of decompositions.
● M is initially a set of empty rows, which we will build
to reach r using our randomly generated
decompositions.
● Allocating 1s w.r.t. each decomposition
guarantees we will reach the goal r.
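
A minimal sketch of the walk described above, assuming the same dict-per-column stand-in for the Hash Structure; all function names are hypothetical. Two details are assumptions rather than something the slides state: ancestors are chosen with probability proportional to their stored count times the number of ways to realise the decomposition, and the finished rows are reordered at the end so that the row sums appear in the order given by r.

```python
import random
from math import comb

def decompositions(state, ones):
    """Yield each d with d[i] <= state[i], d[-1] == 0 and sum(d) == ones."""
    def rec(i, left, prefix):
        if i == len(state) - 1:      # rows at the cap receive no further 1s
            if left == 0:
                yield tuple(prefix) + (0,)
            return
        for take in range(min(state[i], left) + 1):
            yield from rec(i + 1, left - take, prefix + [take])
    yield from rec(0, ones, [])

def shift(state, d):
    new = list(state)
    for i, take in enumerate(d):
        if take:
            new[i] -= take
            new[i + 1] += take
    return tuple(new)

def ways(state, d):
    total = 1
    for have, take in zip(state, d):
        total *= comb(have, take)
    return total

def build_levels(r, c):
    levels = [{(len(r),) + (0,) * max(r): 1}]
    for ones in c:
        nxt = {}
        for state, count in levels[-1].items():
            for d in decompositions(state, ones):
                star = shift(state, d)
                nxt[star] = nxt.get(star, 0) + count * ways(state, d)
        levels.append(nxt)
    return levels

def sample_table(r, c, rng=random):
    m, n = len(r), len(c)
    levels = build_levels(r, c)
    state = tuple(sum(1 for s in r if s == v) for v in range(max(r) + 1))
    if state not in levels[n]:
        raise ValueError("no binary table has these margins")
    chosen = []
    for k in range(n, 0, -1):        # walk back from the last column
        options, weights = [], []
        for parent, count in levels[k - 1].items():
            for d in decompositions(parent, c[k - 1]):
                if shift(parent, d) == state:
                    options.append((parent, d))
                    weights.append(count * ways(parent, d))
        state, d = rng.choices(options, weights=weights)[0]
        chosen.append(d)
    chosen.reverse()
    sums, rows = [0] * m, [[0] * n for _ in range(m)]
    for k, d in enumerate(chosen):   # build M forward from empty rows
        before = list(sums)
        for value, take in enumerate(d):
            pool = [i for i in range(m) if before[i] == value]
            for i in rng.sample(pool, take):
                rows[i][k] = 1
                sums[i] += 1
    slots = {}                       # place each row where r expects its sum
    for i, target in enumerate(r):
        slots.setdefault(target, []).append(i)
    table = [None] * m
    for row, total in zip(rows, sums):
        table[slots[total].pop()] = row
    return table

for row in sample_table((4, 3, 4, 4, 2), (5, 2, 1, 2, 2, 1, 4)):
    print(row)
```

`random.choices` converts the weights to floating point, so for very large counts an exact integer-based selection would be needed to keep the sample uniform.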

Why we need to adapt
● The folklore algorithm samples tables with given row and
column sums. That's all.
● Student enrollment tables adhere to certain structures.
● The problems are outlined as follows:
– DPT 1: 3rd Year - Philosophy (MA Hons.):
● Semester balances
– DPT 2: 3rd Year - History of Art (MA Hons.):
● Enrollment rules

Adapting to Change

DPT 1: Philosophy
• No rules, simply select 5 or 6 courses.
• Columns (classes) now have an associated semester.
• In order to sample tables corresponding to a set of
student class selections, we will want to preserve the
semester balances of each student.
• Luckily, we can extract this information from our input table
by counting the courses in semester 1 and 2 for each
student, and separating the table into 2 sub-tables, one for
each semester (semesterisation).

Semester-Balanced Algorithm
Split the table into 2 sub-tables, each with the columns
of a particular semester x. The row totals (goals)
become the semester x totals for each student.

Count each sub-table and generate a Hash Table for each.

Sample 2 sub-tables, and merge them into a valid
enrollment table.
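
The split and merge steps can be sketched as below. This is an illustration under assumptions: the table is taken to be 0/1 (course counts rather than credit values), the helper names are hypothetical, and the sampling of each sub-table is left out, so the example merges the unsampled sub-tables back together simply to show that the two steps are inverse to each other.

```python
def split_by_semester(table, semesters):
    """Return {semester: (column indices, sub-table)} for a list-of-rows table."""
    parts = {}
    for sem in sorted(set(semesters)):
        cols = [j for j, s in enumerate(semesters) if s == sem]
        parts[sem] = (cols, [[row[j] for j in cols] for row in table])
    return parts

def merge(parts, n_cols):
    """Reassemble sub-tables into one table with the original column order."""
    m = len(next(iter(parts.values()))[1])
    merged = [[0] * n_cols for _ in range(m)]
    for cols, sub in parts.values():
        for i, row in enumerate(sub):
            for j, value in zip(cols, row):
                merged[i][j] = value
    return merged

# Hypothetical input: 3 students, 5 courses, each course tagged with a semester.
semesters = [1, 2, 1, 2, 1]
table = [
    [1, 0, 1, 1, 1],
    [1, 1, 0, 1, 1],
    [0, 1, 1, 1, 1],
]

parts = split_by_semester(table, semesters)
for sem, (cols, sub) in parts.items():
    row_goals = [sum(row) for row in sub]          # per-student semester totals
    col_goals = [sum(col) for col in zip(*sub)]    # class sizes in this semester
    print(sem, row_goals, col_goals)
# 1 [3, 2, 2] [2, 2, 3]
# 2 [1, 2, 2] [2, 3]

assert merge(parts, len(semesters)) == table
```

In the full procedure each sub-table would be replaced by a fresh sample with the same row and column goals before merging, which preserves every student's semester balance by construction.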

DPT 2: History of Art
We will now introduce rules governing
which combinations of classes
(columns) can be selected for a
particular student (row).

Compressed PRS to SOPRS
Equivalent compressed PRS = (0,0,1,2,1)

DPT 2: History of Art
[Diagram: Checkpoint 1, Checkpoint 2, Checkpoint 3 and End, with two steps labelled "check". Ancestor patterns shown: (0,a,b,0,...), (0,a,0,b,...), (0,a,0,0,b,...) and (0,...), where a + b = m.]

The Final Counting Algorithm
The new 'C': max credit value
in each semester (minus the
terminal value)
The checking process, to
ensure no invalid SOPRS
contributes to the final count

Results
● Created an efficient algorithm to count and sample student
enrollment tables, semester-balanced and rule-conforming,
for a particular family of DPTs.
● Did not create such an algorithm for any DPT,
although it's certainly not far off...

Applications
Application (DEMO)
Chi-squared statistic for a table p with table total n:
CVT score of a table with
Chi-squared value S:
Because we cannot enumerate all N(r,c) possible tables, we will
use a suitable sample size instead.
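
The formulas from this slide are not reproduced in the text, so the sketch below uses Pearson's chi-squared statistic in its standard form, with expected cell value r_i * c_j / n. The sample-based score is an assumption about the convention: it is taken here to be the fraction of sampled tables whose statistic is strictly below S. The function names are hypothetical.

```python
def chi_squared(table):
    """Pearson's statistic: sum over cells of (observed - expected)^2 / expected."""
    rows = [sum(row) for row in table]
    cols = [sum(col) for col in zip(*table)]
    n = sum(rows)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = rows[i] * cols[j] / n
            if expected > 0:          # skip cells in empty rows or columns
                stat += (observed - expected) ** 2 / expected
    return stat

def cvt_score(observed_table, sampled_tables):
    """Estimated share of tables with a smaller statistic than the observed one.

    Assumption: sampled_tables are uniform samples sharing the observed margins.
    """
    s = chi_squared(observed_table)
    below = sum(1 for t in sampled_tables if chi_squared(t) < s)
    return below / len(sampled_tables)

print(chi_squared([[10, 10], [10, 10]]))  # 0.0
print(chi_squared([[10, 0], [0, 10]]))    # 20.0
```

In use, `sampled_tables` would be a list of tables drawn by the sampling algorithm with the same row and column sums as the observed table.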

Thanks For Listening!
Any Questions?
