Counting and Sampling Two-Way Tables
Representing Degree Table Enrollments
Jordan Gillies
Supervisor: Dr. Mary Cryan
Introduction
Project Scope
“The problem outlined in this dissertation is that of implementing an
algorithm which can correctly sample tables conforming to the rules of a
degree program, and the semester-splits of its enrolled students...”
“Pearson’s chi-squared test and Conditional Volume Testing as outlined by
Diaconis and Efron is used to evaluate collected results”
● Implement a well-known “folklore” algorithm used to count
and sample binary contingency tables.
● Modify this algorithm to correctly sample data representing
students enrolled in a particular degree and their
respective class selections, ensuring that the semester
splits of each student and rules of class selections outlined
in the respective DPT are satisfied.
● Demonstrate an application of such an algorithm –
Conditional Volume Testing.
Implementing A Folklore Algorithm
The Binary Contingency Table Problem
The task of approximately counting the number of
realisable {0,1} matrices with:
– Row sums: r = (r_1, …, r_m)
– Column sums: c = (c_1, …, c_n)
is known as the Binary Contingency Table Problem.
Example Binary Contingency Table
Here:
● r = (4, 3, 4, 4, 2)
● c = (5, 2, 1, 2, 2, 1, 4)
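To make the definition concrete, the following is a minimal brute-force sketch that counts {0,1} tables with given margins by trying every placement of 1s in each row. The function name `count_tables_brute` is hypothetical, and the approach is only practical for very small margins; it is a reference check, not the folklore algorithm.

```python
from itertools import combinations, product

def count_tables_brute(r, c):
    """Count 0/1 matrices with row sums r and column sums c by enumeration."""
    n = len(c)
    # For each row, list every way to place r_i ones among the n columns.
    row_choices = [list(combinations(range(n), ri)) for ri in r]
    total = 0
    for rows in product(*row_choices):
        col_sums = [0] * n
        for ones in rows:
            for j in ones:
                col_sums[j] += 1
        if col_sums == list(c):
            total += 1
    return total

print(count_tables_brute((1, 1), (1, 1)))     # 2: the two 2x2 permutation matrices
print(count_tables_brute((2, 1), (1, 1, 1)))  # 3: row 2 takes one column, row 1 the rest
```

The number of candidates grows as the product of binomial coefficients over the rows, which is why a dynamic counting algorithm is needed for realistic tables.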
Representing Enrollment Data
● If we let each row denote a different student from a
particular year in a degree program, and each column a
class in that same year and program, we can represent
the table of enrollments as a two-way table in
similar fashion.
– Each cell e_{i,j} in an enrollment table E is either 0,
denoting that student i is not enrolled in class j, or
non-zero if they are enrolled.
– In our case the non-zero value equals the
credit value of the respective column (class).
Example Enrollment Table
● 20 credit courses: Logic & Spanish
● 10 credit courses: English, Maths, History, Art, PE
● Notice that:
row totals → Total credits a student has taken in classes
column totals / credit value → no. of students enrolled in a class
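As a small illustration of this representation, the sketch below turns a binary table into an enrollment table and recovers the two quantities noted above. The course names and credit values are taken from the slide; the two student rows and the helper names are hypothetical, since the example table itself was an image.

```python
courses = ["Logic", "Spanish", "English", "Maths", "History", "Art", "PE"]
credits = [20, 20, 10, 10, 10, 10, 10]   # credit value of each column (class)

def to_enrollment(binary_rows, credits):
    """Replace each 1 with the credit value of its column."""
    return [[bit * cr for bit, cr in zip(row, credits)] for row in binary_rows]

def credits_taken(E):
    """Row totals: total credits each student has taken."""
    return [sum(row) for row in E]

def students_enrolled(E, credits):
    """Column totals divided by credit value: number of students per class."""
    return [sum(col) // cr for col, cr in zip(zip(*E), credits)]

# Hypothetical selections for two students.
binary = [[1, 0, 1, 1, 0, 0, 0],
          [0, 1, 0, 0, 1, 1, 1]]
E = to_enrollment(binary, credits)
print(credits_taken(E))               # [40, 50]
print(students_enrolled(E, credits))  # [1, 1, 1, 1, 1, 1, 1]
```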
Folklore Algorithm Components
● Input: row and column sums r and c.
● Counting algorithm: dynamically computes the number
of m x k tables for increasing k in 0...n, using
compressed partial row sums (PRS) and a Hash Structure.
● Sampling algorithm: uses the Hash Structure generated
for the m x n table to uniformly sample tables with row
sums = r and column sums = c.
PRS and Dynamic Counting
Where c[k] is the column set (c_1, …, c_k)
● This allows us to define a dynamic counting algorithm in
which we can compute N(p', c[k+1]) from N(p, c[k]) and
iteratively increase k until N(r,c) is computed.
● For each partial row sum p in P_k, we will compute how
many ways we can decompose the column value c_{k+1}
across the m rows. This creates the new set of PRS P_{k+1}.
Once P_n is computed, we will have N(r,c).
Compressed PRS and Shifting
● Throughout the algorithm we will be using a compressed
representation of partial row sums p' = (p'_0, …, p'_c), in which
p'_i is the number of rows whose partial row sum p_j equals i.
The number of ways to decompose m into c+1 parts is 2mc.
● For every valid decomposition of column c_{k+1}, we will
calculate the resulting shifted compressed partial row sum
p*.
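The compression and shifting steps can be sketched as follows. This is a minimal illustration, assuming a decomposition d = (d_0, …, d_c) means "d_i of the rows currently at partial sum i receive a 1 in the new column"; the helper names `compress`, `shift` and `ways` are hypothetical, and the row sums in the example are made up.

```python
from math import comb

def compress(p, cmax):
    """p'_i = number of rows whose partial row sum equals i, for i = 0..cmax."""
    counts = [0] * (cmax + 1)
    for s in p:
        counts[s] += 1
    return tuple(counts)

def shift(cp, d):
    """Move d[i] rows from level i to level i + 1 (d[-1] must be 0)."""
    out = list(cp)
    for i, di in enumerate(d):
        if di:
            out[i] -= di
            out[i + 1] += di
    return tuple(out)

def ways(cp, d):
    """Number of ways to choose which d[i] of the cp[i] rows at level i get the 1."""
    w = 1
    for pi, di in zip(cp, d):
        w *= comb(pi, di)
    return w

cp = compress((2, 3, 3, 4), cmax=4)
print(cp)                             # (0, 0, 1, 2, 1)
print(shift(cp, (0, 0, 1, 1, 0)))     # (0, 0, 0, 2, 2)
print(ways(cp, (0, 0, 1, 1, 0)))      # 2
```

Because rows with equal partial sums are interchangeable, only the counts per level need to be stored, which is what keeps the number of states small.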
Hash Structure
● In order to store the induced compressed partial row sums
for each column we decompose, we will make use of a
Hash Structure.
● Entries have the format (k, H(p'), v), in which k is the
current column, H(p') is a hashed value used to
look up a compressed PRS p', and v is the total
number of binary contingency matrices so far with
compressed PRS p' and column sums c[k], i.e. N(p', c[k]).
Counting Algorithm
● Decomposition of column 1.
● Over all possible tables (p_k, c[k]):
– Calculate every possible way we can add 1s to the table represented
by p', and for each such decomposition d:
– Calculate the new CPRS p* entailed by allocating 1s w.r.t. d.
– Calculate N(p*, c[k+1]). That is: x * [formula missing]
– Update the count if p* has already been added to index [k+1],
else create a new entry.
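The counting loop described above might be sketched as below. This is an illustration under stated assumptions, not the dissertation's implementation: the hash structure is modelled as a list of Python dicts (`hash_table[k]` maps a compressed PRS to its count), each entry counts every m x k table with that compressed PRS regardless of which rows hold which sums, and the final entry is therefore divided by the number of distinct orderings of r to obtain N(r,c). The function names are hypothetical.

```python
from math import comb, factorial

def decompositions(cp, total):
    """Yield each d with sum(d) == total, d[i] <= cp[i], and d[-1] == 0."""
    def rec(i, left, acc):
        if i == len(cp) - 1:          # rows at the top level cannot take more 1s
            if left == 0:
                yield tuple(acc) + (0,)
            return
        for di in range(min(cp[i], left) + 1):
            yield from rec(i + 1, left - di, acc + [di])
    yield from rec(0, total, [])

def count_tables(r, c):
    """Count 0/1 tables with row sums r and column sums c."""
    m, cmax = len(r), max(r)
    hash_table = [{(m,) + (0,) * cmax: 1}]   # before column 1 every row sums to 0
    for ck in c:
        nxt = {}
        for cp, v in hash_table[-1].items():
            for d in decompositions(cp, ck):
                ways, shifted = 1, list(cp)
                for i, di in enumerate(d):
                    if di:
                        ways *= comb(cp[i], di)   # which rows at level i get a 1
                        shifted[i] -= di
                        shifted[i + 1] += di
                key = tuple(shifted)
                nxt[key] = nxt.get(key, 0) + v * ways   # update or create the entry
        hash_table.append(nxt)
    target = [0] * (cmax + 1)
    for s in r:
        target[s] += 1
    orderings = factorial(m)                  # distinct reorderings of r
    for t in target:
        orderings //= factorial(t)
    return hash_table[-1].get(tuple(target), 0) // orderings

print(count_tables((1, 1), (1, 1)))      # 2
print(count_tables((2, 1), (1, 1, 1)))   # 3
```

Infeasible margins simply never reach the target entry, so the function returns 0 for them.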
Sampling Algorithm
Starting from Hash Table[n]:
● Select a CPRS induced on the column set c[k-1] based on how many
tables it contributes to the count of root. Store the associated
decomposition.
● Repeat for n-1 columns. We now have a randomly generated set of
decompositions.
● M is initially a set of empty rows, which we will build to reach r
using our randomly generated decompositions.
● Allocating 1s w.r.t. each decomposition guarantees we will reach
the goal r.
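A self-contained sketch of this sampling procedure follows. It rebuilds the per-column dicts, walks backwards choosing an ancestor and decomposition with probability proportional to the tables it contributes, then fills empty rows forwards. Several details are assumptions of the sketch rather than the slides: ancestors are found by rescanning the previous level instead of being stored, rows at the same level are chosen uniformly, and a final reordering step matches the sampled rows to the requested order of r. All names are hypothetical.

```python
import random
from math import comb

def decompositions(cp, total):
    """Yield each d with sum(d) == total, d[i] <= cp[i], and d[-1] == 0."""
    def rec(i, left, acc):
        if i == len(cp) - 1:
            if left == 0:
                yield tuple(acc) + (0,)
            return
        for di in range(min(cp[i], left) + 1):
            yield from rec(i + 1, left - di, acc + [di])
    yield from rec(0, total, [])

def step(cp, d):
    """Return (shifted CPRS, number of ways to choose the rows) for d."""
    new, w = list(cp), 1
    for i, di in enumerate(d):
        if di:
            w *= comb(cp[i], di)
            new[i] -= di
            new[i + 1] += di
    return tuple(new), w

def build_levels(r, c):
    """levels[k] maps each CPRS reachable after k columns to its table count."""
    levels = [{(len(r),) + (0,) * max(r): 1}]
    for ck in c:
        nxt = {}
        for cp, v in levels[-1].items():
            for d in decompositions(cp, ck):
                new, w = step(cp, d)
                nxt[new] = nxt.get(new, 0) + v * w
        levels.append(nxt)
    return levels

def sample_table(r, c, rng=random):
    levels = build_levels(r, c)
    state = [0] * (max(r) + 1)
    for s in r:
        state[s] += 1
    state = tuple(state)
    if state not in levels[-1]:
        raise ValueError("no table has these margins")
    chosen = []
    for k in range(len(c), 0, -1):        # backward pass over the columns
        options = []
        for cp, v in levels[k - 1].items():
            for d in decompositions(cp, c[k - 1]):
                new, w = step(cp, d)
                if new == state:
                    options.append((v * w, cp, d))
        pick = rng.randrange(sum(o[0] for o in options))
        for weight, cp, d in options:     # exact integer-weighted choice
            if pick < weight:
                break
            pick -= weight
        chosen.append(d)
        state = cp
    chosen.reverse()
    m = len(r)
    rows, sums = [[0] * len(c) for _ in range(m)], [0] * m
    for k, d in enumerate(chosen):        # forward pass: allocate the 1s
        picked = []
        for i, di in enumerate(d):
            level = [j for j in range(m) if sums[j] == i]
            picked += rng.sample(level, di)
        for j in picked:
            rows[j][k] = 1
            sums[j] += 1
    pool = {}                             # reorder rows to match r
    for row in rows:
        pool.setdefault(sum(row), []).append(row)
    return [pool[ri].pop() for ri in r]

for row in sample_table((4, 3, 4, 4, 2), (5, 2, 1, 2, 2, 1, 4), random.Random(1)):
    print(row)
```

Each complete table is produced with probability 1 / N(r,c): the backward weights multiply out to the product of the allocation counts divided by the total, and the uniform row choices in the forward pass cancel that product.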
Why we need to adapt
● The folklore algorithm samples tables with given row and
column sums. That's all.
● Student Enrollment tables adhere to certain structures.
● Problems are outlined as follows:
– DPT 1: 3rd Year - Philosophy (MA Hons):
● Semester balances
– DPT 2: 3rd Year - History of Art (MA Hons):
● Enrollment Rules
Adapting to Change
DPT 1: Philosophy
• No rules, simply select 5 or 6
courses.
• Columns (classes) now have
an associated semester.
• In order to sample tables
corresponding to a set of
student class selections, we
will want to preserve the
semester balances of each
student.
• Luckily, we can extract this
information from our input table
by counting the courses in
semesters 1 and 2 for each
student, and separating the
table into 2 sub-tables, 1 for
each semester (semesterisation).
Semester-Balanced Algorithm
● Split the table into 2 sub-tables, each with the columns
of a particular semester x. The row totals (goals)
become the semester x totals for each student.
● Count each table and generate a Hash Table for each.
● Sample 2 sub-tables, and merge them into a valid
Enrollment table.
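The split and merge steps can be sketched as below. The 3 x 4 table, the semester labels and the function names are hypothetical, and the "resampled" semester-1 sub-table is written by hand to stand in for the output of a sampler with the same margins.

```python
def split_by_semester(table, semester):
    """Return ({sem: column indices}, {sem: sub-table}) for semesters 1 and 2."""
    cols = {s: [j for j, sj in enumerate(semester) if sj == s] for s in (1, 2)}
    subs = {s: [[row[j] for j in cols[s]] for row in table] for s in (1, 2)}
    return cols, subs

def merge(cols, subs, n_cols):
    """Write each sub-table back into its original column positions."""
    merged = [[0] * n_cols for _ in subs[1]]
    for s, idx in cols.items():
        for i, row in enumerate(subs[s]):
            for j, value in zip(idx, row):
                merged[i][j] = value
    return merged

# Hypothetical selections: 3 students, 4 classes; classes 0 and 2 run in semester 1.
table = [[1, 1, 0, 1],
         [0, 1, 1, 0],
         [1, 0, 1, 1]]
semester = [1, 2, 1, 2]

cols, subs = split_by_semester(table, semester)
goals = {s: [sum(row) for row in subs[s]] for s in (1, 2)}
print(goals)   # {1: [1, 1, 2], 2: [2, 1, 1]}

# Stand-in for a sampled semester-1 sub-table with the same row and column sums.
resampled = {1: [[0, 1], [1, 0], [1, 1]], 2: subs[2]}
print(merge(cols, resampled, len(semester)))
# [[0, 1, 1, 1], [1, 1, 0, 0], [1, 0, 1, 1]]
```

Since each sub-table is sampled against its own row goals, every student's semester 1 and semester 2 totals are preserved in the merged table by construction.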
DPT 2: History of Art
We will now introduce rules governing
which combinations of classes
(columns) can be selected for a
particular student (row).
Compressed PRS to SOPRS
Equivalent compressed PRS = (0,0,1,2,1)
DPT 2: History of Art
[Figure: rule-checking flow through Checkpoint 1, Checkpoint 2 and Checkpoint 3 to End. Ancestor forms shown: (0,a,b,0,…), (0,a,0,b,…), (0,a,0,0,b,…) and (0,…). A check is marked at two of the steps, with the note a + b = m.]
The Final Counting Algorithm
The new 'C': max credit value
in each semester (minus the
terminal value)
The checking process, to
ensure no invalid SOPRS
contributes to the final count
Results
● Created an efficient algorithm to count and sample student
enrollment tables, semester-balanced and rule-conforming,
for a particular family of DPTs.
● Not yet: an efficient algorithm to count and sample student
enrollment tables, semester-balanced and rule-conforming,
for any DPT.
● Although it's certainly not far off...
Applications
Application (DEMO)
Chi-squared statistic for a table p with table total n:
CVT score of a table with
Chi-squared value S:
However, as we can't enumerate all N(r,c) possible tables, we will
use a suitably sized sample instead.
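The two formulas on this slide were images and did not survive extraction, so the sketch below rests on assumptions. The statistic is the standard Pearson form, the sum over all cells of (p_ij - r_i c_j / n)^2 / (r_i c_j / n). The CVT score is taken here to be the fraction of sampled tables with the same margins whose chi-squared value is at most S; the direction of that inequality is an assumption. The function names are hypothetical.

```python
def chi_squared(table):
    """Pearson chi-squared statistic of a two-way table against independence."""
    n = sum(map(sum, table))
    r = [sum(row) for row in table]
    c = [sum(col) for col in zip(*table)]
    stat = 0.0
    for i, row in enumerate(table):
        for j, x in enumerate(row):
            expected = r[i] * c[j] / n
            if expected > 0:              # skip cells in empty rows or columns
                stat += (x - expected) ** 2 / expected
    return stat

def cvt_score(S, sampled_tables):
    """Fraction of sampled tables whose chi-squared value is at most S."""
    hits = sum(chi_squared(t) <= S for t in sampled_tables)
    return hits / len(sampled_tables)

observed = [[1, 0], [0, 1]]
print(chi_squared(observed))              # 2.0
# A hand-written stand-in for a sample; these tables are illustrative only.
sample = [[[1, 0], [0, 1]], [[1, 1], [1, 1]]]
print(cvt_score(1.0, sample))             # 0.5
```

In practice the sample would come from the uniform sampler over tables with the observed margins, and its size controls how closely the score approximates the value over all N(r,c) tables.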
Thanks For Listening!
Any Questions?
