Counting and Sampling Two-Way Tables
Representing Degree Table Enrollments
Jordan Gillies
Supervisor: Dr. Mary Cryan
Introduction
Project Scope
“The problem outlined in this dissertation is that of implementing an
algorithm which can correctly sample tables conforming to the rules of a
degree program, and the semester-splits of its enrolled students...”
“Pearson's chi-squared test and Conditional Volume Testing as outlined by
Diaconis and Efron are used to evaluate collected results”
● Implement a well-known “folklore” algorithm used to count
and sample binary contingency tables.
● Modify this algorithm to correctly sample data representing
students enrolled in a particular degree and their
respective class selections, ensuring that the semester
splits of each student and rules of class selections outlined
in the respective DPT are satisfied.
● Demonstrate an application of such an algorithm –
Conditional Volume Testing.
Implementing A Folklore Algorithm
The Binary Contingency Table Problem
The task of approximately counting the number of
realisable {0,1} matrices with:
– Row sums: r = (r_1, …, r_m)
– Column sums: c = (c_1, …, c_n)
is known as the Binary Contingency Table Problem
Example Binary Contingency Table
Here:
● r = (4, 3, 4, 4, 2)
● c = (5, 2, 1, 2, 2, 1, 4)
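Before counting or sampling, it can help to confirm that a pair of margins is realisable at all. The sketch below is not from the slides: it applies the Gale-Ryser condition, and the function name `is_realisable` is an illustrative assumption.

```python
def is_realisable(r, c):
    """Gale-Ryser test: does some {0,1} matrix have row sums r and column sums c?"""
    if sum(r) != sum(c) or min(list(r) + list(c), default=0) < 0:
        return False
    ordered = sorted(r, reverse=True)
    prefix = 0
    for k, row_sum in enumerate(ordered, start=1):
        prefix += row_sum
        # The k largest rows can take at most min(c_j, k) ones from column j.
        if prefix > sum(min(c_j, k) for c_j in c):
            return False
    return True


r = (4, 3, 4, 4, 2)
c = (5, 2, 1, 2, 2, 1, 4)
print(sum(r), sum(c))       # 17 17
print(is_realisable(r, c))  # True
```

Both margins of the example total 17, and the condition holds for every prefix of the sorted row sums, so at least one table with these margins exists.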
Representing Enrollment Data
● If we let each row denote a different student from a
particular year in a degree program, and each column a
class in that same year and program, we can represent
the table of enrollments as a two-way table in a
similar fashion.
– Each cell e_{i,j} in an enrollment table E is either 0,
denoting that student i is not enrolled in course j, or
non-zero if they are enrolled.
– In our case the non-zero value equals the
credit value of the respective column (class).
Example Enrollment Table
● 20 credit courses: Logic & Spanish
● 10 credit courses: English, Maths, History, Art, PE
● Notice that:
row totals → Total credits a student has taken in classes
column totals / credit value → no. of students enrolled in a class
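The two observations above can be checked on a small table. This is a minimal sketch: the three students and their selections are made up for illustration, and only the course names and credit values come from the slide.

```python
courses = ["Logic", "Spanish", "English", "Maths", "History", "Art", "PE"]
credits = [20, 20, 10, 10, 10, 10, 10]

# Hypothetical binary selections: one row per student, one column per course.
binary = [
    [1, 0, 1, 1, 0, 0, 1],
    [0, 1, 1, 0, 1, 0, 0],
    [1, 1, 0, 0, 0, 1, 1],
]

# Enrollment table E: each 1 is replaced by the credit value of its column.
E = [[b * cr for b, cr in zip(row, credits)] for row in binary]

row_totals = [sum(row) for row in E]          # total credits per student
col_totals = [sum(col) for col in zip(*E)]    # total credits per course
enrolled = [t // cr for t, cr in zip(col_totals, credits)]

print(row_totals)  # [50, 40, 60]
print(enrolled)    # [2, 2, 2, 1, 1, 1, 2]
```

Dividing each column total by its credit value recovers the column sums of the underlying binary table, which is what lets the binary algorithm be reused.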
Folklore Algorithm Components
● Input: row and column sums r and c.
● Counting algorithm: Dynamically computes the number
of m x k tables for increasing k in 0...n, using
compressed partial row sums and a Hash Structure.
● Sampling algorithm: Uses the Hash Structure generated
for the m x n table to uniformly sample tables with row
sums = r and column sums = c.
PRS and Dynamic Counting
Where c[k] is the column set (c_1, …, c_k)
● This allows us to define a dynamic counting algorithm in
which we can compute N(p', c[k+1]) from N(p, c[k]) and
iteratively increase k until N(r,c) is computed.
● For each partial row sum p in P_k, we will compute how
many ways we can decompose the column value c_{k+1}
across the m rows. This creates the new set of PRS P_{k+1}.
Once P_n is computed, we will have N(r,c).
Compressed PRS and Shifting
● Throughout the algorithm we will be using a compressed
representation of partial row sums p' = (p'_0, …, p'_c), in which
p'_i is the number of rows whose partial row sum equals i
(a sum over the rows, counting 1 for each row with p_j = i).
The number of ways to decompose m into c+1 parts is 2mc.
● For every valid decomposition of column c_{k+1}, we will
calculate the resulting shifted compressed partial row sum p*.
Hash Structure
● In order to store the induced compressed partial row sums
for each column we decompose, we will make use of a
Hash Structure.
● Entries have format (k, H(p'), v), in which k represents the
current column, H(p') represents a hashed value used to
look up a compressed PRS p', and v represents the total
number of binary contingency matrices so far with
compressed PRS p' and column sums c[k], that is, N(p', c[k]).
Counting Algorithm
● Decomposition of column 1.
● Loop over all possible tables (p_k, c[k]).
● Calculate every possible way we can add 1s to the table represented
by p', and for each such decomposition d:
– Calculate the new CPRS p* entailed by allocating 1s w.r.t. d.
– Calculate N(p*, c[k+1]). That is: x * [remainder of the formula shown on the slide]
– Update the count if p* has already been added to index [k+1],
else create a new entry.
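The loop described above can be sketched in Python. This is a sketch under stated assumptions, not the project's implementation: it tracks the remaining row sums (starting from r and shifting downwards) rather than the partial row sums of the slides, a plain dict per column stands in for the Hash Structure, and x is taken to be the product of binomial coefficients C(s_i, d_i). All function names are illustrative.

```python
from itertools import product
from math import comb


def compress(values, top):
    """counts[i] = number of rows that still need i more 1s."""
    counts = [0] * (top + 1)
    for v in values:
        counts[v] += 1
    return tuple(counts)


def decompositions(state, total):
    """Yield every d with d[0] == 0, 0 <= d[i] <= state[i] and sum(d) == total."""
    ranges = [range(1)] + [range(s + 1) for s in state[1:]]
    for d in product(*ranges):
        if sum(d) == total:
            yield d


def shift(state, d):
    """Rows that receive a 1 move from bucket i to bucket i - 1."""
    new = list(state)
    for i in range(1, len(state)):
        new[i] -= d[i]
        new[i - 1] += d[i]
    return tuple(new)


def count_tables(r, c):
    """Number of {0,1} tables with row sums r and column sums c."""
    top = max(r, default=0)
    level = {compress(r, top): 1}            # entries for k = 0 columns
    for col_sum in c:
        nxt = {}
        for state, ways in level.items():
            for d in decompositions(state, col_sum):
                x = 1                        # ways to pick the rows named by d
                for s_i, d_i in zip(state, d):
                    x *= comb(s_i, d_i)
                new_state = shift(state, d)
                # Update the count if present, else create a new entry.
                nxt[new_state] = nxt.get(new_state, 0) + x * ways
        level = nxt
    done = (len(r),) + (0,) * top            # every row has reached its target
    return level.get(done, 0)


print(count_tables([2, 1], [1, 1, 1]))  # 3
print(count_tables((4, 3, 4, 4, 2), (5, 2, 1, 2, 2, 1, 4)))
```

Enumerating decompositions with `itertools.product` keeps the sketch short but is wasteful; a recursive generator that prunes on the running total would scale better.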
Sampling Algorithm
Starting from Hash Table[n]:
● Select a CPRS induced on the column set c[k-1], with probability
based on how many tables it contributes to the count of the root.
Store the associated decomposition.
● Repeat for n-1 columns. We now have a randomly generated set of
decompositions.
● M is initially a set of empty rows, which we will build up to
reach r using our randomly generated decompositions.
● Allocating 1s w.r.t. each decomposition guarantees we will reach
the goal r.
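A hedged sketch of the sampling walk, under assumptions that are not from the slides: states hold remaining row sums, each level is a dict that also keeps the weighted predecessors of every entry, and all names are illustrative. Each backward step picks a predecessor with probability proportional to its contribution, and the rebuild step then chooses the rows for each decomposition uniformly at random.

```python
import random
from itertools import product
from math import comb


def compress(values, top):
    counts = [0] * (top + 1)
    for v in values:
        counts[v] += 1
    return tuple(counts)


def moves(state, total):
    """Yield (d, x, new_state) for each way to place `total` 1s in one column."""
    ranges = [range(1)] + [range(s + 1) for s in state[1:]]
    for d in product(*ranges):
        if sum(d) != total:
            continue
        x, new = 1, list(state)
        for i in range(1, len(state)):
            x *= comb(state[i], d[i])
            new[i] -= d[i]
            new[i - 1] += d[i]
        yield d, x, tuple(new)


def build_levels(r, c):
    """levels[k][state] = (count, [(previous_state, d, weight), ...])."""
    levels = [{compress(r, max(r)): (1, [])}]
    for col_sum in c:
        nxt = {}
        for state, (ways, _) in levels[-1].items():
            for d, x, new in moves(state, col_sum):
                count, preds = nxt.get(new, (0, []))
                preds.append((state, d, x * ways))
                nxt[new] = (count + x * ways, preds)
        levels.append(nxt)
    return levels


def sample_table(r, c, rng=random):
    top = max(r)
    levels = build_levels(r, c)
    state = (len(r),) + (0,) * top
    if state not in levels[-1]:
        raise ValueError("no binary table has these margins")
    chosen = []
    for k in range(len(c), 0, -1):           # walk back from the last column
        _, preds = levels[k][state]
        weights = [w for _, _, w in preds]
        state, d, _ = rng.choices(preds, weights=weights)[0]
        chosen.append(d)
    chosen.reverse()
    remaining = list(r)
    table = [[0] * len(c) for _ in r]        # M starts as empty rows
    for j, d in enumerate(chosen):
        for need in range(1, top + 1):       # ascending, so no row is hit twice
            rows = [i for i, rem in enumerate(remaining) if rem == need]
            for i in rng.sample(rows, d[need]):
                table[i][j] = 1
                remaining[i] -= 1
    return table


for row in sample_table([4, 3, 4, 4, 2], [5, 2, 1, 2, 2, 1, 4]):
    print(row)
```

Under these assumptions every table is returned with the same probability: the backward walk selects a path with probability (product of the x values) / N(r,c), and the uniform row choices contribute the reciprocal of that product.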
Why we need to adapt
● The folklore algorithm samples tables with given row and
column sums. That's all.
● Student Enrollment tables adhere to certain structures.
● The problems are outlined as follows:
– DPT 1: 3rd Year - Philosophy (MA Hons):
● Semester balances
– DPT 2: 3rd Year - History of Art (MA Hons):
● Enrollment Rules
Adapting to Change
DPT 1: Philosophy
• No rules, simply select 5 or 6 courses.
• Columns (classes) now have an associated semester.
• In order to sample tables corresponding to a set of student class
selections, we will want to preserve the semester balances of each
student.
• Luckily, we can extract this information from our input table by
counting the courses in semester 1 and 2 for each student, and
separating the table into 2 sub-tables, 1 for each semester
(semesterisation).
Semester-Balanced Algorithm
● Split the table into 2 sub-tables, each with the columns
of a particular semester x. The row totals (goals)
become the semester x totals for each student.
● Count each sub-table and generate a Hash Table for each.
● Sample 2 sub-tables, and merge them into a valid
Enrollment table.
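The split and merge steps can be sketched as follows. The table, the semester labels and the function names are hypothetical. In the full algorithm each sub-table would be replaced by a freshly sampled table with the same margins before merging; that step is left out here.

```python
def split_by_semester(table, semesters):
    """Return {semester: (column_indices, sub_table)}."""
    parts = {}
    for sem in sorted(set(semesters)):
        cols = [j for j, s in enumerate(semesters) if s == sem]
        parts[sem] = (cols, [[row[j] for j in cols] for row in table])
    return parts


def margins(table):
    """Row sums and column sums of a table."""
    return [sum(row) for row in table], [sum(col) for col in zip(*table)]


def merge(parts, n_cols):
    """Put each sub-table's columns back at their original positions."""
    n_rows = len(next(iter(parts.values()))[1])
    merged = [[0] * n_cols for _ in range(n_rows)]
    for cols, sub in parts.values():
        for i, row in enumerate(sub):
            for j, value in zip(cols, row):
                merged[i][j] = value
    return merged


# Hypothetical selections: 3 students, 4 classes alternating between semesters.
table = [[1, 0, 1, 1],
         [0, 1, 1, 0],
         [1, 1, 0, 1]]
semesters = [1, 2, 1, 2]

parts = split_by_semester(table, semesters)
for sem, (cols, sub) in parts.items():
    print(sem, cols, margins(sub))   # the margins are the sampler's inputs
print(merge(parts, len(semesters)) == table)  # True
```

Because each sub-table keeps its own row sums, any pair of sampled sub-tables merges into a table in which every student has the same semester 1 and semester 2 course counts as in the input.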
DPT 2: History of Art
We will now introduce rules governing which combinations of classes
(columns) can be selected for a particular student (row).
Compressed PRS to SOPRS
Equivalent compressed PRS = (0,0,1,2,1)
DPT 2: History of Art
[Diagram: the counting process passes through Checkpoint 1, Checkpoint 2 and Checkpoint 3 before reaching End, with two steps labelled "check". Ancestor labels shown: (0,a,b,0,...), (0,a,0,b,...), (0,a,0,0,b,...) and (0,...), where a + b = m.]
The Final Counting Algorithm
● The new 'C': the max credit value in each semester (minus the
terminal value).
● The checking process ensures no invalid SOPRS contributes to the
final count.
Results
● Created an efficient algorithm to count and sample student
enrollment tables, semester balanced and rule conforming,
for a particular family of DPTs.
● Did not create an efficient algorithm to count and sample student
enrollment tables, semester balanced and rule conforming,
for any DPT.
● Although, it's certainly not far off...
Applications
Application (DEMO)
Chi-squared statistic for a table p with table total n:
CVT score of a table with
Chi-squared value S:
Although, as we can't enumerate all N(r,c) possible tables, we will
use a suitable sample size instead.
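The formulas on this slide were images, so the sketch below rests on assumptions. It uses the standard Pearson statistic, the sum over cells of (p_ij - r_i c_j / n)^2 / (r_i c_j / n), and it assumes that the CVT score is the fraction of tables with the same margins whose statistic is at most S, estimated here from a list of sampled tables. The direction of the comparison and the function names are guesses, not taken from the dissertation.

```python
def chi_squared(table):
    """Pearson chi-squared statistic of a two-way table against independence."""
    rows = [sum(row) for row in table]
    cols = [sum(col) for col in zip(*table)]
    n = sum(rows)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = rows[i] * cols[j] / n
            if expected > 0:            # skip cells in all-zero rows or columns
                stat += (observed - expected) ** 2 / expected
    return stat


def cvt_score(s, sampled_tables):
    """Estimated fraction of tables whose chi-squared value is at most s."""
    at_most = sum(1 for t in sampled_tables if chi_squared(t) <= s)
    return at_most / len(sampled_tables)


observed = [[10, 20], [30, 40]]
print(round(chi_squared(observed), 4))  # 0.7937
```

In practice `sampled_tables` would be filled with uniform samples that share the margins of the observed table, so the estimate approaches the exact score as the sample grows.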
Thanks For Listening!
Any Questions?
