In this presentation we discuss the problems that occur when splitting wide tables across multiple pages. We focus on solutions that minimize the impact on the meaning of the data when the columns are reordered so that the number of pages used is minimal. Reordering the columns of a table raises several complex optimization problems, which we study here: minimizing the page count while also minimizing either the number of column position changes or the number of column groups split across pages. We show that, with integer programming, the number of pages used when splitting wide tables can be reduced by up to 25%, and that this can be achieved in short computational time. http://doi.acm.org/10.1145/2494266.2494317
1. DocEng2013, September 10–13, 2013, Florence, Italy
Splitting Wide Tables Optimally
Mihai Bilauca
Patrick Healy
Department of Computer Science and Information Systems
University of Limerick, Ireland
Supported by Science Foundation Ireland under the research programme 01/P1.2/C009,
Mathematical Foundations, Practical Notations, and Tools for Reliable Flexible Software.
2. Splitting Wide Tables Optimally
Why this paper?
• Tables are widely used for presenting logical relationships between data items;
• Widely used WYSIWYG tools have poor support for wide tables;
• Authoring tables is hard, time-consuming and error-prone;
• Style manual recommendations are not always supported;
• There is very little research in this area.
3. A wide table split across multiple pages
4. + Zoom in
Grouping of data items increases readability
5. Splitting Wide Tables Optimally
Style recommendations from Chicago Manual of Style
“For a two-page broadside table – which should be presented
on facing pages if at all possible – column heads need not be
repeated; for broadside tables that run beyond two pages,
column heads are repeated only on each new verso.
Where column heads are repeated, the table number and
“continued” should also appear.
For any table that is likely to run to more than one page, the
editor should specify whether continued lines and repeated
column heads will be needed and where footnotes should
appear (usually at the end of the table as a whole).”
6. Splitting Wide Tables Optimally
Overview
We present MIP solutions, written in OPL, for three problems that occur when splitting wide tables, with the aim of minimizing the effect on the meaning of the data:
1. Minimize Page Count
2. Minimize Page Count and Column Positioning Changes
3. Minimize Page Count and Group Splitting
We then report experimental results obtained with IBM CPLEX 12.3 and present our conclusions.
MIP – Mixed Integer Programming
OPL – Optimization Programming Language
8. 1. Minimum Page Count – OPL Model
dvar int+ pageSel[Pages] in 0..1;   // 1 if page p is used
dvar int+ X[Pages][Cols] in 0..1;   // 1 if column j is placed on page p
dexpr int pageCount = sum(p in Pages) pageSel[p];
minimize pageCount;
subject to
{
  ct1: // select only one page for each column
    forall(j in Cols)
      sum(p in Pages) X[p][j] == 1;
  ct2: // only columns that fit in the page
    forall(p in Pages)
      sum(j in Cols)
        colW[j] / pageW * X[p][j] <= pageSel[p];
}
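The model above is a bin-packing formulation: ct1 puts every column on exactly one page, and ct2 caps the total column width on each used page at pageW. As a minimal sketch for cross-checking small instances without a MIP solver, the Python function below searches the same assignment space exhaustively with simple pruning. It is a swapped-in brute-force technique, not the OPL/CPLEX approach of the talk. The name min_page_count is hypothetical, and the sample data is the ten-column example shown later in the talk (page width 490 points).

```python
from typing import List


def min_page_count(col_w: List[int], page_w: int) -> int:
    """Fewest pages such that each column sits on exactly one page (ct1)
    and the widths on every page sum to at most page_w (ct2)."""
    if any(w > page_w for w in col_w):
        raise ValueError("a column is wider than the page")
    widths = sorted(col_w, reverse=True)  # widest first prunes earlier
    best = len(widths)                    # one column per page always works
    loads: List[int] = []                 # used width of each open page

    def place(i: int) -> None:
        nonlocal best
        if len(loads) >= best:
            return                        # cannot improve on the best found
        if i == len(widths):
            best = len(loads)
            return
        tried = set()
        for p in range(len(loads)):
            # pages with identical load are interchangeable: try one of them
            if loads[p] + widths[i] <= page_w and loads[p] not in tried:
                tried.add(loads[p])
                loads[p] += widths[i]
                place(i + 1)
                loads[p] -= widths[i]
        loads.append(widths[i])           # or open a new page
        place(i + 1)
        loads.pop()

    place(0)
    return best


col_w = [210, 140, 210, 420, 280, 350, 70, 140, 140, 350]
print(min_page_count(col_w, 490))  # 5
```

The column widths sum to 2310 points, which cannot fit on four 490-point pages, so five pages is a lower bound as well as the value found by the search.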
9. 1. Minimum Page Count - Results
● Page count can be reduced by 14% to 25%
● The difficulty of the problem is not directly linked to the problem size but to the data itself
PC = page count before optimization, OPC = optimized page count, %Imp = improvement
Columns   PC   OPC   %Imp     Time (s)
10         7     6   14.28%   2.25
20        16    12   25.00%   0.13
30        19    15   21.05%   0.17
40        29    23   20.68%   1.18
50        34    26   23.52%   04.30
60        48    39   18.75%   1.52
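The %Imp figures are consistent with the relative reduction (PC - OPC) / PC, truncated rather than rounded to two decimals (for 40 columns, 6/29 = 20.689...% appears as 20.68%). That formula is inferred from the numbers, not stated on the slide. A minimal sketch that reproduces the column, with the hypothetical helper name improvement_pct:

```python
def improvement_pct(pc: int, opc: int) -> str:
    """Relative page-count reduction, truncated (not rounded) to two decimals."""
    hundredths = (pc - opc) * 10000 // pc  # percentage times 100, integer arithmetic
    return f"{hundredths // 100}.{hundredths % 100:02d}%"


# (columns, PC, OPC) rows from the results slide
rows = [(10, 7, 6), (20, 16, 12), (30, 19, 15), (40, 29, 23), (50, 34, 26), (60, 48, 39)]
for columns, pc, opc in rows:
    print(columns, improvement_pct(pc, opc))
```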
12. 2. Minimum Page Count & Column Positioning Changes
dvar int+ pageSel[Pages] in 0..1;
dvar int+ pageIdx[Cols] in Pages;  // page index of each column
dvar int+ colIdx[Cols] in Cols;    // ordering index of each column
// check if j1 is placed before j2 (posO: original order, posN: new order)
dexpr int posO[j1,j2 in Cols] = j1 <= j2-1;
dexpr int posN[j1,j2 in Cols] = (colIdx[j1] <= colIdx[j2]-1);
dexpr float posDiff = sum(j1,j2 in Cols : j2 < j1)
                        abs(posO[j1,j2] - posN[j1,j2]);
dexpr int pageCount = sum(p in Pages) pageSel[p];
// a, b, obj1Val variables are used for OPL flow control
minimize a * pageCount + b * posDiff;
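Because the sum in posDiff ranges over pairs with j2 < j1, posO is always 0 there, so posDiff simply counts the pairs of columns whose relative order is reversed by the new arrangement (the inversions of the ordering). The sketch below mirrors the dexpr definitions term by term on a plain Python list. The function name pos_diff and the 0-based list representation are assumptions made for illustration.

```python
from typing import Sequence


def pos_diff(col_idx: Sequence[int]) -> int:
    """Number of column pairs whose relative order differs from the original."""
    total = 0
    for j1 in range(len(col_idx)):
        for j2 in range(j1):                               # pairs with j2 < j1
            pos_o = int(j1 <= j2 - 1)                      # always 0 for these pairs
            pos_n = int(col_idx[j1] <= col_idx[j2] - 1)    # 1 if j1 now precedes j2
            total += abs(pos_o - pos_n)
    return total


print(pos_diff([1, 2, 3]))  # 0: nothing moved
print(pos_diff([2, 1, 3]))  # 1: one pair swapped
```

Applied to the ordering [3, 5, 4, 7, 10, 6, 8, 1, 2, 9] from the ten-column example later in the talk, the function returns 19.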
13. 2. Minimum Page Count & Column Positioning Changes
subject to {
  ct1: // do not exceed page width
    forall(p in Pages)
      sum(j in Cols)
        colW[j] * (p==pageIdx[j]) / pageW <= pageSel[p];
  ct2: // page and column indexes relationship
    forall(ordered j1,j2 in Cols)
      (pageIdx[j1]<=pageIdx[j2]-1) (colIdx[j1]<=colIdx[j2]-1) == 0;
  ct3: // unique column index values
    forall(ordered j1,j2 in Cols)
      colIdx[j1]!=colIdx[j2];
  // if the minimum page count obj1Val is set
  // maintain this value for subsequent searches
  ct4:
    if (obj1Val >= 0 ) pageCount == obj1Val;
}
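The role of obj1Val in ct4 suggests a two-stage search: first find the minimum page count, then hold it fixed while minimizing posDiff. The sketch below illustrates that flow by brute force on a four-column table. It rests on several assumptions not confirmed by the slides: the stages correspond to weights (a=1, b=0) and then (a=0, b=1), pages are filled left to right in the new column order, and exhaustive enumeration stands in for the CPLEX search. All function names are hypothetical.

```python
from itertools import permutations
from typing import List, Sequence, Tuple


def page_count(order: Sequence[int], col_w: List[int], page_w: int) -> int:
    """Pages used when columns are laid out in `order`, filling pages left to right.
    Assumes every column fits on a page by itself."""
    pages, room = 0, 0
    for j in order:
        if col_w[j] > room:   # column does not fit: start a new page
            pages += 1
            room = page_w
        room -= col_w[j]
    return pages


def inversions(order: Sequence[int]) -> int:
    """Pairs of columns whose relative order is reversed (the posDiff measure)."""
    n = len(order)
    return sum(1 for i in range(n) for k in range(i + 1, n) if order[i] > order[k])


def two_stage(col_w: List[int], page_w: int) -> Tuple[int, int, Tuple[int, ...]]:
    orders = list(permutations(range(len(col_w))))
    # stage 1: minimize the page count only
    obj1_val = min(page_count(o, col_w, page_w) for o in orders)
    # stage 2: keep pageCount == obj1_val (as ct4 does) and minimize posDiff
    feasible = [o for o in orders if page_count(o, col_w, page_w) == obj1_val]
    best = min(feasible, key=inversions)
    return obj1_val, inversions(best), best


print(two_stage([300, 300, 200, 200], 500))  # (2, 1, (0, 2, 1, 3))
```

In the original order this table needs three pages; swapping the two middle columns reaches the two-page minimum with a single position change.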
14. 2. Minimum Page Count & Column Positioning Changes
Results
● Promising performance:
  – 2.25s to optimize a 10-column table, reducing posDiff from 33 to 4 and the page count from 9 to 8;
  – 89s to optimize a 20-column table, reducing posDiff from 194 to 4 and the page count from 13 to 11;
● Computational time increases with the number of columns
● A data instance may admit no better solution
15. 3. Minimum Page Count & Group Splitting
16. 3. Minimum Page Count & Group Splitting
The user specifies which columns should preferably be kept together
pageW: 490 points
colW: [210, 140, 210, 420, 280, 350, 70, 140, 140, 350]
7 pages in the original column order:
Pages: {210,140} {210} {420} {280} {350,70} {140,140} {350}
Minimum 5 pages:
colIdx: [3, 5, 4, 7, 10, 6, 8, 1, 2, 9]
Pages: {210,280} {420} {70,350} {350,140} {210,140,140}
Group columns 2, 3 and 7:
colIdx: [2, 3, 7, 4, 9, 10, 6, 8, 1, 5]
Pages: {140,210,70} {420} {140,350} {350,140} {210,280}
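The example can be replayed with a few lines of Python. The sketch below fills pages left to right, starting a new page whenever the next column does not fit. It reads each colIdx list as the new left-to-right order of the original column numbers (1-based), an interpretation that reproduces the pages listed on the slide. The function name split_in_order is hypothetical.

```python
from typing import List


def split_in_order(widths: List[int], page_w: int) -> List[List[int]]:
    """Fill pages left to right; start a new page when the next column does not fit."""
    pages: List[List[int]] = []
    for w in widths:
        if pages and sum(pages[-1]) + w <= page_w:
            pages[-1].append(w)
        else:
            pages.append([w])
    return pages


page_w = 490
col_w = [210, 140, 210, 420, 280, 350, 70, 140, 140, 350]
min_order = [3, 5, 4, 7, 10, 6, 8, 1, 2, 9]       # "Minimum 5 pages" ordering
grouped_order = [2, 3, 7, 4, 9, 10, 6, 8, 1, 5]   # columns 2, 3 and 7 kept together

original = split_in_order(col_w, page_w)
reordered = split_in_order([col_w[j - 1] for j in min_order], page_w)
grouped = split_in_order([col_w[j - 1] for j in grouped_order], page_w)

print(len(original), len(reordered), len(grouped))  # 7 5 5
print(grouped)
```

For the first ordering the greedy split places its page breaks slightly differently from the slide (the 70-point column fits next to the 420-point one), but the page count is the same.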
17. 3. Minimum Page Count & Group Splitting
int colG[Cols] = ...; // column groups
dvar int+ pageSel[Pages] in 0..1;
dvar int+ pageIdx[Cols] in Pages;  // page index of each column
// find the first column of the group
int gFirstCol[g in groups] =
  first({j | j in Cols : colG[j] == g});
// counts how many columns of a group are on a
// different page than the group's first column
dexpr int gSplit[g in groups] =
  sum(j in Cols : colG[j] == g)
    (pageIdx[j] != pageIdx[gFirstCol[g]]);
// number of groups that have at least one split column
dexpr int gSplitCount = sum(g in groups)
  (gSplit[g] >= 1);
dexpr int pageCount = sum(p in Pages) pageSel[p];
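To make gSplit and gSplitCount concrete, the sketch below evaluates them for the ten-column example from the previous slide. The data layout is an assumption made for illustration: col_g holds one group id per column, with 0 meaning "no group", and the page_idx lists are read off the slide's two five-page solutions under the interpretation that colIdx gives the new column order. The function names are hypothetical.

```python
from typing import Dict, List


def group_splits(col_g: List[int], page_idx: List[int]) -> Dict[int, int]:
    """gSplit per group: columns placed on a different page than the group's first column.
    Group id 0 is treated as 'ungrouped' and skipped (an assumption of this sketch)."""
    first_col: Dict[int, int] = {}
    for j, g in enumerate(col_g):
        if g != 0:
            first_col.setdefault(g, j)    # first column seen for each group
    splits = {g: 0 for g in first_col}
    for j, g in enumerate(col_g):
        if g != 0 and page_idx[j] != page_idx[first_col[g]]:
            splits[g] += 1
    return splits


def group_split_count(col_g: List[int], page_idx: List[int]) -> int:
    """gSplitCount: number of groups with at least one split column."""
    return sum(1 for s in group_splits(col_g, page_idx).values() if s >= 1)


col_g = [0, 1, 1, 0, 0, 0, 1, 0, 0, 0]              # columns 2, 3 and 7 form group 1
page_idx_min = [5, 5, 1, 2, 1, 4, 3, 4, 5, 3]       # "Minimum 5 pages" solution
page_idx_grouped = [5, 1, 1, 2, 5, 4, 1, 4, 3, 3]   # grouped solution

print(group_splits(col_g, page_idx_min), group_split_count(col_g, page_idx_min))
print(group_splits(col_g, page_idx_grouped), group_split_count(col_g, page_idx_grouped))
```

Under these assumptions the ungrouped five-page solution scatters the group over three pages (gSplit = 2), while the grouped solution keeps it on one page at the same page count.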
18. 3. Minimum Page Count & Group Splitting
// a, b, obj1Val variables are used for OPL flow control
minimize a * pageCount + b * gSplitCount;
subject to {
  ct1: // do not exceed page width
    forall(p in Pages)
      sum(j in Cols)
        colW[j] * (p==pageIdx[j]) / pageW <= pageSel[p];
  // if the minimum page count obj1Val is set
  // maintain this value for subsequent searches
  ct2:
    if (obj1Val >= 0 ) pageCount == obj1Val;
}
19. 3. Minimum Page Count & Group Splitting Model
Results
● Promising performance:
  – 1m for a 20-column table with 3 groups, none split, page count from 12 down to 9;
  – 2m for 30-40 column tables, but time increased up to 12m when the number of groups increased;
● Computational time increases with the number of columns and groups
● Some relaxed solutions can be preferred
21. Conclusions
• An optimal arrangement of columns that minimizes the page count when splitting wide tables can be found in relatively short running time; for tables with 60 columns a solution was found in less than 2s;
• If additional criteria are added, for example minimizing the number of relative column position changes, the problems become harder as the number of columns increases;
• The difficulty of the problems depends not only on the problem size but also on the complexity of the data.
22. Ongoing work
Minimizing the overall page count when a large table
containing text is displayed on fixed size pages and
neither column widths nor row heights are known in
advance.