Comparing human solving time with SAT-solving for Sudoku problems

Knowledge Representation Project for the Master Artificial Intelligence
Vrije Universiteit Amsterdam
De Boelelaan 1105, 1081 HV Amsterdam
Abstract. This study aims to give more insight into human heuristic
Boolean Satisfiability (SAT) problem solving. Several variations of the
Davis-Putnam-Logemann-Loveland (DPLL) SAT solver were used to
compare computational steps with human solving time across Sudoku
problems of increasing difficulty. Results show that all implemented
variations of DPLL have low computational cost for Sudokus that
humans solve quickly. The Dynamic Largest Combined Sum (DLCS)
heuristic shows the highest correlation with human solving time (r: 0.55).
However, as human solving time increases, no correlation is found
between computational cost and human solving time. Further research with
more data on human Sudoku solving time should indicate whether a
relationship exists between computational cost and human-perceived
difficulty level, and if so, what kind of relationship this would be.
Keywords: Human Problem Solving · Boolean Satisfiability · Sudoku
1 Introduction
1.1 Boolean Satisfiability
Human problem solving is one of the most important research topics in
cognitive science and computer science, especially in the artificial
intelligence domain. Human beings are rational, and a major component of
rationality is the ability to reason. Cognitive research has shown that
individuals with no training in logic are still able to make logical
deductions [1]. However, as the number of propositions in an inference
increases, human reasoning soon demands more processing capacity than the
human brain has, whereas modern computers, using efficient methods, can
process inferences over a large number of propositions. Boolean
Satisfiability (SAT) solvers have been proposed for solving such problems
by computer.
Representing and solving various practical and theoretical problems as SAT
is a core subject in Artificial Intelligence, as well as in many other areas of
Computer Science and Engineering. Given a set of propositional variables and a
set of constraints expressed in conjunctive normal form (CNF), the goal of a SAT
problem is to find a variable assignment that satisfies all constraints or to prove

that no such assignment exists [2]. Over the years many different algorithms have
been used to solve SAT problems. The best-known SAT algorithms are
variations of the Davis-Putnam (DP) procedure, in particular its backtracking
refinement by Davis, Logemann and Loveland (DPLL). This procedure is based on a
backtracking search that, at each step, selects an assignment and tries
to simplify the remaining unresolved expressions [3].
1.2 Sudoku as SAT-problem
Solving a Sudoku has been the subject of many SAT studies, particularly
with respect to its mathematical, algorithmic and heuristic properties. Recently,
psychological aspects of Sudoku solving have also been studied [4, 5].
Several studies attest to the role of processing resources in problem solving.
Solving a task like Sudoku takes considerable working memory resources. Even
if a human only approximately takes all the constraints into account, the
sequence of problem-solving and bookkeeping steps would tax working memory
[6]. Newell and Simon state that humans always use heuristic strategies to make
complex problem solving easier [7]. Wang, Xiang, Zhou, Qin and Zhong (2009)
investigated heuristic retrieval in human problem solving by combining the
computational cognitive model ACT-R with fMRI brain imaging.
They let participants solve a 4 x 4 Sudoku and showed that the way the problem
is presented, the complexity of the heuristics and the status of the goal play
important roles in the retrieval of heuristics [5]. However, it is not known
whether this ability to retrieve heuristics extends to the spontaneous use of
propositional logic, as SAT solvers do, for solving logical puzzles.
Research by Pelánek (2011) shows that the solving time of a backtracking
algorithm grows exponentially with the number of variables. Nevertheless, a
classical 9 x 9 Sudoku can easily be solved by a computer using backtracking
search. For humans, however, systematic search is laborious and error-prone.
Studies on human SAT solving mainly involve understanding the retrieval,
implication and selection of heuristics [8].
1.3 Experiment
Difficulty of Sudoku problems is often measured by the positioning of the
numbers on the grid and the number of steps required to solve the problem,
whereas the number of given numbers is of less importance [9, 10]. Human
solving time is rarely used as a measure of Sudoku difficulty. To gain more
understanding of human (heuristic) SAT solving, the current paper compares the
computational steps of a SAT solver, under different implemented heuristics,
with human solving time. More specifically, this paper tries to identify
whether there exists a heuristic for Sudoku SAT solvers whose computational
cost scales similarly to human solving time.

Earlier research in SAT solving often used tasks that are somewhat complicated
or take a long time, and used a small sample of human data or no human
data at all. Therefore 9 x 9 Sudokus with human solving data will be used to
compare the normalized scaling of solving time for humans and for the different
SAT algorithms across difficulty levels. Since the difficulty of a Sudoku is
often measured as a function of the minimal number of computational steps needed
to solve it, it is hypothesized that there exists a heuristic for SAT solving whose
computational cost scales similarly to human solving time for different
human-defined difficulty levels.
The aim of this paper goes beyond the specific study of Sudoku: it is to give
more insight into human cognition and thinking, and more specifically into
human heuristic SAT solving. Such insight also has important applications in
human-computer collaboration and in the training of problem-solving skills,
e.g., for developing intelligent tutoring systems [8].
The remainder of this paper is organized in four sections. Section 2
introduces definitions of SAT solving, as well as the implementation of the
variations of SAT solvers used for this paper. Section 3 describes the
experimental setup and Section 4 presents the results of the experiment.
Section 5 discusses the results and their implications.
2 Definitions and implementations of SAT-solvers
A CNF formula with n binary variables x_1, x_2, ..., x_n consists of the
conjunction (AND) of m clauses, each of which consists of the disjunction (OR)
of k literals. A literal l is an occurrence of a Boolean variable or its negated
form [2]. A SAT solver is designed to search through the variable assignments
of the CNF until an assignment of truth values that satisfies every clause is
found (the CNF formula is satisfiable) or all combinations of truth assignments
have been exhausted and no solution has been found (unsatisfiable).
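To make these definitions concrete, the following minimal Python sketch
represents a CNF formula and checks a truth assignment against it. The integer
encoding of literals (a positive integer for a variable, a negative integer for
its negation, as in the DIMACS format) and the names clause_satisfied,
formula_satisfied and example_cnf are assumptions made for illustration; they
are not taken from the implementation used in this paper.

```python
from typing import Dict, List

# Assumed encoding: literal = non-zero int, clause = list of literals,
# formula = list of clauses.
Clause = List[int]
CNF = List[Clause]


def clause_satisfied(clause: Clause, assignment: Dict[int, bool]) -> bool:
    """A clause (OR) holds if at least one of its literals is true.

    Unassigned variables are missing from `assignment` and count as not true.
    """
    return any(assignment.get(abs(lit)) == (lit > 0) for lit in clause)


def formula_satisfied(cnf: CNF, assignment: Dict[int, bool]) -> bool:
    """A CNF formula (AND) holds if every one of its clauses holds."""
    return all(clause_satisfied(clause, assignment) for clause in cnf)


# (x1 OR NOT x2) AND (x2 OR x3) AND (NOT x1 OR NOT x3)
example_cnf: CNF = [[1, -2], [2, 3], [-1, -3]]
```

For instance, the assignment x1 = true, x2 = true, x3 = false satisfies
example_cnf, whereas x1 = true, x2 = false, x3 = true violates its third clause.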
The SAT solver that is used in this paper is based on the
Davis-Putnam-Logemann-Loveland (DPLL) backtrack search algorithm [3]. At
the start, all variables are unassigned. The algorithm first traverses all the
clauses of the CNF and returns 'satisfiable' if the CNF has no clauses left and
'unsatisfiable' if an empty clause is found. If neither condition applies
to the CNF, the algorithm searches for a unit clause (a clause with only
one literal) and, if one is found, satisfies this clause. If no unit clauses are
found, the algorithm assigns a truth value to a randomly chosen variable. If no
conflict is found after this step, all steps are repeated. If a conflict has been
found, the algorithm backtracks by unassigning one or more recently assigned
variables and continues by assigning a truth value to another variable.
Pseudo-code for the DPLL algorithm is given below. For the remainder of
this article, this basic DPLL algorithm is referred to as 'RANDOM'.

function DPLL(Φ)
    if Φ is a consistent set of literals then
        return true;
    if Φ contains an empty clause then
        return false;
    for every unit clause l in Φ do
        Φ ← unit-propagate(l, Φ);
    for every literal l that occurs pure in Φ do
        Φ ← pure-literal-assign(l, Φ);
    l ← choose-literal(Φ);
    return DPLL(Φ ∧ l) or DPLL(Φ ∧ ¬l);
Φ ∧ l denotes the simplified result of substituting "true" for l in Φ.
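The pseudo-code can be turned into a short runnable sketch. The Python version
below is an illustration under stated assumptions, not the implementation used
for the experiments: literals are non-zero integers (negative for negation),
the helper names simplify and dpll are hypothetical, and the branching step
simply takes the first literal of the first clause, where RANDOM would pick a
variable at random.

```python
from typing import Dict, List, Optional

CNF = List[List[int]]


def simplify(cnf: CNF, lit: int) -> CNF:
    """Set `lit` to true: drop satisfied clauses, delete the negated literal."""
    return [[x for x in clause if x != -lit] for clause in cnf if lit not in clause]


def dpll(cnf: CNF, assignment: Optional[Dict[int, bool]] = None) -> Optional[Dict[int, bool]]:
    """Return a satisfying (possibly partial) assignment, or None if unsatisfiable."""
    assignment = dict(assignment or {})
    # Unit propagation, with the two termination checks of the pseudo-code.
    while True:
        if not cnf:
            return assignment              # no clauses left: satisfiable
        if any(len(clause) == 0 for clause in cnf):
            return None                    # empty clause: conflict
        unit = next((clause[0] for clause in cnf if len(clause) == 1), None)
        if unit is None:
            break
        assignment[abs(unit)] = unit > 0
        cnf = simplify(cnf, unit)
    # Pure literal assignment: a literal whose negation never occurs.
    literals = {x for clause in cnf for x in clause}
    pure = [x for x in literals if -x not in literals]
    if pure:
        for x in pure:
            assignment[abs(x)] = x > 0
            cnf = simplify(cnf, x)
        return dpll(cnf, assignment)
    # Branching step (assumption: first literal instead of a random variable).
    lit = cnf[0][0]
    for choice in (lit, -lit):
        result = dpll(simplify(cnf, choice), {**assignment, abs(choice): choice > 0})
        if result is not None:
            return result
    return None
```

Variables that do not appear in the returned dictionary are unconstrained and
may take either truth value. A branching heuristic would replace only the line
that chooses lit.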
In later years, researchers have built more heuristics on top of the original DPLL
algorithm to improve performance. An important step in the DPLL algorithm
is the assigning of a truth value to a variable. Various heuristics for this step
(branching heuristics) have resulted in significant reduction of the amount of
search and running time [11].
For this paper, the Dynamic-Largest-Individual-Sum (DLIS), the
Dynamic-Largest-Combined-Sum (DLCS) and the One-Sided and Two-Sided
Jeroslow-Wang (OSJW and TSJW, respectively) branching heuristics have been
implemented [12, 13]. DLIS selects the literal that appears most often in the
unresolved clauses, whereas DLCS selects the variable that appears most often
in the unresolved clauses (the sum of the occurrences of the variable and of
its negated form) and branches to the more frequent form of that variable.
The Jeroslow-Wang heuristics score each literal with formula 1, where len(C) is
the length of an unresolved clause C in which the literal occurs, so literals
in shorter clauses receive a higher weight. OSJW branches to the literal with
the highest score. TSJW sums the scores of a variable and of its negated form,
selects the variable with the highest combined score and branches to its
higher-scoring form.
J(l) = \sum_{C \,:\, l \in C} 2^{-\mathrm{len}(C)}    (1)
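As an illustration, the four branching heuristics can be sketched in a few lines
of Python. The sketch follows the common definitions from the literature on
branching heuristics [11], which may differ in detail from the implementation
used for the experiments; the function names and the integer encoding of
literals (negative for negation) are assumptions. Each function receives the
unresolved clauses and returns the literal to branch on.

```python
from collections import Counter, defaultdict
from typing import Dict, List

CNF = List[List[int]]


def dlis(cnf: CNF) -> int:
    """DLIS: the single literal with the most occurrences."""
    counts = Counter(lit for clause in cnf for lit in clause)
    return max(counts, key=lambda lit: counts[lit])


def dlcs(cnf: CNF) -> int:
    """DLCS: the variable with the most occurrences over both polarities,
    branched to its more frequent polarity."""
    counts = Counter(lit for clause in cnf for lit in clause)
    variables = {abs(lit) for lit in counts}
    var = max(variables, key=lambda v: counts[v] + counts[-v])
    return var if counts[var] >= counts[-var] else -var


def jw_scores(cnf: CNF) -> Dict[int, float]:
    """Formula 1: J(l) is the sum of 2^(-len(C)) over clauses C containing l."""
    scores: Dict[int, float] = defaultdict(float)
    for clause in cnf:
        for lit in clause:
            scores[lit] += 2.0 ** -len(clause)
    return dict(scores)


def jw_one_sided(cnf: CNF) -> int:
    """One-sided Jeroslow-Wang: the literal with the highest J(l)."""
    scores = jw_scores(cnf)
    return max(scores, key=lambda lit: scores[lit])


def jw_two_sided(cnf: CNF) -> int:
    """Two-sided Jeroslow-Wang: the variable with the highest J(x) + J(-x),
    branched to the polarity with the higher score."""
    scores = jw_scores(cnf)
    variables = {abs(lit) for lit in scores}
    var = max(variables, key=lambda v: scores.get(v, 0.0) + scores.get(-v, 0.0))
    return var if scores.get(var, 0.0) >= scores.get(-var, 0.0) else -var
```

Ties are broken arbitrarily by max in this sketch; a real solver would fix a
tie-breaking rule to make runs reproducible.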
3 Experimental design
To see whether the SAT solvers' computational cost scales similarly to human
solving time, 9 x 9 Sudokus are used. For a partially filled 9 x 9 Sudoku,
the goal is to place the numbers 1 to 9 in the cells in such a way that in each
row, column, and 3 x 3 sub-grid, each number occurs exactly once.
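These rules translate directly into CNF clauses. The sketch below shows one
common encoding, given here as an assumption since the exact encoding used for
the experiments is not specified: one Boolean variable per (row, column, digit)
triple, a clause stating that each cell holds at least one digit, and binary
clauses forbidding two digits in one cell or the same digit twice in a row,
column or sub-grid. The names var, sudoku_rules and encode_givens are
hypothetical.

```python
from itertools import combinations
from typing import List

DIGITS = range(1, 10)


def var(r: int, c: int, d: int) -> int:
    """Map (row, column, digit), each in 1..9, to a variable id in 1..729."""
    return 81 * (r - 1) + 9 * (c - 1) + d


def sudoku_rules() -> List[List[int]]:
    """Clauses that hold for every 9 x 9 Sudoku, independent of the givens."""
    clauses: List[List[int]] = []
    for r in DIGITS:
        for c in DIGITS:
            # Each cell holds at least one digit ...
            clauses.append([var(r, c, d) for d in DIGITS])
            # ... and at most one digit.
            for d1, d2 in combinations(DIGITS, 2):
                clauses.append([-var(r, c, d1), -var(r, c, d2)])
    rows = [[(r, c) for c in DIGITS] for r in DIGITS]
    cols = [[(r, c) for r in DIGITS] for c in DIGITS]
    boxes = [[(br + i, bc + j) for i in range(3) for j in range(3)]
             for br in (1, 4, 7) for bc in (1, 4, 7)]
    # A digit occurs at most once in each row, column and 3 x 3 sub-grid.
    for group in rows + cols + boxes:
        for d in DIGITS:
            for (r1, c1), (r2, c2) in combinations(group, 2):
                clauses.append([-var(r1, c1, d), -var(r2, c2, d)])
    return clauses


def encode_givens(grid: str) -> List[List[int]]:
    """Turn an 81-character grid (row by row, '.' or '0' for an empty cell)
    into one unit clause per given number."""
    return [[var(i // 9 + 1, i % 9 + 1, int(ch))]
            for i, ch in enumerate(grid) if ch in "123456789"]
```

Together with the at-least-one-digit clauses, the at-most-once clauses imply
that each number occurs exactly once per row, column and sub-grid. A concrete
puzzle is then the list sudoku_rules() + encode_givens(grid).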
3.1 Data collection
Sudokus were obtained from the website www.sudoku.org.uk, where a new
Sudoku problem is presented every day. When a person completes a Sudoku, this
person can upload the solution together with the estimated solving time. Every
day, 200-300 solutions with estimated solving time are uploaded. For the
experiment, 30 Sudokus, from August 19th - September 18th 2020, were used.
Although the lack of direct control over participants is a disadvantage of this
way of collecting human solving times, the large sample size makes it robust
and applicable for research.
3.2 Experimental conditions
The SAT solvers RANDOM, DLIS, DLCS, OSJW and TSJW were tested on
thirty Sudokus. The Sudokus are ordered by human solving time, with 1 as
the quickest solving time and 30 as the slowest solving time.
3.3 Metrics
For statistical analysis, Pearson’s correlation coefficient (r) is used to compare
the scaling in number of iterations of the different implementations of the SAT
solver with human solving time.
Fig. 1: The best-fit line (r: 0.99) for number of iterations versus solving
time for all Sudokus

Since actual solving time will differ between central processing units (CPUs),
the number of iterations is used as a measure of computer solving time. The
number of iterations scales linearly with actual solving time (r: 0.99, see
fig. 1). For a visual comparison in the graphs, data was first normalized by
dividing the solving time per Sudoku by the longest solving time for the
particular heuristic or for the human data.
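The normalization and the correlation metric can be sketched as follows. This
is a minimal illustration with hypothetical function names, not the analysis
code used for the paper. Note that dividing a series by its (positive) maximum
does not change Pearson's r, so the normalization only affects the graphs.

```python
from math import sqrt
from typing import List, Sequence


def normalize(values: Sequence[float]) -> List[float]:
    """Divide every value by the longest (largest) value in the series."""
    longest = max(values)
    return [v / longest for v in values]


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson's correlation coefficient between two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (spread_x * spread_y)
```

Here xs would hold the human solving times of the thirty Sudokus and ys the
number of iterations of one solver variant on the same Sudokus, in the same
order.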
4 Results
Table 1 presents the correlation r of each heuristic with the average human
solving time over all Sudokus. The graph of human solving time
and computational iterations for DLCS is presented in figure 2; the graphs for
the other heuristics are presented in appendix A.
Fig. 2: Graph presenting the normalized human solving time and normalized
computational iterations. The Sudoku index on the horizontal axis is sorted by
human solving time, where Sudoku 1 has the quickest solving time and Sudoku 30
has the longest solving time.

Table 1: Correlations of heuristics with average human solving time.
Heuristic                              Correlation r with human solving time
Original DPLL (RANDOM)                 0.44
Dynamic Largest Combined Sum (DLCS)    0.55
Dynamic Largest Individual Sum (DLIS)  0.41
One-Sided Jeroslow-Wang (OSJW)         0.29
Two-Sided Jeroslow-Wang (TSJW)         0.29
5 Discussion and Conclusion
This paper focused on gaining more understanding of human (heuristic) SAT
solving by comparing the scaled computational cost of solving a Sudoku between
humans and a SAT solver with four implemented heuristics. It was hypothesized
that there exists a heuristic for SAT solving whose computational cost scales
similarly to human solving time for different human-defined difficulty levels.
The results show that for all the heuristics, a weak to moderate positive
correlation (r between 0.29 and 0.55) has been found between scaled human
solving time and number of iterations. This indicates that for both humans and
computers, solving time scales with the difficulty level of the Sudoku. The
correlation was highest for the DLCS heuristic (r: 0.55). However, a suggestion
that human Sudoku solving corresponds most closely to the DLCS solving
procedure is too simplistic and further research is needed.
Because the CNF for Sudoku contained neither tautologies nor pure literals,
the numbers of clauses and literals were analyzed for further investigation. An
observation of interest is the value of the ratio between the number of clauses
and the number of literals at which the Sudoku problem becomes increasingly hard
to solve. This value turns out to be about 2.55. It was obtained by computing
the ratio between the number of clauses and the number of literals at every
split, and then averaging these values over the total number of splits performed
while solving the Sudoku problem (see fig. 3).
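The averaging described above can be sketched as follows. The text does not say
whether "number of literals" counts distinct literals or all literal
occurrences; the sketch assumes distinct literals in the unresolved clauses,
and both function names are hypothetical.

```python
from typing import List, Sequence

CNF = List[List[int]]


def clause_literal_ratio(cnf: CNF) -> float:
    """Number of unresolved clauses divided by the number of distinct
    literals occurring in them (assumed reading of 'number of literals')."""
    literals = {lit for clause in cnf for lit in clause}
    if not literals:
        raise ValueError("ratio is undefined for a formula without literals")
    return len(cnf) / len(literals)


def mean_ratio(ratios_at_splits: Sequence[float]) -> float:
    """Average of the ratios recorded at every split of one solver run."""
    if not ratios_at_splits:
        raise ValueError("no splits were recorded")
    return sum(ratios_at_splits) / len(ratios_at_splits)
```

A solver would call clause_literal_ratio on the simplified formula each time it
branches, collect the values in a list, and report mean_ratio of that list for
the Sudoku.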
The high peak in the number of iterations needed by the SAT solvers at the
clause-literal ratio of 2.55 suggests that no linear relationship between
computational cost and human solving time exists. These findings do not rule
out the existence of a relationship between computational cost and
human-perceived difficulty level, but the peak in iterations does suggest that a
linear relationship would be too simplistic and that such a relationship cannot
be successfully modelled with a data set of thirty Sudokus. Further research
with more Sudoku examples would make this suggestion more robust.
Our findings indicate that DPLL (regardless of the chosen heuristic) has low
computational cost for Sudokus that humans solve quickly. However,
as human solving time increases, we fail to find a correlation between
computational cost and human solving time (regardless of the chosen heuristic).
Further research with more data on human Sudoku solving time should indicate
whether a relationship exists between computational cost and human-perceived
difficulty level, and if so, what kind of relationship this would be.
Fig. 3: Graph presenting the number of computational iterations needed to solve
a Sudoku. Many Sudokus have a clause-literal ratio around 2.5.
Around this ratio, some Sudokus needed a large number of iterations,
suggesting that these Sudokus were hard to solve for the SAT solvers.
References
1. Philips, N. J-L.: Mental models and human reasoning. Department of Psychology
2(5), (2010)
2. Aloul, F.A.: Search techniques for SAT-based Boolean optimization. Journal of the
Franklin Institute 343(4-5), 436–447, (2006)
3. Martin Davis, George Logemann, and Donald Loveland. 1962. A machine
program for theorem-proving. Commun. ACM 5, 7 (July 1962), 394–397.
DOI:https://doi.org/10.1145/368273.368557
4. Lee, N.Y.L., Goodwin, G.P., and Johnson-Laird, P.N.: The psychological puzzle of
Sudoku. Thinking & Reasoning 14(4), 342–364, (2008)
5. Rifeng Wang, Jie Xiang, Haiyan Zhou, Yulin Qin and Ning Zhong: Simulating Human
Heuristic Problem Solving: A Study by Combining ACT-R and fMRI Brain Image.
(2009)
6. Ashcraft, M H., and Radvansky, G. A.: Cognition. 5th edn. Pearson, Boston (2010)
7. Newell, A., and Simon, H.A.: Human Problem Solving. Prentice-Hall, Englewood
Cliffs (1972)
8. Pelánek, R.: Human Problem Solving: Sudoku Case Study. Faculty of Informatics.
(2011)
9. Moraglio, A., and Togelius, J.: Geometric particle swarm optimization
for the sudoku puzzle. 118–125, (2007). https://doi.org/10.1145/1276958.1276975

10. Jaysonne A. Pacurib, Glaiza Mae M. Seno, and John Paul T. Yusiong. 2009.
Solving Sudoku Puzzles Using Improved Artificial Bee Colony Algorithm. In Pro-
ceedings of the 2009 Fourth International Conference on Innovative Computing,
Information and Control (ICICIC ’09). IEEE Computer Society, USA, 885–888.
DOI:https://doi.org/10.1109/ICICIC.2009.334
11. Marques-Silva, J. (1999) The Impact of Branching Heuristics in Propositional
Satisfiability Algorithms. In: Barahona P., Alferes J.J. (eds) Progress in Artificial
Intelligence. EPIA 1999. Lecture Notes in Computer Science, vol 1695. Springer, Berlin,
Heidelberg. https://doi.org/10.1007/3-540-48159-1_5
12. Marques-Silva, J. P., and Sakallah, K. A.: GRASP: A Search Algorithm for Proposi-
tional Satisfiability. Transactions on Computers 48 (5), 506-521, (1999)
13. Jeroslow, R.G., Wang, J. Solving propositional satisfiability problems. Ann Math Artif
Intell 1, 167–187 (1990). https://doi.org/10.1007/BF01531077
6 Appendix A
