Andrey Zakharevich for MLST, 9.03.2022

Program Synthesis, DreamCoder and ARC
About me
• 2002 — first Hello World 🤯
• 2010 — dropped out of university
• 2012–2021 — various software engineering jobs, mostly backend
• 2013 — first neural network, in Ruby (sic)
• 2018 — immigrated to Israel
• March 2020 — started working on an ARC solution
• July 2021 — learned about DreamCoder and switched to using it as a base for an ARC solution attempt
Outline
• What program synthesis is
• Top-level overview of DreamCoder
• What ARC is and why it is important
• My own insights from working on all this, and possible directions
Program Synthesis
• Task definition
• ????
• PROGRAM!
Program synthesis: Task definition
• Input-output pairs (FlashMeta, DreamCoder)
• Text prompt (Codex, Copilot)
• Logical constraints (Coq)
• High-level code (compilers)
Program synthesis: Main approaches
• Incremental search over a tree of possible programs
  • The tree is exponentially big
  • Hard to evaluate incomplete programs
• Full-text generation by a language model
  • Hard to ensure syntax, type, and memory correctness
  • Prompts usually don’t include tests
• Genetic programming
  • Needs mutation and crossover operations that preserve code correctness
  • Hard to define which intermediate programs are more “fit”
DreamCoder
1. Takes a set of tasks
2. Starts with a library (or grammar) of primitive functions
3. (Enumeration) Tries to solve the tasks with the current grammar
4. Generates more possible programs from the current grammar (“dreams”)
5. (Recognition) Trains a neural network on the found solutions and the dreams; given a task description, it predicts the probability of each function appearing in the solution
6. (Enumeration) Tries to solve the tasks again, this time using the probabilities from the NN
7. (Compression) Looks for repeated patterns in the found solutions and adds them to the grammar
8. Goes to step 3
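A minimal sketch of this wake-sleep loop in Julia (the language the rest of this talk moves to); every helper here is a hypothetical stand-in for the corresponding DreamCoder component:

```julia
# Hypothetical stand-ins: enumerate_solutions, sample_programs,
# train_recognizer, compress.
function dreamcoder_loop(tasks, grammar; iterations=10)
    recognizer = nothing                # no neural guidance on the first pass
    for _ in 1:iterations
        solutions  = enumerate_solutions(tasks, grammar, recognizer)  # steps 3/6
        dreams     = sample_programs(grammar)                         # step 4
        recognizer = train_recognizer(solutions, dreams, grammar)     # step 5
        grammar    = compress(grammar, solutions)                     # step 7
    end
    return grammar, recognizer
end
```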
DreamCoder: Enumeration
• Takes a set of tasks of the same type, and a grammar
• Programs are expressions of typed lambda calculus with De Bruijn indexing
• Search starts from a single hole ( ?? ) of the expected type
• Every primitive from the grammar is checked for whether it can unify with the hole’s type
• All possibilities are weighted according to the grammar; partial solutions are stored in a priority queue
• When all holes are filled, the program is checked against all unsolved tasks

Example enumeration tree:

( ??[list(int) -> list(int)] )
└─ (lambda ??[list(int)])
   ├─ (lambda empty)
   ├─ (lambda $0)
   └─ (lambda (cons ??[int] ??[list(int)])) …
      └─ (lambda (cons 0 ??[list(int)])) …
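A minimal sketch of this best-first enumeration; `Hole`, `is_complete`, `first_hole`, `unifies`, `fill_hole`, `cost`, `productions`, and `check_against_tasks` are hypothetical stand-ins, and the priority queue comes from DataStructures.jl:

```julia
using DataStructures: PriorityQueue, enqueue!, dequeue!

# Best-first search over partial programs: lower cost = tried earlier.
function enumerate_programs(grammar, goal_type, tasks; budget=10_000)
    pq = PriorityQueue{Any,Float64}()
    enqueue!(pq, Hole(goal_type), 0.0)          # start from a single typed hole
    for _ in 1:budget
        isempty(pq) && break
        partial = dequeue!(pq)
        if is_complete(partial)
            # only fully-filled programs can be run against the tasks
            check_against_tasks(partial, tasks) && return partial
        else
            # try every primitive whose type unifies with the first hole
            for (prim, logprob) in productions(grammar)
                unifies(prim, first_hole(partial)) || continue
                enqueue!(pq, fill_hole(partial, prim), cost(partial) - logprob)
            end
        end
    end
    return nothing                               # budget exhausted
end
```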
DreamCoder: Enumeration (continued)
• Type-correctness requires that we go from the output type toward the inputs
• We can’t check partial programs for runtime correctness: all the possible completions of (cons (car empty) ?? ) will be explored (within the priorities and the time limit), even though (car empty) always fails at runtime
• Can generate infinite loops, so it requires timeouts and interruption management (see the sketch below)
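A minimal sketch of one way to impose such a timeout in Julia; note that this only interrupts code that yields to the scheduler, so real interruption management is considerably messier:

```julia
# Run f() with a time limit; returns :timeout if the deadline passes first.
# Caveat: a task that never yields cannot actually be cancelled this way.
function with_timeout(f, timeout_s)
    ch = Channel{Any}(1)
    @async try
        put!(ch, f())
    catch err
        put!(ch, err)                 # surface crashes as a result, too
    end
    Timer(_ -> isready(ch) || put!(ch, :timeout), timeout_s)
    return take!(ch)
end

with_timeout(() -> sum(1:1000), 1.0)                         # => 500500
with_timeout(() -> (while true; sleep(0.1); end), 0.5)       # => :timeout
```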
DreamCoder: Compression
Generates many transformations of the found programs and looks for repeating subprograms such that adding them to the library reduces the combined length of all found solutions plus the library itself (a toy version of this criterion is sketched below).
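A minimal sketch of that criterion, with programs as plain s-expressions and node count standing in for description length (real DreamCoder scores refactorings of the solutions, not just literal matches):

```julia
# Programs as s-expressions: a Symbol leaf or a Vector of subexpressions.
progsize(p::Symbol) = 1
progsize(p::Vector) = 1 + sum(progsize, p)

# Count literal occurrences of `candidate` inside program `p`.
occurrences(candidate, p) =
    (p == candidate ? 1 : 0) +
    (p isa Vector ? sum(x -> occurrences(candidate, x), p; init=0) : 0)

# MDL-style gain: every occurrence collapses to a single library call,
# but the library itself grows by the candidate's size.
function compression_gain(candidate, solutions)
    saved = sum(s -> occurrences(candidate, s) * (progsize(candidate) - 1),
                solutions; init=0)
    return saved - progsize(candidate)    # positive => worth adding
end

sols = [Any[:map, Any[:+, :x, :one], :xs],
        Any[:fold, Any[:+, :x, :one], :ys]]
compression_gain(Any[:+, :x, :one], sols)   # 2 uses: 2*(4-1) - 4 = 2
```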
DreamCoder: Recognition
• An RNN with LSTM layers
• The input is the task definition
• Different domains can have different input features (e.g., a couple of CNN layers for image domains)
• For a grammar with n functions, the NN has n + 1 outputs
• Each output is the probability that the corresponding function is used in the solution to the task
• The last output is the probability of a free-variable term
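A minimal sketch of such a recognition model using Flux.jl; all dimensions are made up, the input is assumed to be an already-encoded task, and this relies on the Flux v0.13/v0.14 recurrent API:

```julia
using Flux   # assuming the Flux.jl v0.13/v0.14 recurrent API

n_functions = 32    # hypothetical library size
embed_dim   = 64    # hypothetical task-token embedding size
hidden_dim  = 128

# An LSTM reads the encoded task; the head emits n+1 independent
# probabilities: one per library function plus one for a free variable.
model = Chain(
    LSTM(embed_dim => hidden_dim),
    Dense(hidden_dim => n_functions + 1, sigmoid),
)

Flux.reset!(model)                              # clear the recurrent state
xs = [rand(Float32, embed_dim) for _ in 1:10]   # toy 10-step task encoding
ys = [model(x) for x in xs]
probs = ys[end]                                 # prediction after the last step
```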
DreamCoder: Overall
• Gradually expands its library of available functions, thus learning new discrete concepts without human guidance
• The NN model can be thought of as the intuition part: “this task looks like I should totally use reduce and not map in the solution”
• No support for dependent types means that we can’t propagate constraints through holes; see the (car empty) example
• A single probability per function may not be enough for complex problems with long solutions that use a big portion of the library. There is the contextual grammar extension, but it’s still fairly limited
• Lambda calculus may be quite limiting for efficient algorithms
Abstraction and Reasoning Corpus
• Introduced by François Chollet in “On the Measure of Intelligence”
• Solvable by humans but not by machines
• Targets the ability to operate with complex combinations of abstract patterns, without knowledge about the real world beyond Core Knowledge
• Has parallels with skill acquisition
• The private test set is sufficiently different from the public train and test data
• Tests developer-aware generalization
Intermission

How do I solve these tasks?
(Three slides of example ARC tasks, shown as images in the original deck.)
Abstractors
• A.k.a. reversible functions
• Somewhat akin to witness functions from FlashMeta
• A combination of to_abstract and from_abstract operations
• Preserve information, but present it in a different, possibly more efficient way
• to_abstract can have several outputs
• to_abstract can output several possible options
• Examples: grid_size, extract_background, extract_objects, group_similar_items, group_objects_by_color, vert_symmetry (a sketch of the interface follows this list)
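A minimal sketch of the abstractor interface, using grid_size as the example; the types and names are hypothetical, and the point is only that the to_abstract/from_abstract pair loses no information:

```julia
# An abstractor is a to_abstract/from_abstract pair such that
# from_abstract(to_abstract(x)...) == x for every value x.
struct GridSize end    # stands in for the grid_size abstractor above

# to_abstract can have several outputs: here, the dimensions and the cells.
to_abstract(::GridSize, grid::Matrix{Int}) = (size(grid), vec(grid))

# from_abstract reconstructs the original value exactly.
from_abstract(::GridSize, dims::Tuple{Int,Int}, cells::Vector{Int}) =
    reshape(cells, dims)

grid = [0 1; 2 3; 4 5]
dims, cells = to_abstract(GridSize(), grid)
@assert from_abstract(GridSize(), dims, cells) == grid
```

For an abstractor like extract_background, to_abstract might instead return several candidate decompositions (which color is the background?); each option is then tried during the search.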
How to evaluate representations?
A good evaluation function should:
• Work on different data types
• Probably not be Monte-Carlo — if it returns a non-zero result, we already have a solution

My current solution is weighted Kolmogorov complexity (sketched below):
• Each type has a certain weight per item
• Items of complex types cost the sum of the weights of all their sub-items, plus the weight of the type itself
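A minimal sketch of that measure; the concrete weights here are made up for illustration (in my current version they are picked by hand):

```julia
# Hypothetical hand-picked per-type weights.
const TYPE_WEIGHTS = Dict("int" => 1.0, "list" => 2.0, "grid" => 4.0)

# An item of a simple type costs that type's weight.
complexity(x::Int) = TYPE_WEIGHTS["int"]
# An item of a complex type costs its type's weight
# plus the sum of the complexities of all its sub-items.
complexity(xs::AbstractVector) = TYPE_WEIGHTS["list"] + sum(complexity, xs; init=0.0)
complexity(g::AbstractMatrix)  = TYPE_WEIGHTS["grid"] + sum(complexity, vec(g); init=0.0)

complexity([1, 2, 3])     # 2.0 + 3 * 1.0 = 5.0
complexity([0 1; 2 3])    # 4.0 + 4 * 1.0 = 8.0
```

Lower total weight means a simpler representation, which is what the search optimizes for.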
Intermediate results
• Solved 34/400 training tasks with a threshold of 500 visited partial solutions
• The abstractor library was quite limited
• I had to write all abstractors by hand
• I had to manually pick weights for different abstractors and types
Moving to DreamCoder: Why?
• It can learn new functions from primitives on its own
• It can learn weights for functions on its own
Moving to DreamCoder: Obstacles
• Written in OCaml — no type information at runtime, hard to experiment with, not so easy to read
• Creating programs from output to input means that I don’t have any intermediate representations to evaluate during the search
Moving to DreamCoder: Path to solution
• No runtime type information in OCaml, combined with absolute type strictness (you can either have a unit ref and have no idea what’s inside, or manually specify all the possible options), meant that I couldn’t manipulate intermediate representations at all. The solution was to rewrite it in another, more dynamic language; I chose Julia
• Introduce named variables into generated programs, as in let $x = … in …
• Make the search bidirectional: go for simpler representations of both the input and the output, while checking whether the new representations can help in explaining the output (see the sketch after this list)
• Add a special class of reversible functions and specify how they can be combined, so that the compression step will be able to learn new abstractors without losing their reversible nature
• Measure intermediate data complexity; learn type weights alongside function probabilities
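A minimal sketch of what that bidirectional loop could look like; `abstractors`, `to_abstract`, and `explains` are hypothetical stand-ins:

```julia
# Grow sets of (ideally simpler) representations of both the input and
# the output; stop once some input representation explains an output one.
function bidirectional_search(input, output, abstractors; max_steps=100)
    in_reps, out_reps = Any[input], Any[output]
    for _ in 1:max_steps
        for a in abstractors
            append!(in_reps,  to_abstract(a, last(in_reps)))   # hypothetical
            append!(out_reps, to_abstract(a, last(out_reps)))
        end
        for i in in_reps, o in out_reps
            # `explains` asks: can this input representation be mapped
            # deterministically onto this output representation?
            explains(i, o) && return (i, o)
        end
    end
    return nothing
end
```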
Moving to DreamCoder: Questions
• What is the best way to evaluate a program given a set of type and function weights? If we make decisions based on the qualities of intermediate representations, it’s no longer an admissible search problem
• Should we run the NN model not only at the beginning of an attempt to solve a task, but also on some intermediate representations? We are no longer constrained by OCaml here, but the model should be able to deal with various data types on its own, without additional feature engineering on our part
• Should we add dependent-type support and learn aliases for them? A rectangle is still an object, but it supports some very specific set of operations
References
• Ellis, K., Morales, L., Sablé-Meyer, M., Solar-Lezama, A., Tenenbaum, J.: Library learning for neurally-guided Bayesian program induction (2018)
• Ellis, K., Wong, C., Nye, M., Sablé-Meyer, M., Cary, L., Morales, L., Hewitt, L., Solar-Lezama, A., Tenenbaum, J.B.: DreamCoder: Growing generalizable, interpretable knowledge with wake-sleep Bayesian program learning (2020)
• Chollet, F.: On the measure of intelligence (2019)
• Polozov, O., Gulwani, S.: FlashMeta: A framework for inductive program synthesis. In: Aldrich, J., Eugster, P. (eds.) OOPSLA, pp. 107–126. ACM (2015), http://dblp.uni-trier.de/db/conf/oopsla/oopsla2015.html#PolozovG15
• Alford, S., Gandhi, A., Rangamani, A., Banburski, A., Wang, T., Dandekar, S., ... & Chin, P.: Neural-guided, bidirectional program search for abstraction and reasoning. In: International Conference on Complex Networks and Their Applications, pp. 657–668. Springer, Cham (2021)
That’s all!
• I’m open for collaboration and discussions
• I’m also open for employment, especially on something related
• https://github.com/andreyz4k/ec/tree/julia_enumerator
• https://www.linkedin.com/in/andreyzakharevich/
• Or @andreyz4k on most social media
