The Game of Life is a famous zero-player game devised by John Conway more than forty years ago. It is a cellular automaton that runs on a rectangular grid of cells, each of which is either alive or dead. A set of four simple transition rules prescribes how the cells in the grid evolve from one generation to the next.
Life, as it is called for short, has fascinated many programmers ever since it was published, mostly because of the surprising behavior it can produce. Many programmers have written an implementation of Life at some point, in an educational context or just for fun.
This webinar is intended as a combination of the two: education and fun. It will show how a distributed version of Life can be implemented using the Data Distribution Service (DDS) standard from the Object Management Group (OMG). The presented Distributed Life system will be able to deal with real-life system requirements like fault tolerance, scalability and deployment flexibility. It will be shown that by leveraging advanced DDS data-management features, system developers can off-load most of the complexity associated with the distribution and fault-tolerance aspects onto the infrastructure and focus on the application logic.
Learn How to Develop a Distributed Game of Life with DDS
1. Your systems. Working as one.
May 15, 2013
Reinier Torenbeek
reinier@rti.com
Learn How to Develop a
Distributed Game of Life with DDS
2. Agenda
• Problem definition: Life Distributed
• A solution: RTI Connext DDS
• Applying DDS to Life Distributed: concepts
• Applying DDS to Life Distributed: (pseudo)-code
• Advanced Life Distributed: leveraging DDS
• Summary
• Questions and Answers
3. Conway's Game of Life
• Devised by John Conway in 1970
• Zero-player game
– evolution determined by initial state
– no further input required
• Plays in two-dimensional, orthogonal grid of
square cells
– originally of infinite size
– for this webinar, toroidal array is used
• At any moment in time, each cell is either dead or
alive
• Neighboring cells interact with each other
– horizontally, vertically, or diagonally adjacent.
4. Conway's Game of Life
At each step in time, the following transitions occur:
1. Any live cell with fewer than two live neighbors
dies, as if caused by under-population.
2. Any live cell with two or three live neighbors lives on
to the next generation.
3. Any live cell with more than three live neighbors
dies, as if by overcrowding.
4. Any dead cell with exactly three live neighbors
becomes a live cell, as if by reproduction.
These rules continue to be applied repeatedly to create
further generations.
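The four transitions above can be sketched in a few lines of Python. This is a minimal single-process illustration, not code from the webinar: the function names are made up for this example, the grid is assumed to be a list of lists holding 0 (dead) or 1 (alive), and the edges wrap around to model the toroidal array mentioned earlier.

```python
def count_live_neighbors(grid, row, col):
    """Count live cells among the 8 neighbors, wrapping around the edges."""
    n_rows, n_cols = len(grid), len(grid[0])
    total = 0
    for d_row in (-1, 0, 1):
        for d_col in (-1, 0, 1):
            if d_row == 0 and d_col == 0:
                continue  # a cell is not its own neighbor
            total += grid[(row + d_row) % n_rows][(col + d_col) % n_cols]
    return total


def next_state(alive, live_neighbors):
    """Apply the four transition rules to a single cell."""
    if alive:
        # rules 1-3: a live cell survives only with two or three live neighbors
        return live_neighbors in (2, 3)
    # rule 4: a dead cell with exactly three live neighbors becomes alive
    return live_neighbors == 3


def step(grid):
    """Compute the next generation of a toroidal grid of 0/1 values."""
    return [
        [int(next_state(bool(cell), count_live_neighbors(grid, r, c)))
         for c, cell in enumerate(row)]
        for r, row in enumerate(grid)
    ]
```

Applying step() twice to a horizontal "blinker" (three live cells in a row) on a 5x5 grid gives back the starting grid, which is a quick sanity check of the rules.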
10. Conway's Game of Life – Distributed
Problem description: how can Life be properly
implemented in a distributed fashion?
• have multiple processes work on parts of the
Universe in parallel
• have these processes exchange the required
information for the evolutionary steps
This problem and its solution serve as an
example for developing distributed applications
in general
11. Conway's Game of Life – Distributed
Properly here means:
• with minimal impact on the application logic
– let distribution artifacts be dealt with transparently
– let the developer focus on Life and its algorithms
• allowing for mixed environments
– multiple programming languages, OS-es and hardware
– asymmetric processing power
• supporting scalability
– for very large Life Universes on many machines
– for load balancing of CPU intensive calculations
• in a fault-tolerant fashion
13. RTI Connext DDS
A few words describing RTI Connext DDS:
• an implementation of the Object Management
Group (OMG) Data Distribution Service (DDS)
– standardized, multi-language API
– standardized wire-protocol
– see www.rti.com/elearning for tutorials (some free)
• a high performance, scalable, anonymous
publish/subscribe infrastructure
• an advanced distributed data management
technology
– supporting many features known from DBMSs
14. RTI Connext DDS
DDS revolves around the concept of a typed data-
space that
• consists of a collection of structured, observable
items which
– go through their individual lifecycle of
creation, updating and deletion (CRUD)
– are updated by Publishers
– are observed by Subscribers
• is managed in a distributed fashion
– by Connext libraries and (optionally) services
– transparent to applications
15. RTI Connext DDS
DDS revolves around the concept of a typed data-
space that
• allows for extensive fine-tuning
– to adjust distribution behavior according to
application needs
– using standard Quality of Service (QoS) mechanisms
• can evolve dynamically
– allowing Publishers and Subscribers to join and leave
at any time
– automatically discovering communication paths
between Publishers and Subscribers
17. Applying DDS to Life Distributed
First step is to define the data-model in IDL
• cells are observable items, or "instances"
– row and col identify their location in the grid
– generation identifies the "tick nr" in evolution
– alive identifies the state of the cell
18. module life {
struct CellType {
long row; //@key
long col; //@key
unsigned long generation;
boolean alive;
};
};
19. Applying DDS to Life Distributed
First step is to define the data-model in IDL
• cells are observable items, or "instances"
– row and col identify their location in the grid
– generation identifies the "tick nr" in evolution
– alive identifies the state of the cell
• the collection of all cells is the CellTopic Topic
– cells exist side-by-side and form the Universe
– conceptually stored "in the data-space"
– in reality, local copies where needed
23. Applying DDS to Life Distributed
Each process is responsible for publishing the
state of a certain subset of cells of the Universe:
• a rectangle or square area with corners
(row_min, col_min)_i and (row_max, col_max)_i for process i
• each cell is individually updated using the
write() call on a CellTopic DataWriter
– middleware analyzes the key values (row,col) and
maintains the individual states of all cells
• updating happens generation by generation
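One way to hand out such rectangles is sketched below. The slides do not say how the prototype assigns areas to processes, so this is only an assumed scheme: split_universe is a hypothetical helper that cuts the Universe into a regular block layout, using 1-based inclusive coordinates.

```python
def split_universe(n_rows, n_cols, proc_rows, proc_cols):
    """Split an n_rows x n_cols grid into proc_rows x proc_cols rectangles.

    Returns a list of (row_min, col_min, row_max, col_max) tuples, one per
    process, using 1-based inclusive coordinates (an assumption of this sketch).
    """
    def bounds(size, parts):
        # distribute any remainder over the first few parts
        base, extra = divmod(size, parts)
        result, start = [], 1
        for i in range(parts):
            length = base + (1 if i < extra else 0)
            result.append((start, start + length - 1))
            start += length
        return result

    return [(r_lo, c_lo, r_hi, c_hi)
            for r_lo, r_hi in bounds(n_rows, proc_rows)
            for c_lo, c_hi in bounds(n_cols, proc_cols)]
```

For a 20x20 Universe and four processes in a 2x2 layout, the first rectangle is (1, 1, 10, 10). Each process would then write only the cells inside its own rectangle.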
26. Applying DDS to Life Distributed
Each process subscribes to the required subset
of cells in order to determine its current state:
• all neighboring cells, as well as its "own" cells
• using a SQL-expression to identify the cells
subscribed to (content-based filtering)
– complexity is "Life-specific", not "DDS-specific"
27. "((row >= 1 AND row <= 11) OR row = 20) AND
((col >= 1 AND col <= 11) OR col = 20)"
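A filter expression of this shape can be generated from the corners of a sub-Universe. The sketch below is an illustration under stated assumptions: the helper names are hypothetical, coordinates are assumed to be 1-based, and the example expression above is assumed to belong to a process owning rows and columns 1 to 10 of a 20x20 toroidal grid (so that index 20 is the wrapped-around neighbor of index 1).

```python
def needed_indices(lo, hi, size):
    """Own indices lo..hi plus one neighbor on each side, wrapped to 1..size."""
    return sorted({(i - 1) % size + 1 for i in range(lo - 1, hi + 2)})


def to_ranges(indices):
    """Group a sorted list of integers into (first, last) runs of consecutive values."""
    ranges = []
    for value in indices:
        if ranges and value == ranges[-1][1] + 1:
            ranges[-1] = (ranges[-1][0], value)
        else:
            ranges.append((value, value))
    return ranges


def axis_expression(name, lo, hi, size):
    """Build the filter term for one axis ("row" or "col")."""
    terms = []
    for first, last in to_ranges(needed_indices(lo, hi, size)):
        if first == last:
            terms.append("{} = {}".format(name, first))
        else:
            terms.append("({} >= {} AND {} <= {})".format(name, first, name, last))
    return terms[0] if len(terms) == 1 else "(" + " OR ".join(terms) + ")"


def filter_expression(row_min, col_min, row_max, col_max, n_rows, n_cols):
    """Combine both axes into one content-filter expression."""
    return "{} AND {}".format(
        axis_expression("row", row_min, row_max, n_rows),
        axis_expression("col", col_min, col_max, n_cols))
```

Under these assumptions, filter_expression(1, 1, 10, 10, 20, 20) reproduces the expression shown on the slide. The wrap-around logic lives entirely in the application, which matches the remark that the complexity is Life-specific rather than DDS-specific.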
28. Applying DDS to Life Distributed
Each process subscribes to the required subset
of cells in order to determine its current state:
• all neighboring cells, as well as its "own" cells
• using a SQL-expression to identify the cells
subscribed to (content-based filtering)
– complexity is "Life-specific", not "DDS-specific"
• middleware will deliver cell updates to those
DataReaders that are interested in them
29. Applying DDS to Life Distributed
Additional processes can be added to peek at
the evolution of Life:
• subscribing to (a subset of) the CellTopic
30. "row >= 8 AND row <= 13 AND col >= 8 AND col <= 13"
31. Applying DDS to Life Distributed
Additional processes can be added to peek at
the evolution of Life:
• subscribing to (a subset of) the CellTopic
• using any supported language, OS, platform
– C, C++, Java, C#, Ada
– Windows, Linux, AIX, Mac OS X, Solaris,
INTEGRITY, LynxOS, VxWorks, QNX…
• without changes to the existing applications
– middleware discovers new topology and
distributes updates accordingly
33. Life Distributed (pseudo-)code
Life Distributed prototype applications were
developed on Mac OS X
• Life evolution application written in C
• Life observer application written in Python
– using Python's extension API
• (Pseudo-)code covers basic scenario only
– more advanced aspects are covered in the next section
34. Life Distributed (pseudo-)code
Life evolution application written in C:
• application is responsible for
– knowing about the Life seed (initial state of cells)
– executing the Life rules based on cell updates
coming from DDS
– updating cell states after a full generation tick has
been processed
• evolution of Life takes place one generation at
a time
– consequently, Life applications run in "lock-step"
35. initialize DDS
current generation = 0
write sub-universe Life seed to DDS
repeat
    repeat
        wait for DDS cell update for current generation
        update sub-universe with cell
    until 8 neighbors seen for all cells
    execute Life rules on sub-universe
    increase current generation
    write all new cell states to DDS
until last generation reached
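The lock-step loop above can be tried out without any middleware. The following sketch replaces DDS with a plain in-memory dictionary; DataSpace, Worker and run are hypothetical names invented for this illustration and are not part of any DDS API. Rows and columns are 0-based here, cells hold 0 or 1, and each worker owns a band of complete rows of a toroidal grid.

```python
class DataSpace:
    """In-memory stand-in for the DDS data-space (hypothetical, not a DDS API)."""

    def __init__(self):
        self.cells = {}  # (row, col, generation) -> alive (0 or 1)

    def write(self, row, col, generation, alive):
        self.cells[(row, col, generation)] = alive

    def read(self, row, col, generation):
        """Return the cell state, or None if that update has not arrived yet."""
        return self.cells.get((row, col, generation))


class Worker:
    """Evolves rows row_min..row_max (inclusive) of a toroidal universe."""

    def __init__(self, space, seed, row_min, row_max):
        self.space, self.generation = space, 0
        self.n_rows, self.n_cols = len(seed), len(seed[0])
        self.rows = range(row_min, row_max + 1)
        for r in self.rows:  # write sub-universe Life seed
            for c in range(self.n_cols):
                space.write(r, c, 0, seed[r][c])

    def try_step(self):
        """Advance one generation if all 8 neighbors of all own cells are known."""
        new_states = {}
        for r in self.rows:
            for c in range(self.n_cols):
                neighbors = [self.space.read((r + dr) % self.n_rows,
                                             (c + dc) % self.n_cols,
                                             self.generation)
                             for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                             if (dr, dc) != (0, 0)]
                if None in neighbors:
                    return False  # an update is missing: keep waiting
                live = sum(neighbors)
                alive = self.space.read(r, c, self.generation)
                new_states[(r, c)] = int(live == 3 or (alive and live == 2))
        self.generation += 1
        for (r, c), alive in new_states.items():
            self.space.write(r, c, self.generation, alive)
        return True


def run(seed, generations, row_bands):
    """Run one worker per (row_min, row_max) band and return the final grid."""
    space = DataSpace()
    workers = [Worker(space, seed, lo, hi) for lo, hi in row_bands]
    while any(w.generation < generations for w in workers):
        for w in workers:
            if w.generation < generations:
                w.try_step()
    return [[space.read(r, c, generations) for c in range(len(seed[0]))]
            for r in range(len(seed))]
```

Because a worker only advances once every neighboring cell of the current generation is present, no worker can get more than one generation ahead of its neighbors. In the real system the waiting is done by blocking on DDS rather than by polling as in this sketch.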
38. Life Distributed (pseudo-)code
Worth noting about the Life application:
• loss of one cell-update will eventually stall the
complete evolution
– this is by nature of the Life algorithm
– implies RELIABLE reliability QoS for DDS
– history of 2 generations needs to be stored to avoid
overwriting of state of a single cell
• startup-order issues resolved by DDS durability QoS
– newly joined applications will be delivered current state
– delivery of historical data transparent to applications
– applications not waiting for other applications, but for cell
updates
42. Life Distributed (pseudo-)code
Worth noting about the Life application:
• DDS cell updates come from different places
– mostly from the application's own DataWriter
– also from neighboring sub-Universes' DataWriters
– all transparently arranged based on the filter
• algorithm relies on reading cell-updates for a
single generation
– evolving one tick at a time
– leverages DDS QueryCondition
– "generation = %0" with %0 value changing
43. create DDS DomainParticipant
with DomainParticipant, create DDS Topic "CellTopic"
with CellTopic and filter expression, create DDS ContentFilteredTopic "FilteredCellTopic"
with DomainParticipant, create DDS Subscriber
with Subscriber and FilteredCellTopic, create DDS CellTopicDataReader
with CellTopicDataReader, query expression and parameter list, create QueryCondition
with DomainParticipant, create WaitSet
attach QueryCondition to WaitSet
in main loop:
    in generation loop:
        block thread in WaitSet, wait for data from DDS
        read with QueryCondition from CellTopicDataReader
    increase generation
    update query parameter list with new generation
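The interplay between a per-cell history of 2 and a query on the generation can be mimicked with a toy cache. The class below is a hypothetical stand-in written for this illustration, not the DDS DataReader API: it keeps the last `depth` samples per (row, col) instance and selects samples by generation, much like a query "generation = %0" would.

```python
from collections import defaultdict, deque


class ReaderCache:
    """Toy model of a reader cache that keeps the last `depth` samples per cell.

    Hypothetical stand-in for illustration; it is not the DDS API.
    """

    def __init__(self, depth):
        # one bounded queue per instance, identified by the key (row, col)
        self.history = defaultdict(lambda: deque(maxlen=depth))

    def on_sample(self, row, col, generation, alive):
        # when the queue is full, the oldest sample of this instance is dropped
        self.history[(row, col)].append((generation, alive))

    def read_generation(self, generation):
        """Mimic a query such as "generation = %0" with %0 set to `generation`."""
        return {key: alive
                for key, samples in self.history.items()
                for gen, alive in samples
                if gen == generation}


def demo(depth):
    """A neighbor publishes generations 0 and 1 before we read generation 0."""
    cache = ReaderCache(depth)
    cache.on_sample(1, 1, 0, True)
    cache.on_sample(1, 1, 1, False)
    return cache.read_generation(0)
```

With a depth of 1, demo() returns an empty result because the generation 0 sample was overwritten. With a depth of 2 it is still there. Since the lock-step algorithm keeps neighbors at most one generation apart, two samples per cell are enough in this model.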
44. Life Distributed (pseudo-)code
Life observer application written in Python:
• application is responsible for
– subscribing to cell updates
– printing cell states to display evolution
– ignoring any generations that have missing cell
updates
45. import clifedds as life

# option parsing omitted
filterString = 'row>={} and col>={} and row<={} and col<={}'.format(
    options.minRow, options.minCol, options.maxRow, options.maxCol)
life.open(options.domainId, filterString)
generation = 0
while generation is not None:
    # read from DDS, block if nothing is available;
    # returns Nones in case of a time-out after 10 seconds
    row, col, generation, isAlive = life.read(10)
    # administration for building and printing strings omitted
life.close()
46. Life Distributed (pseudo-)code
Life observer application written in Python:
• application is responsible for
– subscribing to cell updates
– printing cell states to display evolution
– ignoring any generations that have missing cell
updates
• for minimal impact, DataReader uses default QoS
settings
– BEST_EFFORT reliability, so updates might be lost
– VOLATILE durability, so no delivery of historical
updates
– still history depth of 2
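The administration that the observer code omits is not shown in the slides. One possible way to ignore generations with missing cell updates is sketched below; complete_generations and render are hypothetical helpers, and the tuples mirror the (row, col, generation, isAlive) values returned by the read call in the observer.

```python
def complete_generations(updates, expected_cells):
    """Yield (generation, {(row, col): alive}) for fully received generations.

    `updates` is an iterable of (row, col, generation, alive) tuples in
    arrival order; `expected_cells` is the number of cells in the observed
    area. Generations that never become complete are silently skipped.
    """
    pending = {}       # generation -> {(row, col): alive}
    last_emitted = -1
    for row, col, generation, alive in updates:
        if generation <= last_emitted:
            continue   # late update for a generation that is already past
        cells = pending.setdefault(generation, {})
        cells[(row, col)] = alive
        if len(cells) == expected_cells:
            last_emitted = generation
            # drop this generation and any older, incomplete ones
            for old in [g for g in pending if g <= generation]:
                del pending[old]
            yield generation, cells


def render(cells, min_row, min_col, max_row, max_col):
    """Turn one complete generation into printable text lines."""
    return "\n".join(
        "".join("#" if cells[(r, c)] else "." for c in range(min_col, max_col + 1))
        for r in range(min_row, max_row + 1))
```

With BEST_EFFORT reliability some updates may never arrive, so an incomplete generation is simply overtaken by the next complete one instead of stalling the display.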
54. Fault tolerance
Not all is lost if Life application crashes
• if using TRANSIENT durability QoS,
infrastructure will keep status roaming
– requires extra service to run (redundantly)
• after restart, current status is available
automatically
– new incarnation can continue seamlessly
• results in high robustness
• even more advanced QoS-es are possible
59. Reliability and flow control
Running the Python app with a larger grid:
• whenever at least one cell update is missing,
the generation is not printed (by design)
• with current QoS, faster writer with slower
reader will overwrite samples in reader
• this is often the desired result, to avoid system-
wide impact of asymmetric processing power
• if not desired, KEEP_ALL QoS can be leveraged
– flow control will slow down writer to avoid loss
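The difference between the two history settings can be illustrated with standard Python containers. This is a toy model under simplifying assumptions, not DDS code: a bounded deque stands in for a keep-last cache that overwrites, and a bounded queue stands in for a keep-all cache that pushes back on the writer. The function names are made up for this sketch.

```python
from collections import deque
import queue


def keep_last_demo(depth, samples):
    """Keep-last style cache: a fast writer silently replaces the oldest samples."""
    cache = deque(maxlen=depth)
    for sample in samples:
        cache.append(sample)
    return list(cache)


def keep_all_demo(capacity, samples):
    """Keep-all style cache: a full cache refuses new samples instead of dropping.

    Returns (accepted, refused). In this toy model the refusal is reported
    immediately; a flow-controlled writer would instead wait and retry, which
    is what slows it down to the pace of the reader.
    """
    cache = queue.Queue(maxsize=capacity)
    accepted, refused = [], []
    for sample in samples:
        try:
            cache.put_nowait(sample)
            accepted.append(sample)
        except queue.Full:
            refused.append(sample)
    return accepted, refused
```

With a capacity of 2 and four samples written before the reader takes any, the first variant ends up holding only the two newest samples, while the second keeps the two oldest and holds the writer back on the rest.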
61. More advanced problem solving
Other ways to improve the Life implementation:
• for centralized grid configuration, distribute grid-
sizes with DDS
– with TRANSIENT or PERSISTENT QoS
– this isolates configuration-features to one single app
– dynamic grid-reconfiguration can be done by re-
publishing grid-sizes
• for centralized seed (generation 0)
management, distribute seed with DDS
– with TRANSIENT or PERSISTENT QoS
– this isolates seeding to one single app
62. More advanced problem solving
Other ways to improve the Life implementation:
• in addition to separate cells, distribute complete sub-
Universe state using a more compact data-type
– DDS supports a very rich set of data-types
• bitmap-like type would work well
– especially useful for very large scale Universes
– can be used for seeding as well
– with TRANSIENT QoS
• multiple Universes can exist and evolve side-by-side using
Partitions
– only readers and writers that have a Partition in common will
interact
– Partitions can be added and removed on the fly
– Partitions are string names, allowing good flexibility
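The bitmap-like type mentioned above could carry one bit per cell. The slides do not define such a type, so the following is only a sketch of the packing idea with hypothetical pack and unpack helpers; the resulting bytes could, for example, be the payload of an octet sequence field.

```python
def pack(grid):
    """Pack a rectangular grid of 0/1 values into bytes, 8 cells per byte."""
    bits = [cell for row in grid for cell in row]
    packed = bytearray((len(bits) + 7) // 8)
    for i, bit in enumerate(bits):
        if bit:
            packed[i // 8] |= 1 << (i % 8)
    return bytes(packed)


def unpack(data, n_rows, n_cols):
    """Rebuild the grid from packed bytes; the dimensions must be known."""
    return [[(data[(r * n_cols + c) // 8] >> ((r * n_cols + c) % 8)) & 1
             for c in range(n_cols)]
            for r in range(n_rows)]
```

A 100x100 sub-Universe then fits in 1250 bytes in a single update, compared with 10000 separate cell updates.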
64. Summary
Distributed Life can be properly implemented
leveraging DDS
• communication complexity is off-loaded to
middleware, developer can focus on application
• advanced QoS settings allow for adjustment to
requirements and deployment characteristics
• DDS features simplify extending Distributed Life
beyond its basic implementation
• all of this in a standardized, multi-language, multi-
platform environment with an infrastructure built
to scale and perform