STUDENT DECLARATION
I, Gábor Apagyi, declare that I have created this thesis without any unauthorized help,
using only the specified sources (literature, tools, etc.). Every section where I have
used the original text verbatim or rephrased it is clearly marked with its source.
I give permission to BME VIK to publish the basic data of this work (author(s), title,
abstract in English and Hungarian, date of creation, names of consultants) in electronic
form accessible to everyone, and the full text of the work through the intranet of the
university. I declare that the submitted and the electronic versions are identical. The
text of theses encrypted with the permission of the Dean becomes accessible only after
3 years.
Dated: Budapest, 12/05/2015
...…………………………………………….
Gábor Apagyi
HALLGATÓI NYILATKOZAT
Alulírott Apagyi Gábor, szigorló hallgató kijelentem, hogy ezt a diplomatervet meg nem
engedett segítség nélkül, saját magam készítettem, csak a megadott forrásokat
(szakirodalom, eszközök stb.) használtam fel. Minden olyan részt, melyet szó szerint,
vagy azonos értelemben, de átfogalmazva más forrásból átvettem, egyértelműen, a forrás
megadásával megjelöltem.
Hozzájárulok, hogy a jelen munkám alapadatait (szerző(k), cím, angol és magyar nyelvű
tartalmi kivonat, készítés éve, konzulens(ek) neve) a BME VIK nyilvánosan hozzáférhető
elektronikus formában, a munka teljes szövegét pedig az egyetem belső hálózatán
keresztül (vagy hitelesített felhasználók számára) közzétegye. Kijelentem, hogy a
benyújtott munka és annak elektronikus verziója megegyezik. Dékáni engedéllyel
titkosított diplomatervek esetén a dolgozat szövege csak 3 év eltelte után válik
hozzáférhetővé.
Kelt: Budapest, 2015. 05. 12.
...…………………………………………….
Apagyi Gábor
Budapest University of Technology and Economics
Faculty of Electrical Engineering and Informatics
Department of Automation and Applied Informatics
Gábor Apagyi
ANALYSIS OF EXECUTABLE
GRAPH MODEL
CONSULTANTS
Dr. Gergely Mezei
Ferenc Nasztanovics (Morgan Stanley)
Loránd Szöllősi (Morgan Stanley)
BUDAPEST, 2015
Table of contents
Abstract............................................................................................................................ 7
Összefoglaló ..................................................................................................................... 8
1 Introduction.................................................................................................................. 9
1.1 The structure ........................................................................................................... 9
1.2 The domain ............................................................................................................. 9
1.3 Motivation............................................................................................................. 10
2 Theoretical background ............................................................................................ 11
2.1 Graph theory ......................................................................................................... 11
2.1.1 History ........................................................................................................... 11
2.1.2 Definition....................................................................................................... 12
2.1.3 Important properties of graphs....................................................................... 13
2.1.4 Algorithms ..................................................................................................... 15
2.1.5 Executable graphs.......................................................................................... 20
2.2 Distributed environments...................................................................................... 20
2.2.1 Software solutions ......................................................................................... 21
2.2.2 Advantages and drawbacks............................................................................ 27
2.3 Scheduling ............................................................................................................ 28
2.3.1 Cost of scheduling ......................................................................................... 29
2.3.2 Type of schedulers......................................................................................... 31
2.3.3 Algorithms ..................................................................................................... 32
2.4 Compilers.............................................................................................................. 36
2.4.1 Compilation process ...................................................................................... 36
2.4.2 Compiler optimization................................................................................... 37
2.4.3 Higher level languages................................................................................... 37
2.5 Stack programming............................................................................................... 38
2.5.1 Reverse Polish Notation................................................................................. 40
2.5.2 Forth............................................................................................................... 41
3 Design and implementation....................................................................................... 43
3.1 Business problem.................................................................................................. 43
3.1.1 Morgan Stanley.............................................................................................. 43
3.1.2 Pricing............................................................................................................ 44
3.2 Modelling the problem.......................................................................................... 45
3.2.1 Graph representation...................................................................................... 45
3.3 Building up a graph............................................................................................... 47
3.3.1 Quadratic equation......................................................................................... 47
3.3.2 IProcessable interface .................................................................................... 47
3.3.3 Node types ..................................................................................................... 48
3.3.4 Edge types...................................................................................................... 49
3.3.5 Graph object................................................................................................... 49
3.3.6 Example graph............................................................................................... 50
3.4 Simple execution models...................................................................................... 51
3.4.1 Recursion based execution............................................................................. 52
3.4.2 BFS based execution...................................................................................... 53
3.4.3 Simulating stack execution............................................................................ 54
3.4.4 Disadvantages of high level modelling.......................................................... 54
3.5 Execution model ................................................................................................... 55
3.5.1 Tokens............................................................................................................ 55
3.5.2 Execution ....................................................................................................... 56
3.6 Compiling ............................................................................................................. 56
3.6.1 Transforming IProcessable objects................................................................ 57
3.6.2 Order of compiling......................................................................................... 60
3.7 Distribution of tasks.............................................................................................. 60
3.7.1 Random scheduler.......................................................................................... 61
3.7.2 Greedy scheduler ........................................................................................... 61
3.7.3 Separation of tasks......................................................................................... 62
3.8 Improving the execution model ............................................................................ 62
3.8.1 Sync and wait operators................................................................................. 62
3.8.2 Modifications in the compiler........................................................................ 64
3.8.3 Execution logic modifications ....................................................................... 65
3.9 Visualization ......................................................................................................... 66
3.10 Measuring methods............................................................................................. 69
3.10.1 Time measurement....................................................................................... 69
3.10.2 Measuring memory usage of the execution................................................. 70
4 Testing and conclusions............................................................................................. 71
4.1 Test environment .................................................................................................. 71
4.2 Test cases .............................................................................................................. 71
4.2.1 Quadratic equation......................................................................................... 71
4.2.2 Random graph................................................................................................ 72
4.3 Test runs................................................................................................................ 73
5 Improvement possibilities ......................................................................................... 77
Acknowledgements ....................................................................................................... 78
Bibliography.................................................................................................................. 79
Appendix........................................................................................................................ 82
Abstract
The world changes rapidly these days – today's technologies will be obsolete
tomorrow, meadows become metropolises within years, and new islands can even be built
in the seas. This rapid lifestyle is reflected in everyday life as well. Fast-food
restaurants, fast cars and speed fitness are common terms today; in every corner of our
lives we can feel the rapidity. In the financial sector, agility is even more pronounced.
People open and close bank accounts online in about 10 minutes or buy stocks from many
different companies in a flash. To remain competitive in such an environment, banks like
Morgan Stanley have only one choice: be the quickest player on the market. Morgan Stanley
realized this fact and puts serious effort into its research activity.
Pricing the assets available on the market is a very complex task, and it is also
very important to do it quickly. Calculating a meaningful price one second before
the other participants of the market means a clear business advantage. My thesis is
intended to analyze the possibility of modelling the pricing process using graphs and
to find an effective way to execute the constructed model.
As a very first step, I started analyzing graph structures and algorithms,
distributed system architectures, scheduling algorithms and possible execution models.
My focus was caught by the execution models, as I saw room for improvement in the
usual implementations and opportunities for innovation, for example executing the graphs
on FPGAs.
During my work I wanted to keep the expressiveness of the original reference
model, but I also wanted to improve the performance. My efforts resulted in an
expressive reference model, an efficient execution model and a compiler which transforms
one model into the other.
In the testing phase, I measured a 20% performance improvement on average in a
non-distributed environment compared to the original reference model. In a distributed
environment, using an advanced scheduler, the improvement compared to the original
model is even more outstanding.
At the end of my thesis, I list a couple of development opportunities which
would make the performance even better and the system more user-friendly.
Összefoglaló
Napjainkban a környezetünk rendkívüli ütemben változik – a mai fejlett
technológiák holnapra elavulnak, évek alatt a legelők helyén városok épülnek vagy akár
szigetek nőnek ki a tengerből. A gyorsuló életstílus a mindennapi életünkben is
észrevehető. Gyorsétterem, gyors autó, speed fitness - gyakori szavak manapság. A bank
szektorban ez a gyors ritmus még jobban megfigyelhető. Az emberek 10 perc alatt
számlákat nyitnak, vagy szüntetnek meg, esetleg pillanatok alatt részvényeket vásárolnak.
Ahhoz hogy ilyen környezetben egy bank, például a Morgan Stanley versenyképes
maradjon egy lehetősége van: a leggyorsabbnak kell lennie a piacon. Ezt a tényt a Morgan
Stanley is felismerte és komoly kutatási tevékenységbe kezdett.
A termékek beárazása a piacokon nagyon komplex feladat, illetve szintén nagyon
fontos tényező a folyamat gyorsasága. Egy másodperccel hamarabb tudni egy
termék árát, mint a többi szereplő a piacon komoly üzleti előnyt jelent. Dolgozatom célja
megvizsgálni a lehetőséget, hogy az árazást gráfokkal modellezzük, illetve kidolgozni
egy hatékony végrehajtási módot ezen modellek részére.
Első lépésként megvizsgáltam a gráf adatszerkezetet és algoritmusait, elosztott
rendszer architektúrákat, ütemező algoritmusokat és lehetséges végrehajtási modelleket.
Ezen témák közül a végrehajtási modellekben fedeztem fel továbbfejlesztési és
innovációs lehetőségeket, mint például a gráf futtatása FPGA-n.
Munkám során fontosnak tartottam megőrizni az eredeti gráf reprezentáció
kifejezőképességét, de növelni akartam a rendszer teljesítményét. Végül három részre
bontottam az implementációt: egy nagy kifejezőképességgel bíró modellre, egy hatékony
végrehajtási modellre, illetve egy fordítóra, ami a két modell közötti átjárhatóságot
biztosítja.
A tesztelés átlagosan 20%-os teljesítménynövekedést mutatott egy processzoros
környezetben az eredeti referencia implementációhoz képest. Több processzoros
környezetben, fejlett ütemezőt használva a teljesítménynövekedés még szembetűnőbb
volt.
Dolgozatom végén számos továbbfejlesztési lehetőséget felsorolok, melyek
megvalósításával a rendszer teljesítménye tovább növelhető, illetve jobb felhasználói
élmény érhető el.
1 Introduction
This introductory chapter gives a short overview of the topic, the structure and
the desired goal of this paper.
1.1 The structure
The paper is arranged into chapters which guide the reader from the
theoretical background to the implemented solution and its tests. The first chapter is
the introduction; it clarifies the scope and boundaries of my task. The second chapter
provides the theoretical background and describes the principles and methods used in the
paper. The design and implementation chapter gives insight into the implementation and
explains some of the most important design decisions. Chapter four evaluates my solution
and tests its performance. The fifth chapter proposes possibilities for further improving
the results.
1.2 The domain
Morgan Stanley[1][2] is one of the biggest investment banks in the world. In
their everyday work, they use lots of different incoming data feeds. If the processing
of these feeds is quick enough, the feeds can help to make decisions based on the
actual situation, or at least on the most up-to-date view of the market. When processing
speed exceeds human limitations, speeding up the systems is not necessary anymore.
Or is it? The answer is definitely: “Yes, it is.” It is essential to boost the speed of
processing. When the human factor is left out of the equation, whole systems can be built
relying on the data streams to lift trading to a higher level. These new systems are
called automated trading platforms, and recently they have become more and more important.
As all market data is available to every participant of the market, the number of
possible ways to gain an advantage over the competitors is very limited. Banks can either
fine-tune their systems to boost performance, or develop new algorithms and
techniques to make better decisions.
1.3 Motivation
Let us assume that we have a mathematical model of a market in which we can
express connections between products and their prices. Moreover, assume we have some
other solution expressing the same problem in another way. If we compare these models,
there are two important properties to check. The first is how accurate the model is – how
well it describes the real market. The second is how fast new prices of products can be
obtained when the incoming data feed related to a particular product changes.
From Morgan Stanley’s point of view, having the most advanced solution (both
in terms of accuracy and performance) means a direct advantage on the market. As they
realized how much they depend on innovation and new technologies, they started to invest
serious effort in research projects. My work compares some possible solutions, tries to
maximize the performance of one of them and also tries to come up with some
guidelines about future possibilities.
2 Theoretical background
This chapter is intended to give the reader a better understanding of the terms,
laws and theorems used. I will provide the necessary amount of information to
understand my work, and I will also mark the points which can serve as good starting
points for extending my solution.
2.1 Graph theory
As this paper is all about graphs, the first important step is to understand what a
graph is, why graphs are important, and how we can leverage them in our everyday life.
The following section will guide us through the parts of graph theory relevant to this
paper.
2.1.1 History
There are numerous occasions when understanding a complex problem becomes much
easier with a small illustration. The “Seven Bridges of Königsberg” problem[3] is one of
that kind.
In the 18th century, the citizens of Königsberg asked Leonhard Euler the
following question: “Is it possible to visit every mainland part of the city while every
bridge is crossed once, and only once?”
2.1.1 – Map of Königsberg
As you can see on the map, the city is crossed by the river Pregel, and the mainland
parts of the city are therefore connected by seven bridges. While trying to answer this
question, Euler essentially founded graph theory. Let us take a look at his solution.
Solution of Königsberg problem
First, we need to make our problem domain smaller. We do not need to see the
houses of the city, nor the river, nor the bridges themselves. Every mainland part is
represented by a simple dot and every bridge by a line. Here comes the above-mentioned
principle about illustration: try to redraw the map using only dots and lines. The result
should be something like figure 2.1.1.1.
2.1.1.1 – Redrawn map of Königsberg
It is much easier to analyze the model without the unimportant details. As Euler
observed, if we want to walk through all the bridges (edges), we have to “get in” and “get
out” of each mainland part (dot, node) – except for the first and the last node, if we do
not want to arrive back at our starting point. Let us call the number of edges touching a
node the degree of the node. Euler’s observation implies that every node degree must be
even in our map (except the degrees of the first and the last node, if they are not the
same).
If we calculate the degrees of all nodes, we see that each node has an odd degree.
In real life this result means that after a while, when we enter a mainland part, we will
not have any unused route out of it. This determines the answer to the original question:
there is no way to take a walk in Königsberg visiting every mainland part while crossing
every bridge exactly once. A walk in a graph in which each edge is visited once and only
once is called an Eulerian path or Euler walk in his honor. The above-mentioned
observation about node degrees is a necessary and sufficient condition for the existence
of an Eulerian path in a connected graph.
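Euler's degree argument can be checked mechanically. The following is a minimal Python sketch (the node labels A–D for the four land masses are illustrative, not taken from the original map):

```python
from collections import Counter

# Königsberg: four land masses and seven bridges (a multigraph).
bridges = [
    ("A", "B"), ("A", "B"),  # two bridges between A and B
    ("A", "C"), ("A", "C"),  # two bridges between A and C
    ("A", "D"), ("B", "D"), ("C", "D"),
]

# Degree of a node = number of bridge endpoints touching it.
degree = Counter()
for u, v in bridges:
    degree[u] += 1
    degree[v] += 1

# Euler's condition: a connected graph has an Eulerian path
# only if the number of odd-degree nodes is 0 or 2.
odd = [n for n, d in degree.items() if d % 2 == 1]
print(degree)    # every node has an odd degree
print(len(odd))  # 4 -> no Eulerian path exists
```

All four nodes have an odd degree, so the condition fails and no walk crossing each bridge exactly once exists.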
2.1.2 Definition
At first, let us define what a graph is: [4]
A graph is an ordered pair G = (V, E) comprising a set of vertices or nodes (V) and
a set of edges (E).
2.1.3 Important properties of graphs
2.1.3.1 Edge direction
Graphs can be grouped by the direction of their edges. Let E = (v1, v2) and
E1 = (v2, v1), where v1, v2 ∈ V. The edge E is directed if it is distinct from its
reversed pair:
E = (v1, v2); E1 = (v2, v1); E ≠ E1
and it is undirected if the two are considered identical:
E = (v1, v2); E1 = (v2, v1); E = E1
A graph is considered an undirected graph if it does not contain any directed edges;
otherwise it is referred to as a directed graph. A directed graph can be created from an
undirected graph by replacing every edge with two new edges which connect the same nodes
but with reversed directions. This conversion can always be done without modifying the
meaning of the graph. However, conversion in the reverse direction can possibly modify
the underlying meaning of the model.
2.1.3.1 – Converting undirected graphs to directed graphs and vice-versa
In this paper we are going to use mostly directed graphs, as they can express
connections between stock prices in a natural way if we imagine a directed edge as an
arrow. If an arrow (edge) points from one stock price (node) to another, it means that
changes in the source stock price have an impact on the destination stock price.
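The undirected-to-directed conversion described above can be sketched in a few lines of Python (the edge-pair representation here is illustrative):

```python
def to_directed(undirected_edges):
    """Replace each undirected edge {u, v} by the two directed
    edges (u, v) and (v, u); connectivity is preserved."""
    directed = []
    for u, v in undirected_edges:
        directed.append((u, v))
        directed.append((v, u))
    return directed

print(to_directed([("a", "b"), ("b", "c")]))
# [('a', 'b'), ('b', 'a'), ('b', 'c'), ('c', 'b')]
```

Note that the reverse conversion (merging opposite edge pairs) would lose information whenever an edge exists in only one direction, which is exactly the "meaning-modifying" case mentioned above.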
2.1.3.2 Cyclic/acyclic property
A graph is considered cyclic if there exists a sequence of nodes in which each
pair of consecutive nodes is connected by an edge and the last node of the sequence
equals the first one. Formally:
X = (v1, v2, …, vn); (vi, vi+1) ∈ E; v1 = vn; vi ∈ V
X is called a circle (cycle) of the graph. A graph is acyclic if it does not contain any
circles.
2.1.3.2 – Graph with circle on the left, without circle on the right
2.1.3.3 DAG property
DAG[5] stands for Directed Acyclic Graph. From the previous definitions it is
very straightforward to see the meaning of this property: a DAG is a graph which is
directed and does not contain any circles. This class of graphs is very important, since
many problems are considered solvable only if the underlying graph is a DAG.
Let us suppose we have a collection of tasks that must be ordered into a sequence
(and suppose such an ordering exists) and a collection of constraints, i.e. rules
stating that one task must be completed before another task is started. Constraints can
be expressed by directed edges between the nodes representing the tasks: the source node
of an edge must be completed before its destination node.
2.1.3.3 – Directed Acyclic Graph
As the source node of an edge must be completed earlier than its destination node,
let us start from a randomly selected node 𝑛1 and assume the graph contains a circle
through it, so that it is possible to reach 𝑛1 again after a while. Let 𝑛𝑖 be the last
node from which we return to 𝑛1. Our walk implies that 𝑛1 must be completed before 𝑛𝑖,
as we started from 𝑛1 and reached 𝑛𝑖. It also implies that 𝑛𝑖 must be completed before
𝑛1, as there is an edge directed from 𝑛𝑖 to 𝑛1. This is an obvious conflict, which means
that such a graph must not contain a circle.
Given a set of nodes (V) and edges (E), a topological order is a sequence
𝑣1, 𝑣2, 𝑣3 … 𝑣n containing every node of V exactly once, where
∀(𝑣𝑖, 𝑣𝑗) ∈ E: 𝑖 < 𝑗
It can be proved that a topological order exists if and only if the graph is a
DAG. It is also a fact that a topological order is not always unique, which means that
more than one ordering can exist for a given graph. When it comes to scheduling, the
differences between the orderings give us the opportunity to optimize our solution based
on several criteria (e.g. running time, cost, finishing time, etc.).
2.1.4 Algorithms
Graphs are powerful structures for representing real-life problems such as
simplified stock pricing, and for formalized problems it is easier to create general
solutions. In this section, several graph algorithms are elaborated.
2.1.4.1 BFS
Breadth First Search is a method to find a particular node in a graph starting
from a given point, or to prove that the node is not reachable from the given starting
point.
The algorithm starts by visiting the neighbors directly reachable from the starting
point. Let us say the starting point is on level 0. Then level 1 contains all nodes
directly reachable from the starting point, level 2 contains the nodes which can
be directly reached from level 1 nodes, and so on. Generally, level N contains the nodes
which are reachable from level N-1 nodes but do not belong to any earlier level.
The algorithm stops when it finds the required node or there are no more
reachable, unlabeled nodes. It is possible that the algorithm does not process all the
nodes – only the nodes reachable from the starting point will be processed. This fact
gives us a tool to identify properties such as reachability from a given node, or to
decide whether the graph is connected.
There are two important observations about this labeling procedure. Firstly, the
algorithm processes all available nodes on level N – which basically means identifying
the nodes on level N+1 – before processing any node on a higher level. Secondly, the
algorithm only goes forward, which means it skips all nodes which are already labelled
with a lower number.
2.1.4.1 – BFS ordering – each node on level N gets processed before going onto level N+1, order of
nodes on a particular level is not defined
The original algorithm does not specify which node to process next, thus the
choice is free. It can be fully random, or it can use various data to make better
decisions. The simplest way is to queue the nodes: when a new node is identified to be
processed, it is put at the end of the queue, and the first element of the queue is
processed next.
From a higher perspective, level numbering can be viewed as a very simple
heuristic. If we use problem-specific knowledge, there are wiser ways to shepherd the
order of processing; however, a more specific heuristic can easily suggest reordering
the processing across levels – even if this leads to an algorithm which is not a BFS
anymore. For example, in a route search on a map – which can easily be traced back to a
search in a directed graph – it is wiser to use air distance rather than leveling. As
mentioned, this conflicts with leveling: it is more useful to process nodes with a lower
air distance to the destination even if these nodes are on higher levels. However, the
air-distance heuristic can be wrong and lead the algorithm into a dead end where we are
really close to the destination but cannot reach it.
As we saw, there are trade-offs when designing or using algorithms. BFS can be
easily implemented and used in various situations (detecting partitions in graphs,
traversing all the nodes, leveling problems, etc.), but it can also be an ineffective
mechanism, as illustrated by the map example.
2.1.4.2 DFS
Depth First Search is really similar to BFS. The difference between the two
algorithms is the order of processing. In BFS we use levels to identify a group of
nodes to process. In DFS we can also use the level term, but the applied rules will be
different.
Let us suppose we have a starting point and we know the level numbers of all the
nodes (level numbers are specific to the starting point). DFS will pick a node from the
directly reachable nodes (level 1). However, rather than picking the next node from
level 1, the algorithm will pick the next node from the set of nodes directly reachable
(on level 2) from the previously selected level 1 node. If there are no more reachable
next-level nodes, the algorithm jumps back and selects the next node from a previous
level.
The exit criteria are the same as in the BFS algorithm. The algorithm stops if it
finds the searched node or there are no more unprocessed nodes. It is also possible that
the algorithm does not process every node, as the graph is not necessarily connected,
thus not every node is reachable from the given starting node.
This means that all the nodes reachable through a selected level 1 node will be
processed before the next level 1 node. The name of the algorithm comes from this fact:
it searches deep inside the graph, while BFS first checks the closest nodes and slowly
grows the checked area.
From the implementation point of view, DFS can also be implemented easily with
the queue described above, except that new nodes are put at the beginning of the queue
and the next node is also picked from the beginning – in other words, the queue is used
as a stack.
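The front-of-the-queue policy is exactly a LIFO stack, so a DFS sketch differs from the BFS one only in the container discipline. As before, the adjacency-dictionary representation and `is_goal` predicate are assumptions of this illustrative sketch:

```python
def dfs(adjacency, start, is_goal):
    """Depth-first search: return the first node satisfying
    is_goal, or None if it is unreachable from start."""
    stack = [start]
    visited = set()
    while stack:
        node = stack.pop()       # take the most recently discovered node
        if node in visited:
            continue
        visited.add(node)
        if is_goal(node):
            return node
        for neighbour in adjacency.get(node, []):
            if neighbour not in visited:
                stack.append(neighbour)
    return None

graph = {"s": ["a", "b"], "a": ["c"], "b": [], "c": []}
print(dfs(graph, "s", lambda n: n == "c"))  # 'c'
```

Swapping `stack.pop()` for a `popleft()` on a deque would turn this back into the BFS variant, which makes the "only the processing order differs" observation above concrete.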
2.1.4.2 – DFS order with random selection. A level 3 node gets processed before the
lower level nodes are finished.
Note that the previously mentioned modification of BFS with the air-distance
heuristic silently turned into a DFS-like search: the heuristic is used when deciding
which path to go forward on.
2.1.4.3 Traversal using search algorithms
There are cases when we want to apply a function to every node of a graph rather
than just finding one particular node.
Technically, when processing a node we can apply any function to it. For instance,
we can write out data from the node or accumulate its value into a global variable. The
only problem is to ensure that we have visited all the nodes. With a small modification,
the described BFS and DFS algorithms can do this for us.
Remember that in both cases the algorithm has exit criteria – namely finding a
node which fulfills the search criteria, or running out of reachable nodes. If we ensure
that the search predicate never evaluates to true, or simply skip checking the search
criteria, the algorithms are guaranteed not to stop before reaching the end of the graph.
Actually, we still have an issue with this approach. It is not ensured that every
node is reachable from the starting point – this is not guaranteed even in a randomly
selected undirected graph, since graphs can have separate partitions. The solution is to
check whether we have processed all the nodes of the graph when we reach an endpoint
(for example by counting the number of processed nodes and comparing it to the total
number of nodes in the graph). If the graph still contains unprocessed nodes, randomly
select one of them and restart the algorithm using the selected node as the starting
point.
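The restart-from-an-unprocessed-node idea above can be sketched as follows (a DFS-based illustrative version; the `visit` callback and adjacency representation are assumptions of the sketch):

```python
def traverse_all(adjacency, visit):
    """Apply visit() to every node of the graph, restarting the
    search from an unprocessed node whenever a partition runs out."""
    remaining = set(adjacency)            # nodes not yet processed
    while remaining:                      # restart once per partition
        stack = [next(iter(remaining))]   # arbitrary unprocessed start
        while stack:
            node = stack.pop()
            if node not in remaining:
                continue
            remaining.discard(node)
            visit(node)
            stack.extend(n for n in adjacency.get(node, []) if n in remaining)

# Two disconnected partitions: both get visited.
graph = {"a": ["b"], "b": [], "x": ["y"], "y": []}
seen = []
traverse_all(graph, seen.append)
print(sorted(seen))  # ['a', 'b', 'x', 'y']
```

Here membership in `remaining` plays the role of the processed-node counter described in the text: the outer loop terminates exactly when every node has been processed.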
2.1.4.4 Topological ordering
As we saw earlier, a topological order carries important and meaningful properties
such as the DAG property. The following algorithm gives a simple way to generate a
topological ordering if one exists.
We describe the algorithm using scheduling as an example. The initial setup is the
usual one: we have a directed graph where the nodes are the tasks and the directed edges
express precedence between the tasks.
First, the algorithm selects all the nodes which have no outgoing edges. These
nodes will be processed last, since no task depends on them. If no such node exists, no
topological order exists, which means the graph contains at least one directed circle.
Let us remove the selected nodes from the graph and repeat the previous step, and then
repeat again until no nodes remain in the graph.
2.1.4.4 – Execution flow of topological ordering algorithm
It is easy to prove that the algorithm gives a topological order of the graph, and it can be implemented efficiently. As mentioned earlier, the topological order is not unique: given the result of this algorithm, multiple topological orders can be created if we know which nodes belong to which iteration. Nodes added to the schedule in the same iteration can be shuffled, as they do not influence each other or each other's execution.
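The peeling procedure above can be sketched as follows. This is only a sketch; the adjacency-set representation and the task names are illustrative, and an edge u → v means u must run before v:

```python
def topological_order(graph):
    """Return a topological order of `graph` (dict: node -> set of
    successors), or None if the graph contains a directed cycle."""
    remaining = {u: set(vs) for u, vs in graph.items()}
    order = []
    while remaining:
        # nodes with no outgoing edges: nothing depends on them any more
        sinks = [u for u, vs in remaining.items() if not vs]
        if not sinks:
            return None        # no sink left -> there is a directed cycle
        order = sinks + order  # nodes removed in this iteration go last
        for u in sinks:
            del remaining[u]
        for vs in remaining.values():
            vs.difference_update(sinks)
    return order

g = {"wash": {"dry"}, "dry": {"fold"}, "fold": set()}
print(topological_order(g))   # ['wash', 'dry', 'fold']
```

Collecting the sinks iteration by iteration also records which nodes belong to which iteration, which is exactly the information needed to generate the alternative orders mentioned above.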
2.1.5 Executable graphs
Executable graphs extend ordinary DAGs by storing a function inside the nodes and the edges. In general, these functions can perform many operations, from logging to changing a node's value based on some criterion. Executing the graph means visiting all the elements of the graph and running their stored functions.
Execution starts at the nodes which are marked as inputs. When all the nodes have run, we create a collection from the values of the nodes which are marked as outputs. This collection is the result of the execution.
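A minimal sketch of this execution model, built around a hypothetical Node class holding the stored function and its dependency nodes. All names are illustrative and do not stand for the implementation discussed later in this work:

```python
class Node:
    """One element of an executable graph: a stored function plus the
    nodes whose values it consumes (illustrative structure)."""
    def __init__(self, func, deps=()):
        self.func, self.deps, self.value = func, list(deps), None

def execute(nodes, outputs):
    """Run every node's stored function in dependency order and collect
    the values of the nodes marked as outputs."""
    done = set()
    def run(node):
        if id(node) in done:
            return
        for d in node.deps:          # make sure the inputs are ready first
            run(d)
        node.value = node.func(*[d.value for d in node.deps])
        done.add(id(node))
    for node in nodes:
        run(node)
    return [n.value for n in outputs]   # the result of the execution

# (1 + 2) computed by a three-node graph
a = Node(lambda: 1)
b = Node(lambda: 2)
s = Node(lambda x, y: x + y, deps=[a, b])
print(execute([a, b, s], outputs=[s]))   # [3]
```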
Execution raises interesting questions. Suppose we have a huge graph with multiple inputs and outputs. What happens if an input changes? Do we really want to recalculate the whole graph, or is it possible to recalculate only the affected parts? The starting point of the execution is fixed, but the execution flow can take very different shapes depending on the algorithm used. In this paper, I search for a proper way to optimize the execution of this type of graph. By the end of this paper, the reader will have a good understanding of the possibilities, challenges and pitfalls of the problem.
2.2 Distributed environments
Computer science has been changing at a very fast pace since the first computer was turned on. According to Moore's law[6], computing capacity doubles every two years. This prediction has held up well over the last couple of decades.
This also means a shift in the problems of computing science, as well as in the development of the underlying hardware. The first computers were giant, monolithic machines which executed programs sequentially: programmers had to explicitly declare the order of the commands to be executed. In addition, back in those days the machines were exclusively reserved for the program under execution.
This may sound strange, since today users listen to music while chatting on the internet and editing a document in a text editor – probably on a device which fits into their pocket. This example frames the shift very well: in the old days a programmer knew the program would run on a dedicated machine and the interpreter would execute the commands in a well-defined order. In the era of the "internet of things" this model has changed dramatically. Multiuser operation, multitasking, parallel programming, remote execution and asynchronous execution are concepts which today's developers need to be aware of and deeply understand in order to develop high-end, modern applications and leverage hardware capabilities.
2.2.1 Software solutions
It is usually hard to clearly separate hardware and software solutions, as both require support from the other side. From an IT perspective, the 21st century is characterized mainly by the internet, mobile and, recently, the cloud boom[7]. Hardware development continues as well, but since hardware capabilities are now sufficient for everyday use, software engineers' focus has shifted to other purposes such as user experience, scalability and reliability. Altered, or at least extended, goals require new approaches.
2.2.1.1 Parallel programming
While in most cases a mobile device has more computing resources than the first mainframe computers had, these devices are also used to solve more complex problems. To leverage hardware capabilities, programmers need to be aware of parallel programming[8]. In the sequential model there exists an exact order of the commands, defined by the developers and optimized by modern compilers. In cases where the execution of two commands has no effect on each other, a natural way to reduce the execution time presents itself: use the available free processors to compute distinct parts of a complex expression.
2.2.2.1.1 – Parallel execution of (1+2)*(3+4) with one and two CPUs
While multitasking does not require a paradigm change among developers, parallel programming does. Designing parallel algorithms is a very creative, intuitive and innovative process. While best practices exist, there is no ultimate way to design parallel algorithms or to convert sequential algorithms into parallel ones. However, in most cases developers get a huge payback for the effort invested in designing the parallel algorithms. Depending on the level of parallelization, there is a fairly simple way to calculate the profit: a simple theoretical upper bound on the obtainable speed-up, given a sequential execution time 𝑇 and 𝑛 processors, is 𝑇 − 𝑇/𝑛. In practice this bound cannot be exceeded or even reached, and even with the wisest design we usually cannot get near this limit; however, we can influence the order of magnitude of the execution time. The first reason for this limitation is that parallel programming introduces synchronization and governing overhead. The second is that the parts of a parallel-executed problem need to be joined at a certain point: even if we could completely parallelize a problem, reaching the end of an execution line will cause some processors to be idle while others calculate the joined results. In real-life problems only parts of the algorithms can be parallelized, so during the execution, parallel and sequential parts follow each other. The more alternating parts there are, the more synchronization overhead is required. This explains well why parallel programming introduces serious overhead. It is worth mentioning that by overlapping execution cycles we can counteract some of the overhead: overlapping means we start to execute the next cycle before the current cycle is finished, to leverage the idle time of the resources.
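The 𝑇 − 𝑇/𝑛 bound mentioned above can be illustrated with a trivial calculation (made-up numbers):

```python
def max_time_saved(T, n):
    """Theoretical upper bound on the obtainable speed-up: a task that
    takes T seconds sequentially needs at least T/n seconds on n
    processors, so at most T - T/n seconds can be saved."""
    return T - T / n

print(max_time_saved(10.0, 2))   # 5.0 -> at best half the time is saved
print(max_time_saved(10.0, 4))   # 7.5
```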
2.2.2.1.2 – Alternating between sequential and parallel execution
2.2.1.2 Threading
While parallel programming leverages multiple CPUs, threading aims to run programs in parallel on a single CPU. The concept is very similar to multitasking: multitasking is the idea, threading is the implementation.
In computer science, a thread of execution is the smallest sequence of programmed instructions that can be managed independently by a scheduler, which is typically a part of the operating system.[9]
Threads could be defined in other ways as well; however, this definition carries much additional information about threads.
The first important fact is that a thread is not a program or source code, but their execution. This means threads exist only at execution time, which implies that debugging multithreaded programs is hard, since static analysis is much harder or even impossible to perform. It also means that a big execution flow can be broken into smaller parts, manually or semi-automatically. The concept of threads creates the need for schedulers, which we will discuss in more detail in the following chapter. As the definition above shows, schedulers and threads are integral parts of every modern operating system: threading enables executing multiple programs on a single CPU, or a single program on multiple CPUs.
Going deeper into the hardware and analyzing how threading works, it becomes clear that threading can be very expensive: after a certain number of threads, creating new ones will seriously degrade the performance of the system. This is caused mainly by context switches. When a CPU executes a program, it loads the program and the data into memory, populates the required registers, positions the PC (Program Counter) and then steps through the instructions. But what happens if, in the middle of the program, the CPU is preempted and needs to start executing another program, knowing that we want to finish the interrupted program later? The CPU has to save the current execution state, load the environment of the new program (which is probably a previously preempted program) and execute it. This is called a context switch.
It is very important to keep the number of context switches at a reasonable level. The time the CPU spends changing between contexts is the overhead of threading. If the computer has only one CPU installed, this overhead is the cost of multitasking, and of course it means users cannot leverage 100% of the computing capabilities. Moreover, we cannot expect a boost in the execution time (though we can expect one in throughput). When using multiple CPUs, the advantages of threading are easier to notice: while we get the ability of multitasking, we can also gain a serious performance boost.
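As a small illustration of how an execution flow can be broken into threads, a sketch using Python's standard threading module; the split of the work into two chunks is arbitrary:

```python
import threading

results = {}

def worker(name, items):
    # each thread sums its own chunk of the data independently
    results[name] = sum(items)

t1 = threading.Thread(target=worker, args=("low", range(0, 50)))
t2 = threading.Thread(target=worker, args=("high", range(50, 100)))
t1.start(); t2.start()   # both threads are scheduled by the operating system
t1.join(); t2.join()     # wait for both before reading the results

total = results["low"] + results["high"]
print(total)             # 4950, the same as summing 0..99 sequentially
```

On a single CPU the two threads merely interleave; on multiple CPUs they can genuinely run at the same time.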
2.2.1.3 Remote execution
Mobile devices and personal computers can carry huge computing and storage capabilities, but there are cases when it makes sense to separate concerns: to store sensitive data in replicated and safe data houses, to execute complex calculations in the cloud, and to interact with the user through personal devices.
In this approach, the actual footprint of the system is determined only at runtime and can be changed dynamically. It also requires communication between the parts of the system. Vigilant readers may spot that this model is a scaled-up version of an everyday computer: these concerns are also separated in personal computers, but the distance between the parts is much smaller. If a user in the EU wants to access a file which is stored in the US, a request must be sent to the data warehouse and the data must be transferred over at least 4-5 thousand kilometers. One challenge is to overcome this issue and provide a solution competitive with storing the data locally. Nowadays internet access is fast enough to leverage the advantages of the described scenario and hide the latency. There are some serious advantages as well: data warehouses provide 24/7 support, insurance, replication, competent professionals, high standards and almost unlimited storage capacity which can easily be extended based on user needs.
Speaking of clouds, the advantages are probably not clear to everybody at first sight. People usually run multiple programs at the same time. If we consider not just everyday users but also professionals: creative people such as 3D modellers and image editors run complex algorithms which can take days even on a high-end computer, or – to stick with the topic – building and compiling complex computer programs can take minutes, hours or even days. For these scenarios, cloud computing is a viable solution. But what is the cloud? By definition, the cloud is a pool of resources which can be dynamically allocated to users based on their needs. If developers want to build and compile a complex program, they simply allocate the resources they need and run their task. Everybody pays for the resources they allocate. In 3D modelling, rendering an image can take days on limited resources; in the cloud, the rendering time can be influenced by the amount of allocated resources. If you need to demonstrate the current state of your scene quickly, you can allocate 10 times more resources than usual and present your results in hours. You have to pay more, but only for the time period when you use the resources. This means a reduction in costs and, at the same time, a serious improvement in productivity.
Is it worth investing a serious amount of energy, time and money to make remote execution possible? Fortunately, these technologies are designed to be transparent to the user and, as much as possible, to the developers as well. Remote execution requires complex underlying systems, but companies like Google[10], Microsoft[11], Amazon[12], etc. offer these systems out of the box. This does not mean that developers do not have to modify their code to be able to leverage these technologies, but they do not have to be aware of how these functionalities are implemented. Usually they get an API and build their systems against it, while behind the scenes remote calls, remote storage and remote calculations are used.
2.2.2.3 – Remote procedure calling architecture
The idea is to abstract calls away and replace the actual implementation with an interface which has the same publicly available methods, properties, etc. Developers see the same methods and properties, and they call these methods and use the properties the same way they usually do. But behind the scenes, the actual implementation of the interface simply packs the request into a package and sends it to an execution server, which can be in the cloud. The execution server instantiates the original implementation and calls the method with the parameters of the original call. Once the execution is finished, the result travels back to the client-side stub and the original calling object sees the result. The beauty of this idea is that neither the calling object nor the serving object needs to know that the call is made remotely. Moreover, if the design of the original code satisfies certain criteria, support for remote procedure calls or remote storage can even be injected automatically.
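The stub idea can be illustrated with a toy example in which the "transport" is just a local function call and the packing format is JSON; all class and method names are hypothetical:

```python
import json

class Calculator:
    """The real implementation, living on the server side."""
    def add(self, x, y):
        return x + y

def server_dispatch(packet):
    """Stands in for the remote execution server: unpack the request,
    instantiate the real object, call the method, pack the result."""
    request = json.loads(packet)
    target = Calculator()
    result = getattr(target, request["method"])(*request["args"])
    return json.dumps({"result": result})

class CalculatorStub:
    """Client-side stub: offers the same public method, but only packs
    the call and forwards it, so the caller cannot tell the difference."""
    def add(self, x, y):
        packet = json.dumps({"method": "add", "args": [x, y]})
        reply = server_dispatch(packet)   # a network round-trip in real life
        return json.loads(reply)["result"]

calc = CalculatorStub()
print(calc.add(2, 3))   # 5, computed "remotely"
```

The calling code uses `calc.add` exactly as it would use the real `Calculator`, which is the transparency argued for above.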
2.2.1.4 Asynchronous calls
Imagine a scenario where you have only one CPU and you build a system which leverages remote procedure calls and remote storage, and you also want to support multitasking via threading. Let us say you want to calculate and show the aggregated risk for your whole business. This requires loading data from your remote storage, executing some complex calculations on it and populating a fancy chart with the results. You have run the business for 5 years now, you have 10 gigabytes of historical data and the calculation takes 10 minutes in the cloud. During the calculation, you want to switch to the client communication module inside your application and reply to some user feedback. So you kick off the calculation process and try to switch to another module, but the application is frozen.
This is an unfortunately very common use case. Developers often forget that remote execution does not solve every problem. In this case, you started a process which takes minutes to complete due to the remote storage and the complex calculations. The developers of the application were not aware of threading and asynchronous methodology: when you click the button which starts the calculation, the caller is blocked until the results arrive – in this case, the user is blocked for at least 10 minutes. This is called a synchronous method call.
Threading can be a solution: when you start the process, the application creates a new thread and executes the call on that thread. This means that the main thread, which in almost every case is the UI thread, will not be blocked.
An alternative solution is the asynchronous call[13], which basically does the same but abstracts the details away. Programming languages usually contain keywords to mark a call as asynchronous. Behind the scenes, the compiler replaces the keyword with a wrapper around the call, initializes and starts a new thread, and executes the method.
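A small sketch of the asynchronous style, here with Python's asyncio. Note that Python uses an event loop rather than a compiler-inserted thread, so the mechanism differs from the description above; the delays and names are made up:

```python
import asyncio

async def long_calculation():
    # stands in for a minutes-long remote calculation
    await asyncio.sleep(0.1)
    return "risk report ready"

async def handle_user_feedback():
    # other work the application can do meanwhile
    return "replied to user feedback"

async def main():
    task = asyncio.create_task(long_calculation())  # kicked off, not awaited yet
    ui = await handle_user_feedback()               # the caller is not blocked
    report = await task                             # collect the result later
    return ui, report

ui, report = asyncio.run(main())
print(ui, "/", report)
```

The calculation and the user-facing work overlap instead of the slow call freezing everything, which is exactly the behaviour missing from the frozen application above.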
Nowadays it is a very important requirement for modern applications to be smooth and reactive: if an application hangs for more than one second, users think something went wrong. The asynchronous notion is very similar to parallel programming, as your program does something else in parallel while the results become available from another function call. Thinking about remote execution and clouds, it absolutely makes sense to use your local resources for other purposes while the remote execution runs.
2.2.2 Advantages and drawbacks
As we saw above, it is worth the effort to invest in distributed environments. In this section, I summarize the advantages and drawbacks of distributed systems and try to highlight when to use these techniques.
Nowadays, multiuser systems do not mean the same as in the past: a single PC is a multiuser system in the sense of the old definition, while today's multiuser systems are huge, globally or at least widely available systems serving thousands of users. When building such systems, we cannot avoid distributing concerns. Scalability and reliability usually come with a wisely designed distributed system. If more users are interested, the support team adds a new server to serve the higher number of requests. This model is also more reliable: if one server goes down, others can step up and serve the broken server's requests, ensuring that the support team has enough time to replace the broken one without the users noticing that the original server was down.
Distributed systems can naturally boost the performance of several algorithms. Although this has some limitations, the advantages are clear: with some investment into a distributed algorithm, the order of magnitude of the execution time can be seriously improved.
Distributed systems can also provide a more fluent and transparent experience. It is also more rational to let professionals take care of our data storage, so that we can focus on developing our business logic while cloud professionals take care of the execution background of our solution.
On the other hand, the usage of these technologies can overcomplicate our source code, our algorithms and our systems. It is a common mistake to fine-tune an algorithm for parallel execution which will be executed only a few times. Developers need to learn how to leverage the advantages while minimizing the drawbacks. For example, it is worth fine-tuning an algorithm which is called every five seconds during the application lifetime and currently takes 4 seconds to complete. But it is not worth doing the same with an initialization part of the software which runs only once a day and takes ten minutes to execute – or, if it is that important to start the application quicker, we can write a script which runs the application before the user gets into the office.
There are cases when parallelization would really be necessary, developers put serious effort into the design, and at the end of the day it turns out that the solution is wrong or the problem cannot be parallelized efficiently – or that it would take more time, resources or money than the actual profit is worth.
It is also worth mentioning that implementation, testing and debugging are much more complicated than in the case of a single, standalone, non-distributed application. Moreover, maintaining distributed systems requires professionals who really understand how the system works, requires communication between vendors to solve problems, etc. These factors highly influence the cost of a distributed system.
As a rule of thumb, we should think before deciding to build a distributed system. Developers should mindfully analyze the requirements, think about the edge cases, the advantages and also the drawbacks, and decide wisely.
2.3 Scheduling
Scheduling is the process of defining the order of tasks[14]. People use scheduling all the time: recipes define the order in which you put the ingredients into the meal, a bus schedule defines the time when a particular bus arrives at the station, and so on. In general, scheduling governs access to a set of resources by setting up rules.
Before the dawn of parallel programming, programmers defined the order of the execution implicitly in their source code: if a command precedes another one in the source, its execution also precedes the other's. The main goal of parallel programming is to allow unrelated code blocks to execute simultaneously. In order to achieve this, programmers have to explicitly design the code and mark the parallel parts of the system. This model is useful, but it leaves the responsibility of deciding when to parallelize code in the hands of developers. It means freedom, but it can also lead to errors or missed opportunities. It would be good to take over the responsibility of marking parallel blocks automatically and only let the developers mark the blocks where they forbid parallel execution. Scheduling is capable of providing this functionality, from one aspect.
A good order of tasks can seriously reduce the execution time, while a bad one can seriously degrade the performance. A badly designed schedule has a huge impact on the throughput of the system.
2.3 – Various execution times for different schedules
Figure 2.3 illustrates the case when two processing units are working on a set of tasks. The longest task is executed first, which stalls the other processor, because the second unit depends on a task which has not yet been calculated on the first one. Switching the order of the tasks on the first processing unit dramatically lowers the finishing time of the job.
2.3.1 Cost of scheduling
As we saw in the previous example, scheduling can have a huge impact on performance. It is also important how we define the actual schedule, how we measure how good the actual order of tasks is, and how quickly we can generate an acceptable schedule. The advantages of scheduling strongly depend on these factors.
2.3.1.1 Defining the schedule
Usually schedulers are functions which answer one very basic question, asked by each processing unit: "Which task shall I run next?"
There are a number of possible ways to come up with an answer; in the Algorithms part of this section, I describe some of the available algorithms. The simplest is the random scheduler: it selects a task randomly from the available ones and does not consider any optimization. Another is the greedy scheduler, which has one dedicated metric to optimize and greedily selects the best available task in order to optimize performance. Many more advanced algorithms exist, such as the genetic algorithm, which tries to find the best solution using evolution.
2.3.1.2 Measure the goodness of the schedule
The goodness of a schedule is very subjective. While in a typical backend system a good schedule may take hours to complete, in a frontend system we usually consider an application good if its response time is under one second. But time is only one aspect; others may focus much more on energy consumption, disk space consumption or the cost of the execution.
We can compare the performance (based on any measurable criterion) of the old and the new implementation, and we can calculate how well the new implementation performs compared to the previous one. But this does not really say anything about the best solution. Let us say we use the overall execution time as the metric for goodness. If we use 𝑁 processors, the theoretical minimum is 𝑇/𝑁, where 𝑇 is the sum of the execution times of all the subtasks. It can be proved that this minimum is not always reachable, but it acts well as a lower bound for the goodness.
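A trivial illustration of this lower bound with made-up task durations:

```python
def lower_bound(durations, n):
    """Theoretical minimum makespan with n processors: the total work
    divided by n (not always reachable, but a valid lower bound)."""
    return sum(durations) / n

tasks = [4, 3, 2, 1]          # execution times of the subtasks
bound = lower_bound(tasks, 2)
print(bound)                  # 5.0

# an actual 2-processor schedule: [4, 1] and [3, 2] both finish at 5
makespan = max(4 + 1, 3 + 2)
print(makespan >= bound)      # True - the bound is never violated
```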
2.3.1.3 Generate the schedule
Another aspect is the creation of the schedule. Even if it is possible to create the best schedule for a given problem, it is not clear whether we actually want to reach it.
A random scheduler can be implemented easily and can quickly select the next task to run. In contrast, a genetic algorithm requires much more time to create a schedule. The complexity of the generator algorithm highly depends on the actual problem we try to solve. If we have a graph with 5 nodes and 6 edges, it is easy to implement a method which checks all available combinations and chooses the best one. If we have a bigger graph, say 100 nodes and 500 edges, the complexity of the tester and the time required for the algorithm skyrockets. A rough estimate for the first case is 5! = 120 possible combinations, while the second case has 100! ≈ 9.33 ∗ 10^157 combinations – far too many to enumerate.
Usually schedulers allow us to search for approximate, sub-optimal results which are within defined bounds of the optimal solution. We can limit the time available for the scheduler and hope that the returned result is good enough.
2.3.2 Type of schedulers
Schedulers can be grouped by many properties.[15] In this section, I describe one possible grouping, which focuses mainly on the role of the scheduler throughout the lifetime of the program.
2.3.2.1 Online schedulers
Online schedulers actively participate in shaping the execution flow during the lifetime of the program. Their role is to monitor the status of the system, update their metrics accordingly and make scheduling decisions based on the latest available state.
Online schedulers do not have a predefined order of the tasks, and they can adapt to new situations occurring during the execution. They usually try to make as good decisions as possible, but their goal is not to reach the optimum but to be as flexible as possible.
Online schedulers are mainly used in interactive systems and where the scheduling algorithm allows and needs to be applied multiple times. The algorithms usually aim to be simple: round robin, time slicing, etc. It is also important to use this type of scheduler when the system is being modified during the execution.
Operating systems are a good place to use online schedulers: computers have multiple CPUs with multiple cores, many processes try to access the resources, and the scheduler usually runs each time a new request comes in, or periodically, to schedule the running tasks. Good schedulers avoid the starvation of processes, provide fairness and support prioritization.
2.3.2.2 Offline schedulers
Offline schedulers usually run only once in the lifetime of the program, or once in a given execution period. They are mainly used when the problem is complex, the domain of the problem does not change during the execution period, and the scheduling algorithm requires a serious amount of time or resources to complete. They are also a good choice when the execution is periodic.
Compared to their online counterparts, they are less flexible, as they cannot react to changes in the environment during their execution period. On the other hand, they usually aim for better decisions than online schedulers: they analyze the model of the problem and create a suboptimal schedule.
An example of their usage is a car factory. In this case, the scheduler is the production engineer who sets up the layout of the factory and defines the subtasks of creating a new car. In computer science, an example could be the field of executable graphs. These graphs are usually used more than once, as their creation is very resource-intensive, and they do not really change between executions. The scheduler can create a schedule for the graph once, and the executor can use this information at execution time.
2.3.3 Algorithms
Algorithms used in schedulers are specialized for a given problem, but they all follow some common schemes. The following algorithms can be found in several products on the market.
Note that each algorithm takes a list of tasks and figures out an order for the execution, thus we can consider all of them offline schedulers. For simplicity, each algorithm description uses the overall execution time as the key property to optimize and works on a DAG. Let us suppose we have 𝑁 processing units.
2.3.3.1 Random scheduler
The random scheduler is the simplest to implement. It takes the input nodes of the graph and marks them as available to process, as they do not have any dependencies, and it stores the marked nodes in a queue (a FIFO container). The algorithm simulates one run of the graph and creates a mapping between the nodes and the processing units.
At the beginning of the algorithm, each processing unit is available. The algorithm takes the first available node from the queue and randomly selects a processing unit for it. The scheduler then checks the nodes which depend on the scheduled one: if one of them becomes available due to the processing of the node, it is pushed into the queue; otherwise, the scheduler notes that one dependency of the node is ready. The next step is to schedule the next available node from the queue. The algorithm stops when all the nodes are scheduled.
This algorithm is not very clever, as random selection is not optimal – the worst-case scenario is to schedule all the nodes to the same processing unit and leave the others idle. It does not consider the possible parallelization opportunities either. One improvement to support parallelization is to also select the next node from the available queue randomly, which creates the possibility of executing unrelated subgraphs simultaneously.
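A possible sketch of the random scheduler; the adjacency representation, the seed and the task names are illustrative:

```python
import random
from collections import deque

def random_schedule(deps, n_units, seed=None):
    """Simulate one run of the DAG and randomly map each node to a
    processing unit. `deps` maps each node to its prerequisite nodes."""
    rng = random.Random(seed)
    remaining = {node: len(d) for node, d in deps.items()}
    dependents = {node: [] for node in deps}
    for node, d in deps.items():
        for prerequisite in d:
            dependents[prerequisite].append(node)
    queue = deque(n for n, count in remaining.items() if count == 0)
    assignment = {}
    while queue:
        node = queue.popleft()                    # first available node (FIFO)
        assignment[node] = rng.randrange(n_units) # random processing unit
        for dep in dependents[node]:              # one dependency became ready
            remaining[dep] -= 1
            if remaining[dep] == 0:               # all dependencies done
                queue.append(dep)
    return assignment

deps = {"load": [], "parse": ["load"], "report": ["parse"]}
print(random_schedule(deps, n_units=2, seed=1))
```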
2.3.3.2 Greedy scheduler
While the random scheduler uses the simulation only to visit all the nodes, the greedy scheduler tries to leverage the simulation of the execution more.[16] It also maintains a list of the available nodes, and it also adds the input nodes to an available-nodes queue in its initialization phase.
The main difference is in finding the next node to schedule. Let us define a variable 𝑇 which keeps track of the elapsed time of the schedule. At the beginning of the simulation, the scheduler selects 𝑁 nodes and distributes them among the processing units. This distribution can be random or based on several criteria: some implementations distribute based on the required execution time (shortest job first, longest job first) or on how long a node has been waiting to execute. The selection of the processing unit is not random anymore. At a given point in time only 𝑁 nodes can run, as we have a limited number of processing units. If we have unscheduled but available nodes and no available processing unit, we increment the variable 𝑇. The simulation also has to know, or at least have a meaningful estimate of, how long the execution of a particular node takes. Based on this information and the starting time of a node, the simulator is capable of calculating when a processor will become free again. This means that incrementing 𝑇 for a while will eventually make a processor available again, so we can schedule the remaining nodes. This iteration continues until all the nodes are scheduled.
This method is better than the random scheduler, as it tries to balance the usage of the processing resources. It makes parallel execution possible in the sense that independent graph parts can be executed simultaneously, and it can also be implemented efficiently.
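A possible sketch of this greedy simulation; for brevity it advances T directly to the next completion time instead of incrementing it step by step, and uses longest-job-first as the selection criterion. All names and numbers are illustrative:

```python
import heapq

def greedy_schedule(deps, duration, n_units):
    """Simulated greedy scheduling: whenever a processing unit is free,
    give it an available node (longest job first); when no unit is free,
    advance the clock T to the next completion time."""
    pending = {node: set(d) for node, d in deps.items()}
    dependents = {node: [] for node in deps}
    for node, d in deps.items():
        for p in d:
            dependents[p].append(node)
    available = sorted((n for n, d in pending.items() if not d),
                       key=duration.get, reverse=True)
    running = []                       # min-heap of (finish_time, node)
    start, T, free = {}, 0, n_units
    while available or running:
        while available and free:      # greedily occupy every idle unit
            node = available.pop(0)    # longest available job first
            free -= 1
            start[node] = T
            heapq.heappush(running, (T + duration[node], node))
        T, done = heapq.heappop(running)   # jump T to the next completion
        free += 1
        for d in dependents[done]:         # new nodes may become available
            pending[d].discard(done)
            if not pending[d]:
                available.append(d)
        available.sort(key=duration.get, reverse=True)
    return start, T                    # start times and total makespan

deps = {"a": [], "b": [], "c": ["a", "b"]}
duration = {"a": 3, "b": 1, "c": 2}
starts, makespan = greedy_schedule(deps, duration, n_units=2)
print(makespan)   # 5: a and b run in parallel, then c
```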
2.3.3.3 Critical path scheduler
This type of scheduler originates from the critical path method.[17] A prerequisite is to know the execution time of each node in the graph.
If we look at a schedule, we can usually reorder some tasks without modifying the finishing time of the schedule, but there are tasks which cause a delay in the execution if they are executed later than expected. The critical path is the list of tasks which delay the whole schedule if started later than expected. The scheduler focuses on the critical path and ensures that a processor is available when a new critical element becomes ready.
To identify the critical path, we calculate some metrics for each node. Let the EST (Earliest Starting Time) of the input nodes be zero. Every subsequent node should be started as soon as it becomes ready, so its EST is the maximum of the EETs (Earliest Ending Times) of its dependency nodes; the EET of a node can be calculated from its EST and its execution time. Another important metric is the LST (Latest Starting Time). Let the LST of the output nodes be equal to their EST. The LST of a preceding node is calculated from the LSTs of its successor nodes: subtract the node's own execution time from the LST of each successor and take the minimum of the calculated values. The MDT (Maximum Delay Time) is calculated from the EST and the LST: 𝑀𝐷𝑇 = 𝐿𝑆𝑇 − 𝐸𝑆𝑇. If the MDT of a node is zero, a delay in its start results in a delay in the whole schedule.
This class of schedulers tries to satisfy the starting time requirements, but if the number of processing units is limited, we may have to introduce delays to ensure the correct order of execution. This scheduler gives very good results and allows us to leverage parallel execution. Moreover, it helps to calculate how many processing units are needed to solve the problem effectively, and the schedule can be visualized easily using Gantt diagrams.
2.3.3.4 Genetic algorithm scheduler
The genetic algorithm scheduler uses a very different approach to create a schedule: it models evolution.[18] Genetic algorithms use genetics, reproduction and natural selection to converge to an optimal solution.
The previous algorithms tried to create the solution in one step. A genetic algorithm initially creates a set of solutions, usually randomly, called the initial population, and improves them iteratively. In our case, a solution consists of a mapping between the nodes and the processing units, and can also include an ordering of the nodes. One single solution is called a genome. The goal of the algorithm is to create the best genome.
How can we measure the goodness of a genome? In genetic algorithm terms, goodness is the fitness of the genome. Just as the stronger beats the weaker in evolution, the more fit a genome is, the more likely it is to survive into the next generation. In our example, we can compare genomes by comparing the execution times of the encoded schedules: a lower execution time means a higher fitness value.
Once we have the initial population, genomes can start reproducing. This process is called crossover. When two genomes are paired, they can be combined into a new genome. The offspring contains a random proportion of the information stored in its parents: we select one random crossover point where we split the parent genomes. In practice, if we store the mapping in an array, crossover means taking the first part of the array from the first parent and the second part from the second parent, and creating a new array from them. If we apply this method to the array storing the order, we have to be very careful to ensure that the resulting order stays consistent with our restrictions. After the offspring is born, the algorithm applies some random mutation. This is an important step, as it helps the algorithm overcome local maxima/minima and adds more variation, which provides the chance to evolve. A mutation can improve, but can also degrade, the fitness of the genome.
This reproduction cycle continues until the population reaches a predefined size. Once that happens, natural selection is performed among the genomes. Fitter genomes make it to the next generation, while less fit genomes drop out of the population. The selection process has to ensure that a predefined number of genomes survives natural selection. The new generation then starts the described cycle over: it reproduces for a while, and then natural selection happens again.
Every generation is going to be at least as fit as the previous one. The algorithm stops after a predefined number of generations, after a predefined time limit, or when the generation reaches a predefined acceptable fitness.
Compared to the previous algorithms, the genetic algorithm is the most advanced one, but it also requires the most resources and takes a long time to complete. It is very flexible; developers have many choices to influence the algorithm: one can modify the randomness of the mixing or pairing process, the randomness of the mutation and its magnitude, the maximum size of the population, the threshold in the selection algorithm and the number of generations.
It has pitfalls as well: for example, the population can get stuck in a local optimum, and creating the schedule requires considerable resources and time.
2.4 Compilers
When a computer is about to execute a program, it needs information it can understand. The set of instructions which the CPU understands is called the instruction set of the CPU. A program is executable on a given CPU if there exists a mapping between the language of the program and the instruction set of the CPU. Compilers act as a bridge between the source and the destination set of symbols.
2.4.1 Compilation process
During the compilation process, the source code is transformed into machine code executable by the CPU. This flow consists of several components. The following picture illustrates a possible process for C++ code compilation.[19][20]
2.4.1 – C++ compilation process
Developers need some conveniences to make development and maintenance easier. They can write comments into the source code, or they can split the source code into smaller parts, for example for clarity. Usually the preprocessor is in charge of supporting these features. It removes comments, reassembles divided parts by copying them together and removes clutter from the code which is important only for the developers. The preprocessor can also take care of specific language features which are only syntactic sugar. The output is clean, compile-ready code.
The compiler transforms the input into the desired output language. In our case, the compiler takes C++ code and compiles it into assembly language. It is important to notice that the process needs to know some parameters of the final system which will run the program. The main advantage of using higher level languages is that by introducing one more abstraction level, we define a general mapping between higher level constructs and their lower level implementation. The programmer can focus on the program's own purpose while compiler vendors take care of the compilation process for different platforms.
At the end of the process, the assembler and the linker take the compiled files and create an executable binary by setting up the memory layout of the program, linking other libraries and transforming everything into binary code, which is now platform dependent.
2.4.2 Compiler optimization
During compilation, modern compilers apply various optimizations. There is a huge number of available optimization options for the most widely used free C++ compiler.[21] Let us take a look at some examples to see the choices a developer has when optimizing the compilation process or its result.
 -fno-inline – tells the compiler not to expand any functions inline
 -fno-function-cse – makes each instruction that calls a constant function
contain the function’s address explicitly
 -fsplit-wide-types – if a type occupies more than one register, split the registers apart and handle them independently
 -fdevirtualize – try to convert calls to virtual functions to direct calls
It is good to be aware of the actual optimization mechanisms, as they can bring serious improvements but can also cause serious problems. For example, compilers usually eliminate variables which are never assigned in the code. However, in hardware programming, where it is possible to set the value of a variable from the hardware side as well, eliminating the variable can cause unexpected behavior. To turn this feature off locally, we can mark the variable as volatile.
2.4.3 Higher level languages
At the dawn of programming language evolution, the creators of new languages usually wanted to keep the possibility of reaching the hardware easily if needed – C and C++ are very good examples of this. Later, as recurring tasks were discovered, people started to create programming languages which fitted their needs better. Just as assembly is quite expressive for hardware level programming, functional programming languages like Scala[22] or F#[23] are more expressive in their own fields. We can call these languages higher level or domain specific languages[24].
Domain specific languages have multiple advantages. The most important one is how natural a program can be in a well-designed domain specific language. Returning to the previous example, in an assembly program you work with registers and basic mathematical operators, while in C++ you call a predefined method of a virtual console. Behind the scenes, there is a serious chance that with a good compiler both programs compile to similar machine code. No doubt we could write thousands of compilers, one for every programming language, compiling each to machine code, but in fact compilers of domain specific languages usually compile to the most similar language which already has an existing compiler, rather than reinventing the wheel.
Another strong argument for using DSLs is the following: developers usually create software based on someone else's requirements. A trader at a big bank presumably will not be able to describe problems in an assembly-like environment. They speak about bonds, trades, yields and other financial terms, not registers, jumps and functions. If developers had a set of tools in which they could express their users' needs and then compile them to binary executables, the development process would be more effective and less expensive. There are many promising results in the field of very high level DSLs, but we are far from being able to solve every problem this way.[25]
While development time and cost can be greatly reduced by using DSLs, the effort put into developing the DSL itself is huge. At least one member of the development team has to be a domain specialist, in order to identify the required features to support. The set of features can be huge and we have to filter wisely what to support. It is also hard to decide how generic the new DSL should be. If it is too generic, it may not be as useful as it could be. If it is too specific, we restrict ourselves to a very small set of problems which the DSL can solve. Also, if the DSL is used only once, most likely it is not worth the effort to develop.
2.5 Stack programming
Standard programming languages abstract away the details of the underlying infrastructure and execution, yet many systems use a stack-based execution model under the hood. They operate on one or more stacks.
Consider a recursive function call. When the function starts to execute, it sets up an environment for itself, for example it allocates variables. This environment is considered the context of the function. When the first recursive call is made, the function executes again and sets up its own context. Under the hood, the context of the outer call is pushed onto a stack, otherwise the variables would be overwritten when the recursive call returns. If recursion happens on multiple levels, we just push more and more contexts onto the stack. When a recursive call returns, we pop the latest element from the stack and get back the previous execution context. This example is specialized to recursive calls, but it is easy to see that every function call works the same way.
The data structure supports two basic operations. The first one is the push operator, which puts an element on top of the stack. The second is the pop operator, which takes the top-most element off the stack.
In stack programming these operators can be considered the assembly level building blocks. As we saw earlier, it can be worth the effort to add higher level constructs to the language, which help developers be more productive. Stack effect diagrams help to define these extensions, as they describe how an operator changes the state of the stack.
Stack effect diagrams (SEDs)[28] are usually given in the following standardized form: opName ( before -- after ). Let us examine the pop operation. Before executing the operation, the stack contains 𝑁 elements; after the operation it will only have 𝑁 − 1. The SED of the pop operation is: pop ( a -- ).
There are some, widely used and known stack operators, those are introduced in
the following.
Op. name Stack Effect Diagram
dup ( a -- a a )
drop ( a -- )
swap ( a b -- b a )
over ( a b -- a b a )
rot ( a b c -- b c a )
Stack operators
Other stack operators can easily be created. Let us create a multiply operator which takes the top two elements, multiplies them and pushes the result back onto the stack. The SED for this operation is: mul ( a b -- c ).
2.5.1 Reverse Polish Notation
Reverse Polish Notation (RPN)[29] switches the order of the arguments and the operators in an expression. As it will soon turn out, RPN is essential for leveraging stack programming when solving mathematical problems.
Let us assume we want to calculate the following expression: (2 + 3) ∗ (4 + 5). Humans usually solve the two additions first, then multiply the results. For a computer, the given expression is a little more difficult to solve: when the interpreter sees the + sign, the second operand is not yet available. In this simple case it would be possible to implement the addition operator to work around the issue, but in general the solution is RPN. Rewriting the expression in RPN gives: 2 3 + 4 5 + *. This way, when the execution gets to an addition, the required data for the operator is already in place.
2.5.3.1 – Stack based execution of (2+3)*(4+5)
RPN has a direct sibling in graph theory. If the graph is a binary tree, its traversal has three simple variations. Starting from the root of the tree and applying one of the following rules recursively, we are guaranteed to visit all the nodes.
1. First visit the node itself, then visit the left child and then the right child
2. First visit the left child, then the right child, then the node itself
3. First visit the left child, then visit the node itself, then visit the right child
Consider the following graph, which describes the expression above.
2.5.3.2 – Graph representation of (2+3)*(4+5)
Applying the third rule and printing the content of each node when visiting it results in the original (infix) expression. The second rule produces the RPN form of the expression, while the first one produces the Polish Notation form. This mapping will be very useful when modelling mathematical problems in this paper.
2.5.2 Forth
Forth[30] is an imperative, stack-based computer programming language and programming environment. It supports structured programming, reflection, concatenative programming and extensibility. The environment can execute stored programs and can also be used as an interactive shell. Forth is not as popular as other programming languages or environments, but it is still used in some operating systems and space applications.
Forth operates with words (subroutines). Implementations usually have two stacks. The first stack, named the data stack, is used to store local variables and parameters. The second one is the function-call stack, called the linkage or return stack.
Let us take a look at a self-defined word in Forth:
: FLOOR5 ( n -- n' ) DUP 6 < IF DROP 5 ELSE 1 - THEN ;
Here we define a new word (FLOOR5). As its stack effect diagram shows, it manipulates the top element of the stack by replacing it with something else. The remaining part of the expression is the body of the word. The DUP word duplicates the top element, then 6 pushes a new element (6) onto the stack. < compares the top two elements of the stack and replaces them with a true or false value. IF and THEN form the usual conditional branch statement. In the true branch, the word replaces the original element with the value 5, while in the false branch it pushes the value 1 onto the stack and subtracts it from the original value.
The execution of a Forth program is as simple as possible. The interpreter reads a line from the user input and tries to parse it. When the interpreter finds a word, it looks it up in the dictionary for the associated code and executes it. If the word cannot be found in the dictionary, the interpreter assumes it is a number and tries to push it onto the stack. If both attempts fail, the execution of the code is aborted. When defining a new word, Forth compiles the word and makes its name findable in the dictionary.
Stack based programming languages, especially Forth, inspired me when designing the execution model used to run an executable graph.
3 Design and implementation
In this chapter I define and analyze the domain problem of my thesis, then design and implement a solution for it. I explain the solution step by step, starting with the problem definition, and provide usage and implementation details for each step in the process.
3.1 Business problem
The financial sector lives a high-paced lifestyle. Markets and regulations are rapidly changing, and the financial crisis has only boosted this effect. Regulatory presence is common and supervisions happen every other day, especially since the financial crisis.[31] In such an environment, performance, reliability and maintainability are essential parts of every system developed.
3.1.1 Morgan Stanley
Morgan Stanley[1] is one of the biggest financial services corporations in the world. It operates in more than 40 countries, with more than 1300 offices and 60000 employees. The history of Morgan Stanley[2] dates back to 1935, when some J.P. Morgan employees, namely Henry S. Morgan and Harold Stanley, decided to start a new firm.
Morgan Stanley splits its business into the following categories:
 Wealth Management
 Investment Banking & Capital Markets
 Sales & Trading
 Research
 Investment Management
Although it is a financial company, it has an outstanding IT department which supports everyday operation. Even if a malfunctioning system inside the company cannot harm humans the way faulty airplane software can, the IT department has to consider other important factors. For example, a bug in a piece of software on the trading floor which prevents a trader from doing business causes a profit loss to the firm. The actual value of the loss depends on various factors, but in general we speak about millions of dollars.
3.1.1 – Morgan Stanley
3.1.1.1 Team work
It is no secret that Morgan Stanley seeks the best talents in every area where it operates. They actively participate in education, announcing internship and fresh graduate programs. For example, in Hungary their office mainly acts as a back office – they do IT development and have some accounting related tasks there – but in the last couple of years they have grown their headcount from 100 to over 1000.
Knowing the size of the firm, the complexity of their systems and the fact that they are committed to team-working, they also try to prepare future candidates to work in a team. As this paper is supported and inspired by the firm, during my studies and work I also participated in team-work.
In the beginning, our main method was to discuss the basic ideas, problems and principles onsite with our consultants and implement the solutions separately. During the first semester we identified a lot of interesting topics and distributed them among ourselves. After this point, consultations were similar to scrum meetings, where we presented our progress, ideas and problems, and got directions if we were stuck.
3.1.2 Pricing
In the financial world, pricing is one of the most important notions.[32] As the proverb says: "Buy cheap, sell high". The process of defining the values of cheap and high is the pricing process. We can price anything – stocks, bonds, options, gold, silver, cars, flats, vacations. Markets even price the possible default of countries or changes in laws. It also makes sense to create prices for different scenarios – if something happens, the price is 𝑋, otherwise 𝑌.
The accuracy of pricing is essential. The yield of a trade is usually the bid-ask spread, which is the difference between the bid (buy) price and the ask (sell) price. In trading, another important aspect is the speed of pricing. As a seller's or buyer's intent is available to every market participant, a quicker response to a request improves the odds of making a good deal. The speed of pricing becomes even more important in automatic trading, where computers trade against each other.
Pricing is also a technical challenge. There is a huge number of assets available to sell or buy on the market, and most of them are correlated with other assets. The data is live, which means it ticks every second or even more frequently.
The problem I try to solve in this paper is to model the connections between assets and to develop an efficient and effective pricing implementation.
3.2 Modelling the problem
The problem to solve is defined in plain English. To be able to analyze and solve it, I have to model the problem using computer science terms.
Pricing algorithms are usually considered business secrets, so it is almost impossible to find any information about how big companies have implemented their own versions. Even Morgan Stanley could not provide me anything about their implementation. Probably every big company uses the same algorithm tailored to their own directives. In such a competitive environment this conduct is absolutely understandable, but it also makes it harder to position my results.
3.2.1 Graph representation
When representing networks, dependencies and connections, a graph is the obvious choice. As we want to model the pricing of assets and the connections between them, it is easy to identify a mapping: the nodes of the graph are the assets we want to price and the edges represent the connections. The direction of an edge carries information on its own, therefore the edges are directed.
3.2.1.1 Challenges and constraints
Connections between assets can be really complex. Rather than allowing expressions to be complex, I have decided to force users to build up complex expressions from simple bricks, like addition, multiplication and some other basic mathematical operations. On one hand this makes the input graph bigger and harder to create, but the user can reuse partial results, preventing the recalculation of common values.
As mentioned before, pricing can get extremely complex. For example, it is quite usual that one asset influences another asset, but that asset also influences the first one directly or indirectly. In this case we have a cycle in the graph, which rules out a number of usable methods. To sort this problem out, in the beginning we only allow DAGs as input. Later on the solution can be extended to support cycles as well, since solutions exist to slice a graph containing cycles into subgraphs which are DAGs and later connect them again without messing up the results.
3.2.1.2 Implementation details
To represent a graph on a computer, we can store a list of edges per node, store the adjacency matrix of the nodes, or use other storage mechanisms. In the first case we store a list for each node, containing every edge incident to the given node. In the second case we store an 𝑁 ∗ 𝑁 matrix, where 𝑁 is the number of nodes; every element of the matrix represents one possible edge between two nodes. Both approaches have their own advantages, but in our case storing the adjacency matrix is not efficient. Our domain problem implies that our graph is a DAG. If we use an adjacency matrix, we have to store an entry for each possible edge even if we never create it. Due to the nature of the domain problem, we do not have all the edges, which means unnecessary memory consumption.
Thinking ahead, a node itself has to know which edges are directed to it. From the implementation point of view, we also gain some benefits if the node knows which edges are directed out from it. Keeping this in mind, I have decided to implement a hybrid solution which tries to unify the advantages of the concepts. My implementation creates one list for the nodes and one for the edges. These lists help to find any element in the graph quickly. A node itself contains two lists – one for the incoming edges and one for the outgoing edges. Finally, an edge contains the two nodes it connects, distinguishing the source and the target.
3.2.1.2 – Graph structure
Using this structure, navigating the graph is very quick, easy and efficient. On the other hand, the structure complicates operations like adding or deleting nodes or edges. For example, when we delete a node, we have to maintain multiple vectors, which is clearly an overhead. Comparing the advantages and the drawbacks, the structure is quite sufficient, as most of the time the graph will be executed rather than changed.
The architecture is designed to be as lightweight as possible. The implementation uses pointers everywhere it is possible. More specifically, we use the smart pointer types from the C++11 standard library – shared, weak and unique pointers.
3.3 Building up a graph
Using a simple example, all the details of the model and the graph building process are illustrated.
3.3.1 Quadratic equation
For simplicity, we aim to solve the quadratic equation in our example:

y₁,₂ = (−b ± √(b² − 4ac)) / 2a

We can consider a, b, c as inputs and y₁, y₂ as outputs. One execution of the graph solves the equation for us.
3.3.2 IProcessable interface
In our model, every element is capable of running; this is true for nodes and for edges as well. This observation leads us to leverage the object oriented paradigm and create a base class for all the runnable elements in the graph.
In C++ we create interfaces as classes, since the language does not contain an interface keyword. The IProcessable base class has two main purposes: to store the value of an element and to declare the function which we fire during the execution. In this model, the function we execute is the process( ) function. To ensure that every derived class implements its own function to run, I marked the process function as abstract.
3.2.2 – Basic hierarchy of the graph elements
The signature of the function is empty, because the arguments of the function are the predecessors of the element. This provides flexibility, since instead of restricting the signature we leave argument handling to the function itself. Although variable length parameter lists are available in C++, it would be much harder to correctly design the signature of the process function and handle every possible combination than to check the predecessors at the beginning of each function and decide whether they can be considered valid input.
The Value property can be used to store information which is required by the element. Let us say we want to create edges which not only transport the value from one node to another, but also multiply the carried value. If we want an edge which doubles the value and an edge which triples the value, we could create two separate classes, but then we would duplicate code. If we create just one class and store the multiplier as a variable, we can easily modify the multiplication factor for each edge without creating unnecessary complexity in the code base.
The Node and Edge classes also have default implementations and should be specialized further. Their properties provide the functionalities described earlier – namely, a node stores its predecessor and successor edges in two separate lists, and an edge stores the nodes it connects. This structure stores the direction of the edge: the From property is the source and the To property is the destination node.
3.3.3 Node types
Node is a very generic class: it only knows its predecessors and successors. I have defined some node types with which I can solve mathematical problems. The type system is easily extensible, so further types can be created quickly and easily.
Addition node
This type performs basic mathematical addition. It takes the values of its predecessors, sums them up, then notifies its successors about completion. The Value property is not used.
Constant node
This type acts as a constant input. It never has any predecessors and it is always ready to process. It uses the Value property to store the predefined value. Its process function does nothing but notify the successors about its value.
Division node
This type realizes the division operation. It waits for two predecessors: the first one is the dividend, the second one is the divisor. The Value property is not used.
Multiplication node
This type performs basic mathematical multiplication; its implementation is very similar to the addition node type.
Square root node
This type calculates the square root of a number. It takes one predecessor. The Value property is not used.
3.3.4 Edge types
The Edge class is also a generic base class. It stores the source and the destination nodes. The implemented function is very simple: it takes the result value from the source node, stores it and notifies the destination node.
3.3.5 Graph object
Nodes and edges are self-descriptive, but it makes sense to encapsulate coherent
parts.
3.3.5 – Graph structure
The class itself tries to be as simple as possible. To support the previously described functionalities, nodes and edges are stored in separate lists. At this point we only need two public methods, namely adding a node and adding an edge to the graph. Each function takes a node or an edge as argument, respectively. I have decided to decouple the creation of nodes and edges from the graph. This separation helps not to pollute the graph object with the details of how to create a node or an edge. The creation of elements can be done via a factory class which takes care of creating the various types of nodes and edges.
3.3.6 Example graph
With the acquired knowledge, we can now set up our example graph. The first step is to separate the starting equation to express the two results separately. Doing that, we get the following two expressions for the results:
y₁ = (−b + √(b² − 4ac)) / 2a
y₂ = (−b − √(b² − 4ac)) / 2a
To be able to express the RPN form of the expressions, first rewrite the formulas, then create the RPN form:

y₁ = (−b + (b² − 4ac)^0.5) / 2a        y₂ = (−b − (b² − 4ac)^0.5) / 2a

y₁ = (−b) b² 4ac − √ + 2 a ∗ /        y₂ = (−b) b² 4ac − √ − 2 a ∗ /
Using the RPN form of the expression, it is easy to create the graph of the expression. During the build process, if we encounter a subexpression which we need, which has already been calculated and has not changed since, we can reuse it to prevent the unnecessary recalculation of values. Using these observations and expressions, the following graph can be constructed.
3.3.6 – Graph of the quadratic equation
In the picture, each color has a different meaning. Blue nodes are input nodes. Pink nodes are constant nodes with predefined values (the nodes with the value −1 are identical, displayed twice only to simplify the picture). Yellow nodes are operators of type addition, multiplication, division and square root. Output nodes are marked with green.
C++ code for creating this graph can be found in the Appendix.
3.4 Simple execution models
Once we have the model set up, the next step is to run it and acquire the results. Before designing an advanced model, let us take a look at how easily the notion of "running" can be expressed in our current model.
3.4.1 Recursion based execution
As we discussed earlier, there is a simple mapping between the RPN form of an expression and its graph representation. We also saw an example of how to solve an expression using its RPN form. In this section, we are going to implement a solution using the graph representation.
3.4.1.1 Recursion
Using recursion[33], we can easily iterate through all the elements of the graph and calculate the final result. To start the execution, we have to call the process function of an output node, as the recursion works backwards.
To process a node, we simply take the results of all the predecessor nodes, execute the node's own calculation and return. Getting the result of a predecessor element causes the recursion.
Every recursive algorithm needs a stop condition. In our case it is hitting a constant node – when we reach one, we do not need to call any further process functions of predecessors; we can return with the value of the node.
3.4.1.2 Implementation details
Each element of a graph is derived from the IProcessable base class and hence has a
process( ) function. To support recursion based execution, I had to ensure that the process
function of any IProcessable triggers calls to the process functions of all its predecessors.
The underlying implementation of the process function calls the GetResultValue( )
function of its predecessors. This means that when calling the GetResultValue
function, the caller does not know whether the call will cause a recursion or will
simply return the stored result value. This fact comes in handy when dealing with multiple
outputs. The GetResultValue function is implemented in such a way that it only causes
recursive calls the first time it is called. When we have multiple outputs, they are likely to
share common subgraphs, so rather than recalculating the whole subgraph again, we can
reuse the intermediate results.
To get the results, we have to call the GetResultValue( ) functions of the output
nodes. I consider every node without a successor to be an output.
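A minimal sketch of the memoization idea follows. It reuses the names from the text (IProcessable, process, GetResultValue), but the classes themselves are illustrative assumptions, not the thesis implementation.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// The first GetResultValue() call triggers recursion via process();
// later calls (e.g. from a second output) return the cached value.
class IProcessable {
public:
    explicit IProcessable(double constant = 0.0) : result_(constant) {}
    virtual ~IProcessable() = default;

    double GetResultValue() {
        if (!calculated_) {
            result_ = process();
            calculated_ = true;
            ++evaluations_;                  // counts real evaluations only
        }
        return result_;
    }

    int evaluations() const { return evaluations_; }
    std::vector<std::shared_ptr<IProcessable>> predecessors;

protected:
    virtual double process() { return result_; } // constants just return

private:
    bool calculated_ = false;
    double result_;
    int evaluations_ = 0;
};

// An addition operator node: processing sums the predecessor results,
// which is where the recursion into the subgraph happens.
class AddNode : public IProcessable {
protected:
    double process() override {
        double sum = 0.0;
        for (auto& p : predecessors) sum += p->GetResultValue();
        return sum;
    }
};
```

With two output nodes sharing a common subgraph, the shared part is evaluated exactly once; the second output simply reads the cached result.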
3.4.1.3 Experiences
Implementing this version pointed out that using higher level languages makes
development much easier, as a notion like "process" can be expressed naturally via well
designed objects.
This version is the first one I implemented, so it is hard to judge its
performance, but some predictions can be made in theory.
The first barrier to its use can be the memory limitation. This type of recursion,
where the calling context of every recursive call must be kept on the stack, can quickly
lead to out of memory (stack overflow) errors, especially for big graphs. Another issue
could be the speed of the execution. Knowing that we have a complex data structure for
storing a graph and also a complex class hierarchy, I assumed that wiser execution
algorithms could yield an outstanding performance improvement compared to the
recursion based execution model.
3.4.2 BFS based execution
From a certain point of view, BFS based execution is the opposite of the recursion
based model. With recursion we automatically know which element is ready to be
processed; if we go the other way around and start from the input nodes, we have to
track which elements are ready to be processed.
3.4.2.1 Implementation details
Remember that we are working on DAGs, which means a BFS traversal yields
a valid order of execution if we apply a small modification to the algorithm.
In BFS, after visiting a node we add all of its unvisited neighbors to the ToProcess list.
In our case, we need to ensure that an element is added to the list only if all of its
predecessors are ready. To support this validation, I have implemented two functions.
The isReadyForProcess( ) function checks that every prerequisite of the element is ready.
The purpose of the registerParameter( ) function is to help determine the actual state of an
element. When an element is processed, it notifies its successors of this fact by calling
the registerParameter function of each successor. Inside, the function maintains a
counter and increments it on every call. The checking function compares this counter with
the number of predecessors; if they are equal, the element is ready to be processed.
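The two helper functions and the modified BFS can be sketched as follows. The names registerParameter and isReadyForProcess follow the text; the Element structure and the execute function are illustrative assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <memory>
#include <vector>

// Sketch of the counter based readiness check: each processed element
// notifies its successors, and a successor becomes ready once every
// predecessor has called registerParameter() on it.
struct Element {
    std::size_t predecessorCount = 0;
    std::size_t registered = 0;      // incremented by registerParameter()
    std::vector<std::shared_ptr<Element>> successors;

    void registerParameter() { ++registered; }
    bool isReadyForProcess() const { return registered == predecessorCount; }
};

// Modified BFS: a successor enters the ToProcess list only after all of
// its predecessors have notified it, which yields a valid execution
// order on a DAG. Returns the number of processed elements.
std::size_t execute(const std::vector<std::shared_ptr<Element>>& inputs,
                    std::vector<std::shared_ptr<Element>>& order) {
    std::deque<std::shared_ptr<Element>> toProcess(inputs.begin(), inputs.end());
    while (!toProcess.empty()) {
        auto current = toProcess.front();
        toProcess.pop_front();
        order.push_back(current);            // "process" the element
        for (auto& succ : current->successors) {
            succ->registerParameter();       // notify the successor
            if (succ->isReadyForProcess())   // all predecessors done?
                toProcess.push_back(succ);
        }
    }
    return order.size();
}
```

On a diamond shaped graph (one input feeding two middle nodes that both feed one output), the output element is appended to the list only after both middle nodes have been processed.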
diploma
diploma
diploma
diploma
diploma
diploma
diploma
diploma
diploma
diploma
diploma
diploma
diploma
diploma
diploma
diploma
diploma
diploma
diploma
diploma
diploma
diploma
diploma
diploma
diploma
diploma
diploma
diploma
diploma
diploma
diploma
diploma

More Related Content

Viewers also liked

POV Fueling GrowthThrough Customer Centricity
POV Fueling GrowthThrough Customer CentricityPOV Fueling GrowthThrough Customer Centricity
POV Fueling GrowthThrough Customer Centricity
Rob Golden
 
Awesome Beginnings Childcare
Awesome Beginnings ChildcareAwesome Beginnings Childcare
Awesome Beginnings Childcare
Lisa Boerum
 
The Built Environment PCNZ Presentation
The Built Environment PCNZ PresentationThe Built Environment PCNZ Presentation
The Built Environment PCNZ Presentation
Alex Voutratzis
 

Viewers also liked (8)

Sabias que...
Sabias que...Sabias que...
Sabias que...
 
Encontro com Maria João Lopo de Carvalho
Encontro com Maria João Lopo de Carvalho Encontro com Maria João Lopo de Carvalho
Encontro com Maria João Lopo de Carvalho
 
Sabias que ...
Sabias que ... Sabias que ...
Sabias que ...
 
POV Fueling GrowthThrough Customer Centricity
POV Fueling GrowthThrough Customer CentricityPOV Fueling GrowthThrough Customer Centricity
POV Fueling GrowthThrough Customer Centricity
 
Atividades de Natal nas Bibliotecas Martim de Freitas
Atividades de Natal nas Bibliotecas Martim de FreitasAtividades de Natal nas Bibliotecas Martim de Freitas
Atividades de Natal nas Bibliotecas Martim de Freitas
 
2015 10-ntt-com-forum-miyakawa-revised
2015 10-ntt-com-forum-miyakawa-revised2015 10-ntt-com-forum-miyakawa-revised
2015 10-ntt-com-forum-miyakawa-revised
 
Awesome Beginnings Childcare
Awesome Beginnings ChildcareAwesome Beginnings Childcare
Awesome Beginnings Childcare
 
The Built Environment PCNZ Presentation
The Built Environment PCNZ PresentationThe Built Environment PCNZ Presentation
The Built Environment PCNZ Presentation
 

Similar to diploma

2010 thesis guide
2010 thesis guide2010 thesis guide
2010 thesis guide
tettehfred
 
IMPROVING FINANCIAL AWARENESS AMONG THE POOR IN KOOJE SLUMS OF MERU TOWN-FINA...
IMPROVING FINANCIAL AWARENESS AMONG THE POOR IN KOOJE SLUMS OF MERU TOWN-FINA...IMPROVING FINANCIAL AWARENESS AMONG THE POOR IN KOOJE SLUMS OF MERU TOWN-FINA...
IMPROVING FINANCIAL AWARENESS AMONG THE POOR IN KOOJE SLUMS OF MERU TOWN-FINA...
Chimwani George
 
Organisational Structure in Support of the IT Knowledge Worker
Organisational Structure in Support of the IT Knowledge WorkerOrganisational Structure in Support of the IT Knowledge Worker
Organisational Structure in Support of the IT Knowledge Worker
Gary Merrigan (CITO)
 
Dissertation
DissertationDissertation
Dissertation
Amy Duff
 
Bachelors Thesis_Best Buy Co.
Bachelors Thesis_Best Buy Co.Bachelors Thesis_Best Buy Co.
Bachelors Thesis_Best Buy Co.
natia manjgaladze
 
DimitrisByritis_Thesis_2014
DimitrisByritis_Thesis_2014DimitrisByritis_Thesis_2014
DimitrisByritis_Thesis_2014
Dimitris Byritis
 
“Workshop on the use of didactic teaching aids for english classes, for teach...
“Workshop on the use of didactic teaching aids for english classes, for teach...“Workshop on the use of didactic teaching aids for english classes, for teach...
“Workshop on the use of didactic teaching aids for english classes, for teach...
Josenglish Ramos
 
FINNISH SPORT SPONSORSHIP AND SPONSORED SOCIAL MEDIA CONTENT
FINNISH SPORT SPONSORSHIP AND SPONSORED SOCIAL MEDIA CONTENTFINNISH SPORT SPONSORSHIP AND SPONSORED SOCIAL MEDIA CONTENT
FINNISH SPORT SPONSORSHIP AND SPONSORED SOCIAL MEDIA CONTENT
Laura Peltonen
 

Similar to diploma (20)

Impact of sovereign wealth funds on international finance
Impact of sovereign wealth funds on international financeImpact of sovereign wealth funds on international finance
Impact of sovereign wealth funds on international finance
 
Application form librarystaff kashmir
Application form librarystaff kashmirApplication form librarystaff kashmir
Application form librarystaff kashmir
 
2010 thesis guide
2010 thesis guide2010 thesis guide
2010 thesis guide
 
IMPROVING FINANCIAL AWARENESS AMONG THE POOR IN KOOJE SLUMS OF MERU TOWN-FINA...
IMPROVING FINANCIAL AWARENESS AMONG THE POOR IN KOOJE SLUMS OF MERU TOWN-FINA...IMPROVING FINANCIAL AWARENESS AMONG THE POOR IN KOOJE SLUMS OF MERU TOWN-FINA...
IMPROVING FINANCIAL AWARENESS AMONG THE POOR IN KOOJE SLUMS OF MERU TOWN-FINA...
 
Consumer Perception and Market analysis of PARTNER truck- Ashok Leyland
Consumer Perception and Market analysis of PARTNER truck- Ashok Leyland Consumer Perception and Market analysis of PARTNER truck- Ashok Leyland
Consumer Perception and Market analysis of PARTNER truck- Ashok Leyland
 
Organisational Structure in Support of the IT Knowledge Worker
Organisational Structure in Support of the IT Knowledge WorkerOrganisational Structure in Support of the IT Knowledge Worker
Organisational Structure in Support of the IT Knowledge Worker
 
ME501_2014_1_1_1_Graduation_Project_Form.docx
ME501_2014_1_1_1_Graduation_Project_Form.docxME501_2014_1_1_1_Graduation_Project_Form.docx
ME501_2014_1_1_1_Graduation_Project_Form.docx
 
Dissertation
DissertationDissertation
Dissertation
 
Happy schools
Happy schoolsHappy schools
Happy schools
 
ThesisCIccone
ThesisCIcconeThesisCIccone
ThesisCIccone
 
Connie
ConnieConnie
Connie
 
Bachelors Thesis_Best Buy Co.
Bachelors Thesis_Best Buy Co.Bachelors Thesis_Best Buy Co.
Bachelors Thesis_Best Buy Co.
 
DimitrisByritis_Thesis_2014
DimitrisByritis_Thesis_2014DimitrisByritis_Thesis_2014
DimitrisByritis_Thesis_2014
 
The Effect of Counterfeit Products on the Consumers’ Perception of a Brand: A...
The Effect of Counterfeit Products on the Consumers’ Perception of a Brand: A...The Effect of Counterfeit Products on the Consumers’ Perception of a Brand: A...
The Effect of Counterfeit Products on the Consumers’ Perception of a Brand: A...
 
Burden of proof
Burden of proofBurden of proof
Burden of proof
 
“Workshop on the use of didactic teaching aids for english classes, for teach...
“Workshop on the use of didactic teaching aids for english classes, for teach...“Workshop on the use of didactic teaching aids for english classes, for teach...
“Workshop on the use of didactic teaching aids for english classes, for teach...
 
FINNISH SPORT SPONSORSHIP AND SPONSORED SOCIAL MEDIA CONTENT
FINNISH SPORT SPONSORSHIP AND SPONSORED SOCIAL MEDIA CONTENTFINNISH SPORT SPONSORSHIP AND SPONSORED SOCIAL MEDIA CONTENT
FINNISH SPORT SPONSORSHIP AND SPONSORED SOCIAL MEDIA CONTENT
 
REPORT on OUTREACH PROGRAMME: FEB – MARCH 2017
REPORT on OUTREACH PROGRAMME: FEB – MARCH 2017REPORT on OUTREACH PROGRAMME: FEB – MARCH 2017
REPORT on OUTREACH PROGRAMME: FEB – MARCH 2017
 
Validation Report - Adult Education and Lifelong Learning Sector
Validation Report - Adult Education and Lifelong Learning SectorValidation Report - Adult Education and Lifelong Learning Sector
Validation Report - Adult Education and Lifelong Learning Sector
 
Civil engineering final internal attachement.docx
Civil engineering final internal attachement.docxCivil engineering final internal attachement.docx
Civil engineering final internal attachement.docx
 

diploma

  • 1. STUDENT DECLARATION I, Gábor Apagyi, declare that I have created this diploma without any unauthorized help, using only the specified sources (literatures, tools, etc.). Every section is obviously marked with the source, where I have used the original text verbatim or rephrased. I give permission to BME VIK to publish the basic data of this work (author(s), title, abstract in English and Hungarian, date of creation, name of consultants) in electronic form within reach of everyone, and the full text of the work through the intranet of the university. I declare that the handed in and the electronic versions are identical. Text of encrypted diplomas, with permission of the Dean, are only become accessible after 3 years. Dated: Budapest, 12/05/2015 ...……………………………………………. Gábor Apagyi
  • 2. HALLGATÓI NYILATKOZAT Alulírott Apagyi Gábor, szigorló hallgató kijelentem, hogy ezt a diplomatervet meg nem engedett segítség nélkül, saját magam készítettem, csak a megadott forrásokat (szakirodalom, eszközök stb.) használtam fel. Minden olyan részt, melyet szó szerint, vagy azonos értelemben, de átfogalmazva más forrásból átvettem, egyértelműen, a forrás megadásával megjelöltem. Hozzájárulok, hogy a jelen munkám alapadatait (szerző(k), cím, angol és magyar nyelvű tartalmi kivonat, készítés éve, konzulens(ek) neve) a BME VIK nyilvánosan hozzáférhető elektronikus formában, a munka teljes szövegét pedig az egyetem belső hálózatán keresztül (vagy hitelesített felhasználók számára) közzétegye. Kijelentem, hogy a benyújtott munka és annak elektronikus verziója megegyezik. Dékáni engedéllyel titkosított diplomatervek esetén a dolgozat szövege csak 3 év eltelte után válik hozzáférhetővé. Kelt: Budapest, 2015. 05. 12. ...……………………………………………. Apagyi Gábor
  • 3. 3 Budapest University of Technology and Economics Faculty of Electrical Engineering and Informatics Department of Automation and Applied Informatics Gábor Apagyi ANALYSIS OF EXECUTABLE GRAPH MODEL CONSULTANTS Dr. Gergely Mezei Ferenc Nasztanovics (Morgan Stanley) Loránd Szöllősi (Morgan Stanley) BUDAPEST, 2015
  • 4. 4 Table of contents Abstract............................................................................................................................ 7 Összefoglaló ..................................................................................................................... 8 1 Introduction.................................................................................................................. 9 1.1 The structure ........................................................................................................... 9 1.2 The domain ............................................................................................................. 9 1.3 Motivation............................................................................................................. 10 2 Theoretical background ............................................................................................ 11 2.1 Graph theory ......................................................................................................... 11 2.1.1 History ........................................................................................................... 11 2.1.2 Definition....................................................................................................... 12 2.1.3 Important properties of graphs....................................................................... 13 2.1.4 Algorithms ..................................................................................................... 15 2.1.5 Executable graphs.......................................................................................... 20 2.2 Distributed environments...................................................................................... 20 2.2.1 Software solutions ......................................................................................... 21 2.2.2 Advantages and drawbacks............................................................................ 
27 2.3 Scheduling ............................................................................................................ 28 2.3.1 Cost of scheduling ......................................................................................... 29 2.3.2 Type of schedulers......................................................................................... 31 2.3.3 Algorithms ..................................................................................................... 32 2.4 Compilers.............................................................................................................. 36 2.4.1 Compilation process ...................................................................................... 36 2.4.2 Compiler optimization................................................................................... 37 2.4.3 Higher level languages................................................................................... 37 2.5 Stack programming............................................................................................... 38 2.5.1 Reverse Polish Notation................................................................................. 40 2.5.2 Forth............................................................................................................... 41 3 Design and implementation....................................................................................... 43 3.1 Business problem.................................................................................................. 43 3.1.1 Morgan Stanley.............................................................................................. 43 3.1.2 Pricing............................................................................................................ 44
  • 5. 5 3.2 Modelling the problem.......................................................................................... 45 3.2.1 Graph representation...................................................................................... 45 3.3 Building up a graph............................................................................................... 47 3.3.1 Quadratic equation......................................................................................... 47 3.3.2 IProcessable interface .................................................................................... 47 3.3.3 Node types ..................................................................................................... 48 3.3.4 Edge types...................................................................................................... 49 3.3.5 Graph object................................................................................................... 49 3.3.6 Example graph............................................................................................... 50 3.4 Simple execution models...................................................................................... 51 3.4.1 Recursion based execution............................................................................. 52 3.4.2 BFS based execution...................................................................................... 53 3.4.3 Simulating stack execution............................................................................ 54 3.4.4 Disadvantages of high level modelling.......................................................... 54 3.5 Execution model ................................................................................................... 55 3.5.1 Tokens............................................................................................................ 
55 3.5.2 Execution ....................................................................................................... 56 3.6 Compiling ............................................................................................................. 56 3.6.1 Transforming IProcessable objects................................................................ 57 3.6.2 Order of compiling......................................................................................... 60 3.7 Distribution of tasks.............................................................................................. 60 3.7.1 Random scheduler.......................................................................................... 61 3.7.2 Greedy scheduler ........................................................................................... 61 3.7.3 Separation of tasks......................................................................................... 62 3.8 Improving the execution model ............................................................................ 62 3.8.1 Sync and wait operators................................................................................. 62 3.8.2 Modifications in the compiler........................................................................ 64 3.8.3 Execution logic modifications ....................................................................... 65 3.9 Visualization ......................................................................................................... 66 3.10 Measuring methods............................................................................................. 69 3.10.1 Time measurement....................................................................................... 69 3.10.2 Measuring memory usage of the execution................................................. 70 4 Testing and conclusions............................................................................................. 
71 4.1 Test environment .................................................................................................. 71
  • 6. 6 4.2 Test cases .............................................................................................................. 71 4.2.1 Quadratic equation......................................................................................... 71 4.2.2 Random graph................................................................................................ 72 4.3 Test runs................................................................................................................ 73 5 Improvement possibilities ......................................................................................... 77 Acknowledgements ....................................................................................................... 78 Bibliography.................................................................................................................. 79 Appendix........................................................................................................................ 82
  • 7. 7 Abstract Environment changes rapidly these days – technologies of today will be obsolete tomorrow, meadows become metropolises in years and even into the seas new islands can be built. This rapid lifestyle is reflected in everyday life as well. Fast-food restaurants, fast cars, speed fitness are usual words today, in every corners of our life we can feel the rapidity. In financial sector agility is more out-standing. People open and close bank accounts online in about 10 minutes or buy stocks from many different companies in a flash. To remain competitive in such environment, banks like Morgan Stanley has only one choice: be the quickest player on the market. Morgan Stanley realized this fact and puts serious effort into their research activity. Pricing of the available assets on the market is a very complex task and it is also very important to do it quickly. Calculating a meaningful prices one second before another participants of the market means clear business advantages. My thesis is intended to analyze the possibility to model pricing process using graphs and find an effective way to execute the constructed model. Taking the very first step, I started analyzing graph structures and algorithms, distributed system architectures, scheduling algorithms and possible executing models. My focus was caught by the execution models as I saw improvement possibilities in the usual implementations and innovation possibilities for example executing the graphs on FPGAs. During my work I wanted to keep the expressiveness of the original reference model but I also wanted to improve the performance. My intentions ended up in implementing an expressive reference model, an effective execution model and a compiler which transforms one model into another. In the testing phase, I measured 20% performance improvement in average in non- distributed environment compared to the original reference model. 
In distributed environment, using an advanced schedulers, the improvement compared to the original model is even more outstanding. At the end of my thesis, I listed a couple of development opportunities, which would make the performance even better and the system more user-friendly.
  • 8. 8 Összefoglaló Napjainkban a környezetünk rendkívüli ütemben változik – a mai fejlett technológiák holnapra elavulnak, évek alatt a legelők helyén városok épülnek vagy akár szigetek nőnek ki a tengerből. A gyorsuló életstílus a mindennapi életünkben is észrevehető. Gyorsétterem, gyors autó, speed fitness - gyakori szavak manapság. A bank szektorban ez a gyors ritmus még jobban megfigyelhető. Az emberek 10 perc alatt számlákat nyitnak, vagy szüntetnek meg, esetleg pillanatok alatt részvényeket vásárolnak. Ahhoz hogy ilyen környezetben egy bank, például a Morgan Stanley versenyképes maradjon egy lehetősége van: a leggyorsabbnak kell lennie a piacon. Ezt a tényt a Morgan Stanley is felismerte és komoly kutatási tevékenységbe kezdett. A termékek beárazása a piacokon nagyon komplex feladat, illetve szintén nagyon fontos a folyamat tényező a folyamat gyorsasága. Egy másodperccel hamarabb tudni egy termék árát, mint a többi szereplő a piacon komoly üzleti előnyt jelent. Dolgozatom célja megvizsgálni a lehetőséget, hogy az árazást gráfokkal modellezzük, illetve kidolgozni egy hatékony végrehajtási módot ezen modellek részére. Első lépésként megvizsgáltam a gráf adatszerkezetet és algoritmusait, elosztott rendszer architektúrákat, ütemező algoritmusokat és lehetséges végrehajtási modelleket. Ezen témák közül a végrehajtási modellekben fedeztem fel továbbfejlesztési és innovációs lehetőségeket, mint például a gráf futtatása FPGA-n. Munkám során fontosnak tartottam megőrizni az eredeti gráf reprezentáció kifejezőképességét, de növelni akartam a rendszer teljesítményét. Végül három részre bontottam az implementációt: egy nagy kifejezőképességgel bíró modellre, egy hatékony végrehajtási modellre, illetve egy fordítóra, ami a két modell közötti átjárhatóságot biztosítja. A tesztelés átlagosan 20%-os teljesítménynövekedést mutatott egy processzoros környezetben az eredeti referencia implementációhoz képest. 
Több processzoros környezetben, fejlett ütemezőt használva a teljesítménynövekedés még szembetűnőbb volt. Dolgozatom végén számos továbbfejlesztési lehetőséget felsorolok, melyek megvalósításával a rendszer teljesítménye tovább növelhető, illetve jobb felhasználói élmény érhető el.
  • 9. 9 1 Introduction This introduction chapter gives a short overview of the topic, the structure and the desired goal of this paper. 1.1 The structure The paper is arranged into chapters, which ones guides the reader from the theoretical background to the implemented solution and tests. The first chapter is the introduction, it clarifies and details the borders of my task. The second chapter provides a theoretical background and describes the principles and methods used in the paper. The design and implementation chapter gives insight to the implementation and explains some of the most important design decisions. Chapter four evaluates my solution and tests its performance. The fifth chapter proposes improvement possibilities which can improve the results. 1.2 The domain Morgan Stanley[1][2] is one of the biggest investment bank in the world. In order to do their everyday work, they use lots of different incoming data feed. If processing of these feeds is quick enough, the feeds can provide help to make decisions based on the actual situation or at least the most updated view of the market. When processing speed exceeds the limit of human limitations, speeding up the systems is not necessary anymore. Or is it? The answer is definitely: “Yes, it is.” It is essential to boost the speed of processing. When human factor is left out from the equation, whole systems can be built relying on the data streams to lift trading onto a higher level. These new systems are called automated trading platforms and recently they are becoming more and more important. As all market data is available to every participants of the market, the number of possible ways to gain advantages over the competitors is very limited. Banks can either fine tune their systems to boost up the performance or can develop new algorithms and techniques to make better decisions.
  • 10. 10 1.3 Motivation Let us assume that, we have a mathematic model of a market, where we can express connections between products and their prices. Moreover assume we have some other solution to express the same problem in another way. If we compare these models, there are two important properties to check. The first is how accurate the model is, how well it describes the real market. The second is how fast it is to obtain new prices of products when the incoming data feed - related to the particular product - changes. From Morgan Stanley’s point of view, having the most advanced solution (both in terms of accuracy and performance) means direct advantages on the market. As they realized how much they keen on innovation and new technologies, they started to invest serious effort in research projects. My work compares some possible solutions, tries to maximize the performance of one possible solution and also tries to come up with some guidelines about future possibilities.
  • 11. 11 2 Theoretical background This chapter is intended to give the reader a better understanding on the terms, laws and some thesis used. I will provide the necessary amount of information to understand my work and I will also mark the points, which can be used as a good starting point for extending my solution. 2.1 Graph theory As this paper is all about graphs, first important step is to understand what a graph is, why graphs are important, how we can leverage them in our everyday life. The following section will guide us through the important parts of graph theory related to this paper. 2.1.1 History There are number of times, when understanding a complex problem is much easier with a small illustration. “Seven Bridges of Königsberg” problem[3] is one of that type. In the 18th century, Leonhard Euler was asked by the citizens of Königsberg with the following question: “Is it possible to visit all mainland of the city, while every bridge will be crossed once, and only once?” 2.1.1 – Map of Königsberg As you can see on the map, the city is crossed by the river Pregel and the lands of the city therefore are connected by seven bridges. While Euler tried to answer this question, he basically formed graph theory. Let us take a look at his solution.
  • 12. 12 Solution of Königsberg problem First, we need to get our problem domain smaller. We do not need to see the houses of the city, nor the river, nor the bridges. All mainland parts are be represented by a simple dot and bridges by lines. Here comes the above mentioned principle about illustration. Try to redraw the map using only dots and lines. The result should be something as in figure 2.1.1.1. 2.1.1.1 – Redrawn map of Königsberg It is much easier to analyze the model without the unimportant details. As Euler observed, if we want to walk through all the bridges (edges), we have to “get in” and “get out” of a mainland (dot, node) – except the first and the last node, if we do not want to arrive back to our starting point. Let us call the number of the edges crossing a node the degree of the node. Euler’s observation implies that each node degree must be even in our map (except the degree of the first and the last node, if they are not the same). If we calculate the degree of all nodes, we will see each node has an odd degree. In real life this result means: after a while, when we get into a mainland, we will not have any unused route out of it. This determinates the answer of the original question: there is no way to take a walk in Königsberg and visit all mainland while crossing every bridges, but crossing each of them exactly once. This type of walk in a graph, where each of the edges are visited once and only once are called Eulerian path or Euler walk in his honor. The above mentioned observation about node degrees is a necessary and sufficient condition for the existence of Eulerian paths. 2.1.2 Definition At first, let us define what graph means: [4]
  • 13. 13 Graph is an ordered pair G = (V, E) comprising a set of vertices or nodes (V) and a set of edges (E). 2.1.3 Important properties of graphs 2.1.3.1 Edge direction Graphs can be grouped by the direction of its edges. An edge (E) is directed if 𝐸 = (𝑣1, 𝑣2); 𝐸1 = (𝑣2, 𝑣1); 𝐸 ≠ 𝐸1; 𝑣1, 𝑣2 ∈ 𝑉 and it is undirected if 𝐸 = (𝑣1, 𝑣2); 𝐸1 = (𝑣2, 𝑣1); 𝐸 = 𝐸1; 𝑣1, 𝑣2 ∈ 𝑉 A graph is considered as undirected graph if it does not contain any directed edges, otherwise it is referred as directed graph. Directed graphs can be created from undirected graphs by replacing every edges by two new edges which connect the same nodes but their direction is reversed. This conversion always can be done without modifying the meaning of the graph. However, conversion in the reverse direction can possibly modify the underlying meaning of the model. 2.1.3.1 – Converting undirected graphs to directed graphs and vice-versa In this paper we are going to use mostly directed graphs as they can express connections between stock prices in a natural way if we imagine a directed edge as an arrow. If an arrow (edge) points from a stock price to another stock price (nodes) it means changes in the source stock price have an impact on the destination stock price. 2.1.3.2 Cyclic/acyclic property A graph is considered cyclic if there exists a sequence of nodes, where each following nodes are connected by an edge and the end node of the sequence is equals to the beginning node of the sequence. Formally: 𝑋 = (𝑣1, 𝑣2, … , 𝑣 𝑛); 𝑣1 = 𝑣 𝑛; 𝑣𝑖 ∈ 𝑉
X is called a cycle of the graph. A graph is acyclic if it does not contain any cycles.

2.1.3.2 – Graph with a cycle on the left, without a cycle on the right

2.1.3.3 DAG property

DAG[5] stands for Directed Acyclic Graph. From the previous definitions the meaning of this property is straightforward: a DAG is a graph that is directed and does not contain any cycles. This class of graphs is very important, since many problems are considered solvable if the underlying graph is a DAG. Let us suppose we have a collection of tasks that must be ordered into a sequence (and suppose such an ordering exists), together with a collection of constraints, i.e. rules stating that one task must be completed before another is started. Constraints can be expressed by directed edges between the nodes representing the tasks: the source node of an edge must be completed before the destination node.

2.1.3.3 – Directed Acyclic Graph

As the source node of an edge must be completed earlier than its destination node, suppose we start from an arbitrarily selected node n1 and, assuming the graph can contain a cycle, we reach n1 again after a while. Let us name ni the last node from which we return to n1. This implies that n1 must be completed before ni, as we
started from n1 and reached ni. It also implies that ni must be completed before n1, as there is an edge directed from ni to n1. This is an obvious contradiction, which means the graph must not contain a cycle.

Given a set of nodes V and edges E, a topological order is a sequence v1, v2, v3 … vn of the nodes such that for every edge E = (vi, vj), i < j; that is, every edge points from an earlier element of the sequence to a later one.

It can be proved that a topological order exists if and only if the graph is a DAG. It is also a fact that the topological order is not always unique, which means there can exist several orderings for a given graph. When it comes to scheduling, the differences between the orderings give us the opportunity to optimize our solution based on several criteria (e.g. running time, cost, finishing time, etc.).

2.1.4 Algorithms

Graphs are powerful structures for representing real-life problems such as simplified stock pricing. For formalized problems it is easier to create general solutions. In this section, several algorithms are elaborated.

2.1.4.1 BFS

Breadth First Search is a method to find a particular node in a graph starting from a given point, or to prove that the node is not reachable from the given starting point. The algorithm starts by visiting the neighbors directly reachable from the starting point. Let us say the starting point is on level 0. Then level 1 contains all nodes directly reachable from the starting point, level 2 contains the nodes that can be directly reached from level 1 nodes, and so on. In general, level N contains the nodes that are reachable from level N-1 but not from any lower level. The algorithm stops when it finds the required node or when there are no more reachable, unlabeled nodes. It is possible that the algorithm does not process all the nodes: only the nodes reachable from the starting point will be processed. This gives us a tool to identify properties such as reachability from a given node, or to prove whether or not the graph is fully connected.
There are two important things to know about this labeling procedure. First, the algorithm processes all available nodes on level N – which basically means identifying
nodes on level N+1 – before processing any nodes on a higher level. Second, the algorithm only goes forward, which means it skips every node that is already labeled with a lower number.

2.1.4.1 – BFS ordering – each node on level N gets processed before moving on to level N+1; the order of nodes within a particular level is not defined

The way of determining which node to process next is not specified in the original algorithm, so it can be chosen freely. It can be fully random or can use various data to make better decisions. The simplest way is to queue the nodes: when a new node is identified for processing, it is put at the end of the queue, and the first element of the queue is processed next. From a higher perspective, level numbering can be seen as a very simple heuristic. Using problem-specific knowledge, there are wiser ways to steer the order of processing; however, a more specific heuristic can easily tempt us to reorder processing across levels, even though this leads to an algorithm that is no longer a BFS. For example, in route search on a map – which can easily be reduced to a search in a directed graph – it is wiser to use air distance rather than leveling. As mentioned, this conflicts with leveling: it is more useful to process nodes with a lower air-distance metric to the destination, even if those nodes are on higher levels. However, the air-distance heuristic can be wrong and lead the algorithm into a dead end where we are very close to the destination but cannot reach it.
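As an illustration (not taken from the cited literature), the queue-based node selection described above can be sketched in a few lines of Python; the graph is assumed to be given as an adjacency list:

```python
from collections import deque

def bfs_levels(graph, start):
    """Label every node reachable from start with its BFS level.

    graph maps each node to the list of its neighbours.
    """
    level = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()          # FIFO: finish level N before N+1
        for neighbour in graph.get(node, []):
            if neighbour not in level:  # skip already-labelled nodes
                level[neighbour] = level[node] + 1
                queue.append(neighbour)
    return level

graph = {"s": ["a", "b"], "a": ["c"], "b": ["c"], "c": []}
print(bfs_levels(graph, "s"))  # {'s': 0, 'a': 1, 'b': 1, 'c': 2}
```

Nodes absent from the returned dictionary are exactly the nodes unreachable from the starting point, which is how BFS can be used for the reachability checks mentioned above.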
As we have seen, there are trade-offs when designing or using algorithms. BFS can be easily implemented and used in various situations (detecting partitions in graphs, traversing all the nodes, leveling problems, etc.), but it can also be an ineffective mechanism, as the map example illustrated.

2.1.4.2 DFS

Depth First Search is very similar to BFS. The difference between the two algorithms lies in the order of processing. In BFS we use levels to identify the next group of nodes to process. In DFS we can also use the notion of levels, but the applied rules are different. Let us suppose we have a starting point and we know the level numbers of all the nodes (level numbers are specific to the starting point). DFS picks a node from the directly reachable nodes (level 1). However, rather than picking the next node from level 1, the algorithm picks the next node from the set of nodes directly reachable (on level 2) from the previously selected level 1 node. If there are no more reachable next-level nodes, the algorithm jumps back and selects the next node from a previous level. The exit criteria are the same as in BFS: the algorithm stops when it finds the searched node or when there are no more unprocessed nodes. It is also possible that the algorithm does not process every node, as the graph is not necessarily fully connected, so not every node is reachable from the given starting node. All the reachable descendants of a level 1 node are thus processed before the next node on level 1. The name of the algorithm comes from this fact: it searches deep inside the graph, while BFS first checks the closest nodes and slowly grows the checked area. From the implementation point of view, DFS can also be implemented easily: if we put new nodes at the beginning of the "queue" and also pick the next node from the beginning, the data structure we are really using is a stack (LIFO).
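The stack-based variant differs from the BFS sketch only in the data structure; a minimal illustration (again not from the cited literature, with a made-up example graph):

```python
def dfs_order(graph, start):
    """Visit nodes reachable from start in depth-first order.

    graph maps each node to the list of its neighbours.
    """
    stack = [start]                     # a Python list used as a LIFO stack
    seen = {start}
    visited = []
    while stack:
        node = stack.pop()              # take from the "beginning": dive deeper first
        visited.append(node)
        # reversed() so that the left-most neighbour is explored first
        for neighbour in reversed(graph.get(node, [])):
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return visited

graph = {"s": ["a", "b"], "a": ["c"], "b": [], "c": []}
print(dfs_order(graph, "s"))  # ['s', 'a', 'c', 'b']
```

Note that node "c" on level 2 is visited before "b" on level 1, exactly the behavior that distinguishes DFS from BFS.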
2.1.4.2 – DFS order with random selection. A level 3 node gets processed before the lower levels are finished.

Note that the previously mentioned modification of BFS with the air-distance heuristic silently turned into a DFS: the heuristic is used to decide on which path to go forward.

2.1.4.3 Traversal using search algorithms

There are cases when we want to apply a function to every node in a graph rather than just find one particular node. Technically, when processing a node, we can apply any function to it; for instance, we can print data from the node or accumulate its value into a global variable. The only problem is to ensure that we visit all the nodes. With a small modification, the described BFS and DFS algorithms can do this for us. Remember that in both cases the algorithm has exit criteria – namely, finding a node that fulfills the search criteria, or reaching the end of the graph. If we ensure that the search predicate never evaluates to true, or simply skip checking the search criteria, the algorithms are guaranteed not to stop before reaching the end of the graph. There is still one issue with this approach: it is not guaranteed that every node is reachable from the starting point – this is not guaranteed even in an undirected graph, since graphs can have separate partitions. The solution is to check whether we have processed all the nodes in the graph when we reach an endpoint
(for example by counting the number of processed nodes and comparing it to the total number of nodes in the graph). If the graph still contains unprocessed nodes, we randomly select one of them and restart the algorithm using the selected node as the starting point.

2.1.4.4 Topological ordering

As we saw earlier, a topological order carries important and meaningful properties such as the DAG property. The following algorithm gives a simple way to generate a topological ordering if one exists. We describe the algorithm using scheduling as an example. The initial setup is the usual one: we have a directed graph where the nodes are tasks and the directed edges express precedence between the tasks. First, the algorithm selects all the nodes that have no outgoing edges. These nodes will be processed last, since nothing depends on them. If no such node exists, then no topological order exists, because the graph contains at least one directed cycle. We then remove these nodes from the graph and repeat the previous step, and repeat again until no more nodes remain in the graph.

2.1.4.4 – Execution flow of the topological ordering algorithm

It is easy to prove that the algorithm gives a topological order of the graph, and it can be implemented efficiently. As mentioned earlier, a topological order is not unique. Given the result of this algorithm, multiple topological orders can be created if we know which nodes belong to which iteration: nodes added to the schedule in the same iteration can be shuffled, as they do not influence each other or each other's execution.
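The iteration described above can be sketched directly in Python. This is a deliberately naive illustration of the sink-removal idea, not an optimized implementation; the task names are made up:

```python
def topological_order(nodes, edges):
    """Topological ordering by repeatedly removing sink nodes.

    edges is a set of (src, dst) pairs meaning src must finish before dst.
    Returns None when a directed cycle makes ordering impossible.
    """
    remaining = set(nodes)
    remaining_edges = set(edges)
    order = []
    while remaining:
        # A node with no outgoing edge has nothing depending on it,
        # so the whole batch can safely be scheduled after everything else.
        sinks = {n for n in remaining
                 if not any(src == n for src, _ in remaining_edges)}
        if not sinks:
            return None                  # only nodes on a cycle remain
        order = sorted(sinks) + order    # batches found later come earlier
        remaining -= sinks
        remaining_edges = {(s, d) for s, d in remaining_edges
                           if s not in sinks and d not in sinks}
    return order

tasks = ["wake", "dress", "eat", "leave"]
deps = {("wake", "dress"), ("wake", "eat"),
        ("dress", "leave"), ("eat", "leave")}
print(topological_order(tasks, deps))  # ['wake', 'dress', 'eat', 'leave']
```

Sorting each batch here just makes the output deterministic; as noted above, nodes within one batch can be shuffled freely, which is exactly where the non-uniqueness of the topological order comes from.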
2.1.5 Executable graphs

Executable graphs are more than plain DAGs: they store a function inside each node and edge. In general, these functions can perform many operations, from logging to changing the node's value based on some criteria. Executing the graph means visiting all the elements of the graph and running their stored functions. Execution starts at the nodes that are marked as inputs. When all the nodes have run, we create a collection from the values of the nodes that are marked as outputs; this collection is the result of the execution. Execution raises interesting questions. Suppose we have a huge graph with multiple inputs and outputs. What happens if one input changes? Do we really want to recalculate the whole graph, or is it possible to recalculate only the affected parts? The starting point of the execution is fixed, but the execution flow can take very different shapes depending on the algorithm used. In this paper, I am searching for a proper way to optimize the execution of this type of graph. By the end of this paper, the reader will have a good understanding of the possibilities, challenges and pitfalls of the problem.

2.2 Distributed environments

Computer science has been changing at a very high pace since the first computer was turned on. According to Moore's law[6], computing capacity doubles roughly every two years, and the law has held up well over the last couple of decades. This also means a shift in the problems of computer science as well as in the development of the underlying hardware. The first computers were giant, monolithic machines that executed programs sequentially: programmers had to explicitly declare the order of the commands to be executed. In addition, back in those days the machines were exclusively reserved for the program under execution.
This may sound strange, since today users listen to music while chatting on the internet and editing a document in a text editor – probably on a device that fits into their pocket. This example frames the shift very well: in the old days a programmer knew the program would run on a dedicated machine and the interpreter would execute the commands in a well-defined order. In the era of the "internet of things" this model has changed dramatically. Multiuser systems, multitasking, parallel programming, remote execution and asynchronous execution are concepts that today's developers need
to be aware of and deeply understand in order to develop high-end, modern applications and leverage hardware capabilities.

2.2.1 Software solutions

It is usually hard to cleanly separate hardware and software solutions, as each requires support from the other side as well. From an IT perspective, the 21st century is characterized mainly by the internet, mobile and, recently, the cloud boom[7]. Hardware development continues as well, but as hardware capabilities are sufficient for most everyday usage, software engineers' focus has shifted to serving other purposes too, such as user experience, scalability and reliability. Altered, or at least extended, goals require new approaches.

2.2.1.1 Parallel programming

While in most cases a mobile device has more computing resources than the first mainframe computers had, these devices are also used to solve more complex problems. To leverage hardware capabilities, programmers need to be aware of parallel programming[8]. In the sequential model there is an exact order of the commands, defined by the developers and optimized by modern compilers. In cases where the execution of two commands has no effect on each other, a natural way to boost execution time presents itself: use the available free processors to compute distinct parts of a complex expression.

2.2.2.1.1 – Parallel execution of (1+2)*(3+4) with one and two CPUs
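The figure's example can be sketched in Python: the two sums are independent, so two workers can compute them simultaneously before the join step multiplies the partial results. This is a minimal illustration only; a thread pool is used here to keep the sketch portable, while for genuinely CPU-bound work on multiple cores a process pool would be the usual choice:

```python
from concurrent.futures import ThreadPoolExecutor
import operator

def evaluate_parallel():
    """Evaluate (1+2)*(3+4) by computing the independent sums concurrently."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        left = pool.submit(operator.add, 1, 2)    # one worker computes 1+2
        right = pool.submit(operator.add, 3, 4)   # another computes 3+4
        return left.result() * right.result()     # join: multiply partial results

print(evaluate_parallel())  # 21
```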
While multitasking does not require a paradigm change from developers, parallel programming does. Designing parallel algorithms is a very creative, intuitive and innovative process. While best practices exist, there is no universal way to design parallel algorithms or to convert sequential algorithms into parallel ones. However, in most cases developers get a huge payback in exchange for the effort spent on designing the parallel algorithm. Depending on the level of parallelization, there is a fairly simple way to calculate the profit. Given a sequential execution time T and n processors, the parallel execution time is at least T/n, so a simple theoretical upper bound on the obtainable saving is T - T/n. In practice this bound cannot be exceeded or even reached, and even with the wisest design we usually cannot get near this limit. However, we can influence the order of magnitude of the execution time. The first reason for this limitation is that parallel programming introduces synchronization and coordination overhead. The second is that the parts of a problem executed in parallel need to be joined at a certain point. This means that even if we could completely parallelize a problem, reaching the end of the execution line causes some processors to be idle while others calculate the joined results. In real-life problems only parts of an algorithm can be parallelized, so during execution, parallel and sequential parts follow each other. More alternating parts mean more synchronization overhead, which explains well why parallel programming introduces serious overhead. It is worth mentioning that by overlapping execution cycles we can counteract some of the overhead: overlapping means we start to execute the next cycle before the current cycle is finished, to exploit the idle time of the resources.
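The reasoning above about partially parallelizable algorithms mirrors what is usually stated as Amdahl's law; a short numeric sketch (the numbers are made up for illustration):

```python
def parallel_time_lower_bound(T, n):
    """Perfect n-way split: time drops from T to at best T/n,
    so the saving is bounded by T - T/n."""
    return T / n

def partially_parallel_time(T, n, parallel_fraction):
    """Execution time when only a fraction of the work parallelizes;
    the sequential remainder keeps the other processors idle."""
    sequential = (1 - parallel_fraction) * T
    parallel = parallel_fraction * T / n
    return sequential + parallel

print(parallel_time_lower_bound(100, 4))   # 25.0 -- the ideal bound
print(partially_parallel_time(100, 4, 0.8))  # 40.0 -- 80% parallelizable work
```

Even with 80% of the work parallelized across 4 processors, the example finishes in 40 time units rather than the ideal 25, which illustrates why real systems stay well above the theoretical bound (and this sketch does not even account for synchronization overhead).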
2.2.2.1.2 – Alternating between sequential and parallel execution

2.2.1.2 Threading

While parallel programming leverages multiple CPUs, threading aims to run programs in parallel on a single CPU. This concept is very similar to multitasking: multitasking is the idea, threading is the implementation.
In computer science, a thread of execution is the smallest sequence of programmed instructions that can be managed independently by a scheduler, which is typically part of the operating system.[9] Threads could be defined in other ways as well, but this definition carries much additional information. The first important fact is that a thread is not a program or source code, but the execution of them. This means threads exist only at execution time, which implies that debugging multithreaded programs is hard, since static analysis is much harder or even impossible to perform. It also means that a big execution flow can be broken into smaller parts, manually or semi-automatically. The concept of threads brings with it the need for schedulers, which we will discuss in more detail in the following chapter. As the definition above shows, schedulers and threads are integral parts of every modern operating system: threading makes it possible to execute multiple programs on a single CPU, or a single program on multiple CPUs. Going deeper into the hardware and analyzing how threading works, it becomes clear that threading can be very expensive: beyond a certain number of threads, creating new ones seriously degrades the performance of the system. This is caused mainly by context switches. When a CPU executes a program, it loads the program and its data into memory, populates the required registers, positions the PC (Program Counter) and then steps through the instructions. But what happens if, in the middle of the program, the CPU is preempted and needs to start executing another program, knowing that we want to finish the interrupted program later? The CPU has to save the current execution state, load the environment of the new program (which is probably a previously preempted program) and execute it. This is called a context switch.
It is very important to keep the number of context switches at a reasonable level. The time the CPU spends on changing between contexts is the overhead of threading. If the computer has only one CPU installed, this overhead is the cost of multitasking, and of course it means users cannot leverage 100% of the computing capabilities. Moreover, we cannot expect a boost in execution time (but we can expect one in throughput). When using multiple CPUs, the advantages of threading are easier to notice: besides the ability to multitask, we can also gain a serious performance boost.
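A minimal threading sketch (the task names and timings are invented; in CPython, threads of one process interleave on a single core for CPU-bound work, so this pattern pays off mainly for I/O-bound tasks such as the simulated downloads below):

```python
import threading
import time

def download(name, seconds, results):
    """Simulated I/O-bound work; while one thread sleeps, the other runs."""
    time.sleep(seconds)
    results[name] = name + " done"

results = {}
threads = [threading.Thread(target=download, args=(n, 0.1, results))
           for n in ("report", "chart")]
for t in threads:
    t.start()           # both tasks now run concurrently
for t in threads:
    t.join()            # wait for both before reading the results
print(sorted(results))  # ['chart', 'report']
```

Each `start()`/`join()` pair hides exactly the bookkeeping (saving and restoring execution state) that the context-switch discussion above describes.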
2.2.1.3 Remote execution

Mobile devices and personal computers can carry huge computing and storage capabilities, but there are cases when it makes sense to separate concerns: to store sensitive data in replicated and safe data centers, to execute complex calculations in the cloud, and to interact with the user through personal devices. In this approach, the actual footprint of the system is determined only at runtime and can be changed dynamically. It also requires communication between the parts of the system. Vigilant readers may spot that this model is a scaled-up version of an everyday computer: these concerns are also separated in personal computers, only the distance between the parts is much smaller. If a user in the EU wants to access a file stored in the US, a request must be sent to the data warehouse and the data must travel at least four to five thousand kilometers. One challenge is to overcome this issue and provide a solution competitive with storing the data locally. Nowadays internet access is fast enough to leverage the advantages of the described scenario and hide the latency. There are serious advantages as well: data warehouses provide 24/7 support, insurance, replication, competent professionals, high standards and almost unlimited storage capacity that can easily be extended based on user needs. Speaking of clouds, the advantages are probably not clear to everybody at first sight. People usually run multiple programs at the same time. If we consider not just everyday users but also professionals, creative people such as 3D modelers and image editors run complex algorithms that can take days even on a high-end computer; or, to stick with the topic, building and compiling complex computer programs can take minutes, hours or even days. For these scenarios, cloud computing is a viable solution. But what is the cloud?
By definition, the cloud is a pool of resources that can be dynamically allocated to users based on their needs. This means that if developers want to build and compile a complex program, they simply allocate the resources they need and run their task. Everybody pays for the resources they allocate. In 3D modelling, rendering an image can take days on limited resources; in the cloud, rendering time can be influenced by the amount of allocated resources. If you need to demonstrate the current state of your scene quickly, you can allocate ten times more resources than usual and demonstrate your results within hours. You also have to pay more, but only for the time period in which you use the resources. This means a reduction in costs and, at the same time, a serious improvement in productivity.
Is it worth investing a serious amount of energy, time and money to make remote execution possible? Fortunately, these technologies are designed to be transparent to the user and, as much as possible, to the developers as well. Remote execution requires complex underlying systems, but companies like Google[10], Microsoft[11], Amazon[12], etc. offer these systems out of the box. This does not mean that developers do not have to modify their code to leverage these technologies, but it does mean they do not have to be aware of how these functionalities are implemented. Usually they get an API and build their systems against it, while behind the scenes remote calls, remote storage and remote calculations are used.

2.2.2.3 – Remote procedure calling architecture

The idea is to abstract calls away and replace the actual implementation with an interface that has the same publicly available methods, properties, etc. Developers see the same methods and properties, and they call the methods and use the properties the same way as they usually do. But behind the scenes, the actual implementation of the interface simply packs the request into a package and sends it to an execution server, which can be in the cloud. The execution server instantiates the original implementation and calls the method with the parameters of the original call. Once the execution is finished, the result travels back to the client-side stub and the original calling object sees the result. The beauty of this idea is that neither the calling object nor the serving object needs to know that the call is made remotely.
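The pack-send-unpack round trip can be sketched with an in-process stand-in for the network. All class and method names below are hypothetical, and a direct method call plays the role of the transport, so only the marshalling idea is illustrated:

```python
import json

class Calculator:
    """The real implementation, conceptually living on the server."""
    def add(self, a, b):
        return a + b

class Server:
    """Stands in for the remote execution server."""
    def __init__(self, implementation):
        self._impl = implementation

    def handle(self, packet):
        request = json.loads(packet)                 # unpack the call
        method = getattr(self._impl, request["method"])
        return json.dumps(method(*request["args"]))  # pack the result

class CalculatorProxy:
    """Client-side stub exposing the same interface as Calculator."""
    def __init__(self, server):
        self._server = server

    def add(self, a, b):
        packet = json.dumps({"method": "add", "args": [a, b]})
        return json.loads(self._server.handle(packet))  # "network" round trip

proxy = CalculatorProxy(Server(Calculator()))
print(proxy.add(2, 3))  # 5 -- the caller never sees the marshalling
```

Replacing `Server.handle` with an actual network call would not change a single line of the caller's code, which is exactly the transparency described above.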
Moreover, if the design of the original code satisfies certain criteria, support for remote procedure calls or remote storage can even be injected automatically.

2.2.1.4 Asynchronous calls

Imagine a scenario in which you have only one CPU and you build a system that leverages remote procedure calls and remote storage, and you also want to support multitasking via threading. Say you want to calculate and show the aggregated risk for your whole business. This requires loading data from your remote storage, executing some complex calculations on it and populating a fancy chart with the results. You have run the business for 5 years now, you have 10 gigabytes of historical data and the calculation takes 10 minutes in the cloud. During the calculation, you want to switch to the client communication module inside your application to reply to some user feedback. So you kick off the calculation process and try to switch to another module, but the application is frozen. This is an unfortunately very common scenario. Developers tend to forget that remote execution does not solve every problem. In this case, you started a process that takes minutes to complete due to the remote storage and complex calculations, and the developers of your application were not aware of threading and asynchronous methodology. When you click the button that starts your calculations, the caller is blocked until the results arrive – in this case, the user is blocked for at least 10 minutes. This is called a synchronous method call. Threading can be a solution: when you start the process, the application creates a new thread and executes the call on that thread. This means that the main thread, which in almost every case is the UI thread, is not blocked. An alternative solution is the asynchronous call[13], which basically does the same but abstracts the details away. Programming languages usually contain keywords to mark a call as asynchronous.
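In Python these keywords are `async` and `await`; a minimal sketch of the risk-report scenario, with a short sleep standing in for the 10-minute remote calculation (the function names are invented):

```python
import asyncio

async def load_and_aggregate():
    """Stands in for the long-running remote risk calculation."""
    await asyncio.sleep(0.05)        # minutes of remote work, in miniature
    return "aggregated risk ready"

async def answer_feedback():
    """The work the user wants to do while waiting."""
    return "feedback answered"

async def main():
    # Kick off the long call without blocking the (conceptual) UI.
    calculation = asyncio.create_task(load_and_aggregate())
    print(await answer_feedback())   # runs immediately; the app stays responsive
    print(await calculation)         # collect the result once it is ready

asyncio.run(main())
```

Unlike the manual threading solution, the thread or event-loop bookkeeping is hidden behind the keywords, which is exactly the abstraction the text describes.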
Behind the scenes, the compiler replaces the keyword with a wrapper around the call, initializes and starts a new thread and executes the method. Nowadays it is a very important requirement for modern applications to be smooth and reactive: if an application hangs for more than one second, users think something went wrong. The asynchronous notion is very similar to parallel programming, as your program does something in parallel while the results are becoming available from another
function call. Thinking about remote execution and clouds, it absolutely makes sense to use your local resources for other purposes while remote execution runs.

2.2.2 Advantages and drawbacks

As we saw above, it is worth the effort to invest in distributed environments. In this section, I summarize the advantages and the drawbacks of distributed systems and try to highlight when to use these techniques. Nowadays, "multiuser system" does not mean the same as in the past: a single PC is a multiuser system in the sense of the old definition. Today, multiuser systems are huge, globally or at least widely available systems serving thousands of users. When building such systems, we cannot avoid distributing concerns. Scalability and reliability usually come with a wisely designed distributed system. If more users are interested, the support team adds a new server to serve the higher number of requests. This model is also more reliable: if one server goes down, others can step in and serve the broken server's requests, ensuring that the support team has enough time to replace the broken one without the users noticing that it was down. Distributed systems can naturally boost the performance of several algorithms. Although there are limitations, the advantages are clear: with some investment in a distributed algorithm, the order of magnitude of the execution time can be seriously improved. Distributed systems can provide a more fluent and transparent experience. It is also more rational to let professionals take care of our data storage, so we can focus on developing our business logic while cloud professionals take care of the execution background of our solution. On the other hand, using these technologies can overcomplicate our source code, our algorithms and our systems. It is a common mistake to fine-tune for parallel execution an algorithm that will be executed only a few times.
Developers need to learn how to leverage the advantages while minimizing the drawbacks. For example, it is worth fine-tuning an algorithm that is called every five seconds during the application's lifetime and currently takes 4 seconds to complete. But it is not worth doing so for an initialization part of the software that runs only once a day, even if it takes ten minutes to execute – or, if it is really important to fire up the application quicker, we can write a script that starts the application before the user gets into the office.
There are cases when parallelization seems really necessary, developers put serious effort into the design, and at the end of the day it turns out that the solution is wrong or the problem cannot be parallelized efficiently – or that doing so takes more time, resources or money than the actual profit. It is also worth mentioning that implementation, testing and debugging are much more complicated than in the case of a single, standalone, non-distributed application. Moreover, maintaining distributed systems requires professionals who really understand how the system works, requires communication between vendors to solve problems, and so on. These factors strongly influence the costs of a distributed system. As a rule of thumb, we should think before deciding to build a distributed system: developers should mindfully analyze the requirements, think about the edge cases, the advantages and the drawbacks, and decide wisely.

2.3 Scheduling

Scheduling is the process of defining the order of tasks[14]. People use scheduling all the time: recipes define the order in which you put the ingredients into the meal, a bus schedule defines the time when a particular bus arrives at the station, etc. In general, scheduling governs access to a set of resources by setting up rules. Before the dawn of parallel programming, programmers defined the order of execution implicitly in their source code: if a command precedes another one in the source, its execution also precedes the other's. The main goal of parallel programming is to allow unrelated code blocks to execute simultaneously. To achieve this, programmers have to explicitly design the code and mark the parallel parts of the system. This model is useful, but it leaves the responsibility of deciding when to parallelize code in the hands of developers, which means freedom but can also lead to errors or missed opportunities.
It would be good to take over the responsibility of marking parallel blocks automatically, and only let developers mark the blocks where they forbid parallel execution. Scheduling can provide this functionality from one aspect. A good order of tasks can seriously improve the execution time, while a bad one can seriously degrade performance: a badly designed schedule has a huge impact on the throughput of the system.
2.3 – Various execution times for different schedules

Figure 2.3 illustrates the case when two processing units are working on a set of tasks. The longest task is executed first, which stalls the other processor, because the second unit depends on a task that has not yet been calculated on the first one. Switching the order of tasks on the first processing unit dramatically lowers the finishing time of the job.

2.3.1 Cost of scheduling

As we saw in the previous example, scheduling can have a huge impact on performance. It also matters how we define the actual schedule, how we measure how good the actual order of tasks is, and how quickly we can generate an acceptable schedule. The advantages of scheduling strongly depend on these factors.

2.3.1.1 Defining the schedule

Schedulers are usually functions that answer the very basic question asked by each processor: "Which task shall I run next?". There are a number of possible ways to come up with an answer; in the Algorithms part of this section, I describe some of the available algorithms. For example, the simplest algorithm is the random scheduler: it selects a task randomly from the available ones and does not consider any optimization. Or the greedy scheduler, which
has one dedicated metric to optimize and greedily selects the best available task in order to optimize performance. Many more advanced algorithms exist, like the genetic algorithm, which tries to find the best solution using evolution.

2.3.1.2 Measuring the goodness of the schedule

The goodness of a schedule is very subjective. While in a typical backend system a good schedule can take hours to complete, in a frontend system we usually consider an application good if the response time is under one second. But time is only one aspect: others may focus much more on energy consumption, disk space consumption or the cost of the execution. We can compare the performance (based on any measurable criterion) of the old and the new implementation and calculate how well the new implementation performs compared to the previous one, but that does not really say anything about the best solution. Let us say we use overall execution time as the metric of goodness. If we use N processors, the theoretical minimum is T/N, where T is the sum of the execution times of all the subtasks. It can be proved that this minimum is not always reachable, but it acts well as a lower bound for the goodness.

2.3.1.3 Generating the schedule

Another aspect is the creation of the schedule. Even if it is possible to create the best schedule for a given problem, it is not at all clear that we want to do so. A random scheduler can be implemented easily and can quickly select the next task to run; in contrast, a genetic algorithm requires much more time to create a schedule. The complexity of the generator algorithm highly depends on the actual problem we try to solve. If we have a graph with 5 nodes and 6 edges, it is easy to implement a method that checks all available combinations and chooses the best one. If we have a bigger graph, say 100 nodes and 500 edges, the complexity of the tester and the required running time explode.
A rough estimate for the first case is 5! = 120 possible orderings, while the second case has 100! ≈ 9.33 ∗ 10^157. This is far too many to enumerate. Schedulers therefore usually search for approximate, sub-optimal results that lie within defined bounds of the optimal solution. We can limit the time given to the scheduler and hope that the result is good enough.
2.3.2 Types of schedulers

Schedulers can be grouped by a lot of properties.[15] In this section, I describe one possible grouping, which focuses mainly on the role of the scheduler throughout the lifetime of the program.

2.3.2.1 Online schedulers

Online schedulers actively shape the execution flow during the lifetime of the program. Their role is to monitor the status of the system, update their metrics accordingly, and make scheduling decisions based on the latest available state. Online schedulers do not have a predefined order of tasks, and they can adapt to new situations that occur during execution. They usually try to make decisions as good as possible, but their goal is not to reach the optimum; it is to be as flexible as possible. Online schedulers are mainly used in interactive systems, and where the scheduling algorithm allows, and needs, to be applied multiple times. The algorithms usually aim to be simple: round robin, time slicing, etc. This type of scheduler is also the right choice when the system is modified during execution. Operating systems are a good place for online schedulers: computers have multiple CPUs with multiple cores, many processes try to access the resources, and the scheduler typically runs each time a new request comes in, or periodically, to schedule the running tasks. Good schedulers avoid starvation of processes, provide fairness, and support prioritization.

2.3.2.2 Offline schedulers

Offline schedulers usually run only once in the lifetime of the program, or once per execution period. They are mainly used when the problem is complex, the domain of the problem does not change during the execution period, and the scheduling algorithm requires a serious amount of time or resources to complete. They are also a good choice when the execution is periodic.
Compared to their online counterparts, they are less flexible, as they cannot react to changes in the environment during their execution period. On the other hand, they usually aim for better decisions than online schedulers: they analyze the model of the problem in advance and create a (possibly sub-optimal) schedule.
An example of their usage is a car factory. In that case, the scheduler is the production engineer, who sets up the layout of the factory and defines the subtasks of building a new car. In computer science, an example is the field of executable graphs. These graphs are usually used more than once, as their creation is very resource-intensive, which means they do not really change between executions. The scheduler can create a schedule for the graph once, and the executor can use this information at execution time.

2.3.3 Algorithms

Algorithms used in schedulers are specialized for a given problem, but they all follow some common schemes. The following algorithms can be found in several products on the market. Note that each algorithm takes a list of tasks and determines an order for the execution; thus we can assume all of them are offline schedulers. For simplicity, each algorithm description uses overall execution time as the key property to optimize and works on a DAG. Let us suppose we have 𝑁 processing units.

2.3.3.1 Random scheduler

The random scheduler is the simplest to implement. It takes the input nodes of the graph and marks them as available to process, as they do not have any dependencies. It stores the marked nodes in a queue (FIFO container). The algorithm simulates one run of the graph and creates a mapping between the nodes and the processing units. At the beginning of the algorithm, each processing unit is available. The algorithm takes the first available node from the queue and randomly selects a processing unit for it. The scheduler then checks the nodes that depend on the scheduled one: if one of them becomes available because of this step, it is pushed into the queue; otherwise the scheduler notes that one more dependency of that node is ready. The next step is to schedule the next available node from the queue. The algorithm stops when all the nodes are scheduled.
This algorithm is not very clever, as random selection is not optimal: the worst-case scenario is to schedule all the nodes to the same processing unit and leave the others idle. It does not consider possible parallelization opportunities either. One improvement to support
parallelization is to select the next node from the available queue randomly as well, which creates the possibility of executing unrelated parts of the graph simultaneously.

2.3.3.2 Greedy scheduler

While the random scheduler uses the simulation only to visit all the nodes, the greedy scheduler tries to leverage the simulation of the execution more.[16] It also maintains a list of the available nodes and adds the input nodes to the available queue in its initialization phase. The main difference is in finding the next node to schedule. Let us define a variable 𝑇, which keeps track of the simulated time of the schedule. At the beginning of the simulation, the scheduler selects 𝑁 nodes and distributes them among the processing units. This distribution can be random or based on several criteria; some implementations distribute based on the required execution time (shortest job first, longest job first) or on how long the node has been waiting to execute. The selection of the processing unit is no longer random. At any given point in time, only 𝑁 nodes can run, as we have a limited number of processing units. If we have unscheduled but available nodes and no available processing unit, we increment the variable 𝑇. The simulation also has to know, or at least have a meaningful estimate of, how long the execution of a particular node takes. Based on this information and the starting time of a node, the simulator can calculate when a processor becomes free again. This means that incrementing 𝑇 for a while eventually makes a processor available again, so we can schedule the remaining nodes. The iteration continues until all the nodes are scheduled. This method is better than the random scheduler, as it tries to balance the usage of processing resources. It makes parallel execution possible in the sense that parallel graph parts can be executed simultaneously, and it can be implemented efficiently.
2.3.3.3 Critical path scheduler

This type of scheduler originates from the notion of the critical path.[17] A prerequisite is to know the execution time of each node in the graph. If we look at a schedule, we can usually reorder some tasks without modifying the finishing time of the schedule, but there are tasks that delay the execution if they are executed later than expected. The critical path is the list of tasks that cause a delay in
the schedule if started later than expected. The scheduler focuses on the critical path and ensures that a processor is available whenever a new critical element becomes ready. To identify the critical path, we calculate some metrics for each node. Let the EST (Earliest Starting Time) of the input nodes be zero. Every node after the input ones should be started as soon as it becomes ready, so its EST is the maximum of the EETs (Earliest Ending Times) of its dependency nodes. The EET is calculated from the EST and the execution time of the node. Another important metric is the LST (Latest Starting Time). Let the LST of the output nodes equal their EST. The LST of a node is calculated from the LSTs of its successor nodes, by subtracting the node's own execution time from each successor's LST and taking the minimum of the resulting values. The MDT (Maximum Delay Time) is calculated from the EST and the LST: 𝑀𝐷𝑇 = 𝐿𝑆𝑇 − 𝐸𝑆𝑇. If the MDT of a node is zero, any delay in its start results in a delay of the whole schedule. This class of schedulers tries to satisfy the starting time requirements, but if we restrict the number of processing units, we may have to introduce some delay to ensure the correct order of execution. This scheduler gives very good results and allows us to leverage parallel execution. Moreover, it helps to calculate how many processing units are needed to solve the problem effectively, and it can be visualized easily using Gantt diagrams.

2.3.3.4 Genetic algorithm scheduler

The genetic algorithm scheduler takes a very different approach to creating a schedule: it models evolution.[18] Genetic algorithms use genetics, reproduction and natural selection to converge to an optimal solution. The previous algorithms tried to create the solution in one step.
The genetic algorithm instead creates an initial set of solutions, usually randomly, called the initial population, and improves them iteratively. In our case, a solution consists of a mapping between the nodes and the processing units, and can also include an order of the nodes. One single solution is called a genome. The goal of the algorithm is to create the best genome. How can we measure the goodness of a genome? In genetic algorithm terms, the goodness is the fitness of the genome. Just as the stronger beats the weaker in
evolution, the fitter a genome is, the more likely it survives into the next generation. In our example, we can compare genomes by comparing the required execution time of the encoded schedules: a lower execution time means a higher fitness value. Once we have the initial population, genomes can start reproducing. This process is called crossover: two genomes are paired and combined into a new genome. The offspring contains a random proportion of the information stored in its parents; we select one random crossover point, where we split the parent genomes. In practice, if we store the mapping in an array, crossover means we take the first part of the array from the first parent and the second part from the second parent, and create a new array from them. If we apply this method to the array storing the order of the nodes, we have to be very careful to ensure that the resulting order stays consistent with our constraints. After the offspring is born, the algorithm applies some random mutation. This is an important step, as it helps the algorithm overcome local maxima/minima and adds more variation, which provides the chance to evolve. A mutation can improve, but also degrade, the fitness of a genome. The reproduction cycle continues until the population reaches a predefined size. Once that happens, natural selection is performed among the genomes: fitter genomes make it to the next generation, while less fit genomes drop out of the population. The selection process has to ensure that a predefined number of genomes survives the natural selection. The new generation then starts the described cycle over: it reproduces for a while, and then natural selection happens again. Every generation is going to be at least as fit as the previous one. The algorithm stops after a predefined number of generations, a predefined time limit, or when a predefined acceptable fitness is reached.
Compared to the previous algorithms, the genetic algorithm is the most advanced one, but it also requires the most resources and takes the longest time to complete. It is very flexible: developers have a lot of ways to influence the algorithm. One can modify the randomness of the crossover or the pairing process, the randomness of the mutation and its magnitude, the maximum size of the population, the threshold in the selection algorithm, and the number of generations. It has pitfalls as well: for example, the population can get stuck in a local optimum, and creating the schedule has considerable resource and time requirements.
2.4 Compilers

When a computer is about to execute a program, it needs information it can understand. The set of instructions a CPU understands is called the instruction set of the CPU. A program is executable on a given CPU if there exists a mapping between the language of the program and the instruction set of the CPU. Compilers act as a bridge between the source and the destination set of symbols.

2.4.1 Compilation process

During the compilation process, the source code is transformed into byte code executable by the CPU. This flow consists of several components. The following picture illustrates a possible process for C++ code compilation.[19][20]

2.4.1 – C++ compilation process

Developers need some tricks to make the development and maintenance process easier. They can write comments into the source code, or they can split the source code into smaller parts, for example for clarity. Usually the preprocessor is in charge of supporting these features: it removes comments, reassembles divided parts by copying them together, and removes unnecessary clutter from the code that is important only for the developers. The preprocessor can also take care of specific language features that are only syntactic sugar. The output is clean, compile-ready code. The compiler transforms the input into the desired output language; in our case, the compiler takes C++ code and compiles it into Assembly language. It is important to notice that the process needs to know some parameters of the final system that will run the program. The main advantage of using higher level languages is that by introducing one more abstraction level, we define a general mapping between higher level
constructs and lower level implementations. The program can focus on its own purpose, while compiler vendors take care of the compilation process for different platforms. At the end of the process, the assembler and the linker take the compiled files and create an executable binary by setting up the memory layout of the program, linking other libraries, and transforming everything into binary code, which is now platform dependent.

2.4.2 Compiler optimization

During compilation, modern compilers apply various optimizations. There is a huge number of available optimization options for the most widely used free C++ compiler.[21] Let us take a look at some examples to see the choices a developer has when tuning the compilation process or its result.
 -fno-inline – tells the compiler not to expand any functions inline
 -fno-function-cse – makes each instruction that calls a constant function contain the function's address explicitly
 -fsplit-wide-types – if a type occupies more registers, split them apart and handle them independently
 -fdevirtualize – try to convert calls to virtual functions to direct calls
It is good to be aware of the actual optimization mechanisms, as they can cause serious improvements as well as serious problems. For example, compilers usually eliminate variables that are never assigned in the code. However, in hardware programming, if the value of the variable can also be set from the hardware side, eliminating the variable can cause unexpected behavior. To turn this feature off locally, we can mark the variable as volatile.

2.4.3 Higher level languages

At the dawn of programming language evolution, creators of new languages usually wanted to keep the possibility to reach the hardware easily if needed; C and C++ are very good examples of this. Later, as recurring tasks were discovered, people started to create programming languages that fitted their needs better.
Just as Assembly is expressive for hardware-level programming, functional programming languages like
Scala[22] or F#[23] are more expressive in their own fields. We can call these higher level or domain specific languages[24].

Domain specific languages have multiple advantages. The most important one is how natural a program can be in a well-designed domain specific language. Going back to the previous example: in an Assembly program you mostly use registers and basic mathematical operators, while in C++ you call a predefined method of a virtual console. Behind the scenes, with a good compiler there is a serious chance that both programs compile to similar byte code. No doubt, we could write thousands of compilers, one for every programming language, compiling each of them to bytecode; but in practice, compilers of domain specific languages usually compile to the most similar language that already has an existing compiler, rather than reinventing the wheel. Another strong argument for using DSLs is the following: developers usually create software based on someone's requirements. A trader at a big bank presumably will not be able to describe problems in an Assembly-like environment. They speak about bonds, trades, yields and other financial terms, not registers, jumps and functions. If developers had a set of tools in which they could express their users' needs and then compile them to binary executables, the development process would be more effective and less expensive. There are many promising results in the field of very high level DSLs, but we are far from being able to solve every problem this way.[25] While development time and cost can be dramatically reduced by using DSLs, the effort put into developing the DSL itself is huge. At least one member of the developer team has to be a domain specialist, in order to identify the required features to support. The set of features can be huge, and we have to filter wisely what to support. It is also hard to decide how generic the new DSL should be.
If it is too generic, it may not be as useful as it could be. If it is too specific, we restrict ourselves to a very small set of problems that can be solved with the DSL. Also, if the DSL is going to be used only once, it is most likely not worth the effort to develop.

2.5 Stack programming

Standard programming languages abstract away the details of the underlying infrastructure and execution; however, many systems use a stack-based execution model under the hood. They operate upon one or more stacks.
Consider a recursive function call. When the function starts to execute, it sets up an environment for itself, for example by allocating variables. This environment is the context of the function. When the first recursive call is made, the function executes again and sets up its own context. Under the hood, the context of the outer function is pushed onto a stack; otherwise, after the recursive call returned, the variables would be overwritten. If recursion happens on multiple levels, we just push more and more contexts onto the stack. When a recursive call returns, we pop the latest element from the stack and get back the previous execution context. This example is specialized to recursive calls, but it is easy to see that every function call works the same way.

The stack data structure supports two basic operations. The first is push, which puts an element on top of the stack. The second is pop, which removes the top-most element of the stack. In stack programming, these operators can be considered the assembly-level bricks. As we saw earlier, it can be worth the effort to add higher level constructs to the language, which help developers be more productive. Stack effect diagrams help to define these extensions, as they describe how an operator changes the state of the structure. Stack effect diagrams (SED)[28] are usually given in the following standardized form: opName ( before -- after ). Let us examine the pop method: before the operation the stack contains 𝑁 elements, after the operation it has only 𝑁 − 1. The SED of the pop operation is: pop ( a -- ). Some widely used and known stack operators are introduced in the following table.

Op. name  Stack Effect Diagram
dup       ( a -- a a )
drop      ( a -- )
swap      ( a b -- b a )
over      ( a b -- a b a )
rot       ( a b c -- b c a )

Stack operators
Other stack operators can be created easily. Let us create a multiply operator, which takes the top two elements, multiplies them, and pushes the result back onto the stack. The SED for this operation is: mul ( a b -- c ).

2.5.1 Reverse Polish Notation

Reverse Polish Notation (RPN)[29] switches the order of the arguments and the operators in an expression. As it will turn out soon, RPN is essential for leveraging stack programming when solving mathematical problems. Let us assume we want to calculate the following expression: (2 + 3) ∗ (4 + 5). Humans usually solve the two additions first, then multiply the results. For a computer, the given expression is a little difficult to solve: when the interpreter sees the + sign, the second operand is not yet available. In this simple case it would be possible to implement the addition operator to work around this issue, but the general solution is RPN. Let us rewrite the expression in RPN: 2 3 + 4 5 + * . This way, when the execution gets to an addition, the required data for the operator is already in place.

2.5.3.1 – Stack based execution of (2+3)*(4+5)

RPN has a direct sibling in graph theory. Traversal of a graph, if it is a binary tree, has three simple variations. Starting from the root of the tree and applying one of the following rules recursively, it is guaranteed to visit all the nodes.
1. First visit the node itself, then the left child, then the right child
2. First visit the left child, then the right child, then the node itself
3. First visit the left child, then the node itself, then the right child
Consider the following graph, which describes the expression above.
2.5.3.2 – Graph representation of (2+3)*(4+5)

Applying the third rule and logging the content of each node when visiting it results in the original expression. The second rule produces the RPN form of the expression, while the first one produces the Polish Notation form. This mapping will be very useful when modelling mathematical problems in this paper.

2.5.2 Forth

Forth[30] is an imperative, stack-based computer programming language and programming environment. It supports structured programming, reflection, concatenative programming and extensibility. The environment can be used to execute stored programs as well as an interactive shell. Forth is not as popular as other programming languages or environments, but it is still used in some operating systems and space applications. Forth operates with words (subroutines). Implementations usually have two stacks: the first, named the data stack, stores local variables and parameters; the second is the function-call stack, called the linkage or return stack. Let us take a look at a self-defined word in Forth:

: FLOOR5 ( n -- n' ) DUP 6 < IF DROP 5 ELSE 1 - THEN ;

Here we define a new word (FLOOR5). As its stack effect diagram shows, it manipulates the top element of the stack by replacing it with something else. The remaining part of the expression is the body of the word. The DUP word duplicates the top element, then 6 pushes a new element (6) onto the stack. < compares the top two elements of the stack and replaces them with a true or false value. IF and THEN form the usual conditional branch statement. In the true branch, the word replaces the original element with the value 5; in the false branch, it pushes the value 1 onto the stack and subtracts it from the original value. Execution of a Forth program is as simple as possible. The interpreter reads a line from the user input and tries to parse it. When the interpreter finds a word, it looks it up
in the dictionary for the associated code and executes it. If the word cannot be found in the dictionary, the interpreter assumes it is a number and tries to push it onto the stack. If both attempts fail, execution of the code is aborted. When defining a new word, Forth compiles the word and makes its name findable in the dictionary. Stack based programming languages, especially Forth, inspired me when designing the execution model used to run an executable graph.
3 Design and implementation

In this chapter I define and analyze the domain problem of my thesis, and design and implement a solution for it. I explain the solution step by step, starting with the problem definition, and provide usage and implementation details for each step of the process.

3.1 Business problem

The financial sector lives a high-pace lifestyle. Markets and regulations change rapidly, and the financial crisis only amplified this effect. Regulatory presence is common, and supervisions happen every other day, especially since the financial crisis.[31] In such an environment, performance, reliability and maintainability are essential in every system developed.

3.1.1 Morgan Stanley

Morgan Stanley[1] is one of the biggest financial services corporations in the world. It operates in more than 40 countries, with more than 1300 offices and 60000 employees. The history of Morgan Stanley[2] dates back to 1935, when some J.P. Morgan employees, namely Henry S. Morgan and Harold Stanley, decided to start a new firm. Morgan Stanley splits its business into the following categories:
 Wealth Management
 Investment Banking & Capital Markets
 Sales & Trading
 Research
 Investment Management

3.1.1 – Morgan Stanley

Although it is a financial company, it has an outstanding IT department which supports everyday operation. Even if the systems inside the company cannot harm humans the way malfunctioning airplane software can, the IT department has to consider other important factors. For example, a bug in a piece of software on the trading floor which
prevents the trader from doing business causes a profit loss to the firm. The actual value of the loss depends on various factors, but in general we are speaking about millions of dollars.

3.1.1.1 Team work

It is no secret that Morgan Stanley seeks the best talents in every area where they operate. They actively participate in education, announcing internship programs and fresh graduate programs. In Hungary, for example, their office mainly acts as a back office: they do IT development and have some accounting related tasks there, but in the last couple of years they extended the number of their employees from 100 to over 1000. Knowing the size of the firm, the complexity of their systems and the fact that they are committed to team-working, they also try to prepare future candidates to work as a team. As this paper is supported and inspired by the firm, during my studies and work I also participated in team-work. In the beginning, our main method was to discuss the basic ideas, problems and principles on site with our consultants and implement the solutions separately. During the first semester, we identified a lot of interesting topics and distributed them among us. After this point, consultations were similar to scrum meetings, where we presented our progress, ideas and problems, and got directions if we were stuck.

3.1.2 Pricing

In the financial world, pricing is one of the most important notions.[32] As the proverb says: "Buy cheap, sell high". The process of defining the value of cheap and high is the pricing process. We can price anything: stocks, bonds, options, gold, silver, cars, flats, vacations. Markets even price the possible default of countries or changes in laws. It also makes sense to create prices for different scenarios: if something happens, the price is 𝑋, otherwise 𝑌. Accuracy of pricing is essential. The yield of a trade is usually the bid-ask spread, which is the difference between the bid (buy) price and the ask (sell) price.
When it comes to trading, another important aspect is the speed of pricing. As the seller's or buyer's intent is visible to every market participant, a quicker response to a request improves the odds of making a good deal. The speed of pricing becomes even more important in automatic trading, where computers trade against each other.
Pricing is also a technical challenge. There is a huge number of available assets to sell or buy on the market, and most of them are correlated with other assets. The data is live, which means it ticks every second or even more frequently. The problem I try to solve in this paper is to model the connections between the assets and to develop an efficient and effective pricing implementation.

3.2 Modelling the problem

The problem to solve is defined in plain English. To be able to analyze and solve it, I have to model the problem using computer science terms. Pricing algorithms are usually considered business secrets, so it is almost impossible to find any information about how big companies implemented their own versions. Even Morgan Stanley could not provide me anything about their implementation. Probably every big company uses similar algorithms tailored to their own directives. In such a competitive environment this conduct is absolutely understandable, but it also makes it harder to position my results.

3.2.1 Graph representation

When representing networks, dependencies and connections, a graph is the obvious choice. As we want to model the pricing of assets and the connections between them, it is easy to identify a mapping: the nodes of the graph will be the assets we want to price, and the edges will represent the connections. The orientation of an edge carries information in its own right, therefore the edges are directed.

3.2.1.1 Challenges and constraints

Connections between assets can be really complex. Rather than allowing expressions to be complex, I have decided to force users to build up complex expressions from simple bricks, like addition, multiplication and some other basic mathematical operations. On one hand, this makes the input graph bigger and harder to create; on the other hand, the user can reuse partial results, preventing the recalculation of some common values. As mentioned before, pricing can get extremely complex.
For example, it is quite usual that one asset influences another asset, while that asset also influences the first one, directly or indirectly. In this case, we have a cycle in the graph, which rules out a number of usable methods. To sort this problem out, in the beginning we are going to allow only
DAGs as input. Later on, the solution can be extended to support cycles as well, since techniques exist to slice a graph containing cycles into subgraphs which are DAGs, and later connect them again without corrupting the results.

3.2.1.2 Implementation details

To represent a graph on a computer, we can store the list of the edges, store the adjacency matrix of the nodes, or use other storage mechanisms. In the first case, we store a list for each node, in which we store every edge incident to the given node. In the second case, we store an 𝑁 ∗ 𝑁 matrix, where 𝑁 is the number of nodes; every element of the matrix represents one possible edge between two nodes. Both approaches have their own advantages, but in our case storing the adjacency matrix is not efficient. Our domain problem implies that the graph is a DAG: if we use an adjacency matrix, we have to store an entry for each possible edge even if we never create it. Due to the nature of the domain problem, we do not have all the edges, which means unnecessary memory consumption. Thinking ahead, a node itself has to know which edges are directed to it; from the implementation point of view, we also get some benefits if the node knows which edges are directed out from it. Keeping this in mind, I have decided to implement a hybrid solution, which unifies the advantages of the two concepts. My implementation creates one list for the nodes and one for the edges; these lists help to find any element in the graph quickly. A node itself contains two lists: one for the incoming edges and one for the outgoing edges. Finally, an edge contains the two nodes it connects, distinguishing source and target.

3.2.1.2 – Graph structure
Using this structure, navigating the graph is quick, easy and efficient. On the other hand, the structure complicates operations like adding or deleting nodes or edges. For example, when we delete a node, we have to maintain multiple vectors, which is clearly an overhead. Weighing the advantages against the drawbacks, the structure is quite sufficient, as most of the time the graph will be executed rather than changed. The architecture is designed to be as lightweight as possible. The implementation uses pointers everywhere it is possible; more specifically, we use the smart pointer types of the C++11 standard library – shared, weak and unique pointers.

3.3 Building up a graph

Using a simple example, all the details of the model and the graph building process are illustrated.

3.3.1 Quadratic equation

For simplicity, we aim to solve the quadratic equation in our example:

y₁,₂ = (−b ± √(b² − 4ac)) / (2a)

We can consider a, b, c as inputs and y₁, y₂ as outputs. One execution of the graph is going to solve the equation for us.

3.3.2 IProcessable interface

In our model every element is capable of running; this is true for nodes and edges alike. This observation leads us to leverage the object oriented paradigm and create a base class for all the runnable elements in the graph. In C++ we create interfaces as classes, since the language does not contain an interface keyword. The IProcessable base class has two main responsibilities: to store the value of an element and to declare the function which we fire during execution. In this model, the function we execute is the process( ) function. To ensure that every derived class implements its own function to run, I marked the process function as abstract (pure virtual).
3.2.2 – Basic hierarchy of the graph elements

The signature of the function is empty, because the arguments of the function are the predecessors of the element. This provides flexibility: instead of restricting the signature, we leave argument handling to the function itself. Although variable-length parameter lists are available in C++, it would be much harder to correctly design the signature of the process function and cover every possible combination than to check the predecessors at the beginning of each function and decide whether they constitute valid input.

The Value property can be used to store information which is required by the element. Let us say we want to create edges which not only transport the value from one node to another, but also multiply the carried value. If we want an edge which doubles the value and an edge which triples it, we could create two separate classes, but then we would duplicate code. If we create just one class and store the multiplier as a variable, we can easily modify the multiplication factor for each edge without adding unnecessary complexity to the code base.

The Node and Edge classes also have default implementations and should be specialized further. Their properties provide the functionality described earlier – namely, a node stores its predecessor and successor edges in two separate lists, and an edge stores the nodes it connects. This structure stores the direction of the edge: the From property is the source and the To property is the destination node.

3.3.3 Node types

Node is a very generic class; it only knows its predecessors and successors. I have defined some node types with which I can solve mathematical problems. The type system is easily extensible, so further types can be created quickly and easily.
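The hierarchy in the figure can be sketched roughly as below. The names IProcessable, Value, process, From and To follow the text; everything else (member types, defaults) is an assumption made for illustration.

```cpp
#include <vector>

class IProcessable {
public:
    double Value = 0.0;          // optional payload, used by some subtypes
    virtual void process() = 0;  // pure virtual: every element must be runnable
    virtual ~IProcessable() = default;
};

class Edge;

class Node : public IProcessable {
public:
    std::vector<Edge*> predecessors;  // incoming edges
    std::vector<Edge*> successors;    // outgoing edges
    void process() override {}        // default: specialized by node types
};

class Edge : public IProcessable {
public:
    Node* From = nullptr;  // source node
    Node* To = nullptr;    // destination node
    void process() override {
        // Default edge behaviour from the text: carry the source's value.
        if (From) Value = From->Value;
    }
};
```

A doubling edge, for example, would override process( ) and multiply the carried value by a factor stored in a member variable, as argued above.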
Addition node
This type performs basic addition: it takes the values of its predecessors, sums them up, then notifies its successors about completion. The Value property is not used.

Constant node
This type acts as a constant input. It never has any predecessors and it is always ready to process. It uses the Value property to store the predefined value. In its process function it does nothing but notify its successors about its value.

Division node
This type realizes the division operation. It expects two predecessors: the first one is the dividend and the second one is the divisor. The Value property is not used.

Multiplication node
This type performs basic multiplication; the implementation is very similar to the addition node.

Square root node
This type calculates the square root of a number. It takes one predecessor. The Value property is not used.

3.3.4 Edge types

The Edge class is also a generic base class. It stores the source and the destination nodes. The implemented function is very simple: it takes the result value from the source node, stores it, and notifies the destination node.

3.3.5 Graph object

Nodes and edges are self-descriptive, but it makes sense to encapsulate coherent parts.

3.3.5 – Graph structure
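Reduced to the arithmetic they perform, the node types above can be sketched as small functors. This is a deliberate simplification of the thesis design: in the real code each type derives from Node and reads its inputs from predecessor edges, whereas here the inputs are passed directly and all names are hypothetical.

```cpp
#include <cmath>
#include <functional>
#include <numeric>
#include <vector>

struct AdditionNode {
    // Sums the values of all predecessors.
    double process(const std::vector<double>& inputs) const {
        return std::accumulate(inputs.begin(), inputs.end(), 0.0);
    }
};

struct ConstantNode {
    double Value;  // the only type that uses the Value property
    double process() const { return Value; }
};

struct DivisionNode {
    // First predecessor is the dividend, second is the divisor.
    double process(double dividend, double divisor) const {
        return dividend / divisor;
    }
};

struct MultiplicationNode {
    // Multiplies the values of all predecessors.
    double process(const std::vector<double>& inputs) const {
        return std::accumulate(inputs.begin(), inputs.end(), 1.0,
                               std::multiplies<double>());
    }
};

struct SquareRootNode {
    // Takes exactly one predecessor.
    double process(double input) const { return std::sqrt(input); }
};
```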
The class itself tries to be as simple as possible. To support the previously described functionality, nodes and edges are stored in separate lists. At this point, we only need two public methods: one to add a node and one to add an edge to the graph. Each function takes a node or an edge as its argument, respectively. I have decided to decouple the creation of nodes and edges from the graph. This separation helps keep the graph object free of the details of how to create a node or an edge. Creation of elements can be done via a factory class which takes care of creating the various types of nodes and edges.

3.3.6 Example graph

With the acquired knowledge, we can now set up our example graph. The first step is to separate the starting equation to express the two results separately. Doing that, we get the following two expressions:

y₁ = (−b + √(b² − 4ac)) / (2a)
y₂ = (−b − √(b² − 4ac)) / (2a)

To be able to express the RPN form of the expressions, rewrite the formulas and then create the RPN form:

y₁ = (−b + (b² − 4ac)^0.5) / (2a)
y₂ = (−b − (b² − 4ac)^0.5) / (2a)

y₁ = (−b) b² 4ac − √ + 2 a ∗ /
y₂ = (−b) b² 4ac − √ − 2 a ∗ /

Using the RPN form of the expression, it is easy to create the graph. During the build process, if we encounter a subexpression which we need, which has already been calculated, and which has not changed since then, we can reuse it to prevent unnecessary recalculation. Using these observations and expressions, the following graph can be constructed.
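What the graph computes can be verified with a direct, hand-evaluated sketch. The function below is not the graph implementation, only the same dataflow written flatly: the discriminant and its square root – the shared subgraph – are computed once and reused by both outputs, mirroring the subexpression reuse just described. The function name is hypothetical.

```cpp
#include <cmath>
#include <utility>

std::pair<double, double> solveQuadratic(double a, double b, double c) {
    double disc = b * b - 4.0 * a * c;  // shared subgraph: b² − 4ac
    double root = std::sqrt(disc);      // shared square-root node, reused
    double denom = 2.0 * a;             // shared subgraph: 2a
    return { (-b + root) / denom,       // first output node, y₁
             (-b - root) / denom };     // second output node, y₂
}
```

For a = 1, b = −3, c = 2 this yields the roots 2 and 1, which is what one execution of the graph is expected to produce.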
3.3.6 – Graph of the quadratic equation

In the picture, each color has a different meaning. Blue nodes are input nodes. Pink nodes are constant nodes with predefined values (the nodes with the value of −1 are identical, displayed twice only to simplify the picture). Yellow nodes are operators of type addition, multiplication, division and square root. Output nodes are marked with green. The C++ code for creating this graph can be found in the Appendix.

3.4 Simple execution models

Once we have the model set up, the next step is to run it and acquire the results. Before designing an advanced model, let us take a look at how easily the notion of "running" can be expressed in our current model.
3.4.1 Recursion based execution

As discussed earlier, we have a simple mapping between the RPN form of an expression and its graph representation. We also saw an example of how to solve an expression using its RPN form. In this section, we are going to implement a solution using the graph representation.

3.4.1.1 Recursion

Using recursion [33], we can easily iterate through all the elements of the graph and calculate the final result. To start the execution we have to call the process function of an output node, as the recursion works backwards. To process a node, we simply take the results of all the predecessor nodes, execute the node's own calculation and return. Getting the result of a predecessor element causes recursion. Every recursive algorithm needs a stop condition; in our case it is hit on a constant node – when we reach one, we do not need to call any further process functions of predecessors, we can simply return the value of the node.

3.4.1.2 Implementation details

Each element of a graph is derived from the IProcessable base class, hence has a process( ) function. To support recursion based execution, I had to ensure that the process function of any IProcessable calls the process functions of each predecessor. The underlying implementation of the process function calls the GetResultValue( ) function of its predecessors. It basically means that when calling the GetResultValue function, the element itself does not know whether the call will cause a recursion or will simply return with the result value. This fact comes in handy when dealing with multiple outputs. The GetResultValue function is implemented in such a way that it only causes recursive calls when it is called for the first time. When we have multiple outputs, they most likely share common subgraphs, so rather than recalculating the whole subgraph again, we can reuse intermediate results. To get the results, we have to call the GetResultValue( ) functions of the output nodes.
I consider every node without a successor as an output.
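The memoizing recursion above can be sketched as follows. The name GetResultValue follows the text; the Element type, the caching flag, and the summation standing in for the element's own calculation are assumptions made for illustration.

```cpp
#include <vector>

struct Element {
    std::vector<Element*> predecessors;
    double value = 0.0;     // preset for constant (input) elements
    bool computed = false;  // has the result been cached already?

    double GetResultValue() {
        if (computed) return value;  // second call: no recursion, reuse cache
        if (!predecessors.empty()) { // stop condition: constants fall through
            double sum = 0.0;
            for (Element* p : predecessors)
                sum += p->GetResultValue();  // recurse backwards
            value = sum;  // the element's own calculation (sum as example)
        }
        computed = true;
        return value;
    }
};
```

With two outputs hanging off one shared element, the shared part is evaluated only on the first GetResultValue( ) call; the second output reads the cached value, which is exactly the reuse of intermediate results described above.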
3.4.1.3 Experiences

Implementing this version pointed out that using higher level languages makes development much easier, as a notion like "process" can be expressed naturally via well designed objects. This version is the first one I implemented, so it is hard to judge its performance, but in theory I can make some predictions. The first barrier to usage can be memory limitation: this type of recursion, where we have to keep the calling context of every recursive call on the stack, can quickly lead to out-of-memory errors, especially for big graphs. Another issue could be the speed of the execution. Knowing that we have a complex data structure for storing a graph and also a complex class hierarchy, I assumed that wiser execution algorithms could bring an outstanding performance improvement compared to the recursion based execution model.

3.4.2 BFS based execution

From a certain point of view, BFS based execution is the opposite of the recursion based model. Using recursion, we automatically know which element is ready to be processed; if we go the other way around and start from the input nodes, we have to monitor which elements are ready to be processed.

3.4.2.1 Implementation details

Remember that we are working on DAGs, which means a BFS traversal yields a valid order of execution if we apply a small modification to the algorithm. In BFS, after visiting a node we add all unvisited neighbors to the ToProcess list. In our case, we need to ensure that an element is added to the list only if all of its predecessors are ready. To support this validation, I have implemented two functions. The isReadyForProcess( ) function checks that every prerequisite of the element is ready. The purpose of the registerParameter( ) function is to help determine the actual state of an element. When an element is processed, it notifies its successors about this fact by calling the registerParameter function of each successor.
Internally, the function maintains a counter and increments it on every call. The checking function compares this counter with the number of predecessors; if they are equal, the element is ready to be processed.
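The modified BFS can be sketched as below. The names isReadyForProcess and registerParameter follow the text; the Element type, the execute function, and the summation standing in for the element's real operation are assumptions for illustration.

```cpp
#include <deque>
#include <vector>

struct Element {
    std::vector<Element*> predecessors;
    std::vector<Element*> successors;
    double value = 0.0;       // preset for input elements
    int readyParameters = 0;  // incremented by registerParameter()

    bool isReadyForProcess() const {
        // Ready once every predecessor has reported in.
        return readyParameters == static_cast<int>(predecessors.size());
    }
    void registerParameter() { ++readyParameters; }

    void process() {
        if (!predecessors.empty()) {  // example operation: sum the inputs
            value = 0.0;
            for (Element* p : predecessors) value += p->value;
        }
        // Notify every successor that one more parameter is available.
        for (Element* s : successors) s->registerParameter();
    }
};

// Modified BFS: start from the input elements and enqueue an element
// only when all of its predecessors have been processed.
void execute(const std::vector<Element*>& inputs) {
    std::deque<Element*> toProcess(inputs.begin(), inputs.end());
    while (!toProcess.empty()) {
        Element* e = toProcess.front();
        toProcess.pop_front();
        e->process();
        for (Element* s : e->successors)
            if (s->isReadyForProcess()) toProcess.push_back(s);
    }
}
```

Because the readiness check runs only after a predecessor finishes, each element enters the queue exactly once – when its last predecessor completes – so on a DAG the traversal visits every element in a valid execution order.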