Good afternoon. My name is Saulo, and I am going to present Afluentes, which is a Java framework for developing IO intensive systems. More specifically, for improving the performance of these systems through concurrent IO. But, before I start, I would like to talk a little bit about myself.
I have been working in the Brazilian software industry for almost 20 years now. In the beginning of my professional life, I co-founded a startup. Then I worked in medium and large sized companies. Now I am working at the Brazilian Central Bank. At all these places, I worked mainly as a software architect for object-oriented web applications backed by relational databases. As a software architect, I have been dealing with performance issues ever since. This is especially true regarding my work at the Brazilian Central Bank, because there I work with a team that develops and maintains some mission critical systems. For example, the Brazilian Central Bank uses one of these systems to intermediate the relationship between the judiciary system and the financial institutions of the country. As you can imagine, these mission critical systems not only have to work correctly, but also have to show good performance. This experience is the starting point of the Afluentes framework. So, let’s start!
Many systems are IO intensive, which means their total execution time is dominated by the time the processor spends waiting for IO operations to complete. In several IO intensive systems, there are few dependencies among data transfer and processing tasks. These systems may profit from requesting IO operations concurrently because, if the necessary resources are available, then these operations will be performed in parallel, resulting in huge performance gains. Despite this enormous potential, systems, even IO intensive ones, seldom request IO operations concurrently. Of course, the question we should ask is…
At least a partial answer to this question is that mainstream methods for concurrent IO make programming harder. For example, if we use synchronous IO primitives with multithreaded programming, then we face nasty problems such as race conditions, deadlocks and scalability issues. On the other hand, when we use asynchronous primitives, we have to deal with callbacks, and it is no wonder developers describe programming with callbacks as “callback hell” and the “pyramid of doom”. This is because callbacks really make programming harder. We can appreciate how hard by comparing the implementation with and without callbacks of even simple examples.
This Java code snippet calculates the value of the expression b² - 4ac using regular methods, which are synchronous methods that deliver their results through return values. This example also illustrates a nice property of synchronous methods: they are easily composed, and the runtime system coordinates their execution.
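A minimal sketch of what such a composition might look like follows. The class name SyncDiscriminant and the helper methods sub, mul and delta are illustrative assumptions, not code taken from the slide.

```java
final class SyncDiscriminant {
    // Ordinary synchronous methods: each one delivers its result as a return value.
    static double sub(double x, double y) { return x - y; }
    static double mul(double x, double y) { return x * y; }

    // b^2 - 4ac, composed by simply nesting calls.
    // The runtime evaluates the arguments and passes the results along for us.
    static double delta(double a, double b, double c) {
        return sub(mul(b, b), mul(4, mul(a, c)));
    }
}
```

For a = 1, b = 5 and c = 6, delta returns 25 - 24 = 1.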
By contrast, this Java code snippet also calculates the value of the expression b² - 4ac, but this time using asynchronous methods, which deliver their results through callbacks. As we can easily see by comparing both implementations, programming with asynchronous methods is, at the very least, much more verbose than programming with synchronous ones.
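A comparable sketch in callback style might look like the following. The names are again illustrative assumptions, and the standard DoubleConsumer interface stands in for a callback type. To keep the sketch short, each method invokes its callback immediately on the caller's thread; a real asynchronous method would invoke it later, once its IO operation completes.

```java
import java.util.function.DoubleConsumer;

final class CallbackDiscriminant {
    // Callback-style methods: void return type, result delivered through the callback.
    static void sub(double x, double y, DoubleConsumer callback) { callback.accept(x - y); }
    static void mul(double x, double y, DoubleConsumer callback) { callback.accept(x * y); }

    // b^2 - 4ac. Every intermediate result needs its own callback,
    // so the nesting grows with each step of the computation.
    static void delta(double a, double b, double c, DoubleConsumer callback) {
        mul(b, b, bb ->
            mul(a, c, ac ->
                mul(4, ac, fourAc ->
                    sub(bb, fourAc, callback))));
    }
}
```

Note that this version also chains the multiplications one after another, even though b·b and a·c do not depend on each other.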
Actually, programming with asynchronous methods is harder than with synchronous ones not only because we are writing more lines of code. The real reason is that it falls upon the programmer to manually coordinate the execution of asynchronous methods whilst the runtime system coordinates the execution of synchronous ones. Callbacks should be just a mechanism through which asynchronous methods deliver their results, but in the absence of further programming language support, programmers end up using callbacks to coordinate communication and synchronization among different execution contexts. This is how things become nasty. It is the road to “callback hell”. It is the entrance to the “pyramid of doom”.
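As a rough illustration of that coordination burden, here is a minimal sketch in which the two independent products, b·b and 4·a·c, run concurrently and are joined by hand. The names ManualJoin and mulAsync are hypothetical, and the standard CompletableFuture.runAsync merely stands in for a real asynchronous IO operation.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.DoubleConsumer;

final class ManualJoin {
    // Hypothetical asynchronous multiplication: the callback runs on a pool thread.
    static void mulAsync(double x, double y, DoubleConsumer callback) {
        CompletableFuture.runAsync(() -> callback.accept(x * y));
    }

    static void delta(double a, double b, double c, DoubleConsumer callback) {
        double[] parts = new double[2];               // [0] = b*b, [1] = 4*a*c
        AtomicInteger pending = new AtomicInteger(2); // branches still running

        // Hand-written synchronization: whichever branch finishes last
        // computes the difference and delivers the final result.
        Runnable join = () -> {
            if (pending.decrementAndGet() == 0) {
                callback.accept(parts[0] - parts[1]);
            }
        };

        mulAsync(b, b, bb -> { parts[0] = bb; join.run(); });
        mulAsync(a, c, ac -> mulAsync(4, ac, fourAc -> { parts[1] = fourAc; join.run(); }));
    }
}
```

The counter, the shared array and the join step have nothing to do with the arithmetic itself. They exist only to coordinate the two execution contexts, which is exactly the work the programmer is left with.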
Let’s summarize the main properties of synchronous and asynchronous methods.
Synchronous methods deliver their results through return values, which makes composing them easy. The runtime system coordinates their executions. A major drawback to synchronous methods is that they are sequentially executed, therefore we cannot do concurrent I/O with them.
By contrast, asynchronous methods deliver their results through callbacks, which makes composing them hard. The programmer has to manually coordinate their executions. Despite these issues, asynchronous methods have a major advantage over synchronous ones: they allow us to do concurrent I/O.
Hopefully, at this point of the discussion, a question that naturally arises is “is it possible to combine the advantages of synchronous and asynchronous methods?” In other words, “is it possible to compose asynchronous methods with the same ease with which synchronous ones are composed, and still concurrently execute them?”
Fortunately, the answer is yes.
This diagram illustrates how the Afluentes framework combines the advantages of synchronous and asynchronous methods. On the left side, we have the set of asynchronous methods. On the right side, we have the set of synchronous methods. As we can see, the set of synchronous methods has a subset whose elements we call evaluators. I will explain what evaluators are in just a little bit. For now, the important thing to assimilate is that the Afluentes framework provides adapters that turn asynchronous methods into evaluators. Since evaluators are synchronous methods, this transformation makes it easy to compose asynchronous methods. The Afluentes framework also provides adapters that turn synchronous methods into evaluators. This is nice because, by turning both synchronous and asynchronous methods into evaluators, we can mix both kinds into the same method composition. Now, I can explain what evaluators are.
Evaluators are synchronous methods. Therefore they are easily composed. What is special about evaluators is they are methods which receive evaluations as parameters and also produce evaluations as results. Of course, now I need to explain what evaluations are.
Evaluations are objects constructed by evaluators. They encapsulate a method to be invoked and also the parameters with which this method must be invoked. Since these parameters are also evaluations, we can construct evaluation trees. In fact, a composition of evaluators produces an evaluation tree, when executed.
For example, if we had composed evaluators in order to calculate the value of the expression b² - 4ac, then this composition would produce the evaluation tree shown in figure (a) when executed. When the programmer requests the result of its root, the Afluentes framework traverses this tree looking for evaluations that can be performed, i.e., evaluations whose parameters have already been evaluated. The evaluations the Afluentes framework finds can be related to synchronous or asynchronous methods. In both cases, the Afluentes framework invokes the method the evaluation encapsulates. If the method is synchronous, then there are no difficulties invoking it. Things get more interesting when the method is asynchronous. In this case, the Afluentes framework takes care of setting up a callback to receive the result of the method. This is how the Afluentes framework frees the programmer from coordinating the execution of asynchronous methods with callbacks. In the Afluentes framework programming model, callbacks are what they should have been from the very beginning: just a mechanism for asynchronous methods to deliver their results. This is the only moment programmers have to deal with callbacks. When the execution of the asynchronous method is complete, the Afluentes framework repeats this process. Eventually, the Afluentes framework will perform the root evaluation. At this point, the evaluation process is complete.
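To make the idea of a lazily evaluated tree more concrete, here is a toy sketch. It is not the Afluentes API: the names EvaluationTreeSketch, Evaluation, constant, node and start are all hypothetical, and the standard CompletableFuture class does the coordination that the framework performs with its own machinery.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BiFunction;

final class EvaluationTreeSketch {
    // A tree node. Nothing is computed until start() is called.
    interface Evaluation { CompletableFuture<Double> start(); }

    // Counts how many operations have actually been applied (to show laziness).
    static final AtomicInteger APPLIED = new AtomicInteger();

    static Evaluation constant(double value) {
        return () -> CompletableFuture.completedFuture(value);
    }

    // Builds a node from an operation and two child evaluations.
    static Evaluation node(BiFunction<Double, Double, Double> op, Evaluation left, Evaluation right) {
        return () -> {
            // Start both children first so independent subtrees can run concurrently,
            // then apply the operation once both results are available.
            CompletableFuture<Double> l = left.start();
            CompletableFuture<Double> r = right.start();
            return l.thenCombineAsync(r, (x, y) -> { APPLIED.incrementAndGet(); return op.apply(x, y); });
        };
    }

    // "Evaluators": they receive evaluations and return an evaluation.
    static Evaluation sub(Evaluation x, Evaluation y) { return node((p, q) -> p - q, x, y); }
    static Evaluation mul(Evaluation x, Evaluation y) { return node((p, q) -> p * q, x, y); }

    // Composing evaluators only builds the tree for b^2 - 4ac.
    static Evaluation delta(double a, double b, double c) {
        return sub(mul(constant(b), constant(b)),
                   mul(constant(4), mul(constant(a), constant(c))));
    }
}
```

Calling delta(1, 5, 6) builds the tree without applying any operation. Only when the result of the root is requested, with start().join() in this sketch, are the four operations applied.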
Unfortunately, methods are not first class citizens in Java. For example, they cannot be stored in attributes, which is a necessary condition for implementing evaluations.
The Afluentes framework works around this Java limitation by simultaneously representing synchronous and asynchronous methods, callbacks and evaluators through objects that implement specific interfaces.
This slide shows the interfaces ISyncFn2 and ISyncFn3, which represent synchronous binary and ternary functions, respectively. The definition of these interfaces has a lot of type parameters. The Afluentes framework makes extensive use of the Java type system in order to guarantee type safety in function compositions. The type parameters X1, X2 and X3 define the function domain. The type parameter Y defines its codomain. These interfaces have a single method called y, which encapsulates the task the function performs. Since these interfaces represent synchronous functions, the y method delivers its result through a return value.
This Java code snippet calculates the value of the expression b² - 4ac, this time using synchronous functions. As we can see, we compose synchronous functions just like we compose synchronous methods. This is no accident. We designed the interfaces that represent synchronous functions with this similarity in mind.
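The following sketch reconstructs these interfaces from the description above (type parameters X1, X2, X3 and Y, and a single method y) and composes them. The exact declarations in the framework may differ. The wrapper class SyncFnSketch, the constants SUB, MUL and DELTA, and the use of lambdas are assumptions made for brevity.

```java
final class SyncFnSketch {
    // Synchronous binary function: domain X1 x X2, codomain Y.
    interface ISyncFn2<X1, X2, Y> { Y y(X1 x1, X2 x2); }

    // Synchronous ternary function: domain X1 x X2 x X3, codomain Y.
    interface ISyncFn3<X1, X2, X3, Y> { Y y(X1 x1, X2 x2, X3 x3); }

    static final ISyncFn2<Double, Double, Double> SUB = (x1, x2) -> x1 - x2;
    static final ISyncFn2<Double, Double, Double> MUL = (x1, x2) -> x1 * x2;

    // b^2 - 4ac: function objects compose by nesting, just like methods do.
    static final ISyncFn3<Double, Double, Double, Double> DELTA =
        (a, b, c) -> SUB.y(MUL.y(b, b), MUL.y(4.0, MUL.y(a, c)));
}
```

Because every function carries its domain and codomain in its type, the compiler rejects a composition that passes, say, a String where a Double is expected.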
This slide shows the interface IAsyncFn2, which represents asynchronous binary functions. Since this interface represents asynchronous functions, its y method has a void return value, delivering its result through a callback.
The interface ICallback represents callbacks. Asynchronous functions invoke its y method to deliver the result of their execution, and its t method to report an exception that occurred during that execution.
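Here is a sketch of how these two interfaces might be declared and used, reconstructed from the description above rather than copied from the framework. The wrapper class AsyncFnSketch and the functions SUB and DIV are hypothetical. For brevity, the callback is invoked on the caller's thread, whereas a real asynchronous function would invoke it when its IO operation completes.

```java
final class AsyncFnSketch {
    // Callback: y receives the result, t receives an exception.
    interface ICallback<Y> {
        void y(Y y);
        void t(Throwable t);
    }

    // Asynchronous binary function: void return type, result delivered through the callback.
    interface IAsyncFn2<X1, X2, Y> {
        void y(X1 x1, X2 x2, ICallback<Y> callback);
    }

    static final IAsyncFn2<Double, Double, Double> SUB =
        (x1, x2, callback) -> callback.y(x1 - x2);

    // Integer division, used here only to exercise the error path through t.
    static final IAsyncFn2<Integer, Integer, Integer> DIV = (x1, x2, callback) -> {
        int quotient;
        try {
            quotient = x1 / x2;   // throws ArithmeticException when x2 is 0
        } catch (ArithmeticException e) {
            callback.t(e);        // failure is reported, not thrown to the caller
            return;
        }
        callback.y(quotient);     // success is delivered outside the try block
    };
}
```

Delivering the result outside the try block means an exception thrown by the callback itself is not mistaken for a failure of the function.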
Now we can see all these interfaces in action. This Java code snippet defines the asynchronous methods sub and mul, which calculate the difference and the product of two numbers, respectively. Actually, we cannot see the method mul, but it is defined just like the method sub. Since these methods are asynchronous, they deliver their results through callbacks. We turn these asynchronous methods into evaluators through adapters. This transformation allows us to easily compose the asynchronous methods sub and mul. As I told you before, the execution of evaluators creates an evaluation tree, which will be executed only when the result of its root is explicitly requested. As we can see, the Afluentes framework made it possible for us to easily compose asynchronous functions whilst preserving their main property, which is to be concurrently executed.
We evaluated the Afluentes framework through experiments performed in a Java web application that allows its users to exchange messages. A sort of email. This application persists objects in a relational database through an object-relational mapper. We chose this system because it was developed with a mix of technologies that are very popular in web applications.
Systems that adopt this mix of technologies are usually IO intensive because the time the database spends executing queries is much longer than the time the application takes to process query results. These applications also offer many opportunities for concurrently submitting queries to databases. This is because they have many instances of code like the one shown in this slide, which executes a query and traverses from the objects the query returns to other associated objects. Thanks to the way object-relational mappers work, every time the system traverses to an associated object, a new query will be submitted to the database. Object-relational mappers submit these queries to databases sequentially, but there is no reason why they cannot do this concurrently.
We opted for introducing the Afluentes framework into the system in a slightly intrusive way. We did this by inserting code that uses the Afluentes framework to prefetch associated objects by concurrently submitting queries to the system database.
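The difference between the two access patterns can be sketched as follows. This is not the code used in the experiments: PrefetchSketch and loadSenderName are hypothetical, the "query" is a stub, and the standard CompletableFuture class takes the place of the Afluentes framework.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

final class PrefetchSketch {
    // Stand-in for a database round trip; a real mapper would run a query here.
    static String loadSenderName(int senderId) {
        return "user-" + senderId;
    }

    // Sequential traversal: one lookup per message, each waiting for the previous one.
    static List<String> sequential(List<Integer> senderIds) {
        List<String> names = new ArrayList<>();
        for (int id : senderIds) {
            names.add(loadSenderName(id));
        }
        return names;
    }

    // Concurrent prefetch: submit every lookup first, then collect the results in order.
    static List<String> concurrent(List<Integer> senderIds) {
        List<CompletableFuture<String>> pending = senderIds.stream()
            .map(id -> CompletableFuture.supplyAsync(() -> loadSenderName(id)))
            .collect(Collectors.toList());
        return pending.stream()
            .map(CompletableFuture::join)
            .collect(Collectors.toList());
    }
}
```

Both methods return the same list. With a real database behind loadSenderName, the concurrent version lets the round trips overlap, provided the connection pool and the database have the resources to serve them in parallel.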
This chart summarizes the evaluation results. The first bar represents the total execution time of each experiment when submitting queries to the database sequentially. The second bar represents the total execution time when submitting queries to the database concurrently with the Afluentes framework, which significantly reduces the total execution time. The third bar also represents the total execution time when submitting queries to the database concurrently. However, this time we used “pure” callbacks, which means that we used callbacks to manually coordinate the execution of asynchronous methods. As we can see, the total execution times with the Afluentes framework and with “pure” callbacks are very similar, showing that the overhead introduced by the Afluentes framework whilst coordinating the execution of asynchronous methods is negligible.
In our observations, we noticed the code that uses the Afluentes framework for prefetching associated objects has a very uniform structure. From this observation, we decided to define a domain-specific language that allows programmers to declare which objects a system will traverse. We developed an interpreter for this language. This interpreter uses the Afluentes framework for concurrently prefetching objects targeted by statements of this language. This Java code snippet shows the interpreter in action. This concludes my talk.
I would like to thank you all and also refer you to the Afluentes framework site for more details. Thanks!
Afluentes - Concurrent I/O Made Easy with Lazy Evaluation
Saulo Medeiros de Araujo
A Little Bit About Me
• I have been working in the Brazilian software industry for almost 20 years
• I cofounded a startup (1999-2004)
• I worked in medium and large sized companies (2004-2006)
• Currently, I am working at the Brazilian Central Bank (2006-Present)
• I have been acting mainly as a software architect for object-oriented
web applications backed by relational databases
• This experience is the starting point of the Afluentes framework
• Many systems are I/O intensive
• Several I/O intensive systems exhibit few dependencies among data
transfer and processing tasks
• The performance of these systems may be significantly improved by
requesting I/O operations concurrently
• This is because these operations will be performed in parallel, if the
necessary resources are available
• Despite this huge potential, most systems request I/O operations sequentially
Asynchronous Methods and Callbacks Issues
• More lines of code
• The programmer has to manually coordinate the execution of
asynchronous methods
• Callbacks should be just a mechanism for asynchronous methods
to deliver their execution results
• In the absence of further programming language support,
programmers end up using callbacks to manage communication and
synchronization among different execution contexts
• This is how things become nasty
Synchronous and Asynchronous Methods
• Synchronous Methods
– Deliver their results through return values
– Easily composed
– The runtime system coordinates their executions
– Are sequentially executed
• Asynchronous methods
– Deliver their results through callbacks
– Composing them is hard
– The programmer has to manually coordinate their executions
– Are concurrently executed
• Is it possible to combine the advantages of synchronous and asynchronous methods?
• Evaluators are synchronous methods, therefore, composing them is easy
• They receive evaluations as parameters
• They produce evaluations as results
• Evaluations are objects built by evaluators
• They encapsulate
– A method to be invoked
– The parameters (other evaluations) with which this method must be invoked
• When executed, an evaluator composition builds an evaluation tree
Trouble in Paradise
• Methods are not first class citizens in Java
• They cannot be stored in attributes, passed as parameters, etc.
• This is a necessary condition for implementing evaluations
The Afluentes Framework Strategy
• The Afluentes framework represents methods, both
synchronous and asynchronous, and callbacks through objects
that implement specific interfaces
• Java web application that allows its users to exchange messages
– Sort of e-mail
• Persists objects in a relational database (MySQL)
• Interacts with this database through an object-relational mapper