Be the first to like this
A lot of database related algorithms are more difficult to implement in a distributed environment. Quite often, the "distributed" version is far from the "classical" version : constraints are dropped (see the CAP theorem), only specific cases are supported (for example : the involved data needs to be co-located within the distributed system), etc. This talk focuses on joins. We start by presenting join implementations in "classical" relational databases than we lead the audience through the challenges and solutions to make these functions available in a distributed environment. While we start with a theoretical point of view, we finish by giving real life examples from implementations in ETL systems (known for joining heterogeneous databases and therefore quite advanced in this area, but often not real-time) and some modern NoSQL databases (where most systems choose to offer less features with respect to joins).