GaianDB

A dynamic distributed federated database
Dale Lane
@dalelane

A massively over-simplified view of data-warehousing...

GaianDB: a dynamic distributed federated database

Network of distributed databases

A dynamic network
Biologically-Inspired Self-Organisation
Exploit natural selection in nature to build better networks
Robust self-organising network architectures
Frameworks and algorithms for robust fault-tolerant information dissemination
Robust communications with minimal complexity or human control

Gaian database
[Diagram, shown in four steps: SQL queries issued into a network of twelve linked nodes, N0 to N11]
Queries routed to all database nodes – a flood query, but retrieving only the data required to satisfy a query
Exchanges query traffic in the network for data traffic – aiming to minimize total traffic
Predicated on a ‘store data locally, read data from anywhere’ paradigm
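
A minimal sketch of the 'store data locally, read data from anywhere' idea, assuming three hypothetical nodes that each hold only their own rows. Python's built-in sqlite3 stands in for the embedded database engine here, and the table name `readings`, the node contents and the helper `federated_query` are illustrative assumptions, not GaianDB's API. The same SQL runs at every node, and only the rows that satisfy the WHERE clause come back.

```python
import sqlite3

def make_node(rows):
    """Create one in-memory database holding only this node's local rows."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE readings (meter TEXT, kwh REAL)")
    db.executemany("INSERT INTO readings VALUES (?, ?)", rows)
    return db

# Hypothetical network: each node stores its own data locally.
nodes = {
    "N0": make_node([("m1", 1.5), ("m2", 9.0)]),
    "N1": make_node([("m3", 0.4)]),
    "N2": make_node([("m4", 7.2), ("m5", 3.3)]),
}

def federated_query(sql, params=()):
    """Run the same SQL on every node and merge the rows that come back."""
    results = []
    for name, db in nodes.items():
        for row in db.execute(sql, params):
            # Tag each row with the node that supplied it.
            results.append((name,) + tuple(row))
    return results

# The filter is evaluated at each node, so non-matching rows never leave it.
rows = federated_query("SELECT meter, kwh FROM readings WHERE kwh > ?", (5.0,))
print(rows)
```

In this sketch node N1 receives the query but contributes no rows, which is the trade the slide describes: extra query traffic in exchange for less data traffic.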

Architecture
[Architecture diagram of a GaianDB node]
Derby Engine: Parsing, Compilation, Execution
Derby Engine to VTI: instantiates; invokes costing methods; pushes columns and ‘where’ clause in a structure
GaianPStmtNode VTI: executes queries on physical leaf nodes and propagates the original SQL (+ queryID & steps state info) to linked Gaian nodes
Data sources: MQ(tt) stream data, DB2, Oracle, MS SQLServer, Sybase, MySQL, flat files, in-memory tables, Derby, text index, Derby tables
Original SQL is propagated to other GaianDB nodes
[Diagram, shown in two steps: an SQL query propagating across nodes N0 to N11, with one node expanded]
Multithreaded, breadth-first query propagation
Loop detection/handling – no duplicates
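
The propagation and loop handling named above can be sketched as follows. This is a single-threaded illustration under stated assumptions, not the multithreaded implementation the slide refers to: the topology in `links`, the query ID format and the `flood` helper are all hypothetical. Each node remembers the query IDs it has already handled, so a query arriving again over a second path is dropped instead of being executed twice.

```python
from collections import deque

# Hypothetical topology: node name -> directly linked nodes (contains cycles).
links = {
    "N0": ["N1", "N2"],
    "N1": ["N0", "N2", "N3"],
    "N2": ["N0", "N1", "N3"],
    "N3": ["N1", "N2"],
}

def flood(origin, query_id, seen):
    """Breadth-first propagation; returns {node: hop count at which it ran the query}."""
    seen.setdefault(origin, set()).add(query_id)
    hops = {origin: 0}
    frontier = deque([origin])
    while frontier:
        node = frontier.popleft()
        for neighbour in links[node]:
            already = seen.setdefault(neighbour, set())
            if query_id in already:
                continue  # loop detected: this node has handled the query
            already.add(query_id)
            hops[neighbour] = hops[node] + 1
            frontier.append(neighbour)
    return hops

seen_by_node = {}  # per-node record of query IDs already handled
hops = flood("N0", "q-1", seen_by_node)
print(hops)
```

The largest hop count in the result is the eccentricity of the querying node, which is the quantity the later timing model depends on.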

Performance – with 1,250 nodes
[Chart: Query time for 1025 nodes, fetching up to 1025 rows from each. x-axis: rows fetched per node (0 to 1200); y-axis: time in milliseconds (0 to 6000). Series: Query Execute Time, Total Query Time, Linear (Total Query Time): y = 4.217x + 349.251]
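
To make the linear fit concrete, here is a small sketch that evaluates it. It assumes, from the axis labels, that x is the number of rows fetched per node and y is the total query time in milliseconds; the function name `predicted_total_ms` is illustrative.

```python
def predicted_total_ms(rows_per_node, slope=4.217, intercept=349.251):
    """Evaluate the fitted line y = 4.217x + 349.251 from the chart."""
    return slope * rows_per_node + intercept

# Fixed cost with no rows fetched, and the cost at 1025 rows per node.
base_ms = predicted_total_ms(0)
full_ms = predicted_total_ms(1025)
print(base_ms, full_ms)
```

Read this way, the intercept is a fixed cost of roughly 349 ms per query, and each additional row fetched from every node adds roughly 4.2 ms to the total.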
[Chart: Query Performance. x-axis: number of nodes (0 to 1200); y-axis: query time in milliseconds (0 to 539.0, ticks every 53.9). Series: Average Query Time, Predicted Max (Layers), Predicted Min (Layers)]

Performance questions
The time to propagate a query to all of the nodes in the database, as a function of the number of database nodes (N);
The time to fetch data from across the nodes of the database to a single node, as a function of the volume of data;
The time to fetch data from across the database to multiple nodes concurrently querying, as a function of the number of nodes concurrently querying.

Graph metrics
The eccentricity ε(νi) of a graph vertex νi is the maximum graph distance between νi and any other vertex νj of G, i.e. the "longest shortest path" from νi to any other vertex of the graph.
The maximum eccentricity is the graph diameter Gd. The minimum eccentricity is the graph radius Gr.
We define the size of G as the number of vertices N, and the number of connections at each vertex νi as the vertex degree δi (1 ≤ i ≤ N).
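
These definitions can be computed directly with a breadth-first search from every vertex. The sketch below assumes an unweighted, connected, undirected graph stored as an adjacency dictionary; the example graph (a five-vertex path) and the function names are illustrative.

```python
from collections import deque

def distances_from(graph, start):
    """Shortest hop count from start to every reachable vertex (BFS)."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for w in graph[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist

def eccentricities(graph):
    """Eccentricity of each vertex: its largest shortest-path distance."""
    return {v: max(distances_from(graph, v).values()) for v in graph}

# Illustrative graph: a path 0 - 1 - 2 - 3 - 4.
graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}

ecc = eccentricities(graph)
diameter = max(ecc.values())   # Gd: maximum eccentricity
radius = min(ecc.values())     # Gr: minimum eccentricity
degree = {v: len(neighbours) for v, neighbours in graph.items()}
print(ecc, diameter, radius, degree)
```

On the path graph the end vertices have the largest eccentricity and the middle vertex the smallest, so a query issued from the middle reaches every node in the fewest hops.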

Biologically inspired self-organisation
[Chart: graph dimension (edges) against number of nodes N (0 to 1000). Series: Radius, Diameter, (1+e)ln(N), (1-e)ln(N)]
Network growth by preferential attachment
Using a fitness function at each node
Limit maximum vertex degree = 10
Gd = nint [ (1+e) * ln(N) ]
Gr = nint [ (1-e) * ln(N) ]
e = 0.24
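
A short sketch of the two formulas above, reading nint as "nearest integer" and ln as the natural logarithm. The function names are illustrative, and the rounding helper is an assumption about how nint treats halves (they round up here).

```python
import math

E = 0.24  # the constant e quoted on the slide

def nint(x):
    """Nearest integer, with halves rounded up (assumed convention)."""
    return int(math.floor(x + 0.5))

def predicted_diameter(n, e=E):
    """Gd = nint[(1 + e) * ln(N)]"""
    return nint((1 + e) * math.log(n))

def predicted_radius(n, e=E):
    """Gr = nint[(1 - e) * ln(N)]"""
    return nint((1 - e) * math.log(n))

gd = predicted_diameter(1025)
gr = predicted_radius(1025)
print(gd, gr)
```

Because both quantities grow with ln(N), doubling the number of nodes adds less than one hop to the predicted diameter.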

Query propagation time
The predicted maximum (Tmax) and minimum (Tmin) times to execute the flood query are:
TL = link latency
Tp = processor delay
Tmax = (Gd + 1)(TL + Tp)
Tmin = (Gr + 1)(TL + Tp)
with the predicted execute query time from any node (Tν) being:
Tν = (ε(ν) + 1)(TL + Tp)
Hence, substituting ε(ν) = nint[B * ln(N)], where (1 - e) ≤ B ≤ (1 + e):
Tν = nint[1 + B * ln(N)] * (TL + Tp)
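
The timing model can be worked through numerically. The sketch below assumes a per-hop cost TL + Tp of 53.9 ms; that value is not stated in the text and is only inferred from the tick spacing on the measured charts, so treat it as a placeholder. The function names are illustrative.

```python
import math

def nint(x):
    """Nearest integer, with halves rounded up (assumed convention)."""
    return int(math.floor(x + 0.5))

def flood_time_bounds_ms(n, hop_ms, e=0.24):
    """Return (Tmin, Tmax) for a flood query over n nodes.

    hop_ms is TL + Tp, the combined link latency and processor delay per hop.
    """
    gd = nint((1 + e) * math.log(n))  # predicted diameter
    gr = nint((1 - e) * math.log(n))  # predicted radius
    return (gr + 1) * hop_ms, (gd + 1) * hop_ms

def node_time_ms(eccentricity, hop_ms):
    """T for a query issued at a node with the given eccentricity."""
    return (eccentricity + 1) * hop_ms

HOP_MS = 53.9  # assumption: inferred from chart tick spacing, not from the text
t_min, t_max = flood_time_bounds_ms(1025, HOP_MS)
print(t_min, t_max)
```

Under that assumption the bounds for 1025 nodes come out at 6 and 10 hop times respectively, and any individual node's query time falls between them according to its eccentricity.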

Measured query propagation
[Chart: Individual Query Time Scalability. x-axis: number of nodes (0 to 1200); y-axis: query time in ms (0 to 592.9, ticks every 53.9). Series: Average Query Time, Predicted Max (Diameter + 1), Predicted Min (Radius + 1), Queried node eccentricity + 1]
[Chart: Individual Query Time Scalability. x-axis: number of nodes (0 to 100); y-axis: query time in ms (0 to 323.4, ticks every 53.9). Series: Individual Query Times, Average Query Time, Queried node eccentricity + 1]

Measured data fetch
[Chart: Query time to fetch 1 million rows. x-axis: total rows fetched (0 to 1,200,000); y-axis: time in milliseconds (0 to 6000). Series: Total Query Time 1025 nodes, Total Query Time 1 node, Total Query Time 1 node indexed, Linear (Total Query Time 1025 nodes): y = 4.217x + 349.251, Linear (Total Query Time 1 node): y = 1.7383x + 678.141]

Smart Metering
centralised
write

Smart Metering
centralised
read

Smart Metering
distributed federated
write

Smart Metering
distributed federated
read

http://www.alphaworks.ibm.com/tech/gaiandb

Image credits
Background: YouTube video “The Internet of Things”, IBM
http://www.youtube.com/watch?v=sfEbMV295Kk
Icons: DB and envelope icons, Tim Morgan
http://flickr.com/photos/timothymorgan/sets/1615269
Microsoft Excel icon, Vincent Garnier (courtesy of IconArchive)
http://iconarchive.com/show/softdimension-icons-by-benjigarner/Excel-icon.html
Photo of car mechanics, Tomas
http://flickr.com/photos/tma/2264878
All other images original from GaianDB work
