GaianDB
A dynamic distributed federated database
Dale Lane (@dalelane)

A presentation I gave on GaianDB, a dynamic distributed federated database available on IBM alphaWorks.

The presentation won't make a lot of sense without speaker notes, which I've not written yet. Sorry about that.


  1. GaianDB: A dynamic distributed federated database. Dale Lane, @dalelane
  2. A massively over-simplified view of data-warehousing...
  3. The “Internet of Things”
  4. GaianDB: a dynamic distributed federated database
  5. Federated data
  6. Network of distributed databases
  7. A dynamic network
  8. A dynamic network: Biologically-Inspired Self-Organisation
     • Exploit natural selection in nature to build better networks
     • Robust self-organizing network architectures
     • Frameworks and algorithms for robust fault-tolerant information dissemination
     • Robust communications with minimal complexity or human control
  9. Gaian database
     [Figure: network diagram of database nodes N0 to N11, repeated in four frames labelled “SQL Query” (three times) and “SQL Queries”]
     • Queries are routed to all database nodes: a flood query, but one that retrieves only the data required to satisfy the query
     • Exchanges query traffic in the network for data traffic, aiming to minimize total traffic
     • Predicated on a ‘store data locally, read data from anywhere’ paradigm
  10. Architecture
      [Figure: architecture diagram showing an expanded GaianDB node from the N0 to N11 network, linked to other GaianDB nodes]
      • Derby Engine: parsing, compilation, execution
      • GaianPStmtNode VTI: executes queries on physical leaf nodes, and propagates the original SQL (plus queryID and steps state info) to linked Gaian nodes
      • Diagram labels: instantiates; invokes costing methods; pushes columns and ‘where’ clause in a structure; original SQL; propagate
      • Data sources shown: MQ(tt) Stream Data, DB2, Oracle, MS SQLServer, Sybase, MySQL, flat files, in-memory tables, Derby, text index, Derby tables
      • Multithreaded, breadth-first query propagation
      • Loop detection/handling: no duplicates
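
The breadth-first propagation and loop detection named on the architecture slide can be illustrated with a minimal sketch. This is not GaianDB code: the function `flood_query`, the four-node `LINKS` topology and the `LOCAL_ROWS` data are all hypothetical, and the sketch is single-threaded where the slide describes multithreaded propagation.

```python
from collections import deque


def flood_query(links, start, fetch_local):
    """Visit every reachable node once, breadth-first, and gather its rows."""
    seen = {start}            # loop detection: nodes that already hold the query
    hops = {start: 0}         # number of links between the start node and each node
    frontier = deque([start])
    rows = []
    while frontier:
        node = frontier.popleft()
        rows.extend(fetch_local(node))      # run the query on this node's own data
        for neighbour in links[node]:
            if neighbour not in seen:       # skip nodes reached by another path
                seen.add(neighbour)
                hops[neighbour] = hops[node] + 1
                frontier.append(neighbour)
    return rows, hops


# Hypothetical four-node ring; the cycle would cause duplicates without `seen`.
LINKS = {
    "N0": ["N1", "N3"],
    "N1": ["N0", "N2"],
    "N2": ["N1", "N3"],
    "N3": ["N2", "N0"],
}
LOCAL_ROWS = {"N0": [1], "N1": [2], "N2": [3], "N3": [4]}

rows, hops = flood_query(LINKS, "N0", lambda node: LOCAL_ROWS[node])
```

Each node contributes its rows exactly once even though the ring offers two routes to N2, and `hops` records how many layers the query crossed to reach each node.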
  11. Performance – with 1,250 nodes
      [Chart: “Query time for 1025 nodes, fetching up to 1025 rows from each”. x-axis: rows fetched per node (0 to 1200); y-axis: time in milliseconds (0 to 6000). Series: Query Execute Time, Total Query Time, Linear (Total Query Time). Trend line: y = 4.217x + 349.251]
      [Chart: “Query Performance”. x-axis: number of nodes (0 to 1200); y-axis: query time in milliseconds (0.0 to 539.0). Series: Average Query Time, Predicted Max (Layers), Predicted Min (Layers)]
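
As a worked reading of the trend line on the first performance chart, the sketch below evaluates y = 4.217x + 349.251. It assumes x is rows fetched per node and y is total query time in milliseconds, as the axis labels suggest; the function name is hypothetical.

```python
SLOPE_MS = 4.217        # trend-line slope quoted on the slide
INTERCEPT_MS = 349.251  # trend-line intercept quoted on the slide


def predicted_total_query_ms(rows_per_node):
    """Evaluate the slide's linear trend line for a given rows-per-node count."""
    return SLOPE_MS * rows_per_node + INTERCEPT_MS


estimate_ms = predicted_total_query_ms(1025)
print(f"{estimate_ms:.3f} ms")
```

Under those assumptions, fetching 1025 rows from each node comes out at roughly 4.7 seconds, with the intercept of about 0.35 seconds being the cost of a query that returns no rows.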
  12. Performance questions
      • The time to propagate a query to all of the nodes in the database, as a function of the number of database nodes (N)
      • The time to fetch data from across the nodes of the database to a single node, as a function of the volume of data
      • The time to fetch data from across the database to multiple nodes concurrently querying, as a function of the number of nodes concurrently querying
  13. Graph metrics
      • The eccentricity ε(νi) of a graph vertex νi is the maximum graph distance between νi and any other vertex νj of G, i.e. the "longest shortest path" between νi and any other vertex νj of the graph.
      • The maximum eccentricity is the graph diameter Gd.
      • The minimum eccentricity is the graph radius Gr.
      • We define the size of G as the number of vertices N, and the number of connections at each vertex as the vertex degree δi (1 ≤ i ≤ N).
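
The definitions on the graph metrics slide can be checked on a small example. The sketch below is an illustration only: it assumes an undirected, connected graph stored as an adjacency dictionary, and the helper names and the five-vertex path graph are hypothetical.

```python
from collections import deque


def distances_from(graph, source):
    """Shortest-path length (in edges) from `source` to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in graph[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def eccentricity(graph, v):
    """Longest shortest path from v to any other vertex."""
    return max(distances_from(graph, v).values())


def radius_and_diameter(graph):
    """Minimum and maximum eccentricity over all vertices."""
    ecc = [eccentricity(graph, v) for v in graph]
    return min(ecc), max(ecc)


# Hypothetical path graph a-b-c-d-e.
PATH = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "e"], "e": ["d"]}

radius, diameter = radius_and_diameter(PATH)
degrees = {v: len(neighbours) for v, neighbours in PATH.items()}
```

For this path graph the end vertices have eccentricity 4 and the middle vertex has eccentricity 2, so the diameter is 4 and the radius is 2.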
  14. Biologically inspired self-organisation
      [Chart: x-axis: Number of Nodes (N), 0 to 1000; y-axis: Graph Dimension (edges), 0 to 10. Series: Radius, Diameter, (1+e)ln(N), (1-e)ln(N)]
      • Network growth by preferential attachment
      • Using a fitness function at each node
      • Limit maximum vertex degree = 10
      • Gd = nint[(1+e) * ln(N)]
      • Gr = nint[(1-e) * ln(N)]
      • e = 0.24
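
The diameter and radius formulas on this slide are easy to evaluate directly. The sketch below assumes that nint means "nearest integer" with halves rounded up, which the slide does not specify; the function names are hypothetical.

```python
import math

E = 0.24  # value of e quoted on the slide


def nint(x):
    """Nearest integer; rounding halves up is an assumption, not from the slide."""
    return math.floor(x + 0.5)


def predicted_diameter(n):
    """Gd = nint[(1 + e) * ln(N)]"""
    return nint((1 + E) * math.log(n))


def predicted_radius(n):
    """Gr = nint[(1 - e) * ln(N)]"""
    return nint((1 - E) * math.log(n))


gd_1000 = predicted_diameter(1000)
gr_1000 = predicted_radius(1000)
```

For N = 1000, ln(N) is about 6.91, so the formulas give a diameter of 9 and a radius of 5, which sits within the 0 to 10 range of the chart's y-axis.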
  15. Query propagation time
      The predicted maximum (Tmax) and minimum (Tmin) times to execute the flood query are:
        Tmax = (Gd + 1)(TL + Tp)
        Tmin = (Gr + 1)(TL + Tp)
      where TL = link latency and Tp = processor delay.
      The predicted execute query time from any node ν (Tν) is:
        Tν = (ε(ν) + 1)(TL + Tp)
      Hence, substituting ε(ν) = nint[B * ln(N)], with B between (1-e) and (1+e):
        Tν = nint[1 + B * ln(N)] * (TL + Tp)
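
A short numeric sketch of the propagation-time formulas follows. The link latency and processor delay values are hypothetical placeholders, not measurements from the presentation, and the diameter, radius and eccentricity used are example inputs.

```python
def propagation_time_ms(layers, link_latency_ms, processor_delay_ms):
    """T = (layers + 1) * (TL + Tp), the form used for Tmax, Tmin and Tν."""
    return (layers + 1) * (link_latency_ms + processor_delay_ms)


# Hypothetical per-hop costs, chosen only to make the arithmetic easy to follow.
TL_MS = 40.0
TP_MS = 10.0

# Example graph dimensions: diameter 9, radius 5, queried-node eccentricity 7.
t_max = propagation_time_ms(9, TL_MS, TP_MS)
t_min = propagation_time_ms(5, TL_MS, TP_MS)
t_node = propagation_time_ms(7, TL_MS, TP_MS)
```

With these placeholder values Tmax is 500 ms and Tmin is 300 ms, and a query issued from a node of eccentricity 7 falls between them at 400 ms, as the formulas require for any node.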
  16. Measured query propagation
      [Chart: “Individual Query Time Scalability”. x-axis: number of nodes (0 to 1200); y-axis: query time in ms (0.0 to 592.9). Series: Average Query Time, Predicted Max (Diameter+1), Predicted Min (Radius+1), Queried node eccentricity+1]
      [Chart: “Individual Query Time Scalability”. x-axis: number of nodes (0 to 100); y-axis: query time in ms (0 to 323.4). Series: Individual Query Times, Average Query Time, Queried node eccentricity+1]
  17. Measured data fetch
      [Chart: “Query time to fetch 1 million rows”. x-axis: total rows fetched (0 to 1200000); y-axis: time in milliseconds (0 to 6000). Series: Total Query Time 1025 nodes, Total Query Time 1 node, Total Query Time 1 node indexed, Linear (Total Query Time 1025 nodes), Linear (Total Query Time 1 node). Trend lines: y = 4.217x + 349.251 and y = 1.7383x + 678.141]
  18. Example uses
  19. Smart Metering: centralised write
  20. Smart Metering: centralised read
  21. Smart Metering: distributed federated write
  22. Smart Metering: distributed federated read
  23. Other uses...
  24. http://www.alphaworks.ibm.com/tech/gaiandb
  25. Image credits
      • Background: YouTube video “The Internet of Things”, IBM. http://www.youtube.com/watch?v=sfEbMV295Kk
      • Icons: DB and envelope icons, Tim Morgan. http://flickr.com/photos/timothymorgan/sets/1615269
      • Microsoft Excel icon, Vincent Garnier (courtesy of IconArchive). http://iconarchive.com/show/softdimension-icons-by-benjigarner/Excel-icon.html
      • Photo of car mechanics, Tomas. http://flickr.com/photos/tma/2264878
      • All other images original from GaianDB work
