Datomic R-trees

James Sofra
@sofra
https://github.com/jsofra/datomic-rtree
Slides for a talk given at Melbourne Functional Users Group on an R-tree based spatial indexer for Datomic.

Transcript of "Datomic R-trees"

  1. Datomic R-trees
     James Sofra, @sofra
     https://github.com/jsofra/datomic-rtree
  2. Summary
     ● Motivations
     ● Datomic overview
     ● Datomic R-tree implementation
     ● Hilbert Curves
     ● Bulk loading (via Hilbert Curves)
     ● Future plans
  3. Motivations
     ● I have an interest in geospatial applications
       – e.g. Thunderstorm probability application (THESPA)
     ● Datomic is an interesting database that makes different trade-offs to other databases
       – Wonder how far we can take the ability to describe arbitrary structures in Datomic
  4. Why don't we have both?
  5. Datomic Overview
     ● Immutable database
     ● Time-based facts (stored as entities)
     ● ACID transactions
     ● Expressive queries using Datalog
     ● Pluggable storage
     ● Flexible enough to act as a row, column or graph database
     ● Schema that describes attributes that can be attached to entities
       – Attributes have a type: String, Long, Double, Inst, Ref, etc.
     ● Database functions
       – Stored in the database; they see the in-transaction value
  6. Datomic Overview Architecture
  7. Datomic Motivations
     ● Things that make Datomic appealing for spatial data
       – The time-based nature of Datomic is useful for time series data, which we often have
       – No need to add spatial operations (union, intersection, etc.) to the database; they can be handled by libraries in the peers
       – Spatial indexes can be stored as regular data, which allows a lot of freedom over the choice of index and over handling multiple indexes on subsets of the data in space and time
       – Flexible entity structures are useful because spatial data frequently does not fit nicely in a table
       – Immutability is surprisingly useful in lots of different applications!
  8. R-trees
     ● Efficient query of multi-dimensional data
     ● Groups nearby objects
     ● Balanced (all leaf nodes at the same level)
     ● Aims for nodes that minimise empty-space coverage and overlap
     ● Designed for storage on disk (as used in databases)
     ● "R-Trees: A Dynamic Index Structure for Spatial Searching" – Guttman, A. (1984)
  9. R-trees - Insertions
     ● Choose a leaf node to insert into
     ● Insert the entry into the leaf node and enlarge the node
     ● If the node has more than the max number of children, split the node and propagate enlargement and splits up the tree
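The insertion steps above can be sketched in a few lines. This is a minimal in-memory illustration in Python, not the Datomic implementation from the talk: nodes are plain dicts whose keys loosely mirror the schema attributes on the next slide, bounding boxes are `(min_x, min_y, max_x, max_y)` tuples, and the least-enlargement rule for choosing a leaf is assumed from Guttman's paper. Splitting is left out; `insert` only reports whether a split would be needed.

```python
def area(b):
    """Area of a bounding box (min_x, min_y, max_x, max_y)."""
    return (b[2] - b[0]) * (b[3] - b[1])


def union(a, b):
    """Smallest bounding box covering both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def enlargement(node_bbox, entry_bbox):
    """Extra area node_bbox needs in order to cover entry_bbox."""
    return area(union(node_bbox, entry_bbox)) - area(node_bbox)


def choose_leaf(root, entry_bbox):
    """Walk down from the root, always taking the child that needs the least
    enlargement (ties broken by smaller area). Returns the path root..leaf."""
    node, path = root, [root]
    while not node["is_leaf"]:
        node = min(
            node["children"],
            key=lambda c: (enlargement(c["bbox"], entry_bbox), area(c["bbox"])),
        )
        path.append(node)
    return path


def insert(root, entry_bbox, max_children):
    """Add the entry to the chosen leaf and enlarge every node on the path.
    Returns True when the leaf now overflows and would need splitting."""
    path = choose_leaf(root, entry_bbox)
    leaf = path[-1]
    leaf["children"].append(entry_bbox)
    for node in path:
        node["bbox"] = union(node["bbox"], entry_bbox)
    return len(leaf["children"]) > max_children
```

A leaf's `children` list holds entry bounding boxes directly, which keeps the sketch short; in the Datomic version entries are separate entities referenced from the node.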
  10. Datomic R-tree - Schema
      :rtree/root          :db.type/ref
      :rtree/max-children  :db.type/long
      :rtree/min-children  :db.type/long
      :node/children       :db.type/ref
      :node/is-leaf?       :db.type/boolean
      :node/entry          :db.type/ref
      :bbox/min-x          :db.type/double
      :bbox/min-y          :db.type/double
      :bbox/max-x          :db.type/double
      :bbox/max-y          :db.type/double
  11. Datomic R-tree - choose-leaf
  12. Datomic R-tree - split-node
  13. Datomic R-tree - pick-seeds
  14. Datomic R-tree - pick-next
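The code shown on slides 11 to 14 did not survive in the transcript. The names pick-seeds and pick-next match the steps of the quadratic split in Guttman's paper, so, assuming that is the algorithm intended, a rough Python sketch of those two steps might look like this. The function bodies are an illustration of Guttman's description, not the code from the slides.

```python
from itertools import combinations


def area(b):
    """Area of a bounding box (min_x, min_y, max_x, max_y)."""
    return (b[2] - b[0]) * (b[3] - b[1])


def union(a, b):
    """Smallest bounding box covering both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def pick_seeds(bboxes):
    """Pick the pair that would waste the most area if kept in one node.
    These two become the first members of the two new groups."""
    def waste(pair):
        a, b = pair
        return area(union(a, b)) - area(a) - area(b)

    return max(combinations(bboxes, 2), key=waste)


def pick_next(remaining, group_a_bbox, group_b_bbox):
    """Pick the remaining entry with the strongest preference for one group,
    i.e. the largest difference between the enlargements each group needs."""
    def preference(b):
        d_a = area(union(group_a_bbox, b)) - area(group_a_bbox)
        d_b = area(union(group_b_bbox, b)) - area(group_b_bbox)
        return abs(d_a - d_b)

    return max(remaining, key=preference)
```

A full split would call `pick_seeds` once, then call `pick_next` repeatedly, assigning each picked entry to the group that needs less enlargement while making sure both groups end up with at least min-children entries.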
  15. Datomic R-tree – regular transaction
      Transaction for adding new entry, calls database function
      Database function
      New entry with new ID
      Add new entry as child to leaf node
  16. Datomic R-tree – split transaction
      New entry
      Remove root
      Create new leaf nodes
      Add new root
  17. Bulk loading
      ● Issues with single-insertion loading of an R-tree
        – Becomes slow with many insertions
        – The resulting tree is not always as efficient as it could be
      ● Bulk loading builds a tree once from a number of entities
      ● Two basic approaches: top-down and bottom-up
      ● Bulk loading does not imply bulk insertion
  18. Bulk loading – sort based loading
      ● Aims for better R-tree performance
      ● Bottom-up approach
      ● Sorts all entities in an order that aims to preserve locality
      ● Partitions the entities into clusters that are (hopefully) spatially collocated
      ● Recursively applies partitioning to build up the tree
      ● “Sort-based Query-adaptive Loading of R-trees”
        – D. Achakeev; B. Seeger; P. Widmayer (2012)
      ● “Sort-based parallel loading of R-trees”
        – D. Achakeev; M. Seidemann; M. Schmidt; B. Seeger (2012)
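The sort, partition, recurse idea above can be sketched as follows. This is a simplified Python illustration under stated assumptions: entries are `(min_x, min_y, max_x, max_y)` tuples, the sort key is supplied by the caller, and partitioning is naive fixed-size chunking rather than the cost-based partitioning covered later in the talk. Nodes are plain dicts, and none of the names come from the datomic-rtree library.

```python
def union_all(bboxes):
    """Smallest bounding box covering every box in the list."""
    min_xs, min_ys, max_xs, max_ys = zip(*bboxes)
    return (min(min_xs), min(min_ys), max(max_xs), max(max_ys))


def chunks(items, size):
    """Split a list into consecutive runs of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def bulk_load(entries, max_children, sort_key):
    """Build a tree bottom-up: sort once, pack leaves, then pack each level
    into parents until a single root remains. Requires a non-empty list."""
    if max_children < 2:
        raise ValueError("max_children must be at least 2")
    ordered = sorted(entries, key=sort_key)
    level = [
        {"bbox": union_all(chunk), "children": chunk, "is_leaf": True}
        for chunk in chunks(ordered, max_children)
    ]
    while len(level) > 1:
        level = [
            {
                "bbox": union_all([n["bbox"] for n in group]),
                "children": group,
                "is_leaf": False,
            }
            for group in chunks(level, max_children)
        ]
    return level[0]
```

Fixed-size chunking ignores min-children (the last chunk can be small) and ignores how well each cluster fits together spatially, which is the gap the cost-based partitioning is meant to close.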
  19. Hilbert Curves
      ● A continuous fractal space-filling curve
      ● First described by mathematician David Hilbert in 1891
      ● Useful because it enables mapping from 2D to 1D while preserving some notion of locality
      ● Other options are: Peano curve, Z-order curve (aka Morton curve)
  20. Hilbert Curves
      ● A continuous fractal space-filling curve
      ● First described by mathematician David Hilbert in 1891
      ● Useful because it enables mapping from 2D to 1D while preserving some notion of locality
      ● Other options are: Peano curve, Z-order curve (aka Morton curve)
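To make the 2D to 1D mapping concrete, here is the widely used iterative conversion from grid coordinates to a distance along the Hilbert curve, written in Python. It assumes the coordinates have already been scaled to integers in an `n` by `n` grid where `n` is a power of two; how the library from the talk computes its Hilbert values is not shown in the transcript, so treat this as a generic sketch.

```python
def hilbert_index(n, x, y):
    """Distance along the Hilbert curve of cell (x, y) in an n x n grid.
    n must be a power of two and 0 <= x, y < n."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        # Each quadrant at this scale contributes s*s cells to the distance.
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip so the sub-quadrant is in the canonical orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d
```

Sorting cells by `hilbert_index` visits every cell exactly once, and each step moves to an adjacent cell, which is the locality property that makes the ordering useful for packing nearby entries into the same node.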
  21. Bulk loading – Hilbert sort based
      ● Better Hilbert partitioning
  22. Bulk loading via Hilbert curves
      ● Insert all entities into Datomic (or use existing entities)
      ● Entities include an indexed Hilbert value attribute
      ● Obtain a seq of the entities using the :avet index with the Hilbert value
      ● Perform partitioning
  23. Bulk - hilbert-ents
      Takes advantage of the Datomic index API to get direct access to the Hilbert index
  24. Bulk - min-cost-index
      List of options for the next partition point
      There must be at least min-children in the partition
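The slide code is not in the transcript, but the two annotations suggest a routine that scans the candidate cut points in the Hilbert-ordered sequence and picks the cheapest one that still leaves at least min-children on each side. The Python sketch below illustrates that idea only. The function names echo the slide titles, while the bodies and the cost measure (bounding-box area per entry, larger partitions preferred on ties) are assumptions made for illustration. It also assumes `min_children <= max_children // 2`.

```python
def area(b):
    """Area of a bounding box (min_x, min_y, max_x, max_y)."""
    return (b[2] - b[0]) * (b[3] - b[1])


def union_all(bboxes):
    """Smallest bounding box covering every box in the list."""
    min_xs, min_ys, max_xs, max_ys = zip(*bboxes)
    return (min(min_xs), min(min_ys), max(max_xs), max(max_ys))


def min_cost_index(ordered, min_children, max_children):
    """Size of the next partition to cut from the front of `ordered`.
    Candidates keep min_children in the partition and in the remainder."""
    n = len(ordered)
    if n <= max_children:
        return n  # everything left fits in one node
    candidates = range(min_children, min(max_children, n - min_children) + 1)
    # Hypothetical cost: area per entry; prefer the larger cut on ties.
    return min(candidates, key=lambda k: (area(union_all(ordered[:k])) / k, -k))


def cost_partition(ordered, min_children, max_children):
    """Greedily cut the ordered entries into partitions of minimal cost."""
    parts = []
    while ordered:
        k = min_cost_index(ordered, min_children, max_children)
        parts.append(ordered[:k])
        ordered = ordered[k:]
    return parts
```

A greedy cut only looks one partition ahead; the p-cost-partition and dyn-cost-partition slides that follow presumably explore other ways of making the same choice, though their details are not recoverable from the transcript.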
  25. Bulk - cost-partition
  26. Bulk - p-cost-partition
  27. Bulk - dyn-cost-partition
  28. Conclusions
      ● It works!
        (install-single-insertions conn 50000 20 10)
        – "Elapsed time: 119114.342783 msecs"
        (install-and-bulk-load conn 50000 20 10)
        – "Elapsed time: 6511.543299 msecs"
        (time (naive-intersecting all-entries search-box))
        – "Elapsed time: 870.575802 msecs"
        (time (intersecting root search-box))
        – "Elapsed time: 2.927883 msecs"
      * Note: these times should be regarded with suspicion since they only use the in-memory database
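The gap between the last two timings comes from pruning: a tree search skips every subtree whose bounding box misses the search box, while the naive version tests every entry. A small Python sketch of both, using plain dicts for nodes and `(min_x, min_y, max_x, max_y)` tuples for boxes, shows the difference in shape. The function names echo the calls on the slide, but the bodies are assumptions, not the library's code.

```python
def intersects(a, b):
    """True when two bounding boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def naive_intersecting(entries, search_box):
    """Test every entry against the search box."""
    return [e for e in entries if intersects(e, search_box)]


def intersecting(node, search_box):
    """Descend only into nodes whose bounding box meets the search box."""
    if not intersects(node["bbox"], search_box):
        return []  # the whole subtree is pruned here
    if node["is_leaf"]:
        return [e for e in node["children"] if intersects(e, search_box)]
    found = []
    for child in node["children"]:
        found.extend(intersecting(child, search_box))
    return found
```

Both functions return the same entries; the tree version simply avoids looking at most of them when the search box is small relative to the data.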
  29. Future plans
      ● Retractions and updates
      ● Bulk insertions
      ● More search and query support
      ● Schema for supporting Meridian Shapes and Features
      ● Investigate other R-trees: R* tree, R+ tree
  30. Questions?
      Thank you! Any questions?
      James Sofra, @sofra
  31. Other Interesting Resources
      ● "The R*-tree: an efficient and robust access method for points and rectangles"
      ● “OMT: Overlap Minimizing Top-down Bulk Loading Algorithm for R-tree”
        – T. Lee; S. Lee (2003)
      ● “The Priority R-Tree: A Practically Efficient and Worst-Case Optimal R-Tree”
        – L. Arge; M. de Berg; K. Yi (2004)
      ● “Compact Hilbert Indices”
        – Hamilton, C. (2006)
      ● “R-Trees: Theory and Applications”
        – Manolopoulos, Y.; Nanopoulos, A.; Papadopoulos, A. N.; Theodoridis, Y. (2006)