FluidDB
Design ⊳ Architecture ⊳ Tradeoffs

           Terry Jones

           terry@jon.es
           @terrycojones
Or...
          How someone
who knows nothing about databases
    started building a database
Early days

FluidDB is not yet released

It’s an experiment

Much remains to be done

Alpha release real soon now (end of ...
Fluidinfo

Founded in the UK in March 2006

Two full-time employees

Development in Barcelona

Friends, family, & Esther D...
Motivations

How humans work with information: Concepts

User interface and API difficulties / restrictions

Personalizatio...
Concepts
             Are not owned
         No formal structure
       No permission is needed
     Partial or full disor...
Concepts
               Are not owned
           No formal structure
         No permission is needed
       Partial or fu...
UIs and APIs
Where did my information go?
Can I extract, re-use, add, delete?
Is there an API?
What am I allowed to do?
Ca...
Personalization
In our hands vs on our behalf
 Add anything to anything, and search on it

 Protect, share, delete as you ...
Read ⊳ Search ⊳ Write ?

 The web is readable (c. 1990)
 And searchable (c. 1995)
 But not generally writable
Read ⊳ Search ⊳ Write ?

 The web is readable (c. 1990)
 And searchable (c. 1995)
 But not generally writable

 Plus, sear...
Why don’t our architectures
let us work with information
        more flexibly?
What would it take
to build something that did?
How can we address
all these problems at once ?
Alan Kay
At PARC we had a slogan: “Point of view is worth 80 IQ
points.” It was based on a few things from the past like
h...
Alan Kay
At PARC we had a slogan: “Point of view is worth 80 IQ
points.” It was based on a few things from the past like
h...
1             1
     2            10
     4           100
     8          1000
   16          10000
   32         100000
 ...
VIII
 XVII
 XLIV
LXXX
XCVI
CCLV
VIII
 XVII
 XLIV
LXXX
XCVI
CCLV +
VIII
 XVII
 XLIV
LXXX
XCVI
CCLV +
   D
VIII
 XVII
 XLIV
LXXX
XCVI
CCLV +
   D
8 queens problem
Use a poor representation with
64^8 (281,474,976,710,656) states:

Look for a smart algorithm!


         ...
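The full text of this slide (see the transcript) contrasts the poor representation with one that has only 8! (40,320) states and can be searched exhaustively. A minimal sketch of that idea, assuming the "good representation" is a permutation of column indices, one queen per row; the function name is made up for illustration and is not from the talk.

```python
from itertools import permutations


def queens_solutions(n=8):
    """Yield each placement of n non-attacking queens as a tuple of columns.

    The tuple index is the row and the value is the column. Because the
    tuple is a permutation, no two queens can share a row or a column,
    so only the diagonals still need to be checked.
    """
    for cols in permutations(range(n)):
        # Queens share a diagonal when row + col or row - col coincide.
        rising = {row + col for row, col in enumerate(cols)}
        falling = {row - col for row, col in enumerate(cols)}
        if len(rising) == n and len(falling) == n:
            yield cols


solutions = list(queens_solutions(8))
print(len(solutions))  # number of distinct solutions found by brute force
```

The search visits all 40,320 permutations with no pruning at all; the representation does the work that a clever algorithm would otherwise have to do.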
Objects
      digg.com/date May 21, 2009
         meg/web/rating 6
         tim/seen true
   about http://digg.com/news.ht...
What else?
Objects have no owner, though attributes do
No metadata/data duality
Unlimited aggregation & combination of inf...
FluidDB
Objects composed of attributes with values

Attributes in namespaces: joe/rating, amazon.com/price

Simple query l...
FluidDB

Single-instance hosted database (like SimpleDB)

Enables sharing between applications, people

Distributed storag...
Design goals
Simple data model
Simple permissions model
Simple, easily parallelizable, query language
Horizontally scalabl...
Permissions

For each action on a namespace or attribute:

  There’s a policy: ‘open’ or ‘closed’

  And a (possibly empty...
Permissions
attribute or
               action   policy   exceptions
namespace
  tim/seen      read    closed    tim, meg
...
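The policy-plus-exceptions rule can be sketched in a few lines. This is an illustration only, not FluidDB code: it assumes that an exception inverts the policy for the listed users (open means everyone except the exceptions, closed means nobody except the exceptions), and it reads the flattened rows in the transcript of this slide as having an empty exception list where none is shown. The names `PERMISSIONS` and `is_allowed` are hypothetical.

```python
# (attribute or namespace, action) -> (policy, exceptions)
# Rows follow the transcript of the Permissions slide; empty sets are an
# assumption where the flattened table shows no exceptions.
PERMISSIONS = {
    ("tim/seen", "read"): ("closed", {"tim", "meg"}),
    ("mike/opinion", "update"): ("open", set()),
    ("mike/", "create"): ("closed", {"mike"}),
    ("meg/rating", "see"): ("open", set()),
    ("meg/rating", "read"): ("closed", {"meg"}),
}


def is_allowed(user, path, action):
    """Return True if user may perform action on the attribute or namespace."""
    policy, exceptions = PERMISSIONS[(path, action)]
    if policy == "open":
        # Open: everyone is allowed unless listed as an exception.
        return user not in exceptions
    # Closed: nobody is allowed unless listed as an exception.
    return user in exceptions


print(is_allowed("meg", "tim/seen", "read"))
print(is_allowed("mike", "tim/seen", "read"))
```

Under this reading, a single lookup and one set-membership test decide every request, which fits the "simple permissions model" design goal.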
Query language
Numeric: attribute value (=, <, etc.)
Textual: attribute text match
Presence: has attribute
Grouping/logic:...
Query language
   Numeric: attribute value (=, <, etc.)
   Textual: attribute text match
   Presence: has attribute
   Gro...
Demo
Data buffet
username seen, lastvisited, rating, goingtoread, comment
username me, FBfriend, linkedInContact, met, family
u...
Data buffet 2
email messageId, fromId, toId
username/email
 from, to, subject, date
alexa.com rank
digg.com title, descrip...
Example queries
terry/rating > 5 and has reddit.com/score

has goingtoread and seen > "January 1, 2008"

has FBfri...
More queries
terry/seen > "July 1, 2007"

has russell/myusername/name and not has
terry/myusername/name

flickr.com...
Architecture
 Permissions
 Software
 Communications
 Functional
 Storage
 Query processing
 Per box
Twisted

Asynchronous Python libraries

We’re using Python because of Twisted

Event-driven, using deferreds with callback...
AMQP
Standardized, high-performance messaging

Backed by JP Morgan, Novell, MS, Red Hat

Fully asynchronous, up to 600K mess...
Thrift
Released by Facebook

Serialization of structures

Definition of services

RPC

We released txThrift
Communications

     AMQP
Functional
[Diagram built up over nine slides. Labels: apps, http, facade, objects, sets, namespace, attr, text, coord, control; later builds add amqp, xmpp, other, load, memcache, db, kv.]
Attribute Storage
meg/rating                  tim/books/opinion
object id user id   value   object id user id    value
123...
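The full slide text (see the transcript) describes PostgreSQL "tall tables", one independent table per attribute. A minimal sketch of that layout, using Python's built-in sqlite3 module as a stand-in for PostgreSQL. The rows are the ones shown on the slide; the table and column names and the `object_ids` helper are assumptions made for illustration.

```python
import sqlite3

# Rows as shown on the slide: (object id, user id, value).
ROWS = {
    "meg/rating": [
        (1234567, 667, 26),
        (6527527, 667, 188),
        (2876281, 17, 207),
        (7628876, 667, 1225),
    ],
    "tim/books/opinion": [
        (526141, 362, "nice"),
        (726483, 362, "fun"),
        (635378, 362, "boring"),
        (477582, 362, "sexy"),
        (362782, 362, "long"),
    ],
}

conn = sqlite3.connect(":memory:")
for attribute, rows in ROWS.items():
    # One tall, narrow table per attribute; quoted because names contain '/'.
    conn.execute(
        f'CREATE TABLE "{attribute}" (object_id INTEGER, user_id INTEGER, value)'
    )
    conn.executemany(f'INSERT INTO "{attribute}" VALUES (?, ?, ?)', rows)


def object_ids(conn, attribute, where="1 = 1", params=()):
    """Return the sorted object ids in one attribute table matching a condition."""
    sql = f'SELECT object_id FROM "{attribute}" WHERE {where} ORDER BY object_id'
    return [row[0] for row in conn.execute(sql, params)]


high_ratings = object_ids(conn, "meg/rating", "value > ?", (200,))
has_opinion = object_ids(conn, "tim/books/opinion")
print(high_ratings)
print(len(has_opinion))
```

Because each attribute lives in its own table, a comparison or presence test touches only that attribute's rows, and different attributes can in principle be placed on different machines.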
Query processing

                       and

digg/date > “Monday”     or

        meg/rating > 5        has tim/seen
Query processing
                       attr   set ops

digg/date > “Monday”


meg/rating > 5


has tim/seen
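The two slides above suggest that each attribute condition yields a set of object ids and that `and` / `or` combine those sets. A minimal sketch of that evaluation, assuming an in-memory dictionary per attribute. The sample data, the tuple-based query tree, and the `evaluate` function are invented for illustration, and `not` is left out because it needs a definition of the universe of objects.

```python
import operator

# Hypothetical per-attribute stores: attribute -> {object id: value}.
STORE = {
    "digg/date": {1: "2009-05-18", 2: "2009-05-21", 3: "2009-05-22"},
    "meg/rating": {1: 6, 3: 2, 4: 9},
    "tim/seen": {3: True, 5: True},
}


def evaluate(node):
    """Evaluate a query tree and return the set of matching object ids."""
    kind = node[0]
    if kind == "has":
        # Presence test: every object that carries the attribute.
        return set(STORE[node[1]])
    if kind == "cmp":
        _, attribute, op, value = node
        return {oid for oid, v in STORE[attribute].items() if op(v, value)}
    if kind == "and":
        return evaluate(node[1]) & evaluate(node[2])
    if kind == "or":
        return evaluate(node[1]) | evaluate(node[2])
    raise ValueError(f"unknown node type: {kind!r}")


# digg/date > <date> and (meg/rating > 5 or has tim/seen)
query = (
    "and",
    ("cmp", "digg/date", operator.gt, "2009-05-18"),
    ("or", ("cmp", "meg/rating", operator.gt, 5), ("has", "tim/seen")),
)
result = evaluate(query)
print(result)
```

Each leaf reads a single attribute store, so the leaves could run in parallel on different machines, with only sets of object ids travelling between them.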
Attribute affinity
                       attr   set ops

digg/date > “Monday”




meg/rating > 5
has tim/seen
Per box
A controller service, launched on boot

The controller launches new services (processes)

All services talk AMQP a...
The design, architecture, and tradeoffs of FluidDB

Slides from a talk given on May 22, 2009 at PGCon in Ottawa. Abstract and more details at http://www.pgcon.org/2009/schedule/events/176.en.html

Video available shortly.

Published in: Technology
  • Crazy comment by 98545. We need new and inspired ways of thinking to drive innovation. 'Outside the box' as they say.
  • Slide 2: 'How someone who knows nothing about databases started building a database'

    ... by making all the same mistakes people already figured out 40 years ago. Welcome to the '60s, folks, network databases are back!

  • Transcript of "The design, architecture, and tradeoffs of FluidDB"

    1. 1. FluidDB Design ⊳ Architecture ⊳ Tradeoffs Terry Jones terry@jon.es @terrycojones
    2. 2. Or... How someone who knows nothing about databases started building a database
    3. 3. Early days FluidDB is not yet released It’s an experiment Much remains to be done Alpha release real soon now (end of June?)
    4. 4. Fluidinfo Founded in the UK in March 2006 Two full-time employees Development in Barcelona Friends, family, & Esther Dyson funded
    5. 5. Motivations How humans work with information: Concepts User interface and API difficulties / restrictions Personalization Make the world more writable
    6. 6. Concepts Are not owned No formal structure No permission is needed Partial or full disorganization No predefined set of qualities No special central piece of content Easy to reorganize or multiply organize
    7. 7. Concepts Are not owned No formal structure No permission is needed Partial or full disorganization No predefined set of qualities No special central piece of content Easy to reorganize or multiply organize To engineer this kind of flexibility, we must rethink control
    8. 8. UIs and APIs Where did my information go? Can I extract, re-use, add, delete? Is there an API? What am I allowed to do? Can I search? On what? Special pleading Wouldn't it be cool if...
    9. 9. Personalization In our hands vs on our behalf Add anything to anything, and search on it Protect, share, delete as you wish Organize things as you please Freedom of choice in applications
    10. 10. Read ⊳ Search ⊳ Write ? The web is readable (c. 1990) And searchable (c. 1995) But not generally writable
    11. 11. Read ⊳ Search ⊳ Write ? The web is readable (c. 1990) And searchable (c. 1995) But not generally writable Plus, search is not very interactive: Dull: refine query or ask for more results Can’t change results Can’t organize
    12. 12. Why don’t our architectures let us work with information more flexibly?
    13. 13. What would it take to build something that did?
    14. 14. How can we address all these problems at once ?
    15. 15. Alan Kay At PARC we had a slogan: “Point of view is worth 80 IQ points.” It was based on a few things from the past like how smart you had to be in Roman times to multiply two numbers together; only geniuses did it. We haven't gotten any smarter, we've just changed our representation system. We think better generally by inventing better representations; that's something that we as computer scientists recognize as one of the main things that we try to do.
    16. 16. Alan Kay At PARC we had a slogan: “Point of view is worth 80 IQ points.” It was based on a few things from the past like how smart you had to be in Roman times to multiply two numbers together; only geniuses did it. We haven't gotten any smarter, we've just changed our representation system. We think better generally by inventing better representations; that's something that we as computer scientists recognize as one of the main things that we try to do.
    17. 17. 1 1 2 10 4 100 8 1000 16 10000 32 100000 64 1000000 128 10000000 256 100000000 512 1000000000 1024 + 10000000000 + ?????? 11111111111
    18. 18. VIII XVII XLIV LXXX XCVI CCLV
    19. 19. VIII XVII XLIV LXXX XCVI CCLV +
    20. 20. VIII XVII XLIV LXXX XCVI CCLV + D
    21. 21. VIII XVII XLIV LXXX XCVI CCLV + D
    22. 22. 8 queens problem Use a poor representation with 64^8 (281,474,976,710,656) states: Look for a smart algorithm! Or... Use a good representation with 8! (40,320) states and exhaustive search as an algorithm!
    23. 23. Objects digg.com/date May 21, 2009 meg/web/rating 6 tim/seen true about http://digg.com/news.html mike/opinion “half-baked nonsense”
    24. 24. What else? Objects have no owner, though attributes do No metadata/data duality Unlimited aggregation & combination of information No a priori organization Search, search, search! Dynamic data structures
    25. 25. FluidDB Objects composed of attributes with values Attributes in namespaces: joe/rating, amazon.com/price Simple query language Simple API: HTTP, JSON, REST etc. Users, attributes, namespaces each have an object Applications are users too
    26. 26. FluidDB Single-instance hosted database (like SimpleDB) Enables sharing between applications, people Distributed storage and query processing
    27. 27. Design goals Simple data model Simple permissions model Simple, easily parallelizable, query language Horizontally scalable Fast for common tasks Implementable!
    28. 28. Permissions For each action on a namespace or attribute: There’s a policy: ‘open’ or ‘closed’ And a (possibly empty) list of exceptions
    29. 29. Permissions attribute or action policy exceptions namespace tim/seen read closed tim, meg mike/opinion update open mike/ create closed mike meg/rating see open meg/rating read closed meg
    30. 30. Query language Numeric: attribute value (=, <, etc.) Textual: attribute text match Presence: has attribute Grouping/logic: (...), and, or, not
    31. 31. Query language Numeric: attribute value (=, <, etc.) Textual: attribute text match Presence: has attribute Grouping/logic: (...), and, or, not Designed to bring back object ids
    32. 32. Demo
    33. 33. Data buffet username seen, lastvisited, rating, goingtoread, comment username me, FBfriend, linkedInContact, met, family username/myusername name, password flickr.com longitude, latitude, owner, date, camera/make username/flickr title, description VCspotter fredwilson, bradburnham nasdaq.com name, symbol, type, outstanding, value, price username/stocks shares, date google.com pagerank tracks album, artist, name, year username/music count, favorite, lastPlay, stars, bestOf2007
    34. 34. Data buffet 2 email messageId, fromId, toId username/email from, to, subject, date alexa.com rank digg.com title, description, date, diggs mahalo.com appeared, category readwriteweb.com appeared reddit.com date, score techcrunch.com appeared, URI attribute description, name, path namespace description, name, path
    35. 35. Example queries terry/rating > 5 and has reddit.com/score has goingtoread and seen > "January 1, 2008" has FBfriend and has linkedInContact has james/FBfriend and not has anne/FBfriend alexa.com/rank < 50 and fred/comment ~ cool has reddit.com/score and not has digg.com/diggs has readwriteweb.com/appeared and not has techcrunch.com/appeared
    36. 36. More queries terry/seen > "July 1, 2007" has russell/myusername/name and not has terry/myusername/name flickr.com/latitude > 52.15 and flickr.com/latitude < 52.35 and flickr.com/camera/make ~ Sony and has sally/seen amazon.com/stars > 3 and amazon.com/price < 20 and amazon.com/title ~ chess and peter/bookrating > 3 sort by amazon.com/publication-date
    37. 37. Architecture Permissions Software Communications Functional Storage Query processing Per box
    38. 38. Twisted Asynchronous Python libraries We’re using Python because of Twisted Event-driven, using deferreds with callbacks Steep learning curve, documentation is patchy
    39. 39. AMQP Standardized, high-performance messaging Backed by JP Morgan, Novell, MS, Red Hat Fully asynchronous, up to 600K messages/sec Exchanges, queues, bindings We released txAMQP
    40. 40. Thrift Released by Facebook Serialization of structures Definition of services RPC We released txThrift
    41. 41. Communications AMQP
    42. 42. d d Functional objects http facade sets apps objects http facade sets namespace coord namespace coord attr attr control text text
    43. 43. d d Functional objects http facade sets objects http facade sets namespace coord namespace coord attr attr text text
    44. 44. d d Functional objects http facade sets objects http facade sets namespace coord amqp namespace coord amqp attr attr text text
    45. 45. d d Functional objects http facade sets objects http facade sets namespace xmpp coord amqp namespace xmpp coord amqp attr attr text text
    46. 46. d d Functional objects http facade sets objects http facade sets namespace xmpp coord amqp namespace xmpp coord amqp attr other attr other text text
    47. 47. d d Functional objects load http facade sets objects load http facade sets namespace load xmpp coord amqp namespace load xmpp coord amqp attr other attr other text text
    48. 48. d d Functional objects load http facade sets objects load http facade sets namespace load xmpp coord amqp namespace load xmpp coord amqp attr other memcache attr other text memcache text
    49. 49. d d Functional objects db load http facade sets objects db load http facade sets namespace db load xmpp coord amqp namespace db load xmpp coord amqp attr db other memcache attr db other text db memcache text db
    50. 50. d d Functional objects db kv load http facade sets objects db load http facade sets namespace db load xmpp coord amqp namespace db load xmpp coord amqp attr db other memcache attr db other text db memcache text db
    51. 51. Attribute Storage meg/rating tim/books/opinion object id user id value object id user id value 1234567 667 26 526141 362 nice 6527527 667 188 726483 362 fun 2876281 17 207 635378 362 boring 7628876 667 1225 477582 362 sexy 362782 362 long PostgreSQL Tall tables Independent (column store) Backed by key/value store (Amazon S3, for now)
    52. 52. Query processing and digg/date > “Monday” or meg/rating > 5 has tim/seen
    53. 53. Query processing attr set ops digg/date > “Monday” meg/rating > 5 has tim/seen
    54. 54. Attribute affinity attr set ops digg/date > “Monday” meg/rating > 5 has tim/seen
    55. 55. Per box A controller service, launched on boot The controller launches new services (processes) All services talk AMQP as well as pure Thrift A coordinator brings up new boxes & services We use Amazon EC2, for now