FluidDB
  Terry Jones

  terry@jon.es
  @terrycojones
Proposal

  FluidDB is a hosted database
 we (Fluidinfo) will launch - by
 hook or by crook - into Alpha
shortly before Eu...
Agenda

51% me: Motivations, high-level view of FluidDB
49% you: Questions, details, API, architecture, etc.
Motivations

Contrasts in how we work with information

User interface and API difficulties / restrictions

Personalization...
Concepts
             Are not owned.
         No formal structure.
       No permission is needed.
     Partial or full di...
Concepts
               Are not owned.
           No formal structure.
         No permission is needed.
       Partial or...
UIs and APIs
Where did my information go?
Can I extract, re-use, add, delete?
Is there an API?
What am I allowed to do?
Ca...
Personalization
In our hands, not just on our behalf

   Add anything to anything, and search on it

   Protect, share, co...
Read ⊳ Search ⊳ Write ?

 The web is readable (c. 1990)
 And searchable (c. 1995)
 But not generally writable
Read ⊳ Search ⊳ Write ?

 The web is readable (c. 1990)
 And searchable (c. 1995)
 But not generally writable

 Plus, sear...
Why don’t our architectures
let us work with information
        more flexibly?
What would it take
to build something that did?
How can we address
all these problems at once ?
Alan Kay
At PARC we had a slogan: “Point of view is worth 80 IQ
points.” It was based on a few things from the past like
h...
1             1
     2            10
     4           100
     8          1000
   16          10000
   32         100000
 ...
VIII
 XVII
 XLIV
LXXX
XCVI
CCLV
VIII
 XVII
 XLIV
LXXX
XCVI
CCLV +
VIII
 XVII
 XLIV
LXXX
XCVI
CCLV +
   D
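The point of the last few slides can be checked mechanically. The sketch below is illustrative only and is not from the talk: `roman_to_int` is a hypothetical helper that converts the Roman numerals shown above and adds them, and the last two lines redo the earlier column of powers of two in binary.

```python
# Illustrative only: a hypothetical helper, not part of FluidDB or the talk.
ROMAN_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def roman_to_int(numeral: str) -> int:
    """Convert a Roman numeral such as 'XLIV' to an integer."""
    total = 0
    for i, symbol in enumerate(numeral):
        value = ROMAN_VALUES[symbol]
        # A smaller symbol placed before a larger one is subtracted (IV, XL, XC).
        if i + 1 < len(numeral) and value < ROMAN_VALUES[numeral[i + 1]]:
            total -= value
        else:
            total += value
    return total


# The column of Roman numerals from the slide.
numerals = ["VIII", "XVII", "XLIV", "LXXX", "XCVI", "CCLV"]
roman_sum = sum(roman_to_int(n) for n in numerals)  # 500, which is D

# The column of powers of two from the earlier slide: 1 + 2 + 4 + ... + 1024.
powers_sum = sum(2 ** k for k in range(11))
binary_digits = format(powers_sum, "b")  # eleven 1s
```

In positional notation the addition is routine; in Roman numerals the same sum needs a conversion step first, which is the representation argument in miniature.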
8 queens problem
Use a poor representation with
64^8 (281,474,976,710,656) states:

Look for a smart algorithm!


         ...
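A minimal sketch of the "good representation" side of this slide, assuming the usual encoding where a board is a permutation giving one queen column per row. The function name is hypothetical; the search is plain exhaustive enumeration, with no clever algorithm.

```python
from itertools import permutations


def is_solution(columns):
    """columns[row] is the column of the queen placed in that row.

    Rows and columns are distinct by construction (it is a permutation),
    so only the two diagonal directions need to be checked.
    """
    n = len(columns)
    rising = {columns[row] + row for row in range(n)}
    falling = {columns[row] - row for row in range(n)}
    return len(rising) == n and len(falling) == n


# Exhaustive search over all 8! candidate boards.
candidates = list(permutations(range(8)))
solutions = [board for board in candidates if is_solution(board)]

print(len(candidates), "candidates,", len(solutions), "solutions")
```

With this encoding the whole search space is 40,320 boards, small enough that brute force finishes almost instantly.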
A humble FluidDB object

         digg.com/date May 21, 2009
            meg/web/rating 6
            tim/seen True
      ...
A humble FluidDB object

         digg.com/date May 21, 2009

   NO       meg/web/rating 6
            tim/seen true


  O...
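One way to picture such an object is as a flat mapping from tag paths to values, with no owner field at all. This is only a sketch under that assumption, using a plain Python dict; it is not the FluidDB API, and `tags_in_namespace` is a hypothetical helper.

```python
# A sketch, not the FluidDB API: an object is just tag paths mapped to values.
# The values are the ones shown on the slide.
humble_object = {
    "about": "http://digg.com/news.html",
    "digg.com/date": "May 21, 2009",
    "meg/web/rating": 6,
    "tim/seen": True,
    "mike/opinion": "half-baked nonsense",
}


def tags_in_namespace(obj, namespace):
    """Return the tags on obj that live under the given top-level namespace."""
    prefix = namespace + "/"
    return {path: value for path, value in obj.items() if path.startswith(prefix)}


# Anyone can add a tag in their own namespace. Nothing on the object itself
# needs to grant permission, because the object has no owner.
humble_object["terry/rating"] = 8

print(tags_in_namespace(humble_object, "meg"))
```

Control sits with the tags (each under its owner's namespace), not with the object they are attached to.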
Consequences
Where to put related information / metadata?
How to get permission?
No need to anticipate
How to organize?
Ho...
A handy object
In FluidDB
    europython.eu/2009/time 11:30am
       europython.eu/2009/date June 29, 2009
      europython.eu/2009/durat...
In FluidDB
terry/rating 8
                 terry/will-attend
                 europython.eu/2009/time 11:30am
            ...
In FluidDB
terry/rating 8
                 terry/will-attend
                 europython.eu/2009/time 11:30am
            ...
Double-click to edit

   Can you add to this?
   Why not?
   Are you blogging/tweeting about it?
A folder
                                      jack/interesting
digg.com/date        russell/seen       amazon.com/price
 ...
A folder
                                                   jack/interesting
digg.com/date               russell/seen     ...
A folder
                                                             jack/interesting
          digg.com/date            ...
A folder
                                                             jack/interesting
          digg.com/date            ...
A folder
                                                             jack/interesting
          digg.com/date            ...
A folder
                                                             jack/interesting
          digg.com/date            ...
A folder
folder-manager/name-624CA19 chapter 1
                                                               jack/interes...
A folder
folder-manager/name-624CA19 chapter 1
                                                               jack/interes...
What’s FluidDB about?


   Whatever you like
A voting box
A voting box

       pat/vote
A voting box

       pat/vote



       james/vote
A voting box

         pat/vote



          james/vote

     sally/vote
A voting box

         pat/vote

            tim/vote

          james/vote

     sally/vote
A voting box
 russell/vote
                pat/vote

                  tim/vote

                james/vote

           sa...
A voting box
    russell/vote
                   pat/vote

                     tim/vote

                   james/vote

 ...
A voting box
    russell/vote
                   pat/vote

                     tim/vote



              sally/vote
sofia/...
Inter-process comms
Inter-process comms

producer/data/01
Inter-process comms

producer/data/01

 producer/data/02
Inter-process comms

producer/data/01    consumer1/seq 02
 producer/data/02
Inter-process comms

producer/data/01    consumer1/seq 02
 producer/data/02

producer/data/03
Inter-process comms

producer/data/01    consumer1/seq 02
 producer/data/02

producer/data/03
                    consumer...
Inter-process comms

                    consumer1/seq 02
 producer/data/02

producer/data/03
                    consumer...
Social data

Data that increases in value when it’s co-located
All data is social, once we begin to interact with it
Info in context
Information is more useful in context:
  Google
  Wikipedia
FluidDB can do a similar thing, but for DBs &...
FluidDB

Database of these simple objects

Enables sharing between applications, people

Single-instance, hosted (like Sim...
Sack the golden towns of Montezuma!

  “My dear fellow,” Burlingame said caustically, “we sit on a
  blind rock careening ...
Challenge
Engineer a system that can efficiently hold an
effectively unlimited number of these objects

Let each object hav...
OK, your 49%
Questions        Apps

HTTP API         Storage

Query language   Architecture

Python           Data

Permis...
Design goals
Simple data model
Simple permissions model
Simple, easily parallelizable, query language
Horizontally scalabl...
Permissions

For each action on a namespace or tag:

  There’s a policy: ‘open’ or ‘closed’

  And a (possibly empty) list...
Permissions
  tag or
               action   policy   exceptions
namespace
  tim/seen      read    closed    tim, meg
mike...
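A minimal sketch of how a policy plus an exception list could be evaluated. It assumes the natural reading of the slide: under an 'open' policy everyone is allowed except the listed users, and under a 'closed' policy no one is allowed except the listed users. The `Permission` class and the dictionary layout are hypothetical; the rows are taken from the table in the talk.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Permission:
    """Hypothetical model of one row: a policy plus a list of exceptions."""

    policy: str  # "open" or "closed"
    exceptions: frozenset = field(default_factory=frozenset)

    def allows(self, user):
        # Assumption: the exceptions invert the policy for the listed users.
        if self.policy == "open":
            return user not in self.exceptions
        if self.policy == "closed":
            return user in self.exceptions
        raise ValueError(f"unknown policy: {self.policy!r}")


# (tag or namespace, action) -> permission, following the table on the slide.
permissions = {
    ("tim/seen", "read"): Permission("closed", frozenset({"tim", "meg"})),
    ("mike/opinion", "update"): Permission("open"),
    ("mike/", "create"): Permission("closed", frozenset({"mike"})),
    ("meg/rating", "see"): Permission("open"),
    ("meg/rating", "read"): Permission("closed", frozenset({"meg"})),
}


def is_allowed(user, path, action):
    return permissions[(path, action)].allows(user)


print(is_allowed("meg", "tim/seen", "read"))   # meg is a listed exception
print(is_allowed("mike", "tim/seen", "read"))  # mike is not
```

Under this reading, anyone may see that meg/rating exists on an object, but only meg can read its value.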
Query language
Numeric: tag value (=, <, etc.)
Textual: tag text match
Presence: has attribute
Grouping/logic: (...), and,...
Query language
   Numeric: tag value (=, <, etc.)
   Textual: tag text match
   Presence: has attribute
   Grouping/logic:...
Data buffet
username seen, lastvisited, rating, goingtoread, comment
username me, FBfriend, linkedInContact, met, family
u...
Data buffet 2
email messageId, fromId, toId
username/email
 from, to, subject, date
alexa.com rank
digg.com title, descrip...
Example queries
terry/rating > 5 and has reddit.com/score

has goingtoread and seen > "January 1, 2008"

has FBfriend and ...
More queries
terry/seen > "July 1, 2007"

has russell/myusername/name and not has
terry/myusername/name

flickr.com/latitud...
Architecture

 Software
 Communications
 Functional
 Storage
 Query processing
 Per box
Software
Python

Twisted

PostgreSQL

Thrift

AMQP (RabbitMQ)

Lucene
Functional
[Diagram, built up over nine slides. Component labels: apps, control, load, http, xmpp, facade, coord, amqp, objects, sets, namespace, tag, text, db, kv, memcache, other.]
Tag Storage
meg/rating                  tim/books/opinion
object id user id   value   object id user id    value
1234567 6...
Query processing

                       and

digg/date > “Monday”     or

        meg/rating > 5        has tim/seen
Query processing
                       tag   set ops

digg/date > “Monday”


meg/rating > 5


has tim/seen
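The query tree above can be read as set operations over per-tag results. The sketch below assumes each tag has its own independent store mapping object ids to values, and that each leaf of the query returns a set of object ids. The data, the helper names, and the ISO date standing in for "Monday" are all hypothetical.

```python
# Hypothetical per-tag stores: tag path -> {object id: value}.
tag_store = {
    "digg/date": {1: "2009-05-21", 2: "2009-05-10", 3: "2009-05-20"},
    "meg/rating": {1: 6, 2: 9, 4: 3},
    "tim/seen": {3: True, 4: True},
}


def has(tag):
    """Object ids that carry the tag at all."""
    return set(tag_store.get(tag, {}))


def greater_than(tag, threshold):
    """Object ids whose value for the tag exceeds the threshold."""
    return {oid for oid, value in tag_store.get(tag, {}).items() if value > threshold}


# digg/date > "Monday" and (meg/rating > 5 or has tim/seen)
# Each leaf can be answered by one tag store on its own; the results are
# then combined with plain set intersection and union.
result = greater_than("digg/date", "2009-05-18") & (
    greater_than("meg/rating", 5) | has("tim/seen")
)

print(sorted(result))
```

Because the leaves do not depend on each other, they could be evaluated in parallel on different machines, which fits the stated goal of an easily parallelizable query language that brings back object ids.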
Tag affinity
                       attr   set ops

digg/date > “Monday”




meg/rating > 5
has tim/seen
Per box
A controller service, launched on boot

The controller launches new services (processes)

All services talk AMQP a...
Introducing FluidDB

Talk by Terry Jones of Fluidinfo at EuroPython. June 30, 2009. Birmingham.


  1. 1. FluidDB Terry Jones terry@jon.es @terrycojones
  2. 2. Proposal FluidDB is a hosted database we (Fluidinfo) will launch - by hook or by crook - into Alpha shortly before EuroPython 2009
  3. 3. Agenda 51% me: Motivations, high-level view of FluidDB 49% you: Questions, details, API, architecture, etc.
  4. 4. Motivations Contrasts in how we work with information User interface and API difficulties / restrictions Personalization The computational world is not very writable!
  5. 5. Concepts Are not owned. No formal structure. No permission is needed. Partial or full disorganization. No predefined set of qualities. No special central piece of content. Easy to reorganize or multiply organize.
  6. 6. Concepts Are not owned. No formal structure. No permission is needed. Partial or full disorganization. No predefined set of qualities. No special central piece of content. Easy to reorganize or multiply organize. To engineer this kind of flexibility, we must rethink control
  7. 7. UIs and APIs Where did my information go? Can I extract, re-use, add, delete? Is there an API? What am I allowed to do? Can I search? On what? Special pleading Wouldn't it be cool if...
  8. 8. Personalization In our hands, not just on our behalf Add anything to anything, and search on it Protect, share, combine, delete as you wish Organize things as you please
  9. 9. Read ⊳ Search ⊳ Write ? The web is readable (c. 1990) And searchable (c. 1995) But not generally writable
  10. 10. Read ⊳ Search ⊳ Write ? The web is readable (c. 1990) And searchable (c. 1995) But not generally writable Plus, search is not very interactive: Dull: refine query or ask for more results Can’t change results Can’t organize
  11. 11. Why don’t our architectures let us work with information more flexibly?
  12. 12. What would it take to build something that did?
  13. 13. How can we address all these problems at once ?
  14. 14. Alan Kay At PARC we had a slogan: “Point of view is worth 80 IQ points.” It was based on a few things from the past like how smart you had to be in Roman times to multiply two numbers together; only geniuses did it. We haven’t gotten any smarter, we’ve just changed our representation system. We think better generally by inventing better representations; that’s something that we as computer scientists recognize as one of the main things that we try to do.
  16. 16. 1 1 2 10 4 100 8 1000 16 10000 32 100000 64 1000000 128 10000000 256 100000000 512 1000000000 1024 + 10000000000 + ?????? 11111111111
  17. 17. VIII XVII XLIV LXXX XCVI CCLV
  18. 18. VIII XVII XLIV LXXX XCVI CCLV +
  19. 19. VIII XVII XLIV LXXX XCVI CCLV + D
  21. 21. 8 queens problem Use a poor representation with 64^8 (281,474,976,710,656) states: Look for a smart algorithm! Or... Use a good representation with 8! (40,320) states and exhaustive search as an algorithm!
  22. 22. A humble FluidDB object digg.com/date May 21, 2009 meg/web/rating 6 tim/seen True about http://digg.com/news.html mike/opinion “half-baked nonsense”
  23. 23. A humble FluidDB object digg.com/date May 21, 2009 NO meg/web/rating 6 tim/seen true OWNER! about http://digg.com/news.html mike/opinion “half-baked nonsense”
  24. 24. Consequences Where to put related information / metadata? How to get permission? No need to anticipate How to organize? How to personalize? Continue to own your data Level playing field: no rules, & it’s OK to be late Makes the world writable by default
  25. 25. A handy object
  32. 32. In FluidDB europython.eu/2009/time 11:30am europython.eu/2009/date June 29, 2009 europython.eu/2009/duration 30 europython.eu/2009/speaker Esteve Fernández europython.eu/2009/title Twisted, AMQP and Thrift europython.eu/2009/room Arena Foyer
  33. 33. In FluidDB terry/rating 8 terry/will-attend europython.eu/2009/time 11:30am europython.eu/2009/date June 29, 2009 europython.eu/2009/duration 30 europython.eu/2009/speaker Esteve Fernández europython.eu/2009/title Twisted, AMQP and Thrift europython.eu/2009/room Arena Foyer
  34. 34. In FluidDB terry/rating 8 terry/will-attend europython.eu/2009/time 11:30am europython.eu/2009/date June 29, 2009 europython.eu/2009/duration 30 europython.eu/2009/speaker Esteve Fernández europython.eu/2009/title Twisted, AMQP and Thrift europython.eu/2009/room Arena Foyer russell/will-attend
  35. 35. Double-click to edit Can you add to this? Why not? Are you blogging/tweeting about it?
  36. 36. A folder jack/interesting digg.com/date russell/seen amazon.com/price meg/web/rating james/stars sam/wishlist tim/seen tim/own about about mike/opinion sally/read-it
  37. 37. A folder jack/interesting digg.com/date russell/seen amazon.com/price meg/web/rating james/stars sam/wishlist tim/seen tim/own about about mike/opinion sally/read-it folder-manager/tag folder-manager/tag-F5283AC21
  38. 38. A folder jack/interesting digg.com/date russell/seen amazon.com/price meg/web/rating james/stars sam/wishlist tim/seen tim/own about about mike/opinion sally/read-it folder-manager/tag-F5283AC21 folder-manager/tag folder-manager/tag-F5283AC21
  39. 39. A folder jack/interesting digg.com/date russell/seen amazon.com/price meg/web/rating james/stars sam/wishlist tim/seen tim/own about about mike/opinion sally/read-it folder-manager/tag-F5283AC21 folder-manager/tag-F5283AC21 folder-manager/tag folder-manager/tag-F5283AC21
  40. 40. A folder jack/interesting digg.com/date russell/seen amazon.com/price meg/web/rating james/stars sam/wishlist tim/seen tim/own folder-manager/tag-F5283AC21 about about mike/opinion sally/read-it folder-manager/tag-F5283AC21 folder-manager/tag-F5283AC21 folder-manager/tag folder-manager/tag-F5283AC21
  41. 41. A folder jack/interesting digg.com/date russell/seen amazon.com/price meg/web/rating james/stars sam/wishlist tim/seen tim/own folder-manager/tag-F5283AC21 about about mike/opinion sally/read-it folder-manager/tag-F5283AC21 folder-manager/tag-F5283AC21 folder-manager/tag folder-manager/tag-F5283AC21 folder-manager/name folder-manager/name-624CA19
  42. 42. A folder folder-manager/name-624CA19 chapter 1 jack/interesting digg.com/date russell/seen amazon.com/price meg/web/rating james/stars sam/wishlist tim/seen tim/own folder-manager/tag-F5283AC21 about about mike/opinion sally/read-it folder-manager/tag-F5283AC21 folder-manager/tag-F5283AC21 folder-manager/tag folder-manager/tag-F5283AC21 folder-manager/name folder-manager/name-624CA19
  43. 43. A folder folder-manager/name-624CA19 chapter 1 jack/interesting digg.com/date russell/seen amazon.com/price meg/web/rating james/stars sam/wishlist tim/seen tim/own folder-manager/tag-F5283AC21 about about mike/opinion sally/read-it folder-manager/tag-F5283AC21 folder-manager/tag-F5283AC21 folder-manager/name-624CA19 chapter 2 folder-manager/tag folder-manager/tag-F5283AC21 folder-manager/name folder-manager/name-624CA19
  44. 44. What’s FluidDB about? Whatever you like
  45. 45. A voting box
  46. 46. A voting box pat/vote
  47. 47. A voting box pat/vote james/vote
  48. 48. A voting box pat/vote james/vote sally/vote
  49. 49. A voting box pat/vote tim/vote james/vote sally/vote
  50. 50. A voting box russell/vote pat/vote tim/vote james/vote sally/vote
  51. 51. A voting box russell/vote pat/vote tim/vote james/vote sally/vote sofia/vote
  52. 52. A voting box russell/vote pat/vote tim/vote sally/vote sofia/vote
  53. 53. Inter-process comms
  54. 54. Inter-process comms producer/data/01
  55. 55. Inter-process comms producer/data/01 producer/data/02
  56. 56. Inter-process comms producer/data/01 consumer1/seq 02 producer/data/02
  57. 57. Inter-process comms producer/data/01 consumer1/seq 02 producer/data/02 producer/data/03
  58. 58. Inter-process comms producer/data/01 consumer1/seq 02 producer/data/02 producer/data/03 consumer2/seq 03
  59. 59. Inter-process comms consumer1/seq 02 producer/data/02 producer/data/03 consumer2/seq 03
  60. 60. Social data Data that increases in value when it’s co-located All data is social, once we begin to interact with it
  61. 61. Info in context Information is more useful in context: Google Wikipedia FluidDB can do a similar thing, but for DBs & apps Data is held in silos by apps Behind restrictive APIs
  62. 62. FluidDB Database of these simple objects Enables sharing between applications, people Single-instance, hosted (like SimpleDB) Distributed storage and query processing
  63. 63. Sack the golden towns of Montezuma! “My dear fellow,” Burlingame said caustically, “we sit on a blind rock careening through space; we are all of us rushing headlong to the grave. Think you the worms will care, when anon they make a meal of you, whether you spent your moment sighing wigless in your chamber, or sacked the golden towns of Montezuma? Lookee, the day’s nigh spent; ’tis gone careening into time forever. Not a tale’s length since we lined our bowels with dinner, and already they growl for more. We are dying men Ebenezer: i’faith, there’s time for nought but bold resolves!” John Barth, The Sot-Weed Factor
  65. 65. Challenge Engineer a system that can efficiently hold an effectively unlimited number of these objects Let each object have an effectively unlimited number of tags Design a query language, permissions, and API Make simple and common things fast Make it realizable
  66. 66. OK, your 49% Questions Apps HTTP API Storage Query language Architecture Python Data Permissions Demo Access Release!?
  67. 67. Design goals Simple data model Simple permissions model Simple, easily parallelizable, query language Horizontally scalable Fast for common tasks Implementable!
  68. 68. Permissions For each action on a namespace or tag: There’s a policy: ‘open’ or ‘closed’ And a (possibly empty) list of exceptions
  69. 69. Permissions tag or action policy exceptions namespace tim/seen read closed tim, meg mike/opinion update open mike/ create closed mike meg/rating see open meg/rating read closed meg
  70. 70. Query language Numeric: tag value (=, <, etc.) Textual: tag text match Presence: has attribute Grouping/logic: (...), and, or, not
  71. 71. Query language Numeric: tag value (=, <, etc.) Textual: tag text match Presence: has attribute Grouping/logic: (...), and, or, not Designed to bring back object ids
  72. 72. Data buffet username seen, lastvisited, rating, goingtoread, comment username me, FBfriend, linkedInContact, met, family username/myusername name, password flickr.com longitude, latitude, owner, date, camera/make username/flickr title, description VCspotter fredwilson, bradburnham nasdaq.com name, symbol, type, outstanding, value, price username/stocks shares, date google.com pagerank tracks album, artist, name, year username/music count, favorite, lastPlay, stars, bestOf2007
  73. 73. Data buffet 2 email messageId, fromId, toId username/email from, to, subject, date alexa.com rank digg.com title, description, date, diggs mahalo.com appeared, category readwriteweb.com appeared reddit.com date, score techcrunch.com appeared, URI attribute description, name, path namespace description, name, path
  74. 74. Example queries terry/rating > 5 and has reddit.com/score has goingtoread and seen > "January 1, 2008" has FBfriend and has linkedInContact has james/FBfriend and not has anne/FBfriend alexa.com/rank < 50 and fred/comment ~ cool has reddit.com/score and not has digg.com/diggs has readwriteweb.com/appeared and not has techcrunch.com/appeared
  75. 75. More queries terry/seen > "July 1, 2007" has russell/myusername/name and not has terry/myusername/name flickr.com/latitude > 52.15 and flickr.com/latitude < 52.35 and flickr.com/camera/make ~ Sony and has sally/seen amazon.com/stars > 3 and amazon.com/price < 20 and amazon.com/title ~ chess and peter/bookrating > 3 sort by amazon.com/publication-date
  76. 76. Architecture Software Communications Functional Storage Query processing Per box
  77. 77. Software Python Twisted PostgreSQL Thrift AMQP (RabbitMQ) Lucene
  78. 78. Functional [Diagram, built up over slides 78 to 86. Component labels: apps, control, load, http, xmpp, facade, coord, amqp, objects, sets, namespace, tag, text, db, kv, memcache, other.]
  87. 87. Tag Storage
      meg/rating
        object id   user id   value
        1234567     667       26
        6527527     667       188
        2876281     17        207
        7628876     667       1225
      tim/books/opinion
        object id   user id   value
        526141      362       nice
        726483      362       fun
        635378      362       boring
        477582      362       sexy
        362782      362       long
      PostgreSQL Tall tables Independent (column store) Backed by key/value store (Amazon S3, for now)
  88. 88. Query processing and digg/date > “Monday” or meg/rating > 5 has tim/seen
  89. 89. Query processing tag set ops digg/date > “Monday” meg/rating > 5 has tim/seen
  90. 90. Tag affinity attr set ops digg/date > “Monday” meg/rating > 5 has tim/seen
  91. 91. Per box A controller service, launched on boot The controller launches new services (processes) All services talk AMQP as well as pure Thrift A coordinator brings up new boxes & services We use Amazon EC2, for now