N-O-SQL, new database technologies on the rise

An introductory presentation on NOSQL technology for SAI (2010-04-20)

  1. N-O-SQL: new database technologies on the rise http://www.flickr.com/photos/wolfgangstaudt/2215246206/ IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org
  2. Who am I » Steven Noels - stevenn@outerthought.org » Outerthought: scalable content applications » makers of Daisy and Lily open source CMS
  3. Agenda » raison d’être: what brought us here » concepts: required theory readings » market overview: trees & the forest » experiences and (h)in(d)sights
  4. Raison d’être
  5. History (diagram labels) » 1. standardization » 2. simplification » hierarchical databases, IMS, XMLDB, RDBMS, OODBMS
  6. Inconsistency through slave lag » John Quinn (Digg)
  7. Scaling writes (1) » John Quinn (Digg)
  8. Scaling writes (2) » John Quinn (Digg)
  9. Issues with partitioning » you lose the ability to make arbitrary queries » you have to predict data access patterns when formulating the partitioning strategy » the resulting systems are complex and fragile
  10. Replication complexity
  11. Scaling relational systems » When scaling relational systems you lose their advantages but retain their overhead
  12. History (diagram labels) » 3. scaling » caching, denormalisation, sharding, replication ... » 4. rethinking the problem » RDBMS, NOSQL
  13. Moore vs Kryder » seek time is constant (network latency as well?) » transfer rate ! spindles ! » as a principle, writes are hard to scale
  14. Cambrian Explosion
  15. Buzz-oriented development?
  16. Cambrian Explosion N-O-SQL
  17. (no text on slide)
  18. The Perspective of Cost
  19. Common themes » SCALE SCALE SCALE » new datamodels » devops » N-O-SQL » The Cloud: technology is of no interest anymore
  20. Numbers of scale http://qos.doubleclick.net/counters/
  21. Types of scaling » scaling for usage: volume of users, volume of data » scaling types of ops: concurrent read, concurrent write » (diagram labels) availability, partitioning, replication, consistency, distribution
  22. Distributed systems are hard!
  23. 8 fallacies of distributed computing (Peter Deutsch and James Gosling) » The network is reliable. » Latency is zero. » Bandwidth is infinite. » The network is secure. » Topology doesn't change. » There is one administrator. » Transport cost is zero. » The network is homogeneous.
  24. New Data » sparse structures » weak schemas » graphs » semi-structured » document-oriented
  25. N-O-SQL = not only SQL!
  26. The NOSQL footprint (diagram labels) » NOSQL: MongoDB, CouchDB, neo4j, Cassandra, HBase » free-structured or sparse data » highly scalable and available » simple operational (complexity) » ACID, SQL » referential integrity and constraints, typed data
  27. NOSQL, if you need ... » horizontal scaling (out rather than up) » unusually common data (aka free-structured) » speed (especially for writes) » the bleeding edge
  28. SQL/RDBMS, if you need ... » SQL » ACID » normalisation » a defined liability
  29. Theory
  30. Robust systems
  31. Academic background » Amazon Dynamo » Google BigTable » Eric Brewer's CAP theorem
  32. Amazon Dynamo » popularized the term ‘eventual consistency’ » consistent hashing
  33. Consistent hashing http://horicky.blogspot.com/2009/11/nosql-patterns.html
  34. Consistent hashing: removing node C, adding node D http://www.lexemetech.com/2007/11/consistent-hashing.html
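
The two consistent hashing slides above are diagrams only, so here is a minimal Python sketch of the idea, under stated assumptions: the class name `HashRing`, the use of MD5 for ring positions and the number of virtual nodes per server are all illustrative choices, not something taken from Dynamo or from the linked articles. Keys and nodes are hashed onto the same ring, and a key belongs to the first node found clockwise from its position.

```python
import bisect
import hashlib


class HashRing:
    """Toy consistent-hash ring (hypothetical helper, not a library API)."""

    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes      # virtual points per node; an arbitrary choice
        self._ring = []           # sorted list of (position, node_name)
        for node in nodes:
            self.add(node)

    @staticmethod
    def _position(label):
        # Map any string to a large integer position on the ring.
        return int(hashlib.md5(label.encode("utf-8")).hexdigest(), 16)

    def add(self, node):
        # Each node owns several points so that load spreads more evenly.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (self._position(f"{node}#{i}"), node))

    def remove(self, node):
        self._ring = [entry for entry in self._ring if entry[1] != node]

    def lookup(self, key):
        if not self._ring:
            raise LookupError("ring is empty")
        # First ring entry at or after the key's position, wrapping around.
        idx = bisect.bisect_left(self._ring, (self._position(key), ""))
        return self._ring[idx % len(self._ring)][1]
```

The property the slides illustrate is that removing node C only relocates the keys C used to own, and adding node D only moves keys onto D; every other key stays where it was.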
  35. Google BigTable » multi-dimensional column-oriented database » on top of Google File System » object versioning
  36. CAP theorem (diagram) » strong consistency » high availability » partition-tolerance
  37. CAP » Strong Consistency: all clients see the same view, even in the presence of updates » High Availability: all clients can find some replica of the data, even in the presence of failures » Partition-tolerance: the system properties hold even when the system is partitioned
  38. Consistency » Where is the data I just updated? » Ideal world: the result of every write operation is reflected by subsequent read operations.
  39. Consistency
  40. Sunny-day scenario
  41. Network partitioning
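
The question "where is the data I just updated?" can be made concrete with a small Python sketch. It is a toy model under loud assumptions: the `Primary` and `Replica` classes and the explicit `catch_up()` step are hypothetical stand-ins for asynchronous replication, and no real database is involved. A write lands on the primary immediately, while the replica only sees it after it has replayed the replication log.

```python
from collections import deque


class Primary:
    """Accepts writes and records them in a replication log (toy model)."""

    def __init__(self):
        self.data = {}
        self.log = deque()  # writes not yet shipped to the replica

    def write(self, key, value):
        self.data[key] = value
        self.log.append((key, value))


class Replica:
    """Serves reads from its own copy, which may lag behind the primary."""

    def __init__(self, primary):
        self.primary = primary
        self.data = {}

    def read(self, key):
        return self.data.get(key)

    def catch_up(self):
        # Replay every pending write; in a real system this happens
        # asynchronously and can stall during a network partition.
        while self.primary.log:
            key, value = self.primary.log.popleft()
            self.data[key] = value


primary = Primary()
replica = Replica(primary)
primary.write("profile:42", "new bio")
stale = replica.read("profile:42")   # replica has not caught up yet
replica.catch_up()
fresh = replica.read("profile:42")   # now reflects the write
```

Between the write and `catch_up()` the replica returns no value for the key, which is the inconsistency window the slides on slave lag and partitioning are about.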
  42. Culture Clash » Classic distributed systems: focus on ACID » atomic » consistent » isolated » durable » Modern internet systems: focus on BASE » basically available » soft-state (or scalable) » eventually consistent
  43. Culture Clash (a spectrum) » ACID: strong consistency for transactions is the highest priority; availability less important; pessimistic; rigorous analysis; complex mechanisms » BASE: availability and scaling are the highest priorities; weak consistency; optimistic; best effort; simple and fast
  44. Building for failure » defensive programming » creating replicas » disk flushing » watch out for failure of utility infrastructure » conscious sync/async decisions
  45. Possible storage failures (Michael Stonebraker) » Application errors » Repeatable DB failures » Unrepeatable DB failures » OS errors » Local cluster HW failure » Local cluster network partitioning » Disaster » WAN network failure between remote clusters
  46. Availability ≠ total async!
  47. ✘ The Enterprise Service Bus » bus = congestion
  48. Bus systems » objects don’t fit in a pipe » object ➙ message » serialization / de-serialization cost » message size » queuing = cost
  49. Use a mixture of both » async, plus sync for the stuff which matters!
  50. Numbers of scale http://qos.doubleclick.net/counters/
  51. Processing large datasets: Map/Reduce
  52. Smart Data » sparse as a feature » weak schemas » ad-hoc indexing » organic analytics » near-data processing » live(ly) datawarehouse » distribution ➙ parallelization ➙ performance
  53. Hadoop: HDFS + MapReduce » single filesystem + single execution-space
  54. MapReduce example: WordCount
  55. MapReduce
  56. MapReduce and HDFS © Lars George
  57. Physical architecture
  58. Processing large datasets with MR » Benefit from parallelisation » Less modelling upfront (ad-hoc processing) » Compartmentalized approach reduces operational risks » AsterData et al. have SQL/MR hybrids for huge-scale BI
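
The WordCount slide in this section is an image, so the following is a single-process Python sketch of the same idea. It is not Hadoop code: the function names `map_phase`, `shuffle`, `reduce_phase` and `word_count` are made up for illustration, and the "cluster" is just a loop. What it does show is the shape of the computation: map emits (word, 1) pairs, a shuffle step groups the pairs by key, and reduce sums each group.

```python
from collections import defaultdict


def map_phase(document):
    """Emit one (word, 1) pair per word in the document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group all emitted values by key, as the framework would do."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word, counts):
    """Collapse all counts for one word into a single total."""
    return word, sum(counts)


def word_count(documents):
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(word, counts)
                for word, counts in shuffle(pairs).items())
```

Because each `map_phase` call looks at one document only and each `reduce_phase` call at one key only, both steps can be spread over many machines, which is where the parallelisation benefit on the slide comes from.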
  59. Market overview
  60. Categories » key-value stores » column stores » document stores » graph databases
  61. Key-value stores » Redis » Voldemort » Tokyo Cabinet
  62. Redis » REmote DIctionary Server » http://code.google.com/p/redis/ » vmware
  63. Redis Features » persisted memcache, ‘awesome’ » RAM-based + persistable » key ➙ values: string, list, set » higher-level ops, e.g. push/pop and sort for lists » fast (very) » configurable durability » client-managed sharding
  64. Voldemort » http://project-voldemort.com/ » LinkedIn
  65. Voldemort » persistent » distributed » fault-tolerant » hash table
  66. Voldemort API: GET, PUT, DELETE
  67. Voldemort routing logic moving up the stack, smaller latency
  68. Voldemort data format » keys and values = arrays of bytes » So how do we map objects ⬌ bytes? » json » string » java-serialization » protobuf » identity
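
A rough Python sketch of what the last few slides describe: a store that only understands byte arrays and exposes GET, PUT and DELETE, with pluggable serializers doing the objects ⬌ bytes mapping. Everything here is hypothetical (the classes `ByteStore`, `TypedStore`, `JsonSerializer` and `StringSerializer` are invented for illustration) and it is not Voldemort's client API; distribution, persistence and routing are left out entirely.

```python
import json


class JsonSerializer:
    """Objects <-> bytes via JSON (one of the formats listed on the slide)."""

    def to_bytes(self, obj):
        return json.dumps(obj, sort_keys=True).encode("utf-8")

    def from_bytes(self, data):
        return json.loads(data.decode("utf-8"))


class StringSerializer:
    """Plain text <-> UTF-8 bytes."""

    def to_bytes(self, text):
        return text.encode("utf-8")

    def from_bytes(self, data):
        return data.decode("utf-8")


class ByteStore:
    """In-memory stand-in for the storage layer: bytes in, bytes out."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def put(self, key, value):
        if not isinstance(key, bytes) or not isinstance(value, bytes):
            raise TypeError("keys and values must be bytes")
        self._data[key] = value

    def delete(self, key):
        return self._data.pop(key, None) is not None


class TypedStore:
    """Wraps a ByteStore and applies serializers on the way in and out."""

    def __init__(self, store, key_serializer, value_serializer):
        self._store = store
        self._keys = key_serializer
        self._values = value_serializer

    def get(self, key):
        raw = self._store.get(self._keys.to_bytes(key))
        return None if raw is None else self._values.from_bytes(raw)

    def put(self, key, value):
        self._store.put(self._keys.to_bytes(key), self._values.to_bytes(value))

    def delete(self, key):
        return self._store.delete(self._keys.to_bytes(key))
```

Keeping serialization outside the storage layer is what lets the same store hold JSON, strings or protobuf messages without knowing the difference.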
  69. Tokyo Cabinet » http://1978th.net/tokyocabinet/ » mixi.jp (i.e. Facebook Japan)
  70. Product Family
  71. Tokyo Cabinet » memory or filesystem » hash, b-tree, fixed-length, table
  72. Column stores » BigTable » HBase » Cassandra
  73. BigTable » http://labs.google.com/papers/bigtable.html » Google » layered on top of GFS
  74. HBase » http://hadoop.apache.org/hbase/ » StumbleUpon / Adobe / Cloudera
  75. HBase » sorted » persisted » distributed » storage system » column-oriented » multi-dimensional » highly-available » adds random access » high-performance reads and writes atop HDFS
  76. HBase data model » Distributed multi-dimensional sparse map » Multi-dimensional keys: (table, row, family:column, timestamp) → value » Keys are arbitrary strings » Access to row data is atomic
  77. Storage architecture © Lars George
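
The "multi-dimensional sparse map" from the HBase data model slide can be sketched as nested dictionaries in Python. This is an in-memory illustration only: `SparseTable` and its methods are hypothetical names, not the HBase client API, and one instance stands for one table. It shows how (row, family:column, timestamp) → value lookups work, how older versions stay readable, and why sorted row keys make range scans natural.

```python
from collections import defaultdict


class SparseTable:
    """One table: row -> "family:column" -> {timestamp: value}."""

    def __init__(self):
        self._rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row, column, value, timestamp):
        # Only cells that are written exist; absent cells cost nothing.
        self._rows[row][column][timestamp] = value

    def get(self, row, column, timestamp=None):
        """Newest value at or before `timestamp` (newest overall if None)."""
        versions = self._rows.get(row, {}).get(column, {})
        candidates = [ts for ts in versions
                      if timestamp is None or ts <= timestamp]
        if not candidates:
            return None
        return versions[max(candidates)]

    def scan(self, start_row, stop_row):
        """Yield (row, latest cells) for start_row <= row < stop_row, sorted."""
        for row in sorted(self._rows):
            if start_row <= row < stop_row:
                yield row, {column: versions[max(versions)]
                            for column, versions in self._rows[row].items()}
```

In this model, designing the row key decides which rows sit next to each other in a scan, which is one reason keyspace design shows up under the lessons learned at the end of the deck.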
  78. Cassandra » http://cassandra.apache.org/ » Rackspace / Facebook
  79. Cassandra » Key-value store (with added structure) » Reliability (identical nodes) » Eventually consistent » Distributed » Tunable: partitioning, replication » (CAP diagram labels: A, C, P)
  80. Cassandra write pattern
  81. Cassandra applicability » FIT: scalable reliability (through identical nodes); linear scaling; write throughput; large data sets » NO FIT: flexible indexing; only PK-based querying; big binary data; 1 row must fit in RAM entirely
  82. Document stores » CouchDB » MongoDB » Riak » MarkLogic
  83. CouchDB » http://couchdb.apache.org/ » couch.io
  84. CouchDB » fault-tolerant » schema-free » document-oriented » accessible via a RESTful HTTP/JSON API
  85. CouchDB documents » { "_id": "BCCD12CBB", "_rev": "AB764C", "type": "person", "name": "Darth Vader", "age": 63, "headware": ["Helmet", "Sombrero"], "dark_side": true }
  86. CouchDB REST API » HTTP » PUT /db/docid » GET /db/docid » POST /db » DELETE /db/docid
  87. CouchDB Views » MapReduce-based » Filter, Collate, Aggregate » Javascript » map: function (doc) { for (var i in doc.tags) emit(doc.tags[i], 1); } » reduce: function (Key, Values) { var sum = 0; for (var i in Values) sum += Values[i]; return sum; }
  88. CouchDB » be careful about semantics » replication ≠ partitioning/sharding! » distributed database = distributable database » sharded / distributed deployment requires a proxy layer
  89. MongoDB » http://www.mongodb.org/ » 10gen
  90. MongoDB » cfr. CouchDB, really » except for: » C++ » performance focus » runtime queries (mapreduce still available) » native drivers (no REST/HTTP layering) » no MVCC: update-in-place » auto sharding (alpha)
  91. Riak » http://riak.basho.com/ » Basho Technologies
  92. Riak » buckets/keys, links » values/content = bucket + metadata » pluggable storage engines (fs, (D)ETS, InnoDB) » HTTP/REST API » automatic distribution » mapreduce using Javascript
  93. Jackrabbit » http://jackrabbit.apache.org/ » Day Software
  94. Jackrabbit » reference implementation for JSR 170 & 283 » remoting: WebDAV & RMI » persistence: RDBMS, fs, memory
  95. Jackrabbit » Java-centric (duh) » complex repository model (nodes+properties) » mixins, inheritance » workspaces » query language » no partitioning/sharding
  96. JCR API levels
  97. Graph databases » Neo4j » AllegroGraph (RDF)
  98. Neo4j » http://neo4j.org/ » Neo Technology
  99. Neo4j » data = nodes + relationships + key/value properties
  100. Neo4j » many language bindings, little remoting » ‘whiteboard’ friendly » scaling to complexity (rather than volume?) » lots of focus on domain modelling » SPARQL/SAIL impl for triple geeks » mostly RAM centric (with disk swapping & persistence)
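
"Data = nodes + relationships + key/value properties" translates almost directly into code. The Python sketch below is an in-memory illustration of that model and of a simple traversal; the `Node` and `Relationship` classes, the `reachable` function and the relationship type names are all hypothetical and have nothing to do with Neo4j's actual bindings.

```python
from collections import deque


class Node:
    """A graph node carrying arbitrary key/value properties."""

    def __init__(self, **properties):
        self.properties = dict(properties)
        self.relationships = []  # outgoing relationships only


class Relationship:
    """A typed, directed edge that can carry properties of its own."""

    def __init__(self, start, rel_type, end, **properties):
        self.start = start
        self.type = rel_type
        self.end = end
        self.properties = dict(properties)
        start.relationships.append(self)


def reachable(start, rel_type, max_depth):
    """Breadth-first: nodes reachable over `rel_type` within `max_depth` hops."""
    seen = {id(start)}
    found = []
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue
        for rel in node.relationships:
            if rel.type == rel_type and id(rel.end) not in seen:
                seen.add(id(rel.end))
                found.append(rel.end)
                queue.append((rel.end, depth + 1))
    return found
```

A "friends of friends" question becomes `reachable(person, "KNOWS", 2)`, a walk along edges rather than a chain of joins, which is roughly what "scaling to complexity" on the slide refers to.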
  101. Experiences & (h)in(d)sights
  102. NOSQL applicability » Horizontal scaling » Multi-Master » Data representation » search of simplicity » data that doesn’t fit the E-R model (graphs, trees, versions) » Speed
  103. Tools for the trade » non-relational data: Couch, Mongo, Riak » massive quantities: Cassandra, HBase » persistent caching: Redis, Voldemort » graphs: neo4j
  104. Tool selection » be careful with the marketese: beware of smoke and mirrors! » monitor dev list, IRC, Twitter, blogs » monitor project ‘sponsors’ » mix-and-match » DON’T NOSQL WITHOUT INTERNAL SYS ARCHS & DEV(OP)S!
  105. (diagram labels) aptness, complexity » NOSQL, SQL » internet, enterprise, corporate, community
  106. Our NOSQL-based project: Lily » (open source) » scalable store (Apache HBase) » and search (Apache Solr) » content repository » α due mid 2010 » www.lilycms.org or @outerthought
  107. Lily architecture (diagram labels) » ZooKeeper: distributed process coordination and configuration » Lily: query, update and indexer clients; Lily Store Server with store nodes, WAL, MQ and M/R client » HBase Region Server: documents, indexes, secondary WAL / MQ » Hadoop DFS: replicas » Solr: REST, inverted index, index replicas
  108. When combining store and search, make sure your (search) index doesn’t become the store.
  109. Key lessons learned » importance of keyspace design » secondary indexing » data de-normalization » schema vs. code flexibility? » distribution is everywhere and you shouldn’t forget about it
  110. Reading material » Amazon Dynamo, Google BigTable, CAP » http://nosql.mypopescu.com/ » http://nosql-database.org/ » http://twitter.com/nosqlupdate » http://highscalability.com/
  111. Questions? http://www.flickr.com/photos/leehaywood/4237636853/
  112. Thanks for your attention! » stevenn@outerthought.org » @stevenn
