N-O-SQL, new database technologies on the rise

An introductory presentation on NOSQL technology for SAI (2010-04-20)

Presentation Transcript

  • N-O-SQL new database technologies on the rise http://www.flickr.com/photos/wolfgangstaudt/2215246206/ IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org
  • Who am I » Steven Noels - stevenn@outerthought.org » Outerthought: scalable content applications » makers of the Daisy and Lily open source CMS
  • Agenda » raison d’être: what brought us here » concepts: required theory readings » market overview: trees & the forest » experiences and (h)in(d)sights
  • Raison d’être
  • History » 1. standardization » 2. simplification » (diagram labels: hierarchical databases, IMS, XMLDB, RDBMS, OODBMS)
  • Inconsistency through slave lag » John Quinn (Digg)
  • Scaling writes (1) » John Quinn (Digg)
  • Scaling writes (2) » John Quinn (Digg)
  • Issues with partitioning » lose the ability to make arbitrary queries » have to predict data access patterns when formulating partitioning strategy » complex and fragile systems
  • Replication complexity
  • Scaling relational systems » When scaling relational systems you lose their advantages but retain their overhead
  • History » 3. scaling: caching, denormalisation, sharding, replication ... » 4. rethinking the problem » (diagram labels: RDBMS, NOSQL)
  • Moore vs Kryder » seek time is constant (network latency as well?) » transfer rate ! spindles ! » as a principle, writes are hard to scale
  • Cambrian Explosion
  • Buzz-oriented development?
  • Cambrian Explosion » N-O-SQL
  • The Perspective of Cost
  • Common themes » SCALE SCALE SCALE » new data models » devops » N-O-SQL » The Cloud: technology is of no interest anymore
  • Numbers of scale » http://qos.doubleclick.net/counters/
  • Types of scaling » scaling for usage: volume of users, volume of data » scaling types of ops: concurrent read, concurrent write » (diagram labels: availability, partitioning, replication, consistency, distribution)
  • Distributed systems are hard!
  • 8 fallacies of distributed computing (Peter Deutsch and James Gosling) » The network is reliable. » Latency is zero. » Bandwidth is infinite. » The network is secure. » Topology doesn't change. » There is one administrator. » Transport cost is zero. » The network is homogeneous.
  • New Data » sparse structures » weak schemas » graphs » semi-structured » document-oriented
  • N-O-SQL = not only SQL!
  • The NOSQL footprint » (diagram; labels as extracted: free-structured or sparse data; NOSQL; MongoDB, CouchDB, neo4j, Cassandra, HBase; available; (complexity); simple operational; highly scalable; and constraints; ACID, SQL; referential integrity, typed data)
  • NOSQL, if you need ... » horizontal scaling (out rather than up) » unusually common data (aka free-structured) » speed (especially for writes) » the bleeding edge
  • SQL/RDBMS, if you need ... » SQL » ACID » normalisation » a defined liability
  • Theory
  • Robust systems
  • Academic background » Amazon Dynamo » Google BigTable » Eric Brewer's CAP theorem
  • Amazon Dynamo » popularised the term ‘eventual consistency’ » consistent hashing
  • Consistent hashing » http://horicky.blogspot.com/2009/11/nosql-patterns.html
  • Consistent hashing - node C + node D » http://www.lexemetech.com/2007/11/consistent-hashing.html
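
The two consistent hashing slides above place keys and nodes on the same hash ring and show what happens when node C leaves and node D joins. A minimal Python sketch of that idea follows. The `HashRing` class, the MD5-based ring position and the node names are illustrative assumptions, not code from Dynamo or from the linked articles.

```python
import bisect
import hashlib


def ring_position(name: str) -> int:
    """Map a string to a position on the ring (MD5 is an arbitrary choice here)."""
    return int(hashlib.md5(name.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Hypothetical ring: each key belongs to the first node clockwise from it."""

    def __init__(self, nodes=()):
        self._positions = []   # sorted ring positions of all nodes
        self._owners = {}      # ring position -> node name
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        pos = ring_position(node)
        bisect.insort(self._positions, pos)
        self._owners[pos] = node

    def remove(self, node: str) -> None:
        pos = ring_position(node)
        self._positions.remove(pos)
        del self._owners[pos]

    def node_for(self, key: str) -> str:
        # First node position >= the key's position, wrapping around at the end.
        idx = bisect.bisect_left(self._positions, ring_position(key))
        return self._owners[self._positions[idx % len(self._positions)]]


keys = [f"user:{i}" for i in range(1000)]
ring = HashRing(["A", "B", "C"])
before = {k: ring.node_for(k) for k in keys}

ring.remove("C")   # "- node C"
after_remove = {k: ring.node_for(k) for k in keys}

ring.add("D")      # "+ node D"
after_add = {k: ring.node_for(k) for k in keys}
```

Only keys owned by the departing node move, and only keys that fall into the new node's arc move to it. A plain `hash(key) % node_count` scheme would remap most keys on the same change. Production systems usually place several virtual positions per node to even out the load, which this sketch leaves out.
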
  • Google BigTable » multi-dimensional column-oriented database » on top of Google File System » object versioning
  • CAP theorem » strong consistency » high availability » partition-tolerance
  • CAP » Strong Consistency: all clients see the same view, even in the presence of updates » High Availability: all clients can find some replica of the data, even in the presence of failures » Partition-tolerance: the system properties hold even when the system is partitioned
  • Consistency » Where is the data I just updated? » Ideal world: the result of every write operation is reflected by subsequent read operations.
  • Consistency
  • Sunny-day scenario
  • Network partitioning
  • Culture Clash » Classic distributed systems: focus on ACID » atomic » consistent » isolated » durable » Modern internet systems: focus on BASE » basically available » soft-state (or scalable) » eventually consistent
  • Culture Clash (a spectrum) » ACID: strong consistency for transactions is the highest priority; availability less important; pessimistic; rigorous analysis; complex mechanisms » BASE: availability and scaling are the highest priorities; weak consistency; optimistic; best effort; simple and fast
  • Building for failure » defensive programming » creating replicas » disk flushing » watch out for failure of utility infrastructure » conscious sync/async decisions
  • Possible storage failures (Michael Stonebraker) » Application errors » Repeatable DB failures » Unrepeatable DB failures » OS errors » Local cluster HW failure » Local cluster network partitioning » Disaster » WAN network failure between remote clusters
  • Availability ≠ total async!
  • ✘ The Enterprise Service Bus » bus = congestion
  • Bus systems » objects don’t fit in a pipe » object ➙ message » serialization / de-serialization cost » message size » queuing = cost
  • Use a mixture of both » async + sync for the stuff which matters!
  • Numbers of scale » http://qos.doubleclick.net/counters/
  • Processing large datasets: Map/Reduce
  • Smart Data » sparse as a feature » weak schemas » ad-hoc indexing » organic analytics » near-data processing » live(ly) datawarehouse » distribution ➙ parallelization ➙ performance
  • Hadoop: HDFS + MapReduce » single filesystem + single execution-space
  • MapReduce example: WordCount
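
The WordCount slide is an image in the original deck, so its code did not survive in this transcript. As a rough stand-in, here is a single-process Python sketch of the map, shuffle and reduce phases. The function names and sample lines are assumptions made for illustration; a real Hadoop job would spread these phases across a cluster.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["the quick brown fox", "the lazy dog", "The fox"])
```

Because each call to `map_phase` looks at one line only and each call to `reduce_phase` looks at one word only, both phases can run in parallel on separate machines; only the shuffle needs to move data between them.
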
  • MapReduce
  • MapReduce and HDFS » © Lars George
  • Physical architecture
  • Processing large datasets with MR » Benefit from parallelisation » Less modelling upfront (ad-hoc processing) » Compartmentalized approach reduces operational risks » AsterData et al. have SQL/MR hybrids for huge-scale BI
  • Market overview
  • Categories » key-value stores » column stores » document stores » graph databases
  • Key-value stores » Redis » Voldemort » Tokyo Cabinet
  • Redis » REmote DIctionary Server » http://code.google.com/p/redis/ » VMware
  • Redis Features » persisted memcache, ‘awesome’ » RAM-based + persistable » key ➙ values: string, list, set » higher-level ops » e.g. push/pop and sort for lists » fast (very) » configurable durability » client-managed sharding
  • Voldemort » http://project-voldemort.com/ » LinkedIn
  • Voldemort » persistent » distributed » fault-tolerant » hash table
  • Voldemort API: GET, PUT, DELETE
  • Voldemort routing logic moving up the stack, smaller latency
  • Voldemort data format » key+values = arrays of bytes » So how do we convert objects ⬌ bytes? » json » string » java-serialization » protobuf » identity
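
Since keys and values are stored as plain byte arrays, the client has to pick a serializer. The Python sketch below illustrates the `json` and `identity` options from the slide. The helper names and the dict standing in for the store are hypothetical; this is not the Voldemort client API.

```python
import json


def json_to_bytes(obj) -> bytes:
    """Serialize a JSON-compatible object to UTF-8 bytes (the 'json' option)."""
    return json.dumps(obj, sort_keys=True).encode("utf-8")


def json_from_bytes(data: bytes):
    """Inverse of json_to_bytes."""
    return json.loads(data.decode("utf-8"))


def identity(data: bytes) -> bytes:
    """The 'identity' option: the application already works with raw bytes."""
    return data


# A dict stands in for the store: both keys and values are bytes.
store = {}
key = "member:42".encode("utf-8")
store[key] = json_to_bytes({"name": "Steven", "skills": ["cms", "nosql"]})

profile = json_from_bytes(store[key])
```

The store itself never interprets the value, so switching from JSON to another format on the slide's list would only change the two helper functions, not the storage layer.
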
  • Tokyo Cabinet » http://1978th.net/tokyocabinet/ » mixi.jp (a Japanese social network, comparable to Facebook)
  • Product Family
  • Tokyo Cabinet » memory or filesystem » hash, b-tree, fixed-length, table
  • Column stores » BigTable » HBase » Cassandra
  • BigTable » http://labs.google.com/papers/bigtable.html » Google » layered on top of GFS
  • HBase » http://hadoop.apache.org/hbase/ » StumbleUpon / Adobe / Cloudera
  • HBase » sorted » persisted » distributed » storage system » column-oriented » multi-dimensional » highly-available » adds random access » high-performance reads and writes atop HDFS
  • HBase data model » Distributed multi-dimensional sparse map » Multi-dimensional keys: (table, row, family:column, timestamp) → value » Keys are arbitrary strings » Access to row data is atomic
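
To make the key structure concrete, a toy in-memory Python model of the `(table, row, family:column, timestamp) → value` map is sketched below. The `SparseTable` class and its method names are invented for illustration and do not mirror the HBase client API. Only cells that were written take up space, and a read returns the newest version unless a timestamp bound is given.

```python
from collections import defaultdict


class SparseTable:
    """Toy model of one table: row -> 'family:column' -> {timestamp: value}."""

    def __init__(self):
        self._rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row, column, timestamp, value):
        self._rows[row][column][timestamp] = value

    def get(self, row, column, max_timestamp=None):
        """Return the newest value at or before max_timestamp, or None."""
        versions = self._rows.get(row, {}).get(column, {})
        candidates = [ts for ts in versions
                      if max_timestamp is None or ts <= max_timestamp]
        return versions[max(candidates)] if candidates else None

    def columns(self, row):
        """Columns actually present for a row; absent cells cost nothing."""
        return sorted(self._rows.get(row, {}))


pages = SparseTable()
pages.put("com.example/index", "content:html", 1, "<html>v1</html>")
pages.put("com.example/index", "content:html", 2, "<html>v2</html>")
pages.put("com.example/index", "meta:lang", 1, "en")
pages.put("org.example/about", "content:html", 1, "<html>about</html>")

latest = pages.get("com.example/index", "content:html")
older = pages.get("com.example/index", "content:html", max_timestamp=1)
```

The second row has no `meta:lang` cell at all, which is what "sparse" means here: rows in the same table need not share the same set of columns.
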
  • Storage architecture » © Lars George
  • Cassandra » http://cassandra.apache.org/ » Rackspace / Facebook
  • Cassandra » Key-value store (with added structure) » Reliability (identical nodes) » Eventually consistent » Distributed » Tunable » Partitioning » Replication » (diagram labels: A, C, P)
  • Cassandra write pattern
  • Cassandra applicability » FIT: scalable reliability (through identical nodes); linear scaling; write throughput; large data sets » NO FIT: flexible indexing; only PK-based querying; big binary data; 1 row must fit in RAM entirely
  • Document stores » CouchDB » MongoDB » Riak » MarkLogic
  • CouchDB » http://couchdb.apache.org/ » couch.io
  • CouchDB » fault-tolerant » schema-free » document-oriented » accessible via a RESTful HTTP/JSON API
  • CouchDB documents » { "_id": "BCCD12CBB", "_rev": "AB764C", "type": "person", "name": "Darth Vader", "age": 63, "headware": ["Helmet", "Sombrero"], "dark_side": true }
  • CouchDB REST API » HTTP » PUT /db/docid » GET /db/docid » POST /db/docid » DELETE /db/docid
  • CouchDB Views » MapReduce-based » Filter, Collate, Aggregate » JavaScript » map: function (doc) { for (var i in doc.tags) emit(doc.tags[i], 1); } » reduce: function (Key, Values) { var sum = 0; for (var i in Values) sum += Values[i]; return sum; }
  • CouchDB » be careful about semantics » replication ≠ partitioning/sharding! » distributed database = distributable database » sharded / distributed deployment requires proxy layer
  • MongoDB » http://www.mongodb.org/ » 10gen
  • MongoDB » cf. CouchDB, really » except for: » C++ » performance focus » runtime queries (mapreduce still available) » native drivers (no REST/HTTP layering) » no MVCC: update-in-place » auto sharding (alpha)
  • Riak » http://riak.basho.com/ » Basho Technologies
  • Riak » buckets/keys, links » values/content = bucket + metadata » pluggable storage engines (fs, (D)ETS, InnoDB) » HTTP/REST API » automatic distribution » mapreduce using JavaScript
  • Jackrabbit » http://jackrabbit.apache.org/ » Day Software
  • Jackrabbit » reference implementation for JSR 170 & 283 » remoting: WebDAV & RMI » persistence: RDBMS, fs, memory
  • Jackrabbit » Java-centric (duh) » complex repository model (nodes+properties) » mixins, inheritance » workspaces » query language » no partitioning/sharding
  • JCR API levels
  • Graph databases » Neo4j » AllegroGraph (RDF)
  • Neo4j » http://neo4j.org/ » Neo Technology
  • Neo4j » data = nodes + relationships + key/value properties
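
The "nodes + relationships + key/value properties" model can be sketched with plain Python data classes. The names below (`Node`, `Relationship`, `Graph`, `neighbours`) and the sample data are hypothetical and are not the Neo4j API; the sketch only shows how properties hang off both nodes and relationships.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: int
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    start: int          # node_id of the start node
    end: int            # node_id of the end node
    rel_type: str
    properties: dict = field(default_factory=dict)


class Graph:
    def __init__(self):
        self.nodes = {}
        self.relationships = []

    def add_node(self, **properties):
        node = Node(len(self.nodes), properties)
        self.nodes[node.node_id] = node
        return node

    def relate(self, start, end, rel_type, **properties):
        rel = Relationship(start.node_id, end.node_id, rel_type, properties)
        self.relationships.append(rel)
        return rel

    def neighbours(self, node, rel_type):
        """Follow outgoing relationships of one type from a node."""
        return [self.nodes[r.end] for r in self.relationships
                if r.start == node.node_id and r.rel_type == rel_type]


g = Graph()
vader = g.add_node(name="Darth Vader", dark_side=True)
luke = g.add_node(name="Luke Skywalker", dark_side=False)
g.relate(vader, luke, "FATHER_OF", revealed_in="Episode V")

children = [n.properties["name"] for n in g.neighbours(vader, "FATHER_OF")]
```

Queries in this model are traversals that start from a node and follow typed relationships, rather than joins across tables.
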
  • Neo4j » many language bindings, little remoting » ‘whiteboard’ friendly » scaling to complexity (rather than volume?) » lots of focus on domain modelling » SPARQL/SAIL impl for triple geeks » mostly RAM centric (with disk swapping & persistence)
  • Experiences & (h)in(d)sights
  • NOSQL applicability » Horizontal scaling » Multi-Master » Data representation » search of simplicity » data that doesn’t fit the E-R model (graphs, trees, versions) » Speed
  • Tools for the trade » non-relational data: Couch, Mongo, Riak » massive quantities: Cassandra, HBase » persistent caching: Redis, Voldemort » graphs: neo4j
  • Tool selection » be careful with the marketese: beware of smoke and mirrors! » monitor dev list, IRC, Twitter, blogs » monitor project ‘sponsors’ » mix-and-match » DON’T NOSQL WITHOUT INTERNAL SYS ARCHS & DEV(OP)S!
  • (diagram: aptness; labels as extracted: NOSQL, SQL, internet, enterprise, corporate, community, complexity)
  • Our NOSQL-based project: Lily » (open source) » scalable store (Apache HBase) » and search (Apache SOLR) » content repository » α due mid 2010 » www.lilycms.org or @outerthought
  • Lily architecture » (diagram; labels as extracted: distributed process coordination and configuration (ZooKeeper); Lily Server; Lily Store; query, update, indexer; store client, MQ client, M/R client; store nodes; WAL, 2ary WAL, MQ; HBase Region Server; documents, indexes; Hadoop DFS replicas; REST; inverted index, index replicas; SOLR)
  • When combining store and search, make sure your (search) index doesn’t become the store.
  • Key lessons learned » importance of keyspace design » secondary indexing » data de-normalization » schema vs. code flexibility? » distribution is everywhere and you shouldn’t forget about it
  • Reading material » Amazon Dynamo, Google BigTable, CAP » http://nosql.mypopescu.com/ » http://nosql-database.org/ » http://twitter.com/nosqlupdate » http://highscalability.com/
  • Questions? » http://www.flickr.com/photos/leehaywood/4237636853/
  • Thanks for your attention! » stevenn@outerthought.org » @stevenn