CouchConf-Berlin-Advanced-querying
 

  • This presentation shares some tips on how I've gotten CouchDB to perform well for me in the past, as well as things to look forward to in the future. "Advanced" is kind of a distraction: CouchDB is simple, so what you see here shouldn't be that different from basic queries.
  • Queries always end up being about data. All of our data is inside special purpose data structures. Our control of the query depends on understanding and controlling these structures.
  • Everything, even when it's calculated live in memory. Not all of these structures are created equal, however. Fortunately, CouchDB keeps it simple and presents one general structure for most use cases.
  • I won't cover B-trees in depth here; Wikipedia is a good start if you're wondering. Keep in mind that CouchDB uses a specific incarnation that gives us some special properties.
  • A cornerstone of all databases, I/O will decide whether your ideas fly or fail. Feeding today's intense, networked, interactive software requires a serious study of I/O characteristics.
  • Throughput and latency tend to be the measurements of choice. Notice how big a jump RAM is. Imagine how many CPU cycles one HDD seek costs.
  • So let's keep RAM in mind. Couchbase does make good use of RAM for documents in their clustered product, but it's not available for queries.
  • RAM is usually enough, but this should actually be measured. How? Well, let's look at what I call a "working set".
  • All of your data might exist somewhere on disk. That doesn't mean those disk pages can't be cached in RAM. Keep them there. Try to keep data clustered on disk so you get better page cache and buffer cache efficiency.
  • What a working set is.
  • Control the working set by tuning your database design. This talk will focus on views for queries, but all of these points matter. Measure, because it had better add up or your performance will be painfully slow.
  • I always like to start talking about indexing by declaring that it's already there. We already have an automatic index. I call this the primary index, but that's just me.
  • Key-value, anyone? How do we make key-based access fast? How do we accelerate random access versus sequential access? It's all about data layout, which equates to an index.
  • Key-value applies to CouchDB.
  • A nice property of this key index is that it provides a way to enforce uniqueness. I hear this question all the time: "How do I constrain fields of a document to a unique value?" The short answer is _id.
  • This leads beautifully to revision-based concurrency. Semantic keying is a good idea even if it's not in your primary index, but why wait to build a view?
  • Finally, my favorite part of the primary document tree is that it's just one file. There is no duplication of information, so your overhead is nice and small. It's always fresh too, unlike views.
  • These are just a few ideas I've made up names for.
  • It's pretty obvious how this key design helps turn joins into a range query.
  • _rev can also be passed, but be careful, as revisions can be pruned during compaction.
  • Aggregates don't cost much, so it pays to have default reduce functions. It's all about knowing your data better.
  • When you have one big database, you pay all costs at once. Compaction costs, for example, can be huge.
  • When you have many smaller databases, costs can be paid incrementally. Compaction, for example, will carry much less overhead.
  • Key access is fast. Simple.
  • Merging queries means you might have cases with partial results.
  • It's still an option, especially if you need certain performance on a cluster.
  • Available as part of Couchbase Single/Mobile.
  • CouchDB, Couchbase Single only.
  • Good way to extend an existing cluster. Up to the application layer.

CouchConf-Berlin-Advanced-querying Presentation Transcript

  • Advanced Querying Brian Mitchell (strmpnk)
  • Query: finding the right information, scanning and processing data, traversing data structures
  • Everything ends up in some sort of data structure.
  • B-tree: shallow, append only, compressed, awesome
  • I/O: all of your data structures are limited by the medium [chart: throughput (MB/s) and latency (microseconds) for HDD, SSD, RAM]
  • Obviously RAM is good. Cheap too. Not unlimited.
  • [Diagram: all your data, working set] Keep it in RAM
  • "Working Set"
    • recently accessed documents
    • replicating documents
    • compaction files
    • index files
  • Controlling Working Set Size
    • smaller documents
      • short object keys, less repetition
    • smaller databases
      • increases locality and minimizes compaction overhead
    • fewer or smaller views
      • multi-purpose
      • avoid repeating document data
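
To get a rough feel for what "short object keys, less repetition" can buy, the sketch below compares the JSON-encoded size of one record under long and short field names. The field names and values are made up for illustration, and the encoded JSON size is only a proxy for what the database actually stores on disk.

```python
import json

# Hypothetical document with verbose field names.
verbose = {
    "customer_first_name": "Ada",
    "customer_last_name": "Lovelace",
    "customer_email_address": "ada@example.com",
}

# The same data with short keys.
compact = {"fn": "Ada", "ln": "Lovelace", "em": "ada@example.com"}


def encoded_size(doc):
    """Bytes of the compact JSON encoding (a proxy, not the on-disk size)."""
    return len(json.dumps(doc, separators=(",", ":")).encode("utf-8"))


verbose_size = encoded_size(verbose)
compact_size = encoded_size(compact)
saved = verbose_size - compact_size
print(f"verbose={verbose_size}B compact={compact_size}B saved={saved}B per document")
```

The saving is per document, so it is multiplied by every document and by every view row that repeats those keys.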
  • Primary Index: Your first line of defense against bloat
  • Function of an Index: Key → Value
  • Function of a Primary Index in Couchbase: Key → Doc
  • Uniqueness, Semantic Keying [diagram: keys A, B, C, B]
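
The idea that the primary index gives uniqueness and revision-based concurrency for free can be sketched with a tiny in-memory store. This is a simulation, not CouchDB's API: `TinyStore`, `Conflict`, the integer revisions (real revisions are strings), and the `user:` key scheme are all assumptions made for illustration.

```python
class Conflict(Exception):
    """A write clashed with the stored document (CouchDB answers such writes with HTTP 409)."""


class TinyStore:
    """In-memory stand-in for a document store keyed by _id."""

    def __init__(self):
        self._docs = {}

    def put(self, doc):
        doc_id = doc["_id"]
        current = self._docs.get(doc_id)
        current_rev = current["_rev"] if current else None
        # A write must name the revision it is based on; a new document names none.
        if doc.get("_rev") != current_rev:
            raise Conflict(doc_id)
        new_rev = (current_rev or 0) + 1
        self._docs[doc_id] = dict(doc, _rev=new_rev)
        return new_rev


store = TinyStore()
# Semantic key: the unique field (an email address) is part of the _id.
rev = store.put({"_id": "user:ada@example.com", "name": "Ada"})

try:
    store.put({"_id": "user:ada@example.com", "name": "Impostor"})
    duplicate_rejected = False
except Conflict:
    duplicate_rejected = True  # second create with the same _id is refused
```

Because the constraint lives in the key itself, there is no view to build or wait for before it takes effect.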
  • One File: Always Fresh, No Extra Cleaning
  • Secondary Index, a.k.a. View
    • Projects a new sequence
    • Custom mapped values
    • M-N
    • Links back to source document
  • View Techniques
    • Join by collation
    • Page by key
    • Foreign includes
    • Cheap aggregates
    • Flexible grouping
  • Join By Collation [diagram: documents Contact A, Contact B, Note for A, Note for B, Note for A; emitted keys in collated order: A, A-note, A-note, B, B-note]
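
The join-by-collation slides can be simulated in a few lines. Real CouchDB map functions are written in JavaScript and sorted by CouchDB's own collation; here Python generators and Python list ordering stand in for both. The document shapes, ids, and the 0/1 marker in the key are assumptions chosen for illustration.

```python
# Documents as they might sit in the database (shapes are hypothetical).
docs = [
    {"_id": "c-a", "type": "contact", "name": "Contact A"},
    {"_id": "c-b", "type": "contact", "name": "Contact B"},
    {"_id": "n-1", "type": "note", "contact": "c-a", "text": "Note for A"},
    {"_id": "n-2", "type": "note", "contact": "c-b", "text": "Note for B"},
    {"_id": "n-3", "type": "note", "contact": "c-a", "text": "Note for A"},
]


def map_doc(doc):
    """Yield (key, value) rows, standing in for emit(key, value)."""
    if doc["type"] == "contact":
        yield [doc["_id"], 0], doc["name"]       # 0 sorts the contact first
    elif doc["type"] == "note":
        yield [doc["contact"], 1], doc["text"]   # 1 sorts its notes right after


def build_view(docs):
    rows = [(key, value, doc["_id"]) for doc in docs for key, value in map_doc(doc)]
    # Sort by key, then by document id, as a stand-in for view collation.
    return sorted(rows, key=lambda row: (row[0], row[2]))


def query(rows, startkey, endkey):
    """Inclusive key range, like ?startkey=...&endkey=..."""
    return [row for row in rows if startkey <= row[0] <= endkey]


view = build_view(docs)
# One range query returns a contact followed by all of its notes.
contact_a = query(view, ["c-a"], ["c-a", 2])
for key, value, doc_id in contact_a:
    print(key, value, doc_id)
```

The upper bound `["c-a", 2]` works only because the markers are 0 and 1; against a real view an end key such as `["c-a", {}]` is the usual choice, since objects collate after numbers and strings.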
  • Page By Key [diagram: keys A, B, C, D, E; first page limit=2, next page limit=2&start_key=B\ufff0]
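
Paging by key, as the slides outline it, asks for `limit` rows and then restarts just past the last key seen. The sketch below assumes the second request's start key is the last key plus the high sentinel character `\ufff0`, and uses Python's code point ordering in place of CouchDB's collation; `query` and `pages` are hypothetical helpers, not library calls.

```python
import bisect

keys = ["A", "B", "C", "D", "E"]  # sorted view keys from the slide


def query(keys, limit, startkey=None):
    """Return up to `limit` keys >= startkey, like ?limit=N&startkey=K."""
    start = 0 if startkey is None else bisect.bisect_left(keys, startkey)
    return keys[start:start + limit]


def pages(keys, limit):
    """Yield successive pages without ever using an offset."""
    startkey = None
    while True:
        page = query(keys, limit, startkey)
        if not page:
            return
        yield page
        # Appending the sentinel lands just past the last key of this page.
        startkey = page[-1] + "\ufff0"


all_pages = list(pages(keys, limit=2))
print(all_pages)
```

Each page is a direct seek into the sorted keys, so the cost does not grow with the page number. One caveat of the sentinel trick: any other key that begins with the last key's text (for example "Ba" after "B") would be skipped, and duplicate keys need extra care, which is what CouchDB's `startkey_docid` option is for.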
  • Foreign Includes A B Emit a a
  • Foreign Includes A B Reference_id=A _id=B
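
One reading of the foreign-includes slides is CouchDB's linked-document behavior: when a view row's value is an object containing `_id`, a query with `include_docs=true` returns the document with that id instead of the one that emitted the row. The sketch simulates this in memory; the `ref` field name and the `query` helper are assumptions made for illustration.

```python
docs = {
    "A": {"_id": "A", "title": "shared document"},
    "B": {"_id": "B", "ref": "A"},  # "ref" is a hypothetical field name
}


def map_doc(doc):
    """Stand-in for a map function: emit a pointer to another document."""
    if "ref" in doc:
        yield doc["_id"], {"_id": doc["ref"]}


def query(docs, include_docs=False):
    rows = []
    for doc_id, doc in docs.items():
        for key, value in map_doc(doc):
            row = {"id": doc_id, "key": key, "value": value}
            if include_docs:
                # Follow the _id in the value when present, else the emitting doc.
                linked = isinstance(value, dict) and "_id" in value
                target = value["_id"] if linked else doc_id
                row["doc"] = docs.get(target)
            rows.append(row)
    rows.sort(key=lambda row: row["key"])
    return rows


rows = query(docs, include_docs=True)
print(rows[0]["id"], "->", rows[0]["doc"]["_id"])
```

The index stores only the small pointer, so the referenced document's data is not copied into the view.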
  • Cheap Aggregates
    • It pays to know your data well
    • Reduce values are stored inline with the view b-tree
    • Small values take very little space
    • Nice built-in reduce functions
    • Not just for user visible data
  • Flexible Grouping [diagram: dates 2008-10-02, 2008-08-17, 2009-02-12; emit [2008, 10], [2008, 8], [2009, 2]]
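
The grouping slides emit `[year, month]` keys so that one view can answer both per-year and per-month questions. The sketch below simulates that with a sum reduce (in the spirit of the built-in `_sum`) and a `group_level` parameter; `map_date` and `count_by` are hypothetical helpers, and emitting a value of 1 per document is an assumption.

```python
from itertools import groupby

dates = ["2008-10-02", "2008-08-17", "2009-02-12"]  # dates from the slide


def map_date(date):
    """Stand-in for emit([year, month], 1)."""
    year, month, _day = (int(part) for part in date.split("-"))
    return [year, month], 1


# View rows are kept sorted by key.
rows = sorted(map_date(date) for date in dates)


def count_by(rows, group_level):
    """Sum values per key prefix, like ?group_level=N with a sum reduce."""
    grouped = groupby(rows, key=lambda row: tuple(row[0][:group_level]))
    return {key: sum(value for _key, value in group) for key, group in grouped}


per_year = count_by(rows, group_level=1)
per_month = count_by(rows, group_level=2)
print(per_year)
print(per_month)
```

Because the rows are already sorted by key, every prefix group is contiguous, which is why a single index can serve several grouping levels.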
  • Traditional CouchDB
  • [Diagram: segments of 20%, 10%, 70%]
  • [Diagram: the same 20%, 10%, 70% segments repeated twelve times]
  • Clustering
  • Single Key
  • Query
  • Alternatives
  • Manual Indexing
    • Store an index as a document
    • Good properties for mostly static indexing
    • Cluster friendly
    • Create custom constraints (uniqueness)
    • Snapshot of a slow query for speed
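
"Store an index as a document" could look something like the sketch below: the application scans its documents once, builds a lookup table, and saves that table as an ordinary document under a well-known id. Everything here (`build_index_doc`, the `index:by-city` id, the `city` field, the document layout) is an assumption for illustration, and the field values are assumed to be strings so they can serve as JSON object keys.

```python
def build_index_doc(docs, field, index_id):
    """Build an index document mapping each value of `field` to document ids."""
    entries = {}
    for doc in docs:
        if field in doc:
            entries.setdefault(doc[field], []).append(doc["_id"])
    for ids in entries.values():
        ids.sort()
    return {"_id": index_id, "field": field, "entries": entries}


docs = [
    {"_id": "u1", "city": "Berlin"},
    {"_id": "u2", "city": "Hamburg"},
    {"_id": "u3", "city": "Berlin"},
    {"_id": "u4"},  # documents without the field are left out
]

index_doc = build_index_doc(docs, "city", "index:by-city")
# A lookup is then one key fetch of the index document plus a dictionary access.
berlin_ids = index_doc["entries"].get("Berlin", [])
print(berlin_ids)
```

The index is a snapshot: it is only as fresh as the last time the application rebuilt it, which is why the slide recommends it for mostly static data.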
  • GeoCouch
    • R-tree based
    • First-class Erlang
      • improved with view engine refactor
    • Can be abused for multi-dimensional queries
      • more than just geo-data
  • CouchDB Lucene
    • Based on CouchDB Externals
    • Limited to Couchbase Single Server
    • Faceted queries
    • Full-text indexing
  • Hybrid
    • Application managed
    • Allow a standalone service to work with a Couchbase cluster
      • e.g. Solr, Redis, PostgreSQL
    • Complex concurrency
    • More moving parts
  • Fin. twitter: @strmpnk, email: b@p2p.io