CouchConf Israel Best practices for Good Document Design
Notes
  • Slide 2: It’s not everything you know…
  • Slide 3: It’s not half of what you know…
  • Slide 4: It’s not even everything you know about a particular subject…
  • Slide 5: Who knows what it is? http://polaris.gseis.ucla.edu/gleazer/260_readings/Buckland.pdf
  • Slide 6: IMO, schemas always exist, but you may be in the process of discovering them while you’re developing your application.
  • Slide 9: UUIDs don’t cut it for this; the same doc POSTed twice gets 2 different UUIDs.
  • Slide 13: running _stats (or _sum) on strings will ruin your day; use Number() if you’re unsure.
  • Slide 15: Side note: revisions should never be used for versioning, as compaction will remove them and you’ll be sad.
  • Slide 17: The accounting model requires a “reconciliation” (via MapReduce) to produce the canonical document. Denormalized data example: author data when updating a blog post.
  • Slide 21: Caution: someone else likely uses this type name; maybe “namespace” them: yourapp.contact. Can turn this into whatever other format you need.
  • Slide 28: Simple document representing a single reading from a single thermometer.
  • Slide 30: India train station data.
  • Slide 31: India train data.
  • Slide 32: India train data. This is modified from the original form. Problems:
      1. departure_days was in the form of “---W---”
      2. Uses “T” and “F” for true and false
      3. Can tell starting and ending stations, but not middle stations (redundant, but important for querying)
      4. Had fields for each day of the week (e.g. “monday”: “F”).
  • Slide 33: Complex document with lots of fields (exif abbreviated, more attachments, etc...). Real life image from my photo album. ID is an MD5 of the original image.

    1. Best Practices for Good Document Design
       Chris Anderson, @jchris
    2. DOCUMENTS ARE NOT…
    3. DOCUMENTS ARE NOT…
    4. DOCUMENTS ARE NOT…
    5. (no slide text)
    6. SCHEMA-FREE DATABASE
       • schema definition is optional at write time
         – no need to define schema before adding data
         – write any sort of JSON you’d like
       • schemas can be enforced
       • but (by default) only matter when writing queries
    7. HOWEVER! There are constraints.
    8. INHERENT SCHEMA (sort of)
    9. UNIQUENESS
       • Document ID is the only (DB-side) way to make something unique
       • App could de-duplicate from map/reduce
         – but that can be tricky
       • So, be prepared to handle conflicting IDs
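
A minimal sketch of the uniqueness point, assuming the approach visible in the thermometer example on slide 28, where the `_id` combines the timestamp and the sensor serial number. The helper name `readingDocId` is hypothetical; the idea is only that a deterministic ID makes a repeated save target the same document instead of creating a second one under a fresh UUID.

```javascript
// Hypothetical helper: derive a stable _id from the fields that define
// "the same" reading (timestamp + sensor serial number).
function readingDocId(reading) {
  if (!reading.ts || !reading.sn) {
    throw new Error("ts and sn are required to build a stable _id");
  }
  return reading.ts + "_" + reading.sn;
}

const reading = {
  type: "reading",
  sn: "101D8A2A000000F7",
  ts: "2011-10-20T00:32:58",
  reading: 22.98
};

// Saving this doc twice hits the same _id, so the second write surfaces
// as a conflict to handle rather than as a silent duplicate.
const doc = Object.assign({ _id: readingDocId(reading) }, reading);
console.log(doc._id);
```

The trade-off is the one the slide names: a meaningful ID gives uniqueness, but the app has to be prepared for the conflict when the ID already exists.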
    10. IMPLICIT BASICS
    11. JSON DOCUMENTS
        {
            "json": "key / value pairs",
            "_id": "specified id or auto generated UUID",
            "_rev": "mvcc",
            "keys are strings": [1, 2, 3, "four", null],
            "schema free": true
        }
    12. KEY NAMES
        • JSON Object restrictions
          – they’re all strings
        • Couchbase Server reserves these prefixes on top-level keys
          – "_" underscore (also reserved by CouchDB)
          – "$" dollar signs
        • cannot have duplicate key names on a single level
          – {"key": 1, "key": 2} is invalid (thankfully)
    13. VALUES
        • JSON restrictions
          – objects, arrays, strings, numbers, booleans, null
        • Consider how you’ll be using it in your app
          – template system constraints
          – Mustache needs arrays of objects vs. arrays of arrays
        • Be careful of numbers as strings
        • Date formats
          – ISO8601
          – unix timestamps
          – output as an array for grouping reductions
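
A rough sketch of the last two bullets: coercing a numeric value that may have been stored as a string, and emitting a date as an array so a reduce can be grouped by year, month, day, and so on. Everything here is an assumption for illustration: the helper names are made up, and `emit` is passed in as a parameter only so the sketch runs outside the database (inside a view's map function it is provided by the server).

```javascript
// Split an ISO 8601 timestamp (without a zone suffix) into numeric parts,
// e.g. year, month, day, hour, minute, second, for use as an array key.
function dateKey(iso) {
  return iso.split(/[-T:]/).map(Number);
}

// Coerce a value that may have been written as a string into a number,
// and fail loudly instead of letting a string reach a numeric reduce.
function numericValue(value) {
  const n = Number(value);
  if (Number.isNaN(n)) {
    throw new Error("not numeric: " + value);
  }
  return n;
}

// Stand-in for a view's map function.
function mapReading(doc, emit) {
  if (doc.type === "reading") {
    emit(dateKey(doc.ts), numericValue(doc.reading));
  }
}

// Local harness: collect emitted rows in an array.
const rows = [];
mapReading(
  { type: "reading", ts: "2011-10-20T00:32:58", reading: "22.98" },
  (key, value) => rows.push({ key: key, value: value })
);
console.log(JSON.stringify(rows));
```

With an array key, a grouped query can stop at any prefix of the key (per year, per month, per day) without a separate view for each granularity.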
    14. ONE DOC OR MULTIPLE DOCS?
    15. DECISION MAKERS
        • what does this document look like in real life?
        • how often will I update this?
        • does this need its own revision/transaction path?
          – does all this data need updating together?
          – or rolled back together?
    16. QUERYING: can I get at the doc’s data easily?
    17. UPDATING
        • When things change, do I want to update the doc?
          – or record the document’s changes as individual docs
        • Frequently written docs might make replication harder due to higher conflict probability
        • Will I have de-normalized portions of data on hand in the client app when updating?
    18. REPLICATION (the biggie!)
    19. REPLICATION
        • Where possible...
          – avoid conflicts
          – leverage small pieces
        • Keep uniqueness and conflicts in balance
    20. CONVENTIONS: paving the cattle trails
    21. CONVENTIONS & GOOD HABITS
        • "type": "contact"
        • "created_at": Unix timestamp
        • "status": some status for this doc (ex: published)
        • "tags": ["couch", "db", "nosql"]
    22. MORE CONVENTIONS
        • "created_by": username
          – typically from _users database
        • "profile": CouchApp profile contents
          – from _users database
          – stored on the doc for convenience
    23. TOOLS
    24. VALIDATE_DOC_UPDATE
        • optional schema enforcement
        • function(newDoc, storedDoc, userCtx)
        • throw errors to prevent save
        • cannot modify newDoc
        • can enforce field types, values, formats
        • can prevent docs or fields from being changed (created_at, user)
        • runs every time a document is updated
          – even during replication
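
A minimal sketch of such a validation function, using the signature from the slide. The specific rules (a string `type`, a numeric `created_at` that cannot change, a logged-in user) are assumptions chosen to match the conventions listed earlier, not rules from the talk. In a design document the function would be stored anonymously under `validate_doc_update`; it is named here only so it can be exercised locally.

```javascript
// Sketch of a validate_doc_update function. Throwing an object with a
// `forbidden` or `unauthorized` key rejects the write; returning normally
// accepts it. The function only inspects newDoc and never modifies it.
function validateDocUpdate(newDoc, storedDoc, userCtx) {
  if (newDoc._deleted) {
    return; // let deletions through without field checks
  }
  if (!userCtx || !userCtx.name) {
    throw { unauthorized: "you must be logged in to save documents" };
  }
  // Enforce field types.
  if (typeof newDoc.type !== "string") {
    throw { forbidden: "type must be a string" };
  }
  if (typeof newDoc.created_at !== "number") {
    throw { forbidden: "created_at must be a Unix timestamp (number)" };
  }
  // Prevent a field from being changed once the doc exists.
  if (storedDoc && storedDoc.created_at !== newDoc.created_at) {
    throw { forbidden: "created_at cannot be changed" };
  }
}

// Local check: an update that alters created_at is rejected.
try {
  validateDocUpdate(
    { type: "contact", created_at: 2 },
    { type: "contact", created_at: 1 },
    { name: "jchris" }
  );
} catch (err) {
  console.log(err.forbidden);
}
```

Because the function also runs during replication, rules that are too strict can block documents arriving from other nodes, which is worth keeping in mind when tightening a schema after data already exists.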
    25. SAMPLE DOCS (IN 2.0): HANDY FOR QUICK DOC "SCHEMA" REFERENCING
    26. ADVANCED DOCUMENT DESIGN: more tools and tricks in this session
    27. EXAMPLES
    28. {
            "_id": "2011-10-20T00:32:58_101D8A2A000000F7",
            "_rev": "1-0c9914a4695b67a4f38cb5f8e345d28f",
            "reading": 22.98,
            "sn": "101D8A2A000000F7",
            "ts": "2011-10-20T00:32:58",
            "type": "reading"
        }
    29. (no slide text)
    30. {
            "_id": "station_724",
            "_rev": "1-35f3b06a85f2997f365d5e41bcf6967a",
            "code": "BIH",
            "name": "Bairagarh",
            "zone": "WR",
            "doctype": "station",
            "state": "Madhya Pradesh",
            "address": "Bhopal, Madhya Pradesh",
            "id": 724
        }
    31. {
            "_id": "sched_284908",
            "_rev": "1-776a7ceeea990c8eb84d57dc01ea4d2f",
            "arrival": "14:40",
            "halt": "10m",
            "stop_number": "27",
            "station_code": "KOTA",
            "departure": "14:50",
            "train_number": "19039",
            "day": 2,
            "doctype": "schedule",
            "station_name": "Kota Junction",
            "id": 284908,
            "distance_travelled": 909
        }
    32. {
            "_id": "train_97",
            "_rev": "2-640c3360c86405167e3b59a8f463d1c0",
            "return_train": "06617",
            "number": "06618",
            "duration": "11h 45m",
            "id": 97,
            "zone": "SR",
            "date_from": "Nov 23",
            "to_station_code": "CBE",
            "number_of_halts": 13,
            "sleeper": "T",
            "type": "Exp",
            "arrival": "08:00",
            "from_station_code": "NCJ",
            "doctype": "train",
            "departure_days": [
                "Wednesday"
            ],
            "date_to": "Jan 18",
            "first_class": "F",
            "distance": "497 km",
            "third_ac": "T",
            "name": "Nagercoil-Coimbatore Special",
            "from_station_name": "Nagercoil Junction",
            "departure": "20:15",
            "second_ac": "T",
            "classes": "SL 3A 2A",
            "second_sitting": "F",
            "to_station_name": "Coimbatore Main Junction",
            "first_ac": "F"
        }
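
The speaker notes say the original train data stored departure days as a pattern like "---W---" and used "T" and "F" for true and false. A sketch of how an import step might normalize those two fields follows. The function names are hypothetical, and the position-to-day mapping is an assumption: Sunday first, which is at least consistent with "---W---" becoming "Wednesday" in the document above.

```javascript
// Assumption: the seven positions run Sunday through Saturday.
const DAY_NAMES = [
  "Sunday", "Monday", "Tuesday", "Wednesday",
  "Thursday", "Friday", "Saturday"
];

// Turn a pattern such as "---W---" into a list of day names.
function parseDepartureDays(pattern) {
  if (typeof pattern !== "string" || pattern.length !== 7) {
    throw new Error("expected a 7-character day pattern, got: " + pattern);
  }
  return DAY_NAMES.filter((_, i) => pattern[i] !== "-");
}

// Turn the "T" / "F" flags into real JSON booleans.
function parseFlag(value) {
  if (value === "T") return true;
  if (value === "F") return false;
  throw new Error("expected T or F, got: " + value);
}

console.log(parseDepartureDays("---W---"));
console.log(parseFlag("T"), parseFlag("F"));
```

Real arrays and booleans are easier to query and to feed into templates than position-coded strings, which is the point the notes make about the original format.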
    33. {
            "_id": "0017dcf0149c130229f35b537df48073",
            "_rev": "7-ef6c5723edafb9828dbc36467493341d",
            "old_id": 7343,
            "height": 1800,
            "keywords": [
                "wrx"
            ],
            "cat": "Public",
            "size": 1711223,
            "tnwidth": 194,
            "exif": {
                "EXIF ApertureValue": "367/100",
                /* ... */
                "EXIF SensingMethod": "One-chip color area",
                "MakerNote AEWarning": "Off",
                "Thumbnail Orientation": "Horizontal (normal)"
            },
            "descr": "My car had fun this weekend.  It got all dirty in the snow and then ran into a sign. ",
            "ts": "2006-02-22T17:47:16",
            "addedby": "dustin",
            "width": 2400,
            "extension": "jpg",
            "tnheight": 146,
            "taken": "2006-02-21",
            "type": "photo",
            "annotations": [
            ],
            "_attachments": {
                "800x600.jpg": {
                    "content_type": "image/jpeg",
                    "revpos": 4,
                    "digest": "md5-HB0NfWVLWeQJn8j79214Fw==",
                    "length": 84388,
                    "stub": true
                },
        // [...]
    34. BANKING
    35. TRANSACTION LOG
    36. CURRENT BALANCE
    37. CURRENT BALANCE
    38. NEW TRANSACTION
    39. NEW BALANCE
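
The banking slides carry no text in this transcript, so the following is only a guess at the pattern they illustrate, based on the "accounting model" in the speaker notes: each transaction is its own document, and the current balance is computed by a map/reduce over the log rather than stored in one frequently updated document. All document fields, IDs, and function names below are hypothetical, amounts are integer cents, and `emit` is passed in so the sketch runs outside the database.

```javascript
// Hypothetical transaction log: one small document per transaction.
const transactions = [
  { _id: "txn_1", type: "transaction", account: "acct_1", amount: 10000 },
  { _id: "txn_2", type: "transaction", account: "acct_1", amount: -2500 },
  { _id: "txn_3", type: "transaction", account: "acct_2", amount: 4000 }
];

// Map step: key by account, value is the signed amount.
function mapTransaction(doc, emit) {
  if (doc.type === "transaction") {
    emit(doc.account, doc.amount);
  }
}

// Reduce step, run locally: sum the emitted values per key.
function balances(docs) {
  const byAccount = new Map();
  docs.forEach((doc) => {
    mapTransaction(doc, (key, value) => {
      byAccount.set(key, (byAccount.get(key) || 0) + value);
    });
  });
  return byAccount;
}

console.log(balances(transactions).get("acct_1"));

// A new transaction is a new document; the balance follows from the log.
transactions.push({ _id: "txn_4", type: "transaction", account: "acct_1", amount: 500 });
console.log(balances(transactions).get("acct_1"));
```

Because every write creates a new document instead of updating a shared one, two nodes recording transactions at the same time do not conflict, which ties back to the replication advice to leverage small pieces.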
    40. ANY QUESTIONS?
        • submit to couchconfisrael@couchbase.com
        • or ask me:
          • @jchris
          • jchris@couchbase.com
