NoSQL: Why, When, and How

Published in: Technology

  1. 1. NoSQL: Why, When, and How, starring CouchDB
  2. 2. About the speaker (names and logos were images): aka, built, rebuilding on, hack at, contribute to, work at
  3. 3. One More thing... Co-organizing REST Fest 2011 w/ Mike Amundsen (restfest.org)
  4. 4. What is Why NoSQL ^
  5. 5. NoSQL is... Not Only SQL, or Not SQL (I vote for this one)
  6. 6. Origins of the Name: Carlo Strozzi - NoSQL app started in 1998 (RDBMS sans SQL); applied to non-relational DBs around 2008-ish; no:sql(east): select fun, profit from real_world where relational=false;
  7. 7. Origin of the Species: non-relational databases pre-date relational ones; it's sorta like AJAX: been doin' it for awhile, gets a name, now it's cool!!1!
  8. 8. Types of NoSQL DBs: graph (RDF/Semantic Web/triples); key-value (just what it says); document (k/v + queryability); object (big in the 1980s); multivalue (old tech...like 1960s); NewSQL?
  9. 9. NewSQL? Recently (April 2011) coined term by The 451 Group; mostly means SQL DBs + better scalability, or NoSQL DBs + SQL layers; yeah! more keywords!! keeps marketing happy...
  10. 10. ...on to specifics
  11. 11. Graph (RDF): pretty heady stuff; Web 3.0? Maybe...; simple concept, complicated execution (often). Products shown: FlockDB (from Twitter), AllegroGraph, Cytoscape. Queried with: SPARQL, Java
  12. 12. Key Value Stores: scalable caches; generally no query language (get key(s), return value(s)); "Big Co" need driven. Products shown: Project Voldemort, Hibari
  13. 13. Document Databases: key value + querying. Products shown: Lotus Notes, Amazon SimpleDB
  14. 14. object
  15. 15. multivalue: pretty antique; lots of legacy rollouts
  16. 16. NewSQL: any of the previously mentioned DBs + query layers...maybe; expect any and all SQL DBs to jump on this train and/or slip in the next few years; worth a look if you *must* have normalized storage; but who needs normalization?
  17. 17. Why
  18. 18. One Reason
  19. 19. options
  20. 20. denormalization: schemaless; graph/object schema (closer match to business logic); generally faster/more scalable; certainly more distributable
  21. 21. alt.queries: Map/Reduce (thanks, Google; paper from 2004); XPath and/or XQuery; SPARQL; SQL...or something quite similar (Linq)
  22. 22. licensing: Apache License 2.0, the favorite; AGPLv3, the commercial favorite; others include LGPL (v2/3), GPL (v2/3), BSD, MIT, custom commercial or open source; ...here there be dragons...
  23. 23. (un)expected extras (at least in CouchDB): built-in Web Server or App Server; geospatial bounding box queries; n-master replication; binary file storage; scales up and down
  24. 24. When
  25. 25. Scenarios: Scalability (caching, sharding); Analytics; Data Warehousing; Ubiquitous/Distributed Data (mobile, desktop, server)
  26. 26. Scalability: "cache" style DBs, data served from RAM (mostly): Membase, Elastic Couchbase, memcached, MongoDB, Cassandra; horizontal scalability: add more servers, not more server
  27. 27. Analytics: Hadoop; HBase; Cassandra; this bleeds over into data warehousing quickly
  28. 28. Ubiquitous/Distributed Data: CouchDB (server <=> desktop <=> mobile); Riak Mobile? (go to Erlang Factory in June)
  29. 29. FALE Scenarios: power outages; data loss (failed persistence, no persistence); network unavailability. Special thanks to @coats, who runs fale.ca, for the FALE stamp
  30. 30. Solution: CouchDB
  31. 31. How, with CouchDB
  32. 32. Time to Relax
  33. 33. Time to Relax. That's Damien. He built CouchDB.
  34. 34. Time to Relax. That's Damien. He built CouchDB. That's the CouchDB gang sign.
  35. 35. Time to Relax. That's Damien. He built CouchDB. That's the CouchDB gang sign. Learn it!
  36. 36. and here’s why
  37. 37. CouchDB has Super Powers! Schemaless; Document centric; Replication/Sync; Fail Fast Architecture (stateless API, append only file storage)
  38. 38. Document centric: "natural" data model (store data like it exists everywhere else: as a document); map/reduce vs. SQL: sorting documents out of a drawer vs. reassembling them from bits of data
  39. 39. Replication/Sync: MVCC-based transactions (versioning...but only meant for transactions); safely merge databases; documents aren't compared, only UUIDs & revision IDs; conflicting documents are marked and a winner is picked
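
How a winner can be picked so that every replica reaches the same answer without talking to the others: a minimal sketch, assuming the ordering described in the CouchDB documentation (live revisions beat deleted ones, then the longer revision history wins, then the higher revision ID by plain string comparison). The helper name `pickWinner` and the sample revisions are made up for illustration; this is not CouchDB's own code.

```javascript
// Hypothetical helper: choose the winning revision among conflicting leaves.
// Each leaf looks like { rev: "2-abc", deleted: false }.
function pickWinner(leaves) {
  const parsed = leaves.map((leaf) => {
    const dash = leaf.rev.indexOf("-");
    return {
      rev: leaf.rev,
      deleted: Boolean(leaf.deleted),
      generation: parseInt(leaf.rev.slice(0, dash), 10),
      hash: leaf.rev.slice(dash + 1),
    };
  });
  parsed.sort((a, b) => {
    if (a.deleted !== b.deleted) return a.deleted ? 1 : -1; // live docs first
    if (a.generation !== b.generation) return b.generation - a.generation;
    if (a.hash === b.hash) return 0;
    return a.hash < b.hash ? 1 : -1; // higher hash first
  });
  return parsed[0].rev;
}

const winner = pickWinner([
  { rev: "2-aaa" },
  { rev: "2-fff" },
  { rev: "3-000", deleted: true },
]);
console.log(winner); // 2-fff
```

Because the rule only looks at revision IDs and the deleted flag, two servers holding the same conflicting revisions choose the same winner; the losing revisions stay available for the application to merge.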
  40. 40. Fail Fast: append only database file (everything goes on the end of the file; queries are cached there too); bounce back from errors rather than spin wheels indefinitely
  41. 41. C.O.U.C.H: Collection Of Unreliable Commodity Hardware
  42. 42. CouchDB Scaling: scales up and down (server <=> desktop <=> mobile, thanks to n-master replication); BigCouch plugin for sharding; HTTP API can be load balanced (reverse proxies and caching)
  43. 43. Get CouchDB: http://www.couchbase.com/downloads/couchbase-server/community and http://iriscouch.com/ (both have GeoCouch built in!); https://cloudant.com/ (BigCouch/sharded); your system's package installer... http://wiki.apache.org/couchdb/Installation
  44. 44. Time to Relax (command line)
      Apache CouchDB 1.0.2 (LogLevel=info) is starting.
      Apache CouchDB has started. Time to relax.
      [info] [<0.31.0>] Apache CouchDB has started on http://127.0.0.1:5984/
  45. 45. PHP & CouchDB
  46. 46. HTTP Clients: all you really need to get started; does require a better understanding of CouchDB's API...but that's A Good Thing!; will improve your HTTP skillz...another Good Thing!; requires more work...the part you won't like (but it's worth it to learn HTTP & REST)
  47. 47. CouchDB Clients: CouchDB's API is just HTTP, but...helper libraries can...help: auto (en|de)code JSON; handle base64-ing "inline" attachments; manage authentication, cookies, OAuth token exchange; caching!!!; _changes feed watching
  48. 48. CouchDB Clients: Sag for CouchDB (Apache License 2.0); PHP On Couch (GPLv2 or v3); beyond here, there be giants...: PHPillow (LGPL 3), PHP Object_Freezer (BSD), PHP CouchDB Extension (PHP License 3.0), Doctrine2 CouchDB ODM
  49. 49. HTTP Clients: curl (ugh...messy); pecl_http (lovely next to curl, but takes some install time, lacks examples); Zend_HTTP & PEAR HTTP_Request2; most major frameworks have their own
  50. 50. Client Suggestions: HTTP: pick one that's flexible (can handle COPY). CouchDB: I use Sag currently (caching, cookie auth, nice name); PHP-on-Couch seems great, but watch the license (GPL)
  51. 51. Today's Stack: In PHP: Sag - saggingcouch.com/. For HTTP API demoing/testing: Resty - github.com/micha/resty; Poster for Firefox - code.google.com/p/poster-extension/
  52. 52. Other Handy HTTP Clients: HTTPClient for Mac OS X; Charles Proxy ($$); http-twiddle for Emacs; Fiddler for Windows; Solex for Eclipse
  53. 53. CouchDB HTTP API we’ll be back to PHP in a bit
  54. 54. JSON Documents: all responses are valid JSON; JSON support is built into PHP 5.2+; pecl & "pure" PHP (de|en)code for older versions
  55. 55. A JSON Document: { "json": "key/value pairs", "_id": "some uuid", "_rev": "mvcc key", "string keys": [1,2,3,"four",null], "schema free": {"so it's":"flexible"} }
  56. 56. JSON to PHP Object
      $json = '{"json":"document","with":["an","array"]}';
      $j = json_decode($json);
      // print_r($j) shows:
      // stdClass Object ( [json] => document [with] => Array ( [0] => an [1] => array ) )
      echo $j->with[1];
  57. 57. JSON to PHP array
      $json = '{"json":"document","with":["an","array"]}';
      $j = json_decode($json, true);
      // print_r($j) shows:
      // Array ( [json] => document [with] => Array ( [0] => an [1] => array ) )
      echo $j['with'][1];
  58. 58. I got tired of having to pick -> or []
      $j = new ArrayObject(
          json_decode($json),
          ArrayObject::ARRAY_AS_PROPS
      );
      print_r($j['with'][0]);
      print_r($j->with[0]);
      // "an" -- same result! no errors!
  59. 59. HTTP / REST basics: GET = read; PUT = create or update; DELETE = delete; POST = bulk operation
  60. 60. Resty: command line RESTful good times
  61. 61. Setup Resty
      install (see resty page)
      $ resty .
      # GET, POST, PUT, DELETE & HEAD
      # are scripts now!
      $ resty http://localhost:5984/
      # set Resty to default to CouchDB
  62. 62. Create a Database
      $ PUT /pouch/
      {"ok":true}
      $ GET /pcouch/
      {"db_name":"pcouch","doc_count":0,"doc_del_count":0,"update_seq":0,"purge_seq":0,"compact_running":false,"disk_size":79,"instance_start_time":"1289923325819422","disk_format_version":5,"committed_update_seq":0}
  63. 63. DELETE a Database
      $ DELETE /pouch/
      {"ok":true}
      # ^^^ be careful with that one!
      # let's recreate it
      $ PUT /pouch/
  64. 64. Create Document
      PUT /pouch/tek \
        '{"php":"tek"}'
      {"ok":true,"id":"test","rev":"1-89af21439a03933bc3fc8c14cbeb496e"}
  65. 65. GET the Document
      GET /pouch/tek
      {"_id":"test","_rev":"1-89af21439a03933bc3fc8c14cbeb496e","php":"tek"}
      // we'll need that _rev value to update this doc
  66. 66. PUT (failure) on purpose...
      PUT /pouch/tek '{"php":"tek"}'
      {"error":"conflict","reason":"Document update conflict."}
      # we need that _rev value now
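
The conflict comes from the revision check: an update is accepted only when it names the revision it is replacing. A minimal in-memory sketch of that check follows. `TinyDb` is a made-up stand-in, not a CouchDB client, and the `-demo` suffix replaces the content hash that real revision IDs carry.

```javascript
// In-memory stand-in for one CouchDB database (illustration only).
class TinyDb {
  constructor() {
    this.docs = new Map(); // id -> { rev, body }
  }

  put(id, body) {
    const current = this.docs.get(id);
    // An existing document may only be replaced by naming its current _rev.
    if (current && current.rev !== body._rev) {
      return { error: "conflict", reason: "Document update conflict." };
    }
    const generation = current ? parseInt(current.rev, 10) + 1 : 1;
    const rev = `${generation}-demo`; // real IDs use a hash, not "demo"
    this.docs.set(id, { rev, body: { ...body, _id: id, _rev: rev } });
    return { ok: true, id, rev };
  }
}

const db = new TinyDb();
const first = db.put("tek", { php: "tek" }); // creates revision 1
const stale = db.put("tek", { php: "tek" }); // no _rev sent: rejected
const second = db.put("tek", { _rev: first.rev, php: "tek" }); // accepted
console.log(first.rev, stale.error, second.rev); // 1-demo conflict 2-demo
```

The same check is why a stale client cannot silently overwrite newer data: it has to fetch the latest `_rev` first.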
  67. 67. PUT things right
      PUT /pouch/tek '{"_rev": "1-89af21439a03933bc3fc8c14cbeb496e", "php":"tek"}'
      # that \ represents a new line
      # doc could also contain the "id"
      {"ok":true, "id":"tek","rev":"2-fff750985c2c2602e859fe38cd1d347e"}
  68. 68. PUT binary attachments
      PUT /pouch/tek/photo?rev=2-fff750985c2c2602e859fe38cd1d347e -Q filename.png -H "Content-Type: image/png"
      {"ok":true,"id":"tek","rev":"3-18d519e58b569e43a6fd5e87491f0c4c"}
      GET /pouch/tek
      {"_id":"tek","_rev":"3-18d519e58b569e43a6fd5e87491f0c4c","_attachments":{"photo":{"content_type":"image/png","revpos":1,"length":18,"stub":true}}}
  69. 69. A bit about attachments: each attachment to a doc has its own URL (/pouch/tek/photo, /pouch/tek/schedule.pdf); each attachment has its own mimetype; attachments can be added/updated via their own URLs or inline
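
For the "inline" route, the attachment travels inside the document body as base64 under `_attachments`. A minimal sketch of building such a body, assuming Node.js for `Buffer`; the helper name `withInlineAttachment` and the sample file are invented for illustration.

```javascript
// Build a document body carrying one inline attachment.
function withInlineAttachment(doc, name, contentType, bytes) {
  return {
    ...doc,
    _attachments: {
      ...(doc._attachments || {}),
      [name]: {
        content_type: contentType,
        data: Buffer.from(bytes).toString("base64"),
      },
    },
  };
}

const doc = withInlineAttachment(
  { _id: "tek", php: "tek" },
  "note.txt",
  "text/plain",
  "hello"
);
console.log(JSON.stringify(doc));
```

The resulting JSON is what would be sent as the body of a regular document PUT. The standalone attachment URL avoids the base64 overhead, which matters for large files.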
  70. 70. DELETE (failure) on purpose...again
      DELETE /pouch/tek
      {"error":"conflict","reason":"Document update conflict."}
      # you always have to send a "rev" when changing a doc in any way
  71. 71. DELETE
      DELETE '/pouch/tek?rev=3-18d519e58b569e43a6fd5e87491f0c4c' -Q
      # that -Q tells Resty not to urlencode
      {"ok":true,"id":"tek","rev":"4-3e8c21e8e610c4ea7f8e247a45c6eb04"}
      # what?!? another rev?...but the
      # document should be dead?!
  72. 72. GETing 404's
      GET /pouch/tek
      {"error":"not_found","reason":"deleted"}
      GET /pouch/tek12
      {"error":"not_found","reason":"missing"}
      # RESTifarian note: would be a 409
      # if caches were built better
  73. 73. That was just the very basics: GET /_stats (server stats); GET /_all_dbs (list all DBs); GET /db/_all_docs (list all docs); GET /db/_changes (list recent changes); super powers!
  74. 74. Map/Reduce Queries (searching the file drawer): essentially stored queries, a.k.a. "Views"; written in JavaScript (or Python or Ruby or Erlang or PHP?); similar to array_map()/array_reduce(), but scalable; no ad-hoc queries
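
A view is a map function (and optionally a reduce function) stored in a design document. The sketch below shows the shape of both and runs them through a tiny local harness. The harness (`runGroupedView` and the local `emit`) only imitates what the server does, and the `exif.Model` field and sample documents are assumptions for illustration.

```javascript
// Local stand-in for the emit() that CouchDB provides to map functions.
let rows = [];
let currentId = null;
function emit(key, value) {
  rows.push({ id: currentId, key, value });
}

// Map: one row per photo, keyed by camera model (field names assumed).
function map(doc) {
  if (doc.exif && doc.exif.Model) {
    emit(doc.exif.Model, 1);
  }
}

// Reduce: every value is 1, so summing counts rows. Summing partial
// counts is also the right thing to do when rereduce is true.
function reduce(keys, values, rereduce) {
  return values.reduce((total, v) => total + v, 0);
}

// Hypothetical harness: map every doc, group rows by key, reduce each group.
function runGroupedView(docs) {
  rows = [];
  for (const doc of docs) {
    currentId = doc._id;
    map(doc);
  }
  const groups = new Map();
  for (const row of rows) {
    if (!groups.has(row.key)) groups.set(row.key, []);
    groups.get(row.key).push(row);
  }
  return [...groups.keys()].sort().map((key) => {
    const group = groups.get(key);
    const keys = group.map((r) => [r.key, r.id]);
    const values = group.map((r) => r.value);
    return { key, value: reduce(keys, values, false) };
  });
}

const result = runGroupedView([
  { _id: "a", exif: { Model: "D90" } },
  { _id: "b", exif: { Model: "G12" } },
  { _id: "c", exif: { Model: "D90" } },
  { _id: "d" },
]);
console.log(JSON.stringify(result));
// [{"key":"D90","value":2},{"key":"G12","value":1}]
```

Because the functions are stored rather than sent per request, the server can keep their output indexed, which is the trade-off behind "no ad-hoc queries".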
  75. 75. Pouch: put your media in the Couch, starring PHP!
  76. 76. i want to cover: schema-less JSON docs; binary attachments; Map/Reduce (JavaScript, several examples); replication; security (_security, validate_doc_update); server-side rendering stuff (_show, _list); URL rewriting (_rewrite, vhosts)
  77. 77. Pouch is...: a database of JSON docs of file meta-data, with the file attached!; a filesystem importer; a Web-based CouchApp for browsing
  78. 78. 0.1 Files into CouchDB: gather EXIF data; add attachment; PUT to CouchDB; let us know how it went
  79. 79. code preamble
      #!/usr/bin/php
      <?php
      if ($argc < 2) die("I need a file name..\n");
      // CouchDB config
      $user = 'admin';
      $password = 'passwd';
      // gather and clean EXIF data
      $exif = exif_read_data($argv[1]);
      unset($exif['MakerNote']);
      unset($exif['ComponentsConfiguration']);
      unset($exif['JPEGThumbnail']);
      unset($exif['TIFFThumbnail']);
  80. 80. (code, continued)
      require_once dirname(realpath(__FILE__)).'/../libs/Sag/src/Sag.php';
      $sag = new Sag('localhost', 5984);
      $sag->setDatabase('pouch');
      // PUT the binary attachment first
      $sag->setAttachment('original', // the attachment name
          file_get_contents($argv[1]), // the file
          image_type_to_mime_type(exif_imagetype($argv[1])),
          $id = md5_file($argv[1])); // the doc id
      // then add the EXIF data
      // GET the full doc as we need to add our info to what's there
      // atomicity is at a document level in CouchDB
      $doc = $sag->get($id)->body;
      $doc->exif = $exif;
      print_r($sag->put(md5_file($argv[1]), $doc));
  81. 81. handle updates: GET/PUT or HEAD/PUT or PUT/error_handle/PUT
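
The third option (PUT, handle the error, PUT again) can be sketched as a small retry loop. Everything here is an assumption for illustration: `makeFakeClient` stands in for a real HTTP library, and `putWithRetry` is a made-up helper name. Only the 409 status for a revision conflict and the `_rev` field come from CouchDB itself.

```javascript
// Fake client standing in for an HTTP library (illustration only): it holds
// one document and rejects writes whose _rev is not the current one.
function makeFakeClient() {
  let stored = { _id: "tek", _rev: "1-demo", php: "tek" };
  return {
    get: (id) => ({ ...stored }),
    put: (id, doc) => {
      if (doc._rev !== stored._rev) return { status: 409 };
      const next = parseInt(stored._rev, 10) + 1;
      stored = { ...doc, _id: id, _rev: `${next}-demo` };
      return { status: 201, rev: stored._rev };
    },
  };
}

// PUT first; on 409, GET the current _rev, reapply the change, PUT again.
function putWithRetry(client, id, doc, maxAttempts = 3) {
  let candidate = doc;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const res = client.put(id, candidate);
    if (res.status !== 409) return res;
    const latest = client.get(id);
    candidate = { ...latest, ...doc, _rev: latest._rev };
  }
  throw new Error(`still conflicting after ${maxAttempts} attempts`);
}

const result = putWithRetry(makeFakeClient(), "tek", { php: "tek", year: 2011 });
console.log(result.status, result.rev); // 201 2-demo
```

This variant costs one request when there is no conflict and three when there is one. Note that the merge step blindly lets the new fields win, so a real importer should decide which side to keep.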
  82. 82. 3 Tier UI for viewing browser <-> php <-> couchdb
  83. 83. - GET attachments
  84. 84. _all_docs for doc list
  85. 85. loading CouchDB results into JSON and then into template/PHP output
  86. 86. Just certain documents ala Map/Reduce
  87. 87. _view API & query params
  88. 88. include_docs
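
View queries are plain GET requests whose parameters shape the result. A minimal sketch of building such a URL, assuming a design document named `photos` with a view named `by_model` (both hypothetical). The one rule that trips people up is that `key`, `startkey` and `endkey` must be JSON-encoded before URL-encoding.

```javascript
// Build a view URL; key-like parameters must be JSON-encoded.
function viewUrl(base, db, designDoc, view, params = {}) {
  const jsonParams = new Set(["key", "startkey", "endkey"]);
  const query = Object.entries(params)
    .map(([name, value]) => {
      const encoded = jsonParams.has(name) ? JSON.stringify(value) : String(value);
      return `${encodeURIComponent(name)}=${encodeURIComponent(encoded)}`;
    })
    .join("&");
  const path = `${base}/${db}/_design/${designDoc}/_view/${view}`;
  return query ? `${path}?${query}` : path;
}

const url = viewUrl("http://localhost:5984", "pouch", "photos", "by_model", {
  key: "D90",
  include_docs: true,
  reduce: false,
});
console.log(url);
// http://localhost:5984/pouch/_design/photos/_view/by_model?key=%22D90%22&include_docs=true&reduce=false
```

`include_docs=true` returns each matching document alongside its row, which saves a second round trip per document. It applies to map output, so the sketch turns the reduce step off.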
  89. 89. thumbnail creation from PHP async or during upload if we want to wrap CouchDB
  90. 90. disadvantages of non-async operations
  91. 91. 2.5 Tier refactoring browser <-> couchdb ^ php
  92. 92. find photos sans thumbnails and add them
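
One way to find those photos is a view whose map function looks at the attachment stubs. This is a sketch under assumptions: the attachment name `original` matches the importer above, while `thumbnail` is a guessed name, and the local `emit` only imitates the one CouchDB supplies.

```javascript
// Local stand-in for CouchDB's emit().
const rows = [];
function emit(key, value) {
  rows.push({ key, value });
}

// Map: emit documents that have an "original" attachment but no "thumbnail".
function map(doc) {
  const attachments = doc._attachments || {};
  if (attachments.original && !attachments.thumbnail) {
    emit(doc._id, null);
  }
}

const docs = [
  { _id: "a", _attachments: { original: { stub: true } } },
  { _id: "b", _attachments: { original: { stub: true }, thumbnail: { stub: true } } },
  { _id: "c" },
];
docs.forEach((doc) => map(doc));
console.log(JSON.stringify(rows.map((row) => row.key))); // ["a"]
```

A worker can then walk the view's rows, create the missing thumbnail for each ID, and the document drops out of the view once the attachment is saved.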
  93. 93. _changes feed: could check this first before hitting the view; _changes is "lighter"
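
A sketch of consuming one `_changes` response: pull out the IDs that changed and remember where to resume. The response values are invented, and `changedIds` and `nextChangesPath` are hypothetical helper names. The `results`, `last_seq` and `since` names are the ones the feed uses.

```javascript
// Shape of a _changes response body (values invented for illustration).
const body = {
  results: [
    { seq: 5, id: "tek", changes: [{ rev: "2-demo" }] },
    { seq: 6, id: "old", changes: [{ rev: "3-demo" }], deleted: true },
  ],
  last_seq: 6,
};

// IDs of documents that changed and still exist.
function changedIds(changesBody) {
  return changesBody.results
    .filter((change) => !change.deleted)
    .map((change) => change.id);
}

// Path for the next poll, resuming after the last sequence already seen.
function nextChangesPath(db, changesBody) {
  return `/${db}/_changes?since=${encodeURIComponent(changesBody.last_seq)}`;
}

console.log(JSON.stringify(changedIds(body))); // ["tek"]
console.log(nextChangesPath("pouch", body)); // /pouch/_changes?since=6
```

If the list of changed IDs is empty, the cron job can stop there without querying the view at all.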
  94. 94. JS app to replace the PHP-built UI
  95. 95. PHP cronjob handles thumbs/metadata async
  96. 96. CouchDB setup on public port
  97. 97. couchapp script http://couchapp.org
  98. 98. Replicating the App
  99. 99. big advantage: data & app stay together this is huge!
  100. 100. small disadvantages: dynamic data (thumbs) won't always be there; no real data loss in this case, as they can be re-created later
  101. 101. _replicate
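
Replication is triggered by POSTing a small JSON body to `/_replicate`. A sketch of building that request follows. The helper name and the target URL are placeholders, while `source`, `target`, `continuous` and `create_target` are the field names the endpoint accepts.

```javascript
// Describe a replication request without sending it.
function replicationRequest(source, target, options = {}) {
  const body = { source, target };
  if (options.continuous) body.continuous = true;
  if (options.createTarget) body.create_target = true;
  return {
    method: "POST",
    path: "/_replicate",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  };
}

const request = replicationRequest("pouch", "http://example.com:5984/pouch", {
  createTarget: true,
});
console.log(request.body);
// {"source":"pouch","target":"http://example.com:5984/pouch","create_target":true}
```

Either side may be a local database name or a full URL, so the same request shape covers both pushing and pulling.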
  102. 102. Removing Images just DELETE the docs
  103. 103. Compaction (for space cleanup): _deleted docs; _stats (for CouchDB-wide space usage)
  104. 104. Securing the CouchApp
  105. 105. _security API
  106. 106. Users/Roles on docs
  107. 107. CouchDB permissions
  108. 108. Cookie Authentication (in the CouchApp)
  109. 109. Basic Authentication (for the cronjob) Sag, HTTP API access, etc.
  110. 110. OAuth: it's an option, but a whole 'nother talk
  111. 111. Replicating with security
  112. 112. Adding a form for User Data Entry
  113. 113. - Server-side validation
  114. 114. validate_doc_update: document validation for additional security
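
A validation function lives in a design document under `validate_doc_update` and rejects a write by throwing. A minimal sketch: the `title` rule is an invented example, and `check` is a local test helper, not part of CouchDB.

```javascript
// Would be stored under "validate_doc_update" in a design document.
function validate(newDoc, oldDoc, userCtx) {
  if (newDoc._deleted) return; // let deletions through
  if (!userCtx.name) {
    throw { unauthorized: "Please log in first." };
  }
  if (typeof newDoc.title !== "string" || newDoc.title === "") {
    throw { forbidden: "Documents need a non-empty title." };
  }
}

// Local helper: report what the server would answer for a given write.
function check(newDoc, userCtx) {
  try {
    validate(newDoc, null, userCtx);
    return "ok";
  } catch (err) {
    return err.forbidden || err.unauthorized;
  }
}

console.log(check({ title: "Sunset" }, { name: "jane" })); // ok
console.log(check({ title: "" }, { name: "jane" })); // Documents need a non-empty title.
console.log(check({ title: "Sunset" }, { name: null })); // Please log in first.
```

The function also runs on documents arriving by replication, so the rules follow the data rather than any one client.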
  115. 115. Making the App “Static” (less AJAX dependent)
  116. 116. _show: "templating" for a single doc
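
A show function receives the document and the request, and returns either a string or a response object. A sketch of one for a photo page: the `exif.Model` field and the markup are assumptions, and `escapeHtml` is a local helper written here, not something CouchDB provides.

```javascript
// Local helper: escape text before placing it in HTML.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Would be stored under "shows" in a design document.
function show(doc, req) {
  if (!doc) {
    return { code: 404, body: "No such photo" };
  }
  const model = (doc.exif && doc.exif.Model) || "unknown camera";
  return {
    headers: { "Content-Type": "text/html" },
    body: `<h1>${escapeHtml(doc._id)}</h1><p>Taken with: ${escapeHtml(model)}</p>`,
  };
}

console.log(show({ _id: "tek", exif: { Model: "D90" } }, {}).body);
// <h1>tek</h1><p>Taken with: D90</p>
```

The page is then served by CouchDB itself from a URL under the design document, with no PHP layer in between.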
  117. 117. _list: templating for view output
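
A list function streams view rows through a template using the helpers `start()`, `send()` and `getRow()`. The sketch below stubs those three locally so it can run outside CouchDB. The stubs and `renderList` are illustration only, and the row contents are invented.

```javascript
// Local stand-ins for the helpers CouchDB provides to list functions.
let queue = [];
let output = "";
function start(response) {
  // The real helper sets status and headers; nothing to do in this stub.
}
function send(chunk) {
  output += chunk;
}
function getRow() {
  return queue.shift() || null;
}

// Would be stored under "lists" in a design document.
function list(head, req) {
  start({ headers: { "Content-Type": "text/html" } });
  send("<ul>");
  let row;
  while ((row = getRow())) {
    send(`<li>${row.key}: ${row.value}</li>`);
  }
  send("</ul>");
}

// Hypothetical harness: feed rows in, collect what was sent.
function renderList(viewRows) {
  queue = [...viewRows];
  output = "";
  list({ total_rows: viewRows.length }, {});
  return output;
}

console.log(renderList([{ key: "D90", value: 2 }, { key: "G12", value: 1 }]));
// <ul><li>D90: 2</li><li>G12: 1</li></ul>
```

Rows are pulled one at a time, so a list function can render a large view without holding the whole result in memory.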
  118. 118. Migrating existing MySQL-based Gallery data to Pouch
  119. 119. writing JSON docs pulled from old Gallery
  120. 120. requesting those docs, PUTing them, PUTing the attachment
  121. 121. CouchDB as REST API server: building another API for Pouch, in addition to (or instead of) the standard CouchDB API; putting that into CouchDB: _list - index pages; _show - single document page; _update - document modification; _rewrite - URL rewriting
  122. 122. Any questions?
  123. 123. Other CouchApps: demo/review other CouchApps? discuss scalability/load balancing? deeper dive into Map/Reduce joins?
