NoSQL: Why, When, and How


Transcript

  • 1. NoSQL: Why, When, and How, starring CouchDB
  • 2. aka; built; rebuilding on; hack at; contribute to; work at
  • 3. One more thing... Co-organizing REST Fest 2011 w/ Mike Amundsen. restfest.org
  • 4. What is Why NoSQL ^
  • 5. NoSQL is... Not Only SQL or SQL. I vote for this one.
  • 6. Origins of the Name: Carlo Strozzi - NoSQL app started in 1998, an RDBMS sans SQL; applied to non-relational DBs around 2008-ish; no:sql(east): select fun, profit from real_world where relational=false;
  • 7. Origin of the Species: non-relational databases pre-date relational ones; it's sorta like AJAX: been doin' it for a while, gets a name, now it's cool!!1!
  • 8. Types of NoSQL DBs: graph (RDF/Semantic Web/triples); key-value (just what it says); document (k/v + queryability); object (big in the 1980s); multivalue (old tech...like 1960s); NewSQL?
  • 9. NewSQL? Recently (April 2011) coined term by The 451 Group; mostly means SQL DBs + better scalability, or NoSQL DBs + SQL layers; yeah! more keywords!! keeps marketing happy...
  • 10. ...on to specifics
  • 11. Graph (RDF): pretty heady stuff; Web 3.0? Maybe...; simple concept, complicated execution (often); FlockDB (from Twitter), AllegroGraph, Cytoscape; queried with: SPARQL, Java
  • 12. Key Value Stores: scalable caches; generally no query language: get key(s), return value(s); Big Co need driven; Project Voldemort, Hibari
  • 13. Document Databases: key value + querying; Lotus Notes, Amazon SimpleDB
  • 14. object
  • 15. multivalue: pretty antique; lots of legacy rollouts
  • 16. NewSQL: any of the previously mentioned DBs + query layers...maybe; expect any and all SQL DBs to jump on this train and/or slip in the next few years; worth a look if you *must* have normalized storage; but who needs normalization?
  • 17. Why
  • 18. One Reason
  • 19. options
  • 20. denormalization; schemaless; graph/object schema: closer match to business logic; generally faster/more scalable; certainly more distributable
  • 21. alt.queries: Map/Reduce (thanks, Google: paper from 2004); XPath and/or XQuery; SPARQL; SQL...or something quite similar; Linq
  • 22. licensing: Apache License 2.0 (the favorite); AGPLv3 (the commercial favorite); others include LGPL (v2/3), GPL (v2/3), BSD, MIT, custom commercial or open source ...here there be dragons...
  • 23. (un)expected extras, at least in CouchDB: built-in Web server or app server; geospatial bounding box queries; n-master replication; binary file storage; scales up and down
  • 24. When
  • 25. Scenarios: Scalability (caching, sharding); Analytics; Data Warehousing; Ubiquitous/Distributed Data (mobile, desktop, server)
  • 26. Scalability: "cache" style DBs, data served from RAM (mostly): Membase, Elastic Couchbase, memcached, MongoDB, Cassandra; horizontal scalability: add more servers, not more server
  • 27. Analytics: Hadoop; HBase; Cassandra; this bleeds over into data warehousing quickly
  • 28. Ubiquitous/Distributed Data: CouchDB: server <=> desktop <=> mobile; Riak Mobile? go to Erlang Factory in June
  • 29. FALE Scenarios: power outages; data loss (failed persistence, no persistence); network unavailability. Special thanks to @coats who runs fale.ca for the FALE stamp
  • 30. Solution: CouchDB
  • 31. How, with CouchDB
  • 32. Time to Relax
  • 33. Time to Relax. That's Damien. He built CouchDB.
  • 34. Time to Relax. That's Damien. He built CouchDB. That's the CouchDB gang sign.
  • 35. Time to Relax. That's Damien. He built CouchDB. That's the CouchDB gang sign. Learn it!
  • 36. and here’s why
  • 37. CouchDB has Super Powers! Schemaless; Document centric; Replication/Sync; Fail Fast Architecture (stateless API, append only file storage)
  • 38. Document centric: "natural" data model: store data like it exists everywhere else, as a document; map/reduce vs. SQL: sorting documents out of a drawer vs. reassembling them from bits of data
  • 39. Replication/Sync: MVCC-based transactions: versioning...but only meant for transactions; safely merge databases; documents aren't compared, only UUIDs & revision IDs; conflicting documents are marked and a winner is picked
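A minimal sketch of how a winner could be picked from revision IDs alone, without comparing document bodies. It assumes revision IDs of the form "generation-hash" and the rule that the higher generation wins, with ties broken by the higher hash string; the function name pickWinner is made up for illustration, and deleted revisions are ignored here.

```php
<?php
// Sketch only: choose a deterministic winner among conflicting revisions.
// Rev IDs are assumed to look like "<generation>-<hash>", e.g. "2-fff750...".
function pickWinner(array $revs): string
{
    usort($revs, function (string $a, string $b): int {
        [$genA, $hashA] = explode('-', $a, 2);
        [$genB, $hashB] = explode('-', $b, 2);
        // Higher generation (longer edit history) sorts first...
        if ((int) $genA !== (int) $genB) {
            return (int) $genB <=> (int) $genA;
        }
        // ...ties are broken by comparing the hash part as plain strings.
        return strcmp($hashB, $hashA);
    });
    return $revs[0]; // expects a non-empty list
}

echo pickWinner(['2-abc', '3-aaa', '3-bbb']), "\n";
```

Because the rule only looks at the IDs, every replica that sees the same set of revisions can arrive at the same winner on its own.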
  • 40. Fail Fast: append only database file: everything goes on the end of the file, queries are cached there too; bounce back from errors rather than spin wheels indefinitely
  • 41. C.O.U.C.H: Collection Of Unreliable Commodity Hardware
  • 42. CouchDB Scaling: scales up and down: server <=> desktop <=> mobile, thanks to n-master replication; BigCouch plugin for sharding; HTTP API can be load balanced: reverse proxies and caching
  • 43. Get CouchDB
      http://www.couchbase.com/downloads/couchbase-server/community
      http://iriscouch.com/
      (both have GeoCouch built in!)
      https://cloudant.com/ (BigCouch/sharded)
      your system's package installer...
      http://wiki.apache.org/couchdb/Installation
  • 44. Time to Relax (command line)
      Apache CouchDB 1.0.2 (LogLevel=info) is starting.
      Apache CouchDB has started. Time to relax.
      [info] [<0.31.0>] Apache CouchDB has started on http://127.0.0.1:5984/
  • 45. PHP & CouchDB
  • 46. HTTP Clients: all you really need to get started; does require a better understanding of CouchDB's API...but that's A Good Thing!; will improve your HTTP skillz...another Good Thing!; requires more work...the part you won't like, but it's worth it to learn HTTP & REST
  • 47. CouchDB Clients: CouchDB's API is just HTTP, but...helper libraries can...help: auto (en|de)code JSON; handle base64-ing "inline" attachments; manage authentication, cookies, OAuth token exchange; caching!!!; _changes feed watching
  • 48. CouchDB Clients: Sag for CouchDB (Apache License 2.0); PHP On Couch (GPLv2 or v3); beyond here, there be giants...: PHPillow (LGPL 3), PHP Object_Freezer (BSD), PHP CouchDB Extension (PHP License 3.0), Doctrine2 CouchDB ODM
  • 49. HTTP Clients: curl (ugh...messy); pecl_http (lovely next to curl, but takes some install time, lacks examples); Zend_HTTP & PEAR HTTP_Request2; most major frameworks have their own
  • 50. Client Suggestions: HTTP: pick one that's flexible (can handle COPY). CouchDB: I use Sag currently (caching, cookie auth, nice name); PHP-on-Couch seems great, but watch the license (GPL)
  • 51. Today's Stack: in PHP: Sag - saggingcouch.com/; for HTTP API demoing/testing: Resty - github.com/micha/resty, Poster for Firefox - code.google.com/p/poster-extension/
  • 52. Other Handy HTTP Clients: HTTPClient for Mac OS X; Charles Proxy ($$); http-twiddle for Emacs; Fiddler for Windows; Solex for Eclipse
  • 53. CouchDB HTTP API (we'll be back to PHP in a bit)
  • 54. JSON Documents: all responses are valid JSON; JSON support is built into PHP 5.2+; pecl & "pure" PHP (de|en)code for older versions
  • 55. A JSON Document
      {
        "json": "key/value pairs",
        "_id": "some uuid",
        "_rev": "mvcc key",
        "string keys": [1,2,3,"four",null],
        "schema free": {"so it's":"flexible"}
      }
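Going the other way, a document like the one on this slide can be built as a PHP array and encoded. This is only a sketch: the "_id" and "_rev" values are the slide's placeholders, not real identifiers.

```php
<?php
// Build the slide's document as a PHP array and encode it as JSON.
$doc = [
    'json'        => 'key/value pairs',
    '_id'         => 'some uuid',   // placeholder, as on the slide
    '_rev'        => 'mvcc key',    // placeholder, as on the slide
    'string keys' => [1, 2, 3, 'four', null],
    'schema free' => ["so it's" => 'flexible'],
];

$json = json_encode($doc);
echo $json, "\n";

// Decoding with assoc=true gives back the same array.
$back = json_decode($json, true);
```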
  • 56. JSON to PHP Object
      $json = '{"json":"document","with":["an","array"]}';
      $j = json_decode($json);
      // $j
      // stdClass Object
      // (
      //     [json] => document
      //     [with] => Array
      //         (
      //             [0] => an
      //             [1] => array
      //         )
      // )
      echo $j->with[1];
  • 57. JSON to PHP array
      $json = '{"json":"document","with":["an","array"]}';
      $j = json_decode($json, true);
      // $j
      // Array
      // (
      //     [json] => document
      //     [with] => Array
      //         (
      //             [0] => an
      //             [1] => array
      //         )
      // )
      echo $j['with'][1];
  • 58. I got tired of having to pick -> or []
      $j = new ArrayObject(
          json_decode($json),
          ArrayObject::ARRAY_AS_PROPS
      );
      print_r($j['with'][0]);
      print_r($j->with[0]);
      // "an" -- same result! no errors!
  • 59. HTTP / REST basics: GET read; PUT create or update; DELETE delete; POST bulk operation
  • 60. Resty: command line RESTful good times
  • 61. Setup Resty
      install (see resty page)
      $ . resty
      # GET, POST, PUT, DELETE & HEAD
      # are scripts now!
      $ resty http://localhost:5984/
      # set Resty to default to CouchDB
  • 62. Create a Database
      $ PUT /pouch/
      {"ok":true}
      $ GET /pouch/
      {"db_name":"pouch","doc_count":0,"doc_del_count":0,"update_seq":0,"purge_seq":0,"compact_running":false,"disk_size":79,"instance_start_time":"1289923325819422","disk_format_version":5,"committed_update_seq":0}
  • 63. DELETE a Database
      $ DELETE /pouch/
      {"ok":true}
      # ^^^ be careful with that one!
      # let's recreate it
      $ PUT /pouch/
  • 64. Create Document
      $ PUT /pouch/tek \
          '{"php":"tek"}'
      {"ok":true,"id":"tek","rev":"1-89af21439a03933bc3fc8c14cbeb496e"}
  • 65. GET the Document
      $ GET /pouch/tek
      {"_id":"tek","_rev":"1-89af21439a03933bc3fc8c14cbeb496e","php":"tek"}
      // we'll need that _rev value to update this doc
  • 66. PUT (failure) on purpose...
      $ PUT /pouch/tek '{"php":"tek"}'
      {"error":"conflict","reason":"Document update conflict."}
      # we need that _rev value now
  • 67. PUT things right
      $ PUT /pouch/tek '{"_rev": "1-89af21439a03933bc3fc8c14cbeb496e", "php":"tek"}'
      # that \ represents a new line
      # doc could also contain the "id"
      {"ok":true,"id":"tek","rev":"2-fff750985c2c2602e859fe38cd1d347e"}
  • 68. PUT binary attachments
      $ PUT /pouch/tek/photo?rev=2-fff750985c2c2602e859fe38cd1d347e -Q filename.png -H "Content-Type: image/png"
      {"ok":true,"id":"tek","rev":"3-18d519e58b569e43a6fd5e87491f0c4c"}
      $ GET /pouch/tek
      {"_id":"tek","_rev":"3-18d519e58b569e43a6fd5e87491f0c4c","_attachments":{"photo":{"content_type":"image/png","revpos":1,"length":18,"stub":true}}}
  • 69. A bit about attachments: each attachment to a doc has its own URL: /pouch/tek/photo, /pouch/tek/schedule.pdf; each attachment has its own mimetype; attachments can be added/updated via their own URLs or inline
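For the "inline" route, the attachment rides along inside the document body as base64 under "_attachments". A minimal sketch of building such a body in PHP follows; the helper name withInlineAttachment and the sample bytes are made up for illustration, and nothing is sent to a server here.

```php
<?php
// Sketch: build a document body carrying an "inline" attachment.
// The helper name and the sample bytes are invented for illustration.
function withInlineAttachment(array $doc, string $name, string $bytes, string $mime): array
{
    $doc['_attachments'][$name] = [
        'content_type' => $mime,
        'data'         => base64_encode($bytes), // inline data travels as base64
    ];
    return $doc;
}

$doc = withInlineAttachment(['php' => 'tek'], 'photo', 'fake image bytes', 'image/png');
echo json_encode($doc), "\n";
```

In real use the bytes would come from something like file_get_contents() on the image, and the resulting JSON would be the body of the document PUT.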
  • 70. DELETE (failure) on purpose...again
      $ DELETE /pouch/tek
      {"error":"conflict","reason":"Document update conflict."}
      # you always have to send a "rev" when changing a doc in any way
  • 71. DELETE
      $ DELETE '/pouch/tek?rev=3-18d519e58b569e43a6fd5e87491f0c4c' -Q
      # that -Q tells Resty not to urlencode
      {"ok":true,"id":"tek","rev":"4-3e8c21e8e610c4ea7f8e247a45c6eb04"}
      # what?!? another rev?...but the
      # document should be dead?!
  • 72. GETing 404's
      $ GET /pouch/tek
      {"error":"not_found","reason":"deleted"}
      $ GET /pouch/tek12
      {"error":"not_found","reason":"missing"}
      # RESTifarian note: would be a 409
      # if caches were built better
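The PUT, DELETE, and GET behavior on the last few slides can be mimicked in a few lines of PHP. This is a toy, in-memory sketch and not CouchDB code: the ToyStore class, its return shapes, and the md5-based rev hash are all assumptions made for illustration.

```php
<?php
// Toy in-memory stand-in for the PUT/DELETE/GET rules shown on the slides.
// Not CouchDB code: class name, rev hashing and return shapes are assumptions.
class ToyStore
{
    private $docs = []; // id => ['rev' => string, 'deleted' => bool, 'body' => array]

    public function put(string $id, array $body, ?string $rev = null, bool $deleted = false): array
    {
        $current = $this->docs[$id] ?? null;
        $live = $current !== null && !$current['deleted'];
        // A live doc may only be changed by a caller holding its current rev.
        if ($live && $rev !== $current['rev']) {
            return ['error' => 'conflict', 'reason' => 'Document update conflict.'];
        }
        // "3-abc..." casts to 3, so the next generation is 4.
        $generation = $current === null ? 1 : ((int) $current['rev']) + 1;
        $newRev = $generation . '-' . md5(json_encode($body) . $generation);
        $this->docs[$id] = ['rev' => $newRev, 'deleted' => $deleted, 'body' => $body];
        return ['ok' => true, 'id' => $id, 'rev' => $newRev];
    }

    public function delete(string $id, ?string $rev = null): array
    {
        if (!isset($this->docs[$id])) {
            return ['error' => 'not_found', 'reason' => 'missing'];
        }
        // Deleting writes one more revision (a tombstone) instead of erasing.
        return $this->put($id, [], $rev, true);
    }

    public function get(string $id): array
    {
        if (!isset($this->docs[$id])) {
            return ['error' => 'not_found', 'reason' => 'missing'];
        }
        if ($this->docs[$id]['deleted']) {
            return ['error' => 'not_found', 'reason' => 'deleted'];
        }
        return ['_id' => $id, '_rev' => $this->docs[$id]['rev']] + $this->docs[$id]['body'];
    }
}

$store  = new ToyStore();
$first  = $store->put('tek', ['php' => 'tek']);                 // rev 1-...
$clash  = $store->put('tek', ['php' => 'tek']);                 // no rev: conflict
$second = $store->put('tek', ['php' => 'tek'], $first['rev']);  // rev 2-...
$gone   = $store->delete('tek', $second['rev']);                // rev 3-... (tombstone)
print_r($store->get('tek'));                                    // not_found / deleted
```

The extra rev on delete is the point of the sketch: the tombstone is itself a revision, which is what lets a delete be told apart from a doc that never existed.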
  • 73. That was just the very basics: GET /_stats (server stats); GET /_all_dbs (list all DBs); GET /db/_all_docs (list all docs); GET /db/_changes (list recent changes): super powers!
  • 74. Map/Reduce Queries (searching the file drawer): essentially stored queries, a.k.a. "Views"; written in JavaScript (or Python or Ruby or Erlang or PHP?); similar to array_map()/array_reduce(), but scalable; no ad-hoc queries
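The array_map()/array_reduce() comparison can be made concrete with plain PHP arrays. This is a sketch of the idea only, not how a view is defined or stored; the documents and field names are invented sample data.

```php
<?php
// Sketch of the map/reduce idea with plain PHP arrays.
// The documents and field names below are invented sample data.
$docs = [
    ['_id' => 'a', 'type' => 'photo', 'size' => 120],
    ['_id' => 'b', 'type' => 'photo', 'size' => 80],
    ['_id' => 'c', 'type' => 'pdf',   'size' => 300],
];

// "map": each doc emits one [key, value] pair.
$rows = array_map(function (array $doc): array {
    return [$doc['type'], $doc['size']];
}, $docs);

// "reduce": fold the emitted values into one total per key.
$totals = array_reduce($rows, function (array $carry, array $row): array {
    [$key, $value] = $row;
    $carry[$key] = ($carry[$key] ?? 0) + $value;
    return $carry;
}, []);

print_r($totals);
```

The map step looks at one document at a time and keeps no state, which is the property that lets the real thing be spread over many documents and servers.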
  • 75. Pouch: put your media in the Couch, starring PHP!
  • 76. I want to cover: schema-less JSON docs; binary attachments; Map/Reduce (JavaScript, several examples); replication; security (validate_doc_update); _security; server side rendering stuff (_show, _list); URL rewriting (_rewrite, vhosts)
  • 77. Pouch is...: a database of JSON docs of file meta-data, with the file attached!; a filesystem importer; a Web-based CouchApp for browsing
  • 78. 0.1 Files into CouchDB: gather EXIF data; add attachment; PUT to CouchDB; let us know how it went
  • 79. code preamble
      #!/usr/bin/php
      <?php
      if ($argc < 2) die('I need a file name..' . "\n");
      // CouchDB config
      $user = 'admin';
      $password = 'passwd';
      // gather and clean EXIF data
      $exif = exif_read_data($argv[1]);
      unset($exif['MakerNote']);
      unset($exif['ComponentsConfiguration']);
      unset($exif['JPEGThumbnail']);
      unset($exif['TIFFThumbnail']);
  • 80.
      require_once dirname(realpath(__FILE__)) . '/../libs/Sag/src/Sag.php';
      $sag = new Sag('localhost', 5984);
      $sag->setDatabase('pouch');
      // PUT the binary attachment first
      $sag->setAttachment(
          'original',                                          // the attachment name
          file_get_contents($argv[1]),                         // the file
          image_type_to_mime_type(exif_imagetype($argv[1])),   // the mime type
          $id = md5_file($argv[1]));                           // the doc id
      // then add the EXIF data
      // GET the full doc as we need to add our info to what's there
      // atomicity is at a document level in CouchDB
      $doc = $sag->get($id)->body;
      $doc->exif = $exif;
      print_r($sag->put(md5_file($argv[1]), $doc));
  • 81. handle updates: GET/PUT or HEAD/PUT or PUT/error_handle/PUT
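One way to handle updates is the GET/PUT option wrapped in a retry: read the doc, apply the change, write it, and start over if someone else got there first. The sketch below assumes two hypothetical callables, $get and $put, standing in for whatever client is in use; the function name updateWithRetry and the array-shaped results are also assumptions.

```php
<?php
// Sketch of GET/PUT with retry on conflict.
// $get and $put are hypothetical callables standing in for a real client.
function updateWithRetry(callable $get, callable $put, string $id, callable $change, int $maxTries = 3): array
{
    for ($try = 1; $try <= $maxTries; $try++) {
        $doc = $change($get($id));       // a fresh copy carries the current _rev
        $result = $put($id, $doc);
        if (($result['error'] ?? null) !== 'conflict') {
            return $result;              // success, or an error we do not retry
        }
    }
    return ['error' => 'conflict', 'reason' => "gave up after $maxTries tries"];
}

// Fake backend for the demo: the first PUT loses a race, the second succeeds.
$rev = 1;
$calls = 0;
$get = function (string $id) use (&$rev): array {
    return ['_id' => $id, '_rev' => $rev, 'exif' => null];
};
$put = function (string $id, array $doc) use (&$rev, &$calls): array {
    $calls++;
    if ($calls === 1) {
        $rev++; // someone else updated the doc in between
        return ['error' => 'conflict'];
    }
    return ['ok' => true, 'id' => $id];
};

$result = updateWithRetry($get, $put, 'tek', function (array $doc): array {
    $doc['exif'] = ['Make' => 'Example'];
    return $doc;
});
print_r($result);
```

The change is passed in as a function so it can be re-applied to the fresh copy on every attempt, rather than overwriting whatever the other writer saved.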
  • 82. 3 Tier UI for viewing browser <-> php <-> couchdb
  • 83. - GET attachments
  • 84. _all_docs for doc list
  • 85. loading CouchDB results into JSON and then into template/PHP output
  • 86. Just certain documents ala Map/Reduce
  • 87. _view API & query params
  • 88. include_docs
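A sketch of putting a view URL together with query params, including include_docs. The database, design doc, and view names are placeholders, and the helper viewUrl is made up for illustration; the one detail worth showing is that key-style params are sent as JSON, so a string key keeps its quotes.

```php
<?php
// Sketch: build a view query URL. Database, design doc and view names
// are placeholders; key-like parameters are sent as JSON.
function viewUrl(string $base, string $db, string $design, string $view, array $params = []): string
{
    $jsonParams = ['key', 'startkey', 'endkey'];
    foreach ($params as $name => $value) {
        if (in_array($name, $jsonParams, true)) {
            $params[$name] = json_encode($value);          // "photo" stays quoted
        } elseif (is_bool($value)) {
            $params[$name] = $value ? 'true' : 'false';    // not PHP's 1 / 0
        }
    }
    $url = "$base/$db/_design/$design/_view/$view";
    return $params ? $url . '?' . http_build_query($params, '', '&') : $url;
}

echo viewUrl('http://localhost:5984', 'pouch', 'photos', 'by_type', [
    'key'          => 'photo',
    'include_docs' => true,
]), "\n";
```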
  • 89. thumbnail creation from PHP: async, or during upload if we want to wrap CouchDB
  • 90. disadvantages of non-async operations
  • 91. 2.5 Tier refactoring: browser <-> couchdb ^ php
  • 92. find photos sans thumbnails and add them
  • 93. _changes feed: could check this first before hitting the view: _changes is "lighter"
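A sketch of reading a _changes response in PHP to see which docs need attention. The JSON below is a hand-written sample in the feed's general shape, not captured output, and the function name changedIds is made up for illustration.

```php
<?php
// Sketch: pick out the doc IDs from a _changes response.
// The JSON below is a hand-written sample, not real server output.
$response = '{"results":[
  {"seq":1,"id":"tek","changes":[{"rev":"1-89af21439a03933bc3fc8c14cbeb496e"}]},
  {"seq":2,"id":"old","changes":[{"rev":"2-fff750985c2c2602e859fe38cd1d347e"}],"deleted":true}
],"last_seq":2}';

function changedIds(string $json): array
{
    $feed = json_decode($json, true);
    $ids = [];
    foreach ($feed['results'] as $row) {
        if (empty($row['deleted'])) {   // skip tombstones
            $ids[] = $row['id'];
        }
    }
    return ['ids' => $ids, 'last_seq' => $feed['last_seq']];
}

print_r(changedIds($response));
```

Keeping last_seq around lets the next poll ask only for what changed since then.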
  • 94. JS app to replace the PHP-built UI
  • 95. PHP cronjob handles thumbs/metadata async
  • 96. CouchDB setup on public port
  • 97. couchapp script http://couchapp.org
  • 98. Replicating the App
  • 99. big advantage: data & app stay together. this is huge!
  • 100. small disadvantages: dynamic data (thumbs) won't always be there; no real data loss in this case, as they can be re-created later
  • 101. _replicate
  • 102. Removing Images just DELETE the docs
  • 103. Compaction (for space cleanup): _deleted docs; _stats (for CouchDB-wide space usage)
  • 104. Securing the CouchApp
  • 105. _security API
  • 106. Users/Roles on docs
  • 107. CouchDB permissions
  • 108. Cookie Authentication (in the CouchApp)
  • 109. Basic Authentication (for the cronjob) Sag, HTTP API access, etc.
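For a script talking to the HTTP API directly, Basic Authentication comes down to one header. A minimal sketch follows; the function name basicAuthHeader is made up, and 'admin' / 'passwd' are the placeholder credentials from the code preamble, not real ones.

```php
<?php
// Sketch: the header a cron job could send for HTTP Basic Authentication.
// 'admin' / 'passwd' are placeholder credentials.
function basicAuthHeader(string $user, string $password): string
{
    return 'Authorization: Basic ' . base64_encode("$user:$password");
}

echo basicAuthHeader('admin', 'passwd'), "\n";
```

Base64 is an encoding, not encryption, so this only makes sense over a connection you trust.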
  • 110. OAuth: it's an option, but a whole 'nother talk
  • 111. Replicating with security
  • 112. Adding a form for User Data Entry
  • 113. - Server-side validation
  • 114. validate_doc_update: document validation for additional security
  • 115. Making the App “Static” (less AJAX dependent)
  • 116. _show: "templating" for a single doc
  • 117. _list: templating for view output
  • 118. Migrating existing MySQL-based Gallery data to Pouch
  • 119. writing JSON docs pulled from old Gallery
  • 120. requesting those docs, PUTing them, PUTing the attachment
  • 121. CouchDB as REST API server: building another API for Pouch, in addition to (or instead of) the standard CouchDB API; putting that into CouchDB: _list - index pages; _show - single document page; _update - document modification; _rewrite - URL rewriting
  • 122. Any questions?
  • 123. Other CouchApps: demo/review other CouchApps? discuss scalability/load balancing? deeper dive into Map/Reduce joins?