NoSQL: Why, When, and How

Usage Rights

CC Attribution License


Presentation Transcript

  • NoSQL: Why, When, and How, starring CouchDB
  • aka built rebuilding on hack at contribute to work at
  • One More thing...
    Co-organizing REST Fest 2011 w/ Mike Amundsen
    restfest.org
  • What is NoSQL
  • NoSQL is...
    Not Only SQL or SQL
    I vote for this one
  • Origins of the Name
    Carlo Strozzi - NoSQL app started in 1998. RDBMS sans SQL
    applied to non-relational DBs around 2008-ish
    no:sql(east)
    select fun, profit from real_world where relational=false;
  • Origin of the Species
    non-relational databases pre-date relational ones
    it’s sorta like AJAX: been doin’ it for awhile, gets a name, now it’s cool!!1!
  • Types of NoSQL DBs
    graph (RDF/Semantic Web/triples)
    key-value (just what it says)
    document (k/v + queryability)
    object (big in the 1980s)
    multivalue (old tech...like 1960s)
    NewSQL?
  • NewSQL?
    Recently (April, 2011) coined term by The 451 Group
    mostly means SQL DBs + better scalability
    or NoSQL DBs + SQL layers
    yeah! more keywords!! keeps marketing happy...
  • ...on to specifics
  • Graph (RDF)
    pretty heady stuff
    Web 3.0? Maybe...
    simple concept, complicated execution (often)
    FlockDB (from Twitter), AllegroGraph, Cytoscape
    Queried with: SPARQL, Java
  • Key Value Stores
    scalable caches
    generally no query language: get key(s), return value(s)
    Project Voldemort, Hibari
    Big Co need driven
  • Document Databases
    key value + querying
    Lotus Notes, Amazon SimpleDB
  • object
  • multivalue
    pretty antique
    lots of legacy rollouts
  • NewSQL
    any of the previously mentioned DBs + query layers...maybe
    expect any and all SQL DBs to jump on this train and/or slip in the next few years
    worth a look if you *must* have normalized storage
    but who needs normalization?
  • Why
  • One Reason
  • options
  • denormalization
    schemaless
    graph/object schema: closer match to business logic
    generally faster/more scalable
    certainly more distributable
  • alt.queries
    Map/Reduce: thanks, Google (paper from 2004)
    XPath and/or XQuery
    SPARQL
    SQL...or something quite similar: Linq
  • licensing
    Apache License 2.0: the favorite
    AGPLv3: the commercial favorite
    others include LGPL (v2/3), GPL (v2/3), BSD, MIT, custom commercial or open source
    ...here there be dragons...
  • (un)expected extras, at least in CouchDB
    Built-in Web Server or App Server
    Geospatial bounding box queries
    n-master replication
    binary file storage
    scales up and down
  • When
  • Scenarios
    Scalability: caching, sharding
    Analytics
    Data Warehousing
    Ubiquitous/Distributed Data: mobile, desktop, server
  • Scalability
    “cache” style DBs: data served from RAM (mostly)
    Membase, Elastic Couchbase, memcached, MongoDB, Cassandra
    horizontal scalability: add more servers, not more server
  • Analytics
    Hadoop, HBase, Cassandra
    this bleeds over into data warehousing quickly
  • Ubiquitous/Distributed Data
    CouchDB: server <=> desktop <=> mobile
    Riak Mobile? go to Erlang Factory in June
  • FALE Scenarios
    power outages
    data loss: failed persistence, no persistence
    network unavailability
    special thanks to @coats who runs fale.ca for the FALE stamp
  • Solution: CouchDB
  • How, with CouchDB
  • Time to Relax
    That’s Damien. He built CouchDB.
    That’s the CouchDB gang sign. Learn it!
  • and here’s why
  • CouchDB has Super Powers!
    Schemaless
    Document centric
    Replication/Sync
    Fail Fast Architecture: stateless API, append only file storage
  • Document centric
    “natural” data model: store data like it exists everywhere else, as a document
    map/reduce vs. sql: sorting documents out of a drawer vs. reassembling them from bits of data
  • Replication/Sync
    MVCC-based transactions: versioning...but only meant for transactions
    safely merge databases
    documents aren’t compared, only UUIDs & revision IDs
    conflicting documents are marked and a winner is picked
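How "a winner is picked" can be sketched in a few lines of PHP. This is a simplified approximation, not CouchDB's code: it assumes revision IDs of the form "<generation>-<hash>", ignores deleted revisions, and treats the generation number as a stand-in for the length of the revision history. The function name pickWinner is hypothetical.

```php
<?php
// Simplified sketch: choose one winner among conflicting revision IDs.
// Assumes a non-empty list of IDs shaped like "<generation>-<hash>".
function pickWinner(array $revs): string
{
    usort($revs, function (string $a, string $b): int {
        $genA = (int) $a; // the cast keeps the leading digits before the dash
        $genB = (int) $b;
        if ($genA !== $genB) {
            return $genB <=> $genA; // higher generation sorts first
        }
        return strcmp($b, $a);      // tie: higher rev string sorts first
    });
    return $revs[0];
}

$winner = pickWinner(['2-aaa', '2-bbb', '1-zzz']);
echo $winner, "\n";
```

Because the rule depends only on the revision IDs, every replica holding the same conflicting revisions arrives at the same winner without comparing document bodies.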
  • Fail Fast
    append only database file: everything goes on the end of the file; queries are cached there too
    bounce back from errors rather than spin wheels indefinitely
  • C.O.U.C.H
    Collection Of Unreliable Commodity Hardware
  • CouchDB Scaling
    scales up and down: server <=> desktop <=> mobile, thanks to n-master replication
    BigCouch plugin for sharding
    HTTP API can be load balanced: reverse proxies and caching
  • Get CouchDB
    http://www.couchbase.com/downloads/couchbase-server/community
    http://iriscouch.com/
    both have GeoCouch built in!
    https://cloudant.com/ (BigCouch/sharded)
    your system’s package installer... http://wiki.apache.org/couchdb/Installation
  • Time to Relax (command line)
    Apache CouchDB 1.0.2 (LogLevel=info) is starting.
    Apache CouchDB has started. Time to relax.
    [info] [<0.31.0>] Apache CouchDB has started on http://127.0.0.1:5984/
  • PHP & CouchDB
  • HTTP Clients
    All you really need to get started
    Does require a better understanding of CouchDB’s API...but that’s A Good Thing!
    Will improve your HTTP skillz...another Good Thing!
    requires more work...the part you won’t like
    but it’s worth it to learn HTTP & REST
  • CouchDB Clients
    CouchDB’s API is just HTTP
    but...helper libraries can...help:
    auto (en|de)code JSON
    handle base64-ing “inline” attachments
    manage authentication, cookies, OAuth token exchange
    caching!!!
    _changes feed watching
  • CouchDB Clients
    Sag for CouchDB (Apache License 2.0)
    PHP On Couch (GPLv2 or v3)
    Beyond here, there be giants...
    PHPillow (LGPL 3)
    PHP Object_Freezer (BSD)
    PHP CouchDB Extension (PHP License 3.0)
    Doctrine2 CouchDB ODM
  • HTTP Clients
    curl: ugh...messy
    pecl_http: lovely (next to curl), but takes some install time, lacks examples
    Zend_HTTP & PEAR HTTP_Request2
    Most major frameworks have their own
  • Client Suggestions
    HTTP: pick one that’s flexible (can handle COPY)
    CouchDB: I use Sag currently. Caching, Cookie Auth, nice name.
    PHP-on-Couch seems great, but watch the license (GPL)
  • Today’s Stack
    In PHP: Sag - saggingcouch.com/
    For HTTP API Demoing/Testing:
    Resty - github.com/micha/resty
    Poster for Firefox - code.google.com/p/poster-extension/
  • Other Handy HTTP Clients
    HTTPClient for Mac OS X
    Charles Proxy ($$)
    http-twiddle for Emacs
    Fiddler for Windows
    Solex for Eclipse
  • CouchDB HTTP API
    we’ll be back to PHP in a bit
  • JSON Documents
    all responses are valid JSON
    JSON support is built into PHP 5.2+
    pecl & “pure” PHP (de|en)code for older versions
  • A JSON Document
    {
      "json": "key/value pairs",
      "_id": "some uuid",
      "_rev": "mvcc key",
      "string keys": [1, 2, 3, "four", null],
      "schema free": {"so it's": "flexible"}
    }
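A document like the one on this slide can be built as a plain PHP array and serialized with json_encode(). The sketch below is only an illustration: the "_id" and "_rev" values are placeholders, not values from a real database.

```php
<?php
// Build a CouchDB-style document as a PHP array.
// "_id" and "_rev" are placeholder values for illustration only.
$doc = [
    '_id'         => 'some-uuid',
    '_rev'        => '1-abc',
    'json'        => 'key/value pairs',
    'string keys' => [1, 2, 3, 'four', null],
    'schema free' => ["so it's" => 'flexible'],
];

// Serialize to the JSON text that would travel in a request body.
$json = json_encode($doc);

// Decoding with the second argument set to true gives arrays back.
$roundTrip = json_decode($json, true);
echo $roundTrip['string keys'][3], "\n";
```

Nothing in the array declares a schema; a second document in the same database could carry entirely different keys.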
  • JSON to PHP Object
    $json = '{"json":"document","with":["an","array"]}';
    $j = json_decode($json);
    // $j
    stdClass Object
    (
        [json] => document
        [with] => Array
            (
                [0] => an
                [1] => array
            )
    )
    echo $j->with[1];
  • JSON to PHP array
    $json = '{"json":"document","with":["an","array"]}';
    $j = json_decode($json, true);
    // $j
    Array
    (
        [json] => document
        [with] => Array
            (
                [0] => an
                [1] => array
            )
    )
    echo $j['with'][1];
  • I got tired of having to pick -> or []
    $j = new ArrayObject(
        json_decode($json),
        ArrayObject::ARRAY_AS_PROPS
    );
    print_r($j['with'][0]);
    print_r($j->with[0]);
    // "an" -- same result! no errors!
  • HTTP / REST basics
    GET: read
    PUT: create or update
    DELETE: delete
    POST: bulk operation
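Since CouchDB's API is plain HTTP, PHP's built-in stream contexts are enough to issue each of these verbs. The helper below is a hypothetical sketch (httpOptions is not part of any library) that only builds the options array; the actual request is left commented out because it needs a running server.

```php
<?php
// Hypothetical helper: build stream-context options for one HTTP call.
function httpOptions(string $method, ?array $body = null): array
{
    $http = [
        'method'        => $method,
        'header'        => "Accept: application/json\r\n",
        'ignore_errors' => true, // still return the body on 4xx/5xx responses
    ];
    if ($body !== null) {
        $http['header'] .= "Content-Type: application/json\r\n";
        $http['content'] = json_encode($body);
    }
    return ['http' => $http];
}

$read   = httpOptions('GET');                   // read
$write  = httpOptions('PUT', ['php' => 'tek']); // create or update
$remove = httpOptions('DELETE');                // delete

// Sending it (assumes a CouchDB server on localhost, so not run here):
// $ctx = stream_context_create($write);
// echo file_get_contents('http://localhost:5984/pouch/tek', false, $ctx);
```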
  • Resty
    command line RESTful good times
  • Setup Resty
    install (see resty page)
    $ . resty
    # GET, POST, PUT, DELETE & HEAD
    # are scripts now!
    $ resty http://localhost:5984/
    # set Resty to default to CouchDB
  • Create a Database
    $ PUT /pouch/
    {"ok":true}
    $ GET /pouch/
    {"db_name":"pouch","doc_count":0,"doc_del_count":0,"update_seq":0,"purge_seq":0,"compact_running":false,"disk_size":79,"instance_start_time":"1289923325819422","disk_format_version":5,"committed_update_seq":0}
  • DELETE a Database
    $ DELETE /pouch/
    {"ok":true}
    # ^^^ be careful with that one!
    # let’s recreate it
    $ PUT /pouch/
  • Create Document
    PUT /pouch/tek '{"php":"tek"}'
    {"ok":true,"id":"tek","rev":"1-89af21439a03933bc3fc8c14cbeb496e"}
  • GET the Document
    GET /pouch/tek
    {"_id":"tek","_rev":"1-89af21439a03933bc3fc8c14cbeb496e","php":"tek"}
    // we’ll need that _rev value to update this doc
  • PUT (failure) on purpose...
    PUT /pouch/tek '{"php":"tek"}'
    {"error":"conflict","reason":"Document update conflict."}
    # we need that _rev value now
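The conflict comes from a revision check: an update has to name the revision it is based on. The toy class below imitates that check in memory. It is not how CouchDB is implemented, and its revision IDs ("<generation>-<md5 of the body>") are a made-up scheme for this sketch; ToyStore is a hypothetical name.

```php
<?php
// Toy in-memory store that imitates the revision check.
// Revision IDs here are "<generation>-<md5 of body>", invented for this sketch.
class ToyStore
{
    private $docs = [];

    public function put(string $id, array $doc): array
    {
        $current = isset($this->docs[$id]) ? $this->docs[$id]['_rev'] : null;
        $sent    = isset($doc['_rev']) ? $doc['_rev'] : null;

        // The sent revision must match the stored one (both null for a new doc).
        if ($current !== $sent) {
            return ['error' => 'conflict', 'reason' => 'Document update conflict.'];
        }

        $generation = $current === null ? 1 : ((int) $current) + 1;
        unset($doc['_rev']);
        $rev = $generation . '-' . md5(json_encode($doc));
        $this->docs[$id] = ['_id' => $id, '_rev' => $rev] + $doc;

        return ['ok' => true, 'id' => $id, 'rev' => $rev];
    }
}

$store  = new ToyStore();
$first  = $store->put('tek', ['php' => 'tek']);                          // created
$second = $store->put('tek', ['php' => 'tek']);                          // no _rev: conflict
$third  = $store->put('tek', ['_rev' => $first['rev'], 'php' => 'tek']); // accepted
```

Sending the current _rev with the update is what turns the rejected second call into the accepted third one.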
  • PUT things right
    PUT /pouch/tek \
      '{"_rev": "1-89af21439a03933bc3fc8c14cbeb496e", "php":"tek"}'
    # that \ represents a new line
    # doc could also contain the "id"
    {"ok":true,"id":"tek","rev":"2-fff750985c2c2602e859fe38cd1d347e"}
  • PUT binary attachments
    PUT /pouch/tek/photo?rev=2-fff750985c2c2602e859fe38cd1d347e -Q filename.png -H "Content-Type: image/png"
    {"ok":true,"id":"tek","rev":"3-18d519e58b569e43a6fd5e87491f0c4c"}
    GET /pouch/tek
    {"_id":"tek","_rev":"3-18d519e58b569e43a6fd5e87491f0c4c","_attachments":{"photo":{"content_type":"image/png","revpos":1,"length":18,"stub":true}}}
  • A bit about attachments
    Each attachment to a doc has its own URL: /pouch/tek/photo, /pouch/tek/schedule.pdf
    each attachment has its own mimetype
    attachments can be added/updated via their own URLs or inline
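The attachment URL scheme is regular enough to build with a small helper. The function below is a hypothetical sketch (attachmentUrl is not from Sag or any other library), and the base URL assumes a local CouchDB on the default port.

```php
<?php
// Hypothetical helper: build the URL of one attachment on a document.
function attachmentUrl(string $base, string $db, string $docId, string $name, ?string $rev = null): string
{
    $url = rtrim($base, '/')
         . '/' . rawurlencode($db)
         . '/' . rawurlencode($docId)
         . '/' . rawurlencode($name);
    // A rev is needed when adding or replacing an attachment on an existing doc.
    if ($rev !== null) {
        $url .= '?rev=' . rawurlencode($rev);
    }
    return $url;
}

$read  = attachmentUrl('http://localhost:5984', 'pouch', 'tek', 'schedule.pdf');
$write = attachmentUrl('http://localhost:5984/', 'pouch', 'tek', 'photo',
                       '2-fff750985c2c2602e859fe38cd1d347e');
echo $read, "\n", $write, "\n";
```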
  • DELETE (failure) on purpose...again
    DELETE /pouch/tek
    {"error":"conflict","reason":"Document update conflict."}
    # you always have to send a "rev" when changing a doc in any way
  • DELETE
    DELETE '/pouch/tek?rev=3-18d519e58b569e43a6fd5e87491f0c4c' -Q
    # that -Q tells Resty not to urlencode
    {"ok":true,"id":"tek","rev":"4-3e8c21e8e610c4ea7f8e247a45c6eb04"}
    # what?!? another rev?...but the
    # document should be dead?!
  • GETing 404’s
    GET /pouch/tek
    {"error":"not_found","reason":"deleted"}
    GET /pouch/tek12
    {"error":"not_found","reason":"missing"}
    # RESTifarian note: would be a 409
    # if caches were built better
  • That was just the very basics
    GET /_stats (server stats)
    GET /_all_dbs (list all DBs)
    GET /db/_all_docs (list all docs)
    GET /db/_changes (list recent changes)
    super powers!
  • Map/Reduce Queries
    searching the file drawer
    essentially stored queries, a.k.a. “Views”
    written in JavaScript (or Python or Ruby or Erlang or PHP?)
    similar to array_map()/array_reduce(), but scalable
    no ad-hoc queries
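The array_map()/array_reduce() comparison can be made concrete in PHP. This sketch only illustrates the idea of a view (map each document to a key/value row, then reduce rows per key); it is not how CouchDB evaluates views, and the "type" field is invented for the example.

```php
<?php
// Sample documents; the "type" field is invented for this example.
$docs = [
    ['_id' => 'a', 'type' => 'photo'],
    ['_id' => 'b', 'type' => 'video'],
    ['_id' => 'c', 'type' => 'photo'],
];

// "Map": turn each document into a key/value row.
$rows = array_map(function (array $doc): array {
    return ['key' => $doc['type'], 'value' => 1];
}, $docs);

// "Reduce": fold the rows into one count per key.
$counts = array_reduce($rows, function (array $carry, array $row): array {
    $carry[$row['key']] = ($carry[$row['key']] ?? 0) + $row['value'];
    return $carry;
}, []);

print_r($counts);
```

The "but scalable" part is what the sketch leaves out: a stored view is indexed once and updated incrementally, whereas this code rescans every document on each run.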
  • Pouch
    put your media in the Couch
    starring PHP!
  • i want to cover
    schema-less JSON docs
    Binary Attachments
    Map/Reduce: Javascript, several examples
    replication
    security: _security, validate_doc_update
    server side rendering stuff: _show, _list
    URL Rewriting: _rewrite, vhosts
  • Pouch is...
    a Database of JSON docs of file meta-data, with the file attached!
    a filesystem importer
    a Web-based CouchApp for browsing
  • 0.1 Files into CouchDB
    gather EXIF data
    add attachment
    PUT to CouchDB
    let us know how it went
  • code preamble
    #!/usr/bin/php
    <?php
    if ($argc < 2) die("I need a file name..\n");
    // CouchDB config
    $user = 'admin';
    $password = 'passwd';
    // gather and clean EXIF data
    $exif = exif_read_data($argv[1]);
    unset($exif['MakerNote']);
    unset($exif['ComponentsConfiguration']);
    unset($exif['JPEGThumbnail']);
    unset($exif['TIFFThumbnail']);
  • require_once dirname(realpath(__FILE__)) . '/../libs/Sag/src/Sag.php';
    $sag = new Sag('localhost', 5984);
    $sag->setDatabase('pouch');
    // PUT the binary attachment first
    $id = md5_file($argv[1]); // the doc id
    $sag->setAttachment(
        'original',                  // the attachment name
        file_get_contents($argv[1]), // the file
        image_type_to_mime_type(exif_imagetype($argv[1])),
        $id
    );
    // then add the EXIF data
    // GET the full doc as we need to add our info to what's there
    // atomicity is at a document level in CouchDB
    $doc = $sag->get($id)->body;
    $doc->exif = $exif;
    print_r($sag->put($id, $doc));
  • handle updates: GET/PUT or HEAD/PUT or PUT/error_handle/PUT
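The HEAD/PUT variant works because CouchDB reports a document's current revision in the ETag response header, wrapped in double quotes, so the body never has to be fetched. A minimal sketch of pulling the revision out of raw header lines follows; revFromHeaders is a hypothetical helper and the sample headers are hand-written rather than captured from a server.

```php
<?php
// Hypothetical helper: pull the revision out of raw response header lines.
function revFromHeaders(array $headerLines): ?string
{
    foreach ($headerLines as $line) {
        // Header names are case-insensitive; the value is a quoted rev.
        if (preg_match('/^ETag:\s*"([^"]+)"/i', $line, $m)) {
            return $m[1];
        }
    }
    return null; // no ETag: the document probably does not exist yet
}

// Hand-written sample headers, as a HEAD /pouch/tek might return them.
$headers = [
    'HTTP/1.1 200 OK',
    'ETag: "1-89af21439a03933bc3fc8c14cbeb496e"',
    'Content-Type: application/json',
];
$rev = revFromHeaders($headers);
echo $rev, "\n";
```

The PUT/error_handle/PUT variant skips the lookup entirely: attempt the write, and only on a conflict response fetch the revision and retry.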
  • 3 Tier UI for viewing
    browser <-> php <-> couchdb
  • - GET attachments
  • _all_docs for doc list
  • loading CouchDB results into JSON and then into template/PHP output
  • Just certain documents ala Map/Reduce
  • _view API & query params
  • include_docs
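View query parameters such as key are JSON values, so string keys must carry their quotes in the URL, and include_docs=true asks CouchDB to return each full document alongside its row. The helper below is a hypothetical sketch; the design document name "photos" and the view name "by_camera" are invented for illustration.

```php
<?php
// Hypothetical helper: build the path and query string for a view request.
function viewUrl(string $db, string $design, string $view, array $params = []): string
{
    $query = [];
    foreach ($params as $name => $value) {
        // JSON-encode each value: strings gain quotes, booleans become true/false.
        $query[] = rawurlencode($name) . '=' . rawurlencode(json_encode($value));
    }
    $url = '/' . rawurlencode($db)
         . '/_design/' . rawurlencode($design)
         . '/_view/' . rawurlencode($view);
    return $query ? $url . '?' . implode('&', $query) : $url;
}

$url = viewUrl('pouch', 'photos', 'by_camera', [
    'key'          => 'Canon',
    'include_docs' => true,
]);
echo $url, "\n";
```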
  • thumbnail creation from PHP
    async, or during upload if we want to wrap CouchDB
  • disadvantages of non-async operations
  • 2.5 Tier refactoring
    browser <-> couchdb
    ^ php
  • find photos sans thumbnails and add them
  • _changes feed
    could check this first before hitting the view: _changes is “lighter”
  • JS app to replace the PHP-built UI
  • PHP cronjob handles thumbs/metadata async
  • CouchDB setup on public port
  • couchapp script http://couchapp.org
  • Replicating the App
  • big advantage: data & app stay together this is huge!
  • small disadvantages: dynamic data (thumbs) won’t always be there
    no real data loss in this case, as they can be re-created later
  • _replicate
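Replication is started by POSTing a small JSON body to /_replicate that names a source and a target. The sketch below only builds that body; replicationBody is a hypothetical helper and the target host is a placeholder.

```php
<?php
// Hypothetical helper: JSON body for a POST to /_replicate.
function replicationBody(string $source, string $target, bool $continuous = false): string
{
    $body = ['source' => $source, 'target' => $target];
    if ($continuous) {
        $body['continuous'] = true; // keep replicating as changes arrive
    }
    // Leave slashes unescaped so URLs stay readable.
    return json_encode($body, JSON_UNESCAPED_SLASHES);
}

// The target host below is a placeholder.
$once    = replicationBody('pouch', 'http://example.org:5984/pouch');
$ongoing = replicationBody('pouch', 'http://example.org:5984/pouch', true);
echo $once, "\n", $ongoing, "\n";
```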
  • Removing Images just DELETE the docs
  • Compaction (for space cleanup)
    _deleted docs
    _stats (for CouchDB-wide space usage)
  • Securing the CouchApp
  • _security API
  • Users/Roles on docs
  • CouchDB permissions
  • Cookie Authentication (in the CouchApp)
  • Basic Authentication (for the cronjob) Sag, HTTP API access, etc.
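For a cronjob talking to the raw HTTP API, Basic Authentication amounts to one request header. A minimal sketch, reusing the placeholder admin/passwd credentials from the import script earlier in the deck (basicAuthHeader is a hypothetical helper):

```php
<?php
// Hypothetical helper: build an HTTP Basic Authentication header line.
function basicAuthHeader(string $user, string $password): string
{
    return 'Authorization: Basic ' . base64_encode($user . ':' . $password);
}

// Placeholder credentials, matching the import script's config values.
$header = basicAuthHeader('admin', 'passwd');
echo $header, "\n";
```

Basic credentials are only base64-encoded, not encrypted, so this suits a cronjob on the same host or a connection already protected by TLS.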
  • OAuth
    it’s an option, but a whole ‘nother talk
  • Replicating with security
  • Adding a form for User Data Entry
  • - Server-side validation
  • validate_doc_update
    document validation for additional security
  • Making the App “Static” (less AJAX dependent)
  • _show
    “templating” for a single doc
  • _list
    templating for view output
  • Migrating existing MySQL-based Gallery data to Pouch
  • writing JSON docs pulled from old Gallery
  • requesting those docs, PUTing them, PUTing the attachment
  • CouchDB as REST API server
    Building another API for Pouch
    In addition to (or instead of) the standard CouchDB API
    Putting that into CouchDB:
    _list - index pages
    _show - single document page
    _update - document modification
    _rewrite - URL Rewriting
  • Any questions?
  • Other CouchApps
    demo/review other CouchApps?
    discuss scalability/load balancing?
    deeper dive into Map/Reduce joins?