Framing the Argument: How to Scale Faster with NoSQL

Grab some coffee and enjoy the pre-show banter before the top of the hour!

The Briefing Room
Framing the Argument: How to Scale Faster with NoSQL

Twitter Tag: #briefr The Briefing Room
Welcome
Host:
Eric Kavanagh
eric.kavanagh@bloorgroup.com
@eric_kavanagh

Mission
•  Reveal the essential characteristics of enterprise software, good and bad
•  Provide a forum for detailed analysis of today's innovative technologies
•  Give vendors a chance to explain their product to savvy analysts
•  Allow audience members to pose serious questions... and get answers!

Topics
March: BI/ANALYTICS
April: BIG DATA
May: CLOUD

More Than One Way to Skin a Cat
NoSQL engines provide escape hatches
  Force-fitting all data into relational will fail, because:
Performance is ALWAYS important,
now more than ever

Analyst: Robin Bloor
Robin Bloor is
Chief Analyst at
The Bloor Group
robin.bloor@bloorgroup.com
@robinbloor

IBM Cloudant
•  IBM Cloudant offers a non-relational, cloud-based distributed database
•  The product is based on Apache CouchDB and provides data management, search, hosting, admin tools and analytics
•  Cloudant’s database-as-a-service is often used for web or mobile application development

Guest: Ryan Millay
Ryan Millay started with IBM® Cloudant® in May 2014 after three years as a software engineer. Now he is part of the Field Engineering team working on both pre- and post-sales opportunities with a variety of different accounts. He is also a member of the Cloudant Local Services team to help customers scope and install Cloudant’s on-premises software. When not at Cloudant, Ryan enjoys travelling, playing a round of golf, or binging on the latest show on Netflix.

SQL to NoSQL: Top 5 Questions
Mike Broberg
Marketing Communications, Cloudant, IBM Cloud Data Services
Ryan Millay
Field Engineer, Cloudant, IBM Cloud Data Services

Agenda
•  About Cloudant
•  Top 5 Questions When Moving to NoSQL
•  Live Q&A

Housekeeping Notes
•  Today’s webcast is being recorded. We will send you a link to the recording, a link to the library and its code examples, and a copy of the slide deck after the presentation.
•  The webcast recording will be available on our website: https://cloudant.com
•  If you would like to ask a question during today’s presentation, please type in your question using the GoToWebinar tool bar.

But, What Is NoSQL, Really?
•  Umbrella term for databases using non-SQL query languages
•  Key-Value stores
•  Wide column stores
•  Document stores
•  Graph stores
•  Some also say "non-relational," because data is not decomposed into separate tables, rows, and columns
•  As we’ll see, it’s still possible to represent relationships in NoSQL
•  The question is, are these relationships always necessary?

Schema Flexibility
•  Cloudant uses JavaScript Object Notation (JSON) as its data format
•  Cloudant is based on Apache CouchDB. In both systems, a "database" is simply a collection of JSON documents
{
  "docs": [
    {
      "_id": "df8cecd9809662d08eb853989a5ca2f2",
      "_rev": "1-8522c9a1d9570566d96b7f7171623270",
      "Movie_runtime": 162,
      "Movie_rating": "PG-13",
      "Person_name": "Zoe Saldana",
      "Actor_actor_id": "0757855",
      "Movie_genre": "AVYS",
      "Movie_name": "Avatar",
      "Actor_movie_id": "0499549",
      "Movie_earnings_rank": "1",
      "Person_pob": "New Jersey, USA",
      "Person_id": "0757855",
      "Movie_id": "0499549",
      "Movie_year": 2009,
      "Person_dob": "1978-06-19"
    }
  ]
}
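
As a rough illustration of schema flexibility, here is a minimal Python sketch. It assumes nothing about Cloudant's client libraries: the "database" is just an in-memory list, the first document is abbreviated from the slide, and the second document and its field names are hypothetical.

```python
import json

# A "database" modeled as a plain list of JSON documents
# (an in-memory stand-in for illustration, not Cloudant's API).
database = []

# Document abbreviated from the slide above.
database.append({
    "_id": "df8cecd9809662d08eb853989a5ca2f2",
    "Movie_name": "Avatar",
    "Movie_year": 2009,
    "Person_name": "Zoe Saldana",
})

# Hypothetical second document with a different shape.
# No schema change or migration is needed to store it alongside the first.
database.append({
    "_id": "example-review-1",
    "Review_text": "Visually striking.",
    "Review_stars": 4,
})

# Every document round-trips through JSON unchanged.
round_tripped = [json.loads(json.dumps(doc)) for doc in database]
field_sets = [sorted(doc.keys()) for doc in round_tripped]
print(field_sets)
```

The two documents share only the "_id" field, yet both live in the same collection.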

Horizontal Scaling
•  Many commodity servers vs. few expensive ones
•  Cost scales linearly with performance, not exponentially

Master-Master Replication
•  Or "masterless replica architecture"
•  Minimize latency by putting data close to users
•  Replicate data widely to mitigate disasters
•  Cloudant excels at data movement

2. Rows and Tables Become ... What?

... This!
SQL Terms/Concepts --> Document Store Terms/Concepts
database --> database
table --> bunch of documents
row --> document
column --> field
materialized view --> index/database view/secondary index
primary key --> "_id":
table JOIN operations --> entity relations

Rows --> Documents
•  Use some field to group documents by schema
•  Example: "type":"user" or "type":"edge:follower"

Tables --> Databases
•  Put all tables in one database; use "type": to distinguish
•  Model entity relationships with secondary indexes
•  More on this later in the webinar
•  If you're curious, we're talking about concepts described in the CouchDB documentation on entity relations
•  http://wiki.apache.org/couchdb/EntityRelationship
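
The "type" convention can be sketched in a few lines of Python. This is only an in-memory illustration: the documents, the "user" and "follows" field names on the edge document, and the helper `group_by_type` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical documents that would have lived in two SQL tables,
# now stored side by side in one database.
docs = [
    {"_id": "user:alice", "type": "user", "name": "Alice"},
    {"_id": "user:bob", "type": "user", "name": "Bob"},
    {"_id": "edge-1", "type": "edge:follower",
     "user": "user:alice", "follows": "user:bob"},
]


def group_by_type(documents):
    """Group document ids by their "type" field, the stand-in for a table name."""
    groups = defaultdict(list)
    for doc in documents:
        groups[doc["type"]].append(doc["_id"])
    return dict(groups)


tables = group_by_type(docs)
print(tables)
```

Each distinct "type" value plays the role a table name would play in an RDBMS.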

Indexes and Queries
•  An "index" in Cloudant is not strictly a performance optimization
•  Instead, more akin to "materialized view" in RDBMS terms
•  Index also called a "database view" in Cloudant
•  Index, then query.
•  You need one before you can do the other
•  Create index, then query by URL
•  Can create a secondary index on any field within a document
•  You get primary index (based on reserved "_id": field) by default
•  Indexes precomputed, updated in real time
•  Performant at big-honkin' scale
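
The "index, then query" idea can be emulated in plain Python. This is a sketch under stated assumptions, not Cloudant's API: in Cloudant the map function would be JavaScript stored in a design document and the query would be an HTTP request. The helpers `build_index`, `query`, and `by_year`, and the sample documents, are hypothetical.

```python
import bisect


def build_index(documents, map_fn):
    """Run map_fn over every document and return sorted (key, doc_id, value) rows."""
    rows = []
    for doc in documents:
        for key, value in map_fn(doc):
            rows.append((key, doc["_id"], value))
    rows.sort()
    return rows


def query(index, key):
    """Return all rows whose key equals `key`, using binary search on the sorted rows."""
    keys = [row[0] for row in index]
    lo = bisect.bisect_left(keys, key)
    hi = bisect.bisect_right(keys, key)
    return index[lo:hi]


def by_year(doc):
    """Map function: emit (year, name) for documents that carry a movie year."""
    if "Movie_year" in doc:
        yield doc["Movie_year"], doc["Movie_name"]


# Hypothetical sample documents.
movies = [
    {"_id": "a", "Movie_name": "Avatar", "Movie_year": 2009},
    {"_id": "b", "Movie_name": "Up", "Movie_year": 2009},
    {"_id": "c", "Movie_name": "Inception", "Movie_year": 2010},
]

index = build_index(movies, by_year)  # index first ...
hits = query(index, 2009)             # ... then query
print([value for _, _, value in hits])
```

Because the rows are precomputed and kept sorted, a lookup is a binary search rather than a scan of every document.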

3. Will I Have to Rebuild My App?

Yes
By ripping out the bad parts:
•  Extract, Transform, Load
•  Schema migrations
•  JOINs that don't scale
A little more work up-front, but your application will adapt to scale much better

4. So Each of My Tables Becomes a Different Type of JSON Document?

No
•  Fancy explanation:
•  Best practice is to denormalize data instead of keeping it in 3rd normal form
•  Or, less fancy:
•  Smoosh relationships for each entry all together into one JSON doc
•  Denormalization
•  Approach to data modeling that shards well and scales well
•  Works well with data that is somewhat static, or infrequently updated
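
Denormalization can be illustrated with a small Python sketch. The three dictionaries below stand in for normalized SQL tables, with values borrowed from the sample document shown earlier in the deck. The table layout, the `denormalize_movie` helper, and the "movie:" id prefix are assumptions made for illustration.

```python
# Hypothetical normalized rows, shaped like three SQL tables.
movies = {"0499549": {"name": "Avatar", "year": 2009}}
people = {"0757855": {"name": "Zoe Saldana", "dob": "1978-06-19"}}
actors = [{"movie_id": "0499549", "person_id": "0757855"}]


def denormalize_movie(movie_id):
    """Fold a movie and its cast into one self-contained JSON-style document."""
    movie = movies[movie_id]
    cast = [
        {"person_id": row["person_id"], **people[row["person_id"]]}
        for row in actors
        if row["movie_id"] == movie_id
    ]
    return {
        "_id": "movie:" + movie_id,
        "type": "movie",
        "name": movie["name"],
        "year": movie["year"],
        "cast": cast,
    }


movie_doc = denormalize_movie("0499549")
print(movie_doc)
```

Reading the movie with its cast is now a single document fetch. The trade-off is that a change to a person's details has to be written to every document that embeds them, which is why the approach suits data that rarely changes.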

Static Data Example: TV Cast Members
http://www.sarahmei.com/blog/2013/11/11/why-you-should-never-use-mongodb/

What Doesn't Scale
•  RDBMS JOINs across shards
•  Presumably across different machines
•  Common pain point when scaling RDBMS

What Does Scale
•  Denormalized data models + modern distributed systems
•  More efficient to distribute data if it's already in one compact unit

5. But What if I Need Relationships? Can Cloudant Do JOINs?

Yes ... But First, Don't Do This
Relationships as single documents
http://www.sarahmei.com/blog/2013/11/11/why-you-should-never-use-mongodb/

Some "Key" Concepts
•  Inject logic into "_id": field to enforce uniqueness
•  Example: "_id":"<course>-<student>" ensures at most one document per course per student
•  Give your documents a "type": field
•  Add relations as separate "edge" documents
•  Exploit powerful materialized view engine
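
The "_id" uniqueness trick can be sketched with a tiny in-memory store in Python. `TinyStore`, `ConflictError`, and `enrollment_id` are hypothetical names. The store only mimics the general behavior of rejecting a second document created under an existing "_id" and is not Cloudant's client API.

```python
class ConflictError(Exception):
    """Stand-in for the conflict a document store reports on a duplicate "_id"."""


class TinyStore:
    """Minimal in-memory document store keyed by "_id" (illustration only)."""

    def __init__(self):
        self._docs = {}

    def create(self, doc):
        if doc["_id"] in self._docs:
            raise ConflictError(doc["_id"])
        self._docs[doc["_id"]] = doc

    def __len__(self):
        return len(self._docs)


def enrollment_id(course, student):
    """Build an "_id" of the form <course>-<student>."""
    return f"{course}-{student}"


store = TinyStore()
store.create({"_id": enrollment_id("math101", "alice"), "type": "enrollment"})

try:
    # Same course and student: the derived "_id" collides with the first document.
    store.create({"_id": enrollment_id("math101", "alice"), "type": "enrollment"})
    duplicate_rejected = False
except ConflictError:
    duplicate_rejected = True

print(duplicate_rejected, len(store))
```

Because the key is derived from the data, the store itself enforces "one document per course per student" without a separate uniqueness constraint.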

Preview: Defining an Index/View
•  This design document (built in Cloudant Web dashboard) encapsulates everything that follows
•  It builds our secondary index/database view, which we will soon query
•  It's the incremental MapReduce view engine we cited earlier
•  https://webinar.cloudant.com/relational/_design/join

Sample Related Data: Twitter
User documents flexible & straightforward

How Do We Deal With Followers?
a.  Update each user document with a list
b.  Create relation documents and "join"

Goal: Materialize Users & Following List
"join" by selecting rows at lines 103–105

Index Sorting Rules
http://wiki.apache.org/couchdb/View_collation
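
The sorting rules matter because range queries depend on them. Here is a simplified Python sketch of the type ordering described in the CouchDB view collation documentation (null, then false, true, numbers, strings, arrays, objects). One loud assumption: real view collation compares strings with Unicode collation rules, while this sketch uses Python's plain string comparison. `collation_key` is a hypothetical helper name.

```python
def collation_key(value):
    """Map a JSON value to a tuple that sorts in simplified view-collation order."""
    if value is None:
        return (0,)
    if value is False:
        return (1,)
    if value is True:
        return (2,)
    if isinstance(value, (int, float)):
        return (3, value)
    if isinstance(value, str):
        # Simplification: plain code-point comparison, not Unicode collation.
        return (4, value)
    if isinstance(value, list):
        # Arrays compare element by element; a prefix sorts before longer arrays.
        return (5, [collation_key(item) for item in value])
    if isinstance(value, dict):
        return (6, [(k, collation_key(v)) for k, v in value.items()])
    raise TypeError(f"not a JSON value: {value!r}")


mixed = [{}, "a", 1, None, True, False, []]
ordered = sorted(mixed, key=collation_key)

keys = [
    ["user:kocolosk", {}],
    ["user:kocolosk"],
    ["user:kocolosk", "user:alice"],
    ["user:aaa"],
]
ordered_keys = sorted(keys, key=collation_key)
print(ordered_keys)
```

Since an object sorts after any string, a key such as ["user:kocolosk", {}] sorts after every ["user:kocolosk", "<some string>"] key, which makes it a convenient upper bound for a range.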

Materialize Users, With All Followed

Let's Query That View
https://webinar.cloudant.com/relational/_design/join/_view/follows?startkey=["user:kocolosk"]&endkey=["user:kocolosk",{}]
[Figure: query result, annotated with the system-generated unique doc "_id":, the sort key, and the pointer to the related followed user's doc "_id":]
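
The startkey and endkey parameters are JSON values, so they need to be JSON-encoded and then URL-encoded before being sent. A minimal sketch using only the Python standard library follows. The base URL is the one from the slide, `view_url` is a hypothetical helper, and nothing here contacts the server.

```python
import json
from urllib.parse import urlencode, urlsplit, parse_qs

# View URL taken from the slide.
BASE = "https://webinar.cloudant.com/relational/_design/join/_view/follows"


def view_url(startkey, endkey, include_docs=False):
    """Build a view query URL with JSON-encoded, URL-escaped range keys."""
    params = {
        "startkey": json.dumps(startkey, separators=(",", ":")),
        "endkey": json.dumps(endkey, separators=(",", ":")),
    }
    if include_docs:
        params["include_docs"] = "true"
    return BASE + "?" + urlencode(params)


url = view_url(["user:kocolosk"], ["user:kocolosk", {}])
print(url)

# Decode the query string again to confirm the keys survive the round trip.
decoded = parse_qs(urlsplit(url).query)
start = json.loads(decoded["startkey"][0])
end = json.loads(decoded["endkey"][0])
```

The slide shows the keys unescaped for readability. An HTTP client would normally send the escaped form that `urlencode` produces.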

Let's Query That View, and Follow Pointers
https://webinar.cloudant.com/relational/_design/join/_view/follows?startkey=["user:kocolosk"]&endkey=["user:kocolosk",{}]&include_docs=true

Wait. What Did We Get?
•  kocolosk’s USER document
•  list of all USERs kocolosk FOLLOWS
•  full USER document for all USERs that kocolosk FOLLOWS
•  In a fast, single query
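
The whole "join" can be emulated in memory with Python. This is a sketch under stated assumptions, not the design document from the webinar: the sample documents, the "user" and "follows" field names on edge documents, and the helpers `sort_key`, `map_follows`, and `query_view` are hypothetical. It relies on the CouchDB convention that a row value of the form {"_id": ...} makes include_docs=true return the referenced document.

```python
# Hypothetical documents: users plus "edge" documents recording who follows whom.
docs = {
    "user:kocolosk": {"_id": "user:kocolosk", "type": "user", "name": "kocolosk"},
    "user:alice": {"_id": "user:alice", "type": "user", "name": "alice"},
    "user:bob": {"_id": "user:bob", "type": "user", "name": "bob"},
    "e1": {"_id": "e1", "type": "edge:follower",
           "user": "user:kocolosk", "follows": "user:alice"},
    "e2": {"_id": "e2", "type": "edge:follower",
           "user": "user:bob", "follows": "user:kocolosk"},
}


def sort_key(key):
    """Simplified collation for array keys: strings sort before objects."""
    return [(1, "") if isinstance(part, dict) else (0, part) for part in key]


def map_follows(doc):
    """Emit one row per user, and one row per follow edge pointing at the followed user."""
    if doc["type"] == "user":
        yield [doc["_id"]], None
    elif doc["type"] == "edge:follower":
        yield [doc["user"], doc["follows"]], {"_id": doc["follows"]}


def query_view(startkey, endkey, include_docs=False):
    """Build the sorted view, select the key range, and optionally follow pointers."""
    rows = [
        {"id": doc["_id"], "key": key, "value": value}
        for doc in docs.values()
        for key, value in map_follows(doc)
    ]
    rows.sort(key=lambda row: sort_key(row["key"]))
    lo, hi = sort_key(startkey), sort_key(endkey)
    selected = [row for row in rows if lo <= sort_key(row["key"]) <= hi]
    if include_docs:
        for row in selected:
            # A value holding "_id" points at another document; otherwise use the row's own.
            target = row["value"]["_id"] if row["value"] else row["id"]
            row["doc"] = docs[target]
    return selected


result = query_view(["user:kocolosk"], ["user:kocolosk", {}], include_docs=True)
names = [row["doc"]["name"] for row in result]
print(names)
```

The first row is kocolosk's own user document and the remaining rows carry the full documents of the users kocolosk follows. Users who follow kocolosk fall outside the key range and are not returned.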

Legal Slide #1
© "Apache", "CouchDB", "Apache CouchDB", "Apache Lucene," "Lucene", and the CouchDB logo are trademarks or registered
trademarks of The Apache Software Foundation. All other brands and trademarks are the property of their respective owners.

Legal Slide #2
© Copyright IBM Corporation 2015.
IBM and the IBM Cloudant logo are trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at "Copyright and trademark information" at ibm.com/legal/copytrade.shtml

Thank You
@cloudant
mbroberg@us.ibm.com
rmillay@us.ibm.com

Perceptions & Questions
Analyst:
Robin Bloor

Database is Being Disrupted
u  Data volumes
u  Speed of arrival
u  Content data (JSON)
u  IOT data
u  Cloud deployment
u  Schema on read
u  Memory for disk
u  Analytic workloads
THIS IS A PERFECT
STORM OF A KIND

What Is a Database?
A database is software that presides over a heap of data that:
•  Implements a data model
•  Manages multiple concurrent requests for data
•  Implements a security model
•  Is ACID compliant (?)
•  Is resilient

RDBMS
Databases that:
u  Assume you can represent all data in related
tables
u  Assume that you want to process data in a set-wise
manner
u  Can be used for many problems
u  Are absolutely not universal, hence:
•  The Null kluge
•  The impedance mismatch
•  BLOBS
•  OR Databases

Another Couple of Issues…
Programmers prefer JSON
The SEMANTICS of data
u  It is already beginning to look as though
graph databases are a separate category of
engine
u  The triple store tactic (representing data in
triples) is required for semantics, otherwise
meaning is limited

Data Access
In reality there is no DATA ACCESS STANDARD
There are several different approaches according to the data model

u  How much evangelizing of JSON do you find it
necessary to do?
u  How swiftly do SQL developers adjust to JSON?
u  JOINs are performance hogs in all database
systems. Please explain why you think they are
more economic with Cloudant.
u  Does Cloudant scale better than, say, a column
store SQL model?

u  Can you explain the tuning and other DBA
activities with Cloudant?
u  Is recovery the same as with RDBMS?
u  What is the database size of your largest
customer (users, data volume)?

Upcoming Topics
www.insideanalysis.com
March: BI/ANALYTICS
April: BIG DATA
May: CLOUD

THANK YOU
for your
ATTENTION!
Some images provided courtesy of
Wikimedia Commons
