Grab some coffee and enjoy the pre-show banter before the top of the hour!
The Briefing Room
Framing the Argument: How to Scale Faster with NoSQL
Twitter Tag: #briefr The Briefing Room
Welcome
Host:
Eric Kavanagh
eric.kavanagh@bloorgroup.com
@eric_kavanagh
Mission
•  Reveal the essential characteristics of enterprise software, good and bad
•  Provide a forum for detailed analysis of today's innovative technologies
•  Give vendors a chance to explain their products to savvy analysts
•  Allow audience members to pose serious questions... and get answers!
Topics
March: BI/ANALYTICS
April: BIG DATA
May: CLOUD
More Than One Way to Skin a Cat
NoSQL engines provide escape hatches
•  Force-fitting all data into the relational model will fail, because performance is ALWAYS important, now more than ever
Analyst: Robin Bloor
Robin Bloor is Chief Analyst at The Bloor Group
robin.bloor@bloorgroup.com
@robinbloor
IBM Cloudant
•  IBM Cloudant offers a non-relational, cloud-based distributed database
•  The product is based on Apache CouchDB and provides data management, search, hosting, admin tools and analytics
•  Cloudant’s database-as-a-service is often used for web or mobile application development
Guest: Ryan Millay
Ryan Millay joined IBM® Cloudant® in May 2014 after three years as a software engineer. He is now part of the Field Engineering team, working on both pre- and post-sales opportunities with a variety of accounts. He is also a member of the Cloudant Local Services team, helping customers scope and install Cloudant’s on-premises software. When not at Cloudant, Ryan enjoys travelling, playing a round of golf, or binge-watching the latest show on Netflix.
SQL to NoSQL: Top 5 Questions
Mike Broberg
Marketing Communications, Cloudant, IBM Cloud Data Services
Ryan Millay
Field Engineer, Cloudant, IBM Cloud Data Services
Agenda
•  About Cloudant
•  Top 5 Questions When Moving to NoSQL
•  Live Q&A
Housekeeping Notes
•  Today’s webcast is being recorded. We will send you a link to the recording, a link to the library and its code examples, and a copy of the slide deck after the presentation.
•  The webcast recording will be available on our website: https://cloudant.com
•  If you would like to ask a question during today’s presentation, please type it into the GoToWebinar toolbar.
1. Why NoSQL?
But, What Is NoSQL, Really?
•  Umbrella term for databases that use query languages other than SQL:
    •  Key-value stores
    •  Wide column stores
    •  Document stores
    •  Graph stores
•  Some also say "non-relational," because data is not decomposed into separate tables, rows, and columns
    •  As we’ll see, it’s still possible to represent relationships in NoSQL
    •  The question is, are these relationships always necessary?
Schema Flexibility
•  Cloudant uses JavaScript Object Notation (JSON) as its data format
•  Cloudant is based on Apache CouchDB. In both systems, a "database" is simply a collection of JSON documents
{
  "docs": [
    {
      "_id": "df8cecd9809662d08eb853989a5ca2f2",
      "_rev": "1-8522c9a1d9570566d96b7f7171623270",
      "Movie_runtime": 162,
      "Movie_rating": "PG-13",
      "Person_name": "Zoe Saldana",
      "Actor_actor_id": "0757855",
      "Movie_genre": "AVYS",
      "Movie_name": "Avatar",
      "Actor_movie_id": "0499549",
      "Movie_earnings_rank": "1",
      "Person_pob": "New Jersey, USA",
      "Person_id": "0757855",
      "Movie_id": "0499549",
      "Movie_year": 2009,
      "Person_dob": "1978-06-19"
    }
  ]
}
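To see the schema flexibility in practice, here is a minimal Python sketch that builds a request body in the `{"docs": [...]}` shape shown above, which is the shape CouchDB's `POST /{db}/_bulk_docs` endpoint accepts. It only builds the payload and never contacts a server. The second document and its fields (`type`, `stars`, `tags`) are hypothetical. Leaving out `_id` and `_rev` is deliberate: for new documents the server generates an `_id`.

```python
import json


def bulk_docs_payload(docs: list[dict]) -> str:
    """Wrap documents in the {"docs": [...]} body used by POST /{db}/_bulk_docs."""
    for doc in docs:
        if not isinstance(doc, dict):
            raise TypeError("each document must be a JSON object")
    return json.dumps({"docs": docs}, indent=2)


# A movie document with a subset of the fields from the slide.
movie = {"Movie_name": "Avatar", "Movie_year": 2009, "Movie_rating": "PG-13"}
# Hypothetical document with a completely different shape: no schema change needed.
review = {"type": "review", "movie": "Avatar", "stars": 4, "tags": ["sci-fi"]}

payload = bulk_docs_payload([movie, review])
print(payload)
```

Both documents land in the same database even though they share almost no fields.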
Horizontal Scaling
•  Many commodity servers vs. few expensive ones
•  Performance grows roughly linearly with cost, instead of each extra increment of performance costing disproportionately more
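A rough sketch of why more commodity servers means more capacity: documents are spread across a fixed number of shards by hashing their `_id`, and the shards can then be spread across however many machines you have. The sketch assumes CRC32 over the `_id`, with the 32-bit hash space split into `q` equal ranges, loosely modelled on CouchDB 2.x-style sharding. Treat it as an illustration, not as Cloudant's placement code; the document IDs are hypothetical.

```python
import zlib
from collections import Counter

HASH_SPACE = 2 ** 32  # zlib.crc32 returns an unsigned 32-bit value


def shard_for(doc_id: str, q: int) -> int:
    """Return the shard index (0..q-1) that owns doc_id.

    Assumption: the hash space is cut into q equal, contiguous ranges and the
    document goes to the range containing crc32(_id).
    """
    h = zlib.crc32(doc_id.encode("utf-8"))
    return h * q // HASH_SPACE  # h < 2**32, so the result is always < q


# Spread 10,000 hypothetical document IDs over 8 shards.
counts = Counter(shard_for(f"doc-{i}", 8) for i in range(10_000))
print(sorted(counts.items()))
```

Because placement depends only on the `_id`, any node can work out where a document lives without a central lookup.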
Master-Master Replication
•  Or "masterless replica architecture"
•  Minimize latency by putting data close to users
•  Replicate data widely to mitigate disasters
•  Cloudant excels at data movement
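Master-master replication can be pictured as two one-way replications running in opposite directions. This Python sketch builds two CouchDB-style replication documents, using the `source`, `target` and `continuous` fields understood by the `_replicator` database. The endpoints and `_id` values are hypothetical, credentials are omitted, and nothing is sent anywhere.

```python
import json


def bidirectional_replication(db_a: str, db_b: str) -> list[dict]:
    """Return two replication documents that keep db_a and db_b in sync.

    "Master-master" here means two continuous one-way replications, one in
    each direction, so either copy can accept writes.
    """
    return [
        {"_id": "a-to-b", "source": db_a, "target": db_b, "continuous": True},
        {"_id": "b-to-a", "source": db_b, "target": db_a, "continuous": True},
    ]


# Hypothetical endpoints in two regions, close to two groups of users.
docs = bidirectional_replication(
    "https://us-east.example.com/movies",
    "https://eu-west.example.com/movies",
)
print(json.dumps(docs, indent=2))
```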
2. Rows and Tables Become ... What?
... This!
SQL Terms/Concepts            Document Store Terms/Concepts
database                -->   database
table                   -->   bunch of documents
row                     -->   document
column                  -->   field
materialized view       -->   index/database view/secondary index
primary key             -->   "_id":
table JOIN operations   -->   entity relations
Rows --> Documents
•  Use some field to group documents by schema
•  Example: "type":"user" or "type":"edge:follower"
Tables --> Databases
•  Put all tables in one database; use "type": to distinguish
•  Model entity relationships with secondary indexes
•  More on this later in the webinar
•  If you're curious, we're talking about concepts described in the
CouchDB documentation on entity relations
•  http://wiki.apache.org/couchdb/EntityRelationship
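A small Python sketch of the "one database, many types" idea: documents that would have lived in separate `users` and `followers` tables sit side by side, and the `type` field does the job of the table name. The documents and field names are hypothetical.

```python
from collections import defaultdict

# Hypothetical documents that would have been rows in two SQL tables.
docs = [
    {"_id": "user:alice", "type": "user", "name": "Alice"},
    {"_id": "user:bob", "type": "user", "name": "Bob"},
    {"_id": "follow:alice:bob", "type": "edge:follower",
     "follower": "user:alice", "followee": "user:bob"},
]


def group_by_type(docs: list[dict]) -> dict[str, list[str]]:
    """Group document IDs by their "type" field, the stand-in for a table name."""
    groups: dict[str, list[str]] = defaultdict(list)
    for doc in docs:
        groups[doc.get("type", "untyped")].append(doc["_id"])
    return dict(groups)


print(group_by_type(docs))
```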
Indexes and Queries
•  An "index" in Cloudant is not strictly a performance optimization
•  Instead, more akin to "materialized view" in RDBMS terms
•  Index also called a "database view" in Cloudant
•  Index, then query.
•  You need one before you can do the other
•  Create index, then query by URL
•  Can create a secondary index on any field within a document
•  You get primary index (based on reserved "_id": field) by default
•  Indexes precomputed, updated in real time
•  Performant at big-honkin' scale
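"Index, then query" can be sketched in a few lines of Python. This is an in-memory toy, not Cloudant's view engine: a map function emits `(key, value)` rows, the rows are kept sorted by key (then document ID), and a query is a key-range lookup, much like `?startkey=...&endkey=...`. The sample movie documents are illustrative, and unlike a real view, which is updated incrementally, the toy rebuilds its index from scratch.

```python
from bisect import bisect_left, bisect_right


def build_index(docs, map_fn):
    """Run map_fn over every document and keep the emitted rows sorted.

    Rows are (key, doc_id, value), ordered by key and then doc_id.
    """
    rows = [(key, doc["_id"], value) for doc in docs for key, value in map_fn(doc)]
    rows.sort(key=lambda row: (row[0], row[1]))
    return rows


def query(index, startkey, endkey):
    """Return rows whose key lies in [startkey, endkey]."""
    keys = [row[0] for row in index]
    return index[bisect_left(keys, startkey):bisect_right(keys, endkey)]


def by_year(doc):
    # Plays the role of a JavaScript map function: emit(doc.year, doc.name)
    if doc.get("type") == "movie":
        yield doc["year"], doc["name"]


docs = [
    {"_id": "m1", "type": "movie", "name": "Avatar", "year": 2009},
    {"_id": "m2", "type": "movie", "name": "Titanic", "year": 1997},
    {"_id": "m3", "type": "movie", "name": "The Avengers", "year": 2012},
    {"_id": "u1", "type": "user", "name": "Alice"},  # ignored by the map
]

index = build_index(docs, by_year)  # index first ...
print(query(index, 2000, 2010))     # ... then query
```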
3. Will I Have to Rebuild My App?
Yes
By ripping out the bad parts:
•  Extract, Transform, Load
•  Schema migrations
•  JOINs that don't scale
A little more work up front, but your application will adapt to scale much better
4. So Each of My Tables Becomes a Different Type of JSON Document?
No
•  Fancy explanation:
    •  Best practice is to denormalize data out of 3rd normal form
•  Or, less fancy:
    •  Smoosh the relationships for each entry together into one JSON doc
•  Denormalization
    •  An approach to data modeling that shards well and scales well
    •  Works well with data that is somewhat static, or infrequently updated
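Here is a minimal Python sketch of denormalization: rows from three normalized tables (shows, actors and a cast join table) are folded into one self-contained JSON document per show. All rows and names are hypothetical.

```python
def denormalize(shows, actors, cast):
    """Fold normalized show/actor/cast rows into one JSON document per show."""
    actor_names = {a["id"]: a["name"] for a in actors}
    docs = []
    for show in shows:
        # The JOIN happens once, here, instead of at every read.
        members = [
            {"actor": actor_names[row["actor_id"]], "character": row["character"]}
            for row in cast
            if row["show_id"] == show["id"]
        ]
        docs.append({"_id": f"show:{show['id']}", "type": "show",
                     "title": show["title"], "cast": members})
    return docs


# Hypothetical rows from three normalized tables.
shows = [{"id": 1, "title": "Example Show"}]
actors = [{"id": 10, "name": "Actor A"}, {"id": 11, "name": "Actor B"}]
cast = [
    {"show_id": 1, "actor_id": 10, "character": "Captain"},
    {"show_id": 1, "actor_id": 11, "character": "Doctor"},
]

print(denormalize(shows, actors, cast))
```

The trade-off is the one the slide hints at: if an actor's name changes, every show document that embeds it must be rewritten, which is why this shape suits data that rarely changes.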
Static Data Example: TV Cast Members
http://www.sarahmei.com/blog/2013/11/11/why-you-should-never-use-mongodb/
What Doesn't Scale
•  RDBMS JOINs across shards
    •  Shards typically live on different machines
    •  A common pain point when scaling an RDBMS
What Does Scale
•  Denormalized data models + modern distributed systems
    •  More efficient to distribute data if it's already in one compact unit
5. But What if I Need Relationships? Can Cloudant Do JOINs?
Yes ... But First, Don't Do This
Relationships as single documents
http://www.sarahmei.com/blog/2013/11/11/why-you-should-never-use-mongodb/
Some "Key" Concepts
•  Inject logic into the "_id": field to enforce uniqueness
    •  Example: "_id":"<course>-<student>" ensures at most one document per course per student
•  Give your documents a "type": field
•  Add relations as separate "edge" documents
•  Exploit powerful materialized view engine
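The composite-`_id` trick can be sketched with a toy in-memory store. `ToyDatabase`, `ConflictError` and `enrollment_id` are hypothetical names; the point is that the database refuses a second document with the same `_id`, which Cloudant and CouchDB signal with an HTTP 409 Conflict.

```python
class ConflictError(Exception):
    """Raised when a document with the same _id already exists."""


class ToyDatabase:
    """Hypothetical in-memory stand-in; a real server answers 409 Conflict."""

    def __init__(self):
        self.docs: dict[str, dict] = {}

    def create(self, doc: dict) -> None:
        if doc["_id"] in self.docs:
            raise ConflictError(doc["_id"])
        self.docs[doc["_id"]] = doc


def enrollment_id(course: str, student: str) -> str:
    # Composite key: at most one enrollment document per (course, student) pair.
    return f"{course}-{student}"


db = ToyDatabase()
db.create({"_id": enrollment_id("cs101", "alice"), "type": "enrollment"})
try:
    db.create({"_id": enrollment_id("cs101", "alice"), "type": "enrollment"})
except ConflictError as err:
    print("duplicate rejected:", err)
```

One caveat: the separator must not occur inside the parts, otherwise `"cs-101"` + `"a"` and `"cs"` + `"101-a"` would collide on the same `_id`.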
Preview: Defining an Index/View
•  This design document (built in the Cloudant web dashboard) encapsulates everything that follows
•  It builds our secondary index/database view, which we will soon query
•  It's the incremental MapReduce view engine we cited earlier
•  https://webinar.cloudant.com/relational/_design/join
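The actual design document sits behind the URL above and appears in the slides only as a screenshot, so the following is a hypothetical reconstruction, not a copy. It shows the general shape of a CouchDB/Cloudant design document with a JavaScript map function, built as a Python dict. The map function relies on CouchDB's "linked documents" convention: when an emitted value is an object with an `_id`, querying with `include_docs=true` returns that document instead of the emitting one. The `follower` and `followee` field names are assumptions.

```python
import json

# Hypothetical JavaScript map function, stored as a string as CouchDB expects.
FOLLOWS_MAP = """
function (doc) {
  if (doc.type === "user") {
    emit([doc._id, 0], null);                      // the user's own row sorts first
  } else if (doc.type === "edge:follower") {
    emit([doc.follower, 1], {_id: doc.followee});  // pointer to the followed user
  }
}
""".strip()

design_doc = {
    "_id": "_design/join",
    "language": "javascript",
    "views": {"follows": {"map": FOLLOWS_MAP}},
}

print(json.dumps(design_doc, indent=2))
```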
Sample Related Data: Twitter
User documents are flexible and straightforward
How Do We Deal With Followers?
a.  Update each user document with a list
b.  Create relation documents and "join"
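To compare the two options, this Python sketch starts from option (a), a user document that embeds a `follows` list, and turns it into option (b), one small edge document per relationship. Apart from `user:kocolosk`, which the slides query, the IDs and field names are hypothetical.

```python
def to_edge_docs(user_doc: dict) -> list[dict]:
    """Turn option (a), an embedded "follows" list, into option (b) edge documents."""
    follower = user_doc["_id"]
    return [
        {
            "_id": f"follow:{follower}:{followee}",  # one edge document per pair
            "type": "edge:follower",
            "follower": follower,
            "followee": followee,
        }
        for followee in user_doc.get("follows", [])
    ]


# Option (a): the list lives inside the user document (hypothetical data).
user = {"_id": "user:kocolosk", "type": "user", "follows": ["user:alice", "user:bob"]}

for edge in to_edge_docs(user):
    print(edge)
```

The usual argument for (b): each new follow is a new, independent document, so concurrent follows do not all compete to update the same user document, and that document does not grow without bound.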
E.g., Follower Graph
Relationships as Documents
Goal: Materialize Users & Following List
"join" by selecting rows at lines 103–105
Index Sorting Rules
http://wiki.apache.org/couchdb/View_collation
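Why does `endkey=["user:kocolosk",{}]` capture every row for that user? CouchDB view collation orders JSON types as null < false < true < numbers < strings < arrays < objects, and compares arrays element by element. The Python sketch below approximates those rules. As a simplification it compares strings by code point, whereas CouchDB uses ICU (Unicode Collation Algorithm) ordering, and its object comparison is only an approximation.

```python
from functools import cmp_to_key


def _type_rank(value):
    """Rank JSON types in CouchDB view-collation order."""
    if value is None:
        return 0
    if value is False:
        return 1
    if value is True:
        return 2
    if isinstance(value, (int, float)):
        return 3
    if isinstance(value, str):
        return 4
    if isinstance(value, list):
        return 5
    if isinstance(value, dict):
        return 6
    raise TypeError(f"not a JSON value: {value!r}")


def collate(a, b) -> int:
    """Return -1, 0 or 1, comparing two JSON values roughly like a CouchDB view."""
    ra, rb = _type_rank(a), _type_rank(b)
    if ra != rb:
        return -1 if ra < rb else 1
    if ra in (3, 4):  # numbers, strings (code-point order: a simplification)
        return (a > b) - (a < b)
    if ra == 5:  # arrays: element by element, then shorter first
        for x, y in zip(a, b):
            c = collate(x, y)
            if c:
                return c
        return (len(a) > len(b)) - (len(a) < len(b))
    if ra == 6:  # objects: approximated as arrays of [key, value] pairs
        return collate([[k, v] for k, v in a.items()], [[k, v] for k, v in b.items()])
    return 0  # null, false, true are equal within their own type


def in_range(key, startkey, endkey) -> bool:
    return collate(startkey, key) <= 0 and collate(key, endkey) <= 0


keys = [["user:zed", 0], ["user:kocolosk", {}], ["user:kocolosk", 1],
        ["user:alice", 0], ["user:kocolosk", 0]]
ordered = sorted(keys, key=cmp_to_key(collate))
for key in ordered:
    print(key, in_range(key, ["user:kocolosk"], ["user:kocolosk", {}]))
```

`["user:kocolosk"]` sorts before `["user:kocolosk", 0]` because a shorter array with an equal prefix sorts first, and `{}` sorts after any number or string, so the range covers exactly the keys that start with `"user:kocolosk"`.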
Materialize Users, With All Followed
Let's Query That View
https://webinar.cloudant.com/relational/_design/join/_view/follows?startkey=["user:kocolosk"]&endkey=["user:kocolosk",{}]
[Screenshot of the query results, annotated: system-generated unique doc "_id":; sort key; pointer to the related followed user's doc "_id":]
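When calling the view from code rather than a browser, the JSON-valued parameters need URL encoding. A minimal Python sketch using only the standard library; `view_url` is a hypothetical helper, and the base URL is the one from the slide.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit


def view_url(base: str, ddoc: str, view: str, **params) -> str:
    """Build a view query URL; each parameter value is JSON-encoded, then URL-encoded."""
    query = urlencode({k: json.dumps(v, separators=(",", ":")) for k, v in params.items()})
    return f"{base}/_design/{ddoc}/_view/{view}?{query}"


url = view_url(
    "https://webinar.cloudant.com/relational", "join", "follows",
    startkey=["user:kocolosk"], endkey=["user:kocolosk", {}],
)
print(url)

# Round-trip check: decode the parameters back into JSON values.
parsed = parse_qs(urlsplit(url).query)
decoded = {k: json.loads(v[0]) for k, v in parsed.items()}
print(decoded)
```

Passing `include_docs=True` the same way yields `include_docs=true`, since every value goes through `json.dumps`.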
Let's Query That View, and Follow Pointers
https://webinar.cloudant.com/relational/_design/join/_view/follows?startkey=["user:kocolosk"]&endkey=["user:kocolosk",{}]&include_docs=true
Wait. What Did We Get?
•  kocolosk’s USER document
•  list of all USERs kocolosk FOLLOWS
•  full USER document for every USER that kocolosk FOLLOWS
•  All in a single, fast query
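To make the "single query" concrete, here is an in-memory emulation of a `follows` view queried with `include_docs=true`. It assumes a hypothetical linked-documents map: a user row keyed `[user, 0]` and one edge row per followed user keyed `[user, 1]`, whose value points at the followed user's `_id`. All documents except the `user:kocolosk` name are made up, and the code stands in for the server rather than calling it.

```python
def map_follows(doc):
    # Hypothetical map logic: the user's own row first, then one row per edge.
    if doc.get("type") == "user":
        yield (doc["_id"], 0), None
    elif doc.get("type") == "edge:follower":
        yield (doc["follower"], 1), {"_id": doc["followee"]}


def follows_with_docs(docs, user_id):
    """Emulate ?startkey=[user_id]&endkey=[user_id,{}]&include_docs=true."""
    by_id = {d["_id"]: d for d in docs}
    rows = [
        {"id": d["_id"], "key": key, "value": value}
        for d in docs
        for key, value in map_follows(d)
    ]
    rows.sort(key=lambda r: (r["key"], r["id"]))  # key order, then doc ID
    result = []
    for row in rows:
        if row["key"][0] != user_id:  # stands in for the startkey/endkey range
            continue
        value = row["value"]
        # Linked-document rule: a value with an "_id" pulls in that document.
        target = value["_id"] if isinstance(value, dict) and "_id" in value else row["id"]
        result.append({**row, "doc": by_id.get(target)})
    return result


docs = [
    {"_id": "user:kocolosk", "type": "user", "name": "kocolosk"},
    {"_id": "user:alice", "type": "user", "name": "Alice"},
    {"_id": "user:bob", "type": "user", "name": "Bob"},
    {"_id": "follow:kocolosk:alice", "type": "edge:follower",
     "follower": "user:kocolosk", "followee": "user:alice"},
    {"_id": "follow:kocolosk:bob", "type": "edge:follower",
     "follower": "user:kocolosk", "followee": "user:bob"},
    {"_id": "follow:alice:bob", "type": "edge:follower",
     "follower": "user:alice", "followee": "user:bob"},
]

for row in follows_with_docs(docs, "user:kocolosk"):
    print(row["key"], row["doc"]["name"])
```

The first row carries kocolosk's own user document; every following row carries the full document of a followed user, matching the three results listed above.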
Legal Slide #1
© "Apache", "CouchDB", "Apache CouchDB", "Apache Lucene", "Lucene", and the CouchDB logo are trademarks or registered trademarks of The Apache Software Foundation. All other brands and trademarks are the property of their respective owners.
Legal Slide #2
© Copyright IBM Corporation 2015.
IBM and the IBM Cloudant logo are trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at "Copyright and trademark information" at ibm.com/legal/copytrade.shtml
Thank You
@cloudant
mbroberg@us.ibm.com
rmillay@us.ibm.com
Perceptions & Questions
Analyst: Robin Bloor
Robin Bloor, PhD
Database is Being Disrupted
•  Data volumes
•  Speed of arrival
•  Content data (JSON)
•  IoT data
•  Cloud deployment
•  Schema on read
•  Memory for disk
•  Analytic workloads
THIS IS A PERFECT STORM OF A KIND
What Is a Database?
A database is software that presides over a heap of data that:
•  Implements a data model
•  Manages multiple concurrent requests for data
•  Implements a security model
•  Is ACID compliant (?)
•  Is resilient
RDBMS
Databases that:
•  Assume you can represent all data in related tables
•  Assume that you want to process data in a set-wise manner
•  Can be used for many problems
•  Are absolutely not universal, hence:
    •  The Null kluge
    •  The impedance mismatch
    •  BLOBs
    •  Object-relational (OR) databases
Another Couple of Issues…
Programmers prefer JSON
The SEMANTICS of data
•  It is already beginning to look as though graph databases are a separate category of engine
•  The triple store tactic (representing data in triples) is required for semantics; otherwise meaning is limited
Data Access
In reality there is no DATA ACCESS STANDARD
There are several different approaches according to the data model
•  How much evangelizing of JSON do you find it necessary to do?
•  How swiftly do SQL developers adjust to JSON?
•  JOINs are performance hogs in all database systems. Please explain why you think they are more economical with Cloudant.
•  Does Cloudant scale better than, say, a column-store SQL model?
•  Can you explain the tuning and other DBA activities with Cloudant?
•  Is recovery the same as with an RDBMS?
•  What is the database size of your largest customer (users, data volume)?
Upcoming Topics
www.insideanalysis.com
March: BI/ANALYTICS
April: BIG DATA
May: CLOUD
THANK YOU for your ATTENTION!
Some images provided courtesy of Wikimedia Commons
