The Briefing Room with Dr. Robin Bloor and IBM Cloudant
Live Webcast March 24, 2015
Watch the Archive: https://bloorgroup.webex.com/bloorgroup/onstage/g.php?MTID=e8bf62408d47e76c43aa73be08377e41c
Context matters. Perspective matters. Thinking outside the box? That's often the key! While the Structured Query Language remains the lingua franca of data, there are some views of the world that are best rendered with the benefit of NoSQL engines. As usual, that's easier said than done. How can your organization migrate from a structured query language to an unstructured or semi-structured one?
Register for this episode of The Briefing Room to find out! Veteran Analyst Dr. Robin Bloor will provide a detailed assessment of serious considerations when using NoSQL engines in conjunction with SQL. He'll be briefed by Ryan Millay of IBM Cloudant, who will showcase his company's solution, and how it's addressing the more vexing challenges facing today's information managers.
Visit InsideAnalysis.com for more information.
3. Twitter Tag: #briefr The Briefing Room
Welcome
Host:
Eric Kavanagh
eric.kavanagh@bloorgroup.com
@eric_kavanagh
4. Mission
• Reveal the essential characteristics of enterprise software, good and bad
• Provide a forum for detailed analysis of today's innovative technologies
• Give vendors a chance to explain their product to savvy analysts
• Allow audience members to pose serious questions... and get answers!
5. Topics
March: BI/ANALYTICS
April: BIG DATA
May: CLOUD
6. More Than One Way to Skin a Cat
NoSQL engines provide escape hatches
Force-fitting all data into relational will fail, because:
Performance is ALWAYS important, now more than ever
7. Analyst: Robin Bloor
Robin Bloor is Chief Analyst at The Bloor Group
robin.bloor@bloorgroup.com
@robinbloor
8. IBM Cloudant
• IBM Cloudant offers a non-relational, cloud-based distributed database
• The product is based on Apache CouchDB and provides data management, search, hosting, admin tools and analytics
• Cloudant’s database-as-a-service is often used for web or mobile application development
9. Guest: Ryan Millay
Ryan Millay started with IBM® Cloudant® in May 2014 after three years as a software engineer. Now he is part of the Field Engineering team working on both pre- and post-sales opportunities with a variety of different accounts. He is also a member of the Cloudant Local Services team, helping customers scope and install Cloudant’s on-premises software. When not at Cloudant, Ryan enjoys travelling, playing a round of golf, or binging on the latest show on Netflix.
10. SQL to NoSQL: Top 5 Questions
Mike Broberg
Marketing Communications, Cloudant, IBM Cloud Data Services
Ryan Millay
Field Engineer, Cloudant, IBM Cloud Data Services
12. Housekeeping Notes
• Today’s webcast is being recorded. We will send you a link to the recording, a link to the library and its code examples, and a copy of the slide deck after the presentation.
• The webcast recording will be available on our website: https://cloudant.com
• If you would like to ask a question during today’s presentation, please type in your question using the GoToWebinar tool bar.
14. But, What Is NoSQL, Really?
• Umbrella term for databases using non-SQL query languages
• Key-Value stores
• Wide column stores
• Document stores
• Graph stores
• Some also say "non-relational," because data is not decomposed into separate tables, rows, and columns
• As we’ll see, it’s still possible to represent relationships in NoSQL
• The question is, are these relationships always necessary?
15. Schema Flexibility
• Cloudant uses JavaScript Object Notation (JSON) as its data format
• Cloudant is based on Apache CouchDB. In both systems, a "database" is simply a collection of JSON documents
{
"docs": [
{
"_id": "df8cecd9809662d08eb853989a5ca2f2",
"_rev": "1-8522c9a1d9570566d96b7f7171623270",
"Movie_runtime": 162,
"Movie_rating": "PG-13",
"Person_name": "Zoe Saldana",
"Actor_actor_id": "0757855",
"Movie_genre": "AVYS",
"Movie_name": "Avatar",
"Actor_movie_id": "0499549",
"Movie_earnings_rank": "1",
"Person_pob": "New Jersey, USA",
"Person_id": "0757855",
"Movie_id": "0499549",
"Movie_year": 2009,
"Person_dob": "1978-06-19"
}
]
}
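As a rough illustration of schema flexibility, the sketch below builds a payload in the same {"docs": [...]} envelope shown above and mixes two documents with different fields. The second document and the helper name make_bulk_payload are made up for illustration, and nothing here talks to a server.

```python
import json


def make_bulk_payload(docs):
    """Wrap a list of dicts in the {"docs": [...]} envelope shown on the slide."""
    return {"docs": list(docs)}


# A trimmed copy of the movie credit document from the slide.
movie_credit = {
    "Movie_name": "Avatar",
    "Movie_year": 2009,
    "Person_name": "Zoe Saldana",
}

# Hypothetical second document with a completely different set of fields.
user_profile = {"type": "user", "name": "example-user", "tags": ["sci-fi"]}

# Both documents can sit in the same "database": no shared schema is declared.
payload = make_bulk_payload([movie_credit, user_profile])
body = json.dumps(payload, indent=2)
print(body)
```

No table definition or migration step was needed to add the second document shape, which is the point the slide is making.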
16. Horizontal Scaling
• Many commodity servers vs. few expensive ones
• Performance improves linearly with cost, not exponentially
Master-Master Replication
• Or "masterless replica architecture"
• Minimize latency by putting data close to users
• Replicate data widely to mitigate disasters
• Cloudant excels at data movement
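One way to picture a masterless setup is as a pair of replication jobs, one in each direction. The sketch below only builds the JSON bodies; the "source", "target" and "continuous" fields follow the request format of CouchDB's _replicate endpoint, while the two database URLs and the helper names are placeholders, not real Cloudant accounts.

```python
import json


def replication_job(source, target, continuous=True):
    """Build one replication request body (source -> target)."""
    return {"source": source, "target": target, "continuous": continuous}


def bidirectional(db_a, db_b):
    """Two one-way jobs, one per direction, give a master-master pair."""
    return [replication_job(db_a, db_b), replication_job(db_b, db_a)]


# Hypothetical database URLs in two regions.
EAST = "https://east.example.com/appdata"
WEST = "https://west.example.com/appdata"

jobs = bidirectional(EAST, WEST)
for job in jobs:
    print(json.dumps(job))
```

Each body would be sent as a separate request; writes accepted at either end then flow to the other.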
18. ... This!
SQL Terms/Concepts --> Document Store Terms/Concepts
database --> database
table --> bunch of documents
row --> document
column --> field
materialized view --> index/database view/secondary index
primary key --> "_id":
table JOIN operations --> entity relations
19. Rows --> Documents
• Use some field to group documents by schema
• Example: "type":"user" or "type":"edge:follower"
Tables --> Databases
• Put all tables in one database; use "type": to distinguish
• Model entity relationships with secondary indexes
• More on this later in the webinar
• If you're curious, we're talking about concepts described in the CouchDB documentation on entity relations
• http://wiki.apache.org/couchdb/EntityRelationship
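A minimal sketch of the rows-to-documents idea, assuming two relational tables (users and followers) whose column layout is invented here. Each row becomes a document carrying a "type" field, and all documents land in a single list standing in for one database.

```python
def user_row_to_doc(row):
    """Turn a (user_id, display_name) row into a "user" document."""
    user_id, name = row
    return {"_id": f"user:{user_id}", "type": "user", "name": name}


def follower_row_to_doc(row):
    """Turn a (follower_id, followed_id) row into an edge document."""
    follower, followed = row
    return {
        "type": "edge:follower",
        "follower": f"user:{follower}",
        "followed": f"user:{followed}",
    }


# Hypothetical table contents.
users_table = [("kocolosk", "User K"), ("alice", "Alice")]
followers_table = [("kocolosk", "alice")]

# One "database" holds every former table; "type" tells the documents apart.
database = [user_row_to_doc(r) for r in users_table]
database += [follower_row_to_doc(r) for r in followers_table]

by_type = {}
for doc in database:
    by_type.setdefault(doc["type"], []).append(doc)

print({t: len(docs) for t, docs in by_type.items()})
```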
20. Indexes and Queries
• An "index" in Cloudant is not strictly a performance optimization
• Instead, more akin to "materialized view" in RDBMS terms
• Index also called a "database view" in Cloudant
• Index, then query.
• You need one before you can do the other
• Create index, then query by URL
• Can create a secondary index on any field within a document
• You get primary index (based on reserved "_id": field) by default
• Indexes precomputed, updated in real time
• Performant at big-honkin' scale
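The "index, then query" order can be sketched as follows. The design document uses the CouchDB layout ("_id" starting with "_design/", a "views" object, and a JavaScript map function stored as a string); the view name by_type, the account host, and the database name are assumptions made for this example. Nothing is sent over the network.

```python
import json
from urllib.parse import urlencode

# Step 1: define the index. The map function is JavaScript stored as a string.
design_doc = {
    "_id": "_design/example",
    "views": {
        "by_type": {
            "map": "function (doc) { if (doc.type) { emit(doc.type, null); } }"
        }
    },
}


def view_url(base, db, ddoc, view, **params):
    """Step 2: build the query URL; view parameters are JSON-encoded values."""
    query = urlencode({k: json.dumps(v) for k, v in params.items()})
    return f"{base}/{db}/_design/{ddoc}/_view/{view}?{query}"


# Hypothetical host and database name.
url = view_url("https://account.example.com", "mydb", "example", "by_type", key="user")
print(url)
```

The design document has to be saved to the database before the URL returns anything, which is the ordering the slide stresses.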
22. Yes
By ripping out the bad parts:
• Extract, Transform, Load
• Schema migrations
• JOINs that don't scale
A little more work up-front, but your application will adapt to scale much better
23. 4. So Each of My Tables Becomes a Different Type of JSON Document?
24. No
• Fancy explanation:
• Best practice is to denormalize data that a relational design would keep in 3rd normal form
• Or, less fancy:
• Smoosh relationships for each entry all together into one JSON doc
• Denormalization
• Approach to data modeling that shards well and scales well
• Works well with data that is somewhat static, or infrequently updated
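To make "smooshing" concrete, the sketch below folds three normalized tables into one document per acting credit, reusing field names and values from the Avatar document shown earlier in the deck. The three-table layout and the denormalize helper are assumptions for illustration; the original relational schema is not shown in the slides.

```python
# Normalized inputs: a movie table, a person table, and a link table.
movies = {
    "0499549": {"Movie_name": "Avatar", "Movie_year": 2009, "Movie_rating": "PG-13"},
}
people = {
    "0757855": {"Person_name": "Zoe Saldana", "Person_dob": "1978-06-19"},
}
actors = [{"Actor_actor_id": "0757855", "Actor_movie_id": "0499549"}]


def denormalize(actors, movies, people):
    """Produce one self-contained document per row of the link table."""
    docs = []
    for link in actors:
        doc = dict(link)
        doc["Movie_id"] = link["Actor_movie_id"]
        doc["Person_id"] = link["Actor_actor_id"]
        # Copy the related rows in, so no JOIN is needed at read time.
        doc.update(movies[link["Actor_movie_id"]])
        doc.update(people[link["Actor_actor_id"]])
        docs.append(doc)
    return docs


credit_docs = denormalize(actors, movies, people)
print(credit_docs[0]["Person_name"], "in", credit_docs[0]["Movie_name"])
```

The trade-off is duplication: the movie fields are repeated in every credit document, which is why the slide recommends this for data that rarely changes.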
25. Static Data Example: TV Cast Members
http://www.sarahmei.com/blog/2013/11/11/why-you-should-never-use-mongodb/
26. What Doesn't Scale
• RDBMS JOINs across shards
• Presumably across different machines
• Common pain point when scaling RDBMS
What Does Scale
• Denormalized data models + modern distributed systems
• More efficient to distribute data if it's already in one compact unit
27. 5. But What if I Need Relationships? Can Cloudant Do JOINs?
28. Yes ... But First, Don't Do This
Relationships as single documents
http://www.sarahmei.com/blog/2013/11/11/why-you-should-never-use-mongodb/
29. Some "Key" Concepts
• Inject logic into "_id": field to enforce uniqueness
• Example: "_id":"<course>-<student>" ensures at most one document per course per student
• Give your documents a "type": field
• Add relations as separate "edge" documents
• Exploit powerful materialized view engine
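The "_id" trick can be sketched with a small in-memory stand-in for a database. TinyStore and ConflictError are hypothetical names; the class only imitates the behaviour that creating a document under an "_id" that already exists is rejected, which is what makes the composite key enforce uniqueness.

```python
class ConflictError(Exception):
    """Raised when a document with the same "_id" already exists."""


class TinyStore:
    """In-memory stand-in for a document database keyed by "_id"."""

    def __init__(self):
        self._docs = {}

    def create(self, doc):
        if doc["_id"] in self._docs:
            raise ConflictError(doc["_id"])
        self._docs[doc["_id"]] = doc

    def __len__(self):
        return len(self._docs)


def enrollment_doc(course, student):
    """Build the "<course>-<student>" id described on the slide."""
    return {
        "_id": f"{course}-{student}",
        "type": "enrollment",
        "course": course,
        "student": student,
    }


store = TinyStore()
store.create(enrollment_doc("math101", "alice"))

try:
    store.create(enrollment_doc("math101", "alice"))  # same course, same student
    duplicate_rejected = False
except ConflictError:
    duplicate_rejected = True

print(len(store), duplicate_rejected)
```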
30. Preview: Defining an Index/View
• This design document (built in Cloudant Web dashboard) encapsulates everything that follows
• It builds our secondary index/database view, which we will soon query
• It's the incremental MapReduce view engine we cited earlier
• https://webinar.cloudant.com/relational/_design/join
39. Let's Query That View
https://webinar.cloudant.com/relational/_design/join/_view/follows?startkey=["user:kocolosk"]&endkey=["user:kocolosk",{}]
Each result row contains: the system-generated unique doc "_id":, the sort key, and a pointer to the related followed user's doc "_id":
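The query string above is shown unencoded for readability. A minimal sketch of building it safely from Python follows: the startkey and endkey values are JSON arrays, so they are serialized with json.dumps and then percent-encoded. The helper name range_query_url is made up; the base URL is the one from the slide, and no request is actually sent.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit

BASE = "https://webinar.cloudant.com/relational/_design/join/_view/follows"


def range_query_url(base, user_id, include_docs=False):
    """Build a view URL selecting every row whose key starts with user_id."""
    params = {
        # [user_id] sorts before every key of the form [user_id, ...].
        "startkey": json.dumps([user_id], separators=(",", ":")),
        # {} is used as a high sentinel in the second key position.
        "endkey": json.dumps([user_id, {}], separators=(",", ":")),
    }
    if include_docs:
        params["include_docs"] = "true"
    return f"{base}?{urlencode(params)}"


url = range_query_url(BASE, "user:kocolosk")
print(url)

# Decoding the query string recovers the JSON keys from the slide.
decoded = parse_qs(urlsplit(url).query)
print(decoded["startkey"][0], decoded["endkey"][0])
```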
40. Let's Query That View, and Follow Pointers
https://webinar.cloudant.com/relational/_design/join/_view/follows?startkey=["user:kocolosk"]&endkey=["user:kocolosk",{}]&include_docs=true
41. Wait. What Did We Get?
• kocolosk’s USER document
• list of all USERs kocolosk FOLLOWS
• full USER document for all USERs that kocolosk FOLLOWS
• In a fast, single query
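The slides that define the view are not part of this section, so the following is only a guess at how such a result could be produced, simulated in memory. It assumes user documents emit a key of [user id, 0] and follower edges emit [follower id, 1] with a value pointing at the followed user's "_id". With include_docs, a row whose value carries an "_id" returns that linked document instead of the emitting one. The prefix match stands in for the startkey/endkey range.

```python
docs = [
    {"_id": "user:kocolosk", "type": "user"},
    {"_id": "user:alice", "type": "user"},
    {"_id": "user:bob", "type": "user"},
    {"_id": "e1", "type": "edge:follower",
     "follower": "user:kocolosk", "followed": "user:alice"},
    {"_id": "e2", "type": "edge:follower",
     "follower": "user:bob", "followed": "user:kocolosk"},
]


def map_follows(doc):
    """Assumed map step: yield (key, value) pairs, like emit(key, value)."""
    if doc["type"] == "user":
        yield [doc["_id"], 0], None
    elif doc["type"] == "edge:follower":
        yield [doc["follower"], 1], {"_id": doc["followed"]}


def build_index(docs):
    """Precompute the sorted rows, as a view index would."""
    rows = [
        {"id": doc["_id"], "key": key, "value": value}
        for doc in docs
        for key, value in map_follows(doc)
    ]
    rows.sort(key=lambda r: (r["key"], r["id"]))
    return rows


def query(rows, user_id, by_id, include_docs=False):
    """Return rows whose key starts with user_id, optionally following pointers."""
    out = []
    for row in rows:
        if row["key"][0] != user_id:
            continue
        row = dict(row)
        if include_docs:
            target = (row["value"] or {}).get("_id", row["id"])
            row["doc"] = by_id[target]
        out.append(row)
    return out


by_id = {d["_id"]: d for d in docs}
result = query(build_index(docs), "user:kocolosk", by_id, include_docs=True)
for row in result:
    print(row["key"], "->", row["doc"]["_id"])
```

The first row carries kocolosk's own user document and the remaining rows carry the full documents of the users kocolosk follows, all from one pass over a presorted index.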
47. Database is Being Disrupted
• Data volumes
• Speed of arrival
• Content data (JSON)
• IOT data
• Cloud deployment
• Schema on read
• Memory for disk
• Analytic workloads
THIS IS A PERFECT STORM OF A KIND
48. What Is a Database?
A database is software that presides over a heap of data that:
• Implements a data model
• Manages multiple concurrent requests for data
• Implements a security model
• Is ACID compliant (?)
• Is resilient
49. RDBMS
Databases that:
• Assume you can represent all data in related tables
• Assume that you want to process data in a set-wise manner
• Can be used for many problems
• Are absolutely not universal, hence:
  • The Null kluge
  • The impedance mismatch
  • BLOBS
  • OR Databases
50. Another Couple of Issues…
Programmers prefer JSON
The SEMANTICS of data
• It is already beginning to look as though graph databases are a separate category of engine
• The triple store tactic (representing data in triples) is required for semantics, otherwise meaning is limited
51. Data Access
In reality there is no DATA ACCESS STANDARD
There are several different approaches according to the data model
52.
• How much evangelizing of JSON do you find it necessary to do?
• How swiftly do SQL developers adjust to JSON?
• JOINs are performance hogs in all database systems. Please explain why you think they are more economical with Cloudant.
• Does Cloudant scale better than, say, a column store SQL model?
53.
• Can you explain the tuning and other DBA activities with Cloudant?
• Is recovery the same as with RDBMS?
• What is the database size of your largest customer (users, data volume)?