Hear Ryan Millay, IBM Cloudant software development manager, discuss what you need to consider when moving from the world of relational databases to a NoSQL document store.
You'll learn about the key differences between relational databases and JSON document stores like Cloudant, as well as how to dodge the pitfalls of migrating from a relational database to NoSQL.
SQL to NoSQL: Top 6 Questions
1. SQL to NoSQL: Top 6 Questions
Mike Broberg
Marketing Communications, Cloudant, IBM Cloud Data Services
Ryan Millay
Field Engineer, Cloudant, IBM Cloud Data Services
2. Agenda
• Top 6 Questions When Moving to NoSQL
1. Why NoSQL?
a. What Is Cloudant?
2. Rows and Tables Become ... What?
3. Will I Have to Rebuild My App?
4. Each of My Tables Becomes a Different Type of JSON Document?
5. What if I Need Relationships? Can Cloudant Do JOINs?
6. Are There Tools That Make Migrating My Data to Cloudant Easier?
• Live Q&A
3. Housekeeping Notes
• Today’s webcast is being recorded. We
will send you a link to the recording, a link
to the library and its code examples, and
a copy of the slide deck after the
presentation.
• The webcast recording will be available
on our website: https://cloudant.com
• If you would like to ask a question during
today’s presentation, please type in your
question using the GoToWebinar tool bar.
5. But, What Is NoSQL, Really?
• Umbrella term for databases using non-SQL query languages
• Key-Value stores
• Wide column stores
• Document stores
• Graph stores
• Some also say "non-relational," because data is not decomposed
into separate tables, rows, and columns
• As we’ll see, it’s still possible to represent relationships in NoSQL
• The question is, are these relationships always necessary?
6. Today's NoSQL Focus: Document Stores
• That's databases like MongoDB, Apache CouchDB™, Cloudant,
and MarkLogic
• Optimized for "semi-structured" or "schema-optional" data
• People say "unstructured," but that's inaccurate
• Each document has its own structure
7. Schema Flexibility
• Cloudant uses JavaScript Object Notation (JSON) as its data format
• Cloudant is based on Apache CouchDB. In both systems, a "database" is simply
a collection of JSON documents
{
  "docs": [
    {
      "_id": "df8cecd9809662d08eb853989a5ca2f2",
      "_rev": "1-8522c9a1d9570566d96b7f7171623270",
      "Movie_runtime": 162,
      "Movie_rating": "PG-13",
      "Person_name": "Zoe Saldana",
      "Actor_actor_id": "0757855",
      "Movie_genre": "AVYS",
      "Movie_name": "Avatar",
      "Actor_movie_id": "0499549",
      "Movie_earnings_rank": "1",
      "Person_pob": "New Jersey, USA",
      "Person_id": "0757855",
      "Movie_id": "0499549",
      "Movie_year": 2009,
      "Person_dob": "1978-06-19"
    }
  ]
}
8. Horizontal Scaling
• Many commodity servers vs. few expensive ones
• Cost grows linearly with performance, not exponentially
9. Master-Master Replication
Or "masterless replica architecture"
• Replicate data widely to mitigate disasters
• No single point of failure
• Minimize latency by putting data close to users
• Cloudant excels at data movement
10. The Cloudant Data Layer
• Distributed NoSQL data persistence
layer
• Available as a fully-managed DBaaS,
or managed by you on-premises
• Transactional JSON document
database with REST API
• Spreads data across data centers &
devices for scale & high availability
• Ideal for apps that require:
• Massive, elastic scalability
• High availability
• Geo-location services
• Full-text search
• Offline-first design for occasionally
connected users
11. Not One DB Server; a Cluster of Servers
• A Cloudant cluster
• Horizontal scale
• Redundant load balancers
backed by multiple DB servers
• Designed for durability
• Saves multiple copies of data
• Spreads copies across cluster
• All replicas do reads & writes
• Access Cloudant over the Web
• Developers get an API
• Cloudant manages it all
behind the scenes
[Diagram: load balancers lb1 and lb2 (failover) in front of database nodes db1, db2, and db3. Labels: HAProxy, NGINX, Cloudant Dashboard]
12. Bringing OSS and Custom Technology Together
• Operational Tooling: Reshard / Rebalance
• Monitoring: Built-in monitoring and system collection
• CouchDB 2.0: JSON storage, API, Replication
• Lucene: Text indexing & Search
• HAProxy: Load Balancing
• GeoJSON: Geospatial indexing & query
• Cloudant Query: Declarative Lang.
• Diagnostics: Tooling for diagnosing common issues with clusters
Using Apache CouchDB™ 2.0 as one of the core components
and wrapping additional features and operational expertise
14. ... This!
SQL Terms/Concepts --> Document Store Terms/Concepts
database --> database
table --> bunch of documents
row --> document
column --> field
materialized view --> index/database view/secondary index
primary key --> "_id":
table JOIN operations --> entity relations
15. Rows --> Documents
• Use some field to group documents by schema
• Example: "type":"user" or "type":"edge:follower"
• Don't worry. We'll return to this example later on
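A minimal sketch of the row-to-document idea, in Python. The table name, column names, and the row_to_doc helper are hypothetical; only the "type" convention comes from the slide.

```python
def row_to_doc(table, row, key_column):
    """Turn one relational row (a dict of column -> value) into a JSON-style document."""
    # Copy every column except the primary key into the document body.
    doc = {column: value for column, value in row.items() if column != key_column}
    # Assumption: prefix the primary key with the table name to form the "_id".
    doc["_id"] = f"{table}:{row[key_column]}"
    # The "type" field groups documents that share a schema.
    doc["type"] = table
    return doc


user_row = {"id": 42, "name": "Ada", "email": "ada@example.com"}  # made-up row
doc = row_to_doc("user", user_row, "id")
print(doc)
```

Every row from the same table ends up with the same "type" value, which is what a secondary index can later filter on.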
16. Tables --> Databases
• Put all tables in one database; use "type": to distinguish
• Model entity relationships with secondary indexes
• More on this later in the webinar
• Can't wait? We're talking about concepts described in the CouchDB
documentation on entity relationships
• http://wiki.apache.org/couchdb/EntityRelationship
17. Indexes and Queries
• An "index" in Cloudant is not strictly a performance optimization
• Instead, more akin to "materialized view" in RDBMS terms
• Index also called a "database view" in Cloudant
• Index, then query
• You need one before you can do the other
• Create index, then query by URL
• Can create a secondary index on any field within a document
• You get primary index (based on reserved "_id": field) by default
• Indexes precomputed, updated in real time
• Indexes are updated using incremental MapReduce
• You don't need to rebuild the entire index every time a document is changed,
added, or deleted
• Performant at big-honkin' scale
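To make "index, then query" concrete, here is a toy Python sketch of a sorted secondary index that is updated one document at a time. It is an illustration only, not Cloudant's view engine: the SecondaryIndex class, the by_year map function, and the second sample document are all made up for this example.

```python
import bisect
from itertools import takewhile


class SecondaryIndex:
    """Toy sorted index built from a map function (illustration only)."""

    def __init__(self, map_fn):
        self.map_fn = map_fn   # doc -> iterable of keys
        self.rows = []         # sorted list of (key, doc_id)
        self.by_doc = {}       # doc_id -> rows that document contributed

    def remove(self, doc_id):
        """Drop only the rows that came from one document."""
        for row in self.by_doc.pop(doc_id, []):
            self.rows.remove(row)

    def update(self, doc):
        """Re-index a single added or changed document; no full rebuild."""
        self.remove(doc["_id"])
        new_rows = [(key, doc["_id"]) for key in self.map_fn(doc)]
        for row in new_rows:
            bisect.insort(self.rows, row)
        self.by_doc[doc["_id"]] = new_rows

    def query(self, startkey, endkey):
        """Return rows with startkey <= key <= endkey, in key order."""
        lo = bisect.bisect_left(self.rows, (startkey,))
        return list(takewhile(lambda row: row[0] <= endkey, self.rows[lo:]))


def by_year(doc):
    # Only documents that carry the field contribute a row.
    if "Movie_year" in doc:
        yield doc["Movie_year"]


index = SecondaryIndex(by_year)
index.update({"_id": "a", "Movie_name": "Avatar", "Movie_year": 2009})
index.update({"_id": "b", "Movie_name": "Example Film", "Movie_year": 1990})
index.update({"_id": "c", "Person_name": "Zoe Saldana"})  # no Movie_year, so no row
first = index.query(2000, 2010)

# Changing one document touches only that document's rows.
index.update({"_id": "b", "Movie_name": "Example Film", "Movie_year": 2005})
second = index.query(2000, 2010)
print(first, second)
```

The point of the sketch is the shape of the workflow: the index exists before any query runs, and a document change only rewrites the rows that document produced.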
18. Aside: One Cloudant DB, Many Indexes
• Cloudant comes with several different indexing & query systems
• Cloudant Query: declarative query system
• Borrows syntax from MongoDB, but applied to Cloudant's REST API
• Incremental MapReduce view engine: traditional CouchDB approach
• Efficient range queries at large scale. Useful for aggregate functions/light
analytics on operational data
• Cloudant Search: full-text indexing via Apache Lucene™
• Cloudant Geospatial: proprietary tech for GeoJSON spec
• Beyond bounding box with custom polygons, predictive path, etc.
• All out-of-the-box in Cloudant. No added integration or separate
systems to maintain
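As a rough illustration of the declarative option, the Python sketch below builds the two JSON bodies a Cloudant Query round trip would need: one to define an index and one to run a selector against it. The account URL and index name are placeholders, the field names are borrowed from the sample movie document, and nothing is sent over the network. The _index and _find endpoint names reflect the CouchDB/Cloudant API as I understand it, so check the current documentation before relying on them.

```python
import json

DB_URL = "https://ACCOUNT.cloudant.com/movies"  # placeholder account and database

# Step 1: define the index (body for POST {DB_URL}/_index).
index_request = {
    "index": {"fields": ["Movie_year"]},
    "name": "movie-year-index",  # hypothetical index name
    "type": "json",
}

# Step 2: query it (body for POST {DB_URL}/_find).
find_request = {
    "selector": {"Movie_year": {"$gt": 2000}},      # MongoDB-style operator
    "fields": ["_id", "Movie_name", "Movie_year"],  # return only these fields
    "sort": [{"Movie_year": "asc"}],
}

index_body = json.dumps(index_request)
find_body = json.dumps(find_request)
print(index_body)
print(find_body)
```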
20. Yes
By ripping out the bad parts:
• Extract, Transform, Load
• Schema migrations
• JOINs that don't scale
21. Scale Whale
• A little more work up-front, but your application will adapt to scale
much better
22. 4. Each of My Tables Becomes a Different Type of JSON Document?
23. No
• Fancy explanation:
• Best practice is to denormalize data that would otherwise
sit in 3rd normal form
• Or, less fancy:
• Smoosh relationships for each
entry all together into one JSON
doc
• Denormalization
• Approach to data modeling that
shards well and scales well
• Works well with data that is
somewhat static, or infrequently
updated
A smooshed and griddled cheese sandwich
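A small Python sketch of the "smoosh" step, assuming three hypothetical relational tables (movies, people, and an actors join table) whose values echo the sample movie document. The table layout and the denormalize helper are illustrative, not a prescribed migration method.

```python
# Hypothetical normalized tables, as lists of rows.
movies = [{"movie_id": "0499549", "name": "Avatar", "year": 2009}]
people = [{"person_id": "0757855", "name": "Zoe Saldana"}]
actors = [{"movie_id": "0499549", "actor_id": "0757855"}]  # join table


def denormalize(movies, people, actors):
    """Fold each movie's cast into the movie document itself."""
    people_by_id = {person["person_id"]: person for person in people}
    docs = []
    for movie in movies:
        cast = [
            people_by_id[link["actor_id"]]["name"]
            for link in actors
            if link["movie_id"] == movie["movie_id"]
        ]
        docs.append({
            "_id": f"movie:{movie['movie_id']}",
            "type": "movie",
            "name": movie["name"],
            "year": movie["year"],
            "cast": cast,  # relationship data embedded, no JOIN at read time
        })
    return docs


docs = denormalize(movies, people, actors)
print(docs)
```

The trade-off is the one the slide names: a cast list rarely changes, so duplicating names inside the movie document costs little, while frequently updated data would have to be rewritten in every document that embeds it.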
24. Static Data Example: TV Cast Members
http://www.sarahmei.com/blog/2013/11/11/why-you-should-never-use-mongodb/
25. 5. What if I Need Relationships? Can Cloudant Do JOINs?
29. Some "Key" Concepts
• Inject logic into "_id": field to enforce uniqueness
• Example: "_id":"<course>-<student>" ensures at most one
document per course per student
• Give your documents a "type": field
• Add relations as separate "edge" documents
• Exploit powerful materialized view engine
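The first three ideas can be sketched in Python with an in-memory stand-in for a database. ToyDatabase, ConflictError, the helper functions, the "from"/"to" field names, and the user "user:example" are all hypothetical. The duplicate check mimics how a document store rejects a second document with an existing "_id".

```python
class ConflictError(Exception):
    """Raised when a document with the same _id already exists."""


class ToyDatabase:
    """In-memory stand-in for a database: one dict keyed by _id."""

    def __init__(self):
        self.docs = {}

    def insert(self, doc):
        if doc["_id"] in self.docs:
            raise ConflictError(doc["_id"])
        self.docs[doc["_id"]] = doc


def enrollment_doc(course, student):
    # "<course>-<student>" as the _id allows at most one enrollment per pair.
    return {"_id": f"{course}-{student}", "type": "enrollment",
            "course": course, "student": student}


def follower_edge(follower, followed):
    # A relation stored as its own "edge" document between two user documents.
    return {"_id": f"edge:follower:{follower}:{followed}", "type": "edge:follower",
            "from": follower, "to": followed}


db = ToyDatabase()
db.insert(enrollment_doc("math101", "alice"))
try:
    db.insert(enrollment_doc("math101", "alice"))  # same course, same student
    duplicate_rejected = False
except ConflictError:
    duplicate_rejected = True

db.insert({"_id": "user:kocolosk", "type": "user"})
db.insert({"_id": "user:example", "type": "user"})
db.insert(follower_edge("user:kocolosk", "user:example"))
print(duplicate_rejected, sorted(db.docs))
```

Because the edge is a separate document, adding or removing a follow never requires rewriting either user document.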
30. Let's See One in Action
https://webinar.cloudant.com/relational
31. Preview: Defining an Index/View
• This design document (built in Cloudant Web dashboard)
encapsulates everything that follows
• It builds our secondary index/database view, which we will soon query
• It's the incremental MapReduce view engine we cited earlier
• https://webinar.cloudant.com/relational/_design/join
43. Wait. What Did We Get?
• kocolosk’s USER document
• list of all USERs kocolosk FOLLOWS
• full USER document for all USERs that kocolosk FOLLOWS
• In a fast, single query:
https://webinar.cloudant.com/relational/_design/join/_view/follows?startkey=["user:kocolosk"]&endkey=["user:kocolosk",{}]&include_docs=true
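For reference, a short Python sketch that builds this query URL for any user ID. The follows_query_url helper is hypothetical; the view URL and parameters are the ones shown on the slide. The range works because CouchDB-style view collation compares array keys element by element and sorts objects after strings, so ["user:kocolosk",{}] is an upper bound for every two-element key that starts with that user ID.

```python
import json
from urllib.parse import urlencode, urlparse, parse_qs

VIEW = "https://webinar.cloudant.com/relational/_design/join/_view/follows"


def follows_query_url(user_id, view=VIEW):
    """Build the range query that returns a user plus everyone they follow."""
    compact = {"separators": (",", ":")}  # JSON without extra whitespace
    params = {
        "startkey": json.dumps([user_id], **compact),    # first possible key
        "endkey": json.dumps([user_id, {}], **compact),  # {} sorts after strings
        "include_docs": "true",                          # return full documents
    }
    # urlencode percent-escapes the brackets and quotes for us.
    return f"{view}?{urlencode(params)}"


url = follows_query_url("user:kocolosk")
decoded = parse_qs(urlparse(url).query)  # round-trip to check the parameters
print(url)
print(decoded["startkey"][0], decoded["endkey"][0])
```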
44. 6. Are There Tools That Make Migrating My Data to Cloudant Easier?
• Yes
• https://cloudant.com/for-developers/migrating-data/
• But every use case is different and everyone’s data is different
• Lots of DIY tools on GitHub that could work for you
• Cloudant’s Homegrown CSV --> JSON Tools
• Python: https://github.com/claudiusli/csv-import
• Java: https://github.com/cavanaugh-ibm/db-data-loader
• Some support for direct SQL queries to database
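To show the general shape of a CSV --> JSON conversion, here is a minimal Python sketch using only the standard library. It is not the code from either tool listed above. The csv_to_bulk_docs helper and the sample CSV are made up, and the output uses the same {"docs": [...]} wrapper as the sample document earlier in the deck.

```python
import csv
import io
import json


def csv_to_bulk_docs(csv_text, doc_type, numeric_columns=()):
    """Convert CSV text into a {"docs": [...]} payload, one document per row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    docs = []
    for row in reader:
        doc = {"type": doc_type}
        for column, value in row.items():
            # CSV values are strings; convert only columns named as numeric,
            # so IDs with leading zeros (e.g. "0499549") stay intact.
            doc[column] = int(value) if column in numeric_columns else value
        docs.append(doc)
    return {"docs": docs}


sample = "Movie_id,Movie_name,Movie_year\n0499549,Avatar,2009\n"
payload = csv_to_bulk_docs(sample, "movie", numeric_columns=("Movie_year",))
print(json.dumps(payload, indent=2))
```

Listing numeric columns explicitly is a deliberate choice here: guessing types from the data would silently turn an ID like "0499549" into the number 499549.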
45. Big Time
• IBM InfoSphere
• Complex ETL tool that profiles, cleanses, and transforms data from
heterogeneous data sources
• http://ibm.com/software/data/infosphere/
• SPViewer CouchDBPumper for Oracle
• Commercial tool for migrating data back and forth between CouchDB and
Oracle
• http://spviewer.com/couchdbpump.html
• Eight-Wire Conductor
• Commercial tool for moving data between different sources
• http://www.eight-wire.com/
It’s like XML, but less verbose and marked up in JavaScript syntax. Here, we see one JSON document.
No enforced schema, can vary widely from document to document
Easily handle data from numerous sources
Iterate quickly without schema migrations
... movement across mobile devices, web browsers, within or between clusters, or across distributed data centers
Key to grasp here is fully managed service, integrated features, and data movement (db clusters, data centers, individual mobile devices).
Mention sharding here.
We get this question all the time…what database is Cloudant?
It’s a database that we made to address the needs of people creating large-scale apps with a global user base.
Cloudant is built out of a combination of open source and proprietary technology.
Primarily, it’s based on Apache CouchDB, which is a JSON doc store that excels at data replication and sync. Cloudant is API compatible with CouchDB so that CouchDB apps can move to Cloudant easily and so that Cloudant can replicate and sync data with CouchDB databases (more on that later).
What differentiates Cloudant from CouchDB are the blue boxes in this diagram…Apache Lucene for doing text search, MongoDB-style query syntax, GeoJSON-based geospatial indexing and querying, and so on.
We also give back to open source…Cloudant employs a large number of Apache CouchDB committers.
via http://inkdroid.org/images/ndnp-schema.png
Cloudant is a distributed system that excels at:
Providing many read-able & write-able copies of data
Moving that data close to where it's needed
Maintaining high availability data access
More copies in more places avoids typical master-slave failure scenarios
System handles synchronizing changes between different copies
"Duplication stinks"
Just focus on the keys for now
Now, let's look at the values associated with those keys.
You have null as a value
We also have another user doc ID, which serves as a pointer to that record