Slides: NoSQL Data Modeling Using JSON Documents – A Practical Approach

After three decades of relational data modeling, everyone’s pretty comfortable with schemas, tables, and entity-relationships. As more and more Global 2000 companies choose NoSQL databases to power their Digital Economy applications, they need to think about how to best model their data. How do they move from a constrained, table-driven model to an agile, flexible data model based on JSON documents?

This webinar is intended for architects and application developers who want to learn about new JSON document data modeling approaches, techniques, and best practices. This webinar will show you how to get started building a JSON document data model, how to migrate a table-based data model to JSON documents, and how to optimize your design to enable fast query performance.

This webinar will provide practical, experience-based advice and best practices for modeling JSON documents, including:

- When to embed or not embed objects in your JSON document
- Data modeling using a practical data access pattern approach
- Indexing your JSON documents
- Querying your data using N1QL (SQL for JSON)

Published in: Technology

Slides: NoSQL Data Modeling Using JSON Documents – A Practical Approach

  1. NoSQL Data Modeling Using JSON Documents – A Practical Approach
     David Segleau
     Dir. Technical Product Marketing
     Couchbase
  2. About the speaker – David Segleau
     ©2016 Couchbase Inc.
     David Segleau
     Director, Technical Product Marketing, Couchbase (since Nov 2015)
     Experience:
     - Database guy
     - Couchbase, Oracle, Sleepycat, Informix, Illustra, Teradata
     - Tech Marketing, VP Eng, Prod Mgmt, QA, Support, Training, Docs
     - Technology is only useful when it's deployed
     Expertise:
     - Database server technology, RDBMS, and NoSQL
  3. Today's agenda
     • What is Couchbase?
     • Why NoSQL?
     • Identifying the right application
     • Modeling your data
     • Accessing your data
     • Migrating your data
     • Q & A
  4. What is Couchbase?
     Couchbase delivers the Data Platform for the Digital Economy
     • Products: Couchbase Server & Couchbase Mobile
     • Open source NoSQL, JSON document database
     • Founded 2010
     • 500+ enterprise customers, including 20+ Fortune 100
     Diagram labels: Unified Administration; Unified Programming Interface; Data, Query, Index, Search, Mobile, Replication, Analytics; {N1QL}
  5. Who is using Couchbase?
     • 6 of the top 10 ecommerce companies in the US
     • 3 of the 3 GDS companies
     • 3 of the 10 airlines
     • 6 of the top 10 US & European broadcast companies
     • 6 of the top 10 online casino gaming companies
     • 6 of the top 10 financial services companies in the US
  6. Who is using Couchbase?
     • Gannett, publisher of 90+ media properties, replaced relational database technology with NoSQL to power its digital publishing platform.
     • eBay, with over 2 billion page views per day, uses Couchbase + RDBMS for their Listing cache, and Couchbase as database of record for Token management.
     • Cars.com, with over 30 million visits per month, replaced SQL Server with NoSQL to store customer and vehicle data.
     • Marriott deployed NoSQL to modernize its hotel reservation system that supports $38 billion in annual bookings.
     • Equifax uses Couchbase to generate insights from historic credit data, leveraging the JSON documents to represent complex data objects without normalization.
  7. What is NoSQL?
     • No SQL?
     • Not only SQL?
     ✔ Non-relational
     • Distributed (most)
       – Scaled out, not up
         • Elasticity and commodity hardware
       – Partitioned and replicated
         • Scalability, performance, availability
     • Schema-less (most)
       – Flexible model
       – JSON (some)
     • Multi-model
       – Key-value & Document
       – Columnar & Graph
       – Graph & Key-value
  8. Why are they using NoSQL?
     Technology Drivers
     • Customers are going online
     • The internet is connecting everything
     • Big Data is getting bigger
     • Applications are moving to the cloud
     • The world has gone mobile
     Technical Needs
     • Develop with agility
       – Flexibility + Simplicity
       – Easier + Faster
     • Operate at any scale
       – Elasticity + Availability
       – Performance at scale
       – Always-on, global deployment
     Business Needs
     • Innovate and compete
       – Faster time to market
       – Reduced costs (operational + hardware)
       – Increased revenue
  9. NoSQL vs. RDBMS
     • Replace or Complement? → It depends
       – Replace: NoSQL is often the operational database of record
       – Complement: NoSQL adds perf, scale, and availability to legacy RDBMS
     • Most customers use RDBMS and NoSQL
     • NoSQL is adding RDBMS features
       – Security, Query Language, Analytics
     • RDBMS is adding NoSQL features
       – Sharding, JSON, Distributed Processing
  10. Why migrate from an RDBMS to NoSQL?
      • Easier to scale
        3 nodes to 100s, 1 data center to many, commodity hardware
      • Better performance
        Integrated caching, memory-optimized indexes, memory-based replication
      • Up to 40x lower cost
        Open source, subscription-based, per instance (not per core)
      • Greater agility
        JSON-based data model, SQL-based query language
      • Cross-platform
        Runs on Windows or Linux (Red Hat, Ubuntu, Debian, etc.)
  11. How do you get started?
      1. Identify the right application
      2. Model your data
      3. Access your data
      4. Migrate your data
      5. Q&A
  12. Identifying the right application
  13. Identifying the right application
      Have one or more of the following characteristics or requirements:
      ✔ Innovate and iterate faster
      ✔ Send and receive JSON
      ✔ Provide low latency at any throughput
      ✔ Support many concurrent users
      ✔ Support users anywhere and everywhere
      ✔ Be available 24x7
      ✔ Store terabytes of data
      ✔ Read and write to multiple data centers
      Diagram labels: Application; Service (x3); RDBMS; NoSQL
      Examples:
      ➢ High performance, high availability caching service
      ➢ Independent application with a narrow scope
      ➢ Logical or physical service within a large application
      ➢ Global service that powers multiple applications
  14. Model your data
  15. Demystifying terminology
      Relational | NoSQL (Couchbase)
      Failover Cluster | Cluster
      Availability Group | Cluster
      Database | Bucket
      Table | Bucket
      Row (Tuple) | Document (JSON)
      Primary Key | Object ID
      IDENTITY or Sequence | Counter
      Indexed View | View
      SQL | N1QL
  16. Data Modeling Approaches
      Relational: Required Normalization
      – Schema enforced by DB
      – Same fields in all records
      • Minimize data inconsistencies (one item = one location)
      • Reduced duplicated data
      • Preserve storage resources
      NoSQL: Relaxed Normalization
      – Schema implied by structure
      – Fields may be empty, duplicate, or missing
      • Optimized based on access patterns
      • Flexible, based on application requirements
      • Supports clustered architecture
      • Reduced server overhead
  17. What and Why JSON?
      • What is JSON?
        – Schema flexibility
        – Lightweight data interchange format
        – Based on JavaScript
        – Programming language independent
        – Field names must be unique
      • Why JSON?
        – Less verbose
        – Can represent Objects and Arrays (including nested documents)
      No impedance mismatch between a JSON Document and a Java Object
  18. Modeling your data: Fixed vs. self-describing schema
  19. Modeling your data: The flexibility of JSON
      Same document type, different fields
      • Different types
      • Optional
      • On demand
      Tip: Add a version field to track changes.
          {"docType": "user", "docVersion": "1", ...}
          {"docType": "user", "docVersion": "2", ...}
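The version-field tip can be sketched in application code. The following is a minimal illustration, not Couchbase API: the class name `UserDocUpgrader`, the `name`/`firstName`/`lastName` fields, and the idea that version 2 splits a single name field are all assumptions made up for the example. Documents are modeled as plain Java maps so the sketch runs without any SDK.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: upgrades "user" documents on read, driven by docVersion.
final class UserDocUpgrader {

    // Returns a copy of the document in the (assumed) version 2 layout.
    static Map<String, Object> upgradeToV2(Map<String, Object> doc) {
        Map<String, Object> out = new HashMap<>(doc);
        // Documents without a version field are treated as version 1.
        Object version = out.getOrDefault("docVersion", "1");
        if ("1".equals(version)) {
            // Assumed change: v1 stored one "name" field, v2 splits it in two.
            String name = (String) out.remove("name");
            if (name != null) {
                String[] parts = name.trim().split("\\s+", 2);
                out.put("firstName", parts[0]);
                out.put("lastName", parts.length > 1 ? parts[1] : "");
            }
            out.put("docVersion", "2");
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> v1 = new HashMap<>();
        v1.put("docType", "user");
        v1.put("docVersion", "1");
        v1.put("name", "Ada Lovelace");
        System.out.println(upgradeToV2(v1));
    }
}
```

Because old and new documents can coexist in the same bucket, a reader like this can upgrade lazily instead of requiring a one-time schema migration.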
  20. Modeling your data: Changing the data model
      Relational database
      • Modify the database schema
      • Modify the application code (e.g., Java)
      • Modify the interface (e.g., HTML5/JS)
      Document database
      • Modify the interface (e.g., HTML5/JS)
  21. Modeling your data: Object IDs
      Best Practices
      • Natural Keys
      • Human Readable
      • Deterministic
      • Semantic
      Examples
      • author::shane
      • author::shane::blogs
      • blog::nosql_fueled_hadoop
      • blog::nosql_fueled_hadoop::comments
      What about identity columns?
          // Atomically increment the counter document and read the new value
          Document<Long> nextAuthorIdDoc = bucket.counter("authorIdCounter", 1);
          Long nextAuthorId = nextAuthorIdDoc.content();
          String authDocId = "author::" + nextAuthorId; // author::101
      Tip: Increment the counter by 10, 20, etc. to reserve a block of IDs, instead of incrementing it for every insert.
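The block-reservation tip can be sketched as follows. This is an illustration under stated assumptions rather than SDK code: an `AtomicLong` stands in for the shared counter document, and the class name `AuthorIdAllocator` and its `blockSize` parameter are hypothetical. In a real application the `addAndGet` call would be the single round trip to the counter.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical allocator: reserves IDs in blocks so that only one counter
// increment is needed per blockSize inserts.
final class AuthorIdAllocator {
    private final AtomicLong sharedCounter; // stand-in for the counter document
    private final int blockSize;
    private long next = 1;     // next ID to hand out
    private long blockEnd = 0; // last reserved ID; empty block forces a reservation

    AuthorIdAllocator(AtomicLong sharedCounter, int blockSize) {
        this.sharedCounter = sharedCounter;
        this.blockSize = blockSize;
    }

    synchronized String nextAuthorKey() {
        if (next > blockEnd) {
            // One increment by blockSize reserves the whole block.
            blockEnd = sharedCounter.addAndGet(blockSize);
            next = blockEnd - blockSize + 1;
        }
        return "author::" + next++;
    }

    public static void main(String[] args) {
        AuthorIdAllocator allocator = new AuthorIdAllocator(new AtomicLong(100), 10);
        System.out.println(allocator.nextAuthorKey());
        System.out.println(allocator.nextAuthorKey());
    }
}
```

The trade-off is that IDs reserved by a process that stops before using its block are never handed out, so keys stay unique but may have gaps.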
  22. Modeling your data: Relationships
      Diagram labels, bottom up / "BelongsTo": Author; Blog (FK), Blog (FK); Comment (FK), Comment (FK)
      Diagram labels, top down / "Has": Author (FK x2); Blog, Blog (FK x2); Comment, Comment
  23. Modeling your data: Relationships - Related or Nested
  24. Modeling your data: Strategies and best practices
      If … | Then …
      Relationship is one-to-one or one-to-many | Store related data as nested objects
      Relationship is many-to-one or many-to-many | Store related data as separate documents
      Data reads are mostly parent fields | Store children as separate documents
      Data reads are mostly parent + child fields | Store children as nested objects
      Data writes are mostly parent or child (not both) | Store children as separate documents
      Data writes are mostly parent and child (both) | Store children as nested objects
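The If/Then rows above can be encoded as a small decision helper. This is only a sketch: the slide does not say how to weigh the rows when they disagree, so the majority vote used here is an assumption, and the class and enum names are hypothetical.

```java
// Hypothetical helper that turns the If/Then table into a suggestion.
final class EmbedOrReference {
    enum Relationship { ONE_TO_ONE, ONE_TO_MANY, MANY_TO_ONE, MANY_TO_MANY }
    enum Reads { MOSTLY_PARENT, PARENT_AND_CHILD }
    enum Writes { PARENT_OR_CHILD, PARENT_AND_CHILD }
    enum Layout { NESTED, SEPARATE }

    static Layout suggest(Relationship rel, Reads reads, Writes writes) {
        int nestedVotes = 0;
        // Row 1: one-to-one and one-to-many favor nested objects.
        if (rel == Relationship.ONE_TO_ONE || rel == Relationship.ONE_TO_MANY) nestedVotes++;
        // Row 4: reads that touch parent and child together favor nesting.
        if (reads == Reads.PARENT_AND_CHILD) nestedVotes++;
        // Row 6: writes that touch parent and child together favor nesting.
        if (writes == Writes.PARENT_AND_CHILD) nestedVotes++;
        // Assumption: with three yes/no signals, go with the majority.
        return nestedVotes >= 2 ? Layout.NESTED : Layout.SEPARATE;
    }

    public static void main(String[] args) {
        System.out.println(suggest(Relationship.ONE_TO_MANY,
                Reads.PARENT_AND_CHILD, Writes.PARENT_AND_CHILD));
    }
}
```

In practice other factors, such as the volume of concurrent writes, can outweigh a simple vote, so the result is a starting point for discussion rather than a rule.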
  25. Modeling your data: Strategies and best practices
      • Are there a lot of concurrent writes, continuous updates?
      • Store children as separate documents
      Blog
      • Thread
        • Comment
        • Comment
      • Thread
        • Comment
        • Comment
      Blog
          {
            "docType": "blog",
            "author": "author::shane",
            "title": "Couchbase Wins",
            "threads": [
              "blog::couchbase_wins::threads::001",
              "blog::couchbase_wins::threads::002"
            ]
          }
      Thread
          {
            "docType": "thread",
            "comments": [
              {
                "visitor": "Brendan Bond",
                "text": "This blog is amazing!",
                "replies": [
                  { "user": "Dustin Johnson", "text": "No, it is not." }
                ]
              }
            ]
          }
  26. Some JSON Design Choices
      • Couchbase Server neither enforces nor validates any particular document structure
      • Choices that impact JSON document design:
        – Single Root Attributes vs. Document type
        – Objects vs. Arrays
        – Array Element Types
        – Timestamp Formats
        – Property Names
        – Empty and Null Property Values vs. Missing Properties
        – JSON Schema Options
      • See "Agile document modeling and data structures" from Couchbase Connect16 On-Demand Recordings
  27. Access your data
  28. Accessing your data: Options
      Option | Built on
      Key-Value (CRUD) | Documents
      N1QL (Query) | Indexes
      Views (Query) | MapReduce
      Full Text (Search) | Indexes
      Geospatial (Search) | MapReduce
      We'll focus on N1QL for now.
  29. Accessing your data – N1QL queries: Capabilities
      Feature | SQL | N1QL
      JOIN | ✔ | ✔
      TRANSFORM | ✔ | ✔
      FILTER | ✔ | ✔
      AGGREGATE | ✔ | ✔
      SORT | ✔ | ✔
      SUBQUERIES | ✔ | ✔
      PAGINATION | ✔ | ✔
      OPERATORS | ✔ | ✔
      FUNCTIONS | ✔ | ✔
  30. Accessing your data: N1QL queries – referenced data
  31. Accessing your data: N1QL queries – nested data
  32. Accessing your data: N1QL queries – CRUD
  33. Accessing your data: N1QL queries – indexes
      • Simple
      • Compound
      • Functional
      • Partial
  34. Couchbase Index Options
      # | Index Type | Description
      1 | Primary Index | Index on the document key on the whole bucket
      2 | Simple Index | Index on the key-value or document-key
      3 | Composite Index | Index on more than one key-value
      4 | Functional Index | Index on function or expression on key-values
      5 | Partial Index | Index subset of items in the bucket; uses WHERE clause
      6 | Array Index | Index individual elements of the arrays
      7 | Memory Optimized Index | Index that is pinned in memory; defined when the cluster is configured
      8 | Covering Index | Index that lets the query be resolved 100% within the index
      9 | Duplicate Index | Ability to create a copy of the index on specific nodes within the cluster, thereby providing load balancing and failover; uses WITH { "nodes": } clause
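To make a few of the index types above concrete, here is a small sketch that assembles N1QL `CREATE INDEX` statements as strings. The helper class `N1qlIndexStatements`, the index names, and the fields `title`, `author`, and `docType` are assumptions for illustration; the `default` bucket name is borrowed from the EXPLAIN slide. The sketch only builds text and does not talk to a cluster.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper that formats N1QL CREATE INDEX statements.
final class N1qlIndexStatements {

    // keys: one or more index key expressions; where: optional filter (partial index).
    static String createIndex(String name, String bucket, List<String> keys, String where) {
        StringBuilder sb = new StringBuilder()
                .append("CREATE INDEX `").append(name).append("` ON `")
                .append(bucket).append("`(")
                .append(String.join(", ", keys)).append(")");
        if (where != null && !where.isEmpty()) {
            sb.append(" WHERE ").append(where);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Simple index on one field.
        System.out.println(createIndex("idx_title", "default", Arrays.asList("title"), null));
        // Composite index on two fields.
        System.out.println(createIndex("idx_author_title", "default",
                Arrays.asList("author", "title"), null));
        // Functional index on an expression.
        System.out.println(createIndex("idx_lower_title", "default",
                Arrays.asList("LOWER(title)"), null));
        // Partial index restricted by a WHERE clause.
        System.out.println(createIndex("idx_blog_title", "default",
                Arrays.asList("title"), "docType = \"blog\""));
    }
}
```

A query that selects only `title` with the same `docType = "blog"` filter could then be answered from `idx_blog_title` alone, which is the covering-index case in row 8.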
  35. Accessing your data: Indexing Considerations
      Relational: Indexes are synchronous; index & data are in sync
      Couchbase: Indexes are asynchronous; index updates lag behind the data; application specifies read consistency

      Relational: Indexes slow down write operations
      Couchbase: Indexes do not affect write throughput

      Relational: Index load balancing for queries can only be implemented in the application
      Couchbase: Index load balancing for queries is automatic, based on index signature

      Relational: Indexes contend with other memory usage
      Couchbase: Memory Optimized indexes are pinned in memory and provide low-latency, high mutation throughput
  36. Understanding your Query Plan: Explain
      • EXPLAIN shows the query plan, i.e., the exact steps N1QL plans to take to execute the query
          cbq> EXPLAIN INSERT INTO default VALUES ("1", { "make" : "Toyota"});
          "plan": {
            "#operator": "Sequence",
            "~children": [
              {
                "#operator": "ValueScan",
                "values": "[[\"1\", {\"make\": \"Toyota\"}]]"
              },
              {
                "#operator": "Parallel",
                "maxParallelism": 1,
                "~child": {
                  "#operator": "Sequence",
                  "~children": [
                    {
                      "#operator": "SendInsert",
                      ...
  37. Accessing your data: Strategies and best practices
      Concept | Strategies & Best Practices
      Key-Value Operations provide the best possible performance | Create an effective key naming strategy; create an optimized data model
      Incremental MapReduce (Views) are well suited to aggregation | Ideal for large data sets; data set can be used to create complex view indexes
      N1QL queries provide the most flexibility (everything else) | Query data regardless of how it is modeled; remember to create secondary indexes and leverage covering indexes where possible
  38. Migrate your data
  39. So many options! Remember the KISS principle
      1) Identify the requirements
         • ETL vs. Data cleanse vs. Data enrichment
         • Duration vs. Resources
         • Data governance
      2) Pick your strategy
         • Batch vs. Incremental
         • Single threaded vs. multi-threaded
      3) Pick your tools
         • Data migration tools (Informatica, Looker, Talend)
         • BYO-tool (PHP & Python scripts, Hadoop, Spark)
         • KISS with Couchbase
           – Export to CSV; Import as documents; Use N1QL to transform & insert into new bucket
           – Use SQL to transform & export; Insert into Couchbase
      Best Practices
         • Align with your data model
         • Plan for failure (bad source data, hardware failure, resource limitations)
         • Ensure interruptible, restartable, logged, predictable
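The "export to CSV, import as documents" path can be sketched without any tooling. This is a simplified illustration: the class `CsvRowToDocument`, the column names, and the choice to omit empty columns are assumptions, and the naive comma split does not handle quoted fields, which a real CSV parser would.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical converter: one exported CSV row becomes one document plus its key.
final class CsvRowToDocument {

    static Map<String, String> toDocument(String headerLine, String rowLine, String docType) {
        String[] headers = headerLine.split(",", -1);
        String[] values = rowLine.split(",", -1);
        if (headers.length != values.length) {
            // Bad source data: fail loudly so the row can be logged and retried.
            throw new IllegalArgumentException("Column count mismatch: " + rowLine);
        }
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("docType", docType);
        for (int i = 0; i < headers.length; i++) {
            String value = values[i].trim();
            // Empty columns are simply left out of the document.
            if (!value.isEmpty()) {
                doc.put(headers[i].trim(), value);
            }
        }
        return doc;
    }

    // Builds a readable key such as author::shane from the chosen ID column.
    static String documentKey(String docType, Map<String, String> doc, String idField) {
        String id = doc.get(idField);
        if (id == null) {
            throw new IllegalArgumentException("Missing id field: " + idField);
        }
        return docType + "::" + id;
    }

    public static void main(String[] args) {
        Map<String, String> doc = toDocument("id,name,email", "shane,Shane,", "author");
        System.out.println(documentKey("author", doc, "id") + " -> " + doc);
    }
}
```

Rejecting malformed rows with an exception, rather than guessing, keeps the migration restartable: the failing row can be logged and the batch resumed after it.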
  40. How can you sync NoSQL and relational?
      • 1. Application Code (Manual)
      • 2. Replication (Automatic)
        – From NoSQL to relational
          Diagram labels: Couchbase, DCP Stream, Producer, Kafka Queue, Consumer, RDBMS
        – From relational to NoSQL
          Diagram labels: RDBMS, GoldenGate, Handler, Couchbase
          https://github.com/mahurtado/CouchbaseGoldenGateAdapter
  41. Data Modeling Best Practices Recap
      • Pick the right application
        • Focus on SOA, application/use case specific
      • Drive data model from data access patterns
        • Use document type and version ID
        • Create optimized, understandable keys
        • Weigh nested, referenced or mixed designs
      • Add indexes: Simple, Compound, Functional, Partial, Array, Covering, Memory Optimized
      • Match the data access method to requirements
        • N1QL, Key-value, Views
      • Proof of Concept
        • Focus, Success Criteria, Review Architecture
  42. Questions?
  43. Want to learn more?
      Getting Started guide: http://www.couchbase.com/get-started-developing-nosql
      Download Couchbase software: http://www.couchbase.com/nosql-databases/downloads
      Free Online Training: http://training.couchbase.com/online
      "Why NoSQL" white paper: http://www.couchbase.com/nosql-resources/why-nosql
  44. Additional Resources
      • General Docs: http://docs.couchbase.com
      • Developer Portal: http://developer.couchbase.com
      • Couchbase Labs: https://github.com/couchbaselabs
      • Query Portal: http://query.couchbase.com
      • Sample Applications:
        – https://github.com/couchbaselabs?utf8=%E2%9C%93&query=try
        – https://github.com/couchbaselabs?utf8=%E2%9C%93&query=beer
      • Blog: http://blog.couchbase.com
      • Forum: http://forums.couchbase.com
  45. Additional Resources – Data Modeling
      Webinar: The Why, When, and How of NoSQL: A Practical Approach
      Webinar: Relational to NoSQL: How to Get Started from SQL Server
      Presentation: Data Modeling with Couchbase Server
      Connect16 On Demand Recordings
      • Agile document modeling and data structures
      • Migrating from relational – Data modeling and access
      • LINQing to data: Easing the transition from SQL
      • Tuning for Performance: Indexes and Queries
      Documentation: Data Modeling with JSON
      Training class: CD210 Couchbase NoSQL Data Modeling, Querying, and Tuning Using N1QL
  46. Thank you
