
CenterEdge: Migrating from relational: data modeling and access – Couchbase Connect 2016



Would you like to know how CenterEdge Software consolidated their 30 SQL Server nodes into 5-node Couchbase clusters? Would you like to know the best practices for moving from Oracle or MySQL to Couchbase? Many applications and companies are migrating from relational databases to NoSQL. This talk discusses the tipping points that prompt the move, then shares the best practices and lessons learned at CenterEdge Software while moving from SQL Server to Couchbase, along with best practices for moving from Oracle or MySQL to Couchbase.

Published in: Software


  1. ©2016 Couchbase Inc. The Couchbase Connect16 mobile app: take our in-app survey!
  2. Migrating from relational: data modeling and access
     • Brant Burnett, Lead Developer, CenterEdge
     • Clarence Tauro, Sr Trainer, Couchbase
     • Marco Greco, Sr Engineer, N1QL R&D, Couchbase
  3. Agenda
     • Practical considerations for data and application migration
     • Modeling in Couchbase
     • Real-life experience: CenterEdge
  4. Practical Considerations
  5. In this section
     • Nomenclature
     • Type and data model mapping
     • Migrating data
     • Business logic
     • Monitoring
  6. Nomenclature
     | Oracle   | Couchbase |
     |----------|-----------|
     | Database | Bucket    |
     | Table    | Bucket    |
     | Row      | Document  |
     | Column   | Field     |
  7. Type Mapping
     | Oracle (PL/SQL synonyms) | Couchbase |
     |--------------------------|-----------|
     | Number, Binary_real, Binary_integer (Smallint, Int, Dec, Decimal, Float, …) | Number |
     | Char, Nchar, Varchar2, Nvarchar2 (Character, String) | String |
     | Boolean | Boolean |
     | Date, Timestamp | Handled via String |
     | Interval (year to month, day to fraction) | Some support via _millis() functions |
  8. Modelling
     Relational tables (Customer, Billing, Connections / Contacts, Purchases):
     Customer
     | CustomerID | Name       | DOB        |
     |------------|------------|------------|
     | CBC2016    | Jane Smith | 1990-01-30 |
     Billing
     | CustomerID | Type   | Cardnum | Expiry  |
     |------------|--------|---------|---------|
     | CBC2016    | visa   | 5827…   | 2019-03 |
     | CBC2016    | master | 6274…   | 2018-12 |
     Connections / Contacts
     | CustomerID | ConnId | Name      |
     |------------|--------|-----------|
     | CBC2016    | XYZ987 | Joe Smith |
     | CBC2016    | SKR007 | Sam Smith |
     Purchases
     | CustomerID | item  | amt     |
     |------------|-------|---------|
     | CBC2016    | mac   | 2823.52 |
     | CBC2016    | ipad2 | 623.52  |
     Equivalent Couchbase document (DocumentKey: CBC2016):
     {
       "Name": "Jane Smith",
       "DOB": "1990-01-30",
       "Billing": [
         { "type": "visa", "cardnum": "5827-2842-2847-3909", "expiry": "2019-03" },
         { "type": "master", "cardnum": "6274-2842-2847-3909", "expiry": "2019-03" }
       ],
       "Connections": [
         { "CustId": "XYZ987", "Name": "Joe Smith" },
         { "CustId": "PQR823", "Name": "Dylan Smith" }
       ],
       "Purchases": [
         { "id": 12, "item": "mac", "amt": 2823.52 },
         { "id": 19, "item": "ipad2", "amt": 623.52 }
       ]
     }
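The mapping on this slide, from parent and child tables to one nested document, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `group_children` and `build_documents` are hypothetical, the rows are hard-coded copies of the slide's sample values, and nothing here talks to a database.

```python
from collections import defaultdict

# Sample rows modeled on the slide's tables (card numbers stay truncated).
customers = [{"CustomerID": "CBC2016", "Name": "Jane Smith", "DOB": "1990-01-30"}]
billing = [
    {"CustomerID": "CBC2016", "Type": "visa", "Cardnum": "5827...", "Expiry": "2019-03"},
    {"CustomerID": "CBC2016", "Type": "master", "Cardnum": "6274...", "Expiry": "2018-12"},
]
purchases = [
    {"CustomerID": "CBC2016", "item": "mac", "amt": 2823.52},
    {"CustomerID": "CBC2016", "item": "ipad2", "amt": 623.52},
]


def group_children(rows, parent_key):
    """Group child rows by parent key, dropping the foreign-key column."""
    grouped = defaultdict(list)
    for row in rows:
        child = {k: v for k, v in row.items() if k != parent_key}
        grouped[row[parent_key]].append(child)
    return grouped


def build_documents(customers, billing, purchases):
    """Return {document_key: document} with child rows nested as arrays."""
    billing_by_customer = group_children(billing, "CustomerID")
    purchases_by_customer = group_children(purchases, "CustomerID")
    docs = {}
    for customer in customers:
        key = customer["CustomerID"]  # the primary key becomes the document key
        docs[key] = {
            "Name": customer["Name"],
            "DOB": customer["DOB"],
            "Billing": billing_by_customer.get(key, []),
            "Purchases": purchases_by_customer.get(key, []),
        }
    return docs


docs = build_documents(customers, billing, purchases)
```

The foreign key disappears from the nested rows because the enclosing document already identifies the customer.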
  9. Migration
     • Generalized process
     • Commercial tools
       – Talend
       – Informatica
     • Open source
       – Couchbase java importer
       – Oracle2couchbase
       – SQSL
     • Importing from files
  10. Migration Process
      • High-level process to migrate data from an RDBMS to Couchbase using N1QL
      • For each table
        – Determine primary key columns
        – Describe table
        – For each row
          · Generate document key from primary key columns
          · Generate document from projection list description and column values
          · INSERT INTO <bucket> (KEY, VALUE) VALUES ($1, $2)
          · Use key and document as placeholder values
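The per-table loop above can be sketched as follows. This is a minimal illustration, not the speakers' tooling: it assumes SQLite as a stand-in source (so the primary-key lookup uses a SQLite-specific PRAGMA), the helper names are hypothetical, and the `type` attribute and the table-name key prefix are design choices added here. The statements are only collected, not sent to a Couchbase server.

```python
import json
import sqlite3


def primary_key_columns(conn, table):
    """Return primary-key column names in key order (SQLite-specific PRAGMA)."""
    info = conn.execute(f"PRAGMA table_info({table})").fetchall()
    # Each row is (cid, name, type, notnull, dflt_value, pk); pk > 0 is the key position.
    pk = [(row[5], row[1]) for row in info if row[5] > 0]
    return [name for _, name in sorted(pk)]


def rows_to_documents(conn, table, separator="::"):
    """Yield (document_key, document) pairs for every row of a table."""
    pk_cols = primary_key_columns(conn, table)
    cursor = conn.execute(f"SELECT * FROM {table}")
    columns = [d[0] for d in cursor.description]
    for row in cursor:
        doc = dict(zip(columns, row))
        doc["type"] = table  # assumption: one bucket holds many tables
        key = separator.join([table] + [str(doc[c]) for c in pk_cols])
        yield key, doc


# Stand-in source database with one small table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dept (deptno INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO dept VALUES (?, ?)", [(10, "Sales"), (20, "R&D")])

# The N1QL statement from the slide, with key and document as placeholder values.
statement = "INSERT INTO `default` (KEY, VALUE) VALUES ($1, $2)"
batch = [
    (statement, [key, json.dumps(doc)])
    for key, doc in rows_to_documents(conn, "dept")
]
```

Each entry in `batch` pairs the parameterized statement with its two positional arguments, ready to hand to whichever client submits N1QL requests.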
  11. Talend
      • Talend connector for Couchbase (Talend 5.3 or later)
      • tors/talend/talend.html
      • Ingesting unstructured data
      • Couchbase view support
      • Seamless integration with Couchbase
      • tCouchbaseInput
      • Incoming data transformed into JSON documents and stored in Couchbase
      • User defines the data fields to be transformed into JSON attributes
      • tCouchbaseOutput: uses the schema mapping to transform JSON documents into target data formats
      • ODBC/JDBC drivers (provided by Simba and CData)
  12. Informatica
      • Informatica Power Center
        – Needs ODBC driver
      • Informatica Cloud
        – Needs JDBC driver
      • ODBC/JDBC drivers (provided by Simba and CData)
      • ETL & data integration
        – Load data from any relational system into Couchbase
        – Export Couchbase data into an RDBMS
        – Seamlessly integrate Couchbase into the rest of the data fabric
  13. Couchbase java importer
      • Blog post by Laurent Doguin detailing the journey from process to code
      • Java based, but the principle applies to other languages
      • Geared to Postgres, but the principle applies to other engines
      • Blog:
      • Source code:
  14. Oracle2couchbase
      • Another blog post / open source tool
      • By Manuel Hurtado
      • Java based
      • Migrates from Oracle
      • Blog:
      • Source and binary:
  15. SQSL
      • Client-side SQL-like scripting language developed by yours truly two decades ago
      • Several nifty features, like
        – Expansion
        – Data driven operation
        – On the fly aggregation and redirection
        – User defined routines
      • Have recently written a data source for Couchbase and a json library
      • Source:
      • Example:
          let fromconn="sample";
          connect to fromconn source db2cli;
          connect to "couchbase://" source cb;
          select * from db2inst1.dept connection fromconn
            insert into default (key, value) values($1, $2)
            using json:key("::", columns), json:row2doc(displaylabels, columns);
  16. Import / Export utilities
      • Upcoming version includes cbimport & cbexport
      • File-based utilities
      • Need to export RDBMS data to a file first
      • Load directly into the data store, bypassing N1QL
  17. Business logic
      • DDL
      • Views
      • Triggers
      • Procedures
      • Sequences
      • Joins
      • Transactions
  18. Language comparison
      | Query Features | SQL on RDBMS | N1QL |
      |----------------|--------------|------|
      | DML | SELECT, INSERT, UPDATE, DELETE, MERGE | SELECT, INSERT, UPDATE, DELETE, MERGE |
      | DDL | CREATE [INDEX, PROCEDURE, TABLE, TYPE, VIEW…]; ALTER [TABLE, TYPE, …]; DROP [INDEX, PROCEDURE, TABLE, TYPE, VIEW…] | CREATE [PRIMARY] INDEX; DROP [PRIMARY] INDEX |
      | Query Operations | Select, Join, Project, Subqueries; Strict Schema; Strict Type checking | Select, Join, Project, Subqueries; Nest & Unnest; Look Ma! No Type Mismatch Errors!; JSON keys act as columns |
      | Schema | Predetermined Columns | Fully addressable JSON; Flexible document structure |
      | Data Types | SQL Data types; Conversion Functions | JSON Data types; Conversion Functions |
      | Query Processing | INPUT: Sets of Tuples; OUTPUT: Set of Tuples | INPUT: Sets of JSON; OUTPUT: Set of JSON |
  19. DDL
      • Only CREATE INDEX / DROP INDEX exist in N1QL
      • Everything else should be removed from the application
      • Temporary tables
        – Materialize results
        – Store in memory, or
        – Insert the materialized document in a keyspace using a designated "type" field and a UUID() as key
        – DROP <temporary table> becomes DELETE FROM keyspace WHERE type=… and ID=<UUID>
  20. Views
      • Access underlying keyspaces instead
      • Something akin to views can be obtained with
        – View indexes
        – Functional indexes
        – CREATE INDEX … WHERE clauses
  21. Statement blocks
      • Handled by the application:
        – Triggers
        – Procedures
  22. Sequences
      • The eventual persistence engine handles atomic increments
      • Special documents can be created with a counter and accessed atomically
      • Can specify a delta on creation
      • Must be done from SDK
      • In python:
      • N1QL does not
      • Use UUID() instead
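The slide's Python snippet did not survive in this transcript, so the sketch below does not try to reproduce it. It is an in-memory stand-in (the `CounterStore` class and its `counter` method are hypothetical, not the Couchbase SDK) that mimics the usual counter-document semantics: the first call creates the counter at an initial value, and later calls add a delta atomically. The key names are illustrative.

```python
import threading
import uuid


class CounterStore:
    """In-memory stand-in for counter documents; NOT the Couchbase SDK."""

    def __init__(self):
        self._values = {}
        self._lock = threading.Lock()

    def counter(self, key, delta=1, initial=0):
        """Create the counter at `initial`, or atomically add `delta` to it."""
        with self._lock:
            if key not in self._values:
                self._values[key] = initial
            else:
                self._values[key] += delta
            return self._values[key]


store = CounterStore()
first = store.counter("seq::orders", delta=1, initial=100)   # creates the counter
second = store.counter("seq::orders", delta=1, initial=100)  # increments it
order_key = f"order::{second}"

# Where no counter is available (as the slide notes for N1QL), a UUID
# gives a unique key without any shared state.
uuid_key = f"order::{uuid.uuid4()}"
```

The trade-off is the usual one: counters give short, ordered keys but funnel every writer through one document, while UUIDs need no coordination.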
  23. Joins
      • Two types of joins
        – Look up
        – Index
      • Joins use the document key
        – Joining side can be an expression
        – Joined side is the document key
      • Full expression joins are not supported
  24. Transactions
      • In two words: No need
      • Document modification is atomic
      • Consistency can be specified at the REST call level or in the SDK
      • REST example
        – Add scan_consistency=[not_bounded|at_plus|request_plus|statement_plus] to the REST parameters
      • C SDK example
        – Use lcb_n1p_setconsistency(…, [LCB_N1P_CONSISTENCY_NONE, LCB_N1P_CONSISTENCY_RYOW, LCB_N1P_CONSISTENCY_REQUEST, LCB_N1P_CONSISTENCY_STATEMENT]) when setting up request args
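As a rough illustration of the REST variant, the sketch below builds (but does not send) a query request carrying a `scan_consistency` parameter. The endpoint URL is an assumption (a query node on localhost at the default query port), `build_query_request` is a hypothetical helper, and the statement and keyspace are placeholders.

```python
import urllib.parse
import urllib.request

# Assumed endpoint: a query node on localhost, default query service port.
QUERY_URL = "http://localhost:8093/query/service"

# The four values listed on the slide.
ALLOWED = {"not_bounded", "at_plus", "request_plus", "statement_plus"}


def build_query_request(statement, scan_consistency="not_bounded"):
    """Build (but do not send) a POST request for the N1QL REST API."""
    if scan_consistency not in ALLOWED:
        raise ValueError(f"unknown scan_consistency: {scan_consistency}")
    body = urllib.parse.urlencode(
        {"statement": statement, "scan_consistency": scan_consistency}
    ).encode("utf-8")
    return urllib.request.Request(QUERY_URL, data=body, method="POST")


request = build_query_request(
    "SELECT * FROM `default` WHERE type = 'order'",
    scan_consistency="request_plus",
)
```

Passing `request` to `urllib.request.urlopen` would submit it. `request_plus` asks the query to wait until the index has caught up with mutations made before the request, which is the setting to reach for when a read must see a write that was just made.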
  25. Monitoring
      • Oracle
        – ALTER [system|session] SET timed_statistics=true turns on timed statistics collection
        – V$SESSTAT, V$SYSSTAT, V$STATNAME dynamic performance views report timed statistics
        – EXPLAIN PLAN explains a statement
      • MySQL
        – SET profiling=1 turns on profiling
        – SHOW PROFILES displays available query profiles
        – SHOW PROFILE displays the profile for a specific query
        – EXPLAIN <statement> produces the query plan
      • Couchbase
        – system:completed_requests virtual keyspace lists completed long-running queries with timings and statistics
        – system:active_requests virtual keyspace lists active queries with timings and statistics
        – EXPLAIN <statement> explains the request plan as a JSON document
  26. Modeling
  27. What is Data Modeling?
      • A data model is a conceptual representation of the data structures that are required by a database
      • The data structures include the data objects, the associations between data objects, and the rules which govern operations on the objects
  28. Conceptual Data Modeling
      • Phase I: define entities, attributes and their relationships
        – Entities: the main objects that your applications operate on
        – Attributes: properties that your applications keep track of for the entity
        – Relationships: connections to other entities (1-1, 1-many, many-many)
      • Entities shown: Airline, Airport, Landmark, Route, Passenger, Flight
  29. Physical Data Model
      • Phase II: map entities, attributes and their relationships to containers provided by the underlying database solution
      | Relational Databases | Couchbase Server |
      |----------------------|------------------|
      | Databases | Buckets |
      | Tables | Documents with type designator attribute OR compound keys |
      | Rows | Items (Key-Value or Key-Document) |
      | Columns | Attributes |
      | Index | Index |
  30. Physical Data Modeling
      Flights:
        { "name": …, "flights": [ { "_id": "route_1000", "flight": … }, { "_id": "route_6421", "flight": … }, … ], … }
      route_1000:
        { "id": "1000", "airline": "AF", "sourceairport": "TLV", "destinationairport": "MRW", … }
      airport_TLV:
        { "id": "126701", "airportname": "TLV", "geo": { "lat": …, "long": … }, … }
  31. Data Modeling Approaches
      • Relational: required normalization
        – Schema enforced by the db
        – Same fields in all records
        – Minimizes data inconsistencies (one item = one location)
        – Reduced update cost (no duplicated data)
        – Preserves storage resources
      • NoSQL: relaxed normalization
        – Schema implied by structure
        – Fields may be empty, duplicate, or missing
        – Optimized to planned/actual access patterns
        – Flexibility with software architecture
        – Supports clustered architecture
        – Reduced server overhead
  32. JSON Design Choices
      • Couchbase Server neither enforces nor validates any particular document structure
      • Choices that impact JSON document design:
        – Single Root Attributes
        – Objects vs. Arrays
        – Array Element Types
        – Timestamp Formats
        – Property Names
        – Empty and Null Property Values
        – JSON Schema
  33. Root Attributes vs. Embedded Attributes
      • The choice of having a single root attribute or the "type" attribute embedded
  34. Objects vs. Arrays
      • The choice of having an object type, or an array type
  35. Array Element Types
      • Array elements can be simple types, objects or arrays:
        – Array of strings
        – Array of objects
  36. Timestamp Formats
      • Working with timestamps has always been challenging
      • When storing timestamps, you have at least 3 options:
        – Array of time components
        – String (ISO 8601)
        – Number (Unix style) (Epoch)
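The three storage options can be compared side by side with the standard library. This is a small sketch with an arbitrary sample instant; the variable names are illustrative, and the component order in the array is an assumption since the slide's example image is not in the transcript.

```python
from datetime import datetime, timezone

# Arbitrary sample instant, fixed to UTC so the output is reproducible.
moment = datetime(2016, 11, 7, 13, 0, 0, tzinfo=timezone.utc)

# Option 1: array of time components (assumed order: year ... second).
as_components = [moment.year, moment.month, moment.day,
                 moment.hour, moment.minute, moment.second]

# Option 2: ISO 8601 string, human readable and carrying the UTC offset.
as_iso8601 = moment.isoformat()

# Option 3: Unix-style number, here milliseconds since the epoch.
as_epoch_millis = int(moment.timestamp() * 1000)

# Both the string and the number convert back to the same instant.
from_iso = datetime.fromisoformat(as_iso8601)
from_epoch = datetime.fromtimestamp(as_epoch_millis / 1000, tz=timezone.utc)
```

Numbers sort and compare cheaply, strings are easier to read in a raw document, and component arrays make grouping by year, month or day straightforward.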
  37. Empty and Null Property Values
      • Keep in mind that JSON supports optional properties
      • If a property has a null value, consider dropping it from the JSON, unless there's a good reason not to
      • N1QL makes it easy to test for missing or null property values
      • Be sure your application code handles the case where a property value is missing
          SELECT * FROM couchmusic1 WHERE userprofile.address IS NULL;
          SELECT * FROM couchmusic1 WHERE userprofile.gender IS MISSING;
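On the application side, the same distinction shows up as "key present with a null value" versus "key absent". A minimal sketch in Python, with hypothetical helper names and made-up sample documents:

```python
import json

with_null = json.loads('{"name": "Jane", "address": null}')
without_field = json.loads('{"name": "Jane"}')


def is_null(doc, field):
    """True when the field is present and explicitly null (like IS NULL)."""
    return field in doc and doc[field] is None


def is_missing(doc, field):
    """True when the field is absent altogether (like IS MISSING)."""
    return field not in doc


# dict.get() hides the difference: both documents return None here,
# so code that must tell the two cases apart should test membership.
same_via_get = with_null.get("address") is without_field.get("address")
```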
  38. JSON Schema
      • Couchbase Server pays absolutely no attention to the shape of your JSON documents so long as they are well-formed
      • There are times when it is useful to validate that a JSON document conforms to some expected shape
      • JSON Schema is a JSON-based format for defining the structure of JSON data
      • There are implementations for most popular programming languages
      • Learn more here:
  39. Data Nesting (aka Denormalization)
      • As you know, relational database design promotes separating data using normalization, which doesn't scale
      • For NoSQL systems, we often avoid normalization so that we can scale
      • Nesting allows related objects to be organized into a hierarchical tree structure where you can have multiple levels of grouping
      • Rule of thumb is to nest no more than 3 levels deep unless there is a very good reason to do so
      • You will often want to include a timestamp in the nested data
  40. Example #1 of Data Nesting
      • Playlist with owner attribute containing the username of the corresponding userprofile
      • Document Key: copilotmarks61569
  41. Example #1 of Data Nesting
      • Playlist with owner attribute containing a subset of the corresponding userprofile
      • Note the inclusion of the updated attribute
  42. Choices with JSON Key Design
      • A key formed of attributes that exist in the real world:
        – Phone numbers
        – Usernames
        – Social security numbers
        – Account numbers
        – SKU, UPC or QR codes
        – Device IDs
  43. Surrogate Keys
      • We often use surrogate keys when no obvious natural key exists
      • They are not derived from application data
      • They can be generated values
        – 3305311F4A0FAAFEABD001D324906748B18FB24A (SHA-1)
        – 003C6F65-641A-4CGA-8E5E-41C947086CAE (UUID)
      • They can be sequential numbers (often implemented using the Counter feature of Couchbase Server)
        – 456789, 456790, 456791, …
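Each kind of surrogate key on this slide can be generated with the Python standard library. The sketch below is illustrative only: the function names are hypothetical, the SHA-1 variant hashes random bytes so the key stays independent of application data, and the sequential variant uses a local iterator as a stand-in for a server-side counter.

```python
import hashlib
import itertools
import os
import uuid


def sha1_key() -> str:
    """40 hex characters, hashed from random bytes rather than document data."""
    return hashlib.sha1(os.urandom(32)).hexdigest().upper()


def uuid_key() -> str:
    """Random (version 4) UUID in the usual 8-4-4-4-12 layout."""
    return str(uuid.uuid4()).upper()


# Local stand-in for a shared counter; in a cluster the sequence would
# have to live in one place so that all writers see the same values.
_sequence = itertools.count(456789)


def sequential_key() -> str:
    """Next number in the sequence, as a string usable as a document key."""
    return str(next(_sequence))


keys = [sha1_key(), uuid_key(), sequential_key(), sequential_key()]
```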
  44. Making Tough Choices
      • We must also make trade-offs in data modeling:
        – Document size
        – Atomicity
        – Complexity
        – Speed
  45. Embed vs. Refer
      • All of the previous trade-offs are usually rolled into a single decision: whether to embed or refer
      • When to embed?
      • When to refer?
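One way to see the trade-off is to count key-value reads. In the sketch below, `CountingBucket` is a hypothetical dict-backed stand-in for a bucket, and the document keys and fields are made up for illustration. The embedded model answers the question in one read, while the referring model needs one read for the parent plus one per referenced document.

```python
class CountingBucket:
    """Dict-backed stand-in for a bucket that counts key-value reads."""

    def __init__(self, docs):
        self._docs = docs
        self.reads = 0

    def get(self, key):
        self.reads += 1
        return self._docs[key]


bucket = CountingBucket({
    "route_1000": {"airline": "AF", "sourceairport": "TLV"},
    "route_6421": {"airline": "AF", "sourceairport": "CDG"},
    # Embed: flight details are copied into the passenger document.
    "passenger_embedded": {"name": "Jane", "flights": [
        {"airline": "AF", "sourceairport": "TLV"},
        {"airline": "AF", "sourceairport": "CDG"},
    ]},
    # Refer: the passenger document stores only the route keys.
    "passenger_referring": {"name": "Jane", "flights": ["route_1000", "route_6421"]},
})


def flights_embedded(bucket, key):
    """One read: the flights travel with the parent document."""
    return bucket.get(key)["flights"]


def flights_referred(bucket, key):
    """One read for the parent, then one read per referenced route."""
    return [bucket.get(route_key) for route_key in bucket.get(key)["flights"]]


embedded = flights_embedded(bucket, "passenger_embedded")
reads_embedded = bucket.reads
referred = flights_referred(bucket, "passenger_referring")
reads_referred = bucket.reads - reads_embedded
```

Embedding wins on read cost and atomicity, at the price of larger documents and duplicated data that must be updated in several places. Referring keeps a single copy of each route but pays for it in extra reads.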
  46. Want to know more on Data Modeling?
      • Session tomorrow: "Agile Document Models and Data Structures" at 1:00 PM
  47. Brant Burnett, Software Development Team Lead, Couchbase Community Expert
  48. About CenterEdge Software
      • Point of Sale
      • Admissions & Ticketing
      • Party, Group & Event Bookings
      • Online Sales & Party Reservations
      • Time Clock & Labor Management
      • & More!
  49. About CenterEdge Software
      • Celebrating 12 Year Anniversary
      • Team of 50 in Roxboro, NC
      • Sister company is Palace Pointe, a 100k sq. ft. entertainment venue for which we were developed as an in-house system
      • Over 600 facilities using our platform across the US and abroad
      • FECs, Waterparks, Trampoline Parks, Amusement Parks, Skating Rinks, Bowling Centers, Zoos & Museums
  50. Why Couchbase For CenterEdge's Newest Cloud Platform?
      • More scalable and performant than traditional SQL in the cloud
        – Previous online store system uses 19 SQL servers, each hosting 30 stores
        – As each store is only on a single server, it doesn't handle spikes in load efficiently
        – Servers can't be scaled vertically without downtime for all 30 stores on that server
      • Schema-less JSON increases flexibility as your system evolves, leaving schema enforcement in your data access layer
        – Schema changes to large tables can result in downtime as the data structure is updated across all records
      • We were already using Couchbase for our shopping carts as well as a SQL caching layer, with great success. Now we can simplify the architecture with a single data layer.
  51. Couchbase Cloud Data Flow Architecture
      • Diagram labels: Data, Data, Data, Index, Query, Web Servers, Remote Application Servers
  52. Enforcing Schema
      • Since Couchbase doesn't enforce schema like SQL, your data access layer should do so instead
      • At CenterEdge, each document type is only updated by a single service
      • Within that service, schema is enforced by serializing data from consistent POCOs
      • Schema changes can be supported using customized JSON converters during deserialization
      • IS MISSING is a good way to recognize the difference in attributes that weren't stored because the document was saved using the old schema
      • Where possible, try to predict possible schema needs in advance
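The idea of enforcing schema in the data access layer can be sketched in Python with a dataclass playing the role the slide gives to POCOs. CenterEdge's actual code is .NET and is not shown in the talk, so everything here (the `UserProfile` type, its fields, and the two helpers) is hypothetical. The optional `gender` attribute stands in for a field added in a later schema version, so documents saved under the old schema still load.

```python
import json
from dataclasses import asdict, dataclass, fields
from typing import Optional


@dataclass
class UserProfile:
    """Single definition of the document shape for one document type."""
    username: str
    email: str
    gender: Optional[str] = None  # added later; absent in old documents
    type: str = "userprofile"


def serialize(profile: UserProfile) -> str:
    """Every write goes through the dataclass, so every document has one shape."""
    return json.dumps(asdict(profile))


def deserialize(raw: str) -> UserProfile:
    """Reject unknown or incomplete documents; default attributes that are missing."""
    data = json.loads(raw)
    unknown = set(data) - {f.name for f in fields(UserProfile)}
    if unknown:
        raise ValueError(f"unexpected attributes: {sorted(unknown)}")
    try:
        return UserProfile(**data)
    except TypeError as exc:  # a required attribute is absent
        raise ValueError(f"document does not match schema: {exc}") from exc


old_document = '{"type": "userprofile", "username": "jane", "email": "jane@example.com"}'
loaded = deserialize(old_document)
round_tripped = deserialize(serialize(loaded))
```

Because only one service writes each document type, this one class is the only place where the schema has to be changed.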
  53. Pay Attention To Document Modeling Up Front
      • Watch out for documents that get too large
        – Might hit 20MB document size limit
        – High serialization/deserialization/networking performance penalties
        – Document contention as too many actions attempt to modify the document simultaneously
      • Watch out for data spread across too many related documents
        – Lack of atomic transactions across multiple writes can result in partial updates
        – Can add latency if documents must be read in a chained manner (i.e. each document contains the key to the next document)
      • Be sure to include document keys, or a way to construct them, where you may want to use N1QL JOIN or NEST operations
      • Should the document key be stored inside the document, too?
        – Increases data size, as the key is in the document and in the metadata
        – Requires that the data layer maintain consistency
        – Can make queries easier since you don't need to use the META() function to get the key
  54. View Indexes vs. Global Secondary Indexes
      • Be sure to analyze what type of index is best for each workload
        – Views are great where pre-aggregating numbers is useful, such as reports, graphs, etc.
        – GSI is usually the best option for more generic queries, especially when you're just trying to collect a set of documents
      • Views don't scale as cleanly; they can't be scaled independently via Multi-Dimensional Scaling
        – Views live on the data nodes, so they only scale as you add more data nodes
      • At CenterEdge, our new platform started on Couchbase Server 3.0, before Global Secondary Indexes were an option
        – We used views and lookup documents for most of our indexing needs
        – We have run into problems with too many views per bucket causing performance bottlenecks
        – We're currently transitioning many of these views into Global Secondary Indexes
  55. Efficient Indexing Is Especially Important For Couchbase
      • Primary key scans in SQL have always been inefficient
        – Every record in the table would be read and checked for a match to the WHERE predicate
        – For small tables, the performance penalty was negligible, and would usually go unnoticed
      • Primary key scans in Couchbase are usually much worse
        – In our experience with production-scale data, they almost invariably result in queries timing out
        – Every record in the bucket is read and checked for a match to the WHERE predicate
        – Can easily result in reading and parsing millions of JSON documents
        – Will also bust the in-memory cache on the data nodes if there is more data in the bucket than allocated memory
      • Design every query to be supported by a Global Secondary Index
        – Helps even if the index isn't an exact match
        – A good design can vastly reduce the number of documents scanned, making it more like a SQL primary key scan
  56. Efficient Indexing Is Especially Important For Couchbase
      • If you're using the "type" attribute as the logical equivalent of a table in SQL, most indexes will include this attribute.
          /* Use predicate to only index documents of a certain type */
          CREATE INDEX `airport_sourceairport` ON `travel-sample` (`sourceairport`) WHERE `type` = 'airport'
          /* To index the same attribute across multiple document types, include type attribute first */
          CREATE INDEX `def_type_id` ON `travel-sample` (`type`, `id`)
          /* A good practice is to create a fallback in case other indexes aren't used */
          CREATE INDEX `def_type` ON `travel-sample` (`type`)
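A toy model helps show why an index on `type` matters when many document types share one bucket. The sketch below is not how GSI is implemented; it is a plain-Python stand-in with hypothetical helper names and made-up data. It only counts how many entries each approach has to examine to find the same documents.

```python
from collections import defaultdict

# One bucket holding two document types: a few airports, many routes.
bucket = {f"airport_{i}": {"type": "airport", "id": i} for i in range(3)}
bucket.update({f"route_{i}": {"type": "route", "id": i} for i in range(1000)})


def primary_scan(bucket, predicate):
    """Examine every document in the bucket and keep the matching keys."""
    examined, matches = 0, []
    for key, doc in bucket.items():
        examined += 1
        if predicate(doc):
            matches.append(key)
    return matches, examined


def build_index(bucket, field):
    """Map each value of `field` to the keys of the documents carrying it."""
    index = defaultdict(list)
    for key, doc in bucket.items():
        if field in doc:
            index[doc[field]].append(key)
    return index


def index_scan(index, value):
    """Only the entries for the requested value are touched."""
    keys = index.get(value, [])
    return list(keys), len(keys)


scan_matches, scan_examined = primary_scan(bucket, lambda d: d["type"] == "airport")
type_index = build_index(bucket, "type")
index_matches, index_examined = index_scan(type_index, "airport")
```

With these numbers the scan examines all 1003 documents while the index lookup examines 3, and the gap widens as the bucket fills with other document types.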
  57. How To Store and Index Date/Times
      • Date/Times are usually stored as ISO 8601 strings in JSON
      • Use STR_TO_MILLIS(x) in indexes and queries to work with ISO 8601 strings
          /* STR_TO_MILLIS converts an ISO8601 string to a Unix numeric representation */
          /* It also handles the time zone specifier */
          SELECT `Extent1`.* FROM `beer-sample` as `Extent1`
          WHERE (`type` = 'beer')
            AND (STR_TO_MILLIS(`Extent1`.`updated`) <= STR_TO_MILLIS("2010-01-01T00:00:00Z"))
          /* STR_TO_MILLIS must also be used in the index, or the index cannot be used */
          CREATE INDEX `beer_updated` ON `beer-sample` (STR_TO_MILLIS(`updated`)) WHERE `type` = 'beer'
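The reason for converting to milliseconds rather than comparing the strings directly shows up as soon as two timestamps carry different UTC offsets. The sketch below is a rough Python analogue, not N1QL's implementation: `str_to_millis` is a hypothetical helper that covers only ISO 8601 strings with an explicit offset or a trailing `Z`, and the sample values are made up.

```python
from datetime import datetime


def str_to_millis(text: str) -> int:
    """Convert an ISO 8601 string with an offset to Unix milliseconds."""
    if text.endswith("Z"):
        # Older Python versions do not accept "Z" in fromisoformat().
        text = text[:-1] + "+00:00"
    return int(datetime.fromisoformat(text).timestamp() * 1000)


earlier = "2010-01-01T00:30:00+01:00"  # the same instant as 2009-12-31T23:30:00Z
later = "2010-01-01T00:00:00Z"

# Plain string comparison gets the order wrong across time zones...
string_order_says_earlier_is_greater = earlier > later

# ...while the numeric form compares the actual instants.
millis_order_is_correct = str_to_millis(earlier) < str_to_millis(later)
```

The same reasoning explains the slide's last point: an index built on the raw string cannot serve a predicate written against the converted number.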
  58. Index Performance During Mutations
      • Remember that GSI indexes are similar to SQL indexes, but not the same
      • Diagram labels: Airline SQL Table → Airline Indexes; Airport SQL Table → Airport Indexes; travel-sample Bucket → Bucket Indexes
  59. Training!
      • Don't just assume you can switch to any NoSQL platform without some training
        – Performance profile is different, and the penalties can appear in different places
        – Developers who know the pitfalls in advance can save you a lot of refactoring headaches later
      • N1QL does help reduce the learning curve significantly
      • For .NET development shops, look at Linq2Couchbase to make it even easier!
      • The operations department needs training, too!
  60. Speakers
      • Marco Greco, Senior Software Engineer
      • Clarence J M Tauro, Ph.D., Senior Instructor
      • Brant Burnett, Lead Developer
  61. Share your opinion on Couchbase
      1. Go here:
      2. Create a profile
      3. Provide feedback (~15 minutes)
  62. The Couchbase Connect16 mobile app: take our in-app survey!
  63. Thank You!