Navigating the Transition
From Relational to NoSQL
Database Technology
NAVIGATING THE TRANSITION FROM RELATIONAL TO NOSQL DATABASE TECHNOLOGY




Table of Contents
Introduction
Why transition at all?
    Scaling model
    Data model
The relational vs. document-oriented data model
    Relational data model
    Document data model
Document Modeling: Rules of Thumb
    Models
    (Primary) Keys
    Multiple Places and Editability
    Concurrency
Conclusion





                 © 2011 COUCHBASE ALL RIGHTS RESERVED. WWW.COUCHBASE.COM




Introduction
While the hype surrounding “NoSQL” (non-relational) database technology has become
deafening, there is real substance beneath the often exaggerated claims. But like most things
in life, the benefits come at a cost. Developers accustomed to data modeling and application
development against relational database technology will need to approach things differently.
This white paper highlights the differences between a relational database and a distributed
document-oriented database, the implications for application development, and guidance
that can ease the transition from relational to NoSQL database technology.



Why transition at all?
Change is hard and rarely undertaken unless it alleviates significant pain. The white paper
NoSQL Database Technology provides a comprehensive look at the motivating challenges that
have led to the emergence and rapid adoption of NoSQL database technology. In a nutshell,
the transition has been spurred by the need for flexibility – both in the scaling model and the
data model.


Scaling model
Relational database technology is a “scale up” technology – to add capacity (whether data
storage or I/O capacity) one gets a bigger server. The modern approach to application
architecture is to scale out, rather than scale up. Instead of buying a bigger server, you
add more commodity servers, virtual machines or cloud instances behind a load balancer.
Conversely, capacity can be easily removed when no longer required. While scaling out is
already common at the application logic tier, database technology is only now catching up.


Data model
The scale-out deployment benefits of NoSQL technology frequently get the most attention, but
equally important are the benefits afforded by a schemaless approach to data management.
With a relational database, you must define a schema before adding records to the database.
Each record added to the database must adhere strictly to this schema with its fixed columns
and data types. Changing the database schema, particularly when dealing with a partitioned
relational database spread across many servers, is difficult. If your data capture and
management needs are constantly evolving, a rigid schema quickly becomes a blocker to change.

NoSQL databases (whether key-value, document, column-oriented or otherwise) scale out,
and they don’t require schema definition prior to inserting data nor a schema change when
data capture and management needs evolve.
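
To make the contrast concrete, here is a minimal sketch in Python. It uses the standard-library sqlite3 module as a stand-in for a relational database and a plain dictionary of JSON strings as a stand-in for a document store. The table name, the field names and the "severity" field are illustrative assumptions and do not come from any particular product.

```python
import json
import sqlite3

# Relational side: the schema must exist before any record is inserted.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE errors (id INTEGER PRIMARY KEY, err TEXT, time TEXT)")
conn.execute(
    "INSERT INTO errors (id, err, time) VALUES (?, ?, ?)",
    (1, "Out of Memory", "2004-09-16T23:59:58.75"),
)


def insert_with_severity(error_id, err, time, severity):
    """Try to store a field ("severity", hypothetical) the schema may lack."""
    try:
        conn.execute(
            "INSERT INTO errors (id, err, time, severity) VALUES (?, ?, ?, ?)",
            (error_id, err, time, severity),
        )
        return True
    except sqlite3.OperationalError:
        # SQLite rejects the statement because the column does not exist.
        return False


record = (2, "ECC Error", "2004-09-16T23:59:59.00", "high")
rejected_before_migration = not insert_with_severity(*record)
conn.execute("ALTER TABLE errors ADD COLUMN severity TEXT")  # schema change
accepted_after_migration = insert_with_severity(*record)

# Document side: a dict of JSON strings stands in for a document store.
documents = {}
documents["1"] = json.dumps(
    {"ERR": "Out of Memory", "TIME": "2004-09-16T23:59:58.75"}
)
# A record with an extra field is stored without any schema change.
documents["2"] = json.dumps(
    {"ERR": "ECC Error", "TIME": "2004-09-16T23:59:59.00", "SEVERITY": "high"}
)
```

In this sketch the relational insert succeeds only after the ALTER TABLE statement, while the two documents simply carry different sets of fields.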

The rest of this paper will focus on distributed document-oriented NoSQL database technology
– with Couchbase and MongoDB being the two most visible and widely adopted examples.


The relational vs. document-oriented data model
The figure below compares four records from a relational database with four from a
document-oriented database.

[Figure: a relational table of four rows and four columns (cells R1C1 through R4C4) shown beside four stacked JSON documents. Each document has the fields “UUID”, “Time”, “Server”, “Calling Server”, “Type” and “Initiating User”, plus a nested “Details” object containing “IP”, “API”, “Trace” and a “Tags” array.]

Relational data model: Highly-structured table organization with rigidly-defined data formats and record structure.

Document data model: Collection of complex documents with arbitrary, nested data formats and varying “record” format.




Relational data model
As shown above, each record in a relational database conforms to a schema – with a fixed
number of fields (columns) each having a specified purpose and data type. Every record is the
same. If you wish to capture different data in the future, the database schema must be revised.

Additionally, the relational model is characterized by database normalization, where large
tables are decomposed into smaller, interrelated tables. The figure below illustrates the concept:

                Table 1: Error Log

                KEY       ERR     TIME       DC
                 1        ERR     TIME    FK(DC2)
                 2        ERR     TIME    FK(DC2)
                 3        ERR     TIME    FK(DC2)
                 4        ERR     TIME    FK(DC3)

                Table 2: Data Centers

                KEY       LOC     NUM
                 1        DEN     303-223-2332
                 2        NYC     212-223-2332
                 3        SFO     415-223-2332




In the above example, the database is used to store error log information. Each error record
(each row in Table 1) consists of an error number (ERR), the time the error occurred (TIME)
and the datacenter (DC) in which the error occurred. Instead of repeating all the datacenter
information in each error record (location, phone number), each error record points to a row
in the Data Centers table (Table 2), which includes the location of the datacenter (LOC)
and the phone number (NUM).
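
The two tables can be sketched with Python's standard-library sqlite3 module, as follows. The column names are lower-cased variants of those in the figure, the error text and timestamps for rows 1 and 2 are borrowed from the JSON example later in this paper, rows 3 and 4 keep the figure's placeholders, and the replacement phone number is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE data_centers (id INTEGER PRIMARY KEY, loc TEXT, num TEXT);
    CREATE TABLE error_log (
        id INTEGER PRIMARY KEY,
        err TEXT,
        time TEXT,
        dc_id INTEGER REFERENCES data_centers(id)
    );
    """
)
conn.executemany(
    "INSERT INTO data_centers VALUES (?, ?, ?)",
    [(1, "DEN", "303-223-2332"), (2, "NYC", "212-223-2332"), (3, "SFO", "415-223-2332")],
)
conn.executemany(
    "INSERT INTO error_log VALUES (?, ?, ?, ?)",
    [
        (1, "Out of Memory", "2004-09-16T23:59:58.75", 2),
        (2, "ECC Error", "2004-09-16T23:59:59.00", 2),
        (3, "ERR", "TIME", 2),  # placeholder values, as in the figure
        (4, "ERR", "TIME", 3),  # placeholder values, as in the figure
    ],
)

# Reading a complete error record requires a join across both tables.
JOIN_SQL = """
    SELECT e.id, e.err, d.loc, d.num
    FROM error_log AS e
    JOIN data_centers AS d ON d.id = e.dc_id
    ORDER BY e.id
"""
rows_before = conn.execute(JOIN_SQL).fetchall()

# The shared phone number is stored once, so one UPDATE changes it everywhere.
conn.execute(
    "UPDATE data_centers SET num = ? WHERE loc = ?",
    ("212-555-0100", "NYC"),  # hypothetical new number
)
rows_after = conn.execute(JOIN_SQL).fetchall()
```

The join is what reassembles a full record, and the single UPDATE illustrates why normalization keeps shared data cheap to change.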

In the relational model, records are “striped” across multiple tables, with some data shared
by multiple records (multiple error records share the same data center information). The upside is
that there is less duplicated data in the database. The downside is that a change in a single record
can mean locking down many tables simultaneously to ensure a change doesn’t leave the database
in an inconsistent state. ACID transactions can be complex on a relational database because the
data, even of a single record, is spread about. This complex web of interrelationships between
shared data items is what makes it so difficult to distribute relational data across multiple servers
and can lead to performance challenges both reading and writing data.

When storage resources were expensive and scarce, the tradeoffs made sense. But the
price of storage has dropped precipitously over the last 40 years, to say the least. For many,
the tradeoff calculus no longer makes sense. Using more storage in exchange for better
application performance and the ability to easily distribute workloads across machines is
now the best choice for many applications.


Document data model
Use of the term “document” is a bit confusing. A document-oriented database really has
nothing to do with “documents” in the classical sense of the word. It doesn’t mean books,
letters or articles. Rather, a document in this case refers to a data record that is self-
describing as to the data elements it contains. XML documents, HTML documents and JSON
documents are examples of “documents” in this context. Couchbase Server (which contains
Apache CouchDB as an ingredient technology) is a document-oriented database that uses
JSON as the document format. In Couchbase, error records would look like this:

     {
           "ID": 1,
           "ERR": "Out of Memory",
           "TIME": "2004-09-16T23:59:58.75",
           "DC": "NYC",
           "NUM": "212-223-2332"
     }
     {
           "ID": 2,
           "ERR": "ECC Error",
           "TIME": "2004-09-16T23:59:59.00",
           "DC": "NYC",
           "NUM": "212-223-2332"
     }



As can be seen, the data is denormalized. Each record contains a complete set of information
related to the error without external reference. The records are self-contained. This makes
it very easy to move the entire record to another server – all the information simply comes
along with it. There is no concern about having parts of the record in other tables left
behind. And because only the self-contained record (document) needs to be updated when
changes are made (versus changing entries in multiple tables simultaneously), ensuring ACID
compliance at record boundaries is far easier. Read performance also improves, because a
complete record can be fetched without consulting other tables.
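
As a rough illustration of why self-contained records are easy to relocate, the Python sketch below assigns each document to a server using a stable hash of its ID. Modulo hashing is an assumption chosen for brevity; it is not a description of how Couchbase or any other product places data, and the function names are hypothetical.

```python
import hashlib
import json

documents = {
    "1": {"ID": 1, "ERR": "Out of Memory", "TIME": "2004-09-16T23:59:58.75",
          "DC": "NYC", "NUM": "212-223-2332"},
    "2": {"ID": 2, "ERR": "ECC Error", "TIME": "2004-09-16T23:59:59.00",
          "DC": "NYC", "NUM": "212-223-2332"},
}


def node_for(doc_id, node_count):
    """Pick a node index from a stable hash of the document ID."""
    digest = hashlib.sha256(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % node_count


def distribute(docs, node_count):
    """Return one {doc_id: json_text} dict per node."""
    nodes = [{} for _ in range(node_count)]
    for doc_id, doc in docs.items():
        # The whole record travels as one unit; nothing is left behind.
        nodes[node_for(doc_id, node_count)][doc_id] = json.dumps(doc)
    return nodes


two_nodes = distribute(documents, 2)
three_nodes = distribute(documents, 3)  # "adding a server" in this sketch
```

Whichever node a document lands on, it can be read back in full from that node alone.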

But complete data denormalization is not required in a document database, as highlighted
in the next section. In fact, in this particular example, maintaining documents representing
each datacenter and simply referencing those from each error record would probably be
the right decision. Separation would eliminate duplication and allow quick changes to
information shared across many records (e.g. if the telephone number for the datacenter
changed, there would be no need to go update it in every error record). Ultimately, however,
data modeling decisions are dependent on the use case and planned update patterns.
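
The trade-off can be sketched in Python by counting how many documents must be written when a datacenter's phone number changes. Dictionaries stand in for the document store; the key prefixes ("dc:", "error:"), the helper names and the new phone number are all hypothetical.

```python
# Denormalized: datacenter details are copied into every error document.
embedded = {
    "error:1": {"ERR": "Out of Memory", "DC": "NYC", "NUM": "212-223-2332"},
    "error:2": {"ERR": "ECC Error", "DC": "NYC", "NUM": "212-223-2332"},
}

# Referenced: each error document stores only the ID of a datacenter document.
referenced = {
    "dc:NYC": {"LOC": "NYC", "NUM": "212-223-2332"},
    "error:1": {"ERR": "Out of Memory", "DC": "dc:NYC"},
    "error:2": {"ERR": "ECC Error", "DC": "dc:NYC"},
}


def change_number_embedded(store, loc, new_num):
    """Rewrite every error document for the datacenter; return the write count."""
    writes = 0
    for doc in store.values():
        if doc.get("DC") == loc:
            doc["NUM"] = new_num
            writes += 1
    return writes


def change_number_referenced(store, dc_id, new_num):
    """Rewrite only the datacenter document; return the write count."""
    store[dc_id]["NUM"] = new_num
    return 1


def phone_for_error(store, error_id):
    """Follow the reference from an error document to its datacenter."""
    return store[store[error_id]["DC"]]["NUM"]


NEW_NUM = "212-555-0100"  # hypothetical new number
embedded_writes = change_number_embedded(embedded, "NYC", NEW_NUM)
referenced_writes = change_number_referenced(referenced, "dc:NYC", NEW_NUM)
```

With only two error records the difference is small, but the embedded write count grows with the number of error documents while the referenced one stays at one.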



Document Modeling: Rules of Thumb
It takes a while to unlearn habits. But do not fear: by understanding alternatives you will
be able to make more efficient use of your trusted knowledge as well. After all, the tool best
suited for the job will leave you with the least headache. If you know more tools, you can
choose more wisely.


Models
In an application, data objects are a central construct – the model layer in Model-View-
Controller (MVC). These are the documents that hold your data and let you manipulate it. If a
blog has posts and comments, these are likely two different models. Ideally, you should have
a separate document for every post and every comment.

When looking at an existing application, stop at the Object-Relational Mapping (ORM) layer.
Instead of splitting your models up into tables and rows, turn them into JSON and make
them a document. Each document gets a unique ID by which you can find it later. Done.
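
A minimal Python sketch of this step, using the blog example, might look like the following. The Post and Comment classes, their fields, the "type" attribute and the save/load helpers are illustrative assumptions, and a dictionary stands in for the database.

```python
import json
import uuid
from dataclasses import asdict, dataclass


@dataclass
class Post:
    title: str
    body: str
    type: str = "post"


@dataclass
class Comment:
    post_id: str
    author: str
    text: str
    type: str = "comment"


def save(store, model):
    """Serialize a model to JSON and store it under a new unique ID."""
    doc_id = str(uuid.uuid4())
    store[doc_id] = json.dumps(asdict(model))
    return doc_id


def load(store, doc_id):
    """Fetch a document by ID and parse it back into a dict."""
    return json.loads(store[doc_id])


store = {}  # stands in for the document database
post_id = save(store, Post(title="Hello", body="First post"))
comment_id = save(store, Comment(post_id=post_id, author="alice", text="Nice"))
```

Each model instance becomes exactly one document, and the comment carries the ID of the post it belongs to.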


(Primary) Keys
In the NoSQL world, the document ID is the one and only key to a document. It is roughly
equivalent to a primary key in a relational database. Usually an ID can appear only once
within a database. (Different NoSQL solutions have different names for these databases:
buckets, collections, tables, etc. The idea is roughly similar to a table in an RDBMS, and
you can have many per server.)




Some NoSQL database systems sort data by ID. Data with nearby IDs can be accessed more
efficiently than IDs that are all over the place. Keeping data that you tend to access at the
same time closer together makes your application faster.

The larger point here though is that an ID-lookup is extremely fast, and by selecting clever IDs
you can make your life a lot easier. One example is the use of prefixes (user:com.example:123)
to group your documents.
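
The prefix idea can be sketched in Python as follows, assuming a store that keeps IDs in sorted order. The sorted list stands in for such a store, and every key other than user:com.example:123 is made up for illustration.

```python
from bisect import bisect_left

# A sorted list of IDs stands in for a store that orders data by ID.
keys = sorted(
    [
        "user:com.example:123",
        "user:com.example:124",  # hypothetical
        "user:org.example:7",    # hypothetical
        "post:com.example:1",    # hypothetical
    ]
)


def keys_with_prefix(sorted_keys, prefix):
    """Return all IDs that start with prefix, using one contiguous scan."""
    start = bisect_left(sorted_keys, prefix)  # first key >= prefix
    matches = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break  # sorted order guarantees no later key can match
        matches.append(key)
    return matches


example_users = keys_with_prefix(keys, "user:com.example:")
all_users = keys_with_prefix(keys, "user:")
```

Because IDs that share a prefix sit next to each other in sorted order, a whole group can be read as one contiguous range.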


Multiple Places and Editability
Suppose you have a piece of data that shows up all over your application but you still want
to be able to edit that data. For example, a photo on Flickr has a title. The photo can show
up in your photo stream, in sets, collections and groups, on your Flickr front page and in
many more places.

Usually, a photo’s title is shown with the photo. You could create a document for each
occurrence of the photo in each of the places. But then, if you change the title of your picture,
you need to update a bunch of documents. If you know this is a bounded number (say, no
more than 10 to 100) and the renaming doesn’t have to happen simultaneously in all places
(which means an asynchronous background task could do the renames), using separate
documents for each occurrence can work fine.

However, if the number of copies is unbounded and could potentially lead you to update
thousands of documents, that approach probably won’t work. Instead, you would want to
keep the title and perhaps other identifying data in a single “photo information” document
and create a separate “photo placement” document for each place the photo appears (these
“photo placement” documents would each point to the photo’s information document). Now
when you display a photo you will make two lookups: one for the placement document and
then another for the photo information document. If you want to change the title of a photo,
you just edit the single photo information document and it will be changed everywhere on
your site.
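
The two-lookup pattern described above can be sketched in Python like this. A dictionary stands in for the document store, and the IDs, field names and titles are hypothetical.

```python
store = {
    # One "photo information" document holds the title.
    "photo:42": {"title": "Sunset"},
    # One "photo placement" document per place the photo appears.
    "placement:stream:42": {"photo": "photo:42", "place": "photo stream"},
    "placement:set:42": {"photo": "photo:42", "place": "set"},
    "placement:group:42": {"photo": "photo:42", "place": "group"},
}


def render(placement_id):
    """Display a photo in one place using two ID lookups."""
    placement = store[placement_id]      # lookup 1: the placement document
    info = store[placement["photo"]]     # lookup 2: the photo information
    return f'{info["title"]} ({placement["place"]})'


def rename(photo_id, new_title):
    """Change the title by editing the single photo information document."""
    store[photo_id]["title"] = new_title


before = render("placement:set:42")
rename("photo:42", "Sunset over the bay")
after = [render(pid) for pid in sorted(store) if pid.startswith("placement:")]
```

The cost of the extra lookup on every display buys a rename that touches exactly one document, however many placements exist.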

There is a special trick you can use in Couchbase: using view collation, a single query can
return all the data you need. Christopher Lenz wrote an outstanding blog post on this topic,
highlighting three approaches to modeling using the blog example.

With views, Couchbase Server allows one to keep a single canonical source of a piece of data
while having it show up in many different places.
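
The idea behind view collation can be simulated in a few lines of Python. This is only a sketch of the sorting trick under assumed document shapes; it is not the Couchbase or CouchDB view API, where map functions are written in JavaScript. A post emits a key that sorts just before the keys emitted by its comments, so one scan over the sorted view returns the post followed by its comments.

```python
docs = {
    "post:1": {"type": "post", "title": "Hello"},
    "comment:1": {"type": "comment", "post": "post:1", "text": "First!"},
    "comment:2": {"type": "comment", "post": "post:1", "text": "Nice."},
    "post:2": {"type": "post", "title": "Other"},
    "comment:3": {"type": "comment", "post": "post:2", "text": "Hi."},
}


def map_doc(doc_id, doc):
    """Emit (key, value) rows; keys are built so related rows sort together."""
    if doc["type"] == "post":
        yield (doc_id, 0), doc                # the post sorts first
    elif doc["type"] == "comment":
        yield (doc["post"], 1, doc_id), doc   # its comments sort after it


def build_view(all_docs):
    """Run the map function over every document and sort rows by key."""
    rows = [row for doc_id, doc in all_docs.items() for row in map_doc(doc_id, doc)]
    return sorted(rows, key=lambda row: row[0])


def query(view, post_id):
    """Return the post and its comments in key order.

    A real view would answer this with a key-range scan; a filter over the
    sorted rows is used here for brevity.
    """
    return [doc for key, doc in view if key[0] == post_id]


view = build_view(docs)
result = query(view, "post:1")
```

The comments remain separate documents, yet a single query returns them together with the post they belong to.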

In the RDBMS world you are taught to normalize your data as much as possible; in the
NoSQL world you are taught to denormalize as much as possible. In both cases, the truth is
somewhere in the middle.




Concurrency
Let’s stay with the blog example. There are multiple authors, maybe an editor, and each of
them is looking at a single article at any given time. No two people work on the same article,
usually. If you have data that you know is only edited by a single person at any given time, it’s
a good idea to place it into a single document.

Comments are different. Many people can write comments and they can do so independently
and simultaneously. Once the post is published comments can be added immediately. To
avoid write contention – that is, concurrent writes happening to the same document – you
can store comments in separate documents, thus ensuring that again, only one author is
editing a single document at any given time.

To avoid serializing and locking each comment author out or accidentally overwriting any
data, just store the post’s ID with each comment so that a post’s comments can be fetched
back in one request for display. (Note: Couchbase won’t allow overwriting data, but handling
that case requires more complex code, so it’s best to just avoid it if possible.)
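
The Python sketch below illustrates the difference under one stated assumption: that the store rejects a write based on a stale revision. The Store class, its revision counter and ConflictError are hypothetical and are not the Couchbase API. Two authors appending to the same post document collide, while two authors writing separate comment documents do not.

```python
class ConflictError(Exception):
    """Raised when a write is based on a stale revision."""


class Store:
    """Toy document store with a per-document revision check."""

    def __init__(self):
        self._docs = {}  # doc_id -> (revision, document)

    def get(self, doc_id):
        return self._docs[doc_id]

    def put(self, doc_id, doc, expected_rev=None):
        current_rev = self._docs.get(doc_id, (None, None))[0]
        if current_rev != expected_rev:
            raise ConflictError(doc_id)
        new_rev = 1 if current_rev is None else current_rev + 1
        self._docs[doc_id] = (new_rev, doc)
        return new_rev


store = Store()
store.put("post:1", {"title": "Hello", "comments": []})

# Comments embedded in the post: both authors read the same revision.
rev_a, doc_a = store.get("post:1")
rev_b, doc_b = store.get("post:1")
store.put("post:1", {**doc_a, "comments": ["from A"]}, expected_rev=rev_a)
try:
    store.put("post:1", {**doc_b, "comments": ["from B"]}, expected_rev=rev_b)
    second_write_conflicted = False
except ConflictError:
    # Author B would have to re-read, merge and retry.
    second_write_conflicted = True

# Comments as separate documents: each write creates its own document.
store.put("comment:a1", {"post": "post:1", "text": "from A"})
store.put("comment:b1", {"post": "post:1", "text": "from B"})
```

The retry logic hinted at in the except branch is the "more complex code" that separate comment documents make unnecessary.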



Conclusion
The relational data model relies on rigid adherence to a database schema, normalization of
data and joins to store data and perform complex queries. Over the last 40 years, relational
modeling and query techniques have been well established and are familiar to most
application developers.

But changes in application, user and infrastructure characteristics have led application
developers and architects to seek alternative “NoSQL” (non-relational) database technologies.
Many view distributed document database technology as a natural successor to relational
database technology:

      •	 It effortlessly scales across commodity servers, virtual machines or cloud
         instances.

      •	 It doesn’t require a rigid schema before inserting data, nor does it require a
         schema change when different data must be captured and processed.

      •	 Its rich data model and view technology allow for complex data modeling,
         capture and queries.

It’s important to note that in some cases, additional storage may be required given the
preference for data denormalization. But the overall benefits in performance, scalability and
flexibility are usually, and increasingly, a more than fair trade.




About Couchbase
Couchbase is the NoSQL database market share leader, with production deployments at ADP,
AOL, BMW, Cisco, Deutsche Post-DHL, Disney, Honda Motors, Zynga and hundreds of other
household brands worldwide. VMware CEO Paul Maritz recently identified the data model
transition from relational to NoSQL database technology as one of the two ongoing changes
most likely to transform the business of IT over the next decade. Couchbase is leading that
transition.

At the core of the Couchbase product family are three of the most widely deployed and
trusted open source data management technologies: CouchDB, Memcached and Membase.
Collectively, these technologies power millions of applications worldwide, including 18 of the
top 20 busiest web properties. Couchbase products combine these ingredient technologies
into a family of interoperable database solutions. The Couchbase team includes the inventors
and core committers from each of these projects – representing decades of development and
deployment experience in modern distributed, document-oriented database technology.

Couchbase offers the only family of interoperable database solutions spanning mobile device
to datacenter cloud:

      •	 Couchbase Server is a distributed, high-performance, document-
         oriented database. By automatically spreading data across commodity
         servers or virtual machines, Couchbase Server makes it easy to employ the
         optimal quantity of low-cost assets to meet the data management needs
         of a given application. And because there is no rigid database schema,
         applications can be more flexible to the needs of the business – capturing
         and processing whatever data is most relevant without requiring a database
         schema change.

      •	 Couchbase Mobile is a database optimized for the mobile computing
         environment, addressing intermittent network connectivity and
         constrained processing, storage, bandwidth and battery resources. It
         delivers the flexibility benefits of document-oriented database technology
         to developers of native iOS and Android applications.

      •	 CouchSync, a capability common to all Couchbase products, is a mature
         data replication technology enabling real-time data synchronization
         between devices and the cloud. It is this differentiating technology that
         really sets Couchbase apart – enabling millions of mobile devices to
         capture and present data, while performing aggregation, analysis and
         enrichment of that data “in the cloud.”

For more information, visit www.couchbase.com or www.couchbase.org.




                                              9

                 © 2011 COUCHBASE ALL RIGHTS RESERVED. WWW.COUCHBASE.COM

More Related Content

Similar to Navigating the Transition from Relational to NoSQL Database Technology

Introduction to NoSQL and Couchbase
Introduction to NoSQL and CouchbaseIntroduction to NoSQL and Couchbase
Introduction to NoSQL and CouchbaseDipti Borkar
 
Building and Scaling the Internet of Things with MongoDB at Vivint
Building and Scaling the Internet of Things with MongoDB at Vivint Building and Scaling the Internet of Things with MongoDB at Vivint
Building and Scaling the Internet of Things with MongoDB at Vivint MongoDB
 
Next Gen Data Modeling in the Open Data Platform With Doron Porat and Liran Y...
Next Gen Data Modeling in the Open Data Platform With Doron Porat and Liran Y...Next Gen Data Modeling in the Open Data Platform With Doron Porat and Liran Y...
Next Gen Data Modeling in the Open Data Platform With Doron Porat and Liran Y...HostedbyConfluent
 
Juan Pablo Crossley
Juan Pablo CrossleyJuan Pablo Crossley
Juan Pablo CrossleyColombia3.0
 
CouchApps: Requiem for Accidental Complexity
CouchApps: Requiem for Accidental ComplexityCouchApps: Requiem for Accidental Complexity
CouchApps: Requiem for Accidental ComplexityFederico Galassi
 
High-Volume Data Collection and Real Time Analytics Using Redis
High-Volume Data Collection and Real Time Analytics Using RedisHigh-Volume Data Collection and Real Time Analytics Using Redis
High-Volume Data Collection and Real Time Analytics Using Rediscacois
 
OpenStack Folsom Summit: Melange overview
OpenStack Folsom Summit: Melange overviewOpenStack Folsom Summit: Melange overview
OpenStack Folsom Summit: Melange overviewtroytoman
 
Sql server 2012 roadshow masd overview 003
Sql server 2012 roadshow masd overview 003Sql server 2012 roadshow masd overview 003
Sql server 2012 roadshow masd overview 003Mark Kromer
 
TLS/SSL MAC security flaw
TLS/SSL MAC security flawTLS/SSL MAC security flaw
TLS/SSL MAC security flawNate Lawson
 
Sobanski odl summit_2015
Sobanski odl summit_2015Sobanski odl summit_2015
Sobanski odl summit_2015John Sobanski
 
La apuesta de Telefónica por la cloud privada
La apuesta de Telefónica por la cloud privadaLa apuesta de Telefónica por la cloud privada
La apuesta de Telefónica por la cloud privadaLibreCon
 
How to search extracted data
How to search extracted dataHow to search extracted data
How to search extracted dataJavier Collado
 
Data encoding and Metadata for Streams
Data encoding and Metadata for StreamsData encoding and Metadata for Streams
Data encoding and Metadata for Streamsunivalence
 
Create The Internet of Your Things example of a real system - Laurent Ellerbach
Create The Internet of Your Things example of a real system - Laurent EllerbachCreate The Internet of Your Things example of a real system - Laurent Ellerbach
Create The Internet of Your Things example of a real system - Laurent EllerbachITCamp
 
SIGRed - Monitoring and Detecting with Splunk
SIGRed - Monitoring and Detecting with SplunkSIGRed - Monitoring and Detecting with Splunk
SIGRed - Monitoring and Detecting with SplunkAnthony Reinke
 
Capacity Planning for Linux Systems
Capacity Planning for Linux SystemsCapacity Planning for Linux Systems
Capacity Planning for Linux SystemsRodrigo Campos
 
Navigating the Transition from relational to NoSQL - CloudCon Expo 2012
Navigating the Transition from relational to NoSQL - CloudCon Expo 2012Navigating the Transition from relational to NoSQL - CloudCon Expo 2012
Navigating the Transition from relational to NoSQL - CloudCon Expo 2012exponential-inc
 
JFall 2011 no sql workshop
JFall 2011 no sql workshopJFall 2011 no sql workshop
JFall 2011 no sql workshopfvanvollenhoven
 
Flux QL - Nexgen Management of Time Series Inspired by JS
Flux QL - Nexgen Management of Time Series Inspired by JSFlux QL - Nexgen Management of Time Series Inspired by JS
Flux QL - Nexgen Management of Time Series Inspired by JSIvo Andreev
 
Boundary Front end tech talk: how it works
Boundary Front end tech talk: how it worksBoundary Front end tech talk: how it works
Boundary Front end tech talk: how it worksBoundary
 

Navigating the Transition from Relational to NoSQL Database Technology

  • 1. Navigating the Transition From Relational to NoSQL Database Technology
  • 2. NAVIGATING THE TRANSITION FROM RELATIONAL TO NOSQL DATABASE TECHNOLOGY Table of Contents Introduction 3 Why transition at all? 3 Scaling model Data model The relational vs. document-oriented data model 4 Relational data model Document data model Document Modeling: Rules of Thumb 6 Models (Primary) Keys Multiple Places and Editability Concurrency Conclusion 8 2 © 2011 COUCHBASE ALL RIGHTS RESERVED. WWW.COUCHBASE.COM
  • 3. NAVIGATING THE TRANSITION FROM RELATIONAL TO NOSQL DATABASE TECHNOLOGY Introduction While the hype surrounding “NoSQL” (non-relational) database technology has become deafening, there is real substance beneath the often exaggerated claims. But like most things in life, the benefits come at a cost. Developers accustomed to data modeling and application development against relational database technology will need to approach things differently. This white paper highlights the differences between a relational database and a distributed document-oriented database, the implications for application development, and guidance that can ease the transition from relational to NoSQL database technology. Why transition at all? Change is hard and rarely undertaken unless it alleviates significant pain. The white paper NoSQL Database Technology provides a comprehensive look at the motivating challenges that have led to the emergence and rapid adoption of NoSQL database technology. In a nutshell, the transition has been spurred by the need for flexibility – both in the scaling model and the data model. Scaling model Relational database technology is a “scale up” technology – to add capacity (whether data storage or I/O capacity) one gets a bigger server. The modern approach to application architecture is to scale out, rather than scale up. Instead of buying a bigger server, you add more commodity servers, virtual machines or cloud instances behind a load balancer. Conversely, capacity can be easily removed when no longer required. While scaling out is already common at the application logic tier, database technology is only now catching up. Data model The scale-out deployment benefits of NoSQL technology frequently get the most attention, but equally important are the benefits afforded by a schemaless approach to data management. With a relational database, you must define a schema before adding records to the database. 
Each record added to the database must adhere strictly to this schema with its fixed columns and data types. Changing the database schema, particularly when dealing with a partitioned relational database spread across many servers, is difficult. If your data capture and management needs are constantly evolving, a rigid schema quickly becomes blocker to change. NoSQL databases (whether key-value, document, column-oriented or otherwise) scale out, and they don’t require schema definition prior to inserting data nor a schema change when data capture and management needs evolve. The rest of this paper will focus on distributed document-oriented NoSQL database technology – with Couchbase and MongoDB being the two most visible and widely adopted examples. 3 © 2011 COUCHBASE ALL RIGHTS RESERVED. WWW.COUCHBASE.COM
  • 4. NAVIGATING THE TRANSITION FROM RELATIONAL TO NOSQL DATABASE TECHNOLOGY The relational vs. document-oriented data model The figure below compares four records from a relational database with four from a document-oriented database. { “UUID”: “21f7f8de-8051-5b89-86 “Time”: “2011-04-01T13:01:02.42 { “Server”: “A2223E”, “UUID”:“Calling Server”: “A2213W”, “21f7f8de-8051-5b89-86 “Time”: “Type”: “E100”, “2011-04-01T13:01:02.42 { R1C1 R1C2 R1C3 R1C4 “Server”:“Initiating User”: “dsallings@spy.net”, “A2223E”, “UUID”:“Calling Server”: “A2213W”, “21f7f8de-8051-5b89-86 “Details”: “Time”: “Type”: “E100”, “2011-04-01T13:01:02.42 { { “Server”:“Initiating User”: “dsallings@spy.net”, “A2223E”, “UUID”:“Calling Server”: “A2213W”,“10.1.1.22”, “21f7f8de-8051-5b89-86 “IP”: “Details”: “2011-04-01T13:01:02.42 “InsertDVDQueueItem”, “Time”: “Type”: “E100”, “API”: { “Trace”: “cleansed”, “Server”:“Initiating User”: “dsallings@spy.net”, “A2223E”, R2C1 R2C2 R2C3 R2C4 “Calling “Details”: “IP”: “10.1.1.22”, Server”: “A2213W”, “Tags”: “API”: “InsertDVDQueueItem”, “Type”: “E100”, [ { “Trace”: “cleansed”, “Initiating User”: “dsallings@spy.net”, “SERVER”, “IP”: “10.1.1.22”, “Tags”: “Details”: “US-West”, “API”: “InsertDVDQueueItem”, [ { “API” “Trace”: “cleansed”, “SERVER”, “IP”: “10.1.1.22”, ] “Tags”: R3C1 R3C2 R3C3 R3C4 “US-West”, “API”: “InsertDVDQueueItem”, [ } “Trace”: “cleansed”, } “API” “SERVER”, ] “Tags”: “US-West”, } [ } “API” “SERVER”, ] “US-West”, } R4C1 R4C2 R4C3 R4C4 } “API” ] } } Relational data model Document data model Highly-structured table Collection of complex documents organization with rigidly-defined with arbitrary, nested data formats data formats and record structure. and varying “record” format. Relational data model As shown above, each record in a relational database conforms to a schema – with a fixed number of fields (columns) each having a specified purpose and data type. Every record is the same. 
If you wish to capture different data in the future, the database schema must be revised. Additionally, the relational model is characterized by database normalization, where large tables are decomposed into smaller, interrelated tables. The figure below illustrates the concept: Table 1: Error Log Table 2: Data Centers KEY ERR TIME DC KEY LOC NUM 1 ERR TIME FK(DC2) 303-223- 1 DEN 2332 2 ERR TIME FK(DC2) 212-223- 2 NYC 2332 3 ERR TIME FK(DC2) 3 SFO 415-223- 2332 4 ERR TIME FK(DC3) 4 © 2011 COUCHBASE ALL RIGHTS RESERVED. WWW.COUCHBASE.COM
  • 5. NAVIGATING THE TRANSITION FROM RELATIONAL TO NOSQL DATABASE TECHNOLOGY In the above example, the database is used to store error log information. Each error record (each row in Table 1) consists of an error number (ERR), the time the error occurred (TIME) and the datacenter (DC) in which the error occurred. Instead of repeating all the datacenter information in each error record (location, phone number), each error record points to a row in the in the Data Centers Table (Table 2) which includes the location of the datacenter (LOC) and the phone number (NUM). In the relational model, records are “striped” across multiple tables, with some data shared by multiple records (multiple error records share the same data center information). The upside is that there is less duplicated data in the database. The downside is that a change in a single record can mean locking down many tables simultaneously to ensure a change doesn’t leave the database in an inconsistent state. ACID transactions can be complex on a relational database because the data, even of a single record, is spread about. This complex web of interrelationships between shared data items is what makes it so difficult to distribute relational data across multiple servers and can lead to performance challenges both reading and writing data. When storage resources were expensive and scarce, the tradeoffs made sense. But the price of storage has dropped precipitously over the last 40 years, to say the least. For many, the tradeoff calculus no longer makes sense. Using more storage in exchange for better application performance and the ability to easily distribute workloads across machines is now the best choice for many applications. Document data model Use of the term “document” is a bit confusing. A document-oriented database really has nothing to do with “documents” in the classical sense of the word. It doesn’t mean books, letters or articles. 
Rather, a document in this case refers to a data record that is self- describing as to the data elements it contains. XML documents, HTML documents and JSON documents are examples of “documents” in this context. Couchbase Server (which contains Apache CouchDB as an ingredient technology) is a document-oriented database that uses JSON as the document format. In Couchbase, error records would look like this: { “ID”: 1, “ERR”: “Out of Memory”, “TIME”: “2004-09-16T23:59:58.75”, “DC”: “NYC”, “NUM”: “212-223-2332” } { “ID”: 2, “ERR”: “ECC Error”, “TIME”: “2004-09-16T23:59:59.00”, “DC”: “NYC”, “NUM”: “212-223-2332” } 5 © 2011 COUCHBASE ALL RIGHTS RESERVED. WWW.COUCHBASE.COM
  • 6. NAVIGATING THE TRANSITION FROM RELATIONAL TO NOSQL DATABASE TECHNOLOGY As can be seen, the data is denormalized. Each record contains a complete set of information related to the error without external reference. The records are self-contained. This makes it very easy to move the entire record to another server – all the information simply comes along with it. There is no concern about having parts of the record in other tables left behind. And because only the self-contained record (document) needs to be updated when changes are made (versus changing entries in multiple tables simultaneously), ensuring ACID compliance is far easier, at record boundaries. Performance is also increased on reads. But complete data denormalization is not required in a document database, as highlighted in the next section. In fact, in this particular example, maintaining documents representing each datacenter and simply referencing those from each error record would probably be the right decision. Separation would eliminate duplication and allow quick changes to information shared across many records (e.g. if the telephone number for the datacenter changed, there would be no need to go update it in every error record). Ultimately, however, data modeling decisions are dependent on the use case and planned update patterns. Document Modeling: Rules of Thumb It takes a while to unlearn habits. But do not fear: by understanding alternatives you will be able to make more efficient use of your trusted knowledge as well. After all, the tool best suited for the job will leave you with the least headache. If you know more tools, you can choose more wisely. Models In an application, data objects are a central construct – the model layer in Model-View- Controller (MVC). These are the documents that hold your data and let you manipulate it. If a blog has posts and comments, these are likely two different models. Ideally, you should have a separate document for every post and every comment. 
When looking at an existing application, stop at the Object-Relational Mapping (ORM) layer. Instead of splitting your models up into tables and rows, turn them into JSON and make them a document. Each document gets a unique ID by which you can find it later. Done. (Primary) Keys In the NoSQL world the document ID is the one and only key to a document. They are roughly equivalent to primary keys in a relational database. Usually an ID can only appear once in a database (different NoSQL solutions have different names for these: buckets, collections, tables etc. The idea is roughly similar to a table in an RDBMS. You can have many per server). 6 © 2011 COUCHBASE ALL RIGHTS RESERVED. WWW.COUCHBASE.COM
  • 7. NAVIGATING THE TRANSITION FROM RELATIONAL TO NOSQL DATABASE TECHNOLOGY Some NoSQL database systems sort data by ID. Data with nearby IDs can be accessed more efficiently than IDs that are all over the place. Keeping data that you tend to access at the same time closer together makes your application faster. The larger point here though is that an ID-lookup is extremely fast, and by selecting clever IDs you can make your life a lot easier. One example is the use of prefixes (user:com.example:123) to group your documents. Multiple Places and Editability Suppose you have a piece of data that shows up all over your application but you still want to be able to edit that data. For example, a photo on flickr has a title. The photo can show up in your photo stream, in sets, collections, groups on your flickr front page and in many more places. Usually, a photo’s title is shown with the photo. You could create a document for each occurrence of the photo in each of the places. But then, if you change the title of your picture, you need to update a bunch of documents. If you know this is a bounded number (no more than 10-100 e.g.) and the renaming doesn’t have to happen simultaneously in all places (which means an asynchronous background task could do the renames), using separate docs for each occurrence can work fine. However, if the number of copies is unbounded and could potentially lead you to update thousands of documents, that approach probably won’t work. Instead, you would want to keep the title and perhaps other identifying data in a single “photo information” document and create a separate “photo placement” document for each place the photo appears (these “photo placement” documents would each point to the photo’s information document). Now when you display a photo you will make two lookups: one for the placement document and then another for the photo information document. 
If you want to change the title of a photo, you just edit the single photo information document and it will be changed everywhere on your site.

There is a special trick you can use in Couchbase: using view collation, you can have a single query return all the data you need. Christopher Lenz wrote an outstanding blog post on this topic, highlighting three approaches to modeling using the blog example. With views, Couchbase Server allows one to keep a single canonical source of a piece of data while having it show up in many different places.

In the RDBMS world you are taught to normalize your data as much as possible; in the NoSQL world you are taught to denormalize as much as possible. In both cases, the truth is somewhere in the middle.
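The view-collation idea mentioned above can be imitated in plain Python to show why one query can return related documents together. This is a sketch under stated assumptions, not Couchbase's view API: `map_doc`, the composite key layout and the sample documents are hypothetical, and Python's tuple ordering only approximates a real view's collation rules. The principle is that each document is indexed under a composite key, so a post and its comments end up next to each other in the sorted index.

```python
from bisect import bisect_left

# Hypothetical blog documents: posts and comments stored separately.
docs = [
    {"_id": "post:1", "type": "post", "title": "Hello"},
    {"_id": "c:a", "type": "comment", "post_id": "post:1",
     "created": "2011-05-02", "text": "Nice"},
    {"_id": "post:2", "type": "post", "title": "Other"},
    {"_id": "c:b", "type": "comment", "post_id": "post:1",
     "created": "2011-05-01", "text": "First"},
    {"_id": "c:c", "type": "comment", "post_id": "post:2",
     "created": "2011-05-03", "text": "Hi"},
]


def map_doc(doc):
    """Yield (key, doc) rows, in the spirit of a view's map function."""
    if doc["type"] == "post":
        yield (doc["_id"], 0), doc                        # the post sorts first
    elif doc["type"] == "comment":
        yield (doc["post_id"], 1, doc["created"]), doc    # then comments, by date


# The "view": all emitted rows, kept sorted by composite key.
index = sorted((row for doc in docs for row in map_doc(doc)),
               key=lambda row: row[0])
keys = [key for key, _ in index]


def post_with_comments(post_id):
    """One range scan over the sorted index returns the post and its comments."""
    start = bisect_left(keys, (post_id,))
    end = bisect_left(keys, (post_id, 2))
    return [doc for _, doc in index[start:end]]


rows = post_with_comments("post:1")
print([doc["_id"] for doc in rows])
```

Because the index is sorted, the range scan reads one contiguous slice instead of performing a lookup per comment.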
Concurrency

Let’s stay with the blog example. There are multiple authors, maybe an editor, and each of them is looking at a single article at any given time. Usually, no two people work on the same article. If you have data that you know is only edited by a single person at any given time, it’s a good idea to place it into a single document.

Comments are different. Many people can write comments, and they can do so independently and simultaneously. Once the post is published, comments can be added immediately. To avoid write contention – that is, concurrent writes happening to the same document – you can store comments in separate documents, thus ensuring that, again, only one author is editing a single document at any given time. This avoids serializing writes and locking comment authors out, as well as accidentally overwriting data. Just store the post’s ID with each comment so that you can fetch the comments back in one request for display. (Note: Couchbase won’t allow overwriting data, but handling that case requires more complex code, so it’s best to just avoid it if possible.)

Conclusion

The relational data model relies on rigid adherence to a database schema, normalization of data and joins to store data and perform complex queries. Over the last 40 years, relational modeling and query techniques have become well established and are familiar to most application developers. But changes in application, user and infrastructure characteristics have led application developers and architects to seek alternative “NoSQL” (non-relational) database technologies. Many view distributed document database technology as a natural successor to relational database technology:

• It effortlessly scales across commodity servers, virtual machines or cloud instances.
• It doesn’t require a rigid schema before inserting data, nor does it require a schema change when different data must be captured and processed.
• Its rich data model and view technology allow for complex data modeling, capture and queries.

It’s important to note that in some cases, additional storage may be required given the preference for data denormalization. But the overall benefits in performance, scalability and flexibility are usually, and increasingly, a more than fair trade.
About Couchbase

Couchbase is the NoSQL database market share leader, with production deployments at ADP, AOL, BMW, Cisco, Deutsche Post-DHL, Disney, Honda Motors, Zynga and hundreds of other household brands worldwide. VMware CEO Paul Maritz recently identified the data model transition from relational to NoSQL database technology as one of the two ongoing changes most likely to transform the business of IT over the next decade. Couchbase is leading that transition.

At the core of the Couchbase product family are three of the most widely deployed and trusted open source data management technologies: CouchDB, Memcached and Membase. Collectively, these technologies power millions of applications worldwide, including 18 of the top 20 busiest web properties. Couchbase products combine these ingredient technologies into a family of interoperable database solutions. The Couchbase team includes the inventors and core committers from each of these projects – representing decades of development and deployment experience in modern distributed, document-oriented database technology.

Couchbase offers the only family of interoperable database solutions spanning mobile device to datacenter cloud:

• Couchbase Server is a distributed, high-performance, document-oriented database. By automatically spreading data across commodity servers or virtual machines, Couchbase Server makes it easy to employ the optimal quantity of low-cost assets to meet the data management needs of a given application. And because there is no rigid database schema, applications can be more flexible to the needs of the business – capturing and processing whatever data is most relevant without requiring a database schema change.
• Couchbase Mobile is a database optimized for the mobile computing environment, addressing intermittent network connectivity and constrained processing, storage, bandwidth and battery resources. It delivers the flexibility benefits of document-oriented database technology to developers of native iOS and Android applications.
• CouchSync, a capability common to all Couchbase products, is a mature data replication technology enabling real-time data synchronization between devices and the cloud. It is this differentiating technology that really sets Couchbase apart – enabling millions of mobile devices to capture and present data, while performing aggregation, analysis and enrichment of that data “in the cloud.”

For more information, visit www.couchbase.com or www.couchbase.org.