The document discusses Apache CouchDB, a NoSQL database management system. It begins with an overview of NoSQL databases and their characteristics like being non-relational, distributed, and horizontally scalable. It then provides details on CouchDB, describing it as a document-oriented database using JSON documents and JavaScript for queries. The document outlines CouchDB's features like schema-free design, ACID compliance, replication, RESTful API, and MapReduce functions. It concludes with examples of CouchDB use cases and steps to set up a sample project using a CouchDB instance with sample employee data and views/shows to query the data.
2. First Off, What is NoSQL?
Class of database management systems that do not follow all of the rules of a traditional relational
DBMS.
Systems typically used in very large databases which are prone to performance problems caused
by SQL and the relational model of databases (Techopedia)
Term is misleading and actually means “Not Only SQL” because some systems may support SQL-
like query languages
Main points: non-relational, distributed, open-source and horizontally scalable
Additional points: Other characteristics that often apply include schema free, easy-replication
support, simple API, eventual consistency, and a huge amount of data
Triggered by the needs of Web 2.0 companies such as Facebook, Google and Amazon.com who
deal with gigantic volumes of data
Designed to support requirements of modern Web, Mobile, and Internet of Things applications and
built to overcome the scale, performance, data model, and data distribution limitations of relational
databases
3. First Off, What is NoSQL?
Motivations include simplicity of design, simpler and faster scaling to clusters of machines (which is a problem for relational
databases), and finer control over availability (Wikipedia)
Types:
Document
CouchDB
MongoDB
Columnar
Cassandra
DynamoDB
Key-Value
Redis
MemcacheDB
Graph
AllegroGraph
Neo4j
4. RDBMS vs. NoSQL
Scale up vs. scale out
clustering servers as needed as opposed to adding memory or
CPU to a central machine, or data center
Structured vs semi-structured and unstructured
Rows/columns vs. JSON vs. images, video, text blobs
Atomic transaction vs. eventual consistency
Atomic
Either all operations in a transaction are applied to the database or none are
Eventual
Cannot guarantee that every replica reflects a series of similar transactions at exactly
the same time (though the difference is usually milliseconds)
5. What Is CouchDB and Why Use It?
Project created in April 2005 by Damien Katz, a former developer at IBM
Defined as a “storage system for a large scale object database”
Meant to serve web applications
Written in Erlang Programming Language
Schema-free document model
Don’t have to model your data up front as in a relational database
Uses JSON to store data
Query language is JavaScript using MapReduce
ACID Semantics
Implements a form of Multi-Version Concurrency Control, meaning CouchDB can handle a high volume of concurrent readers and writers without them blocking each other
Documents are updated with revisions
Map/Reduce Views and Indexes
Stored data is structured using views
Views are created by using a JavaScript function as the map half of the map/reduce operation.
Function takes a document as a parameter and emits zero or more key/value pairs
CouchDB indexes views and keeps them updated as documents are added, removed or altered
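A minimal sketch of the map/reduce idea described above, written in plain Python rather than the JavaScript that CouchDB itself runs. The sample documents, the field names, and the helpers `map_skills`, `build_view` and `reduce_count` are all hypothetical; this only imitates the behaviour conceptually and is not how CouchDB stores or updates its indexes.

```python
from itertools import groupby

# Hypothetical sample documents; the field names are assumptions for illustration.
docs = [
    {"_id": "a", "name": "Ann", "skills": ["Python", "SQL"]},
    {"_id": "b", "name": "Bob", "skills": ["SQL"]},
    {"_id": "c", "name": "Cy"},  # no "skills" field: schema-free documents may differ
]

def map_skills(doc):
    """Map step: yield zero or more (key, value) pairs for one document."""
    for skill in doc.get("skills", []):
        yield skill, 1

def build_view(documents, map_fn):
    """Run the map function over every document and sort the rows by key."""
    rows = [(key, value, doc["_id"])
            for doc in documents
            for key, value in map_fn(doc)]
    return sorted(rows)

def reduce_count(rows):
    """Reduce step: sum the emitted values for each distinct key."""
    return {key: sum(value for _, value, _ in group)
            for key, group in groupby(rows, key=lambda row: row[0])}

rows = build_view(docs, map_skills)
counts = reduce_count(rows)
```

The sorted `rows` list stands in for the view index, and `counts` for the result of a grouped reduce. A document without the mapped field simply contributes no rows.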
6. What Is CouchDB and Why Use It?
Distributed Architecture with Replication
Bi-directional replication and off-line operation
Multiple replicas of a CouchDB instance have their own copies of the same data, modify it, and sync the changes at a later time
REST HTTP API
All database items have a unique URI that can be read and modified using HTTP methods such as POST, GET, PUT, and DELETE
Eventual Consistency
Guarantees eventual consistency
Perfect for self-contained documents such as an invoice
In a relational DB you would have each invoice stored in a table as a row that refers to other rows in other tables - one row for
seller info, one for buyer, one row for each item billed, and more rows still to describe item details, manufacturer details and so
forth
In CouchDB all the data in the invoice would mimic a real world invoice by having all the data contained within one document
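The invoice comparison can be sketched as a single self-contained document. Every field name and value below is invented for illustration, and `invoice_total` is a hypothetical helper; the point is only that seller, buyer and line items travel together instead of being spread across tables.

```python
import json

# Hypothetical invoice; every field name and value here is illustrative only.
invoice = {
    "_id": "invoice-0001",
    "type": "invoice",
    "seller": {"name": "Example Seller Ltd.", "city": "Springfield"},
    "buyer": {"name": "Example Buyer Inc.", "city": "Shelbyville"},
    "items": [
        {"description": "Widget", "quantity": 2, "unit_price": 9.50},
        {"description": "Gadget", "quantity": 1, "unit_price": 20.00},
    ],
}

def invoice_total(doc):
    """Compute the total from the line items embedded in the document."""
    return sum(item["quantity"] * item["unit_price"] for item in doc["items"])

# The JSON text is what would be stored as the document body.
body = json.dumps(invoice, indent=2)
total = invoice_total(invoice)
```

Because everything needed to answer "what is this invoice worth?" lives in one document, no join across other records is required.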
7. Use Cases of CouchDB
Replication and synchronization capabilities make it ideal for use on mobile devices,
where a network connection is not guaranteed but the application must keep working
offline
Well suited for applications with accumulating, occasionally changing data, on which
pre-defined queries are to be run and where versioning is important
Enterprises that use it
Amadeus IT Group, for some of their backend systems
Ubuntu used it for their synchronization service “Ubuntu One” from 2009-2011
The BBC, for its dynamic content platforms
Credit Suisse, for internal use at commodities department for their marketplace framework
Sophos, for some of their backend systems
8. Steps of the Project
Scenario: As an employer you have access to a database with potential IT
employees to hire for your company. The database contains documents for each
individual.
Documents contain similar descriptive data but not always exactly the same attributes.
Stay tuned for more.
Ideally a front end application would be written on top of the database, known in
CouchDB lingo as a CouchApp. But for the purpose of the Database Applications class
we will just look at the backend database
9. Steps of the Project
Use video and text sources to become acquainted with CouchDB
Download CouchDB software onto the local machine, which will act as our DBMS
Use the cURL command-line utility in conjunction with the HTTP API, as opposed to the built-in
“Futon” GUI that comes with CouchDB (the API allows for interaction with the hosting or
remote server, databases, documents, as well as replication)
1. Create empty IT employee database
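As a rough Python equivalent of the cURL call for this step, the sketch below builds the PUT request that creates a database. The helper name `create_db_request` is hypothetical, the request is only constructed and not sent, and a real server may additionally require admin credentials.

```python
import urllib.request

COUCH_URL = "http://127.0.0.1:5984"   # address used in the demo URLs of this deck
DB_NAME = "it_employee_database"

def create_db_request(base_url, db_name):
    """Build (but do not send) the PUT request that creates a database."""
    return urllib.request.Request(f"{base_url}/{db_name}", method="PUT")

req = create_db_request(COUCH_URL, DB_NAME)

# To actually send it against a running CouchDB instance:
#     urllib.request.urlopen(req)
```

The database name is simply the first path segment of the URI, which is why creating it is a PUT to that path.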
10. Steps of the Project
2. Use Mockaroo to generate employee objects in JSON format and bulk import the JSON
files as individual documents into the IT employee database (using the HTTP API)
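A minimal sketch of the bulk import, assuming the generated objects are already loaded as a Python list. It targets CouchDB's `_bulk_docs` endpoint, which accepts a JSON body with a `docs` array; the helper name `bulk_docs_request` and the employee field names are hypothetical, and the request is built without being sent.

```python
import json
import urllib.request

def bulk_docs_request(base_url, db_name, documents):
    """Build (but do not send) a POST to _bulk_docs carrying all documents."""
    payload = json.dumps({"docs": documents}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/{db_name}/_bulk_docs",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Hypothetical records; the two documents deliberately have different fields.
employees = [
    {"first_name": "Gloria", "last_name": "Young", "skills": ["Java", "SQL"]},
    {"first_name": "Henry", "last_name": "Armstrong"},
]

req = bulk_docs_request("http://127.0.0.1:5984", "it_employee_database", employees)
```

Since no `_id` is supplied, the server would be expected to assign one to each document on import.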
11. Steps of the Project
Employee documents contain similar properties but are not always uniform, which is
the beauty of a schema-less design that mirrors real-world data
(Slide compares two employee documents that have different attributes.)
12. Steps of the Project
Documents can also include and store attachments such as PDFs, images, videos
and sound recordings
What a document looks like in Futon
(These are not real people and all pictures are taken from Google)
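One way to attach a file is inline, as base64 data under the document's `_attachments` key. The sketch below assumes that approach; the helper `with_inline_attachment` is hypothetical and the "image" is just placeholder bytes.

```python
import base64

def with_inline_attachment(doc, filename, content_type, raw_bytes):
    """Return a copy of doc carrying raw_bytes as an inline base64 attachment."""
    updated = dict(doc)
    attachments = dict(updated.get("_attachments", {}))
    attachments[filename] = {
        "content_type": content_type,
        "data": base64.b64encode(raw_bytes).decode("ascii"),
    }
    updated["_attachments"] = attachments
    return updated

# Placeholder bytes standing in for a real image file.
fake_image = b"placeholder bytes, not a real JPEG"

doc = with_inline_attachment(
    {"first_name": "Henry", "last_name": "Armstrong"},
    "mugshot.jpg",
    "image/jpeg",
    fake_image,
)
```

Once stored, an attachment is addressed by appending its file name to the document's URI.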
13. Steps of the Project
3. Create special design document which will contain “Views” and “Shows”
Design document is also JSON formatted
There is a “views” key whose value is an object of named entries. The entry names act as the names of the different “views”, which is simply
CouchDB speak for queries. Each “view” object contains a map property and an optional reduce property whose values are
JavaScript functions acting on the database documents. Views are for:
Filtering the documents in your database to find those relevant to a particular process.
Extracting data from your documents and presenting it in a specific order.
Building efficient indexes to find documents by any value or structure that resides in them.
Use these indexes to represent relationships among documents.
Finally, with views you can make all sorts of calculations on the data in your documents. For example, a view can answer the question
of what your company’s spending was in the last week, month, or year.
The “shows” functionality is structured the same as the “views” within the design document
Show functions are also written in JavaScript and are used to render the database JSON documents into HTML web pages for a
client
The point is that it relieves the developer of some of the work of creating browser-side JavaScript for different platforms
The intention is to use a framework such as Ruby on Rails or Django to make HTTP requests to CouchDB and render the dynamic
content to the client browser
Popular uses of show functions also include outputting CSV files, PNG images, and XML needed for compatibility with a particular
interface
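A sketch of what such a design document might look like, built here as a Python dictionary and serialized to JSON. The view and show names match the demo URLs in this deck, but the JavaScript bodies and the document field names (`first_name`, `last_name`, `skills`) are assumptions, not the project's actual code.

```python
import json

# The JavaScript functions are stored as strings inside the JSON document.
design_doc = {
    "_id": "_design/example",
    "language": "javascript",
    "views": {
        "getName": {
            "map": (
                "function (doc) {"
                "  if (doc.first_name) {"
                "    emit(doc._id, doc.first_name + ' ' + doc.last_name);"
                "  }"
                "}"
            ),
        },
        "getSkills": {
            "map": (
                "function (doc) {"
                "  (doc.skills || []).forEach(function (s) { emit(s, 1); });"
                "}"
            ),
            "reduce": "_count",  # built-in reduce that counts rows per key
        },
    },
    "shows": {
        "summary": (
            "function (doc, req) {"
            "  return '<h1>' + doc.first_name + ' ' + doc.last_name + '</h1>';"
            "}"
        ),
    },
}

body = json.dumps(design_doc, indent=2)
parsed = json.loads(body)
```

Each map function guards against missing fields, since documents in a schema-free database are not guaranteed to share the same attributes.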
14. Steps of the Project
The design document in Futon
15. Steps of the Project
4. Test and implement code
5. Run “views” and “shows” queries through HTTP API
6. Make any necessary alterations through HTTP API using cURL utility
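Altering an existing document means writing a new revision, so the update must carry the `_rev` of the copy it was based on. The sketch below builds such a PUT request without sending it; `update_request`, the document ID and the revision string are hypothetical placeholders.

```python
import json
import urllib.request

def update_request(base_url, db_name, current_doc, changes):
    """Build (but do not send) the PUT that writes a new revision of a document."""
    new_doc = {**current_doc, **changes}  # keeps _id and _rev from the fetched copy
    url = f"{base_url}/{db_name}/{current_doc['_id']}"
    return urllib.request.Request(
        url,
        data=json.dumps(new_doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )

# Placeholder for a document previously fetched with GET.
fetched = {"_id": "emp-001", "_rev": "1-abc", "first_name": "Gloria", "skills": ["Java"]}

req = update_request(
    "http://127.0.0.1:5984",
    "it_employee_database",
    fetched,
    {"skills": ["Java", "SQL"]},
)
sent = json.loads(req.data)
```

If the `_rev` no longer matches the current revision on the server, the update would be rejected as a conflict rather than silently overwriting someone else's change.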
17. Demo
Queries through the API
Views
Get the names of the employees
http://127.0.0.1:5984/it_employee_database/_design/example/_view/getName
Get the employees and their skills
http://127.0.0.1:5984/it_employee_database/_design/example/_view/getSkills
Search the database and return all employee candidates who have a criminal record
http://127.0.0.1:5984/it_employee_database/_design/example/_view/getCriminals
AND…let's look at an example of a mug shot of ‘Henry Armstrong’!
http://127.0.0.1:5984/it_employee_database/10d44adbfcd104e10ea4561d28113fd9/mugshot.jpg
Say we want to find someone with a certain set of skills to create a full stack developer or a lead developer
http://127.0.0.1:5984/it_employee_database/_design/example/_view/getFullStackDev
Shows
Usually meant to be run on individual documents, this is a static HTML rendition of the JSON document for employee candidate ‘Gloria Young’
http://127.0.0.1:5984/it_employee_database/_design/example/_show/summary/10d44adbfcd104e10ea4561d28081664
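The demo URLs follow a regular pattern, and views also accept query parameters such as `key` and `limit`, whose values are JSON-encoded. A small sketch of building those URLs follows; the helper `view_url` is hypothetical, and treating `getSkills` as keyed by skill name is an assumption about the project's view.

```python
import json
from urllib.parse import urlencode

def view_url(base_url, db_name, design, view, **params):
    """Build a view URL; parameter values are JSON-encoded, then URL-encoded."""
    path = f"{base_url}/{db_name}/_design/{design}/_view/{view}"
    if not params:
        return path
    encoded = urlencode({name: json.dumps(value) for name, value in params.items()})
    return f"{path}?{encoded}"

BASE = "http://127.0.0.1:5984"
DB = "it_employee_database"

# Same URL as the first demo query.
plain = view_url(BASE, DB, "example", "getName")

# Restrict a view to one key and cap the number of rows returned.
filtered = view_url(BASE, DB, "example", "getSkills", key="Python", limit=5)
```

JSON-encoding matters because a string key has to arrive with its quotes, which is why the quote characters show up percent-encoded in the final URL.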
Atomic:
Transactions to the database are guaranteed to write to disk. This can sometimes cause issues if there is a lot of traffic to a database
Say for example someone wants to modify a table and update a row. The database will lock everyone else out of reading that row until it is updated
Eventual Consistency
An idea of loose coupling. A client stores transactions to some sort of queue and eventually the queue writes the transaction to the server
MVCC Eventual Consistency
Versioning example
Consider a set of requests wanting to access a document. While the first request reads the document, a second request changes the document. This creates a whole new version of the document that CouchDB appends to the database without having to wait for the read request to finish. When a third request wants to read the same document, CouchDB will point it to the new version that has just been written.
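The three-request scenario can be imitated with a tiny in-memory store. This is a toy model under simplifying assumptions: `VersionedStore` and `Conflict` are hypothetical names, revisions are plain integers rather than CouchDB's revision strings, and nothing here reflects CouchDB's actual storage engine.

```python
class Conflict(Exception):
    """Raised when a write is based on a revision that is no longer current."""

class VersionedStore:
    def __init__(self):
        self._revisions = {}  # doc_id -> append-only list of (rev, body)

    def read(self, doc_id):
        """Return the latest revision number and a copy of its body."""
        rev, body = self._revisions[doc_id][-1]
        return rev, dict(body)

    def write(self, doc_id, body, based_on=0):
        """Append a new revision if based_on matches the current one."""
        history = self._revisions.setdefault(doc_id, [])
        current = history[-1][0] if history else 0
        if based_on != current:
            raise Conflict(f"{doc_id}: expected revision {current}, got {based_on}")
        history.append((current + 1, dict(body)))
        return current + 1

store = VersionedStore()
store.write("doc", {"status": "draft"})               # creates revision 1

reader_rev, snapshot = store.read("doc")              # first request reads revision 1
store.write("doc", {"status": "final"}, based_on=1)   # second request appends revision 2
latest_rev, latest = store.read("doc")                # third request sees revision 2

# A writer still holding revision 1 is rejected instead of overwriting revision 2.
try:
    store.write("doc", {"status": "stale"}, based_on=1)
    stale_write_rejected = False
except Conflict:
    stale_write_rejected = True
```

The first reader keeps its own copy of revision 1 untouched, the write never waits for it, and later readers are handed the newest revision.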