Apache CouchDB…relax
BY JACOB DIAMOND
First Off, What is NoSQL?
 Class of database management systems that do not follow all of the rules of a traditional relational
DBMS.
 Systems typically used in very large databases which are prone to performance problems caused
by SQL and the relational model of databases (Techopedia)
 The term is misleading: it is usually read as “Not Only SQL”, because some systems support SQL-like
query languages
 Main points: non-relational, distributed, open-source and horizontally scalable
 Additional points: Other characteristics that often apply include schema free, easy-replication
support, simple API, eventual consistency, and a huge amount of data
 Triggered by the needs of Web 2.0 companies such as Facebook, Google and Amazon.com who
deal with gigantic volumes of data
 Designed to support requirements of modern Web, Mobile, and Internet of Things applications and
built to overcome the scale, performance, data model, and data distribution limitations of relational
databases
 Motivations: simplicity of design, simpler and faster scaling to clusters of machines (which is a problem for relational
databases), and finer control over availability (Wikipedia)
 Types:
 Document: CouchDB, MongoDB
 Columnar: Cassandra
 Key-Value: Redis, MemcacheDB, DynamoDB
 Graph: AllegroGraph, Neo4j
RDBMS vs. NoSQL
 Scale up vs. scale out
 Scaling out means clustering servers as needed, as opposed to scaling up by adding memory or
CPU to a central machine or data center
 Structured vs semi-structured and unstructured
 Rows/columns vs. JSON vs. images, video, text blobs
 Atomic transactions vs. eventual consistency
 Atomic: either every operation in a transaction is applied to the database or none is
 Eventual: replicas are not guaranteed to apply an update at exactly the same time, so reads may
briefly differ before the copies converge (though the difference is usually milliseconds)
What Is CouchDB and Why Use It?
 Project created in April 2005 by Damien Katz, a former developer at IBM
 Defined as a “storage system for a large scale object database”
 Meant to serve web applications
 Written in the Erlang programming language
 Schema-free document model
 Don’t have to model your data up front as in a relational database
 Uses JSON to store data
 Query language is JavaScript using MapReduce
 ACID Semantics
 Implements a form of Multi-Version Concurrency Control (MVCC), meaning CouchDB can handle a high volume of concurrent readers and writers without locking
 Documents are updated with revisions, which is how conflicting writes are detected
 Map/Reduce Views and Indexes
 Stored data is structured using views
 Views are created by using a JavaScript function as the map half of the map/reduce operation
 The map function takes a document as a parameter and emits zero or more key/value pairs
 CouchDB indexes views and keeps them updated as documents are added, removed or altered
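To make the map half of a view concrete, here is a minimal sketch that runs outside CouchDB. The documents and field names are invented for illustration, and `emit` is passed in as a parameter so the function can run in plain Node.js (inside CouchDB, `emit` is provided as a global). The sort is a simple string comparison standing in for CouchDB's own key ordering.

```javascript
// Hypothetical documents; the field names are assumptions for illustration.
const docs = [
  { _id: "a", name: "Ann", skills: ["Erlang", "SQL"] },
  { _id: "b", name: "Bob", skills: ["JavaScript"] },
  { _id: "c", title: "document without a name or skills" },
];

// A map function receives one document and calls emit(key, value)
// zero or more times. Documents that do not match emit nothing.
function mapBySkill(doc, emit) {
  if (doc.name && Array.isArray(doc.skills)) {
    doc.skills.forEach(function (skill) {
      emit(skill, doc.name);
    });
  }
}

// Stand-in for the view indexer: run the map function over every
// document, collect the emitted rows, and sort them by key.
function buildView(allDocs, mapFn) {
  const rows = [];
  for (const doc of allDocs) {
    mapFn(doc, (key, value) => rows.push({ id: doc._id, key, value }));
  }
  return rows.sort((x, y) => (x.key < y.key ? -1 : x.key > y.key ? 1 : 0));
}

const skillRows = buildView(docs, mapBySkill);
console.log(skillRows);
```

Note that one document can produce several rows and another can produce none, which is why the map function is better described as emitting rows than as returning a value.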
 Distributed Architecture with Replication
 Bi-directional replication and off-line operation
 Multiple replicas of a CouchDB database each hold their own copy of the same data, modify it independently, and sync the changes at a later time
 REST HTTP API
 Every database item has a unique URI, and the resource at that URI can be read or modified using HTTP methods such as POST, GET, PUT, and DELETE
 Eventual Consistency
 Guarantees eventual consistency
 Perfect for self-contained documents such as an invoice
 In a relational DB you would have each invoice stored in a table as a row that refers to other rows in other tables - one row for
seller info, one for buyer, one row for each item billed, and more rows still to describe item details, manufacturer details and so
forth
 In CouchDB the invoice mimics a real-world invoice: all of its data is contained within one document
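As a rough illustration of a self-contained invoice document, the sketch below keeps seller, buyer, and line items in one JSON-compatible object. Every field name and value is made up for the example; nothing here comes from a real schema.

```javascript
// Hypothetical invoice document; all names and values are invented.
const invoice = {
  _id: "invoice-0001",
  seller: { name: "Acme Supplies", city: "Springfield" },
  buyer: { name: "Jane Doe", city: "Shelbyville" },
  items: [
    { description: "Keyboard", quantity: 2, unitPrice: 25 },
    { description: "Monitor", quantity: 1, unitPrice: 150 },
  ],
};

// Everything needed to total the invoice lives in the document itself,
// so no joins against other tables are required.
function invoiceTotal(doc) {
  return doc.items.reduce(
    (sum, item) => sum + item.quantity * item.unitPrice,
    0
  );
}

console.log(invoiceTotal(invoice)); // 200
```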
Use Cases of CouchDB
 Replication and synchronization capabilities make it ideal for mobile devices,
where a network connection is not guaranteed but the application must keep working
offline
 Well suited for applications with accumulating, occasionally changing data, on which
pre-defined queries are to be run and where versioning is important
 Enterprises that use it
 Amadeus IT Group, for some of their backend systems
 Ubuntu used it for their synchronization service “Ubuntu One” from 2009-2011
 The BBC, for its dynamic content platforms
 Credit Suisse, for internal use at commodities department for their marketplace framework
 Sophos, for some of their backend systems
Steps of the Project
 Scenario: As an employer you have access to a database with potential IT
employees to hire for your company. The database contains documents for each
individual.
 Documents contain similar descriptive data but not always exactly the same attributes.
Stay tuned for more.
 Ideally a front-end application, known in CouchDB lingo as a CouchApp, would be written
on top of the database. But for the purpose of the Database Applications class
we will just look at the backend database
 Use video and text sources to become acquainted with CouchDB
 Download CouchDB onto a local machine, which will act as our DBMS
 Use the cURL command-line tool with the HTTP API instead of the built-in
“Futon” GUI that ships with CouchDB (the API allows interaction with the hosting or
remote server, databases, documents, and replication)
1. Create an empty IT employee database
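Creating a database through the HTTP API is a single PUT to the database's URI. The sketch below only builds the request so it can be inspected without a running server; the host and port match the demo URLs used later, the helper name is hypothetical, and a given installation may also require admin credentials.

```javascript
// Address taken from the demo URLs; adjust for your own installation.
const COUCH_URL = "http://127.0.0.1:5984";

// Hypothetical helper: describes the request that creates a database.
function createDatabaseRequest(dbName) {
  return {
    method: "PUT",
    url: `${COUCH_URL}/${encodeURIComponent(dbName)}`,
  };
}

const createReq = createDatabaseRequest("it_employee_database");
console.log(createReq.method, createReq.url);

// To actually send it from Node.js 18+ (needs a running CouchDB):
// fetch(createReq.url, { method: createReq.method });
```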
2. Use Mockaroo to generate employee objects in JSON format and bulk import the JSON
files as individual documents into the IT employee database (using the HTTP API)
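One way to do the bulk import is CouchDB's `_bulk_docs` endpoint, which accepts a JSON body of the form `{"docs": [...]}`. The sketch below builds that request from an array of generated objects. The sample employees and the helper name are assumptions, and in practice the array would come from parsing the Mockaroo export.

```javascript
// Hypothetical sample of generated employee objects.
const generatedEmployees = [
  { first_name: "Ann", last_name: "Lee", skills: ["Python", "SQL"] },
  { first_name: "Bob", last_name: "Ray", skills: ["Java"] },
];

// Hypothetical helper: wraps the documents in the {"docs": [...]} envelope
// that the _bulk_docs endpoint expects.
function bulkDocsRequest(dbUrl, docs) {
  return {
    method: "POST",
    url: `${dbUrl}/_bulk_docs`,
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ docs: docs }),
  };
}

const bulkReq = bulkDocsRequest(
  "http://127.0.0.1:5984/it_employee_database",
  generatedEmployees
);
console.log(bulkReq.url);

// To send it from Node.js 18+ (needs a running CouchDB):
// fetch(bulkReq.url, bulkReq);
```

Because no `_id` is supplied, CouchDB would assign one to each document on insert.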
 Employee documents share similar properties but are not always uniform, which is
the appeal of a schema-less design: it models real-world data realistically
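To illustrate documents that are similar but not uniform, the sketch below compares two invented candidates and lists the fields one has that the other lacks. The field names are assumptions, not the actual attributes used in the project.

```javascript
// Two hypothetical candidate documents with overlapping but different fields.
const candidates = [
  { _id: "1", first_name: "Ann", skills: ["Python"], criminal_record: false },
  { _id: "2", first_name: "Bob", skills: ["Java"], address: "12 Elm St" },
];

// Returns the field names present in `other` but absent from `doc`.
function fieldsMissingFrom(doc, other) {
  return Object.keys(other).filter((key) => !(key in doc));
}

console.log(fieldsMissingFrom(candidates[0], candidates[1])); // [ 'address' ]
console.log(fieldsMissingFrom(candidates[1], candidates[0])); // [ 'criminal_record' ]
```

Both documents can live in the same database without any schema change; views simply skip documents that lack the field they look for.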
 Documents can also include and store attachments such as PDFs, images, videos
and sound recordings
What a document looks like in Futon
(These are not real people and all pictures are taken from Google)
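An attachment is addressed as a sub-resource of its document, and uploading one to an existing document is a PUT that carries the document's current revision. The sketch below builds such a request. The document id and file name are taken from the demo URL, while the revision string and helper name are placeholders.

```javascript
// Hypothetical helper: describes the request that uploads an attachment
// to an existing document. The body (the file bytes) is left out here.
function attachmentRequest(dbUrl, docId, rev, fileName, contentType) {
  const path = `${encodeURIComponent(docId)}/${encodeURIComponent(fileName)}`;
  return {
    method: "PUT",
    url: `${dbUrl}/${path}?rev=${encodeURIComponent(rev)}`,
    headers: { "Content-Type": contentType },
  };
}

const attachReq = attachmentRequest(
  "http://127.0.0.1:5984/it_employee_database",
  "10d44adbfcd104e10ea4561d28113fd9",
  "1-abc", // placeholder revision; use the document's real _rev
  "mugshot.jpg",
  "image/jpeg"
);
console.log(attachReq.url);
```

A plain GET on the same URL without the `rev` parameter retrieves the attachment.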
3. Create a special design document which will contain “Views” and “Shows”
 The design document is also JSON formatted
 There is a “views” key whose value is an object of named entries. The keys of that object are the names of the different “views”, which is simply
CouchDB speak for queries. Each “view” object contains a map property and an optional reduce property, whose values are
JavaScript functions acting on the database documents. Views are for:
 Filtering the documents in your database to find those relevant to a particular process.
 Extracting data from your documents and presenting it in a specific order.
 Building efficient indexes to find documents by any value or structure that resides in them.
 Using these indexes to represent relationships among documents.
 Finally, with views you can make all sorts of calculations on the data in your documents. For example, a view can answer the question
of what your company’s spending was in the last week, month, or year.
 The “shows” functionality is structured the same way as the “views” within the design document
 Show functions are also written in JavaScript and are used to render the database JSON documents into HTML web pages for a
client
 The point is that it relieves the developer of some of the work of creating browser-side JavaScript for different platforms
 The intention is to use a framework such as Ruby on Rails or Django to make HTTP requests to CouchDB and deliver the rendered
content to the client browser
 Popular uses of show functions also include outputting CSV files, PNG images, and XML needed for compatibility with a particular
interface
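Putting the pieces together, a design document along these lines might look like the sketch below. The design document name (`example`), the view name (`getName`), and the show name (`summary`) come from the demo URLs; the function bodies and the field names `first_name` and `last_name` are assumptions. Functions are stored in the document as strings, so the sketch writes them as ordinary functions and converts them with `toString()`. Note that newer CouchDB releases have deprecated show functions, so this reflects the version used in the project.

```javascript
const designDoc = {
  _id: "_design/example",
  views: {
    getName: {
      // emit is supplied by CouchDB when the view runs; it is never
      // called here because the function is only converted to a string.
      map: (function (doc) {
        if (doc.first_name) {
          emit(doc._id, doc.first_name);
        }
      }).toString(),
    },
  },
  shows: {
    // A show function receives the document and the request and can
    // return an HTML string.
    summary: (function (doc, req) {
      return "<h1>" + doc.first_name + " " + doc.last_name + "</h1>";
    }).toString(),
  },
};

// The design document is stored like any other document:
// PUT http://127.0.0.1:5984/it_employee_database/_design/example
const designDocBody = JSON.stringify(designDoc);
console.log(designDocBody);
```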
The design document in Futon
4. Test and implement code
5. Run “views” and “shows” queries through HTTP API
6. Make any necessary alterations through HTTP API using cURL utility
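Running a view is a GET on its URI, optionally with query parameters such as `key` or `limit`, whose values are JSON-encoded. The sketch below builds such URLs; the helper name is hypothetical and the example key assumes a view that emits skill names as keys.

```javascript
// Hypothetical helper: builds the URL for querying a view.
// Parameter values are JSON-encoded and then URL-encoded.
function viewUrl(dbUrl, designName, viewName, params = {}) {
  const query = Object.entries(params)
    .map(([name, value]) => `${name}=${encodeURIComponent(JSON.stringify(value))}`)
    .join("&");
  const base = `${dbUrl}/_design/${designName}/_view/${viewName}`;
  return query ? `${base}?${query}` : base;
}

const dbUrl = "http://127.0.0.1:5984/it_employee_database";

// All rows of the view:
const allNames = viewUrl(dbUrl, "example", "getName");

// Only rows with a given key, capped at five results:
const erlangOnly = viewUrl(dbUrl, "example", "getSkills", {
  key: "Erlang",
  limit: 5,
});

console.log(allNames);
console.log(erlangOnly);
```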
Sources
 https://www.mockaroo.com/
 http://guide.couchdb.org/index.html
 http://nosql-database.org/
 http://www.couchbase.com/nosql-resources/what-is-no-sql
 http://www.datastax.com/
 https://www.techopedia.com/definition/27689/nosql-database
 http://www.tutorialspoint.com/couchdb/index.htm
 https://www.youtube.com/watch?v=TvRDOLiadtg
Demo
 Queries through the API
 Views
 Get the names of the employees
 http://127.0.0.1:5984/it_employee_database/_design/example/_view/getName
 Get the employees and their skills
 http://127.0.0.1:5984/it_employee_database/_design/example/_view/getSkills
 Search the database and return all employee candidates who have a criminal record
 http://127.0.0.1:5984/it_employee_database/_design/example/_view/getCriminals
 And…let's look at an example of a mug shot of ‘Henry Armstrong’!
 http://127.0.0.1:5984/it_employee_database/10d44adbfcd104e10ea4561d28113fd9/mugshot.jpg
 Say we want to find someone with a certain set of skills to fill a full-stack developer or lead developer role
 http://127.0.0.1:5984/it_employee_database/_design/example/_view/getFullStackDev
 Shows
 Usually meant to be run on individual documents; this is a static HTML rendition of the JSON document for employee candidate ‘Gloria Young’
 http://127.0.0.1:5984/it_employee_database/_design/example/_show/summary/10d44adbfcd104e10ea4561d28081664
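The source of the demo views is not reproduced in these slides, so the following is only a guess at how a view like `getFullStackDev` could be written. The required skill list and the `skills` and `name` fields are assumptions. As in CouchDB, the function emits a row only for matching documents; `emit` is passed as a parameter here so the sketch runs in plain Node.js.

```javascript
// Hypothetical map function: emit a candidate only if every required
// skill appears in the document's skills array.
function getFullStackDev(doc, emit) {
  var required = ["JavaScript", "SQL", "HTML"]; // assumed skill set
  if (!Array.isArray(doc.skills)) {
    return;
  }
  for (var i = 0; i < required.length; i++) {
    if (doc.skills.indexOf(required[i]) === -1) {
      return;
    }
  }
  emit(doc._id, doc.name);
}

// Try it against a few invented candidates.
const sampleCandidates = [
  { _id: "1", name: "Ann", skills: ["JavaScript", "SQL", "HTML", "CSS"] },
  { _id: "2", name: "Bob", skills: ["JavaScript", "HTML"] },
  { _id: "3", name: "Cy" },
];

const fullStackRows = [];
sampleCandidates.forEach(function (doc) {
  getFullStackDev(doc, function (key, value) {
    fullStackRows.push({ key: key, value: value });
  });
});

console.log(fullStackRows); // only Ann qualifies
```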
getName (shown in GUI Futon)
getSkills
getCriminals
getAddress
getFullStackDev
Summary (a show function)


Editor's Notes

  • #5 Atomic: Transactions to the database are guaranteed to be written to disk. This can sometimes cause issues if there is a lot of traffic to a database. Say, for example, someone wants to update a row in a table: the database will lock everyone else out of reading that row until it is updated. Eventual consistency: an idea of loose coupling. A client stores transactions in some sort of queue, and eventually the queue writes the transactions to the server.
  • #6 MVCC, eventual consistency, versioning example: Consider a set of requests wanting to access a document. While the first request reads the document, a second request changes it. This creates a whole new version of the document, which CouchDB appends to the database without having to wait for the read request to finish. When a third request wants to read the same document, CouchDB points it to the new version that has just been written.
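The versioning behaviour in the last note can be imitated with a small in-memory model. This is not CouchDB code: the class, the `N-sim` revision format, and the sample data are all invented, and the model only mirrors the rule that an update must carry the document's current revision or be rejected as a conflict (HTTP status 409 in CouchDB).

```javascript
// Toy store that imitates revision checking on update.
class TinyStore {
  constructor() {
    this.docs = new Map();
  }

  // Readers get their own copy and never block writers.
  get(id) {
    const doc = this.docs.get(id);
    return doc ? { ...doc } : null;
  }

  // A write succeeds only if it is based on the current revision.
  put(doc) {
    const current = this.docs.get(doc._id);
    if (current && current._rev !== doc._rev) {
      return { ok: false, status: 409 };
    }
    const generation = current ? parseInt(current._rev, 10) + 1 : 1;
    const rev = `${generation}-sim`;
    this.docs.set(doc._id, { ...doc, _rev: rev });
    return { ok: true, rev: rev };
  }
}

const store = new TinyStore();
const first = store.put({ _id: "emp1", name: "Henry Armstrong" });

// Request 1 reads the document and holds on to that version.
const readerCopy = store.get("emp1");

// Request 2 updates the document, creating a new revision.
const second = store.put({ ...store.get("emp1"), city: "Boston" });

// A write based on the old copy is rejected instead of overwriting.
const stale = store.put({ ...readerCopy, city: "Denver" });

// Request 3 reads and sees the new version.
console.log(first.rev, second.rev, stale.status, store.get("emp1").city);
```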