24-NoSQL continued.pptx

NoSQL
continued
CMSC 461
Michael Wilson

MongoDB
 MongoDB is another NoSQL solution
 Provides a bit more structure than a solution
like Accumulo
 Data is stored as BSON (Binary JSON)
 Binary encoded JSON, extends JSON
 Allows storage of large amounts of data

SQL vs. MongoDB
 SQL has databases, tables, rows, columns
 MongoDB has databases, collections,
documents, fields
 Both have primary keys, indexes
 Collection structures are not enforced heavily
 No schema is declared up front; an insert
implicitly creates the collection

Interacting with MongoDB
 Multiple databases within MongoDB
 Switch databases
 use newDb
 A new database is not actually stored until its first insert
 Create collection
 db.createCollection("collectionName")
 Not necessary, collections are implicitly created
on insert

BSON
 MongoDB uses BSON very heavily
 Binary JSON
 Like JSON with a binary serialization method
 Has extensions so that it can represent data
types that JSON cannot
 Used to represent documents, provide input
to queries
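To make "JSON with a binary serialization" concrete, here is a minimal sketch of a BSON encoder for Node.js. It handles only flat documents whose values are strings or 32-bit integers; `encodeBson` is a hypothetical helper written for illustration, not part of any MongoDB driver. Note that BSON tags the value as an int32, a distinction plain JSON cannot express.

```javascript
// Minimal BSON encoder sketch: strings (type 0x02) and int32 (type 0x10) only.
// Integers outside the 32-bit range make writeInt32LE throw a RangeError.
function encodeBson(doc) {
  const parts = [];
  for (const [key, value] of Object.entries(doc)) {
    const name = Buffer.from(key + "\0", "utf8"); // element name as a C string
    if (typeof value === "string") {
      const text = Buffer.from(value + "\0", "utf8");
      const len = Buffer.alloc(4);
      len.writeInt32LE(text.length); // byte length includes the trailing NUL
      parts.push(Buffer.from([0x02]), name, len, text);
    } else if (Number.isInteger(value)) {
      const num = Buffer.alloc(4);
      num.writeInt32LE(value);
      parts.push(Buffer.from([0x10]), name, num);
    } else {
      throw new TypeError(`unsupported value for ${key}`);
    }
  }
  const body = Buffer.concat(parts);
  const header = Buffer.alloc(4);
  // Total size = 4-byte header + elements + 1-byte terminator.
  header.writeInt32LE(4 + body.length + 1);
  return Buffer.concat([header, body, Buffer.from([0x00])]);
}

const helloBytes = encodeBson({ hello: "world" });
console.log(helloBytes.toString("hex"));
// prints 160000000268656c6c6f0006000000776f726c640000
```

The leading `16000000` is the document length (22 bytes) as a little-endian int32, which lets a reader skip a whole document without parsing it.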

Selects/queries
 In MongoDB, querying typically consists of
providing an appropriately crafted BSON document
 SELECT * FROM collectionName
 db.collectionName.find()
 SELECT * FROM collectionName WHERE field =
value
 db.collectionName.find( {field: value} )
 SELECT * FROM collectionName WHERE field > 5
 db.collectionName.find( {field: {$gt: 5} } )
 Other functions that take a query argument use
the same query format
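The query documents above can be read as predicates over documents. The following is a minimal in-memory sketch of that idea, not MongoDB's implementation: `docs`, `matches`, and `find` are hypothetical names, and only plain equality and `$gt` are supported.

```javascript
// In-memory stand-in for a collection; names and values are made up.
const docs = [
  { name: "a", field: 3 },
  { name: "b", field: 5 },
  { name: "c", field: 8 },
];

// Returns true when doc satisfies every condition in the query document.
// Supports only plain equality and the $gt operator.
function matches(doc, query) {
  return Object.entries(query).every(([key, cond]) => {
    if (cond !== null && typeof cond === "object" && "$gt" in cond) {
      return doc[key] > cond.$gt;
    }
    return doc[key] === cond;
  });
}

// Hypothetical helper mirroring the shape of db.collectionName.find(query).
function find(collection, query = {}) {
  return collection.filter((doc) => matches(doc, query));
}

const allDocs = find(docs);                          // SELECT * FROM docs
const equalFive = find(docs, { field: 5 });          // ... WHERE field = 5
const aboveFive = find(docs, { field: { $gt: 5 } }); // ... WHERE field > 5
```

An empty query document matches everything, which is why `find()` with no argument behaves like `SELECT *`.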

 Insert
 db.collectionName.insert( {documentBSON} )
 Update
 db.collectionName.update( {queryBSON},
{updateBSON}, {optionBSON} )
 updateBSON
 Set field to 5: {$set: {field: 5}}
 Increment field by 1: {$inc: {field: 1}}
 optionBSON
 Options that determine whether to create a new
document when none matches (upsert), update more than
one document (multi), and the write concern (writeConcern)
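The interplay of the query, update, and option documents can be sketched in memory as follows. This is an illustration under simplifying assumptions, not MongoDB's code: `matchesEq`, `applyUpdate`, and `update` are hypothetical helpers, queries are equality-only, and only `$set`, `$inc`, `multi`, and `upsert` are modeled.

```javascript
// Equality-only query matching (a simplification).
function matchesEq(doc, query) {
  return Object.entries(query).every(([key, value]) => doc[key] === value);
}

// Applies $set and $inc from an update document to a copy of doc.
function applyUpdate(doc, updateDoc) {
  const result = { ...doc };
  for (const [key, value] of Object.entries(updateDoc.$set || {})) {
    result[key] = value;
  }
  for (const [key, amount] of Object.entries(updateDoc.$inc || {})) {
    result[key] = (result[key] || 0) + amount; // a missing field starts at 0
  }
  return result;
}

// Updates the first match only unless options.multi is set;
// inserts a new document when nothing matches and options.upsert is set.
function update(collection, query, updateDoc, options = {}) {
  let matched = 0;
  for (let i = 0; i < collection.length; i++) {
    if (!matchesEq(collection[i], query)) continue;
    collection[i] = applyUpdate(collection[i], updateDoc);
    matched++;
    if (!options.multi) break;
  }
  if (matched === 0 && options.upsert) {
    collection.push(applyUpdate({ ...query }, updateDoc));
  }
  return matched;
}

const items = [{ id: 1, field: 1 }, { id: 2, field: 1 }];
update(items, { field: 1 }, { $inc: { field: 1 } });                  // first match only
update(items, { field: 1 }, { $set: { field: 5 } }, { multi: true }); // all remaining matches
update(items, { id: 9 }, { $set: { field: 5 } }, { upsert: true });   // no match, so insert
```

After these three calls, `items` holds `{id: 1, field: 2}`, `{id: 2, field: 5}`, and the upserted `{id: 9, field: 5}`.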

 Delete
 db.collectionName.remove( {queryBSON} )

Apache Hive
 Also runs on Hadoop, uses HDFS as a data
store
 Queryable like SQL
 Using an SQL-inspired language, HiveQL

Hive data organization
 Databases
 Tables
 Partitions
 Tables are broken down into partitions
 Partition keys allow data to be stored into
separate data files on HDFS
 Can query on particular partitions
 Buckets
 Can bucket by column to sample data

Purpose of Hive
 Provide analytics, query large volumes of data
 NOT to be used for real time queries like
Postgres or Oracle
 Hive queries are slow (high latency per query)
 Partitions and buckets can help reduce this
amount of time

Hive queries
 Hive queries actually generate MapReduce
jobs
 MapReduce jobs take a while to set up and run
 MapReduce jobs can be written and run by hand,
but for structured data and analytics, Hive can
generate them from a query
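To see why a query becomes a MapReduce job, consider an aggregate such as `SELECT page, COUNT(*) FROM visits GROUP BY page`. The sketch below runs the three conceptual phases in memory; the `visits` table and its rows are made up, and this illustrates the general pattern rather than the plan Hive actually generates.

```javascript
// Rows of a hypothetical table visits(page, user).
const rows = [
  { page: "/home", user: "ann" },
  { page: "/docs", user: "bob" },
  { page: "/home", user: "cat" },
];

// Map phase: emit one (key, 1) pair per input row.
function mapPhase(input) {
  return input.map((row) => [row.page, 1]);
}

// Shuffle phase: group the emitted values by key.
function shuffle(pairs) {
  const groups = new Map();
  for (const [key, value] of pairs) {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  }
  return groups;
}

// Reduce phase: sum the values for each key, giving COUNT(*) per group.
function reducePhase(groups) {
  const out = {};
  for (const [key, values] of groups) {
    out[key] = values.reduce((sum, v) => sum + v, 0);
  }
  return out;
}

const counts = reducePhase(shuffle(mapPhase(rows)));
console.log(counts); // { '/home': 2, '/docs': 1 }
```

On a cluster each phase is distributed across machines, which is where the setup cost mentioned above comes from.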
