NoSQL
continued
CMSC 461
Michael Wilson
MongoDB
 MongoDB is another NoSQL solution
 Provides a bit more structure than a solution
like Accumulo
 Data is stored as BSON (Binary JSON)
 Binary encoded JSON, extends JSON
 Allows storage of large amounts of data
SQL vs. MongoDB
 SQL has databases, tables, rows, columns
 MongoDB has databases, collections,
documents, fields
 Both have primary keys, indexes
 Collection structures are not enforced heavily
 Inserts implicitly define a document's structure; no schema is declared in advance
Interacting with MongoDB
 Multiple databases within MongoDB
 Switch databases
 use newDb
 A new database is only persisted once data is inserted into it
 Create collection
 db.createCollection("collectionName")
 Not necessary, collections are implicitly created
on insert
BSON
 MongoDB uses BSON very heavily
 Binary JSON
 Like JSON with a binary serialization method
 Has extensions so that it can represent data
types that JSON cannot
 Used to represent documents, provide input
to queries
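One type that plain JSON cannot represent is a date, which BSON does define. The sketch below uses only standard JavaScript (no MongoDB driver) to show the information lost in a plain JSON round trip. The document and its field names are hypothetical.

```javascript
// A hypothetical document with a date field, as held in memory.
const doc = { name: "example", createdAt: new Date("2015-04-01T00:00:00Z") };

// Plain JSON has no date type, so the Date is serialized as a string
// and remains a string after parsing.
const roundTripped = JSON.parse(JSON.stringify(doc));

const isDateBefore = doc.createdAt instanceof Date;
const isDateAfter = roundTripped.createdAt instanceof Date;

console.log(isDateBefore, isDateAfter, typeof roundTripped.createdAt);
// prints: true false string
```

A format with a native date type avoids this loss, which is one reason a binary extension of JSON is useful for a database.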
Selects/queries
 In MongoDB, querying typically consists of
providing an appropriately crafted BSON document
 SELECT * FROM collectionName
 db.collectionName.find()
 SELECT * FROM collectionName WHERE field =
value
 db.collectionName.find( {field: value} )
 SELECT * FROM collectionName WHERE field > 5
 db.collectionName.find( {field: {$gt: 5} } )
 Other functions that take a query argument
use this same query format
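To make the mapping from query documents to WHERE clauses concrete, here is a minimal in-memory sketch in plain JavaScript. The `matches` helper is hypothetical and is not how MongoDB evaluates queries. It handles only the equality form and the `$gt` operator shown above.

```javascript
// Hypothetical helper: returns true if `doc` satisfies `query`.
// Supports only {field: value} and {field: {$gt: n}}.
function matches(doc, query) {
  return Object.entries(query).every(([field, condition]) => {
    const value = doc[field];
    if (condition !== null && typeof condition === "object") {
      // Operator form, e.g. {$gt: 5}
      return Object.entries(condition).every(([op, operand]) => {
        if (op === "$gt") return value > operand;
        throw new Error("unsupported operator: " + op);
      });
    }
    // Plain form, e.g. {field: 5}
    return value === condition;
  });
}

const docs = [{ field: 3 }, { field: 5 }, { field: 8 }];

// An empty query matches everything, like find() with no argument.
const all = docs.filter((d) => matches(d, {}));
// WHERE field = 5
const equalFive = docs.filter((d) => matches(d, { field: 5 }));
// WHERE field > 5
const greaterThanFive = docs.filter((d) => matches(d, { field: { $gt: 5 } }));

console.log(all.length, equalFive, greaterThanFive);
```

The point of the sketch is that a query is itself a document: each key names a field and each value is either a literal to compare against or a nested document of operators.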
Interacting with MongoDB
 Insert
 db.collectionName.insert( {documentBSON} )
 Update
 db.collectionName.update( {queryBSON},
{updateBSON}, {optionBSON} )
 updateBSON
 Set field to 5: {$set: {field: 5}}
 Increment field by 1: {$inc: {field: 1}}
 optionBSON
 Options such as upsert (create a new document when
none matches), multi (update more than one document),
and writeConcern
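The effect of the two update operators can be sketched on a plain JavaScript object. The `applyUpdate` helper below is hypothetical and runs entirely in memory. It only illustrates what `$set` and `$inc` do to a single matched document.

```javascript
// Hypothetical helper: applies an update document to a copy of `doc`.
// Supports only the $set and $inc operators.
function applyUpdate(doc, update) {
  const result = Object.assign({}, doc);
  for (const [op, fields] of Object.entries(update)) {
    for (const [field, arg] of Object.entries(fields)) {
      if (op === "$set") {
        // Overwrite (or create) the field with the given value.
        result[field] = arg;
      } else if (op === "$inc") {
        // A missing field is treated as 0, so $inc creates it.
        const current = field in result ? result[field] : 0;
        result[field] = current + arg;
      } else {
        throw new Error("unsupported operator: " + op);
      }
    }
  }
  return result;
}

const original = { field: 2, other: "x" };
const afterSet = applyUpdate(original, { $set: { field: 5 } });
const afterInc = applyUpdate(original, { $inc: { field: 1 } });

console.log(afterSet, afterInc);
// prints: { field: 5, other: 'x' } { field: 3, other: 'x' }
```

Fields not named in the update document are left untouched, which is the main difference from replacing the whole document.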
Interacting with MongoDB
 Delete
 db.collectionName.remove( {queryBSON} )
Apache Hive
 Also runs on Hadoop, uses HDFS as a data
store
 Queryable like SQL
 Using an SQL-inspired language, HiveQL
Hive data organization
 Databases
 Tables
 Partitions
 Tables are broken down into partitions
 Partition keys allow data to be stored into
separate data files on HDFS
 Can query on particular partitions
 Buckets
 Can bucket by column to sample data
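The layout described above can be sketched as two small functions. This is an illustration under stated assumptions, not Hive code: the helper names, table directory, and column values are hypothetical, and the identity hash used for bucketing is an assumption made for the sketch. The `key=value` directory naming follows Hive's usual convention for partition directories.

```javascript
// Hypothetical helper: directory holding one partition of a table.
// Each partition value gets its own directory, named key=value.
function partitionPath(tableDir, partitionKey, partitionValue) {
  return tableDir + "/" + partitionKey + "=" + partitionValue;
}

// Hypothetical helper: bucket index for an integer column value.
// Bucketing assigns a row to hash(column) mod numBuckets; using the
// value itself as the hash is an assumption of this sketch.
function bucketFor(value, numBuckets) {
  return ((value % numBuckets) + numBuckets) % numBuckets;
}

const path = partitionPath("/warehouse/page_views", "dt", "2015-04-01");
const bucket = bucketFor(42, 4);

console.log(path, bucket);
// prints: /warehouse/page_views/dt=2015-04-01 2
```

A query restricted to one partition only needs to read the files under that directory, and reading a single bucket gives a sample of roughly 1/numBuckets of the rows.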
Purpose of Hive
 Provide analytics, query large volumes of data
 NOT intended for real-time queries, unlike
Postgres or Oracle
 Hive queries are slow (high latency per query)
 Partitions and buckets can help reduce this
amount of time
Hive queries
 Hive queries actually generate MapReduce
jobs
 MapReduce jobs take a while to set up and run
 MapReduce jobs can be written by hand, but for
structured data and analytics, Hive can
generate them from a query instead
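To show the shape of the work such a job performs, the sketch below runs the map, shuffle, and reduce phases in memory for a query like SELECT page, COUNT(*) FROM page_views GROUP BY page. The table, column, and rows are hypothetical, and this is not the plan Hive actually generates. It only illustrates the MapReduce pattern.

```javascript
// Hypothetical input rows, standing in for records read from HDFS.
const rows = [{ page: "home" }, { page: "about" }, { page: "home" }];

// Map phase: emit one (key, 1) pair per row.
const mapped = rows.map((row) => [row.page, 1]);

// Shuffle phase: group the emitted values by key.
const grouped = new Map();
for (const [key, value] of mapped) {
  if (!grouped.has(key)) grouped.set(key, []);
  grouped.get(key).push(value);
}

// Reduce phase: sum the values collected for each key.
const counts = {};
for (const [key, values] of grouped) {
  counts[key] = values.reduce((sum, v) => sum + v, 0);
}

console.log(counts);
// prints: { home: 2, about: 1 }
```

On a cluster each phase is distributed across machines and scheduled as a job, which is where the setup cost mentioned above comes from.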
