Good Morning
NoSQL
Mayank Singh
1316110115
CSE - sec ’B’
Agenda
• What is NoSQL?
• NoSQL Technology Breakdown
• Where is NoSQL a Killer App?
• What Good is Relational?
• NoSQL and CouchDB
• NoSQL, Relational, or Both?
What is NoSQL?
• Common traits
• Non-relational, Scalable
• Non-schematized/schema-free
• Eventual consistency
• Open source
• Distributed
• “Web scale”
• Developed at big Internet companies
Consistency
• CAP Theorem: a distributed database can fully guarantee only two of the following three
attributes at once: consistency, availability and partition tolerance
• Most NoSQL databases do not offer full “ACID” guarantees (atomicity, consistency,
isolation and durability)
• Instead they offer “eventual consistency,” similar to DNS propagation
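Eventual consistency can be illustrated with a small sketch. The `Replica` class, the `sync` helper and the last-write-wins rule below are illustrative assumptions, not the mechanism of any particular NoSQL product: two replicas briefly disagree, then converge once they exchange their writes.

```python
class Replica:
    """One copy of the data; stores (timestamp, value) per key."""

    def __init__(self):
        self.data = {}

    def write(self, key, value, timestamp):
        # Last-write-wins (an assumed conflict rule): keep the newest timestamp.
        current = self.data.get(key)
        if current is None or timestamp > current[0]:
            self.data[key] = (timestamp, value)

    def read(self, key):
        entry = self.data.get(key)
        return entry[1] if entry else None


def sync(source, target):
    """Copy every entry from source to target; target keeps the newer value."""
    for key, (timestamp, value) in source.data.items():
        target.write(key, value, timestamp)


a, b = Replica(), Replica()
b.write("status", "packed", timestamp=1)    # older write, seen only by b
a.write("status", "shipped", timestamp=2)   # newer write, seen only by a
stale_read = b.read("status")               # b still returns the old value

sync(a, b)
sync(b, a)
converged_read = b.read("status")           # both replicas now agree
```

Between the write and the sync, a reader of replica `b` gets stale data. That window is the price paid for staying available while replicas are out of touch.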
• Indexing
• Most NoSQL databases are indexed by key
• Some allow so-called “secondary” indexes
• Often the primary key indexes are clustered
• HBase uses the Hadoop Distributed File System, which is append-only
• Writes are logged
• Logged writes are batched
• File is re-created and sorted
• You get control back quickly
• Queries
• Typically no query language
• Instead, create procedural program
• Sometimes SQL is supported
• Sometimes MapReduce code is used…
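The append-only write path listed under Indexing above (log the write, batch the logged writes, re-create a sorted file) can be sketched in a few lines. `LogStructuredStore` and its `batch_size` parameter are hypothetical names for illustration. This is a toy model of the idea, not HBase's actual implementation.

```python
class LogStructuredStore:
    def __init__(self, batch_size=3):
        self.log = []            # append-only write log
        self.sorted_file = []    # sorted list of (key, value), rebuilt on flush
        self.batch_size = batch_size

    def put(self, key, value):
        # A write only appends to the log, so the caller gets control back fast.
        self.log.append((key, value))
        if len(self.log) >= self.batch_size:
            self.flush()

    def flush(self):
        # Merge the batched writes with the old file and re-create it sorted.
        merged = dict(self.sorted_file)
        merged.update(self.log)              # later writes win
        self.sorted_file = sorted(merged.items())
        self.log = []

    def get(self, key):
        # Check unflushed writes first (newest first), then the sorted file.
        for logged_key, value in reversed(self.log):
            if logged_key == key:
                return value
        return dict(self.sorted_file).get(key)


store = LogStructuredStore(batch_size=3)
store.put("b", 1)
store.put("a", 2)
store.put("b", 3)    # third write fills the batch and triggers a flush
```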
MapReduce
• Map step: split the query up
• Reduce step: merge the results
• Most typical of Hadoop and used with Wide Column Stores, esp. HBase
• Amazon Web Services’ Elastic MapReduce (EMR) can read/write
DynamoDB, S3, Relational Database Service (RDS)
• “Hive” add-on to Hadoop offers HiveQL, a SQL-like abstraction over MapReduce
• Use with Hive tables
• Use with HBase
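The map and reduce steps can be shown with a single-process word count. This is only a sketch of the programming model: the function names are illustrative, and a real Hadoop job would distribute the map, shuffle and reduce phases across many machines.

```python
from collections import defaultdict


def map_step(record):
    # Emit a (word, 1) pair for every word in one input record.
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    # Group all emitted values by key so each key is reduced once.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_step(key, values):
    # Merge the partial results for one key.
    return key, sum(values)


def map_reduce(records):
    mapped = [pair for record in records for pair in map_step(record)]
    return dict(reduce_step(key, values) for key, values in shuffle(mapped).items())


counts = map_reduce(["NoSQL scales out", "SQL scales up", "nosql and sql"])
```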
NoSQL Technology Breakdown
• Key-Value Stores
• The most common; not necessarily the most popular
• Has rows, each with something like a big dictionary/associative array
• Schema may differ from row to row
• Common on Cloud platforms
• e.g. Amazon SimpleDB, Azure Table Storage
• MemcacheDB, Voldemort
• DynamoDB (AWS), Dynomite, Redis and Riak
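A key-value store can be pictured as a dictionary of dictionaries. In the sketch below, `put`, `get` and the sample keys are made up for illustration, and the in-memory `store` stands in for a real service. Note how the two rows carry different attributes.

```python
store = {}  # key -> dictionary of attributes (the "row")


def put(key, **attributes):
    store[key] = dict(attributes)


def get(key):
    return store.get(key)


# No shared schema: each row carries whatever attributes it needs.
put("user:1", name="Asha", email="asha@example.com")
put("user:2", name="Ravi", followers=42, tags=["admin"])
```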
Wide Column Stores
• Has tables with declared column families
• Each column family has “columns,” which are key-value pairs that can vary from row
to row
• These are the most foundational for large sites
• Bigtable (Google)
• HBase (originally part of the Yahoo-dominated Hadoop project)
• Cassandra (Facebook)
• Calls column families “super columns” and tables “super column families”
• They are the most “Big Data”-ready
• Especially HBase + Hadoop
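The wide column layout can be modeled as nested dictionaries: row key, then declared column family, then free-form columns. `WideColumnTable` and the sample row keys are hypothetical. The sketch shows only the data model, not how Bigtable, HBase or Cassandra store it on disk.

```python
from collections import defaultdict


class WideColumnTable:
    def __init__(self, column_families):
        self.column_families = set(column_families)   # declared up front
        # row key -> column family -> {column name: value}
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row_key, family, column, value):
        if family not in self.column_families:
            raise KeyError(f"undeclared column family: {family}")
        self.rows[row_key][family][column] = value

    def get_family(self, row_key, family):
        return dict(self.rows[row_key][family])


pages = WideColumnTable(["content", "anchor"])
pages.put("com.example/index", "content", "html", "<html>...</html>")
pages.put("com.example/index", "anchor", "news.example.org", "Example home")
# A different row may use entirely different columns in the same family.
pages.put("com.example/about", "anchor", "blog.example.net", "About us")
```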
Document Stores
• Have “databases,” which are akin to tables
• Have “documents,” akin to rows
• Documents are typically JSON objects
• Each document has properties and values
• Values can be scalars, arrays, links to documents in other databases or sub-documents (i.e.
contained JSON objects - Allows for hierarchical storage)
• Can have attachments as well
• Old versions are retained
• So Doc Stores work well for content management
• Some view doc stores as specialized KV stores
• Most popular with developers, startups, VCs
• The biggies:
• CouchDB
• MongoDB
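A document of the kind described above might look like the following. The field names and values are invented for illustration. The point is that scalars, arrays, sub-documents and references to other documents all live in one JSON object.

```python
import json

article = {
    "_id": "article-001",                     # hypothetical document id
    "title": "What is NoSQL?",                # scalar value
    "tags": ["nosql", "databases"],           # array value
    "author": {"name": "A. Writer"},          # sub-document (hierarchical storage)
    "related": ["article-002"],               # reference to another document by id
}

# Documents travel as JSON text and come back as the same structure.
as_text = json.dumps(article)
restored = json.loads(as_text)
```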
Document Store Application Orientation
• Documents can each be addressed by URIs
• CouchDB supports full REST interface
• Very geared towards JavaScript and JSON
• Documents are JSON objects
• CouchDB/MongoDB use JavaScript as native language
• In CouchDB, views also have unique URIs and return JSON; “show” and “list” functions can render documents and view results as HTML
• So you can build entire applications in the database
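The URI addressing can be sketched by building the paths CouchDB uses for documents (`/{db}/{docid}`) and views (`/{db}/_design/{ddoc}/_view/{view}`). The host and port assume a local CouchDB on its default port 5984, and the database, design document and view names are made up. No request is sent.

```python
from urllib.parse import quote

BASE_URL = "http://localhost:5984"  # assumption: local CouchDB, default port


def document_uri(database, doc_id):
    # Every document has its own address: /{db}/{docid}
    return f"{BASE_URL}/{quote(database, safe='')}/{quote(doc_id, safe='')}"


def view_uri(database, design_doc, view_name):
    # Views are addressed under a design document.
    return (
        f"{BASE_URL}/{quote(database, safe='')}"
        f"/_design/{quote(design_doc, safe='')}"
        f"/_view/{quote(view_name, safe='')}"
    )


doc_url = document_uri("articles", "article-001")
view_url = view_uri("articles", "blog", "by_tag")
```

An HTTP GET on `doc_url` would fetch the document, and a PUT to the same address would create or update it.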
Graph Databases
• Great for social network applications and others where relationships are
important
• Nodes and edges
• Edges are like joins
• Nodes are like rows in a table
• Nodes can also have properties and values
• Neo4j is a popular graph database
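A minimal in-memory sketch of nodes, edges and a traversal follows. The `Graph` class and the `FOLLOWS` label are illustrative assumptions and do not reflect Neo4j's API. The traversal answers "who can this user reach through follows?" without any join tables.

```python
from collections import defaultdict, deque


class Graph:
    def __init__(self):
        self.nodes = {}                     # node id -> properties
        self.edges = defaultdict(list)      # node id -> [(label, target id)]

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = properties

    def add_edge(self, source, label, target):
        self.edges[source].append((label, target))

    def neighbors(self, node_id, label):
        return [target for edge_label, target in self.edges[node_id] if edge_label == label]

    def reachable(self, start, label):
        """Breadth-first traversal along edges that carry the given label."""
        seen, queue = set(), deque([start])
        while queue:
            current = queue.popleft()
            for target in self.neighbors(current, label):
                if target not in seen and target != start:
                    seen.add(target)
                    queue.append(target)
        return seen


g = Graph()
for user in ("asha", "ravi", "meena"):
    g.add_node(user, kind="user")
g.add_edge("asha", "FOLLOWS", "ravi")
g.add_edge("ravi", "FOLLOWS", "meena")
g.add_edge("meena", "FOLLOWS", "asha")     # a cycle; traversal still terminates
network = g.reachable("asha", "FOLLOWS")
```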
Graph Databases
• Source: SlideShare
Where is NoSQL a Killer App?
• Content Management
• Document databases work really well here
• Regular KV pairs can store metadata
• Can also store text-based content
• Attachments can store file-based or binary content
• Versioning and URI addressability help as well
• CouchDB gets called a “Web database”
• Product Catalogs
• Products in a catalog tend to have many attributes in common and then
various others that are class-specific
• Common
• Product ID
• Name
• Description
• Price
• Key Value Stores and Wide Column Stores work well here
• Social
• Graph databases work best here
• Great for tracking:
• Networks
• Followers
• Group membership
• Threaded interactions (comments, likes/favorites)
• Great for Membership, Ownership
• Avoids the self-joins and many-to-many tables necessary in relational DBs
• Big Data
• Wide Column and Key-Value Stores work best here
• MapReduce is designed for this scenario
• Hadoop and HBase come up a lot
• Sharding and append-only help here
• Premise of analytics is reading data, not maintaining it.
• Miscellaneous
• Event-driven data (e.g. logs)
• User profiles, preferences
• Mail, status message streams
• Other Web data
• Automobile directions
• Info for sites on maps (category, name, description, photo)
• User reviews
• Etc.
What Good is Relational?
Transactional
• Relational databases support transactions
• Business systems require atomic transactions
• You can’t process an order without decrementing inventory
• You can’t register a credit without its corresponding debit
• No exceptions, no excuses
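The order and inventory rule can be sketched with Python's built-in `sqlite3` module. The table names, the `CHECK` constraint and the `place_order` helper are assumptions made for this example. The order row and the inventory decrement either both commit or both roll back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE inventory (product TEXT PRIMARY KEY, qty INTEGER CHECK (qty >= 0))"
)
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, product TEXT, qty INTEGER)")
conn.execute("INSERT INTO inventory VALUES ('widget', 5)")
conn.commit()


def place_order(product, qty):
    try:
        with conn:  # commits if the block succeeds, rolls back on an exception
            conn.execute(
                "INSERT INTO orders (product, qty) VALUES (?, ?)", (product, qty)
            )
            conn.execute(
                "UPDATE inventory SET qty = qty - ? WHERE product = ?", (qty, product)
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected a negative stock level.
        return False


first = place_order("widget", 3)    # enough stock: both statements commit
second = place_order("widget", 4)   # not enough stock: the order row is rolled back too
```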
Formal Schema
• Regular processes have regular data
• Stocks, trades
• PO line items
• Personnel records
• Insurance policies
• These need relational databases with declared schema
• These don’t need MapReduce, document or graph representation
• Banded Reporting
• Operational reporting is based on detail and group sections with predictable,
consistent layout, based on known schema
• Very hard to design pixel-perfect reports against indeterminate schema
• This highlights how operational business processes almost always require
relational databases
• Data Size
• A well-defined query language
• Mature development and administration tools
• Denormalize the database
NoSQL and CouchDB
• Source: Microsoft Azure
NoSQL, Relational, or Both?
• Type of App
• Really a question of consistency versus massive scale
• Is this an internal system or a public one?
• Is it an application for the data or data for a system?
• Below a certain threshold of concurrent usage, NoSQL may be slower than relational
• Skill Sets and Investment
• Does your staff have RDBMS skills already?
• Do you have significant investment in relational database hw/sw?
• Lots of apps that use an RDBMS?
• Do you want to support both?
• Are you a startup?
• Employ developers who possess NoSQL skills and prefer NoSQL?
• Relational databases use Structured Query Language (SQL) and support transactions,
making them a good choice for applications that must manage many
transactions.
• NoSQL Limitations
• In non-relational databases like MongoDB, there are no joins of the kind found in relational
databases; related data is usually embedded or combined in application code.
• Such databases also do not automatically treat operations as transactions the way a relational
database does: you must explicitly create a transaction, verify it, and then
commit it or roll it back.
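Without joins, combining related records falls to the application. The sketch below uses plain Python structures in place of two collections. The collection contents and the `orders_with_customer_names` helper are hypothetical.

```python
customers = {
    "c1": {"name": "Asha"},
    "c2": {"name": "Ravi"},
}
orders = [
    {"_id": "o1", "customer_id": "c1", "total": 250},
    {"_id": "o2", "customer_id": "c2", "total": 90},
    {"_id": "o3", "customer_id": "c9", "total": 40},   # customer record is missing
]


def orders_with_customer_names(orders, customers):
    # Application-side "join": look up each order's customer by key.
    joined = []
    for order in orders:
        customer = customers.get(order["customer_id"], {})
        joined.append({**order, "customer_name": customer.get("name")})
    return joined


report = orders_with_customer_names(orders, customers)
```

Nothing enforces that every `customer_id` points at an existing customer, so the code has to handle the missing case itself, as it does for order `o3`.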
Companies Using NoSQL DB