Demystifying NoSQL Databases
Mike King & Matt Thomas
Enterprise Technologists, Big Data
2 Dell - Restricted - Confidential
What are databases?
• Ted Codd & Chris Date
– Codd’s 12 rules (13 in all, numbered 0 to 12)
– An Introduction to Database Systems
• Wikia/Wikipedia
• Mike
– An organized collection of data offering varying levels
of availability, scalability, performance, consistency,
management, accessibility and quality.
• Matt
Databases defined
What types of databases exist?
• Network – Adabas
• Hierarchical – IMS
• Relational – PostgreSQL
• Object Oriented – Versant
• NoSQL – MongoDB, HBase
• NewSQL – VoltDB, MemSQL
• XML – MarkLogic, Xyleme
NoSQL background, issues and considerations
• History
– Google Bigtable, Amazon Dynamo
• What does schema-less mean?
– On read
– Still structured
– Embedded
– Can vary between records
• Languages & formats used
– Java, Python
– JSON, BSON, XML, CSV
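
A rough sketch of what "schema on read" can look like in practice: the store accepts JSON records whose fields vary, and structure is applied only when a record is read. The field names and the read_user helper are hypothetical, chosen for illustration.

```python
import json

# Two records in the same collection; the second carries an extra embedded field.
raw_records = [
    '{"id": 1, "name": "Ann"}',
    '{"id": 2, "name": "Bob", "address": {"city": "Austin"}}',
]


def read_user(raw):
    """Apply a schema at read time: pick the known fields, default the rest."""
    doc = json.loads(raw)
    return {
        "id": doc["id"],
        "name": doc["name"],
        # The embedded "address" document is optional and may be absent.
        "city": doc.get("address", {}).get("city", "unknown"),
    }


users = [read_user(raw) for raw in raw_records]
```

The data is still structured; the difference is that the reader, not the database, decides which fields are required and how missing ones are handled.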
NoSQL background, issues and considerations, continued
• Eric Brewer’s CAP theorem
– Consistency, availability, partition tolerance: can’t guarantee all three at once.
• What does NoSQL really mean?
– Distributed, shared-nothing, aggregate-oriented database
– “Not only SQL” versus “No”
• What are the factors for the various choices?
– Best fit
– Use case(s)
– KV
– HA, Multi-site
– Network
– Kevin Bacon
• Sharding
– Partitioning
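
One common way to shard is hash partitioning: hash the key and use the result to pick a partition. The sketch below is a minimal in-memory illustration, not any particular product's scheme; NUM_SHARDS, shard_for, put and get are hypothetical names.

```python
import hashlib

NUM_SHARDS = 4  # hypothetical cluster size


def shard_for(key, num_shards=NUM_SHARDS):
    """Map a key to a shard number using a hash that is stable across runs."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


# Each dict stands in for one shared-nothing node.
shards = [dict() for _ in range(NUM_SHARDS)]


def put(key, value):
    shards[shard_for(key)][key] = value


def get(key):
    return shards[shard_for(key)].get(key)


for i in range(1000):
    put(f"user:{i}", {"n": i})
```

Simple modulo hashing moves most keys when the shard count changes, which is one reason real systems often use consistent hashing or pre-split ranges instead.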
NewSQL
• SQL as predominant access method
• OLTP
• Larger user populations than NoSQL
• Better consistency than NoSQL
• Still subject to Brewer’s CAP theorem
• Examples
– VoltDB, MemSQL, Clustrix, NuoDB
RDBMS or NoSQL?
• RDBMS
– Large user populations
– Structured
– Static schema
– Strong typing
– Access by PK, AK, indexes
– Complex structures
– Feature rich
– Multi-purpose, shared by apps
– OLTP
– ACID
– Complex queries
– >3 way joins
– Small to medium sized dbs
– COTS pkgs
– Datamarts
• NoSQL
– Smaller user populations
– Multi-structured
– Schema evolution
– Weak typing
– Mostly random access by PK
– Simple structures
– Bare bones functionality
– Single purpose/use case, not shared by apps
– Not transactional
– BASE
– Simple queries
– VLDB
– Horizontal scalability
NoSQL Database Types
• Four types
– Columnar: HBase, Cassandra
– Document: MongoDB, Couchbase
– KV: Riak, Redis
– Graph: Neo4j, Titan
• How many do you need?
– By type
– Within type
• Who will manage them?
– DBAs
• How do you access them?
– SQL, nosql
– Sequential
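
To make the four types concrete, the sketch below models the same fact ("Ann lives in Austin") in each shape using plain Python structures. These are illustrative stand-ins, not the storage formats or APIs of the products named above; all keys and labels are hypothetical.

```python
import json

# Key-value: an opaque value looked up by key.
kv = {"user:1": '{"name": "Ann", "city": "Austin"}'}

# Document: a nested, self-describing record.
document = {"_id": 1, "name": "Ann", "address": {"city": "Austin"}}

# Columnar (wide column): row key -> column family -> column -> value.
columnar = {"user1": {"info": {"name": "Ann"}, "addr": {"city": "Austin"}}}

# Graph: nodes plus labeled, directed edges between them.
nodes = {1: {"name": "Ann"}, 2: {"name": "Austin"}}
edges = [(1, "LIVES_IN", 2)]

# The same question, answered against each model.
city_from_kv = json.loads(kv["user:1"])["city"]
city_from_document = document["address"]["city"]
city_from_columnar = columnar["user1"]["addr"]["city"]
city_from_graph = next(
    nodes[dst]["name"]
    for src, label, dst in edges
    if src == 1 and label == "LIVES_IN"
)
```

The access path differs in each case, which is part of why the choice of type tends to follow the use case.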
NoSQL Commonalities
• Mostly open source
• Weak typing
• Multi-structured
• Horizontal scale
• No standardization
• VLDB
• Single purpose, per database
NoSQL Differences
• Access
• Formats supported
• Features
• Management
• Administration
• VLDB
• Performance & tuning
• Resource consumption
• Language bindings
• APIs
• Security
• Persistence
• Programmability
• Schemas?
How are NoSQL databases typically used?
• As an adjunct to Hadoop
• As a partial replacement for some RDBMS workloads
• To scale linearly
• As a data store for semi-structured and multi-structured data
What questions do our customers ask?
• Why is my HBase cluster so CPU hungry?
• Do you have an RA for <Your favorite NoSQL db goes here>?
• Can I replace all my Oracle databases with some NoSQL databases?
What are some common problems?
• Cohabitation with Hadoop and other programs on a cluster.
• Poor db design
• Falling prey to vendor hype
How about some general recommendations?
• Read a book or two on your target NoSQL db.
• Search through the blogosphere & twitterverse.
• Don’t use more than one type, unless you’re an SI or large service provider.
• If performance & service levels are important, isolate the cluster.
• Review your database design with DBAs & those who have done it already.
– Presentations, conference proceedings, boutique consultancies
NoSQL Examples, Diving Deeper
• HBase
• MongoDB
• Redis
• Neo4j
HBase
• Columnar
– Column families
• Uses ZooKeeper (ZK)
• Has a master
• Write-ahead log (WAL)
• Region servers
• MemStore
• HFiles
• HDFS
• Uses JVM heap
• Access
– Row key
– Get
– Put
– Scan
– Bulk load
• Design
– Beware of skew
– Tune for peaks
• Performance
– CPU intensive
– Very fast for puts & gets by key
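
The access pattern above (put, get and scan over sorted row keys) and the skew warning can be sketched with a toy in-memory table. This is a stand-in for illustration, not the HBase client API; ToyTable, salted_key and the "d:count" column name are hypothetical.

```python
import bisect


class ToyTable:
    """In-memory stand-in for a table sorted by row key (not the HBase API)."""

    def __init__(self):
        self._keys = []  # row keys kept in sorted order
        self._rows = {}  # row key -> {"family:qualifier": value}

    def put(self, row_key, column, value):
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)
            self._rows[row_key] = {}
        self._rows[row_key][column] = value

    def get(self, row_key):
        return self._rows.get(row_key)

    def scan(self, start, stop):
        """Yield (row_key, columns) for start <= row_key < stop, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        for key in self._keys[lo:hi]:
            yield key, self._rows[key]


def salted_key(row_key, buckets=4):
    """Prefix a key with a derived bucket so sequential keys spread out."""
    bucket = sum(row_key.encode("utf-8")) % buckets
    return f"{bucket}|{row_key}"


table = ToyTable()
for day in ("20240101", "20240102", "20240103"):
    table.put(day, "d:count", 1)
```

Sequential keys such as dates all land next to each other, so in a range-partitioned store one server takes every new write. Salting spreads the writes, at the cost of a scan per bucket when reading a range back.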
Neo4j
• Property graph
– Nodes, edges (relationships/arcs), direction, data/properties (on nodes & arcs)
– Edge-labeled multi-digraph
• REST API
• ACID
• Fast, scalable lookups
• Lucene index for search
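
A property graph can be sketched in a few lines: nodes and directed, labeled edges, each carrying properties, with more than one edge allowed between the same pair of nodes. The class below is a toy illustration and not the Neo4j API; PropertyGraph and its methods are hypothetical names. The hops method shows the kind of "degrees of separation" lookup that graph databases are built for.

```python
from collections import deque


class PropertyGraph:
    """Toy edge-labeled multi-digraph (not the Neo4j API)."""

    def __init__(self):
        self.nodes = {}  # node id -> properties
        self.edges = []  # (source, label, target, properties)

    def add_node(self, node_id, **props):
        self.nodes[node_id] = props

    def add_edge(self, source, label, target, **props):
        # A list permits several edges between the same pair of nodes.
        self.edges.append((source, label, target, props))

    def neighbors(self, node_id):
        """Nodes one edge away, ignoring edge direction."""
        for source, _label, target, _props in self.edges:
            if source == node_id:
                yield target
            elif target == node_id:
                yield source

    def hops(self, start, goal):
        """Breadth-first search: fewest edges between two nodes, or None."""
        seen, queue = {start}, deque([(start, 0)])
        while queue:
            node, dist = queue.popleft()
            if node == goal:
                return dist
            for nxt in self.neighbors(node):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, dist + 1))
        return None


g = PropertyGraph()
g.add_node("bacon", name="Kevin Bacon")
g.add_node("hanks", name="Tom Hanks")
g.add_node("apollo13", title="Apollo 13")
g.add_edge("bacon", "ACTED_IN", "apollo13", role="Jack Swigert")
g.add_edge("hanks", "ACTED_IN", "apollo13", role="Jim Lovell")
```

This toy scans every edge on each step; a graph database keeps adjacency information with each node so that traversals do not depend on the total size of the graph.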
Our Contact Info
• Mike_King2@dell.com
• @MikeDataKing
• 901-262-7918
• Matt_Thomas@Dell.com
• ?twitter?
• 904-429-6709

Editor's Notes

  • #3 https://en.wikipedia.org/wiki/Codd's_12_rules
  • #4 https://en.wikipedia.org/wiki/NewSQL https://en.wikipedia.org/wiki/XML_database
  • #5 ? Aggregate orientation VS relational?
  • #9 https://itsavant.wordpress.com/2013/04/23/can-you-get-by-with-just-one-nosql-database/ https://en.wikipedia.org/wiki/NoSQL