Database Choices
@LynnLangit
May 2014 – Techorama
Databases Now -> a Menu of Choices
Why Change? -> “Small” Big Data
Your data - BEHAVIORAL
Your data - TRANSACTIONAL
PUBLIC data
PREMIUM data
Current Data Questions
• “Should we evaluate Hadoop?”
• “How much data is Big Data?”
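• “What are the limits of SQL Server?”
• “Which NoSQL databases (if any) should we consider?”
• “How safe is the cloud really?”
• “How do we mine the data for usable information?”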
DEMO - About Open Source
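• Free
– Rapid iteration, innovation
– Can start up for free (on premise)
– Can ‘rent’ for cheap or free on the cloud
– Can use with the command line for free
– Some vendors offer free online training
– Ex. www.neo4j.org
• Not Free
– Constant releases
– Can be deceptively hard to set up (time is money)
– Don’t forget to turn it off if on the cloud!
– GUI tools, support, training cost $$$
– Ex. www.neo4j.com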
Database Choices – The first level of choice
Data
A. Hadoop
B. NoSQL
C. Relational
On Premise or In the Cloud
Working with Hadoop
About Hadoop: MapReduce and HDFS
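The MapReduce model on this slide can be sketched in a few lines of plain Python. This is a single-process illustration (an assumption made for readability): it mimics the map, shuffle/sort and reduce phases of a word count, whereas real Hadoop runs many mappers and reducers in parallel over input splits stored in HDFS. The function names are hypothetical, not Hadoop APIs.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    # Mapper: emit (word, 1) for every word in one input line
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Shuffle/sort: group every value emitted for the same key, ordered by key
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Reducer: combine all values that share one key
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    mapped = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


if __name__ == "__main__":
    lines = ["the elephant in the room", "The elephant never forgets"]
    print(word_count(lines))
    # {'elephant': 2, 'forgets': 1, 'in': 1, 'never': 1, 'room': 1, 'the': 3}
```

Because each mapper only sees its own line and each reducer only sees one key, both phases can be spread across machines; that independence is what lets Hadoop scale out.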
How you ‘get’ Hadoop
A. Open source
•roll your own
B. Commercial distribution
•Cloudera
•MapR
•Hortonworks
•More…
C. Rent it via the cloud
•AWS
•HDInsight
Demo - Cloudera Hadoop Enterprise
Demo – AWS MapReduce
Example Comparison: RDBMS vs. Hadoop
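                     Traditional RDBMS        Hadoop / MapReduce
Data Size            Gigabytes (Terabytes)    Petabytes and greater
Access               Interactive and Batch    Batch – NOT Interactive
Updates              Read / Write many times  Write once, Read many times
Structure            Static Schema            Dynamic Schema
Integrity            High (ACID)              Low
Scaling              Nonlinear                Linear
Query Response Time  Can be near immediate    Has latency (due to batch processing)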
Database Choices
On Premise
• RDBMS
• NoSQL
• Hadoop
In Cloud
• RDBMS
• NoSQL
• Hadoop
An Aside…SQL Server 2012++ ‘NoSQL’
• SQL Server 2012 Columnstore Index
• SQL Server 2012 Tabular Model (SSAS)
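• SSAS Tabular Models – 2012, 2014
• Nonclustered (NC) Columnstore Index – 2012, 2014
• Clustered (writable) Columnstore Index – 2014
• In-memory OLTP – 2014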
But wait… is there a RELATIONAL database that scales, that is cheap, that runs in the cloud?
DEMO - AWS Redshift
• About $1k per Terabyte per year - relational
So many NoSQL options
• More than just the Elephant in the room
• More than 150 NoSQL databases
Flavors of NoSQL
• Key/Value – Volatile
• Key/Value – Persistent
• Wide-Column
• Document
• Graph
Key / Value Database
• Just keys and values
– No schema
• Persistent or Volatile
• Examples
– AWS DynamoDB
– Riak
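A minimal sketch of the volatile vs. persistent distinction, using only the Python standard library rather than DynamoDB or Riak: a dict stands in for a volatile store and the shelve module for a persistent one. The key names and values are hypothetical.

```python
import os
import shelve
import tempfile

# Volatile key/value store: a plain dict lives only in process memory
volatile: dict[str, dict] = {}
volatile["session:42"] = {"user": "lynn", "cart": ["book"]}

# Persistent key/value store: shelve pickles each value into a file on disk
path = os.path.join(tempfile.mkdtemp(), "kv_store")
with shelve.open(path) as db:
    db["product:1001"] = {"title": "Seven Databases in Seven Weeks", "in_stock": True}

# Reopening the file shows the value survived; no schema was declared anywhere
with shelve.open(path) as db:
    print(db["product:1001"]["title"])  # Seven Databases in Seven Weeks
```

The only contract is "give me the value for this key"; any structure inside the value is the application's business, which is why key/value stores are fast and simple but weak at ad hoc queries.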
DEMO - AWS DynamoDB
• Key/Value store on the AWS cloud
File (BLOB) Storage Buckets in the Cloud
• Amazon – S3 or Glacier
• Google – Cloud Storage
• Microsoft Azure BLOBS
DEMO - Battle of the Buckets
• Google Cloud Storage VS.
• Windows Azure BLOBS VS.
• AWS S3 (archiving into AWS Glacier)
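Archiving from S3 into Glacier is typically driven by age-based lifecycle rules. The toy model below imitates such a rule in memory; it does not call any AWS API, and the prefix, the 30-day threshold and the object names are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class StoredObject:
    key: str
    created: date
    storage_class: str = "STANDARD"


def apply_archive_rule(objects: list[StoredObject], today: date,
                       prefix: str, after_days: int) -> list[str]:
    """Mark objects under `prefix` older than `after_days` as archived; return their keys."""
    moved = []
    for obj in objects:
        age_days = (today - obj.created).days
        if (obj.key.startswith(prefix) and obj.storage_class == "STANDARD"
                and age_days >= after_days):
            obj.storage_class = "GLACIER"  # label change only in this simulation
            moved.append(obj.key)
    return moved


today = date(2014, 5, 28)
bucket = [
    StoredObject("logs/old.gz", today - timedelta(days=90)),
    StoredObject("logs/new.gz", today - timedelta(days=2)),
    StoredObject("images/logo.png", today - timedelta(days=400)),
]
print(apply_archive_rule(bucket, today, prefix="logs/", after_days=30))  # ['logs/old.gz']
```

Only old objects under the matching prefix move to the cheaper, slower tier; everything else stays immediately readable.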
Column Database
• Wide, sparse column sets
• Schema-light
• Examples:
– HBase w/Hadoop
– Google Cloud Datastore
– SQL Server Columnstore Indexes or SSAS Tabular Models
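The column-family idea can be sketched with nested Python dicts, loosely modeled on how HBase addresses a cell by row key, column family and column qualifier. Only cells that exist are stored, which is what makes wide, sparse rows cheap. Names and data are hypothetical; real HBase also versions each cell by timestamp, which this sketch omits.

```python
from collections import defaultdict

# Hypothetical wide-column table: row key -> column family -> qualifier -> value.
# Rows store only the cells they actually have, so column sets can differ per row.
table: dict[str, dict[str, dict[str, str]]] = defaultdict(lambda: defaultdict(dict))


def put(row: str, family: str, qualifier: str, value: str) -> None:
    table[row][family][qualifier] = value


def get_family(row: str, family: str) -> dict[str, str]:
    # Return a copy of one family's cells for a row (empty dict if absent)
    return dict(table.get(row, {}).get(family, {}))


put("user#1", "profile", "name", "Ada")
put("user#1", "profile", "city", "Antwerp")
put("user#2", "profile", "name", "Grace")
put("user#2", "clicks", "2014-05-28T10:00", "/home")

print(get_family("user#1", "profile"))  # {'name': 'Ada', 'city': 'Antwerp'}
print(get_family("user#2", "clicks"))   # {'2014-05-28T10:00': '/home'}
```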
Types of Column Databases
• Column-families
– Non-relational
– Sparse
– Examples:
• HBase
• Cassandra
• xVelocity (SQL 2012 Tabular)
• Column-stores
– Relational
– Dense
– Example:
• SQL Server 2012 Columnstore index
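For contrast, a sketch of the dense column-store layout: the same records held row by row and then column by column. Run-length encoding stands in here for the compression techniques columnstore engines apply; this is not SQL Server's actual storage format, and the sample data is hypothetical.

```python
# Row-oriented layout: each record is stored together (good for OLTP lookups)
rows = [
    {"region": "EU", "product": "A", "amount": 120},
    {"region": "EU", "product": "B", "amount": 80},
    {"region": "US", "product": "A", "amount": 200},
    {"region": "US", "product": "A", "amount": 50},
]


def to_columns(records: list[dict]) -> dict[str, list]:
    # Column-oriented layout: one dense list per column, same row order in each
    return {name: [r[name] for r in records] for name in records[0]}


def run_length_encode(values: list) -> list[tuple]:
    # Repeated adjacent values compress well once a column is stored contiguously
    encoded: list[tuple] = []
    for v in values:
        if encoded and encoded[-1][0] == v:
            encoded[-1] = (v, encoded[-1][1] + 1)
        else:
            encoded.append((v, 1))
    return encoded


columns = to_columns(rows)
# An aggregate such as SUM(amount) only has to read the "amount" column
print(sum(columns["amount"]))                # 450
print(run_length_encode(columns["region"]))  # [('EU', 2), ('US', 2)]
```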
DEMO – Google Cloud Datastore
DEMO – SQL Server ‘NoSQL’
• SQL Server Columnstore Index
• SQL Server SSAS Tabular Model
Document Database
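• Document-oriented (collection of JSON documents) with semi-structured data
– Encodings include BSON, JSON, XML…
• Binary forms – PDF, Microsoft Office documents (Word, Excel…)
• Examples:
– MongoDB
– Couchbase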
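A plain-Python sketch of querying a collection of JSON documents whose fields vary from document to document. The find helper loosely imitates the dotted-path equality filter that MongoDB's find() supports, but it is hypothetical and not the MongoDB driver API; the sample documents are made up.

```python
import json

# A "collection" of JSON documents; documents need not share the same fields
raw = """[
  {"_id": 1, "title": "Database Choices", "tags": ["nosql", "hadoop"],
   "event": {"name": "Techorama", "year": 2014}},
  {"_id": 2, "title": "Graph intro", "tags": ["graph"]},
  {"_id": 3, "title": "Hadoop 101", "event": {"name": "Techorama", "year": 2013}}
]"""
collection = json.loads(raw)


def get_path(doc: dict, dotted: str):
    # Follow a dotted path such as "event.year"; return None if any part is missing
    current = doc
    for part in dotted.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def find(docs: list[dict], criteria: dict) -> list[dict]:
    # Return the documents whose fields equal every value in `criteria`
    return [d for d in docs if all(get_path(d, k) == v for k, v in criteria.items())]


print([d["_id"] for d in find(collection, {"event.name": "Techorama"})])  # [1, 3]
print([d["_id"] for d in find(collection, {"event.year": 2014})])        # [1]
```

Document 2 has no "event" field at all, and the query simply skips it; no schema change was needed to store it.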
Demo - MongoDB
Graph Databases
• a lot of many-to-many relationships
• recursive self-joins
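• when your primary objective is quickly finding connections, patterns and relationships between the objects within lots of data
• Examples:
– Neo4j
– AlgebraixData
– Google Freebase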
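Graph databases answer “how are these connected?” by traversing relationships instead of chaining recursive self-joins. The sketch shows that traversal as a breadth-first search over an in-memory adjacency list; it is not Neo4j's engine or query language, and the sample graph is hypothetical.

```python
from collections import deque
from typing import Optional

# Hypothetical social graph: undirected "knows" relationships as an adjacency list
edges = [("ann", "bob"), ("bob", "cara"), ("cara", "dev"), ("ann", "eli"), ("eli", "dev")]
graph: dict[str, set[str]] = {}
for a, b in edges:
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)


def shortest_path(start: str, goal: str) -> Optional[list[str]]:
    # Breadth-first search visits nodes level by level, so the first time
    # `goal` is reached the path has the fewest hops
    previous: dict[str, Optional[str]] = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            step: Optional[str] = node
            while step is not None:  # walk back to the start
                path.append(step)
                step = previous[step]
            return path[::-1]
        for neighbor in sorted(graph.get(node, set())):  # sorted for deterministic output
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


print(shortest_path("ann", "dev"))  # ['ann', 'eli', 'dev']
```

In a relational schema the same question needs one self-join per hop, and the number of hops is not known in advance.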
DEMO – Neo4j
Cloud-hosted, partially managed RDBMS
• AWS RDS
– SQL Server
– MySQL
– PostgreSQL
– Oracle
• Google
– MySQL
• Microsoft
– SQL Azure
DEMO - AWS RDS
• SQL Server, MySQL or Oracle
• Essential to understand pricing models
NoSQL Applied
Log Files
•Columnstore
•HBase
Product Catalogs
•Key/Value
•DynamoDB
Social Games
•Document
•MongoDB
Social aggregators
•Graph
•Neo4j
Line-of-Business
•RDBMS
•SQL Server
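Cloud Offerings – RDBMS AND NoSQL
• Managed RDBMS – AWS: RDS (all major RDBMS); Google: Cloud SQL; Microsoft: SQL Azure
• NoSQL buckets – AWS: S3 or Glacier; Google: Cloud Storage; Microsoft: Azure Blobs
• NoSQL key/value – AWS: DynamoDB; Google: Cloud Datastore; Microsoft: Azure Tables
• Streaming or ML – AWS: Kinesis; Google: Prospective Search & Prediction API; Microsoft: StreamInsight
• NoSQL document or graph – AWS: MongoDB on EC2, Neo4j on EC2; Google: None (Freebase); Microsoft: MongoDB on Microsoft Cloud, Neo4j on Microsoft Cloud
• Hadoop (HBase) – AWS: Elastic MapReduce (S3 & EC2); Google: None; Microsoft: HDInsight
• Dremel/Warehousing – AWS: Redshift; Google: BigQuery; Microsoft: None
• Cloud ETL – AWS: Data Pipeline; Google: None; Microsoft: None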
But wait… how do I query NoSQL data?
Example – translate ANSI SQL to MapReduce
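Tools such as Hive perform this translation automatically. Below is a hand-written sketch of how one aggregate query breaks down into map and reduce steps, simulated in-process in Python (not the Hadoop API); the sales table and its columns are hypothetical.

```python
from itertools import groupby
from operator import itemgetter

# Target query (hypothetical "sales" table):
#   SELECT region, SUM(amount) FROM sales WHERE year = 2014 GROUP BY region;
sales = [
    {"region": "EU", "year": 2014, "amount": 120},
    {"region": "US", "year": 2013, "amount": 300},
    {"region": "US", "year": 2014, "amount": 250},
    {"region": "EU", "year": 2014, "amount": 80},
]


def mapper(record: dict):
    # WHERE -> filter inside the map step; GROUP BY column -> emitted key
    if record["year"] == 2014:
        yield record["region"], record["amount"]


def reducer(key: str, amounts) -> tuple[str, int]:
    # SUM(amount) -> combine every value that arrived with the same key
    return key, sum(amounts)


mapped = [pair for record in sales for pair in mapper(record)]
mapped.sort(key=itemgetter(0))  # stands in for the framework's sort/shuffle by key
result = [reducer(key, (amount for _, amount in group))
          for key, group in groupby(mapped, key=itemgetter(0))]
print(result)  # [('EU', 200), ('US', 250)]
```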
Can Excel help?
• Connector to Hadoop
• Power BI
• Data Quality Services
• Master Data Services
• Integration with Azure Data Market
• Data Mining w/Predixion
Demo – Excel Power Query
NoSQL To-Do List
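Understand types of NoSQL databases
• Use NoSQL when business needs dictate
• Use the right type of NoSQL for your business problem
Try out NoSQL on the cloud
• Quick and cheap for behavioral data
• Mash up cloud datasets
• Good for specialized use cases, e.g. dev, test, training environments
Learn NoSQL access technologies & services
• New query languages, e.g. MapReduce, R, Infer.NET
• New query tools (vendor-specific) – Google Refine, Amazon Karmasphere, Microsoft Excel connectors, etc.
• Windows Azure Data Market, other public data markets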
www.TeachingKidsProgramming.org
• Free Courseware (Java, Small Basic or C# [on Pluralsight])
• Do a Recipe -> Teach a Kid (Ages 10++)
A Big Thank You To Our Sponsors
Gold Partners
Silver & Track Partners
Platinum Partners
Slide Notes
  • http://pragprog.com/book/rwdata/seven-databases-in-seven-weeks
  • http://hortonworks.com/technology/hortonworksdataplatform/

    More about HBase, from the O’Reilly ‘Getting Ready for BigData’ report

    “Enter HBase, a column-oriented database that runs on top of HDFS. Modeled after Google’s BigTable, the project’s goal is to host billions of rows of data for rapid access. MapReduce can use HBase as both a source and a destination for its computations, and Hive and Pig can be used in combination with HBase.
    In order to grant random access to the data, HBase does impose a few restrictions: performance with Hive is 4-5 times slower than plain HDFS, and the maximum amount of data you can store is approximately a petabyte, versus HDFS’ limit of over 30PB.”


    http://www.cloudera.com/

  • http://www.cloudera.com/content/cloudera/en/products-and-services/cloudera-live.html
  • http://www.cloudera.com/content/cloudera-content/cloudera-docs/DemoVMs/Cloudera-QuickStart-VM/cloudera_quickstart_vm.html
  • Original Reference: Tom White’s Hadoop: The Definitive Guide (I made some modifications based on my experience)
  • http://nosql-database.org/
    http://hadoop.apache.org/ & http://www.mongodb.org/

    Wikipedia - http://en.wikipedia.org/wiki/NoSQL
    List of noSQL databases – http://nosql-database.org/
    The good, the bad - http://www.techrepublic.com/blog/10things/10-things-you-should-know-about-nosql-databases/1772
  • http://bigdatanerd.wordpress.com/2012/01/04/why-nosql-part-2-overview-of-data-modelrelational-nosql/

    http://docs.jboss.org/hibernate/ogm/3.0/reference/en-US/html_single/
  • http://en.wikipedia.org/wiki/Project_Voldemort

    http://aws.amazon.com/

    http://docs.amazonwebservices.com/amazondynamodb/latest/developerguide/Introduction.html

    http://www.allthingsdistributed.com/2012/01/amazon-dynamodb.html
  • http://code.google.com

    Access via REST APIs
    Very Cheap, but not much functionality included
    Lots of code to write for application development
    But…can be a good backup solution
  • http://googledevelopers.blogspot.com/2014/01/get-started-with-google-cloud-platform.html
    http://stage.hypertable.com/index.php/documentation/architecture/

    http://code.google.com/appengine/

    http://code.google.com/appengine/articles/datastore/overview.html

  • http://cwebbbi.wordpress.com/2012/02/14/so-what-is-the-bi-semantic-model/
    http://www.databasejournal.com/features/mssql/understanding-new-column-store-index-of-sql-server-2012.html
    http://dbmsmusings.blogspot.com/2010/03/distinguishing-two-major-types-of_29.html

    http://ayende.com/blog/4500/that-no-sql-thing-column-family-databases
  • https://developers.google.com/datastore/docs/concepts/overview
    http://googledevelopers.blogspot.com/2014/01/get-started-with-google-cloud-platform.html
  • http://en.wikipedia.org/wiki/MongoDB

    http://www.mongodb.org/downloads
    http://www.mongodb.org/display/DOCS/Drivers
  • http://en.wikipedia.org/wiki/MongoDB & http://try.mongodb.org/

  • http://www.infinitegraph.com/what-is-a-graph-database.html and http://www.neo4j.org/

    http://en.wikipedia.org/wiki/Graph_database

    http://www.freebase.com/
  • http://www.neo4j.org/learn/try
  • For Google - http://code.google.com
    For AWS - https://console.aws.amazon.com/console/home
  • Hadoop on AWS - http://wiki.apache.org/hadoop/AmazonEC2
  • http://rickosborne.org/download/SQL-to-MongoDB.pdf
  • http://www.microsoft.com/en-us/bi/default.aspx

    http://dennyglee.com/

    Demos - http://www.youtube.com/watch?v=djfpPsGwm6A and http://www.youtube.com/watch?v=uh9bKWO1K7U