Database Choices

Database choices - Hadoop, NoSQL and Relational

Usage Rights

© All Rights Reserved

  • http://pragprog.com/book/rwdata/seven-databases-in-seven-weeks
  • http://hortonworks.com/technology/hortonworksdataplatform/
    More about HBase, from the O’Reilly ‘Getting Ready for BigData’ report:
    “Enter HBase, a column-oriented database that runs on top of HDFS. Modeled after Google’s BigTable, the project’s goal is to host billions of rows of data for rapid access. MapReduce can use HBase as both a source and a destination for its computations, and Hive and Pig can be used in combination with HBase.
    In order to grant random access to the data, HBase does impose a few restrictions: performance with Hive is 4-5 times slower than plain HDFS, and the maximum amount of data you can store is approximately a petabyte, versus HDFS’ limit of over 30PB.”
    http://www.cloudera.com/
  • http://www.cloudera.com/content/cloudera/en/products-and-services/cloudera-live.html
  • http://www.cloudera.com/content/cloudera-content/cloudera-docs/DemoVMs/Cloudera-QuickStart-VM/cloudera_quickstart_vm.html
  • Original Reference: Tom White’s Hadoop: The Definitive Guide (I made some modifications based on my experience)
  • http://nosql-database.org/
    http://hadoop.apache.org/ & http://www.mongodb.org/
    Wikipedia - http://en.wikipedia.org/wiki/NoSQL
    List of NoSQL databases – http://nosql-database.org/
    The good, the bad - http://www.techrepublic.com/blog/10things/10-things-you-should-know-about-nosql-databases/1772
  • http://bigdatanerd.wordpress.com/2012/01/04/why-nosql-part-2-overview-of-data-modelrelational-nosql/
    http://docs.jboss.org/hibernate/ogm/3.0/reference/en-US/html_single/
  • http://en.wikipedia.org/wiki/Project_Voldemort
    http://aws.amazon.com/
    http://docs.amazonwebservices.com/amazondynamodb/latest/developerguide/Introduction.html
    http://www.allthingsdistributed.com/2012/01/amazon-dynamodb.html
  • http://code.google.com
    Access via REST APIs
    Very cheap, but not much functionality included
    Lots of code to write for application development
    But… can be a good backup solution
  • http://googledevelopers.blogspot.com/2014/01/get-started-with-google-cloud-platform.html
    http://stage.hypertable.com/index.php/documentation/architecture/
    http://code.google.com/appengine/
    http://code.google.com/appengine/articles/datastore/overview.html
  • http://cwebbbi.wordpress.com/2012/02/14/so-what-is-the-bi-semantic-model/
    http://www.databasejournal.com/features/mssql/understanding-new-column-store-index-of-sql-server-2012.html
    http://dbmsmusings.blogspot.com/2010/03/distinguishing-two-major-types-of_29.html
    http://ayende.com/blog/4500/that-no-sql-thing-column-family-databases
  • https://developers.google.com/datastore/docs/concepts/overview
    http://googledevelopers.blogspot.com/2014/01/get-started-with-google-cloud-platform.html
  • http://en.wikipedia.org/wiki/MongoDB
    http://www.mongodb.org/downloads
    http://www.mongodb.org/display/DOCS/Drivers
  • http://en.wikipedia.org/wiki/MongoDB & http://try.mongodb.org/
    http://www.mongodb.org/downloads
    http://www.mongodb.org/display/DOCS/Drivers
  • http://www.infinitegraph.com/what-is-a-graph-database.html and http://www.neo4j.org/
    http://en.wikipedia.org/wiki/Graph_database
    http://www.freebase.com/
  • http://www.neo4j.org/learn/try
  • For Google - http://code.google.com
    For AWS - https://console.aws.amazon.com/console/home
  • Hadoop on AWS - http://wiki.apache.org/hadoop/AmazonEC2
  • http://rickosborne.org/download/SQL-to-MongoDB.pdf
  • http://www.microsoft.com/en-us/bi/default.aspx
    http://dennyglee.com/
    Demos - http://www.youtube.com/watch?v=djfpPsGwm6A and http://www.youtube.com/watch?v=uh9bKWO1K7U
  • Lynn

Database Choices Presentation Transcript

  • Database Choices @LynnLangit May 2014 – Techorama
  • Databases Now -> a Menu of Choices
  • Why Change? -> “Small” Big Data
      • Your data - BEHAVIORAL
      • Your data - TRANSACTIONAL
      • PUBLIC data
      • PREMIUM data
  • Current Data Questions
      • “Should we evaluate Hadoop?”
      • “How much data is Big Data?”
      • “What are the limits of SQL Server?”
      • “Which NoSQL databases (if any) should we consider?”
      • “How safe is the cloud really?”
      • “How do we mine the data for usable information?”
  • DEMO - About Open Source
      • Free
          – Rapid iteration, innovation
          – Can start up for free (on premise)
          – Can ‘rent’ for cheap or free on the cloud
          – Can use with the command line for free
          – Some vendors offer free online training
          – Ex. www.neo4j.org
      • Not Free
          – Constant releases
          – Can be deceptively hard to set up (time is money)
          – Don’t forget to turn it off if on the cloud!
          – GUI tools, support, training cost $$$
          – Ex. www.neo4j.com
  • Database Choices – The first level of choice
      • Data
          – A. Hadoop
          – B. NoSQL
          – C. Relational
      • On Premise or In the Cloud
  • Working with Hadoop
  • About Hadoop
      • MapReduce
      • HDFS
  • How you ‘get’ Hadoop
      • A. Open source
          – Roll your own
      • B. Commercial distribution
          – Cloudera
          – MapR
          – Hortonworks
          – More…
      • C. Rent it via the cloud
          – AWS
          – HDInsight
  • Demo - Cloudera Hadoop Enterprise
  • Demo – AWS MapReduce
  • Example Comparison: RDBMS vs. Hadoop
                            Traditional RDBMS         Hadoop / MapReduce
      Data Size             Gigabytes (Terabytes)     Petabytes and greater
      Access                Interactive and Batch     Batch – NOT Interactive
      Updates               Read / Write many times   Write once, Read many times
      Structure             Static Schema             Dynamic Schema
      Integrity             High (ACID)               Low
      Scaling               Nonlinear                 Linear
      Query Response Time   Can be near immediate     Has latency (due to batch processing)
  • Database Choices
      • On Premise
          – RDBMS
          – NoSQL
          – Hadoop
      • In Cloud
          – RDBMS
          – NoSQL
          – Hadoop
  • An Aside… SQL Server 2012++ ‘NoSQL’
      • SQL Server 2012 Columnstore Index
      • SQL Server 2012 Tabular Model (SSAS)
                                                2012   2014
      SSAS Tabular Models                       X      X
      NC Columnstore Index                      X      X
      Clustered (writable) Columnstore Index           X
      In-memory OLTP                                   X
  • But wait… is there a RELATIONAL database that scales, that is cheap, that runs in the cloud?
  • DEMO - AWS Redshift
      • About $1k per Terabyte per year - relational
  • So many NoSQL options
      • More than just the Elephant in the room
      • 150+ types of NoSQL databases
  • Flavors of NoSQL
      • Key/Value (volatile)
      • Key/Value (persistent)
      • Wide-Column
      • Document
      • Graph
  • Key / Value Database
      • Just keys and values – No schema
      • Persistent or Volatile
      • Examples
          – AWS DynamoDB
          – Riak
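
A minimal sketch of the "just keys and values, persistent or volatile" idea, using only the Python standard library. A plain dict stands in for a volatile store and the `shelve` module for a persistent one. This is not how DynamoDB or Riak work internally; the key names, values, and file location are made up for illustration.

```python
import os
import shelve
import tempfile

# Volatile key/value store: lives only in this process's memory.
volatile = {}
volatile["user:42"] = {"name": "Ada", "plan": "free"}

# Persistent key/value store: shelve writes values to disk under string keys.
path = os.path.join(tempfile.mkdtemp(), "kv_demo")  # hypothetical location
with shelve.open(path) as store:
    store["user:42"] = {"name": "Ada", "plan": "free"}
    # No schema: this value has different fields from the one above.
    store["user:43"] = {"name": "Lin", "trial_ends": "2014-06-01"}

# Reopen the store to show the data survived being closed.
with shelve.open(path) as store:
    reloaded = store["user:43"]
    keys = sorted(store.keys())

print(keys)
print(reloaded)
```

The only access path is the key itself; anything like "find all users on the free plan" has to be written in application code or served by a separate index.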
  • DEMO - AWS DynamoDB
      • Key/Value store on the AWS cloud
  • File (BLOB) Storage Buckets in the Cloud
      • Amazon – S3 or Glacier
      • Google – Cloud Storage
      • Microsoft Azure BLOBS
  • DEMO - Battle of the Buckets
      • Google Cloud Storage VS.
      • Windows Azure BLOBS VS.
      • AWS S3 -> (Archiving) into AWS Glacier
  • Column Database
      • Wide, sparse column sets
      • Schema-light
      • Examples:
          – HBase w/Hadoop
          – Google Cloud Datastore
          – SQL Server Columnstore Indexes or SSAS Tabular Models
  • Types of Column Databases
      • Column-families
          – Non-relational
          – Sparse
          – Examples: HBase, Cassandra, xVelocity (SQL 2012 Tabular)
      • Column-stores
          – Relational
          – Dense
          – Example: SQL Server 2012 Columnstore index
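
To make the dense column-store versus sparse column-family distinction concrete, here is a small Python sketch. It only models the data layouts with lists and dicts; it does not reflect the storage engines of HBase, Cassandra, or SQL Server, and every field name and value is invented for the example.

```python
# The same three records in a row-oriented layout.
rows = [
    {"id": 1, "region": "EU", "amount": 120.0},
    {"id": 2, "region": "US", "amount": 80.0},
    {"id": 3, "region": "EU", "amount": 50.0},
]

# Column-store layout (relational, dense): one array per column,
# and every row has a value in every column.
columns = {
    "id": [r["id"] for r in rows],
    "region": [r["region"] for r in rows],
    "amount": [r["amount"] for r in rows],
}

# An aggregate reads a single column instead of every full row.
total_amount = sum(columns["amount"])

# Column-family layout (non-relational, sparse): each row key maps
# only to the columns that row actually has.
column_family = {
    "row1": {"profile:name": "Ada", "profile:email": "ada@example.com"},
    "row2": {"profile:name": "Lin"},  # no email column is stored at all
}
cols_per_row = {key: len(cells) for key, cells in column_family.items()}

print(total_amount)
print(cols_per_row)
```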
  • DEMO – Google Cloud Datastore
  • DEMO – SQL Server ‘NoSQL’
      • SQL Server Columnstore Index
      • SQL Server SSAS Tabular Model
  • Document Database
      • Document-oriented (collection of JSON documents) w/semi-structured data
          – Encodings include BSON, JSON, XML…
      • Binary forms (PDF, Microsoft Office documents such as Word, Excel…)
      • Examples:
          – MongoDB
          – Couchbase
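
As a rough illustration of a "collection of JSON documents" with semi-structured data, the Python sketch below keeps documents in a list and filters them with a predicate. The `find` helper is hypothetical and is not the MongoDB or Couchbase API; the documents themselves are invented.

```python
import json

# A "collection" of JSON documents; fields vary from document to document.
collection = [
    json.loads('{"_id": 1, "title": "Intro to Hadoop", "tags": ["hadoop", "hdfs"]}'),
    json.loads('{"_id": 2, "title": "Graph basics", "author": {"name": "Lynn"}}'),
    json.loads('{"_id": 3, "title": "HBase notes", "tags": ["hadoop", "hbase"]}'),
]

def find(docs, predicate):
    """Return the documents matching a predicate (a stand-in for a query filter)."""
    return [d for d in docs if predicate(d)]

# Documents without a "tags" field are skipped rather than causing an error.
hadoop_docs = find(collection, lambda d: "hadoop" in d.get("tags", []))
titles = [d["title"] for d in hadoop_docs]

print(titles)
```

Because no schema is enforced, queries have to tolerate missing fields, which is why the filter uses `d.get("tags", [])`.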
  • Demo - MongoDB
  • Graph Databases
      • A lot of many-to-many relationships
      • Recursive self-joins
      • When your primary objective is quickly finding connections, patterns and relationships between the objects within lots of data
      • Examples:
          – Neo4j
          – AlgebraixData
          – Google Freebase
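
The "finding connections" workload can be sketched as a breadth-first search over an adjacency list, which is the kind of traversal that would otherwise need recursive self-joins in SQL. This is plain Python, not Neo4j or Cypher; the `follows` graph and its names are invented for the example.

```python
from collections import deque

# Directed "follows" relationships stored as an adjacency list.
follows = {
    "ann": ["bob", "cid"],
    "bob": ["dee"],
    "cid": ["dee"],
    "dee": ["eve"],
    "eve": [],
}

def shortest_path(graph, start, goal):
    """Breadth-first search: return one shortest path from start to goal, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None

path = shortest_path(follows, "ann", "eve")
print(path)
```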
  • DEMO – Neo4J
  • Cloud-hosted, partially managed RDBMS
      • AWS RDS
          – SQL Server
          – MySQL
          – PostgreSQL
          – Oracle
      • Google
          – MySQL
      • Microsoft
          – SQL Azure
  • DEMO - AWS RDS
      • SQL Server, MySQL or Oracle
      • Essential to understand pricing models
  • NoSQL Applied
      • Log Files: Columnstore, HBase
      • Product Catalogs: Key/Value, DynamoDB
      • Social Games: Document, MongoDB
      • Social aggregators: Graph, Neo4j
      • Line-of-Business: RDBMS, SQL Server
  • Cloud Offerings – RDBMS AND NoSQL
      • Managed RDBMS: AWS – RDS (all major RDBMS); Google – Cloud SQL; Microsoft – SQL Azure
      • NoSQL buckets: AWS – S3 or Glacier; Google – Cloud Storage; Microsoft – Azure Blobs
      • NoSQL Key-Value: AWS – DynamoDB; Google – Cloud Datastore; Microsoft – Azure Tables
      • Streaming or ML: AWS – Kinesis; Google – Prospective Search & Prediction API; Microsoft – StreamInsight
      • NoSQL Document or Graph: AWS – MongoDB on EC2, Neo4j on EC2; Google – None, Freebase; Microsoft – MongoDB on Microsoft Cloud, Neo4j on Microsoft Cloud
      • Hadoop (HBase): AWS – Elastic MapReduce (S3 & EC2); Google – None; Microsoft – HDInsight
      • Dremel/Warehousing: AWS – Redshift; Google – BigQuery; Microsoft – None
      • Cloud ETL: AWS – Data Pipelines; Google – None; Microsoft – None
  • But wait… how do I query NoSQL data?
  • Example – translate ANSI SQL to MapReduce
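
The slide's own example is not in the transcript, so the following is only a sketch of what such a translation can look like. It assumes a query of the form `SELECT region, SUM(amount) FROM orders GROUP BY region` and expresses it as map, shuffle, and reduce steps in plain Python. The `orders` data and the function names are invented; this runs in one process and is not Hadoop's API.

```python
from collections import defaultdict

# Input records, standing in for rows of a hypothetical "orders" table.
orders = [
    {"region": "EU", "amount": 120},
    {"region": "US", "amount": 80},
    {"region": "EU", "amount": 50},
]

def map_phase(record):
    """Map: emit (key, value) pairs; the key plays the role of the GROUP BY column."""
    yield record["region"], record["amount"]

def shuffle(pairs):
    """Shuffle: collect all values emitted under the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: apply the aggregate (SUM) to each key's values."""
    return key, sum(values)

pairs = [kv for record in orders for kv in map_phase(record)]
result = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

print(result)
```

In this reading, the GROUP BY column becomes the map output key and the aggregate function becomes the reducer; a WHERE clause would become a filter inside `map_phase`.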
  • Can Excel help?
      • Connector to Hadoop
      • Power BI
      • Data Quality Services
      • Master Data Services
      • Integration with Azure Data Market
      • Data Mining w/Predixion
  • Demo – Excel Power Query
  • NoSQL To-Do List
      • Understand types of NoSQL databases
          – Use NoSQL when business needs dictate
          – Use the right type of NoSQL for your business problem
      • Try out NoSQL on the cloud
          – Quick and cheap for behavioral data
          – Mashup cloud datasets
          – Good for specialized use cases, e.g. dev, test, training environments
      • Learn NoSQL access technologies & services
          – New query languages, e.g. MapReduce, R, Infer.NET
          – New query tools (vendor-specific) – Google Refine, Amazon Karmasphere, Microsoft Excel connectors, etc…
          – Windows Azure Data Market, other public data markets
  • www.TeachingKidsProgramming.org
      • Free Courseware (Java, Small Basic or C# [on Pluralsight])
      • Do a Recipe -> Teach a Kid (Ages 10+)
  • A Big Thank You To Our Sponsors: Gold Partners, Silver & Track Partners, Platinum Partners