Not only SQL - Database Choices

deck from talk at StartupCodeCamp at House of Devs in the OC in Jan 2014

  • http://pragprog.com/book/rwdata/seven-databases-in-seven-weeks
  • http://hadoop.apache.org/
    http://en.wikipedia.org/wiki/Apache_Hadoop
  • Hadoop on Azure -- http://msdn.microsoft.com/en-us/magazine/jj190805.aspx
    http://www.oracle.com/technetwork/bdc/hadoop-loader/overview/index.html
    http://www.microsoft.com/download/en/details.aspx?id=27584
  • http://hortonworks.com/technology/hortonworksdataplatform/
    More about HBase, from the O’Reilly ‘Getting Ready for BigData’ report: “Enter HBase, a column-oriented database that runs on top of HDFS. Modeled after Google’s BigTable, the project’s goal is to host billions of rows of data for rapid access. MapReduce can use HBase as both a source and a destination for its computations, and Hive and Pig can be used in combination with HBase. In order to grant random access to the data, HBase does impose a few restrictions: performance with Hive is 4-5 times slower than plain HDFS, and the maximum amount of data you can store is approximately a petabyte, versus HDFS’ limit of over 30PB.”
    http://www.cloudera.com/
  • http://www.cloudera.com/content/cloudera-content/cloudera-docs/DemoVMs/Cloudera-QuickStart-VM/cloudera_quickstart_vm.html
  • Original Reference: Tom White’s Hadoop: The Definitive Guide (I made some modifications based on my experience)
  • http://lynnlangit.wordpress.com/2011/11/09/relational-cloud-storage-is-50x-more-expensive-than-nosql/
  • http://nosql-database.org/
    http://hadoop.apache.org/ & http://www.mongodb.org/
    Wikipedia - http://en.wikipedia.org/wiki/NoSQL
    List of noSQL databases – http://nosql-database.org/
    The good, the bad - http://www.techrepublic.com/blog/10things/10-things-you-should-know-about-nosql-databases/1772
  • http://bigdatanerd.wordpress.com/2012/01/04/why-nosql-part-2-overview-of-data-modelrelational-nosql/
    http://docs.jboss.org/hibernate/ogm/3.0/reference/en-US/html_single/
  • http://en.wikipedia.org/wiki/Project_Voldemort
    http://aws.amazon.com/
    http://docs.amazonwebservices.com/amazondynamodb/latest/developerguide/Introduction.html
    http://www.allthingsdistributed.com/2012/01/amazon-dynamodb.html
  • http://code.google.com
    Access via REST APIs
    Very cheap, but not much functionality included
    Lots of code to write for application development
    But… can be a good backup solution
  • http://googledevelopers.blogspot.com/2014/01/get-started-with-google-cloud-platform.html
    http://stage.hypertable.com/index.php/documentation/architecture/
    http://code.google.com/appengine/
    http://code.google.com/appengine/articles/datastore/overview.html
  • http://cwebbbi.wordpress.com/2012/02/14/so-what-is-the-bi-semantic-model/
    http://www.databasejournal.com/features/mssql/understanding-new-column-store-index-of-sql-server-2012.html
    http://dbmsmusings.blogspot.com/2010/03/distinguishing-two-major-types-of_29.html
    http://ayende.com/blog/4500/that-no-sql-thing-column-family-databases
  • https://developers.google.com/datastore/docs/concepts/overview
    http://googledevelopers.blogspot.com/2014/01/get-started-with-google-cloud-platform.html
  • http://en.wikipedia.org/wiki/MongoDB
    http://www.mongodb.org/downloads
    http://www.mongodb.org/display/DOCS/Drivers
  • http://en.wikipedia.org/wiki/MongoDB & http://try.mongodb.org/
    http://www.mongodb.org/downloads
    http://www.mongodb.org/display/DOCS/Drivers
  • http://www.infinitegraph.com/what-is-a-graph-database.html and http://www.neo4j.org/
    http://en.wikipedia.org/wiki/Graph_database
    http://www.freebase.com/
  • http://www.neo4j.org/learn/try
  • For Google - http://code.google.com
    For AWS - https://console.aws.amazon.com/console/home
  • Hadoop on AWS - http://wiki.apache.org/hadoop/AmazonEC2
  • http://rickosborne.org/download/SQL-to-MongoDB.pdf
  • http://www.microsoft.com/en-us/bi/default.aspx
    http://dennyglee.com/
    Demos - http://www.youtube.com/watch?v=djfpPsGwm6A and http://www.youtube.com/watch?v=uh9bKWO1K7U
  • DataMarkets – InfoChimps, Factual, DataMarket, Windows Azure Data Marketplace, Wolfram Alpha, Datasift
    http://www.microsoft.com/en-us/sqlazurelabs/default.aspx and http://www.microsoft.com/en-us/sqlazurelabs/labs/dataexplorer.aspx
    https://datamarket.azure.com/
    http://www.freebase.com/
    http://code.google.com/p/google-refine/
  • http://www.inboundlogistics.com/cms/article/m2m-101/
    http://www.freebase.com/
    Hilary Mason’s datasets - https://bitly.com/bundles/hmason/1
  • Lynn

Transcript

  • 1. Database Choices Lynn Langit Jan 2014 – Startup Code Camp in the OC
  • 2. Data Expertise / Lynn Langit
      • Industry awards
          – Microsoft – MVP for SQL Server
          – Google – GDE for Cloud Platform
          – 10Gen – Master for MongoDB
      • Practicing Architect
      • Technical author / trainer
          – Pluralsight – Google Cloud Series
          – DevelopMentor – SQL Server 2012 Series
          – 2 books on SQL Server BI
          – Cloudera trainer (certified)
      • Former MSFT FTE – 4 years
  • 3. Databases Now a Menu of Choices
  • 4. Data Pipeline Process All Acquire New Clean Existing Store Some Query & Mine
  • 5. Is Big Data = NoSQL and just Hadoop?
      HUGE hype factor since 2011
      Apache Hadoop
      • a software framework that supports data-intensive distributed applications under a free license
      • enables applications to work with thousands of nodes and petabytes of data
      • was inspired by Google's MapReduce and Google File System (GFS) papers
  • 6. Hadoop in the Enterprise
  • 7. How you ‘get’ Hadoop
      Open source
      • roll your own
      Commercial distribution
      • Cloudera
      • MapR
      • Hortonworks
      • More…
      Rent it via the cloud
      • AWS
      • HDInsight
  • 8. Demo – AWS MapReduce
  • 9. Working with Hadoop
  • 10. About Hadoop MapReduce Image from - https://developers.google.com/appengine/docs/python/images/mapreduce_mapshuffle.png
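
The map, shuffle, and reduce stages named on this slide can be sketched in a few lines of plain Python. This is only an in-memory illustration of the programming model, not Hadoop code and not the demo from the talk; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the sample input are assumptions made for the example.

```python
from collections import defaultdict


def map_phase(documents):
    """Map: emit a (word, 1) pair for every word in every document."""
    for doc in documents:
        for word in doc.lower().split():
            yield word, 1


def shuffle_phase(pairs):
    """Shuffle: group all emitted values under their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: collapse each key's list of values into a single count."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(documents):
    """Run the three stages in order on an in-memory list of strings."""
    return reduce_phase(shuffle_phase(map_phase(documents)))


if __name__ == "__main__":
    # Hypothetical sample input; a real job would read files from HDFS or S3.
    print(word_count(["big data is big", "data is data"]))
```

In a real cluster each stage runs in parallel across many nodes, and the shuffle moves data over the network, which is one reason batch latency is higher than an interactive query.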
  • 11. The Hadoop on premises Market Leader Is Cloudera
  • 12. Example Comparison: RDBMS vs. Hadoop
                            | Traditional RDBMS       | Hadoop / MapReduce
      Data Size             | Gigabytes (Terabytes)   | Petabytes and greater
      Access                | Interactive and Batch   | Batch – NOT Interactive
      Updates               | Read / Write many times | Write once, Read many times
      Structure             | Static Schema           | Dynamic Schema
      Integrity             | High (ACID)             | Low
      Scaling               | Nonlinear               | Linear
      Query Response Time   | Can be near immediate   | Has latency (due to batch processing)
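
One way to read the "Structure" and "Updates" rows of this comparison is schema-on-write versus schema-on-read. The sketch below uses Python's built-in `sqlite3` as a stand-in for a traditional RDBMS and a plain list of text lines as a stand-in for write-once files in Hadoop; the table name, log lines, and `parse` helper are all hypothetical.

```python
import sqlite3

# RDBMS style: the schema is declared up front and enforced on every write.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (user_id INTEGER NOT NULL, action TEXT NOT NULL)"
)
conn.execute("INSERT INTO events VALUES (?, ?)", (1, "login"))
try:
    # A row that violates the declared schema is rejected at write time.
    conn.execute("INSERT INTO events VALUES (?, ?)", (2, None))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Hadoop style (assumed reading): raw lines are written once, as-is,
# and a schema is applied only when a job reads them.
raw_log = ["1,login", "2,click,extra-field", "not-a-record"]


def parse(line):
    """Apply a schema at read time; return None for lines that do not fit."""
    parts = line.split(",")
    if len(parts) < 2 or not parts[0].isdigit():
        return None
    return {"user_id": int(parts[0]), "action": parts[1]}


parsed = [rec for rec in map(parse, raw_log) if rec is not None]
```

The relational side refuses bad data early, which supports the "High (ACID)" integrity row; the raw-file side accepts anything and leaves each reader to decide what counts as a valid record.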
  • 13. “Small” BigData vs. “Big” BigData On Premises In the Cloud Hadoop Hadoop NoSQL NoSQL RDBMS RDBMS
  • 14. But wait… is there a relational database that scales, that is cheap, and that runs in the cloud?
  • 15. DEMO - AWS Redshift • About $1k per Terabyte per year - relational
  • 16. Cloud-hosted NoSQL up to 50x CHEAPER
  • 17. So many NoSQL options • More than just the Elephant in the room • More than 150 types of NoSQL databases
  • 18. Flavors of NoSQL Key/Value Volatile Key/value Persistent Wide-Column Document Graph
  • 19. Key / Value Database • Just keys and values – No schema • Persistent or Volatile • Examples – AWS DynamoDB – Riak
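
To make "just keys and values, no schema, persistent or volatile" concrete, here is a minimal sketch in plain Python. It does not use the DynamoDB or Riak APIs; the class names `VolatileStore` and `PersistentStore`, the JSON file format, and the sample keys are assumptions chosen for illustration.

```python
import json
import os
import tempfile


class VolatileStore:
    """Keys and values held only in memory; contents vanish with the process."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)


class PersistentStore(VolatileStore):
    """Same interface, but every write is also saved to a JSON file on disk."""

    def __init__(self, path):
        super().__init__()
        self._path = path
        if os.path.exists(path):
            with open(path) as f:
                self._data = json.load(f)

    def put(self, key, value):
        super().put(key, value)
        with open(self._path, "w") as f:
            json.dump(self._data, f)


# No schema: values under different keys can have completely different shapes.
path = os.path.join(tempfile.mkdtemp(), "store.json")
first = PersistentStore(path)
first.put("user:42", {"name": "Ada", "score": 7})
first.put("session:9", "any value shape is allowed")

# A second instance pointed at the same file sees the earlier writes.
reopened = PersistentStore(path)
```

The only access path is the key, which is what keeps these stores simple to scale; anything resembling a query over values has to be built by the application.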
  • 20. DEMO - AWS DynamoDB • Key/Value store on the AWS cloud
  • 21. File (BLOB) Storage Buckets in the Cloud • Amazon – S3 or Glacier • Google – Cloud Storage • Microsoft Azure BLOBS
  • 22. DEMO - Battle of the Buckets • Google Cloud Storage VS. • Windows Azure BLOBS VS. • AWS S3  (Archiving) in to AWS Glacier
  • 23. Column Database • Wide, sparse column sets • Schema-light • Examples: – HBase w/Hadoop – Google Cloud Datastore – SQL Server Columnstore Indexes or SSAS Tabular Models
  • 24. Types of Column Databases
      • Column-families
          – Non-relational
          – Sparse
          – Examples: HBase, Cassandra, xVelocity (SQL 2012 Tabular)
      • Column-stores
          – Relational
          – Dense
          – Example: SQL Server 2012 – Columnstore index
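
The dense versus sparse distinction on this slide can be sketched with ordinary Python data structures. This is a conceptual illustration only, not how HBase, Cassandra, or SQL Server store data internally; the sample rows, the `family:qualifier` column names, and the `get_cell` helper are assumptions.

```python
# Row-oriented layout: each record is stored together (typical OLTP table).
rows = [
    {"id": 1, "region": "west", "sales": 100},
    {"id": 2, "region": "east", "sales": 250},
    {"id": 3, "region": "west", "sales": 75},
]

# Column-store (relational, dense): one array per column,
# and every row has a value in every column.
column_store = {
    name: [row[name] for row in rows] for name in ("id", "region", "sales")
}
# An aggregate touches only the one column it needs.
total_sales = sum(column_store["sales"])

# Column-family (non-relational, sparse): each row key holds only
# the columns it actually has, so rows can differ freely.
column_family = {
    "user:1": {"profile:name": "Ada", "profile:email": "ada@example.com"},
    "user:2": {"profile:name": "Lin", "stats:logins": 12},
}


def get_cell(table, row_key, column, default=None):
    """Look up one cell; absent columns cost nothing and return the default."""
    return table.get(row_key, {}).get(column, default)
```

Here `total_sales` reads a single list, which is the intuition behind columnstore indexes for analytics, while `get_cell(column_family, "user:2", "profile:email")` simply finds nothing because that column was never stored for that row.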
  • 25. DEMO – Google Cloud Datastore
  • 26. DEMO – SQL Server ‘NoSQL’ • SQL Server 2012 Columnstore Index • SQL Server 2012 Tabular Model (SSAS)
  • 27. Document Database (MongoDB) • document-oriented (collection of JSON documents) w/semi-structured data – Encodings include BSON, JSON, XML… • binary forms (PDF, Microsoft Office documents: Word, Excel…) • Examples: – MongoDB – Couchbase
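
A "collection of JSON documents" with semi-structured data can be sketched with plain Python dictionaries. This does not use the MongoDB driver or its query language; the `_id` field follows MongoDB's naming convention, but the `get_path` and `find` helpers and the sample documents are hypothetical.

```python
import json

# A collection: documents share no fixed schema and may nest freely.
collection = [
    {"_id": 1, "name": "Ada", "tags": ["admin"], "address": {"city": "Irvine"}},
    {"_id": 2, "name": "Lin", "tags": []},
    {"_id": 3, "name": "Sam", "address": {"city": "Irvine"}, "age": 31},
]


def get_path(doc, dotted_path):
    """Follow a dotted path such as 'address.city'; return None if absent."""
    current = doc
    for part in dotted_path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def find(docs, dotted_path, expected):
    """Return every document whose value at dotted_path equals expected."""
    return [d for d in docs if get_path(d, dotted_path) == expected]


matches = find(collection, "address.city", "Irvine")
names = [d["name"] for d in matches]

# Each document serializes directly to JSON text.
as_json = json.dumps(collection[0], sort_keys=True)
```

Documents without an `address` field are skipped rather than rejected, which is the practical meaning of semi-structured: the query tolerates missing fields instead of requiring every record to match one schema.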
  • 28. Demo - MongoDB
  • 29. Graph Databases • a lot of many-to-many relationships • recursive self-joins • when your primary objective is quickly finding connections, patterns and relationships between the objects within lots of data • Examples: – Neo4J – Google Freebase
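
"Quickly finding connections" between objects is, at its simplest, a path search over nodes and edges. The sketch below uses a breadth-first search over an adjacency list in plain Python; it is not Neo4j or Cypher, and the `follows` data and `shortest_path` function are hypothetical examples.

```python
from collections import deque

# Directed "follows" relationships: a many-to-many structure that would
# need recursive self-joins in a relational schema.
follows = {
    "ann": ["bob", "cat"],
    "bob": ["dan"],
    "cat": ["dan", "eve"],
    "dan": ["eve"],
    "eve": [],
}


def shortest_path(graph, start, goal):
    """Breadth-first search; returns the shortest list of nodes, or None."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None


connection = shortest_path(follows, "ann", "eve")
```

A graph database keeps relationships as first-class records, so a traversal like this follows stored links directly instead of joining a table to itself once per hop.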
  • 30. DEMO – Neo4J
  • 31. “Small” BigData vs. “Big” BigData Hadoop Key/Value or Column Document or Graph RDBMS On Premise or In the Cloud
  • 32. Cloud-hosted RDBMS
      • AWS RDS – SQL Server, MySQL, Oracle
          – Medium cost
          – Solid feature set, e.g. backup, snapshot
          – Use existing tooling
      • Google – MySQL
          – Lowest cost
          – Most limited RDBMS functionality
      • Microsoft – SQL Azure
          – Highest cost
  • 33. DEMO - AWS RDS • SQL Server, MySQL or Oracle • Essential to understand pricing models
  • 34. Image - http://blog.outsourcing-partners.com/wp-content/uploads/2012/10/performance.png
  • 35. Document MongoDB Graph Neo4j RDBMS SQL Server Line-of-Business DynamoDB Social aggregators Key/Value Social Games HBase Product Catalogs Columnstore Log Files NoSQL Applied
  • 36. Cloud Offerings – RDBMS AND NoSQL
                                      | AWS                              | Google                               | Microsoft
      RDBMS                           | RDS – all major                  | MySQL                                | SQL Azure
      NoSQL buckets                   | S3 or Glacier                    | Cloud Storage                        | Azure Blobs
      NoSQL Key-Value                 | DynamoDB                         | Cloud Datastore                      | Azure Tables
      Streaming ML or (Mahout)        | Custom EC2                       | Prospective Search & Prediction API  | StreamInsight
      NoSQL Document or Graph         | MongoDB on EC2                   | Freebase                             | MongoDB on Windows Azure
      NoSQL – Column Hadoop (HBase)   | Elastic MapReduce using S3 & EC2 | none                                 | HDInsight
      Dremel/Warehousing              | Redshift                         | BigQuery                             | none
  • 37. But wait… how do I query NoSQL data?
  • 38. Always MapReduce?
  • 39. Can Excel help?
      • Connector to Hadoop
      • Data Explorer
      • Data Quality Services
      • Master Data Services
      • Integration with Azure Data Market
      • Visualize with PowerView
      • Data Mining w/Predixion
  • 40. Demo - Hadoop Connector to Excel
  • 41. Other types of cloud data services
      Hosting public datasets
      • Pay to read
      • Earn revenue by offering for read
      Cleaning / matching (your) data
      • ETL – Microsoft Data Explorer, Google Refine
      • Data Quality – Windows Azure Data Market, InfoChimps, DataMarket.com
  • 42. Collecting for “BigData” • Sensors everywhere • Structured, Semi-structured, Unstructured vs. Data Standards • M2M • Public Datasets – Freebase – Azure DataMarket – Hilary Mason’s list
  • 43. NoSQL To-Do List
      Understand types of NoSQL databases
      • Use NoSQL when business needs dictate
      • Use the right type of NoSQL for your business problem
      Try out NoSQL on the cloud
      • Quick and cheap for behavioral data
      • Mashup cloud datasets
      • Good for specialized use cases, e.g. dev, test, training environments
      Learn NoSQL access technologies & services
      • New query languages, e.g. MapReduce, R, Infer.NET
      • New query tools (vendor-specific) – Google Refine, Amazon Karmasphere, Microsoft Excel connectors, etc.
      • Windows Azure Data Market, other public data markets
  • 44. www.TeachingKidsProgramming.org
      • Free Courseware (recipes): Java, Small Basic or C# [on Pluralsight]
      • Do a Recipe, Teach a Kid (Ages 10 ++)
  • 45. Keep Learning • Twitter: @LynnLangit • YouTube: http://www.youtube.com/user/SoCalDevGal • Hire me – To help build your BI/Big Data solution – To teach your team next gen BI – To learn more about using NoSQL solutions