7 Databases in 70 minutes


In the spirit of the book 7 Databases in 7 Weeks, Lara Rubbelke and Karen Lopez cover ~seven databases and datastores in the SQL and NoSQL world, when to use them, and how they are SQL-like.

From SQLBitsXV

Notice an error? Let me know. I welcome this sort of feedback.

Published in: Data & Analytics

  1. 1. 7 Databases in 70 Minutes: Overview of NoSQL in Azure
  2. 2. Lara Rubbelke (@sqlgal): Technical Architect at Microsoft. Primary focus on data solutions in the cloud.
  3. 3. Karen López (#TEAMDATA): Karen has 20+ years of data and information architecture experience on large, multi-project programs. She is a frequent speaker on data modeling, data-driven methodologies and pattern data models. She wants you to love your data.
  4. 4. “The only reason for time is so that everything doesn’t happen at once.” - Albert Einstein*. Session inspired by the book Seven Databases in Seven Weeks.
  5. 5. Outcomes. We want you to leave here understanding: key concepts for hybrid database architectures; database / datastore types; reasons to go explore.
  6. 6. What We Will Cover. This is NOT: a deep dive on any technology; a comprehensive list; a roadmap discussion.
  7. 7. What We’ll Cover. NoSQL 101: comparison to relational; Not Only SQL (but really “Not SQL”). Terminology. Categories: what they are; why you use them; when you use them; a little of how to use them. CAP, ACID, BASE, SCHEMA, Cloud Scale.
  8. 8. Distributed Systems and the CAP Theorem: Consistency, Availability, Partition Tolerance. References: Eric Brewer’s CAP Theorem and, even better, CAP Twelve Years Later; Myth: Eric Brewer On Why Banks Are BASE Not ACID - Availability Is Revenue.
  9. 9. BASE - ACID. BASE: Basically Available, Soft State, Eventually Consistent. ACID: Atomic, Consistent, Isolated, Durable.
  10. 10. The And. Polyglot persistence: optimized for data; optimized for workload. Not all new: EAV; XML; architecture paradigm: OLAP/DW and OLTP.
  11. 11. The Why. Polyschematic: multiple schemas over the same data; schema on read, not on write; data integrity may be managed elsewhere. * ALL DATA HAS STRUCTURE! ** EMBRACE DENORMALIZATION
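To make "schema on read, not on write" concrete, here is a minimal Python sketch. It is not from the session: the record values are borrowed from the Person Table slides later in the deck, and the helper name `read_with_schema` and the two field lists are assumptions made for illustration only.

```python
import json

# Raw records are stored untouched; no schema is enforced on write.
RAW_LINES = [
    '{"trackingid": 720, "gender": "male", "age": 62}',
    '{"trackingid": 721, "gender": "male", "photo": "image"}',
    '{"trackingid": 723, "video": "stream"}',
]


def read_with_schema(lines, fields):
    """Project each raw record onto the requested fields at read time.

    Fields a record does not have come back as None instead of failing.
    """
    for line in lines:
        record = json.loads(line)
        yield {name: record.get(name) for name in fields}


# Two hypothetical schemas applied over the same stored data.
demographics = list(read_with_schema(RAW_LINES, ["trackingid", "gender", "age"]))
media = list(read_with_schema(RAW_LINES, ["trackingid", "photo", "video"]))
```

Because nothing validates the data on write, any integrity rules have to live in the readers or elsewhere in the pipeline, which is the trade-off the slide points at.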
  12. 12. Kinect Telemetry Retail Application (diagram). Reporting/Analysis: Hadoop Batch Processing. Sensor Data: Column Family. Price Check: Key-Value. Product Catalog: Document Store.
  13. 13. Data-Intensive Applications in the Cloud Computing World (architecture diagram). Web Application (Web Roles x 180); Web Service/API (Web Roles x 2); Back Office (Web Roles x 2); Moderation Service/Appliance (CRISP/3rd Party); Activity Processors (Worker Roles x 2); Cache Tasks (Worker Roles x 4); Background Tasks; Cache: Users and Friends, Feed, Games and Leader Boards, Resources and References (Distributed Cache x 32); Activity Queue (Azure Storage); Activity Table x 50 Partitions (Azure Storage); Google Analytics Logs (Azure Storage); IIS Logs (Azure Storage); Email DBs (SQL Azure x 16); Username DBs (SQL Azure x 16); User Profiles (SQL Azure x 400); DB: Utility DB, Content DB, Taxonomy DB (SQL Azure); Data Analysis: Staging, Virtual Machine, Data Warehouse, Reporting Services.
  14. 14. NoSQL, Not Only SQL: Relational, Key Value, Column Family, Document, Hadoop, Graph
  15. 15. Relational: …lots of other sessions to learn about this…
  16. 16. Key-Value: Azure Tables, Azure Redis Cache
  17. 17. Key-Value Database: Sample Use. Table: PriceCompare
      LocationID (Partition Key) | ProductBySellerID (Row Key) | ProductProperties (Properties)
      123 | 013803204131 | {Seller: "Camera Superstore", Price: 425.99, PriceDate: 2014-11-06, SellerType: "Online"}
  18. 18. Why Key-Value. Patterns/What Works: • Low cost, scalable, highly available and geo-redundant • Flexible schema • Fast reads and writes on single key values or partitioned key values • Log data and cache. Anti-Pattern/Danger: anything that requires: • Joins • Custom sorting • Non-key filters
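The contrast between key reads and non-key filters can be sketched in a few lines of Python. This is only an in-memory stand-in, not Azure Tables itself: the dictionary, `point_lookup`, and `non_key_filter` are hypothetical names, and the single entity is the PriceCompare sample from the previous slide.

```python
# In-memory stand-in for a key-value table, keyed by (partition key, row key).
price_compare = {
    ("123", "013803204131"): {
        "Seller": "Camera Superstore",
        "Price": 425.99,
        "PriceDate": "2014-11-06",
        "SellerType": "Online",
    },
}


def point_lookup(table, partition_key, row_key):
    """Single-key read: one dictionary lookup, independent of table size."""
    return table.get((partition_key, row_key))


def non_key_filter(table, predicate):
    """Filter on a property: every entity has to be inspected (a full scan)."""
    return [props for props in table.values() if predicate(props)]


entity = point_lookup(price_compare, "123", "013803204131")
online = non_key_filter(price_compare, lambda p: p["SellerType"] == "Online")
```

The lookup cost stays flat as the table grows, while the filter grows with the number of entities, which is why non-key filters sit on the anti-pattern side of the slide.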
  19. 19. Azure Tables: LINQ Query
      // Create a table client.
      CloudTableClient tableKinect = account.CreateCloudTableClient();
      CloudTable tableKinectTelemetry = tableKinect.GetTableReference("pricecompare");
      // Create a query for the entity with the given partition key and row key.
      // Both keys are strings, so they are compared against string literals.
      IQueryable<DynamicTableEntity> query =
          from q in tableKinectTelemetry.CreateQuery<DynamicTableEntity>()
          where q.PartitionKey == "123" && q.RowKey == "013803204131"
          select q;
  20. 20. Learn More: Azure Tables and Redis Cache. Introduction to Windows Azure Tables; Azure Redis Cache 101 on Channel9
  21. 21. Document: DocDB, MongoDB; BSON & JSON; Databases, Documents, Collections
  22. 22. Document: Persistence
  23. 23. Document Features: Nested Arrays; Keys & Values; Text, text, text…; Similar to XML patterns
  24. 24. Document: Query (DocDB, MongoDB)
  25. 25. Why Document. Patterns/What Works: • Variable Data Structures for same type of entity • Fast reads and writes on a complete entity set • Highly nested data stories • Partially completed workflows • You love JavaScript :) Anti-Pattern/Danger: anything that requires: • Joins • Complex transactional needs • Lots of aggregation
  26. 26. Document Use Cases: Logs; Pre-aggregated data; Product Catalog; Shopping Cart; Travel Reservation
  27. 27. Learn More: Azure DocumentDB. Azure DocumentDB .NET Code Samples; Azure DocumentDB 101 on Channel9; Azure DocumentDB 102 on Channel9; Build a web application with ASP.NET MVC using DocumentDB
  28. 28. Column Family
  29. 29. Column Family Use Cases: Sensor Data Analysis; Real-time Query; Web Indexer; Message Systems; Interactive Dashboards
  30. 30. Apache HBase Features: Random and Consistent Real-Time Read/Write; Automatic Sharding and Linear Scale; Billions of Rows and Millions of Columns
  31. 31. Column Family Stores: a map of maps… with Tables, Column Families, Rows, Columns, Values
  32. 32. Don’t Think About
  33. 33. Think About
  34. 34. Think About: Person Table, by Row Key. 720: gender -> male, age -> 62. 721: gender -> male, photo -> image. 723: video -> stream.
  35. 35. Understanding BigTable: sparse | persistent | distributed | sorted | multidimensional. { "trackingid" : 720, "gender" : "male", "age" : 62 } Great Reference: Understanding HBase and Big Table
  36. 36. HBase: A map of maps… (keyed by Row Key; Sparse)
      {
        "720" : { "age" : "62", "gender" : "male" },
        "721" : { "age" : "40", "gender" : "male", "confidence" : "0.65" },
        "722" : { "gender" : "female" },
        "723" : { "age" : "12", "gender" : "female", "confidence" : "0.65" },
        …
      }
  37. 37. HBase: Column Families (Multidimensional: each row holds the column families Demographics and Interactions)
      "720" : {
        "demographics" : { "age" : "62", "gender" : "male" },
        "interactions" : { "devicestate" : "removed", "duration" : "100" }
      },
      "721" : {
        "demographics" : { "age" : "40", "gender" : "male" },
        "interactions" : { "devicestate" : "replaced", "duration" : "50" }
      }
      …
  38. 38. HBase: Physical View of a Sorted Map. Sort order: Row Key, Column Name, Timestamp. Table: telemetry
      Row Key | Column Key | Timestamp | Value
      720 | demographics:age | 1423234758774 | 62
      720 | demographics:gender | 1423234758711 | male
      721 | demographics:age | 1423234758946 | 22
      721 | demographics:age | 1423234758725 | 32
      721 | demographics:gender | 1423234758950 | female
      Cell: Uninterpreted Bytes, addressed by {row, column, version}
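The step from the nested "map of maps" to this sorted physical view can be sketched in Python. This is an illustration under simplifying assumptions, not HBase code: the nested dictionary layout and the helper names `physical_view` and `latest` are made up here, and plain string comparison stands in for HBase's byte-wise ordering of keys. The values are the ones from the telemetry table above.

```python
# Nested "map of maps": row key -> column family -> qualifier -> {timestamp: value}.
telemetry = {
    "721": {"demographics": {
        "gender": {1423234758950: "female"},
        "age": {1423234758725: "32", 1423234758946: "22"},
    }},
    "720": {"demographics": {
        "age": {1423234758774: "62"},
        "gender": {1423234758711: "male"},
    }},
}


def physical_view(table):
    """Flatten the nested map into (row, column, timestamp, value) cells,
    sorted by row key, then column key, then newest timestamp first."""
    cells = [
        (row, f"{family}:{qualifier}", ts, value)
        for row, families in table.items()
        for family, columns in families.items()
        for qualifier, versions in columns.items()
        for ts, value in versions.items()
    ]
    return sorted(cells, key=lambda c: (c[0], c[1], -c[2]))


def latest(table, row, family, qualifier):
    """Return the most recent version of a cell, or None if it is absent."""
    versions = table.get(row, {}).get(family, {}).get(qualifier, {})
    return versions[max(versions)] if versions else None


cells = physical_view(telemetry)
```

Sorting newest-first within a column means a read for the current value of a cell can stop at the first matching entry, and a missing column simply has no entry at all, which is what "sparse" means in practice.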
  39. 39. HBase: Query And… HBase SDK for .NET
  40. 40. Apache Phoenix: SQL Skin over HBase. CREATE TABLE IF NOT EXISTS "kinecttelemetry" ("k" VARCHAR PRIMARY KEY, "age" VARCHAR, "gender" VARCHAR) DEFAULT_COLUMN_FAMILY='demographics'; Learn more: Phoenix in 15 Minutes or Less
  41. 41. Learn More: HBase on Azure. Get started using HBase with Hadoop in HDInsight; Analyze Real-Time Twitter Sentiment with HBase in HDInsight
  42. 42. Hadoop Zoo: Distributed Storage (HDFS or Blob Storage); Distributed Processing (MapReduce); Scripting (Pig); SQL-like Query (HiveQL); SQL-like Query (Impala); Resource Scheduling (YARN); Real-Time (HBase)
  43. 43. Hadoop On Your Terms: Cloudera Selects Microsoft Azure as a Preferred Cloud Platform; Hortonworks Data Platform is now Microsoft Azure Certified; 100% Apache Hadoop-based Service in the Cloud (Microsoft Azure HDInsight); Qubole Partners with Microsoft Azure
  44. 44. Hadoop: Persistence. It’s a text file… really
  45. 45. Hadoop Hive: External Table
      Create Table:
      CREATE EXTERNAL TABLE irs_data_20082 (
        state string, zipcode string, agi_class int, n1 int,
        mars2 int, prep int, n2 int, numdep int)
      ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
      STORED AS TEXTFILE
      LOCATION 'wasb://$containerName@$';
      Query:
      SELECT state, zipcode, agi_class FROM irs_data_20082;
  46. 46. Why Hadoop. Patterns/What Works: • Batch processing • Map…and reduce • Lots of aggregation • Multiple schemas on same data • Fast. Anti-Pattern/Danger: anything that requires: • Joins • Complex transactional needs • Granular security requirements • Not a relational database replacement • Not fast
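The "Map…and reduce" and "lots of aggregation" pattern can be sketched in plain Python over comma-delimited text. This is a single-process illustration, not Hadoop: the sample rows are invented, the four-column layout (state, zipcode, agi_class, n1) is an assumption borrowed from the first columns of the Hive table on the previous slide, and the function names are hypothetical.

```python
from collections import defaultdict

# Invented comma-delimited rows: state, zipcode, agi_class, n1
LINES = [
    "MN,55401,1,120",
    "MN,55402,2,80",
    "WA,98052,1,200",
]


def map_phase(line):
    """Emit one (key, value) pair per input line: (state, n1)."""
    state, _zipcode, _agi_class, n1 = line.split(",")
    return state, int(n1)


def shuffle(pairs):
    """Group the mapped values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Aggregate all values that share a key."""
    return key, sum(values)


totals = dict(reduce_phase(k, v) for k, v in shuffle(map(map_phase, LINES)).items())
```

Each phase only ever sees one line or one key at a time, which is what lets a real cluster spread the work across many machines; it is also why the model suits batch aggregation and not joins or transactions.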
  47. 47. Resource for Hadoop on Azure: us/documentation/services/hdinsight/
  48. 48. Graph: Neo4j; Project Naiad (MSR to Open Source)
  49. 49. Graph Database: CREATE and Query
  50. 50. Why Graph. Patterns/What Works: • Highly connected data • Relationships make the data story • Paths through data • Finding shortest/longest path. Anti-Pattern/Danger: • Low connected data (e.g. log data) • Very high number of updates on a regular basis
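"Finding shortest path" over highly connected data can be sketched with a breadth-first search in Python. This is a generic illustration, not Neo4j or Naiad code: the `friends` adjacency list and the `shortest_path` helper are hypothetical, and the search assumes unweighted relationships.

```python
from collections import deque

# Hypothetical friend-of-a-friend graph as an adjacency list.
friends = {
    "ann": ["bob", "cat"],
    "bob": ["ann", "dan"],
    "cat": ["ann", "dan"],
    "dan": ["bob", "cat", "eve"],
    "eve": ["dan"],
    "zed": [],
}


def shortest_path(graph, start, goal):
    """Breadth-first search: return one shortest path as a list of nodes,
    or None when the goal cannot be reached from the start."""
    if start == goal:
        return [start]
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, []):
            if neighbor in previous:
                continue  # already reached by a path at least as short
            previous[neighbor] = node
            if neighbor == goal:
                # Walk the predecessor links back to the start.
                path = [goal]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbor)
    return None


path = shortest_path(friends, "ann", "eve")
```

A graph database keeps relationships as first-class, directly traversable links, so this kind of hop-by-hop walk avoids the repeated joins a relational model would need.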
  51. 51. Use Cases for Graph Databases: FoaF (Social Graph); Market Basket Analysis; Forensics; Fraud Detection; Recommendations
  52. 52. Learn More: Graph Databases. Free Graph Databases E-Book; Project Naiad from Microsoft Research
  53. 53. 7 Reasons to Go Explore: It’s fun. Database technologies aren’t YES/NO decisions. It’s inexpensive to learn. It’s fast to spin up a learning environment. A data professional needs to know more than one tool. Using the right tool for the right job is key. It’s fun.
  54. 54. Go Explore! • MSDN Subscription Benefit • Trial Accounts
  55. 55. Outcomes. We want you to leave here understanding: key concepts for hybrid database architectures; database / datastore types; reasons to go explore.