NoSQL for the SQL Server Pro
  1. NoSQL for the DBA / Lynn Langit / April 2013 – Big Data Tech Con
  2. Data Expertise / Lynn Langit
     • Industry awards
       – Microsoft – MVP for SQL Server
       – Google – GDE for Cloud Platform
       – 10Gen – Master for MongoDB
     • Practicing Architect
     • Technical author / trainer
       – Pluralsight – Google Cloud Series
       – DevelopMentor – SQL Server Series
       – 2 books on SQL Server BI
       – Cloudera trainer (certified)
     • Former MSFT FTE – 4 years
  3. But first… Business Intelligence to BigData
  4. What is the relationship? Business Intelligence ???? NoSQL
  5. “The Past”: BI = Effective Reports
     • Data optimized for static READING
  6. BI = Optimized RDBMS
     • SQL queries & data stored on disk
  7. BI = OLAP Cubes storage
  8. BI = OLAP Cubes clients
  9. BI = Transactional Data
     Collecting transactional data:
     • What happened?
     • Why did that happen?
     • Decision Support Systems
  10. So Why Change?
  11. Enter Big Data
     Q: What is it?
     A: Your data, plus more data…
  12. BigData Pipeline – STEP 1 – Acquire
     Acquire → Process → Store → Query & Mine → Visualize
  13. Big Data – an example from weather
  14. Big Data – an example from weather
     • Source data
       – National weather data
       – Satellite data
       – Airplanes with sensors
       – Sensors on boats
       – Sensors in the ocean
       – Sensors on the ground
       – Historical data
       – Social media
     • Results
       – More accurate predictions
       – Tsunami
       – Tornado
  15. Big Data – an example from health care
     • Medical records
       – Regular
       – Emergency
       – Genetic data – 23andMe
     • Food data
       – SparkPeople
     • Purchasing
       – Grocery card
       – Credit card
     • Search – Google
     • Social media
       – Twitter
       – Facebook
     • Exercise
       – Nike FuelBand
       – Kinect
       – Location – phone
  16. BigData = ‘Next State’ Questions
     Collecting behavioral data:
     • What could happen?
     • Why didn’t this happen?
     • When will the next new thing happen?
     • What will the next new thing be?
     • What happens?
  17. What is the reality of personalized medicine?
     [Chart: Key Monitoring Sensor Readings and Other Behavioral data plotted over time, 12:00 to 2:30]
  18. BigData and Verticals
     • Retail
     • Manufacturing
     • Health Care
     • Banking
     • Education
  19. Collecting BigData
     • Sensors everywhere
     • Structured, semi-structured, unstructured vs. data standards
     • M2M
     • Public datasets
       – Freebase
       – Azure DataMarket
       – Hilary Mason’s list
  20. DEMO – Hilary Mason’s Datasets
     • Who is Hilary Mason and why do you care about her datasets?
     • How do you get her datasets?
     • What do you do with her datasets?
  21. Collecting Data – a note about Faces
     • Facial recognition
     • Voice recognition
     • Gesture capture and analysis
  22. Petabytes of Big Data
  23. Big Data at Apple
  24. Big Data in India
     Update: “The total number of AADHAARs issued as of 24-Mar-2013 is over 304 million. This is more than 25% of the population of India.”
  25. BigData Pipeline – STEP 5 – Visualize
     Acquire → Process → Store → Query & Mine → Visualize
  26. DEMO – Visualizing Big Data: Wind Map
  27. Demo – Visualizing Big Data – D3
  28. BigData Pipeline – STEP 2 – Process
     Acquire → Process → Store → Query & Mine → Visualize
  29. How do you clean up the mess?
     • Data Hygiene
     • Data Scrubbing
     • Data Sprawl
     • The true cost of data
     • …and what about data integrity?
     • …and security?
     • …should your data be in the cloud?
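The slide names data hygiene and scrubbing without showing what a scrubbing pass does. Here is a minimal sketch that assumes raw sensor readings arrive as hypothetical `(sensor_id, timestamp, value)` string tuples. It normalizes fields, drops malformed rows and removes exact duplicates:

```python
from typing import Iterable


def scrub(rows: Iterable[tuple[str, str, str]]) -> list[tuple[str, str, float]]:
    """Normalize, validate and de-duplicate hypothetical (sensor_id, timestamp, value) rows."""
    seen: set[tuple[str, str, float]] = set()
    clean: list[tuple[str, str, float]] = []
    for sensor_id, timestamp, value in rows:
        # Normalize: trim whitespace and make sensor IDs case-insensitive.
        sensor_id = sensor_id.strip().upper()
        timestamp = timestamp.strip()
        # Validate: skip rows with missing keys or non-numeric readings.
        if not sensor_id or not timestamp:
            continue
        try:
            reading = float(value)
        except ValueError:
            continue
        record = (sensor_id, timestamp, reading)
        # De-duplicate: keep only the first copy of each identical record.
        if record in seen:
            continue
        seen.add(record)
        clean.append(record)
    return clean


raw = [
    (" s1 ", "12:00", "21.5"),
    ("S1", "12:00", "21.5"),   # duplicate once normalized
    ("s2", "12:00", "n/a"),    # malformed reading
    ("", "12:30", "19.0"),     # missing sensor ID
    ("s2", "12:30", "19.0"),
]
print(scrub(raw))  # [('S1', '12:00', 21.5), ('S2', '12:30', 19.0)]
```

Real pipelines usually add rules for ranges, units and reference data. The three steps shown here (normalize, validate, de-duplicate) are the common core.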
  30. Is NoSQL just Hadoop?
     HUGE hype factor since 2011
     Apache Hadoop
     • a software framework that supports data-intensive distributed applications
     • released under a free license; enables applications to work with thousands of nodes and petabytes of data
     • was inspired by Google’s MapReduce and Google File System (GFS) papers
  31. What is the relationship? NoSQL / Hadoop ??? BigData
  32. Hadoop in the Enterprise
  33. How you ‘get’ Hadoop
     • Open source
       – roll your own
     • Commercial distribution
       – Cloudera
       – MapR
       – Hortonworks
       – More…
     • Rent it via the cloud
       – AWS
  34. Demo – Get and Use Cloudera CDH4 VM
  35. Working with Hadoop
  36. About Hadoop MapReduce (image from https://developers.google.com/appengine/docs/python/images/mapreduce_mapshuffle.png)
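The slide relies on an image of the map, shuffle and reduce phases. This single-process sketch models the classic word-count job in plain Python. It is not Hadoop's Java API. On a real cluster, many mappers and reducers run in parallel and the framework performs the shuffle over the network:

```python
from collections import defaultdict
from typing import Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterator[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group every emitted value by its key, as the framework does between phases."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reducer: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines: list[str]) -> dict[str, int]:
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(mapped)
    # Keys are sorted here, mirroring how Hadoop sorts keys before reducing.
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


print(word_count(["big data", "Big SQL", "no SQL"]))  # {'big': 2, 'data': 1, 'no': 1, 'sql': 2}
```

Each phase touches one record or one key at a time, which is what lets Hadoop spread the work across nodes. Hive (slide 38) compiles SQL-like queries into jobs of this shape.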
  37. Demo – HDInsight – MapReduce w/ Java
  38. Demo – HDInsight – MapReduce w/ Hive
  39. Example Comparison: RDBMS vs. Hadoop

     Attribute            | Traditional RDBMS       | Hadoop / MapReduce
     ---------------------|-------------------------|---------------------------------------
     Data Size            | Gigabytes (Terabytes)   | Petabytes and greater
     Access               | Interactive and Batch   | Batch – NOT Interactive
     Updates              | Read / Write many times | Write once, Read many times
     Structure            | Static Schema           | Dynamic Schema
     Integrity            | High (ACID)             | Low
     Scaling              | Nonlinear               | Linear
     Query Response Time  | Can be near immediate   | Has latency (due to batch processing)
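The "Static Schema" vs. "Dynamic Schema" row is often described as schema-on-write vs. schema-on-read. Here is a minimal sketch, assuming hypothetical comma-separated web log lines. The raw text is written once, and each query applies its own schema only at read time:

```python
import csv
import io

# Raw log lines are written once, exactly as they arrived; no schema is enforced on write.
RAW_LOG = """2013-04-01,web01,GET,200
2013-04-01,web02,POST,500
2013-04-02,web01,GET,404
"""


def read_with_schema(raw: str, columns: list[str]) -> list[dict[str, str]]:
    """Schema-on-read: column names are attached only when the data is queried."""
    reader = csv.reader(io.StringIO(raw))
    # zip() pairs each name with a field; fields beyond the schema are ignored.
    return [dict(zip(columns, row)) for row in reader if row]


# Query 1 treats the fourth field as an HTTP status code.
log_rows = read_with_schema(RAW_LOG, ["date", "host", "method", "status"])
errors = [r["host"] for r in log_rows if r["status"].startswith("5")]

# Query 2 projects a different, narrower schema onto the same raw text.
visits = read_with_schema(RAW_LOG, ["day", "server"])
days = sorted({r["day"] for r in visits})

print(errors)  # ['web02']
print(days)    # ['2013-04-01', '2013-04-02']
```

An RDBMS would instead reject rows that do not match a table definition at insert time. That is the trade-off behind the "Integrity" row of the table.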
  40. BigData Pipeline – STEP 3 – Store
     Acquire → Process → Store → Query & Mine → Visualize
  41. “Small” BigData vs. “Big” BigData: Hadoop / NoSQL / RDBMS
  42. The reality… two pivots
     • Storage methods
       – SQL (RDBMS)
       – NoSQL or Hadoop
     • Storage locations
       – On premises
       – Cloud-hosted
  43. Cloud-hosted NoSQL up to 50x CHEAPER
  44. So many NoSQL options
     • More than just the Elephant in the room
     • More than 120 types of NoSQL databases
  45. Flavors of NoSQL
     • Key/Value (volatile)
     • Key/Value (persistent)
     • Wide-Column
     • Document
     • Graph
  46. Key / Value Database
     • Just keys and values
       – No schema
     • Persistent or volatile
     • Examples
       – AWS DynamoDB
       – Riak
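Here is a minimal sketch of the "persistent or volatile" distinction. It uses only the Python standard library, not DynamoDB or Riak. In both cases the store sees only opaque values looked up by key, with no schema. The key naming scheme (`session:42`, `product:1001`) is a hypothetical convention:

```python
import os
import shelve
import tempfile

# Volatile key/value store: a plain dict lives only in process memory.
cache: dict[str, str] = {}
cache["session:42"] = '{"user": "lynn", "cart": 3}'  # the value is opaque to the store
print(cache.get("session:42"))
print(cache.get("session:99"))  # missing key returns None rather than raising

# Persistent key/value store: shelve pickles values into a dbm file on disk.
path = os.path.join(tempfile.mkdtemp(), "kvstore")
with shelve.open(path) as db:
    db["product:1001"] = {"name": "USB drive", "price": 9.99}

# Reopen the file with a fresh handle: the value was persisted, not just cached.
with shelve.open(path) as db:
    restored = db["product:1001"]
print(restored["name"])  # USB drive
```

The only operations are put and get by key. Any query on a field inside the value (for example, "all products under $10") has to be handled by the application or by a secondary index.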
  47. DEMO – AWS DynamoDB
     • Key/Value store on the AWS cloud
  48. NoSQL BLOB Storage Buckets in the Cloud
     • Amazon – S3 or Glacier
     • Google – Cloud Storage
     • Microsoft Azure BLOBs
     • Others
       – Dropbox
       – Box
       – More…
  49. DEMO – Battle of the Buckets
     • Google Cloud Storage vs.
     • Windows Azure BLOBs vs.
     • AWS S3 / Glacier
  50. Column Database
     • Wide, sparse column sets
       – Schema-light
     • Examples:
       – Cassandra
       – HBase w/ Hadoop
       – BigTable
       – GAE HR DS (App Engine High Replication Datastore)
  51. Types of Column Databases
     • Column-families
       – Non-relational
       – Sparse
       – Examples: HBase, Cassandra, xVelocity (SQL Server 2012 Tabular)
     • Column-stores
       – Relational
       – Dense
       – Example: SQL Server 2012 – Columnstore index
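To make the sparse vs. dense distinction concrete, here is a toy model in plain Python. It does not use the HBase, Cassandra or SQL Server APIs, and the row keys, column names and sample values are hypothetical. A column-family store keeps only the columns each row actually has. A columnstore keeps each column as one dense array, so an aggregate reads only the columns it needs:

```python
# Column-family (sparse): each row key maps to only the columns that row actually has.
column_family: dict[str, dict[str, str]] = {
    "user:1": {"name": "Ann", "email": "ann@example.com"},
    "user:2": {"name": "Bob", "twitter": "@bob"},  # different columns; no NULLs stored
}

# Column-store (dense): one array per column, aligned by row position.
column_store: dict[str, list] = {
    "region": ["West", "East", "West", "East"],
    "sales": [100, 250, 175, 50],
}


def total_by(store: dict[str, list], group_col: str, value_col: str) -> dict[str, int]:
    """Aggregate by scanning only the two columns the query needs."""
    totals: dict[str, int] = {}
    for group, value in zip(store[group_col], store[value_col]):
        totals[group] = totals.get(group, 0) + value
    return totals


print(sorted(column_family["user:2"]))            # ['name', 'twitter']
print(total_by(column_store, "region", "sales"))  # {'West': 275, 'East': 300}
```

The dense layout is what makes columnstore indexes fast for warehouse-style aggregates. The sparse layout suits rows whose attributes vary widely.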
  52. DEMO – SQL Server ‘NoSQL’
     • SQL Server 2012 Columnstore Index
     • SQL Server 2012 Tabular Model (SSAS)
  53. Document Database (MongoDB)
     • Document-oriented (collection of JSON documents) w/ semi-structured data
       – Encodings include BSON, JSON, XML…
     • Binary forms
       – PDF, Microsoft Office documents (Word, Excel…)
     • Examples:
       – MongoDB
       – Couchbase
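Here is a minimal sketch of the document model in plain Python. Each document is parsed from JSON and can carry different fields, and a query matches on fields that may be nested. The dotted-path filter loosely imitates the style of a MongoDB `find()` filter. It is a hypothetical helper, not the MongoDB driver API, and supports only exact equality:

```python
import json

# A "collection" of semi-structured JSON documents: fields vary per document.
raw_docs = [
    '{"_id": 1, "name": "Ann", "address": {"city": "Seattle"}, "tags": ["sql"]}',
    '{"_id": 2, "name": "Bob", "tags": ["nosql", "mongodb"]}',
    '{"_id": 3, "name": "Cy", "address": {"city": "Seattle"}}',
]
collection = [json.loads(d) for d in raw_docs]


def get_path(doc: dict, dotted: str):
    """Follow a dotted path such as 'address.city'; return None if any part is missing."""
    value = doc
    for part in dotted.split("."):
        if not isinstance(value, dict) or part not in value:
            return None
        value = value[part]
    return value


def find(docs: list[dict], query: dict) -> list[dict]:
    """Return documents whose fields equal every value in the query (exact match only)."""
    return [d for d in docs if all(get_path(d, k) == v for k, v in query.items())]


print([d["name"] for d in find(collection, {"address.city": "Seattle"})])  # ['Ann', 'Cy']
```

Documents without an `address` simply fail to match instead of causing an error. This is how a schema-light store tolerates semi-structured data.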
  54. Demo – MongoDB
  55. Graph Databases
     • A lot of many-to-many relationships
     • Recursive self-joins
     • When your primary objective is quickly finding connections, patterns and relationships between the objects within lots of data
     • Examples:
       – Neo4J
       – Google Freebase
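The "recursive self-joins" bullet is the key contrast. In SQL, finding how two people are connected means joining a table to itself once per hop. A graph store instead follows the stored relationships directly. Here is a minimal sketch of that traversal as a breadth-first search in plain Python (not Neo4j's Cypher or API), over a hypothetical set of names:

```python
from collections import deque
from typing import Optional

# Adjacency list for a small, hypothetical "knows" graph (undirected edges).
edges = [("Ann", "Bob"), ("Bob", "Cy"), ("Cy", "Dee"), ("Ann", "Eve"), ("Eve", "Dee")]
graph: dict[str, set[str]] = {}
for a, b in edges:
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)


def shortest_path(start: str, goal: str) -> Optional[list[str]]:
    """Breadth-first search: the first time the goal is reached, the path has the fewest hops."""
    parents: dict[str, Optional[str]] = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the parent links back to the start, then reverse.
            path: list[str] = []
            current: Optional[str] = node
            while current is not None:
                path.append(current)
                current = parents[current]
            return path[::-1]
        for neighbor in sorted(graph.get(node, ())):
            if neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)
    return None  # no connection exists


print(shortest_path("Ann", "Dee"))  # ['Ann', 'Eve', 'Dee']
```

Each hop costs one adjacency lookup rather than one more self-join. This is why graph databases suit many-to-many, friends-of-friends style questions.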
  56. DEMO – Neo4J
  57. CAP Theorem applied = ‘how big is it?’
     • CA = RDBMS
       – Highly-available consistency
       – Ex. SQL Server
     • CP = NoSQL
       – Enforced consistency
       – Ex. Hadoop
     • AP = NoSQL
       – Eventual consistency
       – Ex. MongoDB
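"Eventual consistency" (the AP case) means a read can briefly return stale data after a write, until the replicas converge. Here is a deliberately simplified, single-process sketch of that behavior. The classes are hypothetical, and real systems add networking, conflict resolution and failure handling:

```python
from typing import Optional


class Replica:
    """One copy of a key/value dataset."""

    def __init__(self) -> None:
        self.data: dict[str, str] = {}


class EventuallyConsistentStore:
    """Toy model: a write updates one replica now and reaches the others later."""

    def __init__(self, replica_count: int = 2) -> None:
        self.replicas = [Replica() for _ in range(replica_count)]
        self.pending: list[tuple[str, str]] = []  # replication backlog

    def write(self, key: str, value: str) -> None:
        # Acknowledge after updating only the first replica, so writes stay available.
        self.replicas[0].data[key] = value
        self.pending.append((key, value))

    def read(self, key: str, replica: int) -> Optional[str]:
        # Any replica may answer, even one that has not caught up yet.
        return self.replicas[replica].data.get(key)

    def sync(self) -> None:
        # Background replication: apply the backlog to every other replica.
        for key, value in self.pending:
            for r in self.replicas[1:]:
                r.data[key] = value
        self.pending.clear()


store = EventuallyConsistentStore()
store.write("cart:7", "3 items")
before = store.read("cart:7", replica=1)  # stale read: the write has not replicated yet
store.sync()
after = store.read("cart:7", replica=1)   # replicas have converged
print(before, after)  # None 3 items
```

A CP-style store would instead refuse the read, or wait, until the replica is current. It trades availability for consistency.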
  58. “Small” BigData vs. “Big” BigData: Hadoop / Key/Value or Column / Document or Graph / RDBMS
  59. Cloud-hosted RDBMS
     • AWS RDS – SQL Server, MySQL, Oracle
       – Medium cost
       – Solid feature set, e.g. backup, snapshot
       – Use existing tooling
     • Google – MySQL
       – Lowest cost
       – Most limited RDBMS functionality
     • Microsoft – SQL Azure
       – Highest cost
  60. DEMO – AWS RDS
     • SQL Server, MySQL or Oracle
     • Essential to understand pricing models
  61. Image source: http://blog.outsourcing-partners.com/wp-content/uploads/2012/10/performance.png
  62. NoSQL Applied

     Type        | Use case            | Example
     ------------|---------------------|-----------
     Columnstore | Log files           | HBase
     Key/Value   | Product catalogs    | DynamoDB
     Document    | Social games        | MongoDB
     Graph       | Social aggregators  | Neo4j
     RDBMS       | Line-of-business    | SQL Server
  63. Cloud Offerings – RDBMS AND NoSQL

     Category                 | AWS                                      | Google                               | Microsoft
     -------------------------|------------------------------------------|--------------------------------------|-------------------------
     RDBMS                    | RDS – all major                          | MySQL                                | SQL Azure
     NoSQL buckets            | S3 or Glacier                            | Cloud Storage                        | Azure Blobs
     NoSQL Key-Value          | DynamoDB                                 | H/R Data on GAE                      | Azure Tables
     Streaming                | ML or Custom EC2 (Mahout)                | Prospective Search & Prediction API  | StreamInsight
     NoSQL Document or Graph  | MongoDB on EC2                           | Freebase                             | MongoDB on Windows Azure
     NoSQL – Column / Hadoop  | Elastic MapReduce (HBase) using S3 & EC2 | none                                 | HDInsight
     Dremel / Warehousing     | Redshift                                 | BigQuery                             | none
  64. BigData Pipeline – STEP 4 – Query
     Acquire → Process → Store → Query & Mine → Visualize
  65. Always MapReduce?
  66. Data Scientists and Languages
  67. Karmasphere Studio for AWS
  68. Can Excel help?
     • Connector to Hadoop
     • Data Explorer
     • Data Quality Services
     • Master Data Services
     • Integration with Azure DataMarket
     • Visualize with Power View
     • Data Mining w/ Predixion
  69. Demo – Hadoop Connector to Excel
  70. Google BigQuery w/ Excel
     • Hadoop-like, Dremel-based service
     • For massive amounts of data
     • SQL-like query language
  71. DEMO – Google BigQuery
     • Hadoop-like, Dremel-based service
     • For massive amounts of data
     • SQL-like query language
  72. Dremel Realized => Impala
     • Interactive Hadoop?
  73. Other types of cloud data services
     • Hosting public datasets
       – Pay to read data
       – Earn revenue by offering for read
     • Cleaning / matching (your data)
       – ETL – Microsoft Data Explorer, Google Refine
       – Data Quality – Windows Azure DataMarket, InfoChimps, DataMarket.com
  74. NoSQL To-Do List
     • Understand CAP & types of NoSQL databases
       – Use NoSQL when business needs dictate
       – Use the right type of NoSQL for your business problem
     • Try out NoSQL on the cloud
       – Quick and cheap for behavioral data
       – Mash up cloud datasets
       – Good for specialized use cases, e.g. dev, test, training environments
     • Learn NoSQL access technologies
       – New query languages, e.g. MapReduce, R, Infer.NET
       – New query tools (vendor-specific): Google Refine, Karmasphere for AWS, Microsoft Excel connectors, etc.
  75. The Changing Data Landscape: RDBMS, NoSQL, Other Services
  76. www.TeachingKidsProgramming.org
     • Free courseware (recipes)
     • Do a Recipe, Teach a Kid (ages 10++)
     • Java or Microsoft Small Basic – TKP site
     • C# – via Pluralsight
  77. Toward Data Craftsmanship…
     • Follow me @LynnLangit
     • RSS my blog: www.LynnLangit.com
     • Hire me
       – To help build your BI/Big Data solution
       – To teach your team next gen BI
       – To learn more about using NoSQL solutions
