NoSQL for the SQL Server Pro
(or “Practical Big Data”)
Lynn Langit
July 2013 – Malibu SQL UG
slides from talk for Malibu SQL User Group - July 2013

  • http://hadoop.apache.org/
    http://en.wikipedia.org/wiki/Apache_Hadoop
  • http://www.oracle.com/technetwork/bdc/hadoop-loader/overview/index.html
    http://www.microsoft.com/download/en/details.aspx?id=27584
  • http://hortonworks.com/technology/hortonworksdataplatform/
    More about HBase, from the O’Reilly ‘Getting Ready for BigData’ report:
    “Enter HBase, a column-oriented database that runs on top of HDFS. Modeled after Google’s BigTable, the project’s goal is to host billions of rows of data for rapid access. MapReduce can use HBase as both a source and a destination for its computations, and Hive and Pig can be used in combination with HBase. In order to grant random access to the data, HBase does impose a few restrictions: performance with Hive is 4-5 times slower than plain HDFS, and the maximum amount of data you can store is approximately a petabyte, versus HDFS’ limit of over 30PB.”
    http://www.cloudera.com/
  • https://www.hadooponazure.com/Account
    Demo - http://www.youtube.com/watch?v=XcHz8aUDDN8 and http://www.youtube.com/watch?v=c7oHntP8HBI
  • Original Reference: Tom White’s Hadoop: The Definitive Guide (I made some modifications based on my experience)
  • http://lynnlangit.wordpress.com/2011/11/09/relational-cloud-storage-is-50x-more-expensive-than-nosql/
  • http://nosql-database.org/
    http://hadoop.apache.org/ & http://www.mongodb.org/
    Wikipedia - http://en.wikipedia.org/wiki/NoSQL
    List of NoSQL databases – http://nosql-database.org/
    The good, the bad - http://www.techrepublic.com/blog/10things/10-things-you-should-know-about-nosql-databases/1772
  • http://bigdatanerd.wordpress.com/2012/01/04/why-nosql-part-2-overview-of-data-modelrelational-nosql/
    http://docs.jboss.org/hibernate/ogm/3.0/reference/en-US/html_single/
  • http://en.wikipedia.org/wiki/Project_Voldemort
    http://aws.amazon.com/
    http://docs.amazonwebservices.com/amazondynamodb/latest/developerguide/Introduction.html
    http://www.allthingsdistributed.com/2012/01/amazon-dynamodb.html
  • http://code.google.com
    Access via REST APIs
    Very cheap, but not much functionality included
    Lots of code to write for application development
    But… can be a good backup solution
  • http://stage.hypertable.com/index.php/documentation/architecture/
    http://code.google.com/appengine/
    http://code.google.com/appengine/articles/datastore/overview.html
  • http://cwebbbi.wordpress.com/2012/02/14/so-what-is-the-bi-semantic-model/
    http://www.databasejournal.com/features/mssql/understanding-new-column-store-index-of-sql-server-2012.html
    http://dbmsmusings.blogspot.com/2010/03/distinguishing-two-major-types-of_29.html
    http://ayende.com/blog/4500/that-no-sql-thing-column-family-databases
  • http://en.wikipedia.org/wiki/MongoDB
    http://www.mongodb.org/downloads
    http://www.mongodb.org/display/DOCS/Drivers
  • http://www.infinitegraph.com/what-is-a-graph-database.html
    http://en.wikipedia.org/wiki/Graph_database
    http://www.freebase.com/
  • For Google - http://code.google.com
    For AWS - https://console.aws.amazon.com/console/home
  • Hadoop on AWS - http://wiki.apache.org/hadoop/AmazonEC2
  • http://rickosborne.org/download/SQL-to-MongoDB.pdf
  • http://www.microsoft.com/en-us/bi/default.aspx
    http://dennyglee.com/
    Demos - http://www.youtube.com/watch?v=djfpPsGwm6A and http://www.youtube.com/watch?v=uh9bKWO1K7U
  • DataMarkets – InfoChimps, Factual, DataMarket, Windows Azure Data Marketplace, Wolfram Alpha, Datasift
    http://www.microsoft.com/en-us/sqlazurelabs/default.aspx and
    http://www.microsoft.com/en-us/sqlazurelabs/labs/dataexplorer.aspx
    https://datamarket.azure.com/
    http://www.freebase.com/
    http://code.google.com/p/google-refine/
  • http://www.inboundlogistics.com/cms/article/m2m-101/
    http://www.freebase.com/
    Hilary Mason’s datasets - https://bitly.com/bundles/hmason/1
  • Lynn
  • Transcript of "NoSQL for the SQL Pro"

    1. NoSQL for the SQL Server Pro (or “Practical Big Data”)
       Lynn Langit
       July 2013 – Malibu SQL UG
    2. Data Expertise / Lynn Langit
       • Industry awards
         – Microsoft – MVP for SQL Server
         – Google – GDE for Cloud Platform
         – 10Gen – Master for MongoDB
       • Practicing Architect
       • Technical author / trainer
         – Pluralsight – Google Cloud Series
         – DevelopMentor – SQL Server 2012 Series
         – 2 books on SQL Server BI
         – Cloudera trainer (certified)
       • Former MSFT FTE – 4 years
    3. BigData Pipeline - STEP 1 – Acquire
       Acquire, Process, Store, Query & Mine, Visualize
    4. BigData = ‘Next State’ Questions
       • What could happen?
       • Why didn’t this happen?
       • When will the next new thing happen?
       • What will the next new thing be?
       • What happens?
       Collecting Behavioral data
    5. BigData Pipeline – STEP 2 - Process
       Acquire, Process, Store, Query & Mine, Visualize
    6. Is Big Data = NoSQL and just Hadoop?
       HUGE Hype factor since 2011
       Apache Hadoop
       • a software framework that supports data-intensive distributed applications
       • released under a free license; enables applications to work with thousands of nodes and petabytes of data
       • was inspired by Google's MapReduce and Google File System (GFS) papers
    7. Hadoop in the Enterprise
    8. How you ‘get’ Hadoop
       Open source
       • roll your own
       Commercial distribution
       • Cloudera
       • MapR
       • Hortonworks
       • More…
       Rent it via the cloud
       • AWS
       • HDInsight
    9. Demo - HDInsight
    10. About Hadoop MapReduce
        Image from - https://developers.google.com/appengine/docs/python/images/mapreduce_mapshuffle.png
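As a rough illustration of the map, shuffle and reduce phases this slide refers to, here is a minimal in-memory word count in Python. It is only a sketch: it does not use Hadoop or HDInsight, and the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the sample lines are made up for this example.

```python
from collections import defaultdict


def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: collapse each key's list of values into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(lines):
    """Run the three phases in order on an in-memory list of lines."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    # Hypothetical sample input, for illustration only.
    sample = ["big data is big", "data about data"]
    print(word_count(sample))
```

On a real cluster the map and reduce steps would run on many nodes and the shuffle would move data between them; here all three run in one process, which is enough to show the data flow.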
    11. Demo - HDInsight – MapReduce w/Java
    12. Working with Hadoop
    13. Example Comparison: RDBMS vs. Hadoop
                              Traditional RDBMS         Hadoop / MapReduce
        Data Size             Gigabytes (Terabytes)     Petabytes and greater
        Access                Interactive and Batch     Batch – NOT Interactive
        Updates               Read / Write many times   Write once, Read many times
        Structure             Static Schema             Dynamic Schema
        Integrity             High (ACID)               Low
        Scaling               Nonlinear                 Linear
        Query Response Time   Can be near immediate     Has latency (due to batch processing)
    14. BigData Pipeline STEP 3 – Store
        Acquire, Process, Store, Query & Mine, Visualize
    15. “Small” BigData vs. “Big” BigData
        • On Premises: Hadoop, NoSQL, RDBMS
        • In the Cloud: Hadoop, NoSQL, RDBMS
    16. Cloud-hosted NoSQL up to 50x CHEAPER
    17. So many NoSQL options
        • More than just the Elephant in the room
        • More than 120 types of NoSQL databases
    18. Flavors of NoSQL
        • Key/Value – Volatile
        • Key/Value – Persistent
        • Wide-Column
        • Document
        • Graph
    19. Key / Value Database
        • Just keys and values
          – No schema
        • Persistent or Volatile
        • Examples
          – AWS DynamoDB
          – Riak
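To make "just keys and values, no schema, persistent or volatile" concrete, here is a toy key/value store in Python. It is a sketch under stated assumptions: the `KeyValueStore` class, its `put`/`get` methods and the JSON-file persistence are invented for this example and do not reflect the DynamoDB or Riak APIs.

```python
import json
import os
import tempfile


class KeyValueStore:
    """Toy key/value store: volatile if path is None, else persisted as JSON."""

    def __init__(self, path=None):
        self.path = path
        self.data = {}
        # A persistent store reloads whatever an earlier instance wrote.
        if path and os.path.exists(path):
            with open(path) as f:
                self.data = json.load(f)

    def put(self, key, value):
        # No schema: any JSON-serializable value is accepted under any key.
        self.data[key] = value
        if self.path:
            with open(self.path, "w") as f:
                json.dump(self.data, f)

    def get(self, key, default=None):
        return self.data.get(key, default)


if __name__ == "__main__":
    # Volatile: lives only as long as this object does.
    volatile = KeyValueStore()
    volatile.put("player:42", {"score": 1200})
    print(volatile.get("player:42"))

    # Persistent: a second instance on the same (hypothetical) file sees the write.
    path = os.path.join(tempfile.mkdtemp(), "kv.json")
    KeyValueStore(path).put("player:42", {"score": 1200})
    print(KeyValueStore(path).get("player:42"))
```

The only query the store supports is lookup by key, which is the trade-off this family of databases makes for simplicity and scale.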
    20. DEMO - AWS DynamoDB
        • Key/Value store on the AWS cloud
    21. NoSQL BLOB Storage Buckets in the Cloud
        • Amazon – S3 or Glacier
        • Google – Cloud Storage
        • Microsoft Azure BLOBS
        • Others
          – Dropbox
          – Box
          – More…
    22. DEMO - Battle of the Buckets
        • Google Cloud Storage VS.
        • Windows Azure BLOBS VS.
        • AWS S3 / Glacier
    23. Column Database
        • Wide, sparse column sets
        • Schema-light
        • Examples:
          – Cassandra
          – HBase w/Hadoop
          – BigTable
          – GAE HR DS
    24. Types of Column Databases
        • Column-families
          – Non-relational
          – Sparse
          – Examples:
            • HBase
            • Cassandra
            • xVelocity (SQL 2012 Tabular)
        • Column-stores
          – Relational
          – Dense
          – Example:
            • SQL Server 2012 – Columnstore index
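The sparse versus dense distinction on this slide can be sketched with plain Python data structures. This is a toy model only: the sample data and the helper names `columns_used` and `total_by` are hypothetical, and nothing here reflects how HBase, Cassandra or a SQL Server columnstore index is actually implemented.

```python
# Column-family layout: each row key maps to only the columns it actually has.
column_family = {
    "user:1": {"name": "Ann", "city": "Malibu"},
    "user:2": {"name": "Bob", "last_login": "2013-07-01"},
}

# Column-store layout: every column is a dense list, one entry per row.
column_store = {
    "order_id": [1, 2, 3],
    "region": ["west", "east", "west"],
    "amount": [10.0, 25.0, 5.0],
}


def columns_used(rows):
    """Return the set of column names that appear in any sparse row."""
    return set().union(*(row.keys() for row in rows.values()))


def total_by(store, group_col, value_col):
    """Aggregate by reading only the two columns the query needs."""
    totals = {}
    for group, value in zip(store[group_col], store[value_col]):
        totals[group] = totals.get(group, 0) + value
    return totals


if __name__ == "__main__":
    print(sorted(columns_used(column_family)))
    print(total_by(column_store, "region", "amount"))
```

In the first layout a missing column costs nothing, which suits wide, irregular rows; in the second, an aggregate touches only the columns it names, which suits analytic queries over regular tables.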
    25. DEMO – SQL Server ‘NoSQL’
        • SQL Server 2012 Columnstore Index
        • SQL Server 2012 Tabular Model (SSAS)
    26. Document Database (MongoDB)
        • document-oriented (collection of JSON documents) w/semi-structured data
          – Encodings include BSON, JSON, XML…
        • binary forms – PDF, Microsoft Office documents (Word, Excel…)
        • Examples:
          – MongoDB
          – Couchbase
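A "collection of JSON documents" with semi-structured data can be pictured as a list of dictionaries whose fields differ from one document to the next. The sketch below is plain Python, not the MongoDB driver; the `products` data and the `find` helper are hypothetical and only mimic an equality query on top-level fields.

```python
import json

# Two documents in one collection; note that their fields are not identical.
products = [
    {"_id": 1, "name": "Surfboard", "tags": ["sport", "beach"], "price": 450},
    {"_id": 2, "name": "Wetsuit", "sizes": {"S": 3, "M": 0}, "price": 120},
]


def find(collection, **criteria):
    """Return documents whose top-level fields equal all given criteria."""
    return [
        doc for doc in collection
        if all(doc.get(field) == value for field, value in criteria.items())
    ]


if __name__ == "__main__":
    # Each document round-trips through JSON unchanged.
    print(json.dumps(find(products, name="Wetsuit"), indent=2))
```

Because each document carries its own structure, adding a field to one product needs no schema change for the others.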
    27. Demo - MongoDB
    28. Graph Databases
        Use when you have:
        • a lot of many-to-many relationships
        • recursive self-joins
        • a primary objective of quickly finding connections, patterns and relationships between the objects within lots of data
        • Examples:
          – Neo4j
          – Google Freebase
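Finding connections between objects is the kind of query that needs recursive self-joins in SQL but is a simple traversal over a graph. The following Python sketch uses a breadth-first search over an adjacency list; it is not Neo4j code, and the `follows` data and `shortest_path` function are made up for illustration.

```python
from collections import deque

# Hypothetical "who follows whom" graph stored as an adjacency list.
follows = {
    "ann": ["bob", "cat"],
    "bob": ["dan"],
    "cat": ["dan"],
    "dan": ["eve"],
    "eve": [],
}


def shortest_path(graph, start, goal):
    """Breadth-first search; returns the shortest list of nodes, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None


if __name__ == "__main__":
    print(shortest_path(follows, "ann", "eve"))
```

Each hop in the traversal corresponds to one more self-join in a relational query, which is why deep relationship queries tend to favor a graph store.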
    29. DEMO – Neo4j
    30. “Small” BigData vs. “Big” BigData
        • Hadoop
        • Key/Value or Column
        • Document or Graph
        • RDBMS
        On Premises or In the Cloud
    31. Cloud-hosted RDBMS
        • AWS RDS – SQL Server, MySQL, Oracle
          – Medium cost
          – Solid feature set, e.g. backup, snapshot
          – Use existing tooling
        • Google – MySQL
          – Lowest cost
          – Most limited RDBMS functionality
        • Microsoft – SQL Azure
          – Highest cost
    32. DEMO - AWS RDS
        • SQL Server, MySQL or Oracle
        • Essential to understand pricing models
    33. Image - http://blog.outsourcing-partners.com/wp-content/uploads/2012/10/performance.png
    34. NoSQL Applied
        Use cases: Social Games, Product Catalogs, Social aggregators, Log Files, Line-of-Business
        Stores:
        • Columnstore – HBase
        • Key/Value – DynamoDB
        • Document – MongoDB
        • Graph – Neo4j
        • RDBMS – SQL Server
    35. Cloud Offerings – RDBMS AND NoSQL
                                        AWS                                Google                                Microsoft
        RDBMS                           RDS – all major                    mySQL                                 SQL Azure
        NoSQL buckets                   S3 or Glacier                      Cloud Storage                         Azure Blobs
        NoSQL Key-Value                 DynamoDB                           H/R Data on GAE                       Azure Tables
        Streaming ML or (Mahout)        Custom EC2                         Prospective Search & Prediction API   StreamInsight
        NoSQL Document or Graph         MongoDB on EC2                     Freebase                              MongoDB on Windows Azure
        NoSQL – Column Hadoop (HBase)   Elastic MapReduce using S3 & EC2   none                                  HDInsight
        Dremel/Warehousing              RedShift                           BigQuery                              none
    36. BigData Pipeline STEP 4 – Query
        Acquire, Process, Store, Query & Mine, Visualize
    37. Always MapReduce?
    38. Can Excel help?
        • Connector to Hadoop
        • Data Explorer
        • Data Quality Services
        • Master Data Services
        • Integration with Azure Data Market
        • Visualize with PowerView
        • Data Mining w/Predixion
    39. Demo - Hadoop Connector to Excel
    40. Other types of cloud data services
        Hosting public datasets
        • Pay to read
        • Earn revenue by offering for read
        Cleaning / matching (your) data
        • ETL – Microsoft Data Explorer, Google Refine
        • Data Quality – Windows Azure Data Market, InfoChimps, DataMarket.com
    41. Collecting BigData
        • Sensors everywhere
        • Structured, Semi-structured, Unstructured vs. Data Standards
        • M2M
        • Public Datasets
          – Freebase
          – Azure DataMarket
          – Hilary Mason’s list
    42. NoSQL To-Do List
        Understand types of NoSQL databases
        • Use NoSQL when business needs call for it
        • Use the right type of NoSQL for your business problem
        Try out NoSQL on the cloud
        • Quick and cheap for behavioral data
        • Mashup cloud datasets
        • Good for specialized use cases, e.g. dev, test, training environments
        Learn NoSQL access technologies & services
        • New query languages, e.g. MapReduce, R, Infer.NET
        • New query tools (vendor-specific) – Google Refine, Amazon Karmasphere, Microsoft Excel connectors, etc.
        • Windows Azure Data Market, other public data markets
    43. www.TeachingKidsProgramming.org
        • Free Courseware (Java, Small Basic or C# [on Pluralsight])
        • Do a Recipe  Teach a Kid (Ages 10 ++)
        • VOTE at http://www.azureDevs.com, CONFIRM via email and SHARE (tweet)
        • recipes)
    44. VOTE CONFIRM SHARE
    45. Keep Learning
        • Twitter: @LynnLangit
        • YouTube: http://www.youtube.com/user/SoCalDevGal
        • Hire me
          – To help build your BI/Big Data solution
          – To teach your team next gen BI
          – To learn more about using NoSQL solutions