NoSQL for the SQL Server Pro, by Lynn Langit (Level: Intermediate)
  • SQL Server Live! Orlando 2012 © 2012 SQL Server Live! All rights reserved.
  • http://www.inboundlogistics.com/cms/article/m2m-101/ http://www.freebase.com/ Hilary Mason’s datasets - https://bitly.com/bundles/hmason/1 https://datamarket.azure.com/dataset/weathertrends/worldwidehistoricalweatherdata
  • http://hadoop.apache.org/ http://en.wikipedia.org/wiki/Apache_Hadoop
  • http://www.oracle.com/technetwork/bdc/hadoop-loader/overview/index.html http://www.microsoft.com/download/en/details.aspx?id=27584
  • http://hortonworks.com/technology/hortonworksdataplatform/ More about HBase, from the O’Reilly ‘Getting Ready for BigData’ report: “Enter HBase, a column-oriented database that runs on top of HDFS. Modeled after Google’s BigTable, the project’s goal is to host billions of rows of data for rapid access. MapReduce can use HBase as both a source and a destination for its computations, and Hive and Pig can be used in combination with HBase. In order to grant random access to the data, HBase does impose a few restrictions: performance with Hive is 4-5 times slower than plain HDFS, and the maximum amount of data you can store is approximately a petabyte, versus HDFS’ limit of over 30PB.” http://www.cloudera.com/
  • Cloudera VM -- https://ccp.cloudera.com/display/SUPPORT/Downloads MapR - http://www.mapr.com/ Hortonworks -
  • https://www.hadooponazure.com/Account Demo - http://www.youtube.com/watch?v=ugi9C6s_sH4
  • Original Reference: Tom White’s Hadoop: The Definitive Guide (I made some modifications based on my experience)
  • http://hadoop.apache.org/ & http://www.mongodb.org/ Wikipedia - http://en.wikipedia.org/wiki/NoSQL List of NoSQL databases – http://nosql-database.org/ The good, the bad - http://www.techrepublic.com/blog/10things/10-things-you-should-know-about-nosql-databases/1772
  • http://bigdatanerd.wordpress.com/2012/01/04/why-nosql-part-2-overview-of-data-modelrelational-nosql/ http://docs.jboss.org/hibernate/ogm/3.0/reference/en-US/html_single/
  • http://en.wikipedia.org/wiki/Project_Voldemort http://redis.io/ http://aws.amazon.com/ http://docs.amazonwebservices.com/amazondynamodb/latest/developerguide/Introduction.html http://www.allthingsdistributed.com/2012/01/amazon-dynamodb.html
  • http://stage.hypertable.com/index.php/documentation/architecture/ http://code.google.com/appengine/ http://code.google.com/appengine/articles/datastore/overview.html
  • http://cwebbbi.wordpress.com/2012/02/14/so-what-is-the-bi-semantic-model/ http://www.databasejournal.com/features/mssql/understanding-new-column-store-index-of-sql-server-2012.html http://dbmsmusings.blogspot.com/2010/03/distinguishing-two-major-types-of_29.html http://ayende.com/blog/4500/that-no-sql-thing-column-family-databases
  • http://en.wikipedia.org/wiki/MongoDB http://www.mongodb.org/downloads http://www.mongodb.org/display/DOCS/Drivers
  • http://www.infinitegraph.com/what-is-a-graph-database.html Try Neo4j - http://www.neo4j.org/learn/try http://en.wikipedia.org/wiki/Graph_database http://www.freebase.com/
  • http://lynnlangit.wordpress.com/2011/11/09/relational-cloud-storage-is-50x-more-expensive-than-nosql/
  • http://code.google.com Access via REST APIs. Very cheap, but not much functionality included. Lots of code to write for application development. But…can be a good backup solution.
  • For Google - http://code.google.com For AWS - https://console.aws.amazon.com/console/home
  • Hadoop on AWS - http://wiki.apache.org/hadoop/AmazonEC2
  • From SQL PASS Summit 2011 – by Steve Jones, Editor, SQLServerCentral / Red Gate Software
  • DataMarkets – InfoChimps, Factual, DataMarket, Windows Azure Data Marketplace, Wolfram Alpha, Datasift http://www.microsoft.com/en-us/sqlazurelabs/default.aspx and http://www.microsoft.com/en-us/sqlazurelabs/labs/dataexplorer.aspx https://datamarket.azure.com/ http://www.freebase.com/ http://code.google.com/p/google-refine/
  • When the volume of data is too much for simple human interpretation -> Man PLUS Machine (Data Mining / Statistics)
  • https://www.hadooponazure.com/Account Demo - http://www.youtube.com/watch?v=XcHz8aUDDN8 and http://www.youtube.com/watch?v=c7oHntP8HBI
  • About Data Science -- http://www.romymisra.com/the-new-job-market-rulers-data-scientists/ R language - http://www.r-project.org/ Infer.NET - http://research.microsoft.com/en-us/um/cambridge/projects/infernet/ There is a plethora of languages to access, manipulate and process Big Data. These languages fall into several categories:
      – RESTful – simple, standards
      – ETL – Pig (Hadoop) is an example
      – Query – Hive (again Hadoop), lots of *QL
      – Analyze – R, Mahout, Infer.NET, DMX, etc. Applying statistical (data-mining) algorithms to the data output
  • http://rickosborne.org/download/SQL-to-MongoDB.pdf
  • http://www.youtube.com/watch?v=gjsMDAcI1Mo - analyst http://www.youtube.com/watch?v=_MT04szKlyo http://aws.amazon.com/articles/9574327584309154
  • http://www.microsoft.com/en-us/bi/default.aspx http://dennyglee.com/ Demos -    http://www.youtube.com/watch?v=djfpPsGwm6A and http://www.youtube.com/watch?v=uh9bKWO1K7U
  • https://developers.google.com/bigquery/docs/browser_tool Dremel -- http://research.google.com/pubs/pub36632.html
  • http://aws.amazon.com/redshift/
  • Transcript of "NoSQL for the SQL Server Pro"

    1. NoSQL for the SQL Server Pro
        Lynn Langit
        Level: Intermediate
    2. BI = Effective Reports
        Data optimized for READING
    3. BI = Optimized RDBMS
        SQL queries & Data Stored on disk
    4. The past – Business Intelligence
    5. So Why Change?
    6. Big Data
        Your data PLUS more data, much more data…
    7. BigData = ‘Next State’ Questions
    8. My Data – an example from health care
        • Medical records
            – Regular
            – Emergency
            – Genetic data – 23andMe
        • Food data
            – SparkPeople
        • Purchasing
            – Grocery card
            – Credit card
        • Search – Google
        • Social media
            – Twitter
            – Facebook
        • Exercise
            – Nike Fuel Band
            – Kinect
            – Location – phone
    9. Big Data = More Data
    10. BigData Considerations
    11. BigData – Step 1 – Collect More Data
        • Types of Data
            – Structured, Semi-structured, Unstructured vs. Data Standards
            – Behavioral vs. Transactional Data
        • Methods of collection
            – Sensors everywhere
            – Machine-2-Machine
            – Public Datasets: Freebase, Azure DataMarket, Hilary Mason’s list
    12. Big Data = NoSQL?
        (diagram labels: Big Data, NoSQL, ????)
    13. BigData Considerations
    14. What is Hadoop?
        HUGE Hype factor in 2011 / 2012
        Apache Hadoop is a software framework that supports data-intensive distributed applications under a free license
        • enables applications to work with thousands of nodes and petabytes of data
        • was inspired by Google’s MapReduce and Google File System (GFS) papers
    15. Oracle Loader for Hadoop
        SQL Server Connector for Hadoop
    16. Working with Hadoop
        • Java (JDK) / Eclipse
        • MapReduce
            – Map (query/format)
            – Reduce (aggregate)
            – plug-in for Eclipse (Java)
        • Pig (ETL -- Java)
        • Hive (HQL Query)
        • HBase tables
        • Others
            – Mahout (analyze)
            – Karmasphere (analyze)
            – R (analyze)
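The Map (query/format) and Reduce (aggregate) roles named on the slide above can be illustrated with a minimal in-memory sketch. This is plain Python, not Hadoop's Java API: the function names `map_phase`, `shuffle`, `reduce_phase` and `word_count`, and the sample lines, are hypothetical and exist only for illustration.

```python
from collections import defaultdict


def map_phase(line):
    """Map: turn one input record into (key, value) pairs."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, the step a framework runs between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: aggregate all values seen for one key."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over a list of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    sample = ["big data more data", "much more data"]  # hypothetical input
    print(word_count(sample))
```

In a real cluster the map and reduce calls would run on many nodes in parallel; here they run in one process so the data flow is easy to follow.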
    17. Which Hadoop for me?
        • Open source – direct download (Linux)
        • Vendors (local or cloud)
            – Cloudera / Linux
            – Hortonworks – working w/MSFT
            – Microsoft – Isotope project (beta)
        • Vendors (cloud only)
            – AWS – Elastic MapReduce
            – MSFT – HDInsight
            – Google – other products*
    18. Demo: Hadoop on Azure (HDInsight)
    19. Example Comparison: RDBMS vs. Hadoop

                                Traditional RDBMS          Hadoop / MapReduce
        Data Size               Gigabytes (Terabytes)      Petabytes (Exabytes)
        Access                  Interactive and Batch      Batch – NOT Interactive
        Updates                 Read / Write many times    Write once, Read many times
        Structure               Static Schema              Dynamic Schema
        Integrity               High (ACID)                Low
        Scaling                 Nonlinear                  Linear
        Query Response Time     Can be near immediate      Has latency (due to batch processing)

    20. Big Data = NoSQL = Hadoop?
        (diagram labels: Big Data, NoSQL, Hadoop)
    21. So many NoSQL options
        • More than just the Elephant in the room
        • 120+ types of NoSQL databases
    22. Flavors of NoSQL
    23. Key / Value Database
        • Schema-less
        • State (Persistent or Volatile)
        • Examples
            – Redis / Riak
            – AWS DynamoDB
            – Project Voldemort
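To make "schema-less" and "persistent or volatile" state concrete, here is a toy key/value store. It is only a sketch and does not reflect the API of Redis, Riak, DynamoDB or Project Voldemort: the class name `KeyValueStore`, its methods, and the JSON file used for persistence are all assumptions made for illustration.

```python
import json
import os


class KeyValueStore:
    """Toy key/value store: opaque values looked up by key, with no schema."""

    def __init__(self, path=None):
        # path=None keeps state volatile (memory only);
        # a file path makes state persistent across instances.
        self.path = path
        self.data = {}
        if path and os.path.exists(path):
            with open(path, "r", encoding="utf-8") as handle:
                self.data = json.load(handle)

    def put(self, key, value):
        """Store any JSON-serializable value under a string key."""
        self.data[key] = value
        if self.path:
            with open(self.path, "w", encoding="utf-8") as handle:
                json.dump(self.data, handle)

    def get(self, key, default=None):
        """Return the value for key, or default when the key is absent."""
        return self.data.get(key, default)


if __name__ == "__main__":
    store = KeyValueStore()                 # volatile store
    store.put("user:1", {"name": "Ann"})    # values need not share a shape
    store.put("page:views", 42)
    print(store.get("user:1"), store.get("page:views"))
```

The store never inspects the values, which is the point: structure lives in the application, not in the database.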
    24. Column Database – BigTable, HBase
        • Wide or sparse column sets
        • Schema-light
    25. More about Column Databases
        • Type A
            – Column-families
            – Non-relational
            – Sparse
            – Examples: HBase, Cassandra, xVelocity (SQL 2012 BISM)
        • Type B
            – Column-stores
            – Relational
            – Dense
            – Example: SQL Server 2012 Columnstore index
    26. Document Database – MongoDB, CouchDB
        • document-oriented
        • (collection of JSON documents) w/ semi-structured data
            – Encodings: XML, YAML, JSON & BSON
        • binary forms
            – PDF
            – Microsoft Office documents (Word, Excel…)
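A "collection of JSON documents" with semi-structured data can be sketched in a few lines. The example below is plain Python rather than the MongoDB or CouchDB query API: the `find` helper and the three sample documents are hypothetical, and only the idea (documents in one collection need not share the same fields) comes from the slide.

```python
import json


def find(collection, **criteria):
    """Return the documents whose fields equal every given criterion."""
    return [
        doc for doc in collection
        if all(doc.get(field) == value for field, value in criteria.items())
    ]


# Hypothetical documents: each one carries a different set of fields.
collection = [json.loads(text) for text in (
    '{"_id": 1, "name": "Ann", "tags": ["sql", "bi"]}',
    '{"_id": 2, "name": "Bob", "city": "Orlando"}',
    '{"_id": 3, "name": "Cy", "city": "Orlando", "tags": ["nosql"]}',
)]

if __name__ == "__main__":
    for doc in find(collection, city="Orlando"):
        print(doc)
```

A document that lacks the queried field is simply skipped, where a relational table would have required a nullable column.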
    27. Demo: MongoDB
    28. Graph Database – Neo4j
        – good for many-to-many relationships & recursive self-joins
        – finds connections, patterns and relationships between the objects within lots of data
        – sometimes used with ‘Small Data’
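Finding connections through many-to-many relationships is what an RDBMS would express as a recursive self-join. The sketch below shows the same idea as a breadth-first search over an adjacency list. It is plain Python, not Neo4j or its Cypher query language; the `shortest_path` function and the `follows` sample data are hypothetical.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Breadth-first search; returns the shortest list of nodes, or None."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in graph.get(node, ()):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None


# Hypothetical "who follows whom" relationships (directed edges).
follows = {
    "ann": ["bob", "cy"],
    "bob": ["dee"],
    "cy": ["dee"],
    "dee": ["eve"],
}

if __name__ == "__main__":
    print(shortest_path(follows, "ann", "eve"))
```

Each extra hop here would be one more self-join in SQL, which is why relationship-heavy queries are a natural fit for a graph store.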
    29. Demo: Neo4j
    30. Which type of NoSQL for which type of data?

        Type of Data                Type of NoSQL           Example solution
        Log files                   Wide Column             HBase
        Product Catalogs            Key Value on disk       DynamoDB
        User profiles               Key Value in memory     Redis
        Startups                    Document                MongoDB
        Social media connections    Graph                   Neo4j
        LOB w/Transactions          NONE! Use RDBMS         SQL Server

    31. Cloud-hosted NoSQL up to 50x CHEAPER
    32. The reality…two pivots
    33. Consumer Storage Buckets
        • Dropbox
        • Box
        • Windows SkyDrive
        • Google Drive
        • Amazon Cloud Drive
        • Apple iCloud
    34. Developer BLOB Storage Buckets
        • Amazon – S3 or Glacier
        • Google – Cloud Storage
        • Microsoft Azure BLOBS
        • Others
    35. Cloud-hosted RDBMS
        • AWS RDS – SQL Server, MySQL, Oracle
            – Medium cost
            – Solid feature set, e.g. backup, snapshot
            – Use existing tooling
        • Google – MySQL
            – Lowest cost
            – Most limited RDBMS functionality
        • Microsoft – SQL Azure
            – Highest cost
    36. Demo: Amazon RDS
    37. Cloud – RDBMS, NoSQL & Hadoop
    38. Common DBA Tasks in NoSQL

        RDBMS                            NoSQL
        Import Data                      Import Data
        Setup Security                   Setup Security
        Perform a Backup                 Make a copy of the data
        Restore a Database               Move a copy to a location
        Create an Index                  Create an Index
        Join Tables Together             Run MapReduce
        Schedule a Job                   Schedule a (Cron) Job
        Run Database Maintenance         Monitor space and resources used
        Send an Email from SQL Server    Set up resource threshold alerts
        Search BOL                       Interpret Documentation
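The row "Join Tables Together → Run MapReduce" can be made concrete with a reduce-side join, one common way to express a join as map and reduce steps. This is an in-memory Python sketch under stated assumptions, not Hadoop code: the function name `reduce_side_join`, the `customer_id` join key, and the record layout are hypothetical.

```python
from collections import defaultdict


def reduce_side_join(customers, orders):
    """Inner-join two record sets on customer_id, MapReduce style."""
    # Map + shuffle: tag each record with its source and group by join key.
    grouped = defaultdict(lambda: {"customer": [], "order": []})
    for customer in customers:
        grouped[customer["customer_id"]]["customer"].append(customer)
    for order in orders:
        grouped[order["customer_id"]]["order"].append(order)

    # Reduce: within each key, pair every customer with every order.
    joined = []
    for key in sorted(grouped):
        for customer in grouped[key]["customer"]:
            for order in grouped[key]["order"]:
                joined.append({**customer, **order})
    return joined


if __name__ == "__main__":
    customers = [{"customer_id": 1, "name": "Ann"}]
    orders = [{"customer_id": 1, "total": 10}]
    print(reduce_side_join(customers, orders))
```

Keys that appear on only one side produce no output, which matches the behavior of a SQL INNER JOIN.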
    39. Other cloud data services
    40. Making Sense – Asking Questions
    41. BigData Considerations
    42. About Hadoop MapReduce
    43. Demo: Hadoop on Azure – MapReduce
    44. Data Scientists…
    45. Comparing…
    46. Karmasphere Studio for AWS
    47. Hadoop Connector to Excel
    48. Big Data
        What’s really new?
        Hint: it’s ‘post-Hadoop’…
    49. Google BigQuery
        • Dremel-based service for massive amounts of data
        • Pay for query and storage
        • SQL-like query language
        • Has an Excel connector
    50. Demo: Google BigQuery
    51. AWS Redshift
    52. Big Data ⊇ NoSQL
        (diagram labels: Big Data, NoSQL, Hadoop, DynamoDB, BigQuery, MongoDB)
    53. NoSQL To-Do List
    54. The Changing Data Landscape
    55. www.TeachingKidsProgramming.org
        • Free Courseware (recipes)
        • Do a Recipe, Teach a Kid (Ages 10++)
        • Java or Microsoft SmallBasic
    56. Toward Data Craftsmanship…