NoSQL for the SQL Server Pro


  • SQL Server Live! Orlando 2012 © 2012 SQL Server Live! All rights reserved.
  • http://www.inboundlogistics.com/cms/article/m2m-101/ http://www.freebase.com/ Hilary Mason’s datasets - https://bitly.com/bundles/hmason/1 https://datamarket.azure.com/dataset/weathertrends/worldwidehistoricalweatherdata
  • http://hadoop.apache.org/ http://en.wikipedia.org/wiki/Apache_Hadoop
  • http://www.oracle.com/technetwork/bdc/hadoop-loader/overview/index.html http://www.microsoft.com/download/en/details.aspx?id=27584
  • http://hortonworks.com/technology/hortonworksdataplatform/ More about HBase, from the O’Reilly ‘Getting Ready for BigData’ report: “Enter HBase, a column-oriented database that runs on top of HDFS. Modeled after Google’s BigTable, the project’s goal is to host billions of rows of data for rapid access. MapReduce can use HBase as both a source and a destination for its computations, and Hive and Pig can be used in combination with HBase. In order to grant random access to the data, HBase does impose a few restrictions: performance with Hive is 4-5 times slower than plain HDFS, and the maximum amount of data you can store is approximately a petabyte, versus HDFS’ limit of over 30PB.” http://www.cloudera.com/
  • Cloudera VM -- https://ccp.cloudera.com/display/SUPPORT/Downloads MapR - http://www.mapr.com/ Hortonworks -
  • https://www.hadooponazure.com/Account Demo - http://www.youtube.com/watch?v=ugi9C6s_sH4
  • Original Reference: Tom White’s Hadoop: The Definitive Guide (I made some modifications based on my experience)
  • http://nosql-database.org/ http://hadoop.apache.org/ & http://www.mongodb.org/ Wikipedia - http://en.wikipedia.org/wiki/NoSQL List of noSQL databases – http://nosql-database.org/ The good, the bad - http://www.techrepublic.com/blog/10things/10-things-you-should-know-about-nosql-databases/1772
  • http://bigdatanerd.wordpress.com/2012/01/04/why-nosql-part-2-overview-of-data-modelrelational-nosql/ http://docs.jboss.org/hibernate/ogm/3.0/reference/en-US/html_single/
  • http://en.wikipedia.org/wiki/Project_Voldemort http://redis.io/ http://aws.amazon.com/ http://docs.amazonwebservices.com/amazondynamodb/latest/developerguide/Introduction.html http://www.allthingsdistributed.com/2012/01/amazon-dynamodb.html
  • http://stage.hypertable.com/index.php/documentation/architecture/ http://code.google.com/appengine/ http://code.google.com/appengine/articles/datastore/overview.html
  • http://cwebbbi.wordpress.com/2012/02/14/so-what-is-the-bi-semantic-model/ http://www.databasejournal.com/features/mssql/understanding-new-column-store-index-of-sql-server-2012.html http://dbmsmusings.blogspot.com/2010/03/distinguishing-two-major-types-of_29.html http://ayende.com/blog/4500/that-no-sql-thing-column-family-databases
  • http://en.wikipedia.org/wiki/MongoDB http://www.mongodb.org/downloads http://www.mongodb.org/display/DOCS/Drivers
  • http://www.infinitegraph.com/what-is-a-graph-database.html Try Neo4j - http://www.neo4j.org/learn/try http://en.wikipedia.org/wiki/Graph_database http://www.freebase.com/
  • http://lynnlangit.wordpress.com/2011/11/09/relational-cloud-storage-is-50x-more-expensive-than-nosql/
  • http://code.google.com
    • Access via REST APIs
    • Very cheap, but not much functionality included
    • Lots of code to write for application development
    • But… can be a good backup solution
  • For Google - http://code.google.com For AWS - https://console.aws.amazon.com/console/home
  • Hadoop on AWS - http://wiki.apache.org/hadoop/AmazonEC2
  • From SQL Pass Summit 2011 – by Steve Jones Editor SQLServerCentral/ Red Gate Software
  • DataMarkets – InfoChimps, Factual, DataMarket, Windows Azure Data Marketplace, Wolfram Alpha, Datasift http://www.microsoft.com/en-us/sqlazurelabs/default.aspx and http://www.microsoft.com/en-us/sqlazurelabs/labs/dataexplorer.aspx https://datamarket.azure.com/ http://www.freebase.com/ http://code.google.com/p/google-refine/
  • When the volume of data is too much for simple human interpretation -> Man PLUS Machine (Data Mining / Statistics)
  • https://www.hadooponazure.com/Account Demo - http://www.youtube.com/watch?v=XcHz8aUDDN8 and http://www.youtube.com/watch?v=c7oHntP8HBI
  • About Data Science -- http://www.romymisra.com/the-new-job-market-rulers-data-scientists/ R language - http://www.r-project.org/ Infer.NET - http://research.microsoft.com/en-us/um/cambridge/projects/infernet/
    There is a plethora of languages to access, manipulate and process BigData. These languages fall into a few categories:
    • RESTful – simple, standards
    • ETL – Pig (Hadoop) is an example
    • Query – Hive (again Hadoop), lots of *QL
    • Analyze – R, Mahout, Infer.NET, DMX, etc. Applying statistical (data-mining) algorithms to the data output
  • http://rickosborne.org/download/SQL-to-MongoDB.pdf
  • http://www.youtube.com/watch?v=gjsMDAcI1Mo - analyst http://www.youtube.com/watch?v=_MT04szKlyo http://aws.amazon.com/articles/9574327584309154
  • http://www.microsoft.com/en-us/bi/default.aspx http://dennyglee.com/ Demos -    http://www.youtube.com/watch?v=djfpPsGwm6A and http://www.youtube.com/watch?v=uh9bKWO1K7U
  • https://developers.google.com/bigquery/docs/browser_tool Dremel -- http://research.google.com/pubs/pub36632.html
  • http://aws.amazon.com/redshift/
  • Lynn

NoSQL for the SQL Server Pro Presentation Transcript

  • NoSQL for the SQL Server Pro. Lynn Langit. Level: Intermediate
  • BI = Effective Reports: Data optimized for READING
  • BI = Optimized RDBMS: SQL queries & Data Stored on disk
  • The past – Business Intelligence
  • So Why Change?
  • Big Data: Your data PLUS more data, much more data…
  • BigData = ‘Next State’ Questions
  • My Data – an example from health care
    • Medical records
      – Regular
      – Emergency
      – Genetic data – 23andMe
    • Food data
      – SparkPeople
    • Purchasing
      – Grocery card
      – Credit card
    • Search – Google
    • Social media
      – Twitter
      – Facebook
    • Exercise
      – Nike Fuel Band
      – Kinect
      – Location - phone
  • Big Data = More Data
  • BigData Considerations
  • BigData – Step 1 – Collect More Data
    • Types of Data
      – Structured, Semi-structured, Unstructured vs. Data Standards
      – Behavioral vs. Transactional Data
    • Methods of collection
      – Sensors everywhere
      – Machine-2-Machine
      – Public Datasets: Freebase, Azure DataMarket, Hilary Mason’s list
  • Big Data = NoSQL? Big Data NoSQL ????
  • BigData Considerations
  • What is Hadoop?
    HUGE hype factor in 2011 / 2012
    Apache Hadoop is a software framework that supports data-intensive distributed applications under a free license
    • enables applications to work with thousands of nodes and petabytes of data
    • was inspired by Google’s MapReduce and Google File System (GFS) papers
  • Oracle Loader for Hadoop / SQL Server Connector for Hadoop
  • Working with Hadoop
    • Java (JDK) / Eclipse
    • MapReduce
      – Map (query/format)
      – Reduce (aggregate)
      – plug-in for Eclipse (Java)
    • Pig (ETL -- Java)
    • Hive (HQL Query)
    • HBase tables
    • Others
      – Mahout (analyze)
      – Karmasphere (analyze)
      – R (analyze)
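The Map (query/format) and Reduce (aggregate) steps listed above can be sketched in plain Python. This is only an in-memory illustration of the pattern, not Hadoop code: the function names `map_phase`, `shuffle`, `reduce_phase` and `word_count` are made up for this example, and a real job would run on a cluster through the Hadoop APIs.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce: aggregate all values seen for one key."""
    return key, sum(values)

def word_count(lines):
    """Run map, shuffle and reduce over a small list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

counts = word_count(["big data", "Big Data NoSQL"])
```

Here `counts` maps each lower-cased word to the number of times it appears across both lines.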
  • Which Hadoop for me?
    • Open source – direct download (Linux)
    • Vendors (local or cloud)
      – Cloudera / Linux
      – Hortonworks – working w/MSFT
      – Microsoft – Isotope project (beta)
    • Vendors (cloud only)
      – AWS – MapReduce
      – MSFT – HDInsight
      – Google – other products*
  • Demo: Hadoop on Azure (HDInsight)
  • Example Comparison: RDBMS vs. Hadoop
                        | Traditional RDBMS       | Hadoop / MapReduce
    Data Size           | Gigabytes (Terabytes)   | Petabytes (Exabytes)
    Access              | Interactive and Batch   | Batch – NOT Interactive
    Updates             | Read / Write many times | Write once, Read many times
    Structure           | Static Schema           | Dynamic Schema
    Integrity           | High (ACID)             | Low
    Scaling             | Nonlinear               | Linear
    Query Response Time | Can be near immediate   | Has latency (due to batch processing)
  • Big Data = NoSQL = Hadoop? Big Data NoSQL Hadoop
  • So many NoSQL options
    • More than just the Elephant in the room
    • More than 120 types of NoSQL databases
  • Flavors of NoSQL
  • Key / Value Database
    • Schema-less
    • State (Persistent or Volatile)
    • Examples
      – Redis / Riak
      – AWS DynamoDB
      – Project Voldemort
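As a rough illustration of "schema-less" and "persistent or volatile" state, here is a toy key/value store in Python. The class name `KeyValueStore` and the JSON file layout are assumptions made for this sketch. It does not show how Redis, Riak, DynamoDB or Voldemort are implemented or accessed.

```python
import json
import os
import tempfile

class KeyValueStore:
    """Toy store: volatile when path is None, persistent when a path is given."""

    def __init__(self, path=None):
        self.path = path
        self.data = {}
        # Reload earlier state if a persisted file already exists.
        if path and os.path.exists(path):
            with open(path) as f:
                self.data = json.load(f)

    def put(self, key, value):
        # Values are schema-less: any JSON-serializable shape is accepted.
        self.data[key] = value
        if self.path:
            with open(self.path, "w") as f:
                json.dump(self.data, f)

    def get(self, key, default=None):
        return self.data.get(key, default)

# Volatile: the data lives only in this process.
volatile = KeyValueStore()
volatile.put("user:1", {"name": "Ada"})

# Persistent: a second instance reads back what the first one wrote.
path = os.path.join(tempfile.mkdtemp(), "kv.json")
KeyValueStore(path).put("user:2", {"name": "Lin", "plan": "pro"})
reloaded = KeyValueStore(path).get("user:2")
```

Note that the two stored values have different fields; nothing in the store enforces a common structure.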
  • Column Database – BigTable, HBase
    • Wide or sparse column sets
    • Schema-light
  • More about Column Databases
    • Type A – Column-families
      – Non-relational
      – Sparse
      – Examples: HBase, Cassandra, xVelocity (SQL 2012 BISM)
    • Type B – Column-stores
      – Relational
      – Dense
      – Example: SQL Server 2012 Columnstore index
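The sparse versus dense distinction between the two column types can be pictured with plain Python dictionaries. This is a conceptual sketch only; the sample rows and the helper names `columns_of` and `average` are invented here, and neither structure reflects the on-disk format of HBase or of a SQL Server columnstore index.

```python
# Type A, column family (sparse): each row key holds only the columns it has.
column_family = {
    "row1": {"name": "Ada", "email": "ada@example.com"},
    "row2": {"name": "Lin", "twitter": "@lin"},
}

# Type B, column store (dense): each column holds exactly one value per row.
column_store = {
    "name": ["Ada", "Lin"],
    "age": [36, 41],
}

def columns_of(row_key):
    """List the columns that actually exist for one sparse row."""
    return sorted(column_family[row_key])

def average(column):
    """Aggregate one dense column without touching the other columns."""
    values = column_store[column]
    return sum(values) / len(values)

row2_columns = columns_of("row2")
mean_age = average("age")
```

The dense layout is what makes column scans and aggregates cheap, while the sparse layout lets each row carry a different set of columns.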
  • Document Database – MongoDB, CouchDB
    • document-oriented (collection of JSON documents) w/semi-structured data
      – Encodings: XML, YAML, JSON & BSON
    • binary forms
      – PDF
      – Microsoft Office documents (Word, Excel…)
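A "collection of JSON documents" with semi-structured data can be mimicked with a list of Python dictionaries. The `find` helper below is a hypothetical stand-in for a query-by-example call; it is not the MongoDB or CouchDB driver API, and the sample documents are made up.

```python
import json

# Two documents in one collection; they do not share the same fields.
collection = [
    {"_id": 1, "name": "Ada", "tags": ["sql", "bi"]},
    {"_id": 2, "name": "Lin", "address": {"city": "Orlando"}},
]

def find(docs, **criteria):
    """Return every document whose top-level fields match all criteria."""
    return [d for d in docs
            if all(d.get(k) == v for k, v in criteria.items())]

matches = find(collection, name="Lin")

# Documents serialize directly to JSON, nested structure included.
as_json = json.dumps(matches[0], sort_keys=True)
```

A document that lacks a queried field simply does not match, which is how semi-structured data is tolerated without a schema change.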
  • Demo: MongoDB
  • Graph Database – Neo4j
    • good for many-to-many relationships & recursive self-joins
    • finds connections, patterns and relationships between the objects within lots of data
    • sometimes used with ‘Small Data’
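Finding a connection between two objects, which in SQL would take a recursive self-join, is a plain traversal in a graph model. The sketch below uses a breadth-first search over an adjacency list; the `follows` data and the `shortest_path` function are illustrative assumptions, and this is not how Neo4j is queried.

```python
from collections import deque

# Hypothetical "who follows whom" edges stored as an adjacency list.
follows = {
    "ann": ["bob", "cy"],
    "bob": ["dee"],
    "cy": ["dee"],
    "dee": ["eve"],
    "eve": [],
}

def shortest_path(graph, start, goal):
    """Breadth-first search: return one shortest chain of nodes, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None

chain = shortest_path(follows, "ann", "eve")
```

Each hop in the traversal corresponds to one more level of self-join in a relational query over an edge table.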
  • Demo: Neo4j
  • Which type of NoSQL for which type of data?
    Type of Data             | Type of NoSQL       | Example solution
    Log files                | Wide Column         | HBase
    Product Catalogs         | Key Value on disk   | DynamoDB
    User profiles            | Key Value in memory | Redis
    Startups                 | Document            | MongoDB
    Social media connections | Graph               | Neo4j
    LOB w/Transactions       | NONE! Use RDBMS     | SQL Server
  • Cloud-hosted NoSQL up to 50x CHEAPER
  • The reality…two pivots
  • Consumer Storage Buckets
    • Dropbox
    • Box
    • Windows SkyDrive
    • Google Drive
    • Amazon Cloud Drive
    • Apple iCloud
  • Developer BLOB Storage Buckets
    • Amazon – S3 or Glacier
    • Google – Cloud Storage
    • Microsoft Azure BLOBS
    • Others
  • Cloud-hosted RDBMS
    • AWS RDS – SQL Server, MySQL, Oracle
      – Medium cost
      – Solid feature set, e.g. backup, snapshot
      – Use existing tooling
    • Google – MySQL
      – Lowest cost
      – Most limited RDBMS functionality
    • Microsoft – SQLAzure
      – Highest cost
  • Demo: Amazon RDS
  • Cloud – RDBMS, NoSQL & Hadoop
  • Common DBA Tasks in NoSQL
    RDBMS                         | NoSQL
    Import Data                   | Import Data
    Setup Security                | Setup Security
    Perform a Backup              | Make a copy of the data
    Restore a Database            | Move a copy to a location
    Create an Index               | Create an Index
    Join Tables Together          | Run MapReduce
    Schedule a Job                | Schedule a (Cron) Job
    Run Database Maintenance      | Monitor space and resources used
    Send an Email from SQL Server | Set up resource threshold alerts
    Search BOL                    | Interpret Documentation
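The "Join Tables Together → Run MapReduce" row can be made concrete with a small reduce-side join, one common way to express a join as a MapReduce job. The sketch below runs in memory in plain Python. The sample `customers` and `orders` data and all function names are assumptions for illustration, and it is roughly equivalent to an inner join on the customer id.

```python
from collections import defaultdict

# Hypothetical inputs: (customer_id, name) and (order_id, customer_id, amount).
customers = [(1, "Ada"), (2, "Lin")]
orders = [(101, 1, 25.0), (102, 1, 10.0), (103, 2, 7.5)]

def map_records():
    """Map: key every record by customer id and tag it with its source."""
    for cust_id, name in customers:
        yield cust_id, ("customer", name)
    for order_id, cust_id, amount in orders:
        yield cust_id, ("order", (order_id, amount))

def reduce_join(key, tagged):
    """Reduce: pair each customer record with each order sharing the key."""
    names = [v for tag, v in tagged if tag == "customer"]
    order_rows = [v for tag, v in tagged if tag == "order"]
    return [(key, n, oid, amt) for n in names for oid, amt in order_rows]

def join():
    """Shuffle mapped records by key, then reduce each group into joined rows."""
    grouped = defaultdict(list)
    for key, value in map_records():
        grouped[key].append(value)
    rows = []
    for key in sorted(grouped):
        rows.extend(reduce_join(key, grouped[key]))
    return rows

joined = join()
```

The join logic that a SQL engine performs for you has to be written by hand here, which is the point of that table row.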
  • Other cloud data services
  • Making Sense – Asking Questions
  • BigData Considerations
  • About Hadoop MapReduce
  • Demo: Hadoop on Azure - MapReduce
  • Data Scientists…
  • Comparing…
  • Karmasphere Studio for AWS
  • Hadoop Connector to Excel
  • Big Data: What’s really new? Hint: it’s ‘post-Hadoop’…
  • Google BigQuery
    • Dremel-based service for massive amounts of data
    • Pay for query and storage
    • SQL-like query language
    • Has an Excel connector
  • Demo: Google BigQuery
  • AWS Redshift
  • Big Data ⊇ NoSQL Big Data NoSQL Hadoop DynamoDB BigQuery MongoDB
  • NoSQL To-Do List
  • The Changing Data Landscape
  • www.TeachingKidsProgramming.org
    • Free Courseware (recipes)
    • Do a Recipe, Teach a Kid (Ages 10++)
    • Java or Microsoft SmallBasic
  • Toward Data Craftsmanship…