NoSQL for the SQL Server DBA



Slides from my talk at SQLSaturday 120 in Huntington Beach, CA in March 2012


  • From the O’Reilly / Strata “Getting Ready for Big Data” report: “the three Vs of volume, velocity and variety are commonly used to characterize different aspects of big data”
  • About HBase, from the O’Reilly ‘Getting Ready for Big Data’ report: “Enter HBase, a column-oriented database that runs on top of HDFS. Modeled after Google’s BigTable, the project’s goal is to host billions of rows of data for rapid access. MapReduce can use HBase as both a source and a destination for its computations, and Hive and Pig can be used in combination with HBase. In order to grant random access to the data, HBase does impose a few restrictions: performance with Hive is 4-5 times slower than plain HDFS, and the maximum amount of data you can store is approximately a petabyte, versus HDFS’ limit of over 30PB.”
  • NoSQL databases: the good, the bad
  • Original reference: Tom White’s Hadoop: The Definitive Guide (I made some modifications based on my experience)
  • via REST APIs. Very cheap, but not much functionality included. Lots of code to write for application development. But… can be a good backup solution
  • DataMarkets – InfoChimps, Factual, DataMarket, Windows Azure Data Marketplace, Wolfram Alpha, Datasift and
  • Hadoop on AWS
  • From SQL PASS Summit 2011, by Steve Jones, Editor, SQLServerCentral / Red Gate Software
  • When the volume of data is too much for simple human interpretation -> Man PLUS Machine (Data Mining / Statistics)
  • About data science: there is a plethora of languages to access, manipulate and process big data. These languages fall into a few categories: RESTful (simple, standards-based); ETL (Pig on Hadoop is an example); query (Hive, again on Hadoop, plus lots of *QL languages); and analyze (R, Mahout, Infer.NET, DMX, etc.), which apply statistical (data-mining) algorithms to the data output.

    1. NoSQL for the DBA. Lynn Langit: Practitioner, Author, Instructor. March 2012, for SQL Saturday SoCal
    2. Big Data = ‘Next State’ Questions (collecting behavioral data) • What could happen? • Why didn’t this happen? • When will the next new thing happen? • What will the next new thing be? • What happens?
    3. Big Data = Exponentially More Data • Retail example -> ‘Feedback Economy’: number of transactions; number of behaviors (collected every minute) [Chart: purchases, locations and phone data over time, 12:00 to 2:30]
    4. So Why Change?
    5. Hitting (Relational) Walls • For writes: scale (partition / shard), speed (latency) • For reads: failures (availability)
    6. Is NoSQL just Hadoop? • HUGE hype factor in 2011 / 2012 • Apache Hadoop is a software framework that supports data-intensive distributed applications under a free license • enables applications to work with thousands of nodes and petabytes of data • was inspired by Google’s MapReduce and Google File System (GFS) papers
    7. Working with Hadoop: Common Tools / Languages • Java (JDK) / Eclipse • MapReduce: Map (query/format), Reduce (aggregate), plug-in for Eclipse (Java) • Pig (ETL, Java) • Hive (HQL query) • HBase tables • Others: Mahout (analyze), Karmasphere (analyze), R (analyze)
    8. Oracle Loader for Hadoop • SQL Server Connector for Hadoop
    9. Demo: Hadoop on Azure, Cluster Allocation
    10. The reality… two pivots • Storage methods: SQL (RDBMS), NoSQL • Storage locations: on premises, cloud-hosted
    11. So many NoSQL options • More than just the elephant (Hadoop) in the room • Over 120 types of NoSQL databases
    12. Flavors of NoSQL
    13. Graph Database • Use for data with: a lot of many-to-many relationships; recursive self-joins; or when your primary objective is quickly finding connections, patterns and relationships between the objects within lots of data • Examples: Neo4j, Freebase (Google)
    14. Column Database • Wide, sparse column sets • Examples: Cassandra, HBase, BigTable, Google App Engine High Replication Datastore (GAE HR DS), Azure Tables
    15. Demo: Document Database (MongoDB) • Use for data that is document-oriented (a collection of JSON documents) with semi-structured data • Encodings include XML, YAML, JSON and BSON, plus binary forms such as PDF and Microsoft Office documents (Word, Excel…) • Examples: MongoDB, CouchDB
    16. Key / Value Database • Schema-less • State (persistent or volatile) • Examples: AWS DynamoDB, Project Voldemort
    17. So which type of NoSQL? Back to CAP… [Diagram: CAP triangle with vertices Consistency, Availability and Partition tolerance] • CA = SQL/RDBMS: SQL Server / SQL Azure, Oracle, MySQL • CP = NoSQL/column: Hadoop, BigTable, HBase, MemcacheDB • AP = NoSQL/document or key/value: DynamoDB, CouchDB, Cassandra, Voldemort • Graph databases: ? (shown near Consistency)
    18. Example Comparison: RDBMS vs. Hadoop
        Attribute           | Traditional RDBMS       | Hadoop
        Data size           | Gigabytes (terabytes)   | Petabytes (exabytes)
        Access              | Interactive and batch   | Batch, NOT interactive
        Updates             | Read / write many times | Write once, read many times
        Structure           | Static schema           | Dynamic schema
        Integrity           | High (ACID)             | Low
        Scaling             | Nonlinear               | Linear
        Query response time | Can be near immediate   | Has latency (due to batch processing)
    19. Real-World Examples: not only SQL • Facebook runs on Hadoop & MySQL • Twitter runs on Hadoop (ran on FlockDB/graph) • Yahoo runs on Hadoop • LinkedIn runs on Hadoop & Voldemort • Klout runs Hadoop (on Azure) & HBase (Hive) & SQL Server SSAS BISM cubes
    20. What about the cloud?
    21. Cloud-hosted NoSQL up to 50x CHEAPER
    22. NoSQL (Cloud) BLOB Storage Buckets • Amazon S3: the gold standard • Google Cloud Storage: free for developers • Microsoft Azure BLOBs • Dropbox, Box…
    23. Cloud-hosted RDBMS • AWS RDS (MySQL, Oracle): medium cost; solid feature set, e.g. backup, snapshot • Google (MySQL): lowest cost; most limited RDBMS functionality • Microsoft (SQL Azure): best tooling integration; highest cost
    24. Other types of cloud data services • Hosting public datasets: pay to read; earn revenue by offering for read • Cleaning / matching (your) data: ETL (Microsoft Data Explorer, Google Refine); data quality (Windows Azure Data Market, InfoChimps,
    25. Cloud: RDBMS AND NoSQL
        Category                    | AWS                           | Google                              | Microsoft                          | Others
        Cloud RDBMS                 | Oracle / MySQL                | MySQL                               | SQL Azure                          | Hosted RDBMS on Rackspace
        NoSQL buckets               | S3                            | Cloud Storage                       | HDFS on Azure                      |
        NoSQL databases             | DynamoDB                      | H/R Datastore on GAE                | Azure Tables                       | Heroku
        Streaming, machine learning | Custom EC2                    | Prospective Search & Prediction API | StreamInsight & Mahout with Hadoop |
        Document or graph           | MongoDB on EC2                | Freebase (g)                        | MongoDB on Windows Azure           | Cassandra on Rackspace
        Hadoop                      | Elastic MapReduce on S3 & EC2 | BigQuery (HBase-like)               | Hadoop on Azure                    |
        Data sets & other           | Karmasphere                   | Translation API, full-text search   | Azure DataMarket                   | Database.com
    26. Pick your mix and then… • Other services: use cloud data markets; use cloud ETL • RDBMS: host locally or host in the cloud • NoSQL: host locally or host in the cloud
    27. What about me?
    28. Common DBA Tasks in NoSQL
        RDBMS                         | NoSQL
        Import data                   | Import data
        Set up security               | Set up security
        Perform a backup              | Make a copy of the data
        Restore a database            | Move a copy to a location
        Create an index               | Create an index
        Join tables together          | Run MapReduce
        Schedule a job                | Schedule a (cron) job
        Run database maintenance      | Monitor space and resources used
        Send an email from SQL Server | Set up resource threshold alerts
        Search BOL                    | Interpret documentation
    29. Demo: HadoopOnAzure, Part 2 • Show MapReduce job • Show JS / Hive consoles
    30. Making Sense: Asking Questions
    31. Data Scientists…
    32. Comparing…
    33. Karmasphere Studio for AWS
    34. Hadoop Connector to Excel: Demo
    35. NoSQL To-Do List • Understand CAP & types of NoSQL databases: use NoSQL when business needs dictate; use the right type of NoSQL for your business problem • Try out NoSQL on the cloud: quick and cheap for behavioral data; mash up cloud datasets; good for specialized use cases, e.g. dev, test, training environments • Learn NoSQL access technologies: new query languages, e.g. MapReduce, R, Infer.NET; new query tools (vendor-specific), e.g. Google Refine, Karmasphere (AWS), Microsoft Excel connectors
    36. The Changing Data Landscape [Diagram: RDBMS, NoSQL, Other Services]
    37. • Free Courseware (recipes) • Do a Recipe, Teach a Kid (ages 10+) • Java or Microsoft Small Basic
    38. Toward Data Craftsmanship… • Follow me @LynnLangit • RSS my blog • Hire me: to help build your BI/Big Data solution; to teach your team next-gen BI; to learn more about using NoSQL solutions
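Slide 7 splits MapReduce into Map (query/format) and Reduce (aggregate). As a rough single-machine sketch of that flow, not Hadoop's Java API, the Python below counts words: the map step formats each input line as (word, 1) pairs, a shuffle step groups the pairs by key (Hadoop performs this grouping between the two phases), and the reduce step sums each group. The function names and sample input are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Map: format one input record as (key, value) pairs, here (word, 1)."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group every value emitted for the same key."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: aggregate the grouped values for one key."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    # Keys are processed in sorted order, as a sorted shuffle would deliver them.
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


if __name__ == "__main__":
    docs = ["big data big ideas", "NoSQL and big data"]
    print(word_count(docs))  # {'and': 1, 'big': 3, 'data': 2, 'ideas': 1, 'nosql': 1}
```

On a real cluster the map calls run in parallel on the nodes that hold each block of input, which is why the map step should only look at one record at a time.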
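Slide 5 lists partitioning (sharding) as one way past the relational wall for writes, and slide 16 describes key/value stores as schema-less. A minimal sketch, assuming simple hash partitioning over a fixed number of shards, combines the two ideas: each key is hashed to choose one of several in-memory dictionaries that stand in for separate servers. `shard_for` and `ShardedStore` are hypothetical names, not part of DynamoDB, Voldemort, or any other product.

```python
import hashlib
from typing import Any, Optional


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index using a hash that is stable across processes."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedStore:
    """Toy key/value store; each dict stands in for one server (hypothetical)."""

    def __init__(self, num_shards: int) -> None:
        self.shards: list[dict[str, Any]] = [{} for _ in range(num_shards)]

    def put(self, key: str, value: Any) -> None:
        # Schema-less: the value can be any object; nothing is validated.
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key: str) -> Optional[Any]:
        # Reads go straight to the one shard that owns the key.
        return self.shards[shard_for(key, len(self.shards))].get(key)


if __name__ == "__main__":
    store = ShardedStore(num_shards=4)
    store.put("customer:42", {"name": "Contoso", "tier": "gold"})
    store.put("session:abc", "volatile state")
    print(store.get("customer:42"))
    print([len(s) for s in store.shards])  # how the two keys were spread
```

The sketch uses MD5 rather than Python's built-in `hash()` because string hashing in Python is randomized per process, so keys would land on different shards after a restart. Plain modulo placement also moves most keys when the shard count changes; consistent hashing is the usual alternative.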
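Slide 13 recommends graph databases when the main goal is finding connections, a query that in SQL Server usually means recursive self-joins (for example, a recursive CTE). The sketch below is a plain-Python breadth-first search over an adjacency list, roughly the traversal a graph store performs when it follows relationships; it is not Neo4j's API, and the sample "follows" data is hypothetical.

```python
from collections import deque
from typing import Optional


def shortest_connection(graph: dict[str, set[str]], start: str, goal: str) -> Optional[list[str]]:
    """Breadth-first search: return the shortest chain of nodes from start to goal, or None."""
    if start == goal:
        return [start]
    parents: dict[str, Optional[str]] = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, set()):
            if neighbor in parents:
                continue  # already reached by an equal or shorter path
            parents[neighbor] = node
            if neighbor == goal:
                # Walk the parent links back to the start, then reverse.
                path = [goal]
                while parents[path[-1]] is not None:
                    path.append(parents[path[-1]])
                return path[::-1]
            queue.append(neighbor)
    return None


if __name__ == "__main__":
    follows = {
        "alice": {"bob", "erin"},
        "bob": {"carol"},
        "carol": {"dave"},
        "erin": {"dave"},
    }
    print(shortest_connection(follows, "alice", "dave"))  # ['alice', 'erin', 'dave']
```

Each hop here is a dictionary lookup on the current node's neighbors, whereas the relational version repeats a self-join per level of depth.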
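Slide 14 characterizes column databases by wide, sparse column sets. The toy table below, loosely modeled on the BigTable/HBase convention of addressing a cell as `family:qualifier` within a row, stores only the cells that were actually written, so two rows in the same table can carry entirely different columns and empty cells cost nothing. `WideColumnTable` and the sample data are hypothetical; real systems add cell versions (timestamps), sorted row keys and on-disk storage.

```python
from typing import Any, Optional


class WideColumnTable:
    """Sparse table: row_key -> {"family:qualifier": value}; unwritten cells are not stored."""

    def __init__(self) -> None:
        self.rows: dict[str, dict[str, Any]] = {}

    def put(self, row_key: str, column: str, value: Any) -> None:
        self.rows.setdefault(row_key, {})[column] = value

    def get(self, row_key: str, column: str) -> Optional[Any]:
        # Reading a missing row or cell returns None and stores nothing.
        return self.rows.get(row_key, {}).get(column)

    def family(self, row_key: str, family: str) -> dict[str, Any]:
        """Return every cell in one column family for a row."""
        prefix = family + ":"
        return {c: v for c, v in self.rows.get(row_key, {}).items() if c.startswith(prefix)}


if __name__ == "__main__":
    t = WideColumnTable()
    t.put("user#1", "profile:name", "Ana")
    t.put("user#1", "clicks:2012-03-10", 17)
    t.put("user#2", "profile:name", "Ben")
    t.put("user#2", "profile:city", "Huntington Beach")
    print(t.family("user#2", "profile"))  # {'profile:name': 'Ben', 'profile:city': 'Huntington Beach'}
    print(t.get("user#2", "clicks:2012-03-10"))  # None: that cell was never written
```

In a fixed-schema table the same data would need a column for every possible date in the `clicks` family, most of them NULL.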
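Slide 15 describes document databases as collections of JSON documents holding semi-structured data. The sketch below parses a small hypothetical collection with Python's standard `json` module and filters it with a made-up `find` helper: documents need not share fields, a document that lacks a queried field simply does not match, and a list-valued field matches when it contains the value (loosely similar to how MongoDB treats equality on array fields). This is not the MongoDB driver API.

```python
import json
from typing import Any

RAW = """[
  {"_id": 1, "name": "Ana", "tags": ["nosql", "bi"]},
  {"_id": 2, "name": "Ben", "city": "Huntington Beach"},
  {"_id": 3, "name": "Cy", "tags": ["sql"], "city": "Irvine"}
]"""


def matches(doc: dict[str, Any], field: str, expected: Any) -> bool:
    if field not in doc:
        return False  # schema-less: a missing field just fails the filter
    value = doc[field]
    if isinstance(value, list):
        return expected in value  # list field: match on any element
    return value == expected


def find(collection: list[dict[str, Any]], **criteria: Any) -> list[dict[str, Any]]:
    """Return the documents that satisfy every field=value pair in criteria."""
    return [d for d in collection if all(matches(d, f, v) for f, v in criteria.items())]


if __name__ == "__main__":
    people = json.loads(RAW)
    print([d["_id"] for d in find(people, tags="nosql")])  # [1]
    print([d["_id"] for d in find(people, city="Irvine")])  # [3]
```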
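Slide 28 maps "Join Tables Together" in an RDBMS to "Run MapReduce" in NoSQL. One common way to do that is a reduce-side join: both inputs are mapped to the join key with a tag saying which table each record came from, the shuffle brings matching keys together, and the reducer pairs them up. The single-process sketch below uses hypothetical customer and order rows and behaves like an inner join, roughly `SELECT c.id, c.name, o.id, o.amount FROM customers c JOIN orders o ON o.customer_id = c.id`.

```python
from collections import defaultdict
from typing import Any, Iterable, Iterator

Customer = tuple[int, str]           # (customer_id, name)
Order = tuple[int, int, float]       # (order_id, customer_id, amount)


def map_customers(rows: Iterable[Customer]) -> Iterator[tuple[int, tuple[str, Any]]]:
    for cust_id, name in rows:
        yield cust_id, ("C", name)  # tag records from the customers table


def map_orders(rows: Iterable[Order]) -> Iterator[tuple[int, tuple[str, Any]]]:
    for order_id, cust_id, amount in rows:
        yield cust_id, ("O", (order_id, amount))  # key on the foreign key


def reduce_join(key: int, tagged: list[tuple[str, Any]]) -> Iterator[tuple[int, str, int, float]]:
    names = [v for tag, v in tagged if tag == "C"]
    order_vals = [v for tag, v in tagged if tag == "O"]
    for name in names:  # no orders (or no customer) for a key -> nothing emitted
        for order_id, amount in order_vals:
            yield key, name, order_id, amount


def mapreduce_join(customers: list[Customer], orders: list[Order]) -> list[tuple[int, str, int, float]]:
    groups: dict[int, list[tuple[str, Any]]] = defaultdict(list)
    for key, value in list(map_customers(customers)) + list(map_orders(orders)):
        groups[key].append(value)  # the "shuffle"
    result: list[tuple[int, str, int, float]] = []
    for key in sorted(groups):
        result.extend(reduce_join(key, groups[key]))
    return result


if __name__ == "__main__":
    customers = [(1, "Contoso"), (2, "Fabrikam"), (3, "Northwind")]
    orders = [(100, 1, 25.0), (101, 1, 40.0), (102, 2, 15.5)]
    print(mapreduce_join(customers, orders))
    # [(1, 'Contoso', 100, 25.0), (1, 'Contoso', 101, 40.0), (2, 'Fabrikam', 102, 15.5)]
```

Customer 3 has no orders, so, as with an inner join, it does not appear in the output.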