NoSQL for the SQL Server DBA
Slides from my talk at SQLSaturday 120 in Huntington Beach, CA in March 2012

Published in: Technology
  • From the O’Reilly / Strata “Getting Ready for Big Data” Report… “the three Vs of volume, velocity and variety are commonly used to characterize different aspects of big data”
  • About HBase, from the O’Reilly “Getting Ready for Big Data” report: “Enter HBase, a column-oriented database that runs on top of HDFS. Modeled after Google’s BigTable, the project’s goal is to host billions of rows of data for rapid access. MapReduce can use HBase as both a source and a destination for its computations, and Hive and Pig can be used in combination with HBase. In order to grant random access to the data, HBase does impose a few restrictions: performance with Hive is 4-5 times slower than plain HDFS, and the maximum amount of data you can store is approximately a petabyte, versus HDFS’ limit of over 30PB.”
  • & - of noSQL databases – good, the bad -
  • Original reference: Tom White’s Hadoop: The Definitive Guide (I made some modifications based on my experience)
  • via REST APIs. Very cheap, but not much functionality included. Lots of code to write for application development. But…can be a good backup solution
  • DataMarkets – InfoChimps, Factual, DataMarket, Windows Azure Data Marketplace, Wolfram Alpha, Datasift and
  • Hadoop on AWS -
  • From SQL PASS Summit 2011 – by Steve Jones, Editor, SQLServerCentral / Red Gate Software
  • - and
  • When the volume of data is too much for simple human interpretation -> Man PLUS Machine (Data Mining / Statistics)
  • About Data Science -- language - - There are a plethora of languages to access, manipulate and process big data. These languages fall into a few categories: RESTful – simple, standards; ETL – Pig (Hadoop) is an example; Query – Hive (again Hadoop), lots of *QL; Analyze – R, Mahout, Infer.NET, DMX, etc., applying statistical (data-mining) algorithms to the data output
  • - analyst
  • Lynn
  • Transcript

    • 1. noSQL for the DBA • Lynn Langit: Practitioner, Author, Instructor • March 2012, for SQL Saturday SoCal
    • 2. BigData = ‘Next State’ Questions (collecting behavioral data) • What could happen? • Why didn’t this happen? • When will the next new thing happen? • What will the next new thing be? • What happens?
    • 3. BigData = Exponentially More Data • Retail Example -> ‘Feedback Economy’ – Number of transactions – Number of behaviors (collected every minute) • Chart: Purchases, Locations and Phone data (0 to 2500) from 12:00 to 2:30
    • 4. So Why Change?
    • 5. Hitting (Relational) Walls • For Writes – Scale (partition / shard) – Speed (latency) • For Reads – Failures (availability)
    • 6. Is NoSQL just Hadoop? • HUGE Hype factor in 2011 / 2012 • Apache Hadoop is a software framework that supports data-intensive distributed applications under a free license • enables applications to work with thousands of nodes and petabytes of data • was inspired by Google’s MapReduce and Google File System (GFS) papers
    • 7. Working with Hadoop Common Tools / Languages • Java (JDK) / Eclipse • MapReduce • Map (query/format) • Reduce (aggregate) • plug-in for Eclipse (Java) • Pig (ETL -- Java) • Hive (HQL Query) • HBase tables • Others • Mahout (analyze) • Karmasphere (analyze) • R (analyze)
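The Map (query/format) and Reduce (aggregate) steps on slide 7 can be illustrated without a cluster. The following is a minimal single-process sketch in plain Python, not Hadoop's actual Java API; the function names `map_phase`, `shuffle`, `reduce_phase` and `word_count` are made up for illustration, and a real job would distribute each phase across many nodes.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Map: format each input line into (word, 1) pairs."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group values by key; the framework does this between map and reduce."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: aggregate the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run the three phases in order on a small in-memory input."""
    return reduce_phase(shuffle(map_phase(lines)))


if __name__ == "__main__":
    sample = ["big data big questions", "data about data"]
    print(word_count(sample))
```

Because each phase only sees its own inputs, the map and reduce functions can run on different machines, which is what lets the model scale out.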
    • 8. Oracle Loader for Hadoop • SQL Server Connector for Hadoop
    • 9. Demo - Hadoop on Azure – Cluster Allocation
    • 10. The reality…two pivots • Storage Methods: SQL (RDBMS), noSQL • Storage Locations: On premises, Cloud-hosted
    • 11. So many NoSQL options • More than just the Elephant in the room • More than 120 types of noSQL databases
    • 12. Flavors of noSQL
    • 13. Graph Database • Use for data with – a lot of many-to-many relationships – recursive self-joins • Use when your primary objective is quickly finding connections, patterns and relationships between the objects within lots of data • Examples: Neo4j, Freebase (Google)
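To make "quickly finding connections" from slide 13 concrete, here is a small sketch of the traversal that a graph database performs natively and that an RDBMS would need recursive self-joins to express. It is plain Python over an in-memory adjacency list, not the Neo4j or Freebase API; the `FOLLOWS` data and the `shortest_connection` function are invented for illustration.

```python
from collections import deque
from typing import Dict, List, Optional

# Hypothetical "who follows whom" edges, stored as an adjacency list.
FOLLOWS: Dict[str, List[str]] = {
    "ann": ["bob", "cat"],
    "bob": ["dan"],
    "cat": ["dan", "eve"],
    "dan": ["eve"],
    "eve": [],
}


def shortest_connection(
    graph: Dict[str, List[str]], start: str, goal: str
) -> Optional[List[str]]:
    """Breadth-first search: return the shortest path from start to goal."""
    if start == goal:
        return [start]
    visited = {start}
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        for neighbor in graph.get(path[-1], ()):
            if neighbor in visited:
                continue
            if neighbor == goal:
                return path + [neighbor]
            visited.add(neighbor)
            queue.append(path + [neighbor])
    return None  # no connection exists


if __name__ == "__main__":
    print(shortest_connection(FOLLOWS, "ann", "eve"))
```

Each hop in the search corresponds to one more self-join in SQL, which is why deep relationship queries tend to favor a graph store.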
    • 14. Column Database • Wide, sparse column sets • Examples: – Cassandra – HBase – BigTable – GAE HR DS – Azure Tables
    • 15. Demo - Document Database (MongoDB) • Use for data that is – document-oriented (collection of JSON documents) w/ semi-structured data • Encodings include XML, YAML, JSON & BSON – binary forms (PDF, Microsoft Office documents: Word, Excel…) • Examples: MongoDB, CouchDB
    • 16. Key / Value Database • Schema-less • State (Persistent or Volatile) • Examples – AWS DynamoDB – Project Voldemort
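As a rough picture of what "a collection of JSON documents with semi-structured data" on slide 15 means, the sketch below loads a few documents whose fields differ from one another and filters them by field value. It uses only Python's standard `json` module; the `find` helper and the sample data are hypothetical and are not MongoDB's or CouchDB's query API.

```python
import json
from typing import Any, Dict, List

# Three documents in one collection; note that the fields differ per document.
RAW_COLLECTION = """
[
  {"_id": 1, "name": "Ann", "city": "Irvine", "tags": ["dba", "speaker"]},
  {"_id": 2, "name": "Bob", "city": "Irvine", "phone": "555-0100"},
  {"_id": 3, "name": "Cat", "city": "Anaheim"}
]
"""


def find(collection: List[Dict[str, Any]], **criteria: Any) -> List[Dict[str, Any]]:
    """Return documents whose fields equal every criterion.

    A document that lacks a requested field simply does not match;
    no schema change is needed to store or query it.
    """
    return [
        doc
        for doc in collection
        if all(doc.get(field) == value for field, value in criteria.items())
    ]


people = json.loads(RAW_COLLECTION)

if __name__ == "__main__":
    print(find(people, city="Irvine"))
    print(find(people, phone="555-0100"))
```

In a relational table the optional `tags` and `phone` fields would need nullable columns or child tables; here each document just carries the fields it has.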
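The "State (Persistent or Volatile)" distinction on slide 16 can be sketched with a toy store that either lives only in memory or writes through to a file. This is an illustrative assumption about how such a store might behave, not the DynamoDB or Voldemort API; the `KeyValueStore` class and its JSON file format are invented for this example.

```python
import json
import os
import tempfile
from typing import Any, Dict, Optional


class KeyValueStore:
    """Schema-less get/put store; persistent only if a file path is given."""

    def __init__(self, path: Optional[str] = None) -> None:
        self._path = path
        self._data: Dict[str, Any] = {}
        # Persistent mode: reload whatever an earlier instance wrote.
        if path is not None and os.path.exists(path):
            with open(path, "r", encoding="utf-8") as handle:
                self._data = json.load(handle)

    def put(self, key: str, value: Any) -> None:
        self._data[key] = value
        self._flush()

    def get(self, key: str, default: Any = None) -> Any:
        return self._data.get(key, default)

    def _flush(self) -> None:
        # Volatile mode (no path): state is lost when the object goes away.
        if self._path is None:
            return
        with open(self._path, "w", encoding="utf-8") as handle:
            json.dump(self._data, handle)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "kv.json")
        KeyValueStore(path).put("session:42", {"user": "ann"})
        # A second instance sees the value because it was written to disk.
        print(KeyValueStore(path).get("session:42"))
```

The only access path is the key, which is what keeps this model simple to partition across nodes.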
    • 17. So which type of NoSQL? Back to CAP… (Consistency, Availability, Partitioning)
      • CP = noSQL/column: Hadoop, Big Table, HBase, MemCacheDB
      • CA = SQL/RDBMS: SQL Server / SQL Azure, Oracle, MySQL
      • AP = noSQL/document or key/value: DynamoDB, CouchDB, Cassandra, Voldemort
      • (graph)?
    • 18. Example Comparison: RDBMS vs. Hadoop (Traditional RDBMS | Hadoop)
      • Data Size: Gigabytes (Terabytes) | Petabytes (Exabytes)
      • Access: Interactive and Batch | Batch – NOT Interactive
      • Updates: Read / Write many times | Write once, Read many times
      • Structure: Static Schema | Dynamic Schema
      • Integrity: High (ACID) | Low
      • Scaling: Nonlinear | Linear
      • Query Response Time: Can be near immediate | Has latency (due to batch processing)
    • 19. Real-World Examples – not only SQL • Facebook runs on Hadoop & MySQL • Twitter runs on Hadoop (ran on FlockDB/graph) • Yahoo runs on Hadoop • LinkedIn runs on Hadoop & Voldemort • Klout runs Hadoop (on Azure) & HBase (Hive) & SQL Server SSAS BISM cubes
    • 20. What about the cloud?
    • 21. Cloud-hosted NoSQL up to 50x CHEAPER
    • 22. NoSQL (Cloud) BLOB Storage Buckets • Amazon – S3 – The gold standard • Google – Cloud Storage – Free for developers • Microsoft Azure BLOBS • DropBox, Box…
    • 23. Cloud-hosted RDBMS • AWS RDS – MySQL, Oracle – Medium cost – Solid feature set, e.g. backup, snapshot • Google – MySQL – Lowest cost – Most limited RDBMS functionality • Microsoft – SQL Azure – Best tooling integration – Highest cost
    • 24. Other types of cloud data services • Hosting public datasets – Pay to read – Earn revenue by offering for read • Cleaning / matching (your) data – ETL – Microsoft Data Explorer, Google Refine – Data Quality – Windows Azure Data Market, InfoChimps,
    • 25. Cloud – RDBMS AND NoSQL (AWS | Google | Microsoft | Others)
      • Cloud RDBMS: Oracle / MySQL | MySQL | SQL Azure | Hosted RDBMS on Rackspace
      • noSQL buckets: S3 | Cloud Storage | HDFS on Azure | -
      • NoSQL databases: DynamoDB | H/R Datastore on GAE | Azure Tables | Heroku
      • Streaming / Machine Learning: Custom EC2 | Prospective Search & Prediction API | StreamInsight & Mahout with Hadoop | -
      • Document or Graph: MongoDB on EC2 | Freebase (g) | MongoDB on Windows Azure | Cassandra on Rackspace
      • Hadoop: Elastic MapReduce on S3 & EC2 | Big Query (HBase-like) | Hadoop on Azure | -
      • Data sets & other: Karmasphere | Translation API | Azure DataMarket | Database.com; plus Full-text search (AWS or Google)
    • 26. Pick your mix and then…
      • Other Services: Use Cloud Data Markets • Use Cloud ETL
      • RDBMS: Host locally • Host in the Cloud
      • NoSQL: Host locally • Host in the Cloud
    • 27. What about me?
    • 28. Common DBA Tasks in NoSQL (RDBMS | NoSQL)
      • Import Data | Import Data
      • Setup Security | Setup Security
      • Perform a Backup | Make a copy of the data
      • Restore a Database | Move a copy to a location
      • Create an Index | Create an Index
      • Join Tables Together | Run MapReduce
      • Schedule a Job | Schedule a (Cron) Job
      • Run Database Maintenance | Monitor space and resources used
      • Send an Email from SQL Server | Set up resource threshold alerts
      • Search BOL | Interpret Documentation
    • 29. Demo - HadoopOnAzure – Part 2 • Show MapReduce Job • Show JS / Hive consoles
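Slide 28 pairs "Join Tables Together" with "Run MapReduce". One common way to do that is a reduce-side join: the map step tags each record with its source and emits it under the join key, and the reduce step combines records that share a key. The sketch below shows the idea in plain Python with made-up `customers` and `orders` data; it is roughly what `SELECT ... FROM customers JOIN orders ON customer_id` would return, and it is not the Hadoop API.

```python
from collections import defaultdict
from itertools import chain
from typing import Any, Dict, Iterable, Iterator, List, Tuple

Tagged = Tuple[int, Tuple[str, Any]]


def map_customers(customers: Iterable[Dict[str, Any]]) -> Iterator[Tagged]:
    """Map: emit each customer under its join key, tagged with its source."""
    for row in customers:
        yield row["customer_id"], ("customer", row["name"])


def map_orders(orders: Iterable[Dict[str, Any]]) -> Iterator[Tagged]:
    """Map: emit each order under the same join key, tagged as an order."""
    for row in orders:
        yield row["customer_id"], ("order", row["total"])


def reduce_join(pairs: Iterable[Tagged]) -> List[Tuple[int, str, float]]:
    """Reduce: for each key, pair every customer with every order (inner join)."""
    grouped: Dict[int, List[Tuple[str, Any]]] = defaultdict(list)
    for key, tagged in pairs:
        grouped[key].append(tagged)
    joined = []
    for key in sorted(grouped):
        names = [value for tag, value in grouped[key] if tag == "customer"]
        totals = [value for tag, value in grouped[key] if tag == "order"]
        for name in names:
            for total in totals:
                joined.append((key, name, total))
    return joined


def join(customers, orders):
    """Run both mappers, then the joining reducer."""
    return reduce_join(chain(map_customers(customers), map_orders(orders)))


if __name__ == "__main__":
    customers = [{"customer_id": 1, "name": "Ann"}, {"customer_id": 2, "name": "Bob"}]
    orders = [{"customer_id": 1, "total": 20.0}, {"customer_id": 3, "total": 9.0}]
    print(join(customers, orders))
```

The join that a single SQL keyword expresses becomes explicit code here, which is the trade-off the slide is pointing at.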
    • 30. Making Sense – Asking Questions
    • 31. Data Scientists…
    • 32. Comparing…
    • 33. Karmasphere Studio for AWS
    • 34. Hadoop Connector to Excel - Demo
    • 35. NoSQL To-Do List
      • Understand CAP & types of NoSQL databases – Use NoSQL when business needs dictate – Use the right type of NoSQL for your business problem
      • Try out NoSQL on the cloud – Quick and cheap for behavioral data – Mashup cloud datasets – Good for specialized use cases, e.g. dev, test, training environments
      • Learn noSQL access technologies – New query languages, e.g. MapReduce, R, Infer.NET – New query tools (vendor-specific) – Google Refine, Amazon Karmasphere, Microsoft Excel connectors, etc…
    • 36. The Changing Data Landscape • Other Services • RDBMS • NoSQL
    • 37. • recipes) • Free Courseware ( • Do a Recipe -> Teach a Kid (Ages 10 ++) • Java or Microsoft SmallBasic
    • 38. Toward Data Craftsmanship… • Follow me @LynnLangit • RSS my blog • Hire me – To help build your BI/Big Data solution – To teach your team next gen BI – To learn more about using NoSQL solutions