NoSQL: An Analysis


Published on

NoSQL: An Analysis - PASS Business Analytics Conference 2013

Published in: Technology
  • Be the first to comment

No Downloads
Total views
On SlideShare
From Embeds
Number of Embeds
Embeds 0
No embeds

No notes for slide
  • NoSQL: An Analysis

    1. 1. April 10-12 | Chicago, ILNoSQL: An AnalysisAndrew J. Brust, Founder and CEO, Blue Badge Insights
    2. 2. April 10-12 | Chicago, ILPlease silencecell phones
    3. 3. Meet AndrewCEO and Founder, Blue Badge InsightsBig Data blogger for ZDNetMicrosoft Regional Director, MVPCo-chair VSLive! and 17 years as a speakerFounder, Microsoft BI User Group of NYC• http://www.msbinyc.comCo-moderator, NYC .NET Developers Group•“Redmond Review” columnist for Visual Studio Magazine and Redmond Developer, Twitter: @andrewbrust3
    4. 4. Andrew’s New Blog (
    5. 5. Read all about it!
    6. 6. AgendaWhy NoSQL?ConceptsNoSQL CategoriesProvisioning, market, applicabilityTake-aways
    7. 7. Why NoSQL?
    8. 8. NoSQL Data FodderAddresses PreferencesNotesFriends,FollowersDocuments
    9. 9. “Web Scale”This the term used to justify NoSQLScenario is simple needs but “made up for involume”• Millions of concurrent usersThink of sites like Amazon or GoogleThink of non-transactional tasks like loadingcatalog data to display product page, orenvironment preferences
    10. 10. NoSQL Common TraitsNon-relationalNon-schematized/schema-freeOpen sourceDistributedEventual consistency“Web scale”Developed at big Internet companies
    11. 11. CONCEPTS
    12. 12. ConsistencyCAP Theorem• Databases may only excel at two of the following three attributes:consistency, availability and partition toleranceNoSQL does not offer “ACID” guarantees• Atomicity, consistency, isolation and durabilityInstead offers “eventual consistency”Similar to DNS propagation
    13. 13. Things like inventory, account balances should be consistent• Imagine updating a server in Seattle that stock was depleted• Imagine not updating the server in NY• Customer in NY goes to order 50 pieces of the item• Order processed even though no stockThings like catalog information don’t have to be, at least not immediately• If a new item is entered into the catalog, it’s OK for some customers to see iteven before the other customers’ server knows about itBut catalog info must come up quickly• Therefore don’t lock data in one location while waiting to update the otherTherefore, OK to sacrifice consistency for speed, in some casesConsistency
    14. 14. CAP TheoremConsistencyAvailabilityPartitionToleranceRelationalNoSQL
    15. 15. IndexingMost NoSQL databases are indexed by keySome allow so-called “secondary” indexesOften the primary key indexes are clusteredHBase uses HDFS (the Hadoop Distributed File System), which isappend-only• Writes are logged• Logged writes are batched• File is re-created and sorted
    16. 16. QueriesTypically no query languageInstead, create procedural programSometimes SQL is supportedSometimes MapReduce code is used…
    17. 17. MapReduceThis is not Hadoop’s MapReduce, but it’s conceptually relatedMap step: pre-processes dataReduce step: summarizes/aggregates dataWill show a MapReduce code sample for Mongo soonWill demo map code on CouchDB
    18. 18. ShardingA partitioning pattern where separate servers store partitionsFan-out queries supportedPartitions may be duplicated, so replication also provided• Good for disaster recoverySince “shards” can be geographically distributed, sharding can act like aCDNGood for keeping data close to processing• Reduces network traffic when MapReduce splitting takes place
    20. 20. Key-Value StoresThe most common; not necessarily the most popularHas rows, each with something like a big dictionary/associative array• Schema may differ from row to rowCommon on cloud platforms• e.g. Amazon SimpleDB, Azure Table StorageMemcacheDB, Voldemort, Couchbase, DynamoDB (AWS), Dynomite,Redis and Riak20
    21. 21. Key-Value StoresTable: CustomersRow ID: 101First_Name: AndrewLast_Name: BrustAddress: 123 Main StreetLast_Order: 1501Row ID: 202First_Name: JaneLast_Name: DoeAddress: 321 Elm StreetLast_Order: 1502Table: OrdersRow ID: 1501Price: 300 USDItem1: 52134Item2: 24457Row ID: 1502Price: 2500 GBPItem1: 98456Item2: 59428Database
    22. 22. Wide Column StoresHas tables with declared column families• Each column family has “columns” which are KV pairs that can vary from row to rowThese are the most foundational for large sites• BigTable (Google)• HBase (Originally part of Yahoo-dominated Hadoop project)• Cassandra (Facebook)• Calls column families “super columns” and tables “super column families”They are the most “Big Data”-ready• Especially HBase + Hadoop
    23. 23. Table: CustomersRow ID: 101Super Column: NameColumn: First_Name:AndrewColumn: Last_Name: BrustSuper Column: AddressColumn: Number: 123Column: Street: Main StreetSuper Column: OrdersColumn: Last_Order: 1501Table: OrdersRow ID: 1501Super Column: PricingColumn: Price: 300USDSuper Column: ItemsColumn: Item1: 52134Column: Item2: 24457Row ID: 1502Super Column: PricingColumn: Price: 2500GBPSuper Column: ItemsColumn: Item1: 98456Column: Item2: 59428Row ID: 202Super Column: NameColumn: First_Name: JaneColumn: Last_Name: DoeSuper Column: AddressColumn: Number: 321Column: Street: Elm StreetSuper Column: OrdersColumn: Last_Order: 1502Wide Column Stores
    24. 24. April 10-12 | Chicago, ILDemoWide Column Stores
    25. 25. Document StoresHave “databases,” which are akin to tablesHave “documents,” akin to rows• Documents are typically JSON objects• Each document has properties and values• Values can be scalars, arrays, links to documents in other databases or sub-documents (i.e. containedJSON objects - Allows for hierarchical storage)• Can have attachments as wellOld versions are retained• So Doc Stores work well for content managementSome view doc stores as specialized KV storesMost popular with developers, startups, VCsThe biggies:• CouchDB• Derivatives• MongoDB
    26. 26. Document Store Application OrientationDocuments can each be addressed by URIsCouchDB supports full REST interfaceVery geared towards JavaScript and JSON• Documents are JSON objects• CouchDB/MongoDB use JavaScript as native languageIn CouchDB, “view functions” also have unique URIs and they returnHTML• So you can build entire applications in the database
    27. 27. Database: CustomersDocument ID: 101First_Name: AndrewLast_Name: BrustAddress:Orders:Database: OrdersDocument ID: 1501Price: 300 USDItem1: 52134Item2: 24457Document ID: 1502Price: 2500 GBPItem1: 98456Item2: 59428Number: 123Street: Main StreetMost_recent: 1501Document ID: 202First_Name: JaneLast_Name: DoeAddress:Orders:Number: 321Street: Elm StreetMost_recent: 1502Document Stores
    28. 28. April 10-12 | Chicago, ILDemoDocument Stores
    29. 29. Graph DatabasesGreat for social network applications and others where relationships areimportantNodes and edges• Edge like a join• Nodes like rows in a tableNodes can also have properties and valuesNeo4j is a popular graph db
    30. 30. DatabaseSent invitationtoCommented onphoto byFriendofAddressPlaced orderItem2Item1Joe Smith JaneDoeAndrew BrustStreet: 123 MainStreetCity: New YorkState: NYZip: 10014ID: 52134Type: DressColor: BlueID: 24457Type: ShirtColor: RedID: 252Total Price: 300USDGeorge WashingtonGraph Databases
    32. 32. NoSQL + BINoSQL databases are bad for ad hoc query and data warehousingBI applications involve models; models rely on schemaExtract, transform and load (ETL) may be your friendWide-column stores, however are good for “Big Data”• See next slideWide-column stores and column-oriented databases are similartechnologically
    33. 33. NoSQL + Big DataBig Data and NoSQL are interrelatedTypically, Wide-Column stores used in Big Data scenariosPrime example:• HBase and HadoopWhy?• Lack of indexing not a problem• Consistency not an issue• Fast reads very important• Distributed file systems important too• Commodity hardware and disk assumptions also important• Not Web scale but massive scale-out, so similar concerns
    34. 34. Going “NoSQL-Like” on the MS CloudAzure Table Storage (a key-value store)SQL Azure XML columns (supports variable schema, hierarchy)SQL Azure Federation (a sharding implementation)OData (HTTP/JSON data APIs)Running NoSQL database products using Azure VMs…34
    35. 35. NoSQL on Windows AzurePlatform as a Service• Cloudant:• MongoDB (via MongoLab):, DIY:• On an Azure Worker Role:• On a Windows VM:• On a Linux VM:
    36. 36. NoSQL on Windows AzureOthers, DIY (Linux VMs):• Couchbase:• CouchDB:• Riak:• Redis:• Cassandra:
    37. 37. And With MS On-Premise TechnologiesSQL Server 2008/2008R2/2012 “Beyond Relational” Features• Sparse columns (like Wide Column Stores)• Geospatial (geometry, geography data types)• FILESTREAM, FileTable (like Document Store attachments)• Full Text Search, Semantic Similarity Search• HierarchyID (can simulate Graph Database functionality)SQL Server Parallel Data Warehouse Edition (PDW)• Distributed architecture (like MapReduce/Hadoop)• PolyBase in PDW v2 (interfaces PDW and HDFS)37
    38. 38. TAKE-AWAYS
    39. 39. CompromisesEventual consistencyWrite bufferingOnly primary keys can be indexedQueries must be written as programsTooling• Productivity (= money)
    40. 40. Summing Up• Line of Business -> Relational• Large, public (consumer)-facing sites -> NoSQL• Complex data structures -> Relational• Big Data -> NoSQL• Transactional -> Relational• Content Management -> NoSQL• Enterprise->Relational• Consumer Web -> NoSQL
    41. 41. Thank you•• @andrewbrust on twitter• Want to get on Blue Badge Insights’ list?”Text “bluebadge” to 22828
    42. 42. Win a Microsoft Surface Pro!Complete an online SESSION EVALUATIONto be entered into the draw.Draw closes April 12, 11:59pm CTWinners will be announced on the PASS BAConference website and on Twitter.Go to or follow the QR code link displayed onsession signage throughout the conference venue.Your feedback is important and valuable. All feedback will be used to improveand select sessions for future events.
    43. 43. April 10-12, Chicago, ILThank you!Diamond Sponsor Platinum Sponsor