NoSQL: An Analysis

NoSQL: An Analysis - PASS Business Analytics Conference 2013

Transcript

  • 1. NoSQL: An Analysis
      • April 10-12 | Chicago, IL
      • Andrew J. Brust, Founder and CEO, Blue Badge Insights
  • 2. Please silence cell phones
  • 3. Meet Andrew
      • CEO and Founder, Blue Badge Insights
      • Big Data blogger for ZDNet
      • Microsoft Regional Director, MVP
      • Co-chair VSLive! and 17 years as a speaker
      • Founder, Microsoft BI User Group of NYC
          • http://www.msbinyc.com
      • Co-moderator, NYC .NET Developers Group
          • http://www.nycdotnetdev.com
      • “Redmond Review” columnist for Visual Studio Magazine and Redmond Developer News
      • brustblog.com, Twitter: @andrewbrust
  • 4. Andrew’s New Blog (bit.ly/bigondata)
  • 5. Read all about it!
  • 6. Agenda
      • Why NoSQL?
      • Concepts
      • NoSQL Categories
      • Provisioning, market, applicability
      • Take-aways
  • 7. Why NoSQL?
  • 8. NoSQL Data Fodder
      • Addresses
      • Preferences
      • Notes
      • Friends, Followers
      • Documents
  • 9. “Web Scale”
      • This is the term used to justify NoSQL
      • The scenario: simple needs, but “made up for in volume”
          • Millions of concurrent users
      • Think of sites like Amazon or Google
      • Think of non-transactional tasks like loading catalog data to display a product page, or environment preferences
  • 10. NoSQL Common Traits
      • Non-relational
      • Non-schematized/schema-free
      • Open source
      • Distributed
      • Eventual consistency
      • “Web scale”
      • Developed at big Internet companies
  • 11. CONCEPTS
  • 12. Consistency
      • CAP Theorem
          • Databases may only excel at two of the following three attributes: consistency, availability and partition tolerance
      • NoSQL does not offer “ACID” guarantees
          • Atomicity, consistency, isolation and durability
      • Instead offers “eventual consistency”
      • Similar to DNS propagation
  • 13. Consistency
      • Things like inventory, account balances should be consistent
          • Imagine updating a server in Seattle that stock was depleted
          • Imagine not updating the server in NY
          • Customer in NY goes to order 50 pieces of the item
          • Order processed even though no stock
      • Things like catalog information don’t have to be, at least not immediately
          • If a new item is entered into the catalog, it’s OK for some customers to see it even before the other customers’ server knows about it
      • But catalog info must come up quickly
          • Therefore don’t lock data in one location while waiting to update the other
      • Therefore, OK to sacrifice consistency for speed, in some cases
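
The Seattle/NY inventory scenario can be sketched in a few lines of Python. This is a toy simulation, not any particular product's replication protocol: the `Replica` and `EventuallyConsistentStore` classes, the `propagate` method and the item name are all hypothetical names invented for illustration. A write is applied to one replica at once and only queued for the others, so a read from the other replica can be stale until propagation runs.

```python
class Replica:
    """One regional server holding its own copy of the stock counts."""

    def __init__(self, name):
        self.name = name
        self.stock = {}


class EventuallyConsistentStore:
    """Toy model: writes land on one replica and reach the others later."""

    def __init__(self, replica_names):
        self.replicas = {name: Replica(name) for name in replica_names}
        self.pending = []  # (target replica, item, quantity) not yet applied

    def write(self, replica_name, item, quantity):
        # Apply locally right away; do not lock or wait for other replicas.
        self.replicas[replica_name].stock[item] = quantity
        for name in self.replicas:
            if name != replica_name:
                self.pending.append((name, item, quantity))

    def read(self, replica_name, item):
        return self.replicas[replica_name].stock.get(item, 0)

    def propagate(self):
        # Deliver queued updates in order; afterwards all replicas agree.
        for name, item, quantity in self.pending:
            self.replicas[name].stock[item] = quantity
        self.pending.clear()


store = EventuallyConsistentStore(["Seattle", "NY"])
store.write("Seattle", "item-52134", 50)
store.propagate()

store.write("Seattle", "item-52134", 0)   # stock depleted in Seattle
stale = store.read("NY", "item-52134")    # NY has not heard yet
store.propagate()
fresh = store.read("NY", "item-52134")    # NY has caught up
```

Between the second write and the second `propagate` call, an order for 50 pieces placed against the NY replica would be accepted even though no stock remains, which is the failure the slide describes.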
  • 14. CAP Theorem (diagram: Consistency, Availability and Partition Tolerance, with Relational and NoSQL databases placed among them)
  • 15. Indexing
      • Most NoSQL databases are indexed by key
      • Some allow so-called “secondary” indexes
      • Often the primary key indexes are clustered
      • HBase uses HDFS (the Hadoop Distributed File System), which is append-only
          • Writes are logged
          • Logged writes are batched
          • File is re-created and sorted
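
The log, batch and re-create cycle described for append-only storage can be sketched as follows. This is a simplified illustration of the idea only, not HBase's actual implementation: the `AppendOnlyTable` class, its `flush_threshold` parameter and the row keys are hypothetical. Writes are logged and buffered; once the batch is large enough, a new key-sorted "file" is built rather than editing the old one in place.

```python
import bisect


class AppendOnlyTable:
    """Toy model of logged, batched writes over a re-created sorted file."""

    def __init__(self, flush_threshold=3):
        self.write_log = []   # every write, in arrival order
        self.buffer = {}      # batched writes not yet in the sorted file
        self.keys = []        # sorted keys of the current "file"
        self.values = []      # values aligned with self.keys
        self.flush_threshold = flush_threshold

    def put(self, key, value):
        self.write_log.append((key, value))   # 1. the write is logged
        self.buffer[key] = value              # 2. logged writes are batched
        if len(self.buffer) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # 3. the file is re-created and sorted; the old one is never edited
        merged = dict(zip(self.keys, self.values))
        merged.update(self.buffer)
        self.keys = sorted(merged)
        self.values = [merged[k] for k in self.keys]
        self.buffer = {}

    def get(self, key):
        if key in self.buffer:                # newest data wins
            return self.buffer[key]
        i = bisect.bisect_left(self.keys, key)  # binary search by key
        if i < len(self.keys) and self.keys[i] == key:
            return self.values[i]
        return None


table = AppendOnlyTable(flush_threshold=3)
for key, value in [("row-202", "Jane"), ("row-101", "Andrew"), ("row-303", "Joe")]:
    table.put(key, value)
table.put("row-150", "George")   # stays in the buffer until the next flush
```

Because the rebuilt file is sorted by key, lookups by primary key can use a binary search, which is one way to read the slide's remark that primary key indexes are often clustered.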
  • 16. Queries
      • Typically no query language
      • Instead, create procedural program
      • Sometimes SQL is supported
      • Sometimes MapReduce code is used…
  • 17. MapReduce
      • This is not Hadoop’s MapReduce, but it’s conceptually related
      • Map step: pre-processes data
      • Reduce step: summarizes/aggregates data
      • Will show a MapReduce code sample for Mongo soon
      • Will demo map code on CouchDB
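
The map and reduce steps can be illustrated with a small, self-contained Python sketch. It is not the Mongo or CouchDB sample from the talk (those products take JavaScript functions); the `map_reduce` driver, the function names and order 1503 are assumptions made up for the example. The map step emits a key/value pair per document, and the reduce step aggregates all values that share a key.

```python
from collections import defaultdict

orders = [
    {"id": 1501, "price": 300, "currency": "USD"},
    {"id": 1502, "price": 2500, "currency": "GBP"},
    {"id": 1503, "price": 200, "currency": "USD"},  # hypothetical extra order
]


def map_order(order):
    """Map step: pre-process one document into (key, value) pairs."""
    yield order["currency"], order["price"]


def reduce_prices(key, values):
    """Reduce step: summarize all values emitted for one key."""
    return sum(values)


def map_reduce(documents, map_fn, reduce_fn):
    """Hypothetical single-process driver: group mapped pairs, then reduce."""
    groups = defaultdict(list)
    for doc in documents:
        for key, value in map_fn(doc):
            groups[key].append(value)
    return {key: reduce_fn(key, values) for key, values in groups.items()}


totals = map_reduce(orders, map_order, reduce_prices)
```

In a distributed store the map calls would run next to the data on each server and only the grouped pairs would cross the network; this sketch runs everything in one process to keep the two steps visible.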
  • 18. Sharding
      • A partitioning pattern where separate servers store partitions
      • Fan-out queries supported
      • Partitions may be duplicated, so replication also provided
          • Good for disaster recovery
      • Since “shards” can be geographically distributed, sharding can act like a CDN
      • Good for keeping data close to processing
          • Reduces network traffic when MapReduce splitting takes place
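
A minimal sketch of sharding with fan-out queries, in Python. The slide does not say how rows are assigned to partitions, so hashing the key is an assumption here, and `ShardedStore`, `fan_out` and the sample rows are hypothetical. Each shard is a plain dictionary standing in for a separate server.

```python
import hashlib


class ShardedStore:
    """Toy model: each dict in self.shards stands in for a separate server."""

    def __init__(self, shard_count):
        self.shards = [{} for _ in range(shard_count)]

    def _shard_for(self, key):
        # Assumed partition rule: a stable hash of the key picks the shard.
        digest = hashlib.sha256(str(key).encode("utf-8")).hexdigest()
        return self.shards[int(digest, 16) % len(self.shards)]

    def put(self, key, row):
        self._shard_for(key)[key] = row

    def get(self, key):
        # A key lookup touches exactly one shard.
        return self._shard_for(key).get(key)

    def fan_out(self, predicate):
        # A non-key query is sent to every shard and the results are merged.
        results = []
        for shard in self.shards:
            results.extend(row for row in shard.values() if predicate(row))
        return results


store = ShardedStore(shard_count=3)
store.put(101, {"First_Name": "Andrew", "Last_Name": "Brust"})
store.put(202, {"First_Name": "Jane", "Last_Name": "Doe"})

by_key = store.get(101)
does = store.fan_out(lambda row: row["Last_Name"] == "Doe")
```

Replication, which the slide also mentions, would amount to writing each row to more than one shard; it is left out to keep the routing logic short.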
  • 19. NOSQL CATEGORIES
  • 20. Key-Value Stores
      • The most common; not necessarily the most popular
      • Have rows, each with something like a big dictionary/associative array
          • Schema may differ from row to row
      • Common on cloud platforms
          • e.g. Amazon SimpleDB, Azure Table Storage
      • MemcacheDB, Voldemort, Couchbase, DynamoDB (AWS), Dynomite, Redis and Riak
  • 21. Key-Value Stores
      • Database
          • Table: Customers
              • Row ID: 101
                  • First_Name: Andrew; Last_Name: Brust; Address: 123 Main Street; Last_Order: 1501
              • Row ID: 202
                  • First_Name: Jane; Last_Name: Doe; Address: 321 Elm Street; Last_Order: 1502
          • Table: Orders
              • Row ID: 1501
                  • Price: 300 USD; Item1: 52134; Item2: 24457
              • Row ID: 1502
                  • Price: 2500 GBP; Item1: 98456; Item2: 59428
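
The Customers and Orders tables above map naturally onto nested dictionaries. The sketch below uses the slide's sample data; the `database` variable and the `last_order_for` helper are hypothetical names. It also shows what "create procedural program" means in practice: following `Last_Order` from a customer to an order is done in code, not with a join.

```python
database = {
    "Customers": {
        101: {
            "First_Name": "Andrew",
            "Last_Name": "Brust",
            "Address": "123 Main Street",
            "Last_Order": 1501,
        },
        202: {
            "First_Name": "Jane",
            "Last_Name": "Doe",
            "Address": "321 Elm Street",
            "Last_Order": 1502,
        },
    },
    "Orders": {
        1501: {"Price": "300 USD", "Item1": 52134, "Item2": 24457},
        1502: {"Price": "2500 GBP", "Item1": 98456, "Item2": 59428},
    },
}


def last_order_for(db, customer_id):
    """Two key lookups done by hand, standing in for a relational join."""
    customer = db["Customers"].get(customer_id)
    if customer is None:
        return None
    return db["Orders"].get(customer["Last_Order"])


jane_order = last_order_for(database, 202)
```

Each row is just a dictionary, so nothing stops one customer row from carrying a property that another lacks, which is the "schema may differ from row to row" point.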
  • 22. Wide Column Stores
      • Have tables with declared column families
          • Each column family has “columns,” which are KV pairs that can vary from row to row
      • These are the most foundational for large sites
          • BigTable (Google)
          • HBase (originally part of the Yahoo-dominated Hadoop project)
          • Cassandra (Facebook)
              • Calls column families “super columns” and tables “super column families”
      • They are the most “Big Data”-ready
          • Especially HBase + Hadoop
  • 23. Wide Column Stores
      • Table: Customers
          • Row ID: 101
              • Super Column: Name
                  • Column: First_Name: Andrew
                  • Column: Last_Name: Brust
              • Super Column: Address
                  • Column: Number: 123
                  • Column: Street: Main Street
              • Super Column: Orders
                  • Column: Last_Order: 1501
          • Row ID: 202
              • Super Column: Name
                  • Column: First_Name: Jane
                  • Column: Last_Name: Doe
              • Super Column: Address
                  • Column: Number: 321
                  • Column: Street: Elm Street
              • Super Column: Orders
                  • Column: Last_Order: 1502
      • Table: Orders
          • Row ID: 1501
              • Super Column: Pricing
                  • Column: Price: 300 USD
              • Super Column: Items
                  • Column: Item1: 52134
                  • Column: Item2: 24457
          • Row ID: 1502
              • Super Column: Pricing
                  • Column: Price: 2500 GBP
              • Super Column: Items
                  • Column: Item1: 98456
                  • Column: Item2: 59428
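
The row, super column and column layout above can be modeled as three levels of nested dictionaries. This Python sketch only mirrors the shape of the data; it is not the API of HBase or Cassandra, and `read_column` and `read_family` are hypothetical helpers.

```python
# table -> row ID -> super column (column family) -> column -> value
customers = {
    101: {
        "Name": {"First_Name": "Andrew", "Last_Name": "Brust"},
        "Address": {"Number": 123, "Street": "Main Street"},
        "Orders": {"Last_Order": 1501},
    },
    202: {
        "Name": {"First_Name": "Jane", "Last_Name": "Doe"},
        "Address": {"Number": 321, "Street": "Elm Street"},
        "Orders": {"Last_Order": 1502},
    },
}


def read_column(table, row_id, family, column, default=None):
    """Address one value by row key, column family and column name."""
    return table.get(row_id, {}).get(family, {}).get(column, default)


def read_family(table, family):
    """Read a single column family across all rows, skipping the others."""
    return {row_id: row.get(family, {}) for row_id, row in table.items()}


street = read_column(customers, 101, "Address", "Street")
last_orders = read_family(customers, "Orders")
```

The families are declared per table, but the columns inside a family are ordinary key/value pairs, so a row could add a column to "Address" without affecting any other row.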
  • 24. Demo: Wide Column Stores
  • 25. Document Stores
      • Have “databases,” which are akin to tables
      • Have “documents,” akin to rows
          • Documents are typically JSON objects
          • Each document has properties and values
          • Values can be scalars, arrays, links to documents in other databases or sub-documents (i.e. contained JSON objects, which allows for hierarchical storage)
          • Can have attachments as well
      • Old versions are retained
          • So doc stores work well for content management
      • Some view doc stores as specialized KV stores
      • Most popular with developers, startups, VCs
      • The biggies:
          • CouchDB
              • Derivatives
          • MongoDB
  • 26. Document Store Application Orientation
      • Documents can each be addressed by URIs
      • CouchDB supports full REST interface
      • Very geared towards JavaScript and JSON
          • Documents are JSON objects
          • CouchDB/MongoDB use JavaScript as native language
      • In CouchDB, “view functions” also have unique URIs and they return HTML
          • So you can build entire applications in the database
  • 27. Document Stores
      • Database: Customers
          • Document ID: 101
              • First_Name: Andrew
              • Last_Name: Brust
              • Address:
                  • Number: 123
                  • Street: Main Street
              • Orders:
                  • Most_recent: 1501
          • Document ID: 202
              • First_Name: Jane
              • Last_Name: Doe
              • Address:
                  • Number: 321
                  • Street: Elm Street
              • Orders:
                  • Most_recent: 1502
      • Database: Orders
          • Document ID: 1501
              • Price: 300 USD
              • Item1: 52134
              • Item2: 24457
          • Document ID: 1502
              • Price: 2500 GBP
              • Item1: 98456
              • Item2: 59428
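
Customer document 101 can be written out as a JSON object and read back with Python's standard `json` module. This is a sketch of the data shape only, with no database involved: the `_id` field name, the use of string IDs and the `orders_db` dictionary are illustrative assumptions. `Address` is a sub-document, while `Orders.Most_recent` acts as a link into the Orders database.

```python
import json

customer_json = """
{
  "_id": "101",
  "First_Name": "Andrew",
  "Last_Name": "Brust",
  "Address": {"Number": 123, "Street": "Main Street"},
  "Orders": {"Most_recent": "1501"}
}
"""

# Stand-in for the separate Orders database, keyed by document ID.
orders_db = {
    "1501": {"Price": "300 USD", "Item1": 52134, "Item2": 24457},
    "1502": {"Price": "2500 GBP", "Item1": 98456, "Item2": 59428},
}

customer = json.loads(customer_json)

# Hierarchical storage: the address travels inside the customer document.
street = customer["Address"]["Street"]

# A link to another database is just an ID that the program resolves itself.
most_recent_order = orders_db[customer["Orders"]["Most_recent"]]
```

Compared with the key-value layout, the address no longer has to be flattened into one string; the nesting is part of the stored value.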
  • 28. Demo: Document Stores
  • 29. Graph Databases
      • Great for social network applications and others where relationships are important
      • Nodes and edges
          • Edge like a join
          • Nodes like rows in a table
      • Nodes can also have properties and values
      • Neo4j is a popular graph db
  • 30. Graph Databases (diagram)
      • Person nodes: Joe Smith, Jane Doe, Andrew Brust, George Washington
      • Edge labels: Sent invitation to, Commented on photo by, Friend of, Address, Placed order, Item1, Item2
      • Address node: Street: 123 Main Street; City: New York; State: NY; Zip: 10014
      • Item nodes: ID: 52134, Type: Dress, Color: Blue; ID: 24457, Type: Shirt, Color: Red
      • Order node: ID: 252, Total Price: 300 USD
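
Nodes, edges and properties can be sketched with a dictionary and a list of labeled triples. The node names, properties and edge labels below come from the diagram, but which node each edge connects is an assumption made for the example, since the transcript does not preserve the diagram's layout. The `kind` property and both helper functions are hypothetical, and this is plain Python, not Neo4j's API.

```python
# Nodes are like rows in a table and carry their own properties.
nodes = {
    "Andrew Brust": {"kind": "person"},
    "Jane Doe": {"kind": "person"},
    "Joe Smith": {"kind": "person"},
    "Order 252": {"kind": "order", "Total Price": "300 USD"},
    "Item 52134": {"kind": "item", "Type": "Dress", "Color": "Blue"},
    "Item 24457": {"kind": "item", "Type": "Shirt", "Color": "Red"},
}

# Edges are (source, label, target) triples; each one is like a stored join.
# The endpoints chosen here are assumed, not taken from the slide.
edges = [
    ("Andrew Brust", "Friend of", "Jane Doe"),
    ("Joe Smith", "Sent invitation to", "Andrew Brust"),
    ("Andrew Brust", "Placed order", "Order 252"),
    ("Order 252", "Item1", "Item 52134"),
    ("Order 252", "Item2", "Item 24457"),
]


def neighbors(source, label):
    """Follow every edge with the given label out of one node."""
    return [t for s, l, t in edges if s == source and l == label]


def items_ordered_by(person):
    """Two-hop traversal: person -> orders -> items."""
    items = []
    for order in neighbors(person, "Placed order"):
        for s, label, target in edges:
            if s == order and label.startswith("Item"):
                items.append(target)
    return items


friends = neighbors("Andrew Brust", "Friend of")
ordered = items_ordered_by("Andrew Brust")
colors = [nodes[item]["Color"] for item in ordered]
```

The traversal in `items_ordered_by` is the kind of relationship-heavy question that would need two joins in a relational schema.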
  • 31. PROVISIONING, MARKET, APPLICABILITY
  • 32. NoSQL + BI
      • NoSQL databases are bad for ad hoc query and data warehousing
      • BI applications involve models; models rely on schema
      • Extract, transform and load (ETL) may be your friend
      • Wide-column stores, however, are good for “Big Data”
          • See next slide
      • Wide-column stores and column-oriented databases are similar technologically
  • 33. NoSQL + Big Data
      • Big Data and NoSQL are interrelated
      • Typically, wide-column stores are used in Big Data scenarios
      • Prime example:
          • HBase and Hadoop
      • Why?
          • Lack of indexing not a problem
          • Consistency not an issue
          • Fast reads very important
          • Distributed file systems important too
          • Commodity hardware and disk assumptions also important
          • Not Web scale but massive scale-out, so similar concerns
  • 34. Going “NoSQL-Like” on the MS Cloud
      • Azure Table Storage (a key-value store)
      • SQL Azure XML columns (supports variable schema, hierarchy)
      • SQL Azure Federation (a sharding implementation)
      • OData (HTTP/JSON data APIs)
      • Running NoSQL database products using Azure VMs…
  • 35. NoSQL on Windows Azure
      • Platform as a Service
          • Cloudant: https://cloudant.com/azure/
          • MongoDB (via MongoLab): http://blog.mongolab.com/2012/10/azure/
      • MongoDB, DIY:
          • On an Azure Worker Role: http://www.mongodb.org/display/DOCS/MongoDB+on+Azure+Worker+Roles
          • On a Windows VM: http://www.mongodb.org/display/DOCS/MongoDB+on+Azure+VM+-+Windows+Installer
          • On a Linux VM: http://www.mongodb.org/display/DOCS/MongoDB+on+Azure+VM+-+Linux+Tutorial and http://www.windowsazure.com/en-us/manage/linux/common-tasks/mongodb-on-a-linux-vm/
  • 36. NoSQL on Windows Azure
      • Others, DIY (Linux VMs):
          • Couchbase: http://blog.couchbase.com/couchbase-server-new-windows-azure
          • CouchDB: http://ossonazure.interoperabilitybridges.com/articles/couchdb-installer-for-windows-azure
          • Riak: http://basho.com/blog/technical/2012/10/09/Riak-on-Microsoft-Azure/
          • Redis: http://blogs.msdn.com/b/tconte/archive/2012/06/08/running-redis-on-a-centos-linux-vm-in-windows-azure.aspx
          • Cassandra: http://www.windowsazure.com/en-us/manage/linux/other-resources/how-to-run-cassandra-with-linux/
  • 37. And With MS On-Premise Technologies
      • SQL Server 2008/2008R2/2012 “Beyond Relational” Features
          • Sparse columns (like Wide Column Stores)
          • Geospatial (geometry, geography data types)
          • FILESTREAM, FileTable (like Document Store attachments)
          • Full Text Search, Semantic Similarity Search
          • HierarchyID (can simulate Graph Database functionality)
      • SQL Server Parallel Data Warehouse Edition (PDW)
          • Distributed architecture (like MapReduce/Hadoop)
          • PolyBase in PDW v2 (interfaces PDW and HDFS)
  • 38. TAKE-AWAYS
  • 39. Compromises
      • Eventual consistency
      • Write buffering
      • Often, only primary keys can be indexed
      • Queries must often be written as programs
      • Tooling
          • Productivity (= money)
  • 40. Summing Up
      • Line of Business -> Relational
      • Large, public (consumer)-facing sites -> NoSQL
      • Complex data structures -> Relational
      • Big Data -> NoSQL
      • Transactional -> Relational
      • Content Management -> NoSQL
      • Enterprise -> Relational
      • Consumer Web -> NoSQL
  • 41. Thank you
      • andrew.brust@bluebadgeinsights.com
      • @andrewbrust on Twitter
      • Want to get on Blue Badge Insights’ list? Text “bluebadge” to 22828
  • 42. Win a Microsoft Surface Pro!
      • Complete an online SESSION EVALUATION to be entered into the draw.
      • Draw closes April 12, 11:59pm CT
      • Winners will be announced on the PASS BA Conference website and on Twitter.
      • Go to passbaconference.com/evals or follow the QR code link displayed on session signage throughout the conference venue.
      • Your feedback is important and valuable. All feedback will be used to improve and select sessions for future events.
  • 43. Thank you!