
A Practical Look at the NOSQL and Big Data Hullabaloo


  1. A Practical Look at the NOSQL and Big Data Hullabaloo
     • Andrew J. Brust, CEO and Founder, Blue Badge Insights
     • Sam Bisbee, Senior Doing Stuff Person, Cloudant (In Absentia)
     • Level: Intermediate
  2. Meet Andrew
     • CEO and Founder, Blue Badge Insights
     • Big Data blogger for ZDNet
     • Microsoft Regional Director, MVP
     • Co-chair of VSLive! and 17 years as a speaker
     • Founder, Microsoft BI User Group of NYC
       – http://www.msbinyc.com
     • Co-moderator, NYC .NET Developers Group
       – http://www.nycdotnetdev.com
     • “Redmond Review” columnist for Visual Studio Magazine and Redmond Developer News
     • brustblog.com, Twitter: @andrewbrust
  3. My New Blog (bit.ly/bigondata)
  4. Read all about it!
  5. Meet Sam
     • Wait…you can’t. He’s not here.
     • Sam Bisbee
       – Director of Technical Business Development, Cloudant
       – He prefers “Senior Doing Stuff Person”
       – Which is ironic
     • I’ve preserved a few of his slides.
     • Look for “From Sam” in the upper-right-hand corner
  6. Agenda
     • Why NoSQL?
     • NoSQL Definition(s)
     • Concepts
     • NoSQL Categories
     • Provisioning, market, applicability
     • Take-aways
  7. Why NoSQL?
  8. NoSQL Data Fodder
     • Addresses
     • Preferences
     • Documents
     • Friends, Followers
     • Notes
  9. “Web Scale”
     • This is the term used to justify NoSQL
     • The scenario is simple needs, but “made up for in volume”
       – Millions of concurrent users
     • Think of sites like Amazon or Google
     • Think of non-transactional tasks like loading catalog data to display a product page, or environment preferences
  10. NOSQL DEFINITION(S)
  11. What is NOSQL? (From Sam)
      • “Not Only SQL”: this is not a holy war
      • 1870: Modern study of set theory begins
      • 1970: Codd writes “A Relational Model of Data for Large Shared Data Banks”
      • 1970 – 1980: Commercial implementations of Codd’s theory are released
  12. What is NOSQL? (From Sam)
      • 1970 – ~2000: the same sorts of databases were made (plus a few niche products)
      • The Dot-Com Bubble forced the same data tier problems but at a new scale (Amazon), forcing innovation out of necessity
      • 2000 – present: innovations are becoming open source and mainstream (Hadoop)
  13. So What is NOSQL Really? (From Sam)
      • New ways of looking at dynamic data storage and querying for larger scale systems
      • (scale = concurrent users and data size)
  14. NoSQL Common Traits
      • Non-relational
      • Non-schematized/schema-free
      • Open source
      • Distributed
      • Eventual consistency
      • “Web scale”
      • Developed at big Internet companies
  15. CONCEPTS
  16. Consistency
      • CAP Theorem
        – Databases may only excel at two of the following three attributes: consistency, availability and partition tolerance
      • NoSQL does not offer “ACID” guarantees
        – Atomicity, consistency, isolation and durability
      • Instead offers “eventual consistency”
        – Similar to DNS propagation
  17. Consistency
      • Things like inventory and account balances should be consistent
        – Imagine updating a server in Seattle to record that stock was depleted
        – Imagine not updating the server in NY
        – A customer in NY goes to order 50 pieces of the item
        – The order is processed even though there is no stock
      • Things like catalog information don’t have to be, at least not immediately
        – If a new item is entered into the catalog, it’s OK for some customers to see it even before the other customers’ servers know about it
      • But catalog info must come up quickly
        – Therefore don’t lock data in one location while waiting to update the other
      • Therefore, it is OK to sacrifice consistency for speed, in some cases
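The Seattle/NY scenario above can be sketched in a few lines of Python. This is only an illustration of eventual consistency, not how any particular NoSQL product replicates data; the class names, the `propagate` step and the `stock:widget` key are all hypothetical.

```python
class Replica:
    """One server holding its own copy of the data (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.data = {}


class EventuallyConsistentStore:
    """Writes land on one replica at once and reach the others later."""

    def __init__(self, replica_names):
        self.replicas = {name: Replica(name) for name in replica_names}
        self.pending = []  # queued (target replica, key, value) updates

    def write(self, replica_name, key, value):
        # Apply locally without locking or waiting for the other replicas.
        self.replicas[replica_name].data[key] = value
        for name in self.replicas:
            if name != replica_name:
                self.pending.append((name, key, value))

    def read(self, replica_name, key):
        return self.replicas[replica_name].data.get(key)

    def propagate(self):
        # Deliver queued updates in order; afterwards all replicas agree.
        for name, key, value in self.pending:
            self.replicas[name].data[key] = value
        self.pending.clear()


store = EventuallyConsistentStore(["seattle", "ny"])
store.write("seattle", "stock:widget", 50)
store.propagate()

store.write("seattle", "stock:widget", 0)   # stock depleted in Seattle
stale = store.read("ny", "stock:widget")    # NY has not heard yet
store.propagate()
fresh = store.read("ny", "stock:widget")    # NY has caught up
```

Between the second write and the second `propagate` call, a NY customer would still see 50 units. That window is acceptable for catalog data and a problem for inventory.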
  18. CAP Theorem (diagram: Consistency, Availability and Partition Tolerance, with “Relational” and “NoSQL” labels)
  19. Indexing
      • Most NoSQL databases are indexed by key
      • Some allow so-called “secondary” indexes
      • Often the primary key indexes are clustered
      • HBase uses Hadoop Distributed File System, which is append-only
        – Writes are logged
        – Logged writes are batched
        – File is re-created and sorted
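The three HBase steps above (log the write, batch logged writes, re-create a sorted file) can be illustrated with a toy in-memory model. This is a sketch of the general pattern only, not HBase's actual implementation; `AppendOnlyTable`, `flush_threshold` and `compact` are names made up for the example.

```python
class AppendOnlyTable:
    """Toy model of a key-indexed table on append-only storage."""

    def __init__(self, flush_threshold=3):
        self.write_log = []      # every write is appended here first
        self.buffer = {}         # batch of recent writes held in memory
        self.sorted_files = []   # immutable files, each sorted by key
        self.flush_threshold = flush_threshold

    def put(self, key, value):
        self.write_log.append((key, value))   # 1. writes are logged
        self.buffer[key] = value              # 2. logged writes are batched
        if len(self.buffer) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Write the batch out as a new file sorted by key.
        if self.buffer:
            self.sorted_files.append(sorted(self.buffer.items()))
            self.buffer = {}

    def compact(self):
        # 3. the file is re-created and sorted: merge all files, newest wins.
        merged = {}
        for sorted_file in self.sorted_files:
            merged.update(dict(sorted_file))
        self.sorted_files = [sorted(merged.items())]

    def get(self, key):
        if key in self.buffer:
            return self.buffer[key]
        for sorted_file in reversed(self.sorted_files):  # newest file first
            for file_key, value in sorted_file:
                if file_key == key:
                    return value
        return None


table = AppendOnlyTable(flush_threshold=3)
for key, value in [("c", 1), ("a", 2), ("b", 3)]:
    table.put(key, value)      # third put triggers a flush
table.put("a", 9)              # newer value for an existing key
table.flush()
table.compact()
```

Nothing is ever updated in place: an overwrite is just a later entry that wins when the files are merged.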
  20. Queries
      • Typically no query language
      • Instead, create a procedural program
      • Sometimes SQL is supported
      • Sometimes MapReduce code is used…
  21. MapReduce
      • Map step: pre-processes data
      • Reduce step: summarizes/aggregates data
      • Most typical of Hadoop and used with Wide Column Stores, esp. HBase
      • Amazon Web Services’ Elastic MapReduce (EMR) can read/write DynamoDB, S3, Relational Database Service (RDS)
      • “Hive” offers a HiveQL (SQL-like) abstraction over MR
        – Use with Hive tables
        – Use with HBase
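As a rough illustration of the map and reduce steps, here is a single-process Python sketch that totals order prices per currency. It shows the shape of the pattern only, not Hadoop's API; the function names are made up, and order 1503 is a hypothetical record added so that one key has more than one value.

```python
from collections import defaultdict


def map_step(record):
    # Pre-process one record into (key, value) pairs.
    yield record["currency"], record["price"]


def reduce_step(key, values):
    # Summarize all values that share a key.
    return key, sum(values)


def map_reduce(records):
    groups = defaultdict(list)
    for record in records:
        for key, value in map_step(record):
            groups[key].append(value)   # "shuffle": group values by key
    return dict(reduce_step(key, values) for key, values in groups.items())


orders = [
    {"id": 1501, "price": 300, "currency": "USD"},
    {"id": 1502, "price": 2500, "currency": "GBP"},
    {"id": 1503, "price": 200, "currency": "USD"},  # hypothetical extra order
]
totals = map_reduce(orders)
```

In a real cluster the map calls run in parallel on the servers that hold the data, and only the grouped intermediate pairs travel over the network.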
  22. Sharding
      • A partitioning pattern where separate servers store partitions
      • Fan-out queries supported
      • Partitions may be duplicated, so replication is also provided
        – Good for disaster recovery
      • Since “shards” can be geographically distributed, sharding can act like a CDN
      • Good for keeping data close to processing
        – Reduces network traffic when MapReduce splitting takes place
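A minimal sketch of key-based sharding with a fan-out query follows. The dictionaries stand in for separate servers, and hashing the key with MD5 is just one possible placement rule chosen for the example; real products use their own schemes. `ShardedStore` and its methods are hypothetical names.

```python
import hashlib


class ShardedStore:
    """Each dict in self.shards stands in for a separate server."""

    def __init__(self, shard_count):
        self.shards = [{} for _ in range(shard_count)]

    def _shard_for(self, key):
        # A stable hash of the key decides which shard owns the row.
        digest = hashlib.md5(str(key).encode("utf-8")).hexdigest()
        return self.shards[int(digest, 16) % len(self.shards)]

    def put(self, key, row):
        self._shard_for(key)[key] = row

    def get(self, key):
        # A lookup by key touches exactly one shard.
        return self._shard_for(key).get(key)

    def fan_out(self, predicate):
        # A query on anything but the key must ask every shard.
        results = []
        for shard in self.shards:
            results.extend(row for row in shard.values() if predicate(row))
        return results


customers = ShardedStore(shard_count=4)
customers.put(101, {"First_Name": "Andrew", "Last_Name": "Brust"})
customers.put(202, {"First_Name": "Jane", "Last_Name": "Doe"})
does = customers.fan_out(lambda row: row["Last_Name"] == "Doe")
```

Replication would add a second copy of each shard on another server. It is left out here to keep the sketch short.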
  23. NOSQL CATEGORIES
  24. Key-Value Stores
      • The most common; not necessarily the most popular
      • Have rows, each with something like a big dictionary/associative array
        – Schema may differ from row to row
      • Common on cloud platforms
        – e.g. Amazon SimpleDB, Azure Table Storage
      • MemcacheDB, Voldemort, Couchbase
      • DynamoDB (AWS), Dynomite, Redis and Riak
  25. Key-Value Stores (example database)
      • Table: Customers
        – Row ID: 101
            First_Name: Andrew
            Last_Name: Brust
            Address: 123 Main Street
            Last_Order: 1501
        – Row ID: 202
            First_Name: Jane
            Last_Name: Doe
            Address: 321 Elm Street
            Last_Order: 1502
      • Table: Orders
        – Row ID: 1501
            Price: 300 USD
            Item1: 52134
            Item2: 24457
        – Row ID: 1502
            Price: 2500 GBP
            Item1: 98456
            Item2: 59428
  26. Wide Column Stores
      • Have tables with declared column families
        – Each column family has “columns”, which are KV pairs that can vary from row to row
      • These are the most foundational for large sites
        – Big Table (Google)
        – HBase (originally part of the Yahoo-dominated Hadoop project)
        – Cassandra (Facebook): calls column families “super columns” and tables “super column families”
      • They are the most “Big Data”-ready
        – Especially HBase + Hadoop
  27. Wide Column Stores (example tables)
      • Table: Customers
        – Row ID: 101
            Super Column: Name
              Column: First_Name: Andrew
              Column: Last_Name: Brust
            Super Column: Address
              Column: Number: 123
              Column: Street: Main Street
            Super Column: Orders
              Column: Last_Order: 1501
        – Row ID: 202
            Super Column: Name
              Column: First_Name: Jane
              Column: Last_Name: Doe
            Super Column: Address
              Column: Number: 321
              Column: Street: Elm Street
            Super Column: Orders
              Column: Last_Order: 1502
      • Table: Orders
        – Row ID: 1501
            Super Column: Pricing
              Column: Price: 300 USD
            Super Column: Items
              Column: Item1: 52134
              Column: Item2: 24457
        – Row ID: 1502
            Super Column: Pricing
              Column: Price: 2500 GBP
            Super Column: Items
              Column: Item1: 98456
              Column: Item2: 59428
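The Customers table above maps naturally onto nested dictionaries: row ID, then column family (the slide's “super column”), then column. The sketch below is a toy model of that layout, assuming column families are declared up front while the columns inside them are free-form. `WideColumnTable` is a hypothetical class, and the `Apartment` column is made up to show a column that exists on one row only.

```python
class WideColumnTable:
    """Rows hold declared column families; columns inside are free-form."""

    def __init__(self, column_families):
        self.column_families = set(column_families)
        self.rows = {}  # row_id -> family -> column -> value

    def put(self, row_id, family, column, value):
        if family not in self.column_families:
            raise KeyError(f"undeclared column family: {family}")
        families = self.rows.setdefault(row_id, {})
        families.setdefault(family, {})[column] = value

    def get(self, row_id, family, column):
        return self.rows.get(row_id, {}).get(family, {}).get(column)


customers = WideColumnTable(["Name", "Address", "Orders"])
customers.put(101, "Name", "First_Name", "Andrew")
customers.put(101, "Name", "Last_Name", "Brust")
customers.put(101, "Address", "Street", "Main Street")
customers.put(101, "Orders", "Last_Order", 1501)

# Row 202 carries a column that row 101 does not have (hypothetical column).
customers.put(202, "Name", "First_Name", "Jane")
customers.put(202, "Address", "Apartment", "4B")
```

Only the family names are fixed per table. Each row is free to carry a different set of columns within a family.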
  28. Wide Column Stores
  29. Document Stores
      • Have “databases,” which are akin to tables
      • Have “documents,” akin to rows
        – Documents are typically JSON objects
        – Each document has properties and values
        – Values can be scalars, arrays, links to documents in other databases or sub-documents (i.e. contained JSON objects, which allows for hierarchical storage)
        – Can have attachments as well
      • Old versions are retained
        – So doc stores work well for content management
      • Some view doc stores as specialized KV stores
      • Most popular with developers, startups, VCs
      • The biggies:
        – CouchDB
        – Derivatives
        – MongoDB
  30. Document Store Application Orientation
      • Documents can each be addressed by URIs
      • CouchDB supports a full REST interface
      • Very geared towards JavaScript and JSON
        – Documents are JSON objects
        – CouchDB/MongoDB use JavaScript as native language
      • In CouchDB, “view functions” also have unique URIs and they return HTML
        – So you can build entire applications in the database
  31. Document Stores (example databases)
      • Database: Customers
        – Document ID: 101
            First_Name: Andrew
            Last_Name: Brust
            Address:
              Number: 123
              Street: Main Street
            Orders:
              Most_recent: 1501
        – Document ID: 202
            First_Name: Jane
            Last_Name: Doe
            Address:
              Number: 321
              Street: Elm Street
            Orders:
              Most_recent: 1502
      • Database: Orders
        – Document ID: 1501
            Price: 300 USD
            Item1: 52134
            Item2: 24457
        – Document ID: 1502
            Price: 2500 GBP
            Item1: 98456
            Item2: 59428
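Because documents are typically JSON objects, the example above can be written down almost verbatim. The sketch below uses plain Python dictionaries and the standard `json` module rather than any real document store's client library. The `most_recent_order` helper is hypothetical and shows how a value in one database can act as a link to a document in another.

```python
import json

# Each "database" is modeled as a dict of document ID -> document.
customers = {
    "101": {
        "First_Name": "Andrew",
        "Last_Name": "Brust",
        "Address": {"Number": 123, "Street": "Main Street"},  # sub-document
        "Orders": {"Most_recent": 1501},                      # link by ID
    },
}
orders = {
    "1501": {"Price": "300 USD", "Item1": 52134, "Item2": 24457},
}


def most_recent_order(customer_id):
    # Follow the link from a customer document into the Orders database.
    customer = customers[customer_id]
    order_id = str(customer["Orders"]["Most_recent"])
    return orders[order_id]


# A document survives a round trip through JSON text unchanged.
serialized = json.dumps(customers["101"])
restored = json.loads(serialized)
```

The nested `Address` object is stored inside the customer document itself, which is the hierarchical storage the slides describe. The order is reached by a second lookup, since nothing like a join is assumed here.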
  32. Document Stores
  33. Graph Databases
      • Great for social network applications and others where relationships are important
      • Nodes and edges
        – Edges are like joins
        – Nodes are like rows in a table
      • Nodes can also have properties and values
      • Neo4j is a popular graph db
  34. Graph Databases (example diagram)
      • Person nodes: George Washington, Andrew Brust, Joe Smith, Jane Doe
      • Address node: Street: 123 Main Street; City: New York; State: NY; Zip: 10014
      • Order node: ID: 252; Total Price: 300 USD
      • Item nodes: ID: 52134, Type: Dress, Color: Blue; ID: 24457, Type: Shirt, Color: Red
      • Edge labels: Friend of, Address, Placed order, Item1, Item2, Commented on photo by, Sent invitation to
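Nodes with properties and labeled edges can be modeled with a small Python class. This is a sketch rather than Neo4j's API. The node properties come from the slide, but which node each edge connects is an assumption made for the example, since the diagram's layout is not recoverable from the transcript. `Graph`, `neighbors` and `ordered_item_types` are hypothetical names.

```python
class Graph:
    """Nodes carry properties; edges are (source, label, target) triples."""

    def __init__(self):
        self.nodes = {}
        self.edges = []

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = properties

    def add_edge(self, source, label, target):
        self.edges.append((source, label, target))

    def neighbors(self, source, label):
        # Following an edge plays the role a join plays in SQL.
        return [t for s, l, t in self.edges if s == source and l == label]


def ordered_item_types(graph, person_id):
    # Traverse person -> order -> item and collect each item's type.
    types = []
    for order_id in graph.neighbors(person_id, "placed_order"):
        for label in ("item1", "item2"):
            for item_id in graph.neighbors(order_id, label):
                types.append(graph.nodes[item_id]["type"])
    return types


g = Graph()
g.add_node("andrew", name="Andrew Brust")
g.add_node("george", name="George Washington")
g.add_node("order252", total_price="300 USD")
g.add_node("item52134", type="Dress", color="Blue")
g.add_node("item24457", type="Shirt", color="Red")

# Edge endpoints below are assumed, not taken from the slide.
g.add_edge("andrew", "friend_of", "george")
g.add_edge("andrew", "placed_order", "order252")
g.add_edge("order252", "item1", "item52134")
g.add_edge("order252", "item2", "item24457")

andrew_items = ordered_item_types(g, "andrew")
```

The traversal in `ordered_item_types` walks two hops of relationships directly, which is the kind of query that would need two joins in a relational schema.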
  35. PROVISIONING, MARKET, APPLICABILITY
  36. NoSQL on Windows Azure
      • Platform as a Service
        – Cloudant: https://cloudant.com/azure/
        – MongoDB (via MongoLab): http://blog.mongolab.com/2012/10/azure/
      • MongoDB, DIY:
        – On an Azure Worker Role: http://www.mongodb.org/display/DOCS/MongoDB+on+Azure+Worker+Roles
        – On a Windows VM: http://www.mongodb.org/display/DOCS/MongoDB+on+Azure+VM+-+Windows+Installer
        – On a Linux VM: http://www.mongodb.org/display/DOCS/MongoDB+on+Azure+VM+-+Linux+Tutorial and http://www.windowsazure.com/en-us/manage/linux/common-tasks/mongodb-on-a-linux-vm/
  37. NoSQL on Windows Azure
      • Others, DIY (Linux VMs):
        – Couchbase: http://blog.couchbase.com/couchbase-server-new-windows-azure
        – CouchDB: http://ossonazure.interoperabilitybridges.com/articles/couchdb-installer-for-windows-azure
        – Riak: http://basho.com/blog/technical/2012/10/09/Riak-on-Microsoft-Azure/
        – Redis: http://blogs.msdn.com/b/tconte/archive/2012/06/08/running-redis-on-a-centos-linux-vm-in-windows-azure.aspx
        – Cassandra: http://www.windowsazure.com/en-us/manage/linux/other-resources/how-to-run-cassandra-with-linux/
  38. The High-Level Shake Out (From Sam)
      • Hadoop will continue to crush data warehousing
      • MongoDB will be the top MySQL / on-prem alternative
      • Cloudant will be the top as-a-Service / Cloud database
      • Basho is pivoting toward cloud object store
  39. NoSQL + BI
      • NoSQL databases are bad for ad hoc query and data warehousing
      • BI applications involve models; models rely on schema
      • Extract, transform and load (ETL) may be your friend
      • Wide-column stores, however, are good for “Big Data”
        – See next slide
      • Wide-column stores and column-oriented databases are similar technologically
  40. NoSQL + Big Data
      • Big Data and NoSQL are interrelated
      • Typically, wide-column stores are used in Big Data scenarios
      • Prime example:
        – HBase and Hadoop
      • Why?
        – Lack of indexing not a problem
        – Consistency not an issue
        – Fast reads very important
        – Distributed file systems important too
        – Commodity hardware and disk assumptions also important
        – Not Web scale but massive scale-out, so similar concerns
  41. TAKE-AWAYS
  42. Compromises
      • Eventual consistency
      • Write buffering
      • Often, only primary keys can be indexed
      • Queries must be written as programs
      • Tooling
        – Productivity (= money)
  43. Summing Up
      • Line of Business -> Relational
      • Large, public (consumer)-facing sites -> NoSQL
      • Complex data structures -> Relational
      • Big Data -> NoSQL
      • Transactional -> Relational
      • Content Management -> NoSQL
      • Enterprise -> Relational
      • Consumer Web -> NoSQL
  44. Thank you
      • andrew.brust@bluebadgeinsights.com
      • @andrewbrust on Twitter
      • Want to get the free “Redmond Roundup Plus?” Text “bluebadge” to 22828