A Practical Look at the
        NOSQL and Big Data Hullabaloo
Andrew J. Brust                     Sam Bisbee
CEO and Founder           Senior Doing Stuff Person
Blue Badge Insights                        Cloudant
                                       (In Absentia)


                                 Level: Intermediate
Meet Andrew

 •   CEO and Founder, Blue Badge Insights
 •   Big Data blogger for ZDNet
 •   Microsoft Regional Director, MVP
 •   Co-chair VSLive! and 17 years as a speaker
 •   Founder, Microsoft BI User Group of NYC
     – http://www.msbinyc.com
 •   Co-moderator, NYC .NET Developers Group
     – http://www.nycdotnetdev.com
 •   “Redmond Review” columnist for
     Visual Studio Magazine and Redmond
     Developer News
 •   brustblog.com, Twitter: @andrewbrust
My New Blog (bit.ly/bigondata)
Read all about it!
Meet Sam
•   Wait…you can’t. He’s not here.
•   Sam Bisbee
        – Director of Technical Business Development,
          Cloudant
        – He prefers “Senior Doing Stuff Person”
           Which is ironic
•   I’ve preserved a few of his slides.
    •     Look for: From Sam in upper-right-hand corner
Agenda
•   Why NoSQL?
•   NoSQL Definition(s)
•   Concepts
•   NoSQL Categories
•   Provisioning, market, applicability
•   Take-aways
Why NoSQL?
NoSQL Data Fodder
•   Addresses
•   Preferences
•   Documents
•   Friends, Followers
•   Notes
“Web Scale”
•   This is the term used to
    justify NoSQL
•   Scenario: simple needs,
    “made up for in volume”
    – Millions of concurrent users
•   Think of sites like Amazon
    or Google
•   Think of non-transactional
    tasks like loading catalog
    data to display a product
    page, or loading environment
    preferences
NOSQL DEFINITION(S)
From Sam
What is NOSQL?
•   “Not Only SQL” - this is not a holy war

•   1870: Modern study of set theory begins

•   1970: Codd writes “A Relational Model of
    Data for Large Shared Data Banks”

•   1970 – 1980: Commercial implementations
    of Codd's theory are released
From Sam
What is NOSQL?
•   1970 - ~2000: the same sorts of databases
    were made (plus a few niche products)

•   The dot-com bubble brought the same data-tier
    problems at a new scale (Amazon),
    forcing innovation out of necessity

•   2000 – present: innovations are becoming
    open source and “mainstream” (Hadoop)
From Sam
So What is NOSQL Really?




New ways of looking at dynamic data storage
and querying for larger-scale systems.

(scale = concurrent users and data size)
NoSQL Common Traits

•   Non-relational
•   Non-schematized/schema-free
•   Open source
•   Distributed
•   Eventual consistency
•   “Web scale”
•   Developed at big Internet companies
CONCEPTS
Consistency
•   CAP Theorem
    – A distributed database can guarantee at most two of the
      following three properties at once: consistency,
      availability and partition tolerance
•   Most NoSQL databases do not offer full “ACID” guarantees
    – Atomicity, consistency, isolation and durability
•   Instead, many offer “eventual consistency”
    – Similar to DNS propagation
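The eventual-consistency idea above can be sketched as a toy model: two replicas that accept writes locally and deliver them to each other later, the way DNS changes propagate. Everything here is a hypothetical illustration, not any product's API; in particular, the last-writer-wins rule (higher version number wins) is one assumed conflict-resolution strategy, chosen so that concurrent writes still converge.

```python
import itertools
from collections import deque


class Replica:
    def __init__(self, name):
        self.name = name
        self.data = {}  # key -> (version, value)

    def apply(self, key, version, value):
        # Last-writer-wins: keep whichever write carries the higher version.
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]


class EventuallyConsistentPair:
    """Two replicas; writes succeed locally and are propagated later."""

    def __init__(self):
        self.seattle = Replica("seattle")
        self.new_york = Replica("new_york")
        self.pending = deque()               # writes not yet delivered to the peer
        self._versions = itertools.count(1)  # hypothetical global version source

    def write(self, replica, key, value):
        version = next(self._versions)
        replica.apply(key, version, value)   # no waiting on the other site
        peer = self.new_york if replica is self.seattle else self.seattle
        self.pending.append((peer, key, version, value))

    def propagate(self):
        # Deliver every queued write; afterwards both replicas agree.
        while self.pending:
            peer, key, version, value = self.pending.popleft()
            peer.apply(key, version, value)


pair = EventuallyConsistentPair()
pair.write(pair.seattle, "www.example.com", "10.0.0.1")
stale = pair.new_york.read("www.example.com")   # New York has not heard yet
pair.propagate()
fresh = pair.new_york.read("www.example.com")   # now visible everywhere

# Concurrent writes to the same key on both sides still converge.
pair.write(pair.seattle, "price", "300 USD")
pair.write(pair.new_york, "price", "310 USD")
pair.propagate()
```

The window between `write` and `propagate` is exactly where a reader can see stale data; real systems shrink that window but do not eliminate it.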
Consistency
•    Things like inventory, account balances should be
     consistent
     –   Imagine updating a server in Seattle to record that stock was depleted
     –   Imagine not yet updating the server in NY
     –   Customer in NY goes to order 50 pieces of the item
     –   Order processed even though no stock
•    Things like catalog information don’t have to be,
     at least not immediately
     – If a new item is entered into the catalog, it’s OK for some
       customers to see it even before the other customers’ server
       knows about it
•    But catalog info must come up quickly
     – Therefore don’t lock data in one location while waiting to
       update the other
•    Therefore, OK to sacrifice consistency for speed,
     in some cases
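The Seattle/New York inventory scenario can be reproduced in a few lines. This is a hypothetical sketch: the two dicts stand in for replicated servers, and `sync()` stands in for replication that arrives too late. The same staleness would be harmless for catalog data, but for stock counts it lets an impossible order through.

```python
# Hypothetical replicas of a stock table; not any real product's API.
seattle = {"item-52134": 0}     # Seattle has just recorded: stock depleted
new_york = {"item-52134": 50}   # New York has not received that update yet


def place_order(replica, item, qty):
    """Accept the order if the given replica *believes* there is stock."""
    if replica.get(item, 0) >= qty:
        replica[item] -= qty
        return "accepted"
    return "rejected"


def sync(source, target):
    """Simulated (late) replication: copy every key from source to target."""
    target.update(source)


# Eventually consistent read: the stale New York copy approves the order
# even though no stock actually exists.
stale_result = place_order(new_york, "item-52134", 50)

# Once replication catches up, the same order is rejected.
sync(seattle, new_york)
fresh_result = place_order(new_york, "item-52134", 50)
```

A consistent design would check (and lock) the authoritative count before accepting the order, which is the latency the catalog case is allowed to skip.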
CAP Theorem

Relational
                          Consistency


                                                       NoSQL




              Partition
                                        Availability
             Tolerance
Indexing
•   Most NoSQL databases are indexed by key
•   Some allow so-called “secondary” indexes
•   Often the primary key indexes are
    clustered
•   HBase uses HDFS (the Hadoop Distributed
    File System), which is append-only
    – Writes are logged
    – Logged writes are batched
    – File is re-created and sorted
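The log, batch, re-sort cycle described for HBase resembles a log-structured merge design. Below is a heavily simplified in-memory sketch of that idea; the class and method names are hypothetical, and it omits real HBase details such as regions, HFile format and the actual write-ahead log.

```python
import bisect


class LogStructuredStore:
    """Toy append-only store: log, buffer, flush sorted files, compact."""

    def __init__(self, flush_threshold=3):
        self.wal = []            # append-only log of every write (for recovery)
        self.memstore = {}       # buffered writes not yet flushed
        self.files = []          # immutable sorted "files": lists of (key, value)
        self.flush_threshold = flush_threshold

    def put(self, key, value):
        self.wal.append((key, value))   # 1. writes are logged
        self.memstore[key] = value      # 2. ...and buffered in memory
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # 3. The buffered batch is written out as a new sorted, immutable file.
        if self.memstore:
            self.files.append(sorted(self.memstore.items()))
            self.memstore = {}

    def get(self, key):
        if key in self.memstore:
            return self.memstore[key]
        # Newest file wins, so search from last to first; binary search
        # works because every file is sorted by key.
        for f in reversed(self.files):
            keys = [k for k, _ in f]
            i = bisect.bisect_left(keys, key)
            if i < len(keys) and keys[i] == key:
                return f[i][1]
        return None

    def compact(self):
        # 4. Rewrite all files as one sorted file, keeping the newest value per key.
        merged = {}
        for f in self.files:   # oldest to newest, so later values overwrite
            merged.update(f)
        self.files = [sorted(merged.items())] if merged else []


store = LogStructuredStore(flush_threshold=2)
store.put("row-202", "Jane")
store.put("row-101", "Andrew")   # second buffered write triggers a flush
store.put("row-101", "Andy")
store.put("row-303", "Joe")      # triggers another flush
store.compact()
```

Writes never modify a file in place, which is what makes an append-only file system like HDFS workable underneath.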
Queries
•   Typically no query language
•   Instead, create procedural program
•   Sometimes SQL is supported
•   Sometimes MapReduce code is used…
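Without a query language, even a simple question becomes a small program. A hypothetical sketch, using a plain dict as a stand-in for a key-value store and the sample customers from this deck, with the equivalent SQL shown in a comment for comparison:

```python
customers = {
    101: {"First_Name": "Andrew", "Last_Name": "Brust", "Address": "123 Main Street"},
    202: {"First_Name": "Jane", "Last_Name": "Doe", "Address": "321 Elm Street"},
}


# Equivalent SQL, for comparison:
#   SELECT Last_Name FROM Customers WHERE Address LIKE '%Main Street%'
def last_names_on_street(store, street):
    results = []
    for row_id in sorted(store):          # full scan: no secondary index on Address
        row = store[row_id]
        if street in row.get("Address", ""):
            results.append(row["Last_Name"])
    return results


main_street = last_names_on_street(customers, "Main Street")
```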
MapReduce
•   Map step: pre-processes data, emitting key-value pairs
•   Reduce step: summarizes/aggregates the values for each key
•   Most typical of Hadoop and used with
    Wide Column Stores, esp. HBase
•   Amazon Web Services’ Elastic MapReduce
    (EMR) can read/write DynamoDB, S3,
    Relational Database Service (RDS)
•   “Hive” offers a HiveQL (SQL-like)
    abstraction over MR
    – Use with Hive tables
    – Use with HBase
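The map and reduce steps can be illustrated with the classic word count. This is a single-process Python sketch of the phases only, not Hadoop's Java API; the `shuffle` function plays the role of the grouping that the framework performs between the two steps.

```python
from collections import defaultdict
from itertools import chain


def map_step(line):
    """Map: turn one input record into (key, value) pairs."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group every emitted value by key (Hadoop does this between the phases)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_step(key, values):
    """Reduce: aggregate all values that share a key."""
    return key, sum(values)


def map_reduce(records):
    mapped = chain.from_iterable(map_step(r) for r in records)
    return dict(reduce_step(k, vs) for k, vs in shuffle(mapped).items())


counts = map_reduce(["NoSQL and Big Data", "Big Data and Hadoop"])
```

In a real cluster each `map_step` call runs near the data split it reads, and each key's `reduce_step` can run on a different node.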
Sharding
•   A partitioning pattern where separate
    servers store partitions
•   Fan-out queries supported
•   Partitions may be duplicated, so
    replication also provided
    – Good for disaster recovery
•   Since “shards” can be geographically
    distributed, sharding can act like a CDN
•   Good for keeping data close to processing
    – Reduces network traffic when MapReduce splitting
      takes place
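A minimal sketch of hash-based sharding with a fan-out query, assuming dicts stand in for separate servers. Replication of shards is not modeled, and the hashing scheme is one common choice rather than what any particular product uses.

```python
import hashlib


class ShardedStore:
    """Hash-partitions keys across N 'servers' (dicts here)."""

    def __init__(self, num_shards):
        self.shards = [dict() for _ in range(num_shards)]

    def _shard_for(self, key):
        # A stable hash (unlike Python's salted hash()) so placement is repeatable.
        digest = hashlib.sha256(str(key).encode("utf-8")).hexdigest()
        return self.shards[int(digest, 16) % len(self.shards)]

    def put(self, key, value):
        self._shard_for(key)[key] = value

    def get(self, key):
        # Point lookup: only one shard is contacted.
        return self._shard_for(key).get(key)

    def fan_out(self, predicate):
        # Fan-out query: ask every shard, then merge the partial results.
        results = []
        for shard in self.shards:
            results.extend(v for v in shard.values() if predicate(v))
        return results


store = ShardedStore(num_shards=4)
for order_id, price in [(1501, "300 USD"), (1502, "2500 GBP")]:
    store.put(order_id, {"id": order_id, "price": price})

gbp_orders = store.fan_out(lambda order: order["price"].endswith("GBP"))
```

Point lookups stay cheap as shards are added, while fan-out queries touch every shard; that trade-off is why key design matters in sharded stores.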
NOSQL CATEGORIES
Key-Value Stores
•   The most common; not necessarily the most
    popular
•   Has rows, each with something like a big
    dictionary/associative array
    – Schema may differ from row to row
•   Common on cloud platforms
    – e.g. Amazon SimpleDB, Azure Table Storage
•   MemcacheDB, Voldemort, Couchbase
•   DynamoDB (AWS), Dynomite, Redis and Riak
Key-Value Stores
Database
  Table: Customers
    Row ID: 101
      First_Name: Andrew
      Last_Name: Brust
      Address: 123 Main Street
      Last_Order: 1501
    Row ID: 202
      First_Name: Jane
      Last_Name: Doe
      Address: 321 Elm Street
      Last_Order: 1502
  Table: Orders
    Row ID: 1501
      Price: 300 USD
      Item1: 52134
      Item2: 24457
    Row ID: 1502
      Price: 2500 GBP
      Item1: 98456
      Item2: 59428
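The key-value layout above maps naturally onto nested dictionaries. A hypothetical sketch (plain Python, not SimpleDB's or Azure Table Storage's API) showing per-row schema variation and how a “join” becomes a second lookup by key:

```python
# Each "table" maps a row key to an attribute dictionary.
database = {
    "Customers": {
        101: {"First_Name": "Andrew", "Last_Name": "Brust",
              "Address": "123 Main Street", "Last_Order": 1501},
        202: {"First_Name": "Jane", "Last_Name": "Doe",
              "Address": "321 Elm Street", "Last_Order": 1502},
    },
    "Orders": {
        1501: {"Price": "300 USD", "Item1": 52134, "Item2": 24457},
        1502: {"Price": "2500 GBP", "Item1": 98456, "Item2": 59428},
    },
}

# Schema-free: one row can gain an attribute the others lack.
database["Customers"][101]["Twitter"] = "@andrewbrust"


def last_order_for(db, customer_id):
    # No joins: following Last_Order into Orders is a second lookup by key.
    customer = db["Customers"][customer_id]
    return db["Orders"][customer["Last_Order"]]


jane_order = last_order_for(database, 202)
```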
Wide Column Stores
•   Has tables with declared column families
    – Each column family has “columns” which are KV pairs that
      can vary from row to row
•   These are the most foundational for large
    sites
    – BigTable (Google)
    – HBase (Originally part of Yahoo-dominated Hadoop project)
    – Cassandra (Facebook)
      – Calls column families “super columns” and tables “super
        column families”
•   They are the most “Big Data”-ready
    – Especially HBase + Hadoop
Wide Column Stores
Table: Customers
  Row ID: 101
    Super Column: Name
      Column: First_Name: Andrew
      Column: Last_Name: Brust
    Super Column: Address
      Column: Number: 123
      Column: Street: Main Street
    Super Column: Orders
      Column: Last_Order: 1501
  Row ID: 202
    Super Column: Name
      Column: First_Name: Jane
      Column: Last_Name: Doe
    Super Column: Address
      Column: Number: 321
      Column: Street: Elm Street
    Super Column: Orders
      Column: Last_Order: 1502
Table: Orders
  Row ID: 1501
    Super Column: Pricing
      Column: Price: 300 USD
    Super Column: Items
      Column: Item1: 52134
      Column: Item2: 24457
  Row ID: 1502
    Super Column: Pricing
      Column: Price: 2500 GBP
    Super Column: Items
      Column: Item1: 98456
      Column: Item2: 59428
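The wide-column layout above can be modeled as row key, then column family, then column, then value. This is a hypothetical Python sketch, not HBase's or Cassandra's API: column families must be declared up front, while the columns inside a family may differ from row to row.

```python
from collections import defaultdict

# Declared column families (the fixed part of the schema).
COLUMN_FAMILIES = {"Name", "Address", "Orders"}

# row key -> column family -> column -> value
customers = defaultdict(lambda: defaultdict(dict))


def put(table, row_key, family, column, value):
    if family not in COLUMN_FAMILIES:
        raise KeyError(f"undeclared column family: {family}")
    table[row_key][family][column] = value   # columns need no declaration


def read_family(table, row_key, family):
    # .get() avoids creating empty entries in the defaultdicts on a miss.
    return dict(table.get(row_key, {}).get(family, {}))


put(customers, 101, "Name", "First_Name", "Andrew")
put(customers, 101, "Name", "Last_Name", "Brust")
put(customers, 101, "Address", "Number", 123)
put(customers, 101, "Address", "Street", "Main Street")
put(customers, 101, "Orders", "Last_Order", 1501)
put(customers, 202, "Name", "First_Name", "Jane")   # sparse row: fewer columns
```

Grouping columns by family is what lets a real store keep each family's data together on disk and read one family without touching the others.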
Wide Column Stores
Document Stores
•   Have “databases,” which are akin to tables
•   Have “documents,” akin to rows
    – Documents are typically JSON objects
    – Each document has properties and values
    – Values can be scalars, arrays, links to documents in other databases
      or sub-documents (i.e. contained JSON objects - Allows for hierarchical
      storage)
    – Can have attachments as well
•   Old versions may be retained (CouchDB, for example,
    keeps prior revisions until compaction)
    – So Doc Stores work well for content management
•   Some view doc stores as specialized KV stores
•   Most popular with developers, startups, VCs
•   The biggies:
    – CouchDB
        – Derivatives
    – MongoDB
Document Store
Application Orientation
•   Documents can each be addressed by
    URIs
•   CouchDB supports full REST interface
•   Very geared towards JavaScript and JSON
    – Documents are JSON objects
    – CouchDB/MongoDB use JavaScript as native
      language
•   In CouchDB, “view functions” also have
    unique URIs, and “show” and “list” functions
    can return HTML
    – So you can build entire applications in the database
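Because each CouchDB document lives at its own URI, the REST interface amounts to HTTP verbs on those URIs (`PUT /{db}/{docid}` to store, `GET /{db}/{docid}` to read). This sketch only builds the requests with Python's standard library; it assumes a CouchDB server at the default `http://localhost:5984`, and the `customers` database and helper names are hypothetical.

```python
import json
import urllib.request

COUCH_URL = "http://localhost:5984"   # assumed local CouchDB; adjust as needed


def doc_url(db, doc_id):
    return f"{COUCH_URL}/{db}/{doc_id}"


def put_document(db, doc_id, doc):
    """Build a request that stores a JSON document at its own URI.

    Note: updating an existing document also requires its current "_rev".
    """
    body = json.dumps(doc).encode("utf-8")
    return urllib.request.Request(
        doc_url(db, doc_id),
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def get_document(db, doc_id):
    """Build a request that fetches the document back as JSON."""
    return urllib.request.Request(doc_url(db, doc_id), method="GET")


req = put_document("customers", "101", {"First_Name": "Andrew", "Last_Name": "Brust"})
# To actually send it against a running server:
#   with urllib.request.urlopen(req) as resp:
#       print(json.load(resp))
```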
Document Stores
Database: Customers
  Document ID: 101
    First_Name: Andrew
    Last_Name: Brust
    Address:
      Number: 123
      Street: Main Street
    Orders:
      Most_recent: 1501
  Document ID: 202
    First_Name: Jane
    Last_Name: Doe
    Address:
      Number: 321
      Street: Elm Street
    Orders:
      Most_recent: 1502
Database: Orders
  Document ID: 1501
    Price: 300 USD
    Item1: 52134
    Item2: 24457
  Document ID: 1502
    Price: 2500 GBP
    Item1: 98456
    Item2: 59428
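The customer document above is naturally a JSON object with sub-documents. A hypothetical sketch: the `_id` field name follows the convention CouchDB and MongoDB share, and the `get_path` helper imitates the idea of dotted paths into embedded documents (as in MongoDB's dot notation) without being either product's API.

```python
import json

customer_doc = {
    "_id": "101",
    "First_Name": "Andrew",
    "Last_Name": "Brust",
    "Address": {"Number": 123, "Street": "Main Street"},   # sub-document
    "Orders": {"Most_recent": 1501},
}


def get_path(doc, dotted_path):
    """Follow a dotted path such as 'Address.Street' into nested sub-documents."""
    value = doc
    for part in dotted_path.split("."):
        value = value[part]
    return value


# Round-trip through JSON text, as the document would be stored or sent.
serialized = json.dumps(customer_doc, sort_keys=True)
street = get_path(json.loads(serialized), "Address.Street")
```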
Document Stores
Graph Databases
•   Great for social network applications and
    others where relationships are important
•   Nodes and edges
    – Edge like a join
    – Nodes like rows in a table
•   Nodes can also have properties and
    values
•   Neo4j is a popular graph db
Graph Databases
Figure: example graph. Nodes: George Washington, Andrew Brust, Joe Smith, Jane Doe; an address node (Street: 123 Main Street, City: New York, State: NY, Zip: 10014); an order node (ID: 252, Total Price: 300 USD) with item nodes (ID: 52134, Type: Dress, Color: Blue) and (ID: 24457, Type: Shirt, Color: Red). Edge labels: Friend of, Address, Placed order, Item1, Item2, Commented on photo by, Sent invitation to.
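Part of the example graph (Andrew Brust's order and its items) can be sketched as an adjacency list with labeled edges. This is plain Python, not Neo4j's API or query language; the point is that following an edge does the work a join would do in a relational database.

```python
from collections import defaultdict

# Adjacency list: node -> list of (edge label, neighbor node).
edges = defaultdict(list)


def add_edge(src, label, dst):
    edges[src].append((label, dst))


# Node properties live on the nodes themselves.
nodes = {
    "Andrew Brust": {},
    "Order 252": {"Total Price": "300 USD"},
    "Item 52134": {"Type": "Dress", "Color": "Blue"},
    "Item 24457": {"Type": "Shirt", "Color": "Red"},
}

add_edge("Andrew Brust", "Placed order", "Order 252")
add_edge("Order 252", "Item1", "Item 52134")
add_edge("Order 252", "Item2", "Item 24457")


def item_colors_ordered_by(person):
    """Two hops: person -> orders -> items; each hop replaces a join."""
    results = []
    for label, order in edges[person]:
        if label != "Placed order":
            continue
        for _, item in edges[order]:
            results.append((item, nodes[item]["Color"]))
    return results


andrew_items = item_colors_ordered_by("Andrew Brust")
```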
PROVISIONING, MARKET, APPLICABILITY
NoSQL on Windows Azure
•   Platform as a Service
    – Cloudant: https://cloudant.com/azure/
    – MongoDB (via MongoLab):
      http://blog.mongolab.com/2012/10/azure/
•   MongoDB, DIY:
    – On an Azure Worker Role:
      http://www.mongodb.org/display/DOCS/MongoDB+on+Azure+Worker+Roles
    – On a Windows VM:
      http://www.mongodb.org/display/DOCS/MongoDB+on+Azure+VM+-+Windows+Installer
    – On a Linux VM:
      http://www.mongodb.org/display/DOCS/MongoDB+on+Azure+VM+-+Linux+Tutorial
      http://www.windowsazure.com/en-us/manage/linux/common-tasks/mongodb-on-a-linux-vm/
NoSQL on Windows Azure
•   Others, DIY (Linux VMs):
    – Couchbase: http://blog.couchbase.com/couchbase-server-new-windows-azure
    – CouchDB:
      http://ossonazure.interoperabilitybridges.com/articles/couchdb-installer-for-windows-azure
    – Riak: http://basho.com/blog/technical/2012/10/09/Riak-on-Microsoft-Azure/
    – Redis:
      http://blogs.msdn.com/b/tconte/archive/2012/06/08/running-redis-on-a-centos-linux-vm-in-windows-azure.aspx
    – Cassandra:
      http://www.windowsazure.com/en-us/manage/linux/other-resources/how-to-run-cassandra-with-linux/
From Sam
The High-Level Shake Out
•   Hadoop will continue to crush data
    warehousing


•   MongoDB will be the top MySQL / on-prem
    alternative

•   Cloudant will be the top as-a-Service /
    Cloud database

•   Basho [Riak] is pivoting toward cloud
    object store
NoSQL + BI
•   NoSQL databases are bad for ad hoc
    query and data warehousing
•   BI applications involve models; models
    rely on schema
•   Extract, transform and load (ETL) may be
    your friend
•   Wide-column stores, however, are good for
    “Big Data”
    – See next slide
•   Wide-column stores and column-oriented
    databases are similar technologically
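Since BI models rely on a schema, a typical ETL step pins schema-free documents to fixed columns before loading. A hypothetical sketch using only the standard library: the column names are invented for illustration, missing fields become empty values, and “load” here just means producing CSV text that a relational or BI tool could bulk-import.

```python
import csv
import io

# Extract: documents as they might come out of a document store.
documents = [
    {"_id": "101", "First_Name": "Andrew", "Address": {"Street": "Main Street"},
     "Orders": {"Most_recent": 1501}},
    {"_id": "202", "First_Name": "Jane", "Address": {"Street": "Elm Street"},
     "Orders": {"Most_recent": 1502}},
]

# Hypothetical target schema for the BI model.
COLUMNS = ["customer_id", "first_name", "street", "last_order"]


def to_row(doc):
    """Transform: flatten nested sub-documents into one fixed-shape row."""
    return {
        "customer_id": doc.get("_id"),
        "first_name": doc.get("First_Name"),
        "street": doc.get("Address", {}).get("Street"),
        "last_order": doc.get("Orders", {}).get("Most_recent"),
    }


def extract_transform_load(docs):
    """Load: write rows as CSV text suitable for bulk import."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=COLUMNS)
    writer.writeheader()
    for doc in docs:
        writer.writerow(to_row(doc))
    return out.getvalue()


csv_text = extract_transform_load(documents)
```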
NoSQL + Big Data
•   Big Data and NoSQL are interrelated
•   Typically, Wide-Column stores used in Big
    Data scenarios
•   Prime example:
    – HBase and Hadoop
•   Why?
    – Lack of indexing not a problem
    – Consistency not an issue
    – Fast reads very important
    – Distributed file systems important too
    – Commodity hardware and disk assumptions also
      important
    – Not Web scale but massive scale-out, so similar
      concerns
TAKE-AWAYS
Compromises
•   Eventual consistency
•   Write buffering
•   Often only primary keys can be indexed
•   Queries must be written as programs
•   Tooling
    – Productivity (= money)
Summing Up
•   Line of Business -> Relational
•   Large, public (consumer)-facing sites ->
    NoSQL

•   Complex data structures -> Relational
•   Big Data -> NoSQL

•   Transactional -> Relational
•   Content Management -> NoSQL

•   Enterprise -> Relational
•   Consumer Web -> NoSQL
Thank you



•   andrew.brust@bluebadgeinsights.com
•   @andrewbrust on twitter
•   Want to get on Blue Badge Insights’ list?
    Text “bluebadge” to 22828

IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 

NoSQL and The Big Data Hullabaloo

  • 1. A Practical Look at the NOSQL and Big Data Hullabaloo Andrew J. Brust Sam Bisbee CEO and Founder Senior Doing Stuff Person Blue Badge Insights Cloudant (In Absentia) Level: Intermediate
  • 2. Meet Andrew • CEO and Founder, Blue Badge Insights • Big Data blogger for ZDNet • Microsoft Regional Director, MVP • Co-chair VSLive! and 17 years as a speaker • Founder, Microsoft BI User Group of NYC – http://www.msbinyc.com • Co-moderator, NYC .NET Developers Group – http://www.nycdotnetdev.com • “Redmond Review” columnist for Visual Studio Magazine and Redmond Developer News • brustblog.com, Twitter: @andrewbrust
  • 3. My New Blog (bit.ly/bigondata)
  • 5. Meet Sam • Wait…you can’t. He’s not here. • Sam Bisbee – Director of Technical Business Development, Cloudant – He prefers “Senior Doing Stuff Person” Which is ironic • I’ve preserved a few of his slides. • Look for: From Sam in upper-right-hand corner
  • 6. Agenda • Why NoSQL? • NoSQL Definition(s) • Concepts • NoSQL Categories • Provisioning, market, applicability • Take-aways
  • 8. NoSQL Data Fodder: Addresses, Preferences, Documents, Friends/Followers, Notes
  • 9. “Web Scale” • This is the term used to justify NoSQL • Scenario is simple needs but “made up for in volume” – Millions of concurrent users • Think of sites like Amazon or Google • Think of non-transactional tasks like loading catalog data to display a product page, or environment preferences
  • 11. From Sam What is NOSQL? • “Not Only SQL” - this is not a holy war • 1870: Modern study of set theory begins • 1970: Codd writes “A Relational Model of Data for Large Shared Data Banks” • 1970 – 1980: Commercial implementations of Codd's theory are released
  • 12. From Sam What is NOSQL? • 1970 - ~2000: the same sorts of databases were made (plus a few niche products) • Dot-Com Bubble forced the same data tier problems but at a new scale (Amazon), forcing innovation out of necessity • 2000 – present: innovations are becoming open source and “mainstream” (Hadoop)
  • 13. From Sam So What is NOSQL Really? New ways of looking at dynamic data storage and querying for larger scale systems. (scale = concurrent users and data size)
  • 14. NoSQL Common Traits • Non-relational • Non-schematized/schema-free • Open source • Distributed • Eventual consistency • “Web scale” • Developed at big Internet companies
  • 16. Consistency • CAP Theorem – A distributed database can guarantee at most two of the following three attributes: consistency, availability and partition tolerance • NoSQL databases generally do not offer full “ACID” guarantees – Atomicity, consistency, isolation and durability • Instead, they offer “eventual consistency” – Similar to DNS propagation
  • 17. Consistency • Things like inventory and account balances should be consistent – Imagine a server in Seattle is updated to show that stock was depleted – Imagine the server in NY is not updated – Customer in NY goes to order 50 pieces of the item – Order is processed even though there is no stock • Things like catalog information don’t have to be, at least not immediately – If a new item is entered into the catalog, it’s OK for some customers to see it even before the other customers’ server knows about it • But catalog info must come up quickly – Therefore don’t lock data in one location while waiting to update the other • Therefore, it’s OK to sacrifice consistency for speed, in some cases
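The Seattle/NY scenario above can be simulated in a few lines. This is a minimal sketch, assuming two in-memory replicas and a queue of pending updates standing in for asynchronous replication; `Replica`, `write` and `propagate` are hypothetical names, not any product's API.

```python
from dataclasses import dataclass, field

@dataclass
class Replica:
    """One hypothetical data-center copy of an inventory table."""
    name: str
    stock: dict = field(default_factory=dict)
    pending: list = field(default_factory=list)  # updates not yet applied here

def write(local, remotes, item, qty):
    # Apply the write locally right away and queue it for the other replicas,
    # instead of locking them while waiting (the "don't lock" choice on the slide).
    local.stock[item] = qty
    for r in remotes:
        r.pending.append((item, qty))

def propagate(replica):
    # Later, asynchronously, the queued updates arrive and are applied.
    for item, qty in replica.pending:
        replica.stock[item] = qty
    replica.pending.clear()

seattle = Replica("Seattle", {"widget": 50})
new_york = Replica("New York", {"widget": 50})

write(seattle, [new_york], "widget", 0)  # stock depleted in Seattle
stale = new_york.stock["widget"]         # NY still sees 50, so an order could slip through
propagate(new_york)
converged = new_york.stock["widget"]     # after propagation both replicas agree
print(stale, converged)
```

The window between `write` and `propagate` is exactly where the inventory problem lives; for catalog data that window is usually acceptable.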
  • 18. CAP Theorem [diagram: Consistency, Availability, Partition Tolerance; labels “Relational” and “NoSQL”]
  • 19. Indexing • Most NoSQL databases are indexed by key • Some allow so-called “secondary” indexes • Often the primary key indexes are clustered • HBase uses HDFS (the Hadoop Distributed File System), which is append-only – Writes are logged – Logged writes are batched – File is re-created and sorted
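The append-only write path described for HBase (log the write, batch it, then re-create a sorted file) resembles a log-structured merge design. The sketch below is a simplified, in-memory illustration under that assumption; `AppendOnlyStore`, `flush_threshold` and `compact` are hypothetical names, not HBase's API.

```python
import bisect

class AppendOnlyStore:
    """Toy write path: log writes, buffer them, flush as sorted immutable files."""

    def __init__(self, flush_threshold=3):
        self.log = []      # append-only write log
        self.buffer = {}   # in-memory batch of recent writes
        self.files = []    # flushed "files": lists of (key, value) sorted by key
        self.flush_threshold = flush_threshold

    def put(self, key, value):
        self.log.append((key, value))  # 1. every write is logged first
        self.buffer[key] = value       # 2. then batched in memory
        if len(self.buffer) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # 3. write the batch out as a new file sorted by key; existing files are never edited
        if not self.buffer:
            return
        self.files.append(sorted(self.buffer.items()))
        self.buffer = {}

    def compact(self):
        # Re-create a single sorted file from all flushed files; newer values win.
        merged = {}
        for f in self.files:  # oldest to newest
            merged.update(f)
        self.files = [sorted(merged.items())]

    def get(self, key):
        if key in self.buffer:
            return self.buffer[key]
        # Search the newest file first, using binary search on its sorted keys.
        for f in reversed(self.files):
            keys = [k for k, _ in f]
            i = bisect.bisect_left(keys, key)
            if i < len(keys) and keys[i] == key:
                return f[i][1]
        return None

store = AppendOnlyStore()
for key, value in [("c", 3), ("a", 1), ("b", 2), ("a", 10)]:
    store.put(key, value)
store.flush()
store.compact()
print(store.get("a"), store.files)
```

Because files are sorted by key, primary-key lookups stay cheap, which is why key-based indexing is the natural fit for this kind of store.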
  • 20. Queries • Typically no query language • Instead, create procedural program • Sometimes SQL is supported • Sometimes MapReduce code is used…
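To illustrate "create procedural program" in place of a query language, here is a minimal sketch of what a SQL `WHERE` clause turns into when it must be written as code. The data mirrors the slides' customer examples; `find_by_last_name` is a hypothetical helper.

```python
# Hypothetical key-value data; in SQL this might be:
#   SELECT row_id FROM customers WHERE last_name = 'Doe'
customers = {
    101: {"First_Name": "Andrew", "Last_Name": "Brust"},
    202: {"First_Name": "Jane", "Last_Name": "Doe"},
}

def find_by_last_name(rows, last_name):
    """The 'WHERE clause' becomes a loop in application code.
    Without a secondary index on Last_Name, every row is scanned."""
    return [row_id for row_id, row in rows.items()
            if row.get("Last_Name") == last_name]

print(find_by_last_name(customers, "Doe"))
```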
  • 21. MapReduce • Map step: pre-processes data • Reduce step: summarizes/aggregates data • Most typical of Hadoop and used with Wide Column Stores, esp. HBase • Amazon Web Services’ Elastic MapReduce (EMR) can read/write DynamoDB, S3, Relational Database Service (RDS) • “Hive” offers a HiveQL (SQL-like) abstraction over MR – Use with Hive tables – Use with HBase
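The map and reduce steps can be imitated in a single process to show the data flow. This is a sketch, not Hadoop's API: the function names are hypothetical, the `shuffle` step stands in for the grouping a framework performs between the two phases, and order 1503 is invented for illustration.

```python
from collections import defaultdict

orders = [
    {"id": 1501, "Price": "300 USD"},
    {"id": 1502, "Price": "2500 GBP"},
    {"id": 1503, "Price": "120 USD"},  # hypothetical extra order
]

def map_step(order):
    # Pre-process: turn one raw record into (key, value) pairs.
    amount, currency = order["Price"].split()
    yield currency, int(amount)

def shuffle(pairs):
    # The framework groups all values by key between the two steps.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_step(key, values):
    # Summarize: aggregate all values for one key.
    return key, sum(values)

pairs = (pair for order in orders for pair in map_step(order))
totals = dict(reduce_step(k, v) for k, v in shuffle(pairs).items())
print(totals)  # {'USD': 420, 'GBP': 2500}
```

In a real cluster, many mappers and reducers run in parallel on different splits of the data, which is what makes the pattern scale.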
  • 22. Sharding • A partitioning pattern where separate servers store partitions • Fan-out queries supported • Partitions may be duplicated, so replication also provided – Good for disaster recovery • Since “shards” can be geographically distributed, sharding can act like a CDN • Good for keeping data close to processing – Reduces network traffic when MapReduce splitting takes place
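Sharding, replication and fan-out queries can be sketched with a stable hash that routes each key to a primary shard plus copies on neighboring shards. This assumes hash-based placement, which is only one of several partitioning schemes; `ShardedStore` and its parameters are hypothetical.

```python
import hashlib

class ShardedStore:
    """Toy hash-sharded store: each key lives on a primary shard plus replicas."""

    def __init__(self, num_shards=3, replicas=2):
        self.shards = [dict() for _ in range(num_shards)]
        self.replicas = min(replicas, num_shards)

    def _shard_ids(self, key):
        # A stable hash (Python's built-in hash() is salted per process) picks
        # the primary shard; copies go to the following shards round-robin.
        h = int(hashlib.md5(str(key).encode()).hexdigest(), 16)
        first = h % len(self.shards)
        return [(first + i) % len(self.shards) for i in range(self.replicas)]

    def put(self, key, value):
        for sid in self._shard_ids(key):
            self.shards[sid][key] = value

    def get(self, key):
        # Any replica can answer; try them in order (useful if one is down).
        for sid in self._shard_ids(key):
            if key in self.shards[sid]:
                return self.shards[sid][key]
        return None

    def fan_out(self, predicate):
        # Fan-out query: ask every shard, then merge and de-duplicate replica hits.
        hits = {}
        for shard in self.shards:
            for k, v in shard.items():
                if predicate(v):
                    hits[k] = v
        return hits

store = ShardedStore()
store.put(101, {"Last_Name": "Brust"})
store.put(202, {"Last_Name": "Doe"})
matches = store.fan_out(lambda v: v["Last_Name"].startswith("D"))
print(store.get(202), list(matches))
```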
  • 24. Key-Value Stores • The most common; not necessarily the most popular • Has rows, each with something like a big dictionary/associative array – Schema may differ from row to row • Common on cloud platforms – e.g. Amazon SimpleDB, Azure Table Storage • MemcacheDB, Voldemort, Couchbase • DynamoDB (AWS), Dynomite, Redis and Riak
  • 25. Key-Value Stores Database • Table: Customers – Row ID: 101: First_Name: Andrew, Last_Name: Brust, Address: 123 Main Street, Last_Order: 1501 – Row ID: 202: First_Name: Jane, Last_Name: Doe, Address: 321 Elm Street, Last_Order: 1502 • Table: Orders – Row ID: 1501: Price: 300 USD, Item1: 52134, Item2: 24457 – Row ID: 1502: Price: 2500 GBP, Item1: 98456, Item2: 59428
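The key-value layout above maps naturally onto nested dictionaries: each row ID points at its own attribute dictionary, so rows need not share a schema. A minimal sketch using the slide's data; the `Loyalty_Tier` attribute is hypothetical.

```python
# Each table maps a row ID to its own dictionary of attributes (slide data).
database = {
    "Customers": {
        101: {"First_Name": "Andrew", "Last_Name": "Brust",
              "Address": "123 Main Street", "Last_Order": 1501},
        202: {"First_Name": "Jane", "Last_Name": "Doe",
              "Address": "321 Elm Street", "Last_Order": 1502},
    },
    "Orders": {
        1501: {"Price": "300 USD", "Item1": 52134, "Item2": 24457},
        1502: {"Price": "2500 GBP", "Item1": 98456, "Item2": 59428},
    },
}

# Schema may differ from row to row: add an attribute to one row only (hypothetical field).
database["Customers"][202]["Loyalty_Tier"] = "Gold"

# Lookups are by key; following Last_Order is a second lookup done by the application,
# since there is no join.
customer = database["Customers"][101]
last_order = database["Orders"][customer["Last_Order"]]
print(last_order["Price"])  # 300 USD
```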
  • 26. Wide Column Stores • Has tables with declared column families – Each column family has “columns” which are KV pairs that can vary from row to row • These are the most foundational for large sites – BigTable (Google) – HBase (Originally part of Yahoo-dominated Hadoop project) – Cassandra (Facebook) Calls column families “super columns” and tables “super column families” • They are the most “Big Data”-ready – Especially HBase + Hadoop
  • 27. Wide Column Stores • Table: Customers – Row ID: 101: Super Column Name (First_Name: Andrew; Last_Name: Brust), Super Column Address (Number: 123; Street: Main Street), Super Column Orders (Last_Order: 1501) – Row ID: 202: Super Column Name (First_Name: Jane; Last_Name: Doe), Super Column Address (Number: 321; Street: Elm Street), Super Column Orders (Last_Order: 1502) • Table: Orders – Row ID: 1501: Super Column Pricing (Price: 300 USD), Super Column Items (Item1: 52134; Item2: 24457) – Row ID: 1502: Super Column Pricing (Price: 2500 GBP), Super Column Items (Item1: 98456; Item2: 59428)
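The distinguishing rule of wide column stores, declared column families whose inner columns vary per row, can be sketched as a small class. This is a toy model, not the HBase or Cassandra client API; `WideColumnTable` and the `Middle_Initial` column are hypothetical.

```python
class WideColumnTable:
    """Toy wide-column table: column families are declared up front,
    but the columns inside each family can differ from row to row."""

    def __init__(self, families):
        self.families = set(families)
        self.rows = {}  # row_id -> {family -> {column -> value}}

    def put(self, row_id, family, column, value):
        if family not in self.families:
            raise KeyError(f"undeclared column family: {family}")
        self.rows.setdefault(row_id, {}).setdefault(family, {})[column] = value

    def get_family(self, row_id, family):
        return self.rows.get(row_id, {}).get(family, {})

customers = WideColumnTable(["Name", "Address", "Orders"])
customers.put(101, "Name", "First_Name", "Andrew")
customers.put(101, "Name", "Last_Name", "Brust")
customers.put(101, "Address", "Number", 123)
customers.put(101, "Address", "Street", "Main Street")
customers.put(101, "Orders", "Last_Order", 1501)
# Row 202 carries a column no other row has (hypothetical).
customers.put(202, "Name", "First_Name", "Jane")
customers.put(202, "Name", "Middle_Initial", "Q")
print(customers.get_family(202, "Name"))
```

Storing each family's columns together is what lets these stores read one family for many rows without touching the rest.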
  • 29. Document Stores • Have “databases,” which are akin to tables • Have “documents,” akin to rows – Documents are typically JSON objects – Each document has properties and values – Values can be scalars, arrays, links to documents in other databases or sub-documents (i.e. contained JSON objects - Allows for hierarchical storage) – Can have attachments as well • Old versions are retained – So Doc Stores work well for content management • Some view doc stores as specialized KV stores • Most popular with developers, startups, VCs • The biggies: – CouchDB – Derivatives – MongoDB
  • 30. Document Store Application Orientation • Documents can each be addressed by URIs • CouchDB supports full REST interface • Very geared towards JavaScript and JSON – Documents are JSON objects – CouchDB/MongoDB use JavaScript as native language • In CouchDB, design-document functions also have unique URIs, and show/list functions can return HTML – So you can build entire applications in the database
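To make "documents addressed by URIs" concrete, this sketch builds (but does not send) HTTP requests following CouchDB's document URI pattern `/{db}/{doc_id}` and view pattern `/{db}/_design/{ddoc}/_view/{view}`. The host, database, design document and view names are assumptions for illustration.

```python
import json
import urllib.parse
import urllib.request

BASE = "http://localhost:5984"  # assumption: a local CouchDB on its default port

def doc_url(db, doc_id):
    # Every document has its own URI: /{db}/{doc_id}
    return f"{BASE}/{urllib.parse.quote(db)}/{urllib.parse.quote(str(doc_id), safe='')}"

def read_doc(db, doc_id):
    return urllib.request.Request(doc_url(db, doc_id), method="GET")

def save_doc(db, doc_id, doc):
    # Updating an existing document also requires its current "_rev" in the body.
    body = json.dumps(doc).encode("utf-8")
    return urllib.request.Request(doc_url(db, doc_id), data=body, method="PUT",
                                  headers={"Content-Type": "application/json"})

def delete_doc(db, doc_id, rev):
    # Deletion names the revision being deleted.
    url = f"{doc_url(db, doc_id)}?rev={urllib.parse.quote(rev)}"
    return urllib.request.Request(url, method="DELETE")

def view_url(db, design, view):
    # Views are addressable too, under the design document that defines them.
    return f"{BASE}/{db}/_design/{design}/_view/{view}"

req = save_doc("customers", "101", {"First_Name": "Andrew", "Last_Name": "Brust"})
print(req.get_method(), req.full_url)
# Sending it requires a running server: urllib.request.urlopen(req)
```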
  • 31. Document Stores • Database: Customers – Document ID: 101: First_Name: Andrew, Last_Name: Brust, Address: {Number: 123, Street: Main Street}, Orders: {Most_recent: 1501} – Document ID: 202: First_Name: Jane, Last_Name: Doe, Address: {Number: 321, Street: Elm Street}, Orders: {Most_recent: 1502} • Database: Orders – Document ID: 1501: Price: 300 USD, Item1: 52134, Item2: 24457 – Document ID: 1502: Price: 2500 GBP, Item1: 98456, Item2: 59428
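The document-store example differs from the key-value one mainly in nesting: Address and Orders are sub-documents, and documents move as JSON text. A minimal sketch with the slide's data, using plain dictionaries to stand in for the two databases.

```python
import json

# Documents from the slide; Address and Orders are nested sub-documents.
customers_db = {
    "101": {"First_Name": "Andrew", "Last_Name": "Brust",
            "Address": {"Number": 123, "Street": "Main Street"},
            "Orders": {"Most_recent": "1501"}},
    "202": {"First_Name": "Jane", "Last_Name": "Doe",
            "Address": {"Number": 321, "Street": "Elm Street"},
            "Orders": {"Most_recent": "1502"}},
}
orders_db = {
    "1501": {"Price": "300 USD", "Item1": 52134, "Item2": 24457},
    "1502": {"Price": "2500 GBP", "Item1": 98456, "Item2": 59428},
}

doc = json.loads(json.dumps(customers_db["202"]))  # documents travel as JSON text
street = doc["Address"]["Street"]                  # hierarchical access into a sub-document
linked = orders_db[doc["Orders"]["Most_recent"]]   # "link" to a document in another database
print(street, linked["Price"])  # Elm Street 2500 GBP
```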
  • 33. Graph Databases • Great for social network applications and others where relationships are important • Nodes and edges – Edge like a join – Nodes like rows in a table • Nodes can also have properties and values • Neo4j is a popular graph db
  • 34. Graph Databases [diagram: person nodes George Washington, Andrew Brust, Joe Smith and Jane Doe; an address node (Street: 123 Main Street, City: New York, State: NY, Zip: 10014); an order node (ID: 252, Total Price: 300 USD); item nodes (ID: 52134, Type: Dress, Color: Blue) and (ID: 24457, Type: Shirt, Color: Red); edges labeled Friend of, Address, Placed order, Item1, Item2, Commented on photo by, Sent invitation to]
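Nodes with properties and labeled edges can be sketched as a dictionary plus an edge list, with a breadth-first traversal standing in for a "friends of friends" query. The node names come from the diagram, but which node connects to which here is an illustrative assumption; this is not Neo4j's API.

```python
from collections import deque

# Nodes carry properties; edges carry a relationship label.
# The specific connections below are assumed for illustration.
nodes = {
    "andrew": {"Name": "Andrew Brust"},
    "george": {"Name": "George Washington"},
    "jane": {"Name": "Jane Doe"},
    "joe": {"Name": "Joe Smith"},
    "order252": {"ID": 252, "Total_Price": "300 USD"},
}
edges = [
    ("george", "FRIEND_OF", "andrew"),
    ("andrew", "FRIEND_OF", "jane"),
    ("jane", "FRIEND_OF", "joe"),
    ("andrew", "PLACED_ORDER", "order252"),
]

def neighbors(node, label):
    # Following an edge plays the role a join plays in SQL.
    # Edges are traversed in both directions here.
    for src, rel, dst in edges:
        if rel == label:
            if src == node:
                yield dst
            elif dst == node:
                yield src

def within_hops(start, label, max_hops):
    """Breadth-first traversal: every node reachable within max_hops edges."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue
        for nxt in neighbors(node, label):
            if nxt not in seen:
                seen[nxt] = seen[node] + 1
                queue.append(nxt)
    return {n for n, d in seen.items() if d > 0}

print(sorted(within_hops("george", "FRIEND_OF", 2)))  # ['andrew', 'jane']
```

In a relational schema the same two-hop question would need a self-join per hop; a graph store walks the edges directly.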
  • 36. NoSQL on Windows Azure • Platform as a Service – Cloudant: https://cloudant.com/azure/ – MongoDB (via MongoLab): http://blog.mongolab.com/2012/10/azure/ • MongoDB, DIY: – On an Azure Worker Role: http://www.mongodb.org/display/DOCS/MongoDB+on+Azure+Worker+Roles – On a Windows VM: http://www.mongodb.org/display/DOCS/MongoDB+on+Azure+VM+-+Windows+Installer – On a Linux VM: http://www.mongodb.org/display/DOCS/MongoDB+on+Azure+VM+-+Linux+Tutorial http://www.windowsazure.com/en-us/manage/linux/common-tasks/mongodb-on-a-linux-vm/
  • 37. NoSQL on Windows Azure • Others, DIY (Linux VMs): – Couchbase: http://blog.couchbase.com/couchbase-server-new-windows-azure – CouchDB: http://ossonazure.interoperabilitybridges.com/articles/couchdb-installer-for-windows-azure – Riak: http://basho.com/blog/technical/2012/10/09/Riak-on-Microsoft-Azure/ – Redis: http://blogs.msdn.com/b/tconte/archive/2012/06/08/running-redis-on-a-centos-linux-vm-in-windows-azure.aspx – Cassandra: http://www.windowsazure.com/en-us/manage/linux/other-resources/how-to-run-cassandra-with-linux/
  • 38. From Sam The High-Level Shake Out • Hadoop will continue to crush data warehousing • MongoDB will be the top MySQL / on-prem alternative • Cloudant will be the top as-a-Service / Cloud database • Basho [Riak] is pivoting toward cloud object store
  • 39. NoSQL + BI • NoSQL databases are bad for ad hoc query and data warehousing • BI applications involve models; models rely on schema • Extract, transform and load (ETL) may be your friend • Wide-column stores, however, are good for “Big Data” – See next slide • Wide-column stores and column-oriented databases are similar technologically
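Since BI models rely on schema, the ETL suggestion above amounts to giving schema-free documents a fixed set of columns. This sketch assumes one simple transform, flattening sub-documents into dotted column names and filling missing fields with blanks, then loads the result as CSV; `flatten` is a hypothetical helper, and real ETL tools offer far more.

```python
import csv
import io

def flatten(doc, prefix=""):
    """Turn nested sub-documents into flat column names (e.g. Address.Street)."""
    flat = {}
    for key, value in doc.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, name + "."))
        else:
            flat[name] = value
    return flat

# Extract: documents whose shapes differ (the second has no Address.Number).
docs = [
    {"_id": "101", "First_Name": "Andrew", "Address": {"Number": 123, "Street": "Main Street"}},
    {"_id": "202", "First_Name": "Jane", "Address": {"Street": "Elm Street"}},
]

rows = [flatten(d) for d in docs]               # transform
columns = sorted({c for r in rows for c in r})  # fixed schema = union of all fields
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=columns, restval="")  # missing fields become blanks
writer.writeheader()
writer.writerows(rows)                          # load (here, into CSV text)
print(out.getvalue())
```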
  • 40. NoSQL + Big Data • Big Data and NoSQL are interrelated • Typically, Wide-Column stores used in Big Data scenarios • Prime example: – HBase and Hadoop • Why? – Lack of indexing not a problem – Consistency not an issue – Fast reads very important – Distributed file systems important too – Commodity hardware and disk assumptions also important – Not Web scale but massive scale-out, so similar concerns
  • 42. Compromises • Eventual consistency • Write buffering • Only primary keys can be indexed • Queries must be written as programs • Tooling – Productivity (= money)
  • 43. Summing Up • Line of Business -> Relational • Large, public (consumer)-facing sites -> NoSQL • Complex data structures -> Relational • Big Data -> NoSQL • Transactional -> Relational • Content Management -> NoSQL • Enterprise->Relational • Consumer Web -> NoSQL
  • 44. Thank you • andrew.brust@bluebadgeinsights.com • @andrewbrust on twitter • Want to get on Blue Badge Insights’ list? Text “bluebadge” to 22828