Introduction to Big Data
and NoSQL
SQL Azure Saturday
April 21, 2012
                Don Demsak
                Advisory Solutions Architect
                EMC Consulting
                www.donxml.com




                                               1
Meet Don

• Advisory Solutions Architect
   – EMC Consulting
      • Application Architecture, Development & Design
• DonXml.com, Twitter: donxml
• Email – don@donxml.com
• SlideShare - http://www.slideshare.net/dondemsak




                                                         2
The era of Big Data


                      3
How did we get here?
• Expensive
   – Processors
   – Disk space
   – Memory
   – Operating Systems
   – Software
   – Programmers
• Monoculture
   – Limit CPU cycles
   – Limit disk space
   – Limit memory
   – Limited OS Development
   – Limited Software
   – Programmers
      • Mono-lingual
      • Mono-persistence




                                                       4
Typical RDBMS Implementations
• Fixed table schemas
• Small but frequent reads/writes
• Large batch transactions
• Focus on ACID
  –   Atomicity
  –   Consistency
  –   Isolation
  –   Durability




                                    5
How we scale RDBMS implementations




                     6
1st Step – Build a relational database




                  Database




                                         7
2nd Step – Table Partitioning

                  p1 p2 p3




                  Database




                                8
3rd Step – Database Partitioning

  Customer #1:  Browser -> Web Tier -> B/L Tier -> Database
  Customer #2:  Browser -> Web Tier -> B/L Tier -> Database
  Customer #3:  Browser -> Web Tier -> B/L Tier -> Database
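A minimal sketch of the per-customer partitioning in this step, assuming a hypothetical lookup table that maps each customer to its own database. The connection strings and the `databaseFor` helper are illustrative names, not taken from the deck.

```javascript
// Hypothetical map from customer to dedicated database (names are illustrative).
const customerDatabases = new Map([
  ["customer-1", "Server=db1;Database=Customer1"],
  ["customer-2", "Server=db2;Database=Customer2"],
  ["customer-3", "Server=db3;Database=Customer3"],
]);

// Resolve which database the business-logic tier should talk to.
function databaseFor(customerId) {
  const connection = customerDatabases.get(customerId);
  if (connection === undefined) {
    throw new Error("No database registered for " + customerId);
  }
  return connection;
}

console.log(databaseFor("customer-2"));
```

Each customer's traffic touches only one database, which is what lets the tiers scale out side by side.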




                                                 9
4th Step – Move to the cloud?

  Customer #1:  Browser -> Web Tier -> B/L Tier -> SQL Azure Federation
  Customer #2:  Browser -> Web Tier -> B/L Tier -> SQL Azure Federation
  Customer #3:  Browser -> Web Tier -> B/L Tier -> SQL Azure Federation




                                                   10
There have to be other ways


                             11
Polyglot Persistence


                       12
Polyglot Programmer


                      13
14
Where Did NoSQL Originate?
• 1998 - Carlo Strozzi
  – NoSQL project - lightweight open-source relational DB
    with no SQL interface
• 2009 - Johan Oskarsson of Last.fm wanted to organize an event to
  discuss open-source distributed databases; Eric Evans (then at
  Rackspace) suggested the name NoSQL




                                                            15
NoSQL (loose) Definition
• (often) Open source
• Non-relational
• Distributed
• (often) don't guarantee ACID




                                 16
Atlanta 2009
• No:sql(east) conference
   – select fun, profit from real_world where relational=false
• Billed as “conference of no-rel datastores”




                                                                 17
Types Of NoSQL Data Stores




                             18
5 Groups of Data Models
  Relational


  Document


  Key Value


  Graph


  Column Family



                          19
Document Store
• Apache Jackrabbit
• CouchDB
• MongoDB
• SimpleDB
• XML Databases
  – MarkLogic Server
  – eXist.




                       20
Document?
• Okay, think of a web page...
  – Relational model requires a column per tag
  – Lots of empty columns
  – Wasted space
• Document model just stores the pages as is
  – Saves on space
  – Very flexible.
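A minimal sketch of the contrast above, assuming a made-up web page. The column names and the document layout are illustrative, not a real schema.

```javascript
// Relational style: one fixed column per possible tag, most of them empty.
const pageRow = {
  id: 1, title: "Home", h1: "Welcome", h2: null, table: null,
  form: null, video: null, footer: null,
};

// Document style: store the page as is, with only the parts it actually has.
const pageDocument = {
  _id: 1,
  title: "Home",
  body: [{ tag: "h1", text: "Welcome" }],
};

// Count the columns that hold no data in the relational row.
const emptyColumns = Object.values(pageRow).filter((v) => v === null).length;
console.log(emptyColumns + " empty columns in the row");
console.log(JSON.stringify(pageDocument));
```

A page with different tags simply becomes a differently shaped document; no schema change is needed.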




                                               21
Graph Storage
• AllegroGraph
• Core Data
• Neo4j
• DEX
• FlockDB
• Microsoft Trinity (research project)
   – http://research.microsoft.com/en-us/projects/trinity/




                                                             22
What's a graph?
• Graph consists of
  – Nodes ('stations' of the graph)
  – Edges (lines between them)
• FlockDB
  – Created by the Twitter folks
  – Nodes = Users
  – Edges = Nature of relationship between nodes.
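A minimal sketch of the node and edge model described above, using an in-memory array rather than FlockDB itself. The user names, relationship types, and the `sourcesOf` helper are made up for illustration.

```javascript
// Nodes are users; each edge records the nature of a relationship between two nodes.
const nodes = new Set(["ann", "bob", "cat"]);
const edges = [
  { from: "ann", to: "bob", type: "follows" },
  { from: "cat", to: "bob", type: "follows" },
  { from: "bob", to: "ann", type: "blocks" },
];

// Sanity check: every edge must connect two known nodes.
const wellFormed = edges.every((e) => nodes.has(e.from) && nodes.has(e.to));

// All users with an edge of the given type pointing at `user`.
function sourcesOf(user, type) {
  return edges
    .filter((e) => e.to === user && e.type === type)
    .map((e) => e.from);
}

console.log(wellFormed, sourcesOf("bob", "follows"));
```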




                                                    23
Key/Value Stores
• On disk
• Cache in RAM
• Eventually Consistent
  – Weak Definition
     • “If no updates occur for a period, eventually all updates will
       propagate through the system and all replicas will be consistent”
  – Strong Definition
     • “for a given update and a given replica eventually either the
       update reaches the replica or the replica retires”

• Ordered
  – Distributed Hash Table allows lexicographical processing
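A minimal sketch of what ordered keys make possible, assuming a tiny in-memory store and a hypothetical `scanPrefix` helper. A real ordered store would keep its keys sorted on disk instead of sorting on every query.

```javascript
// A tiny in-memory key/value store (hypothetical, for illustration only).
const store = new Map();
store.set("user:bob", { name: "Bob" });
store.set("order:1001", { customer: "Ann" });
store.set("user:ann", { name: "Ann" });

// Return every key that starts with `prefix`, in lexicographic order.
function scanPrefix(prefix) {
  return [...store.keys()].filter((k) => k.startsWith(prefix)).sort();
}

console.log(scanPrefix("user:"));
```

Because related keys share a prefix, one ordered scan returns them together without a secondary index.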



                                                                           24
Key/Value Examples
• Azure AppFabric Cache
• Memcached
• VMware vFabric GemFire




                           25
Object Databases
• Db4o
• GemStone/S
• InterSystems Caché
• Objectivity/DB
• ZODB




                       26
Tabular
• BigTable
• Mnesia
• HBase
• Hypertable
• Azure Table Storage
• SQL Server 2012




                        27
Azure Table Storage Demo




                           28
Big Data




           29
Big Data Definition
• Volumes & volumes of data
• Unstructured
• Semi-structured
• Not suited for Relational Databases
• Often utilizes MapReduce frameworks




                                        30
Big Data Examples
• Cassandra
• Hadoop
• Greenplum
• Azure Storage
• EMC Atmos
• Amazon S3
• SQL Azure (with Federations support)



                                         31
Real World Example
• Twitter
   – The challenges
      • Needs to store many graphs
         – Who you are following
         – Who's following you
         – Who you receive phone notifications from, etc.
      • To deliver a tweet requires rapid paging of followers
      • Heavy write load as followers are added and removed
      • Set arithmetic for @mentions (intersection of users).
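A minimal sketch of the set arithmetic mentioned above, assuming the intersection is taken between the followers of two users. The follower sets and the `intersect` helper are made up for illustration.

```javascript
// Hypothetical follower sets keyed by user name.
const followers = {
  alice: new Set(["carol", "dave", "erin"]),
  bob: new Set(["dave", "erin", "frank"]),
};

// Intersection of two sets: members present in both.
function intersect(a, b) {
  return new Set([...a].filter((x) => b.has(x)));
}

const audience = intersect(followers.alice, followers.bob);
console.log([...audience]);
```

At Twitter's scale the sets are too large to load whole, which is why the deck stresses paging alongside set arithmetic.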



                                               32
What did they try?
• Started with Relational Databases
• Tried Key-Value storage of denormalized lists
• Did it work?
   – Nope
      • Either good at
         – Handling the write load
         – Or paging large amounts of data
         – But not both



                                      33
What did they need?
• Simplest possible thing that would work
• Allow for horizontal partitioning
• Allow write operations to
   – Arrive out of order
   – Or be processed more than once
• Failures should result in redundant work
   – Not lost work!




                                                34
The Result was FlockDB
• Stores graph data
• Not optimized for graph traversal operations
• Optimized for large adjacency lists
  – List of all edges in a graph
     • Key is the edge; value is a set of the node end points

• Optimized for fast read and write
• Optimized for page-able set arithmetic.




                                                            35
How Does it Work?
• Stores graphs as sets of edges between nodes
• Data is partitioned by node
  – All queries can be answered by a single partition
• Write operations are idempotent
  – Can be applied multiple times without changing the
    result
• And commutative
  – Changing the order of operands doesn't change the
    result.
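A minimal sketch of writes that are both idempotent and commutative. It assumes each write carries a unique timestamp and that the newest timestamp wins; that rule, the field names, and `applyWrite` are assumptions for illustration, not FlockDB's actual code.

```javascript
// Apply one write to the edge table. Each write carries a timestamp, and the
// newest timestamp wins, so replays and reordering cannot change the outcome.
function applyWrite(table, write) {
  const key = write.source + "->" + write.destination;
  const current = table.get(key);
  if (current === undefined || write.timestamp > current.timestamp) {
    table.set(key, { state: write.state, timestamp: write.timestamp });
  }
  return table;
}

const add = { source: "ann", destination: "bob", state: "following", timestamp: 1 };
const remove = { source: "ann", destination: "bob", state: "removed", timestamp: 2 };

// Same writes, different order and with a duplicate: the final state matches.
const inOrder = [add, remove].reduce(applyWrite, new Map());
const shuffled = [remove, add, remove].reduce(applyWrite, new Map());
console.log(inOrder.get("ann->bob").state, shuffled.get("ann->bob").state);
```

Both tables end with the edge in the "removed" state, so a retry after a failure costs redundant work but never corrupts the result.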



                                                         36
Working With Big Data




                        37
ACID
• Atomicity
   – All or Nothing
• Consistency
   – Valid according to all defined rules
• Isolation
   – No transaction should be able to interfere with another
     transaction
• Durability
   – Once a transaction has been committed, it will remain
     so, even in the event of power loss, crashes, or errors
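A minimal sketch of "all or nothing" combined with a consistency rule, using plain objects in memory. The account names, the no-negative-balance rule, and the `transfer` function are assumptions for illustration; isolation and durability are not modelled here.

```javascript
// Apply all changes to a copy and hand it back only if every rule holds.
function transfer(accounts, from, to, amount) {
  const draft = { ...accounts };
  draft[from] -= amount;
  draft[to] += amount;
  // Consistency rule (an assumption for this sketch): no negative balances.
  if (draft[from] < 0) {
    return accounts; // roll back: the original state is returned untouched
  }
  return draft; // commit
}

const before = { ann: 50, bob: 10 };
const committed = transfer(before, "ann", "bob", 30);
const rolledBack = transfer(before, "ann", "bob", 80);
console.log(committed, rolledBack === before);
```

The failed transfer leaves no half-applied debit behind, which is the property the Atomicity bullet describes.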


                                                               38
BASE
• Basically Available
   – High availability but not always consistent
• Soft state
   – Background cleanup mechanism
• Eventual consistency
   – Given a sufficiently long period of time over which no
     changes are sent, all updates can be expected to
     propagate eventually through the system and all the
     replicas will be consistent.
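A minimal sketch of eventual consistency, assuming three in-memory replicas and a background round in which each replica pulls newer versions from the others. Version numbers are assumed to be unique and increasing per key; `merge` and `antiEntropyRound` are hypothetical names.

```javascript
// Each replica maps key -> { value, version }. The higher version wins.
function merge(target, source) {
  for (const [key, entry] of source) {
    const mine = target.get(key);
    if (mine === undefined || entry.version > mine.version) {
      target.set(key, entry);
    }
  }
}

// One background round: every replica pulls from every other replica.
function antiEntropyRound(replicas) {
  for (const a of replicas) {
    for (const b of replicas) {
      if (a !== b) merge(a, b);
    }
  }
}

const replicas = [new Map(), new Map(), new Map()];
replicas[0].set("cart", { value: ["book"], version: 1 });
replicas[2].set("cart", { value: ["book", "pen"], version: 2 });

// Replica 1 has not seen any update yet, so a read there is stale for now.
const staleBefore = replicas[1].has("cart");
antiEntropyRound(replicas);
console.log(staleBefore, replicas.map((r) => r.get("cart").version));
```

Reads stay available on every replica throughout; they are simply allowed to be out of date until the background round has run.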




                                                              39
Traditional (relational) Approach


   Transactional Data Store -> Extract -> Transform -> Load -> Data Warehouse




                                                         40
Big Data Approach
• MapReduce Pattern/Framework
  – An input reader
  – A map function, to transform input to a common shape (format)
  – A partition function
  – A compare function
  – A reduce function
  – An output writer
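A minimal single-process sketch of the six pieces listed above, counting tags across a few made-up records. A real framework spreads these steps over many machines; the sample data, the partition rule, and the `runJob` driver are assumptions for illustration.

```javascript
// Input reader: yields one record per document (sample data is made up).
function inputReader() {
  return [
    { title: "a", tags: ["nosql", "azure"] },
    { title: "b", tags: ["nosql"] },
    { title: "c", tags: ["hadoop", "nosql"] },
  ];
}

// Map: transform each record into key/value pairs of a common shape.
function map(record) {
  return record.tags.map((tag) => [tag, 1]);
}

// Partition: choose which reducer receives a key (key length modulo the
// reducer count here, purely illustrative).
function partition(key, reducerCount) {
  return key.length % reducerCount;
}

// Compare: order keys within a partition before reducing.
function compare(a, b) {
  return a < b ? -1 : a > b ? 1 : 0;
}

// Reduce: collapse all values for one key into a single result.
function reduce(key, values) {
  return values.reduce((sum, v) => sum + v, 0);
}

// Driver plus output writer: group by partition and key, reduce, collect.
function runJob(reducerCount) {
  const partitions = Array.from({ length: reducerCount }, () => new Map());
  for (const record of inputReader()) {
    for (const [key, value] of map(record)) {
      const bucket = partitions[partition(key, reducerCount)];
      if (!bucket.has(key)) bucket.set(key, []);
      bucket.get(key).push(value);
    }
  }
  const output = {};
  for (const bucket of partitions) {
    for (const key of [...bucket.keys()].sort(compare)) {
      output[key] = reduce(key, bucket.get(key));
    }
  }
  return output;
}

console.log(runJob(2));
```

Every value for a given key lands in the same partition, so each reducer can work without talking to the others.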




                                                    41
MongoDB Example

> // map function
> m = function(){
...    this.tags.forEach(
...        function(z){
...            emit( z , { count : 1 } );
...        }
...    );
...};

> // reduce function
> r = function( key , values ){
...    var total = 0;
...    for ( var i=0; i<values.length; i++ )
...        total += values[i].count;
...    return { count : total };
...};

> // execute
> res = db.things.mapReduce(m, r, { out : "myoutput" } );




                                                                                        42
MongoDB Demo




               43
Big Data on Azure
• Azure Table Storage
  – Azure Service Bus
• SQL Azure Federations
• MongoDB on Azure
  – http://www.mongodb.org/display/DOCS/MongoDB+on+Azure

• Hadoop on Azure
  – https://www.hadooponazure.com/




                                                           44
Using Azure for Computing


   [Diagram: Client -> Master (Job/Task Scheduler) -> Worker x3, with Data stores beside the Master and the Workers]




                                                  45
Moving to Event Based Architecture
   [Diagram: Web Roles place requests (Req) on a Queue; Worker Roles take requests from the Queue]
   Monitor queue length against user's expectations
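A minimal sketch of the queue between web roles and worker roles, using an in-memory array in place of a real Azure queue. The scaling rule, its thresholds, and the function names are assumptions for illustration.

```javascript
// Web roles enqueue requests; worker roles dequeue and process them.
const queue = [];

function enqueue(request) {
  queue.push(request);
}

function processNext(handler) {
  if (queue.length === 0) return false;
  handler(queue.shift());
  return true;
}

// Scaling rule (thresholds are assumptions): compare the backlog with what
// users will tolerate and suggest how many workers are needed.
function workersNeeded(queueLength, perWorkerPerSecond, maxWaitSeconds) {
  const capacityPerWorker = perWorkerPerSecond * maxWaitSeconds;
  return Math.max(1, Math.ceil(queueLength / capacityPerWorker));
}

["req-1", "req-2", "req-3"].forEach(enqueue);
const handled = [];
processNext((r) => handled.push(r));
console.log(handled, queue.length, workersNeeded(1200, 10, 30));
```

With 1200 queued requests, 10 requests per worker per second, and a 30 second tolerance, the rule asks for 4 workers.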




                                                                   46
Aggregate Stores




                   47
Visualizing Aggregates
  ID: 1001
  Customer: Ann
  Line Items
    32411234        2    $48   $96
    707423234       1    $56   $56
    125145          1    $24   $24
  Payment Details
    Card: AmEx
    CC#: 12343
    Expiration: 07/2015

  Relational tables: Orders, Customers, Order Lines, Credit Cards




                                                                  48
Visualizing Aggregates
  ID: 1001
  Customer: Ann
  Line Items
    32411234        2    $48   $96
    707423234       1    $56   $56
    125145          1    $24   $24
  Payment Details
    Card: AmEx
    CC#: 12343
    Expiration: 07/2015

  {
    "SalesOrdersView":{
      ID: 1001,
      Customer: Ann,
      LineItems: []
      ……………..
      …………….
      ……………..
    }
  }
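A minimal sketch of the order from the slide held as one aggregate document. The field names are assumptions; the values come from the slide.

```javascript
// The order aggregate from the slide, stored as one document.
const salesOrder = {
  id: 1001,
  customer: "Ann",
  lineItems: [
    { productId: 32411234, quantity: 2, price: 48 },
    { productId: 707423234, quantity: 1, price: 56 },
    { productId: 125145, quantity: 1, price: 24 },
  ],
  payment: { card: "AmEx", ccNumber: "12343", expiration: "07/2015" },
};

// Everything needed to total the order is in the aggregate; no joins required.
const total = salesOrder.lineItems.reduce(
  (sum, item) => sum + item.quantity * item.price, 0);
console.log(total);
```

Reading or writing the order is a single operation against one document, where the relational layout spreads it across four tables.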




                                                           49
MongoDB on Azure Demo




                        50
Next Steps
• Learn a NoSQL product
  – Great place to start – AppFabric Cache, Azure Table
    Storage, MongoDB
• Pick a new programming language to learn
  – Not Java or C#/VB
  – Node.js, JavaScript, F#




                                                          51
THANK YOU



            52

 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 

Intro to Big Data and NoSQL

  • 1. Introduction to Big Data and NoSQL SQL Azure Saturday April 21, 2012 Don Demsak Advisory Solutions Architect EMC Consulting www.donxml.com 1
  • 2. Meet Don • Advisory Solutions Architect – EMC Consulting • Application Architecture, Development & Design • DonXml.com, Twitter: donxml • Email – don@donxml.com • SlideShare - http://www.slideshare.net/dondemsak 2
  • 3. The era of Big Data 3
  • 4. How did we get here? • Expensive – Processors – Disk space – Memory – Operating Systems – Software – Programmers • Monoculture – Limit CPU cycles – Limit disk space – Limit memory – Limited OS Development – Limited Software – Programmers • Mono-lingual • Mono-persistence 4
  • 5. Typical RDBMS Implementations • Fixed table schemas • Small but frequent reads/writes • Large batch transactions • Focus on ACID – Atomicity – Consistency – Isolation – Durability 5
  • 6. How we scale RDBMS implementations 6
  • 7. 1st Step – Build a relational database Database 7
  • 8. 2nd Step – Table Partitioning p1 p2 p3 Database 8
  • 9. 3rd Step – Database Partitioning [Diagram: one stack per customer (Customer #1, #2, #3), each Browser → Web Tier → B/L Tier → Database] 9
  • 10. 4th Step – Move to the cloud? [Diagram: one stack per customer (Customer #1, #2, #3), each Browser → Web Tier → B/L Tier → SQL Azure Federation] 10
  • 11. There have to be other ways 11
  • 14. 14
  • 15. Where Did NoSQL Originate? • 1998 - Carlo Strozzi – NoSQL project - lightweight open-source relational DB with no SQL interface • 2009 - Eric Evans & Johan Oskarsson of Last.fm wanted to organize an event to discuss open- source distributed databases 15
  • 16. NoSQL (loose) Definition • (often) Open source • Non-relational • Distributed • (often) don't guarantee ACID 16
  • 17. Atlanta 2009 • No:sql(east) conference – select fun, profit from real_world where relational=false • Billed as “conference of no-rel datastores” 17
  • 18. Types Of NoSQL Data Stores 18
  • 19. 5 Groups of Data Models Relational Document Key Value Graph Column Family 19
  • 20. Document Store • Apache Jackrabbit • CouchDB • MongoDB • SimpleDB • XML Databases – MarkLogic Server – eXist. 20
  • 21. Document? • Okay think of a web page... – Relational model requires column/tag – Lots of empty columns – Wasted space • Document model just stores the pages as is – Saves on space – Very flexible. 21
  • 22. Graph Storage • AllegroGraph • Core Data • Neo4j • DEX • FlockDB • Microsoft Trinity (research project) – http://research.microsoft.com/en-us/projects/trinity/ 22
  • 23. What's a graph? • Graph consists of – Nodes ('stations' of the graph) – Edges (lines between them) • FlockDB – Created by the Twitter folks – Nodes = Users – Edges = Nature of relationship between nodes. 23
  • 24. Key/Value Stores • On disk • Cache in RAM • Eventually Consistent – Weak Definition • “If no updates occur for a period, eventually all updates will propagate through the system and all replicas will be consistent” – Strong Definition • “for a given update and a given replica eventually either the update reaches the replica or the replica retires” • Ordered – Distributed Hash Table allows lexicographical processing 24
  • 25. Key/Value Examples • Azure AppFabric Cache • Memcached • VMware vFabric GemFire 25
  • 26. Object Databases • Db4o • GemStone/S • InterSystems Caché • Objectivity/DB • ZODB 26
  • 27. Tabular • BigTable • Mnesia • HBase • Hypertable • Azure Table Storage • SQL Server 2012 27
  • 29. Big Data 29
  • 30. Big Data Definition • Volumes & volumes of data • Unstructured • Semi-structured • Not suited for Relational Databases • Often utilizes MapReduce frameworks 30
  • 31. Big Data Examples • Cassandra • Hadoop • Greenplum • Azure Storage • EMC Atmos • Amazon S3 • SQL Azure (with Federations support) 31
  • 32. Real World Example • Twitter – The challenges • Needs to store many graphs – Who you are following – Who's following you – Who you receive phone notifications from, etc. • To deliver a tweet requires rapid paging of followers • Heavy write load as followers are added and removed • Set arithmetic for @mentions (intersection of users). 32
  • 33. What did they try? • Started with Relational Databases • Tried Key-Value storage of denormalized lists • Did it work? – Nope • Either good at – Handling the write load – Or paging large amounts of data – But not both 33
  • 34. What did they need? • Simplest possible thing that would work • Allow for horizontal partitioning • Allow write operations to – Arrive out of order – Or be processed more than once • Failures should result in redundant work – Not lost work! 34
  • 35. The Result was FlockDB • Stores graph data • Not optimized for graph traversal operations • Optimized for large adjacency lists – List of all edges in a graph • Key is the edge; value is a set of the node end points • Optimized for fast read and write • Optimized for page-able set arithmetic. 35
  • 36. How Does it Work? • Stores graphs as sets of edges between nodes • Data is partitioned by node – All queries can be answered by a single partition • Write operations are idempotent – Can be applied multiple times without changing the result • And commutative – Changing the order of operands doesn't change the result. 36
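The idempotent and commutative write behaviour described on this slide can be illustrated with a small Python sketch. This is not FlockDB's actual code or API: the `EdgeWrite` and `EdgeStore` names, the logical timestamp, and the "newest write wins" rule are all assumptions chosen only to show why replayed or reordered writes end in the same state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EdgeWrite:
    """One write against the edge source -> dest (hypothetical shape)."""
    source: int
    dest: int
    present: bool   # True = follow, False = unfollow
    timestamp: int  # logical time assigned when the write was issued


class EdgeStore:
    """Toy edge store; an assumption for illustration, not FlockDB itself."""

    def __init__(self):
        # (source, dest) -> (timestamp, present) of the newest write seen
        self._edges = {}

    def apply(self, write):
        key = (write.source, write.dest)
        incoming = (write.timestamp, write.present)
        current = self._edges.get(key)
        # Keep the greater (timestamp, present) pair. Taking a maximum is
        # idempotent and commutative, so replays and reordering are harmless.
        if current is None or incoming > current:
            self._edges[key] = incoming

    def followers_of(self, dest):
        # Full scan for brevity; a real store would partition by node.
        return sorted(s for (s, d), (_, present) in self._edges.items()
                      if d == dest and present)


writes = [
    EdgeWrite(1, 9, True, timestamp=10),   # user 1 follows user 9
    EdgeWrite(2, 9, True, timestamp=11),   # user 2 follows user 9
    EdgeWrite(1, 9, False, timestamp=12),  # user 1 unfollows user 9
]

in_order = EdgeStore()
for w in writes:
    in_order.apply(w)

# Same writes, delivered twice and in reverse order.
replayed = EdgeStore()
for w in reversed(writes + writes):
    replayed.apply(w)

print(in_order.followers_of(9) == replayed.followers_of(9))
```

Because each write carries its own ordering information, a failed delivery can simply be retried: the cost is redundant work, not lost work.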
  • 37. Working With Big Data 37
  • 38. ACID • Atomicity – All or Nothing • Consistency – Valid according to all defined rules • Isolation – No transaction should be able to interfere with another transaction • Durability – Once a transaction has been committed, it will remain so, even in the event of power loss, crashes, or errors 38
  • 39. BASE • Basically Available – High availability but not always consistent • Soft state – Background cleanup mechanism • Eventual consistency – Given a sufficiently long period of time over which no changes are sent, all updates can be expected to propagate eventually through the system and all the replicas will be consistent. 39
  • 40. Traditional (relational) Approach [Diagram: Transactional Data Store → Extract → Transform → Load → Data Warehouse] 40
  • 41. Big Data Approach • MapReduce Pattern/Framework – an Input Reader – Map Function – To transform to a common shape (format) – a partition function – a compare function – Reduce Function – an Output Writer 41
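As a rough illustration of how the stages listed on this slide fit together, here is a single-process Python sketch that counts tags across a few documents. It is not any particular framework's API: every function name, the two-partition default, and the sample data are assumptions, and a real framework would run the partitions on separate machines.

```python
from collections import defaultdict


def input_reader(documents):
    """Input reader: hand the framework one record at a time."""
    yield from documents


def map_fn(doc):
    """Map: transform each record to a common (key, value) shape."""
    for tag in doc["tags"]:
        yield tag, 1


def partition_fn(key, num_partitions):
    """Partition: decide which reducer receives a key.

    Summing character codes keeps the result stable across runs.
    """
    return sum(ord(c) for c in key) % num_partitions


def reduce_fn(key, values):
    """Reduce: collapse all values emitted for one key."""
    return key, sum(values)


def map_reduce(documents, num_partitions=2):
    partitions = [defaultdict(list) for _ in range(num_partitions)]
    for doc in input_reader(documents):
        for key, value in map_fn(doc):
            partitions[partition_fn(key, num_partitions)][key].append(value)

    output = {}
    for partition in partitions:
        # Compare function: here simply the default string ordering.
        for key in sorted(partition):
            out_key, total = reduce_fn(key, partition[key])
            output[out_key] = total  # output writer: collect into a dict
    return output


things = [
    {"tags": ["dog", "cat"]},
    {"tags": ["cat"]},
    {"tags": ["mouse", "cat", "dog"]},
]
counts = map_reduce(things)
print(counts["cat"], counts["dog"], counts["mouse"])  # 3 2 1
```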
  • 42. MongoDB Example
      > // map function
      > m = function(){
      ...     this.tags.forEach(
      ...         function(z){
      ...             emit( z , { count : 1 } );
      ...         }
      ...     );
      ... };
      > // reduce function
      > r = function( key , values ){
      ...     var total = 0;
      ...     for ( var i=0; i<values.length; i++ )
      ...         total += values[i].count;
      ...     return { count : total };
      ... };
      > // execute
      > res = db.things.mapReduce(m, r, { out : "myoutput" } );
  • 44. Big Data on Azure • Azure Table Storage – Azure Service Bus • SQL Azure Federations • MongoDB on Azure – http://www.mongodb.org/display/DOCS/MongoDB+on+Azure • Hadoop on Azure – https://www.hadooponazure.com/ 44
  • 45. Using Azure for Computing [Diagram labels: Client, Master, Job/Task Scheduler, Worker (×3), Data (×4)] 45
  • 46. Moving to Event Based Architecture [Diagram: Web Roles send requests (Req) to a Queue that is consumed by Worker Roles; monitor queue length against user's expectations] 46
  • 48. Visualizing Aggregates [Diagram: one sales order (ID: 1001, Customer: Ann; Line Items: 32411234 2 $48 $96, 707423234 1 $56 456, 125145 1 $24 $24; Payment Details: Card: AmEx, CC#: 12343, Expiration: 07/2015) spread across the relational tables Orders, Customers, Order Lines, Credit Cards] 48
  • 49. Visualizing Aggregates [Diagram: the same sales order (ID: 1001, Customer: Ann, Line Items, Payment Details) stored as a single document: { “SalesOrdersView”: { ID: 1001, Customer: Ann, LineItems: [] …, Payment Details } }] 49
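To make the aggregate idea on these two slides concrete, the sketch below writes the sales order as one nested value and stores it under a single key. The field names, the reading of the line-item columns as product ID, quantity and unit price, and the plain dict standing in for a document or key-value store are all assumptions made for illustration.

```python
import json

# The whole order, including line items and payment details, is one unit.
sales_order = {
    "id": 1001,
    "customer": "Ann",
    "line_items": [
        {"product_id": 32411234, "quantity": 2, "unit_price": 48},
        {"product_id": 707423234, "quantity": 1, "unit_price": 56},
        {"product_id": 125145, "quantity": 1, "unit_price": 24},
    ],
    "payment": {"card": "AmEx", "cc_number": "12343", "expiration": "07/2015"},
}


def order_total(order):
    """Sum quantity * unit price over the embedded line items."""
    return sum(item["quantity"] * item["unit_price"]
               for item in order["line_items"])


# A dict stands in for the store: one write and one read, no joins.
store = {}
store[sales_order["id"]] = json.dumps(sales_order)

loaded = json.loads(store[1001])
print(loaded["customer"], order_total(loaded))  # Ann 176
```

In the relational layout the same read would touch the Orders, Customers, Order Lines and Credit Cards tables; here everything that is accessed together is stored together.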
  • 50. MongoDB on Azure Demo 50
  • 51. Next Steps • Learn a NoSQL product – Great place to start – AppFabric Cache, Azure Table Storage, MongoDB • Pick a new programming language to learn – Not Java or C#/VB – Node.js, JavaScript, F# 51
  • 52. THANK YOU 52

Editor's Notes

  1. At least four groups of data model: key-value, document, column-family, and graph. Looking at this list, there's a big similarity between the first three - all have a fundamental unit of storage which is a rich structure of closely related data: for key-value stores it's the value, for document stores it's the document, and for column-family stores it's the column family. In DDD terms, this group of data is an aggregate.

     A Graph Database stores data structured in the Nodes and Relationships of a graph. Column Family (BigTable-style) databases are an evolution of key-value, using "families" to allow grouping of rows.

     The rise of NoSQL databases has been driven primarily by the desire to store data effectively on large clusters - such as the setups used by Google and Amazon. Relational databases were not designed with clusters in mind, which is why people have cast around for an alternative. Storing aggregates as fundamental units makes a lot of sense for running on a cluster. Aggregates make natural units for distribution strategies such as sharding, since you have a large clump of data that you expect to be accessed together.

     The Relational Model

     The relational model provides for the storage of records that are made up of tuples. Records are stored in tables. Tables are defined by a schema, which determines what columns are in the table. Columns have a name and a type. All records within a table fit that table's definition. SQL is a query language designed to operate over tables. SQL provides syntax for finding records that meet criteria, as well as for relating records in one table to another via joins; a join finds a record in one table based on its relationship to a record in another table.

     Records can be created (inserted) or deleted. Fields within a record can be updated individually.

     Implementations of the relational model usually provide transactions, which provide a means to make modifications spanning multiple records atomically.

     In terms of what programming languages provide, tables are like arrays or lists of records or structures. For high performance access, tables can be indexed in various ways using b-trees or hash maps.

     Key-Value Stores

     Key-value stores provide access to a value based on a key.

     The key-value pair can be created (inserted), or deleted. The value associated with a key may be updated.

     Key-value stores don't usually provide transactions.

     In terms of what programming languages provide, key-value stores resemble hash tables; these have many names: HashMap (Java), hash (Perl), dict (Python), associative array (PHP), boost::unordered_map<...> (C++).

     Key-value stores provide one implicit index on the key itself.

     A key-value store may not sound like the most useful thing, but a lot of information can be stored in the value. It is quite common for the value to be an XML document, a JSON object, or some other serialized form. The key point here is that the storage engine is not aware of the internal structure of the value. It is up to the client application to interpret the value and manage its contents. The value can only be written as a whole; if the client is storing a JSON object, and only wants to update one field, the entire value must be fetched, the new value substituted, and then the entire value must be written back.

     The inability to fetch data by anything other than one key may appear limited, but there are workarounds. If the application requires a secondary index, the application can maintain one itself. To do this, the application manages a second collection of key-value pairs where the key is the value of another field in the first collection, and the value is the primary key in the first collection. Because there are no transactions that can be used to make sure that the secondary index is kept synchronized with the original collection, any application that does this would be wise to have a periodic syncing process to clean up after any partial changes that occur due to application crashes, bugs, or errors.

     Document Stores

     Document stores provide access to structured data, but unlike the relational model, there may not be a schema that is enforced. In essence, the application stores bags of key-value pairs. In order to operate in this environment, the application adopts some conventions about how to deal with differing bags it may retrieve, or it may take advantage of the storage engine's ability to put different documents in different collections, which the application will use to manage its data.

     Unlike a relational store, document stores usually support nested structures. For example, for document stores that support XML or JSON documents, the value of a field may be something that looks like another document. Document stores can also support array or list-valued keys.

     Unlike a key-value store, document stores are aware of the internal structure of the document. This allows the storage engine to support secondary indexes directly, allowing for efficient queries on any field. The ability to support nested document storage leads to query languages that can be used to search for items nested inside others; XQuery is one example of this. MongoDB supports some similar functionality by allowing the specification of JSON field paths in queries.

     Column Stores

     Column stores are like relational stores, except that they flip the data around. Instead of storing records, column stores store all the values for a column together in a stream. An index provides a means to get column values for any particular record.

     Map-reduce implementations such as Hadoop are most efficient if they can stream in their data. Column stores work particularly well for that. As a result, stores like HBase and Hypertable are often used as non-relational data warehouses to feed map-reduce for analytics.

     A relational-style column scalar may not be the most useful for analytics, so users often store more complex structures in columns. This manifests directly in Cassandra, which introduces the notion of "column families," which get treated as a "super-column."

     Column-oriented stores support retrieving records, but this requires fetching the column values from their individual columns and re-assembling the record.

     Graph Databases

     Graph databases store vertices and the edges between them. Some support adding annotations to the vertices and/or edges. This can be used to model things like social graphs (people are represented by vertices, and their relationships are the edges), or real-world objects (components are represented by vertices, and their connectedness is represented by edges). The content on IMDB is tied together by a graph: movies are related to the actors in them, and actors are related to the movies they star in, forming a large complex graph.

     The access and query languages for graph databases are the most different of the set of those discussed here. Graph database query languages are generally about finding paths in the graph based on either endpoints, or constraints on attributes of the paths between endpoints; one example is SPARQL.
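The application-maintained secondary index described in the key-value notes above can be sketched in a few lines of Python. Two plain dicts stand in for two collections in a key-value store; the `UserStore` class, the `email` field, and the `resync_index` method are hypothetical names introduced only for this illustration.

```python
class UserStore:
    """Toy model of a primary collection plus an application-kept index."""

    def __init__(self):
        self.users = {}         # primary collection: user_id -> record
        self.ids_by_email = {}  # secondary index: email -> user_id

    def put(self, user_id, record):
        old = self.users.get(user_id)
        if old is not None and old["email"] != record["email"]:
            # Drop the index entry that pointed at the old email.
            self.ids_by_email.pop(old["email"], None)
        self.users[user_id] = record
        # There is no transaction around these two writes: a crash between
        # them would leave the index out of step with the primary data.
        self.ids_by_email[record["email"]] = user_id

    def get_by_email(self, email):
        user_id = self.ids_by_email.get(email)
        return None if user_id is None else self.users.get(user_id)

    def resync_index(self):
        """Periodic clean-up: rebuild the index from the primary collection."""
        self.ids_by_email = {rec["email"]: uid
                             for uid, rec in self.users.items()}


store = UserStore()
store.put(1, {"name": "Ann", "email": "ann@example.com"})
store.put(1, {"name": "Ann", "email": "ann@example.org"})  # email changed

# Simulate a partial write: the record exists but its index entry was lost.
store.users[2] = {"name": "Bob", "email": "bob@example.com"}
missing_before = store.get_by_email("bob@example.com") is None
store.resync_index()
print(missing_before, store.get_by_email("bob@example.com")["name"])
```

A document store would keep this index inside the storage engine; with a plain key-value store the application owns both the extra writes and the periodic repair.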
  2. Need to go into the EMC offerings