http://www.flickr.com/photos/fpat/3328595063/




Gary Dusbabek
Full Disclosure:
I work on Apache
   Cassandra.

             http://www.flickr.com/photos/vmanso/4040094281/
My Goals For You
     Should I?
    How Then?
     Achtung!

                 http://www.flickr.com/photos/29707865@N05/2780508266/
http://www.flickr.com/photos/marc_smith/6246957472/




A Brief History of
   Databases
http://www.flickr.com/photos/watchsmart/1422274819/
1960s
New
    Stuff
Direct access
    storage
Replaced Tape

New Possibilities
                    http://www.flickr.com/photos/byrion/5264950510/
Navigational
 Databases
   (two kinds)
Hierarchical
   Parent-child
Network
Relationships
   Graph
1970s
Codd
Relational Model

                                                           Search by content

                                                               Good for query

                                                       Demands on processor

                                                        Rigid, fixed structures

                                                            Bad for modeling
http://www.flickr.com/photos/35536700@N07/3292544674
Today
Data needs
   have
 changed
     http://www.flickr.com/photos/franzhaas/6761917637/
Technology
                                                   has
                                                 changed
http://www.flickr.com/photos/neosnaps/2574417351/
http://www.flickr.com/photos/katclay/3935629242/




Choosing
Technology
  is Hard™ Work.
Can you do it with a
relational database?

Is your DB falling apart?
   What do you need?
Where RDBMS Fall Apart
Scaling
SPoF
Sharding
Denormalizing

Availability
 Slave Systems   http://www.flickr.com/photos/horiavarlan/4681206711/
What do you need?
   Reduced cost
   Throughput
   Availability
   Recoverability
   Correctness
   Transactions
   Flexible Schema
http://www.flickr.com/photos/ell-r-brown/5866777592/
What is NoSQL?

                                                        Flight



http://www.flickr.com/photos/24277960@N08/2609390563/
http://www.flickr.com/photos/taylar/4996955547/
http://www.flickr.com/photos/gromgull/611019520/
http://www.flickr.com/photos/igboo/2583174998/
What isn’t NoSQL?
                                               NoFlight




http://www.flickr.com/photos/alanvernon/3121751152/
http://www.flickr.com/photos/tomsaint/3209482579/
http://www.flickr.com/photos/pointnshoot/408384715/
http://www.flickr.com/photos/zigazou76/5846255426/
Considerations
  Fault Tolerance
    Recovery
    Replication
     Access
      Hooks
    Distributed
Considerations
      Data Model
  Query/Search model
Transactional Semantics
Read vs Write Throughput
Deployment/Management
Focus on a few systems
Master-Slave
   MongoDB
   Redis
Fully Distributed
   Riak
   HBase
   Cassandra
http://www.flickr.com/photos/seier/2455551478/sizes/l/in/photostream/
MongoDB
Document Oriented

Naturally denormalized

Flexible schema
MongoDB
Programmer friendly
Many language drivers
Atomic on a single document
MongoDB
Real-time data
warehousing/analytics

Blocking/offline compaction

Complicated queries
MongoDB

db.foo.find({j: {$ne: 3}, k: {$gt: 10} });

db.foo.find( { name : "bob" , $or : [ { a : 1 } , { b : 2 } ] } )
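To make the meaning of the two filters above concrete, here is a minimal Python sketch that evaluates the same query shapes against in-memory dictionaries. The `matches` helper and the sample documents are hypothetical and exist only for illustration; this is not MongoDB's implementation or its driver API, and it handles only `$ne`, `$gt`, and `$or`.

```python
def matches(doc, query):
    """Return True if doc satisfies query (supports only $ne, $gt, $or)."""
    for key, cond in query.items():
        if key == "$or":
            # At least one of the sub-queries must match.
            if not any(matches(doc, sub) for sub in cond):
                return False
        elif isinstance(cond, dict):
            value = doc.get(key)
            for op, operand in cond.items():
                if op == "$ne":
                    # A missing field counts as "not equal".
                    if value == operand:
                        return False
                elif op == "$gt":
                    # A missing field never satisfies a range test.
                    if value is None or not value > operand:
                        return False
                else:
                    raise ValueError(f"unsupported operator: {op}")
        elif doc.get(key) != cond:
            # Plain value: exact equality on that field.
            return False
    return True


# Hypothetical sample documents.
docs = [
    {"j": 1, "k": 20},
    {"j": 3, "k": 20},
    {"j": 1, "k": 5},
    {"name": "bob", "a": 1},
    {"name": "bob", "b": 3},
    {"name": "alice", "b": 2},
]

# Same shape as: db.foo.find({j: {$ne: 3}, k: {$gt: 10}})
first = [d for d in docs if matches(d, {"j": {"$ne": 3}, "k": {"$gt": 10}})]

# Same shape as: db.foo.find({name: "bob", $or: [{a: 1}, {b: 2}]})
second = [d for d in docs
          if matches(d, {"name": "bob", "$or": [{"a": 1}, {"b": 2}]})]

print(first)
print(second)
```

Top-level keys are combined with AND, which is why the second query needs an explicit `$or` to express the alternative.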
MongoDB
Master-slave replication
          Asynchronous
Gives failover & data redundancy
       But not consistency
 Only master can receive writes

Makes atomic writes easy
Redis
  Real-time stats tracking

 Wicked fast

 Collections built in
Redis
In-memory

Snapshots

Master-slave

RAM limited
Riak
Relationships, aka “Links”
Built-in MapReduce
Completely schemaless
No SPoF
Scales linearly
Tunable consistency
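In Dynamo-style stores, "tunable consistency" is usually described with three numbers: N replicas per key, R replicas that must answer a read, and W replicas that must acknowledge a write. When R + W > N, every read set shares at least one replica with every write set. The sketch below brute-force checks that rule. The N/R/W names follow the Dynamo convention rather than anything on the slide, and `quorums_always_overlap` is a hypothetical helper, not a Riak API.

```python
from itertools import combinations


def quorums_always_overlap(n, r, w):
    """True if every r-replica read set intersects every w-replica write set."""
    replicas = range(n)
    return all(
        set(read_set) & set(write_set)
        for read_set in combinations(replicas, r)
        for write_set in combinations(replicas, w)
    )


# Compare the brute-force answer with the R + W > N rule for small clusters.
for n in range(1, 6):
    for r in range(1, n + 1):
        for w in range(1, n + 1):
            assert quorums_always_overlap(n, r, w) == (r + w > n)

print(quorums_always_overlap(3, 2, 2))
print(quorums_always_overlap(3, 1, 1))
```

Lowering R or W trades that overlap guarantee for lower latency and higher availability, which is the knob the slide refers to.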
Riak
Pre- and post-hooks
Configurable storage engines
REST access
Easy cluster balancing
Riak
Doesn’t keep data sorted
Erlang
Cassandra
Query language
Range queries
Datacenter/Rack aware
Hadoop integration
Configurable caching
Live schema changes
Cassandra
Some schema
Growing cluster isn’t fair
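One plausible reading of "growing cluster isn't fair" is that, on a token ring where each node owns a single contiguous range, a newly added node takes load from only one neighbour. The sketch below illustrates that reading. The ring size, the token values, and the `ownership` helper are all assumptions made for illustration; none of this is Cassandra's code or API.

```python
def ownership(tokens, ring_size):
    """Map each token to the fraction of the ring it owns.

    A node owns the range from the previous token (exclusive) up to its
    own token (inclusive), wrapping around the end of the ring.
    """
    ordered = sorted(tokens)
    shares = {}
    for i, token in enumerate(ordered):
        previous = ordered[i - 1]  # index -1 wraps to the last token
        # A lone node owns the whole ring, hence the "or ring_size".
        span = (token - previous) % ring_size or ring_size
        shares[token] = span / ring_size
    return shares


RING = 120  # hypothetical ring size, small enough to read by eye

before = ownership([0, 30, 60, 90], RING)     # four evenly spaced nodes
after = ownership([0, 30, 60, 90, 15], RING)  # a fifth node joins at token 15

print(before)
print(after)
```

In this toy ring the new node splits the range owned by the node at token 30, while the nodes at 0, 60, and 90 carry the same share as before. Evening things out again would mean moving existing tokens.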
HBase
Coprocessors
Versioned cells (BigTable)
Hadoop integration
HBase
Hadoop NameNode is SPoF

Schema maintenance downtime

Schema required up front

Complicated balancing
http://www.flickr.com/photos/annguyenphotography/3267723713/




No Silver Bullet
http://www.flickr.com/photos/nateone/3768979925/




   HBase
@gdusbabek

Breaking the Relational Headlock: A Survey of NoSQL Datastores

Editor's Notes

  1. Technical? Ask questions. Discussion.
  2. Open source fan
  3. Context
  4. Seeking didn’t kill you.
  5. Traverse links. Follow pointers. No notion of keys. Just data.
  6. Up and down
  7. Up down left right
  8. Edgar F. Codd. Dominant by 90s.
  9. Emphasize search, not navigation. Foreign keys are a bad model. Relationships not explicit. E-R diagrams not until mid to late 70s.
  10. Rackspace example
  11. Google File System 2003. BigTable 2004.
  12. Answer: Should I? Temptation – new startup makes a blog post saying “we like it.” Hype. This is Hawt! I should be using it. New shiny.
  13. Fads aside… Mistakes not evident at first.
  14. Answer: Should I? Two questions. Hype. New shiny.
  15. Fixed table space. Not linear – 2x space != 2x money.
  16. Relational impedance
  17. Data model is complex. Queries are represented as JSON.
  18. Data model is complex. Queries are represented as JSON.
  19. Does do sharding
  20. Like a memcache for lists and sets. Live data. Fast changing.
  21. Snapshots - leave delta for data loss. Master/Slave - asynchronous replication.
  22. Faithful Dynamo clone. MapReduce != Hadoop integration. Hooks == BigTable Coprocessors.
  23. Bitcask -> small data set (keys must fit in memory). InnoDB -> big data set. Memory -> duh. REST – easy for programmers. Balancing – always 64 pieces.
  24. Sorted – poor scanning performance
  25. Dynamo + BigTable
  26. Balancing – RegionServers + HDFS
  27. Will be more choices and better solutions: 205 Million Dollars of Funding For Big Data Startups (http://datascience101.wordpress.com/2012/02/28/funding-for-big-data-startups/). Accel Partners. IA Ventures.