SlideShare a Scribd company logo
1 of 39
Download to read offline
Where do I put this
          Data?
          Ezra Zygmuntowicz
              @ezmobius




#lessql
SQL Databases Are
  Not A Panacea
SQL Databases Are
  Not A Panacea


NO-SQL Databases
Are Not A Panacea
Duh!
SQL Databases Are
  Not A Panacea


NO-SQL Databases
Are Not A Panacea
Who really needs a
#lessql database?
Who really needs a
  #nosql database?

Probably not you!
Who really needs a
     #nosql database?

Probably not you!
 Unless you do your homework first...
Limitations of SQL
Don’t think #NOSQL

  Think #LESSQL

 Hybrid systems are
    where it’s at
Find a small part of your
  application that has pain
because of the sql database
 and port just that part to
   one of these systems
Find a small part of your
  application that has pain
because of the sql database
 and port just that part to
   one of these systems

    rinse ... repeat...
Example
The Players
  • Redis
  • Tokyo*
  • MongoDB
  • Riak
  • Cassandra
  • Dynomite
• Fast, in memory key/value store
• Async disk persistence
• STRING, LIST and SET data types
• Perfect Data Structure/State/Cache Server
      http://code.google.com/p/redis/
    http://github.com/ezmobius/redis-rb
Pros:                            Cons:
Very Fast(110k/ops/sec)       Data Set must fit in Memory
 High Level Data Types         Possible to lose some data
  Sequential IO Only          between syncs(configurable)
  Very Clean C Code


  Scales horizontally via consistent hashing in the client
Use Redis when you want:

                   As fast as you can get

     Data structure sharing between ruby processes

              A better, persistent memcached

Hit counters, rate limiters, circular log buffers and sessions
• Large data workhorse
   • Native, memcached and http interfaces
   • Fast with full disk persistence
   • Key/value with Lua extensions available
           http://1978th.net/tokyocabinet/
            http://1978th.net/tokyotyrant/
       http://github.com/jmettraux/rufus-tokyo
http://copiousfreetime.rubyforge.org/tyrantmanager/
Pros:                            Cons:
    Fast and Stable                  No data types for
 Ability to store Large            values(table db type
   amounts of data                       excluded)
 Embeded Lua in the              Some issues with data sets
     tyrant server                  larger then 70Gigs
  Pluggable storage
         engines
  Scales horizontally via consistent hashing in the client
Tyrant also supports master/master and master /slave replication
Use Tokyo when you want:

    As fast as you can get with synchronous writes

        Store large amounts of persistent data

Use the smallest amount of disk space for your data set

                 Tunable RAM usage
• JSON document database
 • the ‘mysql’ of key/value stores
 • Fast but flexible query engine
 • Support for sharding baked in(sorta)
 • Replication
            http://www.mongodb.org
http://github.com/mongodb/mongo-ruby-driver
 http://github.com/jnunemaker/mongomapper
Pros:
             Schemaless
     Advanced query features           Cons:
           Relatively Fast    AutoSharding not ready yet
               GridFS              No Transactions
Easy transition from SQL databases
        Active development

          Scales horizontally via auto sharding
        Supports master/slave replication for failover
Use MongoDB when you want:

       Very Fast with synchronous writes

    Store large amounts of schemaless data

        Great for logging, stats collection

Very powerful query API and indexing capabilities
• Document oriented database
• HTTP/JSON query interface
• Erlang map/reduce query interface
• Decentralized, just add or remove nodes to
  scale

           http://riak.basho.com/
Pros:
                                         Cons:
       Schemaless
                                 Still young project
No master node/share
                              Not much documentation
         nothing
  Add/remove nodes
easily to scale up/down

Scales horizontally via shared nothing, hinted handoff and
       consistent hashing. Definitely one to watch
Use Riak when you want:

Ease of operations when adding/removing nodes
• Eventually consistent, distributed,
       structured key/value store
    • Cross between Big Table and Dynamo
    • Column Families provide higher level data
       models then most key/value stores
    • Truly add or remove nodes to scale
         capacity
          http://incubator.apache.org/cassandra/
http://blog.evanweaver.com/articles/2009/07/06/up-and-
                  running-with-cassandra/
Pros:
    Always writable                      Cons:
  Horizontally scalable        Restart whole system when
Addnodes easily to scale         making schema changes
     write capacity             Not much documentation
  Easy to get common              Rough edges abound
 sorted queries(recent
     blog posts etc)
 Scales horizontally via automatic replication, even tunable
                 across racks/data centers
Use Cassandra when you want:

        Truly just add nodes to scale out

Fairly rich data model, sorted supercolumn familes

        Store truly large amounts of data
• Eventually consistent, distributed, key/value
  store
• Based on Amazon’s Dynamo Papers
• Data Partitioning, versioning and read
  repair
• Written in Erlang
   http://github.com/cliffmoon/dynomite
Pros:
   Always writable
 Horizontally scalable                      Cons:
Good for storing large          New partitions come online
       binaries                   before they are *ready*
    Add nodes to                  Migration is very painful
      repartition               Still beta quality but used in
   Gossip protocol                production at powerset
  Pluggable storage
        engines
  Scales horizontally via automatic replication, talk to any
               node in the cluster from clients
Use Dynomite when you want:

    Scale writes by adding nodes

Store large binaries(like image assetts)

      Nice web admin interface
Thats great but which
  one should I use?
Thats great but which
  one should I use?
Unless you have good reasons
 stick with Redis, Tokyo and
     MongoDB for now
Thats great but which
  one should I use?
Unless you have good reasons
 stick with Redis, Tokyo and
     MongoDB for now
But keep an eye on the others
for truly add node to scale out
            capacity
Chef recipes to configure all of these on the Engine Yard
                        Cloud
         http://github.com/ezmobius/ey-lessql
Pitfalls of #LESSQL
• No Referential Integrity
• Not as much tooling
• Almost non existent disaster recovery
  tools
• Not as much production, used in anger
  experience
Do not buy into the hype!
 Most of you do not need this
stuff. Relational databases scale
   well for 99% of use cases.

   Don’t do it because it’s
          “Cool”
But if you do your
homework, #LESSQL can be
 very compelling and very
           useful.
     Please make informed
decisions so you don’t have to
hire me to clean up your mess!
Questions?

More Related Content

What's hot

From 100s to 100s of Millions
From 100s to 100s of MillionsFrom 100s to 100s of Millions
From 100s to 100s of MillionsErik Onnen
 
20120329 installing wordpress_3_3_1_locally
20120329 installing wordpress_3_3_1_locally20120329 installing wordpress_3_3_1_locally
20120329 installing wordpress_3_3_1_locallyDERlab
 
Using flash on the server side
Using flash on the server sideUsing flash on the server side
Using flash on the server sideHoward Marks
 
RavenDB embedded at massive scales
RavenDB embedded at massive scalesRavenDB embedded at massive scales
RavenDB embedded at massive scalesOren Eini
 
Hadoop Infrastructure (Oct. 3rd, 2012)
Hadoop Infrastructure (Oct. 3rd, 2012)Hadoop Infrastructure (Oct. 3rd, 2012)
Hadoop Infrastructure (Oct. 3rd, 2012)John Dougherty
 
Scaling Pinterest
Scaling PinterestScaling Pinterest
Scaling PinterestC4Media
 
Change Data Capture using Kafka
Change Data Capture using KafkaChange Data Capture using Kafka
Change Data Capture using KafkaAkash Vacher
 
Know thy cost (or where performance problems lurk)
Know thy cost (or where performance problems lurk)Know thy cost (or where performance problems lurk)
Know thy cost (or where performance problems lurk)Oren Eini
 
Building big data pipelines with Kafka and Kubernetes
Building big data pipelines with Kafka and KubernetesBuilding big data pipelines with Kafka and Kubernetes
Building big data pipelines with Kafka and KubernetesVenu Ryali
 
Making Startups Work: Scaling Drupal for Thrillist.com
Making Startups Work: Scaling Drupal for Thrillist.comMaking Startups Work: Scaling Drupal for Thrillist.com
Making Startups Work: Scaling Drupal for Thrillist.comMichael Smith
 
Compare DynamoDB vs. MongoDB
Compare DynamoDB vs. MongoDBCompare DynamoDB vs. MongoDB
Compare DynamoDB vs. MongoDBAmar Das
 
Solr cloud the 'search first' nosql database extended deep dive
Solr cloud the 'search first' nosql database   extended deep diveSolr cloud the 'search first' nosql database   extended deep dive
Solr cloud the 'search first' nosql database extended deep divelucenerevolution
 
Building a derived data store using Kafka
Building a derived data store using KafkaBuilding a derived data store using Kafka
Building a derived data store using KafkaVenu Ryali
 
SQL Azure - the good, the bad and the ugly.
SQL Azure - the good, the bad and the ugly.SQL Azure - the good, the bad and the ugly.
SQL Azure - the good, the bad and the ugly.Pini Krisher
 

What's hot (20)

Compression talk
Compression talkCompression talk
Compression talk
 
From 100s to 100s of Millions
From 100s to 100s of MillionsFrom 100s to 100s of Millions
From 100s to 100s of Millions
 
In Memory Cahce Structure
In Memory Cahce StructureIn Memory Cahce Structure
In Memory Cahce Structure
 
20120329 installing wordpress_3_3_1_locally
20120329 installing wordpress_3_3_1_locally20120329 installing wordpress_3_3_1_locally
20120329 installing wordpress_3_3_1_locally
 
SPA vs. MPA
SPA vs. MPASPA vs. MPA
SPA vs. MPA
 
Client server
Client serverClient server
Client server
 
Using flash on the server side
Using flash on the server sideUsing flash on the server side
Using flash on the server side
 
RavenDB embedded at massive scales
RavenDB embedded at massive scalesRavenDB embedded at massive scales
RavenDB embedded at massive scales
 
RavenDB 4.0
RavenDB 4.0RavenDB 4.0
RavenDB 4.0
 
Hadoop Infrastructure (Oct. 3rd, 2012)
Hadoop Infrastructure (Oct. 3rd, 2012)Hadoop Infrastructure (Oct. 3rd, 2012)
Hadoop Infrastructure (Oct. 3rd, 2012)
 
Scaling Pinterest
Scaling PinterestScaling Pinterest
Scaling Pinterest
 
Change Data Capture using Kafka
Change Data Capture using KafkaChange Data Capture using Kafka
Change Data Capture using Kafka
 
Know thy cost (or where performance problems lurk)
Know thy cost (or where performance problems lurk)Know thy cost (or where performance problems lurk)
Know thy cost (or where performance problems lurk)
 
Building big data pipelines with Kafka and Kubernetes
Building big data pipelines with Kafka and KubernetesBuilding big data pipelines with Kafka and Kubernetes
Building big data pipelines with Kafka and Kubernetes
 
Performance stack
Performance stackPerformance stack
Performance stack
 
Making Startups Work: Scaling Drupal for Thrillist.com
Making Startups Work: Scaling Drupal for Thrillist.comMaking Startups Work: Scaling Drupal for Thrillist.com
Making Startups Work: Scaling Drupal for Thrillist.com
 
Compare DynamoDB vs. MongoDB
Compare DynamoDB vs. MongoDBCompare DynamoDB vs. MongoDB
Compare DynamoDB vs. MongoDB
 
Solr cloud the 'search first' nosql database extended deep dive
Solr cloud the 'search first' nosql database   extended deep diveSolr cloud the 'search first' nosql database   extended deep dive
Solr cloud the 'search first' nosql database extended deep dive
 
Building a derived data store using Kafka
Building a derived data store using KafkaBuilding a derived data store using Kafka
Building a derived data store using Kafka
 
SQL Azure - the good, the bad and the ugly.
SQL Azure - the good, the bad and the ugly.SQL Azure - the good, the bad and the ugly.
SQL Azure - the good, the bad and the ugly.
 

Similar to Where do I put this data? #lessql

Big Data (NJ SQL Server User Group)
Big Data (NJ SQL Server User Group)Big Data (NJ SQL Server User Group)
Big Data (NJ SQL Server User Group)Don Demcsak
 
Five Years of EC2 Distilled
Five Years of EC2 DistilledFive Years of EC2 Distilled
Five Years of EC2 DistilledGrig Gheorghiu
 
Dropping ACID: Wrapping Your Mind Around NoSQL Databases
Dropping ACID: Wrapping Your Mind Around NoSQL DatabasesDropping ACID: Wrapping Your Mind Around NoSQL Databases
Dropping ACID: Wrapping Your Mind Around NoSQL DatabasesKyle Banerjee
 
Relational databases vs Non-relational databases
Relational databases vs Non-relational databasesRelational databases vs Non-relational databases
Relational databases vs Non-relational databasesJames Serra
 
Technologies for Data Analytics Platform
Technologies for Data Analytics PlatformTechnologies for Data Analytics Platform
Technologies for Data Analytics PlatformN Masahiro
 
Front Range PHP NoSQL Databases
Front Range PHP NoSQL DatabasesFront Range PHP NoSQL Databases
Front Range PHP NoSQL DatabasesJon Meredith
 
High Performance Drupal
High Performance DrupalHigh Performance Drupal
High Performance DrupalChapter Three
 
Webinar - DreamObjects/Ceph Case Study
Webinar - DreamObjects/Ceph Case StudyWebinar - DreamObjects/Ceph Case Study
Webinar - DreamObjects/Ceph Case StudyCeph Community
 
impalapresentation-130130105033-phpapp02 (1)_221220_235919.pdf
impalapresentation-130130105033-phpapp02 (1)_221220_235919.pdfimpalapresentation-130130105033-phpapp02 (1)_221220_235919.pdf
impalapresentation-130130105033-phpapp02 (1)_221220_235919.pdfssusere05ec21
 
2010 mongo berlin-scaling
2010 mongo berlin-scaling2010 mongo berlin-scaling
2010 mongo berlin-scalingMongoDB
 
Parallel and Asynchronous Programming - ITProDevConnections 2012 (English)
Parallel and Asynchronous Programming -  ITProDevConnections 2012 (English)Parallel and Asynchronous Programming -  ITProDevConnections 2012 (English)
Parallel and Asynchronous Programming - ITProDevConnections 2012 (English)Panagiotis Kanavos
 
Scalable Web Architectures: Common Patterns and Approaches - Web 2.0 Expo NYC
Scalable Web Architectures: Common Patterns and Approaches - Web 2.0 Expo NYCScalable Web Architectures: Common Patterns and Approaches - Web 2.0 Expo NYC
Scalable Web Architectures: Common Patterns and Approaches - Web 2.0 Expo NYCCal Henderson
 
IBM Connect 2017: Your Data In the Major Leagues: A Practical Guide to REST S...
IBM Connect 2017: Your Data In the Major Leagues: A Practical Guide to REST S...IBM Connect 2017: Your Data In the Major Leagues: A Practical Guide to REST S...
IBM Connect 2017: Your Data In the Major Leagues: A Practical Guide to REST S...Serdar Basegmez
 
Midwest PHP - Scaling Magento
Midwest PHP - Scaling MagentoMidwest PHP - Scaling Magento
Midwest PHP - Scaling MagentoMathew Beane
 
12-Step Program for Scaling Web Applications on PostgreSQL
12-Step Program for Scaling Web Applications on PostgreSQL12-Step Program for Scaling Web Applications on PostgreSQL
12-Step Program for Scaling Web Applications on PostgreSQLKonstantin Gredeskoul
 

Similar to Where do I put this data? #lessql (20)

Big Data (NJ SQL Server User Group)
Big Data (NJ SQL Server User Group)Big Data (NJ SQL Server User Group)
Big Data (NJ SQL Server User Group)
 
noSQL choices
noSQL choicesnoSQL choices
noSQL choices
 
Five Years of EC2 Distilled
Five Years of EC2 DistilledFive Years of EC2 Distilled
Five Years of EC2 Distilled
 
Dropping ACID: Wrapping Your Mind Around NoSQL Databases
Dropping ACID: Wrapping Your Mind Around NoSQL DatabasesDropping ACID: Wrapping Your Mind Around NoSQL Databases
Dropping ACID: Wrapping Your Mind Around NoSQL Databases
 
Relational databases vs Non-relational databases
Relational databases vs Non-relational databasesRelational databases vs Non-relational databases
Relational databases vs Non-relational databases
 
Technologies for Data Analytics Platform
Technologies for Data Analytics PlatformTechnologies for Data Analytics Platform
Technologies for Data Analytics Platform
 
Intro to Big Data
Intro to Big DataIntro to Big Data
Intro to Big Data
 
Front Range PHP NoSQL Databases
Front Range PHP NoSQL DatabasesFront Range PHP NoSQL Databases
Front Range PHP NoSQL Databases
 
High Performance Drupal
High Performance DrupalHigh Performance Drupal
High Performance Drupal
 
Data engineering
Data engineeringData engineering
Data engineering
 
Webinar - DreamObjects/Ceph Case Study
Webinar - DreamObjects/Ceph Case StudyWebinar - DreamObjects/Ceph Case Study
Webinar - DreamObjects/Ceph Case Study
 
impalapresentation-130130105033-phpapp02 (1)_221220_235919.pdf
impalapresentation-130130105033-phpapp02 (1)_221220_235919.pdfimpalapresentation-130130105033-phpapp02 (1)_221220_235919.pdf
impalapresentation-130130105033-phpapp02 (1)_221220_235919.pdf
 
2010 mongo berlin-scaling
2010 mongo berlin-scaling2010 mongo berlin-scaling
2010 mongo berlin-scaling
 
Drupal performance
Drupal performanceDrupal performance
Drupal performance
 
Parallel and Asynchronous Programming - ITProDevConnections 2012 (English)
Parallel and Asynchronous Programming -  ITProDevConnections 2012 (English)Parallel and Asynchronous Programming -  ITProDevConnections 2012 (English)
Parallel and Asynchronous Programming - ITProDevConnections 2012 (English)
 
Scalable Web Architectures: Common Patterns and Approaches - Web 2.0 Expo NYC
Scalable Web Architectures: Common Patterns and Approaches - Web 2.0 Expo NYCScalable Web Architectures: Common Patterns and Approaches - Web 2.0 Expo NYC
Scalable Web Architectures: Common Patterns and Approaches - Web 2.0 Expo NYC
 
IBM Connect 2017: Your Data In the Major Leagues: A Practical Guide to REST S...
IBM Connect 2017: Your Data In the Major Leagues: A Practical Guide to REST S...IBM Connect 2017: Your Data In the Major Leagues: A Practical Guide to REST S...
IBM Connect 2017: Your Data In the Major Leagues: A Practical Guide to REST S...
 
Midwest PHP - Scaling Magento
Midwest PHP - Scaling MagentoMidwest PHP - Scaling Magento
Midwest PHP - Scaling Magento
 
12-Step Program for Scaling Web Applications on PostgreSQL
12-Step Program for Scaling Web Applications on PostgreSQL12-Step Program for Scaling Web Applications on PostgreSQL
12-Step Program for Scaling Web Applications on PostgreSQL
 
MongoDB
MongoDBMongoDB
MongoDB
 

More from Ezra Zygmuntowicz (9)

Redis: REmote DIctionary Server
Redis: REmote DIctionary ServerRedis: REmote DIctionary Server
Redis: REmote DIctionary Server
 
Railsconf
RailsconfRailsconf
Railsconf
 
Erlangfactory
ErlangfactoryErlangfactory
Erlangfactory
 
Ruby Deployment
Ruby DeploymentRuby Deployment
Ruby Deployment
 
Merb + Nanite
Merb + NaniteMerb + Nanite
Merb + Nanite
 
Vertebra
VertebraVertebra
Vertebra
 
Vertebra
VertebraVertebra
Vertebra
 
Merb Core
Merb CoreMerb Core
Merb Core
 
Custom Mongrel Handlers: Learning how to walk the dog
Custom Mongrel Handlers: Learning how to walk the dogCustom Mongrel Handlers: Learning how to walk the dog
Custom Mongrel Handlers: Learning how to walk the dog
 

Recently uploaded

From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 

Recently uploaded (20)

From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 

Where do I put this data? #lessql

  • 1. Where do I put this Data? Ezra Zygmuntowicz @ezmobius #lessql
  • 2. SQL Databases Are Not A Panacea
  • 3. SQL Databases Are Not A Panacea NO-SQL Databases Are Not A Panacea
  • 4. Duh! SQL Databases Are Not A Panacea NO-SQL Databases Are Not A Panacea
  • 5. Who really needs a #lessql database?
  • 6. Who really needs a #nosql database? Probably not you!
  • 7. Who really needs a #nosql database? Probably not you! Unless you do your homework first...
  • 9. Don’t think #NOSQL Think #LESSQL Hybrid systems are where it’s at
  • 10. Find a small part of your application that has pain because of the sql database and port just that part to one of these systems
  • 11. Find a small part of your application that has pain because of the sql database and port just that part to one of these systems rinse ... repeat...
  • 13. The Players • Redis • Tokyo* • MongoDB • Riak • Cassandra • Dynomite
  • 14. • Fast, in memory key/value store • Async disk persistence • STRING, LIST and SET data types • Perfect Data Structure/State/Cache Server http://code.google.com/p/redis/ http://github.com/ezmobius/redis-rb
  • 15. Pros: Cons: Very Fast(110k/ops/sec) Data Set must fit in Memory High Level Data Types Possible to lose some data Sequential IO Only between syncs(configurable) Very Clean C Code Scales horizontally via consistent hashing in the client
  • 16. Use Redis when you want: As fast as you can get Data structure sharing between ruby processes A better, persistent memcached Hit counters, rate limiters, circular log buffers and sessions
  • 17. • Large data workhorse • Native, memcached and http interfaces • Fast with full disk persistence • Key/value with Lua extensions available http://1978th.net/tokyocabinet/ http://1978th.net/tokyotyrant/ http://github.com/jmettraux/rufus-tokyo http://copiousfreetime.rubyforge.org/tyrantmanager/
  • 18. Pros: Cons: Fast and Stable No data types for Ability to store Large values(table db type amounts of data excluded) Embeded Lua in the Some issues with data sets tyrant server larger then 70Gigs Pluggable storage engines Scales horizontally via consistent hashing in the client Tyrant also supports master/master and master /slave replication
  • 19. Use Tokyo when you want: As fast as you can get with synchronous writes Store large amounts of persistent data Use the smallest amount of disk space for your data set Tunable RAM usage
  • 20. • JSON document database • the ‘mysql’ of key/value stores • Fast but flexible query engine • Support for sharding baked in(sorta) • Replication http://www.mongodb.org http://github.com/mongodb/mongo-ruby-driver http://github.com/jnunemaker/mongomapper
  • 21. Pros: Schemaless Advanced query features Cons: Relatively Fast AutoSharding not ready yet GridFS No Transactions Easy transition from SQL databases Active development Scales horizontally via auto sharding Supports master/slave replication for failover
  • 22. Use MongoDB when you want: Very Fast with synchronous writes Store large amounts of schemaless data Great for logging, stats collection Very powerful query API and indexing capabilities
  • 23. • Document oriented database • HTTP/JSON query interface • Erlang map/reduce query interface • Decentralized, just add or remove nodes to scale http://riak.basho.com/
  • 24. Pros: Cons: Schemaless Still young project No master node/share Not much documentation nothing Add/remove nodes easily to scale up/down Scales horizontally via shared nothing, hinted handoff and consistent hashing. Definitely one to watch
  • 25. Use Riak when you want: Ease of operations when adding/removing nodes
  • 26. • Eventually consistent, distributed, structured key/value store • Cross between Big Table and Dynamo • Column Families provide higher level data models then most key/value stores • Truly add or remove nodes to scale capacity http://incubator.apache.org/cassandra/ http://blog.evanweaver.com/articles/2009/07/06/up-and- running-with-cassandra/
  • 27. Pros: Always writable Cons: Horizontally scalable Restart whole system when Addnodes easily to scale making schema changes write capacity Not much documentation Easy to get common Rough edges abound sorted queries(recent blog posts etc) Scales horizontally via automatic replication, even tunable across racks/data centers
  • 28. Use Cassandra when you want: Truly just add nodes to scale out Fairly rich data model, sorted supercolumn familes Store truly large amounts of data
  • 29. • Eventually consistent, distributed, key/value store • Based on Amazon’s Dynamo Papers • Data Partitioning, versioning and read repair • Written in Erlang http://github.com/cliffmoon/dynomite
  • 30. Pros: Always writable Horizontally scalable Cons: Good for storing large New partitions come online binaries before they are *ready* Add nodes to Migration is very painful repartition Still beta quality but used in Gossip protocol production at powerset Pluggable storage engines Scales horizontally via automatic replication, talk to any node in the cluster from clients
  • 31. Use Dynomite when you want: Scale writes by adding nodes Store large binaries(like image assetts) Nice web admin interface
  • 32. Thats great but which one should I use?
  • 33. Thats great but which one should I use? Unless you have good reasons stick with Redis, Tokyo and MongoDB for now
  • 34. Thats great but which one should I use? Unless you have good reasons stick with Redis, Tokyo and MongoDB for now But keep an eye on the others for truly add node to scale out capacity
  • 35. Chef recipes to configure all of these on the Engine Yard Cloud http://github.com/ezmobius/ey-lessql
  • 36. Pitfalls of #LESSQL • No Referential Integrity • Not as much tooling • Almost non existent disaster recovery tools • Not as much production, used in anger experience
  • 37. Do not buy into the hype! Most of you do not need this stuff. Relational databases scale well for 99% of use cases. Don’t do it because it’s “Cool”
  • 38. But if you do your homework, #LESSQL can be very compelling and very useful. Please make informed decisions so you don’t have to hire me to clean up your mess!