Non-Relational Databases

What are they, and why are they
    getting a lot of press?

      (plus a super-secret mystery guest tacked on at the end of the presentation)




             D DiPaolo - Lightning Lunch - 5/8/2009
What types of non-relational
           databases are there?
• Column-Oriented
   – Google’s BigTable
   – HBase (Apache Hadoop)
• Document-Oriented
   – CouchDB
   – Amazon SimpleDB
   – MongoDB
• Key-Value
   – Cassandra (Facebook; data model is closer to column-oriented)
   – Memcached (sort of)
   – Tokyo Cabinet
   – Redis
   – Amazon Dynamo

Motivation for each type
• Column-Oriented
  – Not all that different from relational: similar structures,
    but values are stored column by column instead of row by row
    to maximize disk performance
  – Queries generally don’t need every column of a record, so
    reading only the requested columns saves I/O
• Document-Oriented
  – Lack of structure allows for tight packing of data
  – Lack of strongly typed fields, akin to dynamically typed
    programming languages
• Key-Value
  – More generic version of Document-Oriented, with no
    guarantee/requirement of structure in the values
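
A rough sketch of how the three models differ, using plain Python containers as stand-ins for real storage. The sample records and field names are made up for illustration and do not come from any of the databases listed.

```python
# Hypothetical sample records; the field names are illustrative only.
records = [
    {"id": 1, "name": "ann", "city": "Boston"},
    {"id": 2, "name": "bob", "city": "Boston"},
    {"id": 3, "name": "cat", "city": "Austin"},
]

# Row-oriented: the fields of one record sit next to each other.
row_store = [(r["id"], r["name"], r["city"]) for r in records]

# Column-oriented: the values of one column sit next to each other.
column_store = {
    "id": [r["id"] for r in records],
    "name": [r["name"] for r in records],
    "city": [r["city"] for r in records],
}


def cities_from_rows(rows):
    # Has to walk every full record just to pick out one field.
    return [row[2] for row in rows]


def cities_from_columns(columns):
    # Touches only the one column the query asked for.
    return columns["city"]


# Document-oriented: fields can differ per document and carry no declared type.
documents = {
    "doc1": {"name": "ann", "city": "Boston"},
    "doc2": {"name": "bob", "tags": ["admin", "ops"], "age": "unknown"},
}

# Key-value: the store treats the value as an opaque blob.
key_value = {"user:1": b"\x00\x01 anything at all"}

print(cities_from_rows(row_store))
print(cities_from_columns(column_store))
```

Both lookups return the same answer; the difference is how much unrelated data has to be read to get it.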

Why are they popular now?
• Speed, speed, speed
   – Smaller/less data = faster throughput
   – Less “structure” means less overhead
   – Similar data stored sequentially means high compression
• Scalability too
   – RDBMSes weren’t designed to be spread across many networked machines
• Moore’s Law isn’t enough
   – Faster processing can’t compensate (enough)
• Actually kind of gross hacks that are getting pretty
  faces put on them
• Lose some “nice features”
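
To make the compression point concrete, here is a small sketch that run-length encodes the same made-up table in row order and in column order. Run-length encoding is only a simple stand-in for whatever compression a real column store uses, and the table contents are invented for the example.

```python
from itertools import groupby


def run_length_encode(values):
    """Collapse consecutive repeats into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


# Hypothetical table: 6 rows of (country, status).
rows = [("US", "active")] * 3 + [("US", "closed")] + [("CA", "active")] * 2

# Row order: values from different columns alternate, so nothing repeats.
row_major = [field for row in rows for field in row]

# Column order: all countries first, then all statuses.
column_major = [row[0] for row in rows] + [row[1] for row in rows]

row_runs = run_length_encode(row_major)
column_runs = run_length_encode(column_major)

print(len(row_runs))     # 12 runs: no two neighbours are equal
print(len(column_runs))  # 5 runs: similar values sit next to each other
```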

You lose some stuff, but…
• Giving up
  –   Enforced Structure
  –   Constraints
  –   DB-side logic
  –   ACID guarantees
• The use-cases for these generally don’t need
  those
• If you absolutely need both speed and relational
  data, you can denormalize
  – But you probably don’t
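
A minimal sketch of what denormalizing means in practice, with in-memory dicts standing in for tables. The users/orders schema is hypothetical and only there to show the trade-off.

```python
# Normalized: each fact is stored once, and a read has to join two tables.
users = {1: {"name": "ann"}, 2: {"name": "bob"}}
orders = [
    {"order_id": 100, "user_id": 1, "total": 25},
    {"order_id": 101, "user_id": 2, "total": 40},
]


def order_with_user_joined(order_id):
    # Two lookups: find the order, then follow user_id into users.
    order = next(o for o in orders if o["order_id"] == order_id)
    return {**order, "user_name": users[order["user_id"]]["name"]}


# Denormalized: the user's name is copied into every order at write time.
orders_denormalized = {
    o["order_id"]: {**o, "user_name": users[o["user_id"]]["name"]}
    for o in orders
}


def order_with_user_denormalized(order_id):
    # One lookup and no join; the price is duplicated data to keep in sync.
    return orders_denormalized[order_id]


print(order_with_user_joined(100))
print(order_with_user_denormalized(100))
```

The read gets cheaper, but renaming a user now means updating every copied name, which is exactly the kind of consistency work the relational model did for you.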
Parting Thought: Bloom Filters
• These are freaking wild
• Who knew lossy storage would actually be
  useful in databases?
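• Basic idea: a fixed-size bit array that can answer "have I seen
  this key?" for any number of keys, but it may lie a teeny bit
  (false positives are possible, false negatives are not)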
• This data structure is used in several of these
  non-relational DB implementations
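
A minimal Bloom filter sketch, not taken from any particular database. The class name, the default sizes, and the trick of salting SHA-256 with an index to get several hash functions are all assumptions made for this illustration.

```python
import hashlib


class BloomFilter:
    """Fixed-size set-membership sketch; sizes here are arbitrary defaults."""

    def __init__(self, num_bits=64, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as the bit array

    def _positions(self, key):
        # Derive num_hashes bit positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        # Set every bit this key hashes to.
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        # False means "definitely never added"; True means "maybe".
        return all(self.bits & (1 << pos) for pos in self._positions(key))


bloom = BloomFilter()
bloom.add("foo")
print(bloom.might_contain("foo"))  # True: added keys are always found
```

The filter never grows no matter how many keys go in; what grows instead is the chance that a key that was never added happens to land on bits that are already set.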


Bloom filter example
Bloom filter: Hash functions: x, y, z; 8 bits

Feed key "foo" into bloom filter:
x("foo") maps to:         0010 0000
y("foo") maps to:         0001 0000
z("foo") maps to:         1000 0010
So the filter "result" is 1011 0010

Now feed key "bar" into bloom filter:
x("bar") maps to:         0001 0000
y("bar") maps to:         0000 1000
z("bar") maps to:         0110 0000
So the filter "result" is 0111 1000
Combine (bitwise OR) this with the previous result
     and the filter is now:
                          1111 1010

Now we want to see if "wtf" is in the bloom filter:
x("wtf") maps to:         0001 0001
y("wtf") maps to:         0010 0000
z("wtf") maps to:         1000 0000
Our filter "result" is    1011 0001

So are all these bits set in the filter?
1011 0001 = "wtf"
1111 1010 = filter
y_yy ___n = not in the filter
So we know "wtf" has never been put through this filter.

What about "lol"?
x("lol") maps to: 1100 0000
y("lol") maps to: 0010 0000
z("lol") maps to: 0000 0010
result:           1110 0010

Is "lol" in the filter:
1110 0010 = "lol"
1111 1010 = filter
yyy_ __y_ = it might be!
(but it isn’t)
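
The walkthrough on this slide can be replayed in a few lines. The slide never says what the hash functions x, y, z actually are, so this sketch hard-codes their outputs from the slide in a lookup table; the helper names are made up for the example.

```python
# Bit patterns copied from the slide. The real hash functions x, y, z are
# not specified there, so their outputs are simply hard-coded per key.
HASH_BITS = {
    "foo": (0b0010_0000, 0b0001_0000, 0b1000_0010),
    "bar": (0b0001_0000, 0b0000_1000, 0b0110_0000),
    "wtf": (0b0001_0001, 0b0010_0000, 0b1000_0000),
    "lol": (0b1100_0000, 0b0010_0000, 0b0000_0010),
}


def key_mask(key):
    # OR the three hash outputs into one 8-bit pattern for the key.
    x, y, z = HASH_BITS[key]
    return x | y | z


def add(filter_bits, key):
    # Adding a key ORs its pattern into the filter.
    return filter_bits | key_mask(key)


def might_contain(filter_bits, key):
    # The key may be present only if every one of its bits is set.
    mask = key_mask(key)
    return (filter_bits & mask) == mask


filter_bits = 0
for key in ("foo", "bar"):
    filter_bits = add(filter_bits, key)

print(f"{filter_bits:08b}")               # 11111010
print(might_contain(filter_bits, "wtf"))  # False: definitely never added
print(might_contain(filter_bits, "lol"))  # True: a false positive
```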


