SlideShare a Scribd company logo
1 of 20
Download to read offline
NoSQL
SQL anti patterns and NoSQL alternatives

           Gleicon Moraes

    http://zenmachine.wordpress.com
         http://github.com/gleicon
Doing it wrong, Junior !
SQL Anti patterns
and related stuff


  The eternal tree (rows refer to the table itself - think
  threaded discussion)
  Dynamic table creation (and dynamic query building)
  Table as cache (lets save it in another table)
  Table as queue (wtf)
  Table as log file (table cleaning slave required)
  Stoned Procedures (living la vida business)
  Row Alignment (the careful gentleman)
  Extreme JOINs (app requires a warmed up cache)
  Your scheme must be printed in an A3 sheet.
  Your ORM issue full queries for Dataset iterations
The eternal tree
Problem: Most threaded discussion example uses something
like a table which contains all threads and answers, relating to
each other by an id. Usually the developer will come up with
his own binary-tree version to manage this mess.

id - parent_id -author - text
1 - 0 - gleicon - hello world
2 - 1 - elvis - shout !

NoSQL alternative: Document storage:
{ thread_id:1, title: 'the meeting', author: 'gleicon', replies:[
     {
       'author': elvis, text:'shout', replies:[{...}]
     }
   ]
}
Dynamic table creation
Problem: To avoid huge tables, one must come with a
"dynamic schema". For example, lets think about a document
management company, which is adding new facilities over the
country. For each storage facility, a new table is created:

item_id - row - column - stuff
1 - 10 - 20 - cat food
2 - 12 - 32 - trout

Now you have to come up with "dynamic queries", which will
probably query a "central storage" table and issue a huge join
to check if you have enough cat food over the country.

NoSQL alternative:
- Document storage, modeling a facility as a document
- Key/Value, modeling each facility as a SET
Table as cache
Problem: Complex queries demand that a result be stored in a
separated table, so it can be queried quickly. Worst than views


NoSQL alternative:
- Really ?
- Memcached
- Redis + AOF + EXPIRE
- Denormalization
Table as queue
Problem: A table which holds messages to be completed.
Worse, they must be ordered.

NoSQL alternative:
- RestMQ, Resque
- Any other message broker
- Redis (LISTS - LPUSH + RPOP)
- Use the right tool
Table as log file
Problem: A table in which data gets written as a log file. From
time to time it needs to be purged. Truncating this table once
a day usually is the first task assigned to new DBAs.

NoSQL alternative:
- MongoDB capped collection
- Redis, and a RRD pattern
- RIAK
Stoned procedures
Problem: Stored procedures hold most of your applications
logic. Also, some triggers are used to - well - trigger important
data events.

SP and triggers has the magic property of vanishing of our
memories and being impossible to keep versioned.

NoSQL alternative:
- Now be careful so you dont use map/reduce as stoned
procedures.
- Use your preferred language for business stuff, and let event
handling to pub/sub or message queues.
Row Alignment
Problem: Extra rows are created but not used, just in case.
Usually they are named as a1, a2, a3, a4 and called padding.

There's good will behind that, specially when version 1 of the
software needed an extra column in a 150M lines database
and it took 2 days to run an ALTER TABLE.

NoSQL alternative:
- Document based databases as MongoDB and CouchDB,
where new atributes are local to the document. Also, having no
schema helps

- Column based databases may be not the best choice if
column creation need restart/migrations
Extreme JOINs
Problem: Business stuff modeled as tables. Table inheritance
(Product -> SubProduct_A). To find the complete data for a
user plan, one must issue gigantic queries with lots of JOINs.

NoSQL alternative:
- Document storage, as MongoDB
- Denormalization
- Serialized objects
Your scheme fits in an A3 sheet
Problem: Huge data schemes are difficult to manage. Extreme
specialization creates tables which converges to key/value
model. The normal form get priority over common sense.

Product_A      Product_B
id - desc      id - desc

NoSQL alternative:
- Denormalization
- Another scheme ?
- Document store for flattening model
- Key/Value
Your ORM ...
Problem: Your ORM issue full queries for dataset iterations,
your ORM maps and creates tables which mimics your
classes, even the inheritance, and the performance is bad
because the queries are huge, etc, etc

NoSQL alternative:
Apart from denormalization and good old common sense,
ORMs are trying to bridge two things with distinct impedance.

There is nothing to relational models which maps cleanly to
classes and objects. Not even the basic unit which is the
domain(set) of each column. Black Magic ?
No silver bullet
- Consider alternatives

- Think outside the norm

- Denormalize

- Simplify
Cycle of changes - Product A
1.   There was the database model
2.   Then, the cache was needed. Performance was no good.
3.   Cache key: query, value: resultset
4.   High or inexistent expiration time [w00t]

(Now there's a turning point. Data didn't need to change often.
Denormalization was a given with cache)

5. The cache needs to be warmed or the app wont work.
6. Key/Value storage was a natural choice. No data on MySQL
anymore.
Cycle of changes - Product B
1. Postgres DB storing crawler results.
2. There was a counter in each row, and updating this counter
   caused contention errors.
3. Memcache for reads. Performance is better.
4. First MongoDB test, no more deadlocks from counter
   update.
5. Data model was simplified, the entire crawled doc was
   stored.
Stuff to think about
Think if the data you use aren't denormalized (cached)

Most of the anti-patterns contain signs that the NoSQL route
(or at least a partial NoSQL route) may simplify.

Are you dependent on cache ? Does your application fails
when there is no cache ? Does it just slows down ?

Are you ready to think more about your data ?

Think about the way to put and to get back your data from the
database (be it SQL or NoSQL).
Extra - MongoDB and Redis
The next two slides are here to show what is like to use
MongoDB and Redis for the same task.

There is more to managing your data than stuffing it inside a
database. You gotta plan ahead for searches and migrations.

This example is about storing books and searching between
them. MongoDB makes it simpler, just liek using its query
language. Redis requires that you keep track of tags and ids to
use SET operations to recover which books you want.

Check http://rediscookbook.org and http://cookbook.mongodb.
org/ for recipes on data handling.
MongoDB/Redis recap - Books
MongoDB                                           Redis
{
 'id': 1,
                                                  SET book:1 {'title' : 'Diving into Python',
 'title' : 'Diving into Python',
                                                  'author': 'Mark Pilgrim'}
 'author': 'Mark Pilgrim',
                                                  SET book:2 { 'title' : 'Programing Erlang',
 'tags': ['python','programming', 'computing']
                                                  'author': 'Joe Armstrong'}
}
                                                  SET book:3 { 'title' : 'Programing in Haskell',
                                                  'author': 'Graham Hutton'}
{
 'id':2,
 'title' : 'Programing Erlang',                   SADD tag:python 1
 'author': 'Joe Armstrong',                       SADD tag:erlang 2
 'tags': ['erlang','programming', 'computing',    SADD tag:haskell 3
'distributedcomputing', 'FP']                     SADD tag:programming 1 2 3
}                                                 SADD tag computing 1 2 3
                                                  SADD tag:distributedcomputing 2
{                                                 SADD tag:FP 2 3
 'id':3,
 'title' : 'Programing in Haskell',
 'author': 'Graham Hutton',
 'tags': ['haskell','programming', 'computing',
'FP']
}
MongoDB/Redis recap - Books
MongoDB                                     Redis
Search tags for erlang or haskell:
                                            SINTER 'tag:erlang' 'tag:haskell'
db.books.find({"tags":
                                            0 results
     { $in: ['erlang', 'haskell']
   }
                                            SINTER 'tag:programming' 'tag:computing'
})
                                            3 results: 1, 2, 3
Search tags for erlang AND haskell (no
                                            SUNION 'tag:erlang' 'tag:haskell'
results)
                                            2 results: 2 and 3
db.books.find({"tags":
                                            SDIFF 'tag:programming' 'tag:haskell'
     { $all: ['erlang', 'haskell']
                                            2 results: 1 and 2 (haskell is excluded)
   }
})

This search yields 3 results
db.books.find({"tags":
     { $all: ['programming', 'computing']
   }
})

More Related Content

What's hot

Easy deployment & management of cloud apps
Easy deployment & management of cloud appsEasy deployment & management of cloud apps
Easy deployment & management of cloud appsDavid Cunningham
 
Redis modules 101
Redis modules 101Redis modules 101
Redis modules 101Dvir Volk
 
Top 10 Perl Performance Tips
Top 10 Perl Performance TipsTop 10 Perl Performance Tips
Top 10 Perl Performance TipsPerrin Harkins
 
Jsonnet, terraform & packer
Jsonnet, terraform & packerJsonnet, terraform & packer
Jsonnet, terraform & packerDavid Cunningham
 
Hadoop meetup : HUGFR Construire le cluster le plus rapide pour l'analyse des...
Hadoop meetup : HUGFR Construire le cluster le plus rapide pour l'analyse des...Hadoop meetup : HUGFR Construire le cluster le plus rapide pour l'analyse des...
Hadoop meetup : HUGFR Construire le cluster le plus rapide pour l'analyse des...Modern Data Stack France
 
Devinsampa nginx-scripting
Devinsampa nginx-scriptingDevinsampa nginx-scripting
Devinsampa nginx-scriptingTony Fabeen
 
Storing 16 Bytes at Scale
Storing 16 Bytes at ScaleStoring 16 Bytes at Scale
Storing 16 Bytes at ScaleFabian Reinartz
 
Streams are Awesome - (Node.js) TimesOpen Sep 2012
Streams are Awesome - (Node.js) TimesOpen Sep 2012 Streams are Awesome - (Node.js) TimesOpen Sep 2012
Streams are Awesome - (Node.js) TimesOpen Sep 2012 Tom Croucher
 
Building Scalable, Distributed Job Queues with Redis and Redis::Client
Building Scalable, Distributed Job Queues with Redis and Redis::ClientBuilding Scalable, Distributed Job Queues with Redis and Redis::Client
Building Scalable, Distributed Job Queues with Redis and Redis::ClientMike Friedman
 
Elasticsearch for Logs & Metrics - a deep dive
Elasticsearch for Logs & Metrics - a deep diveElasticsearch for Logs & Metrics - a deep dive
Elasticsearch for Logs & Metrics - a deep diveSematext Group, Inc.
 
Web program-peformance-optimization
Web program-peformance-optimizationWeb program-peformance-optimization
Web program-peformance-optimizationxiaojueqq12345
 
Machine Learning in a Twitter ETL using ELK
Machine Learning in a Twitter ETL using ELK Machine Learning in a Twitter ETL using ELK
Machine Learning in a Twitter ETL using ELK hypto
 
Getting Started with PL/Proxy
Getting Started with PL/ProxyGetting Started with PL/Proxy
Getting Started with PL/ProxyPeter Eisentraut
 
PostgreSQL and Redis - talk at pgcon 2013
PostgreSQL and Redis - talk at pgcon 2013PostgreSQL and Redis - talk at pgcon 2013
PostgreSQL and Redis - talk at pgcon 2013Andrew Dunstan
 
Big Data Day LA 2016/ Big Data Track - Fluentd and Embulk: Collect More Data,...
Big Data Day LA 2016/ Big Data Track - Fluentd and Embulk: Collect More Data,...Big Data Day LA 2016/ Big Data Track - Fluentd and Embulk: Collect More Data,...
Big Data Day LA 2016/ Big Data Track - Fluentd and Embulk: Collect More Data,...Data Con LA
 

What's hot (20)

Easy deployment & management of cloud apps
Easy deployment & management of cloud appsEasy deployment & management of cloud apps
Easy deployment & management of cloud apps
 
Redis modules 101
Redis modules 101Redis modules 101
Redis modules 101
 
Tuning Solr for Logs
Tuning Solr for LogsTuning Solr for Logs
Tuning Solr for Logs
 
Top 10 Perl Performance Tips
Top 10 Perl Performance TipsTop 10 Perl Performance Tips
Top 10 Perl Performance Tips
 
Presto overview
Presto overviewPresto overview
Presto overview
 
Jsonnet, terraform & packer
Jsonnet, terraform & packerJsonnet, terraform & packer
Jsonnet, terraform & packer
 
Hadoop meetup : HUGFR Construire le cluster le plus rapide pour l'analyse des...
Hadoop meetup : HUGFR Construire le cluster le plus rapide pour l'analyse des...Hadoop meetup : HUGFR Construire le cluster le plus rapide pour l'analyse des...
Hadoop meetup : HUGFR Construire le cluster le plus rapide pour l'analyse des...
 
Devinsampa nginx-scripting
Devinsampa nginx-scriptingDevinsampa nginx-scripting
Devinsampa nginx-scripting
 
Storing 16 Bytes at Scale
Storing 16 Bytes at ScaleStoring 16 Bytes at Scale
Storing 16 Bytes at Scale
 
Fluentd meetup #2
Fluentd meetup #2Fluentd meetup #2
Fluentd meetup #2
 
Streams are Awesome - (Node.js) TimesOpen Sep 2012
Streams are Awesome - (Node.js) TimesOpen Sep 2012 Streams are Awesome - (Node.js) TimesOpen Sep 2012
Streams are Awesome - (Node.js) TimesOpen Sep 2012
 
Building Scalable, Distributed Job Queues with Redis and Redis::Client
Building Scalable, Distributed Job Queues with Redis and Redis::ClientBuilding Scalable, Distributed Job Queues with Redis and Redis::Client
Building Scalable, Distributed Job Queues with Redis and Redis::Client
 
Elasticsearch for Logs & Metrics - a deep dive
Elasticsearch for Logs & Metrics - a deep diveElasticsearch for Logs & Metrics - a deep dive
Elasticsearch for Logs & Metrics - a deep dive
 
Docker Monitoring Webinar
Docker Monitoring  WebinarDocker Monitoring  Webinar
Docker Monitoring Webinar
 
Web program-peformance-optimization
Web program-peformance-optimizationWeb program-peformance-optimization
Web program-peformance-optimization
 
Machine Learning in a Twitter ETL using ELK
Machine Learning in a Twitter ETL using ELK Machine Learning in a Twitter ETL using ELK
Machine Learning in a Twitter ETL using ELK
 
The basics of fluentd
The basics of fluentdThe basics of fluentd
The basics of fluentd
 
Getting Started with PL/Proxy
Getting Started with PL/ProxyGetting Started with PL/Proxy
Getting Started with PL/Proxy
 
PostgreSQL and Redis - talk at pgcon 2013
PostgreSQL and Redis - talk at pgcon 2013PostgreSQL and Redis - talk at pgcon 2013
PostgreSQL and Redis - talk at pgcon 2013
 
Big Data Day LA 2016/ Big Data Track - Fluentd and Embulk: Collect More Data,...
Big Data Day LA 2016/ Big Data Track - Fluentd and Embulk: Collect More Data,...Big Data Day LA 2016/ Big Data Track - Fluentd and Embulk: Collect More Data,...
Big Data Day LA 2016/ Big Data Track - Fluentd and Embulk: Collect More Data,...
 

Similar to NoSQL and SQL Anti Patterns

Architectural anti patterns_for_data_handling
Architectural anti patterns_for_data_handlingArchitectural anti patterns_for_data_handling
Architectural anti patterns_for_data_handlingGleicon Moraes
 
Mongodb in-anger-boston-rb-2011
Mongodb in-anger-boston-rb-2011Mongodb in-anger-boston-rb-2011
Mongodb in-anger-boston-rb-2011bostonrb
 
Architectural anti-patterns for data handling
Architectural anti-patterns for data handlingArchitectural anti-patterns for data handling
Architectural anti-patterns for data handlingGleicon Moraes
 
A fast introduction to PySpark with a quick look at Arrow based UDFs
A fast introduction to PySpark with a quick look at Arrow based UDFsA fast introduction to PySpark with a quick look at Arrow based UDFs
A fast introduction to PySpark with a quick look at Arrow based UDFsHolden Karau
 
Mapreduce Algorithms
Mapreduce AlgorithmsMapreduce Algorithms
Mapreduce AlgorithmsAmund Tveit
 
Nyc summit intro_to_cassandra
Nyc summit intro_to_cassandraNyc summit intro_to_cassandra
Nyc summit intro_to_cassandrazznate
 
Understanding and building big data Architectures - NoSQL
Understanding and building big data Architectures - NoSQLUnderstanding and building big data Architectures - NoSQL
Understanding and building big data Architectures - NoSQLHyderabad Scalability Meetup
 
AI與大數據數據處理 Spark實戰(20171216)
AI與大數據數據處理 Spark實戰(20171216)AI與大數據數據處理 Spark實戰(20171216)
AI與大數據數據處理 Spark實戰(20171216)Paul Chao
 
A look under the hood at Apache Spark's API and engine evolutions
A look under the hood at Apache Spark's API and engine evolutionsA look under the hood at Apache Spark's API and engine evolutions
A look under the hood at Apache Spark's API and engine evolutionsDatabricks
 
ACADILD:: HADOOP LESSON
ACADILD:: HADOOP LESSON ACADILD:: HADOOP LESSON
ACADILD:: HADOOP LESSON Padma shree. T
 
Introduction to Spark Datasets - Functional and relational together at last
Introduction to Spark Datasets - Functional and relational together at lastIntroduction to Spark Datasets - Functional and relational together at last
Introduction to Spark Datasets - Functional and relational together at lastHolden Karau
 
Spark Summit EU 2015: Lessons from 300+ production users
Spark Summit EU 2015: Lessons from 300+ production usersSpark Summit EU 2015: Lessons from 300+ production users
Spark Summit EU 2015: Lessons from 300+ production usersDatabricks
 
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...Databricks
 
Ejb3 Struts Tutorial En
Ejb3 Struts Tutorial EnEjb3 Struts Tutorial En
Ejb3 Struts Tutorial EnAnkur Dongre
 
Ejb3 Struts Tutorial En
Ejb3 Struts Tutorial EnEjb3 Struts Tutorial En
Ejb3 Struts Tutorial EnAnkur Dongre
 
Beyond Wordcount with spark datasets (and scalaing) - Nide PDX Jan 2018
Beyond Wordcount  with spark datasets (and scalaing) - Nide PDX Jan 2018Beyond Wordcount  with spark datasets (and scalaing) - Nide PDX Jan 2018
Beyond Wordcount with spark datasets (and scalaing) - Nide PDX Jan 2018Holden Karau
 
Frustration-Reduced Spark: DataFrames and the Spark Time-Series Library
Frustration-Reduced Spark: DataFrames and the Spark Time-Series LibraryFrustration-Reduced Spark: DataFrames and the Spark Time-Series Library
Frustration-Reduced Spark: DataFrames and the Spark Time-Series LibraryIlya Ganelin
 
Tuning and Debugging in Apache Spark
Tuning and Debugging in Apache SparkTuning and Debugging in Apache Spark
Tuning and Debugging in Apache SparkDatabricks
 

Similar to NoSQL and SQL Anti Patterns (20)

Architectural anti patterns_for_data_handling
Architectural anti patterns_for_data_handlingArchitectural anti patterns_for_data_handling
Architectural anti patterns_for_data_handling
 
Scala+data
Scala+dataScala+data
Scala+data
 
Mongodb in-anger-boston-rb-2011
Mongodb in-anger-boston-rb-2011Mongodb in-anger-boston-rb-2011
Mongodb in-anger-boston-rb-2011
 
Architectural anti-patterns for data handling
Architectural anti-patterns for data handlingArchitectural anti-patterns for data handling
Architectural anti-patterns for data handling
 
A fast introduction to PySpark with a quick look at Arrow based UDFs
A fast introduction to PySpark with a quick look at Arrow based UDFsA fast introduction to PySpark with a quick look at Arrow based UDFs
A fast introduction to PySpark with a quick look at Arrow based UDFs
 
Mapreduce Algorithms
Mapreduce AlgorithmsMapreduce Algorithms
Mapreduce Algorithms
 
Nyc summit intro_to_cassandra
Nyc summit intro_to_cassandraNyc summit intro_to_cassandra
Nyc summit intro_to_cassandra
 
Understanding and building big data Architectures - NoSQL
Understanding and building big data Architectures - NoSQLUnderstanding and building big data Architectures - NoSQL
Understanding and building big data Architectures - NoSQL
 
AI與大數據數據處理 Spark實戰(20171216)
AI與大數據數據處理 Spark實戰(20171216)AI與大數據數據處理 Spark實戰(20171216)
AI與大數據數據處理 Spark實戰(20171216)
 
A look under the hood at Apache Spark's API and engine evolutions
A look under the hood at Apache Spark's API and engine evolutionsA look under the hood at Apache Spark's API and engine evolutions
A look under the hood at Apache Spark's API and engine evolutions
 
ACADILD:: HADOOP LESSON
ACADILD:: HADOOP LESSON ACADILD:: HADOOP LESSON
ACADILD:: HADOOP LESSON
 
Introduction to Spark Datasets - Functional and relational together at last
Introduction to Spark Datasets - Functional and relational together at lastIntroduction to Spark Datasets - Functional and relational together at last
Introduction to Spark Datasets - Functional and relational together at last
 
Spark Summit EU 2015: Lessons from 300+ production users
Spark Summit EU 2015: Lessons from 300+ production usersSpark Summit EU 2015: Lessons from 300+ production users
Spark Summit EU 2015: Lessons from 300+ production users
 
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...
 
NoSql Introduction
NoSql IntroductionNoSql Introduction
NoSql Introduction
 
Ejb3 Struts Tutorial En
Ejb3 Struts Tutorial EnEjb3 Struts Tutorial En
Ejb3 Struts Tutorial En
 
Ejb3 Struts Tutorial En
Ejb3 Struts Tutorial EnEjb3 Struts Tutorial En
Ejb3 Struts Tutorial En
 
Beyond Wordcount with spark datasets (and scalaing) - Nide PDX Jan 2018
Beyond Wordcount  with spark datasets (and scalaing) - Nide PDX Jan 2018Beyond Wordcount  with spark datasets (and scalaing) - Nide PDX Jan 2018
Beyond Wordcount with spark datasets (and scalaing) - Nide PDX Jan 2018
 
Frustration-Reduced Spark: DataFrames and the Spark Time-Series Library
Frustration-Reduced Spark: DataFrames and the Spark Time-Series LibraryFrustration-Reduced Spark: DataFrames and the Spark Time-Series Library
Frustration-Reduced Spark: DataFrames and the Spark Time-Series Library
 
Tuning and Debugging in Apache Spark
Tuning and Debugging in Apache SparkTuning and Debugging in Apache Spark
Tuning and Debugging in Apache Spark
 

More from Gleicon Moraes

Como arquiteturas de dados quebram
Como arquiteturas de dados quebramComo arquiteturas de dados quebram
Como arquiteturas de dados quebramGleicon Moraes
 
Arquitetura emergente - sobre cultura devops
Arquitetura emergente - sobre cultura devopsArquitetura emergente - sobre cultura devops
Arquitetura emergente - sobre cultura devopsGleicon Moraes
 
DNAD 2015 - Como a arquitetura emergente de sua aplicação pode jogar contra ...
DNAD 2015  - Como a arquitetura emergente de sua aplicação pode jogar contra ...DNAD 2015  - Como a arquitetura emergente de sua aplicação pode jogar contra ...
DNAD 2015 - Como a arquitetura emergente de sua aplicação pode jogar contra ...Gleicon Moraes
 
QCon SP 2015 - Advogados do diabo: como a arquitetura emergente de sua aplica...
QCon SP 2015 - Advogados do diabo: como a arquitetura emergente de sua aplica...QCon SP 2015 - Advogados do diabo: como a arquitetura emergente de sua aplica...
QCon SP 2015 - Advogados do diabo: como a arquitetura emergente de sua aplica...Gleicon Moraes
 
Por trás da infraestrutura do Cloud - Campus Party 2014
Por trás da infraestrutura do Cloud - Campus Party 2014Por trás da infraestrutura do Cloud - Campus Party 2014
Por trás da infraestrutura do Cloud - Campus Party 2014Gleicon Moraes
 
A closer look to locaweb IaaS
A closer look to locaweb IaaSA closer look to locaweb IaaS
A closer look to locaweb IaaSGleicon Moraes
 
Semi Automatic Sentiment Analysis
Semi Automatic Sentiment AnalysisSemi Automatic Sentiment Analysis
Semi Automatic Sentiment AnalysisGleicon Moraes
 
L'esprit de l'escalier
L'esprit de l'escalierL'esprit de l'escalier
L'esprit de l'escalierGleicon Moraes
 
OSCon - Performance vs Scalability
OSCon - Performance vs ScalabilityOSCon - Performance vs Scalability
OSCon - Performance vs ScalabilityGleicon Moraes
 
Architectural Anti Patterns - Notes on Data Distribution and Handling Failures
Architectural Anti Patterns - Notes on Data Distribution and Handling FailuresArchitectural Anti Patterns - Notes on Data Distribution and Handling Failures
Architectural Anti Patterns - Notes on Data Distribution and Handling FailuresGleicon Moraes
 
Architecture by Accident
Architecture by AccidentArchitecture by Accident
Architecture by AccidentGleicon Moraes
 
Dlsecyx pgroammr (Dyslexic Programmer - cool stuff for scaling)
Dlsecyx pgroammr (Dyslexic Programmer - cool stuff for scaling)Dlsecyx pgroammr (Dyslexic Programmer - cool stuff for scaling)
Dlsecyx pgroammr (Dyslexic Programmer - cool stuff for scaling)Gleicon Moraes
 

More from Gleicon Moraes (15)

Como arquiteturas de dados quebram
Como arquiteturas de dados quebramComo arquiteturas de dados quebram
Como arquiteturas de dados quebram
 
Arquitetura emergente - sobre cultura devops
Arquitetura emergente - sobre cultura devopsArquitetura emergente - sobre cultura devops
Arquitetura emergente - sobre cultura devops
 
API Gateway report
API Gateway reportAPI Gateway report
API Gateway report
 
DNAD 2015 - Como a arquitetura emergente de sua aplicação pode jogar contra ...
DNAD 2015  - Como a arquitetura emergente de sua aplicação pode jogar contra ...DNAD 2015  - Como a arquitetura emergente de sua aplicação pode jogar contra ...
DNAD 2015 - Como a arquitetura emergente de sua aplicação pode jogar contra ...
 
QCon SP 2015 - Advogados do diabo: como a arquitetura emergente de sua aplica...
QCon SP 2015 - Advogados do diabo: como a arquitetura emergente de sua aplica...QCon SP 2015 - Advogados do diabo: como a arquitetura emergente de sua aplica...
QCon SP 2015 - Advogados do diabo: como a arquitetura emergente de sua aplica...
 
Por trás da infraestrutura do Cloud - Campus Party 2014
Por trás da infraestrutura do Cloud - Campus Party 2014Por trás da infraestrutura do Cloud - Campus Party 2014
Por trás da infraestrutura do Cloud - Campus Party 2014
 
Locaweb cloud and sdn
Locaweb cloud and sdnLocaweb cloud and sdn
Locaweb cloud and sdn
 
A closer look to locaweb IaaS
A closer look to locaweb IaaSA closer look to locaweb IaaS
A closer look to locaweb IaaS
 
Semi Automatic Sentiment Analysis
Semi Automatic Sentiment AnalysisSemi Automatic Sentiment Analysis
Semi Automatic Sentiment Analysis
 
L'esprit de l'escalier
L'esprit de l'escalierL'esprit de l'escalier
L'esprit de l'escalier
 
OSCon - Performance vs Scalability
OSCon - Performance vs ScalabilityOSCon - Performance vs Scalability
OSCon - Performance vs Scalability
 
Architectural Anti Patterns - Notes on Data Distribution and Handling Failures
Architectural Anti Patterns - Notes on Data Distribution and Handling FailuresArchitectural Anti Patterns - Notes on Data Distribution and Handling Failures
Architectural Anti Patterns - Notes on Data Distribution and Handling Failures
 
Architecture by Accident
Architecture by AccidentArchitecture by Accident
Architecture by Accident
 
Patterns of fail
Patterns of failPatterns of fail
Patterns of fail
 
Dlsecyx pgroammr (Dyslexic Programmer - cool stuff for scaling)
Dlsecyx pgroammr (Dyslexic Programmer - cool stuff for scaling)Dlsecyx pgroammr (Dyslexic Programmer - cool stuff for scaling)
Dlsecyx pgroammr (Dyslexic Programmer - cool stuff for scaling)
 

Recently uploaded

WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 

Recently uploaded (20)

WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 

NoSQL and SQL Anti Patterns

  • 1. NoSQL SQL anti patterns and NoSQL alternatives Gleicon Moraes http://zenmachine.wordpress.com http://github.com/gleicon
  • 2. Doing it wrong, Junior !
  • 3. SQL Anti patterns and related stuff The eternal tree (rows refer to the table itself - think threaded discussion) Dynamic table creation (and dynamic query building) Table as cache (lets save it in another table) Table as queue (wtf) Table as log file (table cleaning slave required) Stoned Procedures (living la vida business) Row Alignment (the careful gentleman) Extreme JOINs (app requires a warmed up cache) Your scheme must be printed in an A3 sheet. Your ORM issue full queries for Dataset iterations
  • 4. The eternal tree Problem: Most threaded discussion example uses something like a table which contains all threads and answers, relating to each other by an id. Usually the developer will come up with his own binary-tree version to manage this mess. id - parent_id -author - text 1 - 0 - gleicon - hello world 2 - 1 - elvis - shout ! NoSQL alternative: Document storage: { thread_id:1, title: 'the meeting', author: 'gleicon', replies:[ { 'author': elvis, text:'shout', replies:[{...}] } ] }
  • 5. Dynamic table creation Problem: To avoid huge tables, one must come with a "dynamic schema". For example, lets think about a document management company, which is adding new facilities over the country. For each storage facility, a new table is created: item_id - row - column - stuff 1 - 10 - 20 - cat food 2 - 12 - 32 - trout Now you have to come up with "dynamic queries", which will probably query a "central storage" table and issue a huge join to check if you have enough cat food over the country. NoSQL alternative: - Document storage, modeling a facility as a document - Key/Value, modeling each facility as a SET
  • 6. Table as cache Problem: Complex queries demand that a result be stored in a separated table, so it can be queried quickly. Worst than views NoSQL alternative: - Really ? - Memcached - Redis + AOF + EXPIRE - Denormalization
  • 7. Table as queue Problem: A table which holds messages to be completed. Worse, they must be ordered. NoSQL alternative: - RestMQ, Resque - Any other message broker - Redis (LISTS - LPUSH + RPOP) - Use the right tool
  • 8. Table as log file Problem: A table in which data gets written as a log file. From time to time it needs to be purged. Truncating this table once a day usually is the first task assigned to new DBAs. NoSQL alternative: - MongoDB capped collection - Redis, and a RRD pattern - RIAK
  • 9. Stoned procedures Problem: Stored procedures hold most of your applications logic. Also, some triggers are used to - well - trigger important data events. SP and triggers has the magic property of vanishing of our memories and being impossible to keep versioned. NoSQL alternative: - Now be careful so you dont use map/reduce as stoned procedures. - Use your preferred language for business stuff, and let event handling to pub/sub or message queues.
  • 10. Row Alignment Problem: Extra rows are created but not used, just in case. Usually they are named as a1, a2, a3, a4 and called padding. There's good will behind that, specially when version 1 of the software needed an extra column in a 150M lines database and it took 2 days to run an ALTER TABLE. NoSQL alternative: - Document based databases as MongoDB and CouchDB, where new atributes are local to the document. Also, having no schema helps - Column based databases may be not the best choice if column creation need restart/migrations
  • 11. Extreme JOINs Problem: Business stuff modeled as tables. Table inheritance (Product -> SubProduct_A). To find the complete data for a user plan, one must issue gigantic queries with lots of JOINs. NoSQL alternative: - Document storage, as MongoDB - Denormalization - Serialized objects
  • 12. Your scheme fits in an A3 sheet Problem: Huge data schemes are difficult to manage. Extreme specialization creates tables which converges to key/value model. The normal form get priority over common sense. Product_A Product_B id - desc id - desc NoSQL alternative: - Denormalization - Another scheme ? - Document store for flattening model - Key/Value
  • 13. Your ORM ... Problem: Your ORM issue full queries for dataset iterations, your ORM maps and creates tables which mimics your classes, even the inheritance, and the performance is bad because the queries are huge, etc, etc NoSQL alternative: Apart from denormalization and good old common sense, ORMs are trying to bridge two things with distinct impedance. There is nothing to relational models which maps cleanly to classes and objects. Not even the basic unit which is the domain(set) of each column. Black Magic ?
  • 14. No silver bullet - Consider alternatives - Think outside the norm - Denormalize - Simplify
  • 15. Cycle of changes - Product A 1. There was the database model 2. Then, the cache was needed. Performance was no good. 3. Cache key: query, value: resultset 4. High or inexistent expiration time [w00t] (Now there's a turning point. Data didn't need to change often. Denormalization was a given with cache) 5. The cache needs to be warmed or the app wont work. 6. Key/Value storage was a natural choice. No data on MySQL anymore.
  • 16. Cycle of changes - Product B 1. Postgres DB storing crawler results. 2. There was a counter in each row, and updating this counter caused contention errors. 3. Memcache for reads. Performance is better. 4. First MongoDB test, no more deadlocks from counter update. 5. Data model was simplified, the entire crawled doc was stored.
  • 17. Stuff to think about Think if the data you use aren't denormalized (cached) Most of the anti-patterns contain signs that the NoSQL route (or at least a partial NoSQL route) may simplify. Are you dependent on cache ? Does your application fails when there is no cache ? Does it just slows down ? Are you ready to think more about your data ? Think about the way to put and to get back your data from the database (be it SQL or NoSQL).
  • 18. Extra - MongoDB and Redis The next two slides are here to show what is like to use MongoDB and Redis for the same task. There is more to managing your data than stuffing it inside a database. You gotta plan ahead for searches and migrations. This example is about storing books and searching between them. MongoDB makes it simpler, just liek using its query language. Redis requires that you keep track of tags and ids to use SET operations to recover which books you want. Check http://rediscookbook.org and http://cookbook.mongodb. org/ for recipes on data handling.
  • 19. MongoDB/Redis recap - Books MongoDB Redis { 'id': 1, SET book:1 {'title' : 'Diving into Python', 'title' : 'Diving into Python', 'author': 'Mark Pilgrim'} 'author': 'Mark Pilgrim', SET book:2 { 'title' : 'Programing Erlang', 'tags': ['python','programming', 'computing'] 'author': 'Joe Armstrong'} } SET book:3 { 'title' : 'Programing in Haskell', 'author': 'Graham Hutton'} { 'id':2, 'title' : 'Programing Erlang', SADD tag:python 1 'author': 'Joe Armstrong', SADD tag:erlang 2 'tags': ['erlang','programming', 'computing', SADD tag:haskell 3 'distributedcomputing', 'FP'] SADD tag:programming 1 2 3 } SADD tag computing 1 2 3 SADD tag:distributedcomputing 2 { SADD tag:FP 2 3 'id':3, 'title' : 'Programing in Haskell', 'author': 'Graham Hutton', 'tags': ['haskell','programming', 'computing', 'FP'] }
  • 20. MongoDB/Redis recap - Books MongoDB Redis Search tags for erlang or haskell: SINTER 'tag:erlang' 'tag:haskell' db.books.find({"tags": 0 results { $in: ['erlang', 'haskell'] } SINTER 'tag:programming' 'tag:computing' }) 3 results: 1, 2, 3 Search tags for erlang AND haskell (no SUNION 'tag:erlang' 'tag:haskell' results) 2 results: 2 and 3 db.books.find({"tags": SDIFF 'tag:programming' 'tag:haskell' { $all: ['erlang', 'haskell'] 2 results: 1 and 2 (haskell is excluded) } }) This search yields 3 results db.books.find({"tags": { $all: ['programming', 'computing'] } })