Relational is the new Big Data by Miguel Ángel Fajardo and Daniel Domínguez at Big Data Spain 2017

Relational databases were the persistence system of choice for decades, until Web 2.0 in the 2000s brought data volumes so large that processing them required distributed systems running in parallel. A new type of database (NoSQL) was adopted to solve this problem in different ways.

https://www.bigdataspain.org/2017/talk/relational-is-the-new-big-data

Big Data Spain 2017
16th - 17th November Kinépolis Madrid

Published in: Technology

  1. RELATIONAL is the new BIG DATA
  2. Daniel Domínguez, Head of Data. Previously: CIEMAT / CERN. @danieluchi01. Miguel Ángel Fajardo, CTO. Previously: EA Games, Gilt, Shutterstock. @ma_bits
  3. Distributed Processing Not Only SQL
  4. A long time ago in a galaxy far, far away...
  5. 1960s Information Management System (IMS) by IBM ● Built for the Saturn V moon rocket ● Hierarchical, tree structure
  6. 1970 Relational Model Paper ● Basis for IBM System R and DB2
  7. 1980s-90s Development of RDBMS ● Widely adopted ● Models easy to define ● ACID transactions ● Clients for all stacks ● ORMs
  8. 2000s Web 2.0 ● Large volume (petabytes) ● Faster networks and devices ● Systems must scale
  9. Problems scaling Relational DBs ● Sharding is hard ● Keeping transactions ACID is hard ● Two-phase commit is hard ● Parallelizing is hard
  10. Distributed Processing Not Only SQL
  11. Relational databases The CAP theorem
  12. Key-value stores ○ User session data ○ Component configuration ○ Cached data, fast access ○ Complex queries ○ Interconnected data
  13. Column-oriented DB ○ Real time analytics ○ Facebook Messenger ○ Queries against few rows ○ Flexible data schemas ○ Incremental data loads/deletes
  14. Document-oriented DB ○ Records with different fields ○ Models with many layers ○ Joins ○ Flexible queries
  15. Graph DB ○ Routing ○ Social networks ○ Disease spreading ○ Hard to do aggregates ○ Analytics
  16. No one magic database to rule them all ● Each of them fits a small number of use cases ● Often hard, complex and expensive to maintain ● Specific query languages (e.g. CQL)
  17. MEANWHILE IN THE RELATIONAL BATCAVE
  18. 2010s Relational strikes back ● Less structured data formats ● Partitioning ● Parallel execution ● Sharding ● C, A and P?
  19. ● ACID for queries going to a single shard ● Open Source, DBaaS ● PostgreSQL extension ● Interactive analytics ● Multi-tenant / ● Fully ACID ● Open Source ● PostgreSQL fork ● Scaling intensive ● Multi-tenant
  20. WHEN YOU HAVE A HAMMER...
  21. Questions? tech.geoblink.com
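
Slides 9, 18 and 19 mention sharding, and ACID guarantees for queries that go to a single shard. The Python sketch below is one way to picture those two points. It is not taken from the talk or from any particular database: the function names (`shard_for`, `is_single_shard`, `moved_fraction`) and the hash-modulo placement scheme are assumptions made only for illustration.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index using a stable hash.

    Assumption: simple hash-modulo placement. hashlib is used because
    Python's built-in hash() for strings is salted per process.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def is_single_shard(keys, num_shards: int) -> bool:
    """True if every key lives on the same shard, so a local transaction
    on that shard is enough and no cross-shard coordination is needed."""
    return len({shard_for(k, num_shards) for k in keys}) == 1


def moved_fraction(keys, old_shards: int, new_shards: int) -> float:
    """Fraction of keys that change shard when the shard count changes."""
    keys = list(keys)
    moved = sum(
        1 for k in keys if shard_for(k, old_shards) != shard_for(k, new_shards)
    )
    return moved / len(keys)


if __name__ == "__main__":
    tenants = [f"tenant-{i}" for i in range(1000)]  # hypothetical keys
    # All rows of one tenant share a shard key, so they land on one shard.
    print(is_single_shard(["tenant-42", "tenant-42"], 4))
    # Going from 4 to 5 shards relocates most keys under hash-modulo.
    print(moved_fraction(tenants, 4, 5))
```

Under this scheme a multi-tenant workload keyed by tenant stays on one shard per tenant, which is the easy case. A query that spans tenants touches several shards and needs some form of coordination, such as two-phase commit. Changing the shard count relocates most keys, which is one reason resharding is costly with naive placement.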
