The Coming Database Revolution

For more than a decade, the evolution of databases was governed largely by incremental improvements in the major RDBMS products. Then, suddenly, in the past few years, a whole series of innovations started to arrive. This presentation will touch on the most significant, including these "Top 12":

The impact of SSD
Vector registers
The ARM processor
Column store databases and analytic databases
In-memory architecture and databases
NoSQL and the failure of SQL
Big data/machine data
Hadoop and friends
Data virtualization
Cloud databases (database-as-a-service)
Streaming and time series databases
A mathematics of data

Transcript

Slide 1. THE DATABASE REVOLUTION
Robin Bloor, PhD
Tuesday, August 2, 11

Slide 2. This Presentation
Intro: the RDBMS
Computer hardware trends
The NoSQL trend (either "No" as in none, or "NO" as in Not Only)
What to do...
Main take away: database is no longer a commodity

Slide 3. A Point Of Departure
In the 1990s, the relational database quickly became the dominant form of database, and the SQL language became the dominant data access mechanism. The RDBMS conferred mathematical respectability on itself and even claimed an underlying “Relational Algebra.” The RDBMS dominated because it dealt effectively with both transactional and BI applications.

Slide 4. Relational Dogma
Data and process should be kept separate.
The database embodies a data model within a schema.
Normalization to 3NF (or 5NF) is the correct way to design the schema.
The query language (SQL) is part DDL and part DML (Select, Project, Join).
Ordering doesn’t matter.

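The three DML operations the slide lists can be illustrated with a minimal Python sketch that treats tables as lists of dicts. The table names, column names and helper functions are hypothetical, chosen only to show what select, project and join each do; no RDBMS executes queries this way.

```python
def select(rows, predicate):
    """Selection (restriction): keep only the rows that satisfy the predicate."""
    return [r for r in rows if predicate(r)]

def project(rows, columns):
    """Projection: keep only the named columns, dropping duplicate rows."""
    result = []
    for r in rows:
        t = {c: r[c] for c in columns}
        if t not in result:
            result.append(t)
    return result

def join(left, right, key):
    """Equi-join on a single shared column, merging matching rows."""
    return [{**l, **r} for l in left for r in right if l[key] == r[key]]

# Hypothetical tables
customers = [
    {"cust_id": 1, "name": "Acme", "region": "EU"},
    {"cust_id": 2, "name": "Globex", "region": "US"},
]
orders = [
    {"order_id": 10, "cust_id": 1, "amount": 250},
    {"order_id": 11, "cust_id": 1, "amount": 75},
    {"order_id": 12, "cust_id": 2, "amount": 500},
]

# "Orders placed by EU customers": select, then join, then project.
eu_orders = project(
    join(select(customers, lambda c: c["region"] == "EU"), orders, "cust_id"),
    ["name", "order_id", "amount"],
)
print(eu_orders)
# [{'name': 'Acme', 'order_id': 10, 'amount': 250}, {'name': 'Acme', 'order_id': 11, 'amount': 75}]
```

Note that `project` removes duplicates because a relation is a set; the list order that survives here is incidental, which is exactly the "ordering doesn't matter" assumption the slide lists as dogma.
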
Slide 5. The 1990s RDBMS
The RDBMS of the 1990s was physically based on B-tree structures and an optimizer. This scaled up within reason, but it scaled out poorly. It was fundamentally an index-based data store. It managed megabytes and gigabytes fine. But look what happened to data...

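An index-based store answers a lookup by binary-searching sorted keys rather than scanning every row. A minimal sketch, assuming a sorted Python list searched with the standard `bisect` module as a simplified stand-in for a B-tree (a real B-tree keeps keys in fixed-size pages on disk and rebalances as it grows; none of that is modeled). The `SortedIndex` class and the table contents are hypothetical.

```python
import bisect

class SortedIndex:
    """Simplified stand-in for a B-tree index: keys kept in sorted order,
    each paired with the position (row id) of its row in the table."""

    def __init__(self):
        self.keys = []
        self.row_ids = []

    def insert(self, key, row_id):
        # Binary search for the insertion point; duplicate keys keep arrival order.
        # (list.insert shifts elements, O(n); a real B-tree splits pages instead.)
        pos = bisect.bisect_right(self.keys, key)
        self.keys.insert(pos, key)
        self.row_ids.insert(pos, row_id)

    def lookup(self, key):
        """Row ids whose key equals `key`, found by binary search, not a scan."""
        lo = bisect.bisect_left(self.keys, key)
        hi = bisect.bisect_right(self.keys, key)
        return self.row_ids[lo:hi]

    def key_range(self, low, high):
        """Row ids whose key k satisfies low <= k < high."""
        lo = bisect.bisect_left(self.keys, low)
        hi = bisect.bisect_left(self.keys, high)
        return self.row_ids[lo:hi]

# Hypothetical table: the row id is simply the list position.
table = ["Globex", "Acme", "Initech", "Acme"]
idx = SortedIndex()
for row_id, name in enumerate(table):
    idx.insert(name, row_id)

print(idx.lookup("Acme"))         # [1, 3]
print(idx.key_range("B", "J"))    # [0, 2]
```

Each lookup costs O(log n) comparisons on one machine; the sketch says nothing about partitioning the index across machines, which is where the slide places the weakness.
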
Slide 6. Moore’s Law Cubed
Moore’s Law suggests that CPU power increases 10-fold every 6 years (and other technologies have stayed in step to some degree). Over the same periods, large database volumes have grown 1000-fold: in ~1992 they were measured in megabytes, in ~1998 in gigabytes, in ~2004 in terabytes and in ~2010 in petabytes. Exabytes by ~2016?

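The "cubed" in the title is the ratio of the two growth rates: 1000 is 10³, so over each 6-year period data volumes grew as the cube of CPU power. A small calculation of the implied annual rates and doubling times, using only the slide's two figures (the function names are illustrative):

```python
import math

PERIOD_YEARS = 6
CPU_FACTOR = 10      # slide: CPU power grows ~10x every 6 years
DATA_FACTOR = 1000   # slide: large database volumes grow ~1000x every 6 years

def annual_growth(factor, years):
    """Compound annual growth rate implied by a total growth factor over `years`."""
    return factor ** (1 / years) - 1

def doubling_time(factor, years):
    """Years needed to double, at the rate implied by `factor` over `years`."""
    return years * math.log(2) / math.log(factor)

# "Cubed": the data growth factor equals the CPU growth factor to this power.
exponent = math.log10(DATA_FACTOR) / math.log10(CPU_FACTOR)
print(f"exponent: {exponent}")

for label, factor in [("CPU", CPU_FACTOR), ("data", DATA_FACTOR)]:
    print(f"{label}: {annual_growth(factor, PERIOD_YEARS):.1%}/year, "
          f"doubles every {doubling_time(factor, PERIOD_YEARS) * 12:.1f} months")
```

With these figures, CPU power grows roughly 47% a year (doubling about every 22 months), while data grows roughly 216% a year (doubling about every 7 months).
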
Slide 7. HARDWARE

Slide 8. RDBMS

Slide 10. RDBMS

Slides 11-14. A Database is a Cupboard
Some are transactional (for operational systems): RDBMS ✔
Some service large queries against large data heaps: RDBMS ??
Some are content-oriented, for accessing complex objects (object-based systems mainly): RDBMS ??
All databases need to deliver performance.

Slide 15. Hardware Data Points
Moore’s Law now proceeds by adding cores rather than by increasing clock speed. Vector registers are now standard on Intel chips. Parallelism is on the rise and will eventually become the normal mode of processing. Memory is about 1 million times faster than disk, and random reads have become very expensive in terms of latency. The Intel processor is now being challenged by the ARM processor (it’s about heat).

Slide 16. Memory v Disk

Slide 17. Memory v Disk
On current trends, the decline in memory costs is likely to make memory cheaper than disk around 2016. This means that non-volatile SSDs will prevail relatively soon. SSDs are between 1,000 and 100,000 times faster than spinning disk.

Slide 18. Massive Scale-Out
CPUs are now doubling their core counts every 18 months or so. This trend, combined with memory cost trends, suggests that massive scale-out will eventually become a much rarer requirement. But we cannot know that for sure.

Slide 19. Consequences
SSD will replace disk, but slowly...
Many DBMS tasks can now be handled in memory, but better physical architectures are possible for this.
Physical indexes are becoming irrelevant.
Scale-out and parallelism are now the driving force for applications with large data volumes.
The physical architecture of the traditional RDBMS is now an anachronism.

Slide 20. NoSQL

Slide 21. A Plethora of Databases
4th Dimension, Adabas D, AllegroGraph, Alpha Five, Altibase, Apache Derby, Aster Data, Azure Table Storage, BaseX, Berkeley DB, Bigdata, BlackRay, CA-Datacom, Cassandra, Chordless, Citrusleaf, Clarion, Cloudata, Cloudera, Clustrix, CouchDB, CSQL, CUBRID, Daffodil database, Data Management Center (DMC), Database Management Library, DataEase, Dataphor, DB-Fast, db4o, Derby aka Java DB, DEX, Dynomite, EffiProz, ElevateDB, Empress Embedded Database, EnterpriseDB, eXist, eXtremeDB, Faircom C-Tree, fastDB, FileDB, FileMaker Pro, Firebird, FlockDB, FrontBase, GenieDB, GigaSpaces, Gladius DB, Greenplum, GroveSite, GT.M, H2, Hadoop/HBase, HamsterDB, Hazelcast, Helix database, Hibari, HPCC, HSQLDB, HyperGraphDB, Hypertable, IBM DB2, IBM DB2 Express-C, IBM Lotus Approach, IBM Lotus/Domino, Infinite Graph, Infobright, InfoGrid, Informix, Ingres, InterBase, InterSystems Caché, ISIS Family, KAI, Kognitio, LightCloud, Linter, Magma, MariaDB, Mark Logic Server, MaxDB, Mckoi SQL Database, MEMBASE, MemcacheDB, Microsoft Access, Microsoft Jet Database Engine (part of Microsoft Access), Microsoft SQL Server, Microsoft SQL Server Express, Microsoft Visual FoxPro, Mimer SQL, Mnesia, MonetDB, MongoDB, Morantex, mSQL, MySQL, Neo4J, NEO, NonStop SQL, Objectivity, Openbase, OpenInsight, OpenLink Virtuoso, OpenLink Virtuoso Universal Server, OpenQM, Oracle, Oracle Rdb for OpenVMS, OrientDB, Panorama, Perst, PervasiveSQL, PicoLisp, Pincaster, PostgreSQL, Prevayler, Progress Software, Qizx, Queplix, RaptorDB, RavenDB, RDM Embedded, RDM Server, Recutils, Redis, Riak, SAND CDBMS, Sav Zigzag, Scalaris, Scalien, SciDB, ScimoreDB, Sedna, SisoDB, SmallSQL, solidDB, Sones, SQLBase, SQLDB, SQLite, Starcounter, Sterling, Stratosphere, STSdb, Sybase, Sybase IQ, tdbengine, Teradata, Terrastore, The SAS system, ThruDB, TimesTen, Tokutek, Trinity, txtSQL, U2, UniData, UniVerse, Valentina, Versant, VertexDB, Vertica, VistaDB, VMDS, Voldemort, WCE SL Plus, XSPRADA, Yserial, ZODB, Zoduna
(Diagram labels overlaid on the list: Network DBMS, OLAP, RDBMS, ODBMS, Open Source DBMS, In-Memory DBMS, XML DBMS, Text/Content DBMS, Analytic DBMS (MPP), Column Store, Streams DBMS, Temporal DBMS, Hadoop & HBase, Hypermedia DBMS, Graph DBMS, Algebraic DBMS, Cloud DBMS, Triple Stores.)

Slide 22. RDBMS & SQL As Anachronisms
For big BI, the RDBMS has been superseded by the column-store DBMS, primarily because it didn’t scale out and because indexes have become far less important. The use of snowflake schemas and star schemas had already demonstrated that 3NF was a limited modeling technique and nothing more. And then came Hadoop and MapReduce for massive scale-out, which care nothing for SQL or the RDBMS.

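One reason column stores suit big BI workloads is storage layout: an analytic query that aggregates one column needs only that column's values, not every field of every row. A minimal Python sketch of the two layouts, using a hypothetical sales table; real column stores add compression, vectorized execution and MPP distribution, none of which is modeled here.

```python
from array import array

# Row-oriented layout: each record's fields are stored together (hypothetical sales table).
rows = [
    ("2011-08-01", "EU", 250.0),
    ("2011-08-01", "US", 500.0),
    ("2011-08-02", "EU", 75.0),
]

# Column-oriented layout: each attribute's values are stored contiguously.
columns = {
    "day": [r[0] for r in rows],
    "region": [r[1] for r in rows],
    "amount": array("d", (r[2] for r in rows)),  # packed array of doubles
}

def total_row_store(rows):
    """Row layout: every record is visited just to read one field from it."""
    return sum(r[2] for r in rows)

def total_column_store(columns):
    """Column layout: only the 'amount' column is read."""
    return sum(columns["amount"])

def total_by_region(columns, region):
    """Filter on one column and aggregate another, matching values by position."""
    return sum(a for g, a in zip(columns["region"], columns["amount"]) if g == region)

print(total_row_store(rows), total_column_store(columns))  # 825.0 825.0
print(total_by_region(columns, "EU"))                      # 325.0
```

Both layouts give the same answer; the difference is how much data must be read to get it, which grows with the number of columns a wide BI table carries.
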
Slide 23. A Fundamental Error
Actions: Add, Modify, Delete, Archive.
From day 1 there was a fundamental error in the simple mechanics of databases and file systems: when you update data, you destroy the old value. There is no audit trail. A correct theory of data was invented by (perhaps) Luca Pacioli; it is the basis of accounting. A few databases (Firebird is one) were built so that data was only ever added or archived.

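The add-only approach the slide describes can be sketched in the spirit of an accounting ledger, where entries are never erased: every change is appended as a new fact, the current value is derived from the log, and the full history survives as an audit trail. A minimal in-memory sketch; the `AppendOnlyStore` class and its methods are hypothetical, and this is not how Firebird or any other product is implemented.

```python
from dataclasses import dataclass
from typing import Any, Optional

@dataclass(frozen=True)
class Fact:
    seq: int          # position in the log; the log is never rewritten
    key: str
    action: str       # "add", "modify" or "archive"
    value: Any = None

class AppendOnlyStore:
    def __init__(self):
        self._log: list[Fact] = []

    def _append(self, key, action, value=None):
        self._log.append(Fact(len(self._log), key, action, value))

    def add(self, key, value):
        self._append(key, "add", value)

    def modify(self, key, value):
        # The old value is not destroyed: a newer fact supersedes it.
        self._append(key, "modify", value)

    def archive(self, key):
        # "Delete" becomes an archive marker; earlier facts remain readable.
        self._append(key, "archive")

    def current(self, key) -> Optional[Any]:
        """Latest value for key, or None if never added or since archived."""
        value = None
        for f in self._log:
            if f.key == key:
                value = None if f.action == "archive" else f.value
        return value

    def history(self, key):
        """Audit trail: every fact recorded for key, in order."""
        return [(f.action, f.value) for f in self._log if f.key == key]

store = AppendOnlyStore()
store.add("acct:42", 100)
store.modify("acct:42", 80)
print(store.current("acct:42"))   # 80
print(store.history("acct:42"))   # [('add', 100), ('modify', 80)]
store.archive("acct:42")
print(store.current("acct:42"))   # None
```

Deriving `current` by replaying the log is the simplest correct choice for a sketch; a practical system would also maintain a materialized current-state view.
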
Slide 24. The Ordering Of Data
“A data set is an unordered collection of unique, non-duplicated items.” This is an absurd constraint to place upon data, as data is naturally ordered by time if by nothing else. Events are ordered by time. Changes to entities are ordered by time. There are lots of applications requiring time-series capability. This has led to TSDB products like StreamBase, Vhayu, OpenTSDB, etc.

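Once data is kept in time order, two common time-series operations become simple: an "as-of" lookup (which value was in effect at a given moment) and aggregation by time bucket. A minimal Python sketch with a hypothetical `TimeSeries` class and made-up readings; it is not modeled on StreamBase, Vhayu, OpenTSDB or any other product.

```python
import bisect
from collections import defaultdict
from datetime import datetime

class TimeSeries:
    """Hypothetical minimal time series: (timestamp, value) pairs kept in time order."""

    def __init__(self):
        self.times = []
        self.values = []

    def append(self, ts, value):
        # Events usually arrive in order, so this normally lands at the end;
        # a late event is placed at its correct position instead.
        pos = bisect.bisect_right(self.times, ts)
        self.times.insert(pos, ts)
        self.values.insert(pos, value)

    def as_of(self, ts):
        """Value in effect at ts: the latest observation at or before ts, else None."""
        pos = bisect.bisect_right(self.times, ts)
        return self.values[pos - 1] if pos else None

    def daily_totals(self):
        """Sum values per calendar day; keys come out in time order."""
        totals = defaultdict(float)
        for t, v in zip(self.times, self.values):
            totals[t.date()] += v
        return dict(totals)

series = TimeSeries()
series.append(datetime(2011, 8, 1, 9, 0), 10.0)
series.append(datetime(2011, 8, 1, 17, 0), 12.5)
series.append(datetime(2011, 8, 2, 9, 0), 11.0)
series.append(datetime(2011, 8, 1, 12, 0), 3.0)   # late-arriving event

print(series.as_of(datetime(2011, 8, 1, 13, 0)))  # 3.0
print(series.daily_totals())
```

Both queries depend on the order being kept by the store itself, which is the capability the slide says an "unordered collection" gives up.
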
Slide 25. The Separation of Data and Process
The assumption was that this separation could be enforced. But when you try to enforce it, you forever encounter data and process locked together in a guilty embrace. It is a wrong separation of concerns. In truth, it cannot be enforced without a true algebra of data. So many databases (object databases and other NoSQL databases) do not enforce it. However, their interfaces to data are not perfect either.
(Diagram labels: Process, SQL, Schema, DBMS.)

Slide 26. Relational Algebra Isn’t An Algebra
Set aside the fact that RDBMSs focus so strongly on table structures that they cannot naturally represent other important data structures (such as BOMP and MOLAP), and that RDBMSs rail against the ordering of data (“No order”). Ignore the stored procedures (which violate the separation of data and process). Even so, Relational Algebra is not even an algebra (NULLs?). There is at least one algebraic (NoSQL) database.

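The "(NULLs?)" presumably refers to SQL's three-valued logic: a comparison involving NULL evaluates to neither true nor false but to unknown, so familiar algebraic laws such as x = x, or "every row satisfies p OR NOT p", stop holding. This can be checked with Python's built-in `sqlite3` module, since SQLite follows standard SQL NULL semantics for these comparisons; the table and its data are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (2,), (None,)])

def scalar(sql):
    """Run a query that returns a single value and return that value."""
    return conn.execute(sql).fetchone()[0]

# NULL = NULL is not TRUE: the comparison itself evaluates to NULL ("unknown").
null_eq_null = scalar("SELECT NULL = NULL")

# Reflexivity fails: the NULL row does not satisfy x = x.
reflexive = scalar("SELECT COUNT(*) FROM t WHERE x = x")

# Excluded middle fails: p OR NOT p still skips the NULL row.
excluded_middle = scalar("SELECT COUNT(*) FROM t WHERE x > 1 OR NOT (x > 1)")

total_rows = scalar("SELECT COUNT(*) FROM t")
print(null_eq_null, reflexive, excluded_middle, total_rows)  # None 2 2 3
conn.close()
```

Three rows exist, yet two "always true" predicates each match only two of them.
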
Slide 27. The SQL Barrier
SQL has DDL (for data definition) and DML (for Select, Project and Join), but it has no MML or TML. Usually result sets are brought to the client for further manipulation, but using them for further data access via SQL becomes problematic.
Conclusions: this separation of data from process is arbitrary and unhelpful. Any database to which this doesn’t apply is NoSQL.
(Diagram labels: SQL Barrier, Results, Analytic DBMS, “processing must be done here”.)

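To make the barrier concrete: a query returns a plain result set, further processing happens in the client, and using that processed result for more data access means composing another query and sending it back. A sketch using Python's built-in `sqlite3` module with a hypothetical `readings` table. The running total stands in for any client-side logic; modern SQL window functions can compute this particular case inside the database.

```python
import sqlite3
from itertools import accumulate

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (ts INTEGER, sensor TEXT, value REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [(1, "a", 2.0), (2, "a", 3.5), (3, "b", 1.0), (4, "a", 0.5)],
)

# 1. DML crosses the barrier: the client receives a plain result set.
rows = conn.execute(
    "SELECT ts, value FROM readings WHERE sensor = ? ORDER BY ts", ("a",)
).fetchall()

# 2. Further manipulation happens in the client, outside the DBMS.
running_total = list(accumulate(value for _, value in rows))
print(running_total)  # [2.0, 5.5, 6.0]

# 3. Using that client-side result for further data access means
#    composing a new query and sending it back across the barrier.
first_ts_over_5 = next(ts for (ts, _), total in zip(rows, running_total) if total > 5)
later = conn.execute(
    "SELECT sensor, value FROM readings WHERE ts > ? ORDER BY ts", (first_ts_over_5,)
).fetchall()
print(later)  # [('b', 1.0), ('a', 0.5)]
conn.close()
```

Every step that crosses back into the database is another round trip with its own query text, which is the friction the slide describes.
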
Slide 28. Other NDBMS Directions
Some NDBMS (NoSQL DBMS) do not attempt to provide all the ACID properties (Atomicity, Consistency, Isolation, Durability).
Some NDBMS deploy a distributed scale-out architecture with data redundancy.
XML DBMS using XQuery are NDBMS.
Some document stores are NDBMS (OrientDB, Terrastore, etc.).
Object databases are NDBMS (GemStone, Objectivity, ObjectStore, etc.).
Key-value stores, i.e. schema-less stores (Cassandra, MongoDB, Berkeley DB, etc.), are NDBMS.
Graph DBMS (DEX, OrientDB, etc.) are NDBMS.
Large data pools (BigTable, HBase, Mnesia, etc.) are NDBMS.

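The "schema-less" idea can be illustrated with a toy key-value store: values are opaque serialized blobs (JSON here), so two records in the same key namespace need not share any fields, and the store enforces no schema at all. A minimal sketch; the `KeyValueStore` class and its `put`/`get`/`delete` methods are hypothetical and not modeled on any product's API.

```python
import json

class KeyValueStore:
    """Toy schema-less store: keys map to serialized values the store never inspects."""

    def __init__(self):
        self._data: dict[str, bytes] = {}

    def put(self, key: str, value) -> None:
        self._data[key] = json.dumps(value).encode("utf-8")

    def get(self, key: str, default=None):
        raw = self._data.get(key)
        return default if raw is None else json.loads(raw)

    def delete(self, key: str) -> None:
        self._data.pop(key, None)

kv = KeyValueStore()
kv.put("user:1", {"name": "Ada", "email": "ada@example.com"})
kv.put("user:2", {"name": "Bob", "tags": ["admin"], "age": 40})  # different fields, no schema change
print(kv.get("user:2")["tags"])   # ['admin']
print(kv.get("user:3", {}))       # {}
```

Any structure lives in the application that reads the values, which is the trade-off these stores make in exchange for flexibility and simple partitioning by key.
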
Slide 29. What To Do...

Slide 30. What Is The Problem You Are Trying To Solve?
The primary message of this presentation is that database is no longer a commodity (if it ever was). Despite its faults and weaknesses, the general-purpose relational database works fine for many areas of application, and:
it is well understood;
skills (for any popular product) are abundant;
it can be inexpensive (by license or open source).
Beyond such products, it is “horses for courses” and “caveat emptor.”

Slide 31. Other Selection Criteria
Don’t fall for fashion.
Proven performance?
Skills, both for design and for administration.
Interfaces and middleware.
The hardware bill.
Product roadmap.
External support/internal support.
Calculate a TCO (note that even for expensive DBMSs, the license fees are rarely more than 15% of the TCO).

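A TCO calculation sums every cost line over the system's life and shows what share the license represents. The cost categories below echo the slide's selection criteria; every figure is a made-up placeholder, not data from the presentation.

```python
# Hypothetical 5-year cost lines (placeholder figures, in thousands).
costs = {
    "licenses": 300,
    "hardware": 400,
    "design_and_development_skills": 600,
    "administration": 500,
    "middleware_and_interfaces": 150,
    "support": 250,
}

def tco_breakdown(costs):
    """Return the total cost of ownership and each line's share of it."""
    total = sum(costs.values())
    return total, {item: amount / total for item, amount in costs.items()}

total, shares = tco_breakdown(costs)
print(f"TCO: {total}k")                              # TCO: 2200k
print(f"License share: {shares['licenses']:.0%}")    # License share: 14%
```

With placeholder numbers like these, the license is a minor line item next to skills, administration and hardware, which is the comparison the slide asks buyers to make with their own figures.
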
Slide 32. Take Aways
Hardware trends have brought change and will bring more change.
There are many RDBMS weaknesses.
There are a huge number of “new” database products, both No SQL Whatsoever and Not Only SQL.
Select database products with caution.
Main take away: database is no longer a commodity.

Slide 34. Thank You For Your Attention