THE DATABASE
                         REVOLUTION




                        Robin Bloor, Ph.D.



Tuesday, August 2, 11
This Presentation

                          Intro: The RDBMS
                          Computer Hardware Trends
                          The NoSQL trend (Either No as
                          in none or NO as in Not Only)
                          What to do...



                        Main Take Away:

                        Database is no longer a commodity



A Point Of Departure
            In the 1990s, Relational Database
            quickly became the dominant form
            of database.
            The SQL language became the
            dominant data access mechanism.
            The RDBMS conferred mathematical
            respectability on itself and even
            claimed an underlying “Relational
            Algebra.”
            The RDBMS dominated because it
            dealt effectively with transactional
            and BI apps.



Relational Dogma
          Data and Process should be kept
          separate.
          The database embodies a data
          model within a schema
          Normalization to 3NF (or 5NF) is
          the correct way to design the
          schema
          The query language (SQL) is part
          DDL and part DML (Select,
          Project, Join)
          Ordering doesn’t matter
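The DDL/DML split and the Select, Project, Join operations named above can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module; the customer and purchase tables and their rows are hypothetical and are not taken from the presentation.

```python
import sqlite3


def build_demo_database():
    """Create an in-memory database with two small, hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    # DDL: statements that define the schema.
    conn.executescript(
        """
        CREATE TABLE customer (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL
        );
        CREATE TABLE purchase (
            id          INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customer(id),
            amount      REAL NOT NULL
        );
        """
    )
    # DML: statements that add data.
    conn.executemany(
        "INSERT INTO customer (id, name) VALUES (?, ?)",
        [(1, "Ada"), (2, "Grace")],
    )
    conn.executemany(
        "INSERT INTO purchase (customer_id, amount) VALUES (?, ?)",
        [(1, 10.0), (1, 25.0), (2, 40.0)],
    )
    return conn


def large_purchases(conn, threshold):
    """DML query combining the three classic relational operations."""
    return conn.execute(
        """
        SELECT c.name, p.amount                      -- project: keep two columns
        FROM customer AS c
        JOIN purchase AS p ON p.customer_id = c.id   -- join: combine the tables
        WHERE p.amount > ?                           -- select: filter the rows
        ORDER BY p.amount
        """,
        (threshold,),
    ).fetchall()


if __name__ == "__main__":
    print(large_purchases(build_demo_database(), 20))
```

The ORDER BY clause is there because SQL gives no guarantee about row order without one, which is the practical face of "ordering doesn't matter".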

The 1990s RDBMS
             The RDBMS of the 1990s was
             physically based on B-tree
             structures and an optimizer.
             This scaled up within reason but
             it scaled out poorly.
             It was fundamentally an index-
             based data store.
             It managed megabytes and
             gigabytes fine.
             But look what happened to
             data....

Moore’s Law Cubed
                  Moore’s Law suggests that CPU power increases
                  10-fold every 6 years (and other technologies have
                  stayed in step to some degree)
                  Large database volumes have grown 1000-fold
                  (10 cubed) over each of the same 6-year periods:
                    In ~1992 measured in megabytes
                    In ~1998 measured in gigabytes
                    In ~2004 measured in terabytes
                    In ~2010 measured in petabytes
                  Exabytes by ~2016?
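The arithmetic behind the slide title can be checked with a short sketch. The two growth factors and the 6-year period come from the slide; the function names and the conversion to a yearly rate are illustrative additions.

```python
# Growth factors as stated on the slide.
CPU_FACTOR_PER_PERIOD = 10      # CPU power: 10-fold every 6 years
DATA_FACTOR_PER_PERIOD = 1000   # large database volumes: 1000-fold every 6 years
PERIOD_YEARS = 6

# Units in which the largest databases are measured, one step per period.
UNITS = ["megabytes", "gigabytes", "terabytes", "petabytes", "exabytes"]


def projected_unit(year, base_year=1992):
    """Return the unit the slide associates with a given year."""
    periods = (year - base_year) // PERIOD_YEARS
    return UNITS[periods]


def annual_rate(factor, years=PERIOD_YEARS):
    """Convert a per-period growth factor into an equivalent yearly factor."""
    return factor ** (1 / years)


if __name__ == "__main__":
    for year in range(1992, 2017, PERIOD_YEARS):
        print(year, projected_unit(year))
    print("CPU growth per year: ", round(annual_rate(CPU_FACTOR_PER_PERIOD), 2))
    print("Data growth per year:", round(annual_rate(DATA_FACTOR_PER_PERIOD), 2))
```

On these figures data volume grows as the cube of CPU power, so the yearly data growth factor is the cube of the yearly CPU growth factor.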




HARDWARE




RDBMS




A Database is a Cupboard

                    RDBMS ✔
              Some are transactional (for
              operational systems)


                   RDBMS ??
              Some service large queries
              against large data heaps


                   RDBMS ??
              Some are content oriented for
              accessing complex objects
              (object based systems mainly)

              All databases need to deliver
              performance


Hardware Data Points
          Moore’s Law now proceeds by adding
          cores rather than by increasing clock
          speed. Vector registers now standard on
          Intel chips
          Parallelism is now on the rise and will
          eventually become the normal mode of
          processing
          Memory is about 1 million times faster
          than disk and random reads have become
          very expensive in respect of latency
          The Intel processor is now being
          challenged by the ARM processor (it’s
          about heat)


Memory v Disk
          The decline in memory
          costs is (on current
          trends) likely to have
          memory cheaper than
          disk around 2016
          This means that non-
          volatile SSDs will
          prevail relatively soon.
          SSDs are between
          1000 and 100,000
          times faster than
          spinning disk



Massive Scale-Out
          CPUs are now
          doubling cores every
          18 months or so.
          This trend, combined
          with memory cost
          trends, suggests that
          massive scale out will
          eventually become a
          much rarer
          requirement.
          But we cannot know
          that for sure.



Consequences
          SSD will replace disk - but slowly...
          Many DBMS tasks can now be
          handled in memory - but better
          physical architectures are possible
          for this.
          Physical indexes are becoming
          irrelevant
          Scale out and parallelism are now
          the driving force for large data
          volume applications.
          The physical architecture of the
          traditional RDBMS is now an
          anachronism


NoSQL




A Plethora of Databases
          Categories overlaid on the slide:
            Network DBMS, RDBMS, OLAP DBMS, ODBMS, OR DBMS
            Text DBMS, Content DBMS, XML DBMS, Open Source DBMS,
            In Memory DBMS
            Column Store DBMS, Analytic DBMS, Streams DBMS,
            Temporal DBMS, Hadoop & HBASE (MPP)
            Graph DBMS, Hypermedia DBMS, Algebraic DBMS, Cloud DBMS,
            Triple Stores

          Products listed in the background:
            4th Dimension, Adabas D, AllegroGraph, Alpha Five, Altibase,
            Apache Derby, Aster Data, Azure Table Storage, BaseX,
            Berkeley DB, Bigdata, BlackRay, CA-Datacom, Cassandra,
            Chordless, Citrusleaf, Clarion, Cloudata, Cloudera, Clustrix,
            CouchDB, CSQL, CUBRID, Daffodil database, Data Management
            Center (DMC), Database Management Library, DataEase,
            Dataphor, DB-Fast, db4o, Derby aka Java DB, DEX, Dynomite,
            EffiProz, ElevateDB, Empress Embedded Database, EnterpriseDB,
            eXist, eXtremeDB, Faircom C-Tree, fastDB, FileDB, FileMaker
            Pro, Firebird, FlockDB, FrontBase, GenieDB, GigaSpaces,
            Gladius DB, Greenplum, GroveSite, GT.M, H2, Hadoop / HBase,
            HamsterDB, Hazelcast, Helix database, Hibari, HPCC, HSQLDB,
            HyperGraphDB, Hypertable, IBM DB2, IBM DB2 Express-C, IBM
            Lotus Approach, IBM Lotus/Domino, Infinite Graph, Infobright,
            InfoGrid, Informix, Ingres, InterBase, Intersystems Cache,
            InterSystems Caché, ISIS Family, KAI, Kognitio, LightCloud,
            Linter, Magma, MariaDB, Mark Logic Server, MaxDB, Mckoi SQL
            Database, MEMBASE, MemcacheDB, Microsoft Access, Microsoft
            Jet Database Engine (part of Microsoft Access), Microsoft SQL
            Server, Microsoft SQL Server Express, Microsoft Visual FoxPro,
            Mimer SQL, Mnesia, MonetDB, MongoDB, Morantex, mSQL, MySQL,
            Neo4J, NEO, Netezza, NonStop SQL, Objectivity, Openbase,
            OpenInsight, OpenLink Virtuoso, OpenLink Virtuoso, OpenLink
            Virtuoso Universal Server, OpenQM, Oracle, Oracle Rdb for
            OpenVMS, OrientDB, Panorama, Perst, PervasiveSQL, PicoLisp,
            Pincaster, PostgreSQL, Prevayler, Progress Software, Qizx,
            Queplix, RaptorDB, RavenDB, RDM Embedded, RDM Server,
            Recutils, Redis, Riak, SAND CDBMS, Sav Zigzag, Scalaris,
            Scalien, SciDB, ScimoreDB, Sedna, SisoDB, SmallSQL, solidDB,
            Sones, SQLBase, SQLDB, SQLite, Starcounter, Sterling,
            Stratosphere, STSdb, Sybase, Sybase IQ, tdbengine, Teradata,
            Terrastore, The SAS system, ThruDB, TimesTen, Tokutek,
            Trinity, txtSQL, U2, UniData, UniVerse, Valentina, Versant,
            VertexDB, Vertica, VistaDB, VMDS, Voldemort, WCE SL Plus,
            XSPRADA, Yserial, ZODB, Zoduna




RDBMS & SQL As Anachronisms
          For big BI, RDBMS has been
          superseded by column store DBMS
          primarily because it didn’t scale out
          and indexes have become far less
          important.
          The use of snowflake schemas and
          star schemas had already
          demonstrated that 3NF was a limited
          modeling technique and nothing
          more.
          And then came Hadoop & MapReduce
          for massive scale-out - which care
          nothing for SQL or RDBMS
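The difference between a row layout and a column layout can be sketched in a few lines. This is a toy illustration with hypothetical order data, not a description of how any particular column store product is built.

```python
# Row layout: each record is stored together, as in a traditional RDBMS page.
ROWS = [
    {"order_id": 1, "region": "EU", "amount": 120.0},
    {"order_id": 2, "region": "US", "amount": 80.0},
    {"order_id": 3, "region": "EU", "amount": 200.0},
]


def to_columns(rows):
    """Pivot row records into one list per column (a column layout)."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def total_from_rows(rows):
    """An aggregate over rows has to visit every whole record."""
    return sum(row["amount"] for row in rows)


def total_from_columns(columns):
    """The same aggregate over columns reads only the one column it needs."""
    return sum(columns["amount"])


if __name__ == "__main__":
    columns = to_columns(ROWS)
    print(columns["amount"])
    print(total_from_rows(ROWS), total_from_columns(columns))
```

A BI query that aggregates a few columns over many rows touches far less data in the column layout, which is one reason a full scan can compete with an index in that setting.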


A Fundamental Error
          Actions: Add, Modify, Delete,
          Archive
          From day 1 there was a fundamental
          error in the simple mechanics of
          database and file systems.
          When you update data you destroy
          the old value. No audit trail.
          A correct theory of data was
          invented by (perhaps) Luca Pacioli.
          It is the basis of accounting.
          A few databases (Firebird is one)
          were built so that data was only ever
          added or archived.
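The contrast between destructive update and add-only storage can be sketched as follows. AppendOnlyStore is a hypothetical class written for this illustration; it is not how Firebird or any other product named here is implemented.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass(frozen=True)
class Entry:
    """One immutable record in the log, like a line in an accounting ledger."""
    seq: int
    key: str
    value: Any


class AppendOnlyStore:
    """Values are never overwritten; a 'modify' is just another append."""

    def __init__(self):
        self._log: List[Entry] = []

    def put(self, key, value):
        entry = Entry(seq=len(self._log), key=key, value=value)
        self._log.append(entry)
        return entry

    def get(self, key):
        """Current value: the most recent entry for the key, or None."""
        for entry in reversed(self._log):
            if entry.key == key:
                return entry.value
        return None

    def history(self, key):
        """Audit trail: every value the key has held, oldest first."""
        return [entry.value for entry in self._log if entry.key == key]


if __name__ == "__main__":
    store = AppendOnlyStore()
    store.put("balance", 100)
    store.put("balance", 75)
    print(store.get("balance"), store.history("balance"))
```

With a plain dictionary, the second put would have destroyed the value 100; here both values survive and the audit trail comes for free.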


The Ordering Of Data
          “A data set is an unordered
          collection of unique, non-duplicated
          items.”
          This is an absurd constraint to place
          upon data, as data is naturally
          ordered by time if by nothing else.
               Events are ordered by time.
               Changes to entities are ordered
               by time
          There are lots of applications
          requiring time series capability.
          This has led to TSDB products like
          StreamBase, Vhayu, OpenTSDB,
          etc.
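A store that treats time order as a property of the data can answer range queries directly. The sketch below is a minimal in-memory illustration using Python's bisect module; the TimeSeries class is hypothetical and does not reflect the design of the TSDB products named above.

```python
import bisect


class TimeSeries:
    """Keeps events sorted by timestamp so time-range lookups are cheap."""

    def __init__(self):
        self._times = []
        self._values = []

    def append(self, timestamp, value):
        """Insert an event at its position in time, even if it arrives late."""
        index = bisect.bisect_right(self._times, timestamp)
        self._times.insert(index, timestamp)
        self._values.insert(index, value)

    def between(self, start, end):
        """Return (timestamp, value) pairs with start <= timestamp <= end."""
        lo = bisect.bisect_left(self._times, start)
        hi = bisect.bisect_right(self._times, end)
        return list(zip(self._times[lo:hi], self._values[lo:hi]))


if __name__ == "__main__":
    series = TimeSeries()
    for timestamp, value in [(3, "c"), (1, "a"), (2, "b")]:
        series.append(timestamp, value)
    print(series.between(2, 3))
```

Because the ordering is maintained by the structure itself, a range query is two binary searches and a slice rather than a filter and a sort over an unordered set.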


The Separation of Data and Process
          The assumption was that this
          separation could be enforced
          But when you try to enforce it, you
          forever encounter data and process
          locked together in a guilty embrace.
          It is a wrong separation of concerns.
          In truth it cannot be enforced without
          there being a true algebra of data
          So many databases (object
          databases and other NoSQL
          databases) do not enforce it.

          However their interfaces to data are
          not perfect either.

          [Diagram labels: Process, SQL, SCHEMA, DBMS]




Relational Algebra Isn’t An Algebra
          Set aside the fact that RDBMS
          focus so strongly on Table structures
          that they cannot naturally represent
          other important data structures
          (such as BOMP and MOLAP).
          And that RDBMS rail against the
          ordering of data (“No order”)
          Ignore the stored procedures (which
          violate the separation of data and
          process).
          Even so Relational Algebra is not
          even an algebra. (NULLs?)
          There is at least one algebraic
          (NoSQL) database
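The slide only hints at the objection with "(NULLs?)". One commonly cited issue is that SQL's NULL brings three-valued logic, so equality is no longer reflexive. The sketch below demonstrates that behavior with Python's built-in sqlite3 module; it is an illustration of that one point and is not claimed to be the author's full argument.

```python
import sqlite3


def scalar(conn, sql):
    """Run a query that returns a single value and return that value."""
    return conn.execute(sql).fetchone()[0]


def null_demo():
    conn = sqlite3.connect(":memory:")
    return {
        # NULL = NULL is neither true nor false: it evaluates to NULL (None).
        "null_equals_null": scalar(conn, "SELECT NULL = NULL"),
        # IS NULL is the special-case test that does return true (1).
        "null_is_null": scalar(conn, "SELECT NULL IS NULL"),
        # A WHERE clause only keeps rows where the condition is true,
        # so a condition that evaluates to NULL filters the row out.
        "rows_where_null_equals_null": conn.execute(
            "SELECT 1 WHERE NULL = NULL"
        ).fetchall(),
    }


if __name__ == "__main__":
    for name, result in null_demo().items():
        print(name, "->", result)
```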



The SQL Barrier
          SQL has:
               DDL (for data definition)
               DML (for Select, Project and Join)
               But it has no MML or TML
          Usually result sets are brought to the
          client for further manipulation, but
          using them for further data access
          becomes problematic.
          Conclusions:
               This separation of data from
               process is arbitrary and unhelpful
               Any database to which this
               doesn’t apply is NoSQL

          [Diagram: an "SQL Barrier" with the note "Results
          processing must be done here" on one side and "Or
          results processing must be done here" on the other;
          further labels: SQL, Analytic DBMS]
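The problem of using a result set for further data access can be sketched with Python's built-in sqlite3 module. The author and book tables are hypothetical. The first function brings a result set to the client and then crosses the SQL interface again for every row; the second keeps the whole operation inside the database.

```python
import sqlite3


def build_demo_database():
    """Create a small in-memory database with hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE book (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT)"
    )
    conn.executemany("INSERT INTO author VALUES (?, ?)", [(1, "Ada"), (2, "Grace")])
    conn.executemany(
        "INSERT INTO book VALUES (?, ?, ?)",
        [(1, 1, "A"), (2, 1, "B"), (3, 2, "C")],
    )
    return conn


def titles_via_client_loop(conn):
    """Process results on the client: one extra query per row fetched."""
    result = {}
    authors = conn.execute("SELECT id, name FROM author ORDER BY id").fetchall()
    for author_id, name in authors:
        rows = conn.execute(
            "SELECT title FROM book WHERE author_id = ? ORDER BY id",
            (author_id,),
        ).fetchall()
        result[name] = [title for (title,) in rows]
    return result


def titles_via_single_query(conn):
    """Let the database do the combining: one query, one round trip."""
    result = {}
    rows = conn.execute(
        """
        SELECT a.name, b.title
        FROM author AS a
        JOIN book AS b ON b.author_id = a.id
        ORDER BY a.id, b.id
        """
    )
    for name, title in rows:
        result.setdefault(name, []).append(title)
    return result


if __name__ == "__main__":
    conn = build_demo_database()
    print(titles_via_client_loop(conn))
    print(titles_via_single_query(conn))
```

Both functions return the same answer here, but the client loop issues one query per author. With an in-memory database that cost is invisible; across a network it grows with the size of the first result set.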



Other NDBMS Directions
          Some NDBMS do not attempt to provide all ACID
          properties. (Atomicity, Consistency, Isolation, Durability)
          Some NDBMS deploy a distributed scale-out
          architecture with data redundancy.
          XML DBMS using XQuery are NDBMS.
          Some document stores are NDBMS (OrientDB,
          Terrastore, etc.)
          Object databases are NDBMS (Gemstone, Objectivity,
          ObjectStore, etc.)
          Key value stores = schema-less stores (Cassandra,
          MongoDB, Berkeley DB, etc.)
          Graph DBMS (DEX, OrientDB, etc.) are NDBMS
          Large data pools (BigTable, HBase, Mnesia, etc.) are
          NDBMS
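A scale-out key value store with data redundancy can be sketched as hash partitioning plus replication. ReplicatedKeyValueStore is a hypothetical toy: the "nodes" are dictionaries in one process, and the placement rule (hash the key, then copy to the next nodes in sequence) is an assumption chosen for brevity, not the scheme used by any product listed above.

```python
import hashlib


class ReplicatedKeyValueStore:
    """Schema-less store: each key lives on `replicas` of `node_count` nodes."""

    def __init__(self, node_count=4, replicas=2):
        if not 1 <= replicas <= node_count:
            raise ValueError("replicas must be between 1 and node_count")
        self.nodes = [dict() for _ in range(node_count)]
        self.replicas = replicas

    def owners(self, key):
        """Pick the nodes responsible for a key from a stable hash of it."""
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        first = int.from_bytes(digest[:8], "big") % len(self.nodes)
        return [(first + i) % len(self.nodes) for i in range(self.replicas)]

    def put(self, key, value):
        """Write the value to every replica of the key."""
        for index in self.owners(key):
            self.nodes[index][key] = value

    def get(self, key, failed=()):
        """Read from the first replica that is not marked as failed."""
        for index in self.owners(key):
            if index not in failed and key in self.nodes[index]:
                return self.nodes[index][key]
        return None


if __name__ == "__main__":
    store = ReplicatedKeyValueStore()
    store.put("user:1", {"name": "Ada"})
    primary = store.owners("user:1")[0]
    print(store.get("user:1", failed={primary}))
```

The value can be any object, since nothing enforces a schema, and a read still succeeds when one of the two copies is unavailable. This sketch makes no attempt at the hard parts, such as keeping replicas consistent under concurrent writes, which is where the trade-off against full ACID properties arises.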


What To Do...




What Is The Problem You Are
                   Trying To Solve?
                The primary message of this presentation is that
                database is no longer a commodity (if it ever
                was).
                Despite faults and weaknesses the General
                Purpose Relational Database works fine for many
                areas of application and:
                  It is well understood
                  Skills (for any popular product) are abundant
                  It can be inexpensive (by license or Open
                  Source)
                Beyond such products, it is “horses for courses”
                and “caveat emptor.”


Other Selection Criteria
                Don’t fall for fashion.
                Proven performance?
                Skills, both for design and for administration.
                Interfaces & middleware
                The hardware bill.
                Product roadmap.
                External support/internal support.
                Calculate a TCO (note that even for expensive
                DBMS the license fees are rarely more than
                15% of the TCO)




Take Aways
                        Hardware trends have brought change,
                        will bring more change
                        There are many RDBMS weaknesses
                        There are a huge number of “new”
                        database products both
                         No SQL Whatsoever, and
                         Not Only SQL
                        Select database products with caution
                        Main Take Away:

                        Database is no longer a commodity


Thank You
                        For Your
                        Attention



