Fit For Purpose:
The New Database Revolution




     Mark Madsen & Robin Bloor
Introduction
Significant and revolutionary changes are taking place
in database technology

In order to investigate and analyze these changes and
where they may lead, The Bloor Group has teamed up
with Third Nature to launch an Open Research
project.

This is the first webinar in a series of webinars and
research activities that will comprise the project

All research will be made available through our web
site: Databaserevolution.com
Sponsors of This Research
General Webinar Structure
What & why

History of Database Part 1: How we got to the RDBMS

History of Database Part 2: Relational and Post- relational

Food For Thought: Issues, Problems, Assumptions,
Challenges

Current Conclusions: Insofar as we have any
Change? Why?
Increased data volumes

Significant hardware changes

Database product innovation

New workloads, different data structures

Established database concepts are being challenged

Market Forces can drive change
Data Volumes: Moore’s Law Cubed
Moore’s Law suggests that CPU power
increases 10-fold every 6 years (and other
technologies have stayed in step to some
degree)
Large database volumes have grown 1000-
fold every 6 years:
  In 1992, measured in megabytes
  In 1998 measured in gigabytes
  In 2004 measured in terabytes
  In 2010 measured in petabytes

Exabytes by 2016?
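A little arithmetic turns the slide's growth figures into annual rates. The sketch below takes the 10-fold and 1000-fold per-six-years factors from the slide as given; they are the slide's approximations, not measurements. It then computes the implied compound annual growth and doubling time. For CPU that is roughly 47% a year, doubling about every 1.8 years. For large databases it is roughly 216% a year, doubling about every 7 months.

```python
import math

# Growth per six-year period, taken from the slide (approximations, not measurements).
CPU_FACTOR_PER_6Y = 10      # "CPU power increases 10-fold every 6 years"
DATA_FACTOR_PER_6Y = 1000   # "large database volumes have grown 1000-fold every 6 years"
PERIOD_YEARS = 6

def annual_rate(factor, years=PERIOD_YEARS):
    """Compound annual growth rate implied by `factor`-fold growth over `years` years."""
    return factor ** (1 / years) - 1

def doubling_time(factor, years=PERIOD_YEARS):
    """Years needed to double at that compound rate."""
    return years * math.log(2) / math.log(factor)

for label, factor in [("CPU", CPU_FACTOR_PER_6Y), ("Data", DATA_FACTOR_PER_6Y)]:
    print(f"{label}: {annual_rate(factor):.0%} per year, doubling every {doubling_time(factor):.1f} years")

# "Cubed": 1000 = 10 ** 3, so the data factor is the CPU factor to the third power.
exponent = math.log(DATA_FACTOR_PER_6Y) / math.log(CPU_FACTOR_PER_6Y)

# The slide's sequence: one unit prefix (a factor of 1000) per six years.
for step, unit in enumerate(["megabytes", "gigabytes", "terabytes", "petabytes", "exabytes?"]):
    print(1992 + PERIOD_YEARS * step, unit)
```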
Hardware Changes
Moore’s Law now proceeds by adding cores
rather than by increasing clock speed.

Computer grids using commodity servers are
now relatively inexpensive

Parallelism is now on the rise and will eventually
become the normal mode of processing

Memory is about 1 million times faster than
disk, and random reads have become very
expensive in terms of latency

SSDs are augmenting and may eventually replace
spinning disk
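The parallelism point can be made concrete with a minimal scatter-gather sketch. Rows are split across workers, each worker aggregates only its own slice, and the partial results are merged at the end. The `(region, units)` rows are hypothetical. Threads are used only to keep the example self-contained. In CPython they do not run CPU-bound Python code in parallel, so a real system would give each partition its own process or machine.

```python
from concurrent.futures import ThreadPoolExecutor

def partition(rows, n_parts):
    """Scatter: assign rows round-robin so each worker owns a disjoint slice (nothing shared)."""
    parts = [[] for _ in range(n_parts)]
    for i, row in enumerate(rows):
        parts[i % n_parts].append(row)
    return parts

def local_totals(rows):
    """Per-worker aggregation over its own slice only; needs no coordination."""
    totals = {}
    for region, units in rows:
        totals[region] = totals.get(region, 0) + units
    return totals

def parallel_group_sum(rows, n_parts=4):
    """SUM(units) GROUP BY region, computed as partial sums that are merged afterwards."""
    parts = partition(rows, n_parts)
    # Threads keep the sketch self-contained; CPU-bound work would need processes or machines.
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        partials = list(pool.map(local_totals, parts))
    merged = {}
    for partial in partials:  # gather: combine the per-worker results
        for region, units in partial.items():
            merged[region] = merged.get(region, 0) + units
    return merged

sales = [("EU", 120), ("US", 200), ("EU", 95), ("APAC", 60), ("US", 80)]
print(parallel_group_sum(sales))
```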
Most Data Becomes Historical Over Time, or Entirely Historical Once No Longer Active

[Figure: transactional data over time, contrasting active and static data with application performance and cost. Labels: Active 70%, Static 30%; Application Performance 10%, 90%, 100%; "Cost $$$ and PAIN". Image courtesy: RainStor]
Market Forces
A new set of products appears

They include some fundamental innovations

A few are sufficiently popular to last

Fashion and marketing drive greater adoption

Product defects begin to be addressed

They eventually challenge the dominant products
Section 1:
         History Part 1
 Pre-relational and Relational
What we had in prior technology regimes

Where we came from

What we traded away and why
The Dawn of Database
Schema defines logical structure of data
   The schema enables extensive reuse
   Logical structure vs Physical structure

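ACID properties
   Atomicity – a transaction either completes in
   full or has no effect at all
   Consistency – a transaction moves the database
   from one valid state to another
   Isolation – concurrent transactions do not see
   each other's intermediate changes
   Durability – once committed, a transaction's
   changes persist, even across failures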
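Atomicity is the easiest of the four properties to demonstrate. The sketch below uses Python's built-in `sqlite3` module and a hypothetical `accounts` table. SQLite is chosen only because it ships with Python; the slides do not name a product. A simulated crash between the debit and the credit leaves neither change in place, because the connection's context manager rolls the transaction back when an exception escapes.

```python
import sqlite3

def transfer(conn, src, dst, amount, fail_midway=False):
    """Move `amount` between two accounts as a single transaction."""
    with conn:  # commits on success, rolls back if an exception escapes the block
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
        if fail_midway:
            raise RuntimeError("simulated crash between the debit and the credit")
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))

def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
with conn:
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])

transfer(conn, "alice", "bob", 30)
print(balances(conn))  # {'alice': 70, 'bob': 80}

try:
    transfer(conn, "alice", "bob", 30, fail_midway=True)
except RuntimeError:
    pass
print(balances(conn))  # still {'alice': 70, 'bob': 80}: the partial debit was rolled back
```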
Database Performance Bottlenecks
  CPU saturation

  Memory saturation

  Disk I/O channel saturation

  Locking

  Network saturation

  Parallelism – inefficient load balancing
The Joys of SQL?
SQL is a declarative query language
targeted at data organized in two-
dimensional tables.
It enables set-oriented operations on
those tables (Select, Project and Join),
whose results can be further qualified
(Order By, etc.)
It imposes some limitations on the
logical model of data.
It can create a barrier between the user
and the data…
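Here is a small illustration of the three operations, again using SQLite as a stand-in for any SQL engine, with hypothetical `customers` and `orders` tables. The WHERE clause selects rows, the column list projects columns, the JOIN combines the two tables, and ORDER BY qualifies the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, region TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Acme', 'EU'), (2, 'Globex', 'US'), (3, 'Initech', 'EU');
    INSERT INTO orders VALUES (10, 1, 250.0), (11, 2, 75.0), (12, 1, 40.0), (13, 3, 500.0);
""")

rows = conn.execute("""
    SELECT c.name, o.amount                      -- project: keep two columns
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id     -- join the two tables
    WHERE c.region = 'EU'                        -- select: filter rows
    ORDER BY o.amount DESC                       -- qualify the result with an ordering
""").fetchall()
print(rows)  # [('Initech', 500.0), ('Acme', 250.0), ('Acme', 40.0)]
```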
The Ordering Of Data
“A data set is an unordered collection of
unique, non-duplicated items.”

Data is naturally ordered by time if by
nothing else.
   Events are ordered by time.
   Changes to entities are ordered by
   time

Having an inherent physical order to data
can save many processing cycles in some
areas of application

This is particularly the case for time
series applications.
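A time-range query shows where physical order saves processing cycles. Assume a hypothetical list of `(timestamp, value)` readings kept in time order. Two binary searches then locate the start and end of the window, so the cost is logarithmic in the size of the data plus the size of the answer. Without an order guarantee, every row has to be inspected.

```python
from bisect import bisect_left

# Hypothetical sensor readings: (timestamp_seconds, value), stored in time order.
readings = [(t, t * 0.5) for t in range(0, 1_000_000, 10)]
timestamps = [t for t, _ in readings]  # the sort key, kept alongside for binary search

def range_scan_unordered(rows, start, end):
    """No order guarantee: every row must be inspected, O(n)."""
    return [row for row in rows if start <= row[0] < end]

def range_scan_ordered(rows, keys, start, end):
    """Rows in time order: two binary searches bound the slice, O(log n + k)."""
    lo = bisect_left(keys, start)
    hi = bisect_left(keys, end)  # half-open window [start, end)
    return rows[lo:hi]

window = range_scan_ordered(readings, timestamps, 5_000, 5_050)
print(window)  # readings at t = 5000, 5010, 5020, 5030 and 5040
```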
The RDBMS Optimizer
The database can know how to access data better and
faster than any programmer…
   It wasn’t true
   It became true
   It isn’t always true

It only optimizes for persistent data
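You can observe the optimizer's role directly. In this SQLite sketch, which uses a hypothetical `orders` table and index name, the same declarative query gets a full-table scan before an index exists and an index search afterwards. Its answer does not change. The exact wording of `EXPLAIN QUERY PLAN` output varies between SQLite versions, so only substrings should be relied on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
                 [(i % 500, i * 1.5) for i in range(5_000)])

QUERY = "SELECT amount FROM orders WHERE customer_id = 42"

def plan(sql):
    """The optimizer's chosen access path: the detail column of each EXPLAIN QUERY PLAN row."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

plan_before = plan(QUERY)  # no usable index yet: expect a full scan
result_before = sorted(conn.execute(QUERY).fetchall())

conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = plan(QUERY)   # same query text, different physical access path
result_after = sorted(conn.execute(QUERY).fetchall())

print(plan_before)
print(plan_after)
print(result_before == result_after)  # the answer does not depend on the plan
```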
Section 2:
       History Part 2
Relational and Post-relational
Where we are today: oldsql, newsql and nosql

The finalizing of the distributed web architecture

Rediscovery of the past, when we had purpose-built data stores
of different types, with a twist.

Revisiting of old arguments

Challenging old assumptions
Database Product Innovation
Column Stores and Query-biased Workloads
      Column store databases are still RDBMSs

      Most SQL queries do not require all columns of a table
         So partitioning data by columns (vertically) will usually
         be better than partitioning by rows (horizontally)
         And data compression can be more efficient

      Column store databases scale up [somewhat] better
      than traditional RDBMSs depending on workload,
      queries, etc.

      A column store is not the same as a column family (as in Bigtable, HBase or Cassandra)
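A few lines are enough to sketch the argument for vertical partitioning. The example below holds a hypothetical five-row sales table both ways and counts how many values a `SUM(units)` query must touch in each layout. It then run-length encodes one column to show why low-cardinality values of a single type compress well when stored together. This illustrates the idea only; it is not how any particular column store lays out its data.

```python
from itertools import groupby

# Hypothetical sales table.
columns = ("day", "region", "product", "units")
rows = [
    ("2012-01-16", "EU", "widget", 120),
    ("2012-01-16", "EU", "gadget", 80),
    ("2012-01-17", "EU", "widget", 95),
    ("2012-01-17", "US", "widget", 200),
    ("2012-01-18", "US", "gadget", 60),
]

# Row layout: each record's values stored together (horizontal).
row_store = rows
# Column layout: each column's values stored together (vertical partitioning).
column_store = {name: [row[i] for row in rows] for i, name in enumerate(columns)}

# SELECT SUM(units): the row layout reads every field of every record,
# the column layout reads only the one column the query needs.
values_read_row_store = sum(len(row) for row in row_store)  # 20
values_read_column_store = len(column_store["units"])       # 5
total_units = sum(column_store["units"])                    # 555

def run_length_encode(values):
    """Collapse runs of repeated adjacent values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]

print(run_length_encode(column_store["region"]))  # [('EU', 3), ('US', 2)]
```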
New Lamps For Old
   Google, Yahoo!, Facebook and others had data management
   problems that established products did not cater for: Big Data,
   unusual data structures, new workloads

   They had money to invest and some smart engineers

   They built their own solutions: Bigtable, MapReduce,
   Cassandra, etc.

   In doing so, they provoked a database revolution



In other words, the internet happened and some people noticed.
A random selection of databases
Sybase IQ, ASE             EnterpriseDB   Algebraix
Teradata, Aster Data       LucidDB        Intersystems Caché
Oracle, RAC                Vectorwise     Streambase
Microsoft SQLServer, PDW   MonetDB        SQLStream
IBM DB2s, Netezza          Exasol         Coral8
ParAccel                   Illuminate     Ingres
Kognitio                   Vertica        Postgres
EMC/Greenplum              InfiniDB       Cassandra
Oracle Exadata             1010 Data      CouchDB
SAP HANA                   SAND           Mongo
Infobright                 Endeca         HBase
MySQL                      XtremeData     Redis
MarkLogic                  IMS            RainStor
Tokyo Cabinet              Hive           Scalaris
                  And a few hundred more…
Section 3: Database Discussion Topics
The core post-relational changes
in assumptions.

Key aspects of the code-
database mismatch

Reclassifying pre-relational as
NoSQL

Complex data, emergent
structure, types and schemas

Cloud and databases, uh-oh?
Changing Assumptions
One single scalable piece of reliable hardware

You really need a schema all the time

A handful of discrete types are all anybody will ever need, and
when they need more they can code UDTs and UDFs in C++

SQL is the optimal way to write and retrieve data

ACID always applies

Data integrity is a key component of a database
No SQL, New Concepts
Maybe SQL is an unacceptable constraint

Maybe SQL is unnecessary for some fit-for-purpose databases,
or perhaps just unimportant

Maybe the impedance mismatch can be avoided

Maybe a formal schema is a constraint

Maybe ACID properties can be compromised
The “Impedance Mismatch”
The RDBMS stores data organized
according to table structures

The OO programmer manipulates data
organized according to complex object
structures, which may have specific
methods associated with them.

The data does not simply map to the
structure it has within the database

Consequently a mapping activity is
necessary to get and put data

Basically: hierarchies, types, result sets,
crappy APIs, language bindings, tools
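Here is a minimal sketch of the mapping activity. It assumes a hypothetical `Order` object that owns a list of `LineItem` objects. Putting the object into the database means flattening it into rows in two tables. Getting it back means joining those tables and reassembling the hierarchy from a flat result set. Object-relational mapping libraries automate this, but the same work still happens underneath.

```python
import sqlite3
from dataclasses import dataclass, field

@dataclass
class LineItem:
    sku: str
    qty: int

@dataclass
class Order:
    order_id: int
    customer: str
    items: list[LineItem] = field(default_factory=list)  # nested objects, no table equivalent

    def total_qty(self) -> int:
        """Behavior lives on the object; the tables hold only its data."""
        return sum(item.qty for item in self.items)

def save(conn, order):
    """'Put': flatten one object graph into rows in two tables."""
    with conn:
        conn.execute("INSERT INTO orders VALUES (?, ?)", (order.order_id, order.customer))
        conn.executemany("INSERT INTO line_items VALUES (?, ?, ?)",
                         [(order.order_id, item.sku, item.qty) for item in order.items])

def load(conn, order_id):
    """'Get': rebuild the hierarchy from a join's flat result set.

    Assumes the order has at least one line item; the inner join returns no rows otherwise.
    """
    rows = conn.execute(
        """SELECT o.customer, li.sku, li.qty
           FROM orders AS o JOIN line_items AS li ON li.order_id = o.id
           WHERE o.id = ? ORDER BY li.rowid""",
        (order_id,),
    ).fetchall()
    order = Order(order_id, rows[0][0])  # the customer value repeats on every row
    order.items = [LineItem(sku, qty) for _, sku, qty in rows]
    return order

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT);
    CREATE TABLE line_items (order_id INTEGER REFERENCES orders(id), sku TEXT, qty INTEGER);
""")
original = Order(1, "Acme", [LineItem("widget", 3), LineItem("gadget", 2)])
save(conn, original)
restored = load(conn, 1)
print(restored == original, restored.total_qty())  # True 5
```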
NoSQL Directions: Technology Types
  Some NoSQL DBs do not attempt to provide all ACID properties.
  (Atomicity, Consistency, Isolation, Durability)

  Some NoSQL DBs deploy a distributed scale-out architecture with
  data redundancy.

  XML DBMS using XQuery are NoSQL DBs

  Some document stores are NoSQL DBs (OrientDB, Terrastore,
  etc.)

  Object databases are NoSQL DBs (GemStone, Objectivity,
  ObjectStore, etc.)

  Key-value stores = schema-less stores (Cassandra, MongoDB,
  Berkeley DB, etc.)

  Graph DBMS (DEX, OrientDB, etc.) are NoSQL DBs

  Large data pools (Bigtable, HBase, Mnesia, etc.) are NoSQL DBs
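One common way to build a distributed scale-out architecture with data redundancy is consistent hashing with replication. Amazon's Dynamo paper describes the approach, and Cassandra uses it. The sketch below is a simplified, hypothetical version: the node names, the 64 virtual nodes per server and the replication factor of 3 are all assumptions. Real systems add failure detection, rebalancing and tunable consistency on top.

```python
import bisect
import hashlib

class HashRing:
    """Consistent-hash ring: each key is stored on `replicas` distinct nodes."""

    def __init__(self, nodes, replicas=3, vnodes=64):
        self.replicas = replicas
        # Each physical node appears at `vnodes` positions to spread load more evenly.
        self._ring = sorted(
            (self._hash(f"{node}#{v}"), node) for node in nodes for v in range(vnodes)
        )
        self._positions = [position for position, _ in self._ring]

    @staticmethod
    def _hash(text):
        # Non-cryptographic use: MD5 just gives a stable, well-spread 64-bit position.
        return int.from_bytes(hashlib.md5(text.encode()).digest()[:8], "big")

    def nodes_for(self, key):
        """Walk clockwise from the key's position, collecting distinct physical nodes."""
        start = bisect.bisect(self._positions, self._hash(key))
        chosen = []
        for offset in range(len(self._ring)):
            node = self._ring[(start + offset) % len(self._ring)][1]
            if node not in chosen:
                chosen.append(node)
                if len(chosen) == self.replicas:
                    break
        return chosen

ring = HashRing(["node-a", "node-b", "node-c", "node-d"])
owners = ring.nodes_for("customer:42")
print(owners)  # three distinct nodes; which three depends on the hash values
```

Because each key's position depends only on its hash, adding or removing one node moves only the keys adjacent to that node's positions, rather than reshuffling everything.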
The Cloud, uh-oh
Negative implications for shared-everything databases
that have scalability needs
There are architectural implications and possible
incompatibilities for shared-nothing databases too
Small-scale and at-scale deployments behave differently (concurrency,
ingest volumes and frequencies, etc.)
How does the database permit dynamic provisioning,
elasticity (+/-), etc?
The new database problems for IT
 …are probably like old problems for people who went
 through the Unix client-server era.
 Best of breed, no standards for anything, “polyglot
 persistence” = silos on steroids, data integration
 challenges, shifting data movement architectures
Recognize Tradeoffs
Read consistency vs programmatic correction
Schema vs a program to interpret each data structure
Standard access interface vs an API for each type of store
Data integrity enforcement vs programmatic control
Query performance for arbitrary queries vs planned access paths
Space efficiency vs simplicity / latency
Network transfer performance vs simplicity / latency
For the primary goals of
   Horizontal scale
   Looser coupling
   Flexibility for developers building and changing applications
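The "schema vs a program to interpret each data structure" tradeoff can be sketched as two write paths. The schema and record shapes below are hypothetical. With schema-on-write, bad records are rejected up front. With schema-on-read, the store accepts anything, so the reader must know every variant that was ever written, here including an older shape that nested the e-mail address.

```python
import json

# Schema-on-write: a declared structure (hypothetical) that every record must match.
SCHEMA = {"id": int, "name": str, "email": str}

def write_checked(store, record):
    """Reject a record up front unless every declared column is present with the right type."""
    for column, expected in SCHEMA.items():
        if not isinstance(record.get(column), expected):
            raise ValueError(f"column {column!r} missing or not {expected.__name__}")
    store.append(json.dumps(record))

# Schema-on-read: the store accepts any document, so every reader needs
# a program that understands each shape that was ever written.
def read_contact(raw):
    doc = json.loads(raw)
    email = doc.get("email") or doc.get("contact", {}).get("email")  # older shape nested it
    return doc.get("id"), doc.get("name", "<unknown>"), email

strict, loose = [], []
write_checked(strict, {"id": 1, "name": "Ada", "email": "ada@example.com"})
loose.append(json.dumps({"id": 2, "name": "Bob", "contact": {"email": "bob@example.com"}}))
loose.append(json.dumps({"id": 3, "email": "cy@example.com"}))  # no name at all

contacts = [read_contact(raw) for raw in strict + loose]
print(contacts)

try:
    write_checked(strict, {"id": 4, "name": "Dee"})  # missing email: rejected at write time
except ValueError as err:
    print("rejected:", err)
```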
Information Management Through Human History


         New technology development
                    creates
             New methods to cope
                    creates
     New information scale and availability
                   creates…
Big Data
Big data?




      Unstructured data isn’t 
      really unstructured.
      The problem is that this 
      data is unmodeled.
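A web server access log is a typical example of "unstructured" data that is really just unmodeled. Each line follows a regular format, and a reader who supplies the model can recover typed fields. The log line and pattern below are hypothetical, loosely following the common Apache/NCSA access-log layout.

```python
import re

# One hypothetical access-log line: "unstructured" text with a regular implicit structure.
LINE = '203.0.113.7 - - [18/Jan/2012:10:15:32 +0000] "GET /products/42 HTTP/1.1" 200 5120'

# The model is supplied by the reader, not by the storage: a pattern that names each field.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+)'
)

def parse(line):
    """Return a dict of typed fields, or None if the line does not fit this model."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    record["bytes"] = int(record["bytes"])
    return record

parsed = parse(LINE)
print(parsed)
```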
The holy grail of databases under current market hype




The other problem is that 
we’re talking mostly about 
computation over data 
when we talk about “big 
data” and analytics, 
another potential 
mismatch.
Conclusion
Wherein all is revealed, or ignorance exposed

Best of breed is back baby

Workload types and characteristics

The importance of understanding workload in order to select
technology

Pragmatism, babies and bathwater
Solving the Problem Depends on the Diagnosis
Types of workloads
Write-biased:
  ▪ OLTP
  ▪ OLTP, batch
  ▪ OLTP, lite
  ▪ Object persistence
  ▪ Data ingest, batch
  ▪ Data ingest, real-time
Read-biased:
  ▪ Query
  ▪ Query, simple retrieval
  ▪ Query, complex
  ▪ Query, hierarchical / object / network
  ▪ Analytic

Mixed?
The real challenge is that few systems are all one 
workload.

Who said you have to write everything to one 
place, and read everything from the same place?
SOA offers a partial way out, and is how many 
apps work.
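One way to avoid writing and reading everything in one place is to send writes to a store optimized for ingest and derive a separate read-optimized store from it. This pattern is often called command query responsibility segregation (CQRS). The in-memory sketch below is hypothetical and omits persistence and concurrency. It also shows the price of the split: a read side that is stale until it catches up.

```python
from collections import defaultdict

class WriteStore:
    """Write-optimized side: an append-only event log (cheap inserts, no read indexes)."""
    def __init__(self):
        self.log = []

    def append(self, event):
        self.log.append(event)

class ReadStore:
    """Read-optimized side: per-customer totals derived from the log."""
    def __init__(self):
        self.totals = defaultdict(int)
        self.applied = 0  # number of log entries already folded into the totals

    def catch_up(self, write_store):
        for event in write_store.log[self.applied:]:
            self.totals[event["customer"]] += event["amount"]
        self.applied = len(write_store.log)

writes, reads = WriteStore(), ReadStore()
for customer, amount in [("acme", 250), ("globex", 75), ("acme", 40)]:
    writes.append({"customer": customer, "amount": amount})
reads.catch_up(writes)
print(dict(reads.totals))       # {'acme': 290, 'globex': 75}

writes.append({"customer": "globex", "amount": 25})
stale = reads.totals["globex"]  # still 75: the read side lags the write side
reads.catch_up(writes)
print(reads.totals["globex"])   # 100
```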
You must understand your 
workload: throughput and 
response time requirements 
aren’t enough.
  ▪ 100 simple queries accessing 
    month‐to‐date data
  ▪ 90 simple queries accessing 
    month‐to‐date data and 10 
    complex queries using two 
    years of history
  ▪ Hazard calculation for the 
    entire customer master
  ▪ Performance problems are 
    rarely due to a single factor. 
Seven Key Query Workload Elements
These characteristics help determine suitability of 
technologies to improve query performance.
  1. Retrieval – how much data comes back?
  2. Selectivity – how much data is filtered?
  3. Repetition – how often for the same query?
  4. Concurrency – how many queries at once?
  5. Data volume – how much data is being queried?
  6. Query complexity – how many joins, 
     aggregations, columns, filters, subselects, etc.?
  7. Computational complexity – how much 
     computation is performed over the data?
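Some of these elements can be measured from a query log rather than guessed. The sketch below assumes a hypothetical log of `(start, end, SQL text)` entries. It computes repetition, meaning how often identical query text recurs, and peak concurrency, meaning the most queries running at once. A real analysis would normalize literals so that queries differing only in parameter values count as the same query.

```python
from collections import Counter

# Hypothetical query log: (start_seconds, end_seconds, sql_text).
query_log = [
    (0.0, 2.0, "SELECT sum(amount) FROM sales WHERE month = '2012-01'"),
    (0.5, 1.0, "SELECT name FROM customers WHERE id = 42"),
    (1.5, 3.0, "SELECT sum(amount) FROM sales WHERE month = '2012-01'"),
    (4.0, 4.2, "SELECT name FROM customers WHERE id = 42"),
    (4.1, 9.0, "SELECT region, avg(amount) FROM sales GROUP BY region"),
]

def repetition(entries):
    """Repetition: how many times each distinct query text was run."""
    return Counter(sql for _, _, sql in entries)

def peak_concurrency(entries):
    """Concurrency: the largest number of queries running at the same moment."""
    events = [(start, +1) for start, _, _ in entries] + [(end, -1) for _, end, _ in entries]
    running = peak = 0
    for _, delta in sorted(events):  # at equal times, -1 sorts first: ends before starts
        running += delta
        peak = max(peak, running)
    return peak

for sql, count in repetition(query_log).most_common():
    print(count, sql)
print("peak concurrency:", peak_concurrency(query_log))  # 2
```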
Characteristics of BI workloads

Workload                     Selectivity   Retrieval         Repetition    Complexity
Reporting / BI               Moderate      Low               Moderate      Moderate
Dashboards / scorecards      Moderate      Low               High          Low
Ad-hoc query and analysis    Low to high   Moderate to low   Low           Low to moderate
Analytics (batch)            Low           High              Low to high   Low*
Analytics (inline)           High          Low               High          Low*
Operational / embedded BI    High          Low               High          Low

* Low for retrieving the data, high if doing analytics in SQL
Choosing Hardware Architectures
Compute and data sizes are key requirements

[Figure: computation (MF, GF, TF, PF) plotted against data volume (<10s GB, 100s GB, 1s TB, 10s TB, 100s TB, PB). Architecture regions, from small to large: PC; shared everything or shared disk; shared nothing; MR (MapReduce) and related.]
Choosing Hardware Architectures
Today’s reality, and true for a while in most businesses.

[Figure: computation (MF to PF) plotted against data volume (<10s GB to PB), annotated "The bulk of the market resides here!"]
[Figure, continued: the same chart with an added annotation: "…but analytics pushes many things into the MPP zone."]
Evaluating DB Technology

1. Define the key problems: response time, throughput, scalability?
2. Examine the workloads and their requirements
3. Match those to suitable technologies
4. Look for vendors using those technologies
5. Evaluate on real data with real workloads

  Copyright Third Nature, Inc.
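Step 5 can start with a very small harness. The sketch below times each query several times and reports latency percentiles. The in-memory SQLite table and the two queries are stand-ins only. A real evaluation would replace them with real data, the real query mix and realistic concurrency, none of which a single-threaded loop captures.

```python
import sqlite3
import statistics
import time

def measure(run_query, queries, repeats=5):
    """Run each query `repeats` times; report latency percentiles in milliseconds."""
    latencies = []
    for sql in queries:
        for _ in range(repeats):
            start = time.perf_counter()
            run_query(sql)
            latencies.append((time.perf_counter() - start) * 1000.0)
    # 'inclusive' keeps every cut point within the observed min..max range.
    cuts = statistics.quantiles(latencies, n=100, method="inclusive")
    return {"runs": len(latencies), "p50": cuts[49], "p95": cuts[94], "max": max(latencies)}

# Stand-in data and queries; a real evaluation would use production-like data and query mix.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (k INTEGER PRIMARY KEY, v TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", ((i, str(i)) for i in range(10_000)))

stats = measure(lambda sql: conn.execute(sql).fetchall(),
                ["SELECT v FROM t WHERE k = 42",
                 "SELECT count(*) FROM t WHERE v LIKE '9%'"])
print(stats)
```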
Thank You
For Your
Attention
Back-Up Slides
NoSQL Directions
Some NDBMS do not attempt to provide all ACID properties.
(Atomicity, Consistency, Isolation, Durability)

Some NDBMS deploy a distributed scale-out architecture with data
redundancy.

XML DBMS using XQuery are NDBMS.

Some document stores are NDBMS (OrientDB, Terrastore, etc.)

Object databases are NDBMS (GemStone, Objectivity, ObjectStore,
etc.)

Key-value stores = schema-less stores (Cassandra, MongoDB,
Berkeley DB, etc.)

Graph DBMS (DEX, OrientDB, etc.) are NDBMS

Large data pools (Bigtable, HBase, Mnesia, etc.) are NDBMS
The SQL Barrier
SQL has:
  DDL (for data definition)
  DML (for Select, Project and Join)
  But it has no MML (Math) or TML
  (Time)

Usually result sets are brought to
the client for further analytical
manipulation, but this creates
problems

Alternatively doing all analytical
manipulation in the database
creates problems
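The two alternatives on this slide can be compared on a simple time-series calculation: a three-row moving average over a hypothetical `ticks` table. The first version brings the result set to the client. The second pushes the calculation into the database with a SQL window function. Window functions were added to the SQL standard in SQL:2003 and need SQLite 3.25 or later here. They cover ordered calculations like this one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ticks (t INTEGER PRIMARY KEY, price REAL)")
conn.executemany("INSERT INTO ticks VALUES (?, ?)",
                 [(1, 10.0), (2, 12.0), (3, 11.0), (4, 15.0), (5, 14.0)])

# Option 1: bring the result set to the client and compute there.
rows = conn.execute("SELECT t, price FROM ticks ORDER BY t").fetchall()
client_side = []
for i, (t, _) in enumerate(rows):
    window = [price for _, price in rows[max(0, i - 2): i + 1]]  # current row and up to 2 before
    client_side.append((t, sum(window) / len(window)))

# Option 2: push the computation into the database with a window function.
in_database = conn.execute("""
    SELECT t, AVG(price) OVER (ORDER BY t ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)
    FROM ticks ORDER BY t
""").fetchall()

print(client_side)
print(in_database)
```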
Discussion Topics
If not covered in history through today:
    the core post-relational change in assumptions
    nosql core drivers, persistence in cloud, finalizing of web
    arch, SOAizing
    a NoSQL classification list (types and projects/products)
    key aspects of the OR mismatch

complex data and emergent structure

database technology types

a giant list of databases

cloud and databases, uh-oh?

Data Architecture: OMG It’s Made of People
 
Solve User Problems: Data Architecture for Humans
Solve User Problems: Data Architecture for HumansSolve User Problems: Data Architecture for Humans
Solve User Problems: Data Architecture for Humans
 
The Black Box: Interpretability, Reproducibility, and Data Management
The Black Box: Interpretability, Reproducibility, and Data ManagementThe Black Box: Interpretability, Reproducibility, and Data Management
The Black Box: Interpretability, Reproducibility, and Data Management
 
Operationalizing Machine Learning in the Enterprise
Operationalizing Machine Learning in the EnterpriseOperationalizing Machine Learning in the Enterprise
Operationalizing Machine Learning in the Enterprise
 
Building a Data Platform Strata SF 2019
Building a Data Platform Strata SF 2019Building a Data Platform Strata SF 2019
Building a Data Platform Strata SF 2019
 
Architecting a Data Platform For Enterprise Use (Strata NY 2018)
Architecting a Data Platform For Enterprise Use (Strata NY 2018)Architecting a Data Platform For Enterprise Use (Strata NY 2018)
Architecting a Data Platform For Enterprise Use (Strata NY 2018)
 
Architecting a Platform for Enterprise Use - Strata London 2018
Architecting a Platform for Enterprise Use - Strata London 2018Architecting a Platform for Enterprise Use - Strata London 2018
Architecting a Platform for Enterprise Use - Strata London 2018
 
A Brief Tour through the Geology & Endemic Botany of the Klamath-Siskiyou Range
A Brief Tour through the Geology & Endemic Botany of the Klamath-Siskiyou RangeA Brief Tour through the Geology & Endemic Botany of the Klamath-Siskiyou Range
A Brief Tour through the Geology & Endemic Botany of the Klamath-Siskiyou Range
 
How to understand trends in the data & software market
How to understand trends in the data & software marketHow to understand trends in the data & software market
How to understand trends in the data & software market
 
Pay no attention to the man behind the curtain - the unseen work behind data ...
Pay no attention to the man behind the curtain - the unseen work behind data ...Pay no attention to the man behind the curtain - the unseen work behind data ...
Pay no attention to the man behind the curtain - the unseen work behind data ...
 
Assumptions about Data and Analysis: Briefing room webcast slides
Assumptions about Data and Analysis: Briefing room webcast slidesAssumptions about Data and Analysis: Briefing room webcast slides
Assumptions about Data and Analysis: Briefing room webcast slides
 
Everything Has Changed Except Us: Modernizing the Data Warehouse
Everything Has Changed Except Us: Modernizing the Data WarehouseEverything Has Changed Except Us: Modernizing the Data Warehouse
Everything Has Changed Except Us: Modernizing the Data Warehouse
 
A Pragmatic Approach to Analyzing Customers
A Pragmatic Approach to Analyzing CustomersA Pragmatic Approach to Analyzing Customers
A Pragmatic Approach to Analyzing Customers
 
Disruptive Innovation: how do you use these theories to manage your IT?
Disruptive Innovation: how do you use these theories to manage your IT?Disruptive Innovation: how do you use these theories to manage your IT?
Disruptive Innovation: how do you use these theories to manage your IT?
 
Briefing room: An alternative for streaming data collection
Briefing room: An alternative for streaming data collectionBriefing room: An alternative for streaming data collection
Briefing room: An alternative for streaming data collection
 
Building the Enterprise Data Lake: A look at architecture
Building the Enterprise Data Lake: A look at architectureBuilding the Enterprise Data Lake: A look at architecture
Building the Enterprise Data Lake: A look at architecture
 
Briefing Room analyst comments - streaming analytics
Briefing Room analyst comments - streaming analyticsBriefing Room analyst comments - streaming analytics
Briefing Room analyst comments - streaming analytics
 
Everything has changed except us
Everything has changed except usEverything has changed except us
Everything has changed except us
 
Bi isn't big data and big data isn't BI (updated)
Bi isn't big data and big data isn't BI (updated)Bi isn't big data and big data isn't BI (updated)
Bi isn't big data and big data isn't BI (updated)
 
On the edge: analytics for the modern enterprise (analyst comments)
On the edge: analytics for the modern enterprise (analyst comments)On the edge: analytics for the modern enterprise (analyst comments)
On the edge: analytics for the modern enterprise (analyst comments)
 

Recently uploaded

TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024SynarionITSolutions
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 

Recently uploaded (20)

TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 

Database revolution opening webcast 01 18-12

  • 1. Fit For Purpose: The New Database Revolution Mark Madsen & Robin Bloor
  • 2. Introduction: Significant and revolutionary changes are taking place in database technology. To investigate and analyze these changes and where they may lead, The Bloor Group has teamed up with Third Nature to launch an Open Research project. This is the first webinar in a series of webinars and research activities that will comprise the project. All research will be made available through our web site: Databaserevolution.com
  • 3. Sponsors of This Research
  • 4. General Webinar Structure: What & why. History of Database Part 1: How we got to the RDBMS. History of Database Part 2: Relational and post-relational. Food for Thought: Issues, problems, assumptions, challenges. Current Conclusions: Insofar as we have any.
  • 5. Change? Why? Increased data volumes. Significant hardware changes. Database product innovation. New workloads, different data structures. Established database concepts are being challenged. Market forces can drive change.
  • 6. Data Volumes: Moore’s Law Cubed. Moore’s Law suggests that CPU power increases 10-fold every 6 years (and other technologies have stayed in step to some degree). Large database volumes have grown 1000-fold every 6 years: in 1992 they were measured in megabytes, in 1998 in gigabytes, in 2004 in terabytes, and in 2010 in petabytes. Exabytes by 2016?
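The "cubed" claim can be checked with a little arithmetic. This is a minimal sketch that takes the slide's round figures at face value (10x per 6 years for CPU power, 1000x per 6 years for large database volumes) and converts each into an implied doubling time; the function and variable names are illustrative only.

```python
import math

def doubling_time(factor: float, years: float) -> float:
    """Years per doubling implied by growing `factor`-fold over `years` years."""
    return years * math.log(2) / math.log(factor)

# The slide's round figures: CPU power grows 10x and large database volumes 1000x per 6 years.
cpu_doubling = doubling_time(10, 6)        # about 1.81 years
volume_doubling = doubling_time(1000, 6)   # about 0.60 years

# "Cubed": 1000 == 10 ** 3, so volume doubles three times as often as CPU power.
assert math.isclose(cpu_doubling / volume_doubling, 3.0)

# Each 6-year step multiplies volume by ~1000, i.e. moves up one SI prefix.
units = ["MB", "GB", "TB", "PB", "EB"]
milestones = {1992 + 6 * step: unit for step, unit in enumerate(units)}
# 2016 -> "EB" is the slide's speculative extrapolation, not an observation.

print(f"CPU doubling time: {cpu_doubling:.2f} years")
print(f"Volume doubling time: {volume_doubling:.2f} years")
print(milestones)
```

A 10-fold gain in 6 years corresponds to doubling roughly every 18 months, which is the familiar form of Moore's Law; the data curve doubles about every 7 months.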
  • 7. Hardware Changes: Moore’s Law now proceeds by adding cores rather than by increasing clock speed. Computer grids using commodity servers are now relatively inexpensive. Parallelism is on the rise and will eventually become the normal mode of processing. Memory is about 1 million times faster than disk, and random reads have become very expensive in terms of latency. SSDs are augmenting and may eventually replace spinning disk.
  • 8. The majority of data becomes historical over time, and all of it becomes historical once it is no longer active. [Figure: application performance and cost ("$$$ and PAIN") of transactional data over time, split into active and static data. Image courtesy: RainStor]
  • 9. Market Forces: A new set of products appears. They include some fundamental innovations. A few are sufficiently popular to last. Fashion and marketing drive greater adoption. Product defects begin to be addressed. They eventually challenge the dominant products.
  • 10. Section 1: History Part 1, Pre-relational and Relational. What we had in prior technology regimes; where we came from; what we traded away and why.
  • 11. The Dawn of Database: A schema defines the logical structure of data. The schema enables extensive reuse. Logical structure vs physical structure. ACID properties: Atomicity (a transaction either completes in full or has no effect); Consistency (a transaction takes the database from one consistent state to another); Isolation (a transaction runs in isolation from concurrent transactions); Durability (a completed transaction causes a permanent change to data).
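Atomicity is the easiest ACID property to see in action. A minimal sketch using Python's built-in `sqlite3` module; the `accounts` table and the `transfer` function are hypothetical illustrations, not part of the original slides.

```python
import sqlite3

def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> None:
    """Move `amount` between two accounts as a single all-or-nothing transaction."""
    # Using the connection as a context manager commits on success
    # and rolls back if an exception escapes the block.
    with conn:
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        (balance,) = conn.execute("SELECT balance FROM accounts WHERE name = ?", (src,)).fetchone()
        if balance < 0:
            raise ValueError("insufficient funds")  # aborts after the debit was applied
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

transfer(conn, "alice", "bob", 30)        # succeeds: both updates are kept
try:
    transfer(conn, "alice", "bob", 500)   # fails midway: the partial debit is rolled back
except ValueError:
    pass

balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(balances)
```

The failed transfer leaves no trace: the debit it had already applied is undone, so the database never exposes a half-finished transaction.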
  • 12. Database Performance Bottlenecks: CPU saturation; memory saturation; disk I/O channel saturation; locking; network saturation; parallelism (inefficient load balancing).
  • 13. The Joys of SQL? SQL is a declarative query language targeted at data organized in two-dimensional tables. It enables set operations on those tables via select, project and join operations, which can be qualified (ORDER BY, etc.). It imposes some limitations on the logical model of data. It can create a barrier between the user and the data...
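The three relational operations named on the slide can all appear in one declarative statement. A minimal sketch using Python's `sqlite3`; the `customers` and `orders` tables and their contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, region TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    INSERT INTO customers VALUES (1, 'Acme', 'West'), (2, 'Globex', 'East');
    INSERT INTO orders VALUES (10, 1, 250.0), (11, 2, 75.0), (12, 1, 40.0);
""")

rows = conn.execute("""
    SELECT c.name, o.total                      -- project: keep only two columns
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id    -- join: combine the two tables
    WHERE c.region = 'West'                     -- select: filter rows
    ORDER BY o.total DESC                       -- qualify the result with an ordering
""").fetchall()
print(rows)
```

The query says what result is wanted, not how to compute it; choosing join order and access paths is left to the database.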
  • 14. The Ordering of Data: “A data set is an unordered collection of unique, non-duplicated items.” Yet data is naturally ordered by time, if by nothing else. Events are ordered by time. Changes to entities are ordered by time. Having an inherent physical order to data can save many processing cycles in some areas of application, particularly time series applications.
  • 15. The RDBMS Optimizer: “The database can know how to access data better and faster than any programmer…” It wasn’t true. It became true. It isn’t always true. It only optimizes for persistent data.
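When rows are physically kept in time order, a time-range query can find its boundaries with binary search (O(log n)) instead of examining every row. A minimal in-memory sketch using Python's `bisect` module; the readings are hypothetical sample values, and a real time series store would apply the same idea to sorted pages on disk.

```python
import bisect
from datetime import datetime

# Readings kept in timestamp order, as an append-only time series would be.
readings = [
    (datetime(2012, 1, 18, 9, 0), 10.5),
    (datetime(2012, 1, 18, 9, 5), 11.0),
    (datetime(2012, 1, 18, 9, 10), 10.8),
    (datetime(2012, 1, 18, 9, 15), 12.1),
    (datetime(2012, 1, 18, 9, 20), 11.7),
]
timestamps = [ts for ts, _ in readings]

def window(start: datetime, end: datetime):
    """Return readings with start <= ts < end using two binary searches, not a full scan."""
    lo = bisect.bisect_left(timestamps, start)
    hi = bisect.bisect_left(timestamps, end)
    return readings[lo:hi]

result = window(datetime(2012, 1, 18, 9, 5), datetime(2012, 1, 18, 9, 16))
print([value for _, value in result])
```

An unordered set offers no such shortcut: without an order (or an index that imposes one), every row has to be checked against the range.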
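The optimizer's job is to pick an access path without the programmer changing the query. A minimal sketch using SQLite's `EXPLAIN QUERY PLAN` through Python's `sqlite3`: the same SQL text is planned as a full scan before an index exists and as an index search afterwards. The `events` table and index name are hypothetical, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO events (customer, amount) VALUES (?, ?)",
    [(f"c{i % 100}", float(i)) for i in range(10_000)],
)

query = "SELECT customer, amount FROM events WHERE customer = ?"

def plan(sql: str) -> str:
    """Return the optimizer's chosen access path (last column of each EXPLAIN QUERY PLAN row)."""
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, ("c42",)))

before = plan(query)   # no index yet: the optimizer can only scan the table
conn.execute("CREATE INDEX idx_events_customer ON events (customer)")
after = plan(query)    # same SQL text, but the optimizer now chooses the index

print(before)
print(after)
```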
  • 16. Section 2: History Part 2, Relational and Post-relational. Where we are today: OldSQL, NewSQL and NoSQL. The finalizing of the distributed web architecture. Rediscovery of the past, when we had purpose-built data stores of different types, with a twist. Revisiting of old arguments. Challenging old assumptions.
  • 18. Column Stores and Query-biased Workloads: Column store databases are still RDBMSs. Most SQL queries do not require all columns of a table, so partitioning data by columns (vertically) will usually be better than partitioning by rows (horizontally), and data compression can be more efficient. Column store databases scale up [somewhat] better than traditional RDBMSs, depending on workload, queries, etc. Column store <> column family.
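Both advantages on the slide can be sketched in a few lines: a query over one column touches only that column's values, and values within a column tend to repeat, so they compress well. This is a minimal illustration under stated assumptions: the table is hypothetical, and run-length encoding stands in for the many encodings real column stores use.

```python
from itertools import groupby

# The same three-column table, stored row-wise and column-wise.
rows = [
    ("2012-01-18", "West", 120),
    ("2012-01-18", "West", 80),
    ("2012-01-18", "East", 95),
    ("2012-01-19", "East", 60),
    ("2012-01-19", "East", 70),
]
columns = {
    "date": [r[0] for r in rows],
    "region": [r[1] for r in rows],
    "amount": [r[2] for r in rows],
}

# SUM(amount) needs only one column, so a column store reads just that list.
total = sum(columns["amount"])

def run_length_encode(values):
    """Collapse runs of repeated values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]

encoded_region = run_length_encode(columns["region"])   # 5 values -> 2 runs
encoded_date = run_length_encode(columns["date"])       # 5 values -> 2 runs
print(total, encoded_region, encoded_date)
```

In a row layout the date, region and amount values are interleaved, so long runs of identical values rarely occur and this kind of compression has little to work with.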
  • 19. New Lamps for Old: Google, Yahoo!, Facebook and others had data management problems that established products did not cater for: big data, unusual data structures, new workloads. They had money to invest and some smart engineers. They built their own solutions: Bigtable, MapReduce, Cassandra, etc. In doing so, they provoked a database revolution. In other words, the internet happened and some people noticed.
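MapReduce is a programming model as much as a system. The following is a single-process sketch of its three stages (map, shuffle, reduce) applied to word counting; it is not Google's or Hadoop's implementation, and the function names are hypothetical. A real framework runs many map and reduce tasks in parallel across machines and performs the shuffle over the network.

```python
from collections import defaultdict
from itertools import chain

def map_phase(document: str):
    """Emit (word, 1) for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Group intermediate values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Combine all values emitted for one key."""
    return key, sum(values)

documents = ["big data big tables", "new data new workloads"]
intermediate = chain.from_iterable(map_phase(d) for d in documents)
counts = dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())
print(counts)
```

Because each map call sees one record and each reduce call sees one key, both stages can be spread across many machines without coordination beyond the shuffle.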
  • 20. A Random Selection of Databases: Sybase IQ, ASE; EnterpriseDB; Algebraix; Teradata, Aster Data; LucidDB; Intersystems Caché; Oracle, RAC; Vectorwise; Streambase; Microsoft SQLServer, PDW; MonetDB; SQLStream; IBM DB2s, Netezza; Exasol; Coral8; Paraccel; Illuminate; Ingres; Kognitio; Vertica; Postgres; EMC/Greenplum; InfiniDB; Cassandra; Oracle Exadata; 1010 Data; CouchDB; SAP HANA; SAND; Mongo; Infobright; Endeca; Hbase; MySQL; Xtreme Data; Redis; MarkLogic; IMS; RainStor; Tokyo Cabinet; Hive; Scalaris. And a few hundred more…
  • 21. Section 3: Database Discussion Topics. The core post-relational changes in assumptions. Key aspects of the code-database mismatch. Reclassifying pre-relational as NoSQL. Complex data, emergent structure, types and schemas. Cloud and databases, uh-oh?
  • 22. Changing Assumptions: One single scalable piece of reliable hardware. You really need a schema all the time. A handful of discrete types is all anybody will ever need, and when they need more they can code UDTs and UDFs in C++. SQL is the optimal way to write and retrieve data. ACID always applies. Data integrity is a key component of a database.
  • 23. No SQL, New Concepts: Maybe SQL is an unacceptable constraint. Maybe SQL is unnecessary, or just unimportant, for some fit-for-purpose databases. Maybe the impedance mismatch can be avoided. Maybe a formal schema is a constraint. Maybe ACID properties can be compromised.
  • 24. The “Impedance Mismatch”: The RDBMS stores data organized according to table structures. The OO programmer manipulates data organized according to complex object structures, which may have specific methods associated with them. The data does not simply map to the structure it has within the database, so a mapping activity is necessary to get and put data. Basically: hierarchies, types, result sets, crappy APIs, language bindings, tools.
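The mapping activity is easiest to see when it is written by hand. A minimal sketch, assuming a hypothetical `Order` object that owns a list of `LineItem` objects: saving it means flattening one object graph into rows in two tables, and loading it means reassembling the object from separate result sets. This is the kind of code an object-relational mapper would otherwise generate.

```python
import sqlite3
from dataclasses import dataclass, field

@dataclass
class LineItem:
    sku: str
    quantity: int

@dataclass
class Order:
    order_id: int
    customer: str
    items: list[LineItem] = field(default_factory=list)

def save(conn: sqlite3.Connection, order: Order) -> None:
    """Flatten one object graph into rows in two tables."""
    with conn:
        conn.execute("INSERT INTO orders VALUES (?, ?)", (order.order_id, order.customer))
        conn.executemany(
            "INSERT INTO line_items VALUES (?, ?, ?)",
            [(order.order_id, item.sku, item.quantity) for item in order.items],
        )

def load(conn: sqlite3.Connection, order_id: int) -> Order:
    """Reassemble the object from two separate result sets."""
    (customer,) = conn.execute(
        "SELECT customer FROM orders WHERE order_id = ?", (order_id,)).fetchone()
    items = [LineItem(sku, qty) for sku, qty in conn.execute(
        "SELECT sku, quantity FROM line_items WHERE order_id = ? ORDER BY rowid", (order_id,))]
    return Order(order_id, customer, items)

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT);
    CREATE TABLE line_items (order_id INTEGER REFERENCES orders, sku TEXT, quantity INTEGER);
""")
original = Order(1, "Acme", [LineItem("widget", 3), LineItem("gadget", 1)])
save(conn, original)
restored = load(conn, 1)
print(restored)
```

Even in this tiny case, the nesting, the foreign key and the item ordering all have to be handled explicitly; the object's methods do not survive the trip at all.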
  • 25. NoSQL Directions: Technology Types. Some NoSQL DBs do not attempt to provide all ACID properties (Atomicity, Consistency, Isolation, Durability). Some NoSQL DBs deploy a distributed scale-out architecture with data redundancy. XML DBMSs using XQuery are NoSQL DBs. Some document stores are NoSQL DBs (OrientDB, Terrastore, etc.). Object databases are NoSQL DBs (Gemstone, Objectivity, ObjectStore, etc.). Key value stores = schema-less stores (Cassandra, MongoDB, Berkeley DB, etc.). Graph DBMSs (DEX, OrientDB, etc.) are NoSQL DBs. Large data pools (Bigtable, HBase, Mnesia, etc.) are NoSQL DBs.
  • 26. The Cloud, uh-oh: Negative implications for shared-everything databases that have scalability needs. There are architectural implications and possible incompatibilities for shared-nothing databases too. Workloads below scale and at scale (concurrency, ingest volumes and frequencies, etc.) behave differently. How does the database permit dynamic provisioning, elasticity (+/-), etc.?
  • 27. The New Database Problems for IT …are probably like old problems for people who went through the Unix client-server era: best of breed, no standards for anything, “polyglot persistence” = silos on steroids, data integration challenges, shifting data movement architectures.
  • 28. Recognize Tradeoffs: Read consistency vs programmatic correction. Schema vs a program to interpret each data structure. Standard access interface vs an API for each type of store. Data integrity enforcement vs programmatic control. Query performance for arbitrary queries vs planned access paths. Space efficiency vs simplicity / latency. Network transfer performance vs simplicity / latency. These are traded for the primary goals of horizontal scale, looser coupling, and flexibility for developers building and changing applications.
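The "schema vs a program to interpret each data structure" tradeoff can be made concrete. A minimal sketch, assuming hypothetical JSON customer records whose shape varies from record to record, as a schema-less store permits: the database accepts all of them, and application code carries the knowledge of what each shape means.

```python
import json

# Records as a schema-less store might hold them: the structure varies by record.
raw_records = [
    '{"id": 1, "name": "Acme", "phone": "555-0100"}',
    '{"id": 2, "name": "Globex", "contacts": {"phone": "555-0199"}}',
    '{"id": 3, "name": "Initech"}',
]

def interpret(record: dict) -> dict:
    """Application code, not the database, decides what each shape means."""
    phone = record.get("phone") or record.get("contacts", {}).get("phone")
    return {"id": record["id"], "name": record["name"], "phone": phone}

customers = [interpret(json.loads(r)) for r in raw_records]
print(customers)
```

With a declared schema, the second and third records would have been rejected or normalized at write time; here every reader has to repeat (and agree on) the `interpret` logic.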
  • 29. Information Management Through Human History: New technology development creates new methods to cope, which create new information scale and availability, which creates…
  • 31. Big data? Unstructured data isn’t really unstructured. The problem is that this data is unmodeled.
  • 33. Conclusion: Wherein all is revealed, or ignorance exposed. Best of breed is back, baby. Workload types and characteristics. The importance of understanding workload in order to select technology. Pragmatism, babies and bathwater.
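A web server log line is a typical example of "unstructured" data that actually follows a regular pattern. A minimal sketch, assuming a line in the Common Log Format (the sample line uses a documentation IP address): a regular expression supplies the missing model and turns the text into named fields.

```python
import re

# A web server log line: "unstructured" text that nonetheless follows a regular pattern.
line = '192.0.2.10 - - [18/Jan/2012:10:15:32 +0000] "GET /index.html HTTP/1.1" 200 5120'

LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+)'
)

match = LOG_PATTERN.match(line)
record = match.groupdict() if match else None
print(record)
```

The structure was always in the data; what was missing was a model that names the fields so they can be queried.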
  • 35. Types of Workloads: Write-biased: OLTP; OLTP, batch; OLTP, lite; object persistence; data ingest, batch; data ingest, real-time. Read-biased: query; query, simple retrieval; query, complex; query, hierarchical / object / network; analytic. Mixed?
  • 37. You must understand your workload: throughput and response time requirements aren’t enough. Consider 100 simple queries accessing month-to-date data; 90 simple queries accessing month-to-date data plus 10 complex queries using two years of history; a hazard calculation for the entire customer master. Performance problems are rarely due to a single factor.
  • 38. Seven Key Query Workload Elements: These characteristics help determine the suitability of technologies to improve query performance. 1. Retrieval: how much data comes back? 2. Selectivity: how much data is filtered? 3. Repetition: how often is the same query run? 4. Concurrency: how many queries run at once? 5. Data volume: how much data is being queried? 6. Query complexity: how many joins, aggregations, columns, filters, subselects, etc.? 7. Computational complexity: how much computation is performed over the data?
  • 39. Characteristics of BI Workloads

| Workload | Selectivity | Retrieval | Repetition | Complexity |
|---|---|---|---|---|
| Reporting / BI | Moderate | Low | Moderate | Moderate |
| Dashboards / scorecards | Moderate | Low | High | Low |
| Ad-hoc query and analysis | Low to high | Moderate to low | Low | Low to moderate |
| Analytics (batch) | Low | High | Low to high | Low* |
| Analytics (inline) | High | Low | High | Low* |
| Operational / embedded BI | High | Low | High | Low |

(*) Low for retrieving the data, high if doing analytics in SQL.
  • 40. Choosing Hardware Architectures: Compute and data sizes are key requirements. [Figure: computations (MF, GF, TF, PF) plotted against data volume (<10s GB, 100s GB, 1s TB, 10s TB, 100s TB, PB), with regions labeled, from the largest sizes down: MR and related; shared nothing; shared everything; PC or shared disk.]
  • 41. Choosing Hardware Architectures: Today’s reality, and true for a while in most businesses. [Figure: the same computations vs data volume chart, annotated “The bulk of the market resides here!”]
  • 42. Choosing Hardware Architectures: Today’s reality, and true for a while in most businesses. [Figure: the same chart, annotated “The bulk of the market resides here!” and “…but analytics pushes many things into the MPP zone.”]
  • 43. Evaluating DB Technology: 1. Define the key problems: response time, throughput, scalability? 2. Examine the workloads and their requirements. 3. Match those to suitable technologies. 4. Look for vendors using those technologies. 5. Evaluate on real data with real workloads. Copyright Third Nature, Inc.
  • 46. NoSQL Directions: Some NDBMSs do not attempt to provide all ACID properties (Atomicity, Consistency, Isolation, Durability). Some NDBMSs deploy a distributed scale-out architecture with data redundancy. XML DBMSs using XQuery are NDBMSs. Some document stores are NDBMSs (OrientDB, Terrastore, etc.). Object databases are NDBMSs (Gemstone, Objectivity, ObjectStore, etc.). Key value stores = schema-less stores (Cassandra, MongoDB, Berkeley DB, etc.). Graph DBMSs (DEX, OrientDB, etc.) are NDBMSs. Large data pools (Bigtable, HBase, Mnesia, etc.) are NDBMSs.
  • 47. The SQL Barrier: SQL has DDL (for data definition) and DML (for select, project and join), but it has no MML (math) or TML (time). Usually result sets are brought to the client for further analytical manipulation, but this creates problems. Alternatively, doing all analytical manipulation in the database creates problems.
  • 48. Discussion Topics (if not covered in history through today): the core post-relational change in assumptions; NoSQL core drivers, persistence in cloud, finalizing of web architecture, SOA-izing; a NoSQL classification list (types and projects/products); key aspects of the OR mismatch; complex data and emergent structure; database technology types; a giant list of databases; cloud and databases, uh-oh?
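The two options on the slide can be contrasted with a statistic that core SQLite does not provide: sample standard deviation. A minimal sketch with a hypothetical `sales` table. Option 1 pulls the result set to the client and computes there; option 2 registers a user-defined aggregate (via `sqlite3.Connection.create_aggregate`) so the math runs inside the query. The aggregate name `stdev` and the `StdDev` class are illustrative choices, not built-ins.

```python
import sqlite3
import statistics

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("West", 10.0), ("West", 14.0), ("East", 7.0), ("East", 9.0), ("East", 11.0)],
)

# Option 1: bring the result set to the client and analyze it there.
west = [amount for (amount,) in conn.execute("SELECT amount FROM sales WHERE region = 'West'")]
client_stdev = statistics.stdev(west)

# Option 2: push the math into the database as a user-defined aggregate.
class StdDev:
    def __init__(self):
        self.values = []

    def step(self, value):
        self.values.append(value)

    def finalize(self):
        # Sample standard deviation needs at least two values.
        return statistics.stdev(self.values) if len(self.values) > 1 else None

conn.create_aggregate("stdev", 1, StdDev)
in_db = dict(conn.execute("SELECT region, stdev(amount) FROM sales GROUP BY region"))
print(client_stdev, in_db)
```

Option 1 moves every row across the connection; option 2 avoids that but requires extending the database with custom code, which is the tension the slide describes.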