Welcome
Remember when it looked like this?
They were all pretty much alike.
It used to be easy…
Now it’s quite
c0nfuZ1nG!
But it’s also quite
  exciting!
We Lose: Joe Hellerstein (Berkeley) 2001

“Databases are commoditised and cornered to slow-moving, evolving, structure-intensive applications that require schema evolution.” …
“The internet companies are lost and we will remain in the doldrums of the enterprise space.” …
“As databases are black boxes which require a lot of coaxing to get maximum performance.”
What Happened?
The Web
With a lot of users.
we changed scale
we changed tack
New Approach to Data Access
•  Simple
•  Pragmatic
•  Solved an insoluble problem
•  Unencumbered by tradition (good & bad)
With this came a Different Focus

Tradition                              NoSQL
•  Global consistency                  •  Local consistency
•  Schema driven                       •  Schemaless
•  Reliable Network                    •  Unreliable Network
•  Highly Structured                   •  Semi-structured / Unstructured

NoSQL / Big Data technologies really focus on load and volume problems by avoiding the complexities associated with traditional transactional storage
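To make the “schemaless” column concrete, here is a minimal sketch (in Java, which is not from the deck): a document-style store modelled as a plain map of maps, where structure is the application’s concern rather than the store’s. The store shape and field names are illustrative assumptions, not any particular product’s API.

```java
import java.util.HashMap;
import java.util.Map;

// A minimal sketch of "schemaless": the store accepts any shape of
// document, and structure is the application's concern. The store
// layout and field names are illustrative, not a real product's API.
public class SchemalessSketch {
    public static void main(String[] args) {
        Map<String, Map<String, Object>> store = new HashMap<>();

        // No CREATE TABLE, no DDL: two documents with different shapes
        // live side by side in the same "collection".
        store.put("user:1", Map.of("name", "Ada", "email", "ada@example.com"));
        store.put("user:2", Map.of("name", "Lin", "followers", 12_000));

        // Consistency is local: each put touches one key (one shard),
        // not a globally coordinated transaction.
        System.out.println(store.get("user:2").get("followers"));
    }
}
```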
The ‘Relational Camp’ had been busy too
Realisation that the traditional architecture was insufficient for various modern workloads
“The End of an Architectural Era” Paper – 2007
“Because RDBMSs can be beaten by more
than an order of magnitude on the standard
OLTP benchmark, then there is no market
where they are competitive. As such, they
should be considered as legacy technology
more than a quarter of a century in age, for
which a complete redesign and re-architecting
is the appropriate next step.” – Michael
Stonebraker
No Longer One-Size-Fits-All
There is a new and impressive breed
•  Products ~ 5 years old
•  Shared nothing (sharded)
•  Designed for SSDs & 10GbE
•  Large address spaces (256GB+)
•  No indexes (column oriented) – see the sketch below
•  Dropping traditional tenets (referential integrity etc.)
•  Surprisingly quick for big queries when compared with incumbent technologies
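To illustrate the “no indexes (column oriented)” bullet, a minimal sketch with made-up data: when each field is stored as its own dense array, a big aggregate query becomes a sequential scan of just the columns it touches, which is why these products can be quick without indexes.

```java
// A minimal sketch of a column-oriented scan, with made-up data: each
// field is a dense array, so an aggregate reads only the columns it
// needs, sequentially, with no index required.
public class ColumnScanSketch {
    public static void main(String[] args) {
        // Column-oriented layout: one contiguous array per field,
        // rather than rows that carry every field.
        double[] priceColumn = {101.5, 99.2, 103.8, 98.4};
        long[]   qtyColumn   = {100, 250, 75, 500};

        // Equivalent of SELECT sum(price * qty): a cache-friendly
        // sequential pass over two arrays; untouched columns cost nothing.
        double total = 0;
        for (int i = 0; i < priceColumn.length; i++)
            total += priceColumn[i] * qtyColumn[i];

        System.out.println("total notional = " + total);
    }
}
```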
Both types of solution have clear value

…and it’s not really a question of size

[Chart: database sizes from 0 to 10,000 TB, with the two camps overlapping]

The majority of us live in the overlap region
More a question of utility…
…which tends to lead to composite offerings
Compose Solutions
So what does this mean for the enterprise?
80% of Enterprise Databases are < 1TB
(This reference is getting pretty old now, sorry – 2009)
Yet we often have a lot of them
Communication is Store & Forward
The outside world
Sometimes we’re a bit more organized!
But most of our data is not that accessible

[Diagram: Core Operational Data, with only a small portion Exposed]

…and sharing is often an afterthought

[Diagram: Core Operational Data, with only a small portion Exposed]
Services can help
But as data is getting bigger and heavier…
…it can make it hard to join data together
So we often turn to some form of Enterprise Data Warehouse
(or maybe data virtualization)
Big data tech sometimes provides a
    composite solution (or ETL)
“Ability to model data is much more of a gating factor than raw size”
– Dave Campbell (Microsoft, VLDB Keynote 2012)
Importing data into a standard model is a slow and painful process
An alternative is to use a Late Bound Schema
Combining structured & unstructured approaches in a layered fashion makes the process more nimble

[Diagram: Raw Data at the base, a Structured Standardisation Layer above it, and Late Bound Schema access alongside]
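As a minimal sketch of what “late bound” means in practice (assuming raw records are kept as untyped maps; the Trade type and field names are hypothetical): the typed view is bound when a consumer reads, not when data is loaded, so ingestion never blocks on standardisation.

```java
import java.util.List;
import java.util.Map;

// A minimal sketch of a late-bound schema: raw records are stored as
// untyped maps and a typed view is bound at read time. The Trade type
// and field names are hypothetical.
public class LateBoundSchemaSketch {

    // A typed view defined by the reader, not by the store.
    record Trade(String id, String counterparty, double notional) {}

    // Binding happens on read: missing or extra fields in the raw
    // record never break ingestion, only this particular view.
    static Trade bindTrade(Map<String, Object> raw) {
        return new Trade(
            (String) raw.getOrDefault("id", "unknown"),
            (String) raw.getOrDefault("counterparty", "unknown"),
            ((Number) raw.getOrDefault("notional", 0)).doubleValue());
    }

    public static void main(String[] args) {
        // Raw facts land as-is; no upfront standardisation step.
        List<Map<String, Object>> rawStore = List.of(
            Map.of("id", "T1", "counterparty", "ACME", "notional", 1_000_000),
            Map.of("id", "T2", "counterparty", "GLOBEX")); // schema drift tolerated

        // The schema is applied late, when a consumer asks for it.
        rawStore.stream()
                .map(LateBoundSchemaSketch::bindTrade)
                .forEach(System.out::println);
    }
}
```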
We take this kind of approach
•  Grid of machines
•  Late bound schema
•  Sharded, immutable data – see the sketch after this list
•  Low latency (real time) and high throughput (grid) use cases
•  All data is observable (an event)
•  Interfaces: Standardised (safe) or Raw (use at your own risk)
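A minimal sketch of the “sharded, immutable data” point (the Event fields and shard count are hypothetical): every write is an append-only event, routed to a shard by hashing its key, so a grid of machines can serve it without coordination, and updates arrive as new events rather than overwrites.

```java
import java.util.List;

// A minimal sketch of sharded, immutable data: writes are append-only
// events, routed to a shard by hashing their key. The Event fields and
// shard count are hypothetical.
public class ShardedEventsSketch {

    record Event(String key, long sequence, String payload) {} // immutable by construction

    static final int SHARDS = 4;

    // Deterministic routing: the same key always lands on the same
    // shard, so reads need no cross-node coordination.
    static int shardFor(String key) {
        return Math.floorMod(key.hashCode(), SHARDS);
    }

    public static void main(String[] args) {
        List<Event> events = List.of(
            new Event("trade-42", 1L, "NEW"),
            new Event("trade-42", 2L, "AMENDED")); // an update is a new event, not an overwrite

        for (Event e : events)
            System.out.println("shard " + shardFor(e.key()) + " <- " + e);
    }
}
```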
Both Raw & Standardised data is available

[Diagram: Raw Data at the base, an Object/SQL Standardisation layer above it, serving both Operational (real time / MR) and Relational Analytics consumers]
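A minimal sketch of offering both interfaces over the same data (all names are illustrative): a standardised, “safe” accessor that only exposes fields the standard model vouches for, and a raw, use-at-your-own-risk view that exposes everything with no guarantees.

```java
import java.util.Map;

// A minimal sketch of dual interfaces over one record: a standardised
// (safe) accessor and a raw (use-at-your-own-risk) view. All names
// here are illustrative.
public class DualInterfaceSketch {
    private final Map<String, Object> raw;

    DualInterfaceSketch(Map<String, Object> raw) { this.raw = raw; }

    // Standardised (safe): only fields the standard model vouches for.
    public String counterparty() {
        Object v = raw.get("counterparty");
        if (v == null) throw new IllegalStateException("not standardised yet");
        return v.toString();
    }

    // Raw (use at your own risk): everything, with no guarantees.
    public Map<String, Object> rawView() { return raw; }

    public static void main(String[] args) {
        var rec = new DualInterfaceSketch(
            Map.of("counterparty", "ACME", "srcSysFlag7", "Y"));
        System.out.println(rec.counterparty()); // the safe path
        System.out.println(rec.rawView());      // everything, unvalidated
    }
}
```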
This helps to loosen the grip of the single schema, whilst also providing a more iterative approach to standardisation
Support for both one standardised and many bespoke models in the same technology

[Diagram: Raw Facts from different systems feeding a Standardised Model]
Next step: to centralise common processing tasks

[Diagram: Standardised Model feeding a Risk Calculation]
Are we back to the mainframe?
Thanks



http://www.benstopford.com

The return of big iron?
