Big iron 2 (published)

Ben Stopford, Engineer at Confluent
The Return of Big Iron?
Ben Stopford
Distinguished Engineer
RBS Markets
Much diversity
What does this mean?
• A change in what customers (we) value
• The mainstream is not serving customers
(us) sufficiently
The Database field has problems
We Lose: Joe Hellerstein (Berkeley) 2001
“Databases are commoditised and cornered
to slow-moving, evolving, structure
intensive, applications that require schema
evolution.” …
“The internet companies are lost and we will
remain in the doldrums of the enterprise
space.” …
“As databases are black boxes which
require a lot of coaxing to get maximum
performance”
His question was how to win
them back?
These new technologies also
caused frustration
Backlash (2009)
Not novel (dates back to the 80’s)
Physical level not the logical level (messy?)
Incompatible with tooling
Lack of integrity (referential) & ACID
MR is brute force, ignoring indexing and skew
All points are reasonable
And they proved it too!
“A Comparison of Approaches to Large-Scale Data Analysis” – SIGMOD 2009
• Vertica vs. DBMS-X vs. Hadoop
• Vertica up to 7x faster than Hadoop over benchmarks
Databases faster
than Hadoop
But possibly missed the point?
Databases were traditionally
designed to keep data safe
NoSQL grew from a need to scale
It’s more than just scale, they
facilitate different practices
A Better Fit
They better match the way software is
engineered today.
– Iterative development
– Fast feedback
– Frequent releases
Is NoSQL a Disruptive Technology?
Christensen’s observation:
Market leaders are displaced when markets
shift in ways that the incumbent leaders are
not prepared for.
Aside: MongoDB
• Impressive trajectory
• Slightly crappy product (from a traditional
database standpoint)
• Most closely related to relational DB (of
the NoSQLs)
• Plays to the agile mindset
Yet the NoSQL market is relatively
small
• Currently around $600 but projected to
grow strongly
• Database and systems management
market is worth around $34 billion
Key Point

There is more to NoSQL than just
scale, it sits better with the way we
build software today
We have new building blocks to
play with!
My Problem
• Sprawling application space, built over
many years, grouped into both vertical and
horizontal silos
• Duplication of effort
• Data corruption & preventative measures
• Consolidation is costly, time consuming
and technically challenging.
Traditional solutions
(in chronological order)
– Messaging
– SOA
– Enterprise Data Warehouse
– Data virtualisation
Bringing data, applications, people
together is hard
A popular choice is an EDW
EDW pattern is workable, but tough
– As soon as you take a ‘view’ on what the
shape of the data is, it becomes harder to
change.
• Leave ‘taking a view’ to the last responsible moment

– Multifaceted: Shape, diversity of source,
diversity of population, temporal change
Harder to do iteratively
Is this the only way?
The Google Approach
MapReduce
Google Filesystem
BigTable
Tenzing
Megastore
F1
Dremel
Spanner
And just one code base!
So no enterprise schema secret
society!
The Ebay Approach
The Partial-Schematic
Approach
Often termed Clobs & Cracking
Problems with solidifying a
schematic representation
• Risk of throwing information away, keeping
only what you think you need.
– OK if you create data
– Bad if you got data from elsewhere

• Data tends to be poly-structured in
programs and on the wire
• Early-binding slows down development
But schemas are good
• They guarantee a contract
• That contract spans the whole dataset
– Similar to static typing in programming
languages.
Compromise positions
• Query schema can be a subset of data
schema.
• Use schemaless databases to capture
diversity early and evolve it as you build.
Common solutions today use
multiple technologies
[Diagram: MapReduce, Data Warehouse, Key Value Store, In-Memory/OLTP Database, and a question mark]
We use a late-bound schema,
sitting over a schemaless store
[Diagram: Structured, Standardisation Layer, Raw Data, Late Bound Schema]
Evolutionary Approach
• Late-binding makes consolidation
incremental
– Schematic representation delivered at the ‘last
responsible moment’ (schema on demand)
– A trade in this model has 4 mandatory nodes. A
fully modeled trade has around 800.

• The system of record is raw data, not our
‘view’ of it
• No schema migration! But this comes at a
price.
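
A minimal sketch of the late-bound idea, assuming JSON documents and an in-memory dictionary as the raw store. The names RAW_STORE, TRADE_VIEW, record and bind are hypothetical and are not taken from the talk. The raw document is the system of record; a small query schema is applied only when the data is read, so adding a field later needs no migration.

```python
import json

# Hypothetical raw store: documents are kept exactly as they arrived.
RAW_STORE = {}


def record(key, raw_json):
    """Store the source document untouched; no schema is applied on write."""
    RAW_STORE[key] = raw_json


# A late-bound query schema: field name -> path into the raw document.
# It covers only what this consumer needs, not the whole data schema.
TRADE_VIEW = {
    "trade_id": ("id",),
    "counterparty": ("party", "name"),
    "notional": ("economics", "notional"),
}


def bind(raw_json, schema):
    """Apply a schema at read time; paths that do not resolve become None."""
    doc = json.loads(raw_json)
    row = {}
    for field, path in schema.items():
        node = doc
        for step in path:
            node = node.get(step) if isinstance(node, dict) else None
            if node is None:
                break
        row[field] = node
    return row


record(
    "t1",
    '{"id": "t1", "party": {"name": "ACME"},'
    ' "economics": {"notional": 5000000}, "extra": {"desk": "rates"}}',
)
view = bind(RAW_STORE["t1"], TRADE_VIEW)

# A field nobody modelled up front can still be bound later,
# because the raw document was never thrown away.
desk_view = bind(RAW_STORE["t1"], {"desk": ("extra", "desk")})
```

The price mentioned above shows up here too: every read pays for parsing and binding, and nothing checks the data against a contract at write time.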
Scaling
Key based access always scales
But queries (without the sharding key) always broadcast
As query complexity increases so does the overhead
Coarse-grained shards
Data replicas provide hardware isolation
Scaling
• Key based sharding is only sufficient for very simple workloads
• Coarse-grained shards help (but suffer from skew)
• Replication provides useful, if expensive,
hardware isolation
• Workload management is less useful in
my experience
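
The difference between key based access and a broadcast query can be sketched as follows. This is an illustration only: ShardedStore and its shards_touched counter are hypothetical names, and the shards are plain dictionaries in one process rather than separate machines.

```python
import zlib


class ShardedStore:
    """Toy hash-sharded store that counts how many shards each call touches."""

    def __init__(self, shard_count):
        self.shards = [dict() for _ in range(shard_count)]
        self.shards_touched = 0

    def _shard_for(self, key):
        # A stable hash, so the same key always routes to the same shard.
        return zlib.crc32(key.encode("utf-8")) % len(self.shards)

    def put(self, key, value):
        self.shards[self._shard_for(key)][key] = value

    def get(self, key):
        # Key based access: exactly one shard is involved.
        self.shards_touched += 1
        return self.shards[self._shard_for(key)].get(key)

    def query(self, predicate):
        # No sharding key in the query: every shard has to be asked.
        results = []
        for shard in self.shards:
            self.shards_touched += 1
            results.extend(v for v in shard.values() if predicate(v))
        return results


store = ShardedStore(shard_count=4)
for i in range(8):
    book = "rates" if i % 2 == 0 else "fx"
    store.put(f"trade-{i}", {"id": f"trade-{i}", "book": book})

one = store.get("trade-3")                          # touches 1 shard
rates = store.query(lambda t: t["book"] == "rates")  # touches all 4 shards
```

Adding shards leaves the cost of get unchanged, while the cost of query grows with the shard count.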
Weak consistency forces the
problem onto the developer
Particularly bad for banks!
Scaling two phase commit is hard to
do efficiently
• Requires distributed lock/clock/counter
• Requires synchronisation of all readers &
writers
Alternatives to traditional 2PC
• MVCC over explicit locking
• Timestamp based strong consistency
– E.g. Granola

• Optimistic concurrency control
– Leverage short running transactions (avoid
cross-network transactions)
– Tolerate different temporal viewpoints to
reduce synchronization costs.
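
As an illustration of the optimistic option, here is a minimal single-process sketch that uses a version number per key. OptimisticStore, VersionConflict and update_with_retry are hypothetical names, and a real distributed store would need an atomic compare-and-set where this sketch uses a plain dictionary.

```python
class VersionConflict(Exception):
    """Raised when a write is based on a stale read."""


class OptimisticStore:
    def __init__(self):
        self._data = {}  # key -> (version, value)

    def read(self, key):
        # Unknown keys read as version 0 with no value.
        return self._data.get(key, (0, None))

    def write(self, key, expected_version, value):
        current_version, _ = self._data.get(key, (0, None))
        if current_version != expected_version:
            raise VersionConflict(
                f"{key}: expected v{expected_version}, found v{current_version}"
            )
        self._data[key] = (current_version + 1, value)
        return current_version + 1


def update_with_retry(store, key, fn, max_attempts=3):
    """Read, compute, and write back; re-read and retry on conflict."""
    for _ in range(max_attempts):
        version, value = store.read(key)
        try:
            return store.write(key, version, fn(value))
        except VersionConflict:
            continue
    raise VersionConflict(f"{key}: gave up after {max_attempts} attempts")


store = OptimisticStore()
store.write("position", 0, 100)
new_version = update_with_retry(store, "position", lambda v: v + 50)
```

No lock is held between the read and the write, which is why this works best with short running transactions: the longer the gap, the more likely a conflict and a retry.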
Immutable Data
• Safety
• ‘As was’ view
• Sits well with MVCC
• Efficiency problems
• Gaining popularity (e.g. Datomic)
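
The ‘as was’ view can be sketched with an append-only store that never overwrites a value. ImmutableStore and its methods are hypothetical names, and integer timestamps are an assumption made to keep the example short.

```python
import bisect


class ImmutableStore:
    """Append-only store: every write adds a version, nothing is updated."""

    def __init__(self):
        self._history = {}  # key -> (sorted timestamps, values)

    def append(self, key, timestamp, value):
        times, values = self._history.setdefault(key, ([], []))
        if times and timestamp <= times[-1]:
            raise ValueError("timestamps must increase for each key")
        times.append(timestamp)
        values.append(value)

    def as_of(self, key, timestamp):
        """Return the value as it was at the given time, or None."""
        times, values = self._history.get(key, ([], []))
        i = bisect.bisect_right(times, timestamp)
        return values[i - 1] if i else None


store = ImmutableStore()
store.append("t1", 1, {"status": "new"})
store.append("t1", 5, {"status": "amended"})

before = store.as_of("t1", 3)   # the version written at time 1
latest = store.as_of("t1", 9)   # the version written at time 5
```

The efficiency problem is visible in the sketch: history only grows, so storage and lookup cost rise with the number of versions kept.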
Use joins to avoid ‘over aggregating’

Joins are ok, so long as they are
– Local
– via a unique key

[Diagram: Trader, Party, Trade]
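
A join that is both local and keyed can be sketched as one dictionary lookup per row. The sketch assumes that the party reference data is available on the same node as the trades (for example because it is replicated to every shard); that placement, the sample data, and the name local_join are assumptions for illustration.

```python
# Hypothetical data held on a single shard.
parties_by_id = {
    "p1": {"id": "p1", "name": "ACME"},
    "p2": {"id": "p2", "name": "Globex"},
}
trades = [
    {"id": "t1", "party_id": "p1", "notional": 100},
    {"id": "t2", "party_id": "p2", "notional": 250},
    {"id": "t3", "party_id": "p1", "notional": 75},
]


def local_join(trades, parties_by_id):
    """Join each trade to its party via the unique party id.

    Both sides live on the same node, so there is no network hop,
    and the unique key makes each lookup a single dictionary access.
    """
    return [
        {**trade, "party_name": parties_by_id[trade["party_id"]]["name"]}
        for trade in trades
    ]


joined = local_join(trades, parties_by_id)
```

Keeping Party separate from Trade avoids copying party details into every trade, which is the ‘over aggregating’ the slide warns about.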
Memory/Disk Tradeoff
• Memory only (possibly overplayed)
• Pinned indexes (generally good idea if you
can afford the RAM)
• Disk resident (best general purpose
solution and for very large datasets)
Balance flexibility and complexity
[Diagram: Operational (real time / MR), Object/SQL, Standardisation, Raw Data, Relational Analytics]
Supple at the front, more rigid at the back

[Diagram labels: Raw Access, Operational Access, Analytic Access; Looser, Tighter; Untyped, Object/SQL, Reporting; Broad Data Coverage, Narrow Data Coverage; Narrow Query, Comprehensive Query]
Principles
• Record everything
• Grow a schema, don’t do it upfront
• Avoid using a ‘view’ as your system of record.
• Differentiate between sourced data (out of your control) and generated data (in your control).
• Use automated replication (for isolation) as well as sharding (for scale)
• Leverage asynchronicity to reduce transaction overheads
Consolidation means more trust, fewer impedance mismatches and managing tighter couplings
Target architectures are starting to
look more like large applications
of cloud enabled services than
heterogeneous application
conglomerates
Are we going back to the mainframe?
Thanks

http://www.benstopford.com
1 of 58

Recommended

The return of big iron? by
The return of big iron?The return of big iron?
The return of big iron?Ben Stopford
13.3K views45 slides
Big Data & the Enterprise by
Big Data & the EnterpriseBig Data & the Enterprise
Big Data & the EnterpriseBen Stopford
13.4K views44 slides
No sql by
No sqlNo sql
No sqlPrateek Jain
448 views38 slides
Cloud Computing: The Hard Problems Never Go Away by
Cloud Computing: The Hard Problems Never Go AwayCloud Computing: The Hard Problems Never Go Away
Cloud Computing: The Hard Problems Never Go AwayZendCon
2.4K views48 slides
SQL/NoSQL How to choose ? by
SQL/NoSQL How to choose ?SQL/NoSQL How to choose ?
SQL/NoSQL How to choose ?Venu Anuganti
50.6K views43 slides
Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H... by
Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...
Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...DataStax
5.4K views40 slides

More Related Content

What's hot

Lessons from lhc by
Lessons from lhcLessons from lhc
Lessons from lhcdrsm79
271 views40 slides
SQL or NoSQL, that is the question! by
SQL or NoSQL, that is the question!SQL or NoSQL, that is the question!
SQL or NoSQL, that is the question!Andraz Tori
2.3K views95 slides
Performance Considerations in Logical Data Warehouse by
Performance Considerations in Logical Data WarehousePerformance Considerations in Logical Data Warehouse
Performance Considerations in Logical Data WarehouseDenodo
1.3K views19 slides
NoSQL Architecture Overview by
NoSQL Architecture OverviewNoSQL Architecture Overview
NoSQL Architecture OverviewChristopher Foot
3.1K views37 slides
How Consistent Data Services Deliver Simplicity, Compatibility, And Lower Cost by
How Consistent Data Services Deliver Simplicity, Compatibility, And Lower CostHow Consistent Data Services Deliver Simplicity, Compatibility, And Lower Cost
How Consistent Data Services Deliver Simplicity, Compatibility, And Lower CostDana Gardner
148 views9 slides
Data Warehouse in Cloud by
Data Warehouse in CloudData Warehouse in Cloud
Data Warehouse in CloudPawan Bhargava
210 views17 slides

What's hot(20)

Lessons from lhc by drsm79
Lessons from lhcLessons from lhc
Lessons from lhc
drsm79271 views
SQL or NoSQL, that is the question! by Andraz Tori
SQL or NoSQL, that is the question!SQL or NoSQL, that is the question!
SQL or NoSQL, that is the question!
Andraz Tori2.3K views
Performance Considerations in Logical Data Warehouse by Denodo
Performance Considerations in Logical Data WarehousePerformance Considerations in Logical Data Warehouse
Performance Considerations in Logical Data Warehouse
Denodo 1.3K views
How Consistent Data Services Deliver Simplicity, Compatibility, And Lower Cost by Dana Gardner
How Consistent Data Services Deliver Simplicity, Compatibility, And Lower CostHow Consistent Data Services Deliver Simplicity, Compatibility, And Lower Cost
How Consistent Data Services Deliver Simplicity, Compatibility, And Lower Cost
Dana Gardner148 views
Building a Digital Bank by DataStax
Building a Digital BankBuilding a Digital Bank
Building a Digital Bank
DataStax2K views
How to select a modern data warehouse and get the most out of it? by Slim Baltagi
How to select a modern data warehouse and get the most out of it?How to select a modern data warehouse and get the most out of it?
How to select a modern data warehouse and get the most out of it?
Slim Baltagi2.7K views
O'Reilly ebook: Operationalizing the Data Lake by Vasu S
O'Reilly ebook: Operationalizing the Data LakeO'Reilly ebook: Operationalizing the Data Lake
O'Reilly ebook: Operationalizing the Data Lake
Vasu S119 views
Where Does Big Data Meet Big Database - QCon 2012 by Ben Stopford
Where Does Big Data Meet Big Database - QCon 2012Where Does Big Data Meet Big Database - QCon 2012
Where Does Big Data Meet Big Database - QCon 2012
Ben Stopford966 views
Using Graph Databases in Real-Time to Solve Resource Authorization at Telenor... by Sebastian Verheughe
Using Graph Databases in Real-Time to Solve Resource Authorization at Telenor...Using Graph Databases in Real-Time to Solve Resource Authorization at Telenor...
Using Graph Databases in Real-Time to Solve Resource Authorization at Telenor...
Sebastian Verheughe4.5K views
So You Want to Build a Data Lake? by David P. Moore
So You Want to Build a Data Lake?So You Want to Build a Data Lake?
So You Want to Build a Data Lake?
David P. Moore152 views
How much money do you lose every time your ecommerce site goes down? by DataStax
How much money do you lose every time your ecommerce site goes down?How much money do you lose every time your ecommerce site goes down?
How much money do you lose every time your ecommerce site goes down?
DataStax3.3K views
Big Challenges in Data Modeling: NoSQL and Data Modeling by DATAVERSITY
Big Challenges in Data Modeling: NoSQL and Data ModelingBig Challenges in Data Modeling: NoSQL and Data Modeling
Big Challenges in Data Modeling: NoSQL and Data Modeling
DATAVERSITY3.2K views

Viewers also liked

JAX London Slides by
JAX London SlidesJAX London Slides
JAX London SlidesBen Stopford
5.4K views82 slides
Streaming, Database & Distributed Systems Bridging the Divide by
Streaming, Database & Distributed Systems Bridging the DivideStreaming, Database & Distributed Systems Bridging the Divide
Streaming, Database & Distributed Systems Bridging the DivideBen Stopford
7.9K views82 slides
Microservices for a Streaming World by
Microservices for a Streaming WorldMicroservices for a Streaming World
Microservices for a Streaming WorldBen Stopford
9.9K views138 slides
Data Pipelines with Apache Kafka by
Data Pipelines with Apache KafkaData Pipelines with Apache Kafka
Data Pipelines with Apache KafkaBen Stopford
5.1K views50 slides
A little bit of clojure by
A little bit of clojureA little bit of clojure
A little bit of clojureBen Stopford
9.4K views22 slides
The Power of the Log by
The Power of the LogThe Power of the Log
The Power of the LogBen Stopford
4.9K views71 slides

Viewers also liked(20)

JAX London Slides by Ben Stopford
JAX London SlidesJAX London Slides
JAX London Slides
Ben Stopford5.4K views
Streaming, Database & Distributed Systems Bridging the Divide by Ben Stopford
Streaming, Database & Distributed Systems Bridging the DivideStreaming, Database & Distributed Systems Bridging the Divide
Streaming, Database & Distributed Systems Bridging the Divide
Ben Stopford7.9K views
Microservices for a Streaming World by Ben Stopford
Microservices for a Streaming WorldMicroservices for a Streaming World
Microservices for a Streaming World
Ben Stopford9.9K views
Data Pipelines with Apache Kafka by Ben Stopford
Data Pipelines with Apache KafkaData Pipelines with Apache Kafka
Data Pipelines with Apache Kafka
Ben Stopford5.1K views
A little bit of clojure by Ben Stopford
A little bit of clojureA little bit of clojure
A little bit of clojure
Ben Stopford9.4K views
The Power of the Log by Ben Stopford
The Power of the LogThe Power of the Log
The Power of the Log
Ben Stopford4.9K views
Linux Performance Tools by Brendan Gregg
Linux Performance ToolsLinux Performance Tools
Linux Performance Tools
Brendan Gregg232.8K views
Ideas for Distributing Skills Across a Continental Divide by Ben Stopford
Ideas for Distributing Skills Across a Continental DivideIdeas for Distributing Skills Across a Continental Divide
Ideas for Distributing Skills Across a Continental Divide
Ben Stopford1.7K views
Test-Oriented Languages: Is it time for a new era? by Ben Stopford
Test-Oriented Languages: Is it time for a new era?Test-Oriented Languages: Is it time for a new era?
Test-Oriented Languages: Is it time for a new era?
Ben Stopford2.2K views
Refactoring tested code - has mocking gone wrong? by Ben Stopford
Refactoring tested code - has mocking gone wrong?Refactoring tested code - has mocking gone wrong?
Refactoring tested code - has mocking gone wrong?
Ben Stopford2.7K views
Beyond The Data Grid: Coherence, Normalisation, Joins and Linear Scalability by Ben Stopford
Beyond The Data Grid: Coherence, Normalisation, Joins and Linear ScalabilityBeyond The Data Grid: Coherence, Normalisation, Joins and Linear Scalability
Beyond The Data Grid: Coherence, Normalisation, Joins and Linear Scalability
Ben Stopford8.4K views
Coherence Implementation Patterns - Sig Nov 2011 by Ben Stopford
Coherence Implementation Patterns - Sig Nov 2011Coherence Implementation Patterns - Sig Nov 2011
Coherence Implementation Patterns - Sig Nov 2011
Ben Stopford7.1K views
Top Mistakes When Writing Reactive Applications - Scala by the Bay 2016 by Petr Zapletal
Top Mistakes When Writing Reactive Applications - Scala by the Bay 2016Top Mistakes When Writing Reactive Applications - Scala by the Bay 2016
Top Mistakes When Writing Reactive Applications - Scala by the Bay 2016
Petr Zapletal1.1K views
Time Manager Workshop at #QGIS2015 Conference in Nodebo by Anita Graser
Time Manager Workshop at #QGIS2015 Conference in NodeboTime Manager Workshop at #QGIS2015 Conference in Nodebo
Time Manager Workshop at #QGIS2015 Conference in Nodebo
Anita Graser38.8K views
User Focused Security at Netflix: Stethoscope by Jesse Kriss
User Focused Security at Netflix: StethoscopeUser Focused Security at Netflix: Stethoscope
User Focused Security at Netflix: Stethoscope
Jesse Kriss2.2K views
forward and backward chaining by Rado Sianipar
forward and backward chainingforward and backward chaining
forward and backward chaining
Rado Sianipar10.4K views
Taking the friction out of microservice frameworks with Lagom by Markus Eisele
Taking the friction out of microservice frameworks with LagomTaking the friction out of microservice frameworks with Lagom
Taking the friction out of microservice frameworks with Lagom
Markus Eisele1.4K views
Stay productive while slicing up the monolith by Markus Eisele
Stay productive while slicing up the monolith Stay productive while slicing up the monolith
Stay productive while slicing up the monolith
Markus Eisele438 views
Modernizing Applications with Microservices by Markus Eisele
Modernizing Applications with MicroservicesModernizing Applications with Microservices
Modernizing Applications with Microservices
Markus Eisele544 views

Similar to Big iron 2 (published)

SQL, NoSQL, BigData in Data Architecture by
SQL, NoSQL, BigData in Data ArchitectureSQL, NoSQL, BigData in Data Architecture
SQL, NoSQL, BigData in Data ArchitectureVenu Anuganti
32.6K views41 slides
What ya gonna do? by
What ya gonna do?What ya gonna do?
What ya gonna do?CQD
364 views37 slides
Big Data Platforms: An Overview by
Big Data Platforms: An OverviewBig Data Platforms: An Overview
Big Data Platforms: An OverviewC. Scyphers
28K views93 slides
Relational databases vs Non-relational databases by
Relational databases vs Non-relational databasesRelational databases vs Non-relational databases
Relational databases vs Non-relational databasesJames Serra
22.6K views46 slides
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn by
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedInJay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedInLinkedIn
3.7K views46 slides
One Size Doesn't Fit All: The New Database Revolution by
One Size Doesn't Fit All: The New Database RevolutionOne Size Doesn't Fit All: The New Database Revolution
One Size Doesn't Fit All: The New Database Revolutionmark madsen
17.1K views63 slides

Similar to Big iron 2 (published)(20)

SQL, NoSQL, BigData in Data Architecture by Venu Anuganti
SQL, NoSQL, BigData in Data ArchitectureSQL, NoSQL, BigData in Data Architecture
SQL, NoSQL, BigData in Data Architecture
Venu Anuganti32.6K views
What ya gonna do? by CQD
What ya gonna do?What ya gonna do?
What ya gonna do?
CQD364 views
Big Data Platforms: An Overview by C. Scyphers
Big Data Platforms: An OverviewBig Data Platforms: An Overview
Big Data Platforms: An Overview
C. Scyphers28K views
Relational databases vs Non-relational databases by James Serra
Relational databases vs Non-relational databasesRelational databases vs Non-relational databases
Relational databases vs Non-relational databases
James Serra22.6K views
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn by LinkedIn
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedInJay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
LinkedIn3.7K views
One Size Doesn't Fit All: The New Database Revolution by mark madsen
One Size Doesn't Fit All: The New Database RevolutionOne Size Doesn't Fit All: The New Database Revolution
One Size Doesn't Fit All: The New Database Revolution
mark madsen17.1K views
NoSQLDatabases by Adi Challa
NoSQLDatabasesNoSQLDatabases
NoSQLDatabases
Adi Challa395 views
Building FoundationDB by FoundationDB
Building FoundationDBBuilding FoundationDB
Building FoundationDB
FoundationDB3.4K views
Everything We Learned About In-Memory Data Layout While Building VoltDB by jhugg
Everything We Learned About In-Memory Data Layout While Building VoltDBEverything We Learned About In-Memory Data Layout While Building VoltDB
Everything We Learned About In-Memory Data Layout While Building VoltDB
jhugg826 views
Database Revolution - Exploratory Webcast by Inside Analysis
Database Revolution - Exploratory WebcastDatabase Revolution - Exploratory Webcast
Database Revolution - Exploratory Webcast
Inside Analysis658 views
Database revolution opening webcast 01 18-12 by mark madsen
Database revolution opening webcast 01 18-12Database revolution opening webcast 01 18-12
Database revolution opening webcast 01 18-12
mark madsen1K views
JasperWorld 2012: Reinventing Data Management by Max Schireson by MongoDB
JasperWorld 2012: Reinventing Data Management by Max SchiresonJasperWorld 2012: Reinventing Data Management by Max Schireson
JasperWorld 2012: Reinventing Data Management by Max Schireson
MongoDB381 views
When to Use MongoDB...and When You Should Not... by MongoDB
When to Use MongoDB...and When You Should Not...When to Use MongoDB...and When You Should Not...
When to Use MongoDB...and When You Should Not...
MongoDB6.7K views
SpringPeople - Introduction to Cloud Computing by SpringPeople
SpringPeople - Introduction to Cloud ComputingSpringPeople - Introduction to Cloud Computing
SpringPeople - Introduction to Cloud Computing
SpringPeople252 views
Navigating NoSQL in cloudy skies by shnkr_rmchndrn
Navigating NoSQL in cloudy skiesNavigating NoSQL in cloudy skies
Navigating NoSQL in cloudy skies
shnkr_rmchndrn863 views
Big Data Cloud Meetup - Jan 29 2013 - Mike Stonebraker & Scott Jarr of VoltDB by BigDataCloud
Big Data Cloud Meetup - Jan 29 2013 - Mike Stonebraker & Scott Jarr of VoltDBBig Data Cloud Meetup - Jan 29 2013 - Mike Stonebraker & Scott Jarr of VoltDB
Big Data Cloud Meetup - Jan 29 2013 - Mike Stonebraker & Scott Jarr of VoltDB
BigDataCloud967 views
"Navigating the Database Universe" by Dr. Michael Stonebraker and Scott Jarr,... by lisapaglia
"Navigating the Database Universe" by Dr. Michael Stonebraker and Scott Jarr,..."Navigating the Database Universe" by Dr. Michael Stonebraker and Scott Jarr,...
"Navigating the Database Universe" by Dr. Michael Stonebraker and Scott Jarr,...
lisapaglia1.4K views

More from Ben Stopford

10 Principals for Effective Event-Driven Microservices with Apache Kafka by
10 Principals for Effective Event-Driven Microservices with Apache Kafka10 Principals for Effective Event-Driven Microservices with Apache Kafka
10 Principals for Effective Event-Driven Microservices with Apache KafkaBen Stopford
964 views91 slides
10 Principals for Effective Event Driven Microservices by
10 Principals for Effective Event Driven Microservices10 Principals for Effective Event Driven Microservices
10 Principals for Effective Event Driven MicroservicesBen Stopford
383 views91 slides
The Future of Streaming: Global Apps, Event Stores and Serverless by
The Future of Streaming: Global Apps, Event Stores and ServerlessThe Future of Streaming: Global Apps, Event Stores and Serverless
The Future of Streaming: Global Apps, Event Stores and ServerlessBen Stopford
452 views56 slides
A Global Source of Truth for the Microservices Generation by
A Global Source of Truth for the Microservices GenerationA Global Source of Truth for the Microservices Generation
A Global Source of Truth for the Microservices GenerationBen Stopford
1.6K views51 slides
Building Event Driven Services with Kafka Streams by
Building Event Driven Services with Kafka StreamsBuilding Event Driven Services with Kafka Streams
Building Event Driven Services with Kafka StreamsBen Stopford
13K views68 slides
NDC London 2017 - The Data Dichotomy- Rethinking Data and Services with Streams by
NDC London 2017  - The Data Dichotomy- Rethinking Data and Services with StreamsNDC London 2017  - The Data Dichotomy- Rethinking Data and Services with Streams
NDC London 2017 - The Data Dichotomy- Rethinking Data and Services with StreamsBen Stopford
4.6K views107 slides

More from Ben Stopford(18)

10 Principals for Effective Event-Driven Microservices with Apache Kafka by Ben Stopford
10 Principals for Effective Event-Driven Microservices with Apache Kafka10 Principals for Effective Event-Driven Microservices with Apache Kafka
10 Principals for Effective Event-Driven Microservices with Apache Kafka
Ben Stopford964 views
10 Principals for Effective Event Driven Microservices by Ben Stopford
10 Principals for Effective Event Driven Microservices10 Principals for Effective Event Driven Microservices
10 Principals for Effective Event Driven Microservices
Ben Stopford383 views
The Future of Streaming: Global Apps, Event Stores and Serverless by Ben Stopford
The Future of Streaming: Global Apps, Event Stores and ServerlessThe Future of Streaming: Global Apps, Event Stores and Serverless
The Future of Streaming: Global Apps, Event Stores and Serverless
Ben Stopford452 views
A Global Source of Truth for the Microservices Generation by Ben Stopford
A Global Source of Truth for the Microservices GenerationA Global Source of Truth for the Microservices Generation
A Global Source of Truth for the Microservices Generation
Ben Stopford1.6K views
Building Event Driven Services with Kafka Streams by Ben Stopford
Building Event Driven Services with Kafka StreamsBuilding Event Driven Services with Kafka Streams
Building Event Driven Services with Kafka Streams
Ben Stopford13K views
NDC London 2017 - The Data Dichotomy- Rethinking Data and Services with Streams by Ben Stopford
NDC London 2017  - The Data Dichotomy- Rethinking Data and Services with StreamsNDC London 2017  - The Data Dichotomy- Rethinking Data and Services with Streams
NDC London 2017 - The Data Dichotomy- Rethinking Data and Services with Streams
Ben Stopford4.6K views
Building Event Driven Services with Apache Kafka and Kafka Streams - Devoxx B... by Ben Stopford
Building Event Driven Services with Apache Kafka and Kafka Streams - Devoxx B...Building Event Driven Services with Apache Kafka and Kafka Streams - Devoxx B...
Building Event Driven Services with Apache Kafka and Kafka Streams - Devoxx B...
Ben Stopford1.4K views
Building Event Driven Services with Stateful Streams by Ben Stopford
Building Event Driven Services with Stateful StreamsBuilding Event Driven Services with Stateful Streams
Building Event Driven Services with Stateful Streams
Ben Stopford5.8K views
Devoxx London 2017 - Rethinking Services With Stateful Streams by Ben Stopford
Devoxx London 2017 - Rethinking Services With Stateful StreamsDevoxx London 2017 - Rethinking Services With Stateful Streams
Devoxx London 2017 - Rethinking Services With Stateful Streams
Ben Stopford6.6K views
Event Driven Services Part 2: Building Event-Driven Services with Apache Kafka by Ben Stopford
Event Driven Services Part 2:  Building Event-Driven Services with Apache KafkaEvent Driven Services Part 2:  Building Event-Driven Services with Apache Kafka
Event Driven Services Part 2: Building Event-Driven Services with Apache Kafka
Ben Stopford987 views
Event Driven Services Part 1: The Data Dichotomy by Ben Stopford
Event Driven Services Part 1: The Data Dichotomy Event Driven Services Part 1: The Data Dichotomy
Event Driven Services Part 1: The Data Dichotomy
Ben Stopford806 views
Event Driven Services Part 3: Putting the Micro into Microservices with State... by Ben Stopford
Event Driven Services Part 3: Putting the Micro into Microservices with State...Event Driven Services Part 3: Putting the Micro into Microservices with State...
Event Driven Services Part 3: Putting the Micro into Microservices with State...
Ben Stopford1.1K views
Strata Software Architecture NY: The Data Dichotomy by Ben Stopford
Strata Software Architecture NY: The Data DichotomyStrata Software Architecture NY: The Data Dichotomy
Strata Software Architecture NY: The Data Dichotomy
Ben Stopford2.6K views
Advanced databases ben stopford by Ben Stopford
Advanced databases   ben stopfordAdvanced databases   ben stopford
Advanced databases ben stopford
Ben Stopford1.3K views
A Paradigm Shift: The Increasing Dominance of Memory-Oriented Solutions for H... by Ben Stopford
A Paradigm Shift: The Increasing Dominance of Memory-Oriented Solutions for H...A Paradigm Shift: The Increasing Dominance of Memory-Oriented Solutions for H...
A Paradigm Shift: The Increasing Dominance of Memory-Oriented Solutions for H...
Ben Stopford5.5K views
Balancing Replication and Partitioning in a Distributed Java Database by Ben Stopford
Balancing Replication and Partitioning in a Distributed Java DatabaseBalancing Replication and Partitioning in a Distributed Java Database
Balancing Replication and Partitioning in a Distributed Java Database
Ben Stopford6.5K views
Data Grids with Oracle Coherence by Ben Stopford
Data Grids with Oracle CoherenceData Grids with Oracle Coherence
Data Grids with Oracle Coherence
Ben Stopford6.9K views
Architecting for Change: An Agile Approach by Ben Stopford
Architecting for Change: An Agile ApproachArchitecting for Change: An Agile Approach
Architecting for Change: An Agile Approach
Ben Stopford1.7K views

Recently uploaded

Network Source of Truth and Infrastructure as Code revisited by
Network Source of Truth and Infrastructure as Code revisitedNetwork Source of Truth and Infrastructure as Code revisited
Network Source of Truth and Infrastructure as Code revisitedNetwork Automation Forum
26 views45 slides
Scaling Knowledge Graph Architectures with AI by
Scaling Knowledge Graph Architectures with AIScaling Knowledge Graph Architectures with AI
Scaling Knowledge Graph Architectures with AIEnterprise Knowledge
30 views15 slides
Kyo - Functional Scala 2023.pdf by
Kyo - Functional Scala 2023.pdfKyo - Functional Scala 2023.pdf
Kyo - Functional Scala 2023.pdfFlavio W. Brasil
368 views92 slides
HTTP headers that make your website go faster - devs.gent November 2023 by

Big iron 2 (published)

  • 1. The Return of Big Iron? Ben Stopford Distinguished Engineer RBS Markets
  • 3. What does this mean? • A change in what customers (we) value • The mainstream is not serving customers (us) sufficiently
  • 4. The Database field has problems
  • 5. We Lose: Joe Hellerstein (Berkeley) 2001 “Databases are commoditised and cornered to slow-moving, evolving, structure intensive, applications that require schema evolution.” … “The internet companies are lost and we will remain in the doldrums of the enterprise space.” … “As databases are black boxes which require a lot of coaxing to get maximum performance”
  • 6. His question was how to win them back?
  • 7. These new technologies also caused frustration
  • 8. Backlash (2009) • Not novel (dates back to the 1980s) • Physical level, not the logical level (messy?) • Incompatible with tooling • Lack of integrity (referential) & ACID • MR is brute force, ignoring indexing and skew
  • 9. All points are reasonable
  • 10. And they proved it too! “A Comparison of Approaches to Large-Scale Data Analysis” – SIGMOD 2009 • Vertica vs. DBMS-X vs. Hadoop • Vertica up to 7x faster than Hadoop over benchmarks • Databases faster than Hadoop
  • 11. But possibly missed the point?
  • 13. NoSQL grew from a need to scale
  • 15. It’s more than just scale, they facilitate different practices
  • 16. A Better Fit They better match the way software is engineered today. – Iterative development – Fast feedback – Frequent releases
  • 17. Is NoSQL a Disruptive Technology? Christensen’s observation: Market leaders are displaced when markets shift in ways that the incumbent leaders are not prepared for.
  • 18. Aside: MongoDB • Impressive trajectory • Slightly crappy product (from a traditional database standpoint) • Most closely related to relational DB (of the NoSQLs) • Plays to the agile mindset
  • 19. Yet the NoSQL market is relatively small • Currently around $600 but projected to grow strongly • Database and systems management market is worth around $34 billion
  • 20. Key Point There is more to NoSQL than just scale, it sits better with the way we build software today
  • 21. We have new building blocks to play with!
  • 22. My Problem • Sprawling application space, built over many years, grouped into both vertical and horizontal silos • Duplication of effort • Data corruption & preventative measures • Consolidation is costly, time consuming and technically challenging.
  • 23. Traditional solutions (in chronological order) – Messaging – SOA – Enterprise Data Warehouse – Data virtualisation
  • 24. Bringing data, applications, people together is hard
  • 25. A popular choice is an EDW
  • 26. EDW pattern is workable, but tough – As soon as you take a ‘view’ on what the shape of the data is, it becomes harder to change. • Leave ‘taking a view’ to the last responsible moment – Multifaceted: Shape, diversity of source, diversity of population, temporal change
  • 27. Harder to do iteratively
  • 28. Is this the only way?
  • 29. The Google Approach MapReduce Google Filesystem BigTable Tenzing Megastore F1 Dremel Spanner
  • 30. And just one code base! So no enterprise schema secret society!
  • 33. Problems with solidifying a schematic representation • Risk of throwing information away, keeping only what you think you need. – OK if you create data – Bad if you got data from elsewhere • Data tends to be poly-structured in programs and on the wire • Early-binding slows down development
  • 34. But schemas are good • They guarantee a contract • That contract spans the whole dataset – Similar to static typing in programming languages.
  • 35. Compromise positions • Query schema can be a subset of data schema. • Use schemaless databases to capture diversity early and evolve it as you build.
  • 36. Common solutions today use multiple technologies • MapReduce • Data Warehouse • ? • Key Value Store • In-Memory/OLTP Database
  • 37. We use a late-bound schema, sitting over a schemaless store • Structured Standardisation Layer • Raw Data • Late Bound Schema
  • 38. Evolutionary Approach • Late-binding makes consolidation incremental – Schematic representation delivered at the ‘last responsible moment’ (schema on demand) – A trade in this model has 4 mandatory nodes. A fully modeled trade has around 800. • The system of record is raw data, not our ‘view’ of it • No schema migration! But this comes at a price.
  • 40. Key based access always scales Client
  • 41. But queries (without the sharding key) always broadcast Client
  • 42. As query complexity increases so does the overhead Client
  • 44. Data Replicas provide hardware isolation Client
  • 45. Scaling • Key based sharding is only sufficient for very simple workloads • Coarse-grained shards help (but suffer from skew) • Replication provides useful, if expensive, hardware isolation • Workload management is less useful in my experience
  • 46. Weak consistency forces the problem onto the developer Particularly bad for banks!
  • 47. Scaling two phase commit is hard to do efficiently • Requires distributed lock/clock/counter • Requires synchronisation of all readers & writers
  • 48. Alternatives to traditional 2PC • MVCC over explicit locking • Timestamp based strong consistency – E.g. Granola • Optimistic concurrency control – Leverage short running transactions (avoid cross-network transactions) – Tolerate different temporal viewpoints to reduce synchronization costs.
  • 49. Immutable Data • Safety • ‘As was’ view • Sits well with MVCC • Efficiency problems • Gaining popularity (e.g. Datomic)
  • 50. Use joins to avoid ‘over aggregating’ Joins are ok, so long as they are – Local – via a unique key (diagram: Trader, Party, Trade)
  • 51. Memory/Disk Tradeoff • Memory only (possibly overplayed) • Pinned indexes (generally good idea if you can afford the RAM) • Disk resident (best general purpose solution and for very large datasets)
  • 52. Balance flexibility and complexity Operational (real time / MR) Object/SQL Standardisation Raw Data Relational Analytics
  • 53. Supple at the front, more rigid at the back Raw Access Operational Access Analytic Access D Looser Tighter L M Untyped Object/SQL Reporting Broad Data Coverage Narrow Data Coverage Narrow Query Comprehensive Query
  • 54. Principles • Record everything • Grow a schema, don’t do it upfront • Avoid using a ‘view’ as your system of record. • Differentiate between sourced data (out of your control) and generated data (in your control). • Use automated replication (for isolation) as well as sharding (for scale) • Leverage asynchronicity to reduce transaction overheads
  • 55. Consolidation means more trust, fewer impedance mismatches and managing tighter couplings
  • 56. Target architectures are starting to look more like large applications of cloud enabled services than heterogeneous application conglomerates
  • 57. Are we going back to the mainframe?
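
Slides 37 and 38 describe a late-bound schema sitting over a schemaless store, where the raw data stays the system of record and the schema grows without migration. The deck does not show an implementation, so the following is only a minimal Python sketch of the idea. The record fields, the `TRADE_VIEW_V1`/`TRADE_VIEW_V2` names and the `read_with_schema` helper are all hypothetical.

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]
# A late-bound schema: output field name -> function extracting a typed value.
Schema = Dict[str, Callable[[Record], Any]]

# Raw, schemaless records act as the system of record (hypothetical fields).
RAW_STORE: List[Record] = [
    {"id": "T1", "trader": "alice", "notional": "1000000", "ccy": "USD"},
    {"id": "T2", "trader": "bob", "notional": "250000", "legacy_flag": "Y"},
]

# First view: only the few fields we are sure about today.
TRADE_VIEW_V1: Schema = {
    "id": lambda r: r["id"],
    "trader": lambda r: r["trader"],
    "notional": lambda r: float(r["notional"]),
}

def read_with_schema(store: List[Record], schema: Schema) -> List[Record]:
    """Apply the schema at read time; the raw records are never rewritten."""
    return [{name: extract(rec) for name, extract in schema.items()}
            for rec in store]

# Growing the schema later is a new view definition, not a data migration.
TRADE_VIEW_V2: Schema = {
    **TRADE_VIEW_V1,
    "ccy": lambda r: r.get("ccy", "UNKNOWN"),
}

v1_rows = read_with_schema(RAW_STORE, TRADE_VIEW_V1)
v2_rows = read_with_schema(RAW_STORE, TRADE_VIEW_V2)
```

Because both views read the same untouched `RAW_STORE`, fields that no view exposes yet (such as `legacy_flag` here) are still available when a later view needs them.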

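Slides 40 to 42 contrast key-based access, which goes to a single shard, with queries that lack the sharding key and must be broadcast to every shard. A rough Python sketch of that difference follows. The `ShardedStore` class, its hash-based routing and the `shards_touched` counter are assumptions made for illustration, not the design used in the talk.

```python
import hashlib
from typing import Any, Callable, Dict, List, Optional

Record = Dict[str, Any]

class ShardedStore:
    """Toy in-process store that partitions records by a hash of the key."""

    def __init__(self, num_shards: int) -> None:
        self.shards: List[Dict[str, Record]] = [{} for _ in range(num_shards)]
        self.shards_touched = 0  # counts shard visits, to make cost visible

    def _shard_for(self, key: str) -> int:
        # Stable hash so the same key always routes to the same shard.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.shards)

    def put(self, key: str, record: Record) -> None:
        self.shards[self._shard_for(key)][key] = record

    def get(self, key: str) -> Optional[Record]:
        # Key-based access: exactly one shard is visited.
        self.shards_touched += 1
        return self.shards[self._shard_for(key)].get(key)

    def query(self, predicate: Callable[[Record], bool]) -> List[Record]:
        # No sharding key available: every shard must be asked (broadcast).
        results: List[Record] = []
        for shard in self.shards:
            self.shards_touched += 1
            results.extend(r for r in shard.values() if predicate(r))
        return results

store = ShardedStore(num_shards=4)
for i in range(10):
    store.put(f"T{i}", {"id": f"T{i}", "trader": "alice" if i % 2 else "bob"})

one = store.get("T3")
touched_by_get = store.shards_touched
alices = store.query(lambda r: r["trader"] == "alice")
touched_by_query = store.shards_touched - touched_by_get
```

With four shards, the keyed `get` visits one shard while the predicate `query` visits all four, and that gap widens as shards are added.
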
Editor's Notes

  1. Think about the systems you built five or ten years ago. Who was involved in the building of a new system in the early 2000s? Who used a relational DB? Who seriously considered using anything else?
  2. Retrospective
  3. (no schema or high level languages)
  4. Companies that grew up around technology.