Database is the new black. Ever the backbone of information architectures, database technology continually evolves to meet growing and changing business needs. New types of data and applications make the database more important than ever, and understanding which technology best serves your use case is paramount to building durable systems. These days, the choices are many, so users should be careful when deciding which direction to go. Register for this Exploratory Webcast to hear veteran database analyst Dr. Robin Bloor explain why the database market has exploded in recent years. He'll outline the current database landscape, and provide insights about which kinds of technologies are suitable for the growing variety of business needs today. He'll also focus on key auxiliary technologies that enable modern databases to perform efficiently.
Moving to a data-centric architecture: Toronto Data Unconference 2015 - Adam Muise
Why use a datalake? Why use lambda? A conversation starter for Toronto Data Unconference 2015. We will discuss technologies such as Hadoop, Kafka, Spark Streaming, and Cassandra.
Integrating Relational Databases with the Semantic Web: A Reflection - Juan Sequeda
This is a lecture given at the 2017 Reasoning Web Summer School
It has been clear from the beginning that the success of the Semantic Web hinges on integrating the vast amount of data stored in Relational Databases. In 2007, the W3C organized a workshop on RDF Access to Relational Databases. In 2012, two standards were ratified that map relational data to RDF: Direct Mapping and R2RML.
In this lecture, I will reflect on the last 10 years of research results and systems to integrate Relational Databases with the Semantic web. I will provide an answer to the following question: how and to what extent can Relational Databases be integrated with the Semantic Web? I will review how these standards and systems are being used in practice for data integration and discuss open challenges.
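The Direct Mapping standard mentioned above turns each relational row into an RDF subject and each column into a predicate. As a rough illustration of that idea (this is a simplified sketch, not the W3C algorithm; the table name, columns, and base IRI are hypothetical):

```python
# Illustrative sketch of the W3C Direct Mapping idea: each row of a
# relational table becomes an RDF subject, each column a predicate.
# Table, columns, and base IRI are made up for this example.

def direct_mapping(base, table, pk, rows):
    """Emit N-Triples-style strings for each row of a table."""
    triples = []
    for row in rows:
        subject = f"<{base}{table}/{pk}={row[pk]}>"
        for col, val in row.items():
            predicate = f"<{base}{table}#{col}>"
            triples.append(f'{subject} {predicate} "{val}" .')
    return triples

people = [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]
for t in direct_mapping("http://example.com/", "person", "id", people):
    print(t)
```

R2RML generalizes this by letting the mapping author choose the IRIs and vocabulary rather than deriving them mechanically from table and column names.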
Agile Data Engineering: Introduction to Data Vault 2.0 (2018) - Kent Graziano
(updated slides used for North Texas DAMA meetup Oct 2018) As we move more and more towards the need for everyone to do Agile Data Warehousing, we need a data modeling method that can be agile with us. Data Vault Data Modeling is an agile data modeling technique for designing highly flexible, scalable, and adaptable data structures for enterprise data warehouse repositories. It is a hybrid approach using the best of 3NF and dimensional modeling. It is not a replacement for star schema data marts (and should not be used as such). This approach has been used in projects around the world (Europe, Australia, USA) for over 15 years and is now growing in popularity. The purpose of this presentation is to provide attendees with an introduction to the components of the Data Vault Data Model, what they are for and how to build them. The examples will give attendees the basics:
• What the basic components of a DV model are
• How to build and design structures incrementally, without constant refactoring
Presentation at Data/Graph Day Texas Conference.
Austin, Texas
January 14, 2017
This talk grew out of Juan Sequeda's office hours following the Seattle Graph Meetup. Some of the questions posed were: How do I recognize a problem best solved with a graph solution? How do I determine the best type of graph to solve the problem? How do I manage the data where both graph and relational operations will be performed? Juan did such a great job of explaining the options that we asked him to develop his responses into a formal talk.
The core idea behind Hadoop is to distribute both the data and user software on individual shards within the cluster. The Bigdata Replay method is drastically different in that it packs user software into batches on a single multicore machine and uses circuit emulation to maximize throughput when bringing data shards for replay. The effect from hotspots, defined as drastically higher access frequency to a small portion of (popular) data, is different in the two platforms. This paper models the difference numerically but in a relative form, which makes it possible to compare the two platforms.
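To make the hotspot effect concrete, here is a minimal simulation (a hypothetical sketch, not the paper's actual model) in which access frequency follows a Zipf-like popularity distribution, so a small number of shards absorb most of the traffic:

```python
import random

def zipf_weights(n, s=1.0):
    """Zipf-like popularity: item k gets weight 1 / k^s."""
    return [1.0 / (k ** s) for k in range(1, n + 1)]

def simulate_accesses(n_shards, n_requests, s=1.0, seed=42):
    """Count how many requests land on each shard under skewed access."""
    random.seed(seed)
    weights = zipf_weights(n_shards, s)
    counts = [0] * n_shards
    for shard in random.choices(range(n_shards), weights=weights, k=n_requests):
        counts[shard] += 1
    return counts

counts = simulate_accesses(n_shards=10, n_requests=10_000)
hot_share = counts[0] / sum(counts)  # fraction of traffic on the hottest shard
print(f"hottest shard serves {hot_share:.0%} of requests")
```

Under this skew, how a platform places popular shards (spread across a cluster versus streamed through one machine) determines where the bottleneck appears.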
(OTW13) Agile Data Warehousing: Introduction to Data Vault Modeling - Kent Graziano
This is the presentation I gave at OakTable World 2013 in San Francisco. #OTW13 was held at the Children's Creativity Museum next to the Moscone Convention Center and was in parallel with Oracle OpenWorld 2013.
The session discussed our attempts to be more agile in designing enterprise data warehouses and how the Data Vault Data Modeling technique helps in that approach.
Big Data 2.0: YARN Enablement for Distributed ETL & SQL with Hadoop - Caserta
In our most recent Big Data Warehousing Meetup, we learned about transitioning from Big Data 1.0, with Hadoop 1.x and its nascent technologies, to the advent of Hadoop 2.x with YARN, which enables distributed ETL, SQL, and analytics solutions. Caserta Concepts Chief Architect Elliott Cordo and an Actian engineer covered the complete data value chain of an enterprise-ready platform, including data connectivity, collection, preparation, optimization, and analytics with end-user access.
For more information on our services or upcoming events, please visit our website at http://www.casertaconcepts.com/.
Relational databases power most applications, but new use-cases have requirements that they are not well suited for.
That's why new approaches like graph databases are used to handle join-heavy, highly-connected and realtime aspects of your applications.
This talk compares relational and graph databases, showing similarities and important differences.
We do a hands-on, deep-dive into ease of data modeling and structural evolution, massive data import and high performance querying with Neo4j, the most popular graph database.
I demonstrate a useful tool that makes data import from existing relational databases with a normalized ER model a "one-click" experience.
The biggest remaining challenge for people coming from a relational background is adapting some of their existing database experience to new ways of thinking.
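The "join-heavy" point above is easiest to see with a concrete query. In SQL, every traversal hop costs another self-join on the edge table, whereas a graph database treats traversal as a native operation. A small sketch using SQLite and a hypothetical friendship schema:

```python
import sqlite3

# A tiny, hypothetical social graph stored relationally. Each
# traversal "hop" in SQL requires another self-join on the edge table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE friend (a INTEGER, b INTEGER);
    INSERT INTO person VALUES (1,'Ann'),(2,'Ben'),(3,'Cal'),(4,'Dee');
    INSERT INTO friend VALUES (1,2),(2,3),(3,4);
""")

# Friends-of-friends of Ann: two hops means two joins on the edge table.
rows = conn.execute("""
    SELECT DISTINCT p.name
    FROM friend f1
    JOIN friend f2 ON f2.a = f1.b
    JOIN person p  ON p.id = f2.b
    WHERE f1.a = 1
""").fetchall()
print(rows)  # [('Cal',)]
```

A three-hop query would need a third join, and variable-length paths need recursive SQL; in a graph query language the same traversal is a single path pattern.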
Robin Bloor and Mark Madsen offer their theories on where the rapidly-changing database market stands today: What’s new? What’s standard? What is the trajectory of this evolving market? Each Analyst will present for 10-15 minutes, then will engage in a dialogue with the moderator and attendees.
The webcast audio and video archive can be found at https://bloorgroup.webex.com/bloorgroup/lsr.php?AT=pb&SP=EC&rID=4695777&rKey=4b284990a1db4ec0
In this talk I discussed some ideas for Big Data distribution using CDNs (Content Delivery Networks). These ideas cover not only static content but focus primarily on content pre-computation. I also discussed some basic technical tricks of global content distribution.
Virtualizing Relational Databases as Graphs: a multi-model approach - Juan Sequeda
Talk given at Smart Data 2017
Relational Databases are inflexible due to the rigid constraints of the relational data model. If you have new data that doesn't fit your schema, you will need to alter your schema (add a column or a new table). That is not always possible: IT departments don't have time, or they won't allow it, and new columns often just mean more nulls, which can lead to query performance degradation.
A goal of graph databases is to address this problem with their schema-less graph data model. However, many businesses have large investments in commercial RDBMSs and their associated applications and can't expect to move all of their data to a graph database.
In this talk, I will present a multi-model graph/relational architecture solution. Keep your relational data where it is, virtualize it as a graph, and then connect it with additional data stored in a graph database. This way, both graph and relational technologies can seamlessly interact together.
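The virtualization idea can be sketched in a few lines: the data stays in the relational store, and a graph-shaped view is built on top of it for traversal. This is a minimal, hypothetical illustration (the table, data, and helper names are made up), not the architecture presented in the talk:

```python
import sqlite3

# Minimal sketch of "virtualizing" a relational edge table as a graph:
# the data stays in the RDBMS; we only build a graph-shaped view on top.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE works_for (employee TEXT, manager TEXT);
    INSERT INTO works_for VALUES ('ann','ben'),('ben','cal'),('dee','cal');
""")

def graph_view(conn):
    """Adjacency-list view over the relational table (employee -> manager)."""
    g = {}
    for emp, mgr in conn.execute("SELECT employee, manager FROM works_for"):
        g.setdefault(emp, []).append(mgr)
    return g

def reachable(g, start):
    """Graph traversal (full management chain) over the virtualized view."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt in g.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

g = graph_view(conn)
print(reachable(g, "ann"))  # ann's whole management chain
```

Real virtualization layers push the traversal down as SQL rather than materializing the view, but the principle is the same: graph operations without moving the data out of the RDBMS.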
Learn about data lifecycle best practices in the AWS Cloud, so you can optimize performance and lower the costs of data ingestion, staging, storage, cleansing, analytics and visualization, and archiving.
Graph Query Languages: update from LDBC - Juan Sequeda
The Linked Data Benchmark Council (LDBC) is a non-profit organization dedicated to establishing benchmarks, benchmark practices and benchmark results for graph data management software. The Graph Query Language task force of LDBC is studying query languages for graph data management systems, and specifically those systems storing so-called Property Graph data. The goals of the GraphQL task force are to:
Devise a list of desired features and functionalities of a graph query language.
Evaluate a number of existing languages (i.e. Cypher, Gremlin, PGQL, SPARQL, SQL), and identify possible issues.
Provide a better understanding of the design space and state-of-the-art.
Develop proposals for changes to existing query languages or even a new graph query language.
This query language should cover the needs of the most important use-cases for such systems, such as social network and Business Intelligence workloads.
This talk will present an update on the work accomplished by the LDBC GraphQL task force. We are also looking for input from the graph community.
Integrating Semantic Web with the Real World - A Journey between Two Cities ... - Juan Sequeda
(The original version of this talk was a Keynote at KCAP2017. This is the final version of the slides after giving this talk 14 times in 2018)
An early vision in Computer Science has been to create intelligent systems capable of reasoning on large amounts of data. Today, this vision can be delivered by integrating Relational Databases with the Semantic Web using the W3C standards: a graph data model (RDF), ontology language (OWL), mapping language (R2RML) and query language (SPARQL). The research community has successfully been showing how intelligent systems can be created with Semantic Web technologies, dubbed now as Knowledge Graphs.
However, where is the mainstream industry adoption? What are the barriers to adoption? Are these engineering and social barriers or are they open scientific problems that need to be addressed?
This talk will chronicle our journey of deploying Semantic Web technologies with real world users to address Business Intelligence and Data Integration needs, describe technical and social obstacles that are present in large organizations, and scientific and engineering challenges that require attention.
Creating a Data Science Team from an Architect's perspective. This is about team building: how to support a data science team with the right staff, including data engineers and DevOps.
Integrating Semantic Web in the Real World: A Journey between Two Cities - Juan Sequeda
Keynote at The 9th International Conference on Knowledge Capture (KCAP2017), Austin, Texas, Dec 2017
An early vision in Computer Science has been to create intelligent systems capable of reasoning on large amounts of data. Today, this vision can be delivered by integrating Relational Databases with the Semantic Web using the W3C standards: a graph data model (RDF), ontology language (OWL), mapping language (R2RML) and query language (SPARQL). The research community has successfully been showing how intelligent systems can be created with Semantic Web technologies, dubbed now as Knowledge Graphs.
However, where is the mainstream industry adoption? What are the barriers to adoption? Are these engineering and social barriers or are they open scientific problems that need to be addressed?
This talk will chronicle our journey of deploying Semantic Web technologies with real world users to address Business Intelligence and Data Integration needs, describe technical and social obstacles that are present in large organizations, and scientific challenges that require attention.
Everyone is awash in the new buzzword, Big Data, and it seems as if you can’t escape it wherever you go. But there are real companies with real use cases creating real value for their businesses by using big data. This talk will discuss some of the more compelling current or recent projects, their architecture & systems used, and successful outcomes.
Recently, there's been discussion, even some confusion, around the relationship between Hadoop and Spark. Although they're both big data frameworks with many similarities, they are not one and the same - and are in fact complementary in an enterprise environment.
View the webinar replay here: http://info.zaloni.com/spark-hadoops-friend-or-foe
Incorporating the Data Lake into Your Analytic Architecture - Caserta
Joe Caserta, President at Caserta Concepts presented at the 3rd Annual Enterprise DATAVERSITY conference. The emphasis of this year's agenda is on the key strategies and architecture necessary to create a successful, modern data analytics organization.
Joe Caserta presented Incorporating the Data Lake into Your Analytics Architecture.
For more information on the services offered by Caserta Concepts, visit our website at http://casertaconcepts.com/.
Webinar: Enterprise Data Management in the Era of MongoDB and Data Lakes - MongoDB
With so much talk of how Big Data is revolutionizing the world and how a data lake with Hadoop and/or Spark will solve all your data problems, it is hard to tell what is hype, reality, or somewhere in-between.
In working with dozens of enterprises in varying stages of their enterprise data management (EDM) strategy, MongoDB enterprise architect, Matt Kalan, sees the same challenges and misunderstandings arise again and again.
In this session, he will explain common challenges in data management, what capabilities are necessary, and what the future state of architecture looks like. MongoDB is uniquely capable of filling common gaps in the data lake strategy.
This session also includes a live Q&A portion during which you are encouraged to ask questions of our team.
The Central Hub: Defining the Data Lake - Eric Kavanagh
Exploratory Webcast with Dr. Robin Bloor and Dez Blanchfield
It has many aliases – pond, reservoir, swamp – but the concept of the Data Lake has gained a strong foothold in today’s data ecosystem. Its early days saw it used primarily as a landing zone for raw data, but a range of new application areas are emerging, from self-service analytics and BI to a wholly governed and secure data store. As the Data Lake matures, the key is to tie its broad functionality to business value.
Register for this Exploratory Webcast to hear Dr. Robin Bloor offer his perspective on why the information landscape is changing and what the various roles of the Data Lake are thus far. He’ll be joined by Data Scientist Dez Blanchfield, who will discuss his hypothesis of the future of data management and suggest ideas for surviving the Data Lake hype.
Horses for Courses: Database Roundtable - Eric Kavanagh
The blessing and curse of today's database market? So many choices! While relational databases still dominate the day-to-day business, a host of alternatives has evolved around very specific use cases: graph, document, NoSQL, hybrid (HTAP), column store, the list goes on. And the database tools market is teeming with activity as well. Register for this special Research Webcast to hear Dr. Robin Bloor share his early findings about the evolving database market. He'll be joined by Steve Sarsfield of HPE Vertica, and Robert Reeves of Datical in a roundtable discussion with Bloor Group CEO Eric Kavanagh. Send any questions to info@insideanalysis.com, or tweet with #DBSurvival.
Self Service Analytics and a Modern Data Architecture with Data Virtualizatio... - Denodo
Watch full webinar here: https://bit.ly/32TT2Uu
Data virtualization is not just for self-service; it’s also a first-class citizen when it comes to modern data platform architectures. Technology has forced many businesses to rethink their delivery models. Startups such as Amazon and Lyft emerged, leveraging the internet and mobile technology to better meet customer needs, disrupting entire categories of business and growing to dominate them.
Schedule a complimentary Data Virtualization Discovery Session with g2o.
Traditional companies are still struggling to meet rising customer expectations. During this webinar with the experts from g2o and Denodo we covered the following:
- How modern data platforms enable businesses to address these new customer expectations
- How you can drive value from your investment in a data platform now
- How you can use data virtualization to enable multi-cloud strategies
Leveraging the strategy insights of g2o and the power of the Denodo platform, companies do not need to undergo the costly removal and replacement of legacy systems to modernize their systems. g2o and Denodo can provide a strategy to create a modern data architecture within a company’s existing infrastructure.
The Practice of Big Data - The Hadoop ecosystem explained with usage scenarios - kcmallu
What's the origin of Big Data? What are the real life usage scenarios where Hadoop has been successfully adopted? How do you get started within your organizations?
At the Data-centric Architecture Forum 2020, Thomas Cook, our Sales Director of AnzoGraph DB, gave his presentation "Knowledge Graph for Machine Learning and Data Science". These are his slides.
Transform your DBMS to drive engagement innovation with Big Data - Ashnikbiz
Erik Baardse and Ajit Gadge from EDB Postgres presented on how to transform your DBMS to drive digital business: how Postgres enables you to support a wider range of workloads with your relational database, opening the door to Big Data. They also cover EnterpriseDB’s strategy around Big Data, which focuses on three areas, and finally, last but not least, how to find money in IT with Big Data and digital transformation.
INTRODUCTION TO BIG DATA AND HADOOP
Introduction to Big Data, Types of Digital Data, Challenges of conventional systems - Web data, Evolution of analytic processes and tools, Analysis vs. Reporting - Big Data Analytics, Introduction to Hadoop - Distributed Computing Challenges - History of Hadoop, Hadoop Ecosystem - Use cases of Hadoop - Hadoop Distributors - HDFS - Processing Data with Hadoop - MapReduce.
Hadoop was born out of the need to process Big Data. Today, data is being generated like never before, and it is becoming difficult to store and process this enormous volume and large variety of data; Big Data technology exists to cope with exactly that. The Hadoop software stack is now the go-to framework for large-scale, data-intensive storage and compute in Big Data analytics applications. The beauty of Hadoop is that it is designed to process large volumes of data on clusters of commodity computers working in parallel. Distributing a data set that is too large for any single machine across the nodes of a cluster solves the problem of processing it.
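The split-and-parallelize idea above can be sketched in miniature with the canonical MapReduce example, word count. This is a local, single-process simulation, with the shuffle phase modeled by `sorted` plus `groupby`, not an actual Hadoop deployment; the function names are illustrative.

```python
import itertools
import operator

def mapper(lines):
    # Map phase: emit a (word, 1) pair for every word in the input split.
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def reducer(pairs):
    # Shuffle/sort groups pairs by key; the reducer then sums counts per word.
    for word, group in itertools.groupby(sorted(pairs), key=operator.itemgetter(0)):
        yield (word, sum(count for _, count in group))

if __name__ == "__main__":
    split = ["Hadoop stores data", "Hadoop processes data in parallel"]
    print(dict(reducer(mapper(split))))
```

On a real cluster, each mapper runs against its own block of the distributed file, and the framework routes all pairs with the same key to one reducer; the logic per node is no more complicated than this.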
The Value of the Modern Data Architecture with Apache Hadoop and Teradata Hortonworks
This webinar discusses why Apache Hadoop is most typically the technology underpinning "Big Data," how it fits in a modern data architecture, and the current landscape of databases and data warehouses already in use.
This is part 3 of the series on Data Mesh, looking at the intersection of microservices architecture concepts, data integration/replication technologies, and log-based stream integration techniques. This webinar was mostly a demonstration, but several slides used to set up the demo are included here as a PDF for viewers.
The Future of Data Warehousing and Data IntegrationEric Kavanagh
The rise of big data, data lakes and the cloud, coupled with increasingly stringent enterprise requirements, is reinventing the role of data warehousing in modern analytics ecosystems. The emerging generation of data warehouses is more flexible, agile and cloud-based than its predecessors, with a strong need for automation and real-time data integration.
Join this live webinar to learn:
-Typical requirements for data integration
-Common use cases and architectural patterns
-Guidelines and best practices to address data requirements
-Guidelines and best practices to apply architectural patterns
Best Practices in DataOps: How to Create Agile, Automated Data PipelinesEric Kavanagh
Synthesis Webcast with Eric Kavanagh and Tamr
DataOps is an emerging set of practices, processes, and technologies for building and automating data pipelines to meet business needs quickly. As these pipelines become more complex and development teams grow in size, organizations need better collaboration and development processes to govern the flow of data and code from one step of the data lifecycle to the next – from data ingestion and transformation to analysis and reporting.
DataOps is not something that can be implemented all at once or in a short period of time. DataOps is a journey that requires a cultural shift. DataOps teams continuously search for new ways to cut waste, streamline steps, automate processes, increase output, and get it right the first time. The goal is to increase agility and shorten cycle times while reducing data defects, giving developers and business users greater confidence in data analytic output.
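One practice named above, automation that helps teams "get it right the first time", can be sketched as a quality gate that fails a pipeline run before bad records flow downstream to analysis and reporting. The field names and rules here are invented for illustration.

```python
def check_quality(rows):
    """A minimal automated data test: collect problems instead of passing them on."""
    errors = []
    for i, row in enumerate(rows):
        if row.get("amount") is None or row["amount"] < 0:
            errors.append(f"row {i}: bad amount {row.get('amount')}")
        if not row.get("customer_id"):
            errors.append(f"row {i}: missing customer_id")
    return errors

def run_pipeline(rows):
    errors = check_quality(rows)
    if errors:
        # Stop the flow early: defects caught here never reach a dashboard.
        raise ValueError("; ".join(errors))
    # Transformation step (trivial here): keep only the governed fields.
    return [{"customer_id": r["customer_id"], "amount": r["amount"]} for r in rows]
```

In practice the same idea runs inside an orchestrator on every scheduled execution, which is how test automation shortens cycle times rather than adding a manual review step.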
This webcast examines how organizations adopt DataOps practices in the field. It will review results of an Eckerson Group survey that sheds light on the rate and scope of DataOps adoption. It will also describe case studies of organizations that have successfully implemented DataOps practices, the challenges they have encountered and benefits they’ve received.
Tune into our webcast to learn:
- User perceptions of DataOps
- The rate of DataOps adoption by industry and other demographic variables
- DataOps adoption by technique and component (i.e., agile, test automation, orchestration, continuous development/continuous integration)
- Key challenges organizations face with DataOps
- Key benefits organizations experience with DataOps
- Best practices in doing DataOps
- Case studies and anecdotes of DataOps at companies
Expediting the Path to Discovery with Multi-Source AnalysisEric Kavanagh
The Briefing Room with Eric Kavanagh and Zoomdata
In the realm of complex analysis, rarely does one source of data provide everything the analyst needs. Data Warehouses were designed to pull data from multiple sources, to enable that kind of cross-system discovery. But that traditional model typically required stripping the data of significant context, essentially watering down the end result, and at times obfuscating the most meaningful facets.
Thanks to several advances in real-time data exploration, companies can now access raw data where it lives, and begin the analysis process often within seconds of connecting to a source. And new innovations allow for multi-source analytics, where disparate systems can be accessed simultaneously, allowing real-time discovery across multiple sources, creating a kind of analytical depth perception. Register for this special episode of The Briefing Room to hear Bloor Group CEO Eric Kavanagh, and Zoomdata speakers explain this remarkable new capability.
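As a toy sketch of the multi-source idea (both sources and their fields are invented): rows are taken from each system as-is and correlated on a shared key, preserving each source's full context rather than stripping it out in a warehouse load.

```python
# Two hypothetical live sources, accessed in place with no ETL staging step.
crm = [
    {"customer_id": 1, "name": "Acme Corp", "segment": "enterprise"},
    {"customer_id": 2, "name": "Globex", "segment": "mid-market"},
]
weblog = [
    {"customer_id": 1, "page_views": 120},
    {"customer_id": 2, "page_views": 45},
]

def multi_source_join(left, right, key):
    """Correlate rows from two sources on a shared key, keeping full context."""
    index = {row[key]: row for row in right}
    return [{**row, **index[row[key]]} for row in left if row[key] in index]

combined = multi_source_join(crm, weblog, "customer_id")
```

A virtualization or multi-source analytics engine does the same correlation at query time, at scale, with pushdown to each source, which is what makes the "within seconds of connecting" experience possible.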
Metadata Mastery: A Big Step for BI ModernizationEric Kavanagh
Modernizing data management is on everyone’s mind today. Making the shift from data management practices of the BI era to modern data management is essential but it is also challenging. Whether you’re updating the back end by migrating your data warehouses to the cloud or advancing the front end with a shift from legacy BI tools to self-service analysis and visualization, it is critical to know the data that you have and to understand data lineage. Data inventory, data glossary, and data lineage are all metadata dependent. But legacy BI metadata is typically proprietary, non-integrated, and collected inconsistently by a variety of disparate tools. The metadata muddle is a serious inhibitor to modernization efforts. Metadata consolidation and centralization are the keys to overcoming this barrier. What if all this were automated?
Join us to learn:
- How a smart and innovative new technology resolves metadata disparity
- How metadata management automation accelerates modernization efforts
- How metadata management automation reduces errors and improves quality of results from data management modernization projects
- How metadata management automation and data cataloging work together to help you move rapidly to the next generation of BI and analytics
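The consolidation idea behind these points can be sketched as follows: hypothetical metadata exports from two tools, in different shapes, are normalized into one central catalog keyed by dataset name. All names and fields below are invented for illustration.

```python
# Hypothetical metadata exports from two legacy BI tools, in different shapes.
tool_a = [{"dataset": "sales", "owner": "finance", "source": "warehouse.sales"}]
tool_b = [{"name": "sales", "refreshed": "2024-05-01"},
          {"name": "churn", "refreshed": "2024-04-28"}]

def consolidate(a_records, b_records):
    """Merge per-tool metadata into a single catalog keyed by dataset name."""
    catalog = {}
    for rec in a_records:
        catalog.setdefault(rec["dataset"], {}).update(
            owner=rec["owner"], source=rec["source"])
    for rec in b_records:
        catalog.setdefault(rec["name"], {}).update(refreshed=rec["refreshed"])
    return catalog

catalog = consolidate(tool_a, tool_b)
```

The hard part that automation addresses is doing this continuously, across many proprietary formats, and inferring lineage; but the end state is the same: one consistent record per dataset instead of fragments scattered across tools.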
Better to Ask Permission? Best Practices for Privacy and SecurityEric Kavanagh
Hot Technologies with The Bloor Group and IDERA
If security was once a nice-to-have, those days have long gone. Between data breaches and privacy regulations, organizations today face immense pressure to protect their systems and their sensitive data. When giants like Yahoo! and Target can get hacked, so can any other company. What can you do about it? How can you protect your company and clients?
Register for this episode of Hot Technologies to hear Analysts Eric Kavanagh and Dr. Robin Bloor provide insights about the many ways that companies can buttress their defenses and stay ahead of the bad guys. They'll be briefed by Vicky Harp of IDERA who will demonstrate how to identify vulnerabilities, track sensitive data, successfully pass audits, and protect your SQL Server databases.
The Model Enterprise: A Blueprint for Enterprise Data GovernanceEric Kavanagh
What gets measured, gets managed; but what gets governed, generates real value. That's one major reason why data governance has risen to a top priority for most organizations. Another reason is the rapid onboarding of big data, which often comes from beyond the traditional firewall. And then there are the authorities: issues like privacy, security and fiduciary responsibility are combining to make data governance a must-have. Register for this episode of The Briefing Room to hear veteran Analyst Dr. Robin Bloor explain why governance should be viewed as a positive change agent for the modern enterprise. He'll be briefed by Ron Huizenga of IDERA, who will discuss a practical, model-based approach to enterprise data governance, with a focus on Master Data Management.
Best Laid Plans: Saving Time, Money and Trouble with Optimal ForecastingEric Kavanagh
Expectations have changed. That's true for users, executives and customers alike. There's no time for systems running slowly, or cost overruns. That's why fundamentals like capacity planning have become mission-critical. By paying attention to the details, and doing effective forecasts, companies can optimize their information architecture, keeping everyone happy. Register for this episode of Hot Technologies to learn from veteran Analysts Dr. Robin Bloor and Rick Sherman who will offer insights about how and why to do capacity planning. They'll be briefed by Bullett Manale of IDERA, who will explain how his company's SQL Diagnostic Manager can track a wide range of usages metrics which can be used for accurate forecasting.
A Winning Strategy for the Digital EconomyEric Kavanagh
The speed of innovation today creates tremendous opportunities for some, existential threats for others. Companies that win create their own success by leveraging modern data platforms. While architectures vary, the foundation is often in-memory, and the latency is real-time. Register for this Special Edition of The Briefing Room to hear veteran Analyst Dr. Robin Bloor explain how today's data platforms enable the modern enterprise in groundbreaking ways. He'll be briefed by Chris Hallenbeck of SAP who will demonstrate how forward-looking companies are leveraging real-time data platforms to achieve operational excellence, make decisions faster, and find new ways to innovate.
Discovering Big Data in the Fog: Why Catalogs MatterEric Kavanagh
The Briefing Room with Dr. Robin Bloor and Waterline Data
Good enterprise data can drive positive business outcomes. But if that data isn’t organized and accessible, information workers are left with an incomplete picture. Knowing the location, lineage and permissions of data across the enterprise can lead to more accurate and insightful searches, and ultimately, knowledge discovery.
Register for this episode of The Briefing Room to learn from veteran Analyst Dr. Robin Bloor as he discusses how the success of big data projects relies on understanding your data. He’ll be briefed by Todd Goldman and Mohan Sadashiva of Waterline Data, who will explain how their solution can facilitate discovery via automation and crowd sourcing. They’ll demonstrate how combining the value of tribal knowledge with rationalized data can enable self-service analytics, improve data governance, and reduce data redundancy.
Health Check: Maintaining Enterprise BIEric Kavanagh
Hot Technologies with The Bloor Group and IDERA
Most companies realize the value of business intelligence. Advanced analytics, data mining, dashboards – all surface useful insights. With so many moving parts in play, it’s crucial to provide visibility across the entire BI environment, thus delivering solid system and service performance.
Register for this episode of Hot Technologies to learn from Analyst Dr. Robin Bloor and Eric Kavanagh as they discuss why operational and strategic business intelligence are the cornerstones of any organization. They’ll be briefed by Stan Geiger of IDERA, who will showcase his company’s SQL BI Manager, an end-to-end solution designed to provide a single view into numerous running processes. He will explain that by optimizing system health and availability, users can eliminate downtime and improve efficiency.
Rapid Response: Debugging and Profiling to the RescueEric Kavanagh
Bad code happens. And when it does, developers often spend far too much time trying to find and fix the error. Debugging is a common solution, but in a complex environment, running multiple applications on multiple platforms, it can be easier said than done. Developers need instant visibility across all machines, ultimately leading to faster and higher quality insights. Register for this episode of Hot Technologies to learn from Analyst Dr. Robin Bloor and Data Scientist Dez Blanchfield as they discuss how errant code can inevitably disrupt systems and performance. They’ll be briefed by Bert Scalzo of IDERA, who will explain how his company’s Rapid SQL can facilitate the debugging and profiling of stored procedures and functions.
Solving the Really Big Tech Problems with IoTEric Kavanagh
The Briefing Room with Dr. Robin Bloor and HPE Security
The Internet of Things brings new technological problems: sensor communications are bi-directional, the scale of data generation points has no precedent and, in this new world, security, privacy and data protection need to go out to the edge. Likely, most of that data lands in Hadoop and Big Data platforms. With the need for rapid analytics never greater, companies try to seize opportunities in tighter time windows. Yet, cyber-threats are at an all-time high, targeting the most valuable of assets—the data.
Register for this episode of The Briefing Room to hear Analyst Dr. Robin Bloor explain the implications of today's divergent data forces. He’ll be briefed by Reiner Kappenberger of HPE, who will discuss how a recent innovation -- NiFi -- is revolutionizing the big data ecosystem. He’ll explain how this technology dramatically simplifies data flow design, enabling a new era of business-driven analysis, while also protecting sensitive data.
Beyond the Platform: Enabling Fluid AnalysisEric Kavanagh
When the analysts aren’t happy, no one is happy. That’s because these days, practically every aspect of the business is driven by insights. And because information architectures are increasingly complex, any number of issues can cause a slowdown in queries, or even basic reporting. How can your organization ensure that all systems are go?
Register for this episode of The Briefing Room to learn from veteran Analyst Dr. Robin Bloor as he explains the common roadblocks to successful BI and analytics. He'll be briefed by Stan Geiger of IDERA, who previously demonstrated how his company’s SQL BI Manager can optimize platform health and performance. In this episode, he will dive deeper into how IDERA’s solution resolves resource constraints, user activity and capacity issues, making tiresome troubleshooting a thing of the past.
Protect Your Database: High Availability for High Demand DataEric Kavanagh
Hot Technologies with Dr. Robin Bloor, Dez Blanchfield and IDERA
Your company’s data is mission-critical. While protecting it from outside attack or catastrophe has become a standard business requirement, it’s not enough these days to rely solely on simple backup and recovery techniques. Today’s enterprise requires high availability and uninterrupted operational performance, meaning the DBA toolbox must provide more than traditional solutions.
Register for this episode of Hot Technologies to hear from Analyst Dr. Robin Bloor and Data Scientist Dez Blanchfield as they discuss the necessary components of a modern solution architecture. They’ll be briefed by IDERA’s Oracle ACE Bert Scalzo, who will explain some innovative options for ensuring high availability in a demanding database environment.
A Better Understanding: Solving Business Challenges with DataEric Kavanagh
Good decisions make great companies. That's why the data-driven mantra keeps gaining momentum. Increasingly, smart business people are taking a data-first approach for both strategic planning and tactical decision-making. They spend ample time exploring their data to better understand their options. In doing so, they capitalize on real opportunities, while avoiding low-value projects.
The Briefing Room with Dr. Robin Bloor and Experian
Register for this episode of The Briefing Room to hear veteran Analyst Dr. Robin Bloor explain why a data-first mindset can help companies optimize their resources and thus make better decisions. He'll be briefed by Rishi Patel and Erin Haselkorn of Experian, who will showcase Experian Pandora, which enables the kind of discovery that businesses need to better understand their data. They'll explain how Pandora can help professionals build a business case for their ideas and plans.
The Key to Effective Analytics: Fast-Returning QueriesEric Kavanagh
The best business analysts understand the value of having a "conversation" with their data. The idea is that they can pose queries, examine results, then quickly modify their questions to home in on a desired answer. This kind of iterative process creates a fluid environment that is highly conducive for identifying meaningful patterns in data. Register for this episode of Hot Technologies to hear Bloor Group Chief Analyst Dr. Robin Bloor and Data Scientist Dez Blanchfield as they outline why fluid analytics should be the norm and which hurdles still stand in the way. They'll be briefed by Bullett Manale of IDERA who will demonstrate his company's diagnostic platform for analytics. He'll provide context, and also deliver a demo that shows real-world solutions that enable iterative analytics.
A Tight Ship: How Containers and SDS Optimize the EnterpriseEric Kavanagh
The Briefing Room with Dez Blanchfield and Red Hat
Think of containers as the drones of modern computing. They're small, agile, and can carry a significant payload. In many ways, they represent the fruition of the last two major paradigm shifts in enterprise software: SOA and virtualization. However, for companies to fully leverage this innovative approach, a persistent storage platform is needed that is as flexible and scalable as containers themselves.
Register for this episode of The Briefing Room to hear Bloor Group Data Scientist Dez Blanchfield, who will explain the significance of container technology, and the relevance of software-defined storage (SDS) in a constantly evolving IT world. He'll be briefed by Steve Watt and Sayan Saha of Red Hat, who will demonstrate how open-source technology can help organizations take advantage of this brave new world of enterprise computing. They will explain how containers are the next step in the evolution of the operating system, and why SDS is now the optimal solution.
Application Acceleration: Faster Performance for End Users Eric Kavanagh
Hot Technologies with Dr. Robin Bloor, Dez Blanchfield and IDERA
Application performance issues impact end users the hardest, and too often, IT doesn’t know about it until after the fact. With many applications served by a variety of disparate technologies, troubleshooting bottlenecks can be onerous and time consuming, ultimately causing frustration and missed SLAs. How can IT quickly discover what process affected SQL execution time and keep end users focused on the bottom line?
Register for this episode of Hot Technologies to learn from Analyst Dr. Robin Bloor and Data Scientist Dez Blanchfield as they discuss the complexities of the data pipeline. They’ll be briefed by Bill Ellis of IDERA, who will explain the importance of identifying and resolving the root cause of performance problems. He’ll show how IDERA’s Precise Application Performance Platform can isolate transactions and usage patterns, thus giving IT the necessary tools to provide a consistent end user experience.
Time's Up! Getting Value from Big Data NowEric Kavanagh
The Briefing Room with Dr. Robin Bloor and CASK
We all know the promise of big data, but who gets the value? There are plenty of success stories already, and most of them involve one key ingredient: facilitated access to important data sets. Most research studies suggest that the Pareto principle applies: 80 percent of the effort goes to data integration, and only 20 percent to analysis. Inverting that balance is the Holy Grail.
Register for this episode of The Briefing Room to hear veteran Analyst Dr. Robin Bloor explain why the time has finally come for turning the tables on the status quo in analytics. He'll be briefed by CASK CEO Jonathan Gray, who will showcase his company's big data integration platform, CDAP, which was specifically designed to expedite time-to-value for big data.
First Steps with Globus Compute Multi-User EndpointsGlobus
In this presentation we will share our experiences around getting started with the Globus Compute multi-user endpoint. Working with the Pharmacology group at the University of Auckland, we have previously written an application using Globus Compute that can offload computationally expensive steps in the researcher's workflows, which they wish to manage from their familiar Windows environments, onto the NeSI (New Zealand eScience Infrastructure) cluster. Some of the challenges we have encountered were that each researcher had to set up and manage their own single-user globus compute endpoint and that the workloads had varying resource requirements (CPUs, memory and wall time) between different runs. We hope that the multi-user endpoint will help to address these challenges and share an update on our progress here.
Experience our free, in-depth three-part Tendenci Platform Corporate Membership Management workshop series! In Session 1 on May 14th, 2024, we began with an Introduction and Setup, mastering the configuration of your Corporate Membership Module settings to establish membership types, applications, and more. Then, on May 16th, 2024, in Session 2, we focused on binding individual members to a Corporate Membership and Corporate Reps, teaching you how to add individual members and assign Corporate Representatives to manage dues, renewals, and associated members. Finally, on May 28th, 2024, in Session 3, we covered questions and concerns, addressing any queries or issues you may have.
For more Tendenci AMS events, check out www.tendenci.com/events
An Enterprise Resource Planning system includes various modules that reduce any business's workload. Additionally, it organizes workflows, which enhances productivity. Here is a detailed explanation of the ERP modules. Going through the points will help you understand how the software is changing work dynamics.
To know more details here: https://blogs.nyggs.com/nyggs/enterprise-resource-planning-erp-system-modules/
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...Globus
The U.S. Geological Survey (USGS) has made substantial investments in meeting evolving scientific, technical, and policy driven demands on storing, managing, and delivering data. As these demands continue to grow in complexity and scale, the USGS must continue to explore innovative solutions to improve its management, curation, sharing, delivering, and preservation approaches for large-scale research data. Supporting these needs, the USGS has partnered with the University of Chicago-Globus to research and develop advanced repository components and workflows leveraging its current investment in Globus. The primary outcome of this partnership includes the development of a prototype enterprise repository, driven by USGS Data Release requirements, through exploration and implementation of the entire suite of the Globus platform offerings, including Globus Flow, Globus Auth, Globus Transfer, and Globus Search. This presentation will provide insights into this research partnership, introduce the unique requirements and challenges being addressed and provide relevant project progress.
We describe the deployment and use of Globus Compute for remote computation. This content is aimed at researchers who wish to compute on remote resources using a unified programming interface, as well as system administrators who will deploy and operate Globus Compute services on their research computing infrastructure.
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...Mind IT Systems
Healthcare providers often struggle with the complexities of chronic conditions and remote patient monitoring, as each patient requires personalized care and ongoing monitoring. Off-the-shelf solutions may not meet these diverse needs, leading to inefficiencies and gaps in care. It’s here, custom healthcare software offers a tailored solution, ensuring improved care and effectiveness.
Developing Distributed High-performance Computing Capabilities of an Open Sci...Globus
COVID-19 had an unprecedented impact on scientific collaboration. The pandemic and its broad response from the scientific community has forged new relationships among public health practitioners, mathematical modelers, and scientific computing specialists, while revealing critical gaps in exploiting advanced computing systems to support urgent decision making. Informed by our team’s work in applying high-performance computing in support of public health decision makers during the COVID-19 pandemic, we present how Globus technologies are enabling the development of an open science platform for robust epidemic analysis, with the goal of collaborative, secure, distributed, on-demand, and fast time-to-solution analyses to support public health.
Enhancing Project Management Efficiency_ Leveraging AI Tools like ChatGPT.pdfJay Das
With the advent of artificial intelligence (AI) tools, project management processes are undergoing a transformative shift. By using tools like ChatGPT and Bard, organizations can empower their leaders and managers to plan, execute, and monitor projects more effectively.
May Marketo Masterclass, London MUG May 22 2024.pdfAdele Miller
Can't make Adobe Summit in Vegas? No sweat because the EMEA Marketo Engage Champions are coming to London to share their Summit sessions, insights and more!
This is a MUG with a twist you don't want to miss.
A Comprehensive Look at Generative AI in Retail App Testing.pdfkalichargn70th171
Traditional software testing methods are being challenged in retail, where customer expectations and technological advancements continually shape the landscape. Enter generative AI—a transformative subset of artificial intelligence technologies poised to revolutionize software testing.
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...informapgpstrackings
Keep tabs on your field staff effortlessly with Informap Technology Centre LLC. Real-time tracking, task assignment, and smart features for efficient management. Request a live demo today!
For more details, visit us : https://informapuae.com/field-staff-tracking/
Into the Box Keynote Day 2: Unveiling amazing updates and announcements for modern CFML developers! Get ready for exciting releases and updates on Ortus tools and products. Stay tuned for cutting-edge innovations designed to boost your productivity.
top nidhi software solution freedownloadvrstrong314
This presentation emphasizes the importance of data security and legal compliance for Nidhi companies in India. It highlights how online Nidhi software solutions, like Vector Nidhi Software, offer advanced features tailored to these needs. Key aspects include encryption, access controls, and audit trails to ensure data security. The software complies with regulatory guidelines from the MCA and RBI and adheres to Nidhi Rules, 2014. With customizable, user-friendly interfaces and real-time features, these Nidhi software solutions enhance efficiency, support growth, and provide exceptional member services. The presentation concludes with contact information for further inquiries.
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...Globus
Large Language Models (LLMs) are currently the center of attention in the tech world, particularly for their potential to advance research. In this presentation, we'll explore a straightforward and effective method for quickly initiating inference runs on supercomputers using the vLLM tool with Globus Compute, specifically on the Polaris system at ALCF. We'll begin by briefly discussing the popularity and applications of LLMs in various fields. Following this, we will introduce the vLLM tool, and explain how it integrates with Globus Compute to efficiently manage LLM operations on Polaris. Attendees will learn the practical aspects of setting up and remotely triggering LLMs from local machines, focusing on ease of use and efficiency. This talk is ideal for researchers and practitioners looking to leverage the power of LLMs in their work, offering a clear guide to harnessing supercomputing resources for quick and effective LLM inference.
How Recreation Management Software Can Streamline Your Operations.pptxwottaspaceseo
Recreation management software streamlines operations by automating key tasks such as scheduling, registration, and payment processing, reducing manual workload and errors. It provides centralized management of facilities, classes, and events, ensuring efficient resource allocation and facility usage. The software offers user-friendly online portals for easy access to bookings and program information, enhancing customer experience. Real-time reporting and data analytics deliver insights into attendance and preferences, aiding in strategic decision-making. Additionally, effective communication tools keep participants and staff informed with timely updates. Overall, recreation management software enhances efficiency, improves service delivery, and boosts customer satisfaction.
Quarkus Hidden and Forbidden ExtensionsMax Andersen
Quarkus has a vast extension ecosystem and is known for its supersonic and subatomic feature set. Some of these features are not as well known, and some extensions are less talked about, but that does not make them less interesting - quite the opposite.
Come join this talk to see some tips and tricks for using Quarkus and some of the lesser known features, extensions and development techniques.
Code reviews are vital for ensuring good code quality. They serve as one of our last lines of defense against bugs and subpar code reaching production.
Yet, they often turn into annoying tasks riddled with frustration, hostility, unclear feedback and lack of standards. How can we improve this crucial process?
In this session we will cover:
- The Art of Effective Code Reviews
- Streamlining the Review Process
- Elevating Reviews with Automated Tools
By the end of this presentation, you'll have the knowledge to organize and improve your code review process
3. Database Disruption
The forces of nature often converge to transform the very foundations of our infrastructure. In the database landscape, recent developments have resulted in a massive transformation of the DBMS market. Understanding your requirements is key to success these days.
6. Database Fundamentals
- Built for a collection of resources – which could be engineered for the application
- Shares data among multiple concurrent users
- Optimizes performance
- Handles resilience
- Provides ACID properties to some degree
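The last fundamental, ACID "to some degree", can be seen in miniature with Python's stdlib `sqlite3`: a failure mid-transaction rolls back every statement in that transaction (atomicity), leaving the data consistent. The account table here is a toy example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 0)")
conn.commit()

try:
    with conn:  # the context manager wraps one atomic transaction
        conn.execute(
            "UPDATE accounts SET balance = balance - 50 WHERE name = 'alice'")
        # Simulated crash before the matching credit lands:
        raise RuntimeError("crash mid-transfer")
except RuntimeError:
    pass

# Atomicity: the debit was rolled back along with the failed transfer,
# so no money has vanished from the system.
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
```

Databases differ in how far they take the remaining letters, particularly isolation levels and durability guarantees, which is exactly why the slide hedges with "to some degree".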
8. Hardware Factors
- CPUs, GPUs & FPGAs
- Cross breeding
- 3D XPoint and PCM (and Memristor?)
- SSDs & parallel access
- Parallel hardware architectures
Performance is accelerating and costs continue to fall.
9. The Cloud
- A cloud database is no different from an on-prem one, in theory
- Most databases are now available in the cloud
- Some databases are cloud-focused (Snowflake, Redshift)
- Some are hybrid (NuoDB is a good example)
10. Data Growth
Corporate Databases
+ Unstructured Data
+ Partner & Customer Data
+ Web Data
+ Social Network Data
+ Streaming Data
+ IoT Data
+ Personal Data
+ Log File Data
Data growth is roughly 55% per annum. It always has been.
11. The Global Map and Data Options
- Move the data to the processing
- Move the processing to the data
- Move the processing and the data
- Shard
There will not be a single physical database (or data lake), for a multitude of reasons.
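The "shard" option above can be sketched minimally: route each record to one of several databases by hashing its key. This is an illustrative assumption, not a scheme from the deck; the shard names and the modulo routing are hypothetical.

```python
# Minimal sketch of hash-based sharding: each key is routed
# deterministically to one of a fixed set of shards.
import hashlib

SHARDS = ["shard-0", "shard-1", "shard-2", "shard-3"]  # hypothetical names

def shard_for(key: str) -> str:
    """Pick a shard deterministically from the key's hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:4], "big") % len(SHARDS)
    return SHARDS[index]
```

Because the routing is a pure function of the key, every node agrees on where a record lives without coordination; real systems typically use consistent hashing so that adding a shard does not remap every key.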
13. Everything in Flux
- Hardware (network, storage, servers)
- Data Sources
- Data Staging
- Data Volumes
- Data Flow
- Data Governance
- Query Languages
- Data Usage
- Data Structures
- Schema Definition
- Ingest Speeds
- Data Workloads
- Applications
14. NoSQL Confusion
As the graph indicates, there is some overlap between SQL databases and other databases. What to choose is a use-case-driven decision. There never was a "universal database", and probably there never will be.
15. NoSQL World
- Some NDBMS do not attempt to provide all ACID properties.
- Some NDBMS use a distributed scale-out architecture with data redundancy.
- XML DBMS using XQuery are NDBMS.
- Some document stores are NDBMS.
- Object databases are NDBMS (GemStone, Objectivity, ObjectStore, etc.).
- Key-value stores are NDBMS.
- Graph DBMS are NDBMS.
- Large data pools (BigTable, HBase, Mnesia, etc.) are NDBMS.
17. SQL Merits and Demerits
- SQL: very good for set manipulation.
- Works for OLTP and many query environments.
- Not good for nested data structures (documents, web pages, etc.).
- Not good for ordered data sets.
- Not good for data graphs (networks of values).
Not a Swiss Army Knife!
18. The Impedance Mismatch
- The RDBMS stores data organized according to table structures.
- The OO programmer manipulates data organized according to complex object structures, which may have specific methods associated with them.
- The data does not simply map to the structure it has within the database.
- Consequently, a mapping activity is necessary to get and put data.
- Basically: hierarchies, types, result sets, crappy APIs, language bindings, tools.
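The mapping activity described above can be sketched in a few lines: flat, table-shaped rows on one side, a nested object with behavior on the other, and hand-written glue between them. The table and field names here are illustrative assumptions.

```python
# Sketch of the object-relational mapping activity: flat rows in,
# a nested object graph (with methods) out.
from dataclasses import dataclass, field

@dataclass
class OrderLine:
    sku: str
    qty: int

@dataclass
class Order:
    order_id: int
    lines: list = field(default_factory=list)

    def total_items(self) -> int:
        # Behavior lives on the object, not in the table.
        return sum(line.qty for line in self.lines)

def rows_to_order(order_row: dict, line_rows: list) -> Order:
    """Hand-written mapping from table-shaped rows to an object."""
    order = Order(order_id=order_row["id"])
    for r in line_rows:
        order.lines.append(OrderLine(sku=r["sku"], qty=r["qty"]))
    return order
```

ORM frameworks automate exactly this glue, which is why they exist and why, per the slide, hierarchies, types and language bindings remain the friction points.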
19. The SQL Barrier
- SQL has:
  - DDL (for data definition)
  - DML (for Select, Project and Join)
  - But it has little MML (math) or TML (time)
- Usually result sets are brought to the client for further analytical manipulation, but this creates problems.
- Alternatively, doing all analytical manipulation in the database creates problems.
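The first pattern above, fetching the result set and doing the math on the client, can be sketched as follows. The fetch function is a stand-in assumption, not a real database call.

```python
# Sketch of client-side analytics over a SQL result set: the query
# engine returns rows; the mathematical manipulation (which SQL has
# little vocabulary for) happens in the client program.
import statistics

def fetch_result_set() -> list:
    # Stand-in for a SELECT issued through a database driver.
    return [10.0, 12.5, 11.0, 13.5]

def client_side_stats(rows: list) -> dict:
    """Analytical manipulation performed outside the database."""
    return {"mean": statistics.mean(rows),
            "stdev": statistics.stdev(rows)}
```

The problem the slide alludes to is that every row crosses the network before any math happens, which is exactly what breaks down at analytic scale.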
21. Database Mismatch
A key problem is that we talk mostly about computation over data when we talk about "big data" and analytics, a potential mismatch for both relational and NoSQL.
22. Database Workload Parameters
- Read-intensive vs. write-intensive
- Mutable vs. immutable data
- Immediate vs. eventual consistency
- Short vs. long data latency
- Predictable vs. unpredictable data access patterns
- Simple vs. complex data types
23. Horses for Courses
- Relational row-store databases for conventionally tooled low- to mid-scale OLTP
- Relational databases for ACID requirements
- Parallel databases (row or column) for unpredictable or variable query workloads
- Specialized databases for complex data query workloads
- NoSQL (KVS, DHT) for high-scale OLTP
- NoSQL (KVS, DHT) for low-latency, read-mostly data access
- Parallel databases (row or column) for analytic workloads over tabular data
- NoSQL / Hadoop for batch analytic workloads over large data volumes
24. Database Tools: A Call-Out
- Have you noticed how databases are not self-running?
- DBAs are in short supply, and the need for them is increasing.
- Database diversity doesn't help in this area.
- DBA tools:
  - SQL analysis
  - Performance analysis
  - Security management
  - Capacity planning
  - Database deployment
- We meet the same problem with data lakes, except that there are very few tools.
25. The Impact of Parallelism
We used to see a 10x performance improvement every 6 years; now we regularly see 1000x (and that's just an approximation).
27. The Perfect Storm: The Data Lake
- The triumph of Open Source as a business model
- The dominance of Apache:
  - Hadoop, the platform for data
  - Spark, for speed
  - Kafka & NiFi for data flow
- The triumph of the cloud and its dominance
- Cost collapse
28. The Primary Role of the Data Lake
- System of record
- Data governance
- Application platform
29. The Evolved Conception
[Diagram: static data sources and data streams are ingested into the data lake, which sits under data governance and data lake management, feeding analytics/BI apps and ETL extracts out to databases, data marts and other apps.]
- Static data and data streams
- Real-time data ingest
- Data Governance
- Data Lake Mgt
- Analytics & BI
- Extracts
The data lake becomes the system of record.
31. The Full Picture
[Diagram: the data lake at the center, surrounded by ingest, data cleansing, data security, metadata mgt, transform & aggregate, search & query, real-time apps, BI/visualization & analytics, and other apps; data governance and data lake mgt span the whole; archive and life-cycle mgt handle extracts out to databases, data marts and other apps.]
Sources: servers, desktops, mobile, network devices, embedded chips, RFID, IoT, the cloud, OSes, VMs, log files, sys mgt apps, ESBs, web services, SaaS, business apps, office apps, BI apps, workflow, data streams, social...
32. Data Governance
If data governance was important before Big Data (and it was), it is far more important in the era of data lakes.
33. Data Governance
- System of record
- Data provenance & lineage
- Data cleansing
- Data security
- Data compliance
- Data integrity
- Data audit record
- Data life-cycle mgt
- Data meaning
Data governance is a perpetual process.
35. Events: Atoms and Molecules
A TRANSACTION is a MOLECULE of ATOMIC EVENTS. The ATOM of data has become the EVENT.
36. Events
Think of events as drops of water. They can live in streams, and they can also live in data pools, data lakes and databases.
37. Event Types
- Instantiation Event
- A State Report
- A Trigger Event
- A Correction Event
We also need to consider:
- Data refinement
- Aggregations
- Homogeneous collections
- Derived data
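The four event types above could be represented as a tagged record; a minimal sketch, in which the enum values and field names are assumptions rather than anything from the deck:

```python
# Sketch of the deck's four event types as a tagged, immutable record.
from dataclasses import dataclass
from enum import Enum

class EventType(Enum):
    INSTANTIATION = "instantiation"  # something came into existence
    STATE_REPORT = "state_report"    # a periodic report of state
    TRIGGER = "trigger"              # something that demands action
    CORRECTION = "correction"        # amends an earlier event

@dataclass(frozen=True)
class Event:
    event_type: EventType
    source: str
    payload: dict
```

Making the record immutable (`frozen=True`) matches the molecule/atom framing: an event, once emitted, is never edited; a Correction event is appended instead.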
38. Event-Based IoT Architecture
- The pulse and the threshold alert
- Some of this involves distributed processing
- There are known apps and unknown apps, so analytical exploration needs to be enabled
- Only aggregations will migrate
[Diagram: sensors, controllers and CPUs feed data to depots for source and depot processing, which in turn feed a central hub for central processing.]
39. Self-Defining Data
- Time
- Geographic location
- Virtual/logical location
- Source device & SW
- Device ID
- Derivation (if derived)
- Creator
- Owner
- Permissions
- Status (for replication)
- Metadata
- Audit trail
- Archive flag
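The self-defining data idea above can be sketched as an envelope type that carries a subset of these descriptive fields alongside the payload. The field names and the completeness check are illustrative assumptions.

```python
# Sketch of a "self-defining" event envelope: the record carries its
# own descriptive context (who, where, which device) with the payload.
from dataclasses import dataclass, asdict

@dataclass
class EventEnvelope:
    timestamp: str      # Time
    geo_location: str   # Geographic location
    device_id: str      # Device ID
    creator: str
    owner: str
    permissions: str
    payload: dict

    def describes_itself(self) -> bool:
        """True only if every descriptive field is populated."""
        return all(v not in ("", None) for v in asdict(self).values())
```

A consumer that receives such an envelope needs no out-of-band lookup to know where the data came from or who may read it, which is the point of self-defining data.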