The blessing and curse of today's database market? So many choices! While relational databases still dominate the day-to-day business, a host of alternatives has evolved around very specific use cases: graph, document, NoSQL, hybrid (HTAP), column store, the list goes on. And the database tools market is teeming with activity as well. Register for this special Research Webcast to hear Dr. Robin Bloor share his early findings about the evolving database market. He'll be joined by Steve Sarsfield of HPE Vertica, and Robert Reeves of Datical in a roundtable discussion with Bloor Group CEO Eric Kavanagh. Send any questions to info@insideanalysis.com, or tweet with #DBSurvival.
The Model Enterprise: A Blueprint for Enterprise Data Governance (Eric Kavanagh)
What gets measured, gets managed; but what gets governed, generates real value. That's one major reason why data governance has risen to a top priority for most organizations. Another reason is the rapid onboarding of big data, which often comes from beyond the traditional firewall. And then there are the regulators: issues like privacy, security and fiduciary responsibility are combining to make data governance a must-have. Register for this episode of The Briefing Room to hear veteran Analyst Dr. Robin Bloor explain why governance should be viewed as a positive change agent for the modern enterprise. He'll be briefed by Ron Huizenga of IDERA, who will discuss a practical, model-based approach to enterprise data governance, with a focus on Master Data Management.
Caserta Concepts, Datameer and Microsoft shared their combined knowledge and a use case on big data, the cloud and deep analytics. Attendees learned how a global leader in the test, measurement and control systems market reduced their big data implementation time from 18 months to just a few.
Speakers shared how to provide a business user-friendly, self-service environment for data discovery and analytics, and focused on how to extend and optimize Hadoop-based analytics, highlighting the advantages and practical applications of deploying on the cloud for enhanced performance, scalability and lower TCO.
Agenda included:
- Pizza and Networking
- Joe Caserta, President, Caserta Concepts - Why are we here?
- Nikhil Kumar, Sr. Solutions Engineer, Datameer - Solution use cases and technical demonstration
- Stefan Groschupf, CEO & Chairman, Datameer - The evolving Hadoop-based analytics trends and the role of cloud computing
- James Serra, Data Platform Solution Architect, Microsoft - Benefits of the Azure Cloud Service
- Q&A, Networking
For more information on Caserta Concepts, visit our website: http://casertaconcepts.com/
The Future of Data Warehousing and Data Integration (Eric Kavanagh)
The rise of big data, data lakes and the cloud, coupled with increasingly stringent enterprise requirements, is reinventing the role of data warehousing in modern analytics ecosystems. The emerging generation of data warehouses is more flexible, agile and cloud-based than its predecessors, with a strong need for automation and real-time data integration.
Join this live webinar to learn:
- Typical requirements for data integration
- Common use cases and architectural patterns
- Guidelines and best practices to address data requirements
- Guidelines and best practices to apply architectural patterns
The Data Lake - Balancing Data Governance and Innovation (Caserta)
Joe Caserta gave the presentation "The Data Lake - Balancing Data Governance and Innovation" at DAMA NY's one day mini-conference on May 19th. Speakers covered emerging trends in Data Governance, especially around Big Data.
For more information on Caserta Concepts, visit our website at http://casertaconcepts.com/.
Better Together: The New Data Management Orchestra (Cloudera, Inc.)
To ingest, store, process and leverage big data for maximum business impact requires integrating systems, processing frameworks, and analytic deployment options. Learn how Cloudera’s enterprise data hub framework, MongoDB, and Teradata Data Warehouse working in concert can enable companies to explore data in new ways and solve problems that not long ago might have seemed impossible.
Gone are the days of NoSQL and SQL competing for center stage. Visionary companies are driving data subsystems to operate in harmony. So what’s changed?
In this webinar, you will hear from executives at Cloudera, Teradata and MongoDB about the following:
How to deploy the right mix of tools and technology to become a data-driven organization
Examples of three major data management systems working together
Real world examples of how business and IT are benefiting from the sum of the parts
Join industry leaders Charles Zedlewski, Chris Twogood and Kelly Stirman for this unique panel discussion, moderated by BI Research analyst, Colin White.
Joe Caserta was a featured speaker, along with MIT Sloan School faculty and other industry thought-leaders. His session 'You're the New CDO, Now What?' discussed how new CDOs can accomplish their strategic objectives and overcome tactical challenges in this emerging executive leadership role.
In its tenth year, the MIT CDOIQ Symposium 2016 continues to explore the developing role of the Chief Data Officer.
For more information, visit http://casertaconcepts.com/
If you also got the Big Data itch, here is something to ease the pain :-)
Answers to these questions will be available soon (more info in the attached link)
Which Big Data Appliance should YOU use?
(click on the attached link for Poll results)
Appliances are Small and Quick, Right?
Revealing the 6 Types of Big Data Appliances
Uncovering the Main Players
Challenges, Pitfalls, and Winning the Big Data Game
Where is all this leading YOU to?
In this presentation at DAMA New York, Joe started by asking a key question: why are we doing this? Why analyze and share all these massive amounts of data? Basically, it comes down to the belief that in any organization, in any situation, if we can get the data and make it correct and timely, the resulting insights become instantly actionable, letting companies function more nimbly and successfully. Enabling the use of data can be a world-changing, world-improving activity, and the session presented the steps necessary to get there. Joe explained the concept of the "data lake" and also emphasized the role of a strong data governance strategy that incorporates seven components needed for a successful program.
For more information on this presentation or Caserta Concepts, visit our website at http://casertaconcepts.com/.
Balance agility and governance with #TrueDataOps and The Data Cloud (Kent Graziano)
DataOps is the application of DevOps concepts to data. The DataOps Manifesto outlines WHAT that means, similar to how the Agile Manifesto outlines the goals of the Agile Software movement. But, as the demand for data governance has increased, and the demand to do “more with less” and be more agile has put more pressure on data teams, we all need more guidance on HOW to manage all this. Seeing that need, a small group of industry thought leaders and practitioners got together and created the #TrueDataOps philosophy to describe the best way to deliver DataOps by defining the core pillars that must underpin a successful approach. Combining this approach with an agile and governed platform like Snowflake’s Data Cloud allows organizations to indeed balance these seemingly competing goals while still delivering value at scale.
Given in Montreal on 14-Dec-2021
Data Governance, Compliance and Security in Hadoop with Cloudera (Caserta)
In our recent Big Data Warehousing Meetup, we discussed Data Governance, Compliance and Security in Hadoop.
As the Big Data paradigm becomes more commonplace, we must apply enterprise-grade governance capabilities to critical data that is highly regulated and subject to stringent compliance requirements. Caserta and Cloudera shared techniques and tools that enable data governance, compliance and security on Big Data.
For more information, visit www.casertaconcepts.com
Modern Data Management for Federal Modernization (Denodo)
Watch full webinar here: https://bit.ly/2QaVfE7
Faster, more agile data management is at the heart of government modernization. However, traditional data delivery systems are limited in realizing a modernized and future-proof data architecture.
This webinar will address how data virtualization can modernize existing systems and enable new data strategies. Join this session to learn how government agencies can use data virtualization to:
- Enable governed, inter-agency data sharing
- Simplify data acquisition, search and tagging
- Streamline data delivery for transition to cloud, data science initiatives, and more
How to Optimize Sales Analytics Using 10x the Data at 1/10th the Cost (AtScale)
Being able to analyze sales at the most granular level with up-to-date data provides a competitive advantage for unlocking additional revenue -- especially for e-commerce and retail companies heading into the holiday season.
It Takes a Village: Organizational Alignment to Deliver Big Data Value in Hea... (DataWorks Summit)
The business and technology teams within a health insurer must align the company’s central data platform with its data strategy. That requires substantial organizational alignment. Hear the firsthand perspective from Health Care Service Corporation (HCSC), the largest customer-owned health insurance company in the United States. The speaker will cover how they integrated membership information, regulatory compliance, and the general ledger to improve overall healthcare management. At HCSC, the strong alignment between executive leadership, business portfolio direction, architectural strategy, technology delivery, and program management has helped create leading-edge capabilities which help the company respond nimbly to a quickly evolving healthcare industry.
In this document, we will present a very brief introduction to Big Data (what is Big Data?), Hadoop (how does Hadoop fit into the picture?) and Cloudera Hadoop (what is the difference between Cloudera Hadoop and regular Hadoop?).
Please note that this document is for Hadoop beginners looking for a place to start.
Big Data International Keynote Speaker Mark van Rijmenam shared his vision on Hadoop Data Lakes during a Zaloni Webinar. What are the Hadoop Data Lake trends for 2016, what are the data lake challenges, and how can organizations benefit from data lakes?
Cloudera Federal Forum 2014: The Building Blocks of the Enterprise Data Hub (Cloudera, Inc.)
Eli Collins, Chief Technologist in the Office of the CTO at Cloudera, shares the story of the enterprise data hub and how it relates to the enterprise data warehouse.
Couchbase and Apache Kafka - Bridging the gap between RDBMS and NoSQL (DATAVERSITY)
Thousands of companies, from Uber and Netflix to Goldman Sachs and Cisco, use Apache Kafka to transform and reshape their data architectures. Kafka is frequently used as the bridge between legacy RDBMS and new NoSQL database systems, effectively transforming SQL table data into JSON documents and vice versa. Many companies also use Kafka for business-critical applications that drive real-time stream processing and analytics, intersystem messaging, high-volume data ingestion, and operational metrics collection.
Couchbase and Kafka can be used together to address high throughput, distributed data management, and transformation challenges.
In this webinar we’ll explore:
Where Kafka fits into the big data ecosystem
How companies are using Kafka for both real-time processing and as a bus for data exchange
An example of how Kafka can bridge legacy RDBMS and new NoSQL database systems
Several real-world use case architectures
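As a rough illustration of the SQL-to-JSON bridging described above (this is not Couchbase's or Kafka's actual connector API; the column names and values are invented), the per-record mapping a connector applies looks something like:

```python
import json

# Hypothetical column schema and row, standing in for a record captured
# from a legacy RDBMS table on its way to a Kafka topic.
COLUMNS = ("order_id", "customer", "amount")

def row_to_document(row):
    """Map a flat SQL row tuple to the JSON document a NoSQL store expects."""
    doc = dict(zip(COLUMNS, row))
    # Serialize to the JSON bytes that would be published to a Kafka topic.
    return json.dumps(doc).encode("utf-8")

message = row_to_document((42, "acme", 19.99))
print(message.decode())  # {"order_id": 42, "customer": "acme", "amount": 19.99}
```

The reverse direction (JSON document back to SQL columns) is just `json.loads` plus the same column mapping, which is why Kafka works as a two-way bridge between the systems.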
Data volumes have experienced explosive growth in recent years, and that data is being generated from sources that are increasingly complex and varied. Harnessing and refining value from this data requires a new approach, as data extraction, transformation, and loading (ETL) becomes increasingly costly and difficult to scale.
Organizations are looking to leverage Hadoop as an enterprise data hub—also called a “data lake” or “data reservoir”—as a key component of their data architecture to augment their data warehouse, ETL and analytical systems in order to maximize their existing investments, reduce costs, and unlock new business value from their data.
In this webinar, you will learn:
Real-world examples that illustrate why Hadoop is the best low-cost data hub, data lake, or data landing zone (staging area) option for ETL processing
Proof points that demonstrate advantages of Hadoop and its ability to scale to manage increasing data volumes and support exploratory big data analytics
Proven best practices for a cost-effective, reliable way to implement a data management platform for your entire big data analytical ecosystem
Hidden issues to be aware of in deploying your data hub/data lake
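The ETL-offload pattern above can be sketched in miniature. This is a hypothetical staging-and-partitioning flow in plain Python, not code from any Hadoop product; all record fields and values are invented for illustration:

```python
from collections import defaultdict

# Toy stand-in for ETL offload to a data hub: raw events land in a staging
# area, are cleansed, then written out partitioned by date, the way a
# Hadoop landing zone partitions files.
raw_events = [
    {"ts": "2016-05-19", "user": " Alice ", "clicks": "3"},
    {"ts": "2016-05-19", "user": "Bob", "clicks": "bad"},   # malformed record
    {"ts": "2016-05-20", "user": "Carol", "clicks": "7"},
]

def transform(event):
    """Cleanse one raw record; return None to route it to a reject area."""
    try:
        return {"ts": event["ts"],
                "user": event["user"].strip(),
                "clicks": int(event["clicks"])}
    except ValueError:
        return None

lake = defaultdict(list)   # partition key (date) -> cleansed records
rejects = []               # quarantined records kept for later inspection
for event in raw_events:
    cleansed = transform(event)
    if cleansed is None:
        rejects.append(event)
    else:
        lake[cleansed["ts"]].append(cleansed)

print(dict(lake), rejects)
```

Keeping rejects in their own area rather than dropping them is one of the "hidden issues" a data-lake deployment has to plan for: the raw record stays queryable even when it fails cleansing.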
Webinar - Delivering Enhanced Message Processing at Scale With an Always-on D... (DataStax)
Managing 3.8 million e-prescriptions daily for more than 1 million healthcare professionals is no small feat. And, with rapid growth in the number of digital transactions and expansion of its network, Surescripts needed to replace its legacy relational database system to address a new set of data management challenges while meeting their customers’ demanding SLAs. Join us for this on-demand webinar to hear from Keith Willard, Chief Architect at Surescripts, to learn how and why Surescripts leverages DataStax Enterprise to deliver enhanced message processing at scale.
View recording: https://youtu.be/1T6V1XAoaJQ
Explore all DataStax webinars: https://www.datastax.com/resources/webinars
Simplifying Real-Time Architectures for IoT with Apache Kudu (Cloudera, Inc.)
3 Things to Learn About:
*Building scalable real-time architectures for managing data from IoT
*Processing data in real time with components such as Kudu & Spark
*Customer case studies highlighting real-time IoT use cases
Using real-time big data analytics for competitive advantage (Amazon Web Services)
Many organisations find it challenging to successfully perform real-time data analytics using their own on-premises IT infrastructure. Building a system that can adapt and scale rapidly to handle dramatic increases in transaction loads can potentially be quite a costly and time-consuming exercise.
Most of the time, infrastructure is under-utilised and it’s near impossible for organisations to forecast the amount of computing power they will need in the future to serve their customers and suppliers.
To overcome these challenges, organisations can instead utilise the cloud to support their real-time data analytics activities. Scalable, agile and secure, cloud-based infrastructure enables organisations to quickly spin up infrastructure to support their data analytics projects exactly when it is needed. Importantly, they can ‘switch off’ infrastructure when it is not.
BluePi Consulting and Amazon Web Services (AWS) are giving you the opportunity to discover how organisations are using real time data analytics to gain new insights from their information to improve the customer experience and drive competitive advantage.
Choosing technologies for a big data solution in the cloud (James Serra)
Has your company been building data warehouses for years using SQL Server? And are you now tasked with creating or moving your data warehouse to the cloud and modernizing it to support “Big Data”? What technologies and tools should you use? That is what this presentation will help you answer. First we will cover what questions to ask concerning data (type, size, frequency), reporting, performance needs, on-prem vs cloud, staff technology skills, OSS requirements, cost, and MDM needs. Then we will show you common big data architecture solutions and help you to answer questions such as: Where do I store the data? Should I use a data lake? Do I still need a cube? What about Hadoop/NoSQL? Do I need the power of MPP? Should I build a "logical data warehouse"? What is this lambda architecture? Can I use Hadoop for my DW? Finally, we’ll show some architectures of real-world customer big data solutions. Come to this session to get started down the path to making the proper technology choices in moving to the cloud.
Verizon Centralizes Data into a Data Lake in Real Time for Analytics (DataWorks Summit)
Verizon – Global Technology Services (GTS) was challenged by a multi-tier, labor-intensive process when trying to migrate data from disparate sources into a data lake to create financial reports and business insights. Join this session to learn more about how Verizon:
• Easily accessed data from multiple sources including SAP data
• Ingested data into major targets including Hadoop
• Achieved real-time insights from data leveraging change data capture (CDC) technology
• Reduced costs and labor
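To make the change-data-capture idea concrete: real CDC tools read the database transaction log rather than comparing table states, but the change events they deliver resemble what this hypothetical snapshot diff produces (all table data invented):

```python
# Minimal sketch of what change data capture (CDC) delivers: instead of
# re-extracting whole tables, only the delta between states flows to the
# data lake. Diffing keyed snapshots here just illustrates the shape of
# the resulting change events.
def capture_changes(old, new):
    """Compare snapshots {pk: row} and return (op, pk, row) change events."""
    events = []
    for pk, row in new.items():
        if pk not in old:
            events.append(("insert", pk, row))
        elif old[pk] != row:
            events.append(("update", pk, row))
    for pk in old:
        if pk not in new:
            events.append(("delete", pk, old[pk]))
    return events

before = {1: {"amt": 10}, 2: {"amt": 20}}
after  = {1: {"amt": 10}, 2: {"amt": 25}, 3: {"amt": 30}}
print(capture_changes(before, after))
# [('update', 2, {'amt': 25}), ('insert', 3, {'amt': 30})]
```

The cost and labor savings come from the unchanged rows (like pk 1 above) never being moved at all.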
Think of big data as all data, no matter what the volume, velocity, or variety. The simple truth is a traditional on-prem data warehouse will not handle big data. So what is Microsoft’s strategy for building a big data solution? And why is it best to have this solution in the cloud? That is what this presentation will cover. Be prepared to discover all the various Microsoft technologies and products from collecting data, transforming it, storing it, to visualizing it. My goal is to help you not only understand each product but understand how they all fit together, so you can be the hero who builds your company's big data solution.
Best Practices For Building and Operating A Managed Data Lake - StampedeCon 2016StampedeCon
This session will detail best practices for architecting, building, operating and managing an Analytics Data Lake platform. Key topics will include:
1) Defining next-generation Data Lake architectures. The de facto standard has been commodity DAS servers with HDFS, but there are now multiple solutions aimed at separating compute and storage, virtualizing or containerizing Hadoop applications, and utilizing Hadoop compatible or embedded HDFS filesystems. This portion will explore the options available, and the pros and cons of each.
2) Data Ingest. There are many ways to load data into a Data Lake, including standardized Apache tools (Sqoop, Flume, Kafka, Storm, Spark, NiFi), standard file and object protocols (SFTP, NFS, REST, WebHDFS), and proprietary tools (e.g., Zaloni Bedrock, DataTorrent). This section will explore these options in the context of best fit to workflows; it will also look at key gaps and challenges, particularly in the areas of data formats and integration with metadata/cataloging tools.
3) Metadata & Cataloguing. One of the biggest inhibitors of successful Data Lake deployments is Data Governance, particularly in the areas of indexing, cataloguing and metadata management. It is nearly impossible to run analytics on top of a Data Lake and get meaningful & timely results without solving these problems. This portion will explore both emerging open standards (Apache Atlas, HCatalog) and proprietary tools (Cloudera Navigator, Zaloni Bedrock/Mica, Informatica Metadata Manager), and balance the pros, cons and gaps of each.
4) Security & Access Controls. Solving these challenges are key for adoption in regulatory driven industries like Healthcare & Financial Services. There are multiple Apache projects and proprietary tools to address this, but the challenge is making security and access controls consistent across the entire application and infrastructure stack, and over the data lifecycle, and being able to audit this in the face of legal challenges. This portion will explore available options and best practices.
5) Provisioning & Workflow Management. The real promise of the Data Lake is integrating Analytics workflows and tools on converged infrastructure, with shared data, and building "as a service" architectures oriented towards self-service data exploration and Analytics for end users. This is an emerging and immature area, but this session will explore some potential concepts, tools and options to achieve this.
This will be a moderately technical session, with the above topics being illustrated by real world examples. Attendees should have basic familiarity with Hadoop and the associated Apache projects.
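The ingest and cataloguing concerns above interact: metadata captured at load time is what keeps a data lake queryable rather than letting it degrade into a data swamp. A toy sketch of the pattern, with hypothetical dataset names (real deployments would use tools like Apache Atlas or HCatalog rather than an in-memory dict):

```python
import hashlib
import json
from datetime import datetime, timezone

catalog = {}  # dataset name -> metadata entry

def ingest(name, records, source):
    """Load records into the 'lake' and register metadata in the catalog."""
    payload = json.dumps(records, sort_keys=True).encode()
    catalog[name] = {
        "source": source,                       # lineage: where it came from
        "rows": len(records),
        "fingerprint": hashlib.sha256(payload).hexdigest()[:12],
        "ingested_at": datetime.now(timezone.utc).isoformat(),
    }
    return records  # stand-in for actually writing to HDFS/S3

trades = ingest("trades_2016_q1", [{"sym": "X", "qty": 5}],
                source="sftp://feeds/trades")
print(catalog["trades_2016_q1"]["rows"])  # 1
```

The point is that registration happens in the same step as the load, so every dataset in the lake arrives with lineage, size, and a content fingerprint already recorded.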
Modern apps and services are leveraging data to change the way we engage with users in a more personalized way. Skyla Loomis talks big data, analytics, NoSQL, SQL and how IBM Cloud is open for data.
Learn more by visiting our Bluemix Hybrid page: http://ibm.co/1PKN23h
Cloud-Native Data: What data questions to ask when building cloud-native appsVMware Tanzu
While a number of patterns and architectural guidelines exist for cloud-native applications, a discussion about data often leads to more questions than answers. For example, what are some of the typical data problems encountered, why are they different, and how can they be overcome?
Join Prasad Radhakrishnan from Pivotal and Dave Nielsen from Redis Labs as they discuss:
- Expectations and requirements of cloud-native data
- Common faux pas and strategies on how you can avoid them
Presenters:
Prasad Radhakrishnan, Platform Architecture for Data at Pivotal
Dave Nielsen, Head of Ecosystem Programs at Redis Labs
FINRA's Managed Data Lake: Next-Gen Analytics in the Cloud - ENT328 - re:Inve...Amazon Web Services
Financial Industry Regulatory Authority (FINRA)'s Technology Group has changed its customers' relationship with data by creating a managed data lake that enables discovery on petabytes of capital markets' data, while saving time and money over traditional analytics solutions. FINRA's managed data lake unlocks the value in its data to accelerate analytics and machine learning at scale. The data lake includes a centralized data catalog and separates storage from compute, allowing users to query petabytes of data in seconds. Learn how FINRA uses Spot Instances and services such as Amazon S3, Amazon EMR, Amazon Redshift, and AWS Lambda to provide the right tool for the right job at each step in the data processing pipeline. All of this is done while meeting FINRA's security and compliance responsibilities as a financial regulator.
Data Integration for Both Self-Service Analytics and IT Users Senturus
See a cloud solution that enables data integration for applications such as Salesforce, NetSuite, Workday, Amazon Redshift and Microsoft Azure. View the webinar video recording and download this deck: http://www.senturus.com/resources/data-integration-tool-for-both-business-and-it-users/.
The rapid growth in self-service business analytics has created tremendous value for organizations, but in many cases has created tension between technical and business users. Technical teams have built solid data warehouses filled with trusted data from source systems such as sales, finance, and operations. Business teams are gaining tremendous insights by analyzing data warehouse information with traditional and new data discovery tools such as Cognos, Business Objects, Tableau, and Power BI.
The Informatica Cloud is a best-of-both-worlds solution that combines data integration for both business and IT users. It allows the following: 1) IT incorporates the business analyst’s data integration routines into the core, trusted data warehouse, 2) Business analysts can do data integration from both cloud-based and on-premise data sources, 3) Business analysts can use the industrial-strength data integration engine that IT teams have loved for years and 4) Integration for apps such as Salesforce, NetSuite, Workday, Amazon Redshift, Microsoft Azure, Marketo, SAP, Oracle and SQL Server.
Senturus, a business analytics consulting firm, has a resource library with hundreds of free recorded webinars, trainings, demos and unbiased product reviews. Take a look and share them with your colleagues and friends: http://www.senturus.com/resources/.
Businesses are generating more data than ever before.
Doing real-time data analytics requires IT infrastructure that often needs to be scaled up quickly, and running an on-premise environment in this setting has its limitations.
Organisations often require a massive amount of IT resources to analyse their data and the upfront capital cost can deter them from embarking on these projects.
What’s needed is scalable, agile and secure cloud-based infrastructure at the lowest possible cost so they can spin up servers that support their data analysis projects exactly when they are required. This infrastructure must enable them to create proof-of-concepts quickly and cheaply – to fail fast and move on.
Boost Performance with Scala – Learn From Those Who’ve Done It! Cécile Poyet
Scalding is a Scala DSL for Cascading. Running on Hadoop, it’s a concise, functional, and very efficient way to build big data applications. One significant benefit of Scalding is that it allows easy porting of Scalding apps from MapReduce to newer, faster execution fabrics.
In this webinar, Cyrille Chépélov, of Transparency Rights Management, will share how his organization boosted the performance of their Scalding apps by over 50% by moving away from MapReduce to Cascading 3.0 on Apache Tez. Dhruv Kumar, Hortonworks Partner Solution Engineer, will then explain how you can interact with data on HDP using Scala and leverage Scala as a programming language to develop Big Data applications.
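Scalding jobs compose map, group, and reduce steps functionally. Scalding itself is a Scala DSL running on Hadoop, but the shape of such a pipeline can be sketched in Python with the canonical word-count example (the function names here are illustrative, not Scalding's API):

```python
from collections import Counter

# The flatMap -> groupBy -> sum shape that Scalding expresses over
# Hadoop, applied here to an in-memory list of lines.
lines = ["big data apps", "big data pipelines"]

# flatMap each line into words, then count per distinct word.
counts = Counter(word for line in lines for word in line.split())

print(counts["big"])   # 2
print(counts["apps"])  # 1
```

On a real cluster the same logical pipeline is distributed across mappers and reducers; the gain Scalding offers is expressing that distribution in a few lines of functional code that can be retargeted from MapReduce to Tez without rewriting the logic.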
Take Action: The New Reality of Data-Driven BusinessInside Analysis
The Briefing Room with Dr. Robin Bloor and WebAction
Live Webcast on July 23, 2014
Watch the archive:
https://bloorgroup.webex.com/bloorgroup/lsr.php?RCID=360d371d3a49ad256942f55350aa0a8b
The waiting used to be the hardest part, but not anymore. Today’s cutting-edge enterprises can seize opportunities faster than ever, thanks to an array of technologies that enable real-time responsiveness across the spectrum of business processes. Early adopters are solving critical business challenges by enabling the rapid-fire design, development and production of very specific applications. Functionality can range from improved customer engagement to dynamic machine-to-machine interactions.
Register for this episode of The Briefing Room to learn from veteran Analyst Dr. Robin Bloor, who will tout a new era in data-driven organizations, and why a data flow architecture will soon be critical for industry leaders. He’ll be briefed by Sami Akbay of WebAction, who will showcase his company’s real-time data management platform, which combines all the component parts needed to access, process and leverage data big and small. He’ll explain how this new approach can provide game-changing power to organizations of all types and sizes.
Visit InsideAnalysis.com for more information.
Oracle Big Data Appliance and Big Data SQL for advanced analyticsjdijcks
Overview presentation showing Oracle Big Data Appliance and Oracle Big Data SQL, and why this combination really matters. Big Data SQL brings you the unique ability to analyze data across the entire spectrum of systems: NoSQL, Hadoop and Oracle Database.
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization Denodo
Watch here: https://bit.ly/2NGQD7R
In an era increasingly dominated by advancements in cloud computing, AI and advanced analytics, it may come as a shock that many organizations still rely on data architectures built before the turn of the century. But that scenario is rapidly changing with the increasing adoption of real-time data virtualization - a paradigm shift in the approach that organizations take towards accessing, integrating, and provisioning data required to meet business goals.
As data analytics and data-driven intelligence take centre stage in today’s digital economy, logical data integration across the widest variety of data sources, with proper security and governance structures in place, has become mission-critical.
Attend this session to learn:
- How you can meet cloud and data science challenges with data virtualization
- Why data virtualization is increasingly finding enterprise-wide adoption
- How customers are reducing costs and improving ROI with data virtualization
Using AWS to design and build your data architecture has never been easier, making it possible to gain insights and uncover new opportunities to scale and grow your business. Join this workshop to learn how you can gain insights at scale with the right big data applications.
Best Practices in DataOps: How to Create Agile, Automated Data PipelinesEric Kavanagh
Synthesis Webcast with Eric Kavanagh and Tamr
DataOps is an emerging set of practices, processes, and technologies for building and automating data pipelines to meet business needs quickly. As these pipelines become more complex and development teams grow in size, organizations need better collaboration and development processes to govern the flow of data and code from one step of the data lifecycle to the next – from data ingestion and transformation to analysis and reporting.
DataOps is not something that can be implemented all at once or in a short period of time. DataOps is a journey that requires a cultural shift. DataOps teams continuously search for new ways to cut waste, streamline steps, automate processes, increase output, and get it right the first time. The goal is to increase agility and cycle times, while reducing data defects, giving developers and business users greater confidence in data analytic output.
This webcast examines how organizations adopt DataOps practices in the field. It will review results of an Eckerson Group survey that sheds light on the rate and scope of DataOps adoption. It will also describe case studies of organizations that have successfully implemented DataOps practices, the challenges they have encountered and benefits they’ve received.
Tune into our webcast to learn:
- User perceptions of DataOps
- The rate of DataOps adoption by industry and other demographic variables
- DataOps adoption by technique and component (e.g., agile development, test automation, orchestration, continuous integration/continuous deployment)
- Key challenges organizations face with DataOps
- Key benefits organizations experience with DataOps
- Best practices in doing DataOps
- Case studies and anecdotes of DataOps at companies
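Test automation, one of the DataOps components listed above, can be as simple as asserting data-quality expectations before each pipeline stage so defects are caught early rather than discovered in reports. A minimal sketch; the rules, field names, and records are hypothetical:

```python
def check(records, rules):
    """Fail fast if any record violates a data-quality rule."""
    for rule_name, predicate in rules.items():
        bad = [r for r in records if not predicate(r)]
        if bad:
            raise ValueError(f"{rule_name} failed for {len(bad)} record(s)")
    return records

def transform(records):
    """Example transformation step: convert revenue to USD."""
    return [{**r, "revenue_usd": r["revenue"] * r["fx_rate"]} for r in records]

raw = [{"id": 1, "revenue": 100.0, "fx_rate": 1.1}]
rules = {
    "non_null_id": lambda r: r.get("id") is not None,
    "positive_revenue": lambda r: r["revenue"] > 0,
}

clean = check(raw, rules)   # quality gate before transformation
report = transform(clean)
print(round(report[0]["revenue_usd"], 2))  # 110.0
```

In a real DataOps pipeline the same gates run automatically on every change to data or code, which is what gives developers and business users confidence in the analytic output.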
Expediting the Path to Discovery with Multi-Source AnalysisEric Kavanagh
The Briefing Room with Eric Kavanagh and Zoomdata
In the realm of complex analysis, rarely does one source of data provide everything the analyst needs. Data Warehouses were designed to pull data from multiple sources, to enable that kind of cross-system discovery. But that traditional model typically required stripping the data of significant context, essentially watering down the end result, and at times obfuscating the most meaningful facets.
Thanks to several advances in real-time data exploration, companies can now access raw data where it lives, and begin the analysis process often within seconds of connecting to a source. And new innovations allow for multi-source analytics, where disparate systems can be accessed simultaneously, allowing real-time discovery across multiple sources, creating a kind of analytical depth perception. Register for this special episode of The Briefing Room to hear Bloor Group CEO Eric Kavanagh, and Zoomdata speakers explain this remarkable new capability.
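Multi-source analysis ultimately means correlating records across systems at query time while keeping each system's context intact. A toy sketch of the idea with two hypothetical in-memory "sources", a CRM and a web-analytics feed:

```python
# Two hypothetical sources that would normally live in separate systems.
crm = {101: {"customer": "Acme", "segment": "enterprise"}}
web = [{"customer_id": 101, "page_views": 42},
       {"customer_id": 202, "page_views": 7}]

# Join at query time, preserving context from both systems rather
# than stripping it out in a warehouse load.
combined = [
    {**crm[w["customer_id"]], **w}
    for w in web
    if w["customer_id"] in crm
]

print(combined)
```

Tools in this space do the equivalent federation against the live sources, so the analyst sees correlated raw data within seconds of connecting instead of waiting on a warehouse load.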
Metadata Mastery: A Big Step for BI ModernizationEric Kavanagh
Modernizing data management is on everyone’s mind today. Making the shift from data management practices of the BI era to modern data management is essential but it is also challenging. Whether you’re updating the back end by migrating your data warehouses to the cloud or advancing the front end with a shift from legacy BI tools to self-service analysis and visualization, it is critical to know the data that you have and to understand data lineage. Data inventory, data glossary, and data lineage are all metadata dependent. But legacy BI metadata is typically proprietary, non-integrated, and collected inconsistently by a variety of disparate tools. The metadata muddle is a serious inhibitor to modernization efforts. Metadata consolidation and centralization are the keys to overcoming this barrier. What if all this were automated?
Join us to learn:
- How a smart and innovative new technology resolves metadata disparity
- How metadata management automation accelerates modernization efforts
- How metadata management automation reduces errors and improves quality of results from data management modernization projects
- How metadata management automation and data cataloging work together to help you move rapidly to the next generation of BI and analytics
Database is the new black. Ever the backbone of information architectures, database technology continually evolves to meet growing and changing business needs. New types of data and applications make the database more important than ever, and understanding which technology best serves your use case is paramount to building durable systems. These days, the choices are many, so users should be careful when deciding which direction to go. Register for this Exploratory Webcast to hear veteran database Analyst Dr. Robin Bloor explain why the database market has exploded in recent years. He'll outline the current database landscape, and provide insights about which kinds of technologies are suitable for the growing variety of business needs today. He'll also focus on key auxiliary technologies that enable modern databases to perform efficiently.
Better to Ask Permission? Best Practices for Privacy and SecurityEric Kavanagh
Hot Technologies with The Bloor Group and IDERA
If security was once a nice-to-have, those days have long gone. Between data breaches and privacy regulations, organizations today face immense pressure to protect their systems and their sensitive data. When giants like Yahoo! and Target can get hacked, so can any other company. What can you do about it? How can you protect your company and clients?
Register for this episode of Hot Technologies to hear Analysts Eric Kavanagh and Dr. Robin Bloor provide insights about the many ways that companies can buttress their defenses and stay ahead of the bad guys. They'll be briefed by Vicky Harp of IDERA who will demonstrate how to identify vulnerabilities, track sensitive data, successfully pass audits, and protect your SQL Server databases.
Best Laid Plans: Saving Time, Money and Trouble with Optimal ForecastingEric Kavanagh
Expectations have changed. That's true for users, executives and customers alike. There's no time for systems running slowly, or cost overruns. That's why fundamentals like capacity planning have become mission-critical. By paying attention to the details, and doing effective forecasts, companies can optimize their information architecture, keeping everyone happy. Register for this episode of Hot Technologies to learn from veteran Analysts Dr. Robin Bloor and Rick Sherman who will offer insights about how and why to do capacity planning. They'll be briefed by Bullett Manale of IDERA, who will explain how his company's SQL Diagnostic Manager can track a wide range of usages metrics which can be used for accurate forecasting.
A Winning Strategy for the Digital EconomyEric Kavanagh
The speed of innovation today creates tremendous opportunities for some, existential threats for others. Companies that win create their own success by leveraging modern data platforms. While architectures vary, the foundation is often in-memory, and the latency is real-time. Register for this Special Edition of The Briefing Room to hear veteran Analyst Dr. Robin Bloor explain how today's data platforms enable the modern enterprise in groundbreaking ways. He'll be briefed by Chris Hallenbeck of SAP who will demonstrate how forward-looking companies are leveraging real-time data platforms to achieve operational excellence, make decisions faster, and find new ways to innovate.
Discovering Big Data in the Fog: Why Catalogs MatterEric Kavanagh
The Briefing Room with Dr. Robin Bloor and Waterline Data
Good enterprise data can drive positive business outcomes. But if that data isn’t organized and accessible, information workers are left with an incomplete picture. Knowing the location, lineage and permissions of data across the enterprise can lead to more accurate and insightful searches, and ultimately, knowledge discovery.
Register for this episode of The Briefing Room to learn from veteran Analyst Dr. Robin Bloor as he discusses how the success of big data projects relies on understanding your data. He’ll be briefed by Todd Goldman and Mohan Sadashiva of Waterline Data, who will explain how their solution can facilitate discovery via automation and crowd sourcing. They’ll demonstrate how combining the value of tribal knowledge with rationalized data can enable self-service analytics, improve data governance, and reduce data redundancy.
Health Check: Maintaining Enterprise BIEric Kavanagh
Hot Technologies with The Bloor Group and IDERA
Most companies realize the value of business intelligence. Advanced analytics, data mining, dashboards – all surface useful insights. With so many moving parts in play, it’s crucial to provide visibility across the entire BI environment, thus delivering solid system and service performance.
Register for this episode of Hot Technologies to learn from Analyst Dr. Robin Bloor and Eric Kavanagh as they discuss why operational and strategic business intelligence are the cornerstones of any organization. They’ll be briefed by Stan Geiger of IDERA, who will showcase his company’s SQL BI Manager, an end-to-end solution designed to provide a single view into numerous running processes. He will explain that by optimizing system health and availability, users can eliminate downtime and improve efficiency.
Rapid Response: Debugging and Profiling to the RescueEric Kavanagh
Bad code happens. And when it does, developers often spend far too much time trying to find and fix the error. Debugging is a common solution, but in a complex environment, running multiple applications on multiple platforms, it can be easier said than done. Developers need instant visibility across all machines, ultimately leading to faster and higher quality insights. Register for this episode of Hot Technologies to learn from Analyst Dr. Robin Bloor and Data Scientist Dez Blanchfield as they discuss how errant code can inevitably disrupt systems and performance. They’ll be briefed by Bert Scalzo of IDERA, who will explain how his company’s Rapid SQL can facilitate the debugging and profiling of stored procedures and functions.
Solving the Really Big Tech Problems with IoTEric Kavanagh
The Briefing Room with Dr. Robin Bloor and HPE Security
The Internet of Things brings new technological problems: sensor communications are bi-directional, the scale of data generation points has no precedent and, in this new world, security, privacy and data protection need to go out to the edge. Likely, most of that data lands in Hadoop and Big Data platforms. With the need for rapid analytics never greater, companies try to seize opportunities in tighter time windows. Yet, cyber-threats are at an all-time high, targeting the most valuable of assets—the data.
Register for this episode of The Briefing Room to hear Analyst Dr. Robin Bloor explain the implications of today's divergent data forces. He’ll be briefed by Reiner Kappenberger of HPE, who will discuss how a recent innovation -- NiFi -- is revolutionizing the big data ecosystem. He’ll explain how this technology dramatically simplifies data flow design, enabling a new era of business-driven analysis, while also protecting sensitive data.
Beyond the Platform: Enabling Fluid AnalysisEric Kavanagh
When the analysts aren’t happy, no one is happy. That’s because these days, practically every aspect of the business is driven by insights. And because information architectures are increasingly complex, any number of issues can cause a slowdown in queries, or even basic reporting. How can your organization ensure that all systems are go?
Register for this episode of The Briefing Room to learn from veteran Analyst Dr. Robin Bloor as he explains the common roadblocks to successful BI and analytics. He'll be briefed by Stan Geiger of IDERA, who previously demonstrated how his company’s SQL BI Manager can optimize platform health and performance. In this episode, he will dive deeper into how IDERA’s solution resolves resource constraints, user activity and capacity issues, making tiresome troubleshooting a thing of the past.
Protect Your Database: High Availability for High Demand DataEric Kavanagh
Hot Technologies with Dr. Robin Bloor, Dez Blanchfield and IDERA
Your company’s data is mission-critical. While protecting it from outside attack or catastrophe has become a standard business requirement, it’s not enough these days to rely solely on simple backup and recovery techniques. Today’s enterprise requires high availability and uninterrupted operational performance, meaning the DBA toolbox must provide more than traditional solutions.
Register for this episode of Hot Technologies to hear from Analyst Dr. Robin Bloor and Data Scientist Dez Blanchfield as they discuss the necessary components of a modern solution architecture. They’ll be briefed by IDERA’s Oracle ACE Bert Scalzo, who will explain some innovative options for ensuring high availability in a demanding database environment.
A Better Understanding: Solving Business Challenges with DataEric Kavanagh
Good decisions make great companies. That's why the data-driven mantra keeps gaining momentum. Increasingly, smart business people are taking a data-first approach for both strategic planning and tactical decision-making. They spend ample time exploring their data to better understand their options. In doing so, they capitalize on real opportunities, while avoiding low-value projects.
The Briefing Room with Dr. Robin Bloor and Experian
Register for this episode of The Briefing Room to hear veteran Analyst Dr. Robin Bloor explain why a data-first mindset can help companies optimize their resources and thus make better decisions. He'll be briefed by Rishi Patel and Erin Haselkorn of Experian, who will showcase Experian Pandora, which enables the kind of discovery that businesses need to better understand their data. They'll explain how Pandora can help professionals build a business case for their ideas and plans.
The Key to Effective Analytics: Fast-Returning QueriesEric Kavanagh
The best business analysts understand the value of having a "conversation" with their data. The idea is that they can pose queries, examine results, then quickly modify their questions to home in on a desired answer. This kind of iterative process creates a fluid environment that is highly conducive for identifying meaningful patterns in data. Register for this episode of Hot Technologies to hear Bloor Group Chief Analyst Dr. Robin Bloor and Data Scientist Dez Blanchfield as they outline why fluid analytics should be the norm and which hurdles still stand in the way. They'll be briefed by Bullett Manale of IDERA who will demonstrate his company's diagnostic platform for analytics. He'll provide context, and also deliver a demo that shows real-world solutions that enable iterative analytics.
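The iterative query-refine loop described above can be tried directly against an in-memory database; here is a sketch using SQLite (the table and data are made up for illustration):

```python
import sqlite3

# An in-memory database standing in for an analytics source.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (region TEXT, amount REAL)")
con.executemany("INSERT INTO sales VALUES (?, ?)",
                [("east", 120.0), ("west", 80.0), ("east", 50.0)])

# First question: totals by region.
totals = con.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(totals)  # [('east', 170.0), ('west', 80.0)]

# Refined question, posed seconds later: only the large eastern sales.
big_east = con.execute(
    "SELECT SUM(amount) FROM sales WHERE region = 'east' AND amount > 100"
).fetchone()[0]
print(big_east)  # 120.0
```

The "conversation" only works when each round trip returns in seconds; once queries take minutes, analysts stop refining and settle for the first answer, which is exactly the hurdle the session addresses.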
A Tight Ship: How Containers and SDS Optimize the EnterpriseEric Kavanagh
The Briefing Room with Dez Blanchfield and Red Hat
Think of containers as the drones of modern computing. They're small, agile, and can carry a significant payload. In many ways, they represent the fruition of the last two major paradigm shifts in enterprise software: SOA and virtualization. However, for companies to fully leverage this innovative approach, a persistent storage platform is needed that is as flexible and scalable as containers themselves.
Register for this episode of The Briefing Room to hear Bloor Group Data Scientist Dez Blanchfield, who will explain the significance of container technology, and the relevance of software-defined storage (SDS) in a constantly evolving IT world. He'll be briefed by Steve Watt and Sayan Saha of Red Hat, who will demonstrate how open-source technology can help organizations take advantage of this brave new world of enterprise computing. They will explain how containers are the next step in the evolution of the operating system, and why SDS is now the optimal solution.
Application Acceleration: Faster Performance for End Users Eric Kavanagh
Hot Technologies with Dr. Robin Bloor, Dez Blanchfield and IDERA
Application performance issues impact end users the hardest, and too often, IT doesn’t know about it until after the fact. With many applications served by a variety of disparate technologies, troubleshooting bottlenecks can be onerous and time consuming, ultimately causing frustration and missed SLAs. How can IT quickly discover what process affected SQL execution time and keep end users focused on the bottom line?
Register for this episode of Hot Technologies to learn from Analyst Dr. Robin Bloor and Data Scientist Dez Blanchfield as they discuss the complexities of the data pipeline. They’ll be briefed by Bill Ellis of IDERA, who will explain the importance of identifying and resolving the root cause of performance problems. He’ll show how IDERA’s Precise Application Performance Platform can isolate transactions and usage patterns, thus giving IT the necessary tools to provide a consistent end user experience.
Time's Up! Getting Value from Big Data NowEric Kavanagh
The Briefing Room with Dr. Robin Bloor and CASK
We all know the promise of big data, but who gets the value? There are plenty of success stories already, and most of them involve one key ingredient: facilitated access to important data sets. Most research studies suggest that the Pareto principle applies: 80 percent of effort goes to data integration, and only 20 percent to analysis. Inverting that balance is the Holy Grail.
Register for this episode of The Briefing Room to hear veteran Analyst Dr. Robin Bloor explain why the time has finally come for turning the tables on the status quo in analytics. He'll be briefed by CASK CEO Jonathan Gray, who will showcase his company's big data integration platform, CDAP, which was specifically designed to expedite time-to-value for big data.
The New Normal: Dealing with the Reality of an Unsecure WorldEric Kavanagh
Hot Technologies with Dr. Robin Bloor, Dez Blanchfield and IDERA
The data is staggering: breaches of epic proportion continue to rock the business world. Massive amounts of personal information have been hacked, then sold to all manner of bad actors. Another wave of attacks is on the way, in which those stolen IDs will be used to compromise any corporate system that can be found. What can your organization do?
Register for this episode of Hot Technologies to hear veteran Analyst Dr. Robin Bloor and Data Scientist Dez Blanchfield explain why security and compliance have entered a whole new era, and why innovative approaches are necessary to mitigate risk. They'll be briefed by Ignacio Rodriguez of IDERA, who will demonstrate how the company's SQL Secure can help organizations stay one step ahead of the bad guys, while also facilitating compliance audits.
UiPath Test Automation using UiPath Test Suite series, part 4DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques.
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if something changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers, without pulling teeth or pulling your hair out. You’ll leave with practical tips and strategies for relationship building that leads to closing the deal.
UiPath Test Automation using UiPath Test Suite series, part 3
DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation introduction
UI automation sample
Desktop automation flow
Speakers:
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Accelerate your Kubernetes clusters with Varnish Caching
Thijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
DevOps and Testing slides at DASA Connect
Kari Kakkonen
Slides from my talk with Rik Marselis at the DASA Connect conference on 30.5.2024. We discuss what testing is, then what agile testing is, and finally what Testing in DevOps means. We also held a lovely workshop in which participants explored different ways to think about quality and testing in the different parts of the DevOps infinity loop.
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
- See how to accelerate model training and optimize model performance with active learning
- Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
- Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
Generating a custom Ruby SDK for your web service or Rails API using Smithy
g2nightmarescribd
Have you ever wanted a Ruby client API to communicate with your web service? Smithy is a protocol-agnostic language for defining services and SDKs. Smithy Ruby is an implementation of Smithy that generates a Ruby SDK using a Smithy model. In this talk, we will explore Smithy and Smithy Ruby to learn how to generate custom feature-rich SDKs that can communicate with any web service, such as a Rails JSON API.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. A constant focus on speed in releasing software to market, combined with traditionally slow and manual security checks, has created gaps in continuous security, an important piece of the software supply chain. Today, organizations feel more susceptible to external and internal cyber threats due to the vast attack surface of their application supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerabilities and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with a passion for making things work and a knack for helping others understand how things work. He has around 20 years of solution-engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations on CI/CD and on application security integrated into the software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Essentials of Automations: Optimizing FME Workflows with Parameters
Safe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on the notifications, alerts, and approval requests using Slack for Bonterra Impact Management. The solutions covered in this webinar can also be deployed for Microsoft Teams.
Interested in deploying notification automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
Elevating Tactical DDD Patterns Through Object Calisthenics
Dorra BARTAGUIZ
After immersing yourself in the blue book and its red counterpart, attending DDD-focused conferences, and applying tactical patterns, you're left with a crucial question: How do I ensure my design is effective? Tactical patterns within Domain-Driven Design (DDD) serve as guiding principles for creating clear and manageable domain models. However, achieving success with these patterns requires additional guidance. Interestingly, we've observed that a set of constraints initially designed for training purposes remarkably aligns with effective pattern implementation, offering a more ‘mechanical’ approach. Let's explore together how Object Calisthenics can elevate the design of your tactical DDD patterns, offering concrete help for those venturing into DDD for the first time!
2. An Early Thought
Data lakes and databases are not very different things… irrespective of what the data lake enthusiasts claim.
3. Everything in flux
- Hardware (network, storage, servers)
- Data sources
- Data staging
- Data volumes
- Data flow
- Data governance
- Query languages
- Data usage
- Data structures
- Schema definition
- Ingest speeds
- Data workloads
- Applications
4. The Data Lake Picture
[Diagram: a central DATA LAKE surrounded by functions – Ingest, Data Cleansing, Data Security, Metadata Mgt, Data Lake Mgt, Data Governance, Archive/Life Cycle Mgt, Transform & Aggregate, Search & Query, Real-Time Apps, BI/Visualization & Analytics, Other Apps – with extracts flowing out to databases, data marts and other apps. Sources feeding the lake: servers, desktops, mobile, network devices, embedded chips, RFID, IoT, the cloud, OSes, VMs, log files, systems management apps, ESBs, web services, SaaS, business apps, office apps, BI apps, workflow, data streams, social...]
Data Lakes (Yes!):
- Ingest points for data for the sake of governance
- Analytics sandboxes
- Good places for cool and cold data – and hence archive
Data Lakes (No!):
- OLTP databases
- Fast query engines
- High user concurrency
- Big Data analytics apps
- Unusually structured data (NoSQL, graph, etc.)
You don’t have one data lake, you have many.
Data lakes do not manage data well.
5. Streaming
There’s a spectrum of streaming capability, and thus a spectrum of streaming platforms: Spark, in-memory DBMS, SQLstream.
6. Database Workload Parameters
- Read-intensive vs. write-intensive
- Mutable vs. immutable data
- Immediate vs. eventual consistency
- Short vs. long data latency
- Predictable vs. unpredictable data access patterns
- Simple vs. complex data types
7. Horses for Courses
- Relational row store databases for conventional OLTP
- Relational databases for ACID requirements
- Parallel databases (row or column) for unpredictable or variable query workloads
- Specialized databases for complex data query workloads (graph, etc.)
- NoSQL (KVS, DHT) for high-scale OLTP
- NoSQL (KVS, DHT) for low-latency, read-mostly data access
- NoSQL / Hadoop / Spark for scale-out batch analytic workloads
- Cloud databases can be any of the above
8. Database Tools
- Have you noticed how databases are not self-running?
- DBAs are in short supply, and the need for them is increasing.
- Database diversity doesn’t help in this area.
- DBA tools:
  - SQL analysis
  - Performance analysis
  - Security management
  - Capacity planning
  - Database deployment
- We meet the same problem with data lakes – except that there are very few tools.
27. Picking a DB
Structure:
- Does the data fit into a nice, clean data model?
- Will the schema lack clarity or be dynamic?
Analytics:
- What question(s) do you want to ask of the data?
- Short-running queries
- Long, deep analytics, including predictive
Size:
- Is the data “Big Data”, or will it ever be big data?
Also:
- Cost per terabyte
- Staffing considerations
- Familiarity with technologies
- Company financials
- Company ancillary portfolio
- Community & openness
28. Security Analytics
Needing different kinds of analysis is common – from short, fast queries, to deeper analytics with bigger data sets, to machine learning and predictive.
Weather application:
- Tell me the current temperature and pressure
- What was the high/low for my area?
- What was the high/low for my region?
- What was the average temperature?
- Highest and lowest of all time?
- Can we predict conditions tomorrow?
Security analytics:
- Are there any attacks happening right now?
- What IP and where are most of my events coming from?
- Has traffic spiked compared to historical?
- Has any event like this happened over the last three years?
- What new events should we be tracking to predict security events?
29. HPE Vertica
All built on the same trusted and proven HPE Vertica Core SQL Engine:
- Advanced analytics
- Open ANSI SQL standards ++
- R, Python, Java, Spark, Scala
- In-database machine learning
HPE Vertica Enterprise:
- Columnar storage and advanced compression
- Maximum performance and scalability
HPE Vertica for SQL on Hadoop:
- Native support for ORC and Parquet
- Support for industry-leading distributions
- No helper node or single point of failure
HPE Vertica in the Cloud:
- Get up and running quickly in the cloud
- Flexible, enterprise-class cloud deployment options
30. The Appeal of Vertica
Requirement / Proof:
Extreme optimization:
- Columnar design for high-performance analytics
- Aggressive compression
- Scalable to petabyte scale
Total cost of ownership:
- Simple and predictable pricing
- No penalty for additional hardware or connected users
Ready for your enterprise:
- SQL compliant to 100% of the TPC-DS benchmark queries
- Secure and ACID compliant
- No single point of failure
Open and compatible:
- Open platform: standards-compliant SQL, Python, Java
- Working with the open source community on Spark, Hadoop, Kafka, etc.
31. Vertica Enterprise: unique value to expand the data warehouse
Customer information in Hadoop (the data lake):

CREATE TABLE customer_visits (
    customer_id bigint,
    visit_num int)
PARTITIONED BY (page_view_dt date)
STORED AS ORC;

Customer information in the data warehouse (Vertica ROS storage). The Vertica engine can query both – for example, finding customers who placed no orders:

SELECT customers.customer_id
FROM orders RIGHT OUTER JOIN customers
    ON orders.customer_id = customers.customer_id
GROUP BY customers.customer_id
HAVING COUNT(orders.customer_id) = 0;

Querying data that sits BOTH in the data warehouse and Hadoop is our unique value. Most solutions require that you move the data.
Use cases:
- Leveraging web logs to gain customer insight
- Sensor and IoT data for pre-emptive service
- Marketing program tracking
- Tracking the impact of application updates
- Many more uses
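The zero-orders query above is a standard SQL anti-join, so it can be sanity-checked locally. The sketch below runs it against SQLite with made-up sample rows (the table contents are illustrative, not from the deck); since older SQLite versions lack RIGHT OUTER JOIN, the equivalent LEFT OUTER JOIN with the tables swapped is used:

```python
import sqlite3

# In-memory database with illustrative sample data (not from the deck).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER);
    CREATE TABLE orders (customer_id INTEGER);
    INSERT INTO customers VALUES (1), (2), (3);
    INSERT INTO orders VALUES (1), (1), (3);
""")

# Same anti-join pattern as the slide: customers with zero orders.
# COUNT(orders.customer_id) counts only non-NULL join matches, so
# customers with no matching order rows get a count of 0.
rows = conn.execute("""
    SELECT customers.customer_id
    FROM customers LEFT OUTER JOIN orders
        ON orders.customer_id = customers.customer_id
    GROUP BY customers.customer_id
    HAVING COUNT(orders.customer_id) = 0;
""").fetchall()

print(rows)  # only customer 2 placed no orders
```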
32. Machine Learning in Vertica 8.0.1
Support for the whole workflow of predictive analytics.
Algorithm / Example:
- Linear Regression – Demand forecasting: model the demand for a service or good (response) based on its features (predictors); for example, demand for different models of laptops based on monitor size, weight, price, operating system, etc.
- Logistic Regression – Engineering: predict the likelihood that a particular mechanical part of a system will malfunction or require maintenance (response) based on operating conditions and diagnostic measurements (predictors).
- K-means – Fraud detection: identify individual observations that don’t align to a distinct group (cluster), and identify types of clusters that are more likely to be at risk of fraud.
- Naïve Bayes – Categorization: using fuzzy logic, identify items that belong in one group or another; used in email spam detection, language detection, sentiment analysis and document sorting.
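To make the demand-forecasting example concrete, here is a minimal ordinary-least-squares sketch in plain Python. It is not Vertica's in-database ML API; the single predictor (price) and all numbers are made-up illustrative data:

```python
# Minimal least-squares linear regression, illustrating the
# demand-forecasting idea from the slide. Plain Python, not
# Vertica's in-database ML functions; all data is hypothetical.

def fit_simple_regression(xs, ys):
    """Fit y = a + b*x by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by variance of x.
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
        / sum((x - mean_x) ** 2 for x in xs)
    a = mean_y - b * mean_x
    return a, b

# Hypothetical predictor: laptop price; response: units demanded.
prices = [500, 700, 900, 1100, 1300]
demand = [95, 80, 62, 48, 30]

a, b = fit_simple_regression(prices, demand)
predicted = a + b * 1000  # forecast demand at a $1000 price point
print(round(predicted, 1))
```

In practice one would fit many predictors at once (monitor size, weight, operating system, ...), but the response/predictor structure is the same as in the slide's laptop example.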
33. Perhaps the ultimate architecture is all-inclusive
Apache Spark, Hadoop and Kafka
Data Warehouse (Vertica)
Optimal use case:
- Deep analysis
- Massive scale
- Many concurrent users
Data Lake (Hadoop)
Optimal use case:
- Data lake
- Warm, cold storage
- Data discovery
- ETL
Features:
- Analyze in place without data movement via native ORC and Parquet readers
- Any Hadoop
- Run ON the Hadoop cluster or ON the Vertica cluster
Operational Analytics (Spark)
Optimal use case:
- Small, fast-running queries
- ETL and complex event processing
- Operational analytics
Features:
- Vertica performs optimized data load from Spark
- Spark runs queries on Vertica data
Kafka
Features:
- Share data between applications that support Kafka
- Data streaming into Vertica
34. Vertica makes data matter
Purpose-built for Big Data from the first line of code.
Fast analytics:
- Gain insight into your data 50x-1,000x faster than legacy products
Massive scalability:
- Infinitely scale your solution by adding an unlimited number of low-cost nodes
Open architecture:
- Built-in support for Hadoop, R, and a range of ETL and BI tools
Optimized data storage:
- Store 10x-30x more data per server than row databases, with patented columnar compression
HPE Vertica Community Edition:
Download and install the community edition. Manage and analyze up to 1 TB of data across three nodes for an unlimited time. Try it on my.vertica.com.