Not Your Father’s Data Warehouse: Breaking Tradition with Innovation

The Briefing Room

Welcome
Twitter Tag: #briefr
Host: Eric Kavanagh
eric.kavanagh@bloorgroup.com
@eric_kavanagh

Mission
• Reveal the essential characteristics of enterprise software, good and bad
• Provide a forum for detailed analysis of today’s innovative technologies
• Give vendors a chance to explain their product to savvy analysts
• Allow audience members to pose serious questions... and get answers!

Topics
This Month: DATABASE
June: ANALYTICS & MACHINE LEARNING
July: INNOVATIVE TECHNOLOGY
2014 Editorial Calendar at www.insideanalysis.com/webcasts/the-briefing-room

Database

Analyst: Robin Bloor
Robin Bloor is Chief Analyst at The Bloor Group
robin.bloor@bloorgroup.com
@robinbloor

Teradata
• Teradata is known for its analytics data solutions, with a focus on integrated data warehousing, big data analytics, and business applications
• It offers a broad suite of technology platforms and solutions and a wide range of data management applications
• Teradata recently announced Database 15 and QueryGrid, an analytics platform that enables data processing across the enterprise

Guest: Imad Birouty
Imad Birouty is Manager of Teradata Product Marketing and is responsible for Teradata software and hardware products, including the Teradata Database, Teradata Platform Family, Teradata Unity, Tools and Utilities, and In-Database Analytics. Prior to this, Imad led the Product Management team responsible for the NCR/Teradata Platforms and was responsible for setting product strategy and direction.

NOT YOUR FATHER’S DATABASE: BREAKING TRADITION WITH NEW INNOVATIONS
Imad Birouty
Teradata

THE CHANGING DATABASE
From Structured Data To Structured and Multi-Structured Data (XML, JSON, Weblogs)
From SQL To SQL, Java, Perl, Ruby, Python, and R
From Business Users To Business Users and Developers
From Disk Data Storage To Disk Data Storage with Solid State Drives and In-Memory
From Query Single Database To Query Multi-Databases and Sources
From Reporting/Ad Hoc Queries To Reporting/Ad Hoc Queries and 1,000s of In-Database Analytics Algorithms
From Row Data Storage To Hybrid Row/Column Data Storage
5/19/14 Copyright Teradata

[JSON] RUNS THE INTERNET OF THINGS – AND TERADATA RUNS JSON

Multi-structured Data → Data Warehouse
Sources: XML (<xml />), JSON, weblogs
Target: Teradata Data Warehouse
Sample weblog record:
41521390 2013-01-01 00:25:42 2.111.94.18 Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_5; en-us) AppleWebKit/533.19.4 (KHTML, like Gecko) Version/5.0.3 Safari/533.19.4 "http://www.cokstate.edu/welcome/" "https://www.google.com/#sclient=psyab&hl=en&source=hp&q=oklahoma+state&pbx=1&oq"

Serving Two Perspectives
Business User
• New Data Elements
• New Data Sources
• Dynamic Data Sources
• Rapid Turnaround
• Agile Change
• Independence, Autonomy
IT Professional
• Consistency
• Stability
• IT Processes
• Governance and Security
• Test Cycles
• Smooth Application Interaction

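Early and Late Binding in SQL
[Diagram: source data reaches BI tools through the data warehouse along two paths]
• Early binding (load time): source data → ETL → schema in the data warehouse
• Late binding (runtime): weblogs → CLOB in the data warehouse → SQL + parse/extract functions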

Choice: Right Approach In Any Environment
Schema on Write
• Well understood data
• Relational integrity
• Storage efficiency
Schema on Read
• Dynamic data
• Reduced coordination
• Human readable
Teradata 15 now offers both
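
The contrast between schema on write (early binding) and schema on read (late binding) can be sketched in a few lines of Python. This is only an illustration of the idea, not Teradata syntax: the records, the field names, and the helper functions `load_early`, `load_late`, and `query_late` are all hypothetical.

```python
import json

# Hypothetical weblog-style records; the field names are illustrative only.
RAW = [
    '{"user": "a", "status": 200, "bytes": 512}',
    '{"user": "b", "status": 404, "bytes": 0, "referrer": "example"}',
]

def load_early(raw_rows):
    """Schema on write: parse and type the data once, at load time."""
    table = []
    for row in raw_rows:
        doc = json.loads(row)
        # Only the columns declared up front are kept; anything else is dropped.
        table.append((doc["user"], int(doc["status"]), int(doc["bytes"])))
    return table

def load_late(raw_rows):
    """Schema on read: store the text untouched, much like a CLOB column."""
    return list(raw_rows)

def query_late(stored, field, default=None):
    """Parse at query time and extract whichever field the query asks for."""
    return [json.loads(row).get(field, default) for row in stored]

early = load_early(RAW)
late = load_late(RAW)

total_bytes = sum(row[2] for row in early)   # cheap: columns are already typed
referrers = query_late(late, "referrer")     # flexible: field never declared
```

The early-bound table answers known questions cheaply but has lost the undeclared `referrer` field. The late-bound store keeps everything and pays the parsing cost on every query.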

ALL THE COOL KIDS CODE IN
WELCOME TO THE COOLEST ANALYTIC DBMS, DUDES.
Teradata table operators support C++, Java, Perl, Ruby – and Python.

Application Developers and BI SQL Programmers
Application Developers
• Flow logic control focus
• Procedural and script languages
• Data retrieved for use in application processing
• Work within IDE
• Object and custom build orientation
BI SQL Programmers
• Set-based data processing focus
• SQL language
• Data retrieval, processing, and presentation is the application
• Standalone or template-based query development
• RDBMS orientation

Choice of Languages
Run in-database, in parallel
• Perl
• Python
• Ruby
• R
• Shell Scripts
• C/C++
• Java
• Embed parts of application logic in database
> Separate presentation and processing layers
> Eliminate “round trips”
> Automatically process data in parallel
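
A rough sketch of what "run in-database, in parallel" means for a script: the same function is applied independently to each partition of rows, and the partial results are then combined. The slide does not describe the interface, so this sketch assumes rows arrive as tab-delimited text. The names `process_partition` and `run_in_parallel` are hypothetical, and a thread pool stands in for the database's parallel units.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def process_partition(lines, delimiter="\t"):
    """Count page hits per user for one partition of delimited rows."""
    counts = Counter()
    for line in lines:
        user, _page = line.rstrip("\n").split(delimiter)
        counts[user] += 1
    return counts

def run_in_parallel(partitions):
    """Stand-in for the database running the same script on every partition."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(process_partition, partitions))
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts together
    return total

# Two hypothetical partitions of (user, page) rows.
partitions = [
    ["alice\t/home\n", "bob\t/cart\n"],
    ["alice\t/checkout\n"],
]
hits = run_in_parallel(partitions)
```

Because each partition is processed without reference to the others, the work scales with the number of parallel units, and only the small per-partition summaries need to be combined.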

IN A RECENT PETABYTE-SCALE BENCHMARK ON TERADATA TECHNOLOGY, 95% OF I/Os WERE SERVED FROM MEMORY
In-memory performance, spinning disk prices.
ANY QUESTIONS?

Improves Query Performance
Performance of in-memory databases without their cost
• 43% of disk I/O against 1% of data
• Hottest data in memory, not all the data
• Integrated into Teradata system
• No need for separate appliance
[Chart: Data Temperature Profile – Typical DW]

Leveraging Extended Memory Space
Hottest data placed and maintained in memory, aged out as it cools
• Sophisticated algorithms to track usage, measure temperature, and rank data
• Complements FSG cache
• Dynamically adjusts to new query patterns
[Diagram: very hot data in, cool data out]
• Intelligent Memory: most frequently used data
• FSG Cache: most recently used data; temporarily stores data required for current queries, purges least recently used
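
The difference between keeping the most recently used data and keeping the most frequently used data can be shown with a toy simulation. This is a minimal sketch, not Teradata's algorithm: `LRUCache` and `HotDataCache` are hypothetical classes, "temperature" is reduced to a plain access count, and the aging of data as it cools is left out.

```python
from collections import Counter, OrderedDict

class LRUCache:
    """Keeps the most recently used blocks (the FSG-cache role on the slide)."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()

    def access(self, block):
        hit = block in self.blocks
        if hit:
            self.blocks.move_to_end(block)
        else:
            self.blocks[block] = True
            if len(self.blocks) > self.capacity:
                self.blocks.popitem(last=False)  # purge least recently used
        return hit

class HotDataCache:
    """Keeps the most frequently used blocks, ranked by access count."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.temperature = Counter()
        self.blocks = set()

    def access(self, block):
        hit = block in self.blocks
        self.temperature[block] += 1
        # Re-rank after every access: the hottest blocks stay resident.
        hottest = self.temperature.most_common(self.capacity)
        self.blocks = {name for name, _count in hottest}
        return hit

def hit_count(cache, trace):
    return sum(cache.access(block) for block in trace)

# A hot block "h" is read often, then a one-off scan touches cold blocks.
trace = ["h", "h", "h", "c1", "c2", "c3", "h"]
lru_hits = hit_count(LRUCache(capacity=2), trace)
hot_hits = hit_count(HotDataCache(capacity=2), trace)
```

In this trace the scan of cold blocks pushes `h` out of the recency-based cache, so the final read misses. The frequency-ranked cache keeps `h` resident and serves it from memory. That is why the two caches complement each other rather than compete.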

1+ Petabyte Benchmark – Impact of TIM
• TIM set to 50% of total system memory
• TIM showed a cache effectiveness of 95%
• 60% reduction in total physical I/O!
• Clear reduction in physical I/O… but max CPU @ 100% for both benchmarks

Teradata QueryGrid™
Optimize, simplify, and orchestrate processing across and beyond the Teradata UDA
• Run the right analytic on the right platform
> Take advantage of specialized processing engines while operating as a cohesive analytic environment
• Automated and optimized work distribution through “push-down” processing across platforms
> Minimize data movement, process data where it resides
> Minimize data duplication
> Transparently automate analytic processing and data movement between systems
> Bi-directional data movement
• Integrated processing, within and outside the UDA
• Easy access to data and analytics through existing SQL skills and tools

Teradata QueryGrid™
[Diagram: the Teradata Database (IDW) and the Teradata Aster Database (Discovery) connect to the following]
• Hadoop: remote, push-down processing in Hadoop
• Teradata Database: Teradata Databases
• Teradata Aster Database: Aster functions such as SQL-MapReduce™, graph
• Other databases: RDBMS databases
• Other languages: leverage languages such as SAS, Perl, Python, Ruby, R
When fully implemented, the Teradata Database or the Teradata Aster Database will be able to intelligently use the functionality and data of multiple heterogeneous processing engines

Teradata Database 15 – Teradata QueryGrid™
Leverage Hadoop resources, reduce data movement
• Bi-directional to Hadoop
• Query push-down
• Easy configuration of server connections
Query flow:
• Query through Teradata
• Sent to Hadoop through Hive
• Results returned to Teradata
• Additional processing joins data in Teradata
• Final results sent back to application/user
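
The effect of query push-down on data movement can be illustrated with a small simulation. It is a sketch under stated assumptions, not QueryGrid itself: the in-memory lists stand in for a remote Hadoop table and a local warehouse table, and every name here (`REMOTE_SENTIMENT`, `LOCAL_CUSTOMERS`, `remote_scan`, `join_with_customers`) is hypothetical.

```python
# Hypothetical stand-ins for a remote Hadoop table and a local warehouse table.
REMOTE_SENTIMENT = [
    {"customer_id": 1, "sentiment": "positive"},
    {"customer_id": 2, "sentiment": "negative"},
    {"customer_id": 3, "sentiment": "positive"},
    {"customer_id": 4, "sentiment": "neutral"},
]
LOCAL_CUSTOMERS = {1: "Ann", 2: "Bo", 3: "Cy", 4: "Di"}

def remote_scan(predicate=None):
    """Return the rows the remote system ships back to the caller."""
    if predicate is None:
        return list(REMOTE_SENTIMENT)
    # Push-down: the filter runs where the data resides.
    return [row for row in REMOTE_SENTIMENT if predicate(row)]

def join_with_customers(rows):
    """Additional processing in the warehouse: join to local customer data."""
    return [(LOCAL_CUSTOMERS[row["customer_id"]], row["sentiment"]) for row in rows]

def is_positive(row):
    return row["sentiment"] == "positive"

# Without push-down: ship every row, then filter locally.
shipped_all = remote_scan()
result_without = join_with_customers([r for r in shipped_all if is_positive(r)])

# With push-down: only the matching rows cross the wire.
shipped_filtered = remote_scan(is_positive)
result_with = join_with_customers(shipped_filtered)
```

Both paths produce the same joined result. With push-down, half as many rows are moved in this example, and the saving grows with the selectivity of the filter.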

Customer Value Based on Social Influence
Use Case
• Teradata Database: determine high value customers based on history
• Hadoop: determine customer sentiment
• Teradata Aster Database: determine customer sphere of influence
• Teradata Database: determine customer value based on social influence

QUESTIONS?

Perceptions & Questions
Analyst: Robin Bloor

To Go With The Flow
Robin Bloor, Ph.D.

Everything in Flux
u Hardware (network,
storage, servers)
u Data sources
u Data staging
u Data volumes
u Data flow
u Data governance
u Data usage
u Data structures
u Schema definition
u Ingest speeds
u Data workloads

The “Pipeline” Data Architecture
Do we take the DATA TO THE PROCESSING… or the PROCESSING TO THE DATA?
This is not a simple question

Data as “The New Oil”
The diagram illustrates the fractional distillation of crude oil
The DATA RESERVOIR/DATA HUB concept suggests something similar for data
The analogy is not perfect, but it is useful

In a Data Pipeline Architecture…
The structure of the data reservoir cannot be independent of the structure of the logical data warehouse
In our view, the whole ensemble needs to be heuristic

u How do you see the future of JSON and XML?
u What is the penalty, with Teradata, for late
binding in SQL?
u What do you see as the fundamental division of
workload between Hadoop/Asterdata and
Teradata? Or, in fact, is there one?
u Why do you think Hadoop is important from a
technical perspective?

u Does Teradata provide any special optimization
between query and analytical workloads?
u Which specific components of the Hadoop
ecosystem does Teradata recommend using?

Upcoming Topics
This Month: DATABASE
June: ANALYTICS & MACHINE LEARNING
July: INNOVATIVE TECHNOLOGY
www.insideanalysis.com/webcasts/the-briefing-room
2014 Editorial Calendar at www.insideanalysis.com

THANK YOU for your ATTENTION!
