The Briefing Room with Dr. Robin Bloor and Teradata
Live Webcast on May 20, 2014
Watch the archive: https://bloorgroup.webex.com/bloorgroup/lsr.php?RCID=f09e84f88e4ca6e0a9179c9a9e930b82
Traditional data warehouses have been the backbone of corporate decision making for over three decades. With the emergence of Big Data and popular technologies like open-source Apache™ Hadoop®, some analysts question the lifespan of the data warehouse and the future role it will play in enterprise information management. But it’s not practical to believe that emerging technologies provide a wholesale replacement of existing technologies and corporate investments in data management. Rather, a better approach is for new innovations and technologies to complement and build upon existing solutions.
Register for this episode of The Briefing Room to hear veteran analyst Dr. Robin Bloor as he explains where tomorrow’s data warehouse fits in the information landscape. He’ll be briefed by Imad Birouty of Teradata, who will highlight the ways in which his company is evolving to meet the challenges presented by different types of data and applications. He will also tout Teradata’s recently announced Teradata® Database 15 and Teradata® QueryGrid™, an analytics platform that enables data processing across the enterprise.
Visit InsideAnalysis.com for more information.
Not Your Father’s Data Warehouse: Breaking Tradition with Innovation
1. Grab some coffee and enjoy
the pre-show banter before
the top of the hour!
2. Not Your Father’s Data Warehouse: Breaking Tradition with Innovation
The Briefing Room
3. Twitter Tag: #briefr
The Briefing Room
Welcome
Host:
Eric Kavanagh
eric.kavanagh@bloorgroup.com
@eric_kavanagh
4. Mission
• Reveal the essential characteristics of enterprise software, good and bad
• Provide a forum for detailed analysis of today’s innovative technologies
• Give vendors a chance to explain their product to savvy analysts
• Allow audience members to pose serious questions... and get answers!
5. Topics
This Month: DATABASE
June: ANALYTICS & MACHINE LEARNING
July: INNOVATIVE TECHNOLOGY
2014 Editorial Calendar at
www.insideanalysis.com/webcasts/the-briefing-room
7. Analyst: Robin Bloor
Robin Bloor is
Chief Analyst at
The Bloor Group
robin.bloor@bloorgroup.com
@robinbloor
8. Teradata
• Teradata is known for its analytics data solutions with a focus on integrated data warehousing, big data analytics and business applications
• It offers a broad suite of technology platforms and solutions and a wide range of data management applications
• Teradata recently announced Database 15 and QueryGrid, an analytics platform that enables data processing across the enterprise
9. Guest: Imad Birouty
Imad Birouty holds the position of Manager of
Teradata Product Marketing and is responsible
for Teradata software and hardware products,
including the Teradata Database, Teradata
Platform Family, Teradata Unity, Tools and
Utilities, and In-Database Analytics. Prior to
this, Imad led the Product Management team
responsible for the NCR/Teradata Platforms
and was responsible for setting product
strategy and direction.
10. NOT YOUR FATHER’S DATABASE:
BREAKING TRADITION WITH NEW
INNOVATIONS
Imad Birouty
Teradata
11. THE CHANGING DATABASE
From Structured Data To Structured and Multi-Structured Data (XML, JSON, Weblogs)
From SQL To SQL, Java, Perl, Ruby, Python, and R
From Business Users To Business Users and Developers
From Disk Data Storage To Disk Data Storage with Solid State Drives and in-memory
From Query Single Database To Query Multi-Databases and Sources
From Reporting/Ad Hoc Queries To Reporting/Ad Hoc Queries and 1,000s of In-database Analytics Algorithms
From Row Data Storage To Hybrid Row/Column Data Storage
11 5/19/14 Copyright Teradata
12. JSON RUNS THE INTERNET OF THINGS
– AND TERADATA RUNS JSON
13. Multi-structured Data → Data Warehouse
Teradata Data Warehouse
Inputs: XML (<xml />), JSON, weblogs
Sample weblog record:
41521390 2013-01-01 00:25:42 2.111.94.18 Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_5; en-us) AppleWebKit/533.19.4 (KHTML, like Gecko) Version/5.0.3 Safari/533.19.4 "http://www.cokstate.edu/welcome/" "https://www.google.com/#sclient=psyab&hl=en&source=hp&q=oklahoma+state&pbx=1&oq"
14. Serving Two Perspectives
Business User
• New Data Elements
• New Data Sources
• Dynamic Data Sources
• Rapid Turnaround
• Agile Change
• Independence, Autonomy
IT Professional
• Consistency
• Stability
• IT Processes
• Governance and Security
• Test Cycles
• Smooth Application
Interaction
15. Early and Late Binding in SQL
Diagram: two paths from source data to BI tools
Early binding (LOAD TIME): Source data, ETL, Schema, Data Warehouse
Late binding (RUNTIME): Weblogs, CLOB, SQL + parse/extract functions
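The contrast between the two paths can be sketched in a few lines of Python, with the standard-library sqlite3 module standing in for the data warehouse. This is an illustration only, not Teradata syntax: the table names, the sample weblog records, and the `extract_field` helper are all hypothetical.

```python
import json
import sqlite3

# Hypothetical weblog records arriving as raw JSON text.
RAW_LOGS = [
    '{"ip": "2.111.94.18", "page": "/welcome/", "ms": 120}',
    '{"ip": "10.0.0.7", "page": "/welcome/", "ms": 80}',
    '{"ip": "10.0.0.7", "page": "/apply/", "ms": 200}',
]

def extract_field(doc, key):
    """Parse one raw document and return a single field (late binding)."""
    return json.loads(doc).get(key)

conn = sqlite3.connect(":memory:")
# Register the parse/extract helper so SQL can call it at query time.
conn.create_function("extract_field", 2, extract_field)

# Early binding: the schema is fixed up front and ETL parses at load time.
conn.execute("CREATE TABLE hits_early (ip TEXT, page TEXT, ms INTEGER)")
for doc in RAW_LOGS:
    rec = json.loads(doc)
    conn.execute("INSERT INTO hits_early VALUES (?, ?, ?)",
                 (rec["ip"], rec["page"], rec["ms"]))

# Late binding: store the raw text untouched and parse only at runtime.
conn.execute("CREATE TABLE hits_late (doc TEXT)")
conn.executemany("INSERT INTO hits_late VALUES (?)", [(d,) for d in RAW_LOGS])

early = conn.execute(
    "SELECT page, SUM(ms) FROM hits_early GROUP BY page ORDER BY page"
).fetchall()
late = conn.execute(
    "SELECT extract_field(doc, 'page') AS page, SUM(extract_field(doc, 'ms')) "
    "FROM hits_late GROUP BY page ORDER BY page"
).fetchall()

print(early)
print(late)
```

Both queries return the same totals per page. The early-bound table pays the parsing cost once at load time, while the late-bound table pays it on every query but accepts new fields without a schema change.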
16. Choice: Right Approach In Any Environment
Schema on Write
• Well understood data
• Relational integrity
• Storage efficiency
Schema On Read
• Dynamic data
• Reduced coordination
• Human readable
Teradata 15 now offers both
17. ALL THE COOL KIDS CODE IN
WELCOME TO THE COOLEST
ANALYTIC DBMS, DUDES.
Teradata table operators support
C++, Java, Perl, Ruby – and Python.
18. Application Developers and BI SQL Programmers
Application Developers
• Flow logic control focus
• Procedural and script
languages
• Data retrieved for use in
application processing
• Work within IDE
• Object and custom build
orientation
BI SQL Programmers
• Set-based data processing
focus
• SQL language
• Data retrieval, processing,
and presentation is the
application
• Standalone or template-based
query development
• RDBMS orientation
19. Choice of Languages
Run in-database, in parallel
• Perl
• Python
• Ruby
• R
• Shell Scripts
• C/C++
• Java
• Embed parts of application logic
in database
> Separate presentation and
processing layers
> Eliminate “round trips”
> Automatically process data in
parallel
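The idea of running application logic in the database, in parallel, can be sketched as one function applied to each data partition where it lives, with only small per-partition results returned. A minimal Python sketch, assuming hypothetical partitions and a hypothetical `score_partition` function; it imitates the pattern with a thread pool and does not use any Teradata API.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data, already split into partitions as a parallel database would hold it.
PARTITIONS = [
    [("alice", 120.0), ("bob", 35.5)],
    [("carol", 990.0), ("dave", 12.0)],
    [("erin", 450.0)],
]

def score_partition(rows, threshold=100.0):
    """Application logic that runs next to the data: keep only big spenders."""
    return [name for name, spend in rows if spend >= threshold]

def run_in_parallel(partitions):
    """Apply the same logic to every partition at once and merge the results."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        per_partition = list(pool.map(score_partition, partitions))
    # Only the filtered names travel back, not every row.
    return sorted(name for part in per_partition for name in part)

result = run_in_parallel(PARTITIONS)
print(result)
```

The application receives one merged answer instead of fetching every row and filtering it client-side, which is the "round trip" the slide refers to.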
20. IN A RECENT PETABYTE-SCALE BENCHMARK ON TERADATA TECHNOLOGY,
95% OF I/Os WERE SERVED FROM MEMORY
In-memory performance, spinning disk prices.
ANY QUESTIONS?
21. Improves Query Performance
Performance of in-memory databases without their cost
• 43% of disk I/O
against 1% of data
• Hottest data in
memory/not all the
data
• Integrated into
Teradata system
• No need for
separate appliance
Data Temperature Profile – Typical DW
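A data temperature profile of this kind can be derived from access counts: rank blocks by how often they are read, then ask what share of all I/O the hottest slice absorbs. A minimal Python sketch with invented access counts (they are not the slide's measurements) and a hypothetical `io_share_of_hottest` helper.

```python
def io_share_of_hottest(access_counts, data_fraction):
    """Return the fraction of all I/O that hits the hottest `data_fraction` of blocks."""
    ranked = sorted(access_counts, reverse=True)        # hottest first
    n_hot = max(1, round(len(ranked) * data_fraction))  # at least one block
    return sum(ranked[:n_hot]) / sum(ranked)

# Invented example: 100 blocks, one very hot, 49 warm, 50 cold.
counts = [400] + [10] * 49 + [2] * 50
share = io_share_of_hottest(counts, 0.01)
print(f"{share:.0%} of I/O hits the hottest 1% of blocks")
```

When the result is heavily skewed, as in the typical data warehouse profile on the slide, keeping only the hottest blocks in memory captures a large share of the I/O.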
22. Leveraging Extended Memory Space
Hottest data placed and maintained in memory, aged out as it cools
• Sophisticated algorithms to track usage, measure temperature, and rank data
• Complements FSG cache
• Dynamically adjusts to new query patterns
Intelligent Memory: most frequently used data (very hot in, cool out)
FSG Cache: most recently used data; temporarily stores data required for current queries, purges least recently used
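The difference between the two policies can be illustrated with a toy model: one cache evicts the least recently used block, the other keeps the blocks with the highest access counts. This is a sketch of the general idea only; the class names and the access trace are hypothetical, and the slides do not describe Teradata's actual temperature algorithms.

```python
from collections import Counter, OrderedDict

class LRUCache:
    """Recency-based cache: evicts the least recently used block."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()

    def access(self, block):
        hit = block in self.blocks
        if hit:
            self.blocks.move_to_end(block)       # mark as most recently used
        else:
            if len(self.blocks) >= self.capacity:
                self.blocks.popitem(last=False)  # drop least recently used
            self.blocks[block] = True
        return hit

class FrequencyCache:
    """Frequency-based cache: keeps the blocks with the highest access counts."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.counts = Counter()   # usage tracked for every block ever seen
        self.blocks = set()

    def access(self, block):
        hit = block in self.blocks
        self.counts[block] += 1
        if not hit:
            if len(self.blocks) < self.capacity:
                self.blocks.add(block)
            else:
                coldest = min(self.blocks, key=lambda b: self.counts[b])
                if self.counts[block] > self.counts[coldest]:
                    self.blocks.remove(coldest)  # cool data ages out
                    self.blocks.add(block)       # hotter data moves in
        return hit

def hit_rate(cache, trace):
    """Replay the trace against the cache and return the fraction of hits."""
    return sum(cache.access(b) for b in trace) / len(trace)

# Hypothetical trace: two hot blocks interleaved with a one-off scan of cold blocks.
trace = []
for i in range(50):
    trace += ["hot_a", "hot_b", f"scan_{i}", f"scan_{i + 50}"]

lru_rate = hit_rate(LRUCache(2), trace)
freq_rate = hit_rate(FrequencyCache(2), trace)
print(lru_rate, freq_rate)
```

On this trace the scan keeps flushing the hot blocks out of the recency-based cache, while the frequency-based cache holds on to them. That is the sense in which a frequency-ranked memory tier can complement a recency-based cache.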
23. 1+ Petabyte Benchmark – Impact of TIM
TIM set to 50% of total system memory
TIM showed a cache effectiveness of 95%
60% Reduction in Total Physical IO!
Clear reduction in physical I/O… but max CPU @ 100% for both benchmarks
25. Teradata QueryGrid™
Optimize, simplify, and orchestrate processing across
and beyond the Teradata UDA
• Run the right analytic on the right platform
> Take advantage of specialized processing engines while operating as a
cohesive analytic environment
• Automated and optimized work distribution through “push-down”
processing across platforms
> Minimize data movement, process data where it resides
> Minimize data duplication
> Transparently automate analytic processing and data movement
between systems
> Bi-directional data movement
• Integrated processing; within and outside the UDA
• Easy access to data and analytics through existing SQL
skills and tools
26. Teradata QueryGrid™
Diagram: the Teradata Database (IDW) and the Teradata Aster Database (Discovery) connect to:
• TERADATA DATABASE: Teradata Databases
• TERADATA ASTER DATABASE: Aster functions such as SQL-MapReduce™, graph
• HADOOP: Remote, push-down processing in Hadoop
• OTHER DATABASES: RDBMS Databases
• LANGUAGES: Leverage languages such as SAS, Perl, Python, Ruby, R
When fully implemented, the Teradata Database or the Teradata Aster Database will be able to
intelligently use the functionality and data of multiple heterogeneous processing engines
27. Teradata Database 15 – Teradata QueryGrid™
Leverage Hadoop resources, reduce data movement
• Bi-directional to Hadoop
• Query push-down
• Easy configuration of server connections
• Query through Teradata
• Sent to Hadoop through Hive
• Results returned to Teradata
• Additional processing joins data in Teradata
• Final results sent back to application/user
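The query flow listed above can be mimicked with a toy simulation: a filter and aggregation are pushed down to a remote store, only the small result comes back, and the join happens locally. Everything here is hypothetical (the `RemoteStore` class, the table contents, the function names); it illustrates push-down processing in general and is not the QueryGrid interface.

```python
from collections import defaultdict

class RemoteStore:
    """Stand-in for a remote system (e.g. Hadoop) holding raw click events."""
    def __init__(self, events):
        self.events = events
        self.rows_shipped = 0   # counts rows sent back over the "network"

    def count_clicks(self, min_date):
        """Pushed-down work: filter and aggregate remotely, return a small result."""
        counts = defaultdict(int)
        for customer_id, date in self.events:
            if date >= min_date:
                counts[customer_id] += 1
        self.rows_shipped += len(counts)
        return dict(counts)

# Hypothetical local warehouse table: customer_id -> name.
CUSTOMERS = {1: "Acme", 2: "Globex", 3: "Initech"}

# Hypothetical remote events: (customer_id, ISO date).
remote = RemoteStore([
    (1, "2014-05-01"), (1, "2014-05-02"), (2, "2014-04-15"),
    (2, "2014-05-10"), (3, "2014-03-30"), (1, "2014-05-11"),
])

def clicks_by_customer(store, min_date):
    """Query arrives locally, work is pushed down, results are joined locally."""
    remote_counts = store.count_clicks(min_date)   # remote processing
    return sorted(                                 # local join on customer_id
        (CUSTOMERS[cid], n) for cid, n in remote_counts.items() if cid in CUSTOMERS
    )

report = clicks_by_customer(remote, "2014-05-01")
print(report, remote.rows_shipped)
```

Six raw events sit in the remote store, but only the aggregated rows cross the boundary, which is the data-movement saving the slide describes.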
28. Customer Value Based on Social Influence
Use Case
HADOOP
• Determine customer sentiment
TERADATA ASTER DATABASE
• Determine customer sphere of influence
TERADATA DATABASE
• Determine high value customers based on history
• Determine customer value based on social influence
32. Everything in Flux
• Hardware (network, storage, servers)
• Data sources
• Data staging
• Data volumes
• Data flow
• Data governance
• Data usage
• Data structures
• Schema definition
• Ingest speeds
• Data workloads
33. The “Pipeline” Data Architecture
Do we take the DATA TO
THE PROCESSING…
…or the PROCESSING TO
THE DATA?
This is not a simple question
34. Data as “The New Oil”
The diagram illustrates
the fractional distillation of
crude oil
The DATA RESERVOIR/
DATA HUB concept
suggests something
similar for data
The analogy is not perfect, but
it is useful
36. In a Data Pipeline Architecture…
The structure of the data reservoir
cannot be independent of the structure
of the logical data warehouse
In our view, the whole ensemble
needs to be heuristic
37. • How do you see the future of JSON and XML?
• What is the penalty, with Teradata, for late binding in SQL?
• What do you see as the fundamental division of workload between Hadoop/Aster Data and Teradata? Or, in fact, is there one?
• Why do you think Hadoop is important from a technical perspective?
38. • Does Teradata provide any special optimization between query and analytical workloads?
• Which specific components of the Hadoop ecosystem does Teradata recommend using?