HADOOP & THE DATA WAREHOUSE:
WHEN TO USE WHICH
Steve Wooledge – Teradata Labs
Jim Walker – Hortonworks
Topics
• Trends in enterprise data architectures
• The value of an integrated data warehouse
• The value of Hadoop
• Bringing it all together and next steps
Big Data Comes with BIG HEADACHES
Even free software like Hadoop is causing
companies to spend more money…Many CIOs believe
data is inexpensive because storage has become
inexpensive. But data is inherently messy—it can be
wrong, it can be duplicative, and it can be irrelevant—
which means it requires handling, which is where the
real expenses come in.
Source: The Wall Street Journal. “CIOs’ Big Problem with Big Data”. Aug 2012
“Through 2015, 85% of Fortune 500 organizations will be unable to exploit big data for competitive advantage.”
Source: Gartner. “Information Innovation: Innovation Key Initiative Overview”. April 2012
Organizations Face Several Obstacles with Big Data
• Difficulty managing multiple systems, new types of data
• Hard to find right skills; lack of supportability for new systems & “data scientists”
• Difficulty deploying and integrating new systems
• Difficulty providing accessibility to fast insights on big data
Source: Big Analytics 2012 Survey, Teradata
Shift from a Single Platform to an Ecosystem
“Big Data requirements are solved by
a range of platforms including
analytical databases, discovery
platforms, and NoSQL solutions
beyond Hadoop.”
“We will abandon the old models
based on the desire to implement for
high-value analytic applications.”
"Logical" Data Warehouse
Source: “Big Data Comes of Age”. EMA and 9sight Consulting. Nov 2012.
TERADATA UNIFIED DATA ARCHITECTURE (diagram)
Data sources: AUDIO & VIDEO, IMAGES, TEXT, WEB & SOCIAL, MACHINE LOGS, CRM, SCM, ERP
Platforms: CAPTURE | STORE | REFINE; DISCOVERY PLATFORM; INTEGRATED DATA WAREHOUSE
Tools: LANGUAGES, MATH & STATS, DATA MINING, BUSINESS INTELLIGENCE, APPLICATIONS
Users: Engineers, Data Scientists, Business Analysts, Front-Line Workers, Customers / Partners, Marketing, Operational Systems, Executives
The Value of the Data Warehouse
Diagram labels: DUAL SYSTEMS, DATA MARTS, INDEPENDENT DATA MART, ANALYTICAL ARCHIVE, TEST/DEV, INTEGRATED DATA WAREHOUSE, DATA LAB
Users: Business Analysts, Knowledge Workers, Customers/Partners, Marketing, Executives, Front-line Workers, Operational Systems
Tools: DATA MINING, BUSINESS INTELLIGENCE, APPLICATIONS
Integrated Analytics: Advanced Analytics, Temporal, OLAP, Optimization, Geospatial
Big Data Integration, Application Development, Agile Analytics, Data Exploration
Benefits
• Easy to consume data
• Rationalization of data from multiple sources into single enterprise view
• Clean, safe, secure data
• Cross-functional analysis
• Transform once, use many
• Fast response times
SQL Advantages with an MPP RDBMS
• Full ANSI SQL:
• The lingua franca of business users when accessing data
• Decades of standardization (stable, feature rich, portable)
• Mature 3rd Party SQL based tools that provide business users with
self service direct access to the data
• BI Tools
• In-database statistical packages
• Analytic applications (CRM, SCM, MDM)
• Easily parallelized
• Scalable when manipulating large data sets
6/27/2013 9
ACID Advantages in an MPP RDBMS
• Guarantees database actions are
processed reliably
• Ensures 100% query result accuracy
• Supports updates and deletes
• Needed for applications that require
100% consistency
Atomicity - All of the pieces are committed or none are committed.
Consistency - Creates a new and valid state of data, or, if any failure occurs, returns all data to its original state.
Isolation - Processed and not yet committed transactions must remain isolated from any other transactions.
Durability - Committed data is saved such that in the event of a failure and system restart, the data is available in its correct state.
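A minimal sketch of atomicity, using Python's built-in sqlite3 module as a single-node stand-in (it is not an MPP RDBMS). The accounts table and the transfer and balances helpers are hypothetical and exist only for illustration.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` between two accounts as one atomic unit of work."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE id = ?", (src,)
            ).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")
        return True
    except ValueError:
        return False  # both updates were rolled back together

def balances(conn):
    """Return the current balances as {account_id: balance}."""
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))

# Hypothetical sample data
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("a", 100), ("b", 50)])
conn.commit()
```

A transfer that would overdraw the source account leaves both rows untouched, which is the "all or none" behavior the atomicity definition describes.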
Tight Vertical Integration
• End-to-end management of resources
• Efficient utilization of resources
• Engineered extremely well for known data
• Fine-grained parallelism and resource management
• Consistency of service level delivery
Best Practices Management:
• Workload functions
• Workload groups
• Exceptions
• Priorities
• Time periods
Low Latency Advantages of MPP RDBMS
Multi-temperature storage with automated distribution of data based on access patterns:
• In-Memory
• Solid-State Drives
• Fast Hard Drives
• Fat Hard Drives
Other low-latency features:
• Indexes
• Statistics
• Advanced partitioning
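The idea behind multi-temperature storage can be sketched as ranking data blocks by access frequency and filling the fastest tiers first. This is an illustrative toy, not how any vendor's storage manager works; the tier names, the capacities, and the assign_tiers helper are assumptions.

```python
# Tiers ordered from hottest (fastest) to coldest (largest); names are illustrative.
TIERS = ["in-memory", "solid-state", "fast-hdd", "fat-hdd"]

def assign_tiers(access_counts, capacities):
    """Place the most frequently accessed blocks on the fastest tier with room.

    access_counts: {block_id: number of recent accesses}
    capacities: block capacity of each tier except the coldest,
                which absorbs whatever is left over.
    """
    # Hottest blocks first; ties are broken by block id for determinism.
    ranked = sorted(access_counts, key=lambda b: (-access_counts[b], b))
    placement = {}
    start = 0
    for tier, cap in zip(TIERS[:-1], capacities):
        for block in ranked[start:start + cap]:
            placement[block] = tier
        start += cap
    for block in ranked[start:]:
        placement[block] = TIERS[-1]
    return placement
```

Re-running the assignment as access counts change is what moves data between tiers over time.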
Cost Based Optimizer Advantages in an MPP RDBMS
• Best practices optimizer determines how the query will be processed most efficiently, with no “hints” or degrees of parallelism necessary.
• In chess, you can look out a few moves to decide your best next move, but you can’t envision all move and countermove sequences for the entire game:
  • The Grand Master has the knowledge, experience, and intelligence to identify and use the right strategy.
  • With Hadoop, the user takes a heavy role in optimizing the execution of queries.
  • With an MPP RDBMS, the software is the optimizer.
Query Rewrite
• Semantic optimization
• Different types of vendor tools
Fast/Efficient Data Access
• Access path: indexing
• Partitioning (CP & PPI)
• Advanced partitioning schemes (range & case based, multilevel, dynamic)
• IO optimizations (efficient scans/sync scan), scan optimization
Query Complexity
• Join costing & planning
• Aggregation
Many ways to process a complex query…
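To make "join costing & planning" concrete, here is a toy cost-based planner that enumerates left-deep join orders and keeps the cheapest one. The cost model (sum of estimated intermediate result sizes), the table names, and the selectivities are all assumptions for illustration; a production optimizer is far more sophisticated.

```python
from itertools import permutations

def plan_cost(order, rows, selectivity):
    """Estimated cost of a left-deep join order: sum of intermediate result sizes."""
    joined = [order[0]]
    size = rows[order[0]]
    cost = 0.0
    for table in order[1:]:
        sel = 1.0
        for prev in joined:
            # No join predicate between two tables means selectivity 1.0 (cross product).
            sel *= selectivity.get(frozenset((prev, table)), 1.0)
        size = size * rows[table] * sel
        cost += size
        joined.append(table)
    return cost

def best_join_order(rows, selectivity):
    """Try every join order and return the one with the lowest estimated cost."""
    return min(permutations(rows), key=lambda order: plan_cost(order, rows, selectivity))

# Hypothetical statistics: row counts and join selectivities
rows = {"orders": 1_000_000, "customers": 10_000, "regions": 10}
selectivity = {
    frozenset(("orders", "customers")): 0.0001,
    frozenset(("customers", "regions")): 0.1,
}
```

With these statistics the planner joins the two small tables first and brings in the large orders table last, a decision the user never has to spell out in the query.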
Granular Security Advantages in an MPP RDBMS
• Row level security
• Column level security
• An MPP RDBMS tightly integrates mature security features
• User-level security controls
• Increased user authentication options
• Support for security roles
• Enterprise directory integration
• Auditing and monitoring controls
• Encryption
MPP RDBMS Customer Examples
© Hortonworks Inc. 2012
By the year 2015, we believe half the world's
data will be processed by Apache Hadoop
Key Hadoop Features for the EDW
•Storage/Processing
•Metadata
The World of Data is Changing
Data Explosion
• 1 Zettabyte (ZB) = 1 billion TBs
• 15x growth rate of machine generated data by 2020
Source: IDC
By 2015, organizations that build a modern information management system will outperform their peers financially by 20 percent.
– Gartner, Mark Beyer, “Information Management in the 21st Century”
Apache Hadoop: Center of Big Data Strategy
Open source data management with scale-out storage & distributed processing
Storage: HDFS
• Distributed across “nodes”
• Natively redundant
• Name node tracks locations
• Self-healing, high-bandwidth clustered storage
Processing: MapReduce
• Splits a task across processors “near” the data & assembles results
Key Characteristics
• Scalable
– Efficiently store and process
petabytes of data
– Linear scale driven by additional
processing and storage
• Reliable
– Redundant storage
– Failover across nodes and racks
• Flexible
– Store all types of data in any format
– Apply schema on analysis and
sharing of the data
• Economical
– Use commodity hardware
– Open source software guards
against vendor lock-in
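The MapReduce processing model on this slide (split a task, process each piece near its data, assemble the results) can be sketched in a single process. This is a conceptual word-count illustration in plain Python, not the Hadoop API; the function names are hypothetical.

```python
from collections import defaultdict

def map_phase(split):
    """Mapper: emit (word, 1) for each word in one input split."""
    return [(word.lower(), 1) for line in split for word in line.split()]

def shuffle(pairs):
    """Group intermediate values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reducer: combine all values for a key into a single result."""
    return {key: sum(values) for key, values in groups.items()}

def word_count(splits):
    # In a real cluster each split would be mapped on the node that stores it.
    intermediate = [pair for split in splits for pair in map_phase(split)]
    return reduce_phase(shuffle(intermediate))
```

Because each split is mapped independently, adding nodes adds both storage and processing capacity, which is the basis of the linear-scale claim above.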
HCatalog
Table access
Aligned metadata
REST API
• Raw Hadoop data
• Inconsistent, unknown
• Tool specific access
Apache HCatalog provides flexible metadata services across tools and external access
Metadata Services
• Consistency of metadata and data models across tools (MapReduce, Pig, HBase and Hive)
• Accessibility: share data as tables in and out of HDFS
• Availability: enables flexible, thin-client access via REST API
Shared table and schema management opens the platform
An Open Apache Community
Fastest path to innovation is an open community
“How to” deliver an open source enterprise product
• Identify requirements
• Open community delivery
• Enterprise rigor
Delivery cycle: Design & Develop, Test & Patch, Release
Projects: Apache Hadoop, Apache Pig, Apache HCatalog, Apache HBase, Apache Hive, Apache Ambari, other Apache projects
Big Data: It’s About Scale & Structure
                RDBMS / EDW / MPP                 Hadoop / NoSQL
schema          Required on write                 Required on read
speed           Reads are fast                    Writes are fast
governance      Standards and structured          Loosely structured
processing      Limited, no data processing       Processing coupled with data
data types      Structured                        Multi and unstructured
best fit use    Interactive OLAP Analytics;       Data Discovery;
                Complex ACID Transactions;        Processing unstructured data;
                Operational Data Store            Massive Storage/Processing
cost            Software License                  Support only
resources       Known entity                      Growing, complexities, wide
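The schema-on-write versus schema-on-read distinction in this comparison can be illustrated with a short sketch. The two-column SCHEMA and both helper functions are hypothetical; the point is only where validation happens.

```python
import json

SCHEMA = {"user_id": int, "amount": float}  # hypothetical two-column schema

def write_with_schema(table, record):
    """Schema on write: validate and coerce before the record is stored."""
    row = {col: typ(record[col]) for col, typ in SCHEMA.items()}  # raises on bad data
    table.append(row)

def read_with_schema(raw_lines):
    """Schema on read: raw lines are stored untouched and interpreted at query time."""
    for line in raw_lines:
        try:
            record = json.loads(line)
            yield {col: typ(record[col]) for col, typ in SCHEMA.items()}
        except (ValueError, KeyError, TypeError):
            continue  # malformed records are only discovered (here: skipped) when read
```

Schema on write pays the validation cost at load time so reads are fast and clean; schema on read makes loading cheap and defers the cost of messy data to every query.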
An Emerging Data Architecture (diagram)
Applications: Business Analytics, Custom Applications, Enterprise Applications
Data systems: Traditional repos (RDBMS, EDW, MPP); Hortonworks Data Platform
Data sources: Traditional sources (RDBMS, OLTP, OLAP); New sources (web logs, email, sensor data, social media); mobile data; OLTP, POS systems
Operational tools: manage & monitor
Dev & data tools: build & test
Interoperating With Your Tools (diagram)
Applications: Microsoft Applications
Data systems: Hadoop
Data sources: Traditional sources (RDBMS, OLTP, OLAP); New sources (web logs, email, sensor data, social media); mobile data; OLTP, POS systems
Operational tools: Viewpoint
Dev & data tools
By the year 2015, we believe half the world's
data will be processed by Apache Hadoop
Key Hadoop Features for the EDW
•Storage/Processing
•Metadata
•FAMILIARITY
Confidential and proprietary. Copyright © 2013 Teradata Corporation.30
Teradata Unified Data Architecture
• Hadoop: Collect ALL interaction data
• Teradata Aster: Discover customer behavioral patterns
• Teradata: Operationalize insights
The right technology on the right analytical problems using best of breed technologies
Improved Customer Service and Retention (data flow diagram)
Data sources: multi-structured raw data (call center voice records, check images, social feeds, clickstream data)
Capture, Store and Refine Layer: Hadoop captures, stores and transforms social, images and call records; call data
Discovery Platform: path, pattern & graph analysis; sentiment scores
Integrated DW: dimensional data, analytic results
Traditional data flow: ETL tools
Analysis + Marketing Automation (Customer Campaign)
Teradata Workload-Specific Platforms
• Data Mart Appliance (670): up to 12TB. Test/development or smaller data marts
• Extreme Data Appliance (1650): up to 186PB. Analytical archive, deep dive analytics
• Data Warehouse Appliance (2700): up to 1.6PB. Strategic intelligence, decision support system, fast scan
• Active Enterprise Data Warehouse (6700): up to 61PB. Strategic & operational intelligence, real time update, active workloads
• Appliance for Hadoop: up to 10PB. Appliance for storing, capturing and refining data. Hortonworks HDP 1.1
• Aster Big Analytics Appliance: up to 5PB. Discovery platform for big data analytics with embedded SQL MapReduce for new data types & sources
• SAS High Performance Analytics (700): up to 52TB. Dedicated appliance for SAS high-performance analytic model development
Teradata Unified Data Architecture: connectors between platforms
• SQL-H
• Aster-Teradata Connector
• Aster Connector for Hadoop
• Teradata Connector for Hadoop
Teradata SQL-H™
A Business User’s Bridge to Access Hadoop Data
Teradata SQL-H gives business users a better way to access data stored in Hadoop
• Trusted: Use existing tools/skills and enable self-service BI with granular security
• Allow standard ANSI SQL access to Hadoop data
• Fast: Queries run on Teradata, data accessed from Hadoop
• Efficient: Intelligent data access leveraging the Hadoop HCatalog
Diagram labels: Teradata: SQL-H; HCatalog; Hadoop layer: HDFS, Pig, Hive, Hadoop MR; data; data filtering
Aster Discovery Portfolio: Accelerate Time to Insights
The App Store of Big Data
Some of the 80+ out-of-the-box analytical apps:
• PATH ANALYSIS: Discover patterns in rows of sequential data
• TEXT ANALYSIS: Derive patterns and extract features in textual data
• STATISTICAL ANALYSIS: High-performance processing of common statistical calculations
• SEGMENTATION: Discover natural groupings of data points
• MARKETING ANALYTICS: Analyze customer interactions to optimize marketing decisions
• DATA TRANSFORMATION: Transform data for more advanced analysis
• GRAPH ANALYSIS: Graph analytics processing and visualization
• SQL-MAPREDUCE VISUALIZATION: Graphing and visualization tools linked to key functions of the MapReduce analytics library
Big Data Analytics & Discovery
Example Customers: Teradata Aster Big Analytics Appliance
XL Axiata
Discovering Deep Insights in Retail
Transforming Web Walks into DNA Sequences
Situation
Large retailer with 700M visits/year; 2M customers/day look at 1M products online
Problem
Increase ability of web content owners to self-serve insights
Solution
Treat web walks like DNA sequences of simple patterns.
Impact
• Data: loaded logs into Hortonworks
  • Loaded 2 months of raw data in 1 hour, vs. 1 day on old system
  • Can load a day’s log data in 60 sec
• Sessionize: Creates sequence for visit, e.g., boils 20 customer clicks down to 1 line:
  • <Home - Search - Look at Product - Add to Basket - Pay - Exit>
• Analyze: Business analysts can now do path analysis
• Act:
  • Segmentations by behavior can increase conversion rates by 5-10%.
  • Web design changes can drive another 10-20% more visitors into the sales funnel
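The sessionize step described above, collapsing a visit's clicks into one line, can be sketched in plain Python. This is not the Aster function used in the case study; the (visitor, timestamp, page) input layout and the 30-minute inactivity timeout are assumptions.

```python
from collections import defaultdict

def sessionize(clicks, gap_seconds=1800):
    """Collapse each visit's clicks into one path string.

    clicks: iterable of (visitor_id, unix_timestamp, page) tuples.
    A new session starts when a visitor is idle longer than gap_seconds.
    Returns a list of (visitor_id, "Page - Page - ...") tuples.
    """
    by_visitor = defaultdict(list)
    for visitor, ts, page in clicks:
        by_visitor[visitor].append((ts, page))

    sessions = []
    for visitor in sorted(by_visitor):
        events = sorted(by_visitor[visitor])  # order each visitor's clicks by time
        path = [events[0][1]]
        last_ts = events[0][0]
        for ts, page in events[1:]:
            if ts - last_ts > gap_seconds:
                # Idle too long: close the current session and start a new one.
                sessions.append((visitor, " - ".join(path)))
                path = []
            path.append(page)
            last_ts = ts
        sessions.append((visitor, " - ".join(path)))
    return sessions
```

Each output row is one visit, which is the form path analysis needs as input.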
Example: Online Checkout Flow Analysis
• Customers who have reached the checkout process follow an “ideal path”.
• deliveryslots > deliveryinformation > coupons > substitutions > paymentinfo > orderconfirmation
• Determine how and when (and ultimately, why) customers deviate from this path.
• Discover obstacles preventing purchase and optimize visitor flow through the web site.
• The Aster SQL-MapReduce Framework enables a variety of different path visualizations.
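A rough sketch of the "how and when customers deviate" question, written in plain Python rather than Aster SQL-MapReduce. The ideal path is taken from the slide; the session format (a list of page names) and both helper functions are assumptions.

```python
from collections import Counter

IDEAL_PATH = ["deliveryslots", "deliveryinformation", "coupons",
              "substitutions", "paymentinfo", "orderconfirmation"]

def first_deviation(session):
    """Return (last ideal step reached, page visited instead).

    Returns None when every page in the session follows the ideal path.
    """
    for step, page in enumerate(session):
        if step >= len(IDEAL_PATH) or page != IDEAL_PATH[step]:
            last_ok = session[step - 1] if step > 0 else None
            return last_ok, page
    return None

def deviation_counts(sessions):
    """Tally where sessions leave the ideal path: {(last_step, next_page): count}."""
    return Counter(d for d in map(first_deviation, sessions) if d is not None)
```

The most common (last step, next page) pairs point at the obstacles worth investigating first.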
Teradata Portfolio for Hadoop
“Taking Hadoop from Silicon Valley to Main Street”
Most Trusted & Flexible Hadoop Platforms for Your Next-Generation
Unified Data Architecture™
1. Teradata Aster Big Analytics Appliance
2. Teradata Appliance for Hadoop
3. Teradata Commodity Offering for Hadoop (Dell)
4. Teradata Software-only for Hadoop (Hortonworks Data Platform)
Complete consulting and training capability
• Big Analytics Services – across the UDA
• Data Integration Optimization – ETL, ELT across the UDA
• Hadoop deployment & mentoring
• Teradata delivering Hortonworks training
• Hadoop Managed Services - operations & administration
Customer Support for Hadoop
• World-class Teradata customer support, backed by Hortonworks
What We Announced Today
Teradata Appliance for Hadoop
Value-Added Software Bringing Hadoop to Enterprise
• Access: SQL-H™
• Management: Viewpoint, TVI
• Administration: Hadoop Builder, intelligent start/stop, DataNode swap, deferred drive replace
• High Availability: NameNode HA, Master Machine Failover
• Refining, Metadata, Entity Resolution
• Security & Data Access: HCatalog, Kerberos
Complete Consulting and Training Capability
Post-sale services and areas of focus:
• Teradata Analytic Architecture Services: Services to scope, design, build, operate and maintain an optimal UDA approach for Teradata, Aster, and Hadoop
• Teradata DI Optimization: Assess structured/non-structured data, discuss data loading techniques, determine best platform, optimize load scripts/processes
• Teradata Big Analytics: Assess data value/cost of capture, identify source of “exhaust” data, create conceptual architecture, refine and enrich the data, implement initial analytics in Aster or best-fit tool
• Teradata Workshop for Hadoop: Introduction workshop (across all of UDA)
• Teradata Data Staging for Hadoop: Load data into landing area; set up data exploration/refining area; scope architecture and analytics; set up Hadoop repository; load sample data
• Teradata Platform for Hadoop: Installation guidance and mentoring for Hadoop platform, D-I-Y after installation
• Teradata Managed Services for Hadoop: Operations, management, administration, backup, security, process control for Hadoop
• Teradata Training Courses for Hadoop: Two comprehensive, multi-day training offerings: 1) Administration of Apache Hadoop and 2) Developing Solutions Using Apache Hadoop
When to Use Which?
The best approach by workload and data type
Processing as a Function of Schema Requirements and Stage of Data Pipeline
Stage of data pipeline | Stable Schema | Evolving Schema | Format, No Schema
Low Cost Storage and Fast Loading | Teradata/Hadoop | Hadoop | Hadoop
Data Pre-Processing, Refining, Cleansing | Teradata | Aster/Hadoop | Hadoop
“Simple math at scale” (Score, filter, sort, avg., count...) | Teradata | Aster/Hadoop | Hadoop
Joins, Unions, Aggregates | Teradata | Aster | Aster
Analytics (Iterative and data mining) | Teradata | Aster (SQL + MapReduce Analytics) | Aster (MapReduce Analytics)
Reporting | Teradata | Aster | Aster
Example workloads and data types:
Financial Analysis, Ad-Hoc/OLAP
Enterprise-Wide BI and Reporting
Spatial/Temporal
Active Execution
Interactive Data Discovery
Web Clickstream, Set-Top Box Analysis
CDRs, Sensor Logs, JSON
Social Feeds, Text, Image Processing
Audio/Video Storage and Refining
Storage and Batch Transformations
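One way to use the matrix on this slide is as a simple lookup keyed by schema stability and pipeline stage. The cell values below are one reading of the slide's grid, and the key names and the recommend helper are hypothetical.

```python
# Pipeline stages, in the order they appear on the slide (key names are illustrative).
STAGES = ["storage_and_loading", "preprocessing", "simple_math",
          "joins_aggregates", "analytics", "reporting"]

# One reading of the slide's grid: platform per (schema type, stage).
MATRIX = {
    "stable":    ["Teradata/Hadoop", "Teradata", "Teradata", "Teradata", "Teradata", "Teradata"],
    "evolving":  ["Hadoop", "Aster/Hadoop", "Aster/Hadoop", "Aster", "Aster", "Aster"],
    "no_schema": ["Hadoop", "Hadoop", "Hadoop", "Aster", "Aster", "Aster"],
}

def recommend(schema, stage):
    """Return the platform the matrix suggests for a schema type and pipeline stage."""
    return MATRIX[schema][STAGES.index(stage)]
```

For example, recommend("evolving", "preprocessing") gives "Aster/Hadoop", while anything with a stable schema beyond raw storage points to Teradata.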
Questions
and Answers
Thank You!

Hadoop and the Data Warehouse: When to Use Which

  • 1.
    HADOOP & THEDATA WAREHOUSE: WHEN TO USE WHICH Steve Wooledge – Teradata Labs Jim Walker – Hortonworks 1
  • 2.
    Topics • Trends inenterprise data architectures • The value of an integrated data warehouse • The value of Hadoop • Bringing it all together and next steps
  • 3.
    Big Data Comeswith BIG HEADACHES Even free software like Hadoop is causing companies to spend more money…Many CIOs believe data is inexpensive because storage has become inexpensive. But data is inherently messy—it can be wrong, it can be duplicative, and it can be irrelevant— which means it requires handling, which is where the real expenses come in. “ ” Through 2015, 85% of Fortune 500 organizations will be unable to exploit big data for competitive advantage. “ ”Source: The Wall Street Journal. “CIOs’ Big Problem with Big Data”. Aug 2012 Source: Gartner. “Information Innovation: Innovation Key Initiative Overview”. April 2012
  • 4.
    Organizations Face SeveralObstacles with Big Data Source: Big Analytics 2012 Survey, Teradata Difficulty managing multiple systems, new types of data Hard to find right skills; Lack of supportability for new systems & “data scientists” Difficulty deploying and integrating new systems Difficulty providing accessibility to fast insights on big data
  • 5.
    Shift from aSingle Platform to an Ecosystem “Big Data requirements are solved by a range of platforms including analytical databases, discovery platforms, and NoSQL solutions beyond Hadoop.” “We will abandon the old models based on the desire to implement for high-value analytic applications.” "Logical" Data Warehouse Source: “Big Data Comes of Age”. EMA and 9sight Consulting. Nov 2012.
  • 6.
    AUDIO & VIDEOIMAGES TEXT WEB & SOCIAL MACHINE LOGS CRM SCM ERP DISCOVERY PLATFORM CAPTURE | STORE | REFINE INTEGRATED DATA WAREHOUSE LANGUAGES MATH & STATS DATA MINING BUSINESS INTELLIGENCE APPLICATIONS Engineers Data Scientists Business Analysts Front-Line WorkersCustomers / PartnersMarketing Operational SystemsExecutives TERADATA UNIFIED DATA ARCHITECTURE
  • 7.
    Topics • Trends inenterprise data architectures • The value of an integrated data warehouse • The value of Hadoop • Bringing it all together and next steps
  • 8.
    DUAL SYSTEMS DATA MARTS ANALYTICAL ARCHIVE TEST/ DEV The Value ofThe Data Warehouse INDEPENDENT DATA MART Business Analysts Knowledge Workers DATA MININGBUSINESS INTELLIGENCE APPLICATIONS Customers/Partners Marketing Executives Front-line Workers Operational Systems INTEGRATED DATA WAREHOUSE DATA LAB Integrated Analytics Advanced Analytics Temporal OLAP Optimization Geospatial Big Data Integration Application Development Agile Analytics Data Exploration Benefits •Easy to consume data •Rationalization of data from multiple sources into single enterprise view •Clean, safe, secure data •Cross-functional analysis •Transform once, use many •Fast response times
  • 9.
    SQL Advantages withan MPP RDBMS • Full ANSI SQL: • The lingua franca of business users when accessing data • Decades of standardization (stable, feature rich, portable) • Mature 3rd Party SQL based tools that provide business users with self service direct access to the data • BI Tools • In-database statistical packages • Analytic applications (CRM, SCM, MDM) • Easily parallelized • Scalable when manipulating large data sets 6/27/2013 9
  • 10.
    ACID Advantages inan MPP RDBMS • Guarantees database actions are processed reliably • Ensures 100% query result accuracy • Supports updates and deletes • Needed for applications that require 100% consistency 6/27/2013 10 Atomicity - All of the pieces are committed or none are committed. Consistency - Creates a new and valid state of data, or, if any failure occurs, returns all data to its original state. Isolation - Processed and not yet committed transactions must remain isolated from any other transactions. Durability - Committed data is saved such that in event of a failure and system restart, the data is available in its correct state.
  • 11.
    Tight Vertical Integration •End-to-end management of resources • Efficient utilization of resources • Engineered extremely well for known data • Fine-grained parallelism and resource management • Consistency of service level delivery Best Practices Management: • Workload functions • Workload groups • Exceptions • Priorities • Time periods
  • 12.
    Low Latency Advantagesof MPP RDBMS Multi-temperature storage with automated distribution of data based on access patterns: • In-Memory • Solid-State Drives • Fast Hard Drives • Fat Hard Drives 6/27/2013 12 • Indexes • Statistics • Advanced partitioning
  • 13.
    Cost Based OptimizerAdvantages in an MPP RDBMS • Best practices optimizer determines how the query will be processed most efficiently, with no “hints” or degrees of parallelism necessary. • In chess, you can look out a few moves to decide your best next move, but you can’t envision all move and countermove sequences for the entire game: • The Grand Master has the knowledge, experience, and intelligence to identify and use the right strategy. • With Hadoop, the user takes a heavy role in optimizing the execution of queries. • With an MPP RDBMS, the software is the optimizer. 6/27/2013 13 Query Rewrite • semantic optimization • different types of vendor tools Fast/Efficient Data Access • Access path - Indexing • Partitioning (CP & PPI) • Advanced partitioning schemes (range & case based, multilevel, dynamic) • IO Optimizations (efficient scans/sync scan) scan optimization Query Complexity • Join costing & planning • Aggregation Many ways to process a complex query…
  • 14.
    Granular Security Advantagesin an MPP RDBMS • Row level security • Column level security • An MPP RDBMS tightly integrates mature security features • User-level security controls • Increased user authentication options • Support for security roles • Enterprise directory integration • Auditing and monitoring controls • Encryption 6/27/2013 14
  • 15.
    MPP RDBMS CustomerExamples 6/27/2013 15
  • 16.
    Topics • Trends inenterprise data architectures • The value of an integrated data warehouse • The value of Hadoop • Bringing it all together and next steps
  • 17.
    © Hortonworks Inc.2012 By the year 2015, we believe half the worlds data will be processed by Apache Hadoop Key Hadoop Features for the EDW •Storage/Processing •Metadata
  • 18.
    © Hortonworks Inc.2012 Data Explosion The World of Data is Changing Page 18 By 2015, organizations that build a modern information management system will outperform their peers financially by 20 percent. – Gartner, Mark Beyer, “Information Management in the 21st Century” 1 Zettabyte (ZB) = 1 Billion TBs 15x growth rate of machine generated data by 2020 Source: IDC
  • 19.
    © Hortonworks Inc.2012 StorageApache Hadoop: Center of Big Data Strategy Open Source data management with scale-out storage & distributed processing Page 19 HDFS • Distributed across “nodes” • Natively redundant • Name node tracks locations Processing Map Reduce • Splits a task across processors “near” the data & assembles results • Self-Healing, High Bandwidth Clustered Storage Key Characteristics • Scalable – Efficiently store and process petabytes of data – Linear scale driven by additional processing and storage • Reliable – Redundant storage – Failover across nodes and racks • Flexible – Store all types of data in any format – Apply schema on analysis and sharing of the data • Economical – Use commodity hardware – Open source software guards against vendor lock-in
  • 20.
    © Hortonworks Inc.2012 HCatalog Table access Aligned metadata REST API • Raw Hadoop data • Inconsistent, unknown • Tool specific access Apache HCatalog provides flexible metadata services across tools and external access Metadata Services • Consistency of metadata and data models across tools (MapReduce, Pig, HBase and Hive) • Accessibility: share data as tables in and out of HDFS • Availability: enables flexible, thin-client access via REST API Shared table and schema management opens the platform
  • 21.
    © Hortonworks Inc.2012 Page 21 “how to” deliver an open source enterprise product • Identify requirements • Open community delivery • Enterprise rigor Apache Hadoop Test & Patch Design & Develop Release Apache Pig Apache HCatalo g Apache HBase Other Apache Projects Apache Hive Apache Ambari An Open Apache Community Fastest path to innovation is an open community
  • 22.
    © Hortonworks Inc.2012 Big Data: It’s About Scale & Structure Page 22 RDBMS HadoopNoSQLMPPEDW best fit use schemaRequired on write Required on read speedReads are fast Writes are fast governanceStandards and structured Loosely structured processingLimited, no data processing Processing coupled with data data typesStructured Multi and unstructured Interactive OLAP Analytics Complex ACID Transactions Operational Data Store Data Discovery Processing unstructured data Massive Storage/Processing costSoftware License Support only resourcesKnown entity Growing, complexities, wide
  • 23.
    © Hortonworks Inc.2012 An Emerging Data Architecture Page 23 APPLICATIONSDATASYSTEMS TRADITIONAL REPOS RDBMS EDW MPP DATASOURCES MOBILE DATA OLTP, POS SYSTEMS OPERATIONAL TOOLS MANAGE & MONITOR Traditional Sources (RDBMS, OLTP, OLAP) New Sources (web logs, email, sensor data, social media) DEV & DATA TOOLS BUILD & TEST Business Analytics Custom Applications Enterprise Applications HORTONWORKS DATA PLATFORM
  • 24.
    © Hortonworks Inc.2012 Interoperating With Your Tools Page 24 APPLICATIONSDATASYSTEMS DEV & DATA TOOLS OPERATIONAL TOOLS Viewpoint Microsoft Applications HADOOP DATASOURCES MOBILE DATA OLTP, POS SYSTEMS Traditional Sources (RDBMS, OLTP, OLAP) New Sources (web logs, email, sensor data, social media)
  • 25.
    AUDIO & VIDEOIMAGES TEXT WEB & SOCIAL MACHINE LOGS CRM SCM ERP DISCOVERY PLATFORM CAPTURE | STORE | REFINE INTEGRATED DATA WAREHOUSE LANGUAGES MATH & STATS DATA MINING BUSINESS INTELLIGENCE APPLICATIONS Engineers Data Scientists Business Analysts Front-Line WorkersCustomers / PartnersMarketing Operational SystemsExecutives TERADATA UNIFIED DATA ARCHITECTURE
  • 26.
    © Hortonworks Inc.2012 By the year 2015, we believe half the worlds data will be processed by Apache Hadoop Key Hadoop Features for the EDW •Storage/Processing •Metadata
  • 27.
    © Hortonworks Inc.2012 By the year 2015, we believe half the worlds data will be processed by Apache Hadoop Key Hadoop Features for the EDW •Storage/Processing •Metadata •FAMILIARITY
  • 28.
    Organizations Face SeveralObstacles with Big Data Source: Big Analytics 2012 Survey, Teradata Difficulty managing multiple systems, new types of data Hard to find right skills; Lack of supportability for new systems & “data scientists” Difficulty deploying and integrating new systems Difficulty providing accessibility to fast insights on big data
  • 29.
    Topics • Trends inenterprise data architectures • The value of an integrated data warehouse • The value of Hadoop • Bringing it all together and next steps
  • 30.
    Confidential and proprietary.Copyright © 2013 Teradata Corporation.30 Teradata Unified Data Architecture • Hadoop - Collect ALL interaction data • Teradata Aster - Discovery customer behavioral patterns • Teradata - Operationalize Insights The right technology on the right analytical problems using best of breed technologies
  • 31.
    Confidential and proprietary.Copyright © 2013 Teradata Corporation.31 Improved Customer Service and Retention Hadoop captures, stores and transforms social, images and call records Path, pattern & graph analysis Data Sources Multi-Structured Raw Data Call Center Voice Records Check Images Traditional Data Flow Analysis + Marketing Automation (Customer Campaign) Capture, Store and Refine Layer ETL Tools Hadoop Call Data Integrated DW DimensionalData AnalyticResults Discovery Platform Sentiment Scores SOCIAL FEEDS CLICKSTREAM DATA
  • 32.
    Confidential and proprietary.Copyright © 2013 Teradata Corporation.32 Teradata Workload-Specific Platforms 670 1650 2700 6700 Data Mart Appliance Extreme Data Appliance Data Warehouse Appliance Active Enterprise Data Warehouse Appliance for Hadoop Aster Big Analytics Appliance SAS High Performance Analytics Scale Up to 12TB Up to 186PB Up to 1.6PB Up to 61PB Up to 10PB Up to 5PB Up to 52TB Work- loads Test / Development or Smaller Data Marts Analytical Archive, Deep Dive Analytics Strategic Intelligence, Decision Support System, Fast Scan Strategic & Operational Intelligence, Real Time Update, Active workloads Appliance for Storing, Capturing and Refining Data. Hortonworks HDP 1.1 Discovery Platform for Big Data Analytics with embedded SQL MapReduce for new data types & sources Dedicated appliance for SAS high- performance analytic model development 700
  • 33.
    Teradata Unified Data Architecture • Hadoop - Collect ALL interaction data • Teradata Aster - Discover customer behavioral patterns • Teradata - Operationalize insights. The right technology on the right analytical problems, using best-of-breed technologies. Connectors shown between the platforms: SQL-H, Aster-Teradata Connector, Aster Connector for Hadoop, Teradata Connector for Hadoop
  • 34.
    Teradata SQL-H™: A Business User’s Bridge to Access Hadoop Data. Teradata SQL-H gives business users a better way to access data stored in Hadoop • Trusted: Use existing tools/skills and enable self-service BI with granular security • Allow standard ANSI SQL access to Hadoop data • Fast: Queries run on Teradata, data accessed from Hadoop • Efficient: Intelligent data access leveraging the Hadoop HCatalog
    [Diagram labels: Hadoop Layer: HDFS, Pig, Hive, Hadoop MR, HCatalog; Teradata: SQL-H; Data; Data Filtering]
  • 35.
    The App Store of Big Data
    Aster Discovery Portfolio: Accelerate Time to Insights. Some of the 80+ out-of-the-box analytical apps:
    • Path Analysis: Discover patterns in rows of sequential data
    • Text Analysis: Derive patterns and extract features in textual data
    • Statistical Analysis: High-performance processing of common statistical calculations
    • Segmentation: Discover natural groupings of data points
    • Marketing Analytics: Analyze customer interactions to optimize marketing decisions
    • Data Transformation: Transform data for more advanced analysis
    • Graph Analysis: Graph analytics processing and visualization
    • SQL-MapReduce Visualization: Graphing and visualization tools linked to key functions of the MapReduce analytics library
  • 36.
    Big Data Analytics & Discovery Example Customers: Teradata Aster Big Analytics Appliance XL Axiata
  • 37.
    Discovering Deep Insights in Retail: Transforming Web Walks into DNA Sequences
    • Situation: Large retailer with 700M visits/year; 2M customers/day look at 1M products online
    • Problem: Increase ability of web content owners to self-serve insights
    • Solution: Treat web walks like DNA sequences of simple patterns
    • Impact:
      • Data: Loaded logs into Hortonworks. Loaded 2 months of raw data in 1 hour, vs. 1 day on old system. Can load a day’s log data in 60 sec
      • Sessionize: Creates a sequence for each visit, e.g., boils 20 customer clicks down to 1 line: <Home - Search - Look at Product - Add to Basket - Pay - Exit>
      • Analyze: Business analysts can now do path analysis
      • Act: Segmentations by behavior can increase conversion rates by 5-10%. Web design changes can drive another 10-20% more visitors into the sales funnel
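The "Sessionize" step can be illustrated with a small sketch. This is not the Aster or Hadoop implementation from the deck, only a plain-Python illustration of the idea of collapsing one visit's clicks into a single path line. The record layout `(visitor_id, timestamp_seconds, page)`, the sample clicks, and the 30-minute inactivity timeout are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical click records: (visitor_id, timestamp_seconds, page).
clicks = [
    ("v1", 0, "Home"),
    ("v1", 20, "Search"),
    ("v1", 45, "Look at Product"),
    ("v1", 80, "Add to Basket"),
    ("v1", 120, "Pay"),
    ("v1", 150, "Exit"),
    ("v2", 10, "Home"),
    ("v2", 5000, "Search"),  # long gap, so this click starts a new visit
]


def sessionize(records, timeout=1800):
    """Group clicks per visitor into visits split by an inactivity timeout.

    Returns a list of (visitor_id, path_string), one entry per visit.
    The timeout (in seconds) is an assumed parameter, not from the deck.
    """
    by_visitor = defaultdict(list)
    for visitor, ts, page in records:
        by_visitor[visitor].append((ts, page))

    sessions = []
    for visitor in sorted(by_visitor):
        current = []
        last_ts = None
        # Walk each visitor's clicks in time order.
        for ts, page in sorted(by_visitor[visitor]):
            if last_ts is not None and ts - last_ts > timeout:
                # Inactivity gap exceeded: close the current visit.
                sessions.append((visitor, " - ".join(current)))
                current = []
            current.append(page)
            last_ts = ts
        if current:
            sessions.append((visitor, " - ".join(current)))
    return sessions


paths = sessionize(clicks)
for visitor, path in paths:
    print(visitor, "<" + path + ">")
```

Once each visit is a single line, path analysis reduces to pattern matching over these sequences, which is what makes the "DNA sequence" analogy on the slide workable.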
  • 38.
    Example: Online Checkout Flow Analysis
    • Customers who have reached the checkout process follow an “ideal path”: deliveryslots > deliveryinformation > coupons > substitutions > paymentinfo > orderconfirmation
    • Determine how and when (and ultimately, why) customers deviate from this path
    • Discover obstacles preventing purchase and optimize visitor flow through the web site
    • The Aster SQL-MapReduce Framework enables a variety of different path visualizations
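A rough sketch of the deviation question, assuming visits have already been reduced to ordered lists of page names. The ideal path is taken from the slide; the function name `first_deviation`, the `<start>` and `<exit>` markers, and the sample visits are hypothetical. It is plain Python, not the Aster SQL-MapReduce functions the deck refers to.

```python
from collections import Counter

# Ideal checkout path, as listed on the slide.
IDEAL_PATH = [
    "deliveryslots", "deliveryinformation", "coupons",
    "substitutions", "paymentinfo", "orderconfirmation",
]


def first_deviation(path, ideal=IDEAL_PATH):
    """Return (last_ideal_step, next_page) where a visit leaves the ideal path.

    Returns None if the visit follows the ideal path to the end.
    next_page is "<exit>" when the visit stops before order confirmation.
    """
    for i, expected in enumerate(ideal):
        previous = ideal[i - 1] if i else "<start>"
        if i >= len(path):
            return (previous, "<exit>")
        if path[i] != expected:
            return (previous, path[i])
    return None


# Hypothetical visits, invented for illustration.
visits = [
    list(IDEAL_PATH),
    ["deliveryslots", "deliveryinformation", "coupons", "help"],
    ["deliveryslots", "deliveryinformation", "coupons"],
    ["deliveryslots", "deliveryinformation", "coupons", "help", "paymentinfo"],
]

# Count where visits leave the ideal path and what they do next.
deviations = Counter(
    d for d in (first_deviation(v) for v in visits) if d is not None
)
for (step, nxt), count in deviations.most_common():
    print(f"after {step}: {nxt} ({count} visits)")
```

Counting deviations by (last ideal step, next page) answers the "how and when" part of the question; the "why" still needs an analyst to look at the pages that show up most often.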
  • 39.
    What We Announced Today: Teradata Portfolio for Hadoop. “Taking Hadoop from Silicon Valley to Main Street”
    Most Trusted & Flexible Hadoop Platforms for Your Next-Generation Unified Data Architecture™
    1. Teradata Aster Big Analytics Appliance
    2. Teradata Appliance for Hadoop
    3. Teradata Commodity Offering for Hadoop (Dell)
    4. Teradata Software-only for Hadoop (Hortonworks Data Platform)
    Complete consulting and training capability
    • Big Analytics Services - across the UDA
    • Data Integration Optimization - ETL, ELT across the UDA
    • Hadoop deployment & mentoring
    • Teradata delivering Hortonworks training
    • Hadoop Managed Services - operations & administration
    Customer Support for Hadoop
    • World-class Teradata customer support, backed by Hortonworks
  • 40.
    Teradata Appliance for Hadoop: Value-Added Software Bringing Hadoop to Enterprise
    • Access: SQL-H™
    • Management: Viewpoint, TVI
    • Administration: Hadoop Builder, intelligent start/stop, DataNode swap, deferred drive replace
    • High Availability: NameNode HA, Master Machine Failover
    • Refining, Metadata, Entity Resolution; Security & Data Access: HCatalog, Kerberos
  • 41.
    6/27/2013 Teradata Confidential
    Complete Consulting and Training Capability (post-sale service: areas of focus)
    • Teradata Analytic Architecture Services: Services to scope, design, build, operate and maintain an optimal UDA approach for Teradata, Aster, and Hadoop
    • Teradata DI Optimization: Assess structured/non-structured data, discuss data loading techniques, determine best platform, optimize load scripts/processes
    • Teradata Big Analytics: Assess data value/cost of capture, identify source of “exhaust” data, create conceptual architecture, refine and enrich the data, implement initial analytics in Aster or best-fit tool
    • Teradata Workshop for Hadoop: Introduction workshop (across all of UDA)
    • Teradata Data Staging for Hadoop: Load data into landing area; set up data exploration/refining area; scope architecture and analytics; set up Hadoop repository; load sample data
    • Teradata Platform for Hadoop: Installation guidance and mentoring for Hadoop platform, D-I-Y after installation
    • Teradata Managed Services for Hadoop: Operations, management, administration, backup, security, process control for Hadoop
    • Teradata Training Courses for Hadoop: Two comprehensive, multi-day training offerings: 1) Administration of Apache Hadoop and 2) Developing Solutions Using Apache Hadoop
  • 42.
    When to Use Which? The best approach by workload and data type
    Processing as a Function of Schema Requirements and Stage of Data Pipeline
    Pipeline stages: (1) Low Cost Storage and Fast Loading; (2) Data Pre-Processing, Refining, Cleansing; (3) “Simple math at scale” (score, filter, sort, avg., count...); (4) Joins, Unions, Aggregates; (5) Analytics (iterative and data mining); (6) Reporting
    • Stable Schema: (1) Teradata/Hadoop; (2) Teradata; (3) Teradata; (4) Teradata; (5) Teradata; (6) Teradata
    • Evolving Schema: (1) Hadoop; (2) Aster/Hadoop; (3) Aster/Hadoop; (4) Aster; (5) Aster (SQL + MapReduce Analytics); (6) Aster
    • Format, No Schema: (1) Hadoop; (2) Hadoop; (3) Hadoop; (4) Aster; (5) Aster (MapReduce Analytics); (6) Aster
    Example workloads: Financial Analysis, Ad-Hoc/OLAP; Enterprise-Wide BI and Reporting; Spatial/Temporal; Active Execution; Interactive Data Discovery; Web Clickstream, Set-Top Box Analysis; CDRs, Sensor Logs, JSON; Social Feeds, Text, Image Processing; Audio/Video Storage and Refining; Storage and Batch Transformations
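The "When to Use Which?" matrix can be read as a simple lookup from schema requirement and pipeline stage to a recommended platform. The sketch below encodes it that way. The row and column assignments are reconstructed from the flattened slide text and should be treated as an assumption, and the key names (`stable`, `evolving`, `none`, and the stage identifiers) are invented for the example.

```python
# Pipeline stages, left to right on the slide (identifiers are hypothetical).
STAGES = [
    "storage_and_loading",
    "preprocessing",
    "simple_math_at_scale",
    "joins_unions_aggregates",
    "analytics",
    "reporting",
]

# Recommended platform per schema requirement and stage.
# Assumption: each row lists its six values in stage order.
MATRIX = {
    "stable": ["Teradata/Hadoop", "Teradata", "Teradata",
               "Teradata", "Teradata", "Teradata"],
    "evolving": ["Hadoop", "Aster/Hadoop", "Aster/Hadoop",
                 "Aster", "Aster", "Aster"],
    "none": ["Hadoop", "Hadoop", "Hadoop",
             "Aster", "Aster", "Aster"],
}


def recommend(schema, stage):
    """Look up the platform the matrix suggests for a schema type and stage."""
    return MATRIX[schema][STAGES.index(stage)]


print(recommend("stable", "reporting"))
print(recommend("evolving", "preprocessing"))
print(recommend("none", "storage_and_loading"))
```

Read this way, the pattern in the matrix is easy to see: Hadoop covers the early, loosely structured stages, Aster takes over for joins and analytics on evolving or schema-free data, and Teradata handles stable-schema workloads end to end.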
  • 44.