Understanding Deployment Practices that Merge the
Strengths of Hadoop and the Data Warehouse
Joe Rao
PS Consultant, Teradata Corporation
HADOOP IS NOT AN ISLAND
IN THE ENTERPRISE:
2 6/17/2014 Teradata Confidential
This presentation covers
• A comparison of the strengths of Hadoop and a Data
Warehouse
• Architectures that involve Hadoop and the data warehouse
working together
AGENDA
• Our two platforms:
> The Data Platform – Hadoop
> The Enterprise Data Warehouse – Teradata
• Both platforms could handle everything by themselves
if we really wanted them to
• Biased organizations will favor one over the other, and
argue that everything can be done in one place
• And they're both right
FRAMING THE DISCUSSION
• Let's consider a software startup or
company that has no IT department yet
• They need to:
> Acquire their technology from scratch
> Build business logic from scratch
> Staff their new department from scratch
• With no existing technology, how should
they structure their data center?
FRAMING THE DISCUSSION
• Traditional data warehouses (like the Teradata
database) have been used as the central repository
of business data for years.
• Data warehouses are great with:
> Thousands of concurrent users and queries
> Full ANSI SQL interfaces
> Very complex SQL query logic
> Advanced workload management
> Transactional capabilities
> Secure access
DATA WAREHOUSE STRENGTHS
• Many companies that have been doing things the
old way with a data warehouse don't think they need
to change anything
• What they've been doing has worked for years. Hadoop
is young and immature, they say. Why change?
• These companies are change-resistant. They are missing
out on the advancements in big data and can fall behind
their competition.
DATA WAREHOUSE ONLY?
I’m lonely
• Hadoop is changing the game in the enterprise
data landscape. Its major strengths include:
> Economical
> Able to process extremely large data sets
> Extremely flexible storage and processing
> Open, free, active community development
HADOOP STRENGTHS
• Appliance Solution
> Purpose-built integrated hardware/software solution
> Optimized hardware for Hadoop, software, storage, and
networking in a single rack
> Delivered ready to run at a competitive price point
• Enterprise Ready
> 100% open-source Hadoop via Hortonworks HDP
> Integrated with Teradata Unified Data Architecture on 40Gb/s
InfiniBand BYNET V5 for performance and reliability
> Support for major ETL tools, enhanced security, and
metadata management
> Management tools for monitoring system health
• Benefits
> Lowest TCO and fastest time to value
> Fully engineered and supported by Teradata
TERADATA APPLIANCE FOR HADOOP
• Many companies are so eager to jump onto the Hadoop wave
that they think they can run their entire datacenter on
Hadoop.
• It's free, it has lots of development effort put into it, it's
flexible. Why go the “old way” with an EDW?
• These companies are using Hadoop beyond its design and
maturity level, and may run into technical problems
meeting requirements.
HADOOP ONLY?
I’m lonely
CONCLUSIONS — TWO TCOD EXAMPLES
1. TCOD is NOT platform cost – it is total project cost
2. Each technology has large advantages in its sweet spot(s)
3. Neither platform is cost effective in the other’s sweet spot
4. Biggest differences for the data warehouse are the development of:
> Complex queries
> Analytics
Source: WinterCorp - Full report at www.wintercorp.com/tcod-report
[Figure: two stacked bar charts of total cost in millions of dollars, each comparing "On Hadoop" with "On Data Warehouse". First chart: "Data Refining: Hadoop wins (also: Landing Zone, Archive)", axis from $0 to $35 million. Second chart: "EDW: Data W/H Platform Wins", axis from $0 to $800 million. Legend: Total System Cost, System and Data Admin, Application Development, ETL, Complex Queries, Analysis.]
• These two platforms are complementary!
• Successful enterprise datacenters merge the strengths
of both platforms.
EDW VS. HADOOP
• Split Workload Architecture
• ETL System Architecture
• Secure Access Architecture
• Active Archive Architecture
COMBINED ARCHITECTURES
Insurance Use Case
Impact
• Quickly analyze data for informed decisions and ad hoc reporting
• Streamlined process to calculate vehicle and fleet scores
• Cost-effectively quantify, adjust, and manage risk premiums
Situation
A large diversified customer needed to accurately calculate scores and adjust risk
premiums for its enterprise fleets based on vehicle data, driver behavior, GPS data,
weather data, traffic, and DW data. Its current custom-developed applications limit the
effectiveness of these scores.
Problem
The customer lacks the infrastructure and systems to handle the huge volumes of real-time data,
and has no ad hoc reporting system to combine, enrich, and analyze the data. Limited storage
capacity restricts the amount of data that can be captured, refined, and stored.
Solution
Used the Teradata Big Analytics Appliance to design a platform that streamlines the ingestion
of telematics data of varying sources, data types, structures, and frequencies,
and combines it with other data sources to perform meaningful analytics.
HADOOP
Teradata INTEGRATED DATA WAREHOUSE
• The Data Warehouse and Hadoop run different workloads
on different data sets.
SPLIT WORKLOADS
Big Data
Operational Data
• It is not economical to put gigantic, “value sparse” data
sets on an enterprise data warehouse.
• Hadoop was not built to be an accessible, highly concurrent
transactional database.
• The simplest and most natural architecture is to split up the two
platforms based on the data set and workload.
> Teradata handles the operational business data and queries
> Hadoop handles the “big data” sets that would be cost-prohibitive
on the warehouse, such as web, machine, and social data
SPLIT WORKLOADS
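The split described on this slide can be sketched as a small placement rule. This is only an illustration in Python: the `DataSet` fields, the `choose_platform` helper, and the sample data set names are hypothetical and not part of any Teradata or Hadoop tooling. A real placement decision would also weigh cost, data volume, and service levels.

```python
from dataclasses import dataclass


@dataclass
class DataSet:
    name: str
    value_dense: bool         # True for curated operational business data
    needs_transactions: bool  # True when concurrent transactional access is required


def choose_platform(ds: DataSet) -> str:
    """Return 'teradata' or 'hadoop' following the split-workload rule of thumb."""
    # Operational, transactional, or value-dense data stays on the warehouse.
    if ds.needs_transactions or ds.value_dense:
        return "teradata"
    # Large, value-sparse data (web, machine, social) lands on Hadoop.
    return "hadoop"


# Hypothetical data sets used only to exercise the rule.
datasets = [
    DataSet("customer_orders", value_dense=True, needs_transactions=True),
    DataSet("web_clickstream", value_dense=False, needs_transactions=False),
    DataSet("machine_sensor_logs", value_dense=False, needs_transactions=False),
]

placement = {ds.name: choose_platform(ds) for ds in datasets}
print(placement)
```

The rule is deliberately coarse: it encodes only the two criteria named on the slide (data set and workload).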
• Both systems operate favorably on cost and performance
with respect to their given workloads.
• The business can analyze new data and gain new insights
that their existing platform couldn't handle before.
SPLIT WORKLOADS — BUSINESS VALUE
LARGE COMPUTER MANUFACTURER
Analysis of Customer Web Interactions
Capture, Refine, Store ClickStream Data
Impact
• Reduced data inconsistencies and improved performance
• Capture and curate ALL the data and prepare for analysis
• Perform ad hoc analytics on multi-level interactions
• Improves the marketing campaigns and the customer support process
Situation
Customers interact with the public websites of a large PC vendor for various purposes, resulting in
huge volumes of raw Omniture data. By its nature, the data structure and format are not always
consistent, and the sheer volume makes the data difficult to process.
Problem
Inconsistencies such as file errors and corrupted file compression in the raw Omniture data make the
capture and analysis process error prone. The volume and velocity (70 files/hr, 1M files) add to the
complexity.
Solution
A Teradata Big Analytics solution provides a landing and staging area for incoming data at high
velocity. Hadoop nodes curate the data, check it for consistency, and prepare it for
consumption by higher-end analytic platforms.
HADOOP Teradata
TERADATA
PLATFORM FAMILY
• Hadoop can be used as a staging and ETL preprocessing
layer for the Data Warehouse.
ETL SYSTEM ARCHITECTURE
Source Data Transformed Data
• The Data Warehouse is busy with operational queries.
We can reduce the workload on the DW by
migrating some ETL to Hadoop.
• ETL processing is a write-once step, which fits
Hadoop's architecture.
• Hadoop can inexpensively retain the raw source
data for data lineage purposes.
*Note that there are many cases where this migration doesn't make sense,
such as when it's necessary to do referential integrity checks. The DW is
capable of handling its ETL if necessary.
ETL SYSTEM ARCHITECTURE
HADOOP TERADATA
PLATFORM FAMILY
• Command-line interface for Hadoop / TD data transfer
• Batch MapReduce jobs
• Bidirectional
• Run on the Hadoop side
TERADATA CONNECTOR
FOR HADOOP (TDCH)
TDCH
hadoop jar /home/jo845b/teradata-connector-1.0.10/lib/teradata-connector-1.0.10.jar \
com.teradata.hadoop.tool.TeradataExportTool \
-libjars $LIB_JARS \
-classname com.teradata.jdbc.TeraDriver \
-url jdbc:teradata://terarps.ca.boeing.com/DATABASE=SQLH_TEST \
-username jo845b \
-password Teradata14 \
-jobtype hcat \
-fileformat rcfile \
-method internal.fastload \
-sourcedatabase default \
-sourcetable ontime_sqoop \
-targettable ontime_sqoop \
-usexviews true
• Many options are available to fine-tune data transfer
between Teradata and Hadoop
TERADATA CONNECTOR FOR HADOOP
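Because a TDCH job is an ordinary command line, a scheduler or wrapper script can assemble it from configuration. The Python sketch below uses only the tool class and flags that appear in the command on this slide. The `build_tdch_export` helper, the jar paths, the host name, and the credentials are hypothetical placeholders, and nothing here runs Hadoop; it only builds and prints the command.

```python
import shlex


def build_tdch_export(jar_path, lib_jars, options):
    """Assemble the argument list for a TDCH export job (Hadoop to Teradata)."""
    cmd = [
        "hadoop", "jar", jar_path,
        "com.teradata.hadoop.tool.TeradataExportTool",
        "-libjars", lib_jars,
    ]
    # Each option becomes "-flag value", in the order given.
    for flag, value in options.items():
        cmd += [f"-{flag}", str(value)]
    return cmd


# Flags mirror the slide; host, user, and password are placeholders.
options = {
    "classname": "com.teradata.jdbc.TeraDriver",
    "url": "jdbc:teradata://td.example.com/DATABASE=SQLH_TEST",
    "username": "etl_user",
    "password": "CHANGE_ME",
    "jobtype": "hcat",
    "fileformat": "rcfile",
    "method": "internal.fastload",
    "sourcedatabase": "default",
    "sourcetable": "ontime_sqoop",
    "targettable": "ontime_sqoop",
    "usexviews": "true",
}

cmd = build_tdch_export(
    "/path/to/teradata-connector.jar",  # hypothetical jar location
    "/path/to/hive-libs.jar",           # hypothetical -libjars value
    options,
)
print(shlex.join(cmd))
```

Keeping the options in a dictionary makes it easy to vary the job type, file format, or load method per table without retyping the whole command.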
• Hadoop frees up the Data Warehouse's limited storage and
processing resources, saving the business time and money.
• Data can now be kept in its raw form, adding new data
lineage capabilities to the data center.
ETL SYSTEM ARCHITECTURE — BUSINESS VALUE
BANKING USE CASE
Impact
• Analyze multi-structured data types
• Keep data confidential to those with access rights
• SQL users have easy access to big data sources
Situation
A large national bank needed to securely and inexpensively store and analyze raw
financial data in varied nonrelational formats. The data needs strict access privileges and
should be generally accessible to SQL users in some way.
Problem
Current infrastructure is not flexible enough to handle the expected variations in data
formats and processing algorithms. Security requirements are too strict for vanilla
Hadoop.
Solution
Use the Teradata Big Analytics Appliance to ingest and store the data. Analysts access the data
through an access layer in the data warehouse, and power users manipulate
the data on the Hadoop system directly.
HADOOP TERADATA
PLATFORM FAMILY
Sub-queries
Data
Queries
SECURE ACCESS ARCHITECTURE
• Teradata can be used as an access layer to the data
stored in Hadoop.
• Data in Hadoop can be accessed by data
warehouse users with no knowledge of the
inner workings of Hadoop.
• The full Teradata SQL library is now available to
Hadoop users.
• Teradata can be used as a secure gateway to
limit the authentication gap in Hadoop without
needing Kerberos.
SECURE ACCESS ARCHITECTURE
HADOOP TERADATA
PLATFORM FAMILY
QueryGrid
Data
TERADATA QUERYGRID:
TERADATA DATABASE TO HADOOP
• Direct data transfer from the Hadoop Distributed Filesystem
• Hadoop data referenced in normal SQL queries
• Transfers occur in a high speed, parallel, scalable fashion
• Data can be processed on the fly or stored long-term
CREATE VIEW TOM AS (
    SELECT * FROM load_from_hcatalog(
        USING
            server('sdll4364.labs.teradata.com')
            port('9083')
            username('hive')
            dbname('vim')
            templeton_port('1880')
    ));
• Many options are available to fine-tune data transfer
between Teradata and Hadoop
• Access rights on the view can limit users' access to other
data sets.
TERADATA QUERYGRID
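The view-as-gateway idea can be illustrated with any SQL engine. The Python sketch below uses SQLite from the standard library purely as a stand-in for Teradata. SQLite has no GRANT statement, so it shows only how a view narrows what is exposed, not how access rights are enforced. The table, view, and column names are hypothetical.

```python
import sqlite3

# In-memory database standing in for the warehouse access layer.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Stand-in for the raw data that would live in Hadoop.
    CREATE TABLE raw_transactions (
        txn_id INTEGER PRIMARY KEY,
        account_number TEXT,   -- sensitive column
        branch TEXT,
        amount REAL
    );
    INSERT INTO raw_transactions VALUES
        (1, 'ACCT-001', 'north', 120.0),
        (2, 'ACCT-002', 'south', 75.5),
        (3, 'ACCT-001', 'north', 30.0);

    -- The view exposes only the non-sensitive columns to analysts.
    CREATE VIEW analyst_transactions AS
        SELECT txn_id, branch, amount FROM raw_transactions;
""")

cur = conn.execute("SELECT * FROM analyst_transactions ORDER BY txn_id")
exposed_columns = [d[0] for d in cur.description]
rows = cur.fetchall()
print(exposed_columns)
print(rows)
```

On a warehouse that supports access rights, users would be granted access to the view only, so the sensitive column and any other data sets stay out of reach.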
• Businesses can leverage the much more widespread
SQL and EDW user community instead of the small,
expensive Hadoop expert community. This saves the
business money.
• Data can be stored inexpensively, securely, and
accessibly at the same time.
SECURE ACCESS ARCHITECTURE —
BUSINESS VALUE
PHARMACY USE CASE
Impact
• Reduced storage costs for data variety
• Perform ad hoc analytics on the multiple versions of data
• Retrieve data in minutes (vs. days with tape archives)
• Reduced load and improved performance of DW/Databases
Situation
High-performance storage is expensive. A large integrated pharmacy HC provider deals
with a variety of data with different business value. Not all data can be stored on the same
system, and ever-expanding data only adds to this challenge.
Problem
Data in long-term storage cannot be queried and takes a long time to retrieve. No analysis
can be performed on the archived data, so the business loses out on the value of this data.
Solution
Used Teradata Hadoop nodes to store all the data coming in from weblogs, medical
data, and JSON files. Hadoop also serves as an enrichment layer to enhance data for high-end
analytics consumption. The complete solution provides easy movement of data among
Hadoop, Aster, and Teradata.
HADOOP TERADATA
PLATFORM FAMILY
ACTIVE ARCHIVE
• Hadoop can be used to store the data warehouse's
cold data, historical data, and regular backups.
Backups
Historical Data
• Using Hadoop as an active archive allows database users to
access cold or historical data on the fly, unlike tape
archives.
• Hadoop data can be accessed in the EDW using Teradata
QueryGrid: Teradata-Hadoop.
• The data is no longer stored in the data warehouse,
freeing valuable space. Hadoop is a less expensive
platform to store this data on.
ACTIVE ARCHIVE
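One way to picture the active archive is as an age-based split of warehouse rows. The Python sketch below is only an illustration: the `split_hot_cold` helper, the `event_date` field, the 365-day retention window, and the sample rows are all hypothetical, and the actual data movement (for example, with TDCH) is out of scope.

```python
from datetime import date, timedelta


def split_hot_cold(rows, today, keep_days):
    """Partition rows into hot (stay in the warehouse) and cold (archive to Hadoop)."""
    cutoff = today - timedelta(days=keep_days)
    hot = [r for r in rows if r["event_date"] >= cutoff]
    cold = [r for r in rows if r["event_date"] < cutoff]
    return hot, cold


# Hypothetical rows; only the date matters for the split.
rows = [
    {"id": 1, "event_date": date(2014, 6, 1)},
    {"id": 2, "event_date": date(2013, 1, 15)},
    {"id": 3, "event_date": date(2012, 11, 30)},
]

hot, cold = split_hot_cold(rows, today=date(2014, 6, 17), keep_days=365)
print([r["id"] for r in hot], [r["id"] for r in cold])
```

Rows older than the cutoff would be moved to Hadoop, where they remain queryable rather than sitting offline on tape.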
• Storing data on Hadoop frees up cold data storage space
on the relatively expensive data warehouse, saving the
business money.
• Compared to tape, businesses can still analyze and
access their data on Hadoop. This saves time and effort.
ACTIVE ARCHIVE — BUSINESS VALUE
• A successful DW / Hadoop coexistence system will see
varying uses of all four of these mechanisms concurrently.
• Replacing existing infrastructures with Hadoop is not a
feasible goal.
• In order to get Hadoop's foot in the door with large
established enterprises, we need to push Hadoop as an
integrated solution in tandem with a DW.
CONCLUDING REMARKS
PUSHING HADOOP FURTHER
Q&A
THANKS!
WWW.TERADATA.COM
