Teradata QueryGrid to MongoDB Lightning Introduction
Rich Charucki - Teradata
2
What is a Teradata Data Warehouse?
• Analytic database
– In-memory, in-database
• Scale-out MPP
– 30+ petabyte sites
– 35PB, 4096 cores
• Self service BI
– Dashboards, reports, OLAP
– Predictive analytics
• Complex SQL
– 20-50 way joins
– 350 pages of SQL
• Real time access/load
• Mixed workloads
[Figure: data scientists, power users, sales, and partners querying a 1024-node system; each node is shown with Intel CPUs and 512GB]
3
JSONPath inside SQL
Color Size Prod_ID Create_Time
----- ----- ------- -------------------
Blue Small 96 2013-06-17 20:07:27
SELECT
box.MFG_Line.Product.Color AS "Color",
box.MFG_Line.Product.Size AS "Size",
box.MFG_Line.Product.Prod_ID AS "Prod_ID",
box.MFG_Line.Product.Create_Time AS "Create_Time"
FROM mfgTable
WHERE CAST(box.MFG_Line.Product.Create_Time
AS TIMESTAMP) >= TIMESTAMP'2013-06-16 00:00:00'
AND box.MFG_Line.Product.Prod_ID = 96;
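For readers more used to application code than SQL, the same recall question can be sketched in Python. This is only an illustration of what the dot-notation query does, not how Teradata evaluates it: the `rows` list, the two non-matching documents, and the helper names `dot_path` and `recall_candidates` are hypothetical and not part of the original deck.

```python
from datetime import datetime

# Hypothetical stand-in for mfgTable: each row carries one JSON document
# in a column named "box". Only the first document comes from the slide.
rows = [
    {"box": {"MFG_Line": {"Product": {
        "Color": "Blue", "Size": "Small", "Prod_ID": 96,
        "Create_Time": "2013-06-17 20:07:27"}}}},
    {"box": {"MFG_Line": {"Product": {
        "Color": "Red", "Size": "Large", "Prod_ID": 96,
        "Create_Time": "2013-06-15 08:00:00"}}}},
    {"box": {"MFG_Line": {"Product": {
        "Color": "Green", "Size": "Small", "Prod_ID": 97,
        "Create_Time": "2013-06-18 09:30:00"}}}},
]

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def dot_path(doc, path):
    """Walk a dotted path such as 'MFG_Line.Product' through nested dicts."""
    for key in path.split("."):
        doc = doc[key]
    return doc


def recall_candidates(rows, prod_id, since):
    """Return (Color, Size, Prod_ID, Create_Time) for matching products."""
    cutoff = datetime.strptime(since, TIME_FORMAT)
    result = []
    for row in rows:
        product = dot_path(row["box"], "MFG_Line.Product")
        # Equivalent of CAST(... AS TIMESTAMP): parse the JSON string.
        created = datetime.strptime(product["Create_Time"], TIME_FORMAT)
        if created >= cutoff and product["Prod_ID"] == prod_id:
            result.append((product["Color"], product["Size"],
                           product["Prod_ID"], product["Create_Time"]))
    return result


matches = recall_candidates(rows, 96, "2013-06-16 00:00:00")
```

Only the Blue/Small document passes both predicates, which matches the single result row shown on the slide.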
4
UNIFIED DATA ARCHITECTURE
• SOURCES
– ERP, SCM, CRM
– Images, Audio and Video, Machine Logs, Text, Web and Social
• INTEGRATED DATA WAREHOUSE, DISCOVERY PLATFORM, DATA LAKE, REAL TIME PROCESSING
• ANALYTIC TOOLS & APPS
– Math and Stats, Data Mining, Business Intelligence, Applications, Languages, Marketing
• USERS
– Marketing, Executives, Operational Systems, Frontline Workers, Customers, Partners, Engineers, Data Scientists, Business Analysts
5
Teradata and MongoDB: QueryGrid
• Business users, data scientists
• IDW: Teradata Database
• Discovery: Aster Database
• Connected systems
– MongoDB: NoSQL database
– Teradata Aster: SQL, SQL-MR, SQL-GR
– Teradata systems: Teradata Database
– Hadoop: push-down to Hadoop
– Languages: SAS, Perl, R, Python, Ruby
6
Integration Export / Import
Direct Connect
7
Teradata and MongoDB
• Operational + Analytical
– Rich MongoDB applications
– Rich Teradata analytics
– Complementary
• Teradata pulls directly from MongoDB sharded clusters
• Teradata pushes back to MongoDB deployments
[Figure: MongoDB (operational data) alongside Teradata (analytics)]
8
Scale-out NoSQL + Scale-out DW SQL
[Figure: on the NoSQL side, an application reaches primary shards 1, 2, 3, ... N through three query routers; on the SQL side, four groups each show a PE with two AMPs]
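Both scale-out designs rely on spreading rows across parallel units (shards on the MongoDB side, AMPs on the Teradata side). A minimal sketch of hash-based placement follows, assuming a generic SHA-256 hash; it is not the hash function either product actually uses, and `owner` and `distribute` are hypothetical names.

```python
import hashlib


def owner(key, n_units):
    """Map a key to one of n_units parallel units (shards or AMPs)."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    # Use the first 8 bytes as an unsigned integer, then reduce modulo n_units.
    return int.from_bytes(digest[:8], "big") % n_units


def distribute(keys, n_units):
    """Group keys by the unit that owns them."""
    placement = {unit: [] for unit in range(n_units)}
    for key in keys:
        placement[owner(key, n_units)].append(key)
    return placement


# Spread 1000 hypothetical keys over 4 units.
placement = distribute(range(1000), 4)
```

Because the owner depends only on the key, any router or node can compute where a row lives without consulting the others, which is what lets both systems add units to scale out.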
9
Contract Phase
[Figure: SQL enters the PE of a Teradata node (PE, EAH, four AMPs); a query router fronts Shards 1 to 4]
10
Contract Phase
[Figure: Teradata node (PE, EAH, four AMPs); a query router fronts Shards 1 to 4]
11
Data Export to Shards
[Figure: Teradata node (PE, EAH, four AMPs); a query router fronts Shards 1 to 4]
12
Import Data from Shards
[Figure: Teradata node (PE, EAH, four AMPs); a query router fronts Shards 1 to 4]
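The export and import slides pair AMPs with shards so that the transfer runs in parallel. One way to picture such a pairing is the sketch below. The round-robin policy is an assumption made for illustration, since the deck does not say how the mapping is chosen, and `map_shards_to_amps` is a hypothetical name.

```python
def map_shards_to_amps(shards, amps):
    """Assign every shard to exactly one AMP, round-robin (assumed policy)."""
    assignment = {amp: [] for amp in amps}
    for index, shard in enumerate(shards):
        assignment[amps[index % len(amps)]].append(shard)
    return assignment


amps = ["AMP 1", "AMP 2", "AMP 3", "AMP 4"]

# Four shards, as drawn on the slides: one shard per AMP.
one_to_one = map_shards_to_amps(
    ["Shard 1", "Shard 2", "Shard 3", "Shard 4"], amps)

# Eight shards: each AMP handles a series of two shards.
shards = ["Shard %d" % n for n in range(1, 9)]
two_each = map_shards_to_amps(shards, amps)
```

With four shards each AMP gets one; with eight, each AMP works through two, so no shard is read twice and no AMP sits idle.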
13
Back-office context to the Front-office operations
Use cases
14
eCommerce in Action: A Virtuous Circle
[Figure: a data warehouse and eight shards, with the items below circulating between them]
• Buyer preferences
• Sales catalog
• Campaigns
• Recent purchases
• Profitability
15
Call Center Efficiency: A Virtuous Circle
[Figure: a data warehouse and eight shards, with the items below circulating between them]
• Trouble tickets
• Customer profiles
• Payment history
• Claims
• Next best offer
• Web logs
16
Conclusions
• Context from the DW
– Enriching MongoDB applications
• Integration
– Import/export
– Teradata QueryGrid
• Two scale-out architectures
– OLTP scale-out
– Analytics scale-out
• JSON in the data warehouse
17

Editor's Notes

  • #4 What items do we need to recall based on the quality issue on 6/16 with product #96? CAST looks at the JSON data type and formats it as a timestamp.
  • #5 The UDA architecture allows us to identify major subsystems and in this case actual hardware platforms performing the processing.
  • #6 The key to the QueryGrid vision is that once the DBA sets up the feature, ANY business user can join data from Teradata or Aster to the remote system dynamically. The join is done interactively and results are delivered via the user’s favorite BI tool. OK, we can always use flat files to exchange data back and forth. But dynamic access means it can be done fast, we don’t need batch processing or tools, and the business user can easily invoke the process at any time. Most important, the data in the host database is easily combined with data from the remote database. And in case you are wondering, the blue cylinder in the middle is the network, preferably InfiniBand.
  • #9 MongoDB builds its scale-out architecture using shards. These are similar to the concept of AMPs in Teradata or vWorkers in Aster. Data is hashed across the MongoDB cluster and stored in a primary shard. It is also replicated to a secondary copy of that shard on another node to enable recovery should the primary be unavailable. Connectivity to shards is actually done through the query routers, which send requests to the correct cluster node based on hashed keys. It's drawn this way for simplicity.
  • #10 Note: click for animations. A table operator request is submitted to the PE. The PE launches the contract function via the EAH. The EAH opens JDBC to the query router.
  • #11 Note: click for animations. The EAH requests table metadata for the specified table. Metadata also includes ??? information. The PE and dispatcher distribute the output row format to all AMPs.
  • #12 Note: click for animations. Each AMP is mapped to a series of shards. Each AMP connects to its corresponding shard via the EAH.
  • #13 Note: click for animations. Each AMP reads rows of data from a shard and spools the reformatted rows into Teradata spool.
  • #15 This is an existing Teradata customer who has evolved into using MongoDB for their eCommerce website. Formerly a mail order company, they have become a full eTailer. On a nightly basis, they extract data from MongoDB and load it into the data warehouse. They use deep dive predictive analytics, buyer preferences, promotional objectives, and other data to provide context and next-best-offers to the MongoDB application. Once calculated, the new information is exported to files and loaded into the MongoDB shards to make the website visitor experience more relevant and, hopefully, to bring more sales with it.
  • #16 THE major source of rich customer information is in the data warehouse. For years, DWs have collected customer purchases, payment history, buyer preferences, claims, plus next best offers and upsell opportunities. A lot of this data is historical, going back 3-5 years. And some of it is the result of predictive analytics coupled with campaign management tools. Real time tactical access to the data warehouse is the same as accessing any relational database. We call this Active Data Warehousing. Hundreds of Teradata customers are accessing data in near real time with their Active Data Warehouse. Combining these rich subject areas with MongoDB JSON data helps provide a faster time to resolution, next best offers, and the correct customer treatments based on their status with the corporation.