<b>Teradata QueryGrid to MongoDB Lightning Introduction</b> [2:10 pm - 2:30 pm]<br />This is where SQL and NoSQL work together. This session demonstrates joining MongoDB documents with data warehouse tables to perform new levels of analytics. Seamless self-service data access will be accomplished via a simple SQL JSON notation from Teradata to MongoDB. Now, no more time and effort will be required to co-locate data from both platforms in order to analyze it! Using the Teradata QueryGrid connector to MongoDB enables users to access data on two systems transparently in a self-service manner. This session introduces Teradata’s new capability.
What is a Teradata Data Warehouse?
• Analytic database
– In-memory, in-database
• Scale-out MPP
– 30+ petabyte sites
– 35PB, 4096 cores
• Self-service BI
– Dashboards, reports, OLAP
– Predictive analytics
• Complex SQL
– 20-50 way joins
– 350 pages of SQL
• Real-time access/load
• Mixed workloads
[Figure: data scientists, power users, and sales/partners querying a 1024-node system; each node is labeled "Intel CPUs, 512GB"]
JSONPath inside SQL
Color  Size   Prod_ID  Create_Time
-----  -----  -------  -------------------
Blue   Small  96       2013-06-17 20:07:27

SELECT
    box.MFG_Line.Product.Color       AS "Color",
    box.MFG_Line.Product.Size        AS "Size",
    box.MFG_Line.Product.Prod_ID     AS "Prod_ID",
    box.MFG_Line.Product.Create_Time AS "Create_Time"
FROM mfgTable
WHERE CAST(box.MFG_Line.Product.Create_Time
           AS TIMESTAMP) >= TIMESTAMP'2013-06-16 00:00:00'
  AND box.MFG_Line.Product.Prod_ID = 96;
What items do we need to recall based on the quality issue on 6/16 with product #96?
CAST takes the value extracted from the JSON data type and converts it to a timestamp so it can be compared with the timestamp literal.
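To make the dot-notation lookup and the CAST step concrete, here is a minimal Python sketch of the same logic outside the database. It is an illustration only, not how Teradata evaluates the query: the sample document, the helper names (`get_path`, `cast_timestamp`, `recall_candidates`), and the assumption that `Create_Time` is stored as a string are all hypothetical.

```python
import json
from datetime import datetime

# Hypothetical document shaped like the JSON column `box` in the query above.
ROW = json.loads("""
{"MFG_Line": {"Product": {"Color": "Blue", "Size": "Small",
                          "Prod_ID": 96,
                          "Create_Time": "2013-06-17 20:07:27"}}}
""")


def get_path(doc, dotted_path):
    """Walk a dotted path such as 'MFG_Line.Product.Color' through nested dicts."""
    value = doc
    for key in dotted_path.split("."):
        value = value[key]
    return value


def cast_timestamp(text):
    """Rough stand-in for CAST(... AS TIMESTAMP): parse the JSON string value."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S")


def recall_candidates(rows, prod_id, since):
    """Return (Color, Size, Prod_ID, Create_Time) for rows matching the WHERE clause."""
    matches = []
    for box in rows:
        created = cast_timestamp(get_path(box, "MFG_Line.Product.Create_Time"))
        if created >= since and get_path(box, "MFG_Line.Product.Prod_ID") == prod_id:
            matches.append((
                get_path(box, "MFG_Line.Product.Color"),
                get_path(box, "MFG_Line.Product.Size"),
                get_path(box, "MFG_Line.Product.Prod_ID"),
                get_path(box, "MFG_Line.Product.Create_Time"),
            ))
    return matches


result = recall_candidates([ROW], prod_id=96, since=datetime(2013, 6, 16))
```

The comparison only works because the string is parsed into a real timestamp first, which is the role CAST plays in the SQL.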
The Unified Data Architecture (UDA) allows us to identify the major subsystems and, in this case, the actual hardware platforms performing the processing.
The key to the QueryGrid vision is that once the DBA sets up the feature, ANY business user can join data from Teradata or Aster to the remote system dynamically. The join is done interactively and results are delivered via the user’s favorite BI tool.
OK, we can always use flat files to exchange data back and forth. But dynamic access means it can be done fast, we don’t need batch processing or tools, and the business user can easily invoke the process at any time. Most important, the data in the host database is easily combined with data from the remote database.
And in case you are wondering, the blue cylinder in the middle is the network, preferably InfiniBand.
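The contrast with flat-file exchange can be sketched in a few lines of Python: the remote data is fetched when the query runs and joined to host rows in memory, with no export or batch load step. This is a conceptual sketch only; the sample rows, the `cust_id` key, and the `join_on_demand` helper are hypothetical and do not represent the QueryGrid implementation.

```python
# Hypothetical warehouse rows (host side) and MongoDB-style documents (remote side).
WAREHOUSE_CUSTOMERS = [
    {"cust_id": 1, "segment": "Gold"},
    {"cust_id": 2, "segment": "Silver"},
]
REMOTE_DOCS = [
    {"cust_id": 1, "cart": {"items": 3}},
    {"cust_id": 3, "cart": {"items": 1}},
]


def join_on_demand(host_rows, fetch_remote, key):
    """Inner-join host rows with remote documents fetched at query time."""
    # fetch_remote is called only when the query runs: no nightly extract needed.
    remote_by_key = {doc[key]: doc for doc in fetch_remote()}
    joined = []
    for row in host_rows:
        doc = remote_by_key.get(row[key])
        if doc is not None:
            joined.append({**row, **doc})
    return joined


result = join_on_demand(WAREHOUSE_CUSTOMERS, lambda: REMOTE_DOCS, "cust_id")
```

Only customer 1 appears on both sides, so the joined result holds a single combined row.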
MongoDB builds its scale-out architecture using shards. These are similar in concept to AMPs in Teradata or vWorkers in Aster. Data is hashed across the MongoDB cluster and stored in a primary shard. It is also replicated to a secondary shard on another node to enable recovery should the primary shard be unavailable.
Connectivity to shards is actually done through the query routers, which send requests to the correct cluster node based on hashed keys. It's drawn this way for simplicity.
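The routing idea in the notes above can be illustrated with a toy router that hashes a key to choose a shard. This is a sketch under stated assumptions: the `QueryRouter` class is hypothetical, SHA-256 with a modulo is used only because it is stable and simple, and none of this reflects MongoDB's actual hashing or chunk assignment.

```python
import hashlib


class QueryRouter:
    """Toy router: picks a shard by hashing the shard key (illustrative only)."""

    def __init__(self, num_shards):
        # Each shard is modeled as a plain dict of key -> document.
        self.shards = [dict() for _ in range(num_shards)]

    def shard_for(self, key):
        # A stable hash means the same key always lands on the same shard.
        digest = hashlib.sha256(str(key).encode("utf-8")).hexdigest()
        return int(digest, 16) % len(self.shards)

    def insert(self, key, document):
        self.shards[self.shard_for(key)][key] = document

    def find(self, key):
        # The router sends the lookup to exactly one shard.
        return self.shards[self.shard_for(key)].get(key)


router = QueryRouter(num_shards=4)
for prod_id in range(100):
    router.insert(prod_id, {"Prod_ID": prod_id})
```

A client calls `router.find(96)` without knowing which shard holds the document, which is the point the diagram simplifies away.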
Note: click for animations
A table operator request is submitted to the PE (parsing engine)
The PE launches the contract function via the EAH
The EAH opens a JDBC connection to the query router
The EAH requests metadata for the specified table
Metadata also includes ??? information
The PE and dispatcher distribute the output row format to all AMPs
Each AMP is mapped to a series of shards
Each AMP connects to its corresponding shard via the EAH
Each AMP reads rows of data from a shard, reformats each row, and writes it into Teradata spool
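The last two steps (mapping AMPs to shards, then reading and reformatting rows into spool) can be sketched as follows. This is a simplified model, not the actual QueryGrid code: the round-robin assignment, the function names, the shard names, and the sample documents are all assumptions made for illustration.

```python
def map_shards_to_amps(num_amps, shard_names):
    """Assign shards to AMPs round-robin; returns {amp_id: [shard, ...]}."""
    mapping = {amp: [] for amp in range(num_amps)}
    for i, shard in enumerate(shard_names):
        mapping[i % num_amps].append(shard)
    return mapping


def reformat(document, columns):
    """Flatten one document into a tuple in the agreed output row format."""
    # Missing fields become None, the way a NULL would fill a fixed row layout.
    return tuple(document.get(col) for col in columns)


def amp_read(shards_for_amp, shard_data, columns):
    """One AMP's work: read its shards and collect reformatted rows (its 'spool')."""
    spool = []
    for shard in shards_for_amp:
        for document in shard_data[shard]:
            spool.append(reformat(document, columns))
    return spool


# Hypothetical shard contents and output row format.
SHARD_DATA = {
    "shard0": [{"Prod_ID": 1, "Color": "Red"}],
    "shard1": [{"Prod_ID": 2, "Color": "Blue"}],
    "shard2": [{"Prod_ID": 3, "Color": "Green"}],
    "shard3": [{"Prod_ID": 4}],
}
COLUMNS = ("Prod_ID", "Color")

mapping = map_shards_to_amps(2, sorted(SHARD_DATA))
spools = {amp: amp_read(shards, SHARD_DATA, COLUMNS)
          for amp, shards in mapping.items()}
```

Because every AMP works only on its own shards, the reads can proceed in parallel, and the shared row format lets the rows from all AMPs be treated as one table.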
This is an existing Teradata customer who has evolved into using MongoDB for their eCommerce website. Formerly a mail order company, they have become a full eTailer. On a nightly basis, they extract data from MongoDB and load it into the data warehouse. They use deep dive predictive analytics, buyer preferences, promotional objectives, and other data to provide context and next-best-offers to the MongoDB application. Once calculated, the new information is exported to files and loaded into the MongoDB shards to make the website visitor experience more relevant and, hopefully, to bring more sales with it.
THE major source of rich customer information is the data warehouse. For years, DWs have collected customer purchases, payment history, buyer preferences, claims, plus next best offers and upsell opportunities. A lot of this data is historical, going back 3-5 years. And some of it is the result of predictive analytics coupled with campaign management tools.
Real-time tactical access to the data warehouse is the same as accessing any relational database. We call this Active Data Warehousing. Hundreds of Teradata customers are accessing data in near real time with their Active Data Warehouse.
Combining these rich subject areas with MongoDB JSON data helps provide a faster time to resolution, next best offers, and the correct customer treatments based on their status with the corporation.