The Great Divide: Bridging Structured and Unstructured Data for New Customer Insights
 


The Briefing Room with John O’Brien and Teradata

Slides from the Live Webcast on Aug. 21, 2012

Data and context: that's the ultimate combination. Uniting the two is the goal of today's information managers as they seek to connect traditional business intelligence on structured data with the ocean of new, multi-structured Big Data that can provide valuable context and additional insight. How to do it remains an open question, but the bigger issue of which technology is best dominates the dialogue in the world's most cutting-edge companies.

Check out this episode of The Briefing Room to learn from veteran database analyst John O'Brien of Radiant Advisors as he explains how certain information architectures have advantages over others for bridging structured and unstructured data. He'll be briefed by Steve Wooledge of Teradata, who will detail his company's innovations in SQL-MapReduce, which allows professionals to perform multi-structured analytics at scale. Wooledge will also describe how a new extension called SQL-H allows analysts to use Hadoop as if it were just another table in the database.

For more information visit: http://www.insideanalysis.com


Presentation Transcript

  • Tuesday, August 21, 2012
  • Eric Kavanagh, Eric.kavanagh@bloorgroup.com. Twitter Tag: #briefr
  • Reveal the essential characteristics of enterprise software, good and bad
    Provide a forum for detailed analysis of today’s innovative technologies
    Give vendors a chance to explain their product to savvy analysts
    Allow audience members to pose serious questions... and get answers!
  • August: Analytics; September: Integration; October: Database; November: Cloud; December: Innovators
  • Analytics is, and always has been, about discovering insights that lead to better business decisions. The range of technologies and use cases in this area is wide: statistical analysis, data and process mining, predictive analytics and modeling, and complex event processing. What is now referred to as Big Data has pushed analytics beyond the capabilities of traditional solutions. “Big Analytics” has organizations diving into large heaps of data that previously were not available or usable. The growing volume, variety, velocity and complexity of data have proven to be a major challenge to organizations that rely on analytics to maintain a competitive edge.
  • John is the Principal and Founder of Radiant Advisors. As a recognized thought leader in BI, John has been publishing articles and presenting at conferences for the past 10 years. He has been a Best Practices judge, presenter and panel participant at TDWI. John has also developed and presented his own courses: Radiant Advisors Learning Catalog. John has a B.S. in Mechanical Engineering from California State University and an M.B.A. from the University of Colorado. He is a Certified Business Intelligence Professional with mastery levels in Leadership and Administration, Database Administration and Business Intelligence.
  • Teradata is known for its analytic data solutions with a focus on integrated data warehousing, big data analytics and business applications. It offers a broad suite of technology platforms and solutions, and a wide range of data management applications and data mining capabilities. Teradata Aster is its MapReduce platform for handling big data and big analytics on multi-structured data.
  • Steve Wooledge is Senior Director of Marketing at Teradata’s Aster Center of Innovation, where he is an evangelist for the company’s analytic platform product and responsible for awareness, demand generation, and solution marketing for the data scientist. Steve has more than 10 years of experience in product marketing and business development for business intelligence, data management, Web analytics and e-commerce products. Prior to his current role, Steve held product marketing positions at Interwoven and Business Objects as well as sales and engineering roles at Business Objects, Dow Chemical and Occidental Petroleum. Steve has a B.S. in Chemical Engineering and an M.B.A. in Marketing and Finance.
  • The Unified Big Data Architecture & Bridging the Analyst Gap for Hadoop
    Steve Wooledge, Sr. Director of Marketing
    August 21, 2012
  • Topics
    • Quick intro to Teradata Aster
    • The need for a unified big data architecture
    • Bridging the Analyst Gap for Hadoop: Aster SQL-H™
    Confidential and proprietary. Copyright © 2012 Teradata Corporation.
  • Teradata Aster: Leading Innovator in Data Discovery for the Enterprise
    § Aster Solution: Data discovery platform. Delivers MapReduce analytic framework within an MPP database
    § Brings data science to the business: Enables MapReduce processing through the analytic language of business, standard SQL
    § Delivers new analytics: Gives businesses new breakthrough analytic apps via pre-packaged pattern, path, and graph SQL-MapReduce modules
    § On multi-structured data: Leverages multi-structured data sources for increased analytic breadth & accuracy
  • Teradata Aster MapReduce Platform
    Users: Analysts, Customers, Business Users, Data Scientists
    Your Analytic & Advanced Reporting Applications
    Develop (Rapid Analytics Development)
    • 50+ pre-built analytic modules
    • Visual IDE; develop apps in hours
    • Many programming languages
    Process (Embedded Analytic Processing)
    • SQL-MapReduce framework
    • Analyze both structured & multi-structured data
    • Linear, incremental scalability
    Store (Massively Parallel Data Storage)
    • Commodity-hardware based
    • Software only, appliance, or cloud
    • Relational-data architecture can be extended for non-relational types
  • Deeper Consumer Insights with Teradata Aster
    Business Impact / ROI
    • Increased conversions from recommendations with 360-degree view of customer across in-store and .com behavior
    • Build revenue attribution models to link every purchase to a site feature
    • Reduce churn from one day previously to 20 minutes
    Processing times
    • Payment processing analytics down from one day to one minute with SQL-MapReduce
    • Web log data processing from seven hours to 20 minutes
    • Interactive dashboards with all KPI’s from point of order inception, down from five hours to five minutes
  • Big Data: From Transactions to Interactions
    [Figure: data sources plotted by volume (megabytes, gigabytes, terabytes, petabytes) against increasing data variety and complexity.]
    ERP, CRM and WEB tiers: Purchase detail, Purchase record, Payment record, Segmentation, Offer details, Customer Touches, Support Contacts, Web logs, Offer history, A/B testing, Dynamic Pricing, Affiliate Networks, Search marketing, Behavioral Targeting, Dynamic Funnels
    BIG DATA tier: User Generated Content, Social Network, Mobile Web, User Click Stream, Sentiment, External Demographics, Business Data Feeds, HD Video, Speech to Text, Product/Service Logs, SMS/MMS
  • Unified Big Data Architecture: Bridging Classic & Big Data Worlds
    Classic Method (Structured & Repeatable Analysis)
    • Business determines what questions to ask
    • IT structures the data to answer those questions
    • “Capture only what’s needed”
    • SQL performance and structure
    Big Data Method (Multi-structured & Iterative Analysis)
    • IT delivers a platform for storing, refining, and analyzing all data sources
    • Business explores data for questions worth answering
    • “Capture in case it’s needed”
    • MapReduce Processing Flexibility
  • SQL-MapReduce. Example: Pattern Matching Analysis
    MapReduce Analytics
    • Single pass of data
    • Linked list sequential analysis
    Traditional SQL
    • Self-Joins for sequencing
    • Limited operators for ordered data
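The contrast on this slide between self-joins and a single pass over ordered data can be sketched in plain Python. This is an illustration only, not Aster code: the `clicks` rows, the function names, and the assumption that timestamps within a user are consecutive integers are all hypothetical.

```python
from collections import defaultdict

# Hypothetical click rows: (user_id, timestamp, page). Not taken from the slides.
clicks = [
    ("u1", 1, "home"), ("u1", 2, "search"), ("u1", 3, "checkout"),
    ("u2", 1, "home"), ("u2", 2, "help"),
]

def next_pages_self_join(rows):
    """Pair each click with its successor by comparing every row with every
    other row, the way a self-join on user_id and timestamp would."""
    pairs = []
    for user_a, ts_a, page_a in rows:
        for user_b, ts_b, page_b in rows:
            # Assumes consecutive integer timestamps per user.
            if user_a == user_b and ts_b == ts_a + 1:
                pairs.append((user_a, page_a, page_b))
    return sorted(pairs)

def next_pages_single_pass(rows):
    """Partition by user, order by timestamp, then walk each partition once,
    remembering only the previous click."""
    partitions = defaultdict(list)
    for user, ts, page in rows:
        partitions[user].append((ts, page))
    pairs = []
    for user, events in partitions.items():
        previous = None
        for _, page in sorted(events):
            if previous is not None:
                pairs.append((user, previous, page))
            previous = page
    return sorted(pairs)
```

Both functions return the same page-to-next-page pairs. The self-join version needs one more nested comparison for every extra step in the sequence, while the single-pass version only needs to remember more of the preceding clicks.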
  • The Advantages of MapReduce: Raw click-stream data and pattern matching with nPath
    Click Stream Analysis
    Goal
    • Increase understanding of customer behavior on a website to improve advertising rates or website navigation
    Challenges
    • Full website session-level data needed, typically from raw web logs
    • Requires complex multi-pass SQL queries or non-SQL techniques
    • Requires rewriting query to change number of clicks analyzed
    MapReduce Value
    • Performance: Single pass over data regardless of number of clicks analyzed
    • Manageability: Much simpler code, from 350 lines of SQL to 18-line SQL-MapReduce
    • Ease of Use: Pattern flexibility to handle varied numbers of clicks and click patterns without rewriting code
    Comparative Performance [Chart: time for SQL (3pg), SQL-MR (3pg), SQL-MR (4pg), SQL-MR (8pg), SQL-MR (12pg)]
    • SQL for 3 pages: 6 minutes
    • MapReduce for 3, 4, 8, 12 pages: 77-131 seconds
    Example Analytic Logic
    • People who search ‘diabetes’ also browse…
    • People who download visit pages A, B, D …
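The "without rewriting code" point can be illustrated with a small Python sketch in which the number of pages in a path is a parameter rather than part of the query structure. This is not nPath itself. The function name `top_paths` and the `(user_id, timestamp, page)` row layout are assumptions made for the example.

```python
from collections import Counter, defaultdict

def top_paths(rows, n_pages, limit=10):
    """Count every run of n_pages consecutive pages per user.

    rows: iterable of (user_id, timestamp, page) tuples (hypothetical layout).
    Returns up to `limit` (path, count) pairs, most frequent first.
    """
    sessions = defaultdict(list)
    for user, ts, page in rows:
        sessions[user].append((ts, page))

    counts = Counter()
    for events in sessions.values():
        pages = [page for _, page in sorted(events)]
        # Slide a window of n_pages over the ordered clicks.
        for start in range(len(pages) - n_pages + 1):
            counts[tuple(pages[start:start + n_pages])] += 1
    return counts.most_common(limit)
```

Calling `top_paths(rows, 3)` and then `top_paths(rows, 12)` changes only an argument. The equivalent multi-pass SQL would need additional self-joins for every extra page, which is the rewriting cost the slide describes.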
  • Need for a Unified Big Data Architecture for New Insights: Enabling All Users for Any Data Type from Data Capture to Analysis
    Tools: Java, C/C++, Pig, Python, R, SAS, SQL, Excel, BI, Visualization, etc.
    Layers: Reporting and Execution in the Enterprise; Discover and Explore; Capture, Store and Refine
    Data sources: Audio/Video, Images, Docs, Text, Web & Social, Machine Logs, CRM, SCM, ERP
  • Teradata Unified Big Data Architecture: Any User, Any Data, Any Analysis
    Users: Engineers, Data Scientists, Quants, Business Analysts
    Tools: Java, C/C++, Pig, Python, R, SAS, SQL, Excel, BI, Visualization, etc.
    Portfolios: Aster MapReduce Portfolio, Teradata Analytics Portfolio
    Platforms: Discovery Platform; Integrated Data Warehouse; SQL-H; Capture, Store, Refine
    Data sources: Audio/Video, Images, Text, Web & Social, Machine Logs, CRM, SCM, ERP
  • Hadoop Points of Integration – Bulk Data Transfer
    • Teradata:Hadoop
      • JDBC (available today)
        − Hadoop programs can call JDBC
      • TDDBinputformat/Dboutputformat (available today)
        − Submits SQL to JDBC
      • Cloudera Sqoop (available today)
        − Command line import/export database objects
    • Aster:Hadoop
      • Aster-Hadoop Adaptor – node:node transfer using SQL-MapReduce
    Opportunity for analysts to more easily access Hadoop data
  • Source: Enterprise Strategy Group; April 5, 2012
  • Bridging the Business Analyst Gap for Hadoop Data
  • Aster SQL-H™: A Business User’s Bridge to Analyze Hadoop Data
    Announced June 12th, 2012
    Aster SQL-H gives analysts and data scientists a better way to analyze data stored cheaply in Hadoop
    • Allow standard ANSI SQL to Hadoop data
    • Leverage existing BI tool investments
    • Enable 50+ prebuilt SQL-MapReduce Apps and IDE
    • Lower costs by making data analysts self-sufficient
  • The Big Data Architecture Today Has Gaps
    Analyst’s Goal: Get Insights from Data in Hadoop
    Users: Engineers, Data Scientists, Quants, Business Analysts
    HDFS: Custom Code and Development (MR, Pig, Hive); IT is the optimizer
    Teradata Aster Discovery Platform: Aster MapReduce Portfolio (SQL & SQL-MapReduce)
    Teradata IDW: Teradata Analytics Portfolio (SQL)
  • Analytics on Hadoop Data with Aster SQL-H
    Users: Engineers, Data Scientists, Quants, Business Analysts
    HDFS: reached through SQL-H with the Aster MapReduce Portfolio (SQL & MapReduce)
    Teradata Aster Discovery Platform: Aster MapReduce Portfolio (SQL & SQL-MapReduce)
    Teradata IDW: Teradata Analytics Portfolio (SQL)
  • Aster SQL-H Integration with Hadoop Catalog: A Business User’s Bridge to Analyzing Data in Hadoop
    • Industry’s First Database Integration with Hadoop’s HCatalog
    • Abstraction layer to easily and efficiently read structured & multi-structured data stored in HDFS
    • Uses Hadoop Catalog (HCatalog) to perform data abstraction functions (e.g. automatically understands tables, data partitions)
    • HDFS data presented to users as Aster tables
    • Fully accessible within the Aster SQL and SQL-MapReduce processing engines, plus ODBC/JDBC & BI tools
    [Diagram: Aster SQL-H above a Hadoop stack of MapReduce, Hive, Pig, HCatalog and HDFS]
  • Data & Processing Locality in SQL-H
    Aster Layer: SQL-H
    • SQL & SQL-MapReduce processing
    • Intermediate data persistence
    • Optional: HDFS data subset persistence for maximum performance
    Hadoop Layer: HDFS
    • HCatalog: metadata store
    • HDFS: data repository
    • No MapReduce processing in Hadoop
    • Directly & in parallel move data from HDFS to Teradata Aster
    [Diagram: Data Filtering and Data flows between the Aster layer and Hadoop MR, Hive, Pig, HCatalog]
  • Benefits of Aster SQL-H™: Deep metadata layer integration between Aster and Hadoop
    Business Analysts (Powerful analytics & Performance)
    • 50+ advanced SQL-MapReduce functions (Aster MapReduce Portfolio)
    • Simplified, SQL-based interface with Hadoop data structures (HCatalog)
    • Interoperability with existing ecosystem & skillset
    Architects and Administrators (Maintainability)
    • Leverage existing DBA skill-sets without additional overhead
    • Simplify administration and monitoring
      - Alternatives require manual creation and maintenance of metadata
      - Less work and fewer errors
      - Can do filtering with Aster; select data from HCatalog, leverage partitioning
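The slides say SQL-H relies on HCatalog metadata to understand tables and partitions, and that filtering can leverage partitioning. The toy Python sketch below shows the general idea of metadata-driven partition pruning. It is not how SQL-H or HCatalog is implemented: the `catalog` structure, the `read_table` function, and the sample rows are all hypothetical, and rows are stored inline only so the example runs on its own.

```python
# Hypothetical catalog: table name -> partition value -> rows in that partition.
# A real metadata store would hold locations and schemas, not the rows themselves.
catalog = {
    "weblogs": {
        "2012-08-20": [{"user_id": "u1", "page": "home"}],
        "2012-08-21": [{"user_id": "u1", "page": "checkout"},
                       {"user_id": "u2", "page": "help"}],
    }
}

def read_table(catalog, table, partition_filter=None):
    """Return (partitions_read, rows) for the partitions that pass the filter.

    Partitions rejected by partition_filter are skipped using the partition
    value alone, so their rows are never touched.
    """
    partitions_read = []
    rows = []
    for partition, partition_rows in sorted(catalog[table].items()):
        if partition_filter is not None and not partition_filter(partition):
            continue
        partitions_read.append(partition)
        rows.extend(partition_rows)
    return partitions_read, rows
```

With a filter such as `lambda p: p == "2012-08-21"`, only one partition is read. That is the kind of saving an analyst gets when the catalog, rather than hand-maintained metadata, describes how the data is laid out.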
  • Aster MapReduce Portfolio: the App Store of Big Data
    Some of the 50+ out-of-the-box analytical apps
    • Path Analysis: Discover patterns in rows of sequential data
    • Text Analysis: Derive patterns and extract features in textual data
    • Statistical Analysis: High-performance processing of common statistical calculations
    • Segmentation: Discover natural groupings of data points
    • Marketing Analytics: Analyze customer interactions to optimize marketing decisions
    • Data Transformation: Transform data for more advanced analysis
  • Big Data Architecture: Optimizing Workloads with Specialized Approach
  • When to Use Which? The best approach by workload and data type
    Processing as a Function of Schema Requirements by Data Type
    Column headings: Low Cost Storage & Retention; Loading and Refining; Data Pre-Processing, Prep, Cleansing; Transformations; Analytics; Reporting (User-driven, interactive)
    Stable Schema (Financial analysis, ad-hoc/OLAP; Enterprise-wide BI and Reporting; Spatial/Temporal; Active Execution)
    • Platforms in slide order: Teradata / Hadoop; Teradata; Teradata; Teradata; Teradata (SQL analytics)
    Evolving Schema (Interactive data discovery; Web clickstream; Set-top box analysis; CDRs, Sensor logs, JSON)
    • Platforms in slide order: Aster / Hadoop; Hadoop; Aster (joining with structured data); Aster; Aster (SQL + MapReduce Analytics)
    Format, No Schema (Social feeds, text, document, or image processing; Audio/video storage and refining; Storage and batch transformations)
    • Platforms in slide order: Hadoop; Hadoop; Hadoop; Aster (MapReduce Analytics)
  • ESG Benchmark Report Summary: 3rd-party validation of Aster and Hadoop “fit”
    Scope
    • Identical hardware for Aster and Hadoop
    • Clickstream, sentiment, & traditional retail data
    • Compare “time to insight” and “time to develop”
    Results
    • Loading: Hadoop 1.8x faster
    • Transforms: Hadoop 1.3x faster
    • Analytics: Aster 35x faster (range: 4-416x)
    • Development: Aster 3x faster
  • Hadoop vs. Aster: Web Clickstream Analytics
    [Chart callouts: Aster 1.5X Faster; Aster 33X Faster; Aster 6X Faster]
    On average Aster is 18x Faster
  • Example: Golden Path Analysis of Top Site Paths
    Identifying Top Pathing Occurrences (for any event of interest)
    • Business Question
      • How do we find and rank the 10 most frequent paths taken to the checkout page?
        - Page Visits exist in multiple rows in the database, for each user
    • Analytics Question
      • What is the most common path for a user on the site to…
        1. Enter the site
        2. View any page (other than the Help page)
        - Make a purchase on the Checkout page
        - Rank the top 10 occurrences
    Query:
        SELECT click_path, count(*) as path_frequency
        FROM nPath(
          ON clicks
          PARTITION BY user_id
          ORDER BY timestamp
          MODE( overlapping )
          PATTERN('(RELEVANT|IGNORE)*.BUY')
          SYMBOLS(
            page_type IN ('help.asp') AS IGNORE,
            page_type NOT IN ('help.asp') AS RELEVANT,
            page_type = 'checkout' as BUY)
          RESULT( accum( page_id of RELEVANT) as click_path )
        ) T
        GROUP BY click_path
        ORDER BY count(*) desc
        LIMIT 10;
  • Single Channel Pathing Analysis
  • Analyzing Multi-channel Identifies Advertising Signal
  • Hadoop Provides 1.3x Faster ELT on Average
  • When to Use Which Depends on Data Type: Aster faster on parsing and sessionizing weblogs
  • Evolving Schema Example: Aster Digital Marketing Client
    [Diagram labels: Custom Data by Client; Media Data (Aggregated); Cookie-level data; Raw Web Logs; Ad Server Logs; Hadoop (on AWS) (Storage, aggregations, cleansing); Archival; Teradata Aster; Analytic Tools]
    • Segmentation: Custom SQL-MR algorithms to match and create centralized identifiers
    • Sessionize by client
    • nPath identifies segment path analysis (behavior after ads)
    • Benefits:
      - Marketing analysts more productive with Aster
      - Lower cost: storage and batch refining done on Amazon Elastic MapReduce
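The "sessionize" step named on this slide can be pictured with a short sketch. This is plain Python under stated assumptions, not Aster's SQL-MR sessionization function: the `(visitor_id, timestamp_seconds)` layout, the name `sessionize`, and the 30-minute inactivity timeout are hypothetical choices made for illustration. A new session starts whenever the gap between two consecutive events from the same visitor exceeds the timeout.

```python
from collections import defaultdict


def sessionize(events, timeout_seconds=1800):
    """Assign a per-visitor session number to each event.

    events: iterable of (visitor_id, timestamp_seconds) tuples (hypothetical layout).
    timeout_seconds: inactivity gap that starts a new session (assumed 30 minutes).
    Returns a list of (visitor_id, timestamp_seconds, session_number).
    """
    by_visitor = defaultdict(list)
    for visitor_id, timestamp in events:
        by_visitor[visitor_id].append(timestamp)

    result = []
    for visitor_id, stamps in by_visitor.items():
        stamps.sort()
        session_number = 0
        previous = None
        for timestamp in stamps:
            # A long enough pause since the previous event opens a new session
            if previous is not None and timestamp - previous > timeout_seconds:
                session_number += 1
            result.append((visitor_id, timestamp, session_number))
            previous = timestamp
    return result


# Hypothetical cookie-level events, for illustration only
sample_events = [("c1", 0), ("c1", 600), ("c1", 4000), ("c2", 100)]
for row in sessionize(sample_events):
    print(row)
```

Once events carry a session number, a path analysis like the one described for nPath can be run per session rather than per visitor.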
  • More Accurate Customer Churn Prevention
    • Hadoop captures, stores and transforms social, images and call records
    • Aster does path and sentiment analysis with multi-structured data
    [Diagram labels: Multi-Structured Raw Data (Social feeds, Clickstream Data, Call Center Voice Records, Check Images); Hadoop (Capture, Retain & Refine Layer); Call Data; Check Data; Aster Discovery Platform; Analysis + Marketing Automation (Customer Retention Campaign); Sentiment Scores; Analytic Results; Dimensional Data; Traditional Data Flow (Data Sources, ETL Tools); Teradata Integrated DW]
  • Summary: Bringing the VALUE of Hadoop to the Enterprise
    • Teradata is focused on extracting the most business value for customers from data in Hadoop
    • Mainstream organizations need a unified big data architecture
      - Best-of-breed with Hadoop, Aster, Teradata
      - Brings “Data Science” to business analysts
      - 50+ business-ready MapReduce analytics and apps
      - Enabled by SQL-MapReduce framework and new SQL-H
    • Learn more at www.asterdata.com/mapreduce
  • Twitter Tag: #briefr
  • THE GREAT DIVIDE: BRIDGING UNSTRUCTURED AND STRUCTURED DATA FOR NEW CUSTOMER INSIGHTS
    Briefing Room - August 21, 2012
    John O’Brien, Radiant Advisors
    john.obrien@radiantadvisors.com
    © Copyright 2012 Radiant Advisors. All Rights Reserved. v1.00.000
  • JOHN O’BRIEN, Principal and Founder, Radiant Advisors
    With over 25 years of experience delivering value through data warehousing and BI programs, John O’Brien’s unique perspective comes from the combination of his roles as a practitioner, consultant, and vendor in the BI industry. His knowledge in designing, building, and growing enterprise BI systems and teams brings real-world insights to each role and phase within a BI program.
    Today, through Radiant Advisors, John provides research and advisory services that guide companies in meeting the demands of next-generation information management, architecture, and emerging technologies.
    • Instructor: As a recognized thought leader in BI, John has been publishing articles and presenting at conferences in North America and Europe for the past 10 years, including The Data Warehousing Institute, where he has been invited as one of TDWI’s Best Practices judges, Executive Summit presenters and expert panel participants. John has also developed and presented many of his own courses that now comprise the initial Radiant Advisors Learning Catalog.
    • 10+ years Experienced: In 2005, John co-founded and became CTO of a data warehouse appliance company that raised $43 million in several rounds of venture capital financing and has many global production customers. As CTO, John’s primary role was to focus product development and BI market strategy.
    • Education: John has a B.S. in Mechanical Engineering from California State University with an emphasis in control systems and instrumentation and an Executive M.B.A. from University of Colorado. He has been a Certified Business Intelligence Professional (CBIP) since 2005 with mastery levels in Leadership and Administration, Database Administration and Business Intelligence.
  • Bridging the Great Divide: Unstructured and Structured Data. WHERE DOES CONTEXT LIVE?
    [Diagram labels: Structured BI Tools; Direct access; Context leveraged; Context(s) leveraged; Context in structures; Context in abstraction; Individual Context in Unstructured Data with Data Scientists; Centralized Context in abstraction; Hive; PIG; M/R; HCatalog; MapReduce; Hadoop HDFS; scale from More Rigid to More Agile]
  • Bridging the Great Divide: Unstructured and Structured Data. UNLOCKING UNSTRUCTURED VALUE
    [Diagram labels: Yesterday; Tomorrow; Value; Users Involved; Very Few Data Scientists; More Analysts; Many Many Consumers; Power Users; Analysts & Casual Users; Tool; DB; BI; Hive; PIG; HCatalog; MapReduce; Hadoop HDFS]
  • Bridging the Great Divide: Unstructured and Structured Data. DISCOVERY IN BI PROCESSES
    [Diagram labels: 1. Discover Context; 2. Defined Context Available to Structured Database; Very Few Data Scientists; Few Analysts/Modelers; More Analysts/Modelers; Many More Analysts; Many Many Consumers; Tool; BI Tool; BI; Hive; PIG; HCatalog; M/R; Hadoop HDFS]
  • Bridging the Great Divide: Unstructured and Structured Data. MODERN BI ARCHITECTURES
    • Data Warehouse: Optimized Work Loads; Benefit from Context; Insulate Change; Migrate History
    • Hadoop: Massive Scalability; Lowest Cost; Handles Complexity
    [Diagram labels: Internet, Sensor data; Operational Systems; Direct to Staging or ETL; Acquire; Staging; ETL; MapReduce; PIG; Hive; HCatalog; Hadoop HDFS; Data Marts; Very Few Data Scientists; Few Analysts/Modelers; Many Many Consumers]
  • Bridging the Great Divide: Unstructured and Structured Data. SUMMARY
    • Understand context in processes and architectures
    • Realize that value is unlocked with more users
    • Discovery is a powerful BI process to operationalize
    • Modern BI Architectures are integrating Hadoop
  • Questions:
    • Is the Aster solution intended as a Data Discovery Platform, an Analytic Engine Platform, or both?
    • Is there any difference in semantics between Teradata's vision of the Integrated Data Warehouse and the "Analytic Platform", which includes Aster and Hadoop?
    • Does the HCatalog need to be defined before users can use SQL-H to query Hadoop?
    • The Aster MapReduce Portfolio enables its users to query and pull data from the Hadoop HDFS directly via SQL-H. When data is pulled from HDFS into Aster, are the Aster tables modeled as in HCatalog or as key-value pairs?
    • Is the output of the SQL-MR in Aster inserted into another physical table for further usage?
  • Questions:
    • Given that Hive and PIG are interface layers above the MapReduce processing layer, does the Aster SQL-H layer work as an interface layer to MapReduce? Does SQL-H work similarly to Hive when processing data inside HDFS?
    • When it comes to performance comparisons between Aster and Hadoop, what guidelines were given in sizing the Hadoop environment?
    • Given the commodity nature of Hadoop, does it make sense to increase the size of the Hadoop environment to gain performance more cost-effectively?
    • When should one use Hadoop or Aster? Based on data type? Based on workload (e.g. Load, ETL, Analyze)? Or based on analysis type (e.g. Sentiment Classification or Sessionization)?
  • Questions:
    • Does Aster store "multi-structured" data such as audio, video, image, and PDF files as a BLOB/CLOB field in database records, or does it store pointers to files?
    • Does Aster Data have Predictive Model Markup Language (PMML) compatibility, so that analytic models developed in SAS or other platforms can be migrated to Aster for Discovery?
  • August: Analytics; September: Integration; October: Database; November: Cloud; December: Innovators