Unified big data architecture

Trending use cases have pointed out the complementary nature of Hadoop and existing data management systems—emphasizing the importance of leveraging SQL, engineering, and operational skills, as well as incorporating novel uses of MapReduce to improve distributed analytic processing. Many vendors have provided interfaces between SQL systems and Hadoop but have not been able to semantically integrate these technologies while Hive, Pig and SQL processing islands proliferate. This session will discuss how Teradata is working with Hortonworks to optimize the use of Hadoop within the Teradata Analytical Ecosystem to ingest, store, and refine new data types, as well as exciting new developments to bridge the gap between Hadoop and SQL to unlock deeper insights from data in Hadoop. The use of Teradata Aster as a tightly integrated SQL-MapReduce® Discovery Platform for Hadoop environments will also be discussed.

Usage Rights

© All Rights Reserved

Unified big data architecture Presentation Transcript

  • 1. Integrating Hadoop within an Enterprise Analytic Ecosystem
    Priyank Patel | Product Management
    June 13, 2012
  • 2. Topics
    •  Unified Big Data Architecture Overview
    •  Aster SQL-H™: The business user’s bridge to Hadoop Data
    Confidential and proprietary. Copyright © 2012 Teradata Corporation.
  • 3. Big Data: From Transactions to Interactions
    [Diagram: tiers labeled ERP, CRM, WEB, and BIG DATA, captioned "Increasing data variety and complexity". Labels shown: User Generated Content, Social Network, Mobile Web, External Demographics, User Click Stream, Sentiment, Web logs, A/B testing, Business Data Feeds, Offer history, Dynamic Pricing, HD Video, Affiliate Networks, Speech to Text, Segmentation, Search marketing, Offer details, Product/Service Logs, Behavioral Targeting, Customer Touches, Purchase detail, Purchase record, Support Contacts, Dynamic Funnels, SMS/MMS, Payment record.]
  • 4. Unified Big Data Architecture
    Bridging Classic & Big Data Worlds
    •  Classic BI Method: Structured & Repeatable Analysis
       -  Business determines what questions to ask
       -  IT structures the data to answer those questions
       -  SQL performance and structure
       -  “Capture only what’s needed”
    •  Big Data Analytics: Multi-structured & Iterative Analysis
       -  IT delivers a platform for storing, refining, and analyzing all data sources
       -  Business explores data for questions worth answering
       -  MapReduce processing flexibility
       -  “Capture in case it’s needed”
  • 5. Need for a Unified Big Data Architecture for New Insights
    Enabling All Users for Any Data Type from Data Capture to Analysis
    [Diagram. Tools: Java, C/C++, Pig, Python, R, SAS, SQL, Excel, BI, Visualization, etc. Layers: Reporting and Execution in the Enterprise; Discover and Explore; Capture, Store and Refine. Data sources: Audio/Video, Images, Docs, Text, Web & Social, Machine Logs, CRM, SCM, ERP.]
  • 6. Unified Big Data Architecture for the Enterprise
    [Diagram. Users: Engineers, Data Scientists, Quants, Business Analysts. Tools: Java, C/C++, Pig, Python, R, SAS, SQL, Excel, BI, Visualization, etc. Layers: Discovery Platform; Integrated Data Warehouse; Capture, Store, Refine. Data sources: Audio/Video, Images, Text, Web & Social, Machine Logs, CRM, SCM, ERP.]
  • 7. What’s Technically Different in Big Data Analytics
    Variety of data types and analytics require different schemas
    •  Data that uses a stable schema (structured)
       -  Data from packaged business processes with well-defined and known attributes (e.g., ERP data, inventory records, supply chain records)
    •  Data that has an evolving schema (semi-structured)
       -  Data generated by machine processes, with a known but changing set of attributes (e.g., Web logs, CDRs, sensor logs, JSON, social profiles, Twitter feeds)
    •  Data that has a format, but no schema (unstructured)
       -  Data captured by machines with a well-defined format, but no semantics (e.g., images, videos, web pages, PDF documents)
       -  Semantics can be extracted from raw data by interpreting the format and pulling out the required data (e.g., shapes from video, face recognition in images, logo detection)
       -  Sometimes formatted data is accompanied by metadata that has a stable or evolving schema; that metadata needs to be classified and treated separately
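Slide 7's "evolving schema" category can be illustrated with a minimal Python sketch. The sample log records and the helper names `observed_schema` and `to_rows` are assumptions made up for illustration; they do not come from the presentation or from any Teradata, Aster, or Hadoop API. The sketch discovers the attribute set from the data itself and then projects every record onto it, which is one simple way to handle a known but changing set of attributes.

```python
import json

# Hypothetical web-log records: the attribute set changes from record to record.
RAW_RECORDS = [
    '{"ts": "2012-06-13T10:00:00", "user": "u1", "url": "/home"}',
    '{"ts": "2012-06-13T10:00:05", "user": "u2", "url": "/cart", "referrer": "/home"}',
    '{"ts": "2012-06-13T10:00:09", "user": "u1", "url": "/pay", "device": "mobile"}',
]


def observed_schema(lines):
    """Return every attribute seen in the records, in first-seen order."""
    columns = []
    for line in lines:
        for key in json.loads(line):
            if key not in columns:
                columns.append(key)
    return columns


def to_rows(lines, columns):
    """Project each record onto the given columns, filling gaps with None."""
    return [tuple(json.loads(line).get(col) for col in columns) for line in lines]


columns = observed_schema(RAW_RECORDS)
rows = to_rows(RAW_RECORDS, columns)
```

A stable-schema source would skip the discovery step and use a fixed column list; here the column list grows as new attributes such as `referrer` and `device` appear.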
  • 8. Diversity of Data Processing and Analytics
    Unified Big Data Architecture Must Handle Each Workload Optimally
    •  Low cost storage and retention
       -  Retention of raw data in a manner that provides a low TCO per terabyte of storage
       -  Access to deep storage is still required, but not at the same speeds as in a front-line system
    •  Loading and refining
       -  Load: bring data into the system from the source system
       -  Pre-processing / prep / cleansing / constraint validation: prepare data for downstream processing (e.g., fetch dimension data, record new incoming batch, archive old window batch)
       -  Transformations: convert one structure of data into another. This may require going from 3NF to a star/snowflake schema in relational, from text to relational, or from relational to graph (i.e., structural transformations)
    •  Reporting
       -  Querying what happened, where it happened, how much happened, and who did it
    •  Analytics (user-driven, interactive, ad-hoc)
       -  Relationship modeling that can be done via declarative SQL (e.g., scoring, basic stats)
       -  Relationship modeling done via procedural MR (e.g., model building, time series)
  • 9. When to Use Which? The best approach by workload and data type
    Processing as a Function of Schema Requirements by Data Type
    •  Stable Schema (financial analysis, ad-hoc/OLAP; enterprise-wide BI and reporting; spatial/temporal; active execution)
       -  Low cost storage & retention: Teradata / Hadoop
       -  Loading and refining, data pre-processing, prep, cleansing: Teradata
       -  Loading and refining, transformations: Teradata
       -  Reporting: Teradata
       -  Analytics (user-driven, interactive): Teradata (SQL analytics)
    •  Evolving Schema (interactive data discovery; Web clickstream, social feeds; set-top box analysis; CDRs, sensor logs, JSON)
       -  Low cost storage & retention: Hadoop
       -  Loading and refining, data pre-processing, prep, cleansing: Aster / Hadoop
       -  Loading and refining, transformations: Aster
       -  Reporting: Aster (joining with structured data)
       -  Analytics (user-driven, interactive): Aster (SQL + MapReduce Analytics)
    •  Format, No Schema (image processing; audio/video storage and refining; storage and batch transformations)
       -  Low cost storage & retention: Hadoop
       -  Loading and refining, data pre-processing, prep, cleansing: Hadoop
       -  Loading and refining, transformations: Hadoop
       -  Analytics (user-driven, interactive): Aster (MapReduce Analytics)
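The "When to Use Which?" matrix on slide 9 is essentially a lookup from (schema type, workload) to a platform, which can be sketched in Python as below. The dictionary keys and the `recommend` function are illustrative names, not part of any product. The cell values are read from the flattened slide text, so the placement of individual cells is an assumption wherever the extraction is ambiguous; the slide appears to list no reporting platform for the "Format, No Schema" row, so that cell is left out.

```python
# Platform suggestions as read from slide 9's flattened table.
# ASSUMPTION: cell placement follows the most plausible reading of the transcript.
RECOMMENDED = {
    "stable": {
        "storage": "Teradata / Hadoop",
        "preprocessing": "Teradata",
        "transformations": "Teradata",
        "reporting": "Teradata",
        "analytics": "Teradata (SQL analytics)",
    },
    "evolving": {
        "storage": "Hadoop",
        "preprocessing": "Aster / Hadoop",
        "transformations": "Aster",
        "reporting": "Aster (joining with structured data)",
        "analytics": "Aster (SQL + MapReduce Analytics)",
    },
    "format_no_schema": {
        "storage": "Hadoop",
        "preprocessing": "Hadoop",
        "transformations": "Hadoop",
        # No reporting entry: the transcript shows none for this row.
        "analytics": "Aster (MapReduce Analytics)",
    },
}


def recommend(schema_type, workload):
    """Return the suggested platform, or None if the matrix lists none."""
    return RECOMMENDED.get(schema_type, {}).get(workload)


clickstream_analytics = recommend("evolving", "analytics")
image_reporting = recommend("format_no_schema", "reporting")
```

Returning `None` for a missing cell keeps "no recommendation" distinct from a real platform name.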
  • 10. Aster Digital Marketing Client
    •  Segmentation: Custom SQL-MR algorithms to match and create centralized identifiers
    •  Sessionize by client
    •  nPath identifies segment path analysis (behavior after ads)
    •  Benefits:
       -  Marketing analysts more productive with Aster
       -  Lower cost: storage and batch refining done on Amazon Elastic MapReduce
    [Diagram labels: Custom Analytic Tools; Data by Client; Media Data (Aggregated); Teradata Aster; Cookie-level data; Raw Web Logs; Archival; Ad Server Logs; Hadoop (on AWS) (Storage, aggregations, cleansing).]
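The "Sessionize by client" step on slide 10 can be sketched in plain Python as follows. This is a generic gap-based sessionization, not Aster's SQL-MapReduce function; the 30-minute timeout, the `(client_id, timestamp)` input shape, and the sample clicks are all assumptions chosen for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def sessionize(clicks, timeout=timedelta(minutes=30)):
    """Assign a per-client session number to each (client_id, timestamp) click.

    A new session starts whenever the gap since that client's previous
    click exceeds `timeout` (an assumed threshold).
    """
    by_client = defaultdict(list)
    for client_id, ts in clicks:
        by_client[client_id].append(ts)

    result = []
    for client_id, stamps in by_client.items():
        session = 0
        previous = None
        for ts in sorted(stamps):
            if previous is not None and ts - previous > timeout:
                session += 1
            result.append((client_id, ts, session))
            previous = ts
    return result


# Hypothetical click data: client "a" returns after a 50-minute gap.
clicks = [
    ("a", datetime(2012, 6, 13, 10, 0)),
    ("a", datetime(2012, 6, 13, 10, 10)),
    ("a", datetime(2012, 6, 13, 11, 0)),
    ("b", datetime(2012, 6, 13, 10, 5)),
]
sessions = sessionize(clicks)
```

Once every click carries a session number, path analysis of the kind the slide attributes to nPath can operate on the ordered clicks within each session.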
  • 11. More Accurate Customer Churn Prevention
    •  Hadoop captures, stores and transforms images and call records
    •  Aster does path and sentiment analysis with multi-structured data
    [Diagram labels: Social & Web data; Multi-Structured Raw Data; Call Data; Call Center Voice Records; Check Data; Check Images; Hadoop (Capture, Retention, Transformation Layer); Aster Discovery Platform; Analysis; Marketing Automation (Customer Retention Campaign); Analytic Results; Dimensional Data; Traditional Data Flow; Data Sources; ETL Tools; Teradata Integrated DW.]
  • 12. Bridging the Business Analyst Gap for Hadoop Data
  • 13. Aster SQL-H™
    A Business User’s Bridge to Analyze Hadoop Data
    Aster SQL-H gives analysts and data scientists a better way to analyze data stored cheaply in Hadoop
    •  Allow standard ANSI SQL access to Hadoop data
    •  Leverage existing BI tool investments
    •  Enable 50+ prebuilt SQL-MapReduce Apps and IDE
    •  Improve self-sufficiency for analysts going against Hadoop
  • 14. The Big Data Architecture Today Has Gaps
    •  Gap 1: Analysts
    •  Gap 2: File system lacks optimizers, data locality, indexes
    [Diagram. Users: Engineers, Data Scientists, Quants, Business Analysts. Tools: Java, C/C++, Pig, Python, R, SAS, SQL, Excel, BI, Visualization, etc. Layers: MapReduce (Processing); Discovery Platform; Active Data Warehouse; Database and Analytic Processing Layer; Data Storage and Refining. Data sources: Audio/Video, Images, Text, Web & Social, Machine Logs, CRM, SCM, ERP.]
  • 15. Analyst’s Goal: Get Insights from Data in Hadoop
    [Diagram. Users: Engineers, Data Scientists, Quants, Business Analysts. Labels: Aster MapReduce Portfolio; Teradata Analytics Portfolio; Custom Code and Development; SQL & SQL-MapReduce; SQL; MR, Pig, Hive; "IT is the optimizer"; Teradata Aster Discovery Platform; Teradata IDW.]
  • 16. Analytics on Hadoop Data with Aster SQL-H
    [Diagram. Users: Engineers, Data Scientists, Quants, Business Analysts. Labels: Aster MapReduce Portfolio (shown twice); Teradata Analytics Portfolio; SQL-H; SQL & MapReduce; SQL & SQL-MapReduce; SQL; Teradata Aster Discovery Platform; Teradata IDW.]
  • 17. Aster SQL-H Integration with Hadoop Catalog
    A Business User’s Bridge to Analyzing Data in Hadoop
    •  Industry’s first database integration with Hadoop’s HCatalog
    •  Abstraction layer to easily and efficiently read structured & multi-structured data stored in HDFS
    •  Uses Hadoop Catalog (HCatalog) to perform data abstraction functions (e.g., automatically understands tables, data partitions)
    •  HDFS data presented to users as Aster tables
    •  Fully accessible within the Aster SQL and SQL-MapReduce processing engines, plus ODBC/JDBC & BI tools
    [Diagram labels: Aster SQL-H; Hadoop; MapReduce; Hive; HCatalog; Pig; HDFS.]
  • 18. Benefits of Aster SQL-H™
    Deep metadata layer integration between Aster and Hadoop
    Business Analysts (Powerful analytics & Performance)
    •  50+ advanced SQL-MapReduce functions (Aster MapReduce Portfolio)
       -  Analytics on Hadoop data no longer requires expensive talent and training
    •  Simplified, SQL-based interface with Hadoop data structures (HCatalog)
       -  No longer limited by Hive’s QL
    •  Interoperability with existing ecosystem & skillset
       -  BI tools (MSTR, Tableau, Cognos), ETL tools, SQL analysts, existing apps
    Architects and Administrators (Maintainability)
    •  Leverage existing DBA skill-sets without additional overhead
    •  Simplify administration and monitoring
       -  Competitors require manual creation and maintenance of metadata
       -  Less work and fewer errors
       -  Can do filtering with Aster; select data from HCatalog, leverage partitioning
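Slide 18's point about selecting data through catalog metadata and leveraging partitioning can be sketched as partition pruning. The Python below is a hypothetical in-memory stand-in: `CATALOG`, `paths_to_scan`, the `weblogs` table, and the HDFS-style paths are invented for illustration and are not the HCatalog or Aster SQL-H API. It shows only the general idea that a filter on the partition key lets a reader skip whole directories before touching any data.

```python
# Hypothetical catalog metadata; this is NOT the HCatalog API.
CATALOG = {
    "weblogs": {
        "columns": ["ts", "user", "url"],
        "partition_key": "dt",
        "partitions": {
            "2012-06-11": "/data/weblogs/dt=2012-06-11",
            "2012-06-12": "/data/weblogs/dt=2012-06-12",
            "2012-06-13": "/data/weblogs/dt=2012-06-13",
        },
    }
}


def paths_to_scan(table, wanted):
    """Return only the storage paths whose partition value satisfies `wanted`."""
    meta = CATALOG[table]
    return [
        path
        for value, path in sorted(meta["partitions"].items())
        if wanted(value)
    ]


# Keep partitions from 2012-06-12 onward; ISO dates compare correctly as strings.
recent = paths_to_scan("weblogs", lambda dt: dt >= "2012-06-12")
```

Because the table-to-path mapping lives in the catalog, the caller never hard-codes directory names, which is the maintenance benefit the slide contrasts with manually created metadata.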
  • 19. Aster MapReduce Portfolio: the App Store of Big Data
    Some of the 50+ out-of-the-box analytical apps
    •  Path Analysis: Discover patterns in rows of sequential data
    •  Text Analysis: Derive patterns and extract features in textual data
    •  Statistical Analysis: High-performance processing of common statistical calculations
    •  Segmentation: Discover natural groupings of data points
    •  Marketing Analytics: Analyze customer interactions to optimize marketing decisions
    •  Data Transformation: Transform data for more advanced analysis
  • 20. Summary
    •  Mainstream organizations need a unified big data architecture
       -  Best-of-breed with Hadoop, Aster, Teradata
       -  Brings “Data Science” to business analysts
       -  50+ business-ready MapReduce analytics and apps
       -  Enabled by SQL-MapReduce framework and new SQL-H
    •  Learn more: asterdata.com/mapreduce
    •  Download: developer.teradata.com/aster
    •  Breakout Session: Thursday, 4:30 pm. How does SQL-H work? Sushil Thomas, Teradata Aster