Hadoop World 2011: Big Data Architecture: Integrating Hadoop with Other Enterprise Analytic Systems - Tasso Argyros, Teradata

Recent research has pointed out the complementary nature of Hadoop and other data management solutions and the importance of leveraging existing systems, SQL, engineering, and operational skills, as well as incorporating novel uses of MapReduce to improve analytic processing. Come to this session to learn how companies optimize the use of Hadoop with other enterprise systems to improve overall analytical throughput and build new data-driven products. This session covers: ways to achieve high-performance integration between Hadoop and relational-based systems; Hadoop+NoSQL vs Hadoop+SQL architectures; high-speed, massively parallel data transfer to analytical platforms that can aggregate web log data with granular fact data; and strategies for freeing up capacity for more explorative, iterative analytics and ad hoc queries.

Published in: Technology, Business

  1. Big Data Architecture
     Tasso Argyros | co-President | Teradata Aster
     Twitter: @targyros
     November 2011
  2. What We’re Covering Today
     • Data Science in Enterprise (vs the Valley)
     • Quick Overview of Teradata Aster’s Technology
     • Hybrid Hadoop Architectures
     • Connecting Hadoop to Other Systems
     • MapReduce Enterprise Use Cases
     Teradata Confidential and Proprietary
  3. About Aster Data
     • Aster has been a Big Data & Big Analytics pioneer since 2005 by developing an MPP SQL+MapReduce platform
     • Aster Data acquisition completed on April 6, 2011
     • Opportunity for Teradata to expand its business in the Big Data analytics market to include multi-structured data and new analytical capabilities
     • Intense Focus on the Enterprise
  4. The Nature of Data Scientist Analytics in the Enterprise
  5. What is Data Science?
     Diagram: Data Scientists, combining Curiosity/Cleverness, Technical Expertise, and Business Acumen
  6. Data Science is Exploding
  7. What is Making Data Science Popular?
     1. Proliferation of Data-Driven Products & Businesses
     2. Consumer Interactions with Web & Social Channels
     3. Breadth of Tools Available
     4. Wealth of Machine-Generated Data
  8. A Day in the Life of a Data Scientist: “Investigative Analytics”
     Diagram: Integrate, Investigate, Implement
  9. Data Scientists in the Enterprise are Not Only Developers
     • SQL Analysts
     • SAS/R Analysts
     • DBMS Power Users
     • Java Coders
     • …
     Diagram: Curiosity/Cleverness, Technical Expertise, Business Acumen
  10. Data Scientists Have Different Skills
      Combination of:
      - Analysts
      - Coders
      - Sys admins / EngOps
      Hard to find & expensive
      Diagram labels: Enterprises, Web Startups
  11. Data Scientists and MapReduce Platforms
  12. A Brief History of MapReduce & Hadoop
      • 2004: Google publishes MapReduce paper at OSDI Conference
      • 2006: Hadoop becomes the first open-source implementation of MapReduce
      • 2008: Aster Data becomes the first DBMS vendor to incorporate MapReduce
        - Aster Data tightly coupled: embedded MapReduce with SQL to bring MapReduce to enterprises (SQL-MapReduce®)
      • 2009-2011: Follow-on vendors announce connectors to Hadoop
        - Hadoop Distributions/Platforms emerge: Amazon, Cloudera, Hortonworks, DataStax, MapR, …
  13. MapReduce is the SQL of Big Analytics
      • MapReduce is a parallel programming framework
        - “J2EE for Big Data Analytics”
      • MapReduce provides
        - Automatic parallelization
        - Fault tolerance
        - Monitoring & status updates
      • Hadoop
        - Open source MapReduce
      • Aster
        - Commercial implementation of MapReduce + SQL
      Diagram: Map Function, Scheduler, map, shuffle, reduce, Results
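The map, shuffle, and reduce stages named on the slide can be illustrated with a minimal single-process sketch. This is not how Hadoop or Aster implement MapReduce: there is no scheduler, parallelism, or fault tolerance here, and the function names (`map_fn`, `shuffle`, `reduce_fn`, `run_job`) and the word-count task are assumptions chosen for illustration.

```python
from collections import defaultdict


def map_fn(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_fn(key, values):
    """Reduce step: combine all values for one key into a single result."""
    return key, sum(values)


def run_job(lines):
    """Run map, shuffle, and reduce in sequence on in-memory input."""
    mapped = [pair for line in lines for pair in map_fn(line)]
    grouped = shuffle(mapped)
    return dict(reduce_fn(key, values) for key, values in grouped.items())


word_counts = run_job(["big data big analytics", "big data"])
print(word_counts)
```

In a real framework the map and reduce calls would run on many machines at once; only the two user-supplied functions would look like the ones above.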
  14. (no transcribed text)
  15. The Technology Gap
      SQL-MR
      • Analyst-friendly
      • Iterative & Fast
      • Integrates well with BI/Viz Tools
      Hadoop-MR
      • Developer-friendly
      • Batch-oriented
      • Requires lots of coding
      But what if you need both?
  16. Quick Aster & SQL-MapReduce Overview
  17. Filling the Gap: SQL-MapReduce
  18. Enabling Analysis of Diverse Data
      Aster capabilities for processing and analyzing multi-structured, raw data
      Diagram: Multi-structured raw data; Aster Analytic Platform with SQL-MapReduce (tokenize, unpack, sessionize, …); Output (Col1, Col2, Col3, Col4); Structured Data (DW, DBMS)
      Integrate Data
      • Load raw data directly into Aster Database
      • Bypass complex ETL pipeline via ELT
      Process and Explore
      • Use SQL-MapReduce functions to interpret & analyze raw data
      • Leverage flexible, dynamically-created schema at runtime
      Leverage Results
      • Structured output of SQL-MapReduce processing available for further use or output to data warehouse
  19. SQL-MapReduce for Big Data Analytics
      Example: Pattern Matching, Time Series Analysis
      Discover patterns in rows of sequential data
      • Weblogs {user, page, time}: Click 1, Click 2, Click 3, Click 4
      • Smart Meters {device, value, time}: Reading 1, Reading 2, Reading 3, Reading 4
      • Sales Transactions {user, product, time}: Purchase 1, Purchase 2, Purchase 3, Purchase 4
      • Stock Tick Data {stock, price, time}: Tick 1, Tick 2, Tick 3, Tick 4
      • Call Data Records {user, number, time}: Call 1, Call 2, Call 3, Call 4
      Aster SQL-MapReduce Approach
      • Single pass over the data
      • Linked list sequential analysis
      • Gap recognition
      Traditional SQL Approach
      • Full Table Scans
      • Self-Joins for sequencing
      • Limited operators for ordered data
      Applications by industry
      • eBusiness: Sessionization, Click Analysis, Golden Path, Rev Attribution
      • Telecom: Calling Patterns, Signal Processing, Forecasting
      • Financial: Trade Sequences, Pairs Trading, Fraud Detection
      • Federal: Pattern Detection, Fuzzy Matching, Inference Analysis, Inexact linking
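To make the "single pass" and "gap recognition" points concrete, here is a minimal sketch of sessionization over weblog rows of the form {user, page, time}. It is an illustration only, not Aster's Sessionization function: the `sessionize` name, the 30-minute default timeout, and the requirement that input is already sorted by user and time are all assumptions.

```python
def sessionize(clicks, timeout=1800):
    """Assign a per-user session number to each (user, page, time) click.

    Assumes clicks are sorted by (user, time) and time is in seconds.
    A new session starts when the user changes or when the gap since
    the previous click exceeds `timeout`. Each row is visited once.
    """
    result = []
    prev_user, prev_time, session = None, None, 0
    for user, page, time in clicks:
        if user != prev_user:
            session = 1  # first click seen for this user
        elif time - prev_time > timeout:
            session += 1  # gap recognized: start a new session
        result.append((user, page, time, session))
        prev_user, prev_time = user, time
    return result


clicks = [
    ("u1", "/home", 0),
    ("u1", "/search", 60),
    ("u1", "/home", 4000),  # more than 1800 s after the previous click
    ("u2", "/home", 10),
]
for row in sessionize(clicks):
    print(row)
```

The equivalent in plain SQL would typically need a self-join or window logic to compare each row with its predecessor, which is the contrast the slide draws.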
  20. Sample SQL-MapReduce Packaged Functions (* = New)
      Path Analysis: discover patterns in rows of sequential data
      • nPath: complex sequential analysis for time series and behavioral patterns
      • nPath Extensions: count entrants, track exit paths, count children, and generate subsequences
      • Sessionization: identifies sessions from time series data in a single pass
      Graph and Relational Analysis: analyze patterns across rows of data
      • Graph analysis: finds shortest path from distinct node to all other nodes in graph
      • nTree: new function for performing operations on tree hierarchies *
      • Other: triangle finding, square finding, clustering coefficient *
      Text Analysis: derive patterns in textual data
      • Sentiment Analysis: classify content as positive or negative (for product reviews, customer feedback) *
      • Text Categorization: used to label content as spam/not spam *
      • Entity Extraction/Rules Engine: identify addresses, phone numbers, names from textual data *
      • Text Processing: counts occurrences of words, identifies roots, & tracks relative positions of words & multi-word phrases
      • nGram: split an input stream of text into individual words and phrases
      • Levenshtein Distance: computes the distance between two words
      Data Transformation: transform data for more advanced analysis
      • Pivot: convert columns to rows or rows to columns *
      • Log parser: generalized tool for parsing Apache logs *
      • Unpack: extracts nested data for further analysis
      • Pack: compress multi-column data into a single column
      • Antiselect: returns all columns except for specified column
      • Multicase: case statement that supports row match for multiple cases
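Two of the text functions listed above have well-known textbook definitions, sketched below in plain Python. These are the standard algorithms, not Aster's packaged implementations; the function names and the whitespace-based tokenization in `ngrams` are assumptions.

```python
def levenshtein(a, b):
    """Edit distance between two strings (insert, delete, substitute).

    Standard dynamic-programming formulation, keeping one previous row.
    """
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def ngrams(text, n):
    """Split text on whitespace and return all n-word phrases in order."""
    words = text.split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


print(levenshtein("kitten", "sitting"))
print(ngrams("big data analytics", 2))
```

In a SQL-MapReduce setting, functions like these would be applied to each row or partition in parallel; the sketch shows only the per-row logic.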
  21. Complementing Hadoop in the Enterprise
  22. You Need Hybrid Architectures
      Hadoop (Batch)
      • Users: Engineers, 5-10 concurrent users
      • Role: Ingest, Transform, Archive
      • Fast data loading, ELT/ETL, Image processing, Online archival
      Aster (Interactive)
      • Users: Data Scientists, 50+ concurrent users
      • Role: Discover and explore
      • Path & pattern analysis, Graph analysis, Fraud detection, Text analysis
      Teradata (Active)
      • Users: Business Analysts, 5000+ concurrent users
      • Role: Analyze and Report
      • Operational analysis, Transactional analysis, High volume ad-hoc, Elastic data marts
  23. Complementary and Overlapping Use Cases
      Batch processing use cases
      • Data preprocessing
      • Image processing
      • Search indexes
      • Web crawling
      Overlapping use cases
      • Web log analysis
      • Text processing
      • Genomic, Astronomical, Geo-Spatial, scientific
      Fast/interactive use cases
      • Pattern matching
      • Visitor behavior
      • Graph & relationship analysis
      • Investigative analytics
  24. An Example of an Enterprise Hybrid Architecture
      Diagram labels (users): Data Scientists, BI, Business Analysts, Apps
      Hadoop
      • Batch Processing
      • Data Archival
      • Data Transformations
      Teradata | Aster: Multi-Structured Data
      • Weblogs
      • Machine data
      • Customer Interaction data
      Teradata | Aster: Structured Data
      • Financial data
      • SAP, ERP, …
      • Call center text data
      • Address, phones, …
      Teradata | EDW
      • Customer addresses, phones, etc …
      • Integration with financial, operational data
  25. Connecting Hadoop With Other Systems
  26. 3 Ways to Connect Hadoop to Databases
      Diagram labels: Ad-Hoc; Batch HDFS Scripts; Hadoop Front-End (Pig/Hive); Purpose-Built Connectors; axis: Ease of Use
  27. Using Aster Data and Hadoop Together
      Aster Data for rich, ultra-fast analytics
      Diagram: Diverse Data Sources (Web data, NetFlow data, Log files, Text files); Hadoop (HDFS, Map, Reduce); HDFS Connector; Aster Database (SQL + SQL/MR)
      1. Non-relational data loaded into Hadoop cluster
      2. Hadoop processes data transformation
      3. Data from HDFS loaded into Aster using HDFS connector
      4. Data used for interactive analytics inside Aster Database
  28. The Aster-Hadoop Data Connector
      Enable users to analyze data where it makes the most sense
      • Why Is It Needed?
        - Hadoop can be used for batch ETL and batch data processing
        - Aster for fast, interactive analysis
        - Challenge: slow, tedious manual operations required to transfer data from Hadoop into Aster Database
      • What Is It?
        - A set of 2 SQL-MapReduce functions developed by Aster Data
          • LoadFromHadoop: Parallel data loading from HDFS to Aster nCluster
          • LoadToHadoop: Parallel data loading from Aster nCluster to HDFS
        - Advantages: Parallel performance, Seamless (SQL), Consistency (ACID)
      Example:
          insert into mytable
          select *
          from load_from_hadoop(
              on mytable
              host(10.10.3.22)
              port(9000)
              delimiter(,)
              nullstring()
              files(hdfs_input_filepaths.txt)
          );
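The idea behind a parallel, delimiter-aware load can be sketched in a few lines of Python. This is a hypothetical illustration, not the connector's implementation: the in-memory `files` dictionary stands in for HDFS files, the `parse_line` and `load_files` names are invented, and the meaning given here to `delimiter` and `nullstring` (field separator, and the token to read as NULL) is an assumption based on the parameter names in the slide's example.

```python
from concurrent.futures import ThreadPoolExecutor


def parse_line(line, delimiter=",", nullstring=""):
    """Split one delimited line into fields, mapping nullstring to None."""
    fields = line.rstrip("\n").split(delimiter)
    return tuple(None if field == nullstring else field for field in fields)


def load_files(files, delimiter=",", nullstring="", workers=4):
    """Parse every file in parallel and return all rows in file order.

    `files` maps a file name to its list of lines (a stand-in for HDFS).
    """
    def parse_file(lines):
        return [parse_line(line, delimiter, nullstring) for line in lines]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, so row order is stable
        per_file = list(pool.map(parse_file, files.values()))
    return [row for rows in per_file for row in rows]


files = {
    "part-0": ["1,alice,", "2,bob,NY"],
    "part-1": ["3,,CA"],
}
for row in load_files(files):
    print(row)
```

A real connector would also handle transactional consistency and data placement across nodes, which this sketch leaves out entirely.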
  29. MapReduce Enterprise Use Cases
  30. Example #1: SQL-MapReduce for Data Scientist Investigative Analytics
      Data Scientist Discovery of Bot Detection Algos
      • Business Goal:
        - Update bot detection algos with new markers of suspect traffic for potential fraud or spam attacks
      • Aster Data Differentiated Solution:
        - Investigative analysis to identify new attributes that increase the predictive accuracy of bot detection
        - Correlate data within/across sessions from complex URLs
        - Use nPath to quickly identify and iteratively explore site activity patterns
      • Business Impact:
        - Site integrity: identify bot traffic which can degrade performance and security of www.book.com (B&N)
        - Improved customer experience: detect and prevent spam and other automated nuisances to B&N members
      “We’ve always wanted to examine search sub-sessions to really understand what behaviors come from specific searches… All of this requires cursors and external programming in Oracle, but can be easily parallelized in Aster Data even with non-programmers.”
      Michael Wexler, VP of Analytics, Barnes & Noble
      Other Aster Data Applications at Barnes & Noble:
      • Online marketing attribution: across search, device, features
      • Customer personalized recommendations: ever-changing
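One way to picture iterative exploration of site activity patterns is to encode each user's ordered clicks as a string of symbols and search it with a regular expression. The sketch below is only an analogy to path-pattern functions such as nPath, not their actual syntax or semantics; the page types, the symbol mapping, and the "five or more consecutive searches" marker are all hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical mapping from page type to a one-letter symbol.
SYMBOLS = {"search": "S", "product": "P", "checkout": "C"}


def paths_by_user(clicks):
    """Build one symbol string per user from (user, page_type, time) rows."""
    paths = defaultdict(str)
    for user, page_type, _time in sorted(clicks, key=lambda c: (c[0], c[2])):
        paths[user] += SYMBOLS.get(page_type, "?")
    return dict(paths)


def users_matching(clicks, pattern):
    """Return the users whose click path contains the given regex pattern."""
    regex = re.compile(pattern)
    return sorted(user for user, path in paths_by_user(clicks).items()
                  if regex.search(path))


clicks = [("bot1", "search", t) for t in range(5)] + [
    ("u2", "search", 0),
    ("u2", "product", 30),
    ("u2", "checkout", 90),
]
# Hypothetical marker of suspect traffic: 5+ searches in a row.
print(users_matching(clicks, r"S{5,}"))
```

Changing the pattern string and rerunning is cheap, which is the kind of quick iteration the slide attributes to investigative analytics.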
  31. Example #2: Enabling Creation of Data-Driven Products
      “Cards that fit you”
      • Personalized recommendations of credit cards that would provide best fit for customer
      • Uses clickstream analysis + text analysis to process data about customer interests and spending patterns
      • Business Impact: delivers referral revenue related to click-throughs on specific card offers
  32. Example #3: Better Visibility to Marketing Impact
      “Aster gives us the analytic capability to provide best-in-class digital marketing optimization for our clients, enabling more accurate marketing attribution. With Aster, we can help our clients understand every marketing interaction with consumers over time and across their entire online market ecosystem, knowing the impact of every marketing dollar spent.”
      Sunil Kavi, Director of Technology, Razorfish
  33. Visualization Example: Aster Data Tableau Integration with SQL-MapReduce®
  34. Summary: MapReduce for the Rest of Us
      1. Data Science is Growing Fast but Big Enterprise is not Facebook
      2. There is a Gap Between Existing Enterprise Skills and Technology Capabilities
      3. To Solve this Problem Look at Utilizing the Right Technology for the Right Problem
  35. Thank You! ... Questions?
      Learn More About SQL-MapReduce
      • MapReduce Resource Center: www.asterdata.com/mapreduce
      • Aster Developer Express IDE trial: www.asterdata.com/ide
      • Download white paper at www.asterdata.com
      See it in action tonight! Aster & Tableau Happy Hour
      Eventi Hotel, 851 Avenue of the Americas (6th Avenue), New York, NY 10001, 7-9PM
