Big Data Architecture
Tasso Argyros | co-President | Teradata Aster
Twitter: @targyros

November, 2011
What We’re Covering Today


•   Data Science in the Enterprise (vs. the Valley)
•   Quick Overview of Teradata Aster’s Technology
•   Hybrid Hadoop Architectures
•   Connecting Hadoop to Other Systems
•   MapReduce Enterprise Use Cases




2     Teradata Confidential and Proprietary
About Aster Data

• Aster has been a Big Data & Big Analytics pioneer since 2005
  by developing an MPP SQL+MapReduce platform

• Teradata completed its acquisition of Aster Data on April 6, 2011

• Opportunity for Teradata to expand its business in the Big Data
  analytics market to include multi-structured data and new
  analytical capabilities

• Intense Focus on the Enterprise




The Nature of Data Scientist Analytics in the Enterprise
What is Data Science?
(Diagram: Data Scientists at the intersection of Curiosity/Cleverness, Technical Expertise, and Business Acumen)

Data Science is Exploding




What is Making Data Science Popular?


1. Proliferation of Data-Driven Products & Businesses

2. Consumer Interactions with Web & Social Channels

3. Breadth of Tools Available

4. Wealth of Machine-Generated Data




A Day in the Life of a Data Scientist – “Investigative Analytics”
(Diagram: three activities: Integrate, Investigate, Implement)

Data Scientists in the Enterprise are Not Only Developers
(Diagram: the Curiosity/Cleverness, Technical Expertise, and Business Acumen diagram again; in the enterprise, data scientists include SQL Analysts, SAS/R Analysts, DBMS Power Users, Java Coders, …)

Data Scientists Have Different Skills
Combination of:
-  Analysts
-  Coders
-  Sys admins / EngOps

Hard to find & expensive

(Chart: skill profiles compared between Enterprises and Web Startups)

Data Scientists and MapReduce Platforms
A Brief History of MapReduce & Hadoop

• 2004: Google publishes the MapReduce paper at the OSDI conference
• 2006: Hadoop becomes the first open-source implementation of MapReduce
• 2008: Aster Data becomes the first vendor to incorporate MapReduce
  - Aster Data takes a tightly coupled approach: MapReduce embedded with SQL to bring MapReduce to enterprises (SQL-MapReduce®)
• 2009-2011: Follow-on DBMS vendors announce connectors to Hadoop
  - Hadoop distributions/platforms emerge: Amazon, Cloudera, Hortonworks, DataStax, MapR, …

MapReduce is the SQL of Big Analytics

• MapReduce is a parallel programming framework
  - “J2EE for Big Data Analytics”
• MapReduce provides
  - Automatic parallelization
  - Fault tolerance
  - Monitoring & status updates
• Hadoop
  - Open-source MapReduce
• Aster
  - Commercial implementation of MapReduce + SQL

(Diagram: Map Function → Scheduler → map → shuffle → reduce → Results)
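The map, shuffle, and reduce stages in the diagram can be shown in a single-process Python sketch. Word counting appears here only because it is the customary first MapReduce example. It does not come from the slides, and the function names are hypothetical. A real framework runs many map and reduce tasks in parallel and supplies the scheduling, fault tolerance, and monitoring listed above. This sketch models none of that.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, Iterator, List, Tuple


def map_fn(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group every emitted value by its key, as the framework
    does before handing each key's group to a reducer."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_fn(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values collected for one key."""
    return key, sum(values)


def run(lines: Iterable[str]) -> Dict[str, int]:
    """Chain map -> shuffle -> reduce; keys are reduced in sorted order."""
    mapped = chain.from_iterable(map_fn(line) for line in lines)
    return dict(reduce_fn(k, v) for k, v in sorted(shuffle(mapped).items()))


print(run(["big data big analytics", "Big Data"]))
# {'analytics': 1, 'big': 3, 'data': 2}
```

Only `map_fn` and `reduce_fn` are problem-specific. The shuffle and the driver stay the same for any job, which is the division of labor the slide describes.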


The Technology Gap

SQL-MR                                  Hadoop-MR
• Analyst-friendly                      • Developer-friendly
• Iterative & fast                      • Batch-oriented
• Integrates well with BI/Viz tools     • Requires lots of coding

But what if you need both?
Quick Aster & SQL-MapReduce Overview
Filling the Gap: SQL-MapReduce




Enabling Analysis of Diverse Data
Aster capabilities for processing and analyzing multi-structured, raw data

(Diagram: multi-structured raw data and structured data (DW, DBMS) feed the Aster Analytic Platform, where SQL-MapReduce functions such as tokenize, unpack, sessionize, … turn them into structured SQL-MapReduce output with columns Col1, Col2, Col3, Col4)

Integrate Data
• Load raw data directly into Aster Database
• Bypass complex ETL pipeline via ELT

Process and Explore
• Use SQL-MapReduce functions to interpret & analyze raw data
• Leverage a flexible, dynamically created schema at runtime

Leverage Results
• Structured output of SQL-MapReduce processing is available for further use or output to the data warehouse
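The unpack step and the idea of a schema chosen at runtime can be illustrated in plain Python. In this hypothetical sketch, a raw delimited column is split into named columns that the caller picks at query time, and `pack` does the reverse. The function names, signatures, and sample rows are assumptions, not Aster's actual interface.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[str]]


def unpack(rows: List[Row], data_column: str, output_columns: List[str],
           delimiter: str = ",") -> List[Row]:
    """Split one packed text column into several named columns.

    The output schema (output_columns) is supplied by the caller at query
    time instead of being fixed when the raw data was loaded. Missing
    trailing fields become None.
    """
    result = []
    for row in rows:
        packed = row.get(data_column) or ""
        fields = packed.split(delimiter) if packed else []
        new_row = {k: v for k, v in row.items() if k != data_column}
        for i, name in enumerate(output_columns):
            new_row[name] = fields[i] if i < len(fields) else None
        result.append(new_row)
    return result


def pack(rows: List[Row], input_columns: List[str], data_column: str,
         delimiter: str = ",") -> List[Row]:
    """Inverse of unpack: join several columns into one delimited column
    (None is written as an empty field)."""
    result = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k not in input_columns}
        new_row[data_column] = delimiter.join(row.get(c) or "" for c in input_columns)
        result.append(new_row)
    return result


raw = [{"id": "1", "raw": "alice,NY,42"}, {"id": "2", "raw": "bob,LA"}]
print(unpack(raw, "raw", ["name", "city", "age"]))
```

Passing a different `output_columns` list reinterprets the same raw rows without reloading them. That is the practical payoff of loading first and shaping later (ELT).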
SQL-MapReduce for Big Data Analytics
Example: Pattern Matching, Time Series Analysis
Discover patterns in rows of sequential data

• Weblogs {user, page, time}: Click 1, Click 2, Click 3, Click 4
• Smart Meters {device, value, time}: Reading 1, Reading 2, Reading 3, Reading 4
• Sales Transactions {user, product, time}: Purchase 1, Purchase 2, Purchase 3, Purchase 4
• Stock Tick Data {stock, price, time}: Tick 1, Tick 2, Tick 3, Tick 4
• Call Data Records {user, number, time}: Call 1, Call 2, Call 3, Call 4

Aster SQL-MapReduce Approach
• Single pass of data
• Linked-list sequential analysis
• Gap recognition

Traditional SQL Approach
• Full table scans
• Self-joins for sequencing
• Limited operators for ordered data

Applications
• eBusiness: Sessionization, Click Analysis, Golden Path, Rev Attribution
• Telecom: Calling Patterns, Signal Processing, Forecasting
• Financial: Trade Sequences, Pairs Trading, Fraud Detection, Inexact Linking
• Federal: Pattern Detection, Fuzzy Matching, Inference Analysis
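The slide contrasts a single pass with gap recognition against self-joins. Here is a minimal Python sketch of that single-pass idea, applied to sessionizing web clicks. It assumes rows arrive already sorted by user and time, as a PARTITION BY / ORDER BY would deliver them. The 30-minute (1800-second) timeout is an assumed default, not a documented Aster setting, and `Click` and `sessionize` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass
class Click:
    user: str
    page: str
    time: int  # seconds; the unit is an assumption for this sketch


def sessionize(clicks: Iterable[Click], timeout: int = 1800) -> List[Tuple[Click, int]]:
    """Tag each click with a session id in a single pass.

    Assumes the input is already ordered by (user, time). A new session
    starts when the user changes or when the gap since the previous click
    exceeds `timeout` seconds (gap recognition). No self-join is needed.
    """
    tagged: List[Tuple[Click, int]] = []
    session_id = -1
    prev: Optional[Click] = None
    for click in clicks:
        if prev is None or click.user != prev.user or click.time - prev.time > timeout:
            session_id += 1
        tagged.append((click, session_id))
        prev = click
    return tagged


clicks = [
    Click("u1", "home", 0),
    Click("u1", "search", 60),
    Click("u1", "home", 4000),  # 3940 s after the previous click: new session
    Click("u2", "home", 10),
]
for click, sid in sessionize(clicks):
    print(sid, click.user, click.page)
```

Each row is compared only with its predecessor, so the work is one linear scan. A pure-SQL version would typically join the table to itself to find each row's previous row.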

Sample SQL-MapReduce Packaged Functions
(Functions marked * are new.)

Path Analysis: Discover patterns in rows of sequential data
• nPath: complex sequential analysis for time series and behavioral patterns
• nPath Extensions: count entrants, track exit paths, count children, and generate subsequences
• Sessionization: identifies sessions from time series data in a single pass

Graph and Relational Analysis: Analyze patterns across rows of data
• Graph analysis: finds the shortest paths from a given node to all other nodes in a graph
• nTree: new function for performing operations on tree hierarchies *
• Other: triangle finding, square finding, clustering coefficient *

Text Analysis: Derive patterns in textual data
• Sentiment Analysis: classifies content as positive or negative (for product reviews, customer feedback) *
• Text Categorization: labels content as spam/not spam *
• Entity Extraction/Rules Engine: identifies addresses, phone numbers, and names in textual data *
• Text Processing: counts occurrences of words, identifies roots, & tracks relative positions of words & multi-word phrases
• nGram: splits an input stream of text into individual words and phrases
• Levenshtein Distance: computes the distance between two words

Data Transformation: Transform data for more advanced analysis
• Pivot: converts columns to rows or rows to columns *
• Log parser: generalized tool for parsing Apache logs *
• Unpack: extracts nested data for further analysis
• Pack: compresses multi-column data into a single column
• Antiselect: returns all columns except the specified column(s)
• Multicase: case statement that supports row match for multiple cases
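The table describes Levenshtein Distance only as "the distance between two words." The standard definition is the minimum number of single-character insertions, deletions, and substitutions needed to turn one word into the other. A minimal dynamic-programming sketch of that metric follows. It says nothing about how Aster's packaged function is invoked or implemented.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b using two rolling rows of the
    classic dynamic-programming table (O(len(a) * len(b)) time)."""
    if len(a) < len(b):
        a, b = b, a  # keep the rolling row as short as possible
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


print(levenshtein("kitten", "sitting"))  # 3
```

This metric is what makes the "fuzzy matching" and "inexact linking" applications on the previous slide possible. Two records can be linked when the distance between their name fields falls below a chosen threshold.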
Complementing Hadoop in the Enterprise
You Need Hybrid Architectures

• Hadoop (Batch): Engineers, 5-10 concurrent users; Ingest, Transform, Archive
  - Fast data loading
  - ELT/ETL
  - Image processing
  - Online archival
• Aster (Interactive): Data Scientists, 50+ concurrent users; Discover and Explore
  - Path & pattern analysis
  - Graph analysis
  - Fraud detection
  - Text analysis
• Teradata (Active): Business Analysts, 5000+ concurrent users; Analyze and Report
  - Operational analysis
  - Transactional analysis
  - High-volume ad-hoc
  - Elastic data marts
Complementary and Overlapping Use Cases
(Spectrum from BATCH PROCESSING to FAST/INTERACTIVE)

Batch end
• Data preprocessing
• Image processing
• Search indexes
• Web crawling

Middle
• Web log analysis
• Text processing
• Genomic, astronomical, geo-spatial, scientific

Fast/interactive end
• Pattern matching
• Visitor behavior
• Graph & relationship analysis
• Investigative analytics

An Example of an Enterprise Hybrid Architecture
(Diagram: Data Scientists, Data Analysts, BI, and Business Apps on top; Teradata | Aster in the middle, between Hadoop and Teradata | EDW; multi-structured data between Hadoop and Aster, structured data between Aster and the EDW)

• Hadoop: batch processing; data archival; data transformations
• Multi-structured data: weblogs; machine data; customer interaction data; call center text data
• Structured data: financial data; SAP, ERP, …; address, phones, …
• Teradata | EDW: customer addresses, phones, etc.; integration with financial, operational data

Connecting Hadoop With Other Systems
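3 Ways to Connect Hadoop to Databases
• Batch HDFS scripts
• Hadoop front-end (Pig/Hive)
• Purpose-built connectors
(Chart: the three options plotted against Ease of Use and Ad-Hoc, rising from batch HDFS scripts to purpose-built connectors)
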
Using Aster Data and Hadoop Together
Aster Data for rich, ultra-fast analytics

(Diagram: diverse data sources such as web data, NetFlow data, log files, and text files flow into Hadoop (map/reduce over HDFS), then through the HDFS connector into Aster Database (SQL + SQL/MR))

1. Non-relational data is loaded into the Hadoop cluster
2. Hadoop performs the data transformation
3. Data from HDFS is loaded into Aster using the HDFS connector
4. The data is used for interactive analytics inside Aster Database
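Those four steps can be imitated end to end in a few lines of Python, using made-up web-log lines as input. In this hypothetical sketch:

- A map-style `transform` plays Hadoop's role (step 2).
- Comma-delimited text stands in for the files in HDFS.
- An in-memory SQLite database stands in for Aster Database (steps 3 and 4).

Nothing here uses the actual HDFS connector.

```python
import csv
import io
import sqlite3
from typing import Iterable, Iterator, List

# Step 1 (made-up sample): raw, non-relational web-log lines.
RAW_LOG = [
    "2011-11-08T10:00:01 alice GET /home",
    "2011-11-08T10:00:09 alice GET /search?q=hadoop",
    "2011-11-08T10:01:30 bob GET /home",
    "malformed line",
]


def transform(lines: Iterable[str]) -> Iterator[List[str]]:
    """Step 2: map-style transform that keeps well-formed GET lines and
    emits [timestamp, username, page] records, dropping anything else."""
    for line in lines:
        parts = line.split()
        if len(parts) == 4 and parts[2] == "GET":
            ts, username, _, page = parts
            yield [ts, username, page]


def to_delimited(records: Iterable[List[str]]) -> str:
    """Serialize records as comma-delimited text, standing in for the
    output files Hadoop would write to HDFS."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(records)
    return buf.getvalue()


def load_and_query(delimited: str) -> List[tuple]:
    """Steps 3 and 4: bulk-load the delimited text into a SQL table
    (SQLite stands in for Aster Database) and run an interactive query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE clicks (ts TEXT, username TEXT, page TEXT)")
    conn.executemany("INSERT INTO clicks VALUES (?, ?, ?)",
                     csv.reader(io.StringIO(delimited)))
    result = conn.execute(
        "SELECT username, COUNT(*) FROM clicks GROUP BY username ORDER BY username"
    ).fetchall()
    conn.close()
    return result


print(load_and_query(to_delimited(transform(RAW_LOG))))
# [('alice', 2), ('bob', 1)]
```

The split mirrors the slide's division of labor. The batch side handles parsing and cleanup of messy input, while the SQL side receives only well-formed rows and answers ad-hoc questions over them.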



The Aster-Hadoop Data Connector
Enable users to analyze data where it makes the most sense

• Why Is It Needed?
  - Hadoop can be used for batch ETL and batch data processing
  - Aster for fast, interactive analysis
  - Challenge: slow, tedious manual operations are required to transfer data from Hadoop into Aster Database
• What Is It?
  - A set of 2 SQL-MapReduce functions developed by Aster Data
    • LoadFromHadoop: Parallel data loading from HDFS to Aster nCluster
    • LoadToHadoop: Parallel data loading from Aster nCluster to HDFS
  - Advantages: Parallel performance, Seamless (SQL), Consistency (ACID)

Example:

    insert into mytable
    select *
    from load_from_hadoop(
        on mytable
        host('10.10.3.22')
        port(9000)
        delimiter(',')
        nullstring('')
        files('hdfs_input_filepaths.txt')
    );
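The deck does not describe how the connector works internally. As a rough illustration only, the sketch below guesses at what the example's `delimiter`, `nullstring`, and `files` arguments imply:

- Each listed file is split into fields on the delimiter.
- Fields equal to the null string become NULL (`None`).
- Files are parsed in parallel, one task per file.

`parse_file` and `parallel_load` are hypothetical, the dictionary of file contents stands in for HDFS, and the file paths are made up.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Optional

Row = List[Optional[str]]


def parse_file(text: str, delimiter: str = ",", nullstring: str = "") -> List[Row]:
    """Split each non-empty line on `delimiter`; a field equal to
    `nullstring` becomes None (SQL NULL). Assumed semantics, inferred
    only from the parameter names in the example above."""
    rows = []
    for line in text.splitlines():
        if not line:
            continue
        rows.append([None if f == nullstring else f for f in line.split(delimiter)])
    return rows


def parallel_load(files: Dict[str, str], workers: int = 4,
                  delimiter: str = ",", nullstring: str = "") -> List[Row]:
    """Parse every listed file concurrently (one task per file) and
    concatenate the rows in the order the files were listed."""
    paths = list(files)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parsed = pool.map(lambda p: parse_file(files[p], delimiter, nullstring), paths)
        return [row for rows in parsed for row in rows]


hdfs_files = {  # hypothetical HDFS paths and contents
    "/data/part-00000": "1,alice,NY\n2,,LA\n",
    "/data/part-00001": "3,carol,\n",
}
print(parallel_load(hdfs_files, workers=2))
# [['1', 'alice', 'NY'], ['2', None, 'LA'], ['3', 'carol', None]]
```

Threads here only illustrate the one-task-per-file split. A real parallel loader would spread that work across cluster nodes.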


MapReduce Enterprise Use Cases
Example #1: SQL-MapReduce for Data Scientist Investigative Analytics
Data Scientist Discovery of Bot Detection Algorithms

• Business Goal:
  - Update bot detection algorithms with new markers of suspect traffic for potential fraud or spam attacks
• Aster Data Differentiated Solution:
  - Investigative analysis to identify new attributes that increase the predictive accuracy of bot detection
  - Correlate data within/across sessions from complex URLs
  - Use nPath to quickly identify and iteratively explore site activity patterns
• Business Impact:
  - Site integrity: identify bot traffic, which can degrade the performance and security of www.book.com (B&N)
  - Improved customer experience: detect and prevent spam and other automated nuisances to B&N members

“We’ve always wanted to examine search sub-sessions to really understand what behaviors come from specific searches… All of this requires cursors and external programming in Oracle, but can be easily parallelized in Aster Data even with non-programmers.”
Michael Wexler, VP of Analytics, Barnes & Noble

Other Aster Data Applications at Barnes & Noble:
  • Online marketing attribution – across search, device, features
  • Customer personalized recommendations - ever-changing
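The deck does not show nPath's syntax. The Python sketch below only imitates the underlying idea: encode each row of an ordered session as a symbol, then match a regular expression over the resulting symbol string. The event names, the symbol mapping, and both patterns are hypothetical. In particular, the five-searches-in-a-row rule is an invented example marker, not Barnes & Noble's actual bot signal.

```python
import re
from typing import Dict, List, Sequence

# Hypothetical mapping from event types to single-letter symbols.
SYMBOLS: Dict[str, str] = {"search": "S", "view": "V", "cart": "C", "buy": "B"}


def encode(events: Sequence[str]) -> str:
    """Turn a session's time-ordered event types into a symbol string."""
    return "".join(SYMBOLS[e] for e in events)


def match_sessions(sessions: Dict[str, Sequence[str]], pattern: str) -> List[str]:
    """Return the ids of sessions whose ordered events contain `pattern`,
    a regular expression written over the symbols above."""
    regex = re.compile(pattern)
    return [sid for sid, events in sessions.items() if regex.search(encode(events))]


sessions = {
    "s1": ["search", "view", "cart", "buy"],
    "s2": ["search", "search", "search", "search", "search"],
    "s3": ["view", "view"],
}
# Invented suspect-traffic marker: five or more searches in a row.
print(match_sessions(sessions, r"S{5,}"))  # ['s2']
# A "golden path": search, one or more views, add to cart, then buy.
print(match_sessions(sessions, r"SV+CB"))  # ['s1']
```

Iterating on a hypothesis then means editing a one-line pattern and rerunning it, rather than rewriting cursor-based code. That matches the workflow the quote describes.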
Example #2: Enabling Creation of Data-Driven Products

“Cards that fit you”
• Personalized recommendations of the credit cards that would provide the best fit for each customer
• Uses clickstream analysis + text analysis to process data about customer interests and spending patterns
• Business Impact: delivers referral revenue from click-throughs on specific card offers


Example #3: Better Visibility to Marketing Impact


 “Aster gives us the analytic capability to provide
 best-in-class digital marketing optimization for our clients, enabling
 more accurate marketing attribution. With Aster, we can help our
 clients understand every marketing interaction with consumers over
 time and across their entire online market ecosystem, knowing the
 impact of every marketing dollar spent.”


 Sunil Kavi, Director of Technology
 Razorfish



Visualization Example: Aster Data Tableau Integration with SQL-MapReduce®




Summary - MapReduce for the Rest of Us

1. Data Science is Growing Fast, but Big Enterprise is not Facebook
2. There is a Gap Between Existing Enterprise Skills and Technology Capabilities
3. To Solve this Problem, Use the Right Technology for the Right Problem

Thank You! ... Questions?

Learn More About SQL-MapReduce
• MapReduce Resource Center -
  www.asterdata.com/mapreduce
• Aster Developer Express IDE trial
  www.asterdata.com/ide
• Download white paper at
  www.asterdata.com



See it in action tonight! – Aster & Tableau Happy Hour
Eventi Hotel, 851 Avenue of the Americas (6th Avenue), New York, NY 10001
7-9PM

Hadoop World 2011: Big Data Architecture: Integrating Hadoop with Other Enterprise Analytic Systems - Tasso Argyros, Teradata

Big Data Analytics with Hadoop, MongoDB and SQL Server
 
Advanced Analytics and Big Data (August 2014)
Advanced Analytics and Big Data (August 2014)Advanced Analytics and Big Data (August 2014)
Advanced Analytics and Big Data (August 2014)
 
IoT and Big Data - Iot Asia 2014
IoT and Big Data - Iot Asia 2014IoT and Big Data - Iot Asia 2014
IoT and Big Data - Iot Asia 2014
 
An introduction to apache drill presentation
An introduction to apache drill presentationAn introduction to apache drill presentation
An introduction to apache drill presentation
 
Why Apache Spark is the Heir to MapReduce in the Hadoop Ecosystem
Why Apache Spark is the Heir to MapReduce in the Hadoop EcosystemWhy Apache Spark is the Heir to MapReduce in the Hadoop Ecosystem
Why Apache Spark is the Heir to MapReduce in the Hadoop Ecosystem
 
Enabling Real-Time Business with Change Data Capture
Enabling Real-Time Business with Change Data CaptureEnabling Real-Time Business with Change Data Capture
Enabling Real-Time Business with Change Data Capture
 
Hadoop on Azure, Blue elephants
Hadoop on Azure,  Blue elephantsHadoop on Azure,  Blue elephants
Hadoop on Azure, Blue elephants
 
Hadoop and NoSQL joining forces by Dale Kim of MapR
Hadoop and NoSQL joining forces by Dale Kim of MapRHadoop and NoSQL joining forces by Dale Kim of MapR
Hadoop and NoSQL joining forces by Dale Kim of MapR
 
Meetup Oracle Database MAD: 2.1 Data Management Trends: SQL, NoSQL y Big Data
Meetup Oracle Database MAD: 2.1 Data Management Trends: SQL, NoSQL y Big Data Meetup Oracle Database MAD: 2.1 Data Management Trends: SQL, NoSQL y Big Data
Meetup Oracle Database MAD: 2.1 Data Management Trends: SQL, NoSQL y Big Data
 
Anexinet Big Data Solutions
Anexinet Big Data SolutionsAnexinet Big Data Solutions
Anexinet Big Data Solutions
 
Self-Service Data Exploration with Apache Drill
Self-Service Data Exploration with Apache DrillSelf-Service Data Exploration with Apache Drill
Self-Service Data Exploration with Apache Drill
 
Processing Big Data
Processing Big DataProcessing Big Data
Processing Big Data
 
Apache Spark Overview part1 (20161107)
Apache Spark Overview part1 (20161107)Apache Spark Overview part1 (20161107)
Apache Spark Overview part1 (20161107)
 
Meetup Oracle Database BCN: 2.1 Data Management Trends
Meetup Oracle Database BCN: 2.1 Data Management TrendsMeetup Oracle Database BCN: 2.1 Data Management Trends
Meetup Oracle Database BCN: 2.1 Data Management Trends
 
Introducing the Big Data Ecosystem with Caserta Concepts & Talend
Introducing the Big Data Ecosystem with Caserta Concepts & TalendIntroducing the Big Data Ecosystem with Caserta Concepts & Talend
Introducing the Big Data Ecosystem with Caserta Concepts & Talend
 
Unit II Real Time Data Processing tools.pptx
Unit II Real Time Data Processing tools.pptxUnit II Real Time Data Processing tools.pptx
Unit II Real Time Data Processing tools.pptx
 
Introduction to Spark - Phoenix Meetup 08-19-2014
Introduction to Spark - Phoenix Meetup 08-19-2014Introduction to Spark - Phoenix Meetup 08-19-2014
Introduction to Spark - Phoenix Meetup 08-19-2014
 
Big Data Analytics Projects - Real World with Pentaho
Big Data Analytics Projects - Real World with PentahoBig Data Analytics Projects - Real World with Pentaho
Big Data Analytics Projects - Real World with Pentaho
 

More from Cloudera, Inc.

More from Cloudera, Inc. (20)

Partner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptx
 
Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists
 
2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists
 
Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019
 
Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19
 
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
 
Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19
 
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19
 
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19
 
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18
 
Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2
 
Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1
 
Extending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the Platform
 
Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18
 
Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360
 
Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18
 

Recently uploaded

Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
vu2urc
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 

Recently uploaded (20)

Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 

Hadoop World 2011: Big Data Architecture: Integrating Hadoop with Other Enterprise Analytic Systems - Tasso Argyros, Teradata

  • 1. Big Data Architecture
    Tasso Argyros | co-President | Teradata Aster
    Twitter: @targyros
    November, 2011
  • 2. What We’re Covering Today
    - Data Science in the Enterprise (vs. the Valley)
    - Quick Overview of Teradata Aster’s Technology
    - Hybrid Hadoop Architectures
    - Connecting Hadoop to Other Systems
    - MapReduce Enterprise Use Cases
  • 3. About Aster Data
    - Aster has been a Big Data & Big Analytics pioneer since 2005, developing an MPP SQL+MapReduce platform
    - Aster Data acquisition completed on April 6, 2011
    - Opportunity for Teradata to expand its business in the Big Data analytics market to include multi-structured data and new analytical capabilities
    - Intense focus on the enterprise
  • 4. The Nature of Data Scientist Analytics in the Enterprise
  • 5. What is Data Science?
    [Diagram: Data Scientists at the intersection of Curiosity/Cleverness, Technical Expertise, and Business Acumen]
  • 6. Data Science is Exploding
  • 7. What is Making Data Science Popular?
    1. Proliferation of Data-Driven Products & Businesses
    2. Consumer Interactions with Web & Social Channels
    3. Breadth of Tools Available
    4. Wealth of Machine-Generated Data
  • 8. A Day in the Life of a Data Scientist: “Investigative Analytics”
    [Diagram: three stages, Integrate, Investigate, Implement]
  • 9. Data Scientists in the Enterprise Are Not Only Developers
    - SQL Analysts
    - SAS/R Analysts
    - DBMS Power Users
    - Java Coders
    - …
    [Diagram: Curiosity/Cleverness, Technical Expertise, Business Acumen]
  • 10. Data Scientists Have Different Skills
    Combination of:
    - Analysts
    - Coders
    - Sys admins / EngOps
    Hard to find & expensive
    [Diagram panels: Enterprises, Web Startups]
  • 12. A Brief History of MapReduce & Hadoop
    - 2004: Google publishes the MapReduce paper at the OSDI Conference
    - 2006: Hadoop becomes the first open-source implementation of MapReduce
    - 2008: Aster Data becomes the first DBMS vendor to incorporate MapReduce, tightly coupled: embedded MapReduce with SQL to bring MapReduce to enterprises (SQL-MapReduce®)
    - 2009-2011: Follow-on vendors announce connectors to Hadoop; Hadoop distributions/platforms emerge:
      - Amazon
      - Cloudera
      - Hortonworks
      - DataStax
      - MapR
      - …
  • 13. MapReduce is the SQL of Big Analytics
    - MapReduce is a parallel programming framework
      - “J2EE for Big Data Analytics”
    - MapReduce provides
      - Automatic parallelization
      - Fault tolerance
      - Monitoring & status updates
    - Hadoop
      - Open-source MapReduce
    - Aster
      - Commercial implementation of MapReduce + SQL
    [Diagram: Map Function, Scheduler, then map, shuffle, reduce stages producing Results]
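The map, shuffle, and reduce stages in the slide’s diagram can be illustrated with a minimal single-process Python sketch (word count). It only simulates the data flow: the scheduler, automatic parallelization, fault tolerance, and monitoring that Hadoop or Aster provide are not modeled, and every function name here is hypothetical.

```python
from collections import defaultdict
from typing import Callable, Iterable, Iterator

Pair = tuple[str, int]


def map_phase(records: Iterable[str], mapper: Callable[[str], Iterator[Pair]]) -> list[Pair]:
    """Apply the user-supplied map function to every input record."""
    pairs: list[Pair] = []
    for record in records:
        pairs.extend(mapper(record))
    return pairs


def shuffle_phase(pairs: Iterable[Pair]) -> dict[str, list[int]]:
    """Group intermediate values by key (the framework's job between map and reduce)."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups: dict[str, list[int]], reducer: Callable[[str, list[int]], int]) -> dict[str, int]:
    """Apply the user-supplied reduce function once per key."""
    return {key: reducer(key, values) for key, values in groups.items()}


def word_count_mapper(line: str) -> Iterator[Pair]:
    # Emit (word, 1) for every whitespace-separated word.
    for word in line.lower().split():
        yield word, 1


def word_count_reducer(word: str, counts: list[int]) -> int:
    # Sum the partial counts collected for one word.
    return sum(counts)


if __name__ == "__main__":
    lines = ["big data big analytics", "big data architecture"]
    counts = reduce_phase(shuffle_phase(map_phase(lines, word_count_mapper)), word_count_reducer)
    print(counts)  # {'big': 3, 'data': 2, 'analytics': 1, 'architecture': 1}
```

On a real cluster the mapper and reducer run on many machines and the shuffle moves data across the network; here they are plain function calls, which is why the framework services listed on the slide do not appear.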
  • 15. The Technology Gap
    SQL-MR                                  Hadoop-MR
    - Analyst-friendly                      - Developer-friendly
    - Iterative & fast                      - Batch-oriented
    - Integrates well with BI/Viz tools     - Requires lots of coding
    But what if you need both?
  • 16. Quick Aster & SQL-MapReduce Overview
  • 17. Filling the Gap: SQL-MapReduce
  • 18. Enabling Analysis of Diverse Data
    Aster capabilities for processing and analyzing multi-structured, raw data
    [Diagram: multi-structured raw data feeds the Aster Analytic Platform (SQL-MapReduce: tokenize, unpack, sessionize, …), whose tabular output (Col1, Col2, Col3, Col4) goes to structured data stores (DW, DBMS)]
    - Integrate Data
      - Load raw data directly into Aster Database
      - Bypass complex ETL pipeline via ELT
    - Process and Explore
      - Use SQL-MapReduce functions to interpret & analyze raw data
      - Leverage flexible, dynamically-created schema at runtime
    - Leverage Results
      - Structured output of SQL-MapReduce processing available for further use or output to data warehouse
  • 19. SQL-MapReduce for Big Data Analytics
    Example: Pattern Matching, Time Series Analysis (discover patterns in rows of sequential data)
    - Weblogs {user, page, time}: Click 1, Click 2, Click 3, Click 4
    - Smart Meters {device, value, time}: Reading 1, Reading 2, Reading 3, Reading 4
    - Sales Transactions {user, product, time}: Purchase 1, Purchase 2, Purchase 3, Purchase 4
    - Stock Tick Data {stock, price, time}: Tick 1, Tick 2, Tick 3, Tick 4
    - Call Data Records {user, number, time}: Call 1, Call 2, Call 3, Call 4
    Aster SQL-MapReduce approach:
    - Single pass of data
    - Linked-list sequential analysis
    - Gap recognition
    Traditional SQL approach:
    - Full table scans
    - Self-joins for sequencing
    - Limited operators for ordered data
    Use cases:
    - eBusiness: Sessionization, Click Analysis, Golden Path, Rev Attribution
    - Telecomm: Calling Patterns, Signal Processing, Forecasting
    - Financial: Trade Sequences, Pairs Trading, Fraud Detection
    - Federal: Pattern Detection, Fuzzy Matching, Inference Analysis, Inexact Linking
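The contrast between “single pass with gap recognition” and “self-joins for sequencing” can be seen in a small Python sketch of sessionization over weblog rows {user, page, time}. Once rows are ordered by user and time, one scan assigns session ids by comparing each click only with the previous one. This is not Aster’s sessionize function; the `Click` type, the `sessionize` name, the seconds-based time unit, and the 30-minute default timeout are assumptions for illustration.

```python
from typing import NamedTuple, Optional


class Click(NamedTuple):
    user: str
    page: str
    time: int  # assumed unit: seconds


def sessionize(clicks: list[Click], timeout: int = 1800) -> list[tuple[Click, int]]:
    """Assign a session id to each click in one pass over time-ordered data.

    A new session starts when the user changes or when the gap since the
    same user's previous click exceeds `timeout` seconds.
    """
    ordered = sorted(clicks, key=lambda c: (c.user, c.time))  # the ORDER BY step
    result: list[tuple[Click, int]] = []
    session_id = -1
    prev: Optional[Click] = None
    for click in ordered:  # the single sequential pass
        if prev is None or click.user != prev.user or click.time - prev.time > timeout:
            session_id += 1
        result.append((click, session_id))
        prev = click
    return result


if __name__ == "__main__":
    clicks = [
        Click("u1", "/home", 0),
        Click("u1", "/search", 60),
        Click("u1", "/home", 4000),
        Click("u2", "/home", 30),
    ]
    for click, sid in sessionize(clicks):
        print(click.user, click.time, sid)
```

A self-join formulation would instead pair every click with its predecessor by joining the table to itself, which is the cost the slide attributes to the traditional SQL approach.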
  • 20. Sample SQL-MapReduce Packaged Functions (* = New)
    - Path Analysis: discover patterns in rows of sequential data
      - nPath: complex sequential analysis for time series and behavioral patterns
      - nPath Extensions: count entrants, track exit paths, count children, and generate subsequences
      - Sessionization: identifies sessions from time series data in a single pass
    - Graph and Relational Analysis: analyze patterns across rows of data
      - Graph analysis: finds the shortest path from a distinct node to all other nodes in a graph
      - nTree: new function for performing operations on tree hierarchies *
      - Other: triangle finding, square finding, clustering coefficient *
    - Text Analysis: derive patterns in textual data
      - Sentiment Analysis: classify content as positive or negative (for product reviews, customer feedback) *
      - Text Categorization: label content as spam/not spam *
      - Entity Extraction/Rules Engine: identify addresses, phone numbers, and names in textual data *
      - Text Processing: counts occurrences of words, identifies roots, & tracks relative positions of words & multi-word phrases
      - nGram: split an input stream of text into individual words and phrases
      - Levenshtein Distance: computes the distance between two words
    - Data Transformation: transform data for more advanced analysis
      - Pivot: convert columns to rows or rows to columns *
      - Log parser: generalized tool for parsing Apache logs *
      - Unpack: extracts nested data for further analysis
      - Pack: compress multi-column data into a single column
      - Antiselect: returns all columns except for the specified column
      - Multicase: case statement that supports row match for multiple cases
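Two of the text-analysis functions listed above rest on small, well-known algorithms. The Python sketch below shows the standard dynamic-programming Levenshtein edit distance and a simple word n-gram splitter. It illustrates the algorithms only, not Aster’s implementations or argument lists; the `ngrams` helper and its `max_n` parameter are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def ngrams(text: str, max_n: int = 2) -> list[str]:
    """Split text into all word n-grams of length 1..max_n, shortest first."""
    words = text.lower().split()
    grams: list[str] = []
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            grams.append(" ".join(words[i:i + n]))
    return grams


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))      # 3
    print(ngrams("Big data architecture", 2))
    # ['big', 'data', 'architecture', 'big data', 'data architecture']
```

Keeping only two rows of the distance table makes memory proportional to the shorter input’s length rather than the product of both lengths, which matters when the function runs per row over large text columns.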
  • 21. Complementing Hadoop in the Enterprise
  • 22. You Need Hybrid Architectures
    - Hadoop (Batch): Engineers, 5-10 concurrent users; Ingest, Transform, Archive
      - Fast data loading, ELT/ETL, image processing, online archival
    - Aster (Interactive): Data Scientists, 50+ concurrent users; Discover and Explore
      - Path & pattern analysis, graph analysis, fraud detection, text analysis
    - Teradata (Active): Business Analysts, 5000+ concurrent users; Analyze and Report
      - Operational analysis, transactional analysis, high-volume ad-hoc, elastic data marts
  • 23. Complementary and Overlapping Use Cases
    - Batch processing: data preprocessing, image processing, search indexes, web crawling
    - Overlapping: web log analysis, text processing, genomic, astronomical, and geo-spatial scientific analytics
    - Fast/interactive: pattern matching, visitor behavior, graph & relationship analysis, investigative analytics
  • 24. An Example of an Enterprise Hybrid Architecture
    Users: Data Scientists, Business Analysts, BI, Data Apps
    - Hadoop (multi-structured data): batch processing; data archival; data transformations
    - Teradata | Aster (multi-structured data): weblogs; machine data; customer interaction data; call center text; integration with address, financial, phones, … data
    - Teradata | EDW (structured data): financial data; SAP, ERP, operational data; customer addresses, phones, etc …
  • 25. Connecting Hadoop With Other Systems
  • 26. 3 Ways to Connect Hadoop to Databases
    - Batch HDFS scripts
    - Hadoop front-end (Pig/Hive)
    - Purpose-built connectors
    [Chart axes: Ad-Hoc to Purpose-Built; Ease of Use]
  • 27. Using Aster Data and Hadoop Together
    Aster Data for rich, ultra-fast analytics
    [Diagram: diverse data sources (web data, NetFlow data, log files, text files) feed Hadoop (HDFS, Map/Reduce), which connects through the HDFS connector to Aster Database (SQL + SQL/MR)]
    1. Non-relational data loaded into Hadoop cluster
    2. Hadoop processes data transformation
    3. Data from HDFS loaded into Aster using HDFS connector
    4. Data used for interactive analytics inside Aster Database
  • 28. The Aster-Hadoop Data Connector
    Enable users to analyze data where it makes the most sense
    - Why Is It Needed?
      - Hadoop can be used for batch ETL and batch data processing
      - Aster for fast, interactive analysis
      - Challenge: slow, tedious manual operations required to transfer data from Hadoop into Aster Database
    - What Is It?
      - A set of 2 SQL-MapReduce functions developed by Aster Data
        - LoadFromHadoop: parallel data loading from HDFS to Aster nCluster
        - LoadToHadoop: parallel data loading from Aster nCluster to HDFS
      - Advantages: parallel performance, seamless (SQL), consistency (ACID)
    Example:
      insert into mytable
      select *
      from load_from_hadoop(
        on mytable
        host('10.10.3.22')
        port(9000)
        delimiter(',')
        nullstring('')
        files('hdfs_input_filepaths.txt')
      );
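The slide does not spell out how `load_from_hadoop` interprets its arguments. The Python sketch below assumes the common bulk-loader convention: `delimiter(...)` splits each line into columns, a field equal to `nullstring(...)` becomes SQL NULL, and parallel loading spreads the listed input files across workers. `parse_rows` and `assign_files` are hypothetical helpers for illustration, not part of the connector.

```python
import csv
import io
from typing import Optional


def parse_rows(data: str, delimiter: str = ",", nullstring: str = "") -> list[list[Optional[str]]]:
    """Split delimited text into rows; fields equal to `nullstring` become None (i.e. SQL NULL).

    Assumption: this mirrors the intent of delimiter(...) and nullstring(...)
    on the slide; the connector's exact semantics are not documented there.
    """
    reader = csv.reader(io.StringIO(data), delimiter=delimiter)
    return [[None if field == nullstring else field for field in row] for row in reader]


def assign_files(paths: list[str], workers: int) -> list[list[str]]:
    """Round-robin the input file list across workers so each loads a share in parallel."""
    buckets: list[list[str]] = [[] for _ in range(workers)]
    for i, path in enumerate(paths):
        buckets[i % workers].append(path)
    return buckets


if __name__ == "__main__":
    print(parse_rows("1,alice,\n2,,bob\n"))  # [['1', 'alice', None], ['2', None, 'bob']]
    print(assign_files(["part-0", "part-1", "part-2"], workers=2))  # [['part-0', 'part-2'], ['part-1']]
```

With `nullstring('')` as on the slide, an empty field and a missing value are indistinguishable; choosing a sentinel such as `NULL` would keep genuinely empty strings intact.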
  • 30. Example #1: SQL-MapReduce for Data Scientist Investigative Analytics
    Data Scientist Discovery of Bot Detection Algos
    - Business Goal:
      - Update bot detection algos with new markers of suspect traffic for potential fraud or spam attacks
    - Aster Data Differentiated Solution:
      - Investigative analysis to identify new attributes that increase the predictive accuracy of bot detection
      - Correlate data within/across sessions from complex URLs
      - Use nPath to quickly identify and iteratively explore site activity patterns
    - Business Impact:
      - Site integrity: identify bot traffic, which can degrade performance and security of www.book.com (B&N)
      - Improved customer experience: detect and prevent spam and other automated nuisances to B&N members
    “We’ve always wanted to examine search sub-sessions to really understand what behaviors come from specific searches… All of this requires cursors and external programming in Oracle, but can be easily parallelized in Aster Data even with non-programmers.”
      Michael Wexler, VP of Analytics, Barnes & Noble
    Other Aster Data Applications at Barnes & Noble:
    - Online marketing attribution: across search, device, features
    - Customer personalized recommendations: ever-changing
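The slide names nPath as the tool for finding site-activity patterns that mark bot traffic, but shows no nPath syntax. The Python sketch below illustrates the general idea instead: encode each session’s ordered page types as a symbol string, then match a regular expression over that path. The symbol mapping and the “five or more consecutive searches with no product view” rule are invented for illustration; they are not Barnes & Noble’s markers or nPath’s API.

```python
import re

# Hypothetical mapping from page type to a one-character symbol.
SYMBOLS = {"search": "S", "product": "P", "cart": "C"}

# Hypothetical bot marker: a run of five or more consecutive searches.
BOT_RUN = re.compile(r"S{5,}")


def encode(session: list[str]) -> str:
    """Turn a session's ordered page types into a symbol string; unknown types become 'O'."""
    return "".join(SYMBOLS.get(page, "O") for page in session)


def looks_like_bot(session: list[str]) -> bool:
    """Flag sessions that contain the search run and never view a product page."""
    path = encode(session)
    return BOT_RUN.search(path) is not None and "P" not in path


if __name__ == "__main__":
    sessions = {
        "s1": ["search"] * 6,
        "s2": ["search"] * 6 + ["product"],
        "s3": ["home", "search", "product", "cart"],
    }
    for sid, pages in sessions.items():
        print(sid, encode(pages), looks_like_bot(pages))
```

Iterating on a candidate marker then amounts to editing the pattern and rerunning it over all sessions, which is the exploratory loop the quote describes.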
  • 31. Example #2: Enabling Creation of Data-Driven Products / “Cards that fit you”
    - Personalized recommendations of the credit cards that would best fit each customer
    - Uses clickstream analysis + text analysis to process data about customer interests and spending patterns
    - Business Impact: delivers referral revenue related to click-throughs on specific card offers
  • 32. Example #3: Better Visibility to Marketing Impact
    “Aster gives us the analytic capability to provide best-in-class digital marketing optimization for our clients, enabling more accurate marketing attribution. With Aster, we can help our clients understand every marketing interaction with consumers over time and across their entire online market ecosystem, knowing the impact of every marketing dollar spent.”
      Sunil Kavi, Director of Technology, Razorfish
  • 33. Visualization Example: Aster Data Tableau Integration with SQL-MapReduce®
  • 34. Summary: MapReduce for the Rest of Us
    1. Data Science is growing fast, but the big enterprise is not Facebook
    2. There is a gap between existing enterprise skills and technology capabilities
    3. To solve this problem, look at utilizing the right technology for the right problem
  • 35. Thank You! ... Questions?
    Learn More About SQL-MapReduce
    - MapReduce Resource Center: www.asterdata.com/mapreduce
    - Aster Developer Express IDE trial: www.asterdata.com/ide
    - Download white paper at www.asterdata.com
    See it in action tonight!! Aster & Tableau Happy Hour
    Eventi Hotel, 851 Avenue of the Americas (6th Avenue), New York, NY 10001, 7-9PM