Integrating Hadoop within an Enterprise
Analytic Ecosystem
Priyank Patel | Product Management
June 13, 2012
Topics


•  Unified Big Data Architecture Overview

•  Aster SQL-H™: The business user’s bridge to Hadoop data




2   Confidential and proprietary. Copyright © 2012 Teradata Corporation.
Big Data: From Transactions to Interactions

[Figure: regions labeled ERP, CRM, WEB, and BIG DATA, arranged along an axis of increasing data variety and complexity. Labels shown in the figure: Purchase detail, Purchase record, Payment record, Offer history, Offer details, Segmentation, Customer Touches, Support Contacts, Web logs, A/B testing, Dynamic Pricing, Affiliate Networks, Search marketing, Behavioral Targeting, Dynamic Funnels, User Generated Content, Mobile Web, User Click Stream, Sentiment, Social Network, External Demographics, Business Data Feeds, HD Video, Speech to Text, Product/Service Logs, SMS/MMS.]



Unified Big Data Architecture
Bridging Classic & Big Data Worlds

Classic BI Method: Structured & Repeatable Analysis
•  Business determines what questions to ask
•  IT structures the data to answer those questions
•  “Capture only what’s needed”

Bridge between the two
•  SQL performance and structure
•  MapReduce processing flexibility

Big Data Analytics: Multi-structured & Iterative Analysis
•  IT delivers a platform for storing, refining, and analyzing all data sources
•  Business explores data for questions worth answering
•  “Capture in case it’s needed”
Need for a Unified Big Data Architecture for New Insights
Enabling All Users for Any Data Type from Data Capture to Analysis




[Diagram, top to bottom]
•  Tools and languages: Java, C/C++, Pig, Python, R, SAS, SQL, Excel, BI, Visualization, etc.
•  Discover and Explore | Reporting and Execution in the Enterprise
•  Capture, Store and Refine
•  Data sources: Audio/Video, Images, Docs, Text, Web & Social, Machine Logs, CRM, SCM, ERP



Unified Big Data Architecture for the Enterprise



[Diagram, top to bottom]
•  Users: Engineers, Data Scientists, Quants, Business Analysts
•  Tools and languages: Java, C/C++, Pig, Python, R, SAS, SQL, Excel, BI, Visualization, etc.
•  Discovery Platform | Integrated Data Warehouse
•  Capture, Store, Refine
•  Data sources: Audio/Video, Images, Text, Web & Social, Machine Logs, CRM, SCM, ERP




What’s Technically Different in Big Data Analytics
Variety of data types and analytics require different schemas
•  Data that uses a stable schema (structured)
    -  Data from packaged business processes with well-defined & known attributes
       (e.g., ERP data, Inventory Records, Supply Chain records, …)


•  Data that has an evolving schema (semi-structured)
    -  Data generated by machine processes; known but changing set of attributes
       (e.g., Web logs, CDRs, Sensor logs, JSON, Social profiles, Twitter feeds, …)


•  Data that has a format, but no schema (unstructured)
    -  Data captured by machines with well-defined format, but no semantics
       (e.g., images, videos, web pages, PDF documents, …)
    -  Semantics can be extracted from raw data by interpreting the format and
       pulling out required data
       (e.g., shapes from video, face recognition in images, logo detection, …)
    -  Formatted data is sometimes accompanied by metadata that has its own
       schema (stable or evolving); that metadata needs to be classified and
       treated separately
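
As a rough illustration of the "evolving schema" case above, the sketch below reads JSON records whose attribute set changes between records and projects them onto the union of all attributes seen. The record contents and the function names (`union_schema`, `to_rows`) are hypothetical and only for illustration; this is one simple way to cope with a changing attribute set, not a description of how any product in this deck does it.

```python
import json


def union_schema(records):
    """Return the sorted union of attribute names seen across all records."""
    columns = set()
    for record in records:
        columns.update(record.keys())
    return sorted(columns)


def to_rows(records, columns):
    """Project each record onto the column list, filling missing attributes with None."""
    return [tuple(record.get(col) for col in columns) for record in records]


# Hypothetical log lines: the second record carries an attribute the first lacks.
raw_lines = [
    '{"user": "u1", "url": "/home"}',
    '{"user": "u2", "url": "/cart", "referrer": "ad-17"}',
]
records = [json.loads(line) for line in raw_lines]
columns = union_schema(records)
rows = to_rows(records, columns)
```

Records that predate a new attribute simply carry `None` in that column, so older and newer data can sit in the same table.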

Diversity of Data Processing and Analytics
Unified Big Data Architecture Must Handle Each Workload Optimally

•  Low cost storage and retention
    -  Retain raw data in a manner that provides a low TCO per terabyte of storage
    -  Access to deep storage is still required, but not at the same speeds as in a front-line system
•  Loading and refining
    -  Load: bring data into the system from the source system
    -  Pre-processing / prep / cleansing / constraint validation: prepare data for
       downstream processing, e.g., fetch dimension data, record new incoming batch, archive
       old window batch, etc.
    -  Transformations: convert one structure of data into another. This may
       require going from 3NF to a star/snowflake schema within relational, or going
       from text to relational, or going from relational to graph, i.e., structural
       transformations
•  Reporting
    -  Querying what happened, where it happened, how much happened, and who did it
•  Analytics (user-driven, interactive, ad-hoc)
    -  Relationship modeling that can be done via declarative SQL (e.g., scoring, basic stats)
    -  Relationship modeling done via procedural MR (e.g., model building, time series)
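
The "text to relational" transformation and the cleansing step listed above can be sketched in a few lines. The log layout (`<ip> <method> <path> <status>`), the `Hit` record, and the function names are assumptions made up for this example; real web logs have richer formats.

```python
import re
from typing import NamedTuple, Optional


class Hit(NamedTuple):
    """One relational row produced from one raw log line."""
    ip: str
    method: str
    path: str
    status: int


# Hypothetical, simplified log layout: "<ip> <method> <path> <status>"
LINE_RE = re.compile(r"^(\S+) (GET|POST) (\S+) (\d{3})$")


def parse_line(line: str) -> Optional[Hit]:
    """Turn one text line into a typed row, or None if it fails validation."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    ip, method, path, status = match.groups()
    return Hit(ip, method, path, int(status))


def refine(lines):
    """Split raw lines into parsed rows and rejected (malformed) lines."""
    hits, rejected = [], []
    for line in lines:
        hit = parse_line(line)
        if hit is None:
            rejected.append(line)
        else:
            hits.append(hit)
    return hits, rejected


hits, rejected = refine([
    "10.0.0.1 GET /home 200",
    "garbage",
    "10.0.0.2 POST /cart 500",
])
```

Keeping the rejected lines rather than dropping them makes it possible to audit what the cleansing step filtered out.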



When to Use Which?
 The best approach by workload and data type
 Processing as a Function of Schema Requirements by Data Type

                                       Loading and Refining
                   Low Cost Storage    Data Pre-Processing,  Transformations      Reporting  Analytics
                   & Retention         Prep, Cleansing                                       (User-driven, interactive)

Stable Schema      Teradata / Hadoop   Teradata              Teradata             Teradata   Teradata (SQL analytics)
Evolving Schema    Hadoop              Aster / Hadoop        Aster (joining with  Aster      Aster (SQL + MapReduce Analytics)
                                                             structured data)
Format, No Schema  Hadoop              Hadoop                Hadoop                          Aster (MapReduce Analytics)

Example workloads by data type:
•  Stable schema: financial analysis, ad-hoc/OLAP, enterprise-wide BI and reporting, spatial/temporal, active execution
•  Evolving schema: interactive data discovery, web clickstream, social feeds, set-top box analysis, CDRs, sensor logs, JSON
•  Format, no schema: image processing, audio/video storage and refining, storage and batch transformations
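
The matrix above can also be written down as a simple lookup, which makes the recommendation per data type and workload explicit. The cell values are read from the slide's flattened layout, so treat the exact cell assignments as a best reading; the dictionary keys and the `recommend` function are names invented for this sketch.

```python
# Platform per (data type, workload), as read from the slide's matrix.
# Keys are hypothetical short names chosen for this example.
PLATFORM_BY_WORKLOAD = {
    "stable schema": {
        "storage": "Teradata / Hadoop",
        "pre-processing": "Teradata",
        "transformations": "Teradata",
        "reporting": "Teradata",
        "analytics": "Teradata (SQL analytics)",
    },
    "evolving schema": {
        "storage": "Hadoop",
        "pre-processing": "Aster / Hadoop",
        "transformations": "Aster (joining with structured data)",
        "reporting": "Aster",
        "analytics": "Aster (SQL + MapReduce analytics)",
    },
    "format, no schema": {
        "storage": "Hadoop",
        "pre-processing": "Hadoop",
        "transformations": "Hadoop",
        # The slide shows no platform in the reporting cell for this row.
        "analytics": "Aster (MapReduce analytics)",
    },
}


def recommend(data_type, workload):
    """Return the platform the matrix lists, or None if the cell is empty or unknown."""
    return PLATFORM_BY_WORKLOAD.get(data_type.lower(), {}).get(workload.lower())
```

For example, `recommend("Evolving Schema", "analytics")` returns the Aster entry, while an empty cell such as reporting on schema-less data returns `None`.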

Aster Digital Marketing Client


[Diagram: data sources (Custom Data by Client, Media Data (Aggregated), Raw Web Logs, Ad Server Logs); Hadoop (on AWS) for storage, aggregations, and cleansing; Teradata Aster, which serves Analytic Tools; flows labeled "Cookie-level data" and "Archival".]

•  Segmentation: Custom SQL-MR algorithms to match and create centralized identifiers
•  Sessionize by client
•  nPath identifies segment path analysis (behavior after ads)
•  Benefits:
    -  Marketing analysts more productive with Aster
    -  Lower cost: storage and batch refining done on Amazon Elastic MapReduce
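
To make the "sessionize" step on this slide concrete, here is a minimal sketch of timeout-based sessionization over cookie-level events. It is not the SQL-MR function used by the client; the 30-minute inactivity cutoff, the event layout `(cookie, timestamp)`, and the function name are all assumptions for illustration.

```python
from collections import defaultdict

SESSION_TIMEOUT_SECONDS = 30 * 60  # assumed inactivity cutoff, not taken from the slides


def sessionize(events, timeout=SESSION_TIMEOUT_SECONDS):
    """Assign a per-cookie session number to each (cookie, timestamp) event.

    A new session starts whenever the gap since the cookie's previous event
    exceeds `timeout` seconds. Returns (cookie, timestamp, session_number)
    tuples ordered by cookie, then time.
    """
    by_cookie = defaultdict(list)
    for cookie, ts in events:
        by_cookie[cookie].append(ts)

    result = []
    for cookie in sorted(by_cookie):
        session = 0
        previous = None
        for ts in sorted(by_cookie[cookie]):
            if previous is not None and ts - previous > timeout:
                session += 1
            result.append((cookie, ts, session))
            previous = ts
    return result


events = [("a", 0), ("a", 600), ("a", 4000), ("b", 100)]
sessions = sessionize(events)
```

Here cookie `a` gets two sessions because the gap between its second and third events exceeds the cutoff.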



More Accurate Customer Churn Prevention

•  Hadoop captures, stores and transforms images and call records
•  Aster does path and sentiment analysis with multi-structured data

[Diagram]
•  Multi-structured raw data (Call Center Voice Records, Check Images) flows into Hadoop, the Capture, Retention & Transformation Layer
•  Hadoop passes Call Data and Check Data to the Aster Discovery Platform, which also takes in Social & Web data
•  Traditional data flow: Data Sources, then ETL Tools, then the Teradata Integrated DW
•  Dimensional Data and Analytic Results are exchanged between the Teradata Integrated DW and the Aster Discovery Platform
•  Output: Analysis + Marketing Automation (Customer Retention Campaign)




Bridging the Business Analyst Gap for Hadoop Data
Aster SQL-H™
A Business User’s Bridge to Analyze Hadoop Data



Aster SQL-H gives analysts and data scientists a better way
to analyze data stored cheaply in Hadoop
      •  Allow standard ANSI SQL access to Hadoop data

      •  Leverage existing BI tool investments

      •  Enable 50+ prebuilt SQL-MapReduce Apps and IDE

      •  Improve self-sufficiency for analysts going against Hadoop

The Big Data Architecture Today Has Gaps

[Diagram, top to bottom]
•  Users: Engineers, Data Scientists, Quants, Business Analysts
•  Gap 1: Analysts
•  Tools and languages: Java, C/C++, Pig, Python, R, SAS, SQL, Excel, BI, Visualization, etc.
•  MapReduce (Processing)
•  Database and Analytic Processing Layer: Discovery Platform, Active Data Warehouse
•  Gap 2: File system lacks optimizers, data locality, indexes
•  Data Storage and Refining
•  Data sources: Audio/Video, Images, Text, Web & Social, Machine Logs, CRM, SCM, ERP




Analyst’s Goal: Get Insights from Data in Hadoop


[Diagram: three paths from users (Engineers, Data Scientists, Quants, Business Analysts) to data]
•  Custom Code and Development: MR, Pig, Hive (“IT is the optimizer”)
•  Aster MapReduce Portfolio: SQL & SQL-MapReduce on the Teradata Aster Discovery Platform
•  Teradata Analytics Portfolio: SQL on the Teradata IDW




Analytics on Hadoop Data with Aster SQL-H


[Diagram: users (Engineers, Data Scientists, Quants, Business Analysts) reach data through two portfolios]
•  Aster MapReduce Portfolio: SQL-H plus SQL & SQL-MapReduce on the Teradata Aster Discovery Platform
•  Teradata Analytics Portfolio: SQL on the Teradata IDW




Aster SQL-H Integration with Hadoop Catalog
A Business User’s Bridge to Analyzing Data in Hadoop

•  Industry’s first database integration with Hadoop’s HCatalog
•  Abstraction layer to easily and efficiently read structured & multi-structured
   data stored in HDFS
•  Uses Hadoop Catalog (HCatalog) to perform data abstraction functions
   (e.g., automatically understands tables, data partitions)
•  HDFS data presented to users as Aster tables
•  Fully accessible within the Aster SQL and SQL-MapReduce processing
   engines, plus ODBC/JDBC & BI tools

[Diagram: Aster SQL-H connected to a Hadoop stack of MapReduce, Hive, Pig, HCatalog, and HDFS.]

Benefits of Aster SQL-H™
Deep metadata layer integration between Aster and Hadoop

Business Analysts (Powerful analytics & Performance)
•  50+ advanced SQL-MapReduce functions (Aster MapReduce Portfolio)
  -  Analytics on Hadoop data no longer requires expensive talent and training
•  Simplified, SQL-based interface with Hadoop data structures (HCatalog)
  -  No longer limited by HiveQL
•  Interoperability with existing ecosystem & skillset
  -  BI tools (MSTR, Tableau, Cognos), ETL tools, SQL analysts, existing apps


Architects and Administrators (Maintainability)
•  Leverage existing DBA skill-sets without additional overhead
•  Simplify administration and monitoring
  -  Competitors require manual creation and maintenance of metadata
  -  Less work and fewer errors
  -  Can do filtering with Aster; select data from HCatalog, leverage partitioning
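
The last point, filtering while leveraging partitioning, can be illustrated with a small partition-pruning sketch. This is a generic illustration of the idea, not SQL-H's or HCatalog's actual interface: the in-memory `CATALOG`, the `weblogs` table, its date partitions, and the `scan` function are all hypothetical.

```python
# Hypothetical catalog: table name -> partition value -> rows in that partition.
CATALOG = {
    "weblogs": {
        "2012-06-11": [{"user": "u1", "status": 200}],
        "2012-06-12": [{"user": "u2", "status": 500}, {"user": "u3", "status": 200}],
        "2012-06-13": [{"user": "u1", "status": 500}],
    }
}


def scan(table, partition_filter, row_filter):
    """Read only partitions that pass partition_filter, then filter their rows.

    Returns the matching rows and the list of partitions actually read.
    """
    partitions_read = []
    rows = []
    for partition, partition_rows in sorted(CATALOG[table].items()):
        if not partition_filter(partition):
            continue  # pruned: this partition's rows are never touched
        partitions_read.append(partition)
        rows.extend(r for r in partition_rows if row_filter(r))
    return rows, partitions_read


error_rows, partitions_read = scan(
    "weblogs",
    partition_filter=lambda day: day >= "2012-06-12",
    row_filter=lambda row: row["status"] == 500,
)
```

Because the catalog records which partitions exist, a filter on the partition key avoids reading the excluded partitions at all; the row filter then applies only to what was read.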


Aster MapReduce Portfolio: the App Store of Big Data
Some of the 50+ out-of-the-box analytical apps



•  Path Analysis: Discover patterns in rows of sequential data
•  Text Analysis: Derive patterns and extract features in textual data
•  Statistical Analysis: High-performance processing of common statistical calculations
•  Segmentation: Discover natural groupings of data points
•  Marketing Analytics: Analyze customer interactions to optimize marketing decisions
•  Data Transformation: Transform data for more advanced analysis
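
As a rough idea of what "path analysis" over rows of sequential data means, the sketch below counts consecutive page sequences within each user's time-ordered events. It is a plain Python illustration, not the Aster nPath function; the event layout `(user, timestamp, page)` and the function name `path_counts` are assumptions.

```python
from collections import Counter, defaultdict


def path_counts(events, length=2):
    """Count consecutive page sequences of `length` within each user's ordered events.

    events: iterable of (user, timestamp, page) tuples in any order.
    """
    by_user = defaultdict(list)
    for user, ts, page in events:
        by_user[user].append((ts, page))

    counts = Counter()
    for visits in by_user.values():
        # Order each user's visits by time before extracting sequences.
        pages = [page for _, page in sorted(visits)]
        for i in range(len(pages) - length + 1):
            counts[tuple(pages[i:i + length])] += 1
    return counts


events = [
    ("u1", 1, "ad"), ("u1", 2, "home"), ("u1", 3, "cart"),
    ("u2", 1, "ad"), ("u2", 2, "home"),
]
counts = path_counts(events)
```

The most frequent sequences (here, an ad view followed by the home page) are the kind of pattern a path-analysis function surfaces.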



Summary


•  Mainstream organizations need a unified big data architecture
     -  Best-of-breed with Hadoop, Aster, Teradata
     -  Brings “Data Science” to business analysts
     -  50+ business-ready MapReduce analytics and apps
     -  Enabled by SQL-MapReduce framework and new SQL-H


•  Learn more - asterdata.com/mapreduce
•  Download - developer.teradata.com/aster

•  Breakout Session: Thursday, 4:30 pm
        How does SQL-H work?
              Sushil Thomas, Teradata Aster



Unified big data architecture

More Related Content

What's hot

Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)James Serra
 
Data Warehouse Tutorial For Beginners | Data Warehouse Concepts | Data Wareho...
Data Warehouse Tutorial For Beginners | Data Warehouse Concepts | Data Wareho...Data Warehouse Tutorial For Beginners | Data Warehouse Concepts | Data Wareho...
Data Warehouse Tutorial For Beginners | Data Warehouse Concepts | Data Wareho...Edureka!
 
Introduction to Data Engineering
Introduction to Data EngineeringIntroduction to Data Engineering
Introduction to Data EngineeringHadi Fadlallah
 
DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDatabricks
 
Data platform architecture
Data platform architectureData platform architecture
Data platform architectureSudheer Kondla
 
Oracle - Enterprise Manager 12c Overview
Oracle - Enterprise Manager 12c OverviewOracle - Enterprise Manager 12c Overview
Oracle - Enterprise Manager 12c OverviewFred Sim
 
Data Quality Patterns in the Cloud with Azure Data Factory
Data Quality Patterns in the Cloud with Azure Data FactoryData Quality Patterns in the Cloud with Azure Data Factory
Data Quality Patterns in the Cloud with Azure Data FactoryMark Kromer
 
Azure+Databricks+Course+Slide+Deck+V4.pdf
Azure+Databricks+Course+Slide+Deck+V4.pdfAzure+Databricks+Course+Slide+Deck+V4.pdf
Azure+Databricks+Course+Slide+Deck+V4.pdfChitresh Kaushik
 
Talend ETL Tutorial | Talend Tutorial For Beginners | Talend Online Training ...
Talend ETL Tutorial | Talend Tutorial For Beginners | Talend Online Training ...Talend ETL Tutorial | Talend Tutorial For Beginners | Talend Online Training ...
Talend ETL Tutorial | Talend Tutorial For Beginners | Talend Online Training ...Edureka!
 
Lecture4 big data technology foundations
Lecture4 big data technology foundationsLecture4 big data technology foundations
Lecture4 big data technology foundationshktripathy
 
Introduction to Data Stream Processing
Introduction to Data Stream ProcessingIntroduction to Data Stream Processing
Introduction to Data Stream ProcessingSafe Software
 
Snowflake: Your Data. No Limits (Session sponsored by Snowflake) - AWS Summit...
Snowflake: Your Data. No Limits (Session sponsored by Snowflake) - AWS Summit...Snowflake: Your Data. No Limits (Session sponsored by Snowflake) - AWS Summit...
Snowflake: Your Data. No Limits (Session sponsored by Snowflake) - AWS Summit...Amazon Web Services
 
Databricks on AWS.pptx
Databricks on AWS.pptxDatabricks on AWS.pptx
Databricks on AWS.pptxWasm1953
 
Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)James Serra
 
Intro to Data Vault 2.0 on Snowflake
Intro to Data Vault 2.0 on SnowflakeIntro to Data Vault 2.0 on Snowflake
Intro to Data Vault 2.0 on SnowflakeKent Graziano
 
Databricks + Snowflake: Catalyzing Data and AI Initiatives
Databricks + Snowflake: Catalyzing Data and AI InitiativesDatabricks + Snowflake: Catalyzing Data and AI Initiatives
Databricks + Snowflake: Catalyzing Data and AI InitiativesDatabricks
 

What's hot (20)

Elastic Data Warehousing
Elastic Data WarehousingElastic Data Warehousing
Elastic Data Warehousing
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)
 
Data Warehouse Tutorial For Beginners | Data Warehouse Concepts | Data Wareho...
Data Warehouse Tutorial For Beginners | Data Warehouse Concepts | Data Wareho...Data Warehouse Tutorial For Beginners | Data Warehouse Concepts | Data Wareho...
Data Warehouse Tutorial For Beginners | Data Warehouse Concepts | Data Wareho...
 
Introduction to Data Engineering
Introduction to Data EngineeringIntroduction to Data Engineering
Introduction to Data Engineering
 
DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Introduction to ETL and Data Integration
Introduction to ETL and Data IntegrationIntroduction to ETL and Data Integration
Introduction to ETL and Data Integration
 
Data platform architecture
Data platform architectureData platform architecture
Data platform architecture
 
Oracle - Enterprise Manager 12c Overview
Oracle - Enterprise Manager 12c OverviewOracle - Enterprise Manager 12c Overview
Oracle - Enterprise Manager 12c Overview
 
Data Quality Patterns in the Cloud with Azure Data Factory
Data Quality Patterns in the Cloud with Azure Data FactoryData Quality Patterns in the Cloud with Azure Data Factory
Data Quality Patterns in the Cloud with Azure Data Factory
 
Azure+Databricks+Course+Slide+Deck+V4.pdf
Azure+Databricks+Course+Slide+Deck+V4.pdfAzure+Databricks+Course+Slide+Deck+V4.pdf
Azure+Databricks+Course+Slide+Deck+V4.pdf
 
Talend ETL Tutorial | Talend Tutorial For Beginners | Talend Online Training ...
Talend ETL Tutorial | Talend Tutorial For Beginners | Talend Online Training ...Talend ETL Tutorial | Talend Tutorial For Beginners | Talend Online Training ...
Talend ETL Tutorial | Talend Tutorial For Beginners | Talend Online Training ...
 
Lecture4 big data technology foundations
Lecture4 big data technology foundationsLecture4 big data technology foundations
Lecture4 big data technology foundations
 
Introduction to Data Stream Processing
Introduction to Data Stream ProcessingIntroduction to Data Stream Processing
Introduction to Data Stream Processing
 
What is ETL?
What is ETL?What is ETL?
What is ETL?
 
Architecting a datalake
Architecting a datalakeArchitecting a datalake
Architecting a datalake
 
Snowflake: Your Data. No Limits (Session sponsored by Snowflake) - AWS Summit...
Snowflake: Your Data. No Limits (Session sponsored by Snowflake) - AWS Summit...Snowflake: Your Data. No Limits (Session sponsored by Snowflake) - AWS Summit...
Snowflake: Your Data. No Limits (Session sponsored by Snowflake) - AWS Summit...
 
Databricks on AWS.pptx
Databricks on AWS.pptxDatabricks on AWS.pptx
Databricks on AWS.pptx
 
Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)
 
Intro to Data Vault 2.0 on Snowflake
Intro to Data Vault 2.0 on SnowflakeIntro to Data Vault 2.0 on Snowflake
Intro to Data Vault 2.0 on Snowflake
 
Databricks + Snowflake: Catalyzing Data and AI Initiatives
Databricks + Snowflake: Catalyzing Data and AI InitiativesDatabricks + Snowflake: Catalyzing Data and AI Initiatives
Databricks + Snowflake: Catalyzing Data and AI Initiatives
 

Similar to Unified big data architecture

The Comprehensive Approach: A Unified Information Architecture
The Comprehensive Approach: A Unified Information ArchitectureThe Comprehensive Approach: A Unified Information Architecture
The Comprehensive Approach: A Unified Information ArchitectureInside Analysis
 
Talk IT_ Oracle_김태완_110831
Talk IT_ Oracle_김태완_110831Talk IT_ Oracle_김태완_110831
Talk IT_ Oracle_김태완_110831Cana Ko
 
Teradata Big Data London Seminar
Teradata Big Data London SeminarTeradata Big Data London Seminar
Teradata Big Data London SeminarHortonworks
 
The Next Generation of Big Data Analytics
The Next Generation of Big Data AnalyticsThe Next Generation of Big Data Analytics
The Next Generation of Big Data AnalyticsHortonworks
 
Microsoft SQL Server 2012 Master Data Services
Unified big data architecture

  • 1. Integrating Hadoop within an Enterprise Analytic Ecosystem Priyank Patel | Product Management June 13 2012
  • 2. Topics
      • Unified Big Data Architecture Overview
      • Aster SQL-H™: The business user’s bridge to Hadoop data
      Confidential and proprietary. Copyright © 2012 Teradata Corporation.
  • 3. Big Data: From Transactions to Interactions
      [Diagram: nested data domains ERP, CRM, WEB and BIG DATA, arranged along an axis of increasing data variety and complexity. Example sources shown: Purchase detail, Purchase record, Payment record, Segmentation, Offer details, Offer history, Customer Touches, Support Contacts, Web logs, User Click Stream, A/B testing, Dynamic Pricing, Affiliate Networks, Search marketing, Behavioral Targeting, Dynamic Funnels, Mobile Web, User Generated Content, Social Network, Sentiment, External Demographics, Business Data Feeds, HD Video, Speech to Text, Product/Service Logs, SMS/MMS.]
  • 4. Unified Big Data Architecture: Bridging Classic & Big Data Worlds
      • Classic BI Method: Structured & Repeatable Analysis
          - Business determines what questions to ask
          - IT structures the data to answer those questions
          - SQL performance and structure
          - “Capture only what’s needed”
      • Big Data Analytics: Multi-structured & Iterative Analysis
          - IT delivers a platform for storing, refining, and analyzing all data sources
          - Business explores data for questions worth answering
          - MapReduce processing flexibility
          - “Capture in case it’s needed”
  • 5. Need for a Unified Big Data Architecture for New Insights: Enabling All Users for Any Data Type from Data Capture to Analysis
      • Tools: Java, C/C++, Pig, Python, R, SAS, SQL, Excel, BI, Visualization, etc.
      • Layers: Reporting and Execution in the Enterprise; Discover and Explore; Capture, Store and Refine
      • Data sources: Audio/Video, Images, Docs, Text, Web & Social, Machine Logs, CRM, SCM, ERP
  • 6. Unified Big Data Architecture for the Enterprise
      • Users: Engineers, Data Scientists, Quants, Business Analysts
      • Tools: Java, C/C++, Pig, Python, R, SAS, SQL, Excel, BI, Visualization, etc.
      • Platforms: Discovery Platform; Integrated Data Warehouse; Capture, Store, Refine
      • Data sources: Audio/Video, Images, Text, Web & Social, Machine Logs, CRM, SCM, ERP
  • 7. What’s Technically Different in Big Data Analytics: Variety of data types and analytics require different schemas
      • Data that uses a stable schema (structured)
          - Data from packaged business processes with well-defined and known attributes (e.g., ERP data, inventory records, supply chain records)
      • Data that has an evolving schema (semi-structured)
          - Data generated by machine processes; a known but changing set of attributes (e.g., Web logs, CDRs, sensor logs, JSON, social profiles, Twitter feeds)
      • Data that has a format, but no schema (unstructured)
          - Data captured by machines with a well-defined format, but no semantics (e.g., images, videos, web pages, PDF documents)
          - Semantics can be extracted from raw data by interpreting the format and pulling out the required data (e.g., shapes from video, face recognition in images, logo detection)
          - Format data is sometimes accompanied by metadata that has a stable or evolving schema; that metadata needs to be classified and treated separately
  • 8. Diversity of Data Processing and Analytics: Unified Big Data Architecture Must Handle Each Workload Optimally
      • Low cost storage and retention
          - Retention of raw data in a manner that provides a low TCO per terabyte of storage
          - Access to deep storage is still required, but not at the same speeds as in a front-line system
      • Loading and refining
          - Load: bring data into the system from the source system
          - Pre-processing / prep / cleansing / constraint validation: prepare data for downstream processing (e.g., fetch dimension data, record the new incoming batch, archive the old window batch)
          - Transformations: convert one structure of data into another. This may mean going from 3NF to a star/snowflake schema within relational, from text to relational, or from relational to graph (i.e., structural transformations)
      • Reporting
          - Querying what happened, where it happened, how much happened, and who did it
      • Analytics (user-driven, interactive, ad hoc)
          - Relationship modeling that can be done via declarative SQL (e.g., scoring, basic stats)
          - Relationship modeling done via procedural MR (e.g., model building, time series)
  • 9. When to Use Which? The best approach by workload and data type (Processing as a Function of Schema Requirements by Data Type)
      • Stable schema (financial analysis, ad hoc/OLAP, enterprise-wide BI and reporting, spatial/temporal, active execution)
          - Low cost storage & retention: Teradata / Hadoop
          - Loading and refining, data pre-processing, prep, cleansing: Teradata
          - Loading and refining, transformations: Teradata
          - Reporting: Teradata
          - Analytics (user-driven, interactive): Teradata (SQL analytics)
      • Evolving schema (interactive data discovery, Web clickstream, social feeds, set-top box analysis, CDRs, sensor logs, JSON)
          - Low cost storage & retention: Hadoop
          - Loading and refining, data pre-processing, prep, cleansing: Aster / Hadoop
          - Loading and refining, transformations: Aster
          - Reporting: Aster (joining with structured data)
          - Analytics (user-driven, interactive): Aster (SQL + MapReduce analytics)
      • Format, no schema (image processing, audio/video storage and refining, storage and batch transformations)
          - Low cost storage & retention: Hadoop
          - Loading and refining, data pre-processing, prep, cleansing: Hadoop
          - Loading and refining, transformations: Hadoop
          - Analytics (user-driven, interactive): Aster (MapReduce analytics)
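The slide 9 matrix can be read as a simple lookup from (data type, workload) to platform. The following is a minimal sketch of that reading, not anything shipped by Teradata: the dictionary keys, the `recommend` helper, and the cell assignments are assumptions reconstructed from the flattened slide text, and a workload with no platform listed on the slide returns an empty list.

```python
# Hypothetical encoding of the slide 9 matrix; names and keys are illustrative only.
PLATFORM_MATRIX = {
    "stable": {
        "storage": ["Teradata", "Hadoop"],
        "preprocessing": ["Teradata"],
        "transformations": ["Teradata"],
        "reporting": ["Teradata"],
        "analytics": ["Teradata"],
    },
    "evolving": {
        "storage": ["Hadoop"],
        "preprocessing": ["Aster", "Hadoop"],
        "transformations": ["Aster"],
        "reporting": ["Aster"],
        "analytics": ["Aster"],
    },
    "no_schema": {
        "storage": ["Hadoop"],
        "preprocessing": ["Hadoop"],
        "transformations": ["Hadoop"],
        # The slide text lists no reporting platform for this row.
        "analytics": ["Aster"],
    },
}


def recommend(schema_type, workload):
    """Return the platforms listed for this cell, or [] if none is listed."""
    return PLATFORM_MATRIX.get(schema_type, {}).get(workload, [])


if __name__ == "__main__":
    print(recommend("evolving", "preprocessing"))
    print(recommend("no_schema", "reporting"))
```

Keeping the matrix as data rather than branching logic makes it easy to adjust a cell if the original slide is read differently.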
  • 10. Aster Digital Marketing Client
      • Custom analytic tools
          - Segmentation: custom SQL-MR algorithms to match and create centralized identifiers
          - Sessionize by client
          - nPath identifies segment path analysis (behavior after ads)
      • Benefits
          - Marketing analysts more productive with Aster
          - Lower cost: storage and batch refining done on Hadoop (on AWS)
      [Diagram labels: Raw Web Logs; Ad Server Logs; Amazon Elastic MapReduce (storage, aggregations, cleansing); Archival; Cookie-level data; Teradata Aster; Data by Client; Media Data (Aggregated)]
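“Sessionize by client” on slide 10 refers to grouping each client’s click events into sessions. The sketch below illustrates the general idea in plain Python; it is not Aster’s SQL-MapReduce function. The `sessionize` name, the `(client_id, timestamp)` event shape, and the 30-minute inactivity gap are all assumptions chosen for illustration.

```python
from collections import defaultdict

# Assumed inactivity timeout in seconds; the slides do not specify one.
SESSION_GAP_SECONDS = 30 * 60


def sessionize(events, gap=SESSION_GAP_SECONDS):
    """Assign a per-client session number to (client_id, timestamp) events.

    A new session starts whenever the time since the client's previous
    event exceeds `gap` seconds. Timestamps are plain numbers (seconds).
    """
    by_client = defaultdict(list)
    for client_id, ts in events:
        by_client[client_id].append(ts)

    result = []
    for client_id, stamps in by_client.items():
        session = 0
        previous = None
        for ts in sorted(stamps):
            if previous is not None and ts - previous > gap:
                session += 1  # gap exceeded: start the next session
            result.append((client_id, ts, session))
            previous = ts
    return result


if __name__ == "__main__":
    clicks = [("a", 0), ("a", 100), ("a", 4000), ("b", 50)]
    for row in sessionize(clicks):
        print(row)
```

Once events carry a session number, path-style analysis (such as the behavior-after-ads question the slide mentions) can be run per session rather than per raw log line.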
  • 11. More Accurate Customer Churn Prevention
      • Hadoop captures, stores and transforms images and call records
      • Aster does path and sentiment analysis with multi-structured data
      [Diagram labels: Multi-Structured Raw Data (Social & Web data, Call Center Voice Records, Check Images); Hadoop: Capture, Retention & Transformation Layer; Call Data; Check Data; Aster Discovery Platform; Analysis + Marketing Automation (Customer Retention Campaign); Analytic Results; Dimensional Data; Traditional Data Flow: Data Sources, ETL Tools, Teradata Integrated DW]
  • 12. Bridging the Business Analyst Gap for Hadoop Data
  • 13. Aster SQL-H™: A Business User’s Bridge to Analyze Hadoop Data
      Aster SQL-H gives analysts and data scientists a better way to analyze data stored cheaply in Hadoop:
      • Allow standard ANSI SQL access to Hadoop data
      • Leverage existing BI tool investments
      • Enable 50+ prebuilt SQL-MapReduce apps and IDE
      • Improve self-sufficiency for analysts going against Hadoop
  • 14. The Big Data Architecture Today Has Gaps
      • Gap 1: Analysts
      • Gap 2: File system lacks optimizers, data locality, indexes
      [Diagram labels: Engineers, Data Scientists, Quants, Business Analysts; Java, C/C++, Pig, Python, R, SAS, SQL, Excel, BI, Visualization, etc.; MapReduce (Processing); Discovery Platform; Active Data Warehouse; Database and Analytic Processing Layer; Data Storage and Refining; Audio/Video, Images, Text, Web & Social, Machine Logs, CRM, SCM, ERP]
  • 15. Analyst’s Goal: Get Insights from Data in Hadoop
      [Diagram labels: Engineers, Data Scientists, Quants, Business Analysts; Custom Code and Development; Aster MapReduce Portfolio; Teradata Analytics Portfolio; MR, Pig, Hive; SQL & SQL-MapReduce; SQL; IT is the optimizer; Teradata Aster Discovery Platform; Teradata IDW]
  • 16. Analytics on Hadoop Data with Aster SQL-H
      [Diagram labels: Engineers, Data Scientists, Quants, Business Analysts; Aster MapReduce Portfolio; Aster MapReduce Portfolio; Teradata Analytics Portfolio; SQL-H; SQL & MapReduce; SQL & SQL-MapReduce; SQL; SQL; Teradata Aster Discovery Platform; Teradata IDW]
  • 17. Aster SQL-H Integration with Hadoop Catalog: A Business User’s Bridge to Analyzing Data in Hadoop
      •  Industry’s First Database Integration with Hadoop’s HCatalog
      •  Abstraction layer to easily and efficiently read structured & multi-structured data stored in HDFS
      •  Uses Hadoop Catalog (HCatalog) to perform data abstraction functions (e.g. automatically understands tables, data partitions)
      •  HDFS data presented to users as Aster tables
      •  Fully accessible within the Aster SQL and SQL-MapReduce processing engines, plus ODBC/JDBC & BI tools
      [Diagram labels: Aster SQL-H; Hadoop MapReduce; Hive; Pig; HCatalog; HDFS]
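The idea of a catalog-driven abstraction layer (table and partition metadata used to present raw files as rows) can be sketched in Python. This is a conceptual illustration only, not the SQL-H implementation or the HCatalog API: the `CATALOG` and `FILES` dictionaries, the `weblogs` table, its columns, the paths, and the `scan_table` function are hypothetical stand-ins for catalog metadata and HDFS files.

```python
import csv
import io

# Hypothetical catalog metadata (a stand-in for what a metadata service such
# as HCatalog records; this is NOT the HCatalog API).
CATALOG = {
    "weblogs": {
        "columns": ["cookie_id", "url", "status"],
        "partition_key": "dt",
        "partitions": {
            "2012-06-12": "/logs/dt=2012-06-12/part-0",
            "2012-06-13": "/logs/dt=2012-06-13/part-0",
        },
    }
}

# Hypothetical stand-in for files in HDFS: path -> tab-delimited contents.
FILES = {
    "/logs/dt=2012-06-12/part-0": "c1\t/home\t200\nc2\t/cart\t500\n",
    "/logs/dt=2012-06-13/part-0": "c1\t/offer\t200\n",
}


def scan_table(table, partition_filter=None, opened=None):
    """Yield the rows of `table` as dicts, reading only matching partitions.

    `opened`, if given, records which files were actually read, so the effect
    of partition pruning is visible to the caller.
    """
    meta = CATALOG[table]
    for value, path in sorted(meta["partitions"].items()):
        if partition_filter is not None and not partition_filter(value):
            continue  # partition pruning: skip files that cannot match
        if opened is not None:
            opened.append(path)
        reader = csv.reader(io.StringIO(FILES[path]), delimiter="\t")
        for fields in reader:
            row = dict(zip(meta["columns"], fields))
            row[meta["partition_key"]] = value  # expose partition as a column
            yield row


opened_paths = []
rows = list(scan_table("weblogs", lambda dt: dt == "2012-06-13", opened_paths))
```

Because column names and partition locations come from the catalog, the caller names only a table and a filter. With the filter above, only the file of the matching partition is opened.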
  • 18. Benefits of Aster SQL-H™
      Deep metadata layer integration between Aster and Hadoop
      Business Analysts (Powerful analytics & Performance)
      •  50+ advanced SQL-MapReduce functions (Aster MapReduce Portfolio)
         -  Analytics on Hadoop data no longer requires expensive talent and training
      •  Simplified, SQL-based interface with Hadoop data structures (HCatalog)
         -  No longer limited by Hive’s query language (HiveQL)
      •  Interoperability with existing ecosystem & skillset
         -  BI tools (MSTR, Tableau, Cognos), ETL tools, SQL analysts, existing apps
      Architects and Administrators (Maintainability)
      •  Leverage existing DBA skill-sets without additional overhead
      •  Simplify administration and monitoring
         -  Competitors require manual creation and maintenance of metadata
         -  Less work and fewer errors
         -  Can do filtering with Aster; select data from HCatalog, leverage partitioning
  • 19. Aster MapReduce Portfolio: the App Store of Big Data
      Some of the 50+ out-of-the-box analytical apps
      •  Path Analysis: Discover patterns in rows of sequential data
      •  Text Analysis: Derive patterns and extract features in textual data
      •  Statistical Analysis: High-performance processing of common statistical calculations
      •  Segmentation: Discover natural groupings of data points
      •  Marketing Analytics: Analyze customer interactions to optimize marketing decisions
      •  Data Transformation: Transform data for more advanced analysis
  • 20. Summary
      •  Mainstream organizations need a unified big data architecture
         -  Best-of-breed with Hadoop, Aster, Teradata
         -  Brings “Data Science” to business analysts
         -  50+ business-ready MapReduce analytics and apps
         -  Enabled by SQL-MapReduce framework and new SQL-H
      •  Learn more: asterdata.com/mapreduce
      •  Download: developer.teradata.com/aster
      •  Breakout Session: Thursday, 4:30 pm, “How does SQL-H work?”, Sushil Thomas, Teradata Aster