INSIGHT THAT MATTERS



COMPLETING THE BIG DATA PICTURE:
  UNDERSTANDING WHY
   AND NOT JUST WHAT
     Sid Probstein, Chief Technology Officer, sid@attivio.com
Big Data vs. Extreme Information




     Source: 'Big Data' Is Only the Beginning of Extreme Information Management, April 7, 2011, Gartner Group
Completing the Big Data Picture




Structured Data
  • Stored and/or sourced from relational databases
  • “Normalized” so that each piece of data is stored once
  • Organized in tables that are related to each other

Semi-Structured Data
  • Also known as Unstructured Data or Non-Relational Data
  • Contains tags or other markers to readily parse data fields, etc.
  • Clickstream data, web logs, etc.

Unstructured Content
  • Any type of free-form text information
  • Documents (500+ formats); scanned documents; email; web content; SharePoint; knowledge bases; etc.
Unstructured Content – Valuable Opportunity
57% of data and IT managers surveyed say unstructured content is extremely or very important to their businesses…
  • Extremely Important 18%
  • Very Important 39%
  • Somewhat Important 30%
  • Not Important at this time 8%
  • Don’t know/Not sure 6%

…yet 52% also say more resources are committed to structured data.
  • More resources for Structured Data 52%
  • Equal resources for Unstructured & Structured 14%
  • More resources for Unstructured Content 13%
  • Don’t know/Not sure 20%
  • Other 1%

Source: Unisphere Research (June 2011)
Unified Information Access (UIA)

• Query any information – structured or unstructured –
  with the precision of SQL and the fuzziness of search
• Build applications and rapidly create value by avoiding
  the typical risks presented by information silos
• Use text analytics, language modeling and machine
  learning components to enrich and link information
  together across silos
• Allow users to consume information the way they want
  to consume it, with search or Business Intelligence (BI)


       “[Attivio is] on the forefront of a shift away from
       reliance on relational databases…” --Nick Patience
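To make “the precision of SQL and the fuzziness of search” concrete, here is a minimal plain-Java sketch that applies an exact, SQL-style predicate to a structured field and a fuzzy (Levenshtein edit-distance) match to free text in the same query. The `Ticket` record, its fields, the sample data and the two-edit threshold are hypothetical; this is not Attivio's query API or matching algorithm.

```java
import java.util.ArrayList;
import java.util.List;

class HybridQuery {
    // Hypothetical record mixing a structured field (region) with free text (note).
    record Ticket(int id, String region, String note) {}

    // Classic dynamic-programming Levenshtein edit distance, two rows at a time.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.length()];
    }

    // Fuzzy side: true if any word of the text is within maxEdits of the term.
    static boolean fuzzyContains(String text, String term, int maxEdits) {
        for (String word : text.toLowerCase().split("\\W+")) {
            if (editDistance(word, term.toLowerCase()) <= maxEdits) return true;
        }
        return false;
    }

    // Precise side: region must match exactly, like WHERE region = ?
    static List<Integer> query(List<Ticket> tickets, String region, String term) {
        List<Integer> ids = new ArrayList<>();
        for (Ticket t : tickets) {
            if (t.region().equals(region) && fuzzyContains(t.note(), term, 2)) ids.add(t.id());
        }
        return ids;
    }

    public static void main(String[] args) {
        List<Ticket> tickets = List.of(
            new Ticket(1, "EMEA", "Customer reports a sevrice outage"),
            new Ticket(2, "EMEA", "Billing question"),
            new Ticket(3, "APAC", "Service outage in Sydney"));
        System.out.println(query(tickets, "EMEA", "service")); // [1]
    }
}
```

Ticket 1 matches despite the misspelling “sevrice”, ticket 2 fails the fuzzy text test, and ticket 3 fails the exact region filter.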
Unified Information Access – Enabling Variety




Enrich unstructured content and link it to structured data to find “WHAT” and “WHY”
Unified Information Access – Conceptual View


John Smith <jsmith@customer.com>
New engagement
I am delighted that we were able to move forward … your service desk has been wonderful and helped resolve…

Analyze & enrich unstructured data | Retain & respect normalized structure
SEARCH & DISCOVERY | UIA APPLICATIONS | PACKAGED APPLICATIONS | ACTIVE DASHBOARDS | AD HOC QUERY TOOLS | BI & REPORTING

ATTIVIO ACTIVE INTELLIGENCE ENGINE (AIE) 3.0

WEB SERVER | FILE SERVER | EMAIL SERVER | CONTENT MGMT | HADOOP++ | CRM, ERP | ADBMS/EDW
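The conceptual view pairs an email from John Smith <jsmith@customer.com> with structured customer data. Below is a minimal sketch of that link, assuming the sender address is the join key into a CRM table. `Account`, `Email`, the “Example Corp” row and the open-deal count are hypothetical and not part of AIE.

```java
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LinkEmailToAccount {
    // Hypothetical structured CRM row.
    record Account(String email, String company, int openDeals) {}

    // Hypothetical unstructured message.
    record Email(String from, String subject, String body) {}

    // Pulls the address out of a header such as "John Smith <jsmith@customer.com>".
    private static final Pattern ADDRESS = Pattern.compile("<([^>]+)>");

    static Optional<String> senderAddress(Email e) {
        Matcher m = ADDRESS.matcher(e.from());
        return m.find() ? Optional.of(m.group(1).toLowerCase()) : Optional.empty();
    }

    // The sender address acts as the join key into the CRM table; empty if no match.
    static Optional<Account> link(Email e, Map<String, Account> crmByEmail) {
        return senderAddress(e).map(crmByEmail::get);
    }

    public static void main(String[] args) {
        Map<String, Account> crm = Map.of(
            "jsmith@customer.com", new Account("jsmith@customer.com", "Example Corp", 3));
        Email email = new Email("John Smith <jsmith@customer.com>", "New engagement",
            "I am delighted that we were able to move forward");
        // The CRM row supplies the structured facts; the linked email body adds the context behind them.
        System.out.println(link(email, crm).map(Account::company).orElse("unlinked")); // Example Corp
    }
}
```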
SEARCH & DISCOVERY | UIA APPLICATIONS | PACKAGED APPLICATIONS | ACTIVE DASHBOARDS | AD HOC QUERY TOOLS | BI & REPORTING

SEARCH API (JAVA, WSDL, REST)                              ANSI-92 SQL (ODBC, JDBC)

QUERY & RESPONSE WORKFLOWS
PREDICTIVE AUTOCOMPLETE, FUZZY MATCHING, FACET FINDER™, ACTIVE SECURITY, BEHAVIORAL ANALYTICS*, CONTENT SPOTLIGHTING, ALERTS

UNIVERSAL ENGINE
INCREMENTAL REAL-TIME INDEXING, QUERY RESOLUTION, JOIN/GRAPH, RELEVANCY, CONTENT STORE

INGESTION & TRIGGER WORKFLOWS
LANGUAGE PROCESSING, TEXT EXTRACTION, TEXT ANALYTICS, DATA MINING, CLASSIFICATION*, ONTOLOGY*

CONTENT API (JAVA, WSDL, REST)                             CONNECTORS

WEB SERVER | FILE SERVER | EMAIL SERVER | CONTENT MGMT | HADOOP++ | CRM, ERP | ADBMS/EDW
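The INGESTION & TRIGGER WORKFLOWS layer passes each document through a sequence of processing stages before indexing. Below is a minimal sketch of that idea as an ordered chain of `UnaryOperator` stages over a field map. The two stages (naive tag stripping and a token count) and the field names `raw`, `text` and `tokenCount` are hypothetical stand-ins, far simpler than real text extraction or language processing.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

class IngestionWorkflow {
    // Stage 1 (hypothetical): naive markup stripping into a "text" field.
    static UnaryOperator<Map<String, String>> textExtraction() {
        return doc -> {
            Map<String, String> out = new HashMap<>(doc); // copy; never mutate the input
            String raw = doc.getOrDefault("raw", "");
            out.put("text", raw.replaceAll("<[^>]*>", " ").replaceAll("\\s+", " ").trim());
            return out;
        };
    }

    // Stage 2 (hypothetical): whitespace token count, stored as a string field.
    static UnaryOperator<Map<String, String>> tokenCount() {
        return doc -> {
            Map<String, String> out = new HashMap<>(doc);
            String text = doc.getOrDefault("text", "").trim();
            int n = text.isEmpty() ? 0 : text.split("\\s+").length;
            out.put("tokenCount", Integer.toString(n));
            return out;
        };
    }

    // Runs the stages in order, feeding each stage's output into the next.
    static Map<String, String> run(Map<String, String> doc, List<UnaryOperator<Map<String, String>>> stages) {
        Map<String, String> current = doc;
        for (UnaryOperator<Map<String, String>> stage : stages) current = stage.apply(current);
        return current;
    }

    public static void main(String[] args) {
        Map<String, String> result = run(
            Map.of("raw", "<p>Service desk has been <b>wonderful</b></p>"),
            List.of(textExtraction(), tokenCount()));
        System.out.println(result.get("text"));       // Service desk has been wonderful
        System.out.println(result.get("tokenCount")); // 5
    }
}
```

Keeping each stage a pure function of its input map makes stages easy to reorder, test in isolation, or reuse across workflows.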
AIE – Text Analytics
KEY PHRASES | AUTO-CLASSIFICATION | SENTIMENT ANALYSIS | ENTITY/CONCEPT EXTRACTION | ENTITY SENTIMENT
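Document-level SENTIMENT ANALYSIS and ENTITY SENTIMENT can disagree: a message can be neutral overall yet positive about one entity and negative about another. Below is a minimal lexicon-based sketch, assuming tiny hand-written word lists and naive sentence splitting. It is not the analytics AIE ships.

```java
import java.util.Set;

class LexiconSentiment {
    // Tiny hypothetical lexicons; real systems use trained models or far larger lexicons.
    static final Set<String> POSITIVE = Set.of("delighted", "wonderful", "great", "happy", "happiest");
    static final Set<String> NEGATIVE = Set.of("outage", "slow", "broken", "angry");

    // Document score = number of positive words minus number of negative words.
    static int score(String text) {
        int s = 0;
        for (String w : text.toLowerCase().split("\\W+")) {
            if (POSITIVE.contains(w)) s++;
            if (NEGATIVE.contains(w)) s--;
        }
        return s;
    }

    // Entity score: only sentences that mention the entity contribute.
    static int entityScore(String text, String entity) {
        int s = 0;
        for (String sentence : text.split("(?<=[.!?])\\s+")) {
            if (sentence.toLowerCase().contains(entity.toLowerCase())) s += score(sentence);
        }
        return s;
    }

    public static void main(String[] args) {
        String msg = "Your service desk has been wonderful. The billing portal is slow.";
        System.out.println(score(msg));                         // 0  (document level)
        System.out.println(entityScore(msg, "service desk"));   // 1  (positive entity)
        System.out.println(entityScore(msg, "billing portal")); // -1 (negative entity)
    }
}
```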
AIE – ANSI-92 SQL with ODBC, JDBC

• Use a wide array of existing BI products with AIE
• Easily integrate AIE with existing BI/DW infrastructure
• AIE ODBC 3.5 compliant driver included
AIE – Triples & Graphs

                                                      <triple id="1">
                                                       <entityId>P01</entityId>
                                                       <name>Joe</name>
                                                       <is>person</is>
                                                       ...
                                                      </triple>


All people who live in a college town:
JOIN(is:person, INNER(JOIN(is:city, INNER(is:college, on="name=locatedIn")), on="livesIn=name"))

All people who live in a college town with “happy students”:
JOIN(is:person, INNER(JOIN(is:city, INNER(JOIN(is:college, INNER(AND(table:news, NEAR(happiest, students)), ON="name=college")), ON="name=locatedIn")), ON="livesIn=name"))
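The JOIN queries above chain joins across entity types. Below is a minimal in-memory sketch of the first query (“all people who live in a college town”). It assumes each `INNER(..., on="x=y")` keeps the outer entities whose field `x` equals field `y` of at least one inner entity (a semi-join). That reading of the syntax is an assumption, and the sample entities and city names `Collegetown` and `Milltown` are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class TripleJoin {
    // Selects entities of one type, matching on the "is" attribute as in the <triple> example.
    static List<Map<String, String>> ofType(List<Map<String, String>> all, String type) {
        return all.stream().filter(e -> type.equals(e.get("is"))).collect(Collectors.toList());
    }

    // Keeps left entities with at least one right entity where left[leftKey] == right[rightKey].
    static List<Map<String, String>> innerJoin(List<Map<String, String>> left,
            List<Map<String, String>> right, String leftKey, String rightKey) {
        return left.stream()
            .filter(l -> l.get(leftKey) != null
                && right.stream().anyMatch(r -> l.get(leftKey).equals(r.get(rightKey))))
            .collect(Collectors.toList());
    }

    static List<String> peopleInCollegeTowns(List<Map<String, String>> all) {
        // JOIN(is:city, INNER(is:college, on="name=locatedIn"))
        List<Map<String, String>> collegeTowns =
            innerJoin(ofType(all, "city"), ofType(all, "college"), "name", "locatedIn");
        // JOIN(is:person, INNER(<college towns>, on="livesIn=name"))
        return innerJoin(ofType(all, "person"), collegeTowns, "livesIn", "name").stream()
            .map(p -> p.get("name"))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Map<String, String>> all = List.of(
            Map.of("is", "person", "name", "Joe", "livesIn", "Collegetown"),
            Map.of("is", "person", "name", "Ann", "livesIn", "Milltown"),
            Map.of("is", "city", "name", "Collegetown"),
            Map.of("is", "city", "name", "Milltown"),
            Map.of("is", "college", "name", "State College", "locatedIn", "Collegetown"));
        System.out.println(peopleInCollegeTowns(all)); // [Joe]
    }
}
```

The nested-loop semi-join is fine for a sketch; an engine would use indexes on the join fields instead of scanning.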
SEARCH & DISCOVERY | ACTIVE DASHBOARDS | PACKAGED APPLICATIONS | AD HOC QUERY TOOLS | BI & REPORTING

ATTIVIO ACTIVE INTELLIGENCE ENGINE (AIE) 3.0 | ADBMS: BIG DATA OR ANALYTIC PLATFORM

WEB SERVER | FILE SERVER | EMAIL SERVER | HADOOP++ | STRUCTURED DATA
ADBMS -> AIE




11/9/2011
ADBMS -> AIE




AIE -> ADBMS




AIE – Non-Collocated JOIN

• Unlimited scaling of JOIN capabilities
• No special planning required to JOIN across content/data spread across partitions

Table A, Table B -> Hash-based Partitioning of Ingested Documents and Records -> Node 1, Node 2 -> Cross-Node JOIN Coordination

Query:
JOIN(table:A, INNER(table:B), INNER(table:email), on="emailaddress")
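Below is a minimal sketch of a non-collocated JOIN. It assumes rows are hash-partitioned by record id rather than by the join key, so matching rows may sit on different nodes. Coordination is modeled as a two-phase broadcast semi-join on the email address. This is one common technique, not necessarily how AIE coordinates cross-node joins, and it reduces the slide's three-table query to two tables. `Row` and the sample addresses are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class PartitionedJoin {
    // Hypothetical ingested record: id, source table, and the join key.
    record Row(String id, String table, String email) {}

    // Hash-based partitioning on the record id, NOT on the join key, so rows
    // sharing an email address can land on different nodes (non-collocated).
    static List<List<Row>> partition(List<Row> rows, int nodeCount) {
        List<List<Row>> nodes = new ArrayList<>();
        for (int i = 0; i < nodeCount; i++) nodes.add(new ArrayList<>());
        for (Row r : rows) nodes.get(Math.floorMod(r.id().hashCode(), nodeCount)).add(r);
        return nodes;
    }

    // Cross-node coordination as a two-phase broadcast semi-join.
    static List<String> crossNodeJoin(List<List<Row>> nodes, String leftTable, String rightTable) {
        // Phase 1: each node reports the join keys it holds for the right-hand table.
        Set<String> rightKeys = new HashSet<>();
        for (List<Row> node : nodes) {
            for (Row r : node) {
                if (r.table().equals(rightTable)) rightKeys.add(r.email());
            }
        }
        // Phase 2: the combined key set is broadcast; each node probes its local left-hand rows.
        List<String> matches = new ArrayList<>();
        for (List<Row> node : nodes) {
            for (Row r : node) {
                if (r.table().equals(leftTable) && rightKeys.contains(r.email())) matches.add(r.id());
            }
        }
        Collections.sort(matches); // deterministic output regardless of partitioning
        return matches;
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(
            new Row("A1", "A", "a@example.com"),
            new Row("A2", "A", "b@example.com"),
            new Row("B1", "B", "a@example.com"));
        System.out.println(crossNodeJoin(partition(rows, 2), "A", "B")); // [A1]
    }
}
```

The result does not depend on how many nodes the rows are spread across, which is the property the slide's “no special planning required” claim is about.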
Hadoop & AIE – Complementary

  Hadoop is great for…

  • Rapidly collecting an extremely large volume of unprocessed information
  • Providing a flexible (if sometimes complicated) way to ask almost any
    question of information
  • Bringing information to data scientists
  • Batch processing where latency is not a concern

  AIE is great for…

  • Deep insight across structured and especially unstructured information
  • Handling the Variety of Extreme Information
  • Getting answers quickly
  • Providing simple ways of asking questions
  • Bringing information to end users using their desired method
  • Real-time / high-velocity analysis
SEARCH & DISCOVERY | UIA APPLICATIONS | PACKAGED APPLICATIONS | ACTIVE DASHBOARDS | AD HOC QUERY TOOLS | BI & REPORTING

ATTIVIO ACTIVE INTELLIGENCE ENGINE (AIE) 3.0 | HADOOP (HIVE, HDFS, HBASE)

FILE SERVER | EMAIL SERVER | CONTENT MGMT | ADBMS/EDW | WEB SERVER | MONITORED SYSTEM | SENSOR
AIE XT Module – Key Features

• Connectors to Big Data sources
   • Hadoop (Hive, HBase, HDFS)
   • Cloudera
   • Others coming soon…
• Data integration in the engine/workflow
   • Text analytics
   • Data cleansing & mining
   • Correlate at query time
• Universal information repository
   • Natively parallel, scales without excessive hardware costs
• ODBC/JDBC Connectivity Module
• Attivio Classification Engine
• Attivio Behavioral Analytics Module
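AIE & Hadoop – Find “MapReduce Tutorial”
Using MapReduce | Using AIE to Index

Using AIE Workflow:

// Attivio SDK imports omitted (package names are not shown on the slide).
public class SampleSimpleIngestTransformer extends AbstractSingleDocumentTransformer {

  // Phrase to look for; configurable through the setter below.
  private String value = "mapreduce tutorial";

  @Override
  public ProcessingResult processDocument(AttivioDocument doc) throws AttivioException {
    // Scan every value of every field in the document.
    for (Field<?> f : doc) {
      for (FieldValue<?> fv : f) {
        // Case-sensitive substring match; on a hit, return dropResult().
        if (fv.getValueAsString().contains(value)) {
          return dropResult();
        }
      }
    }
    // No field contained the phrase: okResult() lets the document continue.
    return okResult();
  }

  public String getValue() {
    return value;
  }

  public void setValue(String value) {
    this.value = value;
  }
}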
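The transformer above depends on Attivio SDK types. To show its control flow without the SDK, here is a hypothetical plain-Java stand-in in which a document is a map of field names to value lists and the result is `OK` or `DROP`. Unlike the sample, it matches case-insensitively, so a title such as “A MapReduce Tutorial” is also caught. That difference is a deliberate choice in this sketch, not the SDK's behavior.

```java
import java.util.List;
import java.util.Map;

class PhraseFilter {
    enum Result { OK, DROP }

    // Mirrors the transformer: DROP if any value of any field contains the phrase.
    static Result process(Map<String, List<String>> doc, String phrase) {
        String needle = phrase.toLowerCase();
        for (List<String> values : doc.values()) {
            for (String v : values) {
                // Case-insensitive, unlike the plain contains() in the SDK sample.
                if (v != null && v.toLowerCase().contains(needle)) return Result.DROP;
            }
        }
        return Result.OK;
    }

    public static void main(String[] args) {
        Map<String, List<String>> doc = Map.of(
            "title", List.of("A MapReduce Tutorial"),
            "body", List.of("Step-by-step introduction"));
        System.out.println(process(doc, "mapreduce tutorial")); // DROP
    }
}
```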
Case Study
Problem
  • Content aggregator needed to provide a faster, better customer experience to build business
  • Needed to replace Lucene implementation, which couldn't be adapted to meet requirements
  • Goals: reduce latency, serve more queries, improve relevancy of results, streamline white-label business

Why AIE?
  • Handles massive query volume, rapid updates and low latency better than competitors
  • Able to improve relevancy with information about past purchases, fuzzy search and language modeling
  • Workflow supports white-label strategy without writing more software

Results
  • Thumbplay can offer more content and handle more demand; customer experience improved
  • Operations simplified by reduced complexity
  • Rapid development reduces cost and time in serving revenue-generating partnerships

Decision Drivers
  • High query volume
  • Low latency
  • Rapid updates
  • Results relevancy tuning
  • Workflow
  • Rapid development & deployment
Case Study
Problem
  • Launch a major new online music service incorporating streaming music, local caching, internet radio, personalization and multiple subscription/service levels
  • Expect 2,000+ queries per second during beta, up to 5x that in production

Why AIE?
  • Handles massive query volume, rapid updates and low latency better than competitors
  • Able to improve relevancy using fuzzy name matching, artist aliases and transaction history
  • Workflow supports white-label strategy without writing more software

Results
  • iHeartRadio launched with no scalability or performance problems
  • Operations simplified by reduced complexity
  • Rapid development reduces cost and time in serving revenue-generating partnerships

Decision Drivers
  • High query volume
  • Low latency
  • Rapid updates
  • Relevancy based on sales history
  • Workflow
Case Study: DCS eMap




[Figure; axis: Time]
Case Study: DCS eMap
1. Review Saved Searches
   • Review Issues
   • Review MDi Reports
2. New or refined search
   • Include/exclude threads
   • Mark relevance
   • Review related thread
   • Refine search query
3. Who are custodians? What are their profiles?
   • Who’s active and who’s passive?
   • Where should we start looking?
4. What is all this about?
   • Show Tag Cloud
   • Show Facets
5. What are the specific and related conversations?
   • Who are the participants?
   • What other conversations did they participate in?
6. Who is talking to whom?
   • Who’s been dropped/added?
   • Annotate conversations
   • Create issues for further review
7. Review email, tweets, posts
   • Annotate items
   • Review attachments
   • Tag issues for further review
8. Compare threads across email, Facebook and Twitter
   • Annotate items
   • Review attachments
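Questions such as “Who is talking to whom?” and “Who’s been dropped/added?” reduce to simple operations over message headers. Below is a minimal sketch, assuming each message is reduced to a sender and a recipient list. `Message` and the sample names are hypothetical, and the source does not describe how eMap actually computes these views.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class WhoTalksToWhom {
    // Hypothetical message reduced to its routing headers.
    record Message(String from, List<String> to) {}

    // "Who is talking to whom?": count messages per directed sender -> recipient pair.
    static Map<String, Integer> edges(List<Message> messages) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Message m : messages) {
            for (String recipient : m.to()) {
                counts.merge(m.from() + " -> " + recipient, 1, Integer::sum);
            }
        }
        return counts;
    }

    // "Who's been added?": recipients of the later message missing from the earlier one.
    static Set<String> added(Message earlier, Message later) {
        Set<String> result = new TreeSet<>(later.to());
        result.removeAll(earlier.to());
        return result;
    }

    // "Who's been dropped?": the same comparison in the other direction.
    static Set<String> dropped(Message earlier, Message later) {
        return added(later, earlier);
    }

    public static void main(String[] args) {
        Message first = new Message("alice", List.of("bob", "carol"));
        Message followUp = new Message("alice", List.of("bob", "dave"));
        System.out.println(edges(List.of(first, followUp)));
        // {alice -> bob=2, alice -> carol=1, alice -> dave=1}
        System.out.println("added " + added(first, followUp) + ", dropped " + dropped(first, followUp));
        // added [dave], dropped [carol]
    }
}
```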
Case Study: Database Archiving

Problem
  • Leading database archiving suite reduces IT costs and demands on production systems through archiving and legacy application decommissioning
  • Needed an easy way to quickly find data across archives without a priori knowledge of archive structure

Why AIE?
  • Ease of integration, rapid development model and high degree of flexibility; grey-box platform
  • Discovery-oriented information access capabilities such as FacetFinder, fuzzy query operators, spelling correction, etc.
  • Support for multiple query types: keyword search and SQL
  • Strategic unified information access vision alignment
  • Eight months from project inception to GA

Results
  • High-speed access to archive information; no more concerns about ability to access data once archived
  • Powerful cross-archive query capabilities enhance legal discovery and data retention use cases

Decision Drivers
  • Addresses key gap in product line
  • Easy integration with existing architecture
  • Unified information access strategic direction
  • Multiple query modes – search and SQL
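Finding data across archives without knowing their structure in advance can be approached by indexing every column value of every row, whatever the schema. Below is a minimal inverted-index sketch. The locator format `archive/table#rowId`, the class `ArchiveIndex` and the sample archives are hypothetical.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class ArchiveIndex {
    // token -> sorted set of row locators such as "crm_2008/customers#1"
    private final Map<String, Set<String>> postings = new TreeMap<>();

    // Indexes every column value of a row; no schema knowledge is needed.
    void addRow(String archive, String table, String rowId, Map<String, String> columns) {
        String locator = archive + "/" + table + "#" + rowId;
        for (String value : columns.values()) {
            if (value == null) continue;
            for (String token : value.toLowerCase().split("\\W+")) {
                if (!token.isEmpty()) postings.computeIfAbsent(token, k -> new TreeSet<>()).add(locator);
            }
        }
    }

    // Returns every row, in any archive or table, containing the token.
    Set<String> search(String token) {
        return postings.getOrDefault(token.toLowerCase(), Set.of());
    }

    public static void main(String[] args) {
        ArchiveIndex index = new ArchiveIndex();
        // Two archives with unrelated schemas.
        index.addRow("crm_2008", "customers", "1", Map.of("name", "Acme Corp", "city", "Boston"));
        index.addRow("erp_2010", "invoices", "42", Map.of("customer", "ACME Corp", "amount", "1200"));
        System.out.println(index.search("acme")); // [crm_2008/customers#1, erp_2010/invoices#42]
    }
}
```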
Case Study: Financial Services Regulation
Problem
  • Leading financial firm needed to reduce costs and risk by expediting rule monitoring and policy updates
  • 700+ staff track 200+ global regulators, who publish in different formats (Word, web, PDF, etc.)
  • Needed to streamline collection/reporting of metrics and policy activity for oversight and audits

Why AIE?
  • Ability to harvest, analyze and link diverse information types
  • Provide immediate notification of new rules and interactive, role-based dashboards
  • Ability to both push and pull information, generate audit reports and issue alerts

Results
  • No more manual monitoring of regulators
  • Changes and drafts are detected and tracked on the dashboard
  • Users see a roll-up of the risk that matters to them
  • Workflow automates compliance processes

Decision Drivers
  • Information harvesting and analysis
  • Unified information
  • Workflow, alerts and triggers
  • Role-based Active Dashboard
  • SharePoint Integration
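Detecting changes and drafts in harvested regulator publications can rest on comparing a content digest per source between harvests. Below is a minimal sketch using SHA-256 from the Java standard library. `ChangeDetector`, the status names and the example URL are hypothetical, and a real pipeline might also need to ignore cosmetic differences such as page navigation.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;

class ChangeDetector {
    enum Status { NEW, CHANGED, UNCHANGED }

    // Last digest seen per source URL.
    private final Map<String, String> lastDigest = new HashMap<>();

    static String digest(String content) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            return Base64.getEncoder().encodeToString(md.digest(content.getBytes(StandardCharsets.UTF_8)));
        } catch (NoSuchAlgorithmException e) {
            // Every Java SE platform is required to provide SHA-256.
            throw new IllegalStateException(e);
        }
    }

    // Compares a freshly harvested document with the digest recorded at the previous harvest.
    Status observe(String sourceUrl, String content) {
        String d = digest(content);
        String previous = lastDigest.put(sourceUrl, d);
        if (previous == null) return Status.NEW;
        return previous.equals(d) ? Status.UNCHANGED : Status.CHANGED;
    }

    public static void main(String[] args) {
        ChangeDetector detector = new ChangeDetector();
        String url = "https://regulator.example/rules/7";
        System.out.println(detector.observe(url, "Draft rule, version 1")); // NEW
        System.out.println(detector.observe(url, "Draft rule, version 1")); // UNCHANGED
        System.out.println(detector.observe(url, "Draft rule, version 2")); // CHANGED
    }
}
```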
Case Study: IT Incident Management
Problem
  • Disruptions in application availability affected a large financial services company’s ability to meet its SLA
  • The company’s goal is to identify warning conditions so fixes can be applied before an incident
  • Key challenge: required information for issue resolution is scattered across more than 60 diverse sources

Why AIE?
  • Ease of use in retrieving and linking information across data and content sources
  • Query processing speed and scalability
  • Reporting and analysis of service metrics with all relevant data, pushed to users via role-based Active Dashboard and alerts

Results
  • Executive dashboards with comprehensive information and push-delivery ensure insight and rapid results
  • Reduced MTTR for 17,000 annual service events from 27 to 3 minutes

Decision Drivers
  • Unified information across content and data sources
  • Facets and JOINs for tracing and connecting related tickets, knowledge base docs and experts
  • Ability to push information
  • Powered by Active Dashboard
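The reported MTTR reduction can be put in aggregate terms with back-of-the-envelope arithmetic. This assumes, for illustration only, that every one of the 17,000 annual service events saw the full 27-to-3-minute reduction; the slide does not state a total.

```latex
\begin{aligned}
\text{time saved per year} &\approx 17{,}000 \times (27 - 3)\ \text{min} = 408{,}000\ \text{min} = 6{,}800\ \text{h} \\
\text{relative reduction in MTTR} &= \frac{27 - 3}{27} \approx 88.9\%
\end{aligned}
```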
Convergence Architecture Using AIE
(1) Important functions in legacy apps (Legacy App 1, Legacy App 2, Legacy App 3 … Legacy App N) are wrapped and generalized as API/Methods.
(2) Content/Data from legacy applications is loaded into a UIA engine (Attivio AIE) and normalized/rationalized using various information structures like an Ontology (Taxonomies, Ontologies, Lexicons).
(3) The new convergence application (Convergence App) consumes information from the UIA engine, and when necessary acts on it using the wrapped/generalized methods.
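Step (1) of the convergence architecture wraps important legacy functions behind generalized methods that the convergence app calls in step (3). Below is a minimal adapter-pattern sketch. `CustomerLookup`, both legacy classes, their method shapes and the sample data are hypothetical, and step (2), loading content into the UIA engine, is not modeled.

```java
import java.util.List;
import java.util.Map;

class Convergence {
    // Generalized method the convergence app calls, whichever legacy app implements it.
    interface CustomerLookup {
        String findCustomerName(String customerId); // null if unknown
    }

    // Hypothetical legacy app keyed by string ids.
    static class LegacyApp1 {
        private final Map<String, String> db = Map.of("C1", "Acme");
        String getCust(String id) { return db.get(id); }
    }

    // Hypothetical legacy app keyed by integers, returning a raw record array.
    static class LegacyApp2 {
        String[] lookupByNumber(int n) { return n == 2 ? new String[] {"2", "Globex"} : null; }
    }

    // (1) Adapters wrap each legacy call behind the common interface.
    static CustomerLookup wrap(LegacyApp1 app) {
        return app::getCust;
    }

    static CustomerLookup wrap(LegacyApp2 app) {
        return id -> {
            try {
                String[] rec = app.lookupByNumber(Integer.parseInt(id.substring(1))); // "C2" -> 2
                return rec == null ? null : rec[1];
            } catch (NumberFormatException | StringIndexOutOfBoundsException e) {
                return null; // id not in the shape this legacy app understands
            }
        };
    }

    // (3) The convergence app tries each wrapped method in turn.
    static String resolve(List<CustomerLookup> sources, String id) {
        for (CustomerLookup source : sources) {
            String name = source.findCustomerName(id);
            if (name != null) return name;
        }
        return null;
    }

    public static void main(String[] args) {
        List<CustomerLookup> sources = List.of(wrap(new LegacyApp1()), wrap(new LegacyApp2()));
        System.out.println(resolve(sources, "C1")); // Acme
        System.out.println(resolve(sources, "C2")); // Globex
    }
}
```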
QUESTIONS?




 FOR MORE INFORMATION PLEASE VISIT
WWW.ATTIVIO.COM OR STOP BY OUR BOOTH

All Grown Up: Maturation of Analytics in the Cloud
 

Hadoop World 2011: Completing the Big Data Picture: Understanding Why and Not Just What - Sid Probstein - Attivio

  • 1. INSIGHT THAT MATTERS COMPLETING THE BIG DATA PICTURE: UNDERSTANDING WHY AND NOT JUST WHAT Sid Probstein, Chief Technology Officer, sid@attivio.com
  • 3. Big Data vs. Extreme Information Source: 'Big Data' Is Only the Beginning of Extreme Information Management, April 7, 2011, Gartner Group
  • 4. Completing the Big Data Picture
    - Structured Data: stored and/or sourced from relational databases; “normalized” so that each piece of data is stored once; organized in tables that are related to each other
    - Unstructured Data: also known as non-relational data; contains tags or other markers to readily parse data fields, etc.; clickstream data, web logs, etc.
    - Unstructured Content: any type of free-form text information; documents (500+ formats), scanned documents, email, web content, SharePoint, knowledge bases, etc.
  • 5. Unstructured Content – Valuable Opportunity
    57% of data and IT managers surveyed say unstructured content is extremely or very important to their businesses…
    - Extremely important 18%; very important 39%; somewhat important 30%; not important at this time 8%; don’t know/not sure 6%
    …yet 52% also say more resources are committed to structured data.
    - More resources for structured data 52%; equal resources for unstructured and structured 14%; more resources for unstructured content 13%; don’t know/not sure 20%; other 1%
    Source: Unisphere Research (June 2011)
  • 6. Unified Information Access (UIA)
    - Query any information – structured or unstructured – with the precision of SQL and the fuzziness of search
    - Build applications and rapidly create value by avoiding the typical risks presented by information silos
    - Use text analytics, language modeling and machine learning components to enrich and link information together across silos
    - Allow users to consume information the way they want to consume it, with search or Business Intelligence (BI)
    “[Attivio is] on the forefront of a shift away from reliance on relational databases…” (Nick Patience)
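The first bullet pairs SQL precision with search fuzziness. The sketch below is a rough, self-contained illustration of that idea only. It is not AIE's query engine. It combines an exact filter on a structured field with a fuzzy term match on free text. The fuzzy part is a swapped-in Levenshtein edit-distance check. The `Doc` and `HybridQuery` classes and the `region` field are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical record: one structured field (region) plus free text.
final class Doc {
    final int id;
    final String region;
    final String text;

    Doc(int id, String region, String text) {
        this.id = id;
        this.region = region;
        this.text = text;
    }
}

final class HybridQuery {
    // Classic two-row dynamic-programming Levenshtein edit distance.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Exact predicate on the structured field (the "SQL" part) AND a fuzzy match
    // on the text (the "search" part): some word within maxEdits of the term.
    static List<Integer> query(List<Doc> docs, String region, String term, int maxEdits) {
        List<Integer> hits = new ArrayList<>();
        String needle = term.toLowerCase();
        for (Doc d : docs) {
            if (!d.region.equals(region)) continue;
            for (String word : d.text.toLowerCase().split("\\W+")) {
                if (editDistance(word, needle) <= maxEdits) {
                    hits.add(d.id);
                    break;
                }
            }
        }
        return hits;
    }
}
```

For example, a query for the misspelled term "servise" with `maxEdits = 1` still finds a document containing "service", but only within the requested region.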
  • 7. Unified Information Access – Enabling Variety: enrich unstructured content and link it to structured data to find “WHAT” and “WHY”
  • 8. Unified Information Access – Conceptual View
    Diagram: a structured record (1, John Smith <jsmith@customer.com>) linked to an email (8, “New engagement”: “I am delighted that we were able to move forward … your service desk has been wonderful and helped resolve…”)
    - Analyze & enrich unstructured data
    - Retain & respect normalized structure
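Here is a minimal sketch of the linking idea. It assumes the link key is the email address. The code extracts the first `<address>` from a raw message and looks it up in a table of customer records. `EmailLinker`, its methods, and the simplified address pattern are hypothetical. They do not reflect how AIE actually performs linking.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class EmailLinker {
    // Simplified "<local@domain>" pattern; real address parsing is more involved.
    private static final Pattern ADDRESS = Pattern.compile("<([^<>@\\s]+@[^<>\\s]+)>");

    // Structured side: customer id keyed by lower-cased email address.
    private final Map<String, Integer> customerIdByEmail = new HashMap<>();

    void addCustomer(int customerId, String email) {
        customerIdByEmail.put(email.toLowerCase(), customerId);
    }

    // Unstructured side: find the first <address> in the raw message text and
    // return the linked customer id, or empty if there is no address or no match.
    Optional<Integer> link(String rawMessage) {
        Matcher m = ADDRESS.matcher(rawMessage);
        if (!m.find()) return Optional.empty();
        return Optional.ofNullable(customerIdByEmail.get(m.group(1).toLowerCase()));
    }
}
```

Once the email is tied to the customer record, enrichment of the message text (for example, its sentiment) can be reported alongside that customer's structured data.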
  • 9. Applications: Search & Discovery, UIA Applications, Packaged Applications, Active Dashboards, Ad Hoc Query Tools, BI & Reporting
    Engine: Attivio Active Intelligence Engine (AIE) 3.0
    Sources: web server, file server, email server, content management, Hadoop++, CRM/ERP, ADBMS/EDW
  • 10. Applications: Search & Discovery, UIA Applications, Packaged Applications, Active Dashboards, Ad Hoc Query Tools, BI & Reporting
    - Search API; ANSI-92 SQL (ODBC, JDBC); Java, WSDL, REST
    - Query & response workflows: predictive autocomplete, fuzzy matching, Facet Finder™, Active Security, behavioral analytics*, content spotlighting, alerts
    - Universal engine: incremental real-time indexing, query resolution, JOIN/graph, relevancy, content store
    - Ingestion & trigger workflows: language processing, text extraction, text analytics, data mining, classification*, ontology*
    - Content API (Java, WSDL, REST); connectors
    - Sources: web server, file server, email server, content management, Hadoop++, CRM/ERP, ADBMS/EDW
  • 11. AIE – Text Analytics: key phrases, auto-classification, sentiment analysis, entity/concept extraction, entity sentiment
  • 12. AIE – ANSI-92 SQL with ODBC, JDBC
    - Use a wide array of existing BI products with AIE
    - Easily integrate AIE with existing BI/DW infrastructure
    - AIE ODBC 3.5-compliant driver included
  • 16. AIE – Triples & Graphs
    Example triple:
      <triple id="1">
        <entityId>P01</entityId>
        <name>Joe</name>
        <is>person</is>
        ...
      </triple>
    All people who live in a college town:
      JOIN(is:person, INNER(JOIN(is:city, INNER(is:college, on="name=locatedIn")), on="livesIn=name"))
    All people who live in a college town with “happy students”:
      JOIN(is:person, INNER(JOIN(is:city, INNER(JOIN(is:college, INNER(AND(table:news, NEAR(happiest, students)), ON="name=college")), ON="name=locatedIn")), ON="livesIn=name"))
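The sketch below evaluates the first query over an in-memory list of entities, for readers unfamiliar with the query syntax. It reads the query as follows: find the cities that some college is `locatedIn`, then find the people whose `livesIn` names one of those cities. The field names `is`, `name`, `livesIn`, and `locatedIn` come from the slide. The `TripleGraph` class and the plain-map representation are hypothetical. The sketch makes no claim about how AIE actually evaluates such queries.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class TripleGraph {
    // Each entity is a flat map of field -> value, e.g. {is=person, name=Joe, livesIn=...}.
    private final List<Map<String, String>> entities = new ArrayList<>();

    void add(Map<String, String> entity) {
        entities.add(entity);
    }

    private List<Map<String, String>> ofType(String type) {
        List<Map<String, String>> out = new ArrayList<>();
        for (Map<String, String> e : entities) {
            if (type.equals(e.get("is"))) out.add(e);
        }
        return out;
    }

    // Inner JOIN: cities whose name equals some college's locatedIn.
    // Outer JOIN: people whose livesIn equals one of those city names.
    List<String> peopleInCollegeTowns() {
        Set<String> collegeLocations = new HashSet<>();
        for (Map<String, String> college : ofType("college")) {
            collegeLocations.add(college.get("locatedIn"));
        }
        Set<String> collegeTowns = new HashSet<>();
        for (Map<String, String> city : ofType("city")) {
            if (collegeLocations.contains(city.get("name"))) collegeTowns.add(city.get("name"));
        }
        List<String> people = new ArrayList<>();
        for (Map<String, String> person : ofType("person")) {
            if (collegeTowns.contains(person.get("livesIn"))) people.add(person.get("name"));
        }
        return people;
    }
}
```

The second query on the slide adds one more nested level, restricting colleges to those mentioned near "happiest students" in news items, before the same two joins are applied.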
  • 17. Applications: Search & Discovery, Active Dashboards, Packaged Applications, Ad Hoc Query Tools, BI & Reporting
    Engines: Attivio Active Intelligence Engine (AIE) 3.0 alongside an ADBMS (big data or analytic platform)
    Sources: web server, file server, email server, Hadoop++, structured data
  • 21. Applications: Search & Discovery, UIA Applications, Packaged Applications, Active Dashboards, Ad Hoc Query Tools, BI & Reporting
    Engine: Attivio Active Intelligence Engine (AIE) 3.0
    Sources: web server, file server, email server, content management, Hadoop++, CRM/ERP, ADBMS/EDW
  • 22. AIE – Non-Collocated JOIN
    - Unlimited scaling of JOIN capabilities
    - No special planning required to JOIN across content/data spread across partitions
    Diagram: hash-based partitioning of ingested documents and records across Node 1 and Node 2 (Table A, Table B); query coordination; cross-node JOIN, e.g.
      JOIN(table:A, INNER(table:B), INNER(table:email), on="emailaddress")
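The slide does not say how the cross-node JOIN is executed. One common strategy is sketched here under that assumption. A coordinator gathers the left table's rows from every node into a hash table keyed by the join field. It then probes that table with the right table's rows. Rows are placed by hashing a document id rather than the join key, so matching rows need not be collocated. `PartitionedJoin`, `Row`, and the example tables are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class PartitionedJoin {
    // A row from either table: table name, join key (email), and a payload label.
    static final class Row {
        final String table;
        final String email;
        final String payload;

        Row(String table, String email, String payload) {
            this.table = table;
            this.email = email;
            this.payload = payload;
        }
    }

    private final List<List<Row>> nodes = new ArrayList<>();

    PartitionedJoin(int nodeCount) {
        for (int i = 0; i < nodeCount; i++) nodes.add(new ArrayList<>());
    }

    // Hash-based placement on the document id (not the join key), so rows that
    // share an email address may land on different nodes.
    void ingest(String docId, Row row) {
        nodes.get(Math.floorMod(docId.hashCode(), nodes.size())).add(row);
    }

    // Coordinator: build a hash table of `left` rows from all nodes keyed by email,
    // then probe it with `right` rows from all nodes; emits "leftPayload+rightPayload".
    List<String> join(String left, String right) {
        Map<String, List<Row>> leftByEmail = new HashMap<>();
        for (List<Row> node : nodes)
            for (Row r : node)
                if (r.table.equals(left))
                    leftByEmail.computeIfAbsent(r.email, k -> new ArrayList<>()).add(r);
        List<String> out = new ArrayList<>();
        for (List<Row> node : nodes)
            for (Row r : node)
                if (r.table.equals(right))
                    for (Row l : leftByEmail.getOrDefault(r.email, List.of()))
                        out.add(l.payload + "+" + r.payload);
        return out;
    }
}
```

Gathering everything to one coordinator is the simplest correct approach. A production engine would more likely ship only the smaller side of the join, or repartition both sides on the join key.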
  • 23. Hadoop & AIE – Complementary
    Hadoop is great for…
    - Rapidly collecting an extremely large volume of unprocessed information
    - Providing a flexible (if sometimes complicated) way to ask almost any question of information
    - Bringing information to data scientists
    - Batch processing where latency is not a concern
    AIE is great for…
    - Deep insight across structured and especially unstructured information
    - Handling the Variety of Extreme Information
    - Getting answers quickly
    - Providing simple ways of asking questions
    - Bringing information to end users using their desired method
    - Real-time / high-velocity analysis
  • 24. Applications: Search & Discovery, UIA Applications, Packaged Applications, Active Dashboards, Ad Hoc Query Tools, BI & Reporting
    Engine: Attivio Active Intelligence Engine (AIE) 3.0
    Sources: Hadoop (Hive, HDFS, HBase), file server, email server, content management, ADBMS/EDW, web server, monitored sensor system
  • 30. AIE XT Module – Key Features
    - Connectors to Big Data sources
      - Hadoop (Hive, HBase, HDFS)
      - Cloudera
      - Others coming soon…
    - Data integration in the engine/workflow
      - Text analytics
      - Data cleansing & mining
      - Correlate at query time
    - Universal information repository
    - Natively parallel; scales without excessive hardware costs
    - ODBC/JDBC Connectivity Module
    - Attivio Classification Engine
    - Attivio Behavioral Analytics Module
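  • 31. AIE & Hadoop – Find “Mapreduce Tutorial”
    Approaches compared: Using MapReduce | Using AIE to Index | Using AIE Workflow
    Code shown (an ingest-workflow transformer):

      // Attivio SDK imports omitted; package names are not shown on the slide.
      public class SampleSimpleIngestTransformer extends AbstractSingleDocumentTransformer {
        // Phrase to look for; configurable through the getter/setter below.
        private String value = "mapreduce tutorial";

        // Called for each document passing through the ingest workflow.
        @Override
        public ProcessingResult processDocument(AttivioDocument doc) throws AttivioException {
          // Check every value of every field (String.contains is case-sensitive).
          for (Field<?> f : doc) {
            for (FieldValue<?> fv : f) {
              if (fv.getValueAsString().contains(value)) {
                // A value contains the phrase: judging by its name, dropResult()
                // discards this document. To keep only matching documents instead,
                // swap dropResult() and okResult().
                return dropResult();
              }
            }
          }
          // No value contains the phrase: let the document continue.
          return okResult();
        }

        public String getValue() {
          return value;
        }

        public void setValue(String value) {
          this.value = value;
        }
      }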
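The transformer above depends on Attivio SDK types, so it cannot be run on its own. The dependency-free sketch below reproduces only its per-document check. A document is modeled, hypothetically, as a map from field name to values. The method returns true in the cases where the slide's code would return `dropResult()`; unlike the slide's code, it skips null values.

```java
import java.util.List;
import java.util.Map;

final class PhraseFilter {
    // Same default phrase as the slide's transformer.
    static final String DEFAULT_PHRASE = "mapreduce tutorial";

    // Hypothetical stand-in for a document: field name -> list of values.
    // True if any value of any field contains the phrase (case-sensitive).
    static boolean anyValueContains(Map<String, List<String>> doc, String phrase) {
        for (List<String> values : doc.values()) {
            for (String v : values) {
                if (v != null && v.contains(phrase)) return true;
            }
        }
        return false;
    }
}
```

As in the original, `String.contains` is case-sensitive, so a title reading "MapReduce Tutorial" does not match the default phrase. Lower-casing both sides before the comparison would be one way to relax that.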
  • 32. Case Study
    Problem:
    - Content aggregator needed to provide a faster, better customer experience to build its business
    - Needed to replace its Lucene implementation, which couldn't be adapted to meet requirements
    - Goals: reduce latency, serve more queries, improve relevancy of results, streamline white-label business
    Why AIE?
    - Handles massive query volume, rapid updates and low latency better than competitors
    - Able to improve relevancy with information about past purchases, fuzzy search and language modeling
    - Workflow supports white-label strategy without writing more software
    Results:
    - Thumbplay can offer more content and handle more demand; customer experience improved
    - Operations simplified by reduced complexity
    - Rapid development reduces cost and time in serving revenue-generating partnerships
    Decision drivers: high query volume; low latency; rapid updates; results relevancy tuning; workflow; rapid development & deployment
  • 33. Case Study
    Problem:
    - Launch a major new online music service incorporating streaming music, local caching, internet radio, personalization and multiple subscription/service levels
    - Expect 2,000+ queries per second during beta, up to 5x that in production
    Why AIE?
    - Handles massive query volume, rapid updates and low latency better than competitors
    - Able to improve relevancy using fuzzy name matching, artist aliases and transaction history
    - Workflow supports white-label strategy without writing more software
    Results:
    - iHeartRadio launched with no scalability or performance problems
    - Operations simplified by reduced complexity
    - Rapid development reduces cost and time in serving revenue-generating partnerships
    Decision drivers: high query volume; low latency; rapid updates; relevancy based on sales history; workflow
  • 34. Case Study: DCS eMap (figure; axis label: Time)
  • 35. Case Study: DCS eMap
    Steps 1-4:
    1. Review saved searches
    2. New or refined search
    3. Who are the custodians? What are their profiles?
    4. What is all this about?
    Actions at these steps: review issues; include/exclude threads; show tag cloud; who’s active and who’s passive?; review MDi reports; mark relevance; show facets; review related threads; where should we start looking?; refine search query
    Steps 5-8:
    5. What are the specific and related conversations?
    6. Who is talking to whom?
    7. Review email, tweets, posts
    8. Compare threads across email, Facebook and Twitter
    Actions at these steps: who’s been dropped/added?; who are the participants?; what other conversations did they participate in?; annotate items; annotate conversations; review attachments; create issues for further review; tag issues for further review
  • 36. Case Study: Database Archiving
    Problem:
    - Leading database archiving suite reduces IT costs and demands on production systems through archiving and legacy application decommissioning
    - Needed an easy way to quickly find data across archives without a priori knowledge of archive structure
    Why AIE?
    - Ease of integration, rapid development model and high degree of flexibility; grey-box platform
    - Discovery-oriented information access capabilities such as FacetFinder, fuzzy query operators, spelling correction, etc.
    - Support for multiple query types: keyword search and SQL
    - Strategic unified information access vision alignment
    - Eight months from project inception to GA
    Results:
    - High-speed access to archive information; no more concerns about ability to access data once archived
    - Powerful cross-archive query capabilities enhance legal discovery and data retention use cases
    Decision drivers: addresses key gap in product line; easy integration with existing architecture; unified information access strategic direction; multiple query modes (search and SQL)
  • 37. Case Study: Financial Services Regulation
    Problem:
    - Leading financial firm needed to reduce costs and risk by expediting rule monitoring and policy updates
    - 700+ staff track 200+ global regulators, who publish in different formats (Word, web, PDF, etc.)
    - Needed to streamline collection/reporting of metrics and policy activity for oversight and audits
    Why AIE?
    - Ability to harvest, analyze and link diverse information types
    - Provide immediate notification of new rules and interactive, role-based dashboards
    - Ability to both push and pull information, generate audit reports and issue alerts
    Results:
    - No more manual monitoring of regulators
    - Changes and drafts are detected and tracked on the dashboard
    - Users see a roll-up of the risk that matters to them
    - Workflow automates compliance processes
    Decision drivers: information harvesting and analysis; unified information; workflow, alerts and triggers; role-based Active Dashboard; SharePoint integration
  • 38. Case Study: IT Incident Management
    Problem
    • Disruptions in application availability affected a large financial services firm's ability to meet SLAs
    • The company's goal is to identify warning conditions so fixes can be applied before an incident
    • Key challenge: required information for issue resolution is scattered across more than 60 diverse sources
    Why AIE?
    • Ease of use in retrieving and linking information across data and content sources
    • Query processing speed and scalability
    • Reporting and analysis of service metrics with all relevant data, pushed to users via role-based Active Dashboard and alerts
    Results
    • Executive dashboards with comprehensive information and push-delivery ensure insight and rapid results
    • Reduced MTTR for 17,000 annual service events from 27 to 3 minutes
    Decision Drivers
    • Unified information across content and data sources
    • Facets and JOINs for tracing and connecting related tickets, knowledge base docs and experts
    • Ability to push information
    • Powered by Active Dashboard
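Facets and JOINs appear as a decision driver for connecting related tickets, knowledge base docs and experts. A minimal in-memory sketch of both ideas follows, assuming (hypothetically) that every source shares a `component` field; the records and field names are invented for illustration and are not the firm's data or AIE's query syntax.

```python
from collections import Counter, defaultdict


def facet_counts(records, field_name):
    """Count records per distinct value of field_name, most frequent first (a simple facet)."""
    return Counter(r[field_name] for r in records).most_common()


def join_on(left, right, key):
    """Inner equi-join of two lists of dicts on `key`, returning (left_row, right_row) pairs."""
    index = defaultdict(list)
    for row in right:
        index[row[key]].append(row)
    return [(lrow, rrow) for lrow in left for rrow in index.get(lrow[key], [])]


# Hypothetical sample records from three different sources.
tickets = [
    {"id": "INC-1", "component": "payments", "status": "open"},
    {"id": "INC-2", "component": "payments", "status": "resolved"},
    {"id": "INC-3", "component": "login", "status": "open"},
]
kb_docs = [
    {"component": "payments", "title": "Restarting the payment gateway"},
    {"component": "login", "title": "Resetting SSO tokens"},
]
experts = [{"component": "payments", "name": "payments on-call engineer"}]

# Facet: how many tickets per component?
print(facet_counts(tickets, "component"))  # [('payments', 2), ('login', 1)]

# JOIN: attach knowledge base docs to each open ticket.
open_tickets = [t for t in tickets if t["status"] == "open"]
for ticket, doc in join_on(open_tickets, kb_docs, "component"):
    print(ticket["id"], "->", doc["title"])
```

Returning pairs rather than merged dicts keeps fields with the same name in different sources (for example an `id` on both sides) from silently overwriting each other.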
  • 39. Convergence Architecture Using AIE
    (1) Important functions in legacy applications are wrapped and generalized.
    (2) Content from legacy apps is loaded into a UIA engine and normalized/rationalized using various information structures like an Ontology.
    (3) The new convergence application consumes information from the UIA engine and, when necessary, acts on it using the wrapped/generalized methods.
    [Diagram components: Convergence App; Attivio AIE with Taxonomies, Ontologies, Lexicons; API/Methods and Content/Data layers; Legacy App 1, Legacy App 2, Legacy App 3 … Legacy App N]
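The three numbered steps can be illustrated with a small sketch. Everything in it is hypothetical: the legacy functions, the lexicon entries and the action registry stand in for whatever the real applications and ontology would provide, and AIE's actual API is not shown on the slide.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


# (1) Hypothetical legacy functions with incompatible signatures...
def crm_close_case(case_no: int) -> bool:
    return True


def erp_cancel_order(order_ref: str, reason: str) -> str:
    return f"CANCELLED {order_ref}: {reason}"


# ...wrapped behind one generalized signature: (record_id) -> status string.
GENERALIZED_ACTIONS: Dict[Tuple[str, str], Callable[[str], str]] = {
    ("crm", "close"): lambda rid: "ok" if crm_close_case(int(rid)) else "failed",
    ("erp", "close"): lambda rid: erp_cancel_order(rid, reason="closed via convergence app"),
}

# (2) Hypothetical lexicon mapping source-specific field names and values to shared concepts.
FIELD_LEXICON = {"cust_nm": "customer", "client": "customer", "stat": "status", "state": "status"}
VALUE_LEXICON = {"OPN": "open", "Open": "open", "CLSD": "closed"}


@dataclass
class Record:
    source: str
    record_id: str
    fields: dict


def normalize(source: str, record_id: str, raw: dict) -> Record:
    """Rename fields and values to canonical terms before loading into the shared index."""
    fields = {FIELD_LEXICON.get(k, k): VALUE_LEXICON.get(v, v) for k, v in raw.items()}
    return Record(source, record_id, fields)


# (3) The convergence app queries normalized records and acts via the wrapped methods.
def close_open_records(records, customer):
    results = []
    for rec in records:
        if rec.fields.get("customer") == customer and rec.fields.get("status") == "open":
            action = GENERALIZED_ACTIONS[(rec.source, "close")]
            results.append((rec.source, rec.record_id, action(rec.record_id)))
    return results


records = [
    normalize("crm", "42", {"cust_nm": "Acme", "stat": "OPN"}),
    normalize("erp", "PO-7", {"client": "Acme", "state": "Open"}),
    normalize("erp", "PO-8", {"client": "Acme", "state": "CLSD"}),
]
print(close_open_records(records, "Acme"))
```

The design point is that the convergence app only ever sees canonical field names and one action signature; adding Legacy App N means adding lexicon entries and one wrapper, not changing the app.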
  • 40. ENTER FOR A CHANCE TO WIN A 50” LG HDTV LEAVE YOUR BUSINESS CARD TO ENTER It’s That Easy to WIN!!! SHOW EXHIBITORS AND THEIR EMPLOYEES ARE NOT ELIGIBLE TO PARTICIPATE IN THE DRAWING. TV WILL BE SHIPPED TO WINNER AFTER THE SHOW, WITHIN THE US ONLY.
  • 41. QUESTIONS? FOR MORE INFORMATION PLEASE VISIT WWW.ATTIVIO.COM OR STOP BY OUR BOOTH

Editor's Notes

  1. Volume: machine-generated transactions (sensors, log files, click streams, automated trades)
     Variety: includes other sources of all kinds of data, which may have very high volume (such as email or IM)
     Velocity: batch processing is not always sufficient, especially for “point of sale” analytics
     Complexity: variable data types, unstructured content, extracting meaning by referencing other transactions