1
Data Search

      Searching and Finding information in
    Unstructured and Structured Data Sources


     Erik Fransen  ...
Agenda
    • Introduction;
    • Industry models;
    • Combining structured & unstructured data
      – “Pure Portal”
   ...
Profile
    • Erik Fransen
    • Background: Knowledge Engineering,
      Middlesex University;
    • Expertise areas:
   ...
Introduction




5
Combining BI with unstructured data
    •   Integrated access to relevant information (‘provide complete picture’);
    • ...
(un)structured data keeps growing….
                                                                                      ...
Industry Model:                        Text   Data

Bill Inmon’s DW 2.0™
    •   Hold data at the lowest detail;
    •   H...
Industry Model:
    Information Access Architecture (Gartner)




9
Industry Model:
     Enterprise Search Platform (Forrester)




10
Data Search Scenarios

     Searching and Finding information in
     Unstructured and Structured Data Sources




11
Global architecture
      Master &
      Meta Data

                                                                      ...
Three data search scenarios
      Master &
      Meta Data

                     Structure                                ...
Scenario 1: Pure Portal

     Many portlets, one user interface;
     Business user may manually combine content
     fro...
1: Pure Portal
      Master &
      Meta Data

                                                                           ...
Integrate news with BI information




                                   Source: Aruba


16
Structured BI info…




17
… and Photos, Files and Maps




18
Scenario 2: “Index it all”

     Enterprise Search from one user interface;
     Business user knows what to look for and ...
2: Index it all
      Master &
      Meta Data

                                                                          ...
Scenario 2: “Index it all”


Unstructured                                                 Search
                 Search i...
Example: IBM Cognos 8 Go! Search
                                Integration with enterprise
                             ...
Example: IBM OmniFind




23
Example: IBM OmniFind




24
SAP BusinessObjects Intelligent Search




25
SAP BusinessObjects Intelligent Search




26
Scenario 3: “Structure it all”

     Generate structure using document warehousing
     and text mining;
     Business use...
3: Structure it all
      Master &
      Meta Data

                     Structure                                        ...
Generating structure in document warehouse
                                    Retrieve                Preprocess         ...
Document warehouse

        • Contains complete documents or URLs
        • Metadata about documents:
         summaries, au...
BI reporting on dimensional model

                                                  Dim
        Dim                     D...
Generate structure using text mining tools




     Example taken from SPSS PASW Text Analytics, many other tools availabl...
Generating structure using UIMA
     • Unstructured Information Management Architecture
     • Originates from IBM, now Ap...
Example: Generating structure using UIMA


     • Analyzed by a collection of text analytics
     • Detected Semantic Enti...
Summary
     • Growing business need for combining BI with
       unstructured data;
     • Data Search bridges the gap be...

Data Search Searching And Finding Information In Unstructured And Structured Data Sources

IRM DWBI event in the UK 2009



  1. 1. 1
  2. 2. Data Search: Searching and Finding information in Unstructured and Structured Data Sources. Erik Fransen, Senior Business Consultant, Centennium BI expertisehuis, The Hague, The Netherlands, e.fransen@centennium.nl. IRM UK, DW/BI 2009, London, November 3, 11.00-12.00 P.M. 2
  3. 3. Agenda • Introduction; • Industry models; • Combining structured & unstructured data – “Pure Portal” – “Index it all” – “Structure it all” • Summary. 3
  4. 4. Profile • Erik Fransen • Background: Knowledge Engineering, Middlesex University; • Expertise areas: – Business Intelligence – Knowledge engineering – Knowledge & Content management – Data warehousing – Analytics • CBIP. 4
  5. 5. Introduction 5
  6. 6. Combining BI with unstructured data • Integrated access to relevant information (‘provide complete picture’); • Unstructured data like documents provide valuable context to numerical data; – Customer complaints – Competitor’s press releases – Marketing documents – … • Insurance fraud analysis (e.g. claim statistics and claim forms); • Competitive Intelligence (e.g. market share data and competitor news); • Customer retention (e.g. sales data and customer complaints); • Data Search acts as a bridge between structured and unstructured data. 6
  7. 7. (un)structured data keeps growing…. [Figure: growth of data volume (gigabytes) along a timeline of milestones: Cave paintings, Bone tools (40,000 BC); Writing (3500 BC); Paper (105); Printing (1450); Electricity, Telephone (1870); Transistor (1947); Computing (1950); Internet (DARPA) (late 1960s); The Web (1993). Further labels: SQL-70, Oracle-79, SQL-89, SQL-92, SQL-99, SQL-03; years 1999, 2000, 2001, 2005, 2009; “>80% Unstructured”. Source: Forrester] 7
  8. 8. Industry Model: Text Data Bill Inmon’s DW 2.0™ • Hold data at the lowest detail; • Hold data to infinity; • Have integrity of data and have online high-performance transaction processing; • Tightly couple metadata to the data warehouse environment; • … • Link structured data and unstructured data; 8
  9. 9. Industry Model: Information Access Architecture (Gartner) 9
  10. 10. Industry Model: Enterprise Search Platform (Forrester) 10
  11. 11. Data Search Scenarios Searching and Finding information in Unstructured and Structured Data Sources 11
  12. 12. Global architecture [Architecture diagram. Master & Meta Data. Structured: OLTP, Financial Apps, ODS, DWH, Data Marts, OLAP Cubes, Reports, Mining… Middleware. Portal. Unstructured: Content Man System, Fileservers, Database, Email, Intranet/internet, Search Index, Search, Text Mining, Visualisation.] 12
  13. 13. Three data search scenarios [Diagram: the global architecture of slide 12 with the three scenarios marked: “Structure it all”, “Index it all” and “Pure Portal”.] 13
  14. 14. Scenario 1: Pure Portal Many portlets, one user interface; Business user may manually combine content from several independent sources; Risk: too complex for user. 14
  15. 15. 1: Pure Portal [Diagram: the global architecture of slide 12 with the “Pure Portal” scenario marked.] 15
  16. 16. Integrate news with BI information Source: Aruba 16
  17. 17. Structured BI info… 17
  18. 18. … and Photos, Files and Maps 18
  19. 19. Scenario 2: “Index it all” Enterprise Search from one user interface; Business user knows what to look for and expects a “complete picture” as a result; Risk: Many irrelevant search results due to the nature of document indexing. 19
  20. 20. 2: Index it all [Diagram: the global architecture of slide 12 with the “Index it all” scenario marked.] 20
  21. 21. Scenario 2: “Index it all” [Diagram with boxes: Unstructured data sources, Search index, Search application, User interface, Structured data sources, Data warehouse Architecture, BI application, Reports.] BI report is indexed as if it were a document. 21
  22. 22. Example: IBM Cognos 8 Go! Search Integration with enterprise search applications (IBM OmniFind, Google OneBox for Enterprise, Yahoo, Autonomy). Search results return all relevant structured content (reports, analyses, etc.) and unstructured content (Word documents, PDFs, etc.) within a single interface. 22
  23. 23. Example: IBM OmniFind 23
  24. 24. Example: IBM OmniFind 24
  25. 25. SAP BusinessObjects Intelligent Search 25
  26. 26. SAP BusinessObjects Intelligent Search 26
  27. 27. Scenario 3: “Structure it all” Generate structure using document warehousing and text mining; Business user knows exactly what to look for; Risk: Limited flexibility for user. 27
  28. 28. 3: Structure it all [Diagram: the global architecture of slide 12 with the “Structure it all” scenario marked.] 28
  29. 29. Generating structure in document warehouse (Source: Dan Sullivan)
      • Identify Sources: sources are not fixed; iterative process, sources lead to new sources.
      • Retrieve Documents: internal sources retrieval (file servers, CMS/DMS); external sources using crawlers, spiders.
      • Preprocess Documents: format documents in a consistent manner; files must be in suitable form for text analysis.
      • Text Mining: linguistic analysis; key features are extracted; indexing documents; summarizing documents.
      • Compile Metadata: carefully attach metadata to document; used for querying, retrieval, matching, navigation support; store in document warehouse.
      [Diagram labels: Data warehouse Architecture, Document warehouse Architecture, Combine (meta)data.] 29
  30. 30. Document warehouse • Contains complete documents or URLs • Metadata about documents: summaries, authors’ names, publication dates, titles, sources, keywords, etc. • Translations of documents • Thematic clustering of similar documents • Topical or thematic indexes • Extracted key features (structure) • Dimensions and Facts, linked to documents, summaries etc. • Combine with the data warehouse [Diagram label: Document warehouse Architecture] 30
  31. 31. BI reporting on dimensional model [Diagram: fact tables Sales Facts and Call Facts with dimensions Action, Product, Customer, Competitor, Sales person, Time and Telco Term; areas labelled Data warehouse and Document warehouse.] 31
  32. 32. Generate structure using text mining tools Example taken from SPSS PASW Text Analytics; many other tools are available (IBM, SAS, Oracle, SAP BO, Microsoft, etc.). 32
  33. 33. Generating structure using UIMA • Unstructured Information Management Architecture • Originates from IBM, now Apache UIMA http://incubator.apache.org/uima/ Source: IBM UIMA is supported by all main BI vendors. 33
  34. 34. Example: Generating structure using UIMA • Analyzed by a collection of text analytics • Detected Semantic Entities and Relations Highlighted • Represented in UIMA Common Analysis Structure (CAS) 34
  35. 35. Summary • Growing business need for combining BI with unstructured data; • Data Search bridges the gap between both worlds – Scenario 1: “Pure Portal” – Scenario 2: “Index it all” – Scenario 3: “Structure it all” • Scenarios can be combined. Questions? 35
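
Slide 21 says that in the “Index it all” scenario a BI report is indexed as if it were a document. The following is a minimal sketch of that idea, not the mechanism of any product named in the slides: the item ids, the sample texts and the helper names `tokenize`, `build_index` and `search` are all hypothetical, and a real enterprise search engine would add ranking, security trimming and much more.

```python
import re
from collections import defaultdict

# Hypothetical sample content: two unstructured documents and one BI report
# whose rendered text is indexed in the same way as the documents.
ITEMS = {
    "doc:complaint-17": ("document", "Customer complaint about late delivery of product Alpha"),
    "doc:press-03": ("document", "Competitor press release announcing a new product line"),
    "report:sales-q3": ("bi_report", "Quarterly sales report: revenue by product and customer"),
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(items):
    """Build an inverted index mapping each token to the ids that contain it."""
    index = defaultdict(set)
    for item_id, (_kind, text) in items.items():
        for token in tokenize(text):
            index[token].add(item_id)
    return index


def search(index, query):
    """Return the ids of items that contain every query token, sorted by id."""
    tokens = tokenize(query)
    if not tokens:
        return []
    hits = set.intersection(*(index.get(t, set()) for t in tokens))
    return sorted(hits)


INDEX = build_index(ITEMS)
```

With this toy data, `search(INDEX, "sales report")` finds only the BI report, while the broad query `search(INDEX, "product")` returns all three items, which hints at the risk named on slide 19: plain document indexing can return many results that are not relevant to the question.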

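
Slides 29 to 31 describe the “Structure it all” scenario: key features are extracted from documents, stored with metadata in a document warehouse, and combined with the data warehouse. The sketch below illustrates that flow under loud assumptions: the documents, the sales figures, the term dictionary and the function names are hypothetical, and a simple dictionary lookup stands in for real text mining (linguistic analysis, UIMA annotators and so on).

```python
import re
from collections import Counter

# Hypothetical documents with metadata, as a document warehouse might hold them.
DOCUMENTS = [
    {"doc_id": 1, "customer": "Acme", "text": "Complaint: dropped calls and a billing error."},
    {"doc_id": 2, "customer": "Acme", "text": "Second complaint about billing."},
    {"doc_id": 3, "customer": "Globex", "text": "Question about roaming rates."},
]

# Hypothetical structured facts from the data warehouse.
SALES_FACTS = [
    {"customer": "Acme", "revenue": 1200.0},
    {"customer": "Globex", "revenue": 800.0},
]

# Hypothetical term dictionary; it stands in for a real text mining step.
TERMS = {"billing", "roaming", "complaint"}


def extract_term_facts(documents, terms):
    """Turn each (document, dictionary term) occurrence into a fact row."""
    rows = []
    for doc in documents:
        tokens = Counter(re.findall(r"[a-z]+", doc["text"].lower()))
        for term in sorted(terms):
            if tokens[term]:
                rows.append({
                    "doc_id": doc["doc_id"],
                    "customer": doc["customer"],
                    "term": term,
                    "mentions": tokens[term],
                })
    return rows


def mentions_per_customer(term_facts, term):
    """Sum the mentions of one term per customer."""
    counts = Counter()
    for row in term_facts:
        if row["term"] == term:
            counts[row["customer"]] += row["mentions"]
    return counts


def combine(sales_facts, term_facts, term):
    """Join revenue with term mentions over the shared customer dimension."""
    counts = mentions_per_customer(term_facts, term)
    return [
        {"customer": s["customer"], "revenue": s["revenue"], "mentions": counts[s["customer"]]}
        for s in sales_facts
    ]


TERM_FACTS = extract_term_facts(DOCUMENTS, TERMS)
REPORT = combine(SALES_FACTS, TERM_FACTS, "complaint")
```

`REPORT` holds one row per customer with revenue next to the number of complaint mentions. The design choice mirrors the slides' trade-off: the user gets a precise, reportable structure, but only for the terms and dimensions that were modelled up front.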