Data Search Searching And Finding Information In Unstructured And Structured Data Sources
 

IRM DWBI event in the UK 2009

    Presentation Transcript

    • Data Search: Searching and Finding Information in Unstructured and Structured Data Sources
      • Erik Fransen, Senior Business Consultant
      • Centennium BI expertisehuis, The Hague, The Netherlands
      • e.fransen@centennium.nl
      • IRM UK, DW/BI 2009, London, November 3, 11.00-12.00 P.M.
    • Agenda
      • Introduction
      • Industry models
      • Combining structured & unstructured data
        – “Pure Portal”
        – “Index it all”
        – “Structure it all”
      • Summary
    • Profile
      • Erik Fransen
      • Background: Knowledge Engineering, Middlesex University
      • Expertise areas:
        – Business Intelligence
        – Knowledge engineering
        – Knowledge & Content management
        – Data warehousing
        – Analytics
      • CBIP
    • Introduction
    • Combining BI with unstructured data
      • Integrated access to relevant information (‘provide complete picture’)
      • Unstructured data such as documents provides valuable context for numerical data:
        – Customer complaints
        – Competitors’ press releases
        – Marketing documents
        – …
      • Insurance fraud analysis (e.g. claim statistics and claim forms)
      • Competitive Intelligence (e.g. market share data and competitor news)
      • Customer retention (e.g. sales data and customer complaints)
      • Data Search acts as a bridge between structured and unstructured data
    • (Un)structured data keeps growing…
      [Figure: timeline of data growth in gigabytes. Milestones: cave paintings, bone tools (40,000 BC); writing (3500 BC); paper (105); printing (1450); electricity, telephone (1870); transistor (1947); computing (1950); Internet (DARPA) (late 1960s); the Web (1993); 1999; 2000; 2001; 2005; 2009. Database labels: SQL-70, Oracle-79, SQL-89, SQL-92, SQL-99, SQL-03. Annotation: >80% unstructured. Source: Forrester]
    • Industry Model: Text Data (Bill Inmon’s DW 2.0™)
      • Hold data at the lowest detail
      • Hold data to infinity
      • Have integrity of data and have online high-performance transaction processing
      • Tightly couple metadata to the data warehouse environment
      • …
      • Link structured data and unstructured data
    • Industry Model: Information Access Architecture (Gartner)
    • Industry Model: Enterprise Search Platform (Forrester)
    • Data Search Scenarios: Searching and Finding Information in Unstructured and Structured Data Sources
    • Global architecture
      [Diagram labels: Master & Meta Data. Structured: OLTP, Financial Apps, ODS, Middleware, DWH, Data Marts, OLAP Cubes, Reports, Data Mining. Portal. Unstructured: Content Management System, Fileservers, Database, Email, Intranet/internet, Search Index, Search, Text Mining, Visualisation]
    • Three data search scenarios
      [Diagram: the global architecture with three scenarios marked: “Structure it all”, “Index it all” and “Pure Portal”]
    • Scenario 1: “Pure Portal”
      • Many portlets, one user interface
      • Business user manually combines content from several independent sources
      • Risk: too complex for user
    • 1: Pure Portal
      [Diagram: the global architecture with the “Pure Portal” scenario marked]
    • Integrate news with BI information (Source: Aruba)
    • Structured BI info…
    • … and Photos, Files and Maps
    • Scenario 2: “Index it all”
      • Enterprise Search from one user interface
      • Business user knows what to look for and expects a “complete picture” as a result
      • Risk: many irrelevant search results due to the nature of document indexing
    • 2: Index it all
      [Diagram: the global architecture with the “Index it all” scenario marked]
    • Scenario 2: “Index it all”
      [Diagram labels: Unstructured data sources, Search index, Search application, User interface; Structured data sources, Data warehouse architecture, BI application, Reports]
      • A BI report is indexed as if it were a document
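
The “index it all” idea can be illustrated with a minimal sketch, assuming each BI report is reduced to a piece of text (for example its title or description) and placed in the same inverted index as ordinary documents. The item ids, sample texts and function names below are hypothetical and are not taken from the slides or from any vendor product.

```python
import re
from collections import defaultdict

# Hypothetical sample items: two documents and one BI report, all
# reduced to plain text so that a single index can cover them.
ITEMS = {
    "doc:complaint-17": "Customer complaint about delayed claim payment",
    "doc:press-03": "Competitor press release on new insurance product",
    "report:claims-q3": "Claim statistics per region, third quarter",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(items):
    """Map each term to the set of item ids whose text contains it."""
    index = defaultdict(set)
    for item_id, text in items.items():
        for term in tokenize(text):
            index[term].add(item_id)
    return index


def search(index, query):
    """Return the sorted ids of items that contain every query term."""
    terms = tokenize(query)
    if not terms:
        return []
    hits = set.intersection(*(index.get(t, set()) for t in terms))
    return sorted(hits)


index = build_index(ITEMS)
print(search(index, "claim"))
```

A single query for "claim" returns both the complaint document and the claims report, which is the “complete picture” the scenario aims for. Because matching is purely term-based, the same mechanism also returns items that merely mention the term, which is the risk of irrelevant results named on the scenario slide.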
    • Example: IBM Cognos 8 Go! Search
      • Integration with enterprise search applications (IBM OmniFind, Google OneBox for Enterprise, Yahoo, Autonomy)
      • Search results return all relevant structured content (reports, analyses, etc.) and unstructured content (Word documents, PDFs, etc.) within a single interface
    • Example: IBM OmniFind
    • Example: IBM OmniFind
    • SAP BusinessObjects Intelligent Search
    • SAP BusinessObjects Intelligent Search
    • Scenario 3: “Structure it all”
      • Generate structure using document warehousing and text mining
      • Business user knows exactly what to look for
      • Risk: limited flexibility for user
    • 3: Structure it all
      [Diagram: the global architecture with the “Structure it all” scenario marked]
    • Generating structure in document warehouse (Source: Dan Sullivan)
      • Identify Sources
        – Sources are not fixed
        – Iterative process: sources lead to new sources
      • Retrieve Documents
        – Internal sources: retrieval, file servers, CMS/DMS
        – External sources: using crawlers, spiders
      • Preprocess Documents
        – Format documents in a consistent manner
        – Files must be in a suitable form for text analysis
      • Text Mining
        – Linguistic analysis
        – Key features are extracted
        – Indexing documents
        – Summarizing documents
      • Compile Metadata
        – Carefully attach metadata to documents
        – Used for querying, retrieval, matching, navigation support
        – Store in document warehouse
      • Combine (meta)data of the data warehouse architecture and the document warehouse architecture
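
The steps on this slide (retrieve, preprocess, text mining, compile metadata, store) can be sketched as a small pipeline. This is only an illustration under stated assumptions: the "text mining" step is replaced by a plain term-frequency count, and the sample source, field names and function names are hypothetical.

```python
import re
from collections import Counter

# Assumed, deliberately tiny stopword list for the illustration.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "on", "for"}


def preprocess(raw):
    """Bring a document into a consistent form by collapsing whitespace."""
    return " ".join(raw.split())


def extract_key_features(text, top_n=2):
    """Stand-in for text mining: the most frequent non-stopword terms."""
    terms = [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]
    return [term for term, _ in Counter(terms).most_common(top_n)]


def compile_metadata(doc_id, source, text):
    """Attach metadata (source, short summary, keywords) to a document."""
    return {
        "id": doc_id,
        "source": source,
        "summary": text[:40],
        "keywords": extract_key_features(text),
    }


def build_document_warehouse(sources):
    """Preprocess each retrieved document, mine it, and store its record."""
    warehouse = {}
    for doc_id, (source, raw) in sources.items():
        text = preprocess(raw)
        warehouse[doc_id] = compile_metadata(doc_id, source, text)
    return warehouse


# Hypothetical retrieved document: id -> (source, raw text).
SOURCES = {
    "d1": (
        "fileserver",
        "The network outage   delayed\n the network upgrade; "
        "outage reports followed the outage.",
    ),
}

warehouse = build_document_warehouse(SOURCES)
print(warehouse["d1"]["keywords"])
```

The stored keywords are the kind of extracted structure that can later be linked to dimensions and facts in the data warehouse.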
    • Document warehouse
      • Contains complete documents or URLs
      • Metadata about documents: summaries, authors’ names, publication dates, titles, sources, keywords, etc.
      • Translations of documents
      • Thematic clustering of similar documents
      • Topical or thematic indexes
      • Extracted key features (structure)
      • Dimensions and Facts, linked to documents, summaries etc.
      • Combine with the data warehouse
    • BI reporting on dimensional model
      [Diagram labels: Dim Action, Dim Product, Dim Customer, Dim Competitor, Dim Sales person, Dim Time, Dim Telco Term; Sales Facts, Call Facts; Data warehouse, Document warehouse]
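
A minimal sketch of reporting across both warehouses, assuming that sales facts come from the data warehouse, call facts are derived from text-mined documents, and both share a customer dimension. The table layout, column names and sample rows are hypothetical; only the fact and dimension names echo the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Shared dimension used by both fact tables.
    CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, name TEXT);
    -- Facts from the data warehouse (structured source).
    CREATE TABLE sales_facts (customer_id INTEGER, amount REAL);
    -- Facts derived from documents; doc_id links back to the document.
    CREATE TABLE call_facts (customer_id INTEGER, doc_id TEXT, telco_term TEXT);

    INSERT INTO dim_customer VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO sales_facts VALUES (1, 1200.0), (1, 300.0), (2, 900.0);
    INSERT INTO call_facts VALUES
        (1, 'call-001', 'dropped call'),
        (1, 'call-002', 'roaming');
    """
)

# One report row per customer: total sales next to the number of calls.
rows = conn.execute(
    """
    SELECT c.name,
           (SELECT SUM(s.amount) FROM sales_facts s
             WHERE s.customer_id = c.customer_id) AS total_sales,
           (SELECT COUNT(*) FROM call_facts f
             WHERE f.customer_id = c.customer_id) AS calls
    FROM dim_customer c
    ORDER BY c.customer_id
    """
).fetchall()
print(rows)
```

Each fact table is aggregated separately before being placed side by side, which avoids double counting sales when a customer has several calls.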
    • Generate structure using text mining tools
      • Example taken from SPSS PASW Text Analytics; many other tools are available from IBM, SAS, Oracle, SAP BO, Microsoft, etc.
    • Generating structure using UIMA
      • Unstructured Information Management Architecture
      • Originates from IBM, now Apache UIMA: http://incubator.apache.org/uima/ (Source: IBM)
      • UIMA is supported by all main BI vendors
    • Example: Generating structure using UIMA
      • Analyzed by a collection of text analytics
      • Detected Semantic Entities and Relations Highlighted
      • Represented in UIMA Common Analysis Structure (CAS)
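
The idea of keeping detected entities as typed annotations alongside the original text can be sketched in plain Python. This is not the UIMA API: the `Annotation` class, the regular-expression "annotators" and the sample sentence are assumptions made purely for illustration.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """A typed span over the text, stored as character offsets."""
    type: str
    begin: int
    end: int


# Hypothetical annotators: one regular expression per entity type.
PATTERNS = {
    "Person": re.compile(r"\bErik Fransen\b"),
    "Location": re.compile(r"\b(?:London|The Hague)\b"),
}


def analyze(text):
    """Run every annotator and return the annotations in text order."""
    annotations = []
    for type_name, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            annotations.append(Annotation(type_name, match.start(), match.end()))
    return sorted(annotations, key=lambda a: a.begin)


text = "Erik Fransen presented in London."
for a in analyze(text):
    print(a.type, text[a.begin:a.end])
```

Because the annotations only reference offsets, the original text stays untouched and several analytics can add their results independently.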
    • Summary
      • Growing business need for combining BI with unstructured data
      • Data Search bridges the gap between both worlds
        – Scenario 1: “Pure Portal”
        – Scenario 2: “Index it all”
        – Scenario 3: “Structure it all”
      • Scenarios can be combined
      • Questions?