IBM Software Group | DB2 Information Management Software
 

  • It seems that everyone is looking for information, but no one can find anything. Many call this information overload. We see four key challenges that organizations face as they attempt to get value out of all the information assets that exist across the enterprise. The first is that information is isolated in multiple silos, typically created by individual departments, while the needs of information consumers typically cut across an organization. Many companies have tried to standardize on a single content management system to get their arms around this problem, but have been unsuccessful because the subject matter experts who create the content are unwilling to modify their existing tools and workflow.
  • The second challenge is that the vast majority of this information is unstructured. Oftentimes we take for granted that information in a database or XML format contains implicit context that makes it much easier to organize and retrieve. If a database table has a date field, we know that to find recent updates to content we should sort on that specific attribute. If we need to find inexpensive products in a catalog, it is easy to sort by price. Unstructured content lacks this context. Free-form text contains keywords which search engines can index statistically, but this content must be analyzed and organized to maximize its use. Prices or dates hidden within the text are treated like any other keyword.
  • The third challenge is that the conventional search and browse experience is not good enough. Here is a typical example that we see in retail banking. Retail banks have invested tremendous resources in online banking portals but suffer from limited adoption. In this use case, a prospect comes to a website looking for college loan offerings. Because typical search engines yield no results when users ask specific questions, she does a keyword search for “loan”; hundreds of results are returned and nothing on the first page is relevant to her specific need. Indexing content as text that users can access through keyword search yields “feast or famine” results: many lack relevance and there is no efficient way to refine queries. In essence, the user needs to be either very patient or a trained researcher to be effective. The implication is that many users abandon search and explore alternative high-cost channels (phone, email, etc.) to find information.
  • Finally, these three issues have created inherent tension between line of business (LOB) and IT. LOB needs to provide its customers, partners, and employees with the right information necessary to drive business objectives. IT wants to standardize infrastructure across the organization. Historically these two objectives have been in conflict because enterprise search infrastructure traditionally has not provided the user experience or business control that a business needs to be successful. This has led to investments in numerous point solutions that partially address individual departmental needs but are costly to maintain and do not provide any cross-organizational leverage.
  • IBM Content Discovery is a key element of the IOD stack that helps organizations streamline business processes and generate new levels of insight by going beyond search to find. We address all of the key challenges associated with information overload by providing native, bi-directional access to content repositories across an organization, a text analytics framework that can uncover the inherent meaning of unstructured content, a contextually relevant end user experience that guides people to action, and the core technology, architecture, tools, and market solutions that balance the needs of business and IT.
  • Build busines
  • With a holistic view of information, a CSR can provide seamless customer service across different product lines. Venetica allows applications to work across disparate systems, such as those acquired through a merger.
  • See Add’l Slides Section – “SWG Services and BP Connectors”
  • Couple of pages to describe how things are different – very useful if the customer is coming from a DB2 or Content Management rather than Portal perspective of data. To complicate matters, searching enterprise content is very different from searching the Web. Searching different content sources means different techniques are required to determine document relevancy, different security and access models must be utilized, and different user requirements must be met. Some of the most successful search techniques for the web (page-ranking, for example) are not optimized for an enterprise environment where documents are not generally as interlinked to each other. To address these issues, IBM has developed information indexing and retrieval techniques specifically focused on solving the problem of delivering highly relevant results for intranet and other corporate content.
  • Expectations for portal functionality are strongly tied to those that support collaborative activity. WII OmniFind and Content Editions help build the IBM collaboration value proposition.
  • IBM WebSphere II OmniFind Edition provides complete enterprise search through a scalable, high performance search platform. It delivers highly relevant results with sub-second response times, is able to crawl a broad set of key enterprise data sources leveraging the broader II platform. And it has an open architecture for partners and customers to deliver advanced capabilities and industry-specific analytics and vocabularies on top of OmniFind. Finally, it supports text analytics and semantic queries for more accurate search results and for making unstructured information available to traditional data-oriented business intelligence applications.
  • Now if we take a closer look at OmniFind… This chart summarizes the phases and key technologies used to prepare your enterprise content for search.
    • Step 1 – Crawling: content is first extracted from its source through a process called “crawling”. This is similar in concept to the crawlers used for the web, but is also applied to non-web data sources.
    • Step 2 – Parsing and tokenizing: the content is then parsed and tokenized to identify individual words.
    • Step 3 – Categorization: the documents are then optionally categorized.
    • Step 4 – Annotation: the documents are then further annotated with features found in the text. This is where the advanced text analytics that I mentioned earlier is applied. We’ll drill down on this topic later. A document might be annotated to identify proper nouns, dates, relationships between words, and so on…
    • Step 5 – Indexing: once the documents have been tokenized and annotated, they are ready for indexing. Global analysis (GA) is performed on the entire set of documents to determine each document’s static ranking. A common GA task is link analysis on web documents, for example: the more documents that link to a document for a particular reference (or keyword), the higher its rank for that reference.
    • Step 6 – Searching: lastly, the index is made available for searching.
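The steps above can be sketched in a few lines of Python. This is a toy illustration, not OmniFind's implementation: the `DOCS` data, the function names, and the use of a simple inbound-link count as the static rank are all assumptions made for the example. Categorization and annotation (steps 3 and 4) are left out.

```python
import re
from collections import defaultdict

# Hypothetical "crawled" documents: id -> (text, ids of documents it links to).
DOCS = {
    "d1": ("College loan rates and terms", ["d2"]),
    "d2": ("Loan application form", []),
    "d3": ("Mortgage loan calculator", ["d2"]),
}

def tokenize(text):
    """Step 2: split text into lowercase word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Step 5: build an inverted index (token -> set of document ids)."""
    index = defaultdict(set)
    for doc_id, (text, _links) in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index

def static_rank(docs):
    """Step 5 (global analysis): count inbound links per document (assumed ranking rule)."""
    rank = {doc_id: 0 for doc_id in docs}
    for _text, links in docs.values():
        for target in links:
            if target in rank:
                rank[target] += 1
    return rank

def search(query, index, rank):
    """Step 6: return documents containing every query token, highest static rank first."""
    tokens = tokenize(query)
    if not tokens:
        return []
    hits = set.intersection(*(index.get(t, set()) for t in tokens))
    return sorted(hits, key=lambda d: (-rank[d], d))

index = build_index(DOCS)
rank = static_rank(DOCS)
print(search("loan", index, rank))
```

Here `d2` sorts first for the query "loan" because two other documents link to it, which mirrors the link-analysis idea described in step 5.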
  • WebSphere Portal 5.1.0.x: in order to support searching secured WebSphere Portal pages, customers must submit searches from the OmniFind Search Portlet. Searches submitted from the standard ESSearchApplication will not have the proper credentials to verify the end user's access to documents in the index. ESPACServer.ear must be deployed on the Portal server. Other sources are planned to be staged in based on customer demand.
  • Configurable watermarks, icons for result source, readable dates, title cleanup
  • Users also complained about being confused by results, or not being provided with precise answers, or being shown text-only results. Here, you can see a customer who asks “How much will my mortgage payments be?” He is shown a rich HTML mortgage payment calculator, which enables him to directly answer his question. On the right hand side of the UI, he can also see an interactive promotion, and “Act Now” links to guide his action.
  • Neiman Marcus Online Launches iPhrase OneStep Self-Service Search and Navigation Solution Luxury Retailer To Use OneStep To Deliver Advanced Search November 25, 2002 CAMBRIDGE, Mass., November 25, 2002 - iPhrase, a leading provider of self-service search and navigation software, today announced that its OneStep™ solution now powers the search functions across NeimanMarcus.com. Neiman Marcus Online chose iPhrase to upgrade its online search capabilities and provide the renowned luxury retailer with the most user-friendly, accurate and intuitive search interface available. Neiman Marcus Online's deep commitment to customer satisfaction prompted it to implement the OneStep platform. iPhrase's natural language technology allows Neiman Marcus' customers to ask questions on the site in every day conversational language, as though speaking with a store associate. OneStep removes any ambiguity in requests, retrieves the most pertinent content available from multiple sources-both structured and unstructured-and takes the user directly to an existing page or dynamically creates a result page specifically tailored to the user's request. iPhrase OneStep is extremely tolerant to language, spelling and usage, and provides a feedback loop that tells the user how the question was interpreted. For example, a NeimanMarcus.com customer may ask about "leather handbags under $500." OneStep queries all relevant databases, collects the most relevant answers, and presents them back to the user in OneStep, eliminating the need to sift through multiple or irrelevant results. "Delivering relevant search results is a critical component of the Neiman Marcus online experience," said Michael Crotty, vice president of marketing at Neiman Marcus Online. "We go to great lengths to ensure that NeimanMarcus.com meets and surpasses our customers' expectations, not only in fashion leadership, but also in innovative presentation of merchandise. 
Of all the vendors we evaluated, only iPhrase OneStep was able to meet our standards." Neiman Marcus Online benefits from a self-service search and navigation solution that is flexible enough to support content across its entire site, including its product catalog. Some of what Neiman Marcus was looking for in a search solution included: the highest accuracy and relevancy of answers to shoppers requests; the flexibility to search the constantly changing offering of products without the costly demands of constantly maintaining the search engine; and to provide shopping support information and guidance from the same interface as their catalog site. "iPhrase OneStep delivers a superior online self-service experience. This improves customer satisfaction and converts searches to sales far faster than conventional search applications," said André Pino, senior vice president of marketing, iPhrase Technologies. "We are proud that retail leaders such as Neiman Marcus Online rely on OneStep to exceed their customers' expectations." About The Neiman Marcus Group, Inc. (NYSE: NMGa) The Neiman Marcus Group, Inc. is a leading national retailer offering distinctive products that connect with the discerning consumer. The Group includes Neiman Marcus Stores, Bergdorf Goodman - and the direct marketing segment, Neiman Marcus Direct and Neiman Marcus Online. These renowned retailers offer upscale assortments of apparel, accessories, jewelry, beauty and decorative home products. About iPhrase iPhrase is the leading provider of self-service search and navigation software for mission-critical applications. iPhrase offers the patent-pending OneStep platform, which combines natural language processing, multi-source retrieval, dynamic presentation and in-depth analytics to simplify access to high-value information. 
Through superior relevancy and usability, iPhrase OneStep provides a powerful return on investment in critical applications at leading companies such as Charles Schwab & Co., LexisNexis, Lycos and TD Waterhouse. iPhrase's investors include Charles River Ventures, Greylock, Reed Elsevier Ventures, Sequoia Capital, TD Capital Technology Ventures and Bain Capital. Founded by former MIT researchers and business leaders, iPhrase has offices in Cambridge, MA and San Mateo, CA. For more information, please visit www.iphrase.com or call 617/577-4300.
  • Content Discovery Suite will help customers to:
    • Break down information silos by enabling users to seamlessly access and act on content stored in diverse repositories across the enterprise
    • Maximize the value of information by uncovering the inherent meaning of unstructured content through text analytics
    • Improve productivity by delivering contextually relevant information to the right people at the right time via intuitive user interfaces
    • Streamline business processes and generate new levels of insight by going beyond search to find
    • Make better business decisions by gaining visibility into emerging trends and problems, and empowering subject matter experts to enhance the end user experience
  • We provide three key product offerings to power these types of solutions. WebSphere Information Integrator Content Edition offers our comprehensive Content Integration Services that allow organizations to manage, leverage, and extend their enterprise content without painful rip-and-replace efforts. It offers virtual, bi-directional access to dozens of content repositories via a single development interface, allowing organizations to increase productivity, manage risk, and lower development costs. WebSphere Information Integrator OmniFind Edition provides a robust search and text analytics foundation that allows organizations to quickly implement intranet search applications and develop advanced BI solutions that uncover meaning from text documents. WebSphere Content Discovery Server is able to tap our end-to-end stack to help organizations quickly deploy line of business solutions that increase revenue and reduce support costs by understanding user intent and application context to find the right information and present it in a way that guides people to make purchases, answer questions, and solve problems. Next we will review examples of customer scenarios that have taken advantage of these capabilities to enhance their businesses.
  • Linguistic support improves document search results. Linguistic processing is performed in two stages: when the document is added to the index, and when a user inputs a query during search. Once the document’s language is determined, specific linguistic functions can be applied to segment the text string into words and lexical units:
    • Word segmentation – distinguishing words. This is challenging in languages like Japanese and Chinese that do not use white space separators
    • Stemming – find “mice” when searching for “mouse”
    • Break contractions into parts – make “wouldn’t” into “would” and “not”
    • Clitics – a form of contraction; make “l’avenue” into “le” and “avenue”
    • Recognize non-alphabetic characters as part of or separate from a lexical unit, e.g., URLs, dates
    • Recognize abbreviations
    • Recognize end of sentence for sentence segmentation
  • Linguistic support improves document search results. The first step in linguistic support is detecting the national language of the document. This works best for monolingual documents. Once the document’s language is determined, specific linguistic functions can be applied to segment the text string into words and lexical units:
    • Word segmentation – distinguishing words. This is challenging in languages like Japanese and Chinese that do not use white space separators
    • Stemming – find “mice” when searching for “mouse”
    • Break contractions into parts – make “wouldn’t” into “would” and “not”
    • Clitics – a form of contraction; make “l’avenue” into “le” and “avenue”
    • Recognize non-alphabetic characters as part of or separate from a lexical unit, e.g., URLs, dates
    • Recognize abbreviations
    • Recognize end of sentence for sentence segmentation
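A few of the linguistic functions in these notes can be illustrated with a minimal Python sketch. The lookup tables and the `analyze` function are hypothetical stand-ins for the language dictionaries a real product would ship; the same function would be applied both at indexing time and to the user's query, so that "mice" and "mouse" meet on the same base form.

```python
# Hypothetical lookup tables; a real system would use full language dictionaries.
CONTRACTIONS = {"wouldn't": ["would", "not"]}
CLITICS = {"l'": "le"}          # French elided article
BASE_FORMS = {"mice": "mouse"}  # irregular plural -> base form

def analyze(text):
    """Segment on white space, split contractions and clitics, map words to base forms."""
    units = []
    for raw in text.lower().replace("\u2019", "'").split():
        word = raw.strip(".,;:!?\"")
        if not word:
            continue
        if word in CONTRACTIONS:
            units.extend(CONTRACTIONS[word])
            continue
        for prefix, full in CLITICS.items():
            if word.startswith(prefix) and len(word) > len(prefix):
                units.extend([full, word[len(prefix):]])
                break
        else:
            # No clitic matched: fall back to the base-form lookup.
            units.append(BASE_FORMS.get(word, word))
    return units

print(analyze("Mice wouldn't cross l'avenue."))
```

White-space splitting only works for languages that separate words with spaces; Japanese and Chinese need a different segmentation strategy, as the notes point out.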

Presentation Transcript

  • Information is Everywhere: Managing Information for Discovery and Search
    • Nigel Freeman, Content Discovery specialist, IBM Software Group
    • [email_address]
    • May 2006
  • Agenda
    • Too much information – drowning or swimming?
    • IBM is going beyond mere ‘search’… IBM Content Discovery architecture
    • Content Integration services: making connections between existing systems
      • Information Integration Content Edition – overview
    • Enterprise Search: not the same as Internet search
      • What do you need from Enterprise Search and text analytics middleware?
      • OmniFind – overview
    • Text Analysis: Unstructured Information Management Architecture (UIMA)
    • Contextual Delivery, Information Accelerators to generate customer solutions
      • WebSphere Content Discovery Server – overview
    • IBM Content Discovery products, summary
    • Customer Examples
  • Drowning in information, or swimming?
    • Organisations today are faced with an ever-growing abundance of information. The lack of proper systems to access and manage their collective wisdom can cripple an organisation: not being able to find the relevant information when it is needed, or finding it too late, translates into bad decisions, missed opportunities, and time and money wasted reinventing information that already exists.
      • “It is clear that we are all drowning in a sea of information. The challenge is to learn to swim in that sea, rather than drown in it.” - from a study by University of California, Berkeley School of Information Management and Systems
    • By implementing cutting-edge systems for organizing and accessing information, organisations will promote growth at significantly reduced cost to today’s enterprise.
      • “An enterprise with 1,000 knowledge workers wastes $48,000 per week – $2.5 million per year – due to an inability to locate and retrieve information.” The High Cost of Not Finding Information, IDC
    • IBM w3 advertisement “w3 personalisation…”
  • The problem… Information is isolated in multiple silos …
    • Independent Systems: Customer Service, Council Tax, Social Services, Education, Leisure Services, Planning, Housing
  • … and the vast majority is unstructured
    • Office Documents
    • Images
    • Web pages
    • E-mail
    • Audio & Video
    • Free-form text fields (comments/notes)
    • File servers
    • Websites
    • Portals
    • ECM systems
    • Collaborative systems
    • Databases (BLOBs and free-form text fields)
    Examples Where It Exists
  • Typical search experience is not good enough
    • User need: “I need help finding a loan for college”
    • Query entered: “Loan”
    • Typical Online Experience
    • Burden of discovery is on the end user!
  • There is inherent tension between business and IT
    • Line-of-Business Owners and Project Leads
      • Must deliver information to their specific customers, partners and employees to facilitate business processes
      • Care most about best of breed functionality and direct control over the end user experience
    • IT Architects and CIOs
      • Must make information available from across the enterprise in a secure and standard format
      • Care most about achieving leverage and reuse, with a low total cost of ownership
    Diagram labels: Search App 1, Search App 2, Search App 3; Enterprise Search Infrastructure
  • The IBM Approach : Content Discovery
    • Information is isolated in multiple silos
      • Native, bi-directional access ensures all assets are available and content can be continually improved
    • Much of it is unstructured, limiting its use
      • Uncovering the inherent meaning of unstructured content can enhance search relevance, giving new levels of business insight
    • Traditional search is a bottleneck to facilitating action
      • Understanding user intent and application context allows organizations to get the right information to the right people at the right time
    • IT wants standards but business wants control
      • Complete solutions built on a Service Oriented Architecture allow organisations to balance the needs of business and IT
    Going Beyond “Search” to “Find”
  • IBM Content Discovery Architecture
    • Content Discovery
      • Information Accelerators: industry vocabularies and solution templates shorten deployment time
      • Contextual Delivery: understand user intent and context, to guide action and navigate large result sets
    • Analysis & Discovery Services
      • Text Analysis (UIMA): extract knowledge and meaning, for greater relevance and insight
      • Search & Indexing: scalable search capability with sophisticated indexing and retrieval
    • Content Integration Services: broad content access and native integration for secure read and write access
  • Content Discovery Analysis & Discovery Services Content Integration Services Information Accelerators Search & Indexing Text Analysis (UIMA) Contextual Delivery
  • The Problem: Multiple Silos of Content
    • [Chart: number of content repositories per organisation. Categories: 1 repository, 2-5 repositories, 6-10 repositories, 10-15 repositories, more than 15 repositories, don't know. Values as extracted, not matched to categories: 36%, 14%, 25%, 17%, 5%, 4%]
    • Survey base: 81 North American decision-makers (multiple responses accepted)
    • Source: “The Future of Content in the Enterprise,” Connie Moore and Robert Markham
  • WebSphere II Content Edition
    • SOA, enterprise-class integration architecture for “content”
    • Single interface to multiple content sources and workflow systems
    • Many “out of the box” connectors and toolkit for custom connectors
    • Two-way access to expose underlying functionality
    • Adds cross-repository services such as federated search, event services, single sign-on, etc
    • “Out of the box” client, development components and APIs for building custom applications
    Lets you work with content from multiple disparate content sources - as if it were stored in one unified system
  • Content Integration Services: Seamless Access to Distributed Content from Business Applications
    • Display associated metadata with the ability to preview a document and update content or properties
    • Provide a single point of access to all documents associated with the customer, regardless of where they are stored
  • WebSphere II Content Edition Integration Services
    • Many Out-of-the-Box Connectors
      • Pre-built and fully supported real-time, bi-directional connectors
      • Exposes content, workflow and functionality of underlying systems
      • Available for most major commercial systems, including…
    • Connector SDK for custom systems
    INTEGRATION SERVICES
      • Documentum Content Server, FileNet Content Services, FileNet Image Services, FileNet P8 Content Manager, FileNet P8 Business Process Manager, Hummingbird DM,
      • IBM Content Manager, IBM Content Manager OnDemand, IBM Portal Document Manager, Lotus Domino Document Manager, IBM Lotus Notes, IBM WebSphere MQ Workflow,
      • Interwoven Teamsite Content Server, Microsoft Index Server, OpenText Livelink Enterprise Server, Stellent Content Server, File Systems, Lab Services, Partner Connectors
  • WebSphere II Content Edition Federation Services
    • Meta Data Mapping
      • Common schema across different systems
    • Federated Search
      • Single search interface across multiple disparate systems
    • Virtual Repository
      • Single, unified view of distributed content
      • Consolidated view of work tasks from multiple workflow systems
    • Subscription Event Services
      • Subscription-based notification of changes to content, across multiple repositories
    • View Services
      • Convert content on-the-fly to browser-readable formats (e.g. PDF, HTML)
    • Single Sign-On (SSO) authentication
      • Native authentication and integration with LDAP and Active Directory
    INTEGRATION SERVICES FEDERATION SERVICES
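The metadata mapping and federated search ideas on this slide can be illustrated with a small Python sketch. It is not the WebSphere II Content Edition API: the repository names, field names and functions below are hypothetical, and the in-memory filter stands in for what would normally be a query delegated to each repository's own search.

```python
# Hypothetical repositories with different native field names.
REPO_A = [{"doc_title": "Claim form 17", "owner": "jsmith"}]
REPO_B = [{"name": "Claim policy", "created_by": "mlee"},
          {"name": "Holiday rota", "created_by": "mlee"}]

# Metadata mapping: native field name -> common schema field, per repository.
MAPPINGS = {
    "repo_a": {"doc_title": "title", "owner": "author"},
    "repo_b": {"name": "title", "created_by": "author"},
}

def to_common(repo_name, record):
    """Translate one native record into the common schema and tag its source."""
    mapping = MAPPINGS[repo_name]
    common = {mapping[k]: v for k, v in record.items() if k in mapping}
    common["source"] = repo_name
    return common

def federated_search(term, repositories):
    """Search every repository through the common schema and merge the results."""
    results = []
    for repo_name, records in repositories.items():
        for record in records:
            item = to_common(repo_name, record)
            if term.lower() in item.get("title", "").lower():
                results.append(item)
    return results

hits = federated_search("claim", {"repo_a": REPO_A, "repo_b": REPO_B})
for hit in hits:
    print(hit)
```

The caller only ever sees `title`, `author` and `source`, regardless of which system a document came from, which is the point of a common schema.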
  • WebSphere II Content Edition Developer Services
    • Federated Client
      • Complete out-of-the-box UI for working with distributed content
      • Includes key functionality and a highly usable interface
    • Web Components
      • Accelerates time to market for custom applications
      • Development components plug into web applications
      • Completely customizable look and feel
      • Includes JSR 168 compliant portlets
    • WebSphere II Content Edition API
      • Complete access to content and workflow functionality
      • Easy to use Java API and SOAP-based Web Services API
    INTEGRATION SERVICES DEVELOPER SERVICES FEDERATION SERVICES
  • IBM Federated Records Management
    • Consists of
    • IBM DB2 Records Manager, WebSphere II Content Edition, FRM Solution Components*
    • Key Features
    • Central policy mgmt on distributed content
    • “Touchless” records declaration
    • Federated search for discovery operations
    • Two-way, consistent UI to content systems
    … the application of records management to distributed content
    • Business Value
    • Reduce risk with centralized RM policies
    • Accelerate time to compliance
    • Reduce discovery costs
    • Consolidate over a phased timeframe
    • Provide a “future proof” infrastructure
    Options: leave records in native repository; move records to strategic repository at declaration
    *Services Offering
  • Content Discovery Analysis & Discovery Services Content Integration Services Information Accelerators Search & Indexing Text Analysis Contextual Delivery
  • OmniFind: it’s not Google… … because Intranet Search is different from Internet Search
    • Corporate intranets are smaller … but it’s more difficult to return highly relevant results
      • Less content in a corporate intranet … lower chance for perfectly matching document
      • Less well linked – fewer links and anchor text cues – so Page Ranking isn’t the answer
      • The heterogeneous nature (both in form and size) makes search precision difficult
  • Search and content management are the top two capabilities expected by 289 Portal customers
    • Q26: For which solutions do you plan to keep your existing tool, and for which would you like the portal to provide?
    • Responses (would like Portal to provide / intend to keep existing tool):
      • Windows desktop: 21% / 79%
      • Desktop productivity (spreadsheet, word processing, etc.): 32% / 68%
      • Application server: 37% / 63%
      • Activity Tracking: 41% / 60%
      • Taxonomy: 48% / 52%
      • Enterprise application integration (EAI): 54% / 46%
      • Directory: 57% / 43%
      • Collaboration: 57% / 43%
      • Process automation/workflow: 59% / 42%
      • Authentication/single sign on: 59% / 41%
      • Reporting: 60% / 40%
      • Content management: 61% / 39%
      • Search: 68% / 32%
    • * Base = Those with portal solutions implemented, planned or under evaluation.
    • Reference: Enterprise Portal Purchase and Usage Characteristics, Final Report, META Group Multi-Client Study, November 2003
  • WebSphere II OmniFind Edition Crawl Index Search
    • Excellent search quality
    • Complements and uses IBM’s offerings in portal, content management, and Information Integration
    • Crawls a broad range of enterprise data sources
    • Leverages systems’ own security mechanisms
    • Open architecture (UIMA) for text analytics and semantic queries
    • Rich multilingual capabilities
    Keyword search Semantic search Text analysis
  • Key Technologies
    • Crawling
      • Scalable Web crawler
      • Data Source crawlers
      • Custom Crawlers
    • Parsing / Tokenizing
      • HTML / XML
      • 200+ Doc Filters
      • Advanced Linguistics
    • Categorization (optional)
    • Text Analytics
      • Partner Apps
      • UIMA
    • Indexing
      • Global Analysis
      • Static Ranking
      • Store
    • Searching
      • Dynamic & Admin-influenced ranking
      • Fielded Search
      • Parametric Search
      • Semantic search
    Other diagram labels: Sources of Enterprise Content, Security, Search Applications
  • OmniFind Crawlers
    • Web content
      • HTTP / HTTPS
      • News groups (NNTP)
      • WebSphere Portal portlets and Portal Document Manager
    • Collaboration
      • Lotus Notes /Domino databases, Domino.Doc, QuickPlace
      • MS Exchange public folders
    • Windows and Unix file systems - over 250 file formats: PDF, MS Word / Excel / PowerPoint, Lotus SmartSuite, etc.
    • Enterprise Content Management systems
      • DB2 Content Manager
      • via WebSphere Information Integrator Content Edition : FileNet Content Services, FileNet P8, Documentum, Hummingbird DM, OpenText LiveLink and more in future
    • Relational Data sources
      • DB2 family (DB2, Informix, DB2 for z/OS)
      • WS Information Integrator relational data sources (Oracle, Informix, MS SQL Server, Sybase)
    • Federated access to LDAP and JDBC
    • Data Listener API for Custom crawlers
    II Standard Edition Content Manager QuickPlace Domino Domino.doc MS Exchange Windows File System Unix File System Websites Newsgroups Data Listener II Content Edition SQL Server
  • OmniFind Security
    • Security can be set at Collection level or Document level
    • OmniFind uses the application’s own security for Access-Control Lists for the following data sources:
      • Lotus Notes / Domino
      • Domino Document Manager
      • QuickPlace
      • WebSphere Portal Document Manager
      • Portal pages
      • FileNet CS
      • Windows File System
      • Documentum
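Document-level security can be pictured as filtering search hits against access-control lists. The slide does not say how OmniFind does this internally, so the sketch below rests on an assumption: each indexed document carries the set of groups allowed to read it, captured from the source system, and the searcher's groups are compared against that set at query time. All names and data are hypothetical.

```python
# Hypothetical index entries: each document keeps the ACL taken from its source system.
INDEX = [
    {"id": "hr-001", "text": "salary review", "acl": {"hr"}},
    {"id": "pub-001", "text": "salary bands overview", "acl": {"hr", "staff"}},
    {"id": "pub-002", "text": "canteen menu", "acl": {"staff"}},
]

def secure_search(term, user_groups, index=INDEX):
    """Return ids of matching documents that at least one of the user's groups may read."""
    groups = set(user_groups)
    return [doc["id"] for doc in index
            if term in doc["text"] and doc["acl"] & groups]

print(secure_search("salary", ["staff"]))
print(secure_search("salary", ["hr"]))
```

A user in the `staff` group sees only the public salary document, while an `hr` user sees both; a user with no groups sees nothing.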
  • Language Support in OmniFind
    • Linguistic Support
      • OmniFind has linguistic support for: Chinese (Simplified & Traditional), Czech, Danish, Dutch, English, Finnish, French, German, Greek, Italian, Japanese, Korean, Norwegian (Bokmal & Nynorsk), Polish, Portuguese, Russian, Spanish, Swedish
      • The document language is detected automatically and used for language-specific result filtering at search time. Language-specific base form computation (e.g. “mouse” for “mice”) is provided.
      • Automatic language detection also works for Arabic, Hebrew, Hungarian and Turkish (but no base form support yet).
    • Basic Support
      • Text is segmented using either white space information (for simple text languages) or n-grams (for complex text languages).
      • If simple and complex script languages are mixed in one document, the best segmentation strategy (either white space or n-gram) is selected for each individual script range within the document.
      • Basic support processing should work for all languages. No language limitation is built into OmniFind.
      • IBM tests basic support for the following list of languages:
        • Simple Text Languages (STL): Albanian, Bulgarian, Belarusian, Catalan, Croatian, Estonian, Hungarian, Icelandic, Indonesian, Kazakh, Latvian, Lithuanian, Macedonian, Malay, Romanian, Serbian (Cyrillic & Latin), Slovak, Slovenian, Turkish, Ukrainian
        • Complex Text Languages (CTL): Arabic, Bengali, Gujarati, Hebrew, Hindi, Kannada, Malayalam, Marathi, Oriya, Punjabi, Tamil, Telugu, Thai, Vietnamese
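The basic-support segmentation rule (white space for simple scripts, n-grams for complex scripts, chosen per script range) can be sketched as follows. The Unicode ranges treated as "complex" and the choice of bigrams are assumptions made for this example; they are not taken from OmniFind, whose own classification of languages is the one listed on the slide.

```python
from itertools import groupby

# Unicode ranges treated as "complex" in this sketch (an assumption, not OmniFind's list).
COMPLEX_RANGES = [
    (0x0E00, 0x0E7F),  # Thai
    (0x3040, 0x30FF),  # Hiragana and Katakana
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs
]

def is_complex(ch):
    """True if the character falls in one of the assumed complex-script ranges."""
    return any(lo <= ord(ch) <= hi for lo, hi in COMPLEX_RANGES)

def ngrams(run, n=2):
    """Overlapping character n-grams; runs shorter than n are kept whole."""
    if len(run) <= n:
        return [run]
    return [run[i:i + n] for i in range(len(run) - n + 1)]

def segment(text):
    """Pick a strategy per script range: n-grams for complex runs, white space otherwise."""
    tokens = []
    for complex_run, chars in groupby(text, key=is_complex):
        run = "".join(chars)
        if complex_run:
            tokens.extend(ngrams(run))
        else:
            tokens.extend(run.split())
    return tokens

print(segment("search 検索エンジン now"))
```

The Latin words are split on white space while the Japanese run is broken into overlapping bigrams, so a mixed document gets the appropriate strategy for each range. Punctuation handling is omitted for brevity.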
  • Search & Indexing Services Simple “Google” Style Search for Enterprise Content
    • Out-of-the-box search application provides “Google”-style results list with paging
      • relevancy ranking, date, field values
      • site collapse
      • customizable look and feel
    • Configurable ‘Quick links’ provide immediate access to predetermined relevant sites, documents or applications
    • Broad support for searching across enterprise content sources
    • “Did you mean?” synonym expansion provides one-click access to other potentially relevant queries or can be used for spelling correction
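As a rough illustration of the spelling-correction use of “Did you mean?”, the sketch below suggests the closest indexed term for each unknown query word using Python's standard `difflib`. The vocabulary and the function name `did_you_mean` are hypothetical, and this is not how OmniFind computes its suggestions; synonym expansion is not covered.

```python
import difflib

# Hypothetical terms assumed to be present in the search index.
VOCABULARY = ["warranty", "wiring", "harness", "corrosion", "filter"]

def did_you_mean(query, vocabulary=VOCABULARY, cutoff=0.8):
    """Return a corrected query string, or None if nothing needs correcting."""
    suggestion = []
    changed = False
    for term in query.lower().split():
        if term in vocabulary:
            suggestion.append(term)
            continue
        close = difflib.get_close_matches(term, vocabulary, n=1, cutoff=cutoff)
        if close:
            suggestion.append(close[0])
            changed = True
        else:
            suggestion.append(term)  # unknown term with no near match: keep it
    return " ".join(suggestion) if changed else None

print(did_you_mean("waranty corosion"))
```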
  • Content Discovery Analysis & Discovery Services Content Integration Services Information Accelerators Search & Indexing Text Analysis (UIMA) Contextual Delivery Unstructured Information Management Architecture (UIMA)
    • Most BI implementations ignore knowledge buried within free form text
      • They can only report on predefined structured data, such as problem codes…
      • Problem descriptions, technician comments, call center notes and customer correspondence can contain a lot of the supporting details required for true insights
    Text Analysis Services Leveraging Knowledge Buried in Unstructured Information
  • Text Analysis Services Extract Knowledge From Unstructured Information
    • Identify concepts, entities and facts buried in unstructured content
      • Determine underlying issues or problems, parts referenced and actions from technician or customer service notes, customer surveys, consumer review sites and other sources
    • Extracted knowledge can now be sent to a search engine, database or delivered as a service to rules processing engines and other business applications
    • Provide broader access through more simplified search and browse interfaces
    PARTS: 1. Fuel Pump; 2. Fuel Filter; 3. Wiring Harness; 4. Wiring Harness Cover
    PROBLEM 1: Corrosion (PART 3: Wiring Harness)
    ACTION 1: Replace (PART 1: Fuel Pump, PART 2: Fuel Filter)
    ACTION 2: Remove (PART 4: Wiring Harness Cover)
    • Report on facts extracted from unstructured information
      • Show other parts referenced, underlying root problems or issues, and actions taken…
      • Create alerts to be notified of specified findings or thresholds
    • Provide simplified search interface extending access to broader set of users
      • Easily find information about claims involving a fuel pump…
      • See all of the other parts, problems and actions referenced in the warranty claim
    Text Analysis Services Leveraging Knowledge Buried in Unstructured Information
  • Identify Language Find Words & Roots Categorization Plug In Annotator Plug In Annotator Extracted Metadata and Facts Text Search Index WebSphere II OmniFind Edition Plug In Annotator Plug In Annotator UIMA
    • UIMA (Unstructured Information Management Architecture): a “plug and play” framework for advanced text analysis components
    • UIMA framework allows “Annotators” to add value to text
      • find words specific to an industry, from dictionary or by rules
      • add further information around these terms, like Latitude/Longitude for places
      • allow Indexed and annotated results to go to other processes / systems as well as to a Search Engine, for further analysis or semantic search
    Data Warehouse Rules Engine ...any Application Search Application Reports
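The idea of plug-in annotators can be sketched as a chain of small functions that each add typed annotations to the same text. This is an illustration only: it does not use the real UIMA API, and `Annotation`, `dictionary_annotator`, the term lists and the sample note are all hypothetical.

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class Annotation:
    kind: str
    text: str
    start: int
    end: int

def dictionary_annotator(kind, terms):
    """Build an annotator that tags every occurrence of the given terms."""
    # Longest terms first so "wiring harness cover" wins over "wiring harness".
    ordered = sorted(terms, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(map(re.escape, ordered)) + r")\b", re.IGNORECASE
    )

    def annotate(text):
        return [
            Annotation(kind, m.group(0).lower(), m.start(), m.end())
            for m in pattern.finditer(text)
        ]

    return annotate

# Each entry is one plug-in; more can be appended without touching the others.
PIPELINE = [
    dictionary_annotator(
        "PART",
        ["fuel pump", "fuel filter", "wiring harness", "wiring harness cover"],
    ),
    dictionary_annotator("PROBLEM", ["corrosion"]),
    dictionary_annotator("ACTION", ["replaced", "removed"]),
]

def analyze(text, pipeline=PIPELINE):
    """Run every annotator and return the annotations in document order."""
    annotations = []
    for annotate in pipeline:
        annotations.extend(annotate(text))
    return sorted(annotations, key=lambda a: a.start)

note = ("Found corrosion on wiring harness. Removed wiring harness cover "
        "and replaced fuel pump and fuel filter.")
facts = [(a.kind, a.text) for a in analyze(note)]
print(facts)
```

The resulting list of typed facts is the kind of output that could then be loaded into a search index, a data warehouse or a rules engine.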
  • Content Discovery Analysis & Discovery Services Content Integration Services Information Accelerators Search & Indexing Text Analysis Contextual Delivery WebSphere Content Discovery Server (iPhrase)
    • WCDS demo on-screen “WCDS Self Service demo.exe”
  • WebSphere Content Discovery for Self Service
    • Embed Rich HTML responses within result
    • Interactive promotion guides action
    • Understands user intent and provides actionable response
  • Contextual Delivery Services: Integration into Contact Centres facilitates faster Problem Resolution
    • Launch query for possible resolutions directly from Siebel Call Center, and leverage context and customer info to automatically find most relevant content
    • Return integration enables creation of new solutions based on findings
    • Enable agents to easily filter content by source, product and other attributes
  • Contextual Delivery Services: Business User Control
    • Empower business managers to easily refine the end-user experience
    • Monitor end-user behavior and effectiveness of business rules
  • IBM Product Offerings       Integrating Content from Multiple Sources into Business Applications      WebSphere Content Edition WebSphere OmniFind Edition WebSphere Content Discovery Server Infrastructure for Enterprise Search and Text Analytics Business Driven Search Applications Contextual Delivery Search & Indexing Text Analytics Content Integration 
  • Customer Examples Content Discovery Analysis & Discovery Services Content Integration Services Information Accelerators Search & Indexing Text Analysis (UIMA) Contextual Delivery
  • Wachovia improved business effectiveness and addressed compliance issues by providing integrated view of all content
    • Access and work with content from multiple repositories following mergers
    • Deliver repository independent customer service, brokerage and workflow applications
    Growth through Acquisition
    • Greater accessibility resulted in 50-fold increase in number of content retrievals
    • $2.3 million savings within 2 years for a 64% return on initial investment
    • $1 million savings for each additional business unit implementing content integration services
    • Business executives have immediate access to newly acquired systems
    Content Integration Challenge Benefits
  • IFPMA makes it easier for doctors and patients to research clinical trial information worldwide
    • Doctors and patients need to find info about all clinical trials sponsored by the pharmaceutical industry
    • Unstructured information from multiple companies and clinical trials registries
    • Enables searching by disease area, medicine name or trial location
    • Recognizes medical and geographical synonyms across multiple languages, without manual indexing
    • Allows doctors and patients to find trials they can join and review summarized results
    Search & Indexing Text Analytics Challenge Benefits
  • CBI Engineering increased productivity by allowing employees to access Lotus Notes from their intranet search solution
    • Need for improved search relevancy across file system and Lotus Notes to make engineers more productive
    • Must respect security already defined within Lotus Notes
    • Common search framework for intranet, file system and Lotus Notes content
    • Engineers able to seamlessly access native Notes documents from intranet search results
    • Allowed CBI to provide broad content access while honoring stringent native repository security
    Search & Indexing Challenge Benefits
  • IBM Workplace for Customer Support (Lotus Premium Support) increased customer satisfaction and productivity with Content Discovery
    • Revitalize customer interest in using lower cost online support channel
    • Streamline customer self-sufficiency while continuing to deliver personalized service from IBM support staff
    • Increased customer satisfaction through the delivery of relevant information in 3 clicks or less
    • Unified content from disparate repositories to simplify problem resolution
    • Enabled resolution of repetitive product problems in less than five minutes
    • Decreased number of problem management reports submitted
    • Personalization enables results to be automatically limited to customer owned products
    • Customers can escalate and preserve context
    • Enables searching across multiple content stores and easy user navigation
    Contextual Delivery Challenge Benefits
  • Summary
    • Getting the right information to the right people at the right time is a key element of achieving Information On Demand
    • IBM is building this capability around a portfolio of
      • Content Integration
      • Text Analytics
      • Search & Indexing
      • Contextual Delivery
      • Information Accelerators
    • IBM Content Discovery brings these capabilities together to help organizations drive measurable results for their business
  • Thank You. Any questions?
  • The IBM Content Discovery software portfolio
    • WebSphere Content Discovery Server
      • Allows organizations to: Quickly deploy business driven solutions that increase revenue and reduce support costs
      • By providing: A rich understanding of user intent and application context to help people quickly find the information they need to make purchases, answer questions, and solve problems
      • Example initiatives: eCommerce, Self-Service websites
    • WebSphere II OmniFind Edition
      • Allows organizations to: Implement a single search architecture to underpin enterprise portal and BI initiatives
      • By providing: Robust enterprise search capabilities and a text analytics foundation able to uncover the inherent meaning of large volumes of content from around the globe
      • Example initiatives: Issues Analytics, Intranet Search
    • WebSphere II Content Edition
      • Allows organizations to: Manage, leverage and extend their enterprise content without painful ripping and replacing
      • By providing: Virtual access to dozens of content silos via a single interface to increase productivity, manage risk, and lower development costs
      • Example initiatives: Records Management, M&A Content Migration
  • OmniFind - Linguistic Analysis
    • Linguistic processing when adding document to index
      • Determines language of document
      • Tokenizes text
      • Creates index using tokens
    • Linguistic processing performed during search
      • Query string segmented, analyzed, searched in index
    • Stop word removal – removing “a”, “the”, etc.
    • Character normalization
      • Normalization performed in Unicode
      • Case normalization – finding documents with “USA” when searching with “usa”
      • Umlaut normalization – finding documents with “schoen” when searching with “schön”
      • Accent removal – finding documents with “é” when searching for “e”
      • Other diacritics removal – finding documents with “ç” when searching for “c”
      • Ligature expansion – finding documents with “Æ” when searching for “ae”
      • Normalization works in both directions
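The character normalization steps listed above can be approximated with Python's standard `unicodedata` module. This is a minimal sketch, not OmniFind's implementation; the `EXPANSIONS` table is an assumption and applies German-style umlaut expansion to every language, which a language-aware system would not do.

```python
import unicodedata

# Assumed expansion table: umlauts and ligatures that map to two letters.
EXPANSIONS = {"ä": "ae", "ö": "oe", "ü": "ue", "æ": "ae", "œ": "oe"}

def normalize(term):
    """Map a term to a canonical form shared by all of its spelling variants."""
    # Compose first so "o" + combining diaeresis is seen as a single "ö".
    term = unicodedata.normalize("NFC", term).casefold()   # case normalization
    term = "".join(EXPANSIONS.get(ch, ch) for ch in term)  # umlauts, ligatures
    decomposed = unicodedata.normalize("NFD", term)
    # Drop the remaining combining marks (accents, cedillas, ...).
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

for word in ["USA", "schön", "é", "ç", "Æ"]:
    print(word, "->", normalize(word))
```

Applying the same function to both indexed terms and query terms is what makes the match work in both directions: “schön” and “schoen” end up as the same key.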
  • OmniFind - Linguistic Analysis
    • Recognize documents in a wide range of languages:
      • Arabic, Chinese (traditional and simplified), Czech, Danish, Dutch, English, Finnish, French, German, Greek, Hebrew, Hungarian, Italian, Japanese, Korean, Polish, Portuguese (Brazilian), Russian, Spanish, Swedish, Turkish
    • Dictionary-based linguistic support for documents in recognized languages
      • Word segmentation
      • Stemming, find “mice” when searching for “mouse”
      • Break contractions into parts, make “wouldn’t” into “would” and “not”
      • Clitics, a form of contractions, make “l’avenue” into “le” and “avenue”
      • Recognize non-alphabetic characters as part of or separate from a lexical unit, e.g., URLs, dates
      • Recognize abbreviations
      • Recognize end of sentence for sentence segmentation
    • Basic support for documents not in a recognized language
      • Word segmentation via white space (blanks) and n-gram segmentation
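The dictionary-based steps on this slide (stemming to a base form, breaking contractions and clitics into parts) can be sketched with two small lookup tables. The tables and the function name `base_forms` are hypothetical; a real system would rely on full per-language dictionaries rather than a handful of entries.

```python
# Tiny illustrative dictionaries; real ones are language-specific and large.
LEMMAS = {"mice": "mouse", "geese": "goose"}
CONTRACTIONS = {
    "wouldn't": ["would", "not"],   # English contraction
    "l'avenue": ["le", "avenue"],   # French clitic
}

def base_forms(text):
    """Lowercase, split on white space, expand contractions, then look up base forms."""
    result = []
    for token in text.lower().replace("’", "'").split():
        token = token.strip(".,;:!?")
        if not token:
            continue
        for part in CONTRACTIONS.get(token, [token]):
            result.append(LEMMAS.get(part, part))
    return result

print(base_forms("Mice wouldn’t cross l’avenue."))
```

Indexing and searching on the base forms is what lets a query for “mouse” find a document that only contains “mice”.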