Enterprise Search - Introduction


Need a good introduction to Enterprise Search?

  • Endeca: structured
  • The Open Graph Protocol enables you to integrate your Web pages into the social graph. It is currently designed for Web pages representing profiles of real-world things, such as movies, sports teams, celebrities, and restaurants. Including Open Graph tags on your Web page makes your page equivalent to a Facebook Page. This means that when a user clicks a Like button on your page, a connection is made between your page and the user.
  • Most companies have different types of ECM needs. The picture above shows the different types of needs in the context of DM/collaboration/search/Intranet. The picture helps in explaining that each “room” above might induce specific technical and functional requirements.
    - Library = document management: high level of classification and metadata; archiving might be desired; search is essential.
    - Team Rooms = collaboration spaces: less classification, more fuzzy, but many to many; collaborative editing might be needed. Some content from here might be moved to the library at some time.
    - Conference Center = the “classic” Intranet: HR documents, trainings, presentations, ... (this does not require strict DM features, but in some cases rather WCM features).
    - Expert Corner = applications: FAQ, social networking, blogs & wikis, ...
    - Registration Area = security, profiling and personalization.
    - Dashboard = portal functionality.
    - Data Processing = workflow, integrations.
  • Doesn’t include open source
  • SBA: software applications in which a search engine platform is used as the core infrastructure for information access and reporting

Transcript

  • 1. Enterprise Search. 8/12/2011 – Damien Dewitte
  • 2. Enterprise Search: Setting the scene. Damien Dewitte, Lead ECM consultant
  • 3. Contents
    - The enterprise search promise
    - Some thoughts on search scenarios
    - Make your content “findable”
    - Search: how it works
    - The enterprise search market
  • 4.
  • 5. While on the Intranet …
  • 6. The Enterprise Search promise
  • 7. The Enterprise Search Promise. IDC 2001: “The High Cost of Not Finding Information”. Cost =
    - Poor decisions based on faulty or poor information
    - Duplicated efforts within different divisions/projects
    - Lost sales due to customers’ inability to find products and services
    - Lost productivity due to employees’ inability to find information
  • 8. The Enterprise Search Promise: Google (2008)
  • 9. The Enterprise Search Promise
  • 10. The Invisible Intranet. Using search on an Intranet usually leaves a huge portion of existing valuable information ‘invisible’, because:
    - Some information silos are not indexed: databases with structured content, external sources, isolated departmental content repositories, individual desktops, content applications ‘in the cloud’, digital archives
    - Some information is “over-secured”
    - Some information is trapped in proprietary file formats, which cannot be indexed
    - Some information cannot be extracted as text: rich media files (audio, video), badly scanned documents
  • 11. The Enterprise Search Promise
  • 12. The Enterprise Search Promise. Diagram: one Enterprise Search Platform serving many search applications on top of real-time, structured and unstructured sources.
    - Search applications: mail search, site search, DMS search, e-commerce search, corporate search, BI search, …
    - Legacy data (e.g. ISAM, VSAM, IMS)
    - RDBMS (JDBC, ODBC, SQLNet, DW, DM)
    - Files (e.g. Word, Excel, pdf, images, mp3)
    - WWW (HTML, XML, WML, JavaScript)
    - Direct push
    - Applications (e.g. ERM, CRM, Help Desk)
    - DMS (e.g. M’Soft CMS, Documentum)
    - eMail systems (e.g. Notes, Exchange)
    - Portals (e.g. WebSphere, WebLogic)
    - Private webs (e.g. news feeds, Intranets)
    - Message queues (e.g. TIBCO, MQ-Series)
  • 13. The Enterprise Search Promise. “There’s no reason to expect that search is going to get that much better. The basic algorithms by which search is done have not improved much since about 1975. The only way to improve the situation is by enhancing search engines with more deterministic metadata. If you look at the victory of Google, it wasn’t because they had better search techniques. It’s because they deployed one key metadata value – how many pages are linked to this one – to enhance the relevancy of their results. The same concepts need to be applied to the enterprise.” (Tim Bray)
  • 14. Some thoughts on search scenarios
  • 15. Enterprise versus web search
    - Content: Web = mainly HTML and PDF; Enterprise = all formats and sources, including databases and legacy systems
    - Security: Web = focus on system security; Enterprise = also restricting user access to specific content
    - Updates: Web = via (scheduled) crawling; Enterprise = push updates to the index (near real time)
    - Volume: Web = on average 1000 files; Enterprise = potentially > 1.000.000 “records”
    - Metadata management: Web = centrally, in e.g. a Web CMS; Enterprise = consolidate metadata from various source systems
  • 16. Enterprise versus web search: probably the cheapest website search you can find
  • 17. Structured versus unstructured: start by filtering versus start by typing
  • 18. Search versus research
    - Search (“I know you’re out there..”): “Meeting minutes social collaboration project”, “Amplexor proposal for Intranet”, “Timesheets april 2009”
    - Research (“Life is like a box of chocolates. You never know what you gonna”): “Ecm and Green IT in Europe”, “average time spent on searching for content”, “Does ECM have impact on governmental decisions in Spain?”
  • 19. Search versus research. Example: “Meeting minutes social collaboration project”. Search is based on:
    - Information type (meeting minutes, proposal, invoice, timesheet, …)
    - Document format (PDF, DOC, PPT, e-mail, …)
    - Organisational source: projects, products, processes (HR, Compliance, Marketing, IT, …), …
    - Publication date, modification date
    - Author
    Search queries are more or less predictable (after analysis).
  • 20. Search versus research. Example: “Does ECM have impact on governmental decisions in Spain?”. Research is based on:
    - Entities: people, geographical locations, companies & brands, …
    - Source: internal or external
    - Publication date range
    - Natural language search
    Search queries are unpredictable. The system should be “taught” how to interpret a query (natural language search, entity extraction from content, …).
  • 21. Metadata. What is metadata? Information about the information:
    - Descriptive
    - Structural
    - Administrative
    Types of metadata:
    - Implicit (e.g. creation date, publication date, URL, filename, file format, source system, …)
    - Explicit (e.g. owner, topic, summary, expiry date, status, …)
    Guiding metadata input with:
    - Taxonomies
    - Folksonomies
    - Ontologies
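The implicit/explicit split on this slide can be sketched as a small record type. This is a minimal illustration only: the class `DocumentMetadata`, the helper `implicit_metadata`, the sample URL and the source-system name are all hypothetical and do not come from the presentation. The point it shows is that implicit fields can be derived from the file or source system, while explicit fields stay empty until a contributor or a business rule fills them in.

```python
from dataclasses import dataclass
from datetime import date
from pathlib import PurePosixPath
from typing import Optional


@dataclass
class DocumentMetadata:
    # Implicit metadata: derivable from the file or the source system.
    url: str
    filename: str
    file_format: str
    creation_date: date
    source_system: str
    # Explicit metadata: supplied by a contributor or a business rule.
    owner: Optional[str] = None
    topic: Optional[str] = None
    status: Optional[str] = None
    expiry_date: Optional[date] = None


def implicit_metadata(url: str, created: date, source_system: str) -> DocumentMetadata:
    """Fill in only the fields that need no human input."""
    path = PurePosixPath(url)
    return DocumentMetadata(
        url=url,
        filename=path.name,
        file_format=path.suffix.lstrip(".").lower(),
        creation_date=created,
        source_system=source_system,
    )


# Hypothetical document on a hypothetical intranet.
doc = implicit_metadata(
    "http://intranet.example/docs/Proposal.PDF", date(2011, 12, 8), "dms"
)
doc.topic = "Intranet"  # explicit value added later by a contributor
```

Keeping the explicit fields optional mirrors the advice later in the deck: only ask contributors for fields that have a clear purpose.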
  • 22. Taxonomies
  • 23. Folksonomies. http://taggalaxy.de
  • 24. Ontologies: taxonomies, representing knowledge as a set of concepts within a domain, and the relationships between those concepts. http://en.wikipedia.org/wiki/Geopolit
  • 25. Metadata. Statement 1: “A performant Enterprise Search Engine should not require information workers to add metadata. It should just crawl all my information sources.” But:
    - Will users understand the results displayed (title, author, …)?
    - How will they filter results?
    - Does it really help to crawl 1.000.000 records if 900.000 have become irrelevant over time?
  • 26. Metadata. Statement 2: “Google doesn’t need metadata.” Are you sure?
  • 27. Metadata. So you think Google doesn’t need metadata?
  • 28. Simple example of the semantic web
  • 29. Metadata. Statement 3: “Adding metadata is so time-consuming that my information workers will never do it.” Yes, but:
    - In a structured ECM approach, it is possible to automate much of the metadata input, because it can be deduced from business rules
    - If you’re not 100% sure you will need a metadata field for a specific purpose, then don’t create it
    - Convince users of the value of the metadata fields which remain
    - Make it user-friendly for content contributors to add metadata
  • 30. Metadata. Avoid defining metadata around the document, if it should already be present IN the document.
  • 31. Make content findable
  • 32. Findability. Findability is not obtained just by implementing search technology. AIIM.org: “Information Organization and Access (IOA) refers to a collection of technologies to help you organize and find information”, which includes:
    - enterprise search
    - content classification
    - categorization and clustering
    - fact and entity extraction
    - taxonomy creation and management
    - information presentation (i.e., visualization)
    - information governance
  • 33. Findability Tips & Tricks. The more value content has, the more effort should be spent in managing it (and making it findable).
  • 34. Findability Tips & Tricks. One search interface doesn’t solve it all. Keep in mind that specific content sources or Lines of Business might require specialized search screens.
  • 35. Findability Tips & Tricks. Define specific search scopes, if your information governance permits …
  • 36. Findability Tips & Tricks. Landing pages are still “in”!
    - Projects overview page
    - Knowledge base page (links to knowledge bases)
    - Practical guide (categorized hyperlinks to practical information)
    - Tools
    - Forms
    - Filtered listings (e.g. automatic listing of all FAQ content types)
  • 37. How search works
  • 38. How it works: Architecture. Diagram:
    - Content sources: web content; files and documents; multimedia; databases; custom applications
    - Content acquisition: web crawler, file traverser, database connector, connectors, content push
    - Document processing pipeline, writing to the index files
    - Search, with query and result processing pipelines, filter and alert
    - Consumers: vertical applications, portals, custom front-ends, mobile devices
    - Tuning and administration across the whole platform
  • 39. How it works: Connect to content sources and get data
    - Web pages (e.g. XML, HTML, WML): crawler
    - Files, documents (e.g. Word, Excel, pdf): file traverser
    - Database content (e.g. Oracle, DB2): database connectors
    - Applications (e.g. SharePoint, Documentum, Exchange, CMS/DMS): application connectors
  • 40. How it works: Analyze and index content to make it searchable
    - Convert and process content through the pre-processing pipeline: lemmatization/stemming, entity extraction, taxonomy classification, custom logic (e.g. adding special tags)
    - Write content to index files
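The indexing step above can be sketched in a few lines. This is a toy illustration under stated assumptions, not how any of the products in this deck work: the suffix rules in `SUFFIXES` are a deliberately crude stand-in for a real stemmer or lemmatizer, the index is an in-memory dictionary rather than index files, and the sample documents are invented (their text borrows query examples from the earlier slides).

```python
import re
from collections import defaultdict

# Toy suffix rules: an assumption for illustration, not a real stemmer.
SUFFIXES = ("ing", "es", "s")


def stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def analyze(text):
    """Pre-processing pipeline: lowercase, tokenize, stem."""
    return [stem(token) for token in re.findall(r"[a-z0-9]+", text.lower())]


def build_index(documents):
    """Inverted index: each term maps to the set of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return index


# Hypothetical documents.
documents = {
    "minutes-01": "Meeting minutes social collaboration project",
    "proposal-07": "Proposal for Intranet search",
}
index = build_index(documents)
```

Because queries would pass through the same `analyze` function, a search for “searching” would reach documents that only contain “search”.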
  • 41. Search Engine: How it works. Analyze query
    - Use query language or query API
    - Convert and process query through query pipeline: linguistic processing, custom logic (e.g. query term modification/addition)
  • 42. How it works: Match query to content index
    - Query- and content-adaptive matching
    - Exploit all information and structure in the data
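One common way to match a query against indexed content is TF-IDF weighting, sketched below. The slide does not say which matching model a given engine uses, so treat this as an assumed, simplified example: the function names, the smoothing in the IDF formula and the three sample documents are illustrative choices, and no linguistic processing is applied beyond lowercasing.

```python
import math
import re
from collections import Counter


def analyze(text):
    """Minimal query/content analysis: lowercase and tokenize."""
    return re.findall(r"[a-z0-9]+", text.lower())


def search(query, documents):
    """Rank document ids by the summed TF-IDF weight of the query terms."""
    doc_terms = {doc_id: Counter(analyze(text)) for doc_id, text in documents.items()}
    n_docs = len(documents)
    scores = {}
    for term in analyze(query):
        matching = [doc_id for doc_id, tf in doc_terms.items() if term in tf]
        if not matching:
            continue
        # Rare terms weigh more than terms found in many documents.
        idf = math.log(1 + n_docs / len(matching))
        for doc_id in matching:
            scores[doc_id] = scores.get(doc_id, 0.0) + doc_terms[doc_id][term] * idf
    # Highest score first; ties broken by document id for stable output.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical documents.
documents = {
    "a": "timesheets april 2009",
    "b": "meeting minutes social collaboration project",
    "c": "project proposal for intranet",
}
ranking = search("social collaboration project", documents)
```

Here document `b` matches all three query terms and ranks ahead of `c`, which only matches the more common term “project”.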
  • 43. How it works: Return results to user
    - Convert and process results through result pipeline: re-sort, filter for security, organize for dynamic drilldown
    - Pass results on to application (generated or through API)
    - Push results to alert engine and then external environment (e.g. mail, queue)
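The result pipeline can be sketched as three small steps: security filtering, re-sorting and counting values for drilldown. The hit structure (`id`, `score`, `type`, `allowed_groups`) and the group names are assumptions made for this example; a real engine would take access rights from the source systems.

```python
from collections import Counter


def process_results(hits, user_groups):
    """Filter for security, re-sort by score, and count facet values."""
    # Keep only hits that share at least one group with the user.
    visible = [hit for hit in hits if hit["allowed_groups"] & user_groups]
    visible.sort(key=lambda hit: hit["score"], reverse=True)
    # Facet counts drive the dynamic drilldown in the user interface.
    facets = dict(Counter(hit["type"] for hit in visible))
    return visible, facets


# Hypothetical raw hits coming back from the matching step.
hits = [
    {"id": "d1", "score": 0.4, "type": "Proposal", "allowed_groups": {"sales"}},
    {"id": "d2", "score": 0.9, "type": "Invoice", "allowed_groups": {"finance"}},
    {"id": "d3", "score": 0.7, "type": "Proposal", "allowed_groups": {"sales", "hr"}},
]
visible, facets = process_results(hits, {"sales"})
```

Computing the facet counts after the security filter matters: otherwise the drilldown would reveal that documents exist which the user is not allowed to see.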
  • 44. Mediafin
  • 45. How it works: Federated search relies on the indexes and the relevance algorithms of the underlying search engines
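Since each underlying engine ranks with its own relevance algorithm, their scores are generally not comparable. One simple way to merge the result lists, assumed here purely for illustration, is to interleave them by rank position and drop duplicates. The two engine functions below are hypothetical stand-ins for calls to real search back ends.

```python
from itertools import zip_longest


def federate(query, engines):
    """Send the query to every engine and interleave the ranked lists."""
    ranked_lists = [engine(query) for engine in engines]
    merged, seen = [], set()
    # Take rank 1 from each engine, then rank 2, and so on.
    for tier in zip_longest(*ranked_lists):
        for doc_id in tier:
            if doc_id is not None and doc_id not in seen:
                seen.add(doc_id)
                merged.append(doc_id)
    return merged


# Hypothetical back ends, each returning ids in its own relevance order.
def dms_search(query):
    return ["dms:17", "dms:4"]


def mail_search(query):
    return ["mail:9", "dms:17", "mail:2"]


merged = federate("intranet", [dms_search, mail_search])
```

The federating layer never sees the documents themselves, which is why result quality depends entirely on the engines behind it.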
  • 46. The Enterprise Search market
  • 47. The Enterprise Search Market. What’s the vendor’s focus?
    - Business Intelligence
    - Text mining (linguistic support!)
    - E-Commerce
    - Image/Video: visual information retrieval
    - Audio/Video: speech recognition
    - eDiscovery
    - …
  • 48. The Enterprise Search Market. Enterprise search products can be:
    - Specialized: products that use search to address a need in a specific area like customer service or to supplement business intelligence platforms
    - Integrated: products that merge search capabilities with other information management functions like content management, collaboration or analytics; the goal of these products is to become deeply ingrained in the technology portfolio so that the use of the tool becomes a ubiquitous part of the information workplace
    - Detached: products like Google’s appliance, focused on ease of deployment and flexibility
  • 49. The Enterprise Search Market. Forrester (September 2011) evaluated twelve vendors/products in its Market Overview (not including open source):
    - Autonomy IDOL 7 (acquired by HP)
    - Attivio AIE 1.3
    - Coveo Platform 6.5
    - Endeca Latitude 2 (acquired by Oracle)
    - Exalead CloudView 5.1
    - Fabsoft Mindbreeze 5.0
    - Google Search Appliance 6.8
    - IBM Content Analytics with Enterprise Search 2.2
    - ISYS Enterprise Server v9.7
    - Microsoft FAST Search for SharePoint Server 2010
    - Sinequa ES 7
    - Vivisimo Velocity 8.0
  • 50. The Enterprise Search Market. Important trends:
    - Social and collaborative features
    - Mobile support
    - Audio/Video
    - Cloud
    - Spatial support
    - Semantics/text analytics
    - Search Based Applications (“SBA”)
  • 51. Wrap up. Search technology platforms are mature and are available on the market in abundance and in multiple flavors. But make sure you are:
    - Cost-effective (what’s the business case? Priorities?)
    - Consistent in content classification and governance
    - Continuously monitoring usage and improving relevance
    - Clever & pragmatic
    - Creative (user interface, multi-device)
  • 52. Thank you!