Endeca business white paper for media and publishing


Information Access Solutions for Media and Publishing
Endeca Business White Paper

ENDECA
55 Cambridge Parkway, Cambridge, MA 02142
Telephone 617.577.7999

TABLE OF CONTENTS

1. INFORMATION ACCESS AND RETRIEVAL CHALLENGES
  1.1. Introduction
  1.2. The Negative Business Impact
    1.2.1. Poor customer acquisition and retention
    1.2.2. Lost revenues
    1.2.3. High technology costs, high content management costs
  1.3. Technology Obstacles to Traditional Information Access and Retrieval
    1.3.1. Search behavior: human-centered design
    1.3.2. Why traditional search technologies often fail
    1.3.3. Managing complex content
2. THE ENDECA PLATFORM – MAXIMIZING INFORMATION ACCESS AND RETRIEVAL OF ALL KINDS OF DATA
  2.1. Guided Navigation – A Breakthrough Technology
    2.1.1. Faceted navigation overcomes the limits of taxonomy solutions
    2.1.2. An intuitive, easy-to-use interface
    2.1.3. The power to search both structured and unstructured data
  2.2. Advanced Search Features
    2.2.1. Integrated search and Guided Navigation
    2.2.2. Sharp answers to fuzzy questions
    2.2.3. Adding structure to unstructured content
    2.2.4. Targeted searching
  2.3. Content Spotlighting
  2.4. Additional Platform Features
    2.4.1. Single interface to multiple data sources
    2.4.2. Open architecture
    2.4.3. A high-performing, low-cost infrastructure
3. ENDECA ROI
  3.1. Improved Customer Retention and Acquisition
  3.2. Increased Revenues
    3.2.1. Transaction revenues
    3.2.2. Advertising revenues
    3.3.3. Subscription and registration revenues
    3.3.4. Licensing revenues
  3.3. Lower Total Cost of Ownership
4. CONCLUSION
5. FOOTNOTES

1. INFORMATION ACCESS AND RETRIEVAL CHALLENGES

1.1. Introduction

Information providers of all types, including directories, news and magazine publishers, and multimedia content suppliers, continue to make substantial investments in their online business. This market is still growing quickly, as consumers spend a larger percentage of their time online, and revenue dollars follow them. For example, online ad spending grew 28.6% in the second quarter of 2005 while newspaper print ads grew only 1.9% in the same period.¹

To take best advantage of this opportunity, traditional media and publishing companies have diversified into online delivery – investing heavily in popular web technologies and IT resources. But this is only half the battle. The fight for market and wallet share on the web is equally fierce. New, free online media – web search engines, portals, and blogs – are gaining momentum in the traditional media space. In 2004, 37% of households used the free web as their only information source.² This free content puts downward pressure on margins – and turns content into a commodity.

To win online, media and publishing companies must differentiate themselves by offering not only premium content, but also a better user experience. But getting content online – and, more important, making it easily accessible – isn’t simple. Backlogs of proprietary information are often huge, and content and data sources are proliferating almost exponentially. Even once this information is online, it usually isn’t easy to find. Users input a query and often get a million results or, worse, get no results. Then they have no meaningful way to browse further to find what they are looking for, aside from taking another shot in the dark. This is the same frustrating search experience they have on the free web.

But for companies in the business of providing information online – which have to compete with the free web – successful information access is paramount.
Poor search has a serious negative business impact on customer retention and acquisition and, consequently, revenues. Information access failures can affect short-term and long-term profitability.

What can companies do about this problem, and why should they do it, considering their already significant IT investments? Here’s a closer look at how these search failures negatively affect the business, why they occur, and what new solution can improve search success to help differentiate a site and provide competitive advantage.

1.2. The Negative Business Impact

Over time, information access and retrieval difficulties produce negative business consequences in three areas – customer acquisition and retention, revenue, and profitability.

1.2.1. Poor customer acquisition and retention

With an increasing array of information sources available – including commodity search engines like Google and Yahoo and content aggregators such as wikis and blogs – premium content providers must first attract customers away from these free resources, and then ensure that they stay on the site. Even if there is unique content on the site, users will quickly abandon it if they aren’t able to find that content. Customers will stay on the site and continue to come back to that site if they believe that there is valuable content available and they have an easy way to access it.

These search failures can have a short- and long-term impact on the success of the business. In the short term, search failures diminish customer satisfaction and loyalty and, if constantly repeated, result in failed relationships and lost customers. What’s more, if enough customers defect because of poor search and retrieval, negative word-of-mouth spreads. The ability to attract new customers is hampered, brand equity is destroyed, and market share begins to deteriorate.
The company is suddenly at a significant competitive disadvantage.

In short, just providing users with premium content isn’t enough to get a competitive edge if that content is not easily accessible. A superior user experience – one that is better than the hit-or-miss searches of free Internet resources – is a necessary differentiator to encourage customer loyalty and site usage.

1.2.2. Lost revenues

Information access failures also lead to lost revenues in several different areas, including subscription and licensing revenue, transaction or sales revenue, and advertising revenue.

© 2006 Endeca Technologies, Inc. All rights reserved. Endeca, Endeca Latitude, Endeca Navigation Engine, and Endeca Data Foundry are registered trademarks. Guided Navigation is a service mark of Endeca Technologies, Inc. All other product and service names mentioned herein are or may be registered trademarks or trademarks of their respective companies or organizations.

Online publishers that rely on subscription fees as a primary revenue source need an effective information access solution in order to acquire and retain subscribers. If users can’t easily find relevant content, they won’t see the value in paying for a subscription or in renewing a current one. As a result, overall revenues from the subscription service will decline. Multimedia content providers that depend on licensing revenue will also see a decline in business if their customers fail to renew their license agreements because they are unable to find the content that they are seeking. Competitors with better search experiences will steal these users.

A primary source of revenue for some media and publishing companies may be from sales transactions – commonly referred to as “pay-per-piece” transactions. For example, in addition to their subscription-based services, market research firms sell their reports individually, and some online publishers sell certain articles individually. Multimedia content providers may sell photos and other graphic images, audio files, and video files on a per-piece basis. Each failed search is a lost transaction.

Poor search also has a subtler, but just as significant, negative impact on advertising revenues. Over the past few years, the focus for directories and information publishers has primarily been on getting their data from print to the online publication. As a result, the search experience is poor, and users are not getting the value that they expect from these sites. Now add to this content access problem the fact that advertisers spend money on sites that have high page views and give them the ability to target ads to relevant customers.
The result is twofold: low traffic, as users quickly abandon the site to look for an alternative source to find the information, and also low advertising revenue, as advertisers abandon the site because their click-through and conversion rates are low.

To lure advertisers to their site, companies need not just a certain volume of traffic, but high-quality traffic, which can typically be measured by the number of unique visitors to the site and the number of page views per unique visitor or per session. These traffic figures speak directly to the volume and quality of the traffic, and, consequently, affect the advertising revenues generated by the site. Publishers charge advertisers on a CPM (cost per thousand impressions) or CPC (cost per click) basis. If an advertiser believes that it is going to receive high-quality traffic (and a large volume of it), it will be willing to pay a higher CPM or CPC.

Site traffic also affects the amount of the ad inventory. The fewer the page views, the less ad inventory there will be to sell to potential advertisers. In the worst-case scenario, advertisers fail to patronize the site at all because of the potentially poor traffic metrics. Consequently, and once again, search failures can have a negative impact on another important revenue stream for media and publishing companies – advertising revenue. A high-quality user experience, especially one that provides easy access to sought-after content, can and will determine the amount of ad revenues generated in the short and long term.

1.2.3. High technology costs, high content management costs

Most media and publishing companies have already made a large investment in hardware, software, and IT talent in order to implement their site and get their premium content online. Adding search capabilities often requires multiple expensive servers to handle the volume of traffic and the added complexity of searching large volumes of data.
What’s more, with most search technologies, maintaining and updating the content once it’s online can be equally expensive, especially if the site requires the creation and maintenance of hard-coded taxonomies. Other search engines may also require expensive hardware in order to add additional content or update the existing content because of the complexity of the software application. To ensure a high-quality user experience, business user tools are incredibly important but are often non-existent, expensive, and/or difficult to use. When the number of users and the amount of revenues stagnate or decline, all of these costs together can result in a high total cost of ownership and, consequently, lower total profitability.

These negative impacts are interrelated. Their common underlying problems lie in the inherent limitations of commonly used search technologies, especially for searching complex information collections involving multiple data sources that include both structured and unstructured data types.

1.3. Technology Obstacles to Traditional Information Access and Retrieval

Many web design packages and other related software, such as content management systems, come equipped with search capabilities. Companies often choose to either leverage these existing systems for search or buy other traditional search technologies. These companies quickly discover that this traditional search functionality is limited in its ability to actually find the relevant information. This is particularly true if the data has both structured and unstructured characteristics, is large in volume and complexity, and resides in disparate repositories. A closer look at human search behavior and traditional search technologies shows why these limitations exist.

1.3.1. Search behavior: human-centered design

When searchers are having a hard time finding the right content, it’s not for lack of ingenuity; it’s for lack of the right tools to match their inventiveness and flexibility. While many search software companies have upgraded their technology to improve existing tools, the right approach lies in first studying human searching behavior to figure out which tools are the right tools to build. While such user-centered design is considered a best practice in other endeavors, it has been overlooked in the critical field of information access and retrieval – until now.

Research conducted by top information scientists and user experience experts supports the notion that people looking for information follow consistent behaviors over a wide range of tasks. They follow a particular pattern of behavior as they initiate a task, which changes as they proceed through the task, and then as they either finish or abandon a task. In order to continue to make progress, they need a different set of tools at each step of the process. The appropriate approach, missed by traditional search technology, arms users with all the tools they need to find what they want. When users finally have the right tools, the tools themselves feel intuitive and transparent, creating a superior user experience and resulting in customer satisfaction and loyalty. In short, the path to business objectives lies in supporting user goals.³

1.3.2. Why traditional search technologies often fail

Traditional search technologies each have their inherent limits:

• Keyword search: The effectiveness of keyword (or full-text) search relies on users to predict what might constitute a good query, yet paradoxically, they don’t yet know enough about the content to know what to ask. Prediction fails because wrong guesses yield the extremes of “no results” or too many answers, especially in response to broad keywords – and no helpful guidance on why their prediction failed.
For example, searching on a generic term like “sports” in a large collection of news articles can generate thousands of results. Furthermore, relevance algorithms fail to put the most useful results at the top of long lists. If users don’t know precisely what to ask for as they initiate their search task and there is no effective way to narrow the list of results, users will quickly abandon the site and look for the information elsewhere.
• Navigating taxonomies or fixed classifications: For information seekers to navigate fixed hierarchical taxonomies, they have to make predictions about where to find the content they want. Are all the possible articles about the war in Iraq under the “International” branch in “News”? Certainly not. There may also be articles about the war in Iraq under “Terrorism” in the “Politics” branch. In other words, some information may be hidden because customers don’t know the “right” branch of the hierarchy to select. They need to choose the path that will lead them to the right content, which means they have to make the right decisions about which branch to choose at each decision point. If the search isn’t productive, there’s no way to know why, and there isn’t any way to adapt or iterate their behavior in order to make progress. Moreover, fixed taxonomies are expensive to maintain because successful search paths are hard coded and need to be changed as the data changes.

The limitations of these different technologies are especially apparent in searching the complex content offered on these sites.

1.3.3. Managing complex content

Media and publishing companies have several different types of data in various repositories and collections, each with its own access and retrieval challenges:

• Directories primarily have highly structured data, typically located in databases that are frequently updated.
If the directory combines data from several databases, there may be different metadata or taxonomies for each database, especially in cases where the data comes from external sources (for example, aggregating a number of regional Yellow Pages print directories). In addition, a directory site may also offer some unstructured data. For example, a job site might have multiple data repositories – for job postings, company profiles, and lists of employers – but also a collection of unstructured data in the form of resumes. Combining structured and unstructured data and combining data from different repositories have been particularly challenging for traditional search technologies. This is due to the fact that most technologies were built on the assumption that all data would be in the same format and would be located in the same repository.
• Online publishers primarily have unstructured data, i.e., long-form text documents. Looking through thousands of results – each with pages and pages of text – for the right piece of information is a tremendous challenge for users.
That’s why it’s critical for companies to supply readers with an intuitive experience that allows them to easily and quickly identify which document is the most relevant. Traditional search technologies do not have the capabilities to extract structure from unstructured data. Yet this structure is necessary to provide users with the context they need to make these refinement decisions. Some search technologies offer rigid taxonomies or categorization schemes, but that won’t suffice either, for the reasons discussed above. In addition, most online publishers use sophisticated content management systems to store, tag, and publish the data. Consequently, it is essential that the search engine has the capability to extract data from these systems in order to allow for fast and flexible indexing.
• Multimedia producers and suppliers have a completely unique type of data in the form of images, audio, and video – requiring flexible and powerful indexing and search capabilities. In these instances, it is even more important for the search technology to leverage the metadata associated with the content because that’s where most of the context for searching resides. This information may be held in a digital asset management system – requiring adaptors to extract it for data collection.

If the common search technologies discussed above are employed in cases where complex data exists, information retrieval challenges will persist – and information seekers will avoid using the applications whenever possible.

The good news is that there is now a breakthrough solution designed to overcome the “million or none” results impasse of traditional search technology and to access and retrieve all kinds of data across diverse systems. All of this is possible without encountering the high implementation and maintenance costs that are associated with many site implementations.

2. THE ENDECA PLATFORM – MAXIMIZING INFORMATION ACCESS AND RETRIEVAL OF ALL KINDS OF DATA

Endeca provides solutions and best practices designed specifically for each kind of information provider: directories, online publishers, and multimedia suppliers. Underlying these three solutions is a common technology core: the Endeca Information Access Platform, which includes the Endeca Navigation Engine. This core technology overcomes the limitations of traditional search engines and addresses the data challenges facing media and publishing companies.

Built on the Endeca Information Access Platform, Endeca solutions for directories, online publishers, and multimedia content providers offer a single, fast, easy, and effective way to search and browse large volumes of data in structured and unstructured formats – across all types of systems. Endeca solutions integrate search and navigation, providing the flexibility and control needed to allow users to search intuitively and effectively. These solutions also return the results of all searches in a precise navigation context that improves users’ future predictive search choices, gives them relevant tools to adapt their search at each stage, and encourages meaningful search iteration and revision.

At each stage of the search process, customers progress toward their goal, which means that they are staying on the site longer and returning to the site more often, resulting in an increase in site activity (for example, page views, click-through rates, and session duration). And because they ultimately find what they want, they are satisfied and become long-term, loyal customers. The Endeca technology platform also offers a low total cost of ownership and is designed for ease of installation and use.

Based on nine pending patents, the Endeca Information Access Platform includes the following features and capabilities, which make these customer and financial benefits possible.

2.1. Guided Navigation – A Breakthrough Technology

The Endeca Information Access Platform includes the Endeca Navigation Engine, which executes innovative browsing technology called “Guided Navigation.” This helps users refine and explore relevant results to overcome the “million or none” obstacle, so they can quickly and easily find what they are looking for and even discover information they didn’t know existed. Specifically, Guided Navigation provides:

2.1.1. Faceted navigation overcomes the limits of taxonomy solutions

In general, navigation helps users who are not familiar with data to ask smarter questions by exposing all the choices that are available to them. But Guided Navigation goes far beyond current browse solutions by making a new kind of navigation possible. Based on faceted navigation, a multi-dimensional approach advocated by information scientists as a far more efficient and easy-to-use way to find information than taxonomies, Guided Navigation:

• Creates hundreds of valid browse paths to each record, rather than just the few paths available in a taxonomy, tremendously increasing the likelihood that a user will find a record.
• Allows users to prioritize their choices in their own personalized way rather than forcing users down the arbitrary path of the taxonomist.
• Updates all navigation options at each click, showing users all the valid questions they can ask next and eliminating millions of possible dead-end paths.
• Integrates fully with search, making it possible to refine long lists of search results and to search navigation options. (See Section 2.2.1 below for additional details about integrated search and Guided Navigation.)

2.1.2. An intuitive, easy-to-use interface

Simply calculating which questions users can ask next is not enough to facilitate search success because there are usually thousands of choices. In fact, the best way to organize navigation options changes markedly as users narrow from a vast space down close to a result. Adapting to the changing situation, Guided Navigation intelligently reorganizes those options with each click in the most meaningful, relevant way – and presents those new choices in a clear on-screen list that shows users the next step.

Figure: Guardian Unlimited has seen a significant increase in search activity on the site as readers use Guided Navigation to browse and refine their search results.

The result is a more effective interface, strongly preferred by end-users over traditional solutions, that provides easy access to the power and flexibility of Guided Navigation. Users see progress as they search and discover related information they didn’t know existed, so they remain on the site, exploring the data and finding relevant content. Because their search experience is meaningful and successful, they are satisfied and consistently return to the site.

2.1.3. The power to search both structured and unstructured data

Although media and publishing companies have different data profiles, Endeca’s technology was built with inherent capabilities to meet their different needs. Endeca can handle a wide range of data formats: from unstructured documents with basic metadata or fielded information; to semi-structured customer data, product information, XML pages, and auto-classified documents; to highly structured parametric data and databases. In fact, the Endeca Information Access Platform allows users to seamlessly bridge and explore large content collections consisting of structured data, unstructured data, or both, from all kinds of sources: content management, digital asset management, and other enterprise systems; relational databases; file servers; websites; intranets; and portals. Endeca technology also supports more than 350 file formats and 250 languages.

But searching structure is not enough; users must be able to navigate structure to leverage its full value. However, databases and search engines are optimized for either structured data or unstructured data and miss the full value in bridging the two.
The Endeca Information Access Platform captures the most valuable aspect of structure: navigating relationships between records. In a patent-pending process called “meta-relational indexing,” the Endeca Navigation Engine builds out all the latent connections between structured and unstructured elements in the data. This indexing process enables it to handle sources with differing metadata and taxonomies as well as unstructured data. As a result, customers can find what they’re looking for because they’re searching within a relevant context, and sites eliminate costly labor expenses typically associated with the taxonomy and content management process.

2.2. Advanced Search Features

Endeca incorporates best-of-breed search functionality to help users quickly and easily find the information they need. Unlike other search solutions, it gives better results by analyzing information in context and leveraging structured, unstructured, and relational information to give users the most meaningful results. Specifically, it provides:

2.2.1. Integrated search and Guided Navigation

Traditional enterprise search applications create artificial distinctions between search and navigation and between structured and unstructured information because they are designed around legacy technology limitations. Endeca is the first solution to fully integrate search and navigation, giving users the speed and power to search, and bridge, structured and unstructured information in their searches.

• Guided Navigation: Analyzing search logs reveals that users typically enter broad one- or two-word queries for the vast majority of searches, leading to a uselessly long list of results. Guided Navigation solves this pervasive problem by instantly returning the results of all searches in a precise navigation context that shows users all the valid ways to refine and explore further. The navigation context exposes and organizes structure associated with search results in a meaningful way to help users find information.
• Combination of navigation category and full-text matches: Search queries are resolved against both structured navigation categories (which link to more relevant results) and full-text fields (which return a more extensive set of results). For example, a search for “Florists” in a directory application returns a category match like “Personal Services > Florists,” navigation categories such as “Events & Occasion,” and navigation refinements such as “Funerals,” as well as a ranked list of businesses that are most relevant to the word “Florists.” In an online publishing application, a search for “Iraq” returns a category match like “International Relations > Iraq,” navigation categories such as “Publication Year,” and navigation refinements such as “2005,” as well as a ranked list of articles with the word “Iraq” in the title, author, body, or other critical fields.
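
The behavior described in this section can be illustrated with a minimal sketch. It is not Endeca's implementation or API: the sample records, the `search` function, the facet names, and the naive term-count scoring are all assumptions made for illustration. It shows one query being resolved against both facet values (category matches) and free text (ranked results), with the remaining refinements computed only from records that actually match, so no refinement leads to an empty result.

```python
from collections import Counter

# Hypothetical directory records: free text plus structured facet values.
RECORDS = [
    {"name": "Rose & Thorn Florists",
     "text": "florists for weddings and funerals",
     "facets": {"Category": "Personal Services > Florists", "Occasion": "Weddings"}},
    {"name": "Lily Lane",
     "text": "family florists, funeral wreaths",
     "facets": {"Category": "Personal Services > Florists", "Occasion": "Funerals"}},
    {"name": "Party Planet",
     "text": "balloons and party supplies; we partner with local florists",
     "facets": {"Category": "Events > Party Supplies", "Occasion": "Weddings"}},
    {"name": "Quick Plumbing",
     "text": "emergency plumbing repairs",
     "facets": {"Category": "Home Services > Plumbers", "Occasion": "Emergencies"}},
]


def search(query, records, selected=None):
    """Resolve a query against facet values and full text, then list refinements."""
    term = query.lower()
    selected = selected or {}

    # Keep only records that satisfy every refinement already chosen.
    pool = [r for r in records
            if all(r["facets"].get(dim) == val for dim, val in selected.items())]

    # Category matches: facet values whose label contains the query term.
    category_matches = sorted({
        f"{dim}: {val}"
        for r in pool
        for dim, val in r["facets"].items()
        if term in val.lower()
    })

    # Full-text matches, ranked by a naive count of the term in name and text.
    scored = [(r["name"].lower().count(term) + r["text"].lower().count(term), r)
              for r in pool]
    hits = [r for score, r in sorted(scored, key=lambda pair: -pair[0]) if score > 0]

    # Refinements: count facet values among the hits only, skipping
    # dimensions the user has already refined on.
    refinements = {}
    for r in hits:
        for dim, val in r["facets"].items():
            if dim not in selected:
                refinements.setdefault(dim, Counter())[val] += 1

    return {"category_matches": category_matches,
            "refinements": refinements,
            "results": [r["name"] for r in hits]}


if __name__ == "__main__":
    first = search("florists", RECORDS)
    print(first["category_matches"])
    print({dim: dict(counts) for dim, counts in first["refinements"].items()})
    print(first["results"])

    # Selecting a refinement narrows the results and updates the options.
    narrowed = search("florists", RECORDS, selected={"Occasion": "Funerals"})
    print(narrowed["results"])
```

Recomputing the refinement counts from the current result set on every call is the simplest way to guarantee that each offered option is valid. A production engine would presumably precompute an index rather than scan all records per query.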

2.2.2. Sharp answers to fuzzy questions

Typical search engines respond with “no results found” to roughly 25% of queries, without giving users any confidence that the system even understood their query. This happens because users have no way to know the exact spelling, syntax, or word choices used in the underlying data. Endeca’s variant search uses linguistic analysis and the following techniques to fix many of these near misses, relieving users of the burden of having to know the precise terminology of the data before they can ask useful questions:

• Spell correction: Endeca’s smart algorithms combine phonetic analysis on search terms and underlying data to correct misspellings and detect alternate spellings. This patent-pending technology is based on the data in the particular data set, removing the need to build and maintain a custom dictionary. Yet companies can tune the phonetic spelling corrector to make trade-offs between search precision (i.e., getting only the exact or very close results) and search recall (i.e., returning more results to ensure that data relevant to the user’s search isn’t missed).
• Word stemming: Linguistic analysis of data finds word form variations including plurals, prefixes, suffixes, and conjugations.
• Bi-directional thesaurus and synonyms: Customized thesauri and synonyms are implemented at both the navigation category and full-text level. For example, a user’s query for “sushi” in a restaurant directory can be expanded to return the navigation category “Cuisine > Japanese” and/or all items with the words “Japanese cuisine” in their text description.
Moreover, Endeca technology can perform asymmetrical synonym matches, in which a search for “Iraq” would also return articles containing the keywords “Baghdad” and “Saddam Hussein,” but a search for “Saddam Hussein” may not return all articles with the keyword “Iraq.” What’s more, synonyms can be maintained over time with simple GUI tools, and regular search logs can help identify new terms to add to the thesaurus and list of synonyms.
• Relevance ranking: Endeca’s unmatched, highly configurable relevancy ranking makes sure that the right results are at the top of the list. Endeca offers a variety of relevancy ranking modules that take into account a broad range of factors including term frequency, word positions and proximity, document date, document popularity, what field the term occurs in – and many other characteristics. These modules can be flexibly tuned and combined to execute sophisticated, customized search strategies that optimize information retrieval in the context of a specific application – rather than just offering a black-box approach to relevance like many competing solutions. Developers can even combine modules in different ways to create different search strategies within one application. For example, the relevancy ranking can change depending on which specific set of documents a user is searching, which specific part of the application a user is searching, or even which user is searching.

2.2.3. Adding structure to unstructured content

Endeca is a leader in extracting and exploiting structure from semi-structured or unstructured data.⁴ This occurs during its data transformation and indexing processes by a number of methods:

• Entity extraction: Endeca automatically extracts entities – people, places, and organizations – found in unstructured documents based on a variety of natural language processing techniques and statistical inference. In addition, the extraction process is self-training.
Once a new type of entity has been extracted in a number of documents – for example, product names – Endeca subsequently extracts product names automatically as metadata during the indexing process.
• Inherent metadata: Endeca can extract metadata – data about documents such as their date of creation, file type, and file size – from more than 370 file types, including documents with no inherent structure such as Word and PDF files. This valuable information is then used by Endeca’s Guided Navigation and search features for information access and retrieval. This capability is particularly powerful in cases where documents have some consistent metadata – for example, in content management systems – and is critical for unstructured data.
• Contextual metadata: Endeca can also extract and leverage existing information about records held in a file system. For example, the file structure, including elements of the file path, can be parsed and added to the record as metadata. A document containing information about a company’s next product release may be found using a file path such as “Product Management > 2005 Product Releases > Product Release 2.0.” This information can then be used for making search refinements through Endeca’s Guided Navigation capabilities. In cases where file structures are very hierarchical, this process can add several layers of metadata.
© 2006 Endeca Technologies, Inc. All rights reserved. Endeca, Endeca Latitude, Endeca Navigation Engine, and Endeca Data Foundry are registered trademarks. Guided Navigation is a service mark of Endeca Technologies, Inc. All other product and service names mentioned herein are or may be registered trademarks or trademarks of their respective companies or organizations.
• Concept extraction: Endeca offers the ability to extract key concepts from unstructured data via existing or imported, pre-built thesauri involving hundreds of industry-standard taxonomies in dozens of subject domains and languages. These thesauri also expand queries to include related terms.
• Rules-based tagging: Endeca can use rules to add still more tags to documents during its process of acquiring content from original sources. Rules can be as simple as tagging all documents containing the text “MSFT” or “Microsoft” with <Microsoft> or as sophisticated as employing Boolean syntax and developing a rule stating <if X AND Y> and <date=June03> add <TAG> for records from June 3 that include both X and Y. To facilitate implementing rules-based tagging, Endeca leverages industry-standard thesauri, taxonomies, and controlled vocabularies.
2.2.4. Targeted searching
Users have a powerful but easy-to-use suite of functionality to hone the recall and relevance of their results:
• Search within results: Users can refine their search process by launching iterative searches against their results. (They can also refine results with Guided Navigation.)
• Parametric search: A parametric search interface gives users the option to simultaneously filter by ranges of information along multiple navigation dimensions. The parametric search options dynamically update as the user selects refinements, so that the user never reaches a dead end: he or she can only select a combination of refinements that leads to actual, relevant results.
• Dynamic concept discovery: Endeca offers users the ability to refine results by concept clusters. For example, a search for “eagles” will return thousands of relevant articles in an online publishing application.
Endeca’s technology will then help users refine the results to get to the article they’re looking for by presenting clusters of articles relating to unique but relevant key concepts – for example, the sports team (Philadelphia Eagles), the band (Eagles), and the birds.
• Automatic phrasing: Endeca treats a series of words – for example, “Tom Cruise” – as a single phrase, improving the relevancy of results. In this case, it might be set to return only documents where “Tom” and “Cruise” are adjacent, greatly enhancing the precision of results. Endeca can also offer users the opportunity to opt in or opt out of the phrasing.
2.3. Content Spotlighting
Content Spotlighting is an out-of-the-box capability for highlighting specific, relevant content on-screen as well as generally grouping or arranging search results – based on defined business rules. Frequently used in merchandising for cross-selling and up-selling, it can also be used to disclose popular or richer content related to a query or for targeted advertising. For example, if a user is searching for articles on the “Red Sox” in an online publishing application, the business owner could use Content Spotlighting to highlight premium content that is available only on their site, such as live highlight videos, player statistics, or articles from featured sports columnists. If a user is searching for a high-paying nursing job in the Buckhead area of Atlanta on an online job site, the business owner can use Content Spotlighting to offer its advertisers (hospitals) the opportunity to buy highly targeted advertising inventory (for example, on web pages with content on nursing, high salary range, Buckhead) instead of just the category “Nursing”.
Integrated with search and Guided Navigation, Content Spotlighting is data-driven, interactively responding to users’ search activity – as specified by the business rules. It can be triggered by search terms or Guided Navigation choices.
It can also be triggered by user profile information. During a query, rules are dynamically selected to provide users with the most relevant content possible – i.e., content related to both what they are looking for and to the user’s profile (for example, demographics, click behavior, etc.). This capability represents an advanced feature that other search technologies can’t provide dynamically and at scale.
As a result of these features, Content Spotlighting significantly helps users find what they are looking for and, more important, frequently enables them to discover information and content that they didn’t know existed. It also enables companies to promote premium or featured content and highly relevant and targeted advertisements. In this way, it boosts search effectiveness and efficiency and creates a very compelling user experience. Business owners can use Content Spotlighting to highlight the premium content that’s available on their site (and only their site) and help users see the value in the paid subscription or registration. This contributes to greater customer satisfaction and loyalty and creates site “stickiness” and repeat usage.
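The rule-driven triggering described above for Content Spotlighting can be illustrated with a minimal sketch. This is not Endeca’s implementation or API: the `SpotlightRule` class, its fields, and the sample rules are all hypothetical names chosen for illustration. It only shows the general idea of selecting promoted content when a rule’s search-term, navigation, and user-profile triggers all match.

```python
from dataclasses import dataclass, field


@dataclass
class SpotlightRule:
    """Hypothetical business rule; it fires only when every trigger matches."""
    name: str
    promoted_content: str
    search_terms: set = field(default_factory=set)  # any phrase found in the query
    navigation: dict = field(default_factory=dict)  # required dimension -> value
    profile: dict = field(default_factory=dict)     # required profile attribute -> value

    def matches(self, query, selections, user_profile):
        text = query.lower()
        # An empty trigger is treated as "no constraint".
        term_ok = not self.search_terms or any(t in text for t in self.search_terms)
        nav_ok = all(selections.get(k) == v for k, v in self.navigation.items())
        profile_ok = all(user_profile.get(k) == v for k, v in self.profile.items())
        return term_ok and nav_ok and profile_ok


def select_spotlights(rules, query, selections, user_profile):
    """Return the promoted content of every rule triggered by this request."""
    return [r.promoted_content for r in rules
            if r.matches(query, selections, user_profile)]


# Illustrative rules loosely based on the examples in the text (assumed data).
RULES = [
    SpotlightRule("sox-video", "Live Red Sox highlight videos",
                  search_terms={"red sox"}),
    SpotlightRule("nursing-ad", "Hospital recruiting ad",
                  navigation={"Category": "Nursing", "Area": "Buckhead"}),
    SpotlightRule("subscriber-upsell", "Premium columnist archive",
                  search_terms={"red sox"}, profile={"subscriber": False}),
]

print(select_spotlights(RULES, "Red Sox score", {}, {"subscriber": False}))
```

In this sketch a business user would only edit the rule data, not the matching logic, which mirrors the separation between business rules and IT-managed code that the text describes.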
World Book makes it possible to search content of all types, supplementing articles in multiple languages with rich media including videos, audio clips, photos, and structured tables – from multiple content repositories.
Content Spotlighting is also easy to implement and manage – even for complex content collections. Business users – without IT help – can easily define the rules that drive Content Spotlighting placements using an intuitive, web-based Endeca interface designed specifically for their needs, versus the needs of the IT department. Once the rules are implemented, they are updated dynamically, and changing the parameters is easy. As a result, the need to use costly IT resources for these tasks is eliminated, and business managers spend less time managing the placements – decreasing costs overall.
2.4. Additional Platform Features
In addition to supplying users with unique technology that promotes search success and increased site activity, the Endeca platform is designed for ease of implementation and maintenance, lowering the burden on IT resources and providing companies with a successful information access and retrieval solution with a low total cost of ownership.
2.4.1. Single interface to multiple data sources
As mentioned, content often originates in separate data stores or includes various document formats and structured data schemas. The Endeca Information Access Platform crosses these boundaries to give users a seamless and single access point to all data, regardless of its origin. A search might transparently cross, for example, image files, XML files, and PDFs because Endeca supports:
• Multiple formats: Endeca can search the most popular document formats including PDFs, Word docs, HTML, and many more (over 350 different file types). Likewise, structured data might originate in an RDBMS, XML database, or many other sources.
• Multiple data sources: Data can originate in separate silos, and users can search all sources from a single interface.
• Permissions: Individuals and groups can gain access to subsets of data based on their login ID. Guided Navigation options always reflect only the valid choices available to a specific user, giving everyone a customized view.
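The way permissions and navigation options interact can be sketched in a few lines. The example below is only an illustration under stated assumptions, not Endeca’s engine: the record layout, the `groups` access field, and the function names are hypothetical. It computes refinement options only from records the user may see and that match the current selections, so every option offered leads to at least one result.

```python
from collections import Counter, defaultdict

# Assumed sample records drawn from several sources; "groups" is a
# hypothetical access-control field listing who may see each record.
RECORDS = [
    {"id": 1, "Type": "PDF", "Source": "CMS", "groups": {"staff", "public"}},
    {"id": 2, "Type": "XML", "Source": "RDBMS", "groups": {"staff"}},
    {"id": 3, "Type": "PDF", "Source": "File server", "groups": {"public"}},
    {"id": 4, "Type": "Image", "Source": "CMS", "groups": {"staff"}},
]
DIMENSIONS = ("Type", "Source")


def visible_records(records, user_groups, selections):
    """Records the user is allowed to see that match all current selections."""
    return [
        r for r in records
        if r["groups"] & user_groups
        and all(r[dim] == value for dim, value in selections.items())
    ]


def refinements(records, user_groups, selections):
    """Count remaining values per unselected dimension.

    Counts come only from visible, matching records, so no option
    can lead to an empty result set.
    """
    options = defaultdict(Counter)
    for record in visible_records(records, user_groups, selections):
        for dim in DIMENSIONS:
            if dim not in selections:
                options[dim][record[dim]] += 1
    return {dim: dict(counts) for dim, counts in options.items()}


print(refinements(RECORDS, {"public"}, {}))
print(refinements(RECORDS, {"staff"}, {"Source": "CMS"}))
```

A member of the `public` group is never offered the “XML” or “Image” refinements here, because no record visible to that group carries those values.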
2.4.2. Open architecture
The Endeca Information Access Platform extracts and integrates data from multiple disparate sources including relational databases, file servers, web sources (XML files), content management systems, and other packaged applications. It integrates with diverse source systems via packaged adapters and APIs, transferring data by a range of approaches including data extracts, adapters, web crawlers, file server crawlers, and its own SDK – the Endeca Content Acquisition Developers’ Kit for building custom adapters.
2.4.3. A high-performing, low-cost infrastructure
The Endeca Information Access Platform provides a powerful solution at a low total cost of ownership, based on the following features and capabilities:
• A standards-based architecture: Endeca integrates easily into the enterprise infrastructure. At the data level, Endeca has been designed to work with content from all kinds of systems and in all kinds of formats. It also integrates easily with other applications via a rich set of APIs. This flexibility makes the Endeca platform easy to deploy and allows companies to leverage their existing architecture.
• Easy scaling: Because Endeca is built on a distributed platform, it scales easily for both increasing data volumes and site traffic while maintaining fast search performance – just by adding inexpensive, commodity servers.
• High performance: Endeca provides sub-second response times to queries because its meta-relational indexing makes highly aggressive use of memory, multi-threading, index compression techniques, and cache engineering. This speed enhances the user experience, contributing to customer loyalty.
3. ENDECA ROI
Because of its innovative technology, the Endeca Information Access Platform meets the challenges of finding the right information in complex content collections.
Furthermore, media and publishing leaders have found that Endeca solutions are quick and easy to deploy and maintain, and are enthusiastically adopted by broad audiences of information seekers. As a result, they produce early and continuing ROI in several areas.
3.1. Improved Customer Retention and Acquisition
The Endeca Information Access Platform offers users a powerful, intuitive user experience that highlights premium content and differentiates the site from other commodity content sites – promoting customer satisfaction and, ultimately, customer retention and acquisition. Endeca’s fast and easy indexing gets content on-site quickly and cost-effectively – ensuring media and publishing companies have rich, up-to-date content that their competitors lack.
Because of Endeca’s powerful Guided Navigation, advanced search, and Content Spotlighting capabilities, customers can easily find the premium content they seek and can even discover previously unknown but relevant information. Features like Endeca’s intuitive interface, configurable relevancy ranking, and scalability also enhance the customer experience and ensure a search proceeds to the right result quickly and easily. For example, Endeca enabled World Book to increase the speed of its search eight-to-ten times over its previous technology while offering richer search results (i.e., images, maps, etc.) relating to the subject being researched.
In addition, Endeca’s reporting tools provide sites with information on usage and trends, like popular search terms, documents, or images. This information allows site developers to fine-tune features like relevancy ranking, thesauri, and Content Spotlighting to further enhance search success and direct customers to desirable and relevant content.
As a result of this positive search experience, customers spend more time exploring the site and finding even more relevant information. They also return to the site with increasing loyalty – and create a positive buzz.
This word-of-mouth, in turn, results in growing brand recognition and easier customer acquisition.
Customer results tell the story best. With Endeca solutions:
• Calls to customer support at Nando Media (a McClatchy Company) dropped by 15-20% because customers found what they wanted by themselves.
• World Book increased speed of search by 8-10x while providing richer search results.
• 78% of users of a leading classifieds directory preferred the new site over the old experience; 72% of those users pointed to Guided Navigation and the Endeca breadcrumb as the driver of loyalty.
3.2. Increased Revenues
Just as Endeca’s superior user experience helps to improve customer retention and acquisition, it also leads to higher revenues. For example, after a site upgrade featuring a new Endeca solution, World Book increased its sales by 20%.
With Endeca, revenue benefits can occur in several other areas. Most relevant to media and publishing companies, Endeca can help increase site activity, increase advertising revenues, and increase subscriptions and registrations.
3.2.1. Transaction revenues.
As companies get more of their premium information online easily and cost-effectively with Endeca’s indexing, and more customers find what they are looking for via advanced search and Guided Navigation, conversion rates rise, leading to increased revenues. Satisfied customers return to the site to look for more research reports or case studies, for example, and the number of purchases per unique visitor increases. As a result, Endeca customers have seen margins and overall transactional revenues grow significantly.
3.2.2. Advertising revenues.
As the number of customers and page views increases, this improvement in site traffic and traffic quality directly impacts ad revenues – attracting advertisers to the site and creating additional advertising inventory available for advertisers to buy.
In addition, as visitors explore the content more deeply and view a wider range of pages, there is more relevant and high-quality ad inventory to sell, and that ad inventory commands a higher price.
Just as important, the increase in site visits (from repeat and new customers) also improves the likelihood of higher click-through rates and a larger number of ad impressions (CPM and CPC rates) – especially because Content Spotlighting allows sites to target ads to pre-qualified customers based on their search and navigation paths. The result is more revenue generated per page view and per advertiser. For example, a leading newspaper publisher in the UK saw a stunning increase of 20% in page views and 40% in click-through rates.
3.2.3. Subscription and registration revenues
With all of the free content sites available today (for example, search engines, blogs, and content aggregators), it is difficult to justify subscription fees or even free registrations to your potential customer base. The most important way (and, ironically, the easiest way!) to show the value of the subscription fee is by improving the search experience, so that users can find the premium content that’s available only on your site. If users can’t find the content that really makes up the value of the subscription fee, there’s no way they’ll pay for access, and they won’t even take the time to register to access the content. In 2005, InfoCommerce Group reported that companies with subscription-based services lose 15-20% of their subscriber base each year because subscribers can’t find the information they are looking for, even though it actually exists. Additionally, 25% of paid registrants log into the service once, find that the experience is difficult and frustrating, and never log in again.
Obviously, that same 25% don’t renew their subscriptions.5 Endeca’s integrated search, Guided Navigation, and Content Spotlighting capabilities give companies the ability to highlight valuable content and users the ability to find valuable content. As a result, several of Endeca’s customers have seen increases in subscriptions and registrations as their users quickly see the importance of their content versus the free content sites.
3.2.4. Licensing revenues.
Once again, because Endeca easily enables users to navigate through content, find what they are looking for, and discover new content, site traffic increases. As a result, distributors and publishers are willing to pay higher licensing fees for access to premium data because they can see the value of the content and consistently find the specific piece of content they need to support their own businesses. For example, advertising agencies or news publishers are more likely to pay a higher licensing fee to a stock photography site if they have an easy and fast way to find and purchase the photos that they need for the print ad or article that will be released in tomorrow’s edition of the daily newspaper.
3.3. Lower Total Cost of Ownership
The easy-to-use, open technology of the Endeca Information Access Platform decreases total cost of ownership. While traditional search technologies with rigid schemas and taxonomies require intensive IT efforts to deploy and update and high-cost hardware to run queries, Endeca’s special approach to indexing and GUI-driven system tools allow for rapid system deployment and maintenance, including data cleansing and updates. Implementing and maintaining an Endeca solution is easier, less time-consuming, and, therefore, less costly – resulting in early ROI. For example, leading information provider IHS cut millions of dollars in IT labor costs over five years.
Furthermore, Endeca solutions run on commodity hardware, reducing the hardware expenses of traditional search. They also scale economically as more data and users are added to the system – just by adding commodity servers.
4. CONCLUSION
The Endeca Information Access Platform brings new information retrieval functionality – and significant financial and competitive benefits – to media and publishing companies. Built on innovative Endeca Guided Navigation® technology, it overcomes obstacles to retrieving complex information and exposes relevant content to users.
With access to this information through an easy-to-use interface and an intuitive, productive approach to navigating information, customers find what they are looking for and discover other relevant content. This successful search and browse experience encourages them to explore the site, viewing more pages and often purchasing or downloading more information per visit, as well as to return to the site. As a result, revenues – from transactions, subscriptions and registrations, ads, and licensing – grow.
And because Endeca technology is easy and cost-effective to use, deploy, and maintain, companies lower their total cost of ownership.
In other words, from its initial deployment and throughout its daily use, the Endeca Information Access Platform increases profits, lowers costs, and improves customer satisfaction – providing a competitive advantage. These advantages make it an economical – and critical – infrastructure application for media and publishing companies.
5. FOOTNOTES
1 Outlook, 2005
2 IDC, 2004
3 Research on this topic includes:
• Nicholas J. Belkin. School of Communication, Information and Library Studies at Rutgers University. An overview of his work can be found at /cladp97.html
• Stuart Card and Peter Pirolli. Information Foraging Theory. items/UIR-1999-05-Pirolli-Report-InfoForaging.pdf
• Jared Spool. User Interface Engineering Report.
• Don Norman. The Design of Everyday Things (Currency, 1990).
4 Forrester Research, “The Future of Enterprise Search,” 2003.
5 InfoCommerce 2005, The Conference for Data Publishers, November 6-8, 2005