What Lucene and Solr Open Source Search can do for Enterprise Search

Companies like Netflix, Zappos and Monster have all utilized Lucene/Solr, an open-source search development environment ideally suited for large-scale, enterprise search applications. Download this free white paper and

* organize your enterprise search requirements from both technological and economic perspectives
* identify the technological and economic advantages of Lucene/Solr open source search
* learn about support available for designing, developing, and deploying the necessary search solution
http://www.lucidimagination.com/whitepaper/lucene-solr-enterprise-search

Published in: Technology

Transcript

  • 1. Guidelines for Managers: What Lucene and Solr Open Source Search can do for Enterprise Search A Lucid Imagination White Paper
  • 2. What Lucene and Solr Open Source Search can do for Enterprise Search
    A Lucid Imagination White Paper • April 2009

    Abstract

    Lucene/Solr is an open-source search development environment ideally suited for large-scale, enterprise search applications. This paper provides some ways to think about your enterprise search requirements from both technological and economic perspectives, explains why a Lucene/Solr-based approach can be optimal, and describes how Lucid Imagination can help you to design, develop, and deploy the necessary search solution.
  • 3. Table of Contents

    Introduction
    Preliminary Considerations
        Know Your Business Requirements
        Know Your Data
        Know Your Users
    Advantages of a Lucene/Solr-Based Solution
        Technological Advantages
        Lower Cost, Greater Flexibility
    How Lucid Imagination Can Help
    Conclusion
  • 4. Introduction

    Markets are conversations. And today, increasingly communications-rich interactions within companies, and between companies and their stakeholders, are typically preserved and stored, creating ever larger reserves of documents and data. Effective access to a company’s data can be a strategic advantage of potentially enormous value.

    Email, office documents, databases, customer service chat logs, and content management systems, the data types representing all forms of communication in the company and with its marketplace, continue to grow in electronic form. It’s not just that the proverbial haystack is growing larger; it also has more types of hay, with many different types of needles to be found. At some point, every function in the company needs access to such data, and these needs can vary significantly across organizations.

    Search technology can be a standalone system designed to provide a single point of access to the entirety of a company’s data, irrespective of location, container format, or owner. Or it may provide search functionality as a component within another application. But enabling employees, customers, partners, investors, and other stakeholders to find the information they need when they need it is the goal of any enterprise search solution, no matter where it will be deployed or how it will be used.

    This white paper provides some ways to approach choosing and building enterprise search solutions, and discusses why Lucene/Solr open source search solutions supported by Lucid Imagination present key advantages. It starts with what must be considered when presented with an enterprise search problem, discusses some attributes of a Lucene/Solr-based solution that could be of special significance in selecting a solution strategy, and concludes by describing how Lucid Imagination can help to design, implement, and support a solution that meets your organization’s needs.
  • 5. Lucene and Solr are state-of-the-art search technologies available for free as open source from The Apache Software Foundation. Lucene is a powerful search library; Solr provides a platform built on top of Lucene that makes it easy to build Lucene-based applications [1]. Both incarnations are full-featured and have excellent performance, relevancy ranking and scalability.

    These technologies are used today by thousands of organizations. They power substantial and diverse search applications at AOL, CNET, Comcast Interactive Media, IBM, Netflix, LinkedIn, MySpace and many others. In many instances, Lucene/Solr solutions regularly index and search tens or hundreds of millions of documents with sub-second response time.

    Lucid Imagination is exclusively dedicated to providing robust commercial support for Apache Lucene/Solr open source search technology. Our products and services are designed for enterprises currently using or evaluating Lucene/Solr for their search solutions.

    Preliminary Considerations

    It is not unusual to think of the Web the minute search is mentioned, and with good reason: nowadays, even small companies can have a large Web presence, and most workers and consumers use the Web every day.

    [1] Most organizations use Solr today as their search development platform. As Lucene is the older of the two technologies, and serves as the core of Solr’s search capabilities, we’ll refer to them together, as Lucene/Solr. For more on the technologies, see http://www.lucidimagination.com.
  • 6. But even in small companies, web pages typically represent only a fraction of the important, text-based data to which stakeholders need access. Spreadsheets, slide decks, PDFs, project management files, electronic design documents, chat logs and e-mail may all contain information that will be critical in any number of business situations. Similarly, within even small companies there can be a need to support search usage models not typically found on Web-centric systems. For example, the ability to conduct collaborative searches may be critical to productivity in some contexts.

    To create an optimal enterprise search solution, it is essential to know your:

    • Business requirements: What needs must be met to create competitive advantage for your enterprise, and how will you know when they are met?
    • Available data: What and where is the content you have to work on, and how is it structured (e.g., does it form a natural “cascade” of sub-classes)?
    • Users: What do they need to search and how will they prefer to search for it?

    Along any of these dimensions, there is potentially huge variance from one case to another. The goal of the discussion here is not to provide an exhaustive checklist of issues. It is intended rather to suggest the sorts of questions that should be considered at the earliest possible stage of development.

    Know Your Business Requirements

    Applications for enterprise search and their associated requirements are as diverse as the organizations that need them and the data they need to search. However, there are two characteristics by which any search solution will ultimately be judged:

    • Performance. Does the system return results quickly enough to fulfill the expectations of the critical mass of users? How does it perform under peak loads? Will the performance scale adequately as usage increases? Is enough known about probable evolution to build the system in such a way that it will sustain projected growth with minimum enhancement, let alone wholesale re-structuring? Additionally, what is the cost associated with obtaining that scale?
    • Relevance. How well will the system find the data that the user needs, and how good a job will it do presenting query results in an appropriate way and in the best order? What techniques, implicit or explicit, are required to get user assessments of relevance?
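The two characteristics above can be made measurable. The following is a minimal sketch, not part of the white paper: it assumes you have a ranked result list, a set of documents that users judged relevant, and a sample of query latencies. The function names, document IDs, and latency values are all hypothetical.

```python
import math


def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k results that were judged relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / k


def recall(ranked_ids, relevant_ids):
    """Fraction of all relevant documents that appear in the results."""
    if not relevant_ids:
        return 0.0
    found = sum(1 for doc_id in set(ranked_ids) if doc_id in relevant_ids)
    return found / len(relevant_ids)


def percentile(latencies_ms, pct):
    """Nearest-rank percentile of observed query latencies."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_ms)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


if __name__ == "__main__":
    # Hypothetical data: IDs and timings are invented for illustration.
    ranked = ["d3", "d1", "d7", "d2", "d9"]
    relevant = {"d1", "d2", "d4"}
    latencies = [120, 80, 950, 200, 150, 90, 110, 300, 100, 130]

    print("precision@5:", precision_at_k(ranked, relevant, 5))
    print("recall:", recall(ranked, relevant))
    print("p95 latency (ms):", percentile(latencies, 95))
```

Tracking a high percentile rather than the average latency is one way to reflect the peak-load question raised under Performance; the choice of the 95th percentile here is an assumption, not a recommendation from the paper.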
  • 7. In some cases, additional criteria of success will focus on areas such as system security or legal and regulatory compliance. However, a focus on optimizing for both performance and relevance is essential to designing and building an effective search solution.

    Know Your Data

    As noted above, your search solution can include input data of any type stored in any format or container type, ranging from project-specific program management files to sets of database records with relevant unstructured text fields. The better you understand the data domain of the search system, the more effective your resulting searches will be, and the higher the probability that your system’s success metrics (starting with performance and relevance) will be achieved. By beginning with a data audit, you can gain an understanding of:

    • Number and types of documents. How many documents does your system need to support, and how big are they, both individually and in aggregate? Answers to these questions will have implications for performance design and planning. Similarly, knowing about document types, formats, etc., is essential to ensure adequate access by means of file filtering or other data preparation or pre-processing steps, and so is crucial both for performance and relevance.
    • Key fields. For structured or partially structured data, certain fields may carry more weight than others in determining relevance. For example, a document’s title may be assigned a higher relevancy weight than its size.
    • Internal information structure. Even less formal documents can have key structural attributes. Let’s illustrate by example: Imagine that the data domain of a search query includes the unstructured text of consumer-electronics blogs. Although the text itself is unstructured, the information within it may have a fair amount of structure, including, for example, names of manufacturers and their products, product capabilities such as storage or resolution, etc. The structure has a “shape” to it, from more general to more specific. Thus, a manufacturer’s name can be associated with many product names, product names with attributes or capabilities, etc.
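The general-to-specific “shape” described in the blog example can be pictured as a small nested structure. This is a sketch under stated assumptions: the manufacturer, product, and attribute names are invented, and the extraction step that would produce such records from raw blog text is not shown.

```python
from collections import Counter

# Hypothetical mentions, as if already extracted from blog text.
MENTIONS = [
    {"manufacturer": "Acme", "product": "Acme Cam 10", "attribute": "resolution"},
    {"manufacturer": "Acme", "product": "Acme Cam 10", "attribute": "storage"},
    {"manufacturer": "Acme", "product": "Acme Player 2", "attribute": "storage"},
    {"manufacturer": "Globex", "product": "Globex View", "attribute": "resolution"},
]


def build_hierarchy(mentions):
    """Group mentions as manufacturer -> product -> set of attributes."""
    tree = {}
    for mention in mentions:
        products = tree.setdefault(mention["manufacturer"], {})
        attributes = products.setdefault(mention["product"], set())
        attributes.add(mention["attribute"])
    return tree


def count_by(mentions, field):
    """Count how many mentions carry each value of one field."""
    return Counter(mention[field] for mention in mentions)


if __name__ == "__main__":
    tree = build_hierarchy(MENTIONS)
    for manufacturer, products in sorted(tree.items()):
        print(manufacturer)
        for product, attributes in sorted(products.items()):
            print("   ", product, "->", sorted(attributes))
    print(count_by(MENTIONS, "manufacturer"))
```

Per-field counts of this kind are the raw material for category-style navigation over search results; knowing the hierarchy early helps decide which fields to index separately.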
  • 8. Know Your Users

    The human dimension of the search solution of course presents the most variables of all. Who will your users be, and in what roles are they most likely to use the search application, e.g., consumer, research scientist, salesperson, manager, all of the above? Of equal if not greater importance than the “who” and “what” of your users is the “how.” Is there a need for collaborative search or to structure a “work flow” into the search process itself? Is there a need to provide different levels or types of access to different classes of users? Or will your application be extracting search results to feed to another application, without presenting it to users at all?

    Advantages of a Lucene/Solr-Based Solution

    We discussed above some key success criteria for enterprise search solutions primarily in terms of the performance and results relevance of the search application. As a business decision-maker, you may find that useful, but still a little too abstract. You are likely to ask yourself at some point: How will my enterprise search solution help me either to make or save money? While it is true that the Lucene and Solr software are free, there’s much more to it than the attractive price. Let’s take a closer look at the question of making and saving money using Lucene and Solr, from the vantage points of both technology and economics.

    Technological Advantages

    Lucene is the core search library; Solr is the logical starting point for most developers building search applications with Lucene/Solr technology for their web site, product, or internal organizational use. Let’s look at how Solr helps you build search, and then how Lucene executes it.

    Solr is a layer of code on top of Lucene that transforms Lucene into an enterprise search platform, and simplifies programming by extending to a broad variety of common, easier-to-use development environments. Key features include:

    • Web services. Solr places Lucene over HTTP, allowing programs written in any language to invoke Lucene Search. It provides access via REST-like interfaces, or
  • 9. from a full array of open-standards based development environments, languages, and tools, including, for example, Python, PHP, Ruby, Ruby-on-Rails, etc.
    • Faceting, which is the grouping of items or search results into categories that let users drill into search results (or even skip searching entirely) by any value in any field (for example, choosing different attributes of shoes at Zappos.com, or searching Wikipedia by sub-articles, or navigating news articles at cnet.com).
    • Easy configuration for managing which fields are indexed, and their characteristics.
    • System administration tools for data loading, index replication, monitoring, logging and cache management.

    Lucene, the core search engine, is a Java-based search library available for free as open source under the Apache Software License. At the heart of the application’s “search engine,” Lucene exhibits attributes that enable applications employing it to deliver world-class user satisfaction. These include:

    • Outstanding speed. Supports sub-second performance for most queries.
    • Strong relevancy ranking and full results processing. Great out-of-the-box precision returns the information (documents) that users need without including a lot that they don’t. These results are presented clearly by relevancy, date, field, or any document property, and can be sorted by these attributes. Additional supported features, like highlighting and spell checking, let you extend search interactivity, making the refinement process easier and more conversational.
    • Complete query capabilities. Offers a full array of query methods: keyword, Boolean and +/- queries, proximity operators, wildcards, fielded searching, term/field/document weights, find-similar, spell-checking, multi-lingual search, and
  • 10. more. This means that your search solution can be flexible enough to accommodate an enormous range of user preferences and data types, from the simplest to the most complex. And because Lucene/Solr are open source, users can readily tailor queries to very specific needs.
    • Unsurpassed portability. Runs on any platform supporting Java, and indexes are portable across platforms. You can build an index on Linux and copy it to a Microsoft Windows machine and search it there. This makes it easy to leverage advances in hardware and operating systems while minimizing additional development costs for faster and better search functionality. There are also open source ports of Lucene for many languages besides Java, including .NET, C, Python and others.
    • Excellent scalability. Scales from document sets of hundreds to hundreds of millions and beyond.
    • Easily manageable, highly flexible deployment options. Enables “shrink-to-fit” deployments, ranging from single-server to fully distributed, multi-server systems, with its low overhead indexes and rapid incremental indexing (especially with versions 2.3 and later).

    While no single search technology is best on each of these dimensions for every application, Lucene is among the best out-of-the-box on all of them. Together, Lucene and Solr provide the foundations for a search solution that is fully capable and functionally complete. When the capabilities and attributes listed above are essential requirements for your enterprise search needs, Lucene/Solr is a prime candidate for fulfilling them.
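To make the HTTP access, faceting, and query capabilities above more concrete, here is a minimal sketch that only builds a request URL and does not contact a server. It assumes a Solr instance reachable at a base URL such as `http://localhost:8983/solr` with a `/select` search handler; the exact path depends on the Solr version and configuration. The field names (`title`, `body`, `manufacturer`) and the helper `build_select_url` are hypothetical. The query strings follow the classic Lucene query parser syntax.

```python
from urllib.parse import urlencode

# Illustrative query strings in classic Lucene query parser syntax.
# Field names are hypothetical and must exist in your own schema.
EXAMPLE_QUERIES = {
    "keyword": "camera",
    "boolean": "camera AND (digital OR compact)",
    "required_prohibited": "+camera -film",
    "proximity": '"digital camera"~5',
    "wildcard": "camer*",
    "fielded": "title:camera",
    "term_weight": "title:camera^2 body:camera",
}


def build_select_url(base_url, query, facet_fields=(), rows=10):
    """Build a Solr select URL, optionally asking for facet counts."""
    params = [("q", query), ("rows", str(rows)), ("wt", "json")]
    if facet_fields:
        params.append(("facet", "true"))
        # One facet.field parameter per field to be counted.
        params.extend(("facet.field", name) for name in facet_fields)
    return base_url.rstrip("/") + "/select?" + urlencode(params)


if __name__ == "__main__":
    url = build_select_url(
        "http://localhost:8983/solr",  # assumed local test instance
        EXAMPLE_QUERIES["term_weight"],
        facet_fields=["manufacturer"],
    )
    print(url)
```

Because the interface is plain HTTP, the resulting URL could be fetched from any language or tool; Python is used here only for illustration.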
    Lower Cost, Greater Flexibility

    When evaluating the economic advantages of a Lucene/Solr-based enterprise search solution, it is useful to consider competing solutions from the perspective of non-recurring and recurring costs:

    • Non-recurring costs: Requirements gathering, system design and specification, system development (implementation), and testing are all more-or-less non-recurring costs. Another important element in this set is the cost of software acquisition (licensing or purchase).
  • 11. • Recurring costs: The largest contributors here will be ongoing technical and customer support and system administration, management, and maintenance. These are dependent on many factors including, for example, system size and complexity and number of users and their level of sophistication.

    In both sets, almost all costs are associated with labor, and despite possible assertions to the contrary, those costs are going to be approximately the same across the competing products and technologies. All search systems, no matter how they work or who provides them, require design and specification, development, configuration, deployment, testing, and ongoing support and maintenance.

    The only inarguably clear differentiator is the cost of software acquisition. As Lucene and Solr are open-source software solutions based on open standards and community-driven development processes, they are free. Assuming all other costs are about equal, therefore, the open source solution is almost certain to be highly cost efficient.

    That, however, is still not the whole story. A Lucene/Solr-based solution can be the most cost effective as well. With its strong out-of-the-box performance and relevancy ranking; complete query capabilities; portability, scalability, and manageability characteristics; and easy-to-use, highly-standard programming interfaces, Lucene/Solr enables you to deploy exactly the enterprise search functionality required to completely fulfill your customers’ needs.
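The cost argument above reduces to simple arithmetic. The sketch below is only an illustration of that reasoning: the function `total_cost` is hypothetical and every figure is a placeholder in arbitrary units, not data from the white paper or from any real deployment.

```python
def total_cost(non_recurring_labor, recurring_per_year, years, software_acquisition=0.0):
    """Total cost = one-time labor + software acquisition + yearly costs."""
    if years < 0:
        raise ValueError("years must not be negative")
    return non_recurring_labor + software_acquisition + recurring_per_year * years


if __name__ == "__main__":
    # Placeholder values in arbitrary cost units (purely hypothetical).
    labor, yearly, horizon = 100.0, 40.0, 3

    licensed = total_cost(labor, yearly, horizon, software_acquisition=50.0)
    open_source = total_cost(labor, yearly, horizon)

    print("licensed product:", licensed)
    print("open source:", open_source)
    print("difference:", licensed - open_source)
```

If labor costs really are about equal across products, as the paper assumes, the difference between the two totals is exactly the software acquisition cost, whatever the time horizon.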
  • 12. How Lucid Imagination Can Help

    Lucid Imagination has the expertise, resources and services you need to drive development of Lucene/Solr-based enterprise search solutions. We offer a full portfolio of software and services including:

    • Certified Distributions. Because Lucene/Solr distributions certified by Lucid Imagination are tested and commercially supported, they speed up implementation time, reduce the risk of “gotchas”, and eliminate the need for familiarity with the fine points of the community release process. Tested bugfixes are incorporated in an organized fashion, reducing the time needed to comb through nightly open source community releases and the risk of code forks between release cycles. The Get Started program helps users who download our Certified Distributions with first-time installation, configuration, and basic usage of Lucene/Solr and included utilities.
    • Technical Support. Although contemporary open-source solutions are typically at least as robust and reliable as their commercial counterparts, problems can still arise. Because the community may not focus on maintenance in a timely fashion, there is no substitute for the “industrial strength” support provided by Lucid Imagination to ensure your enterprise IT operation gets timely responses, so it can both meet market-driven development schedules and maintain stringent service level commitments. Designed for customers with Lucene/Solr installations, the support subscriptions we offer include:
        o Regular updates and upgrades for Lucid-certified versions of Lucene/Solr
  • 13.     o Problem isolation and diagnosis of errors in Lucene/Solr software
        o Bug patches and workarounds
        o Troubleshooting of use-case issues that may arise.

    Support subscriptions are available in a variety of packages to fit different maintenance profiles:

    • Basic, a fit for stable deployments that can rely on minimal intervention and can wait a day to hear back.
    • Professional, for deployments with quicker response time requirements, featuring both phone and email support.
    • Enterprise, for mission-critical deployments requiring initial response within four (4) business hours on the same business day, plus an annual Search Health Check program.
    • Advanced Support. Designed for customers with more demanding needs for expert advice and guidance on an ongoing basis, Advanced Support subscriptions include the services delivered under Enterprise Technical Support, plus consultative support to help optimize development and/or deployment efforts. Two options are available:
        o Development Support: As noted above, enterprise search requirements often are designed for deployment with enormous data domains and stringent user requirements. Although it may be relatively easy to construct a solution that works to a first order of sophistication, when the requirements exceed more straightforward design goals, we can help you get to a solution that is potentially many times more capable for a relatively small amount of additional investment. We help you optimize development of Lucene/Solr enterprise search applications with reviews of architecture, design, code and configuration, along with best-of-breed methodologies and powerful tools. Includes one annual Search Health Check.
        o Production Support: For large, relatively complex systems that have data domains with continuously increasing size and complexity, ongoing tuning and performance enhancement may be critical to ensuring sustained customer satisfaction. We help you achieve optimal performance and availability for Lucene/Solr in your production environment. We provide advice on best
  • 14. practices for configuration, operations, scaling, tuning, and tools, as well as two annual Search Health Checks.
    • Training. Our hands-on training programs in Lucene and Solr technologies help your staff acquire skills and develop expertise. Training programs are offered as classroom-based courses, and can be customized for on-site delivery.
    • Consulting. Our consulting practice offers flexible-term engagements to assist you with high value activities such as architecture and design reviews, training, enablement, and best practices. As our consultants work on a broad variety of implementations, they are well positioned to recommend optimal approaches to your business and technical challenges. Their deep domain expertise can be retained on a project basis, over several months, ad-hoc, or as a subscription. Consultants are available on a remote basis or for short-term onsite work.

    Our customers benefit from the years of collective expertise found in our technical staff, who are themselves widely recognized leaders in the Lucene/Solr community. By providing predictable, reliable resources, Lucid Imagination helps you meet your project feature, function, and schedule requirements. We can help you reduce the risks and capture the benefits of open source for your enterprise search solution. We invite you to visit our Website (http://www.lucidimagination.com) for additional details.

    Conclusion

    Lucene/Solr-based enterprise search solutions are among the most comprehensive, complete, robust, and flexible in the world today. Whether you are merely contemplating an open-source enterprise search solution or already have one deployed, Lucid Imagination is the one company that is uniquely situated to help ensure that your customers are not merely satisfied, but delighted in the fulfillment of their enterprise search needs.
