Liwp consider opensource2010


  1. 1. Good Information Is Hard to Find: Guidelines for Managers Considering Open Source Enterprise Search A Lucid Imagination White Paper
  2. 2. Abstract Enterprise search helps your employees, customers, and partners find the most relevant and timely information; they need it to make smart, efficient decisions about doing business with and in your company. Open source has delivered great benefits to enterprise software customers, with innovative operating systems, databases, and middleware and a broad range of applications; now the open source model can unleash this value for your enterprise search needs. Lucid Imagination brings market-leading expertise to open source enterprise search, and can help any organization quickly design and optimize search solutions based on Lucene and Solr. Good Information Is Hard to Find: Considering Open Source for Enterprise Search A Lucid Imagination White Paper • April 2009 Page i
  3. 3. Table of Contents
     Introduction and Overview
     The Advantages of Open Source
       Lower Costs
       Pay at the Point of Value
       Transparent Development
       Re-tool the employees, retire the software
       Lower Overall Risk
     About Lucid Imagination
     Engagement Scenarios
       Considering Alternatives to Legacy Packaged Search Applications
       Building on In-house Lucene/Solr Expertise
     Next Steps
     Appendix: About Apache Lucene and Solr
  4. 4. Introduction and Overview
     Raising the collective intelligence of company employees can make them smarter and more efficient, but how do you enable them to keep up with the vast, ever-changing amount of data your organization produces? Many operations seem to be better at creating data than using it to operate more productively. Using search tools designed for the Web can make it difficult to find relevant, timely corporate information, mostly because corporate data is not much like Web data:
     • Corporate data can be stored in a variety of different and unstructured formats, including documents and database records.
     • A document’s popularity is not necessarily what makes it useful to a specific search.
     • Information may require controlled access, yet still be discoverable to those users with the appropriate permissions.
     Two state-of-the-art, open source search technologies, Lucene and Solr, are available for free from the Apache Software Foundation. Lucene is a powerful search engine and library; Solr provides a platform built on top of Lucene that makes it easy to build Lucene-based applications.[1] Rich, flexible text query tools and sophisticated ranking capabilities of Lucene/Solr enable users to quickly find the most useful documents or records. Either of these full-featured technologies delivers excellent performance, relevancy ranking, and scalability. They are used today by thousands of organizations, powering substantial and diverse search applications for AOL, CNET, Comcast Interactive Media, IBM, Netflix, LinkedIn, MySpace, and many others. For these companies, Lucene/Solr solutions regularly index and search hundreds of millions of documents with subsecond response time, all without incurring any licensing fees.
     These solutions excel at quickly and effectively searching large volumes of unstructured text (documents or other records containing freeform text) and returning results based on how well they match the user’s query.
     [1] Most organizations use Solr today as their search development platform. Because Lucene serves as the core of Solr’s search capabilities, this paper refers to them as Lucene/Solr. For more information about these technologies, see the Appendix.
     Good Information Is Hard to Find: Considering Open Source for Enterprise Search A Lucid Imagination White Paper • March 2010
  5. 5. At most companies, this means digesting and searching through dozens of different file formats (including documents, spreadsheets, presentations, e-mail, and records stored in databases, to name just a few) and delivering relevant results to authorized users. Incremental update capabilities mean that Lucene/Solr searches can track document collections easily as they grow and change, finding information nearly as fast as it is created. Solr can speedily facet, or categorize, data and search results based on specific field values. An excellent example of this function is Zappos.com, the popular shoe e-tailer, where users can quickly refine searches based on product criteria such as price or features.
     For most application development teams, building a search application is not an everyday project. By definition, enterprise search technology processes unstructured data, which can change frequently. Expert guidance on architectural considerations, such as index optimization, result relevance, deployment configuration, and retrieval performance, can make a tremendous difference in deploying a successful solution. By taking advantage of expert, experienced personnel to assist with application design, development, and deployment, organizations can leverage the full benefit of Lucene/Solr search technologies without the cost of licensing proprietary software.
     For these reasons, Lucid Imagination provides commercial-grade support, training, and professional consulting services that are essential to designing and installing successful enterprise applications. This paper is intended for business decision makers who are considering options for powerful, flexible enterprise search solutions.
     It provides guidelines for understanding:
     • Advantages of open source software, including ways it can lower costs and risks,
     • Why Lucid Imagination’s service and support is a key ingredient in achieving successful Lucene/Solr solutions,
     • Engagement scenarios: the types of situations where Lucid Imagination can help, and
     • The capabilities of Lucene/Solr, which are provided in an appendix.
  6. 6. The Advantages of Open Source
     Open source has changed the IT landscape. Gartner says 85 percent of polled companies are already using open source software, calling the use of open source software “pervasive.”[2] Most organizations are now familiar with free and open source products such as Linux, MySQL, Apache, and SugarCRM, because of the many benefits, including:
     • Lower costs
     • Pay at the point of value
     • Transparent development
     • Control and flexibility: investing in people instead of software licenses
     • Lower overall risk
     With Lucene/Solr’s broad, successful adoption across markets and deployments, these advantages are now available for enterprise search applications. Let’s take a closer look at how open source pays off.
     Lower Costs
     While proprietary software vendors must try to recover their development costs, this is not the case with open source software, because it does not have capital costs associated with source code IP. The cost of talent is less, too. Community development, adherence to standards, and lower barriers to adoption all help increase the number of developers who become proficient in the use of a product or technology.
     [2] http://www.theregister.co.uk/2008/11/18/gartner_open_source/
  7. 7. Together, these factors combine to reduce upward pricing pressure. The high license fees associated with proprietary and closed source development can discourage developers and customers from adopting a product or technology. In contrast, open source communities help lower costs by encouraging participation and allowing anyone to download the source code and try it out. Most open source communities release updated binaries on a periodic basis, so users can easily try the software on their own timetables.
     Many commercial solutions combine proprietary software with service and support, and customers may believe that buying a software license is sufficient to get a search application up and running. In most cases, however, the technology’s purchase price makes up less than half of the implementation cost, with the balance going to services. Both open source and proprietary software usually require a significant amount of customization, which means some service and support costs are inevitable.
     Pay at the Point of Value
     Open source project code is freely available for any use. If a company can become proficient with the code, it can make productive use of the code at any phase from evaluation to production. Only in those areas where an open source customer sees value (for support and integration services, or for additional functionality or expertise) does money need to be spent. There are no restrictions on when open source software can be used. In contrast, proprietary products typically must be purchased before they can be used, or in some cases, even evaluated. Some vendors offer evaluation or trial versions, but these often have reduced functionality or restrictive licenses.
     Because the software must be purchased before the customer can see any value from the product, return on investment is delayed.
  8. 8. Transparent Development
     Community-developed software enables everyone to see what is being built and which features are included as early as possible. Developers and customers do not need to wait for a vendor to publish a roadmap, or for a vendor product launch, to know what is being readied for release. As a result, prospective users can make better, faster, and more informed decisions relating to their software infrastructure. Compare this to proprietary software, where customers have little if any insight into upcoming products until very late in the product life cycle. This is typically no sooner than the software’s beta release, when it is too late to provide input on features and functionality. This delays assessment and adoption of innovations.
     Re-tool the employees, retire the software
     In this tough economic climate, managers who own budgets need to review every expense with a critical eye. Many software applications that made sense a few years back may have outlived their intended fit to business needs. Any application development effort generates significant learning. The work of development imbues in-house developers with deep knowledge and understanding of the company, its IT infrastructure, culture, and usage requirements. Given that software applications must keep up with an organization’s changing goals and requirements as the needs of its market and constituents evolve, the expertise that the technical staff develops becomes a vital competitive asset. This is a key corollary benefit of the open source model: by retiring old software packages and investing in staff expertise, companies combine innovative technology with their most valuable asset, their people, establishing vital competitive advantage.
     Companies that leverage savings from not purchasing software licenses to build development talent in-house reduce the cost of addressing inevitable change. What’s more, increasing a technical team’s ability to translate company business objectives into technology solutions increases the likelihood that the software they build will continue to fit that inevitable change. This is particularly true for an enterprise search solution. What’s more, compared to closed source implementations, in-house developers can work with open source code and supplement additional functions or expertise by relying on the community and marketplace of readily available resources, again capturing unique competitive advantage.
  9. 9. Supplementing open source development with training, consulting, and reliable support from established industry experts reinforces a company’s competitive advantage, with the control and flexibility needed to survive and thrive.
     Lower Overall Risk
     Vendors use proprietary interfaces and components to lock in customers. However, the source code for open source software is freely available and widely supported by the community, based on standardized, free public interfaces. If a commercial vendor goes out of business (or is purchased by another), or tries to increase fees for a commercial product, open source vendors may be able to step in to meet the needs of customers at market-competitive prices. Open source software can reduce security and operational risks, too. Widely used open source software is essentially under constant peer review. Technical or security issues, once exposed in the community, are readily addressed, resulting in a safer and more reliable product.
     About Lucid Imagination
     The benefits of open source have unlocked tremendous value in many software categories: Red Hat’s Enterprise Linux in operating systems, MySQL in database software, Sugar in CRM software. All have benefited from matching the efficiencies of open source with deep, robust commercial resources to ensure successful applications. Today, Lucid Imagination’s capabilities and expertise bring that same approach to unlocking enterprise search with Lucene and Solr.
  10. 10. Lucid Imagination’s mission is to enable customers to achieve business objectives for optimal search performance and accuracy, with lower total cost of ownership and faster time to market. The company’s founding team consists of many key contributors and committers to the Lucene/Solr project, as well as other experts in enterprise search application development. Our skills, acquired across hundreds of deployments, including best practices and technical know-how, can enhance and optimize any phase of an open source search implementation.
     Lucid Imagination’s team has a deep understanding of indexing, which is the foundation of any search solution; it captures all the content and location of searched documents for quick lookup, much as a book index does. We have broad experience indexing:
     • Documents of widely varying sizes and formats within a very large collection,
     • Documents with diverse metadata requirements, and
     • Multilingual documents.
     The team is also skilled at applying business rules such as boosting documents and fields, indexing dates, or other attributes of terms and data. Lucid Imagination has developed best practices for indexing and metadata management, and can help establish and refine policies to meet business and technical search requirements, such as:
     • How and when to add documents to an index,
     • Removing documents from an index,
     • Results relevancy and document/data findability,
     • Undeleting documents, and
     • Batch and real-time updates.
     The Lucid Imagination team has extensive experience with large-scale search applications, including engagements with:
     • Large collections (more than one billion documents),
     • High query volumes and large user populations,
     • High document growth rates,
     • Distributed indexing and searching,
     • Replication and high availability, and
     • Cloud environments.
  11. 11. In addition to fine-tuning search technology machinery, the Lucid Imagination team has significant expertise in natural language processing, which optimizes the interaction of compute resources with human-created content. Key considerations include:
     • Developing structured methods for characterizing how well a set of results meets user needs,
     • Establishing a tradeoff between overall net gain in the quality of results across the whole application, versus a single improvement for one query or user, and
     • Improving the ability to find accurate answers by leveraging a balanced mix of content analysis and query interpretation algorithms.
     The breadth of expertise offered by Lucid is available in a variety of forms suited to a range of different business needs and deployment requirements. This enables customers to create even more powerful and successful search applications.
     Engagement Scenarios
     Virtually every company and organization uses some form of enterprise search to help customers, employees, and partners find the information they need. Many companies use packaged commercial software applications, but over time their requirements evolve beyond the original platform’s limitations. Also, licensing or customization costs may grow too high, or the number and type of documents may expand beyond the original design’s capacity.
     As companies evaluate the ongoing fit of their current search applications to an ever-changing market and organizational landscape, they naturally ask, “Is there a faster, cheaper, more effective way to do this?” Today, thousands of companies and organizations, each with unique search and retrieval requirements, have answered this question with Lucene/Solr. The essential value of Lucid Imagination and open source Lucene/Solr technology is that it provides commercial support that adapts to specific requirements.
     Whether a company is evaluating Lucene/Solr for a new implementation, considering replacement of a commercial search product, or enhancing an existing Lucene/Solr implementation, Lucid Imagination offers skills and resources to help at every phase of the project life cycle.
  12. 12. Considering Alternatives to Legacy Packaged Search Applications
     Change happens quickly, but taking advantage of new opportunities can be limited by existing applications and traditional ways of doing things. Organizations with legacy search applications often realize that they are paying too much to align packaged enterprise search applications with evolving business requirements. In other cases, they discover it is too difficult to integrate existing software with new services, or it takes too long to meet new corporate goals. With the power of Lucene/Solr, Lucid Imagination supplies the expertise organizations need to produce successful search solution efforts, more quickly and less expensively (now and going forward) than other solutions.
     • Consulting services are highly customized and able to engage quickly to shorten cycles and ramp times, minimize errors and design pitfalls, and improve production results. Lucid Imagination’s consulting team consists of senior search technologists who are intimately familiar with Lucene/Solr technologies and have extensive experience in field-tested search solutions for diverse deployment scenarios. Open source software is ideally suited to low-cost prototyping, because it can reduce time to deployment and refine the user experience. For customers striving to integrate a highly diverse base of data and documents, Lucid Imagination offers prototyping services to assist with the process.
     • Technical training can bring everyone in the IT department up to speed on best practices and the elements of good search design, establishing a solid base of skills before coding begins.
     This can greatly reduce downstream problems and reduce overall costs.
  13. 13. Lucid Imagination works with in-house application and system administration teams to provide the knowledge transfer, guidance, training, and support required to implement an enterprise search solution that fits the organization’s specific needs.
     • When dependable, predictable support is required to accompany an organization’s efforts on a regular basis over time, Lucid Imagination’s support subscriptions provide reliable access to domain experts during the entire application life cycle process. Technical Support features the latest tested versions and timely, predictable support turnaround times. Advanced Development Support provides expert architectural design, development, and testing guidance for building search applications using Lucene and Solr. Advanced Production Support provides expert advice on configuration, performance tuning, and optimization for applications deployed to a production operation environment with live users and service-level attainment regimes. Search Health Check, included with Advanced Support, is a comprehensive set of services that ensures applications are designed to meet recommended best practices for search configuration, optimization, and effectiveness. Custom Support packages are also available for unique situations.
     • Lucid Imagination’s free 30-Day Get Started Program is available with downloads of LucidWorks, our certified distributions of Lucene and Solr. The Get Started Program complements LucidWorks with added guidance for questions on first-time installation, configuration, and basic usage, as well as evaluation of Lucene/Solr and included utilities. LucidWorks for Solr is the logical starting point for most developers building search applications with Lucene/Solr technology for websites, products, or internal organizational use, because it bundles the most recent and stable Apache/Solr capabilities, along with other tools and utilities.
  14. 14. Building on In-house Lucene/Solr Expertise
     Many organizations with in-house Lucene/Solr expertise have achieved considerable sophistication in their deployments. Still, they may reach a point where it is difficult to move the architecture or implementation past a particular design, deployment, or optimization constraint. There can be many reasons for this, such as limitations on staff expertise, design, or architecture. Configurations and policies may not have kept pace with current best practices. A dependent part of the IT environment may have changed: anything from upgraded complementary applications to new middleware, or expanded data volume and variety.
     For organizations that are ready to gain the required knowledge to move ahead, address the current situation, and make sure that a deployment stays at peak performance, Lucid Imagination recommends an in-depth engagement. Typically in a consultative format, engagement begins with an in-depth assessment and review followed by best practices design recommendations, and ends with a strategy proposal for achieving long-term, sustainable innovation for search solutions.
     Another key area where Lucid Imagination stands ready to help is in optimizing performance, both in application response time and its utilization of hardware/software resources. Lucid Imagination experts work with in-house teams to diagnose and improve search application efficiencies.
     As mentioned earlier, a significant benefit of open source software is its ability to provide fast, low-cost prototyping as a means to reduce time to deployment and refine the user experience. For customers that seek to integrate highly diverse bases of data and documents, or accelerate evaluations of open source search solutions, Lucid Imagination offers prototyping services.
  15. 15. While community support has always been a significant benefit of open source projects, tough issues may not always be answered in a timely fashion or with the discretion necessary to prevent exposure of confidential organizational knowledge. That’s when Lucid Imagination’s expert teams can help.
     Some companies are already skilled in open source technologies in general and Lucene/Solr in particular. For these, Lucid Imagination offers Technical Support and Advanced Support. Technical Support can provide answers within defined response times for users encountering problems with Lucene/Solr projects or production implementations. Different levels of support address most situations. For example, an e-commerce startup may find that community forums provide suitable answers, but not always as quickly as needed. Basic Technical Support provides Web-based and e-mail support at competitive rates for customers that do not require same-day response or direct telephone support.
     Lucid Imagination also offers various levels of Technical Support for larger or mission-critical installations, including fast turnaround, diagnosis, and bug fixes. Finally, Enterprise Technical Support includes Search Health Checks by Lucid Imagination domain experts to help ensure optimal runtime effectiveness.
     Next Steps
     For more information on how Lucid Imagination can help employees, customers, and partners find the information they need, please visit http://www.lucidimagination.com to access blog posts, articles, and reviews of dozens of successful implementations. Please e-mail specific questions to:
     Support and Service: support@lucidimagination.com
     Sales and Commercial: sales@lucidimagination.com
     Consulting: consulting@lucidimagination.com
     Or call: 1.650.353.4057
  16. 16. Appendix: Lucene/Solr Features and Benefits
     Lucene and Solr are complementary technologies that offer very similar underlying capabilities. In choosing a search solution that is best suited for your requirements, key factors to consider are application scope, development environment, and software development preferences.
     Lucene is a Java technology-based search library that offers speed, relevancy ranking, complete query capabilities, portability, scalability, low-overhead indexes, and rapid incremental indexing. Solr is the Lucene Search Server. It presents a web service layer built atop Lucene, using the Lucene search library and extending it to provide application users with a ready-to-use search platform. Solr brings with it operational and administrative capabilities like web services, faceting, configurable schema, caching, replication, and administrative tools for configuration, data loading, statistics, logging, cache management, and more.
     Lucene presents a collection of directly callable Java libraries and requires coding and solid information retrieval experience. Solr extends the capabilities of Lucene to provide an enterprise-ready search platform, eliminating the need for extensive programming.
     Solr provides the starting point for most developers who are building a Lucene-based search application. It comes ready to run in a servlet container such as Tomcat or Jetty, making it ready to scale in a production Java environment. With convenient REST-like web-service interfaces callable over HTTP, and transparent XML-based configuration files, Solr can greatly accelerate application development and maintenance.
     In fact, Lucene programmers have often reported that they find Solr contains “the same features I was going to build myself as a framework for Lucene, but already very well implemented.” Using Solr, enterprises can customize the search application according to their requirements, without incurring the cost and risk of writing the code from scratch.
     Lucene provides greater control of your source code and works best in development environments where resources need to be controlled exclusively by Java API calls. It works best when constructing and embedding a state-of-the-art search engine, allowing programmers to assemble and compile inside a native Java application. While working with Lucene, programmers can directly control the large set of sophisticated features with low-level access, data, or state manipulation. Enterprises that do not require strict control of low-level Java libraries generally prefer Solr, as it provides ease of use and scalable search power out of the box.
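     As a rough illustration of what "callable over HTTP" can look like from an application, the following Python sketch only assembles a request URL for a Solr-style search with facet counts; it does not contact a server. The parameter names q, rows, wt, facet, and facet.field are standard Solr query parameters, while the host, port, core name ("products"), and field names are assumptions made up for this example.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_solr_query_url(base_url, query, facet_fields=(), rows=10):
    """Assemble a URL for a Solr-style select request (no network call is made)."""
    params = [("q", query), ("rows", str(rows)), ("wt", "json")]
    if facet_fields:
        # Ask the server to count matching documents per value of each field.
        params.append(("facet", "true"))
        params.extend(("facet.field", name) for name in facet_fields)
    return f"{base_url.rstrip('/')}/select?{urlencode(params)}"


# Hypothetical server location, core name, and field names.
url = build_solr_query_url(
    "http://localhost:8983/solr/products",
    "title:boots",
    facet_fields=["brand", "price_range"],
    rows=5,
)
print(url)
# Decode the query string again to show which facet fields were requested.
print(parse_qs(urlsplit(url).query)["facet.field"])
```

     An application would send this URL with any HTTP client and read the response; the point of the sketch is only that a search with facets can be expressed as an ordinary HTTP request rather than as Java code.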
  17. 17. As functional siblings, Lucene and Solr have become popular alternatives for search applications; the two differ mainly in the style of application development used. Key benefits of search with Lucene/Solr include:
     • Search Quality: Speed, Relevance, and Precision
       Lucene/Solr provides near-real-time search and strong relevance ranking to deliver contextually relevant and accurate results very quickly. Tailor-made coding for relevancy ranking and sophisticated search capabilities like faceted search help users in sorting, organizing, classifying, and structuring retrieved information to ensure that search delivers desired results. Search with Lucene/Solr also provides proximity operators, wildcards, fielded searching, term/field/document weights, find-similar functions, spell checking, multilingual search, and much more.
     • Lower Cost and Greater Flexibility, Plug and Play Architecture
       Lucene/Solr reduces recurring and nonrecurring costs, lowering your TCO. As open source software, it does not require purchase of a license and is freely available for use. The open source code can be used as is, modified, customized, and updated as appropriate to your needs. Solr is easily embedded in your enterprise’s existing infrastructure, reducing costs of installation, configuration, and management.
     • Open Source Platform for Portability and Easy Deployment
       Because Lucene/Solr is an open source software solution, it is based on open standards and community-driven development processes. It is highly portable and can run on any platform that supports Java. For instance, you can build an index on Linux and copy it to a Microsoft Windows machine and search there. This unsurpassed portability enables you to keep your search application and your company’s evolving infrastructure in tandem. Lucene, in turn, has been implemented in other environments, including C#, C, Python, and PHP. At deployment time, Solr offers very flexible options; it can be easily deployed on a single server as well as on distributed, multiserver systems.
     • Largest Installed Base of Applications, Increasing Customer Base
       Lucene/Solr is the most widely used open source search system and is installed in around 4,000 organizations worldwide. Publicly visible search sites that use Lucene/Solr include CNET, LinkedIn, Monster, Digg, Zappos, MySpace, Netflix, and Wikipedia. Lucene/Solr is also in use at Apple, HP, IBM, Iron Mountain, and Los Alamos National Laboratory.
  18. 18. • Large Developer Base and Adaptability
       As community-developed software, Lucene/Solr provides transparent development and easy access to updates and releases. Developers can work with open source code and customize the software according to business-specific needs and objectives. Its open source paradigm lets Lucene/Solr provide developers with the freedom and flexibility to evolve the software with changing requirements, liberating them from the constraints of commercial vendors.
     • Commercial-Grade Support for Mission Critical Search Applications from Lucid Imagination
       Lucid Imagination provides the expertise, resources, and services that are needed to help enterprises deploy and develop Lucene-based search solutions efficiently and cost-effectively. Lucid helps enterprises achieve optimal search performance and accuracy with its broad range of expertise, which includes indexing and metadata management, content analysis, business rule application, and natural language processing. Lucid Imagination also offers certified distributions of Lucene and Solr, commercial-grade SLA-based support, training, high-level consulting, and value-added software extensions to enable customers to create powerful and successful search applications.
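     For readers who want a concrete picture of the indexing, incremental update, and faceting ideas this paper refers to, here is a deliberately tiny Python sketch. It is a toy model written for illustration, not how Lucene or Solr is implemented; the class name TinyIndex, its methods, and the sample documents are all hypothetical.

```python
from collections import Counter, defaultdict


class TinyIndex:
    """Toy inverted index: maps each term to the documents that contain it."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> ids of documents containing it
        self.fields = {}                  # doc id -> field values used for faceting

    def add(self, doc_id, text, **fields):
        """Index one document; can be called at any time (incremental update)."""
        for term in text.lower().split():
            self.postings[term].add(doc_id)
        self.fields[doc_id] = fields

    def search(self, query):
        """Return the ids of documents that contain every term in the query."""
        terms = query.lower().split()
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result

    def facet(self, doc_ids, field):
        """Count how many of the given documents carry each value of a field."""
        return Counter(
            self.fields[d][field] for d in doc_ids if field in self.fields[d]
        )


# Hypothetical sample data.
index = TinyIndex()
index.add(1, "Leather hiking boots", brand="Acme")
index.add(2, "Waterproof hiking boots", brand="Summit")
index.add(3, "Leather sandals", brand="Acme")

hits = index.search("hiking boots")
print(sorted(hits))
print(index.facet(hits, "brand"))
```

     Like a book index, the lookup table lets a query go straight to the matching documents instead of scanning every document, and the facet counts are what a shopper would see as refinement options next to the results.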
