3. • Enterprise search platform
The practice of identifying and enabling specific content across the enterprise to be indexed, searched
and displayed to authorized users.
1. Key Concepts
How do they work
Content indexing: 1 Collection, 2 Indexing
Query processing: 3 Query Parser, 4 Query Engine, 5 Post Processor, 6 Formatter
1. Crawls directories and websites, extracts content from databases and other repositories. Arranges for content to be transferred to it on a regular basis so it can notify the search engine that new information is available
2. Creates a searchable index from all the content, often with some value added processing such as metadata extraction and auto-summarization (groups information into logical categories)
3. Accepts searcher queries and encodes them for optimal use
4. Passes query over index and finds documents matching search criteria
5. Sorts documents and applies logic to the results such as categorization, clustering and recommendations
6. Streams out and formats results
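The six stages above can be sketched in a few lines of Python. This is only an illustrative toy under stated assumptions, not how any particular engine is built: the function names, the sample documents and the term-frequency scoring are all made up for the example.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase a string and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(documents):
    """Stages 1-2: take collected documents and build an inverted index (term -> doc ids)."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def parse_query(raw_query):
    """Stage 3: encode the searcher's query as a list of normalized terms."""
    return tokenize(raw_query)

def run_query(index, terms):
    """Stage 4: return the ids of documents that contain every query term."""
    if not terms:
        return set()
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)

def post_process(documents, doc_ids, terms):
    """Stage 5: sort matches by total term frequency, highest first (toy scoring)."""
    def score(doc_id):
        tokens = tokenize(documents[doc_id])
        return sum(tokens.count(term) for term in terms)
    return sorted(doc_ids, key=lambda d: (-score(d), d))

def format_results(documents, ranked_ids):
    """Stage 6: render each hit as one line of text."""
    return [f"{doc_id}: {documents[doc_id]}" for doc_id in ranked_ids]

# Hypothetical sample content standing in for crawled documents.
documents = {
    "doc1": "Digital SLR camera with zoom lens",
    "doc2": "Compact digital camera",
    "doc3": "Camera bag for SLR camera bodies",
}
index = build_index(documents)
terms = parse_query("SLR camera")
hits = run_query(index, terms)
ranked = post_process(documents, hits, terms)
results = format_results(documents, ranked)
```

Real platforms add crawling schedules, metadata extraction and far richer ranking; the sketch only shows how the stages hand data to one another.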
4. • Faceted search
– It is the dynamic clustering of items or search results into categories that let users drill into search
results by any value in any field. Each facet displayed shows the number of hits that match that
category. Users can “drill down” by applying specific constraints to the search results. Also called
faceted browsing, faceted navigation, guided navigation and parametric search.
The example starts out with all digital cameras. The user then selects the constraints “$400-$500”
and “SLR” from the Price and Digital camera type facets.
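A minimal sketch of the camera example, assuming a small in-memory catalogue. The records, field names (`type`, `price_range`) and helper functions are hypothetical and only illustrate how facet counts and drill-down relate.

```python
from collections import Counter

# Hypothetical catalogue; a real engine would compute this from its index.
cameras = [
    {"name": "A", "type": "SLR", "price_range": "$400-$500"},
    {"name": "B", "type": "SLR", "price_range": "$400-$500"},
    {"name": "C", "type": "SLR", "price_range": "$500-$600"},
    {"name": "D", "type": "Compact", "price_range": "$400-$500"},
    {"name": "E", "type": "Compact", "price_range": "$100-$200"},
]

def facet_counts(items, fields):
    """For each facet field, count how many items carry each value."""
    return {field: Counter(item[field] for item in items) for field in fields}

def drill_down(items, constraints):
    """Keep only the items that satisfy every selected facet constraint."""
    return [
        item for item in items
        if all(item[field] == value for field, value in constraints.items())
    ]

# Counts shown next to each facet value before any constraint is applied.
all_counts = facet_counts(cameras, ["type", "price_range"])

# The user selects "$400-$500" and "SLR".
selected = drill_down(cameras, {"price_range": "$400-$500", "type": "SLR"})
selected_names = [item["name"] for item in selected]
```

After each selection the counts would be recomputed over the remaining items, which is what keeps the numbers next to each facet value accurate.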
5. • Faceted search benefits
– Superior feedback: Users can see at a glance a summary of the search results and how those results
break down by different criteria
– No surprises or dead ends: Users know how many results match before they click. Values with zero
counts are normally removed to reduce visual noise and eliminate the possibility of a user accidentally
selecting a constraint that would lead to no results
– No selection hierarchy is imposed: Users are generally free to add or remove constraints in any order
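Two of the benefits above can be checked with a short sketch: zero-count values are hidden, and constraints give the same result in any order. The item data and function names are assumptions made for illustration.

```python
# Hypothetical items with two facet fields.
items = [
    {"brand": "Acme", "color": "black"},
    {"brand": "Acme", "color": "silver"},
    {"brand": "Zenit", "color": "black"},
]

def apply_constraint(current_items, field, value):
    """Narrow the current result set by one facet constraint."""
    return [item for item in current_items if item[field] == value]

def visible_facet_values(all_items, current_items, field):
    """Count each known value in the current results and drop zero-count values."""
    counts = {value: 0 for value in {item[field] for item in all_items}}
    for item in current_items:
        counts[item[field]] += 1
    return {value: n for value, n in counts.items() if n > 0}

# After choosing brand "Zenit", "silver" has zero hits and is not offered.
zenit_items = apply_constraint(items, "brand", "Zenit")
zenit_colors = visible_facet_values(items, zenit_items, "color")

# The same two constraints applied in either order give the same result.
brand_then_color = apply_constraint(apply_constraint(items, "brand", "Acme"), "color", "black")
color_then_brand = apply_constraint(apply_constraint(items, "color", "black"), "brand", "Acme")
```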
6. Apache Solr vs. Oracle Endeca
Apache Solr
An open source community supported tool that allows IT to implement a faceted search capability based on text queries to an index of your data model (e.g. products)
• More extensible
• Faceted search – text search based
• Limited tools
Oracle Endeca
A mature product that provides all the GUI based tools needed to allow IT and business to quickly deploy search and navigation built on queries to text and object based data model.
• Faster time to market
• Guided navigation – data model based
• Robust integrated tool set
2. Projects overview
7. Solr is a highly popular open source enterprise search platform from Apache. It uses the Lucene Java search
library at its core for full-text indexing and search, and it has REST-like HTTP/XML and JSON APIs that make it
usable from most programming languages.
Apache Lucene and Apache Solr projects were merged in 2010.
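As a rough illustration of the HTTP API mentioned above, the sketch below builds a faceted query URL for Solr's `/select` handler using the standard `q`, `wt`, `facet`, `facet.field` and `fq` request parameters. The host, the core name `products` and the field names are assumptions, and the helper function is hypothetical; no request is actually sent.

```python
from urllib.parse import urlencode

def build_solr_select_url(base_url, core, query, facet_fields=(), filter_queries=()):
    """Assemble a /select URL with optional facet fields and filter queries."""
    params = [("q", query), ("wt", "json")]
    if facet_fields:
        params.append(("facet", "true"))
        params.extend(("facet.field", field) for field in facet_fields)
    params.extend(("fq", fq) for fq in filter_queries)
    return f"{base_url}/{core}/select?{urlencode(params)}"

# Assumed local Solr instance and a hypothetical "products" core.
url = build_solr_select_url(
    "http://localhost:8983/solr",
    "products",
    "camera",
    facet_fields=["type", "price_range"],
    filter_queries=["type:SLR"],
)
```

Any HTTP client in any language can fetch such a URL and parse the JSON response, which is what makes the platform usable from most programming languages.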
Strengths
• Free
• More powerful and extensible (e.g. freedom to build custom ranking algorithms)
• Larger adoption by the industry
• Larger community / modules / documentation
• Based on industry proven modules
Weaknesses
• No out of the box GUI for business users. Has to be implemented by IT
• No reporting
• It’s considered a framework, not a product
3. Apache Solr
8. “Oracle had struggled to develop a strategy for enterprise search that would define it as a Leader. To do this, it has repurposed
Oracle Secure Enterprise Search as a tool that informs all its applications.
The acquisition of Endeca catapults Oracle forward in terms of search facility, though, at Oracle, Endeca is more prominent as
a means of improving business intelligence than as a search product.”
Strengths – Gartner report 2013/05
• Oracle offers strong flexibility for the design of conversational search capabilities to reduce the ambiguity
of results
• Oracle has very strong experience in e-commerce use cases
• Oracle has invested particularly strongly in the searching and analysing of structured data for hybrid
structured / unstructured use cases
Weaknesses
• Oracle has changed its pricing model from price per data record to price per processor (Oracle’s long standing
model). Clients indicate that they are often dissatisfied with this new model.
• Oracle is positioning Endeca as a search technology in the e-commerce arena, which might
weaken its development as a stand-alone enterprise search engine.
4. Oracle Endeca
9. Feature | Apache Solr | Oracle Endeca
Data modeling | XML editing | GUI tool set that supports configuration and joining data from multiple sources
Index inspection | Velocity based application that supports search | Robust reference application to inspect data and explore features
Business users | n/a | GUI based business suite to manage configurations
Merchandising | n/a | GUI to manage merchandising rules
Reporting | n/a | Out of the box reports for search, navigation and merchandising
Relevance ranking | Extend a class to create what you want | Limited to adjusting modules
XQuery | n/a | XQuery based ad-hoc querying with XML support
5. Feature comparison
10. Feature | Apache Solr | Oracle Endeca
Aggregating records | n/a | Rollup records based on a property to support variants
Hierarchical dimensions | n/a | Possible to define hierarchies for ranges
Internationalization | Out of the box only supports English. Has to use external modules to support it | Licensed support for multiple languages
Clustering | Manually configured by IT by using external modules | Automatic organization of search results into sets that share attributes
Scalability | Based on Apache ZooKeeper. Easy to scale up. More powerful | Linear scalability out of the box. Easier to manage
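To make "rollup records based on a property" concrete, here is a generic sketch of the idea in Python. It is not Endeca's API: the `rollup` function, the `product_id` rollup key and the sample records are assumptions chosen to show how variants of one product collapse into a single result.

```python
from collections import defaultdict

def rollup(records, key):
    """Group records sharing the same value of `key` into one aggregated result."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record)
    return [
        {"rollup_key": value, "representative": members[0], "variant_count": len(members)}
        for value, members in groups.items()
    ]

# Hypothetical variant records: one SKU per size or color of a product.
records = [
    {"sku": "S1", "product_id": "shirt-1", "size": "M"},
    {"sku": "S2", "product_id": "shirt-1", "size": "L"},
    {"sku": "S3", "product_id": "shirt-2", "size": "M"},
]
aggregated = rollup(records, "product_id")
```

The search result list then shows two products instead of three SKUs, with the variant count available for display.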
11. Apache Solr
Strengths
• Fully integrated with Lucene (same project, different
modules).
• More freedom to customize and adapt to business
needs.
• More powerful API.
• Larger adoption / community.
Weaknesses
• No out of the box features for business users.
• Longer time to market, since IT has to implement features
(e.g. reporting, business back office).
6. Conclusion
Oracle Endeca
Strengths
• Aligned with Oracle’s long-term goals to make it the
e-commerce reference for enterprise search.
• Out of the box features for business users
(backoffice).
Weaknesses
• Separate index. No integration with Lucene.
• More constrained API. Possibly more difficult to
adapt to diverse business needs.
• Smaller adoption / community.