Enterprise Search Myths and Realities
Myths in Enterprise Search
Many people think that because enterprise search has been around for years, all its issues should have been solved and its complexity dramatically reduced. In other words, it ought to be a commodity by now. Search technologies do continue to advance, but the complexity originates in the nature of language itself, which is full of ambiguity, contradiction, multiple meanings, and many contexts. Searching for what you want to know involves a balance between precision (the share of returned results that are actually relevant) and recall (the share of all relevant results that are actually returned). Increasing precision without sacrificing recall is a difficult balance to strike, and the belief that this balance has been commoditized is one of the myths not in line with reality.
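To make the two measures concrete, here is a minimal Python sketch that scores one result list against a set of documents judged relevant. The document IDs and relevance judgments are hypothetical; precision is the fraction of returned documents that are relevant, and recall is the fraction of relevant documents that were returned.

```python
def precision_recall(returned, relevant):
    """Return (precision, recall) for a single query.

    returned: document IDs the engine returned
    relevant: document IDs a human judged relevant (hypothetical judgments)
    """
    returned, relevant = set(returned), set(relevant)
    hits = len(returned & relevant)  # relevant documents that were actually returned
    precision = hits / len(returned) if returned else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical judgments: five documents are relevant to the query.
relevant = {"d1", "d2", "d3", "d4", "d5"}

# A narrow result list: precise but incomplete.
print(precision_recall(["d1", "d2"], relevant))  # (1.0, 0.4)

# A broad result list: complete but noisy.
broad = ["d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8", "d9", "d10"]
print(precision_recall(broad, relevant))  # (0.5, 1.0)
```

Widening the result list raises recall at the cost of precision, which is exactly the trade-off that resists commoditization.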
COMMON SEARCH MYTHS
In this document we uncover and correct some of the other myths, assumptions, and misconceptions
about enterprise search that prevail in the market today.
Myth
Web search and enterprise search are more or less the same.
Reality
This is the assumption behind the statement, “Why can’t I get a Google interface for my company?” The
quick answer is, “because you don’t actually want it.” The more appropriate question is, “Why can I find
what I’m looking for on Google more easily than I can on my own corporate website?” That is indeed a
problem, and it happens, ironically, when people try to solve it from a Web search perspective.
To Google’s credit, it has forced the enterprise search market to recognize that ease of use for the end user (and ease of use in general) deserves serious attention. Agreed. But the demands of the enterprise user are different, and enterprise source content is more complex than the content behind a public Web search.
Users expect a better answer to their question because they know their content more intimately. They are not looking for the most popular answer (the general logic behind Web search ranking); they want the right answer. Consequently, the enterprise ranking model is much more complex than the Web’s. It must weigh several parameters (term frequency, source freshness, spatial proximity, authority, etc.) to balance precision against recall, and that balance changes from application to application.
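One way to picture a ranking balance that "changes from application to application" is a weighted sum of signals whose weights are set per application. The sketch below is a simplified illustration, not any vendor's model: it assumes each signal is already normalized to [0, 1], and the weight profiles and documents are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Signals:
    """Per-document ranking signals, each assumed pre-normalized to [0, 1]."""
    term_frequency: float
    freshness: float
    proximity: float
    authority: float


def composite_score(s: Signals, weights: dict[str, float]) -> float:
    """Weighted sum of the signals; the weights are chosen per application."""
    return (weights["term_frequency"] * s.term_frequency
            + weights["freshness"] * s.freshness
            + weights["proximity"] * s.proximity
            + weights["authority"] * s.authority)


def rank(docs: dict[str, Signals], weights: dict[str, float]) -> list[str]:
    """Order document names from highest to lowest composite score."""
    return sorted(docs, key=lambda name: composite_score(docs[name], weights), reverse=True)


# Hypothetical weight profiles for two different applications.
LEGAL_DISCOVERY = {"term_frequency": 0.5, "freshness": 0.0, "proximity": 0.4, "authority": 0.1}
NEWS_PORTAL = {"term_frequency": 0.2, "freshness": 0.6, "proximity": 0.1, "authority": 0.1}

docs = {
    "old_memo": Signals(term_frequency=0.9, freshness=0.1, proximity=0.8, authority=0.5),
    "new_post": Signals(term_frequency=0.4, freshness=1.0, proximity=0.3, authority=0.5),
}
print(rank(docs, LEGAL_DISCOVERY))  # ['old_memo', 'new_post']
print(rank(docs, NEWS_PORTAL))      # ['new_post', 'old_memo']
```

The same two documents swap places when only the weights change, which is why no single ranking profile suits every enterprise application.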
Other factors add to this complexity. The reasons for searching are much more varied, and so are the data types. Data freshness (how soon does a new document appear in my search?) matters more. Security is an issue. A typical enterprise supports more content types, formats, and security layers than the entire Web.
Finally, enterprise search is often used in the context of a specific solution, for example uncovering legal risk, assessing product campaigns, or buying goods online. While it is common to start a request for information with a search query, unlike on the Web there is an expectation that further investigation will involve refinement: navigating through supplemental information such as facets and concepts. This more exploratory model, visualized through navigators, tag clouds, heat maps, and the like, is commonplace in enterprise search but not on the Web.
Myth
The Web has more content than any enterprise, so Web search companies are the real experts on
search.
Reality
As of summer 2008, Google’s index was just under 21 billion web pages and growing. Yahoo’s was actually larger, at around 55 billion.¹ If you take an average page size of 200 bytes,² then Google’s index was about 4.2 TB and Yahoo’s about 11 TB. While this is large, and perhaps larger than many enterprise search implementations, many enterprise search implementations are in the tens and hundreds of terabytes, and a few are now in the petabyte range. One might assume these corpora are larger only because their average document is larger. That is true for some implementations, but others contain nothing but email.
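The index-size estimates are a straight multiplication of page count by average page size. The sketch below reproduces them from the figures given in the text (page counts and the 200-byte average), using decimal terabytes (10^12 bytes), so either input can be varied.

```python
# Figures as given in the text (summer 2008); 200 bytes is the text's assumed average page size.
AVG_PAGE_BYTES = 200
INDEX_PAGES = {"Google": 21_000_000_000, "Yahoo": 55_000_000_000}


def index_size_tb(pages: int, avg_bytes: int = AVG_PAGE_BYTES) -> float:
    """Estimated raw index size in decimal terabytes (10**12 bytes)."""
    return pages * avg_bytes / 10**12


for engine, pages in INDEX_PAGES.items():
    print(f"{engine}: {index_size_tb(pages):.1f} TB")
# Google: 4.2 TB
# Yahoo: 11.0 TB
```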
In any case, when it comes to scale, it is easy to see who has the greater incentive for efficiency: the Web search companies purchase and host their own hardware, whereas enterprise search vendors must convince their customers to buy the hardware themselves.³
Myth
My relevancy model is better than your relevancy model.
Reality
Search technology, in some form or another, has been around since Lexis-Nexis first commercialized it in
the 1970s. In the 1990s, Google became the catalyst for carving out the web search business as a
separate entity, introducing ranking algorithms based on website popularity. It also legitimized the
importance of search as a strategic asset in the enterprise.
Until then there was little distinction between Web and enterprise search. All the models were more or less the same, based on matching the words typed into the search box against words in the indexed documents. But now that there was real money to be had, enterprise search vendors turned up the volume on competitive differentiation by touting the superiority of their relevancy models.
In hindsight, this was a surprising tactic, because the math behind relevancy is quite complex and hardly the stuff of debate for the search customer: keyword search, conceptual search, semantic search, scope search, TF/IDF, Bayesian probability, Boolean filtering, query expansion, NLP, text mining, and so on. One vendor proclaims that enterprise search is not rocket science (or brain surgery; the two are interchangeable): enterprise search is harder. Another vendor simply tells its customers it is too complex, so “leave it to the experts who know how to do this.”
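TF/IDF, one of the techniques in that list, is simple enough to show in a few lines. The sketch below is a textbook variant (raw term count times the log of inverse document frequency, whitespace tokenization) with hypothetical documents; it is not any vendor's relevancy model, and production engines add length normalization, stemming, and many other signals.

```python
import math
from collections import Counter


def tf_idf_scores(query: str, docs: dict[str, str]) -> dict[str, float]:
    """Score each document against the query with a basic TF-IDF sum.

    tf  = raw count of the query term in the document
    idf = log(N / df), where df is the number of documents containing the term
    """
    tokenized = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term appear at least once?
    df = Counter(term for tokens in tokenized.values() for term in set(tokens))
    scores = {}
    for doc_id, tokens in tokenized.items():
        tf = Counter(tokens)
        scores[doc_id] = sum(
            tf[term] * math.log(n_docs / df[term])
            for term in query.lower().split()
            if df[term]  # skip query terms that appear in no document
        )
    return scores


docs = {
    "a": "invoice overdue invoice payment",
    "b": "payment received thank you",
    "c": "quarterly sales report",
}
scores = tf_idf_scores("invoice payment", docs)
print(max(scores, key=scores.get))  # a
```

The rare term "invoice" contributes far more than the more common "payment", which is the core intuition behind IDF weighting.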
Yet, today, as in Time Before Google, the majority of customers are still not satisfied with the results
they get from their search engines. In their minds the quality has not really changed. You still hear, “Why
can’t I find information in my company as easily as I can find it on Google?”
The suggestion that one model is better than another is not so much wrong as moot. Every vendor’s model is good for some content, just not for all content. What makes a good enterprise search vendor is its ability to adapt to the context and character of the content and the application’s requirements by deploying an optimal combination of these different approaches.
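One published way to combine the output of several relevancy models is reciprocal rank fusion, swapped in here purely for illustration; the text does not say which combination method any vendor uses. The sketch assumes each model returns an ordered list of document IDs; the two input rankings are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists into one using reciprocal rank fusion.

    Each document earns 1 / (k + rank) from every list it appears in (ranks start at 1).
    k = 60 is a commonly used default that damps the influence of the very top positions.
    """
    fused = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    return sorted(fused, key=fused.get, reverse=True)


keyword_model = ["d3", "d1", "d7"]  # hypothetical output of a keyword (TF/IDF) ranker
concept_model = ["d1", "d9", "d3"]  # hypothetical output of a conceptual ranker
print(reciprocal_rank_fusion([keyword_model, concept_model]))  # ['d1', 'd3', 'd9', 'd7']
```

Documents that both models rank reasonably well rise to the top, so neither model has to be "better" for the combination to help.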
Finally, perhaps we’re all arguing about the wrong thing. Relevancy is important, but what of the user’s experience? Is the search technology touching the complete information landscape or just part of it? How are navigation and exploration accomplished? Does the technology act on the results, i.e., connect into business operations and trigger action? There is more to information access than search, and there is more to search than relevancy.
Myth
Manual facet management is easy and quick and offers a good user interface.
Reality
Facets (dimensions) of a search’s results can be used to help navigate to related information. The conventional approach to facet management requires defining the facets before indexing, as part of the search platform’s configuration. Some vendors provide a well-designed user interface to make the process as easy as possible. A typical example is the organization of facets on an electronics retailer’s e-commerce website. For productType = 'computer', I can declare my facets in this order: price, make, CPUs, memory, storage, slots, monitor. For laptops, I would add a piece of logic that says if portable='yes' then display the weight facet.
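Declared by hand, the configuration in this example amounts to a lookup table plus one conditional rule. The sketch below expresses it in Python; the dictionary layout and the `facets_for` function are hypothetical illustrations, not any vendor's configuration format.

```python
# Hypothetical facet configuration, declared by hand before indexing.
FACETS_BY_PRODUCT_TYPE = {
    "computer": ["price", "make", "CPUs", "memory", "storage", "slots", "monitor"],
}


def facets_for(product: dict) -> list[str]:
    """Return the ordered list of facets to display for a product record."""
    facets = list(FACETS_BY_PRODUCT_TYPE.get(product.get("productType"), []))
    # Hand-written rule: laptops (portable == 'yes') also get a weight facet.
    if product.get("portable") == "yes":
        facets.append("weight")
    return facets


desktop = {"productType": "computer", "portable": "no"}
laptop = {"productType": "computer", "portable": "yes"}
print(facets_for(desktop))  # ['price', 'make', 'CPUs', 'memory', 'storage', 'slots', 'monitor']
print(facets_for(laptop))   # the same list with 'weight' appended
```

Every new product type or attribute means another entry or rule in this table, which is why the approach scales poorly for a large, constantly changing catalog.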
The manual approach is not a problem if your objects have a fairly uniform structure (e.g., books). You can have millions of them, but the key is that they are all described the same way. But imagine a national retail outlet whose product catalog contains several hundred thousand different products, and further imagine that the catalog changes fairly constantly. The manual process is now a half-year project, and change management a major ongoing commitment.
A system that recommends facets automatically and on the fly for each query would remove all this work. The logic behind the ranking is not unlike the ranking of search results, involving a number of calculations that arrive at a composite score (e.g., sparse-matrix analysis, clustering, facet distribution). The algorithms should be smart enough to avoid situations where no facets appear because none is relevant enough to display, which can happen with sparse-matrix analysis alone when the content has minimal facet intersection.
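The text names sparse-matrix analysis, clustering, and facet distribution as inputs to such a score. The sketch below uses only a simple stand-in for facet distribution (coverage times normalized entropy) plus a fallback so the facet panel is never empty when some field can still split the results. The function name, scoring formula, thresholds, and sample records are all hypothetical.

```python
import math
from collections import Counter


def recommend_facets(results: list[dict], max_facets: int = 3, min_score: float = 0.2) -> list[str]:
    """Pick facet fields worth showing for one query's result set (hypothetical heuristic).

    score = coverage * evenness, where
      coverage = share of results that have a value for the field
      evenness = normalized entropy of the field's values (0 = one value, 1 = evenly split)
    """
    if not results:
        return []
    fields = {field for doc in results for field in doc}
    scored = []
    for field in fields:
        values = [doc[field] for doc in results if field in doc]
        coverage = len(values) / len(results)
        counts = Counter(values)
        if len(counts) < 2:
            evenness = 0.0  # a field with a single value cannot narrow the results
        else:
            probs = [c / len(values) for c in counts.values()]
            evenness = -sum(p * math.log(p) for p in probs) / math.log(len(counts))
        scored.append((coverage * evenness, field))
    scored.sort(reverse=True)
    chosen = [field for score, field in scored if score >= min_score]
    if not chosen:
        # Fallback: rather than show no facets at all, keep any field that splits the results.
        chosen = [field for score, field in scored if score > 0]
    return chosen[:max_facets]


results = [
    {"make": "Acme", "memory": "8GB", "color": "black"},
    {"make": "Acme", "memory": "16GB", "color": "black"},
    {"make": "Zed", "memory": "8GB", "color": "black"},
    {"make": "Zed", "memory": "16GB", "weight": "1.2kg"},
]
print(recommend_facets(results))  # ['memory', 'make']
```

Here "color" has only one value and "weight" covers a single result, so neither is offered. Note that a field with a unique value per document (an ID, say) would also score highly under this formula; a real system would need to penalize such fields.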
Myth
A simple database query is all a search engine needs to extract content from a database. Only the relational database can truly support ad hoc structured querying.
Reality
We challenge this assumption by comparing both relational and search engine technologies, with the
goal of proposing a hybrid solution that reflects the advantages of both. Let’s take a look at the
relational model first.
The relational model, you may recall, was originally designed for managing the transactional integrity of
inputting information and for its efficient retrieval through predefined, repetitive reporting. It works
because the database schema is designed specifically for the structure of the data and the shape of the
reports.
But the market began demanding a more ad hoc approach to querying its data for what-if analysis and general exploration. In this situation, the query is no longer repetitive or known in advance, and therefore cannot be planned for in the database engine.
The ad hoc query does not sit well with the basic relational model. Any relationship created a priori (all
relationships in a database schema) will bias for some queries and against others. Since you do not know
the query being asked, you do not know which side it will fall on. It is quite possible to create a “killer
query” that brings the database engine to a screeching halt.
Attempts to solve this problem have resulted in a continuous evolution of the relational model, twisting
it in various ways to provide better performance and greater flexibility (more ad hoc). Technologies
include data marts and star schemas, software and hardware data warehouses (e.g., Teradata, Netezza),
cubes, and vertical indexing technologies (e.g., Vertica, Sybase IQ).
The underlying problem is still there, however. All these technologies still view the problem from a
traditional table-column-relationship point of view, and this is inherently limiting. It does not mean we
abandon SQL or the need for the relational model to manage transactional data entry and fixed
reporting, but it does suggest we should rethink how the basic engine works for the optimization of
rapid, high volume, ad hoc information retrieval.
What might this new engine look like? Search indexes provide an interesting approach. They are certainly designed for this type of problem. Google, for example, answers millions of queries a day against billions of documents, each in less than a few seconds. No database technology comes close to this type of performance.
But then Google does not have to deal with cardinal (one-to-many) relationships. It does not have to support the SQL JOIN statement, which is the cornerstone of both reporting and ad hoc querying. For example, we may have a hundred invoices for a customer. In a relational database, that amounts to 101 rows in two tables: one row in the customer table and a hundred in the invoice table. The customer data is stored once but referenced a hundred times. If we want to return all the invoices for a particular customer, or all the customers with invoices greater than a certain amount, the JOIN statement is used to exploit the relationship between the customer and invoice tables.
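Using the same customer and invoice example, the sketch below runs both queries against an in-memory SQLite database from Python's standard library. The schema, names, and amounts are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, region TEXT);
    CREATE TABLE invoice (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customer(id),
        amount REAL
    );
""")
# One customer row, referenced by a hundred invoice rows: 101 rows in two tables.
conn.execute("INSERT INTO customer VALUES (1, 'Acme Corp', 'East')")
conn.executemany(
    "INSERT INTO invoice (customer_id, amount) VALUES (?, ?)",
    [(1, 100.0 * n) for n in range(1, 101)],
)

# All the invoices for a particular customer.
acme_invoices = conn.execute("""
    SELECT i.id, i.amount
    FROM invoice AS i JOIN customer AS c ON c.id = i.customer_id
    WHERE c.name = 'Acme Corp'
""").fetchall()

# All the customers with at least one invoice greater than a given amount.
big_spenders = conn.execute("""
    SELECT DISTINCT c.name
    FROM customer AS c JOIN invoice AS i ON i.customer_id = c.id
    WHERE i.amount > ?
""", (9500,)).fetchall()

print(len(acme_invoices))  # 100
print(big_spenders)        # [('Acme Corp',)]
```

The customer's name and region are stored once; the JOIN reassembles the relationship at query time.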
The approach conventional search vendors use to extract content from a relational database is to execute a SQL query against the database, returning a result set of uniform shape that is then indexed. If different data or a different result set is needed, a new query is defined, the search configuration is changed, and the content is re-indexed. It works this way because search technologies simply do not understand the relational concept.
There are many problems with this model. First, the data is “flattened”, meaning all cardinality is
removed by repeating content in each result set row. In our example, a flattened result set would
include the customer’s properties in every invoice. An updated customer record would require an
update to each one of its invoices in the search index.
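A short sketch makes the cost of flattening visible: a single change to one customer record turns into a rewrite of every invoice document that embeds it. The document layout and the `update_customer` helper are hypothetical; a real engine would re-index those documents rather than mutate dictionaries in place.

```python
# A flattened result set: one search document per invoice row, with the
# customer's properties copied into every one of them. (Hypothetical data.)
customer = {"customer_id": 1, "customer_name": "Acme Corp", "region": "East"}
invoices = [{"invoice_id": n, "amount": 100.0 * n} for n in range(1, 101)]

flattened_index = {inv["invoice_id"]: {**customer, **inv} for inv in invoices}


def update_customer(index: dict, customer_id: int, **changes) -> int:
    """Apply a customer change to a flattened index; return how many documents were rewritten."""
    touched = 0
    for doc in index.values():
        if doc["customer_id"] == customer_id:
            doc.update(changes)
            touched += 1
    return touched


print(update_customer(flattened_index, 1, region="West"))  # 100
```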
Second, there are no real ad hoc capabilities here. You must know beforehand how your users will
explore the database content because you have to predetermine the shape of the results. But often you
don’t know what your next question will be until you see the answer to the first.
Finally, it is now impossible to JOIN content as a database engine does. A JOIN is not like a search: it pairs every record in one set with every record in another set that shares a common property value (in effect, a filtered Cartesian product of the two sets).
This does not need to be so. The rapid, high-volume, pure ad hoc querying capability of the search index is still valid, but the architecture needs to be enhanced to retain the integrity of the cardinal relationship from the database source. If the search engine were augmented to ingest each table’s rows individually, for all the tables in the database, then it would be possible (with some clever work on the vendor’s part) to support a JOIN executed on the fly at query time.
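A toy version of the idea, assuming each database row becomes its own document tagged with its source table: a query-time hash join then pairs documents from two result sets on a shared property value, with no flattening and no re-indexing. The index layout and the `search` and `join` helpers are hypothetical and in-memory; they illustrate the shape of the approach, not how any particular engine implements it.

```python
from collections import defaultdict

# Hypothetical index: every database row is ingested as its own document,
# tagged with the table it came from. Nothing is flattened.
index = [
    {"table": "customer", "id": 1, "name": "Acme Corp"},
    {"table": "customer", "id": 2, "name": "Globex"},
    {"table": "invoice", "id": 10, "customer_id": 1, "amount": 900.0},
    {"table": "invoice", "id": 11, "customer_id": 1, "amount": 40.0},
    {"table": "invoice", "id": 12, "customer_id": 2, "amount": 1200.0},
]


def search(table, predicate=lambda doc: True):
    """Stand-in for a search query restricted to one table's documents."""
    return [doc for doc in index if doc["table"] == table and predicate(doc)]


def join(left, right, left_key, right_key):
    """Hash join two result sets on a shared property value, computed at query time."""
    by_key = defaultdict(list)
    for right_doc in right:
        by_key[right_doc[right_key]].append(right_doc)
    return [(left_doc, right_doc)
            for left_doc in left
            for right_doc in by_key.get(left_doc[left_key], [])]


# "Customers with invoices greater than 500", answered without re-indexing.
big_invoices = search("invoice", lambda d: d["amount"] > 500)
pairs = join(search("customer"), big_invoices, "id", "customer_id")
print(sorted((c["name"], i["amount"]) for c, i in pairs))
# [('Acme Corp', 900.0), ('Globex', 1200.0)]
```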
By the way, because the index contains both structured and unstructured content, the JOIN could be
between a table, email, and a set of documents. Further, since this is a search environment, “fuzzy
JOINs” are possible that capitalize on standard capabilities such as spell correction and synonym
expansion.
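As a rough illustration, the sketch below approximates a fuzzy JOIN between a customer table and email records using two standard-library stand-ins: a hand-written synonym table (for synonym expansion) and `difflib.get_close_matches` (for spell correction). The synonym table, similarity cutoff, and sample records are hypothetical; a search engine would use its own analyzers and dictionaries.

```python
import difflib
from collections import defaultdict

# Hypothetical synonym table, like one an engine might already use for query expansion.
SYNONYMS = {"ibm": "international business machines"}


def normalize(name: str) -> str:
    """Lower-case, trim, and map known synonyms to one canonical form."""
    key = name.lower().strip()
    return SYNONYMS.get(key, key)


def fuzzy_join(left, right, key, cutoff=0.85):
    """Pair records whose key values match after synonym mapping or close spelling."""
    # Group right-hand records by their normalized key value.
    by_key = defaultdict(list)
    for rec in right:
        by_key[normalize(rec[key])].append(rec)
    pairs = []
    for rec in left:
        # Spell-correction stand-in: the closest normalized key above the similarity cutoff.
        match = difflib.get_close_matches(normalize(rec[key]), list(by_key), n=1, cutoff=cutoff)
        if match:
            pairs.extend((rec, other) for other in by_key[match[0]])
    return pairs


customers = [{"company": "Microsoft"}, {"company": "IBM"}]
emails = [
    {"company": "Microsfot", "subject": "Renewal"},  # misspelled
    {"company": "International Business Machines", "subject": "RFP"},  # synonym
    {"company": "Oracle", "subject": "Hello"},  # no matching customer
]
pairs = fuzzy_join(customers, emails, "company")
print([(c["company"], e["subject"]) for c, e in pairs])
# [('Microsoft', 'Renewal'), ('IBM', 'RFP')]
```

An exact-match JOIN would have found neither pair.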
NEW DEVELOPMENTS IN SEARCH: INFORMATION ACCESS
While development in enterprise search and Web search continues, a new category called unified information access (UIA) is beginning to gain market traction. UIA extends enterprise search capabilities across all types of documents, data, and media. This expanded scope replaces legacy enterprise search, offering all its functionality while adding simple access to data and media. The advantages include being able to assemble all relevant information with one query, connecting content with related data, and searching data with a simple search query instead of a structured query language and formal reports. UIA can co-exist with search or replace it outright. For more information about UIA, please visit www.attivio.com.
ABOUT ATTIVIO
Attivio’s Active Intelligence Engine® (AIE) redefines the business impact of our customers’ information assets, so they can quickly seize opportunities, solve critical challenges, and fulfill their strategic vision. Attivio correlates disparate silos of structured data and unstructured content in ways never before possible. Offering both intuitive search capabilities and the power of SQL, AIE seamlessly integrates with existing BI and big data tools to reveal insight that matters, through the access method that best suits each user’s technical skills and priorities. Please visit us at www.attivio.com.
Attivio, Inc. • 275 Grove Street • Newton, MA 02466 USA
o +1.857.226.5040 • f +1.857.226.5072 • info@attivio.com • www.attivio.com
© 2013 Attivio, Inc. All rights reserved. Attivio, Active Intelligence Engine, and all other related logos and product names are registered
trademarks of Attivio. All other company, product, and service names are the property of their respective holders.
FOOTNOTES
1. Source: http://www.worldwidewebsize.com/.
2. Source: http://www.websiteoptimization.com/speed/tweak/average-web-page/.
3. Although adopting a SaaS model gets around this, the cost is still there. It’s just buried in the monthly fee.

More Related Content

More from Attivio

IDC Report - Unified Information Access on a Solid Search Base
IDC Report - Unified Information Access on a Solid Search BaseIDC Report - Unified Information Access on a Solid Search Base
IDC Report - Unified Information Access on a Solid Search Base
Attivio
 

More from Attivio (20)

Attivio Predictions 2017
Attivio Predictions 2017Attivio Predictions 2017
Attivio Predictions 2017
 
Achieving Compliance Efficiencies Amid Heightened Regulatory Scrutiny
Achieving Compliance Efficiencies Amid Heightened Regulatory ScrutinyAchieving Compliance Efficiencies Amid Heightened Regulatory Scrutiny
Achieving Compliance Efficiencies Amid Heightened Regulatory Scrutiny
 
Attivio Survey of Big Data Decision Makers
Attivio Survey of Big Data Decision MakersAttivio Survey of Big Data Decision Makers
Attivio Survey of Big Data Decision Makers
 
Reduce Risk and Protect Brand Value Through Proactive Monitoring and Compliance
Reduce Risk and Protect Brand Value Through Proactive Monitoring and Compliance Reduce Risk and Protect Brand Value Through Proactive Monitoring and Compliance
Reduce Risk and Protect Brand Value Through Proactive Monitoring and Compliance
 
IDC Report - Unified Information Access on a Solid Search Base
IDC Report - Unified Information Access on a Solid Search BaseIDC Report - Unified Information Access on a Solid Search Base
IDC Report - Unified Information Access on a Solid Search Base
 
TDWI Best Practices Report- Achieving Greater Agility with Business Intellige...
TDWI Best Practices Report- Achieving Greater Agility with Business Intellige...TDWI Best Practices Report- Achieving Greater Agility with Business Intellige...
TDWI Best Practices Report- Achieving Greater Agility with Business Intellige...
 
Attivio Customer Success Story - USGA
Attivio Customer Success Story - USGAAttivio Customer Success Story - USGA
Attivio Customer Success Story - USGA
 
Attivio Customer Success Story - UBS Neo
Attivio Customer Success Story - UBS NeoAttivio Customer Success Story - UBS Neo
Attivio Customer Success Story - UBS Neo
 
Attivio Customer Success Story - Durkheim Project
Attivio Customer Success Story - Durkheim Project Attivio Customer Success Story - Durkheim Project
Attivio Customer Success Story - Durkheim Project
 
Attivio Customer Success Story - Citi
Attivio Customer Success Story - CitiAttivio Customer Success Story - Citi
Attivio Customer Success Story - Citi
 
Accelerate Data Discovery
Accelerate Data Discovery   Accelerate Data Discovery
Accelerate Data Discovery
 
Customer Retention & Upsell Solution Brief
Customer Retention & Upsell Solution BriefCustomer Retention & Upsell Solution Brief
Customer Retention & Upsell Solution Brief
 
eCommunications Surveillance Solution Brief
eCommunications Surveillance Solution Brief eCommunications Surveillance Solution Brief
eCommunications Surveillance Solution Brief
 
Attivio Product and Company Overview
Attivio Product and Company OverviewAttivio Product and Company Overview
Attivio Product and Company Overview
 
Attivio Active Security Technical Brief
Attivio Active Security Technical BriefAttivio Active Security Technical Brief
Attivio Active Security Technical Brief
 
Attivio Company Brochure
Attivio Company BrochureAttivio Company Brochure
Attivio Company Brochure
 
Attivio Customer Success Story - UBS Neo Customer Retention & Upsell
Attivio Customer Success Story - UBS Neo Customer Retention & UpsellAttivio Customer Success Story - UBS Neo Customer Retention & Upsell
Attivio Customer Success Story - UBS Neo Customer Retention & Upsell
 
Attivio Customer Success Story - National Instruments Search & Discovery
Attivio Customer Success Story - National Instruments Search & DiscoveryAttivio Customer Success Story - National Instruments Search & Discovery
Attivio Customer Success Story - National Instruments Search & Discovery
 
Attivio Customer Success Story - Information Services Search & Discovery
Attivio Customer Success Story - Information Services Search & DiscoveryAttivio Customer Success Story - Information Services Search & Discovery
Attivio Customer Success Story - Information Services Search & Discovery
 
Attivio Customer Success Story - Global Technology Company Search & Discovery
Attivio Customer Success Story - Global Technology Company Search & DiscoveryAttivio Customer Success Story - Global Technology Company Search & Discovery
Attivio Customer Success Story - Global Technology Company Search & Discovery
 

Recently uploaded

Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
amitlee9823
 
CHEAP Call Girls in Rabindra Nagar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Rabindra Nagar  (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Rabindra Nagar  (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Rabindra Nagar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
ZurliaSoop
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
amitlee9823
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
amitlee9823
 
➥🔝 7737669865 🔝▻ Sambalpur Call-girls in Women Seeking Men 🔝Sambalpur🔝 Esc...
➥🔝 7737669865 🔝▻ Sambalpur Call-girls in Women Seeking Men  🔝Sambalpur🔝   Esc...➥🔝 7737669865 🔝▻ Sambalpur Call-girls in Women Seeking Men  🔝Sambalpur🔝   Esc...
➥🔝 7737669865 🔝▻ Sambalpur Call-girls in Women Seeking Men 🔝Sambalpur🔝 Esc...
amitlee9823
 
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
amitlee9823
 
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
amitlee9823
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
amitlee9823
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
amitlee9823
 
Just Call Vip call girls Erode Escorts ☎️9352988975 Two shot with one girl (E...
Just Call Vip call girls Erode Escorts ☎️9352988975 Two shot with one girl (E...Just Call Vip call girls Erode Escorts ☎️9352988975 Two shot with one girl (E...
Just Call Vip call girls Erode Escorts ☎️9352988975 Two shot with one girl (E...
gajnagarg
 
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night StandCall Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 

Recently uploaded (20)

(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
CHEAP Call Girls in Rabindra Nagar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Rabindra Nagar  (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Rabindra Nagar  (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Rabindra Nagar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Predicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science ProjectPredicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science Project
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
➥🔝 7737669865 🔝▻ Sambalpur Call-girls in Women Seeking Men 🔝Sambalpur🔝 Esc...
➥🔝 7737669865 🔝▻ Sambalpur Call-girls in Women Seeking Men  🔝Sambalpur🔝   Esc...➥🔝 7737669865 🔝▻ Sambalpur Call-girls in Women Seeking Men  🔝Sambalpur🔝   Esc...
➥🔝 7737669865 🔝▻ Sambalpur Call-girls in Women Seeking Men 🔝Sambalpur🔝 Esc...
 
Anomaly detection and data imputation within time series
Anomaly detection and data imputation within time seriesAnomaly detection and data imputation within time series
Anomaly detection and data imputation within time series
 
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
 
Just Call Vip call girls Erode Escorts ☎️9352988975 Two shot with one girl (E...
Just Call Vip call girls Erode Escorts ☎️9352988975 Two shot with one girl (E...Just Call Vip call girls Erode Escorts ☎️9352988975 Two shot with one girl (E...
Just Call Vip call girls Erode Escorts ☎️9352988975 Two shot with one girl (E...
 
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night StandCall Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
 

Enterprise Search Myths and Reality - Attivio White Paper

  • 1. Enterprise Search Myths and Realities
  • 2. 2 Myths in Enterprise Search Many people think that since enterprise search has been around for years, all its issues should have been solved and its complexity dramatically reduced. In other words, it ought to be a commodity by now. Although search technologies continue to advance, the origin of the complexity is the nature of language itself – full of ambiguity, contradiction, multiple meanings, and many contexts. The digital search for what you want to know involves a balance between precision (the measure of the usefulness or a result) and recall (the measure of the completeness of the result). Increasing precision without sacrificing recall is a complicated balance to strike, and the commoditization of this balance is one of the myths not currently in line with reality. COMMON SEARCH MYTHS In this document we uncover and correct some of the other myths, assumptions, and misconceptions about enterprise search that prevail in the market today. Myth Web search and Enterprise search are more or less the same. Reality This is the assumption behind the statement, “Why can’t I get a Google interface for my company?” The quick answer is, “because you don’t actually want it.” The more appropriate question is, “Why can I find what I’m looking for on Google more easily than I can on my own corporate website?” That is indeed a problem, and it happens, ironically, when people try to solve it from a Web search perspective. To Google’s credit, they have been successful in forcing the enterprise search market to realize that ease of use for the end user (and ease of use in general) should be given the serious attention it deserves. Agreed. But the demands of the enterprise user are different and the complexities of source content are greater than a public search of the Web. Users expect a better answer to their question because they are more intimate with their content. 
They are not looking for the most popular answer (the general logic behind ranking for Web search); they want the right answer. Consequently, the ranking model for the enterprise is much more complex than for the Web. It must weigh several parameters (term frequency, source freshness, spatial proximity, authority, etc.) in balancing precision against recall, and that balance changes from application to application. Add to this further complexity: the reasons for searching are much more varied, data types are much more varied, data freshness (how soon does a new document appear in my search?) is more important, and security is an issue. A typical enterprise supports more content types, formats, and security layers than the entire Web.
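To make the idea of a composite, application-dependent ranking concrete, here is a minimal Python sketch. It assumes each signal (term frequency, freshness, proximity, authority) has already been normalized to the range 0 to 1 and that signals are combined as a simple weighted sum; the two weight profiles, the document IDs, and the relevance judgments are hypothetical, and real engines combine signals in far more elaborate ways. Precision and recall use their standard information-retrieval definitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    doc_id: str
    # Ranking signals, each assumed to be pre-normalized to [0, 1].
    term_freq: float
    freshness: float
    proximity: float
    authority: float


# Hypothetical weight profiles: the same signals, balanced differently per application.
PROFILES = {
    "legal_discovery": {"term_freq": 0.6, "freshness": 0.0, "proximity": 0.3, "authority": 0.1},
    "intranet_news": {"term_freq": 0.2, "freshness": 0.6, "proximity": 0.0, "authority": 0.2},
}


def composite_score(doc: Doc, weights: dict[str, float]) -> float:
    """Weighted sum of the document's ranking signals."""
    return sum(weight * getattr(doc, signal) for signal, weight in weights.items())


def rank(docs: list[Doc], profile: str) -> list[Doc]:
    """Order documents by composite score under the named profile, best first."""
    weights = PROFILES[profile]
    return sorted(docs, key=lambda d: composite_score(d, weights), reverse=True)


def precision_recall(retrieved: list[str], relevant: set[str]) -> tuple[float, float]:
    """Precision: share of retrieved results that are relevant.
    Recall: share of relevant results that were retrieved."""
    retrieved_set = set(retrieved)
    hits = len(retrieved_set & relevant)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


docs = [
    Doc("contract-v1", term_freq=0.9, freshness=0.1, proximity=0.8, authority=0.5),
    Doc("memo-today", term_freq=0.3, freshness=1.0, proximity=0.2, authority=0.4),
    Doc("policy-2009", term_freq=0.5, freshness=0.2, proximity=0.5, authority=0.9),
]
relevant_for_legal = {"contract-v1", "policy-2009"}  # hypothetical relevance judgments

for profile in PROFILES:
    top2 = [d.doc_id for d in rank(docs, profile)[:2]]
    print(profile, top2, precision_recall(top2, relevant_for_legal))
```

The same three documents come back in a different order under each profile. Judged against the legal application's relevant set, the legal profile's top two results score 1.0 precision and 1.0 recall, while the news profile's top two score 0.5 on both: the weighting has to suit the application.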
Finally, search in the enterprise is often in the context of a specific solution, for example uncovering legal risk, assessing product campaigns, or buying goods online. While it is common to start a request for information with a search query, unlike on the Web there is an expectation that further investigation will involve refinement: navigating through supplemental information such as facets and concepts. This more exploratory model, visualized through navigators, tag clouds, heat maps, and the like, is commonplace in enterprise search but not on the Web.

Myth
The Web has more content than any enterprise, so Web search companies are the real experts on search.

Reality
As of summer 2008, Google’s index was just under 21 billion web pages and growing. Yahoo’s was actually higher, at around 55 billion.[1] If you take an average page size of 200 bytes,[2] then Google’s index was about 4.2 TB and Yahoo’s about 11 TB. While this is large, and perhaps larger than many enterprise search implementations, there are many enterprise search implementations in the tens and hundreds of terabytes, and a few now in the petabyte range. One might assume the corpus is larger only because the average document is also larger. That is true for some implementations, but others contain just email. In any case, when it comes to scale, consider who has the stronger incentive for efficiency: the Web search companies purchase and host their own hardware, while enterprise search vendors must convince their customers to buy the hardware themselves.[3]

Myth
My relevancy model is better than your relevancy model.

Reality
Search technology, in some form or another, has been around since Lexis-Nexis first commercialized it in the 1970s. In the 1990s, Google became the catalyst for carving out Web search as a separate business, introducing ranking algorithms based on website popularity.
It also legitimized the importance of search as a strategic asset in the enterprise. Until then there was little distinction between Web and enterprise search. All the models were more or less the same, based on matching words typed into the search box against words in documents residing in the index. But now that there was real money to be had, enterprise search vendors turned up the volume on competitive differentiation by touting the superiority of their relevancy models. In hindsight, this was a surprising tactic, because the math behind relevancy is quite complex and hardly the stuff of debate for the search customer: keyword search, conceptual search, semantic search, scope search, TF/IDF, Bayesian probability, Boolean filtering, query expansion, NLP, text mining, and so on. One vendor proclaims that enterprise search is not rocket science (or brain surgery; the two are interchangeable): enterprise search is harder. Another vendor simply tells its customers it’s too complex, so “leave it to the experts who know how to do this.” Yet today, as in Time Before Google, the majority of customers are still not satisfied with the results they get from their search engines. In their minds the quality has not really changed. You still hear, “Why can’t I find information in my company as easily as I can find it on Google?”

The suggestion that one model is better than another is not so much wrong as moot. Every vendor’s model is good for some content, just not for all content. What makes a good enterprise search vendor is its ability to adapt to the context and character of the content and the application requirements by deploying an optimal combination of all these different approaches.

Finally, perhaps we’re all arguing about the wrong thing. Relevancy is important, but what of the user’s experience? Is the search technology touching the complete information landscape or just part of it? How are navigation and exploration accomplished? Does the technology act on the results, that is, connect into business operations and trigger action? There is more to information access than search, and there is more to search than relevancy.

Myth
Manual facet management is easy and quick and offers a good user interface.

Reality
Facets, or dimensions, of search results can be used to help users navigate to related information. The conventional approach to facet management requires defining the facets before indexing, as part of the search platform’s configuration. Some vendors provide a well-designed user interface to make the process as easy as possible. A typical example might be the organization of facets on an online retailer’s e-commerce website. For productType = 'computer', I can declare my facets in this order: price, make, CPUs, memory, storage, slots, monitor.
For laptops, I would add a piece of logic that says if portable = 'yes', then display the weight facet. The manual approach is not a problem if your objects have a fairly uniform structure (e.g., books). You can have millions of them, but the key is that they are all described the same way. But imagine a national retailer whose product catalog contains several hundred thousand different products, and imagine further that the catalog changes almost constantly. The manual process is now a half-year project, and change management a major ongoing commitment. A system that recommends facets automatically, on the fly, for each query would remove all this work. The logic behind such facet ranking is not unlike the ranking of search results, involving a number of calculations to arrive at a composite score (e.g., sparse-matrix analysis, clustering, facet distribution). The algorithms should be smart enough to avoid situations where no facets appear because none are relevant enough to display (as can happen with sparse-matrix analysis alone). This can happen with content that has minimal facet intersection.
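The contrast between the two approaches can be sketched in Python. The first function hard-codes the computer-facet rule described above; the second is a hypothetical stand-in for automatic recommendation that scores each attribute found in the current result set by its coverage (the share of results that have it) times its normalized value entropy (how evenly it splits those results), and falls back to the best-scoring facets rather than showing none. The scoring formula, threshold, and field names are assumptions for illustration, not any vendor’s algorithm, and the sketch does not attempt the sparse-matrix or clustering analysis mentioned above.

```python
import math
from collections import Counter

# Manual approach: facet order declared up front for each product type.
MANUAL_FACETS = {
    "computer": ["price", "make", "CPUs", "memory", "storage", "slots", "monitor"],
}


def manual_facets(product_type: str, portable: bool = False) -> list[str]:
    facets = list(MANUAL_FACETS.get(product_type, []))
    if product_type == "computer" and portable:
        facets.append("weight")  # the hand-written laptop rule
    return facets


def recommend_facets(results: list[dict], threshold: float = 0.3, min_facets: int = 2) -> list[str]:
    """Hypothetical automatic facet ranking over the attributes in a result set."""
    if not results:
        return []
    values_by_field: dict[str, Counter] = {}
    for doc in results:
        for field, value in doc.items():
            values_by_field.setdefault(field, Counter())[value] += 1

    scored = []
    for field, counts in values_by_field.items():
        if len(counts) < 2:
            continue  # a single-valued attribute cannot narrow the results
        total = sum(counts.values())
        coverage = total / len(results)
        entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
        evenness = entropy / math.log(len(counts))  # normalized to [0, 1]
        scored.append((coverage * evenness, field))

    # Highest score first; ties broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    chosen = [field for score, field in scored if score >= threshold]
    if len(chosen) < min_facets:
        # Never return an empty facet panel when candidates exist.
        chosen = [field for _, field in scored[:min_facets]]
    return chosen


laptops = [
    {"make": "Acme", "memory": "8GB", "weight": "1.2kg"},
    {"make": "Acme", "memory": "16GB"},
    {"make": "Zenith", "memory": "8GB", "color": "red"},
    {"make": "Zenith", "memory": "16GB"},
]
mixed = [{"color": "red"}, {"color": "blue"}, {"size": "L"}, {"size": "M"},
         {"voltage": "110V"}, {"voltage": "220V"}, {"wattage": "60W"}, {"wattage": "100W"}]

print(manual_facets("computer", portable=True))
print(recommend_facets(laptops))
print(recommend_facets(mixed))
```

In the first result set, `weight` and `color` each appear on a single product and are skipped, so `make` and `memory` are recommended. In the second, no attribute covers enough of the results to pass the threshold, so the fallback still returns two facets instead of an empty panel.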
Myth
A simple database query is all a search engine needs to extract content from a database. Only the relational database can truly support ad hoc structured querying.

Reality
We challenge these assumptions by comparing relational and search engine technologies, with the goal of proposing a hybrid solution that reflects the advantages of both. Let’s look at the relational model first.

The relational model, you may recall, was originally designed for managing the transactional integrity of data entry and for efficient retrieval through predefined, repetitive reporting. It works because the database schema is designed specifically for the structure of the data and the shape of the reports. But the market began demanding a more ad hoc approach to querying data, for what-if analysis and general exploration. In this situation the query is no longer repetitive or known in advance, and therefore cannot be planned for in the database engine.

The ad hoc query does not sit well with the basic relational model. Any relationship created a priori (as every relationship in a database schema is) will favor some queries and penalize others. Since you do not know in advance which query will be asked, you do not know which side it will fall on. It is quite possible to write a “killer query” that brings the database engine to a screeching halt. Attempts to solve this problem have resulted in a continuous evolution of the relational model, twisting it in various ways to provide better performance and greater flexibility (more ad hoc). Technologies include data marts and star schemas, software and hardware data warehouses (e.g., Teradata, Netezza), cubes, and column-oriented (“vertical”) databases (e.g., Vertica, Sybase IQ). The underlying problem is still there, however. All these technologies still view the problem from a traditional table-column-relationship point of view, and this is inherently limiting.
This does not mean we should abandon SQL or the relational model for managing transactional data entry and fixed reporting, but it does suggest we should rethink how the basic engine works to optimize rapid, high-volume, ad hoc information retrieval. What might this new engine look like? Search indexes provide an interesting approach; they are designed for exactly this type of problem. Google, for example, responds to millions of queries a day, searching through billions of documents, each query answered in less than a few seconds. No database technology comes close to this type of performance. But then Google does not have to deal with cardinal relationships (relationships with cardinality, such as one customer to many invoices). It does not have to support the SQL JOIN statement, the cornerstone of both reporting and ad hoc querying. For example, we may have a hundred invoices for a customer. In a relational database, that amounts to 101 rows in two tables: one row in the customer table and a hundred rows in the invoice table. The customer data is stored once but referenced a hundred times. If we want to return all the invoices for a particular customer, or all the customers with invoices greater than a certain amount, the JOIN statement exploits the relationship between the customer and invoice tables.

The approach conventional search vendors use to extract content from a relational database is to execute a SQL query against the database, returning a result set of uniform shape that is then indexed. If different data or a different result set is needed, a new query is defined, the search index is reconfigured, and the content is re-indexed. It works this way because these search technologies simply do not understand the relational concept.

There are several problems with this model. First, the data is “flattened”: all cardinality is removed by repeating content in each result-set row. In our example, a flattened result set would include the customer’s properties in every invoice, so an updated customer record requires an update to each of its invoices in the search index. Second, there are no real ad hoc capabilities here. You must know beforehand how your users will explore the database content, because you have to predetermine the shape of the results; yet often you don’t know what your next question will be until you see the answer to the first. Finally, it is now impossible to JOIN content as a database engine does. A JOIN is not like a search: it pairs every row in one set of content with every row in another that shares a common property value (conceptually, a Cartesian product filtered on that value).

This does not need to be so. The rapid, high-volume, pure ad hoc querying capability of the search index is still valid, but the architecture needs to be enhanced to retain the integrity of the cardinal relationships from the database source.
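The difference between a flattened index and one that keeps each table’s rows as separate documents can be sketched in Python, with plain dictionaries standing in for the index. The customer and invoice data, the `table` tag on each document, and the helper names are hypothetical; the join is an ordinary in-memory hash join shown only to illustrate the idea, not how any particular search engine implements it.

```python
# Hypothetical source tables: one customer, three invoices.
customers = [{"customer_id": 1, "name": "Acme Corp", "region": "East"}]
invoices = [
    {"invoice_id": 100, "customer_id": 1, "amount": 50},
    {"invoice_id": 101, "customer_id": 1, "amount": 100},
    {"invoice_id": 102, "customer_id": 1, "amount": 150},
]


def flatten(customers: list[dict], invoices: list[dict]) -> list[dict]:
    """Conventional approach: index the rows of one SQL result set,
    copying the customer's fields into every invoice document."""
    by_id = {c["customer_id"]: c for c in customers}
    return [{**inv, **by_id[inv["customer_id"]]} for inv in invoices]


def index_rows(**tables: list[dict]) -> list[dict]:
    """Alternative: index every row of every table as its own document."""
    return [{"table": name, **row} for name, rows in tables.items() for row in rows]


def query_time_join(index, left, right, key, where=lambda row: True):
    """Equi-join two tables' documents on `key` when the query runs.
    `where` filters the right-hand rows (e.g. invoices over an amount)."""
    buckets: dict = {}
    for doc in index:
        if doc["table"] == left:
            buckets.setdefault(doc[key], []).append(doc)
    return [
        (l_doc, r_doc)
        for r_doc in index
        if r_doc["table"] == right and where(r_doc)
        for l_doc in buckets.get(r_doc.get(key), [])
    ]


flat = flatten(customers, invoices)
# Number of indexed copies of the customer name that a rename would touch.
print(sum(doc["name"] == "Acme Corp" for doc in flat))

index = index_rows(customer=customers, invoice=invoices)
pairs = query_time_join(index, "customer", "invoice", "customer_id",
                        where=lambda inv: inv["amount"] > 60)
print([(c["name"], inv["invoice_id"]) for c, inv in pairs])
```

With the flattened documents, renaming the customer means rewriting all three invoice documents; with per-row documents, it means updating the single customer document, and every later join picks up the change. The `query_time_join` call corresponds roughly to a SQL JOIN of the customer and invoice tables on `customer_id` with `amount > 60` in the WHERE clause.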
If the search engine were augmented to ingest each table’s rows individually for all the tables in the database, then it would be possible (with some clever work on the vendor’s part) to support a JOIN statement executed on the fly at query time. And because the index contains both structured and unstructured content, the JOIN could be between a table, email, and a set of documents. Further, since this is a search environment, “fuzzy JOINs” are possible that capitalize on standard capabilities such as spell correction and synonym expansion.

NEW DEVELOPMENTS IN SEARCH: INFORMATION ACCESS

While development in enterprise search and Web search continues, a new category called unified information access (UIA) is beginning to gain market traction. Unified information access extends enterprise search capabilities across all types of documents, data, and media. This expanded scope can replace legacy enterprise search, offering all its functionality while adding simple access to data and media. The advantages include being able to assemble all relevant information with one query, connecting content and related data, and searching data with a simple search query instead of a structured query language and formal reports. UIA can co-exist with search or replace it outright. For more information about UIA, please visit www.attivio.com.
ABOUT ATTIVIO

Attivio’s Active Intelligence Engine® (AIE) redefines the business impact of our customers’ information assets, so they can quickly seize opportunities, solve critical challenges, and fulfill their strategic vision. Attivio correlates disparate silos of structured data and unstructured content in ways never before possible. Offering both intuitive search capabilities and the power of SQL, AIE seamlessly integrates with existing BI and big data tools to reveal insight that matters, through the access method that best suits each user’s technical skills and priorities. Please visit us at www.attivio.com.

Attivio, Inc. • 275 Grove Street • Newton, MA 02466 USA
o +1.857.226.5040 • f +1.857.226.5072 • info@attivio.com • www.attivio.com

© 2013 Attivio, Inc. All rights reserved. Attivio, Active Intelligence Engine, and all other related logos and product names are registered trademarks of Attivio. All other company, product, and service names are the property of their respective holders.

FOOTNOTES
1. Source: http://www.worldwidewebsize.com/.
2. Source: http://www.websiteoptimization.com/speed/tweak/average-web-page/.
3. Although adopting a SaaS model gets around this, the cost is still there. It’s just buried in the monthly fee.