This document provides an overview of LexisNexis' IP products: IP Data Direct, TotalPatent, PatentOptimizer, and Patent Advisor. IP Data Direct is a patent search API that delivers search results and patent documents. TotalPatent allows users to research patents for activities like competitive intelligence, prior art searches, and prosecution. PatentOptimizer is a tool that helps draft stronger patents and identify weaknesses. Patent Advisor provides information on pending US patent applications to help manage portfolios and prosecution costs. The document outlines the benefits and capabilities of each product.
The document describes Intellixir, a web system for technology and competitive intelligence. It was originally created in 1997 by the French Atomic Agency and Intellixir was founded in 2002. Intellixir collects and analyzes data from multiple sources to provide insights that drive innovation for its over 60 business clients. The system allows users to collect, consolidate, analyze, collaborate on, and share data and insights through dashboards and reports. It provides capabilities for visualizing data through graphs and statistics.
1. Boehringer Ingelheim Pharma GmbH & Co. KG's Scientific Information Center developed its own web crawler called SEARCHCORPORA to access information not available on public search engines, including university spin-offs, competitor activities, and internal company databases.
2. SEARCHCORPORA allows the Scientific Information Center to build custom searchable indexes of targeted websites and documents to help identify new technology opportunities and monitor competitors. Automatic alerts of relevant news can also be configured.
3. The Scientific Information Center implements a workflow to offer SEARCHCORPORA services to customers, including specifying project scopes, crawling and analyzing information, and providing a search interface and scheduled updates. Future plans include expanding the
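The crawl-index-search workflow in the steps above can be sketched in miniature. This is an illustrative toy with hard-coded stand-in pages (SEARCHCORPORA's internals are not public), not the actual implementation:

```python
# Illustrative crawl-index-search-alert sketch; pages are stand-ins.
from collections import defaultdict

def build_index(pages):
    """Build an inverted index: term -> set of page URLs."""
    index = defaultdict(set)
    for url, text in pages.items():
        for term in text.lower().split():
            index[term].add(url)
    return index

def search(index, query):
    """Return URLs containing every query term (AND semantics)."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = index.get(terms[0], set()).copy()
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

def check_alerts(index, alert_queries, seen):
    """Report newly matching pages for each saved alert query."""
    hits = {}
    for q in alert_queries:
        new = search(index, q) - seen
        if new:
            hits[q] = new
    return hits

pages = {
    "https://example.org/spinoff": "university spinoff develops novel catalyst",
    "https://example.org/news": "competitor announces new catalyst plant",
}
index = build_index(pages)
print(search(index, "catalyst"))  # both URLs match
```

A scheduled update would amount to re-crawling the target sites, rebuilding the index, and running `check_alerts` against the previously seen result set.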
The document summarizes a vendor presentation about updates to their patent database. It notes that the database now contains over 640,000 documents and 290 million sequences, including over 55,000 Chinese documents and 43,000 new Canadian documents. A new full text search capability was also introduced, allowing combination of sequence and text searches for life science applications. The presentation invites attendees to a live demo and talk on Tuesday at 4pm to learn more.
The document summarizes recent updates and enhancements to PatBase, a patent search and analytics platform. Key updates include expanded full text patent coverage for several countries including Russia, Spain, Colombia, and Israel. Enhancements improve the speed of Chinese patent data updates, expand machine translations, and introduce new tools for assignee analysis and monitoring legal status changes. New modules allow for customized alerts, mobile access, and integration of third-party data.
VantagePoint is analysis software that transforms structured text data into actionable intelligence through industry-leading tools for importing, cleaning, analyzing, and reporting data. The latest version of VantagePoint features improved performance and usability, new features and visualizations, as well as an enhanced automation tool called Super Profile that allows users to create customizable, flexible, repeatable and distributable data tables with one click.
This document discusses open source tools for graph and map visualization. It begins with an agenda that includes open source graphs, maps, and demos of the AklaBox and Thermolabo platforms. It then covers various open source mapping tools like OpenStreetMap, and charting/graphing tools like FusionCharts, JFreeChart, Google Charts, and BIRT charts. Statistical graphing tools like R, Weka, and R Shiny are also mentioned. The document demonstrates some maps and graphs as examples. It concludes with discussions of how AklaBox and Thermolabo integrate graphs and maps and how Thermolabo is transforming temperature monitoring data into valuable decision information.
Slalom Consulting is a business and technology consulting firm with over 2,700 consultants across 16 offices in North America and London. Their primary service areas include data visualization, customer and marketing analytics, predictive modeling, data mining, Alteryx, and technical architecture. The document discusses data science and analytical methods like reporting/visualization, market basket analysis, customer lifetime value, and attrition analysis that Slalom utilizes to provide insights for their clients.
This document discusses using Elasticsearch to store and analyze patent data. It describes how Elasticsearch provides a document store, full text search engine, and real-time analytics capabilities. Examples are given of using Elasticsearch for a patent document store, patent search engine, log store, and real-time log analysis of patent data. The document concludes by thanking the audience.
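As a rough illustration of how such a patent search could be expressed, the sketch below builds an Elasticsearch-style query body combining a full-text match with a date filter. The field names (`abstract`, `publication_date`) and index name are assumptions for illustration, not the presenter's actual schema:

```python
# Sketch of an Elasticsearch bool query for a patent document store.
# Field names are assumed; a real index would define its own mapping.
import json

def patent_query(text, date_from=None, size=10):
    """Build a bool query: full-text match on the abstract,
    optionally filtered by publication date."""
    query = {
        "size": size,
        "query": {
            "bool": {
                "must": [{"match": {"abstract": text}}],
            }
        },
    }
    if date_from:
        query["query"]["bool"]["filter"] = [
            {"range": {"publication_date": {"gte": date_from}}}
        ]
    return query

body = patent_query("lithium battery electrode", date_from="2010-01-01")
print(json.dumps(body, indent=2))
# With the official Python client this would be sent as, e.g.:
#   es.search(index="patents", body=body)
```

Placing the date clause in `filter` rather than `must` keeps it out of relevance scoring, which is the usual idiom for exact constraints.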
The document provides information about Search Technologies, a leading independent IT services firm specializing in enterprise search and big data search solutions. It details their expertise in Microsoft Search, Google Search Appliance, and open source search technologies. It also describes their content processing framework Aspire, connectors for integrating various content sources, and query processing language QPL.
The document discusses text mining and how it has come of age by providing speed to insight using real-time and temporal data. It highlights challenges like the increasing amounts of unstructured data from different sources and formats. Examples show how text mining can help with tasks like identifying disease comorbidities, extracting risk factors, and performing opposition searching across multiple data sources and time periods. The conclusion is that text mining demonstrates clear value for the pharmaceutical and healthcare sectors through time to insight, real-time data analysis, and use of temporal data.
Klaus Kater of black swan presents on analytic search technology to aggregate and analyze data from multiple sources, including the surface web, deep web, and corporate resources. Black swan's SEARCHCORPUS indexes crawled documents and extracted structured data, annotating documents with context. The system lets users pull results through search interfaces, enrich existing data with crawled information, and push profile-driven notifications. It features graphical tools for designing filter chains and documenting projects, as well as crawling, extraction, analysis, administration, and deployment capabilities.
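The filter-chain idea can be sketched as a pipeline of generator steps, each consuming a document stream and yielding the documents it keeps or enriches. The function names and document shape below are illustrative, not black swan's actual API:

```python
# Toy filter chain: each step is a generator over a document stream.
def language_filter(docs, lang="en"):
    for doc in docs:
        if doc.get("lang") == lang:
            yield doc

def keyword_filter(docs, keyword):
    for doc in docs:
        if keyword in doc["text"].lower():
            yield doc

def annotate_source(docs, source):
    for doc in docs:
        yield {**doc, "source": source}  # enrich with context

def run_chain(docs, *steps):
    """Thread the document stream through each step in order."""
    for step in steps:
        docs = step(docs)
    return list(docs)

crawled = [
    {"lang": "en", "text": "New sensor patent filed"},
    {"lang": "de", "text": "Neues Sensorpatent"},
    {"lang": "en", "text": "Quarterly earnings report"},
]
kept = run_chain(
    crawled,
    language_filter,
    lambda d: keyword_filter(d, "patent"),
    lambda d: annotate_source(d, "surface-web"),
)
print(kept)  # one English document mentioning "patent", tagged with its source
```

Because each step is lazy, chains like this scale to large crawls: documents stream through one at a time rather than being materialized at every stage.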
This document summarizes Dr. Kai Simon's work on large-scale patent classification at the European Patent Office. It discusses how Averbis was selected in 2015 to use text mining to pre-classify unpublished patents and re-classify published patents if the classification system changes. The process involves classifying patents into over 250,000 classification codes across 250 departments, presenting a big data and fast response time challenge that text mining can help address.
This document analyzes patent data related to smart city technologies from 1998-2012. It finds that over 100,000 patent applications were published, with growth increasing over time. Almost half of patents were for smart buildings, followed by smart energy networks. The top patenting offices were from China, US, Korea, and Japan. Electrical engineering was the most common technological domain. French applicants most commonly filed in France and at the EPO, focusing especially on smart energy networks.
This document summarizes a presentation about Text and Data Mining (TDM) and the DirectPath solution from Copyright Clearance Center. The DirectPath solution provides researchers with a centralized way to access licensed full-text content in XML format from multiple publishers for use in TDM projects through a web interface and API. It aims to streamline the content retrieval and licensing process for TDM by normalizing formats, managing licenses, and allowing customization of text analysis and indexing. The solution is designed to support applications like drug discovery and competitive intelligence by facilitating information retrieval and knowledge discovery from large article corpora.
II-SDV 2015 The International Information Conference on Search, Data Mining a... (Dr. Haxel Consult)
The II-SDV meeting takes place in Nice in April 2016 for an intensive two days. Venue is the Hotel Plaza in central Nice. The meeting provides an international forum for those in the field of advanced search applications, data and text mining, and visualization technology. The primary focus is on tools for intelligence and the meeting examines the requirements of specialists in scientific and technical information.
The meeting will be of interest to those who wish to update themselves and keep in touch with the leading edge of information search and analysis technologies; it features approximately 22 speakers for the two days. There will be an adjacent, focused exhibition to complement the conference programme.
II-SDV 2017 in Nice - The International Information Conference on Search, Dat... (Dr. Haxel Consult)
The 2017 II-SDV Conference in Nice, 24 - 25 April 2017
The II-SDV meeting takes place in Nice in April 2017 for an intensive two days. Venue is the Hotel Plaza in central Nice. The meeting provides an international forum for those in the field of advanced search applications, data and text mining, and visualization technology. The primary focus is on tools for intelligence and the meeting examines the requirements of specialists in scientific and technical information.
The meeting will be of interest to those who wish to update themselves and keep in touch with the leading edge of information search and analysis technologies; it features approximately 22 speakers for the two days. There will be an adjacent, focused exhibition to complement the conference programme.
AI-SDV 2022: Henry Chang Patent Intelligence and Engineering Management (Dr. Haxel Consult)
This document describes a metadata list of patent holders in various countries and regions compiled by Muchiu (Henry) Chang. The list is sorted geographically and includes patent holder names from Canada, China, Hong Kong, Macao, Taiwan, the Middle East, and Europe between 2009-2022. Key features include Chinese-English compatibility and use of open source intelligence. The list has previously been utilized by the Region of Peel in Ontario, Canada.
AI-SDV 2022: Creation and updating of large Knowledge Graphs through NLP Anal... (Dr. Haxel Consult)
Knowledge Graphs are an increasingly relevant approach to storing detailed knowledge in many domains. Recent advances in NLP make it possible to enrich Knowledge Graphs through automated analysis of large volumes of literature, greatly reducing the effort of traditional manual information capture. In our presentation we report on the approach taken in a project with partner Fraunhofer SCAI in the life sciences, in which a knowledge graph organising detailed facts about psychiatric diseases has been computed.
Information on cause-effect relations between proteins, genes, drugs and diseases has been encoded in BEL (the Biological Expression Language) and imported into a graph database to build an indication-wide Knowledge Graph for the selected therapeutic area. Ultimately, updating the graph will amount to simply rerunning the analysis on newly published literature.
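A much-simplified sketch of loading such cause-effect statements into a graph structure is shown below. Real BEL has a far richer grammar, handled by dedicated tooling such as PyBEL; the statements and entity names here are illustrative only:

```python
# Simplified loading of 'subject RELATION object' statements into a graph.
# Real BEL parsing is handled by dedicated tools (e.g. PyBEL).
from collections import defaultdict

def load_triples(statements):
    """Parse 'subject RELATION object' lines into an adjacency map."""
    graph = defaultdict(list)
    for line in statements:
        subject, relation, obj = line.split(maxsplit=2)
        graph[subject].append((relation, obj))
    return graph

statements = [
    "p(HGNC:BDNF) increases bp(depression_remission)",
    "a(CHEBI:fluoxetine) increases p(HGNC:BDNF)",
]
graph = load_triples(statements)

# Updating the graph amounts to rerunning the load on newly published
# statements and merging the edges in:
for node, edges in load_triples(
    ["p(HGNC:IL6) decreases p(HGNC:BDNF)"]
).items():
    graph[node].extend(edges)
print(dict(graph))
```

This mirrors the update model described above: the graph is a pure function of the analyzed literature, so refreshing it means re-analyzing new documents and merging the resulting edges.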
AI-SDV 2022: The race to net zero: Tracking the green industrial revolution t... (Dr. Haxel Consult)
In 2019 the UK was the first major economy to embrace a legal obligation to achieve net zero carbon emissions by 2050. More broadly, the 2021 UK Innovation Strategy sets out the UK government’s vision to make the UK a global hub for innovation by 2035 with a target of increasing public and private sector R&D expenditure to 2.4% of GDP to support the UK being a science superpower with a world-class research and innovation system.
IP rights create an incentive for R&D which ultimately leads to innovation. Analysis and insights from IP data can therefore help provide a better understanding of how the IP system is being used and where and what innovation is taking place. Research and analysis of IP data is a key input to the ongoing work of the UKIPO’s Green Tech Working Group which seeks to:
further the UK’s status as a global leader by making the UK’s IP environment the best for innovating green technology;
develop and deliver IP policies to support government’s ambition on climate change and green technologies; and
help innovators best protect and commercialise their green tech innovations both at home and internationally.
The UKIPO has been developing a broad portfolio of ‘green’ IP analytics research. A series of patent analytics reports has been published looking at green technologies, and analysis of the UK’s Green Channel scheme for accelerated processing of green patent applications has been conducted. Patents have been used to identify technological comparative advantage within different green technologies at a country level, and new insights uncovered by mapping green technology patents to the UN Sustainable Development Goals (SDGs). Trade mark data provides a timeliness and closeness to market that patent data does not, and complementary analysis of UK ‘green’ trade marks, identified using a machine learning algorithm, provides a commercialisation angle to our research.
AI-SDV 2022: Accommodating the Deep Learning Revolution by a Development Proc... (Dr. Haxel Consult)
Word embeddings, deep learning, transformer models and other pre-trained neural language models (sometimes recently referred to as "foundational models") have fundamentally changed the way state-of-the-art systems for natural language processing and information access are built today. The "Data-to-Value" process methodology (Leidner 2013; Leidner 2022a,b) has been devised to embody best practices for the construction of natural language engineering solutions; it can assist practitioners and has also been used to transfer industrial insights into the university classroom. This talk recaps how the methodology supports engineers in building systems more consistently and then outlines the changes in the methodology to adapt it to the deep learning age. The cost and energy implications will also be discussed.
AI-SDV 2022: Domain Knowledge makes Artificial Intelligence Smart Linda Ander... (Dr. Haxel Consult)
This document profiles Linda Andersson, the CEO of Artificial Researcher. It provides details on her background, awards, research fields, and academic merits. It then discusses how domain knowledge enables artificial intelligence systems to be smarter by allowing them to understand language and text in particular domains at a deeper level. Finally, it provides an overview of Artificial Researcher's natural language processing and text mining technologies and services for tasks like passage retrieval, ontology generation, and semantic search.
AI-SDV 2022: Embedding-based Search Vs. Relevancy Search: comparing the new w... (Dr. Haxel Consult)
In 2013 the NLP field underwent an evolutionary change thanks to the introduction of space embeddings which, combined with deep learning architectures, achieved human-level performance on many NLP tasks. With the introduction of the Attention mechanism in 2017 the results improved further and, as a result, embeddings are quickly becoming the de facto standard for solving many NLP problems. In this presentation, you will learn how to generate and use space embeddings for search purposes, with comparison metrics against more traditional relevance-based search engines. I will also present some initial results from a paper currently under review that provides insight into hyperparameter tuning during the generation of embeddings.
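The contrast between lexical (term-overlap) matching and embedding-based matching can be shown with a toy example. The 3-dimensional vectors below are hand-made stand-ins; real systems use trained embeddings (e.g. from a transformer encoder) with hundreds of dimensions:

```python
# Toy contrast: lexical overlap fails on synonyms; embeddings do not.
# The vectors are hand-made for illustration, not trained.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def lexical_overlap(query, doc):
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q)

# Pretend embeddings place synonyms near each other in vector space.
vectors = {
    "car": [0.9, 0.1, 0.0],
    "automobile": [0.88, 0.12, 0.05],
    "banana": [0.0, 0.2, 0.95],
}

print(lexical_overlap("car", "automobile"))           # 0.0 -- no shared terms
print(cosine(vectors["car"], vectors["automobile"]))  # close to 1.0
print(cosine(vectors["car"], vectors["banana"]))      # close to 0.0
```

This is the core argument for embedding-based search: "car" and "automobile" share no tokens, so a purely lexical engine scores them zero, while their embedding similarity is high.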
AI-SDV 2022: Rolling out web crawling at Boehringer Ingelheim - 10 years of e... (Dr. Haxel Consult)
Ten years in the making: how real-world business cases have driven the development of CCC's deep search solutions, leading to capabilities for web crawling and delivery of targeted intelligence that help R&D-intensive companies gain a competitive advantage.
AI-SDV 2022: Machine learning based patent categorization: A success story in... (Dr. Haxel Consult)
Machine learning based patent categorization: A success story in monitoring a complex technology with high patenting activity
Susanne Tropf (Syngenta, Switzerland)
Kornel Marko (Averbis, Germany)
AI-SDV 2022: Finding the WHAT – Will AI help? Nils Newman (Search Technology,... (Dr. Haxel Consult)
It is relatively easy for a human to read a document and quickly figure out which concepts are important. However, this task is a difficult challenge for a machine. During the past few decades, there have been two main approaches to concept identification: Natural Language Processing and Machine Learning. During the early part of this century, Machine Learning made great strides as new techniques came into wider use (SVMs, topic modeling, etc.). Sensing the competition, Natural Language Processing responded with the deployment of emerging techniques (semantic networks, finite state automata, etc.). Neither approach has completely solved the WHAT problem. Advances in Artificial Intelligence have the potential to significantly improve the situation. Where AI is making the most impact is as an enhancement that makes Machine Learning and Natural Language Processing work better and, more importantly, work together. This presentation looks at some of this history and at what might happen in the future when we blend the interpretation of language with pattern prediction.
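One classic statistical technique for surfacing the "what" of a document is TF-IDF term weighting, from the Machine Learning side of the divide described above. A minimal sketch over a toy corpus (illustrative only):

```python
# TF-IDF concept extraction: rank a document's terms by how much more
# frequent they are in it than across the whole collection.
import math
from collections import Counter

def tfidf_top_terms(docs, doc_id, k=2):
    """Return the k highest tf-idf terms of one document."""
    n = len(docs)
    df = Counter()                        # document frequency per term
    for doc in docs:
        df.update(set(doc.lower().split()))
    tf = Counter(docs[doc_id].lower().split())  # term frequency
    scores = {t: tf[t] * math.log(n / df[t]) for t in tf}
    return [t for t, _ in sorted(scores.items(), key=lambda x: -x[1])[:k]]

docs = [
    "the patent describes a lithium battery electrode",
    "the patent describes a neural network accelerator",
    "the report covers quarterly revenue and the outlook",
]
print(tfidf_top_terms(docs, 0))  # terms distinctive of document 0
```

Common words like "the" score zero (they appear in every document), while terms unique to one document float to the top; this captures the statistical intuition, but not the linguistic understanding that NLP approaches add.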
AI-SDV 2022: New Insights from Trademarks with Natural Language Processing Al... (Dr. Haxel Consult)
This document discusses using natural language processing on trademark text data to gain insights. It presents research on how trademark activity changed during COVID-19, detecting emerging trends in trademarks over time, and classifying trademarks by industry. The research uses techniques like topic modeling and deep learning classifiers to analyze trademarks and identify patterns. The analysis of trademarks can provide economic indicators and reveal where businesses are focusing their innovation and market presence.
AI-SDV 2022: Extracting information from tables in documents Holger Keibel (K... (Dr. Haxel Consult)
In our customer projects involving automated document processing, we often encounter document types that provide crucial data in the form of tables. While established text analytics algorithms are usually optimized to operate on running text, they tend to produce rather poor results on tables, as they do not capture the non-sequential relations inside them (e.g. interpreting the content of a table cell relative to its column title, or interpreting line breaks inside a cell differently from line breaks between cells or rows). While there are elaborate information extraction products on the market for a few highly specific types of tabular documents, there is no general approach. The main cause is the fact that table structures can be encoded by a heterogeneous range of layout means (e.g. column boundaries can be signaled by lines vs. aligned text vs. white space). In this talk, we will illustrate several solutions that we have developed for a range of challenges occurring in this context, both for scanned and digitally generated documents.
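One of the layout heuristics mentioned (column boundaries signaled by aligned white space) can be sketched for plain-text tables. This is a toy heuristic for illustration, not the presenters' actual extraction pipeline:

```python
# Infer column boundaries in fixed-width text from runs of whitespace
# that are shared by every row.
def split_columns(lines, min_gap=2):
    """Split rows at character positions that are whitespace in all lines,
    counting only runs of at least `min_gap` spaces as separators."""
    width = max(len(line) for line in lines)
    padded = [line.ljust(width) for line in lines]
    gap = [all(row[i] == " " for row in padded) for i in range(width)]

    sep = [False] * width
    i = 0
    while i < width:            # mark sufficiently wide shared-space runs
        if gap[i]:
            j = i
            while j < width and gap[j]:
                j += 1
            if j - i >= min_gap:
                for k in range(i, j):
                    sep[k] = True
            i = j
        else:
            i += 1

    spans, start = [], None     # [start, end) extents of the columns
    for i, is_sep in enumerate(sep + [True]):
        if not is_sep and start is None:
            start = i
        elif is_sep and start is not None:
            spans.append((start, i))
            start = None
    return [[row[s:e].strip() for s, e in spans] for row in padded]

table = [
    "Name        Country      Patents",
    "Acme Corp   US           1520",
    "Beta GmbH   DE           310",
]
for row in split_columns(table):
    print(row)
```

The `min_gap` threshold is what keeps single spaces inside a cell (like the one in "Acme Corp") from being mistaken for column separators; real pipelines combine several such cues, including ruling lines and alignment.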
AI-SDV 2022: Scientific publishing in the age of data mining and artificial i... (Dr. Haxel Consult)
Most scientific journals request that the complete set of research data be published simultaneously with the peer-reviewed paper. The publication of the research data is usually carried out as so-called "Supplementary Material" attached to the original paper, or on a research data repository. Both forms have in common that the data is usually published unstructured and not in a uniform machine-processable format. This makes its further use in electronic tools for AI or data mining unnecessarily difficult or even impossible. A concept is presented in which the data is digitally recorded, following the FAIR data principles, as part of the publication process. This digital capture makes the data available to the scientific community for easy use in data mining and AI tools. The data in the repository contains links to the publication to document its origin. The concept is applicable to preprints, peer-reviewed papers, diploma and doctoral theses, and is particularly suitable for open access publications. The presentation also highlights corresponding activities reported in recent scientific publications.
This document discusses using Elasticsearch to store and analyze patent data. It describes how Elasticsearch provides a document store, full text search engine, and real-time analytics capabilities. Examples are given of using Elasticsearch for a patent document store, patent search engine, log store, and real-time log analysis of patent data. The document concludes by thanking the audience.
The document provides information about Search Technologies, a leading independent IT services firm specializing in enterprise search and big data search solutions. It details their expertise in Microsoft Search, Google Search Appliance, and open source search technologies. It also describes their content processing framework Aspire, connectors for integrating various content sources, and query processing language QPL.
The document discusses text mining and how it has come of age by providing speed to insight using real-time and temporal data. It highlights challenges like the increasing amounts of unstructured data from different sources and formats. Examples show how text mining can help with tasks like identifying disease comorbidities, extracting risk factors, and performing opposition searching across multiple data sources and time periods. The conclusion is that text mining demonstrates clear value for the pharmaceutical and healthcare sectors through time to insight, real-time data analysis, and use of temporal data.
Klaus Kater of black swan presents on analytic search technology to aggregate and analyze data from multiple sources including the surface web, deep web, and corporate resources. Black swan's SEARCHCORPUS indexes crawled documents and extracted structured data, annotating documents with context. The system allows users to pull search interfaces, pimp existing data with crawled information, and push profile-driven notifications. It features graphical tools to design filter chains and document projects, as well as crawling, extraction, analysis, administration, and deployment capabilities.
This document summarizes Dr. Kai Simon's work on large-scale patent classification at the European Patent Office. It discusses how Averbis was selected in 2015 to use text mining to pre-classify unpublished patents and re-classify published patents if the classification system changes. The process involves classifying patents into over 250,000 classification codes across 250 departments, presenting a big data and fast response time challenge that text mining can help address.
This document analyzes patent data related to smart city technologies from 1998-2012. It finds that over 100,000 patent applications were published, with growth increasing over time. Almost half of patents were for smart buildings, followed by smart energy networks. The top patenting offices were from China, US, Korea, and Japan. Electrical engineering was the most common technological domain. French applicants most commonly filed in France and at the EPO, focusing especially on smart energy networks.
This document summarizes a presentation about Text and Data Mining (TDM) and the DirectPath solution from Copyright Clearance Center. The DirectPath solution provides researchers with a centralized way to access licensed full-text content in XML format from multiple publishers for use in TDM projects through a web interface and API. It aims to streamline the content retrieval and licensing process for TDM by normalizing formats, managing licenses, and allowing customization of text analysis and indexing. The solution is designed to support applications like drug discovery and competitive intelligence by facilitating information retrieval and knowledge discovery from large article corpora.
II-SDV 2015 The International Information Conference on Search, Data Mining a...Dr. Haxel Consult
he II-SDV meeting takes place in Nice in April 2016 for an intensive two days. Venue is the Hotel Plaza in central Nice. The meeting provides an international forum for those in the field of advanced search applications, data and text mining, and visualization technology. The primary focus is on tools for intelligence and the meeting examines the requirements of specialists in scientific and technical information.
The meeting will be of interest to those who wish to update themselves and keep in touch with the leading edge of information search and analysis technologies; it features approximately 22 speakers for the two days. There will be an adjacent, focused exhibition to complement the conference programme.
II-SDV 2017 in Nice - The International Information Conference on Search, Dat...Dr. Haxel Consult
The 2017 II-SDV Conference in Nice, 24 - 25 April 2017
The II-SDV meeting takes place in Nice in April 2017 for an intensive two days. Venue is the Hotel Plaza in central Nice. The meeting provides an international forum for those in the field of advanced search applications, data and text mining, and visualization technology. The primary focus is on tools for intelligence and the meeting examines the requirements of specialists in scientific and technical information.
The meeting will be of interest to those who wish to update themselves and keep in touch with the leading edge of information search and analysis technologies; it features approximately 22 speakers for the two days. There will be an adjacent, focused exhibition to complement the conference programme.
AI-SDV 2022: Henry Chang Patent Intelligence and Engineering ManagementDr. Haxel Consult
This document describes a metadata list of patent holders in various countries and regions compiled by Muchiu (Henry) Chang. The list is sorted geographically and includes patent holder names from Canada, China, Hong Kong, Macao, Taiwan, the Middle East, and Europe between 2009-2022. Key features include Chinese-English compatibility and use of open source intelligence. The list has previously been utilized by the Region of Peel in Ontario, Canada.
AI-SDV 2022: Creation and updating of large Knowledge Graphs through NLP Anal...Dr. Haxel Consult
Knowledge Graphs are an increasingly relevant approach for storing detailed knowledge in many domains. Recent advances in NLP make it possible to enrich Knowledge Graphs through automated analysis of large volumes of literature, greatly reducing the effort of traditional manual information capture. In this presentation we report on the approach taken in a project with our partner Fraunhofer SCAI in the life sciences, in which a knowledge graph organising detailed facts about psychiatric diseases has been computed.
Information on cause-effect relations between proteins, genes, drugs and diseases has been encoded in BEL (the Biological Expression Language) and imported into a graph database to build an indication-wide Knowledge Graph for the selected therapeutic area. Ultimately, updating the graph will amount to simply rerunning the analysis on newly published literature.
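The pipeline described above can be pictured with a toy in-memory graph. This is an illustrative sketch only: the triples are invented placeholders, not real BEL statements, and the project itself uses a proper graph database rather than Python dictionaries.

```python
from collections import defaultdict

# Minimal in-memory knowledge graph built from cause-effect triples.
# The triples below are invented placeholders, not real BEL statements.
triples = [
    ("DrugA", "decreases", "ProteinX"),
    ("ProteinX", "increases", "DiseaseY"),
    ("GeneZ", "increases", "ProteinX"),
]

graph = defaultdict(list)
for subj, rel, obj in triples:
    graph[subj].append((rel, obj))

def effects_of(entity):
    """Return all (relation, target) edges leaving an entity."""
    return graph.get(entity, [])

# Updating the graph amounts to re-running extraction on newly
# published literature and appending the resulting triples.
print(effects_of("ProteinX"))  # [('increases', 'DiseaseY')]
```

The update property the abstract highlights falls out naturally: new literature yields new triples, and loading them is an append, not a rebuild.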
AI-SDV 2022: The race to net zero: Tracking the green industrial revolution t... (Dr. Haxel Consult)
In 2019 the UK was the first major economy to embrace a legal obligation to achieve net zero carbon emissions by 2050. More broadly, the 2021 UK Innovation Strategy sets out the UK government’s vision to make the UK a global hub for innovation by 2035 with a target of increasing public and private sector R&D expenditure to 2.4% of GDP to support the UK being a science superpower with a world-class research and innovation system.
IP rights create an incentive for R&D which ultimately leads to innovation. Analysis and insights from IP data can therefore help provide a better understanding of how the IP system is being used and where and what innovation is taking place. Research and analysis of IP data is a key input to the ongoing work of the UKIPO’s Green Tech Working Group which seeks to:
further the UK’s status as a global leader by making the UK’s IP environment the best for innovating green technology;
develop and deliver IP policies to support government’s ambition on climate change and green technologies; and
help innovators best protect and commercialise their green tech innovations both at home and internationally.
The UKIPO has been developing a broad portfolio of ‘green’ IP analytics research. A series of patent analytics reports have been published looking at green technologies, and analysis has been conducted of the UK’s Green Channel scheme for accelerated processing of green patent applications. Patents have been used to identify technological comparative advantage within different green technologies at a country level, and new insights have been uncovered by mapping green technology patents to the UN Sustainable Development Goals (SDGs). Trade mark data provides a timeliness and closeness-to-market factor that patent data does not, and complementary analysis of UK ‘green’ trade marks, identified using a machine learning algorithm, provides a commercialisation angle to our research.
AI-SDV 2022: Accommodating the Deep Learning Revolution by a Development Proc... (Dr. Haxel Consult)
Word embeddings, deep learning, transformer models and other pre-trained neural language models (sometimes recently referred to as "foundational models") have fundamentally changed the way state-of-the-art systems for natural language processing and information access are built today. The "Data-to-Value" process methodology (Leidner 2013; Leidner 2022a,b) has been devised to embody best practices for the construction of natural language engineering solutions; it can assist practitioners and has also been used to transfer industrial insights into the university classroom. This talk recaps how the methodology supports engineers in building systems more consistently and then outlines the changes in the methodology to adapt it to the deep learning age. The cost and energy implications will also be discussed.
AI-SDV 2022: Domain Knowledge makes Artificial Intelligence Smart Linda Ander... (Dr. Haxel Consult)
This document profiles Linda Andersson, the CEO of Artificial Researcher. It provides details on her background, awards, research fields, and academic merits. It then discusses how domain knowledge enables artificial intelligence systems to be smarter by allowing them to understand language and text in particular domains at a deeper level. Finally, it provides an overview of Artificial Researcher's natural language processing and text mining technologies and services for tasks like passage retrieval, ontology generation, and semantic search.
AI-SDV 2022: Embedding-based Search Vs. Relevancy Search: comparing the new w... (Dr. Haxel Consult)
In 2013 the NLP field underwent an evolutionary change thanks to the introduction of space embeddings which, combined with deep learning architectures, achieved human-level performance on many NLP tasks. With the introduction of the Attention mechanism in 2017 the results were further improved and, as a result, embeddings are quickly becoming the de facto standard for solving many NLP problems. In this presentation, you will learn how to generate and use space embeddings for search purposes, with comparison metrics against more traditional relevance-based search engines. Moreover, I will present some initial results from a paper currently under review that provides insight into hyperparameter tuning during the generation of embeddings.
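The contrast drawn here can be illustrated with a toy example. The three-dimensional vectors below are invented stand-ins for real embeddings (which have hundreds of dimensions and come from a trained model); the point is that cosine similarity in embedding space can rank documents for a query that shares no keywords with them.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Toy 3-dimensional "embeddings" (invented for illustration).
docs = {
    "car maintenance": [0.90, 0.10, 0.20],
    "automobile repair": [0.85, 0.15, 0.25],
    "cooking recipes": [0.10, 0.90, 0.30],
}
query = [0.88, 0.12, 0.22]  # stand-in embedding of "fixing my vehicle"

# A keyword-overlap relevance score for "fixing my vehicle" would be zero
# against all three documents; embedding similarity still ranks the two
# related ones first.
ranked = sorted(docs, key=lambda d: cosine(query, docs[d]), reverse=True)
print(ranked)  # 'cooking recipes' comes last
```

A relevance engine such as BM25 and an embedding index are not mutually exclusive; the comparison metrics in the talk treat them as alternative rankers over the same corpus.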
AI-SDV 2022: Rolling out web crawling at Boehringer Ingelheim - 10 years of e... (Dr. Haxel Consult)
10 years in the making: how real-world business cases have driven the development of CCC's deep search solutions, leading to the capabilities for web crawling and delivery of targeted intelligence that help R&D-intensive companies gain a competitive advantage.
AI-SDV 2022: Machine learning based patent categorization: A success story in... (Dr. Haxel Consult)
Machine learning based patent categorization: A success story in monitoring a complex technology with high patenting activity
Susanne Tropf (Syngenta, Switzerland)
Kornel Marko (Averbis, Germany)
AI-SDV 2022: Finding the WHAT – Will AI help? Nils Newman (Search Technology,... (Dr. Haxel Consult)
It is relatively easy for a human to read a document and quickly figure out which concepts are important. However, this task is a difficult challenge for a machine. During the past few decades, there have been two main approaches to concept identification: Natural Language Processing and Machine Learning. During the early part of this century, Machine Learning made great strides as new techniques came into wider use (SVMs, topic modeling, etc.). Sensing the competition, Natural Language Processing responded with the deployment of emerging techniques (semantic networks, finite state automata, etc.). Neither approach has completely solved the WHAT problem. Advances in Artificial Intelligence have the potential to significantly improve the situation. Where AI is making the most impact is as an enhancement that makes Machine Learning and Natural Language Processing work better and, more importantly, work together. This presentation looks at some of this history and at what might happen in the future when we blend the interpretation of language with pattern prediction.
AI-SDV 2022: New Insights from Trademarks with Natural Language Processing Al... (Dr. Haxel Consult)
This document discusses using natural language processing on trademark text data to gain insights. It presents research on how trademark activity changed during COVID-19, detecting emerging trends in trademarks over time, and classifying trademarks by industry. The research uses techniques like topic modeling and deep learning classifiers to analyze trademarks and identify patterns. The analysis of trademarks can provide economic indicators and reveal where businesses are focusing their innovation and market presence.
AI-SDV 2022: Extracting information from tables in documents Holger Keibel (K... (Dr. Haxel Consult)
In our customer projects involving automated document processing, we often encounter document types that provide crucial data in the form of tables. While established text analytics algorithms are usually optimized to operate on running text, they tend to produce rather poor results on tables, as they do not capture the non-sequential relations inside them (e.g. interpreting the content of a table cell relative to its column title, or interpreting line breaks inside a cell differently from line breaks between cells or rows). While there are elaborate information extraction products on the market for a few highly specific types of tabular documents, there is no general approach. The main cause is the fact that table structures can be encoded by a heterogeneous range of layout means (e.g. column boundaries can be signalled by lines vs. aligned text vs. white space). In this talk, we will illustrate several solutions that we have developed for a range of challenges occurring in this context, both for scanned and digitally generated documents.
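One of the layout cues mentioned above, column boundaries signalled by white space, can be sketched naively: a character position that is blank in every row of a space-aligned plain-text table is a candidate boundary. Real table extraction must combine many such cues; this toy handles only the simplest case and is not taken from any product described in the talk.

```python
# Example space-aligned table (invented data).
lines = [
    "Name      Dose    Unit",
    "Aspirin   500     mg  ",
    "Ibuprofen 200     mg  ",
]

width = max(len(l) for l in lines)
padded = [l.ljust(width) for l in lines]

# A column boundary candidate: a character position blank in every row.
blank = [all(row[i] == " " for row in padded) for i in range(width)]

def cells(row):
    """Split one row into cells at the shared blank columns."""
    out, cur = [], ""
    for ch, is_blank in zip(row, blank):
        if is_blank:
            if cur.strip():
                out.append(cur.strip())
            cur = ""
        else:
            cur += ch
    if cur.strip():
        out.append(cur.strip())
    return out

table = [cells(r) for r in padded]
print(table[1])  # ['Aspirin', '500', 'mg']
```

Even this toy shows why running-text analytics fails on tables: the meaning of "500" comes from its vertical alignment under "Dose", not from the characters around it in reading order.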
AI-SDV 2022: Scientific publishing in the age of data mining and artificial i... (Dr. Haxel Consult)
Most scientific journals request that the complete set of research data be published simultaneously with the peer-reviewed paper. The research data is usually published as so-called "Supplementary Material" attached to the original paper, or on a research data repository. Both forms have in common that the data is usually published unstructured and not in a uniform, machine-processable format. This makes its further use in electronic tools for AI or data mining unnecessarily difficult or even impossible. A concept is presented in which the data is digitally recorded as part of the publication process, following the FAIR data principles. This digital capture makes the data available to the scientific community for easy use in data mining and AI tools. The data in the repository contains links to the publication to document its origin. The concept is applicable to preprints, peer-reviewed papers, diploma and doctoral theses, and is particularly suitable for open access publications. Moreover, the presentation highlights corresponding activities recently reported in scientific publications.
AI-SDV 2022: AI developments and usability Linus Wretblad (IPscreener / Uppdr... (Dr. Haxel Consult)
This document discusses using AI tools to improve patent search and analysis. It provides metrics on how well an AI system called IPscreener can retrieve patent citations compared to examiners. The metrics show recall rates increase with longer input text and when users provide additional context. Machine translation negatively impacts performance, but the AI can help users navigate patents by selecting relevant text segments. The goal is for AI to boost innovation by improving how users search for and understand prior art.
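For context, recall here measures the share of examiner citations that the system also retrieves. A minimal sketch with invented citation numbers (not figures from the IPscreener evaluation):

```python
# Invented example sets: citations an examiner raised against an
# application, and documents an AI prior-art search returned.
examiner_citations = {"EP1234567", "US7654321", "WO2020123456"}
ai_results = {"EP1234567", "US7654321", "US1111111", "EP9999999"}

# Recall = found examiner citations / all examiner citations.
recall = len(examiner_citations & ai_results) / len(examiner_citations)
print(f"recall = {recall:.2f}")  # 2 of 3 citations found -> 0.67
```

Longer input text and added user context enlarge the effective query, which is why the abstract reports recall rising with both.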
AI-SDV 2022: Where’s the one about…? Looney Tunes® Revisited Jay Ven Eman (CE... (Dr. Haxel Consult)
How do you find video when you only have sparse data? While you can wander the stacks (if you can still find open stacks) for inspiration, video, whether physical or digital, is difficult to discover. Wandering the virtual stacks is, well, virtually impossible. Discovery platforms on the whole have not replicated the inspirational experience of wandering the stacks.
More companies are using archivable video for internal communication of the various research projects, product developments, test results, and more that are being considered, in progress, or completed. Showing how an experiment was conducted can convey considerably more information than can easily be communicated via text. How do you find a company video that might be helpful for your project?
A case study is presented of the problems and solutions implemented by a large, multinational chemical company. A suite of content discovery technologies was used, including a video-to-text-to-tagging system connected to their document database and automatically indexed using several chemical as well as conceptual systems (rule-based, NLP, inference engine). To support manuscript and video submission, a metadata extraction program pulls the metadata and inserts it into the submission forms so the author can move quickly through that process.
Copyright Clearance Center
A pioneer in voluntary collective licensing, CCC (Copyright Clearance Center) helps organizations integrate, access, and share information through licensing, content, software, and professional services. With expertise in copyright and information management, CCC and its subsidiary RightsDirect collaborate with stakeholders to design and deliver innovative information solutions that power decision-making by helping people integrate and navigate data sources and content assets. CCC recently acquired the assets and technology of Deep SEARCH 9 (DS9), a knowledge management platform that leverages machine learning to help customers perform semantic search, tag content, and discover new insights.
Lighthouse IP is the world’s leading provider of intellectual property content. The core business of Lighthouse IP is sourcing and creating content from the world’s most challenging authorities. Specialized in IP data, Lighthouse IP provides patent coverage for over 160 countries, trademark coverage for over 200 authorities and design coverage for over 90 authorities. Lighthouse IP data is available via several partners. The company is headquartered in Schiphol-Rijk in the Netherlands and has offices in the United States, China, Thailand, Vietnam, Egypt, Indonesia and Belarus. Globally, a team of 150 experts works on the creation of this unique data collection.
CENTREDOC was created in 1964 as the technical information center of the Swiss watchmaking industry. Building on a strong team of engineers, CENTREDOC now offers a complete range of services and solutions for the monitoring of strategic, technological and competitive information. CENTREDOC is also a leader in the research of patent, technical and business intelligence, and offers consulting expertise in the implementation of monitoring solutions.
AI-SDV 2022: Possibilities and limitations of AI-boosted multi-categorization... (Dr. Haxel Consult)
The everyday use of AI-driven algorithms for data search, analysis and synthesis brings important time savings, but also reveals the need to understand and accept the limitations of the technology. Practical deployments on concrete topics are indispensable for assessing and managing the challenges of neural-network-based AI. A workshop report.
AI-SDV 2022: Big data analytics platform at Bayer – Turning bits into insight... (Dr. Haxel Consult)
What if there was a platform where literature, conference abstracts, patents, clinical trials, news, grants and other sources were fully integrated? What if the data would be harmonized, enriched with standardized concepts and ready for analysis? After building our patent analytics platform we didn’t stop dreaming and built our big data analytics platform by semantically integrating text-rich, scientific sources. In my presentation I will talk about what we built and why we built it. And, of course, I will also address the challenges and hurdles along the way. Was it worth it and what comes next? Let’s talk about it!
II-SDV 2015, 20 - 21 April, in Nice
1. An Overview of the Enterprise Search Market, & Current Best Practices
Iain Fletcher
ifletcher@Searchtechnologies.com
April 20, 2015
2. Agenda
• A brief overview of the current enterprise search market
• The convergence of search with analytics disciplines
• Likely future architectures for search applications
4. High-level Search Engine Classifications
1. Part of a portfolio, many are recently acquired technologies
– E.g. SharePoint, HP Autonomy, IBM/Vivisimo, Dassault/Exalead
2. Stand-alone specialists, often bought to address specific apps
– E.g. GSA, Coveo, Attivio, Sinequa, Recommind
3. Open source, with or without support or proprietary add-ons
– Raw: E.g. Lucene, Solr, Elasticsearch
– With support/add-ons: E.g. LucidWorks, Cloudera Search, Elastic
4. Cloud-based services, typically based on open source technology
– E.g. Amazon Cloudsearch, MS Azure search
5. Market Observations
The dominant market share is with SharePoint, open source, and the Google Search Appliance
• SharePoint 2013 search is credible, and bundled
– Search teams are under pressure to use it, or to provide a compelling reason to do otherwise
• Solr and Elasticsearch are robust and reliable
– Thanks to very wide-spread deployment
• The Google brand sells search – and a lot of GSAs have been shipped during the past few years
6. Functional Observations
• Core indexing / searching is generally fast and reliable
– Search is a maturing technology
• Key differences remain in peripheral functionality, such as content processing prior to indexing. For example:
– Coveo, Attivio, Sinequa all have well-developed indexing pipelines, UI tools, and a range of data connectors
– SharePoint and GSA have limited content processing functionality and rely on 3rd parties for connectivity
– Solr, Elasticsearch, AWS Cloudsearch and Azure search don’t provide a formal indexing pipeline, UI, or connectors
7. Further Observations
• The search engines with less focus on peripheral issues (such as content processing and connectivity) have dominant market share
• Connectivity remains challenging, especially when combined with continual data growth
• The movement of data sets to the cloud adds further complexity
– Hybrid indexing environments will be with us for some years
8. Content Processing / Text Analysis Examples
• Normalization
– Names, dates, synonyms, spelling
• Entity identification and resolution
• Additional metadata from content analysis
• Categorization
• Document vector extraction
• Splitting and concatenation
• Dupe & near-dupe detection
• Link analysis
• Ingesting external signals
• Security enforcement and analysis
[Diagram: a search index enriched with security, category, and metadata fields]
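As an illustration of one item on this list, near-duplicate detection is often framed as set similarity over word shingles. This is a naive sketch, not the technique of any engine named on the slide; production systems use MinHash or SimHash to scale, and the 0.3 threshold below is arbitrary.

```python
def shingles(text, k=3):
    """Set of k-word shingles (overlapping word n-grams) of a text."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b):
    """Jaccard similarity of the shingle sets of two texts."""
    sa, sb = shingles(a), shingles(b)
    return len(sa & sb) / len(sa | sb)

doc1 = "the quick brown fox jumps over the lazy dog"
doc2 = "the quick brown fox leaps over the lazy dog"
doc3 = "enterprise search is a maturing technology"

print(jaccard(doc1, doc2) > 0.3)   # True: near-duplicates
print(jaccard(doc1, doc3) == 0.0)  # True: unrelated
```

Running this kind of analysis before indexing, rather than at query time, is exactly the "content processing prior to indexing" differentiator discussed on slide 6.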
9. Future Directions
So what will search architectures look like in the future?
Important Influences:
• The need for organizational and analytical agility
• The convergence of search and (“big data”) analytics
• Continual growth in data volumes, and churn in repository / storage fashions
10. Converging Architectures
Let’s take a brief look at:
1. The “Big Data Architecture”, evangelized by IBM, Cloudera, etc.
2. Contemporary Search Architectures
Background Info
12. The Traditional Search Architecture
[Diagram: Content Sources (Employee Directory, CMS, File Share, etc.) → Connectors → Integrated Search Engine (Index Pipeline → Index → Search → UI)]
Designed for Unstructured Content
13. The Traditional Search Architecture
[Diagram: the same architecture, highlighting the re-index loop]
• A few documents-per-second?
• There are only 2.6 million seconds in a month
• If you change something significant in the index pipeline, you will need to re-index
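The slide's arithmetic is worth spelling out. With assumed figures (a 50-million-document corpus ingested at 5 documents per second, both invented for illustration), a full re-index through the traditional pipeline takes months:

```python
# The arithmetic behind the slide's warning. Corpus size and ingest
# rate are illustrative assumptions, not figures from the talk.
docs = 50_000_000                    # corpus size (assumed)
docs_per_second = 5                  # pipeline ingest rate (assumed)
seconds_per_month = 30 * 24 * 3600   # 2,592,000 — the slide's "2.6 million"

months = docs / docs_per_second / seconds_per_month
print(f"{months:.1f} months to re-index")  # 3.9 months to re-index
```

This is why the next slide decouples content processing from the index: changing the pipeline should not force another months-long crawl of the source repositories.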
14. A Better Search Architecture
• Re-indexing rates greatly improved
• “Touch-time” with repositories can be managed autonomously
[Diagram: Content Sources (Employee Directory, CMS, etc.) → Connectors → Content Processing / Staging Repository (with iterative development) → Index Pipeline → Search Engine (Index, Search); the re-index loop now runs from the staging repository]
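The staging idea can be sketched in a few lines: content is fetched from source repositories once into a staging store, and re-indexing becomes a local re-run of the processing step. All names and data here are illustrative, not part of any product:

```python
# Sketch of the staging pattern: repository "touch-time" happens once,
# at ingest; the processing pipeline can then be re-run freely during
# iterative development without re-fetching from the sources.
staging = {}  # doc_id -> raw content, fetched once

def fetch(doc_id):
    # Stand-in for a connector hitting a CMS or file share.
    return f"raw content of {doc_id}"

def ingest(doc_ids):
    for d in doc_ids:
        staging[d] = fetch(d)  # the only repository touch-time

def process(transform):
    # "Re-indexing" is just re-running the transform over staging.
    return {d: transform(raw) for d, raw in staging.items()}

ingest(["doc1", "doc2"])
index_v1 = process(str.upper)  # first pipeline version
index_v2 = process(str.lower)  # pipeline changed: no re-fetch needed
print(index_v2["doc1"])
```

The two `process` calls model the slide's point: a significant pipeline change costs one pass over local storage, not another crawl of every source system.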
15. The Future Architecture?
[Diagram: the staged architecture above, with the Content Processing / Staging Repository running on Hadoop]
• This environment will encourage ever more sophisticated content processing
• We expect much innovation in text analytics during the next few years
• Driven by cheap, easily available processing power
• The deliverable is a richer search index
16. The Future Architecture
[Diagram: the same Hadoop-based staging architecture]
• Google.com has worked something like this for 10+ years
17. An Integrated Search/Analytics Architecture
[Diagram: Content Sources (CMS, file system, OSINT) via Connectors / Crawlers, and Data Sources (data warehouse, logfiles, etc.) via ETL, feed a Hadoop-based Content Processing / Staging Repository with iterative development; rapid, ad hoc indexing serves multiple Search and Analysis applications]
• Encourages agile exploitation of data and content resources
18. Summary
• Search and Analytics are tending towards the same architecture
• Autonomous connectivity and content processing systems simplify and de-risk projects
• The “search index” is a mature technology, and becoming a commodity
– Thanks to open source alternatives setting high standards
• The centre of attention is shifting from the index to the content preparation
– This perhaps fits well with the profile of dominant market leaders: SharePoint, GSA, Solr, Elasticsearch….
19. Conclusion
• The foundation of great search and analytical applications is a clean, rich and detailed index
• Much of the innovation during the next years will be in content analytics
– The architecture discussed makes it easy to adopt new ideas and products
– And it promotes agility, experimentation, and innovation
• In a data-driven world, agility is vital
20. And finally…. the analyst quote:
“Enterprise Search Can Bring Big Data Within Reach” *
• Multiple, purpose-built indexes that are derived from enriched content are necessary.
http://blogs.gartner.com/darin-stewart/2014/04/01/enterprise-search-can-bring-big-data-within-reach/
* Darin Stewart, Enterprise Search Can Bring Big Data Within Reach, April 2014 blog
21. An Overview of the Current Enterprise Search Market, & Current Best Practices
Iain Fletcher
ifletcher@Searchtechnologies.com
April 20, 2015
Thank you!