Presented by
Veera Shekar G
Google Search vs. Advanced Search (Enterprise
Search Implementation)
8/6/2015
11/05/2015
• The processes of a typical search engine.
• You will understand how a search engine works.
• I am a beginner at this subject.
• Top 5 requirements for an effective enterprise search implementation.
• Problems with implementations.
Introduction
• Topic 1: How a search engine works.
▫ We will look at the architecture and component details.
• Topic 2: Google Search.
▫ Phases of implementation. Indexing architecture.
• Topic 3: Top 5 requirements for implementing enterprise search.
▫ Options available for implementation.
Session Outline
• Architecture of a typical search engine.
• Factors that determine the architecture of a search engine.
• Indexing process.
Topic 1: Objectives
• The architecture of a search engine can be viewed
as having two layers.
Topic 1: Content – Normal Search engine Architecture
• The architecture of a search engine is determined by two
requirements:
▫ Effectiveness (quality of results).
▫ Efficiency (response time and throughput).
Topic 1: Content - Factors
• Text acquisition: identifies and stores documents for indexing.
• Text transformation: transforms documents into index terms or features.
• Index creation: takes index terms and creates data structures (indexes) to support fast searching.
Topic 1: Content
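The three indexing stages on this slide can be sketched in a few lines of Python. This is only a minimal illustration, not how any particular engine implements them. The sample documents, the stopword list, and the function names (`acquire`, `transform`, `build_index`, `search`) are all assumptions made for the example.

```python
import re
from collections import defaultdict

# Hypothetical sample corpus standing in for acquired documents.
DOCS = {
    1: "Search engines index documents",
    2: "An index supports fast searching",
    3: "Crawlers acquire documents for the index",
}
STOPWORDS = {"an", "the", "for"}


def acquire():
    """Text acquisition: yield (doc_id, raw_text) pairs to be indexed."""
    yield from DOCS.items()


def transform(text):
    """Text transformation: lowercase, tokenize, and drop stopwords."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def build_index():
    """Index creation: map each term to the set of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in acquire():
        for term in transform(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Querying: return ids of documents that contain every query term."""
    terms = transform(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


INDEX = build_index()
```

The term-to-documents mapping (an inverted index) is what makes lookup fast: a query touches only the entries for its own terms instead of scanning every document.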
• A search engine has two main processes: the indexing process and
the querying process.
• Questions?
Topic 1: Wrap-up
• High Level Architecture of Google search.
• Web Crawlers.
• Technologies Used.
Topic 2: Google Search
Topic 2: Content - High Level Architecture
• A web crawler is a program that, given one or more seed URLs,
downloads the web pages associated with these URLs, extracts any
hyperlinks contained in them, and recursively continues to download
the web pages identified by these hyperlinks.
• Web crawlers are an important component of web search
engines, where they are used to collect the corpus of web pages
indexed by the search engine.
Topic 2: Content - Web Crawlers
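The crawl loop described on this slide can be sketched as follows. To keep the example self-contained it reads pages from a small in-memory dictionary (`FAKE_WEB`, a made-up stand-in for the network) rather than issuing HTTP requests, and it replaces recursion with an explicit queue, which visits pages breadth-first. The URLs and the names `LinkExtractor` and `crawl` are assumptions for illustration.

```python
from collections import deque
from html.parser import HTMLParser

# Hypothetical in-memory "web": URL -> HTML body. A real crawler would
# download each page over HTTP instead of reading from this dict.
FAKE_WEB = {
    "http://example.test/": '<a href="http://example.test/a">A</a> <a href="http://example.test/b">B</a>',
    "http://example.test/a": '<a href="http://example.test/b">B</a>',
    "http://example.test/b": '<a href="http://example.test/">home</a> <a href="http://example.test/missing">x</a>',
}


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seeds, fetch=FAKE_WEB.get, max_pages=100):
    """Breadth-first crawl from the seed URLs; returns a {url: html} corpus."""
    frontier = deque(seeds)
    seen = set(seeds)
    corpus = {}
    while frontier and len(corpus) < max_pages:
        url = frontier.popleft()
        html = fetch(url)
        if html is None:  # page could not be downloaded
            continue
        corpus[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for link in parser.links:
            if link not in seen:  # avoid re-downloading known URLs
                seen.add(link)
                frontier.append(link)
    return corpus
```

The `seen` set is what stops the crawl from looping forever when pages link back to each other, as the home page and page `b` do here.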
• Google visualizes its infrastructure as a three-layer stack:
▫ Products: search, advertising, email, maps, video, chat, Blogger.
▫ Distributed systems infrastructure: GFS, MapReduce, and BigTable.
▫ Computing platforms: a bunch of machines in a bunch of different data
centers.
• Make it easy for people in the company to deploy at low cost.
• Look at price/performance data on a per-application basis. Spend more
money on hardware so as not to lose log data, but spend less on other types
of data. Having said that, they don't lose data.
Topic 2: Content – Technologies Stack
• Google Technology stack.
• Web-crawlers.
Topic 2: Wrap-up
• Top 5 requirements for implementing Enterprise search.
• Options available for each requirement.
Topic 3: Objectives
• Diverse Content: Ability to crawl, index and search diverse content repositories.
▫ The Web, Microsoft SQL databases and SharePoint content management systems.
• Secured Search: Ability to crawl secured content and make it accessible only to authorized people
and/or groups.
▫ Single sign-on, forms-based authentication.
• User Interface: Ability to provide various user interface (UI) components that serve end users
precise results.
▫ Guided navigation, related search terms, related articles and best bets.
▫ AutoSuggest with terms combined from real-time search and custom (user-configurable) terms
in data stores.
• Desktop Search: Ability to integrate with content stored on the desktop.
• Social Search: Ability to find other people, ratings and expertise within the organization.
Topic 3: Content - Top 5 requirements for implementing
Enterprise search
• Google Web crawler for crawling and indexing Web content (GOOTB).
• Google DB connector for crawling and indexing Microsoft SQL database (GOOTB).
• Google SharePoint connector for crawling and indexing SharePoint content (GOOTB).
• Google forms authentication for index time authorization and serve time authentication
(GOOTB).
• Google front-end configuration for:
> Faceted search, aka guided navigation (limited OOTB).
> Related search terms (GOOTB).
> Related articles (GOOTB).
> Best bets (GOOTB).
> Autosuggest (GOOTB and custom application).
• Google desktop search component integration (external Google component).
• Google results integration with internal rating system
Topic 3: Content – Google implementing requirements
• Google Web Crawler.
• Disadvantage: As efficient and good as it sounds, one disadvantage of
the Google Web crawler is that it cannot reveal the exact page that is
currently being processed.
• Alternative: The OS console monitor and/or tracking log files are some
ways that could help track URL crawl status.
• At any point in time, a developer should be able to view the current URL
being crawled and any security issues encountered. Almost all tools
provide this feature, such as Solr, FAST, Endeca and Autonomy.
Topic 3: Content – Web crawler
• Database Connector.
• Disadvantages:
▫ Google does not allow end implementers to schedule the DB crawl.
▫ Poor diagnostics for connector/XML-fed content.
▫ Google's way of removing content from the index is quite primitive and time-consuming.
• Alternative: Compared to GSA, Apache Solr was found to be a better
option for indexing the database via its Data Import Handler.
• Solr provides an effective way to remove content from the index, either via
the admin console or via XML import (/update with the delete option).
Topic 3: Content – Database Connector
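As a rough illustration of the "/update with delete option" route, the sketch below builds the two XML delete messages that Solr's update handler accepts: delete by unique key and delete by query. It only constructs the payloads. Posting them to a core's `/update` endpoint and committing afterwards is left out, and the document ids and the `source:crm` query are hypothetical values.

```python
import xml.etree.ElementTree as ET


def delete_by_ids(doc_ids):
    """Build an XML delete message that removes documents by unique key."""
    root = ET.Element("delete")
    for doc_id in doc_ids:
        ET.SubElement(root, "id").text = str(doc_id)
    return ET.tostring(root, encoding="unicode")


def delete_by_query(query):
    """Build an XML delete message that removes every document matching a query."""
    root = ET.Element("delete")
    ET.SubElement(root, "query").text = query
    return ET.tostring(root, encoding="unicode")


# Hypothetical usage: ids 42 and 43, and all documents from a "crm" source.
BY_ID = delete_by_ids([42, 43])
BY_QUERY = delete_by_query("source:crm")
```

Building the message with an XML library rather than string concatenation means characters such as `<` or `&` inside a query are escaped automatically.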
• Google provides connectors to very few CMS systems out of the box.
• Disadvantage:
▫ Even if Google executes a bulk late binding, performance issues
at query time are inevitable when the document volume is high.
• Alternative: One alternative is to treat site/page/document-level
security as additional metadata and develop an application that
post-filters the results based on end-user security attributes. This is again
a primitive method and has its own disadvantages in terms of query-time
latency.
Topic 3: Content – SharePoint Connector (for Document
Management system)
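The post-filtering alternative on this slide might look like the sketch below, assuming each search result carries its security metadata as a list of groups. The field name `allowed_groups`, the sample results, and the function name `post_filter` are assumptions for illustration only.

```python
def post_filter(results, user_groups):
    """Keep only results whose ACL shares at least one group with the user."""
    user_groups = set(user_groups)
    return [r for r in results if user_groups & set(r["allowed_groups"])]


# Hypothetical search results carrying security metadata.
RESULTS = [
    {"url": "/hr/salaries.xlsx", "allowed_groups": ["hr"]},
    {"url": "/wiki/onboarding", "allowed_groups": ["hr", "engineering", "sales"]},
    {"url": "/eng/design-doc", "allowed_groups": ["engineering"]},
]

VISIBLE = post_filter(RESULTS, ["engineering"])
```

Because the filter runs after retrieval, a page of results can shrink, so the application may have to fetch extra results to fill a page. That extra work is one source of the query-time latency mentioned on the slide.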
• At query time, Google uses the query-time configuration to make a HEAD
request that lets the logged-in user (within a specific domain) view
only the content that they are authorized to view.
• Disadvantage:
▫ With this late-binding security model, performance degradation is
inevitable at higher QPS and/or higher result counts.
• Alternative: There are tools that support an early-binding security model, which
allows the search engine to cache the user security groups along with the
content.
Topic 3: Content – Forms Authentication
• One disadvantage of Apache Solr is that it does not handle secured
content. The only way to serve secured content is to store the security
tags/groups as a metadata field and implement a field-constrained (or
metadata-constrained) search.
• That is where ACLs come into the picture.
Note
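A field-constrained search of this kind could be expressed with Solr's standard `q` and `fq` (filter query) request parameters, as in the sketch below. The field name `acl_groups`, the group names, and the helper `build_secured_params` are assumptions. The index is presumed to store each document's permitted groups in that field.

```python
from urllib.parse import urlencode


def build_secured_params(user_query, user_groups, acl_field="acl_groups"):
    """Return request parameters that restrict results to the user's groups."""
    if not user_groups:
        raise ValueError("user must belong to at least one group")
    # Quote each group so spaces or special characters stay inside one term.
    quoted = [
        '"' + g.replace("\\", "\\\\").replace('"', '\\"') + '"'
        for g in user_groups
    ]
    fq = acl_field + ":(" + " OR ".join(quoted) + ")"
    return {"q": user_query, "fq": fq}


# Hypothetical user searching for "budget" as a member of two groups.
PARAMS = build_secured_params("budget", ["hr", "finance"])
QUERY_STRING = urlencode(PARAMS)
```

Keeping the security restriction in `fq` rather than in `q` separates it from relevance scoring, so the user's groups limit which documents match without changing how the matches are ranked.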
• GSA provides an open source component called “search-as-you-type”, which
allows end implementers to fetch real-time results from the appliance.
• Disadvantage:
▫ OneBox modules are designed to respond within one second. This could
result in no results from TermFederator if there is any delay at the
database.
• Alternative: The TermsComponent in Apache Solr is an effective autosuggest tool.
Terms stored in any local text file can be made available to Solr at startup. A
separate component can then merge the two sets of terms alphabetically.
Topic 3: Content – Auto Suggest
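The merging step can be illustrated independently of Solr. The sketch below combines terms from the index with custom terms from a file into one alphabetical list and serves prefix suggestions from it. It is a plain-Python stand-in, not the TermsComponent API, and the term lists and function names are invented for the example.

```python
import heapq
from bisect import bisect_left


def merge_terms(index_terms, custom_terms):
    """Merge two term lists into one alphabetically sorted, de-duplicated list."""
    merged = []
    for term in heapq.merge(sorted(index_terms), sorted(custom_terms)):
        if not merged or merged[-1] != term:
            merged.append(term)
    return merged


def suggest(terms, prefix, limit=5):
    """Return up to `limit` terms from the sorted list that start with `prefix`."""
    start = bisect_left(terms, prefix)  # first position where prefix could sit
    out = []
    for term in terms[start:]:
        if not term.startswith(prefix) or len(out) == limit:
            break
        out.append(term)
    return out


# Hypothetical terms: one list from the search index, one from a custom file.
TERMS = merge_terms(["solr", "search", "security"], ["sharepoint", "search", "sso"])
```

Keeping the merged list sorted lets each keystroke be answered with a binary search followed by a short scan, which helps stay within a tight response budget.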
• Best Bets: aka KeyMatches, aka AdWords.
• Related search terms: the same as synonyms.
• Faceted search, aka Guided Navigation: GSA does not support faceted search,
but this feature can be achieved via metadata-constrained search at query time,
similar to how it is implemented in Solr.
• Disadvantage: Facet counts are not available in GSA OOTB.
• Alternative: Faceted search is one of Apache Solr's strongest features and is
implemented in many e-commerce websites.
(Oracle) Endeca and (HP) Autonomy also maintain a content hierarchy for guided
navigation.
Topic 3: Content – User Interface
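To make the two ideas on this slide concrete, the sketch below shows a metadata-constrained search (filtering on field values) next to the facet counts that guided navigation displays beside each value. It is a plain-Python illustration rather than GSA or Solr code, and the product documents and field names are hypothetical.

```python
from collections import Counter


def facet_counts(docs, field):
    """Count how many documents carry each value of a metadata field."""
    return Counter(doc[field] for doc in docs if field in doc)


def constrain(docs, **filters):
    """Metadata-constrained search: keep docs matching every field=value filter."""
    return [d for d in docs if all(d.get(k) == v for k, v in filters.items())]


# Hypothetical e-commerce catalogue.
DOCS = [
    {"title": "Laptop A", "brand": "Acme", "category": "laptops"},
    {"title": "Laptop B", "brand": "Globex", "category": "laptops"},
    {"title": "Phone C", "brand": "Acme", "category": "phones"},
]

BRANDS = facet_counts(DOCS, "brand")
LAPTOP_BRANDS = facet_counts(constrain(DOCS, category="laptops"), "brand")
```

Selecting a facet value amounts to adding one more filter and recomputing the counts over the narrowed result set.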
• The InfoValuator component captures end-user ratings and saves a
combination of user identity, content URI and rating value in the backend
data store.
Topic 3: Content – InfoValuator
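A rating store of the shape described on this slide (user identity, content URI, rating value) could be sketched as below. The slides do not say how InfoValuator is built, so everything here is an assumption: the in-memory SQLite database, the table and column names, the rule that a user's newer rating replaces the older one, and the averaging query.

```python
import sqlite3


def open_store():
    """Create an in-memory store holding one rating per (user, content URI)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ratings ("
        " user_id TEXT NOT NULL,"
        " content_uri TEXT NOT NULL,"
        " rating INTEGER NOT NULL,"
        " PRIMARY KEY (user_id, content_uri))"
    )
    return conn


def save_rating(conn, user_id, content_uri, rating):
    """Insert or overwrite one user's rating for one piece of content."""
    conn.execute(
        "INSERT OR REPLACE INTO ratings VALUES (?, ?, ?)",
        (user_id, content_uri, rating),
    )


def average_rating(conn, content_uri):
    """Return the mean rating for a URI, or None if nobody has rated it."""
    row = conn.execute(
        "SELECT AVG(rating) FROM ratings WHERE content_uri = ?", (content_uri,)
    ).fetchone()
    return row[0]
```

An aggregate such as the average is the kind of value a search front end could show next to each result when integrating with an internal rating system.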
• No single search engine fulfills all enterprise search
requirements. HP Autonomy claims this lofty perch, but it comes with a
huge cost overhead, with the base cost crossing half a million dollars.
• Google is not the right fit for many of the requirements we have seen so
far. Custom search application development is inevitable, and if it is well
planned, we can use practically any tool on the market to implement
enterprise search as a full-fledged application.
Summary of Session
 
Investigate & Recover / StarCompliance.io / Crypto_Crimes
Investigate & Recover / StarCompliance.io / Crypto_CrimesInvestigate & Recover / StarCompliance.io / Crypto_Crimes
Investigate & Recover / StarCompliance.io / Crypto_Crimes
 
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
 
tapal brand analysis PPT slide for comptetive data
tapal brand analysis PPT slide for comptetive datatapal brand analysis PPT slide for comptetive data
tapal brand analysis PPT slide for comptetive data
 
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project PresentationPredicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
 
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
 
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
 
Opendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptxOpendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptx
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
 
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape ReportSOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape Report
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
 
一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
 
一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单
 
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
 

Google search vs Solr search for Enterprise search

  • 1. Google Search vs. Advanced Search (Enterprise Search Implementation). Presented by Veera Shekar G. 8/6/2015 11/05/2015
  • 2. Introduction. • The processes of a normal search engine. • You will understand how a search engine works. • I am a beginner at this subject. • Top 5 requirements for an effective enterprise search implementation. • Problems with implementations.
  • 3. Session Outline. • Topic 1: How a search engine works. ▫ Architecture and component details. • Topic 2: Google Search. ▫ Phases of implementation; indexing architecture. • Topic 3: Top 5 requirements for implementing enterprise search. ▫ Options available for implementation.
  • 4. Topic 1: Objectives. • Architecture of a normal search engine. • Factors that determine the architecture of a search engine. • Indexing process.
  • 5. Topic 1: Content - Normal Search Engine Architecture. • The architecture of a search engine can be viewed as two layers.
  • 6. Topic 1: Content - Factors. • The architecture of a search engine is determined by two requirements: effectiveness (quality of results) and efficiency (response time and throughput).
  • 7. Topic 1: Content. • Text acquisition: identifies and stores documents for indexing. • Text transformation: transforms documents into index terms or features. • Index creation: takes index terms and creates data structures (indexes) to support fast searching.
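The three indexing steps on slide 7 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `documents` dictionary stands in for text acquisition, the tokenizer is a deliberately simple lowercase split, and the function names are hypothetical rather than taken from any search product.

```python
import re
from collections import defaultdict


def transform(text):
    """Text transformation: lowercase the text and split it into index terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Index creation: map each term to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in transform(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Querying: return IDs of documents containing every query term (AND)."""
    terms = transform(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Text acquisition: a hypothetical in-memory document store.
documents = {
    1: "Enterprise search with Solr",
    2: "Google Search Appliance for enterprise content",
    3: "Web crawlers collect pages",
}
index = build_index(documents)
print(sorted(search(index, "enterprise search")))
```

The term-to-documents mapping is what makes lookups fast: a query touches only the entries for its own terms instead of scanning every document.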
  • 8. Topic 1: Wrap-up. • A search engine has two main processes: the indexing process and the querying process. • Questions?
  • 9. Topic 2: Google Search. • High-level architecture of Google Search. • Web crawlers. • Technologies used.
  • 10. Topic 2: Content - High-Level Architecture.
  • 11. Topic 2: Content - Web Crawlers. • A web crawler is a program that, given one or more seed URLs, downloads the web pages associated with these URLs, extracts any hyperlinks contained in them, and recursively continues to download the web pages identified by these hyperlinks. • Web crawlers are an important component of web search engines, where they are used to collect the corpus of web pages indexed by the search engine.
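The crawl loop described on slide 11 might look like the sketch below. It is a simplified illustration, not how Google's crawler is built: the `fetch` callable and the `FAKE_WEB` dictionary are assumptions so the example runs without network access, and the recursion is expressed as a breadth-first queue. A real crawler would also need politeness delays, robots.txt handling and error handling.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed_urls, fetch, max_pages=100):
    """Breadth-first crawl from the seeds; returns {url: html} for fetched pages."""
    queue = deque(seed_urls)
    seen = set(seed_urls)
    corpus = {}
    while queue and len(corpus) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:  # page could not be downloaded
            continue
        corpus[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            absolute = urljoin(url, href)  # resolve relative links
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return corpus


# Hypothetical in-memory "web" so the sketch runs offline.
FAKE_WEB = {
    "http://example.test/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.test/a": '<a href="/">home</a>',
    "http://example.test/b": "no links here",
}

pages = crawl(["http://example.test/"], FAKE_WEB.get)
print(sorted(pages))
```

The `seen` set is what keeps the crawl from looping forever when pages link back to each other, as `/a` does to the home page here.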
  • 12. Topic 2: Content - Technologies Stack. • Google visualizes its infrastructure as a three-layer stack: • Products: search, advertising, email, maps, video, chat, Blogger. • Distributed systems infrastructure: GFS, MapReduce, and BigTable. • Computing platforms: many machines spread across many data centers. • Make it easy for people in the company to deploy at low cost. • Look at price/performance data on a per-application basis. Spend more money on hardware so that log data is not lost, but spend less on other types of data. Having said that, they don't lose data.
  • 13. Topic 2: Wrap-up. • Google technology stack. • Web crawlers.
  • 14. Topic 3: Objectives. • Top 5 requirements for implementing enterprise search. • Options available for each requirement.
  • 15. Topic 3: Content - Top 5 requirements for implementing Enterprise search. • Diverse Content: Ability to crawl, index and search diverse content repositories, such as the Web, Microsoft SQL databases and SharePoint content management systems. • Secured Search: Ability to crawl secured content and make it accessible only to authorized people and/or groups, for example via single sign-on or forms-based authentication. • User Interface: Ability to provide various user interface (UI) components that serve end users precise results: guided navigation, related search terms, related articles and best bets, plus AutoSuggest with terms combined from real-time search and custom (user-configurable) terms in data stores. • Desktop Search: Ability to integrate with content stored on the desktop. • Social Search: Ability to find other people, ratings and expertise within the organization.
  • 16. Topic 3: Content - Google implementing requirements. • Google Web crawler for crawling and indexing Web content (GOOTB). • Google DB connector for crawling and indexing Microsoft SQL databases (GOOTB). • Google SharePoint connector for crawling and indexing SharePoint content (GOOTB). • Google forms authentication for index-time authorization and serve-time authentication (GOOTB). • Google front-end configuration for: > Faceted search, aka guided navigation (limited OOTB). > Related search terms (GOOTB). > Related articles (GOOTB). > Best bets (GOOTB). > Autosuggest (GOOTB and custom application). • Google desktop search component integration (external Google component). • Google results integration with an internal rating system.
  • 18. Topic 3: Content - Web crawler. • Google Web Crawler. • Disadvantage: As efficient and good as it sounds, one disadvantage of the Web crawler is Google’s inability to reveal the exact page that is currently being processed. • Alternative: The OS console monitor and/or tracking log files are some ways to track URL crawl status. • At any point in time, a developer should be able to view the URL currently being crawled and any security issues encountered. Almost all tools provide this feature, such as Solr, FAST, Endeca and Autonomy.
  • 19. Topic 3: Content - Database Connector. • Database Connector. • Disadvantage: Google does not allow end implementers to schedule a DB crawl. Diagnostics for connector/XML-fed content are poor. Google’s way of removing content from the index is quite primitive and time-consuming. • Alternative: Compared to GSA, Apache Solr was found to be a better option for indexing the database, via its Data Import Handler. • Solr provides an effective way to remove content from the index, either via the admin console or via XML import (/update with the delete option).
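As a rough illustration of the "/update with delete option" mentioned on slide 19, the sketch below builds the XML delete messages that Solr's update handler accepts (`<delete><id>…</id></delete>` and `<delete><query>…</query></delete>`). The document IDs and the `source` field are made-up examples, and the code only constructs the message; posting it to a core's `/update` endpoint and committing is left out.

```python
import xml.etree.ElementTree as ET


def delete_by_id_message(doc_ids):
    """Build a Solr XML update message that deletes documents by unique ID."""
    root = ET.Element("delete")
    for doc_id in doc_ids:
        ET.SubElement(root, "id").text = str(doc_id)
    return ET.tostring(root, encoding="unicode")


def delete_by_query_message(query):
    """Build a Solr XML update message that deletes all documents matching a query."""
    root = ET.Element("delete")
    ET.SubElement(root, "query").text = query
    return ET.tostring(root, encoding="unicode")


# Hypothetical IDs and field name, for illustration only.
print(delete_by_id_message(["doc-1", "doc-2"]))
print(delete_by_query_message("source:legacy_db"))
```

Building the message with an XML library rather than string concatenation means special characters in IDs or queries are escaped correctly.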
  • 20. Topic 3: Content - SharePoint Connector (for Document Management system). • Google provides connectors to very few CMS systems out of the box. • Disadvantage: Even if Google executes a bulk late binding, performance issues at query time are inevitable when the document volume is high. • Alternative: One alternative is to treat the site/page/document-level security as additional metadata and develop an application that post-filters the results based on end-user security attributes. This is again a primitive method and has its own disadvantages in terms of query-time latency.
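The post-filtering alternative on slide 20 could be sketched as follows. The result structure and the `allowed_groups` metadata field are assumptions made for the example; in practice the field name and the source of the user's groups depend on the connector and directory in use.

```python
def post_filter(results, user_groups):
    """Keep only results whose allowed groups overlap the user's groups."""
    user_groups = set(user_groups)
    return [r for r in results if user_groups & set(r["allowed_groups"])]


# Hypothetical search results carrying security metadata.
results = [
    {"url": "/hr/policy", "allowed_groups": ["hr", "managers"]},
    {"url": "/eng/design", "allowed_groups": ["engineering"]},
    {"url": "/public/faq", "allowed_groups": ["everyone"]},
]

visible = post_filter(results, ["engineering", "everyone"])
print([r["url"] for r in visible])
```

Because the filter runs after the search engine has already returned its results, every query pays for fetching hits the user may never see, which is the latency cost the slide warns about.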
  • 21. Topic 3: Content - Forms Authentication. • At query time, Google uses the query-time configuration to make a HEAD request that allows the logged-in user (within a specific domain) to view only the content that he or she is authorized to view. • Disadvantage: With this late-binding security model, performance degradation is inevitable at higher QPS and/or higher result counts. • Alternative: There are tools that support an early-binding security model, which allows the search engine to cache the user security groups along with the content.
  • 22. Note. • One disadvantage of Apache Solr is that it does not handle secured content. The only way to serve secured content is to store the security tags/groups as one of the metadata fields and implement a field-constrained (or metadata-constrained) search. • That is where ACLs come into the picture.
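One way to picture the field-constrained search from slide 22 is to add a Solr filter query (`fq`) that restricts hits to documents tagged with one of the user's groups. The sketch below only builds the request parameters; the `acl` field name is an assumption, and it presumes the groups were stored in that field at index time.

```python
from urllib.parse import urlencode


def quote_term(value):
    """Wrap a value in double quotes, escaping backslashes and quotes."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def secured_query_params(user_query, user_groups, acl_field="acl"):
    """Build Solr query parameters restricted to documents the user's groups may see."""
    if not user_groups:
        # Refuse to build a query that would silently be unrestricted.
        raise ValueError("user has no groups")
    clause = " OR ".join(quote_term(g) for g in sorted(user_groups))
    return {"q": user_query, "fq": f"{acl_field}:({clause})"}


# Hypothetical user query and group memberships.
params = secured_query_params("annual report", {"hr", "managers"})
print(params["fq"])
print(urlencode(params))
```

Keeping the security constraint in `fq` rather than in `q` leaves relevance scoring untouched, since filter queries restrict the result set without contributing to the score.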
  • 23. Topic 3: Content - Auto Suggest. • GSA provides an open source component called “search-as-you-type”, which allows end implementers to fetch real-time results from the appliance. • Disadvantage: OneBox modules are designed to respond within one second. This could result in no results from TermFederator if there is any delay at the database. • Alternative: “TermComponent” in Apache Solr is an effective autosuggest tool. Terms stored in any local text file can be made available to Solr at startup. A separate component is designed to merge the terms alphabetically.
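The alphabetical merge of real-time and custom terms mentioned on slide 23 can be sketched as below. This is a plain-Python illustration of the idea, not the Solr or GSA component itself; the function name, the sample terms and the `limit` parameter are all hypothetical.

```python
import heapq


def suggest(prefix, realtime_terms, custom_terms, limit=5):
    """Merge two term lists alphabetically and return those starting with prefix."""
    prefix = prefix.lower()
    merged = heapq.merge(
        sorted(t.lower() for t in realtime_terms),
        sorted(t.lower() for t in custom_terms),
    )
    suggestions = []
    previous = None
    for term in merged:
        # Duplicates are adjacent in a sorted merge, so one look-back suffices.
        if term != previous and term.startswith(prefix):
            suggestions.append(term)
            if len(suggestions) == limit:
                break
        previous = term
    return suggestions


# Hypothetical terms from real-time search and from a custom term store.
realtime = ["solr", "search appliance", "security"]
custom = ["Search", "sharepoint", "solr"]

print(suggest("se", realtime, custom))
```

`heapq.merge` consumes the two sorted inputs lazily, so the loop can stop as soon as enough suggestions have been collected.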
  • 24. Topic 3: Content - User Interface. • Best Bets: aka Keymatches, aka AdWords. • Related search terms: the same as synonyms. • Faceted search, aka Guided Navigation: GSA does not support faceted search, but this feature can be achieved via metadata-constrained search at query time, similar to how it is implemented in Solr. • Disadvantage: Facet counts are not available OOTB in GSA. • Alternative: Faceted search is one of Apache Solr’s strongest features and is implemented within many e-commerce websites. (Oracle) Endeca and (HP) Autonomy maintain a content hierarchy for guided navigation.
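To make the two ideas on slide 24 concrete, the sketch below computes facet counts over a metadata field and then narrows the result set with a metadata constraint. It is an in-memory illustration only; the documents, field names and helper functions are invented for the example and do not reflect the Solr or GSA APIs.

```python
from collections import Counter


def facet_counts(docs, field):
    """Count how many documents carry each value of a metadata field."""
    return Counter(d[field] for d in docs if field in d)


def constrain(docs, field, value):
    """Metadata-constrained search: keep documents whose field equals value."""
    return [d for d in docs if d.get(field) == value]


# Hypothetical product catalogue with metadata.
docs = [
    {"title": "Laptop A", "brand": "Acme", "category": "laptops"},
    {"title": "Laptop B", "brand": "Globex", "category": "laptops"},
    {"title": "Phone C", "brand": "Acme", "category": "phones"},
]

print(facet_counts(docs, "brand").most_common())
print([d["title"] for d in constrain(docs, "brand", "Acme")])
```

The counts are what a guided-navigation sidebar displays next to each value; selecting a value then applies the corresponding constraint.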
  • 25. Topic 3: Content - InfoValuator. • The InfoValuator component captures end-user ratings and saves a combination of user identity, content URI and rating value in the back-end data store.
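The record described on slide 25 (user identity, content URI, rating value) might be modelled as in the sketch below. The class names, the in-memory dictionary standing in for the back-end data store, and the one-rating-per-user rule are assumptions for illustration; the slide does not describe how InfoValuator is actually implemented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rating:
    user_id: str
    content_uri: str
    value: int


class RatingStore:
    """In-memory stand-in for the back-end data store."""

    def __init__(self):
        self._ratings = {}  # (user_id, content_uri) -> Rating

    def save(self, rating):
        # Assumption: a user's newer rating replaces the older one.
        self._ratings[(rating.user_id, rating.content_uri)] = rating

    def average(self, content_uri):
        """Return the mean rating for a URI, or None if it has no ratings."""
        values = [r.value for r in self._ratings.values()
                  if r.content_uri == content_uri]
        return sum(values) / len(values) if values else None


store = RatingStore()
store.save(Rating("alice", "/docs/handbook", 4))
store.save(Rating("bob", "/docs/handbook", 2))
print(store.average("/docs/handbook"))
```

An average like this is one possible signal a search front end could show next to a result or feed into ranking.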
  • 26. Summary of Session. • No single search engine fulfills all enterprise search requirements. HP Autonomy claims this lofty perch, but it comes with a huge cost overhead, with the base cost crossing half a million dollars. • Google is not the right fit for many of the requirements we have seen so far. Custom search application development is inevitable, and if it is well planned, essentially any tool on the market can be used to implement enterprise search as a full-fledged application.
