SlideShare a Scribd company logo
1 of 16
Comparing open source
search engines
Richard Boulton
@rboulton
richard@cnav.co.uk
Search Engine?
Document oriented database
Inverted index
Ranking / weighting algorithm
Lucene
Java
Apache License
Low-level: Java API
Lucene Family
Solr: “REST-like” XML/JSON API
ElasticSearch: REST API
… and many commercial engines
Xapian
C++
GPLv2
Low-level: C++ API
Python/Ruby/PHP/Perl/Java bindings
Xapian
C++
GPLv2
Low-level: C++ API
Python/Ruby/PHP/Perl/Java bindings
Partiality
Risk
Xapian Family
Omega: Indexer + CGI interface
Flax: REST API
Xappy: Python wrapper
Sphinx
C++
GPLv2
SQL-like API
Others
Riak Search
Terrier
MySQL Fulltext
PostgreSQL FTS
Redis
Whoosh
Logos
Document model
Lucene, Xapian:
List of terms
Solr, Sphinx:
Fields in a predefined fixed schema.
Flax, Xappy:
Fields, with associated modifiable schema.
ElasticSearch:
Fields, document types, free schema.
Updates
Lucene, Xapian + families:
Dynamic updates
Use batches for fastest updates
Sphinx:
No updates to existing indexes
(“Realtime indexing” in beta with SQL API)
Data structures
Lucene:
Hash based segments
Heirarchical merge
Xapian:
B-tree, transactional
Scaling / replication
● All engines allow searches across databases
● Allows sharding
● All engines allow replication
● Allows spreading load and high availability
● Had difficulty with Sphinx
● Elastic search does it completely transparently
Commercial Support
Lucene: Lucid Imagination, Sematext, …
Xapian: Oligarchy Ltd, Flax, me
Sphinx: Sphinx Technologies Inc
● Lucene / Solr community – revolting (they say)
● Xapian – quieter, but steadily growing
● Sphinx – popular amongst relational database
users (apparently)
Community

More Related Content

What's hot

How search engines work
How search engines workHow search engines work
How search engines work
Chinna Botla
 

What's hot (20)

Front End Development | Introduction
Front End Development | IntroductionFront End Development | Introduction
Front End Development | Introduction
 
Web Scraping and Data Extraction Service
Web Scraping and Data Extraction ServiceWeb Scraping and Data Extraction Service
Web Scraping and Data Extraction Service
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Spark
 
On-Page SEO Techniques for 2022
On-Page SEO Techniques for 2022On-Page SEO Techniques for 2022
On-Page SEO Techniques for 2022
 
Web Designing Presentation
Web Designing PresentationWeb Designing Presentation
Web Designing Presentation
 
Elasticsearch V/s Relational Database
Elasticsearch V/s Relational DatabaseElasticsearch V/s Relational Database
Elasticsearch V/s Relational Database
 
ppt of web designing and development
ppt of web designing and developmentppt of web designing and development
ppt of web designing and development
 
HTML
HTMLHTML
HTML
 
Top web development tools
Top web development toolsTop web development tools
Top web development tools
 
CSS
CSSCSS
CSS
 
Javascript
JavascriptJavascript
Javascript
 
Apache Arrow and Python: The latest
Apache Arrow and Python: The latestApache Arrow and Python: The latest
Apache Arrow and Python: The latest
 
How search engines work
How search engines workHow search engines work
How search engines work
 
Search engine
Search engineSearch engine
Search engine
 
Apache Spark Tutorial | Spark Tutorial for Beginners | Apache Spark Training ...
Apache Spark Tutorial | Spark Tutorial for Beginners | Apache Spark Training ...Apache Spark Tutorial | Spark Tutorial for Beginners | Apache Spark Training ...
Apache Spark Tutorial | Spark Tutorial for Beginners | Apache Spark Training ...
 
Json
JsonJson
Json
 
Tutorial Screenshot
Tutorial ScreenshotTutorial Screenshot
Tutorial Screenshot
 
HTTP Request Header and HTTP Status Code
HTTP Request Header and HTTP Status CodeHTTP Request Header and HTTP Status Code
HTTP Request Header and HTTP Status Code
 
Transactional writes to cloud storage with Eric Liang
Transactional writes to cloud storage with Eric LiangTransactional writes to cloud storage with Eric Liang
Transactional writes to cloud storage with Eric Liang
 
Elasticsearch
ElasticsearchElasticsearch
Elasticsearch
 

Viewers also liked

探索 Everything 背后的技术
探索 Everything 背后的技术探索 Everything 背后的技术
探索 Everything 背后的技术
yiwenshengmei
 
ZFConf 2011: Что такое Sphinx, зачем он вообще нужен и как его использовать с...
ZFConf 2011: Что такое Sphinx, зачем он вообще нужен и как его использовать с...ZFConf 2011: Что такое Sphinx, зачем он вообще нужен и как его использовать с...
ZFConf 2011: Что такое Sphinx, зачем он вообще нужен и как его использовать с...
ZFConf Conference
 

Viewers also liked (12)

Xapian vs sphinx
Xapian vs sphinxXapian vs sphinx
Xapian vs sphinx
 
Solr vs ElasticSearch
Solr vs ElasticSearchSolr vs ElasticSearch
Solr vs ElasticSearch
 
探索 Everything 背后的技术
探索 Everything 背后的技术探索 Everything 背后的技术
探索 Everything 背后的技术
 
The Pregel Programming Model with Spark GraphX
The Pregel Programming Model with Spark GraphXThe Pregel Programming Model with Spark GraphX
The Pregel Programming Model with Spark GraphX
 
How to build_a_search_engine
How to build_a_search_engineHow to build_a_search_engine
How to build_a_search_engine
 
Elasticsearch cluster deep dive
Elasticsearch  cluster deep diveElasticsearch  cluster deep dive
Elasticsearch cluster deep dive
 
From Lucene to Elasticsearch, a short explanation of horizontal scalability
From Lucene to Elasticsearch, a short explanation of horizontal scalabilityFrom Lucene to Elasticsearch, a short explanation of horizontal scalability
From Lucene to Elasticsearch, a short explanation of horizontal scalability
 
Introduction to Elasticsearch
Introduction to ElasticsearchIntroduction to Elasticsearch
Introduction to Elasticsearch
 
ZFConf 2011: Что такое Sphinx, зачем он вообще нужен и как его использовать с...
ZFConf 2011: Что такое Sphinx, зачем он вообще нужен и как его использовать с...ZFConf 2011: Что такое Sphinx, зачем он вообще нужен и как его использовать с...
ZFConf 2011: Что такое Sphinx, зачем он вообще нужен и как его использовать с...
 
An Introduction to Elastic Search.
An Introduction to Elastic Search.An Introduction to Elastic Search.
An Introduction to Elastic Search.
 
Firefox OS de la théorie à la pratique - OSDC
Firefox OS de la théorie à la pratique - OSDCFirefox OS de la théorie à la pratique - OSDC
Firefox OS de la théorie à la pratique - OSDC
 
How to Become a Thought Leader in Your Niche
How to Become a Thought Leader in Your NicheHow to Become a Thought Leader in Your Niche
How to Become a Thought Leader in Your Niche
 

Similar to Comparing open source search engines

Java JSON Parser Comparison
Java JSON Parser ComparisonJava JSON Parser Comparison
Java JSON Parser Comparison
Allan Huang
 

Similar to Comparing open source search engines (20)

Phalcon 2 High Performance APIs - DevWeekPOA 2015
Phalcon 2 High Performance APIs - DevWeekPOA 2015Phalcon 2 High Performance APIs - DevWeekPOA 2015
Phalcon 2 High Performance APIs - DevWeekPOA 2015
 
A high profile project with Symfony and API Platform: beIN SPORTS
A high profile project with Symfony and API Platform: beIN SPORTSA high profile project with Symfony and API Platform: beIN SPORTS
A high profile project with Symfony and API Platform: beIN SPORTS
 
Salesforce Integration
Salesforce IntegrationSalesforce Integration
Salesforce Integration
 
Salesforce REST API
Salesforce  REST API Salesforce  REST API
Salesforce REST API
 
Solr -
Solr - Solr -
Solr -
 
Real time cloud native open source streaming of any data to apache solr
Real time cloud native open source streaming of any data to apache solrReal time cloud native open source streaming of any data to apache solr
Real time cloud native open source streaming of any data to apache solr
 
Integration on Force.com Platform
Integration on Force.com PlatformIntegration on Force.com Platform
Integration on Force.com Platform
 
API Platform 2.1: when Symfony meets ReactJS (Symfony Live 2017)
API Platform 2.1: when Symfony meets ReactJS (Symfony Live 2017)API Platform 2.1: when Symfony meets ReactJS (Symfony Live 2017)
API Platform 2.1: when Symfony meets ReactJS (Symfony Live 2017)
 
Java JSON Parser Comparison
Java JSON Parser ComparisonJava JSON Parser Comparison
Java JSON Parser Comparison
 
Introducing Amazon EMR Release 5.0 - August 2016 Monthly Webinar Series
Introducing Amazon EMR Release 5.0 - August 2016 Monthly Webinar SeriesIntroducing Amazon EMR Release 5.0 - August 2016 Monthly Webinar Series
Introducing Amazon EMR Release 5.0 - August 2016 Monthly Webinar Series
 
A Data Streaming Architecture with Apache Flink (berlin Buzzwords 2016)
A Data Streaming Architecture with Apache Flink (berlin Buzzwords 2016)A Data Streaming Architecture with Apache Flink (berlin Buzzwords 2016)
A Data Streaming Architecture with Apache Flink (berlin Buzzwords 2016)
 
The return of an old enemy
The return of an old enemyThe return of an old enemy
The return of an old enemy
 
LAJUG Napster REST API
LAJUG Napster REST APILAJUG Napster REST API
LAJUG Napster REST API
 
A Comparison Between Python APIs For RDF Processing
A Comparison Between Python APIs For RDF ProcessingA Comparison Between Python APIs For RDF Processing
A Comparison Between Python APIs For RDF Processing
 
API Platform and Symfony: a Framework for API-driven Projects
API Platform and Symfony: a Framework for API-driven ProjectsAPI Platform and Symfony: a Framework for API-driven Projects
API Platform and Symfony: a Framework for API-driven Projects
 
Talis Platform: A Linked Data Engine
Talis Platform: A Linked Data EngineTalis Platform: A Linked Data Engine
Talis Platform: A Linked Data Engine
 
Confluent and Elastic
Confluent and ElasticConfluent and Elastic
Confluent and Elastic
 
LAMP Stack Tutorial by jeetendra mandal
LAMP Stack Tutorial by  jeetendra mandalLAMP Stack Tutorial by  jeetendra mandal
LAMP Stack Tutorial by jeetendra mandal
 
Creating hypermedia APIs in a few minutes using the API Platform framework
Creating hypermedia APIs in a few minutes using the API Platform frameworkCreating hypermedia APIs in a few minutes using the API Platform framework
Creating hypermedia APIs in a few minutes using the API Platform framework
 
The Glory of Rest
The Glory of RestThe Glory of Rest
The Glory of Rest
 

More from Richard Boulton (8)

Improving relevance with log information
Improving relevance with log informationImproving relevance with log information
Improving relevance with log information
 
Designing a generic Python Search Engine API - BarCampLondon 8
Designing a generic Python Search Engine API - BarCampLondon 8Designing a generic Python Search Engine API - BarCampLondon 8
Designing a generic Python Search Engine API - BarCampLondon 8
 
Making a simple question into a complicated query
Making a simple question into a complicated queryMaking a simple question into a complicated query
Making a simple question into a complicated query
 
Interfaces to xapian
Interfaces to xapianInterfaces to xapian
Interfaces to xapian
 
Haystack
HaystackHaystack
Haystack
 
Search as a Service with Xapian - Search Solutions 2009
Search as a Service with Xapian - Search Solutions 2009Search as a Service with Xapian - Search Solutions 2009
Search as a Service with Xapian - Search Solutions 2009
 
Optimising Xapian
Optimising XapianOptimising Xapian
Optimising Xapian
 
The Xapian Open Source Search Engine
The Xapian Open Source Search EngineThe Xapian Open Source Search Engine
The Xapian Open Source Search Engine
 

Recently uploaded

Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Victor Rentea
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 

Recently uploaded (20)

Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
AI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by AnitarajAI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by Anitaraj
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 

Comparing open source search engines