SlideShare a Scribd company logo
Apache Solr
An experience report

2013-10-23 - Corsin Decurtins
Apache Solr
Full-Text Search Engine
Apache Lucene Project
based on Apache Lucene

Apache Solr: http://lucene.apache.org/solr/

Notes

Fast
Proven and Well-Known Technology
Java based
Open APIs
Customizable
Clustering Features
Setting the
Scene
Plaza Search
Full-Text Search Engine for the Intranet of Netcetera
Integrates Various Data Sources
Needs to be fast
Ranking is crucial
Simple Searching
Relevant Filtering Options
Desktop, Tables and Phones

Notes
Warum Intranet-Suchmaschinen unbrauchbar sind
…und was dagegen getan werden kann
2013-07-03 – Corsin Decurtins

http://www.slideshare.net/netceteragroup/20130703-intranet-searchintranetkonferenz
Some Numbers

Live since

Data since

1996

05/2012

~ 275 Users

~ 3'000'000
Documents

~ 500 – 2'000
Searches per day

Index Size
~ 75 GByte

~ 40 Releases
Some Numbers

Notes

Very small load (only a few hundred requests per day)
The indexer agents actually produce a lot more load than the actual end users
Medium size index (at least I think)
Not that many objects, but relatively big documents
Load performance is not a big topic for us
When we talk about performance, we actually usually mean response time
For us

Performance
means

Response Time
File System
Plaza Search

UI

Wiki
Indexer
Plaza Search
Indexer

Rest API

Apache Solr

Indexer

Email Archive

Apache Tika

Issue System

Index

CRM
Architecture
Based on Apache Solr (and other components)
Apache Solr takes care of the text-search aspect
We certainly do not want to build this ourselves
Apache Tika is used for analyzing (file) contents
Also here, we certainly do not want to build this ourselves

Notes
Magic
Apache Solr

Notes

Apache Solr is a very complex system with a lot of knobs and dials
Most things just seem like magic at the beginning … or they just do not work
Apache Solr is extremely powerful with a lot of features
You have to know how to configure the features
Most features need a bit more configuration than just a check box for activating it

Configuration options seem very confusing at the beginning
You do not need to understand everything from the start
Defaults are relatively sensible and the example applications are good starting point
Development Process
Research

Observe
Debug

Think
Design

Configure
Implement
Development Process
In our experience, Apache Solr works best with a very iterative process
Definition of Done is very difficult to specify for search use cases
Iterate through:
- Researching
- Thinking / Designing
- Implementation / Configuration / Testing
- Observing / Analyzing / Debugging

Notes
Research
Observe
Debug
Solr Admin Interface

Notes

Apache Solr has a pretty good admin interface
Very helpful for analysis and (manual) monitoring
If you are not familiar with the Solr Admin interface, you should be
Other tools like profilers, memory analyzers, monitoring tools etc. are also useful
Our Requirements

Correctness
Results that match query

Relevance
Results that matter

Speed
"Instant" results

Intelligence
Do you know what I mean?
Intelligence
Do you know what I mean?

synonyms.txt
stopwords.txt
protwords.txt
Solr Configuration Files

Notes

Solr has pretty much out-of-the-box support for stop words, protected works and
synonyms
These features look very simple, but they are very powerful
Unless you have a very general search use case, the defaults are not enough
Definitely worth developing a configuration specific to your domain
Iterate; consider these features for ranking optimizations as well
Relevance
Results that matter

score
match

boosting

term frequency
inverse document frequency
field weights

boosting function
index time boosting
elevation
Ranking in Solr (simplified)

Notes

Solr determines a score for the results of a query
Score can be used for sorting the results
Score is the product of different factors:
A query-specific part, let's call it the match value that is computed using
term frequency (tf)
inverse document frequency (idf)
There are also other parameters that can influence it (term weights, field weights, …)
The match basically says how well a document matches the query
Ranking in Solr (simplified)

Notes

A generic part (not query specific), let's call this a boosting value
Basically represents the general importance that you assign to a document
Includes a ranking function, e.g. based on the age of the document
Includes a boosting value, that is determined at index time (index-time boosting)
We calculate the boost value based on different attributes of the document, such as
type of resource (people are more important than files)
status of the project that a document is associated with (closed projects are less
important than running projects)
archive flag (archived resources are less important)
…
Ranking Function

recip(ms(NOW,datestamp),3.16e-11,1,1)
Index-Time
Boosting
Regression
Ranking Testing
assertRank("jira", "url",
"https://extranet.netcetera.biz/jira/", 1);
assertRank("jira", "url",
"https://plaza.netcetera.com/.../themas/JIRA", 2);
Regression Testing for the Ranking
Ranking is influenced by various factors
We have continuously executed tests for the ranking
Find ranking regressions as soon as possible
Tests are executed every night, not just with code changes

Notes
War Stories
War Story #1:

Disk Space
Situation
Search is often extremely slow, response times of 20-30s
Situation improves without any intervention
Problem shows up again very soon
Other applications in the same Tomcat server are brought to a grinding halt
No releases within the last 7 days
No significant data changes in the last 7 days
2-3 weeks earlier a new data sources have been added
Index had grown by a factor of 2, but everything worked fine since then

Notes
Disk Usage (fake diagram)
100
80
60
40
20
0
Lucene Index – Disk Usage

Notes

Index needs optimzation from time to time when you update it continuously
Index optimzation uses a lot of resources, i.e. CPU, memory and disk space
Optimzation requires twice the disk space than the optimal index
When your normal index uses > 50% of the available disk space, it's already too late
It's difficult to get out of this situation (without adding disk space)
Deleting stuff from the index does not help, as you need an optimization
Lessons Learned
We need least 2-3 times as much space as the "ideal" index needs
If your index has grown beyond 50%, it's already too late.
Disk Usage Monitoring has to be improved
Some problems take a long time to show themselves
Testing long-term effects and continuous delivery clash to some extent
War Story #2:

Free Memory
Situation
Search is always extremely slow, response times of 20-30s
Other applications in the same Tomcat server show normal performance
No releases within the last few days
No significant data changes in the last few days

Notes
Memory Usage (fake diagram)
12

10
8
6
4
2
0
I/O Caching
OS uses "free" memory for caching
I/O caching has a HUGE impact on I/O heavy applications
Solr (actually Lucene) is a I/O heavy application

Notes
Lessons Learned
Free memory != unused memory
Increasing the heap size can slow down Solr
OS does a better job at caching Solr data than Solr
War Story #3:

Know Your Maths
Situation
Search starts up very fine and is reasonably fast
Out Of Memory Errors after a couple of hours
Restart brings everything back to normal
Out Of Memory Errors come back after a certain time (no obvious pattern)

Notes
Analysis
Analysis of the memory usage using heap dumps
Solr Caches use up a lot of memory (not surprisingly)
Document cache with up to 2048 entries
Entries are dominated by the content field
Content is limited to 50 KByte by the indexers (or so I thought)
Content abbreviation had a bug
Instead of the 50KByte limit of the indexer, the 2 MByte limit of Solr was used
2048 * 2 MByte = 4GByte for the document cache
Heap size at that time = 4GByte

Notes
Lessons Learned
Heap dumps are your friends
Study your heap from time to time, even if you do not have a problem (yet)
Test your limiters
War Story #4:

Expensive
Features
Situation
Search has become slower and slower
We added a lot of data, so that's not really surprising
Analysis into different tuning parameters
Analysis into the cost of different features

Notes
Highlighting

70% of the response time
Lessons Learned
Some features are cool, but also very expensive
Think about what you need to index and what you need to store
Consider loading stuff "offline" and asynchronously
Consider loading stuff from other data sources
A few words on

Scaling
Solr Cloud – Horizontal and Vertical Scaling
Support for Replication and Sharding
Added with Apache Solr 4
Based on Apache Zookeeper
Replication
Fault tolerance, failover
Handling huge amounts of traffic
Sharding
Dealing with huge amounts of data
Geographical
Replication
Geographical Replication

Notes

Load is not an issue for us, but response time is
We have multiple geographically distribute sites
Network latency is a big factor of the response time if you are at a "far away" location
We have been thinking of setting up replicas of the search engine at the different
locations
Relevance-Aware
Sharding
Relevance-Aware Sharding

Notes

Normal sharding distributes data on different, but equal nodes
We have been thinking about creating deliberately different nodes for the distribution
of the data:
Node 1
- extremely fast
- small index, i.e. small amount of data
- lots of memory, CPU, really fast disks

Node 2
- lots more data
- big, but slower disks
- less memory and CPU

Frontend would send queries to both nodes and show results as they come in.
Distribution of the data would be based on the (query independent) boosting value
Wrapping Up
Search rocks
Apache Solr rocks
Learning Curve
Definition of Done
Continuous Inspection
Continuous Improvement
Get your hands dirty
Ranking Optimizations
Continuous Testing
and Monitoring
for Ranking and
Performance Issues
Verification
of features can take
a long time
Cool side projects rock
Contact
Corsin Decurtins
corsin.decurtins@netcetera.com
@corsin
References
Apache Solr
http://lucene.apache.org/solr/

Apache Solr Wiki
http://wiki.apache.org/solr/

Apache Solr on Stack Overflow
http://stackoverflow.com/questions/tagged/solr

Notes

More Related Content

What's hot

Building your own search engine with Apache Solr
Building your own search engine with Apache SolrBuilding your own search engine with Apache Solr
Building your own search engine with Apache Solr
Biogeeks
 
Introduction to Lucene and Solr - 1
Introduction to Lucene and Solr - 1Introduction to Lucene and Solr - 1
Introduction to Lucene and Solr - 1
YI-CHING WU
 
Solr Recipes
Solr RecipesSolr Recipes
Solr Recipes
Erik Hatcher
 
Introduction to Apache Solr
Introduction to Apache SolrIntroduction to Apache Solr
Introduction to Apache Solr
Alexandre Rafalovitch
 
Rapid Prototyping with Solr
Rapid Prototyping with SolrRapid Prototyping with Solr
Rapid Prototyping with Solr
Erik Hatcher
 
Andrzej bialecki lr-2013-dublin
Andrzej bialecki lr-2013-dublinAndrzej bialecki lr-2013-dublin
Andrzej bialecki lr-2013-dublin
lucenerevolution
 
Apache Solr Workshop
Apache Solr WorkshopApache Solr Workshop
Apache Solr Workshop
Saumitra Srivastav
 
Using Apache Solr
Using Apache SolrUsing Apache Solr
Using Apache Solr
pittaya
 
Apache Solr
Apache SolrApache Solr
Apache Solr
Minh Tran
 
Introduction Apache Solr & PHP
Introduction Apache Solr & PHPIntroduction Apache Solr & PHP
Introduction Apache Solr & PHP
Hiraq Citra M
 
Introduction to Apache Lucene/Solr
Introduction to Apache Lucene/SolrIntroduction to Apache Lucene/Solr
Introduction to Apache Lucene/Solr
Rahul Jain
 
Solr Application Development Tutorial
Solr Application Development TutorialSolr Application Development Tutorial
Solr Application Development Tutorial
Erik Hatcher
 
Apache Solr crash course
Apache Solr crash courseApache Solr crash course
Apache Solr crash course
Tommaso Teofili
 
State-of-the-Art Drupal Search with Apache Solr
State-of-the-Art Drupal Search with Apache SolrState-of-the-Art Drupal Search with Apache Solr
State-of-the-Art Drupal Search with Apache Solr
guest432cd6
 
Scaling Recommendations, Semantic Search, & Data Analytics with solr
Scaling Recommendations, Semantic Search, & Data Analytics with solrScaling Recommendations, Semantic Search, & Data Analytics with solr
Scaling Recommendations, Semantic Search, & Data Analytics with solr
Trey Grainger
 
Apache Solr! Enterprise Search Solutions at your Fingertips!
Apache Solr! Enterprise Search Solutions at your Fingertips!Apache Solr! Enterprise Search Solutions at your Fingertips!
Apache Solr! Enterprise Search Solutions at your Fingertips!
Murshed Ahmmad Khan
 
New-Age Search through Apache Solr
New-Age Search through Apache SolrNew-Age Search through Apache Solr
New-Age Search through Apache Solr
Edureka!
 
Solr Recipes Workshop
Solr Recipes WorkshopSolr Recipes Workshop
Solr Recipes Workshop
Erik Hatcher
 
Enterprise Search Using Apache Solr
Enterprise Search Using Apache SolrEnterprise Search Using Apache Solr
Enterprise Search Using Apache Solr
sagar chaturvedi
 
Dev8d Apache Solr Tutorial
Dev8d Apache Solr TutorialDev8d Apache Solr Tutorial
Dev8d Apache Solr Tutorial
Sourcesense
 

What's hot (20)

Building your own search engine with Apache Solr
Building your own search engine with Apache SolrBuilding your own search engine with Apache Solr
Building your own search engine with Apache Solr
 
Introduction to Lucene and Solr - 1
Introduction to Lucene and Solr - 1Introduction to Lucene and Solr - 1
Introduction to Lucene and Solr - 1
 
Solr Recipes
Solr RecipesSolr Recipes
Solr Recipes
 
Introduction to Apache Solr
Introduction to Apache SolrIntroduction to Apache Solr
Introduction to Apache Solr
 
Rapid Prototyping with Solr
Rapid Prototyping with SolrRapid Prototyping with Solr
Rapid Prototyping with Solr
 
Andrzej bialecki lr-2013-dublin
Andrzej bialecki lr-2013-dublinAndrzej bialecki lr-2013-dublin
Andrzej bialecki lr-2013-dublin
 
Apache Solr Workshop
Apache Solr WorkshopApache Solr Workshop
Apache Solr Workshop
 
Using Apache Solr
Using Apache SolrUsing Apache Solr
Using Apache Solr
 
Apache Solr
Apache SolrApache Solr
Apache Solr
 
Introduction Apache Solr & PHP
Introduction Apache Solr & PHPIntroduction Apache Solr & PHP
Introduction Apache Solr & PHP
 
Introduction to Apache Lucene/Solr
Introduction to Apache Lucene/SolrIntroduction to Apache Lucene/Solr
Introduction to Apache Lucene/Solr
 
Solr Application Development Tutorial
Solr Application Development TutorialSolr Application Development Tutorial
Solr Application Development Tutorial
 
Apache Solr crash course
Apache Solr crash courseApache Solr crash course
Apache Solr crash course
 
State-of-the-Art Drupal Search with Apache Solr
State-of-the-Art Drupal Search with Apache SolrState-of-the-Art Drupal Search with Apache Solr
State-of-the-Art Drupal Search with Apache Solr
 
Scaling Recommendations, Semantic Search, & Data Analytics with solr
Scaling Recommendations, Semantic Search, & Data Analytics with solrScaling Recommendations, Semantic Search, & Data Analytics with solr
Scaling Recommendations, Semantic Search, & Data Analytics with solr
 
Apache Solr! Enterprise Search Solutions at your Fingertips!
Apache Solr! Enterprise Search Solutions at your Fingertips!Apache Solr! Enterprise Search Solutions at your Fingertips!
Apache Solr! Enterprise Search Solutions at your Fingertips!
 
New-Age Search through Apache Solr
New-Age Search through Apache SolrNew-Age Search through Apache Solr
New-Age Search through Apache Solr
 
Solr Recipes Workshop
Solr Recipes WorkshopSolr Recipes Workshop
Solr Recipes Workshop
 
Enterprise Search Using Apache Solr
Enterprise Search Using Apache SolrEnterprise Search Using Apache Solr
Enterprise Search Using Apache Solr
 
Dev8d Apache Solr Tutorial
Dev8d Apache Solr TutorialDev8d Apache Solr Tutorial
Dev8d Apache Solr Tutorial
 

Viewers also liked

Warum Intranet-Suchmaschinen unbrauchbar sind ... und was dagegen getan werde...
Warum Intranet-Suchmaschinen unbrauchbar sind ... und was dagegen getan werde...Warum Intranet-Suchmaschinen unbrauchbar sind ... und was dagegen getan werde...
Warum Intranet-Suchmaschinen unbrauchbar sind ... und was dagegen getan werde...
Netcetera
 
Intro to Apache Solr for Drupal
Intro to Apache Solr for DrupalIntro to Apache Solr for Drupal
Intro to Apache Solr for Drupal
Chris Caple
 
Reflected intelligence evolving self-learning data systems
Reflected intelligence  evolving self-learning data systemsReflected intelligence  evolving self-learning data systems
Reflected intelligence evolving self-learning data systems
Trey Grainger
 
South Big Data Hub: Text Data Analysis Panel
South Big Data Hub: Text Data Analysis PanelSouth Big Data Hub: Text Data Analysis Panel
South Big Data Hub: Text Data Analysis Panel
Trey Grainger
 
The Semantic Knowledge Graph
The Semantic Knowledge GraphThe Semantic Knowledge Graph
The Semantic Knowledge Graph
Trey Grainger
 
Reflected Intelligence: Lucene/Solr as a self-learning data system
Reflected Intelligence: Lucene/Solr as a self-learning data systemReflected Intelligence: Lucene/Solr as a self-learning data system
Reflected Intelligence: Lucene/Solr as a self-learning data system
Trey Grainger
 
The Apache Solr Smart Data Ecosystem
The Apache Solr Smart Data EcosystemThe Apache Solr Smart Data Ecosystem
The Apache Solr Smart Data Ecosystem
Trey Grainger
 
Searching on Intent: Knowledge Graphs, Personalization, and Contextual Disamb...
Searching on Intent: Knowledge Graphs, Personalization, and Contextual Disamb...Searching on Intent: Knowledge Graphs, Personalization, and Contextual Disamb...
Searching on Intent: Knowledge Graphs, Personalization, and Contextual Disamb...
Trey Grainger
 
Apache NiFi- MiNiFi meetup Slides
Apache NiFi- MiNiFi meetup SlidesApache NiFi- MiNiFi meetup Slides
Apache NiFi- MiNiFi meetup Slides
Isheeta Sanghi
 
Get the most out of Solr search with PHP
Get the most out of Solr search with PHPGet the most out of Solr search with PHP
Get the most out of Solr search with PHP
Paul Borgermans
 

Viewers also liked (10)

Warum Intranet-Suchmaschinen unbrauchbar sind ... und was dagegen getan werde...
Warum Intranet-Suchmaschinen unbrauchbar sind ... und was dagegen getan werde...Warum Intranet-Suchmaschinen unbrauchbar sind ... und was dagegen getan werde...
Warum Intranet-Suchmaschinen unbrauchbar sind ... und was dagegen getan werde...
 
Intro to Apache Solr for Drupal
Intro to Apache Solr for DrupalIntro to Apache Solr for Drupal
Intro to Apache Solr for Drupal
 
Reflected intelligence evolving self-learning data systems
Reflected intelligence  evolving self-learning data systemsReflected intelligence  evolving self-learning data systems
Reflected intelligence evolving self-learning data systems
 
South Big Data Hub: Text Data Analysis Panel
South Big Data Hub: Text Data Analysis PanelSouth Big Data Hub: Text Data Analysis Panel
South Big Data Hub: Text Data Analysis Panel
 
The Semantic Knowledge Graph
The Semantic Knowledge GraphThe Semantic Knowledge Graph
The Semantic Knowledge Graph
 
Reflected Intelligence: Lucene/Solr as a self-learning data system
Reflected Intelligence: Lucene/Solr as a self-learning data systemReflected Intelligence: Lucene/Solr as a self-learning data system
Reflected Intelligence: Lucene/Solr as a self-learning data system
 
The Apache Solr Smart Data Ecosystem
The Apache Solr Smart Data EcosystemThe Apache Solr Smart Data Ecosystem
The Apache Solr Smart Data Ecosystem
 
Searching on Intent: Knowledge Graphs, Personalization, and Contextual Disamb...
Searching on Intent: Knowledge Graphs, Personalization, and Contextual Disamb...Searching on Intent: Knowledge Graphs, Personalization, and Contextual Disamb...
Searching on Intent: Knowledge Graphs, Personalization, and Contextual Disamb...
 
Apache NiFi- MiNiFi meetup Slides
Apache NiFi- MiNiFi meetup SlidesApache NiFi- MiNiFi meetup Slides
Apache NiFi- MiNiFi meetup Slides
 
Get the most out of Solr search with PHP
Get the most out of Solr search with PHPGet the most out of Solr search with PHP
Get the most out of Solr search with PHP
 

Similar to Apache Solr - An Experience Report

Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
javier ramirez
 
High Performance Mysql
High Performance MysqlHigh Performance Mysql
High Performance Mysql
liufabin 66688
 
UnConference for Georgia Southern Computer Science March 31, 2015
UnConference for Georgia Southern Computer Science March 31, 2015UnConference for Georgia Southern Computer Science March 31, 2015
UnConference for Georgia Southern Computer Science March 31, 2015
Christopher Curtin
 
Sizing your alfresco platform
Sizing your alfresco platformSizing your alfresco platform
Sizing your alfresco platform
Luis Cabaceira
 
What are you waiting for
What are you waiting forWhat are you waiting for
What are you waiting for
Jason Strate
 
Black Friday and Cyber Monday- Best Practices for Your E-Commerce Database
Black Friday and Cyber Monday- Best Practices for Your E-Commerce DatabaseBlack Friday and Cyber Monday- Best Practices for Your E-Commerce Database
Black Friday and Cyber Monday- Best Practices for Your E-Commerce Database
Tim Vaillancourt
 
Understanding AWS Database Options (DAT201) | AWS re:Invent 2013
Understanding AWS Database Options (DAT201) | AWS re:Invent 2013Understanding AWS Database Options (DAT201) | AWS re:Invent 2013
Understanding AWS Database Options (DAT201) | AWS re:Invent 2013
Amazon Web Services
 
Application Performance Troubleshooting 1x1 - Part 2 - Noch mehr Schweine und...
Application Performance Troubleshooting 1x1 - Part 2 - Noch mehr Schweine und...Application Performance Troubleshooting 1x1 - Part 2 - Noch mehr Schweine und...
Application Performance Troubleshooting 1x1 - Part 2 - Noch mehr Schweine und...
rschuppe
 
Redis and Bloom Filters - Atlanta Java Users Group 9/2014
Redis and Bloom Filters - Atlanta Java Users Group 9/2014Redis and Bloom Filters - Atlanta Java Users Group 9/2014
Redis and Bloom Filters - Atlanta Java Users Group 9/2014
Christopher Curtin
 
Amazon Redshift Deep Dive
Amazon Redshift Deep Dive Amazon Redshift Deep Dive
Amazon Redshift Deep Dive
Amazon Web Services
 
Collaborate 2019 - How to Understand an AWR Report
Collaborate 2019 - How to Understand an AWR ReportCollaborate 2019 - How to Understand an AWR Report
Collaborate 2019 - How to Understand an AWR Report
Alfredo Krieg
 
Sumo Logic QuickStart Webinar - Jan 2016
Sumo Logic QuickStart Webinar - Jan 2016Sumo Logic QuickStart Webinar - Jan 2016
Sumo Logic QuickStart Webinar - Jan 2016
Sumo Logic
 
Exploring Oracle Database Performance Tuning Best Practices for DBAs and Deve...
Exploring Oracle Database Performance Tuning Best Practices for DBAs and Deve...Exploring Oracle Database Performance Tuning Best Practices for DBAs and Deve...
Exploring Oracle Database Performance Tuning Best Practices for DBAs and Deve...
Aaron Shilo
 
DockerCon Europe 2018 Monitoring & Logging Workshop
DockerCon Europe 2018 Monitoring & Logging WorkshopDockerCon Europe 2018 Monitoring & Logging Workshop
DockerCon Europe 2018 Monitoring & Logging Workshop
Brian Christner
 
Search on the fly: how to lighten your Big Data - Simona Russo, Auro Rolle - ...
Search on the fly: how to lighten your Big Data - Simona Russo, Auro Rolle - ...Search on the fly: how to lighten your Big Data - Simona Russo, Auro Rolle - ...
Search on the fly: how to lighten your Big Data - Simona Russo, Auro Rolle - ...
Codemotion
 
Illuminate - Performance Analystics driven by Machine Learning
Illuminate - Performance Analystics driven by Machine LearningIlluminate - Performance Analystics driven by Machine Learning
Illuminate - Performance Analystics driven by Machine Learning
jClarity
 
Observability in real time at scale
Observability in real time at scaleObservability in real time at scale
Observability in real time at scale
Balvinder Hira
 
שבוע אורקל 2016
שבוע אורקל 2016שבוע אורקל 2016
שבוע אורקל 2016
Aaron Shilo
 
Drupal Performance : DrupalCamp North
Drupal Performance : DrupalCamp NorthDrupal Performance : DrupalCamp North
Drupal Performance : DrupalCamp North
Philip Norton
 
Why databases cry at night
Why databases cry at nightWhy databases cry at night
Why databases cry at night
Michael Yarichuk
 

Similar to Apache Solr - An Experience Report (20)

Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
 
High Performance Mysql
High Performance MysqlHigh Performance Mysql
High Performance Mysql
 
UnConference for Georgia Southern Computer Science March 31, 2015
UnConference for Georgia Southern Computer Science March 31, 2015UnConference for Georgia Southern Computer Science March 31, 2015
UnConference for Georgia Southern Computer Science March 31, 2015
 
Sizing your alfresco platform
Sizing your alfresco platformSizing your alfresco platform
Sizing your alfresco platform
 
What are you waiting for
What are you waiting forWhat are you waiting for
What are you waiting for
 
Black Friday and Cyber Monday- Best Practices for Your E-Commerce Database
Black Friday and Cyber Monday- Best Practices for Your E-Commerce DatabaseBlack Friday and Cyber Monday- Best Practices for Your E-Commerce Database
Black Friday and Cyber Monday- Best Practices for Your E-Commerce Database
 
Understanding AWS Database Options (DAT201) | AWS re:Invent 2013
Understanding AWS Database Options (DAT201) | AWS re:Invent 2013Understanding AWS Database Options (DAT201) | AWS re:Invent 2013
Understanding AWS Database Options (DAT201) | AWS re:Invent 2013
 
Application Performance Troubleshooting 1x1 - Part 2 - Noch mehr Schweine und...
Application Performance Troubleshooting 1x1 - Part 2 - Noch mehr Schweine und...Application Performance Troubleshooting 1x1 - Part 2 - Noch mehr Schweine und...
Application Performance Troubleshooting 1x1 - Part 2 - Noch mehr Schweine und...
 
Redis and Bloom Filters - Atlanta Java Users Group 9/2014
Redis and Bloom Filters - Atlanta Java Users Group 9/2014Redis and Bloom Filters - Atlanta Java Users Group 9/2014
Redis and Bloom Filters - Atlanta Java Users Group 9/2014
 
Amazon Redshift Deep Dive
Amazon Redshift Deep Dive Amazon Redshift Deep Dive
Amazon Redshift Deep Dive
 
Collaborate 2019 - How to Understand an AWR Report
Collaborate 2019 - How to Understand an AWR ReportCollaborate 2019 - How to Understand an AWR Report
Collaborate 2019 - How to Understand an AWR Report
 
Sumo Logic QuickStart Webinar - Jan 2016
Sumo Logic QuickStart Webinar - Jan 2016Sumo Logic QuickStart Webinar - Jan 2016
Sumo Logic QuickStart Webinar - Jan 2016
 
Exploring Oracle Database Performance Tuning Best Practices for DBAs and Deve...
Exploring Oracle Database Performance Tuning Best Practices for DBAs and Deve...Exploring Oracle Database Performance Tuning Best Practices for DBAs and Deve...
Exploring Oracle Database Performance Tuning Best Practices for DBAs and Deve...
 
DockerCon Europe 2018 Monitoring & Logging Workshop
DockerCon Europe 2018 Monitoring & Logging WorkshopDockerCon Europe 2018 Monitoring & Logging Workshop
DockerCon Europe 2018 Monitoring & Logging Workshop
 
Search on the fly: how to lighten your Big Data - Simona Russo, Auro Rolle - ...
Search on the fly: how to lighten your Big Data - Simona Russo, Auro Rolle - ...Search on the fly: how to lighten your Big Data - Simona Russo, Auro Rolle - ...
Search on the fly: how to lighten your Big Data - Simona Russo, Auro Rolle - ...
 
Illuminate - Performance Analystics driven by Machine Learning
Illuminate - Performance Analystics driven by Machine LearningIlluminate - Performance Analystics driven by Machine Learning
Illuminate - Performance Analystics driven by Machine Learning
 
Observability in real time at scale
Observability in real time at scaleObservability in real time at scale
Observability in real time at scale
 
שבוע אורקל 2016
שבוע אורקל 2016שבוע אורקל 2016
שבוע אורקל 2016
 
Drupal Performance : DrupalCamp North
Drupal Performance : DrupalCamp NorthDrupal Performance : DrupalCamp North
Drupal Performance : DrupalCamp North
 
Why databases cry at night
Why databases cry at nightWhy databases cry at night
Why databases cry at night
 

More from Netcetera

Payment trend scouting - Kurt Schmid, Netcetera
Payment trend scouting - Kurt Schmid, NetceteraPayment trend scouting - Kurt Schmid, Netcetera
Payment trend scouting - Kurt Schmid, Netcetera
Netcetera
 
Boost your approved transaction volume - Ana Vuksanovikj Vaneska, Netcetera
Boost your approved transaction volume - Ana Vuksanovikj Vaneska, NetceteraBoost your approved transaction volume - Ana Vuksanovikj Vaneska, Netcetera
Boost your approved transaction volume - Ana Vuksanovikj Vaneska, Netcetera
Netcetera
 
Increase conversion, convenience and security in e-commerce checkouts - Silke...
Increase conversion, convenience and security in e-commerce checkouts - Silke...Increase conversion, convenience and security in e-commerce checkouts - Silke...
Increase conversion, convenience and security in e-commerce checkouts - Silke...
Netcetera
 
3-D Secure 2.0 - Stephan Rüdisüli, Netcetera & Patrick Juffern, INFORM
3-D Secure 2.0 - Stephan Rüdisüli, Netcetera & Patrick Juffern, INFORM3-D Secure 2.0 - Stephan Rüdisüli, Netcetera & Patrick Juffern, INFORM
3-D Secure 2.0 - Stephan Rüdisüli, Netcetera & Patrick Juffern, INFORM
Netcetera
 
Digital Payment in 2020 - Kurt Schmid, Netcetera
Digital Payment in 2020 - Kurt Schmid, NetceteraDigital Payment in 2020 - Kurt Schmid, Netcetera
Digital Payment in 2020 - Kurt Schmid, Netcetera
Netcetera
 
AI First. Erfolgsfaktoren für künstliche Intelligenz im Unternehmen
AI First. Erfolgsfaktoren für künstliche Intelligenz im UnternehmenAI First. Erfolgsfaktoren für künstliche Intelligenz im Unternehmen
AI First. Erfolgsfaktoren für künstliche Intelligenz im Unternehmen
Netcetera
 
Augmenting Maintenance
Augmenting MaintenanceAugmenting Maintenance
Augmenting Maintenance
Netcetera
 
Front-end up front
Front-end up frontFront-end up front
Front-end up front
Netcetera
 
The future of Prototpying
The future of PrototpyingThe future of Prototpying
The future of Prototpying
Netcetera
 
EMV Secure Remote Commerce (SRC)
EMV Secure Remote Commerce (SRC)EMV Secure Remote Commerce (SRC)
EMV Secure Remote Commerce (SRC)
Netcetera
 
Online shopping technology in the fast lane?
Online shopping technology in the fast lane?Online shopping technology in the fast lane?
Online shopping technology in the fast lane?
Netcetera
 
Merchant tokenization and EMV® Secure Remote Commerce
Merchant tokenization and EMV® Secure Remote CommerceMerchant tokenization and EMV® Secure Remote Commerce
Merchant tokenization and EMV® Secure Remote Commerce
Netcetera
 
Seamless 3-D Secure e-commerce experience
Seamless 3-D Secure e-commerce experienceSeamless 3-D Secure e-commerce experience
Seamless 3-D Secure e-commerce experience
Netcetera
 
Augmenting Health Care
Augmenting Health CareAugmenting Health Care
Augmenting Health Care
Netcetera
 
Driving transactional growth with 3-D Secure
Driving transactional growth with 3-D SecureDriving transactional growth with 3-D Secure
Driving transactional growth with 3-D Secure
Netcetera
 
Digital Payment Quo Vadis
Digital Payment Quo VadisDigital Payment Quo Vadis
Digital Payment Quo Vadis
Netcetera
 
EMV® Secure Remote Commerce
EMV® Secure Remote CommerceEMV® Secure Remote Commerce
EMV® Secure Remote Commerce
Netcetera
 
Context: The missing ingredient in multilingual software translation
Context: The missing ingredient in multilingual software translationContext: The missing ingredient in multilingual software translation
Context: The missing ingredient in multilingual software translation
Netcetera
 
Digital Payments - Netcetera Innovation Summit 2018
Digital Payments - Netcetera Innovation Summit 2018Digital Payments - Netcetera Innovation Summit 2018
Digital Payments - Netcetera Innovation Summit 2018
Netcetera
 
"Whats up and new at Netcetera?" - Netcetera Innovation Summit 2018
"Whats up and new at Netcetera?" - Netcetera Innovation Summit 2018"Whats up and new at Netcetera?" - Netcetera Innovation Summit 2018
"Whats up and new at Netcetera?" - Netcetera Innovation Summit 2018
Netcetera
 

More from Netcetera (20)

Payment trend scouting - Kurt Schmid, Netcetera
Payment trend scouting - Kurt Schmid, NetceteraPayment trend scouting - Kurt Schmid, Netcetera
Payment trend scouting - Kurt Schmid, Netcetera
 
Boost your approved transaction volume - Ana Vuksanovikj Vaneska, Netcetera
Boost your approved transaction volume - Ana Vuksanovikj Vaneska, NetceteraBoost your approved transaction volume - Ana Vuksanovikj Vaneska, Netcetera
Boost your approved transaction volume - Ana Vuksanovikj Vaneska, Netcetera
 
Increase conversion, convenience and security in e-commerce checkouts - Silke...
Increase conversion, convenience and security in e-commerce checkouts - Silke...Increase conversion, convenience and security in e-commerce checkouts - Silke...
Increase conversion, convenience and security in e-commerce checkouts - Silke...
 
3-D Secure 2.0 - Stephan Rüdisüli, Netcetera & Patrick Juffern, INFORM
3-D Secure 2.0 - Stephan Rüdisüli, Netcetera & Patrick Juffern, INFORM3-D Secure 2.0 - Stephan Rüdisüli, Netcetera & Patrick Juffern, INFORM
3-D Secure 2.0 - Stephan Rüdisüli, Netcetera & Patrick Juffern, INFORM
 
Digital Payment in 2020 - Kurt Schmid, Netcetera
Digital Payment in 2020 - Kurt Schmid, NetceteraDigital Payment in 2020 - Kurt Schmid, Netcetera
Digital Payment in 2020 - Kurt Schmid, Netcetera
 
AI First. Erfolgsfaktoren für künstliche Intelligenz im Unternehmen
AI First. Erfolgsfaktoren für künstliche Intelligenz im UnternehmenAI First. Erfolgsfaktoren für künstliche Intelligenz im Unternehmen
AI First. Erfolgsfaktoren für künstliche Intelligenz im Unternehmen
 
Augmenting Maintenance
Augmenting MaintenanceAugmenting Maintenance
Augmenting Maintenance
 
Front-end up front
Front-end up frontFront-end up front
Front-end up front
 
The future of Prototpying
The future of PrototpyingThe future of Prototpying
The future of Prototpying
 
EMV Secure Remote Commerce (SRC)
EMV Secure Remote Commerce (SRC)EMV Secure Remote Commerce (SRC)
EMV Secure Remote Commerce (SRC)
 
Online shopping technology in the fast lane?
Online shopping technology in the fast lane?Online shopping technology in the fast lane?
Online shopping technology in the fast lane?
 
Merchant tokenization and EMV® Secure Remote Commerce
Merchant tokenization and EMV® Secure Remote CommerceMerchant tokenization and EMV® Secure Remote Commerce
Merchant tokenization and EMV® Secure Remote Commerce
 
Seamless 3-D Secure e-commerce experience
Seamless 3-D Secure e-commerce experienceSeamless 3-D Secure e-commerce experience
Seamless 3-D Secure e-commerce experience
 
Augmenting Health Care
Augmenting Health CareAugmenting Health Care
Augmenting Health Care
 
Driving transactional growth with 3-D Secure
Driving transactional growth with 3-D SecureDriving transactional growth with 3-D Secure
Driving transactional growth with 3-D Secure
 
Digital Payment Quo Vadis
Digital Payment Quo VadisDigital Payment Quo Vadis
Digital Payment Quo Vadis
 
EMV® Secure Remote Commerce
EMV® Secure Remote CommerceEMV® Secure Remote Commerce
EMV® Secure Remote Commerce
 
Context: The missing ingredient in multilingual software translation
Context: The missing ingredient in multilingual software translationContext: The missing ingredient in multilingual software translation
Context: The missing ingredient in multilingual software translation
 
Digital Payments - Netcetera Innovation Summit 2018
Digital Payments - Netcetera Innovation Summit 2018Digital Payments - Netcetera Innovation Summit 2018
Digital Payments - Netcetera Innovation Summit 2018
 
"Whats up and new at Netcetera?" - Netcetera Innovation Summit 2018
"Whats up and new at Netcetera?" - Netcetera Innovation Summit 2018"Whats up and new at Netcetera?" - Netcetera Innovation Summit 2018
"Whats up and new at Netcetera?" - Netcetera Innovation Summit 2018
 

Recently uploaded

[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
Jason Yip
 
Digital Banking in the Cloud: How Citizens Bank Unlocked Their Mainframe
Digital Banking in the Cloud: How Citizens Bank Unlocked Their MainframeDigital Banking in the Cloud: How Citizens Bank Unlocked Their Mainframe
Digital Banking in the Cloud: How Citizens Bank Unlocked Their Mainframe
Precisely
 
Mutation Testing for Task-Oriented Chatbots
Mutation Testing for Task-Oriented ChatbotsMutation Testing for Task-Oriented Chatbots
Mutation Testing for Task-Oriented Chatbots
Pablo Gómez Abajo
 
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk
"Frontline Battles with DDoS: Best practices and Lessons Learned",  Igor Ivaniuk"Frontline Battles with DDoS: Best practices and Lessons Learned",  Igor Ivaniuk
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk
Fwdays
 
Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving | Nameplate Manufacturing Process - 2024Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving
 
Apps Break Data
Apps Break DataApps Break Data
Apps Break Data
Ivo Velitchkov
 
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectorsConnector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
DianaGray10
 
Nordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptxNordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptx
MichaelKnudsen27
 
Dandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity serverDandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity server
Antonios Katsarakis
 
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Safe Software
 
Y-Combinator seed pitch deck template PP
Y-Combinator seed pitch deck template PPY-Combinator seed pitch deck template PP
Y-Combinator seed pitch deck template PP
c5vrf27qcz
 
"Choosing proper type of scaling", Olena Syrota
"Choosing proper type of scaling", Olena Syrota"Choosing proper type of scaling", Olena Syrota
"Choosing proper type of scaling", Olena Syrota
Fwdays
 
Generating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and MilvusGenerating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and Milvus
Zilliz
 
Principle of conventional tomography-Bibash Shahi ppt..pptx
Principle of conventional tomography-Bibash Shahi ppt..pptxPrinciple of conventional tomography-Bibash Shahi ppt..pptx
Principle of conventional tomography-Bibash Shahi ppt..pptx
BibashShahi
 
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
saastr
 
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
Edge AI and Vision Alliance
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
Zilliz
 
AppSec PNW: Android and iOS Application Security with MobSF
AppSec PNW: Android and iOS Application Security with MobSFAppSec PNW: Android and iOS Application Security with MobSF
AppSec PNW: Android and iOS Application Security with MobSF
Ajin Abraham
 
GraphRAG for LifeSciences Hands-On with the Clinical Knowledge Graph
GraphRAG for LifeSciences Hands-On with the Clinical Knowledge GraphGraphRAG for LifeSciences Hands-On with the Clinical Knowledge Graph
GraphRAG for LifeSciences Hands-On with the Clinical Knowledge Graph
Neo4j
 
The Microsoft 365 Migration Tutorial For Beginner.pptx
The Microsoft 365 Migration Tutorial For Beginner.pptxThe Microsoft 365 Migration Tutorial For Beginner.pptx
The Microsoft 365 Migration Tutorial For Beginner.pptx
operationspcvita
 

Recently uploaded (20)

[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
 
Digital Banking in the Cloud: How Citizens Bank Unlocked Their Mainframe
Digital Banking in the Cloud: How Citizens Bank Unlocked Their MainframeDigital Banking in the Cloud: How Citizens Bank Unlocked Their Mainframe
Digital Banking in the Cloud: How Citizens Bank Unlocked Their Mainframe
 
Mutation Testing for Task-Oriented Chatbots
Mutation Testing for Task-Oriented ChatbotsMutation Testing for Task-Oriented Chatbots
Mutation Testing for Task-Oriented Chatbots
 
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk
"Frontline Battles with DDoS: Best practices and Lessons Learned",  Igor Ivaniuk"Frontline Battles with DDoS: Best practices and Lessons Learned",  Igor Ivaniuk
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk
 
Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving | Nameplate Manufacturing Process - 2024Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving | Nameplate Manufacturing Process - 2024
 
Apps Break Data
Apps Break DataApps Break Data
Apps Break Data
 
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectorsConnector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
 
Nordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptxNordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptx
 
Dandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity serverDandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity server
 
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
 
Y-Combinator seed pitch deck template PP
Y-Combinator seed pitch deck template PPY-Combinator seed pitch deck template PP
Y-Combinator seed pitch deck template PP
 
"Choosing proper type of scaling", Olena Syrota
"Choosing proper type of scaling", Olena Syrota"Choosing proper type of scaling", Olena Syrota
"Choosing proper type of scaling", Olena Syrota
 
Generating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and MilvusGenerating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and Milvus
 
Principle of conventional tomography-Bibash Shahi ppt..pptx
Principle of conventional tomography-Bibash Shahi ppt..pptxPrinciple of conventional tomography-Bibash Shahi ppt..pptx
Principle of conventional tomography-Bibash Shahi ppt..pptx
 
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
 
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
 
AppSec PNW: Android and iOS Application Security with MobSF
AppSec PNW: Android and iOS Application Security with MobSFAppSec PNW: Android and iOS Application Security with MobSF
AppSec PNW: Android and iOS Application Security with MobSF
 
GraphRAG for LifeSciences Hands-On with the Clinical Knowledge Graph
GraphRAG for LifeSciences Hands-On with the Clinical Knowledge GraphGraphRAG for LifeSciences Hands-On with the Clinical Knowledge Graph
GraphRAG for LifeSciences Hands-On with the Clinical Knowledge Graph
 
The Microsoft 365 Migration Tutorial For Beginner.pptx
The Microsoft 365 Migration Tutorial For Beginner.pptxThe Microsoft 365 Migration Tutorial For Beginner.pptx
The Microsoft 365 Migration Tutorial For Beginner.pptx
 

Apache Solr - An Experience Report

  • 1. Apache Solr An experience report 2013-10-23 - Corsin Decurtins
  • 2.
  • 3. Apache Solr Full-Text Search Engine Apache Lucene Project based on Apache Lucene Apache Solr: http://lucene.apache.org/solr/ Notes Fast Proven and Well-Known Technology Java based Open APIs Customizable Clustering Features
  • 5.
  • 6. Plaza Search Full-Text Search Engine for the Intranet of Netcetera Integrates Various Data Sources Needs to be fast Ranking is crucial Simple Searching Relevant Filtering Options Desktop, Tables and Phones Notes
  • 7. Warum Intranet-Suchmaschinen unbrauchbar sind …und was dagegen getan werden kann 2013-07-03 – Corsin Decurtins http://www.slideshare.net/netceteragroup/20130703-intranet-searchintranetkonferenz
  • 8.
  • 9. Some Numbers Live since Data since 1996 05/2012 ~ 275 Users ~ 3'000'000 Documents ~ 500 – 2'000 Searches per day Index Size ~ 75 GByte ~ 40 Releases
  • 10. Some Numbers Notes Very small load (only a few hundred requests per day) The indexer agents actually produce a lot more load than the actual end users Medium size index (at least I think) Not that many objects, but relatively big documents Load performance is not a big topic for us When we talk about performance, we actually usually mean response time
  • 12. File System Plaza Search UI Wiki Indexer Plaza Search Indexer Rest API Apache Solr Indexer Email Archive Apache Tika Issue System Index CRM
  • 13. Architecture Based on Apache Solr (and other components) Apache Solr takes care of the text-search aspect We certainly do not want to build this ourselves Apache Tika is used for analyzing (file) contents Also here, we certainly do not want to build this ourselves Notes
  • 14.
  • 15. Magic
  • 16.
  • 17. Apache Solr Notes Apache Solr is a very complex system with a lot of knobs and dials Most things just seem like magic at the beginning … or they just do not work Apache Solr is extremely powerful with a lot of features You have to know how to configure the features Most features need a bit more configuration than just a check box for activating it Configuration options seem very confusing at the beginning You do not need to understand everything from the start Defaults are relatively sensible and the example applications are good starting point
  • 19. Development Process In our experience, Apache Solr works best with a very iterative process Definition of Done is very difficult to specify for search use cases Iterate through: - Researching - Thinking / Designing - Implementation / Configuration / Testing - Observing / Analyzing / Debugging Notes
  • 22. Solr Admin Interface Notes Apache Solr has a pretty good admin interface Very helpful for analysis and (manual) monitoring If you are not familiar with the Solr Admin interface, you should be Other tools like profilers, memory analyzers, monitoring tools etc. are also useful
  • 23. Our Requirements Correctness Results that match query Relevance Results that matter Speed "Instant" results Intelligence Do you know what I mean?
  • 24. Intelligence Do you know what I mean? synonyms.txt stopwords.txt protwords.txt
  • 25. Solr Configuration Files Notes Solr has pretty much out-of-the-box support for stop words, protected works and synonyms These features look very simple, but they are very powerful Unless you have a very general search use case, the defaults are not enough Definitely worth developing a configuration specific to your domain Iterate; consider these features for ranking optimizations as well
  • 26. Relevance Results that matter score match boosting term frequency inverse document frequency field weights boosting function index time boosting elevation
  • 27. Ranking in Solr (simplified) Notes Solr determines a score for the results of a query Score can be used for sorting the results Score is the product of different factors: A query-specific part, let's call it the match value that is computed using term frequency (tf) inverse document frequency (idf) There are also other parameters that can influence it (term weights, field weights, …) The match basically says how well a document matches the query
  • 28. Ranking in Solr (simplified) Notes A generic part (not query specific), let's call this a boosting value Basically represents the general importance that you assign to a document Includes a ranking function, e.g. based on the age of the document Includes a boosting value, that is determined at index time (index-time boosting) We calculate the boost value based on different attributes of the document, such as type of resource (people are more important than files) status of the project that a document is associated with (closed projects are less important than running projects) archive flag (archived resources are less important) …
  • 31. Regression Ranking Testing assertRank("jira", "url", "https://extranet.netcetera.biz/jira/", 1); assertRank("jira", "url", "https://plaza.netcetera.com/.../themas/JIRA", 2);
  • 32. Regression Testing for the Ranking Ranking is influenced by various factors We have continuously executed tests for the ranking Find ranking regressions as soon as possible Tests are executed every night, not just with code changes Notes
  • 35. Situation Search is often extremely slow, response times of 20-30s Situation improves without any intervention Problem shows up again very soon Other applications in the same Tomcat server are brought to a grinding halt No releases within the last 7 days No significant data changes in the last 7 days 2-3 weeks earlier a new data sources have been added Index had grown by a factor of 2, but everything worked fine since then Notes
  • 36. Disk Usage (fake diagram) 100 80 60 40 20 0
  • 37. Lucene Index – Disk Usage Notes Index needs optimzation from time to time when you update it continuously Index optimzation uses a lot of resources, i.e. CPU, memory and disk space Optimzation requires twice the disk space than the optimal index When your normal index uses > 50% of the available disk space, it's already too late It's difficult to get out of this situation (without adding disk space) Deleting stuff from the index does not help, as you need an optimization
  • 38. Lessons Learned We need least 2-3 times as much space as the "ideal" index needs If your index has grown beyond 50%, it's already too late. Disk Usage Monitoring has to be improved Some problems take a long time to show themselves Testing long-term effects and continuous delivery clash to some extent
  • 40. Situation Search is always extremely slow, response times of 20-30s Other applications in the same Tomcat server show normal performance No releases within the last few days No significant data changes in the last few days Notes
  • 41. Memory Usage (fake diagram) 12 10 8 6 4 2 0
  • 42. I/O Caching OS uses "free" memory for caching I/O caching has a HUGE impact on I/O heavy applications Solr (actually Lucene) is a I/O heavy application Notes
  • 43. Lessons Learned Free memory != unused memory Increasing the heap size can slow down Solr OS does a better job at caching Solr data than Solr
  • 44. War Story #3: Know Your Maths
  • 45. Situation Search starts up very fine and is reasonably fast Out Of Memory Errors after a couple of hours Restart brings everything back to normal Out Of Memory Errors come back after a certain time (no obvious pattern) Notes
  • 46. Analysis Analysis of the memory usage using heap dumps Solr Caches use up a lot of memory (not surprisingly) Document cache with up to 2048 entries Entries are dominated by the content field Content is limited to 50 KByte by the indexers (or so I thought) Content abbreviation had a bug Instead of the 50KByte limit of the indexer, the 2 MByte limit of Solr was used 2048 * 2 MByte = 4GByte for the document cache Heap size at that time = 4GByte Notes
  • 47. Lessons Learned Heap dumps are your friends Study your heap from time to time, even if you do not have a problem (yet) Test your limiters
  • 49. Situation Search has become slower and slower We added a lot of data, so that's not really surprising Analysis into different tuning parameters Analysis into the cost of different features Notes
  • 50. Highlighting 70% of the response time
  • 51. Lessons Learned Some features are cool, but also very expensive Think about what you need to index and what you need to store Consider loading stuff "offline" and asynchronously Consider loading stuff from other data sources
  • 52. A few words on Scaling
  • 53. Solr Cloud – Horizontal and Vertical Scaling Support for Replication and Sharding Added with Apache Solr 4 Based on Apache Zookeeper Replication Fault tolerance, failover Handling huge amounts of traffic Sharding Dealing with huge amounts of data
  • 55. Geographical Replication Notes Load is not an issue for us, but response time is We have multiple geographically distribute sites Network latency is a big factor of the response time if you are at a "far away" location We have been thinking of setting up replicas of the search engine at the different locations
  • 57. Relevance-Aware Sharding Notes Normal sharding distributes data on different, but equal nodes We have been thinking about creating deliberately different nodes for the distribution of the data: Node 1 - extremely fast - small index, i.e. small amount of data - lots of memory, CPU, really fast disks Node 2 - lots more data - big, but slower disks - less memory and CPU Frontend would send queries to both nodes and show results as they come in. Distribution of the data would be based on the (query independent) boosting value
  • 64. Get your hands dirty Ranking Optimizations
  • 65. Continuous Testing and Monitoring for Ranking and Performance Issues
  • 66. Verification of features can take a long time
  • 69. References Apache Solr http://lucene.apache.org/solr/ Apache Solr Wiki http://wiki.apache.org/solr/ Apache Solr on Stack Overflow http://stackoverflow.com/questions/tagged/solr Notes