Apache Solr 101
Killing the Vampires of Search
Cluj, 2013
Olivier Dobberkau
Apache Solr for TYPO3 CMS 101


The TYPO3 extension EXT:solr adds a fast, precise, and extensible modern search to the TYPO3 CMS. This presentation covers the current development status of the extension and its add-ons, gives an overview of common indexing strategies, and offers insights into best practices for your implementation.

Published in: Technology
Apache Solr for TYPO3 CMS 101

  1. Apache Solr 101
     Killing the Vampires of Search
     Cluj, 2013
     Olivier Dobberkau
  2. Some Vampirology first
     ● Nosferatu
     ● Dracula
     ● van Helsing
     ● Selene
     ● Edward & Bella
     http://en.wikipedia.org/wiki/Vampire_film
  3. Agenda
     ● About me
     ● History of EXT:solr
     ● Current status
     ● Solr Basics
     ● Caveats
     ● Books & Documents
  4. About me
     Olivier Dobberkau
     CEO of dkd Internet Service GmbH
     Research and Development
     over 10 years of TYPO3 CMS
     Member of the T3A EAB
     olivier.dobberkau@dkd.de
     Twitter: T3RevNeverEnd
  5. Scratching ... the TYPO3 CMS search itch
  6. History of EXT:solr
     We all know when a solution fails ...
  7. History of EXT:solr
     ● Indexed Search gave us some pain
     ● First prototype 2009
     ● What you get in one or two days of work
     ● Started funding of development
     ● Over 70 sponsors
     ● It's possible to offer services around it
     ● Support and consulting available
  8. Current Status
     Version 2.8.2 was released November 2012
     Introduced the Add-ons for additional features
     Supported TYPO3 CMS versions 4.5, 4.6 & 4.7
     Supported Solr Server 3.6.2
     (Time flies when you are having fun!)
  9. The last TER Release
     TER: 2.8.3
     Introduces support for TYPO3 CMS versions 4.5 - 6.1
     Loads of bug fixes
     Maintenance release
  10. Next Major Version
      EXT:solr 3.x will be the next version
      Release will hopefully be soon(tm)
      Will have no new features on the TYPO3 side
      Support for TYPO3 CMS 4.5 - 6.1
      Adds Apache Solr 4.4 as the server
  11. Roadmap for EXT:solr 4.x
      ● Backend parts of the EXT all in Extbase
      ● Templates go FLUID
      ● Frontend goes Extbase
      ● 4.x will be 6.2 only!
      ● Effort estimated 2 to 4 man months
  12. The EXT:solr ecosystem
      The base is EXT:solr
      Features are added through Add-ons
      ● EXT:solrfile (File-Indexing for CMS 4.5 - 4.7)
      ● EXT:solrdam (File-Indexing with DAM)
      ● EXT:solrfal (File-Indexing for CMS 6.1 & 6.2)
      ● EXT:solrmlt (More like this)
      ● EXT:solrgrouping
      ● EXT:tika (Extracting Service)
  13. EXT:solr
      So what does it do?
      ● Indexing
      ● Querying
      ● Results Listing
      ● Logging / Analysis
  14. Indexing
      ● Indexing of pages
      ● Indexing of TCA records
      ● Indexing of files (Add-on)
      ● Index Queue
        ○ List of all items to be indexed
        ○ Every time an item is touched or changed, an update is sent to the Solr server
        ○ No need for a crawler / instant results
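
The Index Queue idea from slide 14 can be sketched in a few lines of Python. This is only an illustration of the concept (one queue entry per changed record, processed in batches), not the actual EXT:solr implementation. The class name `IndexQueue`, its methods, and the `send_to_solr` callback are hypothetical; `pages` and `tt_news` are used only as example table names.

```python
from collections import OrderedDict


class IndexQueue:
    """Toy index queue: remembers which records changed and need re-indexing."""

    def __init__(self):
        # Maps (table, uid) -> latest change timestamp; one entry per record.
        self._pending = OrderedDict()

    def mark_changed(self, table, uid, changed_at):
        """Call whenever a record is created or edited."""
        key = (table, uid)
        # Re-queueing an already queued record refreshes its timestamp
        # and moves it to the back of the queue instead of duplicating it.
        self._pending.pop(key, None)
        self._pending[key] = changed_at

    def process(self, send_to_solr, limit=50):
        """Send up to `limit` queued records to the search server, oldest first."""
        sent = []
        while self._pending and len(sent) < limit:
            key, _ = self._pending.popitem(last=False)
            send_to_solr(*key)  # hypothetical callback that performs the update
            sent.append(key)
        return sent


queue = IndexQueue()
queue.mark_changed("pages", 12, changed_at=100)
queue.mark_changed("tt_news", 7, changed_at=101)
queue.mark_changed("pages", 12, changed_at=102)  # edited again, still one entry

indexed = []
processed = queue.process(lambda table, uid: indexed.append((table, uid)))
```

Because only changed records enter the queue, no crawler has to revisit the whole site to find out what is new.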
  15. Indexing
      ● Indexing is very easy and can be achieved through simple TypoScript configuration
      ● Additionally, you can use Apache Nutch to index non-TYPO3 websites
      ● Support for more than 30 languages
  16. Querying
      ● Easy to set up
      ● Apply the Lucene query language if you want to search for specific items (e.g. only news)
      ● You can tell Solr to boost results if query terms are in the fields you are searching
      ● Use elevation to rank terms
      ● Correct stemming available
      ● Range queries (intelligent dates)
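
As a rough illustration of the query features on slide 16, the following Python sketch assembles request parameters for a Solr search: a filter for one record type, field boosts, and a date range. The parameters `q`, `fq`, `qf`, `defType`, and `rows` are standard Solr request parameters; the field names `type`, `title`, `content`, and `created` and the function `build_query_params` are assumptions made for this example and depend on your schema.

```python
from urllib.parse import parse_qs, urlencode


def build_query_params(user_input):
    """Return Solr request parameters for a boosted, filtered search."""
    return {
        "q": user_input,
        # The extended DisMax parser supports per-field boosts via `qf`.
        "defType": "edismax",
        # A match in the title counts five times as much as one in the content.
        "qf": "title^5 content",
        # Filter queries restrict the result set without affecting the score:
        # only news records, created within the last year (Solr date math).
        "fq": ["type:tt_news", "created:[NOW-1YEAR TO NOW]"],
        "rows": 10,
    }


params = build_query_params("vampire")
# doseq=True emits one `fq=` pair per list element.
query_string = urlencode(params, doseq=True)
# Decode again to check that nothing was lost in the encoding.
round_trip = parse_qs(query_string)
```

The resulting `query_string` could be appended to the select URL of a Solr core; the exact URL depends on the installation and is left out here.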
  17. Results Listing
      ● Results can be fully individualized
        ○ Templates for different result types
      ● Sorting of the results list
        ○ Relevance
        ○ Date
        ○ Title
        ○ Any other field
      ● Can be toggled
  18. Results Listing
      ● Facets
        ○ Filter the results based on attributes
        ○ Hierarchical facets
      ● Suggestions / Autocomplete
      ● Stopwords
      ● Protected words
      ● Did you mean?
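
To make the facet idea from slide 18 concrete, here is a small Python sketch that counts attribute values in a result set and then filters by one of them. Solr computes facets on the server (for example via the `facet=true` and `facet.field` request parameters); this toy version only shows what the counts mean. The documents, field names, and helper functions are made up for the example.

```python
from collections import Counter

# Hypothetical search results; in practice these come back from the server.
documents = [
    {"title": "Nosferatu", "type": "pages"},
    {"title": "Dracula", "type": "pages"},
    {"title": "van Helsing", "type": "tt_news"},
    {"title": "Selene", "type": "tt_news"},
    {"title": "Edward & Bella", "type": "tt_news"},
]


def facet_counts(docs, field):
    """Count how many documents carry each value of `field`."""
    return Counter(doc[field] for doc in docs)


def apply_filter(docs, field, value):
    """Keep only documents whose `field` equals the selected facet value."""
    return [doc for doc in docs if doc[field] == value]


counts = facet_counts(documents, "type")
news_only = apply_filter(documents, "type", "tt_news")
```

A facet list in the frontend would show each value with its count and apply the filter when the visitor clicks one.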
  19. Logging / Analysis
      ● Built-in query logging
      ● Can be used with your favorite analytics suite
      ● Feature-rich analysis & debugging options
  20. Caveats
      ● Junk in / Junk out
      ● Get your data right
      ● A String is not Text
        ○ Be aware of the difference between Strings and Text
        ○ Protect proper names from stemming
        ○ Example
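
The difference between a String and a Text field on slide 20 can be sketched in Python as follows. A string-like field only matches the exact stored value, while a text-like field is analyzed (lowercased, tokenized, stemmed) before matching. The stemmer below is deliberately crude and is not what Solr uses; `PROTECTED_WORDS` and all function names are hypothetical and exist only for this illustration.

```python
# Words that must never be stemmed, e.g. proper names (hypothetical list).
PROTECTED_WORDS = {"helsing"}


def naive_stem(token):
    """Very crude suffix stripping, for illustration only."""
    if token in PROTECTED_WORDS:
        return token
    for suffix in ("ing", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def analyze_text(value):
    """Text-like field: lowercase, split on whitespace, stem each token."""
    return [naive_stem(token) for token in value.lower().split()]


def matches_string_field(stored, query):
    """String-like field: only the exact, unmodified value matches."""
    return stored == query


def matches_text_field(stored, query):
    """Text-like field: every analyzed query term must occur in the value."""
    stored_terms = set(analyze_text(stored))
    return all(term in stored_terms for term in analyze_text(query))


title = "Killing the Vampires of Search"
string_hit = matches_string_field(title, "vampire")
text_hit = matches_text_field(title, "vampire")
name_terms = analyze_text("van Helsing")
```

Without the protected-words entry, the toy stemmer would cut "Helsing" down to "hels", which is the kind of damage to proper names the slide warns about.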
  21. Caveats
      ● Synonyms are nice, but don't abuse them
      ● Don't confuse Solr with a database
        ○ %WORD% does not work
      ● Search with “WORD” if you want your query to remain untouched
      ● * works only at the end of a word
        ○ cat* will find catapult, cats, catastrophe etc.
        ○ *cat will yield no results
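
One way to see why a trailing wildcard such as cat* is a natural fit for a search index, while %WORD%-style matching is not, is to look at a sorted list of terms. The Python sketch below is a simplification and does not reproduce how Lucene stores its term dictionary; the term list and function names are made up for the example.

```python
from bisect import bisect_left

# A search index keeps its terms in sorted order (simplified here to a list).
terms = sorted(["bobcat", "cat", "catapult", "catastrophe", "cats", "dog", "tomcat"])


def prefix_search(sorted_terms, prefix):
    """Trailing wildcard (prefix*): jump to the first candidate, then scan."""
    start = bisect_left(sorted_terms, prefix)
    matches = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix):
            break  # sorted order guarantees no later term can match
        matches.append(term)
    return matches


def suffix_search(sorted_terms, suffix):
    """Leading wildcard (*suffix): sorting does not help, every term is checked."""
    return [term for term in sorted_terms if term.endswith(suffix)]


trailing = prefix_search(terms, "cat")
leading = suffix_search(terms, "cat")
```

The prefix lookup touches only the matching neighbourhood of the list, whereas the suffix lookup has to walk through every term, which is why a database-style contains-search does not translate directly.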
  22. Caveats
      ● Beware of indexing time
        ○ Pages index slower than TCA records
        ○ Files might be too big for initial settings
  23. Some web resources
      ● You will find a lot of information about the Apache Solr extension at www.typo3-solr.com
      ● http://forge.typo3.org/projects/show/extension-solr
      ● Mailing List / Newsgroup / Forums
      ● Afraid of Solr? Try www.hosted-solr.com
  24. Books & Documentation
      ● Taming Text
      ● Apache Solr Cookbook
      ● Administering Solr
      ● Apache Solr 4.x
      ● Wiki of Apache Solr: https://cwiki.apache.org/confluence/display/solr/Apache+Solr+Reference+Guide
  25. Merci! Thank you!