Searching does not mean finding Stuff - Apache Solr for TYPO3


  1. http://www.dkd.de (Friday, 10 June 2011)
  2. dkd: development, kommunikation, design
  3. Welcome: Olivier Dobberkau, CEO, dkd Internet Service GmbH, Frankfurt am Main, Germany
  4. Agenda: What is search?; Search in TYPO3; Search expectations today; Apache Solr; Why and how?; Watch out!
  5. About me
  6. Olivier Dobberkau: Founder of dkd Internet Service GmbH; aka "the reverend never-end"; Met TYPO3 with version 3.2 beta 3; Member of T3A BCC; 43 years old; olivier.dobberkau@dkd.de; Twitter: @T3RevNeverEnd
  7. What is Search?
  8. Definition of Information Retrieval: "Information retrieval (IR) is the area of study concerned with searching for documents, for information within documents, and for metadata about documents, as well as that of searching relational databases and the World Wide Web." Source: Wikipedia, http://en.wikipedia.org/wiki/Information_retrieval
  9. Factors in Information Retrieval: Recall; Precision; Fall-out; Scalability; Performance
  10. Factors in Information Retrieval: Recall; Precision; Fall-out; Scalability; Performance; Simplicity; Flexibility
  11. Recall: percentage of the relevant documents that are returned. Example: 400 relevant documents exist, 100 of them are returned: 25% recall
  12. Precision: percentage of the returned documents that are relevant. Example: 500 returned, 100 relevant: 20% precision
  13. Best would be: 100% recall with 100% precision
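
The two measures from the recall and precision slides can be written down as a minimal sketch. It assumes the recall slide means 400 relevant documents of which 100 are returned; the function and parameter names are illustrative and not taken from the talk.

```python
def recall(relevant_returned: int, relevant_total: int) -> float:
    """Share of all relevant documents that the search returned."""
    if relevant_total == 0:
        raise ValueError("recall is undefined when no relevant documents exist")
    return relevant_returned / relevant_total


def precision(relevant_returned: int, returned_total: int) -> float:
    """Share of the returned documents that are relevant."""
    if returned_total == 0:
        raise ValueError("precision is undefined when nothing was returned")
    return relevant_returned / returned_total


# Numbers from the slides (the reading of the recall slide is an assumption)
print(f"recall:    {recall(100, 400):.0%}")     # recall:    25%
print(f"precision: {precision(100, 500):.0%}")  # precision: 20%
```

Both functions share the same numerator; only the denominator differs. Returning more documents tends to raise recall and lower precision, which is why 100% of both is the ideal.
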
  14. Index: The purpose of storing an index is to optimize speed and performance in finding relevant documents for a search query.
  15. Index (diagram): Documents 1 to 5 feed their words into the index. Words shown: Extbase, TYPO3, San, Baseball, My, is, Francisco, is, cat, T3CON, my, is, a, rocks, Fort, cool, Ghetto, Mason, Sport
  16. Posting File (word: documents): My: 1, 2; cat: 1; is: 1, 2, 5; cool: 1; Baseball: 2; Sport: 2; San: 3
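
A posting file like the one on the slide can be built with a few lines of Python. This is a minimal sketch, not how Lucene or Solr store their index. The three sample sentences are guesses assembled from the words on the diagram slide; documents 4 and 5 are left out because their text cannot be recovered, so "is" lists only documents 1 and 2 here.

```python
from collections import defaultdict


def build_posting_file(documents: dict[int, str]) -> dict[str, list[int]]:
    """Map each lower-cased word to the sorted ids of the documents containing it."""
    postings = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            postings[word].add(doc_id)
    return {word: sorted(ids) for word, ids in postings.items()}


def search(postings: dict[str, list[int]], query: str) -> list[int]:
    """Return ids of documents that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return []
    result = set(postings.get(words[0], []))
    for word in words[1:]:
        result &= set(postings.get(word, []))
    return sorted(result)


# Assumed sample documents (hypothetical reconstruction of the slide)
documents = {
    1: "My cat is cool",
    2: "Baseball is my Sport",
    3: "San Francisco",
}
postings = build_posting_file(documents)
print(postings["my"])             # [1, 2]
print(search(postings, "my is"))  # [1, 2]
```

A lookup touches only the entries for the query words instead of scanning every document, which is the speed gain the index slide describes.
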
  17. Search in TYPO3
  18. Indexed Search: Available since TYPO3 version 3.5; Indexing happens through the frontend; Searches pages and some file types; Works with languages and access rights
  19. Indexed Search: Index in database; Problems with large websites; Slow; No sorting; No templating; OK for small websites
  20. Search Expectations
  21. Expectation vs. Experience: Users expect a "Google-like" interface and behaviour in search; No one navigates through an online shop; Up to 30% of users use the search instead of going through text or navigation; Search is mediocre on a lot of websites; Slow and incomplete; Lots of improvement possible
  22. Apache Solr: Enterprise Search Server
  23. Apache Solr: Apache Software Foundation; Enterprise Search Server; Uses the Lucene index; Lots of great features; CNet, Netflix, Zappos.com and many more
  24. Solr Key Features: Synonyms; Stopwords; Boosting / Weighting; Faceting; Paid Content / Elevation
  25. Solr Key Features: Synonyms; Stopwords; Boosting / Weighting; Faceting; Paid Content / Elevation; Spellchecking / Did you mean?
  26. Solr Key Features: Synonyms; Stopwords; Boosting / Weighting; Faceting; Paid Content / Elevation; Spellchecking / Did you mean?; Speed
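
Of the key features listed, faceting is perhaps the least self-explanatory: the search returns, next to the hits, a count of hits per value of a field so users can narrow the results. The sketch below shows the idea in plain Python on an in-memory result list. It is not Solr code, and the field name `type` and the sample results are hypothetical.

```python
from collections import Counter


def facet_counts(results: list[dict], field: str) -> list[tuple[str, int]]:
    """Count how many results carry each value of `field`, most frequent first."""
    return Counter(doc[field] for doc in results if field in doc).most_common()


# Hypothetical search results
results = [
    {"title": "Home", "type": "page"},
    {"title": "Release notes", "type": "news"},
    {"title": "Contact", "type": "page"},
]
print(facet_counts(results, "type"))  # [('page', 2), ('news', 1)]
```
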
  27. How does it work?: REST-like interface; Indexing with POST; Search with GET; Results in XML, JSON, PHP and many more; Libraries for many programming languages; SolrPhpClient
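
The "POST to index, GET to search" pattern can be sketched with Python's standard library. The example only builds the requests and does not send them, so it runs without a Solr server. The base URL `http://localhost:8983/solr`, the handler paths `/select` and `/update`, and the field names are assumptions about a default single-core setup, not details from the talk; the talk's own integration uses SolrPhpClient.

```python
from urllib.parse import urlencode
from urllib.request import Request
from xml.sax.saxutils import escape

SOLR_BASE = "http://localhost:8983/solr"  # assumed local default, adjust as needed


def build_search_request(query: str, rows: int = 10) -> Request:
    """Searching: a GET request with the query in the URL, results as JSON."""
    params = urlencode({"q": query, "wt": "json", "rows": rows})
    return Request(f"{SOLR_BASE}/select?{params}", method="GET")


def build_index_request(doc: dict) -> Request:
    """Indexing: a POST request carrying the document as an XML <add> message."""
    fields = "".join(
        f'<field name="{escape(name, {chr(34): "&quot;"})}">{escape(str(value))}</field>'
        for name, value in doc.items()
    )
    body = f"<add><doc>{fields}</doc></add>".encode("utf-8")
    return Request(
        f"{SOLR_BASE}/update?commit=true",
        data=body,
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


search_request = build_search_request("typo3 solr")
index_request = build_index_request({"id": 1, "title": "A & B"})
print(search_request.get_method(), search_request.full_url)
print(index_request.get_method(), index_request.data.decode("utf-8"))
```

To actually talk to a server, each request object could be passed to `urllib.request.urlopen`.
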
  28. Why and how?
  29. Scratching our Itch: Why?; Indexed Search was too slow; It misses a lot of today's requirements
  30. History: Prototype in summer 2008; Kick-off February 2009; "Acts like Indexed Search"; Early Access Program; T3CON September 2009; Version 1.0
  31. Components: Indexing; Search; Flexible Templating; Analysis and Statistics; Administration
  32. Challenges: Page rendering in TYPO3; Access rights; File indexing; Easy setup for non-Java people; Integrating Solr in general
  33. Solutions: Record Monitor and Indexing Queue; Solr Query Parser Plugin; Integration of Apache Tika; Fully automated bash install script; SolrPhpClient
  34. Features: Faceted Search; File Indexing; Multi-language Support; Did you mean
  35. Features: Search Word Highlighting; Autocomplete / Suggestions; Access Rights Support; More to come
  36. Watch out!
  37. "I do not have any solution. I admire the problem." Ashleigh Brilliant, cartoonist and author.
  38. Common Problems: Relevancy Perception Trap. Assumption: search should display a certain result, like an employee name. Query: Mike Miller. Results: Mill (100% relevancy), Miller (75% relevancy). Possible issue: stemming on proper names. Solution: don't stem fields with names.
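
The stemming trap can be reproduced with a toy analyzer. The suffix-stripping rule below is deliberately naive and stands in for a real stemmer; it is not the algorithm Solr uses, and the function names are illustrative. It shows why a stemmed name field treats "Mill" and "Miller" as the same term, while an unstemmed field keeps them apart.

```python
def naive_stem(token: str) -> str:
    """Strip a few common English suffixes, keeping at least three characters."""
    for suffix in ("ing", "ers", "er", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def analyze(text: str, stem: bool) -> list[str]:
    """Lower-case and split the text; optionally stem every token."""
    tokens = text.lower().split()
    return [naive_stem(token) for token in tokens] if stem else tokens


# With stemming, the query term "miller" collapses to "mill" ...
print(analyze("Mike Miller", stem=True))   # ['mike', 'mill']
print(analyze("Mill", stem=True))          # ['mill']
# ... without stemming, the name stays distinct from "mill".
print(analyze("Mike Miller", stem=False))  # ['mike', 'miller']
```

Once both names reduce to the same term, a record for "Mill" can match the query exactly and rank above "Miller". Keeping name fields out of the stemming step avoids this.
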
  39. Common Problems: Finding Corpses in your Corpus. While searching you find "interesting" results; You have forgotten to hide content; You have not set the "no search" flag; You have made copies of records and forgotten them
  40. Common Problems: Data updates without using the TCE Main. You wonder why your new records of table XY do not show up; You have updated the tables with e.g. phpMyAdmin; You might have forgotten to add the language id in the records
  41. Common Problems: Can't access the Solr server. You cannot access the Solr server on another machine; Possible solution
  42. Common Problems: Help, my index gets deleted. Symptom: your index is empty. Possible cause: your Solr server is not secured
  43. Common Problems: My news are not being indexed. News that you have in a sysfolder are not showing up in your results; The folder is not in the rootline of the website; Configure the PID of the sysfolder correctly
  44. Questions?
  45. dkd: development, kommunikation, design. Thank you.