Apache Solr & TYPO3
        Ingo Renner   TYPO3 Core Developer,
                      Release Manager TYPO3 4.2
mail: ingo@typo3.org
twitter: @ingorenner
Indexed Search
 •   Indexing Frontend / Crawler
 •   Respects access rights
 •   Respects languages
 •   Index in Database
 •   Totally OK for smaller websites



Sloooooooooooooowww
Apache Solr
So what is Apache Solr?

•   Enterprise Search Server
•   Based on Lucene Index
•   Apache Software Foundation Project
•   Many powerful features


•   Used by CNet, Netflix, ilocal.nl, Zappos.com
Solr Concepts

•   Index = Collection of Documents
•   Document = Data stored in Fields
•   Field Type defines processing through
    Analyzers, Tokenizers, Filters
•   Dynamic Fields
•   Copy Fields

Flexibility
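
The concepts above can be illustrated with a small Python sketch. It is not Solr code: the tokenizer, the filters, the `*_t` / `*_s` dynamic-field patterns and the copy-field rule are simplified stand-ins, and every name in it (`analyze`, `field_type_for`, `apply_copy_fields`) is hypothetical.

```python
import fnmatch

STOPWORDS = {"the", "a", "of"}

def tokenize(text):
    """Tokenizer: split the raw field value on whitespace."""
    return text.split()

def lowercase_filter(tokens):
    """Filter: normalize every token to lower case."""
    return [t.lower() for t in tokens]

def stopword_filter(tokens):
    """Filter: drop tokens that carry little meaning."""
    return [t for t in tokens if t not in STOPWORDS]

# A field type is modeled as a tokenizer followed by a list of filters.
FIELD_TYPES = {
    "text": (tokenize, [lowercase_filter, stopword_filter]),
    "string": (lambda value: [value], []),  # kept verbatim as one token
}

# Dynamic fields: any field name matching the pattern gets that type.
DYNAMIC_FIELDS = [("*_t", "text"), ("*_s", "string")]

# Copy fields: the source value is also indexed under the destination field.
COPY_FIELDS = [("title_t", "content_t")]

def field_type_for(name):
    for pattern, type_name in DYNAMIC_FIELDS:
        if fnmatch.fnmatchcase(name, pattern):
            return type_name
    raise KeyError(f"no field type matches {name!r}")

def analyze(name, value):
    """Run a value through the tokenizer and filters of its field type."""
    tokenizer, filters = FIELD_TYPES[field_type_for(name)]
    tokens = tokenizer(value)
    for token_filter in filters:
        tokens = token_filter(tokens)
    return tokens

def apply_copy_fields(document):
    """Return field -> list of values, including copied values."""
    result = {name: [value] for name, value in document.items()}
    for source, dest in COPY_FIELDS:
        if source in document:
            result.setdefault(dest, []).append(document[source])
    return result

document = {"title_t": "The Power of Solr", "type_s": "Page"}
indexed = {
    name: [tok for value in values for tok in analyze(name, value)]
    for name, values in apply_copy_fields(document).items()
}
print(indexed)
```

The text field is split, lower-cased and stripped of stopwords, the string field is stored as a single unmodified token, and the title is additionally indexed under `content_t` through the copy rule.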
Why Apache Solr?
•   Speed: Many times faster than Indexed Search (IS)
•   Better search results
•   Faceted search
•   Spellchecker: Did you mean ... ?
•   Similarity search: More like this ...
•   Editorial Content / paid search results
•   Synonyms, Stopwords
•   Boosting of specific index fields
•   Replication, distributed search

Speed & Power
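
To make the faceted search bullet concrete, here is a toy Python sketch of what a facet is: a count of matching documents per field value, returned alongside the hits. It only illustrates the idea and says nothing about how Solr computes facets internally; the sample documents and the `search_with_facets` helper are made up for this example.

```python
from collections import Counter

# Hypothetical sample index; the fields are chosen for illustration only.
DOCUMENTS = [
    {"id": 1, "title": "Solr for TYPO3", "type": "page", "language": "en"},
    {"id": 2, "title": "Solr Setup", "type": "news", "language": "en"},
    {"id": 3, "title": "Solr Installation", "type": "page", "language": "de"},
    {"id": 4, "title": "Indexed Search", "type": "page", "language": "en"},
]

def search_with_facets(query, facet_fields):
    """Return matching documents plus per-field value counts (facets)."""
    hits = [d for d in DOCUMENTS if query.lower() in d["title"].lower()]
    facets = {field: Counter(d[field] for d in hits) for field in facet_fields}
    return hits, facets

hits, facets = search_with_facets("solr", ["type", "language"])
print([d["id"] for d in hits])
print(dict(facets["type"]))
print(dict(facets["language"]))
```

A search frontend would typically render each facet value with its count as a link that narrows the result list to that value.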
How it works

•   REST like interface
•   Indexing of XML Documents through
    HTTP POST
•   Querying through HTTP GET
•   Results as XML, JSON, PHP

Easy API
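
As a rough sketch of the two calls above, the following Python builds the XML body for an index update and the URL for a query, without sending either. The `<add><doc><field name="...">` message format and the `q` and `wt` parameters are Solr's; the base URL `http://localhost:8983/solr`, the field names and the helper names are assumptions for illustration.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

SOLR_URL = "http://localhost:8983/solr"  # assumed location of the Solr server

def build_add_xml(document):
    """Serialize a dict of field values as a Solr <add> update message."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, value in document.items():
        field = ET.SubElement(doc, "field", name=name)
        field.text = str(value)
    return ET.tostring(add, encoding="unicode")

def build_update_request(document):
    """Prepare (but do not send) the HTTP POST that indexes one document."""
    body = build_add_xml(document).encode("utf-8")
    return urllib.request.Request(
        SOLR_URL + "/update",
        data=body,
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )

def build_query_url(query, response_format="json"):
    """Build the HTTP GET URL for a search; wt selects the response format."""
    params = urllib.parse.urlencode({"q": query, "wt": response_format})
    return SOLR_URL + "/select?" + params

request = build_update_request({"id": "page-42", "title": "Hello Solr"})
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
print(build_query_url("title:hello"))
```

Passing the prepared request to `urllib.request.urlopen` would send it to a running server. Added documents become searchable only after a commit, which is a `<commit/>` message posted to the same update URL.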
Disadvantages


•   Needs Java
•   We don't want to deal with Java
•   Solr shields us from Java once set up

Developers stay with PHP
Advantages

   •   Many times faster than Indexed Search (IS)
   •   NO database queries
   •   Easy installation / Configuration
   •   Respects access rights
   •   Respects languages
   •   Customizability

Fast, Easy to use, Powerful
EXT:solr
Current Status
•   "Acts like Indexed Search"
•   Indexing through Frontend / Crawler
•   Search
•   Search Word Highlighting
•   Sorting
•   Spellchecker: Did you mean ... ?
•   Similarity Search: More like this ...
•   Faceted Search
•   Suggest / Autocompletion
Outlook
•   Backend Module
•   API, indexing through the backend (BE)
•   Related Searches
•   Recent Searches
•   Smart Reranking based on user behavior
•   Editorial Search Results
•   Editing of Stopwords, Synonyms
Development Model
•   Private financing of new features
•   Financing partners get
    early access and support
•   Minimum stake of 5 man-days
•   v2.0 end of Q2 next year
•   Development as a community
    project in parallel
Community Edition

•   Released v1.0 on TER
•   Project on TYPO3 Forge
•   Open Development
•   Only a few differences
    compared to "our" version
Showcases
Making the sun shine on your search
Requirements, Setup

•   Requires a Java servlet container (J2EE):
    Tomcat, Jetty, Resin, ...


•   Run setup scripts provided with EXT:solr
•   Copy provided configuration files to Solr
•   Set config.index_enable = 1 in TypoScript
Customization


•   Indexing of additional data through
    hooks, interfaces, TypoScript (TS) configuration
•   Individual index schema
•   Enable/disable features through TS
•   Individual, flexible rendering of results
More than Solr
Projects around Solr


•   Lucene - Search Index Library


•   Tika - Content Extraction from Files


•   Nutch - Crawl External Sites
Thanks for listening.

Apache Solr for TYPO3 at TYPO3 Usergroup Day Netherlands
