KohaCon12
Adding browse to Koha
     using Solr
        <http://tinyurl.com/solr-browse>

             Stefano Bargioni
Pontifical University Santa Croce – Rome
             bargioni@pusc.it
The PUSC Library
●   160,000 volumes
       –   147,000 bibs
       –   111,300 auth
●   Aleph 300; Amicus 3.5
●   Koha 3.2.7 from May 1st, 2011
●   PUSC belongs to the URBE Network
       –   17 academic libraries
       –   2 of them using Koha
                     Adding browse to Koha using Solr   2
Why do we need browse at PUSC?
●   Aleph 300 and Amicus offered it
●   Our users and cataloguers frequently used it
●   We have a lot of ancient authors, Popes, …,
    requiring “seen from”, “see also”
●   We started to add subjects to our bibliographic
    records



How do you say it?
●   Alighieri, Dante or
●   Dante Alighieri or
●   Allighieri, Dante ?

●   Ratzinger, Joseph, 1927- or
●   Benedictus PP. XVI, 1927- or
●   Papi (2005- : Benedictus XVI) ?

We have to help users and cataloguers to use the
correct form.
Grouping
●   Uniform Titles
●   Dewey
●   Series, ...




Browse Functionalities
●   Headings from authority as well as
    bibliographic records
●   Starting from
●   Previous Headings, Next Headings
●   Number of documents
●   Related headings (see, see also, seen from)
●   Go to authority record, if any
●   Additional Links
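
The "starting from", "previous headings" and "next headings" functions can be pictured as a window over a sorted list of sort forms. The sketch below is only an in-memory illustration in Python (the talk's implementation queries Solr from Perl); the function name `browse`, its parameters and the sample headings are assumptions made for the example.

```python
from bisect import bisect_left


def browse(sorted_forms, start, before=2, after=3):
    """Return (previous, following) headings around the first entry >= start.

    sorted_forms must already be sorted; 'before' and 'after' are
    hypothetical page-size parameters, not taken from the slides.
    """
    pos = bisect_left(sorted_forms, start)
    previous = sorted_forms[max(0, pos - before):pos]
    following = sorted_forms[pos:pos + after]
    return previous, following


# Illustrative sample data only (not taken from the PUSC catalogue).
headings = sorted([
    "alighieri.dante",
    "augustinus.aurelius",
    "benedictus.pp.xvi.1927",
    "ratzinger.joseph.1927",
    "thomas.de.aquino",
])

prev, nxt = browse(headings, "b", before=1, after=2)
print(prev)
print(nxt)
```

Typing "b" as the starting point lands on the first heading that sorts at or after it; the previous page is simply the slice just before that position.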
Browse Requirements
●   Indexes fed by headings coming from
        –   more than one auth tag
        –   more than one bib tag
●   Sort form for Latin-1 (non-Latin scripts?)
●   Consider non-filing characters
●   Synchronize frequently
●   Integrated into the Koha OPAC
●   MARC flavour independence
The engine
●   Why Solr?
       –   Schema flexibility
       –   Facets
       –   High performance in update and query
       –   Better than MySQL
       –   Will be part of Koha, maybe replacing
            Zebra


The architecture
(Diagram)
●   Web browser ↔ Perl CGI ↔ Solr db
●   Koha SQL tables → loader.pl (cron job) → Solr db

The Solr Document (1)
Field name                                    Value

id                                            unique identifier

authid | sysno                                int

au | tl | se ...                              string (display form)

sortform_au | sortform_tl | sortform_se...    string

timestamp                                     ISO 8601

type                                          acc | see | also ...




The Solr Document (2)

Field name                        Example auth
id                                au_a_1234_100_0
authid                            1234
au                                Alighieri, Dante
sortform_au                       alighieri.dante
timestamp                         2012-05-23T19:10:54Z
type                              acc




The Solr Document (3)

Field name         Example bib
id                 tl_b_5678_245_0
sysno              5678
tl                 Gesù Cristo secondo la dottrina di S. Tommaso
                   d'Aquino
sortform_tl        gesu.cristo.secondo.la.dottrina.di.s.tommaso.d.aquino
timestamp          2012-05-23T18:15:44Z
type               acc




The Solr Document (4)
Id structure
    –   List name                                      au | tl | se ...
    –   Source                                         a|b
    –   Source authid or sysno                         nnn
    –   Tag                                            ttt
    –   Occurrence #                                   n     0 based
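
The id structure above can be sketched as a pair of helpers that build and split an identifier. This is a Python illustration, not the talk's Perl code; the function names `make_id` and `parse_id` are hypothetical, and the underscore separator is inferred from the two example ids shown on the earlier slides.

```python
def make_id(list_name, source, record_number, tag, occurrence):
    """Build an id such as 'au_a_1234_100_0'.

    source is 'a' (authority record) or 'b' (bibliographic record);
    occurrence is the 0-based occurrence of the tag in the record.
    """
    if source not in ("a", "b"):
        raise ValueError("source must be 'a' or 'b'")
    return f"{list_name}_{source}_{record_number}_{tag}_{occurrence}"


def parse_id(doc_id):
    """Split an id back into its five components."""
    list_name, source, record_number, tag, occurrence = doc_id.split("_")
    return {
        "list": list_name,
        "source": source,
        "record_number": int(record_number),
        "tag": tag,
        "occurrence": int(occurrence),
    }


print(make_id("au", "a", 1234, "100", 0))
print(parse_id("tl_b_5678_245_0"))
```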




The Solr Document (5)
The sort form:
   –   Diacritics to simple letter (àÀ to aA, ...)
            ●   use Text::Unidecode;
   –   Lowercase
   –   Strip out non-filing characters (titles)
   –   Replace non a-z0-9 with dot
Used for facets
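
A minimal sketch of the sort-form steps listed above, written in Python rather than the Perl used in the talk. It replaces Text::Unidecode with Unicode decomposition from the standard library, which only handles letters that decompose into a base letter plus accents. The function name `sort_form` and the `nonfiling` parameter are hypothetical; collapsing runs of punctuation into one dot and trimming leading or trailing dots are assumptions inferred from the two examples on the earlier slides.

```python
import re
import unicodedata


def sort_form(heading, nonfiling=0):
    """Normalise a heading into a sort form such as 'alighieri.dante'."""
    # Drop non-filing characters first, so the count refers to the
    # heading exactly as it is stored in the record.
    text = heading[nonfiling:]
    # Diacritics to simple letters: decompose, then drop combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    plain = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Lowercase.
    lowered = plain.lower()
    # Replace every run of characters outside a-z0-9 with a single dot.
    dotted = re.sub(r"[^a-z0-9]+", ".", lowered)
    return dotted.strip(".")


print(sort_form("Alighieri, Dante"))
print(sort_form("The Lord of the rings", nonfiling=4))
```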


Loading & Synchronizing (1)
●   The same cron based Perl script loads the Solr db for
    the first time and updates it
         –   use C4::Context;
         –   use C4::AuthoritiesMarc;
         –   use WebService::Solr;
●   A commit is issued after every 2,000 modified docs
●   An optimize is issued after every 5 commits
●   38 minutes to load 662,400 headings
●   Configured through an XML file
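
The commit and optimize thresholds can be sketched as a small batching wrapper. This is a Python illustration under stated assumptions, not the talk's Perl loader: `BatchLoader` and `RecordingClient` are hypothetical names, and the client is assumed to expose `add`, `commit` and `optimize` methods.

```python
class BatchLoader:
    """Commit every N added docs and optimize every M commits."""

    def __init__(self, client, docs_per_commit=2000, commits_per_optimize=5):
        self.client = client
        self.docs_per_commit = docs_per_commit
        self.commits_per_optimize = commits_per_optimize
        self.pending = 0   # docs added since the last commit
        self.commits = 0   # commits issued so far

    def add(self, doc):
        self.client.add(doc)
        self.pending += 1
        if self.pending >= self.docs_per_commit:
            self.flush()

    def flush(self):
        """Commit pending docs, if any; optimize on every M-th commit."""
        if self.pending == 0:
            return
        self.client.commit()
        self.pending = 0
        self.commits += 1
        if self.commits % self.commits_per_optimize == 0:
            self.client.optimize()


class RecordingClient:
    """Stand-in for a Solr client; it only records the calls it receives."""

    def __init__(self):
        self.calls = []

    def add(self, doc):
        self.calls.append("add")

    def commit(self):
        self.calls.append("commit")

    def optimize(self):
        self.calls.append("optimize")


client = RecordingClient()
loader = BatchLoader(client, docs_per_commit=2, commits_per_optimize=2)
for number in range(5):
    loader.add({"id": f"au_a_{number}_100_0"})
loader.flush()  # commit the final partial batch
print(client.calls)
```

The small thresholds in the usage lines are only there to keep the printed call list short; the slide's values are the defaults.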
Loading & Synchronizing (2)
●   The XML config file (XML::Simple → YAML?):
         –   Two main sections: auth and bib
         –   each section lists tags that feed indexes

<tag>
    <code>400</code>
    <list>au</list>
    <type>see</type>
    <subfields>*</subfields>
    <suffix>.</suffix>
</tag>
<tag>
    ...
</tag>

<tag>
    <code>130</code>
    <list>tl</list>
    <type>acc</type>
    <subfields>*</subfields>
    <skip_indicator>2</skip_indicator>
</tag>
<tag>
    ...
</tag>
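
Reading such a file could look like the sketch below, given here in Python with the standard library instead of the XML::Simple module named on the slide. The wrapper elements `<config>`, `<auth>` and `<bib>`, the placement of both sample tags under `<auth>`, and the function name `load_tags` are all assumptions; the slide only shows the `<tag>` entries themselves.

```python
import xml.etree.ElementTree as ET

# Hypothetical file layout: only the <tag> blocks come from the slide.
SAMPLE = """
<config>
  <auth>
    <tag>
      <code>400</code>
      <list>au</list>
      <type>see</type>
      <subfields>*</subfields>
      <suffix>.</suffix>
    </tag>
    <tag>
      <code>130</code>
      <list>tl</list>
      <type>acc</type>
      <subfields>*</subfields>
      <skip_indicator>2</skip_indicator>
    </tag>
  </auth>
  <bib>
  </bib>
</config>
"""


def load_tags(xml_text, section):
    """Return one dict per <tag> entry found in the given section."""
    root = ET.fromstring(xml_text)
    node = root.find(section)
    if node is None:
        return []
    return [
        {child.tag: (child.text or "").strip() for child in tag}
        for tag in node.findall("tag")
    ]


for entry in load_tags(SAMPLE, "auth"):
    print(entry["code"], "feeds list", entry["list"], "as type", entry["type"])
```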


Loading & Synchronizing (3)
●   Special records in Solr, type:system
        –   Created if they do not exist, otherwise incremented
        –   A usage counter for each index
        –   Last update timestamps
●   Search for new, modified or deleted records
        –   MySQL tables auth_header, biblioitems,
             deletedbiblioitems, deleted_auth_header
        –   Modified AuthoritiesMarc.pm to fill
             deleted_auth_header for auth deletion
●   Cron once a minute, using a lock file
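
Running from cron once a minute needs a guard so that a slow run is not overlapped by the next one. The sketch below shows one common lock-file pattern in Python; it is not taken from the talk's Perl script, and the names `acquire_lock`, `release_lock` and `run_once` are hypothetical. It relies on exclusive file creation and does not deal with stale locks left behind by a crashed run.

```python
import os


def acquire_lock(path):
    """Create the lock file atomically; return False if it already exists."""
    try:
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as handle:
        handle.write(str(os.getpid()))  # record the owner for debugging
    return True


def release_lock(path):
    """Remove the lock file so that the next cron run can proceed."""
    os.remove(path)


def run_once(path, job):
    """Run job() unless another run holds the lock; return True if it ran."""
    if not acquire_lock(path):
        return False
    try:
        job()
    finally:
        release_lock(path)
    return True
```

A cron entry would then call `run_once` with the lock path and the synchronization function, and simply exit when it returns False.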
Querying (1)
A new page in Koha: Browse list of indexes

(Screenshot annotation: a cookie stores the last used list and the results per page)




Querying (2)
(Screenshot annotations)
●   # of documents: C4::AuthoritiesMarc::CountUsage
●   Related headings
●   Search VIAF
●   Show Koha auth record


Querying (3)
   Titles list contains standard titles and series titles




       Multivolume work



Statistics


(Screenshot annotations)
●   Only for PUSC
●   Public
●   Staff only
●   Will be public
●   We started some weeks ago




Security
●   Solr db can be erased with a single http
    request
●   Many ways to add admin security
●   For instance, modify
        –   jetty.xml
        –   webdefault.xml
        –   realm.properties


License and portability
●   The same as Koha
●   Tested on Koha 3.2 and Koha 3.6
●   Needs work to be included in Koha
        –   I18N
        –   .tt instead of AJAX
        –   Branches?
        –   Integration with Koha system preferences
●   … Solr experts... (BibLibre?)
Thank you – Grazie!




