1. KohaCon12
Adding browse to Koha
using Solr
<http://tinyurl.com/solr-browse>
Stefano Bargioni
Pontifical University Santa Croce – Rome
bargioni@pusc.it
2. The PUSC Library
● 160,000 volumes
– 147,000 bibs
– 111,300 auth
● Aleph 300; Amicus 3.5
● Koha 3.2.7 from May 1st, 2011
● PUSC belongs to the URBE Network
– 17 academic libraries
– 2 of them using Koha
Adding browse to Koha using Solr 2
3. Why do we need browse at PUSC?
● Aleph 300 and Amicus offered it
● Our users and cataloguers frequently used it
● We have a lot of ancient authors, Popes, …,
requiring “seen from”, “see also”
● We started to add subjects to our bibliographic
records
4. How do you say it?
● Alighieri, Dante or
● Dante Alighieri or
● Allighieri, Dante ?
● Ratzinger, Joseph, 1927- or
● Benedictus PP. XVI, 1927- or
● Papi (2005- : Benedictus XVI) ?
We have to help users and cataloguers to use the
correct form.
5. Grouping
● Uniform Titles
● Dewey
● Series, ...
6. Browse Functionalities
● Headings from authority as well as
bibliographic records
● Starting from
● Previous Headings, Next Headings
● Number of documents
● Related headings (see, see also, seen from)
● Go to authority record, if any
● Additional Links
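The "starting from" and "previous/next headings" functions can be pictured as a window over a sorted list of sort forms. The following is a minimal Python sketch of that idea only; it works on an in-memory list, not on Solr, and the function name `browse` and the sample sort forms are hypothetical.

```python
from bisect import bisect_left


def browse(headings, start, before=2, after=3):
    """Return (previous, following) headings around the point where `start` sorts.

    `headings` must already be sorted in ascending order.
    """
    pos = bisect_left(headings, start)
    previous = headings[max(0, pos - before):pos]
    following = headings[pos:pos + after]
    return previous, following


# Hypothetical sort forms, kept in sorted order.
index = sorted([
    "alighieri.dante",
    "allighieri.dante",
    "benedictus.pp.xvi.1927",
    "dante.alighieri",
    "ratzinger.joseph.1927",
])

# "Starting from" the letter b.
prev_page, next_page = browse(index, "b")
```

"Previous Headings" then corresponds to moving the window back, and "Next Headings" to moving it forward.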
7. Browse Requirements
● Indexes fed by headings coming from
– more than one auth tag
– more than one bib tag
● Sort form for Latin-1 (non-latin scripts?)
● Consider non-filing characters
● Synchronize frequently
● Integrated in the Koha OPAC
● MARC flavour independence
8. The engine
● Why Solr?
– Schema flexibility
– Facets
– High performance in update and query
– Better than MySQL
– Will be part of Koha, maybe replacing
Zebra
9. The architecture
[Architecture diagram. Components: Web browser, Perl CGI, Solr db, Koha SQL tables, loader.pl (cron job)]
10. The Solr Document (1)
Field name                                    Value
id                                            unique identifier
authid | sysno                                int
au | tl | se ...                              string (display form)
sortform_au | sortform_tl | sortform_se ...   string
timestamp                                     ISO 8601
type                                          acc | see | also ...
11. The Solr Document (2)
Field name     Example (auth)
id             au_a_1234_100_0
authid         1234
au             Alighieri, Dante
sortform_au    alighieri.dante
timestamp      2012-05-23T19:10:54Z
type           acc
12. The Solr Document (3)
Field name     Example (bib)
id             tl_b_5678_245_0
sysno          5678
tl             Gesù Cristo secondo la dottrina di S. Tommaso d'Aquino
sortform_tl    gesu.cristo.secondo.la.dottrina.di.s.tommaso.d.aquino
timestamp      2012-05-23T18:15:44Z
type           acc
13. The Solr Document (4)
Id structure
– List name: au | tl | se ...
– Source: a | b
– Source authid or sysno: nnn
– Tag: ttt
– Occurrence #: n (0-based)
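As an illustration of the id layout, here is a small Python sketch that builds and parses ids such as au_a_1234_100_0. It assumes the five parts are joined with underscores, as in the examples on the previous slides, and that "a" and "b" stand for authority and bibliographic sources; the names `HeadingId`, `make_id` and `parse_id` are hypothetical and not taken from the original script.

```python
from typing import NamedTuple


class HeadingId(NamedTuple):
    list_name: str   # au | tl | se ...
    source: str      # "a" or "b" (assumed: authority / bibliographic)
    record_id: int   # authid or sysno
    tag: str         # MARC tag, e.g. "100"
    occurrence: int  # 0-based occurrence of the tag in the record


def make_id(heading: HeadingId) -> str:
    """Join the five parts with underscores."""
    return "_".join(str(part) for part in heading)


def parse_id(text: str) -> HeadingId:
    """Split an id back into its parts; raise ValueError if malformed."""
    parts = text.split("_")
    if len(parts) != 5:
        raise ValueError(f"expected 5 parts, got {len(parts)}: {text!r}")
    list_name, source, record_id, tag, occurrence = parts
    return HeadingId(list_name, source, int(record_id), tag, int(occurrence))
```

For example, `make_id(HeadingId("au", "a", 1234, "100", 0))` gives the id shown in the authority example.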
14. The Solr Document (5)
The sort form:
– Diacritics to simple letter (àÀ to aA, ...)
● use Text::Unidecode;
– Lowercase
– Strip out non-filing characters (titles)
– Replace non a-z0-9 with dot
Used for facets
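The sort form rules above can be sketched in Python as follows. This is not the original Perl code: `unicodedata` stands in for Text::Unidecode (it only strips combining marks and does not transliterate non-Latin scripts), and two details are assumptions inferred from the examples on the earlier slides: runs of non a-z0-9 characters collapse to a single dot, and leading or trailing dots are dropped. The `skip` parameter (hypothetical name) is the number of leading non-filing characters.

```python
import re
import unicodedata


def sort_form(heading: str, skip: int = 0) -> str:
    """Build a sort form from a display heading."""
    # Strip leading non-filing characters (for titles).
    text = heading[skip:]
    # Diacritics to simple letters: decompose, then drop combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    plain = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Lowercase.
    lowered = plain.lower()
    # Replace non a-z0-9 with dot (assumption: runs collapse to one dot).
    dotted = re.sub(r"[^a-z0-9]+", ".", lowered)
    # Assumption: no leading or trailing dot is kept.
    return dotted.strip(".")
```

With these assumptions, `sort_form("Alighieri, Dante")` yields `alighieri.dante`, and `sort_form("La Divina Commedia", skip=3)` yields `divina.commedia`.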
15. Loading & Synchronizing (1)
● The same cron based Perl script loads the Solr db for
the first time and updates it
– use C4::Context;
– use C4::AuthoritiesMarc;
– use WebService::Solr;
● 2000: number of docs modified before issuing a commit
● 5: number of commits before issuing an optimize
● 38 minutes to load 662,400 headings
● Configured through an xml file
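The commit and optimize thresholds can be illustrated with a short Python sketch. It is not the original script, which is written in Perl with WebService::Solr; `BatchingLoader` and `RecordingClient` are hypothetical names, and the client here only counts calls instead of talking to Solr.

```python
class RecordingClient:
    """Stand-in for a Solr client: it only counts the calls it receives."""

    def __init__(self):
        self.added = 0
        self.commits = 0
        self.optimizes = 0

    def add(self, doc):
        self.added += 1

    def commit(self):
        self.commits += 1

    def optimize(self):
        self.optimizes += 1


class BatchingLoader:
    """Commit every `docs_per_commit` docs, optimize every `commits_per_optimize` commits."""

    def __init__(self, client, docs_per_commit=2000, commits_per_optimize=5):
        self.client = client
        self.docs_per_commit = docs_per_commit
        self.commits_per_optimize = commits_per_optimize
        self.pending = 0
        self.commits = 0

    def add(self, doc):
        self.client.add(doc)
        self.pending += 1
        if self.pending >= self.docs_per_commit:
            self.commit()

    def commit(self):
        if self.pending == 0:
            return  # nothing to flush
        self.client.commit()
        self.pending = 0
        self.commits += 1
        if self.commits % self.commits_per_optimize == 0:
            self.client.optimize()


client = RecordingClient()
loader = BatchingLoader(client)
for n in range(10000):
    loader.add({"id": f"au_a_{n}_100_0"})
loader.commit()  # flush any remainder at the end of the run
```

In this run, 10,000 documents trigger five commits and one optimize; the final `commit()` call finds nothing pending and does nothing.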
16. Loading & Synchronizing (2)
● The XML config file (XML::Simple → YAML?):
– Two main sections: auth and bib
– Each section lists the tags that feed indexes
<tag>
  <code>400</code>
  <list>au</list>
  <type>see</type>
  <subfields>*</subfields>
  <suffix>.</suffix>
</tag>
<tag>
  ...
</tag>

<tag>
  <code>130</code>
  <list>tl</list>
  <type>acc</type>
  <subfields>*</subfields>
  <skip_indicator>2</skip_indicator>
</tag>
<tag>
  ...
</tag>
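To show how such a file could be read, here is a minimal Python sketch using the standard library instead of XML::Simple. The wrapper elements `<config>`, `<auth>` and `<bib>` are assumptions (the slide only says there are two main sections), as is the placement of tag 400 under auth and tag 130 under bib; `load_rules` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Assumed file layout: only the <tag> entries come from the slide.
SAMPLE = """
<config>
  <auth>
    <tag>
      <code>400</code>
      <list>au</list>
      <type>see</type>
      <subfields>*</subfields>
      <suffix>.</suffix>
    </tag>
  </auth>
  <bib>
    <tag>
      <code>130</code>
      <list>tl</list>
      <type>acc</type>
      <subfields>*</subfields>
      <skip_indicator>2</skip_indicator>
    </tag>
  </bib>
</config>
"""


def load_rules(xml_text):
    """Return {"auth": [...], "bib": [...]}, one dict of settings per <tag>."""
    root = ET.fromstring(xml_text.strip())
    rules = {}
    for section in ("auth", "bib"):
        node = root.find(section)
        tags = node.findall("tag") if node is not None else []
        rules[section] = [
            {child.tag: (child.text or "").strip() for child in tag}
            for tag in tags
        ]
    return rules


rules = load_rules(SAMPLE)
```

Each rule is a plain dictionary, so optional settings such as `suffix` or `skip_indicator` are simply absent when a tag does not define them.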
17. Loading & Synchronizing (3)
● Special records in Solr, type:system
– Created if they do not exist, otherwise incremented
– A usage counter for each index
– Last update timestamps
● Search for new, modified or deleted records
– MySQL tables auth_header, biblioitems,
deletedbiblioitems, deleted_auth_header
– Modified AuthoritiesMarc.pm to fill
deleted_auth_header for auth deletion
● Cron once a minute, using a lock file
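A lock file keeps a run that takes longer than a minute from overlapping with the next cron invocation. The following Python sketch shows one common way to do this with an exclusive file create; it is not the original Perl code, `run_once` is a hypothetical name, and the lock path is a temporary file chosen for the demo.

```python
import os
import tempfile


def run_once(lock_path, job):
    """Run `job` unless another run holds the lock; return True if it ran."""
    try:
        # O_EXCL makes creation fail if the lock file already exists.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(str(os.getpid()))
        job()
        return True
    finally:
        # Release the lock even if the job raised.
        os.remove(lock_path)


lock = os.path.join(tempfile.mkdtemp(), "browse_sync.lock")
outcomes = []


def sync():
    # Simulate a second cron invocation arriving while the first still runs.
    outcomes.append(run_once(lock, lambda: None))


first = run_once(lock, sync)
```

Here the outer run succeeds and the overlapping one is skipped. A crashed process that never reaches the cleanup step would leave a stale lock, so a production script may also want to check the age of the file or the recorded process id.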
18. Querying (1)
A new page in Koha: Browse list of indexes
A cookie stores the last used list and the results per page
19. Querying (2)
● # of documents (C4::AuthoritiesMarc::CountUsage)
● Related headings
● Search VIAF
● Show Koha auth record
20. Querying (3)
Titles list contains standard titles and series titles
Multivolume work
21. Statistics
Only for PUSC
Public
Staff only
Will be public
We started some weeks ago
22. Security
● Solr db can be erased with a single http
request
● Many ways to add admin security
● For instance, modify
– jetty.xml
– webdefault.xml
– realm.properties
23. License and portability
● The same as Koha
● Tested on Koha 3.2 and Koha 3.6
● Needs work to be included in Koha
– I18N
– .tt instead of AJAX
– Branches?
– Integration with Koha system preferences
● … Solr experts... (BibLibre?)
24. Thank you – Grazie!