This document discusses adding browse functionality to the Koha integrated library system using Apache Solr. Key points include:
- The PUSC Library wants to add browse functionality to Koha to help users navigate subjects, authors, and related headings, a feature its earlier Aleph and Amicus systems provided.
- Solr is proposed as the engine to power browse due to its flexibility, performance, and potential future integration into Koha to replace the current search tool.
- A process is outlined for loading authority and bibliographic records into a Solr database, synchronizing it with Koha, and querying it to power browse lists within the Koha OPAC.
- Statistics, security, licensing, and portability are also covered.
Solr 3.1 includes many new features and improvements such as range faceting on numeric fields, geospatial search enhancements, JSON document indexing, autosuggest and spellcheck components, analysis filter improvements, and distributed support for additional components. Major components include Apache Lucene 3.1.0, Apache Tika 0.8, Carrot2 3.4.2, Velocity 1.6.1 and Velocity Tools 2.0-beta3, and Apache UIMA 2.3.1-SNAPSHOT.
Hadoop is a framework for distributed storage and processing of large datasets across clusters of computers. It utilizes HDFS for storage, which distributes data across nodes and replicates files for fault tolerance. HDFS uses a master/slave architecture, with a NameNode managing the file system namespace and DataNodes storing file data in blocks. The Hadoop API provides access to HDFS through interfaces like FileSystem and FSDataInputStream, allowing applications to read, write, and manipulate data in a distributed manner.
Oslo Solr MeetUp March 2012 - Solr4 alpha (Cominvent AS)
Jan Høydahl presented what is new in Solr 4.0 including near real-time search capabilities, SolrCloud for distributed search across multiple cores, an improved spellchecker, smaller indexes using Flex, pluggable ranking, new sorting functions, and an updated admin GUI. Some key features being added in Solr 4.0 are support for Apache ZooKeeper, auto load balancing of queries across collections, and fault tolerant indexing.
Linux is an open source operating system initially created by Linus Torvalds in 1991. It has since grown significantly with hundreds of companies and individuals developing their own versions based on the Linux kernel. The kernel is developed under the GNU GPL license and its source code is freely available. Basic Linux commands allow users to navigate directories, manage files and permissions, transfer files, and get system information. More advanced commands provide additional control and functionality.
The document provides information about Apache Solr, an open source search platform written in Java. It discusses how Solr functions, how to install and configure it, options for indexing and querying data, and examples of common Solr operations like search, filtering, faceting and highlighting results.
Apache Solr is an enterprise search platform built on Apache Lucene. It provides fast, scalable search functionality and allows for spell checking, highlighting, faceting and more. Solr configurations are defined in schema.xml and solrconfig.xml files which specify fields, analyzers, caching and other settings. Documents are indexed and queried via HTTP requests to Solr servers. Liferay can integrate with Solr to offload search indexing and querying for improved performance in clustered environments.
This document provides information about scheduling jobs in UNIX/Linux systems. It discusses using the cron daemon to schedule jobs to run periodically based on time and date settings. It also covers using the at command to schedule single jobs to run once at a specific time. The crontab file format and common cron directories are described. It outlines how to list, delete, and manage scheduled jobs, and how user access to job scheduling is configured through cron access control files.
The document provides 40 tips for using basic Linux command line commands and tricks. Some key points include: everything in Linux is a file; # and $ denote superuser and normal users respectively; Ctrl+Alt+F1-F6 switch between terminals while Ctrl+Alt+F7 switches to the GUI; tilde ~ denotes the user's home directory; hidden files start with a dot; ls -a views hidden files; file permissions use rwx notation; and variables can be assigned text for repeated use.
The document describes a presentation given at KohaCon12 about adding browse functionality to Koha using Solr. It details the motivation for adding browse, the design of documents in the Solr index, how the index is loaded and synchronized with Koha, and how browse lists and results are queried. The goal is to provide a way to browse alphabetical lists of headings extracted from authority and bibliographic records in Koha.
This document provides an introduction to Apache Lucene and Solr. It begins with an overview of information retrieval and some basic concepts like term frequency-inverse document frequency. It then describes Lucene as a fast, scalable search library and discusses its inverted index and indexing pipeline. Solr is introduced as an enterprise search platform built on Lucene that provides features like faceting, scalability and real-time indexing. The document concludes with examples of how Lucene and Solr are used in applications and websites for search, analytics, auto-suggestion and more.
Introduction to Lucene & Solr and Use Cases (Rahul Jain)
Rahul Jain gave a presentation on Lucene and Solr. He began with an overview of information retrieval and the inverted index. He then discussed Lucene, describing it as an open source information retrieval library for indexing and searching. He discussed Solr, describing it as an enterprise search platform built on Lucene that provides distributed indexing, replication, and load balancing. He provided examples of how Solr is used for search, analytics, auto-suggest, and more by companies like eBay, Netflix, and Twitter.
The document discusses Apache Solr, an open source search platform. It provides an overview of Solr, including its history and architecture. It also discusses how to set up a basic two shard Solr cluster with replicas and how Solr's schema works in a distributed environment. Lastly, it covers how to integrate Solr with other projects like Lucene, Zookeeper, Nutch, Mahout, Hadoop and ManifoldCF.
This document discusses building distributed search applications using Apache Solr. It provides an overview of Solr architecture and components like schema, indexing, querying etc. It also describes hands-on activities to index sample data from disk, database using Data Import Handler and SolrJ client. Query syntax for different types of queries and configuration of search handlers is also covered.
Apache Solr for TYPO3 Components & Review 2016 (timohund)
The document discusses the Apache Solr extensions for TYPO3: EXT:solr for indexing pages and records, EXT:solrfal for indexing files, and EXT:solrfluid for fluid templates. It summarizes the developments and releases in 2016, including updates for TYPO3 7.6 and PHP 7.0, improved documentation, and new features like field collapsing and variants. The author invites involvement through GitHub, Slack, or becoming an extension partner.
This document provides an introduction to Apache Solr, an open-source enterprise search platform built on Apache Lucene. It discusses how Solr indexes content, processes search queries, and returns results with features like faceting, spellchecking, and scaling. The document also outlines how Solr works, how to configure and use it, and examples of large companies that employ Solr for search.
Solr 4.7 and 4.8 include new features such as asynchronous execution of long-running actions, cursors for deep paging, document expiration, dynamic synonyms and stopwords, SSL support in SolrCloud, and improved collections API. Future versions will focus on ZooKeeper as the single source of truth, incremental field updates, multi-valued DocValues sorting, and removing legacy field types. The speaker also discussed related open source projects from LucidWorks for deploying Solr on AWS, log processing, and data quality.
Video that accompanies this presentation at: http://www.youtube.com/watch?v=1t3Z2pJyulA
Join us for a guided tour of the Alfresco SOLR integration and new search sub-systems. We’ll discuss how it works, the limitations of eventual consistency, guidance for configuration and set-up. We’ll also cover the steps required to migrate, improved PATH performance, in-query ACL evaluation, cross-language support and monitoring as well as performance.
This document discusses building distributed search applications using Apache Solr. It provides an agenda that covers topics such as Solr architecture, schema configuration, indexing data, querying, SolrCloud, and performance factors. It also references a demo app that will be used for hands-on examples during the presentation.
Presented by Mark Miller, Software Developer, Cloudera
Apache Lucene/Solr committer Mark Miller talks about how Solr has been integrated into the Hadoop ecosystem to provide full text search at "Big Data" scale. This talk will give an overview of how Cloudera has tackled integrating Solr into the Hadoop ecosystem and highlights some of the design decisions and future plans. Learn how Solr is getting 'cozy' with Hadoop, which contributions are going to what project, and how you can take advantage of these integrations to use Solr efficiently at "Big Data" scale. Learn how you can run Solr directly on HDFS, build indexes with Map/Reduce, load Solr via Flume in 'Near Realtime' and much more.
Solr search engine with multiple table relation (Jay Bharat)
This presentation shows how to use the Solr search engine and integrate it into an application such as a PHP/MySQL stack.
It introduces how to handle data from multiple related tables in Solr.
Apache Solr is a search platform built on Apache Lucene. It provides powerful indexing and search capabilities along with features like real-time indexing, faceted search, caching, and replication. Solr configuration is done through XML files that define aspects like tokenization, stemming, synonyms, and stop words. Solr uses REST services and exposes a HTTP interface to provide search functionality in a stateless manner.
Introduction to the basics of Information Retrieval (IR) with an emphasis on Apache Solr/Lucene. A lecture I gave during the JOSA Data Science Bootcamp.
You’re Solr powered, and needing to customize its capabilities. Apache Solr is flexibly architected, with practically everything pluggable. Under the hood, Solr is driven by the well-known Apache Lucene. Lucene for Solr Developers will guide you through the various ways in which Solr can be extended, customized, and enhanced with a bit of Lucene API know-how. We’ll delve into improving analysis with custom character mapping, tokenizing, and token filtering extensions; show why and how to implement specialized query parsing, and how to add your own search and update request handling.
This document provides an overview of a data science conference where the keynote speaker will discuss using Apache Solr and Apache Spark together for data science applications. The speaker is the CTO of Lucidworks and will cover getting started with Solr and Spark, demoing how to index data, run analytics like clustering and classification, and more. Resources for learning more about Solr, Spark, and Lucidworks Fusion are also provided.
Solr & R to Deploy Custom Search Interface: Presented by Patrick Beaucamp, Bp... (Lucidworks)
This document summarizes a presentation about integrating Solr and R to enable custom search interfaces within AklaBox. The presentation discusses using R for text analysis and classification of documents before indexing them in Solr. This allows adding metadata to documents to improve search. It also presents using GoJS to visualize mind maps of word associations and OSM for geographic visualization. The presentation demonstrates these capabilities and discusses scaling to large document volumes using Spark, SolrRDD, and Vanilla Air.
OSDC 2018 | Lifecycle of a resource. Codifying infrastructure with Terraform ... (NETWAYS)
Immutable infrastructure is a path to success, but what about the lifecycle of individual resources? This talk covers the evolution of resources, code structure, Terraform coding tricks, composition, and refactoring.
Solr Recipes provides quick and easy steps for common use cases with Apache Solr. Bite-sized recipes will be presented for data ingestion, textual analysis, client integration, and each of Solr’s features including faceting, more-like-this, spell checking/suggest, and others.
Catalog Enrichment for RDA - Adding relationship designators (in Koha) [text] (Stefano Bargioni)
Relationship designators are used to specify the relationship between a resource and a person, family, or corporate body associated with that resource. This presentation shows how they were added to the catalog of the library of the Pontificia Università della Santa Croce, in new records and, mostly automatically, in legacy records. The Name Cloud, a way to navigate the catalog through related authors, is also shown.
Catalog Enrichment for RDA - Adding relationship designators (in Koha) (Stefano Bargioni)
Relationship designators are used to specify the relationship between a resource and a person, family, or corporate body associated with that resource. This presentation shows how they were added to the catalog of the library of the Pontificia Università della Santa Croce, in new records and, mostly automatically, in legacy records. The Name Cloud, a way to navigate the catalog through related authors, is also shown.
Talk given at the conference "METODI SCELTE STRUMENTI: IL NUOVO CATALOGO DELLA RETE URBS" (Methods, Choices, Tools: the New Catalog of the URBS Network), 11 June 2015. Video at https://www.youtube.com/watch?v=gK3_6NKJMzM
Publication cover management in a library system (text) (Stefano Bargioni)
Book covers can be stored in a Library Management System. This work, presented at the 33rd ADLUG meeting in Piazza Armerina in October 2014, discusses the pros and cons, and how to collect book covers during cataloguing or circulation operations.
Publication cover management in a library system (slides) (Stefano Bargioni)
Book covers can be stored in a Library Management System. This work, presented at the 33rd ADLUG meeting in Piazza Armerina in October 2014, discusses the pros and cons, and how to collect book covers during cataloguing or circulation operations.
Catalog enrichment: importing Dewey Decimal Classification from external sour... (Stefano Bargioni)
Important catalogs are usually accessed to copy-catalogue whole records, but "atomic" pieces of information can be retrieved too, using unique keys such as ISBN.
The library of the Pontificia Università della S. Croce developed a tool that retrieves Dewey classification and inserts it into bibliographic records, in bulk mode as well as in single-record mode, i.e. during cataloguing.
During the bulk process, Dewey classification was added to about 20,000 records, retrieving it from up to 7 external sources, including OCLC, the Library of Congress, and some national libraries.
The single-record mode was integrated into the Koha ILS to make it easier to assign Dewey classification during cataloguing.
1. KohaCon12
Adding browse to Koha
using Solr
<http://tinyurl.com/solr-browse>
Stefano Bargioni
Pontifical University Santa Croce – Rome
bargioni@pusc.it
2. The PUSC Library
● 160,000 volumes
– 147,000 bibs
– 111,300 auth
● Aleph 300; Amicus 3.5
● Koha 3.2.7 from May 1st, 2011
● PUSC belongs to the URBE Network
– 17 academic libraries
– 2 of them using Koha
Adding browse to Koha using Solr 2
3. Why do we need browse at PUSC?
● Aleph 300 and Amicus offered it
● Our users and cataloguers frequently used it
● We have a lot of ancient authors, Popes, …,
requiring “seen from”, “see also”
● We started to add subjects to our bibliographic
records
4. How do you say?
● Alighieri, Dante or
● Dante Alighieri or
● Allighieri, Dante ?
● Ratzinger, Joseph, 1927- or
● Benedictus PP. XVI, 1927- or
● Papi (2005- : Benedictus XVI) ?
We have to help users and cataloguers to use the
correct form.
5. Grouping
● Uniform Titles
● Dewey
● Series, ...
6. Browse Functionalities
● Headings from authority as well as
bibliographic records
● Starting from
● Previous Headings, Next Headings
● Number of documents
● Related headings (see, see also, seen from)
● Go to authority record, if any
● Additional Links
7. Browse Requirements
● Indexes fed by headings coming from
– more than one auth tag
– more than one bib tag
● Sort form for Latin-1 (non-latin scripts?)
● Consider non-filing characters
● Synchronize frequently
● Integrated into the Koha OPAC
● MARC flavour independence
8. The engine
● Why Solr?
– Schema flexibility
– Facets
– High performance in update and query
– Better than MySQL
– Will be part of Koha, maybe replacing
Zebra
9. The architecture
● Web browser → Perl CGI → Solr db
● Koha SQL tables → loader.pl (cron job) → Solr db
10. The Solr Document (1)
Field name Value
id unique identifier
authid | sysno int
au | tl | se ... string (display form)
sortform_au | sortform_tl | sortform_se... string
timestamp ISO 8601
type acc | see | also ...
11. The Solr Document (2)
Field name Example auth
id au_a_1234_100_0
authid 1234
au Alighieri, Dante
sortform_au alighieri.dante
timestamp 2012-05-23T19:10:54Z
type acc
12. The Solr Document (3)
Field name Example bib
id tl_b_5678_245_0
sysno 5678
tl Gesù Cristo secondo la dottrina di S. Tommaso d'Aquino
sortform_tl gesu.cristo.secondo.la.dottrina.di.s.tommaso.d.aquino
timestamp 2012-05-23T18:15:44Z
type acc
13. The Solr Document (4)
Id structure:
– List name: au | tl | se ...
– Source: a | b
– Source authid or sysno: nnn
– Tag: ttt
– Occurrence #: n (0-based)
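The id layout above can be sketched in code. The project itself is Perl; this Python rendition is purely illustrative, and the helper names are hypothetical:

```python
def make_id(list_name, source, record_id, tag, occurrence):
    """Build a Solr document id such as 'au_a_1234_100_0'.

    list_name  -- browse list (au, tl, se, ...)
    source     -- 'a' for authority, 'b' for bibliographic
    record_id  -- authid or sysno
    tag        -- MARC tag the heading came from
    occurrence -- 0-based occurrence of that tag in the record
    """
    return f"{list_name}_{source}_{record_id}_{tag}_{occurrence}"

def parse_id(doc_id):
    """Split a document id back into its five components."""
    list_name, source, record_id, tag, occurrence = doc_id.split("_")
    return {
        "list": list_name,
        "source": source,
        "record_id": int(record_id),
        "tag": tag,
        "occurrence": int(occurrence),
    }
```

The ids shown on slides 11 and 12 (au_a_1234_100_0, tl_b_5678_245_0) follow this five-part layout.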
14. The Solr Document (5)
The sort form:
– Diacritics to simple letter (àÀ to aA, ...)
● use Text::Unidecode;
– Lowercase
– Strip out non-filing characters (titles)
– Replace non a-z0-9 with dot
Used for facets
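The four normalization steps above can be sketched as follows. This is a hedged Python stand-in: the real loader uses Perl's Text::Unidecode, and NFKD decomposition is only a rough equivalent that covers Latin-1 diacritics:

```python
import re
import unicodedata

def sort_form(heading, nonfiling=0):
    """Build the sort form of a heading following the slide's rules."""
    s = heading[nonfiling:]                   # strip non-filing characters
    s = unicodedata.normalize("NFKD", s)      # à -> a + combining accent
    s = s.encode("ascii", "ignore").decode()  # drop the combining accents
    s = s.lower()
    s = re.sub(r"[^a-z0-9]+", ".", s)         # runs outside a-z0-9 become dots
    return s.strip(".")
```

These rules reproduce the two sort forms shown on slides 11 and 12: "Alighieri, Dante" becomes alighieri.dante and the Gesù Cristo title becomes gesu.cristo.secondo.la.dottrina.di.s.tommaso.d.aquino.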
15. Loading & Synchronizing (1)
● The same cron based Perl script loads the Solr db for
the first time and updates it
– use C4::Context;
– use C4::AuthoritiesMarc;
– use WebService::Solr;
● 2000: number of docs modified before issuing a commit
● 5: number of commits before issuing an optimize
● 38 minutes to load 662,400 headings
● Configured through an xml file
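The commit/optimize cadence (commit every 2000 modified docs, optimize every 5 commits) reduces to a small counting wrapper. A sketch, assuming any client object with add/commit/optimize methods; the real loader uses Perl's WebService::Solr:

```python
class BatchLoader:
    """Commit every `commit_every` added docs; optimize every
    `optimize_every` commits."""

    def __init__(self, solr, commit_every=2000, optimize_every=5):
        self.solr = solr
        self.commit_every = commit_every
        self.optimize_every = optimize_every
        self.pending = 0   # docs added since the last commit
        self.commits = 0   # commits issued so far

    def add(self, doc):
        self.solr.add(doc)
        self.pending += 1
        if self.pending >= self.commit_every:
            self._commit()

    def _commit(self):
        self.solr.commit()
        self.pending = 0
        self.commits += 1
        if self.commits % self.optimize_every == 0:
            self.solr.optimize()

    def finish(self):
        # Flush whatever is left at the end of the run.
        if self.pending:
            self._commit()
```

Batching commits this way is what keeps the initial load (662,400 headings) down to tens of minutes; committing per document would be far slower.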
16. Loading & Synchronizing (2)
● The XML config file (XML::Simple → YAML?):
– Two main sections: auth and bib
– each section lists tags that feed indexes
<tag>
  <code>400</code>
  <list>au</list>
  <type>see</type>
  <subfields>*</subfields>
  <suffix>.</suffix>
</tag>
<tag>
  <code>130</code>
  <list>tl</list>
  <type>acc</type>
  <subfields>*</subfields>
  <skip_indicator>2</skip_indicator>
</tag>
<tag>
  ...
</tag>
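Reading such a config file amounts to walking the two sections and collecting each <tag> element. A hypothetical Python sketch (the project uses Perl's XML::Simple, and the exact top-level layout of the real file is an assumption):

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment in the shape shown on the slide; the auth/bib
# wrapper layout is assumed, not taken from the real config file.
CONFIG = """
<config>
  <auth>
    <tag><code>400</code><list>au</list><type>see</type>
         <subfields>*</subfields><suffix>.</suffix></tag>
  </auth>
  <bib>
    <tag><code>130</code><list>tl</list><type>acc</type>
         <subfields>*</subfields><skip_indicator>2</skip_indicator></tag>
  </bib>
</config>
"""

def load_config(xml_text):
    """Return {'auth': [...], 'bib': [...]}, one dict per <tag>."""
    root = ET.fromstring(xml_text)
    return {
        section: [
            {child.tag: child.text for child in tag}
            for tag in root.find(section).findall("tag")
        ]
        for section in ("auth", "bib")
    }
```

Each resulting dict tells the loader which MARC tag feeds which list, the heading type, the subfields to take, and options such as suffix or skip_indicator.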
17. Loading & Synchronizing (3)
● Special records in Solr, type:system
– Created if they do not exist, otherwise incremented
– A usage counter for each index
– Last update timestamps
● Search for new, modified or deleted records
– MySQL tables auth_header, biblioitems,
deletedbiblioitems, deleted_auth_header
– Modified AuthoritiesMarc.pm to fill
deleted_auth_header for auth deletion
● Cron once a minute, using a lock file
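The once-a-minute cron run guarded by a lock file can be sketched like this (a Python stand-in for the Perl script; the lock path is made up):

```python
import fcntl

def try_lock(path):
    """Take an exclusive, non-blocking lock on `path`.

    Returns the open handle on success (keep it open for the whole
    run), or None if a previous cron invocation still holds the lock.
    """
    fh = open(path, "w")
    try:
        fcntl.flock(fh, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return fh
    except BlockingIOError:
        fh.close()
        return None
```

Cron starts the loader every minute; the script calls try_lock at startup and exits immediately when it gets None, so a slow synchronization run is never overlapped by the next one.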
18. Querying (1)
A new page in Koha: Browse list of indexes
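The slides do not show the actual Solr query behind the browse page, but since the sort forms are "used for facets" (slide 14), a browse list can plausibly be fetched with standard Solr facet parameters. Every parameter choice below is an assumption, not taken from the deck:

```python
from urllib.parse import urlencode

SOLR_SELECT = "http://localhost:8983/solr/select"  # assumed core URL

def browse_url(list_name, start="", rows=20):
    """Build a facet query returning up to `rows` headings from the
    given browse list; `facet.prefix` narrows the alphabetical list."""
    params = {
        "q": "type:acc",          # accepted headings only (assumed)
        "rows": 0,                # no documents, just the facet terms
        "facet": "true",
        "facet.field": f"sortform_{list_name}",
        "facet.sort": "index",    # alphabetical order, not by count
        "facet.mincount": 1,
        "facet.limit": rows,
        "facet.prefix": start,
        "wt": "json",
    }
    return SOLR_SELECT + "?" + urlencode(params)
```

The facet counts double as the "number of documents" column of the browse list, and paging through facet terms gives the Previous/Next Headings behaviour.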
19. Querying (2)
● # of documents: C4::AuthoritiesMarc::CountUsage
● Related headings
● Search VIAF
● Show Koha auth record
20. Querying (3)
● Titles list contains standard titles and series titles
● Multivolume work
21. Statistics
● Only for PUSC
● Some are public, some staff only, some will be public
● We started some weeks ago
22. Security
● Solr db can be erased with a single http
request
● Many ways to add admin security
● For instance, modify
– jetty.xml
– webdefault.xml
– realm.properties
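The single destructive request the slide warns about is Solr's standard delete-by-query update message. The sketch below only builds the request (host and port are assumed defaults) without sending it, to show why the update handler needs the jetty.xml / realm.properties hardening:

```python
from urllib.request import Request

# The standard Solr update message that deletes every document;
# any client that can reach the update handler can send it.
DELETE_ALL = "<delete><query>*:*</query></delete>"

req = Request(
    "http://localhost:8983/solr/update?commit=true",  # assumed host/port
    data=DELETE_ALL.encode("utf-8"),
    headers={"Content-Type": "text/xml"},
)
# Deliberately not executed here: the point is that nothing in a
# default Jetty setup stops a stranger from executing it.
```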
23. License and portability
● The same as Koha
● Tested on Koha 3.2 and Koha 3.6
● Needs work to be included in Koha
– I18N
– .tt instead of AJAX
– Branches?
– Integration with Koha system preferences
● … Solr experts... (BibLibre?)
24. Thank you – Grazie!