Stefano Bargioni
Pontificia Università della Santa Croce

Catalogue enrichment: importing Dewey Decimal Classification from external sources

Oct 18, 2013

ADLUG 2013

The project
● Improving the Dewey search path
  – with a minimal effort
  – while adding BNCF compliant subject headings to our catalog
● Koha 3 <http://koha-community.org> open source ILS
● Can be applied to other ILSs

Version 1: The Batch Mode
● Add Dewey notations to the catalog
  – automatically
  – from selected sources
  – ensure quality and uniformity

An atomic copy cataloguing
● copy cataloguing usually involves the full record
● we only need to copy field 082 (MARC 21) or 676 (UNIMARC)
● the ISBN as unique identifier
● the policy issue
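
Since the ISBN is the only match key, a malformed ISBN can never produce a hit. The following is a minimal Python sketch of the standard ISBN-10 and ISBN-13 check-digit tests. It is an illustration only, not the Perl code used in the project, and the function name `is_valid_isbn` is hypothetical.

```python
DIGITS = "0123456789"


def is_valid_isbn(raw: str) -> bool:
    """Return True if raw passes the ISBN-10 or ISBN-13 check-digit test."""
    # Hyphens and spaces are only presentational, so drop them.
    isbn = raw.replace("-", "").replace(" ", "").upper()

    if len(isbn) == 10:
        body, check = isbn[:9], isbn[9]
        if not all(c in DIGITS for c in body) or check not in DIGITS + "X":
            return False
        # ISBN-10: weights 10 down to 1; a final X stands for the value 10.
        values = [int(c) for c in body] + [10 if check == "X" else int(check)]
        return sum(w * v for w, v in zip(range(10, 0, -1), values)) % 11 == 0

    if len(isbn) == 13 and all(c in DIGITS for c in isbn):
        # ISBN-13: weights alternate 1, 3, 1, 3, ...
        total = sum(int(c) * (3 if i % 2 else 1) for i, c in enumerate(isbn))
        return total % 10 == 0

    return False
```

A test of this kind only screens out typing errors; an ISBN that passes may still be absent from a source or attached to the wrong record.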

Records to be modified
● without Dewey notation
● with ISBN
● limit: 008 language
  – SELECT biblionumber, ISBN
    FROM biblio
    WHERE ISBN_present
    AND dewey_absent
    AND language_008='...'

Note: in Koha, the WHERE clause is based on the MySQL function ExtractValue, which works on biblio.marcxml fields through XPath expressions.

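
The three selection criteria (ISBN present, Dewey absent, language taken from field 008) can be illustrated on a single MARCXML record. The sketch below uses Python's `xml.etree.ElementTree` with XPath-like lookups. It is not the SQL used in Koha, the sample record is invented, and it assumes MARC 21 tags (020, 082, language in 008 positions 35-37) and a record without an XML namespace.

```python
import xml.etree.ElementTree as ET


def isbn_if_selected(marcxml: str, language: str):
    """Return the ISBN when the record qualifies for enrichment, else None."""
    record = ET.fromstring(marcxml)
    isbn = record.findtext(".//datafield[@tag='020']/subfield[@code='a']")
    dewey = record.findtext(".//datafield[@tag='082']/subfield[@code='a']")
    field_008 = record.findtext(".//controlfield[@tag='008']") or ""
    # MARC 21 stores the language code in positions 35-37 of field 008.
    if isbn and not dewey and field_008[35:38] == language:
        return isbn.strip()
    return None


# Hypothetical record: ISBN present, no field 082, language "ita".
FIELD_008 = "130101s2013    it " + " " * 17 + "ita d"
SAMPLE = (
    "<record>"
    f"<controlfield tag='008'>{FIELD_008}</controlfield>"
    "<datafield tag='020' ind1=' ' ind2=' '>"
    "<subfield code='a'>9780306406157</subfield>"
    "</datafield>"
    "</record>"
)

print(isbn_if_selected(SAMPLE, "ita"))
```

Records stored with the MARCXML namespace would need namespace-aware paths; that detail is left out here to keep the three tests visible.
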
Dewey Sources (I)
● a choice based on copy cataloguing experience
● OCLC Classify
● some National Libraries
● API, Z39.50 or HTML access

Dewey Sources (II): OCLC Classify
● Classify is a FRBR-based prototype designed to support the assignment of classification numbers and subject headings for books, DVDs, CDs, and other types of materials.
● This project applies principles of the FRBR model to aggregate bibliographic information above the manifestation level. Bibliographic records are grouped using the OCLC FRBR Work-Set algorithm to form a work-level summary of the class numbers and subject headings assigned to a work. You can retrieve a summary by ISBN, ISSN, UPC, OCLC number, author/title, or subject heading.
● The Classify database is accessible through a user interface and as a machine-to-machine service. The database provides access to more than 36 million WorldCat records that contain Dewey Decimal Classification (DDC) numbers,[...].
● Retrieved information is in XML format.
● http://www.oclc.org/research/activities/classify.html?urlm=159746

Dewey Sources (III): National Libraries
Source  Library                                   Language  Data format
LC      Library of Congress                       (any)     MARC
BNF     Bibliothèque nationale de France          (fre)     MARC
DNB     Deutsche Nationalbibliothek               (ger)     HTML
BNCF    Biblioteca Nazionale Centrale di Firenze  (ita)     HTML
BNCR    Biblioteca Nazionale Centrale di Roma     (ita)     HTML
BNB     British National Bibliography             (eng)     MARC

The logic used in the programs
● open the connection to the bibliographical database
● obtain the ISBN from records without a Dewey number
● open the connection to the Dewey source, if Z39.50
● for each ISBN
  ● query the data source using the current ISBN
  ● if a Dewey number is available in the response
    ● if the Dewey number passes quality control
      ● update the bibliographical record
  ● wait to avoid overloading
● close the connection to the Dewey source, if Z39.50
● close the connection to the bibliographical database
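
The steps above can be sketched as one loop. This Python version is only an outline of the control flow, not the project's Perl programs: the names `enrich_catalogue`, `lookup`, `is_acceptable` and `update` are hypothetical, and opening and closing the database and Z39.50 connections is left to the caller.

```python
import time
from typing import Callable, Iterable, Optional, Tuple


def enrich_catalogue(
    records: Iterable[Tuple[int, str]],
    lookup: Callable[[str], Optional[str]],
    is_acceptable: Callable[[str], bool],
    update: Callable[[int, str], None],
    delay_seconds: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Process (biblionumber, isbn) pairs and return how many were updated."""
    modified = 0
    for biblionumber, isbn in records:
        # Query the Dewey source with the current ISBN.
        dewey = lookup(isbn)
        # Update only when a number was returned and passes quality control.
        if dewey is not None and is_acceptable(dewey):
            update(biblionumber, dewey)
            modified += 1
        # Pause after every query to avoid overloading the remote source.
        sleep(delay_seconds)
    return modified
```

Passing the source query, the quality check and the update as functions keeps the same loop usable for an API, a Z39.50 target or an HTML page.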

Quality check
● Catalogs contain errors
● DDC has many editions
● Our old Dewey numbers start from edition 19
● Indicators
● Many Dewey numbers were discarded...
● … but we moved from 40,000 to 60,000 records with a Dewey number

+50% records with a Dewey number
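
The slide does not spell out the rules of the quality check. As one possible ingredient, the Python sketch below tests only the shape of a candidate number: three digits, optionally followed by a decimal part, after removing the slash that MARC 21 field 082 uses as a segmentation mark. The function names are hypothetical, and checks on the DDC edition or on indicators are not covered.

```python
import re

# Three digits, optionally followed by a point and more digits.
DDC_SHAPE = re.compile(r"[0-9]{3}(\.[0-9]+)?")


def normalize_ddc(raw: str) -> str:
    """Remove segmentation slashes and surrounding whitespace."""
    return raw.replace("/", "").strip()


def has_ddc_shape(raw: str) -> bool:
    """Return True when the normalized value looks like a DDC number."""
    return DDC_SHAPE.fullmatch(normalize_ddc(raw)) is not None
```

A value such as `025.4/31` is normalized to `025.431` and accepted, while a Library of Congress call number such as `QA76.9` is rejected.
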
Delay while searching sources
● Continuous searching can overwhelm remote servers
  – robots.txt
  – policies for crawlers
● Continuous indexing can overload your server
● Wait a few seconds between searches or groups of searches
  – this will slow the harvesting process
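
A harvester can read a source's robots.txt before choosing its delay. The sketch below uses Python's standard `urllib.robotparser`; the function name `polite_delay`, the user agent string and the 3-second default are assumptions made for illustration, not values from the project.

```python
from urllib.robotparser import RobotFileParser


def polite_delay(robots_lines, user_agent, url, default_delay=3.0):
    """Return the seconds to wait between queries, or None if url is disallowed."""
    parser = RobotFileParser()
    parser.parse(robots_lines)  # lines of an already downloaded robots.txt
    if not parser.can_fetch(user_agent, url):
        return None
    declared = parser.crawl_delay(user_agent)
    if declared is None:
        return default_delay
    # Never go faster than our own default, even if the site allows it.
    return max(default_delay, float(declared))


ROBOTS = ["User-agent: *", "Crawl-delay: 5", "Disallow: /private/"]
print(polite_delay(ROBOTS, "dewey-harvester", "http://example.org/search?isbn=1"))
```

The caller would then pass the returned value to `time.sleep` between searches, and skip the source when the result is `None`.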

Statistics
Source    Language  Records scanned  Records modified  ISBN not found  Dewey # not found  Dewey # discarded
Classify  all                 42387             10267            5321               6607              20059
LC        all                 31999              1252           21195               8562               1011
BNF       all                 30903              2253           21327               7268                 55
DNB       ger                  4193               163            3867                163                  0
BNCF      ita                 12017              4088            3643               3542                744
BNCR      ita                  7549              1515            3003               2978                 53
BNB       eng                  6215               193            5449                 55                518
Total                                           19710
Other figures shown on the slide:
Several works with same ISBN: 8240
ISBN incorrect: 133
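
To compare the sources, the share of scanned records that received a Dewey number can be computed from the figures above. The short Python sketch below does this, assuming that the first two numbers of each row are "Records scanned" and "Records modified"; the name `hit_rate` is hypothetical.

```python
# (records scanned, records modified) per source, copied from the slide.
STATS = {
    "Classify": (42387, 10267),
    "LC": (31999, 1252),
    "BNF": (30903, 2253),
    "DNB": (4193, 163),
    "BNCF": (12017, 4088),
    "BNCR": (7549, 1515),
    "BNB": (6215, 193),
}


def hit_rate(source: str) -> float:
    """Percentage of scanned records that were modified for one source."""
    scanned, modified = STATS[source]
    return 100.0 * modified / scanned


# List the sources from the highest to the lowest share.
for source in sorted(STATS, key=hit_rate, reverse=True):
    print(f"{source:<9}{hit_rate(source):5.1f}%")
```

The sources were searched on different record sets, so these shares describe each run rather than rank the catalogs themselves.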

Browsing Dewey Index
Besides author, uniform title and subject heading indexes, our OPAC offers a semantic search path based on the Dewey classification number

Software
● Query programs were written in Perl, making use of the Koha API and the following libraries available on CPAN:
  – LWP for HTTP connections
  – ZOOM for Z39.50 connections
  – DBI for connections to the MySQL database
  – XML::XPath for XML data processing
  – WWW::Scraper for HTML data processing
  – MARC::Record for MARC record processing

A scientific article
● published on JLIS.it at http://leo.cilea.it/index.php/jlis/article/view/8766
● JLIS.it, Italian Journal of Library and Information Science, is a peer-reviewed, open access academic journal of international scope
● written with my cataloguers
● doesn't deal with the dynamic component

Version 2.0 - Single Record Mode
● New record:
  – enter the ISBN
  – retrieve Dewey from important catalogs
  – choose and import the best one into the new record
● Or upgrade an old record by adding or modifying its Dewey classification

Conclusions
● Increase of available bibliographic data on the net
● Unique identifiers
  – ISBN, ISSN, ...
  – VIAF ID, ISNI, ...
● Catalog enrichment
  – bibliographic records
  – authority records
● Expose rich linked data
  – with coded information like Dewey
  – with standard IDs like ISBN, ISNI, ...

Thank you
Gracias
Grazie

