Linked Library Data in the wild
Introductions: Phil John, Technical Lead for Prism
Introductions: So, what’s Prism then?
Prism: a next generation discovery interface
Prism: Built entirely on Linked Data (yes… even configuration settings)
Prism: Discovery of library catalogue resources, but grander plans afoot...
Prism: ...some future sources...
  • archives/records (e.g. DS Calm)
  • thesis repositories
  • rare items/special collections
  • and more!
Prism: Performs data conversion, MARC 21 to RDF
Prism: Initial “bulk” conversion then periodic “delta” files; this ensures it keeps in sync with the LMS
Prism: Borrower/Availability data pulled from LMS “live”, provided by a suite of RESTful web services
Prism: Linked Data API. Just add .rss to collections or .rdf/.nt/.ttl/.json to items
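The suffix convention on the Linked Data API slide can be sketched as a pair of URL builders. This is only an illustration: the host name and the `/items/` path segment are hypothetical, since the slides give the suffixes but no actual endpoint.

```python
# Hypothetical base URL and path layout; only the suffixes come from the slide.
BASE = "https://prism.example.org"

# Serialisations the slide lists for item resources.
ITEM_FORMATS = ("rdf", "nt", "ttl", "json")


def item_url(item_id: str, fmt: str) -> str:
    """Build the URL of one serialisation of an item."""
    if fmt not in ITEM_FORMATS:
        raise ValueError(f"unsupported item format: {fmt!r}")
    return f"{BASE}/items/{item_id}.{fmt}"


def collection_feed_url(collection_path: str) -> str:
    """Build the RSS feed URL for a collection (e.g. a search result)."""
    return f"{BASE}/{collection_path.strip('/')}.rss"
```

For example, `item_url("750785", "ttl")` would ask for the Turtle form of the item and `collection_feed_url("items")` for the feed of that collection, assuming the layout above.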
Prism: The Challenges
The Challenges: Extracting data from MARC 21
Extracting Data from MARC 21: Some quotes... (cataloguers may want to look away now)
Extracting Data from MARC 21: MARC 21 is not going away anytime soon... and even if it does, there are millions of existing records that we’ll want to convert
Extracting Data from MARC 21: How are we approaching it?
Extracting Data from MARC 21: By tackling it in small chunks!
Extracting Data from MARC 21: We’ve created a solution that...
  • compartmentalises code for different sections
  • provides robustness
  • is performant
  • allows us to experiment
Extracting Data from MARC 21: MARC 21 Parser. Fires events when it encounters a MARC 21 data structure; very strict with syntax
Extracting Data from MARC 21: Event Observer. Listens for MARC 21 data structures and hands control over to one or more handlers
Extracting Data from MARC 21: Bibliographic Handlers. Know how to convert MARC 21 structures and fields into linked data
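The parser / observer / handler split described on the last three slides might look roughly like the sketch below. It is not Prism's code: the class and function names, the pre-tokenised `(tag, subfields)` input, the `ex:` record URI and the `dcterms:title` predicate are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Triple = Tuple[str, str, str]
Handler = Callable[[str, Dict[str, str]], List[Triple]]


class EventObserver:
    """Listens for field events and hands each one to its registered handlers."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = defaultdict(list)

    def register(self, tag: str, handler: Handler) -> None:
        self._handlers[tag].append(handler)

    def notify(self, record_uri: str, tag: str, subfields: Dict[str, str]) -> List[Triple]:
        triples: List[Triple] = []
        for handler in self._handlers.get(tag, []):
            triples.extend(handler(record_uri, subfields))
        return triples


def parse(record_uri: str, fields: List[Tuple[str, Dict[str, str]]],
          observer: EventObserver) -> List[Triple]:
    """Stand-in parser: fires one event per field, rejecting malformed tags."""
    triples: List[Triple] = []
    for tag, subfields in fields:
        if not (len(tag) == 3 and tag.isdigit()):
            raise ValueError(f"malformed tag: {tag!r}")  # "very strict with syntax"
        triples.extend(observer.notify(record_uri, tag, subfields))
    return triples


def title_handler(record_uri: str, subfields: Dict[str, str]) -> List[Triple]:
    """Example bibliographic handler: turns 245 $a into a title triple."""
    title = subfields.get("a", "").strip()
    return [(record_uri, "dcterms:title", title)] if title else []
```

Fields with no registered handler are simply skipped, which is one way to get the "small chunks" approach: each new handler can be added and tested without touching the parser or the other handlers.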
Extracting Data from MARC 21: So, where are we up to?
Extracting Data from MARC 21: Format (and duration). We tackled this one first as it allows us to reason more fully about the record
Format: In theory quite easy...
Format: ...in practice not so much...
  • DVD and LaserDisc share(d) a code
  • LC slow(ish) to support new formats in MARC 21
  • limited use of control field (007) codings...
  • ...so need to parse text from 3xx, 5xx fields
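One way to combine the two sources named in the list above (free text from 3xx/5xx fields, and the 007 control field) is sketched here. The keyword patterns, labels and function names are assumptions; the 007 lookup only covers videorecordings and uses position 04 codes `s` (Blu-ray) and `v` (DVD) as I understand the current MARC 21 code list, deliberately leaving the older shared code unresolved.

```python
import re
from typing import List, Optional

# Keyword patterns for free text such as 300 $a or 538 $a (illustrative only).
TEXT_PATTERNS = [
    (re.compile(r"blu-?\s?ray", re.IGNORECASE), "Blu-ray"),
    (re.compile(r"\bdvd\b", re.IGNORECASE), "DVD"),
    (re.compile(r"laser\s?disc", re.IGNORECASE), "LaserDisc"),
]


def format_from_text(texts: List[str]) -> Optional[str]:
    """Return the first format keyword found in the given field texts."""
    for text in texts:
        for pattern, label in TEXT_PATTERNS:
            if pattern.search(text):
                return label
    return None


def format_from_007(field_007: str) -> Optional[str]:
    """Read the videorecording format code at 007/04, where unambiguous."""
    if len(field_007) > 4 and field_007[0] == "v":
        return {"s": "Blu-ray", "v": "DVD"}.get(field_007[4])
    return None


def detect_format(field_007: Optional[str], free_text: List[str]) -> Optional[str]:
    """Prefer what the cataloguer wrote in text, then fall back to 007."""
    return format_from_text(free_text) or format_from_007(field_007 or "")
```

Text is consulted first here because the slides say the 007 codings are of limited use; a record whose 007 carries the shared code and no helpful text comes back as `None` rather than as a guess.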
Which gives us...
Extracting Data from MARC 21: Title. An important part of the record to model, or so I’ve been told
Title: Quite tricky because...
  • ‡c must be last subfield in a 245...
  • ...so sometimes data from ‡n / ‡p is in ‡c instead...
  • ...which means we can’t just drop the ‡c
Now with more title
Extracting Data from MARC 21: Identifier. Sounds easy... acronyms from EAN to UPC describing 13 digit codes... right?
Identifier: ...STOP! What are all those other things doing in the ‡a?
Identifier: “For a hardbound resource, there is no attempt to use a consistent term other than to use one that conveys the condition intelligibly.” (Library of Congress Rule Interpretation 1.8)
Identifier: So we need to parse them out (and then validate whatever’s left)
LDR: 01425ngm a22005058  4504
001: 750785
003: xxxxxxx
005: 20090824164118.0
007: vd||s||||
008: 080623s2007    enk||| e          v|eng d
020:  ,   | $c Retail (S24.99) |
024: 3,   | $a 7321900108089 |
028: 4, 0 | $a BDY10808 | $b Warner Home Video |
029:  ,   | $a 7321900108089 |
082:  ,   | $a 812
245: 0, 0 | $a Goodfellas | $h [videorecording] / | $c directed by Martin Scorsese ; music by Christopher Brooks
260:  ,   | $b Warner Home Video, | $c 2007. |
300:  ,   | $a 1 Blu-Ray (139 min.) : | $b col. |
306:  ,   | $a 021900 |
366:  ,   | $b 20070611 |
511:  ,   | $a Starring Robert De Niro, Ray Liotta and Joe Pesci
521: 8,   | $a BBFC code: 18. |
538:  ,   | $a Blu-Ray. |
700: 1,   | $a Scorsese, Martin |
700: 1,   | $a Brooks, Christopher |
852:  ,   | $b John Harvard | $c BLU-RAY DISC | $m 18 | $z , $z Blu Ray Disc. 18Cert
Phew, this one’s easy, no (pbk), (hbk) or even (pbk. , alk. paper) to contend with
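The "parse them out, then validate whatever's left" step could be sketched as follows for 13-digit codes. The function names are hypothetical and the qualifier handling (drop anything in parentheses, ignore hyphens) is an assumption about what needs stripping; the check-digit rule is the standard EAN-13 one, with weights 1 and 3 alternating over the first twelve digits.

```python
import re
from typing import Optional

QUALIFIER = re.compile(r"\([^)]*\)")                # e.g. "(pbk. : alk. paper)"
CANDIDATE = re.compile(r"(?<![0-9])[0-9]{13}(?![0-9])")  # exactly 13 digits


def ean13_is_valid(code: str) -> bool:
    """Check the EAN-13 check digit (weights 1,3,1,3,... over 12 digits)."""
    if not re.fullmatch(r"[0-9]{13}", code):
        return False
    digits = [int(c) for c in code]
    total = sum(d * (3 if i % 2 else 1) for i, d in enumerate(digits[:12]))
    return (10 - total % 10) % 10 == digits[12]


def extract_ean13(subfield_a: str) -> Optional[str]:
    """Strip qualifiers and hyphens from a subfield, then validate the code."""
    cleaned = QUALIFIER.sub(" ", subfield_a).replace("-", "")
    match = CANDIDATE.search(cleaned)
    if match and ean13_is_valid(match.group()):
        return match.group()
    return None
```

The 024 ‡a from the record above, `7321900108089`, passes the check as it stands; a value carrying a qualifier such as `7321900108089 (pbk. : alk. paper)` (made up here for illustration) reduces to the same code, and a code with a wrong final digit is rejected.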
Now we can start performing lookups against other sources!
Extracting Data from MARC 21: Author. Hardest of the lot...
Author: ...why?
  • Rowling, J.K. vs Rowling, Joanne K.
  • Few records with relator term in 100/700 ‡e...
  • ...so we have to parse that from the 245 ‡c...
  • ...and we don’t just deal with English records.
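Recovering roles from the 245 ‡c, as the list above describes, might start from something like this sketch. It only understands English "<role> by <name>" statements separated by semicolons, which is exactly the limitation the last bullet warns about; the function names and the simple "Surname, Forenames" inversion are assumptions.

```python
import re
from typing import Dict, List, Optional

# Matches statements such as "directed by Martin Scorsese" (English only).
STATEMENT = re.compile(r"^(?P<role>.+?)\s+by\s+(?P<name>.+)$", re.IGNORECASE)


def roles_from_245c(subfield_c: str) -> Dict[str, str]:
    """Map lower-cased direct-order names to the role text preceding 'by'."""
    roles: Dict[str, str] = {}
    for part in subfield_c.split(";"):
        match = STATEMENT.match(part.strip().rstrip("."))
        if match:
            roles[match.group("name").strip().lower()] = match.group("role").strip().lower()
    return roles


def direct_order(heading: str) -> str:
    """Turn 'Scorsese, Martin' into 'Martin Scorsese'."""
    surname, _, forenames = heading.partition(",")
    return f"{forenames.strip()} {surname.strip()}".strip()


def assign_relators(headings: List[str], subfield_c: str) -> Dict[str, Optional[str]]:
    """Attach a role from 245 $c to each 100/700 $a heading, if one is found."""
    roles = roles_from_245c(subfield_c)
    return {h: roles.get(direct_order(h).lower()) for h in headings}
```

Run against the Goodfellas record, the two 700 headings come back as "directed" and "music". Headings with no matching statement map to `None`, so a later step can decide whether to fall back to a generic contributor role.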
Author: Library of Congress to the rescue! We’ve licensed the names/subjects authority files, and created RDF from them
LDR: 01425ngm a22005058  4504
001: 750785
003: xxxxxxx
005: 20090824164118.0
007: vd||s||||
008: 080623s2007    enk||| e          v|eng d
020:  ,   | $c Retail (S24.99) |
024: 3,   | $a 7321900108089 |
028: 4, 0 | $a BDY10808 | $b Warner Home Video |
029:  ,   | $a 7321900108089 |
082:  ,   | $a 812
245: 0, 0 | $a Goodfellas | $h [videorecording] / | $c directed by Martin Scorsese ; music by Christopher Brooks
260:  ,   | $b Warner Home Video, | $c 2007. |
300:  ,   | $a 1 Blu-Ray (139 min.) : | $b col. |
306:  ,   | $a 021900 |
366:  ,   | $b 20070611 |
511:  ,   | $a Starring Robert De Niro, Ray Liotta and Joe Pesci
521: 8,   | $a BBFC code: 18. |
538:  ,   | $a Blu-Ray. |
700: 1,   | $a Scorsese, Martin |
700: 1,   | $a Brooks, Christopher | $e music
852:  ,   | $b John Harvard | $c BLU-RAY DISC | $m 18 | $z , $z Blu Ray Disc. 18Cert
A contrived example (sorry!) with and without relator terms
Hope you can all read this at the back!
Author: A closer look at Authority Matching
Author: Some requirements:
  • ...(able to process 2M records in several hours)
  • requires accuracy
  • must handle pseudonyms and variant spellings
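A lookup that meets requirements like these could be built as an in-memory index keyed on normalised name forms, so that each heading costs one dictionary lookup. This is a sketch under assumptions, not how Prism does it: the class name, the normalisation rules and the `ex:authority/...` URI are all hypothetical, and the variant labels would in practice come from the licensed authority files.

```python
import re
import unicodedata
from typing import Dict, Iterable, Optional


def normalise(name: str) -> str:
    """Fold case, accents and punctuation so variant forms compare equal."""
    decomposed = unicodedata.normalize("NFKD", name)
    unaccented = "".join(c for c in decomposed if not unicodedata.combining(c))
    return re.sub(r"[^a-z0-9]+", " ", unaccented.lower()).strip()


class AuthorityIndex:
    """Maps preferred and variant labels to a single authority URI."""

    def __init__(self) -> None:
        self._by_label: Dict[str, str] = {}

    def add(self, uri: str, preferred: str, variants: Iterable[str] = ()) -> None:
        for label in (preferred, *variants):
            self._by_label[normalise(label)] = uri

    def match(self, heading: str) -> Optional[str]:
        return self._by_label.get(normalise(heading))
```

With one entry added as `index.add("ex:authority/rowling", "Rowling, J. K.", ["Rowling, Joanne K."])`, both forms from the earlier slide resolve to the same URI. Exact matching on normalised keys favours accuracy over recall; fuzzy matching would catch more spellings but risks conflating different people.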
You can tell J.K. Rowling is successful, she’s been translated lots
Extracting Data from MARC 21: Language/Alternate Graphical Representation
Language: Nice “high impact” feature

  • both forms can be searched for
Language: Passes 880s back into Observer, tagged with an ISO 639-2 language and masquerading as the field listed in ‡6
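The 880 handling on this slide can be sketched as a small rerouting step: read the linked tag from ‡6 and re-issue the field under that tag with a language attached. The function name and the returned tuple shape are assumptions, as is passing the ISO 639-2 code in from the caller; the ‡6 layout assumed here is the usual "tag-occurrence/script" form, for example `245-01/$1`.

```python
import re
from typing import Dict, Optional, Tuple

# Leading part of an 880 subfield 6: linked tag, hyphen, occurrence number.
LINKAGE = re.compile(r"^(?P<tag>[0-9]{3})-(?P<occurrence>[0-9]{2})")


def reroute_880(subfields: Dict[str, str],
                language: str) -> Optional[Tuple[str, Dict[str, str], str]]:
    """Return (linked tag, subfields without the linkage, language), or None."""
    match = LINKAGE.match(subfields.get("6", ""))
    if not match:
        return None  # no usable linkage, so nothing to masquerade as
    payload = {code: value for code, value in subfields.items() if code != "6"}
    return match.group("tag"), payload, language
```

An 880 carrying `{"6": "245-01/$1", "a": "ハリー・ポッターと賢者の石"}` with language `"jpn"` (an illustrative value, not taken from the slides) would come back as a 245 event, so the ordinary title handler can process it and emit a language-tagged literal.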
The Challenges: Using/Linking to External Datasets. It’s part of the reason we use Linked Data... but it’s got some challenges at the moment
  • ...or worse, is taken offline permanently?
  • can we trust this data?
  • ...or, if that’s not practical, proxy requests using a caching proxy such as Squid
  • if using Wikipedia and worried about vandalism...
The Future... or, what we’d like to see happen to Linked Library Data
The Future: More library data as LOD, especially on the peripheries: authority data, author information, links to other resources
The Future: LMS vendors adopting LOD. Seriously, this would make our lives so much simpler
The Future: LOD replacing MARC 21 as the standard representation of bibliographic records
Photo Credits
  • Slide 15 - http://www.flickr.com/photos/gammaman/5241860326/
  • Slide 21 - http://www.flickr.com/photos/agizienski/3778965891/
  • Slide 40 - http://www.flickr.com/photos/54409200@N04/5070012761/
  • Slide 42 - http://www.flickr.com/photos/proimos/4199675334/
  • Slide 48 - http://www.flickr.com/photos/maveric2003/91198458/
  • Slide 63 - http://richard.cyganiak.de/2007/10/lod/
  • Slide 67 - http://www.flickr.com/photos/markchapmanphoto/5139429152/
  • Slide 72 - http://www.flickr.com/photos/-bast-/349497988/