Linked Open Communism - c4l13

Presentation Transcript

  • Linked Open Communism: Better discovery through data dis- and re-aggregation --- or --- How I learned to shut up about linked data AND BUILD SOMETHING!!
      Presented at code4lib2013 by Corey A Harper, 2013-02-13
  • Linked Data
      • Metadata as a Graph
      • Typed “things”, named by URIs
      • The relationships between those things, also built on URIs
      • Ease of integration *across* data sources: “merging graphs”
      2013-02-13 ☭ code4lib2013 ☭ 2
  • Refine
  • ViewShare
  • Context, narrative, storytelling
      • Context: the archive’s story, the library’s story, but also…
  • Users’ stories
      • Adding context through recombinant metadata
  • Backing Away from Evangelism...
      Image NOT used by permission. Probably a violation of several copyrights & trademarks.
  • Aside on metaphors
      Image by Jonestown Institute via Wikimedia Commons: http://en.wikipedia.org/wiki/File:Jonestown_entrance.jpg
  • Aside on metaphors
      Image by Joe Mabel via Wikimedia Commons: http://en.wikipedia.org/wiki/File:Furthur_05.jpg
  • Premise: Context is so central
  • And yet our controlled vocabs are nearly gone, because the interfaces to them were broken
  • The Death of Browse
      • Next-Gen Discovery Systems don't make use of Authority Control
      • “Browse” was/is broken as a UI Design
      • Rich data in Authorities, disconnected from narrative, context, search
      • Richer “Authority” type data outside libraries...
  • Linked Data Based UI Design for Boutique Collections
  • A research leave
      Public domain image of Paulette Goddard via Wikimedia Commons: http://en.wikipedia.org/wiki/File:Paulette_Goddard-publicity.JPG
  • Initial Scope
      Public domain image via Wikimedia Commons: http://en.wikipedia.org/wiki/File:Symbol-hammer-and-sickle.svg
  • Linked Open Communism
      • Dis-aggregate EAD records into Collections & Components
      • Create a broad set of resource “types”
      • Extract key “entities” from EAD
          – People, Places, Topics, Corporate Bodies
          – Incorporate additional data about entities
      • Put this in Blacklight
      • Load MARC & other data
  • Technology Stack - UI
      • Vanilla Blacklight
          – Minor SOLR index tweaks / additions
          – Minor view hacks
      • “pre-beta”
          – Only on localhost right now
  • Technology Stack – Support Tools
  • Gadget!
  • Technology Stack - Backend
      • Python & RDFLib
      • 4Store & HTTP4Store
      • Sunburnt
      • FuzzyWuzzy
      • (Lots of other Python modules...)
  • FuzzyWuzzy & SeatGeek!
      FuzzyWuzzy: awesome library from SeatGeek
      https://github.com/seatgeek/fuzzywuzzy
      http://seatgeek.com/blog/dev/fuzzywuzzy-fuzzy-string-matching-in-python
  • Data Flow
  • Object Oriented Python
      • Classes: Collections, Components, Entities
      • Class methods
          – makeGraph
          – makeSolr
          – to4store
          – output (turtle, rdf/xml, etc.)
  • Performance Benchmarks
      • EAD -> SOLR
          – ~26 hrs to parse 1600 EAD files and push 385k “records” to SOLR
      • DBPedia matching
          – Cross-reference label variants for entities against 9.4 million DBPedia labels (labels-en.ttl)
          – Should be using Hadoop
          – Other ideas?
      • Re-solr-izing entities: ~10 minutes
          – Pulls local copy of DBPedia data from 4store
  • 4Store
      • Provenance-ish
          – Naming of sub-graphs
          – Default context is everything
      • First EAD cut produced ~4m triples
      • Easy to delete whole graphs, or individual triples
      • SPARQL-able, good for stats
          – 992 DBPedia links for 6331 “Entities”
  • https://github.com/chrpr/ead2rdf2solr
      Image by wallygrom via Flickr: http://www.flickr.com/photos/33037982@N04/3669790240/
  • Future Steps: Code to Incorporate
      • Components: inheritance of access points
          – FuzzyWuzzy string match to unittitle
          – Matched about 10%
          – Extend to cross-EAD match via 4Store
      • VIAF, id.loc, FAST reconciliation
      • Override configs for DBPedia matching
  • DBPedia Override Examples
      • Germany. |t Treaties, etc. |g Soviet Union, |d 1939 Aug. 23.
          – http://dbpedia.org/page/Treaty_of_Non-Aggression_between_Germany_and_the_Soviet_Union
      • Textile Workers Strike, Gastonia, N.C., 1929.
          – http://dbpedia.org/page/Loray_Mill_Strike
  • Further Development Next Steps
      • EAC-CPF reconciliation, record creation
      • Possibly relationship to Hydra?
          – Annotation interface, DBP overrides
      • SOLR relevancy ranking
      • SOLR-Marc modifications
      • Update mechanism
      • Test with other datasets (NYPL/NYU/METRO project)
  • Thanks! corey.harper@nyu.edu, 212.998.2479, @chrpr
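
The “Linked Data” slide describes metadata as a graph of URI-named things and highlights “merging graphs” across sources. A minimal sketch, assuming plain Python sets of (subject, predicate, object) tuples in place of RDFLib (which the backend slide lists), shows why shared URIs make merging a simple union. The example.org URIs and literal values are hypothetical, and the sketch does not distinguish URI nodes from literals.

```python
# Sketch only: plain Python sets of (subject, predicate, object) tuples stand
# in for an RDF library such as RDFLib. All example.org URIs and literal
# values below are hypothetical.

FOAF_NAME = "http://xmlns.com/foaf/0.1/name"
DCT_SUBJECT = "http://purl.org/dc/terms/subject"
OWL_SAMEAS = "http://www.w3.org/2002/07/owl#sameAs"
PERSON = "http://example.org/entity/person/1"

# Triples extracted from a finding aid.
archive_graph = {
    (PERSON, FOAF_NAME, "Example, Person"),
    (PERSON, DCT_SUBJECT, "Labor movement"),
}

# Triples about the same URI from an outside source.
external_graph = {
    (PERSON, OWL_SAMEAS, "http://dbpedia.org/resource/Example_Person"),
    (PERSON, FOAF_NAME, "Example, Person"),  # also asserted by the archive
}


def merge_graphs(*graphs):
    """Merge graphs by set union; identical triples collapse into one."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def describe(graph, subject):
    """All (predicate, object) pairs for one subject, in sorted order."""
    return sorted((p, o) for s, p, o in graph if s == subject)


merged = merge_graphs(archive_graph, external_graph)
for predicate, obj in describe(merged, PERSON):
    print(predicate, obj)
```

Because both sources name the person with the same URI, no join logic is needed: the duplicate name triple disappears and the outside link simply attaches to the existing node.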
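
The “Linked Open Communism” scope slide calls for dis-aggregating EAD records into Collections and Components and extracting People, Places, Topics and Corporate Bodies. The sketch below shows one way to do that with the standard library's ElementTree; it is not the code from the ead2rdf2solr repository. The sample finding aid, the type names and the choice of controlaccess elements are assumptions, and the sample omits the EAD namespace that real files usually declare.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical EAD 2002 fragment. Real finding aids usually carry the EAD
# namespace, which this sketch leaves out for brevity.
SAMPLE_EAD = """<ead>
  <archdesc level="collection">
    <did><unittitle>Example Papers</unittitle></did>
    <controlaccess>
      <persname>Example, Person</persname>
      <geogname>Gastonia (N.C.)</geogname>
      <subject>Textile Workers Strike, Gastonia, N.C., 1929.</subject>
      <corpname>Example Union</corpname>
    </controlaccess>
    <dsc>
      <c01 level="series">
        <did><unittitle>Correspondence</unittitle></did>
        <c02 level="file"><did><unittitle>Letters, 1929</unittitle></did></c02>
      </c01>
    </dsc>
  </archdesc>
</ead>"""

# EAD access-point elements mapped to broad entity "types" (assumed names).
ENTITY_TYPES = {
    "persname": "Person",
    "geogname": "Place",
    "subject": "Topic",
    "corpname": "CorporateBody",
}

# Matches unnumbered <c> and numbered <c01> ... <c12> component tags.
COMPONENT_TAG = re.compile(r"^c(0[1-9]|1[0-2])?$")


def text_of(elem, path):
    """Whitespace-normalized text of the first match for path, or None."""
    found = elem.find(path)
    if found is None:
        return None
    return " ".join("".join(found.itertext()).split())


def disaggregate(ead_xml):
    """Split one EAD record into a collection, its components, and entities."""
    archdesc = ET.fromstring(ead_xml).find("archdesc")
    collection = {"type": "Collection", "title": text_of(archdesc, "did/unittitle")}
    components = []

    def walk(parent, parent_title):
        # Depth-first over nested components, remembering each one's parent.
        for child in parent:
            if COMPONENT_TAG.match(child.tag):
                title = text_of(child, "did/unittitle")
                components.append({"type": "Component", "level": child.get("level"),
                                   "title": title, "parent": parent_title})
                walk(child, title)

    dsc = archdesc.find("dsc")
    if dsc is not None:
        walk(dsc, collection["title"])

    entities = [{"type": ENTITY_TYPES[el.tag],
                 "label": " ".join("".join(el.itertext()).split())}
                for el in archdesc.iter() if el.tag in ENTITY_TYPES]
    return collection, components, entities


collection, components, entities = disaggregate(SAMPLE_EAD)
```

Each component keeps a pointer to its parent's title, which is enough to rebuild the hierarchy later or to let components inherit collection-level data.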
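
The backend slides credit SeatGeek's FuzzyWuzzy library for string matching. To keep the example dependency-free, the sketch below swaps in the standard library's difflib.SequenceMatcher and reimplements the idea behind FuzzyWuzzy's token-sort comparison; it does not reproduce FuzzyWuzzy's API, and its scores may differ. The 80-point cutoff is an arbitrary assumption.

```python
import re
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase and collapse runs of non-alphanumerics to single spaces."""
    return re.sub(r"[^0-9a-z]+", " ", text.lower()).strip()


def ratio(a, b):
    """0-100 similarity of two strings, based on difflib's matching blocks."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


def token_sort_ratio(a, b):
    """Normalize, sort the words, then compare, so word order stops mattering."""
    sorted_a = " ".join(sorted(normalize(a).split()))
    sorted_b = " ".join(sorted(normalize(b).split()))
    return ratio(sorted_a, sorted_b)


def best_match(query, choices, cutoff=80):
    """Highest-scoring (choice, score), or None when nothing reaches the cutoff."""
    scored = [(choice, token_sort_ratio(query, choice)) for choice in choices]
    if not scored:
        return None
    choice, score = max(scored, key=lambda pair: pair[1])
    return (choice, score) if score >= cutoff else None


print(best_match("Example, Person", ["Person Example", "Example Union"]))
```

Sorting tokens is what makes inverted headings such as "Example, Person" line up with direct-order forms such as "Person Example".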
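
The “Object Oriented Python” slide lists Collection, Component and Entity classes with makeGraph, makeSolr, to4store and output methods. A minimal sketch of that shape follows, assuming a shared base class; the URI base, vocabulary and Blacklight-style Solr field names are hypothetical. It serializes only N-Triples rather than Turtle or RDF/XML, and leaves out to4store, since the talk does not show how data was pushed to 4store.

```python
# Hypothetical URI base and vocabulary; Solr field names are assumptions
# modeled on Blacklight-style *_display fields.
BASE = "http://example.org/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
DCT_IS_PART_OF = "http://purl.org/dc/terms/isPartOf"


class URI(str):
    """Marks a string as a URI node rather than a literal."""


def nt_term(value):
    """Render one N-Triples object term: <uri> or an escaped "literal"."""
    if isinstance(value, URI):
        return "<" + value + ">"
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return '"' + escaped + '"'


class Resource:
    """Shared behavior: each record can build its RDF graph and a Solr document."""
    kind = "Resource"

    def __init__(self, local_id, label):
        self.uri = URI(BASE + self.kind.lower() + "/" + local_id)
        self.label = label

    def makeGraph(self):
        return {(self.uri, RDF_TYPE, URI(BASE + "vocab/" + self.kind)),
                (self.uri, RDFS_LABEL, self.label)}

    def makeSolr(self):
        return {"id": str(self.uri), "format": self.kind, "title_display": self.label}

    def output(self):
        """Serialize makeGraph() as N-Triples, one sorted line per triple."""
        return "\n".join("<%s> <%s> %s ." % (s, p, nt_term(o))
                         for s, p, o in sorted(self.makeGraph()))


class Collection(Resource):
    kind = "Collection"


class Component(Resource):
    kind = "Component"

    def __init__(self, local_id, label, collection):
        super().__init__(local_id, label)
        self.collection = collection

    def makeGraph(self):
        # A component's graph adds a link up to its parent collection.
        return super().makeGraph() | {(self.uri, DCT_IS_PART_OF, self.collection.uri)}

    def makeSolr(self):
        doc = super().makeSolr()
        doc["collection_display"] = self.collection.label
        return doc


class Entity(Resource):
    kind = "Entity"


papers = Collection("1", "Example Papers")
series = Component("1-1", "Correspondence", papers)
print(series.output())
```

Keeping graph building and Solr document building on the same object means the triple store and the Blacklight index are always generated from one source of truth.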
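
The “Performance Benchmarks” slide notes that cross-referencing entity label variants against 9.4 million DBPedia labels (labels-en.ttl) was the slow step, and asks for other ideas. One common option, sketched here as an assumption rather than the talk's method, is to stream the label file once into a dictionary keyed by a normalized label, so each variant becomes a hash lookup instead of a scan. The normalization rule is hypothetical, and exact-key lookup will miss variants that only a fuzzy matcher would catch.

```python
import re
from collections import defaultdict

# One rdfs:label triple per line, as in a DBpedia labels dump. This pattern
# assumes N-Triples-style lines with English-tagged literals; escape
# sequences inside labels are left undecoded in this sketch.
LABEL_LINE = re.compile(
    r'^<([^>]+)> <http://www\.w3\.org/2000/01/rdf-schema#label> '
    r'"((?:[^"\\]|\\.)*)"@en \.$'
)


def normalize(label):
    """Hypothetical matching key: lowercase, punctuation collapsed to spaces."""
    return re.sub(r"[^0-9a-z]+", " ", label.lower()).strip()


def build_label_index(lines):
    """Single streaming pass: normalized label -> set of resource URIs."""
    index = defaultdict(set)
    for line in lines:
        m = LABEL_LINE.match(line.strip())
        if m:
            uri, label = m.groups()
            index[normalize(label)].add(uri)
    return index


def match_entity(variants, index):
    """Look up every label variant of one entity; union the hits."""
    hits = set()
    for variant in variants:
        hits |= index.get(normalize(variant), set())
    return hits


SAMPLE_LABELS = """\
<http://dbpedia.org/resource/Loray_Mill_Strike> <http://www.w3.org/2000/01/rdf-schema#label> "Loray Mill Strike"@en .
<http://dbpedia.org/resource/Gastonia,_North_Carolina> <http://www.w3.org/2000/01/rdf-schema#label> "Gastonia, North Carolina"@en .
"""

index = build_label_index(SAMPLE_LABELS.splitlines())
print(match_entity(["Gastonia (N.C.)", "Gastonia, North Carolina"], index))
```

The trade-off is memory: the whole index lives in RAM, so a full labels file may call for sharding or an on-disk key-value store, with fuzzy matching reserved for the entities that find no exact hit.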
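
The “4Store” slide relies on named sub-graphs for provenance, a default context that covers everything, easy deletion of whole graphs, and SPARQL for statistics such as “992 DBPedia links for 6331 Entities”. The toy in-memory model below illustrates those ideas in plain Python; it is not 4store's API. Recording DBpedia links as owl:sameAs and typing entities with a single class URI are assumptions.

```python
from collections import defaultdict

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
OWL_SAMEAS = "http://www.w3.org/2002/07/owl#sameAs"
ENTITY = "http://example.org/vocab/Entity"  # hypothetical class URI


class QuadStore:
    """Toy named-graph store: every triple is filed under a graph URI."""

    def __init__(self):
        self.graphs = defaultdict(set)

    def add(self, graph, triple):
        self.graphs[graph].add(triple)

    def default_graph(self):
        """Union of all named graphs, mirroring 'default context is everything'."""
        union = set()
        for triples in self.graphs.values():
            union |= triples
        return union

    def delete_graph(self, graph):
        """Drop a whole named graph, e.g. to reload one finding aid."""
        self.graphs.pop(graph, None)

    def delete_triple(self, graph, triple):
        if graph in self.graphs:
            self.graphs[graph].discard(triple)


def link_stats(triples):
    """(entities with a DBpedia owl:sameAs link, total entities)."""
    entities = {s for s, p, o in triples if p == RDF_TYPE and o == ENTITY}
    linked = {s for s, p, o in triples
              if s in entities and p == OWL_SAMEAS
              and o.startswith("http://dbpedia.org/resource/")}
    return len(linked), len(entities)


LINKS = "http://example.org/graph/dbpedia-links"
store = QuadStore()
store.add("http://example.org/graph/ead/1", ("http://example.org/entity/1", RDF_TYPE, ENTITY))
store.add("http://example.org/graph/ead/1", ("http://example.org/entity/2", RDF_TYPE, ENTITY))
store.add(LINKS, ("http://example.org/entity/1", OWL_SAMEAS,
                  "http://dbpedia.org/resource/Loray_Mill_Strike"))
print(link_stats(store.default_graph()))
store.delete_graph(LINKS)
print(link_stats(store.default_graph()))
```

Filing the DBpedia links in their own named graph means a bad matching run can be thrown away in one step without touching the triples derived from the EAD.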
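
The “Future Steps” slide describes components inheriting access points and a FuzzyWuzzy match against each component's unittitle. The sketch below assumes a simple rule: a component inherits every collection-level access point and flags those whose words largely overlap its unittitle. The scoring function follows the general idea of FuzzyWuzzy's token_set_ratio using difflib instead of the library, and the 90-point cutoff is an assumption; scores may differ from FuzzyWuzzy's.

```python
import re
from difflib import SequenceMatcher


def tokens(text):
    """Lowercased word set with punctuation removed."""
    return set(re.sub(r"[^0-9a-z]+", " ", text.lower()).split())


def similarity(a, b):
    return round(100 * SequenceMatcher(None, a, b).ratio())


def token_set_ratio(a, b):
    """Score shared words against each side's leftovers, keeping the best (0-100)."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0
    common = " ".join(sorted(ta & tb))
    with_a = (common + " " + " ".join(sorted(ta - tb))).strip()
    with_b = (common + " " + " ".join(sorted(tb - ta))).strip()
    return max(similarity(common, with_a), similarity(common, with_b),
               similarity(with_a, with_b))


def inherit_access_points(unittitle, parent_access_points, cutoff=90):
    """Hypothetical rule: inherit every parent access point, flagging close matches."""
    inherited = []
    for label in parent_access_points:
        score = token_set_ratio(unittitle, label)
        inherited.append({"label": label, "score": score, "matched": score >= cutoff})
    return inherited


points = inherit_access_points(
    "Gastonia strike",
    ["Textile Workers Strike, Gastonia, N.C., 1929.", "Example, Person"],
)
for point in points:
    print(point)
```

A set-based score suits this case because short unittitles are often a subset of a longer subject heading; when every title word appears in the heading, the score reaches 100.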
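
The “DBPedia Override Examples” slide pairs headings that automatic matching gets wrong with hand-picked DBPedia pages. A minimal sketch of an override config, assuming a JSON file keyed by heading and consulted before any automatic matcher, uses the two examples from the slide. The normalization rule and the fallback interface are hypothetical; the /page/ to /resource/ rewrite reflects how DBPedia publishes HTML pages alongside resource URIs.

```python
import json

# Hypothetical override file contents, using the two headings and DBpedia
# pages shown on the slide.
OVERRIDES_JSON = """{
  "Germany. |t Treaties, etc. |g Soviet Union, |d 1939 Aug. 23.":
    "http://dbpedia.org/page/Treaty_of_Non-Aggression_between_Germany_and_the_Soviet_Union",
  "Textile Workers Strike, Gastonia, N.C., 1929.":
    "http://dbpedia.org/page/Loray_Mill_Strike"
}"""


def normalize_heading(heading):
    """Collapse whitespace, drop a trailing period, and lowercase."""
    return " ".join(heading.split()).rstrip(".").lower()


def to_resource(url):
    """DBpedia /page/ URLs are HTML views; the RDF resource lives under /resource/."""
    return url.replace("http://dbpedia.org/page/", "http://dbpedia.org/resource/", 1)


def load_overrides(text):
    """Parse the config into normalized heading -> DBpedia resource URI."""
    return {normalize_heading(k): to_resource(v) for k, v in json.loads(text).items()}


def resolve(heading, overrides, automatic_match):
    """Prefer a manual override; otherwise fall back to the automatic matcher."""
    key = normalize_heading(heading)
    if key in overrides:
        return overrides[key], "override"
    return automatic_match(heading), "automatic"


overrides = load_overrides(OVERRIDES_JSON)
print(resolve("Textile Workers Strike, Gastonia, N.C., 1929", overrides, lambda h: None))
```

Returning the source of each match ("override" or "automatic") alongside the URI keeps a record of which links were curated by hand, which matters if the matcher is rerun later.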