Linked Open Communism - c4l13
Linked Open Communism - c4l13 Presentation Transcript

  • 1. Linked Open Communism: Better discovery through data dis- and re-aggregation --- or --- How I learned to shut up about linked data AND BUILD SOMETHING!! Presented at code4lib2013 by Corey A Harper, 2013-02-13
  • 2. Linked Data • Metadata as a Graph • Typed “things”, named by URIs • The relationships between those things, also built on URIs • Ease of integration *across* data sources – “merging graphs” 2013-02-13 ☭ code4lib2013 ☭ 2
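The "merging graphs" point can be illustrated with a minimal sketch. It assumes triples are modeled as plain Python tuples rather than RDFLib objects, and every example.org URI and label below is invented for illustration, not taken from the project.

```python
# Each triple is (subject, predicate, object); URIs are plain strings here.
DCT_TITLE = "http://purl.org/dc/terms/title"
DCT_SUBJECT = "http://purl.org/dc/terms/subject"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Two independent sources (hypothetical data) that share one URI.
finding_aid = {
    ("http://example.org/collection/1", DCT_TITLE, "Example Party Records"),
    ("http://example.org/collection/1", DCT_SUBJECT, "http://example.org/entity/7"),
}
authority = {
    ("http://example.org/entity/7", RDFS_LABEL, "Example Workers Union"),
}


def merge(*graphs):
    """Merging graphs is a set union: shared URIs line up automatically."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def objects(graph, subject, predicate):
    """All objects for a given subject and predicate."""
    return {o for s, p, o in graph if s == subject and p == predicate}


merged = merge(finding_aid, authority)
# Follow collection -> subject -> label across what were two data sources.
subject_labels = {
    label
    for uri in objects(merged, "http://example.org/collection/1", DCT_SUBJECT)
    for label in objects(merged, uri, RDFS_LABEL)
}
```

Because both sources name the entity with the same URI, no join key or record matching is needed once the sets are combined.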
  • 4. Refine
  • 5. ViewShare
  • 6. Context Narrative Storytelling Context The archive’s story, the library’s story, but also…
  • 7. Users’ stories: Adding context through recombinant metadata
  • 8. Backing Away from Evangelism... Image NOT used by permission. Probably a violation of several copyrights & trademarks.
  • 9. Aside on metaphors. Image by Jonestown Institute via Wikimedia Commons. http://en.wikipedia.org/wiki/File:Jonestown_entrance.jpg
  • 10. Aside on metaphors. Image by Joe Mabel via Wikimedia Commons. http://en.wikipedia.org/wiki/File:Furthur_05.jpg
  • 12. Premise: Context is so central
  • 13. And yet our controlled vocabularies are nearly gone, because the interfaces to them were broken
  • 15. The Death of Browse • Next-Gen Discovery Systems don’t make use of Authority Control • “Browse” was/is broken as a UI Design • Rich data in Authorities, disconnected from narrative, context, search • Richer “Authority” type data outside libraries...
  • 16. Linked Data Based UI Design For Boutique Collections
  • 17. A research leave. Public Domain image of Paulette Goddard via Wikimedia Commons. http://en.wikipedia.org/wiki/File:Paulette_Goddard-publicity.JPG
  • 18. Initial Scope. Public Domain image via Wikimedia Commons. http://en.wikipedia.org/wiki/File:Symbol-hammer-and-sickle.svg
  • 19. Linked Open Communism • Dis-aggregate EAD records into Collections & Components • Create a broad set of resource “types” • Extract key “entities” from EAD – People, Places, Topics, Corporate Bodies – Incorporate additional data about entities • Put this in Blacklight • Load MARC & other data
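The dis-aggregation step might look roughly like the sketch below. It assumes an un-namespaced EAD 2002 file and uses only the standard library; the sample record, the `disaggregate` function, and the type names in `ENTITY_TAGS` are illustrative assumptions, not the project's actual code.

```python
import xml.etree.ElementTree as ET

# A tiny, invented EAD fragment (real finding aids are far larger and
# are often namespaced, which this sketch does not handle).
SAMPLE_EAD = """<ead>
  <archdesc level="collection">
    <did><unittitle>Example Party Records</unittitle></did>
    <controlaccess>
      <persname>Doe, Jane</persname>
      <corpname>Example Workers Union</corpname>
    </controlaccess>
    <dsc>
      <c01 level="series">
        <did><unittitle>Correspondence</unittitle></did>
        <c02 level="file">
          <did><unittitle>Letters from Jane Doe</unittitle></did>
        </c02>
      </c01>
    </dsc>
  </archdesc>
</ead>"""

# EAD access-point elements mapped to hypothetical resource "types".
ENTITY_TAGS = {
    "persname": "Person",
    "corpname": "CorporateBody",
    "geogname": "Place",
    "subject": "Topic",
}


def is_component(tag):
    """True for unnumbered <c> and numbered <c01>..<c12> elements."""
    return tag == "c" or (len(tag) == 3 and tag[0] == "c" and tag[1:].isdigit())


def disaggregate(ead_xml):
    """Split one EAD record into a collection, its components, and entities."""
    archdesc = ET.fromstring(ead_xml).find("archdesc")
    collection = {"type": "Collection", "title": archdesc.findtext("did/unittitle")}
    components = [
        {"type": "Component", "level": el.get("level"),
         "title": el.findtext("did/unittitle")}
        for el in archdesc.iter() if is_component(el.tag)
    ]
    entities = [
        {"type": ENTITY_TAGS[el.tag], "label": el.text.strip()}
        for el in archdesc.iter() if el.tag in ENTITY_TAGS and el.text
    ]
    return collection, components, entities


collection, components, entities = disaggregate(SAMPLE_EAD)
```

Each returned dictionary stands in for one "record" that could later be turned into triples or a Solr document.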
  • 23. Technology Stack - UI • Vanilla Blacklight – Minor SOLR Index Tweaks / Additions – Minor View Hacks • “pre-beta” – Only on localhost right now
  • 24. Technology Stack – Support Tools
  • 25. Gadget!
  • 26. Technology Stack - Backend • Python & RDFLib • 4Store & HTTP4Store • Sunburnt • FuzzyWuzzy • (Lots of other Python modules....)
  • 27. FuzzyWuzzy & SeatGeek! FuzzyWuzzy – Awesome Library from SeatGeek https://github.com/seatgeek/fuzzywuzzy http://seatgeek.com/blog/dev/fuzzywuzzy-fuzzy-string-matching-in-python
  • 28. Data Flow
  • 29. Object Oriented Python • Classes: Collections, Components, Entities • Class methods – makeGraph – makeSolr – to4store – output (turtle, rdf/xml, etc)
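A rough sketch of what one such class could look like follows. Only the method names makeGraph, makeSolr, and output come from the slide; the method bodies, the Solr field names, and the example URI are assumptions. to4store is omitted because it needs a running 4store server, and output here emits N-Triples only, to avoid depending on RDFLib.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


class Entity:
    """One extracted person, place, topic, or corporate body."""

    def __init__(self, uri, label, type_uri):
        self.uri = uri
        self.label = label
        self.type_uri = type_uri

    def makeGraph(self):
        # Triples as (subject, predicate, object, object_is_literal).
        return [
            (self.uri, RDF_TYPE, self.type_uri, False),
            (self.uri, RDFS_LABEL, self.label, True),
        ]

    def makeSolr(self):
        # Field names are hypothetical, not the project's Solr schema.
        return {"id": self.uri, "type_s": self.type_uri, "label_t": self.label}

    def output(self, fmt="nt"):
        if fmt != "nt":
            raise ValueError("this sketch only serializes N-Triples")
        lines = []
        for s, p, o, is_literal in self.makeGraph():
            if is_literal:
                # Escape backslashes and quotes inside the literal.
                escaped = o.replace("\\", "\\\\").replace('"', '\\"')
                obj = f'"{escaped}"'
            else:
                obj = f"<{o}>"
            lines.append(f"<{s}> <{p}> {obj} .")
        return "\n".join(lines)


# Invented example entity.
entity = Entity(
    "http://example.org/entity/1",
    "Doe, Jane",
    "http://xmlns.com/foaf/0.1/Person",
)
```

Keeping graph and Solr serialization on the same object means both views of a record are derived from one set of attributes.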
  • 30. Performance Benchmarks • EAD -> SOLR: – ~26 hrs to parse 1600 EAD, push 385k “records” to SOLR • DBPedia matching – Cross-reference label variants for entities against 9.4 million DBPedia labels (labels-en.ttl). – Should be using Hadoop – Other ideas? • Re-solr-izing entities: ~10 minutes – Pulls local copy of dbpedia data from 4store
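One possible answer to "Other ideas?" is sketched below, on the assumption that exact matching after normalization is acceptable: index the DBPedia labels once in a dictionary, then look up each label variant in constant time instead of comparing every variant against every label. The sample lines imitate the shape of a labels file but are illustrative, not copied from the dump, and all function names are hypothetical.

```python
import re

# Matches lines of the form: <uri> <rdfs:label> "text"@en .
LABEL_LINE = re.compile(
    r'^<([^>]+)>\s+<http://www\.w3\.org/2000/01/rdf-schema#label>\s+"(.*)"@en\s*\.\s*$'
)


def normalize(label):
    """Lowercase, drop punctuation, collapse whitespace."""
    label = re.sub(r"[^\w\s]", " ", label.lower())
    return " ".join(label.split())


def build_index(lines):
    """One pass over the label file: normalized label -> first URI seen."""
    index = {}
    for line in lines:
        match = LABEL_LINE.match(line)
        if match:
            uri, label = match.groups()
            index.setdefault(normalize(label), uri)
    return index


def match_entity(variants, index):
    """Return the URI for the first label variant found in the index."""
    for variant in variants:
        uri = index.get(normalize(variant))
        if uri:
            return uri
    return None


# Illustrative input lines (assumed format, invented for this sketch).
sample_lines = [
    '<http://dbpedia.org/resource/Loray_Mill_Strike> '
    '<http://www.w3.org/2000/01/rdf-schema#label> "Loray Mill Strike"@en .',
    '<http://example.org/resource/Something_Else> '
    '<http://www.w3.org/2000/01/rdf-schema#label> "Something Else"@en .',
]
index = build_index(sample_lines)
hit = match_entity(["Gastonia strike", "Loray Mill strike."], index)
miss = match_entity(["No such heading"], index)
```

This trades fuzzy matching for speed, so near-miss spellings would still need a second, slower pass over the unmatched remainder.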
  • 31. 4Store • Provenance-ish – Naming of sub-graphs – Default context is everything • First EAD cut produced ~4m triples • Easy to delete whole graphs, or individual triples • SPARQL-able – good for stats: – 992 DBPedia links for 6331 “Entities”
  • 32. Image by wallygrom via flickr http://www.flickr.com/photos/33037982@N04/3669790240/ https://github.com/chrpr/ead2rdf2solr
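The named sub-graph idea can be mimicked in memory to show why whole-graph deletion is cheap. This is a toy model, not the 4store or HTTP4Store API; the `QuadStore` class, the graph names, and the use of owl:sameAs for DBPedia links are all assumptions. The last line only restates the slide's own figures as a percentage.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


class QuadStore:
    """Toy store: each named graph is a set of (s, p, o) triples."""

    def __init__(self):
        self.graphs = {}

    def add(self, graph_name, triple):
        self.graphs.setdefault(graph_name, set()).add(triple)

    def default_context(self):
        # "Default context is everything": the union of all named graphs.
        return set().union(*self.graphs.values())

    def drop_graph(self, graph_name):
        # Deleting a whole sub-graph is a single operation.
        self.graphs.pop(graph_name, None)


store = QuadStore()
# Hypothetical graph names recording where each triple came from.
store.add("http://example.org/graph/ead",
          ("http://example.org/entity/1",
           "http://www.w3.org/2000/01/rdf-schema#label", "Doe, Jane"))
store.add("http://example.org/graph/dbpedia-links",
          ("http://example.org/entity/1", OWL_SAME_AS,
           "http://dbpedia.org/resource/Example"))

linked_before = {s for s, p, o in store.default_context()
                 if p == OWL_SAME_AS and o.startswith("http://dbpedia.org/")}
store.drop_graph("http://example.org/graph/dbpedia-links")
triples_after = len(store.default_context())

# Link coverage implied by the slide's numbers: 992 links for 6331 entities.
coverage_percent = round(992 / 6331 * 100, 1)
```

Naming a graph per load makes it possible to discard and redo one matching run without touching the triples produced from the EAD files.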
  • 33. Future Steps: Code to Incorporate • Components: Inheritance of accesspoints – fuzzywuzzy string match to unittitle – matched about 10% – Extend to cross-EAD match via 4Store • VIAF, id.loc, fast reconciliation • Override configs for DBPedia matching
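The access-point inheritance idea can be sketched as follows. This uses `difflib` from the standard library as a stand-in for FuzzyWuzzy, so scores will not be identical to FuzzyWuzzy's; the function names, the 0.8 threshold, and the sample data are assumptions.

```python
import re
from difflib import SequenceMatcher


def tokens(text):
    """Lowercased word tokens, punctuation dropped."""
    return re.findall(r"\w+", text.lower())


def best_window_score(label, title):
    """Best similarity between the label and any same-length word window
    of the title. Sorting tokens makes "Doe, Jane" match "Jane Doe"."""
    label_tokens = sorted(tokens(label))
    title_tokens = tokens(title)
    n = len(label_tokens)
    if n == 0 or not title_tokens:
        return 0.0
    target = " ".join(label_tokens)
    last_start = max(1, len(title_tokens) - n + 1)
    return max(
        SequenceMatcher(None, target,
                        " ".join(sorted(title_tokens[i:i + n]))).ratio()
        for i in range(last_start)
    )


def inherit_access_points(unittitle, collection_access_points, threshold=0.8):
    """Collection-level access points that appear (fuzzily) in a unittitle."""
    return [ap for ap in collection_access_points
            if best_window_score(ap, unittitle) >= threshold]


# Invented example data.
access_points = ["Doe, Jane", "Example Workers Union"]
inherited = inherit_access_points("Letters from Jane Doe, 1934", access_points)
```

A component then carries only the collection headings its own title supports, rather than every heading assigned to the finding aid as a whole.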
  • 34. DBPedia Override Examples • Germany. |t Treaties, etc. |g Soviet Union, |d 1939 Aug. 23. http://dbpedia.org/page/Treaty_of_Non-Aggression_between_Germany_and_the_Soviet_Union • Textile Workers Strike, Gastonia, N.C., 1929. http://dbpedia.org/page/Loray_Mill_Strike
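An override config could be as simple as a lookup table consulted before any automatic matching. In the sketch below the two heading-to-URL pairs are the slide's examples; the `resolve` function and the stand-in matcher are hypothetical.

```python
# Hand-curated overrides: library heading -> DBPedia page.
OVERRIDES = {
    "Germany. |t Treaties, etc. |g Soviet Union, |d 1939 Aug. 23.":
        "http://dbpedia.org/page/Treaty_of_Non-Aggression_between_Germany_and_the_Soviet_Union",
    "Textile Workers Strike, Gastonia, N.C., 1929.":
        "http://dbpedia.org/page/Loray_Mill_Strike",
}


def resolve(heading, automatic_matcher):
    """Prefer a manual override; fall back to automatic matching."""
    if heading in OVERRIDES:
        return OVERRIDES[heading]
    return automatic_matcher(heading)


def no_match(heading):
    """Stand-in for the real matcher: it never finds anything."""
    return None


overridden = resolve("Textile Workers Strike, Gastonia, N.C., 1929.", no_match)
unmatched = resolve("Some other heading", no_match)
```

Headings like these share almost no text with the DBPedia label, which is why a manual table is needed alongside string matching.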
  • 35. Further Development Next Steps • EAC-CPF reconciliation, record creation • Possibly relationship to Hydra? – Annotation Interface, DBP Overrides • SOLR Relevancy Ranking • SOLR-Marc Modifications • Update mechanism • Test with other Datasets (NYPL/NYU/METRO project)
  • 36. Thanks! corey.harper@nyu.edu 212.998.2479 @chrpr