http://www.flickr.com/photos/pict_u_re/2372235999




LODLAM bootcamp
Linked Open Data in Libraries, Archives & Museums
Great Lakes THATCamp
April 29, 2011

Jon Voss, Founder of LookBackMaps
@jonvoss jon@lookbackmaps.net

#lodlam
Welcome

• Introductions


  • Where we come from


  • What we do


  • Interest in Linked Open Data
Welcome

 • A chance to share knowledge, expertise, perspective; explore ideas


 • Goal: a solid, basic, conceptual understanding of Linked Open Data


 • Breaking it down into 3, and maybe 4 parts


 • #lodlam teaser: http://youtu.be/YdrVI7emnt4
Linked Open Data in Cultural Context

• It’s not just Libraries, Archives &
  Museums


• Linked Open Data has evolved in
  the cultural context of shared
  information, music, movies


• From rock to rap to hip-hop to
  mashups


• Changing expectations from
  audiences, curators, technologists


• http://mashupbreakdown.com/
LODLAM is a Growing Movement

• in its infancy, but picking up steam


• it requires experimentation

• small, niche, domain-specific implementations

• use cases, reasons for content providers to get excited about contributing
LODLAM is a product of our increasingly
 connected culture.
• it’s an unfolding story, but it’s awn...


• first funded projects in the US exploring Linked Open Data in the
  humanities now underway: http://lod-lam.net


   • 100 people gathering from around the world to forward LODLAM in the
     next year
LODLAM is a product of our increasingly
connected culture.
 • and that’s just the beginning...




                              Linked   Open   Data
Linked




         http://openlibrary.org/works/OL6048721W/Linked
Going from Tables to Graphs

• Our data and databases have been organized in tables


• which works, but only to a point




                http://www.flickr.com/photos/thomasjwoods-com/2264301251
Going from Tables to Graphs

• The World Wide Web is much more like a graph, and the ability to link to
  disparate datasets relies on our ability to understand data as nodes and
  links in a graph
Going from Tables to Graphs

• As computing power increases, the ability to build more and more complex
  graphs becomes a reality.


• Human vs. Machine readable


                                     msulibraries      lookbackmaps
                                     msulibraries      internetarchive
                                     msulibraries      librarycongress
                                     lookbackmaps      internetarchive
                                     internetarchive   librarycongress
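A minimal sketch of the tables-to-graphs idea, using the five pairs listed above. Treating each pair as an undirected link is an assumption made for illustration; the slide does not say which direction the links run.

```python
from collections import defaultdict

# The five pairs from the slide, read as undirected links (an assumption).
edges = [
    ("msulibraries", "lookbackmaps"),
    ("msulibraries", "internetarchive"),
    ("msulibraries", "librarycongress"),
    ("lookbackmaps", "internetarchive"),
    ("internetarchive", "librarycongress"),
]


def build_graph(pairs):
    """Return an adjacency map: node -> set of directly linked nodes."""
    graph = defaultdict(set)
    for a, b in pairs:
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)


graph = build_graph(edges)
```

The same rows that look like a two-column table become a structure a machine can walk: `graph["msulibraries"]` lists every node one link away.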
Introducing Triples

• Quite simply: Subject, Predicate, Object


• gives us the ability to describe entities in a way that is machine readable



             jonvoss --- knows ---> copystar

                        Nodes & Links
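One way to picture a triple in code is as a three-field record. The sketch below uses the example from this slide; the `Triple` class and `describe` helper are hypothetical names introduced only for illustration.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """A single statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


def describe(triple):
    """Render a triple as a node-link-node arrow."""
    return f"{triple.subject} --{triple.predicate}--> {triple.obj}"


t = Triple("jonvoss", "knows", "copystar")
arrow = describe(t)
```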
What do we know about the person: Ed Summers
(aside from the fact that he rocks)?


  Bio: Hacker for libraries, digital archaeologist, pragmatist.

  [Diagram: Ed Summers as a node in a graph, with links labeled "bio",
  "depiction of", and "knows". Source: http://inkdroid.org/ehs.rdf]
A quick word about vocabularies

• Caution: what libraries call vocabularies is not necessarily what we mean...


• This is how we organize information and triangulate the data we’re looking
  for


• How we agree on predicates


• Ontologies and authority files like FOAF, OWL, http://id.loc.gov/, VIAF, etc.
Triples for machines

• now we’re ready to talk to machines


• triples follow the Resource Description Framework (RDF) model and can be
  serialized in many different ways, including RDF/XML, RDFa, N3, Turtle, etc.,
  but they all describe things in the <subject> <predicate> <object> format.


• of course, we need to be consistent and predictable for machines to
  understand us.


• More info from old Semantic Focus article
http://inkdroid.org/ehs.rdf
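To make "serialized" concrete, here is a small sketch that writes triples in the line-based N-Triples style, one statement per line ending in a period. The `example.org` URIs are hypothetical placeholders; the two FOAF predicate URIs are the real ones. The escaping handles only backslashes and double quotes, so treat this as an illustration and not a complete serializer.

```python
FOAF_KNOWS = "http://xmlns.com/foaf/0.1/knows"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"

# Hypothetical identifiers, for illustration only.
JON = "http://example.org/people/jonvoss"
COPYSTAR = "http://example.org/people/copystar"


def uri(value):
    """Wrap a URI in angle brackets."""
    return f"<{value}>"


def literal(value):
    """Quote a plain string literal (minimal escaping only)."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_ntriples(triples):
    """Join (subject, predicate, object) terms into one line per triple."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


triples = [
    (uri(JON), uri(FOAF_KNOWS), uri(COPYSTAR)),
    (uri(JON), uri(FOAF_NAME), literal("Jon Voss")),
]
output = to_ntriples(triples)
```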
Tim Berners-Lee’s 4 rules of Linked Data
• http://www.w3.org/DesignIssues/LinkedData.html



 1. Use URIs as names for things.
 2. Use HTTP URIs so that people can look up those names.
 3. When someone looks up a URI, provide useful information, using the standards
    (RDF*, SPARQL).
 4. Include links to other URIs, so that they can discover more things.
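Rules 2 and 3 amount to "ask an HTTP URI for data and get something useful back." A hedged sketch of what such a lookup might look like from the client side: it only builds the request and does not send it, and whether any particular server honors the `Accept` header is an assumption. The `rdf_request` helper is a hypothetical name.

```python
from urllib.request import Request


def rdf_request(uri):
    """Build (but do not send) a GET request that asks for RDF/XML."""
    return Request(uri, headers={"Accept": "application/rdf+xml"})


# The URI is the one shown on the earlier slide.
req = rdf_request("http://inkdroid.org/ehs.rdf")
```

Passing `req` to `urllib.request.urlopen` would perform the actual lookup.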
Now that we can see the code...

• RDF at Open Library (search for
  Civil War regiments:
  http://openlibrary.org/search?q=regiment&has_fulltext=true&time_facet=Civil+War%2C+1861-1865)


• @musebrarian’s Of Ships and Men
  project. http://bit.ly/h8W2yl
  (vocabulary: minting URIs)


• Advanced: Ed Summers’s SNAC
  hacks post:
  http://inkdroid.org/journal/2011/03/31/snac-hacks/
Tim Berners-Lee 2010 TED Talk

• what people are doing with Linked Data


• http://www.ted.com/talks/tim_berners_lee_the_year_open_data_went_worldwide.html
Civil War Data 150

• consider graph demo: http://civilwardata150.net


• Civil War vocabulary, or a way to link and traverse across datasets


  • Regiments, battles, Freebase military schema


• Building apps


  • How tools like Simile/Exhibit can use Linked Data in coordination with
    Freebase (Conflict History:
    http://conflicthistory.com/#/period/1861-1865/conflict/+en+american_civil_war)
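A rough sketch of what "link and traverse across datasets" could mean in practice: two institutions publish triples separately, and a shared regiment URI lets a query hop from one dataset into the other. Every URI and predicate name here is hypothetical and is not taken from the Civil War Data 150 project.

```python
# Hypothetical URIs and predicates, for illustration only.
REGIMENT = "http://example.org/regiment/1"
BATTLE = "http://example.org/battle/1"
ROSTER = "http://example.org/archive/roster/1"

# Two datasets published independently that happen to share REGIMENT.
library_data = {(REGIMENT, "foughtIn", BATTLE)}
archive_data = {(ROSTER, "documents", REGIMENT)}

# Merging graphs is just a set union of their triples.
merged = library_data | archive_data


def subjects(triples, predicate, obj):
    """Return the subjects linked to obj by predicate, sorted."""
    return sorted(s for s, p, o in triples if p == predicate and o == obj)


def rosters_for_battle(triples, battle):
    """Hop battle -> regiments that fought there -> rosters about them."""
    found = []
    for regiment in subjects(triples, "foughtIn", battle):
        found.extend(subjects(triples, "documents", regiment))
    return found
```

Neither dataset alone can answer "which rosters relate to this battle?"; the merged graph can.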
In summary: Linked
• Graphs


• Human AND Machine readable


• Vocabulary, agreed terms for organizing info


• Triples, RDF
Break?
The “Open” part of Linked Open Data
• 5 Stars


• Considerations and ramifications


• Difference between shared, published, open


• Legal tools


• Precedents/Examples
Tim Berners-Lee: 5 Stars of Linked Data

• More thanks to Ed Summers:
  http://inkdroid.org/journal/2010/06/04/the-5-stars-of-open-linked-data/


• This is NOT all or nothing
Expose yourself, be vulnerable

• This is the major cultural shift, the tide rising amongst institutions, that data
  wants to be free in a culture economy.


• There is value in sharing


• It does require a leap of faith, but risks and rewards should be carefully
  considered and calculated


• Excellent resource: JISC Open Bibliographic Data Guide
  http://obd.jisc.ac.uk/
What will happen to your data?

• If you want people to do something with your data/metadata, you have to
  put it out there


• But once you do, it’s [mostly] out of your control. Yet it can be a part of
  something much greater than any of the component parts


• Roots and Wings
What will happen to your data?

• Working with Open Data from
  NOAA at WhereCamp 2011.
  http://www.nauticalcharts.noaa.gov/history/CivilWar/
Metadata vs. data, assets, digital surrogates

• A key conceptual shift with Open Data is looking at metadata and data as
  two separate things, that can have different licensing and permissions
What are the tools for publishing Open Data?
Creative Commons

• In the last several years, Creative Commons has provided standardized,
  portable legal tools that are easy for individuals and institutions to
  use.


• http://creativecommons.org/licenses/
          Open Data                Published Data

          CC-BY                    CC-BY-NC-ND
          CC0                      CC-BY-NC
          Public Domain Mark       CC-BY-ND
                                   CC-BY-SA
                                   CC-BY-NC-SA
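The two columns above can be read as a simple lookup, and because metadata and assets may carry different licenses, each can be checked on its own. The sketch below follows this slide's grouping as written; the record and its field names are hypothetical.

```python
# Groupings copied from the slide's two columns.
OPEN_DATA = {"CC-BY", "CC0", "Public Domain Mark"}
PUBLISHED_DATA = {"CC-BY-NC-ND", "CC-BY-NC", "CC-BY-ND", "CC-BY-SA", "CC-BY-NC-SA"}


def classify(license_name):
    """Return 'open', 'published', or 'unknown' for a license label."""
    if license_name in OPEN_DATA:
        return "open"
    if license_name in PUBLISHED_DATA:
        return "published"
    return "unknown"


# Hypothetical record: metadata and the digital asset licensed separately.
record = {"metadata_license": "CC0", "asset_license": "CC-BY-NC"}
status = {part: classify(name) for part, name in record.items()}
```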
Open Data Commons

• ODC Public Domain Dedication and License


• http://www.opendatacommons.org/licenses/


• Building tools with a focus on databases


• May need a graphic artist?
Concerns and Limitations

• There is some argument about whether or not metadata can be protected
  under copyright at all. Copyright protects a creative work, and some argue
  that metadata is scientific fact, rather than creative work.


• Databases are protected differently in the EU and US, for example.


• Public Domain and No Known Copyright...


• Issuing blanket copyright over all works on a website, even though some
  may be in the public domain


• Institutions that will not issue any kind of copyright due to concerns or
  questions about ownership and copyright
Examples and precedents

• Bibliographic data:


  • British Library (CC0), University of Michigan (CC0), Stanford (CC-BY)
    have published large, raw datasets of bibliographic data they have
    created (being careful not to publish OCLC or other vendor controlled or
    licensed metadata)
Examples and precedents

 • Civil War Data 150


   • Metadata from contributing federal institutions are largely considered
     to be Public Domain.


   • State, local, university & individual researchers are considering
     policies for metadata publishing on a case by case basis.
Sciences leading the way vs. Humanities

• In the sciences, there have been a lot of advances in the realm of Open
  Data, which will provide models for humanities research as well


  • Nano Publishing: the idea of publishing datasets separately from
    research findings, so that they can more easily be built upon and integrated
    into other datasets. Several scientific journals have already started this.


  • Federally funded medical research must have a data management plan
    and some funders are requiring that data be published separately from
    analysis and findings as Open Data
In summary: Open
• put it out there... 5 stars


• published, shared, and/or open


• tools


• metadata vs. assets
Break?
Raw Data Now...
• Looking at Civil War Data 150 workflow and strategy


• http://www.civilwardata150.net/join


• How we plan to take various datasets and:


  • Clean


  • Reconcile/Vocabulary Alignment


  • Publish triples
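The three steps above can be sketched end to end. This is an illustration under stated assumptions, not the project's actual workflow: the authority table, the `depicts` predicate, and all `example.org` URIs are hypothetical.

```python
import re

# Hypothetical authority table: normalized name -> agreed-upon URI.
AUTHORITY = {"1st example infantry": "http://example.org/regiment/1"}


def clean(value):
    """Clean: collapse runs of whitespace and trim the ends."""
    return re.sub(r"\s+", " ", value).strip()


def reconcile(name):
    """Reconcile: look the cleaned name up in the authority table."""
    return AUTHORITY.get(name.lower())


def run(rows):
    """Publish: emit a triple per matched row; collect the rest for review."""
    triples, unmatched = [], []
    for raw_name, photo_uri in rows:
        name = clean(raw_name)
        regiment_uri = reconcile(name)
        if regiment_uri is None:
            unmatched.append(name)
        else:
            triples.append((photo_uri, "depicts", regiment_uri))
    return triples, unmatched


rows = [
    ("  1st  Example   Infantry ", "http://example.org/photo/7"),
    ("Unknown Regiment", "http://example.org/photo/8"),
]
triples, unmatched = run(rows)
```

Keeping the unmatched names in a separate list reflects that reconciliation usually needs a human pass for whatever the lookup cannot resolve.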
Raw Data Now...

• One of our inspirations for this sort of workflow:


• Data.gov Wiki from RPI


• http://data-gov.tw.rpi.edu/wiki
Google Refine

• A tool for large datasets, cleaning and reconciling


• http://code.google.com/p/google-refine/


• Extremely powerful, though its scripting language is not yet very well
  documented.


• Enables you to reconcile data against the 20 million + known entities in
  Freebase
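To give a feel for the cleaning side, here is a simplified sketch in the spirit of key-collision ("fingerprint") clustering, which groups spellings that reduce to the same normalized key. It is not Refine's actual implementation, and the sample names are made up for illustration.

```python
import string
from collections import defaultdict


def fingerprint(value):
    """Lowercase, strip punctuation, then sort and de-duplicate the words."""
    lowered = value.strip().lower()
    no_punct = lowered.translate(str.maketrans("", "", string.punctuation))
    return " ".join(sorted(set(no_punct.split())))


def cluster(values):
    """Group values sharing a fingerprint; keep groups with 2+ members."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return [group for group in groups.values() if len(group) > 1]


names = [
    "Library of Congress",
    "library of congress.",
    "Congress, Library of",
    "Internet Archive",
]
clusters = cluster(names)
```

Each cluster is a candidate set of variants to merge into one preferred form before reconciling against an external source.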
Sandbox

• Depending on time and interest,
  some possibilities


• Demo Refine, or break into small
  groups to work with datasets


• Look at MQL/SPARQL queries as
  the next step of interacting with
  the Global Graph
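Query languages like SPARQL work by matching triple patterns in which some positions are left as variables. The sketch below imitates that idea in memory; it is an analogy, not a SPARQL engine. The first triple comes from the earlier slide and the second is a hypothetical addition.

```python
triples = [
    ("jonvoss", "knows", "copystar"),
    ("copystar", "knows", "someone_else"),  # hypothetical, for illustration
]


def match(data, s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in data
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


# Roughly analogous to: SELECT ?who WHERE { :jonvoss :knows ?who }
known_by_jon = [obj for _, _, obj in match(triples, s="jonvoss", p="knows")]
```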
What Would You Do?

• Conceptualizing domains, Linked Open Data projects, collaborations, etc
Join the LODLAM movement

• http://groups.google.com/group/lod-lam


• #lodlam hashtag on Twitter


• http://lod-lam.net proceedings online and on the road for the next year at
  various annual meetings and conferences


• Contribute!
Thanks Ethan, MATRIX, Amanda, CHNM, MSU,
@edsu and all y’all.
