Linked Open Data in Libraries, Archives and Museums
Jon Voss – July 12, 2011 – NYPL Labs
http://www.flickr.com/photos/pict_u_re/2372235999

Linked Open Data
in
Libraries, Archives & Museums

New York Public Library                                         @jonvoss
July 12, 2011

Jon Voss
Historypin Strategic Partnerships Director
jon.voss@wearewhatwedo.org
Welcome
•   Goal: a solid, basic, conceptual understanding of Linked
    Open Data

•   A chance to collaborate with others, share knowledge,
    expertise, perspective; explore ideas
Linked Open Data in Cultural Context


•   It’s not just Libraries, Archives & Museums

•   http://mashupbreakdown.com/

•   Linked Open Data has
    evolved in the cultural
    context of shared
    information, music, movies

•   From rock to rap to hip-hop
    to mashups

•   Changing expectations from
    audiences, curators,
    technologists
History & Mashup Culture




                      +




2010 National Archives Photo Contest

 http://www.flickr.com/photos/37377809@N00/5304492185/in/pool-1633053@N21/
2009: Linked Open Data

photos by PhOtOnQuAnTiQuE, TED
Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/
LODLAM is a Growing Movement


•   in its infancy, but picking up steam

•   it requires experimentation

•   small, niche, domain-specific implementations

•   use cases, reasons for content providers to get excited about
    contributing
LODLAM is a product of our increasingly
            connected culture.

•   it’s an unfolding story, but it’s awn...

•   first funded projects in the US exploring Linked Open Data in the
    humanities now underway: http://lod-lam.net

    •   June 2-3, 100 people gathered from around the world to
        forward LODLAM in the next year
LODLAM is a product of our increasingly
        connected culture.

•   and that’s just the beginning...




                           Linked      Open   Data
Linked




         http://openlibrary.org/works/OL6048721W/Linked
Going from Tables to Graphs




 http://www.flickr.com/photos/thomasjwoods-com/2264301251
Going from Tables to Graphs

 •   nodes and links in a graph
Going from Tables to Graphs

•   As computing power increases, the ability to build
    more and more complex graphs becomes a reality.

•   Human vs. Machine readable


                              msulibraries      lookbackmaps
                              msulibraries      internetarchive
                              msulibraries      librarycongress
                              lookbackmaps      internetarchive
                              internetarchive   librarycongress
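
The pairs listed above are a two-column table, but they can also be read as links in a graph. A minimal Python sketch, assuming the five pairs are undirected links between the named accounts (the slide does not say what the links represent); the function name build_graph is made up for illustration:

```python
from collections import defaultdict

# The five pairs from the slide, treated as undirected links (an assumption).
links = [
    ("msulibraries", "lookbackmaps"),
    ("msulibraries", "internetarchive"),
    ("msulibraries", "librarycongress"),
    ("lookbackmaps", "internetarchive"),
    ("internetarchive", "librarycongress"),
]


def build_graph(pairs):
    """Turn a two-column table of pairs into a node -> neighbours map."""
    graph = defaultdict(set)
    for a, b in pairs:
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)


graph = build_graph(links)
for node in sorted(graph):
    print(node, "->", sorted(graph[node]))
```

The table stores one row per link; the graph view answers "what is this node connected to?" directly, which is the question linking across datasets keeps asking.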
Introducing Triples

                      Nodes & Links

            jonvoss  --follows-->  NYPL_Labs




•   Quite simply: Subject, Predicate, Object

•   gives us the ability to describe entities in a way that is
    machine readable
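
To make the Subject, Predicate, Object idea concrete, here is a small Python sketch. The Triple type and the objects_of helper are illustrative names, not part of any standard or library, and the plain strings stand in for the URIs a real dataset would use:

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One statement: a subject, a predicate (the link), and an object."""
    subject: str
    predicate: str
    obj: str


# The statement from the slide: jonvoss follows NYPL_Labs.
triples = [Triple("jonvoss", "follows", "NYPL_Labs")]


def objects_of(statements, subject, predicate):
    """Return every object linked from `subject` by `predicate`."""
    return [
        t.obj
        for t in statements
        if t.subject == subject and t.predicate == predicate
    ]


print(objects_of(triples, "jonvoss", "follows"))
```

Because every statement has the same three-part shape, a program can answer questions about the data without knowing anything about the subject matter in advance.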
What do we know about the person: Ed
 Summers (aside from the fact that he
              rocks)?
Bio: Hacker for libraries, digital archaeologist, pragmatist.

[Graph diagram; link labels: bio, depiction of, knows]

http://inkdroid.org/ehs.rdf
Triples for machines
•   triples in the Resource Description Framework (RDF)
    can be serialized in many different ways,
    including RDF/XML, RDFa, N3, Turtle, etc.,
    but they all describe things in the
    <subject> <predicate> <object> format.

•   of course, we need to be consistent and
    predictable for machines to understand us.
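
As one example of a serialization, the sketch below writes a triple as a line of N-Triples, a simple line-based RDF format. It is a minimal illustration only: the example.org URIs are invented placeholders, the helper name is hypothetical, and the literal escaping covers just backslashes, quotes, and line breaks.

```python
def to_ntriples_line(subject, predicate, obj, obj_is_literal=False):
    """Format one triple as an N-Triples line.

    Subject and predicate are URIs; the object is a URI unless
    obj_is_literal is True, in which case it is written as a quoted string.
    """
    if obj_is_literal:
        escaped = (
            obj.replace("\\", "\\\\")
            .replace('"', '\\"')
            .replace("\n", "\\n")
            .replace("\r", "\\r")
        )
        obj_term = f'"{escaped}"'
    else:
        obj_term = f"<{obj}>"
    return f"<{subject}> <{predicate}> {obj_term} ."


# Placeholder URIs (assumptions, not real identifiers).
line = to_ntriples_line(
    "http://example.org/people/jonvoss",
    "http://example.org/vocab/follows",
    "http://example.org/orgs/NYPL_Labs",
)
print(line)
```

The same three parts could be written out as RDF/XML or Turtle instead; only the syntax changes, not the statement.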
•   we’re almost ready to talk to machines




                              http://www.flickr.com/photos/oface/3306994117/
http://inkdroid.org/ehs.rdf
•   consider graph demo: http://civilwardata150.net

•   Civil War vocabulary, or a way to link and traverse
    across datasets

    •   Regiments, battles, Freebase military schema

•   Building apps

    •   How tools like Simile/Exhibit can use Linked
        Data in coordination with Freebase (Conflict
        History: http://conflicthistory.com/#/period/
Now that we can see the code...


•   Books

•   Photos

•   Information
Tim Berners-Lee’s 4 rules of Linked Data

• Use URIs as names for things
• Use HTTP URIs so that people can look up those
 names.
• When someone looks up a URI, provide useful
 information, using the standards (RDF*, SPARQL)
• Include links to other URIs, so that they can discover
 more things.


http://www.w3.org/DesignIssues/LinkedData.html
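
Rules two and three describe looking up an HTTP URI and getting useful, standards-based information back. A hedged Python sketch of the client side: it only prepares a request that asks for RDF/XML and does not send it. The helper name is hypothetical, and whether any particular server honors the Accept header is an assumption the slides do not settle.

```python
import urllib.request

RDF_XML = "application/rdf+xml"  # registered media type for RDF/XML


def build_lookup_request(uri, media_type=RDF_XML):
    """Prepare (but do not send) an HTTP GET that asks for RDF, not HTML."""
    return urllib.request.Request(uri, headers={"Accept": media_type})


# The URI is the one shown on the earlier slide about Ed Summers.
request = build_lookup_request("http://inkdroid.org/ehs.rdf")
print(request.get_method(), request.full_url, request.get_header("Accept"))
```

Passing the prepared request to urllib.request.urlopen would perform the actual lookup; the response could then be parsed for further URIs to follow, which is rule four.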
Tim Berners-Lee: 5 Stars of Linked Data

•   More thanks to Ed Summers: http://inkdroid.org/
    journal/2010/06/04/the-5-stars-of-open-linked-
    data/

•   This is NOT all or nothing
A cautionary word about vocabularies




                   http://www.flickr.com/photos/sillygwailo/272291003/
A cautionary word about vocabularies

•   Caution: what libraries call vocabularies is not
    necessarily what we mean...

•   This is how we organize information and
    triangulate the data we’re looking for

•   How we agree on predicates

•   Ontologies and authority files like FOAF, OWL,
    http://id.loc.gov/, VIAF, etc.
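
Agreeing on predicates in practice means agreeing on URIs. The sketch below expands short prefixed names into full predicate URIs using the FOAF namespace; the expand helper and the PREFIXES table are illustrative names, not a library API.

```python
# Prefix table: short label -> namespace URI. FOAF's namespace is
# http://xmlns.com/foaf/0.1/; further vocabularies could be added here.
PREFIXES = {"foaf": "http://xmlns.com/foaf/0.1/"}


def expand(prefixed_name, prefixes=PREFIXES):
    """Expand a name like 'foaf:knows' into a full URI."""
    prefix, sep, local = prefixed_name.partition(":")
    if not sep or prefix not in prefixes:
        raise KeyError(f"unknown prefix in {prefixed_name!r}")
    return prefixes[prefix] + local


print(expand("foaf:knows"))
print(expand("foaf:depiction"))
```

Two datasets that both use the same expanded URI for "knows" can be merged and queried together, which is what a shared vocabulary buys.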
In summary                    Linked




• Graphs
• Human AND Machine readable
• Vocabulary, agreed terms for organizing info
• Triples, RDF
The “Open” part of Linked
      Open Data                                  Open




•   Considerations and ramifications

•   Difference between shared, published, open

•   Legal tools

•   Precedents/Examples
Expose yourself, be vulnerable
•   This is the major cultural shift, a rising tide
    among institutions: data wants to be free in
    a culture economy.

•   There is value in sharing

•   It does require a leap of faith, but risks and
    rewards should be carefully considered and
    calculated

•   Excellent resource: JISC Open Bibliographic Data
    Guide http://obd.jisc.ac.uk/
What will happen to your data?

•   If you want people to do something with your
    data/metadata, you have to put it out there

•   But once you do, it’s [mostly] out of your control.
    Yet it can be a part of something much greater
    than any of the component parts

•   Roots and Wings

•   Lessig: Humility of the Web
What will happen to your data?
•   working with
    Open Data
    from NOAA
    at wherecamp
    2011.




                       http://www.nauticalcharts.noaa.gov/history/CivilWar/
Metadata vs. data, assets, digital
         surrogates


•   A key conceptual shift with Open Data is
    looking at metadata and data as two separate
    things, that can have different licensing and
    permissions
http://www.loc.gov/pictures/collection/cwp/item/2003653763/
http://www.loc.gov/pictures/item/2003653763/marc/
What are the legal tools for
 publishing Open Data?
Legal Tools

•    http://creativecommons.org/licenses/

•    http://www.opendatacommons.org/licenses/


Open Data                                      Published Data

CC-BY                                          CC-BY-NC-ND
CC0                                            CC-BY-NC
Public Domain Mark                             CC-BY-ND
Public Domain Dedication and License (PDDL)    CC-BY-SA
Attribution License (ODC-By)                   CC-BY-NC-SA
Open Database License (ODC-ODbL)
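
The two groupings on this slide can be written down as a simple lookup. This Python sketch follows the slide's own columns exactly; it is not legal advice, and the classify helper is a made-up name.

```python
# Groupings copied from the slide's two columns.
OPEN_DATA = {
    "CC-BY", "CC0", "Public Domain Mark", "PDDL", "ODC-By", "ODC-ODbL",
}
PUBLISHED_DATA = {
    "CC-BY-NC-ND", "CC-BY-NC", "CC-BY-ND", "CC-BY-SA", "CC-BY-NC-SA",
}


def classify(license_name):
    """Return the slide's column for a license, or 'unknown'."""
    if license_name in OPEN_DATA:
        return "open"
    if license_name in PUBLISHED_DATA:
        return "published"
    return "unknown"


for name in ("CC0", "CC-BY-NC", "All rights reserved"):
    print(name, "->", classify(name))
```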
Concerns and Limitations
•   There is some argument about whether or not
    metadata can be protected under copyright at all.
    Copyright protects a creative work, and some
    argue that metadata is scientific fact, rather than
    creative work.

•   Databases are protected differently in the EU and
    US, for example.

•   Public Domain and No Known Copyright...

•   Issuing blanket copyright over all works on a
    website, even though some may be in the public
    domain
Examples and precedents

•   Bibliographic data:

    •   British Library (CC0), University of Michigan
        (CC0), Stanford (CC-BY) have published large,
        raw datasets of bibliographic data they have
        created (being careful not to publish OCLC or
        other vendor controlled or licensed metadata)
Examples and precedents

•   Civil War Data 150

    •   Metadata from contributing federal
        institutions are largely considered to be Public
        Domain.

    •   State, local, university & individual researchers
        are considering policies for metadata
        publishing on a case by case basis.
Examples and precedents




http://googleancientplaces.wordpress.com/public-domain/
Sciences leading the way vs. Humanities

•   In the sciences, there have been a lot of advances
    in the realm of Open Data, which will provide
    models for humanities research as well

    •   Nano Publishing: the idea of publishing
        datasets separately from research findings, so
        that they can more easily be built upon and
        integrated into other datasets. Several scientific
        journals have already started doing this.

    •   Federally funded medical research must have a
        data management plan and some funders are
        requiring that data be published separately from
        analysis and findings as Open Data
In summary            Open




• put it out there...
• published, shared, and/or open
• tools
• metadata vs. assets
Google Refine



•   A tool for large datasets, cleaning and reconciling

•   http://code.google.com/p/google-refine/

•   Extremely powerful, though scripting language has
    not yet been very well documented.

•   Enables you to reconcile data against the 20 million
    + known entities in Freebase
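
Reconciling means matching messy local values against a list of known entities. The Python sketch below shows the idea with exact matching on normalized names. It is not Google Refine's algorithm or scripting language, and the entity list and "ex:" identifiers are invented for illustration.

```python
import re


def normalize(name):
    """Lowercase, drop punctuation, and collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]", "", name.lower())
    return " ".join(cleaned.split())


def reconcile(values, known_entities):
    """Map each raw value to a known entity id, or None if no match."""
    index = {normalize(label): ident for label, ident in known_entities.items()}
    return {value: index.get(normalize(value)) for value in values}


# Hypothetical stand-in for a large entity database.
known = {
    "Library of Congress": "ex:library_of_congress",
    "Internet Archive": "ex:internet_archive",
}

raw = ["library of congress.", "  Internet  Archive", "Unknown Org"]
for value, ident in reconcile(raw, known).items():
    print(repr(value), "->", ident)
```

A real reconciliation service would also handle near matches and ambiguous names; unmatched values are returned as None here so they can be reviewed by hand.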
What Would You Do?


• Conceptualizing domains, Linked Open
  Data projects, collaborations, etc
Join the LODLAM movement


• #lodlam hashtag on Twitter
• http://groups.google.com/group/lod-lam
• http://lod-lam.net proceedings online and
  on the road for the next year at various
  annual meetings and conferences
• Contribute!
Thanks
          @NYPL_Labs Team
           @edsu & crew
Sloan Foundation, NEH, Internet Archive
              Historypin

               & all y’all.

Editor's Notes

  • #6 Exploring history on mobile apps.
  • #7 People much smarter than I were already on it. Earlier in 2009, the father of the World Wide Web, Tim Berners-Lee, was taking his message of Linked Open Data to the streets. How we can build a web of data... sounds familiar... and it seems to have worked out the first time... From a web of documents, to a web of data.
  • #8 And that web of data is already growing rapidly...
  • #9 What if we begin to apply this to the vast amounts of data at libraries, archives, and museums?
  • #13 It started for me with the book Linked, which was first published in 2002. I don’t think I read it until 2003 or so, but it changed my life. The explanations of mathematical graph and network theory in lay terms helped me to see how an understanding of interconnectedness would allow us to do amazing things with the disparate datasets around us.
  • #14 Our data and databases have been organized in tables, which works, but only to a point.
  • #15 The World Wide Web is much more like a graph, and the ability to link to disparate datasets relies on our ability to understand data as nodes and links in a graph.
  • #21 Where did we get all that info about Ed? He published it here.
  • #42 In the last several years, Creative Commons has provided standardized, portable legal tools that are easy for individuals and institutions to use. Also see the licenses by the Open Knowledge Foundation, designed for databases.