Snac webinar v3
Slides about the SNAC project for OAC webinar.

http://www.cdlib.org/services/dsc/webinars/snac/

Published in: Technology, Education
    1. http://socialarchive.iath.virginia.edu/
    2. Discussion Points
       1. Background
       2. Overview of prototype biographical resource and access system
       3. Future directions and open discussion
    3. Context
       • Research and demonstration project
       • Sponsored by NEH
       • Grant term: March 2010 – March 2012
       • Three partner organizations:
         – Institute for Advanced Technology in the Humanities, University of Virginia
         – School of Information, UC Berkeley
         – California Digital Library
    4. Goals
       • Develop tools for extracting EAC-CPF records, drawing on existing data (EAD finding aids/collection guides)
       • Build a large test corpus of EAC-CPF records
       • Create a prototype biographical resource and access system, using those records
    5. What is EAC-CPF?
       • Encoded Archival Context – Corporate Bodies, Persons, and Families
       • Standard for encoding archival authority records:
         – Authorized name headings for the entity
         – Biographical/historical context for the entity
         – Links to resources created by the entity, and about the entity
           • Collections (represented by EAD finding aids)
           • Bibliographic resources, etc.
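To make the three parts of an authority record on slide 5 concrete, here is a minimal sketch in Python that builds a skeletal EAC-CPF document: a name heading, a biographical note, and a link to a finding aid. It is an illustration, not the project's tooling. The function name `build_record` and all sample values are hypothetical, and the output is deliberately incomplete (it omits the `<control>` section and relation attributes, so it would not validate against the schema).

```python
import xml.etree.ElementTree as ET

# EAC-CPF namespace URI.
EAC_NS = "urn:isbn:1-931666-33-4"


def _q(tag):
    """Qualify a tag name with the EAC-CPF namespace."""
    return f"{{{EAC_NS}}}{tag}"


def build_record(entity_type, name, bio_text, finding_aid_url):
    """Build a skeletal, non-validating EAC-CPF record (hypothetical helper)."""
    ET.register_namespace("", EAC_NS)
    root = ET.Element(_q("eac-cpf"))
    desc = ET.SubElement(root, _q("cpfDescription"))

    # 1. Authorized name heading for the entity.
    identity = ET.SubElement(desc, _q("identity"))
    ET.SubElement(identity, _q("entityType")).text = entity_type
    name_entry = ET.SubElement(identity, _q("nameEntry"))
    ET.SubElement(name_entry, _q("part")).text = name

    # 2. Biographical/historical context.
    description = ET.SubElement(desc, _q("description"))
    bio = ET.SubElement(description, _q("biogHist"))
    ET.SubElement(bio, _q("p")).text = bio_text

    # 3. Link to a resource, here a finding aid identified by its URL.
    relations = ET.SubElement(desc, _q("relations"))
    rel = ET.SubElement(relations, _q("resourceRelation"))
    ET.SubElement(rel, _q("relationEntry")).text = finding_aid_url
    return root


if __name__ == "__main__":
    record = build_record(
        "person",
        "Example, Jane, 1900-1980",          # invented sample heading
        "Hypothetical diplomat.",             # invented sample text
        "http://example.org/findaid/1",       # placeholder URL
    )
    print(ET.tostring(record, encoding="unicode"))
```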
    6. EAD and EAC-CPF
    7. EAD and EAC-CPF
    8. EAD and EAC-CPF
    9. EAD and EAC-CPF
    10. A Vision for Integrated Access
    11. A Vision for Integrated Access [diagram label: Freebase]
    12. Data Inputs
        • EAD Finding Aids
          – Online Archive of California [~14,000]
          – Northwest Digital Archives (NWDA) [~5,200]
          – Library of Congress [~900]
          – Virginia Heritage [~8,300]
        • Authority Records
          – Library of Congress: NACO/LCNAF [~4+ million]
          – Getty Vocabulary Program: Union List of Artist Names (ULAN) [~290,000]
          – OCLC Research: Virtual International Authority File (VIAF) [intersection with NACO/LCNAF]
    13. Data Flow [diagram labels: VIAF, ULAN]
    14. Data Flow
        • Extract names from EAD finding aids
          – Creator names (<...name>) with biographical/organizational histories (<bioghist>)
          – Names as subjects (<controlaccess>)
          – Names in correspondence series
        • Normalize and convert into EAC-CPF; retain link back to EAD(s)
        • Match EAC-CPF records against one another and against existing authority records (ULAN, VIAF, LCNAF)
          – Enhance EAC-CPF by normalizing entries, adding alternative entries, titles, languages used, and sex (VIAF), and historical data (ULAN)
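The extraction and normalization steps on slide 14 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's extraction code: it assumes un-namespaced EAD with creator names inside `<origination>` and subject names inside `<controlaccess>`, and assumes the name elements are `persname`, `corpname`, and `famname`. The sample document, the `normalize` rule, and the function names are all hypothetical. Names in correspondence series and the matching step are not covered.

```python
import re
import xml.etree.ElementTree as ET

# Assumed EAD name elements for persons, corporate bodies, and families.
NAME_TAGS = ("persname", "corpname", "famname")

# Invented sample finding aid, for illustration only.
SAMPLE_EAD = """<ead><archdesc>
  <did><origination><persname>Example,  Jane, 1900-1980.</persname></origination></did>
  <bioghist><p>Jane Example was a hypothetical diplomat.</p></bioghist>
  <controlaccess>
    <persname>Sample, John</persname>
    <corpname>Example Trading Company</corpname>
  </controlaccess>
</archdesc></ead>"""


def normalize(name):
    """Collapse whitespace and drop trailing periods or commas.

    A simplistic rule: it would also strip the period from a heading that
    legitimately ends in an abbreviation.
    """
    return re.sub(r"\s+", " ", name).strip().rstrip(".,")


def extract_names(ead_xml):
    """Return (names, biography text) from one EAD document.

    Nested <controlaccess> elements would be counted more than once;
    a real extractor would need to handle that.
    """
    root = ET.fromstring(ead_xml)
    found = []
    for role, container in (("creator", "origination"), ("subject", "controlaccess")):
        for parent in root.iter(container):
            for tag in NAME_TAGS:
                for el in parent.iter(tag):
                    text = "".join(el.itertext())
                    found.append({"role": role, "type": tag, "name": normalize(text)})
    bio = root.find(".//bioghist")
    bio_text = " ".join("".join(bio.itertext()).split()) if bio is not None else None
    return found, bio_text


if __name__ == "__main__":
    names, bio = extract_names(SAMPLE_EAD)
    for entry in names:
        print(entry)
    print(bio)
```

Each extracted entry keeps its role (creator or subject) so that a later conversion step could decide which names receive the biographical history.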
    15-20. Meet the target users
        • Randy: Graduate student working on a PhD that involves biographies and the study of diplomatic families and networks. Sometimes he comes to the site looking for information on specific people; other times he is looking for information on a specific subject or event. He also TAs an undergraduate history class and sometimes has to help students find topics for papers.
        • Connie: Works at an institution that contributed records to the project. Will be asking how this site would be useful to their users. Wants to understand how their records were used and what the added value is.
        • Quincy: Library school student working to QA record matching.
        • Adele: Person doing authority work during collection processing.
        • Lenny: Likes linked data, and wants to be able to mine the links that have been established programmatically.
    21-24. EAC’s Implicit Information Architecture
        • Expose schema’s terminology in user interface
        • Metadata fields / used mostly for facets
        • XTF section types / based on hierarchy of EAC
    25. XTF XSLT Framework
        • pre filter - do special tokenization to create custom EAC facets
        • query parser - CGI params to XTF query XML
        • result formatter - XTF results to HTML
        • doc formatter - EAC-CPF to HTML
        • http://code.google.com/p/xtf-cpf/
    26. Tinkerpop Graph Stack: http://www.tinkerpop.com/
    27. Social graph
        • social graph visualization code at https://code.google.com/p/eac-graph-load/
        • simple JSON access to Tinkerpop graph on the back end, with JavaScript on the front end in the live prototype [graph demo link in prototype]
        • GraphML file with open license should be viewable in other tools
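As an illustration of what a GraphML export of the social graph could look like, here is a minimal sketch using only the Python standard library. It is not the code from eac-graph-load. The function name `to_graphml`, the `name` node attribute, and the sample names are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# GraphML namespace URI.
GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"


def _q(tag):
    """Qualify a tag name with the GraphML namespace."""
    return f"{{{GRAPHML_NS}}}{tag}"


def to_graphml(names, associations):
    """Serialize names as nodes and (name, name) pairs as undirected edges."""
    ET.register_namespace("", GRAPHML_NS)
    root = ET.Element(_q("graphml"))
    # Declare one string attribute on nodes to carry the name heading.
    ET.SubElement(root, _q("key"), {
        "id": "name", "for": "node",
        "attr.name": "name", "attr.type": "string",
    })
    graph = ET.SubElement(root, _q("graph"), {"edgedefault": "undirected"})
    ids = {}
    for i, name in enumerate(names):
        ids[name] = f"n{i}"
        node = ET.SubElement(graph, _q("node"), {"id": ids[name]})
        ET.SubElement(node, _q("data"), {"key": "name"}).text = name
    for a, b in associations:
        ET.SubElement(graph, _q("edge"), {"source": ids[a], "target": ids[b]})
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Invented sample data.
    print(to_graphml(
        ["Example, Jane, 1900-1980", "Sample, John"],
        [("Example, Jane, 1900-1980", "Sample, John")],
    ))
```

A file in this shape can be opened by general-purpose graph tools that read GraphML, which is the portability the slide points to.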
    28. Linked Data / Open Data
        • RDFa
          – owl:sameAs links to VIAF
          – httpRange-14 (XTF URL + “#entity” for the car)
        • HTML5 microdata
          – chronology
        • Future:
          – RDF dump with an open data license, based on Ed Summers’s GraphML-to-RDF Python script
          – links to Wikipedia and other sources
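The “#entity” convention on slide 28 separates the web page about an entity from the entity itself, so that owl:sameAs can be asserted about the entity rather than the page. A minimal sketch of that idea, emitting one N-Triples statement: the function names, the record URL, and the VIAF identifier below are placeholders invented for the example, not real project data.

```python
# owl:sameAs property URI from the OWL vocabulary.
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def entity_uri(page_url):
    """Return the URI for the entity described by a page: page URL + '#entity'."""
    return page_url.split("#", 1)[0] + "#entity"


def same_as_triple(page_url, viaf_uri):
    """Build one N-Triples line linking the entity to its VIAF counterpart."""
    return f"<{entity_uri(page_url)}> <{OWL_SAME_AS}> <{viaf_uri}> ."


if __name__ == "__main__":
    # Both URLs are placeholders for illustration.
    print(same_as_triple(
        "http://example.org/xtf/view?docId=example-record.xml",
        "http://viaf.org/viaf/0000000",
    ))
```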
    29. Demo: http://socialarchive.iath.virginia.edu/xtf/search
    30. Future Directions?
        • From research and demonstration to longer-term resource?
        • Integration of merged data back into EAD access systems?
        • Distributed cooperative archival authority control that is crowd-sourced by researchers and curated by archivists?
        • Scale up EAD data sources?
        • More links to external resources (Wikipedia, WorldCat Identities, openURLs)?
        • Social network visualizations/interactive navigation?
        • Unique identifiers for EAC-CPF records (ORCID, ISNI, ARK)?
        • Standardized name entries for source repositories contributing EAC-CPF records?
    31. Questions? http://socialarchive.iath.virginia.edu/
    32. Creative Commons Credit
        • http://www.flickr.com/photos/danja/2949957005/
        • photo by Danny Ayers