DLF 2012

Denver, Colorado
Sunday, November 4, 2012
Adrian Turner, California Digital Library
Ray R. Larson, School of Information, UC Berkeley
Brian Tingle, California Digital Library

http://www.diglib.org/forums/2012forum/social-networks-and-archival-context-project/

Transcript

  • 1. Ray R. Larson, School of Information, UC Berkeley; Brian Tingle, California Digital Library; Adrian Turner, California Digital Library. 2012 DLF Forum | Denver, CO
  • 2. http://socialarchive.iath.virginia.edu
  • 3. Archival Name Authority System
  • 4. [Graphic: name headings clustered around an "Archival Name Authority System", e.g. Hamilton, Alexander, 1757-1804; Patton, George S. (George Smith), 1885-1945; Luce, Clare Boothe, 1903-1987; Oppenheimer, J. Robert, 1904-1967; Sontag, Susan, 1933-2004; Washington, George, 1732-1799; Whitman, Walt, 1819-1892; Wright, Lloyd, 1890-1978; Patton family]
  • 5. [Graphic: the same cloud with further names added, e.g. Franklin, Benjamin, 1706-1790; Anthony, Susan B.; Fuller, R. Buckminster (Richard Buckminster), 1895-1983; Bernstein, Leonard; Block, Herbert, 1909-2001; Bush, Vannevar, 1890-1974; Frankfurter, Felix, 1882-1965; Berkeley Free Church]
  • 6. [Graphic: the cloud crowded with many additional personal names, e.g. Engelland, Jurgen (George); Enwall, Ogie (Aage); Holmes, Anna Gudrun Hauge; Howard, Barnett Allen, b. 1827; Gillam, Chandler B., 1833-1899; Henry, Oscar M., 1851-1916; Rodney family]
  • 7. [Graphic: the crowded cloud with many of the same names repeated several times]
  • 8-10. Archival Name Authority System
  • 11. Background • Research and demonstration project • Multi-year funding – National Endowment for the Humanities (2010-2012) – Andrew W. Mellon Foundation (2012-2014)
  • 12. Objectives 1. Develop tools for extracting EAC-CPF records, drawing on existing data (EAD finding aids, MARC records) 2. Match, merge, and enhance; build a large test corpus of EAC-CPF records 3. Create a prototype biographical resource and access system, using those records
  • 15. Project Team • University of Virginia, Institute for Advanced Technology in the Humanities – Daniel Pitti (PI) and Worthy Martin • UC Berkeley School of Information – Ray Larson and Yiming Liu • California Digital Library – Rachael Hu, Brian Tingle, and Adrian Turner
  • 16. Project Team • Terry Catapano (Columbia University) • Sara Sprenkle (Washington and Lee University) • Sarah Wells (University of Virginia) • Kathy Wisser (Simmons Graduate School of Library and Information Science) • Tom Lynch (University of Illinois School of Library and Information Science)
  • 17. EAC-CPF • XML-based data structure standard for encoding archival authority records • Authorized name headings for the entity • Biographical/historical context for the entity • Links to resources created by the entity • Links to resources about the entity
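To make the four components of an EAC-CPF record concrete, here is a minimal Python sketch that assembles a simplified EAC-CPF-style record with the standard library's `xml.etree.ElementTree`. The element names follow EAC-CPF (`control`, `recordId`, `cpfDescription`, `identity`, `nameEntry`, `biogHist`, `relations`, `resourceRelation`), but the record omits the namespace and several required control elements, so treat it as an illustration that will not validate against the schema. The helper `build_eac_cpf` and the record ID are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_eac_cpf(record_id, name, entity_type, bio, created, about):
    """Hypothetical helper: build a simplified, non-validating EAC-CPF-style record."""
    root = ET.Element("eac-cpf")

    # Control section: identifies the record itself.
    control = ET.SubElement(root, "control")
    ET.SubElement(control, "recordId").text = record_id

    desc = ET.SubElement(root, "cpfDescription")

    # Identity: entity type plus the authorized name heading.
    identity = ET.SubElement(desc, "identity")
    ET.SubElement(identity, "entityType").text = entity_type
    name_entry = ET.SubElement(identity, "nameEntry")
    ET.SubElement(name_entry, "part").text = name

    # Description: biographical/historical context as prose.
    description = ET.SubElement(desc, "description")
    biog = ET.SubElement(description, "biogHist")
    ET.SubElement(biog, "p").text = bio

    # Relations: links to resources created by the entity, or about it.
    relations = ET.SubElement(desc, "relations")
    for rel_type, titles in (("creatorOf", created), ("subjectOf", about)):
        for title in titles:
            rel = ET.SubElement(relations, "resourceRelation", resourceRelationType=rel_type)
            ET.SubElement(rel, "relationEntry").text = title
    return root


record = build_eac_cpf(
    "example-0001",  # hypothetical record ID
    "Warren, John Byrne Leicester, 1835-1895",
    "person",
    "Poet; 3rd and last Baron de Tabley.",
    created=["Tabley Muniments"],
    about=[],
)
print(ET.tostring(record, encoding="unicode"))
```

The "created by" and "about" links map onto the `creatorOf` and `subjectOf` relation types, which is what lets an access system show both kinds of resource for one entity.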
  • 21. Data Sources • EAD finding aids [~150,000] – 13 regional and statewide consortia – 35 repositories in the US, UK, and France; multiple US federal agencies • MARC21 records [~1.5 million] – OCLC WorldCat • Authority records – OCLC Research: Virtual International Authority File (VIAF) [~16 million] – Getty Vocabulary Program: Union List of Artist Names (ULAN) [~120,000] – Additional name records from Archives nationales, British Library, NARA, New York State Archives, and Smithsonian Institution Archives
  • 22. Consortia and individual institutions
    Consortia: Archives Florida; ArchivesHub (UK); Arizona Archives Online; EAD FACTORY (OhioLink); Five Colleges; Maine Archival Collections Online (MACON); Northwest Digital Archives (NWDA); Online Archive of California; Philadelphia Area Consortium of Special Collections Libraries (PACSCL); Rhode Island Archival & Manuscript Collections Online (RIAMCO); Rocky Mountain Online Archive (RMOA); Texas Archival Resources Online (TARO); Virginia Heritage
    Individual institutions: American Philosophical Society; Archives nationales (France); Archives of American Art; Bibliothèque nationale de France; BnF Archives et manuscrits; French Union Catalog; Brigham Young University; Church of Latter Day Saints Archives; Columbia University; Cornell University; Duke University; Harvard University; Indiana University; Library of Congress (publicly available without restriction); Minnesota Historical Society; Massachusetts Institute of Technology; National Library of Medicine; New York Public Library; New York University; North Carolina State; Northwestern University; Princeton University; Rutgers University; Smithsonian Institution Archives; Syracuse University; University of Alabama; University of Chicago; University of Connecticut; University of Delaware; University of Florida; University of Illinois; University of Kansas; University of Maryland; University of Michigan Bentley & Special Collections; University of Minnesota; University of Nebraska; University of North Carolina, Chapel Hill; University of Utah; Utah State Archives; Utah State University; Yale University
  • 25. Prototype Access System • http://socialarchive.iath.virginia.edu/
  • 26. SNAC: Social Networks and Archival Context
  • 28. NAAC: National Archival Authorities Cooperative • http://socialarchive.iath.virginia.edu/NAAC_index.html
  • 30. Activities 1. Cultivate EAC-CPF expertise across the archival community through 140 SAA community-hosted workshops 2. Develop a blueprint for a sustainable, national archival authority cooperative. Stay tuned for fall 2013!
  • 34. Brian Tingle and Adrian Turner, RBMS Pre-Conference 2012, San Diego, CA
  • 35. The Social Networks and Archival Context Project: Status Report. Adrian Turner*, Ray R. Larson**, Brian Tingle* (*California Digital Library; **University of California, Berkeley, School of Information). Thanks to Daniel V. Pitti of the Institute for Advanced Technology in the Humanities, University of Virginia, and Brian Tingle of the California Digital Library for many of the slides here. (DLF 2012 - Denver 2012-11-04)
  • 36. Funding and People • Funding and Timeline – National Endowment for the Humanities, May 2010-April 2012 – Andrew W. Mellon Foundation, May 2012-April 2014 • People – Daniel Pitti (PI) and Worthy Martin (Institute for Advanced Technology in the Humanities, University of Virginia) – Adrian Turner and Brian Tingle (California Digital Library, University of California) – Ray Larson (School of Information, University of California, Berkeley)
  • 37. Two Interrelated Project Goals • Further the transformation of archival description (describing records separately from the people documented in them) in order to … • Enhance access to archival resources, and in fact to all cultural heritage resources • Enhance understanding of resources by providing the social-professional context within which people lived and worked
  • 38. The Source Data • EAD-encoded finding aids (guides to archival records) – 150K – Primarily from U.S. sources, but also U.K. and France • Archival authority records (360K) – National Archives and Records Administration – State Archive of New York – Smithsonian Institution – British Library – National Archives (France) & BnF • WorldCat Archival Descriptions: 2M
  • 39. Library and Museum Authority Records • Getty Vocabulary Program: Union List of Artist Names (293K personal and corporate names) • Virtual International Authority File (16M+ cluster records) – Contributed from around the world by national libraries and others
  • 40. Methods and Processing • Extract EAC-CPF records from existing EAD-encoded archival descriptions – Extracting both creators and referenced CPF names • Match EAC-CPF records against one another and against existing authority records (ULAN, VIAF, LCNAF) – Enhance EAC-CPF by normalizing entries, adding alternative entries, titles (VIAF), and historical data (ULAN) • Create a prototype historical resource and access system – Historical data and social-professional networks – Links to archive, library, and museum resources (by and about)
  • 41. Example EAD Record (Hub)
      <EAD>
        <EADHEADER LANGENCODING="ISO 639">
          <EADID>GB 0133 TAB</EADID>
          <FILEDESC>
            <TITLESTMT>
              <TITLEPROPER>Tabley Muniments</TITLEPROPER>
            </TITLESTMT>
            <PUBLICATIONSTMT>
              <PUBLISHER>John Rylands University Library of Manchester</PUBLISHER>
              <ADDRESS>
                <ADDRESSLINE>150 Deansgate</ADDRESSLINE>
                <ADDRESSLINE>Manchester</ADDRESSLINE>
                <ADDRESSLINE>... (Parts removed) ...
        </FRONTMATTER>
        <ARCHDESC LEVEL="FONDS" LANGMATERIAL="English">
          <DID>
            <REPOSITORY>University of Manchester, John Rylands University Library of Manchester</REPOSITORY>
            <UNITID ENCODINGANALOG="ISADG3.1.1." COUNTRYCODE="GB" REPOSITORYCODE="0133">GB 0133 TAB</UNITID>
            <UNITTITLE LABEL="Title" ENCODINGANALOG="ISADG3.1.2.">Tabley Muniments</UNITTITLE>
            <UNITDATE LABEL="Dates of Creation" ENCODINGANALOG="ISADG3.1.3.">19th century</UNITDATE>
            <PHYSDESC LABEL="Extent" ENCODINGANALOG="ISADG3.1.5.">
              <EXTENT>1.24 cu.m</EXTENT>
            </PHYSDESC>
            <ORIGINATION LABEL="Creator" ENCODINGANALOG="ISADG3.2.1.">
              <FAMNAME SOURCE="NCARULES">Warren, family, of Tabley, Cheshire</FAMNAME>
              <PERSNAME SOURCE="NCARULES">Warren, John Byrne Leicester, 1835-1895, 3rd Baron de Tabley, poet</PERSNAME>
            </ORIGINATION>
          </DID>
  • 42. Example EAD Record (Hub)
          <BIOGHIST ENCODINGANALOG="ISADG3.2.2.">
            <HEAD>Administrative/Biographical History</HEAD>
            <P>The poet John Byrne Leicester Warren, later 3rd and last Baron de Tabley, of Tabley near Knutsford, Cheshire, was born in 1835, the son of the 2nd Baron de Tabley (1811-1887), and his wife, Catherina. His mother was Italian, the daughter of the count de Soglio, and Warren spent much of his early childhood with her in Italy and Greece. He was educated at Eton and Christ Church, Oxford. At Oxford he published a volume of poetry. Originally he published under the pseudonyms George F. Preston (1859-1862) and William Lancaster (1863-1868), but latterly under his own name.</P>
            <P>His early verse included <TITLE>Praeterita</TITLE> (1863), <TITLE>Eclogues and Monodramas</TITLE> (1864), <TITLE>Studies in Verse</TITLE> (1865), <TITLE>Philocletes</TITLE> (1866), and <TITLE>Orestes</TITLE> (1868). His early work was Tennysonian in style, but he was later to be influenced by both Browning and Swinburne. In 1873 he produced … (some data removed) …
  • 43. Example EAD Record (Hub)
          <SCOPECONTENT ENCODINGANALOG="ISADG3.3.1.">
            <HEAD>Scope and Content</HEAD>
            <P>The collection consists mainly of the personal papers of the 3rd Baron de Tabley. The papers reflect his interests in literature, politics, botany and numismatics and include correspondence with numerous prominent later Victorian figures. Attention should also be drawn to de Tabley’s extensive and important collection of armorial bookplates.</P>
            <P>Correspondents include Sir Mountstuart Grant Duff, Edmund Gosse, Lord Houghton, A.C.Benson, and Robert Bridges. There are volumes of Tableys essays and verse, as well as a considerable number of notebooks and loose manuscripts of verse and other writings. There are various bundles and boxes relating to &quot;Coins&quot;, &quot;Botany&quot;, &quot;Poetry&quot;, &quot;Literary&quot;, &quot;Financial&quot; and bookplates.</P>
          </SCOPECONTENT>
          <ADD>
            <OTHERFINDAID ENCODINGANALOG="ISADG3.4.6.">
              <P>Preliminary survey list.</P>
            </OTHERFINDAID>
            <RELATEDMATERIAL ENCODINGANALOG="ISADG3.5.3.">
              <P>There is correspondence with the 3rd Baron de Tabley among the Edward Freeman Papers, held at JRULM. The Library also has custody of the important Tabley Book Collection.</P>
            </RELATEDMATERIAL>
            <SEPARATEDMATERIAL>
              <P>The family and estate papers of the Leicester-Warren Family of Tabley are held by Cheshire Record Office. Some of these papers were originally in the custody of the John Rylands University Library of Manchester.</P>
            </SEPARATEDMATERIAL>
          </ADD>
  • 44. Example EAD Record (Hub)
          <CONTROLACCESS>
            <HEAD>Index terms</HEAD>
            <PERSNAME SOURCE="NCARULES"><EMPH ALTRENDER="surname">Milnes</EMPH><EMPH ALTRENDER="forename">Richard Monckton</EMPH><EMPH ALTRENDER="dates">1809-1885</EMPH><EMPH ALTRENDER="epithet">1st Baron Houghton</EMPH></PERSNAME>
            <PERSNAME SOURCE="NCARULES"><EMPH ALTRENDER="surname">Benson</EMPH><EMPH ALTRENDER="forename">Arthur Christopher</EMPH><EMPH ALTRENDER="dates">1862-1923</EMPH></PERSNAME>
            <PERSNAME SOURCE="NCARULES"><EMPH ALTRENDER="surname">Bridges</EMPH><EMPH ALTRENDER="forename">Robert Seymour</EMPH><EMPH ALTRENDER="dates">1844-1930</EMPH></PERSNAME>
            <PERSNAME SOURCE="NCARULES"><EMPH ALTRENDER="surname">Duff</EMPH><EMPH ALTRENDER="title">Sir</EMPH><EMPH ALTRENDER="forename">Mountstuart Elphinstone Grant</EMPH><EMPH ALTRENDER="dates">1829-1906</EMPH><EMPH ALTRENDER="epithet">Knight</EMPH></PERSNAME>
            <PERSNAME SOURCE="NCARULES"><EMPH ALTRENDER="surname">Gosse</EMPH><EMPH ALTRENDER="title">Sir</EMPH><EMPH ALTRENDER="forename">Edmund William</EMPH><EMPH ALTRENDER="dates">1849-1928</EMPH><EMPH ALTRENDER="epithet">Knight</EMPH></PERSNAME>
            <GEOGNAME SOURCE="NCARULES"><EMPH ALTRENDER="a">Tabley Inferior</EMPH><EMPH ALTRENDER="a-">Cheshire SJ7378</EMPH></GEOGNAME>
            <SUBJECT SOURCE="LCSH"><EMPH ALTRENDER="a">Bookplates</EMPH></SUBJECT>
            <SUBJECT SOURCE="LCSH"><EMPH ALTRENDER="a">Botany</EMPH></SUBJECT>
            <SUBJECT SOURCE="LCSH"><EMPH ALTRENDER="a">Numismatics</EMPH></SUBJECT>
            <SUBJECT SOURCE="LCSH"><EMPH ALTRENDER="a-">Poetry</EMPH><EMPH ALTRENDER="a">Modern</EMPH><EMPH ALTRENDER="y">19th century</EMPH></SUBJECT>
          </CONTROLACCESS>
        </ARCHDESC>
      </EAD>
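The extraction step named in the methods slide (creators plus referenced CPF names) can be sketched against a trimmed copy of this record. This is not the project's extraction code: it assumes the uppercase, namespace-free tag names shown in the example, treats `ORIGINATION` children as creators and `CONTROLACCESS` children as referenced names, and joins `EMPH` sub-parts with commas. The function names `name_text` and `extract_names` are hypothetical.

```python
import xml.etree.ElementTree as ET

# A trimmed copy of the Hub example above (most elements omitted).
SAMPLE_EAD = """
<EAD><ARCHDESC LEVEL="FONDS"><DID>
  <ORIGINATION LABEL="Creator">
    <FAMNAME SOURCE="NCARULES">Warren, family, of Tabley, Cheshire</FAMNAME>
    <PERSNAME SOURCE="NCARULES">Warren, John Byrne Leicester, 1835-1895, 3rd Baron de Tabley, poet</PERSNAME>
  </ORIGINATION>
</DID>
<CONTROLACCESS>
  <HEAD>Index terms</HEAD>
  <PERSNAME SOURCE="NCARULES"><EMPH ALTRENDER="surname">Milnes</EMPH><EMPH ALTRENDER="forename">Richard Monckton</EMPH><EMPH ALTRENDER="dates">1809-1885</EMPH><EMPH ALTRENDER="epithet">1st Baron Houghton</EMPH></PERSNAME>
  <SUBJECT SOURCE="LCSH"><EMPH ALTRENDER="a">Bookplates</EMPH></SUBJECT>
</CONTROLACCESS></ARCHDESC></EAD>
"""

# EAD name elements and the EAC-CPF entity type each would become.
NAME_TAGS = {"PERSNAME": "person", "FAMNAME": "family", "CORPNAME": "corporateBody"}


def name_text(elem):
    """Join EMPH sub-parts with commas; otherwise use the element's own text."""
    parts = [e.text.strip() for e in elem.findall("EMPH") if e.text]
    return ", ".join(parts) if parts else (elem.text or "").strip()


def extract_names(ead_xml):
    """Return (role, entity_type, name) tuples for creators and referenced names."""
    root = ET.fromstring(ead_xml.strip())
    found = []
    for section, role in (("ORIGINATION", "creator"), ("CONTROLACCESS", "referenced")):
        for container in root.iter(section):
            for child in container:
                if child.tag in NAME_TAGS:  # skips HEAD, SUBJECT, GEOGNAME
                    found.append((role, NAME_TAGS[child.tag], name_text(child)))
    return found


for row in extract_names(SAMPLE_EAD):
    print(row)
```

Each tuple would seed one EAC-CPF record; the `creator` role corresponds to a `creatorOf` link back to the finding aid and the `referenced` role to a looser association.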
  • 45. 2010-2012 Extraction Results • Source data: 30,000 finding aids • EAC-CPF records extracted – LoC (Library of Congress): 43,702 from 1,159 finding aids – OAC (Online Archive of California): 91,811 from ~15,400 – NWDA (Northwest Digital Archives): 22,609 from 5,160 – VH (Virginia Heritage): 15,175 from 8,390 – Total: 173,297
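The per-source figures on this slide can be checked with a few lines of Python. The sketch re-adds the counts and computes the average number of EAC-CPF records extracted per finding aid; it treats OAC's "~15,400" as exactly 15,400, which is an approximation.

```python
# Figures copied from the 2010-2012 extraction results slide:
# source -> (EAC-CPF records extracted, finding aids processed).
# OAC's finding-aid count is given only as "~15,400" on the slide.
results = {
    "LoC": (43_702, 1_159),
    "OAC": (91_811, 15_400),
    "NWDA": (22_609, 5_160),
    "VH": (15_175, 8_390),
}

total_records = sum(records for records, _ in results.values())
total_aids = sum(aids for _, aids in results.values())
print(f"records: {total_records:,}  finding aids: {total_aids:,}")

for source, (records, aids) in results.items():
    print(f"{source}: {records / aids:.1f} records per finding aid")
```

The record total matches the slide (173,297), the finding-aid counts sum to about 30,100, consistent with "30,000", and the averages come out at roughly 37.7 (LoC), 6.0 (OAC), 4.4 (NWDA), and 1.8 (VH) names per finding aid.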
  • 47. The Problem • Proliferation of the forms of names – Different names for the same person – Different people with the same names • Examples – from Books in Print (semi-controlled but not consistent) – ERIC author index (not controlled)
  • 48. Goethe …etc…
  • 49. John Muir
  • 50. Library and Archive Authority • Library (or bibliographic) authority control is almost exclusively about the control of names • Archival authority control involves biographical-historical description of the CPF entity – Descriptions based on controlled vocabularies, for example, occupations, place of birth and death – But also biographical-historical description • Prose • Chronological list • Archival authority control provides context for understanding records, the context of their creation, the provenance
  • 51. Merging EAC-CPF Records [Diagram: EAC Repository → Connect records with exactly matching names → Repository of connected EAC Records → Cheshire Search of the LCNAF, VIAF, and ULAN Repositories → Connect records using authority records → Merge information → Repository of merged EAC Records (MongoDB)]
  • 52. Merging EAC-CPF Records [Same diagram, with VIAF as the only authority repository searched by Cheshire]
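The connect-then-merge flow in the diagram can be approximated with a union-find (disjoint-set) structure: every "connect" step, whether an exact name match or two EAC records matching the same authority record, joins two record IDs, and each resulting group is merged into one record. This is a hypothetical in-memory sketch; the project's MongoDB store is not modeled, and the record IDs, links, and the merge rule (union of name forms and sources) are assumptions.

```python
from collections import defaultdict


class UnionFind:
    """Disjoint-set forest with path halving."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def merge_records(records, links):
    """records: id -> {"names": set, "sources": set}; links: pairs of ids to connect."""
    uf = UnionFind()
    for rid in records:
        uf.find(rid)
    for a, b in links:
        uf.union(a, b)
    groups = defaultdict(list)
    for rid in records:
        groups[uf.find(rid)].append(rid)
    # Hypothetical merge rule: keep every name form and every contributing source.
    return [
        {
            "ids": sorted(members),
            "names": set().union(*(records[m]["names"] for m in members)),
            "sources": set().union(*(records[m]["sources"] for m in members)),
        }
        for members in groups.values()
    ]


# Hypothetical records and links, for illustration only.
records = {
    "oac-1": {"names": {"Taylor, Zachary, 1784-1850."}, "sources": {"OAC"}},
    "loc-7": {"names": {"Taylor, Zachary, 1784-1850."}, "sources": {"LoC"}},
    "vh-3": {"names": {"Zachary Taylor."}, "sources": {"VH"}},
    "nwda-2": {"names": {"Zabolotskii, N.A."}, "sources": {"NWDA"}},
}
exact_links = [("oac-1", "loc-7")]      # identical name strings
authority_links = [("loc-7", "vh-3")]   # assumed to match the same authority record

for group in merge_records(records, exact_links + authority_links):
    print(group["ids"], sorted(group["names"]))
```

Union-find makes the order of the connect steps irrelevant: a record linked by an exact match and another linked through an authority record still end up in the same merged group.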
  • 53. Connect Exact Matches • The EAC-CPF records provide the names without having to parse texts, etc. • This allows us to use some simple methods like exact matching – Assume identical name entries mean the same person/corporate body/family – Enter the full names and record IDs into a database and flag IDs with the same names for merging
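The exact-matching step amounts to grouping record IDs by their full name string and flagging any name shared by more than one record. A minimal Python sketch, using an in-memory dictionary in place of the database mentioned on the slide; the function name and record IDs are hypothetical.

```python
from collections import defaultdict


def flag_exact_matches(entries):
    """entries: iterable of (record_id, full_name). Returns name -> sorted IDs sharing it."""
    by_name = defaultdict(list)
    for record_id, name in entries:
        by_name[name].append(record_id)
    # Only names carried by two or more records are candidates for merging.
    return {name: sorted(ids) for name, ids in by_name.items() if len(ids) > 1}


entries = [  # hypothetical record IDs
    ("oac-101", "Tabb, John Banister, 1845-1909."),
    ("loc-202", "Tabb, John Banister, 1845-1909."),
    ("vh-303", "Tabb, John B. (John Banister), 1845-1909."),
]
print(flag_exact_matches(entries))
```

Only the two identical strings are flagged; the third record names the same person but is missed, which is the weakness the following slides illustrate.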
  • 54. But… • Exact merging assumes that archives are following LC cataloging practice in their EAD records – There are some problems with this assumption
  • 55. Some failures for merging… • Different abbreviations: – A. & G. Carisch & C. – A. & G. Carisch & Co. • And spacing issues: – A. C. Peters & Bro. – A. C. Peters & Brother. – A. C. Peters. (??) – A. C.Peters & Bro. • Completeness and alternate rules – Tabb, John B. (John Banister), 1845-1909. – Tabb, John Banister, 1845-1909. • Also differing transliterations for non-Latin scripts
  • 56. More… • Variant romanizations (and spacing): – M. P. Belaieff. – M. P. Belaïeff. – M. P. Bieliaev. – M.P. Belaïeff. – M.P.Belaïeff. • Initials vs. names: – Zabolotskii, N.A. – Zabolotskii, Nikolai Alekseevich, 1903-1958. – Zabolotskii.
  • 57. More… • Inverted order vs. uninverted – Taylor, Zachary, 1784-1850. – Zachary Taylor. • Various combinations: – Tchaikovsky, Peter I. – Tchaikovsky, Pëtr Il. – Tchaikovsky, Piotr Ilyich. – Tchaikovsky, Pyotr Il. – Tchaikovsky, Pyotr Ilyich.
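To separate the purely mechanical failures from the genuinely hard ones, here is a hypothetical normalization key (an illustration, not the project's method) that folds case, strips diacritics, and treats any run of punctuation or whitespace as one separator. Under that assumption the spacing and diacritic variants above collapse to one key, while different romanizations (Belaieff vs. Bieliaev) and abbreviation choices (Bro. vs. Brother) still do not.

```python
import re
import unicodedata

def normalized_key(name):
    """Hypothetical matching key: case-folded, diacritics removed,
    punctuation and whitespace runs reduced to single spaces."""
    decomposed = unicodedata.normalize("NFKD", name)  # ï -> i + combining diaeresis
    no_marks = "".join(c for c in decomposed if not unicodedata.combining(c))
    tokens = re.split(r"[\W_]+", no_marks.casefold())
    return " ".join(t for t in tokens if t)

variants = ["M. P. Belaieff.", "M. P. Belaïeff.", "M.P. Belaïeff.",
            "M.P.Belaïeff.", "M. P. Bieliaev."]
for v in variants:
    print(v, "->", normalized_key(v))
```

A key like this only removes noise; the romanization, abbreviation, and initials-vs-names cases need other evidence, such as an authority file or dates.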
  • 58. Merging EAC-CPF Records [Same pipeline diagram as slide 51, with VIAF as the only authority repository shown]
  • 59. Search Authority Files • For each name, formulate a search of the VIAF database using the Cheshire system (an SGML/XML retrieval system with probabilistic and Boolean matching) – Search both the “authoritative” and “non-authoritative” forms – Consider any name matching a non-authoritative form to be a candidate match for the authoritative form – Flag EAC records that match the same authority record as potential matches
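The flagging logic can be sketched with a tiny in-memory stand-in for VIAF. This is an assumption-heavy illustration: the authority data is hypothetical, and a plain dictionary lookup on a lightly normalized string replaces Cheshire's probabilistic and Boolean retrieval. Every authoritative and non-authoritative form points to its authority record, and EAC records that resolve to the same authority record are grouped as potential matches.

```python
from collections import defaultdict

# Hypothetical stand-in for VIAF clusters:
# authority ID -> (authoritative form, [non-authoritative forms]).
AUTHORITIES = {
    "auth-1": ("Tabb, John B. (John Banister), 1845-1909",
               ["Tabb, John Banister, 1845-1909", "Tabb, John B."]),
}

def simple_key(name):
    # Placeholder for real retrieval: case-fold, drop a trailing period,
    # and collapse whitespace.
    return " ".join(name.casefold().rstrip(".").split())

def build_index(authorities):
    """Map every authoritative and non-authoritative form to its authority ID."""
    index = {}
    for auth_id, (preferred, variants) in authorities.items():
        for form in [preferred, *variants]:
            index[simple_key(form)] = auth_id
    return index

def flag_authority_matches(eac_records, authorities):
    """Group EAC record IDs that resolve to the same authority record."""
    index = build_index(authorities)
    groups = defaultdict(list)
    for record_id, name in eac_records:
        auth_id = index.get(simple_key(name))
        if auth_id is not None:
            groups[auth_id].append(record_id)
    # Only authority records hit by two or more EAC records are merge candidates.
    return {a: ids for a, ids in groups.items() if len(ids) > 1}

eac = [("oac-001", "Tabb, John B. (John Banister), 1845-1909."),
       ("nwda-17", "Tabb, John Banister, 1845-1909."),
       ("vh-300", "Taylor, Zachary, 1784-1850.")]
result = flag_authority_matches(eac, AUTHORITIES)
print(result)
```

Here the two Tabb entries that exact matching missed are grouped because one hits the authoritative form and the other a non-authoritative form of the same record.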
  • 60. N-gram or Shingle Matching • Name: Einstein Albert • Shingle sequence: ein, ins, nst, ste, tei, ein … , ert • The probability that the sequence (ins, nst, ste) follows ein is very high for the name Einstein • Source: Shingle Language Model for names, Krishna Janakiraman and Sean Marimpietri - Biograph
  • 61. [Figure: shingle lists compared for Name 1: Einstein Albert, Name 2: Ainshtain Albert, and Name 3: Albert Einstein] Shingle Language Model for names, Krishna Janakiraman and Sean Marimpietri - Biograph
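The comparison on these two slides can be approximated in a few lines. Note the swap: this sketch scores names by the Jaccard overlap of their character-trigram (shingle) sets, a simpler technique than the shingle language model credited on the slide, and it takes shingles within each word, which is an assumption. Word order stops mattering, so "Albert Einstein" scores the same as "Einstein Albert", while the transliterated "Ainshtain Albert" keeps only partial overlap.

```python
def shingles(name, n=3):
    """Set of character n-grams (shingles) taken within each word of a name."""
    grams = set()
    for word in name.lower().split():
        for i in range(len(word) - n + 1):
            grams.add(word[i:i + n])
    return grams

def jaccard(a, b):
    """Shingle-set overlap |A ∩ B| / |A ∪ B| (0.0 when both sets are empty)."""
    sa, sb = shingles(a), shingles(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0

n1, n2, n3 = "Einstein Albert", "Ainshtain Albert", "Albert Einstein"
print(sorted(shingles(n1)))
print(jaccard(n1, n3))            # 1.0: same shingles, different word order
print(round(jaccard(n1, n2), 3))  # 0.357: shared ins, alb, lbe, ber, ert
```

A set overlap ignores the sequencing that the language model exploits; it is shown only because it makes the shared and unshared shingles in the figure easy to inspect.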
  • 62. Merging EAC-CPF Records [Same pipeline diagram as slide 51, with VIAF as the only authority repository shown]
  • 63. Merge Flagged Records • For all of the exact matches and authority matches – Use the authoritative form of the name – Combine data from each match into a single EAC-CPF record – Retain all source record IDs and information • Finally, output the merged EAC-CPF records
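The three merge rules can be sketched over plain dicts. The record structure here (`id`, `name`, `occupations`) is hypothetical and far simpler than EAC-CPF XML; the point is only the rules: take the authoritative name, union the descriptive data, and keep every source ID plus any non-authoritative name as a variant.

```python
def merge_group(authoritative_name, records):
    """Combine a group of flagged records into one merged record.

    Each record is a hypothetical dict with 'id', 'name', and optional
    'occupations' keys, not real EAC-CPF markup.
    """
    merged = {
        "name": authoritative_name,  # use the authoritative form
        "source_ids": [],            # retain every source record ID
        "variant_names": [],         # keep other forms as alternatives
        "occupations": [],
    }
    for rec in records:
        merged["source_ids"].append(rec["id"])
        if rec["name"] != authoritative_name and rec["name"] not in merged["variant_names"]:
            merged["variant_names"].append(rec["name"])
        for occ in rec.get("occupations", []):
            if occ not in merged["occupations"]:
                merged["occupations"].append(occ)  # union, preserving first-seen order
    return merged

group = [
    {"id": "oac-001", "name": "Tabb, John B. (John Banister), 1845-1909.", "occupations": ["Poets"]},
    {"id": "nwda-17", "name": "Tabb, John Banister, 1845-1909.", "occupations": ["Poets", "Priests"]},
]
merged = merge_group("Tabb, John B. (John Banister), 1845-1909.", group)
print(merged)
```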
  • 64. Inputs to SNAC merging • LoC: 43,702 EAC-CPF records derived from 1,159 finding aids • OAC: 91,814 EAC-CPF records derived from ~15,400 finding aids • NWDA: 24,952 EAC-CPF records derived from 5,568 finding aids • VH: 15,175 EAC-CPF records • Total: 175,688 input EAC records for merging • Result: 128,781 “unique” names
  • 65. Another view of the numbers… • 95,624 person names merged from 125,555 person records • 31,287 institutions merged from 47,189 institution records • 1,980 families merged from 2,899 family records
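As a worked example of what slide 65's counts imply, the share of input records absorbed by merging can be computed per entity type. The counts are copied from the slide; the "reduction" framing is an illustration added here, not a figure the presenters reported.

```python
# Counts from slide 65: (merged entities, input records) per entity type.
counts = {
    "person": (95_624, 125_555),
    "institution": (31_287, 47_189),
    "family": (1_980, 2_899),
}

def reduction_pct(merged, records):
    """Percentage of input records folded into another record by merging."""
    return round(100 * (1 - merged / records), 1)

for kind, (merged, records) in counts.items():
    print(kind, reduction_pct(merged, records))
# person 23.8
# institution 33.7
# family 31.7
```

Under this reading, corporate and family names collapse noticeably more than personal names.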
  • 66. Merging Conclusions • There will not be a single merging method, but a staged set of approaches that will take us from the simplest exact matches to (we hope) reliably identifying variant forms of a name when they are corroborated by contextual information (dates, etc.)
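One way to picture a staged approach is a cascade that tries the cheapest test first and only accepts a looser match when dates corroborate it. This is a hypothetical sketch, not the project's code: stage 2 accepts two entries only if their text before the first comma agrees and both carry the same life dates. That rule is deliberately crude, so the Zabolotskii and Taylor variants from earlier slides fall through to a later stage.

```python
import re

LIFE_DATES = re.compile(r"(\d{4})-(\d{4})")

def surname(entry):
    # Text before the first comma, case-folded; crude for uninverted names.
    return entry.split(",")[0].strip(" .").casefold()

def life_dates(entry):
    m = LIFE_DATES.search(entry)
    return m.groups() if m else None

def match_stage(a, b):
    """Return the first stage at which two name entries match, else None."""
    if a == b:
        return "exact"
    dates = life_dates(a)
    if dates is not None and dates == life_dates(b) and surname(a) == surname(b):
        return "variant, corroborated by dates"
    return None  # leave for a later stage or for review

print(match_stage("Tabb, John B. (John Banister), 1845-1909.",
                  "Tabb, John Banister, 1845-1909."))
print(match_stage("Zabolotskii, N.A.",
                  "Zabolotskii, Nikolai Alekseevich, 1903-1958."))
```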
  • 67. Next • Developing an updatable database of merged EAC data (replacing MongoDB with PostgreSQL) – Will permit incremental addition of new data and support editing and “forced” merges • Process the 2M WorldCat archival descriptions • Process the 150,000 finding aids • Convert several hundred thousand archival authority records into EAC-CPF and run them through the match/merge process
  • 68. Methods and Processing • Extract EAC-CPF records from existing EAD-encoded archival descriptions – Extracting both creators and referenced CPF names • Match EAC-CPF records against one another and against existing authority records (ULAN, VIAF, LCNAF) – Enhance EAC-CPF by normalizing entries and adding alternative entries, titles (VIAF), and historical data (ULAN) • Create a prototype historical resource and access system – Historical data and social-professional networks – Links to archive, library, and museum resources (by and about)
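The extraction step can be sketched with the standard library's ElementTree. Assumptions: the input uses un-namespaced EAD 2002 element names, creators come from `<origination>`, referenced names come from `<controlaccess>`, and the sample document is made up. The `entity_type` values follow EAC-CPF's person/corporateBody/family distinction; the output is plain dicts, not EAC-CPF XML.

```python
import xml.etree.ElementTree as ET

# EAD name elements and the EAC-CPF entity types they correspond to.
NAME_TAGS = {"persname": "person", "corpname": "corporateBody", "famname": "family"}

def extract_cpf_names(ead_xml):
    """Return creator and referenced CPF names found in an EAD string.

    Assumes un-namespaced EAD 2002 element names; a namespaced file would
    need '{namespace}persname'-style tags instead.
    """
    root = ET.fromstring(ead_xml)
    seen = set()  # nested <controlaccess> blocks would otherwise repeat names
    found = []
    for role, container in (("creator", "origination"), ("referenced", "controlaccess")):
        for section in root.iter(container):
            for el in section.iter():
                if el.tag in NAME_TAGS and el not in seen:
                    seen.add(el)
                    name = " ".join("".join(el.itertext()).split())
                    found.append({"role": role, "entity_type": NAME_TAGS[el.tag], "name": name})
    return found

sample = """<ead><archdesc level="collection">
  <did><origination label="creator"><persname>Taylor, Zachary, 1784-1850</persname></origination></did>
  <controlaccess><corpname>Example Historical Society</corpname>
    <controlaccess><famname>Patton family</famname></controlaccess>
  </controlaccess>
</archdesc></ead>"""
names = extract_cpf_names(sample)
for entry in names:
    print(entry)
```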
  • 69. For More Information • http://socialarchive.iath.virginia.edu/ (project website) • http://socialarchive.iath.virginia.edu/xtf/search (public prototype)
  • 70. Historic Social Networks Prototype access system
  • 71. Outline • User Persona • Search and Display • Network graph visualization • Linked Data / RDF • Future Plans
  • 72. Meet the target users • Personas are fictional characters created to represent the different user types within a targeted demographic, attitude and/or behavior set that might use a site, brand or product in a similar way. http://en.wikipedia.org/wiki/Persona_(marketing) • Randy: Graduate student working on a PhD that involves biographies and the study of diplomatic families and networks. Sometimes he comes to the site looking for information on specific people; other times he is looking for information on a specific subject or event. He also TAs an undergraduate history class and sometimes has to help students find topics for papers. • Connie: Works at an institution that contributed records to the project. Will be asking how this site would be useful to the institution's users. Wants to understand how their records were used and what the added value is. • Quincy: Library school student doing QA on record matching. • Adele: Does authority work during collection processing. • Lenny: Likes linked data and wants to be able to mine the links that have been established programmatically.
  • 73. Outline • User Persona • Search and Display • Network graph visualization • Linked Data / RDF • Future Plans
  • 74. Advanced limits match EAC sections
  • 75. Outline • User Persona • Search and Display • Network graph visualization – Context widget (needs new name) • Linked Data / RDF • Future Plans
  • 76. TinkerPop graph database stack • Simple "property graph" model • "JDBC for graph databases" [SNAC is using Neo4j for the graph DB] • XPath-like "Gremlin" for graph queries • REST interfaces with "Rexster" • For me, this was 10 to 100 times easier than using RDF
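To show what the "property graph" model means, here is a toy in-memory version in plain Python. It is not TinkerPop, Neo4j, or Gremlin code. Vertices and edges both carry key/value properties, edges carry a label, and an `out(vertex, label)` helper returns neighbors roughly the way a Gremlin `out(label)` step does. The vertex IDs, names, and the `associatedWith` label are hypothetical.

```python
from collections import defaultdict

class PropertyGraph:
    """Toy property graph: labeled edges, properties on vertices and edges."""

    def __init__(self):
        self.vertices = {}                  # vertex id -> properties dict
        self.out_edges = defaultdict(list)  # vertex id -> [(label, target id, props)]

    def add_vertex(self, vid, **props):
        self.vertices[vid] = props

    def add_edge(self, source, label, target, **props):
        self.out_edges[source].append((label, target, props))

    def out(self, vid, label):
        """IDs reached by one outgoing edge with the given label."""
        return [target for (lbl, target, _) in self.out_edges[vid] if lbl == label]

g = PropertyGraph()
g.add_vertex("p1", name="Person A", entity_type="person")
g.add_vertex("p2", name="Person B", entity_type="person")
g.add_vertex("c1", name="Organization C", entity_type="corporateBody")
g.add_edge("p1", "associatedWith", "p2", source_record="oac-001")
g.add_edge("p1", "associatedWith", "c1")
print([g.vertices[v]["name"] for v in g.out("p1", "associatedWith")])
```

A one-hop neighbor query like this is essentially what a context widget needs; a real graph database adds indexing, persistence, and a query language on top.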
  • 77. Outline • User Persona • Search and Display • Network graph visualization • Linked Data / RDF • Future Plans
  • 78. What is Linked Open Data? • W3C Semantic Web technology stack • A web of atomized data, not a web of documents • RDF; OWL ontologies; SPARQL queries; triple/quad/quint stores • httpRange-14; content negotiation; CURIEs • No restrictions on data use; free and easy license • Lenny wants it, but does Randy?
  • 79. What is Linked Open Data? • Getting to the good stuff – Blue underlined text – Pulling in data from multiple sources, in an intelligent way, into a "document" • Understand and discover relationships • Open access for research, education, private study, and other fair use
  • 80. RDFa owl:sameAs
  • 81. HTML 5 microdata in chron list
  • 82. RDF of the social graph Thanks Ed Summers!
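Publishing the social graph as RDF can be pictured as emitting N-Triples: `owl:sameAs` links to external authorities plus one triple per association edge. A minimal sketch under stated assumptions: all `example.org` IRIs and the `associatedWith` predicate are hypothetical placeholders (only the `owl:sameAs` IRI is the real OWL term), and IRIs are not escaped or validated.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
# Hypothetical predicate for a social-network edge; not SNAC's actual vocabulary.
ASSOCIATED_WITH = "http://example.org/vocab/associatedWith"

def ntriple(subject, predicate, obj):
    """Serialize one triple whose three terms are all IRIs, in N-Triples syntax."""
    return f"<{subject}> <{predicate}> <{obj}> ."

def to_ntriples(same_as_links, association_edges):
    """Emit owl:sameAs links to external authorities plus association edges."""
    lines = [ntriple(s, OWL_SAME_AS, o) for s, o in same_as_links]
    lines += [ntriple(s, ASSOCIATED_WITH, o) for s, o in association_edges]
    return "\n".join(lines)

doc = to_ntriples(
    [("http://example.org/snac/p1", "http://example.org/viaf/123")],
    [("http://example.org/snac/p1", "http://example.org/snac/p2")],
)
print(doc)
```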
  • 83. Silvia Mazzini, regesta.exe srl • http://templates.xdams.net/IBC/ontology/eac-cpf.rdf
  • 84. &mode=xml2owl [experimental]
  • 85. My opinion on the use cases for W3C RDF tech • Good for publishing data • Good for controlled vocabularies • Data models? • Most people with open source RDF-store type systems do the real work with Solr • Consider a graph database
  • 86. Outline • User Persona • Search and Display • Linked Data / RDF • Network graph visualization • Future Plans
  • 87. Future Plans • Conduct assessment activities involving members of target audiences to establish a mental model of users for design work • Scale the interface to millions of names • Make visualizations useful and integrated (network and geospatial) • Stable URLs between batches for linked data • Social and personalization features (gateway to crowdsourcing) • Integration with local systems (such as with the context widget)
  • 88. • Photo attribution: http://www.flickr.com/photos/dsevilla/139656712/in/photostream/ • http://xtf.cdlib.org/ • http://code.google.com/p/eac-graph-load/source/browse/README.txt • http://tinkerpop.com/ • http://thejit.org/ • https://github.com/tingletech/snac-related-widget