Cni2012

#cni12f SNAC slides

  • In order of importance, Lenny the link head is last.
  • So, this is what happens when you let the programmer design the user interface. In phase two, Rachel Hu, CDL's user experience designer in our in-house assessment group, will be helping.
  • Hopefully this is where the user will focus.
  • A-Z browse
  • Featured items on home page (rather than 0-9). Note the tabs to limit by record type.
  • Also note the subject and occupation facets.
  • Person
  • Advanced search hides, allows. On other browsers, hierarchy is represented graphically.
  • Advanced search help
  • Autocomplete
  • Search results for Oppenheimer
  • View EAD. Report data issue link has been added back. Will come back to the radial graph demo.
  • Sometimes the related resources will come from the EAD, but most of these are from VIAF. This whole section is hard to use when there are lots of related items.
  • This was the first iteration of the graph visualization

    1. Opening Slide
    2. Building an Archival Identity Management Network: Transforming Archival Practice and Historical Research
       Daniel Pitti* and Brian Tingle**
       * Institute for Advanced Technology in the Humanities
       ** California Digital Library
       Thanks to Ray R. Larson of the University of California, Berkeley, School of Information for many of the slides here
       12/11/12 2012-11-04 - SLIDE
    3. Funding and People
       • Funding and Timeline
         – National Endowment for the Humanities: May 2010-April 2012
         – Andrew W. Mellon Foundation: May 2012-April 2014
       • People
         – Daniel Pitti (PI) and Worthy Martin (Institute for Advanced Technology in the Humanities, University of Virginia)
         – Adrian Turner and Brian Tingle (California Digital Library, University of California)
         – Ray Larson (School of Information, University of California, Berkeley)
    4. The Source Data
       • EAD-encoded finding aids (guides to archival records)
         – 150K
         – Primarily from U.S. sources, but also U.K. and France
       • Archival authority records (360K)
         – National Archives and Records Administration
         – State Archive of New York
         – Smithsonian Institution
         – British Library
         – National Archives (France) & BnF
       • WorldCat Archival Descriptions: 2M
    5. Library and Museum Authority Records
       • Getty Vocabulary Program: Union List of Artist Names (293K personal and corporate names)
       • Virtual International Authority File (16M+ cluster records)
         – Contributed from around the world by national libraries and others
    7. Methods and Processing
       • Extract EAC-CPF records from existing EAD-encoded archival descriptions
         – Extracting both creators and referenced CPF names
       • Match EAC-CPF records against one another and against existing authority records (ULAN, VIAF, LCNAF)
         – Enhance EAC-CPF by normalizing entries, adding alternative entries, titles (VIAF), and historical data (ULAN)
       • Create a prototype historical resource and access system
         – Historical data and social-professional networks
         – Links to archive, library, and museum resources (by and about)
    8. Example EAD Record (Hub)
       <EAD>
         <EADHEADER LANGENCODING = "ISO 639">
           <EADID> GB 0133 TAB </EADID>
           <FILEDESC>
             <TITLESTMT>
               <TITLEPROPER> Tabley Muniments </TITLEPROPER>
             </TITLESTMT>
             <PUBLICATIONSTMT>
               <PUBLISHER> John Rylands University Library of Manchester </PUBLISHER>
               <ADDRESS>
                 <ADDRESSLINE> 150 Deansgate </ADDRESSLINE>
                 <ADDRESSLINE> Manchester </ADDRESSLINE>
                 <ADDRESSLINE> ... (Parts removed )…
         </FRONTMATTER>
         <ARCHDESC LEVEL = "FONDS" LANGMATERIAL = "English">
           <DID>
             <REPOSITORY> University of Manchester, John Rylands University Library of Manchester </REPOSITORY>
             <UNITID ENCODINGANALOG = "ISADG3.1.1." COUNTRYCODE = "GB" REPOSITORYCODE = "0133"> GB 0133 TAB </UNITID>
             <UNITTITLE LABEL = "Title" ENCODINGANALOG = "ISADG3.1.2."> Tabley Muniments </UNITTITLE>
             <UNITDATE LABEL = "Dates of Creation" ENCODINGANALOG = "ISADG3.1.3."> 19th century </UNITDATE>
             <PHYSDESC LABEL = "Extent" ENCODINGANALOG = "ISADG3.1.5.">
               <EXTENT> 1.24 cu.m </EXTENT>
             </PHYSDESC>
             <ORIGINATION LABEL = "Creator" ENCODINGANALOG = "ISADG3.2.1.">
               <FAMNAME SOURCE = "NCARULES"> Warren, family, of Tabley, Cheshire </FAMNAME>
               <PERSNAME SOURCE = "NCARULES"> Warren, John Byrne Leicester, 1835-1895, 3rd Baron de Tabley, poet </PERSNAME>
             </ORIGINATION>
           </DID>
    9. Example EAD Record (Hub)
       <BIOGHIST ENCODINGANALOG = "ISADG3.2.2.">
         <HEAD> Administrative/Biographical History </HEAD>
         <P> The poet John Byrne Leicester Warren, later 3rd and last Baron de Tabley, of Tabley near Knutsford, Cheshire, was born in 1835, the son of the 2nd Baron de Tabley (1811-1887), and his wife, Catherina. His mother was Italian, the daughter of the count de Soglio, and Warren spent much of his early childhood with her in Italy and Greece. He was educated at Eton and Christ Church, Oxford. At Oxford he published a volume of poetry. Originally he published under the pseudonyms George F. Preston (1859-1862) and William Lancaster (1863-1868), but latterly under his own name. </P>
         <P> His early verse included <TITLE> Praeterita </TITLE> (1863), <TITLE> Eclogues and Monodramas </TITLE> (1864), <TITLE> Studies in Verse </TITLE> (1865), <TITLE> Philoctetes </TITLE> (1866), and <TITLE> Orestes </TITLE> (1868). His early work was Tennysonian in style, but he was later to be influenced by both Browning and Swinburne. In 1873 he produced …. (some data removed)…
    10. Example EAD Record (Hub)
        <SCOPECONTENT ENCODINGANALOG = "ISADG3.3.1.">
          <HEAD> Scope and Content </HEAD>
          <P> The collection consists mainly of the personal papers of the 3rd Baron de Tabley. The papers reflect his interests in literature, politics, botany and numismatics and include correspondence with numerous prominent later Victorian figures. Attention should also be drawn to de Tabley’s extensive and important collection of armorial bookplates. </P>
          <P> Correspondents include Sir Mountstuart Grant Duff, Edmund Gosse, Lord Houghton, A.C.Benson, and Robert Bridges. There are volumes of Tabley's essays and verse, as well as a considerable number of notebooks and loose manuscripts of verse and other writings. There are various bundles and boxes relating to "Coins", "Botany", "Poetry", "Literary", "Financial" and bookplates. </P>
        </SCOPECONTENT>
        <ADD>
          <OTHERFINDAID ENCODINGANALOG = "ISADG3.4.6.">
            <P> Preliminary survey list. </P>
          </OTHERFINDAID>
          <RELATEDMATERIAL ENCODINGANALOG = "ISADG3.5.3.">
            <P> There is correspondence with the 3rd Baron de Tabley among the Edward Freeman Papers, held at JRULM. The Library also has custody of the important Tabley Book Collection. </P>
          </RELATEDMATERIAL>
          <SEPARATEDMATERIAL>
            <P> The family and estate papers of the Leicester-Warren Family of Tabley are held by Cheshire Record Office. Some of these papers were originally in the custody of the John Rylands University Library of Manchester. </P>
          </SEPARATEDMATERIAL>
        </ADD>
    11. Example EAD Record (Hub)
        <CONTROLACCESS>
          <HEAD> Index terms </HEAD>
          <GEOGNAME SOURCE = "NCARULES">
            <EMPH ALTRENDER = "a">Tabley Inferior</EMPH>
            <EMPH ALTRENDER = "a-">Cheshire SJ7378</EMPH>
          </GEOGNAME>
          <PERSNAME SOURCE = "NCARULES">
            <EMPH ALTRENDER = "surname">Benson</EMPH>
            <EMPH ALTRENDER = "forename">Arthur Christopher</EMPH>
            <EMPH ALTRENDER = "dates">1862-1923</EMPH>
          </PERSNAME>
          <PERSNAME SOURCE = "NCARULES">
            <EMPH ALTRENDER = "surname">Bridges</EMPH>
            <EMPH ALTRENDER = "forename">Robert Seymour</EMPH>
            <EMPH ALTRENDER = "dates">1844-1930</EMPH>
          </PERSNAME>
          <PERSNAME SOURCE = "NCARULES">
            <EMPH ALTRENDER = "surname">Duff</EMPH>
            <EMPH ALTRENDER = "title">Sir</EMPH>
            <EMPH ALTRENDER = "forename">Mountstuart Elphinstone Grant</EMPH>
            <EMPH ALTRENDER = "dates">1829-1906</EMPH>
            <EMPH ALTRENDER = "epithet">Knight</EMPH>
          </PERSNAME>
          <PERSNAME SOURCE = "NCARULES">
            <EMPH ALTRENDER = "surname">Gosse</EMPH>
            <EMPH ALTRENDER = "title">Sir</EMPH>
            <EMPH ALTRENDER = "forename">Edmund William</EMPH>
            <EMPH ALTRENDER = "dates">1849-1928</EMPH>
            <EMPH ALTRENDER = "epithet">Knight</EMPH>
          </PERSNAME>
          <PERSNAME SOURCE = "NCARULES">
            <EMPH ALTRENDER = "surname">Milnes</EMPH>
            <EMPH ALTRENDER = "forename">Richard Monckton</EMPH>
            <EMPH ALTRENDER = "dates">1809-1885</EMPH>
            <EMPH ALTRENDER = "epithet">1st Baron Houghton</EMPH>
          </PERSNAME>
          <SUBJECT SOURCE = "LCSH">
            <EMPH ALTRENDER = "a">Bookplates</EMPH>
          </SUBJECT>
          <SUBJECT SOURCE = "LCSH">
            <EMPH ALTRENDER = "a">Botany</EMPH>
          </SUBJECT>
          <SUBJECT SOURCE = "LCSH">
            <EMPH ALTRENDER = "a">Numismatics</EMPH>
          </SUBJECT>
          <SUBJECT SOURCE = "LCSH">
            <EMPH ALTRENDER = "a-">Poetry</EMPH>
            <EMPH ALTRENDER = "a">Modern</EMPH>
            <EMPH ALTRENDER = "y">19th century</EMPH>
          </SUBJECT>
        </CONTROLACCESS>
        </ARCHDESC>
        </EAD>
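The extraction step (creators plus referenced CPF names) could be sketched roughly as follows. This is only an illustration, not the project's extraction code: the fragment is a shortened, well-formed version of the example record, and the function names, the `role` labels, and the `NAME_TAGS` mapping are assumptions made for this sketch.

```python
import xml.etree.ElementTree as ET

# Shortened, well-formed fragment modeled on the example record (assumption:
# real finding aids vary widely in case, namespaces, and nesting).
EAD_FRAGMENT = """
<ARCHDESC LEVEL="FONDS">
  <DID>
    <ORIGINATION LABEL="Creator">
      <FAMNAME SOURCE="NCARULES">Warren, family, of Tabley, Cheshire</FAMNAME>
      <PERSNAME SOURCE="NCARULES">Warren, John Byrne Leicester, 1835-1895, 3rd Baron de Tabley, poet</PERSNAME>
    </ORIGINATION>
  </DID>
  <CONTROLACCESS>
    <PERSNAME SOURCE="NCARULES">
      <EMPH ALTRENDER="surname">Benson</EMPH>
      <EMPH ALTRENDER="forename">Arthur Christopher</EMPH>
      <EMPH ALTRENDER="dates">1862-1923</EMPH>
    </PERSNAME>
    <SUBJECT SOURCE="LCSH"><EMPH ALTRENDER="a">Botany</EMPH></SUBJECT>
  </CONTROLACCESS>
</ARCHDESC>
"""

# Hypothetical mapping from EAD name elements to CPF entity types.
NAME_TAGS = {"PERSNAME": "person", "CORPNAME": "corporateBody", "FAMNAME": "family"}


def name_text(element):
    """Join the non-empty text pieces of a name element with commas."""
    parts = [t.strip() for t in element.itertext() if t.strip()]
    return ", ".join(parts)


def extract_names(xml_text):
    """Return (role, entity_type, name) tuples for creators and referenced names."""
    root = ET.fromstring(xml_text)
    found = []
    for section, role in (("ORIGINATION", "creator"), ("CONTROLACCESS", "referenced")):
        for container in root.iter(section):
            for el in container.iter():
                if el.tag in NAME_TAGS:
                    found.append((role, NAME_TAGS[el.tag], name_text(el)))
    return found


names = extract_names(EAD_FRAGMENT)
```

Subject headings such as Botany are skipped because only person, corporate body, and family elements are treated as CPF names here.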
    12. 2010-2012 Extraction Results
        • Source data: 30,000 finding aids
        • EAC-CPF records extracted
          – LoC: 43,702 from 1,159 finding aids
          – OAC: 91,811 from ~15,400
          – NWDA: 22,609 from 5,160
          – VH: 15,175 from 8,390
          – Total: 173,297
    13. Phase II preliminary results
        • Unmerged SIA Henry Correspondence: 32,988 names
        • Unmerged WorldCat MARC: 4,548,270 names
    15. The Problem
        • Proliferation of the forms of names
          – Different names for the same person
          – Different people with the same names
        • Examples
          – From Books in Print (semi-controlled but not consistent)
          – ERIC author index (not controlled)
    16. Goethe …etc…
    17. John Muir
    18. Library and Archive Authority Control
        • Library (or bibliographic) authority control is almost exclusively about the control of names
        • Archival identity control involves biographical-historical description of the CPF entity
          – Descriptions based on controlled vocabularies, for example, occupations, place of birth and death
          – But also biographical-historical description
            • Prose
            • Chronological list
        • Archival authority control provides context for understanding records, the context of their creation, the provenance
    19. Merging EAC-CPF Records
        [Diagram]
        • Authority sources: LCNAF Repository, VIAF Repository, ULAN Repository, queried through Cheshire Search
        • Steps: Connect exactly matching name records; Connect records using authority information; Merge
        • Repositories (MongoDB): EAC Repository; Repository of connected EAC Records; Repository of merged EAC Records
    20. Merging EAC-CPF Records
        [Diagram]
        • Authority source: VIAF Repository, queried through Cheshire Search
        • Steps: Connect exactly matching name records; Connect records using authority information; Merge
        • Repositories (MongoDB): EAC Repository; Repository of connected EAC Records; Repository of merged EAC Records
    21. Connect Exact Matches
        • The EAC-CPF records provide the names without having to parse texts, etc.
        • Allows us to use some simple methods like exact matching
          – Assume identical name entries mean the same person/corporate body/family
          – Enter the full names and record IDs into a database and flag IDs with the same names for merging
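A minimal sketch of the exact-match step, assuming an in-memory dictionary stands in for the database; the record IDs are made up for illustration.

```python
from collections import defaultdict


def flag_exact_matches(records):
    """Group record IDs by identical name entry; keep only names seen more than once.

    records: iterable of (record_id, name_entry) pairs.
    """
    ids_by_name = defaultdict(list)
    for record_id, name in records:
        ids_by_name[name].append(record_id)
    return {name: ids for name, ids in ids_by_name.items() if len(ids) > 1}


# Hypothetical record IDs; the name forms come from the slides.
records = [
    ("loc-001", "Taylor, Zachary, 1784-1850."),
    ("oac-417", "Taylor, Zachary, 1784-1850."),
    ("nwda-09", "Zachary Taylor."),
]
flagged = flag_exact_matches(records)
```

Here the two identical entries are flagged for merging, while the uninverted form "Zachary Taylor." is left unmatched because the comparison is character for character.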
    22. But…
        • Exact merging assumes that archives are following LC cataloging practice in their EAD records
          – There are some problems with this assumption
    23. Some failures for merging…
        • Different abbreviations:
          – A. & G. Carisch & C.
          – A. & G. Carisch & Co.
        • And spacing issues:
          – A. C. Peters & Bro.
          – A. C. Peters & Brother.
          – A. C. Peters. (??)
          – A. C.Peters & Bro.
        • Completeness and alternate rules
          – Tabb, John B. (John Banister), 1845-1909.
          – Tabb, John Banister, 1845-1909.
        • Also differing transliterations for non-Latin scripts
    24. More…
        • Variant romanizations (and spacing):
          – M. P. Belaieff.
          – M. P. Belaïeff.
          – M. P. Bieliaev.
          – M.P. Belaïeff.
          – M.P.Belaïeff.
        • Initials vs. names:
          – Zabolotskii, N.A.
          – Zabolotskii, Nikolai Alekseevich, 1903-1958.
          – Zabolotskii.
    25. More…
        • Inverted order vs. uninverted
          – Taylor, Zachary, 1784-1850.
          – Zachary Taylor.
        • Various combinations:
          – Tchaikovsky, Peter I.
          – Tchaikovsky, Pëtr Il.
          – Tchaikovsky, Piotr Ilyich.
          – Tchaikovsky, Pyotr Il.
          – Tchaikovsky, Pyotr Ilyich.
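To make these failure cases concrete, here is a sketch of a simple normalization key that folds case, diacritics, spacing, and punctuation. The slides do not describe such a step, so `normalize_name` is purely an assumption for illustration. It repairs spacing and diacritic variants but, as the asserts in the usage show, cannot reconcile different romanizations or abbreviations.

```python
import unicodedata


def normalize_name(name):
    """Build a crude comparison key: no diacritics, case, spaces, or punctuation."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return "".join(ch for ch in no_marks.casefold() if ch.isalnum())


# Spacing and diacritic variants collapse to the same key.
same = normalize_name("M. P. Belaieff.") == normalize_name("M.P.Belaïeff.")

# A different romanization or abbreviation still produces a different key.
different_romanization = normalize_name("M. P. Belaieff.") != normalize_name("M. P. Bieliaev.")
different_abbreviation = normalize_name("A. C. Peters & Bro.") != normalize_name("A. C. Peters & Brother.")
```

The remaining mismatches are the cases that motivate searching authority files and fuzzy matching rather than relying on string keys alone.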
    27. Search Authority Files
        • For each name, formulate a search of the VIAF database using the Cheshire system (SGML/XML retrieval system with probabilistic and Boolean matching)
          – Search both the “authoritative” and “non-authoritative” forms
          – Consider any name matching a non-authoritative form to be a candidate match for the authoritative form
          – Flag EAC records that match the same authority record as potential matches
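The flagging logic can be sketched with a plain dictionary lookup. This deliberately swaps Cheshire's probabilistic and Boolean search for exact string lookup, so it shows only the bookkeeping, not the retrieval. The authority record, its ID, and the choice of which form is authoritative are hypothetical.

```python
from collections import defaultdict

# Hypothetical authority data: id -> (authoritative form, non-authoritative forms).
AUTHORITY = {
    "auth-1": (
        "Tchaikovsky, Pyotr Ilyich.",
        ["Tchaikovsky, Peter I.", "Tchaikovsky, Piotr Ilyich."],
    ),
}


def build_index(authority):
    """Map every known name form, authoritative or not, to its authority record ID."""
    index = {}
    for auth_id, (preferred, variants) in authority.items():
        for form in [preferred, *variants]:
            index[form] = auth_id
    return index


def flag_by_authority(eac_records, authority):
    """Flag EAC record IDs that resolve to the same authority record."""
    index = build_index(authority)
    matched = defaultdict(list)
    for record_id, name in eac_records:
        auth_id = index.get(name)
        if auth_id is not None:
            matched[auth_id].append(record_id)
    return {auth_id: ids for auth_id, ids in matched.items() if len(ids) > 1}


eac_records = [
    ("eac-1", "Tchaikovsky, Peter I."),
    ("eac-2", "Tchaikovsky, Pyotr Ilyich."),
    ("eac-3", "Zachary Taylor."),
]
potential_matches = flag_by_authority(eac_records, AUTHORITY)
```

Two EAC records with different name strings end up flagged together because both resolve to the same authority record.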
    28. NGRAM or Shingle Matching
        Name: Einstein Albert
        Shingle sequence: ein, ins, nst, ste, tei, ein … , ert
        Probability that the sequence (ins, nst, ste) follows ein is very high for the name einstein
        Shingle Language Model for names: Krishna Janakiraman and Sean Marimpietri - Biograph
    29. [Figure: shingles drawn for three forms of the same name]
        Name 1: Einstein Albert
        Name 2: Ainshtain Albert
        Name 3: Albert Einstein
        Shingle Language Model for names: Krishna Janakiraman and Sean Marimpietri - Biograph
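As a rough illustration of shingle-based comparison, the sketch below builds character trigrams per word and compares the resulting sets with Jaccard overlap. This is a simpler stand-in, not the shingle language model credited on the slides, which estimates sequence probabilities. The function names and the per-word tokenization are assumptions.

```python
def shingles(name, n=3):
    """Return the set of character n-grams taken from each word of the name."""
    result = set()
    for word in name.casefold().split():
        letters = "".join(ch for ch in word if ch.isalpha())
        result.update(letters[i:i + n] for i in range(len(letters) - n + 1))
    return result


def jaccard(a, b):
    """Jaccard similarity of the two names' shingle sets (1.0 means identical sets)."""
    sa, sb = shingles(a), shingles(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


reordered = jaccard("Einstein Albert", "Albert Einstein")
romanized = jaccard("Einstein Albert", "Ainshtain Albert")
```

Because shingles are taken per word, word order does not matter, so the reordered form scores 1.0. The variant romanization shares only the shingles of "Albert" plus "ins", giving 5 shared shingles out of 14 in total.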
    31. Merge Flagged Records
        • For all of the exact matches and authority matches
          – Use the Authoritative form of the name
          – Combine data from each match into a single EAC-CPF record
          – Retain all source record IDs and information
        • Finally, output the merged EAC-CPF records
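A minimal sketch of the merge step using plain dictionaries. The field names (`alternative_names`, `sources`, `occupations`) and the record IDs are hypothetical and are not EAC-CPF element names. Which form counts as authoritative is also just assumed here.

```python
def merge_records(authoritative_name, records):
    """Combine flagged records into one, keeping every source ID and name variant."""
    merged = {
        "name": authoritative_name,
        "alternative_names": [],
        "sources": [],
        "occupations": [],
    }
    for rec in records:
        # Keep each differing name form once, as an alternative entry.
        if rec["name"] != authoritative_name and rec["name"] not in merged["alternative_names"]:
            merged["alternative_names"].append(rec["name"])
        # Retain where every piece of data came from.
        merged["sources"].append(rec["id"])
        for occupation in rec.get("occupations", []):
            if occupation not in merged["occupations"]:
                merged["occupations"].append(occupation)
    return merged


flagged_records = [
    {"id": "loc-12", "name": "Tabb, John B. (John Banister), 1845-1909.", "occupations": ["poet"]},
    {"id": "oac-88", "name": "Tabb, John Banister, 1845-1909.", "occupations": ["poet", "priest"]},
]
merged = merge_records("Tabb, John B. (John Banister), 1845-1909.", flagged_records)
```

The merged record carries one preferred name, the other form as an alternative, the union of descriptive data, and both source IDs.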
    32. Inputs to SNAC merging
        • LoC: 43,702 EAC-CPF records derived from 1,159 finding aids
        • OAC: 91,814 EAC-CPF records derived from ~15,400 finding aids
        • NWDA: 24,952 EAC-CPF records derived from 5,568 finding aids
        • VH: 15,175 EAC-CPF records
        • Total: 175,688 input EAC records for merging
        • Result: 128,781 “unique” names
    33. Another view of the numbers…
        • 95,624 Person names merged from 125,555 Person records
        • 31,287 Institutions merged from 47,189 Institution records
        • 1,980 Families merged from 2,899 Family records
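The per-type figures can be turned into totals and reduction rates with a few lines of arithmetic. The numbers are taken from the slide; the variable names are illustrative. Note that the per-type sums (175,643 records, 128,891 names) differ slightly from the totals given on the "Inputs to SNAC merging" slide (175,688 and 128,781); the slides do not explain the difference.

```python
# (merged names, input records) per entity type, as given on the slide.
counts = {
    "person": (95624, 125555),
    "institution": (31287, 47189),
    "family": (1980, 2899),
}

total_names = sum(names for names, _ in counts.values())
total_records = sum(records for _, records in counts.values())

# Share of input records that were absorbed into another record by merging.
reduction = {
    kind: (records - names) / records
    for kind, (names, records) in counts.items()
}

for kind, share in reduction.items():
    print(f"{kind}: {share:.1%} of records merged away")
```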
    34. Merging Conclusions
        • There will not be a single merging method, but a staged set of approaches that will allow us to go from the simplest exact matches, to (we hope) reliably identifying various variant forms of a name, etc. when corroborated by contextual (date, etc.) information
    35. Next
        • Developing an updateable database of merged EAC data (dumping Mongo for PostgreSQL)
          – Will permit incremental addition of new data and support editing and “forced” merges
        • Process the 2M WorldCat archival descriptions
        • Process the 150,000 finding aids
        • Convert several hundred thousand archival authority records into EAC-CPF and match/merge process
    37. Outline
        • User Persona
        • Search and Display
        • Network graph visualization
        • Linked Data / RDF
        • Future Plans
    38. Meet the target users
        Personas are fictional characters created to represent the different user types within a targeted demographic, attitude and/or behavior set that might use a site, brand or product in a similar way. http://en.wikipedia.org/wiki/Persona_(marketing)
        • Randy: Graduate student working on a PhD that involves biographies and the study of diplomatic families and networks. Sometimes he comes to the site looking for information on specific people; other times he is looking for information on a specific subject or event. He also TAs an undergraduate history class and sometimes has to help students find topics for papers.
        • Connie: Works at an institution that contributed records to the project. Will be asking how this site would be useful to the institution's users. Wants to understand how their records were used and what the added value is.
        • Quincy: Library school student working to QA record matching.
        • Adele: Person doing authority work during collection processing.
        • Lenny: Lenny likes linked data, and wants to be able to mine the links that have been established programmatically.
    43. Advanced limits match EAC sections
    44. Outline
        • User Persona
        • Search and Display
        • Network graph visualization
          • Context widget (needs new name)
        • Linked Data / RDF
        • Future Plans
    45. TinkerPop graph database stack
        • Simple "property graph" model
        • "JDBC for graph databases" [SNAC is using Neo4j for the graphDB]
        • XPath-like "Gremlin" for graph query
        • REST interfaces with "Rexster"
        • For me, this was 10 to 100 times easier than using RDF
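To show what a "property graph" means in practice, here is a tiny in-memory sketch in plain Python. It does not use the TinkerPop, Gremlin, or Neo4j APIs; the class, the vertex IDs, and the `correspondedWith` edge label are all hypothetical. The names and the finding aid identifier come from the example EAD record.

```python
class PropertyGraph:
    """Vertices and edges both carry arbitrary key/value properties."""

    def __init__(self):
        self.vertices = {}  # vertex id -> dict of properties
        self.edges = []     # (out_id, label, in_id, dict of properties)

    def add_vertex(self, vertex_id, **properties):
        self.vertices[vertex_id] = dict(properties)

    def add_edge(self, out_id, label, in_id, **properties):
        if out_id not in self.vertices or in_id not in self.vertices:
            raise KeyError("both endpoints must exist before adding an edge")
        self.edges.append((out_id, label, in_id, dict(properties)))

    def out(self, vertex_id, label=None):
        """IDs of vertices reached by outgoing edges, optionally filtered by label."""
        return [
            in_id
            for (out_id, edge_label, in_id, _props) in self.edges
            if out_id == vertex_id and (label is None or edge_label == label)
        ]


graph = PropertyGraph()
graph.add_vertex("warren", name="Warren, John Byrne Leicester, 1835-1895", entity_type="person")
graph.add_vertex("gosse", name="Gosse, Edmund William, 1849-1928", entity_type="person")
graph.add_vertex("bridges", name="Bridges, Robert Seymour, 1844-1930", entity_type="person")
graph.add_edge("warren", "correspondedWith", "gosse", source="GB 0133 TAB")
graph.add_edge("warren", "correspondedWith", "bridges", source="GB 0133 TAB")

correspondents = [graph.vertices[v]["name"] for v in graph.out("warren", "correspondedWith")]
```

Storing the source finding aid as an edge property is the kind of thing the property graph model makes straightforward, since edges can hold data without extra modeling.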
    47. What is Linked Open Data?
        • W3C Semantic Web Technology Stack
        • Web of atomized data, not a web of documents
        • RDF; OWL ontologies; SPARQL queries; triple/quad/quint stores
        • httpRange-14; content negotiation; CURIE
        • No restrictions on data use; free and easy license
        • Lenny wants it, but does Randy?
    48. What is Linked Open Data?
        • Getting to the good stuff
          • Blue underlined text
          • Pulling in data from multiple sources, in an intelligent way, into a "document"
        • Understand and discover relationships
        • Open access for research, education, private study and other fair use
    49. RDFa owl:sameAs
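An owl:sameAs statement simply asserts that two URIs identify the same entity. The sketch below writes such a statement as an N-Triples line rather than as RDFa markup, which is what the slide shows. Both URIs are placeholders on example.org, not real SNAC or VIAF identifiers.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def same_as_triple(subject_uri, object_uri):
    """Format one owl:sameAs statement as an N-Triples line."""
    return f"<{subject_uri}> <{OWL_SAME_AS}> <{object_uri}> ."


# Placeholder URIs for a merged identity record and its matched authority record.
triple = same_as_triple(
    "http://example.org/snac/entity/1",
    "http://example.org/authority/record/1",
)
print(triple)
```

Publishing one such link per matched authority record is what lets a linked-data consumer such as the "Lenny" persona follow a merged identity out to the authority file.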
    50. HTML 5 microdata in chron list
    51. RDF of the social graph
        Thanks Ed Summers!
    52. Silvia Mazzini, regesta.exe srl
        http://templates.xdams.net/IBC/ontology/eac-cpf.rdf
    53. &mode=xml2owl [experimental]
    54. My opinion on the use cases for W3C RDF tech
        • Good for publishing data
        • Good for controlled vocabularies
        • Data models?
        • Most people with open source RDF-store type systems do the real stuff with Solr
        • Consider a graph database
    55. Outline
        • User Persona
        • Search and Display
        • Linked Data / RDF
        • Network graph visualization
        • Future Plans
    56. Future Plans
        • Conduct assessment activities involving members of target audiences to establish mental model of users for design work
        • Scale interface to millions of names
        • Visualizations useful and integrated (network and geospatial)
        • Stable URLs between batches for linked data
        • Social and personalization features (gateway to crowdsourcing)
        • Integration with local systems (such as with the context
    57. Links
        • Photo attribution http://www.flickr.com/photos/dsevilla
        • http://xtf.cdlib.org/
        • http://code.google.com/p/eac-graph-load/source/bro
        • http://tinkerpop.com/
        • http://thejit.org/
        • https://github.com/tingletech/snac-related-widget
