@hvdsomp
VIVO Conference 2019, September 5 2019, Podgorica, Montenegro
Herbert Van de Sompel
DANS
@hvdsomp
https://orcid.org/0000-0002-0715-6126
Collecting the Organizational Scholarly Record
2013 - EgoSystem
James Powell, Harihar Shankar, Marko Rodriguez, and Herbert Van de Sompel (2014) EgoSystem: Where are our Alumni? code{4}lib journal, issue 24. https://journal.code4lib.org/articles/9519
EgoSystem Team
• Los Alamos National Laboratory:
• James Powell
• Harihar Shankar
• Herbert Van de Sompel
• Aurelius:
• Marko Rodriguez
Motivation
• When postdocs leave LANL, the local information systems
maintain very little information about them
• But senior management is interested in engaging them after they
leave LANL as Ambassadors and Advocates
• They need answers to questions like:
• Who is currently working where?
• Who is involved in what areas of research?
• Who might serve as advocates for the Lab?
• Who knows someone who knows someone we need to
connect with?
2012 - Initial Approach: Set Up a VIVO Instance
• 2700+ records were
ingested from LANL
Postdoc Office data to
create initial user profiles
• 8 postdoc alumni were
contacted to complete
their profile
Doubts about the VIVO Instance
• Up-to-date information at all times is essential to meet the needs of senior LANL management
• Some existing VIVO instances seemed to have been pre-populated but then remained static after launch
• Would current and former postdocs be interested in maintaining a professional profile on a VIVO instance intended to help out LANL?
2013 - New Approach: Leverage Network-Level Information
• Leverage public, network-level information pertaining to LANL Alumni
• Find their network presences - social portals, scientific portals, homepages, etc.
• Recurrently collect information from those presences: current employer, social network neighborhood, geo location, etc.
• Create applications based on that information
• Rationale: People have incentives to keep network-layer information up-to-date
• Goal: Devise a sustainable approach to gather and use up-to-date information pertaining to LANL Alumni
Available information elements for postdocs:
• Z#
• Name
• Institutions:
o PhD University; LANL; Institution after LANL
• Field of Study
• Discipline
Find network identities:
• Run various queries based on the information elements against:
o Yahoo BOSS API; MS Academic Search API
• Search for candidate identities:
o LinkedIn; MS Academic; Twitter; Homepage; Blogger; SlideShare; Wikipedia
• Rank and select candidate identities
o Reward when: same identities from various searches; content matches information elements
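The "rank and select" step can be sketched as a small scoring function. This is only an illustration under stated assumptions, not the EgoSystem code: the `Candidate` class, the weights, and the threshold are hypothetical. It rewards a candidate that several searches agree on and whose page content mentions the known information elements.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    """A candidate web identity (hypothetical structure, for illustration only)."""
    url: str
    found_by: set = field(default_factory=set)  # searches that returned this URL
    page_text: str = ""                          # text retrieved from the candidate page


def score(candidate, elements, search_weight=1.0, match_weight=2.0):
    """Reward agreement between searches and matches with known information elements."""
    # Reward: the same identity was returned by several searches.
    agreement = search_weight * len(candidate.found_by)
    # Reward: the page content mentions known elements (name, institutions, discipline).
    text = candidate.page_text.lower()
    matches = sum(1 for value in elements if value.lower() in text)
    return agreement + match_weight * matches


def select(candidates, elements, threshold=4.0):
    """Return the best-scoring candidate, or None when nothing clears the threshold."""
    if not candidates:
        return None
    best = max(candidates, key=lambda c: score(c, elements))
    return best if score(best, elements) >= threshold else None
```

Returning `None` below the threshold, rather than the best available guess, is one simple way to trade recall for precision.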
LinkedIn Identity
Twitter Identity
Network-derived information:
• Identities:
o LinkedIn; MS Academic; Twitter; Homepage; Blogger; SlideShare; Wikipedia
• Additional information elements:
o Current institution; geo location; updated discipline
Web Identities Discovered Per Postdoc
[Figure: bar chart with categories none, one, two, three, four, five; vertical axis from 0 to 1800]
Resulting Identity Types per Postdoc
[Figure: bar chart with categories LANL, MS Academic, LinkedIn, Twitter; vertical axis from 0 to 3500]
Evaluation of the Discovery Algorithm
• Random set of 100 postdocs
• MS Academic
o 86 correct
- 71 correctly discovered identities
- 15 correctly labeled as not having an identity
o 14 incorrect
- 2 discovered identities did not match the postdoc
- 12 existing identities were not discovered
• Algorithms favored precision over recall
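The MS Academic counts can be turned into the usual metrics with a few lines of arithmetic. The mapping of the four counts onto true/false positives and negatives is an interpretation of the slide, not something it states explicitly.

```python
# Counts taken from the MS Academic evaluation of 100 randomly selected postdocs.
true_positives = 71   # identities correctly discovered
true_negatives = 15   # correctly labeled as having no identity
false_positives = 2   # discovered identity did not match the postdoc
false_negatives = 12  # existing identity was not discovered

total = true_positives + true_negatives + false_positives + false_negatives
accuracy = (true_positives + true_negatives) / total
precision = true_positives / (true_positives + false_positives)
recall = true_positives / (true_positives + false_negatives)

print(f"accuracy={accuracy:.2f} precision={precision:.3f} recall={recall:.3f}")
# prints: accuracy=0.86 precision=0.973 recall=0.855
```

Under this reading, precision (about 0.97) is clearly higher than recall (about 0.86), which is consistent with the statement that the algorithms favored precision over recall.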
Network-derived information:
• Network neighborhood:
o Social network ~ Twitter: followers, followed
o Academic network ~ MS Academic: co-authors
o Affiliations ~ LinkedIn, homepage
• Artifacts: papers, slide decks
• Concepts
Property Graph Representation of Resulting Information
• Platonic vertices
o Persons
o Institutions
o Artifacts
o Concepts
• Affiliation vertices
o Different types
o Different time periods
• Graph extent, started with 3,005 postdocs:
o Vertices: 9,015,844
o Edges: 19,399,683
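A toy version of this model, using plain Python dictionaries instead of a graph database, may make the role of affiliation vertices clearer. The edge labels (`hasAffiliation`, `withInstitution`) and property names are assumptions made for this sketch; the actual EgoSystem schema is not given on the slide.

```python
from itertools import count

_ids = count(1)
vertices = {}   # vertex id -> property map
edges = []      # (source id, label, target id)


def add_vertex(kind, **properties):
    """Add a vertex with a 'kind' and arbitrary key/value properties."""
    vertex_id = next(_ids)
    vertices[vertex_id] = {"kind": kind, **properties}
    return vertex_id


def add_affiliation(person, institution, role, start, end=None):
    """Connect a person to an institution through an affiliation vertex."""
    # The affiliation vertex carries the type (role) and the time period.
    affiliation = add_vertex("affiliation", role=role, start=start, end=end)
    edges.append((person, "hasAffiliation", affiliation))
    edges.append((affiliation, "withInstitution", institution))
    return affiliation


def institutions_of(person, year):
    """Names of institutions the person was affiliated with in the given year."""
    names = []
    for source, label, affiliation in edges:
        if source != person or label != "hasAffiliation":
            continue
        props = vertices[affiliation]
        if props["start"] <= year and (props["end"] is None or year <= props["end"]):
            names += [vertices[target]["name"] for src, lbl, target in edges
                      if src == affiliation and lbl == "withInstitution"]
    return names
```

Putting type and time period on a separate affiliation vertex, rather than on a direct person-to-institution edge, lets one person hold several affiliations with the same institution over different periods.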
Graph Database for Storage/Retrieval/Analysis
Titan Distributed Graph Database
http://titan.thinkaurelius.com/
EgoSystem Application
• Simple web query interface
• Shareable profile page for individuals
• Graph analytics (aggregate social networks, path analysis) and graph visualization
• “Who’s where” search (for when the LANL Director travels)
• Capability to add a non-LANL person to the graph
o To find the closest path to that person via a LANL postdoc
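The "closest path via a LANL postdoc" feature amounts to a shortest-path search from the newly added person to the nearest postdoc vertex. A breadth-first search over an adjacency map is one way to sketch it; the graph layout and function name below are illustrative assumptions, and the production system ran on a graph database rather than in-memory dictionaries.

```python
from collections import deque


def closest_path(graph, start, is_target):
    """Breadth-first search: shortest path from start to any vertex satisfying is_target."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if is_target(node):
            return path
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None  # no postdoc is reachable from the start vertex
```

Because breadth-first search explores vertices in order of distance, the first postdoc it reaches is connected by a path with the fewest possible hops.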
Success?
• At the end of the demo meeting, the director said (paraphrasing)
o “I didn’t know what I wanted when we first met but this looks
like what I want, what I need.”
• Project discontinued because of the inability to access LinkedIn data in a legitimate manner
• As a result of heuristic-based processes, the database and query results are not necessarily correct or complete. This made EgoSystem an approximating application.
• Fantastic 2-month (~6 MM) project that did not yield a production system but in which we learned an awful lot
2016 - Autoload
James Powell, Martin Klein, and Herbert Van de Sompel (2017) Autoload: a pipeline for expanding the holdings of an Institutional Repository enabled by ResourceSync. code{4}lib journal, issue 36. https://journal.code4lib.org/articles/12427
2018 – myresearch.institute
The Scholarly Orphans project
is funded by the Andrew W. Mellon Foundation
myresearch.institute Team
• Los Alamos National Laboratory:
• Lyudmila Balakireva
• Martin Klein
• James Powell
• Harihar Shankar
• Herbert Van de Sompel
• Old Dominion University:
• Sawood Alam
• Grant Atkins
• Shawn Jones
• Mat Kelly
• Michael L. Nelson
Research and Research Communication on the Web
• Consideration
• Researchers are increasingly using a variety of web platforms for collaboration and communication
• Why?
• Many of these platforms have desirable characteristics
• Versioning
• Time stamping
• Social embedding
• Their institutions do not provide platforms that have global reach
• Collaboration, cf. GitHub ~ productivity
• Communication, cf. SlideShare ~ visibility
Research and Research Communication on the Web
• Consideration
• Researchers are increasingly using a variety of web platforms for collaboration and communication
• Web Platforms:
• Dedicated to scholarship:
• Commercial: e.g., FigShare, Publons
• Not for profit: e.g., OSF, Zenodo
• General purpose:
• Commercial: e.g., GitHub, SlideShare
• Not for profit: e.g., Wikipedia, Wikidata
Emma Schymanski
https://orcid.org/0000-0001-6868-8145
https://github.com/schymane
https://www.slideshare.net/EmmaSchymanski
https://figshare.com/authors/Emma_Schymanski/5087039
https://publons.com/author/1538491/emma-schymanski#profile
https://www.eawag.ch/en/aboutus/portrait/organisation/staff/profile/emma-schymanski/
Shawn Jones
https://orcid.org/0000-0002-4372-870X
http://www.shawnmjones.org/
https://github.com/shawnmjones
https://www.slideshare.net/shawnmjones
https://en.wikipedia.org/wiki/User:Shawnmjones
https://www.blogger.com/profile/17827543974149663194
Research and Research Communication on the Web
• Consideration
• Researchers deposit artifacts in web platforms
• Status quo - The researchers’ institutions are in the dark
• Do not know about the existence of these artifacts
• Do not have a copy of these artifacts
Research and Research Communication on the Web
• Consideration
• Researchers deposit artifacts in web platforms
• Status quo - Uncertainty regarding long-term access
• Commercial: changing business model, no preservation commitment
• Not for profit: unpredictable funding stream
Research and Research Communication on the Web
• Consideration
• Researchers deposit artifacts in web platforms
• Status quo - Not systematically archived
• No frameworks like LOCKSS/Portico exist for these artifacts
• Researchers only selectively deposit artifacts in portals that provide archival guarantees, to obtain a citable DOI
• Can’t expect researchers to (also) upload all artifacts in IRs
• Web archives only incidentally archive these artifacts, cf. anecdotal & Hiberlink project evidence
Martin Klein, Herbert Van de Sompel, et al. (2014) Scholarly context not found. PLOS ONE. https://doi.org/10.1371/journal.pone.0115253
Emma’s SlideShare Artifact: 0 Mementos
https://www.slideshare.net/EmmaSchymanski/dmcm2018-community-resources-connecting-chemistry-and-toxicity-knowledge
http://timetravel.mementoweb.org/
Shawn’s GitHub Artifact: 1 Memento
https://github.com/shawnmjones/mediawiki
https://web.archive.org/web/*/https://github.com/shawnmjones/mediawiki
Evidence from the Hiberlink Project
Web resources referenced in Elsevier corpus (1996-2012)
without representative Memento in public web archives
The Scholarly Orphans Project: How to Archive these Artifacts?
• Explores an institution-driven paradigm
• Academic institutions typically have a long shelf life
• A basic premise underlying e.g., LOCKSS, perma.cc
• An academic institution should be interested in capturing the
artifacts (intellectual property) its scholars deposit on the web
• Collecting and archiving such artifacts aligns with the
mission of academic libraries
An Institutional Perspective
The Scholarly Orphans Project: How to Archive these Artifacts?
• Explores a paradigm inspired by web archiving
• Scale of the problem
• Can’t expect researchers to upload all artifacts in an institutional repository
• Bilateral agreements for archival purposes with most web portals are unlikely
A Web Archiving Perspective
myresearch.institute Prototype Pipeline
Tracking Artifacts
Tracking Artifacts - Description
• In order to track artifacts that were recently deposited by an
institutional researcher in a portal, one reasonably needs:
• The web identity of the researcher in the portal
• Algorithmic discovery, cf. EgoSystem
• Discovery via a registry, cf. ORCID paper
• Manual collection
• A portal API that supports:
• Access by web identity
• Access to contributions “since …” for the web identity
• Result of tracking:
• URI(s) of new artifact(s) discovered in the portal
Klein, M., and Van de Sompel, H. (2017) Discovering Scholarly Orphans Using ORCID. Proceedings of the 2017
ACM/IEEE Joint Conference on Digital Libraries https://arxiv.org/abs/1703.09343
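The tracking requirements (access by web identity, plus access to contributions "since" a point in time) can be sketched as a small polling function. Everything named here is an assumption for illustration: `fetch_contributions` stands in for whatever API a given portal offers, and `FAKE_PORTAL` is made-up data, not output from the prototype.

```python
from datetime import datetime, timezone


def track(fetch_contributions, web_identity, since):
    """Return (new artifact URIs, updated 'since' checkpoint) for one web identity.

    fetch_contributions is a stand-in for a portal API call that accepts a web
    identity and a 'since' timestamp; its name and signature are assumptions.
    """
    items = fetch_contributions(web_identity, since)
    new_uris = [item["uri"] for item in items if item["created"] > since]
    latest = max((item["created"] for item in items), default=since)
    return new_uris, max(latest, since)


# In-memory stand-in for a portal, keyed by web identity (illustrative data only).
FAKE_PORTAL = {
    "jdoe": [
        {"uri": "https://portal.example/jdoe/1",
         "created": datetime(2018, 8, 2, tzinfo=timezone.utc)},
        {"uri": "https://portal.example/jdoe/2",
         "created": datetime(2018, 9, 15, tzinfo=timezone.utc)},
    ]
}


def fake_fetch(web_identity, since):
    """Pretend portal API: contributions by this identity created after 'since'."""
    return [c for c in FAKE_PORTAL.get(web_identity, []) if c["created"] > since]
```

Carrying the returned checkpoint into the next call means each poll only reports artifacts that appeared after the previous one, which is why the "since" capability matters so much in a portal API.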
Tracking Artifacts - Challenges
• Portal API access by web identity
• Broadly supported by general purpose portals
• Typically not supported by scholarly portals
• Some lack an API altogether
• Should add ORCID access to APIs
• OAI-PMH and ResourceSync need sets per web identity
• Professional versus personal contributions
Capturing Artifacts
Capturing Artifacts - Description
• The capture process takes as input the URI of a new artifact
discovered in a portal
• Its task is to create a representative institutional capture of the
artifact
• Result of capture:
• WARC file for new artifact in an institutional archive
Capturing Artifacts - Challenges
• Create a high-fidelity capture using an approach that scales for a
steady stream of new artifacts
• Handle dynamic content & interactive features of web pages
• Determine the web boundary of the artifact
• More than the input artifact URI
• The boundary is in the eye of the beholder
• We made a significant breakthrough with the Memento Tracer
framework
• Others (cf. webrecorder.io Autopilot, IA Brozzler) are working on
the same problem
Memento Tracer: http://tracer.mementoweb.org
Autopilot: https://blog.webrecorder.io/2019/08/14/autopilot
Brozzler: https://github.com/internetarchive/brozzler
Capturing Artifacts
Memento Tracer - Framework
http://tracer.mementoweb.org
Archiving Artifacts
Archiving Artifacts - Description
• The archiving process takes as input the URI of a WARC file generated by the capture process
• Its task is to ingest the WARC file in a cross-institutional web archive
• This can be achieved using off-the-shelf web archiving software, e.g., pywb, OpenWayback
• Result of archiving:
• Mementos pertaining to the newly discovered artifact in a cross-institutional, Memento-compliant web archive
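Once Mementos are in a Memento-compliant archive, a client can ask for the version closest to a given moment by sending an `Accept-Datetime` header, as defined by the Memento protocol (RFC 7089). The helper below only builds that header; the function name is hypothetical, and which archive endpoint to send it to is deliberately left out because the slides do not specify one.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def accept_datetime_header(moment):
    """Build the Accept-Datetime request header used in Memento datetime negotiation."""
    # The header value is an RFC 1123 date expressed in GMT.
    # A naive datetime is interpreted as local time by astimezone().
    as_utc = moment.astimezone(timezone.utc)
    return {"Accept-Datetime": format_datetime(as_utc, usegmt=True)}
```

For example, `accept_datetime_header(datetime(2018, 8, 27, 12, 0, tzinfo=timezone.utc))` yields the value `Mon, 27 Aug 2018 12:00:00 GMT`.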
Archiving Artifacts - Challenges
• Attempted to use ipwb, a pywb version that uses IPFS
• Cross-institutional distributed file system with redundancy
• Ran out of time to get it operationally stable
Sawood Alam, Mat Kelly, and Michael L. Nelson (2016) InterPlanetary Wayback: The Permanent Web Archive
https://doi.org/10.1145/2910896.2925467
myresearch.institute - Researchers
• Uniquely identified by ORCIDs
• Web identities in multiple portals
• Create various types of artifacts
myresearch.institute - Portals
• Tracking started August 27 2018
• Tracking artifacts created starting
August 1 2018
Scholarly Orphans – Pipeline
• 16,005 unique artifacts tracked, captured, and archived between August 1 2018 and August 28 2019
• 60MB event database
• 83GB of WARC files
• 3GB of web archive index
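A quick back-of-the-envelope calculation puts these totals in per-day and per-artifact terms. The inclusive day count and the decimal conversion (1 GB = 1000 MB) are assumptions of this sketch.

```python
from datetime import date

artifacts = 16_005
warc_gb = 83
start, end = date(2018, 8, 1), date(2019, 8, 28)

days = (end - start).days + 1                  # inclusive of both end dates
per_day = artifacts / days
mb_per_artifact = warc_gb * 1000 / artifacts   # assuming 1 GB = 1000 MB

print(f"{days} days, {per_day:.1f} artifacts/day, {mb_per_artifact:.1f} MB/artifact")
# prints: 393 days, 40.7 artifacts/day, 5.2 MB/artifact
```

Roughly 40 artifacts per day at about 5 MB of WARC data each is a modest load for the set of researchers tracked by the prototype.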
Showtime: myresearch.institute Portal
https://myresearchinstitute.org
Success?
• “Interesting project! I’m happy to participate.”
“One more thing, is it possible to get a copy of the URI-Rs that
you guys detected so that I can feed them into an archive of my
choice?...”
• Prototype pipeline developed over 8 months (24 MM)
• Metrics of the prototype demonstrate that researchers generate
a lot of artifacts (that their institutions are typically not aware of)
• Metrics of the prototype suggest it should be possible to run a
production pipeline at the scale of an academic institution
• But would they …?
Some Final Thoughts
• For a number of reasons, applications that leverage network-level
information at scale (e.g. EgoSystem, myresearch.institute,
Autoload) tend not to be perfect. But they are automatic.
• Do institutions reserve sufficient resources for innovation and
failure? The alternative seems to be outsourcing and loss of
expertise.
• Ideas/visions are rarely fully realized when working on them. But
many times, the work does improve on the status quo. So keep
dreaming and working!

 
定制(CC毕业证书)美国美国社区大学毕业证成绩单原版一比一
定制(CC毕业证书)美国美国社区大学毕业证成绩单原版一比一定制(CC毕业证书)美国美国社区大学毕业证成绩单原版一比一
定制(CC毕业证书)美国美国社区大学毕业证成绩单原版一比一3sw2qly1
 
定制(UAL学位证)英国伦敦艺术大学毕业证成绩单原版一比一
定制(UAL学位证)英国伦敦艺术大学毕业证成绩单原版一比一定制(UAL学位证)英国伦敦艺术大学毕业证成绩单原版一比一
定制(UAL学位证)英国伦敦艺术大学毕业证成绩单原版一比一Fs
 
VIP Kolkata Call Girl Alambazar 👉 8250192130 Available With Room
VIP Kolkata Call Girl Alambazar 👉 8250192130  Available With RoomVIP Kolkata Call Girl Alambazar 👉 8250192130  Available With Room
VIP Kolkata Call Girl Alambazar 👉 8250192130 Available With Roomdivyansh0kumar0
 
Packaging the Monolith - PHP Tek 2024 (Breaking it down one bite at a time)
Packaging the Monolith - PHP Tek 2024 (Breaking it down one bite at a time)Packaging the Monolith - PHP Tek 2024 (Breaking it down one bite at a time)
Packaging the Monolith - PHP Tek 2024 (Breaking it down one bite at a time)Dana Luther
 

Recently uploaded (20)

Git and Github workshop GDSC MLRITM
Git and Github  workshop GDSC MLRITMGit and Github  workshop GDSC MLRITM
Git and Github workshop GDSC MLRITM
 
Chennai Call Girls Porur Phone 🍆 8250192130 👅 celebrity escorts service
Chennai Call Girls Porur Phone 🍆 8250192130 👅 celebrity escorts serviceChennai Call Girls Porur Phone 🍆 8250192130 👅 celebrity escorts service
Chennai Call Girls Porur Phone 🍆 8250192130 👅 celebrity escorts service
 
A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)
A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)
A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)
 
Russian Call Girls in Kolkata Ishita 🤌 8250192130 🚀 Vip Call Girls Kolkata
Russian Call Girls in Kolkata Ishita 🤌  8250192130 🚀 Vip Call Girls KolkataRussian Call Girls in Kolkata Ishita 🤌  8250192130 🚀 Vip Call Girls Kolkata
Russian Call Girls in Kolkata Ishita 🤌 8250192130 🚀 Vip Call Girls Kolkata
 
Contact Rya Baby for Call Girls New Delhi
Contact Rya Baby for Call Girls New DelhiContact Rya Baby for Call Girls New Delhi
Contact Rya Baby for Call Girls New Delhi
 
Magic exist by Marta Loveguard - presentation.pptx
Magic exist by Marta Loveguard - presentation.pptxMagic exist by Marta Loveguard - presentation.pptx
Magic exist by Marta Loveguard - presentation.pptx
 
Russian Call Girls in Kolkata Samaira 🤌 8250192130 🚀 Vip Call Girls Kolkata
Russian Call Girls in Kolkata Samaira 🤌  8250192130 🚀 Vip Call Girls KolkataRussian Call Girls in Kolkata Samaira 🤌  8250192130 🚀 Vip Call Girls Kolkata
Russian Call Girls in Kolkata Samaira 🤌 8250192130 🚀 Vip Call Girls Kolkata
 
定制(Lincoln毕业证书)新西兰林肯大学毕业证成绩单原版一比一
定制(Lincoln毕业证书)新西兰林肯大学毕业证成绩单原版一比一定制(Lincoln毕业证书)新西兰林肯大学毕业证成绩单原版一比一
定制(Lincoln毕业证书)新西兰林肯大学毕业证成绩单原版一比一
 
VIP Call Girls Kolkata Ananya 🤌 8250192130 🚀 Vip Call Girls Kolkata
VIP Call Girls Kolkata Ananya 🤌  8250192130 🚀 Vip Call Girls KolkataVIP Call Girls Kolkata Ananya 🤌  8250192130 🚀 Vip Call Girls Kolkata
VIP Call Girls Kolkata Ananya 🤌 8250192130 🚀 Vip Call Girls Kolkata
 
Chennai Call Girls Alwarpet Phone 🍆 8250192130 👅 celebrity escorts service
Chennai Call Girls Alwarpet Phone 🍆 8250192130 👅 celebrity escorts serviceChennai Call Girls Alwarpet Phone 🍆 8250192130 👅 celebrity escorts service
Chennai Call Girls Alwarpet Phone 🍆 8250192130 👅 celebrity escorts service
 
The Intriguing World of CDR Analysis by Police: What You Need to Know.pdf
The Intriguing World of CDR Analysis by Police: What You Need to Know.pdfThe Intriguing World of CDR Analysis by Police: What You Need to Know.pdf
The Intriguing World of CDR Analysis by Police: What You Need to Know.pdf
 
定制(Management毕业证书)新加坡管理大学毕业证成绩单原版一比一
定制(Management毕业证书)新加坡管理大学毕业证成绩单原版一比一定制(Management毕业证书)新加坡管理大学毕业证成绩单原版一比一
定制(Management毕业证书)新加坡管理大学毕业证成绩单原版一比一
 
Font Performance - NYC WebPerf Meetup April '24
Font Performance - NYC WebPerf Meetup April '24Font Performance - NYC WebPerf Meetup April '24
Font Performance - NYC WebPerf Meetup April '24
 
Denver Web Design brochure for public viewing
Denver Web Design brochure for public viewingDenver Web Design brochure for public viewing
Denver Web Design brochure for public viewing
 
Low Rate Call Girls Kolkata Avani 🤌 8250192130 🚀 Vip Call Girls Kolkata
Low Rate Call Girls Kolkata Avani 🤌  8250192130 🚀 Vip Call Girls KolkataLow Rate Call Girls Kolkata Avani 🤌  8250192130 🚀 Vip Call Girls Kolkata
Low Rate Call Girls Kolkata Avani 🤌 8250192130 🚀 Vip Call Girls Kolkata
 
定制(CC毕业证书)美国美国社区大学毕业证成绩单原版一比一
定制(CC毕业证书)美国美国社区大学毕业证成绩单原版一比一定制(CC毕业证书)美国美国社区大学毕业证成绩单原版一比一
定制(CC毕业证书)美国美国社区大学毕业证成绩单原版一比一
 
定制(UAL学位证)英国伦敦艺术大学毕业证成绩单原版一比一
定制(UAL学位证)英国伦敦艺术大学毕业证成绩单原版一比一定制(UAL学位证)英国伦敦艺术大学毕业证成绩单原版一比一
定制(UAL学位证)英国伦敦艺术大学毕业证成绩单原版一比一
 
VIP Kolkata Call Girl Alambazar 👉 8250192130 Available With Room
VIP Kolkata Call Girl Alambazar 👉 8250192130  Available With RoomVIP Kolkata Call Girl Alambazar 👉 8250192130  Available With Room
VIP Kolkata Call Girl Alambazar 👉 8250192130 Available With Room
 
Packaging the Monolith - PHP Tek 2024 (Breaking it down one bite at a time)
Packaging the Monolith - PHP Tek 2024 (Breaking it down one bite at a time)Packaging the Monolith - PHP Tek 2024 (Breaking it down one bite at a time)
Packaging the Monolith - PHP Tek 2024 (Breaking it down one bite at a time)
 
Hot Sexy call girls in Rk Puram 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in  Rk Puram 🔝 9953056974 🔝 Delhi escort ServiceHot Sexy call girls in  Rk Puram 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Rk Puram 🔝 9953056974 🔝 Delhi escort Service
 

Collecting the organizational scholarly record

  • 1. @hvdsomp VIVO Conference 2019, September 5 2019, Podgorica, Montenegro Herbert Van de Sompel DANS @hvdsomp https://orcid.org/0000-0002-0715-6126 Collecting the Organizational Scholarly Record
  • 2. 2013 - EgoSystem • James Powell, Harihar Shankar, Marko Rodriguez, and Herbert Van de Sompel (2014) EgoSystem: Where are our Alumni? code{4}lib journal, issue 24. https://journal.code4lib.org/articles/9519
  • 3. EgoSystem Team • Los Alamos National Laboratory: • James Powell • Harihar Shankar • Herbert Van de Sompel • Aurelius: • Marko Rodriguez
  • 4. Motivation • When postdocs leave LANL, the local information systems maintain very little information about them • But senior management is interested in engaging them after they leave LANL as Ambassadors and Advocates • They need answers to questions like: • Who is currently working where? • Who is involved in what areas of research? • Who might serve as advocates for the Lab? • Who knows someone who knows someone we need to connect with?
  • 5. 2012 - Initial Approach: Set Up a VIVO Instance • 2700+ records were ingested from LANL Postdoc Office data to create initial user profiles • 8 postdoc alumni were contacted to complete their profile
  • 6. Doubts about the VIVO Instance • Up-to-date information at all times is essential to meet the needs of senior LANL management • Some existing VIVO instances seemed to have been pre-populated but then remained static after launch • Would current and former postdocs be interested in maintaining a professional profile on a VIVO instance intended to help out LANL?
  • 7. 2013 - New Approach: Leverage Network-Level Information • Leverage public, network-level information pertaining to LANL Alumni • Find their network presences - social portals, scientific portals, homepages, etc. • Recurrently collect information from those presences: current employer, social network neighborhood, geo location, etc. • Create applications based on that information • Rationale: People have incentives to keep network-layer information up-to-date • Goal: Devise a sustainable approach to gather and use up-to-date information pertaining to LANL Alumni
  • 9. Available information elements for PostDocs: • Z# • Name • Institutions: o PhD University; LANL; Institution after LANL • Field of Study • Discipline
  • 10. Find network identities: • Various queries based on information elements in: o Yahoo Boss API; MS Academic Search API • Search for candidate identities: o LinkedIn; MS Academic; Twitter; Homepage; Blogger; SlideShare; Wikipedia • Rank and select candidate identities o Reward when: same identities from various searches; content matches information elements
  • 11. LinkedIn Identity
  • 12. LinkedIn Identity
  • 13. LinkedIn Identity
  • 14. Twitter Identity
  • 15. Network-derived information: • Identities: o LinkedIn; MS Academic; Twitter; Homepage; Blogger; SlideShare; Wikipedia • Additional information elements: o Current institution; geo location; updated discipline
  • 16. Web Identities Discovered Per Postdoc [bar chart; categories: none, one, two, three, four, five; count axis 0 to 1800]
  • 17. Resulting Identity Types per Postdoc [bar chart; categories: LANL, MS Academic, LinkedIn, Twitter; count axis 0 to 3500]
  • 18. Evaluation of the Discovery Algorithm • Random set of 100 postdocs • MS Academic o 86 correct - 71 correctly discovered identities - 15 correctly labeled as not having identity o 14 incorrect - 2 discovered identities did not match the postdoc - 12 existing identities were not discovered • Algorithms favored precision over recall
  • 19. Network-derived information: • Network neighborhood: o Social network ~ Twitter: followers, followed o Academic network ~ co-authors MS Academic o Affiliations ~ LinkedIn, homepage • Artifacts: papers, slide decks • Concepts
  • 20. Property Graph Representation of Resulting Information • Platonic vertices o Persons o Institutions o Artifacts o Concepts • Affiliation vertices o Different types o Different time periods • Graph extent, started with 3,005 postdocs: o Vertices: 9,015,844 o Edges: 19,399,683
  • 21. Property Graph Representation of Resulting Information
  • 22. Graph Database for Storage/Retrieval/Analysis • Titan Distributed Graph Database http://titan.thinkaurelius.com/
  • 23. EgoSystem Application • Simple web query interface • Shareable profile page for individuals • Graph analytics (aggregate social networks, path analysis) and graph visualization • Who’s where (the LANL Director travels) search • Capability to add non-LANL person to the graph o To find closest path to the person via a LANL postdoc
  • 33. Success? • At the end of the demo meeting, the director said (paraphrasing) o “I didn’t know what I wanted when we first met but this looks like what I want, what I need.” • Project discontinued because of the inability to access LinkedIn data in a legitimate manner • As a result of heuristic-based processes, the database and query results are not necessarily correct or complete. This made EgoSystem an approximating application. • Fantastic 2-month (~ 6 MM) project that did not yield a production system but in which we learned an awful lot
  • 34. 2016 - Autoload • James Powell, Martin Klein, and Herbert Van de Sompel (2017) Autoload: a pipeline for expanding the holdings of an Institutional Repository enabled by ResourceSync. code{4}lib journal, issue 36. https://journal.code4lib.org/articles/12427
  • 35. 2018 – myresearch.institute • The Scholarly Orphans project is funded by the Andrew W. Mellon Foundation
  • 36. myresearch.institute Team • Los Alamos National Laboratory: • Lyudmila Balakireva • Martin Klein • James Powell • Harihar Shankar • Herbert Van de Sompel • Old Dominion University: • Sawood Alam • Grant Atkins • Shawn Jones • Mat Kelly • Michael L. Nelson
  • 37. Research and Research Communication on the Web • Consideration • Researchers are increasingly using a variety of web platforms for collaboration and communication • Why? • Many of these platforms have desirable characteristics • Versioning • Time stamping • Social embedding • Their institutions do not provide platforms that have global reach • Collaboration, cf. GitHub ~ productivity • Communication, cf. SlideShare ~ visibility
  • 38. Research and Research Communication on the Web • Consideration • Researchers are increasingly using a variety of web platforms for collaboration and communication • Web Platforms: • Dedicated to scholarship: • Commercial: e.g., FigShare, Publons • Not for profit: e.g., OSF, Zenodo • General purpose: • Commercial: e.g., GitHub, SlideShare • Not for profit: e.g., Wikipedia, Wikidata
  • 39. Emma Schymanski https://orcid.org/0000-0001-6868-8145 https://github.com/schymane https://www.slideshare.net/EmmaSchymanski https://figshare.com/authors/Emma_Schymanski/5087039 https://publons.com/author/1538491/emma-schymanski#profile https://www.eawag.ch/en/aboutus/portrait/organisation/staff/profile/emma-schymanski/
  • 40. Shawn Jones https://orcid.org/0000-0002-4372-870X http://www.shawnmjones.org/ https://github.com/shawnmjones https://www.slideshare.net/shawnmjones https://en.wikipedia.org/wiki/User:Shawnmjones https://www.blogger.com/profile/17827543974149663194
  • 41. Research and Research Communication on the Web • Consideration • Researchers deposit artifacts in web platforms • Status quo - The researchers’ institutions are in the dark • Do not know about the existence of these artifacts • Do not have a copy of these artifacts
  • 42. Research and Research Communication on the Web • Consideration • Researchers deposit artifacts in web platforms • Status quo – Uncertainty regarding long-term access • Commercial: changing business model, no preservation commitment • Not for profit: unpredictable funding stream
  • 43. Research and Research Communication on the Web • Consideration • Researchers deposit artifacts in web platforms • Status quo - Not systematically archived • No frameworks like LOCKSS/Portico exist for these artifacts • Researchers only selectively deposit artifacts in portals that provide archival guarantees, to obtain a cite-able DOI • Can’t expect researchers to (also) upload all artifacts in IRs • Web archives only incidentally archive these artifacts, cf. anecdotal & Hiberlink project evidence • Martin Klein, Herbert Van de Sompel, et al. (2014) Scholarly context not found. In: PLOS ONE https://doi.org/10.1371/journal.pone.0115253
  • 44. Emma’s SlideShare Artifact: 0 Mementos https://www.slideshare.net/EmmaSchymanski/dmcm2018-community-resources-connecting-chemistry-and-toxicity-knowledge http://timetravel.mementoweb.org/
  • 45. Shawn’s GitHub Artifact: 1 Memento https://github.com/shawnmjones/mediawiki https://web.archive.org/web/*/https://github.com/shawnmjones/mediawiki
  • 46. Evidence from the Hiberlink Project • Web resources referenced in Elsevier corpus (1996-2012) without representative Memento in public web archives
  • 47. The Scholarly Orphans Project: How to Archive these Artifacts? • Explores an institution-driven paradigm • Academic institutions typically have a long shelf life • A basic premise underlying e.g., LOCKSS, perma.cc • An academic institution should be interested in capturing the artifacts (intellectual property) its scholars deposit on the web • Collecting and archiving such artifacts aligns with the mission of academic libraries
  • 48. An Institutional Perspective
  • 49. The Scholarly Orphans Project: How to Archive these Artifacts? • Explores a paradigm inspired by web archiving • Scale of the problem • Can’t expect researchers to upload all artifacts in an institutional repository • Bilateral agreements for archival purposes with most web portals unlikely
  • 50. A Web Archiving Perspective
  • 51. myresearch.institute Prototype Pipeline
  • 52. Tracking Artifacts
  • 53. Tracking Artifacts - Description • In order to track artifacts that were recently deposited by an institutional researcher in a portal, one reasonably needs: • The web identity of the researcher in the portal • Algorithmic discovery, cf. EgoSystem • Discovery via a registry, cf. ORCID paper • Manual collection • A portal API that supports: • Access by web identity • Access to contributions “since …” for the web identity • Result of tracking: • URI(s) of new artifact(s) discovered in the portal • Klein, M., and Van de Sompel, H. (2017) Discovering Scholarly Orphans Using ORCID. Proceedings of the 2017 ACM/IEEE Joint Conference on Digital Libraries https://arxiv.org/abs/1703.09343
  • 54. Tracking Artifacts - Challenges • Portal API access by web identity • Broadly supported by general purpose portals • Typically not supported by scholarly portals • Some lack an API altogether • Should add ORCID access to APIs • OAI-PMH and ResourceSync need sets per web identity • Professional versus personal contributions
  • 55. Capturing Artifacts
  • 56. Capturing Artifacts - Description • The capture process takes as input the URI of a new artifact discovered in a portal • Its task is to create a representative institutional capture of the artifact • Result of capture: • WARC file for new artifact in an institutional archive
  • 57. Capturing Artifacts - Challenges • Create a high-fidelity capture using an approach that scales for a steady stream of new artifacts • Handle dynamic content & interactive features of web pages • Determine the web boundary of the artifact • More than the input artifact URI • The boundary is in the eye of the beholder • We made a significant breakthrough with the Memento Tracer framework • Others (cf. webrecorder.io Autopilot, IA Brozzler) are working on the same problem • Memento Tracer: http://tracer.mementoweb.org • Autopilot: https://blog.webrecorder.io/2019/08/14/autopilot • Brozzler: https://github.com/internetarchive/brozzler
  • 58. Capturing Artifacts
  • 59. Memento Tracer - Framework http://tracer.mementoweb.org
  • 60. Archiving Artifacts
  • 61. Archiving Artifacts - Description • The archiving process takes as input the URI of a WARC file generated by the capture process • Its task is to ingest the WARC file in a cross-institutional web archive • This can be achieved using off-the-shelf web archiving software, e.g., pywb, Open Wayback • Result of archiving: • Mementos pertaining to newly discovered artifact in a cross-institutional, Memento-compliant web archive
  • 62. Archiving Artifacts - Challenges • Attempted to use ipwb, a pywb version that uses IPFS • Cross-institutional distributed file system with redundancy • Ran out of time to get it operationally stable • Sawood Alam, Mat Kelly, and Michael L. Nelson (2016) InterPlanetary Wayback: The Permanent Web Archive https://doi.org/10.1145/2910896.2925467
  • 63. myresearch.institute - Researchers • Uniquely identified by ORCIDs • Web identities in multiple portals • Create various types of artifacts
  • 64. myresearch.institute - Portals • Tracking started August 27 2018 • Tracking artifacts created starting August 1 2018
  • 65. Scholarly Orphans – Pipeline • 16,005 unique artifacts tracked, captured, and archived between 2018-08-01 and 2019-08-28 • 60MB event database • 83GB of WARC files • 3GB of web archive index
  • 66. Showtime: myresearch.institute Portal https://myresearchinstitute.org
  • 67. Success? • “Interesting project! I’m happy to participate.” “One more thing, is it possible to get a copy of the URI-Rs that you guys detected so that I can feed them into an archive of my choice?...” • Prototype pipeline developed over 8 months (24 MM) • Metrics of the prototype demonstrate that researchers generate a lot of artifacts (that their institutions are typically not aware of) • Metrics of the prototype suggest it should be possible to run a production pipeline at the scale of an academic institution • But would they …?
  • 68. Some Final Thoughts • For a number of reasons, applications that leverage network-level information at scale (e.g. EgoSystem, myresearch.institute, Autoload) tend not to be perfect. But they are automatic. • Do institutions reserve sufficient resources for innovation and failure? The alternative seems to be outsourcing and loss of expertise. • Ideas/visions are rarely fully realized when working on them. But many times, the work does improve on the status quo. So keep dreaming and working!
  • 69. Herbert Van de Sompel DANS @hvdsomp https://orcid.org/0000-0002-0715-6126 Collecting the Organizational Scholarly Record
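
The claim on slide 18 that the discovery algorithm "favored precision over recall" can be checked from the counts on that slide. The short calculation below is a sketch, not part of the original deck: it assumes the 2 mismatched identities count as false positives and the 12 undiscovered identities as false negatives, and the function name `evaluate` is made up for illustration.

```python
# Counts taken from the slide 18 evaluation of MS Academic identity discovery
# on a random set of 100 postdocs.
true_positives = 71   # identities correctly discovered
true_negatives = 15   # postdocs correctly labeled as having no identity
false_positives = 2   # discovered identities that did not match the postdoc (assumed FP)
false_negatives = 12  # existing identities that were not discovered (assumed FN)


def evaluate(tp, tn, fp, fn):
    """Return accuracy, precision, and recall for one confusion matrix."""
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
    }


metrics = evaluate(true_positives, true_negatives, false_positives, false_negatives)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
# accuracy: 0.860
# precision: 0.973
# recall: 0.855
```

Under these assumptions, precision (71 of 73 discovered identities were right) is clearly higher than recall (71 of 83 existing identities were found), which is consistent with the slide's summary.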

Editor's Notes

  1. ~100k articles with links; >230k links total
  2. New paradigm for web archiving, found as part of this problem. Unexpected, yet the most important result/contribution of this effort. Let's imagine you need to frequently archive slide decks from SlideShare (we do). Understand that there are boundary and quality problems. Bring a human (curator) into the loop. Navigate to *one* SlideShare presentation. Interact with that presentation in an attempt to show what the boundary is, to make explicit what needs to be archived. A browser extension listens to browser events, intercepts them, and records them in an abstract way (not in terms of URLs, addresses in the DOM, XPath, CSS selectors). Result: the trace expresses in an abstract way the interactions the curator had with the slide deck. Abstract because the same information on how to interact with *this* presentation will apply to *all* presentations. Record one, share, re-use with a headless browser. Share in a repo, collectively create and curate traces, update with layout of pages.
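
The "record one, re-use on all" idea in note 2 can be illustrated with a toy model. The sketch below is purely hypothetical: `Step`, `Trace`, `FakeDeck`, the `next-slide` role, and the `#slide-N` URIs are invented for illustration and do not reflect Memento Tracer's actual trace format or API. It only shows why an abstract trace, recorded against one deck, can drive the capture of decks of any length.

```python
from dataclasses import dataclass


@dataclass
class Step:
    """One abstract interaction, named by role rather than by URL or selector."""
    role: str             # hypothetical label, e.g. "next-slide"
    repeat: bool = False  # keep activating until nothing is left to do


@dataclass
class Trace:
    """A shareable recording of the curator's interactions for one portal."""
    portal: str
    steps: list


class FakeDeck:
    """Stand-in for a slide deck page as a headless browser would drive it."""

    def __init__(self, uri, n_slides):
        self.uri = uri
        self.n_slides = n_slides
        self.position = 1
        self.requested = [f"{uri}#slide-1"]  # resources that would be captured

    def activate(self, role):
        """Perform the interaction; return False when it is no longer possible."""
        if role == "next-slide" and self.position < self.n_slides:
            self.position += 1
            self.requested.append(f"{self.uri}#slide-{self.position}")
            return True
        return False


def replay(trace, page):
    """Apply every step of the trace to the page and return what was requested."""
    for step in trace.steps:
        while page.activate(step.role) and step.repeat:
            pass
    return page.requested


# Recorded once by a curator on a single deck, then re-used on other decks.
trace = Trace(portal="slideshare", steps=[Step("next-slide", repeat=True)])
print(len(replay(trace, FakeDeck("https://example.org/deck-a", 3))))
print(len(replay(trace, FakeDeck("https://example.org/deck-b", 5))))
```

The design point is that the trace names *what* to do ("advance to the next slide until there is none"), so the same recording delimits the boundary of every deck on the portal without per-deck URLs.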