BESOCIAL: A Knowledge Graph for Social Media Archiving

Presentation of our paper "BESOCIAL: A Sustainable Knowledge Graph-based Workflow for Social Media Archiving" at the SEMANTiCS EU 2021 conference in Amsterdam.

Joint work with Dylan Van Assche, Sally Chambers, Fien Messens, Friedel Geeraert, Julie M. Birkholz and Anastasia Dimou.

The related video is available online at https://youtu.be/oYmzD3e8rBE?t=1912

  1. BESOCIAL: A Knowledge Graph for Social Media Archiving (github.com/RMLio/social-media-archiving). Sven Lieber, Dylan Van Assche, Sally Chambers, Fien Messens, Friedel Geeraert, Julie M. Birkholz, Anastasia Dimou.
  2. Valuable information in archived records. Historical records: historic government records or early climate data, e.g. demographics or taxes on crop yields. Invaluable data loss (the moon landing in the 1960s): NASA is unable to locate the original high-quality moon landing video. The web and social media: what about 21st-century data? Social media content influences the real world; what if Twitter and co. are gone?
  3. BESOCIAL: A Knowledge Graph for Social Media Archiving (github.com/RMLio/social-media-archiving). Sven Lieber, Dylan Van Assche, Sally Chambers, Fien Messens, Friedel Geeraert, Julie M. Birkholz, Anastasia Dimou.
  4. Outline: What is BESOCIAL? What are archives and collections? Heterogeneous data & data stewardship problem. Our data-centric methodology to build a Knowledge Graph. Applying our solution within BESOCIAL at KBR.
  5. BESOCIAL: a cross-institutional research project to develop a social media archiving strategy for Belgium. Follow-up of a project on general web archiving. Led by the Royal Library of Belgium (KBR). Research partners with different expertise. Funded by the Belgian Science Policy Office.
  6. Outline: What is BESOCIAL? What are archives and collections? Heterogeneous data & data stewardship problem. Our data-centric methodology to build a Knowledge Graph. Applying our solution within BESOCIAL at KBR.
  7. Archives heavily rely on document-based metadata. Physical or digital “things” (e.g. a page, a book) are grouped into collections, and finding aids combine different types of metadata: descriptive, technical and administrative metadata. Galleries, Libraries, Archives and Museums (GLAM) take different perspectives on metadata, resulting in different XML-based standards. Images from the Gale Family Library at the Minnesota Historical Society (YouTube).
  8. Outline: What is BESOCIAL? What are archives and collections? Heterogeneous data & data stewardship problem. Our data-centric methodology to build a Knowledge Graph. Applying our solution within BESOCIAL at KBR.
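  9. Implicit metadata in heterogeneous formats. The workflow runs through content selection, harvest, storage and access. Web archiving: seed lists from web archivists; a web crawler harvests over HTTP; content is stored in WARC files and accessed through HTTP replay in the Wayback Machine. Social media: seed lists from web archivists; harvesting through different APIs; content is stored in different JSON files and accessed with a text editor or a custom visualization. Both should feed a preservation system, but where its preservation metadata records come from remains an open question (?).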
  10. Data stewardship challenges: manual curation and maintenance of preservation metadata; different representations for different users, e.g. a data analyst vs. a historian; access and interaction, e.g. takedown requests, i.e. removal from the public search index.
  11. Outline: What is BESOCIAL? What are archives and collections? Heterogeneous data & data stewardship problem. Our data-centric methodology to build a Knowledge Graph. Applying our solution within BESOCIAL at KBR.
  12. Generate preservation records from an interoperable Knowledge Graph: heterogeneous (meta)data → (map) → Knowledge Graph → (generate) → metadata records. Heterogeneous (meta)data: reuse metadata that already exists in databases, WARC files and JSON files. Knowledge Graph: an interoperable data model described using PREMIS, PROV and the Europeana Data Model. Metadata records: archival or bibliographic records with a specific perspective in a specific XML syntax.
  13. Our methodology in a nutshell: (1) use case / requirements, (2) formal competency questions, (3) declarative mapping, (4) declarative record generation. Step 1, user story 14: "As an archive user, I want to know which named entities are mentioned in a collection, e.g. cities or events, so I can assess whether the content is relevant to me." Competency question cq-items-context-4: "What are the extracted named entities of a collection?"
  14. Our methodology in a nutshell, step 2 (formal competency questions): a SPARQL query fetches the named entities and is used to validate that our solution fulfills the requirements.
  15. Our methodology in a nutshell, step 3 (declarative mapping): RML rules written in YARRRML generate the named entities by using FnO and the DBpedia Spotlight API.
  16. Our methodology in a nutshell, step 4 (declarative record generation): an XML template populated from the Knowledge Graph.
  17. Outline: What is BESOCIAL? What are archives and collections? Heterogeneous data & data stewardship problem. Our data-centric methodology to build a Knowledge Graph. Applying our solution within BESOCIAL at KBR.
  18. We opted for the tool Social Feed Manager (SFM) because SFM is a framework that supports different social media providers; collects the most provenance information by relying on APIs and WARC files; provides a user interface suitable for non-technical users; and is actively maintained on GitHub by GW Libraries.
  19. Reusing existing heterogeneous (meta)data.
  20. Declaratively generate a KG on the fly.
  21. Generate metadata records via the endpoints /collections/, /collection/{id}, /collection/{id}/export/ead and /collection/{id}/export/marc21. 2 main collections with ~300k social media posts; 5 mapping files with 31 mappings (github.com/RMLio/social-media-archiving).
  22. Preserved files and rich metadata supporting access to collections, items and WARC files: 787 WARC.gz files; 11.4 million item-level triples; 56k collection-level triples; 13k aggregation triples.
  23. Summary. What is BESOCIAL? A cross-institutional research project to develop a social media archiving strategy. What are archives and collections? Preserving data and describing it with different types of metadata. Heterogeneous data & data stewardship problem: challenges regarding curation and access by different types of users. Our data-centric methodology to build a Knowledge Graph: exploiting existing metadata, stored interoperably, enabling different views. Applying our solution within BESOCIAL at KBR: a pilot for social media archiving, resulting in reusable RML mappings and queries.
  24. Our Knowledge Graph describes social media collections interoperably, allows different views and enables data stewardship. However, more challenges and opportunities regarding data quality lie ahead. Contact: SvenLieber, sven-lieber.org, knows.idlab.ugent.be, github.com/RMLio/social-media-archiving.
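Slide 9 notes that much of the metadata is only implicit, for example in the headers of WARC files. As a minimal sketch using only the Python standard library (not the project's own tooling), the header fields of each WARC record can be read as below. It assumes well-formed records, skips each content block by its `Content-Length`, and uses a made-up sample record; the function names are hypothetical.

```python
import gzip
import io


def read_warc_headers(stream):
    """Yield a dict of header fields for each record in a binary WARC stream.

    Simplified sketch: assumes well-formed records and does not validate them.
    """
    while True:
        first = stream.readline()
        if not first:
            return  # end of stream
        if not first.strip():
            continue  # blank lines separate records
        headers = {"WARC-Version": first.decode("utf-8").strip()}
        while True:
            line = stream.readline()
            if line.strip() == b"":
                break  # an empty line (or EOF) ends the header block
            name, _, value = line.decode("utf-8").partition(":")
            headers[name.strip()] = value.strip()
        stream.read(int(headers.get("Content-Length", "0")))  # skip the content block
        yield headers


def read_warc_gz_headers(path):
    """Same for a .warc.gz file; gzip transparently reads multi-member files."""
    with gzip.open(path, "rb") as stream:
        yield from read_warc_headers(stream)


# Made-up sample record for illustration.
SAMPLE = (
    b"WARC/1.0\r\n"
    b"WARC-Type: response\r\n"
    b"WARC-Target-URI: https://example.org/\r\n"
    b"WARC-Date: 2021-09-07T10:00:00Z\r\n"
    b"Content-Length: 5\r\n"
    b"\r\n"
    b"hello\r\n\r\n"
)

records = list(read_warc_headers(io.BytesIO(SAMPLE * 2)))
for record in records:
    print(record["WARC-Type"], record["WARC-Target-URI"], record["WARC-Date"])
```

Headers such as `WARC-Target-URI` and `WARC-Date` are the kind of provenance that can be mapped into a Knowledge Graph instead of being re-entered by hand.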
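Slides 12, 19 and 20 describe mapping existing JSON metadata to a Knowledge Graph. The project does this declaratively with RML; the Python function below is only an imperative stand-in that shows what such a mapping produces. It assumes a post shaped like a Twitter API v1.1 tweet (`id_str`, `created_at`, `user.screen_name`), uses PROV-O and Dublin Core terms as example predicates, and mints IRIs under a hypothetical base `https://example.org/`. The repository's actual mappings may use different terms.

```python
from datetime import datetime

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
PROV = "http://www.w3.org/ns/prov#"
DCTERMS = "http://purl.org/dc/terms/"
BASE = "https://example.org/"  # hypothetical base IRI for minted resources


def map_post(post, collection_id):
    """Return (subject, predicate, object) triples for one JSON post.

    Objects are plain strings; a real mapping would distinguish IRIs from
    typed literals.
    """
    item = f"{BASE}item/{post['id_str']}"
    # Twitter-style timestamps, e.g. "Wed Oct 10 20:19:24 +0000 2018",
    # are normalised to ISO 8601 so they can be typed as xsd:dateTime.
    created = datetime.strptime(post["created_at"], "%a %b %d %H:%M:%S %z %Y")
    return [
        (item, RDF_TYPE, f"{PROV}Entity"),
        (item, f"{DCTERMS}isPartOf", f"{BASE}collection/{collection_id}"),
        (item, f"{PROV}generatedAtTime", created.isoformat()),
        (item, f"{PROV}wasAttributedTo", f"{BASE}account/{post['user']['screen_name']}"),
    ]


# Made-up example post.
example_post = {
    "id_str": "1050118621198921728",
    "created_at": "Wed Oct 10 20:19:24 +0000 2018",
    "user": {"screen_name": "example_account"},
    "full_text": "An example post.",
}
triples = map_post(example_post, "42")
for triple in triples:
    print(triple)
```

Writing the same rules in RML/YARRRML instead keeps them separate from code, so they can be reused and adapted without reprogramming.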
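Slides 13 and 14 turn user story 14 into competency question cq-items-context-4 and check it with a SPARQL query. The sketch below illustrates both, assuming items link to their collection via `dcterms:isPartOf` and to named entities via `schema:mentions`. This vocabulary is an assumption, and the project's own query may differ. The query is shown as a string, and a small in-memory pattern match over triples stands in for a SPARQL engine so the check runs without dependencies.

```python
DCTERMS_IS_PART_OF = "http://purl.org/dc/terms/isPartOf"
SCHEMA_MENTIONS = "https://schema.org/mentions"  # assumed predicate for named entities

# cq-items-context-4: "What are the extracted named entities of a collection?"
CQ_ITEMS_CONTEXT_4 = """
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX schema: <https://schema.org/>
SELECT DISTINCT ?entity WHERE {
  ?item dcterms:isPartOf ?collection ;
        schema:mentions ?entity .
}
"""


def named_entities_of_collection(triples, collection):
    """Evaluate the graph pattern of CQ_ITEMS_CONTEXT_4 with ?collection bound."""
    items = {s for s, p, o in triples if p == DCTERMS_IS_PART_OF and o == collection}
    return sorted({o for s, p, o in triples if s in items and p == SCHEMA_MENTIONS})


# Toy graph with hypothetical IRIs.
graph = [
    ("ex:item1", DCTERMS_IS_PART_OF, "ex:collection1"),
    ("ex:item1", SCHEMA_MENTIONS, "dbr:Brussels"),
    ("ex:item2", DCTERMS_IS_PART_OF, "ex:collection1"),
    ("ex:item2", SCHEMA_MENTIONS, "dbr:Antwerp"),
    ("ex:item3", DCTERMS_IS_PART_OF, "ex:collection2"),
    ("ex:item3", SCHEMA_MENTIONS, "dbr:Ghent"),
]

answer = named_entities_of_collection(graph, "ex:collection1")
# Validation in the spirit of slide 14: the requirement holds if the question has an answer.
assert answer, "cq-items-context-4 returned no named entities"
print(answer)
```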
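Slide 15 extracts named entities by calling the DBpedia Spotlight API from the mapping via FnO. The Python sketch below only illustrates the HTTP exchange such a function performs. It assumes the public endpoint `https://api.dbpedia-spotlight.org/en/annotate`, the `text` and `confidence` query parameters, and a JSON response whose `Resources` list carries each entity's `@URI`. The helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

SPOTLIGHT_URL = "https://api.dbpedia-spotlight.org/en/annotate"  # assumed public endpoint


def spotlight_request(text, confidence=0.5):
    """Build a GET request asking Spotlight to annotate `text`, answering in JSON."""
    query = urllib.parse.urlencode({"text": text, "confidence": confidence})
    return urllib.request.Request(
        f"{SPOTLIGHT_URL}?{query}", headers={"Accept": "application/json"}
    )


def entity_uris(response):
    """Collect the DBpedia IRIs of the annotated entities (none if 'Resources' is absent)."""
    return sorted({resource["@URI"] for resource in response.get("Resources", [])})


def annotate(text, confidence=0.5):
    """Call the service over the network and return the entity IRIs."""
    with urllib.request.urlopen(spotlight_request(text, confidence), timeout=30) as resp:
        return entity_uris(json.load(resp))


# Offline example with a made-up, trimmed response body.
sample_response = {
    "Resources": [
        {"@URI": "http://dbpedia.org/resource/Brussels", "@surfaceForm": "Brussels"},
        {"@URI": "http://dbpedia.org/resource/Belgium", "@surfaceForm": "Belgium"},
    ]
}
print(entity_uris(sample_response))
```

Keeping request building and response parsing separate from the network call makes the extraction logic testable offline. In an RML pipeline, the equivalent function would be referenced from the mapping rules rather than called directly.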
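Slide 16 generates records by populating an XML template from the Knowledge Graph. The minimal sketch below uses Python's `string.Template` and assumes the values have already been fetched from the graph (e.g. with a SPARQL query) into a dictionary. The skeleton borrows a few EAD 2002 element names but is deliberately incomplete (it omits the mandatory `eadheader`), so it will not validate against the EAD schema. The field names are hypothetical.

```python
from string import Template
from xml.sax.saxutils import escape

# Simplified, non-validating EAD-like skeleton (no <eadheader>).
EAD_TEMPLATE = Template(
    """<ead xmlns="urn:isbn:1-931666-22-9">
  <archdesc level="collection">
    <did>
      <unitid>$identifier</unitid>
      <unittitle>$title</unittitle>
      <physdesc><extent>$extent items</extent></physdesc>
    </did>
  </archdesc>
</ead>"""
)


def render_ead(record):
    """Fill the template, escaping &, < and > so the output stays well-formed XML."""
    return EAD_TEMPLATE.substitute({key: escape(str(value)) for key, value in record.items()})


# Values as they might come back from a query over the Knowledge Graph (made up).
collection_record = {"identifier": "collection-1", "title": "Elections & social media", "extent": 1500}
ead_xml = render_ead(collection_record)
print(ead_xml)
```

A MARC21 export could follow the same pattern with a second template, which is how one graph can serve several record formats.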
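Slide 21 lists the endpoints that expose the collections and their exported records. The sketch below shows how such paths might be dispatched with plain regular expressions. The route patterns come from the slide; the handler names are hypothetical, and the service in the repository may be implemented differently.

```python
import re

# Route patterns from slide 21; handler names are hypothetical.
ROUTES = [
    (re.compile(r"^/collections/?$"), "list_collections"),
    (re.compile(r"^/collection/(?P<id>[^/]+)/export/ead$"), "export_ead"),
    (re.compile(r"^/collection/(?P<id>[^/]+)/export/marc21$"), "export_marc21"),
    (re.compile(r"^/collection/(?P<id>[^/]+)$"), "describe_collection"),
]


def resolve(path):
    """Return (handler name, path parameters) for a request path, or (None, {})."""
    for pattern, handler in ROUTES:
        match = pattern.match(path)
        if match:
            return handler, match.groupdict()
    return None, {}


for path in ["/collections/", "/collection/42", "/collection/42/export/marc21", "/unknown"]:
    print(path, "->", resolve(path))
```

Because every export route shares the `{id}` parameter, a handler only needs the collection identifier to query the graph and fill the matching record template.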
