
BESOCIAL A Knowledge Graph for Social Media Archiving


Presentation of our paper "BESOCIAL: A Sustainable Knowledge Graph-based Workflow for Social Media Archiving", given at the SEMANTiCS EU conference 2021 in Amsterdam.

Joint work with Dylan Van Assche, Sally Chambers, Fien Messens, Friedel Geeraert, Julie M. Birkholz and Anastasia Dimou.

The related video is available online at https://youtu.be/oYmzD3e8rBE?t=1912


  1. BESOCIAL: A Knowledge Graph for Social Media Archiving. github.com/RMLio/social-media-archiving. Sven Lieber, Dylan Van Assche, Sally Chambers, Fien Messens, Friedel Geeraert, Julie M. Birkholz, Anastasia Dimou
  2. Valuable information in archived records. Historical records: historic government records or early climate data, e.g. demographics or taxes on crop yields. Moon landing in the 1960s: invaluable data loss; NASA is unable to locate the original high-quality moon landing video. The web and social media: how about 21st-century data? Social media content influences the real world; what if Twitter and Co. are gone?
  3. BESOCIAL: A Knowledge Graph for Social Media Archiving. github.com/RMLio/social-media-archiving. Sven Lieber, Dylan Van Assche, Sally Chambers, Fien Messens, Friedel Geeraert, Julie M. Birkholz, Anastasia Dimou
  4. What is BESOCIAL? What are archives and collections? Heterogeneous data & data stewardship problem. Our data-centric methodology to build a Knowledge Graph. Applying our solution within BESOCIAL at KBR.
  5. BESOCIAL: a cross-institutional research project to develop a social media archiving strategy for Belgium. Follow-up of a project for general web archiving. Led by the Royal Library of Belgium. Research partners with different expertise. Funded by the Belgian Science Policy Office.
  6. What is BESOCIAL? What are archives and collections? Heterogeneous data & data stewardship problem. Our data-centric methodology to build a Knowledge Graph. Applying our solution within BESOCIAL at KBR.
  7. Archives heavily rely on document-based metadata. Collections: physical or digital “things”, e.g. a page or a book. Finding aids combine different types of metadata: descriptive metadata, technical metadata, administrative metadata. Different perspectives on metadata across Galleries, Libraries, Archives and Museums (GLAM), resulting in different XML-based standards. Images from the Gale Family Library at the Minnesota Historical Society (YouTube).
  8. What is BESOCIAL? What are archives and collections? Heterogeneous data & data stewardship problem. Our data-centric methodology to build a Knowledge Graph. Applying our solution within BESOCIAL at KBR.
  9. Implicit metadata in heterogeneous formats. [Diagram comparing web archiving and social media archiving across content selection, harvest, storage and access. Web archiving: seed lists from web archivists, web crawler (HTTP), WARC files, Wayback Machine replay. Social media: seed lists from web archivists, different APIs (HTTP), different JSON files, text editor or custom visualization. Preservation: preservation system, metadata records (?).]
  10. Data stewardship challenges. Manual curation and maintenance of preservation metadata. Different representations for different users, e.g. data analyst vs. historian. Access and interaction, e.g. take-down requests, i.e. removal from the public search index.
  11. What is BESOCIAL? What are archives and collections? Heterogeneous data & data stewardship problem. Our data-centric methodology to build a Knowledge Graph. Applying our solution within BESOCIAL at KBR.
  12. Generate preservation records from an interoperable Knowledge Graph. Heterogeneous (meta)data: reuse metadata already existing in DBs, WARC and JSON files. Map to a Knowledge Graph: interoperable data model described using PREMIS, PROV and the Europeana Data Model. Generate metadata records: archival or bibliographic records with a specific perspective in a specific XML syntax.
  13. Our methodology in a nutshell: use case / requirements, formal competency questions, declarative mapping, declarative record generation. User story: “As an archive user, I want to know which named entities are mentioned in a collection, e.g. cities or events, so I can assess if the content is relevant to me.” Competency question cq-items-context-4: What are the extracted named entities of a collection?
  14. Our methodology in a nutshell: use case / requirements, formal competency questions, declarative mapping, declarative record generation. SPARQL query to fetch named entities, used to validate that our solution fulfills the requirements.
  15. Our methodology in a nutshell: use case / requirements, formal competency questions, declarative mapping, declarative record generation. RML rules in YARRRML, generating named entities by using FnO and the DBpedia Spotlight API.
  16. Our methodology in a nutshell: use case / requirements, formal competency questions, declarative mapping, declarative record generation. XML template populated by the Knowledge Graph.
  17. What is BESOCIAL? What are archives and collections? Heterogeneous data & data stewardship problem. Our data-centric methodology to build a Knowledge Graph. Applying our solution within BESOCIAL at KBR.
  18. We opted for the tool Social Feed Manager (SFM) because SFM: is a framework and supports different social media providers; collects the most provenance information by relying on APIs and WARC; provides a user interface suitable for non-technical users; is actively maintained on GitHub by GW Libraries.
  19. Reusing existing heterogeneous (meta)data
  20. Declaratively generate a KG on the fly
  21. Generate metadata records: /collections/, /collection/{id}, /collection/{id}/export/ead, /collection/{id}/export/marc21. 2 main collections with ~300k social media posts. 5 mapping files with 31 mappings. github.com/RMLio/social-media-archiving
  22. Preserved files and rich metadata supporting access: collections, items, WARC files. 787 WARC.gz files; 11.4 million item-level triples; 56k collection-level triples; 13k aggregation triples.
  23. What is BESOCIAL? A cross-institutional research project to develop a social media archiving strategy. What are archives and collections? Preserving data and describing it with different types of metadata. Heterogeneous data & data stewardship problem: challenges regarding curation and access by different types of users. Our data-centric methodology to build a Knowledge Graph: exploiting existing metadata, stored interoperably, enabling different views. Applying our solution within BESOCIAL at KBR: a pilot for social media archiving resulting in reusable RML mappings and queries.
  24. Our Knowledge Graph describes social media collections interoperably, allows different views and enables data stewardship. However, more challenges and opportunities regarding data quality lie ahead. SvenLieber, sven-lieber.org, knows.idlab.ugent.be, github.com/RMLio/social-media-archiving
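
The methodology on slides 13 to 16 (competency question, then record generation from the Knowledge Graph) can be illustrated with a minimal Python sketch. This is not the project's implementation: the slides describe SPARQL queries, RML rules in YARRRML and an XML template, whereas the sketch below swaps in a plain set of triples and the standard library. The `http://example.org/` namespace, the predicates `partOf`, `mentions` and `label`, the toy data and the `record`/`subject` element names are all assumptions made for illustration; they are not PREMIS, PROV, EDM, EAD or MARC21 terms.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace and predicates, invented for this sketch.
EX = "http://example.org/"
PART_OF = EX + "partOf"
MENTIONS = EX + "mentions"
LABEL = EX + "label"

# Toy knowledge graph: a set of (subject, predicate, object) triples.
TRIPLES = {
    (EX + "post/1", PART_OF, EX + "collection/1"),
    (EX + "post/2", PART_OF, EX + "collection/1"),
    (EX + "post/3", PART_OF, EX + "collection/2"),
    (EX + "post/1", MENTIONS, EX + "entity/brussels"),
    (EX + "post/2", MENTIONS, EX + "entity/brussels"),
    (EX + "post/2", MENTIONS, EX + "entity/ghent"),
    (EX + "post/3", MENTIONS, EX + "entity/amsterdam"),
    (EX + "entity/brussels", LABEL, "Brussels"),
    (EX + "entity/ghent", LABEL, "Ghent"),
    (EX + "entity/amsterdam", LABEL, "Amsterdam"),
}


def named_entities(triples, collection):
    """Answer the competency question: which named entities does a collection mention?"""
    # Items (posts) that belong to the collection.
    items = {s for (s, p, o) in triples if p == PART_OF and o == collection}
    # Entities mentioned by any of those items.
    entities = {o for (s, p, o) in triples if p == MENTIONS and s in items}
    # Human-readable labels of those entities, sorted for stable output.
    return sorted(o for (s, p, o) in triples if p == LABEL and s in entities)


def build_record(collection, entity_labels):
    """Populate a simplified, hypothetical XML record with data taken from the graph."""
    record = ET.Element("record", {"collection": collection})
    subjects = ET.SubElement(record, "subjects")
    for label in entity_labels:
        ET.SubElement(subjects, "subject").text = label
    return ET.tostring(record, encoding="unicode")


if __name__ == "__main__":
    collection = EX + "collection/1"
    labels = named_entities(TRIPLES, collection)
    print(labels)
    print(build_record(collection, labels))
```

The same lookup that validates the requirement also feeds the record, which mirrors the idea on the slides that metadata records are generated from the graph rather than curated by hand.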
