BlogForever Project presentation at MTSR2013

BlogForever, a collaborative European Commission funded project, developed an exciting new system to harvest, preserve, manage and reuse blog content.

Published in: Technology, Education


  1. The BlogForever Project (http://blogforever.eu). Vangelis Banos, BlogForever Project Manager. MTSR 2013, 22 Nov 2013, Thessaloniki.
  2. Contents: The Disappearing Web; Web Archiving; The BlogForever Project; BlogForever Applications.
  3. Web content disappears
  4. Web content disappears
  5. Web content disappears
  6. Web Archiving: The Internet Archive comes to the rescue!
  7. Web Archiving: the process of collecting portions of the World Wide Web to ensure the information is preserved in an archive for future researchers, historians, and the public.
  8. The challenge of web archiving. [Diagram labels: File(s), Software, Hardware, RECORD. Caption: Generic file archiving operation.]
  9. The challenge of web archiving. [Diagram labels: Website, Record(s), Hardware, multiple File(s) and Software boxes, and "???" marks. Caption: Web archiving operation.]
  10. We are focusing on blogs
     • Blogs have become fairly established as an online communication and web publishing tool.
     • Hundreds of millions of blogs are published about every conceivable subject.
     • Examples (12/9/2013): 70+ million sites in the world; 369 million people viewing more than 11.8 billion pages each month; 38 million new posts and 62.3 million new comments each month; 136.5 million blogs; 61 billion posts; 83.7 million daily posts.
  11. Blog Archiving: Objectives & Concerns
     • Blog characteristics:
        • Database-driven, dynamic websites
        • High frequency of updates
        • Special structure, metadata, semantics and communication protocols
        • Highly interconnected
        • Quantity and range of resources
        • Ownership and DRM
     • Our aims: harvest, preserve, manage and reuse blogs and their resources.
  12. The BlogForever Project
     • Collaborative EC-funded project.
     • Duration: 1 Mar 2011 to 31 Aug 2013.
     • Aims: theoretical and applied research on blog archiving.
     • Coordinated by AUTH.
     • Partners: (shown as logos on the slide)
  13. BlogForever project achievements. BlogForever has created a novel blog archiving approach. It is not only about archiving pages. It is about archiving information entities (posts, comments, authors, metadata, dates, pingbacks, etc.).
     • Blog modelling and semantics
     • Preservation strategies
     • Case studies and validation
     • Implementation of the BlogForever platform
  14. BlogForever project achievements (platform diagram)
     • Harvesting (blog crawlers; inputs: unstructured information, web services, blog APIs): real-time monitoring, HTML data extraction engine, spam filtering, web services extraction engine.
     • Blog digital repository: original data and XML metadata; web services; web interface.
     • Preserving, managing and reusing: digital preservation, quality assurance, collections curation, public access APIs, personalised services, information retrieval, public web interface (browse, search, export).
  15. BlogForever Added Value
     • BlogForever structures the archived blog content. It is not only about archiving HTML pages. It is about archiving information entities (posts, comments, authors, metadata, dates, pingbacks, etc.) based on a special data model.
     • BlogForever is based on Invenio, an open source, state-of-the-art digital library management system developed by CERN.
     • Better metadata and higher information granularity.
     • Open standards and interoperability (MARCXML, web services).
     • Better management of archived information, increasing the utility of the web archive.
     • Added-value services, e.g. analytics, are easy to build on top.
  16. BlogForever Impact
     • Blog archiving methods and policies which are reusable and generic.
     • A blog archiving solution that any institution could use to preserve their collections of blogs, ensuring authenticity, integrity, completeness, usability and long-term accessibility.
     • A blog archiving solution that any researcher could use to gather, analyse and reuse blog data.
  17. BlogForever Applications
     • CERN is currently implementing a high energy physics blogs repository.
     • AUTH is designing an academic blogs repository.
     • The Linguistics Department of the University of Hannover is doing a diachronic analysis of certain linguistic and textual phenomena and features using German blogs.
     • The University of Warwick Computer Science Department is doing social web analytics using blog data.
  18. Thank you! Visit http://blogforever.eu
     • Access all BlogForever Deliverables (Open Access).
     • Download the Open Source BlogForever Platform.
     • Contact us:
        • Project Manager: Vangelis Banos, vbanos@gmail.com
        • Exploitation Manager: Efstratios Arampatzis, sa@tero.gr
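
The slides describe archiving information entities (posts, authors, dates) rather than whole pages. As a rough illustration of that idea only, the sketch below turns an RSS 2.0 feed into structured records using the Python standard library. The `Blog` and `Post` classes, their field names, the `parse_rss` function and the sample feed are all hypothetical; they are not taken from the BlogForever data model or platform code.

```python
from dataclasses import dataclass, field
import xml.etree.ElementTree as ET

# Hypothetical entity classes: field names are illustrative assumptions,
# not the BlogForever data model.
@dataclass
class Post:
    title: str
    link: str
    author: str
    published: str

@dataclass
class Blog:
    title: str
    link: str
    posts: list[Post] = field(default_factory=list)

# Dublin Core namespace, used by many blog feeds for <dc:creator>.
DC_NS = "{http://purl.org/dc/elements/1.1/}"

def parse_rss(xml_text: str) -> Blog:
    """Turn an RSS 2.0 document into a Blog with one Post per <item>."""
    channel = ET.fromstring(xml_text).find("channel")
    if channel is None:
        raise ValueError("not an RSS 2.0 document: no <channel> element")
    blog = Blog(
        title=channel.findtext("title", default=""),
        link=channel.findtext("link", default=""),
    )
    for item in channel.findall("item"):
        # Prefer <author>; fall back to <dc:creator> when it is absent or empty.
        author = item.findtext("author") or item.findtext(f"{DC_NS}creator", default="")
        blog.posts.append(Post(
            title=item.findtext("title", default=""),
            link=item.findtext("link", default=""),
            author=author,
            published=item.findtext("pubDate", default=""),
        ))
    return blog

# Made-up sample feed for demonstration purposes.
SAMPLE_FEED = """<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Example Blog</title>
    <link>http://example.org/</link>
    <item>
      <title>First post</title>
      <link>http://example.org/first</link>
      <dc:creator>alice</dc:creator>
      <pubDate>Fri, 22 Nov 2013 10:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Second post</title>
      <link>http://example.org/second</link>
      <author>bob</author>
      <pubDate>Sat, 23 Nov 2013 10:00:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""

if __name__ == "__main__":
    blog = parse_rss(SAMPLE_FEED)
    for post in blog.posts:
        print(post.published, post.author, post.title)
```

Keeping each post as a separate record with its own author and date is what allows search, export and analytics at a finer granularity than a stored HTML page. A real harvester would also need to handle comments, pingbacks and feeds that deviate from the specification.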
