The BlogForever Project
http://blogforever.eu
Vangelis Banos,
BlogForever Project Manager

MTSR 2013, 22 Nov 2013, Thessaloniki

Contents
The Disappearing Web
Web Archiving
The BlogForever Project
BlogForever Applications

Web content disappears

Web Archiving

The Internet Archive comes to the rescue!

Web Archiving
The process of collecting portions of the
World Wide Web to ensure the information is
preserved in an archive for future researchers,
historians, and the public.

The challenge of web archiving

[Figure: generic file archiving operation. Labels: File(s), Software, Hardware, RECORD.]

The challenge of web archiving
[Figure: web archiving operation. Labels: Website, multiple File(s) and Software components, Hardware, Record(s), with two "???" markers.]

We are focusing on blogs
• Blogs have become fairly established as an online communication and web publishing tool.
• Hundreds of millions of blogs are published about every conceivable subject.
Examples (12/9/2013):
• 70+ million sites in the world
• 369 million people viewing more than 11.8 billion pages each month
• 38 million new posts and 62.3 million new comments each month
• 136.5 million blogs
• 61 billion posts
• 83.7 million daily posts

Blog Archiving: Objectives & Concerns
• Blog characteristics:
  • Database-driven, dynamic websites,
  • High frequency of updates,
  • Special structure, metadata, semantics and communication protocols,
  • Highly interconnected,
  • Quantity and range of resources,
  • Ownership and DRM.
• Our aims:
  • Harvest, preserve, manage and reuse blogs and their resources.

The BlogForever Project
• Collaborative EC-funded project.
• Duration: 1 Mar 2011 – 31 Aug 2013.
• Aims: theoretical and applied research on blog archiving.
• Coordinated by AUTH.
• Partners: [partner logos]

BlogForever project achievements
BlogForever has created a novel blog archiving approach.
It is not only about archiving pages. It is about archiving information
entities (posts, comments, authors, metadata, dates, pingbacks, etc.).
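
The difference between archiving pages and archiving information entities can be illustrated with a small sketch. The class and field names below are hypothetical and chosen only for illustration; they are not the BlogForever data model, which the slides do not spell out.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass
class Author:
    name: str
    profile_url: str = ""


@dataclass
class Comment:
    author: Author
    published: datetime
    body: str


@dataclass
class Post:
    url: str
    title: str
    author: Author
    published: datetime
    body: str
    comments: List[Comment] = field(default_factory=list)
    # URLs of other pages that linked back to this post (hypothetical field).
    pingbacks: List[str] = field(default_factory=list)


@dataclass
class Blog:
    url: str
    title: str
    posts: List[Post] = field(default_factory=list)

    def posts_by(self, author_name: str) -> List[Post]:
        """Return all posts written by the given author."""
        return [p for p in self.posts if p.author.name == author_name]
```

With entities stored this way, a question such as "all posts by a given author" becomes a simple lookup, whereas an archive of raw HTML pages would first need to be parsed again.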

• Blog modelling and semantics
• Preservation strategies
• Case studies and validation
• Implementation of the BlogForever platform

BlogForever project achievements
[Diagram: BlogForever platform]
Harvesting: unstructured information, web services and blog APIs are collected by the blog crawlers
• Real-time monitoring
• HTML data extraction engine
• Spam filtering
• Web services extraction engine
Original data and XML metadata
Preserving: blog digital repository
• Digital preservation
• Quality assurance
• Collections curation
• Public access APIs
• Personalised services
• Information retrieval
• Public web interface: browse, search, export
Managing and reusing: web services, web interface
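
As a rough illustration of the harvesting step, the sketch below extracts post entities from a blog feed. It is a minimal example under stated assumptions, not the BlogForever crawler: it assumes an RSS 2.0 feed that carries the author in a Dublin Core `dc:creator` element, and the function name `extract_posts` and the embedded sample feed are invented for this example.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Dublin Core namespace, commonly used by blog feeds for the author name.
DC = "{http://purl.org/dc/elements/1.1/}"

# Tiny sample feed so the example runs without network access.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Example blog</title>
    <item>
      <title>First post</title>
      <link>http://example.org/blog/first-post</link>
      <dc:creator>Alice</dc:creator>
      <pubDate>Fri, 22 Nov 2013 10:00:00 +0000</pubDate>
      <description>Hello, archive.</description>
    </item>
  </channel>
</rss>"""


def extract_posts(rss_xml):
    """Turn each <item> of an RSS 2.0 document into a plain dict."""
    root = ET.fromstring(rss_xml)
    posts = []
    for item in root.iter("item"):
        pub = item.findtext("pubDate")
        posts.append({
            "title": item.findtext("title", default=""),
            "url": item.findtext("link", default=""),
            "author": item.findtext(DC + "creator", default=""),
            # RSS dates use the RFC 822 format, which email.utils can parse.
            "published": parsedate_to_datetime(pub) if pub else None,
            "summary": item.findtext("description", default=""),
        })
    return posts


if __name__ == "__main__":
    for post in extract_posts(SAMPLE_FEED):
        print(post["published"], post["author"], post["title"])
```

A feed only exposes part of a blog, which is presumably why the diagram also lists an HTML data extraction engine next to the web services extraction engine.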
BlogForever Added Value
• BlogForever structures the archived blog content. It is not only about archiving HTML pages; it is about archiving information entities (posts, comments, authors, metadata, dates, pingbacks, etc.) based on a special data model.
• BlogForever is based on Invenio, an open-source, state-of-the-art digital library management system developed by CERN.
• Better metadata and higher information granularity.
• Open standards and interoperability (MARCXML, web services).
• Better management of archived information, increasing the utility of the web archive.
• Makes it easy to provide added-value services, e.g. analytics.
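
To make the MARCXML point concrete, here is a minimal sketch of how a single blog post could be serialised as a MARCXML record. The mapping is an assumption made for illustration (100 for the author, 245 for the title, 260 $c for the date, 856 $u for the URL, all standard MARC 21 bibliographic fields); the slides do not state which fields BlogForever or Invenio actually use, and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace of the Library of Congress MARCXML ("MARC 21 slim") schema.
MARC_NS = "http://www.loc.gov/MARC21/slim"
ET.register_namespace("marc", MARC_NS)


def add_datafield(record, tag, code, value, ind1=" ", ind2=" "):
    """Append one datafield with a single subfield to the record."""
    datafield = ET.SubElement(
        record, f"{{{MARC_NS}}}datafield",
        {"tag": tag, "ind1": ind1, "ind2": ind2},
    )
    subfield = ET.SubElement(datafield, f"{{{MARC_NS}}}subfield", {"code": code})
    subfield.text = value
    return datafield


def post_to_marcxml(title, author, url, published):
    """Serialise a blog post as a MARCXML string (illustrative mapping only).

    Most indicators are left blank for brevity; real records set them
    according to the MARC 21 specification.
    """
    record = ET.Element(f"{{{MARC_NS}}}record")
    add_datafield(record, "100", "a", author)
    add_datafield(record, "245", "a", title)
    add_datafield(record, "260", "c", published)
    add_datafield(record, "856", "u", url, ind1="4")  # ind1 "4" = HTTP access
    return ET.tostring(record, encoding="unicode")


if __name__ == "__main__":
    print(post_to_marcxml("First post", "Alice",
                          "http://example.org/blog/first-post", "2013-11-22"))
```

Because each entity ends up in its own field, a library system that understands MARCXML can index and exchange the archived posts without knowing anything about the original blog software.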

BlogForever Impact
• Blog archiving methods and policies that are reusable and generic.
• A blog archiving solution that any institution could use to preserve its collections of blogs, ensuring authenticity, integrity, completeness, usability and long-term accessibility.
• A blog archiving solution that any researcher could use to gather, analyse and reuse blog data.

BlogForever Applications
• CERN is currently implementing a high-energy physics blog repository.
• AUTH is designing an academic blog repository.
• The Linguistics Department of the University of Hannover is conducting a diachronic analysis of certain linguistic and textual phenomena and features using German blogs.
• The University of Warwick Computer Science Department is doing social web analytics using blog data.

Thank you!
Visit http://blogforever.eu
• Access all BlogForever deliverables (open access).
• Download the open-source BlogForever platform.

Contact us:
• Project Manager: Vangelis Banos, vbanos@gmail.com
• Exploitation Manager: Efstratios Arampatzis, sa@tero.gr

Editor's Notes

  1. The key BlogForever project goals were fully achieved within the time span of the project, through a series of theoretical and applied research tasks. Initially, BlogForever focused on studying weblog structure and semantics, and started developing preservation strategies for weblogs. Later, the focus gradually moved to implementing the BlogForever platform, as well as to interoperability prospects and digital rights management strategies. An important aspect of the project was also the design and implementation of extensive case studies of variable complexity and size, to validate and test the BlogForever platform. BlogForever created an exciting new system to harvest, preserve and manage blog content, developing new insights through its restructuring and reuse. To this end, it stepped into as yet uncharted territory in the theoretical and practical aspects of blog preservation: it first researched blog structure and semantics; it then defined solid blog preservation policies and developed a robust blog preservation software platform; finally, it validated the platform through specific case studies using real-world data.
  2. After working on what to preserve and how to preserve it, we present how we implemented blog preservation.