The BlogForever Project
http://blogforever.eu
Vangelis Banos,
BlogForever Project Manager

MTSR 2013, 22 Nov 2013, Thessaloniki

Contents
The Disappearing Web
Web Archiving
The BlogForever Project
BlogForever Applications
Web content disappears

Web Archiving

The Internet Archive comes to the rescue!

Web Archiving
The process of collecting portions of the
World Wide Web to ensure the information is
preserved in an archive for future researchers,
historians, and the public.

The challenge of web archiving

[Diagram: generic file archiving operation, showing file(s), software, hardware and a RECORD.]

The challenge of web archiving
[Diagram: web archiving operation, showing a website made up of many file(s) and software components, hardware, and record(s), with the intermediate steps marked "???".]
We are focusing on blogs
• Blogs have become fairly established as an online communication and web publishing tool.
• Hundreds of millions of blogs are published on every conceivable subject.

Examples (figures as of 12/9/2013):
70+ million sites in the world
369 million people viewing more than 11.8 billion pages each month
38 million new posts and 62.3 million new comments each month

136.5 million blogs
61 billion posts
83.7 million daily posts
Blog Archiving: Objectives & Concerns
• Blog characteristics:
  - Database-driven, dynamic websites
  - High frequency of updates
  - Special structure, metadata, semantics and communication protocols
  - Highly interconnected
  - Quantity and range of resources
  - Ownership and DRM

• Our aims:
  - Harvest, preserve, manage and reuse blogs and their resources.
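One of the "communication protocols" listed above is web syndication: most blogs publish RSS or Atom feeds that already separate each post's title, link and date. As a minimal sketch, assuming an RSS 2.0 feed (the sample feed and the function name `parse_rss_items` are hypothetical, and this is not BlogForever's crawler), Python's standard `xml.etree.ElementTree` is enough to turn a feed into per-post records:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed; a real harvester would fetch it over HTTP.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <item>
      <title>First post</title>
      <link>http://example.org/2013/11/first-post</link>
      <pubDate>Fri, 22 Nov 2013 10:00:00 GMT</pubDate>
      <author>editor@example.org (Editor)</author>
    </item>
    <item>
      <title>Second post</title>
      <link>http://example.org/2013/11/second-post</link>
      <pubDate>Fri, 22 Nov 2013 12:30:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""


def parse_rss_items(xml_text):
    """Return one dict per <item> with the fields an archive might keep."""
    root = ET.fromstring(xml_text)
    posts = []
    for item in root.iter("item"):
        posts.append({
            "title": item.findtext("title", default=""),
            "url": item.findtext("link", default=""),
            "published": item.findtext("pubDate", default=""),
            # <author> is optional in RSS 2.0, so this may be None.
            "author": item.findtext("author"),
        })
    return posts


if __name__ == "__main__":
    for post in parse_rss_items(SAMPLE_RSS):
        print(post["published"], "|", post["title"], "|", post["url"])
```

Feeds typically list only the most recent posts, so a harvester that relies on them has to poll frequently or combine them with crawling of the blog's pages.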
The BlogForever Project
• Collaborative EC-funded project
• Duration: 1 Mar 2011 – 31 Aug 2013
• Aims: theoretical and applied research on blog archiving
• Coordinated by AUTH
• Partners:

BlogForever project achievements
BlogForever has created a novel blog archiving approach. It archives not only pages but also the information entities they contain (posts, comments, authors, metadata, dates, pingbacks, etc.).

• Blog modelling and semantics
• Preservation strategies
• Case studies and validation
• Implementation of the BlogForever platform
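The difference between archiving pages and archiving information entities can be pictured as a small object model. The dataclasses below are a hypothetical simplification (all class, field and method names are assumptions), not the BlogForever data model itself:

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Optional


@dataclass
class Author:
    name: str
    profile_url: Optional[str] = None


@dataclass
class Comment:
    author: Author
    text: str
    posted: datetime


@dataclass
class Post:
    url: str
    title: str
    author: Author
    published: datetime
    content_html: str  # original markup, kept alongside the structured fields
    comments: List[Comment] = field(default_factory=list)
    pingbacks: List[str] = field(default_factory=list)  # URLs that linked back
    metadata: Dict[str, str] = field(default_factory=dict)  # tags, categories, ...

    def commenters(self) -> List[str]:
        """Distinct commenter names, in the order they first commented."""
        names: List[str] = []
        for comment in self.comments:
            if comment.author.name not in names:
                names.append(comment.author.name)
        return names


if __name__ == "__main__":
    alice, bob = Author("Alice"), Author("Bob")
    post = Post(
        url="http://example.org/2013/11/first-post",
        title="First post",
        author=Author("Editor"),
        published=datetime(2013, 11, 22, 10, 0),
        content_html="<p>Hello, archive.</p>",
    )
    post.comments += [
        Comment(alice, "Nice post.", datetime(2013, 11, 22, 11, 0)),
        Comment(bob, "Agreed.", datetime(2013, 11, 22, 11, 5)),
        Comment(alice, "One more thought.", datetime(2013, 11, 22, 12, 0)),
    ]
    print(post.commenters())  # ['Alice', 'Bob']
```

Because comments, authors and pingbacks are stored as separate entities, a question such as "who commented on this post?" becomes a field lookup instead of a re-parse of archived HTML.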

BlogForever project achievements

Harvesting
Sources: unstructured information, web services, blog APIs
Blog crawlers:
• Real-time monitoring
• HTML data extraction engine
• Spam filtering
• Web services extraction engine
Output: original data and XML metadata

Preserving, managing and reusing
Blog digital repository (accessed through web services and a web interface):
• Digital preservation
• Quality assurance
• Collections curation
• Public access APIs
• Personalised services
• Information retrieval
• Public web interface: browse, search, export
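The harvesting side of this architecture can be sketched as a two-stage pipeline: extract post fields from a page's HTML, then drop comments flagged as spam. The sketch below uses Python's standard `html.parser`; the CSS class names (`post-title`, `entry`, `comment`), the keyword list and the function names are hypothetical, and a keyword check is only a stand-in for whatever spam filtering BlogForever actually uses:

```python
from html.parser import HTMLParser

# Hypothetical class-name markers; real blogs vary widely, so a production
# extractor would need per-site configuration or learned templates.
FIELD_CLASSES = {"post-title": "title", "entry": "content", "comment": "comment"}
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}  # never closed
SPAM_KEYWORDS = ("cheap pills", "casino", "click here")   # illustrative only


class PostExtractor(HTMLParser):
    """Collect text inside elements whose class marks a known field."""

    def __init__(self):
        super().__init__()
        self.fields = {"title": "", "content": ""}
        self.comments = []
        self._capture = None  # field currently being captured
        self._depth = 0       # element nesting depth inside that field

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self._capture is not None:
            self._depth += 1
            return
        for cls in (dict(attrs).get("class") or "").split():
            if cls in FIELD_CLASSES:
                self._capture, self._depth = FIELD_CLASSES[cls], 1
                if self._capture == "comment":
                    self.comments.append("")
                break

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or self._capture is None:
            return
        self._depth -= 1
        if self._depth == 0:
            self._capture = None

    def handle_data(self, data):
        if self._capture == "comment":
            self.comments[-1] += data
        elif self._capture is not None:
            self.fields[self._capture] += data


def is_spam(text):
    lowered = text.lower()
    return any(keyword in lowered for keyword in SPAM_KEYWORDS)


def harvest(html):
    """Extract one post and filter its comments."""
    parser = PostExtractor()
    parser.feed(html)
    parser.close()
    comments = [c.strip() for c in parser.comments]
    return {
        "title": parser.fields["title"].strip(),
        "content": parser.fields["content"].strip(),
        "comments": [c for c in comments if not is_spam(c)],
        "spam_dropped": sum(1 for c in comments if is_spam(c)),
    }


SAMPLE_PAGE = """<html><body>
  <h1 class="post-title">Archiving blogs</h1>
  <div class="entry"><p>Post body.</p></div>
  <div class="comment"><p>Great <b>post</b>!</p></div>
  <div class="comment">Click HERE for cheap pills</div>
  <div class="comment">Useful overview,<br> thanks.</div>
</body></html>"""

if __name__ == "__main__":
    print(harvest(SAMPLE_PAGE))
```

Keeping extraction and filtering as separate functions mirrors the diagram's separate components, so either stage could be swapped for a stronger technique without touching the other.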
BlogForever Added Value
• BlogForever structures the archived blog content. Rather than archiving only HTML pages, it archives information entities (posts, comments, authors, metadata, dates, pingbacks, etc.) based on a dedicated data model.
• BlogForever is based on Invenio, an open-source, state-of-the-art digital library management system developed by CERN.

• Better metadata and higher information granularity.
• Open standards and interoperability (MARCXML, web services).
• Better management of archived information, increasing the utility of the web archive.
• Easier provision of value-added services, e.g. analytics.
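MARCXML, one of the open standards named above, wraps MARC 21 bibliographic fields in XML under the namespace `http://www.loc.gov/MARC21/slim`. The sketch below serialises one blog post as a minimal MARCXML record with the standard library. The field choices (100 $a for the author, 245 $a for the title, 260 $c for the date, 856 $u for the URL) are common MARC 21 conventions picked for illustration; they are an assumption, not BlogForever's or Invenio's actual mapping, and the output is not validated against the MARCXML schema:

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"
ET.register_namespace("", MARC_NS)  # serialise MARC elements without a prefix


def datafield(record, tag, code, value, ind1=" ", ind2=" "):
    """Append a <datafield> holding a single <subfield>."""
    df = ET.SubElement(record, f"{{{MARC_NS}}}datafield",
                       {"tag": tag, "ind1": ind1, "ind2": ind2})
    sf = ET.SubElement(df, f"{{{MARC_NS}}}subfield", {"code": code})
    sf.text = value
    return df


def post_to_marcxml(post):
    """Serialise a post dict (hypothetical keys: author, title, date, url)."""
    record = ET.Element(f"{{{MARC_NS}}}record")
    datafield(record, "100", "a", post["author"])
    datafield(record, "245", "a", post["title"])
    datafield(record, "260", "c", post["date"])
    # 856 first indicator "4" means the resource is reached over HTTP.
    datafield(record, "856", "u", post["url"], ind1="4")
    return ET.tostring(record, encoding="unicode")


if __name__ == "__main__":
    print(post_to_marcxml({
        "author": "Editor",
        "title": "First post",
        "date": "2013-11-22",
        "url": "http://example.org/2013/11/first-post",
    }))
```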
BlogForever Impact
• Reusable, generic blog archiving methods and policies.
• A blog archiving solution that any institution could use to preserve its blog collections, ensuring authenticity, integrity, completeness, usability and long-term accessibility.
• A blog archiving solution that any researcher could use to gather, analyse and reuse blog data.
BlogForever Applications
• CERN is currently implementing a high-energy physics blog repository.
• AUTH is designing an academic blog repository.
• The Linguistics Department of the University of Hannover is carrying out a diachronic analysis of selected linguistic and textual phenomena and features in German blogs.
• The University of Warwick Computer Science Department is performing social web analytics on blog data.
Thank you!
Visit http://blogforever.eu
• Access all BlogForever deliverables (Open Access).
• Download the open-source BlogForever platform.

Contact us:
• Project Manager: Vangelis Banos, vbanos@gmail.com
• Exploitation Manager: Efstratios Arampatzis, sa@tero.gr


Editor's Notes

  1. The key BlogForever project goals were fully achieved within the project's time span, through a series of theoretical and applied research tasks. Initially, BlogForever focused on studying weblog structure and semantics and began developing preservation strategies for weblogs. Later, the focus gradually moved to implementing the BlogForever platform, as well as to interoperability prospects and digital rights management strategies. An important aspect of the project was also the design and implementation of extensive case studies of varying complexity and size, to validate and test the BlogForever platform. BlogForever created an exciting new system to harvest, preserve and manage blog content, developing new insights through its restructuring and reuse. To this end, it stepped into uncharted territory in both the theoretical and practical aspects of blog preservation: it first researched blog structure and semantics; it then defined solid blog preservation policies and developed a robust blog preservation software platform; finally, it validated the platform through specific case studies using real-world data.
  2. After working on what to preserve and how to preserve it, we present how we implemented blog preservation.