BlogForever Project presentation at MTSR2013

The BlogForever Project
http://blogforever.eu
Vangelis Banos,
BlogForever Project Manager

MTSR 2013, 22 Nov 2013, Thessaloniki

Contents
The Disappearing Web
Web Archiving
The BlogForever Project

BlogForever Applications

Web content disappears

Web Archiving

The Internet Archive comes to the rescue!

Web Archiving
The process of collecting portions of the
World Wide Web to ensure the information is
preserved in an archive for future researchers,
historians, and the public.

The challenge of web archiving

[Figure: Generic file archiving operation. Labels: File(s), Software, Hardware, Record.]

The challenge of web archiving
[Figure: Web archiving operation. Labels: Website, multiple File(s) and Software groups, Hardware, Record(s); two elements are marked "???".]

We are focusing on blogs
• Blogs have become fairly established as an online communication and web publishing tool.
• Hundreds of millions of blogs are published about every conceivable subject.

Examples (12/9/2013):
• 70+ million sites in the world
• 369 million people viewing more than 11.8 billion pages each month
• 38 million new posts and 62.3 million new comments each month
• 136.5 million blogs
• 61 billion posts
• 83.7 million daily posts

Blog Archiving: Objectives & Concerns
• Blog characteristics:
  - Database-driven, dynamic websites
  - High frequency of updates
  - Special structure, metadata, semantics & communication protocols
  - Highly interconnected
  - Quantity and range of resources
  - Ownership and DRM
• Our aims:
  - Harvest, preserve, manage and reuse blogs and their resources.

The BlogForever Project
• Collaborative EC-funded project
• Duration: 1 Mar 2011 to 31 Aug 2013
• Aims: theoretical and applied research on blog archiving
• Coordinated by AUTH
• Partners: [partner logos shown on the slide]

BlogForever project achievements
BlogForever has created a novel blog archiving approach. It is not only about archiving pages; it is about archiving information entities (posts, comments, authors, metadata, dates, pingbacks, etc.).
• Blog modelling and semantics
• Preservation strategies
• Case studies and validation
• Implementation of the BlogForever platform

BlogForever project achievements
[Figure: BlogForever platform diagram. Recoverable labels follow.]

Harvesting
• Sources: unstructured information, web services, blog APIs
• Blog crawlers:
  - Real-time monitoring
  - HTML data extraction engine
  - Spam filtering
  - Web services extraction engine
• Original data and XML metadata

Preserving, managing and reusing
• Blog digital repository:
  - Digital preservation
  - Quality assurance
  - Collections curation
  - Public access APIs
  - Personalised services
  - Information retrieval
  - Public web interface: browse, search, export
• Web services, web interface
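As a rough illustration of the harvesting step, the sketch below pulls post-level fields out of a blog feed instead of saving the page as a whole. It assumes the blog exposes an RSS 2.0 feed; the sample feed, the `extract_posts` helper and the chosen field names are hypothetical and are not taken from the BlogForever crawler.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed; a real crawler would fetch this over HTTP.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <link>http://example.org/</link>
    <item>
      <title>First post</title>
      <link>http://example.org/first-post</link>
      <author>alice@example.org (Alice)</author>
      <pubDate>Mon, 09 Sep 2013 10:00:00 GMT</pubDate>
      <description>Hello, world.</description>
    </item>
    <item>
      <title>Second post</title>
      <link>http://example.org/second-post</link>
      <pubDate>Tue, 10 Sep 2013 12:30:00 GMT</pubDate>
      <description>More content.</description>
    </item>
  </channel>
</rss>
"""


def extract_posts(rss_text):
    """Return one dict per <item>; fields missing from the feed are None."""
    root = ET.fromstring(rss_text)
    posts = []
    for item in root.iter("item"):
        posts.append({
            "title": item.findtext("title"),
            "url": item.findtext("link"),
            "author": item.findtext("author"),
            "published": item.findtext("pubDate"),
            "content": item.findtext("description"),
        })
    return posts


posts = extract_posts(SAMPLE_RSS)
for post in posts:
    print(post["published"], post["title"], post["url"])
```

Feeds usually carry only recent posts and often omit comments, which is one reason a diagram like the one above would also list an HTML data extraction engine alongside web services.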
BlogForever Added Value
• BlogForever structures the archived blog content. It is not only about archiving HTML pages; it is about archiving information entities (posts, comments, authors, metadata, dates, pingbacks, etc.) based on a special data model.
• BlogForever is based on Invenio, an open-source, state-of-the-art digital library management system developed by CERN.
• Better metadata and higher information granularity.
• Open standards and interoperability (MARCXML, web services).
• Better management of archived information, increasing the utility of the web archive.
• Makes it easy to offer added-value services, e.g. analytics.
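To make "archiving information entities" concrete, here is a minimal sketch of a post modelled as a structured record and serialised to MARCXML. The `Post` and `Comment` classes and the choice of MARC fields (100 for author, 245 for title, 260 $c for date, 856 $u for URL) are illustrative assumptions based on common MARC 21 usage; they are not the actual BlogForever data model or its MARC mapping.

```python
from dataclasses import dataclass, field
from typing import List
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"  # MARCXML namespace


@dataclass
class Comment:
    author: str
    text: str


@dataclass
class Post:
    title: str
    author: str
    url: str
    published: str
    comments: List[Comment] = field(default_factory=list)


def add_datafield(record, tag, code, value):
    """Append <datafield tag=...><subfield code=...>value</subfield></datafield>."""
    datafield = ET.SubElement(
        record, f"{{{MARC_NS}}}datafield", {"tag": tag, "ind1": " ", "ind2": " "}
    )
    subfield = ET.SubElement(datafield, f"{{{MARC_NS}}}subfield", {"code": code})
    subfield.text = value


def post_to_marcxml(post):
    """Serialise one post as a MARCXML record (illustrative field mapping)."""
    ET.register_namespace("marc", MARC_NS)
    record = ET.Element(f"{{{MARC_NS}}}record")
    add_datafield(record, "100", "a", post.author)     # main entry: author
    add_datafield(record, "245", "a", post.title)      # title statement
    add_datafield(record, "260", "c", post.published)  # publication date
    add_datafield(record, "856", "u", post.url)        # original location
    return ET.tostring(record, encoding="unicode")


post = Post(
    title="An example post",
    author="Alice",
    url="http://example.org/an-example-post",
    published="2013-09-09",
    comments=[Comment(author="Bob", text="Nice post.")],
)
xml_text = post_to_marcxml(post)
print(xml_text)
```

Only the post itself is serialised here. In a fuller model, comments and authors would presumably become records of their own, linked back to the post, which is what gives the archive its finer granularity.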

BlogForever Impact
• Blog archiving methods and policies that are reusable and generic.
• A blog archiving solution that any institution could use to preserve its collections of blogs, ensuring authenticity, integrity, completeness, usability and long-term accessibility.
• A blog archiving solution that any researcher could use to gather, analyse and reuse blog data.

BlogForever Applications
• CERN is currently implementing a high energy physics blogs repository.
• AUTH is designing an academic blogs repository.
• The Linguistics Department of the University of Hannover is doing a diachronic analysis of certain linguistic and textual phenomena and features using German blogs.
• The University of Warwick Computer Science Department is doing social web analytics using blog data.

Thank you!
Visit http://blogforever.eu
• Access all BlogForever Deliverables (Open Access).
• Download the Open Source BlogForever Platform.

Contact us:
• Project Manager: Vangelis Banos, vbanos@gmail.com
• Exploitation Manager: Efstratios Arampatzis, sa@tero.gr

Editor's Notes

  1. The key BlogForever project goals were fully achieved within the project's time span through a series of theoretical and applied research tasks. Initially, BlogForever focused on studying weblog structure and semantics and started developing preservation strategies for weblogs. Later, the focus gradually moved to implementing the BlogForever platform, as well as to interoperability prospects and digital rights management strategies. An important aspect of the project was also the design and implementation of extensive case studies of varying complexity and size to validate and test the BlogForever platform. BlogForever created an exciting new system to harvest, preserve and manage blog content, developing new insights through its restructuring and reuse. To this end, it stepped into as yet uncharted territory in the theoretical and practical aspects of blog preservation: it first researched blog structure and semantics; it then defined solid blog preservation policies and developed a robust blog preservation software platform; finally, it validated the platform through specific case studies using real-world data.
  2. After working on what to preserve and how to preserve it, we present how we implemented blog preservation.