A Social Content Delivery Network for Scientific Cooperation: Vision, Design... – Simon Caton
Data volumes have increased so significantly that we need to carefully consider how we interact with, share, and analyze data to avoid bottlenecks. In contexts such as eScience and scientific computing, a large emphasis is placed on collaboration, resulting in many well-known challenges in ensuring that data is in the right place at the right time and accessible by the right users. Yet these simple requirements create substantial challenges for the distribution, analysis, storage, and replication of potentially "large" datasets. Additional complexity is added through constraints such as budget, data locality, usage, and available local storage. In this paper, we propose a "socially driven" approach to address some of these challenges within (academic) research contexts by defining a Social Data Cloud and its underpinning Content Delivery Network: a Social CDN (S-CDN). Our approach leverages digitally encoded social constructs via social network platforms that we use to represent (virtual) research communities. Ultimately, the S-CDN builds upon the intrinsic incentives of members of a given scientific community to address their data challenges collaboratively and in proven trusted settings. We define the design and architecture of an S-CDN and investigate its feasibility via a coauthorship case study as a first step to illustrate its usefulness.
The thorough integration of information technology and resources into scientific workflows has nurtured a new paradigm of data-intensive science. However, far too much research activity still takes place in silos, to the detriment of open scientific inquiry and advancement. Data-intensive science would be facilitated by more universal adoption of good data management practices ensuring the ongoing viability and usability of all legitimate research outputs, including data, and the encouragement of data publication and sharing for reuse. The centerpiece of such data sharing is the digital repository, acting as the foundation for external value-added services supporting and promoting effective data acquisition, publication, discovery, and dissemination. Since a general-purpose curation repository will not be able to offer the same level of specialized user experience provided by disciplinary tools and portals, a layered model built on a stable repository core is an appropriate division of labor, taking best advantage of the relative strengths of the concerned systems.
The Merritt repository, operated by the University of California Curation Center (UC3) at the California Digital Library (CDL), functions as a curation core for several data sharing initiatives, including the eScholarship open access publishing platform, the DataONE network, and the Open Context archaeological portal. This presentation will highlight two recent examples of external integration for purposes of research data sharing: DataShare, an open portal for biomedical data at UC San Francisco; and Research Hub, an Alfresco-based content management system at UC Berkeley. Both significantly extend Merritt's coverage of the full research data lifecycle and workflows: upstream, with augmented capabilities for data description, packaging, and deposit; and downstream, with enhanced domain-specific discovery. These efforts showcase the catalyzing effect that coupled integration of curation repositories and well-known public disciplinary search environments can have on research data sharing and scientific advancement.
Over the past decade, as the scholarly community’s reliance on e-content has increased, so too has the development of preservation-related digital repositories. The need for descriptive, administrative, and structural metadata for each digital object in a preservation repository was clearly recognized by digital archivists and curators. However, in the early 2000s, most of the published specifications for preservation-related metadata were either implementation specific or broadly theoretical. In 2003, the Online Computer Library Center (OCLC) and Research Libraries Group (RLG) established an international working group called PREMIS (Preservation Metadata: Implementation Strategies) to develop a common core set of metadata elements for digital preservation. The first version of the PREMIS Data Dictionary for Preservation Metadata and its supporting XML schema was issued in 2005. Experience using its specifications in preservation repositories has led to several revisions, culminating in version 2.0 in 2008. The Data Dictionary is now in version 2.2 (July 2012), and it is widely implemented in preservation repositories throughout the world in multiple domains.
Presentation on the Warsaw Conference on National Bibliographies, August 2012 – nw13
An update on the conference held at the National Library of Poland in August 2012 on the challenges facing national bibliographic services in the digital age. The presentation was made at the IFLA WLIC Conference as part of the IFLA Bibliography Standing Committee section of the conference.
Charleston 2012 - The Future of Serials in a Linked Data World – ProQuest
The educational objective of this session is to review today’s MARC-based environment in which the serial record predominates, and compare that with what might be possible in a future world of linked data. The session will inspire conversation and reflection on a number of questions. What will a world of statement-based rather than record-based metadata look like? What will a new environment mean for library systems, workflows, and information dissemination?
Keh-Jiann Chen
Principal Investigator
Core Platforms for Digital Contents Project, TELDAP
Research Fellow
Research Center for Information Technology Innovation &
Institute of Information Science, Academia Sinica
Talk on "Dissecting Wikipedia" given at CRASSH, Cambridge, on 6th March 2013.
Abstract:
Andrew Gray, the British Library's Wikipedian in Residence, has been working on an AHRC-supported program to help more academics and researchers engage with Wikipedia. In this talk, he will give a brief history of the Wikipedia project, looking at its origins and the way it has developed over time. The talk will also cover the growing amount of research done around Wikipedia itself: well over 2,000 peer-reviewed papers have been published that examine Wikipedia in some way, studying the project's content and community, or using its data to explore broader questions of collaboration and interaction.
Presented by Samara Carter and Monique Clark at the 2013 Power Up Your Pedagogy Conference held at the Annandale campus of Northern Virginia Community College.
The wiki concept in the Web 2.0 scenario emerged as a response to new technologies, making libraries more user-centered and networking faculty, students, and librarians to create a vital, evolving organization designed to meet the needs of users in the digital library era.
Analyzing Multidimensional Networks within MediaWikis – Brian Keegan
The MediaWiki platform supports popular socio-technical systems such as Wikipedia as well as thousands of other wikis. This software encodes and records a variety of relationships about the content, history, and editors of its articles, such as hyperlinks between articles, discussions among editors, and editing histories. These relationships can be analyzed using standard techniques from social network analysis; however, extracting relational data from Wikipedia has traditionally required specialized knowledge of its API, information retrieval, network analysis, and data visualization that has inhibited scholarly analysis. We present a software library called the NodeXL MediaWiki Importer that extracts a variety of relationships from the MediaWiki API and integrates with the popular NodeXL network analysis and visualization software. This library allows users to query and extract a variety of multidimensional relationships from any MediaWiki installation with a publicly accessible API. We present a case study examining the similarities and differences between different relationships for the Wikipedia articles about "Pope Francis" and "Social media." We conclude by discussing the implications this library has for both theoretical and methodological research as well as community management, and outline future work to expand the capabilities of the library.
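For readers who want to try this kind of extraction without NodeXL, here is a minimal Python sketch (ours, not part of the paper) that pulls one of the relationship types described above, editor-to-article edges from editing histories, using only the public MediaWiki web API and the requests library:

import requests

API = "https://en.wikipedia.org/w/api.php"  # any MediaWiki with a public API works

def editor_edges(title, limit=20):
    """Yield (editor, article) edges from a page's recent revision history."""
    params = {"action": "query", "titles": title, "prop": "revisions",
              "rvprop": "user", "rvlimit": limit, "format": "json"}
    data = requests.get(API, params=params).json()
    page = next(iter(data["query"]["pages"].values()))
    for rev in page.get("revisions", []):
        yield rev.get("user", "?"), title

# Edges for one of the case-study articles:
for editor, article in editor_edges("Pope Francis"):
    print(editor, "->", article)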
Connections that work: Linked Open Data demystified – Jakob .
Keynote given 2014-10-22 at the National Library of Finland at Kirjastoverkkopäivät 2014 (https://www.kiwi.fi/pages/viewpage.action?pageId=16767828) #kivepa2014
Collaborative Creation of a Wikidata handbook – Jakob .
Presentation about the creation of a German handbook on Wikidata and authority files. Accepted at OpenSym (WikiSym) conference 2014 (August 28th). More about the book at http://hshdb.github.io/normdaten-in-wikidata/
Lightning talk for the Semantic Web in Libraries (SWIB13) conference on 2013-11-27 about another method of expressing RDF data. See http://gbv.github.io/aREF/ for a preliminary specification.
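As a rough illustration of the idea (our reading of the preliminary spec, so details may differ): aREF expresses an RDF graph as an ordinary nested data structure, so a graph can live in YAML, JSON, or, as here, a plain Python dict, with namespace prefixes declared alongside the data:

# Hypothetical aREF-style encoding of two triples about one subject.
graph = {
    "_ns": {"dct": "http://purl.org/dc/terms/"},      # prefix map
    "_id": "http://example.org/book/1",                # the subject
    "dct_title": "Another RDF Encoding Form@en",       # literal with language tag
    "dct_creator": "<http://example.org/jakob>",       # URI object in angle brackets
}
print(graph["dct_title"])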
Management of document-oriented DTDs for the document and publication serv... – Jakob .
Talk given on 06.02.2003 at the colloquium of the XML Clearinghouse, Berlin (see http://www.ag-nbi.de/archiv/www.xml-clearinghouse.de/ch-veranstaltungen/1/kolloquium_single4f8d.html?eventId=91).
Abstract: One of the original goals of SGML was the creation of uniform formats for text documents. Different document formats (DTDs), such as TEI and DocBook, have established themselves for different application areas. However, no single schema can satisfy all requirements. The Computer and Media Service of Humboldt University Berlin has been using its own DiML DTD for the long-term archiving of dissertations for five years, with a holding of now almost 250 documents in SGML. With the switch to XML, the Electronic Publishing working group developed its own system for managing the new document format, in which reusable structures are maintained. From these, DTDs for various kinds of scientific publications (dissertations, articles, lectures, conference proceedings, etc.) can be generated as needed; they contain all subject-specific elements and remain manageable for authors using XML text tools. At the same time, the associated tools such as document templates and stylesheets can be designed uniformly. The system will be presented using the new DiML DTD as an example, and the possibility of transferring it to other application areas will be discussed.
FRBR light with a Simplified Ontology for Bibliographic Resources – Jakob .
Lightning Talk about a Simplified Ontology for Bibliographic Resources, given at the Semantic Web in Libraries (SWIB11) conference on November 29th, 2011.
In his public lecture, Christian Timmerer provides insights into the fascinating history of video streaming, from its humble beginnings before YouTube to the groundbreaking technologies that now dominate platforms like Netflix and ORF ON. Timmerer also presents provocative contributions of his own that have significantly influenced the industry. He concludes by looking at future challenges and invites the audience to join in a discussion.
Transcript: Selling digital books in 2024: Insights from industry leaders - T... – BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -... – DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses.
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
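As a taste of what the notebook walks through, here is a minimal sketch using the pypowsybl Python binding (assumes pip install pypowsybl; function names follow the upstream documentation, so treat this as our approximation rather than webinar material):

import pypowsybl as pp

network = pp.network.create_ieee14()    # bundled IEEE 14-bus test network
results = pp.loadflow.run_ac(network)   # run an AC power flow
print(results[0].status)                # convergence status of the main component
print(network.get_buses().head())       # bus voltages as a pandas DataFrame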
PHP Frameworks: I want to break free (IPC Berlin 2024) – Ralf Eggert
In this presentation, we examine the challenges and limitations of relying too heavily on PHP frameworks in web development. We discuss the history of PHP and its frameworks to understand how this dependence has evolved. The focus will be on providing concrete tips and strategies to reduce reliance on these frameworks, based on real-world examples and practical considerations. The goal is to equip developers with the skills and knowledge to create more flexible and future-proof web applications. We'll explore the importance of maintaining autonomy in a rapidly changing tech landscape and how to make informed decisions in PHP development.
This talk encourages a more independent use of PHP frameworks, moving towards more flexible and future-proof PHP development.
Unlocking Productivity: Leveraging the Potential of Copilot in Microsoft 365, a presentation by Christoforos Vlachos, Senior Solutions Manager – Modern Workplace, Uni Systems
Climate Impact of Software Testing at Nordic Testing Days – Kari Kakkonen
My slides at Nordic Testing Days 6.6.2024
The climate impact and sustainability of software testing are discussed in the talk. ICT and testing must carry their part of global responsibility to help counter climate warming. We can minimize the carbon footprint, but we can also have a carbon handprint, a positive impact on the climate. Quality characteristics can be extended with sustainability and then measured continuously. Test environments can be used less, at smaller scale, and on demand. Test techniques can be used to optimize or minimize the number of tests. Test automation can be used to speed up testing.
Elevating Tactical DDD Patterns Through Object Calisthenics – Dorra BARTAGUIZ
After immersing yourself in the blue book and its red counterpart, attending DDD-focused conferences, and applying tactical patterns, you're left with a crucial question: How do I ensure my design is effective? Tactical patterns within Domain-Driven Design (DDD) serve as guiding principles for creating clear and manageable domain models. However, achieving success with these patterns requires additional guidance. Interestingly, we've observed that a set of constraints initially designed for training purposes remarkably aligns with effective pattern implementation, offering a more ‘mechanical’ approach. Let's explore together how Object Calisthenics can elevate the design of your tactical DDD patterns, offering concrete help for those venturing into DDD for the first time!
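To make the pairing concrete, here is a small sketch of one calisthenics constraint, "wrap all primitives", applied to a tactical DDD value object (our illustration in Python, not the speaker's material):

from dataclasses import dataclass

@dataclass(frozen=True)
class Amount:
    """Value object: wraps the primitive and owns its invariants."""
    cents: int

    def __post_init__(self):
        if self.cents < 0:
            raise ValueError("an amount cannot be negative")

    def add(self, other: "Amount") -> "Amount":
        return Amount(self.cents + other.cents)

price = Amount(1999).add(Amount(500))  # behaviour lives with the data
print(price)                           # Amount(cents=2499)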
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024 – Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview, including the concepts of Customer Key and Double Key Encryption.
UiPath Test Automation using UiPath Test Suite series, part 4 – DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques.
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Threats to mobile devices are more prevalent and increasing in scope and complexity. Users of mobile devices want to take full advantage of the features available on those devices, but many features that provide convenience and capability sacrifice security. This best practices guide outlines steps users can take to better protect personal devices and information.
UiPath Test Automation using UiPath Test Suite series, part 5 – DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 5. In this session, we will cover CI/CD with DevOps.
Topics covered:
CI/CD within UiPath
End-to-end overview of the CI/CD pipeline with Azure DevOps
Speaker:
Lyndsey Byblow, Test Suite Sales Engineer @ UiPath, Inc.
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024 – Neo4j
Neha Bajwa, Vice President of Product Marketing, Neo4j
Join us as we explore breakthrough innovations enabled by interconnected data and AI. Discover firsthand how organizations use relationships in data to uncover contextual insights and solve our most pressing challenges – from optimizing supply chains, detecting fraud, and improving customer experiences to accelerating drug discoveries.
By Design, not by Accident - Agile Venture Bolzano 2024
Metadata in Wikipedia
1. Metadata in Wikipedia
Daniel Kinzler
Wikimedia Deutschland e.V.
September 26, 2008

Outline:
Wikipedia
Traditional Metadata: Document and Revision, Media Metadata, Accessing Metadata (data in, data out)
Link Structure: Hyperlinks, Categories, Inter-Language Links, WikiWord
Structured Data: Records, Infoboxes, DBPedia, Semantic MediaWiki, WikiData
Conclusion: We Have, We Need, Thank You
2. Wikipedia
Wikipedia is the free encyclopedia anyone can edit
Founded in 2001
Has become the standard online reference
Number 8 website (Alexa), 50K requests per second
Exists in 250 languages, has 10 million articles
Run by Wikimedia, runs on MediaWiki
Free content, free software
3. Document Metadata
Traditional (document) metadata is available throughout Wikipedia
Document information: Title, URL
Revision information: Author, Timestamp
4. Media Metadata
Metadata for media files is maintained on-page, as content:
Source, License, Contributors, ...
5. Images Metadata
Metadata for image formats:
Resolution
EXIF: Author, Copyright, Timestamp, Exposure, Aperture, Flash, Camera model, ...
Metadata for audio and video formats is not yet supported.
6. Online Export Interface
MediaWiki's page export facility provides limited metadata:
Special:Export
Pages and revisions
XML wrapper around wikitext
Some basic metadata
7. Online Export Interface XML
http://en.wikipedia.org/wiki/Special:Export/Berlin

<page>
  <title>Berlin</title>
  <id>3354</id>
  <revision>
    <id>240627831</id>
    <timestamp>2008-09-24T06:44:58Z</timestamp>
    <contributor>
      <username>Ling.Nut</username>
      <id>1929579</id>
    </contributor>
    <minor/>
    <comment>clean up, typos fixed</comment>
    <text xml:space="preserve">
{{pp-semi-protected|small=yes}}
{{otheruses1|the capital of Germany}}
{{Infobox German Bundesland
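A minimal script for consuming this export feed might look as follows (requests and ElementTree are our tooling choices, not part of the talk):

import requests
import xml.etree.ElementTree as ET

xml_text = requests.get("https://en.wikipedia.org/wiki/Special:Export/Berlin").text
root = ET.fromstring(xml_text)
ns = {"mw": root.tag.split("}")[0].strip("{")}  # export schema namespace

page = root.find("mw:page", ns)
print(page.find("mw:title", ns).text)           # Berlin
revision = page.find("mw:revision", ns)
print(revision.find("mw:timestamp", ns).text)   # timestamp of the revision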
8. MediaWiki Web API
MediaWiki's web API for bots/scripts:
api.php
supports complex queries
lots of properties
several output formats (JSON, YAML, WDDX, ...)
but no RDF
9. MediaWiki Web API XML
http://en.wikipedia.org/w/api.php?action=query&titles=Berlin&prop=info|revisions&rvlimit=5&format=xml

<page pageid="3354"
      ns="0"
      title="Berlin"
      touched="2008-09-24T06:44:58Z"
      lastrevid="240627831"
      counter="2317"
      length="91446">
  <revisions>
    <rev revid="240627831"
         minor=""
         user="Ling.Nut"
         timestamp="2008-09-24T06:44:58Z"
         comment="clean up, typos fixed" />
    <rev revid="239984512"
         user="Lear 21"
         timestamp="2008-09-21T12:03:45Z"
         comment="/* Transportation */ ref" />
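The same query is just as easy to issue from a script; a sketch with the output format switched to JSON (query parameters exactly as on the slide):

import requests

params = {"action": "query", "titles": "Berlin",
          "prop": "info|revisions", "rvlimit": 5, "format": "json"}
data = requests.get("https://en.wikipedia.org/w/api.php", params=params).json()

page = next(iter(data["query"]["pages"].values()))
print(page["title"], page["lastrevid"])
for rev in page["revisions"]:
    print(rev["user"], rev["timestamp"], rev.get("comment", ""))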
10. MediaWiki RDF Extension
The RDF Extension provides access to metadata:
Per-page RDF output
Document info mainly in DC and CC vocab
Also links, categories, images, etc.
Output in XML, Turtle or NTriples
Supports custom RDF embedded in wiki pages
Compare http://www.communitywiki.org/en/DublinCoreForWiki
Not on Wikipedia, used by WikiTravel
11. MediaWiki RDF Extension XML
http://wikitravel.org/en/Special:Rdf/Berlin

<rdf:Description
    rdf:about="http://wikitravel.org/en/Berlin">
  <dc:date rdf:datatype="http://purl.org/dc/elements/1.1/W3CDTF">
    2008-09-23T18:04:01Z
  </dc:date>
  <dc:rights>
    Creative Commons Attribution-ShareAlike 1.0
  </dc:rights>
  <dc:title xml:lang="en">
    Berlin
  </dc:title>
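Because the output is plain RDF/XML, any RDF toolkit can consume it; a sketch with rdflib (our choice of library, and the WikiTravel URL may no longer serve RDF today):

from rdflib import Graph

g = Graph()
g.parse("http://wikitravel.org/en/Special:Rdf/Berlin", format="xml")
for subject, predicate, obj in g:
    if "purl.org/dc/" in str(predicate):   # keep only the Dublin Core statements
        print(predicate, "->", obj)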
12. Structural Information
Wiki pages contain several types of links
The structure of hyperlinks encodes relations
Links connect on textual and conceptual level
Links maintained by users, relations are implicit
13. Page Links
Hyperlinks cross-reference pages
Navigational, but also conceptual
Mutually linked pages → related concepts
Link label and link target → word and meaning
Beware identity crisis when choosing URIs
[[Berlin Wall|The Wall]]
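The "mutually linked pages" heuristic can be tested directly against the web API from slide 8; a small sketch (the function name is ours):

import requests

API = "https://en.wikipedia.org/w/api.php"

def links_from(title):
    """Titles of main-namespace pages linked from the given page."""
    params = {"action": "query", "titles": title, "prop": "links",
              "plnamespace": 0, "pllimit": "max", "format": "json"}
    data = requests.get(API, params=params).json()
    page = next(iter(data["query"]["pages"].values()))
    return {link["title"] for link in page.get("links", [])}

a, b = "Berlin", "Berlin Wall"
if b in links_from(a) and a in links_from(b):
    print(a, "and", b, "are mutually linked: likely related concepts")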
14. Category Links
Pages are assigned to one or more categories.
Categories form a poly-hierarchy (by convention)
Categories of pages → subsumption of concepts
Structure often unclear or broken
No intersection, no transitive inclusion
[[Category:Capitals in Europe]]
[[Category:States of Germany]]
15. Inter-Language Links
Inter-language links refer to the same page in a different language (on another wiki)
Granularity and coverage differ greatly
Mutually linked pages probably describe the same concept
Maintained manually, and per bot
Would a centralized system be better?
[[de:Berliner Mauer]]
[[fr:Mur de Berlin]]
16. WikiWord
WikiWord builds a thesaurus by mining the link structure:
Every page describes a concept
Link labels are terms referring to those concepts
Links and categories define relations
Multilingual thesaurus by merging languages
Export to SKOS
No web interface yet
http://brightbyte.de/page/WikiWord
17. Data Records
Wikipedia uses templates to present structured data records
Maintained directly by users
Template parameters can be extracted
MediaWiki stores them as plain text
External mining tools needed

{{Infobox German Bundesland
|Name = Berlin
|image_photo = Cityscapeberlin2006.JPG
|area = 891.82
|population = 3416300
|elevation = 34 - 115
|GDP = 81.7
...
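One way to do this external mining today is the community library mwparserfromhell (a later tool, not something the talk presents); a sketch:

import mwparserfromhell

wikitext = """{{Infobox German Bundesland
|Name = Berlin
|area = 891.82
|population = 3416300
}}"""

code = mwparserfromhell.parse(wikitext)
for template in code.filter_templates():
    if str(template.name).strip() == "Infobox German Bundesland":
        record = {str(p.name).strip(): str(p.value).strip() for p in template.params}
        print(record)  # {'Name': 'Berlin', 'area': '891.82', 'population': '3416300'}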
18. Infoboxes
Infoboxes present a terse overview of properties
Used for cities, animals, bands, books, chemicals, ...
Qualifiers are problematic: date of measurement, error margin, unit, source, etc.
19. Personendaten
"Personendaten" are biographic records on the German Wikipedia
Works like a hidden infobox
Contains date/place of birth/death, aliases, etc.
Maintained by a WikiProject
Automated extraction (every now and then)

{{Personendaten
|NAME=Einstein, Albert
|ALTERNATIVNAMEN=
|KURZBESCHREIBUNG=Physiker
|GEBURTSDATUM=14. März 1879
|GEBURTSORT=[[Ulm]]
|STERBEDATUM=18. April 1955
|STERBEORT=[[Princeton (New Jersey)|Princeton]], [[USA]]
}}
20. DBPedia
DBPedia is a project that mines RDF triples from Infoboxes:
Allows SPARQL queries
Multiple languages
100 million triples
Web interface
http://dbpedia.org
21. DBPedia XML
http://dbpedia.org/data/Berlin

<rdf:Description
    rdf:about="http://dbpedia.org/resource/Lothar_Bolz">
  <n0pred:deathPlace xmlns:n0pred="http://dbpedia.org/property/"
      rdf:resource="http://dbpedia.org/resource/Berlin"/>
</rdf:Description>
<rdf:Description
    rdf:about="http://dbpedia.org/resource/Alfred_Wegener">
  <n0pred:birthPlace xmlns:n0pred="http://dbpedia.org/property/"
      rdf:resource="http://dbpedia.org/resource/Berlin"/>
</rdf:Description>
<rdf:Description
    rdf:about="http://dbpedia.org/resource/Untoten">
  <n0pred:origin xmlns:n0pred="http://dbpedia.org/property/"
      rdf:resource="http://dbpedia.org/resource/Berlin"/>
</rdf:Description>
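The "allows SPARQL queries" point in practice: asking the public DBpedia endpoint who was born in Berlin (a sketch; the endpoint, vocabulary, and result shape have evolved since 2008):

import requests

query = """
SELECT ?person WHERE {
  ?person <http://dbpedia.org/property/birthPlace>
          <http://dbpedia.org/resource/Berlin> .
} LIMIT 5
"""
response = requests.get("https://dbpedia.org/sparql",
                        params={"query": query,
                                "format": "application/sparql-results+json"})
for binding in response.json()["results"]["bindings"]:
    print(binding["person"]["value"])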
22. Semantic MediaWiki
Semantic MediaWiki is a MediaWiki extension:
Builds an RDF structure
Allows SPARQL queries
Users enter semantic relations in wiki syntax
More complex syntax
semantic-mediawiki.org
Not supported by Wikipedia
23. Semantic MediaWiki XML
http://semantic-mediawiki.org/wiki/Special:ExportRDF/Berlin

<swivt:Subject rdf:about="&wiki;Berlin">
  <rdfs:label>Berlin</rdfs:label>
  <swivt:page rdf:resource="&wikiurl;Berlin"/>
  <rdfs:isDefinedBy rdf:resource="&wikiurl;Special:ExportRDF/Berlin"/>
  <rdf:type rdf:resource="&wiki;Category-3ACity"/>
  <property:Capital_of rdf:resource="&wiki;Germany"/>
  <property:Coordinates rdf:datatype="http://www.w3.org/2001/XMLSchema#string">
    52° 31′ 0″ N, 13° 24′ 0″ E
  </property:Coordinates>
  <property:Located_in rdf:resource="&wiki;Germany"/>
  <property:Population rdf:datatype="http://www.w3.org/2001/XMLSchema#double">
    3391407
  </property:Population>
</swivt:Subject>
24. WikiData
WikiData is a MediaWiki extension:
Stores structured data separate from wikitext
Reusable across wikis
Form-based structured data entry
No export interface
omegawiki.org
Not used by Wikipedia, active on OmegaWiki
25. We Have
We have...
Document Metadata
Structural Data
Structured data records
Lots of people maintaining this
26. We Need
We need ways to...
maintain the data easily.
store structured data sensibly.
query the data efficiently.
access the data conveniently.
We need people to make it happen.
27. Thank You
The End
http://brightbyte.de/repos/papers/2008/