Georg Rehm. Observations on Annotations – From Computational Linguistics and the World Wide Web to Artificial Intelligence and back again. Annotation in Scholarly Editions and Research: Function – Differentiation – Systematization, University of Wuppertal, Germany. February 20-22, 2019. Invited keynote talk.
This document summarizes a presentation about building a research data infrastructure for educational studies in Germany. It discusses the experiences of the FDB project, which aims to create a coordinated infrastructure through a network of organizations. Currently, the research data infrastructure has some limitations, including a lack of coordination, insufficient capacities, and underrepresentation of certain data types. Moving forward, the infrastructure needs to further coordinate its services, expand its capacities, and ensure long-term sustainability to fully support open educational data. The FDB project aims to address these challenges through its collaborative efforts.
1. The document provides information about several upcoming conferences and workshops in 2001 calling for papers, including:
- ACL-2001 Workshop on Data-driven MT
- MT 2010 Workshop towards a Road Map for MT
- MT Evaluation Workshop
- IWPT'01 in Beijing on Parsing Technologies
2. The conferences and workshops cover a range of topics within machine translation and natural language processing, including statistical machine translation, example-based machine translation, evaluation of MT systems, and parsing technologies.
3. Important dates are provided for each event, such as paper submission deadlines ranging from April to June 2001, and notification of acceptance ranging from May to July 2001.
European Data Forum 2012 – Campaign Concept (2012-08-28) – STI Innsbruck
The document provides a campaign concept for promoting the European Data Forum 2012 conference. It identifies the target audience as businesses, researchers, government, and non-profits working in areas related to big data and the data economy. It outlines channels for promotion, including the conference website, social media, mailing lists, and partner organizations. The campaign schedule includes press releases, social media posts, and advertisements leading up to the June 2012 event. Evaluation plans involve analyzing website traffic, registrations, and engagement on social platforms.
Semantic Interoperability Courses – Training Module 1: Introductory Overview... – Semic.eu
This document provides an introduction and overview of semantic interoperability and existing initiatives. It defines key terms like interoperability and semantic interoperability. It explains that semantic interoperability ensures the precise meaning of exchanged information is preserved. It also discusses potential conflicts like data-level conflicts due to different representations of data and schema-level conflicts due to different logical structures. The document outlines existing initiatives to achieve semantic interoperability like the ISA Programme, INSPIRE Data Models, UN/CEFACT, and NIEM.
Ontology Engineering at Scale for Open City Data Sharing – Oscar Corcho
Seminar at the School of Informatics, The University of Edinburgh.
In this talk we will present how we are applying ontology engineering principles and tools to develop a set of shared vocabularies across municipalities in Spain, so that they can start homogenising the generation and publication of open data that may be useful for their own internal reuse as well as for third parties who want to develop applications that reuse open data once and deploy them for all municipalities. We will discuss the main challenges for ontology engineering that arise in this setting, and present the work we have done to integrate ontology development tools into the common software development infrastructure used by those who are not experts in ontology engineering.
Open Access Statistics: An Examination how to Generate Interoperable Usage In... – Daniel Beucke
The document summarizes the Open Access Statistics (OAS) project, which aimed to develop standards for collecting and exchanging usage statistics across open access repositories and services. The OAS project created a technical infrastructure that allowed different repositories and services to aggregate usage data in a central system and exchange standardized usage information. The project helped pilot the implementation of usage statistics in repositories and demonstrated the ability to generate interoperable usage measures across distributed open access systems. However, further work is still needed to refine metrics and facilitate international collaboration.
This document provides an overview of the EXCITEMENT project, which aims to develop an open platform for multi-lingual textual inference. It involves academic and industrial partners working on two main goals: (1) developing algorithms and resources for multi-lingual textual inference, and (2) applying inference techniques to analyze customer interactions in multiple languages and channels. The work is organized into several work packages focusing on requirements, data collection, architecture, platform development, evaluation, and dissemination over a three-year period.
SoundSoftware: Software Sustainability for Audio and Music Researchers – SoundSoftware.ac.uk
Presented at the SoundSoftware 2012 Workshop: http://soundsoftware.ac.uk/soundsoftware2012
Sustainable and reusable software and data are becoming increasingly important in today's research environment. Methods for processing audio and music have become so complex that they cannot be fully described in a research paper. Even if really useful research is being done in one research group, other researchers may find it hard to build on it – or even to know it exists. Researchers are becoming increasingly aware of the need to publish and maintain software code alongside their results, but practical barriers often prevent this from happening. We will describe the SoundSoftware project, an effort to support software development practice in the UK audio and music research community. We examine some of the barriers to software reuse and suggest an incremental approach to overcoming them. Finally, we make some recommendations for research groups seeking to improve their own researchers' software practice.
The document discusses the development of a metadata model for digitized newspaper articles. It aims to gather existing metadata models, design a comprehensive new model called ENMAP based on standards like METS and MODS, and manage feedback on the format. The model will include a data dictionary defining structural elements and text types found in newspapers. Elements may include titles, headlines, advertisements, illustrations, and page numbers. Text types could include breaking news, reviews, obituaries, advertisements, weather forecasts, and more. The objectives are to provide clear definitions and examples so that libraries can apply the metadata and tools can use it for search and crowd-based services. Feedback is sought on defining the elements and how they relate to readers.
Web Annotations – A Game Changer for Language Technology? – Georg Rehm
Georg Rehm, Felix Sasaki, and Aljoscha Burchardt. Web Annotations – A Game Changer for Language Technologies? I Annotate 2016, Berlin, Germany, May 19/20, 2016.
Using Knowledge Graphs in Data Science – From Symbolic to Latent Representati... – Heiko Paulheim
Knowledge Graphs are often used as a symbolic representation mechanism for representing knowledge in data-intensive applications, both for integrating corporate knowledge and for providing general, cross-domain knowledge in public knowledge graphs such as Wikidata. As such, they have been identified as a useful way of injecting background knowledge into data analysis processes. To fully harness the potential of knowledge graphs, latent representations of entities in the graphs, so-called knowledge graph embeddings, show superior performance, but sacrifice one central advantage of knowledge graphs, i.e., the explicit symbolic knowledge representation. In this talk, I will shed some light on the usage of knowledge graphs and embeddings in data analysis, and give an outlook on research directions which aim at combining the best of both worlds.
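As a toy illustration of the symbolic/latent distinction this abstract describes (a minimal sketch, not taken from the talk: all entities, relations, and embedding vectors below are invented), the same fact can be held as an explicit, queryable triple or approximated by similarity between dense vectors:

```python
# Symbolic representation: explicit, human-readable triples.
# Queries are exact and the answer is directly explainable.
triples = {
    ("Berlin", "capitalOf", "Germany"),
    ("Paris", "capitalOf", "France"),
}

def is_capital(city, country):
    # Exact symbolic lookup: either the triple is asserted or it is not.
    return (city, "capitalOf", country) in triples

# Latent representation: entities as toy embedding vectors.
# Structure is implicit; answers become graded similarity scores
# rather than explainable assertions.
embeddings = {
    "Berlin":  [0.9, 0.1],
    "Paris":   [0.8, 0.2],
    "Germany": [0.7, 0.3],
}

def similarity(a, b):
    # Dot product as a simple similarity measure between entity vectors.
    va, vb = embeddings[a], embeddings[b]
    return sum(x * y for x, y in zip(va, vb))

print(is_capital("Berlin", "Germany"))  # exact symbolic answer
print(similarity("Berlin", "Germany"))  # graded latent score
```

Real embedding models (e.g. TransE-style approaches) learn such vectors from the graph itself; the trade-off sketched here, exact and explainable versus graded and opaque, is the one the talk addresses.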
Research in Intelligent Systems and Data Science at the Knowledge Media Insti... – Enrico Motta
The document discusses research directions in intelligent systems and data science. It describes work on making sense of scholarly data through techniques like data mining, semantic technologies, and machine learning. It also discusses mapping and classifying computer science research areas using an automatically generated ontology with over 14,000 topics. Other topics discussed include predicting emerging research areas, applications in smart cities like the MK:Smart project, and potential roles for robots in smart cities like an autonomous health and safety inspector.
Open Access Repositories & Interoperable Usage Statistics: Current Developmen... – uherb
The document discusses developments in open access repositories in Germany and Europe. It describes the heterogeneous repository software landscape and efforts to increase integration and standardization. It then outlines the Open Access Statistics project, which aimed to develop a common standard for exchanging usage data between repositories and provide aggregated usage information and metrics. The project created specifications and software modules and demonstrated the potential for a centralized infrastructure to process and exchange interoperable usage statistics. However, challenges around data volumes and standards remain. Further work is needed on privacy, metrics, and international cooperation.
This presentation was provided by Markus Kaindl of Springer Nature, during the NISO event "Transforming Search: What the Information Community Can and Should Build." The virtual conference was held on August 26, 2020.
OLE Project Webinar – Conversation with CUFTS, April 8, 2009 – John Little
The document discusses a webinar about the CUFTS (Open Source Serials Management) application. The webinar featured presentations from Brian Owen and Kevin Stranack of Simon Fraser University about the current state and future roadmap of the CUFTS application. Attendees could ask questions about topics like the CUFTS knowledgebase, development timeline, integrating other applications, documentation, and governance plans for ongoing support and enhancements.
Georg Rehm. META-NET and META-SHARE: An Overview. Human Language Technologies: The Baltic Perspective 2010, Riga, Latvia, October 8, 2010. Invited keynote talk.
CNI Fall 2009 – Enhanced Publications – John Doove, SURF Foundation
- SURF is an organization in the Netherlands that works to improve ICT infrastructure for higher education and research.
- SURF is working on projects to develop "enhanced publications" which combine traditional publications like text with additional materials like data, maps, images and annotations.
- Several projects have been funded to create enhanced publications in fields like archaeology and psychology. Challenges include presentation, identification, long-term preservation and developing tools and infrastructure to support enhanced publications.
- Moving forward, SURF will work on developing repository infrastructure to store and share enhanced publications, creating guidelines and incentivizing their creation through things like legal reports and reward systems.
The presentation aims to emphasize the need for more applications and prototypes in the area of the Semantic Web that will showcase the various research findings and technologies.
New directions for blog network mapping [with Lars Kirchhoff and Thomas Nicol... – Tim Highfield
This document discusses new directions for mapping blog networks. It summarizes previous research mapping political blogs during elections. The author proposes exploring different types of links between blogs, temporal dynamics around events, combining link and content analysis, geographical representations, and alternative mapping approaches beyond networks. The goal is a more nuanced depiction of blog connections and activity that accounts for link semantics, temporal variations, mixed methods, and location.
I held this presentation at the first PKP Scholarly Publishing Conference in Vancouver, Canada, on July 12, 2007. Check out the general conference blog if you want to know more about the event:
http://scholarlypublishing.blogspot.com/
You may also be interested in things marked with the "open-access" tag in my own blog:
http://corpblawg.ynada.com/
Analyzing Social Media with Digital Methods: Possibilities, Requirements, and... – Bernhard Rieder
Digital methods allow for the computational analysis of social media data through three main steps: data extraction via platform APIs, data processing and aggregation through extraction software, and data analysis and visualization using analysis software. While promising access to behavioral data at scale, social media analysis requires an understanding of each platform's data formalizations and technical limitations. Different analytical gestures can be applied through statistics, graph theory, and other methods to investigate patterns in content, users, and their relations.
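The three steps named in this abstract (extraction, processing/aggregation, analysis) can be sketched on a toy batch of posts. This is an illustrative sketch only: the sample records and function names are hypothetical, and a real pipeline would extract data through a platform API rather than return hard-coded records.

```python
from collections import Counter

# Step 1: data extraction. In practice this would call a platform API;
# here we return a hard-coded sample batch of posts.
def extract_posts():
    return [
        {"user": "alice", "text": "open data is great", "likes": 3},
        {"user": "bob", "text": "open data needs standards", "likes": 5},
        {"user": "alice", "text": "standards matter", "likes": 1},
    ]

# Step 2: processing and aggregation. Count posts and likes per user.
def aggregate(posts):
    post_counts, like_counts = Counter(), Counter()
    for p in posts:
        post_counts[p["user"]] += 1
        like_counts[p["user"]] += p["likes"]
    return post_counts, like_counts

# Step 3: analysis. Simple descriptive statistics per user; a fuller
# pipeline would feed these aggregates into statistical or graph analysis.
def analyze(post_counts, like_counts):
    return {u: {"posts": post_counts[u], "likes": like_counts[u]}
            for u in post_counts}

posts = extract_posts()
print(analyze(*aggregate(posts)))
```

The point the abstract makes holds even for this toy version: what you can compute in step 3 is constrained by how the platform formalizes its data in step 1.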
Strategies for Implementing ePortfolios in Higher Education – Peter Baumgartner
In the talk I will present a taxonomy of ePortfolio use cases. This taxonomy is one of our results from a research project funded by the Austrian Ministry of Science and Research. During the study we reviewed the state of the art of ePortfolio in the Austrian Higher Education sector. We especially investigated implementation strategies and the use of ePortfolio software.
It turned out that different implementation approaches are prioritising different features of ePortfolio software. Studying the literature on ePortfolio usage we categorised the different features of software functionalities of ePortfolio software and found clues how to match them with different implementation strategies. This approach not only helped us to distinguish different prototypes of use cases but also resulted in an elaborated taxonomy for ePortfolios.
Strategies for Implementing ePortfolios in Higher Education – EPNET-Europortfolio
More information about this webinar is available at http://europortfolio.org/articles/webinar-recording-strategies-implementing-eportfolios-higher-education
Slides are available at
http://www.slideshare.net/EPNET-Europortfolio/strategies-for-implementing-eportfolios-in-higher-education-34724244
Professor Peter Baumgartner from Donau-Universität Krems presents a taxonomy of ePortfolio use cases. He writes:
This is the result of a research project at the Department of Interactive Media and Educational Technologies, which reviewed the state of the art of ePortfolio in the Austrian Higher Education sector. Special emphasis will be put on implementation strategies and the use of ePortfolio software.
Europortfolio is a European Network of ePortfolio Experts & Practitioners.
Europortfolio, a not-for-profit association established with the support of the European Commission, is dedicated to exploring how e-portfolios and e-portfolio-related technologies and practices can help us to empower:
1. Individuals as reflective learners and practitioners;
2. Organisations as places for authentic learning and assessment; and
3. Society as a place for lifelong learning, employability and self-realisation.
Europortfolio has a broad agenda; to learn more or get involved, visit our website at www.europortfolio.org
TUT Mathematics and Hypermedia Research Seminar, 2011-11-11 – Yleisradio
The document discusses visualization and analysis of social media networks. It begins by defining information visualization and social network analysis. It then explains how social media data can be gathered from systems through crawling or backend collection. Tools for visualizing the data include Gephi and Gource. Use cases shown include visualizing collaboration networks in academic courses and events like data journalism workshops. The document concludes that visualizations can reveal hidden patterns and recommends more dynamic, user-oriented visualizations.
Monitoring the transformation of a domain-specific portal into a social infor... – Ramón Ovelar
1. Domain-specific portals provide integrated access to online resources in a specific domain and often include search, personalization, communication tools, and alerts.
2. As portals have adopted Web 2.0 features that lower participation barriers and encourage user-generated content, they are transforming into social information hubs centered around the online community.
3. This document discusses evaluating features added to a domain-specific e-learning portal to build a social information management system, including how user contributions, organization and retrieval tools impact performance for forecasting and disseminating information.
QURATOR: A Flexible AI Platform for the Adaptive Analysis and Creative Genera... – Georg Rehm
Georg Rehm. QURATOR: Developing a Flexible AI Platform for Digital Content Curation. QURATOR 2020 – Conference on Digital Curation Technologies, Fraunhofer FOKUS, January 20/21, 2020. Invited keynote talk.
The Preparation, Impact and Future of the META-NET White Paper Series “Europe... – Georg Rehm
Georg Rehm. The Preparation, Impact and Future of the META-NET White Paper Series “Europe’s Languages in the Digital Age”. Sanskrit and Other Indian Languages Technology (SOIL-Tech), Jawaharlal Nehru University, New Delhi, India, February 15, 2019. Invited keynote talk.
Europortfolio is a European Network of ePortfolio Experts & Practitioners.
Europortfolio, a not-for profit association established with the support of the European Commission, is, dedicated to exploring how e-portfolios and e-portfolio-related technologies and practices can help us to empower:
1. 'Individuals as reflective learners and practitioners;
2. Organisations as a place for authentic learning and assessment, and
3. Society as a place for lifelong learning, employability and self-realisation."
Europortfolio has a broad agenda, if you would wish to know more, or to get involved, you can do this by visiting our website www.europortfolio.org
Tut mathematics and hypermedia research seminar 2011 11-11Yleisradio
The document discusses visualization and analysis of social media networks. It begins by defining information visualization and social network analysis. It then explains how social media data can be gathered from systems through crawling or backend collection. Tools for visualizing the data include Gephi and Gource. Use cases shown include visualizing collaboration networks in academic courses and events like data journalism workshops. The document concludes that visualizations can reveal hidden patterns and recommends more dynamic, user-oriented visualizations.
Monitoring the transformation of a domain-specific portal into a social infor...Ramón OVELAR
1. Domain-specific portals provide integrated access to online resources in a specific domain and often include search, personalization, communication tools, and alerts.
2. As portals have adopted Web 2.0 features that lower participation barriers and encourage user-generated content, they are transforming into social information hubs centered around the online community.
3. This document discusses evaluating features added to a domain-specific e-learning portal to build a social information management system, including how user contributions, organization and retrieval tools impact performance for forecasting and disseminating information.
Similar to Observations on Annotations – From Computational Linguistics and the World Wide Web to Artificial Intelligence and back again (20)
QURATOR: A Flexible AI Platform for the Adaptive Analysis and Creative Genera...Georg Rehm
Georg Rehm. QURATOR: Developing a Flexible AI Platform for Digital Content Curation. QURATOR 2020 – Conference on Digital Curation Technologies., 1 2020. Fraunhofer FOKUS, January 20/21, 2020. Invited keynote talk.
The Preparation, Impact and Future of the META-NET White Paper Series “Europe...Georg Rehm
Georg Rehm. The Preparation, Impact and Future of the META-NET White Paper Series “Europe’s Languages in the Digital Age”. Sanskrit and Other Indian Languages Technology (SOIL-Tech), Jawaharlal Nehru University, New Delhi, India, February 2019. February 15, 2019. Invited keynote talk.
AI and Conference Interpretation – From Smart Assistants for the Human Interp...Georg Rehm
Georg Rehm. AI and Conference Interpretation - From Smart Assistants for the Human Interpreter to Automatic Solutions. DG Interpretation Lunchtime Session on Digital Transformation. European Commission, Brussels, November 2018. November 12, 2018. Invited talk.
Künstliche Intelligenz beim Dolmetschen und ÜbersetzenGeorg Rehm
Georg Rehm. Künstliche Intelligenz beim Dolmetschen und Übersetzen. Institut für Angewandte Linguistik und Translatologie, Universität Leipzig, November 2018. November 1, 2018. Invited presentation.
Herausforderungen und Lösungen für die europäische Sprachtechnologie- Forschu...Georg Rehm
Georg Rehm. Herausforderungen und Lösungen für die europäische Sprachtechnologie-Forschung und -Entwicklung. Deutsches Forschungszentrum für Künstliche Intelligenz GmbH, Berlin, Germany, October 2018. October 30, 2018. Presentation on the occasion of being awarded the appointment as a DFKI Research Fellow.
European Language Technologies – Past, Present and FutureGeorg Rehm
Georg Rehm. European Language Technologies – Past, Present and Future. Language Equality in the Digital Age. Conference on language technologies and digital equality in a multilingual Europe, European Parliament, Brussels, Belgium, September 2018. September 27, 2018. Invited talk
Towards a Human Language Project for Multilingual Europe: AI and InterpretationGeorg Rehm
Georg Rehm. Towards a Human Language Project for Multilingual Europe: AI and Interpretation. DG Interpretation Conference - Interpretation: Sharing Knowledge & Fostering Communities. European Commission, Brussels, April 2018. April 19/20, 2018. Invited talk.
KI, Sprachtechnologie und Digital Humanities: Ein (unvollständiger) ÜberblickGeorg Rehm
Georg Rehm. KI, Sprachtechnologie und Digital Humanities: Ein (unvollständiger) Überblick. Interdisziplinärer Forschungsverbund Digital Humanities in Berlin (ifDHb), 23. Berliner DH-Rundgang im Deutschen Forschungszentrum für Künstliche Intelligenz, Berlin, Germany, February 05, 2018.
Language Technologies for Multilingual Europe - Towards a Human Language Proj...Georg Rehm
META-NET has received funding from the EU for several projects related to language technologies, most recently the CRACKER project. The document outlines the history and development of META-NET's Strategic Research and Innovation Agenda (SRIA), including versions 0.5, 0.9, and the current version 1.0 beta, which endorses the establishment of a Human Language Project to help overcome language barriers in Europe. A recent survey of over 600 language technology experts found strong support for a large-scale Human Language Project to achieve deep natural language understanding by 2030.
AI for Translation Technologies and Multilingual EuropeGeorg Rehm
Georg Rehm. AI for Translation Technologies and Multilingual Europe. DG TRAD Conference - Translation Services in the Digital World: A Sneak Peek into the (near) Future. Luxembourg. October 16/17, 2017.
Georg Rehm. Kuratieren im Zeitalter der KI. #DKT17 - Kuratieren im Zeitalter der KI, Berlin, Germany, October 2017. October 12, 2017. Invited keynote talk.
Transformieren, Manipulieren, Kuratieren: Technologien für die Wissensarbeit ...Georg Rehm
Georg Rehm. Transformieren, Manipulieren, Kuratieren? Technologien für die Wissensarbeit im Netz. KOOP-LITERA International. Konferenz 2017, Berlin, Germany, June 2017. June 20, 2017. Invited talk.
Digitale Kuratierungstechnologien: Anwendungsfälle in Digitalen BibliothekenGeorg Rehm
Georg Rehm and Clemens Neudecker. Digitale Kuratierungstechnologien: Anwendungsfälle in Digitalen Bibliotheken . Berliner Bibliothekswissenschaftliches Kolloqium (BBK), Humboldt-Universität zu Berlin, Berlin, Germany, June 2017. June 06, 2017. Invited talk.
Georg Rehm. EPUB, quo vadis? ePublishing im W3C. Jahrestagung der IG Digital. Im Rahmen der Buchtage, Jahreskongress des Börsenvereins, Berlin, Germany, June 2017. June 14, 2017. Invited talk.
Human Language Technologies in a Multilingual EuropeGeorg Rehm
The document summarizes a presentation on human language technologies in a multilingual Europe. Some key points:
- There are 24 official EU languages and many regional/minority languages that have equal status but most are under-supported by language technologies and face digital extinction.
- The META-NET alliance coordinates language technology research across Europe but the field remains fragmented. There is a need for high-quality, deployable language technologies to support applications like translation, conversational interfaces, and a multilingual digital single market.
- A proposed "Multilingual Value Programme" would help enable the multilingual digital single market through technologies for translating, analyzing, processing and curating natural language content.
- A long-term
Language Technologies for Big Data – A Strategic Agenda for the Multilingual ...Georg Rehm
Georg Rehm. Language Technologies for Big Data – A Strategic Agenda for the Multilingual Digital Single Market. BDVA Summit (Big Data Value Association), Valencia, Spain, December 2016. December 1, 2016.
Multilingual Europe in late 2016 – A Strategic Research and Innovation Agenda...Georg Rehm
Georg Rehm. Multilingual Europe in late 2016 – A Strategic Research and Innovation Agenda for the Multilingual Digital Single Market. Future and Emerging Trends in Language Technologies, Machine Learning and Big Data (FETLT 2016), Seville, Spain, November 2016. November 30, 2016.
Georg Rehm. Mehrsprachigkeit für das Digitale Europa. Ringvorlesung Digitale Lebenswelten, University of Hildesheim, Germany, November 2016. November 15, 2016.
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...Neo4j
Leonard Jayamohan, Partner & Generative AI Lead, Deloitte
This keynote will reveal how Deloitte leverages Neo4j’s graph power for groundbreaking digital twin solutions, achieving a staggering 100x performance boost. Discover the essential role knowledge graphs play in successful generative AI implementations. Plus, get an exclusive look at an innovative Neo4j + Generative AI solution Deloitte is developing in-house.
Communications Mining Series - Zero to Hero - Session 1DianaGray10
This session provides introduction to UiPath Communication Mining, importance and platform overview. You will acquire a good understand of the phases in Communication Mining as we go over the platform with you. Topics covered:
• Communication Mining Overview
• Why is it important?
• How can it help today’s business and the benefits
• Phases in Communication Mining
• Demo on Platform overview
• Q/A
Sudheer Mechineni, Head of Application Frameworks, Standard Chartered Bank
Discover how Standard Chartered Bank harnessed the power of Neo4j to transform complex data access challenges into a dynamic, scalable graph database solution. This keynote will cover their journey from initial adoption to deploying a fully automated, enterprise-grade causal cluster, highlighting key strategies for modelling organisational changes and ensuring robust disaster recovery. Learn how these innovations have not only enhanced Standard Chartered Bank’s data infrastructure but also positioned them as pioneers in the banking sector’s adoption of graph technology.
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...SOFTTECHHUB
The choice of an operating system plays a pivotal role in shaping our computing experience. For decades, Microsoft's Windows has dominated the market, offering a familiar and widely adopted platform for personal and professional use. However, as technological advancements continue to push the boundaries of innovation, alternative operating systems have emerged, challenging the status quo and offering users a fresh perspective on computing.
One such alternative that has garnered significant attention and acclaim is Nitrux Linux 3.5.0, a sleek, powerful, and user-friendly Linux distribution that promises to redefine the way we interact with our devices. With its focus on performance, security, and customization, Nitrux Linux presents a compelling case for those seeking to break free from the constraints of proprietary software and embrace the freedom and flexibility of open-source computing.
“An Outlook of the Ongoing and Future Relationship between Blockchain Technologies and Process-aware Information Systems.” Invited talk at the joint workshop on Blockchain for Information Systems (BC4IS) and Blockchain for Trusted Data Sharing (B4TDS), co-located with with the 36th International Conference on Advanced Information Systems Engineering (CAiSE), 3 June 2024, Limassol, Cyprus.
Building Production Ready Search Pipelines with Spark and MilvusZilliz
Spark is the widely used ETL tool for processing, indexing and ingesting data to serving stack for search. Milvus is the production-ready open-source vector database. In this talk we will show how to use Spark to process unstructured data to extract vector representations, and push the vectors to Milvus vector database for search serving.
Maruthi Prithivirajan, Head of ASEAN & IN Solution Architecture, Neo4j
Get an inside look at the latest Neo4j innovations that enable relationship-driven intelligence at scale. Learn more about the newest cloud integrations and product enhancements that make Neo4j an essential choice for developers building apps with interconnected data and generative AI.
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Speck&Tech
ABSTRACT: A prima vista, un mattoncino Lego e la backdoor XZ potrebbero avere in comune il fatto di essere entrambi blocchi di costruzione, o dipendenze di progetti creativi e software. La realtà è che un mattoncino Lego e il caso della backdoor XZ hanno molto di più di tutto ciò in comune.
Partecipate alla presentazione per immergervi in una storia di interoperabilità, standard e formati aperti, per poi discutere del ruolo importante che i contributori hanno in una comunità open source sostenibile.
BIO: Sostenitrice del software libero e dei formati standard e aperti. È stata un membro attivo dei progetti Fedora e openSUSE e ha co-fondato l'Associazione LibreItalia dove è stata coinvolta in diversi eventi, migrazioni e formazione relativi a LibreOffice. In precedenza ha lavorato a migrazioni e corsi di formazione su LibreOffice per diverse amministrazioni pubbliche e privati. Da gennaio 2020 lavora in SUSE come Software Release Engineer per Uyuni e SUSE Manager e quando non segue la sua passione per i computer e per Geeko coltiva la sua curiosità per l'astronomia (da cui deriva il suo nickname deneb_alpha).
In the rapidly evolving landscape of technologies, XML continues to play a vital role in structuring, storing, and transporting data across diverse systems. The recent advancements in artificial intelligence (AI) present new methodologies for enhancing XML development workflows, introducing efficiency, automation, and intelligent capabilities. This presentation will outline the scope and perspective of utilizing AI in XML development. The potential benefits and the possible pitfalls will be highlighted, providing a balanced view of the subject.
We will explore the capabilities of AI in understanding XML markup languages and autonomously creating structured XML content. Additionally, we will examine the capacity of AI to enrich plain text with appropriate XML markup. Practical examples and methodological guidelines will be provided to elucidate how AI can be effectively prompted to interpret and generate accurate XML markup.
Further emphasis will be placed on the role of AI in developing XSLT, or schemas such as XSD and Schematron. We will address the techniques and strategies adopted to create prompts for generating code, explaining code, or refactoring the code, and the results achieved.
The discussion will extend to how AI can be used to transform XML content. In particular, the focus will be on the use of AI XPath extension functions in XSLT, Schematron, Schematron Quick Fixes, or for XML content refactoring.
The presentation aims to deliver a comprehensive overview of AI usage in XML development, providing attendees with the necessary knowledge to make informed decisions. Whether you’re at the early stages of adopting AI or considering integrating it in advanced XML development, this presentation will cover all levels of expertise.
By highlighting the potential advantages and challenges of integrating AI with XML development tools and languages, the presentation seeks to inspire thoughtful conversation around the future of XML development. We’ll not only delve into the technical aspects of AI-powered XML development but also discuss practical implications and possible future directions.
UiPath Test Automation using UiPath Test Suite series, part 5DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 5. In this session, we will cover CI/CD with devops.
Topics covered:
CI/CD with in UiPath
End-to-end overview of CI/CD pipeline with Azure devops
Speaker:
Lyndsey Byblow, Test Suite Sales Engineer @ UiPath, Inc.
Unlocking Productivity: Leveraging the Potential of Copilot in Microsoft 365, a presentation by Christoforos Vlachos, Senior Solutions Manager – Modern Workplace, Uni Systems
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUpanagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-und-domino-lizenzkostenreduzierung-in-der-welt-von-dlau/
DLAU und die Lizenzen nach dem CCB- und CCX-Modell sind für viele in der HCL-Community seit letztem Jahr ein heißes Thema. Als Notes- oder Domino-Kunde haben Sie vielleicht mit unerwartet hohen Benutzerzahlen und Lizenzgebühren zu kämpfen. Sie fragen sich vielleicht, wie diese neue Art der Lizenzierung funktioniert und welchen Nutzen sie Ihnen bringt. Vor allem wollen Sie sicherlich Ihr Budget einhalten und Kosten sparen, wo immer möglich. Das verstehen wir und wir möchten Ihnen dabei helfen!
Wir erklären Ihnen, wie Sie häufige Konfigurationsprobleme lösen können, die dazu führen können, dass mehr Benutzer gezählt werden als nötig, und wie Sie überflüssige oder ungenutzte Konten identifizieren und entfernen können, um Geld zu sparen. Es gibt auch einige Ansätze, die zu unnötigen Ausgaben führen können, z. B. wenn ein Personendokument anstelle eines Mail-Ins für geteilte Mailboxen verwendet wird. Wir zeigen Ihnen solche Fälle und deren Lösungen. Und natürlich erklären wir Ihnen das neue Lizenzmodell.
Nehmen Sie an diesem Webinar teil, bei dem HCL-Ambassador Marc Thomas und Gastredner Franz Walder Ihnen diese neue Welt näherbringen. Es vermittelt Ihnen die Tools und das Know-how, um den Überblick zu bewahren. Sie werden in der Lage sein, Ihre Kosten durch eine optimierte Domino-Konfiguration zu reduzieren und auch in Zukunft gering zu halten.
Diese Themen werden behandelt
- Reduzierung der Lizenzkosten durch Auffinden und Beheben von Fehlkonfigurationen und überflüssigen Konten
- Wie funktionieren CCB- und CCX-Lizenzen wirklich?
- Verstehen des DLAU-Tools und wie man es am besten nutzt
- Tipps für häufige Problembereiche, wie z. B. Team-Postfächer, Funktions-/Testbenutzer usw.
- Praxisbeispiele und Best Practices zum sofortigen Umsetzen
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfPaige Cruz
Monitoring and observability aren’t traditionally found in software curriculums and many of us cobble this knowledge together from whatever vendor or ecosystem we were first introduced to and whatever is a part of your current company’s observability stack.
While the dev and ops silo continues to crumble….many organizations still relegate monitoring & observability as the purview of ops, infra and SRE teams. This is a mistake - achieving a highly observable system requires collaboration up and down the stack.
I, a former op, would like to extend an invitation to all application developers to join the observability party will share these foundational concepts to build on:
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2024/06/building-and-scaling-ai-applications-with-the-nx-ai-manager-a-presentation-from-network-optix/
Robin van Emden, Senior Director of Data Science at Network Optix, presents the “Building and Scaling AI Applications with the Nx AI Manager,” tutorial at the May 2024 Embedded Vision Summit.
In this presentation, van Emden covers the basics of scaling edge AI solutions using the Nx tool kit. He emphasizes the process of developing AI models and deploying them globally. He also showcases the conversion of AI models and the creation of effective edge AI pipelines, with a focus on pre-processing, model conversion, selecting the appropriate inference engine for the target hardware and post-processing.
van Emden shows how Nx can simplify the developer’s life and facilitate a rapid transition from concept to production-ready applications.He provides valuable insights into developing scalable and efficient edge AI solutions, with a strong focus on practical implementation.
Programming Foundation Models with DSPy - Meetup SlidesZilliz
Prompting language models is hard, while programming language models is easy. In this talk, I will discuss the state-of-the-art framework DSPy for programming foundation models with its powerful optimizers and runtime constraint system.
Pushing the limits of ePRTC: 100ns holdover for 100 daysAdtran
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
Pushing the limits of ePRTC: 100ns holdover for 100 days
Observations on Annotations – From Computational Linguistics and the World Wide Web to Artificial Intelligence and back again
1. Georg Rehm
German Research Center for Artificial Intelligence (DFKI) GmbH
Annotation in scholarly editions and research
Bergische Universität Wuppertal – 21 February 2019
Observations on Annotations
From Computational Linguistics and the World Wide Web to AI and back again
2. Observations on Annotations – Wuppertal, Germany, 21 February 2019 2
Annotation: Personal Background
[Mind-map slide relating annotation to the speaker's background:]
• Computational Linguistics and AI (since 1992): corpus annotation formats
• Hypertext and text linguistics
• SGML and TEI (since 1995); XML (since 1998), XSLT, XPath and several others
• Web technologies, W3C, markup languages; W3C Office Germany/Austria (since 2013)
• AI and language technology development (since 2009): infrastructures and platforms, service deployment, research data, language resources, metadata, data formats, open science
3. Introduction
• Annotations have been playing an important role in Computational Linguistics and related fields (especially Digital Humanities) for decades.
• This talk: recent examples, lessons learned and some general observations on annotations.
• My own research in this area (since approx. 1996):
– from basic and applied research to
– innovation and technology development
4. Outline
• Annotations – brief definition
• World Wide Web
• Annotations and AI
• Annotations and Computational Linguistics
• Annotations and Language Technology
• Annotations for a Credible Web
• Annotations and Open Science
• Annotations and Markup
• Dimensions of Annotations
• Summary and Conclusions
6. Annotations
• Definition/“Definition”: secondary data added to a piece of primary data – in science this is often research data.
• Wikipedia: “An annotation is a metadatum (e.g., a post, explanation, markup) attached to [a?] location or other data.”
http://www.merriam-webster.com
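The definition above (secondary data attached to primary data) can be made concrete in code: the primary text stays untouched, and each annotation is a standoff record that points into it by character offset. A minimal Python sketch; the dataclass, labels and example sentence are all invented for illustration:

```python
# Standoff annotation sketch: primary data is immutable text, secondary
# data is a list of records referencing it by character offsets.
from dataclasses import dataclass

@dataclass
class Annotation:
    start: int   # offset into the primary data (inclusive)
    end: int     # offset into the primary data (exclusive)
    label: str   # the secondary information, e.g. a part-of-speech tag

primary = "Annotations enrich research data."
secondary = [
    Annotation(0, 11, "NOUN"),   # "Annotations"
    Annotation(12, 18, "VERB"),  # "enrich"
]

def annotated_span(text: str, ann: Annotation) -> str:
    """Resolve a standoff annotation back to the span it describes."""
    return text[ann.start:ann.end]

for ann in secondary:
    print(ann.label, "->", annotated_span(primary, ann))
```

Because the annotations never modify the primary data, several independent (even conflicting) annotation layers can coexist over the same text.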
7. • Literature and education:
– Textual scholarship: a discipline that often uses the technique of annotation to describe or add additional historical context to texts and physical documents.
– Learning and instruction: as part of guided noticing, [annotation] involves highlighting, naming or labelling and commenting on aspects of visual representations to help focus learners' attention on specific visual aspects. In other words, it means the assignment of typological representations (culturally meaningful categories) to topological representations (e.g. images).
• Software engineering:
– Text documents: markup languages like XML and HTML annotate text in a way that is syntactically distinguishable from that text. They can be used to add information about the desired visual presentation, or machine-readable semantic information, as in the Semantic Web.
• Linguistics:
– In linguistics, annotations include comments and metadata; these non-transcriptional annotations are also non-linguistic.
12. “Vague but exciting”
Information Management: A Proposal
Tim Berners-Lee, CERN, March 1989, May 1990
“Private links: One must be able to add one's own private links to and from public information. One must also be able to annotate links, as well as nodes, privately.”
13. World Wide Web Consortium
• W3C is an international non-profit, member-financed standards developing organisation
• Founded in 1994 by Sir Tim Berners-Lee
• Currently 451 members – 23 in Germany/Austria
• Approx. 60 staff (ERCIM, MIT, Keio University, Beihang University)
• Approx. 20 offices in important regions
• The W3C Office Germany/Austria is run by
• Open Web Platform, HTML5, CSS, Credible Web, Digital Publishing, Linked Data etc.
http://w3.org • http://w3c.de
Interested in joining? Talk to me!
14. Relevant W3C Standards
• XML – Extensible Markup Language
– Extremely influential
– Widely adopted
– TEI and many other languages
• Semantic Web
– RDF, OWL, SPARQL, SKOS etc.
• Digital Publishing
– New versions of EPUB
• Web Annotation Data Model and Vocabulary
https://www.w3.org/2001/10/03-sww-1/slide7-0.html
16. Web Annotations
• Web Annotation – three W3C Recommendations
• Most popular and relevant implementation: Hypothes.is
– Mission-driven, non-profit Open Source company
– Main focus on scholarly publishing (“Annotating All Knowledge Coalition”)
– Very active and vibrant community
• Hypothes.is: main driving force behind the I Annotate conference series
– Open proceedings, very interesting programme, diverse speakers from several disciplines – consider attending!
– Videos of almost all previous events available online
17. Web Annotation Standard
• Web Annotation Data Model: describes the underlying Annotation Abstract Data Model as well as a JSON-LD serialization
• Web Annotation Vocabulary: the vocabulary which underpins the Web Annotation Data Model
• Web Annotation Protocol: the HTTP API for publishing, syndicating, and distributing Web Annotations
• All three published on 23 February 2017
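For illustration, the JSON-LD serialization defined by the Web Annotation Data Model can be assembled as a plain dictionary. A minimal sketch in Python; the `id` and `source` IRIs are hypothetical, while `@context`, `TextualBody` and `TextQuoteSelector` are terms from the Recommendation:

```python
# Sketch of a single annotation in the W3C Web Annotation Data Model's
# JSON-LD form: a body (the comment) related to a target (the selected
# text in some document). All URLs/IDs are invented for illustration.
import json

annotation = {
    "@context": "http://www.w3.org/ns/anno.jsonld",
    "id": "http://example.org/anno/1",                # hypothetical annotation IRI
    "type": "Annotation",
    "body": {
        "type": "TextualBody",
        "value": "Compare this with TEI standoff markup.",
        "format": "text/plain",
    },
    "target": {
        "source": "http://example.com/article.html",  # hypothetical document
        "selector": {
            "type": "TextQuoteSelector",  # anchors by quoted text plus context,
            "exact": "annotations",       # so the annotation can be re-anchored
            "prefix": "role of ",         # even when stored apart from the page
        },
    },
}

print(json.dumps(annotation, indent=2))
```

The `TextQuoteSelector` is what allows an annotation that lives separately from the document to be reunited with it later: the target is identified by the quoted text rather than by a fixed position.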
18. Web Annotation Standard
• What does this mean for end users?
– Annotation: a set of connected resources, typically incl. a body and a target – the body is related to the target.
– No more comment widgets and silos!
– Annotation capability can be built natively into the browser.
– Conversations can take place anywhere on the web and in a standards-based way.
• Why is this different?
– Annotations can live separately from documents and are reunited and re-anchored in real-time.
– Annotations are under the control of the user.
– Users can form communities (across HTML, PDF etc.).
20. Hypothes.is Statistics
December 2018: 4.4 Million Annotations and Counting
[Chart: annotation counts from Jan 2015 to Dec 2018, broken down into public, private, shared in groups and private in groups; y-axis from 20K to 260K.]
21. The Hypothes.is Tool
• Private Notes
• Public annotations
• Collaboration groups
• Linked Data connections
• Cross format:
○ HTML
○ PDF
○ EPUB
○ Data
• Community driven
• Open Source
24. ADA: American Diabetes Association
● Wanted a way to update content and add information links
● Needed to restrict use to ADA staff
26. Automated Annotation
Automated systems can tag elements such as RRIDs (Research Resource Identifiers) and other scholarly identifiers or entities, allowing navigation to background information and powerful search queries through other papers mentioning the same entity.
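A toy version of such an automated annotator can be written as a pattern-based tagger that finds RRID-like mentions in running text and emits standoff annotations for them. The regex below is a deliberate simplification of real RRID syntax, and the example sentence is invented:

```python
# Sketch of a pattern-based automated annotator for RRIDs.
# Real RRID syntax covers more registry prefixes than this regex does.
import re

RRID_PATTERN = re.compile(r"RRID:\s?([A-Z]+[A-Za-z]*_\w+)")

def tag_rrids(text: str):
    """Return (start, end, identifier) triples for every RRID-like mention."""
    return [(m.start(), m.end(), m.group(1)) for m in RRID_PATTERN.finditer(text)]

sample = "Cells were stained with anti-GFP (RRID:AB_221569) and imaged."
for start, end, rrid in tag_rrids(sample):
    print(f"{rrid} at [{start}:{end}]")
```

Each triple could then be wrapped in a Web Annotation whose body links to the identifier's registry entry, which is exactly what enables the cross-paper search described above.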
31. Artificial Intelligence and Data Intelligence
• Current breakthroughs based on Machine Learning (“Deep Learning”)
• Huge data sets + powerful learning algorithms + very fast hardware
• Also still in use: symbolic, rule-based methods and expert systems
32. Annotations and AI
• Modern AI is data-driven – supervised learning relies on annotated data sets.
• However, certain AI algorithms can learn structure and patterns without any annotations whatsoever.
• The relevance of annotations has increased dramatically, especially for very large annotated data sets.
• Many such data sets consist of primary data and secondary annotations.
• Companies have emerged that produce annotated data sets using crowd-workers (e.g., Figure Eight, Crowdee).
• Key question: how detailed, relevant, correct, meaningful and reliable are these annotations really?
33. Annotations and Events
• Likes and Favs (user-driven annotation, action)
• Five-star ratings (user-driven annotation, action)
• Online comments (user-driven annotation, action)
• Online reviews (user-driven annotation, action)
• Clicking an article headline/link (user-initiated event, action)
• Reading an ebook (user-initiated event, action)
– Page turns in ebooks are measured – when slow: “boredom”, “disinterest”
– Next time in the ebook store you’re getting adjusted recommendations
• No longer reading an ebook (user-initiated event, non-action)
– Boring chapters where people throw in the towel can be easily identified
– (Brave new) future: use automatic paraphrasing to re-write the chapter
– Or maybe NLG and A/B tests – then it’s the original author vs. the machine
35. Annotations in CL
• Diverse and specialised tool landscape
http://annotation.exmaralda.org/index.php?title=Linguistic_Annotation
• Diverse and specialised format landscape:
TEI, NIF, NAF, LAF, TIGER, STTS, FoLiA
and many, many others
• From trivial annotation schemes to extremely complex
• From low inter-annotator agreement scores to high ones
• From flexible tools to highly specialised tools
• From very high quality annotations to very low ones
• A brief look at a few tools …
43. Language Technology
• Language Technology transfers theoretical results from
language-oriented research into technologies and
applications that are ready for production use.
• Uses results from, e.g.:
– Artificial Intelligence
– Computer Science
– Computational Linguistics
– Natural Language Processing
– Psychology, Psycholinguistics
– Cognitive Science
Example Applications
• Spell checkers
• Dictation systems
• Translation systems
• Search engines
• Report generation
• Expert systems
• Dialogue systems
• Text summarisers
44. Web Annotation Architecture
The relationship between
Web Annotations
and Language Technology
on a rather general level.
45. Web Annotation Architecture
Content could be created by Language
Technology fully automatically or in a
semi-automatic way (text generation)
46. Web Annotation Architecture
Content could be analysed by
Language Technology (semantic
analysis, input for ML algorithms etc.)
47. Web Annotation Architecture
Especially in Social Media Analytics we are interested in UGC, i.e., in comments and feedback – “what do users think of a certain product?”
48. Web Annotation Architecture
• Analysing UGC is difficult and
costly (many heterogeneous
sources, many different formats)
• A few established and widely used
Web Annotation services would
simplify SMA dramatically!
49. Web Annotation Architecture
We can also use LT methods to
create or help create annotations,
e.g., in smart authoring scenarios.
50. LT and Web Annotations
• Analysis of web annotations and exploiting web
annotations through Language Technology:
– Arbitrary web annotations (i.e., unstructured text)
• No more crawling, aggregating, mapping!
– Dedicated LT-specific web annotations
• Annotating language data without any specialised
stand-alone tools or data repositories!
• Generation of web annotations through Language
Technology (e.g., to provide background information on
important content). Example: Content semantification.
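As a concrete illustration of the Web Annotation layer discussed above, a minimal annotation following the W3C Web Annotation Data Model might look as follows; the target URL, the quoted text and the comment are made-up placeholder values:

```python
import json

# A minimal W3C Web Annotation (JSON-LD) attaching a free-text comment to a
# text span on a web page. All concrete values are illustrative placeholders.
annotation = {
    "@context": "http://www.w3.org/ns/anno.jsonld",
    "type": "Annotation",
    "body": {
        "type": "TextualBody",
        "value": "This claim needs a source.",
        "format": "text/plain",
    },
    "target": {
        "source": "https://example.org/article.html",
        "selector": {
            # TextQuoteSelector anchors the annotation to an exact quote,
            # disambiguated by its surrounding prefix and suffix.
            "type": "TextQuoteSelector",
            "exact": "a surprising result",
            "prefix": "the study reports ",
            "suffix": ", which suggests",
        },
    },
}

print(json.dumps(annotation, indent=2))
```

The quote-based selector keeps the annotation anchored even when the page changes slightly, which matters for the decentralised scenarios sketched below.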
51. Platform for digital Curation Technologies
[Architecture diagram: several clients use a Broker REST API; behind it, curation services (Curation Service 1, 2) and external services (External Service 1, 2) are combined into curation workflows that transform an input into an output.]
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix itsrdf: <http://www.w3.org/2005/11/its/rdf#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix geo: <http://www.w3.org/2003/01/geo/wgs84_pos#> .
@prefix nif: <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> .

<http://link.omitted/documents/document1#char=0,26>
    a nif:RFC5147String , nif:String , nif:Context ;
    nif:beginIndex "0"^^xsd:nonNegativeInteger ;
    nif:endIndex "26"^^xsd:nonNegativeInteger ;
    nif:isString "Welcome to Berlin in 2016."^^xsd:string ;
    dfkinif:averageLatitude "52.516666666666666"^^xsd:double ;
    dfkinif:averageLongitude "13.383333333333333"^^xsd:double ;
    dfkinif:stdDevLatitude "0.0"^^xsd:double ;
    dfkinif:stdDevLongitude "0.0"^^xsd:double ;
    nif:meanDateRange "20160101010000_20170101010000"^^xsd:string .

<http://link.omitted/documents/document1#char=21,25>
    a nif:RFC5147String , nif:String ;
    itsrdf:taIdentRef <http://link.omitted/ontologies/nif#date=20160101000000_20170101000000> ;
    nif:anchorOf "2016"^^xsd:string ;
    nif:beginIndex "21"^^xsd:nonNegativeInteger ;
    nif:endIndex "25"^^xsd:nonNegativeInteger ;
    nif:entity <http://link.omitted/ontologies/nif#date> .

<http://link.omitted/documents/document1#char=11,17>
    a nif:RFC5147String , nif:String ;
    nif:anchorOf "Berlin"^^xsd:string ;
    nif:beginIndex "11"^^xsd:nonNegativeInteger ;
    nif:endIndex "17"^^xsd:nonNegativeInteger ;
    itsrdf:taClassRef <http://dbpedia.org/ontology/Location> ;
    nif:referenceContext <http://link.omitted/documents/document1#char=0,26> ;
    geo:lat "52.516666666666666"^^xsd:double ;
    geo:long "13.383333333333333"^^xsd:double ;
    itsrdf:taIdentRef <http://dbpedia.org/resource/Berlin> .
NLP Interchange Format (NIF)
“Welcome to Berlin in 2016.”
• RDF/OWL-based format for NLP applications
• Enables interoperability
• As pure RDF, it allows a “natural” integration of Linked Data
• Developed at Universität Leipzig
• Besides NIF, the platform also supports Web Annotations
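The `#char=11,17` fragment identifiers in the NIF example above follow RFC 5147-style character offsets (0-based, end-exclusive). A minimal sketch of how such stand-off offsets can be computed and checked against the `nif:anchorOf` values:

```python
context = "Welcome to Berlin in 2016."

def offsets(text, substring):
    """Return (beginIndex, endIndex) as used by nif:beginIndex/nif:endIndex
    (0-based, end-exclusive), for the first occurrence of substring."""
    begin = text.index(substring)
    return begin, begin + len(substring)

for anchor in ("Berlin", "2016"):
    begin, end = offsets(context, anchor)
    # The anchorOf value must equal the substring addressed by the offsets.
    assert context[begin:end] == anchor
    print(f"{anchor}: char={begin},{end}")
# prints: Berlin: char=11,17
#         2016: char=21,25
```

These are exactly the offsets that appear in the Turtle example, which is what makes stand-off annotations verifiable against their primary data.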
Digital Curation Technologies:
Prototypically implemented Platform and Services
Peter Bourgonje, Julian Moreno-Schneider, Jan Nehring, Georg Rehm, Felix Sasaki, and Ankit Srivastava.
“Towards a Platform for Curation Technologies: Enriching Text Collections with a Semantic-Web Layer.” In
Harald Sack, Giuseppe Rizzo, Nadine Steinmetz, Dunja Mladenić, Sören Auer, and Christoph Lange,
editors, The Semantic Web, number 9989 in LNCS, pages 65-68. Springer, June 2016. ESWC 2016
Satellite Events. Heraklion, Crete, Greece, May 29 - June 2, 2016 Revised Selected Papers.
53. Sector: Journalism
Julian Moreno-Schneider, Ankit Srivastava, Peter Bourgonje, David Wabnitz, and Georg Rehm. “Semantic Storytelling, Cross-lingual Event Detection and other Semantic Services for a Newsroom Content Curation Dashboard.” In Octavian Popescu and Carlo Strapparava, editors, Proceedings of Natural Language Processing meets Journalism – EMNLP 2017 Workshop (NLPMJ 2017), Copenhagen, Denmark, September 2017.
54. Sector: TV, Web-TV, Media
Georg Rehm, Julián Moreno Schneider, Peter Bourgonje, Ankit Srivastava, Rolf Fricke, Jan Thomsen, Jing He, Joachim Quantz, Armin Berger, Luca König, Sören Räuchle, Jens Gerth, and David Wabnitz. “Different Types of Automated and Semi-Automated Semantic Storytelling: Curation Technologies for Different Sectors.” In Georg Rehm and Thierry Declerck, editors, Language Technologies for the Challenges of the Digital Age: 27th International Conference, GSCL 2017, Berlin, Germany, September 13-14, 2017, Proceedings, number 10713 in Lecture Notes in Artificial Intelligence (LNAI), pages 232-247, Cham, Switzerland, January 2018. Gesellschaft für Sprachtechnologie und Computerlinguistik e.V., Springer.
58. Viral Content and Filter Bubbles
• Content is often published without checking its validity,
discovered through social media and, if it appears
relevant, shared immediately.
• Content is often shared without reading it.
• Goal: virality ➟ reach ➟ clicks ➟ ad revenue
• Not all “journalistic” content (or publishing outlets) is really
committed to reporting the facts.
• Nowadays the burden of fact-checking rests with the readers.
• “Fake news”: a label for several distinct classes of online content.
• Can we balance out filter bubble and network effects?
Georg Rehm. “An Infrastructure for Empowering Internet Users to handle Fake News and other Online Media Phenomena”. In Georg
Rehm and Thierry Declerck, editors, Language Technologies for the Challenges of the Digital Age: Proceedings of the GSCL
Conference 2017, Berlin, September 2017. Gesellschaft für Sprachtechnologie und Computerlinguistik e.V. 13.-15. September 2017.
59. Seven Classes of False News
• Satire or parody
• Wrong connection or relation: title and photos don’t support the content
• Misleading content: use of information to put someone or something in a bad light
• Wrong context: genuine content is presented in the wrong context
• Deceiving content: imitation of real sources
• Bad content: content with a clear purpose to deceive
• Fabricated content: completely untrue, produced to deceive
[Table: each class is characterised along two dimensions – characteristics (clickbait, disinformation, political bias, bad journalism) and the publisher’s intention (parody, provocation, profit, deception, influencing politics) – showing which apply to which class.]
Different classes of false news and their individual characteristics and intentions (based on Wardle, 2017; Walbrühl, 2017; Rubin et al., 2015; Holan, 2016; Weedon et al., 2017)
60. [Architecture diagram]
• A website with content is processed automatically by decentral filters/tools – e.g., detection of hate speech, classifying content for its political spectrum, fact checking – which send their results to the browser (important: multilingualism).
• The browser has native support for the infrastructure and aggregates the different scores, messages and values into messages or warnings regarding the content.
• Decentral repositories (Web Annotations DB1–DB4) store all annotations.
• UGA: user-generated annotations (free text); UGM: user-generated metadata (standardised); MGM: machine-generated metadata (standardised).
• Example: a user rates the content quality regarding a standardised schema and sees other users’ annotations.
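The browser-side aggregation described above could be sketched as follows; the tool names, the 0..1 score range and the threshold are illustrative assumptions, not part of the proposed infrastructure:

```python
# Sketch: aggregate machine-generated metadata (MGM) scores from several
# decentral tools into a list of warnings, as the browser in the diagram
# would before presenting them to the user.

def aggregate(scores, threshold=0.5):
    """Return the names of all tools whose score reaches the threshold."""
    return [name for name, score in scores.items() if score >= threshold]

# Illustrative scores (0..1) from three hypothetical analysis tools:
mgm = {"hate_speech": 0.82, "political_bias": 0.35, "factuality_risk": 0.67}
print(aggregate(mgm))  # → ['hate_speech', 'factuality_risk']
```

A real deployment would of course need calibrated, comparable scores across tools; the sketch only shows where the aggregation step sits in the architecture.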
61. (Architecture diagram as on the previous slide)
• Infrastructure as a native part of the web
• Necessary for that: support and buy-in from all browser vendors, media publishers and standards bodies
• All users need immediate access
62. (Architecture diagram as on the previous slide)
• Tools analyse the content automatically.
63. (Architecture diagram as on the previous slide)
• Automatic results and free text annotations are stored as Web Annotations.
• Users make their annotations available to one another.
64. (Architecture diagram as on the previous slide)
• Automatic analysis of free text annotations (NLP, IE, RE etc.).
• Extraction of opinions, arguments, claims, statements etc.
65. (Architecture diagram as on the previous slide)
• Standardised metadata schemas (UGM) for efficient annotations, e.g., “content is intentionally deceptive”.
• W3C Provenance Ontology, Schema.org (ClaimReview).
• To be used by both humans and machines.
66. (Architecture diagram as on the previous slide)
Goal: provide technologies to users with which they can consume, assess, analyse, verify and process digital content and media in a better way, and that indicate which contents may be problematic.
67. Web Annotation + Fake News
• Crowd-sourced Web Annotation content in combination
with a set of automatic analysis tools has enormous
potential to tackle online misinformation campaigns.
• Big impact if deployed widely and implemented correctly.
• However, there is a danger of merely shifting the point of attack that
misinformation campaigns exploit (to the annotations themselves).
• The Credibility Coalition has developed a similar
approach in parallel, see, e.g.,
https://web.hypothes.is/blog/annotation-powered-questionnaires/
69. Open Science
• Movement to make scientific research, data
and dissemination accessible to all levels of
an inquiring society, amateur or professional.
• Encompasses practices such as
publishing open research, campaigning
for open access, encouraging scientists to
practice open notebook science, and
generally making it easier to publish and
communicate scientific knowledge.
• Connection to: annotations, research data
(corpora, LRs), semantics, knowledge,
linked data, repositories and other topics.
https://en.wikipedia.org/wiki/Open_science
70. Open Science Taxonomy
https://en.wikipedia.org/wiki/Open_science
72. Annotations & Open Science
• Open Science will soon become the norm and goal in
data-intensive science
• Important aspects: interoperability, reproducibility, open
documentation of experiments, use of standards etc.
• Trend: open tools, open workflows, open data sets
• Annotations are an important and crucial piece of the
puzzle, especially documented, meaningful annotations
• Relevant initiatives: NFDI, EOSC
• Relevant principle: FAIR
73. FAIR Principles
• TO BE FINDABLE:
– F1 (meta)data are assigned a globally unique and eternally persistent identifier.
– F2 data are described with rich metadata.
– F3 (meta)data are registered or indexed in a searchable resource.
– F4 metadata specify the data identifier.
• TO BE ACCESSIBLE:
– A1 (meta)data are retrievable by their identifier using a standardized protocol.
– A1.1 the protocol is open, free, and universally implementable.
– A1.2 the protocol allows for an authentication and authorization procedure.
– A2 metadata are accessible, even when the data are no longer available.
• TO BE INTEROPERABLE:
– I1. (meta)data use a formal, accessible, shared, and broadly applicable language for
knowledge representation.
– I2. (meta)data use vocabularies that follow FAIR principles.
– I3. (meta)data include qualified references to other (meta)data.
• TO BE RE-USABLE:
– R1. meta(data) have a plurality of accurate and relevant attributes.
– R1.1 (meta)data are released with a clear and accessible data usage license.
– R1.2 (meta)data are associated with their provenance.
– R1.3 (meta)data meet domain-relevant community standards.
74. Open Science and … Science
• Open Science approaches recommend the use of standards
• Only standardised data and metadata are truly interoperable
• BUT fundamental research is about inventing NEW things
• This contradicts the use of standards as the consensus that
was reached within a specific community
• However, it does NOT contradict the use of established tools
and best practice approaches
• Neither does it contradict the modification of standards
• At the end of the day, it’s about semantics & documentation
• If an established, standardised approach does not work for a
new piece of research, invent a new approach or get creative!
75. Annotation of Documents
• Open Science will be transforming research, making it
more sustainable, more visible, more transparent
• Substantially improved digital infrastructures
• This will, soon, include the annotation of documents,
starting with scientific publications (Web Annotation)
• First steps towards Open Peer Review (cf. arxiv.org)
• Trend: micro-publications (esp. for incremental research)
• Will the scientific paper continue to be the atomic unit?
• Important relevant initiative: ORKG
76. ORKG
• Vision driven forward by Sören Auer (TIB Hannover)
• Exchange of scholarly knowledge is primarily
document-based: researchers produce articles (online
or offline) as coarse-grained text documents.
• Transform this predominant paradigm into knowledge-
based information flows by representing and expressing
knowledge through semantically rich, interlinked graphs.
• Sören Auer et al. (2018): “Towards an Open Research
Knowledge Graph“.
https://doi.org/10.5281/zenodo.1157185
77. Interlinking of Concepts
Automated procedures alone do not achieve the necessary coverage and accuracy; fully manual curation is too time-consuming; librarians lack the necessary domain-specific expertise; and scientists lack the necessary expertise in knowledge representation. By combining the four strategies in a meaningful way, they can bring their respective strengths to bear and compensate for the weak points.
Interlinking of interdisciplinary and subject-specific concepts and artefacts of scientific work in the different domains (here: TIB subject areas).
The Open Research Knowledge Graph (ORKG) provides interlinking, integration, visualization, exploration, and search functions. It enables scientists to gain a much faster overview of new developments in a specific field and identify relevant research problems. It represents the evolution of the scientific discourse in the individual disciplines and enables scientists to make their work more visible to colleagues and potential users in industry through semantic description. — Auer et al. (2018)
[Diagram: the ORKG as a technical ecosystem for knowledge-based science communication, connected to the Linked Open Data Cloud, Semantic Web standards, persistent identifiers, the GND and the European Open Science Cloud.]
79. Annotations and Markup
• Complex topic – we can only scratch the surface
• XML is – unfortunately – considered “done” within W3C,
all senior XML specialists have left the organisation.
• https://www.balisage.net/Proceedings/vol21/html/Tovey01/BalisageVol21-Tovey01.html
– Discussion on the trend from declarative to procedural (!)
markup – there’s stagnation in the markup world.
• Relevant and timely: https://markupdeclaration.org
• Markup is not dead – there’s a small but active and
passionate community.
81. Annotations
• Annotation – definition: secondary data added to a piece of
primary data; in science, the primary data is often research data.
• The secondary data typically describes a property of
(part of) the primary data.
• Let’s examine this a bit more closely.
82. Annotations
[Diagram: a span of text (“Lorem ipsum dolor sit amet …”) carries a property; the property has a label and a value, and may point to an annotation schema (possibly external) that can constrain or restrict it. Example properties: lemma, part of speech, instance-of etc.]
• What is the conceptual nature of this property? Is it best practice in research or can it be entirely made up?
• How many colleagues in the community agree on it?
• Is the label adequate and self-explanatory?
83. Annotations
(Same diagram; example values: adjective, JJ, object, “some free text comment” etc.)
• The value is the actual annotation payload.
• Is the value free text or taken from a shared vocabulary?
• Is the shared vocabulary prescribed by an annotation schema or ontology?
• How many colleagues in the community agree on the value?
• How many colleagues in the community agree on the shared vocabulary?
84. Annotations
(Same diagram, now with many annotations on the text.)
• Is there structure among the different properties?
• Markup languages, markup grammars
• Syntactic structure – e.g., “HVBXJ” => “AHXB”, “HKVZ”
• Semantic, i.e., logical structure – e.g., “NP” => “DET”, “N”
85. Annotating Annotations
Annotations on annotations (just a few selected points)
• Source (machine vs. single human vs. crowd-sourced)
• Application scenario: annotations for human vs. machine consumption
• Purpose or scope of the annotation (e.g., document structure, layout or
style, semantics, rhetorical structure, linguistic properties etc.)
– Can the structure be made explicit by the annotation format,
maybe via a markup language’s grammar?
– Can structure be made explicit through an ontology
that is put on top of the individual properties?
• Confidence value
• Quality indicator (0..1)
• Time added, time modified (timestamp)
• Style information – how annotations are rendered
• Annotation layers – one or multiple layers, independent or interrelated?
86. Evaluation of Annotations
• Measuring inter-annotator agreement
• Measuring intra-annotator agreement – what if the same
person does the same annotation task again after a
week or a month?
• Test replicability and reproducibility
• Important exercise for:
– Emerging annotation formats
– Complex annotation exercises
– Measuring consensus
– Making sure that terms and labels are meaningful
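Inter-annotator agreement, as mentioned above, is commonly quantified with chance-corrected measures; a minimal sketch of Cohen's kappa for two annotators (the items and labels are illustrative):

```python
from collections import Counter

def cohen_kappa(ann1, ann2):
    """Cohen's kappa for two annotators labelling the same items:
    (observed agreement - chance agreement) / (1 - chance agreement)."""
    assert len(ann1) == len(ann2)
    n = len(ann1)
    observed = sum(a == b for a, b in zip(ann1, ann2)) / n
    c1, c2 = Counter(ann1), Counter(ann2)
    # Chance agreement from the two annotators' label distributions.
    expected = sum(c1[l] * c2[l] for l in set(ann1) | set(ann2)) / n ** 2
    return (observed - expected) / (1 - expected)

# Two annotators labelling six items with two illustrative labels:
a1 = ["OFF", "OTHER", "OFF", "OTHER", "OTHER", "OFF"]
a2 = ["OFF", "OTHER", "OTHER", "OTHER", "OTHER", "OFF"]
print(round(cohen_kappa(a1, a2), 3))  # → 0.667
```

The same computation run on a re-annotation by the same person after a week or a month yields the intra-annotator agreement mentioned above.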
87. Complexity of Annotations
• In (Computational) Linguistics we’ve designed some
fairly detailed annotation formats in the last 30 years.
• In contrast, many modern data sets (especially for data-
driven AI approaches in NLP) are quite shallow.
• AI classifiers need enormous amounts of data and just a
few high-level labels.
• Annotating such amounts of data with complex, sophisticated
annotation formats is considered infeasible and too expensive.
• Is NLP/AI research forgetting annotation principles?
• Are we dumbing down linguistics to the simple
annotation of trivial labels?
• Has annotation research perhaps become obsolete?
88. Complexity of Annotations (continued)
• Example: GermEval 2018 data set – tweet + label, tweet + label, tweet + label etc.
• There is no structure, no concretisation, no hierarchical
information, no additional metadata
• Two observations:
– there’s a trend towards simply more annotations, i.e.,
increased quantity while ignoring quality, complexity and
structure – complex annotations are expensive and difficult
to generalise from.
– there’s a trend towards dumb annotations, which are
often crowd-sourced – it’s easier to generalise from simple
than from structured, hierarchical annotations.
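A flat data set of this kind can be sketched in a few lines; the sample tweets are invented, and the label names follow GermEval 2018's coarse-grained offensive-language labels:

```python
from collections import Counter

# Sketch of the flat "tweet <TAB> label" structure described above:
# one label per tweet, no hierarchy, no further structure or metadata.
raw = """@user Das ist großartig!\tOTHER
@user So ein Unsinn von dir\tOFFENSE
Schönes Wetter heute\tOTHER"""

pairs = [line.rsplit("\t", 1) for line in raw.splitlines()]
labels = Counter(label for _, label in pairs)
print(labels)  # label distribution is all the structure there is
```

This is exactly the "increased quantity, minimal structure" trade-off discussed above: trivially parseable, easy to crowd-source, easy to learn from, but semantically shallow.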
90. Summary
• Annotations: from trivial to very complex
• From experimental to highly (de facto) standardised
• Annotations of annotations
• Multi-layer annotations – independent or interrelated
• Interoperability and reusability through standards
• But: standards vs. flexibility – basic science vs. applied
• Nowadays, annotations usually happen on the web
• Powerful stack of W3C technologies:
Web Annotation, Semantic Web, Linked Data, XML
• Web-scale annotations for scholarly publishing
• Annotations for Open Science
91. Summary
• Language Technology …
• … to automate the generation of annotations
– Semantification of journalistic/media content
– Semantification of scientific content
• … to automate the analysis of annotations
– Annotations for Open Science
• … to restore credibility and trust in the media
• In AI, annotations in data sets are often trivial
– Trend towards simply more and more annotations
– Trend towards more and more simple annotations
92. Annotating Annotations
• Different dimensions of annotations: complexity, semantics, source, impact, standard, research question, methodology, …
• Is it possible to tie all dimensions together in a compact, machine-readable way to describe and document an annotation project?
• Relevant for Open Science, interoperability, search & retrieval, reproducibility, evaluation, documentation & repositories, and good scientific practice
• … but maybe this is all too complicated because a scientific paper already does the trick in an established way?
93. Four Quadrant Diagram (work in progress)
[Diagram: one axis spans basic research vs. applications and solutions, the other Humanities research vs. Computer Science and ICT research; the number of users ranges from rather small to rather high. Positions marked in the space include: “no need for standardisation / no need to use standards”; “clear need to use standards for maximum adoption”; “avantgarde formats, weird phenomena, weird needs, expressibility”; “performance, standards, interoperability”; “markup, formal languages, querying, overlap”; Digital Humanities; XAI.]
94. Thank you!
Dr. Georg Rehm
Principal Researcher and Research Fellow
Speech and Language Technology Lab
DFKI, Berlin, Germany
• georg.rehm@dfki.de
• http://georg-re.hm
• http://de.linkedin.com/in/georgrehm
• https://www.slideshare.net/georgrehm
With many thanks to (in alphabetical order):
• Ivan Herman (W3C, The Netherlands)
• Heather Staines, Jon Udell, Dan Whaley (Hypothes.is, USA)
95. • Georg Rehm, Julian Moreno Schneider, and Peter Bourgonje. Automatic and Manual Web Annotations in an
Infrastructure to handle Fake News and other Online Media Phenomena. In Nicoletta Calzolari, Khalid Choukri,
Christopher Cieri, Thierry Declerck, Sara Goggi, Koiti Hasida, Hitoshi Isahara, Bente Maegaard, Joseph Mariani,
Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis, and Takenobu Tokunaga, editors, Proceedings of the
11th Language Resources and Evaluation Conference (LREC 2018), pages 2416-2422, Miyazaki, Japan, May
2018. European Language Resources Association (ELRA).
• Georg Rehm. An Infrastructure for Empowering Internet Users to handle Fake News and other Online Media
Phenomena. In Georg Rehm and Thierry Declerck, editors, Language Technologies for the Challenges of the Digital
Age: 27th International Conference, GSCL 2017, Berlin, Germany, September 13-14, 2017, Proceedings, number
10713 in Lecture Notes in Artificial Intelligence (LNAI), pages 216-231, Cham, Switzerland, January 2018.
Gesellschaft für Sprachtechnologie und Computerlinguistik e.V., Springer. 13/14 September 2017.
• Georg Rehm. The Language Resource Life Cycle: Towards a Generic Model for Creating, Maintaining, Using and
Distributing Language Resources. In Nicoletta Calzolari (Conference Chair), Khalid Choukri, Thierry Declerck,
Marko Grobelnik, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, and Stelios Piperidis, editors,
Proceedings of the 10th Language Resources and Evaluation Conference (LREC 2016), pages 2450-2454,
Portorož, Slovenia, May 2016. European Language Resources Association (ELRA).
• Georg Rehm. Texttechnologische Grundlagen. In Kai-Uwe Carstensen, Christian Ebert, Cornelia Endriss, Susanne
Jekat, Ralf Klabunde, and Hagen Langer, editors, Computerlinguistik und Sprachtechnologie - Eine Einführung,
pages 159-168. Spektrum, Heidelberg, 3rd edition, 2010.
• Georg Rehm, Richard Eckart, Christian Chiarcos, and Johannes Dellert. Ontology-Based XQuery'ing of XML-
Encoded Language Resources on Multiple Annotation Layers. In Nicoletta Calzolari (Conference Chair), Khalid
Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, and Daniel Tapias, editors, Proc. of the 6th
Language Resources and Evaluation Conference (LREC 2008), pages 525-532, Marrakech, Morocco, May 2008.
• Georg Rehm, Andreas Witt, Erhard Hinrichs, and Marga Reis. Sustainability of Annotated Resources in Linguistics.
In Lisa Lena Opas-Hänninen, Mikko Jokelainen, Ilkka Juuso, and Tapio Seppänen, editors, Digital Humanities 2008,
pages 21-29, Oulu, Finland, June 2008. ACH, ALLC.
• Andreas Witt, Georg Rehm, Timm Lehmberg, and Erhard Hinrichs. Mapping Multi-Rooted Trees from a Sustainable
Exchange Format to TEI Feature Structures. In TEI@20: 20 Years of Supporting the Digital Humanities. The 20th
Anniversary TEI Consortium Members' Meeting, University of Maryland, College Park, October 2007.
• Andreas Witt, Oliver Schonefeld, Georg Rehm, Jonathan Khoo, and Kilian Evang. On the Lossless Transformation
of Single-File, Multi-Layer Annotations into Multi-Rooted Trees. In B. Tommie Usdin, editor, Proceedings of Extreme
Markup Languages 2007, Montréal, Canada, August 2007.
• Kai Wörner, Andreas Witt, Georg Rehm, and Stefanie Dipper. Modelling Linguistic Data Structures. In B. Tommie
Usdin, editor, Proceedings of Extreme Markup Languages 2006, Montréal, Canada, August 2006.