Slides from a talk I gave at the Perspectives Workshop on Semantic Web, http://www.dagstuhl.de/en/program/calendar/semhp/?semnr=09271 ... Dagstuhl, Germany, 2009-06-29. The title was from Jim Hendler!
Presentation about reference rot given at the Complexity Science Hub in Vienna, November 2021.
Links to web resources frequently break (link rot), and linked content can change at unpredictable rates (content drift). These dynamics of the Web are detrimental when references to web resources provide evidence or supporting information.
This presentation will report on research that assessed the extent of these problems for links to web resources in scholarly literature, by using three vast corpora of publications and a range of public web archives. It will also describe the Robust Link approach that offers a proactive, uniform, and machine-actionable way to combat link rot and content drift. Finally, it will introduce the Robustify web service and API that was devised to generate links that remain functional over time, paying special attention to challenges related to deploying infrastructure that is required to be long lasting.
This slide deck provides an overview of proposals to use HTTP Links as a means to address some long-standing problems related to scholarly resources on the web.
Evolving the Web into a Global Dataspace – Advances and Applications, by Chris Bizer
Keynote talk at the 18th International Conference on Business Information Systems, 24-26 June 2015, Poznań, Poland
URL:
http://bis.kie.ue.poznan.pl/bis2015/keynote-speakers/
Abstract:
Motivated by Google, Yahoo!, Microsoft, and Facebook, hundreds of thousands of websites have started to annotate structured data within their pages using markup formats such as Microdata, RDFa, and Microformats. In parallel, the adoption of Linked Data technologies by government agencies, libraries, and scientific institutions has risen considerably. In his talk, Christian Bizer will give an overview of the content profile of the resulting Web of Data. He will showcase applications that exploit the Web of Data and will discuss the challenges of integrating and cleansing data from thousands of independent Web data sources.
Various FAIR criteria pertaining to machine interaction with scholarly artifacts can commonly be addressed by means of repository-wide affordances that are uniformly provided for all hosted artifacts rather than through artifact-specific interventions. If various repository platforms provide such affordances in an interoperable manner, devising tools - for both human and machine use - that leverage them becomes easier.
My involvement, over the years, in a range of interoperability efforts has brought the insight that two factors strongly influence adoption: addressing a burning issue and delivering a KISS solution to tackle it. Undoubtedly, FAIR and FAIR DOs are burning issues. FAIR Signposting <https://signposting.org/FAIR/> is an ad-hoc repository interoperability effort that squarely fits in this problem space and that purposely specifies a KISS solution, hoping to inspire wide adoption.
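To give a feel for how lightweight such a machine-actionable affordance can be, the sketch below parses an HTTP Link header with only the Python standard library. The URLs are invented; the relation types describedby and item are among those FAIR Signposting uses; and the parser is a simplification of RFC 8288, not a complete implementation:

```python
import re

def parse_link_header(value):
    """Parse an HTTP Link header (simplified RFC 8288) into a
    list of (target, rel) pairs. Naive: assumes no commas inside
    quoted parameter values."""
    links = []
    for part in value.split(","):
        m = re.match(r'\s*<([^>]+)>\s*;\s*(.*)', part)
        if not m:
            continue
        target, params = m.groups()
        rel = None
        for p in params.split(";"):
            k, _, v = p.strip().partition("=")
            if k.lower() == "rel":
                rel = v.strip('"')
        links.append((target, rel))
    return links

# Invented example of a Signposting-style Link header on a landing page.
header = ('<https://example.org/item.pdf>; rel="item"; type="application/pdf", '
          '<https://example.org/meta.jsonld>; rel="describedby"')
print(parse_link_header(header))
```

A client that understands a handful of relation types like these can navigate any compliant repository the same way, which is the point of a repository-wide, uniform affordance.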
Presentation for a workshop about persistent identifiers organized by the Royal Library of The Netherlands and DANS. Highlights the non-trivial commitments required of all parties involved in persistent identifier systems to actually keep links based on persistent identifiers ... err ... persistent.
The State of Open Data Portals in E-Government in the Gulf Cooperation Council Countries, by Saeed Al Dhaheri
This presentation includes a study of the state of open government data in the Gulf Cooperation Council (GCC) countries, examples of best practices in open government data, and the direction required at the GCC level.
Using Minecraft for community engagement and public space design, by mysociety
This was presented by Pontus Westerberg from UN-Habitat at the Impacts of Civic Technology Conference (TICTeC2016) in Barcelona on 27th April. You can find out more information about the conference here: https://www.mysociety.org/research/tictec-2016/
This webinar aims to introduce Linked Open Data (LOD), covering its fundamental concepts and related notions such as the Semantic Web and open data, and discussing how it can be applied by those working in agriculture and beyond to help them achieve their high-level strategic goals. The webinar will also address more specific topics, such as creating, linking, and publishing your own data, and will present tools and resources you can use for this purpose.
See how taxonomies and thesauri serve as a core element of a linked data strategy and how large knowledge graphs can be built around them. Based on semantic web standards like SKOS, OWL, and SPARQL, enterprises can develop highly agile data integration platforms.
A NOSQL Overview And The Benefits Of Graph Databases (nosql east 2009), by Emil Eifrem
Presentation given at nosql east 2009 in Atlanta. Introduces the NOSQL space by offering a framework for categorization and discusses the benefits of graph databases. Oh, and also includes some tongue-in-cheek party poopers about sucky things in the NOSQL space.
This is an edited version of a talk that I gave on the 11th of February to some PhD students from the University of Utrecht at a seminar on science and communication.
This long paper started out as a small experiment that was supposed to last an afternoon: a play-around with the software tools NetDraw and yEd.
It ended up being a huge paper, too long to publish in a printed publication.
The results themselves are not that significant: it appears that people in the IETF (Internet Engineering Task Force) community really do mingle a lot with each other. The real interest lies in discovering the power of the analysis that can be performed with the software used.
I really believe that Social Network Analysis using NetDraw, yEd, and other SNA and visualisation software should be mandatory for any bottom-up organisation. I also think that corporations and organisations would really benefit from:
1. analysing their internal social networks in the same manner;
2. applying this type of analysis to their external professional social networks.
This pinpoints who the movers and shakers are in the organisation. It also pinpoints areas/departments where information flow might not be optimal, and which therefore contribute less to the organisation as a whole.
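As a rough illustration of how such an analysis pinpoints central people, here is a minimal sketch in plain Python. The names and edges are invented, and real SNA tools such as NetDraw and yEd offer far richer measures and visualisations than simple degree centrality:

```python
from collections import Counter

# Invented undirected "who communicates with whom" edge list.
edges = [("ana", "bo"), ("ana", "cy"), ("ana", "dee"), ("bo", "cy")]

# Degree centrality: count how many links each person has.
degree = Counter()
for a, b in edges:
    degree[a] += 1
    degree[b] += 1

# The highest-degree people are candidate "movers and shakers".
print(degree.most_common(1))  # [('ana', 3)]
```

Low-degree people or clusters with few cross-links would, by the same logic, flag areas where information flow might not be optimal.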
Feedback/discussion very welcome.
Slides of the presentation by Robert Isele of Free University of Berlin, Germany in the course of the LOD2 webinar: SILK on 21.02.2012 - for more information please see: http://lod2.eu/BlogPost/webinar-series
A Semantic Web Primer: The History and Vision of Linked Open Data and the Web 3.0
There is a transformational change coming to the World Wide Web that will fundamentally alter how its vast array of data is structured, and as a result greatly enhance the way humans and machines interact with this indispensable resource. Given the inertia of existing infrastructure, this segue will be evolutionary as opposed to revolutionary, and indeed has been envisioned since the inception of the web. Come join us for a layman's look at the nature of the Web 3.0, its historical underpinnings, and the opportunities it presents.
A presentation by Gill Hamilton, Digital Access Manager at the National Library of Scotland (NLS).
Delivered at the Cataloguing and Indexing Group Scotland (CIGS) Linked Open Data (LOD) Conference which took place Fri 21 September 2012 at the Edinburgh Centre for Carbon Innovation.
Analysis of GraphSum's Attention Weights to Improve the Explainability of Mul..., by Ansgar Scherp
Slides of our presentation @iiWAS2021: The 23rd International Conference on Information Integration and Web Intelligence, Linz, Austria, 29 November 2021 - 1 December 2021. ACM 2021, ISBN 978-1-4503-9556-4
STEREO: A Pipeline for Extracting Experiment Statistics, Conditions, and Topi..., by Ansgar Scherp
Presentation for our paper @iiWAS2021: The 23rd International Conference on Information Integration and Web Intelligence, Linz, Austria, 29 November 2021 - 1 December 2021. ACM 2021, ISBN 978-1-4503-9556-4
Text Localization in Scientific Figures using Fully Convolutional Neural Netw..., by Ansgar Scherp
Text extraction from scientific figures has been addressed in the past by different unsupervised approaches due to the limited amount of training data. Motivated by the recent advances in Deep Learning, we propose a two-step neural-network-based pipeline to localize and extract text using Fully Convolutional Networks. We improve the localization of the text bounding boxes by applying a novel combination of a Residual Network with the Region Proposal Network based on Faster R-CNN. The predicted bounding boxes are further pre-processed and used as input to the off-the-shelf optical character recognition engine Tesseract 4.0. We evaluate our improved text localization method on five different datasets of scientific figures and compare it with the best unsupervised pipeline. Since only limited training data is available, we further experiment with different data augmentation techniques for increasing the size of the training datasets and demonstrate their positive impact. We use Average Precision and F1 measure to assess the text localization results. In addition, we apply Gestalt Pattern Matching and Levenshtein Distance for evaluating the quality of the recognized text. Our extensive experiments show that our new pipeline based on neural networks outperforms the best unsupervised approach by a large margin of 19-20%.
A Comparison of Approaches for Automated Text Extraction from Scholarly Figures, by Ansgar Scherp
So far, there has not been a comparative evaluation of different approaches for text extraction from scholarly figures. In order to fill this gap, we have defined a generic pipeline for text extraction that abstracts from the existing approaches as documented in the literature. In this paper, we use this generic pipeline to systematically evaluate and compare 32 configurations for text extraction over four datasets of scholarly figures of different origin and characteristics. In total, our experiments have been run over more than 400 manually labeled figures. The experimental results show that the approach BS-4OS results in the best F-measure of 0.67 for the Text Location Detection and the best average Levenshtein Distance of 4.71 between the recognized text and the gold standard on all four datasets using the Ocropy OCR engine.
About Multimedia Presentation Generation and Multimedia Metadata: From Synthe..., by Ansgar Scherp
ACM SIGMM Rising Stars Symposium
The ACM SIGMM Rising Stars Symposium, inaugurated in 2015, will highlight plenary presentations of six selected rising SIGMM members on their vision and research achievements, and dialogs with senior members about the future of multimedia research.
See: http://www.acmmm.org/2016/?page_id=706
Mining and Managing Large-scale Linked Open Data, by Ansgar Scherp
Linked Open Data (LOD) is about publishing and interlinking data of different origin and purpose on the web. The Resource Description Framework (RDF) is used to describe data on the LOD cloud. In contrast to relational databases, RDF does not provide a fixed, pre-defined schema. Rather, RDF allows for flexibly modeling the data schema by attaching RDF types and properties to the entities. Our schema-level index called SchemEX allows for searching in large-scale RDF graph data. The index can be efficiently computed with reasonable accuracy over large-scale data sets with billions of RDF triples, the smallest information unit on the LOD cloud. SchemEX is highly needed as the size of the LOD cloud quickly increases. Due to the evolution of the LOD cloud, one observes frequent changes of the data. We show that the data schema also changes, in terms of combinations of RDF types and properties. As individual changes alone cannot capture the dynamics of the LOD cloud, current work includes temporal clustering and finding periodicities in entity dynamics over large-scale snapshots of the LOD cloud with about 100 million triples per week for more than three years.
Knowledge Discovery in Social Media and Scientific Digital Libraries, by Ansgar Scherp
The talk presents selected results of our research in the area of text and data mining in social media and scientific literature. (1) First, we consider the area of classifying microblogging postings like tweets on Twitter. Typically, the classification results are evaluated against a gold standard, which is either the hashtags of the tweets’ authors or manual annotations. We claim that there are fundamental differences between these two kinds of gold standard classifications and conducted an experiment with 163 participants to manually classify tweets from ten topics. Our results show that the human annotators are more likely to classify tweets like other human annotators than like the tweets’ authors (i. e., the hashtags). This may influence the evaluation of classification methods like LDA and we argue that researchers should reflect the kind of gold standard used when interpreting their results. (2) Second, we present a framework for semantic document annotation that aims to compare different existing as well as new annotation strategies. For entity detection, we compare semantic taxonomies, trigrams, RAKE, and LDA. For concept activation, we cover a set of statistical, hierarchy-based, and graph-based methods. The strategies are evaluated over 100,000 manually labeled scientific documents from economics, politics, and computer science. (3) Finally, we present a processing pipeline for extracting text of varying size, rotation, color, and emphases from scholarly figures. The pipeline does not need training nor does it make any assumptions about the characteristics of the scholarly figures. We conducted a preliminary evaluation with 121 figures from a broad range of illustration types.
URL: https://www.ukp.tu-darmstadt.de/ukp-home/news-singleview/artikel/guest-speaker-ansgar-scherp/
A Comparison of Different Strategies for Automated Semantic Document Annotation, by Ansgar Scherp
We introduce a framework for automated semantic document annotation that is composed of four processes, namely concept extraction, concept activation, annotation selection, and evaluation. The framework is used to implement and compare different annotation strategies motivated by the literature. For concept extraction, we apply entity detection with semantic hierarchical knowledge bases, Tri-gram, RAKE, and LDA. For concept activation, we compare a set of statistical, hierarchy-based, and graph-based methods. For selecting annotations, we compare top-k as well as kNN. In total, we define 43 different strategies including novel combinations like using graph-based activation with kNN. We have evaluated the strategies using three different datasets of varying size from three scientific disciplines (economics, politics, and computer science) that contain 100,000 manually labeled documents in total. We obtain the best results on all three datasets by our novel combination of entity detection with graph-based activation (e.g., HITS and Degree) and kNN. For the economic and political science datasets, the best F-measure is .39 and .28, respectively. For the computer science dataset, the maximum F-measure of .33 can be reached. These experiments are by far the largest on scholarly content annotation, where datasets typically comprise only up to a few hundred documents each.
Gregor Große-Bölting, Chifumi Nishioka, and Ansgar Scherp. 2015. A Comparison of Different Strategies for Automated Semantic Document Annotation. In Proceedings of the 8th International Conference on Knowledge Capture (K-CAP 2015). ACM, New York, NY, USA, Article 8, 8 pages. DOI=http://dx.doi.org/10.1145/2815833.2815838
Formalization and Preliminary Evaluation of a Pipeline for Text Extraction Fr..., by Ansgar Scherp
We propose a pipeline for text extraction from infographics that makes use of a novel combination of data mining and computer vision techniques. The pipeline defines a sequence of steps to identify characters, cluster them into text lines, determine their rotation angle, and apply state-of-the-art OCR to recognize the text. In this paper, we formally define the pipeline and present its current implementation. In addition, we have conducted preliminary evaluations over a data corpus of 121 manually annotated infographics from a broad range of illustration types such as bar charts, pie charts, line charts, maps, and others. We assess the results of our text extraction pipeline by comparing it with two baselines. Finally, we sketch an outline for future work and possibilities for improving the pipeline. - http://ceur-ws.org/Vol-1458/
A Framework for Iterative Signing of Graph Data on the Web, by Ansgar Scherp
Existing algorithms for signing graph data typically do not cover the whole signing process. In addition, they lack distinctive features such as signing graph data at different levels of granularity, iterative signing of graph data, and signing multiple graphs. In this paper, we introduce a novel framework for signing arbitrary graph data provided, e.g., as RDF(S), Named Graphs, or OWL. We conduct an extensive theoretical and empirical analysis of the runtime and space complexity of different framework configurations. The experiments are performed on synthetic and real-world graph data of different size and different number of blank nodes. We investigate security issues, present a trust model, and discuss practical considerations for using our signing framework.
We released a Java-based open source implementation of our software framework for iterative signing of arbitrary graph data provided, e.g., as RDF(S), Named Graphs, or OWL. The software framework is based on a formalization of different graph signing functions and supports different configurations. It is available in source code as well as pre-compiled as a .jar file.
The graph signing framework exhibits the following unique features:
- Signing graphs on different levels of granularity
- Signing multiple graphs at once
- Iterative signing of graph data for provenance tracking
- Independence of the used language for encoding the graph (i.e., the signature does not break when changing the graph representation)
The documentation of the software framework and its source code is available from: http://icp.it-risk.iwvi.uni-koblenz.de/wiki/Software_Framework_for_Signing_Graph_Data
Smart photo selection: interpret gaze as personal interest, by Ansgar Scherp
Manually selecting subsets of photos from large collections in order to present them to friends or colleagues or to print them as photo books can be a tedious task. Today, fully automatic approaches are at hand for supporting users. They make use of pixel information extracted from the images, analyze contextual information such as capture time and focal aperture, or use both to determine a proper subset of photos. However, these approaches miss the most important factor in the photo selection process: the user. The goal of our approach is to consider individual interests. By recording and analyzing gaze information while users view photo collections, we obtain information on their interests and use this information in the creation of personal photo selections. In a controlled experiment with 33 participants, we show that the selections can be significantly improved over a baseline approach by up to 22% when taking individual viewing behavior into account. We also obtained significantly better results for photos taken at an event participants were involved in compared with photos from another event.
Events in Multimedia - Theory, Model, Application, by Ansgar Scherp
Talk by Ansgar Scherp.
Title: Events in Multimedia - Theory, Model, Application
Event: Workshop on Event-based Media Integration and Processing, ACM Multimedia, 2013
Can you see it? Annotating Image Regions based on Users' Gaze Information, by Ansgar Scherp
Presentation on eyetracking-based annotation of image regions that I gave in Vienna on Oct 19, 2012. Download the original PowerPoint file to enjoy all animations. For the papers, please refer to: http://www.ansgarscherp.net/publications
1. SchemEX – Building an Index
for Linked Open Data
Ansgar Scherp, Thomas Gottron, Mathias Konrath
University of Koblenz-Landau, Germany
Oslo, Norway
August 2012
SchemEX – Building an Index for LOD Slide 1 of 44
2. Learning Goals
• Understand the motivation and fundamentals of Linked Open Data (LOD).
• Understand why an index for LOD is needed and how to create such an index efficiently.
3. Scenario
• Tim plans to travel
– from London
– to a customer in Cologne
4. Website of the German Railway
It works, why bother…?
5. Let's Try Different Queries
Bottlenecks in public transportation?
Compare the connections with flights?
Visualize on a map?
…
All these queries cannot be answered,
because the data …
6. … locked in Silos!
– High Integration Effort
– Lack of Reuse of Data
Photo: B. Jagendorf, http://www.flickr.com/photos/bobjagendorf/, CC-BY
7. Linked Data
• Publishing and interlinking of data
• Different quality and purpose
• From different sources in the Web
World Wide Web     Linked Data
Documents          Data
Hyperlinks         Typed Links
HTML               RDF
Addresses (URIs)   Addresses (URIs)
Example: http://www.uio.no/
9. Linked Data: May '07 to Sept. '11
[LOD cloud diagram with domains: Media, Publications, eGovernment, Cross-Domain, Life Sciences, Geographic, Web 2.0]
< 31 Billion Triples. Source: http://lod-cloud.net
10. Linked Data Principles
1. Identification
2. Interlinkage
3. Dereferencing
4. Description
11. Example: Big Lynx
[Diagram: Matt Briggs, Scott Miller, and the company Big Lynx]
12. 1. Use URIs for Identification
Matt Briggs
Scott Miller
http://biglynx.co.uk/people/matt-briggs
http://biglynx.co.uk/people/scott-miller
Photo: B. Gazen, http://www.flickr.com/photos/bayat/, CC-BY
13. Example: Big Lynx
[Diagram: Matt Briggs, Scott Miller, Big Lynx Company]
How to model relationships like knows?
14. Resource Description Framework (RDF)
• Description of resources with RDF triples
Matt Briggs — is a — Person
Subject — Predicate — Object
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
<http://biglynx.co.uk/people/matt-briggs> rdf:type foaf:Person .
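The triple above can be mirrored in plain Python without any RDF library. The list-of-tuples store and the `objects` helper below are an illustrative sketch, not part of the slides.

```python
# Minimal sketch: an RDF statement as a (subject, predicate, object) tuple.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF_PERSON = "http://xmlns.com/foaf/0.1/Person"

triples = [
    ("http://biglynx.co.uk/people/matt-briggs", RDF_TYPE, FOAF_PERSON),
]

def objects(triples, subject, predicate):
    """All objects of statements with the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]

# Which types does Matt Briggs have?
types = objects(triples, "http://biglynx.co.uk/people/matt-briggs", RDF_TYPE)
```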
15. 1. Use URIs also for Relations
http://biglynx.co.uk/people/matt-briggs
http://biglynx.co.uk/people/scott-miller
Photo: B. Gazen, http://www.flickr.com/photos/bayat/, CC-BY
16. Example: Big Lynx
[Diagram: Dave Smith "lives here" in London; Matt Briggs is the "same person" as DBpedia's Matt Briggs and Matt's private website; Scott Miller; Big Lynx Company]
17. 2. Establishing Interlinkage
• Relation links between resources
<http://biglynx.co.uk/people/dave-smith> foaf:based_near <http://dbpedia.org/resource/London> .
• Identity links between resources
<http://biglynx.co.uk/people/matt-briggs> owl:sameAs <http://www.matt-briggs.eg.uk#me> .
18. Example: Big Lynx
[Diagram: Dave Smith foaf:based_near London; Matt Briggs owl:sameAs DBpedia's Matt Briggs and Matt's private website; Big Lynx Company]
19. 3. Dereferencing of URIs
• Looking up web documents
• How can we "look up" things of the real world?
http://biglynx.co.uk/people/matt-briggs
20. Two Approaches
1. Hash URIs
– URI contains a part separated by #, e.g.,
http://biglynx.co.uk/vocab/sme#Team
2. Negotiation via "303 See Other" response
http://biglynx.co.uk/people/matt-briggs
Response: "Look here:"
http://biglynx.co.uk/people/matt-briggs.rdf
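For hash URIs, the document URI follows mechanically from the thing's URI, as this small sketch using Python's standard urllib shows. The `see_other` table is only a toy stand-in for what a real "303 See Other" negotiation would return over HTTP.

```python
from urllib.parse import urldefrag

def document_uri(thing_uri: str) -> str:
    """Hash URIs: strip the fragment to get the URI of the describing document."""
    doc, _fragment = urldefrag(thing_uri)
    return doc

# Toy stand-in for 303 content negotiation: the server answers
# "303 See Other" with the location of an RDF document.
see_other = {
    "http://biglynx.co.uk/people/matt-briggs":
        "http://biglynx.co.uk/people/matt-briggs.rdf",
}

doc = document_uri("http://biglynx.co.uk/vocab/sme#Team")
```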
21. Example: Big Lynx
[Diagram: Dave Smith foaf:based_near London; Matt Briggs owl:sameAs DBpedia's Matt Briggs and Matt's private website; Big Lynx Company. But where is the description of Matt?]
22. 4. Description of URIs
[RDF graph around biglynx:matt-briggs (rdf:type foaf:Person) and biglynx:dave-smith, connected via foaf:knows, with foaf:based_near links to dp:Birmingham and dp:London, and a blank node _:point carrying ex:loc, wgs84:lat "51.509" and wgs84:long "-0.118"]
23. RDF / RDF Schema Vocabulary
• Set of URIs defined in the rdf:/rdfs: namespaces
rdf: namespace: rdf:type, rdf:Property, rdf:XMLLiteral, rdf:List, rdf:first, rdf:rest, rdf:Seq, rdf:Bag, rdf:Alt, rdf:value, …
rdfs: namespace: rdfs:domain, rdfs:range, rdfs:Resource, rdfs:Literal, rdfs:Datatype, rdfs:Class, rdfs:subClassOf, rdfs:subPropertyOf, rdfs:comment, rdfs:label, …
24. Semantic Web Layer Cake (Simplified)
25. Learning Goals
• Understand the motivation and fundamentals of Linked Open Data (LOD).
• Understand why an index for LOD is needed and how to create such an index efficiently.
26. Scenario
• People who are politicians and actors
• Who else?
• Where do they live?
• Whom do they know? Whom are they married to?
27. Problem
• No single federated query interface provided
• Execute those queries on the LOD cloud
SELECT ?x
FROM …
WHERE {
  ?x rdf:type ex:Actor .
  ?x rdf:type ex:Politician .
}
“politicians and actors”
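The conjunctive type query of this slide can be evaluated naively over an in-memory triple list; the sketch below and its data (ex:reagan, ex:merkel) are invented for illustration only.

```python
# Toy evaluation of "find all ?x with rdf:type ex:Actor AND ex:Politician".
RDF_TYPE = "rdf:type"

triples = [
    ("ex:reagan", RDF_TYPE, "ex:Actor"),
    ("ex:reagan", RDF_TYPE, "ex:Politician"),
    ("ex:merkel", RDF_TYPE, "ex:Politician"),
]

def instances_of_all(triples, classes):
    """Subjects whose set of rdf:type objects contains all given classes."""
    types_of = {}
    for s, p, o in triples:
        if p == RDF_TYPE:
            types_of.setdefault(s, set()).add(o)
    return {s for s, ts in types_of.items() if set(classes) <= ts}

result = instances_of_all(triples, ["ex:Actor", "ex:Politician"])
```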
28. Solution in Principle
• Suitable index structure for looking up sources
“politicians
and actors”
29. The Naive Approach
1. Download the entire LOD cloud
2. Put it into a (really) large triple store
3. Process the data and extract schema
4. Provide lookup
- Big machinery
- Late in processing the data
- High effort to scale with LOD cloud
30. Idea
• Schema-level index
– Define families of graph patterns
– Assign instances to graph patterns
– Map graph patterns to context (source URI)
• Construction
– Stream-based for scalability
– Little loss of accuracy
• Note
– Index defined over instances, but stores the context
31. Input Data
n-Quads:
<subject> <predicate> <object> <context>
Example:
<http://www.w3.org/People/Connolly/#me>
<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
<http://xmlns.com/foaf/0.1/Person>
<http://dig.csail.mit.edu/2008/webdav/timbl/foaf.rdf> .
[Diagram: w3p:#me rdf:type foaf:Person, found in the context http://dig.csail.mit.edu/2008/webdav/timbl/foaf.rdf]
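An n-quad line of this simple, IRI-only shape can be parsed with a short regular expression; the `parse_quad` sketch below is only illustrative, since a real crawler uses a proper parser such as NxParser that also handles literals, blank nodes, and escapes.

```python
import re

# Rough n-quad parser for the well-behaved case where subject, predicate,
# object, and context are all <...> IRIs, as in the slide's example.
QUAD = re.compile(r'<([^>]*)>\s*<([^>]*)>\s*<([^>]*)>\s*<([^>]*)>\s*\.')

def parse_quad(line: str):
    m = QUAD.match(line.strip())
    if m is None:
        raise ValueError("not a simple IRI-only n-quad: " + line)
    return m.groups()  # (subject, predicate, object, context)

s, p, o, ctx = parse_quad(
    "<http://www.w3.org/People/Connolly/#me> "
    "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type> "
    "<http://xmlns.com/foaf/0.1/Person> "
    "<http://dig.csail.mit.edu/2008/webdav/timbl/foaf.rdf> ."
)
```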
32. Building the Schema and Index
[Index structure, top to bottom:
RDF classes C1, C2, C3, …, Ck
– consistsOf → type clusters TC1, TC2, …, TCm
– hasEQClass → equivalence classes EQC1, EQC2, …, EQCn (distinguished by properties p1, p2, …)
– hasDataSource → data sources DS1, DS2, …, DSx]
33. Layer 1: RDF Classes
All instances of a particular type (class C1)
SELECT ?x
FROM …
WHERE {
  ?x rdf:type foaf:Person .
}
[Example: timbl:card#i rdf:type foaf:Person, found in the sources http://dig.csail.mit.edu/2008/... and http://www.w3.org/People/Berners-Lee/card]
34. Layer 2: Type Clusters
All instances belonging to exactly the same set of types
SELECT ?x
FROM …
WHERE {
  ?x rdf:type foaf:Person .
  ?x rdf:type pim:Male .
}
[Example: timbl:card#i has exactly the types foaf:Person and pim:Male and thus falls into type cluster tc4711; source http://www.w3.org/People/Berners-Lee/card]
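A type cluster is the exact set of classes an instance carries, so a frozenset of type IRIs serves directly as a dictionary key. The sketch and its toy data below are illustrative, not the slides' implementation.

```python
# Sketch: group instances by their exact set of rdf:type objects.
RDF_TYPE = "rdf:type"

triples = [
    ("timbl:card#i", RDF_TYPE, "foaf:Person"),
    ("timbl:card#i", RDF_TYPE, "pim:Male"),
    ("w3p:#me", RDF_TYPE, "foaf:Person"),
]

def type_clusters(triples):
    types_of = {}
    for s, p, o in triples:
        if p == RDF_TYPE:
            types_of.setdefault(s, set()).add(o)
    clusters = {}
    for inst, ts in types_of.items():
        clusters.setdefault(frozenset(ts), set()).add(inst)
    return clusters

tcs = type_clusters(triples)
```

Note that an instance with only foaf:Person lands in a different cluster than one with foaf:Person and pim:Male, since membership requires *exactly* the same set of types.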
35. Layer 3: Equivalence Classes
Two instances are equivalent iff:
– they are in the same TC,
– they have the same properties, and
– the property targets are in the same TC.
Similar to 1-bisimulation.
36. Layer 3: Equivalence Classes
SELECT ?x
WHERE {
  ?x rdf:type foaf:Person .
  ?x rdf:type pim:Male .
  ?x foaf:maker ?y .
  ?y rdf:type foaf:PersonalProfileDocument .
}
[Example: timbl:card#i (type cluster tc4711: pim:Male, foaf:Person) is connected via foaf:maker to timbl:card (type cluster tc1234: foaf:PersonalProfileDocument), yielding equivalence class eqc0815; source http://www.w3.org/People/Berners-Lee/card]
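The layer-3 equivalence can be sketched by keying each instance on its own type cluster plus the set of (property, target type cluster) pairs, which roughly corresponds to the 1-bisimulation mentioned on the previous slide. Function names and data below are illustrative.

```python
# Sketch: equivalence classes over (own TC, {(property, target TC)}).
RDF_TYPE = "rdf:type"

triples = [
    ("timbl:card#i", RDF_TYPE, "foaf:Person"),
    ("timbl:card#i", RDF_TYPE, "pim:Male"),
    ("timbl:card", RDF_TYPE, "foaf:PersonalProfileDocument"),
    ("timbl:card#i", "foaf:maker", "timbl:card"),
]

def equivalence_classes(triples):
    types_of = {}
    for s, p, o in triples:
        if p == RDF_TYPE:
            types_of.setdefault(s, set()).add(o)
    tc = lambda x: frozenset(types_of.get(x, ()))  # type cluster of a node
    props = {}
    for s, p, o in triples:
        if p != RDF_TYPE:
            props.setdefault(s, set()).add((p, tc(o)))
    eqcs = {}
    for inst, ps in props.items():
        eqcs.setdefault((tc(inst), frozenset(ps)), set()).add(inst)
    return eqcs

eqcs = equivalence_classes(triples)
```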
37. The SchemEX Approach
• Stream-based schema extraction
• While crawling the data
[Pipeline: LOD crawler / RDF dump → NxParser (n-quad stream) → FIFO instance cache → schema extractor → schema-level index (triple store / RDBMS)]
38. Building the Index from a Stream
Stream of n-quads (coming from a LD crawler)
… Q16, Q15, Q14, Q13, Q12, Q11, Q10, Q9, Q8, Q7, Q6, Q5, Q4, Q3, Q2, Q1
[FIFO window over the stream: instances and their classes (C1, C2, C3) stay in the cache while their schema is extracted]
• Linear runtime complexity w.r.t. the number of input triples
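The single-pass, windowed construction can be sketched as follows: quads stream by, a bounded FIFO cache accumulates each subject's triples, and on eviction the subject's schema (here simplified to just its type set) is recorded together with the data sources it was seen in. All names and data are invented for illustration; the real SchemEX extractor computes the full three-layer schema.

```python
from collections import deque

RDF_TYPE = "rdf:type"

def build_index(quads, window_size=2):
    fifo = deque()   # subjects in arrival order
    cache = {}       # subject -> (types, sources)
    index = {}       # frozenset(types) -> set of source contexts

    def evict(subject):
        types, sources = cache.pop(subject)
        index.setdefault(frozenset(types), set()).update(sources)

    for s, p, o, ctx in quads:
        if s not in cache:
            if len(fifo) == window_size:   # window full: evict oldest subject
                evict(fifo.popleft())
            fifo.append(s)
            cache[s] = (set(), set())
        types, sources = cache[s]
        if p == RDF_TYPE:
            types.add(o)
        sources.add(ctx)
    while fifo:                            # flush the remaining window
        evict(fifo.popleft())
    return index

index = build_index([
    ("ex:a", RDF_TYPE, "foaf:Person", "ex:src1"),
    ("ex:b", RDF_TYPE, "foaf:Person", "ex:src2"),
])
```

The bounded window is what keeps the pass linear in the number of input triples: a subject whose triples are spread too far apart in the stream may be evicted early, which is the "little loss of accuracy" noted on the Idea slide.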
39. Computing SchemEX: TimBL Data Set
• Analysis of a smaller data set
• 11 M triples, TimBL’s FOAF profile
• Crawled with LDspider at ~2k triples/sec
• Different cache sizes: 100, 1k, 10k, 50k, 100k
• Compared SchemEX with a reference schema
• Index queries on all types, TCs, and EQCs
• Good precision/recall ratio at cache sizes of 50k and above
40. Quality of Stream-based Index Construction
• Runtime hardly increases with window size
• Memory consumption scales with window size
41. Computing SchemEX: Full BTC 2011 Data
Cache size: 50k
43. Conclusions: SchemEX
• Linked Open Data (LOD) approach
• Publishing and interlinking data on the web
• SchemEX
• Stream-based approach to LOD schema extraction
• Scalable to arbitrary amounts of Linked Data
• Applicable on commodity hardware (4GB RAM, single CPU)
44. Learning Goals
• Understand the motivation and fundamentals of Linked Open Data (LOD).
• Understand why an index for LOD is needed and how to create such an index efficiently.
45. Recommended Readings
• Maciej Janik, Ansgar Scherp, Steffen Staab: The Semantic Web: Collective Intelligence on the Web. Informatik Spektrum 34(5): 469-483 (2011).
URL: http://dx.doi.org/10.1007/s00287-011-0535-x
• Mathias Konrath, Thomas Gottron, Steffen Staab, Ansgar Scherp: SchemEX – Efficient construction of a data catalogue by stream-based indexing of linked data. Journal of Web Semantics: Science, Services and Agents on the World Wide Web, available online 23 June 2012.
URL: http://www.sciencedirect.com/science/article/pii/S1570826812000716
• Tom Heath, Christian Bizer: Linked Data: Evolving the Web into a Global Data Space. Morgan & Claypool Publishers, 2011.
URL: http://dx.doi.org/10.2200/S00334ED1V01Y201102WBE001