The document presents a methodology for crowdsourcing the assessment of Linked Data quality. The methodology involves a two-stage process: a find stage, in which Linked Data experts identify potential quality issues, and a verify stage, in which microtasks on Amazon Mechanical Turk are used to validate those issues. The study assesses three types of quality problems in DBpedia through this methodology and analyzes the results in terms of precision. The findings indicate that crowdsourcing is effective for detecting certain quality issues, that Linked Data experts are best suited to domain-specific tasks, and that microtask workers perform well on data-comparison tasks. The conclusions discuss integrating crowdsourcing into Linked Data curation processes and conducting further experiments.
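The verify stage described above can be sketched in a few lines: collect worker votes per candidate triple, aggregate by majority, and compute precision against expert judgements. This is a hypothetical illustration of the evaluation setup, not code or data from the study; the triple identifiers and labels are invented.

```python
# Toy sketch of a crowdsourced "verify" stage: majority-vote aggregation of
# microtask judgements, then precision against gold (expert) labels.
# All identifiers and votes below are illustrative.
from collections import Counter

def majority_vote(votes):
    """Return the label chosen by the most workers for one triple."""
    return Counter(votes).most_common(1)[0][0]

def precision(predicted, gold, positive="error"):
    """Fraction of triples flagged as errors that really are errors."""
    flagged = [t for t, label in predicted.items() if label == positive]
    if not flagged:
        return 0.0
    correct = sum(1 for t in flagged if gold.get(t) == positive)
    return correct / len(flagged)

# Three workers judge each candidate triple from the "find" stage.
crowd_votes = {
    "dbpedia:Berlin/populationTotal": ["error", "error", "ok"],
    "dbpedia:Berlin/areaCode":        ["ok", "ok", "error"],
    "dbpedia:Paris/elevation":        ["error", "error", "error"],
}
predicted = {t: majority_vote(v) for t, v in crowd_votes.items()}
gold = {
    "dbpedia:Berlin/populationTotal": "error",
    "dbpedia:Berlin/areaCode":        "ok",
    "dbpedia:Paris/elevation":        "ok",
}
print(precision(predicted, gold))  # 1 of the 2 flagged triples is a real error: 0.5
```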
Basic introduction to recommender systems + Implementing a content-based recommender system by leveraging knowledge encoded into Linked Open Data datasets
The World Wide Web is moving from a Web of hyper-linked documents to a Web of linked data. Thanks to the Semantic Web technology stack and to the more recent Linked Open Data (LOD) initiative, a vast amount of RDF data has been published in freely accessible datasets connected with each other to form the so-called LOD cloud. As of today, we have tons of RDF data available in the Web of Data, but only a few applications really exploit its potential. The availability of such data is certainly an opportunity to feed personalized information access tools such as recommender systems. We will show how to plug Linked Open Data into a recommendation engine in order to build a new generation of LOD-enabled applications.
(Lecture given @ the 11th Reasoning Web Summer School - Berlin - August 1, 2015)
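The core idea of a LOD-fed content-based recommender can be sketched very compactly: describe each item by the DBpedia resources it links to, build a user profile from liked items, and rank candidates by feature overlap. The feature sets below are invented for illustration (in practice they would come from SPARQL queries against the LOD cloud), and Jaccard similarity stands in for whatever similarity measure the lecture actually uses.

```python
# Minimal content-based recommender over LOD-style item features.
# Feature sets are illustrative stand-ins for DBpedia query results.
def jaccard(a, b):
    """Set-overlap similarity in [0, 1]."""
    return len(a & b) / len(a | b) if a | b else 0.0

item_features = {
    "dbr:The_Matrix":   {"dbr:Science_fiction", "dbr:Keanu_Reeves", "dbr:Dystopia"},
    "dbr:John_Wick":    {"dbr:Action_film", "dbr:Keanu_Reeves"},
    "dbr:Blade_Runner": {"dbr:Science_fiction", "dbr:Dystopia", "dbr:Ridley_Scott"},
}

def recommend(liked, k=2):
    """Rank unseen items by similarity to the union of liked-item features."""
    profile = set().union(*(item_features[i] for i in liked))
    candidates = [i for i in item_features if i not in liked]
    ranked = sorted(candidates,
                    key=lambda i: jaccard(profile, item_features[i]),
                    reverse=True)
    return ranked[:k]

print(recommend(["dbr:The_Matrix"]))  # Blade Runner shares more LOD features
```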
Thoughts on Knowledge Graphs & Deeper Provenance - Paul Groth
Thinking about the need for deeper provenance for knowledge graphs but also using knowledge graphs to enrich provenance. Presented at https://seminariomirianandres.unirioja.es/sw19/
The Challenge of Deeper Knowledge Graphs for Science - Paul Groth
Over the past five years, we have seen multiple successes in the development of knowledge graphs for supporting science in domains ranging from drug discovery to social science. However, in order to really improve scientific productivity, we need to expand and deepen our knowledge graphs. To do so, I believe we need to address two critical challenges: 1) dealing with low-resource domains; and 2) improving quality. In this talk, I describe these challenges in detail and discuss some efforts to overcome them through techniques such as unsupervised learning, the use of non-experts in expert domains, and the integration of action-oriented knowledge (i.e. experiments) into knowledge graphs.
The Web of Data: do we actually understand what we built? - Frank van Harmelen
Despite its obvious success (largest knowledge base ever built, used in practice by companies and governments alike), we actually understand very little of the structure of the Web of Data. Its formal meaning is specified in logic, but with its scale, context dependency and dynamics, the Web of Data has outgrown its traditional model-theoretic semantics.
Is the meaning of a logical statement (an edge in the graph) dependent on the cluster ("context") in which it appears? Does a more densely connected concept (node) contain more information? Is the path length between two nodes related to their semantic distance?
Properties such as clustering, connectivity and path length are not described, much less explained by model-theoretic semantics. Do such properties contribute to the meaning of a knowledge graph?
To properly understand the structure and meaning of knowledge graphs, we should no longer treat knowledge graphs as (only) a set of logical statements, but treat them properly as a graph. But how to do this is far from clear.
In this talk, I report on some of our early results on some of these questions, but I ask many more questions for which we don't have answers yet.
Presentation for NEC Lab Europe.
Knowledge graphs are increasingly built using complex, multifaceted machine-learning-based systems relying on a wide range of different data sources. To be effective, these systems must constantly evolve and thus be maintained. I present work on combining knowledge graph construction (e.g. information extraction) and refinement (e.g. link prediction) in end-to-end systems. In particular, I will discuss recent work on using inductive representations for link prediction. I then discuss the challenges of ongoing system maintenance, knowledge graph quality and traceability.
Data science remains a high-touch activity, especially in life, physical, and social sciences. Data management and manipulation tasks consume too much bandwidth: Specialized tools and technologies are difficult to use together, issues of scale persist despite the Cambrian explosion of big data systems, and public data sources (including the scientific literature itself) suffer curation and quality problems.
Together, these problems motivate a research agenda around “human-data interaction:” understanding and optimizing how people use and share quantitative information.
I’ll describe some of our ongoing work in this area at the University of Washington eScience Institute.
In the context of the Myria project, we're building a big data "polystore" system that can hide the idiosyncrasies of specialized systems behind a common interface without sacrificing performance. In scientific data curation, we are automatically correcting metadata errors in public data repositories with cooperative machine learning approaches. In the Viziometrics project, we are mining patterns of visual information in the scientific literature using machine vision, machine learning, and graph analytics. In the VizDeck and Voyager projects, we are developing automatic visualization recommendation techniques. In graph analytics, we are working on parallelizing best-of-breed graph clustering algorithms to handle multi-billion-edge graphs.
The common thread in these projects is the goal of democratizing data science techniques, especially in the sciences.
There are high expectations for Linked Government Data—the practice of publishing public sector information on the Web using Linked Data formats. This slideset reviews some of the ongoing work in the US, UK, and within W3C, as well as activities within my institute (DERI, National University of Ireland, Galway).
Content + Signals: The value of the entire data estate for machine learning - Paul Groth
Content-centric organizations have increasingly recognized the value of their material for analytics and decision-support systems based on machine learning. However, as anyone involved in machine learning projects will tell you, the difficulty is not in the provision of the content itself but in the production of the annotations necessary to make use of that content for ML. The transformation of content into training data often requires manual human annotation. This is expensive, particularly when the nature of the content requires subject matter experts to be involved.
In this talk, I highlight emerging approaches to tackling this challenge using what's known as weak supervision - using other signals to help annotate data. I discuss how content companies often overlook resources that they have in-house to provide these signals. I aim to show how looking at a data estate in terms of signals can amplify its value for artificial intelligence.
The need for a transparent data supply chain - Paul Groth
Illustrating data supply chains and motivating the need for a more transparent data supply chain in the context of responsible data science. Presented at the 2018 KNAW-Royal Society bilateral meeting on responsible data science.
Deep neural networks for matching online social networking profiles - Traian Rebedea
- Proposed a large dataset for matching online social networking profiles
- This allowed us to train a deep neural network for profile matching using both domain-specific features and word embeddings generated from textual descriptions in social profiles
- Experiments showed that the neural network surpassed both unsupervised and supervised baselines, achieving high precision (P = 0.95) with a good recall rate (R = 0.85)
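The reported precision and recall can be summarized with a single F1 score (their harmonic mean), a standard metric for matching tasks. This is just the textbook formula applied to the numbers above, not a figure from the paper:

```python
# F1 score: harmonic mean of precision and recall.
def f1(p, r):
    return 2 * p * r / (p + r)

print(round(f1(0.95, 0.85), 3))  # 0.897
```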
HARE: A Hybrid SPARQL Engine to Enhance Query Answers via Crowdsourcing - Maribel Acosta Deibe
Best Student Paper Award at the 8th International Conference on Knowledge Capture (K-CAP 2015).
http://tinyurl.com/hare-paper
Abstract:
Due to the semi-structured nature of RDF data, missing values affect the answer completeness of queries posed against RDF. To overcome this limitation, we present HARE, a novel hybrid query processing engine that brings together machine and human computation to execute SPARQL queries. We propose a model that exploits the characteristics of RDF in order to estimate the completeness of portions of a data set. The completeness model, complemented by crowd knowledge, is used by the HARE query engine to decide on the fly which parts of a query should be executed against the data set or via crowd computing. To evaluate HARE, we created and executed a collection of 50 SPARQL queries against the DBpedia data set. Experimental results clearly show that our solution accurately enhances answer completeness.
(The HARE logo is based on artwork by icons8, https://icons8.com/.)
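The routing decision at the heart of HARE can be illustrated with a toy: estimate how complete a predicate is across the dataset, and send triple patterns with missing values to the crowd when the estimated completeness is low. This is a loose sketch of the idea, not HARE's actual completeness model; the dataset, threshold, and estimator are all invented for illustration.

```python
# Toy sketch of hybrid machine/crowd query routing, HARE-style.
# The completeness estimator is a naive fill ratio, for illustration only.
dataset = {  # subject -> {predicate: [objects]}
    "dbr:Berlin": {"dbo:country": ["dbr:Germany"], "dbo:mayor": []},
    "dbr:Paris":  {"dbo:country": ["dbr:France"], "dbo:mayor": ["dbr:Anne_Hidalgo"]},
}

def completeness(predicate):
    """Naive estimate: fraction of subjects with a value for predicate."""
    subjects = list(dataset)
    filled = sum(1 for s in subjects if dataset[s].get(predicate))
    return filled / len(subjects)

def route(subject, predicate, threshold=0.9):
    """Answer from the dataset if a value exists; otherwise ask the crowd
    when the predicate looks incomplete."""
    if dataset.get(subject, {}).get(predicate):
        return "dataset"
    return "crowd" if completeness(predicate) < threshold else "dataset"

print(route("dbr:Berlin", "dbo:mayor"))    # missing value, sparse predicate
print(route("dbr:Berlin", "dbo:country"))  # value present locally
```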
Conference Live: Accessible and Sociable Conference Semantic Data - Anna Lisa Gentile
In this paper we describe Conference Live, a semantic Web application to browse conference data. Conference Live is a Web and mobile application based on conference data from the Semantic Web Dog Food server, which provides facilities to browse papers and authors at a specific conference. Available data for the specific conference is enriched with social features (e.g. integrated Twitter accounts of paper authors), scheduling features (calendar information is attached for paper presentations and social events), the possibility to check and add feedback on each paper, and the possibility to vote for papers if the conference includes sessions where participants can vote, as is popular for, e.g., poster sessions. As a use case we report on the usage of the application at the Extended Semantic Web Conference (ESWC) in May 2014.
Twitter: @crowdsem, #crowdsem2013
1st International Workshop on “Crowdsourcing the Semantic Web” in conjunction with the 12th International Semantic Web Conference (ISWC 2013), 21-25 October 2013, in Sydney, Australia. This interactive workshop takes stock of emergent work and charts the research agenda, with interactive sessions to brainstorm ideas and potential applications of collective intelligence to solving AI-hard Semantic Web problems.
Semantic Data Management in Graph Databases: ESWC 2014 Tutorial - Maribel Acosta Deibe
In this tutorial we present the basics of graph database frameworks and their applicability to semantic data management. The tutorial targets any conference attendee interested in learning about the currently limited graph-based capabilities of existing RDF engines, existing graph database techniques, and extensions to RDF data management approaches that provide efficient graph-based access to linked data.
Linked Data Quality Assessment – daQ and Luzzu - jerdeb
Presentation at the Ontology Engineering Group at UPM on Linked Data quality and the work done in the Enterprise Information Systems group at Universität Bonn.
EarthCube Monthly Community Webinar - Nov. 22, 2013 - EarthCube
This webinar features project overviews of all EarthCube Awards (Building Blocks, Research Coordination Networks, Conceptual Designs, and Test Governance), followed by a call for involvement, and a Q&A session.
Agenda:
EarthCube Awards – Project Overviews
1. EarthCube Web Services (Building Block)
2. EC3: Earth-Centered Community for Cyberinfrastructure (RCN)
3. GeoSoft (Building Block)
4. Specifying and Implementing ODSIP (Building Block)
5. A Broker Framework for Next Generation Geoscience (BCube) (Building Block)
6. Integrating Discrete and Continuous Data (Building Block)
7. EAGER: Collaborative Research (Building Block)
8. A Cognitive Computer Infrastructure for Geoscience (Building Block)
9. Earth System Bridge (Building Block)
10. CINERGI – Community Inventory of EC Resources for Geoscience Interoperability (BB)
11. Building a Sediment Experimentalist Network (RCN)
12. C4P: Collaboration and Cyberinfrastructure for Paleogeosciences (RCN)
13. Developing a Data-Oriented Human-centric Enterprise for Architecture (CD)
14. Enterprise Architecture for Transformative Research and Collaboration (CD)
15. EC Test Enterprise Governance: An Agile Approach (Test Governance)
A Call for Involvement!
RDF and graph databases are steadily increasing their adoption and are no longer choices of niche-only communities. For almost 20 years, a constraint language for RDF was a big missing piece in the technology stack and a prohibiting factor for further adoption.
Even though most RDF-based systems were performing data validation and quality assessment, there was no standardized way to define constraints. People were using ad-hoc solutions or schemas and languages that were not meant for validation.
Thankfully, since 2017 there have been two additions to the RDF technology stack: SHACL and ShEx. Both provide a high-level RDF constraint language that people can use to define data constraints (a.k.a. shapes), each with different strengths.
This talk provides an outline of different types of RDF data quality issues and existing approaches to quality assessment. The goal is to give an overview of the existing RDF validation landscape and hopefully, inspire people on how to improve their RDF publishing workflows.
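To make the idea of a shape constraint concrete, here is a plain-Python toy of what a SHACL-style shape checks: every node of a target class must carry exactly one value for a given property. Real shapes are written in RDF (SHACL) or ShEx syntax, not Python; the triples and the constraint below are invented for illustration.

```python
# Plain-Python toy of a SHACL-style shape: every foaf:Person must have
# exactly one string-valued foaf:name. Illustrative, not real SHACL.
triples = [
    ("ex:alice", "rdf:type", "foaf:Person"),
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:bob",   "rdf:type", "foaf:Person"),  # missing foaf:name
]

def validate(triples, target_class="foaf:Person", path="foaf:name"):
    """Return a violation report: (node, message) per failing target node."""
    targets = {s for s, p, o in triples if p == "rdf:type" and o == target_class}
    report = []
    for node in sorted(targets):
        values = [o for s, p, o in triples if s == node and p == path]
        if len(values) != 1 or not all(isinstance(v, str) for v in values):
            report.append((node, f"expected exactly one {path}"))
    return report

print(validate(triples))  # [('ex:bob', 'expected exactly one foaf:name')]
```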
- What are clustering, honeypots, and density-based clustering?
- What is OPTICS clustering, how does it differ from density-based clustering, and how can it be used for outlier detection?
- What is so-called soft clustering, how does it differ from hard clustering, and how can it be used for outlier detection?
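The density intuition behind the outlier-detection questions above can be shown in a few lines: in DBSCAN-style clustering, a point with fewer than `min_pts` neighbours within radius `eps` is labelled noise, i.e. an outlier. This is a toy 1-D sketch of that criterion, with invented data and parameters, not the full algorithm:

```python
# DBSCAN-style noise criterion: a point is an outlier if it has fewer
# than `min_pts` neighbours within distance `eps`. Toy 1-D example.
def outliers(points, eps=1.5, min_pts=2):
    flagged = []
    for p in points:
        neighbours = [q for q in points if q != p and abs(q - p) <= eps]
        if len(neighbours) < min_pts:
            flagged.append(p)
    return flagged

data = [1.0, 1.2, 1.4, 2.0, 9.0]
print(outliers(data))  # [9.0] sits far from the dense cluster
```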
Data Tactics Data Science Brown Bag (April 2014) - Rich Heimann
This is a presentation we deliver internally every quarter as part of our Data Science Brown Bag series. This installment covered different types of soft clustering techniques, all of which the team currently applies depending on the complexity of the data and of customer problems. If you are interested in learning more about working with L-3 Data Tactics, or in joining the L-3 Data Tactics Data Science team, please contact us soon! Thank you.
Keynote speech - Carole Goble - Jisc Digital Festival 2015 - Jisc
Carole Goble is a professor in the school of computer science at the University of Manchester.
In this keynote, Carole offered her insights into research data management and data centres.
RARE and FAIR Science: Reproducibility and Research Objects - Carole Goble
Keynote at JISC Digifest 2015 on Reproducibility and Research Objects in Scholarly Communication
Includes hidden slides
All material is reusable, except perhaps the IT Crowd screengrab.
Early Analysis and Debugging of Linked Open Data Cubes - Enrico Daga
The release of the Data Cube Vocabulary specification introduces a standardised method for publishing statistics following the linked data principles. However, a statistical dataset can be very complex, and so understanding how to get value out of it may be hard. Analysts need the ability to quickly grasp the content of the data to be able to make use of it appropriately. In addition, while remodelling the data, data cube publishers need support to detect bugs and issues in the structure or content of the dataset. Several aspects of RDF, the Data Cube vocabulary and linked data can help with these issues, however, including the fact that they make the data "self-descriptive". Here, we attempt to answer the question "How feasible is it to use this feature to give an overview of the data in a way that would facilitate debugging and exploration of statistical linked open data?" We present a tool that automatically builds interactive facets as diagrams out of a Data Cube representation, without prior knowledge of the data content, to be used for debugging and early analysis. We show how this tool can be used on a large, complex dataset and we discuss the potential of this approach.
Crowdsourcing the Quality of Knowledge Graphs: A DBpedia Study - Maribel Acosta Deibe
Summary of crowdsourcing studies to assess the quality of knowledge graphs and complete missing values. Results focus on findings over the DBpedia knowledge graph ( https://wiki.dbpedia.org/).
Related publications:
Acosta, M., Zaveri, A., Simperl, E., Kontokostas, D., Auer, S., & Lehmann, J. Crowdsourcing Linked Data Quality Assessment. In International Semantic Web Conference (pp. 260-276), 2013.
Acosta, M., Zaveri, A., Simperl, E., Kontokostas, D., Flöck, F., & Lehmann, J. Detecting Linked Data Quality issues via Crowdsourcing: A DBpedia Study. Semantic Web Journal, 9(3), 303-335, 2018.
Acosta, M., Simperl, E., Flöck, F., & Vidal, M. E. HARE: A hybrid SPARQL engine to enhance query answers via crowdsourcing. In Proceedings of the 8th International Conference on Knowledge Capture (p. 11). 2015. Best Student Paper Award.
Acosta, M., Simperl, E., Flöck, F., & Vidal, M. E. Enhancing answer completeness of SPARQL queries via crowdsourcing. Journal of Web Semantics, 45, 41-62, 2017.
Acosta, M., Simperl, E., Flöck, F., & Vidal, M. E. HARE: An engine for enhancing answer completeness of SPARQL queries via crowdsourcing. Companion Volume of the Web Conference (pp. 501-505). 2018.
MUDROD - Mining and Utilizing Dataset Relevancy from Oceanographic Dataset Metadata, Usage Metrics, and User Feedback to Improve Data Discovery and Access - Yongyao Jiang
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova... - Ramesh Iyer
In today's fast-changing business world, companies that fail to adapt and embrace new ideas often struggle to keep up with the competition. However, fostering a culture of innovation takes real work: it takes vision, leadership and a willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
Securing your Kubernetes cluster: a step-by-step guide to success! - KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
UiPath Test Automation using UiPath Test Suite series, part 4 - DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
JMeter webinar - integration with InfluxDB and Grafana - RTTS
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
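Under the hood, a JMeter-to-InfluxDB integration writes each sampler result as an InfluxDB line-protocol record (`measurement,tags fields timestamp`), which Grafana then queries for its dashboards. The sketch below builds such a record by hand; the measurement name, tags, and field names are illustrative (JMeter's backend listener uses its own schema), and real InfluxDB integer fields carry an `i` suffix, omitted here for simplicity.

```python
# Build an InfluxDB line-protocol record for one load-test sample.
# Format: measurement,tag=val,... field=val,... timestamp_ns
def to_line_protocol(measurement, tags, fields, ts_ns):
    tag_str = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    field_str = ",".join(f"{k}={v}" for k, v in sorted(fields.items()))
    return f"{measurement},{tag_str} {field_str} {ts_ns}"

line = to_line_protocol(
    "jmeter_sample",                        # illustrative measurement name
    {"label": "login", "status": "ok"},     # low-cardinality metadata as tags
    {"latency_ms": 142, "bytes": 5120},     # numeric results as fields
    1717400000000000000,                    # nanosecond timestamp
)
print(line)
```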
Key Trends Shaping the Future of Infrastructure - Cheryl Hung
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud and open source: exploring how these areas are likely to mature and develop over the short and long term, and considering how organisations can position themselves to adapt and thrive.
Accelerate your Kubernetes clusters with Varnish Caching - Thijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
Epistemic Interaction - tuning interfaces to provide information for AI supportAlan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
Generating a custom Ruby SDK for your web service or Rails API using Smithyg2nightmarescribd
Have you ever wanted a Ruby client API to communicate with your web service? Smithy is a protocol-agnostic language for defining services and SDKs. Smithy Ruby is an implementation of Smithy that generates a Ruby SDK using a Smithy model. In this talk, we will explore Smithy and Smithy Ruby to learn how to generate custom feature-rich SDKs that can communicate with any web service, such as a Rails JSON API.
Transcript: Selling digital books in 2024: Insights from industry leaders - T...BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
The Art of the Pitch: WordPress Relationships and SalesLaura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if sometime changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
State of ICS and IoT Cyber Threat Landscape Report 2024 previewPrayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio, cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors, and newer malware including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
Elevating Tactical DDD Patterns Through Object CalisthenicsDorra BARTAGUIZ
After immersing yourself in the blue book and its red counterpart, attending DDD-focused conferences, and applying tactical patterns, you're left with a crucial question: How do I ensure my design is effective? Tactical patterns within Domain-Driven Design (DDD) serve as guiding principles for creating clear and manageable domain models. However, achieving success with these patterns requires additional guidance. Interestingly, we've observed that a set of constraints initially designed for training purposes remarkably aligns with effective pattern implementation, offering a more ‘mechanical’ approach. Let's explore together how Object Calisthenics can elevate the design of your tactical DDD patterns, offering concrete help for those venturing into DDD for the first time!
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Tobias Schneck
As AI technology is pushing into IT I was wondering myself, as an “infrastructure container kubernetes guy”, how get this fancy AI technology get managed from an infrastructure operational view? Is it possible to apply our lovely cloud native principals as well? What benefit’s both technologies could bring to each other?
Let me take this questions and provide you a short journey through existing deployment models and use cases for AI software. On practical examples, we discuss what cloud/on-premise strategy we may need for applying it to our own infrastructure to get it to work from an enterprise perspective. I want to give an overview about infrastructure requirements and technologies, what could be beneficial or limiting your AI use cases in an enterprise environment. An interactive Demo will give you some insides, what approaches I got already working for real.
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Crowdsourcing Linked Data Quality Assessment
1. Crowdsourcing Linked Data Quality Assessment
Maribel Acosta, Amrapali Zaveri, Elena Simperl, Dimitris Kontokostas, Sören Auer
and Jens Lehmann
@ISWC2013
KIT – University of the State of Baden-Wuerttemberg and
National Research Center of the Helmholtz Association
www.kit.edu
2. Motivation
Linked Data sources vary considerably in quality.
Some quality issues require interpretation that can easily be performed by humans:
dbpedia:Dave_Dobbyn dbprop:dateOfBirth "3" .
Solution: include human verification in the process of LD quality assessment.
Direct application: detecting patterns in errors may make it possible to identify (and correct) faulty extraction mechanisms.
28.10.2013
Acosta et al. – Crowdsourcing Linked Data Quality Assessment
Institut für Angewandte Informatik und Formale
Beschreibungsverfahren (AIFB)
3. Research questions
RQ1: Is it possible to detect quality issues in LD data sets via crowdsourcing mechanisms?
RQ2: What type of crowd is most suitable for each type of quality issue?
RQ3: Which types of errors are made by lay users and experts when assessing RDF triples?
4. Related work
Crowdsourcing & Linked Data:
- DBpedia: assessing LD mappings
- ZenCrowd: entity resolution
- CrowdMAP: ontology alignment
Web of data quality assessment (quality characteristics of LD data sources):
- Automatic and semi-automatic approaches: WIQA, Sieve
- Manual approaches: GWAP for LD
- Our work
5. OUR APPROACH
6. Methodology
[Diagram: triples {s p o .} from the dataset are assessed and classified as either correct or incorrect with an associated quality issue]
Steps to implement the methodology:
1. Selecting LD quality issues to crowdsource
2. Selecting the appropriate crowdsourcing approaches
3. Designing and generating the interfaces to present the data to the crowd
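The two-stage flow can be sketched in a few lines of Python. This is an illustrative sketch, not code from the paper: the names `Triple`, `find_stage`, and `verify_stage` are assumptions, and the majority rule over five assignments follows the Verify-stage setup described in the deck.

```python
from dataclasses import dataclass

@dataclass(frozen=True)  # frozen makes Triple hashable, so it can key dicts
class Triple:
    s: str
    p: str
    o: str

def find_stage(dataset, expert_flags):
    """Find stage: keep only the triples that experts flagged as suspicious."""
    return [t for t in dataset if expert_flags.get(t, False)]

def verify_stage(candidates, worker_votes, assignments=5):
    """Verify stage: keep a candidate only if a strict majority of its
    worker votes (up to `assignments` of them) say 'incorrect'."""
    confirmed = []
    for t in candidates:
        votes = worker_votes.get(t, [])[:assignments]
        if votes and votes.count("incorrect") > len(votes) / 2:
            confirmed.append(t)
    return confirmed
```

For example, the date-of-birth triple from the motivation slide would be flagged by an expert in the Find stage and then confirmed (or rejected) by five worker votes in the Verify stage.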
7. Step 1: Selecting LD quality issues to crowdsource
Three categories of quality problems occur in DBpedia [Zaveri2013] and can be crowdsourced:
Incorrect object
Example: dbpedia:Dave_Dobbyn dbprop:dateOfBirth "3" .
Incorrect data type or language tag
Example: dbpedia:Torishima_Izu_Islands foaf:name "…"@en .
Incorrect link to external Web pages
Example: dbpedia:John-Two-Hawks dbpedia-owl:wikiPageExternalLink <http://cedarlakedvd.com/> .
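To make concrete why values like these are suspect, here are two toy heuristics, one per issue type shown above. These regexes are my own illustration of what a machine can pre-filter, not the paper's method; the whole point of the deck is that the remaining judgement calls need humans.

```python
import re

# A date literal should look like "1957" or "1957-01-03"; a bare "3" does not.
DATE_RE = re.compile(r"^\d{4}(-\d{2}-\d{2})?$")

def suspicious_date(literal: str) -> bool:
    """Flag object values that do not look like a year or an ISO date."""
    return DATE_RE.match(literal) is None

def suspicious_en_literal(literal: str) -> bool:
    """Flag @en-tagged literals with no Latin letters at all,
    a hint that the language tag is wrong (as in the Torishima example)."""
    return re.search(r"[A-Za-z]", literal) is None
```

The third category (incorrect external links) resists this kind of check entirely: deciding whether http://cedarlakedvd.com/ is a sensible link for a page requires a human to look at it.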
8. Step 2: Selecting appropriate crowdsourcing approaches
Find stage: a contest among LD experts (difficult task, final prize), supported by TripleCheckMate [Kontoskostas2013]
Verify stage: microtasks for workers (easy task, micropayments), run on MTurk (http://mturk.com)
Adapted from [Bernstein2010]
9. Step 3: Presenting the data to the crowd
Microtask interfaces: MTurk tasks
Incorrect object:
- Selection of foaf:name or rdfs:label to extract human-readable descriptions
- Values extracted automatically from Wikipedia infoboxes
- Link to the Wikipedia article via foaf:isPrimaryTopicOf
Incorrect data type or language tag
Incorrect outlink:
- Preview of external pages by embedding an HTML iframe
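A microtask interface ultimately boils down to rendering one question per triple. The helper below is a hypothetical sketch of that step (the real MTurk tasks in the paper also embedded Wikipedia links and page previews, which plain text cannot show):

```python
def render_task(subject_label: str, predicate_label: str, value: str) -> str:
    """Build the question text shown to a worker for one triple,
    using human-readable labels rather than raw URIs."""
    return (f"Is '{value}' a correct value of '{predicate_label}' "
            f"for '{subject_label}'? (correct / incorrect)")
```

For instance, `render_task("Dave Dobbyn", "date of birth", "3")` yields a question a lay worker can answer by glancing at the linked Wikipedia article.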
10. EXPERIMENTAL STUDY
11. Experimental design
• Crowdsourcing approaches:
• Find stage: Contest with LD experts
• Verify stage: Microtasks (5 assignments)
• Creation of a gold standard:
• Two of the authors of this paper (MA, AZ) generated the gold
standard for all the triples obtained from the contest
• Each author independently evaluated the triples
• Conflicts were resolved via mutual agreement
• Metric: precision
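The "5 assignments" in the Verify stage means each triple is judged by five workers and the answers are aggregated by majority voting. A minimal sketch of that aggregation (the function name is illustrative):

```python
from collections import Counter

def majority_vote(answers):
    """Return the answer given by a strict majority of workers, else None.
    With n=5 assignments and two possible answers there is always a majority."""
    if not answers:
        return None
    top, count = Counter(answers).most_common(1)[0]
    return top if count > len(answers) / 2 else None
```

With five binary answers a tie is impossible, which is one practical reason for choosing an odd number of assignments.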
12. Overall results

                                 LD Experts              Microtask workers
Number of distinct participants  50                      80
Total time                       3 weeks (predefined)    4 days
Total triples evaluated          1,512                   1,073
Total cost                       ~US$ 400 (predefined)   ~US$ 43
13. Precision results: Incorrect object task
MTurk workers can be used to reduce the error rates of LD experts for the Find stage.

Triples compared: 509
LD Experts: 0.7151
MTurk (majority voting, n=5): 0.8977

- 117 DBpedia triples had predicates related to dates with incorrect/incomplete values:
  "2005 Six Nations Championship" Date 12 .
- 52 DBpedia triples had erroneous values from the source:
  "English (programming language)" Influenced by ? .
- Experts classified all of these triples as incorrect.
- Workers compared the values against Wikipedia and successfully classified these triples as "correct".
14. Precision results: Incorrect data type task

Triples compared: 341
LD Experts: 0.8270
MTurk (majority voting, n=5): 0.4752

[Bar chart: number of triples (TP and FP, for experts and crowd) per data type: Date, English, Millimetre, Nanometre, Number, Number with decimals, Second, Volt, Year, Not specified / URI]
15. Precision results: Incorrect link task

Triples compared: 223
Baseline: 0.2598
LD Experts: 0.1525
MTurk (majority voting, n=5): 0.9412

- We analyzed the 189 misclassifications by the experts:
  [Pie chart: misclassified links by type — Freebase links, Wikipedia images, external links (50%, 39%, 11%)]
- The 6% misclassifications by the workers correspond to pages in a language other than English.
16. Final discussion
RQ1: Is it possible to detect quality issues in LD data sets via crowdsourcing mechanisms?
Both forms of crowdsourcing can be applied to detect certain LD quality issues.
RQ2: What type of crowd is most suitable for each type of quality issue?
The effort of LD experts should be reserved for tasks demanding domain-specific skills. The MTurk crowd was exceptionally good at performing data comparisons.
RQ3: Which types of errors are made by lay users and experts?
Lay users lack the skills to solve domain-specific tasks, while the experts' performance is very low on tasks that demand extra effort (e.g., checking an external page).
17. CONCLUSIONS & FUTURE WORK
18. Conclusions & Future Work
A crowdsourcing methodology for LD quality assessment:
- Find stage: LD experts
- Verify stage: MTurk workers
Crowdsourcing approaches are feasible for detecting the studied quality issues.
Application: detecting patterns in errors to fix the extraction mechanisms.
Future Work:
- Conducting new experiments (other quality issues and domains)
- Integration of the crowd into curation processes and tools
19. References & Acknowledgements
[Bernstein2010] M. S. Bernstein, G. Little, R. C. Miller, B. Hartmann, M. S. Ackerman, D. R. Karger, D. Crowell, and K. Panovich. Soylent: a word processor with a crowd inside. In Proceedings of the 23rd Annual ACM Symposium on User Interface Software and Technology (UIST '10), pages 313–322, New York, NY, USA, 2010. ACM.
[Kontoskostas2013] D. Kontokostas, A. Zaveri, S. Auer, and J. Lehmann. TripleCheckMate: A Tool for Crowdsourcing the Quality Assessment of Linked Data. In Knowledge Engineering and the Semantic Web, 2013.
[Zaveri2013] A. Zaveri, A. Rula, A. Maurino, R. Pietrobon, J. Lehmann, and S. Auer. Quality assessment methodologies for linked open data. Under review, http://www.semantic-web-journal.net/content/quality-assessment-methodologies-linked-open-data.
20. Approach (summary)
Find stage: contest with LD experts (difficult task, final prize), using TripleCheckMate.
Verify stage: MTurk microtasks for workers (easy task, micropayments), covering the three issue types: incorrect object, incorrect data type, incorrect outlink.

Results: Precision
                          Object values   Data types   Interlinks
Linked Data experts       0.7151          0.8270       0.1525
MTurk (majority voting)   0.8977          0.4752       0.9412

QUESTIONS?
Editor's Notes
As we know, the Linking Open Data cloud is a great source of data. However, the varying quality of Linked Data sets often imposes serious problems on developers aiming to consume and integrate LD in their applications. Leaving aside the factual flaws of the original sources, several quality issues are introduced during the RDFication process. Solution: include human verification in the process of LD quality assessment in order to detect the quality issues that cannot easily be detected by other means. Direct application: detecting patterns in errors may make it possible to identify (and correct) the extraction mechanisms.
TP = a triple that is identified as "incorrect" by the crowd, and the triple is indeed incorrect. FP = a triple identified as "incorrect" by the crowd, but that was actually correct in the data set.
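Given these TP/FP definitions, the precision metric used throughout the results slides is simply TP / (TP + FP). A one-line sketch, with the zero-denominator case handled defensively:

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of crowd-flagged 'incorrect' triples that are truly incorrect."""
    return tp / (tp + fp) if (tp + fp) else 0.0
```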