The document discusses link traversal based query execution for querying linked data on the web. It describes an approach that alternates between evaluating parts of a query on a continuously augmented local dataset, and looking up URIs in solutions to retrieve more data and add it to the local dataset. This allows querying linked data as if it were a single large database, without needing to know all data sources in advance. A key issue is how to efficiently cache retrieved data to avoid redundant lookups.
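The alternation described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the approach's actual implementation: the `WEB` dictionary is a hypothetical in-memory stand-in for dereferencing URIs over HTTP, and the names `CachingLookup`, `match`, and `execute` are invented for this example. The cache ensures that each URI is fetched at most once.

```python
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

# Hypothetical stand-in for the Web: each URI "dereferences" to some triples.
WEB: Dict[str, List[Triple]] = {
    "ex:alice": [("ex:alice", "knows", "ex:bob")],
    "ex:bob": [("ex:bob", "knows", "ex:carol"), ("ex:bob", "name", "Bob")],
    "ex:carol": [("ex:carol", "name", "Carol")],
}


class CachingLookup:
    """Dereferences URIs and caches the result to avoid redundant lookups."""

    def __init__(self, web: Dict[str, List[Triple]]) -> None:
        self.web = web
        self.cache: Dict[str, List[Triple]] = {}
        self.fetches = 0  # number of lookups that were not served from cache

    def dereference(self, uri: str) -> List[Triple]:
        if uri not in self.cache:
            self.fetches += 1
            self.cache[uri] = self.web.get(uri, [])
        return self.cache[uri]


def match(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    extended = dict(binding)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):  # variable
            if extended.get(p, t) != t:
                return None
            extended[p] = t
        elif p != t:  # constant that does not match
            return None
    return extended


def execute(patterns: List[Triple], seeds: List[str], lookup: CachingLookup) -> List[Binding]:
    """Alternate between URI lookups and evaluating one triple pattern at a time."""
    dataset: Set[Triple] = set()
    for uri in seeds:
        dataset.update(lookup.dereference(uri))
    solutions: List[Binding] = [{}]
    for pattern in patterns:
        next_solutions: List[Binding] = []
        for binding in solutions:
            # Look up every value bound so far and augment the local dataset.
            for value in binding.values():
                dataset.update(lookup.dereference(value))
            for triple in list(dataset):
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


lookup = CachingLookup(WEB)
query = [("ex:alice", "knows", "?x"), ("?x", "knows", "?y"), ("?y", "name", "?n")]
print(execute(query, ["ex:alice"], lookup), lookup.fetches)
```

In this toy run, `ex:bob` is needed by two successive patterns but is fetched only once, which is the redundancy that caching is meant to remove.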
Full-Text Retrieval in Unstructured P2P Networks using Bloom Cast Efficiently - ijsrd.com
Efficient and effective full-text retrieval in unstructured peer-to-peer networks remains a challenge in the research community. First, it is difficult, if not impossible, for unstructured P2P systems to effectively locate items with guaranteed recall. Second, existing schemes to improve search success rate often rely on replicating a large number of item replicas across the wide area network, incurring a large amount of communication and storage costs. In this paper, we propose BloomCast, an efficient and effective full-text retrieval scheme in unstructured P2P networks. By leveraging a hybrid P2P protocol, BloomCast replicates the items uniformly at random across the P2P networks, achieving a guaranteed recall at a communication cost of O(N), where N is the size of the network. Furthermore, by casting Bloom Filters instead of the raw documents across the network, BloomCast significantly reduces the communication and storage costs for replication. Results show that BloomCast achieves an average query recall that outperforms the existing WP algorithm by 18 percent, while BloomCast greatly reduces the search latency for query processing by 57 percent.
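To illustrate why replicating Bloom filters is cheaper than replicating raw documents, here is a minimal Bloom filter sketch in Python. It is not BloomCast's implementation: the class `BloomFilter`, the helpers `summarize` and `may_match`, and the parameters `num_bits` and `num_hashes` are assumptions made for this example. A filter never produces false negatives, but it may report false positives.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Fixed-size bit array summarizing a set of terms (illustrative only)."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        # Derive several bit positions by salting one hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


def summarize(document: str) -> BloomFilter:
    """Build the filter that a peer would replicate instead of the document."""
    bloom = BloomFilter()
    for term in document.lower().split():
        bloom.add(term)
    return bloom


def may_match(bloom: BloomFilter, query: str) -> bool:
    """True if every query term may be in the document (false positives possible)."""
    return all(term in bloom for term in query.lower().split())


summary = summarize("Full text retrieval in unstructured peer to peer networks")
print(len(summary.bits), may_match(summary, "full text retrieval"))
```

With these assumed parameters the summary occupies 128 bytes regardless of document length, so a peer can check locally whether a remote document may match a query before requesting it.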
Abstract:
An increasing number of applications rely on RDF, OWL 2, and SPARQL for storing and querying data. SPARQL, however, is not targeted towards end-users, and suitable query interfaces are needed. Faceted search is a prominent approach for end-user data access, and several RDF-based faceted search systems have been developed. There is, however, a lack of rigorous theoretical underpinning for faceted search in the context of RDF and OWL 2. In this paper, we provide such solid foundations. We formalise faceted interfaces for this context, identify a fragment of first-order logic capturing the underlying queries, and study the complexity of answering such queries for RDF and OWL 2 profiles. We then study interface generation and update, and devise efficiently implementable algorithms. Finally, we have implemented and tested our faceted search algorithms for scalability, with encouraging results.
Rethinking Online SPARQL Querying to Support Incremental Result Visualization - Olaf Hartig
These are the slides of my invited talk at the 5th Int. Workshop on Usage Analysis and the Web of Data (USEWOD 2015): http://usewod.org/usewod2015.html
The abstract of this talk is as follows:
To reduce user-perceived response time many interactive Web applications visualize information in a dynamic, incremental manner. Such an incremental presentation can be particularly effective for cases in which the underlying data processing systems are not capable of completely answering the users' information needs instantaneously. An example of such systems are systems that support live querying of the Web of Data, in which case query execution times of several seconds, or even minutes, are an inherent consequence of these systems' ability to guarantee up-to-date results. However, support for an incremental result visualization has not received much attention in existing work on such systems. Therefore, the goal of this talk is to discuss approaches that enable query systems for the Web of Data to return query results incrementally.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
Template-based information access, in which templates are constructed for keywords, is a recent development of linked data information retrieval. However, most such approaches suffer from ineffective template management. Because linked data has a structured data representation, we assume the data’s internal statistics can effectively influence template management. In this work, we use this influence for template creation, template ranking, and scaling. Our proposal can effectively be used for automatic linked data information retrieval and can be incorporated with other techniques such as ontology inclusion and sophisticated matching to further improve performance.
A Scalable Approach for Efficiently Generating Structured Dataset Topic Profiles - Besnik Fetahu
The increasing adoption of Linked Data principles has led to an abundance of datasets on the Web. However, take-up and reuse is hindered by the lack of descriptive information about the nature of the data, such as their topic coverage, dynamics or evolution. To address this issue, we propose an approach for creating linked dataset profiles. A profile consists of structured dataset metadata describing topics and their relevance. Profiles are generated through the configuration of techniques for resource sampling from datasets, topic extraction from reference datasets and their ranking based on graphical models. To enable a good trade-off between scalability and accuracy of generated profiles, appropriate parameters are determined experimentally. Our evaluation considers topic profiles for all accessible datasets from the Linked Open Data cloud. The results show that our approach generates accurate profiles even with comparably small sample sizes (10%) and outperforms established topic modelling approaches.
TEXT CLUSTERING USING INCREMENTAL FREQUENT PATTERN MINING APPROACH - IJDKP
Text mining is an emerging research field evolving from the information retrieval area. Clustering and classification are the two approaches in data mining which may also be used to perform text classification and text clustering. The former is supervised while the latter is unsupervised. In this paper, our objective is to perform text clustering by defining an improved distance metric to compute the similarity between two text files. We use incremental frequent pattern mining to find frequent items and reduce dimensionality. The improved distance metric may also be used to perform text classification. The distance metric is validated for the worst, average and best case situations [15]. The results show the proposed distance metric outperforms the existing measures.
Keyword-based linked data information retrieval is an easy choice for general purpose users, but implementation of such an approach is a challenge because a mere keyword does not hold semantics. Some studies have incorporated templates in an effort to bridge this gap, but most such approaches have proven ineffective because of inefficient template management. Because linked data can be represented in a structured format, we can assume that the data's internal statistics can be used to effectively influence template management. In this work, we explore the use of this influence for template creation, ranking, and scaling. Then, we demonstrate how our proposal for automatic linked data information retrieval can be used alongside familiar keyword-based information retrieval methods, and can also be incorporated alongside other techniques, such as ontology inclusion and sophisticated matching, to achieve increased levels of performance.
Profile-based Dataset Recommendation for RDF Data Linking - Mohamed BEN ELLEFI
With the emergence of the Web of Data, most notably Linked Open Data (LOD), an abundance of data has become available on the web. However, LOD datasets and their inherent subgraphs vary heavily with respect to their size, topic and domain coverage, the schemas and their data dynamicity (respectively schemas and metadata) over the time. To this extent, identifying suitable datasets, which meet specific criteria, has become an increasingly important, yet challenging task to support issues such as entity retrieval or semantic search and data linking. Particularly with respect to the interlinking issue, the current topology of the LOD cloud underlines the need for practical and efficient means to recommend suitable datasets: currently, only well-known reference graphs such as DBpedia (the most obvious target), YAGO or Freebase show a high amount of in-links, while there exists a long tail of potentially suitable yet under-recognized datasets. This problem is due to the semantic web tradition in dealing with "finding candidate datasets to link to", where data publishers are used to identify target datasets for interlinking.
While an understanding of the nature of the content of specific datasets is a crucial prerequisite for the mentioned issues, we adopt in this dissertation the notion of "dataset profile": a set of features that describe a dataset and allow the comparison of different datasets with regard to their represented characteristics. Our first research direction was to implement a collaborative filtering-like dataset recommendation approach, which exploits both existing dataset topic profiles, as well as traditional dataset connectivity measures, in order to link LOD datasets into a global dataset-topic-graph. This approach relies on the LOD graph in order to learn the connectivity behaviour between LOD datasets. However, experiments have shown that the current topology of the LOD cloud group is far from being complete to be considered as a ground truth and consequently as learning data.
Facing the limits of the current topology of LOD (as learning data), our research has led us to break away from the topic profiles representation of the "learn to rank" approach and to adopt a new approach for candidate datasets identification where the recommendation is based on the intensional profiles overlap between different datasets. By intensional profile, we understand the formal representation of a set of schema concept labels that best describe a dataset and can be potentially enriched
Comparative analysis of relative and exact search for web information retrieval - eSAT Journals
Abstract: The volume of data in web repositories is huge, and getting specific and precise information from a web repository is a big challenge. Existing Information Retrieval (IR) techniques, given by contemporary researchers, are very useful in the field of IR. Here, the authors have implemented and tested two of these techniques, Relative Search and Exact Search, one by one. Relative search was first tested on web repository data using a web mining tool, and its results were analyzed. In the same manner, the exact search technique was tested on web repository data and the results were measured. The researchers observed the significance of both exact search and relative search. The focus of the research paper is to retrieve relevant information from the web information repository, which can be done using these two search criteria. With the suggested methods, searchers may retrieve relevant web data in less time. Key Words: Web data Mining, Exact Search, Relative Search, PR, TM, CD, VSM and TASE
Directed versus undirected network analysis of student essays - Roy Clariana
IWALS 2018
6th International Workshop on Advanced Learning Sciences
Perspectives on the Learner: Cognition, Brain, and Education
University of Pittsburgh, USA JUNE 6-8, 2018
The Linked Data and Services presentation was presented by Andreas Harth (KIT) and Barry Norton (KIT) at the PlanetData project Kick-off Meeting on October 11, 2010 in Palma de Mallorca, Spain.
Metabolomic Data Analysis Workshop and Tutorials (2014) - Dmitry Grapov
Get more information:
http://imdevsoftware.wordpress.com/2014/10/11/2014-metabolomic-data-analysis-and-visualization-workshop-and-tutorials/
Recently I had the pleasure of teaching statistical and multivariate data analysis and visualization at the annual Summer Sessions in Metabolomics 2014, organized by the NIH West Coast Metabolomics Center.
Similar to last year, I’ve posted all the content (lectures, labs and software) for anyone to follow along with at their own pace. I also plan to release videos for all the lectures and labs.
Query Processing: Query Processing Problem, Layers of Query Processing. Query Processing in Centralized Systems – Parsing & Translation, Optimization, Code Generation, Example. Query Processing in Distributed Systems – Mapping global query to local, Optimization,
Efficient, Scalable, and Provenance-Aware Management of Linked Data - eXascale Infolab
The proliferation of heterogeneous Linked Data on the Web requires data management systems to constantly improve their scalability and efficiency. Despite recent advances in distributed Linked Data management, efficiently processing large amounts of Linked Data in a scalable way is still very challenging. In spite of their seemingly simple data models, Linked Data actually encode rich and complex graphs mixing both instance and schema level data. At the same time, users are increasingly interested in investigating or visualizing large collections of online data by performing complex analytic queries. The heterogeneity of Linked Data on the Web also poses new challenges to database systems. The capacity to store, track, and query provenance data is becoming a pivotal feature of Linked Data Management Systems. In this thesis, we tackle issues revolving around processing queries on big, unstructured, and heterogeneous Linked Data graphs.
Executing Provenance-Enabled Queries over Web Data - eXascale Infolab
The proliferation of heterogeneous Linked Data on the Web poses new challenges to database systems. In particular, because of this heterogeneity, the capacity to store, track, and query provenance data is becoming a pivotal feature of modern triple stores. In this paper, we tackle the problem of efficiently executing provenance-enabled queries over RDF data. We propose, implement and empirically evaluate five different query execution strategies for RDF queries that incorporate knowledge of provenance. The evaluation is conducted on Web Data obtained from two different Web crawls (The Billion Triple Challenge, and the Web Data Commons). Our evaluation shows that using an adaptive query materialization execution strategy performs best in our context. Interestingly, we find that because provenance is prevalent within Web Data and is highly selective, it can be used to improve query processing performance. This is a counterintuitive result as provenance is often associated with additional overhead.
Paul Groth: Data Analysis in a Changing Discourse: The Challenges of Scholarl... - COST Action TD1210
Paul Groth (Elsevier), “Data Analysis in a Changing Discourse: The Challenges of Scholarly Communication”
Presentation at the KnoweScape workshop "Evolution and variation of classification systems", March 4-5, 2015, Amsterdam
LDQL: A Query Language for the Web of Linked Data - Olaf Hartig
I used this slideset to present our research paper at the 14th Int. Semantic Web Conference (ISWC 2015). Find a preprint of the paper here:
http://olafhartig.de/files/HartigPerez_ISWC2015_Preprint.pdf
An Overview on PROV-AQ: Provenance Access and Query - Olaf Hartig
The slides I used at the Dagstuhl seminar on Principles of Provenance (Feb. 2012) to present the main contributions and open issues of the PROV-AQ document created by the W3C provenance working group.
Dr. Sean Tan, Head of Data Science, Changi Airport Group
Discover how Changi Airport Group (CAG) leverages graph technologies and generative AI to revolutionize their search capabilities. This session delves into the unique search needs of CAG’s diverse passengers and customers, showcasing how graph data structures enhance the accuracy and relevance of AI-generated search results, mitigating the risk of “hallucinations” and improving the overall customer journey.
UiPath Test Automation using UiPath Test Suite series, part 5 - DianaGray10
Welcome to the UiPath Test Automation using UiPath Test Suite series, part 5. In this session, we will cover CI/CD with DevOps.
Topics covered:
CI/CD within UiPath
End-to-end overview of a CI/CD pipeline with Azure DevOps
Speaker:
Lyndsey Byblow, Test Suite Sales Engineer @ UiPath, Inc.
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024 - Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview, including the concepts of Customer Key and Double Key Encryption.
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor... - SOFTTECHHUB
The choice of an operating system plays a pivotal role in shaping our computing experience. For decades, Microsoft's Windows has dominated the market, offering a familiar and widely adopted platform for personal and professional use. However, as technological advancements continue to push the boundaries of innovation, alternative operating systems have emerged, challenging the status quo and offering users a fresh perspective on computing.
One such alternative that has garnered significant attention and acclaim is Nitrux Linux 3.5.0, a sleek, powerful, and user-friendly Linux distribution that promises to redefine the way we interact with our devices. With its focus on performance, security, and customization, Nitrux Linux presents a compelling case for those seeking to break free from the constraints of proprietary software and embrace the freedom and flexibility of open-source computing.
DevOps and Testing slides at DASA Connect - Kari Kakkonen
Slides by me and Rik Marselis from the DASA Connect conference on 30.5.2024. We discuss what testing is, then what agile testing is, and finally what Testing in DevOps is. We closed with a lovely workshop in which the participants explored different ways to think about quality and testing in different parts of the DevOps infinity loop.
Communications Mining Series - Zero to Hero - Session 1 - DianaGray10
This session provides an introduction to UiPath Communication Mining, its importance, and a platform overview. You will acquire a good understanding of the phases in Communication Mining as we go over the platform with you. Topics covered:
• Communication Mining Overview
• Why is it important?
• How can it help today’s business and the benefits
• Phases in Communication Mining
• Demo on Platform overview
• Q/A
Securing your Kubernetes cluster: a step-by-step guide to success! - KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Climate Impact of Software Testing at Nordic Testing DaysKari Kakkonen
My slides at Nordic Testing Days 6.6.2024
Climate impact / sustainability of software testing discussed on the talk. ICT and testing must carry their part of global responsibility to help with the climat warming. We can minimize the carbon footprint but we can also have a carbon handprint, a positive impact on the climate. Quality characteristics can be added with sustainability, and then measured continuously. Test environments can be used less, and in smaller scale and on demand. Test techniques can be used in optimizing or minimizing number of tests. Test automation can be used to speed up testing.
Unlocking Productivity: Leveraging the Potential of Copilot in Microsoft 365, a presentation by Christoforos Vlachos, Senior Solutions Manager – Modern Workplace, Uni Systems
In the rapidly evolving landscape of technologies, XML continues to play a vital role in structuring, storing, and transporting data across diverse systems. The recent advancements in artificial intelligence (AI) present new methodologies for enhancing XML development workflows, introducing efficiency, automation, and intelligent capabilities. This presentation will outline the scope and perspective of utilizing AI in XML development. The potential benefits and the possible pitfalls will be highlighted, providing a balanced view of the subject.
We will explore the capabilities of AI in understanding XML markup languages and autonomously creating structured XML content. Additionally, we will examine the capacity of AI to enrich plain text with appropriate XML markup. Practical examples and methodological guidelines will be provided to elucidate how AI can be effectively prompted to interpret and generate accurate XML markup.
Further emphasis will be placed on the role of AI in developing XSLT, or schemas such as XSD and Schematron. We will address the techniques and strategies adopted to create prompts for generating code, explaining code, or refactoring the code, and the results achieved.
The discussion will extend to how AI can be used to transform XML content. In particular, the focus will be on the use of AI XPath extension functions in XSLT, Schematron, Schematron Quick Fixes, or for XML content refactoring.
The presentation aims to deliver a comprehensive overview of AI usage in XML development, providing attendees with the necessary knowledge to make informed decisions. Whether you’re at the early stages of adopting AI or considering integrating it in advanced XML development, this presentation will cover all levels of expertise.
By highlighting the potential advantages and challenges of integrating AI with XML development tools and languages, the presentation seeks to inspire thoughtful conversation around the future of XML development. We’ll not only delve into the technical aspects of AI-powered XML development but also discuss practical implications and possible future directions.
GridMate - End to end testing is a critical piece to ensure quality and avoid...ThomasParaiso2
End to end testing is a critical piece to ensure quality and avoid regressions. In this session, we share our journey building an E2E testing pipeline for GridMate components (LWC and Aura) using Cypress, JSForce, FakerJS…
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024Neo4j
Neha Bajwa, Vice President of Product Marketing, Neo4j
Join us as we explore breakthrough innovations enabled by interconnected data and AI. Discover firsthand how organizations use relationships in data to uncover contextual insights and solve our most pressing challenges – from optimizing supply chains, detecting fraud, and improving customer experiences to accelerating drug discoveries.
The Impact of Data Caching on Query Execution for Linked Data
1. The Impact of Data Caching on Query Execution for Linked Data
Olaf Hartig
http://olafhartig.de/foaf.rdf#olaf
@olafhartig
Database and Information Systems Research Group
Humboldt-Universität zu Berlin
2. Can we query the Web of Data as if it were a single, giant database?
SELECT DISTINCT ?i ?label
WHERE {
  ?prof rdf:type <http://res ... data/dbprofs#DBProfessor> ;
        foaf:topic_interest ?i .
  OPTIONAL {
    ?i rdfs:label ?label
    FILTER( LANG(?label)="en" || LANG(?label)="")
  }
}
ORDER BY ?label
Our approach: Link Traversal Based Query Execution [ISWC'09]
Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 2
3. Main Idea
● Intertwine query evaluation with traversal of data links
● We alternate between:
● Evaluate parts of the query (triple patterns)
on a continuously augmented set of data
● Look up URIs in intermediate
solutions and add retrieved data
to the query-local dataset
4-21. Main Idea (example)
[Figure sequence: step-by-step execution of an example query over an initially empty query-local dataset]
● Query (three triple patterns):
    <http://bob.name> knows ?acq .
    ?acq project ?prj .
    ?prj name ?prjName .
● Look up http://bob.name; the retrieved data (a “descriptor object”) is added to the query-local dataset
● The query-local dataset now contains: <http://bob.name> knows <http://alice.name>
● Intermediate solution: ?acq = http://alice.name
● Look up http://alice.name and add the retrieved data to the query-local dataset
● The query-local dataset now contains: <http://alice.name> project <http://.../AlicesPrj>
● Intermediate solution: ?acq = http://alice.name, ?prj = http://.../AlicesPrj
● Intermediate solution: ?prj = http://.../AlicesPrj, ?prjName = “…“
● Query result: ?acq = http://alice.name, ?prj = http://.../AlicesPrj, ?prjName = “…“
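The alternation between evaluating triple patterns and looking up URIs can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation behind these slides: the in-memory `WEB` dictionary stands in for HTTP lookups, the URI `http://example.org/AlicesPrj` and the project name are hypothetical stand-ins for the elided values in the example, and the patterns are evaluated strictly in the given order.

```python
from typing import Callable, Dict, Iterable, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def is_var(term: str) -> bool:
    return term.startswith("?")


def is_uri(term: str) -> bool:
    return term.startswith(("http://", "https://"))


def match(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Return `binding` extended so that `pattern` equals `triple`, else None."""
    extended = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if extended.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return extended


def execute(patterns: List[Triple],
            lookup: Callable[[str], Iterable[Triple]]) -> List[Binding]:
    dataset: Set[Triple] = set()    # the query-local dataset
    looked_up: Set[str] = set()

    def dereference(term: str) -> None:
        # Look up each URI at most once and add the retrieved triples.
        if is_uri(term) and term not in looked_up:
            looked_up.add(term)
            dataset.update(lookup(term))

    for pattern in patterns:        # URIs in the query are the starting points
        for term in pattern:
            dereference(term)

    solutions: List[Binding] = [{}]
    for pattern in patterns:
        extended_solutions: List[Binding] = []
        for binding in solutions:
            bound = tuple(binding.get(term, term) for term in pattern)
            for triple in list(dataset):  # snapshot: lookups below grow the set
                extended = match(bound, triple, binding)
                if extended is not None:
                    for value in extended.values():
                        dereference(value)  # traverse links in the solution
                    extended_solutions.append(extended)
        solutions = extended_solutions
    return solutions


# Hypothetical stand-in for the Web: URI -> triples returned by a lookup.
WEB = {
    "http://bob.name": {("http://bob.name", "knows", "http://alice.name")},
    "http://alice.name": {("http://alice.name", "project", "http://example.org/AlicesPrj")},
    "http://example.org/AlicesPrj": {("http://example.org/AlicesPrj", "name", "Alice's project")},
}
lookup_log: List[str] = []


def lookup(uri: str) -> Set[Triple]:
    lookup_log.append(uri)
    return WEB.get(uri, set())


QUERY = [("http://bob.name", "knows", "?acq"),
         ("?acq", "project", "?prj"),
         ("?prj", "name", "?prjName")]
solutions = execute(QUERY, lookup)
```

In this sketch the only URI known in advance is the one in the query; the other two lookups are triggered by intermediate solutions, which mirrors the "no need to know all data sources in advance" property.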
22. Characteristics
● Link traversal based query execution:
● Evaluation on a continuously augmented dataset
● Discovery of potentially relevant data during execution
● Discovery driven by intermediate solutions
● Main advantage:
● No need to know all data sources in advance
● Limitations:
● Query has to contain a URI as a starting point
● Ignores data that is not reachable* by the query execution
* Formal definition in [LDOW'11a]
23-27. The Issue
[Figure sequence: a second query that also starts at http://bob.name is executed over its own, initially empty query-local dataset]
● Query:
    <http://bob.name> knows ?acq .
    ?acq interest ?i .
    ?i label ?iLabel .
● Look up http://bob.name
● Retrieved data, added to the query-local dataset: <http://bob.name> knows <http://alice.name>
● Execution continues until solutions for ?acq, ?i, ?iLabel are found
● This query and the project query from the Main Idea example each populate a separate query-local dataset
28-30. Reusing the Query-Local Dataset
[Figure sequence: the interest query is evaluated over the query-local dataset populated by the project query]
● The reused dataset already contains: <http://bob.name> knows <http://alice.name>
● Intermediate solution found directly in the reused dataset: ?acq = http://alice.name
31. Hypothesis
Re-using the query-local dataset (a.k.a. data caching) may benefit query performance and result completeness
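One way to picture data caching in this setting is a wrapper that keeps every retrieved descriptor object and serves repeated lookups of the same URI from memory. The sketch below is an assumption-laden illustration, not the engine used in the experiments: `CachingLookup` and the `fake_lookup` function are hypothetical names, the cache is unbounded, and nothing is ever invalidated.

```python
from typing import Callable, Dict, FrozenSet, Iterable, Tuple

Triple = Tuple[str, str, str]


class CachingLookup:
    """Wraps a URI lookup function with an unbounded cache of descriptor objects."""

    def __init__(self, lookup: Callable[[str], Iterable[Triple]]) -> None:
        self._lookup = lookup
        self._cache: Dict[str, FrozenSet[Triple]] = {}
        self.hits = 0
        self.misses = 0

    def __call__(self, uri: str) -> FrozenSet[Triple]:
        if uri in self._cache:
            self.hits += 1          # served from the cache, no retrieval
        else:
            self.misses += 1        # first request: retrieve and remember
            self._cache[uri] = frozenset(self._lookup(uri))
        return self._cache[uri]

    @property
    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


# Hypothetical lookup function standing in for an HTTP retrieval.
def fake_lookup(uri: str) -> Iterable[Triple]:
    return {(uri, "knows", "http://alice.name")}


cached = CachingLookup(fake_lookup)
first = cached("http://bob.name")    # miss: data is retrieved
second = cached("http://bob.name")   # hit: data comes from the cache
```

If the same `CachingLookup` instance is shared by a sequence of queries, later queries see data retrieved for earlier ones, which is the reuse the hypothesis refers to.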
32. Contributions
● Systematic analysis of the impact of data caching
● Theoretical foundation*
● Conceptual analysis*
● Empirical evaluation of the potential impact
● Out of scope: Caching strategies (replacement, invalidation)
* See [LDOW'11a]
33. Experiment – Scenario
● Information about the distributed social network of FOAF profiles
● 5 types of queries
● Experiment setup:
● 20 persons
● Sequential use
➔ 100 queries (5 query types × 20 persons)
34-38. Experiment – Single Query
● no reuse experiment:
● No data caching
● reuse per query experiment:
● Reuse of query-local dataset for 3 executions of each query
● Third execution measured
[Figure: two bar charts comparing "no reuse" and "reuse per query" for the queries ContactInfoDanBri (Query No. 61), UnsetPropsDanBri (Query No. 62), 2ndDegree1DanBri (Query No. 63), 2ndDegree2DanBri (Query No. 64), and IncomingDanBri (Query No. 65). Left chart: number of query results (axis 0 to 60). Right chart: query execution time in seconds (logarithmic axis, 0.01 to 100).]
39-43. Experiment – Complete Sequence
● reuse all queries experiment:
● Reuse of the query-local dataset for the complete sequence of all 100 queries
[Figure: two bar charts comparing "no reuse", "reuse per query", and "reuse all queries" for the queries ContactInfoDanBri (Query No. 61), UnsetPropsDanBri (Query No. 62), 2ndDegree1DanBri (Query No. 63), 2ndDegree2DanBri (Query No. 64), and IncomingDanBri (Query No. 65). Left chart: number of query results (axis 0 to 60). Right chart: query execution time in seconds (logarithmic axis, 0.01 to 100).]
44. Outlook
● Requirements of a data cache:
● Replacement mechanism
● Coherency mechanism
45. Cache Replacement
● Cache full → remove descriptor objects
● Replacement strategy
● Primary goal: maximize hit rate
● Recency-based
● Frequency-based
● Function-based
● Randomized
● Replacement process
● Watermarks: high and low
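The replacement process with two watermarks can be sketched as follows. This is only an illustration under assumptions that the slides do not state: cache size is counted in triples, the strategy is recency-based (least recently used first), and the class and method names are hypothetical. Eviction starts once the size exceeds the high watermark and continues until the size is at or below the low watermark.

```python
from collections import OrderedDict
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]


class WatermarkLRUCache:
    """Recency-based cache of descriptor objects with high and low watermarks."""

    def __init__(self, high_watermark: int, low_watermark: int) -> None:
        if not 0 < low_watermark <= high_watermark:
            raise ValueError("need 0 < low_watermark <= high_watermark")
        self.high = high_watermark
        self.low = low_watermark
        self.size = 0  # total number of cached triples
        self._entries: "OrderedDict[str, List[Triple]]" = OrderedDict()

    def get(self, uri: str) -> Optional[List[Triple]]:
        triples = self._entries.get(uri)
        if triples is not None:
            self._entries.move_to_end(uri)  # mark as most recently used
        return triples

    def put(self, uri: str, triples: List[Triple]) -> None:
        if uri in self._entries:
            self.size -= len(self._entries.pop(uri))
        self._entries[uri] = list(triples)
        self.size += len(triples)
        if self.size > self.high:  # cache full: start the replacement process
            self._evict()

    def _evict(self) -> None:
        # Remove least recently used descriptor objects down to the low watermark.
        while self.size > self.low and self._entries:
            _, victim = self._entries.popitem(last=False)
            self.size -= len(victim)


cache = WatermarkLRUCache(high_watermark=5, low_watermark=4)
cache.put("http://a.example", [("a", "p", "1"), ("a", "p", "2")])
cache.put("http://b.example", [("b", "p", "1"), ("b", "p", "2")])
cache.get("http://a.example")                                      # a is now more recent than b
cache.put("http://c.example", [("c", "p", "1"), ("c", "p", "2")])  # size 6 > 5, evict b
```

Evicting down to a low watermark rather than to just below the limit means the replacement process runs less often, at the price of removing a few more objects per run.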
48. Studying Cache Replacement?
“Web cache replacement in its general form seems to be a solved topic.”
S. Podlipnig and L. Böszörmenyi: Survey of Web Cache Replacement Strategies, 2003
● 6 quad indexes* in main memory
● Size grows linearly in the number of quads
● Example (after reuse all queries experiment, 100 queries):
● 905 descriptor objects, overall number of 745,756 triples
● ca. 103 MB
➔ Available main memory is hardly a limit
* As introduced in [LDOW'11b]
49. Cache Coherency
● Data items in the cache may become inconsistent
● Strong cache consistency
● Server validation
● Client validation
● Weak cache consistency
● Time to live (TTL)
● Adaptive TTL
51. Client Validation
● Polling every time
● Enables strong cache consistency
● Conditional GET
● Request with If-Modified-Since header
● Possible response: 304 Not Modified
● Not supported by most Linked Data servers
● Experiment based on the CKAN catalog of linked datasets
● Supported for only 41 out of 154 example resources (26.6%) from 110 datasets
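Client validation with a conditional GET might look like the sketch below, which uses only the Python standard library. The function names and the injectable `fetch` parameter are assumptions made for illustration and testing; a server that ignores `If-Modified-Since` simply answers 200 with the full document, in which case the sketch treats the response as new data.

```python
import urllib.error
import urllib.request
from email.utils import formatdate
from typing import Callable, Optional, Tuple

# (HTTP status code, response body or None)
Response = Tuple[int, Optional[bytes]]


def http_conditional_get(uri: str, last_retrieved: float) -> Response:
    """Send a GET with If-Modified-Since; `last_retrieved` is a Unix timestamp."""
    request = urllib.request.Request(
        uri,
        headers={"If-Modified-Since": formatdate(last_retrieved, usegmt=True)},
    )
    try:
        with urllib.request.urlopen(request) as response:
            return response.status, response.read()
    except urllib.error.HTTPError as error:
        if error.code == 304:  # urllib reports 304 Not Modified as an HTTPError
            return 304, None
        raise


def validate(uri: str,
             cached_body: bytes,
             last_retrieved: float,
             fetch: Callable[[str, float], Response] = http_conditional_get,
             ) -> Tuple[bytes, bool]:
    """Return (current body, changed?) for a cached object."""
    status, body = fetch(uri, last_retrieved)
    if status == 304 or body is None:
        return cached_body, False  # cached copy is still valid
    return body, True              # replace the cached copy


# Offline demonstration with stand-in fetch functions instead of real requests.
unchanged = validate("http://bob.name", b"old", 0.0, fetch=lambda u, t: (304, None))
changed = validate("http://bob.name", b"old", 0.0, fetch=lambda u, t: (200, b"new"))
```

Polling like this before every use of a cached object gives strong consistency, but each validation still costs one round trip to the server.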
53. Time to Live (TTL)
● TTL field: lifetime estimate for each object
● Supported by HTTP response headers (share of tested example resources that provide each):
● Expires: 37.0%
● Cache-Control: max-age: 37.7%
● When TTL elapses, object is invalid
● Accessing an invalid object → retrieve the object again
● Conditional GET (supported for 26.6% of the example resources)
● Alternative (due to lack of support in Linked Data servers):
● Assume a default TTL for each object
● Ordinary GET
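Deriving the expiry time of a cached object from the response headers, with a default TTL as fallback, could be sketched like this. The one-hour `DEFAULT_TTL` is an arbitrary, hypothetical value, the function name is made up for this example, and header names are matched exactly as written here (real HTTP headers are case-insensitive). Preferring `max-age` over `Expires` follows standard HTTP caching rules.

```python
import re
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime
from typing import Mapping

DEFAULT_TTL = timedelta(hours=1)  # hypothetical default, not taken from the slides


def expiry_time(headers: Mapping[str, str],
                retrieved_at: datetime,
                default_ttl: timedelta = DEFAULT_TTL) -> datetime:
    """Return the point in time at which a cached object becomes invalid."""
    max_age = re.search(r"max-age\s*=\s*(\d+)", headers.get("Cache-Control", ""))
    if max_age:
        return retrieved_at + timedelta(seconds=int(max_age.group(1)))
    expires = headers.get("Expires")
    if expires:
        try:
            return parsedate_to_datetime(expires)
        except (TypeError, ValueError):
            pass  # unparsable date: fall back to the default TTL
    return retrieved_at + default_ttl


def is_valid(expires_at: datetime, now: datetime) -> bool:
    return now < expires_at


t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
from_max_age = expiry_time({"Cache-Control": "public, max-age=600"}, t0)
from_expires = expiry_time({"Expires": "Mon, 01 Jan 2024 02:00:00 GMT"}, t0)
from_default = expiry_time({}, t0)
```

When neither header is present, the default TTL applies and an expired object is fetched again with an ordinary GET, as in the alternative listed on the slide.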
55. Adaptive TTL
● Assumption:
● The older an object, the less likely it is to be modified
● TTL is a percentage of the age:
● Threshold = 10% ; age = 30 days → TTL = 3 days
● Last verification: yesterday → invalidation in 2 days
● HTTP-based implementation (supported for 35.1% of the example resources):
● Calculation of age: use Last-Modified response header
● Verification with conditional GET
● Alternative (due to lack of support in Linked Data servers):
● Assume Last-Modified is time of first retrieval
● Verification by comparing a response to the current version
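The arithmetic of the adaptive TTL example can be written out as a short sketch. It assumes that the age of an object is measured at the time of its last verification and that the function names are free to choose; both are illustration choices, not details given on the slide.

```python
from datetime import datetime, timedelta


def adaptive_ttl(last_modified: datetime,
                 last_verified: datetime,
                 threshold: float = 0.10) -> timedelta:
    """TTL as a fraction of the object's age at the time of its last verification."""
    age = last_verified - last_modified
    return age * threshold


def invalidation_time(last_modified: datetime,
                      last_verified: datetime,
                      threshold: float = 0.10) -> datetime:
    return last_verified + adaptive_ttl(last_modified, last_verified, threshold)


# The example from the slide: threshold 10%, age 30 days, verified yesterday.
now = datetime(2024, 6, 10, 12, 0)
last_verified = now - timedelta(days=1)
last_modified = last_verified - timedelta(days=30)

ttl = adaptive_ttl(last_modified, last_verified)
remaining = invalidation_time(last_modified, last_verified) - now
```

With the alternative from the slide, `last_modified` would be set to the time of first retrieval whenever the server sends no Last-Modified header.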
56. Summary
● Systematic analysis of the impact of data caching
● Theoretical foundation
● Conceptual analysis
● Empirical evaluation
● Main findings:
● Additional results possible (for semantically similar queries)
● Impact on performance may be positive but also negative
● Future work:
● Analysis of caching strategies in our context
● Main issue: invalidation
58. Contributions
● Theoretical foundation (extension of the original definition)
● Reachability by a D_seed-initialized execution of a BGP query b
● D_seed-dependent solution for a BGP query b
● Reachability R(B) for a serial execution of B = b_1 , … , b_n
➔ Each solution for b_cur is also R(B)-dependent solution for b_cur
● Conceptual analysis of the impact of data caching
● Performance factor: p( b_cur , B ) = c( b_cur , [ ] ) – c( b_cur , B )
● Serendipity factor: s( b_cur , B ) = b( b_cur , B ) – b( b_cur , [ ] )
● Empirical verification of the potential impact
● Out of scope: Caching strategies (replacement, invalidation)
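The two factors are plain differences, which a small sketch can make concrete. The slides do not define the cost function c and the benefit function b here, so the sketch assumes that c is some non-negative execution cost (for instance, the number of URI lookups) and b the number of query results; the numbers below are hypothetical and not taken from the experiments.

```python
def performance_factor(cost_without_cache: float, cost_with_cache: float) -> float:
    """p(b_cur, B) = c(b_cur, []) - c(b_cur, B): cost saved by the cache."""
    return cost_without_cache - cost_with_cache


def serendipity_factor(benefit_with_cache: float, benefit_without_cache: float) -> float:
    """s(b_cur, B) = b(b_cur, B) - b(b_cur, []): extra benefit due to the cache."""
    return benefit_with_cache - benefit_without_cache


# Hypothetical query: 12 lookups and 27 results without a cache.
# Re-executing the same query on its own cached data needs no lookups
# and finds the same results.
p_same_query = performance_factor(cost_without_cache=12, cost_with_cache=0)
s_same_query = serendipity_factor(benefit_with_cache=27, benefit_without_cache=27)

# A cache filled by other queries may add results and save some lookups.
p_other = performance_factor(cost_without_cache=12, cost_with_cache=5)
s_other = serendipity_factor(benefit_with_cache=31, benefit_without_cache=27)
```

A positive p means the cache made execution cheaper and a negative p means it made execution more expensive; a positive s means the cache contributed additional results.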
60. Query Template UnsetProps
SELECT DISTINCT ?result ?resultLabel WHERE
{
?result rdfs:isDefinedBy <http://xmlns.com/foaf/0.1/> .
?result rdfs:domain foaf:Person .
OPTIONAL { <PERSON> ?result ?var0 }
FILTER ( !bound(?var0) )
<PERSON> foaf:knows ?var2 .
?var2 ?result ?var3 .
?result rdfs:label ?resultLabel .
?result vs:term_status ?var1 .
}
ORDER BY ?var1
61. Query Template Incoming
SELECT DISTINCT ?result WHERE
{
?result foaf:knows <PERSON> .
OPTIONAL
{
?result foaf:knows ?var1 .
FILTER ( <PERSON> = ?var1 )
<PERSON> foaf:knows ?result .
}
FILTER ( !bound(?var1) )
}
62. Query Template 2ndDegree1
SELECT DISTINCT ?result WHERE
{
<PERSON> foaf:knows ?p1 .
<PERSON> foaf:knows ?p2 .
FILTER ( ?p1 != ?p2 )
?p1 foaf:knows ?result .
FILTER ( <PERSON> != ?result )
?p2 foaf:knows ?result .
OPTIONAL {
<PERSON> ?knows ?result .
FILTER ( ?knows = foaf:knows )
}
FILTER ( !bound(?knows) )
}
63. Query Template 2ndDegree2
SELECT DISTINCT ?result WHERE
{
<PERSON> foaf:knows ?p1 .
<PERSON> foaf:knows ?p2 .
FILTER ( ?p1 != ?p2 )
?result foaf:knows ?p1 .
FILTER ( <PERSON> != ?result )
?result foaf:knows ?p2 .
OPTIONAL {
<PERSON> ?knows ?result .
FILTER ( ?knows = foaf:knows )
}
FILTER ( !bound(?knows) )
}
64. Experiment – Single Query
Experiment         Avg.¹ number of query results   Avg.¹ hit rate   Avg.¹ query execution time
                   (std. dev.)                     (std. dev.)      (std. dev.)
no reuse           27.37 (140.49)                  0.849 (0.205)    64.95 s (124.50)
reuse per query    26.71 (148.77)                  1 (0)            0.02 s (0.07)
¹ Averaged over all 100 queries
● In the ideal case for B_upper = [ b_cur , b_cur ]:
● p_upper( b_cur , B_upper ) = c( b_cur , [ ] ) – c( b_cur , B_upper ) = c( b_cur , [ ] )
● s_upper( b_cur , B_upper ) = b( b_cur , B_upper ) – b( b_cur , [ ] ) = 0
65. Experiment – Single Query
[Table: same measurements for "no reuse" and "reuse per query" as on slide 64]
● Summary (measurement errors aside):
● Same number of query results
● Significant improvements in query performance
66. Experiment – Complete Sequence
Experiment          Avg.¹ number of query results   Avg.¹ hit rate   Avg.¹ query execution time
                    (std. dev.)                     (std. dev.)      (std. dev.)
no reuse            27.37 (140.49)                  0.849 (0.205)    64.95 s (124.50)
reuse per query     26.71 (148.77)                  1 (0)            0.02 s (0.07)
reuse all queries   44.87 (178.36)                  0.991 (0.053)    37.91 s (112.94)
¹ Averaged over all 100 queries
● Summary:
● Data cache may provide for additional query results
● Impact on performance may be positive but also negative
67. Experiment – Complete Sequence
Experiment                          Avg.¹ number of query results   Avg.¹ hit rate   Avg.¹ query execution time
                                    (std. dev.)                     (std. dev.)      (std. dev.)
no reuse                            27.37 (140.49)                  0.849 (0.205)    64.95 s (124.50)
reuse per query                     26.71 (148.77)                  1 (0)            0.02 s (0.07)
reuse all queries                   44.87 (178.36)                  0.991 (0.053)    37.91 s (112.94)
reuse all queries (random orders)   118.18 (867.07)                 0.992 (0.016)    20.61 s (216.61)
● Executing the query sequence in a random order results in measurements similar to the given order.
68. These slides have been created by
Olaf Hartig
http://olafhartig.de
This work is licensed under a
Creative Commons Attribution-Share Alike 3.0 License
(http://creativecommons.org/licenses/by-sa/3.0/)