Rethinking Online SPARQL Querying to Support Incremental Result Visualization – Olaf Hartig
These are the slides of my invited talk at the 5th Int. Workshop on Usage Analysis and the Web of Data (USEWOD 2015): http://usewod.org/usewod2015.html
The abstract of this talk is as follows:
To reduce user-perceived response time, many interactive Web applications visualize information in a dynamic, incremental manner. Such an incremental presentation can be particularly effective when the underlying data processing systems cannot answer the users' information needs instantaneously. Examples of such systems are those that support live querying of the Web of Data, for which query execution times of several seconds, or even minutes, are an inherent consequence of their ability to guarantee up-to-date results. However, support for incremental result visualization has not received much attention in existing work on such systems. Therefore, the goal of this talk is to discuss approaches that enable query systems for the Web of Data to return query results incrementally.
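As a minimal illustration of the idea (a toy sketch, not one of the systems discussed in the talk), a query engine can expose its result as a generator, so the UI can render each solution as soon as it arrives instead of waiting for the complete result set:

```python
import time

def execute_query(sources):
    """Toy incremental query engine: yield each solution as soon as the
    (simulated) remote source delivers it, rather than collecting the
    complete result set first."""
    for source in sources:
        time.sleep(0.01)  # simulate network latency of live querying
        yield {"source": source, "binding": source.upper()}

def render_incrementally(sources):
    # The UI loop consumes solutions one by one; user-perceived response
    # time is the time to the *first* solution, not to the last.
    rendered = []
    for solution in execute_query(sources):
        rendered.append(solution["binding"])  # e.g. append a row to a table
    return rendered
```

The key design point is that the engine never materializes the full result: the consumer drives the pace, which is what makes incremental visualization possible.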
Big data analysis in Python @ PyCon.tw 2013 – Jimmy Lai
Big data analysis involves several processes: collection, storage, computation, analysis, and visualization. In these slides, the author demonstrates these processes by using Python tools to build a data product. The example is based on text analysis of an online forum.
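The collect/compute/visualize stages can be sketched end to end with the standard library alone (a hypothetical mini-pipeline; the slides use heavier tools for each stage):

```python
from collections import Counter

def collect(posts):
    """Collect: normalize raw forum posts (in practice, a crawler)."""
    return [p.strip().lower() for p in posts if p.strip()]

def compute(posts):
    """Compute/analyze: term frequencies across all posts."""
    counts = Counter()
    for post in posts:
        counts.update(post.split())
    return counts

def visualize(counts, top=3):
    """Visualize: a crude text bar chart (matplotlib in the slides)."""
    return [f"{word} {'#' * n}" for word, n in counts.most_common(top)]

posts = collect(["Python is fun", "python is fast", ""])
chart = visualize(compute(posts))
```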
Presented at JIST 2015, Yichang, China
Prototype: http://rc.lodac.nii.ac.jp/rdf4u/
Video: https://www.youtube.com/watch?v=z3roA9-Cp8g
Abstract: Semantic Web and Linked Open Data (LOD) are known to be powerful technologies for knowledge management, and explicit knowledge is expected to be represented in RDF (Resource Description Framework) format; however, ordinary users stay away from RDF because of the technical skills it requires. Since a concept map or node-link diagram can enhance learning for users from beginner to advanced level, RDF graph visualization can be a suitable tool for familiarizing users with Semantic Web technology. However, an RDF graph generated from a whole query result is not suitable for reading, because it is highly connected, like a hairball, and poorly organized. To make a graph that presents knowledge more readable, this research introduces an approach to sparsify a graph using a combination of three main functions: graph simplification, triple ranking, and property selection. These functions are largely based on interpreting RDF data as knowledge units, together with statistical analysis, in order to deliver an easily readable graph to users. A prototype is implemented to demonstrate the suitability and feasibility of the approach. It shows that the simple and flexible graph visualization is easy to read and leaves a good impression on users. In addition, the attractive tool helps inspire users to recognize the advantageous role of linked data in knowledge management.
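The triple-ranking step can be illustrated with a simple statistical heuristic (a hypothetical scoring in the spirit of the paper, not its exact formula): score each triple by the rarity of its predicate, so that common, uninformative properties are pruned first when sparsifying:

```python
from collections import Counter

def rank_triples(triples):
    """Score each (subject, predicate, object) triple by the inverse
    frequency of its predicate: rare predicates are assumed to carry
    more distinctive knowledge, so those triples rank higher."""
    freq = Counter(p for _, p, _ in triples)
    total = len(triples)
    scored = [(total / freq[p], (s, p, o)) for s, p, o in triples]
    return [t for _, t in sorted(scored, key=lambda x: -x[0])]

def sparsify(triples, keep=2):
    """Property selection: keep only the top-ranked triples."""
    return rank_triples(triples)[:keep]
```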
Search engines (e.g., Google.com, Yahoo.com, and Bing.com) have become the dominant model of online search. Large and small e-commerce sites provide built-in search capability so that visitors can examine the products they offer. While most large businesses are able to hire the necessary skills to build advanced search engines, small online businesses still lack the ability to evaluate the results of their search engines, which means losing the opportunity to compete with larger businesses. The purpose of this paper is to build an open-source model that can measure the relevance of search results for online businesses, as well as the accuracy of their underlying algorithms. We used data from a Kaggle.com competition to show our model running on real data.
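One standard way to score the relevance of ranked search results (a common choice in Kaggle relevance competitions, though the paper's exact metric is not stated here) is normalized discounted cumulative gain (NDCG):

```python
import math

def dcg(relevances):
    """Discounted cumulative gain: graded relevance judgments,
    discounted logarithmically by rank position."""
    return sum(rel / math.log2(rank + 2)
               for rank, rel in enumerate(relevances))

def ndcg(relevances):
    """NDCG = DCG of the returned ranking divided by the DCG of the
    ideal (best possible) ranking; 1.0 means a perfect ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0
```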
Hacktoberfest 2020 'Intro to Knowledge Graph' with Chris Woodward of ArangoDB and reKnowledge. Accompanying video is available here: https://youtu.be/ZZt6xBmltz4
ParlBench: a SPARQL benchmark for electronic publishing applications – Tatiana Tarasova
Slides from the workshop on Benchmarking RDF Systems, co-located with the Extended Semantic Web Conference 2013. The presentation is about ongoing work on building a benchmark for electronic publishing applications. The benchmark provides real-world data sets, the Dutch parliamentary proceedings, and a set of analytical SPARQL queries built on top of these data sets. The queries were grouped into micro-benchmarks according to their analytical aims. This allows one to better analyze the behavior of RDF stores with respect to the particular SPARQL feature used in a micro-benchmark or query.
Preliminary results of running the benchmark on the Virtuoso native RDF store are presented, as well as references to the on-line material including the data sets, queries and the scripts that were used to obtain the results.
How to use Parquet as a basis for ETL and analytics – Julien Le Dem
Parquet is a columnar format designed to be extremely efficient and interoperable across the Hadoop ecosystem. Its integration in most of the Hadoop processing frameworks (Impala, Hive, Pig, Cascading, Crunch, Scalding, Spark, …) and serialization models (Thrift, Avro, Protocol Buffers, …) makes it easy to use in existing ETL and processing pipelines, while giving flexibility of choice on the query engine (whether in Java or C++). In this talk, we will describe how one can use Parquet with a wide variety of data analysis tools like Spark, Impala, Pig, Hive, and Cascading to create powerful, efficient data analysis pipelines. Data management is simplified as the format is self-describing and handles schema evolution. Support for nested structures enables more natural modeling of data for Hadoop compared to flat representations that create the need for often costly joins.
The workshop will present how to combine tools to quickly query, transform, and model data using command-line tools. The goal is to show that command-line tools are efficient at handling reasonable sizes of data and can accelerate the data science process. We will show that in many instances, command-line processing ends up being much faster than ‘big data’ solutions. The content of the workshop is derived from the book of the same name (http://datascienceatthecommandline.com/). In addition, we will cover vowpal-wabbit (https://github.com/JohnLangford/vowpal_wabbit) as a versatile command-line tool for modeling large datasets.
Multimodal Features for Search and Hyperlinking of Video Content – Petra Galuscakova
In the talk, I will discuss content-based retrieval in audio-visual collections. I will focus on retrieval of relevant segments of video using a textual query. In addition, I will describe techniques for detecting hyperlinks within audio-visual collections. Our retrieval system ranked first in the MediaEval 2014 Search and Hyperlinking shared task. The experiments were performed on almost 4000 hours of BBC broadcast video.
As segmentation of the recordings proves to be crucial for high-quality video retrieval and hyperlinking, I will focus on segmentation strategies. I will show how prosodic and visual information can be incorporated into the segmentation process. Our decision-tree-based segmentation proved to outperform fixed-length segmentation, which regularly achieves the best results in the retrieval process. Visual and prosodic similarity are also explored in addition to hyperlinking based on the subtitles and automatic transcripts. Employing visual similarity achieves a consistent improvement, while employing prosodic similarity shows a small but promising improvement as well.
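As a point of reference, the fixed-length baseline mentioned above can be sketched as follows (a deliberately simplified illustration over plain text; the actual system segments audio-visual streams by time):

```python
def fixed_length_segments(transcript, words_per_segment=4):
    """Split a transcript into fixed-length segments; each segment is a
    candidate retrieval unit to be ranked against the textual query."""
    words = transcript.split()
    return [" ".join(words[i:i + words_per_segment])
            for i in range(0, len(words), words_per_segment)]
```

Decision-tree-based segmentation replaces the fixed boundary rule with learned boundaries driven by prosodic and visual cues.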
Data Wrangling and Visualization Using Python – MOHITKUMAR1379
Python is open source and has many libraries for data wrangling and visualization that make the life of data scientists easier. For data wrangling, pandas is used, as it represents tabular data and provides functions to parse data from different sources, clean data, handle missing values, merge data sets, etc. To visualize data, the low-level matplotlib library can be used. It is also the base package for higher-level packages such as seaborn, which draws well-customized plots in just one line of code. Python has the Dash framework, which is used to make interactive web applications in pure Python, without JavaScript or HTML. These Dash applications can be published on any server, as well as on clouds such as Google Cloud, or for free on the Heroku cloud.
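A minimal sketch of the wrangling steps described above, using pandas to fill missing values, merge two hypothetical data sets, and aggregate (all column names here are made up for illustration):

```python
import pandas as pd

# Two small, hypothetical data sets: users and their purchases.
users = pd.DataFrame({"user_id": [1, 2, 3],
                      "age": [25, None, 31]})
orders = pd.DataFrame({"user_id": [1, 1, 3],
                       "amount": [10.0, 5.0, 7.5]})

# Handle missing values: fill the missing age with the column mean.
users["age"] = users["age"].fillna(users["age"].mean())

# Merge the data sets on the shared key, keeping all users.
merged = users.merge(orders, on="user_id", how="left")

# Aggregate: total amount spent per user (users with no orders sum to 0).
totals = merged.groupby("user_id")["amount"].sum()
```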
Knowledge Discovery tools using Linked Data techniques - Presentation for the Linked Data 4 Knowledge Discovery Workshop at the ECML/PKDD 2015 conference - http://events.kmi.open.ac.uk/ld4kd2015/
Streaming machine learning is being integrated in Spark 2.1+, but you don’t need to wait. Holden Karau and Seth Hendrickson demonstrate how to do streaming machine learning using Spark’s new Structured Streaming and walk you through creating your own streaming model. By the end of this session, you’ll have a better understanding of Spark’s Structured Streaming API as well as how machine learning works in Spark.
The talk covers the concepts and internal mechanisms of how PostgreSQL, a popular open-source database, operates. While doing so, I'll also draw similarities to other RDBMSs such as Oracle, MySQL, and SQL Server.
Some of the topics covered in this presentation:
- PostgreSQL internal concepts: table, index, page, heap, vacuum, toast, etc.
- MVCC and relational transactions
- Indexes and how they affect performance
- A discussion of Uber's blog post about moving from PostgreSQL to MySQL
The talk is suitable for a technical audience who have worked with databases before (software engineers/data analysts) and want to learn about their internal mechanisms.
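The MVCC item from the topic list can be illustrated with a toy visibility check (a heavy simplification of PostgreSQL's actual rules, which also involve snapshots, hint bits, and the commit log):

```python
def is_visible(row, current_xid):
    """Toy MVCC visibility: each row version carries xmin (the
    transaction that created it) and xmax (the transaction that deleted
    it, or None). A version is visible to a transaction if it was
    created at or before that transaction and not deleted by its time."""
    created_before = row["xmin"] <= current_xid
    not_deleted = row["xmax"] is None or row["xmax"] > current_xid
    return created_before and not_deleted

# Two versions of the same logical row: an UPDATE in PostgreSQL writes a
# new version and marks the old one dead (which is why VACUUM exists).
versions = [
    {"xmin": 100, "xmax": 205, "value": "old"},
    {"xmin": 205, "xmax": None, "value": "new"},
]
```

Readers never block writers because each transaction simply filters row versions by visibility rather than locking rows.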
Speaker: Huy Nguyen, CTO & Cofounder, Holistics Software
Huy's currently CTO of Holistics, a Business Intelligence (BI) and Data Infrastructure product. Holistics helps customers generate reports and insights from their data. Holistics customers include tech companies like Grab, Traveloka, The Coffee House, Tech In Asia and e27.
Before Holistics, Huy worked at Viki, helping build their end-to-end data platform that scales to over 100M records a day. Previously, Huy spent a year writing medical simulation software in Europe and did an internship at Facebook HQ on their growth team.
Huy's proudest achievement is 251 scores on Flappy Bird.
Language: Vietnamese, with slides in English.
This presentation is an attempt to demystify the practice of building reliable data processing pipelines. We go through the pieces needed to build a stable processing platform: data ingestion, processing engines, workflow management, schemas, and pipeline development processes. The presentation also includes component choice considerations and recommendations, as well as best practices and pitfalls to avoid, most learnt through expensive mistakes.
[Paper Reading] Orca: A Modular Query Optimizer Architecture for Big Data – PingCAP
The performance of analytical query processing in data management systems depends primarily on the capabilities of the system's query optimizer. Increased data volumes and heightened interest in processing complex analytical queries have prompted Pivotal to build a new query optimizer.
In this paper we present the architecture of Orca, the new query optimizer for all Pivotal data management products, including Pivotal Greenplum Database and Pivotal HAWQ. Orca is a comprehensive development effort uniting state-of-the-art query optimization technology with our own original research, resulting in a modular and portable optimizer architecture.
In addition to describing the overall architecture, we highlight several unique features and present performance comparisons against other systems.
This will address two recently concluded Kaggle competitions.
1. Google landmark retrieval
2. Google landmark recognition
The talk will focus on image retrieval and recognition at large scale. The tentative plan for the presentation:
Primer on signal analysis (DFT, Wavelets).
Primer on information retrieval.
Tips for parallelizing your data pipeline.
Description of my approach and detailed discussion of bottlenecks, limitations and lessons.
In-depth analysis of winning solutions.
This will be a combination of theoretical rigor and practical implementation.
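The DFT from the signal-analysis primer can be written directly from its definition (a naive O(n²) version for clarity; real pipelines use an FFT implementation such as numpy.fft):

```python
import cmath

def dft(signal):
    """Naive discrete Fourier transform:
    X[k] = sum_n x[n] * exp(-2*pi*i*k*n/N)."""
    n = len(signal)
    return [sum(x * cmath.exp(-2j * cmath.pi * k * i / n)
                for i, x in enumerate(signal))
            for k in range(n)]
```

For a constant signal, all the energy lands in bin 0, which is a handy sanity check when verifying a transform implementation.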
Beyond EXPLAIN: Query Optimization From Theory To Code – Yuto Hayamizu
EXPLAIN is too much explained. Let's go "beyond EXPLAIN".
This talk will take you to an optimizer backstage tour: from theoretical background of state-of-the-art query optimization to close look at current implementation of PostgreSQL.
Cost-Based Optimizer in Apache Spark 2.2 – Ron Hu, Sameer Agarwal, Wenchen Fan ... (Databricks)
Apache Spark 2.2 ships with a state-of-art cost-based optimization framework that collects and leverages a variety of per-column data statistics (e.g., cardinality, number of distinct values, NULL values, max/min, avg/max length, etc.) to improve the quality of query execution plans. Leveraging these reliable statistics helps Spark to make better decisions in picking the most optimal query plan. Examples of these optimizations include selecting the correct build side in a hash-join, choosing the right join type (broadcast hash-join vs. shuffled hash-join) or adjusting a multi-way join order, among others. In this talk, we’ll take a deep dive into Spark’s cost based optimizer and discuss how we collect/store these statistics, the query optimizations it enables, and its performance impact on TPC-DS benchmark queries.
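The kind of decision described above can be sketched in a few lines (a toy model with made-up numbers, not Spark's actual cost formulas): given per-table statistics, pick a broadcast hash-join when the smaller side fits under a broadcast threshold:

```python
def estimated_size(stats):
    """Toy size estimate from per-column statistics: row count times the
    sum of average column widths (in bytes)."""
    return stats["rows"] * sum(stats["avg_col_width"])

def choose_join(left_stats, right_stats, broadcast_threshold=10 * 1024 * 1024):
    """Pick a join strategy the way a cost-based optimizer might:
    broadcast the smaller side if it is cheap enough to ship to every
    executor, otherwise fall back to a shuffled join."""
    smaller = min(estimated_size(left_stats), estimated_size(right_stats))
    if smaller <= broadcast_threshold:
        return "broadcast-hash-join"
    return "shuffled-hash-join"
```

This is why reliable statistics matter: an underestimated table size can lead the optimizer to broadcast something far too large.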
LDQL: A Query Language for the Web of Linked Data – Olaf Hartig
I used this slideset to present our research paper at the 14th Int. Semantic Web Conference (ISWC 2015). Find a preprint of the paper here:
http://olafhartig.de/files/HartigPerez_ISWC2015_Preprint.pdf
An Overview on PROV-AQ: Provenance Access and Query – Olaf Hartig
The slides which I used at the Dagstuhl seminar on Principles of Provenance (Feb. 2012) for presenting the main contributions and open issues of the PROV-AQ document created by the W3C provenance working group.
Querying Trust in RDF Data with tSPARQL – Olaf Hartig
With these slides I presented my paper on "Querying Trust in RDF Data with tSPARQL" at the European Semantic Web Conference 2009 (ESWC) in Heraklion, Crete. Actually, this slideset is an extended version of the slides I used for the talk (more examples and evaluation).
Welcome to ViralQR, your best QR code generator – ViralQR
Welcome to ViralQR, your best QR code generator available on the market!
At ViralQR, we design static and dynamic QR codes. Our mission is to make business operations easier and customer engagement more powerful through QR technology. Whether you run a small business or a large enterprise, our easy-to-use platform provides multiple options that can be tailored to your company's branding and marketing strategies.
Our Vision
We are here to make creating QR codes easy and smooth, enhancing customer interaction and helping business run more fluidly. We strongly believe in the ability of QR codes to change how businesses interact with their customers, and we are set on making that technology accessible and usable far and wide.
Our Achievements
Since our inception, we have successfully served many clients by providing QR codes for marketing, service delivery, and feedback collection across various industries. Our platform has been recognized for its ease of use and rich features, which help businesses create QR codes.
Our Services
At ViralQR, we offer a comprehensive suite of services that caters to your needs:
Static QR Codes: Create free static QR codes. These QR codes can store information such as URLs, vCards, plain text, emails and SMS, Wi-Fi credentials, and Bitcoin addresses.
Dynamic QR Codes: These offer all the advanced features and are subscription-based. They can link directly to PDF files, images, micro-landing pages, social accounts, review forms, business pages, and applications. In addition, they can be branded with CTAs, frames, patterns, colors, and logos to enhance your branding.
Pricing and Packages
Additionally, ViralQR offers a 14-day free trial, an excellent opportunity for new users to get a feel for the platform. From there, one can easily subscribe and experience the full power of dynamic QR codes. The subscription plans are not only meant for business; they are priced flexibly so that virtually every business can afford to benefit from our service.
Why choose us?
ViralQR provides services for marketing, advertising, catering, retail, and more. The QR codes can be placed on flyers, packaging, merchandise, and banners, and can even substitute for cash and cards in a restaurant or coffee shop. By integrating QR codes into your business, you can improve customer engagement and streamline operations.
Comprehensive Analytics
Subscribers to ViralQR receive detailed analytics and tracking tools that give a clear view of QR code performance. Our analytics dashboard shows aggregate views and unique views, as well as detailed information about each impression, including time, device, browser, and estimated location by city and country.
Thank you for choosing ViralQR; we offer nothing but the best in QR code services to meet your business's diverse needs!
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf – Peter Spielvogel
Building better applications for business users with SAP Fiori.
• What is SAP Fiori and why it matters to you
• How a better user experience drives measurable business benefits
• How to get started with SAP Fiori today
• How SAP Fiori elements accelerates application development
• How SAP Build Code includes SAP Fiori tools and other generative artificial intelligence capabilities
• How SAP Fiori paves the way for using AI in SAP apps
PHP Frameworks: I want to break free (IPC Berlin 2024) – Ralf Eggert
In this presentation, we examine the challenges and limitations of relying too heavily on PHP frameworks in web development. We discuss the history of PHP and its frameworks to understand how this dependence has evolved. The focus will be on providing concrete tips and strategies to reduce reliance on these frameworks, based on real-world examples and practical considerations. The goal is to equip developers with the skills and knowledge to create more flexible and future-proof web applications. We'll explore the importance of maintaining autonomy in a rapidly changing tech landscape and how to make informed decisions in PHP development.
This talk aims to encourage a more independent use of PHP frameworks, moving towards more flexible and future-proof PHP development.
Securing your Kubernetes cluster: a step-by-step guide to success! – KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
State of ICS and IoT Cyber Threat Landscape Report 2024 preview – Prayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio's cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors and newer malware, including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
The New Frontiers of AI in RPA with UiPath Autopilot™ – UiPathCommunity
In this free online event, organized by the Italian UiPath Community, you can explore the new features of Autopilot, the tool that integrates Artificial Intelligence into the development and use of automations.
📕 Together we will look at some examples of using Autopilot in different tools of the UiPath Suite:
Autopilot for Studio Web
Autopilot for Studio
Autopilot for Apps
Clipboard AI
GenAI applied to Document Understanding
👨🏫👨💻 Speakers:
Stefano Negro, UiPath MVPx3, RPA Tech Lead @ BSP Consultant
Flavio Martinelli, UiPath MVP 2023, Technical Account Manager @UiPath
Andrei Tasca, RPA Solutions Team Lead @NTT Data
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -... – DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
Elevating Tactical DDD Patterns Through Object Calisthenics – Dorra BARTAGUIZ
After immersing yourself in the blue book and its red counterpart, attending DDD-focused conferences, and applying tactical patterns, you're left with a crucial question: How do I ensure my design is effective? Tactical patterns within Domain-Driven Design (DDD) serve as guiding principles for creating clear and manageable domain models. However, achieving success with these patterns requires additional guidance. Interestingly, we've observed that a set of constraints initially designed for training purposes remarkably aligns with effective pattern implementation, offering a more ‘mechanical’ approach. Let's explore together how Object Calisthenics can elevate the design of your tactical DDD patterns, offering concrete help for those venturing into DDD for the first time!
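One of the Object Calisthenics constraints, "wrap all primitives and strings", maps directly onto the DDD value-object pattern; a minimal sketch with a hypothetical domain type:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Money:
    """A value object instead of a bare number: wrapping the primitive
    gives the domain concept a home for its own invariants and behavior."""
    amount: int      # store cents to avoid float rounding issues
    currency: str

    def __post_init__(self):
        if self.amount < 0:
            raise ValueError("amount must be non-negative")

    def add(self, other: "Money") -> "Money":
        if self.currency != other.currency:
            raise ValueError("cannot add different currencies")
        return Money(self.amount + other.amount, self.currency)
```

The frozen dataclass gives value semantics (immutability and structural equality), which is exactly what tactical DDD expects of a value object.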
GraphRAG is All You need? LLM & Knowledge GraphGuy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Tutorial "Linked Data Query Processing" Part 5 "Query Planning and Optimization" (WWW 2013 Ed.)
1. Linked Data Query Processing
Tutorial at the 22nd International World Wide Web Conference (WWW 2013)
May 14, 2013
http://db.uwaterloo.ca/LDQTut2013/
5. Query Planning
and Optimization
Olaf Hartig
University of Waterloo
2. WWW 2013 Tutorial on Linked Data Query Processing [ Query Planning and Optimization ] 2
Query Plan Selection
● Possible assessment criteria:
● Benefit (size of computed query result)
● Cost (overall query execution time)
● Response time (time for returning k solutions)
● To select from candidate plans, criteria must be estimated
● For index-based source selection: estimation may be
based on information recorded in the index [HHK+10]
● For (pure) live exploration: estimation impossible
● No a-priori information available
● Use heuristics instead
Outline
Heuristics-Based Planning
Optimizing Link Traversing Iterators
➢ Prefetching
➢ Postponing
Source Ranking
➢ Harth et al. [HHK+10, UHK+11]
➢ Ladwig and Tran [LT10]
Heuristics-Based Plan Selection [Har11a]
● Four rules:
● DEPENDENCY RULE
● SEED RULE
● INSTANCE SEED RULE
● FILTER RULE
● Tailored to LTBQE implemented by link traversing iterators
● Assumptions about queries:
● Query pattern refers to instance data
● URIs mentioned in the query pattern are the seed URIs
DEPENDENCY RULE
● Dependency: a variable from each triple pattern already occurs in one of the preceding triple patterns

Query:
?p ex:affiliated_with <http://.../orgaX>
?p ex:interested_in ?b
?b rdf:type <http://.../Book>

Plan (dependency respecting ✓):
tp1 = ( ?p , ex:affiliated_with , <http://.../orgaX> )   I1
tp2 = ( ?p , ex:interested_in , ?b )   I2
tp3 = ( ?b , rdf:type , <http://.../Book> )   I3

Use a dependency respecting query plan
DEPENDENCY RULE (continued)
● Rationale: avoid cartesian products
● Counter-example plan (✗ tp2 shares no variable with tp1, so evaluating them together degenerates into a cartesian product):
tp1 = ( ?p , ex:affiliated_with , <http://.../orgaX> )   I1
tp2 = ( ?b , rdf:type , <http://.../Book> )   I2
tp3 = ( ?p , ex:interested_in , ?b )   I3

Use a dependency respecting query plan
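The rule above can be checked mechanically. The following Python sketch is our own illustration (not code from the tutorial; function names are hypothetical): a plan respects the DEPENDENCY RULE if every triple pattern after the first shares a variable with some preceding pattern.

```python
# Illustration only: check the DEPENDENCY RULE for an ordered plan.
# Triple patterns are 3-tuples of strings; variables start with "?".

def pattern_vars(tp):
    """The set of variables mentioned in a triple pattern."""
    return {t for t in tp if t.startswith("?")}

def respects_dependency_rule(plan):
    seen = set()
    for i, tp in enumerate(plan):
        vs = pattern_vars(tp)
        if i > 0 and not (vs & seen):
            return False  # no shared variable -> cartesian product
        seen |= vs
    return True

tp1 = ("?p", "ex:affiliated_with", "<http://.../orgaX>")
tp2 = ("?p", "ex:interested_in", "?b")
tp3 = ("?b", "rdf:type", "<http://.../Book>")
```

With these patterns, the plan (tp1, tp2, tp3) respects the rule, while (tp1, tp3, tp2) does not, since tp3 shares no variable with tp1.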
SEED RULE
(Recall assumption: seed URIs = URIs in the query)
● Seed triple pattern of a plan
 … is the first triple pattern in the plan, and
 … contains at least one HTTP URI
● Rationale: a good starting point

Use a plan with a seed triple pattern

Query (✓ each pattern mentions a URI, so each can serve as the seed):
✓ ?p ex:affiliated_with <http://.../orgaX>
✓ ?p ex:interested_in ?b
✓ ?b rdf:type <http://.../Book>
INSTANCE SEED RULE
● Patterns to avoid:
 ✗ ?s ex:any_property ?o
 ✗ ?s rdf:type ex:any_class
● Rationale: URIs for vocabulary terms usually resolve to vocabulary definitions with little instance data

Avoid a seed triple pattern with vocabulary terms

Query (✓ only the first pattern mentions an instance URI):
✓ ?p ex:affiliated_with <http://.../orgaX>
?p ex:interested_in ?b
?b rdf:type <http://.../Book>
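The two seed rules combine into a simple candidate filter. This Python sketch is our own illustration (hypothetical names; prefixed names such as ex:interested_in are treated as vocabulary shorthand, not instance URIs):

```python
# Illustration only: combine the SEED RULE and the INSTANCE SEED RULE.
# A pattern is a seed candidate if it mentions an HTTP URI denoting
# instance data: a URI in subject position, or in object position of a
# non-rdf:type pattern (the class URI of rdf:type is a vocabulary term).

def is_http_uri(term):
    return term.startswith("<http")

def mentions_instance_uri(tp):
    s, p, o = tp
    if is_http_uri(s):
        return True
    return p != "rdf:type" and is_http_uri(o)

def seed_candidates(patterns):
    # any of these may be placed first in the plan
    return [tp for tp in patterns if mentions_instance_uri(tp)]

query = [("?p", "ex:affiliated_with", "<http://.../orgaX>"),
         ("?p", "ex:interested_in", "?b"),
         ("?b", "rdf:type", "<http://.../Book>")]
```

For the example query, only the first pattern survives both rules, matching the slide.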
FILTER RULE
● Filtering triple pattern: each variable already occurs in one of the preceding triple patterns
● For each valuation consumed as input, a filtering TP can only report 1 or 0 valuations as output
● Rationale: reduce cost

Example plan:
tp1 = ( ?p , ex:affiliated_with , <http://.../orgaX> )   I1
tp2 = ( ?p , ex:interested_in , ?b )   I2
tp3 = ( ?b , rdf:type , <http://.../Book> )   I3

Input valuation { ?p = <http://.../alice> } turns tp2 into
tp2' = ( <http://.../alice> , ex:interested_in , ?b )
Input valuation { ?p = <http://.../alice> , ?b = <http://.../b1> } turns the filtering pattern tp3 into
tp3' = ( <http://.../b1> , rdf:type , <http://.../Book> )

Use a plan where all filtering triple patterns are as close to the first triple pattern as possible
Outline
✓ Heuristics-Based Planning
Optimizing Link Traversing Iterators
➢ Prefetching
➢ Postponing
Source Ranking
➢ Harth et al. [HHK+10, UHK+11]
➢ Ladwig and Tran [LT10]
Prefetching of URIs [HBF09]

Iterator pipeline (Next? calls flow top-down):
tp3 = ( ?b , rdf:type , <http://.../Book> )   I3
tp2 = ( ?p , ex:interested_in , ?b )   I2
tp1 = ( ?p , ex:affiliated_with , <http://.../orgaX> )   I1
With input valuation { ?p = <http://.../alice> }, I2 evaluates
tp2' = ( <http://.../alice> , ex:interested_in , ?b )
over the query-local dataset.

● Without prefetching: initiate look-up(s) and wait
● With prefetching: initiate the look-up in the background as soon as the URI becomes known; later, the iterator only waits until the look-up is finished
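A minimal way to sketch this idea is a thread pool that dereferences URIs in the background. This is our own illustration (not from [HBF09]); fetch_uri is a hypothetical placeholder for an actual HTTP dereference.

```python
# Illustration only: background prefetching of URI look-ups.
from concurrent.futures import ThreadPoolExecutor

def fetch_uri(uri):
    # placeholder for dereferencing `uri` and parsing its RDF data
    return {"uri": uri, "triples": []}

class Prefetcher:
    def __init__(self, workers=4):
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._pending = {}

    def prefetch(self, uri):
        # initiate the look-up in the background (idempotent per URI)
        if uri not in self._pending:
            self._pending[uri] = self._pool.submit(fetch_uri, uri)

    def get(self, uri):
        # wait only until the (possibly already finished) look-up is done
        self.prefetch(uri)
        return self._pending[uri].result()
```

An iterator would call prefetch() as soon as a URI appears in an intermediate solution, and get() only when the data is actually needed.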
Postponing Iterator [HBF09]
● Idea: temporarily reject an input solution if processing it would cause blocking
● Enabled by an extension of the iterator paradigm:
 ● New function POSTPONE: treat the element most recently reported by GETNEXT as if it has not yet been reported (i.e., “take back” this element)
 ● Adjusted GETNEXT: either return a (new) next element or return a formerly postponed element
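The extended iterator paradigm can be sketched as follows (our own illustration of the two operations, not code from [HBF09]); here GETNEXT prefers new elements and falls back to postponed ones:

```python
# Illustration only: an iterator supporting POSTPONE / adjusted GETNEXT.

class PostponingIterator:
    def __init__(self, source):
        self._source = iter(source)
        self._postponed = []   # elements "taken back" via postpone()
        self._last = None      # element most recently reported

    def get_next(self):
        # either return a (new) next element ...
        nxt = next(self._source, None)
        if nxt is None and self._postponed:
            # ... or return a formerly postponed element
            nxt = self._postponed.pop(0)
        self._last = nxt
        return nxt

    def postpone(self):
        # treat the most recently reported element as not yet reported
        if self._last is not None:
            self._postponed.append(self._last)
            self._last = None
```

A consumer that would block on the current element calls postpone() and asks for the next one instead.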
Outline
✓ Heuristics-Based Planning
✓ Optimizing Link Traversing Iterators
➢ Prefetching
➢ Postponing
Source Ranking
➢ Harth et al. [HHK+10, UHK+11]
➢ Ladwig and Tran [LT10]
General Idea of Source Ranking
Rank the URIs resulting from source selection
such that
the ranking represents a priority for lookup
● Possible objectives:
● Report first solutions as early as possible
● Minimize time for computing the first k solutions
● Maximize the number of solutions computed in a
given amount of time
Harth et al. [HHK+10, UHK+11]
For any URI u (selected by the QTree-based approach), let:
rank(u) := estimated number of solutions that u contributes to
● For triple patterns this number is directly available:
 ● Recall, each QTree bucket stores a set of (URI, count) pairs
 ● All query-relevant buckets are known after source selection

[Figure: a QTree index — a tree with root, inner nodes A, B, C, and leaf buckets A1, A2, B1, B2]
Harth et al. [HHK+10, UHK+11] (continued)
● For BGPs, estimate the number of solutions recursively:
 ● Recursively determine regions of join-able data (based on overlapping QTree buckets for each triple pattern)
 ● For each of these regions, recursively estimate the number of triples the URI contributes to the region
 ● Factor in the estimated join result cardinality of these regions (estimated based on overlap between contributing buckets)
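For the single-triple-pattern case, the ranking reduces to summing the counts stored in the query-relevant buckets. This Python sketch is our own illustration (hypothetical names; buckets are lists of (URI, count) pairs as on the slide):

```python
# Illustration only: rank URIs by the counts recorded in the
# query-relevant QTree buckets -- rank(u) estimates how many solutions
# the data of u contributes to; higher rank means earlier look-up.

def rank_uris(relevant_buckets):
    rank = {}
    for bucket in relevant_buckets:
        for uri, count in bucket:
            rank[uri] = rank.get(uri, 0) + count
    # return URIs ordered by decreasing estimated contribution
    return sorted(rank, key=rank.get, reverse=True)

buckets = [[("u1", 3), ("u2", 1)],
           [("u2", 4)]]
```

Here u2 contributes an estimated 5 matching triples against u1's 3, so u2 is looked up first.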
Ladwig and Tran [LT10]
● Multiple scores
● Triple pattern cardinality
● Triple frequency – inverse source frequency (TF–ISF)
● (URI-specific) join pattern cardinality
● Incoming links
● Assumption: pre-populated index that stores triple pattern
cardinalities and join pattern cardinalities for each URI
● Aggregation of the scores to obtain ranks
● For indexed URIs: weighted summation of all scores
● For non-indexed URIs: weighting of (currently known) in-links
● Ranking is refined at run-time
Metric: Triple Pattern Cardinality [LT10]
For a selected URI u and a triple pattern tp (from the query), let:
card(u, tp) := number of triples in the data of u that match tp
● Rationale: data that contains many matching triples is likely to contribute to many solutions
● Requirement: pre-populated index that stores the cardinalities
● Caveat: some triple patterns have a high cardinality for almost all URIs
 ● Example: (?x, rdf:type, ?y)
 ● These patterns do not discriminate URIs
Metric: TF–ISF [LT10]
● Idea: adopt the TF–IDF concept to weight triple patterns
● Triple Frequency – Inverse Source Frequency (TF–ISF)
● Rationale:
 ● Importance positively correlates with the number of matching triples that occur in the data for a URI
 ● Importance negatively correlates with how often matching triples occur for all known URIs (i.e., all indexed URIs)

For a selected URI u, a triple pattern tp, and the set of all known URIs U_known, let:
tf-isf(u, tp) := card(u, tp) · log( |U_known| / |{ r ∈ U_known : card(r, tp) > 0 }| )
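The formula translates directly into code. The following Python sketch is our own illustration (hypothetical names; card is passed in as a function, standing in for the pre-populated index):

```python
# Illustration only: TF-ISF score for a URI u and a triple pattern tp.
import math

def tf_isf(u, tp, card, known_uris):
    """card(uri, tp) -> number of triples in that URI's data matching tp."""
    matching_sources = sum(1 for r in known_uris if card(r, tp) > 0)
    if matching_sources == 0:
        return 0.0
    return card(u, tp) * math.log(len(known_uris) / matching_sources)

# toy index: u1 has 2 matching triples, u3 has 1, u2 has none
cards = {("u1", "tp"): 2, ("u3", "tp"): 1}
card = lambda r, tp: cards.get((r, tp), 0)
known = ["u1", "u2", "u3"]
```

For u1 the score is 2 · log(3/2): a pattern matched by almost every source would get a logarithm near zero, capturing the "does not discriminate URIs" caveat of the previous slide.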
Metric: Join Pattern Cardinality [LT10]
For a selected URI u, two triple patterns tp_i and tp_j, and a query variable v, let:
card(u, tp_i, tp_j, v) := number of solutions produced by joining tp_i and tp_j on variable v, using only the data from u
● Rationale: data that matches pairs of (joined) triple patterns is highly relevant, because it matches a larger part of the query
● Requirement: these join cardinalities are also pre-computed and stored in a pre-populated index
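The quantity being indexed can be illustrated by computing it naively over a small set of triples. This Python sketch is our own (hypothetical names; in [LT10] these values come from the pre-populated index, not from scanning the data at query time):

```python
# Illustration only: join pattern cardinality over the data of one URI.

def bindings(triples, tp):
    """Solutions of a single triple pattern over a set of triples."""
    out = []
    for triple in triples:
        mapping, ok = {}, True
        for pat, val in zip(tp, triple):
            if pat.startswith("?"):
                if mapping.get(pat, val) != val:
                    ok = False  # same variable bound to two values
                    break
                mapping[pat] = val
            elif pat != val:
                ok = False
                break
        if ok:
            out.append(mapping)
    return out

def join_pattern_cardinality(triples_of_u, tp_i, tp_j, v):
    left = bindings(triples_of_u, tp_i)
    right = bindings(triples_of_u, tp_j)
    return sum(1 for a in left for b in right
               if v in a and v in b and a[v] == b[v])

data = [("<http://.../alice>", "ex:interested_in", "<http://.../b1>"),
        ("<http://.../b1>", "rdf:type", "<http://.../Book>")]
tp_i = ("?p", "ex:interested_in", "?b")
tp_j = ("?b", "rdf:type", "<http://.../Book>")
```

Joining tp_i and tp_j on ?b over this two-triple dataset yields exactly one solution.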
Refinement at Run-Time [LT10]
● During query execution, more information becomes available: (1) intermediate join results, (2) more incoming links
● Use it to adjust scores & ranking (for integrated execution)
● Re-estimate join pattern cardinalities based on samples of intermediate results (available from the hash tables of the symmetric hash join (SHJ) operators)
● Parameters for influencing behavior of ranking process:
● Invalid score threshold: re-rank when the number of URIs
with invalid scores passes this threshold
● Sample size: larger samples give better estimates, but make
the process more costly
● Re-sampling threshold: reuse cached estimates unless the
hash table of join operators grows past this threshold
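The invalid-score threshold can be sketched as a simple trigger. This is our own illustration of the mechanism described above (hypothetical names; the actual score recomputation of [LT10] is stubbed out):

```python
# Illustration only: re-rank lazily, once the number of URIs with
# invalidated scores passes the invalid score threshold.

class RankRefiner:
    def __init__(self, invalid_threshold):
        self.invalid_threshold = invalid_threshold
        self._invalid = set()
        self.rerank_count = 0

    def invalidate(self, uri):
        # called when new information (join results, in-links) arrives
        self._invalid.add(uri)
        if len(self._invalid) >= self.invalid_threshold:
            self.rerank()

    def rerank(self):
        # stub: recompute scores of the invalidated URIs here
        self.rerank_count += 1
        self._invalid.clear()
```

A higher threshold amortizes the cost of re-ranking over more updates, at the price of working with staler priorities.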
Outline
✓ Heuristics-Based Planning
✓ Optimizing Link Traversing Iterators
➢ Prefetching
➢ Postponing
✓ Source Ranking
➢ Harth et al. [HHK+10, UHK+11]
➢ Ladwig and Tran [LT10]
Tutorial Outline
(1) Introduction
(2) Theoretical Foundations
(3) Source Selection Strategies
(4) Execution Process
(5) Query Planning and Optimization
… Thanks!
These slides have been created by Olaf Hartig for the WWW 2013 tutorial on Linked Data Query Processing.
Tutorial Website: http://db.uwaterloo.ca/LDQTut2013/
This work is licensed under a Creative Commons Attribution-Share Alike 3.0 License (http://creativecommons.org/licenses/by-sa/3.0/).
(Some of the slides in this slide set have been inspired by slides from Günter Ladwig [LT10] – Thanks!)
Backup Slides
Metric: Links to Results [LT10]
● Rationale: a URI is more relevant if data from many relevant URIs mention it
● Links are only discovered at run-time

The “links to results” of a selected URI u is defined by:
links(u) := { l ∈ links(u, u_processed) | u_processed ∈ U_processed }
where U_processed is the set of URIs whose data has already been processed, and links(u1, u2) are the links to URI u1 mentioned in the data from URI u2.
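The definition is a union of per-source link sets, which the following Python sketch illustrates (our own names; links is passed in as a function standing in for link extraction from already-retrieved data):

```python
# Illustration only: "links to results" score of a URI u, collected
# from the data of all already-processed URIs.

def links_to_results(u, processed_uris, links):
    """links(u1, u2) -> links to URI u1 mentioned in the data from u2."""
    return {l for up in processed_uris for l in links(u, up)}

# toy link table: who mentions u = "a", and with which links
link_table = {("a", "x"): ["l1", "l2"],
              ("a", "y"): ["l2", "l3"]}

def links(u1, u2):
    return link_table.get((u1, u2), [])
```

The score of u would then be the size of this set; it grows as more sources are processed, which is why this metric is refined at run-time.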
Metric: Retrieval Cost [LT10]
● Rationale: URIs are more relevant the faster their data can be retrieved
● Size is available in the pre-populated index
● Bandwidth for any particular host can be approximated based on past experience or average performance recorded during the query execution process

The retrieval cost of a selected URI u is defined by:
cost(u) := Agg( size(u) , bandwidth(u) )
where size(u) is the size of the data from u, and bandwidth(u) is the bandwidth of the Web server that hosts u.
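Since the aggregation function Agg is left open, one natural instantiation is estimated transfer time (size divided by bandwidth). This Python sketch is our own illustration (hypothetical names; per-host bandwidth is approximated as a running average of past observations, as suggested above):

```python
# Illustration only: retrieval cost as estimated transfer time,
# with per-host bandwidth learned from observed downloads.

class RetrievalCost:
    def __init__(self, default_bandwidth=50_000):
        self._observed = {}            # host -> observed bytes/sec samples
        self._default = default_bandwidth

    def observe(self, host, bytes_per_s):
        # record performance measured during query execution
        self._observed.setdefault(host, []).append(bytes_per_s)

    def bandwidth(self, host):
        samples = self._observed.get(host)
        return sum(samples) / len(samples) if samples else self._default

    def cost(self, size_bytes, host):
        # Agg instantiated as size / bandwidth, i.e. estimated seconds
        return size_bytes / self.bandwidth(host)
```

A ranker would then prefer URIs with low cost, all other scores being equal.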