"EARL: Joint Entity and Relation Linking for Question Answering over Knowledge Graphs" as presented in Sthe 17th International Semantic Web Conference ISWC, 9th of October 2018, held in Monterey, California, USA
This work was supported by grants from the EU H2020 Framework Programme provided for the project HOBBIT (GA no. 688227).
Categorization of Semantic Roles for Dictionary DefinitionsAndre Freitas
Understanding the semantic relationships between terms is a fundamental task in natural language
processing applications. While structured resources that can express those relationships in
a formal way, such as ontologies, are still scarce, a large number of linguistic resources gathering
dictionary definitions is becoming available, but understanding the semantic structure of natural
language definitions is fundamental to make them useful in semantic interpretation tasks. Based
on an analysis of a subset of WordNet’s glosses, we propose a set of semantic roles that compose
the semantic structure of a dictionary definition, and show how they are related to the definition’s
syntactic configuration, identifying patterns that can be used in the development of information
extraction frameworks and semantic models.
AN ALGORITHM FOR OPTIMIZED SEARCHING USING NON-OVERLAPPING ITERATIVE NEIGHBOR...IJCSEA Journal
We have attempted in this paper to reduce the number of checked condition through saving frequency of the
tandem replicated words, and also using non-overlapping iterative neighbor intervals on plane sweep
algorithm. The essential idea of non-overlapping iterative neighbor search in a document lies in focusing
the search not on the full space of solutions but on a smaller subspace considering non-overlapping
intervals defined by the solutions. Subspace is defined by the range near the specified minimum keyword.
We repeatedly pick a range up and flip the unsatisfied keywords, so the relevant ranges are detected. The
proposed method tries to improve the plane sweep algorithm by efficiently calculating the minimal group of
words and enumerating intervals in a document which contain the minimum frequency keyword. It
decreases the number of comparison and creates the best state of optimized search algorithm especially in
a high volume of data. Efficiency and reliability are also increased compared to the previous modes of the
technical approach.
Categorization of Semantic Roles for Dictionary DefinitionsAndre Freitas
Understanding the semantic relationships between terms is a fundamental task in natural language
processing applications. While structured resources that can express those relationships in
a formal way, such as ontologies, are still scarce, a large number of linguistic resources gathering
dictionary definitions is becoming available, but understanding the semantic structure of natural
language definitions is fundamental to make them useful in semantic interpretation tasks. Based
on an analysis of a subset of WordNet’s glosses, we propose a set of semantic roles that compose
the semantic structure of a dictionary definition, and show how they are related to the definition’s
syntactic configuration, identifying patterns that can be used in the development of information
extraction frameworks and semantic models.
AN ALGORITHM FOR OPTIMIZED SEARCHING USING NON-OVERLAPPING ITERATIVE NEIGHBOR...IJCSEA Journal
We have attempted in this paper to reduce the number of checked condition through saving frequency of the
tandem replicated words, and also using non-overlapping iterative neighbor intervals on plane sweep
algorithm. The essential idea of non-overlapping iterative neighbor search in a document lies in focusing
the search not on the full space of solutions but on a smaller subspace considering non-overlapping
intervals defined by the solutions. Subspace is defined by the range near the specified minimum keyword.
We repeatedly pick a range up and flip the unsatisfied keywords, so the relevant ranges are detected. The
proposed method tries to improve the plane sweep algorithm by efficiently calculating the minimal group of
words and enumerating intervals in a document which contain the minimum frequency keyword. It
decreases the number of comparison and creates the best state of optimized search algorithm especially in
a high volume of data. Efficiency and reliability are also increased compared to the previous modes of the
technical approach.
Extending Boyer-Moore Algorithm to an Abstract String Matching ProblemLiwei Ren任力偉
The bad character shift rule of Boyer-Moore string search algorithm is studied in this paper for the purpose of extending it to more general string match problems. An abstract problem of string match is defined in general. An optimized string match algorithm based one the bad character heuristics is proposed to solve the abstract match problem efficiently.
Image Similarity Detection at Scale Using LSH and Tensorflow with Andrey GusevDatabricks
Learning over images and understanding the quality of content play an important role at Pinterest. This talk will present a Spark based system responsible for detecting near (and far) duplicate images. The system is used to improve the accuracy of recommendations and search results across a number of production surfaces at Pinterest.
At the core of the pipeline is a Spark implementation of batch LSH (locality sensitive hashing) search capable of comparing billions of items on a daily basis. This implementation replaced an older (MR/Solr/OpenCV) system, increasing throughput by 13x and decreasing runtime by 8x. A generalized Spark Batch LSH is now used outside of the image similarity context by a number of consumers. Inverted index compression using variable byte encoding, dictionary encoding, and primitives packing are some examples of what allows this implementation to scale. The second part of this talk will detail training and integration of a Tensorflow neural net with Spark, used in the candidate selection step of the system. By directly leveraging vectorization in a Spark context we can reduce the latency of the predictions and increase the throughput.
Overall, this talk will cover a scalable Spark image processing and prediction pipeline.
Entity Linking in Queries: Efficiency vs. EffectivenessFaegheh Hasibi
Slides for the ECIR 2017 paper "Entity Linking in Queries: Efficiency vs. Effectiveness"
Identifying and disambiguating entity references in queries is one of the core enabling components for semantic search. While there is a large body of work on entity linking in documents, entity linking in queries poses new challenges due to the limited context the query provides coupled with the efficiency requirements of an online setting. Our goal is to gain a deeper understanding of how to approach entity linking in queries, with a special focus on how to strike a balance between effectiveness and efficiency. We divide the task of entity linking in queries to two main steps: candidate entity ranking and disambiguation, and explore both unsupervised and supervised alternatives for each step. Our main
finding is that best overall performance (in terms of efficiency and effectiveness) can be achieved by employing supervised learning for the entity ranking step, while tackling disambiguation with a simple unsupervised algorithm. Using the Entity Recognition and Disambiguation Challenge platform, we further demonstrate that our recommended method achieves state-of-the-art performance.
Gleaning Types for Literals in RDF with Application to Entity SummarizationKalpa Gunaratna
ESWC 2016 talk about how to compute types (ontology classes) for literals and add semantics to them, making them richer. Then utilize them in an entity summarization usecase.
5 Lessons Learned from Designing Neural Models for Information RetrievalBhaskar Mitra
Slides from my keynote talk at the Recherche d'Information SEmantique (RISE) workshop at CORIA-TALN 2018 conference in Rennes, France.
(Abstract)
Neural Information Retrieval (or neural IR) is the application of shallow or deep neural networks to IR tasks. Unlike classical IR models, these machine learning (ML) based approaches are data-hungry, requiring large scale training data before they can be deployed. Traditional learning to rank models employ supervised ML techniques—including neural networks—over hand-crafted IR features. By contrast, more recently proposed neural models learn representations of language from raw text that can bridge the gap between the query and the document vocabulary.
Neural IR is an emerging field and research publications in the area has been increasing in recent years. While the community explores new architectures and training regimes, a new set of challenges, opportunities, and design principles are emerging in the context of these new IR models. In this talk, I will share five lessons learned from my personal research in the area of neural IR. I will present a framework for discussing different unsupervised approaches to learning latent representations of text. I will cover several challenges to learning effective text representations for IR and discuss how latent space models should be combined with observed feature spaces for better retrieval performance. Finally, I will conclude with a few case studies that demonstrates the application of neural approaches to IR that go beyond text matching.
Extending Boyer-Moore Algorithm to an Abstract String Matching ProblemLiwei Ren任力偉
The bad character shift rule of Boyer-Moore string search algorithm is studied in this paper for the purpose of extending it to more general string match problems. An abstract problem of string match is defined in general. An optimized string match algorithm based one the bad character heuristics is proposed to solve the abstract match problem efficiently.
Image Similarity Detection at Scale Using LSH and Tensorflow with Andrey GusevDatabricks
Learning over images and understanding the quality of content play an important role at Pinterest. This talk will present a Spark based system responsible for detecting near (and far) duplicate images. The system is used to improve the accuracy of recommendations and search results across a number of production surfaces at Pinterest.
At the core of the pipeline is a Spark implementation of batch LSH (locality sensitive hashing) search capable of comparing billions of items on a daily basis. This implementation replaced an older (MR/Solr/OpenCV) system, increasing throughput by 13x and decreasing runtime by 8x. A generalized Spark Batch LSH is now used outside of the image similarity context by a number of consumers. Inverted index compression using variable byte encoding, dictionary encoding, and primitives packing are some examples of what allows this implementation to scale. The second part of this talk will detail training and integration of a Tensorflow neural net with Spark, used in the candidate selection step of the system. By directly leveraging vectorization in a Spark context we can reduce the latency of the predictions and increase the throughput.
Overall, this talk will cover a scalable Spark image processing and prediction pipeline.
Entity Linking in Queries: Efficiency vs. EffectivenessFaegheh Hasibi
Slides for the ECIR 2017 paper "Entity Linking in Queries: Efficiency vs. Effectiveness"
Identifying and disambiguating entity references in queries is one of the core enabling components for semantic search. While there is a large body of work on entity linking in documents, entity linking in queries poses new challenges due to the limited context the query provides coupled with the efficiency requirements of an online setting. Our goal is to gain a deeper understanding of how to approach entity linking in queries, with a special focus on how to strike a balance between effectiveness and efficiency. We divide the task of entity linking in queries to two main steps: candidate entity ranking and disambiguation, and explore both unsupervised and supervised alternatives for each step. Our main
finding is that best overall performance (in terms of efficiency and effectiveness) can be achieved by employing supervised learning for the entity ranking step, while tackling disambiguation with a simple unsupervised algorithm. Using the Entity Recognition and Disambiguation Challenge platform, we further demonstrate that our recommended method achieves state-of-the-art performance.
Gleaning Types for Literals in RDF with Application to Entity SummarizationKalpa Gunaratna
ESWC 2016 talk about how to compute types (ontology classes) for literals and add semantics to them, making them richer. Then utilize them in an entity summarization usecase.
5 Lessons Learned from Designing Neural Models for Information RetrievalBhaskar Mitra
Slides from my keynote talk at the Recherche d'Information SEmantique (RISE) workshop at CORIA-TALN 2018 conference in Rennes, France.
(Abstract)
Neural Information Retrieval (or neural IR) is the application of shallow or deep neural networks to IR tasks. Unlike classical IR models, these machine learning (ML) based approaches are data-hungry, requiring large scale training data before they can be deployed. Traditional learning to rank models employ supervised ML techniques—including neural networks—over hand-crafted IR features. By contrast, more recently proposed neural models learn representations of language from raw text that can bridge the gap between the query and the document vocabulary.
Neural IR is an emerging field and research publications in the area has been increasing in recent years. While the community explores new architectures and training regimes, a new set of challenges, opportunities, and design principles are emerging in the context of these new IR models. In this talk, I will share five lessons learned from my personal research in the area of neural IR. I will present a framework for discussing different unsupervised approaches to learning latent representations of text. I will cover several challenges to learning effective text representations for IR and discuss how latent space models should be combined with observed feature spaces for better retrieval performance. Finally, I will conclude with a few case studies that demonstrates the application of neural approaches to IR that go beyond text matching.
Automated building of taxonomies for search enginesBoris Galitsky
We build a taxonomy of entities which is intended to improve the relevance of search engine in a vertical domain. The taxonomy construction process starts from the seed entities and mines the web for new entities associated with them. To form these new entities, machine learning of syntactic parse trees (their generalization) is applied to the search results for existing entities to form commonalities between them. These commonality expressions then form parameters of existing entities, and are turned into new entities at the next learning iteration.
Taxonomy and paragraph-level syntactic generalization are applied to relevance improvement in search and text similarity assessment. We conduct an evaluation of the search relevance improvement in vertical and horizontal domains and observe significant contribution of the learned taxonomy in the former, and a noticeable contribution of a hybrid system in the latter domain. We also perform industrial evaluation of taxonomy and syntactic generalization-based text relevance assessment and conclude that proposed algorithm for automated taxonomy learning is suitable for integration into industrial systems. Proposed algorithm is implemented as a part of Apache OpenNLP.Similarity project.
Metrics for Evaluating Quality of Embeddings for Ontological Concepts Saeedeh Shekarpour
Although there is an emerging trend towards generating embeddings for primarily unstructured data and, recently, for structured data, no systematic suite for measuring the quality of embeddings has been proposed yet.
This deficiency is further sensed with respect to embeddings generated for structured data because there are no concrete evaluation metrics measuring the quality of the encoded structure as well as semantic patterns in the embedding space.
In this paper, we introduce a framework containing three distinct tasks concerned with the individual aspects of ontological concepts: (i) the categorization aspect, (ii) the hierarchical aspect, and (iii) the relational aspect.
Then, in the scope of each task, a number of intrinsic metrics are proposed for evaluating the quality of the embeddings.
Furthermore, w.r.t. this framework, multiple experimental studies were run to compare the quality of the available embedding models.
Employing this framework in future research can reduce misjudgment and provide greater insight about quality comparisons of embeddings for ontological concepts.
We positioned our sampled data and code at https://github.com/alshargi/Concept2vec under GNU General Public License v3.0.
Presented at Hypertext'13.
Topic classification (TC) of short text messages o↵ers an ef- fective and fast way to reveal events happening around the world ranging from those related to Disaster (e.g. Sandy hurricane) to those related to Violence (e.g. Egypt revolu- tion). Previous approaches to TC have mostly focused on exploiting individual knowledge sources (KS) (e.g. DBpedia or Freebase) without considering the graph structures that surround concepts present in KSs when detecting the top- ics of Tweets. In this paper we introduce a novel approach for harnessing such graph structures from multiple linked KSs, by: (i) building a conceptual representation of the KSs, (ii) leveraging contextual information about concepts by exploiting semantic concept graphs, and (iii) providing a principled way for the combination of KSs. Experiments evaluating our TC classifier in the context of Violence detec- tion (VD) and Emergency Responses (ER) show promising results that significantly outperform various baseline models including an approach using a single KS without linked data and an approach using only Tweets.
Survey of Generative Clustering Models 2008Roman Stanchak
Survey of Generative Clustering Models "Probabilistic Topic Models" circa 2008. Class presentation by Roman Stanchak and Prithviraj Sen for University of Maryland College Park cmsc828g, Link Mining and Dynamic Graph Analysis. Spring 2008. Instructor: Prof. Lise Getoor
Models such as latent semantic analysis and those based on neural embeddings learn distributed representations of text, and match the query against the document in the latent semantic space. In traditional information retrieval models, on the other hand, terms have discrete or local representations, and the relevance of a document is determined by the exact matches of query terms in the body text. We hypothesize that matching with distributed representations complements matching with traditional local representations, and that a combination of the two is favourable. We propose a novel document ranking model composed of two separate deep neural networks, one that matches the query and the document using a local representation, and another that matches the query and the document using learned distributed representations. The two networks are jointly trained as part of a single neural network. We show that this combination or ‘duet’ performs significantly better than either neural network individually on a Web page ranking task, and significantly outperforms traditional baselines and other recently proposed models based on neural networks.
Similar to EARL: Joint Entity and Relation Linking for Question Answering over Knowledge Graphs (20)
"Benchmarking Big Linked Data: The case of the HOBBIT Project" as presented in the First International Workshop on Semantic Web Technologies for Health Data Management (SWH 2018), co-located with ISWC 2018, 9th October, 2018 held in Monterey, California, USA
This work was supported by grants from the EU H2020 Framework Programme provided for the project HOBBIT (GA no. 688227).
"Assessing Linked Data Versioning Systems: The Semantic Publishing Versioning Benchmark" as presented in SSWS 2018 co-located with the 17th International Semantic Web Conference ISWC, 9th of October 2018, held in Monterey, California, USA
This work was supported by grants from the EU H2020 Framework Programme provided for the project HOBBIT (GA no. 688227).
"The DEBS Grand Challenge 2018" as presented in the 12th ACM International Conference on Distributed and Event-Based Systems (DEBS 2017), 25 - 29 June, 2017 held in Hammilton, New Zeland
This work was supported by grants from the EU H2020 Framework Programme provided for the project HOBBIT (GA no. 688227).
"Benchmarking of distributed linked data streaming systems" as presented in the Stream Reasoning Workshop 2018, January 16-17, 2018, held by Department of Informatics DDIS (University of Zurich) in Zurich, Suisse
This work was supported by grants from the EU H2020 Framework Programme provided for the project HOBBIT (GA no. 688227).
"SQCFramework: SPARQL Query Containment Benchmarks Generation Framework" as presented in the 9th International Conference on Knowledge Capture(K-Cap 2017), December 4th-6th, 2017, held in Austin, USA.
This work was supported by grants from the EU H2020 Framework Programme provided for the project HOBBIT (GA no. 688227).
"LargeRDFBench: A billion triples benchmark for SPARQL endpoint federation" as presented in the 17th International Semantic Web Conference ISWC ( ournal track), 8th - 12th of October 2018, held in Monterey, California, USA
This work was supported by grants from the EU H2020 Framework Programme provided for the project HOBBIT (GA no. 688227).
"The DEBS Grand Challenge 2017" as presented in the The 11th ACM International Conference on Distributed and Event-Based Systems, 19 - 23 June, 2017 held in Barcelona, Spain
This work was supported by grants from the EU H2020 Framework Programme provided for the project HOBBIT (GA no. 688227).
"4th Natural Language Interface over the Web of Data (NLIWoD) workshop and QALD-9 Question Answering over Linked Data Challenge" as presented in the 17th International Semantic Web Conference ISWC, 8th - 12th of October 2018, held in Monterey, California, USA
This work was supported by grants from the EU H2020 Framework Programme provided for the project HOBBIT (GA no. 688227).
"Scalable Link Discovery for Modern Data-Driven Applications" poster presented ECAI 2016, September 2016, held in the Hague, Netherlands.
This work was supported by grants from the EU H2020 Framework Programme provided for the project HOBBIT (GA no. 688227).
"An Evaluation of Models for Runtime Approximation in Link Discovery" as presented in the IEEE/WIC/ACM WI, August 25th, 2017, held in Leipzig, Germany.
This work was supported by grants from the EU H2020 Framework Programme provided for the project HOBBIT (GA no. 688227).
"Scalable Link Discovery for Modern Data-Driven Applications" as presented in the 15th International Semantic Web Conference ISWC, Doctoral Consortium, October 18th, 2016, held in Kobe, Japan
This work was supported by grants from the EU H2020 Framework Programme provided for the project HOBBIT (GA no. 688227).
"Extending LargeRDFBench for Multi-Source Data at Scale for SPARQL Endpoint Federation" as presented in SSWS 2018 co-located with the 17th International Semantic Web Conference ISWC, 8th - 12th of October 2018, held in Monterey, California, USA
This work was supported by grants from the EU H2020 Framework Programme provided for the project HOBBIT (GA no. 688227).
"SPgen: A Benchmark Generator for Spatial Link Discovery Tools" as presented in Ontology Matching (OM) hosted by the 17th International Semantic Web Conference ISWC, 8th - 12th of October 2018, held in Monterey, California, USA
This work was supported by grants from the EU H2020 Framework Programme provided for the project HOBBIT (GA no. 688227).
"Introducing the HOBBIT platform into the Ontology Alignment Evaluation Campaign" was presented in Ontology Matching (OM) hosted by the 17th International Semantic Web Conference ISWC, 8th - 12th of October 2018, held in Monterey, California, USA
This work was supported by grants from the EU H2020 Framework Programme provided for the project HOBBIT (GA no. 688227).
OKE Challenge was hosted by European Semantic Web Conference ESWC, 3-7 June 2018, held in Heraklion, Crete, Greece (Aldemar Knossos Royal & Royal Villa).
This work was supported by grants from the EU H2020 Framework Programme provided for the project HOBBIT (GA no. 688227).
MOCHA Challenge was hosted by European Semantic Web Conference ESWC, 3-7 June 2018, held in Heraklion, Crete, Greece (Aldemar Knossos Royal & Royal Villa).
This work was supported by grants from the EU H2020 Framework Programme provided for the project HOBBIT (GA no. 688227).
Paper presented at European Semantic Web Conference ESWC, 3-7 June 2018, held in Heraklion, Crete, Greece (Aldemar Knossos Royal & Royal Villa).
This work was supported by grants from the EU H2020 Framework Programme provided for the project HOBBIT (GA no. 688227).
HOBBIT project overview presented at European Big Data Value Forum, 21-23 Nov 2017, held in Versailles, France (Palais des Congres).
This work was supported by grants from the EU H2020 Framework Programme provided for the project HOBBIT (GA no. 688227).
Leopard ISWC Semantic Web Challenge 2017 at ISWC2017.
This work was supported by grants from the EU H2020 Framework Programme provided for the project HOBBIT (GA no. 688227).
Leopard ISWC Semantic Web Challenge 2017 at ISWC2017.
This work was supported by grants from the EU H2020 Framework Programme provided for the project HOBBIT (GA no. 688227).
More from Holistic Benchmarking of Big Linked Data (20)
Climate Impact of Software Testing at Nordic Testing DaysKari Kakkonen
My slides at Nordic Testing Days 6.6.2024
Climate impact / sustainability of software testing discussed on the talk. ICT and testing must carry their part of global responsibility to help with the climat warming. We can minimize the carbon footprint but we can also have a carbon handprint, a positive impact on the climate. Quality characteristics can be added with sustainability, and then measured continuously. Test environments can be used less, and in smaller scale and on demand. Test techniques can be used in optimizing or minimizing number of tests. Test automation can be used to speed up testing.
Pushing the limits of ePRTC: 100ns holdover for 100 daysAdtran
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionAggregage
Join Maher Hanafi, VP of Engineering at Betterworks, in this new session where he'll share a practical framework to transform Gen AI prototypes into impactful products! He'll delve into the complexities of data collection and management, model selection and optimization, and ensuring security, scalability, and responsible use.
Essentials of Automations: The Art of Triggers and Actions in FMESafe Software
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...SOFTTECHHUB
The choice of an operating system plays a pivotal role in shaping our computing experience. For decades, Microsoft's Windows has dominated the market, offering a familiar and widely adopted platform for personal and professional use. However, as technological advancements continue to push the boundaries of innovation, alternative operating systems have emerged, challenging the status quo and offering users a fresh perspective on computing.
One such alternative that has garnered significant attention and acclaim is Nitrux Linux 3.5.0, a sleek, powerful, and user-friendly Linux distribution that promises to redefine the way we interact with our devices. With its focus on performance, security, and customization, Nitrux Linux presents a compelling case for those seeking to break free from the constraints of proprietary software and embrace the freedom and flexibility of open-source computing.
The Metaverse and AI: how can decision-makers harness the Metaverse for their...Jen Stirrup
The Metaverse is popularized in science fiction, and now it is becoming closer to being a part of our daily lives through the use of social media and shopping companies. How can businesses survive in a world where Artificial Intelligence is becoming the present as well as the future of technology, and how does the Metaverse fit into business strategy when futurist ideas are developing into reality at accelerated rates? How do we do this when our data isn't up to scratch? How can we move towards success with our data so we are set up for the Metaverse when it arrives?
How can you help your company evolve, adapt, and succeed using Artificial Intelligence and the Metaverse to stay ahead of the competition? What are the potential issues, complications, and benefits that these technologies could bring to us and our organizations? In this session, Jen Stirrup will explain how to start thinking about these technologies as an organisation.
PHP Frameworks: I want to break free (IPC Berlin 2024)Ralf Eggert
In this presentation, we examine the challenges and limitations of relying too heavily on PHP frameworks in web development. We discuss the history of PHP and its frameworks to understand how this dependence has evolved. The focus will be on providing concrete tips and strategies to reduce reliance on these frameworks, based on real-world examples and practical considerations. The goal is to equip developers with the skills and knowledge to create more flexible and future-proof web applications. We'll explore the importance of maintaining autonomy in a rapidly changing tech landscape and how to make informed decisions in PHP development.
This talk is aimed at encouraging a more independent approach to using PHP frameworks, moving towards a more flexible and future-proof approach to PHP development.
The Art of the Pitch: WordPress Relationships and SalesLaura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if sometime changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
Accelerate your Kubernetes clusters with Varnish CachingThijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™UiPathCommunity
In questo evento online gratuito, organizzato dalla Community Italiana di UiPath, potrai esplorare le nuove funzionalità di Autopilot, il tool che integra l'Intelligenza Artificiale nei processi di sviluppo e utilizzo delle Automazioni.
📕 Vedremo insieme alcuni esempi dell'utilizzo di Autopilot in diversi tool della Suite UiPath:
Autopilot per Studio Web
Autopilot per Studio
Autopilot per Apps
Clipboard AI
GenAI applicata alla Document Understanding
👨🏫👨💻 Speakers:
Stefano Negro, UiPath MVPx3, RPA Tech Lead @ BSP Consultant
Flavio Martinelli, UiPath MVP 2023, Technical Account Manager @UiPath
Andrei Tasca, RPA Solutions Team Lead @NTT Data
Transcript: Selling digital books in 2024: Insights from industry leaders - T...BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
State of ICS and IoT Cyber Threat Landscape Report 2024 previewPrayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio, cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors, and newer malware including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
Securing your Kubernetes cluster_ a step-by-step guide to success !KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
UiPath Test Automation using UiPath Test Suite series, part 4DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
EARL: Joint Entity and Relation Linking for Question Answering over Knowledge Graphs
1. EARL
Joint Entity and Relation Linking for
Question Answering over Knowledge Graphs
Mohnish Dubey1,2
, Debayan Banerjee1,2
, Debanjan Chaudhuri1,2
,Jens Lehmann1,2
1
University of Bonn, Bonn, Germany
2
Fraunhofer IAIS, St. Augustin, Germany
2. Question Answering
Entity linking in a minute
EARL Overview
Approach 1: GTSP
Approach 2: Connection Density
Comparison of the two approach
Evaluation
Conclusion
Outline
4. Question Answering over KG
Understand the intent of a factual question, and return the
implicit KB resource.
Typically treated as a translation problem from natural
language to formal language.
Seen major advancement in the past five years.
4
5. Question Answering over Knowledge Graph
end-to-end
machine learning
fully rule based
systems
5
6. dbr:Angela_Merkel
dbo:birthPlace“Angela Merkel”
Select right template for
the type of question
Where was Angela Merkel born?
NER
“born”Relation Linking
“Where/WRB was/VBD
Angela/NNP Merkel/NNP
born/VBN ?/.”
NLP
Preprocessing
Entity Linking
Relation Linking
SPARQL type
SELECT DISTINCT ?uri WHERE {
dbr:Angela_Merkel dbo:birthPlace ?uri }
Question Entity ; Relation Extracted ; Expected Answer type
Question Understanding
6
Refer : [1][2]
7. Key components in QA pipeline
1. Entity Identification and Linking
2. Relation Identification and Linking
3. Query Building
7
8. Key components in QA pipeline
1. Entity Identification and Linking
2. Relation Identification and Linking
3. Query Building
} EARL
8
9. Anatomy of a Entity Linking
System
9
Mention Detection Candidate Selection Disambiguation
Refer : http://qatutorial.sda.tech/
10. Mention Detection
Anatomy
10
Robert John Downey Jr. is an American actor
and singer. Downey was born April 4, 1965 in
Manhattan, New York, the son of writer,
director and filmographer Robert Downey Sr.
and actress Elsie Downey . Beginning in 2008,
Downey began portraying the role of Marvel
Comics superhero Iron Man in the Marvel
Cinematic Universe, appearing in several films
as either the lead role
Candidate Selection Disambiguation
11. Anatomy
11
Robert John Downey Jr. is an American actor
and singer. Downey was born April 4, 1965 in
Manhattan, New York, the son of writer,
director and filmographer Robert Downey Sr.
and actress Elsie Downey . Beginning in 2008,
Downey began portraying the role of Marvel
Comics superhero Iron Man in the Marvel
Cinematic Universe, appearing in several
films as either the lead role
Mention Detection Candidate Selection Disambiguation
12. Anatomy
12
Candidate
Selection
DisambiguationMention Detection
Robert John Downey Jr. is an American actor
and singer. Downey was born April 4, 1965 in
Manhattan, New York, the son of writer,
director and filmographer Robert Downey Sr.
and actress Elsie Downey . Beginning in 2008,
Downey began portraying the role of Marvel
Comics superhero Iron Man in the Marvel
Cinematic Universe, appearing in several
films as either the lead role
13. Anatomy
13
Candidate Selection DisambiguationMention Detection
Robert John Downey Jr. is an American actor
and singer. Downey was born April 4, 1965 in
Manhattan, New York, the son of writer,
director and filmographer Robert Downey Sr.
and actress Elsie Downey . Beginning in 2008,
Downey began portraying the role of Marvel
Comics superhero Iron Man in the Marvel
Cinematic Universe, appearing in several films
as either the lead role
14. But what about questions
● Single sentence
● Mostly with One Entity
● No previous context
- Disambiguation is problem
14
15. Relation Linking
● String Similarity
● Dictionary
● WordNet
● Word Embedding
● Machine learning approaches
15
.. written by ..
.. wrote ..
.. author ...
.. screenwriter..
dbo:writer
16. Advantage Disadvantage
Sequential - Reduces candidate search
space for Relation Linking
- Allows schema verification
- Relation Linking information cannot be
exploited in Entity Linking process
- Allows schema verification
- Errors in Entity Linking cannot be overcome
Parallel - Lower runtime
- Re-ranking of Entities possible
based on Relation Linking
- Entity Linking process cannot use information
from Relation Linking process and vice versa
- Does not allow schema verification
Joint - Potentially high accuracy
- Reduces error propagation
- Better disambiguation
- Allows schema verification
- Allows re-ranking
- Complexity increase
- Larger search space
State of the art for Entity and Relation linking in QA
16
17. Preliminaries
Transform KG to the Subdivision Graph
S(G) of a graph G is the graph obtained from G by replacing each
edge e = (u, v) of G by a new vertex we
and 2 new edges (u, we
) and (v, we
) .
17
18. Postulates
H1:Given candidate lists of entities and relations from a question, the
correct solution is a cycle of minimal cost that visits exactly one
candidate from each list.
H2: Given candidate lists of entities and relations from a question, the
correct candidates exhibit relatively dense and short-hop connections
among themselves in the knowledge graph compared to wrong
candidate sets.
H3: Jointly linking entity and relation leads to higher accuracy compared
to performing these tasks separately.
18
21. Preprocessing
Word Tokenizer
- "Where was the founder of Tesla and SpaceX born?" we identify
<founder, Tesla, SpaceX, born> as our keyword phrases.
- Stop word removal
21
Refer : [3]
"Where was the founder of Tesla and SpaceX born?"
22. Preprocessing
Entity Relation Predictor
- Predict whether each keyword phrase is an entity or a relation
- Character embedding based long-short term memory network (LSTM)
- RDF2Vec[4] provides latent representation for entities and relations in RDF
graphs
22
Refer : [4]
"Where was the founder of Tesla and SpaceX born?"
23. Preprocessing
Candidate Generation
- To retrieve the top candidates for a keyword EARL uses Elasticsearch index of
URI-label pairs
- Expanded labels - Wikidata, Oxford Dictionary and fastText
23
Refer : https://www.elastic.co/products/elasticsearch
https://developer.oxforddictionaries.com/
https://fasttext.cc/
25. Approach 1: GTSP
● f(spot) which maps a string s (the input question) to a set K of substrings of s.
● f(cand) which maps each keyword to a set of candidate node
● f(cost) which maps how closely nodes are related
● Goal: is to find combinations of candidates, which are closely related
25
26. Approach 1: GTSP
● f(spot) which maps a string s (the input question) to a set K of substrings of s.
● f(cand) which maps each keyword to a set of candidate node
● f(cost) which maps how closely nodes are related
● Goal: is to find combinations of candidates, which are closely related
26
Who is the husband of the leader of Germany?
Refer : [5]
27. Approach 1: GTSP
● f(spot) which maps a string s (the input question) to a set K of substrings of s.
● f(cand) which maps each keyword to a set of candidate node
● f(cost) which maps how closely nodes are related
● Goal: is to find combinations of candidates, which are closely related
27
Who is the husband of the leader of Germany?
28. Approach 1: GTSP
● f(spot) which maps a string s (the input question) to a set K of substrings of s.
● f(cand) which maps each keyword to a set of candidate node
● f(cost) which maps how closely nodes are related
● Goal: is to find combinations of candidates, which are closely related
28
Who is the husband of the leader of Germany?
…
….
…..
…
….
…..
…
….
…..
29. Approach 1: GTSP
Each node set cand(k) is called a cluster in this vertex set.
“The GTSP problem is to find a subset V’ = (v1
, . . . , vn
) of
V which contains exactly one node from each cluster and the
total cost n−1
∑i=1
cost(vi
, vi+1
) is minimal with respect to all such
subsets.”
29
32. Approximate GTSP Solvers
The GTSP is NP-hard and hence it is intractable.
GTSP can be reduced to standard TSP.
state-of-the-art approximate GTSP solver is the Lin–Kernighan–Helsgaun algorithm
32
Refer : [6]
33. Drawbacks of GTSP solution
Only provide the best candidate for each keyword given the list of candidates
Approximate GTSP solutions do not explore all possible paths and nodes
33
35. Connection Density
Connection Density is based on the three features:
1. Text similarity based initial Rank of the List item (Ri
)
2. Connection-Count (C)
3. Hop-Count (H)
35
36. Connection Density
Connection Density is based on the three features:
1. Text similarity based initial Rank of the List item (Ri
)
Initial Rank of the List (Ri
), is generated by retrieving the candidates
from the search index via text search. This is achieved in the
preprocessing steps
2. Connection-Count (C)
3. Hop-Count (H)
36
37. Connection Density
Connection Density is based on the three features:
1. Text similarity based initial Rank of the List item (Ri
)
2. Connection-Count (C)
The Connection-Count C for an candidate c, is the number of
connections from c to candidates in all the other lists divided by
the total number n of keywords spotted.
3. Hop-Count (H)
37
38. Connection Density
Connection Density is based on the three features:
1. Text similarity based initial Rank of the List item (Ri
)
2. Connection-Count (C)
The Connection-Count C for an candidate c, is the number of
connections from c to candidates in all the other lists divided by
the total number n of keywords spotted.
3. Hop-Count (H)
38
Who is the husband of the leader of Germany?
…
….
…..
…
….
…..
…
….
…..
39. Connection Density
Connection Density is based on the three features:
1. Text similarity based initial Rank of the List item (Ri
)
2. Connection-Count (C)
3. Hop-Count (H)
The Hop-Count H for a candidate c, is the sum of distances
from c to all the other candidates in all the other lists divided
by the total number of keywords spotted.
39
40. Connection Density
Connection Density is based on the three features:
1. Text similarity based initial Rank of the List item (Ri
)
2. Connection-Count (C)
3. Hop-Count (H)
Final rank list ( Rf
) based on the the three features
40
41. Connection Density
Connection Density is based on the three features:
1. Text similarity based initial Rank of the List item (Ri
)
2. Connection-Count (C)
3. Hop-Count (H)
Classifier
Ri
C
H
Rf
41
42. Connection Density
Connection Density is based on the three features:
1. Text similarity based initial Rank of the List item (Ri
)
2. Connection-Count (C)
3. Hop-Count (H)
Classifier
xgboost
Ri
C
H
Rf
42
45. GTSP vs Connection Density
GTSP Connection Density
Requires no training data Requires data to train the XGBoost classifier
The approximate GSTP LKH solution is only
able to return the top result as not all
possible paths are explored.
Returns a list of all possible candidates in
order of score
Time complexity of LKH is O(nL2
)
where n = number of nodes in graph,
L = number of clusters in graph
Time complexity is O(N2
L2
)
where N = number of nodes per cluster,
L = number of clusters in graph
Relies on identifying the path with minimum
cost
Depends on identifying dense and short-hop
connections
45
46. Postulates ( a look back)
H1:Given candidate lists of entities and relations from a question, the
correct solution is a cycle of minimal cost that visits exactly one
candidate from each list.
H2: Given candidate lists of entities and relations from a question, the
correct candidates exhibit relatively dense and short-hop connections
among themselves in the knowledge graph compared to wrong
candidate sets.
H3: Jointly linking entity and relation leads to higher accuracy compared
to performing these tasks separately.
46
47. Evaluation
Contribution 1 : create a gold label data set for entity and relation linking over
LC-QuAD[7] dataset of 5000 questions
Contribution 2 : Extended label set for Dbpedia entity and relations
Experiment 1: hypothesis H1 and H2
Experiment 2: hypothesis H2
Experiment 3 & 4 : hypothesis H3
47
Refer : [7]
48. Experiment 1: GTSP vs Connection density
Approach Accuracy (K=30) Accuracy (K=10) Time Complexity
Brute Force GTSP 0.61 0.62 O(n2
2n
)
LKH- GTSP 0.59 0.58 O(nLn
)
Connection Density 0.61 0.62 O(N2
L2
)
Aim: To compare the time complexity and accuracy of different approaches
48
49. Experiment 2: Reranking
Value of k Rf
based on Ri
Rf
based on C, H Rf
based on Ri
, C, H
k = 10 0.543 0.689 0.708
k = 30 0.544 0.666 0.735
k = 50 0.543 0.617 0.737
k* = 10 0.568 0.864 0.905
k* = 30 0.554 0.779 0.864
k* = 50 0.549 0.603 0.852
Aim: Evaluating Connection Density for predicting the correct entity and
relation candidates from a set of candidates (Metric is MRR scores)
49k* - artificially insert the correct candidate into each list to purely test re-ranking abilities of our system
50. Experiment 3 : Entity Linking
System Accuracy LC-QuAD Accuracy QALD
FOX[8] + AGDISTIS[9] 0.36 0.30
DBpediaSpotlight[10] 0.40 0.42
TextRazor[11] 0.52 0.53
Babelfy[12] 0.56 0.56
EARL without adaptive learning 0.61 0.55
EARL with adaptive learning 0.65 0.57
Aim: To evaluate EARL with other state-of-the-art systems on the entity
linking task (Metric is Accuracy scores (top-1))
50
Refer : [8,9,10,11,12]
51. Experiment 4 : Relation Linking
System Accuracy LC-QuAD Accuracy QALD
ReMatch[13] 0.12 0.31
RelMatch[14] 0.15 0.29
EARL without adaptive learning 0.32 0.45
EARL with adaptive learning 0.36 0.47
Aim: To evaluate EARL with other Relation linking system
(Metric is Accuracy scores (top-1))
51
Refer : [13,14]
52. Discussion
We experimentally achieve similar accuracy as the exact GTSP solution with
both LKH-GTSP and Connection Density with better time complexity.
The current approach does not tackle questions with hidden relations
EARL cannot be used as inference tool for entities
52
53. Conclusion
We provided two strategies for joint linking:
- one based on reducing the problem to an instance of the Generalised
Travelling Salesman problem.
- other based on a connection density based machine learning approach.
53
54. 54
This work was supported by grants from the EU H2020 Framework Programme
provided for the project HOBBIT (GA no. 688227).
Acknowledgement
55. 55
This work was supported by grants from the EU H2020 Framework Programme
provided for the project WDAqua (ITN, GA. 642795)
Acknowledgement
56. References
56
[1] K. Höffner, S. Walter, E. Marx, R. Usbeck, J. Lehmann, and A.-C. Ngonga Ngomo. Survey on challenges of
question answering in the semantic web. Semantic Web, 2017
[2] K. Singh, A. S. Radhakrishna, A. Both, S. Shekarpour, I. Lytra, R. Usbeck, et al. Why reinvent the wheel: Let’s build
question answering systems together. In Proceedings of the 2018 World Wide Web Conference on World Wide Web.
[3] R. Collobert, J. Weston, L. Bottou, M. Karlen, K. Kavukcuoglu, and P. Kuksa. Natural language processing (almost)
from scratch. Journal of Machine Learning Research, 2(Aug):2493–2537, 2011.
[4] P. Ristoski and H. Paulheim. Rdf2vec: Rdf graph embeddings for data mining. In International Semantic Web
Conference, pages 498–514. Springer, 2016.
[5] G. Laporte, H. Mercure, and Y. Nobert. Generalized travelling salesman problem through n sets of nodes: the
asymmetrical case. Discrete Applied Mathematics, 18(2):185-197, 1987.
[6] K. Helsgaun. Solving the equality generalized traveling salesman problem using the lin–kernighan–helsgaun
algorithm. Mathematical Programming Computation, 2015.
[7] P. Trivedi, G. Maheshwari, M. Dubey, and J. Lehmann. Lc-quad: A corpus for complex question answering over
knowledge graphs. In International Semantic Web Conference, 2017.
57. References
57
[8] R. Speck and A.-C. Ngonga Ngomo. Ensemble learning for named entity recognition. In The Semantic Web –
International Semantic Web Conference, 2014.
[9] R. Usbeck, A.-C. N. Ngomo, M. Röder, et al. Agdistis-graph-based disambiguation of named entities using
linked data. In International Semantic Web Conference, 2014.
[10] P. N. Mendes, M. Jakob, A. García-Silva, and C. Bizer. Dbpedia spotlight: shedding lighton the web of
documents. In Proceedings of the 7th international conference on semantic systems, 2011.
[11] https://www.textrazor.com/
[12] A. Moro, A. Raganato, and R. Navigli. Entity linking meets word sense disambiguation: a unified approach.
Transactions of the Association for Computational Linguistics, 2014.
[13] I. O. Mulang, K. Singh, and F. Orlandi. Matching natural language relations to knowledge graph properties for
question answering. In Proceedings of the 13th International Conference on Semantic Systems, pages 89–96.
ACM, 2017.
[14] K. Singh, I. O. Mulang, I. Lytra, M. Y. Jaradeh, A. Sakor, M.-E. Vidal, C. Lange, and S. Auer. Capturing
knowledge in semantically-typed relational patterns to enhance relation linking. In Proceedings of the Knowledge
Capture Conference, page 31. ACM, 2017.