Semantic Parsing with Combinatory Categorial Grammar (CCG)
2. Publication
Sherzod Hakimov, Christina Unger, Sebastian Walter, Philipp Cimiano. Applying Semantic Parsing to Question Answering Over Linked Data: Addressing the Lexical Gap. NLDB 2015: 103-109
3. QALD-4
Question Answering over Linked Data
● multilingual (English, German, Spanish, Italian, French, Dutch, Romanian)
● over 200 training, 50 test questions
● DBpedia 3.9
4. QALD-4
Question:
Which river does the Brooklyn Bridge cross?
Query:
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX res: <http://dbpedia.org/resource/>
SELECT DISTINCT ?uri
WHERE {
res:Brooklyn_Bridge dbo:crosses ?uri .
}
Answer:
http://dbpedia.org/resource/East_River
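To make the question-to-answer step concrete, here is a minimal Python sketch that evaluates the single triple pattern of the query against a tiny in-memory triple set. The triples are hand-written stand-ins rather than data fetched from DBpedia, and `match` is a hypothetical helper, not part of any SPARQL library.

```python
# Toy evaluation of one SPARQL triple pattern over hand-written triples.
# Nothing here talks to DBpedia; the data below is an illustrative stand-in.
DBO = "http://dbpedia.org/ontology/"
RES = "http://dbpedia.org/resource/"

TRIPLES = {
    (RES + "Brooklyn_Bridge", DBO + "crosses", RES + "East_River"),
    (RES + "Golden_Gate_Bridge", DBO + "crosses", RES + "Golden_Gate"),  # extra row for contrast
}

def match(pattern, triples):
    """Return one variable binding per triple that matches the (s, p, o) pattern.

    Terms starting with '?' are variables; all other terms must match exactly.
    """
    results = []
    for triple in sorted(triples):
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A repeated variable must bind to the same value each time.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            results.append(binding)
    return results

answers = match((RES + "Brooklyn_Bridge", DBO + "crosses", "?uri"), TRIPLES)
print(answers)
```

Only the Brooklyn Bridge triple matches, so `?uri` is bound to the East River resource, mirroring the answer on the slide.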
5. Semantic Parsing
● ZC05 [1] Parsing
○ Lambda Calculus
○ CCG (Combinatory Categorial Grammar)[2]
○ CYK - dynamic programming
● Training
○ Perceptron Algorithm
[1] Luke S. Zettlemoyer, Michael Collins. Learning to Map Sentences to Logical Form: Structured Classification with Probabilistic Categorial Grammars. UAI 2005: 658-666
[2] Mark Steedman. Surface Structure and Interpretation. Linguistic Inquiry 30, MIT Press 1997, ISBN 978-0-262-69193-2, pp. 1-126
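The training bullet names the perceptron algorithm. As a rough illustration, the sketch below shows one generic structured-perceptron update over sparse feature counts. It is not necessarily the exact procedure used in the talk or in [1], and the feature names are hypothetical.

```python
from collections import Counter, defaultdict

def score(weights, features):
    """Linear score of a parse: dot product of weights and feature counts."""
    return sum(weights[name] * count for name, count in features.items())

def perceptron_update(weights, gold_features, predicted_features, lr=1.0):
    """One structured-perceptron step.

    Features of the correct parse are rewarded and features of the wrongly
    predicted parse are penalised; shared features cancel out.
    """
    for name, count in gold_features.items():
        weights[name] += lr * count
    for name, count in predicted_features.items():
        weights[name] -= lr * count
    return weights

# Hypothetical lexical features of a correct and an incorrect parse.
gold = Counter({"lex:texas:=NP:texas": 1, "lex:cities:=N:city": 1})
pred = Counter({"lex:texas:=NP:texas": 1, "lex:cities:=N:river": 1})

weights = defaultdict(float)
perceptron_update(weights, gold, pred)
print(score(weights, gold), score(weights, pred))
```

After the update the correct parse outscores the incorrect one, which is the property the perceptron update is meant to enforce on each misparsed training example.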
6. Geobase880
● 880 questions paired with semantics
● geographic concepts about the U.S.
● categories : city, river, mountain ...
● functions : next_to, loc ...
● entities : colorado, texas ...
Question : what are the major cities in Texas?
Semantics : λx (major(x) ∧ city(x) ∧ location(texas, x))
7. Geobase880
        Precision   Recall   F1
ZC05    96.25       79.29    86.95
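The F1 value in the ZC05 row is the harmonic mean of precision and recall. A short Python check, using only the numbers reported on the slide:

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

zc05_f1 = round(f1(96.25, 79.29), 2)
print(zc05_f1)  # 86.95
```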
10. what are the major cities in Texas
what   := S/(S\NP)  : λg.λx.(g(x))
are    := (S\NP)/NP : λf.f(x)
the    := NP/N      : λf.f(x)
major  := N/N       : λg.λx.major(x) ∧ g(x)
cities := N         : λx.city(x)
in     := (N\N)/NP  : λy.λg.λx.location(y, x) ∧ g(x)
Texas  := NP        : texas
in Texas ⇒ N\N : λg.λx.location(texas, x) ∧ g(x)
11. what are the major cities in Texas
what   := S/(S\NP)  : λg.λx.(g(x))
are    := (S\NP)/NP : λf.f(x)
the    := NP/N      : λf.f(x)
major  := N/N       : λg.λx.major(x) ∧ g(x)
cities := N         : λx.city(x)
in     := (N\N)/NP  : λy.λg.λx.location(y, x) ∧ g(x)
Texas  := NP        : texas
in Texas ⇒ N\N : λg.λx.location(texas, x) ∧ g(x)
cities in Texas ⇒ N : λx.location(texas, x) ∧ city(x)
12. what are the major cities in Texas
what   := S/(S\NP)  : λg.λx.(g(x))
are    := (S\NP)/NP : λf.f(x)
the    := NP/N      : λf.f(x)
major  := N/N       : λg.λx.major(x) ∧ g(x)
cities := N         : λx.city(x)
in     := (N\N)/NP  : λy.λg.λx.location(y, x) ∧ g(x)
Texas  := NP        : texas
in Texas ⇒ N\N : λg.λx.location(texas, x) ∧ g(x)
cities in Texas ⇒ N : λx.location(texas, x) ∧ city(x)
major cities in Texas ⇒ N : λx.major(x) ∧ location(texas, x) ∧ city(x)
13. what are the major cities in Texas
what   := S/(S\NP)  : λg.λx.(g(x))
are    := (S\NP)/NP : λf.f(x)
the    := NP/N      : λf.f(x)
major  := N/N       : λg.λx.major(x) ∧ g(x)
cities := N         : λx.city(x)
in     := (N\N)/NP  : λy.λg.λx.location(y, x) ∧ g(x)
Texas  := NP        : texas
in Texas ⇒ N\N : λg.λx.location(texas, x) ∧ g(x)
cities in Texas ⇒ N : λx.location(texas, x) ∧ city(x)
major cities in Texas ⇒ N : λx.major(x) ∧ location(texas, x) ∧ city(x)
the major cities in Texas ⇒ NP : λx.major(x) ∧ location(texas, x) ∧ city(x)
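14. what are the major cities in Texas
what   := S/(S\NP)  : λg.λx.(g(x))
are    := (S\NP)/NP : λf.f(x)
the    := NP/N      : λf.f(x)
major  := N/N       : λg.λx.major(x) ∧ g(x)
cities := N         : λx.city(x)
in     := (N\N)/NP  : λy.λg.λx.location(y, x) ∧ g(x)
Texas  := NP        : texas
in Texas ⇒ N\N : λg.λx.location(texas, x) ∧ g(x)
cities in Texas ⇒ N : λx.location(texas, x) ∧ city(x)
major cities in Texas ⇒ N : λx.major(x) ∧ location(texas, x) ∧ city(x)
the major cities in Texas ⇒ NP : λx.major(x) ∧ location(texas, x) ∧ city(x)
are the major cities in Texas ⇒ S\NP : λx.major(x) ∧ location(texas, x) ∧ city(x)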
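15. what are the major cities in Texas
what   := S/(S\NP)  : λg.λx.(g(x))
are    := (S\NP)/NP : λf.f(x)
the    := NP/N      : λf.f(x)
major  := N/N       : λg.λx.major(x) ∧ g(x)
cities := N         : λx.city(x)
in     := (N\N)/NP  : λy.λg.λx.location(y, x) ∧ g(x)
Texas  := NP        : texas
in Texas ⇒ N\N : λg.λx.location(texas, x) ∧ g(x)
cities in Texas ⇒ N : λx.location(texas, x) ∧ city(x)
major cities in Texas ⇒ N : λx.major(x) ∧ location(texas, x) ∧ city(x)
the major cities in Texas ⇒ NP : λx.major(x) ∧ location(texas, x) ∧ city(x)
are the major cities in Texas ⇒ S\NP : λx.major(x) ∧ location(texas, x) ∧ city(x)
what are the major cities in Texas ⇒ S : λx (major(x) ∧ location(texas, x) ∧ city(x))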
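The derivation built up over the preceding slides can be reproduced with a small CYK-style chart parser that uses only forward and backward application. The Python sketch below makes several assumptions that are not on the slides: the toy world facts are invented, semantics are Python closures, and "what", "are" and "the" are treated as identity functions on predicates.

```python
# Toy world (invented facts, for illustration only).
CITY = {"houston", "dallas", "marfa", "denver"}
MAJOR = {"houston", "dallas", "denver"}
LOCATION = {("texas", "houston"), ("texas", "dallas"),
            ("texas", "marfa"), ("colorado", "denver")}

def fwd(result, arg):
    """Category result/arg: looks for its argument to the right."""
    return (result, "/", arg)

def bwd(result, arg):
    """Category result\\arg: looks for its argument to the left."""
    return (result, "\\", arg)

S_NP = bwd("S", "NP")
LEXICON = {
    "what":   (fwd("S", S_NP), lambda g: g),    # assumption: identity
    "are":    (fwd(S_NP, "NP"), lambda f: f),   # assumption: identity
    "the":    (fwd("NP", "N"), lambda f: f),    # assumption: identity
    "major":  (fwd("N", "N"), lambda g: lambda x: x in MAJOR and g(x)),
    "cities": ("N", lambda x: x in CITY),
    "in":     (fwd(bwd("N", "N"), "NP"),
               lambda y: lambda g: lambda x: (y, x) in LOCATION and g(x)),
    "texas":  ("NP", "texas"),
}

def combine(left, right):
    """Yield the results of forward and backward application, if any."""
    (lcat, lsem), (rcat, rsem) = left, right
    if isinstance(lcat, tuple) and lcat[1] == "/" and lcat[2] == rcat:
        yield (lcat[0], lsem(rsem))
    if isinstance(rcat, tuple) and rcat[1] == "\\" and rcat[2] == lcat:
        yield (rcat[0], rsem(lsem))

def parse(words):
    """CYK chart parse; returns the semantics of every S spanning the input."""
    n = len(words)
    chart = {(i, i + 1): [LEXICON[w]] for i, w in enumerate(words)}
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            cell = []
            for k in range(i + 1, j):
                for left in chart[(i, k)]:
                    for right in chart[(k, j)]:
                        cell.extend(combine(left, right))
            chart[(i, j)] = cell
    return [sem for cat, sem in chart[(0, n)] if cat == "S"]

parses = parse("what are the major cities in texas".split())
answers = [sorted(x for x in CITY if sem(x)) for sem in parses]
print(answers)
```

The chart finds two derivations, because "major" can attach before or after "in texas". Both denote the same set in this toy world, namely Dallas and Houston. A trained model such as the one in the talk would score competing derivations instead of enumerating them.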
16. Semantic Parsing on QALD
● Update GENLEX [1] rules for DBpedia predicates, categories, and resources
● Convert logical forms in Lambda Calculus to SPARQL queries
[1] Luke S. Zettlemoyer, Michael Collins. Learning to Map Sentences to Logical Form: Structured Classification with Probabilistic Categorial Grammars. UAI 2005: 658-666
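The second bullet, converting logical forms to SPARQL, can be sketched for the simplest case: a conjunction of binary atoms with one answer variable. The mapping below is a hypothetical simplification, not the conversion used in the talk. It assumes every predicate name is a `dbo:` property and every constant is a `res:` resource, and the logical form λx.crosses(Brooklyn_Bridge, x) for the Brooklyn Bridge question is likewise an assumption.

```python
PREFIXES = (
    "PREFIX dbo: <http://dbpedia.org/ontology/>\n"
    "PREFIX res: <http://dbpedia.org/resource/>\n"
)

def term(t):
    """Variables are written '?x'; anything else is treated as a DBpedia resource."""
    return t if t.startswith("?") else "res:" + t

def to_sparql(answer_var, atoms):
    """Render a conjunction of binary atoms as a SPARQL SELECT query.

    atoms is a list of (predicate, subject, object) tuples.
    """
    patterns = [f"{term(s)} dbo:{p} {term(o)} ." for p, s, o in atoms]
    body = "\n".join(patterns)
    return f"{PREFIXES}SELECT DISTINCT {answer_var}\nWHERE {{\n{body}\n}}"

# Assumed logical form: λx. crosses(Brooklyn_Bridge, x)
query = to_sparql("?uri", [("crosses", "Brooklyn_Bridge", "?uri")])
print(query)
```

For this input the output matches the QALD-4 example query line for line. Unary predicates such as city(x) would need a further rule, for example a type triple, which this sketch leaves out.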
23. Results -2
- lexical entries generated by M-ATOLL[1]
- DBpedia Ontology labels (predicates, categories)
                                              Precision   Recall   F1
Learned lexicon + ontology labels + M-ATOLL   0.70        0.18     0.30
[1] Sebastian Walter, Christina Unger, Philipp Cimiano. M-ATOLL: A Framework for the Lexicalization of Ontologies in Multiple Languages. International Semantic Web Conference (1) 2014: 472-486