This document summarizes Tomasz Jurczyk's PhD dissertation defense on improving question answering. It discusses his research contributions in sentence-based factoid question answering, including developing a multi-stage annotation scheme and exploring neural architectures. It also covers his work on non-factoid question answering, such as combining multiple question answering corpora and solving elementary arithmetic questions by constructing equations.
News Session-Based Recommendations Using Deep Neural Networks (Felipe Ferreira)
News recommender systems aim to personalize users' experiences and help them discover relevant articles in a large and dynamic search space. The news domain is a challenging scenario for recommendation due to sparse user profiles, a fast-growing number of items, accelerated decay of item value, and dynamic shifts in user preferences. Promising results have recently been achieved by applying deep learning techniques to recommender systems, especially for item feature extraction and for session-based recommendation with Recurrent Neural Networks. This paper proposes an instantiation of CHAMELEON, a deep learning meta-architecture for news recommender systems. The architecture is composed of two modules: the first learns representations of news articles from their text and metadata, and the second provides session-based recommendations using Recurrent Neural Networks. The recommendation task addressed in this work is next-item prediction for user sessions: "what is the next most likely article a user might read in a session?" The architecture leverages the context of user sessions to provide additional information in the extreme cold-start scenario of news recommendation, merging user behavior and item features in a hybrid recommendation approach. As a complementary contribution, a temporal offline evaluation method is proposed for a more realistic evaluation of this task, accounting for dynamic factors that affect global readership interest, such as popularity, recency, and seasonality. Experiments against an extensive set of session-based recommendation methods show that the proposed instantiation of the CHAMELEON meta-architecture obtains a significant relative improvement in top-n accuracy and ranking metrics (10% on Hit Rate and 13% on MRR) over the best benchmark methods.
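The reported gains are in Hit Rate and MRR, the standard top-n metrics for next-item prediction. As a minimal sketch (this is the evaluation arithmetic only, not the CHAMELEON architecture), the two metrics can be computed from ranked recommendation lists like this:

```python
def hit_rate_at_k(ranked_lists, true_items, k=10):
    """Fraction of sessions whose true next item appears in the top-k recommendations."""
    hits = sum(1 for ranked, true in zip(ranked_lists, true_items) if true in ranked[:k])
    return hits / len(true_items)

def mrr_at_k(ranked_lists, true_items, k=10):
    """Mean reciprocal rank of the true next item within the top-k (0 if absent)."""
    total = 0.0
    for ranked, true in zip(ranked_lists, true_items):
        if true in ranked[:k]:
            total += 1.0 / (ranked.index(true) + 1)
    return total / len(true_items)
```

For example, if the true next article ranks second in one session and is missing from another, Hit Rate@3 is 0.5 and MRR@3 is 0.25.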
Benchmarking for Neural Information Retrieval: MS MARCO, TREC, and Beyond (Bhaskar Mitra)
The emergence of deep learning-based methods for information retrieval (IR) poses several challenges and opportunities for benchmarking. Some of these are new, while others have evolved from existing challenges in IR exacerbated by the scale at which deep learning models operate. In this talk, I will present a brief overview of what we have learned from our work on MS MARCO and the TREC Deep Learning track, and reflect on the road ahead.
Presented at the Global Pharma R&D Informatics Congress. To find out more, visit:
www.global-engage.com
Text mining extracts complex information from text (entities, events and epistemic knowledge). It can be used to support pathway construction and the design of experiments by extracting evidence from literature. In this presentation, Sophia Ananiadou, Director of the National Centre for Text Mining, discusses bridging the gap between knowledge and text in cancer biology.
With the rise of Web 2.0, API-based software has appeared. This article examines WeboNaver (Webometrics Tool for Naver), an API-based search tool built for the Korean search engine Naver. The software can collect large amounts of data automatically and can easily distinguish between different types of information on the web, which was previously impossible. In particular, Internet researchers can improve the efficiency of data analysis within a specified timeframe using this tool. This paper illustrates how to use WeboNaver and verifies its usability and reliability through several case studies: the web presence of Korean National Assembly members and of the term H1N1 were both analyzed.
With the advent of Web 2.0 and the emergence of software built on Open APIs, users no longer need to search the web manually and sift through results one by one. Using public APIs, vast amounts of data can be collected and managed systematically with a few simple operations. This paper introduces WeboNaver (Webometrics Tool for Naver), a search program developed on top of Open APIs that automatically collects and stores large amounts of data by category from Naver, one of the most influential search engines in Korea. Researchers can use it to bring accuracy and high efficiency to data management, processing, and analysis. To help students, general users, and researchers who wish to use WeboNaver, the paper demonstrates concrete analysis procedures through real case studies to establish its usefulness. Using the program, the web visibility of the 292 members of the 18th National Assembly was examined, as was the web visibility of terms related to H1N1 (swine flu).
Many data mining and knowledge discovery methodologies and process models have been developed, with varying degrees of success. Three main methods are used to discover patterns in data: KDD, SEMMA, and CRISP-DM. They appear in many publications in the area and are used in practice. To our knowledge, no clear methodology has been developed to support link mining. However, there is a well-known methodology in knowledge discovery in databases, the Cross Industry Standard Process for Data Mining (CRISP-DM), developed by a consortium of several industrial companies, which is relevant to the study of link mining. In this study, CRISP-DM is adapted to the field of link mining to detect anomalies. An important goal in link mining is inferring links that are not yet known in a given network. The approach is implemented through a case study on real-world co-citation data, using mutual information to interpret the semantics of anomalies identified in the co-citation dataset. This can provide valuable insights for determining the nature of a given link and potentially identifying important future link relationships.
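The mutual-information step can be illustrated with pointwise mutual information (PMI) over co-citation counts. This is a minimal sketch assuming simple document-level counts, not the study's actual pipeline: a strongly positive PMI flags a pair of papers cited together far more often than chance, while a value near zero indicates independence.

```python
import math

def pmi(co_count, count_a, count_b, total):
    """Pointwise mutual information of papers a and b being co-cited.

    co_count: documents citing both a and b; count_a/count_b: documents
    citing each paper individually; total: number of citing documents.
    """
    p_ab = co_count / total
    p_a = count_a / total
    p_b = count_b / total
    return math.log(p_ab / (p_a * p_b))
```

For example, with 100 citing documents where each paper is cited 10 times, 10 co-citations give PMI = log 10 (maximal association), while 1 co-citation gives PMI = 0 (exactly what independence predicts).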
Enhancing and Extending the Digital Study of Intertextuality (pt. 2): Reveali... (Matteo Romanello)
Slides of my presentation at the Digital Classics Association's panel *Making Meaning from Data* at the 146th Annual Meeting of the Society for Classical Studies (formerly the American Philological Association).
Dynamic Search Using Semantics & Statistics (Paul Hofmann)
This presentation shows three applications that successfully combine semantics and statistics for text mining and interactive search.
1) We predict the Lehman bankruptcy using statistical topic modeling, SAP Business Objects entity extraction and associative memories (powered by Saffron Technologies).
2) We semi-automatically handle service requests at Cisco using knowledge extraction and knowledge reuse.
3) We discover user intent for interactive retrieval. User intent is modeled as a latent state whose observations are the reformulated query sequence and the retrieved documents, together with the positive or negative feedback provided by the user. A demo shows how a user's intent is recognized in health care search.
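Treating intent as a discrete latent state, each observation (a query reformulation or a click/skip signal) can update a posterior over intents by Bayes' rule. A minimal sketch, with hypothetical intent labels and likelihood values chosen only for illustration:

```python
def update_intent_posterior(prior, likelihoods):
    """One Bayesian update of a discrete latent-intent distribution.

    prior: dict intent -> P(intent); likelihoods: dict intent -> P(observation | intent),
    where the observation is e.g. a reformulated query or a click/skip signal.
    """
    unnormalized = {intent: prior[intent] * likelihoods[intent] for intent in prior}
    z = sum(unnormalized.values())  # normalizing constant P(observation)
    return {intent: v / z for intent, v in unnormalized.items()}
```

For example, with a uniform prior over {"symptoms", "treatment"} and a click on a treatment page that is four times likelier under the "treatment" intent, the posterior shifts to 0.8 for "treatment".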
Schema-agnostic queries over large-schema databases: a distributional semanti... (Andre Freitas)
The evolution of data environments towards growth in the size, complexity, dynamicity and decentralisation (SCoDD) of schemas drastically impacts contemporary data management. The SCoDD trend emerges as a central data management concern in Big Data scenarios, where users and applications demand more complete data, produced by independent data sources under different semantic assumptions and contexts of use. Most Database Management Systems (DBMSs) today target a closed communication scenario, where the symbolic schema of the database is known a priori by the database user, who is able to interpret it unambiguously. The context in which the data is consumed and produced is well defined and is typically the same context in which the data was created. In contrast, data management under SCoDD conditions targets an open communication scenario, where the symbolic system of the database is unknown to the user and multiple interpretation contexts are possible; the database may have been created under a different context from that of the database user. The emergence of this new data environment demands revisiting the semantic assumptions behind databases and designing data access mechanisms that can support semantically heterogeneous (open communication) data environments.
This work aims at filling this gap by proposing a complementary semantic model for databases based on distributional semantic models. Distributional semantics provides a complementary perspective to the formal perspective of database semantics, one that supports semantic approximation as a first-class database operation. Unlike models that describe uncertain and incomplete data, or probabilistic databases, distributional-relational models focus on constructing conceptual approximation approaches for databases, supported by a comprehensive semantic model automatically built from large-scale unstructured data external to the database, which serves as a semantic/commonsense knowledge base. The semantic model can be used to support schema-agnostic queries, i.e. to abstract the data consumer from the specific conceptualization behind the data.
The proposed distributional-relational semantic model is supported by a distributional structured vector space model, named τ-Space, which represents structured data under a distributional semantic representation and, in coordination with a query planning approach, supports a schema-agnostic query mechanism for large-schema databases. The query mechanism is materialized in the Treo query engine and is evaluated using schema-agnostic natural language queries.
The evaluation of the query mechanism confirms that distributional semantics provides a high-recall, medium-high-precision, and low-maintenance solution for coping with the abstraction and conceptual-level differences in schema-agnostic queries over large-schema and schema-less open-domain datasets.
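The core idea of semantic approximation as a database operation can be sketched with toy distributional vectors: a query term is matched to the closest schema attribute by cosine similarity, so a query mentioning "salary" can still hit a table whose column is named "compensation". The vectors and attribute names below are illustrative stand-ins, not the τ-Space model itself (which builds its vectors from large-scale corpora):

```python
import math

# Toy distributional vectors; in a real system these come from a corpus-trained model.
VECTORS = {
    "salary":       [0.90, 0.10, 0.00],
    "compensation": [0.85, 0.20, 0.05],
    "birthplace":   [0.00, 0.10, 0.95],
    "born_in":      [0.05, 0.15, 0.90],
}

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def best_attribute(query_term, schema_attrs):
    """Semantic approximation: the schema attribute distributionally closest to the query term."""
    return max(schema_attrs, key=lambda attr: cosine(VECTORS[query_term], VECTORS[attr]))
```

Here `best_attribute("salary", ["compensation", "born_in"])` resolves to `"compensation"` even though the strings share no characters, which is the abstraction a schema-agnostic query mechanism needs.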
Accelerating the Pace of Engineering Education with Simulation, Hardware and ... (Joachim Schlosser)
Presentation for MathWorks (www.mathworks.com) at World Engineering Education Forum 2014, Dubai.
Education is challenging. It always was and always will be, but every generation of educators and society must find answers specific to its era. This talk addresses some of the challenges of engineering education in the 21st century.
Industry complains about the skills gap it faces with engineering graduates: a lack of project awareness, problem-solving skills, applicable tool skills, or applied science skills. Academia complains that students do not bring the necessary basic skills as engineering freshmen. Teachers complain about a lack of student engagement. Students complain that classes do not engage them and seem irrelevant.
Turning this chain of challenges – industry, academia, school, students – on its head and starting with student engagement, one method getting attention is Project-Based Learning. Students educate themselves on the concepts they need, with the teacher facilitating the learning experience. Applying theory in practical ways, with tools that are used in industry, gives students first-hand experience with industry-relevant methods as well as the why behind the theory. The talk shows examples of programming, modeling, and simulation used to gain insight into theory and application.
Too often, students and educators feel that topics throughout their education are not connected. Early on, students lack an understanding of why they are learning something; later, they no longer see the connection of advanced theory to fundamental concepts. Reusing learning artifacts, skills, and methods helps map out the story. Demonstrations illustrate how educators implement this reuse throughout their teaching.
Consistent reuse leads to an Integrated Curriculum, where the methods and skills of each year build on those of previous years. Evaluations of programs with an integrated curriculum show higher retention of know-how.
With simulation and hardware experiments, we can all make math, physics, and engineering something students experience first-hand. The tools and resources are there. Let's address our generation's engineering education challenges.
International Journal of Engineering and Science Invention (IJESI) is an international journal intended for professionals and researchers in all fields of computer science and electronics. IJESI publishes research articles and reviews across the whole field of engineering, science, and technology, including new teaching methods, assessment, validation, and the impact of new technologies, and it will continue to provide information on the latest trends and developments in this ever-expanding subject. Papers are selected through double peer review to ensure originality, relevance, and readability. The articles published in the journal can be accessed online.
Some insights from a Systematic Mapping Study and a Systematic Review Study: ... (Phu H. Nguyen)
Doing literature reviews is a must for us researchers, both to avoid reinventing the wheel and to expand the boundary of knowledge. Why not have fun with the snowballing technique and conduct the reviews systematically? This talk shares some insights from a Systematic Mapping Study (SMS) and a Systematic Literature Review (SLR). When should you conduct an SMS? When an SLR? What are the differences?
The Art of the Pitch: WordPress Relationships and Sales (Laura Byrne)
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if something changes?
We will explore all these questions and more as we talk about matching clients’ needs with what your agency offers, without pulling teeth or pulling your hair out: practical tips and strategies for successful relationship building that leads to closing the deal.
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024 (Albert Hoitingh)
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview, including the concepts of Customer Key and Double Key Encryption.
GraphRAG is All You Need? LLM & Knowledge Graph (Guy Korland)
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
The New Frontiers of AI in RPA with UiPath Autopilot™ (UiPath Community)
In this free online event, organized by the Italian UiPath Community, you can explore the new features of Autopilot, the tool that integrates Artificial Intelligence into the development and use of automations.
📕 Together we will look at some examples of using Autopilot across various tools of the UiPath suite:
Autopilot for Studio Web
Autopilot for Studio
Autopilot for Apps
Clipboard AI
GenAI applied to Document Understanding
👨🏫👨💻 Speakers:
Stefano Negro, UiPath MVPx3, RPA Tech Lead @ BSP Consultant
Flavio Martinelli, UiPath MVP 2023, Technical Account Manager @UiPath
Andrei Tasca, RPA Solutions Team Lead @NTT Data
PHP Frameworks: I want to break free (IPC Berlin 2024) (Ralf Eggert)
In this presentation, we examine the challenges and limitations of relying too heavily on PHP frameworks in web development. We discuss the history of PHP and its frameworks to understand how this dependence has evolved. The focus will be on providing concrete tips and strategies to reduce reliance on these frameworks, based on real-world examples and practical considerations. The goal is to equip developers with the skills and knowledge to create more flexible and future-proof web applications. We'll explore the importance of maintaining autonomy in a rapidly changing tech landscape and how to make informed decisions in PHP development.
This talk encourages a more independent use of PHP frameworks, moving towards more flexible and future-proof PHP development.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo... (James Anderson)
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. A constant focus on speed to market, combined with traditionally slow and manual security checks, has left gaps in continuous security, an important piece of the software supply chain. Today, organizations feel more susceptible to external and internal cyber threats due to the vast attack surface of their application supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
DevOps and Testing slides at DASA Connect (Kari Kakkonen)
Slides by me and Rik Marselis at the DASA Connect conference on 30 May 2024. We discuss what testing is, then what agile testing is, and finally what testing in DevOps looks like. We also held a lovely workshop in which participants explored different ways of thinking about quality and testing in the different parts of the DevOps infinity loop.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -... (DanBrown980551)
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
Essentials of Automations: Optimizing FME Workflows with ParametersSafe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
Essentials of Automations: Optimizing FME Workflows with Parameters
Improving Question Answering by Bridging Linguistic Structures with Statistical Learning
1. Improving Question Answering by
Bridging Linguistic Structures with
Statistical Learning
Tomasz Jurczyk
Advisor: Jinho D. Choi
Emory University
11/02/2017
PhD Dissertation Defense
2. Want big impact? Use big image.
Image: https://www.psychologicalscience.org/news/minds-business/asking-questions-increases-likability.html
3. [Full-slide image]
Image: http://www.kindynews.com/blog/kids-ask-how-many-questions-per-day
4. [Full-slide image]
Image: https://www.shutterstock.com/video/clip-11021852-stock-footage-hispanic-woman-reading-laying-on-the-floor-of-the-library-k.html
5. [Full-slide image]
Image: https://autoshopsolutions.com/top-10-questions-ask-web-design-internet-marketing-company/
6. "Questions vs. Queries in Informational Search Tasks", Ryen W. White et al., WWW 2015
http://www.internetlivestats.com/google-search-statistics/
11. Research Goal
Improve various question answering aspects by combining linguistic structures with statistical learning and by constructing abstract text representations. Address the challenges of applications in cross-genre tasks.
13-19. Research Contributions

Sentence-based Factoid Question Answering (2016):
- A multi-stage annotation scheme: for sentence-based factoid question answering, using crowdsourcing techniques
- A subtree matching mechanism: for measuring contextual similarity between two sentences
- Exploration of neural architectures for FQA: convolutional neural networks for sentence-based FQA
- Combining multiple QA corpora: improving the performance of QA systems by cross-using multiple sets

Non-factoid Question Answering (2015-2016):
- A semantics-based graph: an abstract representation applied to arithmetic question answering
- Multi-field structural decomposition: for event-based question answering

Applications to Cross-genre Tasks (2016-2017):
- A cross-genre document retrieval task: structure matching for conversational and formal writings
- A multi-gram attention CNN: for the passage completion task on conversational dialog texts
21. What is sentence-based question answering?
Given a question and a list of sentences, reorder or classify the sentences with respect to how likely each one answers, or supports the answer to, the question.
23. Tasks in sentence-based question answering

Answer Sentence Selection:
- A ranking problem
- Rerank sentences with respect to how likely they support the question
- Metrics: MRR (Mean Reciprocal Rank: the multiplicative inverse of the rank of the first correct answer) and MAP (Mean Average Precision)

Answer Triggering:
- A classification/ranking problem
- Decide whether the answer is among the sentence candidates
- Metrics: Precision and Recall (F1 score)
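The two ranking metrics above can be sketched in a few lines of Python. The `rankings` input is an illustrative assumption, not part of the dissertation: each inner list marks, for one question, which candidate sentences are correct (1) or not (0), already in ranked order.

```python
def mrr(rankings):
    """Mean Reciprocal Rank: average over questions of 1/rank
    of the first correct answer (0 if no correct answer)."""
    total = 0.0
    for labels in rankings:
        for rank, rel in enumerate(labels, start=1):
            if rel:
                total += 1.0 / rank
                break
    return total / len(rankings)

def mean_average_precision(rankings):
    """MAP: mean over questions of the average precision
    computed at each rank where a correct answer appears."""
    ap_sum = 0.0
    for labels in rankings:
        hits, precision_sum = 0, 0.0
        for rank, rel in enumerate(labels, start=1):
            if rel:
                hits += 1
                precision_sum += hits / rank
        if hits:
            ap_sum += precision_sum / hits
    return ap_sum / len(rankings)

# Toy check: first correct answer at rank 2 for one question,
# at rank 1 for another -> (1/2 + 1) / 2 = 0.75
print(mrr([[0, 1, 0], [1, 0, 0]]))  # 0.75
```

The helpers operate on relevance labels only, which matches how these metrics are usually reported for answer sentence selection.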
24. How to build a scalable, diverse, and challenging dataset
Image: https://www.meraevents.com/event/How-To-Write-and-Publish-a-Book-
25. Building a sentence-based factoid question answering corpus
Diverse and artificially challenging datasets are needed to train statistical models. However, access to real search engine user queries (Google, Bing, etc.) is almost impossible.
26. SelQA - a dataset built using a multi-stage crowdsourcing annotation scheme
- Crowdsourced: crowdsourcing techniques are used to build the dataset
- Scalable size: can be used on a large scale
- Low-cost: cost-effective due to quality control
- Quality control: poor-quality annotations are rejected
- Diverse: data sources come from multiple domains
- Challenging: semantically difficult due to the paraphrase step
SelQA: A New Benchmark for Selection-Based Question Answering, Jurczyk et al., ICTAI 2016
27. The annotation process
1. Sample a data collection (e.g., articles from Wikipedia).
2. Preprocess the collection (sentence segmentation, etc.).
3. Run 4 annotation tasks on MTurk plus 1 task using Elasticsearch.
31.
- 7,904 questions annotated with their contexts
- 9% more overlapping words compared to WikiQA, on average
- 15% drop in the ratio of overlapping words due to the paraphrasing step
34. An example: one sentence supports the question
Question: who lead the polish army in the siege of warsaw?
Sentences:
...
1) Despite German radio broadcasts claiming to have captured
Warsaw, the initial enemy attack was repelled and soon
afterwards Warsaw was placed under siege.
2) The siege lasted until September 28, when the Polish garrison,
commanded under General Walerian Czuma, officially
capitulated.
3) The following day approximately 140,000 Polish soldiers and
troops left the city and were taken as prisoners of war.
...
35. Subtree matching for contextual semantic similarity
Dependency grammar can be used to match the syntax of two sentences (a question and a sentence candidate) and to calculate their semantic similarity.
36. Subtree matching example
36
Question: Who lead the polish army in the Siege of Warsaw?
Sentence: The siege lasted until September 28, when the Polish
garrison, commanded under General Walerian Czuma, officially
capitulated.
SelQA: A New Benchmark for Selection-Based Question Answering, Jurczyk
et al., ICTAI’2016
37. How are the subtree matching features used?
The subtree matching features are combined with a convolutional neural network to demonstrate their effectiveness in extracting semantic similarity.
39. Answer Sentence Selection on SelQA

Model                   Dev MAP  Dev MRR  Eval MAP  Eval MRR
CNN0: baseline            84.62    85.65     83.20     84.20
CNN2: avg + emb           85.70    86.67     84.66     85.68
Santos et al. (2017)          -        -     87.58     88.12
Shen et al. (2017)            -        -     89.14     89.93
40. Answer sentence selection on WikiQA

Model                     Dev MAP  Dev MRR  Eval MAP  Eval MRR
CNN0: baseline              69.93    70.66     65.62     66.46
CNN2: avg + emb (2016)      69.22    70.18     68.78     70.82
Yang et al. (2015)              -        -     65.20     66.52
Santos et al. (2016)            -        -     68.86     69.57
Miao et al. (2016)              -        -     68.86     70.69
Yin et al. (2016)               -        -     69.21     71.08
Wang et al. (2016)              -        -     70.58     72.26
Wang et al. (2017)              -        -     73.41     74.18
41. Answer triggering on SelQA

Model              Dev P   Dev R   Dev F1  Eval P  Eval R  Eval F1
CNN0: baseline     50.63   40.60    45.07   52.10   40.34    45.47
CNN2: max + emb    49.32   48.99    49.16   53.69   48.38    50.89
42. Answer triggering on WikiQA

Model                Dev P   Dev R   Dev F1  Eval P  Eval R  Eval F1
CNN0: baseline       41.86   42.86    42.35   29.70   37.45    32.73
CNN3: max + emb+     44.44   44.44    44.44   29.43   48.56    36.65
Yang et al. (2015)       -       -        -   27.96   37.86    32.17
43.
- 6.5%: improvement over the state of the art for the WikiQA dataset in answer sentence selection
- F1@36.65: new state of the art for answer triggering on WikiQA
- 12%: improvement over the state of the art for the SelQA dataset in answer triggering
45. Taking advantage of multiple QA corpora
Researchers have independently released several QA corpora. The performance of QA systems could be improved by combining them.
48. The results of cross-testing the corpora

            Evaluated on WikiQA     Evaluated on SelQA      Evaluated on SQuAD
Trained on  MAP    MRR    F1        MAP    MRR    F1        MAP    MRR    F1
WikiQA      65.54  67.41  13.33     53.47  54.12   8.68     73.16  73.72  11.26
SelQA       49.05  49.64  24.30     82.72  83.70  48.66     77.22  78.04  44.70
SQuAD       58.17  58.53  19.35     81.15  82.27  42.88     88.84  89.69  44.93
W+S+Q       56.40  56.51      -     83.19  84.25      -     88.78  89.65      -
ALL         60.19  60.68      -     82.88  83.97      -     88.92  89.79      -
49.
- Higher accuracy of answer triggering on WikiQA when trained on SelQA
- Higher accuracy on SQuAD when trained on combined datasets
- Faster convergence for SQuAD when trained on SelQA, with almost identical performance
51. What is non-factoid question answering?
As an umbrella term, it covers a wide spectrum of tasks such as recommendation, arithmetic, visual, and community-based question answering. It often requires more complex and customized approaches.
53. How do we solve the following problems?

Question: A restaurant served 9 pizzas during lunch and 6 during dinner today. How many pizzas were served today?

Question: Sara has 31 red and 15 green balloons. Sandy has 24 red balloons. How many red balloons do they have in total?
54. … most likely, we construct an equation and calculate it

Question: A restaurant served 9 pizzas during lunch and 6 during dinner today. How many pizzas were served today?
Equation: x = 9 + 6    Answer: x = 15

Question: Sara has 31 red and 15 green balloons. Sandy has 24 red balloons. How many red balloons do they have in total?
Equation: x = 31 + 24    Answer: x = 55
55. Application to arithmetic questions
- Sequence classification: the task can be seen as sequence classification of verb polarities
- Three verb classes: each verb can be positive (+), negative (-), or neutral (0)
- Linear equation formed: once all polarities are classified, the equation is formed
- Semantics-based graph: used to extract syntactic and semantic features for verb classification
56. Natural language processing tasks
56
Semantics-based Graph Approach to Complex Question Answering,
Jurczyk et al., NAACL-SRW’2015
60. The results on the AllenAI dataset

Model                   Accuracy
This work (2015)        71.75%
Roy et al. (2014)       64.00%
Hosseini et al. (2015)  77.70%
Roy et al. (2016)       78.00%
61.
- ~6% lower than the previous approach
- Successfully applied the semantics-based graph to non-factoid question answering
- But does not require extra annotation of verb polarities
63. What does event-based question answering look like?

ID  Text                                Support
1   Fred picked up the football there.
2   Fred gave the football to Jeff.
3   What did Fred give to Jeff?         2
4   Bill went back to the bathroom.
5   Jeff grabbed the milk there.
6   Who gave the football to Jeff?      2
64. Hybrid system for event-based question answering
- NLP/IR solution: a good mix of natural language processing and information retrieval
- Three groups of fields: lexical, syntactic, and semantic representations of text are extracted
- Lucene-based engine: a Lucene-based search engine is used to index the extracted fields
- Event-based QA evaluation: the approach will be evaluated on a non-factoid question answering task
Multi-Field Structural Decomposition for Question Answering, Jurczyk et al., arXiv
70. What is the cross-genre document retrieval task?
Given a description (query) and a list of conversational scripts (documents), retrieve the scripts that are relevant to (support) this description.
71. Documents: ‘Friends’ scripts
- 10 seasons of Friends
- 1 season = ~24 episodes
- 1 episode = ~14 scenes
- 1 scene = ~20 utterances
- 1 utterance = speaker + utterance
72. A slice from a scene
Rachel: How does going to a strip club help him better?
Ross: Because there are naked ladies there.
Joey: Which helps him get to Phase Three, picturing yourself with other women.
Ross: There are naked ladies there too.
Joey: Yeah.
74. Description examples

Dialogue: Joey: “One woman? That’s like saying there’s only one flavor of ice cream for you. Lemme tell you something, Ross. There’s lots of flavors out there.”
Summary + Plot: Joey compares women to ice cream.

Dialogue: Ross: “You know you probably didn’t know this, but back in the high school, I had, a, um, major crush on you.” Rachel: “I knew.”
Summary + Plot: Ross reveals his high school crush on Rachel.

Dialogue: Chandler: “Alright, one of you give me your underpants.” Joey: “Can’t help you, I’m not wearing any.”
Summary + Plot: Chandler asks Joey for his underwear, but Joey can’t help him out as he’s not wearing any.
76. Structure extraction for conversational and
formal writings
76
“Chandler: Alright, one of you give me your underpants”
Cross-genre Document Retrieval: Matching between Conversational and
Formal Writings, Jurczyk et al.,
BLGNLP 2017 (during EMNLP)
77. How does the structure matching work?
[Diagram: structures are extracted from the description, matched against the indexed scripts' structures, and the relevant scripts are retrieved.]
82.
- 47.64%: initial R@1 achieved by Elasticsearch
- But structure matching is based on a single utterance; this will be improved
- 9.2%: improvement when the structure extraction features were used
84. Passage completion is reading comprehension
- Cross-genre: it is a cross-genre task on conversational data
- Reading comprehension: it benchmarks the ability to read and comprehend natural language
- Entity-based: it is based on entity prediction given a query and a passage
- Towards QA: as a reading comprehension task, it will be crucial for future question answering
92.
- 66.59%: best score so far
- But the model is more robust when CNN + Bi-LSTM is used (accuracy stays more stable as the number of utterances increases)
- ~5% lower than Bi-LSTM
100. Research Contributions

Sentence-based Factoid Question Answering (2016):
- A multi-stage annotation scheme: for sentence-based factoid question answering, using crowdsourcing techniques
- A subtree matching mechanism: for measuring contextual similarity between two sentences
- Exploration of neural architectures for FQA: convolutional neural networks for sentence-based FQA
- Combining multiple QA corpora: improving the performance of QA systems by cross-using multiple sets

Non-factoid Question Answering (2015-2016):
- A semantics-based graph: an abstract representation applied to arithmetic question answering
- Multi-field structural decomposition: for event-based question answering

Applications to Cross-genre Tasks (2016-2017):
- A cross-genre document retrieval task: structure matching for conversational and formal writings
- A multi-gram attention CNN: for the passage completion task on conversational dialog texts
101. The process of the scheme
1. ~500 articles are uniformly sampled from Wikipedia across the following topics: Arts, Country, Food, Historical Events, Movies, Music, Science, Sports, Travel, and TV.
2. Sections that have more than 2 and fewer than 26 sentences are selected.
3. These sections are used in the annotation scheme and are sent to annotators.
4. Four tasks are performed on Mechanical Turk; the fifth task is performed using Elasticsearch.
102. The process of subtree matching
1. For a question-sentence pair, extract a list of overlapping words.
2. For each overlapping pair, extract their tree slices.
3. Perform the matching step on three levels: parents, siblings, and children.
4. Calculate their semantic similarity scores.
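A minimal sketch of the matching step, assuming a simple head-index encoding of dependency trees (`heads[i]` is the index of token i's head, -1 for the root). The binary parent/sibling/child counts here are illustrative assumptions, not the dissertation's exact feature set.

```python
from collections import defaultdict

def children(heads, i):
    """Indices of tokens whose dependency head is token i."""
    return [j for j, h in enumerate(heads) if h == i]

def siblings(heads, i):
    """Tokens sharing token i's head, excluding i itself."""
    return [j for j in children(heads, heads[i]) if j != i] if heads[i] >= 0 else []

def subtree_match_features(q_tokens, q_heads, s_tokens, s_heads):
    """For each word overlapping between question and sentence, check
    whether its parent, any sibling, or any child also overlaps."""
    overlap = set(q_tokens) & set(s_tokens)
    feats = defaultdict(int)
    for qi, word in enumerate(q_tokens):
        if word not in overlap:
            continue
        si = s_tokens.index(word)  # first occurrence is enough for a sketch
        # Parent level: the heads of the two occurrences carry the same word.
        if q_heads[qi] >= 0 and s_heads[si] >= 0 and \
           q_tokens[q_heads[qi]] == s_tokens[s_heads[si]]:
            feats["parent"] += 1
        # Sibling level: the two occurrences share a sibling word.
        if {q_tokens[j] for j in siblings(q_heads, qi)} & \
           {s_tokens[j] for j in siblings(s_heads, si)}:
            feats["sibling"] += 1
        # Child level: the two occurrences share a dependent word.
        if {q_tokens[j] for j in children(q_heads, qi)} & \
           {s_tokens[j] for j in children(s_heads, si)}:
            feats["child"] += 1
    return dict(feats)

# Toy trees for "who led the army" vs. "czuma led the army".
print(subtree_match_features(["who", "led", "the", "army"], [1, -1, 3, 1],
                             ["czuma", "led", "the", "army"], [1, -1, 3, 1]))
# {'child': 2, 'parent': 2}
```

In the actual system these counts would be turned into similarity scores and fed to the CNN alongside the sentence representations.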
103. The experimentation setup
◎ One of the state-of-the-art convolutional neural networks, introduced by Yu et al. (2014), is used to evaluate the QA corpora.
◎ The model consists of a single convolutional layer, max pooling, and then a sigmoid function.
◎ 40 words of the question and 40 words of the answer are used as the input to the model.
◎ The original splits provided by the corpus creators are used.
104. Natural language processing tasks are used to build a graph

Document: a single document of text. Example: a microblog note or a Wikipedia article.
Entity: a set of instances referring to the same instance in a given context. Example: "John went to Emory University. He majored in CompSci."
Instance: the atomic-level object in the graph. Example: "John went to Emory University with Jessica."
Predicate-Argument: an argument that completes the meaning of another instance (the predicate). Example: "This car was sold to Michael two days ago."
Attribute: a modifier of an instance. Example: "Alicia has a black cat."
105. Document retrieval in cross-genre texts
◎ The source texts are Friends TV show scripts, while the target texts (queries) are episode descriptions.
◎ For a list of source texts (episodes) and a query (description), retrieve the source that matches the description.
◎ A structure extraction method is presented that improves retrieval performance.
◎ This task is a preliminary step toward performing question answering on conversational scripts.
106. Target texts - the show’s descriptions
◎ Episode summaries and plots have been crawled from fan sites.
◎ Summaries are one-paragraph texts, usually of 5-6 sentences, that provide a high-level overview of an episode.
◎ Plots are multi-paragraph texts, usually giving a more detailed description of an episode.
◎ Each summary and plot was sentence-segmented, tokenized, and represented as a single query.
◎ A set of over 5,000 queries is used in the experimentation.
107. Improving R@1 given R@10
◎ The extracted relations are now used to improve R@1 when the 10 (k=10) most relevant documents are given.
◎ A two-stage classification setup is presented that uses the extracted relations.
◎ A feed-forward neural network is trained to decide whether the episode ranked at the top (k=1) should be returned as the correct one.
◎ If not, the initial ranking from Elasticsearch is paired with the structure matching scores, and a new prediction is made.
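The two-stage decision above can be sketched as follows. The `accept_top` callable and the linear score combination are illustrative assumptions standing in for the feed-forward network; the candidate triples (id, Elasticsearch score, structure-matching score) are hypothetical.

```python
def rerank_two_stage(candidates, accept_top, alpha=0.5):
    """candidates: list of (doc_id, es_score, struct_score), already
    ordered by the initial Elasticsearch ranking (best first).
    Stage 1: if the classifier accepts the top-ranked episode, return it.
    Stage 2: otherwise combine the ES and structure-matching scores and
    return the best candidate under the combined score."""
    top = candidates[0]
    if accept_top(top):
        return top[0]
    return max(candidates,
               key=lambda c: alpha * c[1] + (1 - alpha) * c[2])[0]

# Toy run: the classifier rejects the ES top hit, so the combined score
# promotes the episode with the stronger structure match.
cands = [("ep1", 0.9, 0.1), ("ep2", 0.8, 0.9)]
print(rerank_two_stage(cands, accept_top=lambda c: False))  # ep2
```

The design point is that stage 2 only runs when stage 1 is uncertain, so the cheap Elasticsearch ranking is kept whenever it already looks correct.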
108. Passage completion for cross-genre texts
◎ It is very difficult to extract actual answers from dialogue data, where an answer might be contained within a single utterance or spread across many.
◎ The machine must first understand the logic of the human dialogue; it cannot simply "extract" the answer from the text, it must infer it.
◎ As a proxy task, we first want to tackle a passage completion task.
◎ A query that consists of one or more entities is tested against a document text (a news article).
109. Tackling the complexity of arithmetic questions
◎ It is relatively easy to develop a question answering system for a single type of question.
◎ Arithmetic questions like those seen on the previous slide require reasoning at an abstract level.
◎ A semantics-based graph approach is presented that builds an abstract representation of the text.
110. Application to arithmetic questions
◎ The task is represented as sequence classification of verb polarities.
◎ Each verb can be positive (+), negative (-), or neutral (0).
◎ A positive/negative verb yields an addition/subtraction operation with its associated chain node; a neutral verb is omitted from the linear equation.
◎ A prediction is made for every recognized verb in a sentence; a linear equation is then formed and solved.
◎ The presented graph is applied to extract abstract features from the text.
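The polarity-to-equation step can be sketched as below. The (verb, polarity, quantity) triples are an illustrative assumption: in the dissertation the polarities come from the classifier over graph features, not from hand-written input.

```python
def form_equation(classified):
    """classified: list of (verb, polarity, quantity) triples, where
    polarity is '+', '-', or '0'.  Positive verbs add their associated
    quantity, negative verbs subtract it, and neutral verbs are dropped;
    the result is the equation string and the solved value of x."""
    terms, x = [], 0
    for verb, polarity, qty in classified:
        if polarity == "0":
            continue  # neutral verbs do not enter the equation
        x += qty if polarity == "+" else -qty
        terms.append(f"{polarity} {qty}")
    return "x = " + " ".join(terms).lstrip("+ "), x

# "A restaurant served 9 pizzas during lunch and 6 during dinner."
eq, ans = form_equation([("served", "+", 9), ("served", "+", 6)])
print(eq, "->", ans)  # x = 9 + 6 -> 15
```

For the balloon example, the green balloons would attach to a neutral-tagged verb occurrence and drop out, leaving x = 31 + 24 = 55.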
111. Hybrid system for event-based question answering
◎ A solution that is a good mix of natural language processing and information retrieval is presented.
◎ NLP structures are used to extract lexical, syntactic, and semantic representations of text.
◎ A Lucene-based search engine is used to index the extracted fields, and then to perform document retrieval on these fields.
◎ The approach is evaluated on the publicly available bAbI dataset, which consists of 20 tasks, each representing a different kind of question answering challenge.
Multi-Field Structural Decomposition for Question Answering, Jurczyk et al., arXiv