Imitation learning is used to address the problem of distant supervision for relation extraction. It decomposes the task into named entity classification (NEC) and relation extraction (RE), allowing the models to be trained separately. Through an iterative process, imitation learning is able to learn the dependencies between NEC and RE even when only labels for RE are provided. This overcomes limitations of prior approaches that rely on distantly labeled data. Evaluation shows the approach improves over baselines by leveraging multi-stage modeling to compensate for mistakes at the NEC stage.
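The distant supervision setting mentioned above can be illustrated with a minimal sketch. It shows only the general labelling heuristic (a sentence that mentions both entities of a known fact is treated as expressing that relation), not the imitation learning method itself. The knowledge base, the sentences and the exact string matching are hypothetical simplifications.

```python
from typing import Dict, List, Tuple

# Hypothetical toy knowledge base: (subject, object) -> relation.
KB: Dict[Tuple[str, str], str] = {
    ("Barack Obama", "Honolulu"): "born_in",
    ("The Beatles", "Abbey Road"): "album",
}


def distant_label(sentence: str, kb: Dict[Tuple[str, str], str]) -> List[Tuple[str, str, str]]:
    """Return (subject, relation, object) for every KB pair whose two
    entity strings both occur in the sentence (naive substring match)."""
    labels = []
    for (subj, obj), rel in kb.items():
        if subj in sentence and obj in sentence:
            labels.append((subj, rel, obj))
    return labels


sentences = [
    "Barack Obama was born in Honolulu, Hawaii.",
    "Barack Obama visited Honolulu last week.",  # matched too, although it does not express born_in
    "The Beatles toured Europe.",                # only one entity present, so no label
]

# Automatically labelled (and noisy) training data.
training_data = [(s, distant_label(s, KB)) for s in sentences]
```

The second sentence receives the same label as the first even though it does not express the relation, which is the kind of noise that distantly labelled data introduces.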
Seed Selection for Distantly Supervised Web-Based Relation Extraction
Isabelle Augenstein
Slides of my presentation on "Seed Selection for Distantly Supervised Web-Based Relation Extraction" at the Semantic Web and Information Extraction workshop (SWAIE) at COLING 2014
Download link for the paper: http://staffwww.dcs.shef.ac.uk/people/I.Augenstein/SWAIE2014-Seed.pdf
Relation Extraction from the Web using Distant Supervision
Isabelle Augenstein
Slides of my presentation on "Relation Extraction from the Web using Distant Supervision" at EKAW 2014. Download link for the paper: http://staffwww.dcs.shef.ac.uk/people/I.Augenstein/EKAW2014-Relation.pdf
Distant supervision for relation extraction without labeled data
Mike Mintz, Steven Bills, Rion Snow, Dan Jurafsky
ACL 2009
I introduced this paper at NAIST Machine Translation Study Group.
Talk at KTH on 14 May 2014 about matrix factorization, different latent and neighborhood models, graphs and energy diffusion for recommender systems, as well as what makes good/bad recommendations.
Presentation of work that will be published at EMNLP 2016.
Ben Eisner, Tim Rocktäschel, Isabelle Augenstein, Matko Bošnjak, Sebastian Riedel. emoji2vec: Learning Emoji Representations from their Description. SocialNLP at EMNLP 2016. https://arxiv.org/abs/1609.08359
Georgios Spithourakis, Isabelle Augenstein, Sebastian Riedel. Numerically Grounded Language Models for Semantic Error Correction. EMNLP 2016. https://arxiv.org/abs/1608.04147
Isabelle Augenstein, Tim Rocktäschel, Andreas Vlachos, Kalina Bontcheva. Stance Detection with Bidirectional Conditional Encoding. EMNLP 2016. https://arxiv.org/abs/1606.05464
USFD at SemEval-2016 - Stance Detection on Twitter with Autoencoders
Isabelle Augenstein
This paper describes the University of Sheffield's submission to the SemEval 2016 Twitter Stance Detection weakly supervised task (SemEval 2016 Task 6, Subtask B). In stance detection, the goal is to classify the stance of a tweet towards a target as "favor", "against", or "none". In Subtask B, the targets in the test data are different from the targets in the training data, thus rendering the task more challenging but also more realistic.
To address the lack of target-specific training data, we use a large set of unlabelled tweets containing all targets and train a bag-of-words autoencoder to learn how to produce feature representations of tweets. These feature representations are then used to train a logistic regression classifier on labelled tweets, with additional features such as an indicator of whether the target is contained in the tweet. Our submitted run on the test data achieved an F1 of 0.3270.
Paper: http://isabelleaugenstein.github.io/papers/SemEval2016-Stance.pdf
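The feature setup described in this abstract can be sketched roughly as follows. This is a simplified illustration under stated assumptions: it builds raw bag-of-words counts and appends the target-in-tweet indicator, but it omits the autoencoder that the paper uses to turn the bag-of-words input into a learned representation, and it stops before the logistic regression step. The function names, the whitespace tokenizer and the example tweets are hypothetical.

```python
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase whitespace tokenizer (a deliberate simplification)."""
    return text.lower().split()


def build_vocab(tweets: List[str]) -> List[str]:
    """Sorted vocabulary over all tweets, labelled or unlabelled."""
    return sorted({tok for tweet in tweets for tok in tokenize(tweet)})


def featurize(tweet: str, target: str, vocab: List[str]) -> List[float]:
    """Bag-of-words counts plus one indicator: is the target mentioned in the tweet?"""
    counts = Counter(tokenize(tweet))
    bow = [float(counts[word]) for word in vocab]
    target_in_tweet = 1.0 if target.lower() in tweet.lower() else 0.0
    return bow + [target_in_tweet]


# Hypothetical example tweets.
tweets = ["we support climate action now", "taxes are too high"]
vocab = build_vocab(tweets)
features = featurize("climate action now", "climate", vocab)
```

Vectors like `features` would then be passed to a classifier. In the paper's setup, the bag-of-words part would first be replaced by the autoencoder's learned representation.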
Introduction to natural language generation with artificial neural networks (ANNs) and a group poetry writing exercise where humans pretend to be neurons in an ANN.
Slides for my tutorial at the ESWC Summer School 2015, giving an introduction to information extraction with Linked Data and an introduction to one of the applications of information extraction, opinion mining.
Deep Learning Models for Question Answering
Sujit Pal
Talk about a hobby project to apply Deep Learning models to predict answers to 8th grade science multiple choice questions for the Allen AI challenge on Kaggle.
Talent Sourcing and Matching - Artificial Intelligence and Black Box Semantic...
Glen Cathey
A deep dive into resume and LinkedIn sourcing and matching solutions claiming to use artificial intelligence, semantic search, and NLP, including how they work, their pros, cons, and limitations, and examples of what sourcers and recruiters can do that even the most advanced automated search and match algorithms can't do. Topics covered include human capital data information retrieval and analysis (HCDIR & A), Boolean and extended Boolean, semantic search, dynamic inference, dark matter resumes and social network profiles, and what I believe to be the ideal resume search and matching solution.
Epistemic networks for Epistemic Commitments
Simon Knight
The ways in which people seek and process information are fundamentally epistemic in nature. Existing epistemic cognition research has tended towards characterizing this fundamental relationship as cognitive or belief-based in nature. This paper builds on recent calls for a shift towards activity-oriented perspectives on epistemic cognition and proposes a new theory of ‘epistemic commitments’. An additional contribution of this paper comes from an analytic approach to this recast construct of epistemic commitments through the use of Epistemic Network Analysis (ENA) to explore connections between particular modes of epistemic commitment. Illustrative examples are drawn from existing research data on children’s epistemic talk when engaged in collaborative information seeking tasks. A brief description of earlier analysis of this data is given alongside a newly conducted ENA to demonstrate the potential for such an approach.
Paper at: http://oro.open.ac.uk/39254/
Bring your own idea - Visual learning analytics
Joris Klerkx
Workshop on visual learning analytics that was part of LASI 2014 - http://www.solaresearch.org/events/lasi-2/lasi2014/
Examples of learning dashboards were presented during the workshop by Sven Charleer:
http://www.slideshare.net/svencharleer/learning-dashboard-visual-learning-analytics-workshop-lasi2014-h-harvard
Building AI Applications using Knowledge Graphs
Andre Freitas
Goals of this Tutorial:
Provide a broad view of the multiple perspectives underlying knowledge graphs.
Show knowledge graphs as a foundation for building AI systems.
Method:
Focus on the contemporary and emerging perspectives.
Sampling exemplar approaches and infrastructures on each of these emerging perspectives (not an exhaustive survey).
Design of learning experiences for science teaching & faculty development - W...
Liz Dorland
Presentation on the design of learning experiences for science teaching & faculty development for the Washington University Education Research Group. What do students "see" in visualizations? What theories of learning apply?
Learning Relations from Social Tagging Data
Hang Dong
An interesting research direction is to discover structured knowledge from user-generated data. Our work aims to find relations among social tags and organise them into hierarchies so as to better support discovery and search for online users. We cast relation discovery in this context as a binary classification problem in supervised learning. This approach takes as input features of two tags extracted using probabilistic topic modelling, and predicts whether a broader-narrower relation holds between them. Experiments were conducted using two large, real-world datasets: the Bibsonomy dataset, which is used to extract tags and their features, and the DBpedia dataset, which is used as the ground truth. Three sets of features were designed and extracted based on topic distributions, similarity and probabilistic associations. Evaluation results with respect to the ground truth demonstrate that our method outperforms existing ones based on various features and heuristics. Future work could study knowledge base enrichment from folksonomies and deep neural network approaches to processing tagging data.
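As a rough illustration of pair features for such a binary classifier, the sketch below compares the topic distributions of two tags. The specific features (cosine similarity and KL divergence in both directions) and the example distributions are assumptions made for illustration. The abstract only states that the features are based on topic distributions, similarity and probabilistic associations.

```python
import math
from typing import List


def cosine(p: List[float], q: List[float]) -> float:
    """Cosine similarity between two topic distributions."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


def kl_divergence(p: List[float], q: List[float], eps: float = 1e-12) -> float:
    """KL(p || q), smoothed with eps so that zero entries in q do not cause errors."""
    return sum(a * math.log((a + eps) / (b + eps)) for a, b in zip(p, q) if a > 0)


def pair_features(p: List[float], q: List[float]) -> List[float]:
    """Features for an ordered tag pair; the asymmetric KL terms let a
    classifier distinguish (broader, narrower) from (narrower, broader)."""
    return [cosine(p, q), kl_divergence(p, q), kl_divergence(q, p)]


# Hypothetical topic distributions over three topics.
broad_tag = [0.4, 0.3, 0.3]     # spread over many topics
narrow_tag = [0.9, 0.05, 0.05]  # concentrated on one topic
features = pair_features(broad_tag, narrow_tag)
```

Feature vectors of this kind, paired with a yes/no label from a ground-truth hierarchy, would form the training set for the classifier.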
Presentation for researchED maths and science on June 11th 2016. References at the end (might be some extra references from slides that were removed later on, this interesting :-)
Interested in discussing, contact me at C.Bokhove@soton.ac.uk or on Twitter @cbokhove
I of course tried to reference all I could. If you have objections to the inclusion of materials, please let me know.
Similar to Distant Supervision with Imitation Learning (20)
Beyond Fact Checking — Modelling Information Change in Scientific Communication
Isabelle Augenstein
Most work on scholarly document processing assumes that the information processed is trustworthy and factually correct. However, this is not always the case. There are two core challenges, which should be addressed: 1) ensuring that scientific publications are credible — e.g. that claims are not made without supporting evidence, and that all relevant supporting evidence is provided; and 2) that scientific findings are not misrepresented, distorted or outright misreported when communicated by journalists or the general public. In this talk, I will present some first steps towards addressing these problems, discussing our research on exaggeration detection, scientific fact checking, and on modelling information change in scientific communication more broadly.
Most work on scholarly document processing assumes that the information processed is trustworthy and factually correct. However, this is not always the case. There are two core challenges, which should be addressed: 1) ensuring that scientific publications are credible -- e.g. that claims are not made without supporting evidence, and that all relevant supporting evidence is provided; and 2) that scientific findings are not misrepresented, distorted or outright misreported when communicated by journalists or the general public. In this talk, I will present some first steps towards addressing these problems, discussing our research on exaggeration detection of scientific claims and on scientific fact checking.
The past decade has seen a substantial rise in the amount of mis- and disinformation online, from targeted disinformation campaigns to influence politics, to the unintentional spreading of misinformation about public health. This development has spurred research in the area of automatic fact checking, a knowledge-intensive and complex reasoning task. Most existing fact checking models predict a claim’s veracity with black-box models, which often lack explanations of the reasons behind their predictions and contain hidden vulnerabilities. The lack of transparency in fact checking systems and ML models, in general, has been exacerbated by increased model size and by “the right…to obtain an explanation of the decision reached” enshrined in European law. This talk presents some first solutions to generating explanations for fact checking models. It then examines how to assess the generated explanations using diagnostic properties, and how further optimising for these diagnostic properties can improve the quality of the generated explanations. Finally, the talk examines how to systematically reveal vulnerabilities of black-box fact checking models.
Most work on scholarly document processing assumes that the information processed is trustworthy and factually correct. However, this is not always the case. There are two core challenges, which should be addressed: 1) ensuring that scientific publications are credible -- e.g. that claims are not made without supporting evidence, and that all relevant supporting evidence is provided; and 2) that scientific findings are not misrepresented, distorted or outright misreported when communicated by journalists or the general public. I will present some first steps towards addressing these problems and outline remaining challenges.
Towards Explainable Fact Checking (DIKU Business Club presentation)
Isabelle Augenstein
Outline:
- Fact checking – what is it and why do we need it?
- False information online
- Content-based automatic fact checking
- Explainability – what is it and why do we need it?
- Making the right predictions for the right reasons
- Model training pipeline
- Explainable fact checking – some first solutions
- Rationale selection
- Generating free-text explanations
- Wrap-up
Tutorial on 'Explainability for NLP' given at the first ALPS (Advanced Language Processing) winter school: http://lig-alps.imag.fr/index.php/schedule/
The talk introduces the concepts of 'model understanding' as well as 'decision understanding' and provides examples of approaches from the areas of fact checking and text classification.
Exercises to go with the tutorial are available here: https://github.com/copenlu/ALPS_2021
Automatic fact checking is one of the more involved NLP tasks currently researched: not only does it require sentence understanding, but also an understanding of how claims relate to evidence documents and world knowledge. Moreover, there is still no common understanding in the automatic fact checking community of how the subtasks of fact checking — claim check-worthiness detection, evidence retrieval, veracity prediction — should be framed. This is partly owing to the complexity of the task, despite efforts to formalise the task of fact checking through the development of benchmark datasets.
The first part of the talk will be on automatically generating textual explanations for fact checking, thereby exposing some of the reasoning processes these models follow. The second part of the talk will be on re-examining how claim check-worthiness is defined, and how check-worthy claims can be detected; followed by how to automatically generate claims which are hard to fact-check automatically.
Talk on 'Tracking False Information Online' at W-NUT workshop at EMNLP 2019.
=========
Digital media enables fast sharing of information and discussions among users. While this comes with many benefits to today’s society, such as broadening information access, the manner in which information is disseminated also has obvious downsides. Since fast access to information is expected by many users and news outlets are often under financial pressure, speedy access often comes at the expense of accuracy, which leads to misinformation. Moreover, digital media can be misused by campaigns to intentionally spread false information, i.e. disinformation, about events, individuals or governments. In this talk, I will present on different ways false information is spread online, including misinformation and disinformation. I will then report findings from our recent and ongoing work on automatic fact checking, stance detection and framing attitudes.
What can typological knowledge bases and language representations tell us abo...
Isabelle Augenstein
One of the core challenges in typology is to record properties of languages in a structured way. As a result of manual efforts, typological knowledge bases have emerged, which contain information about languages’ phonological, morphological and syntactic properties, as well as information about language families. Ideally, such typological knowledge bases would provide useful information for multilingual NLP models to learn how to selectively share parameters.
A related area of research suggests a different way of encoding properties of languages, namely to learn language representation vectors directly from text documents.
In this talk, I will analyse and contrast these two ways of encoding linguistic properties, as well as present research on how the two can benefit one another.
Multi-task Learning of Pairwise Sequence Classification Tasks Over Disparate ...
Isabelle Augenstein
Paper presented at NAACL 2018. Link: https://arxiv.org/abs/1802.09913
Abstract:
============
We combine multi-task learning and semi-supervised learning by inducing a joint embedding space between disparate label spaces and learning transfer functions between label embeddings, enabling us to jointly leverage unlabelled data and auxiliary, annotated datasets. We evaluate our approach on a variety of sequence classification tasks with disparate label spaces. We outperform strong single and multi-task baselines and achieve a new state-of-the-art for topic-based sentiment analysis.
Learning with limited labelled data in NLP: multi-task learning and beyond
Isabelle Augenstein
When labelled training data for certain NLP tasks or languages is not readily available, different approaches exist to leverage other resources for the training of machine learning models. Those are commonly either instances from a related task or unlabelled data.
An approach that has been found to work particularly well when only limited training data is available is multi-task learning.
There, a model learns from examples of multiple related tasks at the same time by sharing hidden layers between tasks, and can therefore benefit from a larger overall number of training instances and extend the models' generalisation performance. In the related paradigm of semi-supervised learning, unlabelled data as well as labelled data for related tasks can be easily utilised by transferring labels from labelled instances to unlabelled ones in order to essentially extend the training dataset.
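The idea of sharing hidden layers between tasks can be sketched as follows. This is a minimal forward-pass illustration of hard parameter sharing, assuming one shared hidden layer and one softmax output layer per task. It contains no training loop, and the class name, layer sizes and task names are hypothetical.

```python
import math
import random
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply matrix w (rows x cols) by vector x (cols)."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def softmax(z: Vector) -> Vector:
    """Numerically stable softmax."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class SharedEncoderModel:
    """One hidden layer shared by all tasks, plus one output layer per task."""

    def __init__(self, n_in: int, n_hidden: int, task_sizes: Dict[str, int], seed: int = 0):
        rng = random.Random(seed)
        self.shared = random_matrix(n_hidden, n_in, rng)  # used by every task
        self.heads = {task: random_matrix(n_out, n_hidden, rng)
                      for task, n_out in task_sizes.items()}  # task-specific

    def forward(self, x: Vector, task: str) -> Vector:
        hidden = [math.tanh(h) for h in matvec(self.shared, x)]
        return softmax(matvec(self.heads[task], hidden))


# Hypothetical tasks with different label spaces.
model = SharedEncoderModel(n_in=4, n_hidden=3, task_sizes={"stance": 3, "sentiment": 2})
stance_probs = model.forward([1.0, 0.0, 0.0, 1.0], "stance")
sentiment_probs = model.forward([1.0, 0.0, 0.0, 1.0], "sentiment")
```

During training, examples from every task would update `shared`, while each entry in `heads` would be updated only by its own task's examples. This is how the shared layer benefits from the larger overall number of training instances.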
In this talk, I will present my recent and ongoing work in the space of learning with limited labelled data in NLP, including our NAACL 2018 papers 'Multi-task Learning of Pairwise Sequence Classification Tasks Over Disparate Label Spaces' [1] and 'From Phonology to Syntax: Unsupervised Linguistic Typology at Different Levels with Language Embeddings' [2].
[1] https://t.co/A5jHhFWrdw
[2] https://arxiv.org/abs/1802.09375
==========
Bio from my website http://isabelleaugenstein.github.io/index.html:
I am a tenure-track assistant professor at the University of Copenhagen, Department of Computer Science since July 2017, affiliated with the CoAStAL NLP group and work in the general areas of Statistical Natural Language Processing and Machine Learning. My main research interests are weakly supervised and low-resource learning with applications including information extraction, machine reading and fact checking.
Before starting a faculty position, I was a postdoctoral research associate in Sebastian Riedel's UCL Machine Reading group, mainly investigating machine reading from scientific articles. Prior to that, I was a Research Associate in the Sheffield NLP group, a PhD Student in the University of Sheffield Computer Science department, a Research Assistant at AIFB, Karlsruhe Institute of Technology and a Computational Linguistics undergraduate student at the Department of Computational Linguistics, Heidelberg University.
Spreading of mis- and disinformation is growing and is having a big impact on interpersonal communications, politics and even science.
Traditional methods, e.g. manual fact-checking by reporters, cannot keep up with the growth of information. On the other hand, there has been much progress in natural language processing recently, partly due to the resurgence of neural methods.
How can natural language processing methods fill this gap and help to automatically check facts?
This talk will explore different ways to frame fact checking and detail our ongoing work on learning to encode documents for automated fact checking, as well as describe future challenges.
SemEval 2017 Task 10: ScienceIE – Extracting Keyphrases and Relations from Sc...
Isabelle Augenstein
Shared task summary for SemEval 2017 Task 10: ScienceIE – Extracting Keyphrases and Relations from Scientific Publications
Paper: https://arxiv.org/abs/1704.02853
Abstract:
We describe the SemEval task of extracting keyphrases and relations between them from scientific documents, which is crucial for understanding which publications describe which processes, tasks and materials. Although this was a new task, we had a total of 26 submissions across 3 evaluation scenarios. We expect the task and the findings reported in this paper to be relevant for researchers working on understanding scientific content, as well as the broader knowledge base population and information extraction communities.
Extracting Relations between Non-Standard Entities using Distant Supervision ...
Isabelle Augenstein
Poster for our EMNLP paper on extracting non-standard relations from the Web with distant supervision and imitation learning. Read the full paper here: https://aclweb.org/anthology/D/D15/D15-1086.pdf
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
Neo4j
Neha Bajwa, Vice President of Product Marketing, Neo4j
Join us as we explore breakthrough innovations enabled by interconnected data and AI. Discover firsthand how organizations use relationships in data to uncover contextual insights and solve our most pressing challenges – from optimizing supply chains, detecting fraud, and improving customer experiences to accelerating drug discoveries.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with a passion for technology and making things work, along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations on CI/CD and application security integrated into the software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
UiPath Test Automation using UiPath Test Suite series, part 6
DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 6. In this session, we will cover Test Automation with generative AI and OpenAI.
The UiPath Test Automation with generative AI and OpenAI webinar offers an in-depth exploration of leveraging cutting-edge technologies for test automation within the UiPath platform. Attendees will delve into the integration of generative AI, a test automation solution, with OpenAI's advanced natural language processing capabilities.
Throughout the session, participants will discover how this synergy empowers testers to automate repetitive tasks, enhance testing accuracy, and expedite the software testing life cycle. Topics covered include the seamless integration process, practical use cases, and the benefits of harnessing AI-driven automation for UiPath testing initiatives. By attending this webinar, testers, and automation professionals can gain valuable insights into harnessing the power of AI to optimize their test automation workflows within the UiPath ecosystem, ultimately driving efficiency and quality in software development processes.
What will you get from this session?
1. Insights into integrating generative AI.
2. Understanding how this integration enhances test automation within the UiPath platform
3. Practical demonstrations
4. Exploration of real-world use cases illustrating the benefits of AI-driven test automation for UiPath
Topics covered:
What is generative AI
Test Automation with generative AI and OpenAI
UiPath integration with generative AI
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Pushing the limits of ePRTC: 100ns holdover for 100 days
Adtran
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
GridMate - End to end testing is a critical piece to ensure quality and avoid...
ThomasParaiso2
End to end testing is a critical piece to ensure quality and avoid regressions. In this session, we share our journey building an E2E testing pipeline for GridMate components (LWC and Aura) using Cypress, JSForce, FakerJS…
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
Neo4j
Leonard Jayamohan, Partner & Generative AI Lead, Deloitte
This keynote will reveal how Deloitte leverages Neo4j’s graph power for groundbreaking digital twin solutions, achieving a staggering 100x performance boost. Discover the essential role knowledge graphs play in successful generative AI implementations. Plus, get an exclusive look at an innovative Neo4j + Generative AI solution Deloitte is developing in-house.
Unlocking Productivity: Leveraging the Potential of Copilot in Microsoft 365, a presentation by Christoforos Vlachos, Senior Solutions Manager – Modern Workplace, Uni Systems
A tale of scale & speed: How the US Navy is enabling software delivery from l...
sonjaschweigert1
Rapid and secure feature delivery is a goal across every application team and every branch of the DoD. The Navy’s DevSecOps platform, Party Barge, has achieved:
- Reduction in onboarding time from 5 weeks to 1 day
- Improved developer experience and productivity through actionable findings and reduction of false positives
- Maintenance of superior security standards and inherent policy enforcement with Authorization to Operate (ATO)
Development teams can ship efficiently and ensure applications are cyber ready for Navy Authorizing Officials (AOs). In this webinar, Sigma Defense and Anchore will give attendees a look behind the scenes and demo secure pipeline automation and security artifacts that speed up application ATO and time to production.
We will cover:
- How to remove silos in DevSecOps
- How to build efficient development pipeline roles and component templates
- How to deliver security artifacts that matter for ATO’s (SBOMs, vulnerability reports, and policy evidence)
- How to streamline operations with automated policy checks on container images
Threats to mobile devices are more prevalent and increasing in scope and complexity. Users of mobile devices desire to take full advantage of the features available on those devices, but many of the features provide convenience and capability but sacrifice security. This best practices guide outlines steps the users can take to better protect personal devices and information.
UiPath Test Automation using UiPath Test Suite series, part 5
DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 5. In this session, we will cover CI/CD with DevOps.
Topics covered:
CI/CD with in UiPath
End-to-end overview of CI/CD pipeline with Azure devops
Speaker:
Lyndsey Byblow, Test Suite Sales Engineer @ UiPath, Inc.
The Art of the Pitch: WordPress Relationships and SalesLaura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if sometime changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...SOFTTECHHUB
The choice of an operating system plays a pivotal role in shaping our computing experience. For decades, Microsoft's Windows has dominated the market, offering a familiar and widely adopted platform for personal and professional use. However, as technological advancements continue to push the boundaries of innovation, alternative operating systems have emerged, challenging the status quo and offering users a fresh perspective on computing.
One such alternative that has garnered significant attention and acclaim is Nitrux Linux 3.5.0, a sleek, powerful, and user-friendly Linux distribution that promises to redefine the way we interact with our devices. With its focus on performance, security, and customization, Nitrux Linux presents a compelling case for those seeking to break free from the constraints of proprietary software and embrace the freedom and flexibility of open-source computing.
Securing your Kubernetes cluster_ a step-by-step guide to success !KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
Securing your Kubernetes cluster_ a step-by-step guide to success !
Distant Supervision with Imitation Learning
1. Distant Supervision with
Imitation Learning
Isabelle Augenstein
i.augenstein@sheffield.ac.uk
Department of Computer Science, University of Sheffield, UK
Joint work with Andreas Vlachos, Diana Maynard (EMNLP 2015)
30 November 2015
Heriot-Watt University Computer Science Seminar
2. 2
Talk Overview
• Relation Extraction from the Web with Distant Supervision
• Extracting Relations from Web pages
• Relations are used for populating Knowledge Bases
• Distant Supervision makes it possible to automatically generate relation extraction
training data using a knowledge base
Ø No manual effort necessary
3. 3
Talk Overview
• Imitation Learning for Distant Supervision
• Relation extraction relies on recognising and classifying named entities,
but sentences only have relation annotations
• Suitable manually labeled NERC training data can be difficult to obtain
• Imitation Learning decomposes a task (RE) into a sequence of actions
(e.g. NEC, RE) and is able to deal with latent variables
• Imitation Learning is a structured prediction method, also called
learning-to-search, inverse reinforcement learning
Ø Only labels for last action (RE) needed, no additional manual effort
4. 4
• Large knowledge bases are useful for search, question
answering etc.
Overall Problem
Structured Information from
Google Knowledge Graph
5. 5
• Large knowledge bases are useful for search, question
answering etc. but far from complete
Overall Problem
Structured Information from
Google Knowledge Graph
Band members,
genre missing
6. 6
• Large knowledge bases are useful for search, question
answering etc. but far from complete
• Approach: automatic knowledge base population (KBP)
methods using Web information extraction (IE)
1) Extracting entities and relations between them from text on Web pages
2) Combining information from several sources to populate KBs
Overall Problem
7. 7
Relation extraction for knowledge base completion
• Given subject and name of relation, find object of relation in corpus
• E.g. “Where was Bill Gates born?”
• Answer: birthplace(Bill Gates, Seattle_Washington)
Relation Extraction Overview
Example: Bill Gates was born in Seattle, Washington
(relation: birthplace; type of the object "Seattle, Washington": LOC)
8. 8
• Why distant supervision for relation extraction (RE)?
• RE methods requiring manual effort
• Rule-based approaches: manually created patterns, e.g.
“X is a professor at Y”
• Supervised learning: statistical models, manually annotated training data
Ø Biased towards a domain, e.g. Biology, newswire, Wikipedia
• RE methods requiring no manual effort
• Bootstrapping: semi-supervised, learning patterns iteratively starting with
prior knowledge, e.g. list of names
Ø “Semantic drift”, e.g. “X is a professor at Y” -> “X lives in Y”
• Open Information Extraction: unsupervised learning, discovering
patterns, clustering
Ø Difficult to map to schema
Existing Approaches
9. 9
“If two entities participate in a relation, any sentence that contains those two
entities might express that relation.” (Mintz et al., 2009)
Knowledge base:
Name (different lexicalisations)              Genre              …
Amy Winehouse, Amy Jade Winehouse, Wino, …    R&B, soul, jazz, … …
Blur, …                                       Britpop, …         …
Spice Girls, …                                pop, …             …
Sentences:
• Amy Jade Winehouse was a singer and songwriter known for her eclectic mix of musical genres including R&B, soul and jazz.
• Blur helped to popularise the Britpop genre.
• Beckham rose to fame with the all-female pop group Spice Girls.
Distant Supervision
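The labelling heuristic on this slide can be sketched in a few lines of Python. This is only an illustration: the `KB` dictionary and the `label_sentence` helper are hypothetical names, the data is the toy example from the slide, and plain substring matching stands in for proper entity matching.

```python
from typing import Dict, List, Tuple

# Toy knowledge base from the slide: subject -> (lexicalisations, genres).
KB: Dict[str, Tuple[List[str], List[str]]] = {
    "Amy Winehouse": (["Amy Winehouse", "Amy Jade Winehouse", "Wino"],
                      ["R&B", "soul", "jazz"]),
    "Blur": (["Blur"], ["Britpop"]),
    "Spice Girls": (["Spice Girls"], ["pop"]),
}


def label_sentence(sentence: str) -> List[Tuple[str, str, str]]:
    """Return (subject, 'genre', object) for every KB pair found in the sentence.

    Assumption: crude substring matching; a real system would match tokens.
    """
    found = []
    for subject, (names, genres) in KB.items():
        # The subject counts as mentioned if any of its lexicalisations occurs.
        if any(name in sentence for name in names):
            for genre in genres:
                if genre in sentence:
                    found.append((subject, "genre", genre))
    return found


blur = label_sentence("Blur helped to popularise the Britpop genre.")
spice = label_sentence(
    "Beckham rose to fame with the all-female pop group Spice Girls.")
```

The second sentence is labelled as a positive example for genre(Spice Girls, pop) even though it is about Beckham, which is the kind of noise the later slides on distant supervision challenges discuss.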
10. 10
1) Creating positive & negative training examples
2) Feature Extraction
3) Classifier Training
4) Prediction of New Relations
Distant Supervision
11. 11
Distant Supervision
1) Creating positive & negative training examples
KB: album(The Beatles, Abbey Road)
Positive: The Beatles released their album Abbey Road in 1969.
Negative: The Beatles played in Edinburgh.
2) Feature Extraction
depLemmaPath=released_OJB, possPath=VBD_PRP_album, …
3) Classifier Training
possPath=_release+VBN=0.354677, depLemmaPath=_release=1.81213, …
4) Prediction of New Relations
Michael Jackson’s third album is Music & Me
album(Michael Jackson, Music & Me)
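A minimal end-to-end sketch of the four steps, using the example sentences from this slide. Everything here is an assumption made for illustration: the `features` and `train_perceptron` helpers are hypothetical, a bag of words between subject and object stands in for the dependency and POS path features, and a plain perceptron stands in for the actual classifier.

```python
from collections import defaultdict


def features(sentence, subj, obj):
    """Lower-cased tokens between subject and object (stand-in for path features)."""
    start = sentence.index(subj) + len(subj)
    end = sentence.index(obj)
    return {"between=" + tok.lower() for tok in sentence[start:end].split()}


def train_perceptron(examples, epochs=5):
    """Binary perceptron over sparse features; returns a feature -> weight map."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for feats, label in examples:
            predicted = sum(weights[f] for f in feats) > 0
            if predicted != label:
                delta = 1.0 if label else -1.0
                for f in feats:
                    weights[f] += delta
    return weights


# Steps 1 and 2: distantly labelled examples for album(The Beatles, Abbey Road).
train = [
    (features("The Beatles released their album Abbey Road in 1969.",
              "The Beatles", "Abbey Road"), True),
    (features("The Beatles played in Edinburgh.",
              "The Beatles", "Edinburgh"), False),
]
# Step 3: train the classifier.
weights = train_perceptron(train)
# Step 4: predict a new relation for an unseen sentence.
test_feats = features("Michael Jackson’s third album is Music & Me",
                      "Michael Jackson", "Music & Me")
is_album = sum(weights[f] for f in test_feats) > 0
```

With this toy data the only shared evidence is the token "album", which is enough for the new sentence to be classified as expressing the album relation.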
12. 12
Distant Supervision
1) Creating positive & negative training examples
2) Feature Extraction
3) Classifier Training
4) Prediction of New Relations
Distant Supervision: Supervised learning + Automatically generated training data
13. 13
• Requires no manual effort
• Automatically label text with relations from knowledge base
• Train statistical model (not patterns)
• Extract relations with respect to knowledge base
Ø Combine benefits of supervised approaches (learn statistical
model) and bootstrapping RE approaches (only list of extractions
as input)
Distant Supervision
14. 14
• Web crawl corpus, created using entity-specific search
queries, e.g. “`The Beatles’ Musical Artist album”
Evaluation: Corpus
Class                    Property / Relation
Book                     author, characters
Musical Artist           album, record label, track
Film                     director, producer, actor, character
Politician               birthplace, educational institution, spouse
Business                 employees, founders
Educational Institution  mascot, city
River                    origin, mouth
15. 15
• Distant Supervision does not require manual annotation but
depends on NERC for candidate identification
NERC for Distant Supervision
Example: Bill Gates was born in Seattle, Washington
(relation: birthplace; type of the object "Seattle, Washington": LOC)
16. 16
• Existing works use Stanford NER (Finkel et al. 2005) or
FIGER (Ling and Weld 2012)
Stanford NER: Location, Person, Organisation, Misc
FIGER:
14 Location (City, Country, County, Province, Railway, …)
15 Person (Actor, Architect, Artist, Musician, Terrorist, …)
13 Org (Airline, Company, Educational_Institution, …)
13 Product (Car, Train, Camera, Software, Weapon, …)
9 Building (Airport, Hospital, Restaurant, Theater, …)
5 Art (Film, Play, Written_Work, Music, Newspaper)
7 Event (Election, Military_Conflict, Terrorist_Attack, …)
30 Misc (Time, Educational_Degree, Drug, Algorithm, …)
NERC for Distant Supervision
17. 17
• Problem 1: missing NE types even with fine-grained schemas
album
Michael Jackson’s third album is Music & Me
Musician ? Misc
NERC for Distant Supervision
18. 18
• Problem 1: missing NE types even with fine-grained schemas
• Problem 2: domain difference between training and testing
data (e.g. newswire, Wikipedia vs. Web)
album
Michael Jackson’s third album is Music & Me
? Misc
NERC for Distant Supervision
19. 19
• Task decomposition
• NER: Named Entity Boundary Recognition
• NEC: Assigning Types to NEs
• RE: Relation Extraction
• Solution 1:
• NER: recognise NEs with heuristics (e.g. POS-based, HTML)
• NEC: apply trained model (e.g. Stanford, FIGER), add labels of objects
to RE features
• RE: train model with distantly annotated data as usual
• NER Heuristics:
• Noun phrases, capitalised phrases
• Phrases from HTML markup: <a href>, <li>, <h1>, <h2>, <h3>,
<strong>, <b>, <em>, <i>
NERC for Distant Supervision
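The two kinds of NER heuristics could look roughly like the following. This is a sketch under assumptions, not the implementation used in the experiments: the regular expression for capitalised phrases, the `capitalised_phrases` and `markup_phrases` helpers and the `MarkupPhraseCollector` class are made up for illustration, and the HTML is assumed to be well-formed (every listed tag is closed).

```python
import re
from html.parser import HTMLParser

# Runs of capitalised words, optionally joined by "&" (e.g. "Music & Me").
CAPITALISED = re.compile(r"\b[A-Z]\w*(?:\s+(?:&\s+)?[A-Z]\w*)*")

# Tags from the slide whose text content is treated as an entity candidate.
MARKUP_TAGS = {"a", "li", "h1", "h2", "h3", "strong", "b", "em", "i"}


def capitalised_phrases(text):
    """Return all capitalised phrases as NE candidates."""
    return CAPITALISED.findall(text)


class MarkupPhraseCollector(HTMLParser):
    """Collects text that appears inside any of the MARKUP_TAGS."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # how many candidate tags are currently open
        self.phrases = []

    def handle_starttag(self, tag, attrs):
        if tag in MARKUP_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in MARKUP_TAGS and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self.depth > 0:
            self.phrases.append(text)


def markup_phrases(html):
    """Return phrases marked up with links, list items, headers or emphasis."""
    collector = MarkupPhraseCollector()
    collector.feed(html)
    collector.close()
    return collector.phrases


caps = capitalised_phrases("Michael Jackson’s third album is Music & Me")
marked = markup_phrases(
    "<p>Arctic Monkeys are a <b>rock band</b> from Sheffield.</p>"
    "<ul><li>AM</li></ul>")
```

Both heuristics favour recall: they produce many candidates, and later stages have to decide which of them are entities of the right type.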
20. 20
album
Michael Jackson’s third album is Music & Me
O
NERC for Distant Supervision
• Solution 1:
• NER: recognise NEs with heuristics (e.g. POS-based, HTML)
• NEC: add object candidate labels (e.g. with Stanford, FIGER)
• RE: train model with distantly annotated data as usual
• RE features: ne=O, depLemmaPath=poss_album_subj,
possPath=POS_JJ_album_VBZ, …
21. 21
• Experiments with 16 relations (e.g. album, character, record
label, author, origin)
Recall of NER with off-the-shelf Stanford model compared to
heuristics
NERC for Distant Supervision
22. 22
• Solution 2:
• NER: with heuristics
• NEC & RE: train one-stage model
• NEC features: obj=Music & Me, w[-1-2]=album is, …
• RE features: depLemmaPath=poss_album_subj,
possPath=POS_JJ_album_VBZ, …
album
Michael Jackson’s third album is Music & Me
NERC for Distant Supervision
23. 23
• Solution 2:
• NER: with heuristics
• NEC & RE: train one-stage model
• Problem 3: NEC features useful for RE but
• RE features are sparse (e.g. path between subject and object)
• NEC features can overpower RE features
album
Michael Jackson’s third album is Music & Me
NERC for Distant Supervision
24. 24
• Problem 3: NEC features useful for RE but:
• RE features are sparse (e.g. path between subject and object)
• NEC features can overpower RE features
Ø Model would incorrectly predict Steven Spielberg,
because context is stronger (w[-1]=director)
One of director Steven Spielberg’s greatest heroes
was Alfred Hitchcock, the mastermind behind
Psycho.
Candidates for director relation with subject Psycho:
Steven Spielberg, Alfred Hitchcock
NERC for Distant Supervision
25. 25
• Ideal Solution:
• NER: with heuristics
• NEC: trained classifier
• RE: trained classifier
Ø That would be great, but how can we do this without NEC
training data?
NERC for Distant Supervision
26. 26
• Imitation learning with DAGGER (Ross et al. 2011)
• Also called learning-to-search, inverse reinforcement learning
• Structured prediction method
• Able to deal with latent variables, only labels for last stage (RE) needed
• Decompose tasks into sequence of actions made at different stages
• Dependencies between tasks are learnt by appropriate generation of
training examples
• Classifiers are trained iteratively
• Relationship between Reinforcement Learning and
Imitation learning
• In reinforcement learning, the policy is being learnt and the actions are given
• In imitation learning, the policy is given and the actions are learnt
• (hence inverse)
Imitation Learning for Distant
Supervision
27. 27
Imitation Learning for Distant
Supervision
• Learning from demonstrator
• Possible actions are given
• Correctness of actions (i.e.
costs) are assessed by
taking actions, predicting
remaining ones and
evaluating result
• Dependencies between
actions are learnt by
observation
• Origins of Imitation learning
• Robotics
• Game playing (e.g. Ortega et al. 2013)
• Mario’s possible actions (simplified): move left, move right,
duck, run, jump, fire
28. 28
Imitation Learning for Distant
Supervision
• Imitation Learning for NLP
• Actions: NEC, followed by RE if NEC is positive
• Demonstrator (expert policy) tries to replicate labelled RE data
• Base classifier: cost sensitive classification learning with PA
(passive-aggressive classifier)
• NEC labels are needed but not specified by labelled RE data
• Solution: look-ahead!
29. 29
• Iteration 1, NEC Stage
Imitation Learning for Distant
Supervision
Michael Jackson’s third album is Music & Me (NEC label: ?)
           True  False  Features
NEC Stage  ?     ?      obj=Music & Me, …
RE Stage                depLemma=poss_album_subj, …
30. 30
• Iteration 1, RE Stage
Imitation Learning for Distant
Supervision
Michael Jackson’s third album is Music & Me (RE label: True, NEC label: ?)
           True  False  Features
NEC Stage  ?     ?      obj=Music & Me, …
RE Stage   0     1      depLemma=poss_album_subj, …
31. 31
• Iteration 1, RE Stage
Imitation Learning for Distant
Supervision
Michael Jackson’s third album is Music & Me (RE label: True, NEC label: True)
           True  False  Features
NEC Stage  0     1      obj=Music & Me, …
RE Stage   0     1      depLemma=poss_album_subj, …
32. 32
• Iteration 1
• NEC and RE Stage: predict labels according to labelled data
(expert policy) with look-ahead
• Extract features
• Assess costs
• CSC example: features, costs -> will be remembered for next iterations!
• Train classifier for each stage based on CSC example (learned policy)
Imitation Learning for Distant
Supervision
33. 33
• Iteration 1
• NEC and RE Stage: predict labels according to labelled data
(expert policy) with look-ahead
• Extract features
• Assess costs
• CSC example: features, costs -> will be remembered for next iterations!
• Train classifier for each stage based on CSC example (learned policy)
• Iteration >= 2
• Predict labels according to expert policy or learned policy
• The policy is chosen stochastically: the expert policy with probability p = (1−β)^(i−1), otherwise the learned policy
(i: iteration number, β: learning rate)
• With each iteration it is more likely that the learned policy is chosen
• The bigger the learning rate, the faster the learner moves away from labelled
data
Imitation Learning for Distant
Supervision
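The stochastic choice between the two policies can be sketched as below, reading p = (1−β)^(i−1) as the probability of following the expert policy in iteration i. That reading, and the `expert_probability` and `choose_policy` helpers, are assumptions made for this sketch.

```python
import random


def expert_probability(iteration: int, beta: float) -> float:
    """p = (1 - beta)^(i - 1): equals 1 in iteration 1 and decays afterwards."""
    return (1.0 - beta) ** (iteration - 1)


def choose_policy(iteration: int, beta: float, rng: random.Random) -> str:
    """Pick 'expert' with probability p, otherwise 'learned'."""
    if rng.random() < expert_probability(iteration, beta):
        return "expert"
    return "learned"


rng = random.Random(0)
first = choose_policy(1, 0.3, rng)   # iteration 1 always follows the expert
p_third = expert_probability(3, 0.5)
```

With β = 0.5 the expert is followed with probability 0.25 by the third iteration, so most training examples are then generated from the learner's own predictions.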
34. 34
• Reminder: Problem 3: NEC features useful for RE but:
• RE features are sparse (e.g. path between subject and object)
• NEC features can overpower RE features
Ø Model would incorrectly predict Steven Spielberg,
because context is stronger (w[-1]=director)
One of director Steven Spielberg’s greatest heroes
was Alfred Hitchcock, the mastermind behind
Psycho.
Candidates for director relation with subject Psycho:
Steven Spielberg, Alfred Hitchcock
NERC for Distant Supervision
35. 35
• Multi-stage modelling compensates for mistakes
Imitation Learning for Distant
Supervision
Steven Spielberg’s greatest heroes (…) Psycho (NEC label: True, RE label: False)
           Confidence  Prediction  Features
NEC Stage  0.629       True        obj=Steven Spielberg, …
RE Stage   -0.571      False       depLemma=_POSS_heroes_ …
36. 36
• Multi-stage modelling compensates for mistakes
Imitation Learning for Distant
Supervision
Alfred Hitchcock, the mastermind behind Psycho (NEC label: True, RE label: True)
           Confidence  Prediction  Features
NEC Stage  0.629       True        obj=Alfred Hitchcock, …
RE Stage   0.571       True        depLemma=_APPOS_mastermind …
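The two-stage decision for the two director candidates can be written out as a small sketch. The confidence values are the ones shown on the slides; the `predict_relation` helper and the decision threshold of 0 are assumptions for illustration.

```python
def predict_relation(nec_score: float, re_score: float) -> bool:
    """Two-stage decision: RE is only consulted if NEC accepts the candidate."""
    if nec_score <= 0:
        return False          # rejected at the NEC stage, RE is skipped
    return re_score > 0       # NEC positive: the RE stage has the final say


# (NEC confidence, RE confidence) per candidate, as shown on the slides.
candidates = {
    "Steven Spielberg": (0.629, -0.571),
    "Alfred Hitchcock": (0.629, 0.571),
}
directors = [name for name, (nec, rel) in candidates.items()
             if predict_relation(nec, rel)]
```

Both candidates pass the NEC stage, but only Alfred Hitchcock passes the RE stage, so the strong NEC context feature no longer decides the relation on its own.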
37. 37
• Web crawl corpus, created using entity-specific search
queries, e.g. “`The Beatles’ Musical Artist album”
Evaluation: Corpus
Class                    Property / Relation
Book                     author, characters
Musical Artist           album, record label, track
Film                     director, producer, actor, character
Politician               birthplace, educational institution, spouse
Business                 employees, founders
Educational Institution  mascot, city
River                    origin, mouth
38. 38
• Improving NEC for RE with Web Features
Evaluation: NEC Features
Arctic Monkeys
Arctic Monkeys are a rock band from Sheffield,
famous for albums such as AM.
Albums:
- Whatever People Say I Am, That's What I'm Not
- AM
header
link
bold
list
39. 39
• NEC:
• Word features: Object occurrence, POS, digit and capitalisation
pattern etc.
• Context features: 2 words to left and right: BOW, sequence, bag of
POS, POS sequence, as 1-grams and 2-grams
• Web features
Ø Best F1 and P-avg achieved with all of those
• RE:
• Context features (as for NEC)
• POS and words between subject and object, as seq and BOW
• Dependency path with/without lemmas
Ø Best F1 and P-avg with sparse dependency features and 2-gram
context features
Evaluation: Features
40. 40
Evaluation Setting
• Models:
• All models: NER with candidate identification heuristics (POS,
Web-based)
• Rel only: one-stage, only relation features
• Stanf: one-stage with Stanf NEC labels added to RE features
• FIGER: one-stage with FIGER labels added to RE features
• OS: one-stage with NEC features added to RE features
• IL: two-stage with imitation learning
42. 42
Conclusions EMNLP Experiments
• Imitation learning approach outperforms baselines with
supervised NEC (Stanford NER and FIGER) by 10 points in
average precision
• For NEC: Web features such as appearance in lists or links to
other Web pages improve average precision by 7 points
• For RE: sparse, high-precision features (such as parse features)
outperform high-recall low-precision features (such as BOW
features)
43. 43
Distant Supervision Challenges
• Automatically generating training data
• Can lead to noisy training examples
Let It Be is the twelfth album by The Beatles which contains their hit single Let It Be.
Name         Album      Track
The Beatles  Let It Be  Let It Be
…            …          …
44. 44
Distant Supervision Challenges
• Automatically generating training data
• Can lead to noisy training examples
• Use ‘Let It Be’ mentions as positive training examples for album or for
track?
• Problem: if both mentions of ‘Let It Be’ are used to extract features for
both album and track, wrong weights are learnt
Let It Be is the twelfth album by The Beatles which contains their hit single Let It Be.
Name         Album      Track
The Beatles  Let It Be  Let It Be
…            …          …
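The ambiguity can be made concrete with a short sketch of naive labelling, in which every mention of an object is labelled with every relation the knowledge base lists for it. The `KB` layout and the `naive_labels` helper are hypothetical and only illustrate the problem; they are not a proposed fix.

```python
# Toy knowledge base from the slide: (subject, relation) -> objects.
KB = {
    ("The Beatles", "album"): ["Let It Be"],
    ("The Beatles", "track"): ["Let It Be"],
}
SENTENCE = ("Let It Be is the twelfth album by The Beatles "
            "which contains their hit single Let It Be.")


def naive_labels(sentence, kb):
    """Label every object mention with every matching KB relation."""
    labels = []
    for (subject, relation), objects in kb.items():
        if subject not in sentence:
            continue
        for obj in objects:
            start = sentence.find(obj)
            while start != -1:
                labels.append((relation, obj, start))
                start = sentence.find(obj, start + len(obj))
    return labels


labels = naive_labels(SENTENCE, KB)
```

Each of the two mentions ends up as a positive example for both album and track, so features from the "twelfth album" context also count towards track, and features from the "hit single" context also count towards album.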
45. 45
Distant Supervision Challenges
• Automatically generating training data
• Can lead to noisy training examples
• Evaluation
• If training data is generated automatically, how / on what data can
approaches be evaluated?
• Co-Reference Resolution
• Does training / testing data have to contain names of subj and obj
directly?
• Named Entity Recognition and Classification
• Supervised off-the-shelf NERC approaches are not perfect (see rest of
talk)
46. 46
Conclusions / Future Work
• Distant supervision makes it possible to automatically populate
knowledge bases without manual effort
• Distant supervision can be applied to any domain
• Ongoing challenges:
• Reducing errors made by automatic labeling
• Distant supervision with co-reference resolution
• NERC for distant supervision
47. 47
References
• Isabelle Augenstein, Andreas Vlachos, Diana Maynard (2015).
Extracting Relations between Non-Standard Entities using Distant
Supervision and Imitation Learning. EMNLP 2015.
• Isabelle Augenstein, Diana Maynard, Fabio Ciravegna (2015). Distantly
Supervised Web Relation Extraction for Knowledge Base Population.
Semantic Web Journal.
• Isabelle Augenstein, Diana Maynard, Fabio Ciravegna (2014). Relation
Extraction from the Web using Distant Supervision. EKAW 2014, nominated
for best paper award.
• Isabelle Augenstein (2014). Joint Information Extraction from the Web using
Linked Data. ISWC 2014.
• Isabelle Augenstein (2014). Seed Selection for Distantly Supervised Web-Based
Relation Extraction. SWAIE Workshop at COLING 2014.
48. 48
References
Distant Supervision:
• Mike Mintz, Steven Bills, Rion Snow, and Dan Jurafsky. 2009. Distant
supervision for relation extraction without labeled data. ACL-IJCNLP.
NERC:
• Jenny Rose Finkel, Trond Grenager, and Christopher Manning. 2005.
Incorporating Non-local Information into Information Extraction Systems by
Gibbs Sampling. ACL.
• Xiao Ling and Daniel S. Weld. 2012. Fine-Grained Entity Recognition. AAAI.
Imitation Learning:
• Stéphane Ross, Geoffrey J. Gordon, and Drew Bagnell. 2011. A Reduction
of Imitation Learning and Structured Prediction to No-Regret Online
Learning. JMLR.
• Juan Ortega, Noor Shaker, Julian Togelius and Georgios N. Yannakakis
(2013): Imitating human playing styles in Super Mario Bros. Entertainment
Computing, Elsevier.