This document summarizes the MultiR model for distant-supervision relation extraction and an extension that handles missing data by relaxing the hard constraints of previous models. MultiR introduces latent variables to indicate the relation expressed by each sentence and allows an entity pair to have multiple relations; the extension incorporates the tendency of knowledge bases to include popular entities and relations. The model is trained with a perceptron-like algorithm, and inference involves finding the highest-weight assignment of relations consistent with the knowledge base.
The world is ever changing. As a result, many of the systems and phenomena we are interested in evolve over time, resulting in time-evolving datasets. Time series often display many interesting properties and levels of correlation. In this tutorial we will introduce students to the use of Recurrent Neural Networks and LSTMs to model and forecast different kinds of time series.
GitHub: https://github.com/DataForScience/RNN
Using Topological Data Analysis on your BigData (AnalyticsWeek)
Synopsis:
Topological Data Analysis (TDA) is a framework for data analysis and machine learning and represents a breakthrough in how to effectively use geometric and topological information to solve 'Big Data' problems. TDA provides meaningful summaries (in a technical sense to be described) and insights into complex data problems. In this talk, Anthony will begin with an overview of TDA and describe the core algorithm that is utilized. This talk will include both the theory and real world problems that have been solved using TDA. After this talk, attendees will understand how the underlying TDA algorithm works and how it improves on existing “classical” data analysis techniques as well as how it provides a framework for many machine learning algorithms and tasks.
Speaker:
Anthony Bak, Senior Data Scientist, Ayasdi
Prior to coming to Ayasdi, Anthony was at Stanford University, where he did a postdoc with Ayasdi co-founder Gunnar Carlsson, working on new methods and applications of Topological Data Analysis. He completed his Ph.D. work in algebraic geometry with applications to string theory at the University of Pennsylvania and, along the way, worked at the Max Planck Institute in Germany, Mount Holyoke College in Massachusetts, and the American Institute of Mathematics in California.
Memory Management for Large-Scale Link Discovery
(HOBBIT project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 688227.)
GANs are the hottest new topic in the ML arena; however, they present a challenge for researchers and engineers alike. Their design and, most importantly, their code implementation have been causing headaches for ML practitioners, especially when moving to production.
Starting from the very basics of what a GAN is, passing through a TensorFlow implementation using the most cutting-edge APIs available in the framework, and finally arriving at production-ready serving at scale using Google Cloud ML Engine.
Slides for the talk: https://www.pycon.it/conference/talks/deep-diving-into-gans-form-theory-to-production
Github repo: https://github.com/zurutech/gans-from-theory-to-production
Joint contrastive learning with infinite possibilities (taeseon ryu)
Contrastive learning is a machine learning technique that learns features without any labels, based on whether two images are similar or not. It differs somewhat from supervised learning: supervised learning incurs labeling costs and, because it is task-specific, can have lower generalizability. Contrastive learning, proceeding without labels, has no labeling cost and can generalize better. This paper proposes Joint Contrastive Learning for more useful contrastive learning. https://youtu.be/0NLq-ikBP1I
This research was published in IEEE SSCI 2017 in Hawaii.
The goal of this research was to construct a learning theory of Non-negative Matrix Factorization, and we derived a tighter upper bound on the generalization error than in our previous research. Moreover, we carried out numerical experiments and arrived at a conjecture for the exact value of the generalization error.
[GAN by Hung-yi Lee] Part 1: General introduction of GAN (NAVER Engineering)
Generative Adversarial Network and its Applications on Speech and Natural Language Processing, Part 1.
Speaker: Hung-yi Lee (Professor, National Taiwan University)
Date: July 2018
Generative adversarial networks (GANs) are a new idea for training models, in which a generator and a discriminator compete against each other to improve the generation quality. Recently, GANs have shown amazing results in image generation, and a large number and a wide variety of new ideas, techniques, and applications have been developed based on them. Although there are only a few successful cases so far, GANs have great potential to be applied to text and speech generation to overcome limitations of conventional methods.
In the first part of the talk, I will give an introduction to GANs and provide a thorough review of this technology. In the second part, I will focus on the applications of GANs to speech and natural language processing. I will demonstrate applications of GANs to voice conversion, unsupervised abstractive summarization, and sentiment-controllable chat-bots. I will also talk about research directions towards unsupervised speech recognition with GANs.
Learning to automatically solve algebra word problems (Naoaki Okazaki)
Nate Kushman, Yoav Artzi, Luke Zettlemoyer, and Regina Barzilay.
ACL-2014, pages 271–281.
(presented by Naoaki Okazaki at the paper reading organized by Preferred Infrastructure)
Linked Data Generation for Adaptive Learning Analytics Systems (Sven Lieber)
The presentation for the paper "Linked Data Generation for Adaptive Learning Analytics Systems" given at the LILE2018 – Learning & Education with Web Data workshop at the WebSci conference 2018 in Amsterdam.
Mariia Havrylovych, "Active learning and weak supervision in NLP projects" (Fwdays)
Successful artificial intelligence solutions always require a massive amount of high-quality labeled data. In most cases, we don't have a labeled set that is both large and of high quality. Weak supervision and active learning tools may help you optimize the labeling process and address the shortage of data labels.
First, we will review how active learning can significantly reduce the amount of labeled data for training with classic approaches. We will show how active learning methods can be customized for a specific (NLP) task by using text embedding.
With weak supervision, we will see how using simple rules can automatically produce a big training dataset and high model performance without any manual labeling at all.
In the end, we will combine active learning and weak supervision by taking advantage of both techniques and achieving the best metrics.
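The rule-based weak supervision described above can be sketched as follows. This is a minimal illustration, not the speaker's actual pipeline: the labeling functions (`lf_contains_great`, `lf_contains_terrible`) and the majority-vote combiner are made-up examples for a toy sentiment task.

```python
import re

# Each labeling function votes POSITIVE, NEGATIVE, or abstains.
POSITIVE, NEGATIVE, ABSTAIN = 1, 0, -1

def lf_contains_great(text):
    # Hypothetical rule: "great" suggests a positive label.
    return POSITIVE if re.search(r"\bgreat\b", text, re.I) else ABSTAIN

def lf_contains_terrible(text):
    # Hypothetical rule: "terrible" suggests a negative label.
    return NEGATIVE if re.search(r"\bterrible\b", text, re.I) else ABSTAIN

def weak_label(text, rules):
    """Majority vote over non-abstaining rules; None if every rule abstains."""
    votes = [v for v in (rule(text) for rule in rules) if v != ABSTAIN]
    if not votes:
        return None
    return max(set(votes), key=votes.count)

rules = [lf_contains_great, lf_contains_terrible]
print(weak_label("This talk was great!", rules))  # -> 1
print(weak_label("No opinion here.", rules))      # -> None
```

Applied over a large unlabeled corpus, such rules produce a training set automatically; examples where every rule abstains are simply left unlabeled.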
Dynamic Search Using Semantics & Statistics (Paul Hofmann)
This presentation shows 3 applications of successfully combining semantics and statistics for text mining and interactive search.
1) We predict the Lehman bankruptcy using statistical topic modeling, SAP Business Objects entity extraction and associative memories (powered by Saffron Technologies).
2) We semi-automatically handle service requests at Cisco using knowledge extraction and knowledge reuse.
3) We discover user intent for interactive retrieval. User intent is defined as a latent state. The observations of this latent state are the reformulated query sequence, and the retrieved documents, together with the positive or negative feedback provided by the user. Demo shows recognizing user’s intent for health care search.
Machine Learning for Incident Detection: Getting Started (Sqrrl)
This presentation walks you through the uses of machine learning in incident detection and response, outlining some of the basic features of machine learning and specific tools you can use.
Watch the presentation with audio here: https://www.youtube.com/watch?v=4pArapSIu_w
1. Introduction and how to get into Data
2. Data Engineering and skills needed
3. Comparison of Data Analytics for static and real-time streaming data
4. Bayesian Reasoning for Data
Finding knowledge, data and answers on the Semantic Web (ebiquity)
Web search engines like Google have made us all smarter by providing ready access to the world's knowledge whenever we need to look up a fact, learn about a topic or evaluate opinions. The W3C's Semantic Web effort aims to make such knowledge more accessible to computer programs by publishing it in machine understandable form.
As the volume of Semantic Web data grows software agents will need their own search engines to help them find the relevant and trustworthy knowledge they need to perform their tasks. We will discuss the general issues underlying the indexing and retrieval of RDF based information and describe Swoogle, a crawler based search engine whose index contains information on over a million RDF documents.
We will illustrate its use in several Semantic Web related research projects at UMBC, including a distributed platform for constructing end-to-end use cases that demonstrate the Semantic Web's utility for integrating scientific data. We describe ELVIS (the Ecosystem Location Visualization and Information System), a suite of tools for constructing food webs for a given location, and Triple Shop, a SPARQL query interface which searches the Semantic Web for data relevant to a given query. ELVIS functionality is exposed as a collection of web services, and all input and output data is expressed in OWL, thereby enabling its integration with Triple Shop and other Semantic Web resources.
Building machine learning systems remains something of an art, from gathering and transforming the right data to selecting and fine-tuning the most fitting modeling techniques. If we want to make machine learning more accessible and foster skillful use, we need novel ways to share and reuse findings, and streamline online collaboration. OpenML is an open science platform for machine learning, allowing anyone to easily share data sets, code, and experiments, and collaborate with people all over the world to build better models. It shows, for any known data set, which are the best models, who built them, and how to reproduce and reuse them in different ways. It is readily integrated into several machine learning environments, so that you can share results with the touch of a button or a line of code. As such, it enables large-scale, real-time collaboration, allowing anyone to explore, build on, and contribute to the combined knowledge of the field. Ultimately, this provides a wealth of information for a novel, data-driven approach to machine learning, where we learn from millions of previous experiments to either assist people while analyzing data (e.g., which modeling techniques will likely work well and why), or automate the process altogether.
A short description of machine learning: what machine learning is, along with its specifications, categories, terminologies, and applications, everything explained briefly.
Currently, most of white-box machine learning techniques are purely data-driven and ignore prior background and expert knowledge. A lot of this knowledge has already been captured in domain models, i.e. ontologies, using Semantic Web technologies. The goal of this research proposal is to enhance the predictive performance and required training time of white-box models by incorporating the vast amount of available knowledge captured in ontologies in each of the phases of a machine learning process: feature extraction, feature selection and model construction. Moreover, it will be investigated if we can augment the initial training set with minimal user interaction by exploiting the concept of linked data.
(Gaurav Sawant & Dhaval Sawlani) BIA 678 final project report (Gaurav Sawant)
PROJECT REPORT
• Performed memory-based collaborative filtering techniques like Cosine similarities, Pearson’s r & model-based Matrix Factorization techniques like Alternating Least Squares (ALS) method
• Studied the scalability of these methods on local machines & on Hadoop clusters
Profile-based Dataset Recommendation for RDF Data Linking Mohamed BEN ELLEFI
With the emergence of the Web of Data, most notably Linked Open Data (LOD), an abundance of data has become available on the web. However, LOD datasets and their inherent subgraphs vary heavily with respect to their size, topic and domain coverage, their schemas and their data dynamicity (respectively schemas and metadata) over time. To this extent, identifying suitable datasets, which meet specific criteria, has become an increasingly important, yet challenging task to support issues such as entity retrieval or semantic search and data linking. Particularly with respect to the interlinking issue, the current topology of the LOD cloud underlines the need for practical and efficient means to recommend suitable datasets: currently, only well-known reference graphs such as DBpedia (the most obvious target), YAGO or Freebase show a high amount of in-links, while there exists a long tail of potentially suitable yet under-recognized datasets. This problem is due to the Semantic Web tradition in dealing with "finding candidate datasets to link to", where data publishers are used to identifying target datasets for interlinking.
While an understanding of the nature of the content of specific datasets is a crucial prerequisite for the mentioned issues, we adopt in this dissertation the notion of "dataset profile": a set of features that describe a dataset and allow the comparison of different datasets with regard to their represented characteristics. Our first research direction was to implement a collaborative filtering-like dataset recommendation approach, which exploits both existing dataset topic profiles, as well as traditional dataset connectivity measures, in order to link LOD datasets into a global dataset-topic-graph. This approach relies on the LOD graph in order to learn the connectivity behaviour between LOD datasets. However, experiments have shown that the current topology of the LOD cloud group is far from being complete enough to be considered as a ground truth and, consequently, as learning data.
Facing the limits of the current topology of LOD (as learning data), our research has led us to break away from the topic-profile representation of the "learn to rank" approach and to adopt a new approach for candidate dataset identification where the recommendation is based on the intensional profile overlap between different datasets. By intensional profile, we understand the formal representation of a set of schema concept labels that best describe a dataset and can be potentially enriched.
Oracle Machine Learning Overview and From Oracle Data Professional to Oracle ... (Charlie Berger)
DBAs spend too much time on routine tasks, leaving little time for innovation. Autonomous Databases free data professionals to extract more value from data. Oracle Machine Learning, in Autonomous Database, "moves the algorithms; not the data" for 100% in-database processing. Data professionals perform many supporting tasks for "data scientists", typically 80% of the work. Come learn an evolutionary path for Oracle data professionals to leverage domain knowledge and data skills and add machine learning. See how to build and deploy predictive models inside the Database. Using examples, demos and sharing experiences, Charlie will show you how to discover new insights, make predictions and become an "Oracle Data Scientist" in just 6 weeks!
What greenhouse gases are and how many gases affect the Earth (moosaasad1975)
What greenhouse gases are, how they affect the Earth and its environment, what the future of the environment and the Earth is, and how they affect the weather and the climate.
THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN (Sérgio Sacani)
The return of a sample of near-surface atmosphere from Mars would facilitate answers to several first-order science questions surrounding the formation and evolution of the planet. One of the important aspects of terrestrial planet formation in general is the role that primary atmospheres played in influencing the chemistry and structure of the planets and their antecedents. Studies of the martian atmosphere can be used to investigate the role of a primary atmosphere in its history. Atmosphere samples would also inform our understanding of the near-surface chemistry of the planet, and ultimately the prospects for life. High-precision isotopic analyses of constituent gases are needed to address these questions, requiring that the analyses are made on returned samples rather than in situ.
Richard's adventures in two entangled wonderlands (Richard Gill)
Since the loophole-free Bell experiments of 2020 and the Nobel prizes in physics of 2022, critics of Bell's work have retreated to the fortress of super-determinism. Now, super-determinism is a derogatory word - it just means "determinism". Palmer, Hance and Hossenfelder argue that quantum mechanics and determinism are not incompatible, using a sophisticated mathematical construction based on a subtle thinning of allowed states and measurements in quantum mechanics, such that what is left appears to make Bell's argument fail, without altering the empirical predictions of quantum mechanics. I think however that it is a smoke screen, and the slogan "lost in math" comes to my mind. I will discuss some other recent disproofs of Bell's theorem using the language of causality based on causal graphs. Causal thinking is also central to law and justice. I will mention surprising connections to my work on serial killer nurse cases, in particular the Dutch case of Lucia de Berk and the current UK case of Lucy Letby.
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep... (University of Maribor)
Slides from:
11th International Conference on Electrical, Electronics and Computer Engineering (IcETRAN), Niš, 3-6 June 2024
Track: Artificial Intelligence
https://www.etran.rs/2024/en/home-english/
Cancer cell metabolism: special reference to the lactate pathway (AADYARAJPANDEY1)
Normal Cell Metabolism:
Cellular respiration describes the series of steps that cells use to break down sugar and other chemicals to get the energy we need to function.
Energy is stored in the bonds of glucose and when glucose is broken down, much of that energy is released.
Cell utilize energy in the form of ATP.
The first step of respiration is called glycolysis. In a series of steps, glycolysis breaks glucose into two smaller molecules of a chemical called pyruvate. A small amount of ATP is formed during this process.
Most healthy cells continue the breakdown in a second process, called the Krebs cycle. The Krebs cycle allows cells to "burn" the pyruvate made in glycolysis to get more ATP.
The last step in the breakdown of glucose is called oxidative phosphorylation (Ox-Phos).
It takes place in specialized cell structures called mitochondria. This process produces a large amount of ATP. Importantly, cells need oxygen to complete oxidative phosphorylation.
If a cell completes only glycolysis, only 2 molecules of ATP are made per glucose. However, if the cell completes the entire respiration process (glycolysis - Krebs cycle - oxidative phosphorylation), about 36 molecules of ATP are created, giving it much more energy to use.
IN CANCER CELL:
Unlike healthy cells that "burn" the entire molecule of sugar to capture a large amount of energy as ATP, cancer cells are wasteful.
Cancer cells only partially break down sugar molecules. They overuse the first step of respiration, glycolysis. They frequently do not complete the second step, oxidative phosphorylation.
This results in only 2 molecules of ATP per each glucose molecule instead of the 36 or so ATPs healthy cells gain. As a result, cancer cells need to use a lot more sugar molecules to get enough energy to survive.
Introduction to the WARBURG EFFECT:
WARBURG EFFECT: Usually, cancer cells are highly glycolytic ("glucose addiction") and take up more glucose from outside than normal cells do.
Otto Heinrich Warburg (8 October 1883 – 1 August 1970) was awarded the Nobel Prize in Physiology or Medicine in 1931 for his "discovery of the nature and mode of action of the respiratory enzyme".
The tendency of cancer cells under aerobic (well-oxygenated) conditions to metabolize glucose to lactate (aerobic glycolysis) is known as the Warburg effect. Warburg made the observation that tumor slices consume glucose and secrete lactate at a higher rate than normal tissues.
Modeling missing data in distant supervision for information extraction (Ritter+, TACL 2013)
1. Modeling Missing Data in Distant Supervision for Information Extraction
Alan Ritter (CMU)
Luke Zettlemoyer (University of Washington)
Mausam (University of Washington)
Oren Etzioni (Vulcan Inc.)
TACL, 1, 367-378, 2013.
Presented by Naoaki Okazaki (Tohoku University)
2014-09-05 Modeling Missing Data in Distant Supervision
2. Relation instance extraction
Example: "Steven Spielberg's film Saving Private Ryan is loosely based on the brothers' story."
From this sentence, an extractor produces an instance of the film-director relation: (Film: Saving Private Ryan, Director: Steven Spielberg).
• Fully-supervised learning (Zhou+ 05, …)
  – Uses ACE corpora to build relation-instance classifiers
  – Suffers from the limited number of training data
• Unsupervised information extraction (Banko+ 07, …)
  – Extracts relational patterns between entities, and clusters the patterns into relations
  – Difficult to map clusters into relations of interest
• Bootstrap learning (Brin 98, …)
  – Uses seed instances to extract a new set of relational patterns
  – Often suffers from low precision (semantic drift)
• Distant supervision (Mintz+ 09, …)
  – Combines the advantages of the above approaches
3. Distant supervision (Mintz+, 09)
Knowledge-base entries such as (Person: Edwin Hubble, Birthplace: Marshfield) drive automatic annotation of sentences such as "Astronomer Edwin Hubble was born in Marshfield, Missouri.", followed by feature extraction.
Mintz et al. (2009) Distant supervision for relation extraction without labeled data. ACL-2009, pages 1003–1011.
* Each row presents a single feature. Features from different sentences containing the same entity pair are concatenated.
Problem: an entity pair cannot have multiple relations.
E.g., both Founded(Jobs, Apple) and CEO-of(Jobs, Apple) are true.
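The automatic-annotation step can be sketched as follows. This is a minimal illustration under the simplifying assumption that any sentence containing both entities of a KB fact is labeled with that fact's relation; the toy `kb` and `sentences` extend the Edwin Hubble example above.

```python
# Toy knowledge base: (entity1, entity2) -> relation
kb = {("Edwin Hubble", "Marshfield"): "birthplace"}

sentences = [
    "Astronomer Edwin Hubble was born in Marshfield, Missouri.",
    "Edwin Hubble worked at the Mount Wilson Observatory.",
]

def annotate(sentences, kb):
    """Label every sentence that mentions both entities of a KB fact."""
    examples = []
    for s in sentences:
        for (e1, e2), rel in kb.items():
            if e1 in s and e2 in s:
                examples.append((s, e1, e2, rel))
    return examples

for ex in annotate(sentences, kb):
    print(ex)  # only the first sentence matches, labeled "birthplace"
```

Note that the second sentence mentions only one entity of the fact and is therefore left unlabeled, which is exactly why the heuristic produces noisy training data.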
4. MultiR (Hoffmann+, 11)
Introduces latent variables z_i to indicate the relation expressed by sentence x_i.
For the entity pair (Steve Jobs, Apple):
  x_1: "Steve Jobs was founder of Apple."  →  z_1 = founder
  x_2: "Steve Jobs, Steve Wozniak and Ronald Wayne founded Apple."  →  z_2 = founder
  x_3: "Steve Jobs is CEO of Apple."  →  z_3 = CEO-of
  y_born-in = 0, y_founder = 1, y_CEO-of = 1, y_capital-of = 0
Model:
  p(y, z | x) = (1/Z_x) ∏_r Φ_join(y_r, z) ∏_i Φ_extract(z_i, x_i)
where
  x_i: a sentence containing the entity pair
  y_r ∈ {0, 1}: 1 if the knowledge base includes the pair with relation r, 0 otherwise
  z_i ∈ R: the relation expressed by sentence x_i
  Φ_extract(z_i, x_i) = exp(Σ_j θ_j φ_j(z_i, x_i))  (the same as Mintz+ 09)
  Φ_join(y_r, z) = 1(¬y_r ∨ ∃i: r = z_i)  (deterministic OR)
Φ_join ensures that a sentence x_i expressing the relation r exists if r is true, and the model allows multiple relations for the same entity pair.
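The deterministic-OR join factor is simple enough to check directly in code; here is a minimal sketch using the slide's example assignment.

```python
def phi_join(r, y_r, z):
    """Deterministic OR: 1 iff y_r = 0, or some sentence-level z_i equals r."""
    return int((not y_r) or any(z_i == r for z_i in z))

# Sentence-level relations z_1, z_2, z_3 and KB facts y from the slide.
z = ["founder", "founder", "CEO-of"]
y = {"born-in": 0, "founder": 1, "CEO-of": 1, "capital-of": 0}

print(all(phi_join(r, y_r, z) for r, y_r in y.items()))  # -> True (consistent)
print(phi_join("capital-of", 1, z))  # -> 0: y_r = 1 with no supporting sentence
```

Because the factor is 0/1-valued, any assignment where some true relation has no supporting sentence gets probability zero, which is exactly the hard constraint that the later slides relax.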
5. MultiR: Training
Hoffmann et al. (2011) Knowledge-Based Weak Supervision for Information Extraction of Overlapping Relations. ACL-2011, pages 541–550.
Loop over passes through the training data:
  Loop over entity pairs in the KB:
    Predict sentence-level and KB-level relations (ignoring the facts in the KB)
    Find an optimal assignment of sentence-level relations consistent with the facts in the KB
    Update feature weights similarly to the perceptron algorithm
We therefore need two kinds of inference.
2014-09-05 Modeling Missing Data in Distant Supervision 5
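The training loop above can be sketched as a runnable toy. This is my own simplification, not Hoffmann et al.'s exact algorithm: bag-of-words features, the two inference routines in their simplest form, and perceptron-style updates toward the KB-consistent assignment.

```python
from collections import defaultdict

RELATIONS = ["born-in", "founder", "CEO-of"]

def features(sentence, rel):
    # Toy features: (token, relation) pairs.
    return [(tok, rel) for tok in sentence.lower().split()]

def score(sentence, rel, w):
    return sum(w[f] for f in features(sentence, rel))

def predict(sentences, w):
    # Inference 1: each sentence independently takes its best relation.
    z = [max(RELATIONS, key=lambda r: score(s, r, w)) for s in sentences]
    return set(z), z

def constrained_predict(sentences, kb_facts, w):
    # Inference 2 (greedy): each KB fact claims its best free sentence,
    # then leftover sentences take their best relation among the KB facts.
    z = [None] * len(sentences)
    for r in kb_facts:
        free = [i for i, zi in enumerate(z) if zi is None]
        best = max(free, key=lambda i: score(sentences[i], r, w))
        z[best] = r
    for i, zi in enumerate(z):
        if zi is None:
            z[i] = max(kb_facts, key=lambda r: score(sentences[i], r, w))
    return z

def train(data, n_epochs=5):
    w = defaultdict(float)
    for _ in range(n_epochs):                 # passes over the training data
        for sentences, kb_facts in data:      # entity pairs in the KB
            y_hat, z_hat = predict(sentences, w)
            if y_hat == set(kb_facts):
                continue                      # already consistent with the KB
            z_star = constrained_predict(sentences, kb_facts, w)
            for s, zs, zh in zip(sentences, z_star, z_hat):
                for f in features(s, zs):
                    w[f] += 1.0               # toward the KB-consistent z
                for f in features(s, zh):
                    w[f] -= 1.0               # away from the unconstrained z
    return w

data = [(["Steve Jobs was founder of Apple .",
          "Steve Jobs is CEO of Apple ."],
         ["founder", "CEO-of"])]
w = train(data)
print(predict(data[0][0], w)[1])  # -> ['founder', 'CEO-of']
```

After a couple of passes the unconstrained prediction agrees with the KB facts, at which point the updates cancel and training stops changing the weights, mirroring the perceptron's behaviour.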
6. MultiR: Inference 1: argmax_{y,z} p(y, z | x)
For the entity pair (Steve Jobs, Apple), with sentences
  x_1: "Steve Jobs was founder of Apple."
  x_2: "Steve Jobs, Steve Wozniak and Ronald Wayne founded Apple."
  x_3: "Steve Jobs is CEO of Apple."
all y_r and z_i are unknown. The extraction scores Φ_extract(r, x_i) over (born-in, founder, CEO-of, capital-of) are:
  x_1: (0.5, 16.0, 9.0, 0.1)
  x_2: (8.0, 11.0, 6.0, 0.1)
  x_3: (7.0, 8.0, 7.0, 0.2)
Predict a relation label for each sentence independently, then aggregate the sentence-level predictions into global-level predictions.
7. MultiR: Inference 1: argmax_{y,z} p(y, z | x) (solution)
With the same scores, each sentence independently takes its highest-scoring relation:
  z_1 = founder (16.0), z_2 = founder (11.0), z_3 = founder (8.0)
Aggregating these gives y_founder = 1 and y_born-in = y_CEO-of = y_capital-of = 0.
Very easy to find! Computational cost: O(|R| · |x|)
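The two steps of this first inference problem fit in a few lines; a minimal sketch using the scores from the slide:

```python
# Extraction scores Φ_extract(r, x_i) from the slide.
scores = {
    "x1": {"born-in": 0.5, "founder": 16.0, "CEO-of": 9.0, "capital-of": 0.1},
    "x2": {"born-in": 8.0, "founder": 11.0, "CEO-of": 6.0, "capital-of": 0.1},
    "x3": {"born-in": 7.0, "founder": 8.0, "CEO-of": 7.0, "capital-of": 0.2},
}

# Step 1: each sentence independently takes its highest-weight relation.
z = {s: max(rels, key=rels.get) for s, rels in scores.items()}
# Step 2: aggregate (deterministic OR) into the entity-pair-level prediction.
y = set(z.values())

print(z)  # -> {'x1': 'founder', 'x2': 'founder', 'x3': 'founder'}
print(y)  # -> {'founder'}
```

Each sentence is scored against each relation exactly once, which is the O(|R| · |x|) cost stated on the slide.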
8. MultiR: Inference 2: argmax_z p(z | x, y)

For the entity pair (Steve Jobs, Apple):

[Figure: the same factor graph, with the KB facts observed as y = (born-in: 0, founder: 1, CEO-of: 1, capital-of: 0) and the sentence labels z1, z2, z3 unknown. Candidate edges from the active relations to the sentences x1, x2, x3 carry the weights founder: 16, 11, 8 and CEO-of: 9, 6, 7.]

Define an edge weight: w(y_r, z_i) = Φ_extract(r, x_i)
A node with y_r = 1 must have at least one edge connecting it to some z_i.
Each node z_i must have an edge connecting it to some y_r.
Find the set of edges that maximizes the sum of the weights.
9. MultiR: Inference 2: argmax_z p(z | x, y)

For the entity pair (Steve Jobs, Apple):

[Figure: the solved assignment given y = (born-in: 0, founder: 1, CEO-of: 1, capital-of: 0): z1 = founder (weight 16), z2 = founder (11), z3 = CEO-of (7), so every active relation is connected to at least one sentence.]

Define an edge weight: w(y_r, z_i) = Φ_extract(r, x_i)
A node with y_r = 1 must have at least one edge connecting it to some z_i.
Each node z_i must have an edge connecting it to some y_r.
Find the set of edges that maximizes the sum of the weights.
An exact solution can be found in polynomial time.
In practice, an approximate solution found by greedy search (assigning a z_i to each node with y_r = 1) is sufficient.
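That greedy search can be sketched as follows; the tie-breaking and the repair step are my assumptions, not necessarily the exact procedure used in the paper.

```python
def greedy_z(weights, active):
    """Greedy inference 2. weights[i][r] scores relation r for sentence i;
    active lists the relations with y_r = 1."""
    # Label each sentence with its best active relation.
    z = [max(active, key=lambda r: w[r]) for w in weights]
    # Repair: every active relation must cover at least one sentence.
    for r in active:
        if r in z:
            continue
        # Flip the sentence where switching to r loses the least weight.
        i = min(range(len(z)), key=lambda j: weights[j][z[j]] - weights[j][r])
        z[i] = r
    return z

# Edge weights from the (Steve Jobs, Apple) example; founder and CEO-of are active.
weights = [
    {"founder": 16.0, "CEO-of": 9.0},
    {"founder": 11.0, "CEO-of": 6.0},
    {"founder": 8.0, "CEO-of": 7.0},
]
print(greedy_z(weights, ["founder", "CEO-of"]))  # ['founder', 'founder', 'CEO-of']
```

On the example, flipping x3 (loss 8 − 7 = 1) is the cheapest way to cover CEO-of, reproducing the assignment in the figure.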
10. Contribution of this work

• MultiR makes two assumptions (hard constraints):
    • If a fact is not found in the database, it cannot be mentioned in the text.
    • If a fact is in the database, it must be mentioned in at least one sentence.
• This work relaxes MultiR to handle the situations where:
    • a fact is not mentioned in the text (MIT: missing in text);
    • a fact mentioned in the text is missing from the database (MID: missing in database).
• A side effect of this relaxation:
    • It incorporates the tendency of the knowledge base to include popular entities and relations.
11. Distant Supervision with Data Not Missing at Random (DNMAR)

For the entity pair (Steve Jobs, Apple):

[Figure: the factor graph extended with a new layer t. Sentences: x1 "Steve Jobs was founder of Apple.", x2 "Steve Jobs, Steve Wozniak and Ronald Wayne founded Apple.", x3 "Steve Jobs visited Apple store…"; sentence labels z1 = founder, z2 = founder, z3 = visit; database facts y = (born-in: 0, founder: 1, CEO-of: 1, visit: 0); text-level variables t = (born-in: 0, founder: 1, CEO-of: 0, visit: 1).]

Introduce a layer of latent variables (t_r) to handle the missing cases, and introduce a new factor:

    φ_miss(y_r, t_r) = −α_MIT   if y_r = 1 ∧ t_r = 0   (missing in text)
                       −α_MID   if y_r = 0 ∧ t_r = 1   (missing in DB)
                        0       otherwise

This relaxes the two hard constraints in MultiR into soft ones with the penalty factors −α_MIT and −α_MID.
The training algorithm is the same as the one used in MultiR.
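The new factor is a simple lookup; a direct transcription in Python (the penalty values for α_MIT and α_MID are illustrative placeholders, in practice tuned on development data):

```python
def phi_miss(y_r, t_r, alpha_mit=10.0, alpha_mid=2.0):
    """Soft penalty replacing MultiR's hard constraints.
    y_r: fact present in the DB; t_r: fact expressed in the text."""
    if y_r == 1 and t_r == 0:
        return -alpha_mit   # missing in text
    if y_r == 0 and t_r == 1:
        return -alpha_mid   # missing in DB
    return 0.0              # y and t agree

print(phi_miss(1, 0))  # -10.0
print(phi_miss(0, 1))  # -2.0
print(phi_miss(1, 1))  # 0.0
```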
12. Constrained inference: argmax_z p(z | x, y)

For the entity pair (Steve Jobs, Apple):

[Figure: the DNMAR factor graph with the database facts observed, y = (born-in: 0, founder: 1, CEO-of: 1, visit: 0), and the sentence labels z1, z2, z3 and the text-level variables t unknown.]

    z* = argmax_z Σ_{i=1..n} θ · Φ_extract(z_i, x_i)
                − Σ_r [ α_MIT · 1(y_r = 1 ∧ ∄i: z_i = r) + α_MID · 1(y_r = 0 ∧ ∃i: z_i = r) ]

The inference became more challenging:
A* search can find an exact solution, but it does not scale to many variables.
This work presents a greedy hill-climbing approach for the inference:
1. Initialize each z_i at random.
2. Obtain the neighborhood of the current solution.
3. Move to the neighbor yielding the highest score.
4. Repeat this process until no neighbor improves the score.
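The four steps can be sketched as a hill climber whose neighborhood changes one z_i at a time. The objective below is a simplified stand-in for the full model: per-sentence scores plus soft MIT/MID penalties with illustrative values.

```python
import random

def objective(z, weights, y_active, alpha_mit=10.0, alpha_mid=2.0):
    """Score of an assignment z under a simplified DNMAR-style objective."""
    total = sum(weights[i][r] for i, r in enumerate(z))
    mentioned = set(z)
    relations = set(y_active) | {r for w in weights for r in w}
    for r in relations:
        if r in y_active and r not in mentioned:
            total -= alpha_mit   # in the DB but not extracted from the text
        elif r not in y_active and r in mentioned:
            total -= alpha_mid   # extracted from the text but not in the DB
    return total

def hill_climb(weights, y_active, seed=0):
    rng = random.Random(seed)
    relations = sorted({r for w in weights for r in w})
    z = [rng.choice(relations) for _ in weights]          # 1. random initialization
    while True:
        best, best_score = z, objective(z, weights, y_active)
        for i in range(len(z)):                           # 2. neighborhood: flip one z_i
            for r in relations:
                cand = z[:i] + [r] + z[i + 1:]
                s = objective(cand, weights, y_active)
                if s > best_score:                        # 3. take the best neighbor
                    best, best_score = cand, s
        if best == z:
            return z                                      # 4. stop when nothing improves
        z = best
```

Note that hill climbing can stop in a local optimum, which is why the exact (but non-scalable) A* solution is mentioned as the alternative.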
13. Incorporating popularity in the KB

• We tune the penalty factors α_MIT and α_MID on a development set.
• We can take into account how likely each fact is to be observed in the text and in the knowledge base:
    • facts about Barack Obama are likely to exist;
    • facts about Naoaki Okazaki are unlikely to exist.
• Control the penalty factor for each entity pair:
    • popularity of entities: α_MID(e1, e2) = γ · min(c(e1), c(e2))
    • a larger penalty if the model predicts that a fact about a popular entity does not exist in the KB.
• Well-aligned relations: assign three levels of α_MIT per relation r (α_MIT^r):
    • a larger penalty if a popular relation such as contains, place_lived, or nationality does not appear in the text.
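The entity-popularity control can be sketched as below; γ and the count function c(e) are illustrative assumptions (any measure of entity frequency would do).

```python
def alpha_mid(e1, e2, counts, gamma=0.01):
    """Popularity-scaled penalty for a fact missing in the DB:
    popular entity pairs get a larger alpha_MID."""
    return gamma * min(counts.get(e1, 0), counts.get(e2, 0))

# Hypothetical entity frequencies.
counts = {"Barack Obama": 5000, "Apple": 4000, "Naoaki Okazaki": 3}
print(alpha_mid("Barack Obama", "Apple", counts))    # 40.0
print(alpha_mid("Naoaki Okazaki", "Apple", counts))  # 0.03
```

Taking the min of the two counts means the penalty is only large when both entities are popular.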
14. Experiments

• Binary relation extraction
    • The standard setting (Riedel+, 10)
    • Knowledge base: Freebase relations
    • Text corpus: 1.8 million New York Times articles
    • Two kinds of evaluation:
        • sentence-level extraction, using the dataset of (Hoffmann+, 11)
        • held-out evaluation against Freebase facts
• Unary relation extraction (NE categorization)
    • Twitter NE categorization dataset (Ritter+, 11)
    • Knowledge base: Freebase (instances and their categories)
    • Text corpus: tweets
    • Held-out evaluation
15. Results

A 17% increase in the area under the precision–recall curve.
Incorporating popularity yielded a 27% increase over the baseline.
The held-out evaluation underestimates precision, because many facts correctly extracted from the text are missing from the database.
DNMAR doubled the recall.

Ritter et al. (2013) Modeling Missing Data in Distant Supervision for Information Extraction. TACL, 1:367–378.
16. Conclusion

• Investigated the problem of missing data in distant supervision.
• Presented an extension of MultiR that handles missing data.
    • It can incorporate how likely facts are to be included in the knowledge base and the text (popularity).
• Presented a scalable inference algorithm based on greedy hill climbing.
• Demonstrated the effectiveness of the modeling.
17. References

• Raphael Hoffmann, Congle Zhang, Xiao Ling, Luke Zettlemoyer, Daniel S. Weld. (2011) Knowledge-Based Weak Supervision for Information Extraction of Overlapping Relations. ACL-2011, pages 541–550. (Slides and code available.)
• Mike Mintz, Steven Bills, Rion Snow, Dan Jurafsky. (2009) Distant Supervision for Relation Extraction without Labeled Data. ACL-2009, pages 1003–1011.