Distant supervision for relation extraction without labeled data
Mike Mintz, Steven Bills, Rion Snow, Dan Jurafsky
ACL 2009
I introduced this paper at the NAIST Machine Translation Study Group.
Relation Extraction from the Web using Distant Supervision
Isabelle Augenstein
This paper proposes using distant supervision to extract relations from web text to populate knowledge bases without requiring manual effort. It does this by using an existing knowledge base to automatically label sentences with entity relations, training a classifier on this distant supervision data. The paper describes using statistical methods to select better training data and discard noisy examples, and shows this improves precision. It also introduces methods for integrating information across sentences which improves both precision and recall of extracted relations.
Imitation learning is used to address the problem of distant supervision for relation extraction. It decomposes the task into named entity classification (NEC) and relation extraction (RE), allowing the models to be trained separately. Through an iterative process, imitation learning is able to learn the dependencies between NEC and RE even when only labels for RE are provided. This overcomes limitations of prior approaches that rely on distantly labeled data. Evaluation shows the approach improves over baselines by leveraging multi-stage modeling to compensate for mistakes at the NEC stage.
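The automatic labeling step that the distant-supervision work above relies on can be sketched in a few lines: sentences mentioning both entities of a known knowledge-base triple are (noisily) labeled with that triple's relation. The toy knowledge base and sentences below are illustrative assumptions, not data from either paper.

```python
# Minimal sketch of distant-supervision labeling.
# KB maps (entity1, entity2) pairs to a relation; any sentence that
# mentions both entities is labeled with that relation -- including
# sentences that do not actually express it (the source of noise).
KB = {
    ("Barack Obama", "Honolulu"): "born_in",
    ("Google", "Larry Page"): "founded_by",
}

def distant_label(sentences):
    """Pair each sentence with a relation label when both KB entities occur."""
    training_data = []
    for sent in sentences:
        for (e1, e2), relation in KB.items():
            if e1 in sent and e2 in sent:
                training_data.append((sent, e1, e2, relation))
    return training_data

sentences = [
    "Barack Obama was born in Honolulu, Hawaii.",
    "Barack Obama visited Honolulu last week.",   # matched, but wrong: noise
    "Larry Page is a co-founder of Google.",
]
labeled = distant_label(sentences)
```

The second sentence illustrates why the statistical filtering of noisy examples described above matters: string matching labels it `born_in` even though it expresses no such relation.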
Seed Selection for Distantly Supervised Web-Based Relation Extraction
Isabelle Augenstein
Slides of my presentation on "Seed Selection for Distantly Supervised Web-Based Relation Extraction" at the Semantic Web and Information Extraction workshop (SWAIE) at COLING 2014
Download link for the paper: http://staffwww.dcs.shef.ac.uk/people/I.Augenstein/SWAIE2014-Seed.pdf
This document discusses using eye tracking data for natural language processing applications. It provides an overview of eye tracking, including what it measures and examples of how it has been used. Specifically, it discusses using gaze features like fixation duration and skipping rate to train models for part-of-speech tagging. It proposes collecting more diverse eye tracking data from multiple languages and readers with varying abilities. This data would be used to extract additional word-level and text-level features to improve models for fine-grained part-of-speech tagging.
The OKE challenge, launched in its first edition at last year's Extended Semantic Web Conference (ESWC 2015), aims to provide a reference framework for research on Knowledge Extraction from text for the Semantic Web by re-defining a number of tasks (typically from information and knowledge extraction) while taking into account specific Semantic Web requirements. The OKE challenge defines three tasks, each with its own dataset:
- Entity Recognition, Linking and Typing for Knowledge Base population
- Class Induction and entity typing for Vocabulary and Knowledge Base enrichment
- Web-scale Knowledge Extraction by Exploiting Structured Annotation.
Challenge organizers: Andrea Giovanni Nuzzolese, Anna Lisa Gentile, Valentina Presutti, Aldo Gangemi, Robert Meusel, Heiko Paulheim.
1) The document describes the SOPHIA project, which aims to build altmetric networks of researchers and institutions to understand how research impact spreads in society.
2) SOPHIA collects data from Scopus and social media sources to build a heterogeneous graph network, and analyzes the network using graph metrics to measure the influence and authority of researchers and institutions.
3) The project has developed visualization and search tools to explore the altmetric networks, annotated documents, and metrics within a software prototype called SOPHIA.
This document discusses machine learning and information retrieval. It introduces machine learning and describes some common applications like bioinformatics, robotics, and computer vision. It then discusses information retrieval, including traditional keyword search approaches and a new example-based approach. Several prototype systems are described that use this example-based approach for tasks like movie search, academic literature search, image retrieval, and protein search. The approach is statistically principled, computationally fast, and easily parallelized.
Learning from Noisy Label Distributions (ICANN 2017)
Yuya Yoshikawa
This document presents a method for learning from noisy label distributions when labeled training data is unavailable. It proposes a probabilistic generative model to:
1) Infer true label distributions of groups from observed noisy distributions, by modeling the noise distortion process.
2) Infer the true label of each instance from the inferred true distributions and which groups it belongs to.
3) Learn a classifier using the inferred true labels. The model outperforms existing methods on synthetic data, especially when noise distortion is large. Future work includes experiments on real-world datasets.
The document discusses the steps involved in developing affective constructs and constructing non-cognitive measures. It explains that affective characteristics have dimensions of intensity and direction. Various affective scales are classified including attitudes, beliefs, interests, and values. The key steps outlined include deciding the construct to measure, developing subscales and items, selecting a response format, pilot testing the measures, analyzing validity and reliability, and revising the instrument based on results. Examples of different response formats for measures are also provided.
- Connectivism proposes that learning occurs through connections within networks, and is influenced by evolution over time as networks become more complex
- While connectivity has likely occurred naturally, new mathematical network analysis tools may help test whether connectivity leads to emergent behaviors
- If validated, network analysis could help optimize teaching methods by identifying influential student subgroups, at-risk students, and other insights from network dynamics
Summary and conclusion - Survey research and design in psychology
James Neill
This document provides an overview and summary of a lecture on survey research and design in psychology. It covers the following key points:
- Survey research involves using standardized questionnaires to collect data on psychological phenomena. It has become a popular social science method since the 1920s.
- Survey design considerations include whether the survey is self-administered or interview-based, the types of questions used, and response formats. Proper sampling and minimizing biases are also important.
- Analysis of survey data involves descriptive statistics, graphs, and correlations to describe and explore relationships in the data. Tools like exploratory factor analysis can be used to develop psychometric instruments. Multiple linear regression allows predicting outcomes from multiple variables.
The Scientific Method is of exceptional importance for all high school science learners - if they forget everything else but remember this, they should be OK in life. ;)
Unsupervised Main Entity Extraction from News Articles using Latent Variables
Jinho Choi
This document presents a methodology for semi-supervised main entity extraction from news articles using latent variables. It trains a semi-supervised model using only semantic and lexical information from raw text to automatically extract main entities from articles. The extracted entities are evaluated by matching word sequences between the entities and the news article titles, though the evaluation metric for this task still needs improvement.
Why are anomalies important? Because they tell us a different story from the norm. An anomaly or an event might signify a failing heart rate of a patient, a fraudulent credit card activity, or an early indication of a tsunami. As such, it is extremely important to detect anomalies or anomalous events.
In this talk, we will give an introduction to anomaly detection. Anomalies are rare events. As a result, standard accuracy measures do not apply. But then, how do we evaluate an Anomaly Detection (AD) method? If we want to compare two or more AD methods, what kind of simple tests can we do? What are the data repositories that are available for AD?
We will also discuss an ensemble method for AD. Constructing an AD ensemble is challenging because the class labels are not known. We will look at an unusual ally from psychometrics – Item Response Theory – to help us in this construction.
The document discusses the development of a faculty search mobile app using MIT App Inventor that allows users to search for professors by name or department and view their research interests and contact information. It outlines the vision, mock-up, scenario, and demonstration of the app, which was created to act as an information portal and reduce paper use. The app allows searching and viewing professor profiles but does not include email or calling functionality, keeping the features limited to only the essential functions.
Presentation for data science and data analytics
timaprofile
The document discusses sentiment analysis on social media. It provides definitions and examples of key concepts in sentiment analysis, including sentiment, opinions, entities, aspects, subjectivity analysis, and sentiment classification. It also outlines common techniques for sentiment analysis, such as lexicon-based methods, supervised learning approaches, and aspects extraction. Finally, it discusses applications of sentiment analysis and levels of analysis, from words to sentences to documents.
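The lexicon-based method mentioned in this summary can be illustrated with a minimal scorer. The tiny lexicon, the simple negation handling, and the example phrases below are assumptions for illustration, not material from the slides.

```python
# Minimal lexicon-based sentiment scorer: sum word polarities,
# flipping the sign of the next sentiment word after a negator.
LEXICON = {"good": 1, "great": 2, "bad": -1, "terrible": -2, "love": 2, "hate": -2}
NEGATORS = {"not", "never", "no"}

def sentiment_score(text):
    """Return a signed sentiment score for a short text."""
    score, negate = 0, False
    for raw in text.lower().split():
        token = raw.strip(".,!?")
        if token in NEGATORS:
            negate = True           # flip the next sentiment word
        elif token in LEXICON:
            polarity = LEXICON[token]
            score += -polarity if negate else polarity
            negate = False
    return score
```

A document-level score could then be aggregated from sentence scores, which is roughly how the word-to-sentence-to-document levels of analysis mentioned above relate.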
Unit 2 discusses knowledge representation in artificial intelligence. It describes knowledge representation as the process of representing knowledge in a form that enables an AI system to reason with it and use it to solve problems. There are several types of knowledge that can be represented, including declarative, procedural, heuristic, and structural knowledge. Common approaches to knowledge representation include simple relational knowledge, inheritable knowledge, inferential knowledge, and procedural knowledge. Logical representation is a core technique that uses formal logic to represent knowledge through propositions and inference rules. Propositional logic represents the simplest form of logical representation using atomic and compound propositions connected by logical operators.
The document discusses research design methods for case studies and phenomenology. It provides details on the case study method, including defining research questions, selecting cases, collecting and analyzing data, and preparing a report. As an example, it describes a case study examining whether an electronic community network is beneficial to non-profit organizations. It outlines the research questions and interview questions that would be used to collect data from these organizations. The document also discusses what constitutes a strong phenomenological research question by providing examples that clearly identify a phenomenon to explore, such as the experience of motherhood for deployed female soldiers.
Object modeling involves identifying important objects (classes) within a system and defining their attributes, operations, and relationships. During object modeling, classes are identified based on system requirements and domain concepts. Key activities include class identification, defining class attributes and methods, and determining associations between classes. Object modeling results in a visual representation of classes and their relationships in class and other diagrams.
This document provides an introduction to machine learning, including definitions and explanations of key concepts such as learning, machine learning, the motivation for machine learning, the three phases of machine learning (training, validation, application), and different learning techniques including rote learning, inductive learning, and deductive learning. It also discusses symbol-based learning, connectionist learning, artificial neural networks, deep learning, and how machine learning is different from other forms of artificial intelligence.
This document provides an overview of social network analysis and the Sylva software. It begins with key concepts in social network analysis including social structure, social networks, nodes, linkages, and additional terminology. It then discusses what makes social network analysis unique and provides examples of ego-centered and community-centered network analysis. Finally, it describes the features and capabilities of the Sylva software for collecting, storing, visualizing, and analyzing social network data.
Semantic Data Retrieval: Search, Ranking, and Summarization
Gong Cheng
Gong Cheng presented on semantic data retrieval, including entity retrieval and association retrieval from semantic graphs. He discussed two main challenges: efficiently searching large graphs for associations within a diameter bound, and ranking the retrieved associations. For the first challenge, he proposed algorithms using path finding, pruning, and result deduplication. For the second challenge, he conducted a user study and found that association size was the most important ranking factor. Other proposed measures like entity homogeneity and relation heterogeneity had mixed user preferences.
BioSolr - Searching the stuff of life - Lucene/Solr Revolution 2015
Charlie Hull
BioSolr, funded by the BBSRC, is a collaboration between the open source search experts Flax and the European Bioinformatics Institute (EBI), aiming to significantly advance the state of the art in indexing and querying biomedical data with freely available open source software.
This document provides an overview of conducting a meta-analysis in neuroimaging. It discusses using tools like Sleuth to search for relevant studies, GingerALE to quantify the spatial overlap of activations across studies, and Neurosynth to explore terms and topics as well as decode unthresholded maps. The document also discusses challenges like reverse inference and the lack of selectivity of some brain regions. Students will learn how to use these tools to find papers, analyze the results, and better understand meta-analysis in neuroimaging.
Efficient Lattice Rescoring Using Recurrent Neural Network Language Models
X. Liu, Y. Wang, X. Chen, M. J. F. Gales & P. C. Woodland
ICASSP 2014
I introduced this paper at the NAIST Machine Translation Study Group.
This document summarizes research on leveraging monolingual corpora to improve neural machine translation. The researchers investigated two methods ("shallow fusion" and "deep fusion") for integrating a language model trained on monolingual data into the decoder of an NMT system. They found that both methods led to improved translation performance, with gains of over 1 BLEU point for lower-resource language pairs and around 0.4 BLEU point for higher-resource pairs. The degree of improvement depended on how similar the domain of the monolingual data was to the translation domain, with greater benefits observed when the domains closely matched.
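Shallow fusion, as described in the summary above, amounts to a log-linear combination of the NMT score and the monolingual language-model score when ranking candidate next words during decoding. The toy probability distributions and the weight `beta` below are illustrative assumptions.

```python
import math

def shallow_fusion_step(p_nmt, p_lm, beta=0.3):
    """Pick the next word by log p_NMT(y) + beta * log p_LM(y)."""
    fused = {w: math.log(p_nmt[w]) + beta * math.log(p_lm[w]) for w in p_nmt}
    return max(fused, key=fused.get)

# The monolingual LM breaks a tie the translation model alone cannot resolve.
p_nmt = {"bank": 0.40, "shore": 0.40, "coast": 0.20}
p_lm = {"bank": 0.05, "shore": 0.60, "coast": 0.35}
best = shallow_fusion_step(p_nmt, p_lm)
```

Deep fusion instead wires the LM's hidden state into the decoder and learns the combination, which is why it requires joint fine-tuning rather than a single weight.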
More Related Content
Similar to [Paper Introduction] Distant supervision for relation extraction without labeled data
1) The document proposes an efficient top-down parsing algorithm for preordering source sentences in machine translation using Bracketing Transduction Grammar (BTG) trees.
2) Existing BTG-based preordering approaches are slow because they use CKY parsing and loss-function calculations with O(n^5) time complexity.
3) The proposed approach uses an incremental top-down parsing algorithm with early updates and beam search, achieving O(n^2) time complexity and running 10-100 times faster than prior work.
4) Experimental results show that the efficient approach also yields better BLEU scores in machine translation than prior BTG preordering methods.
Paper Introduction:
"Translating into Morphologically Rich Languages with Synthetic Phrases"
Victor Chahuneau, Eva Schlinger, Noah A. Smith, Chris Dyer (EMNLP2013)
This study evaluates machine translation systems using second-language proficiency tests to measure human performance on tasks based on machine-translated texts. The researchers had 320 Japanese junior high school students answer multiple-choice questions based on conversations rendered under four conditions: Google Translate, Yahoo! Translate, and two human translations, one produced with and one without conversational context. They found that taking context into account was important for accurate translation, as the condition that included context performed better. Scores on the proficiency tests agreed somewhat with automatic evaluation metrics but captured additional aspects of translation quality. The tests also proved robust to differences between test-takers.
1) The document discusses methods for creating bilingual word representations, which are vectors that represent words from two languages in a single vector space.
2) It presents an approach called Bilingual Skipgram that trains word representations by substituting words from one language to predict contexts in the other language.
3) Evaluation shows this approach achieves better performance on monolingual tasks compared to previous methods, while still performing well on cross-lingual tasks.
The document presents a context-aware topic model (CATM) for statistical machine translation. CATM jointly models local sentence context and global document topics to improve lexical selection. It achieves the highest translation performance compared to models using only context or topics. The CATM is the first work to jointly learn both context and topic information for lexical selection in statistical machine translation.
The document discusses methods for estimating the probability P(e) that a sentence e is natural or grammatically correct using n-gram language models. It explains that n-gram models approximate P(e) by conditioning each word on only the preceding n-1 words rather than on all preceding words. This helps address the problem of P(e) being estimated as 0 when e does not appear in the training data. The document also covers smoothing techniques such as linear interpolation and Witten-Bell smoothing, which combine n-gram and (n-1)-gram probabilities to further address cases where n-gram counts are 0.
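The linear-interpolation idea can be sketched with a toy bigram model: the bigram estimate is mixed with the unigram estimate so that unseen bigrams still receive nonzero probability. The corpus, function names, and the fixed interpolation weight are illustrative assumptions.

```python
from collections import Counter

def train_counts(corpus):
    """Collect unigram and bigram counts with sentence boundary markers."""
    unigrams, bigrams = Counter(), Counter()
    for sent in corpus:
        tokens = ["<s>"] + sent.split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def interp_prob(word, prev, unigrams, bigrams, lam=0.7):
    """Linear interpolation: P(w|prev) = lam * P_bigram + (1 - lam) * P_unigram.
    The unigram term keeps the estimate nonzero for unseen bigrams."""
    total = sum(unigrams.values())
    p_uni = unigrams[word] / total
    p_bi = bigrams[(prev, word)] / unigrams[prev] if unigrams[prev] else 0.0
    return lam * p_bi + (1 - lam) * p_uni

uni, bi = train_counts(["the cat sat", "the dog sat"])
```

For example, "the sat" never occurs as a bigram in this toy corpus, yet `interp_prob("sat", "the", uni, bi)` is still positive thanks to the unigram backoff.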
The document summarizes a research paper on training a natural language generator from unaligned data. The paper proposes a novel method that integrates the data alignment step into the sentence planning process using deep syntactic trees and rule-based surface realization. This allows the system to learn from incomplete trees and capture long-range syntactic dependencies without requiring a separate alignment step. The method uses an A* search algorithm during sentence planning and is trained on a restaurant domain dataset to generate text from abstract representations, showing improvement over previous work.
This document discusses various techniques for optimizing search space in phrase-based machine translation models, including:
1) Using graph structures and semirings like the tropical semiring to represent translation hypotheses as paths through a weighted graph and find optimal paths.
2) Applying constraints like distortion limits and beam search to prune unpromising partial translations.
3) Using heuristic functions to guide the search and pre-ordering methods like rules and learned models to reorder languages with different word orders.
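The tropical-semiring view mentioned above can be made concrete with a small sketch: along a path, edge weights combine by addition (the semiring "multiplication"), and competing hypotheses combine by taking the minimum (the semiring "addition"), so finding the best translation hypothesis reduces to a shortest-path search over the weighted graph. The graph here is a hypothetical example.

```python
import heapq

def best_path(graph, start, goal):
    """Best-path search under the tropical semiring: extend hypotheses
    with `+`, combine competing hypotheses with `min` (Dijkstra-style
    relaxation). `graph` maps node -> list of (next_node, weight)."""
    dist = {start: 0.0}
    heap = [(0.0, start)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == goal:
            return d
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry
        for nxt, w in graph.get(node, []):
            nd = d + w                                # tropical "times"
            if nd < dist.get(nxt, float("inf")):      # tropical "plus"
                dist[nxt] = nd
                heapq.heappush(heap, (nd, nxt))
    return float("inf")

# Hypothetical hypothesis graph with costs (lower is better):
graph = {"s": [("a", 1.0), ("b", 4.0)],
         "a": [("b", 1.0), ("g", 5.0)],
         "b": [("g", 1.0)]}
```

Swapping the semiring (e.g. to log probabilities with max) changes what the same search computes, which is why the semiring abstraction is useful for decoding.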
The document discusses various methods for optimization in machine translation decoding, including loss minimization, minimum error rate training (MERT), softmax loss, max margin loss, pairwise ranking optimization, and minimum Bayes risk. It covers challenges like non-differentiable error functions and vast search spaces, and how different methods address these challenges through techniques like Powell's method, gradient-based methods, and sentence-level BLEU approximations.
This document discusses various automatic evaluation metrics for machine translation:
- BLEU evaluates matching n-grams between reference and translated texts but ignores word position; because n-gram precision alone would favor shorter translations, a brevity penalty is applied.
- METEOR explicitly matches words accounting for stem, synonym, and paraphrase matches. It aims for high precision and recall.
- RIBES uses rank correlation coefficients between reference and translation word order to evaluate language pairs where word-for-word matching is difficult.
- Statistical testing like bootstrapping is used to determine if differences in evaluation scores between systems are statistically significant.
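A simplified sentence-level sketch of BLEU, illustrating the modified n-gram precisions and the brevity penalty; real implementations aggregate counts over a whole corpus and smooth zero counts, so this unsmoothed toy version is only an illustration.

```python
import math
from collections import Counter

def bleu(reference, hypothesis, max_n=4):
    """Sentence-level BLEU sketch: geometric mean of modified n-gram
    precisions (clipped by reference counts) times a brevity penalty."""
    ref, hyp = reference.split(), hypothesis.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_ngrams = Counter(zip(*[hyp[i:] for i in range(n)]))
        ref_ngrams = Counter(zip(*[ref[i:] for i in range(n)]))
        overlap = sum(min(c, ref_ngrams[g]) for g, c in hyp_ngrams.items())
        total = max(len(hyp) - n + 1, 0)
        if total == 0 or overlap == 0:
            return 0.0  # unsmoothed: any zero precision gives BLEU = 0
        log_precisions.append(math.log(overlap / total))
    bp = min(1.0, math.exp(1 - len(ref) / len(hyp)))  # brevity penalty
    return bp * math.exp(sum(log_precisions) / max_n)
```

A perfect match scores 1.0, while repeating a single common word scores 0 because its higher-order n-grams never match the reference.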
[Paper Introduction] Distant supervision for relation extraction without labeled data
1. Distant supervision
for relation extraction
without labeled data
Mike Mintz, Steven Bills, Rion Snow, Dan Jurafsky
ACL 2009
Introduced by Makoto Morishita
2. Contribution of this paper
• Proposed “distant supervision” for the first time.
• By using distant supervision, we can extract relations between entities from sentences without manual annotation work.
3. Current training methods
• Supervised learning
• Unsupervised learning
• Self-training
• Active learning
4. Supervised learning
• Use only annotated data to train a model.
• Creating the annotated data is costly.
Annotated data
5. Unsupervised learning
• Use only unannotated data.
• The result may not be suitable for some purposes.
Unannotated
data
6. Self-training
• Use annotated data as the seed for training a model, then let the model annotate the unlabeled data by itself.
• It may have low precision and inherit a bias from the seed annotated data.
Unannotated data
Annotated
data
7. Active learning
• Use the existing model to evaluate which data we want to annotate next, then annotate the selected data.
Unannotated data
Annotated
data
Evaluate
Annotate
8. Distant supervision
• We use an existing database and unannotated data to train a classifier, then annotate the new data.
Unannotated
data
Classifier
Unannotated
data
Existing database
train
train
annotate
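The labeling step sketched in the diagram above can be written out: any sentence containing both entities of a known database pair is treated as a (noisy) training example for that relation. The example relations are the ones used later in the deck; the substring-based entity matching is a deliberate simplification, since the paper locates entities with a named entity tagger.

```python
def distant_label(sentences, knowledge_base):
    """Distant-supervision heuristic: a sentence mentioning both entities
    of a known pair becomes a training example for that relation.
    `knowledge_base` maps (entity1, entity2) -> relation name."""
    examples = []
    for sent in sentences:
        for (e1, e2), relation in knowledge_base.items():
            if e1 in sent and e2 in sent:  # crude substring match
                examples.append((sent, e1, e2, relation))
    return examples

kb = {("Virginia", "Richmond"): "location-contains",
      ("France", "Nantes"): "location-contains"}
sents = ["Richmond, the capital of Virginia.",
         "Henry's Edict of Nantes helped the Protestants of France.",
         "Paris is lovely in spring."]
```

Both matching sentences become training examples for location-contains, while the unrelated sentence is ignored; the resulting labels are noisy, which is exactly why the classifier is trained over many such sentences.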
12. What we want to do
• Extract the relation between entities from
sentences.
• e.g.
sentence: Kyoto, the famous place in Japan.
entity: Japan, Kyoto
relation: location-contains <Japan, Kyoto>
13. In this work…
• Freebase: 102 relations, 940k entities,
1.8M instances.
Unannotated
data
Classifier
Unannotated
data
Freebase
train
train
annotate
Wikipedia
Multiclass logistic
regression classifier
Wikipedia
15. Training
• Find sentences that contain two entities.
- Such sentences tend to express a relation.
- Entities are found by a named entity tagger.
• Train the classifier.
- I will explain the features later.
16. Example
• Known relation:
location-contains <Virginia, Richmond>
location-contains <France, Nantes>
• We find sentences like:
- Richmond, the capital of Virginia.
- Henry’s Edict of Nantes helped the Protestants of France.
• Train the classifier using these sentences.
17. Testing
• Find sentences that contain two entities.
- Such sentences tend to express a relation.
- Entities are found by a named entity tagger.
• Using the trained classifier, we can determine whether these entities have a relation.
18. Features
• Lexical features:
- specific words between and surrounding
the two entities in the sentence.
• Syntactic features:
- dependency path
19. Lexical features
• The sequence of words between the two entities.
• The part-of-speech tags of these words.
• A flag indicating which entity came first in the sentence.
• A window of k words to the left of Entity 1 and their part-of-speech tags.
• A window of k words to the right of Entity 2 and their part-of-speech tags.
Astronomer Edwin Hubble was born in Marshfield, Missouri.
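A sketch of the lexical features listed above, applied to the example sentence (POS tags and the paper's feature conjunctions are omitted); the function name and the (start, end) span encoding are assumptions made for illustration.

```python
def lexical_features(tokens, e1_span, e2_span, k=2):
    """Extract a subset of the slide's lexical features: the words
    between the two entities, which entity comes first, and k-word
    windows left of entity 1 and right of entity 2.
    Spans are (start, end) token indices, end exclusive."""
    (s1, t1), (s2, t2) = sorted([e1_span, e2_span])
    return {
        "between": tokens[t1:s2],
        "e1_first": e1_span < e2_span,
        "left_window": tokens[max(0, s1 - k):s1],
        "right_window": tokens[t2:t2 + k],
    }

tokens = "Astronomer Edwin Hubble was born in Marshfield , Missouri .".split()
# Entity 1 = "Edwin Hubble" (tokens 1-2), entity 2 = "Missouri" (token 8):
feats = lexical_features(tokens, (1, 3), (8, 9))
```

For the example sentence, the words between the entities ("was born in Marshfield ,") are exactly the kind of pattern the classifier learns to associate with a relation such as born-in.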
20. Syntactic features
• A dependency path between the two entities.
• For each entity, one “window” node that is not part of the dependency path.
25. Conclusion
• By using this method, we can extract relations from unlabeled texts.
• Because the labels come from a database, they are consistent with the current database.
• The extracted relations appear to be accurate.
26. Example usage of distant supervision
Existing database → Target annotation
- Freebase (relations between entities) → Wikipedia sentences (find new relations)
- Emoticons → Tweets (annotate as positive or negative)
- Dependency parse trees, knowledge base → semantic parser
27. Comments
• Distant supervision can be useful for other tasks.
- Currently, this method is used mainly for the relation extraction task.
• However, it assumes that we already have a large database.