Text Mining 5
Information Extraction
!
Madrid Summer School 2014
Advanced Statistics and Data Mining
!
Florian Leitner
florian.leitner@upm.es
for real, now!
License:
Florian Leitner <florian.leitner@upm.es> MSS/ASDM: Text Mining
Incentive and Applications
➡ Statistical analyses of text sequences
!
• Text Segmentation
‣ word & sentence boundaries (see Text Mining #3)
• Part-of-Speech (PoS) Tagging & Chunking
‣ noun, verb, adjective, … & noun/verb/preposition/… phrases
• Named Entity Recognition (NER)
‣ organizations, persons, places, genes, chemicals, …
• Information Extraction
‣ locations/times, constants, formulas, entity linking, entity-relationships, …
144
Florian Leitner <florian.leitner@upm.es> MSS/ASDM: Text Mining
Probabilistic Graphical
Models (PGM)
• Graphs of hidden (blank) and observed (shaded) variables
(vertices/nodes).
• The edges depict dependencies, and if directed, show the
causal relationship between nodes.
145
[Figure: a hidden (blank) node H vs. an observed (shaded) node O; a chain of variables S1, S2, S3]
directed ➡ Bayesian Network (BN)
undirected ➡ Markov Random Field (MRF)
mixed ➡ Mixture Models
Koller & Friedman. Probabilistic Graphical Models. 2009
Florian Leitner <florian.leitner@upm.es> MSS/ASDM: Text Mining
Hidden vs. Observed State
and Statistical Parameters
Rao. A Kalman Filter Model of the Visual Cortex. Neural Computation 1997
146
Florian Leitner <florian.leitner@upm.es> MSS/ASDM: Text Mining
A Bayesian Network for
the Monty Hall Problem
147
[Bayesian network over the variables: Choice 1, Choice 2, Goat, Car, Winner]
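A directed model this small can be queried by brute-force enumeration. The sketch below assumes one plausible factorization, since the figure's exact edges are not recoverable from the slide: Car and Choice 1 are uniform, Goat is the door the host opens (never the car, never the first choice), Choice 2 either stays or switches, and Winner is deterministic given Choice 2 and Car.

```python
import itertools
from fractions import Fraction

def monty_hall(switch):
    """Enumerate the Monty Hall network: Car and Choice1 uniform; Goat is the
    opened door (not Car, not Choice1); Choice2 stays or switches; Winner is
    deterministic given Choice2 and Car."""
    p_win = Fraction(0)
    for car, choice1 in itertools.product(range(3), repeat=2):
        p = Fraction(1, 9)                      # P(Car) * P(Choice1), both uniform
        goats = [d for d in range(3) if d != car and d != choice1]
        for goat in goats:                      # host opens one goat door
            p_goat = p / len(goats)
            if switch:
                choice2 = next(d for d in range(3) if d not in (choice1, goat))
            else:
                choice2 = choice1
            if choice2 == car:
                p_win += p_goat
    return p_win

print(monty_hall(switch=False))  # 1/3
print(monty_hall(switch=True))   # 2/3
```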
Florian Leitner <florian.leitner@upm.es> MSS/ASDM: Text Mining
Markov Random Field
148
normalizing constant (partition function Z)
factor (clique potential)
factor graph: [nodes A — B — C — D — A, a cycle of four binary variables]
factor table:
θ(A, B): a0 b0 = 30, a0 b1 = 5, a1 b0 = 1, a1 b1 = 10
θ(B, C): b0 c0 = 100, b0 c1 = 1, b1 c0 = 1, b1 c1 = 100
θ(C, D): c0 d0 = 1, c0 d1 = 100, c1 d0 = 100, c1 d1 = 1
θ(D, A): d0 a0 = 100, d0 a1 = 1, d1 a0 = 1, d1 a1 = 100
Example: P(a1, b1, c0, d1) = 10 · 1 · 100 · 100 ÷ 7,201,840 ≈ 0.014
P(X = \vec{x}) = \frac{\prod_{cl \in \vec{x}} \theta_{cl}(cl)}{\sum_{\vec{x} \in X} \prod_{cl \in \vec{x}} \theta_{cl}(cl)}
cl … [maximal] clique; a subset of nodes in the
graph where every pair of nodes is connected
History class: Ising developed a linear
field to model (binary) atomic spin states
(Ising, 1924); the 2-dim. model problem
was solved by Onsager in 1944.
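A minimal sketch of the factor-table arithmetic above in plain Python: multiply the clique potentials for one assignment and normalize by the partition function Z (the sum over all 2⁴ assignments), reproducing the 7,201,840 and ≈0.014 shown above.

```python
import itertools

# Clique potentials for the four-node cycle A-B-C-D (values from the factor table above)
theta_AB = {(0, 0): 30, (0, 1): 5, (1, 0): 1, (1, 1): 10}
theta_BC = {(0, 0): 100, (0, 1): 1, (1, 0): 1, (1, 1): 100}
theta_CD = {(0, 0): 1, (0, 1): 100, (1, 0): 100, (1, 1): 1}
theta_DA = {(0, 0): 100, (0, 1): 1, (1, 0): 1, (1, 1): 100}

def unnormalized(a, b, c, d):
    """Product of clique potentials for one full assignment."""
    return theta_AB[a, b] * theta_BC[b, c] * theta_CD[c, d] * theta_DA[d, a]

# Partition function Z: sum over all 2^4 assignments
Z = sum(unnormalized(*x) for x in itertools.product((0, 1), repeat=4))
print(Z)                              # 7201840
print(unnormalized(1, 1, 0, 1) / Z)   # ≈ 0.014
```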
Florian Leitner <florian.leitner@upm.es> MSS/ASDM: Text Mining
A First Look At Probabilistic
Graphical Models
• Latent Dirichlet Allocation: LDA
‣ Blei, Ng, and Jordan. Journal of Machine Learning Research 2003
‣ For assigning “topics” to “documents”
149
i.e., Text Classification; last lesson (4)
Florian Leitner <florian.leitner@upm.es> MSS/ASDM: Text Mining
Latent Dirichlet Allocation
(LDA 1/3)
• Intuition for LDA
• From: Edwin Chen. Introduction to LDA. 2011
‣ “Document Collection”
• I like to eat broccoli and bananas.
• I ate a banana and spinach smoothie for breakfast.
• Chinchillas and kittens are cute.
• My sister adopted a kitten yesterday.
• Look at this cute hamster munching on a piece of broccoli.
!
!
150
➡ Sentences 1–2: Topic A; Sentences 3–4: Topic B; Sentence 5: 0.6 Topic A + 0.4 Topic B
Topic A: 30% broccoli, 15% bananas, 10% breakfast, 10% munching, …
!
Topic B: 20% chinchillas, 20% kittens, 20% cute, 15% hamster, …
Florian Leitner <florian.leitner@upm.es> MSS/ASDM: Text Mining
The Dirichlet Process
• A Dirichlet is a [possibly continuous] distribution over
[discrete/multinomial] distributions (probability masses).
!
!
!
• The Dirichlet Process samples multiple independent,
discrete distributions θi with repetition from θ (“statistical
clustering”).
151
D(\theta, \alpha) = \frac{\Gamma\left(\sum_i \alpha_i\right)}{\prod_i \Gamma(\alpha_i)} \prod_i \theta_i^{\alpha_i - 1}
(∑ θi = 1; a PMF)
[Plate diagram: α → Xn, inside a plate repeated N times]
a Dirichlet Process is like drawing from an
(infinite) ”bag of dice“ (with finite faces)
Gamma function -> a
“continuous” factorial [!]
1. Draw a new distribution X from D(θ, α)
2. With probability α ÷ (α + n − 1), draw a new X;
   with probability n ÷ (α + n − 1), (re-)sample an Xi from X
Dirichlet prior: ∀ αi ∈ α: αi > 0
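A small sketch of the "bag of dice" view, assuming NumPy is available: draw one PMF (a "die") from a Dirichlet, then sample discrete observations from it.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha = [0.5, 0.5, 0.5]            # Dirichlet prior (equal, <1 -> "choose few")

# Draw one die (a PMF over 3 faces) from the Dirichlet ...
theta = rng.dirichlet(alpha)
print(theta, theta.sum())          # the components sum to 1

# ... and roll it: sample discrete observations from the drawn PMF
draws = rng.choice(len(theta), size=10, p=theta)
print(draws)
```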
Florian Leitner <florian.leitner@upm.es> MSS/ASDM: Text Mining
The Dirichlet Prior α
152
[Figure: example Dirichlet densities over three outcomes (red, green, blue), for different priors α]
Frigyik et al. Introduction to the Dirichlet
Distribution and Related Processes. 2010
⤳ equal, =1 ➡ uniform distribution
⤳ equal, <1 ➡ marginal distrib. (”choose few“)
⤳ equal, >1 ➡ symmetric, mono-modal distrib.
⤳ not equal, >1 ➡ non-symmetric distribution
Documents and Topic distributions (N=3)
Florian Leitner <florian.leitner@upm.es> MSS/ASDM: Text Mining
Latent Dirichlet Allocation
(LDA 2/3)
• α - per-document Dirichlet
prior
• θd - topic distribution of
document d
• zd,n - word-topic assignments
• wd,n - observed words
• βk - word distrib. of topic k
• η - per-topic Dirichlet prior
153
[Plate diagram: α → θd → zd,n → wd,n ← βk ← η; plates: N (Words), D (Documents), K (Topics)]
Joint Probability:
P(B, \Theta, Z, W) = \left( \prod_{k}^{K} P(\beta_k \mid \eta) \right) \left( \prod_{d}^{D} P(\theta_d \mid \alpha) \prod_{n}^{N} P(z_{d,n} \mid \theta_d) \, P(w_{d,n} \mid \beta_{1:K}, z_{d,n}) \right)
(the factors, left to right: P(Topics), P(Document-T.), P(Word-T. | Document-T.), P(Word | Topics, Word-T.))
Posterior: P(Topic | Word), i.e. P(B, \Theta, Z \mid W) = P(B, \Theta, Z, W) ÷ P(W)
What Topics is a Word assigned to?
\mathrm{term\text{-}score}_{k,n} = \hat{\beta}_{k,n} \log \frac{\hat{\beta}_{k,n}}{\left( \prod_{j}^{K} \hat{\beta}_{j,n} \right)^{1/K}}
(dampens the topic-specific score of terms assigned to many topics)
Florian Leitner <florian.leitner@upm.es> MSS/ASDM: Text Mining
Latent Dirichlet Allocation
(LDA 3/3)
• LDA inference in a nutshell
‣ Calculate the probability that Topic t generated Word w.
‣ Initialization: Choose K and randomly assign one out of the K Topics to each of
the N Words in each of the D Documents.
• The same word can have different Topics at different positions in the Document.
‣ Then, for each Topic:

And for each Word in each Document:
1. Compute P(Word-Topic | Document): the proportion of [Words assigned to] Topic t in Document d
2. Compute P(Word | Topics, Word-Topic): the probability a Word w is assigned a Topic t (using the general
distribution of Topics and the Document-specific distribution of [Word-] Topics)
• Note that a Word can be assigned a different Topic each time it appears in a Document.
3. Given the prior probabilities of a Document’s Topics and that of Topics in general, reassign

P(Topic | Word) = P(Word-Topic | Document) * P(Word | Topics, Word-Topic)
‣ Repeat until P(Topic | Word) stabilizes (e.g., MCMC Gibbs sampling, Course 04)
154
MCMC in Python: PyMC or PySTAN
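A minimal sketch of fitting LDA to the toy corpus from the intuition slide, assuming scikit-learn is available; note that scikit-learn's implementation uses variational inference rather than the Gibbs sampling outlined above (PyMC or PySTAN would be the MCMC route).

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "I like to eat broccoli and bananas.",
    "I ate a banana and spinach smoothie for breakfast.",
    "Chinchillas and kittens are cute.",
    "My sister adopted a kitten yesterday.",
    "Look at this cute hamster munching on a piece of broccoli.",
]

counts = CountVectorizer(stop_words="english").fit(docs)
X = counts.transform(docs)                      # document-term matrix

lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(X)               # θ_d: per-document topic mixture
print(doc_topics.round(2))

words = counts.get_feature_names_out()
for k, beta_k in enumerate(lda.components_):    # β_k: per-topic word weights
    top = beta_k.argsort()[::-1][:4]
    print(f"topic {k}:", [words[i] for i in top])
```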
Florian Leitner <florian.leitner@upm.es> MSS/ASDM: Text Mining
Retrospective
We have seen how to…
• Design generative models of natural language.
• Segment, tokenize, and compare text and/or sentences.
• [Multi-] label whole documents or chunks of text.
!
Some open questions…
• How to assign labels to individual tokens in a stream?
‣ without using a dictionary/gazetteer
• How to detect semantic relationships between tokens?
‣ dependency & phrase structure parsing not in this class !
155
Florian Leitner <florian.leitner@upm.es> MSS/ASDM: Text Mining
Probabilistic Models for
Sequential Data
• a.k.a. Temporal or Dynamic Bayesian Networks
‣ static process w/ constant model ➡ temporal process w/ dynamic model
‣ model structure and parameters are [still] constant
‣ the topology within a [constant] “time slice” is depicted
• Markov Chain (MC; Markov. 1906)
• Hidden Markov Model (HMM; Baum et al. 1970)
• MaxEnt Markov Model (MEMM; McCallum et al. 2000)
• [Markov] Conditional Random Field (CRF; Lafferty et al. 2001)
‣ Naming: all four models make the Markov assumption (Text Mining 2)
156
generative (MC, HMM) vs. discriminative (MEMM, CRF)
Florian Leitner <florian.leitner@upm.es> MSS/ASDM: Text Mining
From A Markov Chain to a
Hidden Markov Model
157
[Figure: Markov chain "time-slice" W' → W with transition P(W | W'); HMM "time-slice" S' → S (hidden states) with transition P(S | S') and emission S → W (observed state) with P(W | S); "unrolled" (for 3 words): S0 → S1 → S2 → S3, emitting W1, W2, W3]
W depends on S, and S in turn only depends on S'.
Florian Leitner <florian.leitner@upm.es> MSS/ASDM: Text Mining
A Language-Based Intuition
for Hidden Markov Models
• A Markov Chain: P(W) = ∏ P(w | w’)
‣ assumes the observed words are in and of themselves the cause of the
observed sequence.
• A Hidden Markov Model P(S, W) = ∏ P(s | s’) P(w | s)
‣ assumes the observed words are emitted by a hidden (not observable)
sequence, for example the chain of part-of-speech-states.
158
S DT NN VB NN .
The dog ran home !
again, this is the “unrolled” model that does not depict the conditional dependencies
Florian Leitner <florian.leitner@upm.es> MSS/ASDM: Text Mining
The two Matrices of a
Hidden Markov Model
159
Transition matrix P(s | s'):
        DT     NN     VB    …
  DT    0.03   0.7    0
  NN    0      0      0.5
  VB    0      0.5    0.2
  …
Observation matrix P(w | s) — very sparse (W is large) ➡ Smoothing! — underflow danger ➡ use "log Ps"!
        word   word   word  …
  DT    0.3    0      0
  NN    0.0001 0.002  0
  VB    0      0      0.001
  …
[Figure: S' → S → W]
a.k.a. "CPDs": Conditional Probability Distributions (as measured from annotated PoS corpora)
Florian Leitner <florian.leitner@upm.es> MSS/ASDM: Text Mining
Three Problems Solved by
HMMs
Evaluation: Given a HMM, infer the P of the observed
sequence (because a HMM is a generative model).
Solution: Forward Algorithm
Decoding: Given a HMM and an observed sequence, predict
the hidden states that lead to this observation.
Solution: Viterbi Algorithm (sketched below)
Training: Given only the graphical model and an observation
sequence, learn the best [smoothed] parameters.
Solution: Baum-Welch Algorithm
160
all three algorithms are implemented using dynamic programming
in Bioinformatics:
Likelihood of a particular DNA
element
in Statistical NLP:
PoS annotation
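A minimal sketch of the Viterbi decoder in log space; the three-state transition and emission tables are made up for illustration (they only loosely echo the matrices on the previous slide and are not a real PoS model).

```python
import numpy as np

states = ["DT", "NN", "VB"]
log_trans = np.log(np.array([[0.03, 0.70, 0.27],     # P(s | s'=DT)
                             [0.10, 0.40, 0.50],     # P(s | s'=NN)
                             [0.30, 0.50, 0.20]]))   # P(s | s'=VB)
vocab = {"the": 0, "dog": 1, "ran": 2}
log_emit = np.log(np.array([[0.90, 0.05, 0.05],      # P(w | DT)
                            [0.10, 0.80, 0.10],      # P(w | NN)
                            [0.10, 0.10, 0.80]]))    # P(w | VB)
log_start = np.log(np.array([0.6, 0.3, 0.1]))        # P(s_1)

def viterbi(words):
    obs = [vocab[w] for w in words]
    T, N = len(obs), len(states)
    delta = np.full((T, N), -np.inf)                  # best log prob ending in state j at time t
    back = np.zeros((T, N), dtype=int)                # backpointers
    delta[0] = log_start + log_emit[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_trans    # scores[i, j]: best path via i to j
        back[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_emit[:, obs[t]]
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):                     # follow the backpointers
        path.append(int(back[t][path[-1]]))
    return [states[s] for s in reversed(path)]

print(viterbi(["the", "dog", "ran"]))                 # ['DT', 'NN', 'VB']
```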
Florian Leitner <florian.leitner@upm.es> MSS/ASDM: Text Mining
Three Limitations of HMMs
Markov assumption: The next state only depends on the
current state.
Example issue: trigrams
Output assumption: The output (observed value) is
independent of all previous outputs (given the current state).
Example issue: word morphology
Stationary assumption: Transition probabilities are
independent of the actual time when they take place.
Example issue: position in sentence
161
(label bias* problem!)
(long-range dependencies!)
(inflection, declension!)
MEMM / CRF / "unsolved"
(CRF)
Florian Leitner <florian.leitner@upm.es> MSS/ASDM: Text Mining
From Generative to
Discriminative Markov Models
162
[Figure: three time-slice graphs]
• Hidden Markov Model (first order version): S' → S → W; models P(S', S, W); generative model; lower bias is beneficial for small training sets
• Maximum Entropy Markov Model (first order version): S' → S ← W; models P(S | S', W)
• Conditional Random Field (linear chain version): S' — S — W (undirected); this "clique" makes CRFs expensive to compute
NB boldface W: all words!
Florian Leitner <florian.leitner@upm.es> MSS/ASDM: Text Mining
Maximum Entropy
(MaxEnt 3/3)
‣ In summary, MaxEnt is about selecting the "maximal" model p* (select some model that maximizes the conditional entropy…):
p^* = \operatorname{argmax}_{p \in P} \; -\sum_{x \in X} p(x) \sum_{y \in Y} p(y \mid x) \log_2 p(y \mid x)
‣ That obeys the following conditional equality constraint (…using a conditional model that matches the (observed) joint probabilities):
\sum_{x \in X} P(x) \sum_{y \in Y} P(y \mid x) \, f(x, y) = \sum_{x \in X,\, y \in Y} P(x, y) \, f(x, y)
‣ Using, e.g., Lagrange multipliers, one can establish the optimal λ weights of the model that maximize the entropy of this probability (the "Exponential Model"):
p^*(y \mid x) = \frac{\exp\left(\sum_i \lambda_i f_i(x, y)\right)}{\sum_{y \in Y} \exp\left(\sum_i \lambda_i f_i(x, y)\right)}
163
REMINDER
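A tiny numeric sketch of the exponential ("softmax") model p*(y|x); the binary features f_i(x, y) and weights λ_i below are made up for illustration.

```python
import math

def p_star(x, y, classes, features, lambdas):
    """p*(y|x) = exp(sum_i λ_i f_i(x,y)) / sum_y' exp(sum_i λ_i f_i(x,y'))."""
    score = lambda yy: math.exp(sum(l * f(x, yy) for l, f in zip(lambdas, features)))
    return score(y) / sum(score(yy) for yy in classes)

# two toy features: "word is capitalized & label is NN", "word ends in -ly & label is RB"
features = [
    lambda x, y: float(x[0].isupper() and y == "NN"),
    lambda x, y: float(x.endswith("ly") and y == "RB"),
]
lambdas = [1.5, 2.0]
classes = ["NN", "VB", "RB"]

print({y: round(p_star("quickly", y, classes, features, lambdas), 3) for y in classes})
```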
Florian Leitner <florian.leitner@upm.es> MSS/ASDM: Text Mining
Maximum Entropy Markov
Models (MEMM)
Simplify training of P(s | s’, w)

by splitting the model into |S| separate
transition functions Ps’(s | w) for each s’
164
[Figure: S' → S ← W; models P(S | S', W)]
P_{s'}(s \mid w) = \frac{\exp\left(\sum_i \lambda_i f_i(w, s)\right)}{\sum_{s^* \in S} \exp\left(\sum_i \lambda_i f_i(w, s^*)\right)}
P(s | s’, w) = Ps’(s | w)
MaxEnt used {x, y} which in the sequence model now are {w, s}
Florian Leitner <florian.leitner@upm.es> MSS/ASDM: Text Mining
The Label Bias Problem of
Directional Markov Models
165
The robot wheels are round.
The robot wheels Fred around.
The robot wheels were broken.
DT NN VB NN RB
DT NN NN VB JJ
DT NN ?? — —
yet unseen!
Wallach. Efficient Training of CRFs. MSc 2002
because of their
directionality constraints,
MEMMs & HMMs suffer
from the label bias problem
[Figure: MEMM time-slice S' → S ← W; HMM time-slice S' → S → W]
Florian Leitner <florian.leitner@upm.es> MSS/ASDM: Text Mining
Markov Random Field
166
normalizing constant (partition function - Z)
factor (clique potential)
factor graph: [nodes A — B — C — D — A, a cycle of four binary variables]
factor table:
θ(A, B): a0 b0 = 30, a0 b1 = 5, a1 b0 = 1, a1 b1 = 10
θ(B, C): b0 c0 = 100, b0 c1 = 1, b1 c0 = 1, b1 c1 = 100
θ(C, D): c0 d0 = 1, c0 d1 = 100, c1 d0 = 100, c1 d1 = 1
θ(D, A): d0 a0 = 100, d0 a1 = 1, d1 a0 = 1, d1 a1 = 100
Example: P(a1, b1, c0, d1) = 10 · 1 · 100 · 100 ÷ 7,201,840 ≈ 0.014
P(X = \vec{x}) = \frac{\prod_{cl \in \vec{x}} \theta_{cl}(cl)}{\sum_{\vec{x} \in X} \prod_{cl \in \vec{x}} \theta_{cl}(cl)}
cl … [maximal] clique; a subset of nodes in the
graph where every pair of nodes is connected
REMINDER
Florian Leitner <florian.leitner@upm.es> MSS/ASDM: Text Mining
Conditional Random Field
167
[Figure: linear-chain CRF; the hidden states …, S', S, … form a chain, each connected to the entire observed sequence W = w1, …, wn; the (max.) clique is (S', S, W)]
Models a per-state exponential function of joint probability over the entire observed sequence W.
Label bias "solved" by conditioning the MRF Y–Y' on the (possibly) entire observed sequence (feature function dependent).
MRF:
P(X = \vec{x}) = \frac{\prod_{cl \in \vec{x}} \theta_{cl}(cl)}{\sum_{\vec{x} \in X} \prod_{cl \in \vec{x}} \theta_{cl}(cl)}
conditioned on the observations X:
P(Y = \vec{y} \mid X = \vec{x}) = \frac{\prod_{y \in \vec{y}} \theta_{cl}(y', y, \vec{x})}{\sum_{\vec{y} \in Y} \prod_{y \in \vec{y}} \theta_{cl}(y', y, \vec{x})}
CRF (NB: W is upper-case/bold, i.e. all words, not a single lower-case w):
P(s \mid W) = \frac{\exp\left(\sum_i \lambda_i f_i(W, s', s)\right)}{\sum_{s^* \in S} \exp\left(\sum_i \lambda_i f_i(W, s'^*, s^*)\right)}
i.e., roughly "P(s, s', W)" ÷ P(W), where s' "went missing" from the left-hand side of the notation.
Florian Leitner <florian.leitner@upm.es> MSS/ASDM: Text Mining
Parameter Estimation and
L2-Regularization of CRFs
• For training \{(Y^{(n)}, X^{(n)})\}_{n=1}^{N} sequence pairs with K features
• Parameter estimation using conditional log-likelihood:
\ell(\Lambda) = \operatorname{argmax}_{\lambda \in \Lambda} \sum_{n}^{N} \log P(Y^{(n)} \mid X^{(n)}; \lambda)
• Substitute log P(Y^{(n)} | X^{(n)}) with the log exponential model:
\ell(\lambda) = \sum_{n=1}^{N} \sum_{y \in Y^{(n)}} \sum_{i=1}^{K} \lambda_i f_i(y', y, X^{(n)}) \;-\; \sum_{n=1}^{N} \log Z(X^{(n)})
• Add a penalty for parameters with a too-high L2-norm (σ²: a free parameter; see the sketch below):
\ell(\lambda) = \sum_{n=1}^{N} \sum_{y \in Y^{(n)}} \sum_{i=1}^{K} \lambda_i f_i(y', y, X^{(n)}) \;-\; \sum_{n=1}^{N} \log Z(X^{(n)}) \;-\; \sum_{i=1}^{K} \frac{\lambda_i^2}{2\sigma^2}
168
(regularization reduces the effects of overfitting)
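A minimal training sketch, assuming the third-party sklearn-crfsuite package (a CRFsuite wrapper) is installed: its c2 option is the L2 penalty weight, playing the role of the 1/(2σ²) term above. The two training sentences and the feature function are made up for illustration only.

```python
import sklearn_crfsuite

def features(tokens, i):
    """Feature dict for position i; may look at the *whole* sequence (cf. boldface W)."""
    return {
        "word.lower": tokens[i].lower(),
        "word.istitle": tokens[i].istitle(),
        "prev.lower": tokens[i - 1].lower() if i > 0 else "<s>",
        "next.lower": tokens[i + 1].lower() if i < len(tokens) - 1 else "</s>",
    }

train = [
    (["The", "robot", "wheels", "are", "round", "."], ["DT", "NN", "NN", "VB", "JJ", "."]),
    (["The", "robot", "wheels", "Fred", "around", "."], ["DT", "NN", "VB", "NN", "RB", "."]),
]
X = [[features(toks, i) for i in range(len(toks))] for toks, _ in train]
y = [tags for _, tags in train]

# c2 is the L2 penalty weight (roughly 1/(2σ²) in the slide's notation)
crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.0, c2=0.1, max_iterations=50)
crf.fit(X, y)

test = ["The", "robot", "wheels", "were", "broken", "."]
print(crf.predict([[features(test, i) for i in range(len(test))]]))
```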
Florian Leitner <florian.leitner@upm.es> MSS/ASDM: Text Mining
Model Summary:

HMM, MEMM, CRF
• A HMM
‣ generative model
‣ efficient to learn and deploy
‣ trains with little data
‣ generalizes well (low bias)
• A MEMM
‣ better labeling performance
‣ modeling of features
‣ label bias problem
!
• CRF
‣ conditioned on entire observation
‣ complex features over full input
‣ training time scales exponentially: O(N·T·M²·G)
• N: # of sequence pairs; T: E[sequence length]; M: # of (hidden) states; G: # of gradient computations for parameter estimation
169
a PoS model w/45 states and 1 M
words can take a week to train…
Sutton & McCallum. An Introduction to CRFs for Relational Learning. 2006
Florian Leitner <florian.leitner@upm.es> MSS/ASDM: Text Mining
Example Applications
of Sequence Models
170
Florian Leitner <florian.leitner@upm.es> MSS/ASDM: Text Mining
The Parts of Speech
• The Parts of Speech:
‣ noun: NN, verb: VB, adjective: JJ, adverb: RB, preposition: IN,

personal pronoun: PRP, …
‣ e.g. the full Penn Treebank PoS tagset contains 48 tags:
‣ 34 grammatical tags (i.e., “real” parts-of-speech) for words
‣ one for cardinal numbers (“CD”; i.e., a series of digits)
‣ 13 for [mathematical] “SYM” and currency “$” symbols, various types of
punctuation, as well as for opening/closing parenthesis and quotes
171
I ate the pizza with green peppers .
PRP VB DT NN IN JJ NN .
“PoS tags”
REMINDER
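A quick way to produce a tag sequence like the one above, assuming NLTK and its tokenizer/tagger data packages are installed (nltk.download('punkt'), nltk.download('averaged_perceptron_tagger')); the printed tags are illustrative.

```python
import nltk

tokens = nltk.word_tokenize("I ate the pizza with green peppers .")
print(nltk.pos_tag(tokens))
# e.g. [('I', 'PRP'), ('ate', 'VBD'), ('the', 'DT'), ('pizza', 'NN'),
#       ('with', 'IN'), ('green', 'JJ'), ('peppers', 'NNS'), ('.', '.')]
```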
Florian Leitner <florian.leitner@upm.es> MSS/ASDM: Text Mining
The Parts of Speech
• Corpora for the [supervised] training of PoS taggers
‣ Brown Corpus (AE from ~1961)
‣ British National Corpus: BNC (20th-century British)
‣ American National Corpus: ANC (AE from the 90s)
‣ Lancaster Corpus of Mandarin Chinese: LCMC (Books in Mandarin)
‣ The GENIA corpus (Biomedical abstracts from PubMed)
‣ NEGR@ (German Newswire from the 90s)
‣ Spanish and Arabic corpora should be (commercially…) available… ???
172
I ate the pizza with green peppers .
PRP VB DT NN IN JJ NN .
Florian Leitner <florian.leitner@upm.es> MSS/ASDM: Text Mining
Noun/Verb Phrase Chunking
and BIO-Labels
173
The brown fox quickly jumps over the lazy dog .
a pangram (hint: check the letters)
DT JJ NN RB VBZ IN DT JJ NN .
B-N I-N I-N B-V I-V O B-N I-N I-N O
Performance (2nd-order CRF) ≈ 94%
Main Problem: embedded & chained NPs (N of N and N)
!
Chunking is “more robust to the highly diverse corpus of text on the Web”
and [exponentially] faster than parsing.
Banko et al. Open Information Extraction from the Web. IJCAI 2007 a paper with over 700 citations
Wermter et al. Recognizing noun phrases in biomedical text. SMBM 2005 error sources
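A small helper, sketched in plain Python, that turns BIO labels like the ones above back into phrase chunks.

```python
def bio_to_chunks(tokens, labels):
    """Collect (chunk_type, phrase) pairs from parallel token/BIO-label lists."""
    chunks, current, ctype = [], [], None
    for tok, lab in zip(tokens, labels):
        # close the open chunk on "O", on a new "B-", or on a type change
        if lab.startswith("B-") or lab == "O" or (ctype and lab[2:] != ctype):
            if current:
                chunks.append((ctype, " ".join(current)))
            current, ctype = [], None
        if lab != "O":
            if not current:
                ctype = lab[2:]
            current.append(tok)
    if current:
        chunks.append((ctype, " ".join(current)))
    return chunks

tokens = "The brown fox quickly jumps over the lazy dog .".split()
labels = "B-N I-N I-N B-V I-V O B-N I-N I-N O".split()
print(bio_to_chunks(tokens, labels))
# [('N', 'The brown fox'), ('V', 'quickly jumps'), ('N', 'the lazy dog')]
```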
Florian Leitner <florian.leitner@upm.es> MSS/ASDM: Text Mining
Word Sense Disambiguation
(WSD)
• Basic Example: hard [JJ]
‣ physically hard (a hard stone)
‣ difficult [task] (a hard task)
‣ strong/severe (a hard wind)
‣ dispassionate [personality] (a hard
bargainer)
• Entity Disambig.: bank [NN]
‣ finance (“bank account”)
‣ terrain (“river bank”)
‣ aeronautics (“went into bank”)
‣ grouping (“a bank of …”)
!
• SensEval
‣ http://www.senseval.org/
‣ SensEval/SemEval Challenges
‣ provides corpora where every word is
tagged with its sense
• WordNet
‣ http://wordnet.princeton.edu/
‣ a labeled graph of word senses
• Applications
‣ Named Entity Recognition
‣ Machine Translation
‣ Language Understanding
174
Note the PoS-tag “dependency”:
otherwise, the two examples
would have even more senses!
Florian Leitner <florian.leitner@upm.es> MSS/ASDM: Text Mining
Word Representations:
Unsupervised WSD
• Idea 1: Words with similar meaning have similar environments.
• Use a word vector to count a word’s surrounding words.
• Similar words now will have similar word vectors.
‣ see Text Mining 4, Cosine Similarity
‣ visualization: Principal Component Analysis
!
• Idea 2: Words with similar meaning have similar environments.
• Use the surroundings of unseen words to "smooth" language models
(i.e., exploit the correlation between a word wi and its context cj).
‣ see Text Mining 4: TF-IDF weighting, Cosine similarity and point-wise MI
‣ Levy & Goldberg. Linguistic Regularities in Sparse and Explicit Word Representations.
CoNLL 2014
175
Levy & Goldberg took
two words on each side
to “beat” a neural

network model with a
four word window!
Turian et al. Word representations. ACL 2010
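A toy sketch of Idea 1, assuming NumPy: build count-based context vectors over a ±2-word window and compare words by cosine similarity (see Text Mining 4); the mini corpus reuses the LDA example sentences.

```python
import numpy as np

corpus = [
    "i like to eat broccoli and bananas".split(),
    "i ate a banana and spinach smoothie for breakfast".split(),
    "chinchillas and kittens are cute".split(),
    "look at this cute hamster munching on a piece of broccoli".split(),
]
vocab = sorted({w for sent in corpus for w in sent})
index = {w: i for i, w in enumerate(vocab)}

# word x context-word co-occurrence counts within a +/-2 token window
counts = np.zeros((len(vocab), len(vocab)))
for sent in corpus:
    for i, w in enumerate(sent):
        for j in range(max(0, i - 2), min(len(sent), i + 3)):
            if j != i:
                counts[index[w], index[sent[j]]] += 1

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

print(cosine(counts[index["broccoli"]], counts[index["bananas"]]))
print(cosine(counts[index["broccoli"]], counts[index["kittens"]]))
```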
Florian Leitner <florian.leitner@upm.es> MSS/ASDM: Text Mining
Named Entity Recognition
(NER)
176
Date
Location
Money
Organization
Percentage
Person
➡ Conditional Random Field
➡ Ensemble Methods; +SVM, HMM, MEMM, … ➡ pyensemble

NB these are corpus-based approaches (supervised)
CoNLL03: http://www.cnts.ua.ac.be/conll2003/ner/
!
How much training data do I need?
“corpora list”
Image Source: v@s3k [a GATE session; http://vas3k.ru/blog/354/]
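One corpus-trained baseline to try, assuming NLTK plus its 'punkt', 'averaged_perceptron_tagger', 'maxent_ne_chunker' and 'words' data packages; the sentence and the printed labels are illustrative only.

```python
import nltk

text = "BBDO South in Atlanta handles corporate advertising for Georgia-Pacific."
tree = nltk.ne_chunk(nltk.pos_tag(nltk.word_tokenize(text)))
for subtree in tree:
    if isinstance(subtree, nltk.Tree):          # named-entity subtrees carry a label
        print(subtree.label(), " ".join(tok for tok, tag in subtree.leaves()))
# e.g.  ORGANIZATION BBDO South
#       GPE Atlanta
#       ORGANIZATION Georgia-Pacific
```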
Florian Leitner <florian.leitner@upm.es> MSS/ASDM: Text Mining
The Next Step: Relationship
Extraction
Given this piece of text:
The fourth Wells account moving to another agency is the
packaged paper-products division of Georgia-Pacific Corp., which
arrived at Wells only last fall. Like Hertz and the History Channel,
it is also leaving for an Omnicom-owned agency, the BBDO South
unit of BBDO Worldwide. BBDO South in Atlanta, which handles
corporate advertising for Georgia-Pacific, will assume additional
duties for brands like Angel Soft toilet tissue and Sparkle paper
towels, said Ken Haldin, a spokesman for Georgia-Pacific in
Atlanta.
Which organizations operate in Atlanta? (BBDO S., G-P)
177
Florian Leitner <florian.leitner@upm.es> MSS/ASDM: Text Mining
Four Approaches to
Relationship Extraction
• Co-mention window
‣ e.g.: if ORG and LOC entity within
same sentence and no more than x
tokens in between, treat the pair as a
hit.
‣ Low precision, high recall; trivial, many false positives (sketched after this list).
• Dependency Parsing
‣ if the shortest path connecting the two entities contains certain nodes (e.g., the preposition "in/IN"), extract the pair.
‣ Balanced precision and recall,
computationally EXPENSIVE.
• Pattern Extraction
‣ e.g.: <ORG>+ <IN> <LOC>+
‣ High precision, low recall;
cumbersome, but very common
‣ pattern learning can help
• Machine Learning
‣ features for sentences with entities
and some classifier (e.g., SVM,
MaxEnt, ANN, …)
‣ very variable mileage



but much more fun than anything
else in the speaker’s opinion…
178
(ML features: token-distance, # of tokens between the entities, the tokens before/after them, etc.)
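A minimal sketch of the co-mention window approach from the list above, in plain Python; the token spans stand in for real NER output.

```python
def co_mentions(tokens, entities, window=10):
    """entities: list of (start, end, type, text) token spans; pair every ORG with
    every LOC whose spans are at most `window` tokens apart."""
    orgs = [e for e in entities if e[2] == "ORG"]
    locs = [e for e in entities if e[2] == "LOC"]
    pairs = []
    for o in orgs:
        for l in locs:
            gap = max(l[0] - o[1], o[0] - l[1])      # tokens between the two spans
            if gap <= window:
                pairs.append((o[3], l[3]))
    return pairs

# token positions shown for reference; the spans below index into this list
tokens = "BBDO South in Atlanta , which handles corporate advertising for Georgia-Pacific".split()
entities = [(0, 2, "ORG", "BBDO South"), (3, 4, "LOC", "Atlanta"), (10, 11, "ORG", "Georgia-Pacific")]
print(co_mentions(tokens, entities, window=8))
# [('BBDO South', 'Atlanta'), ('Georgia-Pacific', 'Atlanta')]
```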
Florian Leitner <florian.leitner@upm.es> MSS/ASDM: Text Mining
Advanced Evaluation Metrics:
AUC, ROC, and PR
179
Image Source: WikiMedia Commons, kku (“kakau”, eddie)
TPR / Recall = TP ÷ (TP + FN)
FPR = FP ÷ (FP + TN)
Precision = TP ÷ (TP + FP)
• Davis & Goadrich. The Relationship Between PR and ROC
Curves. ICML 2006
• Landgrebe et al. Precision-recall operating characteristic
(P-ROC) curves in imprecise environments. Pattern
Recognition 2006
• Hanczar et al. Small-Sample Precision of ROC-Related
Estimates. Bioinformatics 2010
“an algorithm which optimizes the area under
the ROC curve is not guaranteed to optimize
the area under the PR curve”
Davis & Goadrich, 2006
Area Under the Curve
Receiver Operating Characteristic
Precision-Recall
i.e., ROC is
affected by the
same class-
imbalance issues
as accuracy
(Lesson #4)!
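A short sketch contrasting ROC AUC and PR AUC on a synthetic, heavily imbalanced problem, assuming scikit-learn; the scores and labels are randomly generated for illustration.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, precision_recall_curve, auc

rng = np.random.default_rng(0)
y_true = np.r_[np.ones(50), np.zeros(5000)]                   # heavy class imbalance
y_score = np.r_[rng.normal(1.0, 1.0, 50), rng.normal(0.0, 1.0, 5000)]

print("ROC AUC:", roc_auc_score(y_true, y_score))
precision, recall, _ = precision_recall_curve(y_true, y_score)
print("PR  AUC:", auc(recall, precision))                     # typically much lower here
```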
Florian Leitner <florian.leitner@upm.es> MSS/ASDM: Text Mining
Practical: Implement a
Shallow Parser and NER
Implement a Python class and command-line tool that will allow
you to do shallow parsing and NER on “standard” English text.
180
Richard Feynman, 1974
“The first principle is that you must not fool yourself
- and you are the easiest person to fool.”

More Related Content

What's hot

Natural Language Processing: L02 words
Natural Language Processing: L02 wordsNatural Language Processing: L02 words
Natural Language Processing: L02 words
ananth
 
Semantics and Computational Semantics
Semantics and Computational SemanticsSemantics and Computational Semantics
Semantics and Computational Semantics
Marina Santini
 
Lecture: Summarization
Lecture: SummarizationLecture: Summarization
Lecture: Summarization
Marina Santini
 
Language Models for Information Retrieval
Language Models for Information RetrievalLanguage Models for Information Retrieval
Language Models for Information Retrieval
Nik Spirin
 
A Panorama of Natural Language Processing
A Panorama of Natural Language ProcessingA Panorama of Natural Language Processing
A Panorama of Natural Language Processing
Ted Xiao
 
From NLP to text mining
From NLP to text mining From NLP to text mining
From NLP to text mining
Yi-Shin Chen
 
Introduction to natural language processing
Introduction to natural language processingIntroduction to natural language processing
Introduction to natural language processing
Minh Pham
 
Natural Language Processing in Practice
Natural Language Processing in PracticeNatural Language Processing in Practice
Natural Language Processing in Practice
Vsevolod Dyomkin
 
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
BERT: Pre-training of Deep Bidirectional Transformers for Language UnderstandingBERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Seonghyun Kim
 
Text Mining Analytics 101
Text Mining Analytics 101Text Mining Analytics 101
Text Mining Analytics 101
Manohar Swamynathan
 
Lecture: Word Senses
Lecture: Word SensesLecture: Word Senses
Lecture: Word Senses
Marina Santini
 
Text mining - from Bayes rule to dependency parsing
Text mining - from Bayes rule to dependency parsingText mining - from Bayes rule to dependency parsing
Text mining - from Bayes rule to dependency parsing
Florian Leitner
 
Relation Extraction
Relation ExtractionRelation Extraction
Relation Extraction
Marina Santini
 
Introduction to Natural Language Processing (NLP)
Introduction to Natural Language Processing (NLP)Introduction to Natural Language Processing (NLP)
Introduction to Natural Language Processing (NLP)
VenkateshMurugadas
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
Toine Bogers
 
Natural Language Processing: L01 introduction
Natural Language Processing: L01 introductionNatural Language Processing: L01 introduction
Natural Language Processing: L01 introduction
ananth
 
NLP
NLPNLP
DataFest 2017. Introduction to Natural Language Processing by Rudolf Eremyan
DataFest 2017. Introduction to Natural Language Processing by Rudolf EremyanDataFest 2017. Introduction to Natural Language Processing by Rudolf Eremyan
DataFest 2017. Introduction to Natural Language Processing by Rudolf Eremyan
rudolf eremyan
 
Intro to nlp
Intro to nlpIntro to nlp
Intro to nlp
Rutu Mulkar-Mehta
 
Natural Language Processing (NLP)
Natural Language Processing (NLP)Natural Language Processing (NLP)
Natural Language Processing (NLP)
Yuriy Guts
 

What's hot (20)

Natural Language Processing: L02 words
Natural Language Processing: L02 wordsNatural Language Processing: L02 words
Natural Language Processing: L02 words
 
Semantics and Computational Semantics
Semantics and Computational SemanticsSemantics and Computational Semantics
Semantics and Computational Semantics
 
Lecture: Summarization
Lecture: SummarizationLecture: Summarization
Lecture: Summarization
 
Language Models for Information Retrieval
Language Models for Information RetrievalLanguage Models for Information Retrieval
Language Models for Information Retrieval
 
A Panorama of Natural Language Processing
A Panorama of Natural Language ProcessingA Panorama of Natural Language Processing
A Panorama of Natural Language Processing
 
From NLP to text mining
From NLP to text mining From NLP to text mining
From NLP to text mining
 
Introduction to natural language processing
Introduction to natural language processingIntroduction to natural language processing
Introduction to natural language processing
 
Natural Language Processing in Practice
Natural Language Processing in PracticeNatural Language Processing in Practice
Natural Language Processing in Practice
 
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
BERT: Pre-training of Deep Bidirectional Transformers for Language UnderstandingBERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
 
Text Mining Analytics 101
Text Mining Analytics 101Text Mining Analytics 101
Text Mining Analytics 101
 
Lecture: Word Senses
Lecture: Word SensesLecture: Word Senses
Lecture: Word Senses
 
Text mining - from Bayes rule to dependency parsing
Text mining - from Bayes rule to dependency parsingText mining - from Bayes rule to dependency parsing
Text mining - from Bayes rule to dependency parsing
 
Relation Extraction
Relation ExtractionRelation Extraction
Relation Extraction
 
Introduction to Natural Language Processing (NLP)
Introduction to Natural Language Processing (NLP)Introduction to Natural Language Processing (NLP)
Introduction to Natural Language Processing (NLP)
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
 
Natural Language Processing: L01 introduction
Natural Language Processing: L01 introductionNatural Language Processing: L01 introduction
Natural Language Processing: L01 introduction
 
NLP
NLPNLP
NLP
 
DataFest 2017. Introduction to Natural Language Processing by Rudolf Eremyan
DataFest 2017. Introduction to Natural Language Processing by Rudolf EremyanDataFest 2017. Introduction to Natural Language Processing by Rudolf Eremyan
DataFest 2017. Introduction to Natural Language Processing by Rudolf Eremyan
 
Intro to nlp
Intro to nlpIntro to nlp
Intro to nlp
 
Natural Language Processing (NLP)
Natural Language Processing (NLP)Natural Language Processing (NLP)
Natural Language Processing (NLP)
 

Viewers also liked

Textmining Information Extraction
Textmining Information ExtractionTextmining Information Extraction
Textmining Information Extraction
guest0edcaf
 
Introduction to Text Mining
Introduction to Text MiningIntroduction to Text Mining
Introduction to Text Mining
Minha Hwang
 
Big Data & Text Mining
Big Data & Text MiningBig Data & Text Mining
Big Data & Text Mining
Michel Bruley
 
Textmining Introduction
Textmining IntroductionTextmining Introduction
Textmining Introduction
Datamining Tools
 
Information Extraction --- An one hour summary
Information Extraction --- An one hour summaryInformation Extraction --- An one hour summary
Information Extraction --- An one hour summary
Yunyao Li
 
Information Extraction
Information ExtractionInformation Extraction
Information Extraction
butest
 
Python + NoSQL in Animations
Python + NoSQL in AnimationsPython + NoSQL in Animations
Python + NoSQL in Animations
Shuen-Huei Guan
 
Combining Distributional Semantics and Entity Linking for Context-aware Conte...
Combining Distributional Semantics and Entity Linking for Context-aware Conte...Combining Distributional Semantics and Entity Linking for Context-aware Conte...
Combining Distributional Semantics and Entity Linking for Context-aware Conte...
Cataldo Musto
 
Entity linking meets Word Sense Disambiguation: a unified approach(TACL 2014)の紹介
Entity linking meets Word Sense Disambiguation: a unified approach(TACL 2014)の紹介Entity linking meets Word Sense Disambiguation: a unified approach(TACL 2014)の紹介
Entity linking meets Word Sense Disambiguation: a unified approach(TACL 2014)の紹介
Koji Matsuda
 
Information Extraction with UIMA - Usecases
Information Extraction with UIMA - UsecasesInformation Extraction with UIMA - Usecases
Information Extraction with UIMA - Usecases
Tommaso Teofili
 
Martin Voigt | Streaming-based Text Mining using Deep Learning and Semantics
Martin Voigt | Streaming-based Text Mining using Deep Learning and SemanticsMartin Voigt | Streaming-based Text Mining using Deep Learning and Semantics
Martin Voigt | Streaming-based Text Mining using Deep Learning and Semantics
semanticsconference
 
Data and Information Extraction on the Web
Data and Information Extraction on the WebData and Information Extraction on the Web
Data and Information Extraction on the Web
Tommaso Teofili
 
Text mining
Text miningText mining
Text mining
Ali A Jalil
 

Viewers also liked (13)

Textmining Information Extraction
Textmining Information ExtractionTextmining Information Extraction
Textmining Information Extraction
 
Introduction to Text Mining
Introduction to Text MiningIntroduction to Text Mining
Introduction to Text Mining
 
Big Data & Text Mining
Big Data & Text MiningBig Data & Text Mining
Big Data & Text Mining
 
Textmining Introduction
Textmining IntroductionTextmining Introduction
Textmining Introduction
 
Information Extraction --- An one hour summary
Information Extraction --- An one hour summaryInformation Extraction --- An one hour summary
Information Extraction --- An one hour summary
 
Information Extraction
Information ExtractionInformation Extraction
Information Extraction
 
Python + NoSQL in Animations
Python + NoSQL in AnimationsPython + NoSQL in Animations
Python + NoSQL in Animations
 
Combining Distributional Semantics and Entity Linking for Context-aware Conte...
Combining Distributional Semantics and Entity Linking for Context-aware Conte...Combining Distributional Semantics and Entity Linking for Context-aware Conte...
Combining Distributional Semantics and Entity Linking for Context-aware Conte...
 
Entity linking meets Word Sense Disambiguation: a unified approach(TACL 2014)の紹介
Entity linking meets Word Sense Disambiguation: a unified approach(TACL 2014)の紹介Entity linking meets Word Sense Disambiguation: a unified approach(TACL 2014)の紹介
Entity linking meets Word Sense Disambiguation: a unified approach(TACL 2014)の紹介
 
Information Extraction with UIMA - Usecases
Information Extraction with UIMA - UsecasesInformation Extraction with UIMA - Usecases
Information Extraction with UIMA - Usecases
 
Martin Voigt | Streaming-based Text Mining using Deep Learning and Semantics
Martin Voigt | Streaming-based Text Mining using Deep Learning and SemanticsMartin Voigt | Streaming-based Text Mining using Deep Learning and Semantics
Martin Voigt | Streaming-based Text Mining using Deep Learning and Semantics
 
Data and Information Extraction on the Web
Data and Information Extraction on the WebData and Information Extraction on the Web
Data and Information Extraction on the Web
 
Text mining
Text miningText mining
Text mining
 

Similar to OUTDATED Text Mining 5/5: Information Extraction

Some Information Retrieval Models and Our Experiments for TREC KBA
Some Information Retrieval Models and Our Experiments for TREC KBASome Information Retrieval Models and Our Experiments for TREC KBA
Some Information Retrieval Models and Our Experiments for TREC KBA
Patrice Bellot - Aix-Marseille Université / CNRS (LIS, INS2I)
 
TopicModels_BleiPaper_Summary.pptx
TopicModels_BleiPaper_Summary.pptxTopicModels_BleiPaper_Summary.pptx
TopicModels_BleiPaper_Summary.pptx
Kalpit Desai
 
Topic modelling
Topic modellingTopic modelling
Topic modelling
Shubhmay Potdar
 
Binary Similarity : Theory, Algorithms and Tool Evaluation
Binary Similarity :  Theory, Algorithms and  Tool EvaluationBinary Similarity :  Theory, Algorithms and  Tool Evaluation
Binary Similarity : Theory, Algorithms and Tool Evaluation
Liwei Ren任力偉
 
Chapter 6 Query Language .pdf
Chapter 6 Query Language .pdfChapter 6 Query Language .pdf
Chapter 6 Query Language .pdf
Habtamu100
 
Bio ontologies and semantic technologies
Bio ontologies and semantic technologiesBio ontologies and semantic technologies
Bio ontologies and semantic technologies
Prof. Wim Van Criekinge
 
Web and text
Web and textWeb and text
Chapter 10 Data Mining Techniques
 Chapter 10 Data Mining Techniques Chapter 10 Data Mining Techniques
Chapter 10 Data Mining Techniques
Houw Liong The
 
Copy of 10text (2)
Copy of 10text (2)Copy of 10text (2)
Copy of 10text (2)
Uma Se
 
Aidan's PhD Viva
Aidan's PhD VivaAidan's PhD Viva
Aidan's PhD Viva
Aidan Hogan
 
Adnan: Introduction to Natural Language Processing
Adnan: Introduction to Natural Language Processing Adnan: Introduction to Natural Language Processing
Adnan: Introduction to Natural Language Processing
Mustafa Jarrar
 
LSI latent (par HATOUM Saria et DONGO ESCALANTE Irvin Franco)
LSI latent (par HATOUM Saria et DONGO ESCALANTE Irvin Franco)LSI latent (par HATOUM Saria et DONGO ESCALANTE Irvin Franco)
LSI latent (par HATOUM Saria et DONGO ESCALANTE Irvin Franco)
rchbeir
 
haenelt.ppt
haenelt.ppthaenelt.ppt
haenelt.ppt
ssuser4293bd
 
2_text operationinformation retrieval. ppt
2_text operationinformation retrieval. ppt2_text operationinformation retrieval. ppt
2_text operationinformation retrieval. ppt
HayomeTakele
 
Text features
Text featuresText features
Text features
Shruti kar
 
Text Mining
Text MiningText Mining
Text Mining
sathish sak
 
Topic Extraction on Domain Ontology
Topic Extraction on Domain OntologyTopic Extraction on Domain Ontology
Topic Extraction on Domain Ontology
Keerti Bhogaraju
 
Lec12 semantic processing
Lec12 semantic processingLec12 semantic processing
Lec12 semantic processing
Manju Rajput
 
Artificial Intelligence Notes Unit 4
Artificial Intelligence Notes Unit 4Artificial Intelligence Notes Unit 4
Artificial Intelligence Notes Unit 4
DigiGurukul
 
Bytewise approximate matching, searching and clustering
Bytewise approximate matching, searching and clusteringBytewise approximate matching, searching and clustering
Bytewise approximate matching, searching and clustering
Liwei Ren任力偉
 

Similar to OUTDATED Text Mining 5/5: Information Extraction (20)

Some Information Retrieval Models and Our Experiments for TREC KBA
Some Information Retrieval Models and Our Experiments for TREC KBASome Information Retrieval Models and Our Experiments for TREC KBA
Some Information Retrieval Models and Our Experiments for TREC KBA
 
TopicModels_BleiPaper_Summary.pptx
TopicModels_BleiPaper_Summary.pptxTopicModels_BleiPaper_Summary.pptx
TopicModels_BleiPaper_Summary.pptx
 
Topic modelling
Topic modellingTopic modelling
Topic modelling
 
Binary Similarity : Theory, Algorithms and Tool Evaluation
Binary Similarity :  Theory, Algorithms and  Tool EvaluationBinary Similarity :  Theory, Algorithms and  Tool Evaluation
Binary Similarity : Theory, Algorithms and Tool Evaluation
 
Chapter 6 Query Language .pdf
Chapter 6 Query Language .pdfChapter 6 Query Language .pdf
Chapter 6 Query Language .pdf
 
Bio ontologies and semantic technologies
Bio ontologies and semantic technologiesBio ontologies and semantic technologies
Bio ontologies and semantic technologies
 
Web and text
Web and textWeb and text
Web and text
 
Chapter 10 Data Mining Techniques
 Chapter 10 Data Mining Techniques Chapter 10 Data Mining Techniques
Chapter 10 Data Mining Techniques
 
Copy of 10text (2)
Copy of 10text (2)Copy of 10text (2)
Copy of 10text (2)
 
Aidan's PhD Viva
Aidan's PhD VivaAidan's PhD Viva
Aidan's PhD Viva
 
Adnan: Introduction to Natural Language Processing
Adnan: Introduction to Natural Language Processing Adnan: Introduction to Natural Language Processing
Adnan: Introduction to Natural Language Processing
 
LSI latent (par HATOUM Saria et DONGO ESCALANTE Irvin Franco)
LSI latent (par HATOUM Saria et DONGO ESCALANTE Irvin Franco)LSI latent (par HATOUM Saria et DONGO ESCALANTE Irvin Franco)
LSI latent (par HATOUM Saria et DONGO ESCALANTE Irvin Franco)
 
haenelt.ppt
haenelt.ppthaenelt.ppt
haenelt.ppt
 
2_text operationinformation retrieval. ppt
2_text operationinformation retrieval. ppt2_text operationinformation retrieval. ppt
2_text operationinformation retrieval. ppt
 
Text features
Text featuresText features
Text features
 
Text Mining
Text MiningText Mining
Text Mining
 
Topic Extraction on Domain Ontology
Topic Extraction on Domain OntologyTopic Extraction on Domain Ontology
Topic Extraction on Domain Ontology
 
Lec12 semantic processing
Lec12 semantic processingLec12 semantic processing
Lec12 semantic processing
 
Artificial Intelligence Notes Unit 4
Artificial Intelligence Notes Unit 4Artificial Intelligence Notes Unit 4
Artificial Intelligence Notes Unit 4
 
Bytewise approximate matching, searching and clustering
Bytewise approximate matching, searching and clusteringBytewise approximate matching, searching and clustering
Bytewise approximate matching, searching and clustering
 

Recently uploaded

Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......
Sachin Paul
 
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data LakeViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
Walaa Eldin Moustafa
 
一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理
zsjl4mimo
 
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging DataPredictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Kiwi Creative
 
The Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series DatabaseThe Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series Database
javier ramirez
 
Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...
Bill641377
 
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdfUdemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
Fernanda Palhano
 
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
nuttdpt
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
Timothy Spann
 
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
Timothy Spann
 
End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024
Lars Albertsson
 
一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理
aqzctr7x
 
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Aggregage
 
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
74nqk8xf
 
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
nuttdpt
 
一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理
一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理
一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理
74nqk8xf
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
Timothy Spann
 
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
v7oacc3l
 
Global Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headedGlobal Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headed
vikram sood
 
Intelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicineIntelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicine
AndrzejJarynowski
 

Recently uploaded (20)

Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......
 
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data LakeViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
 
一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理
 
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging DataPredictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
 
The Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series DatabaseThe Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series Database
 
Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...
 
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdfUdemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
 
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
 
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
 
End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024
 
一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理
 
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
 
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
 
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
 
一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理
一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理
一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
 
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
 
Global Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headedGlobal Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headed
 
Intelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicineIntelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicine
 

OUTDATED Text Mining 5/5: Information Extraction

  • 1. Text Mining 5 Information Extraction ! Madrid Summer School 2014 Advanced Statistics and Data Mining ! Florian Leitner florian.leitner@upm.es for real, now! License:
  • 2. Florian Leitner <florian.leitner@upm.es> MSS/ASDM: Text Mining Incentive and Applications ➡ Statistical analyses of text sequences ! • Text Segmentation ‣ word & sentence boundaries (see Text Mining #3) • Part-of-Speech (PoS) Tagging & Chunking ‣ noun, verb, adjective, … & noun/verb/preposition/… phrases • Named Entity Recognition (NER) ‣ organizations, persons, places, genes, chemicals, … • Information Extraction ‣ locations/times, constants, formulas, entity linking, entity-relationships, … 144
  • 3. Florian Leitner <florian.leitner@upm.es> MSS/ASDM: Text Mining Probabilistic Graphical Models (PGM) • Graphs of hidden (blank) and observed (shaded) variables (vertices/nodes). • The edges depict dependencies, and if directed, show the causal relationship between nodes. 145 H O S3S2S1 directed ➡ Bayesian Network (BN) undirected ➡ Markov Random Field (MRF) mixed ➡ Mixture Models Koller & Friedman. Probabilistic Graphical Models. 2009
  • 4. Florian Leitner <florian.leitner@upm.es> MSS/ASDM: Text Mining Hidden vs. Observed State and Statistical Parameters Rao. A Kalman Filter Model of the Visual Cortex. Neural Computation 1997 146
  • 5. Florian Leitner <florian.leitner@upm.es> MSS/ASDM: Text Mining A Bayesian Network for the Monty Hall Problem 147 Choice 1 Choice 2 Goat Winner Car
  • 6. Florian Leitner <florian.leitner@upm.es> MSS/ASDM: Text Mining Markov Random Field 148 normalizing constant (partition function Z) factor (clique potential) A B C D θ(A, B) θ(B, C) θ(C, D) θ(D, A) a b 30 b c 100 c d 1 d a 100 a b 5 b c 1 c d 100 d a 1 a b 1 b c 1 c d 100 d a 1 a b 10 b c 100 c d 1 d a 100 P(a1, b1, c0, d1) = 10·1·100·100 ÷ 7’201’840 = 0.014 factor table factor graph P(X = ~x) = Q cl2~x cl(cl) P ~x2X Q cl2~x cl(cl) cl … [maximal] clique; a subset of nodes in the graph where every pair of nodes is connected History class: Ising developed a linear field to model (binary) atomic spin states (Ising, 1924); the 2-dim. model problem was solved by Onsager in 1944.
  • 7. Florian Leitner <florian.leitner@upm.es> MSS/ASDM: Text Mining A First Look At Probabilistic Graphical Models • Latent Dirichlet Allocation: LDA ‣ Blei, Ng, and Jordan. Journal of Machine Learning Research 2003 ‣ For assigning “topics” to “documents” 149 i.e., Text Classification; last lesson (4)
  • 8. Florian Leitner <florian.leitner@upm.es> MSS/ASDM: Text Mining Latent Dirichlet Allocation (LDA 1/3) • Intuition for LDA • From: Edwin Chen. Introduction to LDA. 2011 ‣ “Document Collection” • I like to eat broccoli and bananas. • I ate a banana and spinach smoothie for breakfast. • Chinchillas and kittens are cute. • My sister adopted a kitten yesterday. • Look at this cute hamster munching on a piece of broccoli. ! ! 150 ➡ Topic A ➡ Topic B ➡ Topic 0.6A + 0.4B Topic A: 30% broccoli, 15% bananas, 10% breakfast, 10% munching, … ! Topic B: 20% chinchillas, 20% kittens, 20% cute, 15% hamster, …
  • 9. Florian Leitner <florian.leitner@upm.es> MSS/ASDM: Text Mining The Dirichlet Process • A Dirichlet is a [possibly continuos] distribution over [discrete/multinomial] distributions (probability masses). ! ! ! • The Dirichlet Process samples multiple independent, discrete distributions θi with repetition from θ (“statistical clustering”). 151 D(✓, ↵) = ( P ↵i) Q (↵i) Y ✓↵i 1 i ∑ i = 1; a PMF Xnα N a Dirichlet Process is like drawing from an (infinite) ”bag of dice“ (with finite faces) Gamma function -> a “continuous” factorial [!] 1. Draw a new distribution X from D(θ, α) 2. With probability α ÷ (α + n - 1) draw a new X
 With probability n ÷ (α + n - 1), (re-)sample an Xi from X Dirichlet prior: ∀ i ∈ : i > 0
  • 10. Florian Leitner <florian.leitner@upm.es> MSS/ASDM: Text Mining The Dirichlet Prior α 152 bluered green Frigyik et al. Introduction to the Dirichlet Distribution and Related Processes. 2010 ⤳ equal, =1 ➡ uniform distribution ⤳ equal, <1 ➡ marginal distrib. (”choose few“) ⤳ equal, >1 ➡ symmetric, mono-modal distrib. ⤳ not equal, >1 ➡ non-symmetric distribution Documents and Topic distributions (N=3)
  • 11. Florian Leitner <florian.leitner@upm.es> MSS/ASDM: Text Mining Latent Dirichlet Allocation (LDA 2/3) • α - per-document Dirichlet prior • θd - topic distribution of document d • zd,n - word-topic assignments • wd,n - observed words • βk - word distrib. of topic k • η - per-topic Dirichlet prior 153 βkwd,nzd,nθdα η ND K Documents Words Topics P(B, ⇥, Z, W) = KY k P( k|⌘) ! DY d P(✓d|↵) NY n P(zd,n|✓d)P(wd,n| 1:K, zd,n) ! P(Topics) P(Document-T.) P(Word-T. | Document-T.) P(Word | Topics, Word-T.) Joint Probability Posterior: P(Topic | Word) P(B, , Z | W) = P(B, , Z, W) ÷ P(W) termscorek,n = ˆk,n log ˆk,n ⇣QK j ˆj,n ⌘1/K dampens the topic-specific score of terms assigned to many topics What Topics is a Word assigned to?
  • 12. Florian Leitner <florian.leitner@upm.es> MSS/ASDM: Text Mining Latent Dirichlet Allocation (LDA 3/3) • LDA inference in a nutshell ‣ Calculate the probability that Topic t generated Word w. ‣ Initialization: Choose K and randomly assign one out of the K Topics to each of the N Words in each of the D Documents. • The same word can have different Topics at different positions in the Document. ‣ Then, for each Topic:
 And for each Word in each Document: 1. Compute P(Word-Topic | Document): the proportion of [Words assigned to] Topic t in Document d 2. Compute P(Word | Topics, Word-Topic): the probability a Word w is assigned a Topic t (using the general distribution of Topics and the Document-specific distribution of [Word-] Topics) • Note that a Word can be assigned a different Topic each time it appears in a Document. 3. Given the prior probabilities of a Document’s Topics and that of Topics in general, reassign
 P(Topic | Word) = P(Word-Topic | Document) * P(Word | Topics, Word-Topic) ‣ Repeat until P(Topic | Word) stabilizes (e.g., MCMC Gibbs sampling, Course 04) 154 MCMC in Python: PyMC or PySTAN
  • 13. Florian Leitner <florian.leitner@upm.es> MSS/ASDM: Text Mining Retrospective We have seen how to… • Design generative models of natural language. • Segment, tokenize, and compare text and/or sentences. • [Multi-] labeling whole documents or chunks of text. ! Some open questions… • How to assign labels to individual tokens in a stream? ‣ without using a dictionary/gazetteer • How to detect semantic relationships between tokens? ‣ dependency & phrase structure parsing not in this class ! 155
  • 14. Florian Leitner <florian.leitner@upm.es> MSS/ASDM: Text Mining Probabilistic Models for Sequential Data • a.k.a. Temporal or Dynamic Bayesian Networks ‣ static process w/ constant model ➡ temporal process w/ dynamic model ‣ model structure and parameters are [still] constant ‣ the topology within a [constant] “time slice” is depicted • Markov Chain (MC; Markov. 1906) • Hidden Markov Model (HMM; Baum et al. 1970) • MaxEnt Markov Model (MEMM; McCallum et al. 2000) • [Markov] Conditional Random Field (CRF; Lafferty et al. 2001) ‣ Naming: all four models make the Markov assumption (Text Mining 2) 156 generativediscriminative
  • 15. Florian Leitner <florian.leitner@upm.es> MSS/ASDM: Text Mining From A Markov Chain to a Hidden Markov Model 157 WW’ SS’ W hidden states observed state S1S0 W1 S2 W2 S3 W3 “unrolled” (for 3 words) “time-slice” W depends on S and S in turn only depends on S’ P(W | W’) P(W | S) P(S | S’)
  • 16. Florian Leitner <florian.leitner@upm.es> MSS/ASDM: Text Mining A Language-Based Intuition for Hidden Markov Models • A Markov Chain: P(W) = ∏ P(w | w’) ‣ assumes the observed words are in and of themselves the cause of the observed sequence. • A Hidden Markov Model P(S, W) = ∏ P(s | s’) P(w | s) ‣ assumes the observed words are emitted by a hidden (not observable) sequence, for example the chain of part-of-speech-states. 158 S DT NN VB NN . The dog ran home ! again, this is the “unrolled” model that does not depict the conditional dependencies
  • 17. Florian Leitner <florian.leitner@upm.es> MSS/ASDM: Text Mining The two Matrices of a Hidden Markov Model 159 P(s | s’) DT NN VB … DT 0.03 0.7 0 NN 0 0 0.5 VB 0 0.5 0.2 … P(w | s) word word word … DT 0.3 0 0 NN 0.0001 0.002 0 VB 0 0 0.001 … SS’ W very sparse (W is large) ➡ Smoothing! Transition Matrix Observation Matrix underflow danger ➡ use “log Ps”! a.k.a. “CPDs”: Conditional Probability Distributions (as measured from annotated PoS corpora)
  • 18. Florian Leitner <florian.leitner@upm.es> MSS/ASDM: Text Mining Three Problems Solved by HMMs Evaluation: Given a HMM, infer the P of the observed sequence (because a HMM is a generative model). Solution: Forward Algorithm Decoding: Given a HMM and an observed sequence, predict the hidden states that lead to this observation. Solution: Viterbi Algorithm Training: Given only the graphical model and an observation sequence, learn the best [smoothed] parameters. Solution: Baum-Welch Algorithm 160 all three algorithms are are implemented using dynamic programming in Bioinformatics: Likelihood of a particular DNA element in Statistical NLP: PoS annotation
  • 19. Florian Leitner <florian.leitner@upm.es> MSS/ASDM: Text Mining Three Limitations of HMMs Markov assumption: The next state only depends on the current state. Example issue: trigrams Output assumption: The output (observed value) is independent of all previous outputs (given the current state). Example issue: word morphology Stationary assumption: Transition probabilities are independent of the actual time when they take place. Example issue: position in sentence 161 (label bias* problem!) (long-range dependencies!) (inflection, declension!) MEMMCRF“unsolved” (CRF)
  • 20. Florian Leitner <florian.leitner@upm.es> MSS/ASDM: Text Mining From Generative to Discriminative Markov Models 162 SS’ W S’ S W S’ S W Hidden Markov Model Maximum Entropy Markov Model Conditional Random
 Field this “clique” makes CRFs expensive to compute generative model; lower bias is beneficial for small training sets (linear chain version)(first oder version) P(S’, S, W) P(S | S’, W) NB boldface W: all words! (first oder version)
  • 21. Florian Leitner <florian.leitner@upm.es> MSS/ASDM: Text Mining Maximum Entropy (MaxEnt 3/3) REMINDER ‣ In summary, MaxEnt is about selecting the “maximal” model p*, i.e., some model that maximizes the conditional entropy:
$p^* = \operatorname{argmax}_{p \in P} \; -\sum_{x \in X} p(x) \sum_{y \in Y} p(y|x) \log_2 p(y|x)$
‣ That obeys the following conditional equality constraint (the conditional model must match the observed joint probabilities):
$\sum_{x \in X} P(x) \sum_{y \in Y} P(y|x) f(x, y) = \sum_{x \in X,\, y \in Y} P(x, y) f(x, y)$
‣ Using, e.g., Lagrange multipliers, one can establish the optimal λ weights of the model that maximize the entropy of this probability (the “Exponential Model”):
$p^*(y|x) = \frac{\exp(\sum_i \lambda_i f_i(x, y))}{\sum_{y \in Y} \exp(\sum_i \lambda_i f_i(x, y))}$ 163
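A small sketch of the exponential model above: given binary feature functions f_i(x, y) and weights λ_i, compute p*(y | x) as a softmax over feature scores. The feature functions and weight values here are invented examples, not learned parameters.

import math

def maxent_prob(x, labels, features, weights):
    """p*(y|x) = exp(sum_i lambda_i f_i(x,y)) / sum_y' exp(sum_i lambda_i f_i(x,y'))."""
    scores = {y: sum(w * f(x, y) for f, w in zip(features, weights))
              for y in labels}
    z = sum(math.exp(s) for s in scores.values())
    return {y: math.exp(s) / z for y, s in scores.items()}

# two toy feature functions for tagging a single word x with a PoS label y
features = [
    lambda x, y: 1.0 if x.endswith("ing") and y == "VB" else 0.0,
    lambda x, y: 1.0 if x[0].isupper() and y == "NN" else 0.0,
]
weights = [1.5, 2.0]  # hypothetical lambda values
print(maxent_prob("running", ["NN", "VB", "DT"], features, weights))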
  • 22. Florian Leitner <florian.leitner@upm.es> MSS/ASDM: Text Mining Maximum Entropy Markov Models (MEMM) Simplify the training of P(s | s’, w) by splitting the model into |S| separate transition functions Ps’(s | w), one for each s’ 164 P(S | S’, W)
$P_{s'}(s|w) = \frac{\exp(\sum_i \lambda_i f_i(w, s))}{\sum_{s^* \in S} \exp(\sum_i \lambda_i f_i(w, s^*))}$
P(s | s’, w) = Ps’(s | w); MaxEnt used {x, y}, which in the sequence model now are {w, s}
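A sketch of that factorization in use: one MaxEnt distribution Ps’(s | w) per previous state s’, applied greedily left to right. It reuses the `maxent_prob` helper and `features` list from the previous snippet; the per-previous-tag weight vectors are invented.

def memm_greedy_decode(words, states, features, weights_by_prev):
    prev, tags = "<s>", []
    for w in words:
        # P_s'(s | w): the MaxEnt model selected by the previous tag s'
        dist = maxent_prob(w, states, features, weights_by_prev[prev])
        prev = max(dist, key=dist.get)
        tags.append(prev)
    return tags

# hypothetical per-previous-tag weight vectors (same two features as above)
weights_by_prev = {"<s>": [1.0, 2.0], "NN": [2.0, 0.5],
                   "VB": [0.5, 1.0], "DT": [1.5, 1.5]}
print(memm_greedy_decode(["Running", "dogs"], ["NN", "VB", "DT"],
                         features, weights_by_prev))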
  • 23. Florian Leitner <florian.leitner@upm.es> MSS/ASDM: Text Mining The Label Bias Problem of Directional Markov Models 165 The robot wheels are round. ➡ DT NN NN VB JJ The robot wheels Fred around. ➡ DT NN VB NN RB The robot wheels were broken. ➡ DT NN ?? — — yet unseen! Wallach. Efficient Training of CRFs. MSc 2002 Because of their directionality constraints, MEMMs & HMMs suffer from the label bias problem. [MEMM and HMM graph diagrams repeated]
  • 24. Florian Leitner <florian.leitner@upm.es> MSS/ASDM: Text Mining Markov Random Field REMINDER 166 factor graph: a cycle of four nodes A–B–C–D–A; factors (clique potentials) θ(A, B), θ(B, C), θ(C, D), θ(D, A) with factor tables:
  θ(A, B): a⁰b⁰ = 30, a⁰b¹ = 5, a¹b⁰ = 1, a¹b¹ = 10
  θ(B, C): b⁰c⁰ = 100, b⁰c¹ = 1, b¹c⁰ = 1, b¹c¹ = 100
  θ(C, D): c⁰d⁰ = 1, c⁰d¹ = 100, c¹d⁰ = 100, c¹d¹ = 1
  θ(D, A): d⁰a⁰ = 100, d⁰a¹ = 1, d¹a⁰ = 1, d¹a¹ = 100
$P(X = \vec{x}) = \frac{\prod_{cl \in \vec{x}} \phi_{cl}(cl)}{\sum_{\vec{x} \in X} \prod_{cl \in \vec{x}} \phi_{cl}(cl)}$ (the denominator is the normalizing constant, a.k.a. partition function Z)
Example: P(a¹, b¹, c⁰, d¹) = 10·1·100·100 ÷ 7’201’840 = 0.014
cl … [maximal] clique; a subset of nodes in the graph where every pair of nodes is connected
  • 25. Florian Leitner <florian.leitner@upm.es> MSS/ASDM: Text Mining Conditional Random Field 167 Models a per-state exponential function of the joint probability over the entire observed sequence W = w1, …, wn. The label bias is “solved” by conditioning the MRF over Y’–Y on the (possibly) entire observed sequence (feature function dependent).
MRF: $P(X = \vec{x}) = \frac{\prod_{cl \in \vec{x}} \phi_{cl}(cl)}{\sum_{\vec{x} \in X} \prod_{cl \in \vec{x}} \phi_{cl}(cl)}$
CRF: $P(Y = \vec{y} \mid X = \vec{x}) = \frac{\prod_{y \in \vec{y}} \phi_{cl}(y', y, \vec{x})}{\sum_{\vec{y} \in Y} \prod_{y \in \vec{y}} \phi_{cl}(y', y, \vec{x})}$
Per-state ([max.] clique S’–S–W): $P(s \mid \mathbf{W}) = \frac{\exp(\sum_i \lambda_i f_i(\mathbf{W}, s', s))}{\sum_{s^* \in S} \exp(\sum_i \lambda_i f_i(\mathbf{W}, s'^*, s^*))}$ (note W, upper-case/bold, not w, lower-case; read it as “P(s, s’, W) ÷ P(W)”, where s’ went missing)
  • 26. Florian Leitner <florian.leitner@upm.es> MSS/ASDM: Text Mining Parameter Estimation and L2-Regularization of CRFs • For training: {Y(n), X(n)}, n = 1…N sequence pairs with K features • Parameter estimation using the conditional log-likelihood:
$\ell(\Lambda) = \operatorname{argmax}_{\lambda \in \Lambda} \sum_{n}^{N} \log P(Y^{(n)} \mid X^{(n)}; \lambda)$
• Substitute log P(Y(n) | X(n)) with the log exponential model:
$\ell(\lambda) = \sum_{n=1}^{N} \sum_{y \in Y^{(n)}} \sum_{i=1}^{K} \lambda_i f_i(y', y, X^{(n)}) - \sum_{n=1}^{N} \log Z(X^{(n)})$
• Add a penalty for parameters with a too high L2-norm (regularization reduces the effects of overfitting; σ is a free parameter):
$\ell(\lambda) = \sum_{n=1}^{N} \sum_{y \in Y^{(n)}} \sum_{i=1}^{K} \lambda_i f_i(y', y, X^{(n)}) - \sum_{n=1}^{N} \log Z(X^{(n)}) - \sum_{i=1}^{K} \frac{\lambda_i^2}{2\sigma^2}$ 168
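A hedged sketch of training a linear-chain CRF with the third-party sklearn-crfsuite package (assuming it is installed): its c2 hyper-parameter plays the role of the L2 penalty above (c1 would add an L1 term). Per-token features are plain dicts; the training sentence and its labels are invented toy data.

import sklearn_crfsuite

def token_features(sent, i):
    w = sent[i]
    return {
        "lower": w.lower(),
        "suffix3": w[-3:],
        "is_title": w.istitle(),
        "is_digit": w.isdigit(),
        "prev": sent[i - 1].lower() if i > 0 else "<s>",
    }

# toy training pair: one sentence with its PoS labels
X_train = [[token_features(["The", "dog", "ran", "home"], i) for i in range(4)]]
y_train = [["DT", "NN", "VB", "NN"]]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.0, c2=1.0, max_iterations=100)
crf.fit(X_train, y_train)
print(crf.predict(X_train))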
  • 27. Florian Leitner <florian.leitner@upm.es> MSS/ASDM: Text Mining Model Summary: HMM, MEMM, CRF • A HMM ‣ generative model ‣ efficient to learn and deploy ‣ trains with little data ‣ generalizes well (low bias) • A MEMM ‣ better labeling performance ‣ modeling of features ‣ label bias problem • A CRF ‣ conditioned on the entire observation ‣ complex features over the full input ‣ training time scales poorly: O(NTM²G), with N: # of sequence pairs; T: E[sequence length]; M: # of (hidden) states; G: # of gradient computations for parameter estimation 169 a PoS model w/ 45 states and 1 M words can take a week to train… Sutton & McCallum. An Introduction to CRFs for Relational Learning. 2006
  • 28. Florian Leitner <florian.leitner@upm.es> MSS/ASDM: Text Mining Example Applications of Sequence Models 170
  • 29. Florian Leitner <florian.leitner@upm.es> MSS/ASDM: Text Mining The Parts of Speech REMINDER • The Parts of Speech: ‣ noun: NN, verb: VB, adjective: JJ, adverb: RB, preposition: IN, personal pronoun: PRP, … ‣ e.g. the full Penn Treebank PoS tagset contains 48 tags: ‣ 34 grammatical tags (i.e., “real” parts-of-speech) for words ‣ one for cardinal numbers (“CD”; i.e., a series of digits) ‣ 13 for [mathematical] “SYM” and currency “$” symbols, various types of punctuation, as well as for opening/closing parentheses and quotes 171 “PoS tags” example: I ate the pizza with green peppers . ➡ PRP VB DT NN IN JJ NN .
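A quick way to reproduce the example tagging with NLTK's off-the-shelf Penn-Treebank tagger (assuming NLTK and its tokenizer/tagger models have been downloaded); the exact tags may differ slightly from the slide (e.g., “ate” comes out as the past-tense tag VBD).

import nltk  # requires nltk.download('punkt') and nltk.download('averaged_perceptron_tagger')

tokens = nltk.word_tokenize("I ate the pizza with green peppers.")
print(nltk.pos_tag(tokens))
# e.g. [('I', 'PRP'), ('ate', 'VBD'), ('the', 'DT'), ('pizza', 'NN'), ...]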
  • 30. Florian Leitner <florian.leitner@upm.es> MSS/ASDM: Text Mining The Parts of Speech • Corpora for the [supervised] training of PoS taggers ‣ Brown Corpus (AE from ~1961) ‣ British National Corpus: BNC (20th century British English) ‣ American National Corpus: ANC (AE from the 90s) ‣ Lancaster Corpus of Mandarin Chinese: LCMC (books in Mandarin) ‣ The GENIA corpus (biomedical abstracts from PubMed) ‣ NEGRA (German newswire from the 90s) ‣ Spanish and Arabic corpora should be (commercially…) available… ??? 172 I ate the pizza with green peppers . PRP VB DT NN IN JJ NN .
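The Brown corpus, for instance, ships with NLTK already PoS-tagged (assuming the corpus has been downloaded), so it can be used directly as supervised training data; note its tagset differs from the Penn Treebank one.

from nltk.corpus import brown  # requires nltk.download('brown') first

tagged = brown.tagged_sents()  # sentences as (word, tag) pairs, Brown tagset
print(len(tagged), tagged[0][:6])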
  • 31. Florian Leitner <florian.leitner@upm.es> MSS/ASDM: Text Mining Noun/Verb Phrase Chunking and BIO-Labels 173 The brown fox quickly jumps over the lazy dog . (a pangram; hint: check the letters) PoS tags: DT JJ NN RB VBZ IN DT JJ NN . BIO labels: B-N I-N I-N B-V I-V O B-N I-N I-N O Performance (2nd order CRF) ~ 94% Main problem: embedded & chained NPs (“N of N and N”) are error sources. Chunking is “more robust to the highly diverse corpus of text on the Web” and [exponentially] faster than parsing. Banko et al. Open Information Extraction from the Web. IJCAI 2007 (a paper with over 700 citations); Wermter et al. Recognizing noun phrases in biomedical text. SMBM 2005
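A rule-based baseline for NP chunking, sketched with NLTK's RegexpParser (assuming NLTK is installed); a statistical chunker would instead train a sequence model such as a CRF on the B/I/O labels shown above. Note the NLTK helper emits B-NP/I-NP rather than the slide's B-N/I-N spelling.

import nltk
from nltk.chunk import tree2conlltags

grammar = "NP: {<DT>?<JJ>*<NN.*>+}"  # determiner + adjectives + noun(s)
chunker = nltk.RegexpParser(grammar)
tagged = [("The", "DT"), ("brown", "JJ"), ("fox", "NN"),
          ("quickly", "RB"), ("jumps", "VBZ"), ("over", "IN"),
          ("the", "DT"), ("lazy", "JJ"), ("dog", "NN"), (".", ".")]
tree = chunker.parse(tagged)
print(tree2conlltags(tree))  # yields (word, PoS, BIO-chunk) triples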
  • 32. Florian Leitner <florian.leitner@upm.es> MSS/ASDM: Text Mining Word Sense Disambiguation (WSD) • Basic Example: hard [JJ] ‣ physically hard (a hard stone) ‣ difficult [task] (a hard task) ‣ strong/severe (a hard wind) ‣ dispassionate [personality] (a hard bargainer) • Entity Disambig.: bank [NN] ‣ finance (“bank account”) ‣ terrain (“river bank”) ‣ aeronautics (“went into bank”) ‣ grouping (“a bank of …”) ! • SensEval ‣ http://www.senseval.org/ ‣ SensEval/SemEval Challenges ‣ provides corpora where every word is tagged with its sense • WordNet ‣ http://wordnet.princeton.edu/ ‣ a labeled graph of word senses • Applications ‣ Named Entity Recognition ‣ Machine Translation ‣ Language Understanding 174 Note the PoS-tag “dependency”: otherwise, the two examples would have even more senses!
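The candidate senses a WSD system must choose between can be inspected through NLTK's WordNet interface (assuming the WordNet data has been downloaded); note how restricting the PoS to noun already prunes the sense inventory for “bank”.

from nltk.corpus import wordnet as wn  # requires nltk.download('wordnet')

for syn in wn.synsets("bank", pos=wn.NOUN)[:4]:
    print(syn.name(), "-", syn.definition())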
  • 33. Florian Leitner <florian.leitner@upm.es> MSS/ASDM: Text Mining Word Representations: Unsupervised WSD • Idea 1: Words with similar meaning have similar environments. • Use a word vector to count a word’s surrounding words. • Similar words now will have similar word vectors. ‣ see Text Mining 4, Cosine Similarity ‣ visualization: Principal Component Analysis • Idea 2: Words with similar meaning have similar environments. • Use the surroundings of unseen words to “smooth” language models (i.e., the correlation between word wi and its context cj). ‣ see Text Mining 4: TF-IDF weighting, Cosine Similarity and point-wise MI ‣ Levy & Goldberg. Linguistic Regularities in Sparse and Explicit Word Representations. CoNLL 2014 175 Levy & Goldberg took two words on each side to “beat” a neural network model with a four-word window! Turian et al. Word representations. ACL 2010
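A minimal word-vector sketch for Idea 1: count the words in a +/-2 token window around each target word, then compare targets by cosine similarity. The two example sentences are invented; real count vectors would come from a large corpus and typically be re-weighted (e.g., TF-IDF or PMI).

from collections import Counter, defaultdict
import math

def cooccurrence_vectors(sentences, window=2):
    vectors = defaultdict(Counter)
    for sent in sentences:
        for i, w in enumerate(sent):
            for j in range(max(0, i - window), min(len(sent), i + window + 1)):
                if j != i:
                    vectors[w][sent[j]] += 1
    return vectors

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

sents = [["the", "cat", "sat", "on", "the", "mat"],
         ["the", "dog", "sat", "on", "the", "rug"]]
vec = cooccurrence_vectors(sents)
print(cosine(vec["cat"], vec["dog"]))  # similar contexts -> high similarity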
  • 34. Florian Leitner <florian.leitner@upm.es> MSS/ASDM: Text Mining Named Entity Recognition (NER) 176 Entity classes: Date, Location, Money, Organization, Percentage, Person ➡ Conditional Random Field ➡ Ensemble Methods; +SVM, HMM, MEMM, … ➡ pyensemble NB these are corpus-based approaches (supervised). CoNLL03: http://www.cnts.ua.ac.be/conll2003/ner/ How much training data do I need? (see the “corpora list”) Image Source: v@s3k [a GATE session; http://vas3k.ru/blog/354/]
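Typical token features for such a supervised, CRF-based NER tagger (they could feed the sklearn-crfsuite sketch shown earlier): word shape, a small gazetteer lookup, and neighbouring tokens. The gazetteer, the example sentence, and its BIO gold labels are invented for illustration.

ORG_GAZETTEER = {"Georgia-Pacific", "BBDO", "Omnicom"}  # hypothetical lookup list

def ner_features(sent, i):
    w = sent[i]
    return {
        "lower": w.lower(),
        "shape": "".join("X" if c.isupper() else "x" if c.islower()
                         else "d" if c.isdigit() else c for c in w),
        "in_org_gazetteer": w in ORG_GAZETTEER,
        "prev": sent[i - 1].lower() if i > 0 else "<s>",
        "next": sent[i + 1].lower() if i + 1 < len(sent) else "</s>",
    }

sent = ["BBDO", "South", "in", "Atlanta", "handles", "corporate", "advertising"]
labels = ["B-ORG", "I-ORG", "O", "B-LOC", "O", "O", "O"]  # BIO-style gold labels
print(ner_features(sent, 0))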
  • 35. Florian Leitner <florian.leitner@upm.es> MSS/ASDM: Text Mining The Next Step: Relationship Extraction Given this piece of text: The fourth Wells account moving to another agency is the packaged paper-products division of Georgia-Pacific Corp., which arrived at Wells only last fall. Like Hertz and the History Channel, it is also leaving for an Omnicom-owned agency, the BBDO South unit of BBDO Worldwide. BBDO South in Atlanta, which handles corporate advertising for Georgia-Pacific, will assume additional duties for brands like Angel Soft toilet tissue and Sparkle paper towels, said Ken Haldin, a spokesman for Georgia-Pacific in Atlanta. Which organizations operate in Atlanta? (BBDO S., G-P) 177
  • 36. Florian Leitner <florian.leitner@upm.es> MSS/ASDM: Text Mining Four Approaches to Relationship Extraction • Co-mention window ‣ e.g.: if an ORG and a LOC entity occur within the same sentence with no more than x tokens in between, treat the pair as a hit. ‣ Low precision, high recall; trivial, many false positives. • Dependency Parsing ‣ if the shortest path connecting the two entities contains certain nodes (e.g. the preposition “in/IN”), extract the pair. ‣ Balanced precision and recall, computationally EXPENSIVE. • Pattern Extraction ‣ e.g.: <ORG>+ <IN> <LOC>+ (IN: preposition PoS tag) ‣ High precision, low recall; cumbersome, but very common ‣ pattern learning can help • Machine Learning ‣ features for sentences with entities (token-distance, # of tokens between the entities, tokens before/after them, etc.) and some classifier (e.g., SVM, MaxEnt, ANN, …) ‣ very variable mileage, but much more fun than anything else in the speaker’s opinion… 178
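A sketch of the pattern approach, matching <ORG>+ <IN> <LOC>+ over a token stream that already carries PoS and NER labels (assumed to come from the tagging and NER steps above); the example tokens are a hand-labeled fragment of the Georgia-Pacific text.

def org_in_loc_pairs(tokens):
    """tokens: list of (word, pos, ner) triples; matches <ORG>+ <IN> <LOC>+."""
    pairs = []
    for i, (w, pos, ner) in enumerate(tokens):
        if pos != "IN":
            continue
        # collect ORG tokens immediately left of the preposition ...
        org, j = [], i - 1
        while j >= 0 and tokens[j][2] == "ORG":
            org.insert(0, tokens[j][0]); j -= 1
        # ... and LOC tokens immediately right of it
        loc, k = [], i + 1
        while k < len(tokens) and tokens[k][2] == "LOC":
            loc.append(tokens[k][0]); k += 1
        if org and loc:
            pairs.append((" ".join(org), " ".join(loc)))
    return pairs

tokens = [("BBDO", "NNP", "ORG"), ("South", "NNP", "ORG"),
          ("in", "IN", "O"), ("Atlanta", "NNP", "LOC")]
print(org_in_loc_pairs(tokens))  # [('BBDO South', 'Atlanta')]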
  • 37. Florian Leitner <florian.leitner@upm.es> MSS/ASDM: Text Mining Advanced Evaluation Metrics: AUC ROC and PR 179 AUC: Area Under the Curve; ROC: Receiver Operating Characteristic; PR: Precision-Recall. TPR / Recall: TP ÷ (TP + FN); FPR: FP ÷ (FP + TN); Precision: TP ÷ (TP + FP). “an algorithm which optimizes the area under the ROC curve is not guaranteed to optimize the area under the PR curve” (Davis & Goadrich, 2006), i.e., ROC is affected by the same class-imbalance issues as accuracy (Lesson #4)! • Davis & Goadrich. The Relationship Between PR and ROC Curves. ICML 2006 • Landgrebe et al. Precision-recall operating characteristic (P-ROC) curves in imprecise environments. Pattern Recognition 2006 • Hanczar et al. Small-Sample Precision of ROC-Related Estimates. Bioinformatics 2010 Image Source: WikiMedia Commons, kku (“kakau”, eddie)
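Both summary numbers are easy to compute with scikit-learn (assuming it is installed); on imbalanced data like this toy example the two can tell rather different stories, which is the point the slide makes.

from sklearn.metrics import roc_auc_score, precision_recall_curve, auc

y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]                 # imbalanced toy labels
y_scores = [.1, .2, .15, .3, .35, .4, .2, .8, .7, .9]   # hypothetical classifier scores

print("ROC AUC:", roc_auc_score(y_true, y_scores))
precision, recall, _ = precision_recall_curve(y_true, y_scores)
print("PR AUC: ", auc(recall, precision))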
  • 38. Florian Leitner <florian.leitner@upm.es> MSS/ASDM: Text Mining Practical: Implement a Shallow Parser and NER Implement a Python class and command-line tool that will allow you to do shallow parsing and NER on “standard” English text. 180
  • 39. Richard Feynman, 1974 “The first principle is that you must not fool yourself - and you are the easiest person to fool.”