The Web is inundated with information in many different formats, including semi-structured and unstructured data. Machine Reading is a research area that aims to build systems that can read natural-language text, extracting knowledge and storing it in knowledge bases. Machine Reading systems are thus developed to produce language-understanding technology that automatically processes text in affordable time. This tutorial explores the idea of automatically reading the Web using Machine Reading techniques. Four of the most successful Machine Reading approaches intended to read the Web (namely the KnowItAll, YAGO, NELL and DBPedia systems) will be presented and discussed. The principles, the subtleties, and the current results of each approach will be addressed. Online resources from each approach will be explored, and future directions for each system will be pointed out. YAGO, KnowItAll, NELL and DBPedia are not the only research efforts focusing on reading the Web; they were selected for this tutorial because they represent four different and very relevant approaches to the problem, but that does not mean they are the only relevant ones. While mainly focusing on the four aforementioned systems, some other independent contributions to the Read the Web idea will be mentioned and pointed out as related work.
Deep Learning Models for Question Answering – Sujit Pal
Talk about a hobby project to apply Deep Learning models to predict answers to 8th grade science multiple choice questions for the Allen AI challenge on Kaggle.
Service Graphics: 2015 has seen our installation crews busy crossing the country doing great work. To showcase just some of the marquee projects delivered, our 2015 yearbook is hot off the press. Have a look and see what our specialist teams can do for you. #wechangespace
Stories About Renraku — the new Quality Model of Pharo (esug2016) – Yuriy Tymchuk
Earlier this year Pharo 5 was released with QualityAssistant on board. However the live quality feedback in the code browser is just the tip of the iceberg. The main value comes from Renraku — a quality model that was forged during the last two years based on the requirements of quality tools. One cannot simply “show” Renraku as it is just a meta-model with a set of handy functions. And I will never allow myself to bore audience by presenting dry specifications. Luckily I have enough stories that accumulated during the development to unveil Renraku by telling about the challenges and solutions that shaped Pharo’s quality model.
Stories About Renraku — the new Quality Model of Pharo – ESUG
Tue, August 23, 11:30am – 12:00pm
Youtube: https://youtu.be/K4rMfQ_bQuI
First Name: Yuriy
Last Name: Tymchuk
Email where you can always be reached: yuriy.tymchuk@me.com
Title: Pitfalls and New Horizons of QualityAssistant Journey.
Type: Talk
Abstract: Earlier this year Pharo 5 was released with QualityAssistant on board. However the live quality feedback in the code browser is just the tip of the iceberg. The main value comes from Renraku — a quality model that was forged during the last two years based on the requirements of quality tools. One cannot simply “show” Renraku as it is just a meta-model with a set of handy functions. And I will never allow myself to bore audience by presenting dry specifications. Luckily I have enough stories that accumulated during the development to unveil Renraku by telling about the challenges and solutions that shaped Pharo’s quality model.
Bio: I'm a Ph.D. student at the University of Bern in the Institute of Informatics. I am working under the supervision of Prof. Dr. Oscar Nierstrasz in
the Software Composition Group. My main topic is software quality, especially the tools that help developers deal with the quality of code and the rules
that work behind the scenes. For the last couple of years I was doing my Ph.D. studies in Lugano. In the past I worked as a network administrator at an ISP
and as a Java and Ruby developer in two software companies, and I ran a freelance web development team. Now I am a Pharo evangelist, and I also promote
collaboration with the outside world at Ukrainian universities, where I am originally from.
Presented by Mr. Dinesh KS
Software Developer, Livares Technologies
Introduction
Object detection is a computer technology related to computer vision and image processing that
deals with detecting instances of semantic objects of a certain class (such as humans, buildings, or
cars) in digital images and videos.
Face detection is a computer technology being used in a variety of applications that identifies
human faces in digital images.
Machine Learning Experimentation at Sift Science – Sift Science
Alex Paino, a Software Engineer at Sift Science, discusses how we use machine learning to prevent several types of abusive user behavior for thousands of customers. Measuring the accuracy of the thousands of classifiers used in a manner that correctly represents the value provided to customers is a huge challenge for us. Alex describes how we think about this problem and what we have done to address it. This includes an overview of the various tools and methodologies we employ that allow us to quickly summarize the results of an experiment, break ties in mixed result experiments, and drill into specific models and samples.
Denis Troyanov - Lalafo
How to Solve Product Classification Challenge for Marketplaces in the Wild
In pursuit of improved user experience, marketplaces are trying to solve open issues using modern approaches. Today, it seems that AI can easily solve a lot of narrow problems, but when it faces real-world questions, it turns out that those questions are much deeper than they seem at first glance. Since we are following current industry dynamics in AI, we started with image classification. We’ll tell you about our experience with the real-world problems associated with this task: how we went from a flat model to a ‘super-complex’ one, as well as what was successful and what wasn’t.
Machine Learning for DFIR with Velociraptor: From Setting Expectations to a Case Study – Chris Hammerschmidt
By Christian Hammerschmidt, PhD - Head of Engineering/ML, APTA Technologies
Machine learning (ML) or artificial intelligence (AI) often comes with great promise and large marketing budgets for cybersecurity, especially in monitoring (such as EDR/XDR solutions). Post-breach, it often turns out that the actual performance falls short of its promises.
In this talk, we’ll briefly look at ML for DFIR: What tasks can ML solve, generally speaking? What requirements do we have for a useful ML system in cybersecurity/DFIR contexts, such as reliability, robustness to attackers, and explainability? What makes ML difficult to apply in cybersecurity, e.g. when thinking about false alerts or attackers attempting to circumvent automated systems?
After discussing the basics, we look at ML for velociraptor:
How can we process forensic data collected with VQL using machine learning (with a typical Python/Jupyter/scikit-learn/PyTorch stack)?
And how can we build artifacts that run ML directly on each endpoint, avoiding central data collection?
The talk concludes with a case study, showing how we significantly reduced time to analyze EVTX files in incident response cases, saving thousands of USD in costs and reducing time to resolution.
Bio: Chris Hammerschmidt did his PhD research on machine learning methods for reverse engineering software systems. Now, he’s heading APTA Technologies, a start-up building machine learning tools to understand software behavior.
Affiliation: APTA Technologies, https://apta.tech
Hello, this is the Deep Learning Paper Reading Group.
The paper presented today, accepted at ICLR 2021, is
'The Deep Bootstrap Framework: Good Online Learners are Good Offline Generalizers'.
Today's review was prepared by Jaeyun Lee of the Fundamentals team.
Inquiries: tfkeras@kakao.com
The Art Of Performance Tuning - with presenter notes! – Jonathan Ross
A somewhat more verbose version of https://www.slideshare.net/JonathanRoss74/the-art-of-performance-tuning.
Presented at JavaOne 2017 [CON4027], this presentation takes a practical, hands-on look at Java performance tuning. It discusses methodology (spoiler: it’s the scientific method) and how to apply it to Java SE systems (on any budget). Exploring concrete examples with tools such as the Oracle Java Mission Control feature of Oracle Java SE Advanced, VisualVM, YourKit, and JMH, the presentation focuses on ways of measuring performance, how to interpret data, ways of eliminating bottlenecks, and even how to avoid future performance regressions.
Recommender Systems from A to Z – Model Training – Crossing Minds
This second meetup will be about training different models for our recommender system. We will review the simple models we can build as a baseline. After that, we will present the recommender system as an optimization problem and discuss different training losses. We will mention linear models and matrix factorization techniques. We will end the presentation with a simple introduction to non-linear models and deep learning.
Bridging the Gap: Machine Learning for Ubiquitous Computing -- Applied Machine Learning – Thomas Ploetz
Tutorial @Ubicomp 2015: Bridging the Gap -- Machine Learning for Ubiquitous Computing (applied machine learning session).
A tutorial on promises and pitfalls of Machine Learning for Ubicomp (and Human Computer Interaction). From Practitioners for Practitioners.
Presenter: Thomas Ploetz <tom.ploetz@gmail.com>
video recording of talks as they were held at Ubicomp:
https://youtu.be/LgnnlqOIXJc?list=PLh96aGaacSgXw0MyktFqmgijLHN-aQvdq
Scalable Learning Technologies for Big Data Mining – Gerard de Melo
These are slides of a tutorial by Gerard de Melo and Aparna Varde presented at the DASFAA 2015 conference.
As data expands into big data, enhanced or entirely novel data mining algorithms often become necessary. The real value of big data is often only exposed when we can adequately mine and learn from it. We provide an overview of new scalable techniques for knowledge discovery. Our focus is on the areas of cloud data mining and machine learning, semi-supervised processing, and deep learning. We also give practical advice for choosing among different methods and discuss open research problems and concerns.
Similar to Machine Reading the Web: beyond Named Entity Recognition and Relation Extraction (20)
Techniques to optimize the PageRank algorithm usually fall into two categories: one tries to reduce the work per iteration, and the other tries to reduce the number of iterations. These goals are often at odds with one another. Skipping computation on vertices which have already converged has the potential to save iteration time. Skipping in-identical vertices, which share the same in-links, helps reduce duplicate computations and thus could reduce iteration time. Road networks often have chains which can be short-circuited before PageRank computation to improve performance, since the final ranks of chain nodes can be easily calculated; this could reduce both the iteration time and the number of iterations. If a graph has no dangling nodes, the PageRank of each strongly connected component can be computed in topological order, which could reduce the iteration time and the number of iterations, and also enable multi-iteration concurrency in PageRank computation. The combination of all of the above methods is the STICD algorithm [sticd]. For dynamic graphs, unchanged components whose ranks are unaffected can be skipped altogether.
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Round table discussion of vector databases, unstructured data, ai, big data, real-time, robots and Milvus.
A lively discussion with NJ Gen AI Meetup Lead Prasad and Procure.FYI's Co-Founder.
Learn SQL from basic queries to advanced queries – manishkhaire30
Dive into the world of data analysis with our comprehensive guide on mastering SQL! This presentation offers a practical approach to learning SQL, focusing on real-world applications and hands-on practice. Whether you're a beginner or looking to sharpen your skills, this guide provides the tools you need to extract, analyze, and interpret data effectively.
Key Highlights:
Foundations of SQL: Understand the basics of SQL, including data retrieval, filtering, and aggregation.
Advanced Queries: Learn to craft complex queries to uncover deep insights from your data.
Data Trends and Patterns: Discover how to identify and interpret trends and patterns in your datasets.
Practical Examples: Follow step-by-step examples to apply SQL techniques in real-world scenarios.
Actionable Insights: Gain the skills to derive actionable insights that drive informed decision-making.
Join us on this journey to enhance your data analysis capabilities and unlock the full potential of SQL. Perfect for data enthusiasts, analysts, and anyone eager to harness the power of data!
#DataAnalysis #SQL #LearningSQL #DataInsights #DataScience #Analytics
Adjusting primitives for graph : SHORT REPORT / NOTES – Subhajit Sahu
Compressed Sparse Row (CSR) is an adjacency-list based graph representation used by graph algorithms such as PageRank.
Multiply with different modes (map)
1. Performance of sequential execution based vs OpenMP based vector multiply.
2. Comparing various launch configs for CUDA based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential execution based vs OpenMP based vector element sum.
2. Performance of memcpy vs in-place based CUDA based vector element sum.
3. Comparing various launch configs for CUDA based vector element sum (memcpy).
4. Comparing various launch configs for CUDA based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA based vector element sum (in-place).
The Building Blocks of QuestDB, a Time Series Database – javier ramirez
Talk Delivered at Valencia Codes Meetup 2024-06.
Traditionally, databases have treated timestamps just as another data type. However, when performing real-time analytics, timestamps should be first class citizens and we need rich time semantics to get the most out of our data. We also need to deal with ever growing datasets while keeping performant, which is as fun as it sounds.
It is no wonder time-series databases are now more popular than ever before. Join me in this session to learn about the internal architecture and building blocks of QuestDB, an open source time-series database designed for speed. We will also review a history of some of the changes we have gone over the past two years to deal with late and unordered data, non-blocking writes, read-replicas, or faster batch ingestion.
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23... – John Andrews
Title: Chatty Kathy: Enhancing Physical Activity Among Older Adults
Description:
Discover how Chatty Kathy, an innovative project developed at the UNC Bootcamp, aims to tackle the challenge of low physical activity among older adults. Our AI-driven solution uses peer interaction to boost and sustain exercise levels, significantly improving health outcomes. This presentation covers our problem statement, the rationale behind Chatty Kathy, synthetic data and persona creation, model performance metrics, a visual demonstration of the project, and potential future developments. Join us for an insightful Q&A session to explore the potential of this groundbreaking project.
Project Team: Jay Requarth, Jana Avery, John Andrews, Dr. Dick Davis II, Nee Buntoum, Nam Yeongjin & Mat Nicholas
Adjusting OpenMP PageRank : SHORT REPORT / NOTES – Subhajit Sahu
For massive graphs that fit in RAM, but not in GPU memory, it is possible to take advantage of a shared memory system with multiple CPUs, each with multiple cores, to accelerate PageRank computation. If the NUMA architecture of the system is properly taken into account with good vertex partitioning, the speedup can be significant. To take steps in this direction, experiments are conducted to implement PageRank in OpenMP using two different approaches, uniform and hybrid. The uniform approach runs all primitives required for PageRank in OpenMP mode (with multiple threads). On the other hand, the hybrid approach runs certain primitives in sequential mode (i.e., sumAt, multiply).
Machine Reading the Web: beyond Named Entity Recognition and Relation Extraction
1. Machine Reading the Web: Beyond Named Entity Recognition and Relation Extraction
Estevam R. Hruschka Jr., Federal University of São Carlos
2. Disclaimers
• Previous versions of this tutorial were presented at IBERAMIA2012 (http://iberamia2012.dsic.upv.es/tutorials/) and WWW2013 (http://www2013.org/program/machine-reading-the-web/). Also, a short version was presented at the ECMLPKDD2015 Summer School (http://www.ecmlpkdd2015.org/summer-school/ss-schedule).
• Feel free to e-mail me (estevam.hruschka@gmail.com) with questions about this tutorial or any feedback/suggestions/criticisms. Your feedback can help improve the quality of these slides, so it is very welcome.
• As with many tutorials’ slides, these slides were prepared to be presented and later studied. Thus, they are meant to be more self-contained than slides from a paper presentation.
3. Disclaimers
• Due to time constraints, I do not intend to cover all the algorithms and publications related to YAGO, KnowItAll, NELL and DBPedia. What I do intend, instead, is to give an overview of all four projects and of the main approach to “Read the Web” used in each project.
• YAGO, KnowItAll, NELL and DBPedia are not the only research efforts focusing on “Reading the Web”. They were selected to be presented in this tutorial because they represent four different and very relevant approaches to this problem, but that does not mean they are the best (or the only relevant) ones at all.
4. Outline
• Machine Learning
• Machine Reading
• Reading the Web
– YAGO
– KnowItAll
– NELL
– DBPedia
5. Outline
• Machine Learning
• Machine Reading
• Reading the Web
– YAGO
– KnowItAll
– NELL
– DBPedia
25. Outline
• Machine Learning
• Machine Reading
• Reading the Web
– DBPedia
– YAGO
– KnowItAll
– NELL
26. Machine Learning
• What is Machine Learning?
The field of Machine Learning seeks to answer the question “How can we build computer systems that automatically improve with experience, and what are the fundamental laws that govern all learning processes?” [Mitchell, 2006]
27. Machine Learning
• What is Machine Learning?
A machine learns with respect to a particular:
– task T
– performance metric P
– type of experience E
if the system reliably improves its performance P at task T, following experience E. [Mitchell, 1997]
28. Machine Learning
• Examples of Machine Learning approaches for different tasks (T), performance metrics (P) and experiences (E):
– data mining
– autonomous discovery
– database updating
– programming by example
– pattern recognition
62. Semi-supervised Learning (one simple anecdotal approach)
[Figure: scatter plot of labeled points (Series1, Series2) and unlabeled points, both axes 0–25.]
What model should be chosen?
63. Outline
• Machine Learning
• Machine Reading
• Reading the Web
– DBPedia
– YAGO
– KnowItAll
– NELL
64. Machine Reading
• “The autonomous understanding of text” [Etzioni et al., 2007]
• “One of the most important methods by which human beings learn is by reading” [Clark et al., 2007]; thus, why not build machines capable of learning by reading?
65. Machine Reading
• “The problem of deciding what was implied by a written text, of reading between the lines, is the problem of inference.” [Norvig, 2007]
• Typically, Machine Reading is different from Natural Language Processing alone
66. Machine Reading (this slide was adapted from [Hady et al., 2011])
It’s about the disappearance forty years ago of Harriet Vanger, a young scion of one of the wealthiest families in Sweden, and about her uncle, determined to know the truth about what he believes was her murder. Blomkvist visits Henrik Vanger at his estate on the tiny island of Hedeby. The old man draws Blomkvist in by promising solid evidence against Wennerström. Blomkvist agrees to spend a year writing the Vanger family history as a cover for the real assignment: the disappearance of Vanger's niece Harriet some 40 years earlier. Hedeby is home to several generations of Vangers, all part owners in Vanger Enterprises. Blomkvist becomes acquainted with the members of the extended Vanger family, most of whom resent his presence. He does, however, start a short lived affair with Cecilia, the niece of Henrik. After discovering that Salander has hacked into his computer, he persuades her to assist him with research. They eventually become lovers, but Blomkvist has trouble getting close to Lisbeth who treats virtually everyone she meets with hostility. Ultimately the two discover that Harriet's brother Martin, CEO of Vanger Industries, is secretly a serial killer. A 24-year-old computer hacker sporting an assortment of tattoos and body piercings supports herself by doing deep background investigations for Dragan Armansky, who, in turn, worries that Lisbeth Salander is “the perfect victim for anyone who wished her ill."
67. Machine Reading: the same passage, with a first pair of co-referent mentions linked (“same”).
68. Machine Reading: the same passage, with all co-referent entity mentions linked (“same”).
69. Machine Reading: the same passage, with relations extracted between the entities: uncleOf, owns, hires, headOf.
70. Machine Reading: the same passage, with further relations: affairWith, affairWith, enemyOf.
(Slides 67–70 were also adapted from [Hady et al., 2011].)
71. Machine Reading
• One important (initial) approach to machine reading is to extract facts from text and store them in a structured form.
• Facts can be seen as entities and their relations.
• An ontology is one of the most common representations for the extracted facts.
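As a minimal sketch of what such a structured store might look like, facts can be kept as (subject, relation, object) triples. The triples below reuse the relations from the running example; the store and its query helper are illustrative only, not any specific system's API.

```python
# Facts as (subject, relation, object) triples; entities and relations
# follow the tutorial's running example and are illustrative only.
facts = {
    ("Henrik Vanger", "uncleOf", "Harriet Vanger"),
    ("Henrik Vanger", "owns", "Vanger Enterprises"),
    ("Henrik Vanger", "hires", "Mikael Blomkvist"),
    ("Martin Vanger", "headOf", "Vanger Industries"),
}

def query(facts, subject=None, relation=None, obj=None):
    """Return every triple matching the (possibly partial) pattern;
    None acts as a wildcard."""
    return {(s, r, o) for (s, r, o) in facts
            if subject in (None, s) and relation in (None, r) and obj in (None, o)}
```

For example, `query(facts, subject="Henrik Vanger")` returns every stored fact about that entity.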
80. Machine Reading
• Named Entity Resolution/Recognition
• Relation Extraction
• Co-reference and Polysemy Resolution
• Relation Discovery
• Inference
• Knowledge Base
• Document/Sentence Understanding (Micro-Reading)
81. Machine Reading
• (The same list of subtasks, repeated from the previous slide.)
82. Machine Reading
• Named Entity Resolution/Recognition
– Semi-structured data: the “low-hanging fruit”
• Wikipedia infoboxes & categories
• HTML lists & tables, etc.
– Free text
• Hearst patterns; clustering by verbal phrases
• Natural-language processing
• Advanced patterns & iterative bootstrapping (“Dual Iterative Pattern Relation Extraction”)
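To make the free-text route concrete, here is a minimal, hypothetical Hearst-pattern extractor for the classic "X such as Y" pattern. The regex, the example sentence, and the simplifications (it captures only the first capitalized item after "such as") are my own, not from the tutorial.

```python
import re

# One Hearst pattern: "<class> such as <Instance>" suggests isA(Instance, class).
# This toy version captures only the first capitalized item after "such as";
# a real extractor would also split "Y, Z and W" conjunctions.
PATTERN = re.compile(r"([a-z]+) such as ([A-Z][A-Za-z]*)")

def hearst_pairs(text):
    """Return (instance, class) pairs found by the 'such as' pattern."""
    return [(inst, cls) for cls, inst in PATTERN.findall(text)]
```

Running it on "We visited cities such as Paris, Rome and Berlin." yields the hyponym pair ("Paris", "cities").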
83. Named Entity Recognition
• Named Entity Recognition [Nadeau & Sekine, 2007]
– The term “Named Entity” was coined for the Sixth Message Understanding Conference (MUC-6) (R. Grishman & Sundheim, 1996).
– An important sub-task of IE called “Named Entity Recognition and Classification (NERC)”.
84. Named Entity Recognition [Nadeau & Sekine, 2007]
• Recognize information units like names, including person, organization and location names, and numeric expressions including time, date, money and percent expressions.
• In Machine Reading, many other entities: product, kitchen item, sport, etc.
85. Named Entity Resolution
• Named Entity Resolution [Theobald & Weikum, 2012]
– Which individual entities belong to which classes?
• instanceOf (Surajit Chaudhuri, computer scientists)
• instanceOf (Barbara Liskov, computer scientists)
• instanceOf (Barbara Liskov, female humans), …
86. Named Entity Recognition
• Named Entity Recognition as a machine learning task.
– Supervised Learning: text → NLP tools (POS, parse trees) → feature extraction → classifier
87. Named Entity Recognition
• Named Entity Recognition as a Machine Learning task.
– Supervised Learning
– Possible features [Ratinov & Roth, 2009], [Khambhatla, 2004], [Zhou et al., 2005]
• Words “around” and including entities
• POS (Part-Of-Speech)
• Prefixes and suffixes
• Capitalization
• Number of words
• Number of characters
• First word, last word
• Gazetteer matches
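As a rough sketch of how such features might be computed, the snippet below derives a handful of them for one token; the function name, context window, and exact feature set are illustrative assumptions rather than anything prescribed in the tutorial.

```python
def token_features(tokens, i):
    """Simple NER-style features for tokens[i] (illustrative subset)."""
    w = tokens[i]
    return {
        "word": w.lower(),
        "prefix3": w[:3],                    # prefixes and suffixes
        "suffix3": w[-3:],
        "is_capitalized": w[:1].isupper(),   # capitalization
        "num_chars": len(w),                 # number of characters
        # words "around" the entity
        "prev_word": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next_word": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }

feats = token_features(["Barbara", "Liskov", "won", "the", "Turing", "Award"], 1)
```

A real system would feed dictionaries like this into a sequence classifier; gazetteer matches and POS tags would come from external resources and NLP tools.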
88. Named Entity Recognition
• Supervised Learning: text → NLP tools (POS, parse trees) → feature extraction → classifier
89. Named Entity Recognition
• Supervised Learning: text → NLP tools (POS, parse trees) → feature extraction → classifier (with kernels)
90. Named Entity Recognition [Bach & Badaskar, 2007]
• Supervised Learning using Kernels
– A kernel defines similarity implicitly in a higher-dimensional space
– Can be based on strings, word sequences, parse trees, etc.
• For strings, similarity ∝ number of common substrings (or subsequences)
• Recommended reading on string kernels: [Lodhi et al., 2002]
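A toy version of the string-kernel idea (similarity proportional to the number of common substrings) can be written directly; this is a deliberately simplified sketch, not the subsequence kernel of [Lodhi et al., 2002]:

```python
from collections import Counter

def substrings(s, max_len=3):
    """All contiguous substrings of s up to max_len characters, with counts."""
    return Counter(s[i:i + k]
                   for k in range(1, max_len + 1)
                   for i in range(len(s) - k + 1))

def string_kernel(a, b, max_len=3):
    """Similarity as the number of shared substrings (a toy string kernel)."""
    ca, cb = substrings(a, max_len), substrings(b, max_len)
    return sum(min(ca[s], cb[s]) for s in ca if s in cb)

# Related strings share many substrings; unrelated ones share few or none.
sim_related = string_kernel("microsoft", "microsystems")
sim_unrelated = string_kernel("microsoft", "banana")
```

The point of the kernel trick is that a classifier such as an SVM only needs these pairwise similarities, never an explicit feature vector over all substrings.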
92–99. Named Entity Recognition
• Semi-supervised Approaches
– Bootstrapping can generate a large number of patterns and NE instances.
– The loop, built up across these slides: a set of labeled pattern examples (e.g., “X is headquartered in”, “is the CEO of X”) feeds an NE instances classifier, which produces a set of labeled instances (e.g., Google, Apple); an NE pattern classifier then learns new patterns from those instances, closing the loop.
• What about unsupervised?
100. Named Entity Recognition
• Unsupervised Approaches
– The same bootstrap loop, with the pattern and instances classifiers generating patterns and NE instances.
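The bootstrap loop on these slides can be illustrated with a toy implementation; the corpus, the seed pattern, and the hard-coded pattern-learning step are all illustrative assumptions:

```python
import re

# Toy bootstrap over a tiny corpus.
corpus = [
    "Google is headquartered in Mountain View.",
    "Apple is headquartered in Cupertino.",
    "Tim Cook is the CEO of Apple.",
]

patterns = {r"(\w+) is headquartered in"}  # labeled pattern examples (seed)
instances = set()                           # NE instances found so far

for _ in range(2):  # two bootstrap rounds
    # Patterns -> instances: apply every known pattern to the corpus.
    for pat in patterns:
        for sent in corpus:
            instances.update(re.findall(pat, sent))
    # Instances -> patterns: learn a new pattern from a context in which a
    # known instance occurs (here a single hard-coded "CEO of" context).
    for inst in instances:
        for sent in corpus:
            if re.search(r"is the CEO of " + re.escape(inst), sent):
                patterns.add(r"is the CEO of (\w+)")
```

After two rounds the instance set contains Google and Apple, and the pattern set has grown from one seed to two patterns, exactly the alternation the diagram depicts.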
101. Named Entity Recognition
• [Ratinov & Roth, 2009]
103. Machine Reading
• Named Entity Resolution/Extraction
• Relation Extraction
• Co-reference and Polysemy Resolution
• Relation Discovery
• Inference
• Knowledge Base Representation
• Document/Sentence Understanding (Micro-Reading)
104. Machine Reading
• Relation Extraction
– Semi-structured data: the “low-hanging fruit”
• Wikipedia infoboxes & categories
• HTML lists & tables, etc.
– Free text
• Hearst patterns; clustering by verbal phrases
• Natural-language processing
• Advanced patterns & iterative bootstrapping (“Dual Iterative Pattern Relation Extraction”)
105. Machine Reading
• Relation Extraction [Theobald & Weikum, 2012]
– Which instances (pairs of individual entities) are there for given binary relations with specific type signatures?
• hasAdvisor (JimGray, MikeHarrison)
• hasAdvisor (HectorGarcia-Molina, Gio Wiederhold)
• hasAdvisor (Susan Davidson, Hector Garcia-Molina)
• graduatedAt (JimGray, Berkeley)
• graduatedAt (HectorGarcia-Molina, Stanford)
• hasWonPrize (JimGray, TuringAward)
• bornOn (JohnLennon, 9Oct1940)
• diedOn (JohnLennon, 8Dec1980)
• marriedTo (JohnLennon, YokoOno)
106. Relation Extraction [Bach & Badaskar, 2007]
• Extracting semantic relations between entities in text
• Relation extraction as a Machine Learning task.
– Supervised Learning: text → NLP tools (POS, parse trees) → feature extraction → classifier
107. Relation Extraction [Bach & Badaskar, 2007]
• Relation extraction as a Machine Learning task.
– Supervised Learning
– Possible features [Khambhatla, 2004], [Zhou et al., 2005]
• Words between and including entities
• Types of entities (person, location, etc.)
• Number of entities between the two entities; whether both entities belong to the same chunk
• Number of words separating the two entities
• Path between the two entities in a parse tree
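A minimal sketch of a few of these features for one candidate entity pair; the function, the token positions, and the type dictionary are illustrative assumptions (parse-tree paths would require an actual parser):

```python
def relation_features(tokens, i, j, entity_types):
    """Surface features for a candidate entity pair at positions i < j
    (an illustrative subset of the feature list above)."""
    between = tokens[i + 1:j]
    return {
        "words_between": " ".join(between).lower(),   # words between entities
        "num_words_between": len(between),            # separating distance
        "type_pair": (entity_types[i], entity_types[j]),  # entity types
    }

tokens = ["Google", "is", "headquartered", "in", "Mountain", "View"]
types = {0: "ORG", 4: "LOC"}
feats = relation_features(tokens, 0, 4, types)
```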
108. Relation Extraction [Bach & Badaskar, 2007]
• Extracting semantic relations between entities in text
• Relation extraction as a classification task.
– Supervised Learning: text → NLP tools (POS, parse trees, NER) → feature extraction → classifier
109. Relation Extraction [Bach & Badaskar, 2007]
• Extracting semantic relations between entities in text
• Relation extraction as a classification task.
– Supervised Learning: text → NLP tools (POS, parse trees, NER) → feature extraction → classifier (with kernels)
110. Relation Extraction [Bach & Badaskar, 2007]
• Supervised Learning using Kernels
– A kernel defines similarity implicitly in a higher-dimensional space
– Can be based on strings, word sequences, parse trees, etc.
• For strings, similarity ∝ number of common substrings (or subsequences)
• Recommended reading on string kernels: [Lodhi et al., 2002]
112–119. Relation Extraction
• Semi-supervised Approaches
– Bootstrapping can generate a large number of patterns and relation instances.
– The loop, built up across these slides: a set of labeled pattern examples (e.g., “X is headquartered in Y”, “Y is the headquarters of X”) feeds a pair-of-instances classifier, which produces a set of labeled pairs of instances (e.g., Google–Mountain View, Apple–Cupertino); a pattern classifier then learns new patterns from those pairs, closing the loop.
• What about unsupervised?
120. Relation Extraction
• Unsupervised Approaches
– The same bootstrap loop, with the pattern classifier and pair-of-instances classifier generating patterns and relation instances.
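The pattern-to-instance half of this loop can be sketched with two seed patterns mirroring the slides' examples; the corpus and the regular expressions are illustrative assumptions:

```python
import re

# Two seed patterns for the headquarteredIn relation, as named groups.
patterns = [
    r"(?P<X>\w+) is headquartered in (?P<Y>[\w ]+)\.",
    r"(?P<Y>[\w ]+) is the headquarters of (?P<X>\w+)\.",
]

corpus = [
    "Google is headquartered in Mountain View.",
    "Cupertino is the headquarters of Apple.",
]

pairs = set()  # set of labeled pairs of instances
for pat in patterns:
    for sent in corpus:
        m = re.search(pat, sent)
        if m:
            pairs.add((m.group("X"), m.group("Y")))
```

The complementary half, inducing new patterns from the contexts in which known pairs co-occur, would then feed the loop shown on the slides.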
122. Machine Reading
• Named Entity Resolution/Extraction
• Relation Extraction
• Co-reference and Polysemy Resolution
• Relation Discovery
• Inference
• Knowledge Base Representation
• Document/Sentence Understanding (Micro-Reading)
123–125. Co-Reference and Polysemy Resolution
• Co-reference: expressions that refer to the same entity (within-document co-reference)
Example (figure) taken from: http://nlp.stanford.edu/projects/coref.shtml
126–128. Co-Reference and Polysemy Resolution
• Co-reference: expressions that refer to the same entity
– e.g., the mentions “apple” and “apple computer” in different documents both refer to the entity Apple Computer (cross-document co-reference)
Example (figure) adapted from [Krishnamurthy & Mitchell, 2011]
129. Co-Reference and Polysemy Resolution
• Co-reference: expressions that refer to the same entity
• Which names denote which entities? [Theobald & Weikum, 2012]
– means (“Lady Di”, Diana Spencer)
– means (“Diana Frances Mountbatten-Windsor”, Diana Spencer), …
– means (“Madonna”, Madonna Louise Ciccone)
– means (“Madonna”, Madonna (painting by Edvard Munch)), …
(cross-document co-reference)
130. Co-Reference and Polysemy Resolution
• Polysemy: the capacity for a sign (such as a word, phrase, or symbol) to have multiple meanings [Wikipedia]
131. Co-Reference and Polysemy Resolution
• Polysemy: the capacity for a sign (such as a word, phrase, or symbol) to have multiple meanings [Wikipedia]
– e.g., the mention “apple” may refer to apple (the fruit) or to Apple Computer
Example (figure) adapted from [Krishnamurthy & Mitchell, 2011]
132. Co-Reference and Polysemy Resolution
• Co-reference and polysemy together
– the mentions “apple” and “apple computer” may refer to apple (the fruit) or to Apple Computer
Example (figure) adapted from [Krishnamurthy & Mitchell, 2011]
133. Co-Reference and Polysemy Resolution
• Co-reference and Polysemy:
– Supervised Learning: text → NLP tools (POS, parse trees) → feature extraction → classifier
134–135. Co-Reference and Polysemy Resolution
• Co-Reference Resolution.
– Supervised Learning
– Possible features [Bengtson & Roth, 2008]
136. Co-Reference and Polysemy Resolution
• Co-reference and Polysemy:
– Supervised Learning: text → NLP tools (POS, parse trees) → feature extraction → classifier (with kernels)
137. Co-Reference and Polysemy Resolution
• Supervised Learning using Kernels
– A kernel defines similarity implicitly in a higher-dimensional space
– Can be based on strings, word sequences, parse trees, etc.
• For strings, similarity ∝ number of common substrings (or subsequences)
• Recommended reading on string kernels: [Lodhi et al., 2002]
138–146. Co-Reference and Polysemy Resolution
• Semi-supervised Approaches
– Bootstrapping can generate a large number of patterns and relation instances.
– The loop, built up across these slides: a set of labeled pattern examples (e.g., “X also known as Y”) feeds a pair-of-instances classifier, which produces a set of labeled pairs of instances (e.g., Apple Computer – Apple); a pattern classifier then learns new patterns from those pairs, closing the loop.
• What about unsupervised?
147. Co-Reference and Polysemy Resolution
• Co-Reference Resolution: [Singh et al., 2011], [Krishnamurthy & Mitchell, 2011], [Dutta & Weikum, 2015]
• Polysemy Resolution: [Krishnamurthy & Mitchell, 2011], [Galárraga et al., 2014]
148. Machine Reading
• Named Entity Resolution/Extraction
• Relation Extraction
• Co-reference and Synonym Resolution
• Relation Discovery
• Inference
• Knowledge Base Representation
• Document/Sentence Understanding (Micro-Reading)
149–152. Machine Reading
• Relation Discovery
– Which new relations are there for a given pair of entities?
• hasAdvisor (JimGray, MikeHarrison)
• hasCoAuthor (HectorGarcia-Molina, Gio Wiederhold)
• graduatedAt (JimGray, Berkeley)
• studiedAt (HectorGarcia-Molina, Stanford)
• bornOn (JohnLennon, 9Oct1940)
• releasedAlbum (JohnLennon, 10Dec1965)
153. Relation Discovery
• Diagram: a clustering algorithm operates over the set of labeled pairs of instances and the set of labeled pattern examples.
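As a deliberately degenerate stand-in for that clustering step, the sketch below groups entity pairs that occur with an identical connecting phrase; each group then suggests a candidate relation. A real system would cluster similar but non-identical contexts, and the data here is an illustrative assumption:

```python
from collections import defaultdict

# (entity pair, connecting context) observations, as a toy corpus might yield.
pair_contexts = [
    (("JimGray", "Berkeley"), "graduated at"),
    (("HectorGarcia-Molina", "Stanford"), "graduated at"),
    (("JohnLennon", "9Oct1940"), "born on"),
]

clusters = defaultdict(list)
for pair, context in pair_contexts:
    clusters[context].append(pair)  # cluster key = shared context phrase
```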
154. Machine Reading
• Named Entity Resolution/Extraction
• Relation Extraction
• Co-reference and Synonym Resolution
• Relation Discovery
• Inference
• Knowledge Base Representation
• Document/Sentence Understanding (Micro-Reading)
155. Inference
• Inference is the act or process of deriving logical conclusions from premises known or assumed to be true [Wikipedia]
156. Inference
• Manually crafted inference rules
• Automatically learned inference rules
• Data mining the Knowledge Base
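A manually crafted rule can be illustrated with a tiny forward chainer over subject-predicate-object triples; the facts and the single symmetry rule (marriedTo(X, Y) implies marriedTo(Y, X)) are illustrative assumptions:

```python
# A tiny triple store.
facts = {
    ("JohnLennon", "marriedTo", "YokoOno"),
    ("JohnLennon", "bornOn", "9Oct1940"),
}

def forward_chain(kb):
    """Apply the marriedTo symmetry rule until no new facts appear."""
    derived = set(kb)
    changed = True
    while changed:
        changed = False
        for s, p, o in list(derived):
            if p == "marriedTo" and (o, p, s) not in derived:
                derived.add((o, p, s))
                changed = True
    return derived

kb = forward_chain(facts)
```

Real systems chain many such Horn clause rules, and the fixed-point loop above is the essence of forward chaining: keep firing rules until the knowledge base stops growing.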
157. Machine Reading
• Named Entity Resolution/Extraction
• Relation Extraction
• Co-reference and Synonym Resolution
• Relation Discovery
• Inference
• Knowledge Base Representation
• Document/Sentence Understanding (Micro-Reading)
162–163. Document/Sentence Understanding (MicroRead)
• “The scientist observed the butterfly with the blue circle”
• “The scientist observed the butterfly with the blue microscope”
164. Outline
• Machine Learning
• Machine Reading
• Reading the Web
– DBPedia
– YAGO
– KnowItAll
– NELL
168. DBPedia
• Mapping Wikipedia semi-structured data into RDF triples
• Semi-structured data: the “low-hanging fruit”
169. DBPedia
• How to read Wikipedia semi-structured data? [Lehmann et al., 2014]
– Parse the Wikipedia markup language
– Overcome the lack-of-standards problem
• The same property might have different names
• “Datebirth” and “Birth_date”
• “Birthplace” and “Birth_place”
– Instead of “modeling the world”, try to structure the available information
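That normalization step can be sketched as a mapping table applied to an infobox; the mapping, canonical property names, and infobox contents below are illustrative assumptions, not DBPedia's actual extraction framework:

```python
# Map inconsistently named infobox attributes to canonical properties.
ATTRIBUTE_MAP = {
    "datebirth": "birthDate",
    "birth_date": "birthDate",
    "birthplace": "birthPlace",
    "birth_place": "birthPlace",
}

def infobox_to_triples(subject, infobox):
    """Emit (subject, predicate, object) triples from an infobox dict."""
    triples = []
    for attr, value in infobox.items():
        prop = ATTRIBUTE_MAP.get(attr.lower())
        if prop:  # skip attributes with no known mapping
            triples.append((subject, prop, value))
    return triples

triples = infobox_to_triples("Leonard_Cohen",
                             {"Birth_date": "1934-09-21",
                              "Birthplace": "Montreal"})
```

The design choice matches the slide: rather than modeling the world up front, structure whatever attribute-value data the articles already provide.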
174. YAGO
• Yet Another Great Ontology (YAGO)
• Main goal: building a conveniently searchable, large-scale, highly accurate knowledge base of common facts in a machine-processable representation
176. YAGO
• Turn the Web into a Knowledge Base [Weikum et al., 2009]
– Building a comprehensive knowledge base of human knowledge
– Knowledge from Wikipedia and WordNet
– The ontology checks itself for precision
177. YAGO
• The knowledge base is automatically constructed from Wikipedia
• Each article in Wikipedia becomes an entity in the KB (e.g., since Leonard Cohen has an article in Wikipedia, LeonardCohen becomes an entity in YAGO).
185. YAGO
• Certain categories are exploited to deliver type information (e.g., the article about Leonard Cohen is in the category Canadian male poets, so he becomes a Canadian poet).
188. YAGO
• For each category of a page [Hoffart et al., 2012]
– Using shallow parsing, determine the head word of the category name. In the example of Canadian poets, the head word is poets.
– If the head word is in plural, then propose the category as a class and the article entity as an instance
– Link the class to the WordNet taxonomy (most frequent sense of the head word in WordNet)
• Only countable nouns can appear in plural form
• Only countable nouns can be ontological classes
• Thematic categories (such as Canadian poetry) are different from conceptual categories
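The head-word heuristic can be caricatured in a few lines; taking the last word as the head and testing for a trailing "s" are crude simplifications of the shallow parsing described above, used only to make the idea concrete:

```python
def propose_class(entity, category):
    """Propose (class, instance) when the category head word is plural."""
    head = category.split()[-1]   # assume the head word is the last word
    if head.endswith("s"):        # crude plural test (an assumption)
        return {"class": category, "instance": entity}
    return None                   # thematic category, not conceptual

proposal = propose_class("LeonardCohen", "Canadian male poets")
rejected = propose_class("LeonardCohen", "Canadian poetry")
```

In YAGO proper, the accepted head word is additionally disambiguated against WordNet (usually to its most frequent sense), and the exception lists on the next slide handle cases like "stubs" and "capital".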
189. YAGO
• Head words that are not conceptual even though they appear in plural (such as stubs in Canadian poetry stubs) are in the first list of exceptions.
• Words that do not map to their most frequent sense, but to a different sense, are in the second exception list
– The word capital, e.g., refers to the main city of a country in the majority of cases and not to the financial amount, which is the most frequent sense in WordNet.
190. YAGO
• About 100 manually defined relations
– wasBornOnDate
– locatedIn
– hasPopulation
• Categories and infoboxes are exploited to deliver facts (instances of relations).
• Manually defined patterns map categories and infobox attributes to fact templates
– infobox attribute born=Montreal, thus wasBornIn(LeonardCohen, Montreal)
• Pattern-based extractions resulted in 2 million extracted entities and 20 million facts
191. YAGO
• Based on declarative rules (stored in text files)
• The rules take the form of subject-predicate-object triples, so they are basically additional facts
• There are different types of rules
192. YAGO
• Factual rules: definition of all relations, their domains and ranges, and the definition of the classes that make up the YAGO hierarchy of literal types.
• Implication rules: express that if certain facts appear in the knowledge base, then another fact shall be added. Horn clause rules.
• Replacement rules: for interpreting micro-formats, cleaning up HTML tags, and normalizing numbers.
• Extraction rules: apply primarily to patterns found in the Wikipedia infoboxes, but also to Wikipedia categories, article titles, and even other regular elements in the source such as headings, links, or references.
193–200. YAGO
• (These slides revisit the same four rule types, annotating them with the machine-reading functions they serve: Knowledge Representation, Inference, and Information Extraction.)
204. YAGO
• Ontology Representation
– Entities and relations of public interest
– Formats: TSV, RDF, XML, N3, Web interface
– Learns
• Instances and patterns from Wikipedia;
• Taxonomy from WordNet;
• Geotag information from Geonames.
205–206. YAGO
• Named Entity Resolution/Extraction [Theobald & Weikum, 2012]
– Based on rules and patterns extracted from Wikipedia (Natural Language Processing + Machine Learning)
– Disambiguation is a relevant issue
– Semi-structured data: the “low-hanging fruit”
• Wikipedia infoboxes & categories
• HTML lists & tables, etc.
207. It’s about the disappearance forty years ago of Harriet Vanger, a young
scion of one of the wealthiest families in Sweden, and about her uncle,
determined to know the truth about what he believes was her murder.
Blomkvist visits Henrik Vanger at his estate on the tiny island of Hedeby.
The old man draws Blomkvist in by promising solid evidence against Wennerström.
Blomkvist agrees to spend a year writing the Vanger family history as a cover for the real
assignment: the disappearance of Vanger's niece Harriet some 40 years earlier. Hedeby is
home to several generations of Vangers, all part owners in Vanger Enterprises. Blomkvist
becomes acquainted with the members of the extended Vanger family, most of whom resent
his presence. He does, however, start a short lived affair with Cecilia, the niece of Henrik.
After discovering that Salander has hacked into his computer, he persuades her to assist
him with research. They eventually become lovers, but Blomkvist has trouble getting close
to Lisbeth who treats virtually everyone she meets with hostility. Ultimately the two
discover that Harriet's brother Martin, CEO of Vanger Industries, is secretly a serial killer.
A 24-year-old computer hacker sporting an assortment of tattoos and body piercings
supports herself by doing deep background investigations for Dragan Armansky, who, in
turn, worries that Lisbeth Salander is “the perfect victim for anyone who wished her ill."
Machine Reading
This slide was adapted from [Hady et al., 2011]
209. YAGO
• Relation Extraction [Theobald & Weikum, 2012]
– Based on rules and patterns extracted from Wikipedia
– Semi-structured data: the “low-hanging fruit”
• Wikipedia infoboxes & categories
• HTML lists & tables, etc.
210. YAGO
• Relation Extraction [Theobald & Weikum, 2012]
– Based on rules and patterns extracted from Wikipedia (Natural Language Processing + Machine Learning)
– Semi-structured data: the “low-hanging fruit” (Wikipedia infoboxes & categories; HTML lists & tables, etc.)
211. It’s about the disappearance forty years ago of Harriet Vanger, a young
scion of one of the wealthiest families in Sweden, and about her uncle,
determined to know the truth about what he believes was her murder.
Blomkvist visits Henrik Vanger at his estate on the tiny island of Hedeby.
The old man draws Blomkvist in by promising solid evidence against Wennerström.
Blomkvist agrees to spend a year writing the Vanger family history as a cover for the real
assignment: the disappearance of Vanger's niece Harriet some 40 years earlier. Hedeby is
home to several generations of Vangers, all part owners in Vanger Enterprises. Blomkvist
becomes acquainted with the members of the extended Vanger family, most of whom resent
his presence. He does, however, start a short lived affair with Cecilia, the niece of Henrik.
After discovering that Salander has hacked into his computer, he persuades her to assist
him with research. They eventually become lovers, but Blomkvist has trouble getting close
to Lisbeth who treats virtually everyone she meets with hostility. Ultimately the two
discover that Harriet's brother Martin, CEO of Vanger Industries, is secretly a serial killer.
A 24-year-old computer hacker sporting an assortment of tattoos and body piercings
supports herself by doing deep background investigations for Dragan Armansky, who, in
turn, worries that Lisbeth Salander is “the perfect victim for anyone who wished her ill."
Machine
Reading
This
slide
was
adapted
from
[Hady
et
al.,
2011]
212. Machine
Reading
same
This
slide
was
adapted
from
[Hady
et
al.,
2011]
213. Machine Reading
Annotations on the same passage: six same links between coreferent mentions.
(This slide was adapted from [Hady et al., 2011].)
214. Machine Reading
Annotations on the same passage: the same coreference links, plus the relations uncleOf, owns, hires, and headOf between the linked mentions.
(This slide was adapted from [Hady et al., 2011].)
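The annotated example above illustrates the target output of a machine reading system: mentions linked by coreference, and typed relations between them, stored as subject-predicate-object triples. A minimal sketch in Python follows; the entity and relation names are taken from the slide's annotations, but the representation and the `query` helper are illustrative, not the actual interface of any system discussed here.

```python
# Minimal sketch: machine-reading output stored as (subject, relation, object)
# triples, mirroring the relations annotated on the slide above.
facts = [
    ("Henrik_Vanger", "uncleOf", "Harriet_Vanger"),
    ("Vanger_family", "owns", "Vanger_Enterprises"),
    ("Henrik_Vanger", "hires", "Mikael_Blomkvist"),
    ("Martin_Vanger", "headOf", "Vanger_Enterprises"),
]

def query(subject=None, relation=None, obj=None):
    """Return all triples matching a (possibly partial) pattern."""
    return [
        (s, r, o) for (s, r, o) in facts
        if (subject is None or s == subject)
        and (relation is None or r == relation)
        and (obj is None or o == obj)
    ]

print(query(relation="headOf"))
# [('Martin_Vanger', 'headOf', 'Vanger_Enterprises')]
```

Once facts are in this form, downstream components (inference, question answering) can operate on the knowledge base rather than on raw text.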
216. YAGO
• YAGO2: Exploring and Querying World Knowledge in Time, Space, Context, and Many Languages
  – New relations specifically designed to cover time, space and context
  – Wikipedia translated pages as sources for other languages
217. YAGO
• YAGO3 [Mahdisoltani & Biega & Suchanek, 2015]
  – an extension of the YAGO knowledge base, built from the Wikipedias in multiple languages
  – fuses the multilingual information with the English WordNet
  – uses categories, infoboxes, and Wikidata to learn the meaning of infobox attributes across languages
  – covers 10 different languages, with a precision of 95%–100% in the attribute mapping
  – enlarges YAGO by 1m new entities and 7m new facts
218. YAGO
• More on YAGO:
  – Very nice tutorials:
    • “Knowledge Bases for Web Content Analytics” at WWW 2015, Florence, May 2015
    • “Semantic Knowledge Bases from Web Sources” at IJCAI 2011, Barcelona, July 2011
    • “Harvesting Knowledge from Web Data and Text” at CIKM 2010, Toronto, October 2010
    • “From Information to Knowledge: Harvesting Entities and Relationships from Web Sources” at PODS 2010, Indianapolis, June 2010
  – Project Website:
    • http://www.mpi-inf.mpg.de/yago-naga/
219. YAGO
• More on YAGO (http://www.mpi-inf.mpg.de/yago-naga/)
220. YAGO
• More on YAGO (http://www.mpi-inf.mpg.de/yago-naga/)
221. YAGO
• More on YAGO (http://www.mpi-inf.mpg.de/yago-naga/)
222. YAGO
• More on YAGO (http://www.mpi-inf.mpg.de/yago-naga/)
?X <hasChild> ?C    ?Y <hasChild> ?C    =>    ?X <isMarriedTo> ?Y
223. YAGO
• More on YAGO (http://www.mpi-inf.mpg.de/yago-naga/)
?X <hasChild> ?C    ?Y <hasChild> ?C    =>    ?X <isMarriedTo> ?Y
Machine Learning
224. YAGO
• More on YAGO (http://www.mpi-inf.mpg.de/yago-naga/)
?X <hasChild> ?C    ?Y <hasChild> ?C    =>    ?X <isMarriedTo> ?Y
Machine Learning
Inference
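The Horn rule on the slides above (two people who share a child are inferred to be married) can be applied mechanically over a triple store. A minimal sketch, with illustrative facts; this is not YAGO's actual inference engine, just the rule applied by brute-force pattern matching:

```python
# Sketch of applying the rule from the slide over a set of triples:
#   ?X <hasChild> ?C  AND  ?Y <hasChild> ?C  =>  ?X <isMarriedTo> ?Y
facts = {
    ("Anna", "hasChild", "Carl"),
    ("Bob", "hasChild", "Carl"),
    ("Anna", "hasChild", "Dora"),
}

def infer_marriages(kb):
    """Derive isMarriedTo facts from pairs of hasChild triples sharing a child."""
    inferred = set()
    for (x, r1, c1) in kb:
        for (y, r2, c2) in kb:
            if r1 == r2 == "hasChild" and c1 == c2 and x != y:
                inferred.add((x, "isMarriedTo", y))
    return inferred

print(infer_marriages(facts))
# {('Anna', 'isMarriedTo', 'Bob'), ('Bob', 'isMarriedTo', 'Anna')}
```

Note that the rule is only a plausible heuristic, not a logical truth: unmarried co-parents make it fire incorrectly, which is why learned rules of this kind carry confidence scores in practice.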
225. Outline
• Machine Learning
• Machine Reading
• Reading the Web
  – DBPedia
  – YAGO
  – KnowItAll
  – NELL
226. Outline
• Machine Learning
• Machine Reading
• Reading the Web
  – DBPedia
  – YAGO
  – KnowItAll
  – NELL
230. KnowItAll
• Motivation: New Paradigm for Search [Etzioni, 2008]
  – The future of Web Search
  – Read the Web instead of retrieving Web pages to perform Web Search
231. KnowItAll
• Information Extraction (IE) + tractable inference
  – IE(sentence) = who did what?
    • speaker(P. Smith, ECMLPKDD2012)
  – Inference = uncover implicit information
    • Will Pittsburgh Steelers be champions again?
• Open Information Extraction [Banko et al., 2007]
232. Open Information Extraction [Banko et al., 2007]
• Open IE systems avoid specific nouns and verbs
• Extractors are unlexicalized, i.e., formulated only in terms of:
  – syntactic tokens (e.g., part-of-speech tags)
  – closed-word classes (e.g., of, in, such as)
• Open IE extractors focus on generic ways in which relationships are expressed in English
  – naturally generalizing across domains
233. Open Information Extraction [Banko et al., 2007]
• Open IE extractors focus on generic ways in which relationships are expressed in English
  – naturally generalizing across domains
Relation Discovery
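An unlexicalized extractor of the kind described above can be sketched as a match over part-of-speech patterns: the relation phrase between two noun arguments must look like a verb optionally followed by a preposition, with no reference to any specific noun or verb. This toy version assumes the input is already POS-tagged by an upstream tagger (with simplified tags); real Open IE systems such as TextRunner use learned models over much richer patterns.

```python
# Toy unlexicalized Open IE extractor: matches a generic POS pattern
# NOUN (VERB|PREP)+ NOUN, so it generalizes across domains.
def extract(tagged_sentence):
    """Extract (arg1, relation_phrase, arg2) triples from a POS-tagged sentence."""
    triples = []
    for i, (tok_i, pos_i) in enumerate(tagged_sentence):
        if pos_i != "NOUN":
            continue
        for j in range(i + 1, len(tagged_sentence)):
            tok_j, pos_j = tagged_sentence[j]
            if pos_j == "NOUN":
                between = tagged_sentence[i + 1:j]
                # relation phrase: at least one verb, only verbs/prepositions
                if any(p == "VERB" for _, p in between) and \
                   all(p in ("VERB", "PREP") for _, p in between):
                    rel = " ".join(t for t, _ in between)
                    triples.append((tok_i, rel, tok_j))
                break  # pair each noun with the next noun only
    return triples

sent = [("Curie", "NOUN"), ("born", "VERB"), ("in", "PREP"), ("Warsaw", "NOUN")]
print(extract(sent))  # [('Curie', 'born in', 'Warsaw')]
```

Because the pattern mentions only POS tags and closed-class words, the same extractor handles "born in", "acquired", "is the capital of", and so on without any relation-specific engineering, which is exactly the generalization the slides emphasize.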
234. Open Information Extraction
• Open IE systems are traditionally based on three steps [Etzioni et al., 2011]:
  – 1. Label: Sentences are automatically labeled with extractions using heuristics or distant supervision. (Unsupervised Learning)
235. Open Information Extraction
• Open IE systems are traditionally based on three steps [Etzioni et al., 2011]:
  – 1. Label: Sentences are automatically labeled with extractions using heuristics or distant supervision.
  – 2. Learn: A relation phrase extractor is learned using a sequence-labeling graphical model (e.g., CRF). (Supervised Learning)
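The Label step above can be sketched with the distant-supervision heuristic: any sentence that mentions both arguments of a known knowledge-base fact is assumed to express that fact's relation and becomes a positive training example. The KB and sentences below are illustrative, and the token-level matching is deliberately naive compared to real systems:

```python
# Sketch of step 1 (Label) via distant supervision: sentences mentioning both
# arguments of a known KB fact are heuristically labeled with its relation.
kb = {("Paris", "capitalOf", "France")}

sentences = [
    "Paris is the capital of France .",
    "Paris hosted the 1900 Olympics .",
]

def label(sentences, kb):
    """Attach a relation label to each sentence that mentions a KB pair."""
    labeled = []
    for sent in sentences:
        tokens = sent.split()
        for (arg1, rel, arg2) in kb:
            if arg1 in tokens and arg2 in tokens:
                labeled.append((sent, rel))
    return labeled

print(label(sentences, kb))
# [('Paris is the capital of France .', 'capitalOf')]
```

The labels are noisy (a sentence can mention both arguments without expressing the relation), which is why step 2 then trains a sequence model such as a CRF on top of them rather than trusting the heuristic matches directly.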