Temporal Information Processing is a subfield of Natural Language Processing that is valuable in many tasks,
such as Question Answering and Summarization. It is a broad area, ranging
from classical theories of time and language to current computational approaches to Temporal
Information Extraction. This latter trend consists of the automatic extraction of events and temporal
expressions. These issues have attracted great attention, especially with the development of annotated
corpora and annotation schemes, mainly TimeBank and TimeML. In this paper, we give a survey of
Temporal Information Extraction from natural language texts.
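To make the task concrete, temporal expression extraction in the TimeML tradition pairs a recognized surface string with a normalized ISO 8601 value (the TIMEX3 `value` attribute). The sketch below is a minimal rule-based recognizer written for this survey's context, not taken from any of the systems it covers; it handles only one explicit date pattern, whereas real systems use far larger rule sets.

```python
import re
from datetime import date

# Minimal rule-based recognizer for explicit dates such as "March 5, 2004",
# normalized to ISO 8601 as a TimeML TIMEX3 "value" would be.
MONTHS = {m: i + 1 for i, m in enumerate(
    ["january", "february", "march", "april", "may", "june", "july",
     "august", "september", "october", "november", "december"])}

PATTERN = re.compile(
    r"\b(?P<month>January|February|March|April|May|June|July|August|"
    r"September|October|November|December)\s+(?P<day>\d{1,2}),?\s+"
    r"(?P<year>\d{4})\b")

def extract_timex(text):
    """Return (surface string, ISO value) pairs for explicit dates."""
    results = []
    for m in PATTERN.finditer(text):
        iso = date(int(m.group("year")),
                   MONTHS[m.group("month").lower()],
                   int(m.group("day"))).isoformat()
        results.append((m.group(0), iso))
    return results

print(extract_timex("The strike announced on March 5, 2004 ended a week later."))
# [('March 5, 2004', '2004-03-05')]
```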
Rule-based Information Extraction for Airplane Crashes Reports (CSCJournals)
Over the last two decades, the internet has gained widespread use in various aspects of everyday living. The amount of generated data in both structured and unstructured forms has increased rapidly, posing a number of challenges. Unstructured data are hard to manage, assess, and analyse in view of decision making. Extracting information from these large volumes of data is time-consuming and requires complex analysis. Information extraction (IE) technology is part of a text-mining framework for extracting useful knowledge for further analysis.
Various competitions, conferences and research projects have accelerated the development of IE. This project presents the main aspects of the information extraction field in detail. It focused on a specific domain: airplane crash reports. A set of reports from the 1001 Crash website was used to perform extraction tasks such as: crash site, crash date and time, departure, destination, etc. The common structures and textual expressions were considered in designing the extraction rules.
The evaluation framework used to examine the system's performance was executed on both working and test texts. It shows that the system is more accurate at extracting entities and relations than events. Generally, the good results reflect the high quality and good design of the extraction rules. It can be concluded that the rule-based approach has proved its efficiency in delivering reliable results. However, this approach does require intensive work and an iterative cycle of rule testing and modification.
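To illustrate the kind of rule-based slot filling described, the sketch below hard-codes a few regular-expression rules keyed to common crash-report phrasings. The patterns, slot names, and sample sentence are invented for illustration; they are not the rules the authors derived from the 1001 Crash reports.

```python
import re

# Hypothetical slot-filling rules: each rule is a regular expression whose
# named group captures one template slot from a crash report.
RULES = {
    "crash_date": re.compile(r"crashed on (?P<crash_date>[A-Z]\w+ \d{1,2}, \d{4})"),
    "crash_site": re.compile(r"crashed (?:on .*? )?near (?P<crash_site>[A-Z][\w ]+?)[,.]"),
    "departure":  re.compile(r"after takeoff from (?P<departure>[A-Z][\w ]+?)[,.]"),
}

def fill_template(report):
    """Apply every rule; unmatched slots stay None."""
    template = {}
    for slot, rule in RULES.items():
        m = rule.search(report)
        template[slot] = m.group(slot) if m else None
    return template

text = ("The aircraft crashed on June 3, 2006 near Tehran, shortly "
        "after takeoff from Mehrabad Airport.")
print(fill_template(text))
# {'crash_date': 'June 3, 2006', 'crash_site': 'Tehran', 'departure': 'Mehrabad Airport'}
```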
Recruitment Based On Ontology with Enhanced Security Features (theijes)
This document describes a recruitment system based on ontology with enhanced security features. The system allows human resource personnel to search for and select candidates based on criteria like area of interest and academic performance. HR users must first complete a registration process that generates a random security code sent to their email. They can then log in to search candidate profiles and select individuals of interest. Selected candidates' details are emailed to HR for future reference. The system also periodically refreshes candidate data to improve memory management and logs all user activity for security. The proposed system aims to facilitate secure, efficient recruitment while maintaining data integrity through its design.
Rule-based Information Extraction from Disease Outbreak Reports (Waqas Tariq)
Information extraction (IE) systems serve as the front end and core stage in different natural language processing tasks. As IE has proved its efficiency in domain-specific tasks, this project focused on one domain: disease outbreak reports. Several reports from the World Health Organization were carefully examined to formulate the extraction tasks: named entities, such as disease name, date and location; the location of the reporting authority; and the outbreak incident. Extraction rules were then designed based on a study of the textual expressions and elements that appeared before and after the target text.
The experiment resulted in very high performance scores for all the tasks in general. The training corpora and the testing corpora were tested separately. The system performed with higher accuracy on entity and event extraction than on relationship extraction.
It can be concluded that the rule-based approach has been proven capable of delivering reliable IE, with extremely high accuracy and coverage results. However, this approach requires an extensive, time-consuming, manual study of word classes and phrases.
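The left/right-context rules described above, where the target string is whatever lies between a known preceding phrase and a known following phrase, can be sketched as follows. The context phrases and the example sentence are assumptions for illustration, not the actual rules derived from the WHO reports.

```python
import re

# Build a rule that extracts whatever sits between a fixed left context
# and a fixed right context. Contexts below are illustrative only.
def context_rule(left, right):
    return re.compile(re.escape(left) + r"\s*(?P<target>.+?)\s*" + re.escape(right))

disease_rule = context_rule("outbreak of", "has been reported")

sentence = "An outbreak of avian influenza has been reported in Viet Nam."
m = disease_rule.search(sentence)
if m:
    print(m.group("target"))  # avian influenza
```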
This document summarizes an article about adaptive information extraction. It discusses how information extraction research has grown with the increasing availability of online text sources. However, one drawback of information extraction is its domain dependence. To address this, machine learning techniques have been used to develop adaptive information extraction systems that can be applied to new domains with less manual adaptation. The document provides an overview of information extraction and different machine learning approaches used for adaptive information extraction.
GReAT model: a model for the automatic generation of semantic relations betwee... (ijcsity)
The large available amount of non-structured texts that belong to different domains such as healthcare (e.g. medical records), justice (e.g. laws, declarations), insurance (e.g. declarations), etc. increases the effort required for the analysis of information in a decision-making process. Different projects and tools have proposed strategies to reduce this complexity by classifying, summarizing or annotating the texts. Particularly, text summary strategies have proven to be very useful to provide a compact view of an original text. However, the available strategies to generate these summaries do not fit very well within the domains that require taking into consideration the temporal dimension of the text (e.g. a recent piece of text in a medical record is more important than a previous one) and the profile of the person who requires the summary (e.g. the medical specialization). To cope with these limitations this paper presents "GReAT", a model for automatic summary generation that relies on natural language processing and text mining techniques to extract the most relevant information from narrative texts and discover new information from the detection of related information. The GReAT model was implemented in software to be validated in a health institution, where it has shown to be very useful to display a preview of the information about medical health records and discover new facts and hypotheses within the information. Several tests were executed, such as functionality, usability and performance tests, regarding the implemented software. In addition, precision and recall measures were applied to the results obtained through the implemented tool, as well as to the loss of information caused by providing a text shorter than the original.
Data and Information Integration: Information Extraction (IJMER)
Information extraction is generally concerned with locating different items in a document, whether a textual or a web document. This paper is concerned with the methodologies and applications of information extraction. The field of information extraction plays a very important role in the natural language processing community. The architecture of an information extraction system, which acts as the base for all languages and fields, is also discussed along with its different components. Information is hidden in the large volume of web pages, and it is thus necessary to extract useful information from web content; this task is called Information Extraction. In information extraction, given a sequence of instances, we identify and pull out a sub-sequence of the input that represents the information we are interested in.
Manual data extraction from semi-structured web pages is a difficult task. This paper focuses on the study of various data extraction techniques, including several web data extraction techniques. In the past years, there has been a rapid expansion of activity in the information extraction area, and many methods have been proposed for automating the extraction process. We survey various web data extraction tools and introduce several real-world applications of information extraction, discussing the role information extraction plays in different fields. Current challenges faced by the available information extraction techniques are briefly discussed, along with ongoing and future work based on current research.
A Simple Information Retrieval Technique (idescitation)
The document presents a simple information retrieval technique that involves removing stop words and punctuation from documents, calculating term frequency and inverse document frequency, constructing a master document matrix, and ranking documents based on similarity to user queries. The technique is demonstrated on a sample collection of 5 documents. For a query on "information retrieval system", the documents are ranked from most similar to least similar as document 5, document 1, document 2, document 3, document 4. The technique provides an easy way to search and retrieve relevant documents from a collection.
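The pipeline described (stop-word and punctuation removal, TF-IDF weighting, a term-document matrix, similarity ranking) condenses into a short script. This is a generic sketch of the technique rather than the paper's code; the stop-word list and the three sample documents are placeholders.

```python
import math
import re
from collections import Counter

# Strip stop words and punctuation, weight terms by TF-IDF, and rank
# documents by cosine similarity to the query.
STOP = {"a", "an", "the", "of", "is", "to", "and", "in", "for"}

def tokenize(text):
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP]

def tfidf_vectors(docs):
    tokenized = [tokenize(d) for d in docs]
    df = Counter(t for toks in tokenized for t in set(toks))
    n = len(docs)
    idf = {t: math.log(n / df[t]) for t in df}
    vectors = [{t: c * idf[t] for t, c in Counter(toks).items()}
               for toks in tokenized]
    return vectors, idf

def cosine(u, v):
    dot = sum(u.get(t, 0.0) * v.get(t, 0.0) for t in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

docs = ["information retrieval system design",
        "database system tutorial",
        "retrieval of temporal information"]
vecs, idf = tfidf_vectors(docs)
qvec = {t: c * idf.get(t, 0.0)
        for t, c in Counter(tokenize("information retrieval system")).items()}
ranking = sorted(range(len(docs)), key=lambda i: cosine(qvec, vecs[i]), reverse=True)
print(ranking)  # [0, 2, 1]: document indices, most similar first
```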
Concept integration using edit distance and n-gram match (ijdms)
The rapid growth of information on the World Wide Web (WWW) has made it necessary to make all this information available not only to people but also to machines. Ontologies and tokens are widely used to add semantics in data or information processing. A concept formally refers to the meaning of a specification encoded in a logic-based language; explicit means that the concepts and properties of the specification are machine readable; and a conceptualization models how people think about things in a particular subject area. In the modern scenario, many ontologies have been developed on various topics, resulting in increased heterogeneity of entities among the ontologies. Concept integration has become vital over the last decade as a tool to minimize heterogeneity and empower data processing. There are various techniques to integrate concepts from different input sources based on semantic or syntactic match values. In this paper, an approach is proposed to integrate concepts (ontologies or tokens) using edit distance or n-gram match values between pairs of concepts, with concept frequency used to drive the integration process. The proposed technique's performance is compared with semantic-similarity-based integration techniques on quality parameters such as Recall, Precision, F-Measure and integration efficiency over different sizes of concepts. The analysis indicates that edit-distance-based integration outperformed n-gram integration and semantic similarity techniques.
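For reference, the two syntactic match scores the abstract compares can be implemented compactly: Levenshtein edit distance and an n-gram overlap score (here the Dice coefficient over character bigrams). This is a generic illustration, not the paper's implementation or its parameter choices.

```python
def edit_distance(a, b):
    """Classic dynamic-programming Levenshtein distance with rolling rows."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def ngram_similarity(a, b, n=2):
    """Dice coefficient over character n-grams."""
    grams = lambda s: {s[i:i + n] for i in range(len(s) - n + 1)}
    ga, gb = grams(a.lower()), grams(b.lower())
    if not ga or not gb:
        return 0.0
    return 2 * len(ga & gb) / (len(ga) + len(gb))

print(edit_distance("organisation", "organization"))            # 1
print(round(ngram_similarity("organisation", "organization"), 2))  # 0.82
```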
Structured and Unstructured Information Extraction Using Text Mining and Natural Language Processing (rahulmonikasharma)
Information on the web is increasing ad infinitum. The web has thus become an unstructured global area where information, even if available, cannot be directly used for desired applications; one is often faced with information overload and demands for automated help. Information extraction (IE) is the task of automatically extracting structured information from unstructured and/or semi-structured machine-readable documents by means of Text Mining and Natural Language Processing (NLP) techniques. Extracted structured information can be used for a variety of enterprise- or personal-level tasks of varying complexity. Information extraction also applies a set of knowledge in order to answer user consultations in natural language. The system is based on a Fuzzy Logic engine, which takes advantage of its flexibility for managing sets of accumulated knowledge; these sets may be built in hierarchical levels by a tree structure. Information extraction derives structured data or knowledge from unstructured text by identifying references to named entities as well as stated relationships between such entities. Data mining research assumes that the information to be "mined" is already in the form of a relational database, so IE can serve as an important technology for text mining. If the knowledge to be discovered is expressed directly in the documents to be mined, then IE alone can serve as an effective approach to text mining. However, if the documents contain concrete data in unstructured form rather than abstract knowledge, it may be useful to first use IE to transform the unstructured data in the document corpus into a structured database, and then use traditional data mining tools to identify abstract patterns in this extracted data. We propose a novel method for text mining with natural language processing techniques to extract information efficiently, where extraction time and accuracy are measured and plotted in simulation, covering the attributes of entities and relationships from structured and semi-structured information. Results are compared with conventional methods.
IRJET- Survey for Amazon Fine Food Reviews (IRJET Journal)
This document discusses sentiment analysis and summarizes several papers on related topics. It begins with an abstract describing sentiment analysis and its importance. The introduction defines sentiment classification and analysis. The literature survey section summarizes 5 papers on natural language processing and machine learning algorithms for sentiment analysis, including K-means clustering, bag-of-words models, TF-IDF vectorization for document clustering, hierarchical clustering methods, and using naive bayes and SVM for sentiment analysis and text summarization. The conclusion discusses techniques for data processing, natural language processing, and machine learning algorithms covered.
Automated Discovery of Logical Fallacies in Legal Argumentation (gerogepatton)
This document summarizes an article that presents a model and system for discovering logical fallacies in legal argumentation. The model functions by formalizing legal text in Prolog. It assesses different parts of legal decision making like claims, applied laws, and decisions for detecting fallacies. The system checks arguments for validity, soundness, sufficiency, and necessity to identify fallacies. It is asserted that dealing with these challenges resolves fallacies in argumentation. The model provides a mechanism to discover non sequitur fallacies, where the conclusion does not follow the premises, in legal texts.
AUTOMATED DISCOVERY OF LOGICAL FALLACIES IN LEGAL ARGUMENTATION (ijaia)
This paper presents a model of an algorithmic framework and a system for the discovery of non sequitur fallacies in legal argumentation. The model operates on formalised legal text implemented in Prolog. Different parts of the formalised legal text for legal decision-making processes, such as the claim of a plaintiff, the piece of law applied to the case, and the decision of the judge, are assessed by the algorithm to detect fallacies in an argument. We provide a mechanism designed to assess the coherence of every premise of a claim, its logical structure and legal consistency, with the corresponding piece of law at each stage of the argumentation. The modelled system checks for validity and soundness of a claim, as well as sufficiency and necessity of the premises of arguments. We assert that dealing with the challenges of validity, soundness, sufficiency and necessity resolves fallacies in argumentation.
AUTOMATED DISCOVERY OF LOGICAL FALLACIES IN LEGAL ARGUMENTATION (gerogepatton)
This document presents a model for discovering logical fallacies in legal argumentation. The model analyzes formalized legal text implemented in Prolog to assess arguments for validity, soundness, sufficiency, and necessity. This helps identify errors in reasoning known as non sequitur fallacies. The system checks if a claim's premises adequately support its conclusion and the application of the relevant law. By examining the logic and consistency of arguments at each stage, the model aims to detect fallacies and ensure legally sound decision making.
The document provides an introduction to information retrieval, including its history, key concepts, and challenges. It discusses how information retrieval aims to retrieve relevant documents from a collection to satisfy a user's information need. The main challenge in information retrieval is determining relevance, as relevance depends on personal assessment, task, context, time, location, and device. Three main issues in information retrieval are determining relevance, representing documents and queries, and developing effective retrieval models and algorithms.
The document provides an introduction to information retrieval, including its history, key concepts, and challenges. It discusses how information retrieval aims to retrieve relevant documents from a collection to satisfy a user's information need. The main challenge in information retrieval is determining relevance, as relevance depends on personal assessment and can change based on context, time, location, and device. The document outlines the major issues and developments in the field over time from the 1950s to present day.
INFORMATION RETRIEVAL BASED ON CLUSTER ANALYSIS APPROACH (ijcsit)
The huge volume of text documents available on the internet has made it difficult for specific users to find valuable information. In fact, the need for efficient applications to extract knowledge of interest from textual documents is vitally important. This paper addresses the problem of responding to user queries by fetching the most relevant documents from a clustered set of documents. For this purpose, a cluster-based information retrieval framework is proposed in order to design and develop a system for analysing and extracting useful patterns from text documents. In this approach, a pre-processing step is first performed to find frequent and high-utility patterns in the data set, and a Vector Space Model (VSM) is then used to represent the dataset. The system was implemented in two main phases. In phase 1, the clustering analysis process groups documents into several clusters, while in phase 2, an information retrieval process ranks clusters according to the user queries in order to retrieve the relevant documents from the specific clusters deemed relevant to the query. The results are then evaluated according to the evaluation criteria of Recall and Precision (P@5, P@10) over the retrieved results: P@5 was 0.660 and P@10 was 0.655.
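The reported cutoff metrics are straightforward to compute: P@k is the fraction of the top-k retrieved documents that are judged relevant. The document IDs and relevance judgments below are made up for illustration.

```python
def precision_at_k(retrieved, relevant, k):
    """Fraction of the top-k retrieved documents that are relevant."""
    return sum(1 for doc in retrieved[:k] if doc in relevant) / k

retrieved = ["d3", "d7", "d1", "d9", "d2", "d5", "d8", "d4", "d6", "d0"]
relevant = {"d3", "d1", "d2", "d5", "d8", "d4", "d6"}
print(precision_at_k(retrieved, relevant, 5))   # 0.6
print(precision_at_k(retrieved, relevant, 10))  # 0.7
```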
A study on the approaches of developing a named entity recognition tool (eSAT Publishing House)
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology
Decision Support for E-Governance: A Text Mining Approach (IJMIT JOURNAL)
Information and communication technology has the capability to improve the process by which governments involve citizens in formulating public policy and public projects. Even though much of government regulations may now be in digital form (and often available online), due to their complexity and diversity, identifying the ones relevant to a particular context is a non-trivial task. Similarly, with the advent of a number of electronic online forums, social networking sites and blogs, the opportunity of gathering citizens’ petitions and stakeholders’ views on government policy and proposals has increased greatly, but the volume and the complexity of analyzing unstructured data makes this difficult. On the other hand, text mining has come a long way from simple keyword search, and matured into a discipline capable of dealing with much more complex tasks. In this paper we discuss how text-mining techniques can help in retrieval of information and relationships from textual data sources, thereby assisting policy makers in discovering associations between policies and citizens’ opinions expressed in electronic public forums and blogs etc. We also present here, an integrated text mining based architecture for e-governance decision support along with a discussion on the Indian scenario.
Automatic Annotation Approach Of Events In News Articles (Joaquin Hamad)
The document describes an approach for automatically annotating events in news articles. It involves four main steps: 1) preprocessing the text through segmentation into sentences and identifying named entities, 2) using a classifier to filter out non-event sentences, 3) grouping similar event sentences into clusters, 4) generating a summary of the annotated events. The approach uses natural language processing and machine learning techniques like decision trees for classification.
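The four stages can be mirrored in a skeleton pipeline. Each stage below is a deliberately naive stand-in, a keyword cue list instead of the paper's decision-tree classifier and word-overlap grouping instead of its clustering, meant only to show how the steps compose.

```python
import re

# Stage stand-ins: sentence splitting, event filtering, clustering, summary.
EVENT_CUES = {"killed", "announced", "erupted", "crashed", "elected"}

def split_sentences(text):
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def is_event(sentence):
    # Placeholder for the paper's trained classifier.
    return any(cue in sentence.lower() for cue in EVENT_CUES)

def cluster(sentences, threshold=0.3):
    # Greedy grouping by word overlap with a cluster's first sentence.
    clusters = []
    for s in sentences:
        words = set(s.lower().split())
        for c in clusters:
            shared = words & set(c[0].lower().split())
            if len(shared) / max(len(words), 1) >= threshold:
                c.append(s)
                break
        else:
            clusters.append([s])
    return clusters

def annotate(text):
    events = [s for s in split_sentences(text) if is_event(s)]
    return [c[0] for c in cluster(events)]  # first sentence summarizes each cluster

print(annotate("A volcano erupted in Iceland. Flights were normal. "
               "The volcano erupted again on Monday."))
```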
This document provides an overview of text mining and web mining. It defines data mining and describes the common data mining tasks of classification, clustering, association rule mining and sequential pattern mining. It then discusses text mining, defining it as the process of analyzing unstructured text data to extract meaningful information and structure. The document outlines the seven practice areas of text mining as search/information retrieval, document clustering, document classification, web mining, information extraction, natural language processing, and concept extraction. It provides brief descriptions of the problems addressed within each practice area.
Applying Soft Computing Techniques in Information Retrieval (IJAEMSJORNAL)
There is a plethora of information available over the internet on a daily basis, and retrieving meaningful, effective information using usual IR methods is becoming a cumbersome task. Hence this paper summarizes the different soft computing techniques that can be applied to information retrieval systems to improve their efficiency in acquiring knowledge related to a user's query.
The document proposes a text mining template-based algorithm to improve business intelligence by categorizing text. It begins with an introduction to data mining and text mining. It then discusses related work on text mining algorithms. The document proposes a methodology using a configuration file to identify fields in documents based on regular expressions. The algorithm reads documents line by line, matches lines to conditions in the configuration file, and stores the identified fields in an array. The array is then used to populate a structured table for analysis to improve decision making. The methodology is experimentally tested on 100 resumes to select candidates, demonstrating its ability to extract structured data from unstructured documents.
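A minimal version of that configuration-driven matching might look like the following; the field names, regular expressions, and sample resume text are assumptions for illustration, not the paper's configuration format.

```python
import re

# Configuration maps field names to regular expressions; the document is
# read line by line and the first matching line fills each field.
CONFIG = {
    "name":  re.compile(r"^Name:\s*(.+)$"),
    "email": re.compile(r"([\w.+-]+@[\w-]+\.[\w.]+)"),
    "phone": re.compile(r"(\+?\d[\d\s-]{7,}\d)"),
}

def extract_fields(document):
    fields = {}
    for line in document.splitlines():
        for field, pattern in CONFIG.items():
            if field not in fields:
                m = pattern.search(line)
                if m:
                    fields[field] = m.group(1).strip()
    return fields  # one structured record, ready to load into a table

resume = "Name: Jane Doe\nContact: jane.doe@example.com\nPhone: +1 555-123-4567"
print(extract_fields(resume))
```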
(1) The document is an annotated bibliography on information extraction and natural language processing written by Jun-ichi Tsujii from the University of Tokyo.
(2) It provides references to key papers that have influenced the development of the field of information extraction over the last 5 years as of 2000, organized by topics such as general introduction, IE systems used in Message Understanding Conferences, and IE systems for biology and biomedical texts.
(3) The references cover techniques such as finite-state processing, pattern matching, and use of full parsers as well as domain-specific resources for biological IE systems.
The document discusses information retrieval and summarizes key points about data, information, knowledge, and informatics. It provides definitions of information retrieval and discusses challenges in retrieving information from large, unstructured collections. It also summarizes the scope of informatics as focusing on representing, processing, and communicating information in natural and artificial systems.
MULTILINGUAL INFORMATION RETRIEVAL BASED ON KNOWLEDGE CREATION TECHNIQUES (ijcseit)
As information access across languages increases, the importance of a system that supports query-based searching in the presence of multilingual content also grows. Gathering information in different natural languages is a most difficult task, requiring huge resources such as databases and digital libraries. Cross-language information retrieval (CLIR) enables searching in multilingual document collections using the native language, and it can be supported by different data mining techniques. This paper deals with various data mining techniques that can be used to solve the problems encountered in CLIR.
Investigation of the Effect of Obstacle Placed Near the Human Glottis on the ... (kevig)
Simulation of a human vocal tract model is necessary to understand the effect of different conditions on speech generation. In natural calamities such as earthquakes, a person may swallow soil due to the large amount of dust around. The effect of soil on the sound pressure in the human vocal tract is investigated here. The generation of speech starts from the glottis, so the soil obstacle is positioned near it. The whole model is designed and analyzed using FEM. As the speech signal requires a bandwidth of about 4 kHz, the investigation was carried out to obtain the sound pressure in the human vocal tract for 100-5000 Hz sound signals.
Identification and Classification of Named Entities in Indian Languages (kevig)
The process of identifying Named Entities (NEs) in a given document and then classifying them into different categories of NEs is referred to as Named Entity Recognition (NER). A great effort is needed to perform NER in Indian languages and achieve the same or higher accuracy than that obtained for English and the European languages. In this paper, we present the results that we have achieved by performing NER in Hindi, Bengali and Telugu using a Hidden Markov Model (HMM) and performance metrics.
Effect of Query Formation on Web Search Engine Resultskevig
Query in a search engine is generally based on natural language. A query can be expressed in more than
one way without changing its meaning as it depends on thinking of human being at a particular moment.
Aim of the searcher is to get most relevant results immaterial of how the query has been expressed. In the
present paper, we have examined the results of search engine for change in coverage and similarity of first
few results when a query is entered in two semantically same but in different formats. Searching has been
made through Google search engine. Fifteen pairs of queries have been chosen for the study. The t-test has
been used for the purpose and the results have been checked on the basis of total documents found,
similarity of first five and first ten documents found in the results of a query entered in two different
formats. It has been found that the total coverage is same but first few results are significantly different.
Investigations of the Distributions of Phonemic Durations in Hindi and Dogrikevig
Speech generation is one of the most important areas of research in speech signal processing which is now gaining a serious attention. Speech is a natural form of communication in all living things. Computers with the ability to understand speech and speak with a human like voice are expected to contribute to the development of more natural man-machine interface. However, in order to give those functions that are even closer to those of human beings, we must learn more about the mechanisms by which speech is produced and perceived, and develop speech information processing technologies that can generate a more natural sounding systems. The so described field of stud, also called speech synthesis and more prominently acknowledged as text-to-speech synthesis, originated in the mid eighties because of the emergence of DSP and the rapid advancement of VLSI techniques. To understand this field of speech, it is necessary to understand the basic theory of speech production. Every language has different phonetic alphabets and a different set of possible phonemes and their combinations.
For the analysis of the speech signal, we have carried out the recording of five speakers in Dogri (3 male and 5 females) and eight speakers in Hindi language (4 male and 4 female). For estimating the durational distributions, the mean of mean of ten instances of vowels of each speaker in both the languages has been calculated. Investigations have shown that the two durational distributions differ significantly with respect to mean and standard deviation. The duration of phoneme is speaker dependent. The whole investigation can be concluded with the end result that almost all the Dogri phonemes have shorter duration, in comparison to Hindi phonemes. The period in milli seconds of same phonemes when uttered in Hindi were found to be longer compared to when they were spoken by a person with Dogri as his mother tongue. There are many applications which are directly of indirectly related to the research being carried out. For instance the main application may be for transforming Dogri speech into Hindi and vice versa, and further utilizing this application, we can develop a speech aid to teach Dogri to children. The results may also be useful for synthesizing the phonemes of Dogri using the parameters of the phonemes of Hindi and for building large vocabulary speech recognition systems.
May 2024 - Top10 Cited Articles in Natural Language Computingkevig
Natural Language Processing is a programmed approach to analyze text that is based on both a set of theories and a set of technologies. This forum aims to bring together researchers who have designed and build software that will analyze, understand, and generate languages that humans use naturally to address computers.
Effect of Singular Value Decomposition Based Processing on Speech Perceptionkevig
Speech is an important biological signal for primary mode of communication among human being and also the most natural and efficient form of exchanging information among human in speech. Speech processing is the most important aspect in signal processing. In this paper the theory of linear algebra called singular value decomposition (SVD) is applied to the speech signal. SVD is a technique for deriving important parameters of a signal. The parameters derived using SVD may further be reduced by perceptual evaluation of the synthesized speech using only perceptually important parameters, where the speech signal can be compressed so that the information can be transformed into compressed form without losing its quality. This technique finds wide applications in speech compression, speech recognition, and speech synthesis. The objective of this paper is to investigate the effect of SVD based feature selection of the input speech on the perception of the processed speech signal. The speech signal which is in the form of vowels \a\, \e\, \u\ were recorded from each of the six speakers (3 males and 3 females). The vowels for the six speakers were analyzed using SVD based processing and the effect of the reduction in singular values was investigated on the perception of the resynthesized vowels using reduced singular values. Investigations have shown that the number of singular values can be drastically reduced without significantly affecting the perception of the vowels.
Identifying Key Terms in Prompts for Relevance Evaluation with GPT Modelskevig
Relevance evaluation of a query and a passage is essential in Information Retrieval (IR). Recently, numerous studies have been conducted on tasks related to relevance judgment using Large Language Models (LLMs) such as GPT-4,
demonstrating significant improvements. However, the efficacy of LLMs is considerably influenced by the design of the prompt. The purpose of this paper is to
identify which specific terms in prompts positively or negatively impact relevance
evaluation with LLMs. We employed two types of prompts: those used in previous
research and generated automatically by LLMs. By comparing the performance of
these prompts in both few-shot and zero-shot settings, we analyze the influence of
specific terms in the prompts. We have observed two main findings from our study.
First, we discovered that prompts using the term ‘answer’ lead to more effective
relevance evaluations than those using ‘relevant.’ This indicates that a more direct
approach, focusing on answering the query, tends to enhance performance. Second,
we noted the importance of appropriately balancing the scope of ‘relevance.’ While
the term ‘relevant’ can extend the scope too broadly, resulting in less precise evaluations, an optimal balance in defining relevance is crucial for accurate assessments.
The inclusion of few-shot examples helps in more precisely defining this balance.
By providing clearer contexts for the term ‘relevance,’ few-shot examples contribute
to refine relevance criteria. In conclusion, our study highlights the significance of
carefully selecting terms in prompts for relevance evaluation with LLMs.
Identifying Key Terms in Prompts for Relevance Evaluation with GPT Modelskevig
Relevance evaluation of a query and a passage is essential in Information Retrieval (IR). Recently, numerous studies have been conducted on tasks related to relevance judgment using Large Language Models (LLMs) such as GPT-4, demonstrating significant improvements. However, the efficacy of LLMs is considerably influenced by the design of the prompt. The purpose of this paper is to identify which specific terms in prompts positively or negatively impact relevance evaluation with LLMs. We employed two types of prompts: those used in previous research and generated automatically by LLMs. By comparing the performance of these prompts in both few-shot and zero-shot settings, we analyze the influence of specific terms in the prompts. We have observed two main findings from our study. First, we discovered that prompts using the term ‘answer’ lead to more effective relevance evaluations than those using ‘relevant.’ This indicates that a more direct approach, focusing on answering the query, tends to enhance performance. Second, we noted the importance of appropriately balancing the scope of ‘relevance.’ While the term ‘relevant’ can extend the scope too broadly, resulting in less precise evaluations, an optimal balance in defining relevance is crucial for accurate assessments. The inclusion of few-shot examples helps in more precisely defining this balance. By providing clearer contexts for the term ‘relevance,’ few-shot examples contribute to refine relevance criteria. In conclusion, our study highlights the significance of carefully selecting terms in prompts for relevance evaluation with LLMs.
In recent years, great advances have been made in the speed, accuracy, and coverage of automatic word
sense disambiguator systems that, given a word appearing in a certain context, can identify the sense of
that word. In this paper we consider the problem of deciding whether same words contained in different
documents are related to the same meaning or are homonyms. Our goal is to improve the estimate of the
similarity of documents in which some words may be used with different meanings. We present three new
strategies for solving this problem, which are used to filter out homonyms from the similarity computation.
Two of them are intrinsically non-semantic, whereas the other one has a semantic flavor and can also be
applied to word sense disambiguation. The three strategies have been embedded in an article document
recommendation system that one of the most important Italian ad-serving companies offers to its customers.
Genetic Approach For Arabic Part Of Speech Taggingkevig
With the growing number of textual resources available, the ability to understand them becomes critical.
An essential first step in understanding these sources is the ability to identify the parts-of-speech in each
sentence. Arabic is a morphologically rich language, which presents a challenge for part of speech
tagging. In this paper, our goal is to propose, improve, and implement a part-of-speech tagger based on a
genetic algorithm. The accuracy obtained with this method is comparable to that of other probabilistic
approaches.
Rule Based Transliteration Scheme for English to Punjabikevig
Machine Transliteration has come out to be an emerging and a very important research area in the field of
machine translation. Transliteration basically aims to preserve the phonological structure of words. Proper
transliteration of name entities plays a very significant role in improving the quality of machine translation.
In this paper we are doing machine transliteration for English-Punjabi language pair using rule based
approach. We have constructed some rules for syllabification. Syllabification is the process to extract or
separate the syllable from the words. In this we are calculating the probabilities for name entities (Proper
names and location). For those words which do not come under the category of name entities, separate
probabilities are being calculated by using relative frequency through a statistical machine translation
toolkit known as MOSES. Using these probabilities we are transliterating our input text from English to
Punjabi.
Improving Dialogue Management Through Data Optimizationkevig
In task-oriented dialogue systems, the ability for users to effortlessly communicate with machines and computers through natural language stands as a critical advancement. Central to these systems is the dialogue manager, a pivotal component tasked with navigating the conversation to effectively meet user goals by selecting the most appropriate response. Traditionally, the development of sophisticated dialogue management has embraced a variety of methodologies, including rule-based systems, reinforcement learning, and supervised learning, all aimed at optimizing response selection in light of user inputs. This research casts a spotlight on the pivotal role of data quality in enhancing the performance of dialogue managers. Through a detailed examination of prevalent errors within acclaimed datasets, such as Multiwoz 2.1 and SGD, we introduce an innovative synthetic dialogue generator designed to control the introduction of errors precisely. Our comprehensive analysis underscores the critical impact of dataset imperfections, especially mislabeling, on the challenges inherent in refining dialogue management processes.
Document Author Classification using Parsed Language Structurekevig
Over the years there has been ongoing interest in detecting authorship of a text based on statistical properties of the text, such as by using occurrence rates of noncontextual words. In previous work, these techniques have been used, for example, to determine authorship of all of The Federalist Papers. Such methods may be useful in more modern times to detect fake or AI authorship. Progress in statistical natural language parsers introduces the possibility of using grammatical structure to detect authorship. In this paper we explore a new possibility for detecting authorship using grammatical structural information extracted using a statistical natural language parser. This paper provides a proof of concept, testing author classification based on grammatical structure on a set of “proof texts,” The Federalist Papers and Sanditon which have been as test cases in previous authorship detection studies. Several features extracted from the statisticalnaturallanguage parserwere explored: all subtrees of some depth from any level; rooted subtrees of some depth, part of speech, and part of speech by level in the parse tree. It was found to be helpful to project the features into a lower dimensional space. Statistical experiments on these documents demonstrate that information from a statistical parser can, in fact, assist in distinguishing authors.
Rag-Fusion: A New Take on Retrieval Augmented Generationkevig
Infineon has identified a need for engineers, account managers, and customers to rapidly obtain product information. This problem is traditionally addressed with retrieval-augmented generation (RAG) chatbots, but in this study, I evaluated the use of the newly popularized RAG-Fusion method. RAG-Fusion combines RAG and reciprocal rank fusion (RRF) by generating multiple queries, reranking them with reciprocal scores and fusing the documents and scores. Through manually evaluating answers on accuracy, relevance, and comprehensiveness, I found that RAG-Fusion was able to provide accurate and comprehensive answers due to the generated queries contextualizing the original query from various perspectives. However, some answers strayed off topic when the generated queries' relevance to the original query is insufficient. This research marks significant progress in artificial intelligence (AI) and natural language processing (NLP) applications and demonstrates transformations in a global and multi-industry context.
Performance, Energy Consumption and Costs: A Comparative Analysis of Automati...kevig
The common practice in Machine Learning research is to evaluate the top-performing models based on their performance. However, this often leads to overlooking other crucial aspects that should be given careful consideration. In some cases, the performance differences between various approaches may be insignificant, whereas factors like production costs, energy consumption, and carbon footprint should be taken into account. Large Language Models (LLMs) are widely used in academia and industry to address NLP problems. In this study, we present a comprehensive quantitative comparison between traditional approaches (SVM-based) and more recent approaches such as LLM (BERT family models) and generative models (GPT2 and LLAMA2), using the LexGLUE benchmark. Our evaluation takes into account not only performance parameters (standard indices), but also alternative measures such as timing, energy consumption and costs, which collectively contribute to the carbon footprint. To ensure a complete analysis, we separately considered the prototyping phase (which involves model selection through training-validation-test iterations) and the in-production phases. These phases follow distinct implementation procedures and require different resources. The results indicate that simpler algorithms often achieve performance levels similar to those of complex models (LLM and generative models), consuming much less energy and requiring fewer resources. These findings suggest that companies should consider additional considerations when choosing machine learning (ML) solutions. The analysis also demonstrates that it is increasingly necessary for the scientific world to also begin to consider aspects of energy consumption in model evaluations, in order to be able to give real meaning to the results obtained using standard metrics (Precision, Recall, F1 and so on).
Evaluation of Medium-Sized Language Models in German and English Languagekevig
Large language models (LLMs) have garnered significant attention, but the definition of “large” lacks clarity. This paper focuses on medium-sized language models (MLMs), defined as having at least six billion parameters but less than 100 billion. The study evaluates MLMs regarding zero-shot generative question answering, which requires models to provide elaborate answers without external document retrieval. The paper introduces an own test dataset and presents results from human evaluation. Results show that combining the best answers from different MLMs yielded an overall correct answer rate of 82.7% which is better than the 60.9% of ChatGPT. The best MLM achieved 71.8% and has 33B parameters, which highlights the importance of using appropriate training data for fine-tuning rather than solely relying on the number of parameters. More fine-grained feedback should be used to further improve the quality of answers. The open source community is quickly closing the gap to the best commercial models.
IMPROVING DIALOGUE MANAGEMENT THROUGH DATA OPTIMIZATIONkevig
In task-oriented dialogue systems, the ability for users to effortlessly communicate with machines and
computers through natural language stands as a critical advancement. Central to these systems is the
dialogue manager, a pivotal component tasked with navigating the conversation to effectively meet user
goals by selecting the most appropriate response. Traditionally, the development of sophisticated dialogue
management has embraced a variety of methodologies, including rule-based systems, reinforcement
learning, and supervised learning, all aimed at optimizing response selection in light of user inputs. This
research casts a spotlight on the pivotal role of data quality in enhancing the performance of dialogue
managers. Through a detailed examination of prevalent errors within acclaimed datasets, such as
Multiwoz 2.1 and SGD, we introduce an innovative synthetic dialogue generator designed to control the
introduction of errors precisely. Our comprehensive analysis underscores the critical impact of dataset
imperfections, especially mislabeling, on the challenges inherent in refining dialogue management
processes.
Document Author Classification Using Parsed Language Structurekevig
Over the years there has been ongoing interest in detecting authorship of a text based on statistical properties of the
text, such as by using occurrence rates of noncontextual words. In previous work, these techniques have been used,
for example, to determine authorship of all of The Federalist Papers. Such methods may be useful in more modern
times to detect fake or AI authorship. Progress in statistical natural language parsers introduces the possibility of
using grammatical structure to detect authorship. In this paper we explore a new possibility for detecting authorship
using grammatical structural information extracted using a statistical natural language parser. This paper provides a
proof of concept, testing author classification based on grammatical structure on a set of “proof texts,” The Federalist
Papers and Sanditon which have been as test cases in previous authorship detection studies. Several features extracted
of some depth, part of speech, and part of speech by level in the parse tree. It was found to be helpful to project the
features into a lower dimensional space. Statistical experiments on these documents demonstrate that information
from a statistical parser can, in fact, assist in distinguishing authors.
RAG-FUSION: A NEW TAKE ON RETRIEVALAUGMENTED GENERATIONkevig
Infineon has identified a need for engineers, account managers, and customers to rapidly obtain product
information. This problem is traditionally addressed with retrieval-augmented generation (RAG) chatbots,
but in this study, I evaluated the use of the newly popularized RAG-Fusion method. RAG-Fusion combines
RAG and reciprocal rank fusion (RRF) by generating multiple queries, reranking them with reciprocal
scores and fusing the documents and scores. Through manually evaluating answers on accuracy,
relevance, and comprehensiveness, I found that RAG-Fusion was able to provide accurate and
comprehensive answers due to the generated queries contextualizing the original query from various
perspectives. However, some answers strayed off topic when the generated queries' relevance to the
original query is insufficient. This research marks significant progress in artificial intelligence (AI) and
natural language processing (NLP) applications and demonstrates transformations in a global and multiindustry context
Performance, energy consumption and costs: a comparative analysis of automati...kevig
The common practice in Machine Learning research is to evaluate the top-performing models based on their
performance. However, this often leads to overlooking other crucial aspects that should be given careful
consideration. In some cases, the performance differences between various approaches may be insignificant, whereas factors like production costs, energy consumption, and carbon footprint should be taken into
account. Large Language Models (LLMs) are widely used in academia and industry to address NLP problems. In this study, we present a comprehensive quantitative comparison between traditional approaches
(SVM-based) and more recent approaches such as LLM (BERT family models) and generative models (GPT2 and LLAMA2), using the LexGLUE benchmark. Our evaluation takes into account not only performance
parameters (standard indices), but also alternative measures such as timing, energy consumption and costs,
which collectively contribute to the carbon footprint. To ensure a complete analysis, we separately considered the prototyping phase (which involves model selection through training-validation-test iterations) and
the in-production phases. These phases follow distinct implementation procedures and require different resources. The results indicate that simpler algorithms often achieve performance levels similar to those of
complex models (LLM and generative models), consuming much less energy and requiring fewer resources.
These findings suggest that companies should consider additional considerations when choosing machine
learning (ML) solutions. The analysis also demonstrates that it is increasingly necessary for the scientific
world to also begin to consider aspects of energy consumption in model evaluations, in order to be able to
give real meaning to the results obtained using standard metrics (Precision, Recall, F1 and so on).
Dandelion Hashtable: beyond billion requests per second on a commodity serverAntonios Katsarakis
This slide deck presents DLHT, a concurrent in-memory hashtable. Despite efforts to optimize hashtables, that go as far as sacrificing core functionality, state-of-the-art designs still incur multiple memory accesses per request and block request processing in three cases. First, most hashtables block while waiting for data to be retrieved from memory. Second, open-addressing designs, which represent the current state-of-the-art, either cannot free index slots on deletes or must block all requests to do so. Third, index resizes block every request until all objects are copied to the new index. Defying folklore wisdom, DLHT forgoes open-addressing and adopts a fully-featured and memory-aware closed-addressing design based on bounded cache-line-chaining. This design offers lock-free index operations and deletes that free slots instantly, (2) completes most requests with a single memory access, (3) utilizes software prefetching to hide memory latencies, and (4) employs a novel non-blocking and parallel resizing. In a commodity server and a memory-resident workload, DLHT surpasses 1.6B requests per second and provides 3.5x (12x) the throughput of the state-of-the-art closed-addressing (open-addressing) resizable hashtable on Gets (Deletes).
Conversational agents, or chatbots, are increasingly used to access all sorts of services using natural language. While open-domain chatbots - like ChatGPT - can converse on any topic, task-oriented chatbots - the focus of this paper - are designed for specific tasks, like booking a flight, obtaining customer support, or setting an appointment. Like any other software, task-oriented chatbots need to be properly tested, usually by defining and executing test scenarios (i.e., sequences of user-chatbot interactions). However, there is currently a lack of methods to quantify the completeness and strength of such test scenarios, which can lead to low-quality tests, and hence to buggy chatbots.
To fill this gap, we propose adapting mutation testing (MuT) for task-oriented chatbots. To this end, we introduce a set of mutation operators that emulate faults in chatbot designs, an architecture that enables MuT on chatbots built using heterogeneous technologies, and a practical realisation as an Eclipse plugin. Moreover, we evaluate the applicability, effectiveness and efficiency of our approach on open-source chatbots, with promising results.
"NATO Hackathon Winner: AI-Powered Drug Search", Taras KlobaFwdays
This is a session that details how PostgreSQL's features and Azure AI Services can be effectively used to significantly enhance the search functionality in any application.
In this session, we'll share insights on how we used PostgreSQL to facilitate precise searches across multiple fields in our mobile application. The techniques include using LIKE and ILIKE operators and integrating a trigram-based search to handle potential misspellings, thereby increasing the search accuracy.
We'll also discuss how the azure_ai extension on PostgreSQL databases in Azure and Azure AI Services were utilized to create vectors from user input, a feature beneficial when users wish to find specific items based on text prompts. While our application's case study involves a drug search, the techniques and principles shared in this session can be adapted to improve search functionality in a wide range of applications. Join us to learn how PostgreSQL and Azure AI can be harnessed to enhance your application's search capability.
AppSec PNW: Android and iOS Application Security with MobSFAjin Abraham
Mobile Security Framework - MobSF is a free and open source automated mobile application security testing environment designed to help security engineers, researchers, developers, and penetration testers to identify security vulnerabilities, malicious behaviours and privacy concerns in mobile applications using static and dynamic analysis. It supports all the popular mobile application binaries and source code formats built for Android and iOS devices. In addition to automated security assessment, it also offers an interactive testing environment to build and execute scenario based test/fuzz cases against the application.
This talk covers:
Using MobSF for static analysis of mobile applications.
Interactive dynamic security assessment of Android and iOS applications.
Solving Mobile app CTF challenges.
Reverse engineering and runtime analysis of Mobile malware.
How to shift left and integrate MobSF/mobsfscan SAST and DAST in your build pipeline.
QA or the Highway - Component Testing: Bridging the gap between frontend appl...zjhamm304
These are the slides for the presentation, "Component Testing: Bridging the gap between frontend applications" that was presented at QA or the Highway 2024 in Columbus, OH by Zachary Hamm.
In our second session, we shall learn all about the main features and fundamentals of UiPath Studio that enable us to use the building blocks for any automation project.
📕 Detailed agenda:
Variables and Datatypes
Workflow Layouts
Arguments
Control Flows and Loops
Conditional Statements
💻 Extra training through UiPath Academy:
Variables, Constants, and Arguments in Studio
Control Flow in Studio
The Department of Veteran Affairs (VA) invited Taylor Paschal, Knowledge & Information Management Consultant at Enterprise Knowledge, to speak at a Knowledge Management Lunch and Learn hosted on June 12, 2024. All Office of Administration staff were invited to attend and received professional development credit for participating in the voluntary event.
The objectives of the Lunch and Learn presentation were to:
- Review what KM ‘is’ and ‘isn’t’
- Understand the value of KM and the benefits of engaging
- Define and reflect on your “what’s in it for me?”
- Share actionable ways you can participate in Knowledge - - Capture & Transfer
ScyllaDB is making a major architecture shift. We’re moving from vNode replication to tablets – fragments of tables that are distributed independently, enabling dynamic data distribution and extreme elasticity. In this keynote, ScyllaDB co-founder and CTO Avi Kivity explains the reason for this shift, provides a look at the implementation and roadmap, and shares how this shift benefits ScyllaDB users.
"Scaling RAG Applications to serve millions of users", Kevin GoedeckeFwdays
How we managed to grow and scale a RAG application from zero to thousands of users in 7 months. Lessons from technical challenges around managing high load for LLMs, RAGs and Vector databases.
"$10 thousand per minute of downtime: architecture, queues, streaming and fin...Fwdays
Direct losses from downtime in 1 minute = $5-$10 thousand dollars. Reputation is priceless.
As part of the talk, we will consider the architectural strategies necessary for the development of highly loaded fintech solutions. We will focus on using queues and streaming to efficiently work and manage large amounts of data in real-time and to minimize latency.
We will focus special attention on the architectural patterns used in the design of the fintech system, microservices and event-driven architecture, which ensure scalability, fault tolerance, and consistency of the entire system.
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor IvaniukFwdays
At this talk we will discuss DDoS protection tools and best practices, discuss network architectures and what AWS has to offer. Also, we will look into one of the largest DDoS attacks on Ukrainian infrastructure that happened in February 2022. We'll see, what techniques helped to keep the web resources available for Ukrainians and how AWS improved DDoS protection for all customers based on Ukraine experience
This talk will cover ScyllaDB Architecture from the cluster-level view and zoom in on data distribution and internal node architecture. In the process, we will learn the secret sauce used to get ScyllaDB's high availability and superior performance. We will also touch on the upcoming changes to ScyllaDB architecture, moving to strongly consistent metadata and tablets.
The Microsoft 365 Migration Tutorial For Beginner.pptxoperationspcvita
This presentation will help you understand the power of Microsoft 365. However, we have mentioned every productivity app included in Office 365. Additionally, we have suggested the migration situation related to Office 365 and how we can help you.
You can also read: https://www.systoolsgroup.com/updates/office-365-tenant-to-tenant-migration-step-by-step-complete-guide/
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfChart Kalyan
A Mix Chart displays historical data of numbers in a graphical or tabular form. The Kalyan Rajdhani Mix Chart specifically shows the results of a sequence of numbers over different periods.
In the realm of cybersecurity, offensive security practices act as a critical shield. By simulating real-world attacks in a controlled environment, these techniques expose vulnerabilities before malicious actors can exploit them. This proactive approach allows manufacturers to identify and fix weaknesses, significantly enhancing system security.
This presentation delves into the development of a system designed to mimic Galileo's Open Service signal using software-defined radio (SDR) technology. We'll begin with a foundational overview of both Global Navigation Satellite Systems (GNSS) and the intricacies of digital signal processing.
The presentation culminates in a live demonstration. We'll showcase the manipulation of Galileo's Open Service pilot signal, simulating an attack on various software and hardware systems. This practical demonstration serves to highlight the potential consequences of unaddressed vulnerabilities, emphasizing the importance of offensive security practices in safeguarding critical infrastructure.
Must Know Postgres Extension for DBA and Developer during MigrationMydbops
Mydbops Opensource Database Meetup 16
Topic: Must-Know PostgreSQL Extensions for Developers and DBAs During Migration
Speaker: Deepak Mahto, Founder of DataCloudGaze Consulting
Date & Time: 8th June | 10 AM - 1 PM IST
Venue: Bangalore International Centre, Bangalore
Abstract: Discover how PostgreSQL extensions can be your secret weapon! This talk explores how key extensions enhance database capabilities and streamline the migration process for users moving from other relational databases like Oracle.
Key Takeaways:
* Learn about crucial extensions like oracle_fdw, pgtt, and pg_audit that ease migration complexities.
* Gain valuable strategies for implementing these extensions in PostgreSQL to achieve license freedom.
* Discover how these key extensions can empower both developers and DBAs during the migration process.
* Don't miss this chance to gain practical knowledge from an industry expert and stay updated on the latest open-source database trends.
Mydbops Managed Services specializes in taking the pain out of database management while optimizing performance. Since 2015, we have been providing top-notch support and assistance for the top three open-source databases: MySQL, MongoDB, and PostgreSQL.
Our team offers a wide range of services, including assistance, support, consulting, 24/7 operations, and expertise in all relevant technologies. We help organizations improve their database's performance, scalability, efficiency, and availability.
Contact us: info@mydbops.com
Visit: https://www.mydbops.com/
Follow us on LinkedIn: https://in.linkedin.com/company/mydbops
For more details and updates, please follow up the below links.
Meetup Page : https://www.meetup.com/mydbops-databa...
Twitter: https://twitter.com/mydbopsofficial
Blogs: https://www.mydbops.com/blog/
Facebook(Meta): https://www.facebook.com/mydbops/
Nordic Marketo Engage User Group_June 13_ 2024.pptx
International Journal on Natural Language Computing (IJNLC) Vol. 1, No. 2, August 2012
DOI: 10.5121/ijnlc.2012.1201
TEMPORAL INFORMATION PROCESSING: A SURVEY
Ines Berrazega
LARODEC Laboratory, ISG of Tunis, B.P. 1088, 2000 Le Bardo, Tunisia
Ines_berrazega@yahoo.fr
ABSTRACT
Temporal Information Processing is a subfield of Natural Language Processing, valuable in many tasks
like Question Answering and Summarization. Temporal Information Processing is broadened, ranging
from classical theories of time and language to current computational approaches for Temporal
Information Extraction. This later trend consists on the automatic extraction of events and temporal
expressions. Such issues have attracted great attention especially with the development of annotated
corpora and annotations schemes mainly TimeBank and TimeML. In this paper, we give a survey of
Temporal Information Extraction from Natural Language texts.
KEYWORDS
Temporal Information Processing, Natural Language Processing, Information Extraction, annotation
schemes, TimeML.
1. INTRODUCTION
Information Extraction is gaining increased attention from researchers who seek to acquire knowledge from huge amounts of natural language content. Many approaches have been proposed to extract valuable information from text. However, dynamic facts such as temporal information have been neglected, although time is a crucial dimension in any information space [1]. This limitation can be explained by the complexity of such a task. For example, the classical techniques used to extract named entities and events from textual content are unable to identify the temporal relations between events or to infer the chronological ordering of these events. Such processes require a greater effort to analyze how temporal information is conveyed in real texts, especially when it is expressed implicitly.
In this concern, recent researchers have aimed to expand the capabilities of existing Natural Language Processing (NLP) systems to account for the temporal dimension of language. Temporal Information Extraction tasks thus first appeared in the scope of MUC-5, where participants were asked to assign a calendrical time to a joint-venture event [2].
In fact, extracting temporal information from texts is valuable in many NLP tasks like Question Answering, Summarization, and Information Extraction. It is also beneficial for many real-world applications like Web Search, Medical Records, Legal Reasoning, Accounting, Banking, Reservation Systems, and Accident Reports.
Motivated by the expected applications of Temporal Information Extraction in various domains, several computational approaches have been proposed. They address two main issues:
automatic recognition of temporal entities in narrative (namely temporal expressions and events), and identification of temporal relations between these entities. These tasks are considered quite complex: tackling them cannot be limited to simple pattern recognition. They require a deep comprehension and study of natural language content, especially since temporal information can be expressed implicitly and some expressions can be ambiguous.
The main purpose of this paper is to present a survey of the Temporal Information Extraction
domain. The rest of this paper is organized as follows. We first present a brief review of
Information Extraction in section 2. Then, we state the main concepts related to Temporal
Information in section 3. We review in section 4 the classical works on Temporal Information
Processing. After that, we focus on the current issues in the Temporal Information Extraction
domain in section 5. Then, we describe some real-world applications in section 6, followed by a
conclusion in section 7.
2. INFORMATION EXTRACTION
Information Extraction (IE) is an important field of Natural Language Processing (NLP). Its aim is to automatically extract predefined types of information, such as entities, events, or relationships, from unstructured and/or semi-structured content.
Information Extraction dates back to the late 1970s. Later, the scope of IE was strongly
influenced by two competitions, namely the Message Understanding Conferences (MUC) [3] and
the Automatic Content Extraction (ACE) program. In the scope of these competitions, several
tasks have been discussed such as named entity recognition, co-reference resolution and event
extraction.
It is worth distinguishing Information Extraction from other fields of NLP such as Information
Retrieval (IR). In the next subsection, we present a brief distinction between these two NLP tasks.
2.1. Information Retrieval vs. Information Extraction
Daily, huge amounts of unstructured data are produced everywhere. These contents need to be
exploited and analyzed to acquire relevant information useful in different domains. This issue has
been the topic of several fields of research such as Information Retrieval (IR) and Information
Extraction (IE).
Information Retrieval (IR) involves searching and finding documents that answer the user’s
requirement. According to Gaizauskas and Wilks, Information Retrieval is based on document
retrieval with the aim to answer the question “how to find, in a set of documents, those that
interest me?” [4].
On the other hand, Information Extraction (IE) aims to extract interesting information from documents for automatic analysis by a computer. Extraction techniques therefore have to deal with understanding the meaning of natural language.
Sarawagi defines Information Extraction as “the automatic extraction of structured information
such as entities, relationships between entities and attributes describing entities from unstructured
sources. This enables much richer forms of queries on the abundant unstructured sources than
possible with keyword searches alone [5].”
While IR aims to recover from a collection of documents a subset that answers the user’s query,
the goal of IE is to extract relevant facts from these documents. These facts are then used in
further applications.
Information Extraction is useful in many practical settings. The next subsection presents some of its real-
world applications.
2.2. Real-World Applications
Various applications that depend on information extraction have been developed, especially with
the advent of the Internet. Sarawagi lists four categories of applications: Enterprise, Personal,
Scientific, and Web-oriented. Some of these applications are presented in the following
paragraphs.
Customer Care
These applications are designed to collect unstructured data from customer interactions and then to integrate them with the databases and business ontologies of the enterprise. Sarawagi lists several problems addressed by these applications: “the identification of product names and product
attributes from customer emails, linking of customer emails to a specific transaction in a sales
database, the extraction of merchant name and addresses from sales invoices, the extraction of
repair records from insurance claim forms, the extraction of customer moods from phone
conversation transcripts, and the extraction of product attribute value pairs from textual product
descriptions.”
Personal information management (PIM)
PIM systems are used to structure personal data like documents, emails, and projects in a
structured, inter-linked format. For example, these systems can automatically extract people’s names, locations, and phone numbers from an email.
Bio-informatics
Many medical applications aim to extract biological entities such as proteins and genes, together with their names and interactions, from paper repositories such as PubMed [6].
Community Websites
This category of websites tracks information about several topics like researchers, conferences,
talks, projects, and events specific to a given community. These websites are used for the creation
of structured databases like DBLife [7] based on various extraction operations: locating talk
announcements from department pages, extracting names of speakers and titles from them,
extracting structured records about a conference from a website, etc.
2.3. Approaches for Information Extraction
In the last two decades, several approaches have been proposed for Information Extraction. There
are three families of approaches, namely rule-based approaches, machine-learning approaches, and
hybrid approaches.
Rule-Based approaches
Historically, rule-based systems for Information Extraction were the first to be implemented. This family of systems is based on domain-specific extraction rules written by a domain expert. Several rule-based systems were developed for information extraction, namely AutoSlog by Riloff [8], FASTUS by Appelt et al. [9], and GATE by Cunningham et al. [10].
This family of methods requires great human effort and considerable time for data analysis and rule writing. Rule-based methods also lack portability to other domains.
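To make the flavour of such hand-written rules concrete, here is a minimal sketch (the rules are invented for illustration; real systems such as FASTUS use far richer cascaded patterns):

import re

# Two hypothetical extraction rules in the spirit of classical rule-based IE:
# each rule pairs a label with a hand-written regular expression.
RULES = [
    ("DATE", re.compile(r"\b\d{1,2} (January|February|March|April|May|June|July|"
                        r"August|September|October|November|December) \d{4}\b")),
    ("TIME", re.compile(r"\b\d{1,2}:\d{2}\b")),
]

def apply_rules(text):
    # Return a (label, matched_text) pair for every rule firing on the text.
    return [(label, m.group(0))
            for label, pattern in RULES
            for m in pattern.finditer(text)]

print(apply_rules("The accident occurred at 14:35 on 12 March 2009."))
# [('DATE', '12 March 2009'), ('TIME', '14:35')]

Writing and debugging even such small rules for every new domain is exactly the human effort and portability problem noted above.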
To overcome the deficiencies of the classical rule-based approaches, human effort should be
adequately replaced by a learning component. Thus, machine-learning approaches began to be
exploited.
Machine-Learning approaches
These approaches have become the enabling technique for many problems in Natural Language
Processing, among which are named entity recognition, parsing, tagging and semantic role
labeling. For these purposes, several models were used: generative models based on Hidden
Markov Models [11, 12, 13], conditional models based on maximum entropy [14, 15, 16] and
global conditional models, namely Conditional Random Fields [17].
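As an illustration of the machine-learning style, named entity recognition is often cast as BIO sequence labeling. The sketch below assumes the third-party sklearn-crfsuite package, and the two training sentences are invented toy data:

import sklearn_crfsuite

def token_features(sent, i):
    # Simple lexical features for token i, plus its immediate context.
    word = sent[i]
    return {
        "word.lower": word.lower(),
        "word.istitle": word.istitle(),
        "word.isdigit": word.isdigit(),
        "prev.lower": sent[i - 1].lower() if i > 0 else "<BOS>",
        "next.lower": sent[i + 1].lower() if i < len(sent) - 1 else "<EOS>",
    }

train = [
    (["The", "crash", "occurred", "on", "12", "March", "2009"],
     ["O", "O", "O", "O", "B-DATE", "I-DATE", "I-DATE"]),
    (["She", "arrived", "last", "week"],
     ["O", "O", "B-DATE", "I-DATE"]),
]
X = [[token_features(s, i) for i in range(len(s))] for s, _ in train]
y = [tags for _, tags in train]

# A linear-chain CRF: a global conditional model over whole tag sequences.
crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=50)
crf.fit(X, y)

test = ["It", "happened", "on", "14", "June", "2010"]
print(crf.predict([[token_features(test, i) for i in range(len(test))]]))

The learning component replaces hand-written rules: to port the tagger to a new domain, one supplies annotated data rather than new rules.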
Rule-based methods and machine-learning methods are both used depending on the nature of the
extraction task. There is also a third family of approaches consisting of hybrid models.
Hybrid approaches
This third family of approaches leverages the benefits of both machine-learning and rule-based
methods [18, 19, 20].
All the reviewed Information Extraction approaches were limited to studying static facts of natural language to the detriment of dynamic ones, despite the importance of the temporal dimension in many cases. For this reason, several researchers have aimed to expand the capabilities of existing
Natural Language Processing systems to account for the temporal dimension of language. In the
next section, we examine this new trend in Natural Language Processing.
3. BASIC CONCEPTS
When studying temporality, it is important to define basic concepts related to time. In this section
we try to provide a clear description of temporal entities and relations in textual contents that
enables us to gain useful insights into how temporal information is conveyed in written language.
3.1. Eventualities: Events and States
The first two main temporal entities in texts are Events and States. These two entities are grouped
by Bach in 1986 under the term eventualities [21].
Events are things that happen or occur in the world (like weddings, birthdays, and parties…) at a
certain time or over a given period of time and in a given place. They are typically dynamic
occurrences that have causes and effects, a clear beginning and end, and bring about some
perceptible change in the world [22].
States, on the other hand, are considered as the existence of a set of properties over a given period
(like happiness, owning a car…) often without a clearly defined beginning and end.
Bittar mentions that “the occurrence of an event may bring about the existence of a state.
Similarly, an event may also bring about the end of a state. In other words, a change of state in
the world typically indicates the occurrence of an event [23].”
The main difference between events and states is thus the dynamic/static distinction: events are seen as dynamic and resulting in some change, while states do not involve any perceptible change.
3.2. Temporal Expression
A temporal expression can be defined as being any element in language that lexicalizes the
concept of time in terms of recognized and generally quantifiable temporal units [23]. In
accordance with Sauri et al., there are four categories of temporal expressions according to the types of instants or intervals they describe: dates, times, durations, and sets [24].
Dates are expressions that refer to a particular period based on the Gregorian calendar. This
includes units that are larger than a year, such as centuries and millennia, as well as subintervals
of a typical year, such as seasons. The basic unit on which the calendar is based is the day. Dates
can be expressed in an absolute form (e.g., Tuesday 18th) or in a relative form (e.g., last week).
Times are expressions which denote a particular subdivision of a day, referring to particular moments within it. Times represent a relatively simple category of temporal expressions and may correspond to the moments we measure on a clock, or express more general parts of a typical day (e.g., in the morning / at 9 a.m.).
Durations are expressions which refer to an extended period of time, very often specifying the
temporal extent of an eventuality. They are measured using calendrical units (years, months, days
etc.) or clock units (hours, minutes, seconds, etc.) of temporal measure (e.g., 3 hours last Monday).
Sets are expressions that refer to the regularity or recurrence of an eventuality, either in the absolute or relative to a period of time (e.g., twice a week).
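To make the four categories concrete, the following sketch classifies a few surface forms with hand-written patterns (the patterns are invented toy examples, far from a full TimeML-style recognizer with normalization):

import re

# One illustrative pattern per category of temporal expression.
PATTERNS = {
    "DATE": re.compile(r"\b\d{1,2} (January|February|March|April|May|June|July|"
                       r"August|September|October|November|December) \d{4}\b"
                       r"|\blast week\b"),
    "TIME": re.compile(r"\b\d{1,2}:\d{2}\b|\b\d{1,2} [ap]\.m\.|\bin the morning\b"),
    "DURATION": re.compile(r"\b\d+ (hours?|minutes?|days?|weeks?|months?|years?)\b"),
    "SET": re.compile(r"\b(once|twice) a (day|week|month|year)\b"),
}

def classify(expression):
    # Return the first category whose pattern matches, else None.
    for category, pattern in PATTERNS.items():
        if pattern.search(expression):
            return category
    return None

for e in ["12 March 2009", "9 a.m.", "3 hours", "twice a week"]:
    print(e, "->", classify(e))
# 12 March 2009 -> DATE, 9 a.m. -> TIME, 3 hours -> DURATION, twice a week -> SET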
3.3. Temporal Relations
In natural language, temporal relations hold between temporal entities: between two events,
between two temporal expressions or between an event and a temporal expression. Longacre
defines a temporal relation as “an inter-propositional relation that communicates the simultaneity
or ordering in time of events or states" [25]. On this basis, reasoning about time implies the
temporal representation of these relations. In this concern, most computational models for reasoning about time are based on Allen’s Interval Algebra [26] to capture the temporal dimension of a narrative. More details about this algebra are presented in subsection 4.2.
After having stated the main concepts related to time in language, the next section is dedicated to presenting the earlier work on Temporal Information Processing.
4. CLASSICAL WORK IN TEMPORAL INFORMATION PROCESSING
The scope of research in Temporal Information Processing is broad, ranging from classical
linguistic theories of tense and aspect to current computational approaches for Temporal
Information Extraction.
Prior to describing current Temporal Information Processing issues, it is necessary to review the
background studies on time and language. The following subsections review the most outstanding
classical Temporal Information Processing works [27]. The first subsection presents the most
prominent linguistic theories of tense and aspect. The second subsection reviews the temporal
reasoning works. The third subsection states the works on the temporal structure of discourse.
Finally, the fourth subsection shows the limitations of these classical works.
4.1. Linguistic theories of tense and aspect
Temporal information is conveyed in natural language through grammatical mechanisms to
represent time and temporal relations. In this concern, tense and aspect are considered as the most
important grammatical categories in language.
The most prominent works are Reichenbach’s theory of tense, ‘The Tenses of Verbs’ [28], and Vendler’s work on aspect [29], both of which have been the basis for subsequent research.
4.1.1. Tense
Tense is a temporal linguistic mechanism that expresses the time at which or during which an
event takes place. To this purpose, Reichenbach [28] develops a theory in which tense gives information about the following time points:
Speech Time (S): the time point of the speaking act.
Event Time (E): the time at which the reported event happens.
These two time points allow expressing the basic tense classes using the operators of precedence (<) and simultaneity (=).
The following three sentences illustrate the basic tense classes with speech and event time points.
a. I played football: Past tense. (E < S)
b. I play football: Present tense. (E= S)
c. I will play football: Future tense. (S < E)
However, combinations of these two time points are unable to express all tenses. Consider
the following examples:
a. I played football.
b. I had played football.
Even though the two sentences express events which occurred in the past (E < S), it is incorrect to represent them in the same way. In example (b) (I had played football), the event of playing seems to refer to another event. To handle this situation, Reichenbach introduces a third temporal point into
his model, which is the Reference Time (R). According to this extension, the Event Time (E) in example (b) is the time at which I played, and the Reference Time (R) lies between the Event Time (E) and the Speech Time (S): (E < R < S).
With the three points defined by Reichenbach, it is possible to represent all the tenses using a set of relations between these points. Nevertheless, Reichenbach’s relations remain ambiguous, so Song and Cohen [30] developed an unambiguous set of relations, introducing a new operator (>) and presenting the relations always in S-R-E order.
The following table illustrates the original Reichenbach relations, the unambiguous relations
proposed by Song and Cohen, and their corresponding English tenses.
Table 1. Tense Temporal Relations

| Reichenbach relations | Unambiguous relations | English tense | Example |
|---|---|---|---|
| E<R<S (Anterior Past) | S>R>E | Past Perfect | I had played |
| E=R<S (Simple Past) | S>R=E | Past Simple | I played |
| R<E<S*, R<S=E*, R<S<E* (Posterior Past) | S>R<E | | [I expect] I would play |
| E<S=R (Anterior Present) | S=R>E | Present Perfect | I have played |
| S=R=E (Simple Present) | S=R=E | Present Simple | I play |
| S=R<E (Posterior Present) | S=R<E | Future Simple | I will play |
| S=E<R*, E<S<R*, S<E<R* (Anterior Future) | S<R>E | Future Perfect | I will have played |
| S<R=E (Simple Future) | S<R=E | Future Simple | I will play |
| S<R<E (Posterior Future) | S<R<E | | I shall be going to play |

* ambiguous relations
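To make the tense mapping concrete, the following minimal sketch (our illustration, not code from [30]) encodes the unambiguous S-R-E relations of Table 1 as a Python lookup; the relation strings order Speech (S), Reference (R) and Event (E) time with the operators <, = and >.

```python
# Unambiguous S-R-E relations from Table 1 mapped to English tenses.
# The dictionary keys and function name are our own illustrative choices.
TENSES = {
    "S>R>E": "Past Perfect",      # I had played
    "S>R=E": "Past Simple",       # I played
    "S>R<E": "Posterior Past",    # [I expect] I would play
    "S=R>E": "Present Perfect",   # I have played
    "S=R=E": "Present Simple",    # I play
    "S=R<E": "Future Simple",     # I will play
    "S<R>E": "Future Perfect",    # I will have played
    "S<R=E": "Simple Future",     # I will play
    "S<R<E": "Posterior Future",  # I shall be going to play
}

def classify(relation: str) -> str:
    """Look up the tense named by an unambiguous S-R-E relation string."""
    return TENSES.get(relation, "unknown relation")

print(classify("S>R>E"))  # Past Perfect
print(classify("S=R<E"))  # Future Simple
```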
4.1.2. Aspect
Aspect is the second device expressing time in natural language. Two types of aspect are expressed in language, namely grammatical aspect and lexical aspect. Grammatical aspect expresses the viewpoint from which a particular eventuality is described; it indicates the phase in which an eventuality is to be perceived. Lexical aspect, by contrast, distinguishes between different subclasses of events based on the following temporal properties: dynamicity, telicity and durativity.
In the literature on aspect, Vendler's work [29] has been the basis for subsequent research. Vendler proposes an initial distinction between events and states and then classifies event expressions into three aspectual categories, or Aktionsarten: activities, accomplishments, and achievements.
Activities are events which are durative, or extended in time, but which do not involve an explicit end point; in other words, activities are durative and atelic events. On the other hand,
accomplishments (durative culminated processes) are events that imply a duration with a definite end point at which a state changes. Finally, achievements (non-durative culminations) are punctual or instantaneous events that do not imply a duration, happening at a defined point at which a state changes.
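As a rough illustration of how these temporal properties carve up the aspectual classes, the following sketch (our own simplification; the feature names follow the prose above) derives a Vendlerian class from dynamicity, telicity and durativity:

```python
# Illustrative only: Vendler's classes as combinations of the three
# lexical-aspect features discussed above.
def vendler_class(dynamic: bool, telic: bool, durative: bool) -> str:
    if not dynamic:
        return "state"            # e.g. 'know', 'love'
    if not telic:
        return "activity"         # durative, no inherent end point: 'run'
    if durative:
        return "accomplishment"   # durative culminated process: 'build a house'
    return "achievement"          # punctual culmination: 'reach the summit'

print(vendler_class(dynamic=True, telic=True, durative=False))  # achievement
```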
4.2. Temporal Reasoning
Temporal Reasoning involves the temporal representation of events and their temporal anchoring within natural language text. The topic has attracted great attention due to its potential applications in several NLP tasks such as Summarization and Question Answering.
The state of the art in Temporal Reasoning proposes several constraint-based models. In a constraint-based model, temporal information is represented as a temporal constraint network (TCN) in which events are denoted by nodes and the ordering constraints between events are denoted by edges. Reasoning about time then becomes a Temporal Constraint Satisfaction Problem (TCSP).
Different TCNs are defined depending on the representation of the temporal entity (as time intervals, durations or points) and on the class of constraints, namely qualitative, quantitative, metric or combinations thereof [31]. Several research efforts fall within this framework [26] [32] [33].
James Allen developed an Interval Algebra which provides a conceptual model of time capturing the different ways in which eventualities may be related to each other. Allen considers that every temporal expression or event can be represented as a temporal interval having a start point and an end point on a timeline. On this basis, Allen defines a set of thirteen basic (binary) interval relations, six of which are inverses of the other six, with equality as the thirteenth. Table 2 presents these relations.
Table 2. Allen's Temporal Interval Relations [26]
In Allen's model, temporal relations between intervals are maintained in a network, where nodes represent individual intervals and arcs represent the relationships between them.
Figure 1. Allen's interval-relation network [26]
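The thirteen relations can be computed directly from interval end points. The following compact sketch (our illustration of Allen's algebra [26]; the relation names are conventional but the code is ours) classifies two intervals given as (start, end) pairs on a numeric timeline:

```python
# Classify the Allen relation holding between two well-formed intervals
# a = (a1, a2) and b = (b1, b2), with a1 < a2 and b1 < b2.
def allen_relation(a, b):
    (a1, a2), (b1, b2) = a, b
    if a2 < b1:  return "before"
    if b2 < a1:  return "after"
    if a2 == b1: return "meets"
    if b2 == a1: return "met-by"
    if a1 == b1 and a2 == b2: return "equals"
    if a1 == b1: return "starts" if a2 < b2 else "started-by"
    if a2 == b2: return "finishes" if a1 > b1 else "finished-by"
    if b1 < a1 and a2 < b2: return "during"
    if a1 < b1 and b2 < a2: return "contains"
    return "overlaps" if a1 < b1 else "overlapped-by"

print(allen_relation((1, 3), (3, 5)))  # meets
print(allen_relation((1, 4), (2, 6)))  # overlaps
```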
Filatova and Hovy [32] also propose a model for representing temporal relations, known as the time-stamping of events. This model serves to arrange the contents of news stories on a timeline by assigning a calendrical time point or interval to every event in a text. However, this model fails to capture the relevant information in many cases, and sometimes loses or misinterprets it.
Setzer and Gaizauskas [33] propose a simpler representation able to capture more information: a Time-Event Graph, used to identify event-event relationships together with event-time relationships. This representation is based on Allen's interval algebra. After reducing all events and temporal expressions to intervals and identifying the temporal relations between them, the temporal information in a text can be represented as a graph in which events and temporal expressions form the nodes and the edges are labeled with the temporal relations between them.
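As a toy illustration of such a graph (the node identifiers, labels and data below are invented for exposition, not taken from [33]), events and temporal expressions can be stored as nodes connected by relation-labeled edges:

```python
# Toy time-event graph: events (e*) and temporal expressions (t*) are
# nodes; edges carry temporal-relation labels.
graph = {
    "nodes": {"e1": "announced", "e2": "resigned", "t1": "2012-08-01"},
    "edges": [
        ("e1", "before", "e2"),    # the announcement precedes the resignation
        ("e1", "included", "t1"),  # the announcement is anchored to the date
    ],
}

def relations_of(node: str):
    """Return every labeled edge touching the given node."""
    return [e for e in graph["edges"] if node in (e[0], e[2])]

print(relations_of("e1"))
# [('e1', 'before', 'e2'), ('e1', 'included', 't1')]
```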
4.3. Temporal structure of discourse
The context of a discourse is required for correct language comprehension. In fact, sentences are usually interpreted in context rather than in isolation; both the context in which a sentence is uttered and the situation of the speakers matter.
In this regard, Kamp proposes an influential work on Discourse Representation Theory (DRT) [34]. This work introduced a new level of mental representations, known as discourse representation structures (DRSs). The theory assumes that a hearer of a given discourse builds up a mental representation of each sentence of the discourse in a cumulative way, and it makes it possible to interpret sentences in the context of a discourse, rather than in isolation, using a construction procedure that extends a given DRS. Another remarkable work on the temporal structure of discourse is Dowty's temporal discourse interpretation principle (TDIP) [35]. A further well-known work is Webber's theory of anaphoric reference [36], in which Webber develops an account of noun-phrase anaphora and shows how tensed clauses can be treated as anaphors, like noun phrases; this work is also based on Reichenbach's theory of tense. We can also mention Hwang and Schubert's Tense Trees [37]: the authors sought to capture the compositional semantics of complex sentences by automatically converting the logical form of a sentence into a temporal structure, a tense tree. Other interesting works on the temporal structure of discourse include
the dynamic aspect trees (DATs) proposed by ter Meulen [38], the Reference Point (Rpt) work presented by Kamp and Reyle [39], and the temporal conceptual graphs proposed by Moulin [40].
4.4. Discussion
Even though the works presented above highlighted a variety of issues and proposed many solutions for understanding and representing temporal information, they have been criticized for not being analyzed and evaluated over real linguistic data.
In this regard, it is worth noting that Temporal Information Processing, and Natural Language Processing in general, have undergone a shift from a rationalist strategy based on formal theories of language analysis to an empiricist strategy based on the analysis of real language use (i.e., textual corpora) [42]. This shift has been accompanied by the development of computational semantic models and the establishment of evaluation frameworks for annotating temporal information in a corpus-based fashion. The next section highlights this new trend.
5. CURRENT ISSUES IN TEMPORAL INFORMATION EXTRACTION
As mentioned in the previous section, Temporal Annotation has become, over the last decade, the most prominent issue in the Temporal Information Processing area, especially with the development of annotation schemes and annotated corpora. With this new trend, research focuses on two tasks: on one hand, automatically recognizing and extracting temporal entities (namely temporal expressions and events) from narrative; on the other hand, discovering temporal relations between these entities and inferring the type of each recognized relation.
Temporal entity recognition
This task consists of identifying and extracting temporal entities from natural language texts. For events, this means finding which textual entity constitutes an event. For example, in the sentence "she bought 15% of the shares", Filatova and Hovy [32] consider that the entire clause represents an event, while for Pustejovsky et al. the appropriate span is just the verb group, or even just the head of the verb group ("bought" in this example) [43]. Furthermore, event recognition involves identifying certain attributes of each event. Such attributes depend on the topic being studied; Pustejovsky et al. define five attributes in their TimeBank news corpus [44] (tense, aspect, modality, polarity and class). Several approaches have been proposed to identify events in natural language content. Faiz proposes an approach to identify relevant sentences in news articles for event information extraction [45]. Later, Elkhlifi and Faiz propose an automatic approach for annotating events in news texts [46].
For temporal expressions, entity recognition amounts to distinguishing their different types (times, dates, durations, frequencies) and finding their corresponding values.
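As a deliberately naive illustration of this recognition step (the patterns and names below are our own toy rules, far shallower than the surveyed systems), simple surface patterns can already capture some expression types and their surface forms:

```python
import re

# Toy surface patterns for a few temporal-expression types.
PATTERNS = {
    "DATE": r"\b\d{1,2} (?:January|February|March|April|May|June|July|"
            r"August|September|October|November|December) \d{4}\b",
    "TIME": r"\b\d{1,2}:\d{2}\b",
    "DURATION": r"\bfor \d+ (?:seconds|minutes|hours|days|weeks|years)\b",
    "FREQUENCY": r"\b(?:every|each) (?:day|week|month|year)\b",
}

def find_timex(text: str):
    """Return (type, surface form) pairs for each recognized expression."""
    hits = []
    for ttype, pattern in PATTERNS.items():
        hits += [(ttype, m.group()) for m in re.finditer(pattern, text)]
    return hits

print(find_timex("The meeting started at 09:15 on 12 March 2011 "
                 "and ran for 3 hours."))
# [('DATE', '12 March 2011'), ('TIME', '09:15'), ('DURATION', 'for 3 hours')]
```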
Temporal relations identification
Besides temporal entities, natural language texts contain other devices expressing temporal relations between times and events or between events themselves. The temporal relation identification task aims to recognize such relations and to infer the type of each recognized relation. Several research efforts address this task [26, 32, 33].
Annotation schemes and annotated corpora
Handling temporal annotation requires the development of annotation schemes and the construction of annotated corpora. In this regard, much ongoing research focuses on the development of annotation schemes to extract, model and interpret temporal information in natural language texts [42].
The most prominent temporal annotation schemes are MUC-TIMEX, proposed by Grishman and Sundheim in 1996 [47]; TIDES, proposed by Ferro et al. in 2001 [48]; STAG, developed by Setzer and Gaizauskas in the same year [49]; and finally the TimeML annotation scheme, developed by Pustejovsky et al. in 2003 [43]. All of them follow an SGML/XML-based annotation format.
TimeML (Time Markup Language) is the latest annotation scheme. It was developed under the sponsorship of ARDA as the natural evolution of STAG. This new scheme is considered a combination and extension of the preceding schemes, which makes it the most complete and powerful one. TimeML has recently been standardized as an ISO international standard for temporal information markup, ISO-TimeML (ISO-TimeML, 2007). Both the TimeML and ISO-TimeML annotation standards define the following basic XML tags: <EVENT> for the annotation of events, <TIMEX3> for the annotation of time expressions, <SIGNAL> for locating textual elements that indicate a temporal relation, and the tags <TLINK>, <SLINK> and <ALINK>, which capture different types of relations.
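To illustrate how these tags fit together, the following sketch parses a small hand-made fragment in the style of TimeML (simplified for exposition; the sentence, identifiers and values are invented, and the real standard defines many more attributes) with Python's standard library:

```python
import xml.etree.ElementTree as ET

# Hand-made, simplified TimeML-style fragment.
fragment = """<TimeML>
  The company <EVENT eid="e1" class="OCCURRENCE">announced</EVENT> its results
  on <TIMEX3 tid="t1" type="DATE" value="2012-08-01">1 August 2012</TIMEX3>.
  <TLINK lid="l1" eventID="e1" relatedToTime="t1" relType="IS_INCLUDED"/>
</TimeML>"""

root = ET.fromstring(fragment)
for ev in root.iter("EVENT"):
    print("event:", ev.get("eid"), ev.text)          # event: e1 announced
for tx in root.iter("TIMEX3"):
    print("timex:", tx.get("tid"), tx.get("value"))  # timex: t1 2012-08-01
for tl in root.iter("TLINK"):
    print("tlink:", tl.get("eventID"), tl.get("relType"), tl.get("relatedToTime"))
```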
Regarding the construction of corpora, a corpus is a collection of natural language data organized according to a set of criteria with the aim of supporting specific research. Its contents are often enriched with descriptive or analytic annotations intended to capture a particular linguistic phenomenon.
In terms of temporally annotated corpora, several corpora have been developed within the framework of the TimeML annotation scheme. TimeBank 1.1, TimeBank 1.2 and the AQUAINT corpus are all made up of journalistic texts in English annotated with the TimeML annotation language.
TimeBank 1.2 [44] has become something of a reference for the study of temporal information within the Computational Linguistics community. It is the latest major TimeML-annotated corpus and the main reference for temporal annotation in English; it also serves as the basis for the corpora of the TempEval evaluation campaigns.
Having reviewed the current issues in Temporal Information Extraction, it is worth presenting some real-world applications.
6. REAL-WORLD APPLICATIONS
Extracting temporal information from texts is a key problem for many real-world applications such as Web Search, Medical Records, Legal Reasoning, Accounting, Banking, Reservation Systems, and Accident Reports. Examples from these applications are presented in the following paragraphs.
Web Search
Query log analysis is currently an active research topic, with a significant and growing number of contributions to the understanding of online user behavior. Yet temporal information has received little attention from researchers seeking to understand and characterize users' online behavior. Some researchers try to identify and characterize the use of temporal expressions in web search queries: Nunes et al. conducted a query log analysis to investigate
the use of temporal expressions in web search queries [50]. They found that temporal expressions are rarely used (in 1.5% of queries) and, when used, tend to relate to current and past events. There are also specific topics for which the use of temporal expressions is more visible.
Medical Domain
Temporal information systems have lately been used in several medically-related problems and tasks such as diagnosis, monitoring and longitudinal guideline-based care. Several applications in the medical domain extract information about the times of clinical investigations (X-rays, ultrasounds, etc.). In this regard, Roberts et al. propose an algorithm to extract temporal relations between temporal expressions and clinical investigation events from clinical narratives [51].
Legal Reasoning
In the legal domain, extracting temporal information is of great utility for facilitating the work of lawyers. Several applications have been proposed, notably the automatic ordering of legal documents on a timeline, and the automatic retrieval of a specific event mentioned in various legal documents according to the temporal constraints associated with that event. Schilder presents a prototype system that extracts events from the United States Code on U.S. immigration and nationality and links these events to temporal constraints [52].
Financial Accounting
Many applications in the accountancy field have used temporal information to solve various problems. Fisher presents a prototype system to support the temporal reconstruction of financial accounting standards (FASs) [53]. This prototype enables a user to specify an FAS along with a date, and supports a dynamic and continuing codification of a particular FAS.
7. CONCLUSION
Temporal Information Extraction is a recent area gaining increasing interest among researchers. Our literature review reveals two trends in the domain. On one hand, classical Temporal Information Processing works arise from the intersection of linguistics, philosophy and symbolic AI. On the other hand, current work on Temporal Information Extraction is based on annotation schemes and annotated corpora. These works address several interrelated issues, namely the automatic recognition of temporal entities and the identification of temporal relations. This area of research is attracting researchers from several domains, which explains its exploitation in several real-world applications.
REFERENCES
[1] X. Ling & D. S. Weld, (2010) "Temporal Information Extraction". In Proceedings of the Twenty-Fourth Conference on Artificial Intelligence (AAAI 2010), Atlanta, GA.
[2] (1993) "Proceedings of the Fifth Message Understanding Conference (MUC-5)". Morgan Kaufmann.
[3] B. M. Sundheim, (1991). “Overview of the third message understanding evaluation and conference”.
In Proceedings of the Third Message Understanding Conference (MUC-3), pp 3–16, San Diego, CA.
[4] R. Gaizauskas & Y. Wilks, (1998) "Information Extraction: Beyond Document Retrieval". Computational Linguistics and Chinese Language Processing, vol. 3, no. 2, August 1998, pp 17-60.
[5] S. Sarawagi, (2008) “Information extraction”. Foundations and Trends in Databases, 1:261–377.
[6] C. Plake, T. Schiemann, M. Pankalla, J. Hakenberg & U. Leser, (2006) "AliBaba: PubMed as a graph". Bioinformatics, vol. 22, pp 2444–2445.
[7] P. DeRose, W. Shen, Y. Lee, D. Burdick, A. Doan & R. Ramakrishnan, (2007) "DBLife: A community information management platform for the database research community (demo)". In CIDR, pp 169–172.
[8] E. Riloff, (1993) "Automatically constructing a dictionary for information extraction tasks". In Proceedings of the 11th National Conference on Artificial Intelligence, pp 811–816.
[9] D. E. Appelt, J. R. Hobbs, J. Bear, D.J. Israel & M. Tyson, (1993). “Fastus: A finite-state processor
for information extraction from real-world text,” in IJCAI, pp 1172–1178.
[10] H. Cunningham, D. Maynard, K. Bontcheva & V. Tablan, (2002) “Gate: A framework and graphical
development environment for robust nlp tools and applications”. In Proceedings of the 40th
Anniversary Meeting of the Association for Computational Linguistics.
[11] E. Agichtein & V. Ganti, (2004). “Mining reference tables for automatic text segmentation”. In
Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and
Data Mining, Seattle, USA.
[12] K. Seymore, A. McCallum & R. Rosenfeld, (1999) “Learning Hidden Markov Model structure for
information extraction,” in Papers from the AAAI- 99 Workshop on Machine Learning for
Information Extraction, pp 37–42.
[13] N. R. Zhang, (2001) "Hidden Markov Models for Information Extraction".
[14] A. Borthwick, J. Sterling, E. Agichtein & R. Grishman, (1998) "Exploiting diverse knowledge
sources via maximum entropy in named entity recognition,” in Sixth Workshop on Very Large
Corpora New Brunswick, New Jersey, Association for Computational Linguistics.
[15] A. McCallum, D. Freitag & F. Pereira, (2000) "Maximum entropy Markov models for information
extraction and segmentation,” in Proceedings of the International Conference on Machine Learning
(ICML-2000), pp 591–598, Palo Alto, CA.
[16] R. Malouf, (2002) “A comparison of algorithms for maximum entropy parameter estimation,” in
Proceedings of the Sixth Conference on Natural Language Learning (CoNLL-2002), pp 49–55.
[17] F. Peng & A. McCallum, (2004) “Accurate information extraction from research papers using
conditional random fields,” in HLT-NAACL, pp 329–336.
[18] M. Califf & R. Mooney, (2003) “Bottom-up Relational Learning of Pattern Matching Rules for
Information Extraction”.
[19] B. Marthi, B. Milch, & S. Russell, (2003) “First-order probabilistic models for information
extraction,” in Working Notes of the IJCAI-2003 Workshop on Learning Statistical Models from
Relational Data (SRL-2003), (L. Getoor and D. Jensen, eds.), pp 71–78, Acapulco, Mexico.
[20] Y. Choi, C. Cardie, E. Riloff, & S. Patwardhan, (2005) “Identifying sources of opinions with
conditional random fields and extraction patterns,” in HLT/EMNLP.
[21] E. Bach, (1986) "The Algebra of Events". Linguistics and Philosophy, 9, pp 5–16.
[22] N. Asher, (1993) “Reference to abstract objects in discourse.” Kluwer Academic Publishers,
Dordrecht.
[23] A. Bittar, (2010) "Building a TimeBank for French: A Reference Corpus Annotated According to the ISO-TimeML Standard". PhD thesis, Université Paris Diderot, UFR de Linguistique, École doctorale de sciences du langage, laboratoire ALPAGE.
[24] R.Sauri, J. Pustejovsky & J. Littman, (2006) “Argument structure in TimeML”. In G. Katz, J.
Pustejovsky & F. Schilder, Annotating, Extracting and Reasoning about Time and Events, Dagstuhl
Seminar Proceedings, Dagstuhl, Germany. Internationales Begegnungs- und Forschungszentrum für
Informatik (IBFI), Schloss Dagstuhl, Germany.
[25] R. Longacre, (1983) “The Grammar of Discourse”. New York. Plenum Press.
[26] J. Allen, (1983) "Maintaining knowledge about temporal intervals". Communications of the ACM, 26(11), pp 832–843.
[27] I. Mani, J. Pustejovsky & R. Gaizauskas, (2005) “The Language of Time. A Reader” Oxford
University Press.
[28] H. Reichenbach, (1947) “The tenses of verbs”. Elements of Symbolic Logic. New York. Macmillan
& Co.
[29] Z. Vendler, (1967) “Verbs and Times”, chapter 4, pp 97–121. Cornell University Press, Ithaca, New
York.
[30] F. Song & R. Cohen, (1991) “Tense interpretation in the context of narrative”. In AAAI, pp. 131–136.
[31] S. K. Sanampudi & G. Kumari, (2010) "Temporal reasoning in natural language processing: A survey". International Journal of Computer Applications, 1(4), pp 53–57.
[32] E. Filatova & E. Hovy, (2001) "Assigning time-stamps to event-clauses". In G. Katz, J. Pustejovsky & F. Schilder (eds.), Proceedings of the Workshop on Temporal and Spatial Information Processing, Morristown, NJ, USA. Association for Computational Linguistics.
[33] A. Setzer & R. Gaizauskas, (2002) "On the importance of annotating temporal event-event relations in text". In the Third International Conference on Language Resources and Evaluation (LREC 2002).
[34] H. Kamp, (1981) “A theory of truth and semantic representation”. In Formal methods in the study of
language.
[35] D. R. Dowty, (1986) "The effects of aspectual class on the temporal structure of discourse: Semantics or pragmatics?". Linguistics and Philosophy, 9, pp 37–61.
[36] B. L. Webber, (1988) "Tense as discourse anaphor". Computational Linguistics, 14(2), pp 61–73.
[37] C. H. Hwang & L. K. Schubert, (1992) “Tense trees as the “fine structure” of discourse”. In the 30th
annual meeting on Association for Computational Linguistics, Morristown, NJ, USA, ACL.
[38] A. F. ter Meulen, (1995) "Representing Time in Natural Language: The Dynamic Interpretation of Tense and Aspect". MIT Press, Cambridge.
[39] H. Kamp & U. Reyle, (1993) “From Discourse to Logic. Introduction to Model-theoretic Semantics
of Natural Language, Formal Logic and Discourse Representation Theory.” Studies in Linguistics and
Philosophy. Dordrecht, The Netherlands.
[40] B. Moulin, (1997) “Temporal contexts for discourse representation: An extension of the conceptual
graph approach”. Applied Intelligence, 7 (3), pp 227–255.
[42] C. D. Manning & H. Schütze, (1999) "Foundations of Statistical Natural Language Processing". MIT Press, Cambridge.
[43] J. Pustejovsky, J. Castano, R. Ingria, R. Saurí, R. Gaizauskas, A. Setzer & G. Katz, (2003a) "TimeML: Robust specification of event and temporal expressions in text". In Fifth International Workshop on Computational Semantics (IWCS-5).
[44] J. Pustejovsky, P. Hanks, R. Saurí, A. See, R. Gaizauskas, A. Setzer, D. Radev, B. Sundheim, D. Day, L. Ferro & M. Lazo, (2003b) "The TimeBank corpus". In Corpus Linguistics, Lancaster.
[45] R. Faiz, (2006) "Identifying relevant sentences in news articles for event information extraction". International Journal of Computer Processing of Oriental Languages (IJCPOL), World Scientific, vol. 19, no. 1, pp 1-19.
[46] A. Elkhlifi & R. Faiz, (2009) "Automatic Annotation Approach of Events in News Articles". International Journal of Computing and Information Sciences (IJCIS), vol. 7, no. 1, pp 19-28.
[47] R. Grishman & B. Sundheim, (1996) "Message Understanding Conference-6: A brief history". In the 16th Conference on Computational Linguistics, pp 466–471, Morristown, NJ, USA. Association for Computational Linguistics.
[48] L. Ferro, I. Mani, B. Sundheim & G. Wilson, (2001) "TIDES temporal annotation guidelines". Technical Report 1.0.2, The MITRE Corporation, Washington C3 Center, McLean, Virginia.
[49] A. Setzer & R. Gaizauskas, (2000) "Annotating events and temporal information in newswire texts". In the Second International Conference on Language Resources and Evaluation (LREC 2000), pp 1287–1294.
[50] S. Nunes, C. Ribeiro & G. David, (2008) "Use of temporal expressions in web search". In ECIR 2008, pp 580–584. Springer-Verlag, Berlin Heidelberg.
[51] A. Roberts, R. Gaizauskas & M. Hepple, (2008) "Extracting clinical relationships from patient narratives". In Proceedings of the Workshop on Current Trends in Biomedical Natural Language Processing (BioNLP 2008), pp 10–18, Columbus, Ohio, USA.
[52] F. Schilder, (2007) "Event extraction and temporal reasoning in legal documents". In Proceedings of Annotating, Extracting and Reasoning about Time, pp 59–71. LNAI vol. 4795/2007.
[53] I. E. Fisher, (2007) "A prototype system for temporal reconstruction of financial accounting standards". International Journal of Accounting Information Systems, vol. 8, issue 3, September 2007, pp 139–164.