Exploiting inference to improve temporal RDF annotations and queries for machine reading
Robert C. Schrag
Haystax Technology
McLean, VA USA
bschrag@haystax.com
Abstract—We describe existing and anticipated future benefits of an end-to-end methodology for annotating formal RDF statements representing temporal knowledge to be extracted from text, as well as for authoring and validating test and/or application queries to exercise that knowledge. Extraction is driven by a target ontology of temporal and domain concepts supporting an intelligence analyst’s timeline tool. Both the tool and the methodology are supported at several points by an implemented temporal reasoning engine, in a way that we argue ultimately advances machine reading technology by increasing both sophistication and quality expectations about temporal annotations and extraction.
Index Terms—temporal knowledge representation and reasoning, extracting formal knowledge from text, machine reading, annotation interfaces and validation
I. INTRODUCTION
Machine reading—that is, automatic extraction of formal knowledge from natural language text—has been a longstanding goal of artificial intelligence. Effective extraction into RDF has the potential to make targeted knowledge accessible in the semantic web. We recently supported a large-scale evaluation of temporal knowledge extraction from text by providing an RDF/OWL ontology for target statements and a corresponding reasoning engine for query answering. Along the way, we discovered…
• How inference could improve annotation—the manual extraction of formal temporal statements—and question authoring for evaluation or for applications.
• How, coupled with annotation and question authoring processes, inference could ultimately drive more sophisticated machine reading capabilities.
II. TEMPORAL KNOWLEDGE REPRESENTATION AND
REASONING FOR TIMELINE DEVELOPMENT
Our temporal logic is based loosely on the event calculus
[10], as follows.
A time interval is a convex collection of time points—
intuitively, an unbroken segment of a time line. Time intervals
begin and end with time points, which may be constrained
relative to each other or relative to a calendar. The ontology
includes a rich set of relational properties over time points and
intervals, and the reasoning engine will calculate tightest
inferable bounds between any two points and will detect
contradictory time point relation sets.
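To make the bounds calculation concrete, here is a minimal, illustrative sketch (in Python, not the engine's AllegroGraph/Prolog implementation) of point-based bound tightening over difference constraints, in which a negative self-cycle signals a contradictory relation set. All names and numbers below are invented for illustration.

import math

def tightest_bounds(points, constraints):
    """constraints maps (i, j) to the maximum allowed value of t_j - t_i, in days."""
    n = len(points)
    idx = {p: k for k, p in enumerate(points)}
    d = [[0 if i == j else math.inf for j in range(n)] for i in range(n)]
    for (i, j), w in constraints.items():
        d[idx[i]][idx[j]] = min(d[idx[i]][idx[j]], w)
    # Floyd-Warshall-style relaxation tightens every pairwise bound.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    # A negative self-cycle means the entered point relations contradict each other.
    consistent = all(d[i][i] >= 0 for i in range(n))
    return d, consistent

points = ["P_birth", "P_begin", "P_end"]
constraints = {
    ("P_begin", "P_birth"): -1,   # birth precedes attendance start by at least one day
    ("P_end", "P_begin"): 0,      # attendance begins no later than it ends
    ("P_birth", "P_end"): 9235,   # attendance ends within 9235 days of birth (illustrative)
}
bounds, consistent = tightest_bounds(points, constraints)
# bounds[i][j] is now the tightest inferable upper bound on t_j - t_i, in days.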
A fluent is an object-level, domain statement (e.g., FluentA: attendsSchool(Jansa LjubljanaUniversity)) whose truth value is a function of time. It is taken to be true at time points where it holds and not to be true at time points where it does not hold. We reify a fluent in an observation—a meta-level statement whose object is a fluent, whose subject is a time interval, and whose predicate is a holds property (e.g., holdsThroughout(FluentA Interval1), when FluentA is observed over Interval1, corresponding to, say, September 1980).
The events of interest to us, which we call transition events, occur at individual time points and may cause one or more fluents to change truth value. We represent events (like the birth of Jansa) as objects with attribute properties like agent, patient, and location, and we relate events to time intervals with an occurs property (e.g., occursAt(BirthOfJansa Point2), where Point2 is associated with an interval corresponding to the date September 17, 1958). As usual with the event calculus, such events can initiate fluents (e.g., occursAt(BirthOfJansa Point2) initiates FluentB: alive(Jansa Interval3), where Interval3 is begun by Point2) or terminate them (e.g., DeathOfJansa…). The temporal reasoning engine implements appropriate axioms to perform fluent initiation and termination.
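As a rough illustration of initiation and termination (not the engine's actual axioms), the sketch below declares which event types initiate or terminate which fluents and opens or closes observation intervals accordingly; the event-type and fluent names are illustrative only.

# Which event types initiate / terminate which fluents (illustrative declarations).
INITIATES = {"BirthEvent": lambda e: ("alive", e["patient"])}
TERMINATES = {"DeathEvent": lambda e: ("alive", e["patient"])}

def apply_transition(event, observations):
    """event: dict with 'type', 'patient', 'point'.
    observations: list of (fluent, begin_point, end_point), None meaning an open end."""
    if event["type"] in INITIATES:
        fluent = INITIATES[event["type"]](event)
        observations.append((fluent, event["point"], None))   # interval begun by the event's point
    if event["type"] in TERMINATES:
        fluent = TERMINATES[event["type"]](event)
        observations[:] = [
            (f, b, event["point"]) if f == fluent and end is None else (f, b, end)
            for (f, b, end) in observations
        ]
    return observations

obs = apply_transition({"type": "BirthEvent", "patient": "Jansa", "point": "1958-09-17"}, [])
# obs == [(("alive", "Jansa"), "1958-09-17", None)]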
Note that an observer may report information about the temporal extent of a fluent without communicating anything about initiation or termination. E.g., if text says Abdul and Hasan lived next door to each other in Beirut in 1999, we don't know when Abdul or Hasan may have moved to or from Beirut. When text says Abdul moved to Beirut in 1995 and emigrated in 2007, we use the properties clippedBackward and clippedForward regarding the fluent residesInGPE-spec(Abdul BeirutLebanon) to indicate initiation and termination by anonymous (unrepresented) transition events, so that we can also initiate or terminate temporally under-constrained like-fluent observations (e.g., Abdul lived in Beirut during the 1990s).
The reasoning engine’s implementation, using
AllegroGraph, Allegro Prolog, and Allegro Common Lisp from
Franz, Inc., can answer any conjunctive query. While not yet
heavily optimized, it is at least fast enough to support machine
reading system evaluation over newspaper articles where cross-
document entity co-reference is not required.
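For intuition about what answering a conjunctive query involves, the following toy matcher (a sketch in Python; the engine itself uses AllegroGraph and Allegro Prolog) binds ?-prefixed variables against ground statements. The facts shown are simplified stand-ins for the extracted statements discussed later.

def is_var(t):
    return isinstance(t, str) and t.startswith("?")

def unify(pattern, fact, binding):
    """Extend binding so pattern matches fact, or return None on mismatch."""
    if len(pattern) != len(fact):
        return None
    b = dict(binding)
    for p, f in zip(pattern, fact):
        if is_var(p):
            if p in b and b[p] != f:
                return None
            b[p] = f
        elif p != f:
            return None
    return b

def answer(query, facts, binding=None):
    """Yield every variable binding satisfying all patterns in the conjunctive query."""
    binding = binding or {}
    if not query:
        yield binding
        return
    first, rest = query[0], query[1:]
    for fact in facts:
        b = unify(first, fact, binding)
        if b is not None:
            yield from answer(rest, facts, b)

facts = [("attendsSchool", "Janez_Jansa", "Ljubljana_University"),
         ("personHasTitleInOrganization", "Janez_Jansa", "Prime_Minister", "Slovenia")]
query = [("attendsSchool", "?person", "Ljubljana_University"),
         ("personHasTitleInOrganization", "?person", "?title", "?org")]
print(list(answer(query, facts)))
# [{'?person': 'Janez_Jansa', '?title': 'Prime_Minister', '?org': 'Slovenia'}]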
The combined extraction and reasoning capability was
conceived to support an intelligence analyst’s timeline tool in
which a GUI would be populated with statements about entities
(e.g., persons) of interest extracted from text. Our evaluation
of machine reading capabilities was based on queries similar to
those we would have expected from such a tool's API. It also supposed that analysts could formulate their own, arbitrary
questions, such as Query 1: Find all persons who were born in
Ljubljana in the 1950s and attended Ljubljana University in the
1980s, the titles that they held, the organizations in which they
held these titles, and the maximal known time periods over
which they attended and held these titles.
III. LESSONS LEARNED AND REALIZED IN IMPLEMENTATION
This indirect, query answering style of machine reading
evaluation makes it especially important that we perform
effective quality control of formal queries in the context of the
formal statements we expect to be extracted from answer-
bearing documents. We thus developed the test query
validation approach illustrated in Figure 1. Considering Query
1’s formalization (see Figure 10 in section IV.B), it’s worth
noting that we used the methodology illustrated here to debug a
number of subtle errors occurring in our earlier (manual)
formulations. When a formulation did not produce the expected answers, we traced inference to identify the point of failure, corrected it, and iterated until the query behaved as intended.
Our machine reading technologists told us early on that they preferred macro-level relational interfaces that would streamline away micro-level details of time points and intervals. We thus provide a language of flexible specification strings (spec strings) that expand to create time points, intervals, and relations inside our reasoning engine. We also provide ontology to associate the temporal aspects Completed, Ongoing, and Future with fluents (e.g., he used to live there vs. he is living there vs. he is going to live there) and with the reporting dates of given articles, to afford a relational interface reasonably close in form to natural language sources. For the Completed or Future aspects, we can also capture quantitative lag or lead information (e.g., he lived there five years ago or he will move five days from now).
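A hypothetical sketch of how such a macro-level aspect (plus optional lag or lead) might expand into point bounds relative to an article's reporting date follows; the function name and bound labels are illustrative inventions, not the actual spec-string grammar.

from datetime import date, timedelta

def expand_aspect(reporting_date, aspect, lag_days=None):
    """Return illustrative bounds on an observation interval, relative to the reporting date."""
    bounds = {}
    if aspect == "Completed":
        # The fluent no longer holds at reporting time; it ended on or before that date.
        end_upper = reporting_date
        if lag_days is not None:              # e.g. "five years ago" as an approximate lag
            end_upper = reporting_date - timedelta(days=lag_days)
        bounds["end_at_or_before"] = end_upper
    elif aspect == "Ongoing":
        # The fluent still holds at reporting time.
        bounds["begin_at_or_before"] = reporting_date
        bounds["end_at_or_after"] = reporting_date
    elif aspect == "Future":
        begin_lower = reporting_date
        if lag_days is not None:              # e.g. "five days from now"
            begin_lower = reporting_date + timedelta(days=lag_days)
        bounds["begin_at_or_after"] = begin_lower
    return bounds

print(expand_aspect(date(2007, 12, 28), "Ongoing"))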
IV. LESSONS LEARNED WITH IMPLEMENTATION PROPOSED
Here, we propose some further approach elements that we
expect to lead to high-quality temporal annotations,
including…
A. Interfaces and workflows deliberately designed to
support capture of all statements specified as
extraction targets (see section A)
B. Graphical time map display including fluents and
events (see section B)
• On-line inference to elucidate inter-
relationships and potential contradictions
• Visual feedback to let users help assure quality
themselves
• Time map-based widgets supporting user
knowledge entry
C. Technology adaptable to test or application question
authoring (see section C)
D. Quantitative temporal relation annotation evaluation
(see section V.A).
A. Annotation workflows
Fluents are simple statements that we can readily represent
in RDF. The example in Figure 2 focuses on the fluent about
one Janez Jansa attending Ljubljana University—only on the
fluent, not the full observation including temporal information
(i.e., only that Jansa attends the school, not when). The
technology needed to annotate such information is well
understood and (excepting perhaps the last bullet about
modality) has been well enough exercised that we may
routinely expect good results. This includes multi-frame GUIs where a user can produce stand-off annotations by highlighting text and by selecting relations, classes, and instances from drop-down boxes. In part because these tools have
preceded reading for formal knowledge extraction, they may
not use our intended representation internally—i.e., they may
for historical reasons internally use a representation (e.g., XML
that is not RDF) tailored to linguistic phenomena rather than
associated with any formal ontology.
Figure 1. We validate test queries by making sure that natural language (NL) and formal knowledge representation (KR)
versions of documents, queries, and answers agree, diagnosing and debugging as necessary.
[Figure 1 diagram labels: Given NL document, query, and answer(s), produce KR by manually authoring selected statements to support expected inference and by formalizing the query; apply the temporal reasoning engine to execute the query and get results; compare answers; diagnose KR&R issues.]
Figure 2 content. Source text: "…Jansa graduated from Ljubljana University…". Formalization: attendsSchool(Janez_Jansa Ljubljana_University).
Workflow: 1. Select relation. 2. Specify argument identifiers, respecting co-reference. 3. Select / highlight / designate corresponding text. 4. Capture any counter-factual modality info.
Figure 2. The workflow to annotate a fluent
Figure 3 content. Source text: "…Jansa graduated from Ljubljana University in 1984…" (reporting date Dec 28, 2007).
Workflow: 1. Select one of time interval or point. 2. Capture any beginning date and backward clipping info. 3. Capture any ending date and forward clipping info. 4. Capture any duration info. 5. If ending point is unconstrained w.r.t. reporting date: a. Capture reporting aspect. b. Capture any reporting lag info. 6. Capture any other relative temporal info available.
Entered info (user writes): attendsSchool(Janez_Jansa Ljubljana_University); fluent clipped forward at ending point; coarse ending date 1984.
Inferred info (user reads): ending point in [1984-01-01,1984-12-31]; beginning point in [,1984-12-31].
Figure 3. The workflow to annotate a holds statement (a fluent observation)
Capturing the temporal information associated with the
given observation of a fluent in a holds
statement requires
following a sequence of actions and decisions in a deliberately
designed workflow, as outlined in Figure 3. We have
highlighted, by color- and typeface-coding, some temporal
elements in the source text, along with corresponding steps in
the workflow and elements of the associated graphical
representation.
Addressing the workflow step by step, we see that:
• There is no reason to believe Jansa attended school for
only one day (the time unit associated with a time
“point” in our machine reading evaluation), so we
choose a time interval (and the predicate
holdsThroughout) rather than a time point (and
holdsAt). Schrag [8] argues that holdsAt is almost never appropriate, and in the future this step may be omitted.
• We find no beginning date information. (In the
absence of such information, there is no benefit to
asserting backward clipping.)
• We find (and have highlighted) a coarse-grained
ending date (1984). We indicate that our fluent is
clipped forward, assuming that Jansa no longer attends
the school after graduation.
• There is no duration information. (We don’t know
how long Jansa was at school.)
• The ending point is well before the reporting date, so
we skip to the next step.
• There is no other relevant temporal information.
• To indicate clipping, the graphic fills the time point
symbol (making it solid).
Our reasoning engine expands the entered coarse date 1984
into earliest and latest possible calendar dates bounding the
observation’s ending point. It also infers an upper bound on its
beginning time point.
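The following minimal sketch shows the kind of expansion involved: a coarse year becomes earliest and latest calendar dates for the ending point, and the latest possible ending date also serves as an upper bound on the beginning point. It is illustrative only, not the engine's code.

from datetime import date

def expand_coarse_year(year_text):
    """Expand a coarse year into earliest and latest possible calendar dates."""
    y = int(year_text)
    return date(y, 1, 1), date(y, 12, 31)

end_earliest, end_latest = expand_coarse_year("1984")
begin_upper_bound = end_latest        # the beginning point cannot follow the ending point
print(end_earliest, end_latest, begin_upper_bound)
# 1984-01-01 1984-12-31 1984-12-31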
As illustrated in Figure 4, we invoke a similar workflow for
event occurrence. Because our representation for events is
simpler than that for observations, this workflow has fewer
steps. Our ontology treats birth as a fluent transition event—it
occurs at a given time point, and it causes a transition of the
vital status of the person born (from FuturePerson
to Alive).
Our graphical representation here accordingly just depicts a
single time point (not an interval). We can use basically the
same workflow to capture a non-transition event (e.g., a legal
trial) that occurs over more than one time point.
Figure 4 content: BirthEvent(Janez_Jansa, Ljubljana).
Workflow: 1. Select event type. 2. Specify argument identifiers, respecting co-reference. 3. Select / highlight / designate corresponding text. 4. Capture any hypothetical modality info. 5. Capture any date info. 6. If an event's date is otherwise unconstrained w.r.t. reporting date: a. Capture reporting aspect. b. Capture any reporting lag info. 7. Capture any other relative temporal info available.
Figure 4. Workflow to annotate a transition event
Figure 5 content: BirthEvent(Janez_Jansa, Ljubljana) at 1958-09-17; attendsSchool(Janez_Jansa Ljubljana_University) with beginning point in [1958-09-18,1984-12-31], coarse ending date 1984, and duration in [0D,15Y3M12D].
Figure 5. A time map with both a fluent and a transition event
B. Displaying integrated time maps
Figure 5 illustrates a time map including both the birth
event and the school attendance fluent from earlier figures. It
also suggests functional requirements to be satisfied
automatically/by default and upon user demand. Note that we
now have automatically displayed—from on-line temporal
inference—a lower bound on the fluent observation’s
beginning date: Jansa could not have attended school until after
he was born. (The “day” time point granularity used in our
machine reading evaluation leads to some non-intuitive effects,
like not being alive until the day after one is born. We can
easily correct this using an interval constraint propagation
arithmetic including infinitesimals [6][7][8].) We’ve also
indicated bounds on the fluent observation’s duration
(calculated as ending date bounds interval minus starting date
bounds interval). Propagating effects like this can maximize
visual feedback to users, expanding their basis for quality
judgments about the information they enter. If any inferred
bound seemed odd, a user could click on it to identify which of
his/her own entered information (then highlighted in the
display) might be responsible. The time map display tool
would automatically launch such an interaction when it
detected a contradiction among inputs.
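As a sketch of the duration arithmetic just mentioned, the snippet below subtracts the beginning-point bounds from the ending-point bounds (clipping at zero); the dates follow the figures, though the day counts produced by this toy arithmetic need not match the figures' rendering exactly.

from datetime import date

def duration_bounds(begin_bounds, end_bounds):
    """Duration bounds = ending-point bounds minus beginning-point bounds, clipped at zero days."""
    (b_lo, b_hi), (e_lo, e_hi) = begin_bounds, end_bounds
    min_days = max(0, (e_lo - b_hi).days)   # shortest possible observation
    max_days = (e_hi - b_lo).days           # longest possible observation
    return min_days, max_days

begin = (date(1958, 9, 18), date(1984, 12, 31))   # inferred beginning-point bounds
end = (date(1984, 1, 1), date(1984, 12, 31))      # expansion of the coarse date 1984
print(duration_bounds(begin, end))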
The time map displayed in Figure 6 includes all the information from the source text that is necessary to answer Query 1. The last fluent observation (at bottom right, where Jansa is prime minister) exercises workflow steps that earlier time map elements don't. We have no ending date for this observation, but we do have present-tense reference to Jansa as the prime minister, so we appeal to the reporting aspect Ongoing. From the source text he was elected prime minister on November 9, 2004, we can bound the observation's beginning point.
[Figure callouts: source text "…Born on September 17, 1958 in Ljubljana, Jansa…"; 1958-09-17 is the date of the birth event; the display shows beginning points, ending points, and durations; rules applied: one can't attend school before being alive, and being born makes one alive.]
Figure 6 content: BirthEvent(Janez_Jansa, Ljubljana) at 1958-09-17; attendsSchool(Janez_Jansa Ljubljana_University) with beginning point in [1958-09-18,1984-12-31], coarse ending date 1984, and duration in [0D,9235D]; personHasTitleInOrganization(Janez_Jansa Defence_Minister Slovenia) over 1990 to 1994 and again over 2000 to 2000; ElectionEvent(Slovenia, Prime_Minister, 2004-11-09, Janez_Jansa) on Slovenia's national election day 2004, related to the following observation per user-established relative temporal reference; personHasTitleInOrganization(Janez_Jansa Prime_Minister Slovenia) with beginning point in [2004-11-09, 2007-12-28], the reporting date 2007-12-28 supplied via reporting aspect Ongoing.
Figure 6. A time map with more statements extracted from the same article
Figure 7 content: a widget in which the user selects the subject and object (via mouse) and a temporal relation, here relating ElectionEvent(Slovenia, Prime_Minister, 2004-11-09, Janez_Jansa) to personHasTitleInOrganization(Janez_Jansa Prime_Minister Slovenia).
Figure 7. Using the GUI to establish relative temporal reference
Our user also has entered the election event. An election is
not necessarily a fluent transition event, at least in that an
elected candidate does not always take office immediately. So,
we rely on the user to establish relative temporal reference
between the election event and the fluent observation’s
beginning. See the depicted constraint, whose entry is
illustrated in Figure 7. Establishing relative temporal reference
requires the selection of a pair of time points and/or intervals to
be related and of an appropriate temporal relation between
them. Here, we just need the time point at which the election
occurs to be less than or equal to the time point at which Jansa
takes office.
While a few common relations may be all that most users
will ever need, we do have a lot of relations [8] that a user
could in principle choose from. We should be able to provide
access to these effectively, so that our user is empowered without being overwhelmed.
Figure 8 shows formal statements that would be created
directly by the user’s actions (i.e., not also including those
created indirectly by inference) in entering the information
reflected in our finished time map. We have highlighted
fluents and some other key statements, each of which appears
near several related statements. Our time map represents
Jansa’s birth event in a non-standard way, repeated here in
different color type, beside the italicized, standard statements.
We have not similarly formalized the event of Jansa’s election
as PM, and Figure 8 includes just statements about that event’s
point of occurrence.
Clearly, we can do a lot of formal work for the user behind
the scenes.
F_school: attendsSchool (Janez_Jansa Ljubljana_University)
holdsThroughout(F_school I_school)
clippedForward(F_school I_school)
hasTimeIntervalSpecString(I_school [,1984])
hasPersonBorn(birth Janez_Jansa)
occursAt(birth P_birth)
hasTimePointSpecString(P_birth 1958-09-17)
hasPersonBorn(birth Janez_Jansa)
hasBirthEventGPE-spec(birth GPEspec)
hasCityTownOrVillage(GPEspec ljubljana_Ljubljana_Slovenia)
hasNationState(GPEspec Slovenia)
type(Defence_Minister MinisterTitle)
F_PTIO_DM_1: personHasTitleInOrganization(Janez_Jansa Defence_Minister Slovenia)
holdsThroughout(F_PTIO_DM_1 I_PTIO_DM_1)
clippedBackward(F_PTIO_DM_1 I_PTIO_DM_1)
clippedForward(F_PTIO_DM_1 I_PTIO_DM_1)
hasTimeIntervalSpecString(I_PTIO_DM_1 [1990,1994])
F_PTIO_DM_2: personHasTitleInOrganization(Janez_Jansa Defence_Minister Slovenia)
holdsThroughout(F_PTIO_DM_2 I_PTIO_DM_2)
clippedBackward(F_PTIO_DM_2 I_PTIO_DM_2)
clippedForward(F_PTIO_DM_2 I_PTIO_DM_2)
hasTimeIntervalSpecString(I_PTIO_DM_2 [2000,2000])
F_PTIO_PM: personHasTitleInOrganization(Janez_Jansa Prime_Minister Slovenia)
holdsThroughout(F_PTIO_PM I_PTIO_PM)
clippedBackward(F_PTIO_PM I_PTIO_PM)
hasBeginningTimePoint(I_PTIO_PM I_PTIO_PM_beginning)
hasTimePointSpecString(Slovenia_2004_Election_Day 2004-11-09)
timePointGreaterThanOrEqualTo(I_PTIO_PM_beginning Slovenia_2004_Election_Day)
hasReportingAspect(I_PTIO_PM Ongoing)
ref(annotation I_PTIO_PM)
annotation(document annotation)
hasReportingChronusSpecString(document 2007-12-28)
Figure 8. Formal statements associated with the time map in Figure 6
Query 1: Find all persons who were born in
Ljubljana in the 1950s and attended Ljubljana
University in the 1980s, the titles that they held,
the organizations in which they held these titles,
and the maximal known time periods over which
they attended and held these titles.
Figure 9 content: attendsSchool(?person Ljubljana_University) over interval ?attendanceIntervalSpec, constrained to hold within [1980-01-01,1989-12-31]; personHasTitleInOrganization(?person ?title ?org) over interval ?titleIntervalSpec.
Figure 9. The time map covering our Query 1
C. Adaptation to test or application question authoring
We might reuse much of the same machinery in a question
authoring interface, in which a user can formalize a query, as illustrated for Query 1 in Figure 9. This time map display is
even less cluttered than the one for this query’s supporting
statements, for a couple of reasons.
• We are making general statements, rather than specific
ones, so don’t use as many dates or long identifiers.
Rather, we use variables (here beginning with ?).
• We are asking about only one answer (set of variable
values satisfying the query) at a time. The supporting
statements in our earlier time map include three
separate sets of bindings for the variables ?title
and
?org.
We have introduced intervals to represent the 1950s and the
1980s, and we have selected time point/interval relationships
appropriate to the query’s conditions. These relationships are
associated with particular idioms used in our formalization in
Figure 10.
Figure 9 content (continued): BirthEvent(?person, Ljubljana) occurring within [1950-01-01,1959-12-31].
hasPersonBorn(?birth ?person)
hasBirthEventGPE-spec(?birth ?GPEspec)
hasCityTownOrVillage(?GPEspec ljubljana_Ljubljana_Slovenia)
hasTimeIntervalSpecString(?I_range_birth [1950-01-01,1959-12-31])
occursWithin(?birth ?I_range_birth)
hasTimeIntervalSpecString(?I_range_school [1980-01-01,1989-12-31])
holdsWithin(?F_school ?I_range_school)
maximallyHoldsThroughout(?F_school ?I_school)
hasTimeIntervalSpecString(?I_school ?attendanceIntervalSpec)
?F_title: personHasTitleInOrganization(?person ?title ?org)
maximallyHoldsThroughout(?F_title ?I_title)
hasTimeIntervalSpecString(?I_title ?titleIntervalSpec)
Figure 10. Formalization of the query in Figure 9’s time map, covering Query 1
Our query asks about the "maximal known time periods" over which the fluents hold, and we associate (via a query authoring workflow step) an "interval spec" variable with each fluent's observation interval. Per our formalization, this will be bound, on successful query execution, to a string that describes lower and upper bounds on the observation interval's beginning point, ending point, and duration. The formalization uses the properties occursWithin (for born in the 1950s) and holdsWithin (for attended school in the 1980s) to accommodate the temporal relations selected for the query authoring time map. We know to use maximallyHoldsThroughout (rather than the less restrictive holdsThroughout) for the fluents' observation intervals because the query's author has included (via the invoked widget) associated spec string variables.
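Purely as an illustration of what such a binding might look like, the sketch below assembles a human-readable interval spec from beginning, ending, and duration bounds; the output format shown is invented for illustration, not the engine's actual spec-string syntax.

def interval_spec(begin_bounds, end_bounds, duration_days):
    """Format bounds on an observation interval's beginning, ending, and duration."""
    return ("begin in [{},{}]; end in [{},{}]; duration in [{}D,{}D]"
            .format(begin_bounds[0], begin_bounds[1],
                    end_bounds[0], end_bounds[1],
                    duration_days[0], duration_days[1]))

print(interval_spec(("1958-09-18", "1984-12-31"),
                    ("1984-01-01", "1984-12-31"),
                    (0, 9601)))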
Thus, it appears that we might enable non-specialists to
author effective test queries (or, in a transition/application
setting, domain queries), without requiring the intervention of a
KR specialist. One angle on this proposed work might be to
determine the extent to which readers who are not (temporal)
knowledge representation specialists can perform such tasks
consistently—alternatively, to determine the amount of training
(e.g., pages of documentation, number of successfully
completed test exercises) required to qualify an otherwise-non-
specialist to perform the task well. That said, rather than "dumb down" the task to accommodate non-expert readers, we propose to ratchet up annotator performance expectations—to
achieve the highest-quality results possible so that we can drive
research regarding extraction of temporal knowledge by
machines from text to new levels of sophistication. The
machine reading researchers whose systems are under
evaluation quite reasonably ask, before they embark on a
mission of technological advancement, “Is this task feasible for
humans, with acceptable consistency?” We’d like to answer
that question in the best way that we can.
V. RELATED WORK AND PROPOSED ADVANCES
Beyond test questions and answers, the entire machine
reading community would benefit from having a large volume
of good temporal logic annotations available. Time is a key
topic in language understanding, engendering much current
community interest. TimeML [4], which emphasizes XML
annotation structures rather than RDF ontology and
relationships, has been used in the TempEval temporal
annotation activities (see, e.g., www.timeml.org/tempeval2/)
and advanced as an international standard [5]. We are
interested in exploring the synergy between this work and ours.
Others have applied limited temporal reasoning in post-
processing of temporal annotations, to…
A. Compute the closure of qualitative pairwise time
interval relations, as one step in assessing a machine
reader’s precision and recall performance (see section
A)
B. Ascertain the global coherence of captured qualitative
relations (see section B).
Our implementation can go further, as described below.
A. Quantitative temporal relation annotation evaluation
Evaluating temporal annotations typically has been limited
to (Allen’s [1]) qualitative relations (e.g., before, overlaps,
contains), and quantitative information about dates and
durations typically has been evaluated only locally—at the
level of temporal expressions (AKA “TIMEXs” [3]). The
reasoning applied has been strictly interval-based, neglecting
important quantitative information about dates and durations
widely available in text. This approach is taken by Setzer et al.
[9], e.g.
Our temporal reasoning engine, which is point-based,
naturally accommodates arbitrary bounds on the metric
durations that separate time points and uses global constraint
propagation to calculate earliest and latest possible dates/times
for any point (including the beginning and ending points of all
temporal intervals), as well as tightest bounds on durations.
This approach also usually affords sufficient global
perspective for a robust recall statistic. Adapting the standard
approach for evaluating interval relations [11], we can discard
from our gold standard annotations any redundant relations
until we determine a set spanning globally calculated bounds.
Then we can count members of this spanning set whose
addition to a user’s candidate set results in tightening of bounds
in the latter, to determine recall.
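The sketch below illustrates this recall idea on a toy point network (it is not the evaluation harness): a gold spanning relation counts as recalled when adding it to the candidate's constraints tightens nothing, i.e., the candidate already entails it, while relations whose addition tightens some bound were missed.

import math

def propagate(n, edges):
    """Tighten all pairwise upper bounds on t_j - t_i over points 0..n-1."""
    d = [[0 if i == j else math.inf for j in range(n)] for i in range(n)]
    for (i, j), w in edges.items():
        d[i][j] = min(d[i][j], w)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                d[i][j] = min(d[i][j], d[i][k] + d[k][j])
    return d

def recall(n, gold_spanning_edges, candidate_edges):
    """Fraction of gold spanning relations already entailed by the candidate network."""
    base = propagate(n, candidate_edges)
    recovered = 0
    for edge, w in gold_spanning_edges.items():
        with_edge = dict(candidate_edges)
        with_edge[edge] = min(with_edge.get(edge, math.inf), w)
        if propagate(n, with_edge) == base:   # adding the gold relation tightens nothing
            recovered += 1
    return recovered / len(gold_spanning_edges)

# Points 0, 1, 2; gold says t1 - t0 <= 10 and t2 - t1 <= 5; the candidate captured only the first.
print(recall(3, {(0, 1): 10, (1, 2): 5}, {(0, 1): 10}))   # 0.5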
Only when every member of a set of points is unrelated to
the calendar (i.e., we have only point ordering and interval
duration information) do we lack calendar bounds supporting
meaningful recall assessment. Then, however, by choosing any
point in a connected set to serve as a reference (in place of the
calendar), we can apply the same approach as above.
It may reasonably be argued that at some threshold of
representational complexity the brute force transitive closure-
and-spanning tree approach to computing recall and precision
of an extracted knowledge base (set of statements) must
become impractical. Our quantitative temporal statements are
certainly richer than the typical qualitative ones, and
(depending on knowledge base size) we may be pushing up
against this threshold with them. Our query answering
evaluation paradigm is more broadly applicable, presuming
inference over extracted statements remains tractable—for the
queries of interest.
B. Ascertaining global coherence
Waiting until annotation is done to infer bounds and detect contradictions neglects opportunities to give annotators (e.g., time map-based) feedback and receive their best-effort
corrections. As Bittar et al. [2] comment, “Manually
eliminating incoherencies is an arduous task, and performing
an online coherence check during annotation of relations would
be extremely useful in a manual annotation tool.” We propose
this.
VI. SUMMARY
We have outlined existing and anticipated future benefits of
an end-to-end methodology for…
• Annotating formal RDF statements representing
temporal knowledge to be extracted from text
• Authoring and validating test and/or application
queries to exercise that knowledge.
These capabilities are supported by an implemented temporal
reasoning engine. They and the engine are intended to support
a timeline tool conceived for use by intelligence analysts. We
have explained how these benefits can advance machine
reading technology by increasing both sophistication and
quality expectations about temporal annotations and extraction.
ACKNOWLEDGMENT
Thanks to other participants in DARPA’s Machine Reading
research program—especially to other members of the SAIC
evaluation team, including Global InfoTek (the author’s former
employer).
The views expressed are those of the author and do not
reflect the official policy or position of the Department of
Defense or the U.S. Government. (Approved for Public
Release, Distribution Unlimited)
REFERENCES
[1] J. Allen, “Maintaining knowledge about temporal intervals,” Communications of the ACM, vol. 26, pp. 832–843, November 1983.
[2] A. Bittar, P. Amsili, P. Denis, L. Danlos, “French TimeBank: An
ISO-TimeML annotated reference corpus,” in Proceedings of the
49th Annual Meeting of the Association for Computational
Linguistics, pp. 130–134, 2011.
[3] L. Ferro, L. Gerber, I. Mani, B. Sundheim, and G. Wilson,
“TIDES 2005 standard for the annotation of temporal
expressions,” MITRE Corporation, 2005.
[4] J. Pustejovsky et al., “TimeML: Robust specification of event and
temporal expressions in text,” AAAI Technical Report SS-03-07,
2003.
[5] J. Pustejovsky, K. Lee, H. Bunt, and L. Romary, “ISO-TimeML:
an international standard for semantic annotation,” in
Proceedings of the Seventh International Conference on
Language Resources and Evaluation (LREC), Malta. May 18-21,
2010.
[6] R. Schrag, J. Carciofini, and M. Boddy, “Beta-TMM Manual
(version b19),” Technical Report CS-R92-012, Honeywell SRC,
1992.
[7] R. Schrag, M. Boddy, and J. Carciofini. “Managing disjunction
for practical temporal reasoning,” in Principles of Knowledge
Representation and Reasoning: Proceedings of the Third
International Conference (KR-92), pp 36–46, 1992.
[8] R. Schrag, “Best-practice time point ontology for event calculus-
based temporal reasoning,” 7th International Conference on
Semantic Technologies for Intelligence, Defense, and Security
(STIDS), 2012.
[9] A. Setzer, R. Gaizauskas, and M. Hepple, “The role of inference
in the temporal annotation and analysis of text,” Language
Resources and Evaluation v. 39, pp. 243–265, 2005.
[10] M. Shanahan, “The event calculus explained,” in Artificial Intelligence Today, M. Wooldridge and M. Veloso, Eds., Springer Lecture Notes in Artificial Intelligence no. 1600, pp. 409–430, 1999.
[11] X. Tannier and P. Muller, “Evaluation metrics for automatic temporal annotation of texts,” in Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC’08), 2008.