Eskm20140903
1. Extraction of Key Expressions Indicating
the Important Sentence
from Article Abstracts
5th International Conference on E-Service and
Knowledge Management (ESKM 2014)
Shuhei Otani
Department of Library Science
Kyushu University
Yoichi Tomiura
Department of Library Science
Kyushu University
2. Table of Contents
• Background and Purpose
• Method for extracting key
expressions
• Experiments
• Result
• Summary and future plans
3. Background
• Academic information continues to
increase (doubling every 10
years).
• Disciplines have become subdivided.
• Interdisciplinary research is promoted.
Searching academic information
is costly.
4. How to reduce the expense of academic
information search
Title :
You’re not from 'round here, are you?
→ Does not represent the article’s contents
Abstract :
Native and non-native use of language differs, depending on the proficiency
of the speaker, in clear and quantifiable ways.
It has been shown that customizing the acoustic and language models of a
natural language understanding system can significantly improve handling
of non-native input;
in order to make such a switch, however, the nativeness status of the user
must be known.
In this paper, we show that naive Bayes classification can be used to
identify non-native utterances of English. The advantage of our method is
that it relies on text, not on acoustic features, and can be used when the
acoustic source is not available. …
→ Costly to read if there are many results.
→ Some sentences describe the article’s contents.
5. Purpose
Reduce the expense of academic information search
Important sentences:
sentences that describe the originality or contributions of the article
Ex.
• Present search results where the
important sentences are emphasized.
• Keyword suggestion
6. How do we extract important sentences?
• Almost all important sentences
include key expressions.
Key expression = cue expression: a word
sequence such that, if a sentence includes it,
the sentence is important.
e.g. “In this study, we aim to extract key expressions that indicate
the important sentence describing the originalities or contributions
from article abstracts.”
Goal: to collect an exhaustive set of key expressions
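The idea on this slide can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the tokenizer, the function name `contains_key_expression`, and the second key expression in the list are assumptions made for the example.

```python
import re

def tokenize(text):
    # Assumed tokenizer: lowercase alphanumeric runs, punctuation dropped.
    return re.findall(r"[a-z0-9]+", text.lower())

def contains_key_expression(sentence, key_expressions):
    """Return True if any key expression occurs as a contiguous word sequence."""
    words = tokenize(sentence)
    for expr in key_expressions:
        n = len(expr)
        if any(tuple(words[i:i + n]) == expr for i in range(len(words) - n + 1)):
            return True
    return False

# Hypothetical key expressions, written as tuples of lowercase words.
key_expressions = [("in", "this", "study"), ("we", "propose")]

abstract = [
    "Academic information continues to increase.",
    "In this study, we aim to extract key expressions from article abstracts.",
]

# Sentences flagged as important because they contain a key expression.
important = [s for s in abstract if contains_key_expression(s, key_expressions)]
```

With a short list like this, precision can be high while recall stays low, which is why the talk aims at an exhaustive list.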
7. Related work
• Nakawatase and Oyama (2011)
→ They made a list of key expressions for
Japanese articles by hand.
• Structuring abstracts and text
summarization
→ The cost of preparing training data is a
problem.
It is not realistic to prepare a list of key expressions or
training data for every discipline.
8. Table of Contents
• Background and Purpose
• Method for extracting key
expressions
• Experiments
• Result
• Summary and future plans
9. Extraction of important sentences
Exhaustive key expressions → important sentences (high precision, high recall)
Not realistic for every discipline:
• To collect exhaustive key expressions by hand
• To prepare an annotated corpus to extract key expressions through
machine learning
If we automatically collect a subset of important sentences
without key expressions:
Subset of important sentences → key expressions → important sentences (high precision, low recall)
A lot of important sentences → many key expressions → important sentences (high precision, high recall)
10. How do we collect a subset of important sentences
without key expressions?
• A sentence in an abstract that has many common
words with the title is more likely to be an important
sentence.
11. How do we collect a subset of important sentences
without key expressions?
Title :
EPUB as Publication Format in Open Access Journals:
Tools and Workflow
Abstract :
In this article, we present a case study of how the main publishing format of
an Open Access journal was changed from PDF to EPUB by designing a
new workflow using JATS as the basic XML source format.
We state the reasons and discuss advantages for doing this, how we did it,
and the costs of changing an established Microsoft Word workflow.
As an example, we use one typical sociology article with tables, illustrations
and references. We then follow the article from JATS markup through
different transformations resulting in XHTM.
12. How do we collect a subset of important sentences
without key expressions?
• A sentence in an abstract that has many common
words with the title is more likely to be an important
sentence.
• We call such sentences “Pseudo-important
sentences”
14. How to collect pseudo-important
sentences
Title :
EPUB as Publication Format in Open Access Journals:
Tools and Workflow
Abstract :
In this article, we present a case study of how the main publishing format of
an Open Access journal was changed from PDF to EPUB by designing a
new workflow using JATS as the basic XML source format.
We state the reasons and discuss advantages for doing this, how we did it,
and the costs of changing an established Microsoft Word workflow.
As an example, we use one typical sociology article with tables, illustrations
and references. We then follow the article from JATS markup through
different transformations resulting in XHTM.
W_T(A): set of words in the title of article A
W(s): set of words in sentence s
If the ratio of words common to W_T(A) and W(s) is ≧ α, s is
a pseudo-important sentence
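A minimal sketch of the pseudo-important test, applied to the abstract above. The slide does not preserve the exact form of the ratio, so the denominator used here (the number of title words) is an assumption, as are the tokenizer, the function names, and the absence of stop-word removal. α = 0.3 is the value listed in the experiment parameters.

```python
import re

def word_set(text):
    # Assumed tokenizer: lowercase alphanumeric runs, no stemming, no stop-word removal.
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def title_overlap(title, sentence):
    """Assumed ratio: share of title words W_T(A) that also occur in W(s)."""
    w_title = word_set(title)
    w_sentence = word_set(sentence)
    if not w_title:
        return 0.0
    return len(w_title & w_sentence) / len(w_title)

def pseudo_important(title, sentences, alpha=0.3):
    # Keep the sentences whose overlap with the title reaches the threshold alpha.
    return [s for s in sentences if title_overlap(title, s) >= alpha]

title = "EPUB as Publication Format in Open Access Journals: Tools and Workflow"
sentences = [
    "In this article, we present a case study of how the main publishing format of "
    "an Open Access journal was changed from PDF to EPUB by designing a "
    "new workflow using JATS as the basic XML source format.",
    "We state the reasons and discuss advantages for doing this, how we did it, "
    "and the costs of changing an established Microsoft Word workflow.",
    "As an example, we use one typical sociology article with tables, illustrations "
    "and references.",
    "We then follow the article from JATS markup through "
    "different transformations resulting in XHTM.",
]

selected = pseudo_important(title, sentences, alpha=0.3)
```

Under these assumptions only the first sentence passes: it shares 7 of the 11 title words, while the other sentences share at most 2.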
15. How do we extract key expressions from
pseudo-important sentences?
Features of key expressions
• Appear frequently in pseudo-important
sentences.
• Appear infrequently in non-pseudo-important
sentences.
→ Extract key expressions using the ratio of each
expression’s frequency in pseudo-important sentences to its frequency in all sentences
We remove expressions with a low frequency from the
candidates.
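The counting step behind these features might look like the sketch below. It assumes that candidate expressions are word n-grams and that "frequency" means the number of sentences containing the expression; the function names and the toy sentences are hypothetical.

```python
import re
from collections import Counter

def ngrams(sentence, n):
    # Assumed tokenizer: lowercase alphanumeric runs.
    tokens = re.findall(r"[a-z0-9]+", sentence.lower())
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def sentence_frequencies(sentences, n):
    """Count, for each n-gram, how many sentences contain it."""
    counts = Counter()
    for s in sentences:
        counts.update(set(ngrams(s, n)))  # each n-gram counted once per sentence
    return counts

def frequency_table(pseudo, non_pseudo, n):
    """Map each n-gram seen in pseudo-important sentences to
    (frequency in pseudo-important, frequency in non-pseudo-important)."""
    in_pseudo = sentence_frequencies(pseudo, n)
    in_other = sentence_frequencies(non_pseudo, n)
    return {expr: (freq, in_other[expr]) for expr, freq in in_pseudo.items()}

# Toy data, invented for illustration only.
pseudo = [
    "In this paper we propose a topic model.",
    "In this paper we study EPUB workflows.",
]
non_pseudo = [
    "Related work is not covered in this paper.",
    "Topic models are widely used.",
]

table = frequency_table(pseudo, non_pseudo, n=3)
```

Here `table[("in", "this", "paper")]` is `(2, 1)`: two pseudo-important sentences and one other sentence contain the expression.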
16. All sentences of abstracts
Expression          Pseudo-important sentences   Non-pseudo-important sentences
In this paper       45                           15
Topic model         45                           55
Latent Dirichlet    4                            1
17. All sentences of abstracts
Threshold values: δ = 0.5 (minimum ratio), γ = 10 (minimum frequency)
In this paper : 45 / (15 + 45) = 0.75 ≧ δ (0.5), and 45 ≧ γ (10)
→ Key expression
18. All sentences of abstracts
Topic model : 45 / (45 + 55) = 0.45 < δ (0.5)
→ Not a key expression
Latent Dirichlet : 4 < γ (10)
→ Not a key expression
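The arithmetic on slides 16 to 18 can be checked with a short script. The counts and thresholds are the ones shown on the slides; the function and variable names are assumptions for the example.

```python
# (frequency in pseudo-important sentences, frequency in non-pseudo-important sentences)
counts = {
    "in this paper": (45, 15),
    "topic model": (45, 55),
    "latent dirichlet": (4, 1),
}

DELTA = 0.5  # minimum ratio
GAMMA = 10   # minimum frequency in pseudo-important sentences

def is_key_expression(pseudo_freq, other_freq, delta=DELTA, gamma=GAMMA):
    total = pseudo_freq + other_freq
    if total == 0:
        return False
    ratio = pseudo_freq / total
    # Both conditions must hold: frequent enough, and concentrated in pseudo-important sentences.
    return pseudo_freq >= gamma and ratio >= delta

selected = [expr for expr, (p, o) in counts.items() if is_key_expression(p, o)]
```

Only "in this paper" is selected. "Latent Dirichlet" has a high ratio (4/5 = 0.8) but is rejected by the frequency threshold, which illustrates why low-frequency candidates are removed: a ratio computed from five sentences is not reliable.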
19. Table of Contents
• Background and Purpose
• Method for extracting key
expressions
• Experiments
• Result
• Summary and future plans
20. Experiment
Pseudo important
sentences
Key expressions
Important sentences
Dataset1
Dataset2
Evaluation
Precision and recall
22. Experiment
Parameters
• Ratio of common words α → 0.3
• Length of the key expressions N → 2, 3, 4
• Threshold of the ratio δ → 0.1, 0.3, 0.5
• Minimum frequency γ → 10
23. Table of Contents
• Background and Purpose
• Method for extracting key
expressions
• Experiments
• Result
• Summary and future plans
24. Result
N   δ     Precision (%)   Recall (%)
3   0.1   52.6            40.9
3   0.3   64.2            17.2
3   0.5   85.7            9.1
High precision and low recall
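For reference, precision and recall as used in this table can be computed as in the sketch below, using the standard definitions. The slides do not say how the reference set of important sentences was annotated, so the sentence IDs here are invented for illustration.

```python
def precision_recall(extracted, gold):
    """Standard precision and recall over sets of sentence IDs."""
    extracted, gold = set(extracted), set(gold)
    true_positives = len(extracted & gold)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall

# Hypothetical IDs: 10 truly important sentences, 3 extracted, 2 of them correct.
gold = range(1, 11)
extracted = [1, 2, 11]

precision, recall = precision_recall(extracted, gold)
```

In this toy case precision is 2/3 and recall is 2/10, the same high-precision, low-recall pattern that the table shows as δ increases.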
25. Samples of extracted key
expressions
• (in, this, study)
• (scheme, based, on)
→ Expected key expressions
• (atomic, absorption)
• (high, performance, liquid, chromatography)
→ Technical terms should be removed from the key
expressions.
26. Table of Contents
• Background and Purpose
• Method for extracting key
expressions
• Experiments
• Result
• Summary and future plans
27. Summary and future plans
• Goal: to extract key expressions
• Regard a sentence that shares many
words with the title as a pseudo-important
sentence.
• Extract key expressions based on the
ratio of their frequency in pseudo-important
sentences to their frequency in all sentences.
28. Summary and future plans
• The result shows high precision but low recall.
• Increase recall while keeping precision
high
• Increase the number of target abstracts used to extract
pseudo-important sentences.
• Remove technical terms from key
expressions
Editor's Notes
Hi, I’m Shuhei Otani.
I will talk about “Extraction of Key Expressions Indicating the Important Sentence from Article Abstracts”.
In this presentation, first I’m gonna talk about the background and purpose of the present work.
Recently, academic information has continued to increase.
A study reports that the number of articles doubles every 10 years.
Therefore we are faced with too many search results.
In addition, disciplines have become subdivided and interdisciplinary research is promoted by nations.
In this situation, researchers have to search for information outside their own discipline.
So searching academic information is very costly.
How can we reduce this expense of academic information search?
The search results are often composed of titles and abstracts.
Generally speaking, first we check the title, don’t we?
However, some article titles do not represent their contents, because the title is rhetorical or very short.
For example, “You’re not from 'round here, are you?”
We cannot understand the study from the title.
Next, we will check the abstracts.
However, reading whole abstracts is costly if there are too many search results.
If we can extract the sentences that describe an article’s contents,
we can reduce the expense of academic information search.
Our purpose is to reduce the expense of academic information searches by extracting “important sentences” from article abstracts.
An important sentence is one that describes the originality or contributions of the article in its abstract.
If we can extract important sentences, we can, for example, present search results in which the important sentences are emphasized, and we can suggest other keywords taken from the extracted important sentences.
That makes it easy for us to grasp the content of the articles.
How do we extract important sentences?
In fact, almost all important sentences include key expressions.
A key expression is the same as a cue expression.
Here we define a key expression as a word sequence such that, if a sentence includes it, the sentence is important.
For example, this sentence is probably important in an abstract.
And this word sequence, “in this study”, shows that the sentence is important.
So the purpose of our study is to collect an exhaustive set of key expressions.
There are some related works on the extraction of important sentences.
One of them is Nakawatase and Oyama.
They made a list of key expressions such that sentences including them are important, for their target abstracts in Japanese.
Then they extracted important sentences using the key expressions.
However, it is expensive to prepare many key expressions by hand.
Structuring abstracts and text summarization are related to the extraction of important sentences.
The high-performance methods used in both techniques are based on machine learning.
Therefore, the cost of preparing training data becomes a problem.
So it is not realistic to prepare a list of key expressions or training data for every discipline.
Next, I will explain our method for extracting key expressions.
Our final goal is the extraction of important sentences in abstracts.
We need an exhaustive set of key expressions to extract important sentences with high precision and high recall.
However, it is not realistic to collect exhaustive key expressions by hand for every discipline.
Nor is it realistic to prepare an annotated corpus to extract key expressions through machine learning.
If we automatically collect a subset of important sentences without key expressions,
we can extract key expressions from the collected important sentences.
And using these key expressions, we can extract important sentences with high precision but low recall.
If we can collect a lot of important sentences, we can extract many key expressions, and using them we can extract important sentences with high precision and high recall.
Now we explain how we collect a subset of important sentences without key expressions.
We presume that a sentence in an abstract that has many words in common with the title is more likely to be an important sentence.
As you see, this sentence has many words in common with the title, and you might think this sentence is important.
We extract important sentences using this tendency.
We call such sentences “pseudo-important sentences”.
This is the same abstract as I showed previously. Using it, I explain how to collect pseudo-important sentences.
This is the title and these are the sentences in the abstract.
W_T(A) is the set of words in the title of article A.
W(s) is the set of words in the sentence s.
If this value, the ratio of common words with the title, is higher than or equal to α (alpha), we regard s as a pseudo-important sentence.
How do we extract key expressions from pseudo-important sentences?
First we show some features of key expressions.
A key expression appears frequently in pseudo-important sentences and infrequently in non-pseudo-important sentences.
So we extract key expressions using the ratio of an expression’s frequency in pseudo-important sentences to its frequency in all sentences of the abstracts.
In addition, we remove expressions with a low frequency from the candidates, because their ratio is unreliable.
Specifically, in this case, the expressions “In this paper” and “Topic model” each appear 45 times, and the expression “Latent Dirichlet” appears 4 times, in pseudo-important sentences.
And in non-pseudo-important sentences, “In this paper” appears 15 times, “Topic model” 55 times, and “Latent Dirichlet” once.
We set some threshold values to extract key expressions from pseudo-important sentences.
One of them is δ, the minimum ratio of the frequency in pseudo-important sentences to the frequency in all sentences.
The other is γ, the minimum frequency.
We suppose that δ is set to 0.5 and γ is set to 10.
In the case of “In this paper”, the ratio is 0.75, which is greater than δ.
And the frequency is 45, which is greater than γ.
So the word sequence “In this paper” is extracted as a key expression.
In the case of “Topic model”, the ratio is 0.45, which is lower than δ, so it is not extracted as a key expression.
And in the case of “Latent Dirichlet”, the frequency is lower than γ, so it is not extracted as a key expression.
Next, I will talk about our experiment.
Finally, I will summarize and talk about future plans.
In this study, we aimed to extract key expressions that indicate the important sentence from article abstracts.
We performed an experiment with the following method.
We regard a sentence that shares many words with the article title as a pseudo-important sentence.
We extracted key expressions from article abstracts based on the ratio of the number of pseudo-important sentences that include them to the number of all sentences that include them.
Our result showed that the precision of extracting important sentences using the extracted key expressions was high, but recall was low, as we expected.
We will improve recall by increasing the number of target abstracts used to extract pseudo-important sentences and by removing technical terms from the key expressions.
That’s the end of my presentation.
Thank you for your time.