Extraction of Key Expressions Indicating 
the Important Sentence 
from Article Abstracts 
5th International Conference on ...
Table of Contents 
• Background and Purpose 
• Method for extracting key 
expressions 
• Experiments 
• Result 
• Summary ...
Background 
• Academic information continues to 
increase (doubling every 10 
years). 
• Disciplines have become subd...
How to reduce the expense of academic 
information search 
Title : 
Does not represent the contents 
You’re not from 'round h...
Purpose 
Reduce the expense of academic information search 
Important Sentences 
Describe the originalities or contributio...
How do we extract important sentences? 
• Almost all important sentences 
include key expressions. 
Key expression = cu...
Related works 
• Nakawatase and Oyama (2011) 
→ They made a list of key expressions for 
Japanese articles by hand. 
• Str...
Extraction of important sentences 
• To collect exhaustive key expressions by hand 
• To prepare annotated corpus to key ...
How do we collect a subset of important sentences 
without key expressions? 
• A sentence in an abstract that has many common 
...
How do we collect a subset of important sentences 
without key expressions? 
Title : 
EPUB as Publication Format in Open Access...
How to collect pseudo important 
sentences 
Title : 
EPUB as Publication Format in Open Access Journals: 
Tools and Workfl...
How do we extract key expressions from 
pseudo-important sentences? 
Features of key expressions 
• Appear frequently in pseud...
All sentences of abstracts 
Non-Pseudo-important 
sentence 
In this paper : 15 
Topic model : 55 
Pseudo-important 
senten...
All sentences of abstracts 
Topic model : ratio = 0.45 < δ (0.5) 
Latent Dirichlet : frequency 4 < γ (10) 
Non-pseudo-important sentence 
In this paper : 15 
In this paper...
Experiment 
Pseudo important 
sentences 
Key expressions 
Important sentences 
Dataset1 
Dataset2 
Evaluation 
Precision a...
Experiment 
• Dataset for extracting key expressions 
• 10,000 abstracts 
• Computer Science, Analytical Chemistry 
• Data...
Experiment 
Parameter 
• Ratio of common words α → 0.3 
• Length of the key expressions N → 2, 3, 4 
• Threshold of Ratio ...
Result 
N   δ     Precision (%)   Recall (%) 
3   0.1   52.6            40.9 
3   0.3   64.2            17.2 
3   0.5   85.7             9.1 
High precision and low recall
Samples of extracted key 
expressions 
• (in, this, study) 
• (scheme, based, on) 
→ Expected key expressions 
• (atomic,...
Summary and future plans 
• To extract key expressions 
• Regard a sentence that has many 
common words with title as a ps...
Summary and future plans 
• The result is high precision. 
• Increase recall while keeping high 
precision 
• Increase target abs...
Eskm20140903

Published on

http://aai2014.iaiai.org/eskm.html

Published in: Data & Analytics
  • Hi, I’m Shuhei Otani.
    I will talk about “Extraction of Key Expressions Indicating the Important Sentence from Article Abstracts”.
  • In this presentation, I will first talk about the background and purpose of the present work.
  • Recently, the amount of academic information has continued to increase.
    A study reports that the number of articles doubles every 10 years.
    Therefore, we are faced with too many search results.
    In addition, disciplines have become subdivided, and interdisciplinary research is promoted by nations.
    In this situation, researchers increasingly search for information outside their own discipline.
    (Click)
    So searching for academic information is expensive.
  • How can we reduce the expense of academic information search?
    Search results are often composed of titles and abstracts.
    Generally speaking, we check the title first, don’t we?
    However, some article titles do not represent their contents, because the title is rhetorical or very short.
    For example, “You’re not from 'round here, are you?”
    (Click)
    We cannot understand the study from the title.

    Next, we check the abstract.
    (Click) (Click)
    However, reading every abstract in full is expensive if there are too many search results.

    (Click)
    If we can extract the sentences that describe an article’s contents, we can reduce the expense of academic information search.

  • Our purpose is to reduce the expense of academic information searches by extracting “important sentences” from article abstracts.
    An important sentence is one that describes the originality or contributions of the article.
     
    If we can extract important sentences, we can, for example, present search results in which the important sentences are emphasized, and we can suggest other keywords taken from the extracted important sentences.
    This makes it easy to grasp the content of articles.
  • How do we extract important sentences?
    In fact, almost all important sentences include key expressions.
    A key expression is the same as a cue expression.
    Here, we define a key expression as a word sequence such that, if a sentence includes it, the sentence is important.
    For example, this sentence is probably important in an abstract.
    The word sequence “in this study” indicates that this sentence is important.
    (Click)
    So the purpose of our study is to collect key expressions exhaustively.
  • There are some related works on the extraction of important sentences.
    One of them is Nakawatase and Oyama (2011).
    They made a list of key expressions for their target abstracts in Japanese, such that sentences including them are important.
    Then they extracted important sentences using the key expressions.
    However, it is expensive to prepare many key expressions by hand.
     
    Structuring abstracts and text summarization are related to the extraction of important sentences.
    The high-performance methods for both are based on machine learning.
    Therefore, the cost of preparing training data becomes a problem.
    (Click)
    So it is not realistic to prepare a list of key expressions or training data for every discipline.
  • Next, I will explain our method for extracting key expressions.
  • Our final goal is the extraction of important sentences from abstracts.
    (Click)
    We need exhaustive key expressions to extract important sentences with high precision and high recall.
    (Click)
    However, it is not realistic to collect exhaustive key expressions by hand for every discipline.
    Nor is it realistic to prepare an annotated corpus to extract key expressions through machine learning.
    (Click)
    (Click)
    Suppose we can automatically collect a subset of important sentences without key expressions.
    Then we can extract key expressions from those sentences.
    Using these key expressions, we can extract important sentences with high precision but low recall.
    If we can collect a lot of important sentences, we can extract many key expressions, and using them we can extract important sentences with high precision and high recall.
  • How do we collect a subset of important sentences without key expressions?
    We presume that a sentence in an abstract that has many words in common with the title is more likely to be an important sentence.
  • As you can see, this sentence has many words in common with the title, and you might think this sentence is important.
  • We extract important sentences using this tendency.
    We call such sentences “pseudo-important sentences”.

  • This is the same abstract that I showed previously. Using it, I will explain how to collect pseudo-important sentences.

    This is the title, and these are the sentences in the abstract.
    WT(A) is the set of words in the title of article A.
    W(s) is the set of words in the sentence s.
  • If this value, the ratio of words in common with the title, is greater than or equal to α (alpha), we regard s as a pseudo-important sentence.
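
The rule above can be sketched in Python. This is only an illustration under stated assumptions: the regex tokenizer, the function names, the example title and sentences, and the choice to divide by the size of the union of the two word sets are all hypothetical. The default threshold of 0.3 is the value of α used in the experiment.

```python
import re

def words(text):
    """Set of lowercase word tokens (a simplistic tokenizer, assumed for this sketch)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def common_word_ratio(title, sentence):
    """Words shared by title and sentence, divided by all words in either (assumed denominator)."""
    wt, ws = words(title), words(sentence)
    union = wt | ws
    return len(wt & ws) / len(union) if union else 0.0

def is_pseudo_important(title, sentence, alpha=0.3):
    """A sentence is pseudo-important when its ratio is at least alpha."""
    return common_word_ratio(title, sentence) >= alpha

# Hypothetical title and sentences, made up for illustration.
title = "Topic Models for Short Text"
s1 = "We propose topic models for short text"
s2 = "Experiments show good results"
print(is_pseudo_important(title, s1), is_pseudo_important(title, s2))
```

A real system would probably also remove stop words or stem the tokens before comparing, but the talk does not say whether it does.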
  • How do we extract key expressions from pseudo-important sentences?
    First, we show some features of key expressions.
    A key expression appears frequently in pseudo-important sentences and infrequently in non-pseudo-important sentences.
    So we extract key expressions using the ratio of an expression’s frequency in pseudo-important sentences to its frequency in all sentences of the abstracts.
    In addition, we remove expressions with a low frequency from the candidates, because their ratio has low reliability.
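
As a rough sketch of how the frequencies might be gathered, the Python below counts word sequences of length n separately for pseudo-important and non-pseudo-important sentences. Whitespace tokenization, counting each expression at most once per sentence, the function names, and the toy sentences are all assumptions made for illustration.

```python
from collections import Counter

def ngrams(tokens, n):
    """All contiguous word sequences of length n."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def count_expressions(sentences, n):
    """Number of sentences that contain each length-n word sequence."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        counts.update(set(ngrams(tokens, n)))  # set(): count once per sentence
    return counts

# Toy sentences, made up for illustration.
pseudo = ["in this paper we propose a topic model", "in this paper we study recall"]
non_pseudo = ["a topic model is trained", "results appear in this paper"]
pseudo_counts = count_expressions(pseudo, 3)
non_pseudo_counts = count_expressions(non_pseudo, 3)
print(pseudo_counts[("in", "this", "paper")], non_pseudo_counts[("in", "this", "paper")])
```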
  • Specifically, in this case, the expressions “In this paper” and “Topic model” each appear 45 times, and the expression “Latent Dirichlet” appears 4 times, in pseudo-important sentences.
    In non-pseudo-important sentences, “In this paper” appears 15 times, “Topic model” 55 times, and “Latent Dirichlet” once.
  • We set threshold values to extract key expressions from pseudo-important sentences.
    One is δ, the minimum ratio of the frequency in pseudo-important sentences to the frequency in all sentences.
    The other is γ, the minimum frequency.

    Suppose that δ is set to 0.5 and γ is set to 10.
    In the case of “In this paper”, the ratio is 45 / (45 + 15) = 0.75, which is greater than δ.
    The frequency is 45, which is greater than γ.
    So the word sequence “In this paper” is extracted as a key expression.
  • In the case of “Topic model”, the ratio is 45 / (45 + 55) = 0.45, which is lower than δ, so it is not extracted as a key expression.
    In the case of “Latent Dirichlet”, the frequency is 4, which is lower than γ, so it is not extracted as a key expression.
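
The worked example above can be reproduced with a short Python sketch. The function name and the dictionary layout are hypothetical, and treating both thresholds as "greater than or equal to" is an assumption. The counts are the ones given in the talk.

```python
def select_key_expressions(pseudo_counts, non_pseudo_counts, delta=0.5, gamma=10):
    """Keep expressions whose frequency in pseudo-important sentences is at least gamma
    and whose ratio pseudo / (pseudo + non-pseudo) is at least delta."""
    selected = {}
    for expr, freq in pseudo_counts.items():
        total = freq + non_pseudo_counts.get(expr, 0)
        if total == 0:
            continue  # expression never observed; no ratio to compute
        ratio = freq / total
        if freq >= gamma and ratio >= delta:
            selected[expr] = ratio
    return selected

# Counts from the example in the talk.
pseudo_counts = {"in this paper": 45, "topic model": 45, "latent dirichlet": 4}
non_pseudo_counts = {"in this paper": 15, "topic model": 55, "latent dirichlet": 1}
print(select_key_expressions(pseudo_counts, non_pseudo_counts))
```

Only “in this paper” passes both tests. “Latent Dirichlet” has a high ratio (4 / 5) but is rejected by the minimum frequency, which is the reliability filter described earlier.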
  • Next, I will talk about our experiment.
  • Finally, I will summarize and talk about future plans.
  • In this study, we aimed to extract key expressions that indicate the important sentences in article abstracts.
    We performed an experiment with the following method.
    We regard a sentence that shares many words with the article title as a pseudo-important sentence.
    We extracted key expressions from article abstracts based on the ratio of the number of pseudo-important sentences that include an expression to the number of all sentences that include it.
  • Our results showed that the precision of important-sentence extraction using the extracted key expressions was high, but the recall was low, as we expected.
    We will improve recall by increasing the number of target abstracts used to extract pseudo-important sentences and by removing technical terms from the key expressions.
    That’s the end of my presentation.
    Thank you for your time.
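
To make the evaluation concrete, here is a minimal Python sketch of how precision and recall could be computed when a sentence is predicted important if it contains any extracted key expression. The function names, whitespace tokenization, and the toy sentences and gold labels are assumptions for illustration, not the evaluation data from the talk.

```python
def contains_key_expression(sentence, key_expressions):
    """True if any key expression (a tuple of words) occurs contiguously in the sentence."""
    tokens = sentence.lower().split()
    for expr in key_expressions:
        n = len(expr)
        if any(tuple(tokens[i:i + n]) == expr for i in range(len(tokens) - n + 1)):
            return True
    return False

def precision_recall(sentences, gold_labels, key_expressions):
    """Precision and recall of the 'contains a key expression' prediction."""
    predicted = [contains_key_expression(s, key_expressions) for s in sentences]
    true_positives = sum(p and g for p, g in zip(predicted, gold_labels))
    n_predicted, n_gold = sum(predicted), sum(gold_labels)
    precision = true_positives / n_predicted if n_predicted else 0.0
    recall = true_positives / n_gold if n_gold else 0.0
    return precision, recall

# Toy data, made up for illustration.
sentences = [
    "in this study we propose a method",
    "we evaluate a scheme based on rules",
    "the method is fast",
    "our contribution is a new corpus",
]
gold_labels = [True, False, False, True]
key_expressions = [("in", "this", "study")]
print(precision_recall(sentences, gold_labels, key_expressions))
```

With a single key expression, the one predicted sentence is correct but the second important sentence is missed, which mirrors the high-precision, low-recall pattern reported in the talk.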
  • Transcript of "Eskm20140903"

    1. Extraction of Key Expressions Indicating the Important Sentence from Article Abstracts. 5th International Conference on E-Service and Knowledge Management (ESKM 2014). Shuhei Otani, Department of Library Science, Kyushu University; Yoichi Tomiura, Department of Library Science, Kyushu University
    2. Table of Contents • Background and Purpose • Method for extracting key expressions • Experiments • Result • Summary and future plans
    3. Background • Academic information continues to increase (doubling every 10 years). • Disciplines have become subdivided. • Interdisciplinary research is promoted. Searching for academic information is expensive.
    4. How to reduce the expense of academic information search. Title (does not represent the contents): You’re not from 'round here, are you? Abstract: Native and non-native use of language differs, depending on the proficiency of the speaker, in clear and quantifiable ways. It has been shown that customizing the acoustic and language models of a natural language understanding system can significantly improve handling of non-native input; in order to make such a switch, however, the nativeness status of the user must be known. In this paper, we show that naive Bayes classification can be used to identify non-native utterances of English. The advantage of our method is that it relies on text, not on acoustic features, and can be used when the acoustic source is not available. … Reading all of it is expensive if there are many results. Describes the article’s contents.
    5. Purpose: Reduce the expense of academic information search. Important sentences: describe the originality or contributions. Ex. • Present search results where the important sentences are emphasized. • Keyword suggestion
    6. How do we extract important sentences? • Almost all important sentences include key expressions. Key expression = cue expression: a word sequence such that, if a sentence includes it, the sentence is important. e.g. “In this study, we aim to extract key expressions that indicate the important sentence describing the originalities or contributions from article abstracts.” To collect exhaustive key expressions
    7. Related works • Nakawatase and Oyama (2011) → They made a list of key expressions for Japanese articles by hand. • Structuring abstracts and text summarization → The cost of preparing training data is a problem. It is not realistic to prepare a list of key expressions or training data for every discipline.
    8. Table of Contents • Background and Purpose • Method for extracting key expressions • Experiments • Result • Summary and future plans
    9. Extraction of important sentences. Exhaustive key expressions (high precision, high recall) would require: • collecting exhaustive key expressions by hand, or • preparing an annotated corpus to extract key expressions through machine learning. If we automatically collect a subset of important sentences without key expressions…: important sentences → key expressions (high precision, low recall); a lot of important sentences → many key expressions (high precision, high recall).
    10. How do we collect a subset of important sentences without key expressions? • A sentence in an abstract that has many words in common with the title is more likely to be an important sentence.
    11. How do we collect a subset of important sentences without key expressions? Title: EPUB as Publication Format in Open Access Journals: Tools and Workflow. Abstract: In this article, we present a case study of how the main publishing format of an Open Access journal was changed from PDF to EPUB by designing a new workflow using JATS as the basic XML source format. We state the reasons and discuss advantages for doing this, how we did it, and the costs of changing an established Microsoft Word workflow. As an example, we use one typical sociology article with tables, illustrations and references. We then follow the article from JATS markup through different transformations resulting in XHTM.
    12. How do we collect a subset of important sentences without key expressions? • A sentence in an abstract that has many words in common with the title is more likely to be an important sentence. • We call such sentences “pseudo-important sentences”.
    13. How to collect pseudo-important sentences. Title: EPUB as Publication Format in Open Access Journals: Tools and Workflow. Abstract: In this article, we present a case study of how the main publishing format of an Open Access journal was changed from PDF to EPUB by designing a new workflow using JATS as the basic XML source format. We state the reasons and discuss advantages for doing this, how we did it, and the costs of changing an established Microsoft Word workflow. As an example, we use one typical sociology article with tables, illustrations and references. We then follow the article from JATS markup through different transformations resulting in XHTM.
    14. How to collect pseudo-important sentences. Title: EPUB as Publication Format in Open Access Journals: Tools and Workflow. Abstract: In this article, we present a case study of how the main publishing format of an Open Access journal was changed from PDF to EPUB by designing a new workflow using JATS as the basic XML source format. We state the reasons and discuss advantages for doing this, how we did it, and the costs of changing an established Microsoft Word workflow. As an example, we use one typical sociology article with tables, illustrations and references. We then follow the article from JATS markup through different transformations resulting in XHTM. Rule: |WT(A) ∩ W(s)| / |WT(A) ∪ W(s)| ≥ α ⇒ s is a pseudo-important sentence
    15. How do we extract key expressions from pseudo-important sentences? Features of key expressions • Appear frequently in pseudo-important sentences. • Appear infrequently in non-pseudo-important sentences. → Extract key expressions using the ratio of a key expression’s frequency. We remove expressions with a low frequency from the candidates.
    16. All sentences of abstracts. Pseudo-important sentences: In this paper: 45, Topic model: 45, Latent Dirichlet: 4. Non-pseudo-important sentences: In this paper: 15, Topic model: 55, Latent Dirichlet: 1
    17. All sentences of abstracts. Threshold values: δ (0.5), γ (10). In this paper: 45 / (15 + 45) = 0.75 ≧ δ (0.5) and 45 ≧ γ (10) → key expression. Pseudo-important sentences: In this paper: 45, Topic model: 45, Latent Dirichlet: 4. Non-pseudo-important sentences: In this paper: 15, Topic model: 55, Latent Dirichlet: 1
    18. All sentences of abstracts. Topic model: 45 / (45 + 55) = 0.45 < δ (0.5). Latent Dirichlet: 4 < γ (10). Pseudo-important sentences: In this paper: 45, Topic model: 45, Latent Dirichlet: 4. Non-pseudo-important sentences: In this paper: 15, Topic model: 55, Latent Dirichlet: 1
    19. Table of Contents • Background and Purpose • Method for extracting key expressions • Experiments • Result • Summary and future plans
    20. Experiment: Dataset1 → Pseudo-important sentences → Key expressions; Dataset2 → Important sentences → Evaluation (precision and recall)
    21. Experiment • Dataset for extracting key expressions: 10,000 abstracts (Computer Science, Analytical Chemistry) • Dataset for evaluation: 115 abstracts (Computer Science, Analytical Chemistry)
    22. Experiment: Parameters • Ratio of common words α → 0.3 • Length of the key expressions N → 2, 3, 4 • Threshold of ratio δ → 0.1, 0.3, 0.5 • Minimum frequency γ → 10
    23. Table of Contents • Background and Purpose • Method for extracting key expressions • Experiments • Result • Summary and future plans
    24. Result
        N   δ     Precision (%)   Recall (%)
        3   0.1   52.6            40.9
        3   0.3   64.2            17.2
        3   0.5   85.7             9.1
        High precision and low recall
    25. Samples of extracted key expressions • (in, this, study) • (scheme, based, on) → Expected key expressions • (atomic, absorption) • (high, performance, liquid, chromatography) → Technical terms are to be removed from key expressions.
    26. Table of Contents • Background and Purpose • Method for extracting key expressions • Experiments • Result • Summary and future plans
    27. Summary and future plans • To extract key expressions: • Regard a sentence that has many words in common with the title as a pseudo-important sentence. • Extract key expressions based on the ratio of the frequency in pseudo-important sentences to the frequency in all sentences.
    28. Summary and future plans • The result is high precision. • Increase recall while keeping high precision: • Increase the number of target abstracts used to extract pseudo-important sentences. • Remove technical terms from key expressions.