Eskm20140903

Published on

http://aai2014.iaiai.org/eskm.html

Published in: Data & Analytics

  1. Extraction of Key Expressions Indicating the Important Sentence from Article Abstracts
     5th International Conference on E-Service and Knowledge Management (ESKM 2014)
     Shuhei Otani, Department of Library Science, Kyushu University
     Yoichi Tomiura, Department of Library Science, Kyushu University
  2. Table of Contents
     • Background and Purpose
     • Method for extracting key expressions
     • Experiments
     • Results
     • Summary and future plans
  3. Background
     • The volume of academic information keeps increasing (doubling every 10 years).
     • Disciplines have become subdivided.
     • Interdisciplinary research is promoted.
     As a result, searching for academic information has become costly.
  4. How to reduce the expense of academic information search
     Title: You’re not from 'round here, are you?
     → Titles do not always represent an article's contents.
     Abstract: Native and non-native use of language differs, depending on the proficiency of the speaker, in clear and quantifiable ways. It has been shown that customizing the acoustic and language models of a natural language understanding system can significantly improve handling of non-native input; in order to make such a switch, however, the nativeness status of the user must be known. In this paper, we show that naive Bayes classification can be used to identify non-native utterances of English. The advantage of our method is that it relies on text, not on acoustic features, and can be used when the acoustic source is not available. …
     → Abstracts describe an article's contents, but reading them is costly when a search returns many results.
  5. Purpose
     Reduce the expense of academic information search.
     Important sentences: sentences that describe the originality or contributions of an article.
     Example uses:
     • Present search results in which the important sentences are emphasized.
     • Keyword suggestion.
  6. How do we extract important sentences?
     • Almost all important sentences include key expressions.
     • Key expression (= cue expression): a word sequence such that, if a sentence includes it, the sentence is important.
       e.g. “In this study, we aim to extract key expressions that indicate the important sentence describing the originalities or contributions from article abstracts.”
     → We need to collect key expressions exhaustively.
  7. Related work
     • Nakawatase and Oyama (2011)
       → They made a list of key expressions for Japanese articles by hand.
     • Structuring abstracts and text summarization
       → The cost of preparing training data is a problem.
     It is not realistic to prepare a list of key expressions or training data for every discipline.
  9. Extraction of important sentences
     Two ways to obtain exhaustive key expressions:
     • Collect exhaustive key expressions by hand.
     • Prepare an annotated corpus and learn key expressions through machine learning.
     Many key expressions → a lot of important sentences: high precision, high recall.
     If we automatically collect a subset of important sentences without key expressions…: high precision, low recall.
  10. How do we collect a subset of important sentences without key expressions?
     • A sentence in an abstract that shares many words with the title is more likely to be an important sentence.
  11. How do we collect a subset of important sentences without key expressions?
     Title: EPUB as Publication Format in Open Access Journals: Tools and Workflow
     Abstract: In this article, we present a case study of how the main publishing format of an Open Access journal was changed from PDF to EPUB by designing a new workflow using JATS as the basic XML source format. We state the reasons and discuss advantages for doing this, how we did it, and the costs of changing an established Microsoft Word workflow. As an example, we use one typical sociology article with tables, illustrations and references. We then follow the article from JATS markup through different transformations resulting in XHTM.
  12. How do we collect a subset of important sentences without key expressions?
     • A sentence in an abstract that shares many words with the title is more likely to be an important sentence.
     • We call such sentences “pseudo-important sentences”.
  13. How to collect pseudo-important sentences
     Title: EPUB as Publication Format in Open Access Journals: Tools and Workflow
     Abstract: In this article, we present a case study of how the main publishing format of an Open Access journal was changed from PDF to EPUB by designing a new workflow using JATS as the basic XML source format. We state the reasons and discuss advantages for doing this, how we did it, and the costs of changing an established Microsoft Word workflow. As an example, we use one typical sociology article with tables, illustrations and references. We then follow the article from JATS markup through different transformations resulting in XHTM.
  14. How to collect pseudo-important sentences
     Title: EPUB as Publication Format in Open Access Journals: Tools and Workflow
     Abstract: In this article, we present a case study of how the main publishing format of an Open Access journal was changed from PDF to EPUB by designing a new workflow using JATS as the basic XML source format. We state the reasons and discuss advantages for doing this, how we did it, and the costs of changing an established Microsoft Word workflow. As an example, we use one typical sociology article with tables, illustrations and references. We then follow the article from JATS markup through different transformations resulting in XHTM.
     Rule: a sentence s is a pseudo-important sentence when the ratio of words it shares with the title T, computed from the word sets W(T) and W(s), is at least the threshold α.
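
The selection rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the regex tokenizer, the function names `words` and `pseudo_important`, and the choice to divide the number of shared words by the number of title words are all assumptions, since the slides give only the threshold α (0.3 in the experiments) and not the exact normalization.

```python
import re


def words(text):
    """Return the set of lowercased word tokens (regex tokenization is an assumption)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def pseudo_important(title, sentences, alpha=0.3):
    """Keep sentences whose overlap with the title reaches alpha.

    Assumed normalization: |W(T) & W(s)| / |W(T)|.
    """
    title_words = words(title)
    selected = []
    for sentence in sentences:
        shared = title_words & words(sentence)
        if title_words and len(shared) / len(title_words) >= alpha:
            selected.append(sentence)
    return selected


title = "EPUB as Publication Format in Open Access Journals: Tools and Workflow"
abstract = [
    "In this article, we present a case study of how the main publishing "
    "format of an Open Access journal was changed from PDF to EPUB by "
    "designing a new workflow using JATS as the basic XML source format.",
    "We state the reasons and discuss advantages for doing this, how we did "
    "it, and the costs of changing an established Microsoft Word workflow.",
    "As an example, we use one typical sociology article with tables, "
    "illustrations and references.",
]

selected = pseudo_important(title, abstract)
for sentence in selected:
    print(sentence)
```

No stop-word removal or stemming is applied here, so function words such as "in" and "as" count toward the overlap; a real implementation would probably need to decide how to treat them.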
  15. How do we extract key expressions from pseudo-important sentences?
     Features of key expressions:
     • They appear frequently in pseudo-important sentences.
     • They appear infrequently in non-pseudo-important sentences.
     → Extract key expressions using the ratio of an expression's frequency in pseudo-important sentences to its frequency in all sentences.
     Expressions with a low frequency are removed from the candidates.
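
Counting candidate expressions could look roughly like the sketch below. It assumes that candidates are contiguous word N-grams (the experiments use lengths N = 2, 3, 4) and that every occurrence is counted; the helper names `ngrams` and `count_ngrams` and the regex tokenizer are hypothetical and not taken from the slides.

```python
import re
from collections import Counter


def ngrams(sentence, n):
    """Return all contiguous word n-grams of a sentence as tuples."""
    tokens = re.findall(r"[a-z0-9]+", sentence.lower())
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_ngrams(sentences, n):
    """Count every n-gram occurrence over a list of sentences."""
    counts = Counter()
    for sentence in sentences:
        counts.update(ngrams(sentence, n))
    return counts


# Toy sentences, made up for illustration only.
pseudo_sentences = [
    "In this study, we propose a scheme based on topic models.",
    "In this study, we evaluate the scheme.",
]
pseudo_counts = count_ngrams(pseudo_sentences, 3)
print(pseudo_counts[("in", "this", "study")])
```

Running the same counter over the non-pseudo-important sentences gives the second set of frequencies needed for the ratio.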
  16. All sentences of abstracts: expression frequencies
     Expression          Pseudo-important sentences   Non-pseudo-important sentences
     In this paper       45                           15
     Topic model         45                           55
     Latent Dirichlet    4                            1
  17. All sentences of abstracts: threshold values δ = 0.5 (ratio) and γ = 10 (minimum frequency)
     • “In this paper”: 45 in pseudo-important sentences, 15 in non-pseudo-important sentences.
       Ratio 45 / (15 + 45) = 0.75 ≧ δ (0.5), and frequency 45 ≧ γ (10) → key expression.
     • “Topic model”: 45 in pseudo-important sentences, 55 in non-pseudo-important sentences.
     • “Latent Dirichlet”: 4 in pseudo-important sentences, 1 in non-pseudo-important sentences.
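  18. All sentences of abstracts: expressions rejected by the thresholds
     • “Topic model”: 45 in pseudo-important sentences, 55 in non-pseudo-important sentences.
       Ratio 45 / (45 + 55) = 0.45 < δ (0.5) → not a key expression.
     • “Latent Dirichlet”: 4 in pseudo-important sentences, 1 in non-pseudo-important sentences.
       Frequency 4 < γ (10) → not a key expression.
     • “In this paper”: 45 in pseudo-important sentences, 15 in non-pseudo-important sentences.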
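
The threshold test from the worked example can be written as a small filter. This is a minimal sketch, assuming that γ is compared against the frequency in pseudo-important sentences (as the slide figures 45 ≧ γ and 4 < γ suggest) and that δ is compared against pseudo / (pseudo + non-pseudo); the function name `select_key_expressions` is hypothetical.

```python
def select_key_expressions(pseudo_counts, other_counts, delta=0.5, gamma=10):
    """Return {expression: ratio} for expressions passing both thresholds.

    pseudo_counts: frequency in pseudo-important sentences
    other_counts:  frequency in non-pseudo-important sentences
    """
    selected = {}
    for expression, pseudo_freq in pseudo_counts.items():
        if pseudo_freq < gamma:
            continue  # too rare to be a reliable candidate
        total = pseudo_freq + other_counts.get(expression, 0)
        ratio = pseudo_freq / total
        if ratio >= delta:
            selected[expression] = ratio
    return selected


# Frequencies taken from the slide example.
pseudo_counts = {"in this paper": 45, "topic model": 45, "latent dirichlet": 4}
other_counts = {"in this paper": 15, "topic model": 55, "latent dirichlet": 1}

result = select_key_expressions(pseudo_counts, other_counts)
print(result)  # {'in this paper': 0.75}
```

With the slide's numbers, "topic model" fails the ratio test (0.45) and "latent dirichlet" fails the frequency test (4), so only "in this paper" survives.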
  20. Experiment
     • Dataset 1 → pseudo-important sentences → key expressions
     • Dataset 2 + key expressions → important sentences
     • Evaluation: precision and recall
  21. Experiment
     • Dataset for extracting key expressions
       • 10,000 abstracts
       • Computer Science, Analytical Chemistry
     • Dataset for evaluation
       • 115 abstracts
       • Computer Science, Analytical Chemistry
  22. Experiment
     Parameters
     • Ratio of common words α → 0.3
     • Length of the key expressions N → 2, 3, 4
     • Threshold of the ratio δ → 0.1, 0.3, 0.5
     • Minimum frequency γ → 10
  24. Result
     N   δ     Precision (%)   Recall (%)
     3   0.1   52.6            40.9
     3   0.3   64.2            17.2
     3   0.5   85.7            9.1
     High precision and low recall.
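
For reference, precision and recall over extracted sentences can be computed as in the sketch below. The slides do not describe the evaluation code, so the function name `precision_recall`, the use of sentence identifiers, and the toy values are assumptions for illustration only.

```python
def precision_recall(predicted, gold):
    """Compute precision and recall for two collections of sentence ids."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Toy example: sentences 1 and 2 contain a key expression,
# while annotators marked sentences 1, 3, 4 and 5 as important.
precision, recall = precision_recall({1, 2}, {1, 3, 4, 5})
print(precision, recall)  # 0.5 0.25
```

A stricter δ keeps fewer key expressions, which shrinks the predicted set; that is consistent with the pattern in the table, where precision rises and recall falls as δ grows.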
  25. Samples of extracted key expressions
     Expected key expressions:
     • (in, this, study)
     • (scheme, based, on)
     Technical terms extracted by mistake:
     • (atomic, absorption)
     • (high, performance, liquid, chromatography)
     → Technical terms need to be removed from the key expressions.
  27. Summary and future plans
     To extract key expressions, we:
     • Regard a sentence that shares many words with the title as a pseudo-important sentence.
     • Extract key expressions based on the ratio of an expression's frequency in pseudo-important sentences to its frequency in all sentences.
  28. Summary and future plans
     • The result shows high precision.
     • Future work: increase recall while keeping precision high.
       • Increase the number of abstracts used to extract pseudo-important sentences.
       • Remove technical terms from the key expressions.
