Extraction of Key Expressions Indicating 
the Important Sentence 
from Article Abstracts 
5th International Conference on E-Service and 
Knowledge Management (ESKM 2014) 
Shuhei Otani 
Department of Library Science 
Kyushu University 
Yoichi Tomiura 
Department of Library Science 
Kyushu University
Table of Contents 
• Background and Purpose 
• Method for extracting key 
expressions 
• Experiments 
• Result 
• Summary and future plans
Background 
• The volume of academic information continues to 
increase (doubling every 10 years). 
• Disciplines have become subdivided. 
• Interdisciplinary research is promoted. 
→ Searching academic information is costly.
How to reduce the cost of academic 
information search 
Title (does not always represent the article's contents): 
You’re not from 'round here, are you? 
Abstract (describes the article's contents): 
Native and non-native use of language differs, depending on the proficiency 
of the speaker, in clear and quantifiable ways. 
It has been shown that customizing the acoustic and language models of a 
natural language understanding system can significantly improve handling 
of non-native input; 
in order to make such a switch, however, the nativeness status of the user 
must be known. 
In this paper, we show that naive Bayes classification can be used to 
identify non-native utterances of English. The advantage of our method is 
that it relies on text, not on acoustic features, and can be used when the 
acoustic source is not available. … 
→ Reading every abstract is costly when there are many results.
Purpose 
Reduce the cost of academic information search by using important sentences. 
Important sentences: sentences that describe the originality or 
contributions of an article. 
Ex. 
• Present search results in which the 
important sentences are emphasized. 
• Keyword suggestion
How do we extract important sentences? 
• Almost all important sentences 
include key expressions. 
Key expression = cue expression: a word 
sequence such that, if a sentence includes it, 
the sentence is important. 
e.g. “In this study, we aim to extract key expressions that indicate 
the important sentence describing the originalities or contributions 
from article abstracts.” 
→ We need to collect key expressions exhaustively.
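The definition above (a sentence is important if it includes a key expression) can be sketched as a simple word-sequence match. This is only an illustration: the tokenizer and the function names are assumptions, and of the two key expressions used, ("in", "this", "study") appears among the extracted samples in the slides while ("we", "aim", "to") is hypothetical.

```python
import re

def tokenize(text):
    """Lowercase a sentence and split it into word tokens (a simplifying assumption)."""
    return re.findall(r"[a-z0-9']+", text.lower())

def contains_key_expression(sentence, key_expressions):
    """Return True if any key expression occurs as a contiguous word sequence."""
    tokens = tokenize(sentence)
    for expr in key_expressions:
        n = len(expr)
        for i in range(len(tokens) - n + 1):
            if tuple(tokens[i:i + n]) == tuple(expr):
                return True
    return False

# ("we", "aim", "to") is a hypothetical key expression used only for this demo.
key_expressions = [("in", "this", "study"), ("we", "aim", "to")]
abstract = [
    "Academic information continues to increase.",
    "In this study, we aim to extract key expressions from article abstracts.",
]
important = [s for s in abstract if contains_key_expression(s, key_expressions)]
print(important)
```

Only the second sentence is kept, because it contains a listed word sequence. The quality of this step depends entirely on how complete the list of key expressions is.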
Related works 
• Nakawatase and Oyama (2011) 
→ They made a list of key expressions for 
Japanese articles by hand. 
• Structuring abstracts and text 
summarization 
→ The cost of preparing training data is a 
problem. 
It is not realistic to prepare a list of key expressions or 
training data for every discipline.
Extraction of important sentences 
• Collect key expressions exhaustively by hand. 
• Prepare an annotated corpus and obtain key expressions through 
machine learning. 
With exhaustive key expressions: many key expressions → a lot of important 
sentences (high precision, high recall). 
If we automatically collect a subset of important sentences 
without key expressions… (high precision, low recall)
How do we collect a subset of important sentences 
without key expressions? 
• A sentence in an abstract that shares many 
words with the title is more likely to be an important 
sentence.
Title : 
EPUB as Publication Format in Open Access Journals: 
Tools and Workflow 
Abstract : 
In this article, we present a case study of how the main publishing format of 
an Open Access journal was changed from PDF to EPUB by designing a 
new workflow using JATS as the basic XML source format. 
We state the reasons and discuss advantages for doing this, how we did it, 
and the costs of changing an established Microsoft Word workflow. 
As an example, we use one typical sociology article with tables, illustrations 
and references. We then follow the article from JATS markup through 
different transformations resulting in XHTM.
• We call such sentences “pseudo-important 
sentences”.
How to collect pseudo-important 
sentences 
For the title T and each sentence s of the abstract, let W(T) and W(s) be 
their sets of words. 
If the ratio of common words, computed from W(T) ∩ W(s), is at least the 
threshold α, then s is a pseudo-important sentence.
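A minimal sketch of this selection step, under stated assumptions, might look as follows. The regex tokenizer, the function names, and the choice to divide the number of common words by the number of title words are all assumptions of this sketch, not details confirmed by the slides. The default α = 0.3 is the value listed among the experiment parameters.

```python
import re

def words(text):
    """Set of lowercase word tokens in a text (simple regex tokenizer, an assumption)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def common_word_ratio(title, sentence):
    """Share of title words that also occur in the sentence.

    Dividing by the number of title words is an assumption made for this sketch.
    """
    title_words = words(title)
    if not title_words:
        return 0.0
    return len(title_words & words(sentence)) / len(title_words)

def pseudo_important_sentences(title, sentences, alpha=0.3):
    """Sentences whose common-word ratio with the title is at least alpha."""
    return [s for s in sentences if common_word_ratio(title, s) >= alpha]

title = "EPUB as Publication Format in Open Access Journals: Tools and Workflow"
sentences = [
    "In this article, we present a case study of how the main publishing format of "
    "an Open Access journal was changed from PDF to EPUB by designing a new workflow "
    "using JATS as the basic XML source format.",
    "As an example, we use one typical sociology article with tables, illustrations "
    "and references.",
]
for s in pseudo_important_sentences(title, sentences):
    print(s)
```

With the EPUB example, only the first sentence passes the threshold. Function words such as "as" and "in" are counted here; whether they should be filtered out is not stated in the slides.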
How do we extract key expressions from 
pseudo-important sentences? 
Features of key expressions 
• They appear frequently in pseudo-important 
sentences. 
• They appear infrequently in non-pseudo-important 
sentences. 
→ Extract key expressions using the ratio of each expression's 
frequency in pseudo-important sentences to its frequency in all sentences. 
Expressions with a low frequency are removed from the 
candidates.
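The frequencies that this step relies on could be gathered by counting word n-grams separately in the two groups of sentences. The sketch below assumes that candidate expressions are contiguous word sequences of a fixed length n and uses a simple regex tokenizer. The function names and the toy sentences are hypothetical.

```python
import re
from collections import Counter

def ngrams(sentence, n):
    """All contiguous n-word sequences of a sentence, as tuples of lowercase tokens."""
    tokens = re.findall(r"[a-z0-9]+", sentence.lower())
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def count_ngrams(sentences, n):
    """Frequency of every n-gram over a list of sentences."""
    counts = Counter()
    for sentence in sentences:
        counts.update(ngrams(sentence, n))
    return counts

# Toy sentences, invented for this illustration.
pseudo = [
    "In this paper we propose a topic model.",
    "In this paper we evaluate a parser.",
]
non_pseudo = [
    "A topic model is a statistical model.",
    "Details are given in this paper.",
]
pseudo_counts = count_ngrams(pseudo, 3)
other_counts = count_ngrams(non_pseudo, 3)
expr = ("in", "this", "paper")
print(expr, pseudo_counts[expr], other_counts[expr])
```

A `Counter` returns 0 for an expression it has never seen, so an expression that occurs only in pseudo-important sentences needs no special handling.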
Example: frequencies of candidate expressions in all sentences of abstracts 
Expression          Pseudo-important   Non-pseudo-important   Ratio               Result 
In this paper       45                 15                     45/(15+45) = 0.75   key expression: 45 ≧ γ (10) and 0.75 ≧ δ (0.5) 
Topic model         45                 55                     45/(45+55) = 0.45   rejected: 0.45 < δ (0.5) 
Latent Dirichlet    4                  1                      -                   rejected: 4 < γ (10) 
Threshold values: γ (minimum frequency) = 10, δ (threshold of ratio) = 0.5
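The threshold check in this example can be reproduced with a few lines of code. The counts are the ones shown on the slide. The function name is hypothetical, and applying the minimum frequency γ to the count in pseudo-important sentences is this sketch's reading of the example (45 ≧ γ, 4 < γ).

```python
def select_key_expressions(pseudo_counts, other_counts, gamma=10, delta=0.5):
    """Keep expressions whose frequency in pseudo-important sentences is at least
    gamma and whose share of all occurrences is at least delta."""
    selected = {}
    for expr, freq in pseudo_counts.items():
        if freq < gamma:
            continue  # too rare in pseudo-important sentences
        ratio = freq / (freq + other_counts.get(expr, 0))
        if ratio >= delta:
            selected[expr] = ratio
    return selected

# Counts taken from the slide example.
pseudo_counts = {"in this paper": 45, "topic model": 45, "latent dirichlet": 4}
other_counts = {"in this paper": 15, "topic model": 55, "latent dirichlet": 1}
print(select_key_expressions(pseudo_counts, other_counts))
# prints {'in this paper': 0.75}
```

"Topic model" is frequent but spread evenly over both groups, so the ratio rejects it. "Latent Dirichlet" is concentrated in pseudo-important sentences but is too rare to be trusted.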
Experiment 
• Dataset 1 → pseudo-important sentences → key expressions 
• Dataset 2 + key expressions → important sentences 
• Evaluation: precision and recall
Experiment 
• Dataset for extracting key expressions 
    • 10,000 abstracts 
    • Computer Science, Analytical Chemistry 
• Dataset for evaluation 
    • 115 abstracts 
    • Computer Science, Analytical Chemistry
Experiment 
Parameters 
• Ratio of common words α → 0.3 
• Length of the key expressions N → 2, 3, 4 
• Threshold of ratio δ → 0.1, 0.3, 0.5 
• Minimum frequency γ → 10
Result 
N   δ     Precision (%)   Recall (%) 
3   0.1   52.6            40.9 
3   0.3   64.2            17.2 
3   0.5   85.7            9.1 
→ High precision and low recall
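For reference, precision and recall of extracted important sentences can be computed as below. This is a generic sketch of the two measures, not the authors' evaluation script. The sentence IDs are hypothetical and do not come from the evaluation dataset.

```python
def precision_recall(extracted, gold):
    """Precision and recall of extracted sentence IDs against gold important sentences."""
    extracted, gold = set(extracted), set(gold)
    correct = len(extracted & gold)
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall

# Hypothetical IDs: sentences flagged by key expressions vs. manually labeled ones.
extracted = {1, 4}
gold = {1, 2, 4, 7, 9}
p, r = precision_recall(extracted, gold)
print(f"precision={p:.1%} recall={r:.1%}")
```

In this toy case every extracted sentence is correct but most important sentences are missed, which is the same pattern as the δ = 0.5 row of the table.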
Samples of extracted key 
expressions 
• (in, this, study) 
• (scheme, based, on) 
→ Expected key expressions 
• (atomic, absorption) 
• (high, performance, liquid, chromatography) 
→ Technical terms, which should be removed from the key 
expressions.
Summary and future plans 
• Method for extracting key expressions 
    • Regard a sentence that shares many 
    words with the title as a pseudo-important 
    sentence. 
    • Extract key expressions based on the 
    ratio of their frequency in pseudo-important 
    sentences to their frequency in all sentences. 
• Result: high precision. 
• Future plans: increase recall while keeping 
precision high 
    • Increase the number of abstracts used to extract 
    pseudo-important sentences. 
    • Remove technical terms from key 
    expressions.

Eskm20140903

  • 1. Extraction of Key Expressions Indicating the Important Sentence from Article Abstracts 5th International Conference on E-Service and Knowledge Management (ESKM 2014) Shuhei Otani Department of Library Science Kyushu University Yoichi Tomiura Department of Library Science Kyushu University
  • 2. Table of Contents • Background and Purpose • Method for extraction key expressions • Experiments • Result • Summary and future plans
  • 3. Background • Academic information continues to be increased.(double in every 10 years ) • Disciplines have become subdivided. • Interdisciplinary research is promoted. We need many expense when academic information searches
  • 4. How to reduce the expense of academic information search Title : Not represent their contents You’re not from 'round here, are you? Abstract : Native and non-native use of language differs, depending on the proficiency of the speaker, in clear and quantifiable ways. It has been shown that customizing the acoustic and language models of a natural language understanding system can significantly improve handling of non-native input; in order to make such a switch, however, the nativeness status of the user must be known. In this paper, we show that naive Bayes classification can be used to identify non-native utterances of English. The advantage of our method is that it relies on text, not on acoustic features, and can be used when the acoustic source is not available. … Native and non-native use of language differs, depending on the proficiency of the speaker, in clear and quantifiable ways. It has been shown that customizing the acoustic and language models of a natural language understanding system can significantly improve handling of non-native input; in order to make such a switch, however, the nativeness status of the user must be known. In this paper, we show that naive Bayes classification can be used to identify non-native utterances of English. The advantage of our method is that it relies on text, not on acoustic features, and can be used when the acoustic source is not available. … Need many expense , if there are many results. Describe article’s contents
  • 5. Purpose Reduce the expense of academic information search Important Sentences Describe the originalities or contributions Ex. • Present search results where the important sentences are emphasized. • Keyword suggest
  • 6. How we extract important sentence? • Almost all of all important sentences include key expressions. Key expression = cue expression: word sequence such that If a sentence includes it , the sentence is important. e.g. “In this study, we aim to extract key expressions that indicate the important sentence describing the originalities or contributions from article abstracts. ” To collect exhaustive key expressions
  • 7. Related works • Nakawatase and Oyama (2011) →They make a list of key expressions of Japanese articles by hands . • Structuring abstracts and Text summarization →The cost of preparing training data is a problem. It is not realistic to prepare list of key expressions or training data for all discipline.
  • 8. Table of Contents • Background and Purpose • Method for extraction key expressions • Experiments • Result • Summary and future plans
  • 9. Extraction of important sentences • To collect exhaustive key expressions by hands • To prepare annotated corpus to key expressions through Exhaustive key expression machine learning. Important sentence Key expression Many key expressions A lot of important sentences High Precision High Recall If we automatically collect subset of important sentences without key expressions…. High Precision Low Recall
  • 10. How we collect subset of important sentences without key expressions? • A sentence in an abstract that has many common words with the title is more likely to be an important sentence.
  • 11. How we collect subset of important sentences without key expressions? Title : EPUB as Publication Format in Open Access Journals: Tools and Workflow Abstract : In this article, we present a case study of how the main publishing format of an Open Access journal was changed from PDF to EPUB by designing a new workflow using JATS as the basic XML source format. We state the reasons and discuss advantages for doing this, how we did it, and the costs of changing an established Microsoft Word workflow. As an example, we use one typical sociology article with tables, illustrations and references. We then follow the article from JATS markup through different transformations resulting in XHTM.
  • 12. How do we collect a subset of important sentences without key expressions? • A sentence in an abstract that shares many words with the title is more likely to be an important sentence. • We call such sentences “pseudo-important sentences”.
  • 13. How to collect pseudo-important sentences Title : EPUB as Publication Format in Open Access Journals: Tools and Workflow Abstract : In this article, we present a case study of how the main publishing format of an Open Access journal was changed from PDF to EPUB by designing a new workflow using JATS as the basic XML source format. We state the reasons and discuss advantages for doing this, how we did it, and the costs of changing an established Microsoft Word workflow. As an example, we use one typical sociology article with tables, illustrations and references. We then follow the article from JATS markup through different transformations resulting in XHTM.
  • 14. How to collect pseudo-important sentences Title : EPUB as Publication Format in Open Access Journals: Tools and Workflow Abstract : In this article, we present a case study of how the main publishing format of an Open Access journal was changed from PDF to EPUB by designing a new workflow using JATS as the basic XML source format. We state the reasons and discuss advantages for doing this, how we did it, and the costs of changing an established Microsoft Word workflow. As an example, we use one typical sociology article with tables, illustrations and references. We then follow the article from JATS markup through different transformations resulting in XHTM. If |W_T(A) ∩ W(s)| / |W_T(A) ∪ W(s)| ≥ α, then s is a pseudo-important sentence, where W_T(A) is the set of words in the title of article A and W(s) is the set of words in sentence s.
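The pseudo-important test on this slide can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the ratio is assumed to be the number of words shared by the title and the sentence divided by the size of the union of the two word sets, and the tokenizer and the names `words`, `title_overlap` and `pseudo_important` are hypothetical. The toy title and sentences are invented for illustration.

```python
import re


def words(text):
    """Return the set of lowercased alphanumeric tokens in text (assumed tokenizer)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def title_overlap(title, sentence):
    """Assumed ratio: |W_T(A) & W(s)| / |W_T(A) | W(s)|."""
    title_words, sentence_words = words(title), words(sentence)
    union = title_words | sentence_words
    return len(title_words & sentence_words) / len(union) if union else 0.0


def pseudo_important(title, sentences, alpha=0.3):
    """Keep the sentences whose overlap ratio with the title is at least alpha."""
    return [s for s in sentences if title_overlap(title, s) >= alpha]


# Invented toy data, not taken from the slides.
title = "Topic models for text"
sentences = [
    "We propose topic models for short text.",
    "Experiments use three corpora.",
]
print(pseudo_important(title, sentences))
```

A real pipeline would probably also remove stop words or normalize word forms before comparing the sets; the slides do not say which preprocessing was used.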
  • 15. How do we extract key expressions from pseudo-important sentences? Features of key expressions • Appear frequently in pseudo-important sentences. • Appear infrequently in non-pseudo-important sentences. → Extract key expressions using the ratio of an expression’s frequency in pseudo-important sentences to its frequency in all sentences. We remove expressions with a low frequency from the candidates.
  • 16. All sentences of abstracts • Pseudo-important sentences: In this paper : 45, Topic model : 45, Latent Dirichlet : 4 • Non-pseudo-important sentences: In this paper : 15, Topic model : 55, Latent Dirichlet : 1
  • 17. All sentences of abstracts • Threshold values: δ = 0.5 (minimum ratio), γ = 10 (minimum frequency) • Pseudo-important sentences: In this paper : 45, Topic model : 45, Latent Dirichlet : 4 • Non-pseudo-important sentences: In this paper : 15, Topic model : 55, Latent Dirichlet : 1 • In this paper: 45 / (15 + 45) = 0.75 ≥ δ (0.5) and 45 ≥ γ (10) → key expression
  • 18. All sentences of abstracts • Pseudo-important sentences: In this paper : 45, Topic model : 45, Latent Dirichlet : 4 • Non-pseudo-important sentences: In this paper : 15, Topic model : 55, Latent Dirichlet : 1 • Topic model: 45 / (45 + 55) = 0.45 < δ (0.5) → not a key expression • Latent Dirichlet: 4 < γ (10) → not a key expression
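The threshold test walked through on these slides can be reproduced with a few lines of Python. This is a sketch under assumptions: the function name `select_key_expressions` and the dictionary layout are hypothetical, and γ is compared with the frequency in pseudo-important sentences, as the slide arithmetic suggests. The counts are the ones shown on the slides.

```python
def select_key_expressions(pseudo_counts, other_counts, delta=0.5, gamma=10):
    """Return {expression: ratio} for expressions passing both thresholds.

    ratio = frequency in pseudo-important sentences / frequency in all sentences.
    gamma is applied to the frequency in pseudo-important sentences (assumption).
    """
    selected = {}
    for expr, in_pseudo in pseudo_counts.items():
        total = in_pseudo + other_counts.get(expr, 0)
        if total == 0:
            continue  # expression never observed, nothing to score
        ratio = in_pseudo / total
        if in_pseudo >= gamma and ratio >= delta:
            selected[expr] = ratio
    return selected


# Counts from the slides.
pseudo = {"in this paper": 45, "topic model": 45, "latent dirichlet": 4}
other = {"in this paper": 15, "topic model": 55, "latent dirichlet": 1}

print(select_key_expressions(pseudo, other))  # {'in this paper': 0.75}
```

"topic model" is rejected because 45 / 100 = 0.45 is below δ, and "latent dirichlet" is rejected because its frequency of 4 is below γ, even though its ratio of 0.8 would pass.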
  • 19. Table of Contents • Background and Purpose • Method for extracting key expressions • Experiments • Result • Summary and future plans
  • 20. Experiment • Dataset 1 → pseudo-important sentences → key expressions • Dataset 2 + key expressions → important sentences → evaluation (precision and recall)
  • 21. Experiment • Dataset for extracting key expressions • 10,000 abstracts • Computer Science, Analytical Chemistry • Dataset for evaluation • 115 abstracts • Computer Science, Analytical Chemistry
  • 22. Experiment Parameter • Ratio of common words α → 0.3 • Length of the key expressions N → 2, 3, 4 • Threshold of ratio δ → 0.1, 0.3, 0.5 • Minimum frequency γ → 10
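The parameter N is the length of a candidate key expression in words. A minimal sketch of how candidates of length N might be enumerated and counted follows. The tokenizer, the names `ngrams` and `count_candidates`, and the choice to count an expression at most once per sentence are assumptions for illustration; the slides do not specify these details.

```python
import re
from collections import Counter


def ngrams(sentence, n):
    """Return all word sequences of length n in the sentence, as tuples."""
    tokens = re.findall(r"[a-z0-9]+", sentence.lower())
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_candidates(sentences, n):
    """Count, for each n-gram, the number of sentences containing it (assumption)."""
    counts = Counter()
    for sentence in sentences:
        counts.update(set(ngrams(sentence, n)))  # at most once per sentence
    return counts


# Invented toy sentences.
toy = ["In this study, we test a scheme.", "In this study, we measure recall."]
for n in (2, 3, 4):
    print(n, count_candidates(toy, n).most_common(1))
```

Running the counter separately over pseudo-important and non-pseudo-important sentences would give the two frequency tables that the ratio threshold δ and the minimum frequency γ are applied to.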
  • 23. Table of Contents • Background and Purpose • Method for extracting key expressions • Experiments • Result • Summary and future plans
  • 24. Result
      N    δ      Precision (%)    Recall (%)
      3    0.1    52.6             40.9
      3    0.3    64.2             17.2
      3    0.5    85.7             9.1
    High precision and low recall
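Precision and recall here measure how well sentences that contain an extracted key expression match the sentences judged important. The sketch below shows one way to compute them. The function names, the tokenizer and the toy data are hypothetical, and the gold labels are invented, so the printed figures say nothing about the experiment itself.

```python
import re


def tokenize(sentence):
    """Lowercased alphanumeric tokens (assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def contains(tokens, expr):
    """True if the word sequence expr occurs contiguously in tokens."""
    n = len(expr)
    return any(tuple(tokens[i:i + n]) == expr for i in range(len(tokens) - n + 1))


def predict_important(sentences, key_expressions):
    """Indices of sentences that contain at least one key expression."""
    return {i for i, s in enumerate(sentences)
            if any(contains(tokenize(s), e) for e in key_expressions)}


def precision_recall(predicted, gold):
    """Precision and recall of predicted sentence indices against gold indices."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Invented toy data: sentences 0 and 1 are labeled important.
sentences = [
    "In this study, we propose a scheme.",
    "The scheme based on hashing is fast.",
    "We evaluate it on two corpora.",
]
keys = [("in", "this", "study")]
gold = {0, 1}

predicted = predict_important(sentences, keys)
print(precision_recall(predicted, gold))  # (1.0, 0.5)
```

With a single key expression the toy run finds one of the two important sentences and nothing else, the same high-precision, low-recall pattern the slide reports for strict thresholds.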
  • 25. Samples of extracted key expressions • (in, this, study) • (scheme, based, on) → Expected key expressions • (atomic, absorption) • (high, performance, liquid, chromatography) → Technical terms, which should be removed from the key expressions.
  • 26. Table of Contents • Background and Purpose • Method for extracting key expressions • Experiments • Result • Summary and future plans
  • 27. Summary and future plans • To extract key expressions: • Regard a sentence that shares many words with the title as a pseudo-important sentence. • Extract key expressions based on the ratio of their frequency in pseudo-important sentences to their frequency in all sentences.
  • 28. Summary and future plans • The result has high precision. • Increase recall while keeping precision high: • Increase the number of abstracts used to extract pseudo-important sentences. • Remove technical terms from key expressions.

Editor's Notes

  1. Hi, I’m Shuhei Otani. I will talk about “Extraction of Key Expressions Indicating the Important Sentence from Article Abstracts”.
  2. In this presentation, I will first talk about the background and purpose of the present work.
  3. Recently, the amount of academic information has continued to increase. One study reports that the number of articles doubles every 10 years. Therefore, we are faced with too many search results. In addition, disciplines have become subdivided, and interdisciplinary research is promoted by nations. In this situation, researchers have to search for information outside their own discipline. So searching for academic information is very costly.
  4. How can we reduce the expense of academic information search? Search results are often composed of titles and abstracts. Generally speaking, we check the title first, don’t we? However, some article titles do not represent their contents, because the title is rhetorical or very short. In this example, the title is “You’re not from 'round here, are you?” We cannot understand the study from the title. Next, we check the abstract. However, reading whole abstracts is costly if there are too many search results. If we can extract the sentences that describe an article’s contents, we can reduce the expense of academic information search.
  5. Our purpose is to reduce the expense of academic information searches by extracting “important sentences” from article abstracts. An important sentence is one that describes the originality or contributions of the article. If we can extract important sentences, we can, for example, present search results in which the important sentences are emphasized, and we can suggest other keywords extracted from the important sentences. This makes it easy to grasp the content of articles.
  6. How do we extract important sentences? In fact, almost all important sentences include key expressions. A key expression is the same as a cue expression. Here, we define a key expression as a word sequence such that, if a sentence includes it, the sentence is important. For example, this sentence is probably important in an abstract, and the word sequence “in this study” shows that it is important. So the purpose of our study is to collect key expressions exhaustively.
  7. There are some related works on the extraction of important sentences. One of them is by Nakawatase and Oyama. They made a list of key expressions, such that sentences including them are important, for their target abstracts in Japanese. Then they extracted important sentences using the key expressions. However, it is expensive to prepare many key expressions by hand. Structuring abstracts and text summarization are also related to the extraction of important sentences. The high-performance methods used in both techniques are based on machine learning. Therefore, the cost of preparing training data becomes a problem. So it is not realistic to prepare a list of key expressions or training data for every discipline.
  8. Next, I will explain our method for extracting key expressions.
  9. Our final goal is the extraction of important sentences from abstracts. We need exhaustive key expressions to extract important sentences with high precision and high recall. However, it is not realistic to collect exhaustive key expressions by hand for every discipline. It is also not realistic to prepare an annotated corpus to extract key expressions through machine learning. If we automatically collect a subset of important sentences without key expressions, we can extract key expressions from those sentences. Using these key expressions, we can extract important sentences with high precision but low recall. If we can collect a lot of important sentences, we can extract many key expressions, and using them we can extract important sentences with high precision and high recall.
  10. Here is how we collect a subset of important sentences without key expressions. We presume that a sentence in an abstract that shares many words with the title is more likely to be an important sentence.
  11. As you can see, this sentence shares many words with the title, and you might think this sentence is important.
  12. We extract important sentences using this tendency. We call such sentences “pseudo-important sentences”.
  13. This is the same abstract that I showed previously. Using it, I will explain how to collect pseudo-important sentences. This is the title, and these are the sentences in the abstract. W_T(A) is the set of words in the title of article A. W(s) is the set of words in sentence s.
  14. If this value, the ratio of words shared with the title, is greater than or equal to α, we regard s as a pseudo-important sentence.
  15. How do we extract key expressions from pseudo-important sentences? First, we show some features of key expressions. A key expression appears frequently in pseudo-important sentences and infrequently in non-pseudo-important sentences. So we extract key expressions using the ratio of an expression’s frequency in pseudo-important sentences to its frequency in all sentences of the abstracts. In addition, we remove expressions with a low frequency from the candidates, because their ratios are unreliable.
  16. In this case, the expressions “In this paper” and “Topic model” each appear 45 times, and the expression “Latent Dirichlet” appears 4 times, in pseudo-important sentences. In non-pseudo-important sentences, “In this paper” appears 15 times, “Topic model” 55 times, and “Latent Dirichlet” once.
  17. We set two threshold values to extract key expressions from pseudo-important sentences. One is δ, the minimum ratio of the frequency in pseudo-important sentences to the frequency in all sentences. The other is γ, the minimum frequency. Suppose that δ is set to 0.5 and γ is set to 10. In the case of “In this paper”, the ratio is 0.75, which is greater than δ, and the frequency is 45, which is greater than γ. So the word sequence “In this paper” is extracted as a key expression.
  18. In the case of “Topic model”, the ratio is lower than δ, so it is not extracted as a key expression. In the case of “Latent Dirichlet”, the frequency is lower than γ, so it is not extracted as a key expression either.
  19. Next, I will talk about our experiment.
  20. Finally, I will give a summary and talk about future plans.
  21. In this study, we aimed to extract key expressions that indicate the important sentence from article abstracts. We performed an experiment with the following method. We regard a sentence that shares many words with the article title as a pseudo-important sentence. We extracted key expressions from article abstracts based on the ratio of the frequency of pseudo-important sentences that include them to the frequency of all sentences that include them.
  22. Our results showed that the precision of extracting important sentences using the extracted key expressions was high, but recall was low, as we expected. We will improve recall by increasing the number of abstracts used to extract pseudo-important sentences and by removing technical terms from the key expressions. That’s the end of my presentation. Thank you for your time.