International Journal of Artificial Intelligence and Applications (IJAIA), Vol.10, No.2, March 2019
DOI: 10.5121/ijaia.2019.10203
EXTRACTIVE SUMMARIZATION WITH VERY DEEP
PRETRAINED LANGUAGE MODEL
Yang Gu1 and Yanke Hu2

1 Suning USA, Palo Alto, California, USA
2 Humana, Irving, Texas, USA
ABSTRACT
Recent development of generative pretrained language models has proven very successful on a wide
range of NLP tasks, such as text classification, question answering, and textual entailment. In this
work, we present a two-phase encoder-decoder architecture based on Bidirectional Encoder
Representations from Transformers (BERT) for the extractive summarization task. We evaluated our
model with both automatic metrics and human annotators, and demonstrated that the architecture
achieves results comparable to the state of the art on the large-scale CNN/Daily Mail corpus1. To the
best of our knowledge, this is the first work that applies a BERT-based architecture to a text
summarization task and achieves results comparable to the state of the art.
KEYWORDS
BERT, AI, Deep Learning, Summarization
1. INTRODUCTION
Document summarization is a widely investigated problem in natural language processing, and is
mainly classified into two categories: extractive summarization and abstractive summarization.
Extractive approaches extract existing words or sentences from the original text and organize them
into the summary, while abstractive approaches focus on generating a summary that conveys the
same content in a way closer to how a human would express it. We focus on extractive
summarization in this paper.
Traditional extractive summarization approaches can generally be classified into two families:
greedy approaches [1] and graph-based approaches [2]. Recently, deep learning techniques have
been proven successful in this domain. Kageback et al. 2014 [3] utilized continuous vector
representations in a recurrent neural network for the first time, and achieved the best result on the
Opinosis dataset [4]. Yin et al. 2015 [5] developed an unsupervised convolutional neural network
for learning sentence representations, and then applied a special sentence selection algorithm
to balance sentence prestige and diversity. Cao et al. 2016 [6] applied the attention mechanism in
a joint neural network model that learns query relevance ranking and sentence saliency
ranking simultaneously, and achieved competitive performance on the DUC query-focused
summarization benchmark datasets2. Cheng et al. 2016 [7] developed a hierarchical document
encoder and an attention-based extractor, and achieved results comparable to the state of the art on
the CNN/Daily Mail corpus [8] without linguistic annotation.

Recent development of GPT [9] and BERT [10] has proven the effectiveness of generative
pre-trained language models on a wide range of different tasks, such as text classification, question
answering, and textual entailment, but neither of these approaches has been tested on the text
summarization task.

In this paper, we introduce a two-phase encoder-decoder architecture based on BERT. We fine-
tuned this model on the CNN/Daily Mail corpus for the single-document summarization task. The
results demonstrate that our model achieves performance comparable to the state of the art by both
automatic metrics (in terms of ROUGE [11]) and human assessors. To the best of our knowledge,
this is the first work that applies a BERT-based architecture to a text summarization task.

1 https://github.com/deepmind/rc-data
2 https://www-nlpir.nist.gov/projects/duc/
2. TRAINING DATASET
The CNN/Daily Mail [8] dataset is the most widely used large-scale dataset for summarization and
reading comprehension. The training set contains 287,226 samples, the validation set contains
13,368 samples, and the test set contains 11,490 samples. Each sample contains an article and a
referenced summary. After tokenization, an article contains 800 tokens on average, and a
corresponding summary contains 56 tokens on average.
3. SUMMARIZATION MODEL

In this section, we propose the summarization model that efficiently utilizes BERT [10] as the
text encoder. The architecture is shown in Figure 1 and consists of two main modules: the BERT
encoder and sentence classification.

Figure 1: Architecture of the Summarization Model

The sentences of the original document are fed into the BERT encoder sequentially, and each is
classified as to whether or not it should be included in the summary. BERT is essentially a multi-layer
bidirectional Transformer [19] encoder. Each layer consists of a multi-head self-attention sub-layer
and a following linear affine sub-layer with a residual connection. The BERT encoding
process utilizes a query matrix $Q$, a key matrix $K$ of dimension $d_k$, and a value matrix $V$ of
dimension $d_v$, and it can be expressed as Eq. (1).

$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$   (1)
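For reference, Eq. (1) is the standard scaled dot-product attention of the Transformer [19]. The following minimal NumPy sketch (a single head, no masking; the function name is illustrative, not the authors' code) shows the computation:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Eq. (1): softmax(Q K^T / sqrt(d_k)) V.

    Q: queries, shape [n_q, d_k]
    K: keys,    shape [n_k, d_k]
    V: values,  shape [n_k, d_v]
    Returns the attended values, shape [n_q, d_v].
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # [n_q, n_k]
    scores -= scores.max(axis=-1, keepdims=True)    # for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V
```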
Given the input document as $\{w_1, \dots, w_n\}$, where $w_i$ denotes one source token, the BERT
encoder's output can be expressed as Eq. (2).

$h_1, \dots, h_n = \mathrm{BERT}(w_1, \dots, w_n)$   (2)
In the CNN/Daily Mail dataset, the ground truth consists of referenced summaries without sentence
labels, so we follow Nallapati et al. [20] to convert the abstractive referenced summaries into
extractive labels. The idea is to add one sentence at a time to the candidate summary, such that
the ROUGE score of the current set of selected sentences increases with respect to the referenced
summary. We stop when none of the remaining candidate sentences can improve the ROUGE score
with respect to the referenced summary. With this approach, we convert a text summarization
task into a sentence classification task; a sketch of the labeling procedure follows.
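The sketch below illustrates this greedy conversion. It assumes a helper rouge_fn(candidate_sentences, reference) that returns a ROUGE score for the concatenated candidate sentences; the helper and all names here are illustrative assumptions, not the authors' code:

```python
def greedy_extractive_labels(doc_sentences, reference, rouge_fn):
    """Greedily build per-sentence 0/1 labels from an abstractive reference,
    in the spirit of Nallapati et al. [20]: repeatedly add the sentence that
    most improves ROUGE, and stop once no remaining sentence improves it."""
    selected, best_score = [], 0.0
    remaining = set(range(len(doc_sentences)))
    while remaining:
        # Score each remaining sentence appended to the current selection.
        scored = [
            (rouge_fn([doc_sentences[j] for j in selected] + [doc_sentences[i]],
                      reference), i)
            for i in remaining
        ]
        score, idx = max(scored)
        if score <= best_score:  # no sentence raises the ROUGE score further
            break
        selected.append(idx)
        remaining.remove(idx)
        best_score = score
    return [1 if i in selected else 0 for i in range(len(doc_sentences))]
```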
Our BERT encoder is based on Google's TensorFlow3 implementation (TensorFlow version >=
1.11.0). We used the BERT-Base model (uncased, 12-layer, 768-hidden, 12-heads, 110M
parameters) and then fine-tuned the model on the training set of the CNN/Daily Mail corpus. All
inputs to the reader are padded to 384 tokens; the learning rate is set to 3e-5, and other settings are
left at their defaults.

3 https://github.com/google-research/bert
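The paper does not spell out the sentence classification module. The sketch below shows one plausible reading, a binary logistic head on top of BERT's pooled [CLS] output, written against the TensorFlow 1.x API (>= 1.11.0) noted above; the function name and dropout rate are our assumptions, not the authors' code:

```python
import tensorflow as tf  # TensorFlow 1.x, matching the version noted above

def sentence_inclusion_head(pooled_output, is_training, dropout_prob=0.1):
    """Binary 'include this sentence in the summary?' head.

    pooled_output: [batch_size, hidden_size] BERT [CLS] representations,
    one per candidate sentence. Returns inclusion probabilities [batch_size].
    """
    if is_training:
        pooled_output = tf.nn.dropout(pooled_output, keep_prob=1.0 - dropout_prob)
    logits = tf.layers.dense(
        pooled_output,
        units=1,
        kernel_initializer=tf.truncated_normal_initializer(stddev=0.02))
    return tf.squeeze(tf.nn.sigmoid(logits), axis=-1)
```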
4. EXPERIMENT RESULT

We evaluate ROUGE-1, ROUGE-2, and ROUGE-L [11] of our proposed model on the CNN/Daily
Mail test dataset. This automatic metric measures the overlap of unigrams (R-1), bigrams (R-2), and
the longest common subsequence (R-L) between the model-generated summaries and the reference
summaries; a simplified sketch of these metrics follows Table 1. We also list the previously reported
ROUGE scores on CNN/Daily Mail in Table 1 for comparison. As Table 1 shows, the ROUGE score
of BERT Summarization is comparable to the state-of-the-art models.
Table 1: Comparative evaluation of BERT Summarization with recently reported summarization systems

Models                               ROUGE-1   ROUGE-2   ROUGE-L
Pointer Generator [12]                 36.44     15.66     33.42
ML + Intra-Attention [13]              38.30     14.81     35.49
Saliency + Entailment reward [14]      40.43     18.00     37.10
Key information guide network [15]     38.95     17.12     35.68
Inconsistency loss [16]                40.68     17.97     37.13
Sentence Rewriting [17]                40.88     17.80     38.54
Bottom-Up Summarization [18]           41.22     18.68     38.34
BERT Summarization (Ours)              37.30     17.05     34.76
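To make the metrics above concrete, here is a simplified sketch of ROUGE-N recall and of the longest-common-subsequence computation behind ROUGE-L. It is illustrative only; the scores in Table 1 come from the official ROUGE package [11], which additionally applies stemming and reports an F-measure:

```python
from collections import Counter

def rouge_n_recall(candidate, reference, n=1):
    """ROUGE-N as n-gram recall: matched n-grams / n-grams in the reference."""
    def ngrams(tokens):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    cand, ref = ngrams(candidate.split()), ngrams(reference.split())
    matched = sum((cand & ref).values())  # clipped overlap counts
    return matched / max(sum(ref.values()), 1)

def lcs_length(a, b):
    """Length of the longest common subsequence, the basis of ROUGE-L."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]
```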
As per Schluter [21], extractive summarization performance tends to be undervalued with respect
to the ROUGE standard, so we also conducted a human assessment on Amazon Mechanical Turk
(MTurk)4, comparing the relevance and readability of the Bottom-Up Summarization model (best
reported score on CNN/Daily Mail) and our BERT Summarization model. We selected 3 human
annotators who had an approval rate over 95% (at least 1000 HITs), and showed them 100
samples from the test dataset of CNN/Daily Mail, including the input article, the reference
summary, and the outputs of the two candidate models. We asked them to choose the better one
between the two candidate models' outputs, or to choose "tie" if both outputs were equally good or
bad. The relevance score is mainly based on whether the output summary is informative and
redundancy-free. The readability score is mainly based on whether the output summary is coherent
and grammatically correct. As shown in Table 2, our BERT Summarization model achieves higher
scores than the best reported Bottom-Up Summarization model by human assessment.

4 https://www.mturk.com/
Table 2: Human assessment: pairwise comparison of relevance and readability between Bottom-Up
Summarization [18] and BERT Summarization

Models                       Relevance   Readability   Total
Bottom-Up Summarization             42            38      80
BERT Summarization (Ours)           49            46      95
Tie                                  9            16      25
5. CONCLUSIONS

In this paper, we presented a two-phase encoder-decoder architecture based on BERT for the
extractive summarization task. We demonstrated that our model achieves performance comparable
to the state of the art on the CNN/Daily Mail dataset by both automatic metrics and human assessors.
To the best of our knowledge, this is the first work that applies a BERT-based architecture to a text
summarization task. In the future, we will test better decoding and reinforcement learning
approaches to extend it to the abstractive summarization task.
REFERENCES

[1] Carbonell, J., and Goldstein, J. 1998. The use of MMR, diversity-based reranking for reordering
documents and producing summaries. In Proceedings of the 21st annual international ACM SIGIR
conference on Research and development in information retrieval, 335–336. ACM.

[2] Radev, D., and Erkan, G. 2004. LexRank: Graph-based lexical centrality as salience in text
summarization. Journal of Artificial Intelligence Research 457–479.

[3] Kageback, M.; Mogren, O.; Tahmasebi, N.; and Dubhashi, D. 2014. Extractive summarization
using continuous vector space models. 31–39.

[4] Ganesan, K.; Zhai, C.; and Han, J. 2010. Opinosis: a graph based approach to abstractive
summarization of highly redundant opinions. In Proceedings of the 23rd international conference
on computational linguistics, 340–348. Association for Computational Linguistics.

[5] Yin, W., and Pei, Y. 2015. Optimizing sentence modeling and selection for document
summarization. In Proceedings of the 24th International Conference on Artificial Intelligence,
1383–1389. AAAI Press.

[6] Cao, Z.; Li, W.; Li, S.; and Wei, F. 2016. AttSum: Joint learning of focusing and summarization
with neural attention. arXiv preprint arXiv:1604.00125.

[7] Cheng, J., and Lapata, M. 2016. Neural summarization by extracting sentences and words. In
54th Annual Meeting of the Association for Computational Linguistics.

[8] Karl Moritz Hermann, Tomas Kocisky, Edward Grefenstette, Lasse Espeholt, Will Kay, Mustafa
Suleyman, and Phil Blunsom. 2015. Teaching machines to read and comprehend. In Proceedings
of Neural Information Processing Systems (NIPS).

[9] Alec Radford, Karthik Narasimhan, Tim Salimans, and Ilya Sutskever. 2018. Improving language
understanding by generative pre-training. [Online]. Available:
https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf

[10] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: Pre-training
of deep bidirectional transformers for language understanding. arXiv:1810.04805.

[11] Chin-Yew Lin. 2004. ROUGE: a package for automatic evaluation of summaries. In Proceedings
of the ACL Workshop on Text Summarization Branches Out.

[12] Abigail See, Peter J Liu, and Christopher D Manning. 2017. Get to the point: Summarization with
pointer-generator networks. arXiv preprint arXiv:1704.04368.

[13] Romain Paulus, Caiming Xiong, and Richard Socher. 2017. A deep reinforced model for
abstractive summarization. arXiv preprint arXiv:1705.04304.

[14] Ramakanth Pasunuru and Mohit Bansal. 2018. Multi-reward reinforced summarization with
saliency and entailment. In Proceedings of the 2018 Conference of the North American Chapter of
the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short
Papers), pages 646–653.

[15] Chenliang Li, Weiran Xu, Si Li, and Sheng Gao. 2018. Guiding generation for abstractive text
summarization based on key information guide network. In Proceedings of the 2018 Conference
of the North American Chapter of the Association for Computational Linguistics: Human
Language Technologies, Volume 2 (Short Papers), pages 55–60.

[16] Wan-Ting Hsu, Chieh-Kai Lin, Ming-Ying Lee, Kerui Min, Jing Tang, and Min Sun. 2018. A
unified model for extractive and abstractive summarization using inconsistency loss. arXiv
preprint arXiv:1805.06266.

[17] Yen-Chun Chen and Mohit Bansal. 2018. Fast abstractive summarization with reinforce-selected
sentence rewriting. arXiv preprint arXiv:1805.11080.

[18] Sebastian Gehrmann, Yuntian Deng, and Alexander M Rush. 2018. Bottom-up abstractive
summarization. arXiv preprint arXiv:1808.10792.

[19] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez,
Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural
Information Processing Systems, pages 6000–6010.

[20] Ramesh Nallapati, Feifei Zhai, and Bowen Zhou. 2017. SummaRuNNer: A recurrent neural
network based sequence model for extractive summarization of documents. In AAAI, pp.
3075–3081. AAAI Press.

[21] Natalie Schluter. 2017. The limits of automatic summarization according to ROUGE. In Proceedings
of the 15th Conference of the European Chapter of the Association for Computational Linguistics:
Short Papers. Valencia, Spain, pages 41–45.
Authors

Yang Gu

Software Engineer at Suning USA AI Lab. 2 years of research and development
experience in computer vision and natural language processing.
Yanke Hu

Senior Cognitive/Machine Learning Engineer at Humana, responsible for deep
learning research and AI platform development. 9 years of research and
development experience in the software industry at companies majoring in
navigation (TeleNav), business analytics (IBM), finance (Fintopia), retail (AiFi)
and healthcare (Humana).
 
Unit 2- Research Aptitude (UGC NET Paper I).pdf
Unit 2- Research Aptitude (UGC NET Paper I).pdfUnit 2- Research Aptitude (UGC NET Paper I).pdf
Unit 2- Research Aptitude (UGC NET Paper I).pdf
 
Chapter 3 - Islamic Banking Products and Services.pptx
Chapter 3 - Islamic Banking Products and Services.pptxChapter 3 - Islamic Banking Products and Services.pptx
Chapter 3 - Islamic Banking Products and Services.pptx
 
The Challenger.pdf DNHS Official Publication
The Challenger.pdf DNHS Official PublicationThe Challenger.pdf DNHS Official Publication
The Challenger.pdf DNHS Official Publication
 
Introduction to AI for Nonprofits with Tapp Network
Introduction to AI for Nonprofits with Tapp NetworkIntroduction to AI for Nonprofits with Tapp Network
Introduction to AI for Nonprofits with Tapp Network
 
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
 
Supporting (UKRI) OA monographs at Salford.pptx
Supporting (UKRI) OA monographs at Salford.pptxSupporting (UKRI) OA monographs at Salford.pptx
Supporting (UKRI) OA monographs at Salford.pptx
 

Extractive Summarization with Very Deep Pretrained Language Model

The validation set contains 13368 samples. The test set contains 11490 samples. Each sample contains an article and a referenced summary. After tokenization, an article contains 800 tokens on average, and a corresponding summary contains 56 tokens on average.

3. SUMMARIZATION MODEL

In this section, we propose the summarization model that efficiently utilizes BERT [10] as the text encoder. The architecture is shown in Figure 1 and consists of two main modules: a BERT encoder and a sentence classification layer.

Figure 1: Architecture of the Summarization Model
The sentences of the original document are fed into the BERT encoder sequentially and classified as to whether or not they should be included in the summary. BERT is essentially a multi-layer bidirectional Transformer [19] encoder; each layer consists of a multi-head self-attention sub-layer and a following linear affine sub-layer with a residual connection. The BERT encoding process utilizes a query matrix Q, a key matrix K of dimension d_k, and a value matrix V, and it can be expressed as Eq. (1):

Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V    (1)
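For concreteness, the following is a minimal NumPy sketch of the scaled dot-product attention in Eq. (1). The single-head form and the shapes are illustrative assumptions only; inside BERT this computation is replicated across multiple heads per layer.

import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Eq. (1): Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)           # (n_q, n_k) similarity scores
    # Row-wise softmax, shifted by the row max for numerical stability
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                        # (n_q, d_v) weighted value vectors

# Illustrative shapes: 5 query tokens, 7 key/value tokens, d_k = d_v = 64
Q = np.random.randn(5, 64)
K = np.random.randn(7, 64)
V = np.random.randn(7, 64)
out = scaled_dot_product_attention(Q, K, V)   # shape (5, 64)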
Given the input document as {x_1, ..., x_n}, where x_i denotes one source token, the BERT encoder's output can be expressed as Eq. (2):

h_1, ..., h_n = BERT(x_1, ..., x_n)    (2)

In the CNN/Daily Mail dataset, the ground truth consists of referenced summaries without sentence labels, so we follow Nallapati et al. [20] to convert the abstractive referenced summaries to extractive labels. The idea is to add one sentence at a time to the candidate summary, so that the ROUGE score of the current set of selected sentences keeps increasing with regard to the referenced summary. We stop when none of the remaining candidate sentences can improve the ROUGE score with respect to the referenced summary. With this approach, we convert a text summarization task into a sentence classification task.
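As an illustration of this conversion, here is a small Python sketch of the greedy labeling procedure. The rouge_1_recall helper is a deliberately simplified stand-in (unigram recall only), since the paper does not specify which ROUGE variant is maximized during label construction.

def rouge_1_recall(candidate_tokens, reference_tokens):
    """Simplified ROUGE-1 recall: fraction of reference unigrams covered."""
    overlap = set(candidate_tokens) & set(reference_tokens)
    return len(overlap) / max(len(set(reference_tokens)), 1)

def make_extractive_labels(doc_sentences, reference_summary):
    """Greedily select sentences while they improve the ROUGE score against
    the reference summary, in the spirit of Nallapati et al. [20]."""
    reference_tokens = reference_summary.split()
    selected, selected_tokens, best = set(), [], 0.0
    while True:
        gains = []
        for i, sent in enumerate(doc_sentences):
            if i in selected:
                continue
            score = rouge_1_recall(selected_tokens + sent.split(), reference_tokens)
            gains.append((score, i))
        if not gains:
            break
        score, i = max(gains)
        if score <= best:   # no remaining sentence improves ROUGE: stop
            break
        best = score
        selected.add(i)
        selected_tokens += doc_sentences[i].split()
    # Binary label per sentence: 1 = include in the extractive "gold" summary
    return [1 if i in selected else 0 for i in range(len(doc_sentences))]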
Our BERT encoder is based on Google's TensorFlow implementation (https://github.com/google-research/bert, TensorFlow version >= 1.11.0). We used the BERT-Base model (uncased, 12-layer, 768-hidden, 12-heads, 110M parameters) and then fine-tuned it on the training set of the CNN/Daily Mail corpus. All inputs to the reader are padded to 384 tokens, the learning rate is set to 3x10^-5, and other settings are left at their defaults.

4. EXPERIMENT RESULT

We evaluate ROUGE-1, ROUGE-2, and ROUGE-L [11] of our proposed model on the CNN/Daily Mail test dataset. These automatic metrics measure the overlap of unigrams (R-1), bigrams (R-2), and the longest common subsequence (R-L) between the model-generated summaries and the reference summaries. We also list previously reported ROUGE scores on CNN/Daily Mail in Table 1 for comparison. As Table 1 shows, the ROUGE scores of BERT Summarization are comparable to the state-of-the-art models.

Table 1: Comparative evaluation of BERT Summarization with recently reported summarization systems

Models                              ROUGE-1   ROUGE-2   ROUGE-L
Pointer Generator [12]              36.44     15.66     33.42
ML + Intra-Attention [13]           38.30     14.81     35.49
Saliency + Entailment reward [14]   40.43     18.00     37.10
Key information guide network [15]  38.95     17.12     35.68
Inconsistency loss [16]             40.68     17.97     37.13
Sentence Rewriting [17]             40.88     17.80     38.54
Bottom-Up Summarization [18]        41.22     18.68     38.34
BERT Summarization (Ours)           37.30     17.05     34.76
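For reference, ROUGE F1 scores of the kind reported in Table 1 can be computed with Google's rouge-score package. The snippet below is a usage sketch on a made-up summary pair, not the paper's original evaluation pipeline, which predates this package.

# pip install rouge-score
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)
reference = "police arrested two suspects after the downtown robbery ."
prediction = "two suspects were arrested after a robbery downtown ."
scores = scorer.score(reference, prediction)
for name, score in scores.items():
    # Each entry carries precision, recall, and F1; papers usually report F1
    print(f"{name}: F1 = {score.fmeasure:.4f}")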
As per Schluter [21], extractive summarization performance tends to be undervalued by the ROUGE standard, so we also conducted a human assessment on Amazon Mechanical Turk (MTurk, https://www.mturk.com/), comparing the relevance and readability of the Bottom-Up Summarization model (the best reported score on CNN/Daily Mail) and our BERT Summarization model. We selected 3 human annotators with an approval rate over 95% (at least 1000 HITs), and showed them 100 samples from the CNN/Daily Mail test dataset, including the input article, the reference summary, and the outputs of the two candidate models. We asked them to choose the better of the two candidate models' outputs, or to choose "tie" if both outputs were equally good or bad. The relevance score is mainly based on whether the output summary is informative and free of redundancy; the readability score is mainly based on whether the output summary is coherent and grammatically correct. As shown in Table 2, our BERT Summarization model achieves higher scores than the best reported Bottom-Up Summarization model under human assessment.

Table 2: Human assessment: pairwise comparison of relevance and readability between Bottom-Up Summarization [18] and BERT Summarization

Models                      Relevance   Readability   Total
Bottom-Up Summarization     42          38            80
BERT Summarization (Ours)   49          46            95
Tie                         9           16            25

5. CONCLUSIONS

In this paper, we presented a two-phase encoder decoder architecture based on BERT for the extractive summarization task. We demonstrated that our model achieves state-of-the-art comparable performance on the CNN/Daily Mail dataset by both automatic metrics and human assessment. To the best of our knowledge, this is the first work that applies a BERT based architecture to a text summarization task. In the future, we will test better decoding and reinforcement learning approaches to extend our model to the abstractive summarization task.

REFERENCES

[1] Carbonell, J., and Goldstein, J. 1998. The use of MMR, diversity-based reranking for reordering documents and producing summaries. In Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 335–336. ACM.

[2] Radev, D., and Erkan, G. 2004. LexRank: Graph-based lexical centrality as salience in text summarization. Journal of Artificial Intelligence Research, 457–479.

[3] Kageback, M.; Mogren, O.; Tahmasebi, N.; and Dubhashi, D. 2014. Extractive summarization using continuous vector space models. 31–39.

[4] Ganesan, K.; Zhai, C.; and Han, J. 2010. Opinosis: A graph-based approach to abstractive summarization of highly redundant opinions. In Proceedings of the 23rd International Conference on Computational Linguistics, 340–348. Association for Computational Linguistics.
[5] Yin, W., and Pei, Y. 2015. Optimizing sentence modeling and selection for document summarization. In Proceedings of the 24th International Conference on Artificial Intelligence, 1383–1389. AAAI Press.

[6] Cao, Z.; Li, W.; Li, S.; and Wei, F. 2016. AttSum: Joint learning of focusing and summarization with neural attention. arXiv preprint arXiv:1604.00125.

[7] Cheng, J., and Lapata, M. 2016. Neural summarization by extracting sentences and words. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics.

[8] Karl Moritz Hermann, Tomas Kocisky, Edward Grefenstette, Lasse Espeholt, Will Kay, Mustafa Suleyman, and Phil Blunsom. 2015. Teaching machines to read and comprehend. In Proceedings of Neural Information Processing Systems (NIPS).

[9] Alec Radford, Karthik Narasimhan, Tim Salimans, and Ilya Sutskever. 2018. Improving language understanding by generative pre-training. [Online]. Available: https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf

[10] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv:1810.04805.

[11] Chin-Yew Lin. 2004. ROUGE: A package for automatic evaluation of summaries. In Proceedings of the ACL Workshop on Text Summarization Branches Out.

[12] Abigail See, Peter J. Liu, and Christopher D. Manning. 2017. Get to the point: Summarization with pointer-generator networks. arXiv preprint arXiv:1704.04368.

[13] Romain Paulus, Caiming Xiong, and Richard Socher. 2017. A deep reinforced model for abstractive summarization. arXiv preprint arXiv:1705.04304.

[14] Ramakanth Pasunuru and Mohit Bansal. 2018. Multi-reward reinforced summarization with saliency and entailment. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), pages 646–653.

[15] Chenliang Li, Weiran Xu, Si Li, and Sheng Gao. 2018. Guiding generation for abstractive text summarization based on key information guide network. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), pages 55–60.

[16] Wan-Ting Hsu, Chieh-Kai Lin, Ming-Ying Lee, Kerui Min, Jing Tang, and Min Sun. 2018. A unified model for extractive and abstractive summarization using inconsistency loss. arXiv preprint arXiv:1805.06266.

[17] Yen-Chun Chen and Mohit Bansal. 2018. Fast abstractive summarization with reinforce-selected sentence rewriting. arXiv preprint arXiv:1805.11080.

[18] Sebastian Gehrmann, Yuntian Deng, and Alexander M. Rush. 2018. Bottom-up abstractive summarization. arXiv preprint arXiv:1808.10792.

[19] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems, pages 6000–6010.

[20] Ramesh Nallapati, Feifei Zhai, and Bowen Zhou. 2017. SummaRuNNer: A recurrent neural network based sequence model for extractive summarization of documents. In AAAI, pp. 3075–3081.

[21] Natalie Schluter. 2017. The limits of automatic summarization according to ROUGE. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Short Papers, Valencia, Spain, pages 41–45.

Authors

Yang Gu
Software Engineer at Suning USA AI Lab. 2 years of research and development experience in computer vision and natural language processing.

Yanke Hu
Senior Cognitive/Machine Learning Engineer at Humana, responsible for deep learning research and AI platform development. 9 years of research and development experience in the software industry at companies majoring in navigation (TeleNav), business analytics (IBM), finance (Fintopia), retail (AiFi) and healthcare (Humana).