The document discusses automatic text summarization, including trends, challenges and opportunities in the field. It provides an overview of existing work on extractive and abstractive summarization techniques. Recent trends include the use of deep learning models like neural attention models and RNN-based summarizers. Challenges include the rare word problem in neural summarizers and the difficulty of evaluation. The future of summarization is seen to involve better control of sequence to sequence models and improved evaluation metrics.
Text Summarization Talk @ Saama Technologies
1. Automatic Text Summarization
Trends, Challenges and Opportunities
Siddhartha Banerjee
Research Scientist, Content Platform
Yahoo! (now Oath, a Verizon Company)
September 22, 2017
2. Talk @ Saama Technologies, Siddhartha Banerjee
My background
❑ Undergraduate degree
• Industrial Engineering - 2009 (IIT Kharagpur)
❑ Professional Experience: 2009 – 2012
• Sabre Airline Solutions and Oracle Retail
❑ Ph.D. @ Penn State, Information Sciences (2012 - Dec 2016)
• Advised by Prof. Prasenjit Mitra
• Natural Language Processing
❑ Back to Industry: 2017
• Yahoo! (March 2017 - present)
• Question Answering
• Relationship extraction using distant supervision
• Deep Learning
3. Outline
● What is Text Summarization?
● Overview of existing work
● Challenges
● Current Trends
● My experiences
● The Future of Summarization
● Q&A
4. What is Text Summarization?
Single-document summarization
Multi-document summarization
5. An “ideal” summary
● Informativeness
● Coherence
● Grammaticality
6. Types of Summarization
● Extractive
○ “Extract” certain sentences
○ Easier
○ No issues with grammaticality
● Abstractive
○ Produce “abstracts”
○ Content understanding
○ Generation
7. Extractive Summarization
1958: Sentences that mention words that occur frequently in the document are more important.
We have come a long way since then!
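The 1958 idea above (commonly attributed to Luhn) can be sketched in a few lines. This is a simplified illustration, not the original algorithm: the stopword list, the function names, and the choice to average word frequencies per sentence are all assumptions made for the example.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption, not from the talk).
STOPWORDS = {"the", "a", "an", "is", "are", "of", "in", "to", "and", "it", "but"}


def split_sentences(text):
    """Split on whitespace that follows sentence-ending punctuation."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def content_words(sentence):
    """Lowercased tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z0-9']+", sentence.lower())
            if w not in STOPWORDS]


def frequency_summary(text, n_sentences=1):
    """Return the n sentences whose words are most frequent in the document."""
    sentences = split_sentences(text)
    freq = Counter(w for s in sentences for w in content_words(s))

    def score(sentence):
        words = content_words(sentence)
        # Average document frequency of the sentence's content words.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    # Emit the chosen sentences in their original document order.
    return [sentences[i] for i in sorted(ranked[:n_sentences])]
```

Averaging favours short sentences packed with frequent words; summing instead would favour long sentences. Either choice is a design decision of this sketch.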
8. Extractive Techniques
• Word-statistics based techniques
  • Centroid [Radev et al., 2004]
  • TextRank [Mihalcea and Tarau, 2004]
• Supervised techniques
  • Provide ranked sentences to train from documents
  • Learning to Rank
• Topic-model based techniques
  • Model sentences as topic vectors [Blei et al., 2003]
  • Select sentences that are more “central” to the document vector.
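As an illustration of the graph-based ranking listed above, here is a minimal TextRank-style sketch. It assumes the word-overlap similarity normalised by log sentence lengths and a PageRank-style update with a damping factor of 0.85; the tokenizer, function names, and fixed iteration count are simplifications chosen for the example.

```python
import math
import re


def tokenize(sentence):
    """Set of lowercased word tokens in a sentence."""
    return set(re.findall(r"[a-z0-9]+", sentence.lower()))


def similarity(a, b):
    """Word overlap normalised by the log of both sentence lengths."""
    wa, wb = tokenize(a), tokenize(b)
    if len(wa) < 2 or len(wb) < 2:
        # Guard added for this sketch: avoids a zero denominator (log(1) = 0).
        return 0.0
    return len(wa & wb) / (math.log(len(wa)) + math.log(len(wb)))


def textrank(sentences, damping=0.85, iterations=50):
    """Return one importance score per sentence via weighted PageRank."""
    n = len(sentences)
    weights = [[similarity(sentences[i], sentences[j]) if i != j else 0.0
                for j in range(n)] for i in range(n)]
    out_sum = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(iterations):
        updated = []
        for i in range(n):
            # Each neighbour j passes on its score in proportion to edge weight.
            rank = sum(weights[j][i] / out_sum[j] * scores[j]
                       for j in range(n) if out_sum[j] > 0)
            updated.append((1 - damping) + damping * rank)
        scores = updated
    return scores
```

A sentence that shares no words with the rest of the document receives only the baseline score of 1 - damping, so it ranks last.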
9. Why “abstractive”?
❑ Consider opinions on the iPhone:
• The iPhone’s battery lasts long…have to charge it once every few days.
• iPhone’s battery is bulky but it is cheap.
• iPhone’s battery is bulky but it lasts long!
❑ Extractive: The iPhone’s battery lasts long…have to charge it once every few days.
• Limit on summary length
❑ Ideal: The iPhone’s battery lasts long and is cheap but is bulky.
• HARD!!
• Preferred (Murray et al., 2010 – user study)
10. Abstractive Summarization techniques
❏ Text-to-text generation at sentence level – Independent of other sentences
❏ Sentence compression (Cohn and Lapata, 2009)
❏ Extractive to abstractive: Not possible using just compression
❏ Sentence fusion (Barzilay and McKeown, 2005; Filippova and Strube, 2008)
❏ Template-based (Genest and Lapalme, 2011)
❏ Domain-specific templates - Lot of manual effort
Sentence compression example:
Input: But a month ago, she returned to Britain, taking the children with her.
Output: She returned to Britain, taking the children
11. Current Trends
● Deep Learning!!
● Neural Attention Model for Sentence Summarization (FAIR, 2015)
○ Headline generation
○ Feed-forward neural network
○ Attention model
● RNN-based summarization (FAIR, 2016)
12. Sequence to Sequence models
❏ Originally modelled for machine translation
13. RNNs with attention
http://www.abigailsee.com/2017/04/16/taming-rnns-for-better-summarization.html
● Rare-word problem: Reproducing factual details inaccurately
● Pointer-Generator Networks to the rescue! Copy words from the source text into the summary.
● Get To The Point: Summarization with Pointer-Generator Networks (Stanford NLP Group, 2017)
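The copy mechanism can be illustrated by how a pointer-generator network mixes two distributions at each decoding step: with probability p_gen it generates from the fixed vocabulary, and otherwise it copies a source token in proportion to its attention weight. The sketch below only shows that mixing step; the function name and dictionary-based representation are assumptions for the example, and in a real model p_gen, the vocabulary distribution, and the attention weights all come from the network.

```python
def final_distribution(p_gen, vocab_dist, attention, source_tokens):
    """Mix generation and copy probabilities for one decoding step.

    p_gen         -- probability of generating from the vocabulary (0..1)
    vocab_dist    -- dict: vocabulary word -> generation probability
    attention     -- attention weights, one per source token (sum to 1)
    source_tokens -- the source tokens, aligned with `attention`
    """
    # Generation part: scale the vocabulary distribution by p_gen.
    final = {word: p_gen * prob for word, prob in vocab_dist.items()}
    # Copy part: each source token receives (1 - p_gen) * its attention weight.
    # Out-of-vocabulary source words get an entry here, which is how rare
    # words such as names and numbers become producible.
    for token, weight in zip(source_tokens, attention):
        final[token] = final.get(token, 0.0) + (1.0 - p_gen) * weight
    return final
```

In the toy case where the source contains a rare word that the vocabulary lacks, that word still gets non-zero probability through the attention weights alone.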
14. Evaluation
Automatic Evaluation
• ROUGE – Recall-Oriented Understudy for Gisting Evaluation (Lin, 2004)
Manual Evaluation
• Ask human judges to rate summaries on quality
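To make the automatic metric concrete, here is a minimal sketch of ROUGE-N recall: the fraction of reference n-grams that also appear in the candidate summary, with counts clipped. Whitespace tokenization and the function names are simplifications for the example; the official toolkit also handles stemming, multiple references, and precision/F-scores.

```python
from collections import Counter


def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, reference, n=2):
    """Clipped n-gram overlap divided by the number of reference n-grams."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Each reference n-gram is matched at most as often as it occurs
    # in the candidate.
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total
```

Because the score only counts surface n-gram overlap, a fluent paraphrase can score lower than a disfluent copy of the reference.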
15. Datasets
• News articles
  • CNN/Daily Mail dataset
  • Document Understanding Conference datasets [DUC, now TAC]
    • Several topics: Each topic with 8-10 documents
• Meeting conversations
  • Single meeting transcript
  • AMI Dataset [http://groups.inf.ed.ac.uk/ami/corpus/overview.shtml]
    • 139 meeting transcripts: 119 training + 20 test
16. My Summarization Experience
Automatically authoring content for Wikipedia
• Improving existing articles
• Constructing new articles
Pipeline: Web information → Assign to Wiki Sections → Summarization
17. Summary sentence generation
❑ Multi-sentence compression (Filippova, 2010)
• Graph construction: directed graph
• Nodes are words (with POS)
• Edges are adjacencies
❑ Graph traversal: overgenerate and select
Input:
S1 The outbreak is the largest ever reported in North America.
S2 Enterovirus D68 caused outbreak of respiratory disease.
S3 Clusters of the outbreak in the United States were reported in August.
Output:
1: Enterovirus D68 caused outbreak is the largest ever reported in North America.
2: Enterovirus D68 caused outbreak in the United States were reported in August.
3: The outbreak is the largest ever reported in August.
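The word-graph idea on this slide can be sketched as follows. This is a deliberately simplified version and not the published method: nodes are lowercased words without POS tags, identical words are always merged, and a path never repeats a word (to avoid cycles), so it does not reproduce every output shown on the slide. The names `build_word_graph` and `generate_candidates` and the `min_words` / `max_words` limits are hypothetical.

```python
import re
from collections import defaultdict

START, END = "<s>", "</s>"


def build_word_graph(sentences):
    """Directed graph: one node per distinct word, one edge per adjacency."""
    graph = defaultdict(set)
    for sentence in sentences:
        words = re.findall(r"[a-z0-9]+", sentence.lower())
        path = [START] + words + [END]
        for left, right in zip(path, path[1:]):
            graph[left].add(right)
    return graph


def generate_candidates(graph, min_words=8, max_words=15):
    """Overgenerate: enumerate START-to-END paths within the length limits."""
    results = []

    def walk(node, path):
        for nxt in sorted(graph[node]):
            if nxt == END:
                if len(path) >= min_words:
                    results.append(" ".join(path))
            elif nxt not in path and len(path) < max_words:
                walk(nxt, path + [nxt])

    walk(START, [])
    return results
```

Run on S1 to S3 above, the candidates include "enterovirus d68 caused outbreak is the largest ever reported in north america", which fuses S2 and S1 through the shared node "outbreak". The "select" step would then score candidates for informativeness and linguistic quality.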
18. A comprehensive model (Banerjee and Mitra, 2016)
[Figure: a word graph generates candidate sentences p1, p2, p3, ..., pk; a few sentences are selected (✔) and the rest rejected (❌).]
Selection criteria: Informativeness, Linguistic Quality, Coherence
Other figure labels: Information coverage, Grammaticality, Ordering of sentences (Bollegala et al. 2012)
19. Mathematical formulation
Maximize: [objective function not captured in the transcript]
Constraints: [not captured in the transcript]
❑ Three factors:
• I – Information coverage [TextRank (2004)]
• LQ – Language model [Heafield et al. 2013]
• Coh – Regression based scoring
20. Experimental Results: News dataset
• ROUGE evaluation on Document Understanding Conference (DUC) datasets
21. Experimental Results (contd.)
❑ Manual Evaluation: 10 evaluators
• Informative coverage: ~5% improvement over the “best” extractive system
• Readability: ~4% reduction compared to extractive system
❑ Error Cases
• The U.N. imposed sanctions since 1992 for its refusal to hand over the two Libyans wanted in the 1988 bombing that killed 270 people killed.
• The deal that will make Hun Sen prime minister and Ranariddh agreed to a government formed.
22. Disaster-event Tweet Summarization (Rudra et al., 2016)
Content-word based summary quality optimization
Content words: Numerals, nouns, locations, main verbs
• Constraint (5): if a content word is selected, at least one sentence containing it must be selected
• Constraint (6): if a sentence is selected, the content words it contains are selected
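The optimization on this slide is an integer linear program; the sketch below replaces the ILP solver with brute-force search over subsets so that it stays self-contained. It assumes the objective is the total weight of distinct content words covered, subject to a word budget. The function name, the toy tweets, and the weights in the usage note are hypothetical and are not data from the paper.

```python
from itertools import combinations


def select_tweets(tweets, content_words, budget):
    """Pick the subset of tweets covering the most content-word weight.

    tweets        -- list of tweet strings
    content_words -- dict: content word -> importance weight
    budget        -- maximum total number of words in the summary
    """
    tokenized = [set(t.lower().split()) for t in tweets]
    lengths = [len(t.split()) for t in tweets]
    best_score, best_subset = 0, ()
    for size in range(1, len(tweets) + 1):
        for subset in combinations(range(len(tweets)), size):
            if sum(lengths[i] for i in subset) > budget:
                continue
            # A content word counts once, however many selected tweets
            # contain it, which discourages redundant tweets.
            covered = set().union(*(tokenized[i] for i in subset)) & set(content_words)
            score = sum(content_words[w] for w in covered)
            if score > best_score:
                best_score, best_subset = score, subset
    return [tweets[i] for i in best_subset], best_score
```

For example, with the toy tweets "7 dead in kathmandu earthquake", "earthquake hits kathmandu", "pray for nepal" and "airport closed in kathmandu", a budget of 9 words, and weights that favour "dead", the search keeps the first and last tweets: the second tweet adds no new content words. Brute force is exponential in the number of tweets, so a real system would hand the same objective and constraints to an ILP solver.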
23. Experimental Results
• Readability evaluation (COWABS is our proposed technique)
24. Meeting summarization using fusion (Banerjee and Mitra, 2015)
• “Um well this is the kick-off meeting for our project.”
• “so we’re designing a new remote control and um.”
• “Um, as you can see it is supposed to be original, trendy and user friendly.”
25. Results: Meeting data
❑ AMI Dataset (http://groups.inf.ed.ac.uk/ami/corpus/overview.shtml)
• 139 meeting transcripts: 119 training + 20 test (for extractive)
❑ ROUGE Evaluation
• ~17% improvement in ROUGE-2 (R-2) score over another abstractive system (Filippova, 2010)
❑ Readability Analysis
• Our model: Slightly curved around the sides like up to the main display as well. It was voice activated.
• Human: The remote will be single-curved with a cherry design on top. A sample sensor was included to add speech recognition.
26. Resources
• https://github.com/miso-belica/sumy
• Lots of simple extractive summarization techniques
• https://github.com/facebookarchive/NAMAS
• Abstractive summarization: headline generation task
• http://kavita-ganesan.com/opinosis-summarizer-library
• Summarizing redundant opinions/ reviews
• http://pavel.surmenok.com/2016/10/15/how-to-run-text-summarization-with-tensorflow/
• Tutorial using a seq2seq model in TensorFlow
• https://github.com/g-deoliveira/TextSummarization
• Topic model-based summarization
• https://github.com/StevenLOL/AbTextSumm
• My abstractive summarization technique.
27. Future of Summarization
❏ The importance of summarization is undeniable
❏ Growth of data
❏ Automatic authoring in journalism
❏ Medical report summarization
❏ Deep Learning (RNNs)
❏ Still a long way to go!
❏ Sequence to sequence models are hard to control
❏ Better metrics. ROUGE is not good enough.
❏ Making sense of an entire summary -- mimicking human capabilities.
28. Publications
• Siddhartha Banerjee and Prasenjit Mitra. WikiWrite: Generating Wikipedia Articles Automatically. 25th International Joint Conference on Artificial Intelligence (IJCAI-16).
• Koustav Rudra, Siddhartha Banerjee, Muhammad Imran, Niloy Ganguly, Pawan Goyal and Prasenjit Mitra. Summarizing Situational Tweets in Crisis Scenario. ACM HyperText, 2016.
• Siddhartha Banerjee and Prasenjit Mitra. Filling the Gaps: Improving Wikipedia Stubs. 15th ACM SIGWEB International Symposium on Document Engineering (DocEng 2015).
• Siddhartha Banerjee, Prasenjit Mitra and Kazunari Sugiyama. Generating Abstractive Summaries from Meeting Transcripts. 15th ACM SIGWEB International Symposium on Document Engineering (DocEng 2015).
• Siddhartha Banerjee and Prasenjit Mitra. WikiKreator: Improving Wikipedia Stubs Automatically. Association for Computational Linguistics (ACL 2015).
• Siddhartha Banerjee, Prasenjit Mitra and Kazunari Sugiyama. Multi-Document Abstractive Summarization using ILP-based Multi-Sentence Compression. International Joint Conference on Artificial Intelligence (IJCAI 2015).
• Siddhartha Banerjee, Prasenjit Mitra and Kazunari Sugiyama. Abstractive Meeting Summarization using Dependency Graph Fusion. ACM International Conference on World Wide Web (WWW, poster), 2015, Florence, Italy.
• Siddhartha Banerjee, Cornelia Caragea and Prasenjit Mitra. Playscript Classification and Automatic Wikipedia Play Articles Generation. International Conference on Pattern Recognition (ICPR 2014), Stockholm, Sweden.
29. Email id: sidd.iitkgp@gmail.com