#2 Building Natural Language Generation Systems

Building Natural Language Generation (NLG) Systems
Ross Turner
Tomorrow’s Language Technology, Berlin 17/09/15

Agenda
1.  Brief introduction
2.  NLG in 10 minutes
3.  Case study: NLG in Weather Services
4.  Statistical approaches to NLG
5.  Where next?

My Profile
•  Current: Principal Engineer, Arria NLG plc
•  Formerly:
–  Senior Software Engineer, Nokia Berlin
–  Post-doctoral Research Fellow, Universities of Edinburgh and Aberdeen
•  PhD in applied NLG systems (2009)

What is Natural Language Generation (NLG) exactly?

NLG Synopsis

•  The automatic generation of natural language from non-linguistic input
Input → Semantic Representation → Text

Example
"Grass pollen levels for
Wednesday have decreased
from the very high levels of
yesterday with values of
around 6 to 7 across most parts
of the country. However, in
Northern and North Western
areas, pollen levels will be
moderate with values of 4."
(Turner et al. 2006)
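The Input → Semantic Representation → Text view can be made concrete with a minimal Python sketch for a pollen forecast like the one above. The 1 to 10 band boundaries, the region names and the `PollenMessage` fields are hypothetical assumptions for illustration, not the thresholds or representation used by Turner et al. (2006).

```python
from dataclasses import dataclass

# Hypothetical pollen bands on a 1-10 scale (assumed, not from Turner et al. 2006).
BANDS = [(1, 3, "low"), (4, 5, "moderate"), (6, 7, "high"), (8, 10, "very high")]


def band(value: int) -> str:
    """Map a numeric pollen value to a descriptive level."""
    for lo, hi, label in BANDS:
        if lo <= value <= hi:
            return label
    raise ValueError(f"pollen value out of range: {value}")


@dataclass
class PollenMessage:
    """Semantic representation: what to say, independent of wording."""
    region: str
    level: str
    low: int
    high: int


def interpret(region: str, values: list[int]) -> PollenMessage:
    """Input -> semantic representation: summarise the raw values for one region.

    The level is taken from the highest value in the region.
    """
    low, high = min(values), max(values)
    return PollenMessage(region, band(high), low, high)


def realise(msg: PollenMessage) -> str:
    """Semantic representation -> text, using a fixed sentence template."""
    if msg.low == msg.high:
        values = str(msg.low)
    else:
        values = f"around {msg.low} to {msg.high}"
    return f"In {msg.region}, pollen levels will be {msg.level} with values of {values}."


if __name__ == "__main__":
    print(realise(interpret("northern areas", [4, 4, 4])))
```

Keeping the message separate from the template means the same `PollenMessage` could be realised in a different style or language without re-analysing the data.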

Reiter & Dale Pipeline Architecture

Choosing what to say → Deciding how to say it
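Reiter and Dale's pipeline is usually described as three stages: document planning (choosing what to say), followed by microplanning and surface realisation (deciding how to say it). The sketch below wires those stages together for a toy road-weather input; the route names, hazards and data format are invented, and each stage is deliberately trivial.

```python
# Toy input: one hazard per route (invented data, not a real forecast).
DATA = [
    {"route": "A90", "hazard": "hoar frost"},
    {"route": "A96", "hazard": "hoar frost"},
    {"route": "A93", "hazard": "fog"},
]


def document_planning(data):
    """Choosing what to say: group routes by hazard, most widespread hazard first."""
    by_hazard = {}
    for row in data:
        by_hazard.setdefault(row["hazard"], []).append(row["route"])
    # sorted() is stable, so hazards affecting equally many routes keep input order.
    return sorted(by_hazard.items(), key=lambda item: len(item[1]), reverse=True)


def microplanning(plan):
    """Deciding how to say it: aggregate routes into one phrase and pick words."""
    specs = []
    for hazard, routes in plan:
        if len(routes) == 1:
            where = f"the {routes[0]}"
        else:
            where = "the " + ", ".join(routes[:-1]) + " and " + routes[-1]
        specs.append({"subject": hazard.capitalize(), "verb": "will affect", "object": where})
    return specs


def surface_realisation(specs):
    """Linearise each sentence specification and add punctuation."""
    return " ".join(f"{s['subject']} {s['verb']} {s['object']}." for s in specs)


def generate(data):
    return surface_realisation(microplanning(document_planning(data)))


if __name__ == "__main__":
    print(generate(DATA))
```

Aggregation in microplanning is what turns two separate "hoar frost" facts into a single sentence covering both routes.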

System Building

•  Development requires example input data and corresponding output text
•  Systems are usually knowledge-based and domain-specific, but statistical
approaches are becoming more commonplace
•  Evaluations typically use:
–  Automated metrics against a gold standard
–  Human ratings
–  Task-based evaluations
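As an illustration of automated metrics against a gold standard, here is a simplified BLEU-style score: clipped n-gram precision for unigrams and bigrams, combined by geometric mean and multiplied by a brevity penalty. It is a sketch for intuition only, not a drop-in replacement for standard BLEU implementations, which add smoothing, corpus-level aggregation and multiple references.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def clipped_precision(candidate, reference, n):
    """Fraction of candidate n-grams found in the reference, clipped by reference counts."""
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
    return overlap / total


def bleu_like(candidate: str, reference: str, max_n: int = 2) -> float:
    """Simplified sentence-level BLEU-style score against one gold-standard text."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    precisions = [clipped_precision(cand, ref, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0  # no smoothing: any zero precision gives a zero score
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    # Brevity penalty discourages candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * geo_mean
```

For example, "fog will affect all routes" against the reference "fog will affect some routes" has unigram precision 4/5 and bigram precision 2/4, giving a score of √0.4 ≈ 0.63. Scores like this reward surface overlap, which is one reason they are usually complemented by human ratings and task-based evaluations.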

Commercial Applications
•  NLG commercialisation is relatively recent
•  Many systems developed in Healthcare, Meteorology, Finance etc.
•  The most common applications are so-called “data-to-text” systems that provide decision support

Benefits
•  Scalability, cost efficiencies, automation of routine reporting, etc.
•  Task-based evaluations have highlighted the benefits of textual
presentations of data:
–  Medical staff made better decisions (Law et al. 2005)
–  Mobile phone users exhibited superior task performance (Langan-Fox
et al. 2006)

Can NLG produce high-quality texts?

Output Variation and Quality
•  NLG systems have been developed to generate:
–  Narrative Prose (Callaway and Lester 2002)
–  Poetry (Manurung 2003)
–  Jokes (Binsted and Ritchie 1994, Manurung et al. 2008)
•  SumTime-Mousam wind forecasts were judged better than those written
by human experts (Reiter et al. 2005)

Input Data (Turner 2009)

Communicative Goal (Turner 2009)

System Output
Computer Generated Forecast
•  “Road surface temperatures will fall slowly during the afternoon and early
evening, reaching zero in some northwestern places by 15:00. Ice and hoar
frost will affect all routes throughout the forecast period, hoar frost turning
heavy by 15:00 in some places below 100M. Fog will affect all routes
throughout the forecast period, turning freezing by 16:00 in all areas.”
Human Authored Forecast
•  “A dry and settled night. It will be cold, despite rather cloudy skies at times
and freezing fog is expected to form along the lower routes. Hoar frost will
be widespread across the region and there will also be icy patches at some
locations. RSTs are expected to fall to between minus one and minus three
degrees.”
(Turner 2009)

Evaluation with Road Engineers
•  Online questionnaire:
–  Ask Road Engineers to rate pairs of road ice forecasts based on the
same data
–  21 respondents, 17 with 5+ years of experience

(Turner 2009)

Experimental Setup
•  Gritting decision conditions:
–  Marginal Night? Yes (MN+), No (MN-)
–  Settled Conditions? Yes (SC+), No (SC-)
•  SC-MN-: Grit all routes
•  SC+MN-: Grit all routes
•  SC-MN+: Grit some routes
•  SC+MN+: Grit some routes
(Turner 2009)

Questions: Direct Comparisons
Q1 In terms of the information presented in both texts, which is more useful?
Q2 Which text do you find easier to understand?
Q4 Which text would allow you to prioritise the routing of gritting vehicles better?
(Turner 2009)

Results: Direct Comparisons (Turner 2009)

Questions: Task-based
Q3 Please indicate for both texts roughly how many routes you would treat
(all, some or none)?
(Turner 2009)
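The experimental setup fixes an expected answer for each condition (MN- means grit all routes, MN+ means grit some routes, whatever the settled-conditions value), so Q3 answers can be scored per text. The scoring function and the sample responses below are hypothetical; they are not the study's analysis or data.

```python
from collections import defaultdict


def expected_decision(marginal_night: bool, settled: bool) -> str:
    """Expected gritting decision for one of the four experimental conditions.

    In the setup, only the marginal-night factor changes the answer; `settled`
    is kept as a parameter to mirror the 2x2 design.
    """
    return "some" if marginal_night else "all"


def accuracy_by_text(responses):
    """responses: iterable of (text_id, marginal_night, settled, answer) tuples.

    Returns, per text, the fraction of answers that match the expected decision.
    """
    hits, totals = defaultdict(int), defaultdict(int)
    for text_id, mn, sc, answer in responses:
        totals[text_id] += 1
        hits[text_id] += answer == expected_decision(mn, sc)
    return {t: hits[t] / totals[t] for t in totals}


# Invented example answers (not data from the study).
responses = [
    ("generated", True, False, "some"),
    ("generated", False, True, "all"),
    ("human", True, False, "all"),
    ("human", False, True, "all"),
]

if __name__ == "__main__":
    print(accuracy_by_text(responses))
```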

Results: Task-based (Turner 2009)

Meteorologists' Beta Feedback (Turner 2009)
•  Forecaster’s ratings vs forecaster’s post-edit behaviour
“Do as I say, not as I do”
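Comparing what forecasters say (their ratings) with what they do (their post-edits) needs a measure of editing effort. One common choice, assumed here because the slide does not say how post-edits were measured, is word-level edit distance between the generated draft and the forecaster's edited version. The example sentences are invented.

```python
def word_edit_distance(draft: str, edited: str) -> int:
    """Levenshtein distance over words (insertions, deletions, substitutions)."""
    a, b = draft.split(), edited.split()
    prev = list(range(len(b) + 1))
    for i, word_a in enumerate(a, start=1):
        curr = [i]
        for j, word_b in enumerate(b, start=1):
            cost = 0 if word_a == word_b else 1
            curr.append(min(prev[j] + 1,          # delete word_a
                            curr[j - 1] + 1,      # insert word_b
                            prev[j - 1] + cost))  # keep or substitute
        prev = curr
    return prev[-1]


def edit_rate(draft: str, edited: str) -> float:
    """Word edits per word of the edited text; 0.0 means the draft was kept unchanged.

    Similar in spirit to TER, but without block shifts.
    """
    return word_edit_distance(draft, edited) / max(len(edited.split()), 1)


if __name__ == "__main__":
    print(edit_rate("fog will affect all routes", "fog will affect some routes"))
```

A forecaster who rates texts highly but still edits them heavily would show a high edit rate alongside good ratings.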

Public Weather Forecasts (Sripada et al. 2014)

Business Use Case
•  The UK Met Office produces forecast data for thousands of sites every 3 hours
•  Staffing constraints mean written forecasts can only be produced at the area level
•  Solution: develop an NLG system to generate site-specific weather forecasts
(Sripada et al. 2014)

Scalability
Results obtained over 10 trials on a MacBook Pro (2.5 GHz Intel Core i5, 4 GB RAM) running OS X 10.8 (Sripada et al. 2014)
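Measuring generation time averaged over repeated trials, as in the 10-trial scalability results, can be sketched with `time.perf_counter` and the `statistics` module from the Python standard library. `generate_site_forecast` is a hypothetical stand-in, not the Met Office system, and the site counts are arbitrary.

```python
import statistics
import time


def generate_site_forecast(site_id: int) -> str:
    """Hypothetical stand-in for generating one site-specific forecast."""
    return f"Site {site_id}: dry and settled, with frost overnight."


def benchmark(num_sites: int, trials: int = 10) -> tuple[float, float]:
    """Time forecast generation for num_sites sites, repeated `trials` times.

    Returns (mean, standard deviation) of the per-trial wall-clock seconds.
    """
    timings = []
    for _ in range(trials):
        start = time.perf_counter()
        for site in range(num_sites):
            generate_site_forecast(site)
        timings.append(time.perf_counter() - start)
    # stdev needs at least two samples.
    spread = statistics.stdev(timings) if trials > 1 else 0.0
    return statistics.mean(timings), spread


if __name__ == "__main__":
    for n in (1000, 5000, 10000):
        mean, spread = benchmark(n)
        print(f"{n} sites: mean {mean:.4f}s, sd {spread:.4f}s over 10 trials")
```

Reporting the spread alongside the mean shows whether the 10 trials are stable enough to compare across site counts.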

Output Quality
35 @metoffice followers:
1.  Did you find the text helped you to understand the forecast better?
–  Yes 97%, No 3%
2.  How did you find the text used?
–  About right 74%, Too short/long 20%, Unsure 6%
3.  Would you recommend this feature?
–  Yes 91%, No 9%
(Sripada et al. 2014)

NLG Is All About Choice

•  Choosing what to say and how to say it:
–  Content
–  Words
–  Syntactic structure
•  Many of these choices can be learnt:
–  Overgeneration and ranking
–  Word choice classifiers
–  Word ordering
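The overgeneration-and-ranking idea can be sketched in a few lines: enumerate every combination of word choices, then rank the candidates with a statistical model trained on a corpus. Here the model is an add-one-smoothed bigram language model, and the three-sentence corpus and the word alternatives are invented for illustration; practical systems use far larger corpora and richer rankers.

```python
import itertools
import math
from collections import Counter

# Tiny invented corpus standing in for human-written forecasts.
CORPUS = [
    "temperatures will fall slowly during the evening",
    "temperatures will fall sharply overnight",
    "winds will rise slowly during the morning",
]


def train_bigram_model(sentences):
    """Count unigrams and bigrams, with sentence boundary markers."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def log_prob(sentence, unigrams, bigrams):
    """Add-one smoothed bigram log-probability of a sentence."""
    vocab = len(unigrams)
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    return sum(
        math.log((bigrams[(w1, w2)] + 1) / (unigrams[w1] + vocab))
        for w1, w2 in zip(tokens, tokens[1:])
    )


def overgenerate():
    """Produce every combination of (hypothetical) verb and adverb choices."""
    verbs = ["fall", "drop", "decrease"]
    adverbs = ["slowly", "gradually"]
    for verb, adverb in itertools.product(verbs, adverbs):
        yield f"temperatures will {verb} {adverb} during the evening"


def rank(candidates, model):
    """Order candidates from most to least probable under the model."""
    unigrams, bigrams = model
    return sorted(candidates, key=lambda s: log_prob(s, unigrams, bigrams), reverse=True)


if __name__ == "__main__":
    print(rank(overgenerate(), train_bigram_model(CORPUS))[0])
```

The generator stays simple and permissive while the learnt model decides which wording reads most like the corpus.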

Evaluating System Building Cost
•  Belz and Kow (2010) evaluated implementations of SumTime-Mousam
–  The original handcrafted version
–  Probabilistic Context Free Grammars (PCFG)
–  Statistical Machine Translation
•  Human ratings favoured the original handcrafted system, while automated metrics favoured the automatically built systems

Some Discussion of Statistical Approaches

•  Statistical approaches can replicate a corpus well and reduce system
building cost
•  Hybrid statistical approaches have the potential to support domain
adaptability (Kondadadi et al. 2013)
•  It is unclear how to refine the output of model-based systems
•  Large amounts of aligned training data are normally required

The Story So Far…
•  NLG systems can produce high-quality texts
•  NLG systems solve business problems
•  Statistical NLG approaches are still evolving

The Future?

•  New learning and statistical models
•  Domain independence
•  Multilinguality
•  Targeted web content
•  Big data analysis

References
•  Belz A. and Kow E. (2010). Assessing the Trade-Off between System Building Cost and Output Quality in Data-to-Text Generation. In Krahmer, E., Theune, M. (eds.) Empirical Methods in Natural Language Generation, Vol. 5980 of Lecture Notes in Computer Science, Springer, pp. 180-200.
•  Binsted K. and Ritchie G. (1994). An Implemented Model of Punning Riddles. In Proceedings of the Twelfth National Conference on Artificial Intelligence (AAAI-94).
•  Callaway, C. B. and Lester, J. C. (2002). Narrative prose generation. Artificial Intelligence, 139(2):213–252.
•  Kondadadi R., Howald B. and Schilder F. (2013). A Statistical NLG Framework for Aggregated Planning and Realization. In Proceedings of ACL 2013 (Volume 1), pp. 1406-1415.
•  Law A., Freer Y., Hunter J., Logie R., McIntosh N. and Quinn J. (2005). A Comparison of Graphical and Textual Presentations of Time
Series Data to Support Medical Decision Making in the Neonatal Intensive Care Unit. Journal of Clinical Monitoring and Computing 19
(3): 183–94
•  Langan-Fox, J., Platania-Phung, C. and Waycott, J. (2006). Effects of advance organizers, mental models and abilities on task and recall performance using a mobile phone network. Applied Cognitive Psychology, 20(9):1143-1165.
•  Manurung, R., Ritchie, G., Pain, H., Waller, A., O’Mara, D., and Black, R. (2008). The construction of a pun generator for language skills
development. Applied Artificial Intelligence, 22(9):841–869.
•  Reiter, E., Sripada, S., Hunter, J., Yu, J., and Davy, I. (2005). Choosing words in computer-generated weather forecasts. Artificial Intelligence, 167(1-2):137-169.
•  Sripada S., Burnett N., Turner R., Mastin J. and Evans D. (2014). A Case Study: NLG meeting Weather Industry Demand for Quality and Quantity of Textual Weather Forecasts. In Proceedings of INLG-2014, Philadelphia, PA, USA, 19-21.
•  Turner R., Sripada S., Reiter E. and Davy I. (2006). Generating Spatio-Temporal Descriptions in Pollen Forecasts. In Proceedings of EACL-06, Trento, Italy, April 3-7.
•  Turner, R. (2009). Georeferenced data-to-text: techniques and application. PhD thesis, University of Aberdeen. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.509142


Images
•  “Snowwiper near Toronto, Canada”, by Jkransen, CC BY-SA 2.5 – Slide 15
•  "John's Weather Forecasting Stone”, by Tim Rogers, CC BY-NC-SA 2.0 – Slide 28
•  http://googleresearch.blogspot.de/2014/11/a-picture-is-worth-thousand-coherent.html – Slide 36
•  http://www.theguardian.com/media/shortcuts/2014/mar/16/could-robots-be-journalist-of-future – Slide 40


ARRIA GLOBAL HEADQUARTERS & ARRIA EMEA
London
ARRIA NLG CORPORATE HQ
Space One, 1 Beadon Road
Hammersmith
London W6 0EA
United Kingdom
+44-20-7100-4540
Aberdeen
ARRIA RESEARCH & DEVELOPMENT
Meston Building G05E
University of Aberdeen
Aberdeen AB24 3FX
United Kingdom
+44-1224-466-740
ARRIA AMERICAS
New York
ARRIA NLG (USA)
80 Broad Street,
6th Floor
New York, NY 1004
United States
+1-212-252-2185
ARRIA ASIA-PACIFIC
Auckland
ARRIA NLG (NZ)
Unit 16
150 Beaumont Street
Westhaven, Auckland 1010
New Zealand
+64-9-801-0035
Americas | EMEA | Asia Pacific
ARRIA.COM
ARRIA NLG plc is a company registered in England and Wales having its registered office at Space One, 1 Beadon Road, Hammersmith, London W6 0EA, United Kingdom with registered number 07812686
Company names and company logos are trademarks of their respective owners. Entire contents © 2015 by ARRIA NLG plc with all rights reserved.
