Running head: FUTURE OF NATURAL LANGUAGE PROCESSING 1
Speculations on the Future of Natural Language Processing
Christian A. Morse
San Jose State University, School of Library and Information Science
Information Retrieval LIBR 202-10
Professor Liu – Summer 2014
Abstract
This study considers the development of natural language processing systems within the broader framework of artificial intelligence. Its focus is to discern the historical and current trends in artificial intelligence and natural language processing, and to establish some of the theoretical foundations that have led the way in these fields. Ultimately, the study looks to identify crucial factors that offer insight into speculations about the future of natural language processing, and the impact this future will have on information retrieval and human societies as a whole.
Keywords: Artificial Intelligence, Cognitive Science, Information Retrieval,
Library and Information Science, Natural Language Processing, Technology
Introduction
Natural language processing (NLP) is a subfield of the much broader field
known as Artificial Intelligence (AI)1. The development of NLP in computerized
systems with capabilities on par with humans is generally considered to be the most
challenging task within the AI field2 (Kurzweil, 2005, p. 286). Over the years,
researchers have devised a number of methods to deal with the challenges of developing NLP systems in computers. The purpose of this study is to review some of the historical and theoretical dilemmas that AI researchers have faced in the development of NLP systems, and to trace the progress that has been made on these matters. Ultimately, this study will sketch out an understanding of the future capacities of NLP systems based on current developments within the field, as well as provide a brief discussion of how these capacities could impact the future of information retrieval (IR).
A Brief History of Natural Language Processing and Artificial Intelligence
While philosophical debates and perspectives regarding the development of
AI can be traced throughout human history, the modern development of AI as a
scientific field begins in the 1950s. Riding on the theoretical breakthroughs of a
handful of notable individuals,3 as well as the substantial public investment
1 AI involves the notion of computers being able to perform intelligent tasks in a
manner similar to humans. This includes reasoning, various forms of pattern
recognition, language use and understanding, and learning (Pinker, 2007, pp. 503-
504). Minsky (1986) notes, “there is no clear boundary between psychology and
Artificial Intelligence because the brain itself is a kind of machine” (p. 326).
2 The theoretical capability of computers to perform at the level of humans has been
referred to as “Strong AI” by the philosopher John Searle.
3 Some of the most important early contributors include Alan Turing, Kurt Gödel,
John von Neumann, and Claude Shannon. For various overviews of these individuals
provided during this period, some researchers predicted that the development of
robots with human-level intelligence was just around the corner. In the mid-1960s,
AI pioneer Herbert Simon stated, “Machines will be capable, within 20 years, of
doing any work a man can do” (as quoted in Kaku, 2014, p. 216). In 1967, Marvin
Minsky (cofounder of the AI lab at MIT) noted, “within a generation . . . the problem
of creating ‘artificial intelligence’ will substantially be solved” (as quoted in Kaku,
2014, pp. 216-217). However, most of the AI breakthroughs during this period
involved machines performing highly specialized tasks (e.g. playing checkers) and
little else (Kaku, 2014, p. 217). It was clear that some researchers had greatly
underestimated the problems around AI (with one of the main overlooked issues
being that most of human thinking is actually subconscious), and in the 1970s public
sector funding began to dry up (Kaku, 2014, p. 217).
In the 1980s, as computing power continued to increase and as Pentagon
planners dreamed about the prospects of robot soldiers, AI funding and research
began to rebound, hitting a billion dollars by 1985 (Kaku, 2014, p. 217). However,
progress during this period was still modest, and it wouldn’t be until the late 1990s
that some of the more substantive gains would be seen.4 Since this time, computing
power has continued to increase substantially5 and steady progress continues to be
and an understanding of their contributions, see Dyson (2012) and Gleick (2012).
For an informative Turing biography, see Hodges (2012). For a discussion of some
of Gödel’s contributions, see Hofstadter (1999); Goldstein (2006); Nagel and
Newman (2001).
4 One notable example includes IBM’s Deep Blue defeating world chess champion
Garry Kasparov in 1997. For an enlightening look at this event, see Silver (2012, pp.
262-293).
5 The sustained phenomenon over the last 40+ years of computer power doubling
every two years is known as Moore’s law (named after Gordon Moore). However,
made in a number of specialized AI systems.6 Kurzweil (2005) notes that NLP
remains the most difficult task for AI and that “no simple tricks, short of fully
mastering the principles of human intelligence, will allow a computerized system to
convincingly emulate human conversation, even if restricted to just text messages”
(p. 286). Pinker (2007) notes, “the main lesson of . . . AI research is that the hard
problems are easy and the easy problems are hard”7 (pp. 190-191). Pinker goes on
to note, “understanding a sentence is one of these hard easy problems” (p. 191).
Over the decades, researchers have incorporated a variety of strategies in order to
try to bridge the NLP gap between machines and humans.
Major Approaches and Perspectives for NLP Systems
Chowdhury (2010) notes that there are three major problems involved in
developing NLP systems for computers (p. 406). These problems involve the
system’s thought process, the representation and meaning of its inputs, and its general
knowledge of the world (Chowdhury, 2010, p. 406). To deal with these problems,
Chowdhury (2010) notes, “a natural language processing system requires three
there are reasons for pessimism regarding the extension of Moore’s law into the
future. For further discussion, see Kaku (2014, pp. 223-224) and Seung (2013, pp.
168-169).
6 Kurzweil (2005) notes a number of examples of industries where progress is being
made. These include various aspects of science and mathematics, military, finance,
medicine, space exploration, robotics, and speech and language, among other
industries (pp. 279-289).
7 In other words, it’s difficult to get AI systems to perform tasks that are easy for
humans (e.g. various forms of pattern recognition), and it’s relatively easy to get AI
systems to perform tasks that are difficult for humans (e.g. performing well at
chess).
kinds of knowledge: syntactic knowledge, semantic knowledge and pragmatic
knowledge”8 (p. 406).
With regard to syntactic analysis, the most prominent and influential work
has come from Noam Chomsky.9 In the mid-1950s, Chomsky introduced a formal classification of grammars known as the Chomsky hierarchy, and the examples of context-free
grammars (or type 2 grammars) that Chomsky provided have been highly
influential in developing linguistic models in computer science (Chowdhury, 2010,
p. 407). Chomsky also introduced the idea of ‘transformational grammar’, where
deeper structures represent the ‘meaning’ of a sentence’s surface manifestation
(Chowdhury, 2010, p. 408). This model was adopted in order to make up for some of the shortcomings of context-free grammars (Chowdhury, 2010, p. 408).
Chowdhury (2010) notes, “transformational grammar starts out with context-free
rules to build up the basics of the sentence, but then modifies the basic sentences
with the transformational rules” (p. 408). These transformations ultimately map the
‘deep structure’ (or ‘d-structure’) onto the sentence’s ‘surface structure’ (or ‘s-structure’) (Pinker, 2007, pp. 113-118).10 Since the 1980s, an entirely different
8 For a somewhat technical discussion of these problems with regard to AI, see Barr,
Cohen, and Feigenbaum (1989, pp. 193-239).
9 Chomsky’s influence over linguistics has been immense. For an informative and
easily digestible overview of some of Chomsky’s contributions, see Pinker (2007).
10 Much of the discussion around Chomsky’s theories can be quite technical, and
Chomsky himself has a tendency to overthrow his own theories every ten years or so. For a discussion of this, see Pinker (2013, pp. 228-229). There is also debate
among linguists about the need for deep structure. Chomsky himself has called the
need for deep structure into question (Pinker, 2007, p. 114). For an overview on
some of Chomsky’s current positions, see Chomsky (2012).
approach, featuring neural network models with no rules or modules, has also
become fashionable in developing AI systems.11
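The context-free (type 2) rules described above can be sketched in a few lines of code. The grammar below is an invented toy, not drawn from Chomsky or Chowdhury; it is meant only to show how such rules expand symbols into sentences:

```python
import random

# A toy context-free (type 2) grammar: each nonterminal maps to a list of
# possible right-hand sides (sequences of nonterminals and terminals).
GRAMMAR = {
    "S":   [["NP", "VP"]],
    "NP":  [["Det", "N"]],
    "VP":  [["V", "NP"]],
    "Det": [["the"]],
    "N":   [["dog"], ["cat"]],
    "V":   [["chased"]],
}

def generate(symbol="S"):
    """Expand a symbol by repeatedly applying context-free rules."""
    if symbol not in GRAMMAR:              # terminal: emit the word as-is
        return [symbol]
    rhs = random.choice(GRAMMAR[symbol])   # pick one production for this symbol
    out = []
    for sym in rhs:
        out.extend(generate(sym))          # rewrite independent of context
    return out

print(" ".join(generate()))  # e.g. "the dog chased the cat"
```

The key "context-free" property is visible in `generate`: each nonterminal is rewritten without reference to its neighbors, which is exactly the limitation that transformational rules were later introduced to address.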
Parsing has also been a major area of focus for NLP in AI systems. Chowdhury
(2010) notes that parsing is “a computational process that takes individual
sentences or connected texts and converts them to some representational structure
useful for further processing” (p. 409). Both top-down and bottom-up strategies
have been implemented in order to deal with some of the challenges around parsing
(Chowdhury, 2010, pp. 409-414). The difficulty for AI systems in this area has to do
with decision-making. Pinker (2007) notes that “the memory part is easy for
computers and hard for people, and the decision-making part is easy for people . . .
and hard for computers” (p. 200). Difficulties around decision-making have been a
classic problem in the AI field.
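As a minimal sketch of the top-down strategy mentioned above, the following recursive-descent recognizer expands the start symbol downward until it either matches the input words or fails. The grammar and lexicon are invented for the example; real parsers must also handle ambiguity and backtracking, which this sketch omits:

```python
# A minimal top-down (recursive-descent) recognizer for a toy grammar:
#   S -> NP VP ; NP -> Det N ; VP -> V NP
# Each rule here has a single right-hand side, so no backtracking is needed.
LEXICON = {"Det": {"the"}, "N": {"dog", "cat"}, "V": {"chased"}}
RULES = {"S": ["NP", "VP"], "NP": ["Det", "N"], "VP": ["V", "NP"]}

def parse(symbol, words, pos):
    """Try to match `symbol` starting at words[pos]; return the new
    position on success, or None on failure."""
    if symbol in LEXICON:                     # preterminal: consume one word
        if pos < len(words) and words[pos] in LEXICON[symbol]:
            return pos + 1
        return None
    for sym in RULES[symbol]:                 # expand top-down, left to right
        pos = parse(sym, words, pos)
        if pos is None:
            return None
    return pos

def accepts(sentence):
    words = sentence.split()
    return parse("S", words, 0) == len(words)

print(accepts("the dog chased the cat"))  # True
print(accepts("chased the dog"))          # False
```

A bottom-up strategy would instead start from the words and combine them into larger constituents; both run into the decision-making problem the moment the grammar offers more than one applicable rule.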
Semantic analysis is another crucial area of research in NLP systems.
Chowdhury (2010) notes, “all syntactic analysis systems must use semantic
knowledge to eliminate ambiguities that cannot be resolved by only structural
considerations” (p. 414). This means that a system needs to have a knowledge base
by which it can discern statements in natural language (Chowdhury, 2010, p. 414).
Major conceptual contributions have included the development of the logical system
known as predicate calculus,12 the development of semantic networks,13 case
11 A prominent example of this is found in the work of Rumelhart and McClelland
(1986). For a critique of their work, see Pinker (2000, pp. 103-119; 2013, pp. 84-
101).
12 This system was developed by the mathematician Gottlob Frege.
13 For a discussion of semantic networks, see Barr, Cohen, and Feigenbaum (1989).
grammar, frames,14 and conceptual dependency (Chowdhury, 2010, pp. 416-424).
While all of these approaches have been useful in AI research, they also contain a
variety of limitations, with no single approach being able to solve all of the
underlying issues.
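One of the approaches listed above, the semantic network, can be illustrated with a toy knowledge base. The concepts and relations below are invented for the example, but they show how labeled links let a system infer facts it was never explicitly given:

```python
# A toy semantic network: nodes are concepts, edges are labeled relations.
# Properties not found on a concept are inherited by following "is-a" links.
NETWORK = {
    "canary": {"is-a": "bird", "color": "yellow"},
    "bird":   {"is-a": "animal", "can": "fly"},
    "animal": {"can": "move"},
}

def lookup(concept, relation):
    """Find `relation` on the concept, inheriting via is-a links."""
    while concept is not None:
        node = NETWORK.get(concept, {})
        if relation in node:
            return node[relation]
        concept = node.get("is-a")  # climb one level up the hierarchy
    return None

print(lookup("canary", "can"))    # "fly" (inherited from bird)
print(lookup("canary", "color"))  # "yellow" (stored directly)
```

The inheritance step is the payoff: the network never states that a canary can fly, yet the system derives it. It is also the source of the approach's limitations, since exceptions (a penguin is a bird that cannot fly) require extra machinery.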
Pragmatic knowledge is also important in the development of NLP systems.
Chowdhury (2010) notes, “pragmatic knowledge is useful because it helps eliminate
ambiguities and complete semantic interpretation. Methods such as scripts, plans
and goals have been developed for representing pragmatic knowledge about
everyday life” (p. 424). Roger Schank is notable for having developed scripts as a
means to organize the knowledge necessary in order to understand various
situations in the world15 (Chowdhury, 2010, p. 424). Minsky (1986) describes a
script as “a sequence of actions produced so automatically that it can be performed
without disturbing the activities of many other agencies” (p. 331). Much of the
theoretical work in this area can be quite complex. Plans and goals are also crucial
components to the development of NLP in AI systems. Chowdhury (2010) notes,
“there is a fine line between the point where the scripts leave off and plans begin”
(p. 427). Chowdhury (2010) goes on to note that for a script, “the sequence of
actions is automatic, and thus very little guidance is needed. With plans, however,
little information is needed to specify the goals, but much detail must be given in
14 For a discussion of frames, see Minsky (1986, pp. 243-272).
15 Minsky (1986) relates the notion of scripts to human cognition, noting “the people
we call ‘experts’ seem to exercise their special skills with scarcely any thought at all .
. . . Perhaps when we ‘practice’ to improve our skills, we’re mainly building simpler
scripts that don’t engage so many agencies” (p. 137).
order to indicate how to achieve each goal”16 (pp. 427-428). AI researchers have
been working with all of these basic theories and concepts in order to try to build
better NLP capacities, and ultimately smarter machines.
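The script idea can be sketched schematically. The restaurant sequence below is a simplified nod to Schank's classic example, not his actual representation; the point is that a stereotyped sequence lets a system infer events a text leaves unstated:

```python
# A schematic "script": a stereotyped, automatic sequence of actions for a
# familiar situation. The restaurant scenes below are illustrative only.
RESTAURANT_SCRIPT = ["enter", "be seated", "order", "eat", "pay", "leave"]

def fill_gaps(observed):
    """Given a few observed events from a story, return the full expected
    sequence, assuming the script's unobserved steps also occurred."""
    assert all(event in RESTAURANT_SCRIPT for event in observed)
    return RESTAURANT_SCRIPT  # the script supplies the missing steps

# "John ordered and paid" -> the script licenses the inference that he
# also entered, was seated, ate, and left.
print(fill_gaps(["order", "pay"]))
```

Plans and goals pick up where this breaks down: when the input does not match any stored script, the system must instead reason about what sequence of actions would achieve a stated goal.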
Current Developments in AI and NLP Systems
In some ways, AI and NLP systems have come a long way in recent years.
Major aspects of modern civilization now fundamentally depend upon AI systems,
with further advancements on the way17 (Kurzweil, 2013). One notable achievement
in NLP was in 2011, when IBM’s Watson was able to defeat the two most
accomplished Jeopardy! champions of all time.18 Other technologies, like the iPhone
personal assistant, Siri, have been generally well received on the consumer market.
Kurzweil (2013) notes, “you can pretty much ask Siri to do anything that a self-
respecting smartphone should be capable of doing . . . and most of the time Siri will
comply” (p. 161). These technologies often work by combining different theoretical
approaches within AI research. Kurzweil (2013) notes that “the methods used for
understanding natural language are very similar to hierarchical hidden Markov
models [HHMM], and indeed HHMM itself is commonly used” (p. 162). Kurzweil
(2013) goes on to note that “they all involve hierarchies of linear sequences where
each element has a weight, connections that are self-adapting, and an overall system
that self-organizes based on learning data. Usually learning continues during actual
16 For examples, see Chowdhury (2010, pp. 426-428).
17 Examples include cars that drive themselves, and wristwatch devices that provide
medical advice, among other developments (Kurzweil, 2013; Kaku, 2014).
18 Kaku (2014) notes, “Watson can process data at the astonishing rate of five
hundred gigabytes per second (or the equivalent of a million books per second) with
sixteen trillion bytes of RAM memory” (p. 214). Kaku (2014) goes on to note, “It also
has access to two hundred million pages of material in its memory, including the
entire storehouse of knowledge within Wikipedia” (p. 214).
use of the system”19 (p. 162). The other approach that is used in these systems
involves hand-built rules (Kurzweil, 2013, p. 164). Kurzweil (2013) notes that,
“hand-built rules work well for a core of common basic knowledge. For translations
of short passages, this approach often provides more accurate results” (p. 164). By
combining these two basic approaches, researchers and innovators have been able
to develop the world’s most cutting edge NLP systems.20 However, self-organizing
machines that learn by acquiring large amounts of statistical data also have
significant limitations. Minsky (2006) notes that these systems are often useful, but
they are not very clever “because they use numerical ways to represent all the
knowledge they get” (p. 180). Minsky (2006) goes on to note that until they are
equipped “with higher reflective levels, they won’t be able to represent the concepts
they’d need for understanding what those numbers mean”21 (p. 180). All of this gets
at some of the fundamental dilemmas that have been present in AI from the beginning.
19 For more on HHMM, see Kurzweil (2013, pp. 141-146). For a discussion on
hidden Markov models and neural networks, see Jordan and Bishop (1996).
20 There are many variations by which these approaches may be implemented.
IBM’s Watson has hundreds of individual systems that are regulated by an expert
manager known as UIMA (Unstructured Information Management Architecture).
This expert manager works to integrate the results of the individual systems
(Kurzweil, 2013, p. 167).
21 Minsky (2006) sees the popularity of some of these systems as slowing down the
progress for finding “higher-level ideas about human psychological machinery” (p.
290). It is also worth noting that self-organizing machines don’t produce systems
that are organized in the way that brains are. If a single transistor fails, it impacts the
whole system, whereas brains are able to deal with significant internal changes. In
other words, errors are far more harmful to computers than they are to brains.
Minsky (2006) notes, “it will be difficult for a machine to keep developing—unless it
first evolves ways to protect itself against changes that cause bad side effects” (p.
181). Minsky (2006) suggests that a great way to accomplish this in engineering and
biology is to “split the whole system into parts that then can evolve more
independently. This surely is why all living things evolved to become assemblies of
separate parts [i.e. organs] . . . each of which have comparatively few connections to
other parts” (p. 181).
While researchers have made great strides in getting machines to do specific tasks
that humans can’t do, progress has been slow when it comes to developing
machines that have ‘common sense’. This problem has been particularly glaring
when it comes to creating NLP systems.
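The hierarchical hidden Markov models mentioned above build on the plain hidden Markov model. As a hedged sketch of that base ingredient, the forward algorithm below computes the probability of a word sequence under a two-state toy model; the states, words, and all probabilities are invented for illustration:

```python
# A minimal hidden Markov model: hidden part-of-speech states emit words.
# The forward algorithm sums over every hidden-state path that could have
# produced the observed sequence. All numbers here are made up.
STATES = ["noun", "verb"]
START = {"noun": 0.6, "verb": 0.4}
TRANS = {"noun": {"noun": 0.3, "verb": 0.7},
         "verb": {"noun": 0.8, "verb": 0.2}}
EMIT = {"noun": {"dog": 0.5, "runs": 0.1, "the": 0.4},
        "verb": {"dog": 0.1, "runs": 0.7, "the": 0.2}}

def forward(observations):
    """Return P(observations) under the model, summed over hidden paths."""
    # Initialize with the start distribution times the first emission.
    alpha = {s: START[s] * EMIT[s][observations[0]] for s in STATES}
    for obs in observations[1:]:
        # Each new alpha sums over all predecessor states, then emits.
        alpha = {s: sum(alpha[p] * TRANS[p][s] for p in STATES) * EMIT[s][obs]
                 for s in STATES}
    return sum(alpha.values())

print(forward(["dog", "runs"]))  # 0.1648
```

Minsky's criticism applies directly to this sketch: every piece of "knowledge" in the model is a number, and nothing in the system represents what those numbers mean.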
Speculations on the Future of NLP and AI
As previously mentioned, predictions by prominent researchers regarding
NLP and AI have at times been overly optimistic. Such predictions often fail to take
into account all of the relevant variables that go into determining the outcome of a
technology. Therefore, it is important to proceed with caution when addressing
these issues.
Prominent inventor and futurist Ray Kurzweil (2005; 2013) provides some
of the most optimistic predictions regarding the development of NLP in computers.
Kurzweil (2005) expects that by the end of the 2020s computers will be able “to
pass the Turing test, indicating intelligence indistinguishable from that of biological
humans”22 (p. 25). The fundamental concept behind Kurzweil’s optimism lies in the
notion of exponential growth. By taking into account the doubling of computing
power every couple of years over the last half-century (Moore’s law), and extending
the trend into the coming decades, Kurzweil expects computers to have enough
22 The Turing test was originally a thought experiment by English mathematician
Alan Turing. The idea involves an investigator engaging in a typed conversation with
a computer and a human. If the investigator can’t tell the difference between the
two, then the computer will have passed the Turing test and may be considered to
have full human capacities in regard to language (and presumably intelligence).
However, there are complications that arise with this thought experiment. For a
discussion, see Seung (2013, pp. 259-263).
computing power to rival human intelligence by 202923 (Kurzweil, 2013, p. 169).
However, there may be significant problems in carrying these trends out much
further into the future. Kaku (2014) notes that, “today the smallest layer of silicon in
your Pentium chip is about twenty atoms in width, and by 2020 that layer might be
five atoms across” (p. 223). The problem is that at this level of reality, the
Heisenberg uncertainty principle24 begins to take effect, causing chips to short-circuit (Kaku, 2014, pp. 223-224). Kaku (2014) notes that this “would generate
enough heat to fry an egg on it. So leakage and heat will eventually doom Moore’s
law, and a replacement will soon be necessary” (p. 224). Ideas for countering this problem include developing 3-D chips, as well as the development of 2-D parallel
processing. However, Kaku (2014) notes that heat generation “rises rapidly with the
height of the chip” and expanding into 2-D parallel processing may involve
challenges with regard to the discrepancies in growth between software and
computing power (p. 224). Kaku (2014) suggests that “these stopgap measures may
add years to Moore’s law” but they will ultimately give way to the inevitable
complications provided by quantum theory (p. 224). Kurzweil (2013) suggests that
23 Kurzweil goes significantly further than this in his notion of the Singularity.
Kurzweil (2005) predicts that by 2045, “the Singularity will allow us to transcend . . .
our bodies and brains. We will gain power over our fates . . . . We will live as long as
we want” and “we will fully understand human thinking and will vastly extend and
expand its reach” (p. 9). Unsurprisingly, many thinkers have called these claims into
question. However, historian Ian Morris (2014) notes that a 2012 survey of futurists
“found that the median date at which they anticipated a technological Singularity
was 2040, five years ahead of Kurzweil’s projection” (p. 381).
24 The Heisenberg uncertainty principle was a major theoretical breakthrough in
quantum theory. The basic idea is that an observer can’t discern the exact location
and momentum of an electron with certainty. It’s possible to know one or the other,
but not both. Back in the 1920s this created a major stir in the physics community
(Kaku, 2014, p. 330).
“when one paradigm runs out of steam . . . it creates research pressure to create the
next paradigm” (p. 255). However, it is not clear that any of the known
alternatives25 will be ready to go when the current electronics age winds down.
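The exponential-growth reasoning in this section amounts to simple arithmetic. The sketch below extrapolates transistor counts from the 1971 Intel 4004 (roughly 2,300 transistors), assuming a doubling every two years; as the discussion above cautions, the trend may not extend this far:

```python
# Back-of-the-envelope Moore's-law extrapolation: a doubling of transistor
# count every two years from a 1971 baseline (Intel 4004, ~2,300 transistors).
# Treat the output as rough illustration of exponential growth, not forecast.
def transistors(year, base_year=1971, base_count=2300, doubling_years=2):
    return base_count * 2 ** ((year - base_year) / doubling_years)

for year in (1971, 1991, 2011, 2029):
    print(year, f"{transistors(year):.2e}")
```

Twenty-nine doublings between 1971 and 2029 multiply the baseline by about half a billion, which is the scale of growth Kurzweil's 2029 prediction leans on.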
Nobel Prize-winning chemist Richard Smalley has noted, “When a
scientist says something is possible, they’re probably underestimating how long it
will take. But if they say it’s impossible, they’re probably wrong” (as quoted in
Morris, 2010, p. 593). Smalley’s observation seems to fit well with the trajectory of
technological advancement in NLP and ultimately in AI. While certain advancements
in NLP and AI have proven naysayers wrong, other advancements (e.g. Pinker’s
“hard easy problems”) have been slow in coming. It has only been very recently that
scientists have begun to build machines capable of carrying out tasks that most
humans take for granted.26 When considering the snail’s pace by which some of
these capabilities have been advancing, as well as the dilemmas and uncertainties in
projecting Moore’s law into the future, it seems unlikely that NLP in machines will
reach the capacities of humans before the latter decades of the twenty-first
century.27 However, it seems that a variety of highly specialized systems will
continue to be developed that exceed the capacities of humans.28
Conclusion
25 Many of the alternatives are not well developed at this point. Some of these
include molecular computing, quantum computing, and DNA computing, among
others (Kaku, 2014, p. 224).
26 For example, in 2012 scientists at Yale University built a robot that was able to
pass the mirror test (Kaku, 2014, p. 241).
27 There are too many variables to take into account to make a confident prediction
on this matter.
28 One notable example includes the ability of computers to drive cars. It is likely
that in the near future, computers will show better judgment at driving automobiles
than most humans.
The development of NLP systems has continually impacted the field of IR. As
advancements in NLP continue to progress, the gap between IR and NLP will likely
narrow. Recent research and development in IR systems has involved cross-
language information retrieval, machine translation, text mining, speech
recognition, information extraction, and question answering, among other examples
(Chowdhury, 2010, p. 432). Variations and combinations of the theories and
methods discussed in this study have been used to further the development of these
areas. Given the nature of how progress has been obtained in AI, it seems likely that
highly specialized systems will continue to advance at a pace far faster than systems
attempting to employ ‘common sense’ and other indispensable human traits. The
development of such ‘common sense’ systems would certainly revolutionize the
field of IR by combining the capacities of humans with the storage and retrieval
capacities of computers. It is this study’s conclusion that under the best-case
scenario, such systems are probably six or seven decades away. However, it is this
author’s view that there is little reason to believe that the development of ‘super
human’ intelligence is impossible.29 In the meantime, the development of
specialized AI and NLP systems will continue to have a profound impact on human
life.
29 For works that take into account this philosophical question, see Dennett (1996)
and Hofstadter (1999).
References
Barr, A., Cohen, P. R., & Feigenbaum, E. A. (1989). The handbook of Artificial
Intelligence: Volume IV. Reading, MA: Addison-Wesley Publishing Company,
Inc.
Chomsky, N. (2012). The science of language: Interviews with James McGilvray. New
York, NY: Cambridge University Press.
Chowdhury, G. G. (2010). Introduction to modern information retrieval (3rd ed.). New
York, NY: Neal-Schuman Publishers, Inc.
Dennett, D. C. (1996). Darwin’s dangerous idea: Evolution and the meanings of life.
New York, NY: Touchstone.
Dyson, G. (2012). Turing’s cathedral: The origins of the digital universe. New York,
NY: Vintage Books.
Gleick, J. (2012). The information: A history, a theory, a flood. New York, NY: Vintage
Books.
Goldstein, R. (2006). Incompleteness: The proof and paradox of Kurt Gödel. New York,
NY: W.W. Norton & Company, Inc.
Hodges, A. (2012). Alan Turing: The enigma. Princeton, NJ: Princeton University
Press.
Hofstadter, D. R. (1999). Gödel, Escher, Bach: An eternal golden braid. New York, NY:
Basic Books.
Jordan, M. I., & Bishop, C. M. (1996). Neural networks. ACM Computing Surveys, 28(1), 73-75.
Kaku, M. (2014). The future of the mind: The scientific quest to understand, enhance,
and empower the mind. New York, NY: Doubleday.
Kurzweil, R. (2005). The singularity is near: When humans transcend biology. New
York, NY: Viking.
Kurzweil, R. (2013). How to create a mind: The secret of human thought revealed.
New York, NY: Penguin Books.
Minsky, M. (1986). The society of mind. New York, NY: Simon & Schuster, Inc.
Minsky, M. (2006). The emotion machine: Commonsense thinking, artificial
intelligence, and the future of the human mind. New York, NY: Simon &
Schuster Paperbacks.
Morris, I. (2010). Why the west rules—for now: The patterns of history, and what they
reveal about the future. New York, NY: Farrar, Straus and Giroux.
Morris, I. (2014). War! What is it good for?: Conflict and the progress of civilization
from primates to robots. New York, NY: Farrar, Straus and Giroux.
Nagel, E., & Newman, J. R. (2001). Gödel’s proof (Revised edition). New York, NY:
New York University Press.
Pinker, S. (2000). Words and rules: The ingredients of language. London, UK: The
Softback Preview.
Pinker, S. (2007). The language instinct: How the mind creates language. New York,
NY: HarperCollins Publishers.
Pinker, S. (2013). Language, cognition, and human nature: Selected articles. New
York, NY: Oxford University Press.
Rumelhart, D. E., & McClelland, J. L. (1986). On learning the past tenses of English verbs. In J. L. McClelland, D. E. Rumelhart, & the PDP Research Group (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition (Vol. 2: Psychological and biological models). Cambridge, MA: MIT Press. Retrieved from http://mind.cog.jhu.edu/faculty/smolensky/050.326-626/Foundations%20Readings%20PDFs/Rumelhart&McClelland-1986-PastTense.pdf
Seung, S. (2013). Connectome: How the brain’s wiring makes us who we are. New
York, NY: Mariner Books.
Silver, N. (2012). The signal and the noise: Why so many predictions fail – but some
don’t. New York, NY: The Penguin Press.
Three processes of memory
 
Memory processes
Memory processesMemory processes
Memory processes
 
Info processing, reaction time, memory 2014
Info processing, reaction time, memory 2014Info processing, reaction time, memory 2014
Info processing, reaction time, memory 2014
 
Introducing Natural Language Processing - APIdays Mediterranea 2015
Introducing Natural Language Processing - APIdays Mediterranea 2015Introducing Natural Language Processing - APIdays Mediterranea 2015
Introducing Natural Language Processing - APIdays Mediterranea 2015
 
Intro to nlp
Intro to nlpIntro to nlp
Intro to nlp
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
 
natural language processing help at myassignmenthelp.net
natural language processing  help at myassignmenthelp.netnatural language processing  help at myassignmenthelp.net
natural language processing help at myassignmenthelp.net
 
Natural Language Processing: L03 maths fornlp
Natural Language Processing: L03 maths fornlpNatural Language Processing: L03 maths fornlp
Natural Language Processing: L03 maths fornlp
 
PREDICT THE FUTURE , MACHINE LEARNING & BIG DATA
PREDICT THE FUTURE , MACHINE LEARNING & BIG DATAPREDICT THE FUTURE , MACHINE LEARNING & BIG DATA
PREDICT THE FUTURE , MACHINE LEARNING & BIG DATA
 
Seminar on optical networks
Seminar on optical networksSeminar on optical networks
Seminar on optical networks
 
The Effect of Presence and Type of Encoding Cue on Memory-Erica Starr
The Effect of Presence and Type of Encoding Cue on Memory-Erica StarrThe Effect of Presence and Type of Encoding Cue on Memory-Erica Starr
The Effect of Presence and Type of Encoding Cue on Memory-Erica Starr
 
The Effect of Presence and Type of Encoding Cue on Memory-Erica Starr
The Effect of Presence and Type of Encoding Cue on Memory-Erica StarrThe Effect of Presence and Type of Encoding Cue on Memory-Erica Starr
The Effect of Presence and Type of Encoding Cue on Memory-Erica Starr
 
Natural Language Processing in Alternative and Augmentative Communication
Natural Language Processing in Alternative and Augmentative CommunicationNatural Language Processing in Alternative and Augmentative Communication
Natural Language Processing in Alternative and Augmentative Communication
 
A sentient network - How High-velocity Data and Machine Learning will Shape t...
A sentient network - How High-velocity Data and Machine Learning will Shape t...A sentient network - How High-velocity Data and Machine Learning will Shape t...
A sentient network - How High-velocity Data and Machine Learning will Shape t...
 

Similar to Morse, Christian - LIBR 202 - The Future of Natural Language Processing

Artificial Intelligence - Past, Present and Future
Artificial Intelligence - Past, Present and FutureArtificial Intelligence - Past, Present and Future
Artificial Intelligence - Past, Present and FutureGrigory Sapunov
 
Continuous Learning Algorithms - a Research Proposal Paper
Continuous Learning Algorithms - a Research Proposal PaperContinuous Learning Algorithms - a Research Proposal Paper
Continuous Learning Algorithms - a Research Proposal Papertjb910
 
How Cognitive Science Has Influenced the Applied Science of HCI “The evolutio...
How Cognitive Science Has Influenced the Applied Science of HCI “The evolutio...How Cognitive Science Has Influenced the Applied Science of HCI “The evolutio...
How Cognitive Science Has Influenced the Applied Science of HCI “The evolutio...IOSR Journals
 
A Review Of Artificial Intelligence
A Review Of Artificial IntelligenceA Review Of Artificial Intelligence
A Review Of Artificial IntelligenceMandy Brown
 
Applications of AI
Applications of AIApplications of AI
Applications of AIUtphala P
 
PHIL20031 Applied Philosophy.docx
PHIL20031 Applied Philosophy.docxPHIL20031 Applied Philosophy.docx
PHIL20031 Applied Philosophy.docxwrite5
 
Steps Towards a History of Ethnomethodology in HCI
Steps Towards a History of Ethnomethodology in HCI Steps Towards a History of Ethnomethodology in HCI
Steps Towards a History of Ethnomethodology in HCI butest
 
The era of artificial intelligence
The era of artificial intelligenceThe era of artificial intelligence
The era of artificial intelligencePrajjwal Kushwaha
 
Ai is the new black
Ai is the new black Ai is the new black
Ai is the new black Aslhannal3
 
Lect#1 (Artificial Intelligence )
Lect#1 (Artificial Intelligence )Lect#1 (Artificial Intelligence )
Lect#1 (Artificial Intelligence )Zeeshan_Jadoon
 
Human level artificial general intelligence agi
Human level artificial general intelligence agiHuman level artificial general intelligence agi
Human level artificial general intelligence agiKarlos Svoboda
 
Notation as a basis for societal evolution
Notation as a basis for societal evolutionNotation as a basis for societal evolution
Notation as a basis for societal evolutionJeff Long
 

Similar to Morse, Christian - LIBR 202 - The Future of Natural Language Processing (20)

IS
ISIS
IS
 
Artificial intel
Artificial intelArtificial intel
Artificial intel
 
introduction to ai
introduction to aiintroduction to ai
introduction to ai
 
Artificial Intelligence - Past, Present and Future
Artificial Intelligence - Past, Present and FutureArtificial Intelligence - Past, Present and Future
Artificial Intelligence - Past, Present and Future
 
Continuous Learning Algorithms - a Research Proposal Paper
Continuous Learning Algorithms - a Research Proposal PaperContinuous Learning Algorithms - a Research Proposal Paper
Continuous Learning Algorithms - a Research Proposal Paper
 
Language 2 knowledge
Language 2 knowledgeLanguage 2 knowledge
Language 2 knowledge
 
How Cognitive Science Has Influenced the Applied Science of HCI “The evolutio...
How Cognitive Science Has Influenced the Applied Science of HCI “The evolutio...How Cognitive Science Has Influenced the Applied Science of HCI “The evolutio...
How Cognitive Science Has Influenced the Applied Science of HCI “The evolutio...
 
A Review Of Artificial Intelligence
A Review Of Artificial IntelligenceA Review Of Artificial Intelligence
A Review Of Artificial Intelligence
 
Ai notes
Ai notesAi notes
Ai notes
 
AI_Lecture_1.pptx
AI_Lecture_1.pptxAI_Lecture_1.pptx
AI_Lecture_1.pptx
 
Applications of AI
Applications of AIApplications of AI
Applications of AI
 
PHIL20031 Applied Philosophy.docx
PHIL20031 Applied Philosophy.docxPHIL20031 Applied Philosophy.docx
PHIL20031 Applied Philosophy.docx
 
Steps Towards a History of Ethnomethodology in HCI
Steps Towards a History of Ethnomethodology in HCI Steps Towards a History of Ethnomethodology in HCI
Steps Towards a History of Ethnomethodology in HCI
 
The era of artificial intelligence
The era of artificial intelligenceThe era of artificial intelligence
The era of artificial intelligence
 
Ai is the new black
Ai is the new black Ai is the new black
Ai is the new black
 
Lect#1 (Artificial Intelligence )
Lect#1 (Artificial Intelligence )Lect#1 (Artificial Intelligence )
Lect#1 (Artificial Intelligence )
 
AI R16 - UNIT-1.pdf
AI R16 - UNIT-1.pdfAI R16 - UNIT-1.pdf
AI R16 - UNIT-1.pdf
 
Human level artificial general intelligence agi
Human level artificial general intelligence agiHuman level artificial general intelligence agi
Human level artificial general intelligence agi
 
Notation as a basis for societal evolution
Notation as a basis for societal evolutionNotation as a basis for societal evolution
Notation as a basis for societal evolution
 
Artificial intelligence
Artificial intelligenceArtificial intelligence
Artificial intelligence
 

within the field, as well as provide a brief discussion of how these capacities could impact the future of information retrieval (IR).

A Brief History of Natural Language Processing and Artificial Intelligence

While philosophical debates and perspectives regarding the development of AI can be traced throughout human history, the modern development of AI as a scientific field begins in the 1950s. Riding on the theoretical breakthroughs of a handful of notable individuals,3 as well as the substantial public investment

1 AI involves the notion of computers being able to perform intelligent tasks in a manner similar to humans. This includes reasoning, various forms of pattern recognition, language use and understanding, and learning (Pinker, 2007, pp. 503-504). Minsky (1986) notes, "there is no clear boundary between psychology and Artificial Intelligence because the brain itself is a kind of machine" (p. 326).

2 The theoretical capability of computers to perform at the level of humans has been referred to as "Strong AI" by the philosopher John Searle.
provided during this period, some researchers predicted that the development of robots with human-level intelligence was just around the corner. In the mid-1960s, AI pioneer Herbert Simon stated, "Machines will be capable, within 20 years, of doing any work a man can do" (as quoted in Kaku, 2014, p. 216). In 1967, Marvin Minsky (cofounder of the AI lab at MIT) noted, "within a generation . . . the problem of creating 'artificial intelligence' will substantially be solved" (as quoted in Kaku, 2014, pp. 216-217). However, most of the AI breakthroughs during this period involved machines performing highly specialized tasks (e.g., playing checkers) and little else (Kaku, 2014, p. 217). It was clear that some researchers had greatly underestimated the problems around AI (with one of the main overlooked issues being that most of human thinking is actually subconscious), and in the 1970s public sector funding began to dry up (Kaku, 2014, p. 217). In the 1980s, as computing power continued to increase and as Pentagon planners dreamed about the prospects of robot soldiers, AI funding and research began to rebound, hitting a billion dollars by 1985 (Kaku, 2014, p. 217). However, progress during this period was still modest, and it wouldn't be until the late 1990s that some of the more substantive gains would be seen.4 Since this time, computing power has continued to increase substantially5 and steady progress continues to be

3 Some of the most important early contributors include Alan Turing, Kurt Gödel, John von Neumann, and Claude Shannon. For various overviews of these individuals and an understanding of their contributions, see Dyson (2012) and Gleick (2012). For an informative Turing biography, see Hodges (2012). For a discussion of some of Gödel's contributions, see Hofstadter (1999); Goldstein (2006); Nagel and Newman (2001).

4 One notable example includes IBM's Deep Blue defeating world chess champion Garry Kasparov in 1997. For an enlightening look at this event, see Silver (2012, pp. 262-293).
made in a number of specialized AI systems.6 Kurzweil (2005) notes that NLP remains the most difficult task for AI and that "no simple tricks, short of fully mastering the principles of human intelligence, will allow a computerized system to convincingly emulate human conversation, even if restricted to just text messages" (p. 286). Pinker (2007) notes, "the main lesson of . . . AI research is that the hard problems are easy and the easy problems are hard"7 (pp. 190-191). Pinker goes on to note, "understanding a sentence is one of these hard easy problems" (p. 191). Over the decades, researchers have incorporated a variety of strategies in order to try to bridge the NLP gap between machines and humans.

Major Approaches and Perspectives for NLP Systems

Chowdhury (2010) notes that there are three major problems involved in developing NLP systems for computers (p. 406). These problems involve the system's thought process, its representation and meaning of inputs, and its general knowledge of the world (Chowdhury, 2010, p. 406). To deal with these problems, Chowdhury (2010) notes, "a natural language processing system requires three

5 The sustained phenomenon over the last 40+ years of computing power doubling every two years is known as Moore's law (named after Gordon Moore). However, there are reasons for pessimism regarding the extension of Moore's law into the future. For further discussion, see Kaku (2014, pp. 223-224) and Seung (2013, pp. 168-169).

6 Kurzweil (2005) notes a number of examples of industries where progress is being made. These include various aspects of science and mathematics, military, finance, medicine, space exploration, robotics, and speech and language, among other industries (pp. 279-289).

7 In other words, it is difficult to get AI systems to perform tasks that are easy for humans (e.g., various forms of pattern recognition), and it is relatively easy to get AI systems to perform tasks that are difficult for humans (e.g., performing well at chess).
kinds of knowledge: syntactic knowledge, semantic knowledge and pragmatic knowledge"8 (p. 406).

With regard to syntactic analysis, the most prominent and influential work has come from Noam Chomsky.9 In the mid-1950s, Chomsky introduced a formal rule system known as the Chomsky hierarchy, and the examples of context-free grammars (or type 2 grammars) that Chomsky provided have been highly influential in developing linguistic models in computer science (Chowdhury, 2010, p. 407). Chomsky also introduced the idea of 'transformational grammar', where deeper structures represent the 'meaning' of a sentence's surface manifestation (Chowdhury, 2010, p. 408). This model was adopted in order to make up for some of the shortcomings around context-free grammars (Chowdhury, 2010, p. 408). Chowdhury (2010) notes, "transformational grammar starts out with context-free rules to build up the basics of the sentence, but then modifies the basic sentences with the transformational rules" (p. 408). These transformations ultimately map the 'deep structure' (or 'd-structure') onto the sentence's 'surface structure' (or 's-structure') (Pinker, 2007, pp. 113-118).10 Since the 1980s, an entirely different

8 For a somewhat technical discussion of these problems with regard to AI, see Barr, Cohen, and Feigenbaum (1989, pp. 193-239).

9 Chomsky's influence over linguistics has been immense. For an informative and easily digestible overview of some of Chomsky's contributions, see Pinker (2007).

10 Much of the discussion around Chomsky's theories can be quite technical, and Chomsky himself has a tendency to overthrow his own theories every ten years or so. For a discussion of this, see Pinker (2013, pp. 228-229). There is also debate among linguists about the need for deep structure; Chomsky himself has called the need for deep structure into question (Pinker, 2007, p. 114). For an overview of some of Chomsky's current positions, see Chomsky (2012).
approach, featuring neural network models with no rules or modules, has also become fashionable in developing AI systems.11

Parsing has also been a major area of focus of NLP in AI systems. Chowdhury (2010) notes that parsing is "a computational process that takes individual sentences or connected texts and converts them to some representational structure useful for further processing" (p. 409). Both top-down and bottom-up strategies have been implemented in order to deal with some of the challenges around parsing (Chowdhury, 2010, pp. 409-414). The difficulty for AI systems in this area has to do with decision-making. Pinker (2007) notes that "the memory part is easy for computers and hard for people, and the decision-making part is easy for people . . . and hard for computers" (p. 200). Difficulties around decision-making have been a classic problem in the AI field.

Semantic analysis is another crucial area of research in NLP systems. Chowdhury (2010) notes, "all syntactic analysis systems must use semantic knowledge to eliminate ambiguities that cannot be resolved by only structural considerations" (p. 414). This means that a system needs to have a knowledge base by which it can discern statements in natural language (Chowdhury, 2010, p. 414). Major conceptual contributions have included the development of the logical system known as predicate calculus,12 the development of semantic networks,13 case

11 A prominent example of this is found in the work of Rumelhart and McClelland (1986). For a critique of their work, see Pinker (2000, pp. 103-119; 2013, pp. 84-101).

12 This system was developed by the mathematician Gottlob Frege.

13 For a discussion of semantic networks, see Barr, Cohen, and Feigenbaum (1989).
grammar, frames,14 and conceptual dependency (Chowdhury, 2010, pp. 416-424). While all of these approaches have been useful in AI research, they also contain a variety of limitations, with no single approach being able to solve all of the underlying issues.

Pragmatic knowledge is also important in the development of NLP systems. Chowdhury (2010) notes, "pragmatic knowledge is useful because it helps eliminate ambiguities and complete semantic interpretation. Methods such as scripts, plans and goals have been developed for representing pragmatic knowledge about everyday life" (p. 424). Roger Schank is notable for having developed scripts as a means to organize the knowledge necessary in order to understand various situations in the world15 (Chowdhury, 2010, p. 424). Minsky (1986) describes a script as "a sequence of actions produced so automatically that it can be performed without disturbing the activities of many other agencies" (p. 331). Much of the theoretical work in this area can be quite complex. Plans and goals are also crucial components of the development of NLP in AI systems. Chowdhury (2010) notes, "there is a fine line between the point where the scripts leave off and plans begin" (p. 427). Chowdhury (2010) goes on to note that for a script, "the sequence of actions is automatic, and thus very little guidance is needed. With plans, however, little information is needed to specify the goals, but much detail must be given in

14 For a discussion of frames, see Minsky (1986, pp. 243-272).

15 Minsky (1986) relates the notion of scripts to human cognition, noting "the people we call 'experts' seem to exercise their special skills with scarcely any thought at all . . . . Perhaps when we 'practice' to improve our skills, we're mainly building simpler scripts that don't engage so many agencies" (p. 137).
order to indicate how to achieve each goal"16 (pp. 427-428). AI researchers have been working with all of these basic theories and concepts in order to try to build better NLP capacities, and ultimately smarter machines.

Current Developments in AI and NLP Systems

In some ways AI and NLP systems have come a long way in recent years. Major aspects of modern civilization now fundamentally depend upon AI systems, with further advancements on the way17 (Kurzweil, 2013). One notable achievement in NLP came in 2011, when IBM's Watson was able to defeat the two most accomplished Jeopardy! champions of all time.18 Other technologies, like the iPhone personal assistant, Siri, have been generally well received on the consumer market. Kurzweil (2013) notes, "you can pretty much ask Siri to do anything that a self-respecting smartphone should be capable of doing . . . and most of the time Siri will comply" (p. 161). These technologies often work by combining different theoretical approaches within AI research. Kurzweil (2013) notes that "the methods used for understanding natural language are very similar to hierarchical hidden Markov models [HHMM], and indeed HHMM itself is commonly used" (p. 162). Kurzweil (2013) goes on to note that "they all involve hierarchies of linear sequences where each element has a weight, connections that are self-adapting, and an overall system that self-organizes based on learning data. Usually learning continues during actual

16 For examples, see Chowdhury (2010, pp. 426-428).

17 Examples include cars that drive themselves, and wristwatch devices that provide medical advice, among other developments (Kurzweil, 2013; Kaku, 2014).

18 Kaku (2014) notes, "Watson can process data at the astonishing rate of five hundred gigabytes per second (or the equivalent of a million books per second) with sixteen trillion bytes of RAM memory" (p. 214). Kaku (2014) goes on to note, "It also has access to two hundred million pages of material in its memory, including the entire storehouse of knowledge within Wikipedia" (p. 214).
use of the system"19 (p. 162). The other approach that is used in these systems involves hand-built rules (Kurzweil, 2013, p. 164). Kurzweil (2013) notes that "hand-built rules work well for a core of common basic knowledge. For translations of short passages, this approach often provides more accurate results" (p. 164). By combining these two basic approaches, researchers and innovators have been able to develop the world's most cutting-edge NLP systems.20 However, self-organizing machines that learn by acquiring large amounts of statistical data also have significant limitations. Minsky (2006) notes that these systems are often useful, but they are not very clever "because they use numerical ways to represent all the knowledge they get" (p. 180). Minsky (2006) goes on to note that until they are equipped "with higher reflective levels, they won't be able to represent the concepts they'd need for understanding what those numbers mean"21 (p. 180). All of this gets at some of the fundamental dilemmas that have been in AI from the beginning.

19 For more on HHMM, see Kurzweil (2013, pp. 141-146). For a discussion of hidden Markov models and neural networks, see Jordan and Bishop (1996).

20 There are many variations by which these approaches may be implemented. IBM's Watson has hundreds of individual systems that are regulated by an expert manager known as UIMA (Unstructured Information Management Architecture). This expert manager works to integrate the results of the individual systems (Kurzweil, 2013, p. 167).

21 Minsky (2006) sees the popularity of some of these systems as slowing down the progress of finding "higher-level ideas about human psychological machinery" (p. 290). It is also worth noting that self-organizing machines don't produce systems that are organized in the way that brains are. If a single transistor fails it impacts the whole system, whereas brains are able to deal with significant internal changes. In other words, errors are far more harmful to computers than they are to brains. Minsky (2006) notes, "it will be difficult for a machine to keep developing—unless it first evolves ways to protect itself against changes that cause bad side effects" (p. 181). Minsky (2006) suggests that a great way to accomplish this in engineering and biology is to "split the whole system into parts that then can evolve more independently. This surely is why all living things evolved to become assemblies of separate parts [i.e. organs] . . . each of which have comparatively few connections to other parts" (p. 181).
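The hierarchical hidden Markov models described above generalize the ordinary (flat) hidden Markov model, in which hidden states emit observed words and the most probable hidden sequence is recovered by dynamic programming. The sketch below illustrates only the flat case with the standard Viterbi algorithm; the two-state part-of-speech model and all of its probabilities are invented for illustration and do not reflect any commercial system's actual implementation.

```python
# Toy hidden Markov model: two hidden part-of-speech states tagging a
# three-word sentence. All probabilities are invented for illustration.
states = ["NOUN", "VERB"]
start_p = {"NOUN": 0.6, "VERB": 0.4}
trans_p = {"NOUN": {"NOUN": 0.3, "VERB": 0.7},
           "VERB": {"NOUN": 0.6, "VERB": 0.4}}
emit_p = {"NOUN": {"dogs": 0.5, "bite": 0.1, "people": 0.4},
          "VERB": {"dogs": 0.1, "bite": 0.7, "people": 0.2}}

def viterbi(words):
    """Return the most probable hidden-state sequence for the observed words."""
    # best[t][s] = (probability of best path ending in state s at time t,
    #               backpointer to the previous state on that path)
    best = [{s: (start_p[s] * emit_p[s][words[0]], None) for s in states}]
    for word in words[1:]:
        prev = best[-1]
        col = {}
        for s in states:
            # Choose the predecessor state that maximizes the path probability.
            p, back = max((prev[r][0] * trans_p[r][s] * emit_p[s][word], r)
                          for r in states)
            col[s] = (p, back)
        best.append(col)
    # Trace back from the most probable final state.
    last = max(states, key=lambda s: best[-1][s][0])
    path = [last]
    for col in reversed(best[1:]):
        path.append(col[path[-1]][1])
    return list(reversed(path))

print(viterbi(["dogs", "bite", "people"]))  # → ['NOUN', 'VERB', 'NOUN']
```

An HHMM extends this idea by letting each state expand into a sub-model of its own, which is one way to read Kurzweil's description of "hierarchies of linear sequences"; in practice the probabilities are also learned from data rather than fixed by hand as here.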
While researchers have made great strides in getting machines to do specific tasks that humans can't do, progress has been slow when it comes to developing machines that have 'common sense'. This problem has been particularly glaring when it comes to creating NLP systems.

Speculations on the Future of NLP and AI

As previously mentioned, predictions by prominent researchers regarding NLP and AI have at times been overly optimistic. Such predictions often fail to take into account all of the relevant variables that go into determining the outcome of a technology. Therefore, it is important to proceed with caution when addressing these issues. Prominent inventor and futurist Ray Kurzweil (2005; 2013) provides some of the most optimistic predictions regarding the development of NLP in computers. Kurzweil (2005) expects that by the end of the 2020s computers will be able "to pass the Turing test, indicating intelligence indistinguishable from that of biological humans"22 (p. 25). The fundamental concept behind Kurzweil's optimism lies in the notion of exponential growth. By taking into account the doubling of computing power every couple of years over the last half-century (Moore's law), and extending the trend into the coming decades, Kurzweil expects computers to have enough

22 The Turing test was originally a thought experiment by English mathematician Alan Turing. The idea involves an investigator engaging in a typed conversation with a computer and a human. If the investigator can't tell the difference between the two, then the computer will have passed the Turing test and may be considered to have full human capacities in regard to language (and presumably intelligence). However, there are complications that arise with this thought experiment. For a discussion, see Seung (2013, pp. 259-263).
computing power to rival human intelligence by 202923 (Kurzweil, 2013, p. 169). However, there may be significant problems in carrying these trends out much further into the future. Kaku (2014) notes that "today the smallest layer of silicon in your Pentium chip is about twenty atoms in width, and by 2020 that layer might be five atoms across" (p. 223). The problem is that at this level of reality, the Heisenberg uncertainty principle24 begins to take effect, causing chips to short-circuit (Kaku, 2014, pp. 223-224). Kaku (2014) notes that this "would generate enough heat to fry an egg on it. So leakage and heat will eventually doom Moore's law, and a replacement will soon be necessary" (p. 224). Ideas for countering this problem include developing 3-D chips, as well as the development of 2-D parallel processing. However, Kaku (2014) notes that heat generation "rises rapidly with the height of the chip" and expanding into 2-D parallel processing may involve challenges with regard to the discrepancies in growth between software and computing power (p. 224). Kaku (2014) suggests that "these stopgap measures may add years to Moore's law" but they will ultimately give way to the inevitable complications provided by quantum theory (p. 224). Kurzweil (2013) suggests that

23 Kurzweil goes significantly further than this in his notion of the Singularity. Kurzweil (2005) predicts that by 2045, "the Singularity will allow us to transcend . . . our bodies and brains. We will gain power over our fates . . . . We will live as long as we want" and "we will fully understand human thinking and will vastly extend and expand its reach" (p. 9). Unsurprisingly, many thinkers have called these claims into question. However, historian Ian Morris (2014) notes that a 2012 survey of futurists "found that the median date at which they anticipated a technological Singularity was 2040, five years ahead of Kurzweil's projection" (p. 381).

24 The Heisenberg uncertainty principle was a major theoretical breakthrough in quantum theory. The basic idea is that an observer can't discern the exact location and momentum of an electron with certainty. It's possible to know one or the other, but not both. Back in the 1920s this created a major stir in the physics community (Kaku, 2014, p. 330).
"when one paradigm runs out of steam . . . it creates research pressure to create the next paradigm" (p. 255). However, it is not clear that any of the known alternatives25 will be ready to go when the current electronics age winds down. Nobel Prize-winning chemist Richard Smalley has noted, "When a scientist says something is possible, they're probably underestimating how long it will take. But if they say it's impossible, they're probably wrong" (as quoted in Morris, 2010, p. 593). Smalley's observation seems to fit well with the trajectory of technological advancement in NLP and ultimately in AI. While certain advancements in NLP and AI have proven naysayers wrong, other advancements (e.g., Pinker's "hard easy problems") have been slow in coming. It has only been very recently that scientists have begun to build machines capable of carrying out tasks that most humans take for granted.26 When considering the snail's pace by which some of these capabilities have been advancing, as well as the dilemmas and uncertainties in projecting Moore's law into the future, it seems unlikely that NLP in machines will reach the capacities of humans before the latter decades of the twenty-first century.27 However, it seems that a variety of highly specialized systems will continue to be developed that exceed the capacities of humans.28

25 Many of the alternatives are not well developed at this point. Some of these include molecular computing, quantum computing, and DNA computing, among others (Kaku, 2014, p. 224).

26 For example, in 2012 scientists at Yale University built a robot that was able to pass the mirror test (Kaku, 2014, p. 241).

27 There are too many variables to take into account to make a confident prediction on this matter.

28 One notable example includes the ability of computers to drive cars. It is likely that in the near future, computers will show better judgment at driving automobiles than most humans.

Conclusion
The development of NLP systems has continually impacted the field of IR. As advancements in NLP progress, the gap between IR and NLP will likely narrow. Recent research and development in IR systems has involved cross-language information retrieval, machine translation, text mining, speech recognition, information extraction, and question answering, among other areas (Chowdhury, 2010, p. 432). Variations and combinations of the theories and methods discussed in this study have been used to further development in these areas. Given how progress has been achieved in AI, it seems likely that highly specialized systems will continue to advance far faster than systems attempting to employ ‘common sense’ and other indispensable human traits. The development of such ‘common sense’ systems would revolutionize the field of IR by combining the capacities of humans with the storage and retrieval capacities of computers. It is this study’s conclusion that, under the best-case scenario, such systems are probably six or seven decades away. However, it is this author’s view that there is little reason to believe that the development of ‘superhuman’ intelligence is impossible.29 In the meantime, the development of specialized AI and NLP systems will continue to have a profound impact on human life.

29 For works that take this philosophical question into account, see Dennett (1996) and Hofstadter (1999).
References

Barr, A., Cohen, P. R., & Feigenbaum, E. A. (1989). The handbook of artificial intelligence: Volume IV. Reading, MA: Addison-Wesley.

Chomsky, N. (2012). The science of language: Interviews with James McGilvray. New York, NY: Cambridge University Press.

Chowdhury, G. G. (2010). Introduction to modern information retrieval (3rd ed.). New York, NY: Neal-Schuman.

Dennett, D. C. (1996). Darwin’s dangerous idea: Evolution and the meanings of life. New York, NY: Touchstone.

Dyson, G. (2012). Turing’s cathedral: The origins of the digital universe. New York, NY: Vintage Books.

Gleick, J. (2012). The information: A history, a theory, a flood. New York, NY: Vintage Books.

Goldstein, R. (2006). Incompleteness: The proof and paradox of Kurt Gödel. New York, NY: W. W. Norton.

Hodges, A. (2012). Alan Turing: The enigma. Princeton, NJ: Princeton University Press.

Hofstadter, D. R. (1999). Gödel, Escher, Bach: An eternal golden braid. New York, NY: Basic Books.

Jordan, M. I., & Bishop, C. M. (1996). Neural networks. Computing Surveys, 28(1). Retrieved from http://delivery.acm.org.libaccess.sjlibrary.org/10.1145/240000/234348/p73-jordan.pdf

Kaku, M. (2014). The future of the mind: The scientific quest to understand, enhance, and empower the mind. New York, NY: Doubleday.

Kurzweil, R. (2005). The singularity is near: When humans transcend biology. New York, NY: Viking.

Kurzweil, R. (2013). How to create a mind: The secret of human thought revealed. New York, NY: Penguin Books.

Minsky, M. (1986). The society of mind. New York, NY: Simon & Schuster.

Minsky, M. (2006). The emotion machine: Commonsense thinking, artificial intelligence, and the future of the human mind. New York, NY: Simon & Schuster.

Morris, I. (2010). Why the west rules—for now: The patterns of history, and what they reveal about the future. New York, NY: Farrar, Straus and Giroux.

Morris, I. (2014). War! What is it good for?: Conflict and the progress of civilization from primates to robots. New York, NY: Farrar, Straus and Giroux.

Nagel, E., & Newman, J. R. (2001). Gödel’s proof (Rev. ed.). New York, NY: New York University Press.

Pinker, S. (2000). Words and rules: The ingredients of language. London, UK: The Softback Preview.

Pinker, S. (2007). The language instinct: How the mind creates language. New York, NY: HarperCollins.

Pinker, S. (2013). Language, cognition, and human nature: Selected articles. New York, NY: Oxford University Press.

Rumelhart, D. E., & McClelland, J. L. (1986). In J. L. McClelland, D. E. Rumelhart, & the PDP Research Group (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition (Vol. 2: Psychological and biological models). Retrieved from http://mind.cog.jhu.edu/faculty/smolensky/050.326-626/Foundations%20Readings%20PDFs/Rumelhart&McClelland-1986-PastTense.pdf

Seung, S. (2013). Connectome: How the brain’s wiring makes us who we are. New York, NY: Mariner Books.

Silver, N. (2012). The signal and the noise: Why so many predictions fail – but some don’t. New York, NY: Penguin Press.