DATA, THE ANDROID
Gillian Smith, Northeastern University
Assistant Professor, Art+Design/Computer Science
e: gi.smith@neu.edu tw: @gillianmsmith
STAR TREK:
THE NEXT
GENERATION
1987-1994
DESCRIPTION
➤ First aired: 1987
➤ 21 years after The Original
Series!
➤ Show themes
➤ Exploration
➤ Diplomacy
➤ Societal Issues
➤ Family
expand on show themes with examples
CULTURAL CONTEXT
➤ Personal computer revolution is
underway!
➤ 1 in 12 US adults owns a computer
➤ 1 in 7 baby boomers (ages 23-41)
➤ AI enters its second “winter”
➤ reduced funding
➤ lowered expectations
➤ “Android” is in the lexicon
➤ Star Wars (1977)
http://articles.sun-sentinel.com/1987-01-25/features/8701050968_1_word-processing-boomers-home-computers
When The Original Series aired (1966), in comparison, computers were still largely in the realm of research and industry: the age of building supercomputers as big as a
room. Weizenbaum completed the ELIZA project in 1966. By 1987, IBM was introducing its PS/2 computer.
WHO IS
DATA?
exploring humanity
TNG’S SPOCK
➤ Emotionless, logic-driven
➤ Bridge officer
➤ Trusted advisor
➤ Outside, “alien” perspective
➤ Excuse for explicit dialog
around human behavior and
societal expectation
Data is to TNG what Spock is to TOS. They share many traits, though in different ways. Spock chooses to be logic-driven and rejects human concepts of emotion; Data
has no choice but to be logic-driven and wants to be more human (including understanding emotion). Both are bridge officers and trusted advisors to their captain. Both
offer an outside, “alien” perspective on the happenings aboard the Enterprise, one that differs from that of the rest of the crew.

Primarily, they both serve a narrative role of allowing the show to build explicit dialog around human behavior and societal expectations. Crew often has to patiently
explain core concepts of humanity (that are difficult to unpack!) to Data.
BECOMING MORE HUMAN
➤ Gene Roddenberry’s vision of
Data’s progression
➤ “If being human is not simply a
matter of being born flesh and
blood, if it is instead a way of
thinking, acting and... feeling, then
I am hopeful that one day I will
discover my own humanity. -... -
Until then, Commander Maddox, I
will continue learning, changing,
growing, and trying to become more
than what I am.”

- Data, “TNG: Data’s Day”
Gene Roddenberry, creator of the show, had explicitly told Brent Spiner (who portrays Data) that he wanted to see Data progress further towards humanity over the
course of the series, but never quite get there. This desire for progress is built into Data’s every decision, a baked-in characteristic that drives him moment to moment.
HIGHLIGHTING DIFFERENCE
➤ Strong emphasis on Data’s
physical capabilities
➤ Fast reading
➤ Fast typing/finger dexterity
➤ Physical strength
➤ Environmental tolerance
➤ Every episode points out
some difference between Data
and the rest of the crew
➤ Dr. Pulaski
The show’s creators go to great lengths to continually point out Data’s physical differences.

This is also a fascinating insight into how we thought about robots interfacing with machines. The creators knew that Data could interface directly (he does it a couple of times, at least),
but still chose for him to take the familiar (if accelerated) route of manual typing. This makes him more familiar to audiences and to his crew-mates. It is also perhaps evidence of
just not thinking through future interaction mechanisms (e.g. Asimov’s Lucky Starr ships, with faster-than-light travel but typewriter-style text output).
INTERROGATING HUMANITY
Captain Jean-Luc Picard: Oh, yes! For
humans, touch can connect you to an
object in a very personal way, make it
seem more real. 



Lieutenant Commander Data: I am
detecting imperfections in the
titanium casing... temperature
variations in the fuel manifold... it is
no more "real" to me now than it was
a moment ago.


-Picard & Data, ST: First Contact
…and then use that difference to interrogate what it means to be human on large and small scales, for the benefit of both Data (who continues to learn from example) and
the (human) viewers of the show, who get to reflect on our own natures.
HOW DATA
WORKS
embodiment and the
“positronic” brain
Data: an architecture diagram
v0.0.1a
A BIPEDAL ROBOT
➤ Only one aspect of 20th - early
21st century robotics
➤ Movement is complex
➤ Different purpose in robotics
➤ Embodiment
➤ Perception of “humanity”
➤ Builds empathy/implicit
understanding of human
limitation
➤ Does embodiment require
flesh-and-blood?
Data’s embodiment as a bipedal robot asks philosophical questions. Why does he look like us? To try to be more like us and more familiar to us. How does he pass the
uncanny valley test? By not looking quite exactly like us (different skin tone from human, believable but not perfect facial expression). Embodiment builds empathy, but
there is the question of whether Data can truly feel what it’s like to be human without a flesh-and-blood body.
STORAGE SYSTEM
➤ 100,000 terabyte storage
capacity
➤ Equal to total amount of
storage for photo/video on
Facebook in 2012
➤ Rapid and perfect recall
➤ High bandwidth I/O
Gresh & Weinberg, “The Computers of Star Trek”, Chapter 6: Data
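As a sanity check on the storage figure, here is the unit arithmetic as a small Python sketch. It assumes decimal (SI) units, where 1 TB = 10^12 bytes; the function name `storage_summary` is made up for illustration and does not come from the book.

```python
# Unit arithmetic for the 100,000 terabyte figure on the slide.
# Assumption: decimal (SI) units, i.e. 1 TB = 10**12 bytes.
TERABYTE = 10**12
PETABYTE = 10**15


def storage_summary(terabytes: int) -> dict:
    """Express a capacity given in terabytes as bytes, petabytes, and bits."""
    total_bytes = terabytes * TERABYTE
    return {
        "bytes": total_bytes,
        "petabytes": total_bytes // PETABYTE,
        "bits": total_bytes * 8,
    }


summary = storage_summary(100_000)
print(summary["petabytes"], "petabytes")
print(summary["bits"], "bits")
```

Under that assumption, 100,000 terabytes works out to 100 petabytes, or 8 x 10^17 bits.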
KNOWLEDGE REPRESENTATION
➤ Capable of building and
updating semantic
relationships
➤ Presumably highly (and
efficiently) compressed
➤ Standardized format or easy to
convert between formats on
the fly
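To make "building and updating semantic relationships" concrete, here is a minimal sketch of a semantic network stored as (subject, relation, object) triples. This is only a toy under my own assumptions: the class name, method names, and example triples are illustrative, and nothing here is meant as a claim about how Data's knowledge is stored on the show.

```python
from collections import defaultdict


class SemanticNetwork:
    """Toy triple store: (subject, relation) -> set of objects."""

    def __init__(self):
        self._edges = defaultdict(set)

    def add(self, subject, relation, obj):
        """Add a relationship (building)."""
        self._edges[(subject, relation)].add(obj)

    def remove(self, subject, relation, obj):
        """Retract a relationship that turned out to be wrong (updating)."""
        self._edges[(subject, relation)].discard(obj)

    def query(self, subject, relation):
        """Return a copy of everything related to subject by relation."""
        return set(self._edges.get((subject, relation), set()))

    def is_a(self, subject, category):
        """Follow 'is_a' links transitively from subject up to category."""
        seen, frontier = set(), [subject]
        while frontier:
            node = frontier.pop()
            if node == category:
                return True
            if node in seen:
                continue
            seen.add(node)
            frontier.extend(self._edges.get((node, "is_a"), ()))
        return False


net = SemanticNetwork()
net.add("Spot", "is_a", "cat")
net.add("cat", "is_a", "animal")
net.add("cat", "likes", "fish")
print(net.is_a("Spot", "animal"))
```

The transitive `is_a` lookup is the kind of inference that mined or crowdsourced networks make cheap: a fact learned about "cat" applies to anything linked to it.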
NEURAL NETWORK
➤ “Positronic” neural network
➤ Only three successful positronic
brains ever created
➤ Data
➤ Lore
➤ Juliana Tainer
➤ Best guess: neural network with many
hidden layers
➤ Supervised learning
➤ Unsupervised learning
➤ Reinforcement learning
➤ Hardware accelerated?
DATA’S SCRIPTING ENGINE
➤ Scripted “subroutines” for
➤ personality (hot-swappable!)
➤ high-level behavioral guidance
➤ domain-specific scenarios
➤ Many implied styles of
subroutine
➤ Linear scripts
➤ Decision trees
➤ Expert system-style heuristics
➤ Learned from external data
MODULAR ARCHITECTURE
➤ Components can be swapped out:
➤ New personality modules for
holodeck characters
➤ New insight into domain-
specific activity (e.g. dance)
➤ Self-programmable
➤ Management/coordination?
➤ Emergent behaviors
DATA AS A
SUPERHUMAN
faux intelligence
INFORMATION PROCESSING
➤ Data as a “computer”
➤ calculation
➤ data entry (via visual
processing system?!)
➤ Manual dexterity and accuracy
➤ Strength and other
superhuman physical abilities
“Actually, I am capable of
distinguishing over one hundred and
fifty simultaneous compositions. But
in order to analyze the aesthetics, I
try to keep it to ten or less.”
-Data, “TNG: A Matter of Time”
Scene where another character walks in to find Data listening to four pieces of music, loudly and simultaneously. Data explains that he would be listening to more, but
he’s busy running a complex simulation, so is listening to only four today. Data excels at truly parallel multitasking, far beyond human capacity.
The show often presents his multitasking as a direct benefit to the crew (locking out the computer against a stronger enemy, performing complex calculations
in his head, using physical dexterity and speed to pilot the ship), so it’s easy to dismiss this as “faux” intelligence: simply human intelligence, but sped up and
mechanized. But Data uses
DATA AS A
RATIONAL
AGENT
decision-making
“For each possible percept sequence, an ideal
rational agent should do whatever action is
expected to maximize its performance measure,
on the basis of the evidence provided by the
percept sequence and whatever built-in
knowledge the agent has.”
-Russell & Norvig, AI: A Modern Approach
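Read literally, the definition above is an argmax over expected performance. A minimal sketch, assuming each action comes with a known list of (probability, performance) outcomes; the action names and numbers below are invented for illustration.

```python
def expected_performance(outcomes):
    """Expected value of a list of (probability, performance) pairs."""
    return sum(probability * performance for probability, performance in outcomes)


def rational_choice(actions):
    """Pick the action whose expected performance is highest.

    `actions` maps an action name to its list of (probability, performance)
    outcomes. Real agents rarely know these numbers; here they are given.
    """
    return max(actions, key=lambda name: expected_performance(actions[name]))


# Hypothetical numbers, purely for illustration.
actions = {
    "raise shields": [(0.9, 10), (0.1, -50)],   # expected value of about 4
    "open hailing frequencies": [(0.5, 20), (0.5, -5)],  # expected value 7.5
}
choice = rational_choice(actions)
print(choice)
```

The hard part in practice is everything this sketch takes as given: the percept sequence, the outcome probabilities, and the performance measure itself.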
In many ways, Data is the perfect “rational” agent by Russell and Norvig’s definition. He is the inevitable result of taking the desire to build “rational” agents to its
full conclusion. Reminder of what a rational agent is…
sense (- learn) - think - act (- explain)
Russell & Norvig, AI:AMA Ch. 2
We can think of AI agents as engaging in a continual loop of sensing their environment, thinking about what to do next, and acting in that environment (which updates the
environment state). Optionally, they may learn from what they just did when they perceive the environmental change. And optionally, they may try to explain what they did
or receive feedback on what they did.
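The loop can be sketched in a few lines of Python. This is a toy under stated assumptions: a one-dimensional world, a hand-written policy, and an explanation that simply replays the agent's log. The optional "learn" step is left out, and every class and function name here is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Environment:
    """A one-dimensional world: the agent tries to reach `target`."""
    position: int = 0
    target: int = 3

    def percept(self):
        # What the agent can sense: signed distance to the target.
        return self.target - self.position

    def apply(self, action):
        # Acting changes the environment state.
        self.position += {"forward": 1, "back": -1, "wait": 0}[action]


@dataclass
class Agent:
    log: list = field(default_factory=list)

    def think(self, percept):
        if percept > 0:
            return "forward"
        if percept < 0:
            return "back"
        return "wait"

    def explain(self):
        # Explanation is a replay of (percept, action) pairs.
        return [f"saw offset {p}, so chose {a}" for p, a in self.log]


def run(agent, env, steps):
    for _ in range(steps):
        percept = env.percept()         # sense
        action = agent.think(percept)   # think
        env.apply(action)               # act
        agent.log.append((percept, action))
    return env.position


env, agent = Environment(), Agent()
final_position = run(agent, env, steps=5)
print(final_position, agent.explain()[0])
```

Even in this toy, "explain" only works because the agent keeps a record of what it perceived when it chose each action.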
SENSE
➤ visual processing system
➤ invokes its own sense-
think-act loop for
determining what has been
seen
➤ touch, hearing
➤ taste, smell? limited capacity.
➤ highly parallelized
Data’s sensory system is, in itself, a highly complex AI system by today’s standards. His visual processing system alone has basically solved computer vision. His sense
of touch is extremely refined, able to detect imperfections in a surface that humans cannot, and so is his hearing. Hearing also needs processing, e.g. for verbal content.

The expectation is that taste and smell have limited capacity, since he does not need to eat, though the show is a bit vague on this.

His sensory system alone must be highly parallelized: it can do all of these things at the same time and combine the inputs into how he thinks about what to do next.
LEARN
➤ Reinforcement learning: Data
learns from how the
environment and other people
react to his behavior
➤ Supplemented with
explanation from humans
for appropriate/
inappropriate behavior
➤ Often impossible to modify
without this explanation
Data’s primary mode of learning appears to be reinforcement learning: he interacts with the environment and learns from how it and other people react to his
behavior. Example: he decides, based on his computational model of humor (built from an understanding of slapstick comedy, etc.), to push Crusher into an ocean on the holodeck.
He is immediately chastised for his actions, and the crew attempt not only to say “this is bad” but also to explain *why* it is bad. He often finds it hard to
modify his behavior without such explanation… this is something we don’t really have in reinforcement learning today, to my knowledge.
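One way to picture the difference an explanation makes: plain feedback only adjusts the value of the action that was taken, while an explanation names a reason, so the same feedback can be applied to every action sharing that reason. The sketch below is speculative and assumes a simple running-average value update, hand-tagged action features, and an invented learning rate; it is not a description of any existing reinforcement learning system.

```python
ALPHA = 0.5  # learning rate (assumed value)


def update(values, action, reward, alpha=ALPHA):
    """Move the stored value of `action` toward the observed reward."""
    old = values.get(action, 0.0)
    values[action] = old + alpha * (reward - old)
    return values


def update_with_explanation(values, features, action, reward, reason, alpha=ALPHA):
    """Apply the feedback to the action taken AND to every other action
    tagged with the feature named in the explanation."""
    for other, tags in features.items():
        if other == action or reason in tags:
            update(values, other, reward, alpha)
    return values


# Hypothetical action tags, for illustration only.
features = {
    "push crewmate into water": {"unwanted physical contact"},
    "trip crewmate": {"unwanted physical contact"},
    "tell a pun": {"wordplay"},
}

plain = update({}, "push crewmate into water", -1.0)
explained = update_with_explanation(
    {}, features, "push crewmate into water", -1.0, "unwanted physical contact"
)
print(plain)
print(explained)
```

With plain feedback, only the one prank is discouraged; with the explanation, the untried "trip crewmate" is discouraged too, while "tell a pun" is left alone.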
THINK
➤ Look to subroutines if there is
one (or many…) to cover the
current situation
➤ Data as a discovery system
Part of “think” for Data involves finding the appropriate subroutine(s) to handle the situation he has found himself in. For example, he has a subroutine for dancing that
determines what steps he should make and how to react to a partner. 

Data also engages in what we call “scientific discovery”: part of his “think” step means building new knowledge based on what is around him and adding it to a knowledge
base for later access.
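The subroutine lookup can be sketched as a registry of (predicate, behavior) pairs, where installing under an existing name replaces the old behavior. All names below are hypothetical; the show never specifies how subroutines are selected.

```python
class SubroutineRegistry:
    """Toy registry: each subroutine has an `applies` test and a `run` behavior."""

    def __init__(self):
        self._subroutines = {}  # name -> (applies, run)

    def install(self, name, applies, run):
        """Install a subroutine, replacing any existing one with the same name."""
        self._subroutines[name] = (applies, run)

    def handle(self, situation):
        """Run every subroutine whose predicate matches the situation."""
        return {
            name: run(situation)
            for name, (applies, run) in self._subroutines.items()
            if applies(situation)
        }


registry = SubroutineRegistry()
registry.install(
    "dance",
    applies=lambda s: s.get("activity") == "dance",
    run=lambda s: "follow partner's lead",
)
registry.install(
    "modesty",
    applies=lambda s: s.get("public", False),
    run=lambda s: "wear uniform",
)

result = registry.handle({"activity": "dance", "public": True})
print(result)
```

Because more than one subroutine can match, something still has to reconcile their outputs; that coordination problem is not addressed here.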
EXPLAIN
➤ When quizzed, Data can
explain not just what he did,
but why he did it
➤ “Why” is never “because the
weights trained on my neural
net said I should”
➤ ….perhaps this only
happens for behavior
controlled by more scripted
subroutines?
Baffled crew-mates ask Data why he makes the decisions he does; he always has a logical answer built from a specific anecdote (almost implying that he’s using
case-based reasoning rather than a neural net).
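Case-based reasoning makes this kind of explanation almost free: the agent retrieves the most similar remembered case, reuses its action, and cites the case as the reason. A minimal sketch, assuming cases are described by sets of features and compared with Jaccard similarity; the cases themselves are invented, not taken from any episode.

```python
def jaccard(a, b):
    """Similarity of two feature sets: size of overlap divided by size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def decide_and_explain(cases, situation):
    """Reuse the action from the most similar past case and cite that case."""
    best = max(cases, key=lambda case: jaccard(case["features"], situation))
    explanation = (
        f"I chose to {best['action']} because this resembles "
        f"{best['name']}, where that action succeeded."
    )
    return best["action"], explanation


# Hypothetical case memory, for illustration only.
CASES = [
    {"name": "the poker game",
     "features": {"social", "bluffing", "cards"},
     "action": "fold"},
    {"name": "the diplomatic dinner",
     "features": {"social", "formal", "guests"},
     "action": "make small talk"},
]

action, why = decide_and_explain(CASES, {"social", "formal", "music"})
print(action)
print(why)
```

A trained network has no comparable anecdote to point at, which is the contrast the slide is drawing.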
SEEKING HUMANITY
➤ Behavioral model is what
makes him an impressive AI
system
➤ Behavioral model is also what
makes him unsatisfied and
seek humanity
➤ Humans are not always
“rational” agents
Data’s rationality is what makes him an impressive (and semi-recognizable) AI system by today’s standards. But it is also what makes him LESS than human. Humans are
frequently not “rational” agents, acting on emotion and impulse. While much of AI is about trying to reduce this ‘weakness’ of mankind, Data wants to embrace it.
DATA AS A
BELIEVABLE
AGENT
fake it ’til you make it?
So how does Data try to approach becoming a “believable” agent as well as a “rational” one?
MECHANISTIC BEHAVIOR
➤ Rationally-driven intelligence
impressive but does not read
as “human”
➤ Desire to avoid mechanistic
behavior and provide illusion
of humanity
“ACTING” HUMAN
➤ Facial expressions to exhibit
“emotion”
➤ Personality modules in the
Holodeck
➤ Biophysical response:
breathing, hair growth
He fakes facial expressions to exhibit emotion, based on what he has seen from other performers.

He builds personality modules for himself for acting in different scenes on the holodeck (e.g. Sherlock Holmes).

His body has some of this built in (though, obviously, that is because he is played by an actor…): a breathing system that is likely his ‘cooling’ system but appears like
breathing, and the ability to regulate hair growth.
DATA’S YOUTH
learning from the doomed
Let’s spend some time looking at different aspects of Data’s personality. What does that tell us about the underlying AI system?
SUPERVISED LEARNING
➤ Created on an isolated colony
➤ Infused with log entries,
memories, experiences of the
doomed colonists
➤ One of the methods for
bootstrapping Data’s neural
net
➤ learning behavior via
training on diverse set of
human experiences
TNG: Inheritance
Data was created by Dr. Noonien Soong in a colony on Omicron Theta. He was the fifth and final android to be created on that planet by Soong. (The fourth was Lore, Data’s evil
brother; the first three were complete failures.) The colony was doomed by the Crystalline Entity, which (basically) destroys life.

Show lore states that Data was programmed with basic behaviors during an experimental phase, then his memory was wiped and he was infused with the memories and
experiences of the doomed colonists as a way to bootstrap his behavior.

AI: supervised learning (he takes time to process, analyze, and learn patterns of behavior from a variety of examples, presumably can judge reward based on how
individual choices led to change). By getting a broad set of humans, his choices are informed not by one individual (making him kind of a copy of one person) but by an
aggregate population.
CUSTOM PROGRAMMING
➤ Hand-coded:
➤ motor control, sensory
processing
➤ “modesty” subroutine (and,
presumably, others)
➤ Weaknesses of learning from
example
➤ Need broad range of positive
and negative cases
➤ Handling conflicts and
exceptions
TNG: Inheritance
But, the learning algorithms were not enough. He has hand-coded motor control and sensory processing (both of which move beyond human capability, this is
presumably basically like firmware or drivers). Interesting part is that he needed to be programmed with subroutines for societally acceptable behavior (e.g. ‘modesty’).
Comes back to what he was capable of learning from the data, as well as theories of embodiment. Presumably he saw only positive examples of people wearing clothes
in public, which means a) he may not have even pulled out the feature of ‘wears clothes’ as something to care about, and b) even if he did, he wouldn’t have seen enough
negative cases to learn that there is a societal rule in place. But also a matter of embodiment: even if he DID learn that humans wear clothes outside, it was reasonable
for him to make other assumptions — they suffer from the elements, he does not; they feel shame, he does not.
DATA AS A
FRIEND
building relationships
“As I experience certain sensory input
patterns my mental pathways become
accustomed to them. The inputs
eventually are anticipated and even
'missed' when absent.”
TNG: Time’s Arrow, Pt 1
Though he cannot feel an emotional connection with the crew (who feel one with him, regardless), he does have an explanation for what it means to feel ‘friendship’ with
those he knows.
“FRIENDSHIP”
➤ Familiarity as a proxy for
friendship
➤ familiar path through hidden
layers of neural net?
➤ Uses friendship to learn about
appropriate social interactions
➤ Still makes large-scale mistakes
that cannot be corrected
➤ Physical humor
➤ Not-so-gentle mockery
Data uses familiarity and, perhaps, efficiency as a proxy for friendship. Implies that encountering friends produces a familiar response, as though the circuitry in his brain
is “well-worn” where friends are concerned and he can somehow feel a physical difference.

Data also uses this notion of friendship to learn about appropriate social interaction and better himself, though he still makes large-scale mistakes that he struggles to
correct due to an incomplete model of friendship and empathy.
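Data's description of friendship ("anticipated and even 'missed' when absent") can be loosely sketched as a running expectation per person: each encounter raises it, each absence lowers it, and someone is "missed" when they are absent while the expectation is still high. The update rule, rate, and threshold below are all my assumptions, chosen only to make the idea concrete.

```python
class Familiarity:
    """Toy model: expectation of seeing each person, updated every encounter."""

    def __init__(self, rate=0.5, threshold=0.5):
        self.rate = rate            # how quickly expectation adapts (assumed)
        self.threshold = threshold  # expectation above which absence is "missed"
        self.expectation = {}

    def observe(self, present):
        """Update expectations from the set of people present.

        Returns the sorted list of people who were expected but absent.
        """
        missed = []
        for person in set(self.expectation) | set(present):
            old = self.expectation.get(person, 0.0)
            seen = 1.0 if person in present else 0.0
            if seen == 0.0 and old > self.threshold:
                missed.append(person)
            self.expectation[person] = old + self.rate * (seen - old)
        return sorted(missed)


model = Familiarity()
for _ in range(3):
    model.observe({"Geordi"})
missed_today = model.observe(set())
print(missed_today, model.expectation["Geordi"])
```

The "well-worn pathway" here is just a number that decays; what the sketch cannot capture is why some inputs matter more than others, which is where empathy would come in.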
DATA AS A
FATHER
self-reflection, understanding
“
Lal: Then why do you still try to emulate
humans? What purpose does it serve except
to remind you that you are incomplete?
Lt. Cmdr. Data: I have asked myself that
many times, as I have struggled to be more
human. Until I realized, it is the struggle
itself that is most important. We must strive
to be more than we are, Lal. It does not
matter that we will never reach our ultimate
goal. The effort yields its own rewards.”
TNG: The Offspring
Data’s attempt to create Lal, his “daughter”, and how he explains the world to her is some of the strongest evidence we have that Data deliberately seeks out learning
opportunities even though he knows he may never fully reach his goal.
UNREACHABLE DESIRES
➤ Lal reflects Data’s wishes for
himself and understanding of
choice and independence
➤ Emotion
➤ Choice of appearance
➤ Shows even he does not fully
understand himself
➤ Logical reaction to loss of a
child
Lal represents Data’s unreachable desires, and shows us that one core aspect of Data’s existence is that he does not fully understand himself well enough to intentionally
replicate himself.

Lal fails because she learns more about emotion from her father and other members of the crew, and is capable of experiencing it herself, but cannot handle the
sensory overload and her brain shuts down.

Lal also shows us how a logical AI might reasonably react to “loss”: Data saves all her memories to his own brain to learn from her experiences.
DATA,
THE ARTIST
creativity and emotion
The final aspect of Data’s personality I want to touch on is Data as an artist, and how/why he decides to pursue creativity.
MECHANICAL INTERPRETATION
➤ Painting “replicas” with high
efficiency
➤ Replicating music in style of
famous musicians
➤ Synthesis of styles
➤ Formal modeling of aesthetics
Most of Data’s art is a replica of what he has seen in the past.

He does attempt to synthesize different notions of style and paint/perform music according to those different styles, including trying to blend the styles of multiple composers.

This implies that he has, at the very least, come up with a formal and parametrizable model of aesthetics.
DREAMING
➤ “Explore this image, Data. Let it...
excite your imagination. Focus on
it, see where it leads you. Let it
inspire you.”

-Picard, TNG: Birthright, Pt. 1
➤ First time Data can try to use
art to explore his own
“culture”
THE EMOTION “CURSE”
➤ What stands in the way of
Data being an artist?
➤ Embodiment
➤ Emotion
➤ Expressivity
➤ Technical mastery and
mimicry insufficient
➤ Continues striving to learn
regardless of failings
Use the clay example: Data, with children, trying to build clay sculptures that express feelings. Data produces perfect replicas quickly but does not even understand what it
means to express an emotion, feeling, or abstract concept via clay except in the most literal sense (music = a treble clef).
PERSISTENCE IN ADVERSITY
➤ Theme of Data’s use of
machine learning: actively
seek out opportunities for
learning elements that are
known to be poorly modeled
➤ Continual over- or under-
correction in neural net due to
fundamental inability to
model emotion, empathy,
embodiment?
Data’s attempts to be an artist are inspirational: he seeks out opportunities to learn things that he knows he does not have a strong model of. But the way his neural net is
structured means he is continually over- or under-correcting based on feedback. It’s as if he is incapable of ever finding an appropriate
CAN WE
MAKE DATA?
one tiny piece of research
at a time
So, given what we’ve learned about Data — what would it take to actually create him? Even just as he is, without need for modeling emotion?
ROBOTICS OPEN RESEARCH
➤ Bipedal movement and gait
➤ Uneven ground
➤ Minimize energy
➤ Grasping, recognizing, and
using objects
➤ Computer vision
➤ Proprioception
“RAISING” AN AI
➤ Supervised learning common
technique for training
➤ Providing a rich enough
training set
➤ Providing an expressive
enough model
➤ Currently hyper-domain
specific, how do we broaden?
➤ Hardware-accelerated neural
nets?
SEMANTIC NETWORKS
➤ Building and understanding
semantic relationships crucial
to Data’s learning process
➤ ConceptNet, WordNet as
precursors
➤ Mining text sources
➤ Crowdsourcing
RATIONALIZATION
➤ Need to combine strengths of
“deep learning” and data-
driven approaches to AI with
strengths of semantic
representations and cognitive
modeling
➤ Not sufficient for a machine to
perform intelligently, must
also be able to explain itself
PRESENTATION COUNTS
➤ Artistry of AI
➤ “barks” to fake agent
purpose
➤ scripted language
➤ canned animation
➤ Danger of ELIZA effect
ELIZA effect: people read more intelligence into a shallow AI than is really there. The danger is that the illusion shatters once people realize what is actually happening (e.g. pattern recognition for
animations, realization that it’s the same set of facial expressions over and over, realization that barks have no actual meaning).
MODELING RELATIONSHIPS
➤ Need for ability to model AI-
human relationships and
methods for interaction
➤ Robot gestural
communication, expressivity
➤ Comme il Faut system for
modeling friendship, trust, and
romance
➤ social actions built on
underlying micro-theory
that modify the relationship
network
AUTOMATED PROGRAMMING
➤ How and why does a program
write another program?
➤ Software engineering approaches
➤ templates
➤ feature modeling
➤ AI approaches
➤ genetic programming
➤ writing a “preference”
subroutine
Mike Cook’s work in writing “preferences” for ANGELINA that are consistent and generated by the system rather than by him
COMPUTATIONAL CREATIVITY
➤ Ironically, we are more
advanced here than Data
➤ Robots that explore,
conceptualize, sketch, paint,
evaluate, and explain their
work
➤ Work in building models of
analogy, metaphor, humor
DATA, THE ANDROID
Gillian Smith, Northeastern University
Assistant Professor, Art+Design/Computer Science
e: gi.smith@neu.edu tw: @gillianmsmith
So, that’s Data! A complex, fictional individual who, nonetheless, gives us something to strive towards. Star Trek, Data, and the Holodeck have been an inspiration to
generations of people entering computer science and artificial intelligence—myself included. We have a long way to go towards being able to build him (and do we even
want to? that’s an open question), but there are identifiable elements we can build from already.

Data, The Android

  • 1.
    DATA, THE ANDROID GillianSmith, Northeastern University Assistant Professor, Art+Design/Computer Science e: gi.smith@neu.edu tw: @gillianmsmith
  • 2.
  • 3.
    DESCRIPTION ➤ First aired:1987 ➤ 21 years after The Original Series! ➤ Show themes ➤ Exploration ➤ Diplomacy ➤ Societal Issues ➤ Family expand on show themes with examples
  • 4.
    CULTURAL CONTEXT ➤ Personalcomputer revolution is underway! ➤ 1:12 US adults owns a computer ➤ 1:7 baby boomers (23-41) ➤ AI enters its second “winter” ➤ reduced funding ➤ lowered expectations ➤ “Android” is in the lexicon ➤ Star Wars (1977) http://articles.sun-sentinel.com/1987-01-25/features/ 8701050968_1_word-processing-boomers-home-computers When The Original Series aired (1966), in comparison, computers were still largely in the realm of research and industry. Age of building super computers as big as a room. Weisenbaum completed the ELIZA project in 1966. Introduction of IBM’s PS/2 computer.
  • 5.
  • 6.
    TNG’S SPOCK ➤ Emotionless,logic-driven ➤ Bridge officer ➤ Trusted advisor ➤ Outside, “alien” perspective ➤ Excuse for explicit dialog around human behavior and societal expectation Data is to TNG what Spock is to TOS. They share many traits, though in different ways. Spock chooses to be logic-driven and rejects human concepts of emotion; Data has no choice but to be logic-driven and wants to be more human (including understanding emotion). Both are bridge officers and trusted advisors to their captain. They both offer the ability to offer an outside, alien perspective on the happenings aboard the enterprise, bring a different perspective than the remainder of the crew have. Primarily, they both serve a narrative role of allowing the show to build explicit dialog around human behavior and societal expectations. Crew often has to patiently explain core concepts of humanity (that are difficult to unpack!) to Data.
  • 7.
    BECOMING MORE HUMAN ➤Gene Roddenberry’s vision of Data’s progression ➤ “If being human is not simply a matter of being born flesh and blood, if it is instead a way of thinking, acting and... feeling, then I am hopeful that one day I will discover my own humanity. -... - Until then, Commander Maddox, I will continue learning, changing, growing, and trying to become more than what I am.”
 - Data, “TNG: Data’s Day” Gene Roddenberry, creator of the show, had explicitly told Brent Spiner (who portrays Data) that he wanted to see Data progress further towards humanity over the course of the series, but never quite get there. This desire for progress is built into Data’s every decision, a baked in characteristic that drives him moment-to-moment
  • 8.
    HIGHLIGHTING DIFFERENCE ➤ Strongemphasis on Data’s physical capabilities ➤ Fast reading ➤ Fast typing/finger dexterity ➤ Physical strength ➤ Environmental tolerance ➤ Every episode points out some difference between Data and the rest of the crew ➤ Dr. Pulaski show creators go to great lengths to continually point out Data’s physical differences also a fascinating insight into how we thought about robots interfacing with machines. creators knew that data could interface directly (he does it a couple times, at least), but still choose for him to take the familiar (if accelerated) route of manual typing. makes him more familiar to audiences and to his crew-mates. also perhaps evidence of just not thinking through future interaction mechanisms (e.g. Asimov’s Lucky Star ships with faster than light travel but use of typewriter-style text output)
  • 9.
    INTERROGATING HUMANITY Captain Jean-LucPicard: Oh, yes! For humans, touch can connect you to an object in a very personal way, make it seem more real. 
 
 Lieutenant Commander Data: I am detecting imperfections in the titanium casing... temperature variations in the fuel manifold... it is no more "real" to me now than it was a moment ago. 
 -Picard & Data, ST: First Contact …and the use that difference, interrogating what it means to be human on large and small scales, for the benefit of both Data (who continues to learn from example) and the (human) viewers of the show to reflect on our own natures
  • 10.
    HOW DATA WORKS embodiment andthe “positronic” brain
  • 11.
    Data: an architecturediagram v0.0.1a
  • 12.
    A BIPEDAL ROBOT ➤Only one aspect of 20th - early 21st century robotics ➤ Movement is complex ➤ Different purpose in robotics ➤ Embodiment ➤ Perception of “humanity” ➤ Builds empathy/implicit understanding of human limitation ➤ Does embodiment require flesh-and-blood? Data’s embodiment as a bipedal robot asks philosophical questions. Why does he look like us? To try to be more like us and more familiar to us. How does he pass the uncanny valley test? By not looking quite exactly like us (different skin tone from human, believable but not perfect facial expression). Embodiment builds empathy, but there is the question of whether Data can truly feel what it’s like to be human without a flesh-and-blood body.
  • 13.
    STORAGE SYSTEM ➤ 100,000terabyte storage capacity ➤ Equal to total amount of storage for photo/video on Facebook in 2012 ➤ Rapid and perfect recall ➤ High bandwidth I/O Gresh & Weinberg, “The Computers of Star Trek”, 
 Chapter 6: Data
  • 14.
    KNOWLEDGE REPRESENTATION ➤ Capableof building and updating semantic relationships ➤ Presumably highly (and efficiently) compressed ➤ Standardized format or easy to convert between formats on the fly
  • 15.
    NEURAL NETWORK ➤ “Positronic”neural network ➤ Only three successful positronic brains ever created ➤ Data ➤ Lore ➤ Juliana Tainer ➤ Best guess: neural network with many hidden layers ➤ Supervised learning ➤ Unsupervised learning ➤ Reinforcement learning ➤ Hardware accelerated?
  • 16.
    DATA’S SCRIPTING ENGINE ➤Scripted “subroutines” for ➤ personality (hot-swappable!) ➤ high-level behavioral guidance ➤ domain-specific scenarios ➤ Many implied styles of subroutine ➤ Linear scripts ➤ Decision trees ➤ Expert system-style heuristics ➤ Learned from external data
  • 17.
    MODULAR ARCHITECTURE ➤ Componentscan be swapped out: ➤ New personality modules for holodeck characters ➤ New insight into domain- specific activity (e.g. dance) ➤ Self-programmable ➤ Management/coordination? ➤ Emergent behaviors
  • 18.
  • 19.
    INFORMATION PROCESSING ➤ Dataas a “computer” ➤ calculation ➤ data entry (via visual processing system?!) ➤ Manual dexterity and accuracy ➤ Strength and other superhuman physical abilities
  • 20.
    “Actually, I amcapable of distinguishing over one hundred and fifty simultaneous compositions. But in order to analyze the aesthetics, I try to keep it to ten or less. -Data, “TNG: A Matter of Time” Scene where another character walks in to find Data listening to four pieces of music, loudly and simultaneously. Data explains that he would be listening to more, but he’s busy running a complex simulation, so is listening to only four today. Data excels at truly parallel multitasking, far beyond human capacity.
  • 21.
    “Actually, I amcapable of distinguishing over one hundred and fifty simultaneous compositions. But in order to analyze the aesthetics, I try to keep it to ten or less. -Data, “TNG: A Matter of Time” The show often shows his multitasking as the direct benefit to the crew (locking out the computer against a stronger enemy, asking him to perform complex calculations in his head, using physical dexterity and speed to pilot the ship), so it’s easy to dismiss this as “faux” intelligence — simply human intelligence but sped up and mechanized. But Data uses
  • 22.
  • 23.
    “For each possiblepercept sequence, an ideal rational agent should do whatever action is expected to maximize its performance measure, on the basis of the evidence provided by the percept sequence and whatever built-in knowledge the agent has. -Russell & Norvig, AI: A Modern Approach in many ways, Data is the perfect “rational” agent by Russell and Norvig’s definition. He is the inevitable result of taking this desire for building “rational” agents to their full conclusion. Reminder of what a rational agent is…
  • 24.
    sense (- learn)- think - act (- explain) Russell & Norvig, AI:AMA Ch. 2 we can think of AI agents as engaging in a continual loop of sensing their environment, thinking about what to do next, acting in that environment (which updates the environment state). optionally, they may learn from what they just did when they perceive the environmental change. and optionally, they may try to explain what they did or receive feedback on what they did.
  • 25.
    SENSE ➤ visual processingsystem ➤ invokes its own sense- think-act loop for determining what has been seen ➤ touch, hearing ➤ taste, smell? limited capacity. ➤ highly parallelized Data’s sensory system is, in itself, a highly complex AI system by today’s standards. His visual processing system alone has basically solved machine learning. His sense of touch is extremely refined, able to detect imperfections in a surface that humans cannot, as is his hearing. Hearing also needs processing, e.g. for verbal content. Expectation is that taste and smell have limited capacity, since he does not need to eat, though the show is a bit vague on this. His sensory system alone must be highly parallelized - can do all of these things at the same time and combine the inputs into how he thinks about what to do next.
  • 26.
    LEARN ➤ Reinforcement learning: Data learns from how the environment and other people react to his behavior ➤ Supplemented with explanation from humans for appropriate/inappropriate behavior ➤ Often impossible to modify without this explanation Data’s primary mode of learning appears to be reinforcement learning: he interacts with the environment and learns from how it and other people react to his behavior. Example: he decides, based on his computational model of humor (built from an understanding of slapstick comedy, etc.), to push Crusher into an ocean on the holodeck. He is immediately chastised for his actions, and the crew attempt not only to say “this is bad” but also to explain *why* it is bad. He often finds it hard to modify his behavior without such explanation… this is something we don’t really have in reinforcement learning today, to my knowledge.
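A minimal sketch of learning from reactions alone, using the simplest reinforcement-learning setting (an action-value estimate updated as a running average). The action names and reward numbers are hypothetical stand-ins for crew reactions, not anything stated in the show.

```python
def update(values, counts, action, reward):
    """Incrementally average the rewards observed for one action."""
    counts[action] += 1
    values[action] += (reward - values[action]) / counts[action]


# Hypothetical social feedback: a single number per action, with no reason attached.
feedback = {"push_into_water": -1.0, "tell_pun": 0.5, "stay_silent": 0.0}

values = {action: 0.0 for action in feedback}
counts = {action: 0 for action in feedback}

# Try every action once, then keep repeating whichever currently looks best.
for action in feedback:
    update(values, counts, action, feedback[action])
for _ in range(10):
    best = max(values, key=values.get)
    update(values, counts, best, feedback[best])
```

The learner ends up avoiding `push_into_water`, but all it ever receives is a scalar. Nothing in the update carries the *why* that the crew supply when they explain themselves, which is the gap the notes point at.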
  • 27.
    THINK ➤ Look to subroutines if there is one (or many…) to cover the current situation ➤ Data as a discovery system Part of “think” for Data involves finding the appropriate subroutine(s) to handle the situation he has found himself in. For example, he has a subroutine for dancing that determines what steps he should make and how to react to a partner. Data also engages in what we call “scientific discovery”: part of his “think” step means building new knowledge based on what is around him and adding it to a knowledge base for later access.
  • 28.
    EXPLAIN ➤ When quizzed, Data can explain not just what he did, but why he did it ➤ “Why” is never “because the weights trained on my neural net said I should” ➤ …perhaps this only happens for behavior controlled by more scripted subroutines? Baffled crew-mates ask Data why he makes the decisions he does; he always has a logical answer built from a specific anecdote (almost implying that he’s using case-based reasoning rather than a neural net).
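A minimal sketch of how case-based reasoning yields an explanation for free: the agent retrieves the most similar stored case and cites it as the reason. The cases, feature names, and the Jaccard similarity measure are all assumptions chosen for illustration, not episodes or mechanisms from the show.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Case:
    features: frozenset  # what the past situation looked like
    action: str          # what was done
    anecdote: str        # what happened, reused as the explanation


def similarity(a, b):
    """Jaccard overlap between two feature sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def decide(case_base, situation):
    """Pick the action of the most similar past case and explain by citing it."""
    best = max(case_base, key=lambda case: similarity(case.features, situation))
    reason = f"I chose to {best.action} because {best.anecdote}."
    return best.action, reason


# Hypothetical case base.
CASES = [
    Case(frozenset({"formal", "dinner", "captain"}), "speak formally",
         "a formal dinner with a senior officer went well when I did so"),
    Case(frozenset({"poker", "friends", "offduty"}), "make small talk",
         "my crewmates relaxed when I did so at a previous game"),
]

action, reason = decide(CASES, frozenset({"friends", "offduty", "holodeck"}))
```

Here `reason` is a sentence grounded in one specific remembered case, which is the kind of answer a trained weight matrix cannot give directly.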
  • 29.
    SEEKING HUMANITY ➤ Behavioral model is what makes him an impressive AI system ➤ Behavioral model is also what makes him unsatisfied and seek humanity ➤ Humans are not always “rational” agents Data’s rationality is what makes him an impressive (and semi-recognizable) AI system by today’s standards. But it is also what makes him LESS than human. Humans are frequently not “rational” agents, acting on emotion and impulse. While much of AI is about trying to reduce this ‘weakness’ of mankind, Data wants to embrace it.
  • 30.
    DATA AS A BELIEVABLE AGENT fake it ’til you make it? So how does Data try to approach becoming a “believable” agent as well as a “rational” one?
  • 31.
    MECHANISTIC BEHAVIOR ➤ Rationally-driven intelligence impressive but does not read as “human” ➤ Desire to avoid mechanistic behavior and provide illusion of humanity
  • 32.
    “ACTING” HUMAN ➤ Facial expressions to exhibit “emotion” ➤ Personality modules in the Holodeck ➤ Biophysical response: breathing, hair growth He fakes facial expressions to exhibit emotion, based on what he has seen from other performers. He builds personality modules for himself for acting in different scenes on the holodeck (e.g. Sherlock Holmes). His body has some of this built in (though, obviously, that is because he is played by an actor…): a breathing system that is likely his ‘cooling’ system but looks like breathing, and the ability to regulate hair growth.
  • 33.
    DATA’S YOUTH learning from the doomed Let’s spend some time looking at different aspects of Data’s personality: what do they tell us about the underlying AI system?
  • 34.
    SUPERVISED LEARNING ➤ Created on an isolated colony ➤ Infused with log entries, memories, experiences of the doomed colonists ➤ One of the methods for bootstrapping Data’s neural net ➤ learning behavior via training on diverse set of human experiences TNG: Inheritance Data was created by Dr. Noonien Soong on a colony on Omicron Theta. He was the fifth and final android to be created on that planet by Soong. (The fourth was Lore, Data’s evil brother; the first three were complete failures.) The planet was doomed due to the crystalline entity, which (basically) destroys life. Show lore states that Data was programmed with basic behaviors during an experimental phase, then his memory was wiped and he was infused with the memories and experiences of the doomed colonists as a way to bootstrap his behavior. AI: supervised learning (he takes time to process, analyze, and learn patterns of behavior from a variety of examples, and presumably can judge reward based on how individual choices led to change). By drawing on a broad set of humans, his choices are informed not by one individual (making him kind of a copy of one person) but by an aggregate population.
  • 35.
    CUSTOM PROGRAMMING ➤ Hand-coded: ➤ motor control, sensory processing ➤ “modesty” subroutine (and, presumably, others) ➤ Weaknesses of learning from example ➤ Need broad range of positive and negative cases ➤ Handling conflicts and exceptions TNG: Inheritance But the learning algorithms were not enough. He has hand-coded motor control and sensory processing (both of which go beyond human capability; these are presumably much like firmware or drivers). The interesting part is that he needed to be programmed with subroutines for societally acceptable behavior (e.g. ‘modesty’). This comes back to what he was capable of learning from the data, as well as theories of embodiment. Presumably he saw only positive examples of people wearing clothes in public, which means a) he may not have even pulled out the feature of ‘wears clothes’ as something to care about, and b) even if he did, he wouldn’t have seen enough negative cases to learn that there is a societal rule in place. But it is also a matter of embodiment: even if he DID learn that humans wear clothes outside, it was reasonable for him to make other assumptions (they suffer from the elements, he does not; they feel shame, he does not).
  • 36.
  • 37.
    “As I experience certain sensory input patterns my mental pathways become accustomed to them. The inputs eventually are anticipated and even 'missed' when absent.” TNG: Time’s Arrow, Pt 1 Though he cannot feel an emotional connection with the crew (who feel one with him, regardless), he does have an explanation for what it means to feel ‘friendship’ with those he knows.
  • 38.
    “FRIENDSHIP” ➤ Familiarity as a proxy for friendship ➤ familiar path through hidden layers of neural net? ➤ Uses friendship to learn about appropriate social interactions ➤ Still makes large-scale mistakes that cannot be corrected ➤ Physical humor ➤ Not-so-gentle mockery Data uses familiarity and, perhaps, efficiency as a proxy for friendship. Implies that encountering friends produces a familiar response, as though the circuitry in his brain is “well-worn” where friends are concerned and he can somehow feel a physical difference. Data also uses this notion of friendship to learn about appropriate social interaction and better himself, though he still makes large-scale mistakes that he struggles to correct due to an incomplete model of friendship and empathy.
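One very rough way to read “anticipated and even missed when absent” computationally is as a running expectation per input that gets flagged when it goes unmet. The sketch below uses an exponential moving average; the class name, learning rate, and 0.5 threshold are arbitrary assumptions, and nothing here is claimed to be how Data's neural net works.

```python
class Familiarity:
    """Hypothetical tracker: how strongly each input is expected to be present."""

    def __init__(self, rate=0.2, threshold=0.5):
        self.rate = rate            # how quickly expectations adapt
        self.threshold = threshold  # expectation level that counts as "anticipated"
        self.expectation = {}

    def observe(self, present):
        """Update expectations; return inputs that were anticipated but absent."""
        missed = []
        for name in set(self.expectation) | set(present):
            seen = 1.0 if name in present else 0.0
            old = self.expectation.get(name, 0.0)
            if seen == 0.0 and old > self.threshold:
                missed.append(name)
            # Move the expectation a fraction of the way toward what was observed.
            self.expectation[name] = old + self.rate * (seen - old)
        return sorted(missed)


tracker = Familiarity()
for _ in range(5):
    tracker.observe({"Geordi"})        # repeated exposure builds the expectation
missed_now = tracker.observe(set())   # the familiar input is absent this time
```

After five encounters the expectation for `"Geordi"` has risen above the threshold, so his absence on the sixth observation is reported as missed.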
  • 39.
  • 40.
    “Lal: Then why do you still try to emulate humans? What purpose does it serve except to remind you that you are incomplete? Lt. Cmdr. Data: I have asked myself that many times, as I have struggled to be more human. Until I realized, it is the struggle itself that is most important. We must strive to be more than we are, Lal. It does not matter that we will never reach our ultimate goal. The effort yields its own rewards.” TNG: The Offspring Data’s attempt to create Lal, his “daughter”, and how he explains the world to her is some of the strongest evidence we have that Data deliberately seeks out learning opportunities even though he knows he may never fully reach his goal.
  • 41.
    UNREACHABLE DESIRES ➤ Lal reflects Data’s wishes for himself and understanding of choice and independence ➤ Emotion ➤ Choice of appearance ➤ Shows even he does not fully understand himself ➤ Logical reaction to loss of a child Lal represents Data’s unreachable desires, and shows us that one core aspect of Data’s existence is that he does not understand himself well enough to intentionally replicate himself. Lal fails because, as she learns about emotion from her father and others aboard the ship, she becomes capable of experiencing emotions herself, but she cannot handle the sensory overload and her brain shuts down. Lal also shows us how a logical AI might reasonably react to “loss”: Data saves all her memories to his own brain to learn from her experiences.
  • 42.
    DATA, THE ARTIST creativity and emotion The final aspect of Data’s personality I want to touch on is Data as an artist, and how/why he decides to pursue creativity.
  • 43.
    MECHANICAL INTERPRETATION ➤ Painting “replicas” with high efficiency ➤ Replicating music in style of famous musicians ➤ Synthesis of styles ➤ Formal modeling of aesthetics Most of Data’s art is a replica of what he has seen in the past. He does attempt to synthesize different notions of style and paint/perform music according to those different styles, including trying to blend the styles of multiple composers. This implies that he has, at the very least, come up with a formal and parametrizable model of aesthetics.
  • 44.
    DREAMING ➤ “Explore this image, Data. Let it... excite your imagination. Focus on it, see where it leads you. Let it inspire you.” -Picard, TNG: Birthright Pt. 1 ➤ First time Data can try to use art to explore his own “culture”
  • 45.
    THE EMOTION “CURSE” ➤ What stands in the way of Data being an artist? ➤ Embodiment ➤ Emotion ➤ Expressivity ➤ Technical mastery and mimicry insufficient ➤ Continues striving to learn regardless of failings Use clay example: Data with children trying to build clay sculptures that express feelings; Data produces perfect replicas quickly but does not even understand what it means to express an emotion, feeling, or abstract concept via clay except in the most literal sense (music = a treble clef).
  • 46.
    PERSISTENCE IN ADVERSITY ➤ Theme of Data’s use of machine learning: actively seek out opportunities for learning elements that are known to be poorly modeled ➤ Continual over- or under-correction in neural net due to fundamental inability to model emotion, empathy, embodiment? Data’s attempts to be an artist are inspirational: he seeks out opportunities to learn things that he knows he does not have a strong model of. But the way his neural net is structured means he is continually over- or under-correcting based on feedback. It’s as if he is incapable of ever finding an appropriate
  • 47.
    CAN WE MAKE DATA? one tiny piece of research at a time So, given what we’ve learned about Data, what would it take to actually create him? Even just as he is, without need for modeling emotion?
  • 48.
    ROBOTICS OPEN RESEARCH ➤ Bipedal movement and gait ➤ Uneven ground ➤ Minimize energy ➤ Grasping, recognizing, and using objects ➤ Computer vision ➤ Proprioception
  • 49.
    “RAISING” AN AI ➤ Supervised learning common technique for training ➤ Providing a rich enough training set ➤ Providing an expressive enough model ➤ Currently hyper-domain specific, how do we broaden? ➤ Hardware-accelerated neural nets?
  • 50.
    SEMANTIC NETWORKS ➤ Building and understanding semantic relationships crucial to Data’s learning process ➤ ConceptNet, WordNet as precursors ➤ Mining text sources ➤ Crowdsourcing
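A minimal sketch of what a semantic network stores and what it lets an agent infer. The relation labels are loosely modeled on ConceptNet-style names, but the class, its methods, and the facts are hypothetical; this does not use the real ConceptNet or WordNet data or APIs.

```python
from collections import defaultdict


class SemanticNetwork:
    """Toy store of (subject, relation, object) facts with IsA inheritance."""

    def __init__(self):
        self.edges = defaultdict(set)  # (subject, relation) -> set of objects

    def add(self, subject, relation, obj):
        self.edges[(subject, relation)].add(obj)

    def is_a(self, subject, target):
        """True if target is reachable from subject by following IsA links."""
        seen, stack = set(), [subject]
        while stack:
            node = stack.pop()
            if node == target:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(self.edges.get((node, "IsA"), ()))
        return False

    def inherited(self, subject, relation):
        """Collect objects of `relation` from subject and all its IsA ancestors."""
        found, seen, stack = set(), set(), [subject]
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            found |= self.edges.get((node, relation), set())
            stack.extend(self.edges.get((node, "IsA"), ()))
        return found


net = SemanticNetwork()
net.add("Spot", "IsA", "cat")
net.add("cat", "IsA", "pet")
net.add("pet", "IsA", "animal")
net.add("pet", "Desires", "affection")
net.add("animal", "CapableOf", "feel pain")

spot_is_animal = net.is_a("Spot", "animal")
spot_desires = net.inherited("Spot", "Desires")
```

Nothing was stated directly about what Spot desires; the answer falls out of the chain of IsA links, which is the kind of inference that mined or crowdsourced relationships make possible.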
  • 51.
    RATIONALIZATION ➤ Need to combine strengths of “deep learning” and data-driven approaches to AI with strengths of semantic representations and cognitive modeling ➤ Not sufficient for a machine to perform intelligently, must also be able to explain itself
  • 52.
    PRESENTATION COUNTS ➤ Artistry of AI ➤ “barks” to fake agent purpose ➤ scripted language ➤ canned animation ➤ Danger of ELIZA effect ELIZA effect: where the illusion of intelligence created by shallow AI is shattered, and people realize what is actually happening (e.g. pattern recognition for animations, realization that it’s the same set of facial expressions over and over, realization that barks have no actual meaning)
  • 53.
    MODELING RELATIONSHIPS ➤ Need for ability to model AI-human relationships and methods for interaction ➤ Robot gestural communication, expressivity ➤ Comme il Faut system for modeling friendship, trust, and romance ➤ social actions built on underlying micro-theory that modify the relationship network
  • 54.
    AUTOMATED PROGRAMMING ➤ How and why does a program write another program? ➤ Software engineering approaches ➤ templates ➤ feature modeling ➤ AI approaches ➤ genetic programming ➤ writing a “preference” subroutine Mike Cook’s work in writing “preferences” for ANGELINA that are consistent and generated by the system rather than by him
  • 55.
    COMPUTATIONAL CREATIVITY ➤ Ironically, we are more advanced here than Data ➤ Robots that explore, conceptualize, sketch, paint, evaluate, and explain their work ➤ Work in building models of analogy, metaphor, humor
  • 56.
    DATA, THE ANDROID Gillian Smith, Northeastern University Assistant Professor, Art+Design/Computer Science e: gi.smith@neu.edu tw: @gillianmsmith So, that’s Data! A complex, fictional individual who, nonetheless, gives us something to strive towards. Star Trek, Data, and the Holodeck have been an inspiration to generations of people entering computer science and artificial intelligence, myself included. We have a long way to go towards being able to build him (and do we even want to? that’s an open question), but there are identifiable elements we can build from already.