José Hernández-Orallo
Dep. de Sistemes Informàtics i Computació,
Universitat Politècnica de València
jorallo@dsic.upv.es
Talk for the Cognitive Systems Institute Speaker series
8 June 2017
* Based on parts of the book:
“The Measure of All Minds”:
http://allminds.org
EVALUATING COGNITIVE SYSTEMS: TASK-ORIENTED OR ABILITY-ORIENTED?
“Greatest accuracy, at the frontiers of science,
requires greatest effort, and probably the most
expensive or complicated of measurement
instruments and procedures”
(David Hand, 2004).
COGNITIVE SYSTEMS: MUCH MORE THAN AI
 Computers:
 AI or AGI systems, robots, bots, …
 Cognitively-enhanced organisms, cognitive prosthetics
 Cyborgs, technology-enhanced humans
 Biologically-enhanced computers:
 Human computation and their data
 (Hybrid) collectives
 Virtual social networks, crowdsourcing
 Minimal or rare cognition
 Artificial life (more like bacteria, plants, etc.)
Societal impact on work, leisure, health, etc., is difficult to assess, as we do not know the cognitive capabilities of all these new systems.
THE EVALUATION DISCORDANCE: AI EVALUATION
Edited image, originally from wikicommons
"[AI is] the science of making
machines do things that would require
intelligence if done by [humans]."
Marvin Minsky (1968).
 They can do the “things” (tasks) without
featuring intelligence.
 Once the task is solved (“superhuman”),
it is no longer an AI problem (“AI effect”)
 AI would have progressed very significantly
(see, e.g., Nilsson, 2009, chap. 32, or Bostrom, 2014, Table 1, pp. 12–13).
 But AI is now full of idiots savants.
THE EVALUATION DISCORDANCE: AI EVALUATION
 Specific (task-oriented) AI systems
Machine translation, information retrieval, summarisation
PR: computer vision, speech recognition, etc.
Robotic navigation
Driverless vehicles
Prediction and estimation
Planning and scheduling
Automated deduction
Knowledge-based assistants
Game playing
Warning! Intelligence NOT included.
All images from wikicommons
THE EVALUATION DISCORDANCE: AI EVALUATION
 Specific domain evaluation settings:
 CADE ATP System Competition  PROBLEM BENCHMARKS
 Termination Competition  PROBLEM BENCHMARKS
 The reinforcement learning competition  PROBLEM BENCHMARKS
 Program synthesis (Syntax-guided synthesis)  PROBLEM BENCHMARKS
 Loebner Prize  HUMAN DISCRIMINATION
 Robocup and FIRA (robot football/soccer)  PEER CONFRONTATION
 International Aerial Robotics Competition (pilotless aircraft)  PROBLEM BENCHMARKS
 DARPA driverless cars, Cyber Grand Challenge, Rescue Robotics  PROBLEM BENCHMARKS
 The planning competition  PROBLEM BENCHMARKS
 General game playing AAAI competition  PEER CONFRONTATION
 BotPrize (videogame player) contest  HUMAN DISCRIMINATION
 World Computer Chess Championship  PEER CONFRONTATION
 Computer Olympiad  PEER CONFRONTATION
 Annual Computer Poker Competition  PEER CONFRONTATION
 Trading agent competition  PEER CONFRONTATION
 Robo Chat Challenge  HUMAN DISCRIMINATION
 UCI repository, PRTools, or KEEL dataset repository.  PROBLEM BENCHMARKS
 KDD-cup challenges and ML kaggle competitions  PROBLEM BENCHMARKS
 Machine translation corpora: Europarl, SE times corpus, the euromatrix, Tenjinno competitions…  PROBLEM BENCHMARKS
 NLP corpora: linguistic data consortium, …  PROBLEM BENCHMARKS
 Warlight AI Challenge  PEER CONFRONTATION
 The Arcade Learning Environment  PROBLEM BENCHMARKS
 Pathfinding benchmarks (gridworld domains)  PROBLEM BENCHMARKS
 Genetic programming benchmarks  PROBLEM BENCHMARKS
 CAPTCHAs  HUMAN DISCRIMINATION
 Graphics Turing Test  HUMAN DISCRIMINATION
 FIRA HuroCup humanoid robot competitions  PROBLEM BENCHMARKS
 …
THE EVALUATION DISCORDANCE: AI EVALUATION
Cognitive robots
Intelligent assistants
Pets, animats and other
artificial companions
Smart environments
Agents, avatars, chatbots
Web-bots, Smartbots, Security bots…
 How to evaluate general-purpose systems and cognitive components?
Warning!
Some intelligence
MAY BE included.
THE EVALUATION DISCORDANCE: AI EVALUATION
 “Mythical Turing Test” (Sloman, 2014)
and its myriad variants…
 Mythical human-level machine intelligence
 A red herring for general-purpose AI!
THE EVALUATION DISCORDANCE: AI EVALUATION
 What benchmarks? More comprehensive?
 ARISTO (Allen Institute for AI) : College science exams
 Winograd Schema Challenge : Questions targeting understanding.
 Weston et al. “AI-Complete Question Answering” (bAbI)
 CLEVR : Relations over visual objects
BEWARE: AI-completeness has been claimed before for
calculation, chess, Go, the Turing test, …
Now AI is superhuman on most of them!
(e.g., https://arxiv.org/pdf/1706.01427.pdf)
THE EVALUATION DISCORDANCE: TEST MISMATCH
 What about psychometric tests or animal tests in AI?
 In 2003, Sanghi & Dowe :
 simple program passing many IQ tests.
 About 960 lines of code in Perl!
This made the point unequivocally: programs passing IQ tests are not necessarily intelligent.
THE EVALUATION DISCORDANCE: TEST MISMATCH
 This has not been a deterrent!
 Psychometric AI (Bringsjord and Schimanski 2003):
 An “agent is intelligent if and only if it excels at all
established, validated tests of intelligence”.
 Detterman, editor of the journal Intelligence, posed “A challenge to Watson” (Detterman 2011)
 2nd level to “be truly intelligent”: tests not seen
beforehand.
 “IQ tests are not for machines, yet” (Dowe & Hernandez-Orallo
2012)
THE EVALUATION DISCORDANCE: TEST MISMATCH
 What about developmental tests (or tests for children)?
 Developmental robotics:
 Battery of tests (Sinapov, Stoytchev, Schenk 2010-13)
 Cognitive architectures:
 Newell “test” (Anderson and Lebiere 2003)
 “Cognitive Decathlon” (Mueller 2007).
 AGI: high-level competency areas (Adams et al. 2012), task breadth (Goertzel et al. 2009, Rohrer 2010), robot preschool (Goertzel and Bugaj 2009).
a taxonomy for
cognitive architectures
a psychometric
taxonomy (CHC)
THE EVALUATION DISCORDANCE: TEST MISMATCH
 Adapting tests between disciplines (AI, psychometrics, comparative
psychology) is problematic:
 Tests from one group are only valid and reliable for the original group.
 Not necessary and/or not sufficient for the ability.
 Machines and hybrids represent a new population.
 Nowadays, many benchmarks are assuming that AI will use deep
learning or millions of examples.
 But machines and hybrids are also an opportunity to understand how
to evaluate cognition. Still,
We need a different foundation
THE ALGORITHMIC CONFLUENCE: WHAT IQ TESTS MEASURE
 “Beyond the Turing Test”…
 “Intelligence” definition and test (C-test) based on algorithmic
information theory (Hernandez-Orallo 1998-2000).
 Letter series common in cognitive tests (Thurstone).
 Here generated from a TM with properties (projectibility, stability, …).
 Their difficulty is calculated by Kt
 Linked with Levin’s universal search, Solomonoff’s inductive
inference, Kolmogorov complexity.
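For reference, Levin's Kt, named above as the difficulty measure, is usually written in the textbook form below. The symbols U (a reference universal machine), p (a program), \ell(p) (its length in bits) and time_U(p) (the steps U takes to print x from p) are notation assumed here, not taken from the slides, and the exact variant used in the C-test may differ.

```latex
Kt_U(x) \;=\; \min_{p \,:\, U(p) = x} \bigl\{ \ell(p) + \log_2 \mathrm{time}_U(p) \bigr\}
```

Unlike plain Kolmogorov complexity, this quantity is computable, because programs that run too long are penalised through the logarithmic time term.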
THE ALGORITHMIC CONFLUENCE: WHAT IQ TESTS MEASURE
 Metric derived by slicing by difficulty h (Kt) and :
 This is IQ-test re-engineering!
 Intelligence no longer “what intelligence tests measure” (Boring, 1923).
 Clues about what IQ tests really measure? Inductive inference.
Human performance correlated with the difficulty (h) of each exercise.
But remember Sanghi and Dowe 2003!
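A minimal sketch of what "slicing by difficulty" could look like in code. This is an illustration under stated assumptions, not the C-test metric itself: the function name `sliced_score`, the `weight` parameter and the sample results are all hypothetical, and each difficulty slice is simply given its hit rate before slices are summed.

```python
from collections import defaultdict


def sliced_score(results, weight=lambda h: 1.0):
    """Aggregate (difficulty h, hit) pairs slice by slice.

    Assumption: the score of a slice is its hit rate, and slices are
    combined as a weighted sum. Equal weights are used by default.
    """
    by_difficulty = defaultdict(list)
    for h, hit in results:
        by_difficulty[h].append(1.0 if hit else 0.0)
    # Hit rate within each difficulty slice, then a weighted sum over slices.
    return sum(weight(h) * sum(hits) / len(hits)
               for h, hits in sorted(by_difficulty.items()))


# Hypothetical answers of one subject to exercises of difficulty 7, 8 and 9.
results = [(7, True), (7, True), (8, True), (8, False), (9, False), (9, False)]
score = sliced_score(results)
print(score)
```

Passing a different `weight` function (for example one that grows with h) changes how much the harder slices count.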
THE ALGORITHMIC CONFLUENCE: SITUATED TESTS
 Passive to interactive view:
 Intelligence as performance in a range of worlds.
 The set of worlds M is described by Turing machines.
 Intelligence is measured as an aggregate:
 R aggregates ri and p assigns probabilities to environments. How?
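[Figure: agent π interacts with environment μ, sending actions a_i and receiving observations o_i and rewards r_i]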
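The aggregate described on this slide can be sketched as a probability-weighted sum of per-environment results. Everything below is a toy under stated assumptions: the two environments, the weights 0.75 and 0.25, and taking R as the mean of the rewards r_i are all hypothetical choices. How p should be chosen is exactly the open question raised above.

```python
def average_reward(policy, env_step, steps=20):
    """R(pi, mu): here assumed to be the mean of the rewards r_i."""
    observation, total = 0, 0.0
    for _ in range(steps):
        action = policy(observation)            # a_i chosen by the agent
        observation, reward = env_step(action)  # o_i and r_i from the world
        total += reward
    return total / steps


def aggregate_score(policy, environments):
    """Sum over environments mu of p(mu) * R(pi, mu)."""
    return sum(p * average_reward(policy, step) for p, step in environments)


# Two hypothetical, stateless environments returning (observation, reward).
def rewards_one(action):
    return action, 1.0 if action == 1 else 0.0


def rewards_zero(action):
    return action, 1.0 if action == 0 else 0.0


environments = [(0.75, rewards_one), (0.25, rewards_zero)]


def always_one(observation):
    return 1


score = aggregate_score(always_one, environments)
print(score)
```

With these weights, a policy that always plays 1 is rewarded only in the first environment, so its aggregate equals that environment's probability.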
THE ALGORITHMIC CONFLUENCE: SOLUTIONAL APPROACH
 Three approaches:
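[Figure: three ways of aggregating results over tasks. Labels recovered from the slide: "Range of difficulties", "Diversity of solutions". Distribution choices shown: [universal, e.g. Legg and Hutter]; [uniform] [universal]; [universal] [uniform] [uniform].]
[With the choices in brackets, they are NOT equivalent]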
THE ALGORITHMIC CONFLUENCE: SOLUTIONAL APPROACH
 A different view of “general intelligence”:
 Policy-general intelligence: aggregate by difficulty (e.g., bounded
uniform distribution) and for each difficulty look for diversity.
 Connected to the task-independence of the g factor.
Raises a fascinating question: Is there a universal g factor?
Ability to find, integrate and emulate a
diverse range of successful policies.
FROM TASKS TO ABILITIES: CLUSTERING BY SIMILARITY
 Focus first on intermediate levels between tasks and abilities:
 Do we have an intrinsic notion of similarity between tasks?
 Task breadth? Arrange abilities?
 Hierarchically (e.g., Cattell-Horn-Carroll)
 Spatially (e.g., Guttman’s model)
FROM TASKS TO ABILITIES: CLUSTERING BY SIMILARITY
 Example (ECA rules as tasks).
 Task description is not used. No population is used either.
 The best solutions are used instead and compared.
 Using similarity as difficulty increases (18 rules of difficulty 8):
[Figures: dendrogram using complete linkage; metric multidimensional scaling]
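A minimal sketch of clustering tasks by comparing their best solutions, using complete linkage as in the dendrogram. It is not the procedure used for the ECA example. The cross-performance matrix `perf` (score of the best policy for task i when run on task j) and the dissimilarity formula are assumptions made purely for illustration.

```python
from itertools import combinations


def dissimilarity(perf):
    """Symmetric dissimilarity from cross-performance scores in [0, 1].

    Assumption: tasks are similar when each one's best policy also does
    well on the other task.
    """
    n = len(perf)
    return [[0.0 if i == j else 1.0 - (perf[i][j] + perf[j][i]) / 2.0
             for j in range(n)] for i in range(n)]


def complete_linkage(dist):
    """Agglomerative clustering; returns merges as (left, right, height)."""
    clusters = [frozenset([i]) for i in range(len(dist))]
    merges = []

    def gap(pair):
        # Complete linkage: cluster distance is the largest pairwise distance.
        a, b = pair
        return max(dist[i][j] for i in a for j in b)

    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=gap)
        merges.append((sorted(a), sorted(b), gap((a, b))))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return merges


# Hypothetical cross-performance matrix for four tasks.
perf = [
    [1.0, 0.9, 0.2, 0.1],
    [0.8, 1.0, 0.3, 0.2],
    [0.1, 0.2, 1.0, 0.7],
    [0.2, 0.1, 0.9, 1.0],
]
merges = complete_linkage(dissimilarity(perf))
for left, right, height in merges:
    print(left, right, round(height, 2))
```

Neither the task descriptions nor a population of subjects enters the computation. Only the solutions are compared, which matches the point made on the slide.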
NEW AI EVALUATION PLATFORMS: A COSMOS
 Here they are:
 Facebook’s bAbI
 Arcade Learning Env. (Atari)
 Video Game Definition Language
 OpenAI Gym and Universe
 Microsoft’s Project Malmo
 DeepMind Lab
 Facebook’s TorchCraft
 Facebook’s CommAI
 AI Magazine report: “A New AI Evaluation Cosmos”
Most (except CommAI) oriented towards the evaluation of
very embodied AI, but what about more abstract cognition?
IS THIS SUFFICIENT? OPEN QUESTIONS
 What do these platforms / tests measure?
 Depends on the tasks we define!
 Many things to be done
 Task analysis, their similarities, difficulties, their requirements (data)
 Abilities: to be conceptualised and identified.
 Ability-oriented (or feature-oriented) evaluation
 Incremental, gradual, curriculum, …: task similarity → dependency
 Recent (EGPAI@ECAI2016, MAIN@NIPS2016) and upcoming workshops
EGPAI@IJCAI2017
MAIN@NIPS2017 ?
IS THIS SUFFICIENT? OPEN QUESTIONS
 We want cognitive components that could be easily integrated into
standalone cognitive systems.
 What to measure:
 “specific entities”, “networks” or “services” (Spohrer and Banavar 2015)
 We need a different kind of 'specification' of
 What the components are able to do.
 What the integrated systems will be able to do,
 Depending on their integration (tight, loose, teams, etc.).
 Understanding the inclusion or emergence of general abilities.
CONCLUSIONS
 Increasing need for the evaluation of cognitive systems:
 Plethora of new systems: AI, hybrids, collectives, etc.
 Crucial to assess their cognitive profiles, which are unlike and beyond humans’.
 Critical for recognising what professions can be automated first.
 Compensating for several cognitive impairments (e.g., aging).
 From a task-oriented to an ability-oriented evaluation:
 Evaluating cognitive abilities requires a change of paradigm:
 From a populational to a universal perspective,
 From agglomerative (task diversity) to solutional (policy diversity) approaches,
 Hierarchical view, clustering bottom-up.
THANK YOU!
 More info:
 BOOK
 “The Measure of All Minds: Evaluating Natural
and Artificial Intelligence”, Cambridge
University Press, 2017. http://www.allminds.org
 An AI Evaluation Survey
 "Evaluation in artificial intelligence: From task-oriented to
ability-oriented measurement", Artificial Intelligence Review, 2016
More Related Content

Similar to Cognitive systems institute talk 8 june 2017 - v.1.0

LEC_2_AI_INTRODUCTION - Copy.pptx
LEC_2_AI_INTRODUCTION - Copy.pptxLEC_2_AI_INTRODUCTION - Copy.pptx
LEC_2_AI_INTRODUCTION - Copy.pptx
Ajaykumar967485
 
Today is all about AI
Today is all about AIToday is all about AI
Today is all about AI
Petru Cioată
 
Past, Present and Future of AI: a Fascinating Journey - Ramon Lopez de Mantar...
Past, Present and Future of AI: a Fascinating Journey - Ramon Lopez de Mantar...Past, Present and Future of AI: a Fascinating Journey - Ramon Lopez de Mantar...
Past, Present and Future of AI: a Fascinating Journey - Ramon Lopez de Mantar...
PAPIs.io
 
introduction to ai
introduction to aiintroduction to ai
introduction to ai
SabbirAhmed274
 
artificial intelligence
artificial intelligenceartificial intelligence
artificial intelligenceMayank Saxena
 
Introduction to Artificial intelligence and ML
Introduction to Artificial intelligence and MLIntroduction to Artificial intelligence and ML
Introduction to Artificial intelligence and ML
bansalpra7
 
Basic questions about artificial intelligence
Basic questions about artificial intelligenceBasic questions about artificial intelligence
Basic questions about artificial intelligenceAqib Memon
 
Artificial Intelligence in Gaming
Artificial Intelligence in GamingArtificial Intelligence in Gaming
Artificial Intelligence in Gaming
Satvik J
 
Introduction to Artificial Intelligence.doc
Introduction to Artificial Intelligence.docIntroduction to Artificial Intelligence.doc
Introduction to Artificial Intelligence.docbutest
 
On the problems of interface: explainability, conceptual spaces, relevance
On the problems of interface: explainability, conceptual spaces, relevanceOn the problems of interface: explainability, conceptual spaces, relevance
On the problems of interface: explainability, conceptual spaces, relevance
Giovanni Sileno
 
Quantifying Artificial Intelligence and What Comes Next!
Quantifying Artificial Intelligence and What Comes Next!Quantifying Artificial Intelligence and What Comes Next!
Quantifying Artificial Intelligence and What Comes Next!
University of Hertfordshire
 
Introduction to AI
Introduction to AIIntroduction to AI
Introduction to AI
Dymytr Yovchev
 
Harry Collins - Testing Machines as Social Prostheses - EuroSTAR 2013
Harry Collins - Testing Machines as Social Prostheses - EuroSTAR 2013Harry Collins - Testing Machines as Social Prostheses - EuroSTAR 2013
Harry Collins - Testing Machines as Social Prostheses - EuroSTAR 2013
TEST Huddle
 
What is artificial intelligence
What is artificial intelligenceWhat is artificial intelligence
What is artificial intelligenceSulbha Gath
 
From Turing To Humanoid Robots - Ramón López de Mántaras
From Turing To Humanoid Robots - Ramón López de MántarasFrom Turing To Humanoid Robots - Ramón López de Mántaras
From Turing To Humanoid Robots - Ramón López de Mántaras
Machine Learning Valencia
 
WHY ROBOTICS, AI, AL & QUANTUM COMPUTING
WHY ROBOTICS, AI, AL & QUANTUM COMPUTINGWHY ROBOTICS, AI, AL & QUANTUM COMPUTING
WHY ROBOTICS, AI, AL & QUANTUM COMPUTING
University of Hertfordshire
 
machine learning in the age of big data: new approaches and business applicat...
machine learning in the age of big data: new approaches and business applicat...machine learning in the age of big data: new approaches and business applicat...
machine learning in the age of big data: new approaches and business applicat...
Armando Vieira
 
Sp14 cs188 lecture 1 - introduction
Sp14 cs188 lecture 1  - introductionSp14 cs188 lecture 1  - introduction
Sp14 cs188 lecture 1 - introduction
Amer Noureddin
 

Similar to Cognitive systems institute talk 8 june 2017 - v.1.0 (20)

LEC_2_AI_INTRODUCTION - Copy.pptx
LEC_2_AI_INTRODUCTION - Copy.pptxLEC_2_AI_INTRODUCTION - Copy.pptx
LEC_2_AI_INTRODUCTION - Copy.pptx
 
Today is all about AI
Today is all about AIToday is all about AI
Today is all about AI
 
Past, Present and Future of AI: a Fascinating Journey - Ramon Lopez de Mantar...
Past, Present and Future of AI: a Fascinating Journey - Ramon Lopez de Mantar...Past, Present and Future of AI: a Fascinating Journey - Ramon Lopez de Mantar...
Past, Present and Future of AI: a Fascinating Journey - Ramon Lopez de Mantar...
 
introduction to ai
introduction to aiintroduction to ai
introduction to ai
 
artificial intelligence
artificial intelligenceartificial intelligence
artificial intelligence
 
01 ai
01 ai01 ai
01 ai
 
unit 1.pptx
unit 1.pptxunit 1.pptx
unit 1.pptx
 
Introduction to Artificial intelligence and ML
Introduction to Artificial intelligence and MLIntroduction to Artificial intelligence and ML
Introduction to Artificial intelligence and ML
 
Basic questions about artificial intelligence
Basic questions about artificial intelligenceBasic questions about artificial intelligence
Basic questions about artificial intelligence
 
Artificial Intelligence in Gaming
Artificial Intelligence in GamingArtificial Intelligence in Gaming
Artificial Intelligence in Gaming
 
Introduction to Artificial Intelligence.doc
Introduction to Artificial Intelligence.docIntroduction to Artificial Intelligence.doc
Introduction to Artificial Intelligence.doc
 
On the problems of interface: explainability, conceptual spaces, relevance
On the problems of interface: explainability, conceptual spaces, relevanceOn the problems of interface: explainability, conceptual spaces, relevance
On the problems of interface: explainability, conceptual spaces, relevance
 
Quantifying Artificial Intelligence and What Comes Next!
Quantifying Artificial Intelligence and What Comes Next!Quantifying Artificial Intelligence and What Comes Next!
Quantifying Artificial Intelligence and What Comes Next!
 
Introduction to AI
Introduction to AIIntroduction to AI
Introduction to AI
 
Harry Collins - Testing Machines as Social Prostheses - EuroSTAR 2013
Harry Collins - Testing Machines as Social Prostheses - EuroSTAR 2013Harry Collins - Testing Machines as Social Prostheses - EuroSTAR 2013
Harry Collins - Testing Machines as Social Prostheses - EuroSTAR 2013
 
What is artificial intelligence
What is artificial intelligenceWhat is artificial intelligence
What is artificial intelligence
 
From Turing To Humanoid Robots - Ramón López de Mántaras
From Turing To Humanoid Robots - Ramón López de MántarasFrom Turing To Humanoid Robots - Ramón López de Mántaras
From Turing To Humanoid Robots - Ramón López de Mántaras
 
WHY ROBOTICS, AI, AL & QUANTUM COMPUTING
WHY ROBOTICS, AI, AL & QUANTUM COMPUTINGWHY ROBOTICS, AI, AL & QUANTUM COMPUTING
WHY ROBOTICS, AI, AL & QUANTUM COMPUTING
 
machine learning in the age of big data: new approaches and business applicat...
machine learning in the age of big data: new approaches and business applicat...machine learning in the age of big data: new approaches and business applicat...
machine learning in the age of big data: new approaches and business applicat...
 
Sp14 cs188 lecture 1 - introduction
Sp14 cs188 lecture 1  - introductionSp14 cs188 lecture 1  - introduction
Sp14 cs188 lecture 1 - introduction
 

More from diannepatricia

Teaching cognitive computing with ibm watson
Teaching cognitive computing with ibm watsonTeaching cognitive computing with ibm watson
Teaching cognitive computing with ibm watson
diannepatricia
 
Building Compassionate Conversational Systems
Building Compassionate Conversational SystemsBuilding Compassionate Conversational Systems
Building Compassionate Conversational Systems
diannepatricia
 
“Artificial Intelligence, Cognitive Computing and Innovating in Practice”
“Artificial Intelligence, Cognitive Computing and Innovating in Practice”“Artificial Intelligence, Cognitive Computing and Innovating in Practice”
“Artificial Intelligence, Cognitive Computing and Innovating in Practice”
diannepatricia
 
Cognitive Insights drive self-driving Accessibility
Cognitive Insights drive self-driving AccessibilityCognitive Insights drive self-driving Accessibility
Cognitive Insights drive self-driving Accessibility
diannepatricia
 
Artificial Intellingence in the Car
Artificial Intellingence in the CarArtificial Intellingence in the Car
Artificial Intellingence in the Car
diannepatricia
 
“Semantic PDF Processing & Document Representation”
“Semantic PDF Processing & Document Representation”“Semantic PDF Processing & Document Representation”
“Semantic PDF Processing & Document Representation”
diannepatricia
 
Joining Industry and Students for Cognitive Solutions at Karlsruhe Services R...
Joining Industry and Students for Cognitive Solutions at Karlsruhe Services R...Joining Industry and Students for Cognitive Solutions at Karlsruhe Services R...
Joining Industry and Students for Cognitive Solutions at Karlsruhe Services R...
diannepatricia
 
170330 cognitive systems institute speaker series mark sherman - watson pr...
170330 cognitive systems institute speaker series    mark sherman - watson pr...170330 cognitive systems institute speaker series    mark sherman - watson pr...
170330 cognitive systems institute speaker series mark sherman - watson pr...
diannepatricia
 
“Fairness Cases as an Accelerant and Enabler for Cognitive Assistance Adoption”
“Fairness Cases as an Accelerant and Enabler for Cognitive Assistance Adoption”“Fairness Cases as an Accelerant and Enabler for Cognitive Assistance Adoption”
“Fairness Cases as an Accelerant and Enabler for Cognitive Assistance Adoption”
diannepatricia
 
Cognitive Assistance for the Aging
Cognitive Assistance for the AgingCognitive Assistance for the Aging
Cognitive Assistance for the Aging
diannepatricia
 
From complex Systems to Networks: Discovering and Modeling the Correct Network"
From complex Systems to Networks: Discovering and Modeling the Correct Network"From complex Systems to Networks: Discovering and Modeling the Correct Network"
From complex Systems to Networks: Discovering and Modeling the Correct Network"
diannepatricia
 
The Role of Dialog in Augmented Intelligence
The Role of Dialog in Augmented IntelligenceThe Role of Dialog in Augmented Intelligence
The Role of Dialog in Augmented Intelligence
diannepatricia
 
Developing Cognitive Systems to Support Team Cognition
Developing Cognitive Systems to Support Team CognitionDeveloping Cognitive Systems to Support Team Cognition
Developing Cognitive Systems to Support Team Cognition
diannepatricia
 
Cyber-Social Learning Systems
Cyber-Social Learning SystemsCyber-Social Learning Systems
Cyber-Social Learning Systems
diannepatricia
 
“IT Technology Trends in 2017… and Beyond”
“IT Technology Trends in 2017… and Beyond”“IT Technology Trends in 2017… and Beyond”
“IT Technology Trends in 2017… and Beyond”
diannepatricia
 
"Curious Learning: using a mobile platform for early literacy education as a ...
"Curious Learning: using a mobile platform for early literacy education as a ..."Curious Learning: using a mobile platform for early literacy education as a ...
"Curious Learning: using a mobile platform for early literacy education as a ...
diannepatricia
 
Embodied Cognition - Booch HICSS50
Embodied Cognition - Booch HICSS50Embodied Cognition - Booch HICSS50
Embodied Cognition - Booch HICSS50
diannepatricia
 
KATE - a Platform for Machine Learning
KATE - a Platform for Machine LearningKATE - a Platform for Machine Learning
KATE - a Platform for Machine Learning
diannepatricia
 
Cognitive Computing for Aging Society
Cognitive Computing for Aging SocietyCognitive Computing for Aging Society
Cognitive Computing for Aging Society
diannepatricia
 
Hicss17 asakawa
Hicss17 asakawaHicss17 asakawa
Hicss17 asakawa
diannepatricia
 

More from diannepatricia (20)

Teaching cognitive computing with ibm watson
Teaching cognitive computing with ibm watsonTeaching cognitive computing with ibm watson
Teaching cognitive computing with ibm watson
 
Building Compassionate Conversational Systems
Building Compassionate Conversational SystemsBuilding Compassionate Conversational Systems
Building Compassionate Conversational Systems
 
“Artificial Intelligence, Cognitive Computing and Innovating in Practice”
“Artificial Intelligence, Cognitive Computing and Innovating in Practice”“Artificial Intelligence, Cognitive Computing and Innovating in Practice”
“Artificial Intelligence, Cognitive Computing and Innovating in Practice”
 
Cognitive Insights drive self-driving Accessibility
Cognitive Insights drive self-driving AccessibilityCognitive Insights drive self-driving Accessibility
Cognitive Insights drive self-driving Accessibility
 
Artificial Intellingence in the Car
Artificial Intellingence in the CarArtificial Intellingence in the Car
Artificial Intellingence in the Car
 
“Semantic PDF Processing & Document Representation”
“Semantic PDF Processing & Document Representation”“Semantic PDF Processing & Document Representation”
“Semantic PDF Processing & Document Representation”
 
Joining Industry and Students for Cognitive Solutions at Karlsruhe Services R...
Joining Industry and Students for Cognitive Solutions at Karlsruhe Services R...Joining Industry and Students for Cognitive Solutions at Karlsruhe Services R...
Joining Industry and Students for Cognitive Solutions at Karlsruhe Services R...
 
170330 cognitive systems institute speaker series mark sherman - watson pr...
170330 cognitive systems institute speaker series    mark sherman - watson pr...170330 cognitive systems institute speaker series    mark sherman - watson pr...
170330 cognitive systems institute speaker series mark sherman - watson pr...
 
“Fairness Cases as an Accelerant and Enabler for Cognitive Assistance Adoption”
“Fairness Cases as an Accelerant and Enabler for Cognitive Assistance Adoption”“Fairness Cases as an Accelerant and Enabler for Cognitive Assistance Adoption”
“Fairness Cases as an Accelerant and Enabler for Cognitive Assistance Adoption”
 
Cognitive Assistance for the Aging
Cognitive Assistance for the AgingCognitive Assistance for the Aging
Cognitive Assistance for the Aging
 
From complex Systems to Networks: Discovering and Modeling the Correct Network"
From complex Systems to Networks: Discovering and Modeling the Correct Network"From complex Systems to Networks: Discovering and Modeling the Correct Network"
From complex Systems to Networks: Discovering and Modeling the Correct Network"
 
The Role of Dialog in Augmented Intelligence
The Role of Dialog in Augmented IntelligenceThe Role of Dialog in Augmented Intelligence
The Role of Dialog in Augmented Intelligence
 
Developing Cognitive Systems to Support Team Cognition
Developing Cognitive Systems to Support Team CognitionDeveloping Cognitive Systems to Support Team Cognition
Developing Cognitive Systems to Support Team Cognition
 
Cyber-Social Learning Systems
Cyber-Social Learning SystemsCyber-Social Learning Systems
Cyber-Social Learning Systems
 
“IT Technology Trends in 2017… and Beyond”
“IT Technology Trends in 2017… and Beyond”“IT Technology Trends in 2017… and Beyond”
“IT Technology Trends in 2017… and Beyond”
 
"Curious Learning: using a mobile platform for early literacy education as a ...
"Curious Learning: using a mobile platform for early literacy education as a ..."Curious Learning: using a mobile platform for early literacy education as a ...
"Curious Learning: using a mobile platform for early literacy education as a ...
 
Embodied Cognition - Booch HICSS50
Embodied Cognition - Booch HICSS50Embodied Cognition - Booch HICSS50
Embodied Cognition - Booch HICSS50
 
KATE - a Platform for Machine Learning
KATE - a Platform for Machine LearningKATE - a Platform for Machine Learning
KATE - a Platform for Machine Learning
 
Cognitive Computing for Aging Society
Cognitive Computing for Aging SocietyCognitive Computing for Aging Society
Cognitive Computing for Aging Society
 
Hicss17 asakawa
Hicss17 asakawaHicss17 asakawa
Hicss17 asakawa
 

Cognitive Systems Institute talk, 8 June 2017 - v.1.0

  • 1. José Hernández-Orallo, Dep. de Sistemes Informàtics i Computació, Universitat Politècnica de València (jorallo@dsic.upv.es). Talk for the Cognitive Systems Institute Speaker Series, 8 June 2017. Based on parts of the book “The Measure of All Minds”: http://allminds.org
  • 2. EVALUATING COGNITIVE SYSTEMS: TASK-ORIENTED OR ABILITY-ORIENTED? “Greatest accuracy, at the frontiers of science, requires greatest effort, and probably the most expensive or complicated of measurement instruments and procedures” (David Hand, 2004).
  • 3. COGNITIVE SYSTEMS: MUCH MORE THAN AI
      - Computers: AI or AGI systems, robots, bots, …
      - Cognitively-enhanced organisms, cognitive prosthetics: cyborgs, technology-enhanced humans
      - Biologically-enhanced computers: human computation and their data
      - (Hybrid) collectives: virtual social networks, crowdsourcing
      - Minimal or rare cognition: artificial life (more like bacteria, plants, etc.)
      Societal impact on work, leisure, health, etc., is difficult to assess, as we do not know the cognitive capabilities of all these new systems.
  • 4. THE EVALUATION DISCORDANCE: AI EVALUATION
      "[AI is] the science of making machines do things that would require intelligence if done by [humans]." Marvin Minsky (1968).
      - Machines can do the “things” (tasks) without featuring intelligence.
      - Once a task is solved (“superhuman”), it is no longer an AI problem (the “AI effect”).
      - AI would have progressed very significantly (see, e.g., Nilsson, 2009, chap. 32, or Bostrom, 2014, Table 1, pp. 12–13).
      - But AI is now full of idiots savants.
  • 5. THE EVALUATION DISCORDANCE: AI EVALUATION
      - Specific (task-oriented) AI systems, each labelled “Warning! Intelligence NOT included.”:
          - Machine translation, information retrieval, summarisation
          - PR (pattern recognition): computer vision, speech recognition, etc.
          - Robotic navigation
          - Driverless vehicles
          - Prediction and estimation
          - Planning and scheduling
          - Automated deduction
          - Knowledge-based assistants
          - Game playing
  • 6. THE EVALUATION DISCORDANCE: AI EVALUATION
      - Specific domain evaluation settings (setting → type of evaluation):
          - CADE ATP System Competition → PROBLEM BENCHMARKS
          - Termination Competition → PROBLEM BENCHMARKS
          - The reinforcement learning competition → PROBLEM BENCHMARKS
          - Program synthesis (syntax-guided synthesis) → PROBLEM BENCHMARKS
          - Loebner Prize → HUMAN DISCRIMINATION
          - RoboCup and FIRA (robot football/soccer) → PEER CONFRONTATION
          - International Aerial Robotics Competition (pilotless aircraft) → PROBLEM BENCHMARKS
          - DARPA driverless cars, Cyber Grand Challenge, Rescue Robotics → PROBLEM BENCHMARKS
          - The planning competition → PROBLEM BENCHMARKS
          - General game playing AAAI competition → PEER CONFRONTATION
          - BotPrize (videogame player) contest → HUMAN DISCRIMINATION
          - World Computer Chess Championship → PEER CONFRONTATION
          - Computer Olympiad → PEER CONFRONTATION
          - Annual Computer Poker Competition → PEER CONFRONTATION
          - Trading agent competition → PEER CONFRONTATION
          - Robo Chat Challenge → HUMAN DISCRIMINATION
          - UCI repository, PRTools, or KEEL dataset repository → PROBLEM BENCHMARKS
          - KDD-cup challenges and ML Kaggle competitions → PROBLEM BENCHMARKS
          - Machine translation corpora: Europarl, SE times corpus, the euromatrix, Tenjinno competitions… → PROBLEM BENCHMARKS
          - NLP corpora: Linguistic Data Consortium, … → PROBLEM BENCHMARKS
          - Warlight AI Challenge → PEER CONFRONTATION
          - The Arcade Learning Environment → PROBLEM BENCHMARKS
          - Pathfinding benchmarks (gridworld domains) → PROBLEM BENCHMARKS
          - Genetic programming benchmarks → PROBLEM BENCHMARKS
          - CAPTCHAs → HUMAN DISCRIMINATION
          - Graphics Turing Test → HUMAN DISCRIMINATION
          - FIRA HuroCup humanoid robot competitions → PROBLEM BENCHMARKS
          - …
  • 7. THE EVALUATION DISCORDANCE: AI EVALUATION
      - How to evaluate general-purpose systems and cognitive components? Each of the following is labelled “Warning! Some intelligence MAY BE included.”:
          - Cognitive robots
          - Intelligent assistants
          - Pets, animats and other artificial companions
          - Smart environments
          - Agents, avatars, chatbots
          - Web-bots, Smartbots, Security bots…
  • 8. THE EVALUATION DISCORDANCE: AI EVALUATION
      - “Mythical Turing Test” (Sloman, 2014) and its myriad variants…
      - Mythical human-level machine intelligence
      - A red herring for general-purpose AI!
  • 9. THE EVALUATION DISCORDANCE: AI EVALUATION
      - What benchmarks? More comprehensive?
          - ARISTO (Allen Institute for AI): college science exams
          - Winograd Schema Challenge: questions targeting understanding
          - Weston et al., “AI-Complete Question Answering” (bAbI)
          - CLEVR: relations over visual objects
      BEWARE: AI-completeness was claimed before for calculation, chess, Go, the Turing test, … Now AI is superhuman on most of them! (e.g., https://arxiv.org/pdf/1706.01427.pdf)
  • 10. THE EVALUATION DISCORDANCE: TEST MISMATCH
      - What about psychometric tests or animal tests in AI?
      - In 2003, Sanghi & Dowe: a simple program passing many IQ tests.
      - About 960 lines of code in Perl!
      This made the point unequivocally: programs passing IQ tests are not necessarily intelligent.
  • 11. THE EVALUATION DISCORDANCE: TEST MISMATCH
      - This has not been a deterrent!
      - Psychometric AI (Bringsjord and Schimanski 2003): an “agent is intelligent if and only if it excels at all established, validated tests of intelligence”.
      - Detterman, editor of the journal Intelligence, posed “A challenge to Watson” (Detterman 2011). 2nd level to “be truly intelligent”: tests not seen beforehand.
      - “IQ tests are not for machines, yet” (Dowe & Hernández-Orallo 2012)
  • 12. THE EVALUATION DISCORDANCE: TEST MISMATCH
      - What about developmental tests (or tests for children)?
      - Developmental robotics: battery of tests (Sinapov, Stoytchev, Schenk 2010-13)
      - Cognitive architectures: Newell “test” (Anderson and Lebiere 2003); “Cognitive Decathlon” (Mueller 2007).
      - AGI: high-level competency areas (Adams et al. 2012), task breadth (Goertzel et al. 2009, Rohrer 2010), robot preschool (Goertzel and Bugaj 2009).
      - Slide annotations: a taxonomy for cognitive architectures; a psychometric taxonomy (CHC).
  • 13. THE EVALUATION DISCORDANCE: TEST MISMATCH
      - Adapting tests between disciplines (AI, psychometrics, comparative psychology) is problematic:
          - A test from one group is only valid and reliable for the original group.
          - It may be not necessary and/or not sufficient for the ability.
          - Machines and hybrids represent a new population.
      - Nowadays, many benchmarks assume that AI will use deep learning or millions of examples.
      - But machines and hybrids are also an opportunity to understand how to evaluate cognition.
      Still, we need a different foundation.
  • 14. THE ALGORITHMIC CONFLUENCE: WHAT IQ TESTS MEASURE
      - “Beyond the Turing Test”…
      - “Intelligence” definition and test (C-test) based on algorithmic information theory (Hernández-Orallo 1998-2000).
      - Letter series are common in cognitive tests (Thurstone).
      - Here they are generated from a TM (Turing machine) with properties (projectibility, stability, …).
      - Their difficulty is calculated by Kt.
      - Linked with Levin’s universal search, Solomonoff’s inductive inference, Kolmogorov complexity.
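For reference, Levin's Kt complexity is usually written as below. This is the standard textbook form rather than a formula taken from the slide; the universal machine U and the notation are assumptions made for illustration.

```latex
Kt_U(x) \;=\; \min_{p \,:\, U(p) = x} \bigl\{\, |p| + \log_2 \mathrm{time}_U(p, x) \,\bigr\}
```

Here |p| is the length of program p in bits and time_U(p, x) is the number of steps U takes to output x when running p. Unlike plain Kolmogorov complexity, Kt penalises slow programs, which is what makes it computable and usable as a difficulty estimate.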
  • 15. THE ALGORITHMIC CONFLUENCE: WHAT IQ TESTS MEASURE
      - Metric derived by slicing by difficulty h (Kt) and: [formula not preserved in the transcript]
      - This is IQ-test re-engineering!
      - Intelligence is no longer “what intelligence tests measure” (Boring, 1923).
      - Clues about what IQ tests really measure? Inductive inference.
      Human performance correlated with the difficulty (h) of each exercise. But remember Sanghi and Dowe 2003!
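As a rough sketch of what slicing by difficulty can look like, the snippet below averages hit rates within each difficulty level h and combines them with weights h**e. The function name, the exponent e and the sample answers are illustrative assumptions, not the author's exact metric.

```python
from collections import defaultdict


def difficulty_sliced_score(results, e=0.0):
    """Aggregate test results sliced by difficulty.

    results: iterable of (h, hit) pairs, where h is the difficulty of an
             exercise and hit is True if the subject solved it.
    e:       hypothetical exponent; e=0 weights all difficulties equally,
             e>0 rewards success on harder exercises more.
    Returns the sum over difficulties h of h**e * (hit rate at h).
    """
    by_h = defaultdict(list)
    for h, hit in results:
        by_h[h].append(1.0 if hit else 0.0)
    return sum((h ** e) * (sum(hits) / len(hits)) for h, hits in sorted(by_h.items()))


# Made-up answers: two exercises at each of three difficulty levels.
sample = [(7, True), (7, True), (8, True), (8, False), (9, False), (9, False)]
score_flat = difficulty_sliced_score(sample)             # hit rates 1.0 + 0.5 + 0.0
score_weighted = difficulty_sliced_score(sample, e=1.0)  # 7*1.0 + 8*0.5 + 9*0.0
```

Slicing first and aggregating second means the score does not depend on how many exercises happen to be sampled at each difficulty level.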
  • 16. THE ALGORITHMIC CONFLUENCE: SITUATED TESTS
      - Passive to interactive view:
          - Intelligence as performance in a range of worlds.
          - The set of worlds M is described by Turing machines.
          - Intelligence is measured as an aggregate: [formula not preserved in the transcript]
          - R aggregates r_i and p assigns probabilities to environments. How?
      Diagram: agent π and environment μ exchanging actions a_i, observations o_i and rewards r_i.
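One common way to write such an aggregate, using the symbols named on the slide (π, μ, M, R, p), is sketched below. The symbol Υ and the exact form are assumptions for illustration; the slide's own formula is not available here.

```latex
\Upsilon(\pi) \;=\; \sum_{\mu \in M} p(\mu)\, R(\pi, \mu)
```

Here R(π, μ) summarises the rewards r_i that agent π collects in environment μ (for example, as an expected total or average reward). The open question on the slide is the choice of p: Legg and Hutter's universal intelligence, for instance, takes p(μ) = 2^{-K(μ)}, weighting each environment by its Kolmogorov complexity.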
  • 17. THE ALGORITHMIC CONFLUENCE: SOLUTIONAL APPROACH
      - Three approaches (diagram labels: range of difficulties; diversity of solutions).
      - Distribution choices shown in the diagram: [universal, e.g. Legg and Hutter] [uniform] [universal] [universal] [uniform] [uniform]
      - With the choices in brackets, they are NOT equivalent.
  • 18. THE ALGORITHMIC CONFLUENCE: SOLUTIONAL APPROACH
      - A different view of “general intelligence”:
          - Policy-general intelligence: aggregate by difficulty (e.g., bounded uniform distribution) and, for each difficulty, look for diversity.
          - Connected to the task-independence of the g factor.
      - Ability to find, integrate and emulate a diverse range of successful policies.
      Raises a fascinating question: is there a universal g factor?
  • 19. FROM TASKS TO ABILITIES: CLUSTERING BY SIMILARITY
      - Focus first on intermediate levels between tasks and abilities:
          - Do we have an intrinsic notion of similarity between tasks?
      - Task breadth? Arrange abilities?
          - Hierarchically (e.g., Cattell-Horn-Carroll)
          - Spatially (e.g., Guttman’s model)
  • 20. FROM TASKS TO ABILITIES: CLUSTERING BY SIMILARITY
      - Example (ECA rules as tasks).
          - Task description is not used. No population is used either.
          - The best solutions are used instead and compared.
          - Using similarity as difficulty increases (18 rules of difficulty 8):
      Figure panels: metric multidimensional scaling; dendrogram using complete linkage.
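The dendrogram mentioned above comes from complete-linkage clustering. A minimal pure-Python sketch of that technique follows, assuming a task dissimilarity matrix has already been computed (on the slide it would come from comparing the best solutions of each task). The task names and dissimilarity values here are made up.

```python
def complete_linkage(names, dist):
    """Agglomerative clustering with complete linkage.

    names: list of task labels.
    dist:  symmetric dissimilarity matrix (list of lists), dist[i][j] >= 0.
    Returns the merges in order as (cluster_a, cluster_b, height) tuples.
    """
    clusters = [(i,) for i in range(len(names))]
    merges = []

    def linkage(a, b):
        # Complete linkage: distance between the two farthest members.
        return max(dist[i][j] for i in a for j in b)

    while len(clusters) > 1:
        # Pick the pair of clusters that is closest under complete linkage.
        a, b = min(
            ((x, y) for k, x in enumerate(clusters) for y in clusters[k + 1:]),
            key=lambda pair: linkage(*pair),
        )
        merges.append((
            tuple(names[i] for i in a),
            tuple(names[i] for i in b),
            linkage(a, b),
        ))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


# Hypothetical tasks and hard-coded dissimilarities, for illustration only.
tasks = ["rule_A", "rule_B", "rule_C", "rule_D"]
d = [
    [0.0, 0.1, 0.8, 0.9],
    [0.1, 0.0, 0.7, 0.6],
    [0.8, 0.7, 0.0, 0.2],
    [0.9, 0.6, 0.2, 0.0],
]
merges = complete_linkage(tasks, d)
for left, right, height in merges:
    print(left, "+", right, "at height", height)
```

The merge heights are what a dendrogram plots on its vertical axis. Tasks that merge at low height have similar best solutions and are candidates for belonging to the same ability.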
  • 21. NEW AI EVALUATION PLATFORMS: A COSMOS
      - Here they are:
          - Facebook’s bAbI
          - Arcade Learning Env. (Atari)
          - Video Game Definition Language
          - OpenAI Gym and Universe
          - Microsoft’s Project Malmo
          - DeepMind Lab
          - Facebook’s TorchCraft
          - Facebook’s CommAI
      - AI Magazine report: “A New AI Evaluation Cosmos”
      Most (except CommAI) are oriented towards the evaluation of very embodied AI, but what about more abstract cognition?
  • 22. IS THIS SUFFICIENT? OPEN QUESTIONS
      - What do these platforms / tests measure?
          - Depends on the tasks we define!
      - Many things to be done:
          - Task analysis: their similarities, difficulties, their requirements (data)
          - Abilities: to be conceptualised and identified.
          - Ability-oriented (or feature-oriented) evaluation
          - Incremental, gradual, curriculum, …: task similarity → dependency
      - Recent (EGPAI@ECAI2016, MAIN@NIPS2016) and upcoming workshops: EGPAI@IJCAI2017, MAIN@NIPS2017 ?
  • 23. IS THIS SUFFICIENT? OPEN QUESTIONS
      - We want cognitive components that could be easily integrated into standalone cognitive systems.
      - What to measure: “specific entities”, “networks” or “services” (Spohrer and Banavar 2015)
      - We need a different kind of 'specification' of:
          - What the components are able to do.
          - What the integrated systems will be able to do, depending on their integration (tight, loose, teams, etc.).
      - Understanding the inclusion or emergence of general abilities.
  • 24. CONCLUSIONS
      - Increasing need for the evaluation of cognitive systems:
          - Plethora of new systems: AI, hybrids, collectives, etc.
          - Crucial to assess their cognitive profiles, which may be unlike and beyond humans’.
          - Critical for recognising what professions can be automated first.
          - Compensating for several cognitive impairments (e.g., aging).
      - From a task-oriented to an ability-oriented evaluation:
          - Evaluating cognitive abilities requires a change of paradigm:
              - From a populational to a universal perspective,
              - From agglomerative (task diversity) to solutional (policy diversity) approaches,
              - Hierarchical view, clustering bottom-up.
  • 25. THANK YOU!
      - More info:
          - Book: “The Measure of All Minds: Evaluating Natural and Artificial Intelligence”, Cambridge University Press, 2017. http://www.allminds.org
          - An AI evaluation survey: "Evaluation in artificial intelligence: From task-oriented to ability-oriented measurement", Artificial Intelligence Review, 2016