SlideShare a Scribd company logo
1 of 25
Download to read offline
José Hernández-Orallo
Dep. de Sistemes Informàtics i Computació,
Universitat Politècnica de València
jorallo@dsic.upv.es
Talk for the Cognitive Systems Institute Speaker series
8 June 2017* Based on parts of the book:
“The Measure of All Minds”:
http://allminds.org
E V A L U A T I N G C O G N I T I V E S Y S T E M S : T A S K - O R I E N T E D O R
A B I L I T Y - O R I E N T E D ?
2
“Greatest accuracy, at the frontiers of science,
requires greatest effort, and probably the most
expensive or complicated of measurement
instruments and procedures”
(David Hand, 2004).
COGNITIVE SYSTEMS: MUCH MORE THAN AI
 Computers:
 AI or AGI systems, robots, bots, …
 Cognitively-enhanced organisms, cognitive prosthetics
 Cyborgs, technology-enhanced humans
 Biologically-enhanced computers:
 Human computation and their data
 (Hybrid) collectives
 Virtual social networks, crowdsourcing
 Minimal or rare cognition
 Artificial life (more like bacteria, plants, etc.)
E V A L U A T I N G C O G N I T I V E S Y S T E M S : T A S K - O R I E N T E D O R
A B I L I T Y - O R I E N T E D ?
3
Societal impact on
work, leisure, health, etc.,
difficult to assess as we
do not know the cognitive
capabilities of all these
new systems.
THE EVALUATION DISCORDANCE: AI EVALUATION
E V A L U A T I N G C O G N I T I V E S Y S T E M S : T A S K - O R I E N T E D O R
A B I L I T Y - O R I E N T E D ?
4
Edited image, originally from wikicommons
"[AI is] the science of making
machines do things that would require
intelligence if done by [humans]."
Marvin Minsky (1968).
 They can do the “things” (tasks) without
featuring intelligence.
 Once the task is solved (“superhuman”),
it is no longer an AI problem (“AI effect”)
 AI would have progressed very significantly
(see, e.g., Nilsson, 2009, chap. 32, or Bostrom, 2014, Table 1, pp. 12–13).
 But AI is now full of idiots savants.
THE EVALUATION DISCORDANCE: AI EVALUATION
 Specific (task-oriented) AI systems
E V A L U A T I N G C O G N I T I V E S Y S T E M S : T A S K - O R I E N T E D O R
A B I L I T Y - O R I E N T E D ?
5
Machine translation, information retrieval,
summarisation
Warning!
Intelligence
NOT included.
PR: computer vision,
speech recognition, etc.
Robotic
navigation
Driverless
vehicles
Prediction and
estimation
Planning and
scheduling
Automated
deduction
Knowledge-
based assistants
Game
playing
Warning!
Intelligence
NOT included. Warning!
Intelligence
NOT included.
Warning!
Intelligence
NOT included.
Warning!
Intelligence
NOT included.
Warning!
Intelligence
NOT included.
Warning!
Intelligence
NOT included.
Warning!
Intelligence
NOT included.
Warning!
Intelligence
NOT included.
All images from wikicommons
THE EVALUATION DISCORDANCE: AI EVALUATION
E V A L U A T I N G C O G N I T I V E S Y S T E M S : T A S K - O R I E N T E D O R
A B I L I T Y - O R I E N T E D ?
6
 Specific domain evaluation settings:
 CADE ATP System Competition  PROBLEM BENCHMARKS
 Termination Competition  PROBLEM BENCHMARKS
 The reinforcement learning competition  PROBLEM BENCHMARKS
 Program synthesis (Syntax-guided synthesis)  PROBLEM BENCHMARKS
 Loebner Prize  HUMAN DISCRIMINATION
 Robocup and FIRA (robot football/soccer)  PEER CONFRONTATION
 International Aerial Robotics Competition (pilotless aircraft)  PROBLEM BENCHMARKS
 DARPA driverless cars, Cyber Grand Challenge, Rescue Robotics  PROBLEM BENCHMARKS
 The planning competition  PROBLEM BENCHMARKS
 General game playing AAAI competition  PEER CONFRONTATION
 BotPrize (videogame player) contest  HUMAN DISCRIMINATION
 World Computer Chess Championship  PEER CONFRONTATION
 Computer Olympiad  PEER CONFRONTATION
 Annual Computer Poker Competition  PEER CONFRONTATION
 Trading agent competition  PEER CONFRONTATION
 Robo Chat Challenge  HUMAN DISCRIMINATION
 UCI repository, PRTools, or KEEL dataset repository.  PROBLEM BENCHMARKS
 KDD-cup challenges and ML kaggle competitions  PROBLEM BENCHMARKS
 Machine translation corpora: Europarl, SE times corpus, the euromatrix, Tenjinno competitions…  PROBLEM BENCHMARKS
 NLP corpora: linguistic data consortium, …  PROBLEM BENCHMARKS
 Warlight AI Challenge  PEER CONFRONTATION
 The Arcade Learning Environment  PROBLEM BENCHMARKS
 Pathfinding benchmarks (gridworld domains)  PROBLEM BENCHMARKS
 Genetic programming benchmarks  PROBLEM BENCHMARKS
 CAPTCHAs  HUMAN DISCRIMINATION
 Graphics Turing Test  HUMAN DISCRIMINATION
 FIRA HuroCup humanoid robot competitions  PROBLEM BENCHMARKS
 …
THE EVALUATION DISCORDANCE: AI EVALUATION
E V A L U A T I N G C O G N I T I V E S Y S T E M S : T A S K - O R I E N T E D O R
A B I L I T Y - O R I E N T E D ?
7
Cognitive robots
Intelligent assistants
Pets, animats and other
artificial companions
Smart environments
Agents, avatars, chatbots
Web-bots, Smartbots, Security bots…
 How to evaluate general-purpose systems and cognitive components?
Warning!
Some intelligence
MAY BE included.
Warning!
Some intelligence
MAY BE included.
Warning!
Some intelligence
MAY BE included.
Warning!
Some intelligence
MAY BE included.
Warning!
Some intelligence
MAY BE included.
Warning!
Some intelligence
MAY BE included.
THE EVALUATION DISCORDANCE: AI EVALUATION
 “Mythical Turing Test” (Sloman, 2014)
and its myriad variants…
 Mythical human-level machine intelligence
 A red herring for general-purpose AI!
E V A L U A T I N G C O G N I T I V E S Y S T E M S : T A S K - O R I E N T E D O R
A B I L I T Y - O R I E N T E D ?
8
THE EVALUATION DISCORDANCE: AI EVALUATION
 What benchmarks? More comprehensive?
 ARISTO (Allen Institute for AI) : College science exams
 Winograd Schema Challenge : Questions targeting understanding.
 Weston et al. “AI-Complete Question Answering” (bAbI)
 CLEVR : Relations over visual objects
E V A L U A T I N G C O G N I T I V E S Y S T E M S : T A S K - O R I E N T E D O R
A B I L I T Y - O R I E N T E D ?
9
BEWARE: AI-Completeness claimed before
Calculation, Chess, Go, Turing test, …
Now AI is superhuman on most of them!
(e.g., https://arxiv.org/pdf/1706.01427.pdf)
THE EVALUATION DISCORDANCE: TEST MISMATCH
 What about psychometric tests or animal tests in AI?
 In 2003, Sanghi & Dowe :
 simple program passing many IQ tests.
 About 960 lines of code in Perl!
E V A L U A T I N G C O G N I T I V E S Y S T E M S : T A S K - O R I E N T E D O R
A B I L I T Y - O R I E N T E D ?
10
This made the point
unequivocally:
programs passing IQ
tests are not
necessarily intelligent
THE EVALUATION DISCORDANCE: TEST MISMATCH
 This has not been a deterrent!
 Psychometric AI (Bringsjord and Schmimanski 2003):
 An “agent is intelligent if and only if it excels at all
established, validated tests of intelligence”.
 Detterman, editor of the Intelligence Journal, posed “A
challenge to Watson” (Detterman 2011)
 2nd level to “be truly intelligent”: tests not seen
beforehand.
 “IQ tests are not for machines, yet” (Dowe & Hernandez-Orallo
2012)
E V A L U A T I N G C O G N I T I V E S Y S T E M S : T A S K - O R I E N T E D O R
A B I L I T Y - O R I E N T E D ?
11
THE EVALUATION DISCORDANCE: TEST MISMATCH
 What about developmental tests (or tests for children)?
E V A L U A T I N G C O G N I T I V E S Y S T E M S : T A S K - O R I E N T E D O R
A B I L I T Y - O R I E N T E D ?
12
 Developmental robotics:
 Battery of tests (Sinapov, Stoytchev, Schenk 2010-13)
 Cognitive architectures:
 Newell “test” (Anderson and Lebiere 2003)
 “Cognitive Decathlon” (Mueller 2007).
 AGI: high-level competency areas (Adams
et al. 2012), task breadth (Goertzel et al 2009,
Rohrer 2010), robot preschool (Goertzel and
Bugaj 2009).
a taxonomy for
cognitive architectures
a psychometric
taxonomy (CHC)
THE EVALUATION DISCORDANCE: TEST MISMATCH
 Adapting tests between disciplines (AI, psychometrics, comparative
psychology) is problematic:
 Test from one group only valid and reliable for the original group.
 Not necessary and/or not sufficient for the ability.
 Machines and hybrids represent a new population.
 Nowadays, many benchmarks are assuming that AI will use deep
learning or millions of examples.
 But machines and hybrids are also an opportunity to understand how
to evaluate cognition. Still,
E V A L U A T I N G C O G N I T I V E S Y S T E M S : T A S K - O R I E N T E D O R
A B I L I T Y - O R I E N T E D ?
13
We need a different foundation
THE ALGORITHMIC CONFLUENCE: WHAT IQ TESTS MEASURE
E V A L U A T I N G C O G N I T I V E S Y S T E M S : T A S K - O R I E N T E D O R
A B I L I T Y - O R I E N T E D ?
14
 “Beyond the Turing Test”…
 “Intelligence” definition and test (C-test) based on algorithmic
information theory (Hernandez-Orallo 1998-2000).
 Letter series common in cognitive tests (Thurstone).
 Here generated from a TM with properties (projectibility, stability, …).
 Their difficulty is calculated by Kt
 Linked with Levin’s universal search, Solomonoff’s inductive
inference, Kolmogorov complexity.
THE ALGORITHMIC CONFLUENCE: WHAT IQ TESTS MEASURE
 Metric derived by slicing by difficulty h (Kt) and :
 This is IQ-test re-engineering!
 Intelligence no longer “what intelligence tests measure” (Boring, 1923).
 Clues about what IQ tests really measure? Inductive inference.
E V A L U A T I N G C O G N I T I V E S Y S T E M S : T A S K - O R I E N T E D O R
A B I L I T Y - O R I E N T E D ?
15
Human performance
correlated with the difficulty
(h) of each exercise.
But remember Sanghi and Dowe 2003!
THE ALGORITHMIC CONFLUENCE: SITUATED TESTS
 Passive to interactive view:
 Intelligence as performance in a range of worlds.
 The set of worlds M is described by Turing machines.
 Intelligence is measured as an aggregate:
 R aggregates ri and p assigns probabilities to environments. How?
E V A L U A T I N G C O G N I T I V E S Y S T E M S : T A S K - O R I E N T E D O R
A B I L I T Y - O R I E N T E D ?
16
π μ
ri
oi
ai
THE ALGORITHMIC CONFLUENCE: SOLUTIONAL APPROACH
E V A L U A T I N G C O G N I T I V E S Y S T E M S : T A S K - O R I E N T E D O R
A B I L I T Y - O R I E N T E D ?
17
 Three approaches:
Range of difficulties Diversity of solutions
[universal, e.g. Legg and Hutter]
[uniform] [universal]
[universal][uniform][uniform]
[With the choices in brackets, they are NOT equivalent]
THE ALGORITHMIC CONFLUENCE: SOLUTIONAL APPROACH
 A different view of “general intelligence”:
 Policy-general intelligence: aggregate by difficulty (e.g., bounded
uniform distribution) and for each difficulty look for diversity.
 Connected to the task-independence of the g factor.
E V A L U A T I N G C O G N I T I V E S Y S T E M S : T A S K - O R I E N T E D O R
A B I L I T Y - O R I E N T E D ?
18
Raises a fascinating question: Is there a universal g factor?
Ability to find, integrate and emulate a
diverse range of successful policies.
FROM TASKS TO ABILITIES: CLUSTERING BY SIMILARITY
 Focus first on intermediate levels between tasks and abilities:
 Do we have an intrinsic notion of similarity between tasks?
E V A L U A T I N G C O G N I T I V E S Y S T E M S : T A S K - O R I E N T E D O R
A B I L I T Y - O R I E N T E D ?
19
 Task breadth? Arrange abilities?
 Hierarchically (e.g., Catell-Horn-Carroll)
 Spatially (e.g., Guttman’s model)
FROM TASKS TO ABILITIES: CLUSTERING BY SIMILARITY
 Example (ECA rules as tasks).
 Task description is not used. No population is used either.
 The best solutions are used instead and compared.
 Using similarity as difficulty increases (18 rules of difficulty 8):
E V A L U A T I N G C O G N I T I V E S Y S T E M S : T A S K - O R I E N T E D O R
A B I L I T Y - O R I E N T E D ?
20
Dendrogram using complete linkageMetric multidimensional scaling
NEW AI EVALUATION PLATFORMS: A COSMOS
 Here they are:
 Facebook’s bAbi
 Arcade Learning Env. (Atari)
 Video Game Definition Language
 OpenAI Gym and Universe
 Microsoft’s Project Malmo
 DeepMind Lab
 Facebook’s TorchCraft
 Facebook’s CommAI
 AI Magazine report: “A New AI Evaluation Cosmos”
E V A L U A T I N G C O G N I T I V E S Y S T E M S : T A S K - O R I E N T E D O R
A B I L I T Y - O R I E N T E D ?
21
Most (except CommAI) oriented towards the evaluation of
very embodied AI, but what about more abstract cognition?
IS THIS SUFFICIENT? OPEN QUESTIONS
 What do these platforms / test measure?
 Depends on the tasks we define!
 Many things to be done
 Task analysis, their similarities, difficulties, their requirements (data)
 Abilities: be conceptualised and identified.
 Ability-oriented (or feature-oriented) evaluation
 Incremental, gradual, curriculum, …: task similarity → dependency
 Recent (EGPAI@ECAI2016, MAIN@NIPS2016) and upcoming workshops
E V A L U A T I N G C O G N I T I V E S Y S T E M S : T A S K - O R I E N T E D O R
A B I L I T Y - O R I E N T E D ?
22
EGPAI@IJCAI2017
MAIN@NIPS2017 ?
IS THIS SUFFICIENT? OPEN QUESTIONS
 We want cognitive components that could be easily integrated into
standalone cognitive systems.
 What to measure:
 “specific entities”, “networks” or “services” (Spohrer and Banavar 2015)
 We need a different kind of 'specification' of
 What the components are able to do.
 What the integrated systems will be able to do,
 Depending on their integration (tight, loose, teams, etc.).
 Understanding the inclusion or emergence of general abilities.
E V A L U A T I N G C O G N I T I V E S Y S T E M S : T A S K - O R I E N T E D O R
A B I L I T Y - O R I E N T E D ?
23
CONCLUSIONS
 Increasing need for the evaluation of cognitive systems:
 Plethora of new systems: AI, hybrids, collectives, etc.
 Crucial to assess their cognitive profiles unlike and beyond humans’.
 Critical for recognising what professions can be automated first.
 Compensating for several cognitive impairments (e.g., aging).
 From a task-oriented to an ability-oriented evaluation:
 Evaluating cognitive abilities requires a change of paradigm:
 From a populational to a universal perspective,
 From agglomerative (task diversity) to solutional (policy diversity) approaches,
 Hierarchical view, clustering bottom-up.
E V A L U A T I N G C O G N I T I V E S Y S T E M S : T A S K - O R I E N T E D O R
A B I L I T Y - O R I E N T E D ?
24
E V A L U A T I N G C O G N I T I V E S Y S T E M S : T A S K - O R I E N T E D O R
A B I L I T Y - O R I E N T E D ?
25
THANK YOU!
 More info:
 BOOK
 “The Measure of All Minds: Evaluating Natural
and Artificial Intelligence”, Cambridge
University Press, 2017. http://www.allminds.org
 An AI Evaluation Survey
 "Evaluation in artificial intelligence: From task-
oriented to ability-oriented measurement",
Artificial Intelligence Review, 2016

More Related Content

Similar to Cognitive systems institute talk 8 june 2017 - v.1.0

LEC_2_AI_INTRODUCTION - Copy.pptx
LEC_2_AI_INTRODUCTION - Copy.pptxLEC_2_AI_INTRODUCTION - Copy.pptx
LEC_2_AI_INTRODUCTION - Copy.pptxAjaykumar967485
 
Today is all about AI
Today is all about AIToday is all about AI
Today is all about AIPetru Cioată
 
Past, Present and Future of AI: a Fascinating Journey - Ramon Lopez de Mantar...
Past, Present and Future of AI: a Fascinating Journey - Ramon Lopez de Mantar...Past, Present and Future of AI: a Fascinating Journey - Ramon Lopez de Mantar...
Past, Present and Future of AI: a Fascinating Journey - Ramon Lopez de Mantar...PAPIs.io
 
artificial intelligence
artificial intelligenceartificial intelligence
artificial intelligenceMayank Saxena
 
Introduction to Artificial intelligence and ML
Introduction to Artificial intelligence and MLIntroduction to Artificial intelligence and ML
Introduction to Artificial intelligence and MLbansalpra7
 
Basic questions about artificial intelligence
Basic questions about artificial intelligenceBasic questions about artificial intelligence
Basic questions about artificial intelligenceAqib Memon
 
Artificial Intelligence in Gaming
Artificial Intelligence in GamingArtificial Intelligence in Gaming
Artificial Intelligence in GamingSatvik J
 
Introduction to Artificial Intelligence.doc
Introduction to Artificial Intelligence.docIntroduction to Artificial Intelligence.doc
Introduction to Artificial Intelligence.docbutest
 
On the problems of interface: explainability, conceptual spaces, relevance
On the problems of interface: explainability, conceptual spaces, relevanceOn the problems of interface: explainability, conceptual spaces, relevance
On the problems of interface: explainability, conceptual spaces, relevanceGiovanni Sileno
 
Harry Collins - Testing Machines as Social Prostheses - EuroSTAR 2013
Harry Collins - Testing Machines as Social Prostheses - EuroSTAR 2013Harry Collins - Testing Machines as Social Prostheses - EuroSTAR 2013
Harry Collins - Testing Machines as Social Prostheses - EuroSTAR 2013TEST Huddle
 
What is artificial intelligence
What is artificial intelligenceWhat is artificial intelligence
What is artificial intelligenceSulbha Gath
 
From Turing To Humanoid Robots - Ramón López de Mántaras
From Turing To Humanoid Robots - Ramón López de MántarasFrom Turing To Humanoid Robots - Ramón López de Mántaras
From Turing To Humanoid Robots - Ramón López de MántarasMachine Learning Valencia
 
machine learning in the age of big data: new approaches and business applicat...
machine learning in the age of big data: new approaches and business applicat...machine learning in the age of big data: new approaches and business applicat...
machine learning in the age of big data: new approaches and business applicat...Armando Vieira
 
Sp14 cs188 lecture 1 - introduction
Sp14 cs188 lecture 1  - introductionSp14 cs188 lecture 1  - introduction
Sp14 cs188 lecture 1 - introductionAmer Noureddin
 
Artificial Intelligence AI Topics History and Overview
Artificial Intelligence AI Topics History and OverviewArtificial Intelligence AI Topics History and Overview
Artificial Intelligence AI Topics History and Overviewbutest
 

Similar to Cognitive systems institute talk 8 june 2017 - v.1.0 (20)

LEC_2_AI_INTRODUCTION - Copy.pptx
LEC_2_AI_INTRODUCTION - Copy.pptxLEC_2_AI_INTRODUCTION - Copy.pptx
LEC_2_AI_INTRODUCTION - Copy.pptx
 
Today is all about AI
Today is all about AIToday is all about AI
Today is all about AI
 
Past, Present and Future of AI: a Fascinating Journey - Ramon Lopez de Mantar...
Past, Present and Future of AI: a Fascinating Journey - Ramon Lopez de Mantar...Past, Present and Future of AI: a Fascinating Journey - Ramon Lopez de Mantar...
Past, Present and Future of AI: a Fascinating Journey - Ramon Lopez de Mantar...
 
introduction to ai
introduction to aiintroduction to ai
introduction to ai
 
artificial intelligence
artificial intelligenceartificial intelligence
artificial intelligence
 
01 ai
01 ai01 ai
01 ai
 
unit 1.pptx
unit 1.pptxunit 1.pptx
unit 1.pptx
 
Introduction to Artificial intelligence and ML
Introduction to Artificial intelligence and MLIntroduction to Artificial intelligence and ML
Introduction to Artificial intelligence and ML
 
Basic questions about artificial intelligence
Basic questions about artificial intelligenceBasic questions about artificial intelligence
Basic questions about artificial intelligence
 
Artificial Intelligence in Gaming
Artificial Intelligence in GamingArtificial Intelligence in Gaming
Artificial Intelligence in Gaming
 
Introduction to Artificial Intelligence.doc
Introduction to Artificial Intelligence.docIntroduction to Artificial Intelligence.doc
Introduction to Artificial Intelligence.doc
 
On the problems of interface: explainability, conceptual spaces, relevance
On the problems of interface: explainability, conceptual spaces, relevanceOn the problems of interface: explainability, conceptual spaces, relevance
On the problems of interface: explainability, conceptual spaces, relevance
 
Introduction to AI
Introduction to AIIntroduction to AI
Introduction to AI
 
Harry Collins - Testing Machines as Social Prostheses - EuroSTAR 2013
Harry Collins - Testing Machines as Social Prostheses - EuroSTAR 2013Harry Collins - Testing Machines as Social Prostheses - EuroSTAR 2013
Harry Collins - Testing Machines as Social Prostheses - EuroSTAR 2013
 
What is artificial intelligence
What is artificial intelligenceWhat is artificial intelligence
What is artificial intelligence
 
From Turing To Humanoid Robots - Ramón López de Mántaras
From Turing To Humanoid Robots - Ramón López de MántarasFrom Turing To Humanoid Robots - Ramón López de Mántaras
From Turing To Humanoid Robots - Ramón López de Mántaras
 
WHY ROBOTICS, AI, AL & QUANTUM COMPUTING
WHY ROBOTICS, AI, AL & QUANTUM COMPUTINGWHY ROBOTICS, AI, AL & QUANTUM COMPUTING
WHY ROBOTICS, AI, AL & QUANTUM COMPUTING
 
machine learning in the age of big data: new approaches and business applicat...
machine learning in the age of big data: new approaches and business applicat...machine learning in the age of big data: new approaches and business applicat...
machine learning in the age of big data: new approaches and business applicat...
 
Sp14 cs188 lecture 1 - introduction
Sp14 cs188 lecture 1  - introductionSp14 cs188 lecture 1  - introduction
Sp14 cs188 lecture 1 - introduction
 
Artificial Intelligence AI Topics History and Overview
Artificial Intelligence AI Topics History and OverviewArtificial Intelligence AI Topics History and Overview
Artificial Intelligence AI Topics History and Overview
 

More from diannepatricia

Teaching cognitive computing with ibm watson
Teaching cognitive computing with ibm watsonTeaching cognitive computing with ibm watson
Teaching cognitive computing with ibm watsondiannepatricia
 
Building Compassionate Conversational Systems
Building Compassionate Conversational SystemsBuilding Compassionate Conversational Systems
Building Compassionate Conversational Systemsdiannepatricia
 
“Artificial Intelligence, Cognitive Computing and Innovating in Practice”
“Artificial Intelligence, Cognitive Computing and Innovating in Practice”“Artificial Intelligence, Cognitive Computing and Innovating in Practice”
“Artificial Intelligence, Cognitive Computing and Innovating in Practice”diannepatricia
 
Cognitive Insights drive self-driving Accessibility
Cognitive Insights drive self-driving AccessibilityCognitive Insights drive self-driving Accessibility
Cognitive Insights drive self-driving Accessibilitydiannepatricia
 
Artificial Intellingence in the Car
Artificial Intellingence in the CarArtificial Intellingence in the Car
Artificial Intellingence in the Cardiannepatricia
 
“Semantic PDF Processing & Document Representation”
“Semantic PDF Processing & Document Representation”“Semantic PDF Processing & Document Representation”
“Semantic PDF Processing & Document Representation”diannepatricia
 
Joining Industry and Students for Cognitive Solutions at Karlsruhe Services R...
Joining Industry and Students for Cognitive Solutions at Karlsruhe Services R...Joining Industry and Students for Cognitive Solutions at Karlsruhe Services R...
Joining Industry and Students for Cognitive Solutions at Karlsruhe Services R...diannepatricia
 
170330 cognitive systems institute speaker series mark sherman - watson pr...
170330 cognitive systems institute speaker series    mark sherman - watson pr...170330 cognitive systems institute speaker series    mark sherman - watson pr...
170330 cognitive systems institute speaker series mark sherman - watson pr...diannepatricia
 
“Fairness Cases as an Accelerant and Enabler for Cognitive Assistance Adoption”
“Fairness Cases as an Accelerant and Enabler for Cognitive Assistance Adoption”“Fairness Cases as an Accelerant and Enabler for Cognitive Assistance Adoption”
“Fairness Cases as an Accelerant and Enabler for Cognitive Assistance Adoption”diannepatricia
 
Cognitive Assistance for the Aging
Cognitive Assistance for the AgingCognitive Assistance for the Aging
Cognitive Assistance for the Agingdiannepatricia
 
From complex Systems to Networks: Discovering and Modeling the Correct Network"
From complex Systems to Networks: Discovering and Modeling the Correct Network"From complex Systems to Networks: Discovering and Modeling the Correct Network"
From complex Systems to Networks: Discovering and Modeling the Correct Network"diannepatricia
 
The Role of Dialog in Augmented Intelligence
The Role of Dialog in Augmented IntelligenceThe Role of Dialog in Augmented Intelligence
The Role of Dialog in Augmented Intelligencediannepatricia
 
Developing Cognitive Systems to Support Team Cognition
Developing Cognitive Systems to Support Team CognitionDeveloping Cognitive Systems to Support Team Cognition
Developing Cognitive Systems to Support Team Cognitiondiannepatricia
 
Cyber-Social Learning Systems
Cyber-Social Learning SystemsCyber-Social Learning Systems
Cyber-Social Learning Systemsdiannepatricia
 
“IT Technology Trends in 2017… and Beyond”
“IT Technology Trends in 2017… and Beyond”“IT Technology Trends in 2017… and Beyond”
“IT Technology Trends in 2017… and Beyond”diannepatricia
 
"Curious Learning: using a mobile platform for early literacy education as a ...
"Curious Learning: using a mobile platform for early literacy education as a ..."Curious Learning: using a mobile platform for early literacy education as a ...
"Curious Learning: using a mobile platform for early literacy education as a ...diannepatricia
 
Embodied Cognition - Booch HICSS50
Embodied Cognition - Booch HICSS50Embodied Cognition - Booch HICSS50
Embodied Cognition - Booch HICSS50diannepatricia
 
KATE - a Platform for Machine Learning
KATE - a Platform for Machine LearningKATE - a Platform for Machine Learning
KATE - a Platform for Machine Learningdiannepatricia
 
Cognitive Computing for Aging Society
Cognitive Computing for Aging SocietyCognitive Computing for Aging Society
Cognitive Computing for Aging Societydiannepatricia
 

More from diannepatricia (20)

Teaching cognitive computing with ibm watson
Teaching cognitive computing with ibm watsonTeaching cognitive computing with ibm watson
Teaching cognitive computing with ibm watson
 
Building Compassionate Conversational Systems
Building Compassionate Conversational SystemsBuilding Compassionate Conversational Systems
Building Compassionate Conversational Systems
 
“Artificial Intelligence, Cognitive Computing and Innovating in Practice”
“Artificial Intelligence, Cognitive Computing and Innovating in Practice”“Artificial Intelligence, Cognitive Computing and Innovating in Practice”
“Artificial Intelligence, Cognitive Computing and Innovating in Practice”
 
Cognitive Insights drive self-driving Accessibility
Cognitive Insights drive self-driving AccessibilityCognitive Insights drive self-driving Accessibility
Cognitive Insights drive self-driving Accessibility
 
Artificial Intellingence in the Car
Artificial Intellingence in the CarArtificial Intellingence in the Car
Artificial Intellingence in the Car
 
“Semantic PDF Processing & Document Representation”
“Semantic PDF Processing & Document Representation”“Semantic PDF Processing & Document Representation”
“Semantic PDF Processing & Document Representation”
 
Joining Industry and Students for Cognitive Solutions at Karlsruhe Services R...
Joining Industry and Students for Cognitive Solutions at Karlsruhe Services R...Joining Industry and Students for Cognitive Solutions at Karlsruhe Services R...
Joining Industry and Students for Cognitive Solutions at Karlsruhe Services R...
 
170330 cognitive systems institute speaker series mark sherman - watson pr...
170330 cognitive systems institute speaker series    mark sherman - watson pr...170330 cognitive systems institute speaker series    mark sherman - watson pr...
170330 cognitive systems institute speaker series mark sherman - watson pr...
 
“Fairness Cases as an Accelerant and Enabler for Cognitive Assistance Adoption”
“Fairness Cases as an Accelerant and Enabler for Cognitive Assistance Adoption”“Fairness Cases as an Accelerant and Enabler for Cognitive Assistance Adoption”
“Fairness Cases as an Accelerant and Enabler for Cognitive Assistance Adoption”
 
Cognitive Assistance for the Aging
Cognitive Assistance for the AgingCognitive Assistance for the Aging
Cognitive Assistance for the Aging
 
From complex Systems to Networks: Discovering and Modeling the Correct Network"
From complex Systems to Networks: Discovering and Modeling the Correct Network"From complex Systems to Networks: Discovering and Modeling the Correct Network"
From complex Systems to Networks: Discovering and Modeling the Correct Network"
 
The Role of Dialog in Augmented Intelligence
The Role of Dialog in Augmented IntelligenceThe Role of Dialog in Augmented Intelligence
The Role of Dialog in Augmented Intelligence
 
Developing Cognitive Systems to Support Team Cognition
Developing Cognitive Systems to Support Team CognitionDeveloping Cognitive Systems to Support Team Cognition
Developing Cognitive Systems to Support Team Cognition
 
Cyber-Social Learning Systems
Cyber-Social Learning SystemsCyber-Social Learning Systems
Cyber-Social Learning Systems
 
“IT Technology Trends in 2017… and Beyond”
“IT Technology Trends in 2017… and Beyond”“IT Technology Trends in 2017… and Beyond”
“IT Technology Trends in 2017… and Beyond”
 
"Curious Learning: using a mobile platform for early literacy education as a ...
"Curious Learning: using a mobile platform for early literacy education as a ..."Curious Learning: using a mobile platform for early literacy education as a ...
"Curious Learning: using a mobile platform for early literacy education as a ...
 
Embodied Cognition - Booch HICSS50
Embodied Cognition - Booch HICSS50Embodied Cognition - Booch HICSS50
Embodied Cognition - Booch HICSS50
 
KATE - a Platform for Machine Learning
KATE - a Platform for Machine LearningKATE - a Platform for Machine Learning
KATE - a Platform for Machine Learning
 
Cognitive Computing for Aging Society
Cognitive Computing for Aging SocietyCognitive Computing for Aging Society
Cognitive Computing for Aging Society
 
Hicss17 asakawa
Hicss17 asakawaHicss17 asakawa
Hicss17 asakawa
 

Recently uploaded

Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 

Recently uploaded (20)

Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 

Cognitive systems institute talk 8 june 2017 - v.1.0

  • 1. José Hernández-Orallo Dep. de Sistemes Informàtics i Computació, Universitat Politècnica de València jorallo@dsic.upv.es Talk for the Cognitive Systems Institute Speaker series 8 June 2017* Based on parts of the book: “The Measure of All Minds”: http://allminds.org
  • 2. E V A L U A T I N G C O G N I T I V E S Y S T E M S : T A S K - O R I E N T E D O R A B I L I T Y - O R I E N T E D ? 2 “Greatest accuracy, at the frontiers of science, requires greatest effort, and probably the most expensive or complicated of measurement instruments and procedures” (David Hand, 2004).
  • 3. COGNITIVE SYSTEMS: MUCH MORE THAN AI  Computers:  AI or AGI systems, robots, bots, …  Cognitively-enhanced organisms, cognitive prosthetics  Cyborgs, technology-enhanced humans  Biologically-enhanced computers:  Human computation and their data  (Hybrid) collectives  Virtual social networks, crowdsourcing  Minimal or rare cognition  Artificial life (more like bacteria, plants, etc.) E V A L U A T I N G C O G N I T I V E S Y S T E M S : T A S K - O R I E N T E D O R A B I L I T Y - O R I E N T E D ? 3 Societal impact on work, leisure, health, etc., difficult to assess as we do not know the cognitive capabilities of all these new systems.
  • 4. THE EVALUATION DISCORDANCE: AI EVALUATION E V A L U A T I N G C O G N I T I V E S Y S T E M S : T A S K - O R I E N T E D O R A B I L I T Y - O R I E N T E D ? 4 Edited image, originally from wikicommons "[AI is] the science of making machines do things that would require intelligence if done by [humans]." Marvin Minsky (1968).  They can do the “things” (tasks) without featuring intelligence.  Once the task is solved (“superhuman”), it is no longer an AI problem (“AI effect”)  AI would have progressed very significantly (see, e.g., Nilsson, 2009, chap. 32, or Bostrom, 2014, Table 1, pp. 12–13).  But AI is now full of idiots savants.
  • 5. THE EVALUATION DISCORDANCE: AI EVALUATION  Specific (task-oriented) AI systems E V A L U A T I N G C O G N I T I V E S Y S T E M S : T A S K - O R I E N T E D O R A B I L I T Y - O R I E N T E D ? 5 Machine translation, information retrieval, summarisation Warning! Intelligence NOT included. PR: computer vision, speech recognition, etc. Robotic navigation Driverless vehicles Prediction and estimation Planning and scheduling Automated deduction Knowledge- based assistants Game playing Warning! Intelligence NOT included. Warning! Intelligence NOT included. Warning! Intelligence NOT included. Warning! Intelligence NOT included. Warning! Intelligence NOT included. Warning! Intelligence NOT included. Warning! Intelligence NOT included. Warning! Intelligence NOT included. All images from wikicommons
  • 6. THE EVALUATION DISCORDANCE: AI EVALUATION E V A L U A T I N G C O G N I T I V E S Y S T E M S : T A S K - O R I E N T E D O R A B I L I T Y - O R I E N T E D ? 6  Specific domain evaluation settings:  CADE ATP System Competition  PROBLEM BENCHMARKS  Termination Competition  PROBLEM BENCHMARKS  The reinforcement learning competition  PROBLEM BENCHMARKS  Program synthesis (Syntax-guided synthesis)  PROBLEM BENCHMARKS  Loebner Prize  HUMAN DISCRIMINATION  Robocup and FIRA (robot football/soccer)  PEER CONFRONTATION  International Aerial Robotics Competition (pilotless aircraft)  PROBLEM BENCHMARKS  DARPA driverless cars, Cyber Grand Challenge, Rescue Robotics  PROBLEM BENCHMARKS  The planning competition  PROBLEM BENCHMARKS  General game playing AAAI competition  PEER CONFRONTATION  BotPrize (videogame player) contest  HUMAN DISCRIMINATION  World Computer Chess Championship  PEER CONFRONTATION  Computer Olympiad  PEER CONFRONTATION  Annual Computer Poker Competition  PEER CONFRONTATION  Trading agent competition  PEER CONFRONTATION  Robo Chat Challenge  HUMAN DISCRIMINATION  UCI repository, PRTools, or KEEL dataset repository.  PROBLEM BENCHMARKS  KDD-cup challenges and ML kaggle competitions  PROBLEM BENCHMARKS  Machine translation corpora: Europarl, SE times corpus, the euromatrix, Tenjinno competitions…  PROBLEM BENCHMARKS  NLP corpora: linguistic data consortium, …  PROBLEM BENCHMARKS  Warlight AI Challenge  PEER CONFRONTATION  The Arcade Learning Environment  PROBLEM BENCHMARKS  Pathfinding benchmarks (gridworld domains)  PROBLEM BENCHMARKS  Genetic programming benchmarks  PROBLEM BENCHMARKS  CAPTCHAs  HUMAN DISCRIMINATION  Graphics Turing Test  HUMAN DISCRIMINATION  FIRA HuroCup humanoid robot competitions  PROBLEM BENCHMARKS  …
  • 7. THE EVALUATION DISCORDANCE: AI EVALUATION E V A L U A T I N G C O G N I T I V E S Y S T E M S : T A S K - O R I E N T E D O R A B I L I T Y - O R I E N T E D ? 7 Cognitive robots Intelligent assistants Pets, animats and other artificial companions Smart environments Agents, avatars, chatbots Web-bots, Smartbots, Security bots…  How to evaluate general-purpose systems and cognitive components? Warning! Some intelligence MAY BE included. Warning! Some intelligence MAY BE included. Warning! Some intelligence MAY BE included. Warning! Some intelligence MAY BE included. Warning! Some intelligence MAY BE included. Warning! Some intelligence MAY BE included.
  • 8. THE EVALUATION DISCORDANCE: AI EVALUATION  “Mythical Turing Test” (Sloman, 2014) and its myriad variants…  Mythical human-level machine intelligence  A red herring for general-purpose AI! E V A L U A T I N G C O G N I T I V E S Y S T E M S : T A S K - O R I E N T E D O R A B I L I T Y - O R I E N T E D ? 8
  • 9. THE EVALUATION DISCORDANCE: AI EVALUATION  What benchmarks? More comprehensive?  ARISTO (Allen Institute for AI) : College science exams  Winograd Schema Challenge : Questions targeting understanding.  Weston et al. “AI-Complete Question Answering” (bAbI)  CLEVR : Relations over visual objects E V A L U A T I N G C O G N I T I V E S Y S T E M S : T A S K - O R I E N T E D O R A B I L I T Y - O R I E N T E D ? 9 BEWARE: AI-Completeness claimed before Calculation, Chess, Go, Turing test, … Now AI is superhuman on most of them! (e.g., https://arxiv.org/pdf/1706.01427.pdf)
  • 10. THE EVALUATION DISCORDANCE: TEST MISMATCH  What about psychometric tests or animal tests in AI?  In 2003, Sanghi & Dowe :  simple program passing many IQ tests.  About 960 lines of code in Perl! E V A L U A T I N G C O G N I T I V E S Y S T E M S : T A S K - O R I E N T E D O R A B I L I T Y - O R I E N T E D ? 10 This made the point unequivocally: programs passing IQ tests are not necessarily intelligent
  • 11. THE EVALUATION DISCORDANCE: TEST MISMATCH  This has not been a deterrent!  Psychometric AI (Bringsjord and Schmimanski 2003):  An “agent is intelligent if and only if it excels at all established, validated tests of intelligence”.  Detterman, editor of the Intelligence Journal, posed “A challenge to Watson” (Detterman 2011)  2nd level to “be truly intelligent”: tests not seen beforehand.  “IQ tests are not for machines, yet” (Dowe & Hernandez-Orallo 2012) E V A L U A T I N G C O G N I T I V E S Y S T E M S : T A S K - O R I E N T E D O R A B I L I T Y - O R I E N T E D ? 11
  • 12. THE EVALUATION DISCORDANCE: TEST MISMATCH  What about developmental tests (or tests for children)? E V A L U A T I N G C O G N I T I V E S Y S T E M S : T A S K - O R I E N T E D O R A B I L I T Y - O R I E N T E D ? 12  Developmental robotics:  Battery of tests (Sinapov, Stoytchev, Schenk 2010-13)  Cognitive architectures:  Newell “test” (Anderson and Lebiere 2003)  “Cognitive Decathlon” (Mueller 2007).  AGI: high-level competency areas (Adams et al. 2012), task breadth (Goertzel et al 2009, Rohrer 2010), robot preschool (Goertzel and Bugaj 2009). a taxonomy for cognitive architectures a psychometric taxonomy (CHC)
  • 13. THE EVALUATION DISCORDANCE: TEST MISMATCH  Adapting tests between disciplines (AI, psychometrics, comparative psychology) is problematic:  Test from one group only valid and reliable for the original group.  Not necessary and/or not sufficient for the ability.  Machines and hybrids represent a new population.  Nowadays, many benchmarks are assuming that AI will use deep learning or millions of examples.  But machines and hybrids are also an opportunity to understand how to evaluate cognition. Still, E V A L U A T I N G C O G N I T I V E S Y S T E M S : T A S K - O R I E N T E D O R A B I L I T Y - O R I E N T E D ? 13 We need a different foundation
  • 14. THE ALGORITHMIC CONFLUENCE: WHAT IQ TESTS MEASURE E V A L U A T I N G C O G N I T I V E S Y S T E M S : T A S K - O R I E N T E D O R A B I L I T Y - O R I E N T E D ? 14  “Beyond the Turing Test”…  “Intelligence” definition and test (C-test) based on algorithmic information theory (Hernandez-Orallo 1998-2000).  Letter series common in cognitive tests (Thurstone).  Here generated from a TM with properties (projectibility, stability, …).  Their difficulty is calculated by Kt  Linked with Levin’s universal search, Solomonoff’s inductive inference, Kolmogorov complexity.
  • 15. THE ALGORITHMIC CONFLUENCE: WHAT IQ TESTS MEASURE  Metric derived by slicing by difficulty h (Kt) and :  This is IQ-test re-engineering!  Intelligence no longer “what intelligence tests measure” (Boring, 1923).  Clues about what IQ tests really measure? Inductive inference. E V A L U A T I N G C O G N I T I V E S Y S T E M S : T A S K - O R I E N T E D O R A B I L I T Y - O R I E N T E D ? 15 Human performance correlated with the difficulty (h) of each exercise. But remember Sanghi and Dowe 2003!
  • 16. THE ALGORITHMIC CONFLUENCE: SITUATED TESTS  Passive to interactive view:  Intelligence as performance in a range of worlds.  The set of worlds M is described by Turing machines.  Intelligence is measured as an aggregate:  R aggregates ri and p assigns probabilities to environments. How? E V A L U A T I N G C O G N I T I V E S Y S T E M S : T A S K - O R I E N T E D O R A B I L I T Y - O R I E N T E D ? 16 π μ ri oi ai
  • 17. THE ALGORITHMIC CONFLUENCE: SOLUTIONAL APPROACH E V A L U A T I N G C O G N I T I V E S Y S T E M S : T A S K - O R I E N T E D O R A B I L I T Y - O R I E N T E D ? 17  Three approaches: Range of difficulties Diversity of solutions [universal, e.g. Legg and Hutter] [uniform] [universal] [universal][uniform][uniform] [With the choices in brackets, they are NOT equivalent]
  • 18. THE ALGORITHMIC CONFLUENCE: SOLUTIONAL APPROACH  A different view of “general intelligence”:  Policy-general intelligence: aggregate by difficulty (e.g., bounded uniform distribution) and for each difficulty look for diversity.  Connected to the task-independence of the g factor. E V A L U A T I N G C O G N I T I V E S Y S T E M S : T A S K - O R I E N T E D O R A B I L I T Y - O R I E N T E D ? 18 Raises a fascinating question: Is there a universal g factor? Ability to find, integrate and emulate a diverse range of successful policies.
  • 19. FROM TASKS TO ABILITIES: CLUSTERING BY SIMILARITY  Focus first on intermediate levels between tasks and abilities:  Do we have an intrinsic notion of similarity between tasks? E V A L U A T I N G C O G N I T I V E S Y S T E M S : T A S K - O R I E N T E D O R A B I L I T Y - O R I E N T E D ? 19  Task breadth? Arrange abilities?  Hierarchically (e.g., Catell-Horn-Carroll)  Spatially (e.g., Guttman’s model)
  • 20. FROM TASKS TO ABILITIES: CLUSTERING BY SIMILARITY  Example (ECA rules as tasks).  Task description is not used. No population is used either.  The best solutions are used instead and compared.  Using similarity as difficulty increases (18 rules of difficulty 8): E V A L U A T I N G C O G N I T I V E S Y S T E M S : T A S K - O R I E N T E D O R A B I L I T Y - O R I E N T E D ? 20 Dendrogram using complete linkageMetric multidimensional scaling
  • 21. NEW AI EVALUATION PLATFORMS: A COSMOS  Here they are:  Facebook’s bAbi  Arcade Learning Env. (Atari)  Video Game Definition Language  OpenAI Gym and Universe  Microsoft’s Project Malmo  DeepMind Lab  Facebook’s TorchCraft  Facebook’s CommAI  AI Magazine report: “A New AI Evaluation Cosmos” E V A L U A T I N G C O G N I T I V E S Y S T E M S : T A S K - O R I E N T E D O R A B I L I T Y - O R I E N T E D ? 21 Most (except CommAI) oriented towards the evaluation of very embodied AI, but what about more abstract cognition?
  • 22. IS THIS SUFFICIENT? OPEN QUESTIONS  What do these platforms / test measure?  Depends on the tasks we define!  Many things to be done  Task analysis, their similarities, difficulties, their requirements (data)  Abilities: be conceptualised and identified.  Ability-oriented (or feature-oriented) evaluation  Incremental, gradual, curriculum, …: task similarity → dependency  Recent (EGPAI@ECAI2016, MAIN@NIPS2016) and upcoming workshops E V A L U A T I N G C O G N I T I V E S Y S T E M S : T A S K - O R I E N T E D O R A B I L I T Y - O R I E N T E D ? 22 EGPAI@IJCAI2017 MAIN@NIPS2017 ?
  • 23. IS THIS SUFFICIENT? OPEN QUESTIONS  We want cognitive components that could be easily integrated into standalone cognitive systems.  What to measure:  “specific entities”, “networks” or “services” (Spohrer and Banavar 2015)  We need a different kind of 'specification' of  What the components are able to do.  What the integrated systems will be able to do,  Depending on their integration (tight, loose, teams, etc.).  Understanding the inclusion or emergence of general abilities. E V A L U A T I N G C O G N I T I V E S Y S T E M S : T A S K - O R I E N T E D O R A B I L I T Y - O R I E N T E D ? 23
  • 24. CONCLUSIONS  Increasing need for the evaluation of cognitive systems:  Plethora of new systems: AI, hybrids, collectives, etc.  Crucial to assess their cognitive profiles unlike and beyond humans’.  Critical for recognising what professions can be automated first.  Compensating for several cognitive impairments (e.g., aging).  From a task-oriented to an ability-oriented evaluation:  Evaluating cognitive abilities requires a change of paradigm:  From a populational to a universal perspective,  From agglomerative (task diversity) to solutional (policy diversity) approaches,  Hierarchical view, clustering bottom-up. E V A L U A T I N G C O G N I T I V E S Y S T E M S : T A S K - O R I E N T E D O R A B I L I T Y - O R I E N T E D ? 24
  • 25. E V A L U A T I N G C O G N I T I V E S Y S T E M S : T A S K - O R I E N T E D O R A B I L I T Y - O R I E N T E D ? 25 THANK YOU!  More info:  BOOK  “The Measure of All Minds: Evaluating Natural and Artificial Intelligence”, Cambridge University Press, 2017. http://www.allminds.org  An AI Evaluation Survey  "Evaluation in artificial intelligence: From task- oriented to ability-oriented measurement", Artificial Intelligence Review, 2016