7.
Meta decision theory
(Nozick 1993; MacAskill 2016)
8.
Altruistic Newcomb problem in a large universe
[Figure: the symbol Ω repeated seven times]
9.
Altruistic Newcomb problem in a large universe
10.
EDT Wager
● Large universe
● Caring about the gains of our copies
● Non-zero credence in EDT
● Meta decision theory
Wager for evidential decision theory (and all other theories that take the impact of copies into account)
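The four premises above can be combined into a rough numerical sketch. It assumes an idealized setup that the slide does not spell out: a perfect predictor, perfectly correlated copies, the standard Newcomb payoffs (1,000 and 1,000,000), and meta decision theory read as a credence-weighted comparison between CDT and EDT. The function names and all numbers are illustrative assumptions.

```python
def one_boxing_advantage(credence_edt, n_copies, small=1_000, big=1_000_000):
    """Credence-weighted advantage of one-boxing over two-boxing.

    Illustrative assumptions (not from the slides): perfect predictor,
    perfectly correlated copies, and an altruistic agent that values
    the summed gains of all copies.
    """
    # CDT: the action cannot change the box contents or what the copies do,
    # so one-boxing just forgoes the small prize for this one agent.
    cdt_gain_from_one_boxing = -small
    # EDT: one-boxing is evidence that every copy one-boxes and gets the
    # big prize; two-boxing is evidence that every copy gets only the small one.
    edt_gain_from_one_boxing = n_copies * (big - small)
    credence_cdt = 1.0 - credence_edt
    return (credence_edt * edt_gain_from_one_boxing
            + credence_cdt * cdt_gain_from_one_boxing)


def recommended_action(credence_edt, n_copies):
    """Action favored by the credence-weighted comparison."""
    advantage = one_boxing_advantage(credence_edt, n_copies)
    return "one-box" if advantage > 0 else "two-box"
```

Under these assumptions, with a credence in EDT of one in a million, `recommended_action(1e-6, 1)` favors two-boxing while `recommended_action(1e-6, 10**6)` favors one-boxing: the stakes under EDT grow with the number of copies, whereas the stakes under CDT stay fixed.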
12.
Caspar Oesterheld
Foundational Research Institute
Decision theory and approval-directed agents
13.
Implementing decision theories in AIs
• Two problems of decision theory in AI safety:
  • What is the right decision theory for an AI?
  • How do we implement decision theories in AI?
• Decision theory not explicit in AI architecture
  • Example: Doing what has worked well in the past (Oesterheld 2017b)
  • Exception: Gödel machine (Schmidhuber 2006)
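As a toy illustration of the "doing what has worked well in the past" example, the sketch below lets a learner with no explicit decision theory repeatedly face Newcomb's problem and pick whichever action has the higher average past reward. The setup (payoffs, predictor accuracy, the `AverageRewardLearner` class) is hypothetical and chosen for illustration; it is not taken from Oesterheld (2017b).

```python
import random


class AverageRewardLearner:
    """Picks the action whose past rewards have the highest mean."""

    def __init__(self, actions):
        self.totals = {a: 0.0 for a in actions}
        self.counts = {a: 0 for a in actions}

    def choose(self):
        # Try every action once before relying on past averages.
        for action, count in self.counts.items():
            if count == 0:
                return action
        return max(self.totals, key=lambda a: self.totals[a] / self.counts[a])

    def update(self, action, reward):
        self.totals[action] += reward
        self.counts[action] += 1


def newcomb_reward(action, rng, accuracy=1.0, small=1_000, big=1_000_000):
    """Payoff when the predictor guesses the action with probability `accuracy`."""
    predicted_correctly = rng.random() < accuracy
    predicted_one_box = (action == "one-box") == predicted_correctly
    opaque_box = big if predicted_one_box else 0
    return opaque_box + (small if action == "two-box" else 0)


def train(rounds=20, seed=0):
    """Run the learner on repeated Newcomb problems and return it."""
    rng = random.Random(seed)
    learner = AverageRewardLearner(["two-box", "one-box"])
    for _ in range(rounds):
        action = learner.choose()
        learner.update(action, newcomb_reward(action, rng))
    return learner
```

With the default perfect predictor, one-boxing has the higher average reward once both actions have been tried, so `train().choose()` settles on one-boxing, the choice EDT recommends in Newcomb's problem. No decision theory appears anywhere in the code.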
20.
In the paper…
If the overseer only looks at the world, the agent’s decision theory (DT) is decisive.
If the overseer only looks at the agent’s action, the overseer’s DT is decisive.
21.
Thank you.
{johannes,caspar}@foundational-research.org
22.
References
• Ahmed, A. (2014): Evidence, Decision and Causality. Cambridge University Press.
• Almond, P. (2010): On Causation and Correlation. Part 2: Implications of Evidential Decision Theory. https://casparoesterheld.files.wordpress.com/2017/03/correlation2.pdf
• Bostrom, N. (2014b): Superintelligence: Paths, Dangers, Strategies. Oxford University Press.
• Christiano, P. (2014): Model-free decisions. https://ai-alignment.com/model-free-decisions-6e6609f5d99e
• MacAskill, W. (2016): Smokers, Psychos, and Decision-Theoretic Uncertainty. The Journal of Philosophy.
• Nozick, R. (1993): The Nature of Rationality. Princeton: Princeton University Press.
23.
References
• Oesterheld, C. (2017a): Multiverse-wide Cooperation via Correlated Decision Making. https://foundational-research.org/files/Multiverse-wide-Cooperation-via-Correlated-Decision-Making.pdf
• Oesterheld, C. (2017b): Doing what has worked well in the past leads to evidential decision theory. https://casparoesterheld.files.wordpress.com/2017/09/learningdt.pdf
• Schmidhuber, J. (2006): Gödel Machines: Self-Referential Universal Problem Solvers Making Provably Optimal Self-Improvements. ftp://ftp.idsia.ch/pub/juergen/gm6.pdf
• Soares, N. and Fallenstein, B. (2014a): Aligning Superintelligence with Human Interests: A Technical Research Agenda. MIRI Tech. rep. 2014-8. https://intelligence.org/files/TechnicalAgenda.pdf
• Soares, N. and Fallenstein, B. (2014b): Toward Idealized Decision Theory. MIRI Tech. rep. 2014-7. https://arxiv.org/abs/1507.01986
• Soares, N. and Levinstein, B. (2017): Cheating Death in Damascus. https://intelligence.org/files/DeathInDamascus.pdf