
The map of "Levels of defence" in AI safety



One of the main principles of engineering safety is multilevel defence. When a nuclear bomb accidentally fell from the sky in the US, 3 of its 4 defence levels failed. The last one prevented a nuclear explosion.

Multilevel defence is widely used in the nuclear industry and includes different systems of passive and active safety, ranging from the use of delayed neutrons for reaction activation to control rods, containment buildings and exclusion zones.

Here, I present a look at AI safety from the point of view of multilevel defence. This is mainly based on two of my as yet unpublished articles: “Global and local solutions to AI safety” and “Catching treacherous turn: multilevel AI containment system”.

The special property of multilevel defence in the case of AI is that most of the protection comes from the first level alone, which is AI alignment. The other levels have progressively smaller chances of providing any protection, because the power of a self-improving AI will grow as it breaks through each successive level. So we might ignore all levels after AI alignment, but, oh Houston, we have a problem: based on the current speed of AI development, it seems that powerful and dangerous AI could appear within several years, while AI safety theory needs several decades to be created.

The map is intended to demonstrate a general principle for classifying the defence levels in AI safety, not to list all known ideas on the topic. I have marked in yellow the boxes which, according to my understanding, are part of MIRI's plan.

I have also added my personal probability estimates of whether each level will work (under the condition that AI risk is the only global risk and that the previous levels have failed).
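As a rough illustration of how such conditional estimates combine, here is a minimal Python sketch. It assumes each figure is the chance that a level works given that all earlier levels have failed, so the chance that at least one level holds is one minus the product of the individual failure chances. The numbers are the ones printed in the map's text for levels 1 to 4 (level 0 has no figure there, and the 10 percent carries a question mark); the function and variable names are hypothetical and chosen only for this example.

```python
from typing import Sequence


def chance_any_level_holds(conditional_success: Sequence[float]) -> float:
    """Chance that at least one defence level works.

    Each entry is assumed to be the probability that a level works
    given that every earlier level has already failed.
    """
    p_all_fail = 1.0
    for p in conditional_success:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"not a probability: {p}")
        p_all_fail *= 1.0 - p  # this level fails too
    return 1.0 - p_all_fail


# Estimates as printed on the map (assumption: read as conditional chances).
LEVEL_ESTIMATES = {
    "1 From first AI to Benevolent Singleton": 0.10,
    "2 Catching treacherous turn": 0.01,
    "3 AI boxing": 0.01,
    "4 Stopping AI in the wild": 0.01,
}

total = chance_any_level_holds(list(LEVEL_ESTIMATES.values()))
print(f"{total:.3f}")
```

Under these assumptions the combined chance is about 13 percent, almost all of it coming from the first level, which matches the point that later levels add little protection.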

The principles behind the construction of the map are similar to those of my “plan of x-risks prevention” map and my “immortality map”, which are also based on the idea of multilevel defence.

Published in: Science

The map of "Levels of defence" in AI safety

  1. Coherent extrapolated volition Preparation Prevention of the Strong AI creation Limiting AI scale and impact “Levels of defense” in AI safety Best team gets best AI safety theory Installing minimal set of goals Solving “other AIs problem” Testing Instruments to affect mature AI Hopes to survive AI Messaging to future Young AI Humans in simulation Humans as slaves AI will abandon Earth Taking over the world using first AI as a weapon Creating AI police Circuit-breakers Destruction of AI labs everywhere Restriction by law Promotion of AI safety idea Manipulating AI arms race (GoodAI) Funding of the AI safety research Attracting best minds Solving many AI alignment subproblems Arbital Human emulation (Hanson) Human self-improving (Musk) Vague values AI “constitution” Red button and other internal constraints Creating many AIs for balancing? 0 Preliminary measures 1 From first AI to Benevolent Singleton 10 percent? 4 Stopping AI in the wild 1 percent 2 Catching treacherous turn 1 percent Control systems Global solutions Local solutions AI safety guidelines and mindset Slowing AI development MIRI strategy (as I understand it and based on link provided by Bensinger, 2017) marked in yellow.
Pale yellow marks desirable things which are not in the MIRI strategy Testing in simulation Safely interruptible agents; Corrigibility Testing in small tasks Narrow AI to control AI Information limiting Disconnection from internet Physically remote place Utility monsters and “landmines” 3 AI boxing 1 percent Levels Created by Alexey Turchin, 2017, CC4.0 See also my other maps on the topic: “AI failures modes” “AI safety solution” and articles “Message to any future AI” “Catching treacherous turn” “Global and local solutions of AI safety” “Human upload based AI Nanny” Time EthicsNet dataset competition 8. “Steering toward alignment-conducive AGI approaches” 7a “Technological edge in AGI by a strategically adequate project” 7b “White-boxed system understanding” of AI by main players Creation of AI safety theory First AI creation 6-5 4 “Construction of minimal aligned AGI for ending the acute risk period” Seed AI, able to initial self-improvement Cooperation Distributed AI Global Benevolent Singleton AI 3. End to the acute risk period 2. A stable period... to reach good conclusions about long-run outcomes. 1. Long-run good outcomes Corrigibility First human level AI with controllable goal system First AI gets decisive strategic advantage Self-improvement Large funding International project (AI CERN) Oracle AI War against AI (Nuke grid) Refuges Start a second copy of AI