Y Conf talk - Andrej Karpathy

  1. Where will AGI come from? Y Conf, June 10, 2017 andrej @karpathy
  2. “Deep Learning” search popularity since 2012 (chart). 2012+ image recognition, 2010+ speech recognition, 2014+ machine translation, etc.
  3. (from @ML_Hipster)
  4. CS231n: Convolutional Neural Networks for Visual Recognition (Stanford Class) 2015: 150 students 2016: 330 students 2017: 750 students 2018: ??? (max students per class is capped at 999)
  5. The Current State of Machine Intelligence 3.0 [Shivon Zilis]
  6. In popular media...
  7. Two comments: 1. AI today is still very narrow. 2. But thanks to Deep Learning, we can repurpose solution components faster.
  8. Example: AlphaGo (see my Medium post “AlphaGo, in context”)
  9. Convenient properties of Go: 1. Deterministic. No noise in the game. 2. Fully observed. Each player has complete information. 3. Discrete action space. Finite number of actions possible. 4. Perfect simulator. The effect of any action is known exactly. 5. Short episodes. ~200 actions per game. 6. Clear + fast evaluation. According to Go rules. 7. Huge dataset available. Human vs human games.
  10. Q: “Can we run AlphaGo on a robot for the Amazon Picking Challenge?”
  11. Q: “Can we run AlphaGo on a robot for the Amazon Picking Challenge?” A:
  12. (The same seven properties of Go, reconsidered for the robot setting.)
  13. 1. Deterministic: OK. 2. Fully observed: OKish. 3. Discrete action space: OK. 4. Perfect simulator: TROUBLE. 5. Short episodes: challenge. 6. Clear + fast evaluation: challenge. 7. Huge dataset available: not good.
  14. Summary so far: 1. Huge increase in interest in AI. 2. AI is still narrow. 3. AI tech works in some cases and can be repurposed much faster.
  15. “What if we succeed in making it not narrow?” Nick Bostrom Stephen Hawking Bill Gates Elon Musk Sam Altman Stuart Russell Eliezer Yudkowsky ... ~2014+
  16. Normal hype cycle
  17. AI is different.
  18. Meanwhile, in Academia... “AGI imminent.” “Oh no, AI winter imminent. My funding is about to dry up again.”
  19. Talk Outline: - Supervised learning - “it works, just scale up!” - Unsupervised learning - “it will work, if we only scale up!” - AIXI - “guys, I can write down optimal AI.” - Brain simulation - “this will work one day, right?” - Artificial Life - “just do what nature did.” - Something not on our radar Where could AGI come from?
  20. Talk Outline again, starting with the first option: Supervised learning - “it works, just scale up!”
  21. Supervised Learning: Collect lots of labeled data, train a neural network on it.
  22. How do we get labels of intelligent behavior?
  23. Short Story on AI: A Cognitive Discontinuity. Nov 14, 2015 see: link
  24. Amazon Mechanical Turk CORE IDEA: collect data from humans, then train a big Neural Net to mimic what humans do.
  25. Amazon Mechanical Turk++ (SSH): lots of training data.
  26. Big Neural Network. STATE: vision, audio, joint positions/velocities, TASK description. ACTION: joint torques, etc. LABEL: the ACTION taken by the human. OBJECTIVE: make these equal.
  27. Amazon Mechanical Turk++, Step 2: autonomy. Big Neural Network. STATE: vision, audio, joint positions/velocities, TASK description. ACTION: joint torques, etc.
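To make slides 24-27 concrete, here is a minimal behavior-cloning sketch (my illustration, not code from the talk); the dimensions, architecture, and the random stand-in data are all assumptions:

```python
# Behavior cloning as in slides 24-27: regress from the robot's STATE to the
# ACTION a human took in that state. All sizes below are illustrative.
import torch
import torch.nn as nn

STATE_DIM = 128   # vision + audio features, joint positions/velocities, task encoding
ACTION_DIM = 7    # e.g. joint torques

policy = nn.Sequential(
    nn.Linear(STATE_DIM, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, ACTION_DIM),
)
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

# Stand-in for data collected from human operators (e.g. Mechanical Turk++).
states = torch.randn(1024, STATE_DIM)
human_actions = torch.randn(1024, ACTION_DIM)

for step in range(100):
    pred = policy(states)
    loss = nn.functional.mse_loss(pred, human_actions)  # OBJECTIVE: make these equal
    opt.zero_grad()
    loss.backward()
    opt.step()

# Step 2, autonomy: at deployment, feed live STATE in and execute the
# network's predicted ACTION directly, with no human in the loop.
```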
  28. What would this AI look like?
  29. Possible hint: char-rnn. “The cat sat on a ma_?”
  30. Possible hint: char-rnn. Big Neural Network. STATE: previous characters. TASK: none. ACTION: next character. LABEL: next character typed by the human. OBJECTIVE: make these equal.
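A tiny runnable version of this setup (a sketch under my own assumptions about sizes and data, not Karpathy's char-rnn code): previous characters in, a distribution over the next character out, trained to match the character the human actually wrote.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

text = "The cat sat on a mat. " * 50          # stand-in corpus
chars = sorted(set(text))
stoi = {c: i for i, c in enumerate(chars)}
data = torch.tensor([stoi[c] for c in text])

V, H = len(chars), 64
embed = nn.Embedding(V, 32)                    # character -> vector
rnn = nn.LSTM(32, H, batch_first=True)         # STATE: previous characters
head = nn.Linear(H, V)                         # ACTION: scores for the next character

params = list(embed.parameters()) + list(rnn.parameters()) + list(head.parameters())
opt = torch.optim.Adam(params, lr=1e-2)

x = data[:-1].unsqueeze(0)                     # previous characters
y = data[1:].unsqueeze(0)                      # LABEL: next character by the human
for step in range(200):
    hidden, _ = rnn(embed(x))
    logits = head(hidden)
    loss = F.cross_entropy(logits.reshape(-1, V), y.reshape(-1))  # make these equal
    opt.zero_grad(); loss.backward(); opt.step()
```

Sampling from checkpoints of such a model reproduces the progression on the next slides: gibberish at first, then increasingly plausible text.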
  31. Generated text from the model, at first.
  32. At first → after training for a bit.
  33. At first → after training for a bit → after training more → after training more.
  34. Trained on the LaTeX source of an open-source textbook on algebraic geometry.
  35. The low-level gestalt is right, but the high-level, long-term structure is missing. This is mitigated with more data / larger models.
  36. AIs in this approach… - Imitate/generate human-like actions. - Can these AIs be creative? - Can they assemble a room of chairs/tables? - Can they make human domination schemes?
  37. Answers: Can these AIs be creative? (Kind of.) Can they assemble a room of chairs/tables? (Yes.) Can they make human domination schemes? (No.)
  38. Talk Outline again, moving on to: Unsupervised learning - “it will work, if we only scale up!”
  39. Unsupervised Learning: Big generative models. 1. Initialize a Big Neural Network 2. Train it to compress a huge amount of data on the internet 3. ??? 4. Profit
  40. Example 2: (variational) autoencoders. An identity function with an information bottleneck of 30 numbers (the data must be compressed to 30 numbers to be reconstructed later). Also see: autoregressive models, Generative Adversarial Networks, etc.
  41. Example 2, continued: meddle with the code, then “decode” it back to an image.
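A minimal sketch of that bottleneck idea (illustrative shapes and sizes, not the talk's model): compress each input to 30 numbers, reconstruct, then "meddle with the code" and decode.

```python
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 30))
decoder = nn.Sequential(nn.Linear(30, 256), nn.ReLU(), nn.Linear(256, 784))
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)

x = torch.rand(256, 784)              # stand-in for a batch of flattened images
for step in range(100):
    code = encoder(x)                 # information bottleneck: 30 numbers per image
    recon = decoder(code)             # reconstruct the input from those 30 numbers
    loss = nn.functional.mse_loss(recon, x)    # identity-function objective
    opt.zero_grad(); loss.backward(); opt.step()

# Meddle with the code, then "decode" to an image.
meddled = decoder(encoder(x[:1]) + 0.1 * torch.randn(1, 30))
```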
  42. Work at OpenAI: “Unsupervised Sentiment Neuron” (Alec Radford et al.) Another example: 1. Train a large char-rnn on a large corpus of unlabeled reviews from Amazon 2. One of the neurons automagically “discovers” a small sentiment classifier (this high-level feature must help predict the next character) (char-rnn also optimizes compression of data; prediction and compression are closely linked.)
  43. Basic idea: all of the internet → Big Neural Network + compression objective.
  44. What would this AI look like? - The neural network has a powerful “brain state”: - Given any input data, we could get e.g. 10,000 numbers representing the network's “thoughts” about the data. - Given any vector of 10,000 numbers, we could maybe ask the network to generate samples of data that correspond to it. - Does it want to take over the world? (No; it has no agency, no planning, etc.)
  45. Talk Outline again, moving on to: AIXI - “guys, I can write down optimal AI.”
  46. AIXI - Algorithmic information theory applied to general artificial intelligence. (Marcus Hutter) - Allows for a formal definition of “Universal Intelligence” (Shane Legg) - Bayesian Reinforcement Learning agent over the hypothesis space of all Turing machines.
  47. System identification: which Turing machine am I in? If I knew, I could plan perfectly. Prior probability over Turing machines: “simpler worlds” are more likely. Likelihood: which TMs are consistent with my experience so far? Multiply the two to get a posterior.
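A toy numeric version of that picture (purely illustrative; real AIXI mixes over all Turing machines, which is incomputable): a handful of hypothetical "worlds", a simplicity prior of 2^-length, and a likelihood computed from observations.

```python
# Bayesian "system identification" over a tiny hypothesis class.
# Hypothetical worlds: (description_length_in_bits, P(observe "1") per step).
worlds = [(2, 0.9), (3, 0.5), (5, 0.1), (8, 0.99)]

prior = [2.0 ** -length for length, _ in worlds]    # simpler worlds are more likely

observations = [1, 1, 0, 1]
posterior = []
for (length, p_one), pr in zip(worlds, prior):
    likelihood = 1.0
    for o in observations:                          # which worlds are consistent
        likelihood *= p_one if o == 1 else 1 - p_one  # with my experience so far?
    posterior.append(pr * likelihood)               # multiply to get a posterior

z = sum(posterior)
posterior = [p / z for p in posterior]
print(posterior)   # mass concentrates on simple worlds that explain the data
```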
  48. We can write down the optimal agent’s action at time t (from http://www.vetta.org/documents/Machine_Super_Intelligence.pdf):

  $a_t = \arg\max_{a_t} \sum_{o_t r_t} \cdots \max_{a_m} \sum_{o_m r_m} (r_t + \cdots + r_m) \sum_{q \,:\, U(q, a_1 \ldots a_m) = o_1 r_1 \ldots o_m r_m} 2^{-\ell(q)}$

  49. Reading the equation: the interaction history $a_1 o_1 r_1 \ldots$ up to time t is everything experienced so far; m is the horizon; the nested sums and maxes range over all possible future action-state sequences; $\ell(q)$ is the description length of Turing machine q in bits. The whole thing is a weighted average of the total discounted reward across all possible Turing machines, where each machine's weight is [prior] x [likelihood].
  50. There are just a few problems... (every nested sum and max over possible futures is intractable, and the sum over all Turing machines especially so!)
  51. Attempts have been made... I like “A Monte-Carlo AIXI Approximation” from Veness et al. 2011, https://www.aaai.org/Papers/JAIR/Vol40/JAIR-4004.pdf
  52. What would this agent look like? - We need to feed it a reward signal. Might be very hard to write down. Might lead to “perverse instantiations” (e.g. paper clip maximizers etc.) - Or maybe humans have a dial that gives the reward. But its actions might not be fully observable to humans. - Very computationally intractable. Also, people are really not good at writing complex code. (e.g. for “AIXI approximation”). - This agent could be quite scary. Definitely has agency.
  53. Talk Outline again, moving on to: Brain simulation - “this will work one day, right?”
  54. Brain simulation: BRAIN Initiative, Human Brain Project, optogenetics, multi-electrode arrays, connectomics, Neuralink, ...
  55. Brain simulation - How to measure a complete brain state? - At what level of abstraction? - How to model the dynamics? - How do you simulate the “environment” to feed into senses? - Various ethical dilemmas - Timescale-bearish neuroscientists.
  56. Talk Outline again, moving on to: Artificial Life - “just do what nature did.”
  57. How did intelligence arise in nature?
  58. We don’t have to redo 4B years of evolution. - Work at a higher level of abstraction. We don’t have to simulate chemistry etc. to get intelligent networks. - Intelligent design. We can meddle with the system and initialize with RL agents, etc.
  59. Intelligence is the ability to win, in the face of world dynamics and a changing population of other intelligent agents with similar goals.
  60. Intelligence: the “cognitive toolkit” includes, but is not limited to:
  ● attention: the at-will ability to selectively “filter out” parts of the input that are judged not relevant to the current top-down goal, e.g. the “cocktail party effect”.
  ● working memory: structures/processes that temporarily store and manipulate information (7 +/- 2 items). Related: the phonological loop, a special part of working memory dedicated to storing a few seconds of sound (e.g. when you repeat a 7-digit phone number in your mind to keep it in memory); also the visuospatial sketchpad and an episodic buffer.
  ● long-term memory, of quite a few suspected different types: procedural memory (e.g. driving a car), semantic memory (e.g. the name of the current President), episodic memory (for autobiographical sequences of events, e.g. where one was during 9/11).
  ● knowledge representation: the ability to rapidly learn and incorporate facts into some “world model” that can be inferred over in what looks to be approximately Bayesian ways; the ability to detect and resolve contradictions, or propose experiments that disambiguate cases; the ability to keep track of what source provided a piece of information and later down-weight its confidence if the source is suddenly judged untrustworthy.
  ● spatial reasoning: some crude “game engine” model of a scene, its objects, and their attributes, with all the complex built-in biases that only get properly revealed by optical illusions. Spatial memory: cells in the brain that keep track of the connectivity of the world and do something like automatic “SLAM”, fusing information from different senses to position the brain in the world.
  ● reasoning by analogy, e.g. applying a proverb such as “that’s locking the barn door after the horse has gone” to a current situation.
  ● emotions: heuristics that make our genes more likely to spread, e.g. frustration.
  ● a forward simulator, which lets us roll forward and consider abstract events and situations.
  ● various skill-acquisition heuristics: practicing something repeatedly, including the abstract ideas of “resetting” an experiment, deciding when an experiment is finished, and judging its outcomes; the heuristic inclination toward “fun”, experimentation, and curiosity; the heuristic of empowerment, i.e. that it is better to take actions that leave more options available in the future.
  ● consciousness / theory of mind: the understanding that other agents are like me but also slightly different in unknown ways; empathy (e.g. the cringy feeling when seeing someone else get hurt); imitation learning, i.e. the heuristic of paying attention to and later repeating what other agents are doing.
  61. Conclusion: we need to create environments that incentivize the emergence of a cognitive toolkit.
  62. Conclusion: we need to create environments that incentivize the emergence of a cognitive toolkit. Doing it wrong: the environment incentivizes a lookup table of correct moves.
  63. Conclusion: we need to create environments that incentivize the emergence of a cognitive toolkit. Doing it wrong: incentivizes a lookup table of correct moves. Doing it right: incentivizes cognitive tools.
  64. Benefits of multi-agent environments: - variety - the environment is parameterized by its agent population, so an optimal strategy must be dynamically derived, and cannot be statically “baked” as behaviors / reflexes into a network. - natural curriculum - the difficulty of the environment is determined by the skill of the other agents.
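A toy illustration of that non-stationarity (my sketch, not from the talk): in a rock-paper-scissors population, no fixed lookup table of moves stays optimal, because what wins depends on what the current population plays.

```python
import random

N = 8
population = [[1.0, 1.0, 1.0] for _ in range(N)]    # per-agent action weights (R, P, S)

def act(strategy):
    return random.choices(range(3), weights=strategy)[0]

for generation in range(1000):
    i, j = random.sample(range(N), 2)
    ai, aj = act(population[i]), act(population[j])
    if ai != aj:
        # (ai - aj) % 3 == 1 means ai beats aj (paper>rock, scissors>paper, rock>scissors)
        winner, action = (i, ai) if (ai - aj) % 3 == 1 else (j, aj)
        population[winner][action] += 0.1           # reinforce what just worked

# The "environment" each agent faces is the rest of the population, so the
# target keeps moving: a natural curriculum with no statically "baked" policy.
```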
  65. Why? Trends. Q: What about the optimization? A: Optimize over the whole thing: the architecture, the initialization, the learning rule. Write very little (or no) explicit code. (Example: a small TensorFlow graph.)
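The "(example: a small TensorFlow graph)" image isn't in this transcript; below is a generic small 2017-era TensorFlow (1.x) graph of the kind presumably shown. Today a human writes this structure as explicit code; the projection is that the graph itself becomes the thing being searched over.

```python
import tensorflow as tf   # TensorFlow 1.x style, as in 2017

# A small, explicitly hand-coded graph: y = relu(x @ w + b).
x = tf.placeholder(tf.float32, shape=[None, 4], name="x")
w = tf.Variable(tf.random_normal([4, 2]), name="w")
b = tf.Variable(tf.zeros([2]), name="b")
y = tf.nn.relu(tf.matmul(x, w) + b, name="y")

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(y, feed_dict={x: [[1.0, 2.0, 3.0, 4.0]]}))
```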
  66. In Computer Vision... (a chart of datasets vs. models: the x-axis is datasets and how large they are, the y-axis is models and how well they work; a “possibility frontier” advances over time, beyond which lies the zone of “not going to happen”)
  - 70s-90s: Hard-coded (edge detection etc., no learning), on Lena (10^0; a single image).
  - 90s-2012: Image features (SIFT etc., learning linear classifiers on top), on Caltech 101 (~10^4 images) and Pascal VOC (~10^5 images).
  - 2013: ConvNets (learn the features, structure hard-coded), on ImageNet (~10^6 images).
  - 2017: ConvNets on Google/FB images on the web (~10^9+ images).
  - Projection: CodeGen (learn the weights and the structure).
  67. In Reinforcement Learning (the same kind of chart: the x-axis is environments and how much they measure/incentivize general intelligence, the y-axis is agents and how impressive they are; the frontier advances with more learning and more compute, and beyond it lies the zone of “not going to happen”)
  - 70s-90s: Hard-coded (LISP programs, no learning), on BlocksWorld (SHRDLU etc.).
  - 90s-2012: Value Iteration etc. (~discrete MDPs, linear function approximators), on Cartpole etc. (and bandits, gridworld, ... a few toy tasks).
  - 2013: DQN, PG (deep nets, various hard-coded tricks), on MuJoCo/ATARI/Universe (~a few dozen envs).
  - 2017: RL^2 (learn the RL algorithm, structure fixed), on simple multi-agent envs.
  - Projection: CodeGen (learn the structure and the learning algorithm), on digital worlds (complex multi-agent envs) and eventually reality; environments become more multi-agent, non-stationary, and real-world-like.
  68. With increasing computational resources, the trend is towards more learning/optimization and less explicit design. 1970: one of many explicit (LISP) programs that made up SHRDLU. ~50 years later: “Neural Architecture Search with Reinforcement Learning” (Zoph & Le) and “Large-Scale Evolution of Image Classifiers”.
  69. “Learning to Cooperate, Compete, and Communicate” OpenAI blog post, 2017 - 4 red agents cooperate to chase 2 green agents - 2 green agents want to reach blue “water”
  70. What would this look like? - Achieve completely uninterpretable “proto-AIs” first, similar to simple animals, but with fairly complete cognitive toolkits. - Evolved AIs are a synthetic species that lives among us. - We will shape them to love humans, similar to how we shaped dogs. - “AI safety” will become a primarily empirical discipline, not a mathematical one as it is today. - Some might try to evolve bad AIs, the equivalent of combat dogs. - We might have to make it illegal to evolve AI strains, or upper-bound the amount of computation per person and closely track all computational resources on Earth.
  71. Talk Outline again, moving on to: Something not on our radar.
  72. + Data from very large VR MMORPG worlds?
  73. Combination of some of the above? - E.g. take the artificial life approach, but allow agents to access the high-level representations of a big, pre-trained generative model.
  74. In order of promisingness: - Artificial Life - “just do what nature did.” - Something not on our radar - Supervised learning - “it works, just scale up!” - Unsupervised learning - “it will work, if we only scale up!” - AIXI - “guys, I can write down optimal AI.” - Brain simulation - “this will work one day, right?” Conclusion
  75. What do you think? (Thank you!) SL UL AIXI BrainSim ALife Other http://bit.ly/2r54rfe
  76. Cool Related Pointers. Sebastian’s post, which inspired the title of this talk: http://www.nowozin.net/sebastian/blog/where-will-artificial-intelligence-come-from.html Rodney Brooks’ paper “Intelligence Without Representation”: https://www.researchgate.net/publication/222486990_Intelligence_Without_Representation