- 1. 6/5/2019 人工知能学会 PFN Fellow 丸山宏 Twitter: @maruyama 「人工知能」をどのように読み解くか How to Understand “Artificial Intelligence”?
- 2. Agenda 1. 「人工知能」とは何か 2. 新しい計算モデル 3. 科学・工学への展開 In this talk, I first point out that the word “AI” is used in many different ways, which causes confusion. Then I focus on “AI as the frontier of computer science” and argue that a new computation model is emerging. Finally, I discuss how this new computation model affects science and engineering.
- 3. (1) 「知性の探求」としての人工知能 神経科学・脳科学（神経細胞・脳の活動を観測） 心理学（人の行動を観測） 経済学（合理的エージェントとしてモデル化 - ゲーム理論・最適化） ロボティクス・機械工学（制御システムとしてモデル化 - 微分方程式・制御理論） コンピュータ科学（人工知能）（ソフトウェアとしてモデル化） The term “AI,” or Artificial Intelligence, is used with different meanings in different contexts. There are at least three interpretations. First, AI is a research field that studies “intelligence.” There are several approaches to intelligence: neuroscience tries to understand it by observing neural activity in the brain, psychology by observing human behavior, economics by modeling humans as independent rational agents, robotics by using differential equations and control theory to understand human motion, and we computer scientists by building software that mimics human intelligence.
- 4. 「人工知能(AI)」とは学問領域だが、、 学問領域 → 派生した技術 → 応用製品: 物理学 → 内燃機関・半導体 → 自動車・コンピュータ（これを物理とは呼ばない） 人工知能(AI) → 探索アルゴリズム・音声認識・画像認識 → 乗換案内・AIスピーカー・自動運転（「AI」と呼ばれることも…） AI is a research field just like physics, but there is one important difference. In physics, there are technologies derived from physical theories and products based on those technologies, but they are not called “physics.” In AI, on the other hand, the derived technologies and products are sometimes themselves called “AI,” and THAT is the source of confusion.
- 5. (2) ハイプとしての「AI」 • AIに聞いてみた- 第4回超未婚社会 4/13 – 社会問題解決型AI「AIひろし」 • 保育所の入所はAIが決める 数秒で選考可能に | NHKニュース 2/12 – 富士通のマッチング技術 • “AIに負けない”人材を育成せよ ～企業・教育最前線～ 4/25 – 「自分たちの仕事がAIに奪われる」 • 車いすの天才ホーキング博士の遺言 3/16 – 「ＡＩは自らの意志を持つようになり、人間と対立するだろう。ＡＩの到来は、人類史上、最善の出来事になるか、または最悪の出来事になるだろう。」 • マリオ～AIのゆくえ～ 4/5 – 「死」を理解できないAI人間と、「生」に希望が持てない少年の交流の果てには、どんな結末が待っているのか。 データアナリティクス・最適化／機械による自動化／汎用AI（まだ見ぬ技術） That leads to the second interpretation: “AI as something sophisticated.” Here are some examples of recent NHK coverage. In the program “Ask AI,” statistical analysis of social data is used to draw recommendations on social issues. The news on “nursery matching AI” refers to a combinatorial optimization problem. Discussions of job security threatened by AI are actually about automation. The program on Dr. Hawking’s last will and the drama “Mario the AI” deal with hypothetical technologies that have never been realized. I call this interpretation “AI as hype” because, in this use, people are little concerned with what “AI” actually is. The speaker means one thing, the listener may understand something completely different, and neither bothers to check. When a speaker uses “AI” to intentionally confuse the listener, tragedy follows.
- 6. This page is intentionally removed
- 7. This page is intentionally removed
- 8. (3) 「情報技術のフロンティア」としての人工知能 再帰呼び出しはAIの技術!? P. H. Winston, Artificial Intelligence ISBN-13: 978-0201084542, 1977 1970年代の「最先端」 Source: Wikipedia The third interpretation is “AI as the frontier of computer science (CS).” Many CS ideas came from AI research. This is one of the early textbooks on AI. It covers topics such as: - Symbolic programming (the LISP programming language and dynamic memory management, i.e., garbage collection) - Recursive calls (e.g., to solve the Tower of Hanoi puzzle) - Search algorithms (e.g., depth-first search, the A* algorithm, alpha-beta pruning, means-end analysis) - Syntax analysis All of these technologies are now part of core computer science, and people no longer call them “AI.”
- 9. Computer Scienceに取り込まれる「AI技術」 1956-1974 第1次人工知能ブーム • 記号処理 (LISP) • Means-End Analysis • 自然言語処理 → 再帰呼び出し、ガーベージコレクション、動的計画法、コンパイラ・コンパイラ… 1980-1987 第2次人工知能ブーム • 知識表現 (e.g. フレーム) • エキスパートシステム • オントロジー → オブジェクト指向、モデリング言語、セマンティックWeb、制約ソルバー… 2008- 第3次人工知能ブーム • 統計的機械学習・深層学習 • ブラックボックス最適化 → 新しい計算モデル There have been three waves of AI research. The first wave covered symbolic programming (LISP), means-end analysis, and natural language processing; these were incorporated into core CS as dynamic memory management (garbage collection), search algorithms, and formal language theory. The second wave covered knowledge representation (e.g., frame theory), expert systems, and ontologies; these were incorporated into core CS as object-oriented programming, modeling languages, and the Semantic Web.
- 10. So, if you look at what is happening in AI research, you can predict where CS is heading. In the current wave, the focus areas include statistical machine learning (especially deep learning) and blackbox optimization, and I believe these technologies will lead us to a new programming paradigm, a new computation model.
- 11. Agenda 1. 「人工知能」とは何か 2. 新しい計算モデル – 深層学習 – ブラックボックス最適化 3. 科学・工学への展開 I said that AI research advances computer science. Let me discuss the two technologies that are currently driving CS, deep learning and blackbox optimization, and how they are forming a new computation model.
- 12. 深層学習とは何か – (状態を持たない*)関数 Y = f(X) 入力X: 超多次元（連続変数・カテゴリ変数の任意の組み合わせ） 出力Y: 判別・制御ならば比較的低次元、生成ならば超多次元 *状態を入力Xの一部とすれば、関数そのものは状態をもたないものとみなせる First, deep learning. Very simply put, a deep neural net is a function: it takes X as input and produces Y as output. Usually the input is a very high-dimensional vector. For example, if the input is an image of 100x100 pixels, it is a vector with 10,000 variables. The output Y can be simple: if the function classifies the input image as either a cat or an airplane, the dimension of Y is 2. If the output is also an image, the dimension is much higher.
- 13. 普通の(演繹的な)関数の作り方: 例 摂氏から華氏への変換 double c2f(double c) { return 1.8*c + 32.0; } 要求: 入力 C、出力 F（FはCを華氏で表したもの） モデル: F = 1.8 * C + 32（人が持つ先験的知識） アルゴリズム モデルが既知／計算機構（アルゴリズム）が構成可能である必要 Let’s consider programming a very simple function: converting Celsius to Fahrenheit. In conventional programming, which I call “deductive programming,” we start from the requirement, i.e., what the input and output are. Then we convert it into a “model,” by which I mean a precise mathematical relationship between input and output. Here we use our prior knowledge of the relationship between Celsius and Fahrenheit and express it as a formula. Once you have the model, you can build an algorithm to compute it. Note that deductive programming assumes the model is known and the algorithm is constructible.
- 14. 深層学習の(帰納的な)やり方 – 訓練データで入出力を例示 訓練データセット 観測 訓練（ほぼ自動でパラメタθを決定） モデル／アルゴリズムが未知でもよい! With deep learning, on the other hand, programming is done by giving examples, which I call inductive programming. In our case, we first obtain two thermometers, one in Celsius and the other in Fahrenheit, and read their measurements from time to time. This yields input-output examples, which we call a training data set. Once the training data set is ready, we train a deep neural net semi-automatically. Note that inductive programming is possible even when we do not know the model or the algorithm. This opens opportunities to solve very interesting real-world problems.
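A toy sketch of this inductive style (my illustration, not the speaker's actual code): the Celsius-to-Fahrenheit function from the previous slide can be recovered purely from input-output examples. A minimal linear model trained by gradient descent stands in for a deep neural net; the principle is the same, only the number of parameters differs.

```python
# Inductive "programming" of c2f: instead of writing F = 1.8*C + 32,
# we fit the parameters from paired thermometer readings alone.
# (Illustrative stand-in for a DNN: a two-parameter linear model.)

# Training data: simulated paired readings (Celsius, Fahrenheit).
examples = [(c, 1.8 * c + 32.0) for c in range(-40, 101, 10)]

w, b = 0.0, 0.0         # unknown parameters theta, to be learned
lr = 3e-4               # learning rate (assumed, tuned for this data range)
for _ in range(50000):  # plain gradient descent on mean squared error
    gw = gb = 0.0
    for c, f in examples:
        err = (w * c + b) - f
        gw += 2 * err * c
        gb += 2 * err
    w -= lr * gw / len(examples)
    b -= lr * gb / len(examples)

def c2f_learned(c):
    """Approximates c2f without ever being told the formula."""
    return w * c + b

print(round(w, 2), round(b, 2))  # converges close to 1.8 and 32.0
```

The training loop never sees the model F = 1.8C + 32; it only sees examples, yet it recovers the same function, which is the whole point of the inductive approach.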
- 15. 自動運転のためのセグメンテーション https://www.youtube.com/watch?v=lGOjchGdVQs モデルが不明:人手による正解アノテーション Here are some real-world applications of inductive programming. The first is semantic segmentation for autonomous driving. The top left panel is the video input. For each pixel of the input, the function must decide which class the pixel belongs to, such as road surface (purple), sidewalk (pink), vehicle (blue), or tree (green). The output is shown in the bottom right panel. For this task it is very hard to define a proper model, because you cannot give precise mathematical conditions under which a particular pixel counts as “road surface.” Instead of a model, we supply examples. The bottom left panel shows a human-annotated image, in which annotators gave the correct answer for each pixel. Thousands of such human-annotated images are needed for high accuracy. Even though we do not know the model, we can prepare a training data set and obtain reasonably accurate results.
- 16. モデルが不明:人手による正解アノテーション The second example is a voice-controlled robot. In this demo, the operator says “Can you move the brown fluffy thing to the bottom?” to move the teddy bear. Note that top and bottom are from the operator’s perspective, as in the top view of the boxes shown on the left. Next, she says “Can you move the tissue box to the left?” and then uses another expression, “white and blue box,” to refer to the same object. Again, this system is very hard to program conventionally, because we cannot encode all possible natural language expressions. How did we train it? We showed the top view of the boxes to human annotators, asked “what would you say if you wanted to move this object there?”, and collected thousands of example sentences. With these sentences as the training data set, we trained the system.
- 17. 線画への自動着色アプリPaintsChainer アルゴリズムが不明:逆問題として定式化 The third example is a case where the model is known but the algorithm is unknown: an application that colorizes line drawings. The left image is the line-drawing input and the right image is the output. You can also give hints about which color to use in each region. How did we train this neural net? We collected many anime images from the Internet and applied a known algorithm, edge detection, to obtain the corresponding line drawings, then trained the net on many such pairs. Because the edge detection algorithm is well known, the mathematical relationship between a line drawing and its color image is known in a sense; that is, the model is known. However, how to compute a colored image from a line-drawing input is not known, meaning that the algorithm is unknown.
- 18. 汎用計算機構としての深層学習 • 桁違いに多いパラメタ – 任意の多次元非線形関数を近似* 疑似的にチューリング完全! 出力（超多次元） 入力（超多次元） * G. Cybenko. Approximations by superpositions of sigmoidal functions. Mathematics of Control, Signals, and Systems, 2(4):303–314, 1989. I said that a deep neural net is a function. How powerful is it? Is it powerful enough to represent any computable function? Essentially, yes. Because a deep neural net has a very large number of parameters, it is theoretically proven that for any continuous function there exists a sufficiently complex neural network that approximates it within a given error margin. In other words, a deep neural network is “pseudo Turing complete”: we have a universal computational mechanism that can be programmed with examples. This, I believe, is a significant breakthrough in computation.
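A tiny illustration of this expressive power (my toy example, not Cybenko's construction, which uses sigmoidal rather than ReLU units): even a two-unit network with fixed weights represents a nonlinear function, here |x|, exactly. Stacking more such units yields arbitrary piecewise-linear shapes, which is the intuition behind approximating any continuous function.

```python
# A two-unit ReLU "network" that computes the nonlinear function |x|
# exactly: abs(x) = relu(x) + relu(-x). Weights are hand-picked here;
# in practice they would be found by training.

def relu(z):
    return max(0.0, z)

def tiny_net(x):
    # Hidden layer: weights +1 and -1, no biases.
    h1 = relu(1.0 * x)
    h2 = relu(-1.0 * x)
    # Output layer: sum of both hidden units.
    return 1.0 * h1 + 1.0 * h2

print(tiny_net(-3.0), tiny_net(3.0))  # 3.0 3.0
```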
- 19. 統計的機械学習の本質的限界 (1) 訓練データ DNN 将来が過去と同じでないと正しく予測できない 時間軸 過去に観測されたデータに基づいて訓練 訓練済みモデルに基いて新しい値を予測 Even though deep learning is a very powerful computing mechanism, it is not an almighty intelligence. It is statistical modeling, and hence has fundamental limitations. I will explain three of them. The first is that a deep neural net (DNN) is trained on data observed in the past and then used to predict future events. Thus we always assume that the future is essentially the same as the past.
- 20. 統計的機械学習の本質的限界 (2) • 訓練データセットは有限 訓練データセット 内挿 外挿 ?? 訓練データに現れない、希少な事象に対して無力 The second limitation is this. Even if the future and the past look the same, the training data set is always finite. If the domain of the function is an infinite set, which is almost always the case, an input is in general something never seen in the training data. If the input is close to some of the training data points, the prediction is very accurate; this situation is called “interpolation.” If the input is far from any of the training data points, prediction is very hard; this is called “extrapolation.” So deep learning is powerful for frequent events but powerless for rare events that do not appear in the training data set. Customers often ask us to build an “AI” system that can handle emergency situations. As far as deep learning is concerned, that is impossible unless the training data set contains many “emergency” samples.
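The interpolation/extrapolation gap can be shown with the simplest statistical learner, 1-nearest-neighbour regression (my stand-in example; a DNN fails outside its data for the same underlying reason, a finite training set):

```python
# Finite training samples of y = x^2 on [0, 10].
train = [(x, x * x) for x in range(0, 11)]

def predict(x):
    # 1-nearest-neighbour: answer with the output of the closest example.
    nearest = min(train, key=lambda pair: abs(pair[0] - x))
    return nearest[1]

# Interpolation: query near the training data -> small error.
print(predict(3.4))    # 9 (true value 11.56)

# Extrapolation: query far outside the training data -> useless answer.
print(predict(100.0))  # 100 (true value 10000.0)
```

Inside the sampled range the prediction is roughly right; far outside it, the model can only repeat what it has seen, which is the "powerless for rare events" point above.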
- 21. 統計的機械学習の本質的限界 (3) • 本質的に確率的 元分布 独立・同分布(i.i.d.) 訓練データ 訓練済みDNN サンプリングにバイアスが入ることは避けられない! 「100%の正しさ」は保証できない The third, and probably the most serious, limitation is that deep learning is essentially probabilistic. Because deep learning is statistical modeling, we assume there is an original probability distribution, which is unknown, and that the training data set is created by random sampling from it, i.i.d. (independent and identically distributed). The DNN is then trained with a known training algorithm. The problem lies in that first random sampling step: a finite random sample always carries some bias. For example, if you throw a die a hundred times, what is the chance of getting ‘one’ on every throw? It is very unlikely, but there is no guarantee that it never happens. This means deep learning comes with no guarantee of correctness. There is no such thing as a 100% correct deep neural net.
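The sampling-bias point can be demonstrated in a few lines (my illustration, with arbitrary sample size and seed): a fair die has true probability 1/6 per face, but any finite i.i.d. sample deviates from that, and a model trained on the sample inherits the deviation.

```python
import random

random.seed(0)  # fixed seed so the "random" sample is reproducible
rolls = [random.randint(1, 6) for _ in range(601)]  # 601 i.i.d. fair-die throws
freq_one = rolls.count(1) / len(rolls)

# The empirical frequency is close to 1/6, but with 601 throws it can
# never equal 1/6 exactly (601 is not divisible by 6): finite sampling
# always carries some bias relative to the original distribution.
print(freq_one)
```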
- 22. 深層学習とは何か（まとめ） • 関数（プログラム）の作り方 – モデルやアルゴリズムがわからなくても、訓練データセットがあ れば作れる – 教師信号の与え方次第で、驚くようなことが… • 本質的に統計モデリング – 元分布が独立・同分布であることが前提 – 近似しかできない（バイアスが入ることを避けられない） Here is the recap of what deep learning is. Deep learning is a new programming model. It is inductive, not deductive, and the programming can be done without knowing the model or the algorithm. Innovative thinking on how to give teacher signals may solve many interesting problems. On the other hand, it is fundamentally a statistical modeling. It assumes that the original distribution is sampled i.i.d. There is always bias, and what you get is always an approximation. 21
- 23. Consumer Electronics Show (CES) 2016 CESにおける自動運転デモ、深層強化学習による訓練を行ったもの ブラックボックス最適化:強化学習による自動運転 Let me turn to the second technology that current AI research focuses on: blackbox optimization. This is an autonomous car demo at Consumer Electronics Show in 2016. The silver cars were trained using deep reinforcement learning. The red car was controlled manually by our vice president, and he sometimes hits or blocks other cars. But as you can see the silver cars can avoid collisions. 22
- 24. 強化学習は、報酬関数を最大化する最適化問題 X: センサ入力 Y: アクチュエータ出力 Y = f(X, θ) u(S, Y): 報酬関数（効用関数） S: 現在の状態 How was this system trained? We used an external simulator. The current situation in the simulator is supplied as a simulated sensor input X to the learning algorithm, which computes the actuator output Y so as to maximize the reward function, or utility function, u(S, Y). Thus, this is an optimization problem.
- 25. 伝統的な数理最適化との違い 伝統的な数理最適化 - 単体法（線形空間） - 内点法（凸空間） 効用関数の形が事前に知られている問題 ブラックボックス最適化 - 多腕バンディット問題 - 実験計画法 - 進化計算 - 強化学習 - ベイズ最適化 - 効用関数の形が事前に不明 - 外部オラクルを用いて効用関数の個別の値を求める x u(x) This type of optimization problem is quite different from traditional optimization in operations research, such as linear programming (the simplex method) and convex optimization (interior-point methods), where the utility function is known in advance. Blackbox optimization, such as reinforcement learning, assumes no prior knowledge of the internals of the utility function, hence “blackbox.” Instead, it learns about the utility function by asking an external “oracle” for the value u(x) at a given point x. The idea is not new: the multi-armed bandit problem and design of experiments can both be considered blackbox optimization. More recently, a technique called Bayesian optimization has attracted a lot of attention, especially for hyperparameter tuning in deep learning.
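A minimal sketch of the oracle-query loop (my toy example with an assumed utility function; bandits, Bayesian optimization, and reinforcement learning are smarter versions of this same loop): the optimizer never sees the formula for u, only individual values returned by the oracle.

```python
import random

def oracle(x):
    # Hidden from the optimizer. In a real setting this could be a
    # simulator run or a full deep-learning training job.
    return -(x - 3.0) ** 2 + 5.0

random.seed(0)
best_x, best_u = None, float("-inf")
for _ in range(1000):               # pure random search over [-10, 10]
    x = random.uniform(-10.0, 10.0)
    u = oracle(x)                   # one oracle query per candidate
    if u > best_u:
        best_x, best_u = x, u

print(round(best_x, 1))  # lands near the true optimum x = 3,
                         # found without ever seeing the formula
```

Random search is the crudest possible strategy; the methods on the slide differ in how cleverly they choose the next query, not in the blackbox setting itself.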
- 26. • ハイパーパラメータ最適化ライブラリ – 任意の目的関数で並列かつ高度な枝刈り付きで高速に探索 • “Define-by-Run“に基づき問題をコードで記述するだけ – 探索範囲、最適化ロジックを別途記述する必要はない For example, Optuna is an open-source hyperparameter optimization library. It handles very computationally expensive oracles, such as deep learning training runs, and prunes unpromising trials. It also features “define-by-run,” meaning that the search space can be constructed dynamically as exploration proceeds. Blackbox optimization is a very powerful tool: it assumes nothing about the utility function, yet it can optimize it. In a sense, you specify what you want but not how to solve it; the “how” is discovered automatically by the underlying algorithm.
- 27. 第3次AIがComputer Scienceにもたらすもの double c2f(double c) { return 1.8*c + 32.0; } 計算ステップを明示的に指定する計算 訓練データ 報酬関数 計算ステップを明示的に指定しない・できない計算 ブラックボックス計算! With these two technologies arising from AI research, deep learning and blackbox optimization, we are witnessing an evolution of computing: from what I call whitebox computation, where you must specify how to compute, to what I call blackbox computation, where you specify only what to compute and let the machine figure out how to compute it.
- 28. 「計算」の進化 ホワイトボックス計算 (How) vs ブラックボックス計算 (What): 理論的背景: 離散数学、ブール代数 / 確率論、情報幾何; 計算メカニズム: チューリング機械 / 深層学習、ベイズ最適化、…; 対象とする問題: モデル・アルゴリズムが存在、低次元 / モデル・アルゴリズムが不明、超高次元; プログラミング: 演繹的・人手による(人間の認知能力に制約される) / 帰納的・探索に基づく; 精度: 厳密解が得られる / 近似解のみ; 設計論: 直交性、独立性、モジュール化、関心の分離 / 全体最適性、統合 The evolution of computing, from whitebox to blackbox. Translations of the table rows follow (whitebox / blackbox): Theoretical background: discrete mathematics, Boolean algebra / probability theory, information geometry. Computational mechanism: Turing machine / deep learning, Bayesian optimization. Target problems: known model and algorithm, low-dimensional / unknown model and algorithm, very high-dimensional. Programming: deductive, manual (constrained by human cognitive capability) / inductive, search-based. Accuracy: exact solutions / approximations only. Design principles: orthogonality, independence, modularity, separation of concerns / global optimality, integration.
- 29. Agenda 1. 「人工知能」とは何か 2. 新しい計算モデル 3. 科学・工学への展開 28 I discussed that a new computing paradigm, blackbox computation, is emerging. If this is the case, then what are its implications to our society? Here I discuss two areas -- science and engineering. 28
- 30. 科学の原理:「少ないパラメタで多くの現象を説明する」 万有引力の法則 科学を導く価値観:オッカムの剃刀 First, how are scientific methodologies affected by blackbox computation? Take for example the law of universal gravitation, Sir Isaac Newton’s beautiful law. Why is it beautiful? Because this simple equation explains both the movement of celestial bodies and a falling apple. Behind this lies a fundamental value in science called Occam’s razor: “The explanation requiring the fewest assumptions is most likely to be correct.”
- 31. 実問題の多くは、多パラメタ空間: 例: ExRNA に基づくがん診断 科学者は、支配的な因子を探しがち 深層学習は多パラメタのままよい診断結果を出す https://www.preferred-networks.jp/en/news On the other hand, many real-world problems are not simple. We are working with the National Cancer Center to diagnose cancer by analyzing exRNA in blood. RNAs usually stay inside cells, but some fragments move out of the cell; these fragments are called exRNA, and it is believed that by analyzing their expression in blood we can diagnose various kinds of cancer. There are more than 4,000 types of exRNA, and scientists have been searching for a small number of predominant types indicative of a particular cancer. Our team applied deep learning to this problem: we took all 4,000+ types of exRNA into account, and by looking at the whole pattern of expression we achieved an order of magnitude higher accuracy of cancer diagnosis.
- 32. X線1分子追跡法 非線形・高次元・非平衡・励起状態・動的な物性は、従来物理学者が「汚い領域」として避けてきた領域 深層学習による非線形・高次元のモデル化 注:非平衡な物理学は近年急速な進歩があり活発に研究されているそうです。 A physicist told me that physical phenomena that are nonlinear, high-dimensional, non-equilibrium, excited, and dynamic have been considered “dirty” and avoided by many physicists. Diffracted X-ray tracking is one such problem: using very high-energy X-rays diffracted by a tiny gold crystal attached to the endpoint of a protein molecule, you can observe the dynamic behavior of the protein. Modeling this has been very hard with traditional tools because of its nonlinear, high-dimensional, and dynamic nature. With deep learning in our science toolbox, we have a much better chance of analyzing such phenomena.
- 33. 科学にバイアスはなかったか? 科学の問題 (人が)解ける問題 解くべき問題 Looking at these complex problems, a question arises: has science been focusing on problems that can be solved rather than problems that should be solved?
- 34. 深層学習による科学の変化:高次元科学 少パラメタ科学(深層学習前): 観察 → モデル(ホワイトボックス) → 理解 → 予測・制御（人間の認知限界によって拘束される） 多パラメタ科学（深層学習を前提とする科学）: 観察 → モデル(ブラックボックス) → 予測・制御（人間の認知限界に拘束されない!） What does this mean for scientific methodology? In the past, we observed nature and came up with a simple model of how it works. Once we understood the mechanism, we could use that knowledge to predict the movement of celestial bodies or control the trajectory of a cannon shell. The model had to be simple enough to be understood by a human being, so in this mode of science, which I call low-dimensional science, the complexity of the model is limited by human cognitive capability. Now, however, we have a computational tool that can handle high-dimensional models. I call this high-dimensional science: we derive a model, for example by deep learning, whose internal workings are too complex for a human to understand, but which is still useful for predicting or controlling the physical world. This is probably a controversial view; later today there is a special session on AI and physics, and I expect further discussion of this topic there.
- 35. https://japan.cnet.com/blog/maruyama/2019/05/01/entry_30022958/ Last month I wrote a blog post on high-dimensional science. If you are interested, please read it (at this moment it is only in Japanese).
- 36. 世の中の流れ – “Software 2.0” 35 https://petewarden.com/2017/11/13/deep-learning-is-eating-software/ “Software is eating the world,” Marc Andreessen, 2011 The second implication of blackbox computation is on the engineering side. In 2011, Marc Andreessen said that “software is eating the world.” More recently Pete Warden said that “deep learning is eating software.” I think they are right. 35
- 37. 丸山の予想: 2020年には、新しく作られるソフトウェアの50%以上がブラックボックス計算を何等かの形で利用する デジタル計算機発明以来、最大のパラダイムシフト この技術をどのようにうまく使うか? This is my personal conjecture: by the end of 2020, more than half of newly developed software will use blackbox computation in one way or another. This is the largest paradigm shift since digital computers were invented some 70 years ago. The question is how we can make good use of this new technology; there are many challenges in applying blackbox computation to real-world applications.
- 38. 帰納的プログラミングにおける品質保証の本質的な難しさ 出典:石川冬樹,トップエスイーシンポジウム, 2018/03 テストが困難 デバッグ・改変が困難 One challenge is quality assurance. This slide is taken from Prof. Fuyuki Ishikawa’s presentation. Because inductive programming is inductive and often deals with real-world problems: 1. There are an infinite number of requirements. 2. We cannot have absolute test oracles. 3. It is hard to explain how the output is computed. 4. The behavior of the system is hard to predict. These points make deep-learning-based systems harder to test and debug.
- 39. そもそも要求仕様が書けるのか? 強化学習において、衝突のペナルティを無限大にすると? 動かないクルマ 効用と安全性のバランスを定量的に要件として書き出す必要! Another challenge is how to define the requirements for a blackbox optimization problem. I showed you the autonomous driving demo trained with reinforcement learning. As discussed, that system does not guarantee 100% avoidance of collisions, because it ultimately relies on deep learning, which works statistically. What happens if we raise the penalty for collision to infinity in order to make the cars “100% safe”? The result is cars that never move. This is logical: if a car moves, the probability of collision is nonzero; to bring that probability to absolute zero, the car must not move. The example reminds us that safety and utility are always a tradeoff. In traditional system development we pretend we can achieve both objectives simultaneously; once you formalize the task as an optimization problem, you suddenly have to quantify explicitly how much safety may be traded for utility.
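The "car that never moves" outcome falls out of a toy version of the objective (my illustration with assumed numbers, not the actual demo's reward function): reward the speed, penalize expected collisions, and watch the optimum collapse as the penalty grows.

```python
# Toy objective: u(v) = v - penalty * p_collision(v), where the
# collision probability is assumed to grow as K * v^2.
K = 0.01  # assumed collision-risk coefficient (hypothetical)

def best_speed(penalty):
    # Maximize u(v) = v - penalty*K*v^2.
    # Setting u'(v) = 1 - 2*penalty*K*v = 0 gives the closed-form optimum.
    return 1.0 / (2.0 * penalty * K)

for penalty in (10.0, 1000.0, 100000.0):
    print(penalty, best_speed(penalty))

# As the collision penalty grows toward infinity, the optimal speed
# tends to zero: a "100% safe" objective yields a car that never moves.
```

The exact functional forms are arbitrary; any objective in which risk grows with activity shows the same behavior, which is why the utility/safety balance must be written down quantitatively.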
- 40. 要求の難しさ • IJCAI 2017 Keynote by Stuart Russell, “Provably Beneficial AI” – 人:「コーヒーをとってきて」 – ロボット: スタバへ行き、列に並んでいる他の客を殺してコーヒーをとってくる – 人の指示は常に不完全 どちらも、最適化問題における「最適とは何か」の問題を提起 人工知能研究での未解決問題:「フレーム問題」 Another example is a fictitious robot case presented by Stuart Russell at IJCAI 2017. Suppose there is a smart robot and you tell it, “Get me coffee.” The robot goes downstairs to Starbucks, sees many people in line, kills everybody, and gets you coffee. Of course you do not want the robot to kill people, so instead you might say, “Get me coffee without killing people.” But there are an infinite number of things you do not want the robot to do. This is one form of the “frame problem,” an open problem in AI. Both the autonomous driving case and the robot case show the difficulty of specifying the utility function in an optimization.
- 41. 「橋やビルを安全に作るた めに土木工学という工学的 ディシプリンが必要であっ たのと同様に、データや機 械学習を利用して分析や意 思決定を行うための工学的 ディシプリンが必要だ」* https://medium.com/@mijordan3/artificial-intelligence-the- revolution-hasnt-happened-yet-5e1d5812e1e7 *丸山による要約 我々は何をすべきか:新しい工学の必要性 How should we overcome these challenges? Last year, Prof. Michael Jordan of UC Berkeley wrote a blog about the necessity of engineering discipline for machine learning. It is a long text, but my understanding is this: As we have an engineering discipline called “civil engineering” for building bridges and buildings safely, we should have a similar engineering discipline for using machine learning or data analytics for decision making. 40
- 42. 工学とは「技術と社会の契約」 理論（e.g., 構造計算） ＊ 安全係数 新技術が社会に受容されるのは「工学」として熟成されてから 土木工学ハンドブック、969ページ 我々はなぜ安心して橋を渡れるのか? 土木工学の膨大な知見があるから What is engineering? Consider why we feel safe crossing a bridge. It is because there is a huge body of knowledge called civil engineering behind it, and we, society as a whole, trust that knowledge. These volumes are the Civil Engineering Handbook, thousands of pages long. What is in this body of knowledge? Mostly theories: material theory, structural theory, and so on. In addition, there is a small set of numbers called “safety factors,” shown in this table. Each subfield of civil engineering has a safety factor, and taken all together the overall safety factor is around three. That is, engineers first calculate a bridge design that is strong enough in every aspect of the relevant theories, and then build the actual bridge three times stronger than the theories predict is needed. In other words, civil engineers are humble: they know their theories are not perfect and that there are things beyond their knowledge. The safety factors are set according to their best judgment, and society trusts them.
- 43. To me, engineering is a contract between the technology and the society. We need something similar for the new computation paradigm. 41
- 44. 42 興味ある方は、日本ソフトウェア科学会機械学習工学研究会へ! MLSE (Machine Learning Systems Engineering) is a Special Interest Group of Japan Society for Software Science and Technology, established in April 2018. We have a lot of discussions both domestically and globally on these topics. We are planning to have MLSE 2019, a two-day, overnight workshop on 7/6-7/7 in Hakone. You are welcome to join. Paper submission deadline is 6/11. 42
- 45. 43 技術者として我々がすべきこと 1. 「人工知能」は多義 – 不幸をもたらす誤解があれば、正すべき 2. 深層学習・ブラックボックス最適化は新しいプログラミングパラダイム – その可能性と限界を正しく伝える責務 3. 工学としての熟成が必要 – ブラックボックス性を認めた上での社会合意を粘り強く求める 技術に対して誠実に向き合ってください Here is the take-away message of my talk today: as researchers and engineers in the field of AI, we should 1. Be aware that the word “AI” is used for different things in different context • Clarify the meaning whenever possible 2. Understand the emerging computation paradigm • Explain what it is. Do not oversell or undersell 3. Make it an engineering discipline • Admit its limitations and try to reach social acceptance In short, I urge you to be sincere to technologies. 43