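The Sensibility for Reading Data

Hideo Hirose
Data Science Research Center, Hiroshima Institute of Technology
2018.12.18
Hiroshima Institute of Technology Graduate School Symposium
I am Hirose of Hiroshima Institute of Technology.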
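Computer Science
Data Science
Computer science and data science have, until now, each followed a separate path.
Computer science works with binary data. In its early days it handled only numbers and characters, but as computing power rose and memory shrank, its scope widened to signals, then images, then video; networks then linked knowledge together until the whole world began to form a single system. Along the way it also stepped into the world of recognition.
Data science, on the other hand, is grounded in statistical science. Until around the 1970s, statistical theory was built on asymptotic theory, that is, on analytical methods. From the time these began to be replaced by computer-intensive methods, which use the computer itself to obtain statistical reliability, the field developed rapidly. The bootstrap method is the typical example.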
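As a minimal sketch of what "computer-intensive" means here, the following percentile bootstrap estimates a 95% interval for a mean by resampling rather than by an asymptotic formula. The sample values, the function name `bootstrap_ci`, and the number of resamples are all made up for illustration and are not taken from the talk.

```python
import numpy as np

def bootstrap_ci(data, statistic=np.mean, n_resamples=5000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for a statistic."""
    rng = np.random.default_rng(seed)
    n = len(data)
    # Recompute the statistic on many resamples drawn with replacement.
    stats = np.array([
        statistic(rng.choice(data, size=n, replace=True))
        for _ in range(n_resamples)
    ])
    alpha = (1.0 - level) / 2.0
    return np.quantile(stats, alpha), np.quantile(stats, 1.0 - alpha)

# Made-up sample, used only to exercise the function.
data = np.array([4.1, 5.3, 3.8, 6.0, 5.5, 4.7, 5.1, 4.9, 6.2, 4.4])
low, high = bootstrap_ci(data)
print(f"95% bootstrap interval for the mean: ({low:.2f}, {high:.2f})")
```

The reliability of the estimate comes entirely from repeated computation on the sample itself, which is the shift the talk describes.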
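In this way, the two fields had each been following their own path.
Over the last ten years or so, however, the picture has changed abruptly.
Now that highly accurate predictions can be made from big data, the methodologies that computer science had traditionally been good at are no longer sufficient on their own. Computer science has therefore come to use probabilistic methods, and in doing so has come to need knowledge of statistical science. The two sciences are still somewhat awkward with each other, but they are moving closer together. Today I would like to talk about what I have sensed from the place where they meet.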

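[Figure: spectral intensity (dBm) versus frequency (MHz) for the measurements "spe1.txt" to "spe7.txt"]

Signals can be detected
Pressure
Acceleration
Electromagnetic waves
Sound waves
...
This is a conceptual diagram of a system for preventing accidents at a substation before they happen. It continuously captures the various signals emitted by the substation equipment, detects an abnormal state as early as possible, and raises an alert to prevent a serious accident. The sensors are highly capable, and data accumulates steadily.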
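[Figure: the same spectral intensity plot, "spe1.txt" to "spe7.txt", versus frequency (MHz)]

Signals can be detected, but what is happening?
Pressure
Acceleration
Electromagnetic waves
Sound waves
...
Something is happening
At some point, the system managed to detect a signal that differed from the normal one. The detection method could be a neural network, deep learning, or anything else. But the system cannot tell what is happening right now; it cannot identify the cause. Consequently, no countermeasure can be taken.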

PD

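H. Hirose, M. Hikita, S. Ohtsuka, S. Tsuru and J. Ichimaru: Diagnosis of Electric Power Apparatuses using the Decision Tree Method, IEEE Trans. Dielectrics and Electrical Insulation, Vol. 15, No. 5, pp. 1252-1260 (2008)
What does this mean?
A system built on the optimistic guess that something useful would emerge if sensors were installed everywhere turned out to be of no use. So attempts began, at the laboratory level, to identify the signals that come from equipment in a faulty state and to classify the signals by matching them against the fault states.
In the laboratory, it became possible to predict events within the range of what happens there.
But when such a system is placed in a real substation, it is still hard to tell what is going on.
This shows that even today's AI, which has begun to surpass human intelligence in some respects, still has a line it cannot cross.
As I will explain later, in mathematical terms this is the limitation of the methodology called interpolation.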
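Watson identified a cancer that even specialists had been unable to diagnose
AI is indispensable to cancer medicine, which handles enormous amounts of genomic information
www.innervision.co.jp/ressources/pdf/innervision2017/iv201707_018.pdf
Watson for Genomics (WfG)
①
Myelodysplastic syndrome
+ Possibility that a different leukemia has developed
Now I would like to go through three examples from the AI that is in the news, and look at what AI is essentially doing.
First, Watson. I expect you remember the news that the Institute of Medical Science at the University of Tokyo used Watson and a supercomputer to identify a cancer that even specialists had not been able to diagnose. The patient was first diagnosed by a specialist with myelodysplastic syndrome and was treated for it, but the treatment had no effect.
Watson then related the genomic mutations to the knowledge in PubMed, a collection of medical papers gathered from around the world, and pointed out the possibility that the patient had developed a different leukemia. Treatment for that leukemia reportedly proved effective.
This amounts to rapidly finding the right items within big data, and corresponds to the kind of AI that has been practiced for a long time.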

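AlphaGo defeated a human professional Go player
Trained on past games
Deep learning
Neural networks
A struggle with computational cost
(no pre-stored look-ahead patterns)
Value network: evaluation function for board positions
Policy network: for choosing the next move
Monte Carlo tree search (MCTS)
②
A computer Go program developed by Google DeepMind
Learning from games of master players
Supervised learning (SL) policy
Reinforcement learning (RL) policy
Self-play simulation
Combining evaluation and search
Probability distribution of winning for each move; the optimal move
Search for the optimal solution; stochastic gradient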
[Slide background: first page of the paper "Mastering the game of Go with deep neural networks and tree search," Nature, Vol. 529, p. 484, 28 January 2016. © 2016 Macmillan Publishers Limited. The visible excerpt states that a game's search tree contains approximately b^d possible sequences of moves, where b is the breadth (number of legal moves per position) and d is the depth (game length), with b≈35, d≈80 for chess and b≈250, d≈150 for Go; that exhaustive search is therefore infeasible; and that AlphaGo combines policy and value networks with Monte Carlo tree search (MCTS).]
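Breadth, depth
10^360
Monte Carlo Tree Search
About 10 million board positions
Next, AlphaGo. The news that AlphaGo had defeated a human professional Go player was also astonishing. In something like a tsume-shogi problem, where the number of possible moves is not very large, the final outcome can be worked out logically by reading ahead through every move in turn. In Go, however, with a breadth of about 250 legal moves per position and a depth of about 150 moves, the number of possible sequences reaches an astronomical 10 to the power of 360, and prediction by conventional logic-based methods is impossible.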
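The size of that number can be checked with a few lines of arithmetic. This is only a sketch of the estimate b^d quoted on the slide; the function name is chosen here for illustration.

```python
import math

def log10_sequences(breadth: int, depth: int) -> float:
    """Return log10(breadth ** depth), the exponent of the approximate number of move sequences."""
    return depth * math.log10(breadth)

go_exponent = log10_sequences(250, 150)     # Go: b ≈ 250, d ≈ 150
chess_exponent = log10_sequences(35, 80)    # chess: b ≈ 35, d ≈ 80

print(f"Go:    about 10^{go_exponent:.0f} sequences")
print(f"Chess: about 10^{chess_exponent:.0f} sequences")
```

Working with the logarithm avoids building the huge integer and shows directly that the Go figure rounds to 10^360.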
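AlphaGo therefore did two things. First, it learned from about 10 million board positions taken from past games. Second, it played out imaginary games to see what each candidate next move would lead to, and computed the move with the highest probability of winning. This is a struggle with computational cost.
In other words, because deductive, logic-based reasoning cannot break through the wall of computational cost in Go, AlphaGo used an inductive approach: finding, probabilistically, a solution that is close to optimal. This is where deep learning, the algorithm that represents today's AI, comes in.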
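The idea of "playing out imaginary games and choosing the move with the highest estimated probability of winning" can be sketched on a toy game. The example below uses plain random playouts on a small take-away game (take 1 to 3 stones, whoever takes the last stone wins). It is an illustration of Monte Carlo evaluation only, not AlphaGo's method: there are no neural networks and no search tree, and the game, function names, and playout count are assumptions made for this sketch.

```python
import random

def random_playout(stones: int, my_turn: bool, rng: random.Random) -> bool:
    """Play uniformly random moves until the pile is empty; True if 'I' took the last stone."""
    while True:
        stones -= rng.randint(1, min(3, stones))
        if stones == 0:
            return my_turn          # the player who just moved wins
        my_turn = not my_turn

def best_move(stones: int, n_playouts: int = 2000, seed: int = 0):
    """Estimate the win rate of each legal move by random playouts and return the best one."""
    rng = random.Random(seed)
    win_rate = {}
    for take in range(1, min(3, stones) + 1):
        remaining = stones - take
        if remaining == 0:
            win_rate[take] = 1.0    # taking the last stone wins immediately
            continue
        # After my move the opponent plays next, so my_turn starts as False.
        wins = sum(random_playout(remaining, False, rng) for _ in range(n_playouts))
        win_rate[take] = wins / n_playouts
    return max(win_rate, key=win_rate.get), win_rate

move, rates = best_move(5)
print(move, rates)
```

With 5 stones, taking 1 leaves the opponent in the worst position, and the playout estimates reflect that without any explicit logic about the game.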
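① Watson: exhaustive search
② AlphaGo: learning from games of master players, reinforcement learning, probabilistic search
Human wisdom
So far we have considered the forms AI takes through two examples: Watson, which searches big data, and AlphaGo, which learns from master players and searches probabilistically through simulation.
Put very simply, all they do is "search everywhere" and "search efficiently," which may sound like nothing special. What is surprising is that they searched better than humans did.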

SGD
③
Okada Nana
One more example. This one is about what it means to understand.
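[Figure: normal distribution of deviation scores (30 to 70) among national mock exam takers, with a deviation score of 57 marked; about 3/4 of the area lies below it and about 1/4 above]
"Torobo-kun" (東ロボくん) scores a deviation value of 57
https://diamond.jp/articles/-/108460?page=4
③
Two years ago, Torobo-kun achieved an overall deviation score of 57 on a national mock exam. This means that three quarters of the examinees ranked below Torobo-kun.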
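The "three quarters" figure follows from the definition of the deviation score, which rescales results to mean 50 and standard deviation 10. The short check below assumes that the scores are normally distributed, as the slide's bell curve suggests; the function name is chosen for illustration.

```python
import math

def fraction_below(deviation_score: float) -> float:
    """Fraction of examinees below a deviation score, assuming a normal distribution (mean 50, SD 10)."""
    z = (deviation_score - 50.0) / 10.0
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))   # standard normal CDF

print(f"{fraction_below(57):.3f}")
```

A score of 57 corresponds to z = 0.7, and the normal CDF at 0.7 is about 0.76, which matches the "3/4 below" reading of the figure.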

http://image.itmedia.co.jp/l/im/news/articles/1511/16/l_haru_nii1.jpg
③
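However, looking at the deviation scores by subject shows that while the mathematics results were relatively good, Japanese and English were markedly poor.
It has no reading comprehension, so it cannot grasp meaning
It can go no further
Torobo-kun
③
At the time, the Torobo project concluded that today's robots lack reading comprehension and therefore cannot understand the meaning of a text, so there was no point in pushing further, and the attempt at the national mock exam has been suspended. According to the project leader, "A computer only calculates and does not understand meaning; without understanding meaning, good results cannot be expected."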

[Slide background: overlapping screenshots of the blog post below. The passages that can be read on the slide are quoted.]
2018/11/4 Google AI Blog: Open Sourcing BERT: State-of-the-Art Pre-training for Natural Language Processing
https://ai.googleblog.com/2018/11/open-sourcing-bert-state-of-art-pre.html
"This week, we open sourced a new technique for NLP pre-training called Bidirectional Encoder Representations from Transformers, or BERT. With this release, anyone in the world can train their own state-of-the-art question answering system (or a variety of other models) in about 30 minutes on a single Cloud TPU, or in a few hours using a single GPU."
"On SQuAD v1.1, BERT achieves 93.2% F1 score (a measure of accuracy), surpassing the previous state-of-the-art score of 91.6% and human-level score of 91.2%."
exact match (EM)
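BERT surpassed human reading comprehension
③`
2018/11/23
Very recently, however, the news that BERT, a deep learning model from Google, had achieved results surpassing human reading comprehension came as a shock. It outperformed humans on SQuAD (the Stanford Question Answering Dataset), a test that checks whether a passage has been understood by having the answer to each question selected from the passage.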
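③`
Passage:
A teacher's role is often formal and ongoing, and is carried out at a school or other place of formal education. In many countries, a person who wishes to become a teacher must first obtain specified professional qualifications or credentials from a university or college. These professional qualifications include the study of pedagogy, the science of teaching. Teachers, like other professionals, may have to continue their education after they qualify. Teachers can provide a course of study, called the curriculum, and use lesson plans to facilitate student learning.
Questions and highlighted answers:
What is a course of study called? / Curriculum
What is another name for the science of teaching? / Pedagogy
Where do most teachers get their credentials from? / University or college
What can a teacher use to help students learn? / Lesson plan
Where do teachers teach? / University or college
For example, given a passage like the one on the left and questions like those on the right, each asking which answer is correct, BERT exceeded the human rate of correct answers. Green marks the correct answers.
BERT is pre-trained on Wikipedia, and on top of that it uses the Transformer, one of the newest techniques within deep learning. Pre-training a Transformer in this way makes it possible to carry knowledge from an enormous corpus over to other, smaller datasets.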

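③`
Its rate of correct answers was higher than that of humans with good reading comprehension
Even so, it does not understand meaning
Can we really dismiss it as no good?
What BERT accomplished could be taken to mean that a computer has understood meaning. Of course, here too the computer does not understand meaning. It got the answers right without really knowing why. Some people say that this is no good without understanding, and that in this sense neither Torobo-kun nor BERT has broken through the thick wall. But recent deep learning also seems to be putting a question to us humans.
③`
Is AI asking us what "understanding meaning" really is?
Perhaps we are merely consulting the world of "common sense" that humans have built up over a long time.
In other words: we do not know why, but deep learning produces better results than humans. What is wrong with that? What is wrong with a black box? We ourselves do not make our various judgments while accurately understanding meaning. For the most part, we may simply be consulting the world of "common sense" that humans have built up over a long time, and feeling that we have understood. The question is whether there is any point in insisting on "meaning."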

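① Watson: exhaustive search
② AlphaGo: learning from games of master players, reinforcement learning, probabilistic search
③ BERT: Wikipedia, pre-training, unsupervised learning, transfer learning
Human wisdom
Compared with the first two examples, the third, BERT's acquisition of reading comprehension through pre-training and transfer learning, can be seen as having moved one step closer to human intelligence and "approached understanding." It is merely "consulting" what it has learned, yet it has begun to surpass humans.
I hope this has shown that what we call AI is, in each of many fields, devising approaches specialized to that field and, at times with human help, gradually trying to overcome human intelligence.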
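① Watson: exhaustive search
② AlphaGo: learning from games of master players, reinforcement learning, probabilistic search
③ BERT: Wikipedia, pre-training, unsupervised learning, transfer learning
Human wisdom + high-performance computers + advanced algorithms
Data
How to use it
Discovery
Prediction
Now let us consider a schema that runs through all of these like a basso continuo.
In every case, I think the aim is to do something by applying an advanced algorithm to data.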

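Data . Algorithm ~ Prediction
© Hideo Hirose
The "something" is often prediction, so the situation can be read as the schema "Data . Algorithm ~ Prediction."
Interaction through a combination chosen for mutual compatibility
The criterion of optimization
What we want to do
Here the red dot stands for the interaction that comes from combining data and an algorithm with their mutual compatibility in mind, and the red wave (~) stands for wanting to do something, for example to make a prediction, according to the criterion of optimization.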

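Image data . Deep learning ~ Image recognition
AI in the narrow sense
Samples . Statistical science ~ Data science
Statistics
Informatics
Data . Algorithms ~ Computer science
Using this schema, deep learning, which triggered the recent AI boom, can be written as "Image data . Deep learning ~ Image recognition." Writing "Samples . Statistical science ~ Data science" gives data science, and writing "Data . Algorithms ~ Computer science" gives a notation that extends to computer science as well.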
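y = f(x)
Up to high school
Given a quadratic function,
choose a value for the variable
and find the value of the function:
being able to do that calculation was enough
In the language of mathematics: up to high school, it was enough to be able to calculate. For example, given a quadratic function, you choose a value of the variable x and find the value of the function y.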

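y = f(x)
In AI
What do we want to do?
Which data do we use?
What algorithm do we use?
The combination becomes important
In AI, from now on, it becomes important to think through the whole combination: what data to use, what algorithm is compatible with that data, and, to begin with, what we want to do.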
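y = f(g(h…(x)))
Results that surpass humans
Big data
Complex algorithms
Expressed with vectors,
even complex algorithms can be written down clearly
Expressed with vectors, the picture becomes: use an algorithm described by complex formulas to analyze big data and produce results that surpass human intelligence.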
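To make the notation y = f(g(h(x))) concrete, here is a minimal sketch in which each function is a simple vector operation (a matrix product followed by a ReLU) and the three are composed. The layer sizes, the random weights, and the helper name `layer` are assumptions for illustration; nothing here is trained.

```python
import numpy as np

def layer(weights, bias):
    """Return a function x -> max(0, weights @ x + bias), one step of the composition."""
    def apply(x):
        return np.maximum(0.0, weights @ x + bias)
    return apply

rng = np.random.default_rng(0)
h = layer(rng.normal(size=(8, 4)), np.zeros(8))   # innermost: 4 inputs -> 8 values
g = layer(rng.normal(size=(8, 8)), np.zeros(8))   # middle:    8 -> 8
f = layer(rng.normal(size=(1, 8)), np.zeros(1))   # outermost: 8 -> 1 output

x = np.array([1.0, 0.5, -0.5, 2.0])               # the input as a vector
y = f(g(h(x)))                                    # y = f(g(h(x)))
print(y.shape)
```

Written with vectors and matrices, each stage is one line, which is the sense in which a complex algorithm can still be described clearly.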

The sensibility for reading data
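Learning analytics
Example 1
That is enough of the general discussion; let me now introduce some of the work we are actually doing. First, learning analytics.
The term "learning analytics" has only recently come into use. It refers to a field, developing rapidly in education, that statistically analyzes the enormous amount of learning data accumulated through computer use in order to gain insights for effective learning support. Various machine learning algorithms are also used here.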

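Support through CBT
CBT: Computer Based Testing
Testing that uses a computer
CBT means testing that uses a computer. At Hiroshima Institute of Technology, since the 2016 academic year all students have taken computer-based quizzes in their mathematics courses, so that we can follow their proficiency closely.
More than 3,000 questions written by mathematics faculty
More than 3,000 examinees
This figure uses a matrix to show how more than 3,000 students responded in quizzes to more than 3,000 questions written by the mathematics faculty. The color changes with the number of accesses: green means few accesses, and the color moves from yellow to red as the number of accesses increases. From these results we are trying to trace how each individual student's proficiency changes.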

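Quiz score history
First, let us look at the history of quiz scores from a mathematics course in which a quiz was given in every class. The horizontal axis is the week of the quiz and the vertical axis is proficiency. Each line represents one student.
In fact, this shows how the scores of the students who passed the final exam changed.

The scores of the students who failed the final exam look like this.
The difference between the two is not apparent, so let us compare them together.
No visible difference
Even side by side, the difference is hard to see.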

Averaging cancels out error
There is a statistical property that "averaging cancels out error."
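A small simulation illustrates the property: if each score carries independent noise with spread sigma, the mean of n scores has spread of roughly sigma divided by the square root of n. The numbers below (noise level, 15 scores, 20,000 simulated students) are illustrative assumptions, not data from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 1.0          # spread of a single noisy score (assumed)
n_quizzes = 15       # number of scores averaged per student
n_students = 20000   # simulated students, all with the same true level 0

scores = rng.normal(loc=0.0, scale=sigma, size=(n_students, n_quizzes))
spread_single = scores[:, 0].std()          # spread when looking at one quiz only
spread_mean = scores.mean(axis=1).std()     # spread of the 15-quiz average

print(f"spread of one score:     {spread_single:.3f}")
print(f"spread of 15-score mean: {spread_mean:.3f}")
print(f"sigma / sqrt(15):        {sigma / np.sqrt(n_quizzes):.3f}")
```

The error shrinks but does not vanish, which is why averaging alone may still leave two overlapping groups hard to separate.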
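[Figure: histogram of the mean of 14 quiz scores per student (R plot title "Histogram of ABAAALCTSF$ABAsuccess"); horizontal axis: mean proficiency (-2 to 2), vertical axis: number of students (0 to 30); students who passed and students who failed the final exam are overlaid]
It is hard to tell those who passed from those who failed
So we took the mean over the 15 quizzes and drew a histogram for all students.
The students who failed the final exam are completely buried among those who passed, and the two groups cannot be separated. In other words, looking at the quizzes one at a time does not let us predict which students will fail the final exam.
Are quiz scores, then, of no use in guiding students?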

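An accurate proficiency evaluation method based on item response theory
was applied to accumulated quiz data
to see the trend of proficiency over time
Item response theory:
a statistical theory that can estimate student proficiency and question difficulty simultaneously,
grounded in modern test theory
To obtain more accurate evaluation values, we used item response theory as the proficiency evaluation method and applied it not to each quiz separately but to the accumulated quiz data, and then looked at the trend of the proficiency estimates over time. Item response theory is a statistical theory that can estimate student proficiency and question difficulty simultaneously, and it is grounded in modern test theory.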
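To show what "estimating proficiency and difficulty simultaneously" can look like, here is a minimal sketch using the one-parameter (Rasch) model, in which the probability of a correct answer depends on the difference between ability and difficulty. The talk does not say which item response model or estimation method was used, so the model choice, the gradient-ascent fitting, and the simulated data are all assumptions for illustration.

```python
import numpy as np

def p_correct(theta, b):
    """Rasch model: probability that ability theta answers an item of difficulty b correctly."""
    return 1.0 / (1.0 + np.exp(-(theta - b)))

def fit_rasch(responses, n_iter=500, step=1.0):
    """Jointly estimate abilities and difficulties by gradient ascent on the log-likelihood."""
    n_students, n_items = responses.shape
    theta = np.zeros(n_students)
    b = np.zeros(n_items)
    for _ in range(n_iter):
        residual = responses - p_correct(theta[:, None], b[None, :])
        theta += step * residual.mean(axis=1)   # raise ability if more correct than expected
        b -= step * residual.mean(axis=0)       # raise difficulty if fewer correct than expected
        b -= b.mean()                           # fix the origin of the scale
    return theta, b

# Simulated 0/1 responses: 300 students, 20 items (hypothetical data).
rng = np.random.default_rng(0)
true_theta = rng.normal(size=300)
true_b = np.linspace(-2.0, 2.0, 20)
prob = p_correct(true_theta[:, None], true_b[None, :])
responses = (rng.random((300, 20)) < prob).astype(float)

theta_hat, b_hat = fit_rasch(responses)
print(np.corrcoef(b_hat, true_b)[0, 1])
```

Both sets of parameters are updated from the same response matrix, so a hard question answered correctly counts for more than an easy one, which a plain average of scores cannot express.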
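Here is the result.
This is the score trend of the students who passed the final exam. A stable tendency can be seen.

For the students who failed the final exam, too, the trend is stable.
There is a difference
Comparing the two, this time there appears to be a clear difference.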

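Comparing similarity
by NN
Now take one student and compute a similarity that tells us whether that student's score trend resembles the trends of the students who passed the final exam or of those who failed.
For example, (pointing to the pink curve) near the topmost pink curve among the passing students there is not a single failing student on the right-hand side, so this student is judged to pass. In regions where the two groups are mixed, we can judge with what probability the student will fail.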
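One simple way to turn "which past trends is this student close to" into a failure probability is a nearest-neighbor vote. The sketch below assumes that reading of the slide's "NN," uses Euclidean distance between trend vectors, and runs on made-up data; the talk does not specify the distance, the number of neighbors, or the data layout.

```python
import numpy as np

def fail_probability(trend, past_trends, past_failed, k=5):
    """Fraction of the k most similar past trends that ended in failure."""
    distances = np.linalg.norm(past_trends - trend, axis=1)
    nearest = np.argsort(distances)[:k]
    return past_failed[nearest].mean()

# Hypothetical past data: 4 weekly proficiency estimates per student.
rng = np.random.default_rng(0)
passed = rng.normal(loc=0.5, scale=0.3, size=(60, 4))
failed = rng.normal(loc=-0.5, scale=0.3, size=(20, 4))
past_trends = np.vstack([passed, failed])
past_failed = np.array([0] * 60 + [1] * 20)

high = np.full(4, 0.8)    # a trend well inside the passing group
low = np.full(4, -0.8)    # a trend well inside the failing group
print(fail_probability(high, past_trends, past_failed))
print(fail_probability(low, past_trends, past_failed))
```

A trend surrounded only by passing students gets probability 0, one surrounded only by failing students gets 1, and trends in the mixed region get values in between.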
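[Figure: bar chart of the predicted number of students failing the final exam (vertical axis 0 to 350) for the thresholds p ≧ 0.3, p ≧ 0.4 and p ≧ 0.5, using LCT1-LCT4 (around 1/3 of the term), LCT1-LCT7 (around 1/2 of the term) and LCT1-LCT11 (around 2/3 of the term). Each bar is split into "predicted to fail, actually passed" and "predicted to fail, actually failed." A line marks the actual number of failing students, 206.]
Compared with evaluating by the mean,
evaluating by trend similarity
captures at-risk students better,
and the result changes little whichever point in the term is used
Predicted number of students failing the final exam
Writing p for the estimated probability of failing, we can then pick out the students whose p is at or above a chosen threshold.
For example, picking out the students with p of 0.4 or more gives, even in the middle of the term, a predicted number close to the actual number of failing students. About half of those students actually go on to fail.
What matters is that this tendency changes little whether we are about one third of the way through the term, halfway through, or two thirds of the way through. This shows that at-risk students can be identified at an early stage.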

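Using machine learning (AI)
[Figure: ROC curves (receiver operating characteristic curves) for around 1/3, 1/2 and 2/3 of the term. Horizontal axis: False Positive Rate (0 to 1), students who actually passed but were predicted to fail. Vertical axis: True Positive Rate (0 to 1), students who actually failed and were predicted to fail. Points are labeled with thresholds from p = 0 to p = 0.6, and a line labeled "tangent = 4" is drawn. Here failing is taken as Positive, and p is the probability of failing.]
The optimal p is 0.4
To decide what value of p is optimal, we can use the ROC curve, a tool widely used in machine learning. The ROC curve shows that setting p to 0.4 lets us identify at-risk students accurately.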
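The points of an ROC curve can be computed directly from the estimated failure probabilities and the actual outcomes. The sketch below does this for a list of thresholds and then picks the threshold where a line of a given slope touches the curve, that is, the one maximizing TPR minus slope times FPR. Reading the slide's "tangent = 4" as such a slope criterion is an assumption, and the eight data points are invented for illustration.

```python
import numpy as np

def roc_points(p_fail, actually_failed, thresholds):
    """Return (threshold, FPR, TPR) for each threshold, with 'fail' as the positive class."""
    points = []
    for t in thresholds:
        predicted_fail = p_fail >= t
        tpr = (predicted_fail & actually_failed).sum() / actually_failed.sum()
        fpr = (predicted_fail & ~actually_failed).sum() / (~actually_failed).sum()
        points.append((t, float(fpr), float(tpr)))
    return points

def best_threshold(points, slope=1.0):
    """Pick the threshold whose ROC point maximizes TPR - slope * FPR (assumed criterion)."""
    return max(points, key=lambda pt: pt[2] - slope * pt[1])[0]

# Made-up estimates for eight students (not data from the talk).
p_fail = np.array([0.05, 0.10, 0.20, 0.30, 0.35, 0.45, 0.60, 0.80])
actually_failed = np.array([False, False, False, False, False, True, True, True])
thresholds = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6]

points = roc_points(p_fail, actually_failed, thresholds)
best = best_threshold(points, slope=4.0)
print(best)
```

A larger slope penalizes false alarms more heavily, so the chosen threshold depends on how costly it is to warn a student who would have passed anyway.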

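"As things stand, you have a 40% or higher probability of failing the final exam"
Such an alert could be issued to spur the student on
Partway through the term, it becomes possible to issue an alert such as
"As things stand, you have a 40% or higher probability of failing the final exam"
and spur the student into action.
Quizzes for all students in every class
(large-scale learning data)
Binary classification by similarity
Prediction of student dropout risk
Applying the earlier schema here, the quizzes taken by all students in every class are the data, binary classification by similarity is the algorithm, and prediction of student dropout risk is what we wanted to do in the first place.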

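Predicting epidemics of infectious disease
Example 2
Next, an example of predicting an infectious disease epidemic.
Last month there was a news report that norovirus had been detected at Jigozen Kindergarten in Hatsukaichi City.

[Figure: number of infected patients y over time t, y = f(t), as of November 23, 2018]
The standard approach to prediction from data obtained at regular intervals is
Traditional statistical method: time series analysis
autoregressive integrated moving average (ARIMA) model
AR
ARMA
ARIMA
as a model for y = f(x), traditionally,
artificial neural networks (ANN) model
time-series modeling
Box-Jenkins
y = f(t) = X_t
seasonal ARIMA: ARIMA(p, d, q) × (P, D, Q)_S,
with p = non-seasonal AR order, d = non-seasonal differencing, q = non-seasonal MA order, P = seasonal AR order, D = seasonal differencing, Q = seasonal MA order, and S = time span of repeating seasonal pattern.
This figure shows the number of patients per sentinel site since September in the area covered by the Western Public Health Center. The number of patients changes over time, so to analyze data obtained at regular intervals we can use time series analysis, a traditional statistical method. The models go by abbreviations such as AR and ARIMA. There are also mathematical models built to take seasonality into account.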
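As a minimal sketch of the simplest member of this family, the code below fits a plain AR(p) model by least squares and makes a one-step forecast. It is not a full ARIMA or seasonal ARIMA fit (there is no differencing, moving-average term, or seasonal term), and the simulated series and function names are assumptions for illustration.

```python
import numpy as np

def fit_ar(series, order):
    """Least-squares estimate of AR coefficients for a zero-mean series (no intercept)."""
    # Each row holds the previous `order` values, most recent first.
    design = np.array([series[t - order:t][::-1] for t in range(order, len(series))])
    target = series[order:]
    coeffs, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coeffs

def forecast_next(series, coeffs):
    """One-step-ahead forecast from the last len(coeffs) observations."""
    order = len(coeffs)
    return float(coeffs @ series[-order:][::-1])

# Simulated AR(1) series x_t = 0.7 * x_{t-1} + noise (hypothetical data).
rng = np.random.default_rng(0)
x = np.zeros(2000)
for t in range(1, len(x)):
    x[t] = 0.7 * x[t - 1] + rng.normal()

coeffs = fit_ar(x, order=1)
print(coeffs, forecast_next(x, coeffs))
```

The model predicts the next value only from the most recent values of the same series, which is the structural difference from the matrix view of years and weeks.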

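Think of it as a matrix
Week
Year
What values will go here?
Instead of a time series
Here, however, instead of time series analysis, let us arrange the data as a matrix with years and weeks as its rows and columns. Green cells have few infected patients, and the number grows as the color moves from yellow to red. We are trying to predict the blank region at the lower right.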
Method for predicting the occurrence of norovirus infections
The number of infections in the Western Public Health Center's area is still developing, so here we predict the occurrence of norovirus infections nationwide in past years.

[Figure: "Prediction by using 1-5 weeks data" for the series labeled 12; horizontal axis: weeks (1 to 52), vertical axis: number of patients (0 to 25)]
Prediction uses the data shown as a solid line
Observed values
Predicted results
[Figure: the same plot with the series labeled 02 to 11 shown faintly in the background]
Data from the past 10 years
The observed values are shown as a blue dotted line; the data shown as a red solid line were used to produce the predictions, shown as a red dotted line. The figure is drawn as if the values for every week of the year were predicted well from that year's January observations alone, but in fact the data from the past 10 years, faintly visible in the background, are used as well.
You can see that the future is predicted from the accumulated past data together with the most recent observations.

\[
\begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}
\begin{pmatrix} 1.5 & 1 & 1 \end{pmatrix}
=
\begin{pmatrix} 1.5 & 1 & 1 \\ 3 & 2 & 2 \\ 4.5 & 3 & 3 \end{pmatrix}
\]
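Error between observed and predicted values
Decompose the matrix into a product of two matrices
X ≈ UV = P
Avoiding overfitting; least squares
Matrix factorization
The prediction algorithm is matrix factorization, which decomposes the matrix X into the product of two matrices U and V.
The matrix X, in which observed values and blanks are mixed, is approximated by the product of two matrices U and V whose entries are all filled in. The criterion is to minimize the error between the product P and the original observations X. To prevent overfitting, that is, fitting this one problem perfectly and nothing else, a penalty term is added; this adjustment is called regularization.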
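The procedure described above can be sketched as follows: fit U and V to the observed entries only, with a squared-error criterion plus a small penalty on the sizes of U and V, and read the prediction for a blank cell from the product P = UV. The alternating least squares solver, the rank, and the penalty weight are assumptions chosen for this sketch, since the talk does not state how the minimization was carried out. The test matrix is the small rank-one example from the slide with its lower-right entry hidden.

```python
import numpy as np

def factorize(X, rank=1, penalty=1e-3, n_iter=100, seed=0):
    """Approximate the observed entries of X by U @ V using regularized alternating least squares.

    Missing entries of X are marked with np.nan.
    """
    observed = ~np.isnan(X)
    rng = np.random.default_rng(seed)
    n_rows, n_cols = X.shape
    U = rng.normal(size=(n_rows, rank))
    V = rng.normal(size=(rank, n_cols))
    ridge = penalty * np.eye(rank)                   # penalty term that discourages overfitting
    for _ in range(n_iter):
        for i in range(n_rows):                      # update each row of U with V fixed
            cols = observed[i]
            Vo = V[:, cols]
            U[i] = np.linalg.solve(Vo @ Vo.T + ridge, Vo @ X[i, cols])
        for j in range(n_cols):                      # update each column of V with U fixed
            rows = observed[:, j]
            Uo = U[rows]
            V[:, j] = np.linalg.solve(Uo.T @ Uo + ridge, Uo.T @ X[rows, j])
    return U, V

X = np.array([[1.5, 1.0, 1.0],
              [3.0, 2.0, 2.0],
              [4.5, 3.0, np.nan]])   # the lower-right entry is treated as unknown
U, V = factorize(X)
P = U @ V
print(np.round(P, 2))
```

Because every entry of U and V is filled in, the product supplies a value for the hidden cell, and here it comes out close to the 3 that the full matrix contains.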
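Matrix factorization matched the observations better than time series analysis
As a result, for the year-end values predicted as of January, matrix factorization matched the observations better than time series analysis. This is a case in which machine learning outperformed statistical analysis.

Observed infectious disease data
Infectious disease epidemic prediction
Here too, if we take the observed values of the epidemic as the data and regard matrix factorization as an algorithm well suited to epidemic prediction, you can see that we have the schema of making predictions in order to reduce the risk of an epidemic.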
[Diagram: six combinations, each of the form Data . Algorithm ~ Prediction]
What AI is
Seen this way, using various kinds of data, finding algorithms that suit them, and making predictions gives rise to a great many combinations.
In the end, I think you can see that "AI" refers to each individual combination in this schema.

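Bringing together human wisdom and borrowing the power of computers,
AI has at last become able to do things faster than humans,
as long as it is a single task.
From now on,
it will spread in many directions through combinations of data and algorithms
AI, which by bringing together human wisdom and borrowing the power of computers has at last become able to work faster than humans on a single task, is expected to spread in many directions through combinations of data and algorithms.
The sensibility for reading data
Earlier I said that a "sensibility for reading data" is needed when considering how well data and an algorithm suit each other. Beyond that, I believe "sensibility" also becomes important when we try to step creatively into a new field.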

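Prediction: difficult
Judgment: impossible
Foresight: utterly impossible
Identification: difficult
OK OK
OK
Teachers
Experience
Literature
From AI that has not gone beyond mimicry
However, as I said at the beginning, prediction methods based on machine learning, deep learning included, are doing no more than interpolation. English has the word "mimic," meaning to imitate. Such methods are good at precise prediction within the range of what they already know, but poor at predicting what lies outside it.
From AI that has not gone beyond mimicry
[Diagram: the same labels as on the previous slide]
Can we build an AI with the artless freedom of a three-year-old?
Whether we can go from AI that has not gone beyond mimicry to AI with the artless freedom of a three-year-old is the challenge ahead.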

intelligence sense, kansei
intention
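AI is on the way to surpassing human intelligence in some respects, but to step into the outside region, that of extrapolation, it will need things like the sensibility and intention that are characteristic of humans. That will still take time.
Let me say it once more.
Even an AI with a tightly woven mesh is poor at predicting what lies outside it.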

Mathematical foundations
Finally, for the university students here, let me say a little about the grounding that AI, machine learning, and data science will require.
[Slide background: Figure 1.6, "Organization of the Book," from the lecture slides for Chapter 1 (Introduction) of Deep Learning, www.deeplearningbook.org, Ian Goodfellow, 2016-09-26. Part I: Applied Math and Machine Learning Basics (2. Linear Algebra; 3. Probability and Information Theory; 4. Numerical Computation; 5. Machine Learning Basics). Part II: Deep Networks: Modern Practices (6. Deep Feedforward Networks; 7. Regularization; 8. Optimization; 9. CNNs; 10. RNNs; 11. Practical Methodology; 12. Applications).]
Basic mathematics is important
Foundations of AI
Linear algebra; probability and statistics
Numerical computation; machine learning basics
Basic mathematics
I teach mathematics, probability and statistics in particular, so let me use this one page to explain which mathematical foundations you should have a firm grasp of. As is also written at the beginning of the thick book Deep Learning,
linear algebra, probability and statistics, and numerical computation
are the very basics for studying machine learning, that is, AI.

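Why are probability and statistics
listed alongside mathematics
as important foundational subjects?
So why are probability and statistics important enough to be listed alongside mathematics, rather than as a part of it?
It is because, as I said at the very beginning, we have entered an era in which data exceeds what computers can handle. Instead of working with all of the data, we cannot avoid looking at only part of it and inferring the whole.
Fluctuation in data
It is important to sense that data fluctuates.

"Numbers" in mathematics / "Numbers" that behave probabilistically
The numbers handled in mathematics have a crisp, solid image, like the one on the left, whereas a number with probabilistic fluctuation is fluffy, like the one on the right, and does not indicate a definite position.
We do not know where in the lump of cotton candy the random variable X is, but it is somewhere inside the cotton candy.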

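X, Y, (X + Y) / 2 = ?
If we take the mean of X and Y, which sit inside the red and the white cotton candy, the mean lies inside a slightly smaller pink cotton candy.
Reliability of data
In this way we carry our inference forward while considering how reliable the data is, and this is where statistical knowledge is needed.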

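True value
Value estimated from observations
95% confidence region around the true value: captures the estimate
We do not know the true value, but the estimate is contained in the lump of cotton candy around it.
True value
95% confidence region around the true value: captures the estimate
95% confidence region around the estimate: captures the true value
Looking at it from the opposite side and starting from the estimate, we see that the true value is contained in the cotton candy centered on the estimate. This is how the confidence interval of an estimate is given its meaning.
I think it will be very important to make predictions while staying aware of this kind of fluctuation.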
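The statement that an interval centered on the estimate captures the true value can be checked by simulation. The sketch below repeats an experiment many times and counts how often a 95% interval around the sample mean contains the true mean; the true mean, the noise level (treated as known), and the sample size are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
true_mean, sigma, n, trials = 10.0, 2.0, 25, 10000   # all values are illustrative

samples = rng.normal(true_mean, sigma, size=(trials, n))
estimates = samples.mean(axis=1)                     # one estimate per repeated experiment
half_width = 1.96 * sigma / np.sqrt(n)               # 95% interval with sigma assumed known

# The interval [estimate - half_width, estimate + half_width] captures the true value
# exactly when the estimate lies within half_width of the true value.
captured = np.abs(estimates - true_mean) <= half_width
coverage = captured.mean()
print(f"fraction of intervals that capture the true value: {coverage:.3f}")
```

The condition in the code is symmetric, which is the point of the two slides: the estimate lying inside the region around the true value is the same event as the true value lying inside the region around the estimate.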

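Use AI
and enjoy life!
The next generation will be AI natives
Okada Nana
Sensibility,
together with
data science
I hope you now see that AI is nothing to be afraid of once you know what it really is.
And with the "sensibility" to bridge data and algorithms, you will be able to make full use of AI.
The next generation will be AI natives.
Let us use AI to make life enjoyable.
The Sensibility for Reading Data
2018.12.18
thank you
Hiroshima Institute of Technology Graduate School Symposium
Thank you for your kind attention.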
