# Explaining Potential Risks in Traffic Scenes by Combining Logical Inference and Physical Simulation

Tohoku University
Dec. 9, 2016


1. Explaining Potential Risks in Traffic Scenes by Combining Logical Inference and Physical Simulation Ryo Takahashi*1, Naoya Inoue*1, Yasutaka Kuriya*2, Sosuke Kobayashi*1, and Kentaro Inui*1 *1: Tohoku University *2: DENSO CORPORATION
2. Research goal: predict potential risks in traffic scenes in the city. ① Woman raises her hand ② Taxi suddenly stops ③ Ego-vehicle collides with the taxi
3. Research goal: predict potential risks in traffic scenes in the city. ① Woman raises her hand ② Taxi suddenly stops ③ Ego-vehicle suddenly stops ④ Ego-vehicle collides with the green truck from behind
4. Research goal: predict potential risks in traffic scenes in the city. Two causal chains start from ① Woman raises her hand: via ② Taxi suddenly stops to ③ Ego-vehicle collides with the taxi, or via ③ Ego-vehicle suddenly stops to ④ Ego-vehicle collides with the green truck from behind. We recognize potential risks as such causal chains, which we refer to as "explanations".
5. Definition of traffic risk recognition
Input: information obtained from sensors (e.g., vision cameras and LIDARs)
• Quantitative data (shape, position, velocity)
• Qualitative data (a symbolic scene description, e.g., Pedestrian(P), Taxi(T), Me(Me), LeftFrontOf(T, P), ...)
Output: explanations of the risks with likelihood scores, produced in two stages.
Stage 1: pairs of a colliding entity and its action, e.g.:
Score | Entity-action
0.8 | Taxi(T) stops
0.5 | Pedestrian(P) crosses the road
Stage 2: each pair with a whole explanation, e.g.:
Score | Entity-action | Explanation
0.8 | Taxi(T) stops | Woman raises her hand, the taxi suddenly stops, then the ego-vehicle collides.
0.5 | Pedestrian(P) crosses the road | ...
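The two-stage output above can be made concrete with a small sketch. All predicate, entity, and field names here are illustrative assumptions, not the paper's actual representation:

```python
# A qualitative scene description as a set of (predicate, arguments) facts,
# paired with quantitative data per entity. Names are illustrative only.
scene_facts = {
    ("pedestrian", ("P",)),
    ("taxi", ("T",)),
    ("ego-vehicle", ("Me",)),
    ("left-front-of", ("T", "P")),
}

# Quantitative data: position (m) and velocity (m/s) per entity.
quantitative = {
    "P":  {"position": (5.0, 2.0), "velocity": (0.0, 0.0)},
    "T":  {"position": (8.0, 0.0), "velocity": (6.0, 0.0)},
    "Me": {"position": (0.0, 0.0), "velocity": (11.0, 0.0)},
}

# Stage-1 output: (score, entity, action) candidates, highest risk first.
candidates = [(0.8, "T", "stops"), (0.5, "P", "crosses the road")]
ranked = sorted(candidates, reverse=True)
print(ranked[0])  # the most likely risky entity-action pair
```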
6. Key idea: combine logical inference and physical simulation.
[Figure from Inoue+ 15: an abductive reasoner takes an image description (e.g., wall(Wall), soccer-ball(Ball), car(Me), car(RedCar), left-front-of(Now, Wall, Me), ...) and knowledge bases of (a) causality, (b) ontological knowledge, and (c) risk patterns, and infers the best explanation, i.e., implicit information ("There is a child Z behind the wall; Z will rush out; Z does not recognize me") and the potential risk.]
Logical inference (apply symbolic rules to a scene description and infer new facts) [Armand+ 14, Inoue+ 15, Mohammad+ 15, Zhao+ 15, etc.]
• PROS: Easy to take causal chains into account
• PROS: Can incorporate human-written knowledge
• CONS: The prediction is not grounded in precise physical quantities
[Figure from Althoff+ 09: stochastic reachable sets of traffic participants.]
Physical simulation (simulate a scene and detect future collisions) [Broadhurst+ 05, Althoff+ 09, etc.]
• PROS: Precise prediction grounded in physical quantities
• CONS: Difficult to take causal chains into account
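The logical-inference side can be sketched as simple forward chaining over scene facts. The rules and predicates below are invented for illustration and are far simpler than the weighted abductive reasoning of [Inoue+ 15]:

```python
# Minimal forward chaining: repeatedly apply if-then rules to a set of facts
# until nothing new can be derived. Rules and predicates are illustrative only.
facts = {"woman_raises_hand(W)", "taxi_near(T, W)", "behind(Me, T)"}

rules = [
    ({"woman_raises_hand(W)", "taxi_near(T, W)"}, "taxi_stops_suddenly(T)"),
    ({"taxi_stops_suddenly(T)", "behind(Me, T)"}, "collision_risk(Me, T)"),
]

changed = True
while changed:
    changed = False
    for premises, conclusion in rules:
        if premises <= facts and conclusion not in facts:
            facts.add(conclusion)  # derive the next link of the causal chain
            changed = True

print("collision_risk(Me, T)" in facts)  # a causal chain to the risk was found
```

Each derived fact records one step of the causal chain, which is exactly what makes the final risk explainable.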
7. Key idea: combine logical inference and physical simulation (same slide as above, with the following added).
The two approaches are complementary.
• How do we combine the two components?
• No previous work addresses this issue.
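The physical-simulation side, in its simplest form, extrapolates each entity's motion and checks for future overlap. The constant-velocity model, radius, and numbers below are assumptions for illustration, not the simulator used in the work:

```python
import math

# Constant-velocity extrapolation plus a proximity check. Values are illustrative.
entities = {
    "Me":  {"pos": (0.0, 0.0),  "vel": (11.0, 0.0)},  # ego-vehicle
    "Car": {"pos": (40.0, 0.0), "vel": (0.0, 0.0)},   # stopped car ahead
}

def first_collision(entities, horizon=5.0, dt=0.1, radius=2.0):
    """Return (time, a, b) for the first pair closer than radius, else None."""
    names = sorted(entities)
    t = 0.0
    while t <= horizon:
        for i, a in enumerate(names):
            for b in names[i + 1:]:
                ax, ay = (p + v * t for p, v in zip(entities[a]["pos"], entities[a]["vel"]))
                bx, by = (p + v * t for p, v in zip(entities[b]["pos"], entities[b]["vel"]))
                if math.hypot(ax - bx, ay - by) < radius:
                    return (round(t, 2), a, b)
        t += dt
    return None

print(first_collision(entities))  # ego reaches the stopped car within the horizon
```

This kind of simulation grounds a risk candidate in physical quantities (e.g., a time to collision), but by itself says nothing about *why* the colliding action would happen.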
8. Pipeline re-ranking model
Input (given): scene description, knowledge base, quantitative data. Output: explanations of the risks.
Step 1: Abductive reasoner [Inoue+ 15] (logical inference)
• Generates candidate risks using qualitative information
• "Which entity will take which risky action in this scene?"
Step 2: Action-based physical simulation
• Re-scores the risk candidates using quantitative information
• "Is the risk candidate really risky?"
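The two-step pipeline can be sketched as a generator followed by a re-scorer. The toy rules, the stub simulator, and the combination by multiplying the two scores are all assumptions for illustration, not the paper's actual scoring function:

```python
# Step 1 (stand-in for the abductive reasoner): propose scored candidates
# from qualitative facts. Rules and predicates are hypothetical.
def generate_candidates(scene_facts):
    cands = []
    if "parked_car_in_front(Car)" in scene_facts:
        cands.append(("Car", "change lanes", 0.7))
    if "pedestrian_near_road(P)" in scene_facts:
        cands.append(("P", "cross the road", 0.5))
    return cands

# Step 2 (stand-in for the physics simulator): score how dangerous the
# candidate action would actually be. Here we pretend only the lane change
# leads to a simulated collision.
def simulate(entity, action):
    return 1.0 if (entity, action) == ("Car", "change lanes") else 0.2

def rank_risks(scene_facts):
    scored = [(logical * simulate(e, a), e, a)
              for e, a, logical in generate_candidates(scene_facts)]
    return sorted(scored, reverse=True)

facts = {"parked_car_in_front(Car)", "pedestrian_near_road(P)"}
print(rank_risks(facts)[0])  # highest-ranked risk after re-scoring
```

The design point is that step 1 supplies the causal chain (the explanation) while step 2 filters candidates by physical plausibility.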
9. Evaluation | Dataset and task setting
Original dataset:
• Drive Recorder Database by the Tokyo University of Agriculture and Technology
• Over 100,000 video recordings of near-miss accidents
• 379 scenes, manually annotated as 2-D bird's-eye-view maps two seconds before the actual risk
• training : test : validation = 3 : 1 : 1
Scale of the knowledge base:
• "In such a situation, an entity might act like this" knowledge: 13 types
• Is-a knowledge: 211 types
The risk recognition system outputs a ranking of risky entity-actions, with metadata.
10. Evaluation | Models (1/3): we compare three models.
• Baseline (SVM): directly models a mapping between the scene description and a risky entity-action pair; trained with ranking SVM [Joachims 02]
• Proposed 1 (Inference): uses only qualitative information (logical inference)
• Proposed 2 (Inf+PhySim): the full model
Input (given): scene description, knowledge base, quantitative data. Output: explanations of the risks.
11. Evaluation | Models (2/3): same model list; the pipeline diagram now highlights the Logical Inference component (Proposed 1).
12. Evaluation | Models (3/3): same model list; the pipeline diagram now highlights both Logical Inference and Physical Simulation (Proposed 2).
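The ranking-SVM baseline of [Joachims 02] reduces ranking to binary classification over feature differences. The sketch below uses a hypothetical two-feature representation and a plain perceptron update in place of a real SVM solver, so it only illustrates the pairwise idea:

```python
# Pairwise ranking: for each (preferred, other) pair, learn w such that
# w·(x_pref - x_other) > 0. A perceptron update stands in for the SVM solver.
pairs = [
    # (features of the correct risky entity-action, features of a distractor)
    ([1.0, 0.2], [0.1, 0.9]),
    ([0.9, 0.1], [0.2, 0.8]),
    ([0.8, 0.3], [0.3, 0.7]),
]

w = [0.0, 0.0]
for _ in range(10):                     # a few epochs suffice on this toy data
    for x_pref, x_other in pairs:
        diff = [a - b for a, b in zip(x_pref, x_other)]
        if sum(wi * di for wi, di in zip(w, diff)) <= 0:
            w = [wi + di for wi, di in zip(w, diff)]

def score(x):
    return sum(wi * xi for wi, xi in zip(w, x))

# The learned scorer should now rank the correct candidate first in each pair.
print(all(score(p) > score(o) for p, o in pairs))
```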
13. Qualitative evaluation
[Figure: example trajectories output by our physics simulator (scene ID 1397 in the database); boxes represent vehicles, and the lines drawn from them represent trajectories.]
Actual risk: Car will change lanes to avoid ParkingCar.
① There is a parked car in front of Car ② There is a lane on the right side of Car ③ Car changes lanes to the right ④ Me collides with Car in 3.6 seconds

Model | Predicted risky entity-action | Explanation | Quantitative prediction
Baseline | Car will stop | ✗ | ✗
Proposed 1 | Car will change lanes | ✓ | ✗
Proposed 2 | Car will change lanes | ✓ | ✓

For a machine-learning-based ranker or classifier, it is relatively hard to predict this kind of richer explanation.
14. Quantitative evaluation
Setup: we assume the accuracy of sensor technologies to be perfect, in order to focus on the risk-prediction methodology. The logical representation of each traffic scene is automatically generated from the 2-D map. Metric: Acc@k (Accuracy at k), the fraction of problems for which the correct prediction is made within rank k.

Accuracy of the risk prediction models (Validation / Test):
Model | Acc@1 | Acc@3 | Acc@5 | Acc@1 | Acc@3 | Acc@5
Baseline | 52.8 (38/72) | 80.6 (58/72) | 90.3 (65/72) | 55.6 (40/72) | 77.8 (56/72) | 93.1 (67/72)
Inference | 51.4 (37/72) | 80.6 (58/72) | 90.3 (65/72) | 58.3 (42/72) | 77.8 (56/72) | 91.7 (66/72)
Inf+PhySim | 51.4 (37/72) | 81.9 (59/72) | 90.3 (65/72) | 58.3 (42/72) | 77.8 (56/72) | 91.7 (66/72)

• Adding abductive reasoning and physical simulation has not yet improved prediction accuracy
• Future work: deeper analysis of the physics-simulation results and refinement of the entire framework to improve accuracy
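The Acc@k metric described above can be computed directly; the `gold_ranks` values below are invented for illustration:

```python
# Acc@k: fraction of problems whose correct prediction appears within rank k.
def acc_at_k(gold_ranks, k):
    return sum(1 for r in gold_ranks if r <= k) / len(gold_ranks)

# Hypothetical ranks of the correct answer (1 = ranked first) for ten problems.
gold_ranks = [1, 3, 1, 2, 5, 1, 4, 1, 2, 6]

print(acc_at_k(gold_ranks, 1))  # → 0.4
print(acc_at_k(gold_ranks, 3))  # → 0.7
print(acc_at_k(gold_ranks, 5))  # → 0.9
```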
15. Summary
• Combined logical inference and physical simulation to explain risky situations (a very first step)
• Built a benchmark dataset consisting of real-life traffic scenes
• Future work:
– Extend the knowledge base
– Extend the scale of the evaluation
– Automate scene description
16. References
• [Armand+ 14] A. Armand, D. Filliat, and J. I. Guzman, "Ontology based context awareness for driving assistance systems," In 2014 IEEE Intelligent Vehicles Symp. Proc., Dearborn, MI, USA, June 8-11, 2014, pp. 227–233, 2014.
• [Inoue+ 15] N. Inoue, Y. Kuriya, S. Kobayashi, and K. Inui, "Recognizing Potential Traffic Risks through Logic-based Deep Scene Understanding," In Proc. 22nd ITS World Congress, 2015.
• [Mohammad+ 15] M. A. Mohammad, I. Kaloskampis, Y. Hicks, and R. Setchi, "Ontology-based framework for risk assessment in road scenes using videos," In 19th Int. Conf. on Knowledge-Based and Intelligent Information and Engineering Systems, KES 2015, Singapore, 7-9 September 2015, pp. 1532–1541, 2015.
• [Zhao+ 15] L. Zhao, R. Ichise, T. Yoshikawa, T. Naito, T. Kakinami, and Y. Sasaki, "Ontology-based decision making on uncontrolled intersections and narrow roads," In 2015 IEEE Intelligent Vehicles Symp., IV 2015, Seoul, South Korea, June 28 - July 1, 2015, pp. 83–88, 2015.
• [Broadhurst+ 05] A. Broadhurst, S. Baker, and T. Kanade, "Monte Carlo road safety reasoning," In Proc. IEEE Intelligent Vehicles Symp., 2005, pp. 319–324, June 2005.
• [Althoff+ 09] M. Althoff, O. Stursberg, and M. Buss, "Model-based probabilistic collision detection in autonomous driving," IEEE Trans. Intell. Transport. Syst., vol. 10, no. 2, pp. 299–310, 2009.
• [Joachims 02] T. Joachims, "Optimizing search engines using clickthrough data," In Proc. Eighth ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, July 23-26, 2002, Edmonton, Alberta, Canada, pp. 133–142, 2002.