Ben Gal

Published in: Technology, Education

  1. “To Explain or To Predict”, “To Know or To Act” (Pure Science vs. Engineering, 2004). Using Target-Based Bayesian Nets for Suspects Monitoring (joint work with A. Gruber and S. Yanovski). Irad Ben-Gal, Tel Aviv University
  2. DOE: Vs-optimal designs, Ginsburg & Ben-Gal (2004)
     System: x (control) → f(x) → Y (output)
     • f(x) known: $\partial f(x)/\partial x = 0 \Rightarrow x^*$
     • f(x) unknown: estimate g(x) (meta-model: DOE, RSM, …), then $\partial g(x)/\partial x = 0 \Rightarrow x^*$ (a random variable)
     • ‘Scientists’ (to know): best estimation of f(x) ⇒ min V(·) of the estimated model (e.g., D-optimal experiments)
     • ‘Practitioner’ (to act): best estimation of x* ⇒ min V(x*) (new DOE optimality criterion)
     Tel Aviv University, Department of Industrial Engineering
  3. The Bias-Variance Tradeoff
  4. Presentation Layout
     • Bayesian networks and classifiers
     • Targeted Bayesian Network Learning (TBNL) (with Gruber)
     • TBNL application on suspects monitoring
     • Summary
  5. Bayesian Networks (Pearl, 85)
  6. What is a Bayesian Network?
     A Bayesian network B(G, Θ) encodes the domain’s joint probability distribution (JPD), where G = ⟨V, E⟩ is a directed acyclic graph.

     Joint probability distribution:
     X1   X2   X3   X4   Prob.
     1    1    1    2    0.083
     1    1    2    2    0.167
     1    2    2    3    0.25
     2    2    1    1    0.25
     2    2    2    1    0.25

     Θ(X3), the conditional probability table of X3 given X2:
              X2=1   X2=2
     X3=1     0.33   0.33
     X3=2     0.67   0.67

     A complete factorization of the Bayesian network:
     $P(X) = P(X_2)\, P(X_3 \mid X_2)\, P(X_4 \mid X_3, X_2)\, P(X_1 \mid X_4, X_3, X_2)$
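The Θ(X3) table on slide 6 can be recovered from the joint probability table by marginalizing and normalizing. The following is a minimal sketch of that calculation; the function name `conditional_table` and the dictionary layout are illustrative assumptions, not part of the presentation.

```python
from collections import defaultdict

# Joint probability table from slide 6: (X1, X2, X3, X4) -> probability.
JPD = {
    (1, 1, 1, 2): 0.083,
    (1, 1, 2, 2): 0.167,
    (1, 2, 2, 3): 0.25,
    (2, 2, 1, 1): 0.25,
    (2, 2, 2, 1): 0.25,
}


def conditional_table(jpd, child, parent):
    """Return P(child | parent) as {(child_value, parent_value): prob}.

    `child` and `parent` are 0-based positions in the key tuples
    (hypothetical helper, written only for this illustration).
    """
    joint = defaultdict(float)     # P(child, parent)
    marginal = defaultdict(float)  # P(parent)
    for assignment, p in jpd.items():
        joint[(assignment[child], assignment[parent])] += p
        marginal[assignment[parent]] += p
    return {(c, a): p / marginal[a] for (c, a), p in joint.items()}


# X3 is at position 2 and X2 at position 1 in the tuples.
cpt_x3_given_x2 = conditional_table(JPD, child=2, parent=1)
for (x3, x2), p in sorted(cpt_x3_given_x2.items()):
    print(f"P(X3={x3} | X2={x2}) = {p:.2f}")
```

Rounded to two decimals, the four entries match the 0.33 / 0.67 values shown in the slide's Θ(X3) table.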
  7. Explain or Predict (classify)
     Tree / GBN (Chow & Liu, 1968; Williamson, 2000):
     • True distribution: $p(X)$
     • Modeled distribution: $q(X)$
     • Objective: $p(X)$
     • Principle: minimize $D_{KL}\big(p(X) \,\|\, q(X)\big)$
     • Consequence: maximize $\sum_i I(X_i; Z_i)$
     TBNL (Gruber & Ben-Gal, 2010):
     • True distribution: $p(X)$
     • Modeled distribution: $q(X)$
     • Objective: $p(X_i) = \sum_{x \in X \setminus X_i} p(X_i \mid x)\, p(x)$
     • Principle: minimize $D_{KL}\big(p(X_i) \,\|\, q(X_i)\big)$
     • Consequence: maximize $I(X_i; Z_i)$ and maximize $\sum_{X_j \in Z_i} I(X_j; Z_j)$
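Both columns on slide 7 are expressed through KL divergence and mutual information. As a rough illustration, the sketch below computes I(X_i; X_j) as the KL divergence between a pairwise joint distribution and the product of its marginals, using the joint table from slide 6 as input. It assumes Z_i stands for a candidate parent of X_i, which the slide does not state, and it scores one candidate parent at a time; it is not the TBNL algorithm itself. The function names are hypothetical.

```python
import math
from collections import defaultdict

# Joint probability table from slide 6: (X1, X2, X3, X4) -> probability.
JPD = {
    (1, 1, 1, 2): 0.083,
    (1, 1, 2, 2): 0.167,
    (1, 2, 2, 3): 0.25,
    (2, 2, 1, 1): 0.25,
    (2, 2, 2, 1): 0.25,
}


def kl_divergence(p, q):
    """D_KL(p || q) in bits; q must cover every key where p is positive."""
    return sum(pv * math.log2(pv / q[k]) for k, pv in p.items() if pv > 0)


def mutual_information(jpd, i, j):
    """I(X_i; X_j) = D_KL( p(x_i, x_j) || p(x_i) p(x_j) ), 0-based positions."""
    joint = defaultdict(float)
    p_i = defaultdict(float)
    p_j = defaultdict(float)
    for assignment, p in jpd.items():
        a, b = assignment[i], assignment[j]
        joint[(a, b)] += p
        p_i[a] += p
        p_j[b] += p
    product = {(a, b): p_i[a] * p_j[b] for (a, b) in joint}
    return kl_divergence(joint, product)


mi_x1_x2 = mutual_information(JPD, 0, 1)  # X1 vs. X2
mi_x3_x2 = mutual_information(JPD, 2, 1)  # X3 vs. X2
print(f"I(X1; X2) = {mi_x1_x2:.3f} bits")
print(f"I(X3; X2) = {mi_x3_x2:.6f} bits")
```

On this table X2 carries roughly 0.31 bits about X1 but almost nothing about X3, which agrees with the Θ(X3) table having the same column for both values of X2.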
  8. Unconstrained Learning. Assume $X_3$ is the target variable. [Figure: GBN (adding-arrows) vs. Target-Oriented (TBNL) structures; panels labeled i=1, i=3, i=4.] Equivalent encoding!
  9. Constrained Learning. Assume $X_3$ is the target variable. [Figure: GBN (adding-arrows) vs. Target-Oriented (TBNL) structures; panels labeled i=1, i=3, i=4.]
  10. Differential Complexity: Explain vs. Predict (Classify), in terms of $\eta_r$ and $\eta_t$
     • $\eta_t$ = maximum percentage relative information exploitation about the target
     • $\eta_r$ = maximum percentage relative information exploitation about the rest of the attributes
  11. Results (1/2): Data Sets Properties and Testing Methods
     Dataset        # Attributes   # Classes   # Instances   Test      Instances/Attributes Ratio
     australian     14             2           690           CV5       ~49
     breast         9              2           683           CV5       ~76
     chess          36             2           3196          holdout   ~89
     cleve          11             2           196           CV5       ~18
     corral         6              2           128           CV5       ~21
     crx            15             2           653           CV5       ~44
     german         20             2           1000          CV5       ~50
     glass          9              7           214           CV5       ~24
     Iris           5              3           150           CV5       ~30
     lymphography   18             4           148           CV5       ~8
     mofn-3-7-10    10             2           1324          holdout   ~132
     vote           16             3           435           CV5       ~27
  12. Naïve Bayes: Predict. [Figure: naïve Bayes network for the Corral dataset; nodes Class, A0, A1, B0, B1, Correlated, Irrelevant.]
  13. Tree Augmented Network (TAN). [Figure: six TAN structures over the nodes Class, A0, A1, B0, B1, Correlated, Irrelevant.]
  14. Managing the Trade-off. [Figure: three panels labeled CV5, CV5, and Holdout 2/3:1/3.]
  15. Results (2/2): Accuracy
     Dataset        TBNL    BNC-2P   NB     TAN    C4.5   HGC
     australian     83.3    87.0     85.1   82.5   84.9   85.6
     breast         95.9    95.8     97.6   96.5   93.9   97.6
     chess          96.9    95.8     87.3   92.4   99.5   95.3
     cleve          81.4    80.0     82.1   78.4   79.4   78.7
     corral         100.0   98.8     87.2   98.6   98.5   100.0
     crx            86.4    84.2     85.0   83.7   86.1   86.9
     german         69.7    73.6     75.4   73.9   72.9   72.5
     glass          60.0    58.3     55.9   54.2   59.3   31.2
     Iris           97.0    95.8     93.0   92.4   96.0   95.7
     lymphography   81.8    83.7     83.4   82.2   78.4   63.8
     mofn-3-7-10    100.0   91.4     86.7   91.5   84.0   86.7
     vote           96.0    95.8     90.1   94.9   94.7   95.4
     Average        87.4    86.7     84.1   85.1   85.6   82.4
     StdE           4%      3%       3%     4%     3%     6%
     Best and worst methods (incl. 5% runner-up) are marked in bold and italic respectively on the original slide. Paired t-tests show significance.
  16. Presentation Layout
     • Bayesian networks and classifiers
     • Targeted Bayesian Network Learning (TBNL)
     • TBNL application on suspects monitoring (w. Gruber & Yanovski)
     • Summary
  17. Domain Description
     • Motivation
       • Simplicity: complexity-error tradeoff
       • Information extraction: utilization of meta-data
       • Support: help the expert understand
     • Available Data
       • CDR
       • Privatized
       • Laundered
     • Requirements
       • 50% recall with at most 1% false alarm
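The requirement on slide 17 (50% recall with at most 1% false alarm) can be read as a constraint on where a classifier's score threshold sits on the ROC curve. The sketch below scans candidate thresholds and keeps the one with the highest recall whose false-alarm rate stays within the bound. The scores and labels are made-up toy data, and `best_threshold` is a hypothetical helper, not something taken from the presentation.

```python
def best_threshold(scores, labels, max_false_alarm=0.01):
    """Return (threshold, recall, false_alarm) with the highest recall among
    thresholds whose false-alarm rate is <= max_false_alarm, or None.

    A record is flagged as a target when its score is >= threshold.
    labels: 1 for target, 0 for non-target.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    best = None
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        recall = tp / positives
        false_alarm = fp / negatives
        if false_alarm <= max_false_alarm and (best is None or recall > best[1]):
            best = (t, recall, false_alarm)
    return best


# Toy data (illustrative only): 4 targets and 200 non-targets.
labels = [1] * 4 + [0] * 200
scores = [0.9, 0.8, 0.4, 0.3] + [0.85, 0.7, 0.35] + [0.1] * 197

result = best_threshold(scores, labels)
threshold, recall, false_alarm = result
meets_requirement = recall >= 0.5 and false_alarm <= 0.01
print(threshold, recall, false_alarm, meets_requirement)
```

With this toy data, lowering the threshold below 0.4 would pick up a third false alarm out of 200 non-targets and break the 1% bound, so the scan stops improving there.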
  18. Data Description of the Domain: Call Detail Record (CDR)
     Field         Description
     Main party    Monitored object unique identifier
     Other party   Other party unique identifier
     year          Year of call start
     month         Month of call start
     day           Day of call start
     hour          Hour of call start
     minute        Minute of call start
     second        Second of call start
     duration      Call duration in seconds
     caller        Indication of call initiator, {1/0}: 1 = main party initiated the call; 0 = other party initiated the call
     type_id       Type of interaction, {1/0}: 1 = phone call; 0 = SMS (text message)
     tag           Type (group) of monitored object, {1/0}: 0 = main party is a non-target; 1 = main party is a target
  19. ROC curve. [Figure: ROC curve annotated “40 suspects to no avail” and “1900 missed targets”.]
  20. Feature Extraction. Activity of calls during the day of two distinct groups. Inter_prc_q1, Inter_prc_q2, Inter_prc_q3, Inter_prc_q4: percentage of activities in the 1st, 2nd, 3rd and 4th quarter of the day.
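As a rough illustration of the Inter_prc_q1 to Inter_prc_q4 features, the sketch below derives them per main party from the `hour` field of the CDR. It assumes the quarters are the hour ranges 0-5, 6-11, 12-17 and 18-23, which the slides do not specify, and the record layout and the name `quarter_features` are hypothetical.

```python
from collections import Counter, defaultdict


def quarter_features(records):
    """records: iterable of (main_party, hour) pairs taken from CDR rows.

    Returns {main_party: {"Inter_prc_q1": pct, ..., "Inter_prc_q4": pct}},
    where each pct is the percentage of that party's activities falling in
    the given quarter of the day (assumed here to be 6-hour blocks).
    """
    hours_by_party = defaultdict(list)
    for main_party, hour in records:
        hours_by_party[main_party].append(hour)

    features = {}
    for party, hours in hours_by_party.items():
        counts = Counter(hour // 6 for hour in hours)  # quarter index 0..3
        features[party] = {
            f"Inter_prc_q{q + 1}": 100.0 * counts[q] / len(hours)
            for q in range(4)
        }
    return features


# Toy records (illustrative only): party "A" is active mostly in the daytime,
# party "B" mostly at night.
records = [("A", h) for h in [1, 7, 8, 13, 14, 15, 20, 23]]
records += [("B", h) for h in [22, 23, 0, 2]]

features = quarter_features(records)
for party, row in features.items():
    print(party, row)
```

Each party's four percentages sum to 100, so the features describe the shape of the daily activity profile rather than its volume.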
  21. Learning & Mining Mobility Patterns (PIs: Ben-Gal, Toch and Lerner, 2012)
  22. Conclusions
     • “To Explain or To Predict”, “To Know or To Act” (constraint modeling)
     • Managing the error-complexity tradeoff!
     • An “engineering approach” to modeling
       • Target-based BN Learning (2006), Gruber and Ben-Gal (2010)…
       • Vs-optimality criterion ⇒ min V(x*), Ginsburg and Ben-Gal (2006)
       • VOBN, Ben-Gal et al. (2005): scenario dependent
       • More…
  23. Prediction can help…
