DEBS 2012 uncertainty tutorial


  1. 1. Event processing under uncertainty – Tutorial. Alexander Artikis (NCSR Demokritos, Greece), Opher Etzion, Zohar Feldman, and Fabiana Fournier (IBM Haifa Research Lab, Israel). DEBS 2012, July 16, 2012. We acknowledge the help of Jason Filippou and Anastasios Skarlatidis.
  2. 2. Event uncertainty from a petroleum producer point of view
  3. 3. Outline:
  - Part I: Introduction and illustrative example (30 minutes, Opher Etzion)
  - Part II: Uncertainty representation (15 minutes, Opher Etzion)
  - Part III: Extending event processing to handle uncertainty (45 minutes, Opher Etzion)
  - Break
  - Part IV: AI-based models for event recognition under uncertainty (80 minutes, Alexander Artikis)
  - Part V: Open issues and summary (10 minutes, Opher Etzion)
  4. 4. Event processing under uncertainty tutorial. Part I: Introduction and illustrative example
  5. 5. By 2015, 80% of all available data will be uncertain. By 2015 the number of networked devices will be double the entire global population; all sensor data has uncertainty. The total number of social media accounts exceeds the entire global population, and this data is highly uncertain in both its expression and content. Data quality solutions exist for enterprise data like customer, product, and address data, but this is only a fraction of the total enterprise data. [Chart: global data volume in exabytes and aggregate uncertainty (%), 2005-2015. Multiple sources: IDC, Cisco]
  6. 6. The big data perspective. Big data is one of the three technology trends at the leading edge a CEO cannot afford to overlook in 2012 [Gartner #228708, January 2012]. Dimensions of big data: Velocity (data in motion), Variety (data in many forms), Veracity (data in doubt), Volume (data processed in RT).
  7. 7. The big data perspective in focus. Big data is one of the three technology trends at the leading edge a CEO cannot afford to overlook in 2012 [Gartner #228708, January 2012]. Dimensions of big data: Velocity (data processed in RT), Variety (data in many forms), Veracity (data in doubt), Volume (data in motion).
  8. 8. Uncertainty in event processing. State-of-the-art systems assume that data satisfies the "closed world assumption", being complete and precise as a result of a cleansing process before the data is utilized; processing data is deterministic. The problem: in real applications events may be uncertain or have imprecise content for various reasons (missing data, inaccurate or noisy input, e.g. data from sensors or social media). Often, in real-time applications cleansing the data is not feasible due to time constraints, so online data is not leveraged for immediate operational decisions, and mishandling of uncertainty may result in undesirable outcomes. This tutorial presents common types of uncertainty and different ways of handling uncertainty in event processing.
  9. 9. Representative sources of uncertainty. [Diagram: uncertain input data and events arise from sensor inaccuracy, malfunction, human error, rumors, fake tweets, a malicious disrupter, sampling or approximation (e.g. a wrong trend, a wrong hourly sales summary), and projection of temporal anomalies; further uncertainty comes from propagation of uncertainty and from inference based on uncertain values. Example sources: thermometer, visual data, sensor.]
  10. 10. Event processing. [Diagram: at build time, an authoring tool produces rules/patterns and event definitions; at run time, the runtime engine receives events from event sources, performs situation detection, and emits detected situations that trigger actions.]
  11. 11. Types of uncertainty in event processing. [Diagram: the event processing architecture annotated with uncertainty types: imprecise event patterns at build time; incomplete event streams, insufficient event dictionary, erroneous event recognition, and inconsistent event annotation at run time.] Uncertainty can lie in the event input, in the composite event pattern, or in both.
  12. 12. Illustrative example – Surveillance system for crime detection. A visual surveillance system provides data to an event-driven application. The goal of the event processing application is to detect and alert in real time possible occurrences of crimes (e.g. a drug deal) based on the surveillance system and citizens' reports. Real-time alerts are posted to human security officers: 1. whenever the same person is observed participating in a potential crime scene more than 5 times in a certain day; 2. identification of locations in which more than 10 suspicious detections were observed in three consecutive days; 3. report of the three most suspicious locations in the last month.
  13. 13. Some terminology: event producer/consumer, event type, event stream, Event Processing Agent (EPA), Event Channel (EC), Event Processing Network (EPN), context (temporal, spatial and segmentation).
  14. 14. Illustrative example – Event Processing Network (EPN). [Diagram: citizens and surveillance video feed crime reports and observations into the EPN. EPAs: Crime observation, Mega suspect detection, Split mega suspect, Crime report matching, Suspicious location detection, Most frequent locations. Consumers: police (matched crime alerts), special security forces (known criminal and mega suspect alerts), patrol forces (suspicious location alerts), statistics police dept. (most frequent locations report). A criminals data store supports the split into known vs. discovered criminals. Legend: EPA = Event Processing Agent; producer/consumer; channel.]
  15. 15. Crime observation. Uncertainty: wrong identification, missing events, noisy pictures. EPA: Filter; input: Observation (crime indication, identity, ...); output: Suspicious observation; Assertion: crime indication = 'true'; Context: Always. [EPN diagram as in slide 14.]
  16. 16. Mega suspect detection. Uncertainty: wrong identification, missing events, wrong threshold. EPA: Threshold; input: Suspicious observation; output: Mega suspect; Assertion: Count(Suspicious observation) > 5; Context: Daily per suspect. [EPN diagram as in slide 14.]
  17. 17. Split mega suspect. Uncertainty: wrong identification, missing events, wrong threshold. EPA: Split; input: Mega suspect; emit Known criminal (with picture) when the suspect exists in the criminal-record store, emit Discovered criminal (with picture) otherwise; Context: Always. [EPN diagram as in slide 14.]
  18. 18. Crime report matching. Uncertainty: missed events, wrong sequence, inaccurate location, inaccurate crime type. EPA: Sequence (Suspicious observation [override], Crime report); output: Matched crime; Context: crime type per location. [EPN diagram as in slide 14.]
  19. 19. Suspicious location detection. Uncertainty: missed events, wrong timing, inaccurate location. EPA: Threshold; input: Suspicious observation; output: Suspicious location; Assertion: Count(Suspicious observation) > 10; Context: location, sliding 3-day interval. [EPN diagram as in slide 14.]
  20. 20. Most frequent locations. Uncertainty: derived from the uncertainty in the Suspicious location events. EPA: Top-K; input: Suspicious location; output: Top locations list; Order by: count(Suspicious location); Context: Month. [EPN diagram as in slide 14.]
  21. 21. Event processing under uncertainty tutorial. Part II: Uncertainty representation
  22. 22. Occurrence uncertainty: uncertainty whether an event has occurred. The event header has an attribute labelled "certainty" with values in the range [0,1]. Example: Event type: Crime observation; Occurrence time: 7/7/12, 12:23; Certainty: 0.73. In raw events the certainty is given by the source (e.g. sensor accuracy, level of confidence); in derived events it is calculated as part of the event derivation.
  23. 23. Temporal uncertainty: uncertainty when an event has occurred. The event header has an attribute labelled "occurrence time" which may designate a timestamp, an interval, or a distribution over an interval. Example: Event type: Crime observation; Occurrence time: U(7/7/12 12:00, 7/7/12 13:00); Certainty: 0.9. In raw events it is given by the source; in derived events it is calculated as part of the event derivation.
  24. 24. The interval semantics: 1. the event happens during the entire interval (occurrence time type: interval); 2. the event happens at some unknown time-point within the interval (occurrence time type: timestamp, instance type: interval); 3. the event happens at some time-point within an interval and the distribution of probabilities is known (occurrence time type: timestamp, instance type: interval + distribution). Example: U(7/7/12 12:00, 7/7/12 13:00).
  25. 25. Spatial uncertainty: uncertainty where an event has occurred. The event header has an attribute labelled "location" which may designate a point, a poly-line, an area, or a distribution of points within a poly-line or area. Locations can be specified either in coordinates or in labels (e.g. addresses). Example: Event type: Crime observation; Occurrence time: U(7/7/12 12:00, 7/7/12 13:00); Certainty: 0.9; Location: near 30 Market Street, Philadelphia. In raw events it is given by the source; in derived events it is calculated as part of the event derivation.
  26. 26. The spatial semantics. A poly-line may designate a road, e.g. the M2 in the UK (location type: poly-line, location := M2). An area may designate a bounded geographical area, e.g. the city of Princeton (location type: area, location := Princeton). Like the temporal interval, the designation of a poly-line or area may be interpreted as: the event happens in the entire area, the event happens at some unknown point within the area, or the event happens within the area with some distribution of points, e.g. location type: point, location := U(center = city-hall Princeton, radius = 2KM).
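To make the Part II representation concrete, here is a minimal sketch (not from the tutorial) of an event header carrying occurrence, temporal, spatial and content uncertainty. The class, field names, and the interval/discrete encodings are illustrative assumptions, not the tutorial's data model.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

@dataclass
class UncertainEvent:
    """Illustrative event header: occurrence, temporal, spatial and content uncertainty."""
    event_type: str
    certainty: float                                    # occurrence uncertainty, in [0, 1]
    occurrence_time: Tuple[float, float]                # interval / U(lo, hi), e.g. minutes of the day
    location: Optional[Dict[str, float]] = None         # e.g. distribution over labelled areas
    attributes: Dict[str, Dict[str, float]] = field(default_factory=dict)  # probabilistic values

crime_obs = UncertainEvent(
    event_type="Crime observation",
    certainty=0.73,                                     # given by the source for a raw event
    occurrence_time=(12 * 60, 13 * 60),                 # U(12:00, 13:00)
    location={"near 30 Market Street, Philadelphia": 1.0},
    attributes={"crime_indication": {"true": 0.7, "false": 0.3}},
)
```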
  27. 27. Uncertainty about the event content – attribute values (meta-data). Meta-data definition: attribute values may be probabilistic.
  28. 28. Uncertainty about the event content – attribute value example. Instance: "Crime indication" has an explicit distribution.
  29. 29. Uncertainty in situation detection. Recall the "Mega suspect detection" EPA: Threshold; Assertion: Count(Suspicious observation) > 5; Context: Daily per suspect. The detection itself can have some probability; furthermore, there can be variations (due to possibly missing events) when count = 5, count = 4, etc. Thus it may become a collection of assertions with probabilities, e.g. Assertion: Count(Suspicious observation) > 5 with probability 0.9, = 5 with probability 0.45, = 4 with probability 0.2; Context: Daily per suspect.
  30. 30. Event processing under uncertainty tutorial. Part III: Extending event processing to handle uncertainty
  31. 31. Uncertainty handling. Traditional event processing needs to be enhanced to account for uncertain events. Two main handling methods: uncertainty propagation (the uncertainty of input events is propagated to the derived events) and uncertainty flattening (uncertain values are replaced with deterministic equivalents; events may be ignored).
  32. 32. Topics covered: 1. Expressions*; 2. Filtering; 3. Split; 4. Context assignment; 5. Pattern matching examples: a. Top-K, b. Sequence. *Note that transformations and aggregations are covered by expressions.
  33. 33. Expressions (1/4). Expressions are used extensively in event processing in derivation (assigning values to derived events) and in assertions (filter and pattern detection). Expressions comprise a combination of operators (arithmetic functions, e.g. sum, min; logical operators, e.g. >, =; textual operators, e.g. concatenation) and operands (terms) referencing particular event attributes or general constants.
  34. 34. Expressions (2/4). Example: summing the number of suspicious observations in two locations: location1.num-observations + location2.num-observations. Deterministic case: 12 + 6 = 18. Stochastic case: a distribution over {11, 12, 13} is added to a distribution over {4, 5, 6, 7}, yielding a distribution over the possible sums.
  35. 35. Expressions (3/4). The 'uncertainty propagation' approach: operator overloading, either explicit (e.g. Normal(1,1) + Normal(3,1) = Normal(4,2)), by numerical convolution (general distributions, probabilistic dependencies), or by Monte-Carlo methods (generate samples, evaluate the result and aggregate). The 'uncertainty flattening' approach: replace the original expression using a predefined policy. Some examples: expectation: X+Y becomes Expectation(X+Y); confidence value: X+Y becomes Percentile(X+Y, 0.9); X>5 becomes 'true' if Prob{X>5} > 0.9, 'false' otherwise.
  36. 36. Expressions (4/4). Probabilistic operators (X is a stochastic term):
  - expectation(X): the expectation of the stochastic term X
  - variance(X): the variance of the stochastic term X
  - pdf(X, x): the probability of X taking the value x
  - cdf(X, x): the probability that X takes a value not larger than x
  - percentile(X, α): the smallest number z such that cdf(X, z) ≥ α
  - frequent(X): a value x that maximizes pdf(X, ·)
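The operators above and the propagation/flattening policies of slide 35 can be made concrete with a small sketch. The following Python snippet is not part of the tutorial; the Discrete class and its method names are illustrative, and the overloaded + implements convolution under an independence assumption as one way to realise 'uncertainty propagation'.

```python
from itertools import product

class Discrete:
    """A finite discrete distribution: value -> probability (probabilities sum to 1)."""
    def __init__(self, pmf):
        self.pmf = dict(pmf)

    # --- probabilistic operators from slide 36 ---
    def expectation(self):
        return sum(v * p for v, p in self.pmf.items())

    def variance(self):
        m = self.expectation()
        return sum(p * (v - m) ** 2 for v, p in self.pmf.items())

    def pdf(self, x):
        return self.pmf.get(x, 0.0)

    def cdf(self, x):
        return sum(p for v, p in self.pmf.items() if v <= x)

    def percentile(self, alpha):
        acc = 0.0
        for v in sorted(self.pmf):
            acc += self.pmf[v]
            if acc >= alpha:
                return v

    def frequent(self):
        return max(self.pmf, key=self.pmf.get)

    # --- uncertainty propagation: '+' as convolution, assuming independence ---
    def __add__(self, other):
        out = {}
        for (v1, p1), (v2, p2) in product(self.pmf.items(), other.pmf.items()):
            out[v1 + v2] = out.get(v1 + v2, 0.0) + p1 * p2
        return Discrete(out)

# Slide 34 example: suspicious-observation counts in two locations
loc1 = Discrete({11: 1/3, 12: 1/3, 13: 1/3})
loc2 = Discrete({4: 0.25, 5: 0.25, 6: 0.25, 7: 0.25})

total = loc1 + loc2                  # propagation: the full distribution of the sum
print(total.expectation())           # flattening by expectation (17.5)
print(total.percentile(0.9))         # flattening by a confidence value
```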
  37. 37. Filtering (1/3). Should the 'Observation' event instance pass the filter? Input: Observation with Certainty 0.8 and Crime-Indication ['true', 0.7; 'false', 0.3]. Crime observation filter EPA: Assertion: crime-indication = 'true'; Context: Always.
  38. 38. Filtering (2/3). The 'uncertainty propagation' approach: derived certainty = observation.certainty * Prob{assertion}. Input: Observation with Certainty 0.8 and Crime-Indication ['true', 0.7; 'false', 0.3]. Output: Suspicious observation with Certainty 0.8 x 0.7.
  39. 39. Filtering (3/3). The 'uncertainty flattening' approach: pass the event if observation.certainty * Prob{assertion} > 0.5, and emit it as certain. Input: Observation with Certainty 0.8 and Crime-Indication ['true', 0.7; 'false', 0.3]. Output: Suspicious observation with Certainty 1.
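A minimal sketch of the two filtering policies on slides 38-39, assuming events are plain dictionaries with a 'certainty' field and a discrete 'crime_indication' distribution (the field names and the 0.5 threshold value are taken from the slides; everything else is illustrative, not the tutorial's API):

```python
def filter_propagate(event, prob_assertion):
    """Uncertainty propagation: scale the derived event's certainty."""
    derived = dict(event)
    derived["certainty"] = event["certainty"] * prob_assertion
    return derived

def filter_flatten(event, prob_assertion, threshold=0.5):
    """Uncertainty flattening: emit a fully certain event only if confident enough."""
    if event["certainty"] * prob_assertion > threshold:
        derived = dict(event)
        derived["certainty"] = 1.0
        return derived
    return None  # otherwise the event is ignored

obs = {"type": "Observation", "certainty": 0.8,
       "crime_indication": {"true": 0.7, "false": 0.3}}
p_assert = obs["crime_indication"]["true"]    # Prob{crime-indication = 'true'}

print(filter_propagate(obs, p_assert))        # certainty 0.56
print(filter_flatten(obs, p_assert))          # certainty 1.0, since 0.56 > 0.5
```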
  40. 40. Split (1/3). Split mega-suspect EPA: Pattern: Split; Context: Always; route to Known criminal when Id is not null and to Discovered criminal when Id is null. Input: Mega suspect with Certainty 0.75 and Id ['John Doe1', 0.3; NA, 0.7]. Question: how should the two output events (Known criminal and Discovered criminal, each shown with Certainty 0.75 and Id ['John Doe1', 0.3; NA, 0.7]) be derived?
  41. 41. Split (2/3). The 'uncertainty propagation' approach. Input: Mega suspect with Certainty 0.75 and Id ['John Doe1', 0.3; NA, 0.7]. Outputs: Known criminal with Certainty 0.75 x 0.3 and Id 'John Doe1'; Discovered criminal with Certainty 0.75 x 0.7 and Id NA.
  42. 42. Split (3/3). The 'uncertainty flattening' approach: Id → frequent(Id). Input: Mega suspect with Certainty 0.75 and Id ['John Doe1', 0.3; NA, 0.7]. Since frequent(Id) = NA, a single Discovered criminal event is emitted with Certainty 0.75 and Id NA.
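A small sketch of the two split policies on slides 41-42, again with illustrative dictionary events; the routing rule (non-null Id goes to Known criminal) follows the slides, while the function names are assumptions:

```python
def split_propagate(mega):
    """Propagation: emit both branches, scaling certainty by the branch probability."""
    p_known = sum(p for v, p in mega["id"].items() if v != "NA")
    known = {"type": "Known criminal",
             "certainty": mega["certainty"] * p_known,
             "id": max((v for v in mega["id"] if v != "NA"),
                       key=mega["id"].get, default=None)}
    discovered = {"type": "Discovered criminal",
                  "certainty": mega["certainty"] * mega["id"].get("NA", 0.0),
                  "id": "NA"}
    return [known, discovered]

def split_flatten(mega):
    """Flattening: replace Id by frequent(Id), then route deterministically."""
    best = max(mega["id"], key=mega["id"].get)
    branch = "Discovered criminal" if best == "NA" else "Known criminal"
    return {"type": branch, "certainty": mega["certainty"], "id": best}

mega = {"type": "Mega suspect", "certainty": 0.75,
        "id": {"John Doe1": 0.3, "NA": 0.7}}
print(split_propagate(mega))   # certainties 0.225 and 0.525
print(split_flatten(mega))     # a single Discovered criminal with certainty 0.75
```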
  43. 43. Context assignment – temporal. Example 1 (1/3). Should the 'Suspicious observation' event be assigned to the context segment 'July 16, 2012, 10AM-11AM'? Event: Suspicious Observation with Certainty 0.8, Occurrence time Uni(9:45AM, 10:05AM), Id 'John Doe'. Mega suspect detection context segment: 'John Doe', 'July 16, 2012, 10AM-11AM'.
  44. 44. Context assignment – temporal. Example 1 (2/3). The 'uncertainty propagation' approach: derived certainty = Prob{occurrence time ∊ [10AM, 11AM]} * certainty. Event: Suspicious Observation with Certainty 0.2, Occurrence time Uni(9:45AM, 10:05AM), Id 'John Doe'. Context segment: 'John Doe', 'July 16, 2012, 10AM-11AM'.
  45. 45. Context assignment – temporal. Example 1 (3/3). The 'uncertainty flattening' approach: assign the event only if Prob{occurrence time ∊ [10AM, 11AM]} > 0.5. Event: Suspicious Observation with Certainty 0.2, Occurrence time Uni(9:45AM, 10:05AM), Id 'John Doe'. Context segment: 'John Doe', 'July 16, 2012, 10AM-11AM'.
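A small worked sketch of the arithmetic behind Example 1, assuming the occurrence time is uniform on (9:45, 10:05) as on the slides; encoding times as minutes since 9:00 is purely an illustration choice:

```python
def uniform_overlap(lo, hi, a, b):
    """Prob{ U(lo, hi) falls inside [a, b] }, times given in minutes."""
    inter = max(0.0, min(hi, b) - max(lo, a))
    return inter / (hi - lo)

# Uni(9:45, 10:05) against the context window [10:00, 11:00], minutes since 9:00
p_in_window = uniform_overlap(45, 65, 60, 120)   # 5 / 20 = 0.25

certainty = 0.8
print(p_in_window * certainty)                    # propagation: 0.2, as on slide 44
print(p_in_window > 0.5)                          # flattening: False, event not assigned
```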
  46. 46. Context assignment – temporal. Example 2. Should the uncertain 'Suspicious observation' event initiate a new context segment, or participate in an existing context segment? Event: Suspicious Observation with Certainty 0.3, Occurrence time Uni(9:45AM, 10:05AM), Id 'John Doe'. Candidates: an existing Suspicious observation detection context for 'John Doe' (state: 1 instance) and a new Suspicious observation detection context for 'John Doe'.
  47. 47. Context assignment – segmentation-oriented (1/3). Which of the context segments should the 'Suspicious observation' event be assigned to? Event: Suspicious Observation with Certainty 0.8, Occurrence time Uni(9:45AM, 10:05AM), Id ['John Doe1', 0.3; 'John Doe2', 0.7]. Candidate Mega suspect detection context segments: 'John Doe1', 'July 16, 2012, 10AM-11AM' and 'John Doe2', 'July 16, 2012, 10AM-11AM'.
  48. 48. Context assignment – segmentation-oriented (2/3). The 'uncertainty propagation' approach: the event joins both segments with certainty = observation.certainty * Prob{Id}. Segment 'John Doe1': Certainty 0.8 x 0.3 = 0.24; segment 'John Doe2': Certainty 0.8 x 0.7 = 0.56.
  49. 49. Context assignment – segmentation-oriented (3/3). The 'uncertainty flattening' approach: associate the event with the context segment of frequent(Id), here 'John Doe2', keeping Certainty 0.8.
  50. 50. Pattern matching: Top-K (1/3) (aka event recognition). What is the location with the most crime indications? Input: Suspicious location events for 'downtown' with Certainty 0.3 and a count distribution over {11, 12, 13}, and for 'suburbs' with Certainty 0.75 and a count distribution over {10, 11, 12, 13, 14}. Most frequent locations EPA: Pattern: Top-K (K=1); Order by: count(Suspicious location); Context: Month.
  51. 51. Pattern matching: Top-K (2/3). The 'uncertainty propagation' approach: the result is a distribution over orderings, e.g. Order = ('downtown', 'suburbs') with probability Prob{count1 > count2}, and ('suburbs', 'downtown') otherwise. Most frequent locations EPA: Pattern: Top-K (K=1); Order by: count(Suspicious location); Context: Month.
  52. 52. Pattern matching: Top-K (3/3). The 'uncertainty flattening' approach: order by f(count(Suspicious location)) for a deterministic summary f, e.g. expectation, cdf, ... Output: Top suspicious location list with Certainty 0.3 and list = 'downtown'. Most frequent locations EPA: Pattern: Top-K (K=1); Order by: f(count(Suspicious location)); Context: Month.
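A sketch of the Top-K calculation under the two approaches, assuming the two count distributions of slide 50 are uniform and independent (an assumption; the slides do not state independence), and using expectation as the flattening summary f:

```python
from itertools import product

def prob_greater(pmf1, pmf2):
    """Prob{count1 > count2} for independent discrete counts (propagation)."""
    return sum(p1 * p2 for (c1, p1), (c2, p2)
               in product(pmf1.items(), pmf2.items()) if c1 > c2)

def expectation(pmf):
    return sum(c * p for c, p in pmf.items())

# Illustrative count distributions for the two locations (uniform, as sketched on slide 50)
counts = {"downtown": {11: 1/3, 12: 1/3, 13: 1/3},
          "suburbs":  {c: 1/5 for c in (10, 11, 12, 13, 14)}}

# Propagation: the Top-1 result is itself a distribution over orderings
p_down = prob_greater(counts["downtown"], counts["suburbs"])
p_sub = prob_greater(counts["suburbs"], counts["downtown"])
print({"downtown first": p_down, "suburbs first": p_sub, "tie": 1 - p_down - p_sub})

# Flattening: rank by a deterministic summary (ties broken arbitrarily here)
ranking = sorted(counts, key=lambda loc: expectation(counts[loc]), reverse=True)
print(ranking[:1])
```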
  53. 53. Pattern matching: Sequence (1/3). Should the following event instances be matched as a sequence? Suspicious Observation: Certainty 0.8; Occurrence time Uni(9:45AM, 10:05AM); Id 'John Doe'. Crime report: Certainty 0.9; Occurrence time 10:02AM; Id NA. Crime report matching EPA: Pattern: Sequence [Suspicious observation, Crime report]; Context: Location, Crime type.
  54. 54. Pattern matching: Sequence (2/3). The 'uncertainty propagation' approach: matched-crime certainty = obs.certainty * crime.certainty * Prob{obs.time < crime.time}. Output: Matched crime with Certainty 0.612, Occurrence time Uni(9:45AM, 10:02AM) (i.e. obs.time conditioned on obs.time < crime.time), Id 'John Doe'. Crime report matching EPA: Pattern: Sequence; Context: Location, Crime type.
  55. 55. Pattern matching: Sequence (3/3). The 'uncertainty flattening' approach: occurrence time → percentile(occurrence time, 0.5). Output: Matched crime with Certainty 0.72, Occurrence time 9:55AM, Id 'John Doe'. Crime report matching EPA: Pattern: Sequence; Context: Location, Crime type.
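A small sketch reproducing the 0.612 and 0.72 values from slides 54-55, assuming a uniform occurrence time for the observation; the time encoding (minutes since 9:00) and dictionary event representation are illustrative:

```python
def prob_uniform_before(lo, hi, t):
    """Prob{ U(lo, hi) < t }, times in minutes."""
    return min(max((t - lo) / (hi - lo), 0.0), 1.0)

# minutes since 9:00
obs = {"certainty": 0.8, "time_lo": 45, "time_hi": 65}   # Uni(9:45, 10:05)
crime = {"certainty": 0.9, "time": 62}                   # 10:02

# Propagation: scale by the probability that the sequence order actually holds
p_order = prob_uniform_before(obs["time_lo"], obs["time_hi"], crime["time"])  # 0.85
print(obs["certainty"] * crime["certainty"] * p_order)                        # 0.612

# Flattening: replace the uncertain time by its median, then match deterministically
obs_time = (obs["time_lo"] + obs["time_hi"]) / 2          # percentile(time, 0.5) = 9:55
if obs_time < crime["time"]:
    print(obs["certainty"] * crime["certainty"])          # 0.72
```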
  56. 56. Event processing under uncertainty tutorial. Part IV: Event Recognition under Uncertainty in Artificial Intelligence
  57. 57. Event Recognition under Uncertainty in Artificial Intelligence. AI-based systems already deal with various types of uncertainty: erroneous input data detection; imperfect composite event definitions. Overview: logic programming; Markov logic networks.
  58. 58. Human Recognition from Video: Bilattice-based Reasoning Aim: Recognise humans given input data from part-based classifiers operating on video. Uncertainty: Erroneous input data detection. Imperfect composite event definitions. Approach: Extend logic programming by incorporating a bilattice: Consider both supportive and contradicting information about a given human hypothesis. For every human hypothesis, provide a list of justifications (proofs) that support or contradict the hypothesis.
  59. 59. Human Recognition from Video
  60. 60. Human Recognition from Video Evidence FOR hypothesis “Entity1 is a human”: Head visible. Torso visible. Scene is consistent at 1’s position.
  61. 61. Human Recognition from Video Evidence AGAINST hypothesis “Entity1 is a human”: Legs occluded. However, the missing legs can be explained because of the occlusion by the image boundary. If occluded body parts can be explained by a rule, the hypothesis is further supported.
  62. 62. Human Recognition from Video Evidence FOR hypothesis “Entity A is a human”: A is on the ground plane. Scene is consistent at A’s position.
  63. 63. Human Recognition from Video Evidence AGAINST hypothesis “Entity A is a human”: No head detected. No torso detected. No legs detected. Evidence against the hypothesis is overwhelming, so the system infers that A is unlikely to be a human.
  64. 64. Human Recognition from Video. Recognising humans given three different sources of information, supporting or contradicting human hypotheses: human(X, Y, S) ← head(X, Y, S); human(X, Y, S) ← torso(X, Y, S); ¬human(X, Y, S) ← ¬scene_consistent(X, Y, S). Supporting information sources are provided by head and torso detectors. Contradicting information is provided by the violation of geometric constraints (scene_consistent).
  65. 65. Human Recognition from Video: Uncertainty Representation. We are able to encode information for and against: input data (output of part-based detectors), rules (typically they are less than perfect), and hypotheses (inferable atoms). We do this by defining a truth assignment function ϕ: L → B, where L is a declarative language and B is a bilattice. Some points in the bilattice: <0, 0> agnostic; <1, 0> absolute certainty about truth; <0, 1> absolute certainty about falsity; <1, 1> complete contradiction.
  66. 66. Human Recognition from Video. A perfect part-based detector would detect body parts and geometric constraints with absolute certainty: head(25, 95, 0.9), torso(25, 95, 0.9), ¬scene_consistent(25, 95, 0.9).
  67. 67. Human Recognition from Video: Input Data Uncertainty. However, in realistic applications, every piece of low-level information is detected with a degree of certainty, usually normalized to a probability: ϕ(head(25, 95, 0.9)) = <0.90, 0.10>; ϕ(torso(25, 95, 0.9)) = <0.70, 0.30>; ϕ(¬scene_consistent(25, 95, 0.9)) = <0.80, 0.20>. For simple detectors, if the probability of the detection is p, then ϕ(detection) = <p, 1 − p>.
  68. 68. Human Recognition from Video. Traditional logic programming rules are binary: human(X, Y, S) ← head(X, Y, S); human(X, Y, S) ← torso(X, Y, S); ¬human(X, Y, S) ← ¬scene_consistent(X, Y, S).
  69. 69. Human Recognition from Video: Rule Uncertainty. But in a realistic setting, we would like to have a measure of their reliability: ϕ(human(X, Y, S) ← head(X, Y, S)) = <0.40, 0.60>; ϕ(human(X, Y, S) ← torso(X, Y, S)) = <0.30, 0.70>; ϕ(¬human(X, Y, S) ← ¬scene_consistent(X, Y, S)) = <0.90, 0.10>. Rules are specified manually. There exists an elementary weight learning technique: the value x in <x, y> is the fraction of the times that the head of a rule is true when the body is true.
  70. 70. Human Recognition from Video: Uncertainty Handling. It is now possible to compute the belief for and against a given human hypothesis by aggregating over all the possible rules and input data. This is done by computing the closure over ϕ of multiple sources of information. E.g. if we have a knowledge base S with only one rule and one fact that together entail human, S = {human(X, Y, S) ← head(X, Y, S), head(25, 95, 0.9)}, the degree of belief for and against the human hypothesis would be: cl(ϕ)(human(X, Y, S)) = ⋀_{p∈S} cl(ϕ)(p) = cl(ϕ)(human(X, Y, S) ← head(X, Y, S)) ∧ cl(ϕ)(head(25, 95, 0.9)) = ··· = <x_h, y_h>.
  71. 71. Human Recognition from Video: Uncertainty Handling. Because we have a variety of information sources from which to entail a hypothesis (or the negation of a hypothesis), the inference procedure needs to take into account all possible information sources. The operator ⊕ is used for this. The final form of the equation that computes the closure over the truth assignment of a hypothesis q is: cl(ϕ)(q) = ⊕_{S⊨q} ⋀_{p∈S} cl(ϕ)(p) ⊕ ¬( ⊕_{S⊨¬q} ⋀_{p∈S} cl(ϕ)(p) ), where the first term aggregates supporting information and the second negates contradicting information.
  72. 72. Human Recognition from Video: Uncertainty Handling. Given this uncertain input: ϕ(head(25, 95, 0.9)) = <0.90, 0.10>, ϕ(torso(25, 95, 0.9)) = <0.70, 0.30>, ϕ(¬scene_consistent(25, 95, 0.9)) = <0.80, 0.20>, and these uncertain rules: ϕ(human(X, Y, S) ← head(X, Y, S)) = <0.40, 0.60>, ϕ(human(X, Y, S) ← torso(X, Y, S)) = <0.30, 0.70>, ϕ(¬human(X, Y, S) ← ¬scene_consistent(X, Y, S)) = <0.90, 0.10>, we would like to calculate our degrees of belief for and against the hypothesis human(25, 95, 0.9).
  73. 73. Human Recognition from Video: Uncertainty Handling. cl(ϕ)(human(25, 95, 0.9)) = (<0.4, 0.6> ∧ <0.9, 0.1>) ⊕ (<0.3, 0.7> ∧ <0.7, 0.3>) ⊕ ¬(<0.9, 0.1> ∧ <0.8, 0.2>) = <0.36, 0> ⊕ <0.21, 0> ⊕ ¬<0.72, 0> = <0.4944, 0> ⊕ <0, 0.72> = <0.4944, 0.72>. Evidence against the hypothesis exceeds evidence for the hypothesis. Justifications for the hypothesis: ϕ(head(25, 95, 0.9)) = <0.90, 0.10>, ϕ(torso(25, 95, 0.9)) = <0.70, 0.30>. Justifications against the hypothesis: ϕ(¬scene_consistent(25, 95, 0.9)) = <0.80, 0.20>.
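The arithmetic on slide 73 can be traced with a small sketch. The operator definitions below (product for firing a rule on a fact, probabilistic sum for ⊕, component swap for ¬) are reverse-engineered from that slide's numbers; the bilattice operators in the cited framework may be defined differently.

```python
def fire(rule, fact):
    """Apply an uncertain rule to an uncertain fact: yields evidence FOR the head only."""
    return (rule[0] * fact[0], 0.0)

def oplus(a, b):
    """Aggregate independent evidence (probabilistic sum, component-wise)."""
    return (a[0] + b[0] - a[0] * b[0], a[1] + b[1] - a[1] * b[1])

def neg(a):
    """Negation swaps evidence for and against."""
    return (a[1], a[0])

head, torso, not_scene = (0.90, 0.10), (0.70, 0.30), (0.80, 0.20)
r_head, r_torso, r_scene = (0.40, 0.60), (0.30, 0.70), (0.90, 0.10)

for_human = oplus(fire(r_head, head), fire(r_torso, torso))   # (0.4944, 0)
against = neg(fire(r_scene, not_scene))                        # (0, 0.72)
print(oplus(for_human, against))                               # (0.4944, 0.72)
```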
  74. 74. Human Recognition from Video: Summary Uncertainty: Erroneous input data detection. Imperfect composite event definitions. Features: Consider both supportive and contradicting information about a given hypothesis. For every hypothesis, provide a list of justifications (proofs) that support or contradict the hypothesis. Note: Human detection is not a typical event recognition problem.
  75. 75. Public Space Surveillance: VidMAP Aim: Continuously monitor an area and report suspicious activity. Uncertainty: Erroneous input data detection. Approach: Multi-threaded layered system combining computer vision and logic programming.
  76. 76. VidMAP: Architecture. High-level module: standard logic programming reasoning. Mid-level module: uncertainty elimination. Low-level module: background subtraction, tracking and appearance matching on video content.
  77. 77. VidMAP: Architecture
  78. 78. VidMAP: Uncertainty Elimination Filter out noisy observations (false alarms) by first determining whether the observation corresponds to people or objects that have been persistently tracked. A tracked object is considered to be a human if: It is ‘tall’. It has exhibited some movement in the past. A tracked object is considered to be a ‘package’ if: It does not move on its own. At some point in time, it was attached to a human.
  79. 79. VidMAP: Event Recognition The following rule defines theft: theft(H, Obj, T ) ← human(H), package(Obj), possess(H, Obj, T ), not belongs(Obj, H, T ) A human possesses an object if he carries it. An object belongs to a human if he was seen possessing it before anyone else.
  80. 80. VidMAP: Event Recognition. The following rule defines entry violation: entry_violation(H) ← human(H), appear(H, scene, T1), enter(H, building_door, T2), not privileged_to_enter(H, T1, T2). 'building_door' and 'scene' correspond to video areas that can be hard-coded by the user at system start-up. An individual is privileged to enter a building if he swipes his ID card in the 'building_door' area of 'scene' or if he is escorted into the building by a friend who swipes his ID card.
  81. 81. VidMAP: Summary Uncertainty: Erroneous input data detection. Features: Uncertainty elimination. Intuitive composite event definitions that are easy to be understood by domain experts. Note: Crude temporal representation.
  82. 82. Probabilistic Logic Programming: Event Calculus Aim: General-purpose event recognition system. Uncertainty: Erroneous input data detection. Approach: Express the Event Calculus in ProbLog.
  83. 83. The Event Calculus (EC) A Logic Programming language for representing and reasoning about events and their effects. Key Components: event (typically instantaneous) fluent: a property that may have different values at different points in time. Built-in representation of inertia: F = V holds at a particular time-point if F = V has been initiated by an event at some earlier time-point, and not terminated by another event in the meantime.
  84. 84. ProbLog. A probabilistic logic programming language. It allows for independent 'probabilistic facts' written prob::fact, where prob indicates the probability that fact is part of a possible world. Rules are written as in classic Prolog. The probability of a query q imposed on a ProbLog database (the success probability) is computed as: Ps(q) = P( ⋁_{e ∈ Proofs(q)} ⋀_{fi ∈ e} fi ).
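The success-probability semantics can be illustrated without ProbLog itself. The following brute-force possible-worlds sketch is in Python rather than ProbLog, and the facts and proofs are made up for illustration; real ProbLog computes this far more efficiently.

```python
from itertools import product

# Toy independent probabilistic facts (illustrative only).
facts = {"head": 0.9, "torso": 0.7}

# Two proofs of a query, each a set of facts that must hold jointly.
proofs = [{"head"}, {"torso"}]

def success_probability(facts, proofs):
    """Ps(q): the probability that at least one proof of q is fully true
    in a world sampled by including each fact independently with its probability."""
    names = list(facts)
    total = 0.0
    for world in product([True, False], repeat=len(names)):
        w = dict(zip(names, world))
        p_world = 1.0
        for name, present in w.items():
            p_world *= facts[name] if present else 1 - facts[name]
        if any(all(w[f] for f in proof) for proof in proofs):
            total += p_world
    return total

print(success_probability(facts, proofs))   # 1 - (1-0.9)*(1-0.7) = 0.97
```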
  85. 85. ProbLog-EC: Input & Output. Input (time-stamped probabilistic facts), e.g. at time 340: 0.45::inactive(id0), 0.80::p(id0)=(20.88, −11.90), 0.55::appear(id0), 0.15::walking(id2), 0.80::p(id2)=(25.88, −19.80), 0.25::active(id1), 0.66::p(id1)=(20.88, −11.90), 0.70::walking(id3), 0.46::p(id3)=(24.78, −18.77), ... Output, e.g. at time 340: 0.41::leaving_object(id1, id0), 0.55::moving(id2, id3), ...
  86. 86. ProbLog-EC: Event Recognition. [Plot: the probability of a composite event (CE) over time (0-40); successive initiations (Init) push the probability up and terminations (Term) bring it down; probability axis marked at 0.2, 0.5 and 1.0.]
  87. 87. Uncertainty Elimination vs Uncertainty Propagation Crisp-EC (uncertainty elimination) vs ProbLog-EC (uncertainty propagation): Crisp-EC: input data with probability < 0.5 are discarded. ProbLog-EC: all input data are kept with their probabilities. ProbLog-EC: accept as recognised the composite events that have probability > 0.5.
  88. 88. Crisp-EC: Uncertainty Elimination. [Plot: number of SDE occurrences (0-35000) vs. noise (Gamma distribution mean, 0.0-8.0).]
  89. 89. Uncertainty Elimination vs Uncertainty Propagation. [Plot: F-measure vs. Gamma distribution mean (0-8.0) for the 'meeting' composite event, comparing Crisp-EC and ProbLog-EC.] Definition: meeting(Id1, Id2) = true is initiated if active(Id1) happens, close(Id1, Id2) = true holds, person(Id2) = true holds, and not running(Id2) happens.
  90. 90. Uncertainty Elimination vs Uncertainty Propagation. [Plot: F-measure vs. Gamma distribution mean (0-8.0) for the 'meeting' composite event, comparing Crisp-EC and ProbLog-EC.] Definition: meeting(Id1, Id2) = true is initiated if active(Id1) happens, close(Id1, Id2) = true holds, person(Id2) = true holds, and not running(Id2) happens.
  91. 91. Uncertainty Elimination vs Uncertainty Propagation. [Plot: F-measure vs. Gamma distribution mean (0-8.0) for the 'moving' composite event.] Definition: moving(Id1, Id2) = true is initiated iff walking(Id1) happens, walking(Id2) happens, close(Id1, Id2) = true holds, orientation(Id1) = O1 holds, orientation(Id2) = O2 holds, and |O1 − O2| < threshold.
  92. 92. Uncertainty Elimination vs Uncertainty Propagation. [Plot: F-measure vs. Gamma distribution mean (0-8.0) for the 'moving' composite event, comparing Crisp-EC and ProbLog-EC.] Definition: moving(Id1, Id2) = true is initiated iff walking(Id1) happens, walking(Id2) happens, close(Id1, Id2) = true holds, orientation(Id1) = O1 holds, orientation(Id2) = O2 holds, and |O1 − O2| < threshold.
  93. 93. Uncertainty Elimination vs Uncertainty Propagation. ProbLog-EC clearly outperforms Crisp-EC when: the environment is highly noisy (input data probability < 0.5), a realistic assumption in many domains; there are successive initiations that allow the composite event's probability to increase and eventually exceed the specified (0.5) threshold; and the number of probabilistic conjuncts in an initiation condition is limited.
  94. 94. ProbLog-EC: Summary Uncertainty: Erroneous input data detection. Features: Built-in rules for complex temporal representation. Note: Independence assumption on input data is not always desirable.
  95. 95. Probabilistic Graphical Models E.g. Hidden Markov Models, Dynamic Bayesian Networks and Conditional Random Fields. Can naturally handle uncertainty: Erroneous input data detection. Imperfect composite event definitions. Limited representation capabilities. Composite events with complex relations create models with prohibitively large and complex structure. Various extensions to reduce the complexity. Lack of a formal representation language. Difficult to incorporate background knowledge.
  96. 96. Markov Logic Networks (MLN) Unify first-order logic with graphical models. Compactly represent event relations. Handle uncertainty. Syntactically: weighted first-order logic formulas (Fi , wi ). Semantically: (Fi , wi ) represents a probability distribution over possible worlds. P(world) ∝ exp ( (weights of formulas it satisfies)) A world violating formulas becomes less probable, but not impossible!
  97. 97. Event Recognition using MLN. [Diagram: input evidence (SDE) and a knowledge base are grounded into a Markov network; probabilistic inference computes P(CE = True | SDE), yielding the recognised composite events (CE).]
  98. 98. Markov Logic: Representation. w1: suspect(Id) ⇐ criminal_record(Id) ∨ citizen_report(Id) ∨ holds_Dangerous_Object(Id). w2: suspicious_transaction(Id1) ⇐ suspect(Id1) ∧ mega_suspect(Id2) ∧ deliver(Id1, Id2).
  99. 99. Markov Logic: Representation Weight: is a real-valued number. Higher weight −→ Stronger constraint. Hard constraints: Must be satisfied in all possible worlds. Infinite weight values. Background knowledge. Soft constraints: May not be satisfied in some possible worlds. Strong weight values: almost always true. Weak weight values: describe exceptions.
  100. 100. Markov Logic: Input Data Uncertainty. Sensors detect events with absolute certainty or with a degree of confidence. Example: holds_Dangerous_Object(Id) detected with probability 0.8. It can be represented with an auxiliary formula whose weight is the log-odds of the probability: 1.38629 obs_holds_Dangerous_Object(Id) ⇒ holds_Dangerous_Object(Id).
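A one-line check of the log-odds weight used on slide 100 (the function name is illustrative):

```python
import math

def log_odds(p):
    """Weight of the auxiliary MLN formula for an observation detected with probability p."""
    return math.log(p / (1 - p))

print(log_odds(0.8))   # 1.38629..., the weight shown on slide 100
```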
  101. 101. Markov Logic: Network Construction. Formulas are translated into clausal form, and weights are divided equally among the clauses of a formula. Given a set of constants from the input data, all clauses are grounded; ground predicates are Boolean nodes in the network. Each ground clause forms a clique in the network and is associated with wi and a Boolean feature. P(X = x) = (1/Z) exp(Σi wi ni(x)), where Z = Σ_{x'∈X} exp(Σi wi ni(x')).
  102. 102. Markov Logic: Network Construction. w1: suspect(Id) ⇐ criminal_record(Id) ∨ citizen_report(Id) ∨ holds_Dangerous_Object(Id); w2: suspicious_transaction(Id1) ⇐ suspect(Id1) ∧ mega_suspect(Id2) ∧ deliver(Id1, Id2). Clausal form: (1/3)w1: ¬criminal_record(Id) ∨ suspect(Id); (1/3)w1: ¬citizen_report(Id) ∨ suspect(Id); (1/3)w1: ¬holds_Dangerous_Object(Id) ∨ suspect(Id); w2: ¬suspect(Id1) ∨ ¬mega_suspect(Id2) ∨ ¬deliver(Id1, Id2) ∨ suspicious_transaction(Id1).
  103. 103. Markov Logic: Network Construction. For example, the clause w2: ¬suspect(Id1) ∨ ¬mega_suspect(Id2) ∨ ¬deliver(Id1, Id2) ∨ suspicious_transaction(Id1) produces the following groundings: w2: ¬suspect(alex) ∨ ¬mega_suspect(alex) ∨ ¬deliver(alex, alex) ∨ suspicious_transaction(alex); w2: ¬suspect(alex) ∨ ¬mega_suspect(nick) ∨ ¬deliver(alex, nick) ∨ suspicious_transaction(alex); w2: ¬suspect(nick) ∨ ¬mega_suspect(alex) ∨ ¬deliver(nick, alex) ∨ suspicious_transaction(nick); w2: ¬suspect(nick) ∨ ¬mega_suspect(nick) ∨ ¬deliver(nick, nick) ∨ suspicious_transaction(nick).
  104. 104. Markov Logic: Network Construction. [Diagram: the ground Markov network over the constants alex and nick, with nodes such as criminal_record(alex), citizen_report(alex), holds_Dangerous_Object(alex), suspect(alex), mega_suspect(alex), deliver(alex, nick), suspicious_transaction(alex), and the corresponding nodes for nick.]
  105. 105. Markov Logic: World state discrimination. [Ground network diagram as in slide 104.] State 1: P(X = x1) = (1/Z) exp((1/3)w1·2 + (1/3)w1·2 + (1/3)w1·2 + w2·4) = (1/Z) e^(2w1 + 4w2).
  106. 106. Markov Logic: World state discrimination. [Ground network diagram as in slide 104.] State 1: P(X = x1) = (1/Z) exp((1/3)w1·2 + (1/3)w1·2 + (1/3)w1·2 + w2·4) = (1/Z) e^(2w1 + 4w2). State 2: P(X = x2) = (1/Z) exp((1/3)w1·2 + (1/3)w1·2 + (1/3)w1·2 + w2·3) = (1/Z) e^(2w1 + 3w2).
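A small sketch of the comparison on slides 105-106, using made-up weight values (the slides leave w1 and w2 symbolic); the point is that the partition function Z cancels, so state 1 is e^(w2) times more probable than state 2:

```python
import math

def unnormalised(weights_counts):
    """exp(sum of weight * number-of-satisfied-groundings), as on slides 105-106."""
    return math.exp(sum(w * n for w, n in weights_counts))

w1, w2 = 1.0, 2.0   # illustrative weight values, not from the slides

state1 = unnormalised([(w1 / 3, 2), (w1 / 3, 2), (w1 / 3, 2), (w2, 4)])  # e^(2*w1 + 4*w2)
state2 = unnormalised([(w1 / 3, 2), (w1 / 3, 2), (w1 / 3, 2), (w2, 3)])  # e^(2*w1 + 3*w2)

print(state1 / state2, math.exp(w2))   # the two values coincide
```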
  107. 107. Markov Logic: Inference. Event recognition amounts to querying about composite events. The ground network is large, with complex structure, so exact inference is infeasible; MLN combine logical and probabilistic inference methods. Given some evidence E = e, there are two types of inference: 1. Most Probable Explanation: argmax_q P(Q = q | E = e), e.g. via maximum satisfiability solving; 2. Marginal: P(Q = true | E = e), e.g. via Markov Chain Monte Carlo sampling.
  108. 108. Markov Logic: Machine Learning. Training dataset: input data annotated with composite events. Example input data: happensAt(active(id1), 101), happensAt(walking(id2), 101), orientationMove(id1, id2, 101), ¬close(id1, id2, 24, 101), ..., happensAt(walking(id1), 200), happensAt(running(id2), 200), ¬orientationMove(id1, id2, 200), ¬close(id1, id2, 24, 200), ... Corresponding composite-event annotation: holdsAt(meeting(id1, id2), 101), holdsAt(meeting(id2, id1), 101), ¬holdsAt(meeting(id1, id2), 200), ¬holdsAt(meeting(id2, id1), 200).
  109. 109. Markov Logic: Machine Learning Structure learning: First-order logic formulas. Based on Inductive Logic Programming techniques. Weight estimation: Structure is known. Find weight values that maximise the likelihood function. Likelihood function: how well our model fits to the data. Generative learning. Discriminative learning.
  110. 110. Markov Logic: Discriminative Weight Learning. In event recognition we know a priori the evidence variables (input data, e.g. simple events) and the query variables (composite events); we recognise composite events given the input data. Conditional log-likelihood function: log Pw(Q = q | E = e) = Σi wi ni(e, q) − log Ze, the distribution of the query variables Q given the evidence variables E. Conditioning on the evidence reduces the likely states, so inference takes place on a simpler model, and information from long-range dependencies can be exploited.
