The Tower of Knowledge: A Generic System Architecture


Professor Maria Petrou gave a lecture on "A Classification Framework for Software Component Models" in the Distinguished Lecturer Series - Leon The Mathematician.


  1. The TOWER of KNOWLEDGE: a generic system architecture. Maria Petrou, Informatics and Telematics Institute, Centre for Research and Technology Hellas, and Electrical and Electronic Engineering Department, Imperial College London
  2. An important target of human endeavour: make intelligent machines. What distinguishes something intelligent from something non-intelligent?
  3. Intelligent systems can learn!
  4. The meaning of “learning”. Extreme 1 (Mathematical): the identification of the best value of a parameter from training data. Extreme 2 (Cognitive): learning how to recognise visual structures.
  5. Cognitive Learning: the Neural Network paradigm. “. . . the generalisation capabilities of the neural network . . . ” • The ability to generalise is the most important characteristic of learning.
  6. Can NNs really generalise? If we do not give them enough examples to populate the classification space densely enough near the class boundaries, do the NNs really generalise? • Neural networks and pattern classification methods are not learning methods in the cognitive sense of the word.
  7. Cognitive Learning: statistical or not? (1) Evidence against: the ability of humans to learn even from single examples. (2) Evidence for: humans actually take a lot of time to learn (12–15 years). (1) makes use of meta-knowledge; (2) is learning from lots of examples.
  8. Characteristics of learning • Generalisation is an important characteristic of learning. • Generalisation in statistical learning may only be achieved by having enough training examples to populate all parts of the class space, or at least the parts that form the borders between classes. • We have true generalisation capabilities only when what is learnt are rules on how to extract the identity of objects, and not the classes of objects directly. • If such learning has taken place, totally unknown objects may be interpreted correctly, even in the absence of any previously seen examples.
  9. Conclusion: What we have to teach the computer, in order to construct a cognitive system, are relations rather than facts.
  10. Two forms of learning: learning by experimentation, and learning by demonstration.
  11. Learning by experimentation. An example: a fully automatic segmentation algorithm: perform segmentation; assess the quality of the result; adjust the parameters; try again. Conclusion: learning by experimentation requires the presence of a feedback loop.
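The experiment-assess-adjust loop above can be sketched in a few lines. This is a toy illustration, not the lecture's algorithm: the "segmentation" is a hypothetical thresholding of a 1-D signal, and the self-assessment criterion (separation of the two groups' means) is an assumption chosen only to make the feedback loop concrete.

```python
def segment(signal, threshold):
    """Toy 'segmentation': split values into two groups at a threshold."""
    return [v > threshold for v in signal]

def quality(signal, labels):
    """Hypothetical self-assessment criterion: distance between group means."""
    fg = [v for v, lab in zip(signal, labels) if lab]
    bg = [v for v, lab in zip(signal, labels) if not lab]
    if not fg or not bg:        # degenerate segmentation scores zero
        return 0.0
    return abs(sum(fg) / len(fg) - sum(bg) / len(bg))

def learn_by_experimentation(signal, thresholds):
    """Feedback loop: perform segmentation, assess the result,
    adjust the parameter, try again; keep the best setting."""
    best_t, best_q = None, -1.0
    for t in thresholds:
        q = quality(signal, segment(signal, t))
        if q > best_q:
            best_t, best_q = t, q
    return best_t

signal = [10, 12, 11, 90, 95, 88]
print(learn_by_experimentation(signal, [5, 50, 100]))  # → 50
```

The point of the sketch is structural: without the `quality` criterion (the machine equivalent of a teacher) the loop has nothing to feed back, which is exactly the distinction drawn on the next slide.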
  12. The feedback loop when learning by experimentation • The teacher =⇒ interactive systems • A criterion of performance self-assessment =⇒ automatic systems
  13. The performance criterion for self-assessment and self-learning • The meta-knowledge one may learn from MANY examples • The meta-knowledge the teacher learnt from MANY examples, transplanted into the brain of the learner!
  14. So • meta-knowledge may take the form not only of relations, but also of generic characteristics that categories of objects have; • in interactive systems, meta-knowledge is inserted into the learner computer by the human teacher manually; • in automatic systems, meta-knowledge is supplied to the learner computer by the human teacher in the form of a criterion of performance assessment.
  15. What connects the knowledge with the meta-knowledge? How is meta-knowledge learnt in the first place? Is it only learnt by having many examples, or are there other ways too?
  16. Learning by demonstration: the potter and his apprentice...
  17. Learning by demonstration: the potter and his apprentice... • We learn fast, from very few examples only, when somebody explains to us why things are done the way they are done.
  18. How do children learn? By asking lots of “why”s =⇒ we cannot dissociate learning to recognise objects from learning why each object is the way it is. “What is this?” “This is a window.” “Why?” “Because it lets the light in and allows the people to look out.” “How?” “By having an opening at eye level.” “Does it really?”
  19. The tower of knowledge
  20. How can we model these layers of networks and their interconnections? We have various tools at our disposal: Markov Random Fields, grammars, inference rules, Bayesian networks, fuzzy inference, ...
  21. Markov Random Fields: Gibbsian versus non-Gibbsian
  22. Hallucinating and oscillating systems • Relaxations of non-Gibbsian MRFs do not converge, but oscillate between several possible states. • Optimisations of Gibbs distributions either converge to the right interpretation or, more often than not, they hallucinate, i.e. they get stuck in a wrong interpretation.
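The oscillation can be seen in a deliberately tiny, hypothetical system (not from the lecture): two binary units with asymmetric interactions, where unit A "wants" to agree with B but B "wants" to disagree with A. No joint Gibbs energy can encode both preferences at once, and deterministic relaxation cycles instead of settling.

```python
def relax(a, b, steps):
    """Deterministic relaxation of two units with asymmetric preferences:
    A copies B (agreement), then B opposes A (disagreement)."""
    history = []
    for _ in range(steps):
        a = b          # A's update rule: agree with B
        b = 1 - a      # B's update rule: disagree with A
        history.append((a, b))
    return history

print(relax(0, 0, 6))  # → [(0, 1), (1, 0), (0, 1), (1, 0), (0, 1), (1, 0)]
```

The system alternates forever between two states with period two: an oscillating system rather than a converging (or hallucinating) one.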
  23. Example of the output of the saliency mechanism of V1: Canny edge map versus V1 saliency map.
  24. Intra-layer interactions in the tower of knowledge may be modelled by non-Gibbsian MRFs. Where are we going to get the knowledge from to construct these networks? Where does the mother who teaches her child get it from? • There is NO universal source of knowledge for the learner child: the network of the child is trained according to the dictates of the network of the mother!
  25. In practice... • Use manually annotated images to learn the Markov dependencies of region configurations. • Define the neighbourhood of a region to be the six regions that fulfil one of the following geometric constraints: it is above, below, to the left of, to the right of, contained by, or contains the region under consideration. D Heesch and M Petrou, “Non-Gibbsian Markov random field models for contextual labelling of structured scenes”. BMVC 2007. D Heesch and M Petrou, 2009. “Markov random fields with asymmetric interactions for modelling spatial context in structured scene labelling”. Journal of Signal Processing Systems.
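The six geometric constraints can be sketched as predicates on region extents. The representation is an assumption for illustration: regions as axis-aligned bounding boxes `(x0, y0, x1, y1)` with y increasing downwards, which is not necessarily how the cited papers represent segmented regions.

```python
def relations(r, s):
    """Return which of the six geometric relations hold between
    region r and region s (both axis-aligned boxes x0, y0, x1, y1)."""
    rx0, ry0, rx1, ry1 = r
    sx0, sy0, sx1, sy1 = s
    out = []
    if ry1 <= sy0: out.append("above")
    if ry0 >= sy1: out.append("below")
    if rx1 <= sx0: out.append("left-of")
    if rx0 >= sx1: out.append("right-of")
    if rx0 <= sx0 and ry0 <= sy0 and rx1 >= sx1 and ry1 >= sy1:
        out.append("contains")
    if sx0 <= rx0 and sy0 <= ry0 and sx1 >= rx1 and sy1 >= ry1:
        out.append("contained-by")
    return out

roof   = (0, 0, 10, 2)
window = (2, 4, 4, 6)
print(relations(roof, window))  # → ['above']
```

A region's neighbourhood would then be assembled by testing these relations against every other region in the segmented image, keeping one neighbour per relation.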
  26. A collection of hundreds of house images...
  27. ...manually segmented and annotated... ...from which label relations are learnt...
  28. ...to annotate new images of houses...
  29. ...relaxing a non-Gibbsian MRF, using graph-theory colourings...
  30. ...to obtain the final labelling.
  31. • No global consistency is guaranteed. • But no global consistency exists when the interdependencies between labels are asymmetric.
  32. How can we extract the blobs we label automatically? Saliency is NOT the route! Saliency is related to pre-attentive vision. We need goal-driven mechanisms to extract the regions that have to be analysed individually.
  33. fitting
  34. M Jahangiri and M Petrou, “Fully bottom-up blob extraction in building facades”. PRIA-9-2008. M Jahangiri and M Petrou, “Investigative Mood Visual Attention Model”, CVIU, 2011, under review.
  35. Bayesian approaches • Probabilistic relaxation (PR) • Pearl-Bayes networks of inference
  36. Probabilistic relaxation • Impossible objects and constraint propagation: the striving for global consistency of labelling! From: ~fs/305 html/Gestalt/Waltz2.html
  37. Generalisation of Waltz’s work =⇒ probabilistic relaxation • Probabilistic relaxation updates the probabilities of the various labels of individual objects by taking into consideration contextual information. • Probabilistic relaxation is an alternative to MRFs for modelling peer-to-peer context. • Probabilistic relaxation DOES NOT lead to globally consistent solutions!
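One iteration of a probabilistic-relaxation update can be sketched as follows. This is a minimal textbook-style version, not the specific scheme used in the lecture: each object's label probabilities are re-weighted by the contextual support of its neighbours through an assumed compatibility table, then renormalised.

```python
def pr_step(p, compat):
    """One probabilistic-relaxation iteration.
    p[i][l]        : probability that object i carries label l
    compat[i][j][l][m]: compatibility of label l on i with label m on j
                        (None on the diagonal, which is skipped)."""
    n_obj, n_lab = len(p), len(p[0])
    new = []
    for i in range(n_obj):
        support = []
        for l in range(n_lab):
            s = 1.0
            for j in range(n_obj):
                if j == i:
                    continue
                # expected compatibility of (i, l) with neighbour j's labels
                s *= sum(compat[i][j][l][m] * p[j][m] for m in range(n_lab))
            support.append(p[i][l] * s)
        z = sum(support)                      # renormalise per object
        new.append([v / z for v in support])
    return new

# Two objects, two labels; compatibilities favour agreeing labels.
compat = [[None, [[0.9, 0.1], [0.1, 0.9]]],
          [[[0.9, 0.1], [0.1, 0.9]], None]]
p = pr_step([[0.8, 0.2], [0.6, 0.4]], compat)
print(p[0])  # context reinforces label 0: its probability rises above 0.8
```

Note that because `compat` need not be symmetric, iterating `pr_step` carries no guarantee of reaching a globally consistent fixed point, which is exactly the limitation stated on the slide.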
  38. Pearl-Bayes networks of inference • Causal dependencies =⇒ these networks are appropriate for inter-layer inference. Problem: how can we choose the conditional probabilities?
  39. • Conditional probabilities may have to be learnt painfully slowly from hundreds of examples. • Conditional probabilities may be transferred ready-made from another, already trained network: the network of the teacher. Such an approach leads us to new theories, like for example the so-called “utility theory”.
  40. Utility theory: a decision theory • Assigning labels to objects depicted in an image is a decision. • In the Bayesian framework: make this decision by maximising the likelihood of a label given all the information we have. • In utility theory, this likelihood has to be moderated by a function called the “utility function”, which expresses subjective preferences or possible consequences of each label we may assign.
  41. • Utility function: a vehicle for expressing the meta-knowledge of the teacher! Marengoni (PhD thesis, University of Massachusetts, 2002) used utility theory to select the features and operators that should be utilised to label aerial images. Miller et al (ECCV 2000) used as utility function a function that penalises the unusual transformations that would have to be adopted to transform what is observed into what the computer thinks it is.
  42. Modelling the “why” and the “how” in order to answer the “what”. Object o_i will be assigned label l_j with probability p_ij, given by:

      p_ij = p(l_j | m_i) = p(m_i | l_j) p(l_j) / p(m_i)    (1)

      where
      m_i: all the measurements we have made on object o_i;
      p(m_i): prior probability of the measurements;
      p(l_j): prior probability of the labels.
      Maximum likelihood: assign the most probable label according to (1).
  43. Alternatively: use the information coming from the other layers of knowledge to moderate formula (1). f_k: the units in the “verbs” level; d_n: the units in the descriptor level.
  44. Choose label l_{j_i} for object o_i as:

      j_i = arg max_j [ Σ_k u_jk Σ_n v_kn c_in ] p_ij    (2)

      where the bracketed term is the utility function u(i, j), and:
      u_jk indicates how important it is for an object with label l_j to fulfil functionality f_k;
      v_kn indicates how important descriptor d_n is for an object to be able to fulfil functionality f_k;
      c_in is the confidence we have that descriptor d_n applies to object o_i.
  45. A learning scheme must be able to learn the values of u_jk and v_kn either • directly from examples (slowly and painfully), or • by trusting its teacher, who, having learnt those values himself slowly and painfully over many years of human life experience, directly inserts them into the computer learner. The computer learner must have a toolbox of processors of sensory input to work out the values of c_in.
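The decision rule of formula (2) is straightforward to sketch in code. The numbers below are hypothetical toy values, not from the experiments: one functionality ("lets light in"), one descriptor ("opening at eye level"), and two candidate labels whose measurement probabilities p_ij are deliberately tied, so the utility term alone decides.

```python
def choose_label(i, u, v, c, p):
    """Formula (2): j_i = arg max_j [ sum_k u[j][k] sum_n v[k][n] c[i][n] ] p[i][j].
    u[j][k]: importance of functionality k for label j
    v[k][n]: importance of descriptor n for functionality k
    c[i][n]: confidence that descriptor n applies to object i
    p[i][j]: measurement-based probability of label j for object i."""
    n_labels, n_funcs = len(u), len(u[0])
    n_desc = len(v[0])
    scores = []
    for j in range(n_labels):
        utility = sum(
            u[j][k] * sum(v[k][n] * c[i][n] for n in range(n_desc))
            for k in range(n_funcs)
        )
        scores.append(utility * p[i][j])
    return scores.index(max(scores))

u = [[1.0], [0.2]]   # label 0 (say, "window") strongly needs functionality 0
v = [[1.0]]          # descriptor 0 is essential for functionality 0
c = [[0.9]]          # object 0 confidently exhibits descriptor 0
p = [[0.5, 0.5]]     # measurements alone cannot decide between the labels
print(choose_label(0, u, v, c, p))  # → 0
```

Even with the measurement probabilities tied, the utility term breaks the tie in favour of the label whose functionalities the observed descriptors support, which is the intended role of the teacher's meta-knowledge in (2).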
  46. An example
  47. X Mai and M Petrou, “Component Identification in the 3D model of a Building”. ICPR 2010.
  48. 224 out of the 233 manually identified components in 10 buildings were correctly identified.
  49. When no training data are available: self-learning the characteristics of the components iteratively, 207 out of the 233 manually identified components in 10 buildings were correctly identified.

                 With training data        Without training data
                 EBN        ToK            Recursive ToK
      Window     97.7%      98.3%          89.3%
      Door       76.9%      96.2%          100%
      Balcony    63.6%      100%           100%
      Pillar     100%       100%           100%
      Dormer     22.2%      77.8%          22.2%
      Chimney    0.0%       62.5%          100%
      Overall    87.6%      96.1%          88.8%
  50. M Xu and M Petrou, “3D Scene Interpretation by Combining Probability Theory and Logic: the Tower of Knowledge”, CVIU, 2011, under review.
  51. Self-learning the rules that connect the layers by using first-order logic: M Xu and M Petrou, “Learning logic rules for the tower of knowledge using Markov logic networks”, International Journal of Pattern Recognition and Artificial Intelligence, 2011, accepted.
  52. Conclusions. ToK offers a framework that can incorporate: • static and dynamic information • a feedback loop between reasoning and sensors • a way to systematise the acquisition and storage of knowledge • A conceptual framework • Not fully explored yet • Inspired by human language
  53. • Human language offers important cues to the way we acquire and store knowledge. • Language also has a well-defined structure, which we may exploit to build full intelligent systems around it. • An important characteristic of intelligence is the ability to learn. • Learning is characterised by the ability to generalise.
  54. • Generalisation can only be achieved if what is learnt is not the labels of the objects viewed, but the rules by which these labels are assigned. • These rules express relationships between concepts that represent objects, actions and characterisations, in association with prepositions that represent temporal and spatial relative placements. • Rules and relations may be learnt statistically, or they may be keyed into the system directly as an integral part of it: a direct transplantation of knowledge from the teacher (human) to the learner (machine). • Relations may be expressed with the help of probability theory in any of its current forms. • Relations tend to be asymmetric.
  55. • Asymmetric relations do not create systems with well-established optimal states. • Asymmetric relations lead to oscillating systems, where interpretations are neither global nor unique. • We do not need globally consistent labellings of scenes (interpretations). • What we need is fragments of reality and knowledge.
  56. • Natural systems are not globally consistent: they oscillate between states, and we humans manage to survive through this constantly dynamic, globally inconsistent and ambiguous world. • A robotic system must be able to do the same, and perhaps the only way to succeed in that is for it to be constructed so that it is content with a collection of fragments of understanding.
  57. Some relevant publications
      1) M Petrou, 2007. “Learning in Computer Vision: some thoughts”. Progress in Pattern Recognition, Image Analysis and Applications, the 12th Iberoamerican Congress on Pattern Recognition, CIARP 2007, Vina del Mar-Valparaiso, November, L Rueda, D Mery and J Kittler (eds), LNCS 4756, Springer, pp 1–12.
      2) M Petrou and M Xu, 2007. “The tower of knowledge scheme for learning in Computer Vision”, Digital Image Computing Techniques and Applications, DICTA 2007, 3–5 December, Glenelg, South Australia, ISBN 0-7695-3067-2, Library of Congress 2007935931, Product number E3067.
      3) M Xu and M Petrou, 2008. “Recursive Tower of Knowledge”, 19th British Machine Vision Conference, BMVC 2008, Leeds, UK, 1–4 September.