Automatic Strategies for Decision Support in Telephone Triage


Published in: Technology, Health & Medicine
  • This is a research and application project that began with a proposal to improve an existing application.
    That application was developed mainly by Dan Rozenfarb at Ilón Software, with Dolphin 6. It is named ExpertCare and it is a decision support tool intended to help telephone operators in a dispatch call center for medical emergencies perform what is called telephone triage.
    Before proceeding, I would like to stress that the ExpertCare project and its enhancements have great potential impact. Our interest in it is not only academic. By way of example, one of the success stories involves Cordoba state, where the state government installed this system in 35 positions of its call center to cover Cordoba city and its surroundings: in total, a population of about 1,500,000.
    Peaks of 70,000 calls per month were recorded.
    ExpertCare analyzes the symptoms reported by the caller and suggests new questions –about other symptoms– in order to complete a presumptive diagnosis and determine whether an ambulance is needed or not. The system relies upon a knowledge base and a large set of rules used to direct the questioning. But building and maintaining those rules is the hardest and most expensive part of the application, and it hinders tailoring it to other requirements or knowledge domains. So we are attempting a different approach that allows us to build automatic strategies for interrogation guidance.
    With this objective in mind, we had to develop a small framework to use as a virtual lab, where we could simulate patients and calls, and test and benchmark the strategies we built.
  • First, we are going to explain what we mean by telephone triage and delineate the scope of our interest area and objectives.
    Then, we will see how ExpertCare works and its internals and architecture, and later we will show the main features of the framework we work on.
    I will show some statistics of the current knowledge base, which provided us with guidelines for our study and bases to design different strategies. Finally, I will display some result tables to compare the different strategies we tested.
  • All telephone dispatch systems for medical emergencies work alike. They can have more or less computer and medical support. But a patient –or relative– always makes a phone call to the dispatch center. There, an operator –with more or less training and skills– answers the call.
    Some initial data are gathered at the beginning of the call to identify the patient, but they can also be useful in analyzing the patient’s problem –for example, age and sex rule out certain syndromes. Also, one or more symptoms reported by the patient are recorded.
    From there on, the dialog continues with further questions and answers until the operator has enough information to decide whether an ambulance must be sent and/or give complementary recommendations for dealing with the situation.
    In the case of ExpertCare, this is a big part of the problem, because it not only addresses urgencies but also makes recommendations for other, “lighter” problems, which accounts for a greater number of possible diagnoses. The scale is 30 diagnoses for emergencies, but more than 1100 overall.
  • Here I will show some screenshots from ExpertCare, to illustrate how it works. It is developed in Dolphin Smalltalk, as I said before.
    This is the first input for an incoming call, a standard form for recording the patient’s personal data.
  • On the left there is a list with all the symptoms in the knowledge base. There the operator picks all symptoms reported by the patient from the beginning.
    Note that we speak about “symptoms” because they represent the majority of this set; actually, there are other kinds of information in this category. For example, some symptoms in the knowledge base are Man, Woman, Senior Adult or Diabetes antecedents.
  • This is the main screen used throughout the session. In the lower middle part, there are possible diagnoses, each with its score. This is the most important information for the operator. The score grows as new evidence makes a diagnosis more likely. With every new answer all scores are updated. This is necessary because in a real session, patients can contradict themselves and correct or enhance the information they had given before.
    Above this list there appears a question, suggested by the system itself. It asks for a symptom with a broader description, in common, non-technical language that patients and untrained operators can understand.
    On the right side you can see previous questions and answers, which are recorded for statistics and auditing.
  • In the normal course of a session, a group of diagnoses quickly acquires scores that set it apart from the rest. Using this, the operator forms a presumptive diagnosis and closes the session, giving the caller the indications and advice provided by the system.
  • Roughly speaking, the architecture and design of ExpertCare are similar to those of traditional expert systems. There is an underlying ontology and a knowledge base, composed of concrete symptoms and syndromes. There is an inference engine, independent of that knowledge base, that works by making deductions from logical predicates –which are the definitions of the syndromes– in the context of the current session.
    It cooperates with the Interrogator, which provides new questions by applying the interrogatory rules. The Scorer evaluates diagnoses for syndromes in the current context.
  • These are the classes of the ontology. They are pretty simple, except for a detail we will see on the next slide.
    Symptoms are elements which do not have any attribute apart from their names. There is an implication relationship defined between symptoms; for instance, Severe abdominal pain implies Abdominal pain.
    Syndromes have attributes for systems –plural ‘systems’, because syndromes may belong to more than one system–, severity and frequency. We are planning to expand this class to encompass more attributes, for instance physiopathology, but they are beyond the scope of the present work.
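A minimal sketch of how the implication relationship between symptoms could be applied, written in Python purely for illustration (it is not the ExpertCare code). The function name `closure` and the second entry of the table are assumptions; only the pair "Severe abdominal pain implies Abdominal pain" comes from the talk.

```python
def closure(symptoms, implies):
    """Return the given symptoms plus everything they transitively imply."""
    known = set(symptoms)
    pending = list(symptoms)
    while pending:
        current = pending.pop()
        for implied in implies.get(current, ()):
            if implied not in known:
                known.add(implied)
                pending.append(implied)
    return known


# Hypothetical implication table; only the first entry is taken from the talk.
IMPLIES = {
    "Severe abdominal pain": {"Abdominal pain"},
    "Abdominal pain": {"Pain"},  # assumed, to show a transitive step
}

reported = closure({"Severe abdominal pain"}, IMPLIES)
```

With a table like this, a caller who reports the severe symptom is also treated as having the milder ones, so definitions written in terms of "Abdominal pain" still match.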
  • The definition of a syndrome is the relationship that completes the ontology. It is a complex relationship between a syndrome and several alternative sets of symptoms that characterize the syndrome. In order to work with these definitions, it is easier to express them as logical expressions, so that is the representation we implemented.
  • Here we have real data about the size of ExpertCare’s knowledge base.
    This base is pretty small in terms of quantity of objects. It fits well in RAM and that simplifies our work by not requiring an external database.
    It is important to notice here the weight of the set of rules. It is not adequately represented by their quantity, because rules are more complex objects. They become especially complex in their interrelationships. They are considerably harder to maintain and test than the knowledge base. That is why we attempt to replace the set of rules with something automatic and more flexible and adaptable.
    There are other difficulties when working to obtain the rules: symptoms, syndromes and definitions are easy to obtain and check from common technical bibliography. It is relatively easy to achieve consensus on those matters. But it is very hard to get agreed-upon interrogatory rules from experts when several medical specialties are involved.
    Another advantage of having automated strategies lies in bringing in more objectivity and independence from any personal bias in decisions. Automatic implementation would expose clearly and explicitly the criterion used. Working with the rules, the criterion is always an opinion based on domain expertise.
  • We work with a very simple metric, easy to measure and strictly objective: the number of questions that an operator must ask before closing a session with a sound presumptive diagnosis.
    Our target changes according to the severity of the situation. Emergency situations call for faster decisions because in many cases they are literally life or death decisions where every second counts.
    There are many other possible metrics. For instance, performance of a decision algorithm in time or in memory space, or microprocessor usage. But beyond some practical and obvious range such things are not relevant to this problem. It is pointless to concentrate on accelerating an algorithm, since in a telephone session question/answer time works at a very different scale.
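As a hedged illustration of the metric, the sketch below (Python, not the original Smalltalk) summarizes the number of questions per simulated session and compares the average with a per-severity target. The upper bounds in `TARGET` are read off the target slide (red: 3 or 4 questions, yellow: 4 or 5, green: around 6); the function name and the sample counts are invented for the example.

```python
from statistics import mean, median

# Upper bounds per severity, taken from the target slide of the talk.
TARGET = {"red": 4, "yellow": 5, "green": 6}


def summarize(question_counts, severity):
    """Summarize how many questions the sessions of one severity needed."""
    average = mean(question_counts)
    return {
        "average": average,
        "median": median(question_counts),
        "meets_target": average <= TARGET[severity],
    }


# Hypothetical question counts for five simulated red sessions.
summary = summarize([1, 1, 2, 3, 8], "red")
```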
  • In this work, we try to generate automated strategies for interrogatories, dynamically adjusted to information acquired during the session. These strategies would only be based on the knowledge base and on information obtained from the current session.
    For this project an experimentation environment was required. We called this the “virtual lab”, a space where we could develop different strategies and simulate interaction in call sessions. The target of these simulations is to study and enhance the strategies, but mostly to measure their performance.
    Of course, this environment is built in Smalltalk, where we used objects to model every domain concept. There we also instantiated the knowledge base.
  • Main ontology classes are modelled in this very simple hierarchy. We used an abstract class –NamedValue. Its main characteristic is that of representing objects identified by one name. We also use class variables to define repositories for instances, where we store the knowledge base. We also have a protocol for accessing a specific instance or all of them.
    We did not use classes to model the concepts of frequency and severity; instead, we used symbols for them.
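The idea of an abstract named class with class-level repositories can be sketched as follows. This is a Python approximation for illustration only: the selectors `named` and `all`, the default attribute values, and the use of plain strings in place of Smalltalk symbols are assumptions, not the framework's actual protocol.

```python
class NamedValue:
    """Abstract base: an object identified by a single name."""

    _instances = {}

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        cls._instances = {}  # each subclass keeps its own repository

    def __init__(self, name):
        self.name = name

    @classmethod
    def named(cls, name):
        """Return the unique instance with this name, creating it on first use."""
        if name not in cls._instances:
            cls._instances[name] = cls(name)
        return cls._instances[name]

    @classmethod
    def all(cls):
        """Return every instance stored for this class."""
        return list(cls._instances.values())


class Symptom(NamedValue):
    pass


class Syndrome(NamedValue):
    def __init__(self, name, systems=(), severity="green", frequency="low"):
        super().__init__(name)
        self.systems = set(systems)  # a syndrome may belong to several systems
        self.severity = severity     # plain strings stand in for symbols
        self.frequency = frequency
```

Looking up `Symptom.named("fever")` twice returns the same object, which is what lets the whole knowledge base live in memory as a set of shared instances.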
  • We defined another simple hierarchy for logical expressions that we use as syndrome definitions. This is almost a book exercise: most operations have a trivial implementation and there are only two domain-related details that stray from the standard.
    Variables are actually linked to symptoms in a session. Most of them only have boolean values, because symptoms are present or not. But some of them are quantified, e.g. temperature or blood pressure. For those values we had to add comparisons with fixed numbers.
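A minimal sketch of such an expression hierarchy, in Python rather than Smalltalk. The class names follow the slide on logical expressions, but the bodies are assumptions: here a context is a plain dictionary, an unanswered symptom is treated as absent (a simplification), and `Comparison` only supports "greater than". The appendicitis definition is the one given on the syndrome definition slide; the temperature threshold is hypothetical.

```python
from dataclasses import dataclass


class Expression:
    def value(self, context):
        raise NotImplementedError


@dataclass(frozen=True)
class Variable(Expression):
    name: str

    def value(self, context):
        # Boolean symptom: present or not in the session context.
        return bool(context.get(self.name, False))


@dataclass(frozen=True)
class Comparison(Expression):
    name: str
    threshold: float

    def value(self, context):
        # Quantified symptom compared with a fixed number.
        return context.get(self.name, float("-inf")) > self.threshold


@dataclass(frozen=True)
class Negation(Expression):
    expression: Expression

    def value(self, context):
        return not self.expression.value(context)


@dataclass(frozen=True)
class Conjunction(Expression):
    left: Expression
    right: Expression

    def value(self, context):
        return self.left.value(context) and self.right.value(context)


@dataclass(frozen=True)
class Disjunction(Expression):
    left: Expression
    right: Expression

    def value(self, context):
        return self.left.value(context) or self.right.value(context)


appendicitis = Conjunction(
    Conjunction(Variable("right iliac fossa pain"), Variable("abdominal pain")),
    Negation(Variable("appendix operation")),
)
fever = Comparison("temperature", 38.0)  # hypothetical threshold
```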
  • The Observation hierarchy refers to what I said before: a symptom observation tells us whether the symptom is present or not, or may return a quantity in some cases. An observation is what we get in the answer to a question.
    We started modelling questions and answers as first-class objects, but for this work their behavior was not interesting.
    A Diagnosis represents a possible diagnosis for a specific syndrome in the context of a session. It is obtained by evaluating the syndrome definition in that context: the diagnosis is definitely false if the current information makes the definition evaluate to false, complete if the definition evaluates to true, and open in any other case.
    CallSessions represent real interrogatory sessions: they save questions and answers and interact with a strategy to get the next appropriate question or to decide whether the session should end.
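The three diagnosis states can be sketched with three-valued (Kleene) logic, where a symptom that has not been asked about yet is unknown. This is an illustrative Python sketch, not the framework's code: the tuple encoding of definitions and the names `evaluate` and `diagnosis_status` are assumptions. The definition used is the Diabetic Ketoacidosis one from the step-by-step example slides.

```python
def evaluate(expr, observations):
    """Three-valued evaluation: True, False, or None when still unknown."""
    if isinstance(expr, str):
        return observations.get(expr)  # None if the symptom was not asked yet
    op, *args = expr
    values = [evaluate(arg, observations) for arg in args]
    if op == "not":
        return None if values[0] is None else not values[0]
    if op == "and":
        if any(v is False for v in values):
            return False
        return True if all(v is True for v in values) else None
    if op == "or":
        if any(v is True for v in values):
            return True
        return False if all(v is False for v in values) else None
    raise ValueError(f"unknown operator: {op}")


def diagnosis_status(definition, observations):
    """Map the evaluation result to 'complete', 'false' or 'open'."""
    result = evaluate(definition, observations)
    return {True: "complete", False: "false", None: "open"}[result]


DIABETIC_KETOACIDOSIS = (
    "and",
    ("or", "Diabetes", "History of diabetes"),
    ("or", "Unconsciousness", "Confusion", "Ketonic breath", "Dyspnea"),
)
```

For instance, knowing only that Diabetes is present leaves the diagnosis open; adding a positive answer for Confusion completes it, while negative answers for both Diabetes and History of diabetes make it definitely false.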
  • We needed automation for testing and benchmarking strategies, simulating patient calls. AnswerProvider plays the role of a patient. It knows a specific syndrome and one set of symptoms that characterize it, and it answers the session’s questions consistently.
    We used the SUnit framework for benchmarking. Although we were not doing unit tests, as Andres Valloud demonstrated before, SUnit has several properties that make it a comfortable tool for organizing large-scale trials. We collect the results in text files.
    We added StatisticalCollector later, to provide more navigation capabilities on the knowledge base. For instance, all syndromes of a specific system, or having a red severity. With this statistical collector we began a quantitative study of the knowledge base.
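A rough sketch of a simulated call, in Python for illustration. `AnswerProvider` is named in the talk, but its interface here (`answer`), the helper `run_session`, and the fixed question order are assumptions standing in for a real strategy and closing criterion.

```python
class AnswerProvider:
    """Simulated patient: knows one syndrome and one set of symptoms for it."""

    def __init__(self, syndrome, symptoms):
        self.syndrome = syndrome
        self.symptoms = frozenset(symptoms)

    def answer(self, symptom):
        # A simulated patient always answers consistently.
        return symptom in self.symptoms


def run_session(provider, questions, is_complete, initial=()):
    """Ask the questions in order; return how many were needed to close."""
    observations = {symptom: True for symptom in initial}
    for asked, symptom in enumerate(questions, start=1):
        observations[symptom] = provider.answer(symptom)
        if is_complete(observations):
            return asked
    return len(questions)


patient = AnswerProvider("Diabetic Ketoacidosis", {"Diabetes", "Confusion"})
count = run_session(
    patient,
    questions=["Pregnancy", "Dyspnea", "Confusion", "Ketonic breath"],
    is_complete=lambda obs: bool(obs.get("Diabetes") and obs.get("Confusion")),
    initial=["Diabetes"],
)
```

In this run the session closes on the third question; a benchmark would repeat the same loop for every symptom set of every syndrome and collect the counts.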
  • This is a plot of quantity of syndromes per system. Every vertical bar stands for a specific system.
    We can see that syndromes are fairly scattered. The first group of three systems stands out, concentrating a big mass of syndromes, although none of them has one hundred, not even ninety. Every other system has fewer than forty syndromes, most of them have fewer than twenty and a good number have ten or fewer.
  • Refining our plots to see the quantity of syndromes per system and severity, we see the same scattering on a different scale.
    In emergencies, red severity, only two systems have more than ten syndromes and eight systems have only one.
    In urgencies, yellow severity, only two are above twenty, only four above ten, and most of them have fewer than five.
    In green severity we have two above fifty, but except for the first group of five, all are below twenty and most have fewer than ten syndromes.
    These figures show that syndromes are widely spread among systems. But they also tell us that if we could somehow guess the right system and severity, the problem shrinks greatly and becomes manageable. We checked this with experts and it matches some criteria they commonly use.
  • This plot displays in how many systems a symptom appears.
    The vast majority of the symptoms, seventy-five percent, appear in syndromes of only one system. There are few symptoms appearing in more than five systems.
    This information suggests that in most cases, with one or two symptoms reported by the patient, we might determine the system.
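The statistic behind this plot can be computed along these lines. This is a hedged Python sketch: the function name and the two-syndrome sample are invented; the only fact borrowed from the slides is that Diabetes appears in the pregnancy and metabolic systems.

```python
from collections import defaultdict


def systems_per_symptom(syndromes):
    """Count, for each symptom, in how many distinct systems it appears.

    `syndromes` is an iterable of (systems, symptoms) pairs.
    """
    seen = defaultdict(set)
    for systems, symptoms in syndromes:
        for symptom in symptoms:
            seen[symptom].update(systems)
    return {symptom: len(systems) for symptom, systems in seen.items()}


# Hypothetical miniature knowledge base.
SAMPLE = [
    ({"metabolic"}, {"Diabetes", "Confusion"}),
    ({"pregnancy"}, {"Diabetes", "Pregnancy"}),
]

counts = systems_per_symptom(SAMPLE)
```

A symptom whose count is 1 identifies its system as soon as it is confirmed, which is the property the second family of strategies tried to exploit.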
  • I will show an example of operation, to better understand the terms and problems I mentioned before. Now I will follow the simulation of a call, where the patient has Diabetic Ketoacidosis. This syndrome definition has some alternatives for characteristic symptoms, so we could choose among 8 (minimal) sets of symptoms; the simulation will try them all in a sequence of simulated calls.
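The count of 8 follows directly from the shape of the definition on the slide: one symptom from a group of 2, combined with one from a group of 4. A small Python sketch (the function name `minimal_sets` is an assumption) enumerates them:

```python
from itertools import product


def minimal_sets(groups):
    """One symptom from each alternative group gives one minimal set."""
    return [frozenset(choice) for choice in product(*groups)]


# Groups taken from the Diabetic Ketoacidosis definition on the slide:
# (Diabetes OR History of diabetes) AND
# (Unconsciousness OR Confusion OR Ketonic breath OR Dyspnea)
GROUPS = [
    ["Diabetes", "History of diabetes"],
    ["Unconsciousness", "Confusion", "Ketonic breath", "Dyspnea"],
]

subsyndromes = minimal_sets(GROUPS)
```

Each resulting pair corresponds to one simulated call, for example the pair Diabetes and Confusion.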
  • Let’s analyze what happens with a particular combination, Diabetes and confusion. The answerProvider chooses one of them, diabetes, as first reported by the patient. As we can see, this would not be enough yet to determine a system. The session should continue asking, guided by a strategy and by the knowledge base, trying to discover that the patient also has confusion and close with a diabetic ketoacidosis diagnosis.
  • Here we see how the current information reduces the alternatives with positive evidence. This is an application of abductive reasoning, very common in medical applications. From 772 syndromes, we are going to focus on only 6 syndromes, which have diabetes as a symptom in their definitions.
  • These six syndromes have nine different symptoms that we could choose to ask about. If we were going at random, we would risk asking 9 questions, a number too large for an emergency. In this case, a good criterion could be to discern if pregnancy has something to do here.
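The narrowing step can be sketched as a simple filter over the knowledge base. This Python example is illustrative only: the function names and the miniature knowledge base are assumptions (two of its three syndromes are made up and marked as such), so the counts differ from the 6 syndromes and 9 symptoms of the real base.

```python
def candidate_syndromes(knowledge_base, present):
    """Keep the syndromes whose definition mentions a confirmed symptom."""
    return {name for name, symptoms in knowledge_base.items() if present & symptoms}


def candidate_symptoms(knowledge_base, candidates, asked):
    """Symptoms of the candidate syndromes that have not been asked yet."""
    pool = set()
    for name in candidates:
        pool |= knowledge_base[name]
    return pool - asked


# Miniature knowledge base: syndrome name -> symptoms used in its definition.
KB = {
    "Diabetic Ketoacidosis": {
        "Diabetes", "History of diabetes", "Unconsciousness",
        "Confusion", "Ketonic breath", "Dyspnea",
    },
    "Gestational diabetes (hypothetical)": {"Diabetes", "Pregnancy"},
    "Asthma (hypothetical)": {"Dyspnea", "Wheezing"},
}

present = {"Diabetes"}
candidates = candidate_syndromes(KB, present)
to_ask = candidate_symptoms(KB, candidates, asked=present)
```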
  • A negative answer gives us good information, one more system is discarded; but we still have 5 syndromes.
  • Knowing the system, we could try a criterion to decide quickly if it is an emergency or not. No single criterion works well for all cases, and that is why we developed different strategies. Let’s go one step further, supposing we somehow choose dyspnea.
  • A negative for this symptom did not add valuable information. Note that a positive one would have!
  • So, that was one “wasted” question and we still should decide which of seven symptoms to ask.
  • We first built a generation or family of strategies with little or no information. They were the first attempts to study the domain and tune the framework. But they also serve as baselines: if a strategy yields worse results than asking at random –that is what RandomStrategy does– then it does not deserve attention.
    The second generation was inspired by the statistics we’ve seen on previous slides. Each one represents an attempt to guess somehow the system of a patient’s syndrome, in order to cope with a reduced problem later. But these attempts were not successful.
  • These are the results for the first and second family of strategies in cases of emergency.
    We can only see how inadequate they are. Only MoreSatisfiers and MoreCriticSeparation seem a bit interesting in the first group. In the second, GuessSystemUsingPairs yields a good quantity of questions, but an error rate far too high.
    Note that none of them has a 0% error rate, but this is because of some internals of the knowledge base and the criterion used to automatically close the session. There are some emergency syndromes with very small sets of symptoms, which are implied by other syndromes. When we simulate a call for a syndrome implying another, it is common to complete the small one first and then close the session. In real-life cases or non-simulated tests, the operator does not do that.
  • From previous attempts we envisioned the utility of a unique measure for the degree of confirmation or likelihood for a set of syndromes. With it we could know if a group of system/severity is more likely in a context than another, or if plausibility for one group grows or diminishes after a given question.
    With a single indicator it is easier to focus strategies on maximizing it.
    So we defined the support, quite similar to the score ExpertCare already had.
    Basically, we add positive points for syndromes whose definitions contain symptoms already present in the session, and negative points for symptoms confirmed as absent. We give a bonus to syndromes with fully confirmed diagnoses and penalize those with false diagnoses.
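A minimal sketch of a support calculation along these lines, in Python. The talk does not give the actual weights, so the bonus, the penalty and the one point per symptom are assumptions, as are the function signature and the sample data.

```python
FULL_BONUS = 10      # assumed value for a fully confirmed diagnosis
FALSE_PENALTY = -10  # assumed value for a disproved diagnosis


def support(syndromes, observations, status_of):
    """Support of a set of syndromes in the current session.

    `syndromes` maps a syndrome name to the symptoms in its definition,
    `observations` maps asked symptoms to True/False, and `status_of`
    returns 'complete', 'false' or 'open' for a syndrome name.
    """
    total = 0
    for name, symptoms in syndromes.items():
        status = status_of(name)
        if status == "complete":
            total += FULL_BONUS
        elif status == "false":
            total += FALSE_PENALTY
        else:
            total += sum(1 for s in symptoms if observations.get(s) is True)
            total -= sum(1 for s in symptoms if observations.get(s) is False)
    return total


# Hypothetical group of two syndromes and a session with two answers.
GROUP = {"A": {"x", "y"}, "B": {"y", "z"}}
SESSION = {"x": True, "z": False}

both_open = support(GROUP, SESSION, lambda name: "open")
b_disproved = support(GROUP, SESSION, {"A": "open", "B": "false"}.get)
```

With both syndromes open, the confirmed and negated symptoms cancel out; once B is disproved, the penalty dominates. A single number like this makes it possible to compare groups of syndromes before and after a candidate question.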
  • Third-generation strategies are based on different selection criteria, but they all look for the symptom that produces the largest support difference.
    And here results improved dramatically.
    We developed several strategies; some of them were plain failures but most were successful. We were able to reach the amounts of questions that we had established as our target at the beginning.
  • Here you can see for yourselves that for emergencies (red severity), the target is achieved in the last five: an average between one and two questions, with a median of one and a low error rate.
  • Here we have the performance of support-based strategies on yellow severity. Again, our target is fulfilled in the last group.
  • And here we see how support-based strategies perform when it comes to green severity. The results are surprisingly good in terms of the number of questions. The error rate increases, because it is more probable that one syndrome with higher severity gets confirmed before the “real” one.
  • There are several reasons why this was very successful work. It is my thesis work, and that is not a minor detail to me. But I was interested –and so I told my director– in doing something more than interesting research that would never be applied. Then he gave me this chance with a real product that is on the market and works for people’s health.
    Getting rid of the high cost of developing complex sets of interrogatory rules and replacing it with the task of tuning strategies to a new domain would allow the application to reach new markets.
    From the beginning, we stuck to a scientific way of working at all levels: discussion, analysis, code, tests… everything was addressed with scientific practices.
    Finally, this was a truly interdisciplinary work.
  • You all know Smalltalk’s advantages for modelling and simulation tasks, but I will summarize some features we used specifically in this work.
    The class hierarchies implementing domain concepts and all the virtual lab were developed very quickly. In a couple of part-time weeks we had the environment and could test the first group of strategies.
    From there on, the gap between having an idea, implementing it and testing it was minimal. The main bottleneck was the time required for running the tests through all combinations of symptoms of every syndrome. For some strategies, a full run takes several days (but for the last group, a little over one hour).
    We tackled performance problems to speed up these long test runs. They were solved with simple caches. No complex or sophisticated programming techniques or tools were needed. There is no big or complex data structure, and there are not many lines of code.
    The debugger was the main tool for this work. We used it to run the strategies step by step, verifying their internal operation by hand or analyzing particular cases requiring large numbers of questions.
  • There is much more work to be done. Here we laid some foundations and defined a line of work, and we have a good proof of concept.
    One of the things on our wishlist throughout the job was a visual tool to graphically analyze strategy operation. At the beginning we did not tackle it because definitions of neighbourhood and navigation were not available. We could have done it before ending the work and it might have been useful for studying some cases. That would be better than using the debugger and the inspector.
    Along that same line, we could have used tools for interacting with sessions and strategies, for a better understanding of their operation.
    It would have been useful to have a tool to configure and run the benchmarks, which was done by adding test methods and running test cases by hand.
  • Besides programming tools, there is a lot of work to be done for integrating the automated strategies into ExpertCare in domain terms.
    We need a place for exceptions and special rules defined by hand. Some will be required to cut cases where automated strategies do not perform well. Some others are to deal with patient psychology and session handling.
    Tests and benchmarks with real interrogatories are a must. All our tests up to here were performed with simulated patients. They never fail, they don’t contradict themselves, they don’t ignore things and they don’t get stuck with questions.
    Finally, we should attempt adaptation to other different knowledge bases in order to verify that these strategies are not relying too much on the properties of this knowledge base.
  • Time for questions
    1. Automatic Strategies for Decision Support in Telephone Triage
       Framework and Testbed in Smalltalk
       Carlos E. Ferro
       Director: Dan Rozenfarb
    2. Agenda
       • Introduction
       • Software Application: ExpertCare
       • Overview of the Framework:
       • Concept representation
       • Session, automation and simulation
       • Strategies
       • Some statistics
       • Examples and results
    3. Telephone Triage
       • Phone call from a patient
       • Initial data gathered
       • Questions and answers
       • Presumptive diagnosis
       • Ambulance dispatch or treatment indications
    4. ExpertCare
       Initial identification data on a standard form
    5. ExpertCare
       Selection of all initial symptoms reported by the caller
    6. ExpertCare
       • New questions are suggested.
       • Answers are recorded.
       • Diagnoses are re-evaluated.
       Screenshot labels: Question in plain Spanish; List of scored diagnoses; List of symptoms; Session information
    7. ExpertCare
       As new information is gathered, some diagnoses are separated according to their score
    8. ExpertCare composition
       Modules of the ExpertCare system: Inference Engine; Interrogator; Ontology; Knowledge base (Symptoms, Syndromes with attributes); Interrogatory Rules; Scorer
    9. ExpertCare ontology
       • Symptoms (e.g. fever, headache, dyspnea)
       • Syndromes (e.g. appendicitis, osteomyelitis, asthma, schizophrenia)
       • Systems (e.g. circulatory, digestive)
       • Severities (red, yellow, green)
       • Frequencies (high, medium, low)
    10. ExpertCare syndrome definition
        Syndrome definitions are logical expressions in terms of symptoms. Examples:
        Definition of Appendicitis: “right iliac fossa pain” AND “abdominal pain” AND NOT “appendix operation”
        Definition of Massive Obesity: “intense weight increase” OR “intense body fat increase”
    11. ExpertCare size in numbers
        Rules: 3209
        Symptoms: 2383
        Syndromes: 673
        Other: 157
        Rules account for 50% of size, but 80% of complexity and 90% of costs. They also hinder software evolution.
    12. Target
        Our main metric is the number of questions:
        • Red (Emergency): 3 or 4 questions
        • Yellow (Urgency): 4 or 5 questions
        • Green: around 6, but may reach 12
    13. Solution approach
        • Automated strategy
        • Dynamic interrogatory
        • Navigation and gathering of information from the knowledge base
        • Adaptation to session status
        • Framework for session and strategies
        • Virtual lab as testbed
    14. Concept representation
    15. Logical expressions for definitions
        Class diagram (classes and their selectors):
        • Expression: value: aContext, satisfiers, acceptVisitor: anExpressionVisitor
        • Constant: value: aContext, satisfiers, acceptVisitor: anExpressionVisitor, be: aBoolean, false, true, value: aBoolean
        • Variable: value: aContext, satisfiers, acceptVisitor: anExpressionVisitor, name: aString, named: aString
        • Conjunction: value: aContext, satisfiers, acceptVisitor: anExpressionVisitor, operator
        • Comparison: value: aContext, satisfiers, acceptVisitor: anExpressionVisitor
        • Disjunction: value: aContext, satisfiers, acceptVisitor: anExpressionVisitor, operator
        • BinaryExpression: value: aContext, satisfiers, acceptVisitor: anExpressionVisitor, left: anExpression right: anExpression, operator, of: anExpression and: otherExpression
        • Negation: value: aContext, satisfiers, acceptVisitor: anExpressionVisitor, expression: anExpression, operator, of: anExpression
    16. Session, Diagnoses and other
    17. Automation
        • AnswerProvider simulates a patient/caller
        • Strategy guides the interrogatory, suggesting #nextQuestionFor: aCallSession
        • SUnit tests run through all syndromes using different strategies
        • StatisticalCollector gathers and caches information from the knowledge base
    18. Grouping and statistics
    19. Grouping and statistics
        [Charts: Syndromes by system, one panel each for severity red, severity green and severity yellow]
    20. Grouping and statistics
    21. Step-by-step example
        • This is a typical red syndrome.
        • According to the definition, AnswerProvider can choose among 8 pairs of symptoms (2x4).
        • Each one is called a subsyndrome.
        Diabetic Ketoacidosis
        System: Metabolic
        Frequency: Low
        Severity: Red
        Definition: (Diabetes OR History of diabetes) AND (Unconsciousness OR Confusion OR Ketonic breath OR Dyspnea)
    22. Step-by-step example
        Choosing clues:
        • Diabetes: systems pregnancy and metabolic
        • Confusion: associated to 9 different systems
        • Let’s choose Diabetes as a clue, and try to establish the presence of Confusion, in order to make a Diabetic Ketoacidosis diagnosis.
        Diabetic Ketoacidosis_2
        Definition: Diabetes AND Confusion
    23. Example - Step 1
        Choosing symptoms to ask: We should try to discern the system from among these two.
        [Diagram labels: Diabetes; 772 syndromes; 51 systems; 2 systems; 18 syndromes; positive evidence; 6 syndromes; 2 systems]
    24. Example - Step 1
        Choosing symptoms to ask: Using information from the knowledge base and some abductive reasoning, we have 9 candidates left. We choose symptom pregnancy, in order to confirm or discard system pregnancy.
        [Diagram labels: System pregnancy: 6 syndromes, 31 symptoms; System metabolic: 13 syndromes, 48 symptoms; System pregnancy: 1 syndrome, 1 symptom; Diabetes; System metabolic: 5 syndromes, 8 symptoms]
    25. Example - Step 2
        Now we “know” that only one system has chances left.
        [Diagram labels: Diabetes; Not pregnancy; 772 syndromes; 51 systems; 1 system; 5 syndromes; positive evidence]
    26. Example - Step 2
        Now we try to discern severity, first trying to decide whether it is red or yellow. Using information from the knowledge base and some abductive reasoning, we have 8 symptom candidates left. Here is where we need some tool for comparing or choosing among them. For instance, we could ask for symptom dyspnea.
        [Diagram labels: System pregnancy: 6 syndromes, 31 symptoms; System metabolic: 13 syndromes, 48 symptoms; System pregnancy: 0 syndrome, 0 symptom; Diabetes; Not pregnancy; System metabolic: 5 syndromes, 8 symptoms; 1 syndrome; 3 syndromes; 9 syndromes; 1 syndrome; 2 syndromes; 2 syndromes]
    27. Example - Step 3
        The new information did not reduce syndromes.
        [Diagram labels: Diabetes; Not pregnancy; Not dyspnea; 772 syndromes; 51 systems; 1 system; 5 syndromes; positive evidence]
    28. Example - Step 3
        We still try to discern severity, because “not dyspnea” only rejected some branches of some syndromes, but did not reduce the total number. Now we have 7 symptom candidates left. This way, we could use up to 7 more questions to “hit” the symptom that the simulated patient has and make a diagnosis.
        [Diagram labels: System metabolic: 13 syndromes, 48 symptoms; Diabetes; Not pregnancy; Not dyspnea; System metabolic: 5 syndromes, 8 symptoms; 1 syndrome; 3 syndromes; 9 syndromes; 1 syndrome; 2 syndromes; 2 syndromes]
    29. Strategies
        • One family of first attempts, using none or little information: SequentialStrategy, RandomStrategy, MoreSatisfiersStrategy, LessSatisfiersStrategy, MiddleSatisfiersStrategy, MoreCriticSeparationStrategy
        • Second family, attempting to guess the system by different indicators: GuessSystemByFrequencyStrategy, MoreCorrelationStrategy, LessNegationStrategy, GuessSystemStrategy, GuessSystemUsingPairsStrategy, LessNegationPairStrategy
    30. Results of preliminary strategies (severity: Red)
        Strategy | Average | Median | Diag Error % | Sev Error %
        Sequential | 739.04 | 745 | 19.00 | 5.58
        Random | 746.58 | 732.5 | 18.05 | 5.46
        LessSatisfiers (*) | 726.28 | 932 | 18.84 | 0.00
        MiddleSatisfiers | 579.28 | 794 | 18.05 | 2.44
        MoreSatisfiers | 85.29 | 52 | 14.23 | 2.51
        MoreCriticSeparation | 174.44 | 35 | 18.29 | 7.13
        GuessSystemByFrequency | 362.37 | 219 | 18.18 | 1.46
        GuessSystemBySeverity | 92.12 | 134 | 75.77 | 66.15
        GuessSystemUsingPairs (**) | 9.27 | 2 | 40.48 | 23.81
    31. Strategies - Support
        We coined the notion of support.
        • Intuitively, it is a numeric representation of the degree of likelihood of a given set of syndromes in the current session.
        • Calculation is straightforward.
        • Syndromes with full diagnoses add a large positive value.
        • Syndromes with disproved diagnoses add a large negative value.
        • For the rest, symptoms confirmed add positive value and symptoms negated add negative value.
    32. SupportSeparationStrategy
        The third family of strategies is based on support.
        • Most promising results
        • 15 different strategies
        • Hierarchy 7 levels deep
        • Every level evolving from the previous one
        • SUCCESS according to target
    33. Results of support strategies (severity: Red)
        Strategy | Average | Median | Diag Error % | Sev Error %
        SupportSeparation | 6.09 | 3 | 15.32 | 5.58
        SupportSeparationWithImplication | 4.62 | 2 | 18.29 | 7.13
        SupportSeparationImplicationTracking | 3.98 | 2 | 15.32 | 5.58
        SupportSeparationImplicationTrackingClosing | 2.95 | 1 | 16.86 | 7.13
        SupportMainSyndrome | 3.96 | 2 | 15.32 | 5.58
        SupportMainSyndromeScoring | 5.41 | 3 | 15.20 | 5.70
        SupportOnlyPositive | 3.34 | 2 | 15.20 | 5.11
        SupportOnlyPositiveClosing | 2.40 | 1 | 16.86 | 6.77
        SupportOnlyPositiveCandidatesClosing | 1.85 | 1 | 16.50 | 6.41
        SupportOnlyPositiveClosingDifSev | 2.12 | 1 | 20.55 | 6.29
        SupportOnlyPositiveStrictClosing | 2.13 | 1 | 16.86 | 6.77
        SupportLessMissingScore | 0.22 | 0 | 57.84 | 39.55
        SupportMoreCoincidences | 1.65 | 1 | 16.50 | 6.41
        SupportMoreCoincidencesPassThru_1 | 1.65 | 1 | 16.50 | 6.41
        SupportMoreCoincidencesPassThru_2 | 1.65 | 1 | 16.50 | 6.41
        SupportMoreCoincidencesPassThru_3 | 1.96 | 1 | 15.32 | 5.23
        SupportMoreCoincidencesPassThru_4 | 2.12 | 1 | 15.20 | 5.10
    34. Results of support strategies (severity: Yellow)
        Strategy | Average | Median | Diag Error % | Sev Error %
        SupportSeparation | 11.75 | 6 | 9.61 | 3.91
        SupportSeparationWithImplication | 8.53 | 6 | 16.76 | 9.72
        SupportSeparationImplicationTracking | 7.31 | 5 | 10.33 | 4.16
        SupportSeparationImplicationTrackingClosing | 5.94 | 4 | 16.25 | 10.39
        SupportMainSyndrome | 7.18 | 5 | 10.95 | 4.47
        SupportMainSyndromeScoring | 7.29 | 6 | 10.80 | 4.88
        SupportOnlyPositive | 5.11 | 4 | 10.69 | 4.27
        SupportOnlyPositiveClosing | 4.24 | 3 | 12.08 | 5.91
        SupportOnlyPositiveCandidatesClosing | 3.15 | 2 | 11.21 | 5.19
        SupportOnlyPositiveClosingDifSev | 4.65 | 3 | 12.90 | 6.17
        SupportOnlyPositiveStrictClosing | 3.61 | 2 | 11.52 | 5.40
        SupportLessMissingScore | 0.48 | 0 | 55.89 | 35.63
        SupportMoreCoincidences | 2.63 | 2 | 11.47 | 5.45
        SupportMoreCoincidencesPassThru_1 | 2.63 | 2 | 11.47 | 5.45
        SupportMoreCoincidencesPassThru_2 | 2.62 | 2 | 11.47 | 5.45
        SupportMoreCoincidencesPassThru_3 | 3.04 | 2 | 11.10 | 5.09
        SupportMoreCoincidencesPassThru_4 | 3.07 | 2 | 10.00 | 4.99
    35. Results of support strategies (severity: Green)
        Strategy | Average | Median | Diag Error % | Sev Error %
        SupportSeparation | 25.43 | 8 | 18.50 | 9.97
        SupportSeparationWithImplication | 17.67 | 8 | 23.55 | 10.66
        SupportSeparationImplicationTracking | 16.54 | 6 | 18.11 | 9.97
        SupportSeparationImplicationTrackingClosing | 5.79 | 1 | 19.01 | 16.43
        SupportMainSyndrome | 16.46 | 6 | 18.20 | 9.97
        SupportMainSyndromeScoring | 11.92 | 6 | 19.65 | 10.00
        SupportOnlyPositive | 13.44 | 5 | 18.08 | 10.00
        SupportOnlyPositiveClosing | 5.18 | 1 | 18.34 | 10.48
        SupportOnlyPositiveCandidatesClosing | 2.82 | 1 | 18.41 | 14.48
        SupportOnlyPositiveClosingDifSev | 5.23 | 1 | 18.11 | 9.70
        SupportOnlyPositiveStrictClosing | 3.39 | 1 | 18.14 | 13.82
        SupportLessMissingScore | 0.43 | 0 | 38.97 | 13.91
        SupportMoreCoincidences | 2.12 | 1 | 18.29 | 14.12
        SupportMoreCoincidencesPassThru_1 | 2.12 | 1 | 18.29 | 14.12
        SupportMoreCoincidencesPassThru_2 | 2.11 | 1 | 18.29 | 14.06
        SupportMoreCoincidencesPassThru_3 | 2.24 | 1 | 17.66 | 10.00
        SupportMoreCoincidencesPassThru_4 | 2.27 | 1 | 17.39 | 9.19
    36. Conclusions and remarks
        It was great doing this work because:
        • Enhancing the ExpertCare application could have a direct impact on the population’s health.
        • Automated strategies allow ExpertCare architecture to be used in other domains.
        • We applied a scientific research approach and techniques to this “real world” software problem.
        • We learned from Artificial Intelligence, Object-Oriented Programming and Medicine in an interdisciplinary work.
    37. Conclusions and remarks
        Smalltalk proved to be an adequate tool because:
        • Representation of the knowledge base was almost trivial.
        • Building a virtual lab for trials and benchmarks was very easy.
        • Additional tools for exploring the knowledge base and studying it were easy to implement.
        • There were no barriers for implementing and testing several strategies with diverse heuristics.
        • It was easy to get feedback and to debug troublesome cases, in order to enhance and refine strategies.
    38. Future work (technical)
        • A visual tool for representing the session. It should be some navigational metaphor.
        • The tool could be enhanced for tracing during simulation runs.
        • More tools for developers to understand and interact with strategy/session.
        • More tools for better comparative benchmarking.
    39. Future work (domain model)
        • Integrate with ExpertCare
        • Incorporate exceptions and special rules
        • Test with real samples
        • Try some adaptation to other knowledge bases
    40. The End
        Questions?