2. OBJECTIVES
• Assist medical professionals in diagnosis
• Predict probable disease and diagnosis
• Provide personalized healthcare to patients
3. MOTIVATION & BACKGROUND
• Too many patients but very few doctors
• Doctors are short on time and may overlook details
• Lab tests can lead to a false diagnosis
• Diagnosis can depend on the doctor’s mood
4. MOTIVATION & BACKGROUND
• EMR data is not utilized properly
– Patients’ personal information and medical history
are not taken into account
– Patients are often prescribed unnecessary tests
• Demographic characteristics ignored
– Existing expert systems do not take them into account
– These account for significant differences in baselines
5. METHODOLOGY
• Extract rules from data provided by UMDC
– This process will use data mining methods
such as neuro-fuzzy learners
• Extract rules from medical literature
– Online repositories such as PubMed, Medscape, and
Wikipedia
– Crawl data from them using web crawlers such as
PHPcrawl
• Take baseline differences into account during rule
generation
6. METHODOLOGY
• Generated rules will be accessible to doctors
– Through an Excel spreadsheet containing lab test
result values
– Rules are presented in a table, with each row
giving the test result parameter values for one
disease
– Doctors can add and edit parameter values and
diseases without any programming skills
• The rules will then be converted into XML to
update the expert system
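As a rough illustration of the spreadsheet-to-XML step, the sketch below converts rule rows into an XML string with Python's standard library. The element and attribute names (`rules`, `rule`, `test`, `min`, `max`), the row layout, and the disease name and thresholds are all hypothetical placeholders; the slides do not specify the actual schema.

```python
import xml.etree.ElementTree as ET


def rules_to_xml(rows):
    """Convert rule rows (one per disease) into an XML string.

    Each row is assumed to look like:
        {"disease": "...", "tests": {"Test name": (low, high), ...}}
    This layout is an assumption made for illustration only.
    """
    root = ET.Element("rules")
    for row in rows:
        rule = ET.SubElement(root, "rule", {"disease": row["disease"]})
        for test_name, (low, high) in row["tests"].items():
            ET.SubElement(
                rule,
                "test",
                {"name": test_name, "min": str(low), "max": str(high)},
            )
    return ET.tostring(root, encoding="unicode")


# Placeholder rule: the disease name and thresholds are made up.
example_rows = [
    {"disease": "Disease A", "tests": {"Glucose Fasting": (130, 400)}},
]
xml_text = rules_to_xml(example_rows)
```

Keeping the rules as plain rows means the same structure could be filled directly from the doctors' spreadsheet before being serialized.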
7. METHODOLOGY
• Ranked list of possible diseases based on rules
and scoring
• Storage and retrieval of patients’ previous diagnoses
to improve prediction accuracy
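One possible way to produce such a ranked list is sketched below. The scoring scheme (the fraction of a rule's test ranges that the patient's results fall into) is an assumption chosen for illustration, since the slides do not define the scoring; the disease names and thresholds are placeholders, not medical guidance.

```python
def rank_diseases(patient_results, rules):
    """Rank diseases by the fraction of rule ranges the patient matches.

    patient_results: {"Test name": value}
    rules: {"Disease": {"Test name": (low, high)}}
    Both layouts are hypothetical.
    """
    scored = []
    for disease, ranges in rules.items():
        if not ranges:
            continue  # skip rules with no test parameters
        matched = sum(
            1
            for test, (low, high) in ranges.items()
            if test in patient_results and low <= patient_results[test] <= high
        )
        scored.append((disease, matched / len(ranges)))
    # Highest score first.
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Placeholder rules and patient values, for illustration only.
example_rules = {
    "Disease A": {"Urea": (60, 200), "Creatinine": (2.0, 10.0)},
    "Disease B": {"Glucose Fasting": (130, 400)},
}
example_patient = {"Urea": 80, "Creatinine": 1.0, "Glucose Fasting": 150}
ranking = rank_diseases(example_patient, example_rules)
```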
9. EXTENSIONS
• Use of symptoms during prediction
• Medical analysis based on demographic characteristics such
as gender and residential address
• Integration of expert system with an existing Hospital EMR
• Risk monitoring system to identify patients at risk
10. DATA UNDERSTANDING
▪ The blood test data provided by UMDC contains about 200,000 records
▪ These cover multiple tests for about 54,000 patients
▪ Of these patients, a diagnosis is recorded for only 3,000
▪ Patient Tests:
Test Code   Test Name         Normal Range
1           Haemoglobin       11.5 – 18 (mg/dl)
17          Urea              10 – 50 (mg%)
18          Creatinine        0.5 – 1.5 (mg%)
25          Potassium         3.8 – 5.2 (mEq/L)
47          Glucose Fasting   70 – 110 (mg%)
48          Glucose Random    80 – 180 (mg%)
15. DATA UNDERSTANDING
• Problems with the data
― Multiple diagnoses recorded for a patient at the same date and time
― Test codes inconsistent with the test names
e.g. Haemoglobin records are classified under test code 1 and most of the
Glucose (fasting) records under test code 47. However, a few of
the Glucose (fasting) records are misclassified under test code 1
― Some of the test names are inconsistent
e.g. the Haemoglobin test name is recorded as “Haemoglobin”, “Hb”, and
“Haemoglobin %”
― Human errors in data entry, e.g. temperature recorded as 980 °F (probably
intended as 98.0 °F)
17. DATA UNDERSTANDING
• Problems with the data
– Multiple test result values are recorded against the same registration number and the same
date and time.
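A minimal sketch of how such conflicting entries could be detected is given below. The record layout (registration number, timestamp, test code, value) and the choice to include the test code in the grouping key are assumptions; the slides do not describe how the conflicts were actually found.

```python
from collections import defaultdict


def find_conflicts(records):
    """Return groups that carry more than one distinct value.

    records: iterable of (reg_no, timestamp, test_code, value) tuples.
    This tuple layout is hypothetical.
    """
    groups = defaultdict(set)
    for reg_no, timestamp, test_code, value in records:
        groups[(reg_no, timestamp, test_code)].add(value)
    return {
        key: sorted(values)
        for key, values in groups.items()
        if len(values) > 1
    }


# Made-up records: the first two share a key but disagree on the value.
example_records = [
    ("R001", "2012-01-05 10:30", 47, 92.0),
    ("R001", "2012-01-05 10:30", 47, 127.0),
    ("R002", "2012-01-05 11:00", 1, 13.2),
]
conflicts = find_conflicts(example_records)
```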
18. DATA UNDERSTANDING
– Test value inconsistency: more than 800 cells contain free text such as ‘127 (AFTER GLOCOUSE 01
HR)’ and ‘AFTER 75GRM GLOCOUSE 01HR (92)’
19. DATA UNDERSTANDING
– Test code and test name inconsistencies were resolved with Excel formulas such as:
=IF(OR(P2="Haemoglobin %",P2="Hb"),"Haemoglobin",P2)
– Text enclosed in parentheses was extracted with:
=IF(N2="true",(MID(L2,SEARCH("(",L2)+1,SEARCH(")",L2,SEARCH("(",L2)+1)-SEARCH("(",L2)-1)),N2)
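For comparison, a similar clean-up can be sketched in Python. This is not the method used in the project (which relied on Excel); the alias table and the heuristic of taking whichever part of the cell is a bare number, inside or outside the parentheses, are assumptions based on the two example cells quoted in the slides.

```python
import re
from typing import Optional

# Aliases taken from the test names mentioned in the slides.
NAME_ALIASES = {"Hb": "Haemoglobin", "Haemoglobin %": "Haemoglobin"}

_NUMBER = r"[-+]?\d+(?:\.\d+)?"


def normalize_test_name(name: str) -> str:
    """Map known alias spellings to one canonical test name."""
    cleaned = name.strip()
    return NAME_ALIASES.get(cleaned, cleaned)


def extract_value(cell: str) -> Optional[float]:
    """Pull the numeric result out of a free-text cell, if unambiguous."""
    text = cell.strip()
    if re.fullmatch(_NUMBER, text):
        return float(text)
    # Case 1: the number sits inside the parentheses, e.g. '... (92)'.
    inside = re.search(r"\(([^)]*)\)", text)
    if inside and re.fullmatch(_NUMBER, inside.group(1).strip()):
        return float(inside.group(1))
    # Case 2: the number sits outside, e.g. '127 (AFTER ...)'.
    outside = re.sub(r"\([^)]*\)", "", text).strip()
    if re.fullmatch(_NUMBER, outside):
        return float(outside)
    return None  # leave ambiguous cells for manual review
```

Returning `None` for cells that match neither pattern keeps ambiguous entries visible instead of guessing a value.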
24. DATA CLEANING
• Handling missing values: a patient whose test reports are cleared can be assumed to have values within
the normal test range, so we filled those missing values with the average of the normal test range values
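A minimal sketch of this imputation, assuming "average of normal test range values" means the midpoint of the lower and upper bounds, and using the normal ranges listed in the patient tests table keyed by test code. The function name and data layout are hypothetical.

```python
from typing import Optional

# Normal ranges by test code, as listed in the patient tests table.
NORMAL_RANGES = {
    1: (11.5, 18.0),    # Haemoglobin
    17: (10.0, 50.0),   # Urea
    18: (0.5, 1.5),     # Creatinine
    25: (3.8, 5.2),     # Potassium
    47: (70.0, 110.0),  # Glucose Fasting
    48: (80.0, 180.0),  # Glucose Random
}


def fill_missing(test_code: int, value: Optional[float]) -> float:
    """Return the value, or the midpoint of the normal range if missing."""
    if value is not None:
        return value
    low, high = NORMAL_RANGES[test_code]
    return (low + high) / 2
```

For example, a missing fasting glucose result (test code 47) would be filled with 90.0, the midpoint of 70 and 110.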
28. CONCLUSION
• We aim to build a medical expert system to assist medical
professionals, especially doctors, in diagnosis
• We want medical literature to directly support
diagnosis
• We want patients to receive personalised treatment
based on their medical history
• As computer scientists, we wish to serve the medical community,
given the field’s interdisciplinary nature