This document presents a new algorithm called UDT-CDF for building decision trees that classify uncertain numerical data. It improves on previous algorithms such as UDT, which are based on probability density functions (PDFs). The key aspects of the new algorithm are:
1. It uses cumulative distribution functions (CDFs) rather than PDFs to represent uncertain numerical attributes, since CDFs provide more complete probability information.
2. It splits data at decision tree nodes based on the CDF, placing tuples whose value range covers the split point into both branches, weighted by the CDF (a sketch follows this list).
3. Experimental results show the new CDF-based algorithm achieves more accurate classifications and is more computationally efficient than the PDF-based UDT algorithm.
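A minimal sketch of the CDF-weighted split described in point 2, assuming each uncertain value is modeled by a Gaussian; the function name and the Gaussian assumption are illustrative, not the paper's exact formulation:

```python
from scipy.stats import norm

def split_fractions(mean, std, split_point):
    """For an uncertain value modeled as N(mean, std), return the
    probability mass falling left and right of the split point,
    taken directly from the CDF."""
    left = norm.cdf(split_point, loc=mean, scale=std)  # P(x <= split_point)
    return left, 1.0 - left

# Example: a tuple whose attribute is uncertain around 5.0 (std 1.5),
# split at 6.0 -> most of its weight goes to the left branch.
w_left, w_right = split_fractions(5.0, 1.5, 6.0)
print(f"left weight {w_left:.3f}, right weight {w_right:.3f}")
```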
Hypothesis on Different Data Mining Algorithms (IJERA Editor)
In this paper, different classification algorithms for data mining are discussed. Data mining is about explaining the past and predicting the future by means of data analysis. Classification is a data mining task that categorizes data based on numerical or categorical variables. Many algorithms have been proposed for classification; five of them are comparatively studied here. There are four different classification approaches, namely Frequency Table, Covariance Matrix, Similarity Functions, and Others. As part of this research on classification methods, the Naive Bayes, K-Nearest Neighbors, Decision Tree, Artificial Neural Network, and Support Vector Machine algorithms are studied and examined using benchmark datasets such as Iris and Lung Cancer.
Analysis On Classification Techniques In Mammographic Mass Data Set (IJERA Editor)
Data mining, the extraction of hidden information from large databases, is used to predict future trends and behaviors, allowing businesses to make proactive, knowledge-driven decisions. Data mining classification techniques deal with determining which group each data instance is associated with. They can handle a wide variety of data, so large amounts of data can be involved in processing. This paper analyzes various data mining classification techniques, such as Decision Tree Induction, Naïve Bayes, and k-Nearest Neighbour (KNN) classifiers, on the mammographic mass dataset.
This document discusses techniques for fast decision tree learning on microarray data. It introduces using attribute histograms to speed up the process of finding the best split points for decision tree learning. It also discusses optimizations for speeding up leave-one-out cross validation by reusing subtrees from previous runs. Experimental results on three microarray datasets show speedups of 150-400% from these techniques. Attribute pruning based on histogram indices is also introduced to further improve speed without loss of accuracy.
The delegation visited the evicted Dale Farm Travellers site one year after the eviction to assess conditions. They found around 20-30 caravans still parked at the road entrance, with residents living in poor conditions, a lack of services, and health and sanitation concerns. Residents expressed worries about another winter in such conditions. Their health needs had not been fully met since the eviction, and midwife and health visitor services were suspended or reluctant to visit the site. The Environment Agency also tested the evicted site for asbestos and other pollutants, indicating a risk to public health from the excavation works.
This document is about educational robotics. It explains that educational robotics uses robots to develop practical and didactic skills in students. It also describes an educational robot called EducaBot, which can be programmed to perform basic movements and routes, and later be programmed to gather information from its environment and respond to its sensors. The final goal is to teach concepts of mechanics, electronics, computing, and control through building and programming robots.
Correct sitting posture: operating instructions for office chairs used together with office desks. For further information, please contact us or visit http://www.wilhelm-schuster.de
Henry Ford introduced the Model T car to make automobiles affordable for everyday Americans. The Model T was produced cheaply using assembly-line manufacturing and the standardization of parts. This reduced costs and allowed Ford to keep selling the cars at low prices between 1909 and 1928. Mass production and standardization stimulated the economy by creating jobs in related industries such as steel, oil, and rubber. As more Americans could now afford cars, this launched an economic cycle of prosperity.
This manual is useful and indispensable for using the CRAN "Package TesSurvRec_1.2.1". It is relevant for statisticians, physicians, pharmacists, insurers, banks, engineers, psychologists, and astronomers, among other professions. These are statistical tests used to measure differences between survival-analysis functions of population groups that exhibit recurrent events.
Large amounts of heterogeneous medical data have become available in various healthcare organizations (payers, providers, pharmaceuticals). Those data could be an enabling resource for deriving insights for improving care delivery and reducing waste. The enormity and complexity of these datasets present great challenges in analyses and subsequent applications to a practical clinical environment. More details are available here http://dmkd.cs.wayne.edu/TUTORIAL/Healthcare/
This chapter discusses different techniques for exploring and visualizing data to better understand its characteristics. It describes the different types of data objects and attributes as well as basic statistical measures like mean, median, and standard deviation that can characterize a dataset's central tendency and dispersion. Visualization techniques covered include histograms, boxplots, scatterplots, parallel coordinates, Chernoff faces, and landscapes that can reveal patterns, relationships, and outliers in the data.
This chapter discusses different techniques for exploring and visualizing data to better understand its characteristics. It describes the different types of data objects and attributes as well as basic statistical measures like mean, median, variance, and standard deviation that can characterize a dataset's central tendency and dispersion. Visualization techniques covered include histograms, boxplots, scatterplots, parallel coordinates, Chernoff faces, and landscapes that can reveal patterns, relationships, and outliers in the data.
Data mining is the process of discovering patterns in large data sets, involving methods at the intersection of machine learning, statistics, and database systems.
This document provides an overview of Bayesian networks through a 3-day tutorial. Day 1 introduces Bayesian networks and provides a medical diagnosis example. It defines key concepts like Bayes' theorem and influence diagrams. Day 2 covers propagation algorithms, demonstrating how evidence is propagated through a sample chain network. Day 3 will cover learning from data and using continuous variables and software. The overview outlines propagation algorithms for singly and multiply connected graphs.
This chapter discusses getting to know data through analysis and visualization. It covers data objects and attribute types, statistical descriptions of data including measures of central tendency and dispersion, visualization techniques like histograms and scatter plots, and measuring similarity between data objects. The goal is to better understand data characteristics before applying more advanced mining techniques.
This chapter discusses getting to know your data through data mining concepts and techniques. It covers data objects and attribute types, basic statistical descriptions of data like mean and standard deviation, visualizing data through histograms and scatter plots, measuring data similarity, and different types of data sets. The goal is to provide qualitative overviews and insights into data to find patterns, trends, relationships and irregularities.
A large data set is not available for some diseases, such as brain tumors. This presentation and part 2 show how to find an actionable solution from a difficult cancer dataset.
Jiawei Han, Micheline Kamber and Jian Pei
Data Mining: Concepts and Techniques, 3rd ed.
The Morgan Kaufmann Series in Data Management Systems
Morgan Kaufmann Publishers, July 2011. ISBN 978-0123814791
Statistics is the science of dealing with numbers and data. It involves collecting, summarizing, presenting, and analyzing data. There are four main steps: data collection, summarization by removing unwanted data and classifying/tabulating, presentation with diagrams/graphs/tables, and analysis using measures like average, dispersion, and correlation. Descriptive statistics summarize and describe data, while inferential statistics allow generalizing from samples to populations. Common descriptive statistics include measures of central tendency (mean, median, mode), variability (range, variance, standard deviation), and distribution properties. Inferential statistics techniques like hypothesis testing and ANOVA are used to make inferences about populations based on samples.
This document discusses data and data preprocessing in data mining. It defines what data is, including data objects and attributes. It describes different attribute types like nominal, binary, ordinal, interval-scaled and ratio-scaled numeric attributes. It also discusses measuring the central tendency of data using the mean, median and mode. Additionally, it covers measuring data distribution through variance, standard deviation and z-scores. Finally, it briefly introduces measuring data similarity and dissimilarity, as well as an overview of data preprocessing.
Data mining techniques in data mining with examples (mqasimsheikh5)
This document provides an overview of data mining concepts and techniques for understanding data. It discusses different types of data sets and attributes, basic statistical descriptions for analyzing data distributions and outliers, various data visualization techniques for exploring patterns and relationships, and measures for determining data similarity and dissimilarity.
This document presents a technique for retrieving contextually relevant prior radiology reports to help radiologists with diagnosis. It uses a semantic vector approach with an ontology to capture relationships between concepts in reports. Explicit feedback from radiologists is also incorporated using an algorithm to personalize relevance. An evaluation with domain experts found the semantic approach improved retrieval over baselines. Analysis of report similarities confirmed the approach increased differences for related reports while decreasing them for unrelated reports. Overall, the technique aims to better identify useful information from prior exams to support radiologists.
Heart Disease Prediction Using Data Mining Techniques (IJRES Journal)
There are huge amounts of data in the medical industry that are not processed properly and hence cannot be used effectively in making decisions. Data mining techniques can be used to mine these patterns and relationships. This research has developed a prototype heart disease prediction system using data mining techniques, namely Neural Networks, K-Means Clustering, and Frequent Item Set Generation. Using medical profiles such as age, sex, blood pressure, and blood sugar, it can predict the likelihood of patients getting heart disease. It enables significant knowledge to be established, e.g. patterns and relationships between medical factors related to heart disease. The performance of these techniques is compared through sensitivity, specificity, and accuracy. It has been observed that Artificial Neural Networks outperform K-Means clustering on all parameters, i.e. sensitivity, specificity, and accuracy.
Identification of Differentially Expressed Genes by Unsupervised Learning Method (praveena06)
Abstract: Microarrays are one of the latest breakthroughs in experimental molecular biology, allowing the expression of tens of thousands of genes to be monitored in parallel. Microarray analysis includes many stages. Extracting samples from the cells, obtaining the gene expression matrix from the raw data, and data normalization are low-level analysis. Cluster analysis of genome-wide expression data from DNA microarrays is described as a high-level analysis that uses standard statistical algorithms to arrange genes according to similarity in their expression patterns. This paper presents a method for determining the number of clusters using divisive hierarchical clustering and k-means clustering of significant genes. The goal of this method is to identify genes that are strongly associated with disease among 12,607 genes. Gene filtering is applied to identify the clusters. k-means shows that about four to seven genes, or less than one percent of the genes, account for the disease group (these are the outliers), while more than seventy percent fall into an undefined group. The hierarchical clustering dendrogram shows clusters at two levels, which again shows that less than one percent of the genes are differentially expressed.
The document describes a lab experiment analyzing gene expression data from human fibroblasts in response to serum using microarray analysis. The aims are to analyze the gene expression data using Excel and the ArrayTrack workbench. Key steps include importing microarray data into Excel and pre-treating the data by centering and scaling. ArrayTrack is then used to analyze the data through descriptive statistics, exploring gene expression profiles of gene lists, and using the significance analysis of microarrays (SAM) tool. Additional online databases like Gene Atlas and ArrayExpress are queried to find expression profiles and experimental data for a specific gene, APT13A2, under different conditions.
Multivariate data analysis and visualization tools for biological data (Dmitry Grapov)
This document discusses various tools for analyzing and visualizing multivariate biological data. It describes univariate, bivariate, and multivariate analysis methods. Univariate analysis examines one variable at a time, bivariate examines two variables jointly, and multivariate examines multiple variables together. Dimensionality reduction techniques like principal component analysis (PCA) and partial least squares (PLS) projection can be used to visualize high-dimensional data. Networks can represent relationships among objects and identify patterns in complex data. Integrative modeling approaches provide a holistic view of biological systems from multivariate data.
Comparative Analysis of Weighted Emphirical Optimization Algorithm and Lazy C... (IIRindia)
Health care produces millions of records, and discovering the essential data in them is important. In data mining, the discovery of hidden information can be innovative and useful for many requirements in forecasting, patient behavior, executive information systems, and e-governance, where data mining tools and techniques play a vital role. In the Parkinson's health care domain, the hidden concepts predict the likelihood of the disease and also identify the important feature attributes. Explicit patterns are converted to implicit ones by applying various algorithms, i.e. association, clustering, and classification, to realize the full potential of the medical data. In this research work, the Parkinson's dataset has been used with different classifiers to estimate accuracy, sensitivity, specificity, kappa, and ROC characteristics. The proposed weighted empirical optimization algorithm is compared with other classifiers and found to be efficient in terms of accuracy and other related measures. The proposed model exhibited a top accuracy of 87.17% with a robust kappa statistic, and the ROC degree indicated the strong stability of the model compared to other classifiers. The total penalty cost generated by the proposed model is also lower than the penalty cost of the other classifiers, in addition to its accuracy and other performance measures.
M Sc Thesis Presentation Eitan Lavi
1. Medical Engineering Data Analysis Framework for Clinical Decision Support for Pediatrics Neuro-Development Disorders. Eitan Lavi. Advisors: Prof. Shmuel Einav, Biomedical Engineering Department, Tel-Aviv University; Prof. Yuval Shahar, Department of Information Systems Engineering, Ben-Gurion University; Dr. Mitchell Schertz, Institute for Child Development, Kupat Holim Meuhedet, Central Region, Herzeliya
3. NDD – Current Clinical Practice. Diagnosis is mainly performed based on an external evaluation of the child. The pediatrician relies only on his or her own (available memory of) past experience. Human ability to retrieve prior experience in an unbiased, complete and objective fashion is inadequate. Hence the need for an experience-based decision support system.
6. Problem Space. Institute for Child Development, Kupat Holim Meuhedet, Central Region. Collaborating physician – Dr. Mitchell Schertz, head of the institute, has been building a case-base since 2000. The case base currently holds 1941 non-active children, 2477 active children, and 8022 cases. Much of the case information is in free-text form => making this also a TCBR project.
7. Building the Data Set. Source data tables (n x m = # observations x # attributes): 465 x 3, 1474 x 60, 4582 x 19, 1143 x 69, 437 x 3, 5107 x 2, 1560 x 43, 13133 x 2, 8022 x 153, 4826 x 2, 5227 x 3, 6861 x 1.
8. Preprocessing and Transformations. X (case-base): 8022 neuro ids x 182 attributes, with attribute types mapped to Date, Numeric, Binary, Textual, Factor, Dirty Factor and Free Text. Y (diagnoses): 8022 neuro ids x 235 diagnoses, i.e. a binary diagnoses vector for each case i.
10. Similarity Metrics. Date distance = month gap. Numeric/Binary distance = normalized Euclidean. NA distance = 0.5 if both fields are NA, -0.5 otherwise. Textual distance = cosine similarity of Latent Semantic Analysis (LSA) derived document vectors.
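A small sketch of these four per-attribute-type distance rules; function names are hypothetical and the thesis's normalization details are not shown here, so the numeric case is reduced to a one-dimensional stand-in:

```python
import math
from datetime import date

def date_distance(d1, d2):
    """Month gap between two dates, as on the slide."""
    return abs((d1.year - d2.year) * 12 + (d1.month - d2.month))

def numeric_distance(x, y):
    """Absolute difference of values already normalized to [0, 1]
    (a 1-D stand-in for the normalized Euclidean distance)."""
    return abs(x - y)

def na_distance(x_is_na, y_is_na):
    """Per the slide: 0.5 if both fields are NA, -0.5 otherwise."""
    return 0.5 if (x_is_na and y_is_na) else -0.5

def cosine_similarity(u, v):
    """Cosine similarity between two LSA document vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

print(date_distance(date(2007, 3, 1), date(2005, 11, 1)))  # 16 months
```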
11. LSA – Advantages. Strictly mathematical approach, inherently independent of language. Able to perform cross-linguistic concept searching and example-based categorization. Automatically adapts to new and changing terminology. Has been shown to be very tolerant of noise. Deals effectively with sparse, ambiguous and contradictory data. Text doesn't have to be in sentence form.
13. Weighted Term-Document Matrix A. Local term weight: l_ij – the relative frequency of term i in document j. Global term weight: g_i – the relative frequency of term i within the entire corpus.
30. Example of diagnosis prediction scores for a specific {test case, retrieval method, K value} combination. In actuality, 32 such graphs were generated for each of the 350 test cases. The real diagnoses for this test case were: (1) DELAY IN DEVELOPMENTAL MILESTONES, (2) GROSS MOTOR, (3) NORMAL EARLY INTELLIGENCE. (Slides 31–33 repeat this example.)
34. Prediction evaluation matrix for a specific test case and retrieval method. For each test case, prediction vectors were generated using 8 retrieval & prediction methods, for 8 different K values (total 64 per test case).
36. SAR = 1/3 * (Accuracy + Area under the ROC curve + (1 - Root mean-squared error)) = a score combining performance measures with different characteristics, in an attempt to create a more "robust" measure (cf. Caruana R., ROCAI 2004).
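A one-line sketch of the SAR score as commonly defined in Caruana's work, with RMSE entered as (1 - RMSE) so that higher is better; the function name is hypothetical:

```python
def sar(accuracy, auc, rmse):
    """SAR combines accuracy, ROC area and (1 - RMSE) into one
    robust score (cf. Caruana, ROCAI 2004); all inputs in [0, 1]."""
    return (accuracy + auc + (1.0 - rmse)) / 3.0

print(round(sar(0.82, 0.88, 0.35), 3))  # 0.783
```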
37. F measure – Can help to dynamically choose threshold
45. Thank You. Prof. Shmuel Einav, Prof. Yuval Shahar, Prof. Oded Maimon, Dr. Mitchell Schertz, The Yitzhak and Chaya Weinstein Research Institute.
Editor's Notes
The NDD specialists often don't follow any preset rules or logical algorithms in making their decisions, and thus the field of Machine Learning is a natural realm from which to approach the classification task at hand:
The required task is more complex than the primary classification types discussed above, and can be termed multi-class, multi-label classification (predict one or more classes from a pool of multiple classes). Another important distinction in the NDD domain is that the NDD specialist is the one producing the mapping from features to diagnoses, through his diagnosis decisions, which are imperfect, inaccurate and inconsistent [9]. Since NDD is a domain lacking a deep clinical understanding or a clear knowledge structure, the physician hasn't necessarily labeled the cases in the case-base with the "correct" classes, nor is it guaranteed that highly similar cases will be given similar diagnoses [2],[9]. We are therefore looking to incorporate some aspects of the supervised approach (utilizing the outputs of prior cases in predicting an output for a new case), without needing to fully deduce a general function mapping from the input objects to the output space (which would rely completely on the outputs' integrity). Moreover, we are also looking to incorporate some aspects of the unsupervised approach, primarily the ability to discover patterns and clinically similar groups in the case-base without using any prior knowledge of how the NDD specialists decided to label (diagnose) each case. This allows us to find, for each new case, the cases most clinically similar to it. Basically, we use clustering (unsupervised learning) to find the cases clinically similar to a test case, and then multi-label, multi-class classification (supervised learning) only on the retrieved similar cases, using their physician-given labels to make a prediction.
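A rough sketch of this retrieve-then-reuse idea: retrieve the most similar prior cases, then take a similarity-weighted vote over their physician-given labels. The names and the weighting scheme are illustrative assumptions, not the thesis's exact Reuse & Adapt methods:

```python
def predict_diagnoses(similarities, label_matrix, k=10):
    """Retrieve the k most similar prior cases and score each diagnosis
    by the similarity-weighted fraction of those neighbours that carry it.

    similarities : list of (case_id, similarity to the new case)
    label_matrix : dict case_id -> set of diagnosis codes
    """
    neighbours = sorted(similarities, key=lambda cs: cs[1], reverse=True)[:k]
    total = sum(sim for _, sim in neighbours) or 1.0
    scores = {}
    for case_id, sim in neighbours:
        for dx in label_matrix[case_id]:
            scores[dx] = scores.get(dx, 0.0) + sim / total
    return scores  # per-diagnosis scores in [0, 1]

sims = [("c1", 0.9), ("c2", 0.7), ("c3", 0.2)]
labels = {"c1": {"GROSS MOTOR"}, "c2": {"GROSS MOTOR", "DELAY"}, "c3": {"NORMAL"}}
print(predict_diagnoses(sims, labels, k=2))
```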
Preprocessing the X matrix – attribute types: (a) empty: fewer than 8 non-NA entries → feature removed; (b) date: regex match on more than 90% of the feature column entries (allowing for non-pattern dates) → transformed into 2 new attributes, month and year; (c) numeric: coercion to character and back to numeric; if this produces fewer than 16 NAs, the feature is termed numeric and kept in its numeric coercion; (d) binary: multiple conditions plus fuzzy detection → consolidated to a single form of "true" and a single form of "false"; (e) clean factor: under 25 categories (and no match for the previous feature types) → no action; (f) dirty factor: no match for the previous types and average string length under 20 characters → the 20 most frequent levels remain, the rest are termed "misc levels"; (g) free text: no match for the previous types → no action. Missing data is conformed to NA status. Preprocessing the Y matrix: originally in .mdb format – each row was a general id, one feature (column) gave the respective neuro id (there could be two rows with the same neuro id), and for each type of diagnosis a comment or numeric marker was given in a respective feature → converted to a binary diagnosis vector.
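A sketch of the attribute-type inference rules listed above, written in Python for illustration (the original preprocessing appears to use factor/NA semantics closer to R); the date regex, true/false forms and helper names are assumptions:

```python
import re

DATE_RE = re.compile(r"^\d{1,2}[./-]\d{1,2}[./-]\d{2,4}$")
TRUE_FORMS, FALSE_FORMS = {"true", "yes", "1", "y"}, {"false", "no", "0", "n"}

def _is_number(v):
    try:
        float(v)
        return True
    except (TypeError, ValueError):
        return False

def infer_attribute_type(values):
    """Classify one attribute column using the thresholds from the note."""
    non_na = [v for v in values if v not in (None, "", "NA")]
    if len(non_na) < 8:
        return "empty"
    if sum(bool(DATE_RE.match(str(v))) for v in non_na) > 0.9 * len(non_na):
        return "date"          # later split into month / year attributes
    coercible = sum(_is_number(v) for v in non_na)
    if len(non_na) - coercible < 16:
        return "numeric"
    lowered = {str(v).strip().lower() for v in non_na}
    if lowered <= (TRUE_FORMS | FALSE_FORMS):
        return "binary"        # consolidated to one true / one false form
    if len(set(non_na)) < 25:
        return "clean factor"
    if sum(len(str(v)) for v in non_na) / len(non_na) < 20:
        return "dirty factor"  # keep 20 most frequent levels, rest -> misc
    return "free text"

print(infer_attribute_type(["12/05/2003", "01/06/2004", "NA"] * 4))  # date
```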
* In this study, the weights were automatically calculated using a simple algorithmic approach. * Other studies have used a domain-specific ontology to give different weights to different terms, according to their clinical significance.
* p_ij = relative probability. Each such entropy is further normalized by log(n), n being the length of the corpus (the number of documents). This normalization was originally devised to give equal treatment to corpuses of different sizes, but since in this project all textual attributes contain the same N.cases number of documents, this has little effect. A possible improvement for future versions is to replace this with a local normalization by the length of the document, so that the summed entropies are normalized with respect to document length. The entropy is a measure of how dispersed the use of the term is across the corpus. In the end, the sum of all p_ij equals 1, but the better dispersed they are across the corpus, the higher the term's global score.
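For illustration, a sketch of the common log-entropy global weight, g_i = 1 + sum_j p_ij * log(p_ij) / log(n); the thesis's exact normalization may differ from this standard form:

```python
import math

def global_entropy_weights(counts):
    """counts[i][j] = raw count of term i in document j.
    Returns the common log-entropy global weight
    g_i = 1 + sum_j p_ij * log(p_ij) / log(n), with n documents:
    terms spread evenly over the corpus end up with weights near 0."""
    n = len(counts[0])
    weights = []
    for term_counts in counts:
        total = sum(term_counts) or 1
        entropy = 0.0
        for c in term_counts:
            if c:
                p = c / total
                entropy += p * math.log(p)
        weights.append(1.0 + entropy / math.log(n))
    return weights

# A term concentrated in one document keeps weight 1.0;
# a term spread over all documents drops toward 0.
print(global_entropy_weights([[6, 0, 0], [2, 2, 2]]))  # [1.0, 0.0]
```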
This is used in the VECTOR SPACE MODEL
Empirical studies show that truncating the lower singular values can enact noise reduction, and thus the algorithms transformed all singular values in S below a certain threshold (set at 10^-3) to 0.
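A minimal sketch of that truncation step using a plain SVD; the matrix contents and the rebuild step are illustrative only:

```python
import numpy as np

def truncate_svd(A, threshold=1e-3):
    """Zero out singular values below the threshold (the note uses 10^-3)
    and rebuild the weighted term-document matrix for noise reduction."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    s_trunc = np.where(s < threshold, 0.0, s)
    return U @ np.diag(s_trunc) @ Vt

A = np.random.default_rng(0).random((5, 4))
print(np.round(truncate_svd(A), 3))
```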
Attribute clinical weights: High similarity – if more than 80% of cases have similarity to the test case above 0.8, divide the weight by 2. Average input length in a textual attribute – if over 30 characters, multiply by 2. Test case value for the attribute is NA – divide by 3.
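These three adjustments can be read as a small rule set; a sketch with a hypothetical signature:

```python
def adjust_attribute_weight(base_weight, share_high_sim, avg_text_len, test_value_is_na):
    """Apply the three heuristics from the note to one attribute's weight:
    halve it when >80% of cases are >0.8 similar to the test case,
    double it for long free-text inputs (>30 characters on average),
    and divide it by 3 when the test case itself has no value."""
    w = base_weight
    if share_high_sim > 0.8:
        w /= 2
    if avg_text_len > 30:
        w *= 2
    if test_value_is_na:
        w /= 3
    return w

print(adjust_attribute_weight(1.0, share_high_sim=0.9, avg_text_len=12, test_value_is_na=True))
```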
In choosing the test cases, however, the distribution of diagnoses in the Y matrix was examined. Inspecting the prior probabilities of diagnoses in the case base shows that there are several diagnoses which occur only once in the entire Y matrix, while others occur in the singles, tens, hundreds and thousands.
350 case indexes were in the final subset. For each test case, a diagnoses probability prediction vector was output for each combination of <Retrieval Method (4 types), Reuse & Adapt Method (2 types), K value (8 values)>. That is, 64 diagnosis prediction probability vectors were generated for each test case.
The above 5 graph types were produced for each combination of K, Retrieval Method and Reuse Scheme (i.e. for the 8 X 4 X 2 = 64 distinct combinations). That is, 320 distinct graphs were produced to graphically assess the aggregated results for all test cases.
RMSE = sqrt(1/(P+N) * sum_i (y_i - ŷ_i)^2) = root-mean-squared error = summing, over all diagnoses (all i values), an aggregated normalized sum of the individual errors between the predictions and the real values of the diagnoses vector. For each diagnosis, the error can be either 0 if the prediction is correct or 1 if the prediction is wrong. Since the output of RMSE is just a cutoff-independent scalar, this measure cannot be combined with other measures into a parametric curve. Accuracy = P(ŷ = Y), estimated as (TP + TN)/(P + N) = the number of correct predictions divided by the total number of diagnoses predicted = the probability of the algorithm predicting correctly = the rate of correct predictions attained by the algorithm.
F measure = weighted harmonic mean of precision (P) and recall (R) = 1 / (alpha * 1/P + (1 - alpha) * 1/R) (van Rijsbergen, 1979); if alpha = 1/2, the mean is balanced. Sensitivity = Recall = TP rate = P(ŷ = + | Y = +), estimated as TP/P = true positive rate = the number of true positives divided by the number of overall positives in the real diagnoses vector from the Y matrix = the algorithm's probability of predicting correctly which diagnoses the patient does have. Precision = PPV = P(Y = + | ŷ = +), estimated as TP/(TP + FP) = positive predictive value = the number of true positives divided by the total number of diagnoses predicted by the algorithm as positive = the probability of a positive "1" prediction being correct.
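A compact sketch computing these per-case measures from binary predicted and real diagnoses vectors, following the definitions above (balanced F with alpha = 1/2); the function name is illustrative:

```python
import math

def evaluate(predicted, actual):
    """Per-case evaluation of a binary diagnoses prediction vector against
    the real diagnoses vector, using the definitions in the notes."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    tn = sum((not p) and (not a) for p, a in zip(predicted, actual))
    fp = sum(p and (not a) for p, a in zip(predicted, actual))
    fn = sum((not p) and a for p, a in zip(predicted, actual))
    n = len(actual)
    accuracy = (tp + tn) / n
    recall = tp / (tp + fn) if tp + fn else 0.0        # sensitivity / TP rate
    precision = tp / (tp + fp) if tp + fp else 0.0     # PPV
    f_balanced = (2 * precision * recall / (precision + recall)
                  if precision + recall else 0.0)      # alpha = 1/2
    rmse = math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / n)
    return accuracy, recall, precision, f_balanced, rmse

print(evaluate([1, 0, 1, 0, 0], [1, 1, 0, 0, 0]))  # (0.6, 0.5, 0.5, 0.5, ~0.632)
```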
P value of the AUC ROC: tests the null hypothesis that the area under the curve really equals 0.50. In other words, the P value answers this question: what is the probability of obtaining the observed AUC ROC (or higher) if the diagnosis algorithm were no better than flipping a coin?
Another reason for choosing ML as an approach for developing a CDSS in NDD is the need for future scalability – no need for per-clinic rule modifications.