The KEDRI Integrated System for Personalised Modelling

Raphael Hu
KEDRI, Auckland University of Technology
(Wednesday, 1.15, Data Analysis Workshop)

  1. The KEDRI Integrated System for Personalised Modelling: Software development and experiment results
     Prof. Nikola Kasabov, Dr. Raphael Hu, Gary Chen
     The Knowledge Engineering and Discovery Research Institute (KEDRI), Auckland University of Technology
     www.kedri.info
     23/11/2011 nkasabov@aut.ac.nz; rhu@aut.ac.nz
  2. Overview
     • Introduction
     • The development of new algorithms and methods for personalised modelling in KEDRI
     • Software prototype demo
     • Conclusion and future direction
  3. KEDRI: The Knowledge Engineering and Discovery Research Institute at AUT (www.kedri.info)
     • Established in 2002 by Prof. Nikola Kasabov
     • Focus: novel information processing methods, technologies and applications for discoveries across different areas of science
     • Methods are mainly based on personalised modelling, brain information processing, evolution, genetics and quantum physics
  4. KEDRI
  5. Computational Modelling Techniques
     Global, local and personalised modelling are the three main approaches to modelling and pattern discovery in machine learning [1].
     Global modelling creates a model from the data that covers the entire problem space and is represented by a single function, e.g. a regression function, an RBF network, an MLP neural network, an SVM, etc.
     Local modelling builds a set of local models from the data, each representing a sub-space (e.g. a cluster) of the whole problem space. These models can be a set of rules, a set of local regressions, etc.
     Personalised modelling uses transductive reasoning to create a specific model for each single data point (e.g. a data vector, a patient record) within a localised problem space.
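The contrast between a global and a personalised model can be sketched in a few lines of Python. This is an illustration only, not the KEDRI implementation: the nearest-class-centroid rule standing in for "a single function", the k-nearest-neighbour vote standing in for the personalised model, the function names and the toy data are all assumptions.

```python
import math
from collections import Counter


def fit_global_model(train_x, train_y):
    """Global model: one function for the whole problem space.

    The 'function' here is a nearest-class-centroid rule (an illustrative choice).
    """
    counts = Counter(train_y)
    sums = {}
    for x, y in zip(train_x, train_y):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, value in enumerate(x):
            acc[i] += value
    centroids = {y: tuple(s / counts[y] for s in acc) for y, acc in sums.items()}
    return lambda query: min(centroids, key=lambda y: math.dist(centroids[y], query))


def personalised_predict(train_x, train_y, query, k=3):
    """Personalised model: built on demand from the k samples nearest to the query."""
    ranked = sorted(zip(train_x, train_y), key=lambda pair: math.dist(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: class 1 has a small pocket of samples close to class 0.
TOY_X = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0),
         (6.0, 6.0), (7.0, 6.0), (6.0, 7.0), (7.0, 7.0),
         (2.0, 2.0), (2.2, 2.0), (2.0, 2.2)]
TOY_Y = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1]
```

For a query inside the pocket, such as `(2.1, 2.1)`, the global rule follows the class centroids while the personalised rule follows the query's own neighbourhood, so the two can disagree.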
  6. Why Personalised Modelling?
     • The issue with using global modelling for prediction problems: a global model is derived from all available data for the target and then applied to any new patient, anywhere and at any time. Prediction and treatment based on global models are effective for only some patients (approx. 70%) [2].
     • Personalised modelling: the rationale behind the personalised modelling paradigm is that, since each person is different, the most effective treatment can only be based on a detailed analysis of that particular patient.
     • The availability of a variety of data: DNA, RNA, protein expression, inheritance, disease, etc.
     • The benefits of using personalised models for medical applications:
       – To produce better results for classification and prediction
       – To create a profile for each individual
       – To provide a potential improvement scenario for an individual, where possible
  7. Research Objectives of Personalised Modelling
     • To create accurate personalised computational models: each model is specific to an individual and utilises the available information from other individuals related to the same problem.
     • To develop new algorithms and methods for personalised modelling.
     • To apply the proposed algorithms and methods to data from different sources: gene expression data, protein data, SNP (single-nucleotide polymorphism) data, clinical data, etc.
  8. The Integrated Method for Personalised Modelling (IMPM) for Data Analysis
     [Diagram blocks: Data Repository; Feature selection; Similarity measurement; Neighbour creation; Learning models (e.g. risk probability evaluation, classification, etc.); Outcome visualisation (personalised disease profiling, risk probability); Optimisation (evolutionary computation, SNN)]
     The proposed framework and system using IMPM for biomedical data analysis [2]
  9. Optimisation
     Coevolutionary algorithm (CEA): CEA is derived from the evolutionary algorithm. The individuals in a CEA come from two or more populations, and their fitness values are assigned based on their interactions with individuals from the other populations.
     [Figure: A sample of a simple 2-species coevolutionary model]
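A 2-species coevolutionary loop of the kind described on the slide can be sketched as follows. This is a minimal cooperative variant under stated assumptions, not the algorithm used in IMPM: the toy objective `joint_fitness`, the truncation selection, the Gaussian mutation and all parameter values are hypothetical. The key point it illustrates is that an individual has no fitness on its own; it is scored through its interaction with a representative of the other population.

```python
import random


def joint_fitness(x, y):
    # Toy objective (an assumption): the best possible value is 0 at x = 3, y = -1.
    return -((x - 3.0) ** 2) - ((y + 1.0) ** 2)


def next_generation(population, score, rng, sigma=0.3):
    """Keep the better half and refill with mutated copies of the survivors."""
    survivors = sorted(population, key=score, reverse=True)[: len(population) // 2]
    children = [parent + rng.gauss(0.0, sigma) for parent in survivors]
    return survivors + children


def coevolve(generations=80, size=10, seed=0):
    rng = random.Random(seed)
    species_x = [rng.uniform(-5.0, 5.0) for _ in range(size)]
    species_y = [rng.uniform(-5.0, 5.0) for _ in range(size)]
    best_x, best_y = species_x[0], species_y[0]
    for _ in range(generations):
        # An x individual is scored by how well it does with the current best y ...
        species_x = next_generation(species_x, lambda x: joint_fitness(x, best_y), rng)
        best_x = max(species_x, key=lambda x: joint_fitness(x, best_y))
        # ... and a y individual by how well it does with the current best x.
        species_y = next_generation(species_y, lambda y: joint_fitness(best_x, y), rng)
        best_y = max(species_y, key=lambda y: joint_fitness(best_x, y))
    return best_x, best_y
```

In a personalised-modelling setting the two species could, for example, encode a feature mask and model parameters, but that mapping is not spelled out on the slide and is left open here.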
  10. Software Architecture of IMPM
      [Figure: An example of software architecture of ISPM]
  11. An Integrated Optimisation System for Personalised Modelling (IOSPM)
      • Cross-platform – implemented in Qt, which can be compiled on different platforms such as Microsoft Windows, Mac OS and Linux.
      • Integrated – combines methods/functions written in different languages (e.g. MATLAB, Python, Java and C/C++).
      • Extensible – new methods/functions can easily be plugged in by editing the system schema, from which the GUI is generated dynamically.
  12. An overview of the IOSPM system
      [Diagram labels: Main GUI <<UI>> with operations Select Data file(), Select Optimisation Method(), Select Modelling Method(), Select Data pre-processing(); Interface 1; Data Loading; Data Pre-processing; PM Optimisation; Spiking Neural Network; LibSVM; K-Nearest Neighbor; DENFIS; WKNN/WWKNN; Interface 2; Visualisation GUI <<UI>> with operations Select Visualisation Mode(), Create Results Report(); Data Report Generator; Visualisation Mode 1; Visualisation Mode 2; Visualisation Mode 3; implementation labels: MATLAB, Python, Qt, XML, GUI, Executable, OpenGL, C++ Code, Package]
  13. Implementation of ISPM
      Example contents of the modules are given below:
      • Module for Neighbourhood Creation: Euclidean distance method; Hamming distance method; cosine distance method; kernel distance methods; other methods.
      • Module for Classification/Prediction:
        – Classification methods, such as MLR, MLP, ECF, wkNN, wwkNN, TWNFI, SVM, eSNN.
        – Probability prediction methods, such as DENFIS, TWNFI.
      • Module for Optimisation: evolutionary computation (EC), quantum-inspired evolutionary algorithm, particle swarm optimisation (PSO), quantum-inspired PSO, other methods.
      • Module for Task Distribution Centre: this module controls the whole optimisation process, communicates with the user and visualises the results.
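The distance measures named for the neighbourhood-creation module have standard definitions, sketched below. The function names, the `DISTANCES` registry and `create_neighbourhood` are hypothetical and only suggest how such a module might let the distance be swapped; they are not taken from the ISPM code.

```python
import math


def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def hamming_distance(a, b):
    # Number of positions at which the two vectors differ.
    return sum(1 for x, y in zip(a, b) if x != y)


def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / norm


# Hypothetical registry so the distance can be chosen by name.
DISTANCES = {
    "euclidean": euclidean_distance,
    "hamming": hamming_distance,
    "cosine": cosine_distance,
}


def create_neighbourhood(samples, query, k, metric="euclidean"):
    """Return the indices of the k samples closest to the query."""
    distance = DISTANCES[metric]
    order = sorted(range(len(samples)), key=lambda i: distance(samples[i], query))
    return order[:k]
```

Kernel distance methods, also listed on the slide, are omitted because the slide does not say which kernels are used.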
  14. Global Modelling vs. Personalised Modelling
      Colon cancer gene expression data
      Model                 Overall accuracy   Class 1   Class 2
      MLR (global)          72.58%             75.00%    68.18%
      RBF (global)          79.03%             90.00%    59.09%
      IMPM (personalised)   87.10%             90.00%    81.82%
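The overall accuracy in the table is the per-class accuracy weighted by class size, which the short check below makes explicit. The class sizes (40 samples in class 1, 22 in class 2) and the counts of correct predictions are assumptions: they are not given on the slide and were chosen because they reproduce the reported percentages.

```python
def accuracy_row(correct_1, total_1, correct_2, total_2):
    """Return (overall, class 1, class 2) accuracy in percent, rounded to 2 places."""
    overall = 100.0 * (correct_1 + correct_2) / (total_1 + total_2)
    class_1 = 100.0 * correct_1 / total_1
    class_2 = 100.0 * correct_2 / total_2
    return tuple(round(value, 2) for value in (overall, class_1, class_2))


# Assumed class sizes: 40 (class 1) and 22 (class 2).
ASSUMED_COUNTS = {
    "MLR (global)": (30, 40, 15, 22),
    "RBF (global)": (36, 40, 13, 22),
    "IMPM (personalised)": (36, 40, 18, 22),
}

for model, counts in ASSUMED_COUNTS.items():
    print(model, accuracy_row(*counts))
```

Under these assumed counts, the RBF model and IMPM classify class 1 equally well, and the gain in overall accuracy comes entirely from class 2.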
  15. Personalised Modelling for Bioinformatics Research
      An example: applying PM on gene expression data for colon cancer diagnosis
      [Figure, panel (a): Compact GA Evolution; x axis: each bit represents one feature; y axis: generation]
      [Figure, panel (b): Weighted importance of selected features; x axis: index of genes (377, 1285, 1892, 419, 812, 1843, 1574, 350, 1991, 513, 561, 1863, 814, 809, 1069, 395, 462, 348); y axis: weighted importance]
      (a) The evolution of feature selection for sample #32 using 600 generations of GA optimisation.
      (b) The weighted importance of the selected features for sample #32 after one run of the method.
      Results from a simple experiment on colon cancer gene expression data
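Panel (a) shows a compact GA evolving a bit mask in which each bit switches one feature on or off. The sketch below uses the textbook compact GA update (a probability vector nudged towards the winner of a two-individual tournament); the slides do not give the exact variant used, so this is an assumption. `toy_fitness` and `TARGET` are hypothetical stand-ins for the real fitness, which would be the accuracy of the personalised model built from the selected features.

```python
import random


def compact_ga(fitness, n_bits, generations=600, virtual_pop=50, seed=0):
    """Return the final probability vector; entry i is P(feature i is selected)."""
    rng = random.Random(seed)
    prob = [0.5] * n_bits          # every feature starts as a coin flip
    step = 1.0 / virtual_pop       # update size of the simulated population
    for _ in range(generations):
        a = [int(rng.random() < p) for p in prob]
        b = [int(rng.random() < p) for p in prob]
        winner, loser = (a, b) if fitness(a) >= fitness(b) else (b, a)
        for i in range(n_bits):
            if winner[i] != loser[i]:
                # Move the probability towards the winner's value for this bit.
                shift = step if winner[i] == 1 else -step
                prob[i] = min(1.0, max(0.0, prob[i] + shift))
    return prob


# Hypothetical ground truth: only features 1 and 4 are informative.
TARGET = [0, 1, 0, 0, 1, 0, 0, 0]


def toy_fitness(mask):
    # Stand-in for classification accuracy: number of bits that match TARGET.
    return sum(int(m == t) for m, t in zip(mask, TARGET))
```

Reading off the bits whose final probability exceeds 0.5 gives the selected feature set for that run.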
  16. [Figure, panel (c): Visualizing the results of PFS with 3 features; 3D plot over genes f377, f1285 and f1892; legend: blue circle points, actual value of this gene; green upward triangle, healthy; red downward triangle, diseased]
      [Figure, panel (d): x axis: index of selected genes; y axis: gene expression level; same legend as panel (c)]
      [Figure, panel (e): x axis: gene indices 419, 377, 1423, 132, 1058, 1892, 1982, 350, 791, 1060, 1495, 49, 824, 892, 1296, 1863, 1924]
      [Figure, panel (f): Colon cancer data, area under curve: 0.87727; x axis: threshold; y axis: classification accuracy; curves: ROC curve, overall accuracy, class 1 accuracy, class 2 accuracy]
      (c) Sample 32 (a blue dot) is plotted with its neighbouring samples (red triangles represent cancer samples and green triangles control samples) in the 3D space of the top 3 gene variables from (b).
      (d) The profile of sample #32 (blue dots) versus the average local profile of the control (green) and cancer (red triangles) samples, using the features from (b).
      (e) The 17 most frequently selected features for all samples; the method is run 20 times for each sample.
      (f) The accuracy of personalised diagnosis across all 60 samples when the 17 markers from (e) are used in a leave-one-out cross-validation. For the ROC curve, the x axis represents the false positive rate (1 − specificity) and the y axis the true positive rate (sensitivity); the area under the curve is 0.87727 and the overall accuracy is 87.10%.
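The ROC curve and its area, as defined in caption (f), can be computed from per-sample scores as sketched below. The function names and the four-sample example are hypothetical; the scores would in practice come from the leave-one-out predictions, which are not reproduced here.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) pairs, one per threshold."""
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes are needed to draw a ROC curve")
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        # A sample is called positive when its score reaches the threshold.
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def area_under_curve(points):
    """Trapezoidal area under a curve given as points sorted by x."""
    return sum((x2 - x1) * (y1 + y2) / 2.0
               for (x1, y1), (x2, y2) in zip(points, points[1:]))


# Hypothetical scores for four samples (label 1 = cancer, 0 = control).
example_scores = [0.9, 0.8, 0.7, 0.3]
example_labels = [1, 0, 1, 0]
example_auc = area_under_curve(roc_points(example_scores, example_labels))
```

An area of 1.0 means every positive sample scores above every negative one, while 0.5 corresponds to a ranking no better than chance.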
  17. Personalised Modelling (PM) for CVD Diagnosis and Risk Prognosis
      This study aims at personalised modelling for cardiovascular disease (CVD) diagnosis.
      [Figure: The dependency of classification accuracy on the number of neighbours; x axis: number of neighbours (k); y axis: classification accuracy; curves: overall accuracy, class 1 accuracy, class 2 accuracy]
      The PM method automatically optimises the number of neighbouring samples K, which can be unique for every input sample or chosen as a single optimal value for all samples.
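One simple way to pick a single K for all samples is a grid search scored by leave-one-out accuracy, sketched below. This is an assumption for illustration: the slides do not say how the PM method searches for K, and the kNN majority vote, the function names and the toy data are hypothetical. A per-sample K could be obtained by repeating the same search within each sample's own neighbourhood.

```python
import math
from collections import Counter


def knn_predict(train, query, k):
    """Majority vote among the k training pairs (vector, label) nearest to query."""
    ranked = sorted(train, key=lambda pair: math.dist(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def loo_accuracy(data, k):
    """Leave-one-out accuracy: each sample is predicted from all the others."""
    hits = 0
    for i, (vector, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        hits += int(knn_predict(rest, vector, k) == label)
    return hits / len(data)


def best_global_k(data, candidates):
    """Return the first candidate k with the highest leave-one-out accuracy."""
    return max(candidates, key=lambda k: loo_accuracy(data, k))


# Hypothetical 1-D data: two clusters plus one class-1 sample inside class 0.
TOY_DATA = [((0.0,), 0), ((1.0,), 0), ((2.0,), 0), ((3.0,), 0), ((1.5,), 1),
            ((10.0,), 1), ((11.0,), 1), ((12.0,), 1), ((13.0,), 1)]
```

On this toy data a single stray sample misleads its nearest neighbours when K is 1, and a larger K smooths that out.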
  18. Software Demo
  19. Conclusion
      • The proposed IMPM has a major advantage: the modelling process starts with all relevant variables available for a person, rather than with a fixed set of variables required by a global model.
      • The proposed IMPM leads to better prognostic accuracy and a computed personalised profile.
      • With global optimisation using IMPM, a small set of variables (potential markers) can be identified from the selected variable set across the whole population.
      • The proposed algorithms and models of IMPM are generic and can potentially be incorporated, with certain constraints, into a variety of applications for data analysis and knowledge discovery, such as financial risk analysis, time series prediction, etc.
      • We hope that this study will motivate applications of personalised modelling in different research areas.
  20. Reference List
      1. Kasabov, N.: Global, local and personalized modelling and pattern discovery in bioinformatics: An integrated approach. Pattern Recognition Letters 28(6) (2007) 673–685.
      2. Shabo, A.: Health record banks: integrating clinical and genomic data into patient-centric longitudinal and cross-institutional health records. Personalised Medicine 4(4) (2007) 453–455.
      3. Kasabov, N., Hu, Y.: Integrated optimisation method for personalised modelling and case study applications. Int. J. Functional Informatics and Personalised Medicine 3(3) (2010) 236–256.
  21. Questions?
