Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.



Published on

Published in: Technology, Education
  • Be the first to comment

  • Be the first to like this


  1. 1. Proceedings of the 3rd International Conference on E-Health and Bioengineering - EHB 2011, 24th-26th November, 2011, Iaşi, Romania ___________________________________________________________________________________________________________________ Using Data Mining Approach for Personalizing the Therapy of Dyslalia Mirela Danubianu, Iolanda Tobolcea “Stefan cel Mare” University of Suceava, “A.I. Cuza” University of Iasi, Abstract- The use of information technology for assist the speech disorder therapy allows collecting a huge volume of data. This data may be the foundation of a data mining process that can offer models, which lead to the optimization of therapy through its personalization. This paper aims to analyze how useful is data mining in logopaedic area and to present a proposed data mining system for optimize the decision regarding the personalized therapy of dyslalia. Keywords: data mining system, dyslalia, personalized therapy I. INTRODUCTION Speech disorders are affections, which by their nature do not endanger the life of a person, but often they have negative impact on his/her life standard. Dyslalia is one of the most common speech disorders and it consists in problems of fluency, voice or how a person utters speech sounds. Dyslalia occurs frequently to children under school age, and if it is discovered and proper treated in due time, it can be often corrected, but speech therapy is a complex process and it must be adapted to each child. The Information Technology used to assist the speech therapy and to track out the patients’ evolution allowed collect a considerable amount of data. Surprisingly, these data did not lead immediately to an increase in the volume of information to support the decisions of effective therapy, because the traditional methods of data processing are not suitable in such cases. The solution is data mining. This paper aims to analyze how useful is data mining in logopaedic area and to present a proposed data mining system for optimize the decision regarding the personalized therapy of dyslalia. Section II presents some general information about data mining, and Section III refers to dyslalia in the context of speech disorders. Section IV shows how useful is data mining in logopaedic area and Section V introduce the Logo-DM system. II. DATA MINING The technological progress that we witnessed in the last years associated with the increased need for valuable and new knowledge has led to the appearance and the development as an area of research, for Knowledge Discovery in Database (KDD). It was defined as “the process of identifying valid, novel, potentially useful, and understandable patterns in data” [1], and accordingly to CRISP-DM model it consists of the following six steps: business understanding, data understanding, data preparation, modeling, evaluation of the model and deployment [2]. Even if, at a time, the CRISP-DM Fig. 1. CRISP-DM model model was considered the standard for KDD processes, the development of this kind of application in various areas has led to the occurrence of many specific models. Defined as the semiautomatic extraction of knowledge from huge volumes of data, data mining correspond to the modeling step in the knowledge discovery in databases process. It involves the application of intelligent methods in order to discover new and interesting patterns from large volumes of data, and it may facilitate the discovery from apparently unrelated data, relationships that can anticipate future problems or might solve the studied problems. So, using data mining one can solve problems which are categorized into: prediction and knowledge discovery (or description). Prediction is the main goal of data mining, but it is often preceded by description. For example, in a health care application for a disease recognition, which belongs to predictive data mining, we must mine the database for a set of rules that describes the diagnosis knowledge, and this knowledge is further used for the prediction of the disease when a new patient comes in. Each of these two problems has some associated methods. For prediction we can use classification or regression while for knowledge discovery we can use deviation detection, clustering, association rules, database segmentation or visualization. III. WHAT IS DISLALYA? A speech disorder may be defined as a problem of fluency, voice, and/or how a person utters speech sounds. It may have different causes, from organic to functional, neurological or psycho-social causes. Dyslalia is articulation disorder that consists in difficulties with the way sounds are formed and strung together. These are usually characterized by omitting, distorting a sound or substituting one sound for another. Dyslalia has the greatest
  2. 2. Proceedings of the 3rd International Conference on E-Health and Bioengineering - EHB 2011, 24th-26th November, 2011, Iaşi, Romania ___________________________________________________________________________________________________________________ frequency among handicaps of language for psychological normal subjects as well as for those with deficiencies of intellect and sensory. Thus, the opinion of Sheridan (1946) is that at the age of eight years dyslalia are in proportion of 15% for girls and in proportion of 16% for boys [3]. The prevention and treatment of speech disorders is a complex issue, stirring the interest of speech therapists, as well as of those asked to contribute to the children's language education. Early treatment of speech impairments ensures improved efficiency, as psycholinguistic automatisms are not consolidated in young children, and, through adequate educational interventions, they can be replaced by correct speech acts. Differential diagnosis will decide upon the therapy for correcting language, as psycho diagnosis allows an adequate therapeutic program, and the elaboration of a prognosis regarding the evolution of the child, along with the therapeutic process. The therapy has to be adapted to each language therapist, to each particular case, to the child’s learning rhythm and style, as well as to the level of the impairment. The key issues in dyslalia therapy are shown in Fig. 2. Complex examination provides data regarding social, cognitive and affective parameters and point out potential general development problems, and articulation or pronunciation problems Due to the complexity of the possible problems involved, the therapeutic methods will vary significantly, from analysis, synthesis to global therapeutic specific methods and techniques. This is why, a good knowledge of all these methods will allow therapists to pick the best ones for a Fig. 2. The key issues in dyslalia therapy specific case. Therapy stages are to be followed in a specific order, regardless of the methods, according to each child’s psychosomatic structure. The therapy starts with the psychological processes involved (cognitive, psychometric and affective-emotional) and it is build on stages, moments and concrete objectives, materialized in specific therapeutic techniques. The techniques materialize in exercises and procedures which the child has to perform in order to achieve the final aim: a correct speech act [4]. IV. HOW USEFUL IS DATA MINING IN LOGOPAEDIC AREA It is known that the logopaedic diagnosis is subjective and depends on the available data and on the experience of the logopaed. Therapy also must be adapted to each individual case, based on factors such as the gravity of speech impairment, the child’s characteristics and lifestyle or the parent-child’s relationships. The aim of data mining is to extract hidden patterns in data collected from speech therapist’s office, using specific techniques. If we refer logopaedic activity, the actions performed by data mining can be grouped into the following categories [4]: classification, clustering and association rules mining. Classification aims to extract models describing data class. These models, called classifiers, are constructed to predict categorical labels. Data classification is a two step process. In the first step, called the learning step or the training phase, a classifier is build to describe a predetermined set of data classes, called training set. The class label of each training tuple is provided. This is the reason why this method is known as supervised learning. Once built, the model is tested on data that are also labeled in order to determine its accuracy. If it is validated, the model will be further used to predict classes of unlabeled tuples. Related to logopaedic area classification places the people with different speech impairments in predefined classes. Thus it is possible to track the size and structure of various groups. We use classification which is based on the information contained in many predictor variables, such as personal or familial anamnesis data or related to lifestyle, to join the patients with different segments. Clustering brings together sets of entities into groups based on their similarities. A cluster is a collection of objects that are similar to one another within the same cluster and are dissimilar to the objects in other clusters. The expression for similarity is often a distance function which can take different forms based on data types used in order to describe the objects. For example, for numerical data one may use the Euclidian distance, but for categorical data this distance is not appropriate. In this case, an algorithm which uses a new distance measure based on the total mismatches of the categorical attributes of two data records is proposed [5]. In data mining, clustering is an unsupervised learning method, because, unlike classification is not based on existence of predefined classes and class-labeled training. In the logopaedic activity, clustering groups people with speech
  3. 3. Proceedings of the 3rd International Conference on E-Health and Bioengineering - EHB 2011, 24th-26th November, 2011, Iaşi, Romania ___________________________________________________________________________________________________________________ disorders on the basis of similarity of different features. It is not based on the previous definition of groups. It is an important task because helps the therapists to understand who are they patients. Clustering aims to find subsets of a predetermined segment, with homogeneous behavior towards various methods of therapy that can be effectively targeted by a specific therapy. The last data mining method we consider, association rules aim to find out associations between different data which seem to have no semantic dependence. Association rules were applied initially to data sets consisting of variable length transactions, but in time, some algorithms has been adapted for data stored in relational databases. Since, the goal of association rules mining is to identify combinations of items that often occur together, in the personalized speech therapy area its task is to determine why a specific therapy program has been successful on a segment of patients with speech disorders and on the other was ineffective. So, as a conclusion, we may say that data mining is a useful tool, which helps the therapists to design proper personalized schema for speech disorders therapy. V. LOGO-DM SYSTEM Following the natural tendency to use computers to assist and support the process of speech therapy, in the last decade have occurred more computer-based speech therapy (CBST) systems. Such a system is TERAPERS project developed within the Center for Computer Research in the University "Stefan cel Mare" of Suceava [4]. This project has proposed to provide a system which is able to assist teachers in their speech therapy of dislalya and to track out how the patients respond to various personalized therapy programs. TERAPERS system is currently used by the therapists from Regional Speech Therapy Center of Suceava. It is a complex system that contains a fix part - the intelligent system called LOGOMON, placed on computers located in therapists’ offices and a set of mobile devices. The desire to get better results in shorter intervals and to decrease the costs of therapy have led to the need for answers to questions about the expected duration of therapy, the results expected at the end of each stage of therapy or the establishment of a complex of exercises which correspond to the profile of each patient. All this may be the subject of predictions obtained by applying data mining techniques on data collected by using TERAPERS. To achieve this goal we propose the development of Logo-DM system. A. Data set description As we have noted above Logo-DM system aims to help the speech disorders therapy by the optimization of the personalizing process of dyslalia therapy. In order to adapt the therapy programs the therapist must perform a complex examination of children. This is materialized in recording of relevant data relating to personal and family anamnesis. The complex examination of how the children articulate the phonemes in various constructions allows also a diagnosis for the patient. Anamnesis data collected may provide information relative to various causes that may negatively influence the normal development of the language. It contains historical data and data provided by the cognitive and personality examination. On provide to the applied personalized therapy programs data such as number of sessions/week, exercises for each phase of therapy and the changes of the original program according to the patient evolution. In addition, the report downloaded from the mobile device collects data on the efforts of child self-employment. These data refer to the exercises done, the number of repetitious for each of these exercises and the results obtained. The tracking of child’s progress materializes data which indicate the moment of assessing the child and his status at that time. All these data are stored in a relational database, composed of 60 tables. Data stored in the TERAPERS’s database together with data from other sources (eg demographic data, medical or psychological research) is the set of raw data that will be the subject of data mining. Fig. 3 presents the complete sequence of operations applied on these data. A first remark is related to the fact that we have a relational database, characterized by a high degree of normalization, making the various characteristics to be in different tables. Creating target data set is accomplished through a join of tables containing useful features followed by a projection on a superset of appropriate attributes, as is shown in (1) ∏I (T1 ><T2...><Tk ) i (1) where: I is a superset of the attributes regarding the useful characteristics and T1 …Tk is the set of tables containing the attributes in the list of projection. For example, target data set necessary to establish the profile of children with speech disorders, can be obtained by joining tables which contain: general data about children, family and personal anamnesis, data on complex evaluation and diagnosis associated Another remark concerning the data contained in the database relates to the encoding characteristics by the values specified in the various fields. These codes were converted so that the data set on which apply data mining algorithms contain descriptive understandable values. It is preferable to change the numerical values during the data preprocessing step, so that the final data set contains the descriptive values of characteristics. Fig. 3. The complete operations’ flow for data used by Logo-DM
  4. 4. Proceedings of the 3rd International Conference on E-Health and Bioengineering - EHB 2011, 24th-26th November, 2011, Iaşi, Romania ___________________________________________________________________________________________________________________ B. System Architecture To execute the operations noted above we propose the architecture presented in Fig. 4. Fig. 5 Execution times for associatin rule minning in Logo-DM VI. CONCLUSION Fig. 4 Logo-DM architecture It is a client-server architecture on two levels, where the tasks are divided as follows. The client component contains the user interface, the pattern evaluation module connected to a Knowledge base and the pattern visualization module. The user interface (GUI) allows the user to communicate with the system in order to select the task to perform, to select and submit the request for the datasets on which data mining needs to be applied. Pattern evaluation and the postprocessing step consisting in pattern visualization are performed also on the client. The knowledge base is the module where the background knowledge is stored. The more difficult computational tasks of data mining operations are carried out on the server. Here, the data mining kernel contains modules able to perform classifications and association rule detection. Supplementary the pre-processing data module allows data to become suitable for applying data mining algorithms. C. Issues Related to Implementation First, we must note that the dataset used for this work contains 300 cases and 96 attributes. As noted above, the data source has several features that can affect performance of algorithms. So, to find the best solutions we have performed a series of tests for mining association rules algrithms. First, we have implemented Apriori inside to the database which contains the dataset, in order to use the database indexing capability and query processing and optimization facilities provided by the database management system. We have used the stored procedure approach and two SQL approaches – a k-way join and an implementation based on subqueries. The results obtained are presented in Fig. 5. We found that in this case, Apriori algorithm has an unusual behaviour. It means that the candidate itemsets do not decrease starting from the third iteration, but grows, even exponentially to about half of the iterations number. This leads to very large execution times. In these conditions we have also tested FP-Growth algorithm which has given better results relative to time of execution, because it finds the frequent itemsets without candidate generation. Data mining technology can be a useful tool if one try to optimize the speech disorder therapy. Searching in big volumes of data it is able to provide information that enables the implementation of personalized therapy programs adapted to the characteristics of each child. Finally one may expect a decreased duration of therapy, increased possibilities of achieving superior results and ultimately a lower cost of therapy. We have proposed a data mining system that aims to use data from TERAPERS – a CBST implemented at “Stefan cel Mare” University of Suceava- in order to help to speech therapy optimization. ACKNOWLEDGMENT This paper was supported by the project "Progress and development through post-doctoral research and innovation in engineering and applied sciences– PRiDE - Contract no. POSDRU/89/1.5/S/57083", project co-funded from European Social Fund through Sectorial Operational Program Human Resources 2007-2013. REFERENCES [1] Fayyad, U., Piatetsky-Shapiro, G., and Smyth, P. (1996)The KDD process pentru extracting useful knowledge from volumes of data. Communications of the ACM, 39(11):, pag.27-34, 1996 [2] Wirth, R. and Hipp, ( 2000) J. CRISP-DM: Towards a standard process model for data mining. In Proceedings of the 4th International Conference on the Practical Applications of Knowledge Discovery and Data Mining, pages 29-39, Manchester, UK., 2000 [3] Danubianu M. , Pentiuc S.G., Schipor O., Nestor M. , Ungurean I., (2008) Distributed Intelligent System for Personalized Therapy of Speech Disorders, Proceedings of ICCGI08, Atena, 2008 [4] Danubianu M., Pentiuc St. Gh., Socaciu T., Towards the Optimized Personalized Therapy of Speech Disorders by Data Mining Techniques, The Fourth International Multi Conference on Computing in the Global Information Technology ICCGI 2009, Vol: CD, 23-29 August, Cannes - La Bocca, France, 2009 [5] Huang, Z., Extension to the k-Means Algorithm for Clustering Large Data Sets with Categorical Values, Data Mining and Knowledge Discovery, 2, p. 283-304, 1998 [6] last visited April 2011 [7] Oana Geman, Data Mining Tools and Deep Brain Stimulation Target Evaluation, The 6th IEEE International Conference on Intelligent Data Acquisition and Advanced Computing Systems, Technology and Applications, IDAACS’2000