Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Rumanía
1. Proceedings of the 3rd International Conference on E-Health and Bioengineering - EHB 2011,
24th-26th November, 2011, Iaşi, Romania
___________________________________________________________________________________________________________________
Using Data Mining Approach for Personalizing the
Therapy of Dyslalia
Mirela Danubianu, Iolanda Tobolcea
“Stefan cel Mare” University of Suceava, “A.I. Cuza” University of Iasi
mdanub@eed.usv.ro, itobolcea@yahoo.com
Abstract- The use of information technology for assist the
speech disorder therapy allows collecting a huge volume of data.
This data may be the foundation of a data mining process that
can offer models, which lead to the optimization of therapy
through its personalization. This paper aims to analyze how
useful is data mining in logopaedic area and to present a
proposed data mining system for optimize the decision
regarding the personalized therapy of dyslalia.
Keywords: data mining system, dyslalia, personalized therapy
I.
INTRODUCTION
Speech disorders are affections, which by their nature do
not endanger the life of a person, but often they have negative
impact on his/her life standard. Dyslalia is one of the most
common speech disorders and it consists in problems of
fluency, voice or how a person utters speech sounds. Dyslalia
occurs frequently to children under school age, and if it is
discovered and proper treated in due time, it can be often
corrected, but speech therapy is a complex process and it
must be adapted to each child. The Information Technology
used to assist the speech therapy and to track out the patients’
evolution allowed collect a considerable amount of data.
Surprisingly, these data did not lead immediately to an
increase in the volume of information to support the decisions
of effective therapy, because the traditional methods of data
processing are not suitable in such cases. The solution is data
mining.
This paper aims to analyze how useful is data mining in
logopaedic area and to present a proposed data mining system
for optimize the decision regarding the personalized therapy
of dyslalia. Section II presents some general information
about data mining, and Section III refers to dyslalia in the
context of speech disorders. Section IV shows how useful is
data mining in logopaedic area and Section V introduce the
Logo-DM system.
II.
DATA MINING
The technological progress that we witnessed in the last
years associated with the increased need for valuable and new
knowledge has led to the appearance and the development as
an area of research, for Knowledge Discovery in Database
(KDD). It was defined as “the process of identifying valid,
novel, potentially useful, and understandable patterns in data”
[1], and accordingly to CRISP-DM model it consists of the
following six steps: business understanding, data
understanding, data preparation, modeling, evaluation of the
model and deployment [2]. Even if, at a time, the CRISP-DM
Fig. 1. CRISP-DM model
model was considered the standard for KDD processes, the
development of this kind of application in various areas has
led to the occurrence of many specific models.
Defined as the semiautomatic extraction of knowledge
from huge volumes of data, data mining correspond to the
modeling step in the knowledge discovery in databases
process. It involves the application of intelligent methods in
order to discover new and interesting patterns from large
volumes of data, and it may facilitate the discovery from
apparently unrelated data, relationships that can anticipate
future problems or might solve the studied problems.
So, using data mining one can solve problems which are
categorized into: prediction and knowledge discovery (or
description). Prediction is the main goal of data mining, but it
is often preceded by description. For example, in a health care
application for a disease recognition, which belongs to
predictive data mining, we must mine the database for a set of
rules that describes the diagnosis knowledge, and this
knowledge is further used for the prediction of the disease
when a new patient comes in. Each of these two problems has
some associated methods. For prediction we can use
classification or regression while for knowledge discovery we
can use deviation detection, clustering, association rules,
database segmentation or visualization.
III. WHAT IS DISLALYA?
A speech disorder may be defined as a problem of fluency,
voice, and/or how a person utters speech sounds. It may have
different causes, from organic to functional, neurological or
psycho-social causes.
Dyslalia is articulation disorder that consists in difficulties
with the way sounds are formed and strung together. These
are usually characterized by omitting, distorting a sound or
substituting one sound for another. Dyslalia has the greatest
2. Proceedings of the 3rd International Conference on E-Health and Bioengineering - EHB 2011,
24th-26th November, 2011, Iaşi, Romania
___________________________________________________________________________________________________________________
frequency among handicaps of language for psychological
normal subjects as well as for those with deficiencies of
intellect and sensory. Thus, the opinion of Sheridan (1946) is
that at the age of eight years dyslalia are in proportion of 15%
for girls and in proportion of 16% for boys [3]. The
prevention and treatment of speech disorders is a complex
issue, stirring the interest of speech therapists, as well as of
those asked to contribute to the children's language education.
Early treatment of speech impairments ensures improved
efficiency, as psycholinguistic automatisms are not
consolidated in young children, and, through adequate
educational interventions, they can be replaced by correct
speech acts.
Differential diagnosis will decide upon the therapy for
correcting language, as psycho diagnosis allows an adequate
therapeutic program, and the elaboration of a prognosis
regarding the evolution of the child, along with the
therapeutic process. The therapy has to be adapted to each
language therapist, to each particular case, to the child’s
learning rhythm and style, as well as to the level of the
impairment. The key issues in dyslalia therapy are shown in
Fig. 2.
Complex examination provides data regarding social,
cognitive and affective parameters and point out potential
general development problems, and articulation or
pronunciation problems
Due to the complexity of the possible problems involved,
the therapeutic methods will vary significantly, from analysis,
synthesis to global therapeutic specific methods and
techniques. This is why, a good knowledge of all these
methods will allow therapists to pick the best ones for a
Fig. 2. The key issues in dyslalia therapy
specific case. Therapy stages are to be followed in a specific
order, regardless of the methods, according to each child’s
psychosomatic structure. The therapy starts with the
psychological processes involved (cognitive, psychometric
and affective-emotional) and it is build on stages, moments
and concrete objectives, materialized in specific therapeutic
techniques. The techniques materialize in exercises and
procedures which the child has to perform in order to achieve
the final aim: a correct speech act [4].
IV. HOW USEFUL IS DATA MINING IN LOGOPAEDIC AREA
It is known that the logopaedic diagnosis is subjective and
depends on the available data and on the experience of the
logopaed. Therapy also must be adapted to each individual
case, based on factors such as the gravity of speech
impairment, the child’s characteristics and lifestyle or the
parent-child’s relationships.
The aim of data mining is to extract hidden patterns in data
collected from speech therapist’s office, using specific
techniques. If we refer logopaedic activity, the actions
performed by data mining can be grouped into the following
categories [4]: classification, clustering and association rules
mining.
Classification aims to extract models describing data class.
These models, called classifiers, are constructed to predict
categorical labels. Data classification is a two step process. In
the first step, called the learning step or the training phase, a
classifier is build to describe a predetermined set of data
classes, called training set. The class label of each training
tuple is provided. This is the reason why this method is
known as supervised learning. Once built, the model is tested
on data that are also labeled in order to determine its
accuracy. If it is validated, the model will be further used to
predict classes of unlabeled tuples. Related to logopaedic area
classification places the people with different speech
impairments in predefined classes. Thus it is possible to track
the size and structure of various groups. We use classification
which is based on the information contained in many
predictor variables, such as personal or familial anamnesis
data or related to lifestyle, to join the patients with different
segments.
Clustering brings together sets of entities into groups based
on their similarities. A cluster is a collection of objects that
are similar to one another within the same cluster and are
dissimilar to the objects in other clusters. The expression for
similarity is often a distance function which can take different
forms based on data types used in order to describe the
objects. For example, for numerical data one may use the
Euclidian distance, but for categorical data this distance is not
appropriate. In this case, an algorithm which uses a new
distance measure based on the total mismatches of the
categorical attributes of two data records is proposed [5]. In
data mining, clustering is an unsupervised learning method,
because, unlike classification is not based on existence of
predefined classes and class-labeled training. In the
logopaedic activity, clustering groups people with speech
3. Proceedings of the 3rd International Conference on E-Health and Bioengineering - EHB 2011,
24th-26th November, 2011, Iaşi, Romania
___________________________________________________________________________________________________________________
disorders on the basis of similarity of different features. It is
not based on the previous definition of groups. It is an
important task because helps the therapists to understand who
are they patients. Clustering aims to find subsets of a
predetermined segment, with homogeneous behavior towards
various methods of therapy that can be effectively targeted by
a specific therapy.
The last data mining method we consider, association rules
aim to find out associations between different data which
seem to have no semantic dependence. Association rules were
applied initially to data sets consisting of variable length
transactions, but in time, some algorithms has been adapted
for data stored in relational databases. Since, the goal of
association rules mining is to identify combinations of items
that often occur together, in the personalized speech therapy
area its task is to determine why a specific therapy program
has been successful on a segment of patients with speech
disorders and on the other was ineffective.
So, as a conclusion, we may say that data mining is a useful
tool, which helps the therapists to design proper personalized
schema for speech disorders therapy.
V. LOGO-DM SYSTEM
Following the natural tendency to use computers to assist
and support the process of speech therapy, in the last decade
have occurred more computer-based speech therapy (CBST)
systems. Such a system is TERAPERS project developed
within the Center for Computer Research in the University
"Stefan cel Mare" of Suceava [4]. This project has proposed
to provide a system which is able to assist teachers in their
speech therapy of dislalya and to track out how the patients
respond to various personalized therapy programs.
TERAPERS system is currently used by the therapists from
Regional Speech Therapy Center of Suceava.
It is a complex system that contains a fix part - the
intelligent system called LOGOMON, placed on computers
located in therapists’ offices and a set of mobile devices.
The desire to get better results in shorter intervals and to
decrease the costs of therapy have led to the need for answers
to questions about the expected duration of therapy, the
results expected at the end of each stage of therapy or the
establishment of a complex of exercises which correspond to
the profile of each patient. All this may be the subject of
predictions obtained by applying data mining techniques on
data collected by using TERAPERS. To achieve this goal we
propose the development of Logo-DM system.
A. Data set description
As we have noted above Logo-DM system aims to help the
speech disorders therapy by the optimization of the
personalizing process of dyslalia therapy.
In order to adapt the therapy programs the therapist must
perform a complex examination of children. This is
materialized in recording of relevant data relating to personal
and family anamnesis. The complex examination of how the
children articulate the phonemes in various constructions
allows also a diagnosis for the patient.
Anamnesis data collected may provide information relative
to various causes that may negatively influence the normal
development of the language. It contains historical data and
data provided by the cognitive and personality examination.
On provide to the applied personalized therapy programs
data such as number of sessions/week, exercises for each
phase of therapy and the changes of the original program
according to the patient evolution. In addition, the report
downloaded from the mobile device collects data on the
efforts of child self-employment. These data refer to the
exercises done, the number of repetitious for each of these
exercises and the results obtained.
The tracking of child’s progress materializes data which
indicate the moment of assessing the child and his status at
that time. All these data are stored in a relational database,
composed of 60 tables.
Data stored in the TERAPERS’s database together with
data from other sources (eg demographic data, medical or
psychological research) is the set of raw data that will be the
subject of data mining. Fig. 3 presents the complete sequence
of operations applied on these data.
A first remark is related to the fact that we have a relational
database, characterized by a high degree of normalization,
making the various characteristics to be in different tables.
Creating target data set is accomplished through a join of
tables containing useful features followed by a projection on
a superset of appropriate attributes, as is shown in (1)
∏I (T1 ><T2...><Tk )
i
(1)
where: I is a superset of the attributes regarding the useful
characteristics and T1 …Tk is the set of tables containing the
attributes in the list of projection. For example, target data
set necessary to establish the profile of children with speech
disorders, can be obtained by joining tables which contain:
general data about children, family and personal anamnesis,
data on complex evaluation and diagnosis associated
Another remark concerning the data contained in the
database relates to the encoding characteristics by the values
specified in the various fields. These codes were converted so
that the data set on which apply data mining algorithms
contain descriptive understandable values. It is preferable to
change the numerical values during the data preprocessing
step, so that the final data set contains the descriptive values
of characteristics.
Fig. 3. The complete operations’ flow for data used by Logo-DM
4. Proceedings of the 3rd International Conference on E-Health and Bioengineering - EHB 2011,
24th-26th November, 2011, Iaşi, Romania
___________________________________________________________________________________________________________________
B. System Architecture
To execute the operations noted above we propose the
architecture presented in Fig. 4.
Fig. 5 Execution times for associatin rule minning in Logo-DM
VI. CONCLUSION
Fig. 4 Logo-DM architecture
It is a client-server architecture on two levels, where the
tasks are divided as follows. The client component contains
the user interface, the pattern evaluation module connected to
a Knowledge base and the pattern visualization module.
The user interface (GUI) allows the user to communicate
with the system in order to select the task to perform, to select
and submit the request for the datasets on which data mining
needs to be applied. Pattern evaluation and the postprocessing step consisting in pattern visualization are
performed also on the client. The knowledge base is the
module where the background knowledge is stored.
The more difficult computational tasks of data mining
operations are carried out on the server. Here, the data mining
kernel contains modules able to perform classifications and
association rule detection. Supplementary the pre-processing
data module allows data to become suitable for applying data
mining algorithms.
C. Issues Related to Implementation
First, we must note that the dataset used for this work
contains 300 cases and 96 attributes. As noted above, the data
source has several features that can affect performance of
algorithms. So, to find the best solutions we have performed a
series of tests for mining association rules algrithms. First, we
have implemented Apriori inside to the database which
contains the dataset, in order to use the database indexing
capability and query processing and optimization facilities
provided by the database management system. We have used
the stored procedure approach and two SQL approaches – a
k-way join and an implementation based on subqueries. The
results obtained are presented in Fig. 5.
We found that in this case, Apriori algorithm has an unusual
behaviour. It means that the candidate itemsets do not
decrease starting from the third iteration, but grows, even
exponentially to about half of the iterations number. This
leads to very large execution times. In these conditions we
have also tested FP-Growth algorithm which has given better
results relative to time of execution, because it finds the
frequent itemsets without candidate generation.
Data mining technology can be a useful tool if one try to
optimize the speech disorder therapy. Searching in big
volumes of data it is able to provide information that enables
the implementation of personalized therapy programs adapted
to the characteristics of each child. Finally one may expect a
decreased duration of therapy, increased possibilities of
achieving superior results and ultimately a lower cost of
therapy. We have proposed a data mining system that aims to
use data from TERAPERS – a CBST implemented at “Stefan
cel Mare” University of Suceava- in order to help to speech
therapy optimization.
ACKNOWLEDGMENT
This paper was supported by the project "Progress and
development through post-doctoral research and innovation in
engineering and applied sciences– PRiDE - Contract no.
POSDRU/89/1.5/S/57083", project co-funded from European
Social Fund through Sectorial Operational Program Human
Resources 2007-2013.
REFERENCES
[1]
Fayyad, U., Piatetsky-Shapiro, G., and Smyth, P. (1996)The KDD
process pentru extracting useful knowledge from volumes of data.
Communications of the ACM, 39(11):, pag.27-34, 1996
[2] Wirth, R. and Hipp, ( 2000) J. CRISP-DM: Towards a standard process
model for data mining. In Proceedings of the 4th International
Conference on the Practical Applications of Knowledge Discovery and
Data Mining, pages 29-39, Manchester, UK., 2000
[3] Danubianu M. , Pentiuc S.G., Schipor O., Nestor M. , Ungurean I.,
(2008) Distributed Intelligent System for Personalized Therapy of
Speech Disorders, Proceedings of ICCGI08, Atena, 2008
[4] Danubianu M., Pentiuc St. Gh., Socaciu T., Towards the Optimized
Personalized Therapy of Speech Disorders by Data Mining Techniques,
The Fourth International Multi Conference on Computing in the Global
Information Technology ICCGI 2009, Vol: CD, 23-29 August, Cannes
- La Bocca, France, 2009
[5] Huang, Z., Extension to the k-Means Algorithm for Clustering Large
Data Sets with Categorical Values, Data Mining and Knowledge
Discovery, 2, p. 283-304, 1998
[6] www.salford-systems.com/- last visited April 2011
[7] Oana Geman, Data Mining Tools and Deep Brain Stimulation Target
Evaluation, The 6th IEEE International Conference on Intelligent Data
Acquisition and Advanced Computing Systems, Technology and
Applications, IDAACS’2000