Using Data Warehouse and Data Mining Resources for Ongoing Assessment of Distance Learning



  1. Using Data Warehouse and Data Mining Resources for Ongoing Assessment of Distance Learning

     Daniela Resende Silva (1), Marina Teresa Pires Vieira
     Department of Computer Sciences
     UFSCar - Federal University of São Carlos
     Rod. Washington Luís, Km 235, Caixa Postal 676
     13565-905 / São Carlos – SP – Brazil
     Phone/Fax: (55 16) 260-8232

     (1) MPhil scholarship, CAPES/Brazil

     0-473-08801-0/01 $20.00 © 2002 IEEE

     Abstract

     This paper discusses the use of Data Warehouse and Data Mining resources to aid in the assessment of distance learning of students enrolled in distance courses. Information considered relevant for the assessment of distance learning is presented, as is the modeling of a data warehouse to store this information and the MultiStar environment, which allows for knowledge discovery to be performed in the data warehouse.

     1. Introduction

     A variety of applications have benefited from the use of Data Warehousing technology [1, 2, 3] to support management analyses, which can be obtained through the use of Data Mining [4]. The joint use of Data Warehousing and Data Mining techniques is a trend in KDD – Knowledge Discovery in Data Warehousing applications (referred to herein as KDW – Knowledge Discovery in Data Warehouse), since the data in a warehouse are better prepared for data mining.

     This paper discusses how data warehouse and data mining resources can be used for the assessment of distance learning and proposes the MultiStar environment for KDW to support this assessment.

     Several studies focus on supporting student assessment, among them those of [5, 6] and [7]. Some studies apply data mining resources to Web log information [8, 9, 10, 11].

     The work proposed herein presents an approach that differs from the existing ones for the ongoing assessment of distance learning, using some of the aspects relating to those utilized in the above cited studies.

     Section 2 provides a set of information to guide the implementation of ongoing assessment of learning in distance learning environments, while Section 3 briefly discusses the modeling of a data warehouse based on the set of information proposed. Section 4 presents the implementation of this data warehouse using the MultiStar environment, and finally, Section 5 lists our conclusions to this paper.

     2. Ongoing Assessment of Distance Learning

     The teaching-learning process naturally produces information about the status of a student’s activities in a course. The study of this information and the decisions based on this study characterize the ongoing assessment of the learner.

     In most computational environments for distance learning involving some kind of student assessment, this is done by collecting the student’s interactions with the environment (the student’s actions). Analyzing the student’s history of interactions can reveal how the manner in which he conducts his studies influences the extent to which he profits from the course. Today there is a wide range of environments available for distance courses. To identify how these environments assess the student’s assimilation, a survey was made of the ones most frequently cited in the literature, as documented by
  2. [12]. Five mechanisms to support the ongoing assessment of distance learning were identified through this survey:

     − tracking of the student’s actions;
     − redirection through evaluation;
     − records of messages from lists;
     − records of messages from forums;
     − records of messages from chats.

     The results of this survey show a tendency for these environments to support the tracking of some student activities to monitor his learning.

     Most of these environments contain a small set of information that tracks the path the student has taken during the course. This set varies from one environment to another, according to a criterion not divulged by its designers.

     Although there is no standard set of requisites to assess the student’s learning, there are clearly two types of information to guide the implementation of ongoing assessment of learning in distance learning environments:

     − Information about the student’s actions and communication [13]. This information can aid in understanding how the student’s interactions with the environment and with other course participants influence his learning. Two types of student interaction can be identified:

       − Student-Person Interactions: those in which the student interacts with other course participants, such as the teacher, the assistant teacher or another student, through some communication mechanism. With regard to these interactions, it is interesting to know, for instance, the subject of the message and the mechanism (chat, email, list, forum, etc.) employed.

       − Student-Material Interactions: those in which the student interacts with the didactic material (content pages, tests, exercises, etc.). About these interactions, it is interesting to know, for example, how much time was spent on them, whether the interaction consisted of downloading or uploading, which discipline the material belongs to, what link was used to access the material, etc.

     − Information about the student’s activities in the course [8]. This kind of information, which depends on a rule established by the teacher, strongly influences the determination of whether or not the student has actually learned. Each activity proposed by the teacher may have a result: for instance, participation or not in a conference, the grade given for an assignment, and so on. This type of information depends on the activities proposed for the course and the way the teacher has chosen to validate them, i.e., the criterion used to decide whether or not the student has carried them out.

     3. Ongoing Assessment of Distance Learning using Data Warehouse Resources

     The relevant information for ongoing assessment of distance learning can be stored in a data warehouse to support management decisions. This study explores the use of a data warehouse with these characteristics for the application of data mining techniques, allowing for patterns of student behavior to be identified, thereby favoring decision making for ongoing assessment of the student.

     In this work, the modeling of the data warehouse follows the fact constellation schema [2], incorporating generalization hierarchies for fact or dimension tables of the data warehouse.

     Figure 1 constitutes part of the data warehouse that was developed based on the information discussed in the previous section. The gray boxes in the figures represent fact tables, i.e., tables that store information about a subject, about which measures (or facts) are defined (highlighted in bold). The remaining boxes represent the dimension tables from which one wishes to store the values that determine the fact table measures. The representation of a fact table with its dimension tables is called a Star Schema. Parts A and B of Figure 1 represent two star schemas.

     Figure 1. Fact constellation schema for the Activity and Personal Interaction.
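The idea of two star schemas sharing dimension tables can be illustrated with a small sketch. This is not the authors' schema or the MultiStar implementation: it assumes an in-memory SQLite database, keeps only two of the shared dimensions (Student and Time), and uses hypothetical column names and made-up sample rows. Each fact is aggregated per student before the two are joined, so rows from one fact table do not multiply rows from the other.

```python
import sqlite3

# Hypothetical, heavily reduced schema: two shared dimensions, two fact tables.
SCHEMA = """
CREATE TABLE Student (student_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Time    (time_id    INTEGER PRIMARY KEY, year INTEGER);
CREATE TABLE Activity (
    student_id   INTEGER REFERENCES Student(student_id),
    time_id      INTEGER REFERENCES Time(time_id),
    accomplished INTEGER               -- stands in for Accomplished?, 1 = yes
);
CREATE TABLE PersonalInteraction (
    student_id          INTEGER REFERENCES Student(student_id),
    time_id             INTEGER REFERENCES Time(time_id),
    type_of_interaction TEXT,          -- chat, email, list or forum
    reply               INTEGER        -- stands in for Reply?, 1 = yes
);
"""


def build_warehouse():
    """Create the toy constellation and load a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO Student VALUES (?, ?)",
                     [(1, "Ana"), (2, "Bruno")])
    conn.executemany("INSERT INTO Time VALUES (?, ?)",
                     [(1, 2000), (2, 2001)])
    conn.executemany("INSERT INTO Activity VALUES (?, ?, ?)",
                     [(1, 1, 1), (1, 2, 1), (2, 1, 0)])
    conn.executemany("INSERT INTO PersonalInteraction VALUES (?, ?, ?, ?)",
                     [(1, 1, "chat", 1), (1, 1, "email", 0), (1, 2, "chat", 1)])
    return conn


def interactions_vs_activities(conn):
    """Cross the two facts through the shared Student dimension."""
    query = """
    WITH act AS (
        SELECT student_id, SUM(accomplished) AS done
        FROM Activity GROUP BY student_id
    ),
    inter AS (
        SELECT student_id, COUNT(*) AS n
        FROM PersonalInteraction GROUP BY student_id
    )
    SELECT s.name, COALESCE(act.done, 0), COALESCE(inter.n, 0)
    FROM Student AS s
    LEFT JOIN act   ON act.student_id   = s.student_id
    LEFT JOIN inter ON inter.student_id = s.student_id
    ORDER BY s.student_id
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for row in interactions_vs_activities(build_warehouse()):
        print(row)
```

Each returned row holds a student's name, the number of accomplished activities and the number of personal interactions, which is the kind of joint view over two facts that a constellation makes possible.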
  3. Information about the activities developed by the student during the course can be stored in the data warehouse, as illustrated in part A of Figure 1, while information about the student-person interactions can follow the model shown in part B of Figure 1.

     The PersonalInteraction fact table shown in part B of Figure 1 is specialized into four different interactions: InteractionViaChat, InteractionViaEmail, InteractionViaList and InteractionViaForum. The semantics of this hierarchical structure is translated into the measures and dimensions of the specialized facts. These fact tables contain all the dimensions and measures of PersonalInteraction. In analytical terms, this represents the possibility of examining, in each fact of the specialization, the dimensions and measures common to all the personal interactions as well as the specific information about each interaction (via chat, via email, via list or via forum), considering the instances pertinent to the fact table in question. For analytical purposes, the PersonalInteraction fact table is used when one wishes to analyze measures and attributes common to all the types of personal interaction.

     An analysis of Figure 1 reveals that the stars of the Activity and PersonalInteraction facts have common dimensions: Student, Course, Discipline, Institution, Group and Time. Joining these two stars forms a constellation with two facts that share six dimensions. This union is advantageous because, in addition to avoiding the duplication of data, in practice it means that the measures and dimensions of these two facts can be analysed jointly, crossing information about the interactions and activities developed by the students. One kind of analysis that can be made, for example, is to check whether the students’ interactions influence the performance of the course activities.

     Figure 2 illustrates the fact constellation schema of the data warehouse developed to assess distance learning. A fact constellation is a collection of stars.

     In addition to the information about activities and personal interactions, this data warehouse contains the following information:

     − the student’s interaction (access) with the didactic material (centered on the StudentMaterialInteraction fact table), involving the attributes DurationOfTheAccess, LinkOfTheMaterialAccessed, TypeOfAccess (download or upload), etc.
     − the tests the student has taken (centered on the Test fact table), with the attributes Grade, NumberOfIncorrectlyAnsweredQuestions, etc.
     − whether the student has passed the tests upon conclusion of a discipline (centered on the Approval fact table), with the attributes Dropped-out?, Passed?, TemporarilySuspended?, etc.

     For purposes of legibility, Figure 2 groups the Student, Course, Discipline, Institution, Time and Group dimensions shared by all the facts into one entity to avoid the pollution caused by linking.

     Figure 2. Fact constellation for ongoing assessment.

     The data warehouse in Figure 2 shows various indirect
  4. relationships among the fact tables. This opens up a wide range of possibilities when combining measures and dimensions to carry out analyses, e.g.,

     − analyze whether there is a relation between a student’s score, his personal interactions and his accessing of the didactic material (involving the Test, PersonalInteraction and StudentMaterialInteraction facts);
     − verify the influence of factors such as communication and study on learning (involving the PersonalInteraction and StudentMaterialInteraction facts);
     − discover if the type of connection a student possesses influences the number of times he accesses the environment (involving the Student dimension and the StudentMaterialInteraction fact);
     − find activities that are more effective in given courses, age groups, level of schooling, etc. (involving the Course and Student dimensions and the Activity fact).

     These analyses can be made using the environment for Knowledge Discovery in Data Warehouses (KDW) described in the following section.

     4. A KDW Application for Assessment of Distance Learning

     Commercial tools can be used to carry out management analyses in the data warehouse presented in the previous section; however, they support simple analyses, i.e., using only one fact and its dimension tables, e.g., identify the profiles of students more prone to dropping out of a course (involving the Student dimension table and the Approval fact table).

     However, there are important analyses that can be performed in this warehouse which require a comparison of the different aspects of the student’s learning process. Examples of this type of analysis were given in the previous section.

     To support this type of broad analysis, i.e., those involving more than one fact (star), an environment called MultiStar was developed for knowledge discovery [14]. This environment allows the selection of the information to which data mining tasks will be applied, providing resources for the recognition of fact constellations and the treatment of generalization hierarchies. By recognizing fact constellations, MultiStar allows for analyses involving facts that belong to the same constellation, i.e., facts that share dimensions. The treatment of generalization hierarchies involving the relationship of inheritance among the fact or dimension tables of a data warehouse does not require the user to understand the concept on which it is based.

     Figures 3 and 4 exemplify the use of the MultiStar environment for knowledge discovery in the data warehouse in Figure 2. These figures portray how the selection and mining of information in this environment can be performed. Field 1 of Figure 3 represents the fact tables of Figure 2 which, upon being expanded (fields 2, 3 and 4), show the attributes that represent the subjects under analysis in the fact table (called measures or facts) and information about the related dimension tables.

     Figure 3. MultiStar: selecting information.

     The purpose of the data selection process illustrated in Figure 3 is to support an analysis of the influence of the chat interactions on the student’s activities. Thus, a selection was made in the data warehouse of the Student dimension common to the Activity (field 2), Approval (field 3) and PersonalInteraction (field 4) fact tables, the TypeOfInteraction and Reply? measures in the PersonalInteraction fact table, the Passed? measure of the Approval fact table, and the Accomplished? measure of the Activity fact table. This analysis was restricted to students of the ATA Institution during the period of 1999 to 2001. This led to the creation of filters (field 5) for the attribute Name of the dimension Institution (field 6) and for the attribute Year of the dimension Time (field 7), both of which are attributes of dimensions common to the three fact tables.

     The information selected is stored in a data cube (2) called ‘Interactions and Activities’, which contains all the attributes of the Student dimension table (as shown in Figure 1) and the measures cited above.

     (2) A data cube [4] is a structure composed of dimensions and facts organized to facilitate analyses of the data.

     In the MultiStar environment, for a generalization hierarchy between fact or dimension tables, characteristics inherited from the parent tables are displayed automatically in the child tables, making the hierarchies
  5. clear to the user. With regard to the fact constellations, when a dimension or measure is selected, the MultiStar environment allows for the selection of only the fact tables that are related directly or indirectly with the selected information.

     Figure 4. MultiStar: mining data.

     Once the data has been selected, MultiStar provides resources for the application of data mining tasks so that patterns can be extracted based on those data. Figure 4 shows the interface for the application of data mining on the data selected in Figure 3.

     In Field 1 of Figure 4, the user selects the cube to be analyzed (the ‘Interactions and Activities’ cube was selected here). Field 2 shows the attributes of the selected cube (dimensions and measures). The user must choose one attribute from each dimension of the cube (the attribute TypeOfConnection from the Student dimension table was selected). These attributes, together with the measures of the cube (Accomplished? from the Activity fact table, Passed? from the Approval fact table, and TypeOfInteraction and Reply? from the PersonalInteraction fact table, in our example), compose a view to be mined. Field 5 shows the cube filter selected.

     A mining task is selected in Field 3, and the parameters for this task are defined in Field 4. The data mining tasks available in the environment are Association [15], Classification [16] and Clustering [17]. Each of these tasks allows the data to be analyzed from a different standpoint.

     The data mining task chosen was Classification, with the purpose of classifying the student according to the measure Passed?.

     When this mining task is performed, MultiStar textually presents the patterns it finds. The patterns resulting from the classification task are expressed through rules, as shown in the example below:

        IF Accomplished? = yes, and
           TypeOfConnection = superfast, and
           TypeOfInteraction = chat, and
           Reply? = yes
        THEN Passed? = yes

     The number of cases in which a rule occurs and the degree of reliability of the rule are indicated for each rule found.

     5. Conclusions

     This paper discusses the relevant information for ongoing assessment of learning in computational distance learning environments, proposing a solution to aid in this ongoing assessment through the use of data warehouse and data mining resources. Modeling of a data warehouse was presented to illustrate the information identified, as well as the MultiStar environment, which allows for knowledge discovery in this data warehouse.

     The authors intend to present the results of the application of data mining tasks in the next version of the environment in a form that is more intuitive for the user, using graphic resources.

     An intelligent tutor can also be developed to automatically guide the student in his learning process, based on the results of the data mining tasks applied to the data warehouse discussed herein.

     6. References

     [1] W.H. Inmon, Building the Data Warehouse, John Wiley & Sons, 2nd edition, 1996

     [2] R. Kimball, The Data Warehouse Toolkit – Practical Techniques for Building Dimensional Data Warehouses, John Wiley Professio, 1996

     [3] R. Kimball, L. Reeves, M. Ross and W. Thornthwaite, The Data Warehouse Lifecycle Toolkit, Wiley Computer Publishing, 1998

     [4] J. Han and M. Kamber, Data Mining – Concepts and Techniques, 1st edition, New York: Morgan Kaufmann, 2000

     [5] K. Nurmela, E. Lehtinen, T. Palonen, Evaluating CSCL Log Files by Social Network Analysis, In:
  6. Computer Support for Collaborative Learning, Stanford, USA, 1999. Proceedings. p. 434-441

     [6] M. Rahkila and M. Karjalainen, Evaluation of Learning in Computer Based Education Using Log Systems. In: ASEE/IEEE Frontiers in Education Conference, 29., San Juan, Puerto Rico, 1999, Proceedings. p. 16-21

     [7] S.L. Tanimoto, Towards an Ontology for Alternative Assessment in Education. Meeting of IEEE Learning Technology Standards Committee, Pittsburgh, USA, 1998

     [8] J. Pei, J. Han, B. Mortazavi-Asl and H. Zhu, Mining Access Patterns Efficiently from Web Logs, In: Pacific-Asia Conference on Knowledge Discovery and Data Mining, Kyoto, Japan, 2000, Proceedings. p. 396-407

     [9] O.R. Zaiane, M. Xin and J. Han, Discovering Web Access Patterns and Trends by Applying OLAP and Data Mining Technology on Web Logs, In: Advances in Digital Libraries Conference, Santa Barbara, USA, 1998, Proceedings. p. 19-29

     [11] B. Mortazavi-Asl, Discovering and Mining User Web-Page Traversal Patterns, MPhil. Dissertation, Simon Fraser University, 1999, p. 93

     [12] D.R. Silva and M.T.P. Vieira, An Ongoing Assessment Model in Distance Learning, In: Proceedings of Internet and Multimedia Systems and Applications, Honolulu, USA, 2001

     [13] C. Vrasidas and M.S. McIsaac, Factors Influencing Interaction in an Online Course, The American Journal of Distance Education, v. 13, n. 3, 1999.

     [14] D.R. Silva, A Tool for Knowledge Discovery using Data Warehousing and its Application on the Ongoing Assessment of Distance Learning. MPhil. Dissertation, Department of Computer Science, UFSCar, São Carlos, Brazil, 2002, 108p. (In Portuguese)

     [15] R. Agrawal, T. Imielinski and A. Swami, Mining Associations between Sets of Items in Massive Databases. In: ACM SIGMOD International Conference on the Management of Data. New York, USA, 1993. Proceedings. NY: ACM Press, 1993, p. 207-216.

     [16] J.R. Quinlan, Induction of Decision Trees. Machine Learning, 1:81-106, 1986

     [17] P. Cheeseman and J. Stutz, Bayesian Classification (AutoClass): Theory and Results, In: Advances in Knowledge Discovery in Databases, 1995. 10., Proceedings. AAAI Press, p. 61-83, 1995
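As a supplementary illustration of the classification rule quoted in Section 4, the sketch below computes, for one IF/THEN rule, a case count and a reliability figure over a handful of made-up student records. It is not MultiStar code. It assumes that "number of cases" means the records satisfying both the condition and the conclusion, and that "degree of reliability" means the share of records matching the condition that also match the conclusion (often called confidence); the paper does not define either term, and the function name `rule_stats` is hypothetical.

```python
from typing import Dict, List, Tuple

Row = Dict[str, str]


def rule_stats(rows: List[Row], condition: Row, conclusion: Row) -> Tuple[int, float]:
    """Return (cases, reliability) for the rule IF condition THEN conclusion.

    Assumed definitions, not taken from the paper:
    cases       = rows matching both the condition and the conclusion
    reliability = cases / rows matching the condition
    """
    matching = [r for r in rows
                if all(r.get(k) == v for k, v in condition.items())]
    hits = [r for r in matching
            if all(r.get(k) == v for k, v in conclusion.items())]
    reliability = len(hits) / len(matching) if matching else 0.0
    return len(hits), reliability


# Made-up records using the attribute names from the example rule.
records = [
    {"Accomplished?": "yes", "TypeOfConnection": "superfast",
     "TypeOfInteraction": "chat", "Reply?": "yes", "Passed?": "yes"},
    {"Accomplished?": "yes", "TypeOfConnection": "superfast",
     "TypeOfInteraction": "chat", "Reply?": "yes", "Passed?": "yes"},
    {"Accomplished?": "yes", "TypeOfConnection": "superfast",
     "TypeOfInteraction": "chat", "Reply?": "yes", "Passed?": "no"},
    {"Accomplished?": "no", "TypeOfConnection": "superfast",
     "TypeOfInteraction": "email", "Reply?": "no", "Passed?": "no"},
]

condition = {"Accomplished?": "yes", "TypeOfConnection": "superfast",
             "TypeOfInteraction": "chat", "Reply?": "yes"}
conclusion = {"Passed?": "yes"}

cases, reliability = rule_stats(records, condition, conclusion)

if __name__ == "__main__":
    print(f"cases={cases}, reliability={reliability:.2f}")
```

With these four records, three match the condition and two of those also match the conclusion, so the rule is reported with 2 cases and a reliability of about 0.67.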