
Transcript

  • 1. International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-6367 (Print), ISSN 0976-6375 (Online), Volume 5, Issue 5, May (2014), pp. 129-139 © IAEME: www.iaeme.com/ijcet.asp. Journal Impact Factor (2014): 8.5328 (Calculated by GISI), www.jifactor.com

DATA MINING CLASSIFICATION APPROACH FOR WEB-BASED EDUCATIONAL SYSTEM USING GENETIC ALGORITHM

Ch. Neelima (1), K. Sridhar (2), Prof. S.S.V.N. Sarma (3)
(1) Department of Computer Science, Kakatiya University, Warangal, Telangana, India
(2) Dravidian University, Kuppam, Chittoor, A.P., India
(3) Dept. of CSE, Vaagdevi College of Engineering, Warangal, Telangana, India

ABSTRACT

The ever-increasing progress of network-distributed computing, and particularly the rapid development of the web, has had a broad impact on society. Online delivery of educational instruction gives colleges and universities the opportunity to use the online infrastructure for the benefit of both institutions and students. The main aim of this paper is to find similar patterns of use in the data gathered from the Learning Online Network with Computer-Assisted Personalized Approach (LON-CAPA), and eventually to predict the most beneficial course of study for each learner based on their present usage. The system could then suggest to the learner how best to proceed. The objective is to predict students' final grades from their web-use features, which are extracted from the homework data. Using the student and course data of a LON-CAPA course as test data, we design, implement, and evaluate a series of pattern classifiers with various parameters in order to compare their performance, and we use a GA to optimize a combination of these classifiers.

Keywords: Data Mining; Genetic Algorithm; Clustering; Classification; Prediction.

1. INTRODUCTION

Data mining is a knowledge discovery process that finds previously unknown, potentially useful, and non-trivial patterns in large repositories of data [4]. Data mining techniques such as classification can be applied to extract knowledge from web data. There are three web mining categories: web content mining, web structure mining, and web usage mining [3]. Web usage
  • 2. mining is the most relevant technique for e-learning systems. Web usage mining generally refers to the application of data mining techniques to web logs and metadata. Frequently used methods in web usage mining are:
    • Association rules: associations between the web pages visited.
    • Sequence analysis: analyzing sequences of page hits within a visit or between visits by the same user.
    • Clustering and classification: grouping users by navigation behavior, grouping pages by content, type, or access, and grouping similar navigation behaviors.

The use of rule mining in education is not new; it has already been employed successfully in several web-based educational systems. Data mining techniques can discover useful information that can be used in formative evaluation to help educators establish a pedagogical basis for decisions when designing or modifying an environment or teaching approach. The application of data mining in educational systems is an iterative cycle of hypothesis formation, testing, and refinement (Figure 1). Mined knowledge should enter the loop of the system and guide, facilitate, and enhance learning as a whole. The goal is not only to turn data into knowledge, but also to filter the mined knowledge for decision making.

Figure 1: A cycle of applying data mining in educational systems

Educators and academic staff are responsible for designing, planning, building, and maintaining the educational systems; students use and interact with them. Starting from all the available information about courses, students, usage, and interaction, different data mining techniques can be applied to discover useful knowledge that helps to improve the e-learning process. The discovered knowledge can be used not only by providers (educators) but also by the users themselves (students).
  • 3.
2. PROPOSED SYSTEM

Many leading educational institutions are working to establish an online teaching and learning presence. Many systems with different capabilities and approaches have been developed to deliver online education in an academic setting. This work considers two kinds of large data sets:
    • Educational resources such as web pages, demonstrations, simulations, and individualized problems designed for use on homework assignments, quizzes, and examinations; and
    • Information about users who create, modify, assess, or use these resources.
In other words, we have two ever-growing pools of data.

Figure 2: Proposed System Architecture

The proposed framework has two modules: User and Administrator. The administrator is responsible for capturing all user preferences, including the analysis of the domain data. All users can communicate and cooperate over the web-based system network for online learning and teaching. The administrator maintains the data sets of examinations, quizzes, and homework assignments in the data repository. The User (Student) module lets students register their details in the application; after registration, a student can access the web-based system network and participate in online teaching. Once a student has participated, the administrator assigns grades for the student's examinations, homework, quizzes, and assignments. All of this data is stored in the data repository.
  • 4. Figure 3: Process Flow Diagram

3. CLASSIFICATION

Classification is used to find a model that segregates data into predefined classes [3]. It is therefore based on the features present in the data. The result is a description of the present data and a better understanding of each class in the database, so classification also provides a model for describing future data. Prediction helps users make decisions: predictive modeling for knowledge discovery in databases predicts unknown or future values of some attributes of interest based on the values of other attributes in a database. Different methodologies have been used for classification and for developing predictive models, including Bayesian inference, neural network approaches, decision tree-based methods, and genetic algorithm-based approaches.

3.1 Nearest Neighbor method:

The k-nearest neighbor algorithm classifies a given sample without making any assumptions about the distribution of the training and testing data [4]. To classify a testing sample, the distances between it and all the samples in the training set must first be calculated. Any distance measure may be used; the Euclidean distance metric requires normalization of all features into the same range. The k closest neighbors of the given sample are then determined, where k is an integer between 1 and the total number of samples, and the testing sample is assigned the label most frequently represented among those k nearest samples. The value of k chosen for this decision rule affects its accuracy. The k-nearest neighbor classifier is a nonparametric classifier that is said to perform well for optimal values of k.
  • 5. Figure 4: K-Nearest Neighbor Algorithm

The k-nearest neighbor classification algorithm requires:
    • The set of stored records
    • A distance metric to compute the distance between records
    • The value of k, the number of nearest neighbors to retrieve

Figure 5: Nearest Neighbor for K=3
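The k-nearest neighbor rule of Section 3.1 can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the two features (homework problems solved, total attempts), the sample values, and the helper names `fit_scaler`, `scale`, and `knn_predict` are all assumptions made for the example. Min-max scaling is used here as one way to bring the features into the same range before taking Euclidean distances.

```python
import math
from collections import Counter

def fit_scaler(rows):
    """Return per-feature (min, max) pairs computed from the training rows."""
    columns = list(zip(*rows))
    return [(min(c), max(c)) for c in columns]

def scale(row, scaler):
    """Min-max normalize one row using the training-set ranges."""
    return [(v - lo) / (hi - lo) if hi > lo else 0.0
            for v, (lo, hi) in zip(row, scaler)]

def knn_predict(train_rows, train_labels, sample, k=3):
    """Label `sample` by majority vote among its k nearest training rows."""
    scaler = fit_scaler(train_rows)
    scaled_train = [scale(r, scaler) for r in train_rows]
    scaled_sample = scale(sample, scaler)
    # Euclidean distance from the sample to every stored record.
    distances = [(math.dist(r, scaled_sample), label)
                 for r, label in zip(scaled_train, train_labels)]
    distances.sort(key=lambda pair: pair[0])
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# Hypothetical features: (homework problems solved, total attempts).
train_rows = [(40, 55), (38, 60), (35, 50), (10, 40), (12, 45), (8, 30)]
train_labels = ["High", "High", "High", "Fail", "Fail", "Fail"]
print(knn_predict(train_rows, train_labels, (36, 52), k=3))  # prints High
```

Every prediction compares the sample with the whole training set, which is why the stored records, the distance metric, and k are the only inputs the method needs.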
  • 6.
3.2 K-Means Method

The k-means algorithm is the simplest and most commonly used clustering algorithm employing a squared-error criterion [3]. It is computationally fast and iteratively partitions a data set into k disjoint clusters, where the value of k is an input to the algorithm. The goal is to obtain the partition (usually of hyper-spherical shape) with the smallest squared error. Suppose there are k clusters {C_1, C_2, …, C_k} such that cluster C_j has n_j patterns. The mean vector, or center, of cluster C_j is

$$\mu_j = \frac{1}{n_j} \sum_{i=1}^{n_j} x_i^{(j)}$$

where n_j is the number of patterns in cluster C_j (among exactly k clusters C_1, C_2, …, C_k) and x_i^{(j)} is the point in space representing the i-th object of that cluster. The total squared error is

$$E^2 = \sum_{j=1}^{k} e_j^2, \qquad e_j^2 = \sum_{i=1}^{n_j} \left(x_i^{(j)} - \mu_j\right)^{T} \left(x_i^{(j)} - \mu_j\right)$$

The steps of the iterative algorithm for partitional clustering are as follows:
1. Choose an initial partition with k < n clusters (µ_1, µ_2, …, µ_k are the cluster centers and n is the number of patterns).
2. Generate a new partition by assigning each pattern to its nearest cluster center µ_j.
3. Recompute the new cluster centers µ_j.
4. Go to step 2 unless there is no change in µ_j.
5. Return µ_1, µ_2, …, µ_k as the mean values of C_1, C_2, …, C_k.

3.3 Classifiers

Pattern recognition has a wide variety of applications in many different fields, so it is not possible to come up with a single classifier that gives optimal results in every case. The optimal classifier is highly dependent on the problem domain. In practice, one might come across a case where no single classifier performs at an acceptable level of accuracy. In such cases it is better to pool the results of different classifiers to achieve the best accuracy. Every classifier operates well on different aspects of the training or test feature vector. As a result, under appropriate conditions, combining multiple classifiers may improve classification performance compared with any single classifier.

4. OPTIMIZATION USING GENETIC ALGORITHM

A genetic algorithm (GA) is a search heuristic that mimics the process of natural evolution. This heuristic is routinely used to generate useful solutions to optimization and search problems. Genetic algorithms belong to the larger class of evolutionary algorithms (EA), which generate solutions to optimization problems using techniques inspired by natural evolution, such as inheritance, mutation, selection, and crossover [11].
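To make the operators named above concrete, here is a minimal Python sketch of a conventional generational GA over bit strings. It is an illustration under stated assumptions rather than the configuration used in this work: tournament selection, one-point crossover, bit-flip mutation, elitism, the parameter defaults, and the toy fitness function (the number of 1 bits) are all choices made for the example, and the function name `genetic_algorithm` is hypothetical.

```python
import random

def genetic_algorithm(fitness, n_bits, pop_size=30, generations=60,
                      crossover_rate=0.9, mutation_rate=0.02, seed=0):
    """Maximize `fitness` over bit strings of length `n_bits`."""
    rng = random.Random(seed)
    # Initial population of randomly generated chromosomes.
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    history = [fitness(best)]

    def tournament():
        # Selection: keep the fitter of two randomly drawn individuals.
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        offspring = [best[:]]  # elitism: carry the best individual over
        while len(offspring) < pop_size:
            p1, p2 = tournament(), tournament()
            if n_bits > 1 and rng.random() < crossover_rate:
                cut = rng.randint(1, n_bits - 1)  # one-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Mutation: flip each bit with a small probability.
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]
            offspring.append(child)
        population = offspring
        best = max(population, key=fitness)
        history.append(fitness(best))
    return best, history

# Toy problem (assumed for illustration): maximize the number of 1 bits.
best, history = genetic_algorithm(fitness=sum, n_bits=20)
print(sum(best), history[0], history[-1])
```

Because the best individual is copied unchanged into each new generation, the best fitness recorded in `history` never decreases. In a classification setting the bit string could instead encode, for example, which features or classifiers are switched on, with the fitness defined from the resulting error rate.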
  • 7. In a genetic algorithm, a population of strings (called chromosomes or the genotype of the genome), which encode candidate solutions (called individuals, creatures, or phenotypes) to an optimization problem, evolves toward better solutions. Traditionally, solutions are represented in binary as strings of 0's and 1's, but other encodings are also possible. The evolution usually starts from a population of randomly generated individuals and proceeds in generations. In each generation, the fitness of every individual in the population is evaluated, and multiple individuals are stochastically selected from the current population (based on their fitness) and modified (recombined and possibly randomly mutated) to form a new population. The new population is then used in the next iteration of the algorithm [14]. Commonly, the algorithm terminates when either a maximum number of generations has been produced or a satisfactory fitness level has been reached for the population. If the algorithm has terminated because the maximum number of generations was reached, a satisfactory solution may or may not have been found.

Genetic Algorithm Steps

Begin
  1. X := choose an initial population;
  2. Cost(X) := compute the initial chromosome cost of X;
  3. Best-fitness chromosome value := Cost(X); Best-Soln := X;
  while (stopping criterion not met) do
    repeat (pre-chosen number of times)
      4. X' := select a random neighbor from N(X);
      5. C := Cost(X') - Cost(X);
      6. prob := generate a random number in (0, 1);
         if ((C < 0) or (prob <= e^(-C))) then {
           X := X'; Cost(X) := Cost(X');
           if (Cost(X) < Best-fitness chromosome value) then {
             Best-fitness chromosome value := Cost(X); Best-Soln := X;
           }
         }
    7. End repeat;
  8. End while;
  Output Best-fitness chromosome value;
End

  • 8.
5. IMPLEMENTATION

The performance of the faculty of a college, school, or institute has been found to depend on a variety of parameters, ranging from the individual's qualifications, experience, level of commitment, and research activities undertaken to institutional support, financial feasibility, top management's support, etc.
    • Use a GA as an optimization tool for resetting the parameters in other classifiers.
    • Most applications of GAs in pattern recognition optimize some parameters in the classification process.
    • Implement and use a GA to optimize a combination of classifiers.
    • We design, implement, and evaluate a series of pattern classifiers with various parameters in order to compare their performance on a dataset from LON-CAPA.
    • Some students dropped the course after doing a couple of homework sets, so they do not have final grades.

The project has 3 major actors:
1. System Administrator
2. Student
3. Faculty

System Administrator

The Administrator is responsible for:
1. Adding multiple students to the system
2. Managing the faculty, such as assigning them to colleges
3. Deciding the number of examination questions for students
4. Deciding the questions for quizzes
5. Checking the summary report
  • 9. Student

Role of the Student:
1. Changing his/her password
2. Participating in the quizzes and examinations

6. RESULTS

We ran some experiments to evaluate the performance and usefulness of different classification algorithms for predicting students' final marks from the information in the students' usage data in an e-learning system. The main objective is to classify students by final marks into different groups depending on the activities carried out in a web-based course. Table I shows the sample student dataset and Table II shows the sample faculty dataset.

Table I: Student Details

Name      Sid  Address    Email-ID              Phone       Zip     Pwd
Harshith  111  Warangal   harivala@gmail.com    8823462345  506001  Hari
Ravi      333  Hyderabad  ravi@yahoo.com        9987876543  500031  Ravi01
Ranjeet   544  Banglore   ranjeet@gmail.com     8876523456  500015  Ran07
Meghana   344  Kazipet    megha@rediffmail.com  8345562547  506006  Megha
Manvitha  123  Warangal   manvi_45@gmail.com    9753256324  506004  Manvi88
Varshith  653  Hyderabad  varshit_k@yahoo.com   9562453562  500021  Var667
Stalin    768  Banglore   stalin33ch@gmail.com  9876589786  560007  Sta222
Maleeha   543  Mumbai     maleeha@gmail.com     9976854634  400022  Mal99
Ashish    434  Warangal   ashish.raj@yahoo.com  8765439834  500600  Ashish3
Nanditha  577  Warangal   nandu@yahoo.com       9812367854  500603  nan345
Vikranth  666  Mumbai     vikranth67@gmail.com  8876545987  400106  Vik11
Kranthi   888  Hyderabad  pkrnathip@gmail.com   9745623456  500025  Kra77

Students answer the assignments and homework set by a particular faculty member for a particular subject. Each assignment or homework consists of objective questions, which the system evaluates against the answers the faculty member submitted along with the assignment. Students are graded as High, Medium, or Fail, as shown in Figure 6.

Table II: Faculty Details

F_Id  Faculty  Pwd  Sub  Address    Mobile      Email-ID           Zip
1     Neelima  123  1    Warangal   9987653453  neelima@gmail.com  506001
2     Vijay    658  7    Hyderabad  9987699871  vijay@yahoo.com    500041
3     Swetha   452  3    Mumbai     889765478   swethas@gmail.com  400071
4     Kamal    776  7    Hyderabad  9987862313  nkamal@gmail.com   500004
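The automatic evaluation of objective questions described in this section can be sketched as follows. This is only an illustration: the source does not state how scores map to the High, Medium, and Fail grades, so the two cutoffs, the question identifiers, and the function names `score_submission` and `grade` are assumptions made for the example.

```python
def score_submission(answer_key, submission):
    """Fraction of objective questions answered correctly."""
    if not answer_key:
        raise ValueError("answer key is empty")
    correct = sum(1 for question, expected in answer_key.items()
                  if submission.get(question) == expected)
    return correct / len(answer_key)

def grade(score, high_cutoff=0.75, pass_cutoff=0.40):
    """Map a score in [0, 1] to a grade; both cutoffs are assumed values."""
    if score >= high_cutoff:
        return "High"
    if score >= pass_cutoff:
        return "Medium"
    return "Fail"

# Hypothetical answer key supplied by the faculty member with the assignment.
answer_key = {"q1": "b", "q2": "d", "q3": "a", "q4": "c"}
submission = {"q1": "b", "q2": "d", "q3": "c"}  # q4 left unanswered
print(grade(score_submission(answer_key, submission)))  # prints Medium
```

An unanswered question simply counts as incorrect here; a real deployment would need to decide how to treat missing or late answers.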
  • 10. Figure 6: Grade Distribution

7. CONCLUSION

A new approach to classifying student usage of web-based instruction was proposed. Three classifiers are used to group the students. Weighting the features and using a genetic algorithm to minimize the error rate improves the prediction accuracy by at least 10%. The successful optimization of student classification in all three cases demonstrates the merits of using the LON-CAPA data to predict students' final grades from their features, which are extracted from the homework data. The approach is easily adaptable to different types of courses and different population sizes, and it allows different features to be analyzed. This work represents a rigorous application of known classifiers as a means of analyzing and comparing the use and performance of students who have taken a technical course that was partially or completely administered via the web.

REFERENCES

[1] Minaei Bidgoli, B., and Punch, "Using Genetic Algorithms for Data Mining Optimization in an Educational Web-based System".
[2] Freitas, A.A., "A Survey of Evolutionary Algorithms for Data Mining and Knowledge Discovery". See: www.pgia.pucpr.br/~alex/papers. To appear in: A. Ghosh and S. Tsutsui (Eds.), Advances in Evolutionary Computation, Springer-Verlag.
[3] Duda, R.O., Hart, P.E., and Stork, D.G., "Pattern Classification", 2nd Edition, John Wiley & Sons, Inc., New York, NY.
[4] Kuncheva, L.I., and Jain, L.C., "Designing Classifier Fusion Systems by Genetic Algorithms", IEEE Transaction on Evolutionary Computation, Vol. 33, 2000, pp. 351-373.
[5] Kortemeyer, G., Bauer, W., Kashy, D.A., Kashy, E., and Speier, C., "The Learning Online Network with CAPA Initiative", Proceedings of the Frontiers in Education conference, 2001. http://www.lon-capa.org.
[6] Kashy, D.A., Albertelli, G., Ashkenazi, G., Kashy, E., Ng, H.K., and Thoennessen, M., "Individualized interactive exercises: A promising role for network technology", Proceedings of the Frontiers in Education conference, 2001.
  • 11.
[7] Albertelli, G., Minaei-Bidgoli, B., Punch, W.F., Kortemeyer, G., and Kashy, E., "Concept Feedback in Computer-Assisted Assignments", Proceedings of the Frontiers in Education conference, 2002.
[8] Guerra-Salcedo, C., and Whitley, D., "Feature Selection mechanisms for ensemble creation: a genetic search perspective". In: Freitas, A.A. (Ed.), Data Mining with Evolutionary Algorithms: Research Directions, Papers from the AAAI Workshop, pp. 13-17. Technical Report WS-99-06, AAAI Press, 1999.
[9] Martin-Bautista, M.J., and Vila, M.A., "A survey of genetic feature selection in mining issues", Proceeding Congress on Evolutionary Computation (CEC-99), pp. 1314-1321.
[10] Pei, M., Punch, W.F., and Goodman, E.D., "Feature Extraction Using Genetic Algorithms", Proceeding of International Symposium on Intelligent Data Engineering and Learning '98 (IDEAL'98), Hong Kong, Oct. 1998.
[11] Pei, M., Goodman, E.D., and Punch, W.F., "Pattern Discovery from Data Using Genetic Algorithms", Proceeding of 1st Pacific-Asia Conference Knowledge Discovery & Data Mining (PAKDD-97), Feb. 1997.
[12] Jain, A.K., and Zongker, D., "Feature Selection: Evaluation, Application, and Small Sample Performance", IEEE Transaction on Pattern Analysis and Machine Intelligence, Vol. 19, No. 2, February 1997.
[13] Michalewicz, Z., "Genetic Algorithms + Data Structures = Evolution Programs", 3rd Ed., Springer-Verlag, 1996.
[14] Bandyopadhyay, S., and Murthy, C.A., "Pattern Classification Using Genetic Algorithms", Pattern Recognition Letters, Vol. 16, 1995, pp. 801-808.
[15] Bala, J., De Jong, K., Huang, J., Vafaie, H., and Wechsler, H., "Using learning to facilitate the evolution of features for recognizing visual concepts", Evolutionary Computation 4(3), Special Issue on Evolution, Learning, and Instinct: 100 years of the Baldwin Effect, 1997.
[16] Rinal H. Doshi, Dr. Harshad B. Bhadka and Richa Mehta, "Development of Pattern Knowledge Discovery Framework using Clustering Data Mining Algorithm", International Journal of Computer Engineering & Technology (IJCET), Volume 4, Issue 3, 2013, pp. 101-112, ISSN Print: 0976-6367, ISSN Online: 0976-6375.
[17] Ravita Mishra, "Web Usage Mining Contextual Factor: Human Information Behavior", International Journal of Information Technology and Management Information Systems (IJITMIS), Volume 5, Issue 1, 2014, pp. 12-29, ISSN Print: 0976-6405, ISSN Online: 0976-6413.
[18] R. Vijaya Prakash, Dr. A. Govardhan and Prof. S.S.V.N. Sarmaeswari, "Mining Non-Redundant Frequent Patterns in Multi-Level Datasets using Min Max Approximate Rules", International Journal of Computer Engineering & Technology (IJCET), Volume 3, Issue 2, 2012, pp. 271-279, ISSN Print: 0976-6367, ISSN Online: 0976-6375.
[19] Nathan D'lima, Anirudh Prabhu, Jaison Joseph and Shamsuddin S. Khan, "Novel Approach in E-Learning to Imbibe Environmental Awareness", International Journal of Computer Engineering & Technology (IJCET), Volume 4, Issue 2, 2013, pp. 166-171, ISSN Print: 0976-6367, ISSN Online: 0976-6375.
[20] R. Manickam, D. Boominath and V. Bhuvaneswari, "An Analysis of Data Mining: Past, Present and Future", International Journal of Computer Engineering & Technology (IJCET), Volume 3, Issue 1, 2012, pp. 1-9, ISSN Print: 0976-6367, ISSN Online: 0976-6375.