ISSN: 2277 – 9043
International Journal of Advanced Research in Computer Science and Electronics Engineering
Volume 1, Issue 4, June 2012

Knowledge Discovery in Databases (KDD) in online educational system through LON-CAPA system

1 Prof. Darshana Kishorbhai Dave, 2 Prof. Preeti Kishorbhai Dave

Abstract—In this paper we define KDD and data mining and describe their tasks, methods, and applications. Our motivation is to find the best technique for extracting useful information from large amounts of data in online educational systems in general, and in the LON-CAPA system in particular. The goals of this study are to obtain an optimal predictive model for students within such systems; to help students use the learning resources better, based on how other students in their groups use those resources; to help instructors design their curricula more effectively; and to provide information that instructors can apply to increase student learning.

Index Terms—LON-CAPA system, Knowledge Discovery in Databases (KDD), Data Mining

Manuscript received May, 2012.
Prof. Darshana Kishorbhai Dave, Production Engineering Dept., Government Engineering College, Bhavnagar, Gujarat, India.
Prof. Preeti Kishorbhai Dave, Electronics & Communication Dept., Shantilal Shah Engineering College, Bhavnagar, Gujarat, India.
All Rights Reserved © 2012 IJARCSEE

I. INTRODUCTION

The amount of data stored in databases is increasing at a tremendous speed. This gives rise to a need for new techniques and tools to aid humans in automatically and intelligently analyzing huge data sets to gather useful information. This growing need has given birth to a new research field called Knowledge Discovery in Databases (KDD) or Data Mining, which has attracted attention from researchers in many different fields, including database design, statistics, pattern recognition, machine learning, and data visualization.

Data mining is the process of analyzing data from different perspectives and summarizing the results as useful information. It has been defined as "the nontrivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data" (1).

Figure 1.1 Steps of the KDD Process (Fayyad et al., 1996)

The process of data mining uses machine learning, statistics, and visualization techniques to discover and present knowledge in a form that is easily comprehensible. The word "Knowledge" in KDD refers to the discovery of patterns which are extracted from the processed data. A pattern is an expression describing facts in a subset of the data. Thus, the difference between KDD and data mining is that "KDD refers to the overall process of discovering knowledge from data while data mining refers to application of algorithms for extracting patterns from data without the additional steps of the KDD process." (1)

However, since data mining is a crucial part of the KDD process, most researchers use both terms interchangeably. Figure 1.1 presents the iterative nature of the KDD process. Here we outline some of its basic steps, as mentioned in Brachman & Anand (1996):
• Providing an understanding of the application domain, the goals of the system and its users, and the relevant prior background and prior knowledge (this step is not shown in the figure)
• Selecting a data set, or focusing on a subset of variables or data samples, on which discovery is to be performed
• Preprocessing and data cleansing: removing the noise, collecting the necessary information for modeling, selecting methods for handling missing data fields, and accounting for time sequence information and changes
• Data reduction and projection: finding appropriate features to represent the data, and using dimensionality reduction or transformation methods to reduce the number of variables or to find invariant representations for the data
• Choosing the data mining task depending on the goal of KDD: clustering, classification, regression, and so forth
• Selecting methods and algorithms to be used for searching for the patterns in the data
• Mining the knowledge: searching for patterns of interest
• Evaluating or interpreting the mined patterns, with a possible return to any previous steps

• Using this knowledge to improve the performance of the system, and resolving any potential conflicts with previously held beliefs or extracted knowledge

These are the steps that all KDD and data mining tasks progress through.

II. DATA MINING METHODS

The objective of data mining is both prediction and description: to predict unknown or future values of the attributes of interest using other attributes in the databases, while describing the data in a manner understandable and interpretable to humans. Predicting the sale amounts of a new product based on advertising expenditure, or predicting wind velocities as a function of temperature, humidity, air pressure, etc., are examples of tasks with a predictive goal. Describing the different terrain groupings that emerge in a sampling of satellite imagery is an example of a descriptive goal. The relative importance of description and prediction can vary between applications. These two goals can be fulfilled by any of a number of data mining tasks, including classification, regression, clustering, summarization, dependency modeling, and deviation detection (2).

Predictive tasks

The following are general tasks that serve predictive data mining goals:
• Classification – to segregate items into several predefined classes. Given a collection of training samples, this type of task can be designed to find a model for class attributes as a function of the values of other attributes (3).
• Regression – to predict the value of a given continuously valued variable based on the values of other variables, assuming either a linear or nonlinear model of dependency. These tasks are studied in the statistics and neural network fields (4).
• Deviation Detection – to discover the most significant changes in data from previously measured or normative values (5). Explicit information outside the data, like integrity constraints or predefined patterns, is used for deviation detection. Arning et al. (1996) approached the problem from the inside of the data, using the implicit redundancy.

Descriptive tasks

• Clustering – to identify a set of categories, or clusters, that describe the data (1).
• Summarization – to find a concise description for a subset of data. Tabulating the mean and standard deviations for all fields is a simple example of summarization. More sophisticated summarization techniques exist, and they are usually applied to facilitate automated report generation and interactive data analysis (2).
• Dependency modeling – to find a model that describes significant dependencies between variables. For example, probabilistic dependency networks use conditional independence to specify the structural level of the model and probabilities or correlation to specify the strengths (quantitative level) of dependencies (3).

Mixed tasks

Some tasks in data mining have both descriptive and predictive aspects. Using these tasks, we can move from basic descriptive tasks toward higher-order predictive tasks. Here, we indicate two of them:
• Association Rule Discovery – Given a set of records, each of which contains some number of items from a given collection, produce dependency rules which will predict the occurrence of an item based on patterns found in the data.
• Sequential Pattern Discovery – Given a set of objects, where each object is associated with its own timeline of events, find rules that predict strong sequential dependencies among different events. Rules are formed by first discovering patterns followed by event occurrences which are governed by timing constraints found within those patterns.

So far we have briefly described the main concepts of data mining. Chapter two focuses on methods and algorithms of data mining in the context of descriptive and predictive tasks. The research background of both association rule and sequential pattern mining, which are newer techniques in data mining, deserves a separate discussion. Data mining does not take place in a vacuum. In other words, any application of this method of analysis is dependent upon the context in which it takes place. Therefore, it is necessary to know the environment in which we are going to use data mining methods. The next section provides a brief overview of the LON-CAPA system.

Online Education systems

Several online education systems, such as Blackboard, WebCT, Virtual University (VU), and other similar systems, have been developed to focus on course management issues. The objectives of these systems are to present courses and instructional programs through the web and other technologically enhanced media. These new technologies make it possible to offer instruction without the limitations of time and place found in traditional university programs. However, these systems tend to use existing materials and present them as a static package via the Internet. There is another approach, pursued in LON-CAPA, to construct more-or-less new courses using newer network technology. In this model of content creation, college faculty, K-12 teachers, and students interested in collaboration can access a database of hypermedia software.

LON-CAPA, System Overview

LON-CAPA is a distributed instructional management system, which provides students with personalized problem sets, quizzes, and exams. Personalized (or individualized) homework means that each student sees a slightly different computer-generated problem. LON-CAPA provides students and instructors with immediate feedback on conceptual understanding and correctness of solutions. It also provides faculty the ability to augment their courses with individualized, relevant exercises, and to develop and share modular online resources. LON-CAPA aims to put this functionality on a homogeneously distributed platform for creating, sharing, and delivering course content, with emphasis on cross-institutional collaboration and intellectual property rights management.
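
As an illustration of the personalization described above, the following Python sketch derives a repeatable per-student variant of a problem from a seed. It is only a sketch under stated assumptions: the seeding scheme (hashing a username together with a problem identifier), the function names, and the sample kinematics problem are all hypothetical and are not taken from the LON-CAPA implementation.

```python
import hashlib
import random


def student_seed(username: str, problem_id: str) -> int:
    """Derive a stable integer seed from a (student, problem) pair.

    Assumption: hashing the two identifiers is one simple way to get a
    repeatable seed; the real system may do this differently.
    """
    digest = hashlib.sha256(f"{username}:{problem_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def personalized_problem(username: str, problem_id: str) -> dict:
    """Build one student's variant of a hypothetical kinematics problem."""
    rng = random.Random(student_seed(username, problem_id))
    speed = rng.randint(10, 30)  # m/s, varies from student to student
    time = rng.randint(2, 9)     # s, varies from student to student
    return {
        "text": f"A car moves at {speed} m/s for {time} s. "
                "How far does it travel (in m)?",
        "answer": speed * time,
    }


def check_answer(username: str, problem_id: str, submitted: float) -> bool:
    """Regenerate the student's variant and grade the submission at once."""
    return submitted == personalized_problem(username, problem_id)["answer"]
```

Because the variant is regenerated from the seed rather than stored, the same student always sees the same numbers, and a submitted answer can be checked immediately.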

LON-CAPA Topology

LON-CAPA is physically built as a geographically
distributed network of constantly connected servers. Figure
1.2 shows an overview of this network. All machines in the
network are connected with each other through two-way
persistent TCP/IP connections. The network has two classes
of servers: library servers and access servers. A library
server can act as a home server that stores all personal records
of users, and is responsible for the initial authentication of
users when a session is opened on any server in the network.
For authors, it also hosts their construction area and the
authoritative copy of every resource that has been published
by that author. An access server is a machine that hosts student sessions. Library servers can be used as backups to host sessions when all access servers in the network are overloaded. Every user in LON-CAPA is a member of one domain. Domains could be defined by departmental or institutional boundaries like MSU, FSU, OHIOU, or the name of a publishing company. These domains can be used to limit the flow of personal user information across the network, set access privileges, and enforce royalty schemes. Thus, the student and course data are distributed amongst several repositories. Each user in the system has one library server, which is his/her home server. It stores the authoritative copy of all of their records.

LON-CAPA currently runs on Redhat-Linux Intel-compatible hardware. The current MSU production setup consists of several access servers and some library servers. All access servers are set up on a round-robin IP scheme as frontline machines, and are accessed by the students for "user sessions." The current implementation of LON-CAPA uses mod_perl inside of the Apache web server software.

Figure 1.2 A schema of distributed data in LON-CAPA

Data Distribution in LON-CAPA

Educational objects in LON-CAPA range from simple paragraphs of text, movies, and applets, to individualized homework problems. Online educational projects at MSU have produced extensive libraries of resources across disciplines. By combining these resources, LON-CAPA produces a national distributed digital library with mechanisms to store and retrieve these objects. Participants in LON-CAPA can publish their own objects in the common pool. LON-CAPA will allow groups of organizations (departments, universities, schools, commercial businesses) to link their online instructional resources in a common marketplace, thus creating an online economy for instructional resources (lon-capa.org). Internally, all resources are identified primarily by their URL. LON-CAPA enables faculty to combine and sequence these learning objects at several levels. For example, an instructor from Community College A in Texas can compose a page by combining a text paragraph from University B in Detroit with a movie from College C in California and an online homework problem from Publisher D in New York. Another instructor from High School E in Canada might take that page from Community College A and combine it with other pages into a module, unit or section. Those in turn can be combined into whole course packs.
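
The composition example above can be sketched as a small tree in which every node carries a URL. The class names, the `leaf_urls` helper, and all URLs below are illustrative assumptions; the text only states that resources are identified primarily by their URL and can be combined at several levels.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Resource:
    """A single published learning object, identified by its URL."""
    url: str


@dataclass
class Composite:
    """A page, module, or course pack: an ordered list of child nodes."""
    url: str
    children: list = field(default_factory=list)


def leaf_urls(node) -> list:
    """Flatten a composition into the URLs of its leaf resources, in order."""
    if isinstance(node, Resource):
        return [node.url]
    urls = []
    for child in node.children:
        urls.extend(leaf_urls(child))
    return urls


# Hypothetical URLs mirroring the example in the text.
page = Composite("/res/college_a/intro.page", [
    Resource("/res/univ_b/paragraph.html"),
    Resource("/res/college_c/movie.mov"),
    Resource("/res/publisher_d/homework.problem"),
])
module = Composite("/res/school_e/unit1.sequence", [
    page,
    Resource("/res/school_e/summary.html"),
])
course_pack = Composite("/res/school_e/course.sequence", [module])
```

In this sketch a composite refers to its children rather than copying them, so the page built at one institution stays a single object however many modules reuse it.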

Resource Variation in LON-CAPA

LON-CAPA provides three types of resources for organizing a course. LON-CAPA refers to these resources as Content Pages, Problems, and Maps. Maps may be either of two types: Sequences or Pages. LON-CAPA resources may be used to build the outline, or structure, for the presentation of the course to the students.
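
The resource taxonomy can be sketched as a lookup from file extension to resource type. The extensions are the ones given in the list that follows; the enum, the function names, and the choice to raise an error for unknown extensions are assumptions made for illustration.

```python
from enum import Enum
from pathlib import PurePosixPath


class ResourceType(Enum):
    CONTENT_PAGE = ".html"
    PROBLEM = ".problem"
    PAGE = ".page"          # a Map that joins resources into one HTML page
    SEQUENCE = ".sequence"  # a Map that links resources in order


# Pages and Sequences are the two kinds of Map.
MAP_TYPES = {ResourceType.PAGE, ResourceType.SEQUENCE}


def classify(path: str) -> ResourceType:
    """Return the resource type implied by the file extension.

    Raises ValueError for an extension that is not one of the four above.
    """
    return ResourceType(PurePosixPath(path).suffix)


def is_map(path: str) -> bool:
    """True if the resource is a Map (a Page or a Sequence)."""
    return classify(path) in MAP_TYPES
```

For instance, `classify("gec/dkd/tests/part12.problem")` yields `ResourceType.PROBLEM`, and `is_map("gec/dkd/parts/part1.sequence")` is true.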



• A Content Page displays course content. It is essentially a conventional HTML page. These resources use the extension ".html".
• A Problem resource represents problems for the students to solve, with answers stored in the system. These resources are stored in files that must use the extension ".problem".
• A Page is a type of Map which is used to join other resources together into one HTML page. For example, a page of problems will appear as a problem set. These resources are stored in files that must use the extension ".page".
• A Sequence is a type of Map which is used to link other resources together. Sequences are stored in files that must use the extension ".sequence". Sequences can contain other sequences and pages.

Authors create these resources and publish them in library servers. Then, instructors use these resources in online courses. The LON-CAPA system logs any access to these resources, as well as the sequence and frequency of access in relation to the successful completion of any assignment.

LON-CAPA Strategy for Data Storage

Internally, the student data is stored in a directory:
/home/httpd/lonUsers/domain/1st.char/2nd.char/3rd.char/username/
For example:
/home/httpd/lonUsers/gec/m/i/n/minaeibi/
Figure 1.3 shows a list of a student's data. Files ending with .db are GDBM (GNU dbm) database files, while those with a course ID as name, for example gec_12679c3ed543a25msul1.db, store performance data for that student in the course.

Figure 1.3 Directory listing of user's home directory

Courses are assigned to users, not vice versa. Internally, courses are handled like users without login privileges. The username is a unique ID, for example msu_12679c3ed543a25msul1; every course in every semester has a unique ID, and there is no semester transition. The user data of the course includes the full name of the course, a pointer to its top-level resource map ("course map"), and any associated deadlines, spreadsheets, etc., as well as a course enrollment list. The latter is somewhat redundant, since in principle this list could be produced by going through the roles of all users and looking for the valid role for a student in that course.

Figure 1.4 Directory listing of course's home directory

An example of course data is shown in Figure 1.4. classlist is the list of students in the course, environment includes the course's full name, etc., and resourcedata holds deadlines, etc. The parameters for homework problems are stored in these files. To identify a specific instance of a resource, LON-CAPA uses symbols or "symbs." These identifiers are built from the URL of the map, the resource number of the resource in the map, and the URL of the resource itself. The latter is somewhat redundant, but might help if maps change. An example is
gec/dkd/parts/part1.sequence___19___gec/dkd/tests/part12.problem
The respective map entry is
<resource id="19" src="/res/gec/dkd/tests/part12.problem" title="Problem 2"> </resource>
Symbs are used by the random number generator, as well as to store and restore data specific to a certain instance of a problem. More details of the stored data and their exact structures will be explained in chapter three, when we describe the data acquisition of the system.

Resource Evaluation in LON-CAPA

One of the most challenging aspects of the system is to provide instructors with information concerning the quality and effectiveness of the various materials in the resource pool on student understanding of concepts. These materials can include web pages, demonstrations, simulations, and individualized problems designed for use on homework assignments, quizzes, and examinations. The system generates a range of statistics that can be useful in evaluating the degree to which individual problems are effective in promoting formative learning for students. For example, each exam problem contains attached metadata that catalog its degree of difficulty and discrimination for students at different phases in their education (i.e., introductory college courses, advanced college courses, and so on). To evaluate resource pool materials, a standardized format is required so that materials from different sources can be compared. This helps resource users to select the most effective materials available.

LON-CAPA has also provided questionnaires which are completed by faculty and students who use the educational materials to assess the quality and efficacy of resources. In addition to providing the questionnaires and using the statistical reports, we investigate here methods to find criteria for classifying students and grouping problems by examining logged data such as: time spent on a particular resource, resources visited (other web pages), due date for each homework, the difficulty of problems (observed



statistically) and others. Thus, floods of data on individual usage patterns need to be gathered and sorted, especially as students go through multiple steps to solve problems and choose between multiple representations of the same educational objects, such as video lecture demonstrations, a derivation, a worked example, and case studies. As the resource pool grows, multiple content representations will be available for use by students. There has been an increasing demand for automated methods of resource evaluation.

One such method is data mining, which is the focus of this research. Since the LON-CAPA data analyses are specific to the field of education, it is important to recognize the general context of using artificial intelligence in education. The following section presents a brief review of intelligent tutoring systems, one typical application of artificial intelligence in the field of education. Note that herein the purpose is not to develop an intelligent tutoring system; instead we apply the main ideas of intelligent tutoring systems in an online environment, and implement data mining methods to improve the performance of the educational web-based system, LON-CAPA.

V. CONCLUSION

This paper addresses data mining methods for extracting useful and interesting knowledge from the large data sets of students using LON-CAPA educational resources. The purpose is to develop techniques that will provide information that can be usefully applied by instructors to increase student learning, detect anomalies in homework problems, design the curricula more effectively, predict the approaches that students will take for some types of problems, and provide appropriate advising for students in a timely manner. This introductory chapter provided an overview of the LON-CAPA system, the context in which we are going to use data mining methods.

References:

[1] Agrawal, R. and Srikant, R. (1994). "Fast Algorithms for Mining Association Rules". Proceedings of the 20th International Conference on Very Large Databases, Santiago, Chile, September 1994.
[2] Agrawal, R. and Srikant, R. (1995). "Mining Sequential Patterns". Proceedings of the 11th International Conference on Data Engineering, Taipei, Taiwan, March 1995.
[3] Agrawal, R. and Shafer, J.C. (1996). "Parallel Mining of Association Rules". IEEE Transactions on Knowledge and Data Engineering, Vol. 8, No. 6, December 1996.
[4] Arning, A., Agrawal, R., and Raghavan, P. (1996). "A Linear Method for Deviation Detection in Large Databases". Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining, KDD 1996.
[5] Ching, J.Y., Wong, A.K.C., and Chan, C.C. (1995). "Class Dependent Discretisation for Inductive Learning from Continuous and Mixed-mode Data". IEEE Transactions on PAMI, 17(7), 641-645.
[6] Shute, V. and Psotka, J. (1996). Intelligent Tutoring Systems: Past, Present, and Future. Handbook of Research for Educational Communications and Technology. Jonassen. New York, NY, Macmillan.
[7] Shute, V. J., Gawlick-Grendell, L. A., et al. (1993). An Experiential System for Learning Probability: Stat Lady. Annual Meeting of the American Educational Research Association, Atlanta, GA.

Prof. Darshana K. Dave received the M.E. in Mechanical Engineering (CAD/CAM) from Gujarat University, Gujarat, India in 2009 and the B.E. in Production Engineering from Bhavnagar University, India in 2003. She is working as Assistant Professor in the Production Engineering Department of Government Engineering College, Bhavnagar, Gujarat, India. Her research interests include Digital Image Processing, CAD, CAM, and Welding. E-mail: dave_dars@yahoo.co.in

Prof. Preeti K. Dave received the B.E. in Electronics and Communication from Bhavnagar University, India in 2000 and the M.E. in Electrical Engineering (Microprocessor Application) from M. S. University, Vadodara, India in 2009. She is working as Assistant Professor in the Electronics and Communication Department of Shantilal Shah Engineering College, Bhavnagar, Gujarat, India. Her research interests include Digital Image Processing, Digital Signal Processing, Antenna and Wave Propagation, and Analog Electronics. E-mail: preetidave@yahoo.com


  • 1. ISSN: 2277 – 9043 International Journal of Advanced Research in Computer Science and Electronics Engineering Volume 1, Issue 4, June 2012 Knowledge Discovery in Databases (KDD) in online educational system through LON-CAPA system 1 Prof. Darshana Kishorbhai Dave, 2 Prof. Preeti Kishorbhai Dave ,  Abstract—In this paper we give a definition of KDD and Data Mining, describing its tasks, methods, and applications. Our motivation in this study is gaining the best technique for extracting useful information from large amounts of data in an online educational system, in general, and from the LON-CAPA system, in particular. The goals for this study are: to obtain an optimal predictive model for students within such systems, help students use the learning resources better, based on the usage of the resource by other students in their groups, help instructors design their curricula more effectively, and provide the information that can be usefully applied by instructors to Figure 1.1 Steps of the KDD Process (Fayyad et al., 1996) increase student learning. The process of data mining uses machine learning, statistics, Index Terms— LON-CAPA system, Knowledge Discovery in and visualization Techniques to discover and present Databases (KDD), Data Mining knowledge in a form that is easily comprehensible. The word “Knowledge” in KDD refers to the discovery of patterns which are extracted from the processed data. A I. INTRODUCTION pattern is an expression describing facts in a subset of the Presently, the amount of data stored in databases is data. Thus, the difference between KDD and data mining is increasing at a tremendous speed. 
that “KDD refers to the overall process of discovering This gives rise to a need for new techniques and tools to aid knowledge from data while data mining refers to application humans in automatically and intelligently analyzing huge of algorithms for extracting patterns from data without the additional steps of the KDD process.” (1) data sets to gather useful information. This growing need However, since Data Mining is a crucial and important part gives birth to a new research field called Knowledge of the KDD process, Most researchers use both terms Discovery in Databases (KDD) or Data Mining, which has interchangeably. Figure 1.1 presents the iterative nature of attracted attention from researchers in many different fields the KDD process. Here we outline some of its basic steps as including database design, statistics, pattern recognition, are mentioned in Brachman & Anad (1996): machine learning, and data visualization. • Providing an understanding of the application domain, the goals of the system And its users, and the relevant prior Data Mining is the process of analyzing data from different background and prior knowledge (This step In not specified perspectives and Summarizing the results as useful in this figure.) information. It has been defined as "the nontrivial • Selecting a data set, or focusing on a subset of variables or Process of identifying valid, novel, potentially useful, and data samples, on Which discovery is to be performed? ultimately understandable Patterns in data" (1). 
• Preprocessing and data cleansing, removing the noise, collecting the necessary Information for modeling, selecting methods for handling missing data fields, Accounting for time sequence information and changes • Data reduction and projection, finding appropriate features to represent data, Using dimensionality reduction or transformation methods to reduce the number Of variables to find invariant representations for data • Choosing the data mining task depending on the goal of KDD: clustering, Classification, regression, and so forth Manuscript received May, 2012. • Selecting methods and algorithms to be used for searching Prof. Darshana Kishorbhai Dave ,Production Engineering Dept.,Government Engineering College,Bhavnagar ., Bhavnagar, Gujarat, for the patterns in the Data India, • Mining the knowledge: searching for patterns of interest Prof. Preeti Kishorbhai Dave, Electrnonics & Communication • Evaluating or interpreting the mined patterns, with a Dept.Shantilal Shah Engineering College,Bhavnagar, Gujarat, India, possible return to any Previous steps . 15 All Rights Reserved © 2012 IJARCSEE
• Using this knowledge for promoting the performance of the system and resolving any potential conflicts with previously held beliefs or extracted knowledge.

These are the steps that all KDD and data mining tasks progress through.

II Data Mining Methods

The objective of data mining is both prediction and description. That is, to predict unknown or future values of the attributes of interest using other attributes in the databases, while describing the data in a manner understandable and interpretable to humans. Predicting the sale amounts of a new product based on advertising expenditure, or predicting wind velocities as a function of temperature, humidity, air pressure, etc., are examples of tasks with a predictive goal in data mining. Describing the different terrain groupings that emerge in a sampling of satellite imagery is an example of a descriptive goal for a data mining task. The relative importance of description and prediction can vary between different applications. These two goals can be fulfilled by any of a number of data mining tasks, including classification, regression, clustering, summarization, dependency modeling, and deviation detection (2).

Predictive tasks

The following are general tasks that serve predictive data mining goals:
• Classification – to segregate items into several predefined classes. Given a collection of training samples, this type of task can be designed to find a model for class attributes as a function of the values of other attributes (3).
• Regression – to predict a value of a given continuously valued variable based on the values of other variables, assuming either a linear or nonlinear model of dependency. These tasks are studied in statistics and neural network fields (4).
• Deviation Detection – to discover the most significant changes in data from previously measured or normative values (5). Explicit information outside the data, like integrity constraints or predefined patterns, is used for deviation detection. Arning et al. (1996) approached the problem from the inside of the data, using the implicit redundancy.

Descriptive tasks

• Clustering – to identify a set of categories, or clusters, that describe the data (1).
• Summarization – to find a concise description for a subset of data. Tabulating the mean and standard deviations for all fields is a simple example of summarization. There are more sophisticated techniques for summarization and they are usually applied to facilitate automated report generation and interactive data analysis (2).
• Dependency modeling – to find a model that describes significant dependencies between variables. For example, probabilistic dependency networks use conditional independence to specify the structural level of the model and probabilities or correlation to specify the strengths (quantitative level) of dependencies (3).

Mixed tasks

There are some tasks in data mining that have both descriptive and predictive aspects. Using these tasks, we can move from basic descriptive tasks toward higher-order predictive tasks. Here, we indicate two of them:
• Association Rule Discovery – Given a set of records, each of which contains some number of items from a given collection, produce dependency rules which will predict the occurrence of an item based on patterns found in the data.
• Sequential Pattern Discovery – Given a set of objects, where each object is associated with its own timeline of events, find rules that predict strong sequential dependencies among different events. Rules are formed by first discovering patterns followed by event occurrences which are governed by timing constraints found within those patterns.

So far we have briefly described the main concepts of data mining. Chapter two focuses on methods and algorithms of data mining in the context of descriptive and predictive tasks. Association rule mining and sequential pattern mining are newer techniques in data mining, and their research background deserves a separate discussion. Data mining does not take place in a vacuum. In other words, any application of this method of analysis is dependent upon the context in which it takes place. Therefore, it is necessary to know the environment in which we are going to use data mining methods. The next section provides a brief overview of the LON-CAPA system.

Online Education Systems

Several online education systems such as Blackboard, WebCT, Virtual University (VU), and some other similar systems have been developed to focus on course management issues. The objectives of these systems are to present courses and instructional programs through the web and other technologically enhanced media. These new technologies make it possible to offer instruction without the limitations of time and place found in traditional university programs. However, these systems tend to use existing materials and present them as a static package via the Internet. There is another approach, pursued in LON-CAPA, to construct more-or-less new courses using newer network technology. In this model of content creation, college faculty, K-12 teachers, and students interested in collaboration can access a database of hypermedia software.

LON-CAPA, System Overview

LON-CAPA is a distributed instructional management system, which provides students with personalized problem sets, quizzes, and exams. Personalized (or individualized) homework means that each student sees a slightly different computer-generated problem. LON-CAPA provides students and instructors with immediate feedback on conceptual understanding and correctness of solutions. It also provides
faculty the ability to augment their courses with individualized, relevant exercises, and develop and share modular online resources. LON-CAPA aims to put this functionality on a homogeneously distributed platform for creating, sharing, and delivering course content with emphasis on cross-institutional collaboration and intellectual property rights management.

LON-CAPA Topology

LON-CAPA is physically built as a geographically distributed network of constantly connected servers. Figure 1.2 shows an overview of this network. All machines in the network are connected with each other through two-way persistent TCP/IP connections. The network has two classes of servers: library servers and access servers. A library server can act as a home server that stores all personal records of users, and is responsible for the initial authentication of users when a session is opened on any server in the network. For authors, it also hosts their construction area and the authoritative copy of every resource that has been published by that author. An access server is a machine that hosts student sessions. Library servers can be used as backups to host sessions when all access servers in the network are overloaded. Every user in LON-CAPA is a member of one domain. Domains could be defined by departmental or institutional boundaries like MSU, FSU, OHIOU, or the name of a publishing company. These domains can be used to limit the flow of personal user information across the network, set access privileges, and enforce royalty schemes. Thus, the student and course data are distributed amongst several repositories. Each user in the system has one library server, which is his/her home server. It stores the authoritative copy of all of their records.

LON-CAPA currently runs on Redhat-Linux Intel-compatible hardware. The current MSU production setup consists of several access servers and some library servers. All access servers are set up on a round-robin IP scheme as frontline machines, and are accessed by the students for “user sessions.” The current implementation of LON-CAPA uses mod_perl inside of the Apache web server software.

Figure 1.2 A schema of distributed data in LON-CAPA

Data Distribution in LON-CAPA

Educational objects in LON-CAPA range from simple paragraphs of text, movies, and applets, to individualized homework problems. Online educational projects at MSU have produced extensive libraries of resources across disciplines. By combining these resources, LON-CAPA produces a national distributed digital library with mechanisms to store and retrieve these objects. Participants in LON-CAPA can publish their own objects in the common pool. LON-CAPA will allow groups of organizations (departments, universities, schools, commercial businesses) to link their online instructional resources in a common marketplace, thus creating an online economy for instructional resources (lon-capa.org). Internally, all resources are identified primarily by their URL. LON-CAPA enables faculty to combine and sequence these learning objects at several levels. For example, an instructor from Community College A in Texas can compose a page by combining a text paragraph from University B in Detroit with a movie from College C in California and an online homework problem from Publisher D in New York. Another instructor from High School E in Canada might take that page from Community College A and combine it with other pages into a module, unit or section. Those in turn can be combined into whole course packs.

Resource Variation in LON-CAPA

LON-CAPA provides three types of resources for organizing a course. LON-CAPA refers to these resources as Content Pages, Problems, and Maps. Maps may be either of two types: Sequences or Pages. LON-CAPA resources may be used to build the outline, or structure, for the presentation of the course to the students.
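The multi-level composition described under Data Distribution (resources identified by URL, combined into pages, which are in turn combined into modules and course packs) can be pictured as a small tree. The following Python sketch is only an illustration under stated assumptions: the class names, the helper leaf_urls, and every URL are hypothetical, and modelling a module or course pack as a Sequence is our reading of the text, not LON-CAPA's actual internal representation.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Resource:
    """A leaf learning object, identified by its URL (all URLs here are made up)."""
    url: str


@dataclass
class Map:
    """A Map is either a Page or a Sequence and holds an ordered list of children."""
    kind: str   # "page" or "sequence" (assumed labels)
    title: str
    children: List[Union["Map", Resource]] = field(default_factory=list)


def leaf_urls(node: Union[Map, Resource]) -> List[str]:
    """Collect the URLs of all leaf resources below a node, in presentation order."""
    if isinstance(node, Resource):
        return [node.url]
    urls: List[str] = []
    for child in node.children:
        urls.extend(leaf_urls(child))
    return urls


# A page assembled from objects published by three different (hypothetical) sources.
page = Map("page", "Introductory page", [
    Resource("/res/univ_b/text/paragraph.html"),
    Resource("/res/college_c/media/movie.mov"),
    Resource("/res/publisher_d/hw/problem1.problem"),
])

# Another instructor combines that page with further material into a module,
# and modules are combined into a course pack.
module = Map("sequence", "Module 1", [page, Resource("/res/school_e/notes/summary.html")])
course_pack = Map("sequence", "Course pack", [module])

for url in leaf_urls(course_pack):
    print(url)
```

Because every leaf carries only a URL, the same published object can appear in many pages or sequences without being copied, which matches the idea of a shared resource pool.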
• A Content Page displays course content. It is essentially a conventional HTML page. These resources use the extension “.html”.
• A Problem resource represents problems for the students to solve, with answers stored in the system. These resources are stored in files that must use the extension “.problem”.
• A Page is a type of Map which is used to join other resources together into one HTML page. For example, a page of problems will appear as a problem set. These resources are stored in files that must use the extension “.page”.
• A Sequence is a type of Map, which is used to link other resources together. Sequences are stored in files that must use the extension “.sequence”. Sequences can contain other sequences and pages.

Authors create these resources and publish them in library servers. Then, instructors use these resources in online courses. The LON-CAPA system logs any access to these resources as well as the sequence and frequency of access in relation to the successful completion of any assignment.

LON-CAPA Strategy for Data Storage

Internally, the student data is stored in a directory:
/home/httpd/lonUsers/domain/1st.char/2nd.char/3rd.char/username/
For example:
/home/httpd/lonUsers/gec/m/i/n/minaeibi/
Figure 1.3 shows a list of a student’s data. Files ending with .db are GDBM (GNU dbm) database files, while those with a course-ID as name, for example gec_12679c3ed543a25msul1.db, store performance data for that student in the course.

Figure 1.3 Directory listing of user’s home directory

Courses are assigned to users, not vice versa. Internally, courses are handled like users without login privileges. The username is a unique ID, for example msu_12679c3ed543a25msul1 – every course in every semester has a unique ID, and there is no semester transition. The user-data of the course includes the full name of the course, a pointer to its top-level resource map (“course map”), and any associated deadlines, spreadsheets, etc., as well as a course enrollment list. The latter is somewhat redundant, since in principle, this list could be produced by going through the roles of all users, and looking for the valid role for a student in that course.

Figure 1.4 Directory listing of course’s home directory

An example of course data is shown in Figure 1.4. classlist is the list of students in the course, environment includes the course’s full name, etc., and resourcedata holds deadlines, etc. The parameters for homework problems are stored in these files. To identify a specific instance of a resource, LON-CAPA uses symbols or “symbs.” These identifiers are built from the URL of the map, the resource number of the resource in the map, and the URL of the resource itself. The latter is somewhat redundant, but might help if maps change. An example is
gec/dkd/parts/part1.sequence___19___gec/dkd/tests/part12.problem
The respective map entry is
<resource id="19" src="/res/gec/dkd/tests/part12.problem" title="Problem 2"> </resource>
Symbs are used by the random number generator, as well as to store and restore data specific to a certain instance of a problem. More details of the stored data and their exact structures will be explained in chapter three, when we will describe the data acquisition of the system.

Resource Evaluation in LON-CAPA

One of the most challenging aspects of the system is to provide instructors with information concerning the quality and effectiveness of the various materials in the resource pool on student understanding of concepts. These materials can include web pages, demonstrations, simulations, and individualized problems designed for use on homework assignments, quizzes, and examinations. The system generates a range of statistics that can be useful in evaluating the degree to which individual problems are effective in promoting formative learning for students. For example, each exam problem contains attached metadata that catalog its degree of difficulty and discrimination for students at different phases in their education (i.e., introductory college courses, advanced college courses, and so on). To evaluate resource pool materials, a standardized format is required so that materials from different sources can be compared. This helps resource users to select the most effective materials available.

LON-CAPA has also provided questionnaires which are completed by faculty and students who use the educational materials to assess the quality and efficacy of resources. In addition to providing the questionnaires and using the statistical reports, we investigate here methods to find criteria for classifying students and grouping problems by examining logged data such as: time spent on a particular resource, resources visited (other web pages), due date for each homework, the difficulty of problems (observed
statistically) and others. Thus, floods of data on individual usage patterns need to be gathered and sorted – especially as students go through multiple steps to solve problems, and choose between multiple representations of the same educational objects like video lecture demonstrations, a derivation, a worked example, case-studies, etc. As the resource pool grows, multiple content representations will be available for use by students. There has been an increasing demand for automated methods of resource evaluation. One such method is data mining, which is the focus of this research. Since the LON-CAPA data analyses are specific to the field of education, it is important to recognize the general context of using artificial intelligence in education. The following section presents a brief review of intelligent tutoring systems – one typical application of artificial intelligence in the field of education. Note that herein the purpose is not to develop an intelligent tutoring system; instead we apply the main ideas of intelligent tutoring systems in an online environment, and implement data mining methods to improve the performance of the educational web-based system, LON-CAPA.

V CONCLUSION

This paper addresses data mining methods for extracting useful and interesting knowledge from the large data sets of students using LON-CAPA educational resources. The purpose is to develop techniques that will provide information that can be usefully applied by instructors to increase student learning, detect anomalies in homework problems, design the curricula more effectively, predict the approaches that students will take for some types of problems, and provide appropriate advising for students in a timely manner. This introductory chapter provided an overview of the LON-CAPA system, the context in which we are going to use data mining methods.

References:
[1] Agrawal, R. and Srikant, R. (1994). “Fast Algorithms for Mining Association Rules”. Proceedings of the 20th International Conference on Very Large Databases, Santiago, Chile, September 1994.
[2] Agrawal, R. and Srikant, R. (1995). “Mining Sequential Patterns”. Proceedings of the 11th International Conference on Data Engineering, Taipei, Taiwan, March 1995.
[3] Agrawal, R. and Shafer, J.C. (1996). “Parallel Mining of Association Rules”. IEEE Transactions on Knowledge and Data Engineering, Vol. 8, No. 6, December 1996.
[4] Arning, A., Agrawal, R., and Raghavan, P. (1996). “A Linear Method for Deviation Detection in Large Databases”. Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining, KDD 1996.
[5] Ching, J.Y., Wong, A.K.C., and Chan, C.C. (1995). “Class Dependent Discretisation for Inductive Learning from Continuous and Mixed-mode Data”. IEEE Transactions on PAMI, 17(7), 641-645.
[6] Shute, V. and Psotka, J. (1996). “Intelligent Tutoring Systems: Past, Present, and Future”. Handbook of Research for Educational Communications and Technology, D. Jonassen (ed.). New York, NY, Macmillan.
[7] Shute, V. J., Gawlick-Grendell, L. A., et al. (1993). “An Experiential System for Learning Probability: Stat Lady”. Annual Meeting of the American Educational Research Association, Atlanta, GA.

Prof. Darshana K. Dave received the M.E. in Mechanical Engineering (CAD/CAM) from Gujarat University, Gujarat, India in 2009 and the B.E. degree in Production Engineering from Bhavnagar University, India in 2003. She is working as Assistant Professor in the Production Engineering Department of Government Engineering College Bhavnagar, Gujarat, India. Her research interests include Digital Image Processing, CAD, CAM and Welding. E-Mail id: dave_dars@yahoo.co.in

Prof. Preeti K. Dave received the B.E. degree in Electronics and Communication from Bhavnagar University, India in 2000 and the M.E. in Electrical Engineering (Microprocessor Application) from M. S. University, Vadodara, India in 2009. She is working as Assistant Professor in the Electronics and Communication Department of Shantilal Shah Engineering College, Bhavnagar, Gujarat, India. Her research interests include Digital Image Processing, Digital Signal Processing, Antenna and Wave Propagation and Analog Electronics. E-Mail id: preetidave@yahoo.com

All Rights Reserved © 2012 IJARCSEE