Knowledge Discovery in Databases (KDD) is the process of finding knowledge in massive amounts of data, and data mining is the core of this process. Data mining can be used to mine understandable, meaningful patterns from large databases, and these patterns may then be converted into knowledge. Data mining extracts the information and patterns derived by the KDD process, which helps in crucial decision-making. Data mining works with a data warehouse, and the whole process is divided into an action plan to be performed on the data: selection, transformation, mining, and results interpretation. In this paper, we have reviewed the Knowledge Discovery perspective in Data Mining and consolidated the different areas of data mining, along with its techniques and methods.
Short introduction to different options for ETL & ELT in the Cloud with Microsoft Azure. This is a small accompanying set of slides for my presentations and blogs on this topic
Introduction to MySQL, and its features with an explanation of the various processes that should be followed in order to have an efficient MySQL implementation.
Heart Disease Prediction Using Data Mining Techniques - IJRES Journal
There are huge amounts of data in the medical industry that are not processed properly and hence cannot be used effectively in making decisions. We can use data mining techniques to mine patterns and relationships from this data. This research has developed a prototype heart disease prediction system using data mining techniques, namely Neural Network, K-Means Clustering and Frequent Item Set Generation. Using medical profiles such as age, sex, blood pressure and blood sugar, it can predict the likelihood of patients getting heart disease. It enables significant knowledge, e.g. patterns and relationships between medical factors related to heart disease, to be established. The performance of these techniques is compared through sensitivity, specificity and accuracy. It has been observed that Artificial Neural Networks outperform K-Means clustering on all of these parameters.
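The three evaluation measures named above can be computed from the counts of a binary confusion matrix. The following is a minimal sketch, not the authors' evaluation code; the function name `classification_metrics` and the sample labels are hypothetical and chosen only for illustration.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return sensitivity, specificity and accuracy for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return {
        # Share of actual positives (e.g. patients with disease) that were detected.
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        # Share of actual negatives that were correctly cleared.
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
        # Share of all predictions that were correct.
        "accuracy": (tp + tn) / len(pairs) if pairs else 0.0,
    }


# Hypothetical labels: 1 = heart disease, 0 = no heart disease.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Reporting sensitivity and specificity alongside accuracy matters in medical settings because a model can reach high accuracy on imbalanced data while still missing most positive cases.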
Organizations are grappling to manually classify and create an inventory for distributed and heterogeneous data assets to deliver value. However, the new Azure service for enterprises – Azure Synapse Analytics is poised to help organizations and fill the gap between data warehouses and data lakes.
Some people use the terms "e-business" and "e-commerce" interchangeably. After all, they both involve business processes conducted electronically -- quite likely on the Internet. Others view e-commerce to be a subset of e-business.
Data Warehousing Trends, Best Practices, and Future Outlook - James Serra
Over the last decade, the 3Vs of data (Volume, Velocity and Variety) have grown massively. The Big Data revolution has completely changed the way companies collect, analyze and store data. Advancements in cloud-based data warehousing technologies have empowered companies to fully leverage big data without heavy investments of time and resources. But that doesn’t mean building and managing a cloud data warehouse comes without challenges. From deciding on a service provider to the design architecture, deploying a data warehouse tailored to your business needs is a strenuous undertaking. Looking to deploy a data warehouse to scale your company’s data infrastructure, or still on the fence? In this presentation you will gain insights into current data warehousing trends, best practices, and the future outlook. Learn how to build your data warehouse with the help of real-life use cases and a discussion of commonly faced challenges. In this session you will learn:
- Choosing the best solution - Data Lake vs. Data Warehouse vs. Data Mart
- Choosing the best Data Warehouse design methodologies: Data Vault vs. Kimball vs. Inmon
- Step by step approach to building an effective data warehouse architecture
- Common reasons for the failure of data warehouse implementations and how to avoid them
DAT304_Amazon Aurora Performance Optimization with MySQL - Kamal Gupta
Amazon Aurora services are MySQL- and PostgreSQL-compatible relational database engines with the speed, reliability, and availability of high-end commercial databases at one-tenth the cost. This session introduces you to Amazon Aurora, explores the capabilities and features of Aurora, explains common use cases, and helps you get started with Aurora.
Presentation / Workshop which will teach you the core patterns, concepts and visualisation options of D3.js (v4). Accompanying exercises can be found here: https://github.com/josdirksen/d3exercises
Azure Synapse Analytics is Azure SQL Data Warehouse evolved: a limitless analytics service, that brings together enterprise data warehousing and Big Data analytics into a single service. It gives you the freedom to query data on your terms, using either serverless on-demand or provisioned resources, at scale. Azure Synapse brings these two worlds together with a unified experience to ingest, prepare, manage, and serve data for immediate business intelligence and machine learning needs. This is a huge deck with lots of screenshots so you can see exactly how it works.
Achieving Lakehouse Models with Spark 3.0 - Databricks
It’s very easy to be distracted by the latest and greatest approaches with technology, but sometimes there’s a reason old approaches stand the test of time. Star schemas and Kimball modelling are among those things that aren’t going anywhere, but as we move towards the “Data Lakehouse” paradigm, how appropriate is this modelling technique, and how can we harness the Delta Engine and Spark 3.0 to maximise its performance?
Presentation about how we automate the recurring job of a data analyst in kumparan. It mainly talks about two tools: data warehouse and derived automation.
Educational Data Mining/Learning Analytics issue brief overview - Marie Bienkowski
An overview of the Draft Issue Brief prepared by SRI International for the US Department of Education on Educational Data Mining and Learning Analytics
Advances in Learning Analytics and Educational Data Mining - MehrnooshV
This presentation covers the state of the art in Learning Analytics and Educational Data Mining. It was presented by Mehrnoosh Vahdat as the introductory tutorial of the Special Session 'Advances in Learning Analytics and Educational Data Mining' at the ESANN 2015 conference.
Ijdms050304 A SURVEY ON EDUCATIONAL DATA MINING AND RESEARCH TRENDS - ijdms
Educational Data Mining (EDM) is an emerging field that explores data in educational contexts by applying different Data Mining (DM) techniques and tools. It provides intrinsic knowledge of the teaching and learning process for effective education planning. This survey focuses on the components and research trends (1998 to 2012) of EDM, highlighting its related tools, techniques and educational outcomes. It also highlights the challenges of EDM.
Data analytics has incredible potential to impact education worldwide. A significant amount of data is being collected about schools and students (e.g. personal information, attendance, marks, reduced lunches and so on), but much of it is administrative, siloed, or unexamined. This document discusses data analytics in the education domain.
5 tips for splitting equity among the founders of a startup - Ivan Bedia García
Splitting the capital among the partners, also known as “equity”, is one of the business matters that most often worries entrepreneurs.
To help you split the equity, today I bring you 5 tips that I hope you will find useful:
Data is big, so grab it, store it, analyse it, make it accessible... mine, warehouse and visualise... use the pictures in your mind and others will see it your way!
We live in a world that produces a vast amount of digital data, known as big data, and that is becoming more and more connected via the Internet of Things (IoT). The IoT has been a major influence on the big data landscape. The analysis of such big data takes business competition to the next level of innovation and productivity.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
STUDENTS’ PERFORMANCE PREDICTION SYSTEM USING MULTI AGENT DATA MINING TECHNIQUE - IJDKP
A high prediction accuracy for students’ performance helps to identify low-performing students at the beginning of the learning process. Data mining is used to attain this objective. Data mining techniques are used to discover models or patterns in data, and they are very helpful in decision-making. Boosting is one of the most popular techniques for constructing ensembles of classifiers to improve classification accuracy. Adaptive Boosting (AdaBoost) is one generation of boosting algorithms. It is used for binary classification and is not directly applicable to multiclass classification. The SAMME boosting technique extends AdaBoost to multiclass classification without reducing it to a set of binary sub-classifications. In this paper, a students’ performance prediction system using Multi Agent Data Mining is proposed to predict the performance of students based on their data with high prediction accuracy and to provide help to low-performing students through optimization rules. The proposed system has been implemented and evaluated by investigating the prediction accuracy of the AdaBoost.M1 and LogitBoost ensemble classifier methods and of the C4.5 single classifier method. The results show that using the SAMME boosting technique improves the prediction accuracy and outperforms the C4.5 single classifier and LogitBoost.
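The multiclass extension mentioned above rests on one change to the classifier weight: SAMME adds a log(K - 1) term, where K is the number of classes. The sketch below illustrates that formula and the accompanying sample reweighting step. It is not the paper's multi-agent system, and the function names `samme_alpha` and `reweight` are hypothetical.

```python
import math


def samme_alpha(error, n_classes):
    """Weight of one weak classifier in SAMME given its weighted error rate."""
    if not 0.0 < error < 1.0:
        raise ValueError("error must lie strictly between 0 and 1")
    # The log(K - 1) term vanishes for K = 2, leaving log((1 - err) / err).
    return math.log((1.0 - error) / error) + math.log(n_classes - 1.0)


def reweight(weights, misclassified, alpha):
    """Boost the weights of misclassified samples, then renormalise to sum to 1."""
    updated = [w * math.exp(alpha) if miss else w
               for w, miss in zip(weights, misclassified)]
    total = sum(updated)
    return [w / total for w in updated]


# With 3 classes, a weak learner with 60% error still beats random guessing
# (random error would be 2/3), so it receives a positive weight.
alpha = samme_alpha(0.6, 3)
new_weights = reweight([0.25, 0.25, 0.25, 0.25], [True, False, False, True], alpha)
print(alpha, new_weights)
```

The practical consequence is that a weak learner only needs to do better than random guessing over K classes, rather than better than 50% error, to contribute positively to the ensemble.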
Study and Analysis of K-Means Clustering Algorithm Using Rapidminer - IJERA Editor
An institution is a place where the teacher explains and the student understands and learns the lesson. Every student has his own definition of difficulty and ease, and there is no absolute scale for measuring knowledge, but examination scores indicate a student's performance. In this case study, knowledge of data mining is combined with educational strategies to improve students’ performance. Generally, data mining (sometimes called data or knowledge discovery) is the process of analysing data from different perspectives and summarizing it into useful information. Data mining software is one of a number of analytical tools for data. It allows users to analyse data from many different dimensions or angles, categorize it, and summarize the relationships identified. Technically, data mining is the process of finding correlations or patterns among dozens of fields in large relational databases. Cluster analysis, or clustering, is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense or another) to each other than to those in other groups (clusters). This project describes the use of the clustering data mining technique to improve academic performance in educational institutions. In this project, a live experiment was conducted on students: an exam was given to computer science majors using MOODLE (LMS), and the generated data was analysed and then clustered using RapidMiner (data mining software). This method helps to identify the students who need special advising or counselling by the teacher in order to deliver a high quality of education.
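The grouping step described above can be illustrated with a small K-means loop. This is a minimal pure-Python sketch, not the RapidMiner workflow used in the study. The exam scores and the fixed starting centroids are hypothetical values chosen so that the run is deterministic.

```python
def kmeans(points, centroids, max_iter=100):
    """Plain K-means on tuples of floats, starting from the given centroids."""
    clusters = [[] for _ in centroids]
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid (squared distance).
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])),
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster;
        # an empty cluster keeps its previous centroid.
        new_centroids = [
            tuple(sum(dim) / len(c) for dim in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # Converged: no centroid moved.
        centroids = new_centroids
    return centroids, clusters


# Hypothetical one-dimensional exam scores for six students.
scores = [(35.0,), (40.0,), (42.0,), (78.0,), (82.0,), (90.0,)]
centroids, clusters = kmeans(scores, [(35.0,), (90.0,)])
print(centroids)
print(clusters)
```

In this toy run the lower cluster would correspond to the students flagged for extra advising. Real use would normally start from several random initialisations, since K-means can settle in different local optima.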
An Empirical Study of the Applications of Classification Techniques in Studen... - IJERA Editor
University servers and databases store a huge amount of data, including personal details, registration details, evaluation assessments, performance profiles, and much more, for students and lecturers alike. The main problem facing any system administrator or user is that data grows every second and is stored in different types and formats on the servers, which makes it hard to learn about students from this huge amount of data. Future graduation and academic information, and maintaining the structure and content of courses according to previous results, become important. The objectives of the paper are to extract knowledge from an incomplete data structure and to identify a suitable data mining method or technique for extracting knowledge from a huge amount of student data, to help the administration use technology to make quick decisions. Data mining aims to discover useful information or knowledge by using one of the data mining techniques; this paper uses the classification technique to discover knowledge from the students’ server database, where all students’ information was registered and stored. For the classification task, the C4.5 tree classifier is used to predict students’ final academic results (grades). The data cover a four-year period [2006-2009]. Experimental results show that the classification process succeeded on the training set: the predicted instances are similar to the training set, which supports the suggested classification model. The efficiency and effectiveness of the C4.5 algorithm in predicting academic results (grades) are also very good. The model can also improve the efficiency of retrieving academic results and evidently promotes retrieval precision.
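C4.5 chooses the attribute to split on by gain ratio, which is information gain divided by the entropy of the split itself. The sketch below shows that criterion for a single categorical attribute. It is an illustration only, not the paper's implementation, and the attribute values and grade labels are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def gain_ratio(values, labels):
    """Gain ratio of splitting `labels` on a categorical attribute `values`."""
    total = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Expected entropy of the class labels after the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    # Split information penalises attributes with many distinct values.
    split_info = entropy(values)
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical student records: one informative and one uninformative attribute.
grades = ["pass", "pass", "fail", "fail"]
attendance = ["high", "high", "low", "low"]
section = ["a", "b", "a", "b"]
print(gain_ratio(attendance, grades))
print(gain_ratio(section, grades))
```

A tree builder would evaluate this score for every candidate attribute at a node and split on the highest one, then recurse on each resulting subset.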
The Survey of Data Mining Applications And Feature Scope - IJCSEIT Journal
In this paper we focus on a variety of techniques, approaches, and different areas of research that are helpful and regarded as important in the field of data mining technologies. Many multinational companies and large organizations operate in different places in different countries. Each place of operation may generate large volumes of data. Corporate decision makers require access to all such sources to take strategic decisions. The data warehouse delivers significant business value by improving the effectiveness of managerial decision-making. In an uncertain and highly competitive business environment, the value of strategic information systems such as these is easily recognized; however, in today’s business environment, efficiency or speed is not the only key to competitiveness. Such huge amounts of data, available on the scale of terabytes to petabytes, have drastically changed the areas of science and engineering. To analyze and manage such huge amounts of data and make decisions from them, we need the techniques called data mining, which are transforming many fields. This paper presents a number of applications of data mining and also focuses on the scope of data mining, which will be helpful in further research.
6 ijaems sept-2015-6-a review of data security primitives in data mining - INFOGAIN PUBLICATION
This paper discusses various issues and security primitives in the area of data mining, such as Spatial Data Handling, Privacy Protection of data, Data Load Balancing, and Resource Mining. A 5-stage review process was conducted for 30 research papers published between 1996 and 2013. After an exhaustive review process, nine key issues were found (“Spatial Data Handling, Data Load Balancing, Resource Mining, Visual Data Mining, Data Clusters Mining, Privacy Preservation, Mining of gaps between business tools & patterns, Mining of hidden complex patterns”), which have been resolved and explained with proper methodologies. Several solution approaches are discussed in the 30 papers. This paper presents the outcome of the review in the form of findings grouped under the key issues. The findings include the algorithms and methodologies used by researchers, along with their strengths and weaknesses and the scope for future work in the area.
Applying Classification Technique using DID3 Algorithm to improve Decision Su... - IJMER
International Journal of Modern Engineering Research (IJMER) is a peer-reviewed online journal. It serves as an international archival forum of scholarly research related to engineering and science education.
International Journal of Modern Engineering Research (IJMER) covers all the fields of engineering and science: Electrical Engineering, Mechanical Engineering, Civil Engineering, Chemical Engineering, Computer Engineering, Agricultural Engineering, Aerospace Engineering, Thermodynamics, Structural Engineering, Control Engineering, Robotics, Mechatronics, Fluid Mechanics, Nanotechnology, Simulators, Web-based Learning, Remote Laboratories, Engineering Design Methods, Education Research, Students' Satisfaction and Motivation, Global Projects, and Assessment…. And many more.
Data mining is the process of discovering patterns in large data sets, involving methods at the intersection of machine learning, statistics, and database systems. Data mining is an interdisciplinary subfield of computer science and statistics with the overall goal of extracting information from a data set and transforming it into a comprehensible structure for further use. The process of digging through data to discover hidden connections and predict future trends has a long history. Although the practice is sometimes referred to as 'knowledge discovery in databases', the term data mining wasn't coined until the 1990s. What was old is new again, as data mining technology keeps evolving to keep pace with the limitless potential of big data and affordable computing power. Over the last decade, advances in processing power and speed have enabled us to move beyond manual, tedious and time-consuming practices to quick, easy and automated data analysis. The more complex the data sets collected, the more potential there is to uncover relevant insights. Rupashi Koul "Overview of Data Mining" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-4 | Issue-4, June 2020, URL: https://www.ijtsrd.com/papers/ijtsrd31368.pdf Paper URL: https://www.ijtsrd.com/engineering/computer-engineering/31368/overview-of-data-mining/rupashi-koul
Data Mining System and Applications: A Review - ijdpsjournal
In the Information Technology era, information plays a vital role in every sphere of human life. It is very important to gather data from different data sources, store and maintain the data, generate information and knowledge, and disseminate data, information and knowledge to every stakeholder. Due to the vast use of computers and electronic devices and the tremendous growth in computing power and storage capacity, there has been explosive growth in data collection. Storing the data in a data warehouse enables the entire enterprise to access a reliable, current database. Analyzing this vast amount of data and drawing fruitful conclusions and inferences requires special tools called data mining tools. This paper gives an overview of data mining systems and some of their applications.
DATA SCIENCE METHODOLOGY FOR CYBERSECURITY PROJECTS - cscpconf
Cybersecurity solutions are traditionally static and signature-based. Combined with analytic models, machine learning and big data, these traditional solutions could be improved to automatically trigger mitigation or provide relevant awareness to control or limit the consequences of threats. This kind of intelligent solution is covered in the context of Data Science for Cybersecurity. Data science plays a significant role in cybersecurity by utilising the power of data (and big data), high-performance computing and data mining (and machine learning) to protect users against cybercrimes. For this purpose, a successful data science project requires an effective methodology to cover all issues and provide adequate resources. In this paper, we introduce popular data science methodologies and compare them with respect to cybersecurity challenges. A comparative discussion is also provided to explain the methodologies’ strengths and weaknesses for cybersecurity projects.
We have concentrated on a range of strategies, methodologies, and distinct fields of research in this article, all of which are useful and relevant in the field of data mining technologies. As we all know, numerous multinational corporations and major corporations operate in various parts of the world. Each location of business may create significant amounts of data. Corporate decision-makers need access to all of these data sources in order to make strategic decisions.
Applications, Techniques and Trends of Data Mining and Knowledge Discovery Da... - ijtsrd
Data Mining and Knowledge Discovery is intended to be the best technical publication in the field, providing a resource that collects relevant common methods and techniques. Traditionally, data mining and knowledge discovery were performed manually. As time passed, the amount of data in many systems grew beyond terabyte size and could no longer be maintained manually. Moreover, discovering underlying patterns in data is considered essential for the successful existence of any business. This paper discusses the applications, techniques and trends of Data Mining and Knowledge Discovery Database. Khin Sein Hlaing | Yin Myo Kay Khine Thaw "Applications, Techniques and Trends of Data Mining and Knowledge Discovery Database" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-3 | Issue-5, August 2019, URL: https://www.ijtsrd.com/papers/ijtsrd26733.pdf Paper URL: https://www.ijtsrd.com/computer-science/data-miining/26733/applications-techniques-and-trends-of-data-mining-and-knowledge-discovery-database/khin-sein-hlaing
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do... - UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
Essentials of Automations: Optimizing FME Workflows with Parameters - Safe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova... - Ramesh Iyer
In today's fast-changing business world, companies that adapt and embrace new ideas often need help to keep up with the competition. However, fostering a culture of innovation takes much work. It takes vision, leadership and a willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo... - James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
State of ICS and IoT Cyber Threat Landscape Report 2024 previewPrayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio, cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors, and newer malware including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
Key Trends Shaping the Future of Infrastructure.pdfCheryl Hung
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud and open-source; exploring how these areas are likely to mature and develop over the short and long-term, and then considering how organisations can position themselves to adapt and thrive.
GraphRAG is All You need? LLM & Knowledge GraphGuy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
Transcript: Selling digital books in 2024: Insights from industry leaders - T...BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
Neuro-symbolic is not enough, we need neuro-*semantic*Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this illustrated with link prediction over knowledge graphs, but the argument is general.
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
DATA MINING IN EDUCATION : A REVIEW ON THE KNOWLEDGE DISCOVERY PERSPECTIVE
1. International Journal of Data Mining & Knowledge Management Process (IJDKP) Vol.4, No.5, September 2014
DATA MINING IN EDUCATION : A REVIEW ON
THE KNOWLEDGE DISCOVERY PERSPECTIVE
Pratiyush Guleria and Manu Sood
Department of Computer Science,
Himachal Pradesh University, Shimla, Himachal Pradesh, India
ABSTRACT
Knowledge Discovery in Databases (KDD) is the process of finding knowledge in massive amounts of
data, and data mining is the core of this process. Data mining can be used to mine understandable,
meaningful patterns from large databases, and these patterns may then be converted into knowledge.
Data mining is the process of extracting the information and patterns derived by the KDD process,
which helps in crucial decision-making. Data mining works with data warehouses, and the whole
process is divided into an action plan to be performed on the data: selection, transformation,
mining and results interpretation. In this paper, we have reviewed the Knowledge Discovery
perspective in Data Mining and consolidated different areas of data mining, along with its
techniques and methods.
KEYWORDS
Decision, Knowledge, Mining, Selection, Transformation, Warehouse
1. INTRODUCTION
Knowledge Discovery in Databases (KDD) is the process of finding useful knowledge in large
datasets. Data preparation, pattern search, knowledge evaluation and refinement are the steps of
KDD [1]. According to Han Jiawei, the process of Knowledge Discovery consists of the Data Cleaning,
Data Integration, Data Selection, Transformation, Data Mining and Pattern Evaluation phases [2].
Data mining (DM) is the process in which data is analysed and summarized into useful information.
In short, data mining is the process of deriving patterns from large databases [3]. DM analyses
large datasets to extract hidden patterns, such as similar groups of data records, using clustering
techniques. This data is used for machine learning and predictive analysis. DM analyzes data
stored in data warehouses and results in effective decision making [4]. Weiss and Indurkhya [5]
proposed that “DM is the search for valuable information in large volumes of data”. According to
Technology Forecast [6] and Piatetsky-Shapiro et al. [7], it is the process of extracting
previously unknown, useful information, which includes knowledge, association rules and pattern
finding, using statistical and mathematical techniques. Query languages or graphical user
interfaces are required to express DM requests and discovered information, so that results
obtained from the DM engine become understandable and usable for end users.
DOI : 10.5121/ijdkp.2014.4504
2. HISTORICAL TRENDS OF DATA MINING
Data mining was introduced in the 1990s and combines many disciplines, such as database
management systems (DBMS), Statistics, Artificial Intelligence (AI), and Machine Learning (ML)
[8]. Data mining produces useful patterns by applying algorithmic methods to observational data.
2.1 Data mining trends
Data mining algorithms show the best results for numerical data, but with the emergence of
Statistics and Machine Learning techniques, algorithms have been developed to mine non-numerical
data and relational databases [9].
Earlier, most DM algorithms employed only statistical techniques [10], but now there are
computing techniques like Artificial Intelligence, Machine Learning and Pattern Recognition
[9, 11] with which huge heterogeneous data stored in data warehouses can be easily mined [2, 12].
DM applications are successfully implemented in various fields like health care, finance, retail,
telecommunication, fraud detection, risk analysis, education etc. [13, 14, 15, 16]. Due to
increasing complexity in various fields and improvements in technology, there are new challenges
for DM, which include different data formats, distributed databases, networking resources etc.
3. KNOWLEDGE DISCOVERY IN DATABASES
Data mining and knowledge discovery in databases are related to each other and to other fields
such as machine learning, statistics, and databases. Data mining is one of the steps in the
overall process of KDD, which consists of collection and preprocessing of data, data mining,
interpretation, evaluation of discovered knowledge and finally post-processing [17]. The basic
objective of the KDD field is to make data meaningful by developing methods and techniques of
mining, but the problem faced by the KDD process is to map huge and heterogeneous data into a more
understandable, more abstract and useful form [18, 19].
The phrase knowledge discovery in databases emphasizes that knowledge is the end product of a
data-driven discovery [1, 18, 20, 21, 22]. The data-mining step of KDD relies heavily on known
techniques from machine learning, pattern recognition, and statistics to find patterns in data.
Data warehousing is one of the fields of databases [18, 21, 24, 25], which helps in business
analytics and decision support. Data warehousing helps set the stage for KDD in two ways: (1) data
cleaning and (2) data access. The approach followed for the analysis of data warehouses is called
online analytical processing (OLAP) [23, 26, 27, 28, 29].
3.1 The data mining step of the KDD process
The data mining step of the KDD process involves the iterative application of particular
data-mining methods. There are two types of goals: (1) verification, in which the system is
limited to verifying the user’s hypothesis, and (2) discovery, in which the system autonomously
finds new patterns.
DM helps in determining patterns from observed data. Knowledge inference is done from fitted
models. Two primary mathematical formalisms [18] are used in model fitting: (1) statistical and
(2) logical.
3.2 Data mining methods
The primary goals of data mining in practice are prediction and description. In prediction, some
variables and fields in the database are used to predict unknown values of other variables of
interest, and description helps in finding human-understandable patterns describing the data
[31, 32]. Weiss and Kulikowski [33] proposed that “Classification is learning a function that maps
(classifies) a data item into one of several predefined classes”. Apte and Hong [34] suggested
that classification methods of data mining are used as part of knowledge discovery applications,
which include classifying trends in financial markets and education and identifying objects of
interest in large image datasets. Regression is a predictive technique that maps a data item to a
prediction variable. Clustering is a descriptive task in which we identify a finite set of
categories or clusters to describe the data, e.g. identifying those students who are short of
attendance and have shown poor performance in sessionals [35, 36, 37]. Cheeseman and Stutz [38]
suggested that examples of clustering applications in a knowledge discovery context include
discovering similar groups. Summarization involves methods like calculating means and standard
deviations. There are some methods which involve the derivation of abstract rules, visualization
techniques, and the discovery of functional relationships between variables [18, 19].
Summarization techniques are often applied to interactive exploratory data analysis and automated
report generation.
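As a small illustration of summarization, the sketch below computes a mean and a standard deviation with Python's standard statistics module. The attendance values and the summarize helper are hypothetical and are not taken from the paper.

```python
from statistics import mean, stdev

# Hypothetical attendance percentages for one class (illustrative values only).
attendance = [92.0, 85.0, 78.0, 64.0, 88.0, 71.0]

def summarize(values):
    """Return a small summary report: count, mean and sample standard deviation."""
    return {
        "count": len(values),
        "mean": mean(values),
        "stdev": stdev(values),
    }

report = summarize(attendance)
print(report)
```

A report of this kind is the sort of output that automated report generation could emit for each class or course.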
3.2.1 Decision trees and rules
Decision Trees are useful for multiple variable analyses. They split a data set into branch-like
segments [39, 40].
3.2.2 Classification methods
These methods consist of techniques for prediction. Examples include feed-forward neural networks,
adaptive spline methods, projection pursuit regression, Multi-Layer Perceptrons, Generalized
Linear Models [41, 42, 43], Bayesian Networks, Decision Trees, and Support Vector Machines.
3.2.3 Example-based methods
In these methods, predictions for new examples are derived from the examples in the model whose
outcomes are already known. Techniques include nearest-neighbour classification and regression
algorithms and case-based reasoning systems.
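Nearest-neighbour classification can be sketched in a few lines. The example below is a minimal illustration and not the authors' implementation; the student records, the feature choice (attendance, sessional marks) and the function name knn_predict are assumptions made for the example.

```python
from collections import Counter
from math import dist

# Hypothetical labelled examples: (attendance %, sessional marks) -> result.
known = [
    ((90.0, 80.0), "pass"),
    ((85.0, 75.0), "pass"),
    ((80.0, 70.0), "pass"),
    ((50.0, 35.0), "fail"),
    ((45.0, 40.0), "fail"),
    ((55.0, 30.0), "fail"),
]

def knn_predict(examples, query, k=3):
    """Label `query` by majority vote among its k nearest known examples."""
    nearest = sorted(examples, key=lambda ex: dist(ex[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

print(knn_predict(known, (82.0, 72.0)))
print(knn_predict(known, (48.0, 38.0)))
```

The prediction for a new student comes directly from stored examples with known outcomes, which is what distinguishes example-based methods from methods that fit an explicit model first.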
4. THE COMPONENTS OF DATA-MINING ALGORITHMS
One can identify three primary components [2, 10, 18] in any DM algorithm:
1. Model representation 2. Model evaluation 3. Search.
A model representation is used to describe or extract patterns, whereas model-evaluation criteria
are statements of how well a particular pattern or model meets the goals of the Knowledge
Discovery process. Predictive models are judged by their prediction accuracy on some dataset, and
descriptive models are evaluated along the dimensions of predictive accuracy, novelty, utility,
and understandability of the model.
The search method consists of two components: (1) parameter search and (2) model search. Once the
model representation and the model-evaluation criteria are fixed, the data mining problem is
reduced to an optimization task on the observational dataset.
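The three components can be made concrete with a deliberately tiny example. In the sketch below, the model representation is a one-parameter threshold rule, the evaluation criterion is accuracy, and the search is an exhaustive scan over candidate thresholds. The data and all names are hypothetical and serve only to illustrate the decomposition.

```python
# Hypothetical labelled data: (sessional marks, passed final exam?).
data = [(35, False), (42, False), (48, False), (55, True), (61, True), (70, True)]

def predict(threshold, marks):
    """Model representation: a one-parameter rule 'pass if marks >= threshold'."""
    return marks >= threshold

def accuracy(threshold, rows):
    """Model evaluation: fraction of rows the rule classifies correctly."""
    correct = sum(predict(threshold, marks) == label for marks, label in rows)
    return correct / len(rows)

def parameter_search(rows, candidates):
    """Search: pick the first candidate threshold with the highest accuracy."""
    return max(candidates, key=lambda t: accuracy(t, rows))

best = parameter_search(data, candidates=range(30, 80, 5))
print(best, accuracy(best, data))
```

Once the rule form and the accuracy criterion are fixed, the remaining work is purely an optimization over the threshold.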
5. RESEARCH AND APPLICATION CHALLENGES
• Larger databases: There are databases with hundreds of fields and tables and millions of
records, and deriving useful information from them is itself a challenge. Agrawal et al. [44]
suggested methods for dealing with large data volumes using efficient algorithmic
approaches, because with increasing dataset size there is a greater chance of finding
patterns which are invalid. A solution to this problem is the use of prior knowledge to
identify irrelevant variables.
• There are some issues related to rapid change and deletion of data that can make
previously discovered patterns invalid [30, 45, 46]. Possible solutions are to discover
methods for updating the patterns.
• Problem of missing and noisy data: This problem is related to business databases [47]
and mostly happens when KDD methods and tools do not easily incorporate prior
knowledge about a problem.
6. DATA MINING STEPS
• According to an IBM report [6], the three main steps in DM are preparing the data, reducing
the data and, finally, looking for useful information.
• Predictive modeling [50] uses inductive reasoning techniques and algorithms like neural
networks.
• Database segmentation [51] uses statistical clustering techniques to partition data into
clusters.
• Link analysis [3] identifies useful associations between data.
• Deviation detection [3] detects and explains why certain records cannot be put into
specific segments.
• Fayyad et al. [18] proposed the following steps of data mining:
• Retrieving the data from a large database.
• Selecting the relevant subset to work with.
• Deciding on an appropriate sampling system and transformations, cleaning the data and
dealing with missing fields and records.
• Fitting models to the pre-processed data.
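The last four steps listed above (retrieval, selection, cleaning and model fitting) can be sketched as a small pipeline. This is an illustrative sketch only: the in-memory list stands in for a large database, the field names are hypothetical, and the "model" is deliberately trivial.

```python
from statistics import mean

# Hypothetical "database" rows; None marks a missing field.
database = [
    {"id": 1, "course": "CS", "attendance": 90.0, "marks": 78.0},
    {"id": 2, "course": "CS", "attendance": None, "marks": 55.0},
    {"id": 3, "course": "EE", "attendance": 70.0, "marks": 60.0},
    {"id": 4, "course": "CS", "attendance": 60.0, "marks": 40.0},
]

def select(rows, course):
    """Selection: keep only the relevant subset of records."""
    return [r for r in rows if r["course"] == course]

def clean(rows, field):
    """Cleaning: fill missing values of `field` with the mean of the known values."""
    fill = mean(r[field] for r in rows if r[field] is not None)
    return [{**r, field: fill if r[field] is None else r[field]} for r in rows]

def fit_mean_model(rows, field):
    """Model fitting (deliberately trivial): summarise `field` by its mean."""
    return mean(r[field] for r in rows)

subset = select(database, "CS")
prepared = clean(subset, "attendance")
model = fit_mean_model(prepared, "marks")
print(prepared[1]["attendance"], model)
```

Mean imputation is only one of several ways to deal with missing fields; it is used here because it keeps the sketch short.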
7. DM TECHNIQUES
There are different data mining techniques which are used to extract information from a data set
and transform it into an understandable format for further use. Table 1 shows different data
mining techniques and their roles.
7.1 Statistics
Statistics is a vital component in data selection, sampling, data mining, and knowledge
evaluation. In the data cleaning process, statistics offers techniques to detect outliers, to
simplify data when necessary, and to estimate noise; it deals with missing data using estimation
techniques [52, 53].
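One common statistical way to flag outliers during data cleaning is the z-score. The sketch below assumes a threshold of two standard deviations and uses hypothetical marks; neither the threshold nor the function name comes from the paper.

```python
from statistics import mean, pstdev

def zscore_outliers(values, threshold=2.0):
    """Return the values lying more than `threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values identical, so nothing can be an outlier
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Hypothetical marks with one suspicious entry.
marks = [52, 55, 58, 60, 61, 63, 65, 5]
print(zscore_outliers(marks))
```

Whether a flagged value is a data-entry error or a genuine observation still has to be decided with domain knowledge.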
7.2 Classification and prediction
One of the most useful data mining techniques for e-learning is classification. Classification
maps data into a predefined group of classes. Classification is a supervised learning approach
because the classes are determined before examining the data. Predicting students’ performance
with high accuracy is beneficial for identifying students with low academic performance at an
early stage.
Classification [54] is the process of finding a set of models which describe and distinguish
data classes or concepts. The derived results may be represented in various forms, such as
classification (IF-THEN) rules, decision trees, or neural networks. The models can then be used
for predicting the class label of data objects. In many applications, there is a need to predict
some missing data values rather than class labels. When the predicted values are numerical data,
this is often specifically referred to as prediction.
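Numerical prediction is commonly illustrated with simple linear regression. The following sketch fits a least-squares line to hypothetical attendance and marks data and then predicts the marks for an unseen attendance value. The data and the function name fit_line are assumptions for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: attendance % -> final marks.
attendance = [50.0, 60.0, 70.0, 80.0, 90.0]
marks = [40.0, 50.0, 55.0, 65.0, 75.0]

slope, intercept = fit_line(attendance, marks)
predicted = slope * 75.0 + intercept  # predicted marks at 75% attendance
print(round(slope, 3), round(intercept, 3), round(predicted, 2))
```

In contrast to classification, the output here is a continuous value rather than a class label.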
7.3 Clustering
Clustering groups data into classes that are not predefined, and it can identify dense and sparse
regions in object space. Unlike classification and prediction, which analyse class-labelled data
objects, clustering analyses data objects without consulting a known class label. The class
labels are not present in the training data, and clustering can be used to generate such labels.
Clusters of objects are formed so that objects within a cluster have high similarity to one
another but are very dissimilar to objects in other clusters. Each cluster formed can be viewed
as a class of objects, from which rules can be derived [8]. In education, clustering can help in
finding academic trends and analysing students’ performance in class.
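K-means is one widely used clustering algorithm. The sketch below is a minimal, deterministic version in which the initial centroids are passed in explicitly. The student data, the choice of initial centroids and the function name kmeans are assumptions made for the example.

```python
from math import dist

def kmeans(points, centroids, iterations=10):
    """Plain k-means: alternate assignment and centroid update steps."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        updated = []
        for centroid, members in zip(centroids, clusters):
            if members:
                updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                updated.append(centroid)  # keep the centroid of an empty cluster
        if updated == centroids:
            break  # converged: no centroid moved
        centroids = updated
    return centroids, clusters

# Hypothetical (attendance %, sessional marks) pairs.
students = [(90, 80), (85, 75), (88, 82), (50, 35), (45, 40), (55, 30)]
centroids, clusters = kmeans(students, centroids=[students[0], students[3]])
print(centroids)
print(clusters)
```

No class labels are supplied; the two groups emerge from similarity alone, and each resulting cluster can then be treated as a class of students.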
7.4 Association
Association rule mining finds sets of binary variables that occur together repeatedly in a
transaction database. Apriori is a widely used association rule mining algorithm [52, 55].
Association analysis is the discovery of association rules showing attribute-value conditions
that occur frequently together in a given set of data. The association rule A=>B indicates that
database tuples that satisfy the conditions in A are also likely to satisfy the conditions in B.
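The strength of a rule A=>B is usually measured by its support and confidence. The sketch below computes both for a hypothetical set of course-enrolment transactions. It is not an implementation of Apriori, which additionally prunes the search for frequent item sets.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) for the rule antecedent => consequent."""
    both = support(transactions, set(antecedent) | set(consequent))
    return both / support(transactions, antecedent)

# Hypothetical course-enrolment "transactions", one set per student.
enrolments = [
    {"databases", "data mining"},
    {"databases", "data mining", "statistics"},
    {"databases", "networks"},
    {"statistics", "data mining"},
]

print(support(enrolments, {"databases", "data mining"}))
print(confidence(enrolments, {"databases"}, {"data mining"}))
```

A rule is typically reported only when both measures exceed user-chosen minimum thresholds.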
8. TECHNIQUES FOR MINING TRANSACTIONAL/RELATIONAL
DATABASE
8.1 Artificial intelligence (AI) techniques
AI techniques consist of pattern recognition, machine learning, and neural networks. Other
techniques in AI, such as knowledge acquisition, knowledge representation, and search, are
relevant to the various processes in DM.
8.2 Decision tree approach
Decision trees are non-linear data structures which start from a root node and end with leaf
nodes. Decision trees represent sets of decisions. This approach can generate rules for the
classification of a data set. Specific decision tree methods include Classification and
Regression Trees (CART) and Chi Square Automatic Interaction Detection (CHAID) [56]. These
techniques are used for classification of a data set. They provide a set of rules that is applied
to an unclassified data set to predict results. CART typically requires less data preparation
than CHAID.
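How a tree derives a rule from data can be shown with a single split. The sketch below picks the threshold on one numeric attribute that minimises weighted Gini impurity, the splitting criterion commonly associated with CART. It is a one-level illustration on hypothetical records and not a full CART or CHAID implementation.

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(rows):
    """Find the threshold on one numeric feature that minimises weighted Gini impurity.

    `rows` is a list of (value, label) pairs; returns (threshold, impurity).
    """
    best = None
    for threshold in sorted({value for value, _ in rows}):
        left = [label for value, label in rows if value < threshold]
        right = [label for value, label in rows if value >= threshold]
        if not left or not right:
            continue  # a split must put records on both sides
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
        if best is None or score < best[1]:
            best = (threshold, score)
    return best

# Hypothetical (attendance %, result) records.
records = [(40, "fail"), (50, "fail"), (60, "fail"), (70, "pass"), (80, "pass"), (90, "pass")]
threshold, impurity = best_split(records)
print(f"split at attendance >= {threshold}, weighted impurity {impurity}")
```

The chosen split reads directly as an IF-THEN rule on attendance; a full tree repeats the same search recursively on each side of the split.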
8.3 Visualization
Visual DM techniques are helpful in exploratory data analysis and in mining large databases.
This approach requires integrating a human into the DM process. There are examples of
visualization techniques that work on large data sets and produce interactive displays [57].
There are various techniques for visualizing multidimensional data, like scatter plot matrices,
coplots, matrices, parallel coordinates, projection matrices and other geometric projection
techniques, as well as icon-based techniques, hierarchical techniques, web-based techniques,
graph-based techniques, and dynamic techniques.
TABLE 1 DATA MINING TECHNIQUES AND THEIR ROLES
Techniques | Roles
Classification | Pre-defined examples.
Clustering | Identification of similar classes of objects.
Prediction | Regression technique.
Association Rules | Find frequent item sets among large data sets.
Neural Networks | Derive meaning from complex or imprecise data and can be used to extract patterns and detect trends that are complex.
Decision Trees | Represent sets of decisions using CART (Classification and Regression Trees), CHAID (Chi Square Automatic Interaction Detection), C4.5 and ID3.
Nearest Neighbour method | Classify each record in a dataset based on a combination of the classes of the K records which are most similar to it in a historical dataset.
9. THE VARIOUS DATA MINING AREAS
9.1 Web Mining
Web mining is the application of data mining to discover patterns from the Web in the form of
data collected from online information databases, hyperlinks, and digital data. Data mining
techniques used in web mining include classification (supervised learning) and clustering
(unsupervised learning) [59, 60].
9.2 Ubiquitous data mining
Increasing computational capacity and the emergence of the latest electronic devices have led to
the ubiquitous or pervasive computing paradigm [61]. Ubiquitous computing environments give rise
to Ubiquitous Data Mining (UDM).
9.3 Data mining using multimedia
Multimedia data includes images, video, audio, and animation. Data mining techniques applied to
multimedia data include rule-based decision tree classification algorithms, Artificial Neural
Networks, instance-based learning algorithms, Support Vector Machines, association rule mining
and clustering methods [63].
9.4 Spatial data mining
Spatial data includes astronomical data and data related to space technology. Spatial data mining
includes the use of spatial warehouses, spatial data cubes, spatial OLAP, and clustering
methods [64].
9.5 Emergence of Data mining in other fields
Other data mining areas include visualisation, medical data mining, pattern mining, wireless
networks and association rule based mining.
10. PERFORMANCE IMPROVEMENT IN EDUCATION SECTOR
10.1 Data mining techniques in education
Applying data mining techniques to educational data for knowledge discovery is significant to
educational organizations as well as students. Knowledge-driven data supports educational
decision support systems. Educational data mining enhances our understanding of learning by
finding educational trends, which include improving student performance, course selection,
in-house trainings and faculty development. Using linear regression analysis [11], some factors,
such as mother’s education and the student’s family income, are correlated with students’
academic performance. Data mining techniques help in increasing students’ retention rate,
educational improvement ratio and learning outcomes. Thus, data mining techniques are used to
operate on large volumes of data to discover hidden patterns and relationships which help in
effective decision making [53].
According to Han and Kamber [2], data mining software should be developed in such a manner that
it allows users to analyze data from different dimensions, categorize it and summarize the
derived results. Data mining can be applied to traditional as well as distance education. There
are many general data mining tools that provide mining algorithms, filtering and visualization
techniques. Some examples of data mining tools are DBMiner, Clementine, Intelligent Miner,
RapidMiner and Weka [11]. DM combines machine learning, statistics and visualization techniques
to discover and extract knowledge. Questionnaires and feedback forms are often used to collect
data related to students’ approach towards educational patterns or trends, interest in
technologies and the teaching methodologies followed, and the data collected is then analyzed
using techniques like decision trees, neural networks etc.
There are different mining models like Decision Trees, Naive Bayes, Support Vector Machines,
Linear Regression, Minimum Description Length, K-means and O-Cluster. By using these models, one
can get Student Behaviour Patterns, Course Behaviour Patterns, Predict Student Retention,
Predict Course Suitability, and Personalized Intervention Strategy [65].
10.1.1 Statistics and visualization
According to Tsantis & Castellani [69], students’ log history and usage statistics are helpful in
the evaluation of an e-learning system. Information visualization techniques [66] can be used to
graphically represent student data collected by web-based educational systems, such as the
technologies in which a student has shown the most interest or the interest shown in solving
questionnaires. Visualization techniques can also be applied to conversations among online
groups, social networking websites etc. These techniques are also helpful for instructors, who
can manipulate the graphical representations generated to understand their learners and their
interests.
10.2 Web mining
Srivastava et al. [70] proposed that “Web mining is used to extract knowledge from web data”. In
web mining, useful information is extracted from the contents of web documents; web usage mining
is another technique, which discovers meaningful patterns from data generated by client-server
transactions on one or more web localities.
10.2.1 Clustering, classification and outlier detection
Clustering and classification both assign data to groups: clustering is unsupervised and
classification is supervised. Classification and prediction are also related techniques.
Classification predicts class labels, whereas prediction models continuous-valued functions. An
outlier is an observation that is unusually large or small relative to the other values in a
dataset.
According to Chen, Liu [72], the C5.0 decision tree algorithm and data cube technology are used
for managing classroom processes. Induction analysis helps in identifying potential student
groups having similar characteristics. Talavera and Gaudioso [73] propose mining student data
using clustering to discover patterns reflecting user behaviours.
10.2.2 Adaptive and intelligent web-based educational systems
Tang et al. [74] present the concept of data clustering for web-based learning to help in solving
learner-based problems. They find clusters of students with similar learning characteristics
based on the sequence and the contents of the pages they visited.
10.3 Association rule mining
Association rule mining is a popular method for finding relationships between sets of items in
large databases. Here one or more attributes of a dataset are associated with each other using
IF-THEN statements.
10.3.1 Particular web-based courses
Ha et al. [75] perform web page navigational structure analysis from web-based virtual
classrooms, e-learning portals and web pages navigated by learners.
10.3.2 Adaptive and intelligent web-based educational systems
Lu uses fuzzy association rules in a personalized e-learning material recommender system. Fuzzy
matching rules are used to discover associations between students’ requirements and a list of
learning materials [75]. Romero et al. [76] propose to use grammar-based genetic programming
with optimization techniques to provide feedback to the authors who designed courses, deriving
relationships from students’ usage information.
10.4 Text mining
In text mining, mining is done on text data, and it is related to web content mining. It is an
interdisciplinary area involving machine learning and data mining, statistics, information
retrieval and natural language processing [66, 77]. Text mining can work with unstructured or
semi-structured datasets such as full-text documents, HTML files, emails, etc.
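A first step in most text mining workflows is turning semi-structured text into term counts. The sketch below strips HTML tags with a crude regular expression, lower-cases the text and counts the remaining words. The sample post, the stopword list and the function name term_frequencies are hypothetical, and a real system would use a proper HTML parser and tokenizer.

```python
import re
from collections import Counter
from html import unescape

def term_frequencies(document, stopwords=frozenset({"the", "a", "of", "and", "is", "to", "in"})):
    """Strip markup, lower-case the text and count the remaining word tokens."""
    text = re.sub(r"<[^>]+>", " ", document)  # crude tag removal, adequate for a sketch
    text = unescape(text).lower()
    tokens = re.findall(r"[a-z]+", text)
    return Counter(t for t in tokens if t not in stopwords)

# Hypothetical forum post stored as an HTML fragment.
post = "<p>The <b>mining</b> of text is related to web mining &amp; data mining.</p>"
counts = term_frequencies(post)
print(counts.most_common(3))
```

Term counts like these are a typical input to the classification, clustering and keyword-driven selection methods discussed in this review.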
10.5 Web-based educational systems
Data mining and text mining technologies are used in web-based educational systems for shared
learning. Text mining is applied to discussion boards for expanded correspondence analysis.
Learners select the relevant category which represents their comment, and the system provides
evaluations of learners’ comments between peers.
10.5.1 Well-known learning content management systems
Dringus and Ellis [78, 79] propose to use text mining as a strategy for assessing conversations
in asynchronous discussion forums. Text mining techniques also help in evaluating the progress
of a thread or user group discussions. Data can be retrieved from PDF interactive multimedia
productions to help in the evaluation of multimedia presentations for statistical purposes and
for extracting relevant data [75, 76]. Web-based educational systems collect large amounts of
student data from web log history, which can be further analysed to derive meaningful
patterns [80].
10.5.2 Adaptive and intelligent web-based educational systems
Tang et al. [74] propose to construct a personalized web-based application by which mining can
be done on both the framework and the structure of the courseware. Keyword-driven text mining
algorithms are used to select articles for distance learning students.
11. CONCLUSION
Educational Data Mining is an emerging field related to several well-established areas of
research, including e-learning, web mining, text mining etc. Data mining techniques are used to
analyze educational data and extract useful information from large amounts of data. This paper
presents a review of KDD and basic data-mining techniques so as to integrate research in this
area.
The KDD field is concerned with the development of methods and techniques which make data
relevant. In the educational sector, software and visualization techniques can be developed using
data mining techniques which not only predict students’ performance in examinations but also
help us to cluster those students who need special attention in their studies. Knowledge
Discovery in Databases results in better decision-making related to the latest technologies
useful in classroom teaching as well as faculty enhancement programs, in-house trainings etc.
Using data mining techniques we can obtain refined data from distributed databases. Data mining
is an efficient tool for improving institutional effectiveness and student learning. Knowledge
acquired by Educational Data Mining not only helps teachers to manage their classes and improve
their teaching skills and students’ learning processes, but also provides feedback to
institutions to improve their infrastructure and quality. To make this approach successful and to
increase its scope, more data can be collected from educational institutions and queries can be
performed on it.
Using Techniques like Decision Tree we can predict the Class Result of students based on the
attributes taken. Decision tree classifiers are used on student's data to predict the student's per-formance
in class result. These techniques will help in identifying those students who are below
attendance and shown poor performance in Sessionals. The main finding of using these tech-niques
is the gathering of knowledge from student’s academic performance.
Another helpful technique is K-Means Clustering, through which we can cluster the students
based on attributes such as their class performance, sessionals and attendance in class. Centroids
are calculated from the educational data set for K clusters. This enhances the decision-making
approach to monitoring the performance of students. On increasing the number of clusters K,
the accuracy improves for a large data set and K-Means can find a better grouping of the
data. It also helps us to cluster those students who need special attention. This review of the
Knowledge Discovery perspective in Data Mining should be helpful in finding useful patterns related
to Educational Data Sets.
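
The K-Means step described above can be sketched as follows. This is a plain illustration under stated assumptions, not the procedure of any cited study: the student tuples (class performance, sessional marks, attendance, all in percent), the choice of K = 2, the fixed random seed and the rule "the cluster with the lowest centroid needs attention" are all hypothetical.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [
        min(range(len(centroids)), key=lambda i: sq_dist(p, centroids[i]))
        for p in points
    ]

def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid update until stable."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)          # start from k distinct data points
    for _ in range(max_iter):
        labels = assign(points, centroids)
        new_centroids = []
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid
        if new_centroids == centroids:          # converged: nothing moved
            break
        centroids = new_centroids
    return centroids, assign(points, centroids)

# Hypothetical records: (class performance %, sessional marks %, attendance %).
students = [(85, 80, 92), (78, 75, 88), (90, 86, 95),
            (40, 35, 55), (45, 42, 60), (38, 30, 50)]

k = 2
centroids, labels = kmeans(students, k)

# Assumption: the cluster whose centroid has the lowest total score needs attention.
attention_cluster = min(range(k), key=lambda i: sum(centroids[i]))
needs_attention = [s for s, lab in zip(students, labels) if lab == attention_cluster]
print("Centroids:", centroids)
print("Students needing special attention:", needs_attention)
```

In practice K is usually chosen by comparing the within-cluster squared error for several values of K, since that error keeps shrinking as K grows and a larger K is not automatically a more useful grouping.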
International Journal of Data Mining & Knowledge Management Process (IJDKP) Vol.4, No.5, September 2014
[24] Tarun Dhar Diwan, Kamlesh Lehre , Vertika Kashyap , “An Evolutionary Approach for Disco-vering
Changing Frequent Pattern in Data Mining”, International Journal For Advance Research in
Engineering and Technology , Vol. 1, Issue VII, ISSN 2320-6802, Aug 2013
[25] Matteo Golfarelli, Stefano Rizzi, “A Survey on Temporal Data Warehousing”, International Journal
58
of Data Warehousing & Mining, 5(1), 1-17, January-March 2009
[26] Eya Ben Ahmed, Ahlem Nabli and Faïez Gargouri, “A Survey of User-Centric Data Warehouses:
From Personalization to Recommendation”.
[27] G.Satyanarayana Reddy, Rallabandi Srinivasu, M. Poorna Chander Rao, Srikanth Reddy Rikku-la,
“Data Warehousing, Data Mining, OLAP and OLTP Technologies are essential elements to support
decision-making process in industries”, (IJCSE) International Journal on Computer Science and
Engineering,Vol. 02, No. 09, 2010, 2865-2873
[28] Surajit Chaudhuri, Umeshwar Dayal, “An Overview of Data Warehousing and OLAP Technolo-gy”,
Appears in ACM Sigmod Record, March 1997
[29] Muhammad Saqib, Muhammad Arshad, Mumtaz Ali, Nafees Ur Rehman, Zahid Ullah, “Improve
Data Warehouse Performance by Preprocessing and Avoidance of Complex Resource Intensive
Calculations”, IJCSI International Journal of Computer Science Issues, Vol. 9, Issue 1, No 2, January
2012
[30] Usama Fayyad, Gregory Piatetsky - Shapiro, Padhraic Smyth, “The KDD Process for Extracting
Useful Knowledge from Volumes of Data”, Communications of the ACM, November 1996/Vol. 39,
No. 1111
[31] Samir Farooqi, “Data Mining: An Overview”, I.A.S.R.I.,Library Avenue,Pusa,New Delhi-110012
[32] Khalid Raza, “Application of Data Mining in Bioinformatics”, Indian Journal of Computer Science
and Engineering, Vol 1 No 2, 114-118
[33] Weiss S.M. and Kulikowski C.A, Computer Systems that learn. Morgan Kaufman Publishers, 1991.
[34] Apte, C., and Hong, S. J. 1996. “Predicting Equity Returns from Securities Data with Minimal Rule
Generation”. In Advances in Knowledge Discovery and Data Mining, eds. U. Fayyad, G. Piatetsky-
Shapiro, P.Smyth, and R. Uthurusamy, 514–560. Menlo Park, Calif.: AAAI Press.
[35] Jain, Anil K., M. Narasimha Murty, and Patrick J. Flynn. "Data clustering: a review." ACM com-puting
surveys (CSUR) 31, no. 3 (1999): 264-323.
[36] Sami Ayramo,Tommi Karkkainen, “Introduction to partitioning-based clustering methods with a
robust example”, ISBN 951392467X,ISSN 14564378
[37] Karl-Heinrich Anders and Monika Sester, “Parameter-Free Cluster Detection in Spatial Databas-es
and its application to typification”, International Archives of Photogrammetry and Remote Sensing.
Vol. XXXIII, Part B4. Amsterdam 2000
[38] Cheeseman, P., and Stutz, J. 1996. “Bayesian Classification (AUTOCLASS): Theory and Re-sults”.
In Advances in Knowledge Discovery and Data Mining, eds. U. Fayyad, G. Piatetsky-Shapiro, P.
Smyth, and R. Uthurusamy, 73–95. Menlo Park, Calif.: AAAI Press.
[39] http://support.sas.com/publishing/pubcat/chaps/57587.pdf
[40] http://iasri.res.in/ebook/win_school_aa/notes/Decision_tree.pdf
[41] J.Elder, NonLinear Classification and Regression, CSE 4404/5327 Introduction to Machine Learning
and Pattern Recognition
[42] Irene Kouskoumvekaki,Non-linear Classification and Regression Methods, September 29, 2011
[43] Peter Filzmoser, “Linear and Nonlinear Methods for Regression and Classification and applica-tions
in R”, Forschungsbericht CS-2008-3, Juli 2008
[44] Agrawal, R.; Mannila, H.; Srikant, R.; Toivonen, H.; and Verkamo, I. 1996. “Fast Discovery of
Association Rules”. In Advances in Knowledge Discovery and Data Mining, eds. U. Fayyad, G.
Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, 307–328. Menlo Park, Calif.: AAAI Press.
[45] http://digital.cs.usu.edu/~xqi/DataMining.html
[46] N.K. Sharma, Dr. R.C. Jain, Manoj Yadav, “A Survey on Data Mining Algorithms and Future
Perspective”, (IJCSIT) International Journal of Computer Science and Information Technologies,
Vol. 3 (5) , ISSN:0975-9646, 2012,5149 – 5156
[47] Padhraic Smyth, David Heckerman, and Michael Jordan, “Probabilistic Independence Networks for
Hidden Markov Probability Models”, Massachusetts Institute of Technology, 1996