Data mining is the process of discovering patterns in large data sets using methods at the intersection of machine learning, statistics, and database systems. It is an interdisciplinary subfield of computer science and statistics whose overall goal is to extract information from a data set and transform it into a comprehensible structure for further use. The process of digging through data to discover hidden connections and predict future trends has a long history. Sometimes referred to as 'knowledge discovery in databases', the term data mining wasn't coined until the 1990s. What was old is new again, as data mining technology keeps evolving to keep pace with the limitless potential of big data and affordable computing power. Over the last decade, advances in processing power and speed have made it possible to move beyond manual, tedious and time-consuming practices to quick, easy and automated data analysis. The more complex the data sets collected, the more potential there is to uncover relevant insights. Rupashi Koul, "Overview of Data Mining", published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-4 | Issue-4, June 2020, URL: https://www.ijtsrd.com/papers/ijtsrd31368.pdf Paper URL: https://www.ijtsrd.com/engineering/computer-engineering/31368/overview-of-data-mining/rupashi-koul
In this article we concentrate on a range of strategies, methodologies, and distinct fields of research, all of which are useful and relevant to data mining technologies. Many multinational corporations and other large organizations operate in various parts of the world, and each business location may create significant amounts of data. Corporate decision-makers need access to all of these data sources in order to make strategic decisions.
The Survey of Data Mining Applications And Feature Scope IJCSEIT Journal
In this paper we focus on a variety of techniques, approaches and research areas that are
helpful and important in the field of data mining technologies. Many multinational corporations
and large organizations operate in different places in different countries, and each place of operation
may generate large volumes of data. Corporate decision makers require access to all such sources to
take strategic decisions. The data warehouse delivers significant business value by improving the
effectiveness of managerial decision-making. In an uncertain and highly competitive business
environment, the value of strategic information systems such as these is easily recognized; however, in
today's business environment, efficiency or speed is not the only key to competitiveness. Huge
amounts of data, on the order of terabytes to petabytes, are now available, which has drastically changed
areas of science and engineering. To analyze, manage and make decisions from such huge amounts of data,
we need data mining techniques, which are transforming many fields. This paper presents a
number of applications of data mining and also focuses on the scope of data mining, which will be helpful in
further research.
1. Web Mining – Web mining is an application of data mining for di.docxbraycarissa250
1. Web Mining – Web mining is an application of data mining for discovering data patterns from the web. Web mining falls into three categories: content mining, structure mining and usage mining. Content mining detects patterns in data collected by the search engine. Structure mining examines data related to the structure of the website, while usage mining examines data from the user's browser. The data collected through web mining is evaluated and analyzed using techniques like clustering, classification, and association. It is a very good topic for a thesis in data mining.
2. Predictive Analytics – Predictive analytics is a set of statistical techniques that analyze current and historical data to predict future events. The techniques include predictive modeling, machine learning, and data mining. In large organizations, predictive analytics helps businesses identify risks and opportunities. Both structured and unstructured data are analyzed to detect patterns. Predictive analytics is a lengthy process consisting of seven stages: project definition, data collection, data analysis, statistics, modeling, deployment, and monitoring. It is an excellent choice for research and a thesis.
3. Oracle Data Mining – Oracle Data Mining, also referred to as ODM, is a component of the Oracle Advanced Analytics option for Oracle Database. It provides powerful data mining algorithms that help data analysts gain valuable insights from data and make predictions. It helps in predicting customer behavior, which ultimately helps in targeting the best customers and in cross-selling. SQL functions are used in the algorithms to mine data tables and views. It is also a good choice for a thesis and research in data mining and databases.
4. Clustering – Clustering is a process in which data objects are divided into meaningful sub-classes known as clusters. Objects with similar characteristics are grouped together in a cluster. There are distinct models of clustering, such as centroid-based and distribution-based. In centroid-based clustering, each cluster is represented by a central vector. There are various applications of clustering in data mining, such as market research, image processing, and data analysis. It is also used in credit card fraud detection.
5. Text mining – Text mining, or text data mining, is a process for extracting high-quality information from text. It is done through patterns and trends derived using statistical pattern learning. First, the input data is structured. After structuring, patterns are derived from the structured data, and finally the output is evaluated and interpreted. The main applications of text mining include competitive intelligence, e-discovery, national security, and social media monitoring. It is a trending topic for a thesis in data mining.
6. Fraud Detection – The number of frauds in daily life is increasing in sectors like banking, finance, and government. Accurate detection of fraud is a challenge. Da.
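The centroid-based clustering mentioned in item 4 can be illustrated with a minimal k-means-style sketch. This is only one possible instance of the idea, not a method taken from the listed document: the function name `kmeans`, the explicit `init` centroids, the fixed iteration count and the sample points are all assumptions made for illustration.

```python
import math


def kmeans(points, init, iterations=10):
    """Toy centroid-based clustering (hypothetical helper, not from the source).

    points: list of equal-length numeric tuples
    init:   starting centroids, one per cluster (an assumption of this sketch)
    Returns (centroids, labels), where labels[i] is the cluster index of points[i].
    """
    centroids = [tuple(c) for c in init]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return centroids, labels


# Made-up sample data: two visibly separate groups of 2-D points.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
centroids, labels = kmeans(points, init=[(0.0, 0.0), (10.0, 10.0)])
print(labels)
```

Passing the starting centroids explicitly keeps the sketch deterministic; practical implementations usually choose them randomly or with a seeding heuristic and rerun several times.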
In the information age, data has become vital. Hence it is important to understand data in order to face future information challenges. This paper deals with the importance of data mining while explaining the concepts and life cycle involved. It presents the gist of the topic in a user-friendly way. Further, it develops the different stages of data mining, followed by its extended application in practical business platforms.
IJRET: International Journal of Research in Engineering and Technology is an international peer-reviewed online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academicians, Field Engineers, Scholars and Students of related fields of Engineering and Technology.
Data Science Demystified_ Journeying Through Insights and InnovationsVaishali Pal
In the digital age, data has emerged as one of the most valuable resources, driving decision-making processes across industries. Data science, the interdisciplinary field that extracts insights and knowledge from structured and unstructured data, plays a pivotal role in leveraging this resource. This section provides an overview of data science, its importance, and its applications in various domains.
Introduction to Data Science: Unveiling Insights Hidden in Datahemayadav41
Embark on a journey into the fascinating field of Data Science and uncover the valuable insights concealed within vast datasets. In this article, we explore the fundamental concepts of Data Science and its applications. Discover how a Data science Training Institute in Jaipur, Lucknow, Indore, Mumbai, Delhi, Noida, Gurgaon and other cities in India can equip you with the knowledge and skills to analyze, interpret, and extract meaningful information from data. Explore topics such as data preprocessing, statistical analysis, machine learning, and data visualization. Join us on this enlightening exploration of the world of Data Science.
Data Mining And Visualization of Large DatabasesCSCJournals
Data mining and visualization are tools used in databases to further analyse and understand the stored data. They are knowledge discovery tools used to find hidden patterns and to visualize the data distribution. In this paper, we illustrate how data mining and visualization are used in large databases to find the patterns and traits hidden within. In large databases where data is both large and seemingly random, mining and visualization help to find the trends in such large sets. We look at the developments in data mining and visualization and at the application fields that use such tools. Finally, we touch upon future developments and newer trends in data mining and visualization that are being experimented with for future use.
Real World Application of Big Data In Data Mining Toolsijsrd.com
The main aim of this paper is to study the notion of Big Data and its application in data mining tools such as R, Weka, RapidMiner, KNIME and Mahout. We are awash in a flood of data today. In a broad range of application areas, data is being collected at unmatched scale. Decisions that previously were based on surmise, or on painstakingly constructed models of reality, can now be made based on the data itself. Such Big Data analysis now drives nearly every aspect of our modern society, including mobile services, retail, manufacturing, financial services, life sciences, and physical sciences. The paper mainly focuses on different types of data mining tools and their usage with big data in knowledge discovery.
DATA SCIENCE METHODOLOGY FOR CYBERSECURITY PROJECTS cscpconf
Cybersecurity solutions are traditionally static and signature-based. The traditional solutions,
along with the use of analytic models, machine learning and big data, could be improved by
automatically triggering mitigation or providing relevant awareness to control or limit the consequences
of threats. This kind of intelligent solution is covered in the context of Data Science for
Cybersecurity. Data Science plays a significant role in cybersecurity by utilising the power
of data (and big data), high-performance computing and data mining (and machine learning) to
protect users against cybercrimes. For this purpose, a successful data science project requires
an effective methodology to cover all issues and provide adequate resources. In this paper, we
introduce popular data science methodologies and compare them with respect to
cybersecurity challenges. A comparative discussion is also provided to explain the methodologies'
strengths and weaknesses for cybersecurity projects.
A Comparative Study of Various Data Mining Techniques: Statistics, Decision T...Editor IJCATR
In this paper we focus on some techniques for solving data mining tasks, such as Statistics, Decision Trees and Neural
Networks. The new approach has succeeded in defining some new criteria for the evaluation process, and it has obtained valuable results
based on what each technique is, the environment in which each technique is used, the advantages and disadvantages of each technique, the
consequences of choosing any of these techniques to extract hidden predictive information from large databases, and the methods of
implementation of each technique. Finally, the paper presents some valuable recommendations in this field.
INCREASING THE INVESTMENT’S OPPORTUNITIES IN KINGDOM OF SAUDI ARABIA BY STUDY...ijcsit
Social networking sites are a significant source of information about the behavior of users and about
what occupies society at all ages; accordingly, helpful information can be provided to specialists
and decision-makers. According to official sources, 98.43% of Saudi youth use social networking sites.
Social media data are studied and analyzed to provide the information needed to increase
investment opportunities within the Kingdom of Saudi Arabia, by examining what occupies people
on social media sites through their tweets about the labor market and investment. Given the
huge volume of data and its randomness, the data will be surveyed and collected through
keywords, prioritized, and recorded as (positive - negative - mixed). The study's
analysis and conclusions will be based on data mining and its techniques of analysis and deduction.
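The keyword-based recording of tweets as positive, negative or mixed could look roughly like the sketch below. The abstract does not describe the actual procedure, so everything here is an assumption: the function `label_tweet`, the two keyword sets, the whitespace tokenization and the extra "unlabeled" outcome for tweets that match no keyword are all hypothetical.

```python
def label_tweet(text, positive_words, negative_words):
    """Label a tweet by keyword matching (hypothetical sketch, not the study's method)."""
    # Naive tokenization: lowercase and split on whitespace.
    words = set(text.lower().split())
    has_positive = bool(words & positive_words)
    has_negative = bool(words & negative_words)
    if has_positive and has_negative:
        return "mixed"
    if has_positive:
        return "positive"
    if has_negative:
        return "negative"
    # Assumption: tweets with no keyword hit fall outside the three recorded labels.
    return "unlabeled"


# Made-up keyword lists and tweets, for illustration only.
POSITIVE = {"growth", "opportunity", "hiring"}
NEGATIVE = {"layoffs", "unemployment", "losses"}

print(label_tweet("New hiring opportunity in the labor market", POSITIVE, NEGATIVE))
print(label_tweet("Growth continues despite layoffs", POSITIVE, NEGATIVE))
```

A real study would need language-aware tokenization and a vetted lexicon; the sketch only shows how the three labels relate to keyword matches.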
INTERNATIONAL JOURNAL OF RESEARCH IN COMPUTER APPLICATIONS AND ROBOTICS
www.ijrcar.com
ISSN 2320-7345
Vol.2 Issue.7, Pg.: 95-101
July 2014
A LITERATURE REVIEW ON DATAMINING
Sankar K V (1), Dr. S. Uma (2), Subin P S (3), Ambat Vipin (4)
(1) PG scholar, Department of Computer Science and Engineering, HIT Coimbatore
(2) Head of the Department, PG Department of Computer Science and Engineering, HIT Coimbatore
(3) PG scholar, Department of Computer Science and Engineering, HIT Coimbatore
(4) PG scholar, Department of Computer Science and Engineering, HIT Coimbatore
PG DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
HINDUSTHAN INSTITUTE OF TECHNOLOGY
COIMBATORE, TAMILNADU, INDIA
Abstract - Data mining is a powerful technology that enables users, both individuals and
organizations, to dig through a large collection or cluster of data and find the information they need.
It is used to find patterns in large groups of data. For example, an organization can learn about
customer behaviour from the data it has collected. The reach of data mining extends to artificial
intelligence, business, education, scientific fields, machine learning, etc. This literature review
discusses the basic concepts, applications, and challenges of data mining.
Key Words: Data mining, Artificial intelligence, Information.
I. INTRODUCTION
Data mining [1] is the method of finding patterns, trends and relations by working through a large
amount of data stored in data repositories. This is done using statistical and mathematical
techniques. In data mining, data is analysed from different views in order to find the patterns that
satisfy our needs. In short, data mining is an analytical technique. Data is stored in various forms
such as images, text, sound and video. Using data mining we can categorize data, relate similar data,
and find patterns in how data occurs.
Data mining is also referred to as Knowledge Discovery in Databases, or KDD [2]. It likewise
involves extracting information and patterns from a large collection of data in a database.
The main function of data mining is to apply various methods to identify pieces of information and
to use this information in decision making applications. Over the last few years data mining, with
its applications in pattern finding and decision making, has become an important breakthrough in
the world of technology, and it is now one of the most sought-after methods of data analysis.
Data mining is being applied in the fields of medicine, education, machine learning [3], pattern
recognition, artificial intelligence etc.
II. DATA MINING
According to the Gartner Group, data mining is the process of discovering meaningful new
correlations, patterns and trends by sifting through large amounts of data stored in repositories, using
pattern recognition technologies as well as statistical and mathematical techniques [4].
"Data mining is the analysis of (often large) observational data sets to find unsuspected relationships
and to summarize the data in novel ways that are both understandable and useful to the data owner"
[5]. Data mining is an interdisciplinary field bringing together techniques from machine learning,
pattern recognition, statistics, databases, and visualization to address the issue of information
extraction from large databases [6]. The data used to extract patterns in data mining is called
training data; that is, training data is the data used to train a system to make decisions. Data mining
deals with data in two ways, namely Classification [10] and Prediction [10]:
Classification: Builds a model from the training set and the values of a classifying
attribute, and uses it to classify new data.
Prediction: Predicts unknown or missing values.
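The idea of classifying new data from a training set can be illustrated with a minimal sketch. The example below uses a one-nearest-neighbour rule, which is only one of many possible classifiers and is not a method prescribed by the cited sources. The function name, the feature meanings and the sample values are all hypothetical.

```python
import math

def nearest_neighbor_classify(training_set, new_point):
    """Return the label of the training example closest to new_point."""
    best_label = None
    best_distance = math.inf
    for features, label in training_set:
        # Euclidean distance between the stored example and the new item
        distance = math.dist(features, new_point)
        if distance < best_distance:
            best_distance = distance
            best_label = label
    return best_label

# Hypothetical training data:
# (monthly visits, average basket value) -> customer class
training_set = [
    ((1.0, 20.0), "occasional"),
    ((2.0, 25.0), "occasional"),
    ((8.0, 90.0), "regular"),
    ((9.0, 110.0), "regular"),
]

# Classify a previously unseen customer
print(nearest_neighbor_classify(training_set, (7.0, 95.0)))
```

Here the class label plays the role of the classifying attribute: the training set supplies labelled examples, and a new item receives the label of its closest example.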
The data mining process can be considered as a knowledge transformation from a fine-grained space
to a coarse-grained space [11].
A recent contribution is an ontology of core data mining entities named OntoDM-core [12].
It defines the most essential data mining entities in a three-layered ontological structure comprising
a specification, an implementation, and an application layer. It provides a representational
framework for the mining of structured data.
Two data mining methods that have been applied to predicting the electrical resistivity of soil are
the Support Vector Machine (SVM) [22] and the Least Squares Support Vector Machine (LSSVM) [22].
III. DATA MINING : A BRIEF HISTORY
The term data mining was first introduced in the 1990s, and the technology has its roots in a
family of technologies, namely machine learning, classical statistics and artificial intelligence.
Machine learning is a combination of statistics and artificial intelligence (AI). AI is
considered to have evolved into machine learning because AI heuristics and statistical
analysis are blended in machine learning. Machine learning is a technology that allows
computers to study the data they work on and make decisions based on what
they have learned from that data. This is done by using statistics and adding AI heuristics to
it.
Statistics forms the base for most technologies that data mining is built upon. Examples
include variance, regression analysis, cluster analysis, discriminant analysis,
confidence intervals, standard deviation and distributions. All these help to analyze data
patterns and their relations.
Artificial intelligence uses heuristics to give a system behavior similar to that of humans,
namely the ability to think and make decisions on its own. This ability largely benefits data
mining, as it requires machines to make decisions without prompting.
Data mining is therefore a combination of classical and modern statistics, artificial intelligence
and machine learning. These methods are combined to find patterns, including hidden ones, within a
large collection of data.
IV. USES OF DATA MINING
The primary use of data mining is information extraction from a large amount of data. Other uses
include:
To obtain maximum information from a large amount of data that would ordinarily yield only
scant information
To interpret extracted data
To find hidden patterns
To use patterns for decision making
Image mining [16] is an offshoot of data mining in which a large database of images is
mined to find specific patterns. Image mining deals with the extraction of
implicit knowledge, image data relationships, or other patterns not explicitly stored in the
image database. Image mining is more than just an extension of data mining to the image domain.
It is an interdisciplinary endeavor that draws upon expertise in computer vision, image
processing, image retrieval, data mining, machine learning, databases, and artificial
intelligence.
V. FUNCTIONS OF DATA MINING
The main functions of data mining are given below[8]
Classification: Classification is finding models that analyze a data item and assign it to
one of several predefined classes.
Sequencing: Sequencing is similar to association rule learning, except that the relationship
holds over a period of time, such as repeat visits to a supermarket.
Regression: Regression is mapping a data item to a real-valued prediction variable.
Clustering: Clustering is identifying a finite set of categories or clusters to describe the
data.
Dependency Modeling: Dependency Modeling (Association Rule Learning) is finding a
model which describes significant dependencies between variables.
Deviation Detection: Deviation Detection (Anomaly Detection) is discovering the most
significant changes in the data.
Summarization: Summarization is finding a compact description for a subset of data.
Data cleaning [20]: removes noise from data.
Data integration [20]: combines various data sources.
Data selection [20] and transformation: selects the relevant data and transforms it into the form appropriate for mining.
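As an illustration of dependency modeling (association rule learning) from the list above, the following minimal sketch computes the two standard measures of an association rule, support and confidence, over a set of transactions. The function names and the sample transactions are hypothetical and are not taken from the cited sources.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    matches = sum(1 for t in transactions if itemset <= set(t))
    return matches / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimated probability of the consequent given the antecedent."""
    both = support(transactions, set(antecedent) | set(consequent))
    base = support(transactions, antecedent)
    return both / base if base > 0 else 0.0

# Hypothetical supermarket baskets
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

# Rule under test: bread -> milk
print(support(transactions, {"bread", "milk"}))
print(confidence(transactions, {"bread"}, {"milk"}))
```

A rule such as "bread -> milk" is usually kept only when both measures exceed thresholds chosen by the analyst. Those thresholds are a design choice and are not specified in this review.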
VI. APPLICATIONS OF DATAMINING
Data mining is used in many fields of science and engineering. Some of its applications
are listed below.
Medical/Pharmacy:
Computer assisted diagnosis,
Response to drug dosage,
Successful prescription by study of patterns.
Insurance and health care:
Finding medical procedure for claims through claims analysis
Identifying potential buyers of drugs
Identifying risky customers through behavior pattern analysis.
Banking:
Fraud detection
Risk management
Hidden correlation discovery between financial indicators
The applications of data mining cited in some research papers are given below.
It is used in the field of forecasting [13]: by mining time series data, business gain can be
achieved as the data is converted into information and then into knowledge. This provides
ample opportunity to leverage numerous sources of time series data. Insurance fraud is also
detected using data mining [19].
Data mining is also used for botnet detection [23]. A bot is autonomous software capable of
performing its own functions. Botnets are used in cybercrime because they are very powerful and
can carry out denial of service, phishing, spamming and eavesdropping. Data mining and pattern
detection on network traffic help to prevent botnets from breaching security.
Another field where data mining is used is credit scoring in banks and financial institutions
[14].
Data mining is used in electronic commerce. Electronic commerce involves the use of
information and communication technologies over the internet. Data mining and web
data mining techniques are used in electronic commerce to understand customer patterns
[15].
Medical data mining [17] is another important field, where the large amount of data that is
generated in various medical and health centers is mined to make diagnosis, prognosis,
and treatment scheduling easier and quicker.
It is used to extract application oriented models and patterns from a continuous, rapid flow of
data streams in sensor networks [18].
Pattern mining is used for activity recognition by searching for patterns of time segments in
which the same activity is performed [21].
Data mining is also applied in social media to extract actionable patterns that can
be beneficial for businesses, users, and consumers [24].
Data mining is used in psychology [25] for pre-diagnosis: based on the data obtained from
the diagnosis and treatment of various patients, a pattern can be found for each mental disorder
that is common to patients suffering from that particular disorder. Based on this pattern we
can predict whether a person is susceptible to a disorder of the same type.
VII. DATA MINING CHALLENGES
Like all other technologies, data mining has its share of issues. The main ones are discussed
below.
Security and social issues
Data mining works through a large amount of data to find the relevant information
required, so security [8] is an issue. The profiling, pattern
building and matching of various people and other sources can lead to a breach in the privacy
of personal data, and the data is also susceptible to illegal access. Creating training data
from a user's behavior may likewise breach that individual's privacy, and it can be
harmful if those details are linked to other highly confidential data. Data mining thus poses the
threat that private and confidential information is used without control.
User interface issues
Data obtained through data mining needs to be presented in such a manner that users
are able to analyze and understand it. The ability to see data clearly
depends on the tool used to present the data. A good visualization allows proper
and accurate interpretation of data.
Even though good visualization techniques are available, further research is
necessary to overcome problems such as limited screen real estate, so that the user can focus on
the data, refine the mining tasks, and view the data from all available perspectives.
Methodology issues
These issues relate to the various approaches to data mining and their disadvantages.
Examples include diversity of data, versatility of methods, domain dimensions, knowledge
assessment, use of domain knowledge, and control of errors in data. The best way to resolve
them is to choose the data mining approach according to the data being dealt with.
Performance issues
Data mining handles a large amount of data through the
application of statistics and artificial intelligence heuristics. The size of the data is ever
increasing, and the demand for scalability and efficiency is growing with it. For this reason,
algorithms whose running time grows linearly with the data size are generally preferred over
those with exponential running time. Parallel programming is another issue in data mining:
introducing parallelism can improve the efficiency of data mining methods.
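The partition-and-merge structure behind many parallel data mining methods can be sketched as follows. This is a minimal illustration under stated assumptions, not a method from the cited sources: the data is split into partitions, item frequencies are counted per partition by separate workers, and the partial counts are merged. The function names and the sample data are hypothetical. Threads are used here only to keep the sketch short. In CPython they would not speed up CPU-bound counting, and a process pool would be the usual substitute.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def count_items(partition):
    """Count how often each item occurs in one partition of the data."""
    counts = Counter()
    for transaction in partition:
        counts.update(transaction)
    return counts

def parallel_item_counts(transactions, n_partitions=2):
    """Split the data, count each part in a worker, then merge the results."""
    # Round-robin split so that every partition has a similar size
    partitions = [transactions[i::n_partitions] for i in range(n_partitions)]
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        partial_counts = list(pool.map(count_items, partitions))
    total = Counter()
    for partial in partial_counts:
        total.update(partial)  # adds the partial counts to the running total
    return total

# Hypothetical transaction data
data = [["a", "b"], ["a"], ["b", "c"], ["a", "c"]]
print(parallel_item_counts(data))
```

Each partition is scanned once, so the work per worker stays linear in the size of its partition, and only the small per-partition counters need to be combined at the end.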
Data source issues
Issues related to data sources often pertain to the diversity of data types. The current practice
in data mining is to collect as much data as possible, regardless of whether the amount is right
and whether the data collected is appropriate. Data of different types is stored in different
repositories. Different kinds of data and sources may require distinct algorithms and
methodologies. Currently, there is a focus on relational databases and data warehouses, but
other approaches need to be pioneered for other specific complex data types. A versatile data
mining tool, for all sorts of data, may not be realistic. Moreover, the proliferation of
heterogeneous data sources, at structural and semantic levels, poses important challenges not
only to the database community but also to the data mining community.
VIII. CONCLUSION
Data mining focuses on the extraction and analysis of information from a large amount of data in a
database. In this review the basic concepts, uses, applications and challenges of data mining are
discussed. This review should help researchers to focus on the challenges of data mining and
their solutions. New constraints and algorithms for enhanced security and more precise data
retrieval could be introduced, which would help data mining to establish itself as a separate
discipline where more studies could be conducted and more improvements made. The future of data
mining can extend to genetics through genetic algorithms, to artificial neural networks that
resemble biological neural networks, and to automated prediction of trends and behaviors. The
extraction of useful if-then rules from data, known as rule induction, is another future trend in
data mining. The future cannot be foretold, but there are plenty of challenges awaiting solutions,
and data mining is poised to be a breakthrough in the future of technology and mankind.
REFERENCES
[1] Gorunescu, F., Data Mining: Concepts, Models, and Techniques, Springer, 2011.
[2] Han, J., and Kamber, M., Data Mining: Concepts and Techniques, Morgan Kaufmann Series in Data
Management Systems, San Diego: Academic Press, 2001.
[3] Neelamadhab Padhy, Dr. Pragnyaban Mishra and Rasmita Panigrahi, "The Survey of Data Mining
Applications and Feature Scope", International Journal of Computer Science, Engineering and
Information Technology (IJCSEIT), vol.2, no.3, June
[4] Mannila, Heikki, Data mining: machine learning, statistics and databases, IEEE, 1996.
[5] Fayyad, U., Piatetsky-Shapiro, G., and Smyth, P., "From Data Mining to Knowledge Discovery in Databases",
The MIT Press, ISBN 0-26256097-6, 1996.
[6] Piatetsky-Shapiro, Gregory, "The Data-Mining Industry Coming of Age", IEEE Intelligent Systems, 2000.
[7] Jing He, Advances in Data Mining: History and Future, Third International Symposium on Information
Technology Application, 978-0-7695-3859-4, IEEE, 2009.
[8] Berry, M. and Linoff, G., Mastering Data Mining: The Art and Science of Customer Relationship Management,
Wiley, 2000.
[9] http://maaw.info/DataMining.htm
[10] Jiawei Han, Data Mining: Concepts and Techniques, 2006.
[11] Wang, Guoyin, Granular Computing, 2008.
[12] Panče Panov, Larisa Soldatova, Sašo Džeroski, Ontology of core data mining entities, July 2014.
[13] Timothy D Rey, Justin Kauhl, Using data mining for forecasting problems, 2013.
[14] S M Sadatrasoul, M R Gholamian, M Siami, Z Hajimohammadai, Credit scoring in banks and financial
institutions via data mining techniques, 2013.
[15] Cesar Astudillo, Matthew Bardeen, Narciso Cerpa, Data mining in electronic commerce, January 2014.
[16] Madhumathi K, Dr Antony Selvadoss Thanamani, Image mining frameworks and techniques, 2014.
[17] Arun George Eapen, Application of data mining in medical applications, University of Waterloo, 2004.
[18] Azhar Mahmood, Ke Shi, Shadeen Khatoon and Mi Xiao, Data mining techniques for wireless sensor
networks, 2013.
[19] IBM, Using data mining to detect insurance fraud, 2011.
[20] H Lookman Sithic, T Balasubramanian, Survey of insurance fraud detection using data mining techniques,
2013.
[21] Umut Avci, Andrea Passerini, Improving activity recognition by segmental pattern mining, 2014.
[22] Pijush Samui, "Applicability of Data Mining Techniques for Predicting Electrical Resistivity of Soils Based
on Thermal Resistivity", Int. J. Geomech, 2013.
[23] Bhavani Thuraisingham, Latifur Khan, Mohammad M. Masud, Kevin W. Hamlen, Data mining for security
applications, 2008.
[24] Pritam Gundecha, Huan Liu, Mining social media, 2012.
[25] Hengqing Tong, Application of data mining in psychological evaluation, Computer Science and
Computational Technology, 2008. ISCSCT '08. International Symposium, 2008.
Brief Author biography
Sankar K V received the B.Tech. degree in Information Technology from Hindusthan College of
Engineering and Technology, Coimbatore, affiliated to Anna University, in 2008 and is currently
pursuing the M.E. degree at Hindusthan Institute of Technology, Coimbatore, affiliated to Anna University.
Dr. S. Uma is Professor and Head of the PG Department of Computer Science and Engineering at
Hindusthan Institute of Technology, Coimbatore, Tamilnadu, India. She received her B.E. degree in
Computer Science and Engineering in First Class with Distinction from PSG College of Technology in
1991 and the M.S. degree from Anna University, Chennai, Tamilnadu, India. She received her Ph.D.
in Computer Science and Engineering from Anna University, Chennai, Tamilnadu, India, with High
Commendation. She has nearly 24 years of academic experience. She has organized many national
level events such as seminars, workshops and conferences. She has published many research papers in
national and international conferences and journals. She is a potential reviewer of international
journals and a life member of the ISTE professional body. Her research interests are pattern
recognition and analysis of nonlinear time series data.