DATA MINING
A DETAILED STUDY AND
ITS LITERATURE SURVEY
Submitted by: Ankur Utsav
Department of Electronics and
Communication Engineering,
Birla Institute of Technology, Patna
INTRODUCTION
• Data mining is the method of analyzing data to
uncover hidden patterns.
• Data mining mainly includes extracting the data,
transforming the data, and loading the data
into a data warehouse.
• Data mining uses complex mathematical
algorithms to organize the data.
• Data mining is applicable in different fields such as
banking and financial services, health care, and
telecommunications.
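The extract, transform, and load steps mentioned above can be sketched as follows. This is only a minimal illustration under stated assumptions: the sample records, the column names, the `sales` table, and the three helper functions are hypothetical, and an in-memory SQLite database stands in for a real data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical raw export; the columns and values are made up for illustration.
RAW_CSV = """customer,amount,sale_date
 alice ,120.50,2024-01-03
BOB,80,2024-01-04
alice,,2024-01-05
"""

def extract(text):
    """Extract: parse the raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: normalize names, convert amounts, drop rows with no amount."""
    cleaned = []
    for row in rows:
        if not row["amount"].strip():
            continue  # skip incomplete records
        cleaned.append(
            (row["customer"].strip().lower(), float(row["amount"]), row["sale_date"])
        )
    return cleaned

def load(records, conn):
    """Load: write the cleaned records into a (hypothetical) warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount REAL, sale_date TEXT)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", records)
    conn.commit()

conn = sqlite3.connect(":memory:")  # in-memory database stands in for the warehouse
load(transform(extract(RAW_CSV)), conn)

# Once loaded, the data can be queried for simple aggregated knowledge.
total_by_customer = dict(
    conn.execute("SELECT customer, SUM(amount) FROM sales GROUP BY customer")
)
print(total_by_customer)
```

Keeping the three steps as separate functions makes it possible to check each stage on its own before any mining algorithm is run on the loaded data.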
HISTORY
• Data mining is the result of a field's evolution, and the term
"data mining" was coined around 1990.
• Its roots can be traced along three family lines:
• i.) classical statistics, ii.) AI, and iii.) machine learning.
• Statistics: It is the foundation of most of the
technologies on which data mining is built, e.g.
regression analysis, standard distribution, standard
deviation, variance, discriminant analysis, cluster
analysis, and confidence intervals.
• AI: Artificial intelligence is the discipline that tries to
emulate how the brain works through programming
methods, e.g. making a program that plays chess.
• Machine learning: It is the combination of AI
and statistics. It is a branch of AI comprising
the set of algorithms that are applied to the
statistical models discussed above; predictive
diagnostics are done by using clustering and
classification.
• Data mining is basically the adaptation of
machine learning technology to business
applications.
IMPORTANCE OF DATA MINING
• Data can create profits. It is a crucial financial asset of an
enterprise.
• Many businesses can use data mining to discover and
explore knowledge in their available data sets.
• Data mining helps us forecast future trends.
• Data mining plays a vital role in the early stages of data
management with the help of skilled and efficient data entry
service providers.
• Data mining is also handy in locating the anomalous data
patterns that are essential in fraud detection and in
identifying weak or falsified data.
ISSUES OF DATA MINING
• Security and social issues: Nowadays, the most
common issue with collected data that can be
shared is security.
• User interface issues: The information discovered
by data mining techniques is useful only if it is
interesting and the user is able to understand it.
• Mining methodology issues: These issues concern
the approaches used in data mining and their
limitations.
• Data source issues: These include practical issues,
such as the diversity of data types, and
philosophical issues, such as data excess, all of
which are linked to the data sources.
• Performance issues: There can be performance-
related issues such as the following:
i.) Efficiency and scalability of data mining
algorithms: algorithms must be efficient and
scalable to extract information from huge
amounts of data.
• ii.) Parallel, distributed, and incremental mining
algorithms: parallel and distributed data mining
algorithms come into the picture because of the
huge size of databases, the wide distribution of
data, and the complexity of data mining methods.
TECHNIQUES OF DATA MINING
• Classification: It derives a model to determine the
class of an item based on its attributes. It is a
classic data mining technique rooted in machine
learning: every item in a set of data is assigned
to one of a predefined set of groups or classes.
• Prediction: Its job is to predict the probable
values of missing or future data.
• Time series: It is a sequence of events where the next
event is determined by one or more of the preceding
events.
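The idea of assigning an item to a predefined class based on its attributes can be sketched with a k-nearest-neighbours classifier. This is one illustrative choice among many classification methods, not one named in the slides; the function `knn_classify`, the attributes (income, age), and the class labels are hypothetical.

```python
import math
from collections import Counter

def knn_classify(training, point, k=3):
    """Assign `point` to the majority class among its k nearest training items."""
    # Sort the labelled items by Euclidean distance to the new point.
    by_distance = sorted(training, key=lambda item: math.dist(item[0], point))
    # Let the k closest items vote on the class.
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical training set: (income in thousands, age) -> predefined class.
training = [
    ((20, 25), "low"), ((25, 30), "low"), ((22, 35), "low"),
    ((80, 40), "high"), ((90, 45), "high"), ((85, 50), "high"),
]

print(knn_classify(training, (24, 28)))
print(knn_classify(training, (88, 42)))
```

The classes ("low", "high") are fixed in advance by the labelled examples, which is what distinguishes classification from clustering.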
• Association: It discovers the association or connection
among a set of items. It is one of the best-known
data mining techniques. In this technique, a pattern is
discovered based on the relationship of a particular
item to other items. Association rules are if/then
statements that help reveal relationships among
seemingly unrelated data in a relational database.
• Clustering: Clustering is used to identify data items that
are similar to one another. It forms useful and
meaningful groups of items that share the same
characteristics, using automated techniques.
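An if/then association rule such as "if bread, then milk" is usually judged by two standard measures, support and confidence. The slides do not name these measures, so the sketch below is an assumption about how a rule would be evaluated; the transactions and item names are hypothetical.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Among transactions containing the antecedent, the share that also
    contain the consequent (the strength of 'if antecedent then consequent')."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

# Hypothetical market-basket data: each set is one customer's purchase.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

print(support(transactions, {"bread", "milk"}))
print(confidence(transactions, {"bread"}, {"milk"}))
```

A rule is normally kept only when both values exceed thresholds chosen by the analyst.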
• Clustering is a main task of exploratory data
mining and is used in many kinds of statistical
analysis, bioinformatics, etc. There are five types
of clusters: well-separated, centre-based,
contiguous, density-based, and shared-property.
• Summarization: It is the simplification of data. A
set of relevant data is summed up, resulting in a
smaller set that gives aggregated knowledge of
the data.
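The centre-based clusters listed above can be illustrated with a plain k-means loop. This is a minimal sketch rather than a production implementation: the function `kmeans`, the sample points, and the choice of initial centres are assumptions made for the example.

```python
import math

def kmeans(points, centres, iterations=10):
    """Plain k-means: repeatedly assign points to the nearest centre,
    then move each centre to the mean of its assigned points."""
    for _ in range(iterations):
        clusters = [[] for _ in centres]
        for p in points:
            nearest = min(range(len(centres)), key=lambda i: math.dist(p, centres[i]))
            clusters[nearest].append(p)
        # Recompute each centre; an empty cluster keeps its previous centre.
        centres = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centre
            for cluster, centre in zip(clusters, centres)
        ]
    return centres, clusters

# Hypothetical two-dimensional data with two visible groups.
points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 9), (8.5, 9.5)]
centres, clusters = kmeans(points, centres=[points[0], points[3]])

print(centres)
print(clusters)
```

Unlike classification, no class labels are supplied here; the groups emerge from the similarity of the items alone.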
DATA MINING APPLICATIONS:
• Data mining is mainly concerned with the analysis of
data that has been collected. Data mining is a dynamic
and fast-expanding field with great strengths.
• Medical and pharmacy: Data mining allows
characterizing patient behaviour to predict upcoming
office visits. It helps identify successful therapies
for different illnesses. Its applications are
constantly increasing in a variety of domains,
revealing previously unknown information that can
increase business competence and efficiency.
• Web mining: Web mining is the application of data mining
techniques to discover patterns, structures, and knowledge
from the Web.
• Health care: Health care is also one of the first important
areas of activity that boosted the intensive development of
data mining methods, ranging from visualization
techniques and predicting health care costs to
computer-aided diagnosis.
• Banking and finance: The banking and
financial services domain is one of the first and most
important areas for data mining applications. In
banking, data mining methods have been used intensively
in modelling and predicting credit fraud, risk assessment, etc.
• Retail industry: Data mining plays a vital role
in the retail industry, which collects huge
amounts of data on sales, customer
purchasing history, goods transportation,
consumption, and services. As the population
increases day by day, the quantity of data
collected will also increase. This indicates
that the demand for data mining will be very
high in the future.
CONCLUSION:
• Data mining is the process of extracting
knowledge from massive sets of data.
• Data mining is a very important domain, as it deals
with data, and the population that generates this
data is increasing day by day.
• Researchers should mainly focus on the issues
and challenges of data mining. Data mining
software is used to analyze the data.
• Reliable predictions can be made through data
analysis and algorithms.
• We can use data mining for decision making.
FUTURE TRENDS
• Changing trends in data mining:
• Past: Earlier, statistical and machine learning
algorithms were used on numerical and structured data in
traditional databases. The main area of application was
business. The computing resources were fourth-generation
programming languages (4GPL) and related techniques.
• Present: Nowadays, data mining uses advanced statistical,
machine learning, artificial intelligence, and pattern
recognition techniques. It is applicable to structured,
semi-structured, and unstructured data formats. Its main
areas of application are business, the web, medical
diagnosis, etc. Its computing resources are high-speed
networks, high-end storage devices, distributed computing, etc.
• Future: In the coming future, data mining will use
soft computing (fuzzy logic), neural networks, and
genetic programming algorithms. Its data formats
will include high-dimensional data, high-speed
data streams, sequences, noise in time series,
graphs, etc. Its main areas of application will be
business, the web, medical diagnosis, scientific and
research analysis fields (biomedical applications,
remote sensing, etc.), social networks, etc. Its
computing resources will be multi-agent
technologies and cloud computing.