Dm issues u 1
1. CT Department III BSC CT Even Semester 2019 - 20
Unit I DATA MINING ISSUES (DATA MINING)
Course: Data Mining Sub Code: 6ED
Google Classroom: q7b4gv Programme: B.Sc-CT
Unit: I Hour : 6
DATA MINING ISSUES
FACULTY : Ms.A.SATHIYA PRIYA
Department of Computer Technology III BSC CT SEM V Year: 2019-20
UNIT I Basic Data Mining Tasks, 6ED – Data Mining
SNAP TALK
ATTENDANCE
Expected outcome
The expected outcome of this session is to
understand the issues in data mining.
There are some crucial implementation issues associated
with data mining.
They can be partitioned into five groups:
Mining methodology
User interaction
Efficiency and scalability
Diversity of data types
Data mining and society
Department of CT III-B.Sc-CS VI Semester: 2017-18
Unit I Data Mining Issues, 6ED – Data Mining
Mining Methodology
Mining various and new kinds of knowledge: Data
mining covers a wide spectrum of data analysis and
knowledge discovery tasks.
Different data mining tasks may use the same database in
different ways.
This requires the development of numerous data mining
techniques.
Due to the diversity of applications, new mining tasks
continue to emerge, making data mining a dynamic
and fast-growing field.
Mining knowledge in multidimensional space
When searching for knowledge in large data sets,
we can explore the data in multidimensional space.
That is, we can search for interesting patterns among
combinations of dimensions at varying levels of abstraction.
Data can be aggregated or viewed as a multidimensional
data cube.
Mining knowledge in cube space can substantially
enhance the power and flexibility of data mining.
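The idea of viewing data at varying levels of abstraction can be sketched in a few lines of Python. This is only an illustration: the sales records, the dimension names (city, country, quarter, item) and the helper aggregate() are assumptions made up for the example, not part of the course material.

```python
from collections import defaultdict

# Hypothetical sales facts: dimensions (location, time, item) and one measure.
sales = [
    {"city": "Chennai", "country": "India",  "quarter": "Q1", "item": "phone",  "amount": 120},
    {"city": "Chennai", "country": "India",  "quarter": "Q2", "item": "phone",  "amount": 80},
    {"city": "Mumbai",  "country": "India",  "quarter": "Q1", "item": "laptop", "amount": 200},
    {"city": "Toronto", "country": "Canada", "quarter": "Q1", "item": "phone",  "amount": 50},
    {"city": "Toronto", "country": "Canada", "quarter": "Q2", "item": "laptop", "amount": 150},
]

def aggregate(rows, dims, measure="amount"):
    """Sum the measure over every combination of the chosen dimensions."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[d] for d in dims)
        totals[key] += row[measure]
    return dict(totals)

by_city_quarter = aggregate(sales, ["city", "quarter"])    # detailed view
by_country_item = aggregate(sales, ["country", "item"])    # a different pair of dimensions
by_country = aggregate(sales, ["country"])                 # rolled up from city to country
grand_total = aggregate(sales, [])                         # everything in one cell

print(by_country)
print(by_country_item)
```

Each call corresponds to one view of the same data: choosing fewer or coarser dimensions rolls the data up, choosing more or finer ones drills down.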
Data mining: an interdisciplinary effort
The power of data mining can be substantially
enhanced by integrating new methods from multiple
disciplines.
An example is the mining of software bugs in large programs.
This form of mining, known as bug mining, benefits from
the incorporation of software engineering knowledge
into the data mining process.
Boosting the power of discovery in a networked
environment
Most data objects reside in a linked or interconnected
environment,
whether it be the Web, database relations, files, or
documents.
Semantic links across multiple data objects can be
used to advantage in data mining.
Knowledge derived in one set of objects can be used
to boost the discovery of knowledge in a “related” or
semantically linked set of objects.
Handling uncertainty, noise, or incompleteness of
data
Data often contain noise, errors, exceptions, or
uncertainty, or are incomplete.
Errors and noise may confuse the data mining
process, leading to the derivation of erroneous
patterns.
Data cleaning, data preprocessing, outlier detection
and removal, and uncertainty reasoning are examples of
techniques that need to be integrated with the data
mining process.
Missing Data
Some variable values may be missing, leaving the data
incomplete.
Some algorithms require complete data.
Missing values have to be estimated, or variables with
very frequent missing values may have to be removed.
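Both options, estimating missing values and removing variables that are missing too often, can be sketched as follows. The records, the 0.5 threshold and the choice of the mean as the estimate are assumptions for illustration only; other estimates (median, model-based) are equally possible.

```python
# Hypothetical patient records; None marks a missing value.
rows = [
    {"age": 34,   "weight": 70.0, "income": None},
    {"age": None, "weight": 82.0, "income": None},
    {"age": 51,   "weight": None, "income": 40000},
    {"age": 29,   "weight": 64.0, "income": None},
]

def missing_fraction(rows, col):
    """Share of records in which the variable has no value."""
    return sum(r[col] is None for r in rows) / len(rows)

def clean(rows, max_missing=0.5):
    """Drop variables missing too often, then fill the rest with the mean.

    Assumes max_missing < 1, so every kept variable has at least one value.
    """
    kept = [c for c in rows[0] if missing_fraction(rows, c) <= max_missing]
    means = {}
    for c in kept:
        present = [r[c] for r in rows if r[c] is not None]
        means[c] = sum(present) / len(present)
    return [
        {c: (r[c] if r[c] is not None else means[c]) for c in kept}
        for r in rows
    ]

cleaned = clean(rows)
print(cleaned)
```

Here "income" is missing in three of four records and is removed, while the single gaps in "age" and "weight" are filled with the mean of the observed values.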
Irrelevant Data
Some variables may be useless.
If a variable has the same value in every record, it is
called dead and can be removed.
If almost all of its values are the same, it is less
clear whether it can be removed,
because the very rare values could be essential in some
situations.
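A minimal sketch of this check is given below. The labels "dead", "review" and "keep", the 0.9 threshold and the sample records are assumptions chosen for the example; a nearly constant variable is only flagged for review, not removed, since its rare values may matter.

```python
from collections import Counter

# Hypothetical records: "unit" never changes, "alarm" is almost always 0.
rows = [
    {"unit": "mg", "alarm": 1 if i == 7 else 0, "reading": i}
    for i in range(20)
]

def classify_variables(rows, near_constant=0.9):
    """Label each variable as dead, nearly constant (review), or keep."""
    report = {}
    for col in rows[0]:
        counts = Counter(r[col] for r in rows)
        top_share = counts.most_common(1)[0][1] / len(rows)
        if len(counts) == 1:
            report[col] = "dead"      # one value only: safe to remove
        elif top_share >= near_constant:
            report[col] = "review"    # rare values may still be essential
        else:
            report[col] = "keep"
    return report

report = classify_variables(rows)
print(report)
```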
Noisy Data
Some values might be invalid or incorrect.
A user or a measuring instrument may have given a
false value.
Such values are corrected or deleted, but first they have
to be found.
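One simple way of finding such values is to check every field against the range of values it is allowed to take. The field names and the bounds below are hypothetical and would normally come from a domain expert; the sketch only locates suspect values and leaves the decision to correct or delete them open.

```python
# Hypothetical valid ranges for each field (assumed, not from the source).
BOUNDS = {"age": (0, 120), "body_temp_c": (30.0, 45.0)}

records = [
    {"age": 42,    "body_temp_c": 36.8},
    {"age": -5,    "body_temp_c": 37.1},   # negative age: typing error?
    {"age": 61,    "body_temp_c": 368.0},  # missing decimal point?
    {"age": "n/a", "body_temp_c": 36.5},   # wrong type
]

def is_valid(value, low, high):
    """A value is valid if it is numeric and lies inside [low, high]."""
    return isinstance(value, (int, float)) and low <= value <= high

def find_invalid(records, bounds):
    """Return (record index, field, value) for every suspect value."""
    problems = []
    for i, rec in enumerate(records):
        for field, (low, high) in bounds.items():
            if not is_valid(rec[field], low, high):
                problems.append((i, field, rec[field]))
    return problems

problems = find_invalid(records, BOUNDS)
for p in problems:
    print(p)
```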
Outliers
Sometimes there are many data entries that do not
fit nicely into the derived model.
They may be erroneous or otherwise exceptional
values that are best removed.
For instance, an age of 0 years in patient data may be
such a value.
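As an illustration, entries that lie far from the rest of the data can be flagged with a robust score based on the median. The sketch below uses the modified z-score (median and median absolute deviation, with the commonly used cut-off of 3.5); the choice of this rule and the list of ages are assumptions for the example, and a flagged value still needs human judgement before it is removed.

```python
import statistics

def modified_z_scores(values):
    """Modified z-score: 0.6745 * |x - median| / MAD for each value."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        raise ValueError("MAD is zero; this score is undefined for the data")
    return [0.6745 * abs(v - med) / mad for v in values]

def find_outliers(values, threshold=3.5):
    """Return the values whose modified z-score exceeds the threshold."""
    scores = modified_z_scores(values)
    return [v for v, s in zip(values, scores) if s > threshold]

# Hypothetical ages of adult patients; the 0 is suspicious in this context.
ages = [34, 45, 29, 0, 51, 38, 42, 47, 36, 40]
outliers = find_outliers(ages)
print(outliers)
```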
Pattern evaluation and pattern- or constraint-guided
mining
What makes a pattern interesting may vary from user to user.
Techniques are needed to assess the interestingness of
discovered patterns based on subjective measures.
These estimate the value of patterns with respect to a
given user class, based on user beliefs or expectations.
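One possible reading of a subjective measure is "unexpectedness": how far an observed pattern deviates from what the user believed beforehand. The sketch below is an assumption made for illustration, not a method given in the source; the transactions, the rules and the belief values are all hypothetical.

```python
# Hypothetical market-basket transactions.
transactions = [
    {"bread", "butter"},
    {"bread"},
    {"bread", "butter", "milk"},
    {"bread", "milk"},
    {"milk", "eggs"},
    {"milk", "eggs"},
    {"milk"},
]

def confidence(transactions, lhs, rhs):
    """Share of transactions containing lhs that also contain rhs."""
    covered = [t for t in transactions if lhs in t]
    return sum(rhs in t for t in covered) / len(covered)

# What one (hypothetical) user expects the confidence of each rule to be.
beliefs = {("bread", "butter"): 0.9, ("milk", "eggs"): 0.4}

# Unexpectedness: gap between the observed confidence and the belief.
surprise = {
    rule: abs(confidence(transactions, *rule) - expected)
    for rule, expected in beliefs.items()
}

# Rules that contradict the user's expectations most come first.
ranked = sorted(surprise, key=surprise.get, reverse=True)
print(ranked)
```

A different user with different beliefs would get a different ranking from the same data, which is the point of a subjective measure.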
User Interaction
Interactive mining: The data mining process should be
highly interactive.
It is important to build flexible user interfaces
and an exploratory mining environment that facilitate the
user's interaction with the system.
A user may first sample a set of data, explore general
characteristics of the data, and estimate potential mining results.
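The "sample first, then explore" step can be sketched as below. The population is synthetic and the sample size of 200 is an arbitrary assumption; the point is only that a small random sample gives a quick first impression of the data before any full mining run.

```python
import random
import statistics

# Synthetic stand-in for a large data set (assumed for illustration).
population = [(i * 37) % 101 for i in range(10_000)]

def summarize(values):
    """A few general characteristics of a list of numbers."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "min": min(values),
        "max": max(values),
    }

rng = random.Random(42)                 # fixed seed so the run is repeatable
sample = rng.sample(population, 200)    # sampling without replacement

sample_summary = summarize(sample)
full_summary = summarize(population)    # only feasible here because the data is small

print(sample_summary)
print(full_summary)
```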
Data mining problems are often not precisely stated,
so both application domain experts and data mining
experts are needed.
The training data and the desired results must be defined.
The results must be interpreted carefully.
Incorporation of background knowledge
Background knowledge, constraints, rules and other
information regarding the domain under study should
be incorporated into the knowledge discovery
process.
Presentation and visualization of data mining results
A data mining system should present data mining results
vividly and flexibly.
This requires the system to adopt expressive knowledge
representations, user-friendly interfaces, and
visualization techniques.
Interpretation
Experts may be required to correctly interpret the
results obtained.
Visualization
Visualization helps users to view and understand the
input data and the results easily.
Large Datasets
Data sets may be massive, which makes them difficult to
handle.
Sampling and parallelization are effective ways to attack
these problems.
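Parallelization usually follows a partition-and-merge pattern: split the data into chunks, process each chunk independently, and combine the partial results. The sketch below counts item frequencies this way using threads from the Python standard library. The data, chunk size and worker count are assumptions; in CPython, threads mainly show the pattern, and a real speed-up for CPU-bound work would need processes or a distributed system.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def chunks(seq, size):
    """Yield consecutive slices of seq with at most `size` items each."""
    for start in range(0, len(seq), size):
        yield seq[start:start + size]

def count_items(chunk):
    """Work done independently on one partition."""
    return Counter(chunk)

def parallel_count(seq, chunk_size=1000, workers=4):
    """Count items per chunk in a thread pool, then merge the partial counts."""
    total = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(count_items, chunks(seq, chunk_size)):
            total.update(partial)   # Counter.update adds the counts
    return total

# Hypothetical item stream standing in for a massive data set.
items = ["milk", "bread", "milk", "eggs"] * 2500
counts = parallel_count(items)
print(counts)
```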
Multimedia Data
Data mining methods usually target
traditional data types, i.e., numbers, characters, and
text.
They are not always suitable for multimedia or other
complex data, e.g., geographic data (GIS).
Changing Data
Data cannot be assumed to be static, even though
we usually start from that assumption.
Therefore, algorithms must be rerun from time to
time.
Overfitting
Overfitting occurs when a model is too detailed or
fits the given data too strictly.
Such a model may lose its generalization ability and is not
valid for future data.
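The effect can be seen by comparing a model that memorizes the training data with a simple straight-line model. Everything in the sketch below is an assumption made for illustration: the true relation y = 2x, the alternating +1/-1 noise in the training data, and the noise-free test points.

```python
def fit_line(xs, ys):
    """Least-squares straight line: a simple model with two parameters."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return lambda x: slope * x + intercept

def fit_memorizer(xs, ys):
    """Overly detailed model: answers with the y of the nearest training x."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

def mse(model, xs, ys):
    """Mean squared error of the model on the given points."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Training data: y = 2x plus alternating +1 / -1 noise.
train_x = [float(i) for i in range(10)]
train_y = [2 * x + (1.0 if i % 2 == 0 else -1.0) for i, x in enumerate(train_x)]

# "Future" data: new x values, following the true relation without noise.
test_x = [i + 0.25 for i in range(10)]
test_y = [2 * x for x in test_x]

line = fit_line(train_x, train_y)
memorizer = fit_memorizer(train_x, train_y)

print("memorizer train/test:", mse(memorizer, train_x, train_y), mse(memorizer, test_x, test_y))
print("line      train/test:", mse(line, train_x, train_y), mse(line, test_x, test_y))
```

The memorizer reproduces the training data perfectly, noise included, and does worse than the straight line on the new points; the simpler model generalizes better.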
Points to ponder
Different data mining tasks may use the same database in
different ways.
This requires the development of numerous data mining
techniques.
The data mining process should be highly interactive.
A data mining system should present data mining results
vividly and flexibly.
Keywords
same database in different ways.
input data and results visualization
rerun from time to time.
not valid for future data.
Sampling and parallelization.
MCQs
1. Many data entries do not fit nicely into the ______ model.
a. Concurrent b. Derived c. Algorithm
2. To easily view and understand input data and results,
________ is helpful.
a. Visualization b. Related information c. Comparison
3. A data mining system should present data mining results
vividly and with ________.
a. Ease b. Compatibility c. Flexibility
Answers
1. b. derived
2. a. visualization
3. c. flexibility
THANK U