CT Department III BSC CT Even Semester 2019 - 20
Unit I DATA MINING ISSUES 1DATA MINING
Course: Data Mining Sub Code: 6ED
Google Classroom: q7b4gv Programme: B.Sc-CT
Unit: I Hour : 6
DATA MINING ISSUES
FACULTY : Ms.A.SATHIYA PRIYA
Department of Computer Technology III BSC CT SEM V Year:
2019- 20
UNIT I Basic Data Mining Tasks6ED – Data Mining
SNAP TALK
ATTENDANCE
Expected outcome
 The outcome of this session is to
understand the issues in data mining.
 There are some crucial implementation issues associated
with data mining.
 They can be partitioned into five groups:
 Mining methodology
 User interaction
 Efficiency and scalability
 Diversity of data types
 Data mining and society
Department of CT III-B.Sc-CS VI Semester: 2017-18
Unit I Data Mining Issues6ED – Data Mining
DATA MINING ISSUES
Mining Methodology
 Mining various and new kinds of knowledge: Data
mining covers a wide spectrum of data analysis and
knowledge discovery tasks.
 Different data mining tasks may use the same database in
different ways.
 This requires the development of numerous data mining
techniques.
 Due to the diversity of applications, new mining tasks
continue to emerge, making data mining a dynamic
and fast growing field.
Mining knowledge in multidimensional space
 When searching for knowledge in large data sets, we can
explore the data in multidimensional space.
 That is, we can search for interesting patterns among the
combinations of dimensions at varying levels of abstraction.
 Data can be aggregated or viewed as a multidimensional
data cube.
 Mining knowledge in cube space can substantially
enhance the power and flexibility of data mining.
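As a rough illustration of viewing data as a cube, the sketch below sums a tiny sales table over every combination of its dimensions, so each result is one level of abstraction. The records, the dimension names (city, item, year), and the helper `cube` are hypothetical and not taken from the slides.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact table: (city, item, year, units_sold)
RECORDS = [
    ("Chennai", "phone", 2019, 10),
    ("Chennai", "laptop", 2019, 4),
    ("Coimbatore", "phone", 2019, 7),
    ("Coimbatore", "phone", 2020, 9),
]
DIMENSIONS = ("city", "item", "year")


def cube(records, dimensions):
    """Sum the measure for every subset of dimensions (every cuboid)."""
    result = {}
    for k in range(len(dimensions) + 1):
        for dims in combinations(range(len(dimensions)), k):
            totals = defaultdict(int)
            for row in records:
                key = tuple(row[i] for i in dims)
                totals[key] += row[-1]  # last field is the measure
            names = tuple(dimensions[i] for i in dims)
            result[names] = dict(totals)
    return result


cuboids = cube(RECORDS, DIMENSIONS)
print(cuboids[()])                # grand total over all dimensions
print(cuboids[("city",)])         # rolled up to city level
print(cuboids[("city", "item")])  # one level more detailed
```

Three dimensions give eight cuboids here. Real cube computation would normally avoid materializing every combination, so this is only meant to show the idea.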
Data mining: an interdisciplinary effort
 The power of data mining can be substantially
enhanced by integrating new methods from multiple
disciplines.
 An example is the mining of software bugs in large programs.
 This form of mining is known as bug mining.
 It benefits from the incorporation of software engineering
knowledge into the data mining process.
Boosting the power of discovery in a networked
environment
 Most data objects reside in a linked or interconnected
environment, whether it be the Web, database relations,
files, or documents.
 Semantic links across multiple data objects can be
used to advantage in data mining.
 Knowledge derived in one set of objects can be used
to boost the discovery of knowledge in a “related” or
semantically linked set of objects.
Handling uncertainty, noise, or incompleteness of
data
 Data often contain noise, errors, exceptions or
uncertainty, or are incomplete.
 Errors and noise may confuse the data mining
process, leading to the derivation of erroneous
patterns.
 Data cleaning, data preprocessing, outlier detection
and removal, and uncertainty reasoning are techniques
for dealing with these problems.
Missing Data
 There may be missing variable values, i.e., incomplete
data.
 Some algorithms require complete data.
 Missing values have to be estimated, or variables with
very frequent missing values may have to be removed.
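The two options above can be sketched as follows. Mean imputation is only one possible way to estimate a missing value, and the records, the 50% cut-off, and the helper `handle_missing` are assumptions made for illustration.

```python
from statistics import mean

# Hypothetical patient records; None marks a missing value.
ROWS = [
    {"age": 34, "weight": 70.0, "income": None},
    {"age": None, "weight": 82.0, "income": None},
    {"age": 51, "weight": None, "income": 30000},
    {"age": 29, "weight": 64.0, "income": None},
]
MAX_MISSING_FRACTION = 0.5  # assumed cut-off for dropping a variable


def handle_missing(rows, max_missing=MAX_MISSING_FRACTION):
    """Drop variables that are mostly missing; mean-impute the rest."""
    cleaned = [dict(r) for r in rows]  # work on copies of the rows
    for var in list(rows[0]):
        present = [r[var] for r in rows if r[var] is not None]
        missing_fraction = 1 - len(present) / len(rows)
        if missing_fraction > max_missing:
            for r in cleaned:
                del r[var]  # too sparse to estimate reliably
        else:
            fill = mean(present)
            for r in cleaned:
                if r[var] is None:
                    r[var] = fill
    return cleaned


cleaned_rows = handle_missing(ROWS)
print(cleaned_rows)
```

In this toy data, `income` is missing in three of four rows and is dropped, while the single gaps in `age` and `weight` are filled with the column mean.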
Irrelevant Data
 Some variables may be useless.
 If all values of a variable are constant, it is called
dead and can be removed.
 If almost all values are constant, it is not
obvious whether the variable can be removed.
 The very rare values could be essential in some
situations.
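A minimal sketch of how dead and almost-constant variables might be detected. The column data, the 0.9 share threshold, and the function `classify_variables` are hypothetical choices, not part of the original material.

```python
from collections import Counter

# Hypothetical columns of a small data set.
COLUMNS = {
    "country": ["IN"] * 10,        # constant: a "dead" variable
    "fraud_flag": [0] * 9 + [1],   # almost constant
    "amount": [5, 12, 7, 30, 8, 5, 19, 2, 11, 6],
}
NEAR_CONSTANT_SHARE = 0.9  # assumed threshold, not from the slides


def classify_variables(columns, share=NEAR_CONSTANT_SHARE):
    """Label each variable as 'dead', 'near-constant', or 'varied'."""
    labels = {}
    for name, values in columns.items():
        # Count of the single most frequent value in the column.
        top_count = Counter(values).most_common(1)[0][1]
        if top_count == len(values):
            labels[name] = "dead"
        elif top_count / len(values) >= share:
            labels[name] = "near-constant"
        else:
            labels[name] = "varied"
    return labels


labels = classify_variables(COLUMNS)
print(labels)
```

Only the dead variable is safe to drop automatically. A near-constant variable such as the rare `fraud_flag` is reported for review, because its rare value may be exactly what the analysis is looking for.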
Noisy Data
 Some values might be invalid or incorrect.
 A user or a measuring device may have given a
false value.
 Such values are corrected or deleted, but first they have
to be found.
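One simple way to find invalid values is to check each variable against a plausible range. The sketch below assumes such ranges are known in advance. The variable names, the limits, and the helper `find_invalid` are all hypothetical.

```python
# Hypothetical validity rules: allowed range per variable.
VALID_RANGES = {
    "temperature_c": (30.0, 45.0),  # plausible body temperature
    "pulse_bpm": (20, 250),
}

READINGS = [
    {"temperature_c": 36.8, "pulse_bpm": 72},
    {"temperature_c": 368.0, "pulse_bpm": 70},  # likely a misplaced decimal point
    {"temperature_c": 37.1, "pulse_bpm": -5},   # impossible sensor value
]


def find_invalid(rows, ranges):
    """Return (row index, variable, value) for every out-of-range value."""
    problems = []
    for index, row in enumerate(rows):
        for var, (low, high) in ranges.items():
            value = row[var]
            if not low <= value <= high:
                problems.append((index, var, value))
    return problems


invalid = find_invalid(READINGS, VALID_RANGES)
print(invalid)
```

The function only locates suspicious values. Whether each one is corrected or deleted is left to the analyst.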
Outliers
 There are sometimes many data entries that do not
fit nicely into the derived model.
 They may be erroneous or otherwise exceptional
values that are best removed.
 For instance, an age of 0 years in patient data is
such a value.
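As a hedged example of finding such entries, the sketch below flags values that lie far from the median, measured in units of the median absolute deviation (MAD). This is one common robust rule and not necessarily the one the course intends. The ages and the cut-off `K` are made up.

```python
from statistics import median

# Hypothetical patient ages; 0 and 120 are suspicious entries.
AGES = [34, 29, 41, 0, 38, 45, 33, 120, 37, 31]
K = 5  # assumed cut-off: how many MADs away counts as an outlier


def find_outliers(values, k=K):
    """Flag values far from the median, in units of the median
    absolute deviation (MAD), a robust estimate of spread."""
    center = median(values)
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        return []  # no spread at all, so this rule cannot rank deviations
    return [v for v in values if abs(v - center) > k * mad]


outliers = find_outliers(AGES)
print(outliers)
```

Flagged values should still be inspected before removal, since an exceptional value is not always an erroneous one.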
Pattern evaluation and pattern- or constraint-guided
mining
 What makes a pattern interesting may vary from user to user.
 Techniques are needed to assess the interestingness of
discovered patterns based on subjective measures.
 These estimate the value of patterns with respect to a
given user class, based on user beliefs or expectations.
User Interaction
 Interactive Mining: The data mining process should be
highly interactive.
 It is important to build flexible user interfaces and
an exploratory mining environment, facilitating the user's
interaction with the system.
 A user may first sample a set of data, explore general
characteristics of the data, and estimate potential mining results.
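The sample-then-explore step could look roughly like this. The synthetic data set, the sample size, and the helper `explore_sample` are assumptions for illustration only.

```python
import random
from statistics import mean, stdev


def explore_sample(values, sample_size, seed=0):
    """Draw a random sample and summarize its general characteristics."""
    rng = random.Random(seed)  # fixed seed so the exploration is repeatable
    sample = rng.sample(values, sample_size)
    return {
        "n": len(sample),
        "min": min(sample),
        "max": max(sample),
        "mean": mean(sample),
        "stdev": stdev(sample),
    }


# Hypothetical data set: 10,000 purchase amounts.
population = [(i * 37) % 500 for i in range(10_000)]
summary = explore_sample(population, sample_size=200)
print(summary)
```

A quick summary like this lets the user judge whether a full mining run is likely to be worthwhile before paying its cost.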
 Data mining problems are often not precisely stated, so
both application domain experts and data mining experts
are needed.
 The training data and the desired results need to be defined.
 Results must be interpreted carefully.
Incorporation of background knowledge
 Background knowledge, constraints, rules and other
information regarding the domain under study should
be incorporated into the knowledge discovery
process.
Presentation and visualization of data mining results
 A data mining system should present data mining results
vividly and flexibly.
 This requires the system to adopt expressive knowledge
representations, user-friendly interfaces, and
visualization techniques.
Interpretation
 Experts may be required to correctly interpret the
results obtained.
Visualization
 Visualization is helpful for easily viewing and
understanding input data and results.
Large Datasets
 Data sets may be massive, which makes them difficult to
handle.
 Sampling and parallelization are effective ways to attack
these problems.
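A minimal sketch of the parallelization idea: split the data into partitions, process each one independently, and merge the partial results. The log data and the function names are hypothetical. Threads are used only to keep the example short, and for CPU-bound mining in CPython a process pool would usually be the better fit.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def chunks(items, size):
    """Split a list into consecutive pieces of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def count_items(chunk):
    """Work done independently on one partition of the data."""
    return Counter(chunk)


def parallel_count(items, chunk_size=1000, workers=4):
    """Count item frequencies partition by partition, then merge."""
    total = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(count_items, chunks(items, chunk_size)):
            total.update(partial)  # merging partial counts adds them up
    return total


# Hypothetical transaction log with three item types.
log = ["milk", "bread", "milk", "eggs"] * 2500
counts = parallel_count(log)
print(counts.most_common(3))
```

Because each partition is counted on its own, the same structure also works when the partitions live on different machines.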
Multimedia Data
 Usually data mining methods are targeted to
traditional data types, i.e., numbers, characters, and
text.
 They are not always suitable for multimedia, e.g.,
geographic data (GIS).
Changing Data
 Data cannot be assumed to be static, even if we mostly
start from this assumption.
 Therefore, algorithms must be rerun from time to
time.
Overfitting
 Overfitting occurs when a model is built to be too
detailed or to fit the given data too strictly.
 Thus, it may lose its generalization ability and may not
be valid for future data.
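The effect can be shown on a toy task, sketched here under invented assumptions: the true rule is a single cut-off at 50, and a few training labels are flipped to act as noise. A model that memorizes every training point fits the training data perfectly but does worse on new data than a simple threshold rule. All data and function names are hypothetical.

```python
# Hypothetical task: the true rule is "label is True when x >= 50".
NOISY_X = {10, 30, 70, 90}  # training labels flipped to simulate noise

train = [(x, (x >= 50) != (x in NOISY_X)) for x in range(0, 100, 2)]
test = [(x, x >= 50) for x in range(1, 100, 2)]


def memorizer(train_set):
    """Overly detailed model: copy the label of the nearest training point."""
    def predict(x):
        # Ties in distance are broken in favor of the smaller x.
        nearest = min(train_set, key=lambda p: (abs(p[0] - x), p[0]))
        return nearest[1]
    return predict


def threshold_model(train_set):
    """Simple model: pick the single cut-off with the fewest training errors."""
    def errors(t):
        return sum((x >= t) != y for x, y in train_set)
    best = min((x for x, _ in train_set), key=errors)
    return lambda x: x >= best


def accuracy(model, data):
    """Fraction of examples the model labels correctly."""
    return sum(model(x) == y for x, y in data) / len(data)


for name, build in [("memorizer", memorizer), ("threshold", threshold_model)]:
    model = build(train)
    print(name, "train:", accuracy(model, train), "test:", accuracy(model, test))
```

The memorizer reproduces the noisy labels and carries them over to neighboring test points, while the threshold model ignores the noise and generalizes better.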
Points to ponder
 Different data mining tasks may use the same database in
different ways.
 This requires the development of numerous data mining
techniques.
 The data mining process should be highly interactive.
 A data mining system should present data mining results
vividly and flexibly.
Keywords
same database in different ways.
input data and results visualization
rerun from time to time.
not valid for future data.
Sampling and parallelization.
MCQs
1. Outliers are data entries that do not fit nicely into the ______ model.
a. Concurrent b. Derived c. Algorithm
2. To easily view and understand input data and results,
________ is helpful.
a. Visualization b. Related information c. Comparison
3. A data mining system should present data mining results vividly
and ________.
a. Easily b. Compatibly c. Flexibly
Answers
1. b. Derived
2. a. Visualization
3. c. Flexibly
THANK U
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
 

Dm issues u 1

  • 1. CT Department, III BSC CT, Even Semester 2019 - 20. Unit I: DATA MINING ISSUES. Course: Data Mining. Sub Code: 6ED. Google Classroom: q7b4gv. Programme: B.Sc-CT. Unit: I. Hour: 6. FACULTY: Ms. A. SATHIYA PRIYA
  • 2. SNAP TALK
  • 3. ATTENDANCE
  • 4. Expected outcome: The outcome of this session is to understand the issues in data mining.
  • 5. DATA MINING ISSUES: There are some crucial implementation issues associated with data mining. They can be partitioned into five groups: mining methodology, user interaction, efficiency and scalability, diversity of data types, and data mining and society.
  • 6. Mining Methodology. Mining various and new kinds of knowledge: Data mining covers a wide spectrum of data analysis and knowledge discovery tasks. These tasks may use the same database in different ways, which requires the development of numerous data mining techniques.
  • 7. Cont.: Due to the diversity of applications, new mining tasks continue to emerge, making data mining a dynamic and fast-growing field.
  • 8. Mining knowledge in multidimensional space: When searching for knowledge in large data sets, we can explore the data in multidimensional space, that is, search for interesting patterns among combinations of dimensions at varying levels of abstraction. Data can be aggregated or viewed as a multidimensional data cube. Mining knowledge in cube space can substantially enhance the power and flexibility of data mining.
  • 9. Data mining, an interdisciplinary effort: The power of data mining can be substantially enhanced by integrating new methods from multiple disciplines. One example is the mining of software bugs in large programs, known as bug mining, which benefits from the incorporation of software engineering knowledge into the data mining process.
  • 10. Boosting the power of discovery in a networked environment: Most data objects reside in a linked or interconnected environment, whether it be the web, database relations, files or documents. Semantic links across multiple data objects can be used to advantage in data mining.
  • 11. Cont.: Knowledge derived in one set of objects can be used to boost the discovery of knowledge in a “related” or semantically linked set of objects.
  • 12. Handling uncertainty, noise or incompleteness of data: Data often contain noise, errors, exceptions or uncertainty, or are incomplete. Errors and noise may confuse the data mining process, leading to the derivation of erroneous patterns.
  • 13. Cont.: Data cleaning, data preprocessing, outlier detection and removal, and uncertainty reasoning are techniques for dealing with these problems.
  • 14. Missing Data: Some variable values may be missing, leaving the data incomplete. Some algorithms require complete data, so missing values have to be estimated, or variables with very frequent missing values may have to be removed.
  • 15. Irrelevant Data: Some variables may be useless. If all values of a variable are constant, it is called dead and can be removed. If almost all values are constant, it is not straightforward to decide whether it can be removed, because those very rare values could be essential in some situations.
  • 16. Noisy Data: Some values might be invalid or incorrect, for example because a user or a piece of measuring equipment has given a false value. Such values are corrected or deleted, but first they have to be found.
  • 17. Outliers: There are sometimes many data entries that do not fit nicely into the derived model. They may be erroneous or otherwise exceptional values that are best removed. For instance, an age of 0 years in patient data is such a value.
  • 18. Pattern evaluation and pattern- or constraint-guided mining: What makes a pattern interesting may vary from user to user. Techniques are needed to assess the interestingness of discovered patterns based on subjective measures, which estimate the value of patterns with respect to a given user class, based on user beliefs or expectations.
  • 19. User Interaction. Interactive mining: The data mining process should be highly interactive. It is important to build flexible user interfaces and an exploratory mining environment, facilitating the user's interaction with the system. A user may first sample a set of data, explore general characteristics of the data and estimate potential mining results.
  • 20. Cont.: Data mining problems are often not precisely stated, so both application domain experts and data mining experts are needed. The training data and the desired results have to be defined. It is important to interpret the results carefully.
  • 21. Incorporation of background knowledge: Background knowledge, constraints, rules and other information regarding the domain under study should be incorporated into the knowledge discovery process.
  • 22. Presentation and visualization of data mining results: A data mining system should present data mining results vividly and with flexibility. This requires the system to adopt expressive knowledge representations, user-friendly interfaces and visualization techniques.
  • 23. Interpretation: Experts may be required to correctly interpret the results obtained. Visualization: To easily view and understand input data and results, visualization is helpful.
  • 24. Large Datasets: Data sets may be massive, which creates problems in handling them. Sampling and parallelization are effective ways to attack these problems.
  • 25. Multimedia Data: Usually data mining methods are targeted at traditional data types, i.e., numeric, character and text data. They are not always suitable for multimedia, e.g., geographic data (GIS).
  • 26. Changing Data: Data cannot be assumed to be static, even if we mostly start from that assumption. Therefore, algorithms must be rerun from time to time.
  • 27. Overfitting: Overfitting occurs when a model is built to be too detailed or to fit the given data too strictly. Thus, it may lose its generalization ability and not be valid for future data.
  • 28. Points to ponder: The data mining tasks may use the same database in different ways. This requires the development of numerous data mining techniques. The data mining process should be highly interactive. A data mining system should present data mining results vividly and with flexibility.
  • 29. Keywords: same database in different ways; input data and results visualization; rerun from time to time; not valid for future data; sampling and parallelization.
  • 31. MCQs: 1. Many data entries do not fit nicely into the ______ model. a. Concurrent b. Derived c. Algorithm 2. To easily view and understand input data and results, ________ is helpful. a. Visualization b. Related information c. Comparison 3. A data mining system should present data mining results vividly and with ________. a. Easy b. Compatibility c. Flexibility
  • 32. Answers: 1. b. Derived 2. a. Visualization 3. c. Flexibility
  • 33. THANK YOU
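
The data quality issues described in the slides (missing data, irrelevant "dead" variables, and suspicious values such as an age of 0 in patient data) can be illustrated with a minimal Python sketch. The records, the field names, the 1 to 120 age range and the choice of mean imputation are all assumptions made for illustration; they are not taken from the slides, and other estimation or detection methods may suit real data better.

```python
from statistics import mean

# Hypothetical patient records; None marks a missing value.
records = [
    {"age": 34, "weight": 70.5, "country": "IN"},
    {"age": None, "weight": 82.0, "country": "IN"},
    {"age": 0, "weight": 64.0, "country": "IN"},
    {"age": 51, "weight": None, "country": "IN"},
]


def dead_variables(rows):
    """Return names of variables whose observed values are all identical."""
    return [
        name
        for name in rows[0]
        if len({r[name] for r in rows if r[name] is not None}) <= 1
    ]


def out_of_range(rows, name, low, high):
    """Return rows whose observed value for `name` lies outside [low, high]."""
    return [
        r for r in rows
        if r[name] is not None and not (low <= r[name] <= high)
    ]


def impute_mean(rows, name):
    """Return copies of the rows with missing `name` values set to the observed mean."""
    fill = mean(r[name] for r in rows if r[name] is not None)
    return [{**r, name: fill if r[name] is None else r[name]} for r in rows]


# A constant variable carries no information and is a candidate for removal.
dead = dead_variables(records)

# Assumed plausible range for age; values outside it must be found before
# they can be corrected or deleted.
suspicious = out_of_range(records, "age", 1, 120)

# Estimate missing weights so that algorithms requiring complete data can run.
completed = impute_mean(records, "weight")
```

Here `dead` contains the constant `country` variable, `suspicious` contains the record with age 0, and `completed` has no missing weights. Flagging suspicious values before imputing keeps an erroneous entry from distorting the estimate used to fill the gaps.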