Mining Complex Data
• Vast amounts of data are stored in various complex forms, such as structured or unstructured, hypertext, and multimedia.
• Thus, mining complex types of data, including object data, spatial data, multimedia data, text data, and Web data, has become an increasingly important task in data mining.
• Multidimensional analysis and data mining can be performed in object-relational and object-oriented databases, by
• (1) class-based generalization of complex objects, including set-valued, list-valued, and other sophisticated types of data, class/subclass hierarchies, and class composition hierarchies;
• (2) constructing object data cubes; and
• (3) performing generalization-based mining.
• A plan database can be mined by a generalization-based, divide-and-conquer approach in order to find interesting general patterns at different levels of abstraction.
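As a rough illustration of class-based generalization of a set-valued attribute, the sketch below maps each low-level value to a higher-level concept and counts the results. The concept hierarchy, the hobby values, and the helper name `generalize_set` are all hypothetical and chosen only for this example.

```python
from collections import Counter

# Hypothetical concept hierarchy: low-level value -> higher-level concept.
CONCEPT_OF = {
    "tennis": "sports",
    "hockey": "sports",
    "chess": "games",
    "violin": "music",
}

def generalize_set(values, hierarchy):
    """Generalize a set-valued attribute into {concept: count}."""
    # Values missing from the hierarchy fall into a catch-all concept.
    counts = Counter(hierarchy.get(v, "other") for v in values)
    return dict(counts)

hobbies = {"tennis", "hockey", "chess", "violin"}
generalized = generalize_set(hobbies, CONCEPT_OF)
print(generalized)
```

Generalizing to counts per concept keeps some information about the original set while hiding the individual values, which is one plausible way to prepare such attributes for cube construction.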
• Spatial data mining is the discovery of interesting patterns from large geospatial databases. Spatial data cubes that contain spatial dimensions and measures can be constructed. Spatial OLAP can be implemented to facilitate multidimensional spatial data analysis. Spatial data mining includes mining spatial association and co-location patterns, clustering, classification, and spatial trend and outlier analysis.
• Multimedia data mining is the discovery of interesting patterns from multimedia databases that store and manage large collections of multimedia objects, including audio data, image data, video data, sequence data, and hypertext data containing text, text markups, and linkages. Issues in multimedia data mining include content-based retrieval and similarity search, and generalization and multidimensional analysis. Multimedia data cubes contain additional dimensions and measures for multimedia information. Other topics in multimedia mining include classification and prediction analysis, mining associations, and audio and video data mining.
• A substantial portion of the available information is stored in text or document databases that consist of large collections of documents, such as news articles, technical papers, books, digital libraries, e-mail messages, and Web pages. Text information retrieval and data mining have thus become increasingly important. Precision, recall, and the F-score are three basic measures from Information Retrieval (IR). Various text retrieval methods have been developed. These typically either focus on document selection (where the query is regarded as providing constraints) or document ranking (where the query is used to rank documents in order of relevance). The vector-space model is a popular example of the latter kind. Latent Semantic Indexing (LSI), Locality Preserving Indexing (LPI), and Probabilistic LSI can be used for text dimensionality reduction. Text mining goes one step beyond keyword-based and similarity-based information retrieval and discovers knowledge from semistructured text data using methods such as keyword-based association analysis, document classification, and document clustering.
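The retrieval measures and the vector-space ranking mentioned above can be sketched in a few lines of Python. This is a minimal illustration under simple assumptions: documents are tokenized by whitespace, term weights are raw term frequencies, and the F-score is the balanced harmonic mean of precision and recall. The document collection and the function names are hypothetical.

```python
import math
from collections import Counter

def precision_recall_f(retrieved, relevant):
    """Return (precision, recall, F-score) for one query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score

def cosine(a, b):
    """Cosine similarity between two term-frequency vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def rank(query, docs):
    """Rank document ids by cosine similarity to the query, best first."""
    q = Counter(query.lower().split())
    scores = {doc_id: cosine(q, Counter(text.lower().split()))
              for doc_id, text in docs.items()}
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical three-document collection.
docs = {
    "d1": "data mining finds patterns in data",
    "d2": "web pages link to other web pages",
    "d3": "text mining and text retrieval",
}
ranking = rank("text mining", docs)
# Evaluate the top two results, assuming only d3 is truly relevant.
p, r, f = precision_recall_f(ranking[:2], {"d3"})
print(ranking, p, r, round(f, 3))
```

Practical systems usually replace raw term frequencies with weighted schemes such as TF-IDF, but the ranking principle is the same.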
• The World Wide Web serves as a huge, widely distributed, global information service center for news, advertisements, consumer information, financial management, education, government, e-commerce, and many other services. It also contains a rich and dynamic collection of hyperlink information, and access and usage information, providing rich sources for data mining. Web mining includes mining Web linkage structures, Web contents, and Web access patterns. This involves mining the Web page layout structure, mining the Web’s link structures to identify authoritative Web pages, mining multimedia data on the Web, automatic classification of Web documents, and Web usage mining.
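The slides do not name a specific algorithm for identifying authoritative pages. One well-known option is the hubs-and-authorities (HITS) iteration, sketched below as an assumption about how such link mining might look. The tiny link graph and the function name `hits` are hypothetical.

```python
import math

def hits(links, iterations=50):
    """Hubs-and-authorities iteration.

    links maps each page to the list of pages it links to.
    Returns (authority, hub) score dictionaries.
    """
    pages = set(links) | {q for targets in links.values() for q in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # A page's authority is the sum of hub scores of pages linking to it.
        auth = {p: 0.0 for p in pages}
        for p, targets in links.items():
            for q in targets:
                auth[q] += hub[p]
        # A page's hub score is the sum of authority scores it links to.
        hub = {p: sum(auth[q] for q in links.get(p, ())) for p in pages}
        # Normalize both vectors so the scores stay bounded.
        a_norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        h_norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        auth = {p: v / a_norm for p, v in auth.items()}
        hub = {p: v / h_norm for p, v in hub.items()}
    return auth, hub

# Hypothetical graph: pages a, b, and d all link to c; d also links to a.
links = {"a": ["c"], "b": ["c"], "d": ["c", "a"]}
auth, hub = hits(links)
print(max(auth, key=auth.get), max(hub, key=hub.get))
```

In this graph the page with the most inbound links from good hubs ends up with the highest authority score, which matches the intuition behind link-structure mining.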
Trends in Data Mining
• The diversity of data, data mining tasks, and data mining approaches poses many challenging research issues in data mining. The development of efficient and effective data mining methods and systems, the construction of interactive and integrated data mining environments, the design of data mining languages, and the application of data mining techniques to solve large application problems are important tasks for data mining researchers and data mining system and application developers. This section describes some of the trends in data mining that reflect the pursuit of these challenges:
• Application exploration: Early data mining applications focused mainly on helping businesses gain a competitive edge. The exploration of data mining for businesses continues to expand as e-commerce and e-marketing have become mainstream elements of the retail industry. Data mining is increasingly used for the exploration of applications in other areas, such as financial analysis, telecommunications, biomedicine, and science. Emerging application areas include data mining for counterterrorism (including and beyond intrusion detection) and mobile (wireless) data mining. As generic data mining systems may have limitations in dealing with application-specific problems, we may see a trend toward the development of more application-specific data mining systems.
• Scalable and interactive data mining methods: In contrast with traditional data analysis methods, data mining must be able to handle huge amounts of data efficiently and, if possible, interactively. Because the amount of data being collected continues to increase rapidly, scalable algorithms for individual and integrated data mining functions become essential. One important direction toward improving the overall efficiency of the mining process while increasing user interaction is constraint-based mining. This provides users with added control by allowing the specification and use of constraints to guide data mining systems in their search for interesting patterns.
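To make the idea of constraint-based mining concrete, here is a minimal sketch in which a user-supplied predicate restricts which itemsets are counted at all. It enumerates small itemsets by brute force, which is only reasonable for toy data; scalable miners prune the search space far more aggressively. The prices, transactions, and function name are hypothetical.

```python
from collections import Counter
from itertools import combinations

def frequent_itemsets(transactions, min_support, constraint, max_size=2):
    """Count itemsets up to max_size that satisfy the user constraint."""
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))
        for k in range(1, max_size + 1):
            for itemset in combinations(items, k):
                # The constraint is applied during the search, so
                # uninteresting itemsets are never counted.
                if constraint(itemset):
                    counts[itemset] += 1
    return {s: c for s, c in counts.items() if c >= min_support}

# Hypothetical item prices and a user constraint: total price at most 30.
PRICE = {"milk": 2, "bread": 3, "tv": 500, "cable": 20}

def cheap(itemset):
    return sum(PRICE[item] for item in itemset) <= 30

transactions = [
    ["milk", "bread"],
    ["milk", "bread", "tv"],
    ["tv", "cable"],
    ["milk", "cable"],
]
patterns = frequent_itemsets(transactions, min_support=2, constraint=cheap)
print(patterns)
```

Here the constraint removes every itemset containing the expensive item before support is even computed, which illustrates how user guidance can shrink the search.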
• Integration of data mining with database systems, data warehouse systems, and Web database systems: Database systems, data warehouse systems, and the Web have become mainstream information processing systems. It is important to ensure that data mining serves as an essential data analysis component that can be smoothly integrated into such an information processing environment. As discussed earlier, a data mining system should be tightly coupled with database and data warehouse systems. Transaction management, query processing, on-line analytical processing, and on-line analytical mining should be integrated into one unified framework. This will ensure data availability, data mining portability, scalability, high performance, and an integrated information processing environment for multidimensional data analysis and exploration.
• Standardization of data mining language: A standard data mining language or other standardization efforts will facilitate the systematic development of data mining solutions, improve interoperability among multiple data mining systems and functions, and promote the education and use of data mining systems in industry and society. Recent efforts in this direction include Microsoft’s OLE DB for Data Mining (the appendix of this book provides an introduction), PMML, and CRISP-DM.
• Visual data mining: Visual data mining is an effective way to discover knowledge from huge amounts of data. The systematic study and development of visual data mining techniques will facilitate the promotion and use of data mining as a tool for data analysis.
• New methods for mining complex types of data: As shown in Chapters 8 to 10, mining complex types of data is an important research frontier in data mining. Although progress has been made in mining stream, time-series, sequence, graph, spatiotemporal, multimedia, and text data, there is still a huge gap between the needs for these applications and the available technology. More research is required, especially toward the integration of data mining methods with existing data analysis techniques for these types of data.
• Biological data mining: Although biological data mining can be considered under “application exploration” or “mining complex types of data,” the unique combination of complexity, richness, size, and importance of biological data warrants special attention in data mining. Mining DNA and protein sequences, mining high-dimensional microarray data, biological pathway and network analysis, link analysis across heterogeneous biological data, and information integration of biological data by data mining are interesting topics for biological data mining research.
• Data mining and software engineering: As software programs become increasingly bulky in size and sophisticated in complexity, and tend to originate from the integration of multiple components developed by different software teams, it is an increasingly challenging task to ensure software robustness and reliability. The analysis of the executions of a buggy software program is essentially a data mining process: tracing the data generated during program executions may disclose important patterns and outliers that may lead to the eventual automated discovery of software bugs. We expect that the further development of data mining methodologies for software debugging will enhance software robustness and bring new vigor to software engineering.
• Web mining: Issues related to Web mining were also discussed in Chapter 10. Given the huge amount of information available on the Web and the increasingly important role that the Web plays in today’s society, Web content mining, Weblog mining, and data mining services on the Internet will become one of the most important and flourishing subfields in data mining.
• Distributed data mining: Traditional data mining methods, designed to work at a centralized location, do not work well in many of the distributed computing environments present today (e.g., the Internet, intranets, local area networks, high-speed wireless networks, and sensor networks). Advances in distributed data mining methods are expected.
• Real-time or time-critical data mining: Many applications involving stream data (such as e-commerce, Web mining, stock analysis, intrusion detection, mobile data mining, and data mining for counterterrorism) require dynamic data mining models to be built in real time. Additional development is needed in this area.
• Graph mining, link analysis, and social network analysis: Graph mining, link analysis, and social network analysis are useful for capturing sequential, topological, geometric, and other relational characteristics of many scientific data sets (such as for chemical compounds and biological networks) and social data sets (such as for the analysis of hidden criminal networks). Such modeling is also useful for analyzing links in Web structure mining. The development of efficient graph and linkage models is a challenge for data mining.
• Multirelational and multidatabase data mining: Most data mining approaches search for patterns in a single relational table or in a single database. However, most real-world data and information are spread across multiple tables and databases. Multirelational data mining methods search for patterns involving multiple tables (relations) from a relational database. Multidatabase mining searches for patterns across multiple databases. Further research is expected in effective and efficient data mining across multiple relations and multiple databases.
• Privacy protection and information security in data mining: An abundance of recorded personal information available in electronic forms and on the Web, coupled with increasingly powerful data mining tools, poses a threat to our privacy and data security. Growing interest in data mining for counterterrorism also adds to the threat. Further development of privacy-preserving data mining methods is
Data Mining, Privacy, and Data Security
With more and more information accessible in electronic forms and available on the Web, and with increasingly powerful data mining tools being developed and put into use, there are increasing concerns that data mining may pose a threat to our privacy and data security. However, it is important to note that most of the major data mining applications do not even touch personal data. Prominent examples include applications involving natural resources, the prediction of floods and droughts, meteorology, astronomy, geography, geology, biology, and other scientific and engineering data. Furthermore, most studies in data mining focus on the development of scalable algorithms and also do not involve personal data. The focus of data mining technology is on the discovery of general patterns, not on specific information regarding individuals. In this sense, we believe that the real privacy concerns are with unconstrained access to individual records, as in credit card and banking applications, which must access privacy-sensitive information. For those data mining applications that do involve personal data, in many cases, simple methods such as removing sensitive IDs from data may protect the privacy of most individuals. Numerous data security-enhancing techniques have been developed recently. In addition, there has been a great deal of recent effort on developing privacy-preserving data mining methods. In this section, we look at some of the advances in protecting privacy and data security in data mining.
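The simple method mentioned above, removing sensitive IDs before mining, might look like the following sketch. Which fields count as sensitive is an assumption made for this example, as are the field names and the helper `strip_identifiers`.

```python
# Hypothetical set of fields treated as direct identifiers.
SENSITIVE_FIELDS = {"name", "ssn", "email"}

def strip_identifiers(records, sensitive=SENSITIVE_FIELDS):
    """Return copies of the records without the sensitive ID fields."""
    return [
        {key: value for key, value in record.items() if key not in sensitive}
        for record in records
    ]

customers = [
    {"name": "A. Smith", "ssn": "000-00-0000", "age": 34, "city": "Pune"},
    {"name": "B. Jones", "email": "b@example.com", "age": 51, "city": "Nagpur"},
]
mining_input = strip_identifiers(customers)
print(mining_input)
```

The original records are left untouched, so the identified data can stay in the operational system while only the stripped copies are passed to the mining step.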
In 1980, the Organization for Economic Co-operation and Development (OECD) established a set of international guidelines, referred to as fair information practices. These guidelines aim to protect privacy and data accuracy. They cover aspects relating to data collection, use, openness, security, quality, and accountability. They include the following principles:
Purpose specification and use limitation: The purposes for which personal data are collected should be specified at the time of collection, and the data collected should not exceed the stated purpose. Data mining is typically a secondary purpose of the data collection. It has been argued that attaching a disclaimer that the data may also be used for mining is generally not accepted as sufficient disclosure of intent. Due to the exploratory nature of data mining, it is impossible to know what patterns may be discovered; therefore, there is no certainty over how they may be used.
Openness: There should be a general policy of openness about developments, practices, and policies with respect to personal data. Individuals have the right to know the nature of the data collected about them, the identity of the data controller (responsible for ensuring the principles), and how the data are being used.
Security Safeguards: Personal data should be protected by reasonable security safeguards against such risks as loss or unauthorized access, destruction, use, modification, or disclosure of data.
Individual Participation: An individual should have the right to learn whether the data controller has data relating to him or her, and if so, what that data is. The individual may also challenge such data. If the challenge is successful, the individual has the right to have the data erased, corrected, or completed. Typically, inaccurate data are only detected when an individual experiences some repercussion from it, such as the denial of credit or withholding of a payment. The organization involved usually cannot detect such inaccuracies because they lack the contextual knowledge necessary.
“How can these principles help protect customers from companies that collect personal client data?” One solution is for such companies to provide consumers with multiple opt-out choices, allowing consumers to specify limitations on the use of their personal data, such as (1) the consumer’s personal data are not to be used at all for data mining; (2) the consumer’s data can be used for data mining, but the identity of each consumer or any information that may lead to the disclosure of a person’s identity should be removed; (3) the data may be used for in-house mining only; or (4) the data may be used in-house and externally as well. Alternatively, companies may provide consumers with positive consent, that is, by allowing consumers to opt in on the secondary use of their information for data mining. Ideally, consumers should be able to call a toll-free number or access a company website in order to opt in or out and request access to their personal data.
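The four opt-out choices listed above could be encoded as a small policy check, as in this sketch. The enum names and the `may_use` function are hypothetical. The sketch also assumes that choice (2) restricts only identification, not where the mining takes place, which the text does not state explicitly.

```python
from enum import Enum

class MiningConsent(Enum):
    NO_MINING = 1              # (1) no use for data mining at all
    ANONYMIZED_ONLY = 2        # (2) mining allowed once identity is removed
    IN_HOUSE_ONLY = 3          # (3) in-house mining only
    IN_HOUSE_AND_EXTERNAL = 4  # (4) in-house and external mining

def may_use(consent, external, identified):
    """Decide whether a record may be used for a given mining job.

    external:   True if the mining is done outside the company.
    identified: True if the record still carries identifying information.
    """
    if consent is MiningConsent.NO_MINING:
        return False
    if consent is MiningConsent.ANONYMIZED_ONLY:
        # Assumption: no restriction on location, only on identity.
        return not identified
    if consent is MiningConsent.IN_HOUSE_ONLY:
        return not external
    return True

allowed = may_use(MiningConsent.IN_HOUSE_ONLY, external=False, identified=True)
print(allowed)
```

An opt-in (positive consent) scheme would use the same check but default every consumer to `NO_MINING` until they choose otherwise.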
Counterterrorism is a new application area for data mining that is gaining interest. Data mining for counterterrorism may be used to detect unusual patterns, terrorist activities (including bioterrorism), and fraudulent behavior. This application area is in its infancy because it faces many challenges. These include developing algorithms for real-time mining (e.g., for building models in real time, so as to detect real-time threats such as that a building is scheduled to be bombed by 10 a.m. the next morning); for multimedia data mining (involving audio, video, and image mining, in addition to text mining); and in finding unclassified data to test such applications. While this new form of data mining raises concerns about individual privacy, it is again important to note that the data mining research is to develop a tool for the detection of abnormal patterns or activities, and the use of such tools to access certain data to uncover terrorist patterns or activities is confined only to authorized security agents.
“What can we do to secure the privacy of individuals while collecting and mining data?” Many data security-enhancing techniques have been developed to help protect data. Databases can employ a multilevel security model to classify and restrict data according to various security levels, with users permitted access to only their authorized level. It has been shown, however, that users executing specific queries at their authorized security level can still infer more sensitive information, and that a similar possibility can occur through data mining. Encryption is another technique in which individual data items may be encoded. This may involve blind signatures (which build on public key encryption), biometric encryption (e.g., where the image of a person’s iris or fingerprint is used to encode his or her personal information), and anonymous databases (which permit the consolidation of various databases but limit access to personal information to only those who need to know; personal information is encrypted and stored at different locations). Intrusion detection is another active area of research that helps protect the privacy of personal data.
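A multilevel security model of the kind described above can be sketched as a filter on labeled rows. The level names, their ordering, and the rule that a user may read rows at or below their clearance are assumptions for this example, and `visible_rows` is a hypothetical helper.

```python
# Hypothetical security levels, ordered from least to most sensitive.
LEVELS = {"public": 0, "confidential": 1, "secret": 2}

def visible_rows(rows, user_level):
    """Return the rows a user cleared at user_level is allowed to read.

    Assumption: a user may read rows labeled at or below their clearance.
    """
    limit = LEVELS[user_level]
    return [row for row in rows if LEVELS[row["level"]] <= limit]

rows = [
    {"id": 1, "level": "public", "value": "store opening hours"},
    {"id": 2, "level": "confidential", "value": "customer purchase history"},
    {"id": 3, "level": "secret", "value": "fraud investigation notes"},
]
visible = visible_rows(rows, "confidential")
print([row["id"] for row in visible])
```

A filter like this restricts direct access only. As the text notes, combinations of permitted queries can still allow sensitive values to be inferred.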
Privacy-preserving data mining is a new area of data mining research that is emerging in response to the need for privacy protection during mining. It is also known as privacy-enhanced or privacy-sensitive data mining. It deals with obtaining valid data mining results without learning the underlying data values. There are two common approaches: secure multiparty computation and data obscuration. In secure multiparty computation, data values are encoded using simulation and cryptographic techniques so that no party can learn another’s data values. This approach can be impractical when mining large databases. In data obscuration, the actual data are distorted by aggregation (such as using the average income for a neighborhood, rather than the actual income of residents) or by adding random noise. The original distribution of a collection of distorted data values can be approximated using a reconstruction algorithm. Mining can be performed using these approximated values, rather than the actual ones. Although a common framework for defining, measuring, and evaluating privacy is needed, many advances have been made. The field is expected to flourish.
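The two data obscuration techniques named above, aggregation and added random noise, can be illustrated with a short sketch. The record layout, the uniform noise distribution, and both function names are assumptions for this example. The reconstruction step that estimates the original distribution is not shown.

```python
import random
from collections import defaultdict

def aggregate_by_group(records, group_key, value_key):
    """Replace each value with the average of its group."""
    sums, counts = defaultdict(float), defaultdict(int)
    for record in records:
        sums[record[group_key]] += record[value_key]
        counts[record[group_key]] += 1
    return [
        {**record, value_key: sums[record[group_key]] / counts[record[group_key]]}
        for record in records
    ]

def add_noise(values, spread, rng):
    """Distort each value with zero-mean uniform noise in [-spread, spread]."""
    return [value + rng.uniform(-spread, spread) for value in values]

# Hypothetical resident incomes.
records = [
    {"neighborhood": "north", "income": 40000},
    {"neighborhood": "north", "income": 60000},
    {"neighborhood": "south", "income": 30000},
]
aggregated = aggregate_by_group(records, "neighborhood", "income")

incomes = [record["income"] for record in records]
noisy = add_noise(incomes, spread=5000, rng=random.Random(0))
print([record["income"] for record in aggregated])
```

Because the noise has zero mean, aggregate statistics over many distorted values stay close to the true ones, which is what makes mining on obscured data plausible.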
Like any other technology, data mining may be misused. However, we must not lose sight of all the benefits that data mining research can bring, ranging from insights gained from medical and scientific applications to increased customer satisfaction by helping companies better suit their clients’ needs. We expect that computer scientists, policy experts, and counterterrorism experts will continue to work with social scientists, lawyers, companies and consumers to take responsibility in building solutions to ensure data privacy protection and security. In this way, we may continue to reap the benefits of data mining in terms of time and money savings and the discovery of new knowledge.

 
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptxExploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
 
Biology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptxBiology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptx
 
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
 
microprocessor 8085 and its interfacing
microprocessor 8085  and its interfacingmicroprocessor 8085  and its interfacing
microprocessor 8085 and its interfacing
 
Application of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptxApplication of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptx
 
HARMONY IN THE HUMAN BEING - Unit-II UHV-2
HARMONY IN THE HUMAN BEING - Unit-II UHV-2HARMONY IN THE HUMAN BEING - Unit-II UHV-2
HARMONY IN THE HUMAN BEING - Unit-II UHV-2
 
Current Transformer Drawing and GTP for MSETCL
Current Transformer Drawing and GTP for MSETCLCurrent Transformer Drawing and GTP for MSETCL
Current Transformer Drawing and GTP for MSETCL
 
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCRCall Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
 
Introduction to Microprocesso programming and interfacing.pptx
Introduction to Microprocesso programming and interfacing.pptxIntroduction to Microprocesso programming and interfacing.pptx
Introduction to Microprocesso programming and interfacing.pptx
 
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
 
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfCCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptx
 

All types of mining and trends in data mining

  • 1. Mining Complex Data
• Vast amounts of data are stored in various complex forms, such as structured or unstructured, hypertext, and multimedia.
• Thus, mining complex types of data, including object data, spatial data, multimedia data, text data, and Web data, has become an increasingly important task in data mining.
• Multidimensional analysis and data mining can be performed in object-relational and object-oriented databases by (1) class-based generalization of complex objects, including set-valued, list-valued, and other sophisticated types of data, class/subclass hierarchies, and class composition hierarchies; (2) constructing object data cubes; and (3) performing generalization-based mining. A plan database can be mined by a generalization-based, divide-and-conquer approach in order to find interesting general patterns at different levels of abstraction.
• Spatial data mining is the discovery of interesting patterns from large geospatial databases. Spatial data cubes that contain spatial dimensions and measures can be constructed. Spatial OLAP can be implemented to facilitate multidimensional spatial data analysis. Spatial data mining includes mining spatial association and co-location patterns, clustering, classification, and spatial trend and outlier analysis.
• Multimedia data mining is the discovery of interesting patterns from multimedia databases that store and manage large collections of multimedia objects, including audio data, image data, video data, sequence data, and hypertext data containing text, text markups, and linkages. Issues in multimedia data mining include content-based retrieval and similarity search, and generalization and multidimensional analysis. Multimedia data cubes contain additional dimensions and measures for multimedia information. Other topics in multimedia mining include classification and prediction analysis, mining associations, and audio and video data mining.
• A substantial portion of the available information is stored in text or document databases that consist of large collections of documents, such as news articles, technical papers, books, digital libraries, e-mail messages, and Web pages. Text information retrieval and data mining has thus become increasingly important. Precision, recall, and the F-score are three basic measures from Information Retrieval (IR). Various text retrieval methods have been developed. These typically either focus on document selection (where the query is regarded as providing constraints) or document ranking
  • 2. (where the query is used to rank documents in order of relevance). The vector-space model is a popular example of the latter kind. Latent Semantic Indexing (LSI), Locality Preserving Indexing (LPI), and Probabilistic LSI can be used for text dimensionality reduction. Text mining goes one step beyond keyword-based and similarity-based information retrieval and discovers knowledge from semistructured text data using methods such as keyword-based association analysis, document classification, and document clustering.
• The World Wide Web serves as a huge, widely distributed, global information service center for news, advertisements, consumer information, financial management, education, government, e-commerce, and many other services. It also contains a rich and dynamic collection of hyperlink information, and access and usage information, providing rich sources for data mining. Web mining includes mining Web linkage structures, Web contents, and Web access patterns. This involves mining the Web page layout structure, mining the Web’s link structures to identify authoritative Web pages, mining multimedia data on the Web, automatic classification of Web documents, and Web usage mining.
Trends in Data Mining
• The diversity of data, data mining tasks, and data mining approaches poses many challenging research issues in data mining.
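As an aside on the text retrieval material in this slide, the vector-space model and the precision, recall, and F-score measures can be illustrated with a minimal Python sketch. It assumes raw term-frequency vectors with cosine similarity (real systems usually add TF-IDF weighting), and the documents, query, and relevance judgment are hypothetical examples, not data from the slides.

```python
from __future__ import annotations

import math
from collections import Counter


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_documents(query: str, docs: dict[str, str]) -> list[str]:
    """Document ranking: order document ids by decreasing similarity to the query."""
    q = Counter(query.lower().split())
    scores = {
        doc_id: cosine_similarity(q, Counter(text.lower().split()))
        for doc_id, text in docs.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


def precision_recall_f(retrieved: set[str], relevant: set[str]) -> tuple[float, float, float]:
    """Precision, recall, and F-score (harmonic mean of the two)."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score


# Hypothetical toy collection and relevance judgment.
DOCS = {
    "d1": "spatial data mining finds patterns in geospatial databases",
    "d2": "web mining analyzes link structures",
    "d3": "text mining and text retrieval for document databases",
    "d4": "cooking recipes for pasta",
}
RELEVANT = {"d3"}

if __name__ == "__main__":
    ranking = rank_documents("text mining", DOCS)
    top_two = set(ranking[:2])
    print(ranking)
    print(precision_recall_f(top_two, RELEVANT))
```

Cutting the ranked list at a fixed depth turns ranking into selection, which is why precision and recall can be reported for either kind of retrieval method.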
The development of efficient and effective data mining methods and systems, the construction of interactive and integrated data mining environments, the design of data mining languages, and the application of data mining techniques to solve large application problems are important tasks for data mining researchers and data mining system and application developers. This section describes some of the trends in data mining that reflect the pursuit of these challenges:
• Application exploration: Early data mining applications focused mainly on helping businesses gain a competitive edge. The exploration of data mining for businesses continues to expand as e-commerce and e-marketing have become mainstream elements of the retail industry. Data mining is increasingly used for the exploration of applications in other areas, such as financial analysis, telecommunications, biomedicine, and science. Emerging application areas include data mining for counterterrorism (including and beyond intrusion detection) and mobile (wireless) data mining. As generic data mining systems may have limitations in dealing with application-specific problems, we may see a trend toward the development of more
  • 3. application-specific data mining systems.
• Scalable and interactive data mining methods: In contrast with traditional data analysis methods, data mining must be able to handle huge amounts of data efficiently and, if possible, interactively. Because the amount of data being collected continues to increase rapidly, scalable algorithms for individual and integrated data mining functions become essential. One important direction toward improving the overall efficiency of the mining process while increasing user interaction is constraint-based mining. This provides users with added control by allowing the specification and use of constraints to guide data mining systems in their search for interesting patterns.
• Integration of data mining with database systems, data warehouse systems, and Web database systems: Database systems, data warehouse systems, and the Web have become mainstream information processing systems. It is important to ensure that data mining serves as an essential data analysis component that can be smoothly integrated into such an information processing environment. As discussed earlier, a data mining system should be tightly coupled with database and data warehouse systems. Transaction management, query processing, on-line analytical processing, and on-line analytical mining should be integrated into one unified framework. This will ensure data availability, data mining portability, scalability, high performance, and an integrated information processing environment for multidimensional data analysis and exploration.
• Standardization of data mining language: A standard data mining language or other standardization efforts will facilitate the systematic development of data mining solutions, improve interoperability among multiple data mining systems and functions, and promote the education and use of data mining systems in industry and society. Recent efforts in this direction include Microsoft’s OLE DB for Data Mining (the appendix of this book provides an introduction), PMML, and CRISP-DM.
• Visual data mining: Visual data mining is an effective way to discover knowledge from huge amounts of data. The systematic study and development of visual data mining techniques will facilitate the promotion and use of data mining as a tool for data analysis.
• New methods for mining complex types of data: As shown in Chapters 8 to 10, mining complex types of data is an important research frontier in data mining. Although progress has been made in mining stream, time-series, sequence, graph, spatiotemporal, multimedia, and text data, there is still a huge gap between the needs for these applications and the available technology. More research is required, especially
  • 4. toward the integration of data mining methods with existing data analysis techniques for these types of data.
• Biological data mining: Although biological data mining can be considered under “application exploration” or “mining complex types of data,” the unique combination of complexity, richness, size, and importance of biological data warrants special attention in data mining. Mining DNA and protein sequences, mining high-dimensional microarray data, biological pathway and network analysis, link analysis across heterogeneous biological data, and information integration of biological data by data mining are interesting topics for biological data mining research.
• Data mining and software engineering: As software programs become increasingly bulky in size, sophisticated in complexity, and tend to originate from the integration of multiple components developed by different software teams, it is an increasingly challenging task to ensure software robustness and reliability. The analysis of the executions of a buggy software program is essentially a data mining process: tracing the data generated during program executions may disclose important patterns and outliers that may lead to the eventual automated discovery of software bugs. We expect that the further development of data mining methodologies for software debugging will enhance software robustness and bring new vigor to software engineering.
• Web mining: Issues related to Web mining were also discussed in Chapter 10. Given the huge amount of information available on the Web and the increasingly important role that the Web plays in today’s society, Web content mining, Weblog mining, and data mining services on the Internet will become one of the most important and flourishing subfields in data mining.
• Distributed data mining: Traditional data mining methods, designed to work at a centralized location, do not work well in many of the distributed computing environments present today (e.g., the Internet, intranets, local area networks, high-speed wireless networks, and sensor networks). Advances in distributed data mining methods are expected.
• Real-time or time-critical data mining: Many applications involving stream data (such as e-commerce, Web mining, stock analysis, intrusion detection, mobile data mining, and data mining for counterterrorism) require dynamic data mining models to be built in real time. Additional development is needed in this area.
• Graph mining, link analysis, and social network analysis: Graph mining, link analysis, and social network analysis are useful for capturing sequential, topological, geometric, and other relational characteristics of many scientific data sets (such as for
  • 5. chemical compounds and biological networks) and social data sets (such as for the analysis of hidden criminal networks). Such modeling is also useful for analyzing links in Web structure mining. The development of efficient graph and linkage models is a challenge for data mining.
• Multirelational and multidatabase data mining: Most data mining approaches search for patterns in a single relational table or in a single database. However, most real-world data and information are spread across multiple tables and databases. Multirelational data mining methods search for patterns involving multiple tables (relations) from a relational database. Multidatabase mining searches for patterns across multiple databases. Further research is expected in effective and efficient data mining across multiple relations and multiple databases.
• Privacy protection and information security in data mining: An abundance of recorded personal information available in electronic forms and on the Web, coupled with increasingly powerful data mining tools, poses a threat to our privacy and data security. Growing interest in data mining for counterterrorism also adds to the threat. Further development of privacy-preserving data mining methods is foreseen.
Data Mining, Privacy, and Data Security
With more and more information accessible in electronic forms and available on the Web, and with increasingly powerful data mining tools being developed and put into use, there are increasing concerns that data mining may pose a threat to our privacy and data security.
However, it is important to note that most of the major data mining applications do not even touch personal data. Prominent examples include applications involving natural resources, the prediction of floods and droughts, meteorology, astronomy, geography, geology, biology, and other scientific and engineering data. Furthermore, most studies in data mining focus on the development of scalable algorithms and also do not involve personal data. The focus of data mining technology is on the discovery of general patterns, not on specific information regarding individuals. In this sense, we believe that the real privacy concerns are with unconstrained access of individual records, like credit card and banking applications, for example, which must access privacy-sensitive information. For those data mining applications that do involve personal data, in many cases, simple methods such as removing sensitive IDs from data may protect the privacy of most individuals. Numerous data security–enhancing techniques have been developed recently. In addition, there has been a great deal of recent effort on developing privacy-preserving data mining methods. In this section, we look at some of the advances in protecting privacy and data security in data mining.
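The "removing sensitive IDs" idea mentioned above could look roughly like the following Python sketch. The field names and sample records are hypothetical, and which fields count as sensitive is an assumption that depends on the data set. Dropping direct identifiers does not guarantee anonymity, since combinations of the remaining attributes can sometimes still single out a person.

```python
# Hypothetical set of direct identifiers; adjust per data set.
SENSITIVE_FIELDS = frozenset({"name", "ssn", "account_number"})


def strip_identifiers(records, sensitive=SENSITIVE_FIELDS):
    """Return copies of the records with the sensitive fields removed.

    The input records are left unmodified.
    """
    return [
        {key: value for key, value in record.items() if key not in sensitive}
        for record in records
    ]


# Hypothetical sample data for illustration only.
CUSTOMERS = [
    {"name": "A. Example", "ssn": "000-00-0000", "age": 34, "purchases": 12},
    {"name": "B. Example", "ssn": "000-00-0001", "age": 51, "purchases": 3},
]

if __name__ == "__main__":
    for row in strip_identifiers(CUSTOMERS):
        print(row)
```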
  • 6. In 1980, the Organization for Economic Co-operation and Development (OECD) established a set of international guidelines, referred to as fair information practices. These guidelines aim to protect privacy and data accuracy. They cover aspects relating to data collection, use, openness, security, quality, and accountability. They include the following principles:
• Purpose specification and use limitation: The purposes for which personal data are collected should be specified at the time of collection, and the data collected should not exceed the stated purpose. Data mining is typically a secondary purpose of the data collection. It has been argued that attaching a disclaimer that the data may also be used for mining is generally not accepted as sufficient disclosure of intent. Due to the exploratory nature of data mining, it is impossible to know what patterns may be discovered; therefore, there is no certainty over how they may be used.
• Openness: There should be a general policy of openness about developments, practices, and policies with respect to personal data. Individuals have the right to know the nature of the data collected about them, the identity of the data controller (responsible for ensuring the principles), and how the data are being used.
• Security Safeguards: Personal data should be protected by reasonable security safeguards against such risks as loss or unauthorized access, destruction, use, modification, or disclosure of data.
• Individual Participation: An individual should have the right to learn whether the data controller has data relating to him or her, and if so, what that data is. The individual may also challenge such data. If the challenge is successful, the individual has the right to have the data erased, corrected, or completed. Typically, inaccurate data are only detected when an individual experiences some repercussion from it, such as the denial of credit or withholding of a payment. The organization involved usually cannot detect such inaccuracies because they lack the contextual knowledge necessary.
“How can these principles help protect customers from companies that collect personal client data?” One solution is for such companies to provide consumers with multiple opt-out choices, allowing consumers to specify limitations on the use of their personal data, such as (1) the consumer’s personal data are not to be used at all for data mining; (2) the consumer’s data can be used for data mining, but the identity of each consumer or any information that may lead to the disclosure of a person’s identity should be removed; (3) the data may be used for in-house mining only; or (4) the data may be used in-house and externally as well. Alternatively, companies may provide consumers with positive consent, that is, by allowing consumers to opt in on the secondary use of their information for data mining. Ideally, consumers should be able to call a toll-free
  • 7. number or access a company website in order to opt in or out and request access to their personal data.
Counterterrorism is a new application area for data mining that is gaining interest. Data mining for counterterrorism may be used to detect unusual patterns, terrorist activities (including bioterrorism), and fraudulent behavior. This application area is in its infancy because it faces many challenges. These include developing algorithms for real-time mining (e.g., for building models in real time, so as to detect real-time threats such as that a building is scheduled to be bombed by 10 a.m. the next morning); for multimedia data mining (involving audio, video, and image mining, in addition to text mining); and in finding unclassified data to test such applications. While this new form of data mining raises concerns about individual privacy, it is again important to note that the data mining research is to develop a tool for the detection of abnormal patterns or activities, and the use of such tools to access certain data to uncover terrorist patterns or activities is confined only to authorized security agents.
“What can we do to secure the privacy of individuals while collecting and mining data?” Many data security–enhancing techniques have been developed to help protect data. Databases can employ a multilevel security model to classify and restrict data according to various security levels, with users permitted access to only their authorized level. It has been shown, however, that users executing specific queries at their authorized security level can still infer more sensitive information, and that a similar possibility can occur through data mining.
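The multilevel security model just described can be sketched in a few lines of Python. The three level names, the sample rows, and the rule that a user may read rows at or below their clearance are all assumptions made for illustration; a stricter scheme could expose only rows at exactly the user's level.

```python
from enum import IntEnum


class Level(IntEnum):
    """Hypothetical security levels, ordered from least to most sensitive."""
    PUBLIC = 0
    CONFIDENTIAL = 1
    SECRET = 2


def visible_rows(labeled_rows, clearance):
    """Return the rows whose label does not exceed the user's clearance."""
    return [row for row, level in labeled_rows if level <= clearance]


# Hypothetical labeled data: (row, security level) pairs.
LABELED_ROWS = [
    ({"region": "north", "sales": 120}, Level.PUBLIC),
    ({"customer_id": 17, "credit_limit": 5000}, Level.CONFIDENTIAL),
    ({"customer_id": 17, "fraud_flag": True}, Level.SECRET),
]

if __name__ == "__main__":
    print(visible_rows(LABELED_ROWS, Level.CONFIDENTIAL))
```

A filter like this only restricts direct access. It does nothing about the inference problem noted above, where permitted queries are combined to deduce more sensitive facts.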
Encryption is another technique in which individual data items may be encoded. This may involve blind signatures (which build on public key encryption), biometric encryption (e.g., where the image of a person’s iris or fingerprint is used to encode his or her personal information), and anonymous databases (which permit the consolidation of various databases but limit access to personal information to only those who need to know; personal information is encrypted and stored at different locations). Intrusion detection is another active area of research that helps protect the privacy of personal data.
Privacy-preserving data mining is a new area of data mining research that is emerging in response to privacy protection during mining. It is also known as privacy-enhanced or privacy-sensitive data mining. It deals with obtaining valid data mining results without learning the underlying data values. There are two common approaches: secure multiparty computation and data obscuration. In secure multiparty computation, data values are encoded using simulation and cryptographic techniques so that no party can learn another’s data values. This approach can be impractical when mining large databases. In data obscuration, the actual data are distorted by aggregation (such as using the average
  • 8. income for a neighborhood, rather than the actual income of residents) or by adding random noise. The original distribution of a collection of distorted data values can be approximated using a reconstruction algorithm. Mining can be performed using these approximated values, rather than the actual ones. Although a common framework for defining, measuring, and evaluating privacy is needed, many advances have been made. The field is expected to flourish.
Like any other technology, data mining may be misused. However, we must not lose sight of all the benefits that data mining research can bring, ranging from insights gained from medical and scientific applications to increased customer satisfaction by helping companies better suit their clients’ needs. We expect that computer scientists, policy experts, and counterterrorism experts will continue to work with social scientists, lawyers, companies and consumers to take responsibility in building solutions to ensure data privacy protection and security. In this way, we may continue to reap the benefits of data mining in terms of time and money savings and the discovery of new knowledge.
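The two data obscuration techniques described in the privacy-preserving data mining discussion, aggregation and added random noise, can be illustrated with a short Python sketch. The neighborhood names, income figures, and the choice of zero-mean Gaussian noise are assumptions for illustration. The sketch does not implement a reconstruction algorithm; it only shows that an aggregate statistic such as the mean stays approximately intact after distortion.

```python
import random
from statistics import mean


def obscure_by_aggregation(incomes_by_neighborhood):
    """Replace every resident's income with the neighborhood average."""
    return {
        hood: [mean(values)] * len(values)
        for hood, values in incomes_by_neighborhood.items()
    }


def obscure_by_noise(values, noise_sd, rng):
    """Add independent zero-mean Gaussian noise to each value."""
    return [value + rng.gauss(0.0, noise_sd) for value in values]


if __name__ == "__main__":
    # Hypothetical incomes grouped by neighborhood.
    print(obscure_by_aggregation({"north": [30000, 50000], "south": [80000]}))

    rng = random.Random(0)  # fixed seed so the run is repeatable
    incomes = [rng.uniform(20000, 120000) for _ in range(10000)]
    noisy = obscure_by_noise(incomes, noise_sd=5000, rng=rng)
    # Individual values are distorted, but the overall mean moves very little.
    print(round(mean(incomes)), round(mean(noisy)))
```

Because the noise has zero mean, its effect on the average shrinks as the collection grows, which is the intuition behind mining on approximated distributions instead of individual values.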