2. 06/04/16 07:10 PM2
Contents
HISTORY
INTRODUCTION
DATA VS. INFORMATION
DATA MINING: CONCEPT
TECHNIQUES USED IN DATA MINING
ALGORITHMS USED IN DATA MINING
ROLE OF DATA MINING
REVIEW OF LITERATURE
CONCLUSION
3. Introduction
Agriculture is a risky business
Depends on climate, geography, political and economic factors
Some risks can be quantified by mathematical and statistical methods and advanced computing
The challenge is to extract information from agricultural databases
Data mining is one technology that can bring knowledge to agricultural development
5. Evolution of Data Science
Evolutionary Steps | Enabling Technologies | Product Providers
Data Collection (1960s) | Computers, tapes | IBM
Data Access (1980s) | Relational databases | Oracle, Informix, IBM, Microsoft
Data Warehousing | OLAP, multidimensional databases, data warehouses | Pilot, MicroStrategy
Data Mining (Emerging Today) | Advanced algorithms, multiprocessor computers | Pilot, IBM, others
8. Rapid Growth of Data
Databases today are huge:
– More than 1,000,000 entities/records/rows
– From 10 to 10,000 fields/attributes/variables
– Gigabytes, terabytes, and now petabytes and exabytes
Databases are growing at an unprecedented rate
The corporate world is a cut-throat world
– Decisions must be made rapidly
– Decisions must be made with maximum knowledge
9. Knowledge Discovery in Databases (KDD)
Databases → Data Cleaning and Data Integration → Data Warehouse → Selection → Task-relevant Data → Data Mining → Pattern Evaluation → Knowledge
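The KDD steps can be sketched as a chain of small functions. This is a toy illustration only: the two data sources, the field names (`field`, `yield_t_ha`, `soil`, `district`), the threshold and the function names are hypothetical, and "mining" is reduced to computing the mean yield per soil type.

```python
from statistics import mean

# Hypothetical source 1: yield records, one with a missing value
yield_records = [
    {"field": "F1", "yield_t_ha": 2.5},
    {"field": "F2", "yield_t_ha": None},
    {"field": "F3", "yield_t_ha": 3.5},
    {"field": "F4", "yield_t_ha": 1.5},
]
# Hypothetical source 2: soil database keyed by field id
soil_records = {
    "F1": {"soil": "loam", "district": "A"},
    "F3": {"soil": "loam", "district": "B"},
    "F4": {"soil": "sandy", "district": "A"},
}


def clean(records):
    """Data cleaning: drop records with a missing yield value."""
    return [r for r in records if r["yield_t_ha"] is not None]


def integrate(records, soil):
    """Data integration: join both sources on the field id."""
    return [{**r, **soil[r["field"]]} for r in records if r["field"] in soil]


def select(records):
    """Selection: keep only the task-relevant attributes (soil, yield)."""
    return [(r["soil"], r["yield_t_ha"]) for r in records]


def mine(pairs):
    """Data mining (toy version): mean yield per soil type."""
    groups = {}
    for soil, value in pairs:
        groups.setdefault(soil, []).append(value)
    return {soil: mean(values) for soil, values in groups.items()}


def evaluate(patterns, threshold):
    """Pattern evaluation: keep only patterns at or above an interest threshold."""
    return {soil: y for soil, y in patterns.items() if y >= threshold}


knowledge = evaluate(mine(select(integrate(clean(yield_records), soil_records))), threshold=2.5)
print(knowledge)  # {'loam': 3.0}
```

Each function maps to one box of the KDD process, so any step can be swapped for a real technique (for example, a clustering algorithm in place of `mine`) without changing the others.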
10. Data Presentation
Graphical: bar charts, pie charts, histograms
Geometric: scatter plots
Icon-based: using colors and figures as icons
Pixel-based: data as colored pixels
11. What Is Data Mining?
12. Definition
The process of discovering interesting and useful patterns and relationships in large volumes of data.
(Encyclopaedia Britannica)
• Other synonyms for data mining are knowledge extraction, pattern analysis, and data archeology
14. Goals of Data Mining
Prediction: how certain attributes within the data will behave in the future
Identification: identify the existence of an item, an event, or an activity
Classification: partition the data into categories
Optimization: optimize the use of limited resources
15. Why Mine Data?
Lots of data is being collected and warehoused
– Web data, e-commerce
– Purchases at department/grocery stores
– Bank/Credit Card transactions
– Crop databases
Computers have become cheaper and more powerful
16. Why Mine Data? Scientific Viewpoint
Data are collected and stored at enormous speeds (GB/hour)
– Remote sensors on a satellite
– Telescopes scanning the skies
– Microarrays generating gene expression data
– Scientific simulations generating terabytes of data
Traditional techniques are infeasible for raw data
Data mining may help scientists
– In classifying and segmenting data
– In hypothesis formation
17. Sources of Data for Mining
Data warehouses
Transactional databases
Spatial and temporal
Time-series
Multimedia, text
18. What Techniques Are Used in Data Mining?
1. STATISTICS
2. MACHINE LEARNING
3. DATABASE SYSTEMS
4, 5, 6, 7… LET'S SEE
20. 2. Machine Learning
The ability to automatically learn to recognize complex patterns and make intelligent decisions based on data.
21. 3. Database Systems and Data Warehouses
A data warehouse is a repository of information collected from multiple sources, stored under a unified schema, and usually residing at a single site.
22. 4. Information Retrieval (IR)
The activity of obtaining information resources relevant to an information need from a collection of information resources.
Textual data mining and multimedia mining, integrated with information retrieval methods, are of great importance.
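Information retrieval systems rank documents by their relevance to a query. One common textbook scheme, shown here as a minimal sketch and not as a method used in the reviewed work, weights terms by TF-IDF (term frequency times log of inverse document frequency) and ranks documents by cosine similarity to the query. The documents, the whitespace tokenizer and all function names are illustrative assumptions.

```python
import math
from collections import Counter


def tokenize(text):
    """Very simple tokenizer: lower-case and split on whitespace."""
    return text.lower().split()


def tf_idf(tokens, df, n_docs):
    """Weight each term by term frequency * log(N / document frequency)."""
    counts = Counter(tokens)
    return {t: (c / len(tokens)) * math.log(n_docs / df[t])
            for t, c in counts.items() if df[t]}


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query, docs):
    """Return (score, doc_index) pairs, most relevant first."""
    tokenized = [tokenize(d) for d in docs]
    df = Counter(t for toks in tokenized for t in set(toks))  # document frequency
    doc_vecs = [tf_idf(toks, df, len(docs)) for toks in tokenized]
    q_vec = tf_idf(tokenize(query), df, len(docs))  # query terms unseen in the docs are skipped
    return sorted(((cosine(q_vec, v), i) for i, v in enumerate(doc_vecs)), reverse=True)


# Hypothetical document collection
docs = [
    "rainfall prediction for kharif crops",
    "soil classification with gps",
    "pest damage in pigeonpea crops",
]
print(search("soil gps", docs)[0][1])  # 1
```

Terms that occur in every document get an IDF of log(1) = 0, so common words contribute nothing to the ranking.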
24. 6. Pattern Recognition
A branch of machine learning that focuses on the recognition of patterns and regularities in data.
25. 7. High Performance Computing
The use of parallel processing to run advanced application programs efficiently, reliably, and quickly.
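A common pattern behind parallel processing of large data sets is split, process concurrently, then combine. The sketch below is a hypothetical example using Python's standard `concurrent.futures`. It uses threads so that it runs anywhere without extra setup. For CPU-bound Python code, `ProcessPoolExecutor` offers the same `map` interface and runs work on several cores, but it needs the usual `if __name__ == "__main__":` guard.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum_of_squares(chunk):
    """Work done independently on one chunk of the data."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(data, workers=4):
    # Split the data into roughly equal chunks, one per worker (ceiling division)
    size = max(1, -(-len(data) // workers))
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    # Process the chunks concurrently, then combine the partial results
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(partial_sum_of_squares, chunks))


print(parallel_sum_of_squares(list(range(1000))))  # 332833500
```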
26. Regression Models
Regression is a data mining (machine learning) technique used to fit an equation to a dataset.
The simplest form, linear regression, fits a straight line y = mx + c: it determines approximate values for m and c so that y can be calculated for a particular value of x.
Multiple regression uses more than one input variable and allows more complex models to be fitted.
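For the straight line y = mx + c, the ordinary least-squares estimates are m = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and c = ȳ − m·x̄. The following is a minimal sketch of that calculation. The data and the function names `fit_line` and `predict` are hypothetical. Multiple regression extends the same idea to several inputs by solving a least-squares problem in matrix form, which is not shown here.

```python
def fit_line(xs, ys):
    """Ordinary least-squares estimates of m and c for y = m*x + c."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    m = s_xy / s_xx          # slope
    c = y_bar - m * x_bar    # intercept
    return m, c


def predict(m, c, x):
    """Value of y on the fitted line for a given x."""
    return m * x + c


# Hypothetical data: x could be a fertilizer dose, y a yield
m, c = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(m, c, predict(m, c, 5))  # 2.0 1.0 11.0
```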
28. Data Mining Methodologies
Neural Networks
K-means
Fuzzy set
Bayesian network
K-nearest Neighbour
Support Vector Machine
Decision Tree Analysis
WEKA Tool
29. Neural Networks
An information processing paradigm inspired by the way biological nervous systems, such as the brain, process information.
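The smallest neural network is a single artificial neuron (a perceptron). It computes a weighted sum of its inputs and fires if the sum exceeds a threshold, and it learns by adjusting the weights after each mistake. This is a minimal sketch of that idea, not the ANN architectures used in the studies reviewed later. The training data (logical AND), learning rate and function names are illustrative assumptions.

```python
def predict(weights, bias, inputs):
    """Fire (1) if the weighted sum of the inputs plus the bias is positive."""
    activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if activation > 0 else 0


def train_perceptron(samples, epochs=20, lr=0.1):
    """samples: list of (inputs, target) pairs with target 0 or 1."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - predict(weights, bias, inputs)
            # Perceptron learning rule: move the weights toward the correct output
            weights = [w + lr * error * x for w, x in zip(weights, inputs)]
            bias += lr * error
    return weights, bias


# Learn logical AND (linearly separable, so the perceptron rule converges)
and_samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train_perceptron(and_samples)
print([predict(w, b, x) for x, _ in and_samples])  # [0, 0, 0, 1]
```

Multi-layer networks stack many such neurons with smooth activation functions and train them with backpropagation, which lets them model patterns that a single neuron cannot separate.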
31. Waikato Environment for Knowledge Analysis (WEKA)
i. Machine learning software written in Java
ii. Free software
iii. Used to analyze data from agricultural domains
iv. Visualization tools and algorithms for data analysis and predictive modelling
32. Advantages of WEKA include
• Runs on almost any modern computing platform.
• A comprehensive collection of data preprocessing and modeling techniques.
• Ease of use due to its graphical user interfaces.
• Supports several standard data mining tasks, more specifically, data preprocessing, clustering, classification, regression, visualization, and feature selection.
33. Commercial Tools
Oracle Data Miner
– http://www.oracle.com
Data To Knowledge
– http://alg.ncsa.uiuc.edu
SAS
– http://www.sas.com/
Clementine
– http://spss.com/clemetine/
Intelligent Miner
– http://www-306.ibm.com/software
34. MATLAB
MATLAB (matrix laboratory) is a multi-paradigm numerical computing environment and fourth-generation programming language.
A proprietary programming language developed by MathWorks, MATLAB allows matrix manipulations, plotting of functions and data, implementation of algorithms, creation of user interfaces, and interfacing with programs written in other languages, including C, C++, Java, Fortran and Python.
35. R
• R is a programming language and software environment for statistical computing and graphics supported by the R Foundation for Statistical Computing.
• Widely used among statisticians and data miners for developing statistical software and for data analysis.
• Polls, surveys of data miners, and studies of scholarly literature databases show that R's popularity has increased substantially in recent years.
36. Role of Data Mining in Agriculture
Influence of climate on kharif and rabi crops
Crop yield estimation
Estimation of damage caused by pests
Mushroom grading
Spatial data mining reveals interesting patterns related to agriculture
37. Role in Agriculture Domain
Data mining methodology | Applications
Neural Networks | Weather forecasting, prediction of rainfall
K-means | Classifying soil in combination with GPS, wine fermentation problem, yield prediction
Fuzzy set | Detecting weeds in precision agriculture
Bayesian network | A model developed for agricultural purposes based on the Bayesian network learning method
K-nearest Neighbour | Simulating daily precipitation and other weather conditions
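The table lists K-means for classifying soil. Below is a minimal sketch of the standard K-means procedure (Lloyd's algorithm): assign each point to its nearest centroid, move each centroid to the mean of its points, and repeat until the assignments stop changing. The soil samples (pH, organic carbon %) are hypothetical, and the reviewed studies' features and data are not reproduced. Using the first k points as initial centroids is a simplification. Practical implementations use random or k-means++ initialisation.

```python
import math


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; the first k points serve as initial centroids (a simplification)."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its member points
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(coord) / len(members) for coord in zip(*members)]
    return labels, centroids


# Hypothetical soil samples: (pH, organic carbon %)
samples = [(5.1, 0.40), (5.3, 0.50), (5.0, 0.45), (7.8, 1.20), (8.0, 1.10), (7.9, 1.30)]
labels, centres = kmeans(samples, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

In practice the attributes would be scaled first, because K-means uses Euclidean distance and an attribute with a large numeric range would otherwise dominate the clustering.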
40. A Study on Effect of Weather Parameters by Artificial Neural Networks on Yield of Aonla (Indian Gooseberry) under Different Fertilizers Treatments
The range of temperature in April has the highest correlation coefficient with yield
Extreme variations negatively affect aonla fruit yield.
Kulshrestha et al., 2010, Anand
41. Geospatial Data Mining Techniques: Knowledge Discovery in Agriculture
Visualization of the spatial OLAP of Gujarat
Concluded that integration of computer science with agriculture will open a new dimension in the management of agricultural information
Bhojani, 2013, Anand
42. Time Series Forecasting of Losses due to Pod Borer, Pod Fly and Productivity of Pigeonpea (Cajanus cajan) for North West Plain Zone (NWPZ) by Using Artificial Neural Network (ANN)
Pod damage by pod fly:
– 2010–11: 22.21%
– 2011–12: 18.77%
– 2012–13: 16.72%
A linear regression between the network outputs and the corresponding targets gave an R² value of 0.88
Indicating that the fit was reasonably good for all data sets
Kumari et al., 2014, Varanasi
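The R² reported above measures how well a straight line fitted between network outputs and targets explains the targets. For a least-squares line, that R² equals the square of the Pearson correlation between the two series. A minimal sketch follows, assuming paired lists of outputs and targets. The numbers are made up and unrelated to the study, and `regression_r2` is a hypothetical name.

```python
def regression_r2(outputs, targets):
    """R² of the least-squares line between outputs and targets (= squared Pearson r)."""
    n = len(outputs)
    mean_o = sum(outputs) / n
    mean_t = sum(targets) / n
    s_ot = sum((o - mean_o) * (t - mean_t) for o, t in zip(outputs, targets))
    s_oo = sum((o - mean_o) ** 2 for o in outputs)
    s_tt = sum((t - mean_t) ** 2 for t in targets)
    return s_ot * s_ot / (s_oo * s_tt)


# Hypothetical network outputs vs. targets
print(regression_r2([1, 2, 3, 4], [1, 3, 2, 4]))  # 0.64
```

A value of 1 means the outputs lie exactly on a line against the targets. A value near 0 means the network outputs explain almost none of the variation in the targets.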
43. A Survey on Data Mining Techniques for Crop Yield Prediction
The K-Means algorithm is used to forecast pollution
K Nearest Neighbor (KNN) is applied to simulate daily precipitation
The K-Means approach is used for classifying soils in combination with GPS
Ramesh et al., 2014, Bangalore
44. A Survey on Data Mining Techniques in Agriculture
The use of information technology in agriculture can change decision making and help farmers obtain better yields
Data mining plays a crucial role in decision making on several issues related to agriculture.
Geetha et al., 2015, Coimbatore
45. Prediction of Students Recruitment Process Using Data Mining Techniques with Classification Rules
Malathi et al., 2015, Tamil Nadu
46. Text Recognition
Feature extraction, clustering, and pattern matching
k-NN classifier
Krishan et al., 2016, Haridwar
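A k-NN classifier labels a new example by a majority vote among the k training examples closest to it. In text recognition, the feature-extraction step would turn each character image into such a vector. Below is a minimal sketch with Euclidean distance. The 2-D feature vectors, labels and the function name `knn_predict` are hypothetical stand-ins for real extracted features.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Label `query` by majority vote among its k nearest training examples (Euclidean distance)."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical 2-D feature vectors extracted from character images
train = [
    ((1.0, 1.0), "a"), ((1.2, 0.9), "a"), ((0.9, 1.1), "a"),
    ((5.0, 5.0), "b"), ((5.2, 4.9), "b"), ((4.8, 5.1), "b"),
]
print(knn_predict(train, (1.1, 1.0)))  # a
print(knn_predict(train, (4.9, 5.0)))  # b
```

k-NN has no training phase. All the work happens at prediction time, which is why large training sets usually need an index structure such as a k-d tree.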
47. The Application of Data Mining Techniques to Characterize Agricultural Soil Profiles
Armstrong et al., Australia
48. Implementation of Data Mining Techniques for Meteorological Data Analysis
Sarah et al., Palestine
49. Data Mining: An Evolutionary View of Agriculture
GPS techniques may be employed to discover important information from agriculture-related data, such as soil identification.
Data mining techniques were adopted to analyze and estimate crop yield from existing data.
Abhishek et al., 2014, Amaravati
50. Data Mining in Agriculture on Crop Price Prediction: Techniques and Applications
Data mining techniques were adopted to estimate crop prices from existing data
The K-Means approach was used, utilizing only the basic algorithm
Manpreet et al., 2014, Bangalore
51. Data Mining Techniques for Predicting Crop Productivity: A Review Article
Data mining is a relatively novel research field and is expected to grow in the future
A multidisciplinary approach integrating computer science with agriculture will help in forecasting and managing agricultural crops effectively.
Veenadhari et al., 2011, Bhopal
52. Applying Data Mining Techniques in the Field of Agriculture and Allied Sciences
Describes different data mining techniques used in the agriculture domain
The K-means algorithm, the ID3 algorithm, and k-nearest neighbour are the most used
Yethiraj, 2012, Bangalore
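ID3 builds a decision tree top-down. At each node it splits on the attribute with the highest information gain, meaning the largest drop in class entropy, and it stops when a node is pure or no attributes remain. Below is a minimal sketch for categorical attributes. The rainfall/soil/yield records and all function names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Reduction in entropy of `target` obtained by splitting `rows` on `attribute`."""
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy([r[target] for r in rows]) - remainder


def id3(rows, attributes, target):
    """Build a tree as nested dicts: {attribute: {value: subtree_or_label}}."""
    labels = [r[target] for r in rows]
    if len(set(labels)) == 1:
        return labels[0]  # pure node
    if not attributes:
        return Counter(labels).most_common(1)[0][0]  # majority label
    best = max(attributes, key=lambda a: information_gain(rows, a, target))
    rest = [a for a in attributes if a != best]
    return {best: {v: id3([r for r in rows if r[best] == v], rest, target)
                   for v in {r[best] for r in rows}}}


# Hypothetical training data
rows = [
    {"rain": "high", "soil": "loam", "yield": "good"},
    {"rain": "high", "soil": "sandy", "yield": "good"},
    {"rain": "low", "soil": "loam", "yield": "poor"},
    {"rain": "low", "soil": "sandy", "yield": "poor"},
]
print(id3(rows, ["rain", "soil"], "yield"))
```

Here rainfall separates the classes perfectly (gain of 1 bit) while soil type gives no gain, so the tree splits on `rain` alone.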
53. Data Mining Techniques and Applications to Agricultural Yield Data
The MLR (multiple linear regression) technique gave 98% accuracy and the K-Means algorithm gave 96% accuracy
Ramesh, 2013, Andhra Pradesh
54. Conclusion
Data mining is a boon for handling large data in agriculture
Extraction of knowledge is a big challenge
Many data mining techniques have been developed to tackle this challenge
Skill is also required to handle the tools and techniques
Temporal data is data that varies over time
Time series forecasting is the use of a model to predict future values based on previously observed values. While regression analysis is often employed in such a way as to test theories that the current values of one or more independent time series affect the current value of another time series, this type of analysis of time series is not called "time series analysis", which focuses on comparing values of a single time series or multiple dependent time series at different points in time.[2]
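One of the simplest time series forecasting models predicts the next value as the mean of the most recent observations (a moving-average forecast). It is shown here only as a minimal sketch of "predicting future values from previously observed values". It is far simpler than the ANN models in the reviewed studies. The observations, the window length and the function names are hypothetical.

```python
def moving_average_forecast(series, window=3):
    """Forecast the next value as the mean of the last `window` observations."""
    if len(series) < window:
        raise ValueError("series is shorter than the window")
    return sum(series[-window:]) / window


def one_step_forecasts(series, window=3):
    """One-step-ahead forecasts for each point that has `window` earlier observations."""
    return [sum(series[i - window:i]) / window for i in range(window, len(series))]


# Hypothetical yearly observations
observed = [10, 12, 14, 16, 15]
print(moving_average_forecast(observed))  # 15.0
print(one_step_forecasts(observed))       # [12.0, 14.0]
```

Comparing `one_step_forecasts` against the observed values it was meant to predict gives a simple backtest of the forecasting model.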
mathematical function in terms of random variables and probability functions)
Data warehouse needs consistent integration of quality data
Flat files are simple data files in text or binary format
In mathematics, fuzzy sets are sets whose elements have degrees of membership. Fuzzy sets were introduced by Lotfi A. Zadeh as an extension of the case of multi-valued logic.
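In a fuzzy set, membership is a degree between 0 and 1 rather than a yes/no value. A common way to define such a degree is a triangular membership function. Below is a minimal sketch, assuming hypothetical fuzzy sets "warm" (temperature in °C) and "humid" (relative humidity %). It also shows Zadeh's standard fuzzy intersection, which takes the minimum of the two degrees.

```python
def triangular_membership(x, a, b, c):
    """Degree of membership (0..1) in a triangular fuzzy set with feet a, c and peak b (a < b < c)."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Hypothetical fuzzy set "warm" for daily temperature in °C
for t in (15, 20, 25, 30, 40):
    print(t, triangular_membership(t, 15, 25, 35))  # 0.0, 0.5, 1.0, 0.5, 0.0

# Fuzzy intersection ("warm AND humid") as the minimum of the membership degrees
warm = triangular_membership(28, 15, 25, 35)
humid = triangular_membership(70, 50, 80, 100)
print(min(warm, humid))
```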