Agri Data Mining/Warehousing:
Innovative Tools for Analysis of Integrated Agricultural & Meteorological Data
Ahsan Abdullah, National University of Computers & Emerging Sciences, Islamabad, Pakistan
Stephen Brobst, Teradata Division, NCR, Dayton, OH, USA
Ijaz Pervaiz, Directorate of Pest Warning & Quality Control of Pesticides, Punjab
Muhammad Umer, Azhar Nisar, National University of Computers & Emerging Sciences, Islamabad, Pakistan
Abstract

Every year significant yield loss occurs in Pakistan due to pest attacks on cash crops. Although pesticides have been used, the desired correlation has not been observed between yield and pesticide usage. Different government departments and agencies have the task of monitoring dynamic agricultural situations all around Pakistan, but the data collected has never been integrated and standardized to give a complete picture and answer several pressing questions. In this paper we discuss a pilot project implementation of a data warehouse for the analysis of the above-mentioned data in an integrated fashion. Indigenously developed Data Mining and OLAP (Online Analytical Processing) tools were used to analyze the data.

1. Introduction

Every year significant yield loss occurs in Pakistan due to pest attacks on cash crops. To monitor and ultimately counter these attacks, different government departments and agencies have the task of keeping an eye on dynamic agricultural situations all around Pakistan. As a result, thousands of digital and non-digital data files are generated from hundreds of pest-scouting and yield surveys, agro-meteorological data collection and other such undertakings. The collected data, due to its multivariate nature and disparate origins, has never been integrated and thus does not provide a complete picture. The lack of data integration (and standardization) therefore contributes to an under-utilization of historical data, and inevitably results in a limited ability to perform even simple analysis.

Data such as pest scouting, pesticide usage and agro-meteorological recordings holds huge analytical potential in at least two major respects: firstly, short-term forecasting and day-to-day tactical handling of issues related to crop and pest management; and secondly, long-term forecasting, strategic planning and policy making, where a whole history of events must be synthesized. The latter is where we, as a nation, are seriously lacking today, even though we possess the basic input for such an undertaking, i.e. historic monitoring data covering more than two decades. What we do not possess is the integration of this data in a single standardized format, together with a set of automated tools that can complement the exploratory analysis of this massive data.

In this paper we discuss a pilot project implementation of a data warehouse for the analysis of the above-mentioned data in an integrated fashion. Such a data warehouse can best support a new breed of analytical tools, including Data Mining and Online Analytical Processing (OLAP), described in Section 3. No such work has ever been undertaken in the agriculture sector of Pakistan. Data warehouses are quite popular in industries such as telecommunication, retail sales, manufacturing and scientific research, but an application in the agriculture sector is a novel idea, and we have been able to show its strength through an actual implementation.

The rest of this paper is organized as follows: Section 2 gives the motivation behind this work; Section 3 gives the necessary technical background and an explanation of the techniques and methods we have employed; Sections 4 and 5 discuss the construction of the Agri Data Warehouse, while Section 6 describes various analytical operations performed over it. Section 7 gives a roundup of related work. Lessons learnt and conclusions are summarized in Sections 8 and 9.

2. Motivation

Pakistan's agricultural sector contributes more than 24% of GDP, employs about 44% of the labor force, and
directly sustains 75% of the population and accounts for 30% of exports. More importantly, it accounts for about 60% of total foreign exchange earnings. Textile exports comprise more than 60% of Pakistan's total exports; thus the success or failure of the cotton crop has a direct bearing on textile exports. Cotton production is the inherent comparative advantage of the textile sector of Pakistan.

Punjab is the main producer of agricultural commodities in Pakistan, producing 83% of the cotton, 72% of the wheat, 95% of the rice, 56% of the sugarcane and 35% of the maize. For this reason, the Punjab is commonly known as the bread basket of Pakistan. After the crash of the cotton crop in 1983, the Government of Punjab decided to enhance crop monitoring facilities and established the Directorate of Pest Warning and Quality Control of Pesticides (DPWQCP) in 1984.

Pest scouting is a systematic field sampling process that provides field-specific information on pest pressure and crop injury. The motivation for this work arose from the need to gain better insight into the dynamics of crop growth, using the data generated constantly by the pest scouting program implemented by DPWQCP in the province of Punjab.

Over the years, DPWQCP has perfected the activity of pest scouting, such that the data it generates gives true and unbiased coverage of the whole acreage. Scouts move from field to field in their area of jurisdiction, collect statistics on the pest situation from various fixed and random sampling points, and keep a check on pest population dynamics. Collection of data regarding farmer demographics (acreage, variety sown, date of sowing etc.) and pesticide usage history (amount of pesticide used, spray dates etc.) is an essential part of the data collected by the Directorate. Table 1 gives a brief description of the attributes recorded at each point.

Sr. no   Attribute
1        Date of Visit
2        Farmer Name and Address
3        Acreage
4        Variety(ies) Sown
5        Plant Population
6        Pest Population
7        Predator Population
8        Pesticide Spray Dates
9        Pesticide(s) Used

Table 1: Attributes Recorded by DPWQCP Surveyors

This data is collected weekly from a hundred-odd points in every district, which amounts to nearly 3400 recordings per week for the whole province.

The above exercise has been in place, with more or less the same vigor and shape, for the last two decades. The amount of data accumulated until now is enormous, both horizontally (scores of factors for which recordings are made) and vertically (a coarse estimate stands at more than 3 million records). Moreover, for a detailed analysis, scouting data needs to be integrated with other data elements such as crop yield and prices over the years and, most importantly, weather data for the same period.

Relying on the human brain alone to synthesize the information contained in this data is not only impractical but also unjust. Our motivation is precisely this: complementing knowledge discovery in this massive data using modern information management tools, specifically Data Warehouse, Data Mining and OLAP.

3. Technological Background

Taking into account the diverse audience that this paper caters to, we present a brief introduction to some technological concepts from the IT domain that need to be understood for an appropriate appreciation of our work.

3.1. Data Warehouse

A data warehouse is an integrated and time-varying collection of data primarily used for the support of management decision-making [14, 9]. A data warehouse often integrates heterogeneous data from multiple and distributed information sources and contains historical and aggregated data.

A major misconception is to assume that a "data warehouse is a warehouse of data". It is true that a data warehouse normally contains a large amount of data, but that is not a requirement for building one. The major requirements attributed to a data warehouse are its ability to complement the process of analytical querying through a simple schema, efficient implementation and optimized performance.

3.2. Data Mining

Data mining is the exploration and analysis of extremely large quantities of multivariate data by
automatic means in order to discover meaningful patterns and rules.

Data mining is regarded as a knowledge discovery process, i.e. no prior assumption or hypothesis is made about the data to be proved or disproved through mining. Furthermore, it operates in an undirected knowledge discovery mode, where one attempts to find patterns or similarities among groups of records without the use of a particular target field or a collection of predefined classes.

3.3. OLAP

Aggregate queries (such as sum, average etc.) are used frequently in decision support applications, where the basic goal is to collect information from detail tables. OLAP tools capitalize on these aggregate queries by generating and storing answers to the possible aggregate queries in advance, and provide a powerful and intuitive Graphical User Interface (GUI). The most popular OLAP features are drill-down and roll-up aggregates.

OLAP tools are powerful and fast tools for reporting on data; fundamentally, though, they depend upon human intelligence coupled with domain expertise for the extraction of valuable information.

3.4. The Connection between Data Mining, OLAP & Data Warehousing

Data mining algorithms require data as input, which need not come from a data warehouse. Still, a data warehouse simplifies the job of the data miner. Where data mining is to be performed on data arriving in huge volumes, from multiple sources, with inconsistent representation and with an inherent time disparity (monthly vs. weekly vs. daily data), a single and consistent source of truth, i.e. the data warehouse, may be the only solution.

OLAP and data mining are complementary; both are important parts of exploiting data. OLAP is a presentation tool that can enable manual knowledge discovery, while data mining is an automated knowledge discovery process. Quite often, an OLAP tool is used to explore the results generated by data mining in more detail.

4. Pest-Pesticide-Meteorology Data Warehouse (PPM-DWH): The Pilot Project

A pilot project strategy is highly recommended in data warehouse construction. As full-sized data warehouse construction requires a huge amount of capital, effort and resources, it must be attempted only after a thorough understanding of the domain and a valid proof of concept. A small-scale project serves many purposes, such as (i) providing a valid proof of concept, (ii) establishing blueprint processes for the later full-blown project, (iii) identifying problem areas and (iv) revealing true data demographics. Agri data warehousing is unexplored territory requiring knowledge of a multitude of domains; hence we deem building a small-scale version in the first iteration to be the best strategy. A detailed proposal for the full-blown project has already been submitted to the Pakistan Agriculture Research Council (PARC), Islamabad, for funding under the Agriculture Linkages Program, and was under review by PARC when this paper went to press. The full-blown project will cover all 34 districts of Punjab, 10 years of pest scouting data, and meteorological data for 53 elements recorded at seven observatories.

For the pilot project, we limited ourselves to the pest scouting data of the cotton crop recorded in District Multan (Figure 1) during the cotton growing seasons 2000-01 and 2001-02. This data was recorded weekly from more than 100 fields of District Multan, as per the normal practice of DPWQCP. Meteorological data for the same dates was arranged through a different source. The following sections give the details of our Pest-Pesticide-Meteorology Data Warehouse (PPM-DWH), issues in its construction, and a discussion of the results it generated after the processes of mining and iterative analysis.

Figure 1: Area under study for pilot project: three Tehsils of District Multan

Fig. 2: The Overall Process
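To make the roll-up and drill-down operations of Section 3.3 concrete, the Python sketch below builds a tiny star schema in an in-memory SQLite database and aggregates one fact at three levels of a Location/Time hierarchy. Every table name, column name and value is a hypothetical illustration, not the PPM-DWH schema (which the paper does not show). Also note that a real OLAP server typically pre-computes such aggregates, whereas this sketch computes them at query time with GROUP BY.

```python
import sqlite3

# Hypothetical miniature star schema: a fact table of pest counts plus two
# dimension tables. Names and values are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_location (loc_id INTEGER PRIMARY KEY, district TEXT, tehsil TEXT);
CREATE TABLE dim_time (time_id INTEGER PRIMARY KEY, year INTEGER, week INTEGER);
CREATE TABLE fact_scouting (loc_id INTEGER, time_id INTEGER, jassid REAL);
""")
conn.executemany("INSERT INTO dim_location VALUES (?, ?, ?)",
                 [(1, "Multan", "Tehsil A"), (2, "Multan", "Tehsil B")])
conn.executemany("INSERT INTO dim_time VALUES (?, ?, ?)",
                 [(1, 2001, 30), (2, 2001, 31)])
conn.executemany("INSERT INTO fact_scouting VALUES (?, ?, ?)",
                 [(1, 1, 0.5), (1, 2, 1.5), (2, 1, 2.0), (2, 2, 4.0)])

# GROUP BY columns per hierarchy level: moving down the dict is a drill-down,
# moving up is a roll-up. Only these fixed strings are interpolated into SQL.
LEVELS = {
    "district": "l.district",
    "tehsil": "l.district, l.tehsil",
    "tehsil_week": "l.district, l.tehsil, t.week",
}

def average_jassid(level):
    """Average jassid count aggregated at the requested hierarchy level."""
    cols = LEVELS[level]  # raises KeyError for an unknown level
    query = (
        f"SELECT {cols}, AVG(f.jassid) FROM fact_scouting AS f "
        "JOIN dim_location AS l ON f.loc_id = l.loc_id "
        "JOIN dim_time AS t ON f.time_id = t.time_id "
        f"GROUP BY {cols} ORDER BY {cols}"
    )
    return conn.execute(query).fetchall()

print(average_jassid("district"))  # roll-up: [('Multan', 2.0)]
print(average_jassid("tehsil"))    # drill-down: [('Multan', 'Tehsil A', 1.0), ('Multan', 'Tehsil B', 3.0)]
```

Keeping the hierarchy levels in a single lookup table is a design choice of this sketch: adding a new level (for example Markaz, or month in the Time dimension) only requires a new entry, not a new query.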
5 Development Life Cycle for Pilot Project

Figure 2 gives a panoramic view of the life cycle that we undertook for this study. The overall process can be divided into four major phases: Requirement Analysis, Input Data Acquisition, Implementation and Analytical Operations. We look at each of these in the subsequent sections. Readers interested in the technical details of the implementation are referred to [1b].

5.1 Input Data Acquisition and ETL Phase

5.1.1 Pest Scouting Data

Field-level acquisition of pest scouting data can be summarized as follows: trained surveyors from DPWQCP visit a point and note the recordings against the attributes given in Table 1. These readings are later typed on a standard sheet and stored in hard copy.

For the PPM-DWH implementation, these sheets were digitized by data entry operators. During the data acquisition phase, standard ETL (Extract, Transform, Load) procedures were applied; ETL is a primary step in any data warehouse implementation and concerns acquiring, integrating, cleansing and standardizing data from the source(s).

Data cleansing and standardization is probably the largest part of an ETL exercise. In our case, the major data cleansing issues in the scouting data arose from its processing at three levels by different individuals: firstly, recording by the surveyors at the field level; secondly, typing onto data sheets at the DPWQCP office; and lastly, digitization by data entry operators. To maintain full compliance between the data sheets and their digitized copies, a double-check strategy was adopted: two individuals entered every row of data separately, and in case of conflict the data sheet was consulted for final reconciliation.

Variations in farmer names were removed through certain heuristics so that records of the same farmer could be identified and grouped together. Similar variations in pesticide names and cotton variety names were removed by comparing them to the actual names in the standard pesticide list of the National Agriculture Research Center and the standard crop varieties list of the Cotton Research Institute.

5.1.2 Meteorological Data

Digitally formatted daily weather recordings from over seventy observatories throughout Pakistan, covering more than the last three decades, are available with the Pakistan Meteorological Department (PMD). This data has an inherent weakness, as a very large area is represented by one meteorological recording. It is a common
observation that meteorological elements vary within even a few kilometers; hence using the same figure for thousands of kilometers brings with it a strong element of estimation.

As no local readings for the past were available in our case, we had to use meteorological recordings taken at district level. The second weakness discovered was the cost of meteorological data as commissioned by PMD, which is too high for a research group to pay, literally running into millions of Rupees. As a last resort, daily weather estimates for the years 2001 and 2002, including minimum and maximum temperatures, humidity and outlook, were downloaded from the website of the newspaper Daily Dawn (www.dawn.com).

5.2 Implementation Phase

5.2.1 Dimensional Model

Dimensional modeling is a technique used to model databases for analytical applications. It yields a simpler design and hence efficient retrievals, a prime technical requirement for large data warehouses [6, 18, 19]. The primary output of a dimensional model is the identification of Dimensions and Facts.

A Dimension is a collection of conceptually related entities with an inherent hierarchy. For example, a Time dimension would consist of entities representing temporal intervals such as day, week, month and year. Facts, on the other hand, are the metrics associated with (and reported for) dimensions; for example, minimum and maximum temperatures are facts recorded against the Time dimension.

Although we omit the details of the dimensional model here, a brief introduction to the dimensions involved is as follows.

•Location dimension: It corresponds to the administrative hierarchy of a province. It starts with the Division, which is divided into districts. Each district consists of three to four Tehsils, which are further divided into a number of Markaz.

•Farmer dimension: For a detailed study, farms are categorized into different sets depending upon acreage [Pak 01], such as 0.5 to under 1.0 acres, 1.0 to under 2.0 acres, 2.0 to under 3.0 acres, 3.0 to under 5.0 acres, and so on.

•Insect dimension: Insects surveyed in the scouting process are grouped on an entomological basis. For the cotton crop, for instance, the groups are the Bollworm Complex (Pink Bollworm, Spotted Bollworm, Army Bollworm), Sucking pests (Whitefly, Jassid etc.), Viruses (CLCV) and Predators.

•Pesticide dimension: Numerous pesticide solutions are used by farmers depending upon infestation, price and availability. These solutions may differ in their trade names but belong to some generic chemical class and cure group.

5.2.2 Schema Design

Considering the whole scouting process, the number of data elements, the type and frequency of data generation, and the type of questions likely to be faced by the final implementation, we propose a modified star schema for PPM-DWH (not shown here). It has been persistently reported in the literature that the star schema best supports decision support systems (DSS) due to its simplified nature [19, 20]. Due to its technical nature, we omit the details of the schema here.

5.2.3 Coding and Hardware/Software Platform

PPM-DWH was implemented on a commercially available server with dual Intel 950 MHz Xeon processors and 1 GB of RAM. The total internal hard disk capacity of the server amounts to 36 GB, while an external RAID controller supports 8 additional SCSI disks of 18 GB each.

5.2.4 Data Validation

The quality and validity of the underlying data is the key to meaningful and authentic analyses. After ensuring a satisfactory level of data quality, it is extremely important to judge in some way the validity of the data that a data warehouse contains. We applied some very natural checks for this purpose.

The relationship between pesticide spraying and the population of predators (insects that destroy pests) is a fact that has been discussed by many agriculturists [James02], [Relyea01]: predator population drops with the first pesticide spray and continues to decrease thereafter. We dig out this fact in the same form from our data as well, as can be seen in Figure 3.
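The predator check described in Section 5.2.4 can be phrased as a simple computation over scouting records. The sketch below is a hypothetical illustration, not the authors' code: it assumes each visit is available as a (farm, date, predator count) tuple and that each farm's first spray date has already been extracted, and it pools the predator counts observed before and on/after the first spray. The farm identifiers and counts are made-up numbers, not results from the paper; a lower "after" average is what the agronomic expectation predicts.

```python
from datetime import date

def predator_means_around_first_spray(visits, first_spray):
    """Pool predator counts observed before vs. on/after each farm's first spray.

    visits: iterable of (farm_id, visit_date, predator_count) tuples.
    first_spray: dict mapping farm_id -> date of that farm's first spray.
    Farms without a recorded spray are skipped. Returns (mean_before, mean_after),
    with None for a side that has no observations.
    """
    before, after = [], []
    for farm_id, visit_date, predators in visits:
        spray_date = first_spray.get(farm_id)
        if spray_date is None:
            continue  # no spray recorded: the farm says nothing about this effect
        (before if visit_date < spray_date else after).append(predators)

    def avg(values):
        return sum(values) / len(values) if values else None

    return avg(before), avg(after)

# Made-up visits for two sprayed farms and one unsprayed farm (hypothetical IDs).
visits = [
    ("F1", date(2001, 7, 1), 6), ("F1", date(2001, 7, 8), 4), ("F1", date(2001, 7, 15), 2),
    ("F2", date(2001, 7, 1), 5), ("F2", date(2001, 7, 8), 5), ("F2", date(2001, 7, 15), 1),
    ("F3", date(2001, 7, 8), 9),  # F3 has no spray date, so it is ignored
]
first_spray = {"F1": date(2001, 7, 5), "F2": date(2001, 7, 10)}
before, after = predator_means_around_first_spray(visits, first_spray)
print(round(before, 2), round(after, 2))  # 5.33 2.33
```

Pooling across farms keeps the sketch short; a per-farm comparison (or a per-week series as in Figure 3) would be a natural refinement.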
Figure 3: Y 2001 Pesticide usage vs Predators

6. Analytical Operations

Once the underlying structure (the data warehouse) is in place, there is no end to the exploration that one can perform. The task of knowledge discovery is probably limited only by the imagination and, to some extent, the domain expertise of the explorer, provided that any such undertaking is supported by appropriate automated or semi-automated tools. We applied such tools to the pilot version of our PPM data warehouse, and the results are more than promising.

As described above, given a standardized and integrated data set coupled with an efficient storage structure and data exploration tools, any aspect of the data can be investigated, resulting in numerous findings. Hence PPM-DWH and the tools involved are by no means limited to the findings we report here. We give these findings with the sole aim of demonstrating the potential of the framework we proposed (and partially implemented).

The following subsections describe the analytical operations that we performed on PPM-DWH (data mining, OLAP and statistical analysis) and the results these operations yielded.

6.1 Data Mining Operations

As described in Section 3.2, data mining is a process of automated discovery. In the wake of the data explosion in almost every domain, data mining techniques and algorithms have received huge acclaim. Numerous data mining techniques, frameworks and algorithms have been reported in the literature and are currently used successfully in a multitude of domains.

The data mining technique that we applied on PPM-DWH is called Recursive Noise Removal (RNR), first proposed and used on gene expression data by Abdullah and Brobst. Due to the non-technical nature of this paper, we omit the details of this method, but a brief overview is necessary in order to appreciate the results appropriately. Readers interested in the method itself are referred to that original work; readers interested in the details of the RNR application to agricultural data are referred to [1a].

Clustering is a data mining technique that assigns data elements to various classes/clusters. A cluster is a collection of data elements that are highly similar to one another within the same cluster, but weakly similar to the data elements in other clusters.

Clustering falls into two main classes: (i) unsupervised, when the number and/or demographics of clusters are not known in advance; and (ii) supervised, popularly known as classification, which applies to the situation where cluster (or class) properties are known a priori and unclassified data elements are assigned to the known clusters. An obvious edge of unsupervised clustering is its data-driven and domain-independent nature: an unsupervised clustering technique fit for identifying patterns in medical images may be equally applicable to analyzing seismic data.

The RNR algorithm is an unsupervised clustering method that can run on any database table containing alphanumeric values. Domain expertise is required not for extracting the clusters but for understanding the implications of the clusters that RNR has identified. RNR works by repeatedly (recursively) applying a crossing minimization technique and dropping "noise" until a desired level of cluster quality is obtained. We demonstrate its use through the following experiment.

6.1.2 Data Mining Experiment

The initiative behind this experiment was a common farmer question: which pesticide should be bought, and when? We modeled these questions as finding the relationship between pest population and meteorological data elements, and finding out (if possible) the temperature and humidity thresholds at which the population of a certain pest booms.

Figure 4 gives the random input. The pre-processing method for preparing this input is omitted here due to its mathematical and technical nature.
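The rule-building step of this experiment (averaging weather over the seven days up to each scouting visit, then testing temperature and humidity thresholds) can be sketched in Python as follows. Several details are assumptions made for illustration: the weather dictionary format, the window being the visit day plus the six preceding days, Celsius and percent units, and reading "Temp" as a single averaged temperature series. The thresholds 29/70 and 27/67 are the ones reported for the two rules of this experiment. The RNR clustering step itself is not reproduced.

```python
from datetime import date, timedelta

def lookback_average(weather, visit_date, days=7):
    """Average (temperature, humidity) over a window ending on visit_date.

    weather: dict mapping date -> (temperature in deg C, relative humidity in %).
    The window covers visit_date and the days - 1 days before it; dates missing
    from `weather` are skipped. Returns None if no reading falls in the window.
    """
    readings = []
    for offset in range(days):
        day = visit_date - timedelta(days=offset)
        if day in weather:
            readings.append(weather[day])
    if not readings:
        return None
    n = len(readings)
    return (sum(t for t, _ in readings) / n, sum(h for _, h in readings) / n)

def classify_incidence(temp, humidity):
    """Apply the two threshold rules reported for this experiment."""
    if temp > 29 and humidity > 70:
        return "high"
    if temp < 27 and humidity < 67:
        return "low"
    return "undetermined"  # the reported rules say nothing about this band

# Hypothetical, constant weather for two weeks of July 2001.
weather = {date(2001, 7, 1) + timedelta(days=i): (30.5, 74.0) for i in range(14)}
temp, humidity = lookback_average(weather, date(2001, 7, 10))
print(classify_incidence(temp, humidity))  # high
```

Returning "undetermined" rather than forcing a label reflects the fact that the two rules leave a gap between 27 and 29 degrees (and between 67% and 70% humidity).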
Two distinct clusters were identified by the RNR heuristic, as shown in Figure 5. Matching the clusters with the detailed data showed a clear grouping on the basis of pest populations: cluster 1 has low pest populations and cluster 2 has quite high pest populations. Average values are shown in Table 2.

Figure 4: Input similarity matrix
Figure 5: Clusters identified by RNR

Cluster   Jassid   Thrip   SBW*
C1        0.1      2.22    0.88
C2        0.65     4.11    2.44
ETL**     1        8-10    3

Table 2: Cluster Demographics
*Spotted Boll Worm
** Economic Threshold Level

These clusters provide us with a good starting point, and next we try to establish certain rules on the basis of this clustering. For each record, we look back in time for seven days and note the meteorological recordings of minimum and maximum temperatures and humidity. Figure 6 shows the resulting graph of average values. On the basis of these graphs we establish two simple rules:

• If Temp > 29 AND Humidity > 70 then pest incidence will be high.

• If Temp < 27 AND Humidity < 67 then pest incidence will be low.

(Figure 6 panels: High Pest Population and Low Pest Population; series include Tmax; x-axis: days before visit, 6 to 0)
Figure 6: Meteorological recordings for the two clusters

Checking these rules against the data (376 matching records retrieved out of 2,000+) shows some very exciting results, as shown in Figure 7.

(Figure 7 panels: Over Threshold and Under Threshold; categories: Cluster, Thrips, Jassid, SBW)
Figure 7: Experiment 2 Findings

This experiment presents a very credible case that common farmer questions can be modeled through this data mining technique, and that answers can be given, based on evidence present in the data, before the pest attack occurs. The strength of this method lies in clustering evidence that is scattered in the data and hidden from the unaided human mind.

6.2 OLAP and Statistical Operations

OLAP and statistical analysis operations are related in the sense that both are user-driven, unlike the data mining method described above, which is data-driven. An analyst has particular questions in mind, so exploration through both of these methods is performed with a certain bias towards answering those questions.

PPM-DWH contains integrated data and hence can be probed along any dimension. Generally, an iterative analysis technique proves most beneficial. Iterative analysis starts with a broad-based question resulting in a large set of records. The analyst then capitalizes on these records and asks a more specific question. The process of iteratively building upon the previous question continues until a result of significant importance emerges. We performed this analysis with various initiatives, and a few of the interesting results are reported below.

6.2.1 Working Behaviors at Field Level

Cultural practices are one way of controlling pests; we were interested in exploring this behavior of our farmers. Results of probing for sowing dates in Y2001
and Y2002 are shown in Fig-8. Note the surprising finding that most sowings occurred on the 20th and 25th of May and the 2nd of June in both years.

Figure-8(a): Sowing Vs. day of year 2001
Figure-8(b): Sowing Vs. day of year 2002

Further drilling down on a day-of-the-week basis, as shown in Fig-9 and 10, resulted in an even more surprising finding: the least number of sowings occurred on Thursdays in each year. Last but not least, except for 2002, the fewest pesticide sprays were also made on Thursdays. Thus Thursday is the day when noticeably less work is performed than on other days.

(Figure 9 panels: Y2001: Sowings Vs. Weekdays; Y2002: Sowings Vs. Weekdays)
Figure 9: Number of sowings against week days

This finding was later confirmed by agriculturalists: in the social set-up of District Multan, Thursdays are usually associated with religious activities, such as visiting shrines; hence there is a tendency to do less work on this day.

(Figure 10 panels: Y2001: Sprayings Vs. Weekdays; Y2002: Sprayings Vs. Weekdays)
Figure 10: Number of sprayings against week days

7 Related Work

Data warehousing is very popular in domains such as telecommunication, retail sales, manufacturing and scientific research. An agriculture data warehouse is a rather new concept with very few parallels. Probably the closest among these is the USDA-NASS (United States Department of Agriculture, National Agricultural Statistics Service) data warehouse. Established in 1997, the basic goal behind its construction was to standardize and integrate survey data generated by NASS. Our work differs in principle from the NASS data warehouse in that (i) our data is generated by multiple sources and (ii) the goal behind our data warehouse is the construction of a foundation on which analytical exploration can take place.

Other than the NASS data warehouse, the world has yet to see a full-blown agricultural data warehouse implementation, though
there have been a number of proposals in this regard. No such work has ever been undertaken in the agriculture sector of Pakistan.

Data mining applications have quite recently found their way into agricultural research, and a lot of activity can be seen in this area [5, 10, 13, 24, 28]. Voss et al. provide details of the GIMMI project, which aims at providing one-stop, integrated access to the assessment of pesticide leaching into soil and groundwater; several IT tools, including data mining, are to be implemented as part of this project. Another work describes a new approach for the acquisition and pre-processing of data for agricultural data mining. Christensen and Cook used simple numerical methods to establish the relationship between 10 soil characteristic variables and corn yield. Yang et al. used remote sensing techniques in conjunction with artificial neural networks to identify weeds in cornfields. In another study, data mining techniques on images have been used to identify trash in ginned cotton. Scherte used a case study approach to help understand how data mining could be used in the manufacturing of textiles using SAS. Harms et al. integrated spatio-temporal knowledge discovery techniques into a Geo-Spatial Decision Support System (GDSS), using a combination of data mining techniques to find relationships between user-specified target episodes and other climatic events and to predict the …

8 Lessons Learnt

During the construction and subsequent utilization of the data warehouse, the following lessons proved extremely important.

(i) ETL of agricultural data is a big issue. There are no digitized operational databases, so one has to resort to data available in typed (or handwritten) sheets. Typing these sheets is very expensive, slow and prone to errors.

(ii) Particular to the pest scouting data, farmer individualization is critical, as a farmer is visited a number of times by the extension people.

(iii) Scouting data includes pesticide names, which are complex and not easy to remember or pronounce, and require extra effort to learn and type correctly.

(iv) Not all pests are present all the time, and most of the time the second spray is not done (or not recorded); hence the tables are sparse. We had to split tables to decrease header size and …

(v) Unlike a traditional data warehouse, where the end users are decision makers, here the end users include the farmers as well; thus decision-making goes all the way "down" to the extension level. This presents a challenge to the designer of the analytical operations, as the findings must be fairly simple to understand.

9 Conclusions

Analytical exploration of vast amounts of agricultural data can best be supported by an appropriate application of Data Warehousing and OLAP technologies. A data warehouse provides a flexible yet efficient and reliable storage structure for vast amounts of data, while OLAP techniques provide mechanisms for ad hoc and in-depth analysis of this data. Traditional analytical tools and database techniques may not succeed here due to their rigid nature. The techniques used in this work are equally applicable at any geographic location, provided that the related data is available. The paradigms are quite different from a traditional business application of a data warehouse.

References

Abdullah, A., and S. Brobst, "Clustering by Recursive Noise Removal", Proc. Atlantic Symposium on Computational Biology and Genome Informatics, USA, Sep. 2003.
[1a] Abdullah, A., S. Brobst, I. Pervaiz, M. Umer, and A. Nisar, "Learning Dynamics of Pesticide Abuse through Data Mining", to appear in Proc. Australasian Workshop on Data Mining and Web Intelligence 2004 (AWDM&WI2004), Dunedin, New Zealand, Jan. 2004.
[1b] Abdullah, A., S. Brobst, and M. Umer, "The Case for an Agri Data Warehouse: Enabling Analytical Exploration of Integrated Agricultural Data", to appear in Proc. IASTED International Conference on Databases and Applications (DBA 2004), Innsbruck, Austria, Feb. 2004.
Ahmed, Mumtaz, and Joseph G. Nagy, "Private Investment in Agriculture Research: Pakistan", Economic Research Service, U.S. Department of Agriculture, Jan. 2001.
Avesani, P., E. Olivetti, and A. Susi, "Feeding Data Mining", IRST Technical Report #0207-01, Istituto Trentino di Cultura, Povo (Trento), Italy, July 2002.
Berry, Michael J. A., and Gordon Linoff, "Data Mining Techniques", John Wiley and Sons Inc., 1997.
Bertis, B., Walter L. Johnston, et al., "Data Mining in U.S. Corn Fields", Proc. First SIAM International Conference on Data Mining, Fall 2001.
Brobst, Stephen, "Perfect Dimensions", Intelligent ENTERPRISE, June 1999.
Christensen, W. F., and Di Cook, "Data Mining Soil Characteristics Affecting Corn Yield", 1998, citeseer.nj.nec.com/christensen98data.html.
Chaudhuri, S., and U. Dayal, "An Overview of Data Warehousing and OLAP Technology", ACM SIGMOD Record, 26(1):65-74, 1997.
"Cotton Production Technology", Cotton Research Institute, http://www.punjab.gov.pk/agriculture/Research_Institutes/cri_fbd.htm.
Cunningham, Sally Jo, and Geoffrey Holmes, "Developing Innovative Applications in Agriculture Using Data Mining", Department of Computer Science, University of Waikato, Hamilton, New Zealand, 2001.
Gray, Jim, Surajit Chaudhuri, Adam Bosworth, et al., "Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals", J. Data Mining and Knowledge Discovery, 1997.
Gupta, A., Venky Harinarayan, and Dallan Quass, "Aggregate-Query Processing in Data Warehousing Environments", Proc. 21st Conf. on Very Large Databases, Zurich, Switzerland, 1995.
Harms, Sherri K., et al., "Data Mining in a Geospatial Decision Support System for Drought Risk Management", U.S. Department of Agriculture, Risk Management Agency, Fall 2001.
Inmon, W. H., "Building the Data Warehouse", 2nd edition, John Wiley & Sons, Chichester, 1996.
Irshad, M., Ehsan-ul-Haq, and Javed Iqbal, "Catalogue of Insecticides for Agricultural Pests of Pakistan", Integrated Pest Management Institute, National Agriculture Research Center (NARC), Islamabad, 2001.
James, David G., and Tanya S. Price, "Imidacloprid Boosts TSSM Egg Production", Agriculture and Environment News, Issue No. 189, Washington State University, USA, July 2002.
Johnston, Doug, "Data Mining for Site-specific Agriculture", Illinois Council on Food and Agriculture Research (C-FAR), Illinois, Jan. 2000.
Kimball, R., "A Dimensional Modeling Manifesto", DBMS and Internet Systems, Aug. 1997.
Kimball, R., L. Reeves, M. Ross, and W. Thornthwaite, "The Data Warehouse Lifecycle Toolkit: Expert Methods for Designing, Developing and Deploying Data Warehouses", John Wiley & Sons, Chichester, 1998.
Levene, Mark, and George Loizou, "Why is the Snowflake Schema a Good Data Warehouse Design?", 1999, citeseer.nj.nec.com.
"Introduction to Crop Scouting", Plant Protection Program, College of Agriculture, Food and Natural Resources, MU Extension, University of Missouri-Columbia, 2001.
Nguyen, H. T., N. R. Prasad, V. Kreinovich, and H. Gassoumi, "Some Practical Applications of Soft Computing and Data Mining", in A. Kandel, H. Bunke, and M. Last (eds.), Data Mining and Computational Intelligence, Springer-Verlag, Berlin, pp. 273-307, 2001.
Poe, Vidette, Patricia Klauer, and Stephen Brobst, "Building a Data Warehouse for Decision Support", 2nd edition, Prentice Hall, 1998.
Scherte, S. L., "Data Mining and Its Potential Use in Textiles: A Spinning Mill", PhD dissertation, North Carolina State University, 2002.
Sharma, S. D., Randhir Singh, and Anil Rai, "Integrated National Agricultural Resources Information System (INARIS)", Indian Agricultural Statistics Research Institute, New Delhi, 2000.
Voss, H., et al., "Simulation, Visualization, and Decision Support in GIMMI", 9th EC GI & GIS Workshop, ESDI Serving the User, A Coruña, Spain, June 2003.
Yost, Mickey, and Jack Nealon, "Using a Dimensional Data Warehouse to Standardize Survey and Census Metadata", National Agricultural Statistics Service, U.S. Department of Agriculture, Fall 1999.
Yang, C.-C., S. O. Prasher, and J.-A. Landry, "Use of Artificial Neural Networks to Recognize Weeds in a Corn Field", Journée d'information scientifique et technique en génie agroalimentaire, Saint-Hyacinthe QC, Canada, pp. 60-65, Mar. 1999.
Government of Pakistan, "Economic Survey of Pakistan 2000", Islamabad, Pakistan.
"Cotton and Ginning", http://smeda.org/bopp/cotton-ginning.pdf, viewed on 20 Sep. 2003.
United Nations Food and Agriculture Organization, "Agricultural Sector in Pakistan", 1998.
Government of Punjab, Agriculture Department, "Future Agricultural Extension Strategy", Lahore.
Davidson, Andrew P., "Soil Salinity, a Major Constraint to Irrigated Agriculture in the Punjab Region of Pakistan: Contributing Factors and Strategies for Amelioration", American Journal of Alternative Agriculture, No. 15, pp. 154-, 2000.