Data Warehousing and Business Intelligence
Project
on
INDIAN AGRICULTURE
SUSHANT PRABHAT PARTE
x18137440
https://www.youtube.com/watch?v=UazxAOpJ934
MSc/PGDip Data Analytics – 2019/20
Submitted to: Prof. Sean Heeney
National College of Ireland
Project Submission Sheet – 2019/2020
School of Computing
Student Name: Sushant Prabhat Parte
Student ID: x18137440
Programme: MSc Data Analytics
Year: 2019/20
Module: Data Warehousing and Business Intelligence
Lecturer: Prof. Sean Heeney
Submission Due Date: April 12, 2019
Project Title: INDIAN AGRICULTURE
URL: https://www.youtube.com/watch?v=UazxAOpJ934
I hereby certify that the information contained in this (my submission) is information
pertaining to my own individual work that I conducted for this project. All information
other than my own contribution is fully and appropriately referenced and listed in the
relevant bibliography section. I assert that I have not referred to any work(s) other than
those listed. I also include my TurnItIn report with this submission.
ALL materials used must be referenced in the bibliography section. Students are
encouraged to use the Harvard Referencing Standard supplied by the Library. To use
other author's written or electronic work is an act of plagiarism and may result in disciplinary
action. Students may be required to undergo a viva (oral examination) if there
is suspicion about the validity of their submitted work.
Signature:
Date: April 12, 2019
PLEASE READ THE FOLLOWING INSTRUCTIONS:
1. Please attach a completed copy of this sheet to each project (including multiple copies).
2. You must ensure that you retain a HARD COPY of ALL projects, both for
your own reference and in case a project is lost or mislaid. It is not sufficient to keep
a copy on computer. Please do not bind projects or place in covers unless specifically
requested.
3. Assignments that are submitted to the Programme Coordinator office must be placed
into the assignment box located outside the office.
Office Use Only
Signature:
Date:
Penalty Applied (if applicable):
Table 1: Mark sheet – do not edit
Criteria      Mark Awarded  Comment(s)
Objectives    of 5
Related Work  of 10
Data          of 25
ETL           of 20
Application   of 30
Video         of 10
Presentation  of 10
Total         of 100
Project Check List
This section captures the core requirements that the project entails, represented as a check
list for convenience.
 Used LaTeX template
 Three Business Requirements listed in introduction
 At least one structured data source
 At least one unstructured data source
 At least three sources of data
 Described all sources of data
 All sources of data are less than one year old, i.e. released after 17/09/2017
 Inserted and discussed star schema
 Completed logical data map
 Discussed the high level ETL strategy
 Provided 3 BI queries
 Detailed the sources of data used in each query
 Discussed the implications of results in each query
 Reviewed at least 5-10 appropriate papers on topic of your DWBI project
INDIAN AGRICULTURE
Sushant Prabhat Parte
x18137440
April 12, 2019
Abstract
India is known for its agriculture. Each state and region has different soil types
and grows different crops. Because agriculture is a foundation of India, it is also
a very large business, and it affects factors such as GDP and the overall economy
of India. Indian spices, tea and many other agricultural products are used all over
the world. Data analytics is a good way to study and understand this field. The
data sets chosen for this project contain attributes such as population, GDP, crop
value index, production, consumption and agricultural land. It is commonly held
that as population increases, the land available for agriculture decreases. India
passed a regulation years ago that agricultural land should not be used for any
other purpose, so this decline is slow, but the analysis indicates that it still
holds to some extent. The analysis of the data sets shows how important this type
of analysis is for regulating and keeping proper control over agriculture. Many
businesses depend on agriculture, and such analysis also helps new businesses
understand the risk factors and know where they stand. Market surveys and
predictions are necessary to minimize the risk factors for such businesses.
1 Introduction
As mentioned in the Abstract, the data sets used are on agriculture in India, and this
project examines how any field of business can be analyzed with data warehouse queries
to check whether commonly accepted facts hold true. Over time, every field of business
develops beliefs that are taken to be true, but whether they hold depends on various
factors. Data warehousing and business intelligence give our predictions a strong base
and highlight the risk factors involved in running a business. Agriculture is a basic
and long-running business in a country like India. By the conclusion of this project,
the reader should have a definite, well-supported idea of this business.
Req-1: Effect of population on Agriculture land
Req-2: Consumption of Crops as per the Production of Crops
Req-3: Crop production considering the crops value on GDP
Source               Type          Brief Summary
Data hub             Structured    Extracted to get data about the crop value index.
Statista             Structured    Extracted to get data about the GDP of Indian agriculture.
Kaggle               Structured    Extracted to get data about the consumption of crops over the years.
Data gov.in          Structured    Extracted to get data about the production of crops in India.
Country economy.com  Unstructured  Scraped to get data about the population of India.
data.world bank      Structured    Extracted to get data about agricultural land available in India.
Table 2: Summary of sources of data used in the project
2 Data Sources
2.1 Source 1: Data hub
The first data set is taken from Data hub: https://datahub.io/world-bank/ag.prd.crop.xd.
The data set, which was created 9 months ago, illustrates the crop value index. The Crop
Production Index shows agricultural production for each year relative to the base period
2004-2006.
Figure 1: SOURCE 1
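To make the idea of an index relative to a base period concrete, the following minimal Python sketch rescales a yearly series so that the 2004-2006 average equals 100. It is a simplification under stated assumptions: the function name `production_index` and the sample figures are hypothetical, and the published index is computed with a more elaborate method than a single raw series.

```python
from statistics import mean


def production_index(production_by_year, base_years=(2004, 2005, 2006)):
    """Rescale a yearly series so that the base-period average equals 100."""
    base = mean(production_by_year[year] for year in base_years)
    return {year: round(100 * value / base, 1)
            for year, value in production_by_year.items()}


# Hypothetical production figures in arbitrary units, for illustration only.
sample = {2004: 90.0, 2005: 100.0, 2006: 110.0, 2010: 125.0}
index = production_index(sample)
print(index[2005])  # 100.0
print(index[2010])  # 125.0
```

A value above 100 therefore means production above the 2004-2006 average, and a value below 100 means production below it.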
2.2 Source 2: Statista
The second source of structured data is fetched from Statista: https://www.statista.com/statistics/2713
of-gross-domestic-product-gdp-across-economic-sectors-in-india/ which was published in November
2018. The statistics extracted from Statista represent the gross domestic product (GDP)
across different sectors in India from 2007 to 2017. The agriculture sector contributed
around 15.45 percent to the GDP of India.
Figure 2: SOURCE 2
2.3 Source 3: Kaggle
The third source of structured data is fetched from Kaggle:
https://www.kaggle.com/dorbicycle/world-foodfeed-production which was created one year ago.
This data set gives an overall idea of the consumption of different crops over different
periods of time.
Figure 3: SOURCE 3
2.4 Source 4: data gov.in
The fourth source of structured data is taken from data gov.in:
https://data.gov.in/resources/all-India-level-production-principal-crops-2001-02-2016-17.
The information covers all-India production of principal crops (grains, oil seeds, cotton,
jute and so on).
Figure 4: Source 4
2.5 Source 5: country economy.com
The unstructured data, which is the fifth source of data, is scraped from: https://countryeconomy.com/dem
The information outlines the population of India from 1960 to 2017. India ranks No. 2 among
the 196 nations whose data is published on country economy.com.
Figure 5: SOURCE 5
2.6 Source 6: data.world bank
The sixth source of data is fetched from: https://data.worldbank.org/topic/agriculture-and-rural-development.
The information represents the area of land available in India for farming over the
period 1960 to 2016.
Figure 6: SOURCE 6
3 Related Work
(Sahota 1968) presents a statistical study of resources in Indian agriculture.
Agricultural production is considered to be the major data in the project. The
quantities of crops produced in different Indian states, together with the available
land, are given in figures in the article.
Birthal et al. (2007) examine how important crop value is for consumption. As India
is a vast territory known for its diverse foodstuffs, spices and milk products, the
authors give a state-wise distribution of crop value for the crops cultivated in each
area. Shifting agriculture causes massive destruction of cultivable land, which could
be overcome by the authors' ideas.
Gottlieb & Grobovšek (2019) shed light on the reasons why productivity in agriculture
has decreased. Unskilled farmers with poor knowledge of crops, soil and smart
agriculture lead to a decrease in overall agricultural income, which massively affects
the Indian economy and revenue system.
Wolfert et al. (2017) review how big data can be used in smart farming. Statistical
analysis is used in this project to draw conclusions about available land and the
consumption of crops.
Sen (1962) refers to the problems faced by different states in India, such as poverty
and the low crop prices obtained by farmers. These problems arise from the growing
population, the reduced availability of land and inadequate production of crops. The
author mainly relates the profit percentage achieved by farmers to the land, in acres,
that they own. These statistics are reflected in the queries, which show how land is
affected by the increase in population and how a stable GDP can be achieved. The author
also refers to the loans sanctioned to farmers by the government on particular factors.
Dev (2008) focuses on the wide range of challenges faced by Indian agriculture due to
the low skill level of workers in cultivation. This leads to a decline in the sector's
economy. The author also highlights key points such as state-wise growth of agriculture,
available employment, the growth rate of infants affecting agriculture and the
production of crops by state, presented in tabular format. The data set supports
overcoming these problems.
4 Data Model
The methodology used for this data warehouse is Kimball's bottom-up approach, in which
data marts are built first and are in the long run integrated to form the data
warehouse. [Key points of Kimball's approach: building the data warehouse takes less
time, the initial cost is low, the initial setup takes less time, and a generalist team
can set up the data warehouse.]
Figure 7: STAR SCHEMA
Adamson (2012) gives a detailed study of the star schema with reference to third
normal form. The model used for this data warehouse project is the star schema, which
is straightforward and highly effective. A star schema comprises a single fact table
surrounded by several dimension tables. The dimensions used in this project are
Dimension Crop (Dim Crop), Dimension Year (Dim Year) and Dimension Country (Dim
Country). Each dimension has a primary key, which appears in the fact table as a
foreign key. Each dimension and its columns are described in detail below:
4.1 Dimension Year:
Dim Year consists of the following columns-
a) Year ID b) Year
Dimension Year holds the Year ID and Year columns. It lists all the years, which are
populated by the SQL queries provided in the appendix.
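One way to collect every distinct year across several raw tables is a UNION of their year columns. The sketch below illustrates this with Python's built-in sqlite3 module standing in for SQL Server; it is an alternative to the full outer joins used in the appendix rather than a reproduction of them, it uses only two of the raw tables, and the rows are placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Raw_GDP        (Year INTEGER, GDP_value REAL);
CREATE TABLE Raw_population (Year INTEGER, Population REAL);
CREATE TABLE Dim_Year (Year_ID INTEGER PRIMARY KEY AUTOINCREMENT, Year INTEGER);

-- Placeholder rows: only the years matter for this sketch.
INSERT INTO Raw_GDP        VALUES (2007, 0.0), (2008, 0.0);
INSERT INTO Raw_population VALUES (2008, 0.0), (2009, 0.0);

-- UNION removes duplicates, so each year appears once in the dimension.
INSERT INTO Dim_Year (Year)
    SELECT Year FROM Raw_GDP
    UNION
    SELECT Year FROM Raw_population
    ORDER BY Year;
""")

years = [row[0] for row in conn.execute("SELECT Year FROM Dim_Year ORDER BY Year")]
print(years)  # [2007, 2008, 2009]
```

A year that appears in only one source still reaches the dimension, which a join-based query that selects the year from a single table would not guarantee.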
4.2 Dimension Crop:
Dim Crop includes the following columns- a) Crop ID b) Crop
Dimension Crop holds a Crop ID for each crop and the distinct crop names. The crop
names are fetched by the SQL queries provided in the appendix.
4.3 Dimension Country:
Dim Country consists of the following columns- a) Country ID b) Country
Dimension Country holds an identity number for each country, referred to as Country
ID. The Country column contains only India because the project is based on India;
other countries could be added in future.
4.4 Fact table:
The fact table includes Crop ID, Year ID, Country ID and the following measures:
agricultural land use (in thousand hectares), production of crops (in tonnes), crop
value index, crop consumption (in tonnes), GDP (in percentage) and Indian population.
Crop ID, Year ID and Country ID are foreign keys that reference the primary keys of
the dimension tables Dim Crop, Dim Year and Dim Country.
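The star schema described above can be sketched in a few lines. The example below uses Python's built-in sqlite3 module as a stand-in for SQL Server, includes only two of the six measures, and fills the tables with placeholder numbers that are not real statistics; table and column names follow the appendix where possible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Dim_Year    (Year_ID    INTEGER PRIMARY KEY, Year    INTEGER);
CREATE TABLE Dim_Crop    (Crop_ID    INTEGER PRIMARY KEY, Crop    TEXT);
CREATE TABLE Dim_Country (Country_ID INTEGER PRIMARY KEY, Country TEXT);

-- The fact table holds one foreign key per dimension plus the measures.
CREATE TABLE FactTable (
    Year_ID          INTEGER REFERENCES Dim_Year(Year_ID),
    Crop_ID          INTEGER REFERENCES Dim_Crop(Crop_ID),
    Country_ID       INTEGER REFERENCES Dim_Country(Country_ID),
    Agri_Production  REAL,
    Agri_Consumption REAL
);

INSERT INTO Dim_Year    VALUES (1, 2012), (2, 2013);
INSERT INTO Dim_Crop    VALUES (1, 'Rice'), (2, 'Wheat');
INSERT INTO Dim_Country VALUES (1, 'India');

-- Placeholder measures, not real statistics.
INSERT INTO FactTable VALUES
    (1, 1, 1, 10.0, 8.0),
    (1, 2, 1,  9.0, 7.0),
    (2, 1, 1, 11.0, 8.5);
""")

# A typical BI query: join the fact table to a dimension and aggregate.
rows = conn.execute("""
    SELECT y.Year, SUM(f.Agri_Production), SUM(f.Agri_Consumption)
    FROM FactTable AS f
    JOIN Dim_Year AS y ON y.Year_ID = f.Year_ID
    GROUP BY y.Year
    ORDER BY y.Year
""").fetchall()
print(rows)  # [(2012, 19.0, 15.0), (2013, 11.0, 8.5)]
```

Because every measure sits in one fact table keyed by year, crop and country, each BI query reduces to a join against the relevant dimensions followed by an aggregation.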
Figure 8: DATA BASE
5 Logical Data Map
The Logical Data Map illustrates how the columns of each data source, whether similar or different, are transformed and used
as per the project requirements.
Table 3: Logical Data Map describing all transformations, sources and destinations for all components of the
data model
Source  Column                  Destination  Column                  Type       Transformation
1       Country Name            Dim Country  Country                 Dimension  Only India was extracted from the table
1       Year                    Dim Year     Year                    Dimension  No transformation
1       Crop value              Facttable    Crop value              Fact       Decimal numbers were changed to whole numbers
2       Year                    Dim Year     Year                    Dimension  No transformation
2       GDP                     Facttable    Total GDP               Fact       No transformation
3       Country Name            Dim Country  Country                 Dimension  No transformation
3       Crop                    Dim Crop     Crop                    Dimension  Crops were present in column format and were changed to row format
3       Year                    Dim Year     Year                    Dimension  No transformation
3       Consumption             Facttable    Consumption             Fact       No transformation
4       Year                    Dim Year     Year                    Dimension  No transformation
4       Crop                    Dim Crop     Crop                    Dimension  Crops were present in column format and were changed to row format
4       Agriculture Production  Facttable    Agriculture Production  Fact       No transformation
5       Country                 Dim Country  Country                 Dimension  Extracted using R code, then transposed from row to column
5       Year                    Dim Year     Year                    Dimension  Extracted using R code, then transposed from row to column
5       Population              Facttable    Population              Fact       Extracted using R code, then transposed from row to column
6       Year                    Dim Year     Year                    Dimension  No transformation
6       Country                 Dim Country  Country                 Dimension  Agriculture land of only India was extracted
6       Population              Facttable    Population              Fact       The column was unchanged
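The "column format to row format" transformation listed for the Crop columns is a wide-to-long reshape (an unpivot). The project performed its cleaning in R; the Python sketch below only illustrates the idea, with a hypothetical helper `unpivot` and placeholder values.

```python
def unpivot(rows, id_columns, var_name, value_name):
    """Turn one wide row per year into one long row per (year, crop) pair."""
    long_rows = []
    for row in rows:
        ids = {col: row[col] for col in id_columns}
        for col, value in row.items():
            if col not in id_columns:
                long_rows.append({**ids, var_name: col, value_name: value})
    return long_rows


# Wide layout: one column per crop (placeholder values).
wide = [
    {"Year": 2015, "Rice": 1.0, "Wheat": 2.0},
    {"Year": 2016, "Rice": 3.0, "Wheat": 4.0},
]

long_rows = unpivot(wide, id_columns=["Year"], var_name="Crop", value_name="Production")
for record in long_rows:
    print(record)
```

In the long layout every crop name becomes a value in a single Crop column, which is the shape needed to populate Dim Crop and to key the fact table by Crop ID.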
6 ETL Process
ETL stands for extraction, transformation and loading: the process of extracting data,
transforming it and loading it into the data warehouse. [Trujillo & Luján-Mora (2003)
discuss how to model the ETL process for a data warehouse project and provide conceptual
data flow diagrams for understanding it. The basic concept of the schema from that work
is referred to in this report.]
• EXTRACTION: Extraction is the first stage of the ETL process, in which information
is pulled from the source data for use in the data warehouse. There are 6 sources of
data, giving 6 different data sets, which are cleaned using RStudio. Of the 6 data
sets, 5 are structured (1 from Statista, 4 from the other sources described above)
and 1 is unstructured.
• TRANSFORMATION AND LOADING: After the data sets are extracted from their sources,
they are cleaned and joined as the project requires. The data extracted from Statista
contained GDP shares for industry and services, which were removed. The Kaggle data
contained total consumption of Kharif and Rabi crops, which was redundant for this
project. The data fetched from the country economy website provides population
information including density and male and female counts over the period, which was
excess data. The agriculture land data covered all countries, which was not useful,
so the other countries were removed.
The cleaning was executed in RStudio, and the code is given in the appendix. After
cleaning, the data sets were saved in CSV format. The tables are truncated using an
Execute SQL task in SSIS. The cleaning step in RStudio is automated within SSIS with
the help of an Execute Process task. Each data source is loaded through a data flow
task using a flat file source and an OLE DB destination for the SQL Server database.
Once these steps complete, the data is in the database. The dimension tables are then
built using Execute SQL tasks, in which an SQL query transfers the required column
from the raw table in the database to the dimension table. Specifically, 3 dimension
tables were created: Dim Crop, Dim Year and Dim Country. Each dimension table holds
its values and a distinct primary key. After the dimension tables are processed, the
next step is to create the fact table, which includes all the measures and the foreign
keys of every dimension. After the fact table is created, the cube is deployed, and
this is automated using the sequence generator. The connections between the fact table
and the dimension tables are processed once the cube is deployed. The whole task,
including cleaning, fetching data, creating the fact and dimension tables and deploying
the cube, is automated. The cube deployment is executed in SQL Server Analysis
Services.
• CUBE DEPLOYMENT: Once the dimension tables and the fact table are stored, the
cube is built using SQL Server Analysis Services. The cube consists of the fact table
and the dimensions. When the cube is deployed, the star schema shown in Figure 7 is
created. The automation is carried out by an Analysis Services task.
Figure 9: CONTROL FLOW
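The order of the control flow (empty the tables, load the cleaned CSV, build the dimensions, build the fact table) can be illustrated outside SSIS. The following Python sketch is an assumption-laden stand-in: sqlite3 replaces SQL Server, DELETE replaces TRUNCATE, an in-memory string replaces the cleaned CSV file, only one raw table is loaded, and the function name `run_etl` is hypothetical.

```python
import csv
import io
import sqlite3

# Stand-in for a cleaned CSV file produced by the R step (placeholder values).
RAW_CSV = """Year,Crop,Production
2015,Rice,1.0
2015,Wheat,2.0
2016,Rice,3.0
"""


def run_etl(conn, csv_text):
    cur = conn.cursor()
    # Step 1: create the tables if needed, then empty them so that a re-run
    # does not duplicate rows.
    cur.executescript("""
        CREATE TABLE IF NOT EXISTS Raw_production (Year INTEGER, Crop TEXT, Production REAL);
        CREATE TABLE IF NOT EXISTS Dim_Year (Year_ID INTEGER PRIMARY KEY AUTOINCREMENT, Year INTEGER);
        CREATE TABLE IF NOT EXISTS Dim_Crop (Crop_ID INTEGER PRIMARY KEY AUTOINCREMENT, Crop TEXT);
        CREATE TABLE IF NOT EXISTS FactTable (Year_ID INTEGER, Crop_ID INTEGER, Agri_Production REAL);
        DELETE FROM FactTable;
        DELETE FROM Dim_Crop;
        DELETE FROM Dim_Year;
        DELETE FROM Raw_production;
    """)
    # Step 2: load the flat file into the raw table.
    reader = csv.DictReader(io.StringIO(csv_text))
    cur.executemany(
        "INSERT INTO Raw_production VALUES (?, ?, ?)",
        [(int(r["Year"]), r["Crop"], float(r["Production"])) for r in reader],
    )
    # Step 3: fill each dimension with the distinct values of its column.
    cur.execute("INSERT INTO Dim_Year (Year) SELECT DISTINCT Year FROM Raw_production")
    cur.execute("INSERT INTO Dim_Crop (Crop) SELECT DISTINCT Crop FROM Raw_production")
    # Step 4: build the fact table by looking up the surrogate keys.
    cur.execute("""
        INSERT INTO FactTable
        SELECT y.Year_ID, c.Crop_ID, r.Production
        FROM Raw_production AS r
        JOIN Dim_Year AS y ON y.Year = r.Year
        JOIN Dim_Crop AS c ON c.Crop = r.Crop
    """)
    conn.commit()


conn = sqlite3.connect(":memory:")
run_etl(conn, RAW_CSV)
run_etl(conn, RAW_CSV)  # running twice leaves the same row counts

fact_count = conn.execute("SELECT COUNT(*) FROM FactTable").fetchone()[0]
crop_count = conn.execute("SELECT COUNT(*) FROM Dim_Crop").fetchone()[0]
year_count = conn.execute("SELECT COUNT(*) FROM Dim_Year").fetchone()[0]
print(fact_count, crop_count, year_count)  # 3 2 2
```

Emptying the tables first is what makes the automated package safe to re-run: the second call produces the same row counts as the first.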
7 Application
7.1 BI Query 1: Effect of Population on Agriculture land
The sources of data for this query are:
Source 1-country economy.com (population).
Source 2-world bank data (Agriculture land).
Figure 10: Results for BI Query 1
The query combines data from country economy.com and the World Bank. It highlights
how the increase in population affects agriculture. Consumption is known to increase
with population, so the agricultural land (in hectares) would need to increase as
well, but the image above clearly shows that the increase in land is not stable. The
query suggests that there is no direct relation between population and land; rather,
as population increases, consumption increases.
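Whether two yearly series move together can be quantified with a correlation coefficient instead of being read off a chart. The sketch below computes the Pearson coefficient in plain Python; the helper `pearson` is hypothetical and the two series are placeholders chosen only to exercise the function, not figures from the project's data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Placeholder series: one rises steadily while the other falls steadily.
rising = [1.0, 2.0, 3.0, 4.0]
falling = [4.0, 3.0, 2.0, 1.0]

r_same = pearson(rising, rising)
r_opposite = pearson(rising, falling)
print(round(r_same, 6), round(r_opposite, 6))  # 1.0 -1.0
```

A coefficient near zero for population against agricultural land would support the reading that the two are not directly related, while a value near 1 or -1 would contradict it.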
7.2 BI Query 2: Agriculture Production vs Agriculture consumption
The sources of data for this query are:
Source 1-data.gov.in (Agriculture Production).
Source 2-Kaggle (Agriculture Consumption).
Figure 11: Results for BI Query 2
After the White Revolution in India (the development of milk production and the drive
to encourage Indian farmers to keep more animals to expand milk output), the production
of milk and milk products increased, which had a remarkable impact on the Indian economy.
Basic agricultural products are shown above, and we can see that agricultural production
(in tonnes) and consumption (in tonnes) are not the same. India has always been known
for its quality food exports, and Indian spices and produce can be seen all over the
globe. If certain types of products are promoted well, their consumption can be
increased and an overall balance can be created, just as years ago the Indian government
encouraged the masses to consume more dal (lentils), and consumption increased. We can
conclude that consumption of protein-rich foods rises with changes in the production of
those products in India.
7.3 BI Query 3: Effect on GDP by crop value and Agriculture Production
The sources of data for this query are:
Source 1-data.gov.in (Agriculture Production).
Source 2-Statista (Agriculture GDP).
Source 3-data hub (crop value).
Figure 12: Results for BI Query 3
From the query above it can be observed that the crop value of each crop considered
increases gradually over the given period. Alongside this, agricultural production
(in tonnes) fluctuates and is greatest in 2012. These measures kept the country's GDP
(total, in percentage) balanced in spite of the growing population and the increase in
consumption. GDP plays a crucial role in making a country's economy strong, which can
be achieved if the GDP of a given region is stable.
7.4 BI Query 4: How is the agriculture production considering crop value
The sources of data for this query are:
Source 1-data.gov.in (Agriculture Production).
Source 2-data hub (crop value).
Figure 13: Results for BI Query 4
Most of the Indian population is vegetarian, and even meat eaters do not eat meat
every day, so meat consumption is low. Crop value reflects how effectively crops are
planted, so the better the crop value, the better the consumption. This query reflects
the fact that ways and means are needed to increase crop values so as to raise
production (in tonnes) and meet the increasing consumption.
7.5 Discussion
From the 4 BI queries it can be summarized that the growth of agricultural land has
been affected by population growth. Moreover, to stabilize the agriculture sector's
GDP, the production of crops, crop value and crop consumption play a vital role.
From Query 1 we can summarize that agricultural land usage is increasing despite less
available land and a growing population. Query 2 illustrates that the production of
certain food products must be increased to meet the needs of the population. Query 3
focuses on the country's GDP, which can be increased, or held at a stable benchmark,
by growing the production of crops with regard to crop value. Query 4 supports Query 3
by showing by how much the production of each crop must be increased.
Ganguli (1994) clearly states that the agriculture sector has suffered from the
extensive increase in population. The example given by the author is the Ganges valley,
where human adjustment to the environment, with a focus on agriculture and the economy,
left the land under-cultivated. The analysis of each region considers factors such as
shifting cultivation, rising population density and double cropping, which changed the
trend in agriculture. Due to extensive migration, deforestation and soil erosion have
increased, which puts irrigation under strain.
8 Conclusion and Future Work
The purpose of this data warehouse project was to create awareness of the need to
improve the methods farmers use to cultivate crops, to make better use of the land
available for cultivation, and to strengthen the country's economy in the agriculture
sector by means of social media, chats, emails and media downloads. The Indian
government has achieved a great benchmark for the development of the nation and its
farmers, but it still lags behind in achieving the proper GDP value. Agricultural
production should also be increased as required by providing advanced agricultural
tools, pesticides, fertilizers and proper storage. The BI queries support the problems
identified. The project identifies the factors that should be improved to improve
Indian agriculture.
The project can be improved by importing new data from any significant source. In
future, data sets on factors that can show how to drastically improve agriculture,
such as soil, rainfall, temperature, pesticides and fertilizers, could be included.
Wolfert et al. (2017) state how big data can be smartly implemented to increase crop
production and also present analysis of how land is cultivated for agriculture.
References
Adamson, C. (2012), Mastering data warehouse aggregates: solutions for star schema
performance, John Wiley & Sons.
Birthal, P. S., Joshi, P., Roy, D. & Thorat, A. (2007), Diversification in Indian agriculture
towards high-value crops, Vol. 727, Intl Food Policy Res Inst.
Dev, S. M. (2008), ‘Challenges for revival of Indian agriculture’.
Ganguli, B. (1994), ‘Trends of agriculture and population in the Ganges valley: a study
in agricultural economics.’.
Gottlieb, C. & Grobovšek, J. (2019), ‘Communal land and agricultural productivity’,
Journal of Development Economics 138, 135–152.
URL: http://www.sciencedirect.com/science/article/pii/S0304387818304462
Sahota, G. S. (1968), ‘Efficiency of resource allocation in Indian agriculture’, American
Journal of Agricultural Economics 50(3), 584–605.
Sen, A. K. (1962), ‘An aspect of Indian agriculture’, Economic Weekly 14(4-6), 243–246.
Trujillo, J. & Luján-Mora, S. (2003), A UML based approach for modeling ETL processes
in data warehouses, in ‘International Conference on Conceptual Modeling’, Springer,
pp. 307–320.
Wolfert, S., Ge, L., Verdouw, C. & Bogaardt, M.-J. (2017), ‘Big data in smart farming–a
review’, Agricultural Systems 153, 69–80.
Appendix
R code
# Scrape the India population table (unstructured source)
install.packages("htmltab")
library(htmltab)
scrap <- as.data.frame(htmltab(doc = "https://countryeconomy.com/demography/population/india",
                               which = 1))
scrap
write.csv(scrap, file = "C:/Users/MOLAP/Desktop/New folder/scrap4.csv")
getwd()
setwd("C:/Users/MOLAP/Desktop/Newfolder")
file1 <- read.csv("scrap1.csv", TRUE, ",")
# Inspect the scraped data
summary(file1)
head(file1)
class(file1)
names(file1)
ncol(file1)
nrow(file1)
# Drop columns 3 to 5, which are not needed
file2 <- file1[, c(-3:-5)]
ncol(file2)
nrow(file2)
write.csv(file2, file = "C:/MOPLAP/Desktop/New folder/scrap2.csv")
# to delete unnecessary columns -
getwd()
setwd("C:/Users/MOLAP/Desktop/New folder")
Agriland1 <- read.csv("API_AG.LND.AGRI.K2_DS2_en_csv_v2_10473928.csv", header = FALSE)
install.packages("dplyr")
library(dplyr)
head(Agriland1)
names(Agriland1)
class(Agriland1)
View(Agriland1)
# Drop the first two rows and the unused columns
Agriland2 <- Agriland1[, c(-4, -5, -62, -63, -64)]
Agriland2 <- Agriland1[c(-1, -2), ]
Agriland2 <- Agriland1[c(-1, -2), c(-4, -5, -62, -63, -64)]
Agriland2 <- Agriland2[, c(1:3, 40:61)]
ncol(Agriland2)
nrow(Agriland2)
# Keep only the header row and the row for India
Agriland2 <- filter(Agriland1, V1 %in% c("India"))
Agriland2 <- filter(Agriland2, V1 %in% c("Country Name", "India"))
names(Agriland2)
Agriland2 <- Agriland1
Agriland3 <- Agriland2
Agriland3 <- Agriland2[, c(-4:-39)]
write.csv(Agriland3, file = "C:/Users/MOLAP/Desktop/New folder/agri.csv")
-- SQL DIM CREATION
DROP TABLE FACTTABLE;
DROP TABLE Dim_Year;
CREATE TABLE Dim_Year (
    [Year_ID] int IDENTITY(1,1) PRIMARY KEY,
    [Year] numeric
);
TRUNCATE TABLE Dim_Year;
-- Collect the distinct years from the raw tables
INSERT INTO Dim_Year
SELECT DISTINCT [a1].[Year]
FROM Raw_Agriland AS a1
FULL OUTER JOIN Raw_consumption AS a2 ON [a2].[Year] = [a1].[Year]
FULL OUTER JOIN Raw_Cropindex AS a3 ON [a3].[Year] = [a2].[Year]
FULL OUTER JOIN Raw_GDP AS a4 ON [a4].[Year] = [a1].[Year]
FULL OUTER JOIN Raw_production AS a5 ON [a5].[Year] = [a3].[Year]
FULL OUTER JOIN Raw_population AS a6 ON [a6].[Year] = [a4].[Year]
ORDER BY [Year] ASC;
DROP TABLE Dim_Crop;
CREATE TABLE Dim_Crop (
    [Crop_id] int IDENTITY(1,1) PRIMARY KEY,
    [Crop_Name] varchar(50)
);
TRUNCATE TABLE Dim_Crop;
-- Collect the distinct crop names from consumption and production
INSERT INTO Dim_Crop
SELECT DISTINCT [a6].[Crop]
FROM Raw_consumption AS a6
FULL OUTER JOIN Raw_production AS a5 ON [a5].[Crop] = [a6].[Crop];
DROP TABLE Dim_Country;
CREATE TABLE Dim_Country (
    [Country_ID] int IDENTITY(1,1) PRIMARY KEY,
    [Country] varchar(50)
);
TRUNCATE TABLE Dim_Country;
-- DISTINCT keeps a single row per country instead of one row per year
INSERT INTO Dim_Country (Country)
SELECT DISTINCT Country FROM [dbo].[Raw_population];
-- SQL FACTTABLE CREATION
CREATE TABLE FACTTABLE (
    Year_id int FOREIGN KEY REFERENCES [dbo].[Dim_Year]([Year_ID]),
    Country_id int FOREIGN KEY REFERENCES [dbo].[Dim_Country]([Country_ID]),
    Crop_id int FOREIGN KEY REFERENCES [dbo].[Dim_Crop]([Crop_id]),
    Agri_land float,
    Agri_COnsumption float,
    Crop_valueindex float,
    GDP_value float,
    COuntry_Population float,
    Agri_production float
);
Insert into FACTTABLE
select distinct a.year_ID, b.Country_id, c.Crop_id, d.[Agri_land], e.[Agri_consum
g.[GDP_value], h.[Popultion], i.[Agri_Production] from Raw_Agriland d full oute
on d.Year = e.Year full outer join [dbo].[Raw_Cropindex] f on e.Year = f.Year
full outer join [dbo].[Raw_GDP] g on f.Year = g.Year full outer join
[dbo].[Raw_population] h on g.Year = h.Year
full outer join [dbo].[Raw_production] i on h.Year = i.Year inner join [dbo].
[dbo].[Dim_Country] b on d.Country = b.Country inner join [dbo].[Dim_Crop]

ASHE 2017 - Annual Status of Higher Education of States and UTs in India
ย 
Sustainable Empowerment Model of Beef Cattle Business Farmers in the Dry Land...
Sustainable Empowerment Model of Beef Cattle Business Farmers in the Dry Land...Sustainable Empowerment Model of Beef Cattle Business Farmers in the Dry Land...
Sustainable Empowerment Model of Beef Cattle Business Farmers in the Dry Land...
ย 

Recently uploaded

Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
gajnagarg
ย 
Kalyani ? Call Girl in Kolkata | Service-oriented sexy call girls 8005736733 ...
Kalyani ? Call Girl in Kolkata | Service-oriented sexy call girls 8005736733 ...Kalyani ? Call Girl in Kolkata | Service-oriented sexy call girls 8005736733 ...
Kalyani ? Call Girl in Kolkata | Service-oriented sexy call girls 8005736733 ...
HyderabadDolls
ย 
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
nirzagarg
ย 
Diamond Harbour \ Russian Call Girls Kolkata | Book 8005736733 Extreme Naught...
Diamond Harbour \ Russian Call Girls Kolkata | Book 8005736733 Extreme Naught...Diamond Harbour \ Russian Call Girls Kolkata | Book 8005736733 Extreme Naught...
Diamond Harbour \ Russian Call Girls Kolkata | Book 8005736733 Extreme Naught...
HyderabadDolls
ย 
Abortion pills in Doha {{ QATAR }} +966572737505) Get Cytotec
Abortion pills in Doha {{ QATAR }} +966572737505) Get CytotecAbortion pills in Doha {{ QATAR }} +966572737505) Get Cytotec
Abortion pills in Doha {{ QATAR }} +966572737505) Get Cytotec
Abortion pills in Riyadh +966572737505 get cytotec
ย 
Fun all Day Call Girls in Jaipur 9332606886 High Profile Call Girls You Ca...
Fun all Day Call Girls in Jaipur   9332606886  High Profile Call Girls You Ca...Fun all Day Call Girls in Jaipur   9332606886  High Profile Call Girls You Ca...
Fun all Day Call Girls in Jaipur 9332606886 High Profile Call Girls You Ca...
kumargunjan9515
ย 
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
nirzagarg
ย 
Lake Town / Independent Kolkata Call Girls Phone No 8005736733 Elite Escort S...
Lake Town / Independent Kolkata Call Girls Phone No 8005736733 Elite Escort S...Lake Town / Independent Kolkata Call Girls Phone No 8005736733 Elite Escort S...
Lake Town / Independent Kolkata Call Girls Phone No 8005736733 Elite Escort S...
HyderabadDolls
ย 
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
nirzagarg
ย 

Recently uploaded (20)

Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
ย 
๐Ÿ‘‰ Bhilai Call Girls Service Just Call ๐Ÿ‘๐Ÿ‘„6378878445 ๐Ÿ‘๐Ÿ‘„ Top Class Call Girl Ser...
๐Ÿ‘‰ Bhilai Call Girls Service Just Call ๐Ÿ‘๐Ÿ‘„6378878445 ๐Ÿ‘๐Ÿ‘„ Top Class Call Girl Ser...๐Ÿ‘‰ Bhilai Call Girls Service Just Call ๐Ÿ‘๐Ÿ‘„6378878445 ๐Ÿ‘๐Ÿ‘„ Top Class Call Girl Ser...
๐Ÿ‘‰ Bhilai Call Girls Service Just Call ๐Ÿ‘๐Ÿ‘„6378878445 ๐Ÿ‘๐Ÿ‘„ Top Class Call Girl Ser...
ย 
Kalyani ? Call Girl in Kolkata | Service-oriented sexy call girls 8005736733 ...
Kalyani ? Call Girl in Kolkata | Service-oriented sexy call girls 8005736733 ...Kalyani ? Call Girl in Kolkata | Service-oriented sexy call girls 8005736733 ...
Kalyani ? Call Girl in Kolkata | Service-oriented sexy call girls 8005736733 ...
ย 
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptxRESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
ย 
Call Girls In GOA North Goa +91-8588052666 Direct Cash Escorts Service
Call Girls In GOA North Goa +91-8588052666 Direct Cash Escorts ServiceCall Girls In GOA North Goa +91-8588052666 Direct Cash Escorts Service
Call Girls In GOA North Goa +91-8588052666 Direct Cash Escorts Service
ย 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
ย 
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
ย 
Oral Sex Call Girls Kashmiri Gate Delhi Just Call ๐Ÿ‘‰๐Ÿ‘‰ ๐Ÿ“ž 8448380779 Top Class C...
Oral Sex Call Girls Kashmiri Gate Delhi Just Call ๐Ÿ‘‰๐Ÿ‘‰ ๐Ÿ“ž 8448380779 Top Class C...Oral Sex Call Girls Kashmiri Gate Delhi Just Call ๐Ÿ‘‰๐Ÿ‘‰ ๐Ÿ“ž 8448380779 Top Class C...
Oral Sex Call Girls Kashmiri Gate Delhi Just Call ๐Ÿ‘‰๐Ÿ‘‰ ๐Ÿ“ž 8448380779 Top Class C...
ย 
๐Ÿ’ž Safe And Secure Call Girls Agra Call Girls Service Just Call ๐Ÿ‘๐Ÿ‘„6378878445 ๐Ÿ‘...
๐Ÿ’ž Safe And Secure Call Girls Agra Call Girls Service Just Call ๐Ÿ‘๐Ÿ‘„6378878445 ๐Ÿ‘...๐Ÿ’ž Safe And Secure Call Girls Agra Call Girls Service Just Call ๐Ÿ‘๐Ÿ‘„6378878445 ๐Ÿ‘...
๐Ÿ’ž Safe And Secure Call Girls Agra Call Girls Service Just Call ๐Ÿ‘๐Ÿ‘„6378878445 ๐Ÿ‘...
ย 
Charbagh + Female Escorts Service in Lucknow | Starting โ‚น,5K To @25k with A/C...
Charbagh + Female Escorts Service in Lucknow | Starting โ‚น,5K To @25k with A/C...Charbagh + Female Escorts Service in Lucknow | Starting โ‚น,5K To @25k with A/C...
Charbagh + Female Escorts Service in Lucknow | Starting โ‚น,5K To @25k with A/C...
ย 
Diamond Harbour \ Russian Call Girls Kolkata | Book 8005736733 Extreme Naught...
Diamond Harbour \ Russian Call Girls Kolkata | Book 8005736733 Extreme Naught...Diamond Harbour \ Russian Call Girls Kolkata | Book 8005736733 Extreme Naught...
Diamond Harbour \ Russian Call Girls Kolkata | Book 8005736733 Extreme Naught...
ย 
Statistics notes ,it includes mean to index numbers
Statistics notes ,it includes mean to index numbersStatistics notes ,it includes mean to index numbers
Statistics notes ,it includes mean to index numbers
ย 
Abortion pills in Doha {{ QATAR }} +966572737505) Get Cytotec
Abortion pills in Doha {{ QATAR }} +966572737505) Get CytotecAbortion pills in Doha {{ QATAR }} +966572737505) Get Cytotec
Abortion pills in Doha {{ QATAR }} +966572737505) Get Cytotec
ย 
Predictive Precipitation: Advanced Rain Forecasting Techniques
Predictive Precipitation: Advanced Rain Forecasting TechniquesPredictive Precipitation: Advanced Rain Forecasting Techniques
Predictive Precipitation: Advanced Rain Forecasting Techniques
ย 
Fun all Day Call Girls in Jaipur 9332606886 High Profile Call Girls You Ca...
Fun all Day Call Girls in Jaipur   9332606886  High Profile Call Girls You Ca...Fun all Day Call Girls in Jaipur   9332606886  High Profile Call Girls You Ca...
Fun all Day Call Girls in Jaipur 9332606886 High Profile Call Girls You Ca...
ย 
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
ย 
Identify Customer Segments to Create Customer Offers for Each Segment - Appli...
Identify Customer Segments to Create Customer Offers for Each Segment - Appli...Identify Customer Segments to Create Customer Offers for Each Segment - Appli...
Identify Customer Segments to Create Customer Offers for Each Segment - Appli...
ย 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
ย 
Lake Town / Independent Kolkata Call Girls Phone No 8005736733 Elite Escort S...
Lake Town / Independent Kolkata Call Girls Phone No 8005736733 Elite Escort S...Lake Town / Independent Kolkata Call Girls Phone No 8005736733 Elite Escort S...
Lake Town / Independent Kolkata Call Girls Phone No 8005736733 Elite Escort S...
ย 
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
ย 

Analysis of Indian Agriculture

  • 1. Data Warehousing and Business Intelligence Project on INDIAN AGRICULTURE SUSHANT PRABHAT PARTE x18137440 https://www.youtube.com/watch?v=UazxAOpJ934 MSc/PGDip Data Analytics – 2019/20 Submitted to: Prof. SEAN HEENEY
  • 2. National College of Ireland Project Submission Sheet – 2019/2020 School of Computing Student Name: Sushant Prabhat Parte Student ID: x18137440 Programme: MSc Data Analytics Year: 2019/20 Module: Data Warehousing and Business Intelligence Lecturer: Prof. Sean Heeney Submission Due Date: April 12, 2019 Project Title: INDIAN AGRICULTURE URL: https://www.youtube.com/watch?v=UazxAOpJ934 I hereby certify that the information contained in this (my submission) is information pertaining to my own individual work that I conducted for this project. All information other than my own contribution is fully and appropriately referenced and listed in the relevant bibliography section. I assert that I have not referred to any work(s) other than those listed. I also include my TurnItIn report with this submission. ALL materials used must be referenced in the bibliography section. Students are encouraged to use the Harvard Referencing Standard supplied by the Library. To use other author's written or electronic work is an act of plagiarism and may result in disciplinary action. Students may be required to undergo a viva (oral examination) if there is suspicion about the validity of their submitted work. Signature: Date: April 12, 2019 PLEASE READ THE FOLLOWING INSTRUCTIONS: 1. Please attach a completed copy of this sheet to each project (including multiple copies). 2. You must ensure that you retain a HARD COPY of ALL projects, both for your own reference and in case a project is lost or mislaid. It is not sufficient to keep a copy on computer. Please do not bind projects or place in covers unless specifically requested. 3. Assignments that are submitted to the Programme Coordinator office must be placed into the assignment box located outside the office. Office Use Only Signature: Date: Penalty Applied (if applicable):
  • 3. Table 1: Mark sheet – do not edit
Criteria | Mark Awarded | Comment(s)
Objectives | of 5 |
Related Work | of 10 |
Data | of 25 |
ETL | of 20 |
Application | of 30 |
Video | of 10 |
Presentation | of 10 |
Total | of 100 |
  • 4. Project Check List
This section captures the core requirements that the project entails, represented as a check list for convenience.
Used LaTeX template
Three Business Requirements listed in introduction
At least one structured data source
At least one unstructured data source
At least three sources of data
Described all sources of data
All sources of data are less than one year old, i.e. released after 17/09/2017
Inserted and discussed star schema
Completed logical data map
Discussed the high level ETL strategy
Provided 3 BI queries
Detailed the sources of data used in each query
Discussed the implications of results in each query
Reviewed at least 5-10 appropriate papers on topic of your DWBI project
  • 5. INDIAN AGRICULTURE
Sushant Prabhat Parte, x18137440, April 12, 2019

Abstract
India is known for its agriculture. Every state and region has different soil types and different crops. Because agriculture is a foundation of the country, it is also a very large business, and it influences factors such as GDP and the overall economy of India. Regions all over the world use Indian-grown spices, tea and many other agricultural products. Data analytics is a good way to study and understand this field. The data set chosen for this project contains attributes such as population, GDP, crop value index, production and consumption, together with information about agricultural land. It is commonly held that as population increases, the land available for agriculture decreases. India passed a regulation years ago that agricultural land should not be used for any other purpose, so this decline is very slow, but the analysis suggests that it still occurs. The analysis of the data set shows how important this type of data analysis is for regulating agriculture and keeping proper control over it. Many businesses depend on agriculture, and such analysis also helps new businesses understand the risk factors and where they stand. Market surveys and predictions are necessary to minimise the risk for such businesses.

1 Introduction
As mentioned in the Abstract, the data set concerns agriculture in India, and this project examines how data warehouse queries can be used to analyse a field of business and to check whether commonly accepted facts hold. Over time every field of business develops beliefs that are taken to be true, but whether they hold depends on various factors.
Data warehousing and business intelligence therefore give our predictions a strong base and highlight the risk factors involved in running a business. Agriculture is a basic and long-running business in a country like India. By the conclusion of this project, the reader should have a clear idea of this business, supported by evidence.
Req-1: Effect of population on agricultural land
Req-2: Consumption of crops relative to the production of crops
Req-3: Crop production considering the effect of crop value on GDP
  • 6. Table 2: Summary of sources of data used in the project
Source | Type | Brief Summary
Data hub | Structured | Extracted to get data about the overall crop value index.
Statista | Structured | Extracted to get data about the GDP of Indian agriculture.
Kaggle | Structured | Extracted to get data about the consumption of crops over the years.
data.gov.in | Structured | Extracted to get data about the production of crops in India.
countryeconomy.com | Unstructured | Scraped to get data about India's population.
data.worldbank.org | Structured | Extracted to get data about the agricultural land available in India.

2 Data Sources
2.1 Source 1: Data hub
The first data set is taken from Data hub: https://datahub.io/world-bank/ag.prd.crop.xd. The data set, which was created 9 months ago, illustrates the crop value index. The Crop Production Index shows agricultural production for each year relative to the base period 2004-2006.
Figure 1: SOURCE 1
2.2 Source 2: Statista
The second source of structured data is fetched from Statista: https://www.statista.com/statistics/2713 of-gross-domestic-product-gdp-across-economic-sectors-in-india/, which was published in November 2018. The data extracted from Statista represent gross domestic product (GDP)
  • 7. across different sectors in India from 2007 to 2017. The agriculture sector contributed around 15.45 percent to the GDP of India.
Figure 2: SOURCE 2
2.3 Source 3: Kaggle
The third source of structured data is fetched from Kaggle: https://www.kaggle.com/dorbicycle/world-foodfeed-production, which was created one year ago. This data set gives an overall picture of the consumption of different crops over different periods of time.
Figure 3: SOURCE 3
  • 8. 2.4 Source 4: data.gov.in
The fourth source of structured data is taken from data.gov.in: https://data.gov.in/resources/all-India-level-production-principal-crops-2001-02-2016-17. The data cover all-India production of principal crops (grains, oil seeds, cotton, jute and so on).
Figure 4: SOURCE 4
2.5 Source 5: countryeconomy.com
The unstructured data, which is the fifth source of data, is scraped from: https://countryeconomy.com/dem The information outlines India's population from 1960 to 2017. India ranks No. 2 among the 196 nations whose data are published on countryeconomy.com.
Figure 5: SOURCE 5
  • 9. 2.6 Source 6: data.worldbank.org
The sixth source of data is fetched from: https://data.worldbank.org/topic/agriculture-and-rural-development. The information represents the area of land available in India for farming from 1960 to 2016.
Figure 6: SOURCE 6

3 Related Work
Sahota (1968) presents a statistical study of resources in Indian agriculture. The agricultural production function is the element most relevant to this project. The article reports, in figures, the quantities of crops produced in different Indian states together with the availability of land.
Birthal et al. (2007) examine how important crop value is for consumption. Because India is a vast territory with diverse foodstuffs, spices and milk products, the authors give a state-wise distribution of crop value for the crops cultivated in each area. Shifting agriculture causes massive destruction of cultivable land, which the authors' ideas could help to overcome.
Gottlieb & Grobovšek (2019) discuss the reasons why productivity in agriculture has decreased. Unskilled farmers with poor knowledge of crops, soil and smart agriculture lead to a decrease in overall agricultural income, which strongly affects the Indian economy and revenue system.
Wolfert et al. (2017) examine how big data can be used in smart farming and give an overview of the tasks involved. Statistical analysis is used in this project to draw conclusions about the available land and the consumption of crops.
Sen (1962) refers to the problems faced by different states in India, such as poverty and the low crop prices received by farmers. These problems arise from the growing population, the reduced availability of land and the inadequate production of crops.
The author mainly refers to the profit percentage achieved by farmers relative to the land, in acres, that they own. These statistics are reflected in the queries, which examine how the land is
  • 10. affected by the increase in population and how a stable GDP can be achieved. The author also refers to the loans sanctioned to farmers by the government on the basis of particular factors.
Dev (2008) focuses on the wide range of challenges faced by Indian agriculture due to the low skill level of workers in cultivation. These challenges lead to a decline in the sector's economy. The author also highlights key points such as state-wise growth of agriculture, available employment, the growth rate of infants affecting agriculture, and state-wise crop production in tabular format. The data set used in this project helps to address these problems.

4 Data Model
The methodology used is Kimball's bottom-up approach to data warehousing, in which data marts are built first and are in the long run integrated to form the data warehouse. [Key points of Kimball's approach: building the data warehouse takes less time, the initial cost is low, the initial setup takes less time, and a generalist team can set up a data warehouse.]
Figure 7: STAR SCHEMA
Adamson (2012) gives a detailed study of the star schema with reference to third normal form. The model used for this data warehouse project is the star schema, which is straightforward and highly effective. A star schema comprises a single fact table surrounded by various dimension tables. The dimensions used in this project are Dimension Crop (Dim Crop), Dimension Year (Dim Year) and Dimension Country (Dim Country). Each dimension has a primary key, which is assigned to the fact table as a foreign key. The contents of each dimension and the columns present in each are given in detail below.
4.1 Dimension Year
Dim Year consists of the following columns: a) Year ID b) Year
Dimension Year holds the year values and their year IDs. The years are taken from the SQL queries provided in the appendix.
  • 11. 4.2 Dimension Crop
Dim Crop includes the following columns: a) Crop ID b) Crop
Dimension Crop consists of the Crop ID for each crop and the distinct crop names. The crop names are fetched from the SQL queries provided in the appendix.
4.3 Dimension Country
Dim Country consists of the following columns: a) Country ID b) Country
Dimension Country holds an identity number for each country, referred to as Country ID. The Country column contains only India, as the project is based on India; other countries could be added in future.
4.4 Fact table
The fact table includes Crop ID, Year ID, Country ID and the values of agricultural land use (in thousand hectares), production of crops (in tonnes), crop value index, crop consumption (in tonnes), GDP (in percentage) and Indian population. Crop ID, Year ID and Country ID are foreign keys here; they are the primary keys of the dimension tables Dim Crop, Dim Year and Dim Country.
Figure 8: DATA BASE
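The star schema described above can be sketched in a few lines. This is only an illustration using Python's built-in sqlite3 module rather than the SQL Server database used in the project, and every table and column name below (dim_year, fact_agriculture, agri_land_kha and so on) is an assumption chosen for readability, not the naming used in the appendix.

```python
import sqlite3

# Hypothetical names modelled on the three dimensions and one fact table
# described in the text; the project's real DDL may differ.
SCHEMA = """
CREATE TABLE dim_year    (year_id    INTEGER PRIMARY KEY, year    INTEGER NOT NULL UNIQUE);
CREATE TABLE dim_crop    (crop_id    INTEGER PRIMARY KEY, crop    TEXT    NOT NULL UNIQUE);
CREATE TABLE dim_country (country_id INTEGER PRIMARY KEY, country TEXT    NOT NULL UNIQUE);
CREATE TABLE fact_agriculture (
    crop_id          INTEGER NOT NULL REFERENCES dim_crop(crop_id),
    year_id          INTEGER NOT NULL REFERENCES dim_year(year_id),
    country_id       INTEGER NOT NULL REFERENCES dim_country(country_id),
    agri_land_kha    REAL,     -- agricultural land, thousand hectares
    production_t     REAL,     -- crop production, tonnes
    crop_value_index REAL,
    consumption_t    REAL,     -- crop consumption, tonnes
    gdp_pct          REAL,     -- GDP share, percent
    population       INTEGER,
    PRIMARY KEY (crop_id, year_id, country_id)
);
"""


def build_schema(conn):
    """Create the dimension and fact tables with foreign keys enforced."""
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
    conn.executescript(SCHEMA)


conn = sqlite3.connect(":memory:")
build_schema(conn)
tables = sorted(
    row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
print(tables)
```

With foreign keys enforced, a fact row can only be inserted once the crop, year and country it refers to exist in their dimension tables, which is why the dimensions are loaded before the fact table.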
  • 12. 5 Logical Data Map
The Logical Data Map illustrates how each data source, with similar or different columns, has been transformed and used as per the project requirements.
Table 3: Logical Data Map describing all transformations, sources and destinations for all components of the data model
Source | Column | Destination | Column | Type | Transformation
1 | Country Name | Dim Country | Country | Dimension | Only India was extracted from the table
1 | Year | Dim Year | Year | Dimension | No transformation was done in this column
1 | Crop value | Fact table | Crop value | Fact | The decimal numbers were changed to whole numbers
2 | Year | Dim Year | Year | Dimension | No transformation was done in this column
2 | GDP | Fact table | Total GDP | Fact | No transformation was done in this column
3 | Country Name | Dim Country | Country | Dimension | No transformation was done in this column
3 | Crop | Dim Crop | Crop | Dimension | Crops were present in column format and were changed to row format
3 | Year | Dim Year | Year | Dimension | No transformation was done in this column
3 | Consumption | Fact table | Consumption | Fact | No transformation was done in this column
4 | Year | Dim Year | Year | Dimension | No transformation was done in this column
4 | Crop | Dim Crop | Crop | Dimension | Crops were present in column format and were changed to row format
4 | Agriculture Production | Fact table | Agriculture Production | Fact | No transformation was done in this column
5 | Country | Dim Country | Country | Dimension | Extracted using R code, then transposed from row to column
  • 13. Table 3 (continued)
Source | Column | Destination | Column | Type | Transformation
5 | Year | Dim Year | Year | Dimension | Extracted using R code, then transposed from row to column
5 | Population | Fact table | Population | Fact | Extracted using R code, then transposed from row to column
6 | Year | Dim Year | Year | Dimension | No transformation was done in this column
6 | Country | Dim Country | Country | Dimension | Agriculture land of only India was extracted
6 | Population | Fact table | Population | Fact | The column was unchanged
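The "column format changed to row format" transformation listed for the crop columns is a wide-to-long reshape. The project performed its cleaning in R; the sketch below only illustrates the same idea in plain Python, and the function name, field names and numbers are all made up for the example.

```python
def wide_to_long(rows, id_field, value_name):
    """Turn one column per crop into one row per (id, crop) pair."""
    long_rows = []
    for row in rows:
        for field, value in row.items():
            if field == id_field:
                continue  # the identifier is repeated on every output row
            long_rows.append({id_field: row[id_field], "crop": field, value_name: value})
    return long_rows


# Illustrative values only, not taken from any of the project's sources.
wide = [
    {"year": 2015, "Rice": 10.0, "Wheat": 8.0},
    {"year": 2016, "Rice": 11.0, "Wheat": 9.0},
]
long_rows = wide_to_long(wide, "year", "production")
for record in long_rows:
    print(record)
```

After the reshape each record carries exactly one crop name, which is the shape needed to look up a Crop ID in the crop dimension.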
  • 14. 6 ETL Process
ETL stands for extraction, transformation and loading: the process of extracting data, transforming it and loading it into the data warehouse. [Trujillo & Luján-Mora (2003) discuss how the ETL process should be considered in a data warehouse project and provide conceptual data flow diagrams for understanding it. The basic concept of the schema in this report follows their work.]
• EXTRACTION: Extraction, the first stage of the ETL process, identifies the relevant information in a source and retrieves it for use in the data warehouse project. There are 6 sources of data, from 6 different data sets, which are cleaned using RStudio. Of the 6 data sets, 5 are structured (1 from Statista and 4 from the other sources described above) and 1 is unstructured.
• TRANSFORMATION AND LOADING: After the data sets are extracted from the different sources, they are joined and cleaned as per the project requirements. The data extracted from Statista contained the GDP shares of industry and services, which were removed. The Kaggle data contained the total consumption of Kharif and Rabi crops, which was redundant for this project. The data fetched from the countryeconomy.com website provide population information, including density and male and female counts over the period covered, which was excess data. The agriculture land data contained information for all countries, which was not useful and was removed. This cleaning was executed in RStudio, and the code used is given in the appendix. After cleaning, the data sets were arranged in proper CSV format. The tables are truncated using the Execute SQL task in SSIS. The data cleaning is done using RStudio and is automated within SSIS with the help of the Execute Process task.
Each data source is included in the data flow task using a flat file source and an OLE DB destination in SSIS. After these steps are complete, the data is imported into the database. Once the data is in the database, the dimension tables are built using the Execute SQL task, where an SQL query transfers the required columns from the raw tables in the database to the dimension tables. Specifically, 3 dimension tables were created: Dim Crop, Dim Year and Dim Country. Each dimension table holds its values and a distinct primary key. After the dimension tables are processed, the next step is to create the fact table, which includes all the measures and the foreign keys of every dimension. After the fact table is created, the cube is deployed, and this is automated using the sequence generator. Once the cube is deployed, the fact table and dimension tables are processed according to the relationships defined between them. The whole task, including cleaning, fetching data, creating the fact and dimension tables and deploying the cube, is automated. The cube deployment is executed in SQL Server Analysis Services.
• CUBE DEPLOYMENT: When the dimension tables and fact table are stored, the cube is executed using SQL Server Analysis Services. The cube consists of the fact table and the dimensions. When the cube is deployed, the star schema shown in Figure 7 is created. The automation is carried out through an Analysis Services task.
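The load order described above (empty the targets, fill the dimensions with distinct values from the raw table, then fill the fact table by looking up the dimension keys) can be sketched as follows. This is a simplified stand-in using Python's sqlite3 module, not the SSIS package itself; the table names, the single raw_production staging table and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE raw_production   (year INTEGER, crop TEXT, production_t REAL);
CREATE TABLE dim_year         (year_id INTEGER PRIMARY KEY, year INTEGER UNIQUE);
CREATE TABLE dim_crop         (crop_id INTEGER PRIMARY KEY, crop TEXT UNIQUE);
CREATE TABLE fact_agriculture (crop_id INTEGER, year_id INTEGER, production_t REAL);
""")

# Stand-in for the cleaned CSV loaded through the flat file source (made-up values).
conn.executemany(
    "INSERT INTO raw_production VALUES (?, ?, ?)",
    [(2015, "Rice", 10.0), (2015, "Wheat", 8.0), (2016, "Rice", 11.0), (2016, "Wheat", 9.0)],
)


def load_warehouse(conn):
    """Rebuild the dimension and fact tables from the raw staging table."""
    # Step 1: empty the targets (SQLite has no TRUNCATE, so DELETE stands in).
    for table in ("fact_agriculture", "dim_year", "dim_crop"):
        conn.execute(f"DELETE FROM {table}")
    # Step 2: one dimension row per distinct value; the key is auto-assigned.
    conn.execute("INSERT INTO dim_year (year) SELECT DISTINCT year FROM raw_production ORDER BY year")
    conn.execute("INSERT INTO dim_crop (crop) SELECT DISTINCT crop FROM raw_production ORDER BY crop")
    # Step 3: replace natural values with dimension keys while loading the fact table.
    conn.execute("""
        INSERT INTO fact_agriculture (crop_id, year_id, production_t)
        SELECT c.crop_id, y.year_id, r.production_t
        FROM raw_production AS r
        JOIN dim_crop AS c ON c.crop = r.crop
        JOIN dim_year AS y ON y.year = r.year
    """)
    conn.commit()


load_warehouse(conn)
fact_count = conn.execute("SELECT COUNT(*) FROM fact_agriculture").fetchone()[0]
print(fact_count)
```

Because the targets are emptied first, the load can be re-run without duplicating rows, which matters when the whole sequence is automated.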
  • 15. Figure 9: CONTROL FLOW

7 Application
7.1 BI Query 1: Effect of Population on Agriculture Land
The sources of data for this query are:
Source 1: countryeconomy.com (population).
Source 2: World Bank data (agriculture land).
Figure 10: Results for BI Query 1
The query combines data from countryeconomy.com and the World Bank. It highlights how an increase in population affects agriculture. It is well known that consumption increases as population increases, so the agricultural land (in hectares) would be expected to increase as well, but the figure above clearly shows that the increase in land is not stable. This query indicates that there is no direct
  • 16. relation between population and agricultural land; rather, as the population increases, consumption increases.
7.2 BI Query 2: Agriculture Production vs Agriculture Consumption
The sources of data for this query are:
Source 1: data.gov.in (agriculture production).
Source 2: Kaggle (agriculture consumption).
Figure 11: Results for BI Query 2
After the white revolution in India (the development of milk production and the drive to encourage Indian farmers to keep more animals to expand milk output), the production of milk and milk products increased, which has had a remarkable impact on the Indian economy. Basic agricultural products are shown above, and we can see that agricultural production (in tonnes) and consumption (in tonnes) are not the same. India has always been known for its quality food exports, so Indian spices and produce can be seen all over the globe. If certain types of products are promoted well, their consumption can be increased and an overall balance can be created, just as years ago the Indian government encouraged the masses to consume more dal (lentils) and consumption increased as a result. We can conclude that consumption of protein-rich foods increases with changes in the production of those products in India.
7.3 BI Query 3: Effect on GDP by Crop Value and Agriculture Production
The sources of data for this query are:
Source 1: data.gov.in (agriculture production).
Source 2: Statista (agriculture GDP).
  • 17. Figure 12: Results for BI Query 3
Source 3: Data hub (crop value).
From the above query it can be observed that the crop value of each crop considered increases gradually over the given period. At the same time, agricultural production (in tonnes) fluctuates and is greatest in 2012. These measures kept the country's GDP (total, in percentage) balanced in spite of the growing population and the increase in consumption. The country's GDP has a crucial role in making the economy strong, which can be achieved if the GDP of each region is stable.
7.4 BI Query 4: How is the agriculture production considering crop value
The sources of data for this query are:
Source 1: data.gov.in (agriculture production).
Source 2: Data hub (crop value).
Figure 13: Results for BI Query 4
Most of the Indian population is vegetarian, and even the meat eaters do not eat meat
  • 18. every single day, so meat consumption is lower. Crop value refers to how effectively a crop is planted, so the better the crop value, the better the consumption. This query shows that ways and means are needed to increase crop values so as to raise production (in tonnes) and meet the increasing consumption.

7.5 Discussion

From the four BI queries it can be summarised that the growth of agricultural land has been affected by population growth; moreover, crop production, crop value and crop consumption play a vital role in stabilising the agriculture sector's GDP. From Query 1 we can summarise that agricultural land usage is increasing even as available land shrinks and the population grows. Query 2 illustrates that production of certain food products must be increased to meet the needs of the population. Query 3 focuses on the country's GDP, which can be increased, or held at a stable benchmark, by growing crop production with crop value in mind. Query 4 supports Query 3 by showing by how much the production of each particular crop must be increased.

Ganguli (1994) clearly states that the agriculture sector has been badly affected by the extensive increase in population. Human adjustment to the environment, with its concentration on agriculture and the economy, has left the Ganges valley under-cultivated, which is the example given by the author. The analysis of each region considers factors such as shifting cultivation, rapidly increasing population density and double cropping, which have changed agricultural trends. Extensive migration has increased deforestation and soil erosion, which in turn makes irrigation more difficult.
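The comparison of production and consumption behind Query 2 can be illustrated with a short calculation. The following is a minimal sketch in R, not the query used in the project: the data frame `fact`, its column names and all figures are hypothetical and chosen only to show the arithmetic.

```r
# Hypothetical figures for illustration only; these are NOT project data.
fact <- data.frame(
  crop          = c("Rice", "Rice", "Wheat", "Wheat"),
  year          = c(2011, 2012, 2011, 2012),
  production_t  = c(100, 120, 80, 90),   # production in tonnes (assumed column)
  consumption_t = c(90, 95, 85, 88)      # consumption in tonnes (assumed column)
)

# Surplus (positive) or deficit (negative) per crop and year, in tonnes
fact$balance_t <- fact$production_t - fact$consumption_t

# Average balance per crop across the years in the table
balance_by_crop <- aggregate(balance_t ~ crop, data = fact, FUN = mean)
print(balance_by_crop)
```

A negative average balance would flag a crop whose production needs to rise to meet consumption, which is the kind of gap the discussion of Query 2 points to.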
8 Conclusion and Future Work

The purpose of this data warehouse project was to create awareness of the need to improve the methods farmers use to cultivate crops, to improve the use of land available for cultivation and to strengthen the agricultural part of the country's economy, by means of social media, chats, emails and media downloads. The Indian government has achieved a great benchmark for the development of the nation and its farmers, but it still lags behind in achieving the proper GDP value. Agricultural production should also be increased as required by providing advanced agricultural tools, pesticides, fertilizers and proper storage facilities. The BI queries support the problems identified. The project identifies the factors that should be improved to improve Indian agriculture.

The project can be extended by importing new data from any significant source. In future, this could include data sets on factors that show how to drastically improve agriculture, such as soil, rainfall, temperature, pesticides and fertilizers. Wolfert et al. (2017) describe how big data can be applied smartly to increase crop production, and also present analysis of how land is cultivated for agriculture.

References

Adamson, C. (2012), Mastering data warehouse aggregates: solutions for star schema performance, John Wiley & Sons.
  • 19. Birthal, P. S., Joshi, P., Roy, D. & Thorat, A. (2007), Diversification in Indian agriculture towards high-value crops, Vol. 727, Intl Food Policy Res Inst.
Dev, S. M. (2008), ‘Challenges for revival of Indian agriculture’.
Ganguli, B. (1994), ‘Trends of agriculture and population in the Ganges valley: a study in agricultural economics’.
Gottlieb, C. & Grobovšek, J. (2019), ‘Communal land and agricultural productivity’, Journal of Development Economics 138, 135–152. URL: http://www.sciencedirect.com/science/article/pii/S0304387818304462
Sahota, G. S. (1968), ‘Efficiency of resource allocation in Indian agriculture’, American Journal of Agricultural Economics 50(3), 584–605.
Sen, A. K. (1962), ‘An aspect of Indian agriculture’, Economic Weekly 14(4-6), 243–246.
Trujillo, J. & Luján-Mora, S. (2003), A UML based approach for modeling ETL processes in data warehouses, in ‘International Conference on Conceptual Modeling’, Springer, pp. 307–320.
Wolfert, S., Ge, L., Verdouw, C. & Bogaardt, M.-J. (2017), ‘Big data in smart farming – a review’, Agricultural Systems 153, 69–80.

Appendix

R code

    # Install htmltab once, then load it
    install.packages("htmltab")
    library(htmltab)

    # Scrape the first table on the page into a data frame
    scrap <- as.data.frame(htmltab(doc = "https://countryeconomy.com/demography/population/india", which = 1))
    scrap
    write.csv(scrap, file = "C:/Users/MOLAP/Desktop/New folder/scrap4.csv")

    getwd()
    setwd("C:/Users/MOLAP/Desktop/New folder")
    file1 <- read.csv("scrap1.csv", header = TRUE, sep = ",")
    summary(file1)
    head(file1)
    class(file1)
    names(file1)
    ncol(file1)
    nrow(file1)

    # Drop columns 3 to 5
    file2 <- file1[, c(-3:-5)]
    ncol(file2)
    nrow(file2)
    write.csv(file2, file = "C:/Users/MOLAP/Desktop/New folder/scrap2.csv")
  • 20. # To delete unnecessary columns
    getwd()
    setwd("C:/Users/MOLAP/Desktop/New folder")
    # header = FALSE names the columns V1, V2, ...; the remaining arguments are cut off in the source
    Agriland1 <- read.csv("API_AG.LND.AGRI.K2_DS2_en_csv_v2_10473928.csv", header = FALSE)
    install.packages("dplyr")
    library(dplyr)
    head(Agriland1)
    names(Agriland1)
    class(Agriland1)
    View(Agriland1)

    # Exploratory steps: each assignment below overwrites the previous Agriland2
    Agriland2 <- Agriland1[, c(-4, -5, -62, -63, -64)]
    Agriland2 <- Agriland1[c(-1, -2), ]
    Agriland2 <- Agriland1[c(-1, -2), c(-4, -5, -62, -63, -64)]
    Agriland2 <- Agriland2[, c(1:3, 40:61)]
    ncol(Agriland2)
    nrow(Agriland2)

    # Keep the header row and the row for India
    Agriland2 <- filter(Agriland1, V1 %in% c("India"))
    Agriland2 <- filter(Agriland2, V1 %in% c("Country Name", "India"))
    names(Agriland2)

    Agriland2 <- Agriland1
    Agriland3 <- Agriland2
    # Drop columns 4 to 39
    Agriland3 <- Agriland2[, c(-4:-39)]
    write.csv(Agriland3, file = "C:/Users/MOLAP/Desktop/New folder/agri.csv")

    -- SQL DIM CREATION
    drop table FACTTABLE;
    drop table Dim_Year;
    create table Dim_Year(
        [Year_ID] int identity(1,1) primary key,
        [Year] numeric);
    truncate table Dim_Year;
    insert into Dim_Year
    select distinct [a1].[Year]
    from Raw_Agriland as a1
    full outer join Raw_consumption as a2 on [a2].[Year] = [a1].[Year]
    full outer join Raw_Cropindex as a3 on [a3].[Year] = [a2].[Year]
    full outer join Raw_GDP as a4 on [a4].[Year] = [a1].[Year]
    full outer join Raw_production as a5 on [a5].[Year] = [a3].[Year]
    full outer join Raw_population as a6 on [a6].[Year] = [a4].[Year]
    order by [Year] asc;
  • 21. drop table Dim_Crop;
    create table Dim_Crop(
        [Crop_id] int identity(1,1) primary key,
        [Crop_Name] varchar(50));
    truncate table Dim_Crop;
    insert into Dim_Crop
    select distinct [a6].[Crop]
    from Raw_consumption as a6
    full outer join Raw_production as a5 on [a5].[Crop] = [a6].[Crop];

    drop table Dim_Country;
    create table Dim_Country(
        [Country_ID] int identity(1,1) primary key,
        [Country] varchar(50));
    truncate table Dim_Country;
    insert into Dim_Country (Country)
    select Country from [dbo].[Raw_population];

    -- SQL FACTTABLE CREATION
    create table FACTTABLE(
        Year_id int foreign key references [dbo].[Dim_Year]([Year_ID]),
        Country_id int foreign key references [dbo].[Dim_Country]([Country_ID]),
        Crop_id int foreign key references [dbo].[Dim_Crop]([Crop_id]),
        Agri_land float,
        Agri_COnsumption float,
        Crop_valueindex float,
        GDP_value float,
        COuntry_Population float,
        Agri_production float
    );

    -- The right-hand ends of some lines in this statement are cut off in the source
    insert into FACTTABLE
    select distinct a.Year_ID, b.Country_id, c.Crop_id, d.[Agri_land], e.[Agri_consum
    g.[GDP_value], h.[Popultion], i.[Agri_Production]
    from Raw_Agriland d full oute
    on d.Year = e.Year
    full outer join [dbo].[Raw_Cropindex] f on e.Year = f.Year
    full outer join [dbo].[Raw_GDP] g on f.Year = g.Year
    full outer join [dbo].[Raw_population] h on g.Year = h.Year
    full outer join [dbo].[Raw_production] i on h.Year = i.Year
    inner join [dbo].
    [dbo].[Dim_Country] b on d.Country = b.Country
    inner join [dbo].[Dim_Crop]
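The fact-table load resolves each raw row to the surrogate keys of the dimension tables by joining on the natural keys (year, crop, country). A minimal sketch of that lookup in R is given here for illustration only. The table and column names mirror the SQL above, but the miniature data frames and all values are hypothetical, and this is not the load used in the project.

```r
# Hypothetical miniature dimension tables; values are made up for illustration.
dim_year <- data.frame(Year_ID = 1:3, Year = c(2010, 2011, 2012))
dim_crop <- data.frame(Crop_id = 1:2, Crop_Name = c("Rice", "Wheat"))

# Hypothetical raw staging rows keyed by the natural keys Year and Crop
raw_production <- data.frame(
  Year            = c(2010, 2011, 2012, 2012),
  Crop            = c("Rice", "Rice", "Wheat", "Rice"),
  Agri_production = c(10, 12, 7, 13)
)

# Inner joins replace the natural keys with the surrogate keys of the dimensions
fact <- merge(raw_production, dim_year, by = "Year")
fact <- merge(fact, dim_crop, by.x = "Crop", by.y = "Crop_Name")

# Keep only the keys and the measure, ordered for readability
fact <- fact[order(fact$Year_ID, fact$Crop_id),
             c("Year_ID", "Crop_id", "Agri_production")]
print(fact)
```

Because `merge` performs an inner join by default, a raw row whose year or crop is missing from a dimension would be dropped, which is the same behaviour as the inner joins to the dimension tables in the SQL.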