Big data is an extensive collection of both structured and unstructured data that
can be mined for information and analyzed to build predictive systems for better
decision making.
Big data is commonly characterized as a large collection of datasets with high volume,
velocity, and variety.
Volume: The amount of data generated
Velocity: The speed at which data is generated and processed
Variety: The range of data types and formats, from structured to unstructured
Big data analysis has been successfully adopted in industries
like banking and insurance. Although agriculture was slow
to adopt big data analysis in past years,
it has recently begun to use it.
Big data analytics in agriculture can be studied under
two major areas: smart farming
and precision agriculture.
Farmers use big data to get information on changing
weather, rainfall, fertilizer usage, and other factors
that impact the crop yield.
All of this information assists farmers in
making accurate and dependable decisions that
maximize their productivity in cultivating the land.
The role of Big Data in the Agriculture industry
Big data in the agriculture industry is based
on using technology, information, and analytics to bring
useful information to farmers. Big data can be used to
gather information about the agriculture industry as a whole, or it
can benefit a specific segment or area by
improving its efficiency. Big data relies on data mining processes
to create such vital information. With this
methodology, you can find the important patterns in a
huge set of data and condense this information into
useful forms. Several modern techniques, such
as artificial intelligence, machine learning, and statistics,
are used in the big data mechanism.
How is big data analytics transforming agriculture?
• Boosting productivity – Data collected from GPS-equipped
tractors, soil sensors, and other external
sources has helped in better management of seeds,
pesticides, and fertilizers while increasing productivity to
feed the ever-increasing global population.
• Access to plant genome information – This has allowed
the development of useful agronomic traits.
• Predicting yields – Mathematical models and machine
learning are used to collate and analyze data on
yield, chemicals, weather, and biomass index. The
use of sensors for data collection reduces error-prone
manual work and provides useful insights on yield.
• Risk management – Data-driven farming has mitigated
crop failures arising from changing weather patterns.
• Food safety – Collection of data relating to temperature,
humidity, and chemicals lowers the risk of food spoilage
through early detection of microbes and other contaminants.
• To counter the pressures of increasing food demand and climate change,
policymakers and industry leaders are seeking assistance from technology
forces such as IoT, big data analytics, and cloud computing.
• IoT devices help in the first phase of this process, data collection: sensors installed
in tractors and trucks as well as in fields, soil, and plants aid in the collection of
real-time data directly from the ground.
• Second, analysts integrate the large amount of data collected with other
information available in the cloud, such as weather data and pricing models, to
determine patterns.
• Finally, these patterns and insights assist in controlling the problem. They
help to pinpoint existing issues, like operational inefficiencies and
• The adoption of analytics in agriculture has been increasing consistently;
its market size is expected to grow from USD 585 million in 2018 to USD
1236 million by 2023, at a compound annual growth rate (CAGR) of 16.2%.
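The growth rate quoted in the last bullet can be sanity-checked with a few lines of arithmetic. This is a minimal sketch: the function name `cagr` is only an illustrative choice, and the inputs are the two market-size figures from the slide.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate as a fraction (0.16 means 16%)."""
    return (end_value / start_value) ** (1 / years) - 1


# Figures quoted on the slide: USD 585 million (2018) to USD 1236 million (2023).
rate = cagr(585, 1236, 2023 - 2018)
print(f"CAGR: {rate:.1%}")
```

The result comes out at roughly 16% per year, in line with the quoted 16.2% once rounding in the published figures is allowed for.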
Top four use cases for big data on the farm:
1. Feeding a growing population.
2. Using pesticides ethically.
3. Optimizing farm equipment.
4. Managing supply chain issues.
1. Feeding a growing population:
This is one of the key challenges that even governments are putting their
heads together to solve. One way to achieve this is to increase the yield from
existing farmlands.
Big data provides farmers granular data on rainfall patterns, water cycles,
fertilizer requirements, and more. This enables them to make smart decisions,
such as what crops to plant for better profitability and when to harvest. The
right decisions ultimately improve farm yields.
2. Using pesticides ethically
• Administration of pesticides has been a contentious issue due to its side effects
on the ecosystem. Big data allows farmers to manage this better by
recommending which pesticides to apply, when, and in what quantity. By
monitoring usage closely, farmers can adhere to government regulations and avoid
overuse of chemicals in food production. Moreover, this leads to increased
profitability because crops are not destroyed by weeds and insects.
3. Optimizing farm equipment
• Companies like John Deere have integrated sensors in their farming
equipment and deployed big data applications that help better
manage their fleet. For large farms, this level of monitoring can be a
lifesaver, as it lets users know of tractor availability, service due dates, and
fuel refill alerts. In essence, this optimizes usage and ensures the
long-term health of farm equipment.
4. Managing supply chain issues:
• McKinsey reports that a third of food produced for human consumption
is lost or wasted every year. This is a devastating fact, since the industry
struggles to bridge the gap between supply and demand. To address
this, the food delivery cycle from producer to market needs to be shortened.
Big data can help achieve supply chain efficiencies by tracking and
optimizing delivery truck routes.
Analysis of agriculture data using data mining
techniques: application of big data
In the agriculture sector, farmers and agribusinesses have to make
innumerable decisions every day, and the various factors influencing
them are intricately complex. An essential issue for agricultural planning
is accurate yield estimation for the numerous crops involved in
the planning. Data mining techniques are a necessary approach for
achieving practical and effective solutions to this problem. Agriculture
has been an obvious target for big data. Environmental conditions, variability
in soil, input levels, combinations, and commodity prices have made it all the
more relevant for farmers to use information and get help to make critical
farming decisions. This work focuses on analyzing agriculture data and
finding optimal parameters to maximize crop production using data
mining techniques like PAM, CLARA, DBSCAN, and multiple linear regression.
Mining the large amount of existing crop, soil, and climatic data, and analyzing
new, non-experimental data, optimizes production and makes agriculture
more resilient to climatic change.
Big Data, PAM, CLARA
• The input dataset consists of 6 years of data with the following parameters: year, state
(Karnataka, 28 districts), district, crop (cotton, groundnut, jowar, rice, and wheat),
season (kharif, rabi, summer), area (in hectares), production (in tonnes), average
temperature (°C), average rainfall (mm), soil pH value, soil type, major fertilizers,
nitrogen (kg/ha), phosphorus (kg/ha), potassium (kg/ha), minimum rainfall required, and
minimum temperature required.
• In the proposed work, a modified DBSCAN method is used to cluster
districts that have similar temperature, rainfall, and soil type. PAM
and CLARA are used to cluster the districts by crop production and identify those
producing the maximum (wheat is considered as the example crop in the proposed work).
Based on these analyses, the optimal parameters for achieving
maximum crop production are obtained. Multiple linear regression is used to forecast the
annual crop yield.
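To make the density-based clustering step concrete, here is a minimal sketch of plain DBSCAN in pure Python. It is the textbook algorithm, not the modified variant used in the proposed work (the slides do not describe the modification). The feature values, `eps`, and `min_pts` are hypothetical and chosen only for illustration.

```python
import math


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbours(i):
        # All points within eps of point i (the point itself included).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE          # may later be claimed as a border point
            continue
        cluster += 1                   # point i is a core point: start a cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            nbrs = neighbours(j)
            if len(nbrs) >= min_pts:   # j is also a core point: keep expanding
                queue.extend(nbrs)
    return labels


# Hypothetical, pre-scaled features per district:
# (average temperature in °C, average rainfall in hundreds of mm)
features = [(24.0, 6.0), (24.5, 6.2), (25.0, 5.8),
            (30.0, 9.0), (30.5, 9.3), (29.8, 8.8),
            (18.0, 15.0)]
labels = dbscan(features, eps=1.0, min_pts=2)
print(labels)
```

Districts with similar conditions end up sharing a label, while an isolated district is marked as noise (-1). In practice the features would need to be scaled consistently before a single `eps` is meaningful.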
Partitioning around medoids (PAM)
• PAM is a partitioning-based algorithm. It breaks the input data into a number of
groups by finding a set of centrally located objects called medoids.
Each data point is then assigned to its nearest medoid to form the
clusters. The algorithm has two phases:
1. BUILD phase: a collection of k objects is selected for an initial set S.
• Arbitrarily choose k objects as the initial medoids.
• Until no change, do:
– (Re)assign each object to the cluster with the nearest medoid.
– Improve the quality of the k medoids.
2. SWAP phase: one tries to improve the quality of the clustering by exchanging
selected objects with unselected objects, choosing the swap with the minimum cost.
Example: for each medoid m1 and each non-medoid data point d, swap m1 and d and
recompute the cost (the sum of distances of points to their medoid). If the total cost of
the configuration increased in the previous step, undo the swap. Fig. 2 depicts the
steps involved in the PAM algorithm.
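The two phases can be sketched in a few lines of Python. This is a simplified illustration, not the authors' implementation: the BUILD step takes the first k objects as the arbitrary initial medoids, and the SWAP step accepts any swap that lowers the total cost. The yield values are hypothetical.

```python
import math
from itertools import product


def total_cost(points, medoids):
    """Sum of distances from every point to its nearest medoid."""
    return sum(min(math.dist(p, points[m]) for m in medoids) for p in points)


def pam(points, k):
    """Return (medoid indices, cluster label per point)."""
    # BUILD (simplified): take the first k objects as the initial medoids.
    medoids = list(range(k))
    best = total_cost(points, medoids)
    improved = True
    while improved:  # SWAP: repeat until no swap lowers the cost
        improved = False
        for pos, cand in product(range(k), range(len(points))):
            if cand in medoids:
                continue
            # Tentatively swap the medoid at `pos` with non-medoid `cand`.
            trial = medoids[:pos] + [cand] + medoids[pos + 1:]
            cost = total_cost(points, trial)
            if cost < best - 1e-12:    # keep the swap only if the cost drops
                medoids, best, improved = trial, cost, True
    labels = [min(range(k), key=lambda c: math.dist(p, points[medoids[c]]))
              for p in points]
    return medoids, labels


# Hypothetical wheat yields (tonnes per hectare) for nine districts.
yields = [(1.0,), (1.2,), (0.9,), (2.5,), (2.7,), (2.6,), (4.0,), (4.2,), (4.1,)]
medoids, labels = pam(yields, k=3)
print([yields[m][0] for m in medoids], labels)
```

With k = 3 the three groups of districts separate cleanly, and each medoid is an actual data point, which is what distinguishes PAM from k-means.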
CLARA (clustering large applications)
• CLARA was designed by Kaufman and Rousseeuw
to handle large datasets, and it relies on sampling.
Instead of finding representative objects for the entire data set, CLARA draws a
sample of the data set, applies PAM on the sample, and finds the medoids of the
sample. To come up with better approximations, CLARA draws multiple samples
and gives the best clustering as the output. Here, for accuracy, the quality of the
clustering is measured based on the average dissimilarity of all objects in the
entire data set.
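The sampling idea can be sketched as follows. This is a minimal illustration under stated assumptions: the embedded `pam` is a small swap-based variant, and `n_samples`, `sample_size`, `seed`, and the rainfall values are hypothetical choices rather than values from the study.

```python
import math
import random


def cost(points, medoids):
    """Total distance of every point to its nearest medoid (medoids are points)."""
    return sum(min(math.dist(p, m) for m in medoids) for p in points)


def pam(sample, k):
    """Small PAM: start from the first k objects, accept any cost-reducing swap."""
    medoids = list(sample[:k])
    best = cost(sample, medoids)
    improved = True
    while improved:
        improved = False
        for pos in range(k):
            for cand in sample:
                if cand in medoids:
                    continue
                trial = medoids[:pos] + [cand] + medoids[pos + 1:]
                c = cost(sample, trial)
                if c < best - 1e-12:
                    medoids, best, improved = trial, c, True
    return medoids


def clara(points, k, n_samples=5, sample_size=12, seed=0):
    """Run PAM on several random samples; keep the medoids that score best on ALL points."""
    rng = random.Random(seed)
    best_medoids, best_avg = None, math.inf
    for _ in range(n_samples):
        sample = rng.sample(points, min(sample_size, len(points)))
        medoids = pam(sample, k)
        # Quality is the average dissimilarity over the entire data set.
        avg = cost(points, medoids) / len(points)
        if avg < best_avg:
            best_medoids, best_avg = medoids, avg
    return best_medoids, best_avg


# Hypothetical average rainfall values (mm) forming three well-separated groups.
points = [(c + 1.0 * i,) for c in (400.0, 800.0, 1200.0) for i in range(20)]
medoids, avg = clara(points, k=3)
print(sorted(medoids), round(avg, 2))
```

PAM only ever sees 12 objects at a time, but each candidate set of medoids is scored against all 60, which is how CLARA keeps the cost low without losing sight of the full data set.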
Multiple linear regression
to forecast the crop yield
Multiple linear regression is a variant of
“linear regression” analysis. This model is
built to establish the relationship that
exists between one dependent variable
and two or more independent variables.
For a given dataset where x1, …, xk are
independent variables and Y is a
dependent variable, multiple linear
regression fits the dataset to the model:
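$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \cdots + \beta_k x_{ki} + \varepsilon_i$
where $\beta_0$ is the y-intercept and the parameters $\beta_1, \beta_2, \ldots, \beta_k$ are called the partial
regression coefficients. In matrix form:
$Y = XB + E$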
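Written out, the matrix form stacks the observations row by row. The symbol n (the number of observations) and the least-squares estimator in the last line are the standard textbook formulation and are not stated on the slides.

```latex
\[
Y = \begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix},\quad
X = \begin{bmatrix}
      1 & x_{11} & \cdots & x_{k1} \\
      \vdots & \vdots & & \vdots \\
      1 & x_{1n} & \cdots & x_{kn}
    \end{bmatrix},\quad
B = \begin{bmatrix} \beta_0 \\ \vdots \\ \beta_k \end{bmatrix},\quad
E = \begin{bmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_n \end{bmatrix}
\]
\[
\hat{B} = (X^{\top} X)^{-1} X^{\top} Y
\]
```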
Before applying multiple linear regression to forecast the crop yield,
it is necessary to identify the significant attributes in the database. Not all
attributes in the database are significant: changing the
value of some attributes has no effect on the dependent
variable. Such attributes can be neglected. A p-value test is performed on
the database to find the significant attributes, and multiple linear
regression is applied only to the significant attributes to forecast the crop
yield.
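A minimal sketch of the regression step, assuming the significant attributes have already been selected (the p-value screening itself is not shown). It solves the least-squares normal equations in pure Python. The two predictors and the yield values are hypothetical and were constructed from y = 2 + 0.5·x1 − 0.1·x2 so that the fit can be checked.

```python
def fit_mlr(X, y):
    """Least-squares coefficients [b0, b1, ..., bk] via the normal equations."""
    rows = [[1.0] + list(r) for r in X]            # prepend the intercept column
    n = len(rows[0])
    # Augmented system (X^T X | X^T y).
    A = [[sum(r[i] * r[j] for r in rows) for j in range(n)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]


def predict(coeffs, x):
    """Apply the fitted model to one observation x = (x1, ..., xk)."""
    return coeffs[0] + sum(b * v for b, v in zip(coeffs[1:], x))


# Hypothetical data: (rainfall in hundreds of mm, temperature in °C) -> yield in t/ha.
X = [(6, 20), (7, 22), (8, 21), (9, 25), (10, 23), (11, 26)]
y = [3.0, 3.3, 3.9, 4.0, 4.7, 4.9]

coeffs = fit_mlr(X, y)
print([round(c, 3) for c in coeffs])
print(round(predict(coeffs, (12, 24)), 3))
```

On real data the residuals would not be zero, and a statistics package would normally be used so that coefficient p-values are reported alongside the fit.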
Evaluation methods
The figures depict the districts of Karnataka that
have similar temperature ranges, rainfall ranges, and soil
types, respectively.
• PAM: To apply the PAM algorithm to the dataset, the user first needs to specify
k (the number of clusters); k is set to 3 in the current experiment. Crop
yield is categorised into LOW, MODERATE, and HIGH production, and all
districts are grouped into 3 clusters using the PAM clustering method.
• As a result of the analysis, North Karnataka districts such as Bijapur,
Dharwad, Bagalkot, Belgaum, Raichur, Bellary, Chitradurga, and Davangere
are the districts with maximum wheat crop production.
RESULTS/Clusters:
The relationship between temperature and wheat crop
production in different districts of Karnataka is shown in
Fig. 12. From Fig. 12, we can infer the optimal
temperature for the wheat crop.
• Various data mining techniques are applied to the input data to
assess which method performs best. The present work used the data
mining techniques PAM, CLARA, and DBSCAN to obtain the optimal climate
requirements of wheat, such as the best temperature range, worst
temperature range, and rainfall, to achieve higher production of the wheat crop.
The clustering methods are compared using quality metrics. According to the
analysis of clustering quality metrics, DBSCAN gives better clustering
quality than PAM and CLARA, and CLARA gives better clustering quality
than PAM. The proposed work can also be extended to analyze the soil
and other factors for the crop and to increase crop production under
different climatic conditions.
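The slides do not name the quality metrics used for the comparison. As one common possibility, the sketch below computes the mean silhouette coefficient, which is an assumed choice here and not necessarily the metric used in the study. The points and the two labelings are hypothetical.

```python
import math


def silhouette(points, labels):
    """Mean silhouette coefficient; needs at least two clusters.

    Values near 1 indicate compact, well-separated clusters.
    """
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    scores = []
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            scores.append(0.0)         # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster.
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for k, other in clusters.items() if k != l)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Hypothetical 1-D data with two candidate clusterings.
points = [(1.0,), (1.2,), (0.9,), (4.0,), (4.2,), (4.1,)]
good = [0, 0, 0, 1, 1, 1]      # groups nearby values together
poor = [0, 1, 0, 1, 0, 1]      # mixes the two groups
print(round(silhouette(points, good), 3), round(silhouette(points, poor), 3))
```

A higher score for one labeling than another is the kind of evidence a statement such as "method A gives better clustering quality than method B" would rest on.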
Image processing: IM toolkit, VTK toolkit, OpenCV library
Machine learning: R, Google TensorFlow, Apache Mahout, Weka, mlpack
Cloud-based platforms for large-scale information storing: EMC Corporation, MapR converged data platforms, Apache Pig
Big databases: Hive, HadoopDB, MongoDB, Google Bigtable, Cassandra, PostGIS
Statistical tools: Norsys Netica, R, Weka
Image Processing Tool:
VTK toolkit:
The Visualization Toolkit (VTK) is open-source software for manipulating and
displaying scientific data. It comes with state-of-the-art tools for 3D
rendering, a suite of widgets for 3D interaction, and extensive 2D plotting
capability.
OpenCV library:
OpenCV is a large open-source library for computer vision, machine learning,
and image processing. It plays a major role in real-time operation,
which is very important in today’s systems.
Machine Learning tools:
R analytics is data analytics using the R programming language, an open-source
language for statistical computing and graphics. This programming
language is often used in statistical analysis and data mining. It can be used
for analytics to identify patterns and build practical models.
Google TensorFlow:
TensorFlow is a free and open-source software library for machine learning
and artificial intelligence. It can be used across a range of tasks but has a
particular focus on training and inference of deep neural networks.
Apache Mahout:
Apache Mahout is an open-source project for creating scalable machine-learning algorithms. Mahout runs on top of Hadoop,
which lets you apply a selection of Mahout's machine-learning algorithms in a distributed computing setting.
Cloud-based platforms for large-scale information storing:
EMC Corporation:
EMC is a multinational provider of products and services related to cloud computing,
storage, big data, data analytics, information security, content management, and converged
infrastructure. EMC was acquired by Dell in September 2016 and the company was
renamed to Dell EMC.
MapR converged data platforms: A platform for all the data and applications. With MapR,
users have a single platform (on a single codebase!) that delivers data-wide convergence. It
is the only platform that has a distributed file system that supports storage and analytics of
data streams, files, and NoSQL tables in the same converged
Apache Pig:
Apache Pig is an abstraction over MapReduce. It is a tool/platform used to analyze
large sets of data by representing them as data flows. Pig is generally used with Hadoop; we
can perform all the data manipulation operations in Hadoop using Apache Pig.
• Big Databases
Hive:
Hive allows users to read, write, and manage petabytes of data using SQL. Hive is built on top of Apache Hadoop, which
is an open-source framework used to efficiently store and process large datasets. As a result, Hive is closely integrated
with Hadoop, and is designed to work quickly on petabytes of data.
HadoopDB:
Hadoop handles larger data sets but only writes data once. SQL is easier to use but more difficult to scale. Apache
Hadoop is an open-source framework that is used to efficiently store and process large datasets ranging in size from
gigabytes to petabytes of data. Instead of using one large computer to store and process the data, Hadoop allows
clustering multiple computers to analyze massive datasets in parallel more quickly.
MongoDB:
MongoDB is an open-source document-oriented database that is designed to store large volumes of data and
to work with that data efficiently. It is categorized as a NoSQL (Not only SQL) database because the storage
and retrieval of data in MongoDB are not in the form of tables.
Google Bigtable:
Bigtable is ideal for storing large amounts of single-keyed data with low latency. It supports high read and write
throughput at low latency, and it is an ideal data source for MapReduce operations.
• Statistical tools:
Norsys Netica: Netica is a powerful, easy-to-use, complete program for working with belief networks
and influence diagrams. It has an intuitive and smooth user interface for drawing the networks, and
the relationships between variables may be entered as individual probabilities, in the form of
equations, or learned from data files (which may be in ordinary tab-delimited form and have "missing
Weka is a collection of machine-learning algorithms for data mining tasks. The algorithms can either
be applied directly to a dataset or called from your own Java code. Weka contains tools for data pre-
processing, classification, regression, clustering, association rules, and visualization.
• The objective of the proposed work is to analyse agriculture data using
data mining. In the proposed work, agriculture data has been collected from the
following sources: Dataset in agricultural sector [, statistics].
• Crop-wise agriculture data [html://CROPWISE_NORMAL_AREA],
agriculture data of different districts
• Agriculture data based on weather, temperature, and relative humidity
Thank you

Welocme to ViralQR, your best QR code generator.Welocme to ViralQR, your best QR code generator.
Welocme to ViralQR, your best QR code generator.
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...

big data.pptx

  • 6. BIG DATA: Big data is an extensive collection of both structured and unstructured data that can be mined for information and analyzed to build predictive systems for better decision making. Big data is considered a large collection of datasets characterized by high volume, velocity, and variety. Volume: the amount of data generated. Velocity: the speed at which data is generated and processed. Variety: the diversity of data types and sources (structured, semi-structured, and unstructured).
  • 7. BIG DATA ANALYTICS IN AGRICULTURE: Big data analysis has been successfully adopted in industries such as banking and insurance. Although agriculture was slow to adopt big data analysis, it has recently begun to use it. Big data analytics in agriculture can be studied under two major areas: smart farming and precision agriculture. Farmers use big data to get information on changing weather, rainfall, fertilizer usage, and other factors that affect crop yield. This information helps farmers make accurate and dependable decisions that maximize their productivity when cultivating the land.
  • 8. The role of Big Data in the Agriculture industry: Big data in the agriculture industry is based on using technology, information, and analytics to bring useful information to farmers. It can be used to gather information about the agriculture industry as a whole, or it can benefit a specific segment or area by improving its efficiency. Data mining processes are used to produce this information: they find the important patterns in a huge set of data and condense them into useful forms. Various modern techniques, such as artificial intelligence, machine learning, and statistics, are used in big data systems.
  • 9. How is big data analytics transforming agriculture?
      • Boosting productivity: Data collected from GPS-equipped tractors, soil sensors, and other external sources has helped in better management of seeds, pesticides, and fertilizers while increasing productivity to feed the ever-increasing global population.
      • Access to plant genome information: This has allowed the development of useful agronomic traits.
      • Predicting yields: Mathematical models and machine learning are used to collate and analyze data on yield, chemicals, weather, and biomass index. The use of sensors for data collection reduces error-prone manual work and provides useful insights for yield prediction.
      • Risk management: Data-driven farming has mitigated crop failures arising from changing weather patterns.
      • Food safety: Collecting data on temperature, humidity, and chemicals lowers the risk of food spoilage through early detection of microbes and other contaminants.
  • 11. HOW TO USE BIG DATA ANALYTICS IN AGRICULTURE
      • To counter the pressures of increasing food demand and climate change, policymakers and industry leaders are seeking assistance from technologies such as IoT, big data analytics, and cloud computing.
      • First, IoT devices support data collection: sensors fitted to tractors and trucks, as well as in fields, soil, and plants, collect real-time data directly from the ground.
      • Second, analysts integrate the large amount of collected data with other information available in the cloud, such as weather data and pricing models, to determine patterns.
      • Finally, these patterns and insights help control the problem. They help pinpoint existing issues, such as operational inefficiencies.
  • 12. The adoption of analytics in agriculture has been increasing consistently; its market size is expected to grow from USD 585 million in 2018 to USD 1,236 million by 2023, at a compound annual growth rate (CAGR) of 16.2%.
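
The growth figure above can be sanity-checked with the usual CAGR formula, (end / start)^(1 / years) - 1. The short sketch below uses only the two market sizes and the years quoted on the slide; the function name `cagr` is a hypothetical helper. The result comes out at roughly 16%, which agrees with the quoted 16.2% up to rounding of the endpoint values.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate as a fraction (0.16 means 16%)."""
    return (end_value / start_value) ** (1 / years) - 1


# Figures quoted on the slide: USD 585 million (2018) to USD 1,236 million (2023).
rate = cagr(585, 1236, 2023 - 2018)
print(f"Implied CAGR: {rate:.1%}")
```
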
  • 13. USE OF BIG DATA IN AGRICULTURE SECTOR: Top four use cases for big data on the farm: 1. Feeding a growing population. 2. Using pesticides ethically. 3. Optimizing farm equipment. 4. Managing the supply chain.
  • 14. 1. Feeding a growing population: This is one of the key challenges that governments are working together to solve. One way to achieve this is to increase the yield from existing farmlands. Big data provides farmers with granular data on rainfall patterns, water cycles, fertilizer requirements, and more. This enables them to make smart decisions, such as which crops to plant for better profitability and when to harvest. The right decisions ultimately improve farm yields.
  • 15. 2. Using pesticides ethically: The administration of pesticides has been a contentious issue because of its side effects on the ecosystem. Big data allows farmers to manage this better by recommending which pesticides to apply, when, and in what quantity. By monitoring usage closely, farmers can adhere to government regulations and avoid overuse of chemicals in food production. Moreover, this leads to increased profitability because crops are not destroyed by weeds and insects.
  • 16. 3. Optimizing farm equipment: Companies like John Deere have integrated sensors into their farming equipment and deployed big data applications that help manage equipment fleets. For large farms, this level of monitoring can be a lifesaver, as it informs users of tractor availability, service due dates, and fuel refill alerts. In essence, this optimizes usage and ensures the long-term health of farm equipment.
  • 17. 4. Managing supply chain issues: McKinsey reports that a third of the food produced for human consumption is lost or wasted every year. This is a devastating fact, since the industry struggles to bridge the gap between supply and demand. To address this, the food delivery cycle from producer to market needs to be shortened. Big data can help achieve supply chain efficiencies by tracking and optimizing delivery truck routes.
  • 18. Analysis of agriculture data using data mining techniques: application of big data. In the agriculture sector, farmers and agribusinesses have to make innumerable decisions every day, and the many factors influencing those decisions are intricately complex. An essential issue in agricultural planning is accurate yield estimation for the numerous crops involved. Data mining techniques are a necessary approach for reaching practical and effective solutions to this problem. Agriculture has been an obvious target for big data: environmental conditions, variability in soil, input levels, input combinations, and commodity prices have made it all the more relevant for farmers to use information and get help in making critical farming decisions. This work focuses on analyzing agriculture data and finding optimal parameters to maximize crop production using data mining techniques such as PAM, CLARA, DBSCAN, and multiple linear regression. Mining the large amount of existing crop, soil, and climatic data, and analyzing new, non-experimental data, optimizes production and makes agriculture more resilient to climatic change. Keywords: Big Data, PAM, CLARA.
  • 19. Dataset and proposed method:
      • The input dataset consists of six years of data with the following parameters: year, state (Karnataka, 28 districts), district, crop (cotton, groundnut, jowar, rice, and wheat), season (kharif, rabi, summer), area (in hectares), production (in tonnes), average temperature (°C), average rainfall (mm), soil pH value, soil type, major fertilizers, nitrogen (kg/ha), phosphorus (kg/ha), potassium (kg/ha), minimum rainfall required, and minimum temperature required.
      • In the proposed work, a modified DBSCAN method is used to cluster districts that have similar temperature, rainfall, and soil type. PAM and CLARA are used to cluster the districts by crop production, identifying those with maximum production (the wheat crop is used as the example). These analyses yield the optimal parameters for maximum crop production. Multiple linear regression is used to forecast the annual crop yield.
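
To make the density-based clustering step more concrete, here is a minimal sketch of standard DBSCAN in plain Python. It is not the modified variant used in the proposed work, whose changes are not described on the slide. The district feature values (average temperature in °C, average rainfall in units of 100 mm) and the parameters `eps` and `min_pts` are hypothetical and chosen only for illustration.

```python
from math import dist

NOISE = -1


def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Standard DBSCAN; returns one cluster label per point (-1 means noise)."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE          # may later become a border point
            continue
        cluster += 1                   # points[i] is a core point: new cluster
        labels[i] = cluster
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:
                queue.extend(j_neighbours)   # core point: keep growing
    return labels


# Hypothetical districts: (avg temperature in °C, avg rainfall in 100 mm).
districts = [(24.0, 6.1), (24.5, 6.0), (23.8, 6.3),
             (30.1, 3.0), (30.4, 3.2), (29.9, 2.9),
             (18.0, 12.0)]
labels = dbscan(districts, eps=1.0, min_pts=3)
print(labels)
```

Unlike PAM and CLARA, DBSCAN does not need the number of clusters in advance, and it can mark isolated districts as noise instead of forcing them into a cluster.
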
  • 20. Partitioning Around Medoids (PAM): PAM is a partitioning-based algorithm that breaks the input data into a number of groups. It finds a set of centrally located objects called medoids. Each remaining data point is then assigned to its nearest medoid to form the clusters. The algorithm has two phases:
  • 22. 1. BUILD phase: a collection of k objects is selected for an initial set S.
      • Arbitrarily choose k objects as the initial medoids.
      • Until nothing changes: (re)assign each object to the cluster with the nearest medoid, then improve the quality of the k medoids.
    2. SWAP phase: try to improve the quality of the clustering by exchanging selected objects (medoids) with unselected objects, choosing the swap with the minimum cost. Example: for each medoid m1 and each non-medoid data point d, swap m1 and d and recompute the cost (the sum of distances of points to their medoid); if the total cost of the configuration increased, undo the swap.
    Fig. 2 depicts the steps of the PAM algorithm.
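
The two phases can be sketched in a few lines of plain Python. This is an illustrative sketch under stated assumptions, not the implementation used in the study. The slide allows an arbitrary initial choice of medoids, while this version uses a greedy BUILD step so that the result is deterministic. It also accepts any swap that lowers the cost rather than searching for the single cheapest swap. The sample points are hypothetical.

```python
from itertools import product
from math import dist


def total_cost(points, medoids):
    """Sum of distances from every point to its nearest medoid."""
    return sum(min(dist(p, points[m]) for m in medoids) for p in points)


def pam(points, k):
    """Return (medoid indices, cluster label per point) for k clusters."""
    # BUILD: greedily add the point that lowers the total cost the most.
    medoids = []
    while len(medoids) < k:
        candidates = [i for i in range(len(points)) if i not in medoids]
        best = min(candidates, key=lambda i: total_cost(points, medoids + [i]))
        medoids.append(best)

    # SWAP: exchange a medoid with a non-medoid whenever that lowers the cost.
    cost = total_cost(points, medoids)
    improved = True
    while improved:
        improved = False
        for pos, cand in product(range(k), range(len(points))):
            if cand in medoids:
                continue
            trial = medoids[:pos] + [cand] + medoids[pos + 1:]
            trial_cost = total_cost(points, trial)
            if trial_cost < cost:      # otherwise the swap is simply not kept
                medoids, cost, improved = trial, trial_cost, True

    # Assign each point to the cluster of its nearest medoid.
    labels = [min(range(k), key=lambda c: dist(p, points[medoids[c]]))
              for p in points]
    return medoids, labels


# Hypothetical 2-D points (for example scaled temperature and rainfall).
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8),
          (15, 1), (15, 2), (16, 1)]
medoids, labels = pam(points, k=3)
print("medoids:", [points[m] for m in medoids])
print("labels:", labels)
```

Because every candidate swap recomputes the cost over all points, this approach becomes expensive on large datasets, which is the motivation for sampling-based variants.
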
  • 23. CLARA (Clustering Large Applications): CLARA was designed by Kaufman and Rousseeuw to handle large datasets and relies on sampling. Instead of finding representative objects for the entire dataset, CLARA draws a sample of the dataset, applies PAM to the sample, and finds the medoids of the sample. To obtain better approximations, CLARA draws multiple samples and returns the best clustering as the output. For accuracy, the quality of each clustering is measured by the average dissimilarity of all objects in the entire dataset, not only those in the sample.
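
The sampling idea can be sketched as follows. This is a simplified illustration, not the authors' code: the helper `pam` here is a small swap-only k-medoids routine that starts from the first k sampled points, and the defaults of 5 samples of size 40 + 2k follow the values commonly attributed to Kaufman and Rousseeuw, which should be treated as an assumption. The grid data is hypothetical.

```python
import random
from math import dist


def avg_cost(points, medoids):
    """Average distance from each point to its nearest medoid."""
    return sum(min(dist(p, m) for m in medoids) for p in points) / len(points)


def pam(sample, k):
    """Small swap-based k-medoids on a sample; returns the medoid points."""
    medoids = list(sample[:k])         # arbitrary initial medoids
    best = avg_cost(sample, medoids)
    improved = True
    while improved:
        improved = False
        for pos in range(k):
            for cand in sample:
                if cand in medoids:
                    continue
                trial = medoids[:pos] + [cand] + medoids[pos + 1:]
                trial_cost = avg_cost(sample, trial)
                if trial_cost < best:
                    medoids, best, improved = trial, trial_cost, True
    return medoids


def clara(points, k, n_samples=5, sample_size=None, seed=0):
    """Run PAM on several random samples; keep the medoids that score best
    on the ENTIRE dataset."""
    rng = random.Random(seed)
    sample_size = sample_size or min(len(points), 40 + 2 * k)
    best_medoids, best_cost = None, float("inf")
    for _ in range(n_samples):
        sample = rng.sample(points, sample_size)
        medoids = pam(sample, k)
        cost = avg_cost(points, medoids)   # evaluated on all points
        if cost < best_cost:
            best_medoids, best_cost = medoids, cost
    return best_medoids, best_cost


# Hypothetical data: three well-separated 5 x 4 grids of points.
points = [(x + dx, y + dy)
          for x, y in [(0, 0), (10, 10), (20, 0)]
          for dx in range(5) for dy in range(4)]
medoids, cost = clara(points, k=3, sample_size=15)
print("medoids:", medoids)
print("average dissimilarity:", round(cost, 3))
```

Each PAM run only touches the sample, so its cost depends on the sample size rather than the full dataset size; only the final quality check scans all points.
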
  • 24. Multiple linear regression to forecast the crop yield: Multiple linear regression is a variant of linear regression analysis. This model is built to establish the relationship between one dependent variable and two or more independent variables. For a given dataset where $x_1, \ldots, x_k$ are the independent variables and $Y$ is the dependent variable, multiple linear regression fits the dataset to the model $y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \cdots + \beta_k x_{ki} + \varepsilon_i$
  • 25. where $\beta_0$ is the y-intercept and the parameters $\beta_1, \beta_2, \ldots, \beta_k$ are called the partial regression coefficients. In matrix form, $Y = XB + E$. Before applying multiple linear regression to forecast the crop yield, it is necessary to identify the significant attributes in the database. Not all attributes are significant: changing the value of some attributes has no effect on the dependent variable, so such attributes can be neglected. A p-value test is performed on the database to find the significant attributes, and multiple linear regression is applied only to those attributes to forecast the crop yield.
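
As an illustration of fitting the model above, the sketch below estimates the coefficients by solving the normal equations $(X^T X) B = X^T Y$ in plain Python. The data is hypothetical: yields are generated without noise from assumed coefficients (2.0, 0.5, 0.01), so the fit should recover them. The p-value screening step is not shown, and the function names `solve` and `fit_mlr` are hypothetical helpers.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):                     # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_mlr(rows, y):
    """Least-squares estimate of [b0, b1, ..., bk] via the normal equations."""
    X = [[1.0] + list(r) for r in rows]              # leading 1 gives the intercept
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical observations: (avg temperature in °C, avg rainfall in mm).
rows = [(20, 500), (22, 650), (25, 400), (27, 700), (30, 550)]
# Hypothetical yields generated from assumed coefficients, without noise.
y = [2.0 + 0.5 * t + 0.01 * r for t, r in rows]

beta = fit_mlr(rows, y)
print("b0, b1, b2 =", [round(b, 4) for b in beta])


def predict(coeffs, features):
    """Forecast y for one observation using the fitted coefficients."""
    return coeffs[0] + sum(c * f for c, f in zip(coeffs[1:], features))


print("forecast for (26, 600):", round(predict(beta, (26, 600)), 3))
```

For real datasets with many attributes, a numerical library would normally be preferred over hand-written elimination, since the normal equations can be poorly conditioned.
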
  • 27. The figures depict the districts of Karnataka that have similar temperature ranges, rainfall ranges, and soil types, respectively.
  • 28. PAM: To apply the PAM algorithm to the dataset, the user first needs to supply k (the number of clusters); k is set to 3 in the current experiment. Crop yield is categorised into LOW, MODERATE, and HIGH production, and all districts are grouped into 3 clusters using the PAM clustering method. As a result of the analysis, North Karnataka districts such as Bijapur, Dharwad, Bagalkot, Belgaum, Raichur, Bellary, Chitradurga, and Davangere are the districts with maximum wheat crop production.
  • 32. Temperature and wheat crop production in different districts of Karnataka are analyzed in Fig. 12. Fig. 12 can be used to identify the optimal temperature for the wheat crop.
  • 36. Conclusion: Various data mining techniques were applied to the input data to assess which method performs best. The present work used the data mining techniques PAM, CLARA, and DBSCAN to obtain the optimal climate requirements of wheat, such as the best temperature range, the worst temperature range, and the rainfall needed to achieve higher wheat production. The clustering methods were compared using quality metrics. According to these metrics, DBSCAN gives better clustering quality than PAM and CLARA, and CLARA gives better clustering quality than PAM. The proposed work can be extended to analyze soil and other factors for the crop and to increase crop production under different climatic conditions.
  • 37. Tools by category:
      • Image processing: IM toolkit, VTK toolkit, OpenCV library
      • Machine learning: R, Google TensorFlow, Apache Mahout, Weka, mlpack
      • Cloud-based platforms for large-scale information storage: EMC Corporation, MapR converged data platforms, Apache Pig
      • Big databases: Hive, HadoopDB, MongoDB, Google Bigtable, Cassandra, PostGIS
      • Statistical tools: Norsys Netica, R, Weka
  • 38. Image processing tools:
      • VTK toolkit: The Visualization Toolkit (VTK) is open-source software for manipulating and displaying scientific data. It comes with tools for 3D rendering, a suite of widgets for 3D interaction, and extensive 2D plotting capability.
      • OpenCV: OpenCV is a large open-source library for computer vision, machine learning, and image processing. It plays a major role in real-time operation, which is very important in today's systems.
  • 39. Machine learning tools:
      • R: R analytics is data analytics using the R programming language, an open-source language for statistical computing and graphics. It is often used in statistical analysis and data mining, and it can be used to identify patterns and build practical models.
      • Google TensorFlow: TensorFlow is a free and open-source software library for machine learning and artificial intelligence. It can be used across a range of tasks but has a particular focus on training and inference of deep neural networks.
  • 40. Machine learning tools:
      • Apache Mahout: Apache Mahout is an open-source project for creating scalable machine-learning algorithms. Mahout runs on top of Hadoop, which allows a selection of Mahout algorithms to be applied to data through distributed computing.
  • 41. Cloud-based platforms for large-scale information storage:
      • EMC Corporation: EMC is a multinational provider of products and services related to cloud computing, storage, big data, data analytics, information security, content management, and converged infrastructure. EMC was acquired by Dell in September 2016, and the company was renamed Dell EMC.
      • MapR converged data platforms: A platform for all data and applications. With MapR, users have a single platform, on a single codebase, that delivers data-wide convergence. MapR describes it as the only platform with a distributed file system that supports storage and analytics of data streams, files, and NoSQL tables within a single converged platform.
      • Apache Pig: Apache Pig is an abstraction over MapReduce. It is a tool and platform used to analyze large datasets by representing them as data flows. Pig is generally used with Hadoop, where it can perform all the data manipulation operations.
  • 42. Big databases:
      • Hive: Hive allows users to read, write, and manage petabytes of data using SQL. Hive is built on top of Apache Hadoop, an open-source framework used to efficiently store and process large datasets. As a result, Hive is closely integrated with Hadoop and is designed to work quickly on petabytes of data.
      • HadoopDB: Hadoop handles larger datasets but follows a write-once model, whereas SQL databases are easier to use but more difficult to scale. Apache Hadoop is an open-source framework used to efficiently store and process large datasets ranging in size from gigabytes to petabytes. Instead of using one large computer to store and process the data, Hadoop clusters multiple computers to analyze massive datasets in parallel more quickly.
      • MongoDB: MongoDB is an open-source document-oriented database designed to store data at large scale and to work with that data efficiently. It is categorized as a NoSQL (Not only SQL) database because data is not stored and retrieved in the form of tables.
      • Google Bigtable: Bigtable is ideal for storing large amounts of single-keyed data with low latency. It supports high read and write throughput at low latency, and it is an ideal data source for MapReduce operations.
  • 43. Statistical tools:
      • Norsys Netica: Netica is a complete program for working with belief networks and influence diagrams. It has an intuitive user interface for drawing the networks, and the relationships between variables may be entered as individual probabilities, in the form of equations, or learned from data files (which may be in ordinary tab-delimited form and have "missing data").
      • Weka: Weka is a collection of machine-learning algorithms for data mining tasks. The algorithms can either be applied directly to a dataset or called from your own Java code. Weka contains tools for data preprocessing, classification, regression, clustering, association rules, and visualization.
  • 44. References: The objective of the proposed work is to analyze agriculture data using data mining. The agriculture data was collected from the following sources:
      • Dataset in agricultural sector [, statistics].
      • Crop-wise agriculture data [html://CROPWISE_NORMAL_AREA].
      • Agriculture data of different districts [], G/statistics.asp].
      • Agriculture data based on weather, temperature, and relative humidity [http://dmc.].