5. Data, data and data…
The data integration objective is to "offer uniform
access to a set of autonomous and heterogeneous
data sources".
Doan, AnHai, Alon Halevy, and Zachary Ives. Principles of data integration. Elsevier, 2012.
8. Examples: operations files
● Point data
● Very dense
● Planting, harvesting, and
applications
● Many different types of
files and formats
● Pieces of operations
delivered from different
sources
https://medium.com/leaf-agriculture/merge-of-files-into-operations-1e62726df64d
9. Examples: satellite imagery
● Raster data
● Very dense
● Different light spectrum
frequencies (bands)
● Many different types of files
and formats
● Different resolution
https://withleaf.io/en/blog/ndvi-vs-ndre/
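To make the "bands" bullet concrete, here is a minimal sketch of a normalized difference index computed per pixel from two bands. NDVI is (NIR - Red) / (NIR + Red), and NDRE uses the red-edge band in place of red. The tiny 2x2 rasters and their values are invented for illustration; a real pipeline would read the bands from an image file.

```python
def normalized_difference(band_a, band_b):
    """Per-pixel (a - b) / (a + b); None where the sum is zero."""
    result = []
    for row_a, row_b in zip(band_a, band_b):
        out_row = []
        for a, b in zip(row_a, row_b):
            total = a + b
            out_row.append(None if total == 0 else (a - b) / total)
        result.append(out_row)
    return result


# Hypothetical 2x2 reflectance rasters (values invented for illustration).
nir = [[0.50, 0.60], [0.40, 0.0]]
red = [[0.10, 0.20], [0.40, 0.0]]

# NDVI = (NIR - Red) / (NIR + Red); pass a red-edge band instead of red for NDRE.
ndvi = normalized_difference(nir, red)
```

Pixels where both bands are zero (for example, nodata areas) are returned as None rather than raising a division error.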
12. A taxonomy for Precision Agriculture
https://www.linkedin.com/pulse/towards-taxonomy-data-precision-agriculture-time-luiz-henrique
13. Challenges - why not just use the standards?
● Many formats, mostly general purpose
● GIS and Big Data
● Lack of trained developers
● Technology will not wait for our standards
14. Challenges - Many types of data sources
● Irrigation
● Soil
● Weather
● FMIS
● …
● And what is still being developed will also need data integration (robots, Ag BioTech)
15. Challenges
● Big areas + Big time = Big data
● Storing, processing, securing, making sense
21. What are people doing with this data?
● Crop Insurance:
○ https://withleaf.io/en/blog/3-reasons-to-use-more-data-in-crop-insurance/
● Seed improvement
● Farm management
○ Machine operations
○ Field monitoring
● Farm financial organization
● Ecological assets management
● …
22. Opportunities
● Blockchain
○ Smart contracts
● Big data in general
○ Data lakes
■ Huge amount of data
■ Historical data is very important
○ NoSQL
■ Highly unstructured data
● GIS
○ Almost everything
23. Opportunities
● Data integration
○ Offer different versions of the same data:
■ Raw data
■ Standardized data
■ Cleaned
■ Operations
■ Image
○ Link the data:
■ Operations files relate to fields, which relate to machines, which relate to the resources used
● Developer centric approach
○ Better documentation
○ Demystify AgTech for developers (we are just starting)
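The "link the data" idea above can be sketched as a small data model in which each operation points to a field, a machine, and the resources used. All class names, field names, and identifiers below are hypothetical and chosen only for illustration; they are not Leaf's schema.

```python
from dataclasses import dataclass, field


@dataclass
class Operation:
    operation_id: str
    kind: str          # e.g. "planting", "harvesting", "application"
    field_id: str      # link to the field where the operation happened
    machine_id: str    # link to the machine that ran it
    resources: list = field(default_factory=list)  # e.g. seed or product names


def operations_by_field(operations):
    """Index operations by field id so related records can be followed."""
    index = {}
    for op in operations:
        index.setdefault(op.field_id, []).append(op)
    return index


# Hypothetical records for illustration.
ops = [
    Operation("op-1", "planting", "field-A", "machine-1", ["seed-X"]),
    Operation("op-2", "harvesting", "field-A", "machine-2"),
    Operation("op-3", "application", "field-B", "machine-1", ["product-Y"]),
]
by_field = operations_by_field(ops)
machines_on_a = sorted({op.machine_id for op in by_field["field-A"]})
```

With the links in place, a question such as "which machines worked on field A?" becomes a simple lookup instead of a search through raw files.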
24. How AI can Improve Agriculture
● AI will surely improve all the traditional fields of agriculture, but the following have huge potential:
○ Robotics
○ ESG
○ Real Time insights
■ Hydric stress
■ Machine maintenance
■ Farm to fork
25. Robotics
https://www.youtube.com/watch?v=Ql8MbI2oXzM
● Involves many segments of ML/AI
including vision and sensors
● Benefits of the use of smart
machines in agriculture:
○ Reduce waste
○ Reduce resource consumption, especially water
○ Reduce pollution
○ Improve safety and labor
quality for the workers
26. Real Time insights
● Providing short-term insights to farmers will become crucial
● Examples: pests, hydric stress, machinery breakdown
● This involves feeding a huge amount of data into machine learning pipelines and delivering insights directly in the field
28. How to start with Leaf?
1. Register your account at:
https://withleaf.io/account/quickstart
2. We will provide sample data for satellite and
operations - we can help with more data upon
request
3. Start building:
○ https://learn.withleaf.io/
○ https://github.com/Leaf-Agriculture/Leaf-API-Postman-Collection
4. Register for free credits:
○ https://withleaf.io/account/startups
29. Ideas on how to organize a data pipeline for production
● Remember: GIS + BigData (time series) + ML
● GeoJSON as format
● GIS manipulation
○ Python is the best
○ GDAL + GeoPandas
● BigData
○ Spark and Sedona
● Database
○ MongoDB
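As a starting point for the "GeoJSON as format" bullet above, here is a minimal sketch that parses a GeoJSON FeatureCollection of points into flat rows using only the Python standard library. The property names (`yield`, `moisture`) and the coordinates are assumptions made up for this example; with GeoPandas installed, `geopandas.read_file` would load a GeoJSON file into a GeoDataFrame instead.

```python
import json

# Hypothetical sample; property names and values are invented for illustration.
geojson_text = """
{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature",
     "geometry": {"type": "Point", "coordinates": [-93.50, 41.60]},
     "properties": {"yield": 10.2, "moisture": 14.1}},
    {"type": "Feature",
     "geometry": {"type": "Point", "coordinates": [-93.51, 41.61]},
     "properties": {"yield": 7.8, "moisture": 19.5}}
  ]
}
"""


def load_points(text):
    """Flatten Point features into dicts; GeoJSON stores longitude first."""
    collection = json.loads(text)
    rows = []
    for feature in collection["features"]:
        geometry = feature["geometry"]
        if geometry["type"] != "Point":
            continue  # this sketch only handles point data
        lon, lat = geometry["coordinates"][:2]
        rows.append({"lon": lon, "lat": lat, **feature["properties"]})
    return rows


points = load_points(geojson_text)
```

Each row carries the coordinates next to the measured properties, which is a convenient shape for the ML steps later in the pipeline.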
31. Infrastructure: MongoDB
● Faster and more scalable than PostGIS
○ Especially true when you have a heavy write load
● GeoJSON :)
● But avoid storing point data in the database as much as possible; it is better kept in GeoJSON or GeoParquet files
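One way to read the advice above is to store the field boundary as GeoJSON in MongoDB and keep only a reference to the point file. The sketch below builds such a document and a `$geoIntersects` filter as plain dictionaries, so it runs without a database. The document layout, the `points_file` key, and the file location are hypothetical; the pymongo calls in the comments are not executed here.

```python
def field_document(field_id, boundary_ring, points_file):
    """Build a MongoDB-ready document: boundary as GeoJSON, points by reference."""
    if boundary_ring[0] != boundary_ring[-1]:
        raise ValueError("a GeoJSON linear ring must be closed")
    return {
        "_id": field_id,
        "geometry": {"type": "Polygon", "coordinates": [boundary_ring]},
        "points_file": points_file,  # hypothetical key: where the dense data lives
    }


def intersects_filter(lon, lat):
    """Query filter for documents whose geometry contains or touches a point."""
    point = {"type": "Point", "coordinates": [lon, lat]}
    return {"geometry": {"$geoIntersects": {"$geometry": point}}}


doc = field_document(
    "field-A",
    [[-93.6, 41.5], [-93.4, 41.5], [-93.4, 41.7], [-93.6, 41.7], [-93.6, 41.5]],
    "s3://example-bucket/field-A/harvest.parquet",  # hypothetical location
)
query = intersects_filter(-93.5, 41.6)

# With pymongo and a running server (assumed, not executed in this sketch):
#   collection.create_index([("geometry", "2dsphere")])
#   collection.insert_one(doc)
#   matches = list(collection.find(query))
```

The database then holds one small document per field, while the millions of points stay in files that Spark or GeoPandas can read directly.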
33. Example: identify yield level in different
areas of a field
https://www.linkedin.com/pulse/creating-field-zones-apache-spark-using-leaf-data-luiz-henrique/
● Task:
○ Read a file with yield point data (provided by Leaf as a GeoJSON file:
https://learn.withleaf.io/docs/operations_sample_output#field-operations-filtered-geojson)
○ Classify the points using k-means
○ Generate polygons using Apache
Sedona buffer function
○ …
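The classification step above can be illustrated with a small k-means on the yield values alone. This is a plain-Python stand-in written for this example (Lloyd's algorithm on scalars), not the Spark ML implementation used in the linked article, and the yield values are invented.

```python
def kmeans_1d(values, k, iterations=20):
    """Lloyd's algorithm on scalar values; returns (centroids, labels)."""
    lo, hi = min(values), max(values)
    # Deterministic start: centroids evenly spaced between min and max.
    centroids = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each value goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [v for v, label in zip(values, labels) if label == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = sum(members) / len(members)
    return centroids, labels


# Hypothetical yield readings for nine points of a field.
yields = [2.1, 2.3, 2.0, 5.0, 5.2, 4.9, 8.1, 8.3, 7.9]
centroids, labels = kmeans_1d(yields, k=3)
```

On this sample the three clusters correspond to low, medium, and high yield; on real data the points in each cluster would then be buffered and merged into zone polygons.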
34. Example: identify yield level in different
areas of a field
https://www.linkedin.com/pulse/creating-field-zones-apache-spark-using-leaf-data-luiz-henrique/
● Task:
○ …
○ Discover which variable affected the
low yield areas using regression
○ Send an alert to the farm with the
property that is affecting the yield
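A minimal sketch of the regression step above: fit a simple least-squares line of yield against each candidate variable and rank the variables by R². The variable names and values are invented for illustration, and a high R² only flags a candidate for the alert; it does not by itself establish the cause.

```python
def simple_regression(x, y):
    """Ordinary least squares for y = slope * x + intercept.

    Returns (slope, intercept, r2).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    syy = sum((b - mean_y) ** 2 for b in y)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r2 = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r2


# Hypothetical readings for points in a low-yield zone.
yield_values = [2.0, 3.0, 4.0, 5.0, 6.0]
candidates = {
    "moisture": [20.0, 18.0, 16.0, 14.0, 12.0],
    "speed": [5.0, 7.0, 5.0, 7.0, 5.0],
}

fits = {name: simple_regression(x, yield_values) for name, x in candidates.items()}
# Pick the variable whose line explains the most variance in yield.
best = max(fits, key=lambda name: fits[name][2])
```

Here `best` names the property to report to the farm, together with the sign of its slope.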
35. Example: mixing Spark ML and Sedona
● Imagine a field where there is a harvest operation
● This field will have different levels of yield depending
on the point
● We can use Spark ML to group the points according to moisture
36. Example: mixing Spark ML and Sedona
○ Transform the RDD into a SpatialRDD
○ Run a SQL transformation using the buffer
function
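To show what the buffer step produces, here is a plain-Python approximation of buffering a point into a polygon. It is a stand-in written for this example, not Sedona's implementation; in Sedona SQL the equivalent operation is the `ST_Buffer` function. Coordinates are assumed to be planar (projected, for example in meters), and the function name and parameters are hypothetical.

```python
import math


def buffer_point(x, y, radius, segments=16):
    """Approximate a circular buffer around (x, y) as a GeoJSON-style Polygon."""
    ring = [
        [
            x + radius * math.cos(2 * math.pi * i / segments),
            y + radius * math.sin(2 * math.pi * i / segments),
        ]
        for i in range(segments)
    ]
    ring.append(ring[0])  # close the ring: first and last positions match
    return {"type": "Polygon", "coordinates": [ring]}


# Buffer one hypothetical harvest point by 5 units (meters in a projected CRS).
polygon = buffer_point(100.0, 200.0, 5.0, segments=8)
```

Buffering every point of a cluster and merging the overlapping polygons yields the zones of the field that share a yield or moisture level.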
37. Conclusions
● The data created on farms is yet to be unlocked
● The standards are very important, but only part of
the solution
● Even with the best standards, there will always be a
gap for new data providers and use cases
https://withleaf.io/registration/
38. Where to learn more?
● Journals
○ AgFunder
○ Computers and Electronics in Agriculture
● Podcasts
○ AgTech… so what?
https://www.agtechsowhat.com/
○ AgTech Garage podcast
https://open.spotify.com/show/3MhDeWCL5ElGk228WcCs9w
● Leaf's blog and webinars:
https://withleaf.io/en/blog