Monsanto Company Confidential - Attorney Client Privilege
Geospatial Processing @ Monsanto
Hadoop Summit 2013
Robert Grailer, Big Data Engineer
Erich Hochmuth, Data & Analytics Architecture Lead
Our Vision: Sustainable Agriculture
A Strong Vision That Guides All We Do
• Producing More
– We are committed to increasing yields to meet
the growing demand for food, fiber & fuel
• Conserving More
– We are committed to reducing the amount
of land, water and energy needed to
grow our crops
• Improving Lives
– We are committed to improving lives around
the world
• ADVANCED EQUIPMENT
• AVERAGE CORN YIELD – 300 BU/AC
• AUTOMATED WEATHER STATIONS
• FIELD SENSORS PROVIDING INFORMATION
• ADVANCED IMAGERY TECHNOLOGY
Doubling Yields by 2030 - Farming in the Future
Will Be Increasingly Information-Driven
Planting Prescription 2012 (DKC63-84 Brand)
Target Rate (Count) (ksds/ac)    Area (ac)
38.00                            24.75
37.00                            22.63
35.00                            16.60
34.00                             8.23
33.00                             6.00
32.00                             2.82
Integrated Farming Systems – FieldScripts℠ for 2014
• FieldScripts℠ will deliver, by field, a corn hybrid recommendation utilizing variable
rate seeding by FieldScripts management zones to increase yield potential and
reduce risk
• The science of FieldScripts is based on proprietary algorithms that combine data
from the FieldScripts Testing Network and Monsanto generated hybrid response to
plant population research
Precision Planting
IL Irrigated, Back 80
Treatment               Yield (bu/ac)
Static (34000)          196
FieldScripts (35000)    233

Central IL Dry Land, 47-50
Treatment               Yield (bu/ac)
Static (34000)          139
FieldScripts (33000)    145

MS Irrigated, 21
Treatment               Yield (bu/ac)
Static (34000)          166
FieldScripts (34700)    181
2012 Field Trials Indicate 5-10 bu/ac Average Yield Gain
In the United States Alone:
Corn acres planted in 2013 – 96M
Price of Corn per bushel – $6.93*
Advantage of 5–10 Bu/Ac
*Price reflects CBOT price of corn 1/9/2013
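The arithmetic behind this slide can be sketched as follows. It uses only the three figures stated above (96M acres, $6.93 per bushel, 5 to 10 bu/ac) and assumes, purely for illustration, that the gain applies to every planted acre, so the result is an upper bound of roughly $3.3 to $6.7 billion per year. The function name is hypothetical.

```python
# Figures taken from the slide; applying the gain to every acre is an assumption.
CORN_ACRES_2013 = 96_000_000      # corn acres planted in the US, 2013
PRICE_PER_BUSHEL = 6.93           # CBOT price on 1/9/2013, in USD


def added_value_usd(gain_bu_per_acre: float) -> float:
    """Upper-bound value of a uniform yield gain across all planted acres."""
    return CORN_ACRES_2013 * gain_bu_per_acre * PRICE_PER_BUSHEL


low = added_value_usd(5)    # low end of the 5-10 bu/ac range
high = added_value_usd(10)  # high end of the range
print(f"${low / 1e9:.2f}B to ${high / 1e9:.2f}B")
```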
Integrated Farming Systems℠ Combine Advanced Seed
Genetics, On-farm Agronomic Practices, Software and Hardware
Innovations to Drive Yield
DATABASE BACKBONE
Expansive product by environment testing makes on-farm prescriptions possible
VARIABLE-RATE FERTILITY
Variable rate N, P & K “Apps” aligned with yield management zones
PRECISION SEEDING
Planter hardware systems enabling variable rate seeding & row spacing of multiple hybrids in a field by yield management zone
FERTILITY & DISEASE MANAGEMENT
“Apps” for in-season custom application of supplemental late nitrogen and fungicides
YIELD MONITOR
Advances in Yield Monitoring to deliver higher resolution data
BREEDING
Significant increases in data points collected per year to increase annual rate of genetic gain
Use Case
[Diagram: Public Data, Monsanto Data, Grower Data -> Standardize & Link -> Algorithms]
• Load thousands of files containing spatial data
• Support diverse range of data types
— tabular, vector, raster
• Join & link data spatially
• Generate dense grid covering entire US
— 120 billion polygons
• Generate a set of derived attributes
— Think moving average
• Make data available for other data products such as FieldScripts
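As one illustration of a "moving average" style derived attribute, the sketch below averages each raster cell with its neighbours inside a square window. It is a plain-Python toy, not the pipeline's algorithm: the function name, the window radius and the edge handling (windows are clipped at the raster border) are all assumptions.

```python
def moving_average(grid, radius=1):
    """Mean of each cell's (2*radius+1)^2 neighbourhood, clipped at the edges.

    `grid` is a dense raster given as a list of equal-length rows.
    """
    n_rows, n_cols = len(grid), len(grid[0])
    result = []
    for r in range(n_rows):
        out_row = []
        for c in range(n_cols):
            # Collect the values inside the window, staying within the raster.
            window = [
                grid[rr][cc]
                for rr in range(max(0, r - radius), min(n_rows, r + radius + 1))
                for cc in range(max(0, c - radius), min(n_cols, c + radius + 1))
            ]
            out_row.append(sum(window) / len(window))
        result.append(out_row)
    return result


elevation = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]  # hypothetical sample values
smoothed = moving_average(elevation)
```

Because every output cell depends only on a small neighbourhood, the raster can be cut into tiles (with a border of `radius` cells) and processed in parallel.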
High Level Data Flow
Version 1 Architecture
• In RDBMS spatial
• PL/SQL
• Multiple patches to DB Engine
• Just 8% of the data!!
– 35+ days to process
• TBs in indexes
• Tradeoffs
– Compressed vs. Uncompressed
– Performance vs. Storage
– Read vs. Write performance
• Options/recommendations
– Limit use of in DB spatial functionality
– Buy more RDBMS
[Chart: Data Processing Time, in days (axis 0-30): Soil, Elevation, Spatial Index, Processing]
[Chart: Data Volumes, in TBs (axis 0-100): Raw Data, Uncompressed, Compressed, Spatial Index]
Version 2 Architecture
• Combination of MapReduce & HBase
• Leverage existing Hadoop cluster
• MapReduce
– Parallelize everything!
– Bulk HBase loads
• HBase
– Spatial data model
– Custom spatial engine
Data Ingestion
• Bulk load 1,000s of files into HDFS
• Standardize data
– Common usable format
• Storage vs. Compute
• Raster format is easily splittable
• Hadoop Streaming integrated with GDAL
• Streaming API Lessons Learned
– Lack of documentation
– Counters to track task progress
– Jobs run as mapred user
– HDFS Access outside of MR
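The counter lesson above can be sketched with a small streaming mapper. Hadoop Streaming picks up lines of the form `reporter:counter:<group>,<counter>,<amount>` and `reporter:status:<message>` written to stderr. The rest is an assumption for illustration: the function names, the counter group, the output naming, and the choice of `gdalwarp -t_srs` with EPSG:4326 as the re-projection step. The command is only built here, not executed.

```python
import sys


def counter_line(group, name, amount=1):
    """Counter update in the format Hadoop Streaming reads from stderr."""
    return f"reporter:counter:{group},{name},{amount}\n"


def status_line(message):
    """Task status update in the format Hadoop Streaming reads from stderr."""
    return f"reporter:status:{message}\n"


def reproject_command(src, dst, target_srs="EPSG:4326"):
    """GDAL command line for re-projecting one raster (target SRS is assumed)."""
    return ["gdalwarp", "-t_srs", target_srs, src, dst]


def process_path(path):
    """Return (stdout record, stderr lines) for one input file path."""
    dst = path + ".reprojected.tif"          # hypothetical output naming
    cmd = reproject_command(path, dst)
    # A real mapper would run the command here, e.g. subprocess.run(cmd, check=True).
    record = f"{path}\t{dst}\n"
    reports = [status_line(f"processing {path}"),
               counter_line("ingest", "files_processed")]
    return record, reports


def run_mapper(lines, out=sys.stdout, err=sys.stderr):
    """Streaming mapper loop: one file path per input line."""
    for line in lines:
        path = line.strip()
        if not path:
            continue
        record, reports = process_path(path)
        for report in reports:
            err.write(report)
        out.write(record)
```

In a streaming job the script would end with `run_mapper(sys.stdin)`; the input is then a text file listing one raster path per line.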
Results
[Chart: Data Ingestion Time, in hours (axis 0-60): RDBMS vs. Hadoop]
[Diagram: NFS (raster images, vector shape files, zip files, text data) -> Hadoop Streaming (unzip, convert to raster, re-project) -> HDFS (raster files)]
Data Processing
• Process raster data
– Dense matrix
• Generic InputFormat & RecordReader for
raster data
• HFiles easily transportable between clusters
• Challenges tuning Jobs
– IO Sort Factor
– Split/Task Size
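Pre-splitting the table before generating HFiles means choosing region boundaries up front, so the bulk-load job can write one sorted set of HFiles per region. A minimal sketch of picking evenly spaced split keys, assuming row keys are non-negative long IDs stored as 8-byte big-endian values (so byte order matches numeric order); the function name and the example ID range are hypothetical:

```python
import struct


def split_keys(min_id, max_id, n_regions):
    """Return n_regions - 1 row keys dividing [min_id, max_id] into equal ranges."""
    if n_regions < 1 or max_id < min_id or min_id < 0:
        raise ValueError("need n_regions >= 1 and 0 <= min_id <= max_id")
    span = max_id - min_id + 1
    boundaries = [min_id + (span * i) // n_regions for i in range(1, n_regions)]
    # 8-byte big-endian encoding keeps lexicographic and numeric order aligned.
    return [struct.pack(">q", b) for b in boundaries]


keys = split_keys(0, 999, 4)  # hypothetical ID range, 4 regions
```

An even split only balances the regions if the keys are dense and uniformly populated, which is the case for a dense grid.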
[Diagram: HDFS (raster files) -> Generate Derived Attributes -> Generate HFiles -> HBase (pre-split table)]
Results
[Chart: Data Processing Time, in days (axis 0-30): RDBMS vs. Hadoop]
HBASE SCHEMA DESIGN
Geospatial in HBase
Need
– Dense data set
– Complex computations
– Scalable & cost efficient
– Bulk analytics & random reads
HBase
– GeoHash most notable example
• Best suited for sparse data
– Precision of reads
– Alphanumeric key
HBase Considerations
– Key overhead
– Scan vs. Get performance
– Reduce reading unnecessary data
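For context on the "alphanumeric key" point, this is a minimal sketch of the standard public GeoHash encoding (not the custom spatial engine from this talk). It interleaves longitude and latitude bits, longitude first, and emits one base-32 character per 5 bits. The function name is an assumption.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # GeoHash alphabet (no a, i, l, o)


def geohash_encode(lat, lon, precision=12):
    """Encode a latitude/longitude pair as a GeoHash string."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits, bit_count = 0, 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits, lon_lo = (bits << 1) | 1, mid
            else:
                bits, lon_hi = bits << 1, mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits, lat_lo = (bits << 1) | 1, mid
            else:
                bits, lat_hi = bits << 1, mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)


print(geohash_encode(42.6, -5.6, precision=5))  # prints "ezs42"
```

A longer hash is a finer cell, so a string key carries its own precision. For a dense grid where every cell exists, that flexibility buys little and each key costs more bytes than a fixed-width numeric cell ID.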
Example Field
Complex Data Interactions
Global Coordinate System
[Figure: coordinate axes, Longitude (-180 to 180) and Latitude (-90 to 90)]
Reference System
[Figure: coordinate axes, Longitude (-180 to 180) and Latitude (-90 to 90)]
Reference System Continued
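[Figure: grid laid over the Longitude (-180 to 180) by Latitude (-90 to 90) plane, with cells numbered row by row: 1, 2, 3, 4, ... 19, 20 in the first row; 21, 22, 23, ... in the second; a cell labeled 190; and 381, 382, ... 399, 400 in the last row]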
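The numbering in this figure maps a coordinate to a single integer cell ID. A minimal sketch of that mapping, using the 20 x 20 illustrative grid from the slide (the production grid is far finer) and assuming that numbering starts at 1 in the north-west corner and runs west to east, then north to south. The function name and the numbering direction are assumptions; the slide shows only the numbers.

```python
def cell_id(lat, lon, n_cols=20, n_rows=20):
    """Row-major, 1-based cell ID for a latitude/longitude pair."""
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        raise ValueError("coordinate outside the global extent")
    cell_width = 360.0 / n_cols    # degrees of longitude per cell
    cell_height = 180.0 / n_rows   # degrees of latitude per cell
    # Clamp so the eastern and southern edges fall in the last column/row.
    col = min(int((lon + 180.0) / cell_width), n_cols - 1)
    row = min(int((90.0 - lat) / cell_height), n_rows - 1)
    return row * n_cols + col + 1


corner_ids = [cell_id(90, -180), cell_id(90, 180),
              cell_id(-90, -180), cell_id(-90, 180)]  # the four grid corners
```

Because the ID is pure arithmetic on the coordinate, no spatial index lookup is needed to find the row key for a location.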
HBase Schema Take 1
Spatial Table
• Key: cell_id long
• Column Family: A
– Column: Data Holder
• elevation: float
• slope: float
• aspect: float
• Each spatial dataset is a separate table
• All attributes for a layer that are read together are stored together
‒ Attributes packed into a single column as an Avro object
• 1 row per record
• 120 billion rows total!
• 1,000s of Get requests per field
• TBs of key overhead – roughly 56% of the data
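Why the key overhead is so large with one row per cell: each HBase KeyValue stores the full key (row, column family, qualifier, timestamp, type, plus length fields) next to the value. The sketch below estimates the key share from that uncompressed KeyValue layout. The sizes passed in are illustrative assumptions, not figures from the talk, and on-disk HFiles may differ because of block encoding and compression.

```python
def keyvalue_overhead(row_len, family_len, qualifier_len, value_len):
    """Fraction of an uncompressed HBase KeyValue that is not the value itself."""
    # key length (4) + value length (4) + row length (2) + family length (1)
    # + timestamp (8) + key type (1)
    fixed = 4 + 4 + 2 + 1 + 8 + 1
    key_bytes = fixed + row_len + family_len + qualifier_len
    return key_bytes / (key_bytes + value_len)


# Assumed sizes: 8-byte long row key, 1-byte family "A", 1-byte qualifier,
# and a small packed record of 24 bytes for a single cell.
per_cell = keyvalue_overhead(8, 1, 1, 24)

# Same key, but with a large value holding many cells' attributes in one row.
packed = keyvalue_overhead(8, 1, 1, 120_000)
```

With these assumed sizes the key is more than half of every stored cell, while packing many values behind one key makes the same 30 bytes negligible.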
Reference System Storage Format
• Data grouped into 100 x 100 super cells
• A super cell of 100 x 100 cells is a single row in HBase
• At most 4 disk reads are required to read all data for one layer for a 150 acre field
• Given a bounding box, the super cells and the attributed grid cells containing the desired
data can easily be computed
• A generic geospatial data service, when given a set of layers, reads each layer in
parallel
• Overhead of key data reduced from 56% to below 0.1%
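The bounding-box lookup can be sketched as integer arithmetic on zero-based grid rows and columns. Assumptions for illustration: super cells are numbered row-major across the super grid, and the grid width `n_cols` is passed in; the names are hypothetical. A bounding box spanning fewer than 100 cells per side can straddle at most one super-cell boundary in each direction, which is where the "at most 4 reads" bound comes from.

```python
SUPER = 100  # cells per super-cell side, from the slide


def locate(row, col, n_cols):
    """Return (super_cell_id, offset within the super cell) for one grid cell."""
    supers_per_row = -(-n_cols // SUPER)  # ceiling division
    s_row, r_in = divmod(row, SUPER)
    s_col, c_in = divmod(col, SUPER)
    return s_row * supers_per_row + s_col, r_in * SUPER + c_in


def super_cells_for_bbox(row_min, col_min, row_max, col_max, n_cols):
    """Super-cell IDs (one HBase row each) covering an inclusive cell bounding box."""
    supers_per_row = -(-n_cols // SUPER)
    return [
        s_row * supers_per_row + s_col
        for s_row in range(row_min // SUPER, row_max // SUPER + 1)
        for s_col in range(col_min // SUPER, col_max // SUPER + 1)
    ]


# A small field straddling a super-cell corner on a hypothetical 1000-column grid.
rows_to_read = super_cells_for_bbox(95, 95, 104, 104, n_cols=1000)
```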
Super Grid Cells
Attributed Grid Cells
Spatial Table
• Key: super_cell_id long
• Column Family: A
– Column: Data Holder
• elevation: array float [ values ]
• slope: array float [ values ]
• aspect: array float [ values ]
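A minimal sketch of this row layout: one 8-byte row key per super cell and one array of 10,000 floats per attribute, with a single cell read back by its offset. Python's `struct` is used here only as a stand-in; the serialization actually used for the arrays is not stated on this slide, and the function names and sample values are hypothetical.

```python
import struct

CELLS_PER_SUPER_CELL = 100 * 100


def row_key(super_cell_id):
    """8-byte big-endian key, so byte order matches numeric order for IDs >= 0."""
    return struct.pack(">q", super_cell_id)


def pack_layer(values):
    """Pack one attribute's values for a whole super cell into a single blob."""
    if len(values) != CELLS_PER_SUPER_CELL:
        raise ValueError("expected one value per cell in the super cell")
    return struct.pack(f">{CELLS_PER_SUPER_CELL}f", *values)


def read_cell(blob, offset):
    """Read one cell's value from a packed attribute array by its offset."""
    return struct.unpack_from(">f", blob, 4 * offset)[0]


elevation = [i * 0.5 for i in range(CELLS_PER_SUPER_CELL)]  # sample values
blob = pack_layer(elevation)
value = read_cell(blob, 5050)  # cell at row 50, column 50 of the super cell
```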
Results
• Significant cost savings in required hardware
• 120 billion unique polygons in total
• 1.5 trillion data points
• Dense grid of the entire U.S.
• Foundational architecture for other spatial data sets
• Fully unit tested implementation
RDBMS
• 4 states only
• 30+ days to load
• 8 months of dev.
Hadoop
• Entire U.S.
• 18 hour load time
• 3 months of dev.
• 100% scalable
• Cloud ready
[Chart: Total Data Processing Time (Total Run Time), in days (axis 0-30): RDBMS (8% of the data) vs. Hadoop (full data set)]
Thank You
20

More Related Content

What's hot

Schema-on-Read vs Schema-on-Write
Schema-on-Read vs Schema-on-WriteSchema-on-Read vs Schema-on-Write
Schema-on-Read vs Schema-on-Write
Amr Awadallah
 
ProxySQL Tutorial - PLAM 2016
ProxySQL Tutorial - PLAM 2016ProxySQL Tutorial - PLAM 2016
ProxySQL Tutorial - PLAM 2016Derek Downey
 
Apache Phoenix: Transforming HBase into a SQL Database
Apache Phoenix: Transforming HBase into a SQL DatabaseApache Phoenix: Transforming HBase into a SQL Database
Apache Phoenix: Transforming HBase into a SQL DatabaseDataWorks Summit
 
Percona Xtrabackup - Highly Efficient Backups
Percona Xtrabackup - Highly Efficient BackupsPercona Xtrabackup - Highly Efficient Backups
Percona Xtrabackup - Highly Efficient Backups
Mydbops
 
Wars of MySQL Cluster ( InnoDB Cluster VS Galera )
Wars of MySQL Cluster ( InnoDB Cluster VS Galera ) Wars of MySQL Cluster ( InnoDB Cluster VS Galera )
Wars of MySQL Cluster ( InnoDB Cluster VS Galera )
Mydbops
 
Spark overview
Spark overviewSpark overview
Spark overview
Lisa Hua
 
MongoDB Breakfast Milan - Mainframe Offloading Strategies
MongoDB Breakfast Milan -  Mainframe Offloading StrategiesMongoDB Breakfast Milan -  Mainframe Offloading Strategies
MongoDB Breakfast Milan - Mainframe Offloading Strategies
MongoDB
 
Terrastore - A document database for developers
Terrastore - A document database for developersTerrastore - A document database for developers
Terrastore - A document database for developersSergio Bossa
 
Hadoop Security Architecture
Hadoop Security ArchitectureHadoop Security Architecture
Hadoop Security Architecture
Owen O'Malley
 
Moving to Databricks & Delta
Moving to Databricks & DeltaMoving to Databricks & Delta
Moving to Databricks & Delta
Databricks
 
Hadoop HDFS
Hadoop HDFSHadoop HDFS
Hadoop HDFS
Vigen Sahakyan
 
Apache doris (incubating) introduction
Apache doris (incubating) introductionApache doris (incubating) introduction
Apache doris (incubating) introduction
leanderlee2
 
Hadoop Overview kdd2011
Hadoop Overview kdd2011Hadoop Overview kdd2011
Hadoop Overview kdd2011
Milind Bhandarkar
 
MongoDB Aggregation Performance
MongoDB Aggregation PerformanceMongoDB Aggregation Performance
MongoDB Aggregation Performance
MongoDB
 
Apache Sqoop Tutorial | Sqoop: Import & Export Data From MySQL To HDFS | Hado...
Apache Sqoop Tutorial | Sqoop: Import & Export Data From MySQL To HDFS | Hado...Apache Sqoop Tutorial | Sqoop: Import & Export Data From MySQL To HDFS | Hado...
Apache Sqoop Tutorial | Sqoop: Import & Export Data From MySQL To HDFS | Hado...
Edureka!
 
Introducing Change Data Capture with Debezium
Introducing Change Data Capture with DebeziumIntroducing Change Data Capture with Debezium
Introducing Change Data Capture with Debezium
ChengKuan Gan
 
Real-Time Spark: From Interactive Queries to Streaming
Real-Time Spark: From Interactive Queries to StreamingReal-Time Spark: From Interactive Queries to Streaming
Real-Time Spark: From Interactive Queries to Streaming
Databricks
 
Apache HBase™
Apache HBase™Apache HBase™
Apache HBase™
Prashant Gupta
 
[Pgday.Seoul 2017] 6. GIN vs GiST 인덱스 이야기 - 박진우
[Pgday.Seoul 2017] 6. GIN vs GiST 인덱스 이야기 - 박진우[Pgday.Seoul 2017] 6. GIN vs GiST 인덱스 이야기 - 박진우
[Pgday.Seoul 2017] 6. GIN vs GiST 인덱스 이야기 - 박진우
PgDay.Seoul
 
ProxySQL Cluster - Percona Live 2022
ProxySQL Cluster - Percona Live 2022ProxySQL Cluster - Percona Live 2022
ProxySQL Cluster - Percona Live 2022
René Cannaò
 

What's hot (20)

Schema-on-Read vs Schema-on-Write
Schema-on-Read vs Schema-on-WriteSchema-on-Read vs Schema-on-Write
Schema-on-Read vs Schema-on-Write
 
ProxySQL Tutorial - PLAM 2016
ProxySQL Tutorial - PLAM 2016ProxySQL Tutorial - PLAM 2016
ProxySQL Tutorial - PLAM 2016
 
Apache Phoenix: Transforming HBase into a SQL Database
Apache Phoenix: Transforming HBase into a SQL DatabaseApache Phoenix: Transforming HBase into a SQL Database
Apache Phoenix: Transforming HBase into a SQL Database
 
Percona Xtrabackup - Highly Efficient Backups
Percona Xtrabackup - Highly Efficient BackupsPercona Xtrabackup - Highly Efficient Backups
Percona Xtrabackup - Highly Efficient Backups
 
Wars of MySQL Cluster ( InnoDB Cluster VS Galera )
Wars of MySQL Cluster ( InnoDB Cluster VS Galera ) Wars of MySQL Cluster ( InnoDB Cluster VS Galera )
Wars of MySQL Cluster ( InnoDB Cluster VS Galera )
 
Spark overview
Spark overviewSpark overview
Spark overview
 
MongoDB Breakfast Milan - Mainframe Offloading Strategies
MongoDB Breakfast Milan -  Mainframe Offloading StrategiesMongoDB Breakfast Milan -  Mainframe Offloading Strategies
MongoDB Breakfast Milan - Mainframe Offloading Strategies
 
Terrastore - A document database for developers
Terrastore - A document database for developersTerrastore - A document database for developers
Terrastore - A document database for developers
 
Hadoop Security Architecture
Hadoop Security ArchitectureHadoop Security Architecture
Hadoop Security Architecture
 
Moving to Databricks & Delta
Moving to Databricks & DeltaMoving to Databricks & Delta
Moving to Databricks & Delta
 
Hadoop HDFS
Hadoop HDFSHadoop HDFS
Hadoop HDFS
 
Apache doris (incubating) introduction
Apache doris (incubating) introductionApache doris (incubating) introduction
Apache doris (incubating) introduction
 
Hadoop Overview kdd2011
Hadoop Overview kdd2011Hadoop Overview kdd2011
Hadoop Overview kdd2011
 
MongoDB Aggregation Performance
MongoDB Aggregation PerformanceMongoDB Aggregation Performance
MongoDB Aggregation Performance
 
Apache Sqoop Tutorial | Sqoop: Import & Export Data From MySQL To HDFS | Hado...
Apache Sqoop Tutorial | Sqoop: Import & Export Data From MySQL To HDFS | Hado...Apache Sqoop Tutorial | Sqoop: Import & Export Data From MySQL To HDFS | Hado...
Apache Sqoop Tutorial | Sqoop: Import & Export Data From MySQL To HDFS | Hado...
 
Introducing Change Data Capture with Debezium
Introducing Change Data Capture with DebeziumIntroducing Change Data Capture with Debezium
Introducing Change Data Capture with Debezium
 
Real-Time Spark: From Interactive Queries to Streaming
Real-Time Spark: From Interactive Queries to StreamingReal-Time Spark: From Interactive Queries to Streaming
Real-Time Spark: From Interactive Queries to Streaming
 
Apache HBase™
Apache HBase™Apache HBase™
Apache HBase™
 
[Pgday.Seoul 2017] 6. GIN vs GiST 인덱스 이야기 - 박진우
[Pgday.Seoul 2017] 6. GIN vs GiST 인덱스 이야기 - 박진우[Pgday.Seoul 2017] 6. GIN vs GiST 인덱스 이야기 - 박진우
[Pgday.Seoul 2017] 6. GIN vs GiST 인덱스 이야기 - 박진우
 
ProxySQL Cluster - Percona Live 2022
ProxySQL Cluster - Percona Live 2022ProxySQL Cluster - Percona Live 2022
ProxySQL Cluster - Percona Live 2022
 

Viewers also liked

HGrid A Data Model for Large Geospatial Data Sets in HBase
HGrid A Data Model for Large Geospatial Data Sets in HBaseHGrid A Data Model for Large Geospatial Data Sets in HBase
HGrid A Data Model for Large Geospatial Data Sets in HBaseDan Han
 
How To Analyze Geolocation Data with Hive and Hadoop
How To Analyze Geolocation Data with Hive and HadoopHow To Analyze Geolocation Data with Hive and Hadoop
How To Analyze Geolocation Data with Hive and Hadoop
Hortonworks
 
Spatial Data processing with Hadoop
Spatial Data processing with HadoopSpatial Data processing with Hadoop
Spatial Data processing with Hadoop
VisionGEOMATIQUE2014
 
Sept 17 2013 - THUG - HBase a Technical Introduction
Sept 17 2013 - THUG - HBase a Technical IntroductionSept 17 2013 - THUG - HBase a Technical Introduction
Sept 17 2013 - THUG - HBase a Technical Introduction
Adam Muise
 
Geo-based content processing using hbase
Geo-based content processing using hbaseGeo-based content processing using hbase
Geo-based content processing using hbase
Ravi Veeramachaneni
 
Computation of spatial data on Hadoop Cluster
Computation of spatial data on Hadoop ClusterComputation of spatial data on Hadoop Cluster
Computation of spatial data on Hadoop ClusterAbhishek Sagar
 
Hadoop for High-Performance Climate Analytics - Use Cases and Lessons Learned
Hadoop for High-Performance Climate Analytics - Use Cases and Lessons LearnedHadoop for High-Performance Climate Analytics - Use Cases and Lessons Learned
Hadoop for High-Performance Climate Analytics - Use Cases and Lessons Learned
DataWorks Summit
 
Big data landscape version 2.0
Big data landscape version 2.0Big data landscape version 2.0
Big data landscape version 2.0
Matt Turck
 
Evolution of Big Data at Intel - Crawl, Walk and Run Approach
Evolution of Big Data at Intel - Crawl, Walk and Run ApproachEvolution of Big Data at Intel - Crawl, Walk and Run Approach
Evolution of Big Data at Intel - Crawl, Walk and Run Approach
DataWorks Summit
 
Monsanto Automates R&D Decisions with Big Data
Monsanto Automates R&D Decisions with Big DataMonsanto Automates R&D Decisions with Big Data
Monsanto Automates R&D Decisions with Big Data
Cloudera, Inc.
 
Open Source Geospatial
Open Source GeospatialOpen Source Geospatial
Open Source Geospatial
Jody Garnett
 
Bringing Geospatial Business Intelligence to the Enterprise
Bringing Geospatial Business Intelligenceto the EnterpriseBringing Geospatial Business Intelligenceto the Enterprise
Bringing Geospatial Business Intelligence to the Enterprisemkarren
 
MLconf NYC Josh Wills
MLconf NYC Josh WillsMLconf NYC Josh Wills
MLconf NYC Josh WillsMLconf
 
Promoting Geospatial Education in Europe
Promoting Geospatial Education in EuropePromoting Geospatial Education in Europe
Promoting Geospatial Education in Europe
Karl Donert
 
MD-HBase: A Scalable Multi-dimensional Data Infrastructure for Location Aware...
MD-HBase: A Scalable Multi-dimensional Data Infrastructure for Location Aware...MD-HBase: A Scalable Multi-dimensional Data Infrastructure for Location Aware...
MD-HBase: A Scalable Multi-dimensional Data Infrastructure for Location Aware...nishimurashoji
 
Кирилл Алешин - Big Data и Lambda архитектура на практике
Кирилл Алешин - Big Data и Lambda архитектура на практикеКирилл Алешин - Big Data и Lambda архитектура на практике
Кирилл Алешин - Big Data и Lambda архитектура на практике
IT Share
 
NUS-ISS PCP for FullStack Software Developers
NUS-ISS PCP for FullStack Software DevelopersNUS-ISS PCP for FullStack Software Developers
NUS-ISS PCP for FullStack Software Developers
NUS-ISS
 
A Producer’s Perspective: Agriculture and Nitrogen Deposition in Rocky Mounta...
A Producer’s Perspective: Agriculture and Nitrogen Deposition in Rocky Mounta...A Producer’s Perspective: Agriculture and Nitrogen Deposition in Rocky Mounta...
A Producer’s Perspective: Agriculture and Nitrogen Deposition in Rocky Mounta...
LPE Learning Center
 
Hadoop World 2011: Advanced HBase Schema Design
Hadoop World 2011: Advanced HBase Schema DesignHadoop World 2011: Advanced HBase Schema Design
Hadoop World 2011: Advanced HBase Schema Design
Cloudera, Inc.
 

Viewers also liked (20)

HGrid A Data Model for Large Geospatial Data Sets in HBase
HGrid A Data Model for Large Geospatial Data Sets in HBaseHGrid A Data Model for Large Geospatial Data Sets in HBase
HGrid A Data Model for Large Geospatial Data Sets in HBase
 
How To Analyze Geolocation Data with Hive and Hadoop
How To Analyze Geolocation Data with Hive and HadoopHow To Analyze Geolocation Data with Hive and Hadoop
How To Analyze Geolocation Data with Hive and Hadoop
 
Spatial Data processing with Hadoop
Spatial Data processing with HadoopSpatial Data processing with Hadoop
Spatial Data processing with Hadoop
 
Sept 17 2013 - THUG - HBase a Technical Introduction
Sept 17 2013 - THUG - HBase a Technical IntroductionSept 17 2013 - THUG - HBase a Technical Introduction
Sept 17 2013 - THUG - HBase a Technical Introduction
 
Geo-based content processing using hbase
Geo-based content processing using hbaseGeo-based content processing using hbase
Geo-based content processing using hbase
 
Computation of spatial data on Hadoop Cluster
Computation of spatial data on Hadoop ClusterComputation of spatial data on Hadoop Cluster
Computation of spatial data on Hadoop Cluster
 
Hadoop for High-Performance Climate Analytics - Use Cases and Lessons Learned
Hadoop for High-Performance Climate Analytics - Use Cases and Lessons LearnedHadoop for High-Performance Climate Analytics - Use Cases and Lessons Learned
Hadoop for High-Performance Climate Analytics - Use Cases and Lessons Learned
 
Big data landscape version 2.0
Big data landscape version 2.0Big data landscape version 2.0
Big data landscape version 2.0
 
Evolution of Big Data at Intel - Crawl, Walk and Run Approach
Evolution of Big Data at Intel - Crawl, Walk and Run ApproachEvolution of Big Data at Intel - Crawl, Walk and Run Approach
Evolution of Big Data at Intel - Crawl, Walk and Run Approach
 
Spatial database
Spatial databaseSpatial database
Spatial database
 
Monsanto Automates R&D Decisions with Big Data
Monsanto Automates R&D Decisions with Big DataMonsanto Automates R&D Decisions with Big Data
Monsanto Automates R&D Decisions with Big Data
 
Open Source Geospatial
Open Source GeospatialOpen Source Geospatial
Open Source Geospatial
 
Bringing Geospatial Business Intelligence to the Enterprise
Bringing Geospatial Business Intelligenceto the EnterpriseBringing Geospatial Business Intelligenceto the Enterprise
Bringing Geospatial Business Intelligence to the Enterprise
 
MLconf NYC Josh Wills
MLconf NYC Josh WillsMLconf NYC Josh Wills
MLconf NYC Josh Wills
 
Promoting Geospatial Education in Europe
Promoting Geospatial Education in EuropePromoting Geospatial Education in Europe
Promoting Geospatial Education in Europe
 
MD-HBase: A Scalable Multi-dimensional Data Infrastructure for Location Aware...
MD-HBase: A Scalable Multi-dimensional Data Infrastructure for Location Aware...MD-HBase: A Scalable Multi-dimensional Data Infrastructure for Location Aware...
MD-HBase: A Scalable Multi-dimensional Data Infrastructure for Location Aware...
 
Кирилл Алешин - Big Data и Lambda архитектура на практике
Кирилл Алешин - Big Data и Lambda архитектура на практикеКирилл Алешин - Big Data и Lambda архитектура на практике
Кирилл Алешин - Big Data и Lambda архитектура на практике
 
NUS-ISS PCP for FullStack Software Developers
NUS-ISS PCP for FullStack Software DevelopersNUS-ISS PCP for FullStack Software Developers
NUS-ISS PCP for FullStack Software Developers
 
A Producer’s Perspective: Agriculture and Nitrogen Deposition in Rocky Mounta...
A Producer’s Perspective: Agriculture and Nitrogen Deposition in Rocky Mounta...A Producer’s Perspective: Agriculture and Nitrogen Deposition in Rocky Mounta...
A Producer’s Perspective: Agriculture and Nitrogen Deposition in Rocky Mounta...
 
Hadoop World 2011: Advanced HBase Schema Design
Hadoop World 2011: Advanced HBase Schema DesignHadoop World 2011: Advanced HBase Schema Design
Hadoop World 2011: Advanced HBase Schema Design
 

Similar to Building a geospatial processing pipeline using Hadoop and HBase and how Monsanto is using it to help farmers increase their yield

Software for the Hydrographic ocean
Software for the Hydrographic oceanSoftware for the Hydrographic ocean
Software for the Hydrographic ocean
Hydrographic Society Benelux
 
MapReduce Best Practices and Lessons Learned Applied to Enterprise Datasets -...
MapReduce Best Practices and Lessons Learned Applied to Enterprise Datasets -...MapReduce Best Practices and Lessons Learned Applied to Enterprise Datasets -...
MapReduce Best Practices and Lessons Learned Applied to Enterprise Datasets -...
StampedeCon
 
big data and hadoop
 big data and hadoop big data and hadoop
big data and hadoop
ahmed alshikh
 
Introduction to DDS: Context, Information Model, Security, and Applications.
Introduction to DDS: Context, Information Model, Security, and Applications.Introduction to DDS: Context, Information Model, Security, and Applications.
Introduction to DDS: Context, Information Model, Security, and Applications.
Gerardo Pardo-Castellote
 
Big Data application - OSS / BSS
Big Data application - OSS / BSSBig Data application - OSS / BSS
Big Data application - OSS / BSS
Keyur Thakore
 
Big data analytics and machine intelligence v5.0
Big data analytics and machine intelligence   v5.0Big data analytics and machine intelligence   v5.0
Big data analytics and machine intelligence v5.0
Amr Kamel Deklel
 
Big data presentation
Big data presentationBig data presentation
Big data presentation
SreeSowmya7
 
Big ideas for using data by Brett Whelan University of Sydney
Big ideas for using data by Brett Whelan University of SydneyBig ideas for using data by Brett Whelan University of Sydney
Big ideas for using data by Brett Whelan University of Sydney
Amanda Woods
 
Stinger Initiative - Deep Dive
Stinger Initiative - Deep DiveStinger Initiative - Deep Dive
Stinger Initiative - Deep Dive
Hortonworks
 
Big Data in Action : Operations, Analytics and more
Big Data in Action : Operations, Analytics and moreBig Data in Action : Operations, Analytics and more
Big Data in Action : Operations, Analytics and more
Softweb Solutions
 
How One Company Offloaded Data Warehouse ETL To Hadoop and Saved $30 Million
How One Company Offloaded Data Warehouse ETL To Hadoop and Saved $30 MillionHow One Company Offloaded Data Warehouse ETL To Hadoop and Saved $30 Million
How One Company Offloaded Data Warehouse ETL To Hadoop and Saved $30 Million
DataWorks Summit
 
Exascale Challenges: Space, Time, Experimental Science and Self Driving Cars
Exascale Challenges: Space, Time, Experimental Science and Self Driving Cars Exascale Challenges: Space, Time, Experimental Science and Self Driving Cars
Exascale Challenges: Space, Time, Experimental Science and Self Driving Cars
Joel Saltz
 
Uniting traditional GIS and mainstream IT
Uniting traditional GIS and mainstream ITUniting traditional GIS and mainstream IT
Uniting traditional GIS and mainstream IT
gssg
 
HBaseCon 2013: ETL for Apache HBase
HBaseCon 2013: ETL for Apache HBaseHBaseCon 2013: ETL for Apache HBase
HBaseCon 2013: ETL for Apache HBase
Cloudera, Inc.
 
HBaseCon 2013: Evolving a First-Generation Apache HBase Deployment to Second...
HBaseCon 2013:  Evolving a First-Generation Apache HBase Deployment to Second...HBaseCon 2013:  Evolving a First-Generation Apache HBase Deployment to Second...
HBaseCon 2013: Evolving a First-Generation Apache HBase Deployment to Second...
Cloudera, Inc.
 
Real Time Business Platform by Ivan Novick from Pivotal
Real Time Business Platform by Ivan Novick from PivotalReal Time Business Platform by Ivan Novick from Pivotal
Real Time Business Platform by Ivan Novick from Pivotal
VMware Tanzu Korea
 
IBM Spectrum Scale Overview november 2015
IBM Spectrum Scale Overview november 2015IBM Spectrum Scale Overview november 2015
IBM Spectrum Scale Overview november 2015
Doug O'Flaherty
 
Rethink Server Backup and Regain Control
Rethink Server Backup and Regain ControlRethink Server Backup and Regain Control
Rethink Server Backup and Regain Control
Druva
 
Girish Juneja - Intel Big Data & Cloud Summit 2013
Girish Juneja - Intel Big Data & Cloud Summit 2013Girish Juneja - Intel Big Data & Cloud Summit 2013
Girish Juneja - Intel Big Data & Cloud Summit 2013IntelAPAC
 

Similar to Building a geospatial processing pipeline using Hadoop and HBase and how Monsanto is using it to help farmers increase their yield (20)

Software for the Hydrographic ocean
Software for the Hydrographic oceanSoftware for the Hydrographic ocean
Software for the Hydrographic ocean
 
Big data
Big dataBig data
Big data
 
MapReduce Best Practices and Lessons Learned Applied to Enterprise Datasets -...
MapReduce Best Practices and Lessons Learned Applied to Enterprise Datasets -...MapReduce Best Practices and Lessons Learned Applied to Enterprise Datasets -...
MapReduce Best Practices and Lessons Learned Applied to Enterprise Datasets -...
 
big data and hadoop
 big data and hadoop big data and hadoop
big data and hadoop
 
Introduction to DDS: Context, Information Model, Security, and Applications.
Introduction to DDS: Context, Information Model, Security, and Applications.Introduction to DDS: Context, Information Model, Security, and Applications.
Introduction to DDS: Context, Information Model, Security, and Applications.
 
Big Data application - OSS / BSS
Big Data application - OSS / BSSBig Data application - OSS / BSS
Big Data application - OSS / BSS
 
Big data analytics and machine intelligence v5.0
Big data analytics and machine intelligence   v5.0Big data analytics and machine intelligence   v5.0
Big data analytics and machine intelligence v5.0
 
Big data presentation
Big data presentationBig data presentation
Big data presentation
 
Big ideas for using data by Brett Whelan University of Sydney
Big ideas for using data by Brett Whelan University of SydneyBig ideas for using data by Brett Whelan University of Sydney
Big ideas for using data by Brett Whelan University of Sydney
 
Stinger Initiative - Deep Dive
Stinger Initiative - Deep DiveStinger Initiative - Deep Dive
Stinger Initiative - Deep Dive
 
Big Data in Action : Operations, Analytics and more
Big Data in Action : Operations, Analytics and moreBig Data in Action : Operations, Analytics and more
Big Data in Action : Operations, Analytics and more
 
How One Company Offloaded Data Warehouse ETL To Hadoop and Saved $30 Million
How One Company Offloaded Data Warehouse ETL To Hadoop and Saved $30 MillionHow One Company Offloaded Data Warehouse ETL To Hadoop and Saved $30 Million
How One Company Offloaded Data Warehouse ETL To Hadoop and Saved $30 Million
 
Exascale Challenges: Space, Time, Experimental Science and Self Driving Cars
Exascale Challenges: Space, Time, Experimental Science and Self Driving Cars Exascale Challenges: Space, Time, Experimental Science and Self Driving Cars
Exascale Challenges: Space, Time, Experimental Science and Self Driving Cars
 
Uniting traditional GIS and mainstream IT
Uniting traditional GIS and mainstream ITUniting traditional GIS and mainstream IT
Uniting traditional GIS and mainstream IT
 
HBaseCon 2013: ETL for Apache HBase
HBaseCon 2013: ETL for Apache HBaseHBaseCon 2013: ETL for Apache HBase
HBaseCon 2013: ETL for Apache HBase
 
HBaseCon 2013: Evolving a First-Generation Apache HBase Deployment to Second...
HBaseCon 2013:  Evolving a First-Generation Apache HBase Deployment to Second...HBaseCon 2013:  Evolving a First-Generation Apache HBase Deployment to Second...
HBaseCon 2013: Evolving a First-Generation Apache HBase Deployment to Second...
 
Real Time Business Platform by Ivan Novick from Pivotal
Real Time Business Platform by Ivan Novick from PivotalReal Time Business Platform by Ivan Novick from Pivotal
Real Time Business Platform by Ivan Novick from Pivotal
 
IBM Spectrum Scale Overview november 2015
IBM Spectrum Scale Overview november 2015IBM Spectrum Scale Overview november 2015
IBM Spectrum Scale Overview november 2015
 
Rethink Server Backup and Regain Control
Rethink Server Backup and Regain ControlRethink Server Backup and Regain Control
Rethink Server Backup and Regain Control
 
Girish Juneja - Intel Big Data & Cloud Summit 2013
Girish Juneja - Intel Big Data & Cloud Summit 2013Girish Juneja - Intel Big Data & Cloud Summit 2013
Girish Juneja - Intel Big Data & Cloud Summit 2013
 


Building a geospatial processing pipeline using Hadoop and HBase and how Monsanto is using it to help farmers increase their yield

  • 1. Monsanto Company Confidential - Attorney Client Privilege Geospatial Processing @ Monsanto Hadoop Summit 2013 Robert Grailer, Big Data Engineer Erich Hochmuth, Data & Analytics Architecture Lead
  • 2. Our Vision: Sustainable Agriculture
    A Strong Vision That Guides All We Do
      • Producing More
        – We are committed to increasing yields to meet the growing demand for food, fiber & fuel
      • Conserving More
        – We are committed to reducing the amount of land, water and energy needed to grow our crops
      • Improving Lives
        – We are committed to improving lives around the world
  • 3. Doubling Yields by 2030 - Farming in the Future Will Be Increasingly Information-Driven
      • ADVANCED EQUIPMENT
      • AVERAGE CORN YIELD
        – 300 BU/AC
      • AUTOMATED WEATHER STATIONS
      • FIELD SENSORS PROVIDING INFORMATION
      • ADVANCED IMAGERY TECHNOLOGY
  • 4. Integrated Farming Systems – FieldScripts℠ for 2014
      • FieldScripts℠ will deliver, by field, a corn hybrid recommendation utilizing variable rate seeding by FieldScripts management zones to increase yield potential and reduce risk
      • The science of FieldScripts is based on proprietary algorithms that combine data from the FieldScripts Testing Network and Monsanto generated hybrid response to plant population research
    Precision Planting
    Planting Prescription 2012 (DKC63-84 Brand), Target Rate (Count) (ksds/ac):
      • 38.00 (24.75 ac)
      • 37.00 (22.63 ac)
      • 35.00 (16.60 ac)
      • 34.00 (8.23 ac)
      • 33.00 (6.00 ac)
      • 32.00 (2.82 ac)
  • 5. 2012 Field Trials Indicate 5-10 bu/a Average Yield Gain
    Treatment and yield (bu/ac) by trial:
      • IL Irrigated, Back 80
        – Static|34000: 196
        – FieldScripts (35000): 233
      • Central IL Dry Land, 47-50
        – Static|34000: 139
        – FieldScripts (33000): 145
      • MS Irrigated, 21
        – Static|34000: 166
        – FieldScripts (34700): 181
    In the United States Alone:
      • Corn acres planted in 2013 – 96M
      • Price of Corn per bushel – $6.93*
      • Advantage of 5–10 Bu/Ac
    *Price reflects CBOT price of corn 1/9/2013
  • 6. Integrated Farming Systems℠ Combine Advanced Seed Genetics, On-farm Agronomic Practices, Software and Hardware Innovations to Drive Yield
      • DATABASE BACKBONE
        – Expansive product by environment testing makes on-farm prescriptions possible
      • VARIABLE-RATE FERTILITY
        – Variable rate N, P & K “Apps” aligned with yield management zones
      • PRECISION SEEDING
        – Planter hardware systems enabling variable rate seeding & row spacing of multiple hybrids in a field by yield management zone
      • FERTILITY & DISEASE MANAGEMENT
        – “Apps” for in-season custom application of supplemental late nitrogen and fungicides
      • YIELD MONITOR
        – Advances in Yield Monitoring to deliver higher resolution data
      • BREEDING
        – Significant increases in data points collected per year to increase annual rate genetic gain
  • 7. Use Case
      • Load thousands of files containing spatial data
      • Support diverse range of data types: tabular, vector, raster
      • Join & link data spatially
      • Generate dense grid covering entire US: 120 billion polygons
      • Generate a set of derived attributes: think moving average
      • Make data available for other data products such as FieldScripts
    [Figure: High Level Data Flow. Labels: Public Data, Monsanto Data, Grower Data; Standardize & Link; Algorithms]
  • 8. Version 1 Architecture
      • In RDBMS spatial
      • PL/SQL
      • Multiple patches to DB Engine
      • Just 8% of the data!!
        – 35+ days to process
      • TBs in indexes
      • Tradeoffs
        – Compressed vs. Uncompressed
        – Performance vs. Storage
        – Read vs. Write performance
      • Options/recommendations
        – Limit use of in DB spatial functionality
        – Buy more RDBMS
    [Chart: Data Processing Time, in days. Labels: Soil, Elevation, Spatial Index Processing]
    [Chart: Data Volumes, in TBs. Labels: Raw Data, Uncompressed, Compressed, Spatial Index]
  • 9. Version 2 Architecture
      • Combination of MapReduce & HBase
      • Leverage existing Hadoop cluster
      • MapReduce
        – Parallelize everything!
        – Bulk HBase loads
      • HBase
        – Spatial data model
        – Custom spatial engine
  • 10. Data Ingestion
      • Bulk load 1,000s of files into HDFS
      • Standardize data
        – Common usable format
      • Storage vs. Compute
      • Raster format is easily splittable
      • Hadoop Streaming integrated with GDAL
      • Streaming API Lessons Learned
        – Lack of documentation
        – Counters to track task progress
        – Jobs run as mapred user
        – HDFS Access outside of MR
    [Figure: ingestion flow. NFS (Raster Images, Vector Shape Files, Zip Files, Text Data); Hadoop Streaming (Unzip, Convert to Raster, Re-project); HDFS (Raster Files)]
    [Chart: Results. Data Ingestion Time, in hours: RDBMS vs. Hadoop]
  • 11. Data Processing
      • Process raster data
        – Dense matrix
      • Generic InputFormat & RecordReader for raster data
      • HFiles easily transportable between clusters
      • Challenges tuning Jobs
        – IO Sort Factor
        – Split/Task Size
    [Figure: processing flow. HDFS (Raster Files); Generate Derived Attributes; Generate HFiles; HBase (Pre-split table)]
    [Chart: Results. Data Processing Time, in days: RDBMS vs. Hadoop]
  • 12. HBASE SCHEMA DESIGN
  • 13. Geospatial in HBase
      • Need
        – Dense data set
        – Complex computations
        – Scalable & cost efficient
        – Bulk analytics & random reads
      • HBase
        – GeoHash most notable example
          • Best suited for sparse data
        – Precision of reads
        – Alphanumeric key
      • HBase Considerations
        – Key overhead
        – Scan vs. Get performance
        – Reduce reading unnecessary data
    [Figure labels: Example Field, Complex Data Interactions]
  • 14. Global Coordinate System
    [Figure: axes Longitude (-180 to 180) and Latitude (-90 to 90)]
  • 15. Reference System
    [Figure: axes Longitude (-180 to 180) and Latitude (-90 to 90)]
  • 16. Reference System Continued
    [Figure: axes Longitude (-180 to 180) and Latitude (-90 to 90), overlaid with a grid of numbered cells; visible labels include 1, 2, 3, 4, 19, 20, 21, 22, 23, 190, 381, 382, 399, 400]
  • 17. HBase Schema Take 1
    Spatial Table
      • Key: cell_id long
      • Column Family: A
        – Column: Data Holder
          • elevation
          • slope: float
          • aspect: float
      • Each spatial dataset is a separate table
      • All attributes for a layer that are read together are stored together
        – Attributes packed into a single column as an Avro object
      • 1 row per record
      • 120 billion rows total!
      • 1,000s of Get requests per field
      • TBs of key overhead – roughly 56% of the data
  • 18. Reference System Storage Format
      • Data grouped into 100 x 100 super cells
      • A super cell of 100 x 100 cells is a single row in HBase
      • At most 4 disk reads are required to read all data for one layer for a 150 acre field
      • Given a bounding box, the super cells and attributed grid cells containing the desired data can easily be computed
      • A generic geospatial data service, when given a set of layers, will read each layer in parallel
      • Overhead of key data reduced from 56% to below 0.1%
    Spatial Table
      • Key: super_cell_id long
      • Column Family: A
        – Column: Data Holder
          • elevation: array float [ values ]
          • slope: array float [ values ]
          • aspect: array float [ values ]
    [Figure labels: Super Grid Cells, Attributed Grid Cells]
  • 19. Results
      • Significant cost savings in required hardware
      • 120 billion unique polygons in total
      • 1.5 trillion data points
      • Dense grid of the entire U.S.
      • Foundational architecture for other spatial data sets
      • Fully unit tested implementation
    RDBMS
      • 4 states only
      • 30+ days to load
      • 8 months of dev.
    Hadoop
      • Entire U.S.
      • 18 hour load time
      • 3 months of dev.
      • 100% scalable
      • Cloud ready
    [Chart: Total Data Processing Time, in days (Total Run Time): RDBMS on 8% of the data vs. Hadoop on the full data set]
  • 20. Thank You
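
Slides 16 to 18 describe a numbered grid over longitude and latitude, with cells grouped into 100 x 100 super cells that each become one HBase row. The slides do not show how the ids are computed, so what follows is only a minimal sketch of one way it could work. It assumes row-major numbering that starts at 1 in the north-west corner, and a grid resolution (`CELLS_PER_DEGREE`) chosen purely for illustration. The function names are hypothetical and are not taken from the presentation.

```python
import math

CELLS_PER_SUPER_SIDE = 100   # from the slides: one super cell is 100 x 100 cells
CELLS_PER_DEGREE = 1000      # hypothetical resolution, not given in the slides

N_COLS = 360 * CELLS_PER_DEGREE
N_ROWS = 180 * CELLS_PER_DEGREE
SUPER_COLS = math.ceil(N_COLS / CELLS_PER_SUPER_SIDE)


def cell_row_col(lon, lat):
    """Grid cell (row, col) containing a point; row 0 is the northern edge (assumed)."""
    col = min(int((lon + 180.0) * CELLS_PER_DEGREE), N_COLS - 1)
    row = min(int((90.0 - lat) * CELLS_PER_DEGREE), N_ROWS - 1)
    return row, col


def locate(lon, lat):
    """Return (super_cell_id, offset): the row key and the index into its value arrays."""
    row, col = cell_row_col(lon, lat)
    super_row, in_row = divmod(row, CELLS_PER_SUPER_SIDE)
    super_col, in_col = divmod(col, CELLS_PER_SUPER_SIDE)
    super_cell_id = super_row * SUPER_COLS + super_col + 1  # 1-based, row-major (assumed)
    offset = in_row * CELLS_PER_SUPER_SIDE + in_col         # position in a 10,000-value array
    return super_cell_id, offset


def super_cells_for_bbox(min_lon, min_lat, max_lon, max_lat):
    """Row keys of every super cell touched by a bounding box."""
    top, left = cell_row_col(min_lon, max_lat)
    bottom, right = cell_row_col(max_lon, min_lat)
    ids = []
    for sr in range(top // CELLS_PER_SUPER_SIDE, bottom // CELLS_PER_SUPER_SIDE + 1):
        for sc in range(left // CELLS_PER_SUPER_SIDE, right // CELLS_PER_SUPER_SIDE + 1):
            ids.append(sr * SUPER_COLS + sc + 1)
    return ids
```

Under these assumptions, a bounding box narrower than one super cell in both directions can straddle at most two super cells per axis, which is consistent with the "at most 4 disk reads" figure on slide 18. Storing each attribute as an array indexed by `offset` is what lets a single row key stand in for 10,000 per-cell keys.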

Editor's Notes

  1. http://psipunk.com/page/18/ With big agricultural farms getting smaller due to fast growing population, we need some compact and efficient tools of farming to balance structured agriculture with nature to ensure a healthy ecosystem around us. Offering a solution, the “Agria” by Julia Kaisinger, Katharina Unger and Stefan Riegbauer is an autonomous farm robot for sowing and plant protection in small farms. Featuring infrared and UV light to control bugs, fungi and pests, the modular machine examines the soil and plants regularly to allow specific treatment. Placing seeds and fertilizer in the right place and proportion, the Agria works with an intelligent network of fields and machines, supplied by a local station, which can be controlled through a computer or smartphone, so you may store and share data with experts for better analysis.
  2. Agriculture is going through a transition via adoption of breakthrough technologies in seed genetics, farm equipment hardware and software, and farm practices, akin to the advances in computer technology that ushered in the modern information technology era. Growers are increasingly swamped by information, much of which needs further thoughtful analysis before actionable information can be extracted and integrated. Monsanto is gearing up to do that. Anyone interested in developing improved agronomic practices or information apps that contribute to increasing yield or improving life on the farm should get in touch with us (leave contact information at the Monsanto booth).
  3. General data flow
  4. Split and task sizes were a challenge because of the number of files to be processed and the metadata needed to process each task. Data was generated only for the United States, so only 15% of all super cells covering the world were used. The table was pre-split to even out the HFiles.
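
Note 4 mentions pre-splitting the table so that HFiles come out even, while only about 15% of the world's super cells are populated. The notes do not say how the split points were chosen. One plausible approach, sketched here under that caveat, is to pick split keys at even quantiles of the super cell ids that actually hold data, rather than at even steps through the whole id range. The function name and the region count are hypothetical. The 8-byte big-endian encoding matches how HBase serializes a Java `long` row key.

```python
import struct


def split_keys(used_ids, n_regions):
    """Pick n_regions - 1 split keys so each region holds about the same number of used ids."""
    ids = sorted(used_ids)
    # Take the id found at each even quantile of the populated ids.
    picks = [ids[(len(ids) * i) // n_regions] for i in range(1, n_regions)]
    # Encode as 8-byte big-endian values so byte order matches numeric order
    # for the positive ids used here.
    return [struct.pack(">q", key) for key in picks]
```

With ids clustered in a few ranges, as they would be for a single country on a global grid, quantile-based keys keep regions balanced where evenly spaced keys would leave most regions empty.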