Processing millions of measurements per second
Flink Streaming at John Deere
© 2019, Deere & Co. All rights reserved.
About John Deere
Agricultural Equipment Construction Equipment
Turf Equipment Forestry Equipment
Our Purpose: Committed to Those Linked to the Land
We will help our customers – those who cultivate, harvest, transform, enrich, or
build upon the land – meet the world's dramatically increasing need for food,
fuel, and infrastructure. In so doing, we will support a higher quality of life
around the world.
Global population is increasing
Arable land is fixed
John Deere Intelligent Solutions Group
ExactEmerge™ Planter
15 sensor readings x 5 hertz x 32 row units = 2400 readings / sec
10 miles / hr
160k seeds / ac
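The readings-per-second figure is the straight product of the three factors on this slide. A small check, with variable names chosen here purely for illustration:

```python
# Per-planter data rate, using only the three factors given on the slide.
sensor_readings_per_row_unit = 15  # readings per row unit per sample
sample_rate_hz = 5                 # samples per second
row_units = 32                     # row units on the planter

readings_per_second = sensor_readings_per_row_unit * sample_rate_hz * row_units
print(readings_per_second)  # 2400
```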
A “typical” Field
48 Acres
1.5 Million Corn Plants
2 Billion Kernels
Spatially divided into 100,000 3’x3’ sections
World Wide Data Processing
• Each dot represents a machine capturing data
• 5738 active sessions
• 12 million measurements per second
• 720 million measurements in 60 seconds
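The two throughput figures are consistent with each other, and dividing by the session count gives a per-session rate in the same range as the 2400 readings / sec quoted for a single planter. A quick check, assuming the load is spread evenly across sessions (the slide does not say that it is):

```python
# Figures taken from the slide.
measurements_per_second = 12_000_000
active_sessions = 5738

per_minute = measurements_per_second * 60
# Even split across sessions is an assumption made only for this estimate.
per_session = measurements_per_second / active_sessions

print(f"{per_minute:,} measurements in 60 seconds")        # 720,000,000
print(f"{per_session:.0f} measurements/sec per session")    # 2091
```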
Use Cases – Precision Analysis
• Data is rasterized at the operation level
for precision analysis and visualization
• Full resolution to 0.1493 m/cell, on a
256x256 cell raster
• Can perform real-time evaluation,
combination and visualization of 1 to n
measurements via a robust API
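To give a feel for the raster numbers, the sketch below works out the ground extent of one 256x256 raster at full resolution and maps a point to a cell. The function name, the south-west corner origin, and the axis orientation are assumptions for illustration; the slides do not describe the actual tiling scheme or API.

```python
CELL_SIZE_M = 0.1493  # metres per cell at full resolution (from the slide)
RASTER_DIM = 256      # cells per side (from the slide)

def cell_index(east_m, north_m):
    """Map an offset in metres from the raster's south-west corner to (col, row).

    The corner origin and axis orientation are assumptions, not the real scheme.
    """
    col = int(east_m // CELL_SIZE_M)
    row = int(north_m // CELL_SIZE_M)
    if not (0 <= col < RASTER_DIM and 0 <= row < RASTER_DIM):
        raise ValueError("point falls outside this raster")
    return col, row

raster_side_m = CELL_SIZE_M * RASTER_DIM
print(round(raster_side_m, 2))  # 38.22
print(cell_index(10.0, 20.0))   # (66, 133)
```

At this resolution a single raster covers a square of roughly 38 m on each side.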
Use Cases – Large Scale Analysis
• 1 to n sessions can be aggregated to
generate totals
• Arbitrary criteria can be used to filter results
• Spatially organized
• 2.5B stored layers
• Example - Average yield of corn in Polk
County Iowa, in 2018, grouped by average
harvester speed
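The Polk County example is a filter followed by a grouped average. The sketch below shows that shape on a handful of made-up session summaries; the record layout, the numbers, and the whole-mph speed buckets are all hypothetical, and a real aggregation would more likely weight by harvested area than take a plain mean.

```python
from collections import defaultdict

# Made-up session summaries for illustration only:
# (county, crop, year, avg_speed_mph, yield_bu_per_ac)
sessions = [
    ("Polk", "corn", 2018, 3.2, 201.0),
    ("Polk", "corn", 2018, 4.1, 195.0),
    ("Polk", "corn", 2018, 4.4, 189.0),
    ("Polk", "soybeans", 2018, 4.2, 58.0),
    ("Story", "corn", 2018, 4.0, 198.0),
]

def average_yield_by_speed(rows, county, crop, year):
    """Filter sessions by criteria, then average yield per whole-mph speed bucket."""
    totals = defaultdict(lambda: [0.0, 0])  # bucket -> [yield sum, session count]
    for row_county, row_crop, row_year, speed, yld in rows:
        if (row_county, row_crop, row_year) != (county, crop, year):
            continue
        bucket = round(speed)  # hypothetical grouping: nearest whole mph
        totals[bucket][0] += yld
        totals[bucket][1] += 1
    return {b: s / n for b, (s, n) in sorted(totals.items())}

result = average_yield_by_speed(sessions, "Polk", "corn", 2018)
print(result)  # {3: 201.0, 4: 192.0}
```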
Ingestion
Constant Stream
Micro-batches
Large Batch
Ingestion
Stream or Batch Processing?
• Zip up the stream and
process it as a batch?
• Unzip the batch and
process it as a stream?
• Some of both?
Streaming – The Lowest Common Denominator
Kinesis Data Stream
… but not always the best choice
Retaining Batch Cohesion
Kinesis Data Stream
Stateless Stream Processing
Decoder
Concerns:
• Can I keep up?
• Can I recover?
Keeping Up - Options
1. More Shards
2. Bigger Decoder Instances
3. Fan Out (one Consumer feeding several Decoders)
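Of the three options, fan out is the one that changes the shape of the pipeline: a single consumer reads the stream and hands records to several stateless decoders. The sketch below imitates that with a thread pool. The payload format, the function names, and the use of threads are assumptions for illustration, not the deployment described in the talk.

```python
from concurrent.futures import ThreadPoolExecutor

def decode(record):
    """Stand-in decoder: parses 'machine_id,value' payloads (hypothetical format)."""
    machine_id, value = record.decode("utf-8").split(",")
    return {"machine_id": machine_id, "value": float(value)}

def consume_and_fan_out(records, decoder_count=3):
    """One consumer reads the stream and hands records to a pool of decoders."""
    with ThreadPoolExecutor(max_workers=decoder_count) as pool:
        # Executor.map returns results in input order even though work runs in parallel.
        return list(pool.map(decode, records))

stream = [b"m1,2.5", b"m2,3.0", b"m1,4.25"]
decoded = consume_and_fan_out(stream)
print(decoded)
```

Because the decoders keep no state, any record can go to any worker, which is what makes this kind of scaling straightforward.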
Stateful Stream Processing
512,107 seeds 4,804,347 seeds
More Concerns:
1. How do I group
related data?
2. How do I handle
late arriving data?
3. How do I ensure
exactly once
processing?
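The three concerns can be made concrete with a toy seed counter: records are grouped by session and time window, records that arrive too far behind the newest event time are dropped, and replayed records are ignored by id. This is a plain-Python sketch, not Flink code. The window length, lateness bound, and record fields are assumptions, and deduplication by id is a swapped-in simplification that gives effectively-once results, whereas Flink provides exactly-once state semantics through its checkpointing mechanism.

```python
from collections import defaultdict

WINDOW_S = 60            # tumbling window length in seconds (assumed)
ALLOWED_LATENESS_S = 30  # how far behind the watermark a record may arrive (assumed)

class SeedCounter:
    def __init__(self):
        self.counts = defaultdict(int)  # (session_id, window_start) -> seeds
        self.seen = set()               # record ids already applied
        self.watermark = 0              # highest event time seen so far
        self.dropped = 0                # records rejected as too late

    def process(self, record_id, session_id, event_time, seeds):
        if record_id in self.seen:      # replayed record: apply at most once
            return
        self.watermark = max(self.watermark, event_time)
        if event_time < self.watermark - ALLOWED_LATENESS_S:
            self.dropped += 1           # too late to include
            return
        self.seen.add(record_id)
        window_start = event_time - event_time % WINDOW_S
        self.counts[(session_id, window_start)] += seeds  # group related data

counter = SeedCounter()
counter.process("a", "s1", 10, 100)
counter.process("b", "s1", 70, 50)
counter.process("a", "s1", 10, 100)  # duplicate delivery, ignored
counter.process("c", "s1", 55, 25)   # late, but within the allowed lateness
counter.process("d", "s1", 5, 10)    # too late, dropped
print(dict(counter.counts))  # {('s1', 0): 125, ('s1', 60): 50}
print(counter.dropped)       # 1
```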
Apache Flink
Checkpoints, Savepoints, and Other Painpoints
Some problems we’ve had:
• Long checkpoint durations
• Very large checkpoints & savepoints
• S3 throttling
• Checkpoint timeout spiral
Checkpoints, Savepoints, and Other Painpoints
Some tips:
• Try to avoid backpressure
• Limit / reduce the amount of state we are
keeping
• Very long checkpoint duration
• Removing checkpoints altogether
Scaling and Spillway
• Flink/EMR does not autoscale
• Our data is very spiky.
• Irregular bursts of data
• Inconsistent record size
Scaling and Spillway
Solution - Spillway
• If backpressure is detected, start
piping records to a new stream
• Monitor stream, if record count
goes up, spin up a new cluster
• When record count goes down,
tear down cluster
• Can cascade if needed
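The Spillway bullets describe a small control loop: divert records while the main pipeline is backpressured, watch the overflow backlog, and start or stop an extra cluster accordingly. The sketch below models only that decision logic with in-memory lists. The class, its thresholds, and the `drain` helper are hypothetical; the slides do not give the real backpressure signal or scaling thresholds.

```python
class Spillway:
    """Toy model: route records to an overflow stream while backpressured."""

    def __init__(self, scale_up_at=1000, scale_down_at=100):
        self.main = []        # stands in for the primary stream
        self.overflow = []    # stands in for the spillway stream
        self.scale_up_at = scale_up_at      # backlog that triggers a new cluster (assumed)
        self.scale_down_at = scale_down_at  # backlog at which it is torn down (assumed)
        self.extra_cluster_running = False

    def route(self, record, backpressured):
        target = self.overflow if backpressured else self.main
        target.append(record)

    def monitor(self):
        """Start or stop the extra cluster based on the overflow backlog."""
        backlog = len(self.overflow)
        if backlog >= self.scale_up_at and not self.extra_cluster_running:
            self.extra_cluster_running = True
        elif backlog <= self.scale_down_at and self.extra_cluster_running:
            self.extra_cluster_running = False
        return self.extra_cluster_running

    def drain(self, n):
        """Simulate the extra cluster consuming n records from the overflow."""
        del self.overflow[:n]

spillway = Spillway(scale_up_at=3, scale_down_at=0)
for i in range(5):
    spillway.route(i, backpressured=(i >= 2))  # backpressure begins at record 2
started = spillway.monitor()
print(started)   # True
spillway.drain(3)
stopped = not spillway.monitor()
print(stopped)   # True
```

Cascading, as the last bullet suggests, would amount to giving the extra cluster its own spillway of the same kind.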
Validation at Scale
• 26.8 Trillion Measurements (so far)
• Even at 6 Sigma that is 92 Million failures
• How to tackle this:
  • Logging - Elasticsearch/Kibana with careful grooming of what to log
  • Audits - Periodic jobs that evaluate statistical success
  • Monitoring - CloudWatch Dashboards and Alarms
  • Investigator - Internally developed Spark-based tool that analyzes failures at scale
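The failure estimate can be reproduced approximately from the measurement count. The sketch assumes the conventional Six Sigma rate of 3.4 defects per million opportunities; the slide does not state which rate was used, and with 3.4 the result lands near 91 million, close to the slide's rounded 92 million.

```python
total_measurements = 26.8e12   # 26.8 trillion, from the slide
defects_per_million = 3.4      # conventional Six Sigma rate (assumption)

expected_failures = total_measurements * defects_per_million / 1_000_000
print(f"{expected_failures / 1e6:.1f} million failures")  # 91.1 million failures
```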
John Deere Careers
http://jobs.deere.com
Now hiring:
• ML / AI
• Vision and Perception
• Data Science
• Telematics
• Robotics
• Mobile Software
• Embedded Software
• Software Engineering
• Architecture
Flink Forward San Francisco 2019: How John Deere uses Flink to process millions of sensor measurements per second - Greg Finch & Adam Butler

Flink Forward San Francisco 2019: How John Deere uses Flink to process millions of sensor measurements per second - Greg Finch & Adam Butler

  • 1. Processing millions of measurements per second: Flink Streaming at John Deere
  • 2. © 2019, Deere & Co. All rights reserved. About John Deere: Agricultural Equipment, Construction Equipment, Turf Equipment, Forestry Equipment
  • 3. About John Deere. Our Purpose: Committed to Those Linked to the Land. We will help our customers – those who cultivate, harvest, transform, enrich, or build upon the land – meet the world's dramatically increasing need for food, fuel, and infrastructure. In so doing, we will support a higher quality of life around the world. Global population is increasing. Arable land is fixed.
  • 4. John Deere Intelligent Solutions Group
  • 5. ExactEmerge™ Planter: 15 sensor readings x 5 hertz x 32 row units = 2400 readings / sec; 10 miles / hr; 160k seeds / ac
  • 6. A “typical” Field: 48 Acres; 1.5 Million Corn Plants; 2 Billion Kernels; spatially divided into 100000 3’x3’ sections
  • 7. World Wide Data Processing • Each dot represents a machine capturing data • 5738 active sessions • 12 million measurements per second • 720 million measurements in 60 seconds
  • 8. Use Cases – Precision Analysis • Data is rasterized at the operation level for precision analysis and visualization • Full resolution to 0.1493 m/cell, on a 256x256 cell raster • Can perform real-time evaluation, combination and visualization of 1 to n measurements via a robust API
  • 9. Use Cases – Large Scale Analysis • 1 to n sessions can be aggregated to generate totals • Arbitrary criteria can be used to filter results • Spatially organized • 2.5B stored layers • Example - Average yield of corn in Polk County, Iowa, in 2018, grouped by average harvester speed
  • 10. Ingestion: Constant Stream, Micro-batches, Large Batch
  • 11. Ingestion: Stream or Batch Processing? • Zip up the stream and process it as a batch? • Unzip the batch and process it as a stream? • Some of both?
  • 12. Streaming – The Lowest Common Denominator: Kinesis Data Stream … but not always the best choice
  • 13. Retaining Batch Cohesion: Kinesis Data Stream
  • 14. Stateless Stream Processing: Decoder. Concerns: • Can I keep up? • Can I recover?
  • 15. Keeping Up - Options: 1. More Shards 2. Bigger Decoder Instances 3. Fan Out (diagram labels: Consumer, Decoder, Decoder, Decoder)
  • 16. Stateful Stream Processing (diagram labels: 512,107 seeds; 4,804,347 seeds). More Concerns: 1. How do I group related data? 2. How do I handle late arriving data? 3. How do I ensure exactly once processing?
  • 17. Apache Flink
  • 18. Checkpoints, Savepoints, and Other Painpoints. Some problems we’ve had: • Long checkpoint durations • Very large checkpoints & savepoints • S3 throttling • Checkpoint timeout spiral
  • 19. Checkpoints, Savepoints, and Other Painpoints. Some tips: • Try to avoid backpressure • Limit / reduce the amount of state we are keeping • Very long checkpoint duration • Removing checkpoints altogether
  • 20. Scaling and Spillway • Flink/EMR does not autoscale • Our data is very spiky • Irregular bursts of data • Inconsistent record size
  • 21. Scaling and Spillway. Solution - Spillway • If backpressure is detected, start piping records to a new stream • Monitor the stream; if the record count goes up, spin up a new cluster • When the record count goes down, tear down the cluster • Can cascade if needed
  • 22. Validation at Scale • 26.8 Trillion Measurements (so far) • Even at 6 Sigma that is 92 Million failures • How to tackle this: • Logging - Elasticsearch/Kibana with careful grooming of what to log • Audits - Periodic jobs that evaluate statistical success • Monitoring - CloudWatch Dashboards and Alarms • Investigator - Internally developed Spark-based tool that does analysis on failures at scale
  • 23. John Deere Careers: http://jobs.deere.com Now hiring: • ML / AI • Vision and Perception • Data Science • Telematics • Robotics • Mobile Software • Embedded Software • Software Engineering • Architecture
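
The volume figures quoted on slides 5, 7, and 22 can be checked with a few lines of arithmetic. The sketch below is illustrative only: the variable names are not from the talk, and the 3.4 defects-per-million rate is an assumed reading of "6 Sigma" (the slide does not state the rate it used).

```python
# Figures quoted on slides 5, 7, and 22. Names are illustrative, not from the talk.
SENSOR_READINGS_PER_ROW_UNIT = 15
SAMPLE_RATE_HZ = 5
ROW_UNITS = 32

# Slide 5: readings per second produced by one planter.
planter_readings_per_sec = SENSOR_READINGS_PER_ROW_UNIT * SAMPLE_RATE_HZ * ROW_UNITS

# Slide 7: fleet-wide measurements per second, scaled to one minute.
FLEET_MEASUREMENTS_PER_SEC = 12_000_000
fleet_measurements_per_min = FLEET_MEASUREMENTS_PER_SEC * 60

# Slide 22: expected failures over all measurements so far.
# ASSUMPTION: "6 Sigma" is taken as 3.4 defects per million (34 per 10 million).
TOTAL_MEASUREMENTS = 26_800_000_000_000
expected_failures = TOTAL_MEASUREMENTS * 34 // 10_000_000

print(f"planter readings/sec:   {planter_readings_per_sec:,}")
print(f"fleet measurements/min: {fleet_measurements_per_min:,}")
print(f"expected failures:      {expected_failures:,}")
```

Under the assumed 3.4-per-million rate the last figure comes out near 91 million, close to the 92 million quoted on the slide.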
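
The Spillway idea on slide 21 can be sketched as a toy, in-memory model. This is not John Deere's implementation: the `Tier` class, its `capacity` threshold, and the use of a `deque` in place of a Kinesis stream are all hypothetical stand-ins. It also simplifies the slide's "monitor the record count" step by spinning up a spillway tier as soon as the first record overflows.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Tier:
    """One cluster fed by its own stream (a deque stands in for the stream)."""
    name: str
    capacity: int                      # hypothetical backlog size treated as backpressure
    backlog: deque = field(default_factory=deque)
    spillway: Optional["Tier"] = None  # next tier, created on demand

    def backpressured(self) -> bool:
        return len(self.backlog) >= self.capacity

    def offer(self, record) -> str:
        """Accept the record, or pipe it to the spillway; return the tier used."""
        if not self.backpressured():
            self.backlog.append(record)
            return self.name
        if self.spillway is None:
            # Stands in for "spin up a new cluster" on a new stream.
            self.spillway = Tier(f"{self.name}-spill", self.capacity)
        return self.spillway.offer(record)  # cascades if the spillway is also full

    def drain(self, n: int) -> int:
        """Process up to n records from this tier; return how many were processed."""
        done = min(n, len(self.backlog))
        for _ in range(done):
            self.backlog.popleft()
        return done

    def reap(self) -> None:
        """Tear down spillway tiers whose streams are empty, deepest first."""
        if self.spillway is not None:
            self.spillway.reap()
            if not self.spillway.backlog and self.spillway.spillway is None:
                self.spillway = None


if __name__ == "__main__":
    primary = Tier("primary", capacity=3)
    print([primary.offer(i) for i in range(8)])  # burst larger than one tier
    primary.spillway.spillway.drain(2)
    primary.spillway.drain(3)
    primary.reap()
    print(primary.spillway)  # spillway tiers are torn down once drained
```

A real deployment would detect backpressure from Flink or stream metrics rather than a queue length, and would likely add hysteresis so clusters are not created and destroyed on every short burst.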