SlideShare a Scribd company logo
1 of 17
Download to read offline
See the Earth as it could be. 1
BringSatellite&DroneImageryintoYour
DataScienceWorkflows
Jason Brown
Senior Data Scientist, Astraea, Inc.
Spark+AI Summit 2020
June 26, 2020
See the Earth as it could be. Credit: Daily Overview
See the Earth as it could be. Credit: Astraea & Jamie Conklin
See the Earth as it could be.
Credit: Aviation Geek Club
See the Earth as it could be.
Credit: US NASA
See the Earth as it could be.
Big data
Variety:
• Naming? Overhead imagery, Earth observation, remote
sensing, raster
• > 150 file formats
Velocity:
• European Space Agency Sentinel-2 mission generating
~6TB per day
• Government operators in USA, Europe, Japan, South
Korea, China, Brazil
• Worldwide private organizations operating hundreds
satellites
Volume:
• 2 years ago, US NASA archives at 30 PB and growing
• Plus other public and private operators
• Plus drones
Open data policies (US, EU) and open standards for cloud
access
Source: US NASA
US NASA EOSDIS Data Volumes (PB)
See the Earth as it could be.
What about data science?
Long standing open source software for overhead imagery has
produced well developed but narrow tools
• QGIS: visual desktop app for cartography, processing of image to
enable human interpretation
• Geospatial Data Abstraction Library (GDAL): mature, high quality
open source C and C++ API
• GDAL programs: higher level operations, file I/O orientation
Necessary but not sufficient for data science
RasterFrames origin story:
• What makes overhead imagery “special”?
• How can we work with Volume of imagery in general-purpose tools?
See the Earth as it could be.
Overhead image data in 5 minutes
Starting from array or tensor representation of image data
What is different?
• Typical sizes 50-150 megapixel
• Each channel typically 16-bit integers (at rest), often interpreted as float
• Arbitrary number of channels representing more parts of electromagnetic
spectrum: e.g. ultraviolet, infrared, etc.
What is additional?
• Location!
• More metadata on each image: sensor, capture time, processing details from
publisher
• Encoding NaN in the integer values with a sentinel value
See the Earth as it could be.
Overhead image data in 5 minutes
Has a Projection, sometimes called a coordinate reference system (CRS)
• Invertible mathematical functions to convert coordinate on a 2D plane to/from a
location on the surface of an earth model
• Link to video explaining projections
• In practice we reference these with string representation
Has an affine transform to convert tensor index to/from 2D plane coordinates
• Translation: defines projection coordinate of 0, 0 index
• Scale: defines size of each cell in the 2D plane
• Shear: rarely seen in practice
Without shear, we can equivalently express the affine transform for a tensor of known size
as extent, a pair of 2D coordinates
See the Earth as it could be.
RasterFrames
Representing large image files in a DataFrame
See the Earth as it could be.
RasterFrames
Making the DataFrame model a practical reality
• Strong Python API - pip install pyrasterframes
• Custom datasource - spark.read.raster
• Tile User Defined Type
• Single column to carry tensor, projection, and extent
• In Python, the associated Tile class wraps a Numpy ndarray
• Column functions to extract or operate on the location and tensor
• Location examples: reproject, check intersection with other geometry
• Tensor examples: cell-wise arithmetic, column aggregates
• spark.ml Transformers to work with the Tile UDT in Pipelines
• Visualization and inspection of Tile UDT
See the Earth as it could be.
Github Gist
https://gist.github.com/vpipkt/09734d56082e7c74b187e9a635b6e764
Case study
See the Earth as it could be.
Currently incubating under Eclipse Foundation LocationTech
• Open source under Apache 2.0 license and commercial friendly intellectual
property governance
Try it:
• https://rasterframes.io/getting-started.html
• https://pypi.org/project/pyrasterframes/
• Free trial of EarthAI Notebook
Contribute!
• gitter.im for questions, engaging with ideas to contribute
https://gitter.im/locationtech/rasterframes
• GitHub for tracking issues & pull requests
• https://github.com/locationtech/rasterframes/blob/develop/CONTRIBUTING.md
RasterFrames project
See the Earth as it could be. 14
Thank you!
Visit https://rasterframes.io to try it out!
Date
See the Earth as it could be.
Trends in data publishing
• Another hypothesis: new standards for data publishing
are opening up new possibilities for tooling to use
satellite and drone data as found data
• Cloud optimized GeoTIFF
• More efficient parallel processing of large files
• Cloud hosted storage
• STAC specification: open standard for image metadata and search
• Cloud providers hosting open datasets
See the Earth as it could be.
Overhead Imagery Software & Data Science
Hypotheses:
1. Data science innovations often driven from application of “found data”. E.g. 1 million Flickr images lead to facial
recognition
2. Specialist tools have hindered the use of this data as “found data”
Observations:
• Many use cases satisfied by visual desktop apps for cartography, processing of image to enable human interpretation
• Typical science use case focuses on a small region, often very sophisticated processing of the data
• Science use cases well met by Geospatial Data Abstraction Library (GDAL) API
• GDAL
• Advantages: mature, high quality open source C and C++ API
• Disadvantages: higher level tools are oriented to Unix style read-process-write
• Data science, found data, and overhead imagery
• Data scientist needs to inspect some images but not a feasible tactic for creating a scalable model / algorithm / analytic
• Need to do some file reads, but want to use in memory processing
• In reality, a lot of data science and ML engineering is already done in Apache Spark SQL and ML APIs
See the Earth as it could be. 17Date

More Related Content

What's hot

Building a Real-Time Analytics Application with Apache Pulsar and Apache Pinot
Building a Real-Time Analytics Application with  Apache Pulsar and Apache PinotBuilding a Real-Time Analytics Application with  Apache Pulsar and Apache Pinot
Building a Real-Time Analytics Application with Apache Pulsar and Apache PinotAltinity Ltd
 
Airflow: Save Tons of Money by Using Deferrable Operators
Airflow: Save Tons of Money by Using Deferrable OperatorsAirflow: Save Tons of Money by Using Deferrable Operators
Airflow: Save Tons of Money by Using Deferrable OperatorsKaxil Naik
 
On-boarding with JanusGraph Performance
On-boarding with JanusGraph PerformanceOn-boarding with JanusGraph Performance
On-boarding with JanusGraph PerformanceChin Huang
 
Pourquoi Leroy Merlin a besoin d'un Knowledge Graph ?
Pourquoi Leroy Merlin a besoin d'un Knowledge Graph ?Pourquoi Leroy Merlin a besoin d'un Knowledge Graph ?
Pourquoi Leroy Merlin a besoin d'un Knowledge Graph ?Neo4j
 
Stl meetup cloudera platform - january 2020
Stl meetup   cloudera platform  - january 2020Stl meetup   cloudera platform  - january 2020
Stl meetup cloudera platform - january 2020Adam Doyle
 
Databricks Fundamentals
Databricks FundamentalsDatabricks Fundamentals
Databricks FundamentalsDalibor Wijas
 
[Meetup] a successful migration from elastic search to clickhouse
[Meetup] a successful migration from elastic search to clickhouse[Meetup] a successful migration from elastic search to clickhouse
[Meetup] a successful migration from elastic search to clickhouseVianney FOUCAULT
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkDatabricks
 
Ibis: Seamless Transition Between Pandas and Apache Spark
Ibis: Seamless Transition Between Pandas and Apache SparkIbis: Seamless Transition Between Pandas and Apache Spark
Ibis: Seamless Transition Between Pandas and Apache SparkDatabricks
 
Flink in Zalando's World of Microservices
Flink in Zalando's World of Microservices  Flink in Zalando's World of Microservices
Flink in Zalando's World of Microservices Zalando Technology
 
Managing Social Content with MongoDB
Managing Social Content with MongoDBManaging Social Content with MongoDB
Managing Social Content with MongoDBMongoDB
 
Log analysis with the elk stack
Log analysis with the elk stackLog analysis with the elk stack
Log analysis with the elk stackVikrant Chauhan
 
Sizing Your MongoDB Cluster
Sizing Your MongoDB ClusterSizing Your MongoDB Cluster
Sizing Your MongoDB ClusterMongoDB
 
Introducing Neo4j
Introducing Neo4jIntroducing Neo4j
Introducing Neo4jNeo4j
 
Facebook Presto presentation
Facebook Presto presentationFacebook Presto presentation
Facebook Presto presentationCyanny LIANG
 
Content Management With Apache Jackrabbit
Content Management With Apache JackrabbitContent Management With Apache Jackrabbit
Content Management With Apache JackrabbitJukka Zitting
 
Feature store: Solving anti-patterns in ML-systems
Feature store: Solving anti-patterns in ML-systemsFeature store: Solving anti-patterns in ML-systems
Feature store: Solving anti-patterns in ML-systemsAndrzej Michałowski
 
Performance Monitoring: Understanding Your Scylla Cluster
Performance Monitoring: Understanding Your Scylla ClusterPerformance Monitoring: Understanding Your Scylla Cluster
Performance Monitoring: Understanding Your Scylla ClusterScyllaDB
 
The Apache Spark File Format Ecosystem
The Apache Spark File Format EcosystemThe Apache Spark File Format Ecosystem
The Apache Spark File Format EcosystemDatabricks
 

What's hot (20)

Building a Real-Time Analytics Application with Apache Pulsar and Apache Pinot
Building a Real-Time Analytics Application with  Apache Pulsar and Apache PinotBuilding a Real-Time Analytics Application with  Apache Pulsar and Apache Pinot
Building a Real-Time Analytics Application with Apache Pulsar and Apache Pinot
 
Airflow: Save Tons of Money by Using Deferrable Operators
Airflow: Save Tons of Money by Using Deferrable OperatorsAirflow: Save Tons of Money by Using Deferrable Operators
Airflow: Save Tons of Money by Using Deferrable Operators
 
On-boarding with JanusGraph Performance
On-boarding with JanusGraph PerformanceOn-boarding with JanusGraph Performance
On-boarding with JanusGraph Performance
 
Pourquoi Leroy Merlin a besoin d'un Knowledge Graph ?
Pourquoi Leroy Merlin a besoin d'un Knowledge Graph ?Pourquoi Leroy Merlin a besoin d'un Knowledge Graph ?
Pourquoi Leroy Merlin a besoin d'un Knowledge Graph ?
 
Stl meetup cloudera platform - january 2020
Stl meetup   cloudera platform  - january 2020Stl meetup   cloudera platform  - january 2020
Stl meetup cloudera platform - january 2020
 
Databricks Fundamentals
Databricks FundamentalsDatabricks Fundamentals
Databricks Fundamentals
 
[Meetup] a successful migration from elastic search to clickhouse
[Meetup] a successful migration from elastic search to clickhouse[Meetup] a successful migration from elastic search to clickhouse
[Meetup] a successful migration from elastic search to clickhouse
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
 
Ibis: Seamless Transition Between Pandas and Apache Spark
Ibis: Seamless Transition Between Pandas and Apache SparkIbis: Seamless Transition Between Pandas and Apache Spark
Ibis: Seamless Transition Between Pandas and Apache Spark
 
Flink in Zalando's World of Microservices
Flink in Zalando's World of Microservices  Flink in Zalando's World of Microservices
Flink in Zalando's World of Microservices
 
Managing Social Content with MongoDB
Managing Social Content with MongoDBManaging Social Content with MongoDB
Managing Social Content with MongoDB
 
Log analysis with the elk stack
Log analysis with the elk stackLog analysis with the elk stack
Log analysis with the elk stack
 
Sizing Your MongoDB Cluster
Sizing Your MongoDB ClusterSizing Your MongoDB Cluster
Sizing Your MongoDB Cluster
 
Introducing Neo4j
Introducing Neo4jIntroducing Neo4j
Introducing Neo4j
 
Facebook Presto presentation
Facebook Presto presentationFacebook Presto presentation
Facebook Presto presentation
 
Content Management With Apache Jackrabbit
Content Management With Apache JackrabbitContent Management With Apache Jackrabbit
Content Management With Apache Jackrabbit
 
Feature store: Solving anti-patterns in ML-systems
Feature store: Solving anti-patterns in ML-systemsFeature store: Solving anti-patterns in ML-systems
Feature store: Solving anti-patterns in ML-systems
 
Performance Monitoring: Understanding Your Scylla Cluster
Performance Monitoring: Understanding Your Scylla ClusterPerformance Monitoring: Understanding Your Scylla Cluster
Performance Monitoring: Understanding Your Scylla Cluster
 
The Apache Spark File Format Ecosystem
The Apache Spark File Format EcosystemThe Apache Spark File Format Ecosystem
The Apache Spark File Format Ecosystem
 
Data engineering design patterns
Data engineering design patternsData engineering design patterns
Data engineering design patterns
 

Similar to Bring Satellite and Drone Imagery into your Data Science Workflows

Object extraction from satellite imagery using deep learning
Object extraction from satellite imagery using deep learningObject extraction from satellite imagery using deep learning
Object extraction from satellite imagery using deep learningAly Abdelkareem
 
Using Containers and HPC to Solve the Mysteries of the Universe by Deborah Bard
Using Containers and HPC to Solve the Mysteries of the Universe by Deborah BardUsing Containers and HPC to Solve the Mysteries of the Universe by Deborah Bard
Using Containers and HPC to Solve the Mysteries of the Universe by Deborah BardDocker, Inc.
 
Big Data Analysis : Deciphering the haystack
Big Data Analysis : Deciphering the haystack Big Data Analysis : Deciphering the haystack
Big Data Analysis : Deciphering the haystack Srinath Perera
 
WebServices_Grid.ppt
WebServices_Grid.pptWebServices_Grid.ppt
WebServices_Grid.pptEqinNiftalyev
 
TraitCapture: NextGen Monitoring and Visualization from seed to ecosystem
TraitCapture: NextGen Monitoring and Visualization from seed to ecosystemTraitCapture: NextGen Monitoring and Visualization from seed to ecosystem
TraitCapture: NextGen Monitoring and Visualization from seed to ecosystemTimeScience
 
Rack Cluster Deployment for SDSC Supercomputer
Rack Cluster Deployment for SDSC SupercomputerRack Cluster Deployment for SDSC Supercomputer
Rack Cluster Deployment for SDSC SupercomputerRebekah Rodriguez
 
Big Data Analytics Strategy and Roadmap
Big Data Analytics Strategy and RoadmapBig Data Analytics Strategy and Roadmap
Big Data Analytics Strategy and RoadmapSrinath Perera
 
What is a Data Commons and Why Should You Care?
What is a Data Commons and Why Should You Care? What is a Data Commons and Why Should You Care?
What is a Data Commons and Why Should You Care? Robert Grossman
 
Geo Analytics Canada Overview - May 2020
Geo Analytics Canada Overview - May 2020Geo Analytics Canada Overview - May 2020
Geo Analytics Canada Overview - May 2020GEO Analytics Canada
 
Using Deep Learning to Derive 3D Cities from Satellite Imagery
Using Deep Learning to Derive 3D Cities from Satellite ImageryUsing Deep Learning to Derive 3D Cities from Satellite Imagery
Using Deep Learning to Derive 3D Cities from Satellite ImageryAstraea, Inc.
 
Accelerating Science with Cloud Technologies in the ABoVE Science Cloud
Accelerating Science with Cloud Technologies in the ABoVE Science CloudAccelerating Science with Cloud Technologies in the ABoVE Science Cloud
Accelerating Science with Cloud Technologies in the ABoVE Science CloudGlobus
 
A Production Quality Sketching Library for the Analysis of Big Data
A Production Quality Sketching Library for the Analysis of Big DataA Production Quality Sketching Library for the Analysis of Big Data
A Production Quality Sketching Library for the Analysis of Big DataDatabricks
 
HiPEAC 2019 Workshop - Real-Time Modelling Visual Scenes with Biological Insp...
HiPEAC 2019 Workshop - Real-Time Modelling Visual Scenes with Biological Insp...HiPEAC 2019 Workshop - Real-Time Modelling Visual Scenes with Biological Insp...
HiPEAC 2019 Workshop - Real-Time Modelling Visual Scenes with Biological Insp...Tulipp. Eu
 
PhD Thesis Proposal
PhD Thesis Proposal PhD Thesis Proposal
PhD Thesis Proposal Ziqiang Feng
 
OpenStreetMap in 3D - current developments
OpenStreetMap in 3D - current developmentsOpenStreetMap in 3D - current developments
OpenStreetMap in 3D - current developmentsvirtualcitySYSTEMS GmbH
 
Earth on AWS - Next-Generation Open Data Platforms
Earth on AWS - Next-Generation Open Data PlatformsEarth on AWS - Next-Generation Open Data Platforms
Earth on AWS - Next-Generation Open Data PlatformsAmazon Web Services
 
The Big Data Stack
The Big Data StackThe Big Data Stack
The Big Data StackZubair Nabi
 

Similar to Bring Satellite and Drone Imagery into your Data Science Workflows (20)

Object extraction from satellite imagery using deep learning
Object extraction from satellite imagery using deep learningObject extraction from satellite imagery using deep learning
Object extraction from satellite imagery using deep learning
 
Using Containers and HPC to Solve the Mysteries of the Universe by Deborah Bard
Using Containers and HPC to Solve the Mysteries of the Universe by Deborah BardUsing Containers and HPC to Solve the Mysteries of the Universe by Deborah Bard
Using Containers and HPC to Solve the Mysteries of the Universe by Deborah Bard
 
Big Data Analysis : Deciphering the haystack
Big Data Analysis : Deciphering the haystack Big Data Analysis : Deciphering the haystack
Big Data Analysis : Deciphering the haystack
 
WebServices_Grid.ppt
WebServices_Grid.pptWebServices_Grid.ppt
WebServices_Grid.ppt
 
Project Matsu
Project MatsuProject Matsu
Project Matsu
 
TraitCapture: NextGen Monitoring and Visualization from seed to ecosystem
TraitCapture: NextGen Monitoring and Visualization from seed to ecosystemTraitCapture: NextGen Monitoring and Visualization from seed to ecosystem
TraitCapture: NextGen Monitoring and Visualization from seed to ecosystem
 
Rack Cluster Deployment for SDSC Supercomputer
Rack Cluster Deployment for SDSC SupercomputerRack Cluster Deployment for SDSC Supercomputer
Rack Cluster Deployment for SDSC Supercomputer
 
Big Data Analytics Strategy and Roadmap
Big Data Analytics Strategy and RoadmapBig Data Analytics Strategy and Roadmap
Big Data Analytics Strategy and Roadmap
 
What is a Data Commons and Why Should You Care?
What is a Data Commons and Why Should You Care? What is a Data Commons and Why Should You Care?
What is a Data Commons and Why Should You Care?
 
Geo Analytics Canada Overview - May 2020
Geo Analytics Canada Overview - May 2020Geo Analytics Canada Overview - May 2020
Geo Analytics Canada Overview - May 2020
 
Using Deep Learning to Derive 3D Cities from Satellite Imagery
Using Deep Learning to Derive 3D Cities from Satellite ImageryUsing Deep Learning to Derive 3D Cities from Satellite Imagery
Using Deep Learning to Derive 3D Cities from Satellite Imagery
 
Accelerating Science with Cloud Technologies in the ABoVE Science Cloud
Accelerating Science with Cloud Technologies in the ABoVE Science CloudAccelerating Science with Cloud Technologies in the ABoVE Science Cloud
Accelerating Science with Cloud Technologies in the ABoVE Science Cloud
 
A Production Quality Sketching Library for the Analysis of Big Data
A Production Quality Sketching Library for the Analysis of Big DataA Production Quality Sketching Library for the Analysis of Big Data
A Production Quality Sketching Library for the Analysis of Big Data
 
HiPEAC 2019 Workshop - Real-Time Modelling Visual Scenes with Biological Insp...
HiPEAC 2019 Workshop - Real-Time Modelling Visual Scenes with Biological Insp...HiPEAC 2019 Workshop - Real-Time Modelling Visual Scenes with Biological Insp...
HiPEAC 2019 Workshop - Real-Time Modelling Visual Scenes with Biological Insp...
 
Big data
Big dataBig data
Big data
 
PhD Thesis Proposal
PhD Thesis Proposal PhD Thesis Proposal
PhD Thesis Proposal
 
Intro to Big Data - Spark
Intro to Big Data - SparkIntro to Big Data - Spark
Intro to Big Data - Spark
 
OpenStreetMap in 3D - current developments
OpenStreetMap in 3D - current developmentsOpenStreetMap in 3D - current developments
OpenStreetMap in 3D - current developments
 
Earth on AWS - Next-Generation Open Data Platforms
Earth on AWS - Next-Generation Open Data PlatformsEarth on AWS - Next-Generation Open Data Platforms
Earth on AWS - Next-Generation Open Data Platforms
 
The Big Data Stack
The Big Data StackThe Big Data Stack
The Big Data Stack
 

More from Databricks

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDatabricks
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Databricks
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Databricks
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Databricks
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Databricks
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of HadoopDatabricks
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDatabricks
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceDatabricks
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringDatabricks
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixDatabricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationDatabricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchDatabricks
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesDatabricks
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesDatabricks
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsDatabricks
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkDatabricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesDatabricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkDatabricks
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeDatabricks
 
Machine Learning CI/CD for Email Attack Detection
Machine Learning CI/CD for Email Attack DetectionMachine Learning CI/CD for Email Attack Detection
Machine Learning CI/CD for Email Attack DetectionDatabricks
 

More from Databricks (20)

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
 
Machine Learning CI/CD for Email Attack Detection
Machine Learning CI/CD for Email Attack DetectionMachine Learning CI/CD for Email Attack Detection
Machine Learning CI/CD for Email Attack Detection
 

Recently uploaded

Aminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
Aminabad Call Girl Agent 9548273370 , Call Girls Service LucknowAminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
Aminabad Call Girl Agent 9548273370 , Call Girls Service Lucknowmakika9823
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998YohFuh
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...dajasot375
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130Suhani Kapoor
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Callshivangimorya083
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsappssapnasaifi408
 
Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts ServiceSapana Sha
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiSuhani Kapoor
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...Suhani Kapoor
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSAishani27
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubaihf8803863
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfRachmat Ramadhan H
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Jack DiGiovanna
 
Predicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project PresentationPredicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project PresentationBoston Institute of Analytics
 
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一ffjhghh
 
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...Suhani Kapoor
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Delhi Call girls
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxStephen266013
 

Recently uploaded (20)

Aminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
Aminabad Call Girl Agent 9548273370 , Call Girls Service LucknowAminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
Aminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
 
Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts Service
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICS
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
 
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
 
Predicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project PresentationPredicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project Presentation
 
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
 
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docx
 

Bring Satellite and Drone Imagery into your Data Science Workflows

  • 1. See the Earth as it could be. 1 BringSatellite&DroneImageryintoYour DataScienceWorkflows Jason Brown Senior Data Scientist, Astraea, Inc. Spark+AI Summit 2020 June 26, 2020
  • 2. See the Earth as it could be. Credit: Daily Overview
  • 3. See the Earth as it could be. Credit: Astraea & Jamie Conklin
  • 4. See the Earth as it could be. Credit: Aviation Geek Club
  • 5. See the Earth as it could be. Credit: US NASA
  • 6. See the Earth as it could be. Big data Variety: • Naming? Overhead imagery, Earth observation, remote sensing, raster • > 150 file formats Velocity: • European Space Agency Sentinel-2 mission generating ~6TB per day • Government operators in USA, Europe, Japan, South Korea, China, Brazil • Worldwide private organizations operating hundreds satellites Volume: • 2 years ago, US NASA archives at 30 PB and growing • Plus other public and private operators • Plus drones Open data policies (US, EU) and open standards for cloud access Source: US NASA US NASA EOSDIS Data Volumes (PB)
  • 7. See the Earth as it could be. What about data science? Long standing open source software for overhead imagery has produced well developed but narrow tools • QGIS: visual desktop app for cartography, processing of image to enable human interpretation • Geospatial Data Abstraction Library (GDAL): mature, high quality open source C and C++ API • GDAL programs: higher level operations, file I/O orientation Necessary but not sufficient for data science RasterFrames origin story: • What makes overhead imagery “special”? • How can we work with Volume of imagery in general-purpose tools?
  • 8. See the Earth as it could be. Overhead image data in 5 minutes Starting from array or tensor representation of image data What is different? • Typical sizes 50-150 megapixel • Each channel typically 16-bit integers (at rest), often interpreted as float • Arbitrary number of channels representing more parts of electromagnetic spectrum: e.g. ultraviolet, infrared, etc. What is additional? • Location! • More metadata on each image: sensor, capture time, processing details from publisher • Encoding NaN in the integer values with a sentinel value
  • 9. See the Earth as it could be. Overhead image data in 5 minutes Has a Projection, sometimes called a coordinate reference system (CRS) • Invertible mathematical functions to convert coordinate on a 2D plane to/from a location on the surface of an earth model • Link to video explaining projections • In practice we reference these with string representation Has an affine transform to convert tensor index to/from 2D plane coordinates • Translation: defines projection coordinate of 0, 0 index • Scale: defines size of each cell in the 2D plane • Shear: rarely seen in practice Without shear, we can equivalently express the affine transform for a tensor of known size as extent, a pair of 2D coordinates
  • 10. See the Earth as it could be. RasterFrames Representing large image files in a DataFrame
  • 11. See the Earth as it could be. RasterFrames Making the DataFrame model a practical reality • Strong Python API - pip install pyrasterframes • Custom datasource - spark.read.raster • Tile User Defined Type • Single column to carry tensor, projection, and extent • In Python, the associated Tile class wraps a Numpy ndarray • Column functions to extract or operate on the location and tensor • Location examples: reproject, check intersection with other geometry • Tensor examples: cell-wise arithmetic, column aggregates • spark.ml Transformers to work with the Tile UDT in Pipelines • Visualization and inspection of Tile UDT
  • 12. See the Earth as it could be. Github Gist https://gist.github.com/vpipkt/09734d56082e7c74b187e9a635b6e764 Case study
  • 13. See the Earth as it could be. Currently incubating under Eclipse Foundation LocationTech • Open source under Apache 2.0 license and commercial friendly intellectual property governance Try it: • https://rasterframes.io/getting-started.html • https://pypi.org/project/pyrasterframes/ • Free trial of EarthAI Notebook Contribute! • gitter.im for questions, engaging with ideas to contribute https://gitter.im/locationtech/rasterframes • GitHub for tracking issues & pull requests • https://github.com/locationtech/rasterframes/blob/develop/CONTRIBUTING.md RasterFrames project
  • 14. See the Earth as it could be. 14 Thank you! Visit https://rasterframes.io to try it out! Date
  • 15. See the Earth as it could be. Trends in data publishing • Another hypothesis: new standards for data publishing are opening up new possibilities for tooling to use satellite and drone data as found data • Cloud optimized GeoTIFF • More efficient parallel processing of large files • Cloud hosted storage • STAC specification: open standard for image metadata and search • Cloud providers hosting open datasets
  • 16. See the Earth as it could be. Overhead Imagery Software & Data Science Hypotheses: 1. Data science innovations often driven from application of “found data”. E.g. 1 million Flickr images lead to facial recognition 2. Specialist tools have hindered the use of this data as “found data” Observations: • Many use cases satisfied by visual desktop apps for cartography, processing of image to enable human interpretation • Typical science use case focuses on a small region, often very sophisticated processing of the data • Science use cases well met by Geospatial Data Abstraction Library (GDAL) API • GDAL • Advantages: mature, high quality open source C and C++ API • Disadvantages: higher level tools are oriented to Unix style read-process-write • Data science, found data, and overhead imagery • Data scientist needs to inspect some images but not a feasible tactic for creating a scalable model / algorithm / analytic • Need to do some file reads, but want to use in memory processing • In reality, a lot of data science and ML engineering is already done in Apache Spark SQL and ML APIs
  • 17. See the Earth as it could be. 17Date