How Satellogic uses AI to understand the world and deliver knowledge to its clients via satellites. This presentation was given at the Barcelona CitiAi meetup in January 2019.
Mike Warren is the co-founder and CTO of Descartes Labs, a company that operates a geospatial analysis platform using multiple integrated satellite image datasets. The platform provides analysis-ready images with historical records for machine learning and allows users to find, measure, monitor changes over time, and predict future changes to minimize risk and optimize outcomes. It eliminates much of the data preparation time typically required by geospatial scientists by maintaining a growing archive of processed images and a robust pipeline for continuous updates as new images become available.
This document discusses big data in the context of neuroscience and connectomics. It outlines the compute infrastructure, software tools, and data challenges needed to process and analyze huge volumes of neuroimaging and electron microscopy data to map neural connections at scale. Specifically, it mentions using Spark, Hadoop, Kafka and NoSQL databases to handle neuroimaging techniques like MRI, diffusion tensor imaging, and electron microscopy data in order to further the field of connectomics and map the brain at a microscale level.
This document describes the development of an image analysis GUI to characterize infrared detectors for the WFIRST telescope. The GUI allows researchers to efficiently analyze detector image data by automating the process of measuring key figures of merit like dark current, noise, gain and quantum efficiency. These statistical values are important for comparing detector performance under different operating conditions. The automated routines have standardized the analysis and accelerated the process of evaluating detectors for use in the WFIRST mission.
1) Machine learning and predictive analytics can be used to analyze large datasets and build models to find useful insights, predict outcomes, and provide competitive advantages.
2) WSO2 Machine Learner is a product that allows users to upload data, train machine learning models using various algorithms, compare results, and iterate on models.
3) Example use cases demonstrated by WSO2 Machine Learner include predicting airport wait times, tracking people via Bluetooth, predicting the Super Bowl winner, detecting defective manufacturing equipment, and identifying promising customers.
Using synthetic data for computer vision model training - Unity Technologies
During this webinar Unity’s computer vision team provides an overview of computer vision, walks through current real-world data workflows, and explains why companies are moving toward synthetically generated data as an alternate data source for model training.
Watch the webinar: https://resources.unity.com/ai-ml/cv-webinar-dec-2021
This document discusses image processing and big data initiatives. It describes how data can be used to create either useful applications or dangerous weapons. It also discusses the erosion of boundaries between different fields due to information technologies. New products are increasingly digital and complex due to advances in areas like sensors, machine learning, and computer technologies. Intelligent recognition technologies can now identify people from iris scans or detect diseases from molecular breath analysis. Both artificial intelligence and computational intelligence are discussed in the context of using data and algorithms to enable adaptive and intelligent systems. Various methods for preprocessing and classifying data are also outlined.
LIDAR Magazine 2015: The Birth of 3D Mapping Artificial Intelligence - Jason Creadore 🌐
Artificial intelligence (AI) has the potential to take the LiDAR mapping market into hypergrowth. Following Moore's law, with computation capacity doubling every two years, AI-driven point cloud feature extraction can now outpace the rate at which laser scanning systems generate data.
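As a back-of-the-envelope illustration of that doubling argument (my own sketch, not from the article), a two-year doubling period compounds to roughly a 32x capacity gain over a decade:

```python
# Back-of-the-envelope sketch of a Moore's-law-style doubling curve.
# Assumes capacity doubles every 2 years, the figure cited above.

def capacity_multiplier(years: float, doubling_period_years: float = 2.0) -> float:
    """Relative compute capacity after `years`, starting from 1.0."""
    return 2.0 ** (years / doubling_period_years)

if __name__ == "__main__":
    for years in (2, 4, 10):
        print(f"after {years:2d} years: {capacity_multiplier(years):.0f}x")
```

This is why feature extraction throughput can grow faster than scanner data rates: the scanners' output grows with hardware revisions, while the compute applied to each point cloud compounds exponentially.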
Desktop Software for Unmanned Aerial Systems (UAS) - Kamal Shahi
The document compares various desktop software used for processing data from unmanned aerial vehicles (UAVs). It provides a table comparing the major features and functionality of Pix4Dmapper, Agisoft Metashape, WebODM, and QGIS. These include outputs generated, ease of use, cost, support and limitations. It also provides guidance on choosing the best software by defining needs, researching options, trying demonstrations, considering costs, and getting recommendations from experts. Selecting the right software depends on the required processing capabilities, accuracy, compatibility and other factors listed.
Supermicro designed and implemented a rack-level cluster solution for the San Diego Supercomputer Center (SDSC), optimized for their custom and experimental AI training and inferencing workloads while meeting their environmental and TCO requirements. The project team will discuss the journey of designing and deploying our Rack Plug and Play cluster, and Shawn Strande, Deputy Director, SDSC, will share his experience of partnering with the Supermicro team to solve his challenges in HPC and AI.
The team will also share the technology that powers the SDSC Voyager Supercomputer, the Habana Gaudi AI system with 3rd Gen Intel® Xeon® Scalable processors for Deep Learning Training, and Habana Goya for Inferencing.
Watch the webinar: https://www.brighttalk.com/webcast/17278/517013
Streaming Analytics: It's Not the Same Game - Numenta
This document discusses streaming analytics and how traditional machine learning algorithms are not well-suited for streaming data. It introduces Hierarchical Temporal Memory (HTM) as a new approach inspired by neuroscience that can handle streaming data, continuous learning, and temporal modeling. HTM uses sparse distributed representations and models sequences to make predictions and detect anomalies. The document provides examples of how HTM can be applied to problems like anomaly detection in server metrics, human behavior, geospatial tracking, social media streams, and stock prices. HTM algorithms are domain-independent and use the same codebase and parameters across different problem types.
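HTM itself lives in Numenta's NuPIC codebase; as a much simpler stand-in, the sketch below illustrates the streaming constraints the deck emphasizes: a single pass over the data, continuous learning with no separate batch-training phase, and per-point anomaly flags. This is a rolling z-score baseline, not HTM, and all names and thresholds are illustrative.

```python
# A minimal streaming anomaly detector: not HTM, just a rolling z-score
# baseline showing the streaming constraints (single pass, continuous
# learning, no batch retraining). Window size and threshold are illustrative.
from collections import deque
import statistics

class RollingZScoreDetector:
    def __init__(self, window: int = 50, threshold: float = 3.0):
        self.values = deque(maxlen=window)  # bounded memory: old points fall off
        self.threshold = threshold

    def update(self, x: float) -> bool:
        """Ingest one point; return True if it looks anomalous."""
        anomalous = False
        if len(self.values) >= 10:  # wait for a minimal history
            mean = statistics.fmean(self.values)
            stdev = statistics.pstdev(self.values) or 1e-9
            anomalous = abs(x - mean) / stdev > self.threshold
        self.values.append(x)  # keep learning, even from flagged points
        return anomalous
```

Fed one server metric at a time, a detector like this flags a sudden spike against the recent window; HTM goes further by modeling temporal sequences, so it can also flag a value that is normal in magnitude but arrives out of order.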
Artificial Intelligence in practice - Gerbert Kaandorp - Codemotion Amsterdam... - Codemotion
In this talk Gerbert will give an overview of Artificial Intelligence, outline the current state of the art in research and explain what it takes to actually do an AI project. Using practical cases and tools he will give you insight in the phases of an AI project and explain some of the problems you might encounter along the way and how you might be able to solve them.
Designing data pipelines for analytics and machine learning in industrial set... - DataWorks Summit
Machine learning has made it possible for technologists to do amazing things with data. Its arrival coincides with the evolution of networked manufacturing systems driven by IoT. In this presentation we'll examine the rise of IoT and ML from a practitioner's perspective to better understand how applications of AI can be built in industrial settings. We'll walk through a case study that combines multiple IoT and ML technologies to monitor and optimize an industrial heating and cooling (HVAC) system. Through this instructive example you'll see how the following components can be put into action:
1. A StreamSets data pipeline that sources from MQTT and persists to OpenTSDB
2. A TensorFlow model that predicts anomalies in streaming sensor data
3. A Spark application that derives new event streams for real-time alerts
4. A Grafana dashboard that displays factory sensors and alerts in an interactive view
By walking through this solution step-by-step, you'll learn how to build the fundamental capabilities needed in order to handle endless streams of IoT data and derive ML insights from that data:
1. How to transport IoT data through scalable publish/subscribe event streams
2. How to process data streams with transformations and filters
3. How to persist data streams with the timeliness required for interactive dashboards
4. How to collect labeled datasets for training machine learning models
At the end of this presentation you will have learned how a variety of tools can be used together to build ML enhanced applications and data products for instrumented manufacturing systems.
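The pipeline shape those steps describe can be sketched in a few lines. This stand-in uses only Python's stdlib queues in place of the talk's actual stack (MQTT, StreamSets, OpenTSDB, TensorFlow, Spark, Grafana); the field names and threshold are illustrative, and the point is only the three-stage transport/transform/alert structure:

```python
# A stdlib-only sketch of the pipeline shape described above: a
# publish/subscribe stream of sensor readings, a transform/filter stage,
# and an alerting stage. Everything here is an illustrative stand-in for
# the MQTT -> StreamSets -> TensorFlow/Spark -> Grafana stack in the talk.
import queue

def publish(q: queue.Queue, readings):
    """Stand-in for an MQTT producer: push raw sensor readings."""
    for r in readings:
        q.put(r)
    q.put(None)  # end-of-stream marker

def transform(q_in: queue.Queue, q_out: queue.Queue):
    """Transform/filter stage: convert units, drop malformed readings."""
    while (r := q_in.get()) is not None:
        if "temp_f" in r:
            r["temp_c"] = (r["temp_f"] - 32) * 5 / 9
            q_out.put(r)
    q_out.put(None)

def alerts(q_in: queue.Queue, limit_c: float = 90.0):
    """Alerting stage: flag readings over a threshold (a crude stand-in
    for the TensorFlow anomaly model in step 2)."""
    fired = []
    while (r := q_in.get()) is not None:
        if r["temp_c"] > limit_c:
            fired.append(r["sensor"])
    return fired
```

In production each stage would run as an independent service connected by durable event streams, which is exactly what the pub/sub transport in step 1 of the capability list provides.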
Speakers
Ian Downard, Sr. Developer Evangelist, MapR
William Ochandarena, Senior Director of Product Management, MapR
This document discusses analytics and IoT. It covers key topics like data collection from IoT sensors, data storage and processing using big data tools, and performing descriptive, predictive, and prescriptive analytics. Cloud platforms and visualization tools that can be used to build end-to-end IoT and analytics solutions are also presented. The document provides an overview of building IoT solutions for collecting, analyzing, and gaining insights from sensor data.
This document discusses 5-sense computing in robots for remote monitoring applications. It describes how giving robots human-like senses such as sight, hearing, smell, taste and touch would allow them to be used for remote inspection in hazardous environments. Current robotic sensing capabilities are outlined and examples of using multi-sensory robots for remote quality control, tank inspections and underground mine monitoring are provided. The networking requirements for transmitting multi-sensory data from robots in real-time are also summarized.
This presentation has slides from a talk that I gave at the annual Experimental Biology meeting, 2015, on our curriculum for Big Data Analytics in the Inland Empire.
Deep Learning Applications to Satellite Imagery - rlewis48
These are the slides from Intel's AI DevCon 2018 Conference; the video from the workshop is available online. The last few years have seen a significant increase in the launch of commercial and federal satellite imaging platforms. As these data become more widely available, so too have the data science challenges and research opportunities. In this hands-on workshop, CosmiQ Works and Intel AI Lab will introduce the business use cases and research questions around leveraging this imagery, as well as helpful tools and datasets to ease the friction. We will guide attendees through a hands-on exercise using the tools to train a small network on Intel® Xeon® Processors to detect buildings or road networks using the SpaceNet™ dataset. Join us to learn how to explore this exciting area of applied deep learning.
Machine learning has had a tremendous impact on healthcare, in both diagnosis and treatment. By employing image classification and image segmentation, diagnostic insights and solutions with automated report generation can be delivered in real time, leading to faster, better-informed decisions and streamlined costs.
Roelof Pieters (Overstory) - Tackling Forest Fires and Deforestation with Sat... - Codiax
This document provides an overview of Overstory, a company that uses satellite data and AI to monitor forests and tackle issues like deforestation and wildfires. It discusses how Overstory uses machine learning on high-resolution satellite imagery to create segmentation maps and monitor changes in forests over time. It also describes Overstory's infrastructure including its use of JupyterHub, Dask, and Papermill to enable large-scale distributed processing of satellite data and training of deep learning models.
A Predictive Analytics Workflow on DICOM Images using Apache Spark with Anahi... - Databricks
This document discusses using Apache Spark for predictive analytics on DICOM medical images. It describes challenges in working with medical image and metadata, and how tools like Spark, Spark-TK, and TensorFlow can help analyze this data at scale. A live demo then shows building machine learning models on DICOM data to derive insights and predict patient outcomes or device performance. Performance tests analyze the impact of data size, partitions, executors, and cores on processing time.
This document discusses building scalable IoT applications using open source technologies. It begins by providing an overview of the growth of the IoT market and connected devices. It then discusses challenges with traditional "data lake" architectures for IoT data due to the high volume, velocity, and variety of IoT data. The document proposes an architecture combining stream processing for real-time data with analytics on both real-time and stored data. It discusses data access patterns and storage requirements for different types of IoT data. Finally, it provides an overview of open source technologies that can be used to build scalable IoT applications.
This document provides an overview of big data analytics, strategies, and the WSO2 big data platform. It discusses how the amount of data in the world is growing exponentially due to factors like increased data collection and the internet of things. It then summarizes the WSO2 big data platform for collecting, processing, analyzing and visualizing large datasets. Key components include the complex event processor for query processing and the business activity monitor for dashboards. The document concludes by outlining new developments and features being worked on, such as distributed complex event processing and machine learning integration.
Splunk is a powerful platform for understanding your data. The preview of the Machine Learning Toolkit and Showcase App extends Splunk with a rich suite of advanced analytics and machine learning algorithms. In this session, we'll present an overview of the app architecture and API and show you how to use Splunk to easily perform a variety of tasks, including outlier and anomaly detection, predictive analytics, and event clustering. We’ll use real data to explore these techniques and explain the intuition behind the analytics.
This talk was presented at Startup Master Class 2017 (http://aaiitkblr.org/smc/) at Christ College, Bangalore. Hosted by the IIT Kanpur Alumni Association and co-presented by the IIT KGP Alumni Association, IITACB, PanIIT, and IIMA and IIMB alumni.
My co-presenter was Biswa Gourav Singh, and Navin Manaswi was a contributor.
http://dataconomy.com/2017/04/history-neural-networks/ - timeline for neural networks
Webinar: Machine Learning para Microcontroladores - Embarcados
In this webinar, concepts of artificial intelligence will be presented, along with development tools integrated into MPLAB X and Harmony 3, and a demonstration of an anomaly detection system using a microcontroller from the ATSAMD21 family (ARM Cortex M0+).
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
https://www.meetup.com/unstructured-data-meetup-new-york/
This meetup is for people working in unstructured data. Speakers will present on related topics such as vector databases, LLMs, and managing data at scale. The intended audience includes machine learning engineers, data scientists, data engineers, software engineers, and PMs. This meetup was formerly the Milvus Meetup and is sponsored by Zilliz, maintainers of Milvus.
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Round table discussion of vector databases, unstructured data, AI, big data, real-time, robots, and Milvus.
A lively discussion with NJ Gen AI Meetup lead Prasad and Procure.FYI's co-founder.
4th Modern Marketing Reckoner by MMA Global India & GroupM: 60+ experts on W... - Social Samosa
The Modern Marketing Reckoner (MMR) is a comprehensive resource packed with points of view from 60+ industry leaders on how AI is transforming the four key pillars of marketing: product, place, price, and promotions.
Predictably Improve Your B2B Tech Company's Performance by Leveraging DataKiwi Creative
Harness the power of AI-backed reports, benchmarking and data analysis to predict trends and detect anomalies in your marketing efforts.
Peter Caputa, CEO at Databox, reveals how you can discover the strategies and tools to increase your growth rate (and margins!).
From metrics to track to data habits to pick up, enhance your reporting for powerful insights to improve your B2B tech company's marketing.
- - -
This is the webinar recording from the June 2024 HubSpot User Group (HUG) for B2B Technology USA.
Watch the video recording at https://youtu.be/5vjwGfPN9lw
Sign up for future HUG events at https://events.hubspot.com/b2b-technology-usa/
Global Situational Awareness of A.I. and where it's headed (Vikram Sood)
You can see the future first in San Francisco.
Over the past year, the talk of the town has shifted from $10 billion compute clusters to $100 billion clusters to trillion-dollar clusters. Every six months another zero is added to the boardroom plans. Behind the scenes, there’s a fierce scramble to secure every power contract still available for the rest of the decade, every voltage transformer that can possibly be procured. American big business is gearing up to pour trillions of dollars into a long-unseen mobilization of American industrial might. By the end of the decade, American electricity production will have grown tens of percent; from the shale fields of Pennsylvania to the solar farms of Nevada, hundreds of millions of GPUs will hum.
The AGI race has begun. We are building machines that can think and reason. By 2025/26, these machines will outpace college graduates. By the end of the decade, they will be smarter than you or I; we will have superintelligence, in the true sense of the word. Along the way, national security forces not seen in half a century will be unleashed, and before long, The Project will be on. If we’re lucky, we’ll be in an all-out race with the CCP; if we’re unlucky, an all-out war.
Everyone is now talking about AI, but few have the faintest glimmer of what is about to hit them. Nvidia analysts still think 2024 might be close to the peak. Mainstream pundits are stuck on the wilful blindness of “it’s just predicting the next word”. They see only hype and business-as-usual; at most they entertain another internet-scale technological change.
Before long, the world will wake up. But right now, there are perhaps a few hundred people, most of them in San Francisco and the AI labs, that have situational awareness. Through whatever peculiar forces of fate, I have found myself amongst them. A few years ago, these people were derided as crazy—but they trusted the trendlines, which allowed them to correctly predict the AI advances of the past few years. Whether these people are also right about the next few years remains to be seen. But these are very smart people—the smartest people I have ever met—and they are the ones building this technology. Perhaps they will be an odd footnote in history, or perhaps they will go down in history like Szilard and Oppenheimer and Teller. If they are seeing the future even close to correctly, we are in for a wild ride.
Let me tell you what we see.
1. Understanding what happens on earth
using satellites
Barcelona CityAI 2019
Albert Pujol Torras
apujol@satellogic.com https://www.linkedin.com/in/albert-pujol-torras-3a7367/
2. Agenda
● Satellogic
● Satellogic Data Science and Solutions
● What we can do with satellites, examples of problems we face
● What type of data do we work with?
● Processing infrastructure, hardware and software
● Our team
● Machine learning algorithms ...and challenges.
● Lessons learned
● Questions
3.
4.
● Data Science & Solutions: BCN
● Delivery platform: TLV
● Headquarters & Design: BSAS
● Manufacturing Plant: MVD
● Comprehensive services: PEK
7. Estimation of other image modalities
Inputs: HR RGB, LR TIR, LR SWIR1, LR SWIR2 → Output: HR Thermal
8. Regression: time series image prediction
- Estimation of the yield at the end of the season.
- Monitoring of changes in the estimate to know when and where to act.
12. Data - Data Sources
Primary Data Sources:
- Satellogic Data
- 3rd Party Satellite Data
- World Climate Maps
- Geologic Data
- Elevation Models
- Georeferenced Man-Made Structures
- Political Boundaries
- Census Data Maps
Derived Layers:
- Temporal Evolution
- Land Use Maps
- Advanced Indices
- Distance to Water
- Terrain Orientation
- Superresolution Images
- ...
These sources can be available globally or locally, dynamic or static, high or low resolution...
nKappa: a data science platform focused on geographic data and satellite imagery.
Main goal: to scale solution development by automating/accelerating data science work.
nKappa enables solution development using aligned sets of image tiles (Kappas).
13. Data - Data Sources
Sizes:
- Typical project: 350,000 km², 3 times per week, 8 bands, at 10 meters per pixel resolution: ~20 GB/day.
- We expect to acquire 7 terabytes of data per day by 2021.
Sources of image variation:
- Clouds: roughly 70% of the world is cloud-covered.
- Perspective changes (off-nadir satellite images, drone images).
- Shadow orientation, intensity and length variations depending on time of day, clouds, and season.
- Chromatic changes due to aerosols and time of day.
- Variations between sensors (different satellites, drone images, ...).
- Variations/errors in image orthorectification and geolocation.
- Vegetation growth and color changes, ...
(Example images: clouds, perspective, shadows, chromatic and vegetation changes.)
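The per-day data-volume figure above can be sanity-checked with a back-of-envelope calculation. This sketch assumes 8-bit samples and ignores compression and metadata overhead, none of which the slides specify:

```python
# Back-of-envelope check of the slide's data-volume figure.
# Assumptions (not stated in the deck): 8-bit samples, no compression.
area_km2 = 350_000
gsd_m = 10            # ground sample distance, meters per pixel
bands = 8
passes_per_week = 3
bytes_per_sample = 1  # assume 8-bit quantization

pixels = area_km2 * 1_000_000 / gsd_m**2          # pixels per band (3.5e9)
bytes_per_pass = pixels * bands * bytes_per_sample
gb_per_day = bytes_per_pass * passes_per_week / 7 / 1e9
print(round(gb_per_day, 1))  # 12.0
```

At 8 bits this lands around 12 GB/day; a higher bit depth (e.g. 12-bit sensor data) plus overhead would bring it close to the quoted 20 GB/day.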
15. Data - Ground Truth
Ground truth is rare and expensive, and indispensable to train and to assess the quality of ML and computer vision approaches.
Sources of ground truth:
- Land ground truth provided by the client.
- GT generated using the highest-resolution imagery.
- Human annotation:
  - Our team always annotates part of the data itself, to understand the problem.
  - Internal and external annotation (Mechanical Turk, Supahands, ...).
  - Sample what to annotate so as to preserve variability and input-domain coverage.
  - Measure the biases and variances of annotators (discard annotators or images, revise annotation instructions, ...).
- Other GT sources: first-world surveyed data annotated from visual imagery or using land ground truth (the CORINE project, CREAF and SIOSE in Spain, the USGS land cover dataset in the USA, ...).
  - Challenges: out-of-date data, differing resolution, and how to transfer it to places that differ in land-management culture, climate or relief (domain shift).
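Measuring annotator bias and agreement, as described above, is commonly done with a statistic such as Cohen's kappa, which corrects raw agreement for chance. A minimal sketch — the two annotators and their labels are toy data, not Satellogic's:

```python
# Inter-annotator agreement with Cohen's kappa (toy labels).
# Low kappa for an annotator vs. the pool is a signal to discard
# their work or to revise the annotation instructions.
from sklearn.metrics import cohen_kappa_score

# Binary labels from two annotators over the same ten image tiles
annotator_a = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
annotator_b = [1, 0, 1, 0, 0, 1, 0, 1, 1, 1]

kappa = cohen_kappa_score(annotator_a, annotator_b)
print(round(kappa, 2))  # 0.58: moderate agreement despite 80% raw overlap
```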
16. Data: Covariate shift & domain adaptation
Existing "good quality" ground truth: rice fields in Europe, urban areas in Europe.
Target areas without ground truth: rice fields in China, urban areas in Lagos.
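One common way to attack this kind of covariate shift is importance weighting with a domain classifier; the slides do not say which technique Satellogic uses, so this is only an illustrative sketch, with toy tabular features standing in for the source (Europe) and target (China/Lagos) imagery:

```python
# Importance weighting for covariate shift via a domain classifier (sketch).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Toy features: source domain (labeled) vs. target domain (unlabeled)
X_source = rng.normal(0.0, 1.0, size=(500, 4))
X_target = rng.normal(0.5, 1.2, size=(500, 4))

# Train a classifier to distinguish the two domains
X = np.vstack([X_source, X_target])
y = np.concatenate([np.zeros(500), np.ones(500)])  # 0 = source, 1 = target
clf = LogisticRegression().fit(X, y)

# Importance weight for each source sample: p(target|x) / p(source|x)
p = clf.predict_proba(X_source)
weights = p[:, 1] / p[:, 0]
# Pass `weights` as sample_weight when fitting the task model on source labels
print(weights.shape)  # (500,)
```

Source samples that look like the target domain get up-weighted, so the task model trained on source labels better matches the target distribution.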
17. Infrastructure - Hardware
● Huge amounts of data → cloud infrastructure.
● nKappa platform for distributed processing (currently running on Microsoft Azure), plus in-house GPU servers (equipped with 1080 Tis).
● nKappa is used in both the development and production stages.
● GPU servers are mostly used for EDA and for developing data science algorithms and models.
● Cloud infrastructure is mainly used to track, share and audit datasets, algorithms and models, and to put pipelines and models into production.
18. Infrastructure - Software
- Data scientist scripts
- GIS processing & remote sensing: Rasterio, telluric, ...
- Distributed processing: nKappa
  - Trace, reuse and audit experiments, datasets, pipelines and models.
  - Accelerate data science experimentation on remote sensing.
  - Automate the insertion of new pipelines into the production environment.
19. Our Team Profile
Our development team:
- Data Scientists: computer vision and machine learning specialists with additional background in remote sensing; strong Python developers with knowledge of machine learning and computer vision / image processing.
- Platform developers: DevOps, front-end developers, and GIS Python specialists.
Solutions started a year and a half ago... We are currently 13, and we are hiring!
20. ML Algorithms: What we use
- Computer vision algorithms and ML machinery, from logistic regression and random forests to the latest deep NNs.
- Training with tailored datasets, using a smart sampling policy to maintain the input and output variability of the original datasets.
- We prefer context knowledge + common-sense heuristics + ML methods over pursuing end-to-end neural networks (unless you are absolutely sure you have all relevant sources of image variation in your training set and that your data augmentation policy is not introducing bias).
- Random forests, CNNs and variations of U-Nets, alone or in ensembles, are the algorithms our team uses most.
- Relevant lines of research:
  - Generative models for data augmentation, ground truth generation and super-resolution.
  - Transfer learning / domain adaptation.
  - Satellite-image-invariant, efficient image embeddings and distance metric learning.
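The "smart sampling policy" mentioned above is not detailed in the deck; one simple baseline that preserves the label distribution of the full dataset in a small subset is plain stratified sampling, sketched here on toy data:

```python
# Stratified subsampling: keep a small training subset whose class
# proportions match the full (imbalanced) dataset. Toy data throughout.
from collections import Counter
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 3))
y = rng.choice([0, 1, 2], size=1000, p=[0.7, 0.2, 0.1])  # imbalanced classes

# Take a 10% subset while preserving the class proportions
X_small, _, y_small, _ = train_test_split(
    X, y, train_size=0.1, stratify=y, random_state=0)
print(Counter(y_small))  # class ratios mirror those of the full `y`
```

For imagery the same idea applies per stratum of interest (land cover class, region, season, sensor), so a small fast-iteration dataset still covers the input variability.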
21. Lessons learned
- Project success is:
  - 5% ML algorithm and parameter selection,
  - 95% really understanding what the client needs, how to generate value, and anticipating how your output is going to be consumed; defining good features, good ground truth, a good data sampling policy, and pre- and post-processing.
- Dedicate time first to ensuring success; only after that, improve:
  - Use fast ML algorithms.
  - Start with small datasets that keep the input and output variability of the original one.
- It is worth investing in automatically measuring dataset quality before starting to train on big datasets:
  - Missing values, constant variables, unaligned bands, duplicated variables, class imbalance, ...
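The dataset-quality checks listed above (missing values, constant variables, duplicates, imbalance) are cheap to automate before any training run; a minimal sketch with pandas on a hypothetical toy table:

```python
# Cheap pre-training dataset-quality report (toy data, illustrative columns).
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "ndvi": [0.1, 0.4, np.nan, 0.4, 0.9],
    "band_const": [1.0, 1.0, 1.0, 1.0, 1.0],  # constant -> useless feature
    "label": ["crop", "crop", "water", "crop", "water"],
})

report = {
    "missing_ratio": df.isna().mean().to_dict(),
    "constant_columns": [c for c in df.columns if df[c].nunique(dropna=True) <= 1],
    "duplicate_rows": int(df.duplicated().sum()),
    "label_balance": df["label"].value_counts(normalize=True).to_dict(),
}
print(report)
```

Running such a report on every new dataset catches the failure modes above before GPU hours are spent on a broken training set.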