Autonomous Driving: Mobility Data Challenges
There are numerous data challenges in the autonomous vehicle value chain, and many of them affect specific use
cases and outcomes. This article highlights the nature of these challenges and their impact across the data value
chain.
Introduction
Automotive data is of high interest because of its
diverse and complex nature, its multiple sources,
and the high volume of data continually
generated. Within an autonomous driving context,
real-time requirements around navigation, safety
and revenue models create new challenges.
General AV Data Challenges
As vehicles advance to higher levels of autonomy,
their deep learning models rely on high volumes of
data for decisions, e.g., from sensors such as
cameras and lidars, and about pedestrian
behaviors, road conditions etc. The data value
chain within autonomous driving poses many
challenges, and we examine a few key themes here.
Real-time and Scale Challenges
A fleet of just a few dozen cars with cameras will
generate more than a million hours of video in a few
months. Once captured, that video needs to be
transmitted, stored, tagged, and processed for
training deep learning neural networks. Dodging
every object, pedestrian, car, obstacle etc. can
become a mammoth task if data is not decomposed
in a modular way. Huge data sets (e.g., driving
conditions, weather, behaviors, local laws),
combined with extreme computational and power
requirements, make it challenging to operate in
real time.
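To make the scale claim concrete, the arithmetic can be sketched as below. Every input (fleet size, cameras per car, recording hours, number of days, bitrate) is an illustrative assumption and not a figure from this article. The point is only that a million hours is plausible when hours are counted per camera and each car carries several cameras that record most of the day.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FleetAssumptions:
    """Illustrative inputs only; none of these figures come from the article."""
    cars: int = 48                         # "a few dozen" cars (assumed)
    cameras_per_car: int = 10              # assumed camera count per vehicle
    recording_hours_per_day: float = 20.0  # assumed daily recording time
    days: int = 120                        # roughly four months (assumed)
    camera_bitrate_mbit_s: float = 8.0     # assumed compressed video bitrate


def camera_hours(a: FleetAssumptions) -> float:
    """Total hours of video, counted once per camera."""
    return a.cars * a.cameras_per_car * a.recording_hours_per_day * a.days


def storage_terabytes(a: FleetAssumptions) -> float:
    """Raw storage needed for that video at the assumed bitrate."""
    bytes_per_camera_hour = a.camera_bitrate_mbit_s * 1e6 / 8 * 3600
    return camera_hours(a) * bytes_per_camera_hour / 1e12


if __name__ == "__main__":
    a = FleetAssumptions()
    print(f"{camera_hours(a):,.0f} camera-hours")   # 1,152,000 camera-hours
    print(f"{storage_terabytes(a):,.1f} TB")        # 4,147.2 TB
```

Changing any single assumption scales the result linearly, which is why decisions about what to record and keep matter as much as fleet size.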
Debugging/Troubleshooting Challenges
Massive data sets, complex weighted features,
non-deterministic AI models and real-time
processing form a mix that is difficult to diagnose
and interpret. Problems with the quality,
completeness, interdependencies, and correlation
of data, including coverage of all outlier and corner
cases, are much harder to detect, predict and
correct when machine learning models fail.
Neural networks can become unreliable and
sensitive to changes such as different lighting
conditions, resized images, or crops at different
angles, which makes it hard to address all safety
aspects in a standardized way and to troubleshoot
them in real time.
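One way to surface this kind of sensitivity is a perturbation-consistency check: apply lighting and cropping changes to an input and see whether the prediction changes. The sketch below is a toy under stated assumptions. `toy_model` is a hypothetical stand-in (a brightness threshold), not a real perception network, and the image is a hand-built 4x4 grid; only the shape of the check is meant to carry over.

```python
from typing import Callable, Dict, List

Image = List[List[float]]  # grayscale pixels in [0.0, 1.0]


def adjust_brightness(img: Image, factor: float) -> Image:
    """Scale every pixel by `factor`, clamping at 1.0."""
    return [[min(1.0, p * factor) for p in row] for row in img]


def center_crop(img: Image, margin: int) -> Image:
    """Drop `margin` pixels from every edge."""
    return [row[margin:len(row) - margin] for row in img[margin:len(img) - margin]]


def toy_model(img: Image) -> str:
    """Hypothetical stand-in for a trained network: thresholds mean brightness."""
    pixels = [p for row in img for p in row]
    return "obstacle" if sum(pixels) / len(pixels) > 0.5 else "clear"


def consistency_report(
    model: Callable[[Image], str],
    img: Image,
    perturbations: Dict[str, Callable[[Image], Image]],
) -> Dict[str, bool]:
    """Map each perturbation name to True if the prediction is unchanged."""
    baseline = model(img)
    return {name: model(fn(img)) == baseline for name, fn in perturbations.items()}


# Hand-built example: a dim border (0.4) around a bright 2x2 centre (0.9).
EXAMPLE_IMAGE: Image = [
    [0.4, 0.4, 0.4, 0.4],
    [0.4, 0.9, 0.9, 0.4],
    [0.4, 0.9, 0.9, 0.4],
    [0.4, 0.4, 0.4, 0.4],
]

PERTURBATIONS: Dict[str, Callable[[Image], Image]] = {
    "darker": lambda im: adjust_brightness(im, 0.8),
    "brighter": lambda im: adjust_brightness(im, 1.2),
    "center_crop": lambda im: center_crop(im, 1),
}

if __name__ == "__main__":
    # The "darker" entry is False: a modest lighting change flips the label.
    print(consistency_report(toy_model, EXAMPLE_IMAGE, PERTURBATIONS))
```

Run over a labeled validation set, a report like this gives a repeatable way to list which input changes a model tolerates, although it says nothing about why a prediction flipped.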
Accuracy Challenges
Safety systems require real-time detection and
accuracy from non-deterministic AI models under
complex compute and communication constraints,
and these requirements can strain the systems of
today. These systems are expected to operate
accurately across diverse weather conditions,
visibility, infrastructure quality and pedestrian
behaviors.
Lifecycle Challenges
The scale, accuracy, logging, monitoring, and
reporting of data must account for the entire data
management lifecycle, i.e., creation, acquisition,
collection, modification, processing, transmission,
sharing, storage, and disposal. Given nonstandard
data collection formats (e.g., differences between
camera and lidar), proprietary analysis by
companies across the value chain, and the volume
and velocity of data, we risk losing many important
attributes needed for completeness and ending up
with perishable insights or dark data.
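As a minimal sketch of how lifecycle tracking could be made uniform across sensor types, the example below wraps camera and lidar data in one envelope and logs each lifecycle stage it passes through. The class and field names, the rule that nothing may follow disposal, and the working definition of "dark" (stored but never processed or shared) are all assumptions made for illustration, not an industry standard.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict, List


class Stage(Enum):
    """Lifecycle stages as listed in the text above."""
    CREATION = "creation"
    ACQUISITION = "acquisition"
    COLLECTION = "collection"
    MODIFICATION = "modification"
    PROCESSING = "processing"
    TRANSMISSION = "transmission"
    SHARING = "sharing"
    STORAGE = "storage"
    DISPOSAL = "disposal"


@dataclass
class SensorRecord:
    """Hypothetical common envelope; all field names are assumptions."""
    sensor_type: str          # e.g. "camera" or "lidar"
    vehicle_id: str
    timestamp_ns: int
    payload: Dict[str, Any]   # sensor-specific content stays opaque
    history: List[Stage] = field(default_factory=lambda: [Stage.CREATION])

    def advance(self, stage: Stage) -> None:
        """Log a lifecycle stage; nothing may follow disposal."""
        if self.history[-1] is Stage.DISPOSAL:
            raise ValueError("record already disposed")
        self.history.append(stage)


def is_dark(record: SensorRecord) -> bool:
    """Assumed working definition: stored but never processed or shared."""
    seen = set(record.history)
    return Stage.STORAGE in seen and not seen & {Stage.PROCESSING, Stage.SHARING}


if __name__ == "__main__":
    cam = SensorRecord("camera", "car-01", 1_000, {"codec": "h264"})
    cam.advance(Stage.ACQUISITION)
    cam.advance(Stage.STORAGE)
    print(is_dark(cam))  # True: stored, never processed or shared
```

Keeping the payload opaque lets each sensor retain its native format while the envelope carries the attributes needed to audit completeness.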
Collaboration Challenges
The lack of standardized sharing of live and test
data between AV companies, infrastructure
companies, regulators and other stakeholders, who
want to control their narratives, limit disclosure of
proprietary information, expand competitive
advantages and avoid the costs of sharing, makes it
harder to inform and develop safety features
quickly and with repeatable results.
AV business models are driven by the ability
to understand and overcome data
challenges.
Other Challenges
▪ Optimal synchronization of simulation and real
data from the field, replicating driving
algorithms and vice versa
▪ Handling operational and supply/demand data
to maximize revenue while minimizing dead
miles e.g., automotive health, battery levels,
fleet management and fleet positioning data etc.
▪ Passenger real-time data during each of the
four critical rideshare phases (pre-board,
boarding, transit, disembarking), used to ensure
passenger safety and health, payments,
luggage, child safety etc., requires a complex
and secure interplay of real-time decisions from
sensor data and 3rd party integrations
▪ Aggregating and managing real-time data from
the ecosystem, i.e., vehicle edge and cloud, to
monetize mobility services through APIs
The solution to these data challenges is to focus
on the specific autonomous use case being
addressed and not let the large amounts and
velocity of data overwhelm the system. A specific
use case focus enables decisions on the value of
data, reduces scope, and addresses challenges in a
more surgical manner. It also helps with
architectural, partnership, safety, and regulatory
decisions that will ultimately result in a valuable
and profitable autonomous service.
AV Data Value Chain Challenges
Each of these data challenges manifests itself
differently across the autonomous driving data
chain. The table below attempts to map challenges
to the data value chain. The challenges with
accuracy have the highest impact across the value
chain, although all challenges impact AV decisions.
Figure 1 below provides a summary of the data
challenges mapped to use cases.
Conclusion:
Automotive and autonomous data is diverse and challenging to manage because it is used for real-time and
safety-critical decisions and because it has impact across a wide variety of use cases. The key point is that the
challenge is not a lack of data but understanding the data and its challenges, and effectively prioritizing AV data
requirements and system design to operationalize specific autonomous use cases. A more comprehensive view
of the Big Data Value Chain associated with the AV ecosystem can be found in our article here.
About the Authors:
Nitin Kumar is a 20-year veteran in the Hi-Tech industry. He is currently the CEO of Appnomic and has played a
variety of hands-on executive roles ranging from CEO, Chief Growth Officer, Chief Transformation Officer,
M&A Integration/Separation Leader, BU Head and Management Consulting Partner (corporate and PE
portfolio companies). Nitin Kumar is a member of the Forbes Technology Council and shares his ideas and
thoughts on the forum regularly. In his former role as a Management Consulting Partner, Nitin has done
multiple strategy and M&A engagements for the Software, Hardware, Semiconductor and AutoTech sectors,
gaining invaluable insights into the value chain, technologies, and business models. He is also a Certified
Autonomous Driving Professional.
Manu Namboodiri has worked for 20 years across industries such as autonomous vehicles, security, IoT, and
software, and has broad experience spanning strategy, product, marketing, and ecosystem development.
He resides in the San Francisco Bay Area and advises companies in various stages of market adoption.
Figure 1: AV Data Challenges Mapping
