The sensors in connected cars can generate up to 130 terabytes of data per year per car. This includes data from electronic control units, such as RPMs, speed, fuel efficiency, and temperatures, as well as location, safety, voice, and video data. Some of these sources stream in real time, while others are delivered regularly in batches. With Arcadia Enterprise, you can visualize the real-time streams and perform deep discovery on historical data seamlessly on the same modern business intelligence platform.
The connected car segment presents a tremendous market opportunity: it is one of the leading growth areas within the IoT domain.
Sources: Image: http://automotive-engineering-illustration.com/Vehicle-sensors.html Sensor numbers are from MEMS Journal, 2015.
Vehicle management: Support for minimizing operating cost and increasing comfort. Examples include vehicle condition and service reminders, remote monitoring and operations, and telematics.
Mobility management: Guidance on faster, safer, more economical, and more fuel-efficient driving, based on data gathered for the vehicle. Examples include real-time traffic information displays and parking lot or garage assistance.
Safety: The ability to warn the driver of road problems and to automatically sense and prevent potential collisions. Examples include collision avoidance features, danger warning signals, and emergency call functions.
Autonomous driving: Operation of the vehicle without a human driver at the controls, currently available only on a partial basis. Examples include self-parking cars, auto driving mode, motorway assistance, and the transportation of goods by trucks on well-delineated routes.
Well-being: Optimization of the driver’s health and competence. Examples include electronic alerts that detect or mitigate fatigue and other forms of individual assistance.
Entertainment: Functions that provide music and video to passengers and the driver. Examples include smartphone interfaces, Wi-Fi or Local Area Network hotspots, access to social networks, and the “mobile office.”
Home integration: Links to homes, offices, and other buildings. Examples include the integration of the automobile into home alarms or energy monitoring systems.
Safety: Data relating to collisions, lane departures, or other hazards.
Characteristics of Data from connected cars:
Data Volumes: According to experts, the connected car of the future will send 25 Gigabytes of data to the cloud every hour, representing up to 130 Terabytes of primary data storage per car, per year. Read more: http://telematicswire.net/every-connected-car-will-send-130tb-of-data-per-year-in-future-actifio/
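As a rough sanity check on these two figures, the short calculation below works out how many transmitting hours per year they imply. It assumes decimal units (1 TB = 1,000 GB), which the source does not state, so treat the result as an approximation rather than a quoted number.

```python
# Figures quoted in the notes above.
GB_PER_HOUR = 25      # data sent to the cloud per hour
TB_PER_YEAR = 130     # "up to" primary storage per car, per year

# Assumption (not stated in the source): decimal units.
GB_PER_TB = 1000

# Hours of transmission per year implied by the two quoted figures.
hours_per_year = TB_PER_YEAR * GB_PER_TB / GB_PER_HOUR
hours_per_day = hours_per_year / 365

# For comparison: volume if a car transmitted around the clock all year.
continuous_tb = GB_PER_HOUR * 24 * 365 / GB_PER_TB

print(f"Implied transmitting hours per year: {hours_per_year:.0f}")
print(f"Implied transmitting hours per day:  {hours_per_day:.1f}")
print(f"Volume at 24x7 transmission:         {continuous_tb:.0f} TB/year")
```

Under that assumption, 130 TB per year at 25 GB per hour corresponds to roughly 14 hours of transmission per day, which is below the volume a car sending data around the clock would produce.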
Diverse data types: V2V stands for vehicle-to-vehicle communication, and V2I stands for vehicle-to-infrastructure communication.
Let’s take a look at how Cloudera’s EDH fits into the IoT ecosystem.
EDH can ingest data from multiple sources, including real-time streaming sensor data. You can combine the sensor data with data from other internal and external sources to drive business insights. You can deploy EDH on-premises (in your data center) or in hybrid cloud environments and still manage it centrally. And you can serve and analyze the data in a number of different ways: integrate it with existing BI solutions, run search or machine learning, or integrate it with real-time applications.
For our demonstration today we won't be using live vehicle data. Instead, we will be using a data generator that simulates the messages and data commonly collected from vehicles, such as vehicle position, speed, and lane departure readings. This data will be sent to Apache Kafka, where it is then picked up by StreamSets.
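To give a feel for what such a generator might look like, here is a minimal sketch. The field names (`vehicle_id`, `speed_mph`, `lane_departure`, and so on), the value ranges, and the JSON encoding are all assumptions for illustration; the actual demo generator's schema is not shown in these notes. The sketch stops at producing encoded message bytes, which is what a Kafka producer would be handed, and does not call any Kafka client.

```python
import json
import random
import time


def simulate_events(vehicle_ids, count, seed=0, start_ts=None):
    """Yield `count` simulated vehicle readings (hypothetical schema)."""
    rng = random.Random(seed)  # seeded so runs are repeatable
    ts = int(time.time()) if start_ts is None else start_ts
    for i in range(count):
        yield {
            "vehicle_id": rng.choice(vehicle_ids),
            "timestamp": ts + i,  # one reading per second
            "latitude": round(rng.uniform(37.0, 38.0), 6),
            "longitude": round(rng.uniform(-123.0, -122.0), 6),
            "speed_mph": round(rng.uniform(0, 90), 1),
            # Roughly 5% of readings flag a lane departure.
            "lane_departure": rng.random() < 0.05,
        }


def to_message(event):
    """Serialize one event to UTF-8 JSON bytes, ready to hand to a producer."""
    return json.dumps(event, sort_keys=True).encode("utf-8")


messages = [
    to_message(e)
    for e in simulate_events(["car-1", "car-2"], 5, start_ts=1_500_000_000)
]
for m in messages:
    print(m.decode("utf-8"))
```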
StreamSets, one of our partners, is an open source data collector that allows you to easily build out and manage data pipelines for ingestion into Hadoop.
StreamSets is an open source data collector which allows you to rapidly build reliable, complex data flows.
Its pure event-based flow fits well with IoT sensor event data. It's fast and scalable, and it has an intuitive UI packed with lots of out-of-the-box functionality. It provides pipeline monitoring and metrics, as well as bad-record and error handling.
The records are then split so that they can be processed independently. The top stream prepares the data for ingest into the Apache Solr search engine. Several key actions take place in this path: types and timestamps are converted to be compatible with Solr, and events are evaluated, categorized, and tagged to make them easier to search.
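The kind of preparation done on this path can be sketched in plain Python. This is not the StreamSets pipeline itself; the field names, the speeding threshold, and the tag and category labels are hypothetical. The one fixed point is the date format: Solr date fields expect UTC timestamps in the form `YYYY-MM-DDThh:mm:ssZ`.

```python
from datetime import datetime, timezone

# Hypothetical threshold used only for this illustration.
SPEEDING_THRESHOLD_MPH = 80


def to_solr_date(epoch_seconds):
    """Convert epoch seconds to the UTC date string Solr date fields accept."""
    dt = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return dt.strftime("%Y-%m-%dT%H:%M:%SZ")


def prepare_for_solr(event):
    """Return a copy of the event with converted types plus tags and a category."""
    doc = dict(event)
    # Type conversions: incoming values may arrive as strings.
    doc["timestamp"] = to_solr_date(int(event["timestamp"]))
    doc["speed_mph"] = float(event["speed_mph"])

    # Evaluate the event and attach searchable tags.
    tags = []
    if event.get("lane_departure"):
        tags.append("lane_departure")
    if doc["speed_mph"] > SPEEDING_THRESHOLD_MPH:
        tags.append("speeding")
    doc["tags"] = tags
    doc["category"] = "safety" if tags else "routine"
    return doc


sample = {"vehicle_id": "car-1", "timestamp": "0", "speed_mph": "85.5",
          "lane_departure": True}
print(prepare_for_solr(sample))
```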
The bottom stream redirects the messages to a secondary Kafka topic, where they are picked up by Spark Streaming for analysis, the output of which is stored in Apache Kudu. Apache Kudu is a new updateable storage layer for Hadoop which allows for fast updates and fast scans. It's great for IoT and streaming use cases because data can be analyzed quickly without a complex batch update process.
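To illustrate why an updateable store matters here, the sketch below keeps a running profile per vehicle and updates it in place as each event arrives. It is a plain-Python stand-in, not Spark Streaming or Kudu code: the dictionary plays the role of a table keyed by vehicle ID, and the profile fields are hypothetical.

```python
def upsert_profile(profiles, event):
    """Insert a profile for a new vehicle, or update the existing one in place."""
    profile = profiles.setdefault(
        event["vehicle_id"],
        {"events": 0, "max_speed_mph": 0.0, "lane_departures": 0},
    )
    profile["events"] += 1
    profile["max_speed_mph"] = max(profile["max_speed_mph"], event["speed_mph"])
    profile["lane_departures"] += int(event["lane_departure"])
    return profile


profiles = {}  # stands in for a table keyed by vehicle ID
stream = [
    {"vehicle_id": "car-1", "speed_mph": 55.0, "lane_departure": False},
    {"vehicle_id": "car-2", "speed_mph": 40.0, "lane_departure": True},
    {"vehicle_id": "car-1", "speed_mph": 72.5, "lane_departure": True},
]
for event in stream:
    upsert_profile(profiles, event)

print(profiles)
```

Each profile is queryable as soon as an event has been applied, with no separate batch step to rebuild it.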
This is a high-level view of a Hadoop-based architecture for collecting, processing, and analyzing data from connected vehicles. Data can be transmitted from the vehicle in near-real time for collection, or it can be stored onboard until the vehicle has internet connectivity, such as when the vehicle is garaged at home with access to the owner's Wi-Fi, and then processed in batch.
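The store-onboard-then-upload option can be sketched as a simple store-and-forward buffer. The class and method names below are hypothetical, and `send_batch` stands in for whatever upload mechanism a real vehicle would use.

```python
class OnboardBuffer:
    """Hold readings while offline; upload them as one batch once connected."""

    def __init__(self, send_batch):
        self._send_batch = send_batch  # callable that uploads a list of readings
        self._pending = []

    def record(self, reading, connected):
        """Store a reading; flush if connected. Returns how many were uploaded."""
        self._pending.append(reading)
        return self.flush() if connected else 0

    def flush(self):
        """Upload everything pending and return how many readings were sent."""
        if not self._pending:
            return 0
        batch = list(self._pending)
        self._send_batch(batch)   # if this raises, pending readings are kept
        self._pending.clear()
        return len(batch)


uploaded = []  # collects each uploaded batch for this demonstration
buffer = OnboardBuffer(uploaded.append)
buffer.record({"speed_mph": 30}, connected=False)   # on the road, no connectivity
buffer.record({"speed_mph": 42}, connected=False)
sent = buffer.record({"speed_mph": 0}, connected=True)  # garaged, on Wi-Fi
print(sent, uploaded)
```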
For our demonstration today we will be using Kafka and StreamSets for ingest and data processing, Spark Streaming to process data on arrival, Kudu to store dynamically created vehicle profiles, Impala as the SQL query engine to report on the vehicle profiles and Solr search for real-time event discovery as the analysis layer, and ZoomData as the visualization layer.
Apache Kafka is used as the ingest tool because:
It is scalable, durable, and fault tolerant.
It supports very high throughput and large numbers of consumers and producers (for instance, the 40 million cars and trucks on the road).
It allows for quick, non-intrusive onboarding of new data sources, so if we add new sensors or decide to collect new readings from the vehicles, the pipeline does not need to change.
It decouples data producers from data consumers for a flexible and robust pipeline.
It persists streaming data.
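The decoupling and persistence points can be illustrated with a toy model: an append-only log in which each consumer tracks its own read position. This is an in-memory sketch of the idea only, not the Kafka client API; the class, method, and consumer names are hypothetical.

```python
class TopicLog:
    """Toy append-only log with an independent read offset per consumer."""

    def __init__(self):
        self._records = []   # records are retained after being read
        self._offsets = {}   # consumer name -> next index to read

    def append(self, record):
        """Producer side: add a record and return its offset."""
        self._records.append(record)
        return len(self._records) - 1

    def poll(self, consumer, max_records=10):
        """Consumer side: return unread records and advance this consumer only."""
        start = self._offsets.get(consumer, 0)
        batch = self._records[start:start + max_records]
        self._offsets[consumer] = start + len(batch)
        return batch


topic = TopicLog()
for reading in ("speed=55", "lane_departure", "speed=61"):
    topic.append(reading)

# Two consumers read the same data at their own pace.
search_batch = topic.poll("search-path", max_records=2)
analysis_batch = topic.poll("analysis-path")
search_rest = topic.poll("search-path")
print(search_batch, analysis_batch, search_rest)
```

Because the producer only appends and each consumer only tracks its own offset, a new consumer can be added later and still read every retained record without any change on the producer side.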
Auto manufacturers: Personal maintenance recommendations (usage-based or predictive); understanding driver behavior to help shape products.
Insurance providers: Personalized rates (based on driver behavior).
Public safety: Reporting collisions, road hazards, and potholes; traffic analysis.
Other: Mobile devices; OBD readers.
Connected Car: Delivering Real-Time Insights Using Hadoop