Connected Car: Delivering Real-Time Insights Using Hadoop

The sensors in connected cars can generate up to 130 terabytes of data per year per car. This includes data from electronic control units such as RPMs, speed, fuel efficiency, and temperatures, as well as location, safety, voice and video data. Some of these sources stream in real time, while others are regularly delivered in batches. With Arcadia Enterprise, you can visualize the real-time streams and perform deep discovery on historical data seamlessly on the same modern business intelligence platform.

Connected Car: Delivering Real-Time Insights Using Hadoop

  1. Connected Car: Delivering Real-Time Insights using Hadoop
     Vijay Raja & Jonathan Cooper-Ellis, Cloudera
     © Cloudera, Inc. All rights reserved.
  2. The Connected Car Paradigm
     • By 2020, there will be 250 million connected vehicles on the road globally (Gartner & Connected Vehicle Trade Association)
     • Vehicles currently on the road have 60 to 100 sensors onboard. This number is projected to increase to 200+ by the year 2020.
     • 75% of new cars shipped in 2020 will have internet connectivity (Business Intelligence)
     Sources: Gartner, Strategy&, Mems Journal
  3. Connected Cars - Key Segments/Use Cases
     • Mobility Management: Functions that enable drivers to reach a destination quickly & effectively. Real-time traffic information displays, parking lot or garage assistance etc.
     • Vehicle Management: Functions that reduce overall costs and improve comfort. Vehicle condition & performance, service reminders, remote mgmt., telematics etc.
     • Well-Being: Functions that improve drivers' comfort and ability to drive. Fatigue detection & alerting and other forms of individual assistance.
     • Autonomous driving: Operation of the vehicle without a human driver at the controls. Self-parking cars, self driving mode, motorway assistance etc.
     • Safety: Functions that warn drivers of external hazards. Collision avoidance, hazard warning signals and emergency call functions.
     • Entertainment: Functions involving entertainment of driver and passengers. Smartphone interfaces, streaming videos & access to social networks.
     Source: Strategy&, 2015
  4. Data from Connected Cars
     Significant Data Volumes
     • 25 GB per hour per car
     • 130 TB per year per car
     Diverse Data Types
     • Data from ECUs: RPM, Speed, Fuel Efficiency, Temp, Pressure, Braking…
     • Location Data
     • Safety Data
     • V2V and V2I
     • Video Data
     Variety
     • Streaming (Real-Time)
     • Batch
     • Multiple Formats
     Data Sources
     • ECU (Electronic Control Units)
     • Vehicle Plug-ins
     • Head units
     • Cameras
  5. Cloudera Enterprise – The Data & Analytics Platform for IoT
     Diagram labels: Internal Systems; External Sources; Data Center; Cloud; Customer (CRM), Manufacturing, Dealer Data
     • Data Ingest
     • Data Storage
     • Data Processing
     • Machine Learning
     • Real-time Analytics
     Diagram labels: BI Solutions; Real-Time Apps; Search; EDW; Discover; Machine Learning
     OPERATIONS: Cloudera Manager, Cloudera Director
     DATA MANAGEMENT: Cloudera Navigator, Encrypt and KeyTrustee, Optimizer
     INTEGRATE: BATCH (Sqoop); REAL-TIME (Kafka, Flume)
     PROCESS, ANALYZE, SERVE: BATCH (Spark, Hive, Pig, MapReduce); STREAM (Spark); SQL (Impala); SEARCH (Solr); SDK (Partners)
     UNIFIED SERVICES: RESOURCE MANAGEMENT (YARN); SECURITY (Sentry, RecordService)
     STORE: FILESYSTEM (HDFS); RELATIONAL (Kudu); NoSQL (HBase)
  6. Connected Car – Demo Architecture
     Platform component diagram repeated from slide 5 (Cloudera Enterprise Data Hub)
     Diagram labels: Connected Car Simulator; MQTT - Kafka Bridge; StreamSets Data Collector; Data Ingest & Pipeline; Enterprise Data Hub; BI & Visualization
     Streaming Data:
     • Time
     • VIN
     • Location
     • Mileage
     • Speed
     • Acceleration
     • Brakes applied?
     • Turn signal on?
     • Lane departed?
     • Collision detected?
     • Hazard detected?
  7. Connected Car – Demo Architecture
     Diagram labels: Connected Car Simulator; MQTT - Kafka Bridge; StreamSets Data Collector; Data Ingest & Pipeline; Cloudera Enterprise Data Hub; BI & Visualization; Pub-Sub Messaging System; Real-Time Processing Engine; Data Storage Layer; Search; callouts #1 and #2
     Streaming Data:
     • Time
     • VIN
     • Location
     • Mileage
     • Acceleration
     • Speed
     • Brakes applied?
     • Turn signal on?
     • Lane departed?
     • Collision detected?
     • Hazard detected?
  8. Demo Architecture
     Diagram labels: Data Generator; Kafka (shown twice); Spark Streaming; Kudu; Impala
     Message fields:
     • Time
     • VIN
     • Miles
     • xAccel
     • yAccel
     • zAccel
     • Speed
     • Brakes
     • LaneDeparture
     • Signal
     • CollisionDetected
     • HazardDetected
     • Latitude
     • Longitude
  9. Key Use Cases Highlighted
     • Predictive Maintenance (OEMs): Provide specific service recommendations based on vehicle usage and driving habits
     • Usage Based Insurance (Insurance): Provide specific insurance coverage based on specific driving habits & risk profiles
     • Public Services (Public Services): Understand traffic patterns, public safety hazards and provide recommendations accordingly
  10. Thank You
  11. Connected Car Use Cases - Examples
     • Predictive Maintenance: Maintenance recommendations & analytics based on specific vehicle usage
     • Customer 360: Targeted and personalized offers for customers based on vehicle condition & driving habits
     • Usage Based Insurance: Insurance coverage based on specific driving habits & risk
     • Fleet Management: Fleet performance and maintenance recommendations based on usage
     • Public Services: Understand traffic patterns and public safety hazards
     • Vehicle Performance: Provide a real-time view of vehicle health & performance
     • Usage & Feature Analytics: What features in the vehicle are most used, what are least?

Editor's Notes

  • The connected car segment presents a tremendous market opportunity and is one of the leading growth areas within the IoT domain.

    Sources:
    Image: http://automotive-engineering-illustration.com/Vehicle-sensors.html
    Sensor Numbers are from Mems Journal 2015


  • Vehicle management: Support for minimizing operating cost and increasing comfort. Examples include vehicle condition & service reminders, remote monitoring/ operations, telematics etc.
    Mobility management: Guidance on faster, safer, more economical, and more fuel-efficient driving, based on data gathered for the vehicle. Examples include real-time traffic information displays, parking lot or garage assistance etc.
    Safety: The ability to warn the driver of road problems and automatically sense and prevent potential collisions. Examples include collision avoidance features, danger warning signals and emergency call functions.
    Autonomous driving: Operation of the vehicle without a human driver at the controls, existing only on a partial basis. Examples include self-parking cars, auto driving mode, motorway assistance, and the transportation of goods by trucks on well-delineated routes.
    Well-being: Optimization of the driver’s health and competence. Examples include electronic alerts that detect or mitigate fatigue and other forms of individual assistance.
    Entertainment: Functions that provide music and video to passengers and the driver. Examples include smartphone interfaces, Wi-Fi or Local Area Network hotspots, access to social networks, and the “mobile office.”
    Home integration: Links to homes, offices, and other buildings. Examples include the integration of the automobile into home alarms or energy monitoring systems.
  • Safety Data relating to collisions, lane departures, or any hazards


    Characteristics of Data from connected cars:

    Data Volumes: According to experts, the connected car of the future will send 25 Gigabytes of data to the cloud every hour, representing up to 130 Terabytes of primary data storage per car, per year.
    Read more: http://telematicswire.net/every-connected-car-will-send-130tb-of-data-per-year-in-future-actifio/

    Diverse data types: V2V stands for vehicle-to-vehicle communication and V2I for vehicle-to-infrastructure communication.
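
The hourly and yearly figures quoted above are related by simple arithmetic. The sketch below, which assumes decimal units (1 TB = 1000 GB, something the source does not specify), works out how many transmitting hours per year the 130 TB figure implies at 25 GB per hour, and what a car transmitting around the clock would produce at the same rate.

```python
GB_PER_HOUR = 25     # hourly figure quoted in the note
TB_PER_YEAR = 130    # yearly upper bound quoted in the note
GB_PER_TB = 1000     # assumption: decimal units


def implied_hours_per_year(tb_per_year: float, gb_per_hour: float) -> float:
    """Hours of transmission per year needed to reach the yearly volume."""
    return tb_per_year * GB_PER_TB / gb_per_hour


def annual_tb(gb_per_hour: float, hours_per_day: float) -> float:
    """Yearly volume in TB for a given hourly rate and daily usage."""
    return gb_per_hour * hours_per_day * 365 / GB_PER_TB


hours = implied_hours_per_year(TB_PER_YEAR, GB_PER_HOUR)
print(hours)                        # 5200.0 hours per year
print(round(hours / 365, 1))        # 14.2 hours per day
print(annual_tb(GB_PER_HOUR, 24))   # 219.0 TB per year if transmitting nonstop
```

Under these assumptions, the 130 TB figure corresponds to roughly 14 hours of transmission per day, which is consistent with the note presenting it as an upper bound.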
  • Let’s take a look at how Cloudera’s EDH (Enterprise Data Hub) fits into the IoT ecosystem.

    It can ingest data from multiple sources, including real-time streaming sensor data.
    You can combine the sensor data with data from other internal and external sources to drive business insights.
    You can deploy EDH on-prem (in your data center) or in hybrid cloud environments and still manage it centrally.
    And you can serve and analyze the data in a number of different ways: integrate it with existing BI solutions, do search or machine learning, or integrate it with real-time applications.
  • For our demonstration today we won't be using live vehicle data.
    Instead we will be using a data generator which simulates messages and data commonly collected from vehicles, such as vehicle position, speed, and lane departure readings.
    This data will be sent to Apache Kafka, where it is then picked up by StreamSets.

    StreamSets, one of our partners, is an open source data collector which allows you to easily build out and manage data pipelines for ingestion into Hadoop.
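
A data generator of the kind described here can be sketched in a few lines. The version below is only an illustration, not the generator used in the demo: the field names are taken from the demo architecture slide, while the types, value ranges, probabilities, starting position and the sample VIN are all assumptions. It produces JSON payloads and stops short of publishing them, so no Kafka client is required.

```python
import json
import random
from dataclasses import asdict, dataclass
from typing import Iterator


@dataclass
class VehicleEvent:
    # Field names follow the demo slide; the types are assumptions.
    Time: int                # epoch seconds (assumed)
    VIN: str
    Miles: float
    xAccel: float
    yAccel: float
    zAccel: float
    Speed: float             # mph (assumed)
    Brakes: bool
    LaneDeparture: bool
    Signal: bool
    CollisionDetected: bool
    HazardDetected: bool
    Latitude: float
    Longitude: float


def simulate_events(vin: str, count: int, start_time: int = 0,
                    seed: int = 42) -> Iterator[VehicleEvent]:
    """Yield one synthetic reading per second for a single vehicle."""
    rng = random.Random(seed)                    # seeded for repeatable runs
    miles, lat, lon = 12000.0, 37.40, -122.10    # arbitrary starting state
    for i in range(count):
        speed = rng.uniform(0.0, 75.0)
        miles += speed / 3600.0                  # distance covered in one second
        lat += rng.uniform(-1e-4, 1e-4)
        lon += rng.uniform(-1e-4, 1e-4)
        yield VehicleEvent(
            Time=start_time + i, VIN=vin, Miles=round(miles, 3),
            xAccel=rng.gauss(0.0, 0.3), yAccel=rng.gauss(0.0, 0.3),
            zAccel=rng.gauss(0.0, 0.1), Speed=round(speed, 1),
            Brakes=rng.random() < 0.10,
            LaneDeparture=rng.random() < 0.02,
            Signal=rng.random() < 0.05,
            CollisionDetected=rng.random() < 0.001,
            HazardDetected=rng.random() < 0.01,
            Latitude=round(lat, 6), Longitude=round(lon, 6),
        )


def to_message(event: VehicleEvent) -> bytes:
    """Serialize an event as the JSON payload a producer would publish."""
    return json.dumps(asdict(event)).encode("utf-8")


# Hypothetical VIN, used only for illustration.
messages = [to_message(e) for e in simulate_events("1HGCM82633A004352", 3)]
print(len(messages))
```

Each element of `messages` is a byte string that a producer could hand to a messaging client; the choice of JSON as the wire format is itself an assumption.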

  • As on the previous slide, the demonstration uses a data generator rather than live vehicle data, and the generated data is sent to Apache Kafka, where it is picked up by StreamSets. StreamSets is an open source data collector which allows you to rapidly build reliable, complex data flows.

    Its pure event-based flow fits well with IoT sensor event data.
    It's fast and scalable, and has an intuitive UI packed with lots of out-of-the-box functionality.
    It provides pipeline monitoring/metrics and bad record/error handling.

    The records are then split so that they can be processed independently. The top stream prepares the data for ingest into the Apache Solr search engine. Several key actions take place in this path:
    types and timestamps are converted to be compatible with Solr, and events are evaluated, categorized and tagged to make them easier to search.

    The bottom stream redirects the messages to a secondary Kafka topic, where they are picked up by Spark Streaming for analysis, the output of which is stored in Apache Kudu. Apache Kudu is a new updateable storage layer for Hadoop which allows for fast updates and fast scans. It's great for IoT and streaming use cases because data can be analyzed quickly without a complex batch update process.
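
The split described above can be illustrated in plain Python. This is a sketch of the idea only and does not use StreamSets, Spark Streaming, Solr or Kudu: the search path converts epoch timestamps to the ISO 8601 UTC form that Solr date fields expect and tags each event, while the analytics path updates a per-vehicle profile in place, standing in for an updateable store. The tag names, the `category` values and the profile fields are hypothetical.

```python
from datetime import datetime, timezone

# Event flags that mark a record as worth surfacing in search (assumed set).
ALERT_FLAGS = ("CollisionDetected", "HazardDetected", "LaneDeparture")


def to_search_doc(event: dict) -> dict:
    """Search path: convert the timestamp and tag the event."""
    doc = dict(event)
    doc["Time"] = datetime.fromtimestamp(
        event["Time"], tz=timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    doc["tags"] = [flag for flag in ALERT_FLAGS if event.get(flag)]
    doc["category"] = "alert" if doc["tags"] else "normal"
    return doc


def upsert_profile(profiles: dict, event: dict) -> None:
    """Analytics path: update the profile for this VIN, creating it if needed."""
    profile = profiles.setdefault(
        event["VIN"], {"events": 0, "max_speed": 0.0, "lane_departures": 0})
    profile["events"] += 1
    profile["max_speed"] = max(profile["max_speed"], event["Speed"])
    profile["lane_departures"] += int(bool(event.get("LaneDeparture")))


def run_pipeline(events):
    """Send every record down both paths independently."""
    search_docs, profiles = [], {}
    for event in events:
        search_docs.append(to_search_doc(event))
        upsert_profile(profiles, event)
    return search_docs, profiles


events = [
    {"Time": 0, "VIN": "VIN-A", "Speed": 62.0, "LaneDeparture": True,
     "HazardDetected": False, "CollisionDetected": False},
    {"Time": 1, "VIN": "VIN-A", "Speed": 58.5, "LaneDeparture": False,
     "HazardDetected": False, "CollisionDetected": False},
    {"Time": 2, "VIN": "VIN-B", "Speed": 30.0, "LaneDeparture": False,
     "HazardDetected": True, "CollisionDetected": False},
]
docs, profiles = run_pipeline(events)
print(docs[0]["Time"], docs[0]["tags"])   # 1970-01-01T00:00:00Z ['LaneDeparture']
print(profiles["VIN-A"])
```

Because each profile is updated record by record, the latest state is available to queries immediately, which is the property the note attributes to an updateable storage layer.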

  • This is a high-level view of a Hadoop-based architecture for collecting, processing and analyzing data from connected vehicles.
    Data can be transmitted from the vehicle in near real time for collection, or it can be stored onboard until the vehicle has internet connectivity,
    such as when the vehicle is garaged at home with access to the owner's Wi-Fi, and then processed in batch.

    For our demonstration today we will be using Kafka and StreamSets for ingest and data processing, Spark Streaming to process data on arrival, Kudu to store dynamically created vehicle profiles, Impala as the SQL query engine to report on the vehicle profiles, Solr search for real-time event discovery as the analysis layer, and ZoomData as the visualization layer.

    Apache Kafka is used as the ingest tool because it:
    Is scalable, durable and fault tolerant
    Supports very high throughput and large numbers of consumers/producers (for instance, the 40 million cars and trucks on the road)
    Allows quick, non-intrusive onboarding of new data sources, so if we add new sensors or decide to collect new readings from the vehicles the pipeline does not need to change
    Decouples data producers from data consumers for a flexible and robust pipeline
    Persists streaming data
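
The decoupling and persistence points can be made concrete with a toy model. The class below is an in-memory stand-in for a single topic, written for illustration; it is not the Kafka API, and the class name, method names and consumer names are hypothetical. Records stay in the log after they are read, and each consumer tracks its own position, so a slow consumer and a fast one never interfere with each other or with the producer.

```python
class TopicLog:
    """Toy append-only log with per-consumer read positions (not Kafka itself)."""

    def __init__(self):
        self._records = []     # records are retained after being read
        self._offsets = {}     # consumer name -> next position to read

    def append(self, record) -> int:
        """Producer side: add a record and return its offset."""
        self._records.append(record)
        return len(self._records) - 1

    def poll(self, consumer: str, max_records: int = 10) -> list:
        """Consumer side: return the next batch and advance this consumer only."""
        start = self._offsets.get(consumer, 0)
        batch = self._records[start:start + max_records]
        self._offsets[consumer] = start + len(batch)
        return batch


log = TopicLog()
for reading in ("speed=61", "lane_departure", "speed=58"):
    log.append(reading)

search_first = log.poll("search", max_records=2)   # slow consumer reads two
analytics_all = log.poll("analytics")              # fast consumer reads all three
search_rest = log.poll("search")                   # slow consumer catches up later

print(search_first, analytics_all, search_rest)
```

A new consumer can be added at any time and will read from the start of the log without any change on the producer side, which is the sense in which onboarding is non-intrusive in this model.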
  • Auto Manufacturers:
        Personal maintenance recommendations (usage based or predictive)
        Understanding driver behavior to help shape products
    Insurance Providers:
        Personalized rates (based on driver behavior)
    Public Safety:
        Reporting collisions/road hazards/potholes
        Traffic analysis
    Other:
        Mobile devices
        OBD readers