Towards an Unified API for Spark and the IIoT by Ángel Conde at Big Data Spain 2017

Structured Streaming is a game changer for Apache Spark: it offers a unified API for both batch and real-time processing. Moreover, its support for “event time” and watermarking simplifies deployment in IIoT-related projects. In this workshop, we will get hands-on with Spark's Structured Streaming API, focusing on its advantages for the IIoT domain.

https://www.bigdataspain.org/2017/talk/towards-an-unified-api-for-spark-and-the-iiot

Big Data Spain 2017
16th - 17th November Kinépolis Madrid

  1. IKERLAN. WHERE TECHNOLOGY IS AN ATTITUDE. Towards an Unified API for Spark and the IIoT. Ángel Conde Manjón, 16-11-2017
  2. Outline: 1. The Industrial Internet of Things (IIoT). 2. Use Cases & Key Benefits. 3. Key Issues Processing IIoT Data. 4. Spark Structured Streaming. 5. Demo time. (© 2017. IKERLAN. All rights reserved)
  3. The Industrial Internet of Things (IIoT) • Investment is expected to top $60 trillion during the next 15 years. • Could add $14.2T to the global economy by 2030. • McKinsey: will touch 43% of the global economy by 2025. • Gartner: 20 billion IoT things installed by 2020.
  4. Use Cases & Key Benefits
  5. Key Issues Processing IIoT Data. Late Data & Ordering: • Connectivity issues: 2G, 3G. • Protocol support. Data Quality: • Raw sensor values: broken sensors. • Deal with duplicates: local acquisition systems.
  6. Structured Streaming • Stream processing on top of the Spark SQL engine. • Unified API for batch/stream processing. • Watermarking & deduplication. • Aggregations, UDFs, stateful ops. • Joins with static data (Spark 2.3 will support joins between streams). Code: spark.readStream .format("kafka") .option("subscribe", "in") .load() .groupBy('value') .agg(count("*")) .writeStream .format("kafka") .option("topic", "out") .trigger("1 minute") .outputMode("update")
  7. Watermarking & Late Data
  8. Demo time
  9. Demo
  10. Architecture: Digital Platform (PaaS). JSON. Filter and routing. Aggregates & Raw data. Real time processing.
  11. IKERLAN, P.º José María Arizmendiarrieta, 2 - 20500 Arrasate-Mondragón. T. +34 943712400, F. +34 943796944. THANK YOU. https://github.com/Neuw84/bds2k17 aconde@ikerlan.es @neuw84
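
The slides name watermarking, late data and deduplication as the key concerns for IIoT streams but the transcript does not show how they interact. The following is a minimal sketch in plain Python, not Spark code and not taken from the talk or its repository: it models the idea behind Spark's `withWatermark` and `dropDuplicates` on a list of readings in arrival order. The names `Reading`, `process` and `allowed_lateness` are hypothetical, and the watermark here advances after every record, whereas Spark advances it once per trigger.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set, Tuple


@dataclass(frozen=True)
class Reading:
    """Hypothetical sensor reading; field names are assumptions."""
    sensor_id: str
    event_time: int  # seconds, stamped by the device (event time)
    value: float


def process(
    readings: Iterable[Reading], allowed_lateness: int
) -> Tuple[List[Reading], List[Reading], List[Reading]]:
    """Classify readings, given in arrival order, as accepted, duplicate or late.

    The watermark is the largest event time seen so far minus
    allowed_lateness. A reading older than the watermark is treated as
    too late; a reading whose (sensor_id, event_time) was already seen
    is treated as a duplicate.
    """
    max_event_time: Optional[int] = None
    seen: Set[Tuple[str, int]] = set()
    accepted: List[Reading] = []
    duplicates: List[Reading] = []
    late: List[Reading] = []

    for r in readings:
        watermark: Optional[int] = None
        if max_event_time is not None:
            watermark = max_event_time - allowed_lateness
            # Keys older than the watermark can never match again,
            # so dropping them keeps the deduplication state bounded.
            seen = {k for k in seen if k[1] >= watermark}

        key = (r.sensor_id, r.event_time)
        if watermark is not None and r.event_time < watermark:
            late.append(r)
        elif key in seen:
            duplicates.append(r)
        else:
            seen.add(key)
            accepted.append(r)

        if max_event_time is None or r.event_time > max_event_time:
            max_event_time = r.event_time

    return accepted, duplicates, late


# Arrival order differs from event-time order, as with 2G/3G links.
arrivals = [
    Reading("s1", 100, 1.0),
    Reading("s1", 130, 2.0),
    Reading("s1", 100, 1.0),  # resent by the local acquisition system
    Reading("s2", 60, 5.0),   # arrives after the watermark passed 60
    Reading("s2", 125, 6.0),  # out of order but within the lateness bound
]
accepted, duplicates, late = process(arrivals, allowed_lateness=60)
```

With a lateness bound of 60 seconds, the resent `s1` reading is classified as a duplicate, the `s2` reading stamped 60 is classified as late, and the out-of-order reading stamped 125 is still accepted. A larger bound tolerates longer connectivity gaps at the cost of keeping more state.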
