It is not enough to build a mesh of sensors or embedded devices to gain more insight into the surrounding environment and optimize your production systems. Usually, your IoT solution needs to be capable of transferring enormous amounts of data to storage or the cloud, where the data has to be processed further. Quite often, the processing of these endless data streams has to happen in real time so that you can react to the IoT subsystem's state accordingly.
This session will show attendees how to build a Fast Data solution that receives endless streams from the IoT side and processes them in real time using Apache Ignite's cluster resources.
The Apache Ignite Platform
Apache Ignite is a memory-centric data platform that is used to build fast, scalable & resilient solutions.
At the heart of the Apache Ignite platform lies a distributed, memory-centric data store with ACID semantics and powerful processing APIs, including SQL, Compute, Key/Value and transactions. This memory-centric approach enables Apache Ignite to leverage memory for high throughput and low latency whilst utilising local disk or SSD to provide durability and fast recovery.
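As a quick illustration of the Compute API, the following minimal sketch broadcasts a closure to every node in the cluster (a plain local node start with default configuration is assumed):

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;

public class ComputeExample {
    public static void main(String[] args) {
        // Start (or join) an Ignite node with default configuration.
        try (Ignite ignite = Ignition.start()) {
            // Broadcast a closure to every node in the cluster.
            ignite.compute().broadcast(() -> System.out.println("Hello from a cluster node!"));
        }
    }
}
```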
The main difference between the memory-centric approach and the traditional disk-centric approach is that memory is treated as fully functional storage, not merely as a caching layer, which is how most databases treat it. For example, Apache Ignite can function in a pure in-memory mode, in which case it acts as an In-Memory Database (IMDB) and In-Memory Data Grid (IMDG) in one.
On the other hand, when persistence is turned on, Ignite functions as a memory-centric system where most of the processing happens in memory, but the data and indexes are persisted to disk. The key difference from traditional disk-centric RDBMS or NoSQL systems is that Ignite is strongly consistent, horizontally scalable, and supports both SQL and key-value processing APIs.
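Switching between the pure in-memory mode and the persistent mode is a matter of configuration. The following is a minimal sketch, assuming an Ignite 2.x API, that enables native persistence for the default data region:

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class PersistenceExample {
    public static void main(String[] args) {
        IgniteConfiguration cfg = new IgniteConfiguration();

        // Enable native persistence for the default data region:
        // data and indexes stay in memory and are also persisted to disk.
        DataStorageConfiguration storageCfg = new DataStorageConfiguration();
        storageCfg.getDefaultDataRegionConfiguration().setPersistenceEnabled(true);
        cfg.setDataStorageConfiguration(storageCfg);

        Ignite ignite = Ignition.start(cfg);

        // With persistence on, the cluster starts inactive and must be activated.
        ignite.cluster().active(true);
    }
}
```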
The Apache Ignite platform can be integrated with third-party databases and external storage media and can be deployed on any infrastructure. It provides linear scalability, built-in fault tolerance, and comprehensive security and auditing, alongside advanced monitoring & management.
The Apache Ignite platform caters for a range of use cases, including core banking services, real-time product pricing, reconciliation and risk-calculation engines, analytics, and machine learning.
Ignite Data Grid is a distributed key-value store that enables storing data both in memory and on disk across a distributed cluster, and provides extensive APIs for working with that data. Ignite Data Grid can be viewed as a distributed, partitioned hash map with every cluster node owning a portion of the overall data. This way, the more cluster nodes we add, the more data we can store.
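A short key/value sketch against the Data Grid follows; the cache name and values are illustrative:

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;

public class DataGridExample {
    public static void main(String[] args) {
        try (Ignite ignite = Ignition.start()) {
            // Create (or get) a distributed cache; entries are partitioned
            // across the cluster, so adding nodes adds capacity.
            IgniteCache<Integer, String> cache = ignite.getOrCreateCache("meterReadings");

            cache.put(1, "reading-42.7kWh");
            System.out.println("Stored value: " + cache.get(1));
        }
    }
}
```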
Apache Ignite incorporates distributed SQL database capabilities as part of its platform. The database is horizontally scalable, fault tolerant, and ANSI SQL-99 compliant. It supports all SQL and DML commands, including SELECT, UPDATE, INSERT, MERGE, and DELETE queries, as well as a subset of DDL commands relevant for distributed databases.
Data sets as well as indexes can be stored both in RAM and on disk thanks to the durable memory architecture. This allows distributed SQL operations to be executed across different memory layers, achieving in-memory performance with the durability of disk.
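The sketch below runs DDL and DML through the native SQL API; the table and columns are made up for illustration:

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.query.SqlFieldsQuery;

public class SqlExample {
    public static void main(String[] args) {
        try (Ignite ignite = Ignition.start()) {
            // Any cache instance can execute SQL; create a helper cache first.
            IgniteCache<?, ?> cache = ignite.getOrCreateCache("sql-helper");

            // DDL: create a distributed table.
            cache.query(new SqlFieldsQuery(
                "CREATE TABLE meters (id LONG PRIMARY KEY, region VARCHAR)")).getAll();

            // DML: insert and query data with standard SQL.
            cache.query(new SqlFieldsQuery(
                "INSERT INTO meters (id, region) VALUES (?, ?)").setArgs(1L, "EU")).getAll();

            cache.query(new SqlFieldsQuery("SELECT id, region FROM meters"))
                .getAll()
                .forEach(row -> System.out.println(row));
        }
    }
}
```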
You can interact with Apache Ignite using SQL via natively developed APIs for Java, .NET and C++, or via the Ignite JDBC or ODBC drivers, which provide true cross-platform connectivity from languages such as PHP, Ruby and more.
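For instance, connecting through the Ignite JDBC thin driver might look like the following sketch (the host, port, and table are assumptions for a local setup):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class JdbcExample {
    public static void main(String[] args) throws Exception {
        // Register the Ignite JDBC thin driver and connect to a local node.
        Class.forName("org.apache.ignite.IgniteJdbcThinDriver");
        try (Connection conn = DriverManager.getConnection("jdbc:ignite:thin://127.0.0.1/");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT id, region FROM meters")) {
            while (rs.next()) {
                System.out.println(rs.getLong("id") + " -> " + rs.getString("region"));
            }
        }
    }
}
```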
Apache Ignite provides an implementation of the Spark RDD abstraction that allows state to be easily shared in memory across Spark jobs. The main difference between a native Spark RDD and IgniteRDD is that IgniteRDD provides a shared in-memory view of data across different Spark jobs, workers, or applications, while a native Spark RDD cannot be seen by other Spark jobs or applications.
IgniteRDD is implemented as a view over a distributed Ignite cache, which may be deployed either within the Spark job executing process, on a Spark worker, or in its own cluster. This means that, depending on the chosen deployment mode, the shared state may either exist only during the lifespan of a Spark application (embedded mode), or it may outlive the Spark application (standalone mode), in which case the state can be shared across multiple Spark applications.
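A sketch of sharing state across Spark jobs via the Java API for IgniteRDD follows; the configuration path and cache name are placeholders:

```java
import java.util.Arrays;

import org.apache.ignite.spark.JavaIgniteContext;
import org.apache.ignite.spark.JavaIgniteRDD;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

import scala.Tuple2;

public class IgniteRddExample {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("ignite-rdd-example");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // JavaIgniteContext manages Ignite nodes on Spark workers;
        // "example-ignite.xml" is a placeholder Ignite configuration file.
        JavaIgniteContext<Integer, Integer> ic =
            new JavaIgniteContext<>(sc, "example-ignite.xml");

        // IgniteRDD is a live view over the "sharedRDD" Ignite cache.
        JavaIgniteRDD<Integer, Integer> sharedRDD = ic.fromCache("sharedRDD");

        // Writes go straight into the Ignite cache and remain visible
        // to other Spark jobs and applications (standalone mode).
        sharedRDD.savePairs(sc.parallelize(Arrays.asList(1, 2, 3), 2)
            .mapToPair(i -> new Tuple2<>(i, i * i)));

        System.out.println("Shared entries: " + sharedRDD.count());
    }
}
```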
Consider a representative IoT use case, smart metering at scale:

* 50 million meters deployed worldwide
* The meters stream data back to the provider's platform
* The data is analyzed to ensure optimal power coverage and usage across diverse territories