Presentation I gave at the IBM Big Data Developers meetup group in San Jose, CA.
There is also a video available of this talk at:
https://www.youtube.com/watch?v=TSt49yPBmW0&t=7m59s
Video and slides are synchronized; mp3 and slide downloads are available at http://bit.ly/1FQYcP0.
Gian Merlino presents the advantages, challenges, and best practices to deploying and maintaining lambda architectures in the real world, using the infrastructure at Metamarkets as a case study. Filmed at qconsf.com.
Gian Merlino is a senior software engineer at Metamarkets, responsible for the infrastructure behind its data ingestion pipelines, and a committer on the Druid project.
Observing Intraday Indicators Using Real-Time Tick Data on Apache Superset an... (DataWorks Summit)
The Central Bank of the Republic of Turkey is primarily responsible for steering the monetary and exchange rate policies in Turkey.
One of the major core functions of the Bank is market operations. In this context, analyzing and interpreting real-time tick data related to money market instruments has become not only a requirement but also a challenge.
For this use case, an API provided by one of the financial data vendors has been used to gather real-time tick data and data routing has been orchestrated by Apache NiFi.
Gathered data is being transferred to Kafka topics and then handed off to Druid for real-time indexing tasks.
Indicators such as effective cost, bid-ask spread, price impact measures, and return reversal are calculated using Apache Storm and finally visualized with Apache Superset, giving decision-makers a new set of tools.
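The talk computes these indicators in Apache Storm; as an illustration only, the arithmetic behind two of them can be sketched in plain Python on a few hypothetical ticks. The formulas below use one common textbook definition of each measure; the exact definitions used by the Bank are not given in the abstract.

```python
# Illustration only: hypothetical ticks, common textbook definitions of the
# indicators named above (not necessarily the Bank's exact formulas).

def quoted_spread(bid, ask):
    """Relative bid-ask spread, measured against the quote midpoint."""
    mid = (bid + ask) / 2
    return (ask - bid) / mid

def effective_cost(trade_price, bid, ask, side):
    """Effective cost: twice the signed distance of the trade from the midpoint.
    side is +1 for a buyer-initiated trade, -1 for seller-initiated."""
    mid = (bid + ask) / 2
    return 2 * side * (trade_price - mid) / mid

# (bid, ask, trade_price, side) -- hypothetical values
ticks = [
    (99.90, 100.10, 100.05, +1),
    (99.95, 100.05, 99.97, -1),
]
for bid, ask, px, side in ticks:
    print(round(quoted_spread(bid, ask), 6),
          round(effective_cost(px, bid, ask, side), 6))
```

In the architecture described, a Storm bolt would apply functions like these to each tick consumed from the Kafka topic before the results reach Druid.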
Hadoop application architectures - using Customer 360 as an example (hadooparchbook)
Hadoop application architectures - using Customer 360 (more generally, Entity 360) as an example. By Ted Malaska, Jonathan Seidman and Mark Grover at Strata + Hadoop World 2016 in NYC.
Fast Data: A Customer’s Journey to Delivering a Compelling Real-Time Solution (Guido Schmutz)
This is my part of the Open World 2014 presentation on Fast Data and Oracle Event Processing (OEP) 12c.
It contains an architecture discussion with some architecture patterns of where Events are useful. The 2nd part is a demo showcase showing OEP12c and BAM12c in action, analyzing the live OOW2014 twitter feed.
Architecting a next-generation data platform (hadooparchbook)
Slides for Architecting a next-generation data platform at Strata + Hadoop World, London 2017.
https://conferences.oreilly.com/strata/strata-eu/public/schedule/detail/57652
GPU-Accelerating UDFs in PySpark with Numba and PyGDF (Keith Kraus)
With advances in computer hardware such as 10-gigabit network cards, InfiniBand, and solid state drives all becoming commodity offerings, the new bottleneck in big data technologies is very commonly the processing power of the CPU. To meet the computational demand of their users, enterprises have had to resort to extreme scale-out approaches just to get the processing power they need. One of the best-known technologies in this space, Apache Spark, has numerous enterprises publicly talking about the challenges of running multiple 1000+ node clusters to give their users the processing power they need. This talk is based on work completed by NVIDIA’s Applied Solutions Engineering team. Attendees will learn how they were able to GPU-accelerate UDFs in PySpark using open source technologies such as Numba and PyGDF, the lessons they learned in the process, and how they were able to accelerate workloads with a fraction of the hardware footprint.
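The talk itself relies on Numba and PyGDF compiling UDFs for the GPU; without those dependencies, the underlying idea can still be sketched in plain Python: evaluate the UDF over a whole column (a batch) at once instead of invoking it once per row, which is what makes the work amenable to a compiled GPU kernel. All names here are illustrative.

```python
# Dependency-free sketch of the batching principle behind GPU-accelerated UDFs.
# In the real system, Numba would compile the columnar loop into a GPU kernel
# operating on a PyGDF/GPU dataframe column; here both paths are pure Python.

def udf_scalar(x):
    """A toy per-row UDF, as a user might write for a row-at-a-time API."""
    return x * x + 1

def apply_row_at_a_time(column):
    """Baseline: one Python-level call per row (the slow path)."""
    return [udf_scalar(x) for x in column]

def udf_batched(column):
    """Columnar form of the same UDF: one pass over the whole column.
    This is the shape a compiler like Numba can vectorize or offload."""
    return [x * x + 1 for x in column]

col = list(range(5))
assert apply_row_at_a_time(col) == udf_batched(col)  # same results, same UDF
print(udf_batched(col))  # [1, 2, 5, 10, 17]
```

The speedup in the talk comes from replacing the per-row interpreter overhead of the first path with a compiled kernel over the second, columnar path.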
Architecting next generation big data platform (hadooparchbook)
A tutorial on architecting next generation big data platform by the authors of O'Reilly's Hadoop Application Architectures book. This tutorial discusses how to build a customer 360 (or entity 360) big data application.
Audience: Technical.
Case Study: Realtime Analytics with Druid (Salil Kalia)
The case study is about ViralGains, a US-based video marketing platform. The presentation was delivered by me (Salil Kalia) at the Great Indian Developer Summit (GIDS) 2016, and covers work we did at TO THE NEW Digital with our customer, ViralGains.
Here I showcased Druid (http://druid.io) and the supporting technologies (Kafka/ZooKeeper) to demonstrate how they helped us build a stable real-time analytics system capturing hundreds of millions of analytics events per day. In the ad industry, precision (or something very close to it) is essential, because money is involved at every step, down to a single ad impression.
The case study included a demo and a short talk on the journey from Redis to Cassandra and finally to Druid, with outstanding performance.
Resilience: the key requirement of a [big] [data] architecture - StampedeCon... (StampedeCon)
From the StampedeCon 2015 Big Data Conference: There is an adage, “If you fail to plan, you plan to fail.” When developing systems, the adage can be taken a step further: “If you fail to plan FOR FAILURE, you plan to fail.” At the Huffington Post, data moves between a number of systems to provide statistics for our technical, business, and editorial teams. Due to the mission-critical nature of our data, considerable effort is spent building resiliency into processes.
This talk will focus on designing for failure. Some material will focus on understanding the traits of specific distributed systems, such as message queues or NoSQL databases, and the consequences of different types of failures. Other parts of the presentation will focus on how systems and software can be designed to make re-processing batch data simple, and how to determine which failure-mode semantics are important for a real-time event processing system.
Reference architecture for Internet of Things (Sujee Maniyam)
What kind of data infrastructure is needed to support the Internet of Things?
This talk presents a reference architecture.
We are building this architecture as an open source project; see bit.ly/iotxyz
Lifting the hood on spark streaming - StampedeCon 2015 (StampedeCon)
At the StampedeCon 2015 Big Data Conference: Today, if a byte of data were a gallon of water, in only 10 seconds there would be enough data to fill an average home; in 2020 it will take only 2 seconds. The Internet of Things is driving a tremendous amount of this growth, providing more data at a higher rate than we’ve ever seen. With this explosive growth comes the demand from consumers and businesses to leverage and act on what is happening right now. Without stream processing these demands will never be met, and there will be no big data and no Internet of Things. Apache Spark, and Spark Streaming in particular, can be used to fulfill this stream processing need now and in the future. In this talk I will peel back the covers and take a deep dive into the inner workings of Spark Streaming, discussing topics such as DStreams, input and output operations, transformations, and fault tolerance.
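The DStream model mentioned above rests on one core idea: the stream is chopped into small time-indexed micro-batches, and the same transformation runs on each batch. A minimal dependency-free sketch of that idea, with illustrative names and no Spark required:

```python
# Sketch of the micro-batch idea behind Spark Streaming's DStreams:
# chop the stream into batches, apply the same transformation to each.
from itertools import islice

def micro_batches(events, batch_size):
    """Yield fixed-size batches from a (possibly unbounded) event iterator.
    Spark batches by time interval; batching by count keeps the sketch simple."""
    it = iter(events)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch

def count_by_key(batch):
    """A per-batch 'DStream transformation': count events by key."""
    counts = {}
    for key in batch:
        counts[key] = counts.get(key, 0) + 1
    return counts

stream = ["a", "b", "a", "c", "a", "b"]
results = [count_by_key(b) for b in micro_batches(stream, 3)]
print(results)
```

Fault tolerance in the real system falls out of this structure: a lost batch can be recomputed from its input, which is not possible with a pure record-at-a-time design.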
Big Data Architectures @ JAX / BigDataCon 2016 (Guido Schmutz)
Architecture makes or breaks every IT project. This is even more true for big data projects, where no standards have had decades to prove themselves. Nevertheless, good and effective solutions are spreading and becoming established here as well. This talk explains which building blocks matter for the various use cases in the big data space, and how they can be cast into concrete solutions. It covers both traditional big data architectures and current approaches such as the Lambda and Kappa architectures. Stream processing infrastructures and their combination with big data technologies are also a topic. Starting from a product- and technology-independent reference architecture, the talk presents several solution options based on open source components.
Using Multiple Persistence Layers in Spark to Build a Scalable Prediction Eng... (StampedeCon)
At the StampedeCon 2015 Big Data Conference: This talk will examine the benefits of using multiple persistence strategies to build an end-to-end predictive engine. Spark Streaming backed by a Cassandra persistence layer allows the rapid lookups and inserts needed for real-time model scoring. Spark backed by Parquet files stored in HDFS allows for high-throughput model training and tuning with Spark MLlib. Both persistence layers also support ad-hoc queries via Spark SQL, making it easy to analyze model sensitivity and accuracy. Storing the data this way also provides extensibility: existing tools like CQL can run operational queries on the data in Cassandra, and Impala can run larger analytical queries on the data in HDFS, further maximizing the benefits of the flexible architecture.
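The two-layer split described above can be sketched with in-memory stand-ins: a "Cassandra-like" key-value store holding the latest state for low-latency scoring, and a "Parquet/HDFS-like" append-only log holding full history for training. Every class and method name here is hypothetical; this only illustrates the routing, not either system's API.

```python
# In-memory stand-ins for the two persistence layers described in the talk.

class HotStore:
    """Stands in for Cassandra: keyed lookups/inserts, latest value wins."""
    def __init__(self):
        self._kv = {}
    def put(self, key, value):
        self._kv[key] = value
    def get(self, key):
        return self._kv.get(key)

class ColdStore:
    """Stands in for Parquet files on HDFS: append-only, scanned in bulk."""
    def __init__(self):
        self._rows = []
    def append(self, row):
        self._rows.append(row)
    def scan(self):
        return list(self._rows)

def ingest(event, hot, cold):
    """Each event feeds both layers: current state for real-time scoring,
    the full event history for model training and tuning."""
    hot.put(event["user"], event["features"])
    cold.append(event)

hot, cold = HotStore(), ColdStore()
ingest({"user": "u1", "features": [0.2, 0.7]}, hot, cold)
ingest({"user": "u1", "features": [0.3, 0.5]}, hot, cold)
print(hot.get("u1"))      # latest features only
print(len(cold.scan()))   # full history
```

The design choice is the same one the talk argues for: neither store alone serves both access patterns well, so each event is written to both.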
A real-time architecture using Hadoop & Storm - Nathan Bijnens & Geert Van La... (jaxLondonConference)
Presented at JAX London 2013.
With the proliferation of data sources and growing user bases, the amount of data generated requires new approaches to storage and processing. Hadoop opened new possibilities, yet it falls short of instant delivery. Adding stream processing with Nathan Marz’s Storm can overcome this delay and bridge the gap to real-time aggregation and reporting. In the batch layer, all master data is kept and is immutable. Once the base data is stored, a recurring process indexes it: it reads all master data, parses it, and creates new views from it.
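The lambda pattern described above can be reduced to a minimal sketch: an immutable master dataset feeds a batch view that is recomputed from scratch, a speed layer covers events that arrived after the last batch run, and queries merge the two. This is an illustration of the general pattern only, with hypothetical names, not code from the talk.

```python
# Minimal sketch of the lambda architecture's query-time merge.

def batch_view(master_events):
    """Recomputed from scratch over the immutable master dataset
    (what the recurring Hadoop indexing job produces)."""
    view = {}
    for key in master_events:
        view[key] = view.get(key, 0) + 1
    return view

def merged_query(key, batch, realtime):
    """Serve the batch result plus the speed layer's real-time delta."""
    return batch.get(key, 0) + realtime.get(key, 0)

master = ["page_a", "page_b", "page_a"]   # already indexed by the batch job
recent = {"page_a": 1}                    # Storm-maintained counts since then
batch = batch_view(master)
print(merged_query("page_a", batch, recent))
print(merged_query("page_b", batch, recent))
```

Because the batch view is rebuilt from immutable master data, any error in the speed layer is bounded: it disappears the next time the batch job recomputes and the real-time delta resets.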
Stream, Stream, Stream: Different Streaming Methods with Spark and Kafka (DataWorks Summit)
At NMC (Nielsen Marketing Cloud) we provide our customers (marketers and publishers) real-time analytics tools to profile their target audiences.
To achieve that, we need to ingest billions of events per day into our big data stores, and we need to do it in a scalable yet cost-efficient manner.
In this session, we will discuss how we continuously transform our data infrastructure to support these goals.
Specifically, we will review how we went from CSV files and standalone Java applications all the way to multiple Kafka and Spark clusters, performing a mixture of Streaming and Batch ETLs, and supporting 10x data growth.
We will share our experience as early-adopters of Spark Streaming and Spark Structured Streaming, and how we overcame technical barriers (and there were plenty...).
We will present a rather unique solution of using Kafka to imitate streaming over our Data Lake, while significantly reducing our cloud services' costs.
Topics include:
* Kafka and Spark Streaming for stateless and stateful use-cases
* Spark Structured Streaming as a possible alternative
* Combining Spark Streaming with batch ETLs
* "Streaming" over Data Lake using Kafka
Blue Pill/Red Pill: The Matrix of Thousands of Data Streams (Databricks)
Designing a streaming application that has to process data from one or two streams is easy. Any streaming framework that provides scalability, high throughput, and fault tolerance would work. But when the number of streams grows to the order of hundreds or thousands, managing them can be daunting. How would you share resources among thousands of streams, all running 24×7? How would you manage their state, apply advanced streaming operations, and add or delete streams without restarting? This talk explains common scenarios and shows techniques that can handle thousands of streams using Spark Structured Streaming.
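One family of techniques for this problem multiplexes all events through a shared processor that keeps per-stream state keyed by stream id, so streams can be registered or retired without restarting a job per stream. The sketch below is a generic, dependency-free illustration of that idea with hypothetical names; it is not the Spark Structured Streaming API from the talk.

```python
# Sketch: multiplex thousands of logical streams through one processor,
# keyed by stream id, instead of running one job per stream.

class MultiplexedProcessor:
    def __init__(self):
        self.state = {}      # stream_id -> per-stream state (here: event count)
        self.active = set()  # streams currently registered

    def add_stream(self, stream_id):
        self.active.add(stream_id)

    def remove_stream(self, stream_id):
        """Retiring a stream needs no restart: drop its id and its state."""
        self.active.discard(stream_id)
        self.state.pop(stream_id, None)

    def process(self, stream_id, event):
        if stream_id not in self.active:
            return  # events for unregistered streams are dropped
        # A real job would apply this stream's transformation to `event`;
        # the sketch just maintains a running count as the per-stream state.
        self.state[stream_id] = self.state.get(stream_id, 0) + 1

proc = MultiplexedProcessor()
for sid in ("s1", "s2"):
    proc.add_stream(sid)
for sid, ev in [("s1", "x"), ("s2", "y"), ("s1", "z"), ("s3", "w")]:
    proc.process(sid, ev)
print(proc.state)  # s3 was never registered, so it holds no state
```

The resource-sharing benefit comes from all streams riding one executor pool and one state store, at the cost of isolating them logically (by key) rather than physically (by job).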
Big Data and Analytics: The IBM Perspective (The_IPA)
Gareth Mitchell-Jones, Associate Partner Big Data & Analytics at IBM, shares his thoughts on the hot topic of Big Data from his unique perspective at an IPA 44 Club event in London. To learn more about The IPA visit www.ipa.co.uk and The 44 Club here http://www.ipa.co.uk/groups/44-club-2
Scala: the unpredicted lingua franca for data science (Andy Petrella)
Talk given at Strata London with Dean Wampler (Lightbend) about Scala as the future of data science. The first part covers how Scala became important; the rest of the talk is in notebooks using the Spark Notebook (http://spark-notebook.io/).
The notebooks are available on GitHub: https://github.com/data-fellas/scala-for-data-science.
Connected Car - the future technology and opportunities in car networking (spirit conference)
Reza Zanjani, Unit Manager at SEVEN PRINCIPLES Solutions & Consulting AG, held a presentation about "Connected Car - the future technology and opportunities in car networking" during the spirit conference 2014.
So you’ve got a handle on what Big Data is and how you can use it to find business value in your data. Now you need an understanding of the Microsoft products that can be used to create a Big Data solution. Microsoft has many pieces of the puzzle, and in this presentation I will show how they fit together. How does Microsoft enhance and add value to Big Data? From collecting data, transforming it, and storing it to visualizing it, I will show you Microsoft’s solutions for every step of the way.
Enabling the Internet of Things with Real-time Hadoop (Becky Mendenhall)
These slides show how IoT is driving the growth of Hadoop adoption in enterprises across the world. They discuss real-life IoT use cases and how Pepperdata and MapR, two Hadoop technology experts, have helped customers take advantage of the IoT era and drive more business value with lower infrastructure costs, faster performance, and SLA guarantees.
Edge-controlled, cloud-connected: Design patterns for the IIoT (John Breitenbach)
RTI presents a RIoT lunch and learn on the IIC's layered databus design pattern for the IIoT. Industrial systems requiring millisecond response times like autonomous vehicles, patient monitoring, and the power grid can’t push data up to the cloud for live decision making. Latencies across the wide-area network mean those decisions arrive too late. For real-time response, data must be shared and acted on locally. Some data, however, may need to be pooled via the cloud to work with back-end systems for billing, analytics, diagnostics, and other enterprise level applications. What are the recommended design patterns to satisfy both needs? Join this lunch and learn for an overview of the “big 3” architectural patterns recommended by the Industrial Internet of Things Reference Architecture. A live walkthrough of the Layered Databus pattern will demonstrate real-time, millisecond control at the edge coupled with enterprise connectivity all the way back to the Web.
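The layered-databus split described above can be illustrated with a tiny in-memory pub/sub sketch: an edge bus delivers every message locally for millisecond-path decisions, while a gateway forwards only a filtered subset to a cloud bus for back-end analytics. All names and the message schema here are hypothetical; real IIoT databuses (e.g. DDS-based ones) have far richer QoS and discovery semantics.

```python
# Toy pub/sub illustrating the layered databus pattern: act locally on
# everything, forward only selected data to the enterprise/cloud layer.

class Bus:
    def __init__(self):
        self.subscribers = []
        self.log = []  # every message published on this bus

    def subscribe(self, fn):
        self.subscribers.append(fn)

    def publish(self, msg):
        self.log.append(msg)
        for fn in self.subscribers:
            fn(msg)

edge_bus, cloud_bus = Bus(), Bus()

# Local controller: reacts to every reading immediately (the real-time path).
alarms = []
edge_bus.subscribe(lambda m: alarms.append(m) if m["value"] > 100 else None)

# Gateway: forwards only messages flagged for back-end systems to the cloud bus.
edge_bus.subscribe(lambda m: cloud_bus.publish(m) if m.get("report") else None)

edge_bus.publish({"sensor": "temp1", "value": 120, "report": True})
edge_bus.publish({"sensor": "temp1", "value": 80})
print(len(alarms), len(cloud_bus.log))
```

The point the pattern makes is visible even at this scale: the alarm fires without any round trip to the cloud layer, and the cloud layer sees only the traffic it actually needs.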
Achieving Business Value by Fusing Hadoop and Corporate Data (Inside Analysis)
The Briefing Room with Richard Hackathorn and Teradata
Live Webcast March 25, 2015
Watch the Archive: https://bloorgroup.webex.com/bloorgroup/onstage/g.php?MTID=e7254708146d056339a0974f097f569b2
Hadoop data lakes are emerging as peers to corporate data warehouses. However, successful analytic solutions require a fusion of all relevant data, big and small, which has proven challenging for many companies. By allowing business analysts to quickly access data wherever it rests, success factors shift to focus on three key aspects: 1) business objectives, 2) organizational workflow, and 3) data placement.
Register for this Special Edition of The Briefing Room to hear veteran Analyst Richard Hackathorn as he provides details from his recent research report focused on success stories using Teradata QueryGrid. Examples of use cases described will include:
* Joining sensor data in Hadoop with data warehouse labor schedules in seconds
* How bridging corporate cultures and systems creates new business opportunities
* The 360 view of customer journeys using weblogs in Hadoop via BI tools
* How you can put the data where you want and query it however you want
* Virtualizing Hadoop data with Teradata QueryGrid
Visit InsideAnalysis.com for more information.
Visualizing IoT: Rapid Business Data Discovery for the Internet of Things (Mia Yuan Cao)
As the Internet of Things (IoT) is making our world more connected, there is a growing need to understand the data through data visualization, analysis and discovery across different types of connected device platforms.
Most IT experts have heard of big data, or at least have some notion of it. In practice, however, only a few in Germany currently work with it. Yet big data brings entirely new momentum to modern software solutions and is indispensable in the context of mobile, cloud, and social change. Big data makes software intelligent and thereby lets users experience it in a completely new way. Big data gives rise to new software architectures, because information is processed in a completely different way: faster, in a more differentiated manner, and often with the goal of drawing conclusions and making predictions.
This talk explains how modern software architectures are designed so that big data paradigms can be implemented successfully, and what advantages result for increasingly mobile software solutions. We also take a look at the potential and options in industries such as banking, insurance, and retail.
Overcoming the AIoT Obstacles through Smart Component Integration (Innodisk Corporation)
Enterprises in every industry are gearing up for AI’s integration with IoT at the edge. Analytics and cloud-based applications are crucial foundations for the AIoT infrastructure. But even more importantly, AIoT requires complete, real-time access to data to fulfill the needs of highly responsive edge computing applications.
In our experience, many customers face the same difficulties with cyber-level and physical-level device integration in the new AI era. As the world's leading industrial storage and memory provider, Innodisk has a solid track record with more than 2000 customers and expertise built on more than a decade of integrating hardware, firmware, and software solutions.
Attend this webinar to learn about:
- Preparing your business for the new Internet of Things (IoT) and AI era
- How to overcome current architectural issues
- Increasing process efficiency and delivering a better customer experience
- Facilitating new platforms that enable rapid development of next-generation intelligent IoT systems
- Trends and technology in AIoT intelligent storage and data optimization
We're introducing MapR Streams, a reliable, global event streaming system that connects data producers and data consumers across shared topics of information. With the integration of MapR Streams comes the industry’s first and only converged data platform that integrates file, database, event streaming, and analytics to accelerate data-driven applications and address emerging IoT needs.
Are you ready to accelerate your business with the power of a truly global platform for integrating data-in-motion with data-at-rest?
Vertex Perspectives | AI-optimized Chipsets | Part I (Vertex Holdings / Yanai Oron)
Businesses are increasingly adopting AI to create new applications to transform existing operations, driving big data with the growth of IoT and 5G networks and increasing future process complexities for human operators. In this new environment, AI will be needed to write algorithms dynamically to automate the entire programming process. Fortunately, algorithms associated with deep learning are able to achieve enhanced performance with increasing data, unlike the rest associated with machine learning. To date, deep learning technology has primarily been a software play. Existing processors were not originally designed for these new applications. Hence the need to develop AI-optimized hardware.
QNAP Systems, Inc., headquartered in Taipei, Taiwan, provides a comprehensive range of cutting-edge Network-attached Storage (NAS) and video surveillance solutions based on the principles of usability, high security, and flexible scalability. QNAP offers quality NAS products for home and business users, providing solutions for storage, backup/snapshot, virtualization, teamwork, multimedia, and more. QNAP envisions NAS as being more than "simple storage", and has created many NAS-based innovations to encourage users to host and develop Internet of Things, artificial intelligence, and machine learning solutions on their QNAP NAS.
Bhadale group of companies technology ecosystem for modernizationVijayananda Mohire
This is our draft version of the technology ecosystem for modernization offers. We offer services at national, provincial, city, enterprise and individual levels that enables countries to adopt and improve their lifestyles and income
Startup pitch presented by co-founder and CEO Jaco Els. Cubitic offers a predictive analytics platform that allows developers to build custom solutions for analytics and visualisation on top of a machine learning engine.
VoltDB and HPE Vertica Present: Building an IoT Architecture for Fast + Big DataVoltDB
This webinar with Chris Selland of HPE Vertica and Dennis Duckworth of VoltDB addresses the growing challenges with managing a complex IoT solution and how to enable real-time operational interaction with comprehensive data analytics.
Rapidly Developing Internet of Things (IoT) Applications. Demos include using the Raspberry Pi, Beacons, the Oculus Rift, and other sensors. Apps were developed using Bluemix services.
Similar to Virdata: lessons learned from the Internet of Things and M2M Cloud Services @ IBM Big Data Developers Meetup (20)
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualityInflectra
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
JMeter webinar - integration with InfluxDB and GrafanaRTTS
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Tobias Schneck
As AI technology is pushing into IT I was wondering myself, as an “infrastructure container kubernetes guy”, how get this fancy AI technology get managed from an infrastructure operational view? Is it possible to apply our lovely cloud native principals as well? What benefit’s both technologies could bring to each other?
Let me take this questions and provide you a short journey through existing deployment models and use cases for AI software. On practical examples, we discuss what cloud/on-premise strategy we may need for applying it to our own infrastructure to get it to work from an enterprise perspective. I want to give an overview about infrastructure requirements and technologies, what could be beneficial or limiting your AI use cases in an enterprise environment. An interactive Demo will give you some insides, what approaches I got already working for real.
Securing your Kubernetes cluster_ a step-by-step guide to success !KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
DevOps and Testing slides at DASA ConnectKari Kakkonen
My and Rik Marselis slides at 30.5.2024 DASA Connect conference. We discuss about what is testing, then what is agile testing and finally what is Testing in DevOps. Finally we had lovely workshop with the participants trying to find out different ways to think about quality and testing in different parts of the DevOps infinity loop.
Epistemic Interaction - tuning interfaces to provide information for AI supportAlan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
Connector Corner: Automate dynamic content and events by pushing a buttonDianaGray10
Here is something new! In our next Connector Corner webinar, we will demonstrate how you can use a single workflow to:
Create a campaign using Mailchimp with merge tags/fields
Send an interactive Slack channel message (using buttons)
Have the message received by managers and peers along with a test email for review
But there’s more:
In a second workflow supporting the same use case, you’ll see:
Your campaign sent to target colleagues for approval
If the “Approve” button is clicked, a Jira/Zendesk ticket is created for the marketing design team
But—if the “Reject” button is pushed, colleagues will be alerted via Slack message
Join us to learn more about this new, human-in-the-loop capability, brought to you by Integration Service connectors.
And...
Speakers:
Akshay Agnihotri, Product Manager
Charlie Greenberg, Host
Generating a custom Ruby SDK for your web service or Rails API using Smithyg2nightmarescribd
Have you ever wanted a Ruby client API to communicate with your web service? Smithy is a protocol-agnostic language for defining services and SDKs. Smithy Ruby is an implementation of Smithy that generates a Ruby SDK using a Smithy model. In this talk, we will explore Smithy and Smithy Ruby to learn how to generate custom feature-rich SDKs that can communicate with any web service, such as a Rails JSON API.
Generating a custom Ruby SDK for your web service or Rails API using Smithy
Virdata: lessons learned from the Internet of Things and M2M Cloud Services @ IBM Big Data Developers Meetup
1. Big Data Developers - Virdata, Internet of Things #virdata
Big Data & IoT: lessons learned
Big Data Developers Meetup, San Jose, CA - June 5, 2014
#virdata | @nathan_gs
2.
Who is Technicolor?
Domains
● Media Services
● Entertainment Services
● Connected Home
● Emerging Ventures
● Technology & Innovations
Who We Are
Technicolor, a worldwide technology leader in the media and entertainment sector, is at the forefront of digital innovation. Our world-class research and innovation laboratories and our creative talent pool enable us to lead the market in delivering advanced services to content creators and distributors. We also benefit from an extensive intellectual property portfolio focused on imaging and sound technologies, supporting our thriving licensing business.
3.
Virdata – OUR CORE CLOUD SERVICES
● Device Monitoring
● Device Management
● Big Data Analytics
● Big Data Queries
● Application Monitoring
(Diagram: devices connect to the Virdata Cloud APIs over MQTT.)
4.
Virdata - 2 COMPONENTS: A CLOUD & A LIBRARY
The Cloud:
★ Elastic and scalable, built on cutting-edge technologies
★ APIs for different types of information/data consumption
★ Cloud agnostic through self-built monitoring tools
★ Running on both public & private cloud infrastructure
★ Bi-directional messaging
★ High-performance broker architecture
The Library:
★ Lightweight and portable
★ Multiple programming languages
★ Supports multiple transport protocols
★ Available for all HW and OS
★ Supports any type of data in any format/syntax
★ Payload is compressed and encrypted
5.
Virdata - SERVICE ARCHITECTURE
● Millions of simultaneous persistent bi-directional connections
● Millions of messages per second
● Real-time Complex Event Processing
● Distributed Pub/Sub Messaging
● Historical Data Archiving, Pre-computed Data, In-Memory real-time Data
● REST API: Launch Queries, Launch Jobs
(Diagram: integration and customization hooks feed NOC, operations, management reports, trends, and analytics.)
6.
Virdata - VERTICAL INDUSTRIES
AUTOMOTIVE
● Fleet Management
● Insurance
● Emergency Services
UTILITIES
● Remote Meter Management
● Monitor Energy Consumption
● Optimize Subscription Plan
CONSUMER ELECTRONICS
● Monitoring & Management
● Upsell Services
● Enhanced End User Experience
CUSTOMER CARE
● Monitor Device & Application
● One Button Care
● Call Avoidance
RETAIL
● Geo-location Based Adverts
● Heat Mapping
● Individualized Offering
HEALTH
● Promote Patient Independence
● Time-Series Analysis
● Pro-active Responses
7.
Live Demo
Contact us for a live demo at info@virdata.com or virdata.com.
8.
Connected “Things”
9.
Huge variety in devices and OSs.
10.
Virdata Client Libraries
12.
Northbound and Southbound API
Northbound API = Cloud API
● Messaging API
○ REST
○ PUB/SUB
○ MQTT
○ JMS
● Data Processing API
○ SQL
○ JobAPI
○ Query/REST
Southbound API provided at the device level.
13.
Integration of Virdata into IBM BlueMix
Objectives
• Show the strengths of the Virdata Internet of Things platform
• Scalability to support millions of connected devices
• Real-time and historical data processing
• Cloud APIs powering new data-driven services across vertical markets
• Demonstrate the power of the IBM BlueMix solution
• Rapid development and deployment of new applications
• Platform as a Service marketplace
• Highlight the value of combining both
• Internet of Things platform as a service
Use-case
• Virdata provides real-time car data
• App acts upon car trouble codes
• Invokes manufacturer analytics service
• Initiates recommended actions, e.g. through Maximo workflow service
• Schedules car dealer appointment
• Informs the car driver
14.
Messaging & Broker
15.
Messaging Architecture: Device to Platform
(Diagram: protocol adapters receive device traffic and publish to a Kafka cluster; Storm consumes from Kafka, backed by state stores, and feeds the API and the Data Processing API.)
16.
Messaging Architecture: Device to Device(s)
(Diagram: the same protocol adapter, Kafka, and Storm pipeline, with messages routed back out through the adapters to other devices.)
17.
Messaging Architecture: Large Fan Out
(Diagram: the same pipeline, with one inbound message fanned out through Kafka and Storm to many subscribed devices.)
18.
Horizontally scalable… and elastic as well.
Messaging
19.
Persistent connections
Broker
20.
Real-time bidirectional communication
21.
MQTT
Pub/Sub
Protocol Adaptor
22.
MQTT: QoS levels
QoS 0: best effort
QoS 1: at least once
QoS 2: exactly once
Protocol Adaptor
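The three QoS levels trade delivery guarantees against overhead. As a toy illustration of what QoS 1 (at least once) implies, not the MQTT wire protocol or Virdata's implementation, a publisher can keep retransmitting until an acknowledgement arrives, which is exactly why the receiver must tolerate duplicates:

```python
import random

def publish_qos1(send, max_retries=5):
    """Deliver one message with at-least-once semantics:
    retransmit until the receiver acknowledges."""
    for attempt in range(1, max_retries + 1):
        acked = send()          # returns True if an ack came back
        if acked:
            return attempt      # number of transmissions used
    raise RuntimeError("delivery failed after retries")

# A lossy channel: the receiver records every copy it sees,
# but the acknowledgement itself may be lost.
random.seed(1)
received = []

def lossy_send():
    received.append("sensor-reading")   # the message always arrives here
    return random.random() > 0.5        # ...but the ack may be lost

attempts = publish_qos1(lossy_send)
print(attempts, len(received))  # duplicates: one copy received per attempt
```

With a lost ack, the receiver sees the same reading twice; QoS 2 would add a handshake to deduplicate, and QoS 0 would simply not retry.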
25.
Message passing
Storm
26.
Stream/Message partitioning, as well as grouping.
Storm
27.
Storm
(Diagram: Nimbus coordinates the cluster through ZooKeeper; each worker node runs a Supervisor hosting multiple executors.)
28.
Storm
Tuple & Stream
(Diagram: a tuple is a list of fields, Field 1 | Field 2 | Field 3 | Field 4 | Field 5; a stream is an unbounded sequence of tuples.)
29.
Storm
Spout & Bolt
(Diagram: a spout emits tuples into the topology; bolts consume tuple streams, transform them, and feed further bolts or the API.)
30.
Storm
Grouping
(Diagram: a spout's stream is split across multiple bolt instances according to a grouping.)
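A grouping decides which bolt task receives each tuple. A minimal sketch of the idea behind a fields grouping (hypothetical helper, not Storm's actual API): hash the grouping field so that all tuples with the same key land on the same task, whereas a shuffle grouping would spread tuples evenly regardless of key.

```python
from zlib import crc32

def fields_grouping(tuple_, field, num_tasks):
    """Route a tuple to a bolt task by hashing one of its fields,
    so equal keys always go to the same task."""
    key = str(tuple_[field]).encode()
    return crc32(key) % num_tasks

tuples = [{"city": "Antwerp"}, {"city": "Cologne"}, {"city": "Antwerp"}]
tasks = [fields_grouping(t, "city", 4) for t in tuples]
# The two "Antwerp" tuples are routed to the same task:
assert tasks[0] == tasks[2]
print(tasks)
```

Keeping equal keys on one task is what makes per-key aggregations (counts, sums) possible without cross-task coordination.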
32.
Events used to manipulate the master data.
Events: Before
33.
Today, events are the master data.
Events: After
34.
Let’s store everything.
Data System
35.
Data is Immutable.
Data System
36.
Data is Time Based.
Data System
37.
The data you query is often transformed, aggregated, ...
Rarely used in its original form.
Query
38.
Query = function ( all data )
Query
39.
Functional computation, based on immutable inputs, is idempotent.
Batch Layer
40.
Query: number of cars living in each city

Car          | Location | Timestamp
BMW 1        | Antwerp  | 2008-10-11
Aston Martin | Cologne  | 2010-01-23
BMW 2        | Antwerp  | 2012-09-12
BMW 1        | Cologne  | 2014-04-29

Result (based on each car's latest location):

Location | Count
Antwerp  | 1
Cologne  | 2
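Computed as a pure function over the full, immutable event log, the view above keeps only each car's latest location and then counts per city. A small Python sketch of that batch computation:

```python
from collections import Counter

# The append-only event log: (car, location, timestamp)
events = [
    ("BMW 1",        "Antwerp", "2008-10-11"),
    ("Aston Martin", "Cologne", "2010-01-23"),
    ("BMW 2",        "Antwerp", "2012-09-12"),
    ("BMW 1",        "Cologne", "2014-04-29"),
]

def cars_per_city(events):
    """Batch view: latest location per car, then count per city."""
    latest = {}
    for car, location, ts in sorted(events, key=lambda e: e[2]):
        latest[car] = location      # later timestamps overwrite earlier ones
    return Counter(latest.values())

view = cars_per_city(events)
print(dict(view))  # {'Cologne': 2, 'Antwerp': 1}
```

Because the inputs are immutable and the function is deterministic, rerunning it always reproduces the same view, which is the idempotence property the batch layer relies on.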
41.
Query
(Diagram: All Data → Precomputed View → Query.)
42.
Layered Architecture
● Batch Layer
● Speed Layer
● Serving Layer
43.
Layered Architecture
(Diagram: incoming data feeds Spark and Cassandra (C*); queries are served from the resulting views.)
45.
Batch Layer
(Diagram: incoming data → Spark → Cassandra (C*).)
46.
Batch Layer
The batch layer can calculate anything, given enough time...
Unrestrained computation.
47.
Keep the data in its original format.
The batch layer stores the data normalized; the generated views are often, if not always, denormalized.
Batch Layer
48.
Horizontally scalable.
Batch Layer
49.
Stores a master copy of the data set… append only.
Batch Layer
50.
High Latency.
Let’s for now pretend the update latency doesn’t matter.
Batch Layer
52.
In-memory storage
Spark
53.
Advanced DAG execution engine
Cyclic data flows, in-memory computing.
Spark
54.
Multilanguage support, interactive shells
Scala, Java & Python
Spark
55.
Write programs in terms of transformations on distributed datasets.
RDDs are collections of objects, stored in RAM or on disk, built through parallel transformations, and automatically rebuilt on failure.
Spark
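In Spark these transformations are lazy and run distributed over RDD partitions; the same dataflow style can be mimicked with plain Python for illustration (this sketch runs locally and is not PySpark):

```python
from functools import reduce

# A "dataset" of raw sensor readings: (device_id, value)
readings = [("dev1", 3), ("dev2", 5), ("dev1", 4), ("dev3", 1), ("dev2", 2)]

# filter: drop low readings (in Spark: rdd.filter(...))
filtered = [r for r in readings if r[1] >= 2]

# reduceByKey: sum values per device (in Spark: rdd.reduceByKey(...))
def reduce_by_key(pairs, fn):
    acc = {}
    for k, v in pairs:
        acc[k] = fn(acc[k], v) if k in acc else v
    return acc

totals = reduce_by_key(filtered, lambda a, b: a + b)
count = reduce(lambda n, _: n + 1, filtered, 0)   # count: a fold over the data
print(totals, count)  # {'dev1': 7, 'dev2': 7} 4
```

In the real API the pipeline would read `rdd.filter(...).reduceByKey(...)`, with Spark handling partitioning and recomputation of lost partitions.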
56.
Spark: API
map, reduce
57.
Spark: API
map, filter, groupBy, sort, union, join, leftOuterJoin, rightOuterJoin, count, fold, reduceByKey, groupByKey, reduce, cogroup, cross, zip, sample, take, first, partitionBy, mapWith, pipe, save, ...
58.
Spark Ecosystem
(Diagram: Spark core runs atop HDFS, Tachyon, Mesos and YARN; around it sit Spark Streaming, Shark / Spark SQL, GraphX, MLlib, Mahout, MR v1, BlinkDB and Velox.)
59.
Every iteration produces the views from scratch.
Batch Layer
60.
Batch View Databases
We need a (read-only) database to store those views.
61.
Example: the automotive market
● Real Time Tracking
● Engine Block Performance
● Fleet Management
● 3rd Party API integration
● Integration with Informix
● Big Data Visualization
● 3rd Party Application Creation
● BlueMix Platform as a Service
● Process Integrations
(Columns: The Open Source Route; Enterprise Integration; Bringing Analytics to the Data.)
62.
Batch Layer
(Diagram: a timeline; data absorbed into batch views lags behind “now”, with the last few hours of data not yet absorbed.)
We are not done yet…
64.
Speed Layer
(Diagram: incoming data also flows into a Cassandra (C*) real-time view, alongside the Spark batch path.)
65.
Stream processing.
Speed Layer
66.
Continuous computation.
Speed Layer
67.
Storing a limited window of data.
Compensating for the last few hours of data.
Speed Layer
68.
All the complexity is isolated in the Speed Layer.
If anything goes wrong, it’s auto-corrected.
Speed Layer
69.
CAP
You have a choice between:
● Availability: queries are eventually consistent
● Consistency: queries are consistent
(Diagram: the CAP triangle: Consistency, Availability, Partition Tolerance.)
70.
Eventual accuracy
Some algorithms are hard to implement in real-time.
For those cases we could estimate the results.
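One illustrative estimation technique (not necessarily what Virdata uses): count only the items whose hash lands in a small slice of the hash space, then scale up. This approximates a distinct count in bounded memory, which suits a speed layer that cannot hold the full data set:

```python
import hashlib

def approx_distinct(items, slices=16):
    """Estimate the number of distinct items by tracking only those
    whose hash falls in 1 slice out of `slices` of the hash space."""
    sampled = set()
    for item in items:
        h = int(hashlib.md5(item.encode()).hexdigest(), 16)
        if h % slices == 0:             # keep roughly 1/16th of distinct keys
            sampled.add(item)
    return len(sampled) * slices

items = [f"device-{i}" for i in range(1000)]
# Feeding duplicates does not inflate the estimate, since we track a set:
estimate = approx_distinct(items * 3)
print(estimate)
```

The estimate converges on the true count (1000 here) with an error that shrinks as more slices are tracked; production systems typically use sketches such as HyperLogLog for the same trade-off.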
78.
Serving Layer
(Diagram: queries are answered from the views built by Spark and stored in Cassandra (C*).)
79.
Serving Layer
Random reads.
80.
This layer queries the batch & real-time views and merges them.
Serving Layer
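That merge can be sketched as a function over the two views: the batch view is complete but stale, the real-time view is fresh but covers only the hours since the last batch run. A minimal sketch with hypothetical view contents:

```python
def serve_query(batch_view, realtime_view):
    """Merge per-key counts from the batch view (everything up to the
    last batch run) and the speed-layer view (the hours since then)."""
    merged = dict(batch_view)
    for key, count in realtime_view.items():
        merged[key] = merged.get(key, 0) + count
    return merged

# Hypothetical views: batch absorbed data up to 02:00, speed covers the rest.
batch_view = {"Antwerp": 120, "Cologne": 80}
realtime_view = {"Cologne": 5, "Berlin": 2}

result = serve_query(batch_view, realtime_view)
print(result)  # {'Antwerp': 120, 'Cologne': 85, 'Berlin': 2}
```

For additive metrics the merge is a simple sum; non-additive metrics need view designs that remain mergeable, which is one of the practical constraints of the architecture.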
81.
Lambda Architecture
82.
Lambda Architecture
The Lambda Architecture can discard any view, batch and real-time, and just recreate everything from the master data.
83.
Mistakes are corrected via recomputation.
Wrote bad data? Remove the data & recompute.
Bug in view generation? Just recompute the view.
Lambda Architecture
84.
Using a new schema?
No problem: keep your data, change your view function F, and regenerate the output.
Lambda Architecture
85.
Data storage is highly optimized.
Lambda Architecture
87.
Cloud Agnostic
Control Plane
88.
IBM SoftLayer
Experiences & Observations
1. Smooth migration from SCE 2.2 to SoftLayer in one month's time, including:
■ Development of a SoftLayer-specific FOG abstraction layer extension to accommodate Virdata’s DevOps tooling (Chef)
■ Complete on-boarding of the Virdata Platform
■ Complete launch of simulation and emulation clusters
■ Very exhaustive and complete API
2. Very constructive and professional support throughout the complete on-boarding process
3. Availability of bare metal seen as a differentiator
89.
Cluster Management & Orchestration
Control Plane
90.
Monitoring and Logging
Control Plane
92.
Virdata - SERVICE ARCHITECTURE (recap)
● Millions of simultaneous persistent bi-directional connections
● Millions of messages per second
● Real-time Complex Event Processing
● Distributed Pub/Sub Messaging
● Historical Data Archiving, Pre-computed Data, In-Memory real-time Data
● REST API: Launch Queries, Launch Jobs
(Diagram: integration and customization hooks feed NOC, operations, management reports, trends, and analytics.)
93.
Questions?
@virdata_iot | #virdata
@nathan_gs
94.
Acknowledgements
I would like to thank Nathan Marz for writing a very insightful book, where the idea of the Lambda Architecture comes from.
● Lambda: Big Data, Nathan Marz (Manning)
● Lambda, Storm: A real-time architecture using Hadoop & Storm, Nathan Bijnens & Geert Van Landeghem, FOSDEM 2013
● Spark: Apache Spark website
● Spark: Apache Spark - the light at the end of the tunnel?, Michael Hausenblas (MapR), Data Science Day Berlin 2014
95.
Thank you
virdata.com | +1 (937) 569 4220 | info@virdata.com
#virdata | @virdata_iot
@nathan_gs | nathan.bijnens@virdata.com