1. Fast Data:
A Paradigm for the Demands of Efficient IoT Solutions
Stephen Dillon
Big Data Architect
@StephenDillon15
stephen.dillon@schneider-electric.com
http://www.linkedin.com/in/stephendillon/
2. Agenda
● Goals
● Background
● Genesis of Fast Data
– Why has it emerged?
– IoT
– Big Data
– Influences on Fast Data
● Fast Data
– Define & Describe
– Fog Computing
● Examples of Technologies
– Review of Apache Spark
3. Goals
● Be able to answer “What is Fast Data?”
● Understand why we care about it.
● Expose you to Apache Spark
4. About Me
● IoT platform team
– Innovation team since 2016
● Focus on technology
– 6 months, 1 year, 3 years out
– Big Data & DB Technologies
● NoSQL, NewSQL, Streaming
● Distributed data
– Proofs of concept, Best Practices
● Technical leadership, IP, papers
5. Recent Work in 2016
● Recent white paper:
– “IoT and the Pervasive Nature of Fast Data and Apache Spark”
– bit.ly/1Td6KFU
● Co-inventor on 2 patent submissions
– using Spark
● Co-authored upcoming research paper on
Federated Data queries
7. Why Fast Data?
● Growth of IoT
● Mobility of IoT
– Demands lower latency
● Complexity of analytics
– Graph theory
– Predictive Analytics
– Machine Learning
8. Internet of Things
“The internet of things (IoT) is the network of
physical objects—devices, vehicles, buildings
and other items—embedded with electronics,
software, sensors, and network connectivity that
enables these objects to collect and exchange
data.” - Wikipedia
10. IoT is Not Only about Hardware
It's also what you do with the Data that matters!
11. Why does it matter?
● Sensors collect data
● Data fuels analytics
● Analytics support business
● Derive actionable insights from the data
13. Classic Definition
"...data that exceeds the processing capacity of conventional
database systems. The data is too big, moves too fast, or
doesn't fit the structures of your database architectures. To
gain value from this data, you must choose an alternative
way to process it."
14. Big Data
● Volume
– A lot of it
● Velocity
– Ingress at high frequencies
● Variety
– Multi-structured not unstructured
– Data from disparate sources
– Different data points are captured
15. Influences on Fast Data
● NoSQL
● Hadoop Framework
● MapReduce & Batch Analytics
● NewSQL (in-memory DBs & distributed row
stores)
Led to 3 significant concepts…
16. 3 Significant Concepts
● Distributed Data storage
● Horizontal, shared-nothing, scale-out
architecture
● In-memory processing…RAM is the new disk
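The three concepts above can be illustrated together in a minimal sketch, in plain Python rather than any real cluster framework. The "nodes", sensor names, and shard count are hypothetical: records are hash-partitioned into disjoint shards (distributed storage), each shard is aggregated independently with no shared state (shared-nothing scale-out), and everything stays in memory.

```python
from collections import defaultdict

NUM_NODES = 3  # hypothetical 3-node cluster

def partition(records, num_nodes):
    """Hash-partition records by key so each node owns a disjoint shard."""
    shards = [defaultdict(list) for _ in range(num_nodes)]
    for key, value in records:
        shards[hash(key) % num_nodes][key].append(value)
    return shards

def local_aggregate(shard):
    """Each node sums its own keys entirely in memory -- no shared state."""
    return {key: sum(values) for key, values in shard.items()}

def merge(partials):
    """Combine per-node partial results; keys are disjoint across shards."""
    out = {}
    for partial in partials:
        out.update(partial)
    return out

readings = [("sensor-a", 2), ("sensor-b", 5), ("sensor-a", 3), ("sensor-c", 1)]
shards = partition(readings, NUM_NODES)
result = merge(local_aggregate(s) for s in shards)
# result == {"sensor-a": 5, "sensor-b": 5, "sensor-c": 1}
```

Because each shard can be aggregated on a different machine with no coordination until the final merge, adding nodes scales the work horizontally.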
18. Definition
Fast Data is a paradigm that supports "...as-it-happens information
enabling real-time decision-making" [1]. It encompasses not only
the ingestion of data at speed but also the processing of the
data, deriving actionable insights from it, and the speed of delivery
of the results. It truly encompasses the Variety and Volume of data
at Velocity in all aspects.
[1] Alissa Lorentz, “Big Data, Fast Data, Smart Data”
19. Characteristics
● It’s a paradigm, not a technology.
● A subset of Big Data
● Describes data in motion
● Data ingestion is a key tenet but...
– Not only about Velocity of data ingestion
20. Fast Data Solutions
● Streaming
● Interactive queries (batch & real-time)
● In-memory capability
● Provides low latency for ingestion, processing, and delivery
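The streaming bullet can be made concrete with a minimal sketch in plain Python (not any specific streaming product). It groups timestamped sensor events into fixed, non-overlapping time windows and counts readings per device; the window width and device names are hypothetical.

```python
from collections import defaultdict

WINDOW_SECONDS = 10  # hypothetical window width

def tumbling_window_counts(events, window=WINDOW_SECONDS):
    """Group (timestamp, device_id) events into fixed, non-overlapping
    windows and count readings per device in each window. Results are
    available as soon as a window closes, not after a nightly batch job."""
    counts = defaultdict(lambda: defaultdict(int))
    for ts, device in events:
        window_start = ts // window * window
        counts[window_start][device] += 1
    return {start: dict(devices) for start, devices in sorted(counts.items())}

stream = [(1, "pump-1"), (4, "pump-2"), (9, "pump-1"), (12, "pump-1")]
print(tumbling_window_counts(stream))
# {0: {'pump-1': 2, 'pump-2': 1}, 10: {'pump-1': 1}}
```

The key Fast Data property is that the answer for window [0, 10) exists the moment that window closes, so downstream decisions can be made with seconds of latency instead of hours.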
23. Fog Computing: What is it?
● Similar to, but distinct from, “Edge” computing
– Fog pushes processing to a Fog node or gateway
– Edge places it on the devices themselves
● A decentralized computing infrastructure
● Move your compute resources & application
services closer to the data
● The goal is to improve efficiency and reduce the
amount of data that must be transported to the
cloud for processing, analysis, and storage.
26. Apache Spark
● A distributed compute engine that supports Fast Data via its
in-memory, distributed processing capability and its bundled APIs.
It can "...run programs up to 100x faster than Hadoop
MapReduce in memory, or 10x faster on disk."
– Databricks
27. What Makes it Fast?
● MapReduce on steroids
1. Spark passes data directly to other operations
2. In-memory processing of distributed data
3. A long-running JVM on each executor (no per-task startup cost)
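Point 1 above (passing data directly between operations) can be sketched in plain Python, not the Spark API: chained generators hand each record straight to the next stage, much as Spark pipelines narrow transformations in memory, rather than writing every intermediate result to disk the way classic Hadoop MapReduce does between jobs. The temperature pipeline and threshold are hypothetical.

```python
def parse(lines):
    """Stage 1: parse raw strings into numbers, one record at a time."""
    for line in lines:
        yield float(line)

def to_celsius(temps_f):
    """Stage 2: transform each record as it arrives from stage 1."""
    for f in temps_f:
        yield (f - 32) * 5 / 9

def above(temps_c, threshold):
    """Stage 3: filter; records flow through without being materialized."""
    for c in temps_c:
        if c > threshold:
            yield c

raw = ["212", "32", "98.6"]
# Each reading flows through all three stages before the next is read:
hot = list(above(to_celsius(parse(raw)), 30))
# hot contains the two readings above 30 °C
```

No intermediate list (let alone a file on disk) is built between stages, which is the essence of why in-memory pipelining beats write-read-write batch processing.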
32. STATE OF THE ART
Commercial & Open-Source
33. Spark Core Concepts
● RDD - Resilient Distributed Dataset
– Join RDDs from different sources
● Dataframes
– Allow you to work with data in a table structure
– API for building a relational query plan
● Exactly-once semantics
– Each record affects the result once, with no duplicates
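The exactly-once bullet can be sketched in plain Python (this is not the Spark API, which achieves the guarantee via checkpointing and write-ahead logs): tracking already-seen event ids means a redelivered message is ignored instead of double-counted. The event ids and amounts are hypothetical.

```python
def process(events):
    """Sum amounts so that each logical event counts exactly once,
    even if the transport layer delivers some messages more than once."""
    seen, total = set(), 0
    for event_id, amount in events:
        if event_id in seen:   # duplicate delivery -- skip it
            continue
        seen.add(event_id)
        total += amount        # each event affects the result once
    return total

# "evt-2" is delivered twice, but counted once:
events = [("evt-1", 10), ("evt-2", 5), ("evt-2", 5), ("evt-3", 1)]
print(process(events))  # 16
```

Without the dedup set the total would be 21; the idempotent consumer is what turns at-least-once delivery into an exactly-once result.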