Fast Data:
A Paradigm for the Demands of Efficient IoT Solutions
Stephen Dillon
Big Data Architect
@StephenDillon15
stephen.dillon@schneider-electric.com
http://www.linkedin.com/in/stephendillon/
Agenda
● Goals
● Background
● Genesis of Fast Data
– Why has it emerged?
– IoT
– Big Data
– Influences on Fast Data
● Fast Data
– Define & Describe
– Fog Computing
● Examples of Technologies
– Review of Apache Spark
Goals
● Be able to answer “What is Fast Data?”
● Understand why we care about it.
● Expose you to Apache Spark
About Me
● IoT platform team
– Innovation team since 2016
● Focus on technology
– 6 months, 1 year, 3 years out
– Big Data & DB Technologies
● NoSQL, NewSQL, Streaming
● Distributed data
– Proofs of concept, Best Practices
● Technical leadership, IP, papers
Recent Work in 2016
● Recent white paper:
– “IoT and the Pervasive Nature of Fast Data and Apache Spark”
– bit.ly/1Td6KFU
● Co-inventor on 2 patent submissions
– using Spark
● Co-authored upcoming research paper on
Federated Data queries
GENESIS OF FAST DATA
Why Fast Data?
● Growth of IoT
● Mobility of IoT
– Demands lower latency
● Complexity of analytics
– Graph theory
– Predictive Analytics
– Machine Learning
Internet of Things
“The internet of things (IoT) is the network of
physical objects—devices, vehicles, buildings
and other items—embedded with electronics,
software, sensors, and network connectivity that
enables these objects to collect and exchange
data.” - Wikipedia
Stephen Dillon, Schneider Electric
IoT is Not Only about Hardware
It's also what you do with the Data that matters!
Why does it matter?
● Sensors collect data
● Data fuels analytics
● Analytics support business
● Derive actionable insights from the data
Big Data
Classic Definition
"...data that exceeds the processing capacity of conventional
database systems. The data is too big, moves too fast, or
doesn't fit the structures of your database architectures. To
gain value from this data, you must choose an alternative
way to process it."
Big Data
● Volume
– A lot of it
● Velocity
– Ingress at high frequencies
● Variety
– Multi-structured not unstructured
– Data from disparate sources
– Different data points are captured
Influences on Fast Data
● NoSQL
● Hadoop Framework
● MapReduce & Batch Analytics
● NewSQL (in-memory DBs & distributed row
stores)
Led to 3 significant concepts…
3 Significant Concepts
● Distributed Data storage
● Horizontal, shared-nothing, scale-out
architecture
● In-memory processing…RAM is the new disk
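A minimal sketch of how these three concepts fit together, assuming a hypothetical three-node cluster that hash-partitions readings by device ID. The names `Node` and `partition_for` and the sample device IDs are illustrative and do not come from the talk.

```python
import zlib
from collections import defaultdict

class Node:
    """Hypothetical shared-nothing node: it owns its own in-memory store."""
    def __init__(self, name):
        self.name = name
        self.store = defaultdict(list)  # device_id -> readings held in RAM

    def write(self, device_id, reading):
        self.store[device_id].append(reading)

def partition_for(device_id, node_count):
    # Stable hash so the same device always maps to the same node.
    return zlib.crc32(device_id.encode("utf-8")) % node_count

# Scale-out: capacity grows by adding nodes, not by growing one machine.
nodes = [Node(f"node-{i}") for i in range(3)]

readings = [("meter-1", 20.5), ("meter-2", 19.0), ("meter-1", 21.0), ("meter-3", 22.4)]
for device_id, value in readings:
    owner = nodes[partition_for(device_id, len(nodes))]
    owner.write(device_id, value)

# Each device's readings live on exactly one node; no state is shared.
owners_of_meter_1 = [n.name for n in nodes if "meter-1" in n.store]
```

Real systems usually refine the modulo scheme (for example with consistent hashing) so that adding a node does not remap most keys.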
FAST DATA
Describe, Define, Detail
Definition
Fast Data is a paradigm that supports "...as-it-happens information
enabling real-time decision-making.” [1] It encompasses not only
the ingestion of data at speed but also the processing of the
data, deriving actionable insights from it, and the speed of delivery
of the results. It truly encompasses the Variety and Volume of data
at Velocity in all aspects.
[1] Alissa Lorentz, “Big Data, Fast Data, Smart Data”
Characteristics
● It’s a paradigm, not a technology.
● A subset of Big Data
● Describes data in motion
● Data ingestion is a key tenet but...
– Not only about Velocity of data ingestion
Fast Data Solutions
● Streaming
● Interactive queries (batch & real-time)
● In-memory capability
● Low latency across ingestion, processing, and
delivery
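To make the ingestion, processing, and delivery chain concrete, here is a small sketch of a streaming aggregate held in memory. It assumes events arrive in timestamp order and uses a hypothetical `SlidingWindowAverage` class; it is not taken from any particular Fast Data product.

```python
from collections import deque

class SlidingWindowAverage:
    """Running average over the last `window_seconds` of events."""
    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.events = deque()  # (timestamp, value), oldest first
        self.total = 0.0

    def ingest(self, timestamp, value):
        # Ingestion: add the new event to the in-memory window.
        self.events.append((timestamp, value))
        self.total += value
        # Processing: evict events that have fallen out of the window.
        while self.events[0][0] <= timestamp - self.window_seconds:
            _, old_value = self.events.popleft()
            self.total -= old_value
        # Delivery: the up-to-date result is available immediately.
        return self.total / len(self.events)

window = SlidingWindowAverage(window_seconds=10)
results = [window.ingest(t, v) for t, v in [(0, 10.0), (5, 20.0), (12, 30.0)]]
# results == [10.0, 15.0, 25.0]
```

Each event updates the answer as it arrives, which is the contrast with a batch job that recomputes over stored data later.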
High-Level
FOG COMPUTING
The Evolution of Fast Data
What is it?
● Similar to, but not the same as, “Edge” computing
– Fog pushes processing to a Fog node or gateway
– Edge places it on devices
● A decentralized computing infrastructure
● Move your compute resources & application
services closer to the data
● The goal is to improve efficiency and reduce the
amount of data that needs to be transported to
the cloud for data processing, analysis and
storage.
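The data-reduction goal can be sketched as follows, assuming a hypothetical fog node that summarizes a batch of raw readings and forwards only the summary plus any out-of-range readings. The function name, the threshold, and the sample data are illustrative assumptions.

```python
import json
from statistics import mean

def summarize_at_fog_node(readings, alert_threshold):
    """Reduce raw sensor readings to one summary plus any alerts."""
    values = [r["value"] for r in readings]
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        # Only out-of-range readings are forwarded in full.
        "alerts": [r for r in readings if r["value"] > alert_threshold],
    }

# 100 raw readings collected locally, one of them anomalous.
raw = [{"sensor": "temp-1", "ts": t, "value": 20.0 + (t % 5)} for t in range(100)]
raw[42]["value"] = 95.0

summary = summarize_at_fog_node(raw, alert_threshold=80.0)

# Compare what would travel to the cloud in each case.
raw_bytes = len(json.dumps(raw))
summary_bytes = len(json.dumps(summary))
```

The summary payload is a small fraction of the raw batch, and the anomalous reading still reaches the cloud intact.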
Fog Computing
APACHE SPARK
Apache Spark
Apache Spark
● A distributed compute engine that supports Fast Data via its
in-memory, distributed processing capability and its bundled APIs.
It can "...run programs up to 100x faster than Hadoop
MapReduce in memory, or 10x faster on disk."
- Databricks
What Makes it Fast?
● MapReduce on steroids
1. Spark passes data directly to other operations
2. In-memory processing of distributed data
3. JVM on each executor
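Points 1 and 2 can be illustrated by analogy with plain Python generators. This is a sketch of the idea of handing records straight from one operation to the next in memory, not Spark's actual implementation; the function names and sample lines are hypothetical.

```python
def parse(lines):
    # Step 1: turn each raw line into a (device, value) record.
    for line in lines:
        device, value = line.split(",")
        yield device, float(value)

def only_hot(records, threshold):
    # Step 2: keep only records above the threshold.
    for device, value in records:
        if value > threshold:
            yield device, value

def count_by_device(records):
    # Step 3: aggregate; this is the step that pulls data through the chain.
    counts = {}
    for device, _ in records:
        counts[device] = counts.get(device, 0) + 1
    return counts

lines = ["a,71.5", "b,64.0", "a,80.2", "c,90.1", "b,75.3"]

# Nothing runs until count_by_device consumes the records, and no
# intermediate list is built (or written out) between the steps.
hot_counts = count_by_device(only_hot(parse(lines), threshold=70.0))
# hot_counts == {"a": 2, "c": 1, "b": 1}
```

A classic MapReduce chain would instead persist the output of each job before the next one reads it, which is where much of the latency comes from.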
Spark Architecture
Mastering Apache Spark
Misconceptions
● Not a Database!
● Not entirely in-memory
● Not a Hadoop Competitor
● No native distributed file system
Contact
@StephenDillon15
stephen.dillon@schneider-electric.com
http://www.linkedin.com/in/stephendillon/
White paper: “IoT and the Pervasive Nature of Fast Data and Apache Spark”
Via Schneider Electric blog bit.ly/1Td6KFU
EXTRA MATERIALS
If we go fast
STATE OF THE ART
Commercial & Open-Source
Spark Core Concepts
● RDD - Resilient Distributed Dataset
– Join RDDs from different sources
● DataFrames
– Allow you to work with data in a table structure
– API for building a relational query plan
● Exactly Once Semantics
– No Duplicate data
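One general way to get the "no duplicate data" effect is an idempotent sink that ignores events it has already applied, so redelivery after a failure is harmless. The sketch below shows that pattern in plain Python; it is an assumption-laden illustration (the `IdempotentSink` class and event IDs are hypothetical) and does not describe how Spark achieves exactly-once semantics internally.

```python
class IdempotentSink:
    """Applies each event at most once, keyed by a unique event ID."""
    def __init__(self):
        self.seen_ids = set()
        self.total = 0.0

    def write(self, event_id, value):
        if event_id in self.seen_ids:
            return False  # duplicate delivery: ignore it
        self.seen_ids.add(event_id)
        self.total += value
        return True

sink = IdempotentSink()

# "e1" and "e2" are redelivered, as can happen when a source retries.
delivered = [("e1", 5.0), ("e2", 7.0), ("e1", 5.0), ("e3", 1.0), ("e2", 7.0)]
applied = [sink.write(event_id, value) for event_id, value in delivered]
# applied == [True, True, False, True, False]; sink.total == 13.0
```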
Streaming
spark.apache.org
