Yaron Ekshtein, Iguazio
April 2019
Real-Time Analytics & Actions at Scale
with Apache Spark and Nuclio (Serverless)
Agenda
§ Current data-science and analytics challenges
§ A continuous and cloud-native architecture
§ What does Serverless have to do with it?
§ Use cases
§ Summary and Q&A
The Surprising Truth About What it Takes to Build a
Machine Learning Product
Source: https://medium.com/thelaunchpad/the-ml-surprise-f54706361a6c
Josh Cogan, Google
The Data-Driven Business Challenge
From Reactive to Proactive and Intelligent
Value
of Data
Time to Action
Real-time Minutes Days
Interactive
Event-Driven
Batch
Evolve Into an Agile Cloud-Native Architecture
Your Business Logic
Consume
Innovate
Cloud Storage and Databases
Any Containerized Microservice
Today: Intelligent App Pipeline is Complex and Siloed
Multiple Management
Interfaces:
Collection and
Exploration
ML Development
and Training
Deployment & Serving
(cloud or edge)
Stream Processing
ETL and Batch ML Training Jobs
Interactive Data Science ML model
Interactive app
Data and
Compute:
Data and
Compute:
Data and
Compute:
Data Engineers
App Developers,
Data Engineers
Data Scientists
Data Sources
Data Lakes/
Warehouses Reports and
Dashboards
Triggers and
Interaction
A Continuous Pipeline, Focused On Production
Real-time and historical data
Train and Test
ML Models
Deploy with
Serverless
Collect, Explore
and Tag Data
Monitor
Triggers and
Interactions
Data Sources
Develop Monitor Deploy
Microservices
Nuclio: Taking Serverless to the Next Level
Open-source Serverless for compute & data intensive tasks
Extreme Performance
§ Zero copy, buffer reuse
§ Up to 400K events/sec/proc
§ GPU Support
Advanced Data & AI Features
§ Auto-rebalance, checkpoints
§ Any trigger source
§ Simple integration
Statefulness
§ Data bindings
§ Shared volumes
§ Context cache
nuclio processor
Event Listeners
Function Workers
Shard 1 Workers
Workers
Shard 2
Shard 3
Shard 4 Workers
DB, MQ, File
Functions
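As a rough illustration of the function-and-worker model on this slide, here is a minimal sketch of a Nuclio-style Python handler. The `handler(context, event)` entry point follows Nuclio's Python runtime convention; everything else is an assumption made for illustration and does not come from the talk: the `sensor_id` and `temperature` fields, the `TEMP_ALERT_THRESHOLD` value, and the `make_local_context` / `make_local_event` helpers, which are hypothetical stand-ins so the function can be exercised outside the platform.

```python
import json
from types import SimpleNamespace

# Hypothetical alert threshold, chosen only for this sketch.
TEMP_ALERT_THRESHOLD = 80.0


def handler(context, event):
    """Nuclio-style entry point: a worker calls this once per incoming event."""
    body = event.body
    # A trigger may deliver raw bytes/text or an already-decoded object.
    if isinstance(body, (bytes, str)):
        body = json.loads(body)

    reading = float(body["temperature"])
    alert = reading > TEMP_ALERT_THRESHOLD
    context.logger.info(f"sensor={body['sensor_id']} alert={alert}")

    # Returning a JSON string keeps the sketch independent of the runtime.
    return json.dumps({"sensor_id": body["sensor_id"], "alert": alert})


def make_local_context():
    """Hypothetical stand-in for the runtime-provided context (local runs only)."""
    logger = SimpleNamespace(info=lambda msg: None)
    return SimpleNamespace(logger=logger)


def make_local_event(payload):
    """Hypothetical stand-in for a trigger event carrying a JSON body."""
    return SimpleNamespace(body=json.dumps(payload).encode("utf-8"))
```

Locally, `handler(make_local_context(), make_local_event({"sensor_id": "s1", "temperature": 91.5}))` returns a JSON string with the alert flag set. On the platform, the context and event objects would be supplied by the runtime rather than by these helpers.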
Building Real-Time Intelligent Apps, the Easy Way!
Use-cases:
Demo Video
https://www.youtube.com/watch?v=vA8Uq7MvxL4
Demo: Voice Driven Real-Time Analytics
Voice
Query
SQL APIAI
Update
Locations SMART HOME
DEVICE
GOOGLE
MAP
SERVICE
WEB UI (REACT)
SQL Query
Use Case: Real-Time Analysis of Financial Data
RT Tweet
Sentiment
Analysis
Tick feed
Analysis
& Tagging
Real-time Dashboard
News Stream
viewer
World Trading Data
Data Exploration
& RT Analysis
• Enriched tweet stream
• Stocks tables
• Stocks + sentiment TSDB
Auto-Healing Network Operations
Predict network outages and avoid them in real-time
§ Cross-correlating real-time data from multiple sources with historical data
§ AI-based predictions trigger pre-programmed actions that fix evolving problems in the network
§ Implemented within weeks
Demo: Predictive Netops Using Serverless + Spark
NLP processing of real-time router logs
NetFlow
data
Exploration &
Correlation
ML Training,
Model export
Failure & Anomaly
prediction
Real-time DB
Real-time
telemetry
Serverless
Spark
Auto-deploy
Real-time Data and AI for Airport Operations
Real-time Database
NoSQL + K/V tables + TSDB
Ingest and Process Data for Intelligent Apps
Staff
roster
Vehicle
Telemetry
Passenger
status
Flight Status
Baggage
status
Flight
Schedule
Events Streams
Scheduled batch
Push / Pull via
REST API
Insights
BI style dashboards
& alerts
Real-time Apps
Dashboards
alerts and actions
Intelligent Apps
Other AI/ML
Systems
Leading Airport Ground Operations uses AI to react faster to schedule changes
§ Quicker ground handling response to flight re-scheduling
§ Operational efficiency and visibility
Time Series Vectors
(Avg, Min/Max, Stdev per sensor)
Process
Sensor Data
• ML Models
• Machine Metadata
• Environmental data
Real-time dashboard
Real-time
Alerts
Predicted
Alerts
Aggregate using
Time Series APIs
Every 6
hours
Every 15
minutes
Devices & Machines
Predict Upload to
Cloud
Query
APIs
Stream
Trigger
NoSQL & Time
Series API
intelligent edge
Web
hook
Update ML
Model
Example: Predictive Maintenance Based on Real-time + Historical Data
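The "Time Series Vectors (Avg, Min/Max, Stdev per sensor)" step can be sketched in plain Python as below. This is only an illustration under stated assumptions, not the time-series API used in the talk: the `(timestamp, sensor_id, value)` input shape, the `aggregate` function name, and the choice of the slide's 15-minute interval as the aggregation window are all assumptions.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Assumption: the slide's 15-minute interval is used as the aggregation window.
WINDOW_SECONDS = 15 * 60


def aggregate(readings, window_seconds=WINDOW_SECONDS):
    """Turn (timestamp, sensor_id, value) readings into per-sensor window vectors."""
    buckets = defaultdict(list)
    for ts, sensor_id, value in readings:
        # Align each reading to the start of its fixed-size window.
        window_start = int(ts // window_seconds) * window_seconds
        buckets[(sensor_id, window_start)].append(value)

    vectors = {}
    for key, values in buckets.items():
        vectors[key] = {
            "avg": mean(values),
            "min": min(values),
            "max": max(values),
            # Population standard deviation; 0.0 when a window has one sample.
            "stdev": pstdev(values),
        }
    return vectors
```

Each resulting vector is keyed by `(sensor_id, window_start)`, so it could be fed to a model as a feature row or written to a time-series table.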
Summary
Build continuous, AI-driven and proactive apps faster
§ Focus on using data, not collecting it
§ Adopt a continuous data and integration approach
§ Consolidate on a cloud-native microservices architecture
§ Use Serverless for faster, agile results
My Email: yarone@Iguazio.com

Real-Time Analytics and Actions Across Large Data Sets with Apache Spark
