Lambda Architecture in the Cloud with Azure Databricks with Andrei Varanovich

Andrei Varanovich, InSpark
Lambda Architecture in
the Cloud with Azure
Databricks
#SAISDev6

Selfie
Data & AI Lead
@DrGigabit
andrei.varanovich
andrei.varanovich@inspark.nl
Data
Programmability
Cloud
High-
performance
teams
Neural
Networks
##SAISDev6

Big Data problem is many
small data problems
##SAISDev6

4
2.500.000 visitors per year
8.000 objects of art and history
1.000.000 objects stored from
the year 1200
##SAISDev6

Under the hood
5
Retail
Sponsorship engagement
Occupancy rate
Service Management
Incident Management
Marketing Performance
Warehouse Management
Capacity Planning
Financial Performance
Ticket Sales
##SAISDev6

THE DATA JOURNEY
6##SAISDev6
Consolidation
Insights
Innovation
Consolidate data
in a centralized
store
Organizational
processes and
efficiency
New ideas,
leveraging
machine learning

7
IN THE NEED FOR THE PLATFORM
START
• Begin small and
focused
• Prove value
GROW
• Grow organically
as more use cases
arise
SCALE
• Go production and
scale to the revel
required
We are in the need of the truly elastic data platform, to avoid any upfront planning, deployment and
operations expenses, and put business value discovery first. The platform should support the [big]data
projects in any stage, without the need to reengineer the whole solution.
##SAISDev6

Lambda Architecture on Azure
8
INGEST BATCH
INGEST STREAM STORE ANALYZE
Azure Data
Lake Store
Azure Data Factory
Azure Databricks
(managed Spark;
batch & streaming)
Social
LOB
Graph
IoT
Image
CRM
AI models/ APIs
Cognitive Services
Azure container
Service & registry
Insight sharing
Power BI/ other tools
Event Hubs
Stream
Batch
SECURITY &
MANAGEMENT Azure Log
Analytics
Azure Graph
API
Cost
monitoring
Azure Active Directory
##SAISDev6

9
Optimized Databricks Runtime Engine
DATABRICKS I/O SERVERLESS
Collaborative workspace
Cloud storage
Data warehouses
Hadoop storage
IoT / streaming data
Rest APIs
Machine learning models
BI tools
Data exports
Data warehouses
Azure Databricks
Production jobs & workflows
APACHE SPARK
MULTI-STAGE PIPELINES
DATA ENGINEER
JOB SCHEDULER NOTIFICATION & LOGS
DATA SCIENTIST BUSINESS ANALYST
##SAISDev6

10
Simplicity is the ultimate sophistication
Leonardo da Vinci
##SAISDev6

LAMBDA TO THE RESCUE
12
##SAISDev6

##SAISDev6 13
Composition of
functions is applying
one function to the
result of another

14
f(x) = x+1
(g º f)(x) = g(f(x))
(g º f)(x) = (x+1)2
g(x) = x2
input input+1 input2
##SAISDev6
input+1
(input+1)2

15
Transformation pipeline as a series of transitions
s1
f3
f2
f1
sum

Optimized Databricks Runtime Engine
DATABRICKS I/O SERVERLESS
Collaborative workspace
Cloud storage
Data warehouses
Hadoop storage
IoT / streaming data
Rest APIs
Machine learning models
BI tools
Data exports
Data warehouses
Azure Databricks
Production jobs & workflows
APACHE SPARK
MULTI-STAGE PIPELINES
DATA ENGINEER
JOB SCHEDULER NOTIFICATION & LOGS
DATA SCIENTIST BUSINESS ANALYST
##SAISDev6

Conclusions
17
… with proper design, the features come cheaply. This
approach is arduous, but continues to succeed.
—Dennis Ritchie
##SAISDev6
• Standardization on Apache Spark allows us to move forward without
introducing extra complexity.
• 100% PaaS offering is important – no need to maintain the
infrastructure. All components we use offered as PaaS on Azure.
• Data pipelines as function composition allows us to ensure end-to-end
consistency and spot the errors quickly.
• Saving intermediate states allows to quickly inspect the data sets.

Lambda Architecture in the Cloud with Azure Databricks with Andrei Varanovich

More Related Content

What's hot

Similar to Lambda Architecture in the Cloud with Azure Databricks with Andrei Varanovich

More from Databricks

Recently uploaded

Lambda Architecture in the Cloud with Azure Databricks with Andrei Varanovich