Apache Spark Core APIs: RDDs, DataFrame, Datasets
Libraries on top of the core: Spark SQL, Structured Streaming, MLlib (machine learning), GraphX (graph)
Spark: The Definitive Guide
Source: http://spark.apache.org/
Structured APIs: Datasets, DataFrames, SQL
Low-Level APIs: RDDs, Distributed Variables
Built on these APIs: Structured Streaming, Advanced Analytics, Libraries & Ecosystem
RDD lineage: RDD -> RDD -> ... -> RDD via transformations; an action on the final RDD returns a value
Transformations: select, distinct, groupBy, sum, orderBy, filter, limit, summarize
Actions: show, count, collect, save, first
… and much more
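The transformation/action split above is what makes Spark lazy: transformations only extend a plan, and nothing runs until an action asks for a result. The toy class below (a hypothetical `LazyDataset` in plain Python, not part of Spark) sketches that idea; its method names loosely mirror the Spark operations listed above. Real Spark additionally optimizes the plan (via the Catalyst optimizer for DataFrames and SQL) and runs it across the cluster.

```python
from typing import Any, Callable, Iterable, List, Optional

# One pending step of the plan: takes the current rows, returns new rows.
Step = Callable[[List[Any]], List[Any]]


class LazyDataset:
    """Toy stand-in for a Spark RDD/DataFrame (hypothetical, not a Spark API).

    Transformations only append a step to the plan; actions run the plan.
    """

    def __init__(self, source: Iterable[Any], plan: Optional[List[Step]] = None):
        self._source = list(source)
        self._plan: List[Step] = plan or []
        self.executions = 0  # how many times an action actually ran the plan

    def _with(self, step: Step) -> "LazyDataset":
        # Transformations return a NEW dataset; the original is left unchanged.
        return LazyDataset(self._source, self._plan + [step])

    # ---- transformations (lazy: nothing is computed here) ----
    def filter(self, pred: Callable[[Any], bool]) -> "LazyDataset":
        return self._with(lambda rows: [r for r in rows if pred(r)])

    def select(self, fn: Callable[[Any], Any]) -> "LazyDataset":
        return self._with(lambda rows: [fn(r) for r in rows])

    def distinct(self) -> "LazyDataset":
        # dict.fromkeys keeps first occurrences in order (rows must be hashable)
        return self._with(lambda rows: list(dict.fromkeys(rows)))

    def order_by(self, key: Optional[Callable[[Any], Any]] = None) -> "LazyDataset":
        return self._with(lambda rows: sorted(rows, key=key))

    def limit(self, n: int) -> "LazyDataset":
        return self._with(lambda rows: rows[:n])

    # ---- actions (eager: run every pending step, return a value) ----
    def _run(self) -> List[Any]:
        self.executions += 1
        rows = list(self._source)
        for step in self._plan:
            rows = step(rows)
        return rows

    def collect(self) -> List[Any]:
        return self._run()

    def count(self) -> int:
        return len(self._run())

    def first(self) -> Any:
        return self._run()[0]


ratings = [5, 3, 4, 3, 5, 1]
pipeline = (
    LazyDataset(ratings)
    .filter(lambda r: r >= 3)
    .distinct()
    .order_by()
    .select(lambda r: r * 10)
)
print(pipeline.executions)  # 0: transformations alone ran nothing
print(pipeline.collect())   # [30, 40, 50]
print(pipeline.count())     # 3, recomputed from the source
```

Calling `count()` after `collect()` reruns the whole plan. Spark behaves the same way by default and only avoids the recomputation if you `cache()` or `persist()` the data.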
Driver: Spark Session + user code
Cluster Manager
Executors: Executor, Executor, Executor
Distributed Data Structure: split into partitions (six in the diagram), processed in parallel by the executors
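To make the driver/executor split concrete, here is a toy, single-machine model in Python. The names `split_into_partitions` and `run_job` are hypothetical, and a thread pool stands in for the executors, which in Spark are separate JVM processes on worker nodes obtained through the cluster manager. The driver cuts the data into partitions, schedules one task per partition, and combines the partial results.

```python
import operator
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from typing import Any, Callable, List, Sequence


def split_into_partitions(data: Sequence[Any], num_partitions: int) -> List[List[Any]]:
    """Cut data into contiguous partitions whose sizes differ by at most one."""
    if num_partitions < 1:
        raise ValueError("num_partitions must be >= 1")
    size, extra = divmod(len(data), num_partitions)
    partitions, start = [], 0
    for i in range(num_partitions):
        end = start + size + (1 if i < extra else 0)
        partitions.append(list(data[start:end]))
        start = end
    return partitions


def run_job(
    data: Sequence[Any],
    num_partitions: int,
    num_executors: int,
    task: Callable[[List[Any]], Any],
    combine: Callable[[Any, Any], Any],
) -> Any:
    """Toy driver: one task per partition, run on a pool standing in for executors."""
    partitions = split_into_partitions(data, num_partitions)
    with ThreadPoolExecutor(max_workers=num_executors) as executors:
        # map() yields results in the same order as the partitions
        partial_results = list(executors.map(task, partitions))
    # Back on the driver: merge the per-partition results into one value.
    return reduce(combine, partial_results)


total = run_job(
    data=list(range(1, 101)),
    num_partitions=6,   # six partitions, as in the diagram
    num_executors=3,    # three executors, as in the diagram
    task=lambda part: sum(x * x for x in part),
    combine=operator.add,
)
print(total)  # 338350
```

Summing per partition and then adding the partial sums only works because addition is associative and commutative; Spark places the same requirement on the function passed to its `reduce` action.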
AZURE DATABRICKS: managed Apache Spark platform optimized for Microsoft Azure
Collaborative Workspace for the data engineer, data scientist, and business analyst
Deploy Production Jobs & Workflows: multi-stage pipelines, job scheduler, notifications & logs
Optimized Databricks Runtime Engine: Apache Spark, Databricks I/O, Serverless
Data sources: Cloud storage, Data warehouses, Hadoop storage, IoT / streaming data, REST APIs
Outputs: Machine learning models, BI tools, Data exports, Data warehouses
Enhance productivity; Build on a secure & trusted cloud; Scale without limits
AZURE DATABRICKS in the Azure ecosystem
INGEST: Kafka on HDInsight, Event Hubs
STORAGE: Azure Storage, Azure Data Lake
ORCHESTRATION: Data Factory
VISUALIZE: Power BI
SECURE: Azure Active Directory
Also shown: Cosmos DB, SQL DW
DBFS (Databricks File System): backed by Azure Storage blob, accessible from the CLI
MovieLens dataset: https://movielens.org/
F. Maxwell Harper and Joseph A. Konstan. 2015. The MovieLens Datasets: History and Context. ACM Transactions on Interactive Intelligent Systems (TiiS) 5, 4, Article 19 (December 2015), 19 pages. DOI=http://dx.doi.org/10.1145/2827872
https://github.com/devlace/azure-databricks-recommendation-system
Official Apache Spark website
Azure Databricks Documentation
[Book] Spark: The Definitive Guide
Spark as a Service with Azure Databricks
