Real-Time Data Streaming with Databricks, Spark and Power BI
Bennie Haelen
Principal Architect – Insight Digital Innovation
Build Real-Time Applications with Databricks Streaming

In this presentation, we study a use case we implemented recently for a large metropolitan fire department. Our company had already built a complete analytics architecture for the department based on Azure Data Factory, Databricks, Delta Lake, Azure SQL and Azure Analysis Services (AAS). While this architecture works very well for the department, they wanted to add a real-time channel to their reporting infrastructure.

This channel should serve up the following information:

• The most up-to-date locations and status of equipment (fire trucks, ambulances, ladders, etc.)
• The current locations and status of firefighters, EMT personnel and other relevant fire department employees
• The current list of active incidents within the city

This information should be visualized through an automatically updating dashboard. Its central component will be a map that automatically updates with the locations and incidents. The view should be as close to real time as possible, because fire chiefs will use it to support real-time decisions on resource and equipment deployments.

In this presentation, we leverage Databricks, Spark Structured Streaming, Delta Lake and the Azure platform to create this real-time delivery channel.


Use Case Description
• Large metropolitan fire department
• Implemented an MDW architecture on Azure
• Based upon the Insight repeatable MDW framework architecture
[Diagram: MDW architecture. Sources: Oracle (PL_DATA_ORA_2_ADLS_FULL) and a CSV file drop zone. Pipelines PL_MT_raw2stage, PL_MT_stage2mdw, PL_DATA_mdw2asql and PL_processAAS move data through the RAW/Archive, STAGE and MDW (.parquet) zones of storage account ins-swdi-lens-adls. Other resources: Ins-swdi-lens-adf, Databricks workspace folders and Hive databases, Ins-swdi-lens-asql, Ins-swdi-lens-aas, Azure Automation, Ins-swdi-lens-lapp, Ins-swdi-lens-email-lapp, Key Vaults, Power BI.]

Use Case Extension
• Need to add a real-time reporting channel
• Up-to-date location & status of equipment
• Location & status of firefighters, EMT personnel
• List of active incidents within the city
• Near real-time visualization
• Automatically updating dashboard
• Map with automatic updates of locations and incidents
• Used by fire chiefs to make real-time move-up decisions
• Pre-emptively move up equipment & resources

Use Case Analysis
• Forwarding of events through the Azure cloud
• ESB exposes a WebSockets interface
• Azure Function reads events from the ESB through the WebSockets interface
• Function forwards the events to the Azure cloud
• Function is hosted in a Web Application
[Diagram: Central FD Database → Change Data Capture (CDC), triggered with each transactional operation → Enterprise Service Bus, which ingests data from the various event sources and forwards events to consumers]
Solution
• Create cloud ingest
• Real-time stream processing
• Performant ACID data store
• Real-time visualization
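
The forwarding step can be illustrated with a small, stdlib-only Python sketch. It is not the Azure Function from the talk: it assumes events arrive from the WebSockets reader as dicts and groups them into JSON-encoded batches under a byte limit before they are sent on. `MAX_BATCH_BYTES`, `pack_batches` and the event fields are hypothetical, and the size check ignores per-event protocol overhead. With the Azure SDK for Python, `EventHubProducerClient.create_batch()` returns a batch object that enforces the real size limit.

```python
import json
from typing import Iterable, Iterator, List

# Hypothetical per-batch byte limit; the real maximum depends on the Event Hubs tier.
MAX_BATCH_BYTES = 1024


def pack_batches(events: Iterable[dict], max_bytes: int = MAX_BATCH_BYTES) -> Iterator[List[bytes]]:
    """Group JSON-encoded events into batches whose payload bytes stay within max_bytes.

    Only payload bytes are counted; per-event protocol overhead is ignored.
    """
    batch: List[bytes] = []
    size = 0
    for event in events:
        payload = json.dumps(event, separators=(",", ":")).encode("utf-8")
        if len(payload) > max_bytes:
            raise ValueError(f"event too large to forward: {len(payload)} bytes")
        # Flush the current batch if adding this event would exceed the limit.
        if batch and size + len(payload) > max_bytes:
            yield batch
            batch, size = [], 0
        batch.append(payload)
        size += len(payload)
    if batch:
        yield batch  # flush the final partial batch


if __name__ == "__main__":
    # Simulated events as they might arrive from the ESB WebSockets feed (hypothetical fields).
    incoming = [{"unitId": f"E{i}", "status": "AVAILABLE"} for i in range(100)]
    for batch in pack_batches(incoming, max_bytes=512):
        # In a real forwarder, each batch would be sent to Event Hubs here.
        print(f"forwarding {len(batch)} events")
```

Batching amortizes the per-request cost of the send, which matters at the 1000+ events per second the requirements call for.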

Architectural Requirements
• Ingest event stream
  • High ingestion rate (1000+ events per second)
  • Need a high-performance, fault-tolerant service
• Stream events, perform domain-specific conversions
  • Need real-time streaming analytics
• Store processed data in a high-performance data store
  • Keyed access to the data
  • Ability to perform UPSERT operations
• Visualize the data in a real-time dashboard
  • Updates triggered by data changes in the underlying data store
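
The 1000+ events per second requirement translates directly into Event Hubs capacity. In the Standard tier, one throughput unit allows ingress of up to 1 MB per second or 1000 events per second, whichever comes first. The helper below is a back-of-the-envelope sketch under that assumption; treating 1 MB as 1,048,576 bytes and the example event sizes are assumptions, so the current Azure quotas should be checked before sizing a namespace.

```python
import math

# Per-throughput-unit ingress limits for the Event Hubs Standard tier
# (assumed values; verify against current Azure documentation).
TU_EVENTS_PER_SEC = 1000
TU_BYTES_PER_SEC = 1024 * 1024  # assumption: 1 MB treated as 1 MiB


def throughput_units_needed(events_per_sec: float, avg_event_bytes: int) -> int:
    """Return the minimum number of throughput units for a given ingress load."""
    by_count = events_per_sec / TU_EVENTS_PER_SEC
    by_bytes = events_per_sec * avg_event_bytes / TU_BYTES_PER_SEC
    # Whichever limit is reached first decides; at least one unit is always required.
    return max(1, math.ceil(max(by_count, by_bytes)))


if __name__ == "__main__":
    print(throughput_units_needed(1000, 500))   # 1: event-count limit reached exactly
    print(throughput_units_needed(1500, 2048))  # 3: byte limit dominates for 2 KB events
```

For small status events the event-count limit usually dominates; larger payloads shift the bottleneck to bytes per second.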

Solution Architecture
[Diagram: Ingestion Channel (Azure Event Hubs) → Event Processing (Databricks with Spark Structured Streaming) → Real-Time Data Store (Databricks Delta Lake) → Visualization (Power BI Service dashboard)]

Ingest Event Stream → Azure Event Hubs
• Requirement: high ingestion rate (1000+ events per second); need a high-performance, fault-tolerant service
• Azure Event Hubs: Microsoft's real-time data ingestion engine; can ingest millions of events per second; Kafka compatibility

Process Stream → Databricks on Azure
• Requirement: continuous processing; real-time ingestion; micro-batch processing
• Databricks on Azure: Spark Structured Streaming; fault-tolerant stream processing engine; Kafka compatibility

Real-Time Storage → Delta Lake
• Requirement: keyed access to data; ability to perform UPSERTs; simple SQL-based access
• Delta Lake: ACID transactions; high scalability

Real-Time Visualization → Microsoft Power BI
• Requirement: simple integration; updates through data triggers; DirectQuery into the data source
• Microsoft Power BI: DirectQuery against Delta Lake; real-time dashboarding facilities; updates triggered by data changes or push datasets
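
Fault-tolerant micro-batch processing in Structured Streaming rests on checkpointed source offsets: each trigger reads the offsets after the last committed position, processes them, and records the new position, so a restarted query resumes where it left off. The toy loop below models only that idea in plain Python. The `Checkpoint` and `Sink` classes, the in-memory log and the batch size are illustrative assumptions, not Spark APIs; in Spark the progress is kept in the directory given by the `checkpointLocation` option, and the bookkeeping (offset log plus commit log) is more involved than this simplified "commit after success" step.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Checkpoint:
    """Hypothetical stand-in for a streaming checkpoint: the next offset to read."""
    next_offset: int = 0


@dataclass
class Sink:
    """Hypothetical output target collecting processed events."""
    rows: List[str] = field(default_factory=list)


def run_micro_batches(log: List[str], checkpoint: Checkpoint, sink: Sink,
                      max_per_batch: int, max_batches: Optional[int] = None) -> int:
    """Process the log in micro-batches, advancing the checkpoint after each one.

    Returns the number of batches processed in this run.
    """
    if max_per_batch <= 0:
        raise ValueError("max_per_batch must be positive")
    batches = 0
    while checkpoint.next_offset < len(log):
        if max_batches is not None and batches >= max_batches:
            break  # simulate the query stopping, e.g. a cluster restart
        start = checkpoint.next_offset
        end = min(start + max_per_batch, len(log))
        sink.rows.extend(log[start:end])  # "process" the micro-batch
        checkpoint.next_offset = end      # simplified commit: only after the batch succeeded
        batches += 1
    return batches


if __name__ == "__main__":
    log = [f"event-{i}" for i in range(10)]
    cp, sink = Checkpoint(), Sink()
    run_micro_batches(log, cp, sink, max_per_batch=4, max_batches=1)  # stops after offsets 0-3
    run_micro_batches(log, cp, sink, max_per_batch=4)                 # resumes at offset 4
    print(sink.rows == log)  # True: every event processed once, in order
```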

Demo Architecture
• nb-create-unitStatusTable notebook: invokes the generic CreateDeltaTable with the appropriate parameters to create our UnitStatus table
• nb-create-delta-table notebook: generic notebook which creates a Delta table
• nb-eventhub-spark-streaming notebook: reads the events from Event Hubs and invokes the foreachBatch sink function implemented in the nb-unitstatus-event-processor notebook
• nb-unitstatus-event-processor notebook: processes the events, performs the transformations, and finally updates our UnitStatusTable
[Diagram: C# .NET console application (Streaming-demo.eventsimulator) → Units-eh Event Hub → nb-eventhub-spark-streaming → nb-unitstatus-event-processor → UPSERTs into Delta table old_stream_fd.unit_status (created by nb-create-unit-status-table via nb-create-delta-table) → Power BI report (Power BI Premium)]
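
In Spark, `foreachBatch` hands the sink function each micro-batch as a DataFrame together with a batch ID, and a Delta Lake `MERGE` can then upsert the rows by key. The stdlib sketch below models what such a processor might do with unit status events: keep only the latest event per unit within the batch, then update existing keys and insert new ones. The field names (`unit_id`, `event_time`, `status`), the dict standing in for the `unit_status` table, and skipping batch IDs that were already applied are assumptions for illustration, not the code of the nb-unitstatus-event-processor notebook.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical in-memory stand-in for the unit_status Delta table, keyed by unit_id.
Table = Dict[str, dict]


def latest_per_unit(events: Iterable[dict]) -> List[dict]:
    """Deduplicate a micro-batch, keeping the most recent event for each unit."""
    latest: Dict[str, dict] = {}
    for e in events:
        cur = latest.get(e["unit_id"])
        if cur is None or e["event_time"] > cur["event_time"]:
            latest[e["unit_id"]] = e
    return list(latest.values())


def process_batch(events: List[dict], batch_id: int, table: Table, applied: Set[int]) -> None:
    """foreachBatch-style sink: upsert the micro-batch into the table with MERGE semantics."""
    if batch_id in applied:
        return  # batch already applied; a replay must not change the table
    for e in latest_per_unit(events):
        current = table.get(e["unit_id"])
        # WHEN MATCHED (and newer) THEN UPDATE / WHEN NOT MATCHED THEN INSERT
        if current is None or e["event_time"] > current["event_time"]:
            table[e["unit_id"]] = dict(e)
    applied.add(batch_id)


if __name__ == "__main__":
    table: Table = {}
    applied: Set[int] = set()
    process_batch([
        {"unit_id": "E1", "event_time": 1, "status": "DISPATCHED"},
        {"unit_id": "E1", "event_time": 3, "status": "ON_SCENE"},
        {"unit_id": "L7", "event_time": 2, "status": "AVAILABLE"},
    ], batch_id=0, table=table, applied=applied)
    # A late event for E1 arrives in the next batch and must not overwrite newer state.
    process_batch([{"unit_id": "E1", "event_time": 2, "status": "EN_ROUTE"}],
                  batch_id=1, table=table, applied=applied)
    print(table["E1"]["status"])  # ON_SCENE
```

Deduplicating before the merge matters because a Delta `MERGE` fails when several source rows match the same target row. In Delta SQL, the update/insert step corresponds roughly to `MERGE INTO ... USING ... ON ... WHEN MATCHED THEN UPDATE SET * WHEN NOT MATCHED THEN INSERT *`, with the "newer event" check added as a condition on the matched clause.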

Demo – Organization
• Creation of Delta Lake table
• Implementation resources walk-through
• Spark Streaming notebook
• Stream processor function
• Demo run
• Event simulator
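
The generic nb-create-delta-table notebook is described only as creating a Delta table from parameters. One way such a helper could work is to build a `CREATE TABLE IF NOT EXISTS ... USING DELTA` statement from a database name, table name and column list, and then run it with `spark.sql(...)` in the notebook. The sketch below only generates the SQL string; the function name, its parameters and the example `unit_status` columns are hypothetical.

```python
import re
from typing import Dict, Optional

# Conservative identifier check so parameters cannot inject arbitrary SQL.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def create_delta_table_sql(database: str, table: str, columns: Dict[str, str],
                           location: Optional[str] = None) -> str:
    """Build a CREATE TABLE statement for a Delta table (hypothetical helper)."""
    if not columns:
        raise ValueError("at least one column is required")
    for name in [database, table, *columns]:
        if not _IDENT.match(name):
            raise ValueError(f"invalid identifier: {name!r}")
    cols = ",\n  ".join(f"{name} {sql_type}" for name, sql_type in columns.items())
    sql = f"CREATE TABLE IF NOT EXISTS {database}.{table} (\n  {cols}\n) USING DELTA"
    if location:
        sql += f" LOCATION '{location}'"  # optional external table location
    return sql


if __name__ == "__main__":
    ddl = create_delta_table_sql(
        "old_stream_fd", "unit_status",
        {"unit_id": "STRING", "status": "STRING", "event_time": "TIMESTAMP"},
    )
    print(ddl)
    # In a Databricks notebook this string would be executed with spark.sql(ddl).
```

Keeping the DDL generation generic lets a thin, table-specific notebook such as nb-create-unitStatusTable supply only the parameters.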

Demo 1 – Infrastructure Walkthrough

Demo 2 – Code Walkthrough

Demo 3 – Sample Run

Summary
• The need for large-scale real-time stream processing becomes more evident every day
• Provide organizations with the ability to respond quickly to a dynamic business climate
• Spark Structured Streaming makes it easy to add a real-time channel
• Simple extensions on top of Spark SQL

Feedback
Your feedback is important to us. Don’t forget to rate and review the sessions.
