Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

0

Share

Download to read offline

Lambda Architecture in the Cloud with Azure Databricks with Andrei Varanovich

Download to read offline

The term “Lambda Architecture” stands for a generic, scalable and fault-tolerant data processing architecture. As the hyper-scale now offers a various PaaS services for data ingestion, storage and processing, the need for a revised, cloud-native implementation of the lambda architecture is arising.

In this talk we demonstrate the blueprint for such an implementation in Microsoft Azure, with Azure Databricks — a PaaS Spark offering – as a key component. We go back to some core principles of functional programming and link them to the capabilities of Apache Spark for various end-to-end big data analytics scenarios.

We also illustrate the “Lambda architecture in use” and the associated tread-offs using the real customer scenario – Rijksmuseum in Amsterdam – a terabyte-scale Azure-based data platform handles data from 2.500.000 visitors per year.

  • Be the first to like this

Lambda Architecture in the Cloud with Azure Databricks with Andrei Varanovich

  1. 1. Andrei Varanovich, InSpark Lambda Architecture in the Cloud with Azure Databricks #SAISDev6
  2. 2. Selfie Data & AI Lead @DrGigabit andrei.varanovich andrei.varanovich@inspark.nl Data Programmability Cloud High- performance teams Neural Networks ##SAISDev6
  3. 3. Big Data problem is many small data problems ##SAISDev6
  4. 4. 4 2.500.000 visitors per year 8.000 objects of art and history 1.000.000 objects stored from the year 1200 ##SAISDev6
  5. 5. Under the hood 5 Retail Sponsorship engagement Occupancy rate Service Management Incident Management Marketing Performance Warehouse Management Capacity Planning Financial Performance Ticket Sales ##SAISDev6
  6. 6. THE DATA JOURNEY 6##SAISDev6 Consolidation Insights Innovation Consolidate data in a centralized store Organizational processes and efficiency New ideas, leveraging machine learning
  7. 7. 7 IN THE NEED FOR THE PLATFORM START • Begin small and focused • Prove value GROW • Grow organically as more use cases arise SCALE • Go production and scale to the revel required We are in the need of the truly elastic data platform, to avoid any upfront planning, deployment and operations expenses, and put business value discovery first. The platform should support the [big]data projects in any stage, without the need to reengineer the whole solution. ##SAISDev6
  8. 8. Lambda Architecture on Azure 8 INGEST BATCH INGEST STREAM STORE ANALYZE Azure Data Lake Store Azure Data Factory Azure Databricks (managed Spark; batch & streaming) Social LOB Graph IoT Image CRM AI models/ APIs Cognitive Services Azure container Service & registry Insight sharing Power BI/ other tools Event Hubs Stream Batch SECURITY & MANAGEMENT Azure Log Analytics Azure Graph API Cost monitoring Azure Active Directory ##SAISDev6
  9. 9. 9 Optimized Databricks Runtime Engine DATABRICKS I/O SERVERLESS Collaborative workspace Cloud storage Data warehouses Hadoop storage IoT / streaming data Rest APIs Machine learning models BI tools Data exports Data warehouses Azure Databricks Production jobs & workflows APACHE SPARK MULTI-STAGE PIPELINES DATA ENGINEER JOB SCHEDULER NOTIFICATION & LOGS DATA SCIENTIST BUSINESS ANALYST ##SAISDev6
  10. 10. 10 Simplicity is the ultimate sophistication Leonardo da Vinci ##SAISDev6
  11. 11. 11
  12. 12. LAMBDA TO THE RESCUE 12 ##SAISDev6
  13. 13. ##SAISDev6 13 Composition of functions is applying one function to the result of another
  14. 14. 14 f(x) = x+1 (g º f)(x) = g(f(x)) (g º f)(x) = (x+1)2 g(x) = x2 input input+1 input2 ##SAISDev6 input+1 (input+1)2
  15. 15. 15 Transformation pipeline as a series of transitions s1 f3 f2 f1 sum
  16. 16. Optimized Databricks Runtime Engine DATABRICKS I/O SERVERLESS Collaborative workspace Cloud storage Data warehouses Hadoop storage IoT / streaming data Rest APIs Machine learning models BI tools Data exports Data warehouses Azure Databricks Production jobs & workflows APACHE SPARK MULTI-STAGE PIPELINES DATA ENGINEER JOB SCHEDULER NOTIFICATION & LOGS DATA SCIENTIST BUSINESS ANALYST ##SAISDev6
  17. 17. Conclusions 17 … with proper design, the features come cheaply. This approach is arduous, but continues to succeed. —Dennis Ritchie ##SAISDev6 • Standardization on Apache Spark allows us to move forward without introducing extra complexity. • 100% PaaS offering is important – no need to maintain the infrastructure. All components we use offered as PaaS on Azure. • Data pipelines as function composition allows us to ensure end-to-end consistency and spot the errors quickly. • Saving intermediate states allows to quickly inspect the data sets.
  18. 18. Thank you! 18 ##SAISDev6
  19. 19. Questions? 19 ##SAISDev6

The term “Lambda Architecture” stands for a generic, scalable and fault-tolerant data processing architecture. As the hyper-scale now offers a various PaaS services for data ingestion, storage and processing, the need for a revised, cloud-native implementation of the lambda architecture is arising. In this talk we demonstrate the blueprint for such an implementation in Microsoft Azure, with Azure Databricks — a PaaS Spark offering – as a key component. We go back to some core principles of functional programming and link them to the capabilities of Apache Spark for various end-to-end big data analytics scenarios. We also illustrate the “Lambda architecture in use” and the associated tread-offs using the real customer scenario – Rijksmuseum in Amsterdam – a terabyte-scale Azure-based data platform handles data from 2.500.000 visitors per year.

Views

Total views

1,606

On Slideshare

0

From embeds

0

Number of embeds

3

Actions

Downloads

137

Shares

0

Comments

0

Likes

0

×