In this session we will delve into Azure Databricks and examine why, in conjunction with other Azure services, it is becoming a fundamental tool for data scientists and data engineers.
This presentation focuses on the value proposition for Azure Databricks for Data Science. First, the talk includes an overview of the merits of Azure Databricks and Spark. Second, the talk includes demos of data science on Azure Databricks. Finally, the presentation includes some ideas for data science production.
Databricks is a Software-as-a-Service-like experience (or Spark-as-a-Service) for curating and processing massive amounts of data, for developing, training, and deploying models on that data, and for managing the whole workflow throughout the project. It is aimed at those comfortable with Apache Spark, as it is 100% based on Spark and is extensible with support for Scala, Java, R, and Python alongside Spark SQL, GraphX, Spark Streaming, and the Machine Learning Library (MLlib). It has built-in integration with many data sources, a workflow scheduler, real-time workspace collaboration, and performance improvements over traditional Apache Spark.
Spark as a Service with Azure Databricks (Lace Lofranco)
Presented at: Global Azure Bootcamp (Melbourne)
Participants will get a deep dive into one of Azure’s newest offerings: Azure Databricks, a fast, easy, and collaborative Apache® Spark™-based analytics platform optimized for Azure. In this session, we will go through Azure Databricks’ key collaboration features, cluster management, and tight data integration with Azure data sources. We’ll also walk through an end-to-end Recommendation System data pipeline built using Spark on Azure Databricks.
The Developer Data Scientist – Creating New Analytics Driven Applications usi... (Microsoft Tech Community)
The developer world is changing as we create and generate new data patterns and handling processes within our applications. Additionally, with the massive interest in machine learning and advanced analytics, how can we as developers build intelligence directly into our applications that can integrate with the data and data paths we are creating? The answer is Azure Databricks; by attending this session you will be able to confidently develop smarter and more intelligent applications and solutions that can be continuously built upon and that can scale with the growing demands of a modern application estate.
Getting Started with Machine Learning for Database Developers (Sascha Dittmann)
As a database developer, have you ever wondered how you can extend your database projects with machine learning technologies?
How can you reuse your existing knowledge, and what do you still need to learn?
In this session, Sascha Dittmann presents various learning paths that let database developers dive into the world of data science. For his hands-on examples he uses a range of tools, such as SQL Server ML Services, Azure Databricks, and Azure ML Services, to combine familiar knowledge with the new.
Big Data Advanced Analytics on Microsoft Azure (Mark Tabladillo)
This presentation provides a survey of the advanced analytics strengths of Microsoft Azure from an enterprise perspective (with these organizations being the bulk of big data users) based on the Team Data Science Process. The talk also covers the range of analytics and advanced analytics solutions available for developers using data science and artificial intelligence from Microsoft Azure.
Spark is fast becoming a critical part of Customer Solutions on Azure. Databricks on Microsoft Azure provides a first-class experience for building and running Spark applications. The Microsoft Azure CAT team engaged with many early adopter customers helping them build their solutions on Azure Databricks.
In this session, we begin by reviewing typical workload patterns and integration with other Azure services like Azure Storage, Azure Data Lake, IoT/Event Hubs, SQL DW, Power BI, etc. Most importantly, we will share real-world tips and learnings that you can take and apply in your Data Engineering / Data Science workloads.
Data Con LA 2020
Description
Data warehouses are not enough. Data lakes are the backbone of a modern data environment. Data Lakes are best built leveraging unique services of the cloud provider to reduce operations complexity. This session will explain why everyone's talking about data lakes, break down the best services in Azure to build a Data Lake, and walk through code for querying and loading with Azure Databricks and Event Hubs for Kafka. Attendees will leave the session with a firm grasp of why we build data lakes and how Azure Databricks fits in for ETL and querying.
Speaker
Dustin Vannoy, Dustin Vannoy Consulting, Principal Data Engineer
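As a hedged sketch of the "loading with Event Hubs for Kafka" piece, the helper below builds the options a Spark Structured Streaming Kafka source needs to talk to Event Hubs' Kafka-compatible endpoint. The namespace and hub names are hypothetical; Event Hubs authenticates Kafka clients via SASL PLAIN using the literal username `$ConnectionString`.

```python
def eventhub_kafka_options(namespace, eventhub, connection_string):
    """Build Spark Kafka-source options for Azure Event Hubs' Kafka endpoint."""
    # Event Hubs' Kafka surface listens on port 9093 of the namespace host.
    bootstrap = f"{namespace}.servicebus.windows.net:9093"
    # SASL PLAIN with the connection string as password; the username is
    # the literal string "$ConnectionString".
    jaas = (
        "org.apache.kafka.common.security.plain.PlainLoginModule required "
        f'username="$ConnectionString" password="{connection_string}";'
    )
    return {
        "kafka.bootstrap.servers": bootstrap,
        "subscribe": eventhub,
        "kafka.sasl.mechanism": "PLAIN",
        "kafka.security.protocol": "SASL_SSL",
        "kafka.sasl.jaas.config": jaas,
    }

# Wiring into Structured Streaming (requires a live cluster, shown for shape only):
# stream = (spark.readStream.format("kafka")
#             .options(**eventhub_kafka_options("myns", "telemetry", conn_str))
#             .load())
```

Because the Kafka surface is wire-compatible, the same Spark Kafka source works unchanged against Event Hubs; only the bootstrap server and SASL settings differ from a plain Kafka cluster.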
Azure Databricks—Apache Spark as a Service with Sascha Dittmann (Databricks)
The driving force behind Apache Spark (Databricks Inc.) and Microsoft have designed a joint service to quickly and easily create Big Data and Advanced Analytics solutions. The combination of the comprehensive Databricks Unified Analytics Platform and the powerful capabilities of Microsoft Azure makes it easy to analyse data streams or large amounts of data, as well as to train AI models. Sascha Dittmann shows in this session how the new Azure service can be set up and used in various real-world scenarios. He also shows how to connect the various Azure services to the Azure Databricks service.
Azure Databricks is a fast, easy, and collaborative Apache Spark-based analytics platform optimized for Azure. Designed in collaboration with the founders of Apache Spark, Azure Databricks combines the best of Databricks and Azure to help customers accelerate innovation with one-click set up, streamlined workflows, and an interactive workspace that enables collaboration between data scientists, data engineers, and business analysts. As an Azure service, customers automatically benefit from the native integration with other Azure services such as Power BI, SQL Data Warehouse, and Cosmos DB, as well as from enterprise-grade Azure security, including Active Directory integration, compliance, and enterprise-grade SLAs.
Building Advanced Analytics Pipelines with Azure Databricks (Lace Lofranco)
Participants will get a deep dive into one of Azure’s newest offerings: Azure Databricks, a fast, easy, and collaborative Apache® Spark™-based analytics platform optimized for Azure. In this session, we start with a technical overview of Spark and quickly jump into Azure Databricks’ key collaboration features, cluster management, and tight data integration with Azure data sources. Concepts are made concrete via a detailed walk-through of an advanced analytics pipeline built using Spark and Azure Databricks.
Full video of the presentation: https://www.youtube.com/watch?v=14D9VzI152o
Presentation demo: https://github.com/devlace/azure-databricks-anomaly
Running cost effective big data workloads with Azure Synapse and Azure Data L... (Michael Rys)
The presentation discusses how to migrate expensive open-source big data workloads to Azure and leverage the latest compute and storage innovations within Azure Synapse with Azure Data Lake Storage to develop powerful and cost-effective analytics solutions. It shows how you can bring your .NET expertise to bear with .NET for Apache Spark, and how the shared metadata experience in Synapse makes it easy to create a table in Spark and query it from T-SQL.
Think of big data as all data, no matter what the volume, velocity, or variety. The simple truth is a traditional on-prem data warehouse will not handle big data. So what is Microsoft’s strategy for building a big data solution? And why is it best to have this solution in the cloud? That is what this presentation will cover. Be prepared to discover all the various Microsoft technologies and products, from collecting data to transforming it, storing it, and visualizing it. My goal is to help you not only understand each product but understand how they all fit together, so you can be the hero who builds your company’s big data solution.
The Hive Think Tank - The Microsoft Big Data Stack by Raghu Ramakrishnan, CTO... (The Hive)
Until recently, data was gathered for well-defined objectives such as auditing, forensics, reporting and line-of-business operations; now, exploratory and predictive analysis is becoming ubiquitous, and the default increasingly is to capture and store any and all data, in anticipation of potential future strategic value. These differences in data heterogeneity, scale and usage are leading to a new generation of data management and analytic systems, where the emphasis is on supporting a wide range of very large datasets that are stored uniformly and analyzed seamlessly using whatever techniques are most appropriate, including traditional tools like SQL and BI and newer tools, e.g., for machine learning and stream analytics. These new systems are necessarily based on scale-out architectures for both storage and computation.
Hadoop has become a key building block in the new generation of scale-out systems. On the storage side, HDFS has provided a cost-effective and scalable substrate for storing large heterogeneous datasets. However, as key customer and systems touch points are instrumented to log data, and Internet of Things applications become common, data in the enterprise is growing at a staggering pace, and the need to leverage different storage tiers (ranging from tape to main memory) is posing new challenges, leading to caching technologies, such as Spark. On the analytics side, the emergence of resource managers such as YARN has opened the door for analytics tools to bypass the Map-Reduce layer and directly exploit shared system resources while computing close to data copies. This trend is especially significant for iterative computations such as graph analytics and machine learning, for which Map-Reduce is widely recognized to be a poor fit.
While Hadoop is widely recognized and used externally, Microsoft has long been at the forefront of Big Data analytics, with Cosmos and Scope supporting all internal customers. These internal services are a key part of our strategy going forward, and are enabling new state of the art external-facing services such as Azure Data Lake and more. I will examine these trends, and ground the talk by discussing the Microsoft Big Data stack.
How Azure Databricks helped make IoT Analytics a Reality with Janath Manohara... (Databricks)
At Lennox International, we have thousands of IoT-connected devices streaming data into the Azure platform at a minute-level polling interval. The challenge was to use these data sets, combine them with external data sources such as weather, and predict equipment failure with high accuracy along with the influencing patterns and parameters. Previously the team was using a combination of on-premises and desktop tools to run algorithms on a sample set of devices. The result was low accuracy (around 65%) from a process that took more than 6 hours.
The team had to work through several data orchestration challenges and identify a machine learning platform that enabled collaboration between engineering SMEs, data engineers, and data scientists. The team decided to use Azure Databricks to build the data engineering pipelines and appropriate machine learning models, and to extract predictions using PySpark. To enhance the sophistication of the learning, the team worked on a variety of Spark ML models such as Gradient Boosted Trees and Random Forest. The team also implemented stacking and ensemble methods using H2O Driverless AI and Sparkling Water on Azure Databricks clusters, which can scale up to 1,000 cores.
Join us in this session and see how this resulted in models that run in 40 minutes with minimal tuning and predict failures with accuracy of about 90%.
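The team used Spark ML's distributed Gradient Boosted Trees and Random Forest; as a compact illustration of comparing those two model families, here is a single-machine scikit-learn analogue (swapped in for brevity; the Spark ML API differs) on a synthetic dataset standing in for the sensor-plus-weather features.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for device telemetry features and a failure label.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# Train the two model families the abstract mentions, compare held-out accuracy.
gbt = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
scores = {"gbt": gbt.score(X_te, y_te), "rf": rf.score(X_te, y_te)}
```

The same train/compare loop, expressed with `pyspark.ml.classification.GBTClassifier` and `RandomForestClassifier` inside a Spark pipeline, is what scales the approach across a cluster.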
DataOps for the Modern Data Warehouse on Microsoft Azure @ NDCOslo 2020 (Lace Lofranco)
Talk Description:
The Modern Data Warehouse architecture is a response to the emergence of Big Data, Machine Learning and Advanced Analytics. DevOps is a key aspect of successfully operationalising a multi-source Modern Data Warehouse.
While there are many examples of how to build CI/CD pipelines for traditional applications, applying these concepts to big data analytical pipelines is a relatively new and emerging area. In this demo-heavy session, we will see how to apply DevOps principles to an end-to-end data pipeline built on the Microsoft Azure Data Platform with technologies such as Data Factory, Databricks, Data Lake Gen2, Azure Synapse, and Azure DevOps.
Resources: https://aka.ms/mdw-dataops
RDX Insights Presentation - Microsoft Business Intelligence (Christopher Foot)
May's RDX Insights Series Presentation focuses on Microsoft's BI products. We begin with an overview of Power BI, SSIS, SSAS and SSRS and how the products integrate with each other. The webinar continues with a detailed discussion on how to use Power BI to capture, model, transform, analyze and visualize key business metrics. We’ll finish with a Power BI demo highlighting some of its most beneficial and interesting features.
These are the slides for my talk "An intro to Azure Data Lake" at Azure Lowlands 2019. The session was held on Friday January 25th from 14:20 - 15:05 in room Santander.
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE... (Michael Rys)
This presentation shows how you can build solutions that follow the modern data warehouse architecture and introduces the .NET for Apache Spark support (https://dot.net/spark, https://github.com/dotnet/spark)
Automated machine learning (automated ML) automates feature engineering, algorithm selection, and hyperparameter selection to find the best model for your data. The mission: enable automated building of machine learning models with the goal of accelerating, democratizing, and scaling AI. This presentation covers some recent announcements of technologies related to automated ML, especially for Azure. The demonstrations focus on Python with Azure ML Service and Azure Databricks.
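To make the hyperparameter-selection step concrete, here is a toy scikit-learn grid search (not the Azure AutoML API) showing what an automated-ML tool explores on our behalf; automated ML additionally searches across algorithms and feature-engineering steps, not just one model's grid.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)

# The search space an automated-ML tool would sweep for us; a toy grid here.
grid = {"C": [0.01, 0.1, 1.0, 10.0]}
search = GridSearchCV(LogisticRegression(max_iter=1000), grid, cv=5)
search.fit(X, y)

# Best hyperparameter and its cross-validated score.
best = {"C": search.best_params_["C"], "score": search.best_score_}
```

Automated ML services generalize exactly this pattern: define a task and a metric, and the service iterates candidate pipelines until a budget is exhausted, returning the best model found.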
Big Data Advanced Analytics on Microsoft Azure 201904 (Mark Tabladillo)
This talk summarizes key points for big data advanced analytics on Microsoft Azure. First, there is a review of the major technologies. Second, there is a series of technology demos (focusing on VMs, Databricks and Azure ML Service). Third, there is some advice on using the Team Data Science Process to help plan projects. The deck has web resources recommended. This presentation was delivered at the Global Azure Bootcamp 2019, Atlanta GA location (Alpharetta Avalon).
The breadth and depth of Azure products that fall under the AI and ML umbrella can be difficult to follow. In this presentation I’ll first define exactly what AI, ML, and deep learning are, and then go over the various Microsoft AI and ML products and their use cases.
MLflow and Azure Machine Learning—The Power Couple for ML Lifecycle Management (Databricks)
The ML Lifecycle management process is quickly becoming the bottleneck for a lot of ML projects. With MLflow’s newest release, and its enhanced integration with Azure Machine Learning, this process is now showing the right promise and capabilities on Azure. In this talk, we intend to take a tour of the integration details and how MLOps is now becoming a strength of the platform. We’ll talk about versioning, maintaining run history, production pipeline automation, deployment to cloud and edge, and CI/CD pipelines with MLOps as the backdrop.
Be prepared for an interactive conversation as we intend to seek a lot of feedback on the integration and capabilities being lit up.
Introduction to Machine learning and Deep Learning (Nishan Aryal)
Overview of Machine Learning and Deep Learning. Brief introduction to different types of BI and reporting tools like Power BI, SSMS, Cortana, Azure ML, TensorFlow, and other tools.
1 Introduction to Microsoft data platform analytics for release (Jen Stirrup)
Part 1 of a conference workshop. This forms the morning session, which looks at moving from Business Intelligence to Analytics.
Topics Covered: Azure Data Explorer, Azure Data Factory, Azure Synapse Analytics, Event Hubs, HDInsight, Big Data
Azure Days 2019: Business Intelligence on Azure (Marco Amhof & Yves Mauron), Trivadis
In this session we present a project in which we built a comprehensive BI system for and in the Azure cloud using Azure Blob Storage, Azure SQL, Azure Logic Apps, and Azure Analysis Services. We report on the challenges we faced, how we solved them, and the learnings and best practices we took away.
When it comes to large-scale data processing and machine learning, Apache Spark is without doubt one of the top battle-tested frameworks for handling batch or streaming workloads. Its ease of use, built-in machine learning modules, and multi-language support make it a very attractive choice for data wonks. However, bootstrapping and getting off the ground can be difficult for most teams without leveraging a Spark cluster that is pre-provisioned and provided as a managed service in the cloud. While that is a very attractive way to get going, in the long run it can be a very expensive option if it is not well managed.
As an alternative to this approach, our team has been exploring and working a lot with running Spark and all our Machine Learning workloads and pipelines as containerized Docker packages on Kubernetes. This provides an infrastructure-agnostic abstraction layer for us, and as a result, it improves our operational efficiency and reduces our overall compute cost. Most importantly, we can easily target our Spark workload deployment to run on any major Cloud or On-prem infrastructure (with Kubernetes as the common denominator) by just modifying a few configurations.
In this talk, we will walk you through the process our team follows to make it easy for us to run a production deployment of our Machine Learning workloads and pipelines on Kubernetes which seamlessly allows us to port our implementation from a local Kubernetes set up on the laptop during development to either an On-prem or Cloud Kubernetes environment
201906 02 Introduction to AutoML with ML.NET 1.0Mark Tabladillo
ML.NET 1.0 release is the first major milestone of a great journey that started in May 2018 when we released ML.NET 0.1 as open source. ML.NET is an open-source and cross-platform machine learning framework for .NET developers. Using ML.NET, developers can leverage their existing tools and skillsets to develop and infuse custom AI into their applications by creating custom machine learning models for common scenarios like Sentiment Analysis, Recommendation, Image Classification and more.
“Automated ML” is a collection of new technologies from Microsoft to enhance the data science development process. Still in preview, Auto ML for ML.NET 1.0 will be demonstrated in a Deep Learning Virtual Machine running Windows Server 2016. Code examples are in C# and run in Visual Studio Community 2019.
This presentation is the second of four related to ML.NET and Automated ML. The presentation will be recorded with video posted to this YouTube Channel: http://bit.ly/2ZybKwI
Comparing Microsoft Big Data Platform TechnologiesJen Stirrup
In this segment, we look at technologies such as HDInsight, Azure Databricks, Azure Data Lake Analytics and Apache Spark. We compare the technologies to help you to decide the best technology for your situation.
Jupyter Notebooks and Apache Spark are first class citizens of the Data Science space, a truly requirement for the "modern" data scientist. Now with Azure Synapse these two computing powers are available to the .NET Developer. And .NET is available for all data scientists. Let's look what .net can do for notebooks and spark inside Azure Synapse and what are Synapse, notebooks and spark.
2018 11 14 Artificial Intelligence and Machine Learning in AzureBruno Capuano
Slides used during my session "Artificial Intelligence and Machine Learning in Azure" for The Azure Group (Canada's Azure User Community) on November 14 2018.
Public group
Similar to Global AI Bootcamp Madrid - Azure Databricks (20)
Microsoft 365 Virtual 2020 Spain - Microsoft Graph Search APIAlberto Diaz Martin
Microsoft Graph es el núcleo de nuestros desarrollos en Microsoft 365 y como tal, nos ofrece direfentes endpoints para extender la plataforma. ¿Qué API tenemos para Microsoft Search?
En esta sesión, haremos un repaso a las capacidades de Microsoft Graph para usar Microsoft Search en nuestras aplicaciones, así como las posibilidades de extender Microsoft Search con nuestros propios conectores.
En esta sesión os contaremos la visión de React para el desarrollo de aplicaciones web desde el punto de vista de un desarrollador de ASP.NET que tiene que aprender a trabajar con estas nuevas tecnologías.
Azure4Research - Big Data Analytics con Hadoop, Spark y Power BIAlberto Diaz Martin
n esta sesión, veremos el desarrollo de un proceso de AI con Azure Databricks que nos ayudará a trabajar con datos estructurados y no estructurados, a obtener una visión profunda del algoritmo a implementar e incluso crear un ciclo aprendizaje en tiempo real. El objetivo será adentrarnos en un proyecto de AI para preparar los datos, realizar el análisis que nos permita elegir un algoritmo, entrenar un modelo y ejecutar una predicción de dicho modelo. Todo esto con mucho Big Data y Power BI como herramienta de Reporting.
En este webinar queremos inspirarte, mostrándote las tecnologías de IA que ya dominan nuestro entorno, y ponerte las pilas con todo lo que está por venir en materia de Asistentes Virtuales, reconocimiento de imágenes, de voz o de aprendizaje automático. Hablaremos de:
Machine Learning
Cognitive Services
Deep Learning
¿Nunca te ha faltado alguna funcionalidad en el asistente qué usas? Puede que la haya en otro idioma pero no en el tuyo, somos impacientes ¿La hacemos? ¿Extendemos Google Assistant, Alexa y Cortana?
Hubo un momento en la antiguedad que SharePoint se ejecutaba en los servidores de nuestros Datacenters, y los usuarios de negocio nos pedían personalizaciones y desarrollábamos soluciones que aportaban valor a las funcionalidades base de SharePoint. Ahora que todo se está moviendo hacia Office 365, hay cada vez menos personalizaciones que podamos hacer. Aún así, los usuarios siguen necesidades soluciones y ese código tiene que ejecutarse en alguna parte. Nos encontramos con el problema de hospedar nuestro propio webjob, incluso de tener que encontrar una VM. ¿Tendremos que aprender a Docker? Gracias a Dios en 2016 una nueva arquitectura llamada #Serverless que una solución a muchos de estos problemas y nos permite centrarnos en lo que hacemos mejor, la creación de soluciones. Sin tener que preocuparse de dónde o cómo se ejecuta nuestra solución. Esta sesión introduce Azure Functions y varios escenarios posibles que podemos aplicar a SharePoint y Office 365.
Neuro-symbolic is not enough, we need neuro-*semantic*Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this illustrated with link prediction over knowledge graphs, but the argument is general.
"Impact of front-end architecture on development cost", Viktor TurskyiFwdays
I have heard many times that architecture is not important for the front-end. Also, many times I have seen how developers implement features on the front-end just following the standard rules for a framework and think that this is enough to successfully launch the project, and then the project fails. How to prevent this and what approach to choose? I have launched dozens of complex projects and during the talk we will analyze which approaches have worked for me and which have not.
UiPath Test Automation using UiPath Test Suite series, part 4DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
DevOps and Testing slides at DASA ConnectKari Kakkonen
My and Rik Marselis slides at 30.5.2024 DASA Connect conference. We discuss about what is testing, then what is agile testing and finally what is Testing in DevOps. Finally we had lovely workshop with the participants trying to find out different ways to think about quality and testing in different parts of the DevOps infinity loop.
The Art of the Pitch: WordPress Relationships and SalesLaura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if sometime changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
Key Trends Shaping the Future of Infrastructure.pdfCheryl Hung
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud and open-source; exploring how these areas are likely to mature and develop over the short and long-term, and then considering how organisations can position themselves to adapt and thrive.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Ramesh Iyer
In today's fast-changing business world, Companies that adapt and embrace new ideas often need help to keep up with the competition. However, fostering a culture of innovation takes much work. It takes vision, leadership and willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
Let's dive deeper into the world of ODC! Ricardo Alves (OutSystems) will join us to tell all about the new Data Fabric. After that, Sezen de Bruijn (OutSystems) will get into the details on how to best design a sturdy architecture within ODC.
Essentials of Automations: Optimizing FME Workflows with ParametersSafe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
GraphRAG is All You need? LLM & Knowledge GraphGuy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Global AI Bootcamp Madrid - Azure Databricks
1. AI with Azure Databricks
Alberto Diaz Martin
CTIO at ENCAMINA and Microsoft Azure MVP
2. Alberto Diaz Martin
adiazcan@hotmail.com - @adiazcan
Alberto Diaz has more than 15 years of experience in the IT industry, all of it working with
Microsoft technologies. He is currently Chief Technology Innovation Officer at ENCAMINA,
leading software development with Microsoft technology, and a member of the management team.
For the community, he works as an organizer and speaker at the most relevant Microsoft
conferences in Spain, where he is one of the leading voices on SharePoint, Office 365 and
Azure. Author of several books and of articles in professional magazines and blogs, in 2013 he
joined the editorial team of CompartiMOSS, a digital magazine about Microsoft technologies.
He has been named a Microsoft MVP every year since 2011, a recognition he has renewed for the
seventh consecutive year. He describes himself as a geek, smartphone enthusiast and developer.
Founder of TenerifeDev (www.tenerifedev.com), a .NET user group in Tenerife, and coordinator
of SUGES (the Spanish SharePoint User Group, www.suges.es).
9. Machine Learning on Azure
• Sophisticated pretrained models, to simplify solution development: Vision, Speech, Language, Search, and more
• Popular frameworks, to build advanced deep learning solutions: TensorFlow, Keras, PyTorch, ONNX
• Productive services, to empower data science and development teams: Azure Databricks, Azure Machine Learning, VMs
• Powerful infrastructure, to accelerate deep learning
• Flexible deployment, to deploy and manage models on the intelligent cloud and edge: on-premises, cloud, edge
10. Recommended architecture to build end-to-end ML solutions
• Ingest: Azure Data Factory (batch data), Azure Event Hubs (streaming data)
• Store: Azure Data Lake Storage; operational databases (Cosmos DB, SQL DB)
• Prep and train: Azure Databricks, Azure Machine Learning service
• Serve: Azure Kubernetes Service (model serving for apps), Azure SQL Data Warehouse, Azure Analysis Services and Power BI (ad-hoc analysis)
11. What is Azure Databricks?
A fast, easy and collaborative Apache® Spark™-based analytics platform optimized for Azure
Best of Databricks:
• Designed in collaboration with the founders of Apache Spark
• One-click setup; streamlined workflows
• Interactive workspace that enables collaboration between data scientists, data engineers, and business analysts
Best of Microsoft:
• Native integration with Azure services (Power BI, SQL DW, Cosmos DB, ADLS, Azure Storage, Azure Data Factory, Azure AD, Event Hubs, IoT Hub, HDInsight Kafka, SQL DB)
• Enterprise-grade Azure security (Active Directory integration, compliance, enterprise-grade SLAs)
12. Optimized Databricks Runtime Engine
Apache Spark with Databricks I/O and high-concurrency optimizations
Data in: cloud storage, data warehouses, Hadoop storage, IoT/streaming data
Data out: REST APIs, machine learning models, BI tools, data exports, data warehouses
• Enhance productivity: a collaborative workspace for data engineers, data scientists, and business analysts
• Deploy production jobs and workflows: multi-stage pipelines, job scheduler, notifications and logs
• Build on a secure and trusted cloud; scale without limits
14. Your Language, Your Data (Anywhere), Your Format
• SQL, Python, Scala and R support: code in your favorite language
• Read and write data from/to multiple sources: file systems, object stores, HDFS, databases, pub-sub systems and others; optimized for Azure Blob Storage, ADLS, SQL DW, Event Hubs and Cosmos DB
• File formats: CSV, JSON, Parquet, Text, ORC, XML and more
16. PREP & TRAIN
A. Collect and prepare data: Azure Data Factory, Azure Databricks
B. Train and evaluate model: Azure Databricks
C. Operationalize and manage: Azure Databricks, Azure ML Services
17. Collect and prepare all of your data at scale
Ingest with Azure Data Factory (connect to data from any source):
• Create hybrid pipelines
• Orchestrate in a code-free environment
Store in Azure Blob Storage (scale without limits):
• Leverage scale-out topology
• Scale compute and storage separately
• Integrate with all of your data sources
Understand and transform with Azure Databricks (leverage best-in-class analytics capabilities):
• Leverage open source technologies
• Collaborate within teams
• Use ML on batch and streaming data
• Build in the language of your choice
19. Train and evaluate machine learning models
Scale compute resources to meet your needs (scale-out clusters: Azure Databricks infrastructure):
• Easily scale up or scale out
• Autoscale on serverless infrastructure
• Leverage commodity hardware
Quickly determine the right model for your data (automated ML capabilities: Azure ML Services):
• Determine the best algorithm
• Tune hyperparameters to optimize models
Simplify model development (machine learning tools: Azure Databricks):
• Rapidly prototype in agile environments
• Collaborate in interactive workspaces
• Access a library of battle-tested models
• Automate job execution
20. Spark Machine Learning (ML) Overview
Enables parallel, distributed ML for large datasets on Spark clusters:
• Offers a set of parallelized machine learning algorithms
• Supports model selection (hyperparameter tuning) using cross-validation and train-validation split
• Supports Java, Scala and Python apps using the DataFrame-based API (as of Spark 2.0). Benefits include:
  • A uniform API across ML algorithms and across multiple languages
  • ML pipelines that combine multiple algorithms into a single pipeline
  • Optimizations through Tungsten and Catalyst
• Spark MLlib comes pre-installed on Azure Databricks
• Supported third-party libraries include H2O Sparkling Water, scikit-learn and XGBoost
21. Why use Azure Databricks for Machine Learning?
• A complete platform in one: data ingestion, exploration, transformation, featurization, model building, model tuning, and even model serving
• No need to copy your data into another system to do ML on it
• Data scientists like the ease of use of the platform
• Deep learning algorithms are now available
• Productionization features are built in
25. Operationalize and manage models with ease
Proactively manage model performance (model management, experimentation, and run history: Azure ML Services):
• Identify and promote your best models
• Capture model telemetry
• Retrain models with APIs
Deploy models closer to your data (containers: AKS, ACI, Docker; IoT Edge):
• Deploy models anywhere
• Scale out to containers
• Infuse intelligence into the IoT edge
Bring models to life quickly (train and evaluate models: Azure Databricks):
• Build and deploy models in minutes
• Iterate quickly on serverless infrastructure
• Easily change environments
26. ML Model Export
• ML Model Export allows you to export models and full ML pipelines
• Exported models can then be imported into Spark and non-Spark platforms to do scoring and make predictions
• Targeted at low-latency, lightweight ML-powered applications
• We recommend MLeap, an open source solution for ML model export that works well in Azure Databricks
27. Build and deploy deep learning models
Scale compute resources in any environment (scale-out clusters: Azure Databricks, Batch AI):
• Choose VMs for your modeling needs
• Process video using GPU-based VMs
• Run experiments in parallel
• Provision resources automatically
Quickly evaluate and identify the right model (notebooks: Azure Databricks):
• Leverage popular deep learning toolkits (Microsoft Cognitive Toolkit, Keras, TensorFlow, PyTorch)
Streamline AI development efforts (Azure ML Services):
• Develop in your language of choice
29. Azure Databricks for deep learning modeling
Ready-to-use clusters with Azure Databricks Runtime for ML
Frameworks:
• Seamlessly use TensorFlow, Microsoft Cognitive Toolkit, Caffe2, Keras, and more
• Use HorovodEstimator via a native runtime to build deep learning models with a few lines of code
• Full Python and Scala support for transfer learning on images
• Load images natively into Spark DataFrames to automatically decode them for manipulation at scale, with distributed DNN training on Spark
Tools:
• Use built-in hyperparameter tuning via Spark MLlib to quickly optimize the model
• Simultaneously collaborate within notebook environments to streamline model development
Infrastructure:
• Leverage powerful GPU-enabled VMs pre-configured for deep neural network training
• Automatically store metadata in Azure Database with geo-replication for fault tolerance
• Improve performance 10x-100x over traditional Spark deployments with an optimized environment
30. Deep Learning
Azure Databricks supports and integrates with a number of deep learning libraries and frameworks to make it easy to build and deploy deep learning applications. Supported libraries/frameworks include:
• Microsoft Cognitive Toolkit (CNTK); an article explains how to install CNTK on Azure Databricks
• TensorFlowOnSpark
• BigDL
It also offers Spark Deep Learning Pipelines, a suite of tools for working with and processing images using deep learning and transfer learning. It includes high-level APIs for common aspects of deep learning so they can be done efficiently in a few lines of code:
• Distributed hyperparameter tuning
• Transfer learning
31. Azure Databricks
A fast, easy, and collaborative Apache Spark™-based analytics platform, built with your needs in mind:
• Role-based access controls
• Effortless autoscaling
• Live collaboration
• Enterprise-grade SLAs
• Best-in-class notebooks
• Simple job scheduling
• Seamlessly integrated with the Azure portfolio
Increase productivity. Build on a secure, trusted cloud. Scale without limits.