Workshop Slides Follow-Up:
Comparing Technologies
Jen Stirrup
Data Whisperer,
Data Relish
Level: 300
Big Data
What is Big Data?
“Big data is a collection of data sets so large
and complex that it becomes awkward to work
with using on-hand database management
tools.
Difficulties include capture, storage, search,
sharing, analysis, and visualization.”
– Wikipedia
Examples
Enormous amounts of data:
• Online behavior of social networking users
• Samples of medical ailments
• Purchasing habits of grocery shoppers
• Crime statistics of cities
• “Internet of Things” (IoT) devices
• 24/7 out-patient monitors
• Real-time telemetry devices
fully featured RDBMS
transactional processing
rich query
managed as a service
elastic scale
internet accessible http/rest
schema-free data model
arbitrary data formats
Apache Spark
What is Apache Spark?
Apache Spark solves a problem
There's no need to structure everything as map and reduce operations.
Apache Spark
• Interactive manipulation and visualization
of data
– Scala, Python, and R Interactive Shells
– Jupyter Notebook with PySpark (Python) and
Spark (Scala) kernels provide in-browser
interaction
Apache Spark
• Unified platform for processing multiple
workloads
– Real-time processing, Machine Learning,
Stream Analytics, Interactive Querying,
Graph Processing
Apache Spark
• Leverages in-memory processing for
really big data
– Resilient distributed datasets (RDDs)
– APIs for processing large datasets
– Up to 100x faster than Hadoop MapReduce for in-memory workloads
What is Spark?
• An open-source software solution that
performs rapid calculations on in-memory
datasets
• The RDD (Resilient Distributed Dataset) is the
basis for what Spark enables
– Resilient
– Distributed
Example RDD Transformations
• map(func)
• filter(func)
• distinct()
Example RDD Actions
• count()
• reduce(func)
• collect()
• take(n)
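The split between transformations and actions can be illustrated without a cluster. The sketch below is a plain-Python stand-in, not Spark itself: `MiniRDD` is a hypothetical class invented for this example that mimics how RDD transformations are lazy (they only describe work) while actions trigger evaluation and return a value. In real PySpark the same method names would be called on an RDD created from a SparkContext.

```python
from functools import reduce as _reduce
from itertools import islice


class MiniRDD:
    """Hypothetical single-machine stand-in for a Spark RDD (illustration only)."""

    def __init__(self, source):
        # `source` is a zero-argument callable returning a fresh iterator,
        # so the whole pipeline is re-evaluated each time an action runs.
        self._source = source

    @classmethod
    def from_list(cls, items):
        data = list(items)
        return cls(lambda: iter(data))

    # Transformations: lazy, each returns a new MiniRDD.
    def map(self, func):
        return MiniRDD(lambda: (func(x) for x in self._source()))

    def filter(self, func):
        return MiniRDD(lambda: (x for x in self._source() if func(x)))

    def distinct(self):
        def gen():
            seen = set()
            for x in self._source():
                if x not in seen:
                    seen.add(x)
                    yield x
        return MiniRDD(gen)

    # Actions: evaluate the pipeline and return a plain Python value.
    def collect(self):
        return list(self._source())

    def count(self):
        return sum(1 for _ in self._source())

    def take(self, n):
        return list(islice(self._source(), n))

    def reduce(self, func):
        return _reduce(func, self._source())


readings = MiniRDD.from_list([3, 8, 8, 15, 4, 15, 22])

# Nothing is computed here; the three transformations are only chained.
large = readings.filter(lambda x: x > 5).distinct().map(lambda x: x * 2)

# Each action below walks the chain and produces a result.
print(large.collect())
print(large.count())
print(large.take(2))
print(large.reduce(lambda a, b: a + b))
```

Unlike this stand-in, a real RDD is partitioned across worker nodes and can be rebuilt from its lineage if a node fails, which is what the "Resilient" and "Distributed" in the name refer to.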
HDInsight
HDInsight Cluster Types
• Hadoop: Query workloads
– Reliable data storage, simple MapReduce
• HBase: NoSQL workloads
– Distributed database offering random access to large
amounts of data
• Apache Storm: Stream workloads
– Real-time analysis of moving data streams
• Apache Spark: High-performance workloads
– In-memory parallel processing
Azure Databricks
What is Databricks?
• Databricks provides an end-to-end,
managed Apache Spark platform
optimized for the cloud
• Improved performance of Spark jobs in
the cloud by 10 – 100x
• Cost-efficient way to run large-scale Spark
workloads
Databricks for Big Data
• Data Scientists get an interactive
notebook environment
• Good monitoring suite
• Security Controls to facilitate thousands of
users
Databricks for Data Engineers
• Databricks Runtime adds increased
performance to Apache Spark workloads
when running on Azure
• Auto-scaling and auto-termination for
Spark clusters to automatically minimize
costs
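As a rough illustration of the auto-scaling and auto-termination settings, the sketch below builds a cluster definition as a Python dictionary. The field names (`autoscale`, `min_workers`, `max_workers`, `autotermination_minutes`) follow the Databricks Clusters REST API as best understood here and should be checked against the current API reference. The helper `build_cluster_spec`, the cluster name and the placeholder values for runtime version and VM size are assumptions made for this example.

```python
import json


def build_cluster_spec(name, min_workers, max_workers, idle_minutes,
                       spark_version="<runtime-version>",
                       node_type_id="<vm-size>"):
    """Return a cluster definition with autoscaling and auto-termination set.

    Hypothetical helper: it only builds and validates a dictionary and does
    not call any Databricks service.
    """
    if not 0 < min_workers <= max_workers:
        raise ValueError("expected 0 < min_workers <= max_workers")
    if idle_minutes <= 0:
        raise ValueError("idle_minutes must be positive")
    return {
        "cluster_name": name,
        "spark_version": spark_version,  # placeholder, not a real version string
        "node_type_id": node_type_id,    # placeholder, not a real VM size
        # The cluster may grow and shrink between these two bounds.
        "autoscale": {"min_workers": min_workers, "max_workers": max_workers},
        # The cluster shuts down after this many idle minutes.
        "autotermination_minutes": idle_minutes,
    }


spec = build_cluster_spec("nightly-etl", min_workers=2, max_workers=8,
                          idle_minutes=30)
print(json.dumps(spec, indent=2))
```

Keeping the lower bound small and the idle timeout short is what lets the two features reduce cost: the cluster only pays for extra workers while a job needs them, and for nothing once it has been idle past the timeout.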
Databricks for Data Science
• Notebooks have real-time collaboration
and are multi-editable for productivity
• Integration with Power BI for data
visualization
• Supported by Azure Database
Credit: https://databricks.com/blog/2017/11/15/a-technical-overview-of-azure-databricks.html
Why is Databricks in Azure?
• Close integration with Azure services
• Optimized connectors
• One-click management directly from the
Azure console
• Azure Databricks will greatly simplify building
enterprise-grade production data
applications
Azure and Databricks together
• Azure launches and manages worker
nodes in the customer subscriptions
• Customer launches a cluster, which
initiates a Databricks appliance
• A managed resource group is deployed
with a VNet, a security group and a
storage account
Azure and Databricks Together
• Close Integration to provide an enterprise
platform
• Use all existing VMs
• Security and privacy remain with the
customer
• Network topology is flexible
Azure and Databricks together
• Azure Storage and Azure Data Lake
integration
• Azure Power BI
• Azure Active Directory
• Azure SQL Data Warehouse, Azure SQL
DB, Azure Cosmos DB
Azure and Databricks Together
• Metadata is stored in an Azure Database
with geo-replication
• Databricks cluster is managed through
Azure Databricks UI
Azure and Databricks Together
• Azure Container Services to run the
control plane and data planes via
containers
• Accelerated Networking
• Latest generation Azure hardware for
performance
Why Azure Databricks?
• Collaboration
• Trusted Cloud
• Scalability
Azure Databricks
Fast, easy and collaborative Apache Spark-based analytics service
https://blogs.microsoft.com/ai/shell-iot-ai-safety-intelligent-tools/
Shell Case Study
Azure Databricks
● Unlock insights from all your data and build
artificial intelligence (AI) solutions with
Azure Databricks
● Azure Databricks supports Python, Scala,
R, Java and SQL, as well as data science
frameworks and libraries including
TensorFlow, PyTorch and scikit-learn.
Azure Databricks
● Fast, optimised Apache Spark environment
● Interactive workspace with built-in support
for popular tools, languages and
frameworks
Azure Databricks
● Supercharged machine learning on big data
with native Azure Machine Learning
integration
● High-performance modern data
warehousing in conjunction with Azure SQL
Data Warehouse
Azure Databricks
● Start quickly with an optimised Apache
Spark environment
● Spin up clusters and build quickly in a fully
managed Apache Spark environment with
the global scale and availability of Azure
● Use autoscaling and auto-termination to
improve total cost of ownership (TCO)
Azure Databricks
● Turbocharge machine learning on big data
● Get high-performance modern data
warehousing
Azure Databricks & HDInsight
● Databricks is focused on collaboration, streaming and batch with a notebook
experience for the user. It integrates well with Azure, has AAD authentication, and
can export to SQL DWH, Cosmos DB, Power BI, etc. Databricks’ greatest strengths
are its zero-management cloud solution and the collaborative, interactive
environment it provides in the form of notebooks.
● HDInsight has Kafka, Storm and Hive LLAP, which Databricks doesn’t have. It is
better for processing very large datasets and in a way that allows the user to just “let
it run”.
● The two technologies are sometimes used together. Databricks is more user-friendly
and easier to work with, so it is better for exploration, whereas HDInsight is
better for large-scale data processing.
Azure Databricks & HDInsight - Pricing
HDInsight:
● Billed on a per-minute basis. Clusters run groups of nodes depending on the
component. Nodes vary by group (e.g. Worker Node, Head Node), quantity and
instance type (e.g. D1v2).
Component | Pricing
Hadoop, Spark, Interactive Query, Kafka, Storm, HBase | Base price/node-hour
HDInsight Machine Learning services | Base price/node-hour + £0.012/core-hour
Enterprise Security Package | Base price/node-hour + £0.008/core-hour
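The pricing rule above (a base price per node-hour, plus an optional surcharge per core-hour, billed per minute) can be sketched as a small calculation. The two surcharge values come from the table. The function name `hdinsight_cost`, the node count, the core count and the base price of £0.25/node-hour are hypothetical values chosen only to make the arithmetic concrete.

```python
def hdinsight_cost(nodes, cores_per_node, base_price_per_node_hour,
                   surcharge_per_core_hour=0.0, minutes=60):
    """Estimate the cost (in £) of one node group for a number of minutes."""
    # The surcharge is charged per core, so it scales with the VM size.
    per_node_hour = (base_price_per_node_hour
                     + cores_per_node * surcharge_per_core_hour)
    # Per-minute billing: pro-rate the hourly price.
    return nodes * per_node_hour * (minutes / 60)


ESP_SURCHARGE = 0.008  # £/core-hour, Enterprise Security Package (from the table)
ML_SURCHARGE = 0.012   # £/core-hour, Machine Learning services (from the table)

# Hypothetical node group: 4 workers with 8 cores each,
# assumed base price of £0.25/node-hour, running for 90 minutes.
plain = hdinsight_cost(4, 8, 0.25, minutes=90)
with_esp = hdinsight_cost(4, 8, 0.25, ESP_SURCHARGE, minutes=90)

print(f"Base only: £{plain:.3f}")
print(f"With ESP:  £{with_esp:.3f}")
```

A full cluster estimate would repeat this calculation for each node group (head nodes, worker nodes and so on) and sum the results.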
Azure Databricks & HDInsight - Pricing
Databricks:
● Azure Databricks bills you for virtual machines (VMs) provisioned in clusters and
Databricks Units (DBUs) based on the VM instance selected. A DBU is a unit of
processing capability, billed on per-second usage. The DBU consumption
depends on the size and type of instance running Azure Databricks.
Workload | Standard Tier prices | Premium Tier prices
Data Analytics | £0.30/DBU-hour | £0.410/DBU-hour
Data Engineering | £0.12/DBU-hour | £0.224/DBU-hour
Data Engineering Light | £0.06/DBU-hour | £0.164/DBU-hour
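Because the bill has two parts (VM time plus DBUs), a short calculation may help show how they combine. The DBU rates below are copied from the table. The function `databricks_cost`, the VM price of £0.40/hour and the consumption of 1.5 DBUs per VM-hour are hypothetical figures for illustration; the real DBU consumption depends on the instance type selected.

```python
# £ per DBU-hour, keyed by (workload, tier), as listed in the pricing table.
DBU_RATES = {
    ("Data Analytics", "Standard"): 0.30,
    ("Data Analytics", "Premium"): 0.410,
    ("Data Engineering", "Standard"): 0.12,
    ("Data Engineering", "Premium"): 0.224,
    ("Data Engineering Light", "Standard"): 0.06,
    ("Data Engineering Light", "Premium"): 0.164,
}


def databricks_cost(workload, tier, vm_count, vm_price_per_hour,
                    dbu_per_vm_hour, hours):
    """Estimate the cost (in £) of a cluster run: VM charge plus DBU charge."""
    vm_cost = vm_count * vm_price_per_hour * hours
    dbus_used = vm_count * dbu_per_vm_hour * hours
    dbu_cost = dbus_used * DBU_RATES[(workload, tier)]
    return vm_cost + dbu_cost


# Hypothetical job: 4 VMs at an assumed £0.40/hour, each consuming
# an assumed 1.5 DBUs per hour, running for 2 hours.
standard = databricks_cost("Data Engineering", "Standard", 4, 0.40, 1.5, 2)
premium = databricks_cost("Data Engineering", "Premium", 4, 0.40, 1.5, 2)

print(f"Standard tier: £{standard:.2f}")
print(f"Premium tier:  £{premium:.2f}")
```

The VM charge is the same in both tiers; only the DBU rate changes, so the tier matters more as DBU consumption grows.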
Azure Databricks & HDInsight - Pricing
Azure Databricks also offers a pre-purchase plan. You can get up to 37% savings
over pay-as-you-go DBU prices when you pre-purchase Azure Databricks Units
(DBU) as Databricks Commit Units (DBCU) for either 1 or 3 years.
HDInsight does not offer a pre-purchase plan.
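As a quick worked example of the pre-purchase saving, the sketch below applies a discount to a pay-as-you-go DBU spend. The 37% figure is the stated maximum, so the actual discount may be lower. The function name `prepurchase_dbu_cost` and the £10,000 annual spend are hypothetical.

```python
MAX_DISCOUNT = 0.37  # "up to 37%" over pay-as-you-go DBU prices


def prepurchase_dbu_cost(payg_dbu_cost, discount=MAX_DISCOUNT):
    """Return the DBU cost (in £) after a pre-purchase discount."""
    if not 0 <= discount <= MAX_DISCOUNT:
        raise ValueError("discount must be between 0 and the 37% maximum")
    return payg_dbu_cost * (1 - discount)


# Hypothetical: £10,000 of pay-as-you-go DBU spend in a year.
best_case = prepurchase_dbu_cost(10_000)
print(f"Best case with pre-purchase: £{best_case:,.2f}")
```

The discount is stated against DBU prices, so this sketch leaves the VM portion of the bill out of the calculation.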
Azure Databricks & HDInsight - Speed
Azure Databricks adds a series of performance enhancements on top of regular Apache
Spark, which itself can run up to 100x faster than Hadoop MapReduce for in-memory
workloads. As a result, it can be faster than open-source Spark on the same workload.
HDInsight is very effective at rapidly collecting large amounts of data, and with it you can
quickly spin up open source projects and clusters, with no hardware to install or
infrastructure to manage. However, some processes can be slightly slower with
HDInsight than with Databricks.
Azure Databricks & HDInsight - Hadoop
HDInsight uses Apache Hadoop, which is an open-source distributed
data analysis solution. Hadoop manages the processing of large
datasets across large clusters of computers and it detects and
handles failures.
Why Hadoop?
Azure provides dynamic machines that are billed only when active.
This enables elastic computing, where you can add machines for
particular workloads or projects and then remove them when not
needed. HDInsight can take advantage of this scalable platform. It can
also capitalize on the security and management features of Azure,
such as integration with Azure Active Directory and Log Analytics.
Azure Databricks & HDInsight - Hadoop
You can also make use of Hadoop with
Azure Databricks, but as a storage
function, rather than a function for data
analysis and management.
Azure Databricks & HDInsight - Learning Curve
● Databricks is a good technology to use
regardless of the user's or developer's
previous experience.
Databricks’ vision is to make big data
easy so that every organization can
use it. It aims to make complex systems
easier to work with and manage.
Azure Databricks & HDInsight - Learning Curve
• There is more of a learning curve when it
comes to HDInsight.
• Generally, comprehensive training is
required, and a background knowledge of
SQL is very helpful.
Azure Databricks & HDInsight - Languages
• While Azure Databricks is Spark based, it
allows commonly-used programming
languages like Python, R, and SQL to be
used. These languages are converted in
the backend through APIs, to interact with
Spark.
Azure Databricks & HDInsight - Languages
HDInsight clusters, including Spark, HBase, Kafka, Hadoop, and others, support many
programming languages. Some programming languages aren't installed by default. For
libraries, modules, or packages that are not installed by default, you need to use a script
action to install the component.
By default, HDInsight supports:
● Java
● Python
● .NET
● Go
HDInsight also supports Hadoop-specific languages - Pig, HiveQL and SparkSQL.
Feature | Azure Databricks | HDInsight
Pricing | Per cluster time (VM cost + DBU processing time) | Per cluster time
Engine | Apache Spark, optimized for Databricks | Apache Spark or Apache Hive
Default Environment | Databricks Notebooks, RStudio for Databricks | Ambari, or Zeppelin if using Spark
De Facto Language | R, Python, Scala, Java, SQL, mostly open-source languages | HiveQL, open source
Integration with Data Factory | Yes, to run notebooks or Spark scripts | Yes, to run MapReduce jobs, Pig, and Spark scripts
Scalability | Easy to change machines, allows autoscaling | Not scalable
Testing | Very easy, notebook functionality is extremely flexible | Easy, Ambari allows interactive query execution
Setup and Managing | Easy: clusters can be modified easily and Databricks offers two main types of services | Complex: must decide cluster types and sizes
Learning Curve | Very flexible | Flexible if user knows SQL
Azure Databricks and Data Lake Analytics
Both Databricks and DLA can be used for batch processing. How can we decide
which to choose over the other?
Azure Databricks and Data Lake Analytics
Data Lake Analytics is a distributed computing resource that uses the U-SQL
language to carry out complex transformations and load the data into
Azure or non-Azure databases and file systems. Data Lake Analytics combines the power
of distributed processing with the ease of a SQL-like language, making it suitable for ad hoc
data processing.
Preferred use cases for DLA:
● Large amounts of data where conversion and loading are the only actions needed
● Processing data from relational databases into Azure
● Repetitive loads with no intermediary action
Azure Databricks and Data Lake Analytics
Azure Databricks is a notebook-based resource that lets you set up high-performance
clusters, which perform computation using an in-memory architecture. Users
can choose from a wide variety of programming languages and use their favorite libraries
to perform transformations, data type conversions and modeling. Databricks also comes
with extensive API connectivity options, which enable connections to various data sources
including SQL and NoSQL databases, file systems and more.
Preferred use cases for Databricks:
● Processes that require intermediary analysis of data
● ETL that requires more visibility during data transformation and modeling
Feature | Data Lake Analytics | Databricks
Cost Control | Pay-as-you-go | Manual
Development Tool | IDE + SDK based (U-SQL supported) | Notebook type
Payment | Per job | Cluster properties, time duration and workload
Scaling | Auto-scaling based on data | Auto-scaling for jobs running on cluster
Data Storage | Internal database available | Databricks File System, Direct Access (Storage)
Manage Usage | Portal (preferred); Azure SDK; Python; Java; Node.js; .NET | Spark framework: Scala, Java, R and Python; Spark SQL
Monitoring Jobs | Azure Portal, Visual Studio | Within Databricks
Functionalities | Scheduling jobs, including in Data Factory pipelines (U-SQL scripts) | Scheduling jobs, including in Data Factory pipelines (Databricks notebooks)
Data Lake Analytics and HDInsight
These technologies can be used together.
HDInsight is the analytics service, whereas Azure Data Lake Storage is the storage
service. You most likely need both to have a functional analytics cluster.
HDInsight provides the cluster and fully manages the open-source packages for analytics
(Hadoop, Spark, etc.). You set up your cluster to use Azure Data Lake Storage, which
supports the HDFS (Hadoop Distributed File System) API on top of cloud storage.
Essentially, HDInsight is a managed Hadoop service that provides compute, and
Azure Data Lake Storage is a managed storage service that provides large amounts of storage.

More Related Content

What's hot

Running cost effective big data workloads with Azure Synapse and ADLS (MS Ign...
Running cost effective big data workloads with Azure Synapse and ADLS (MS Ign...Running cost effective big data workloads with Azure Synapse and ADLS (MS Ign...
Running cost effective big data workloads with Azure Synapse and ADLS (MS Ign...Michael Rys
 
Big data on AWS
Big data on AWSBig data on AWS
Big data on AWSStylight
 
TechEvent Databricks on Azure
TechEvent Databricks on AzureTechEvent Databricks on Azure
TechEvent Databricks on AzureTrivadis
 
Big data on Azure for Architects
Big data on Azure for ArchitectsBig data on Azure for Architects
Big data on Azure for ArchitectsTomasz Kopacz
 
Azure Databricks - An Introduction (by Kris Bock)
Azure Databricks - An Introduction (by Kris Bock)Azure Databricks - An Introduction (by Kris Bock)
Azure Databricks - An Introduction (by Kris Bock)Daniel Toomey
 
Building Modern Data Platform with Microsoft Azure
Building Modern Data Platform with Microsoft AzureBuilding Modern Data Platform with Microsoft Azure
Building Modern Data Platform with Microsoft AzureDmitry Anoshin
 
Big Data and Data Warehousing Together with Azure Synapse Analytics (SQLBits ...
Big Data and Data Warehousing Together with Azure Synapse Analytics (SQLBits ...Big Data and Data Warehousing Together with Azure Synapse Analytics (SQLBits ...
Big Data and Data Warehousing Together with Azure Synapse Analytics (SQLBits ...Michael Rys
 
Modern data warehouse with Azure
Modern data warehouse with AzureModern data warehouse with Azure
Modern data warehouse with AzureNilesh Gule
 
Azure Databricks—Apache Spark as a Service with Sascha Dittmann
Azure Databricks—Apache Spark as a Service with Sascha DittmannAzure Databricks—Apache Spark as a Service with Sascha Dittmann
Azure Databricks—Apache Spark as a Service with Sascha DittmannDatabricks
 
DataMinds 2022 Azure Purview Erwin de Kreuk
DataMinds 2022 Azure Purview Erwin de KreukDataMinds 2022 Azure Purview Erwin de Kreuk
DataMinds 2022 Azure Purview Erwin de KreukErwin de Kreuk
 
Big Data Analytics in the Cloud with Microsoft Azure
Big Data Analytics in the Cloud with Microsoft AzureBig Data Analytics in the Cloud with Microsoft Azure
Big Data Analytics in the Cloud with Microsoft AzureMark Kromer
 
AWS Cloud Kata 2013 | Singapore - Getting to Scale on AWS
AWS Cloud Kata 2013 | Singapore - Getting to Scale on AWSAWS Cloud Kata 2013 | Singapore - Getting to Scale on AWS
AWS Cloud Kata 2013 | Singapore - Getting to Scale on AWSAmazon Web Services
 
Develop scalable analytical solutions with Azure Data Factory & Azure SQL Dat...
Develop scalable analytical solutions with Azure Data Factory & Azure SQL Dat...Develop scalable analytical solutions with Azure Data Factory & Azure SQL Dat...
Develop scalable analytical solutions with Azure Data Factory & Azure SQL Dat...Microsoft Tech Community
 
Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)James Serra
 
Machine Learning and AI
Machine Learning and AIMachine Learning and AI
Machine Learning and AIJames Serra
 
Cloud Big Data Architectures
Cloud Big Data ArchitecturesCloud Big Data Architectures
Cloud Big Data ArchitecturesLynn Langit
 
Big Data Adavnced Analytics on Microsoft Azure
Big Data Adavnced Analytics on Microsoft AzureBig Data Adavnced Analytics on Microsoft Azure
Big Data Adavnced Analytics on Microsoft AzureMark Tabladillo
 
Data & analytics challenges in a microservice architecture
Data & analytics challenges in a microservice architectureData & analytics challenges in a microservice architecture
Data & analytics challenges in a microservice architectureNiels Naglé
 

What's hot (20)

Running cost effective big data workloads with Azure Synapse and ADLS (MS Ign...
Running cost effective big data workloads with Azure Synapse and ADLS (MS Ign...Running cost effective big data workloads with Azure Synapse and ADLS (MS Ign...
Running cost effective big data workloads with Azure Synapse and ADLS (MS Ign...
 
Modern data warehouse
Modern data warehouseModern data warehouse
Modern data warehouse
 
Big data on AWS
Big data on AWSBig data on AWS
Big data on AWS
 
TechEvent Databricks on Azure
TechEvent Databricks on AzureTechEvent Databricks on Azure
TechEvent Databricks on Azure
 
Big data on Azure for Architects
Big data on Azure for ArchitectsBig data on Azure for Architects
Big data on Azure for Architects
 
Azure Databricks - An Introduction (by Kris Bock)
Azure Databricks - An Introduction (by Kris Bock)Azure Databricks - An Introduction (by Kris Bock)
Azure Databricks - An Introduction (by Kris Bock)
 
Building Modern Data Platform with Microsoft Azure
Building Modern Data Platform with Microsoft AzureBuilding Modern Data Platform with Microsoft Azure
Building Modern Data Platform with Microsoft Azure
 
Big Data and Data Warehousing Together with Azure Synapse Analytics (SQLBits ...
Big Data and Data Warehousing Together with Azure Synapse Analytics (SQLBits ...Big Data and Data Warehousing Together with Azure Synapse Analytics (SQLBits ...
Big Data and Data Warehousing Together with Azure Synapse Analytics (SQLBits ...
 
Modern data warehouse with Azure
Modern data warehouse with AzureModern data warehouse with Azure
Modern data warehouse with Azure
 
Azure Databricks—Apache Spark as a Service with Sascha Dittmann
Azure Databricks—Apache Spark as a Service with Sascha DittmannAzure Databricks—Apache Spark as a Service with Sascha Dittmann
Azure Databricks—Apache Spark as a Service with Sascha Dittmann
 
DataMinds 2022 Azure Purview Erwin de Kreuk
DataMinds 2022 Azure Purview Erwin de KreukDataMinds 2022 Azure Purview Erwin de Kreuk
DataMinds 2022 Azure Purview Erwin de Kreuk
 
Big Data Analytics in the Cloud with Microsoft Azure
Big Data Analytics in the Cloud with Microsoft AzureBig Data Analytics in the Cloud with Microsoft Azure
Big Data Analytics in the Cloud with Microsoft Azure
 
AWS Cloud Kata 2013 | Singapore - Getting to Scale on AWS
AWS Cloud Kata 2013 | Singapore - Getting to Scale on AWSAWS Cloud Kata 2013 | Singapore - Getting to Scale on AWS
AWS Cloud Kata 2013 | Singapore - Getting to Scale on AWS
 
Develop scalable analytical solutions with Azure Data Factory & Azure SQL Dat...
Develop scalable analytical solutions with Azure Data Factory & Azure SQL Dat...Develop scalable analytical solutions with Azure Data Factory & Azure SQL Dat...
Develop scalable analytical solutions with Azure Data Factory & Azure SQL Dat...
 
Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)
 
Machine Learning and AI
Machine Learning and AIMachine Learning and AI
Machine Learning and AI
 
Cloud Big Data Architectures
Cloud Big Data ArchitecturesCloud Big Data Architectures
Cloud Big Data Architectures
 
Big Data Adavnced Analytics on Microsoft Azure
Big Data Adavnced Analytics on Microsoft AzureBig Data Adavnced Analytics on Microsoft Azure
Big Data Adavnced Analytics on Microsoft Azure
 
Data & analytics challenges in a microservice architecture
Data & analytics challenges in a microservice architectureData & analytics challenges in a microservice architecture
Data & analytics challenges in a microservice architecture
 
Azure Synapse Analytics
Azure Synapse AnalyticsAzure Synapse Analytics
Azure Synapse Analytics
 

Similar to Comparing Microsoft Big Data Platform Technologies

Azure databricks c sharp corner toronto feb 2019 heather grandy
Azure databricks c sharp corner toronto feb 2019 heather grandyAzure databricks c sharp corner toronto feb 2019 heather grandy
Azure databricks c sharp corner toronto feb 2019 heather grandyNilesh Shah
 
1 Introduction to Microsoft data platform analytics for release
1 Introduction to Microsoft data platform analytics for release1 Introduction to Microsoft data platform analytics for release
1 Introduction to Microsoft data platform analytics for releaseJen Stirrup
 
Microsoft Azure update
Microsoft Azure updateMicrosoft Azure update
Microsoft Azure updateKarina Matos
 
Global AI Bootcamp Madrid - Azure Databricks
Global AI Bootcamp Madrid - Azure DatabricksGlobal AI Bootcamp Madrid - Azure Databricks
Global AI Bootcamp Madrid - Azure DatabricksAlberto Diaz Martin
 
Analytics in the Cloud
Analytics in the CloudAnalytics in the Cloud
Analytics in the CloudRoss McNeely
 
USQL Trivadis Azure Data Lake Event
USQL Trivadis Azure Data Lake EventUSQL Trivadis Azure Data Lake Event
USQL Trivadis Azure Data Lake EventTrivadis
 
Introduction to Azure Databricks
Introduction to Azure DatabricksIntroduction to Azure Databricks
Introduction to Azure DatabricksJames Serra
 
Azure Days 2019: Business Intelligence auf Azure (Marco Amhof & Yves Mauron)
Azure Days 2019: Business Intelligence auf Azure (Marco Amhof & Yves Mauron)Azure Days 2019: Business Intelligence auf Azure (Marco Amhof & Yves Mauron)
Azure Days 2019: Business Intelligence auf Azure (Marco Amhof & Yves Mauron)Trivadis
 
Azure Data.pptx
Azure Data.pptxAzure Data.pptx
Azure Data.pptxFedoRam1
 
2014.10.22 Building Azure Solutions with Office 365
2014.10.22 Building Azure Solutions with Office 3652014.10.22 Building Azure Solutions with Office 365
2014.10.22 Building Azure Solutions with Office 365Marco Parenzan
 
Differentiate Big Data vs Data Warehouse use cases for a cloud solution
Differentiate Big Data vs Data Warehouse use cases for a cloud solutionDifferentiate Big Data vs Data Warehouse use cases for a cloud solution
Differentiate Big Data vs Data Warehouse use cases for a cloud solutionJames Serra
 
Microsoft Azure News - November 2019
Microsoft Azure News - November 2019Microsoft Azure News - November 2019
Microsoft Azure News - November 2019Daniel Toomey
 
4Developers 2018: Przetwarzanie Big Data w oparciu o architekturę Lambda na p...
4Developers 2018: Przetwarzanie Big Data w oparciu o architekturę Lambda na p...4Developers 2018: Przetwarzanie Big Data w oparciu o architekturę Lambda na p...
4Developers 2018: Przetwarzanie Big Data w oparciu o architekturę Lambda na p...PROIDEA
 
Migration to Databricks - On-prem HDFS.pptx
Migration to Databricks - On-prem HDFS.pptxMigration to Databricks - On-prem HDFS.pptx
Migration to Databricks - On-prem HDFS.pptxKshitija(KJ) Gupte
 
Microsoft Fabric Introduction
Microsoft Fabric IntroductionMicrosoft Fabric Introduction
Microsoft Fabric IntroductionJames Serra
 
Azure Data Engineering course in hyderabad.pptx
Azure Data Engineering course in hyderabad.pptxAzure Data Engineering course in hyderabad.pptx
Azure Data Engineering course in hyderabad.pptxshaikmadarbi3zen
 

Similar to Comparing Microsoft Big Data Platform Technologies (20)

CC -Unit4.pptx
CC -Unit4.pptxCC -Unit4.pptx
CC -Unit4.pptx
 
Azure databricks c sharp corner toronto feb 2019 heather grandy
Azure databricks c sharp corner toronto feb 2019 heather grandyAzure databricks c sharp corner toronto feb 2019 heather grandy
Azure databricks c sharp corner toronto feb 2019 heather grandy
 
1 Introduction to Microsoft data platform analytics for release
1 Introduction to Microsoft data platform analytics for release1 Introduction to Microsoft data platform analytics for release
1 Introduction to Microsoft data platform analytics for release
 
Microsoft Azure update
Microsoft Azure updateMicrosoft Azure update
Microsoft Azure update
 
Global AI Bootcamp Madrid - Azure Databricks
Global AI Bootcamp Madrid - Azure DatabricksGlobal AI Bootcamp Madrid - Azure Databricks
Global AI Bootcamp Madrid - Azure Databricks
 
Analytics in the Cloud
Analytics in the CloudAnalytics in the Cloud
Analytics in the Cloud
 
USQL Trivadis Azure Data Lake Event
USQL Trivadis Azure Data Lake EventUSQL Trivadis Azure Data Lake Event
USQL Trivadis Azure Data Lake Event
 
Introduction to Azure Databricks
Introduction to Azure DatabricksIntroduction to Azure Databricks
Introduction to Azure Databricks
 
Azure Days 2019: Business Intelligence auf Azure (Marco Amhof & Yves Mauron)
Azure Days 2019: Business Intelligence auf Azure (Marco Amhof & Yves Mauron)Azure Days 2019: Business Intelligence auf Azure (Marco Amhof & Yves Mauron)
Azure Days 2019: Business Intelligence auf Azure (Marco Amhof & Yves Mauron)
 
Azure Data.pptx
Azure Data.pptxAzure Data.pptx
Azure Data.pptx
 
2014.10.22 Building Azure Solutions with Office 365
2014.10.22 Building Azure Solutions with Office 3652014.10.22 Building Azure Solutions with Office 365
2014.10.22 Building Azure Solutions with Office 365
 
Azure Data Engineering.pptx
Azure Data Engineering.pptxAzure Data Engineering.pptx
Azure Data Engineering.pptx
 
Differentiate Big Data vs Data Warehouse use cases for a cloud solution
Differentiate Big Data vs Data Warehouse use cases for a cloud solutionDifferentiate Big Data vs Data Warehouse use cases for a cloud solution
Differentiate Big Data vs Data Warehouse use cases for a cloud solution
 
Azure Data Engineering.pdf
Azure Data Engineering.pdfAzure Data Engineering.pdf
Azure Data Engineering.pdf
 
Microsoft Azure News - November 2019
Microsoft Azure News - November 2019Microsoft Azure News - November 2019
Microsoft Azure News - November 2019
 
4Developers 2018: Przetwarzanie Big Data w oparciu o architekturę Lambda na p...
4Developers 2018: Przetwarzanie Big Data w oparciu o architekturę Lambda na p...4Developers 2018: Przetwarzanie Big Data w oparciu o architekturę Lambda na p...
4Developers 2018: Przetwarzanie Big Data w oparciu o architekturę Lambda na p...
 
Azure synapse by usama whaba khan
Azure synapse by usama whaba khanAzure synapse by usama whaba khan
Azure synapse by usama whaba khan
 
Migration to Databricks - On-prem HDFS.pptx
Migration to Databricks - On-prem HDFS.pptxMigration to Databricks - On-prem HDFS.pptx
Migration to Databricks - On-prem HDFS.pptx
 
Microsoft Fabric Introduction
Microsoft Fabric IntroductionMicrosoft Fabric Introduction
Microsoft Fabric Introduction
 
Azure Data Engineering course in hyderabad.pptx
Azure Data Engineering course in hyderabad.pptxAzure Data Engineering course in hyderabad.pptx
Azure Data Engineering course in hyderabad.pptx
 

Comparing Microsoft Big Data Platform Technologies

  • 1. Workshop Slides Follow-Up: Comparing Technologies Jen Stirrup Data Whisperer, Data Relish Level: 300
  • 3. What is Big Data? “Big data is a collection of data sets so large and complex that it becomes awkward to work with using on-hand database management tools. Difficulties include capture, storage, search, sharing, analysis, and visualization.” – Wikipedia
  • 4. Examples: enormous amounts of data • online behavior of social networking users • samples of medical ailments • purchasing habits of grocery shoppers • crime statistics of cities • “internet of things” (IoT) • 24/7 out-patient monitors • real-time telemetric devices
  • 6. • fully featured RDBMS • transactional processing • rich query • managed as a service • elastic scale • internet accessible http/rest • schema-free data model • arbitrary data formats
  • 8. What is Apache Spark? Apache Spark solves a problem: there's no need to structure everything as map and reduce operations.
  • 9. Apache Spark • Interactive manipulation and visualization of data – Scala, Python, and R Interactive Shells – Jupyter Notebook with PySpark (Python) and Spark (Scala) kernels provide in-browser interaction
  • 10. Apache Spark • Unified platform for processing multiple workloads – Real-time processing, Machine Learning, Stream Analytics, Interactive Querying, Graphing
  • 11. Apache Spark • Leverages in-memory processing for really big data – Resilient distributed datasets (RDDs) – APIs for processing large datasets – Up to 100x faster than Hadoop MapReduce for in-memory workloads
  • 12. What is Spark? • an open-source software solution that performs rapid calculations on in-memory datasets • RDD (Resilient Distributed Dataset) is the basis for what Spark enables – Resilient – Distributed
  • 14. Example RDD Transformations • map(func) • filter(func) • distinct()
  • 15. Example RDD Actions • count() • reduce(func) • collect() • take(n)
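The split between transformations and actions can be sketched in plain Python. The `ToyRDD` class below is a hypothetical stand-in written for this illustration, not part of the Spark API: it only mimics the idea that transformations (`map`, `filter`, `distinct`) record work lazily, while actions (`count`, `reduce`, `collect`, `take`) actually run it. It is single-process and has none of the resilience or distribution of a real RDD.

```python
import functools
import itertools


class ToyRDD:
    """Toy stand-in for an RDD: transformations are lazy, actions run them."""

    def __init__(self, source, ops=()):
        self._source = list(source)
        self._ops = tuple(ops)  # recorded transformations, not yet executed

    def _with(self, op):
        return ToyRDD(self._source, self._ops + (op,))

    # Transformations: return a new ToyRDD and do no work yet.
    def map(self, func):
        return self._with(lambda items: (func(x) for x in items))

    def filter(self, func):
        return self._with(lambda items: (x for x in items if func(x)))

    def distinct(self):
        # dict.fromkeys keeps the first occurrence of each element, in order.
        return self._with(lambda items: iter(dict.fromkeys(items)))

    # Actions: run the recorded pipeline and return a plain Python value.
    def _run(self):
        items = iter(self._source)
        for op in self._ops:
            items = op(items)
        return items

    def collect(self):
        return list(self._run())

    def count(self):
        return sum(1 for _ in self._run())

    def take(self, n):
        return list(itertools.islice(self._run(), n))

    def reduce(self, func):
        return functools.reduce(func, self._run())


calls = []  # records every element the mapped function actually sees


def square(x):
    calls.append(x)
    return x * x


rdd = ToyRDD([1, 2, 2, 3, 4]).distinct().map(square).filter(lambda x: x % 2 == 0)
calls_before_action = list(calls)  # still empty: nothing has run yet
result = rdd.collect()             # the action triggers the whole pipeline
```

Building the pipeline touches no data; only `collect()` causes `square` to be called. Real Spark follows the same principle, with the recorded lineage also used to recompute lost partitions.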
  • 17. HDInsight Cluster Types • Hadoop: Query workloads – Reliable data storage, simple MapReduce • HBase: NoSQL workloads – Distributed database offering random access to large amounts of data • Apache Storm: Stream workloads – Real-time analysis of moving data streams • Apache Spark: High-performance workloads – In-memory parallel processing
  • 21. What is Databricks? • Databricks provides an end-to-end, managed Apache Spark platform optimized for the cloud • Improved performance of Spark jobs in the cloud by 10 – 100x • Cost Efficient to run large-scale Spark workloads
  • 23. Databricks for Big Data • Data Scientists get an interactive notebook environment • Good monitoring suite • Security Controls to facilitate thousands of users
  • 24. Databricks for Data Engineers • Databricks Runtime adds increased performance to Apache Spark workloads when running on Azure • Auto-scaling and auto-termination for Spark clusters to automatically minimize costs
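As a rough illustration of the auto-scaling and auto-termination settings, a cluster definition might look like the dictionary below. This is a sketch under assumptions: the field names (`autoscale`, `min_workers`, `max_workers`, `autotermination_minutes`) are taken to match the Databricks Clusters API and should be checked against its current documentation; the cluster name, runtime label, VM size and the helper `check_cluster_spec` are made up for the example.

```python
# Illustrative cluster definition; all values are assumptions, not recommendations.
cluster_spec = {
    "cluster_name": "etl-nightly",         # hypothetical name
    "spark_version": "13.3.x-scala2.12",   # example runtime label; check your workspace
    "node_type_id": "Standard_DS3_v2",     # example Azure VM size
    "autoscale": {"min_workers": 2, "max_workers": 8},
    "autotermination_minutes": 30,         # shut down after 30 idle minutes
}


def check_cluster_spec(spec):
    """Return a list of problems that would defeat the cost controls."""
    problems = []
    scale = spec.get("autoscale")
    if not scale:
        problems.append("no autoscale range: cluster size is fixed")
    elif scale["min_workers"] > scale["max_workers"]:
        problems.append("min_workers exceeds max_workers")
    if spec.get("autotermination_minutes", 0) <= 0:
        problems.append("auto-termination disabled: idle clusters keep billing")
    return problems


problems = check_cluster_spec(cluster_spec)
```

A check like this could run before a definition is submitted, so that a cluster without an idle timeout is caught early.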
  • 25. Databricks for Data Science • Notebooks have real-time collaboration and are multi-editable for productivity • Integration with Power BI for data visualization • Supported by Azure Database
  • 27. Why is Databricks in Azure? • Close integration with Azure services • Optimized connectors • One-click management directly from the Azure console • Azure Databricks will greatly simplify building enterprise-grade production data applications
  • 28. Azure and Databricks together • Azure launches and manages worker nodes in the customer subscriptions • Customer launches a cluster, which initiates a Databricks appliance • A managed resource group is deployed with a Vnet, Security Group and a Storage account
  • 29. Azure and Databricks Together • Close Integration to provide an enterprise platform • Use all existing VMs • Security and Privacy remains with customer • Network topology is flexible
  • 30. Azure and Databricks together • Azure Storage and Azure Data Lake integration • Azure Power BI • Azure Active Directory • Azure SQL Data Warehouse, Azure SQL DB, Azure Cosmos DB
  • 31. Azure and Databricks Together • Metadata is stored in an Azure Database with geo-replication • Databricks cluster is managed through Azure Databricks UI
  • 32. Azure and Databricks Together • Azure Container Services to run the control plane and data planes via containers • Accelerated Networking • Latest generation Azure hardware for performance
  • 36. Azure Databricks Fast, easy and collaborative Apache Spark-based analytics service https://blogs.microsoft.com/ai/shell-iot-ai-safety-intelligent-tools/ Shell Case Study
  • 38. Azure Databricks ● Unlock insights from all your data and build artificial intelligence (AI) solutions with Azure Databricks ● Azure Databricks supports Python, Scala, R, Java and SQL, as well as data science frameworks and libraries including TensorFlow, PyTorch and scikit-learn.
  • 39. Azure Databricks ● Fast, optimised Apache Spark environment ● Interactive workspace with built-in support for popular tools, languages and frameworks
  • 40. Azure Databricks ● Supercharged machine learning on big data with native Azure Machine Learning integration ● High-performance modern data warehousing in conjunction with Azure SQL Data Warehouse
  • 41. Azure Databricks ● Start quickly with an optimised Apache Spark environment ● Spin up clusters and build quickly in a fully managed Apache Spark environment with the global scale and availability of Azure ● autoscaling and auto-termination to improve total cost of ownership (TCO)
  • 42. Azure Databricks ● Turbocharge machine learning on big data ● Get high-performance modern data warehousing
  • 68. Azure Databricks & HDInsight ● Databricks is focused on collaboration, streaming and batch, with a notebook experience for the user. It integrates well with Azure, has AAD authentication, and can export to Azure SQL Data Warehouse, Cosmos DB, Power BI, etc. Databricks’ greatest strengths are its zero-management cloud solution and the collaborative, interactive environment it provides in the form of notebooks. ● HDInsight has Kafka, Storm and Hive LLAP, which Databricks doesn’t have. It is better for processing very large datasets in a way that allows the user to just “let it run”. ● Sometimes the two technologies are used together. Databricks is more user-friendly and easier to work with, so it is better for exploration, whereas HDInsight is better for processing data.
  • 70. Azure Databricks & HDInsight - Pricing HDInsight: ● Billed on a per-minute basis, clusters run a group of nodes depending on the component. Nodes vary by group (e.g. Worker Node, Head Node, etc.), quantity and instance type (e.g. D1v2). Pricing by component:
      Hadoop, Spark, Interactive Query, Kafka, Storm, HBase: base price/node-hour
      HDInsight Machine Learning services: base price/node-hour + £0.012/core-hour
      Enterprise Security Package: base price/node-hour + £0.008/core-hour
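The HDInsight pricing rule above can be turned into a small estimator. This is a sketch, not an official calculator: the function name and parameters are invented here, it models a single node group only, and the base price and core count in the example are made-up figures. Only the £0.008/core-hour Enterprise Security Package surcharge comes from the slide.

```python
def hdinsight_cost(minutes, nodes, base_price_per_node_hour,
                   cores_per_node=0, surcharge_per_core_hour=0.0):
    """Estimate the cost of one node group, billed per minute.

    Each node costs its base price per hour, plus an optional per-core
    surcharge (e.g. for the Enterprise Security Package).
    """
    per_node_hour = base_price_per_node_hour + cores_per_node * surcharge_per_core_hour
    hourly = nodes * per_node_hour
    return hourly * minutes / 60


# Example: 4 worker nodes for 90 minutes.
# Base price and core count are hypothetical; the surcharge is the slide's ESP rate.
esp_cost = hdinsight_cost(
    minutes=90,
    nodes=4,
    base_price_per_node_hour=0.50,   # hypothetical
    cores_per_node=8,                # hypothetical
    surcharge_per_core_hour=0.008,   # Enterprise Security Package, from the slide
)
```

A real cluster has several node groups (head, worker and so on), so a full estimate would sum this function over each group.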
  • 71. Azure Databricks & HDInsight - Pricing Databricks: ● Azure Databricks bills you for virtual machines (VMs) provisioned in clusters and Databricks Units (DBUs) based on the VM instance selected. A DBU is a unit of processing capability, billed per second of usage. The DBU consumption depends on the size and type of instance running Azure Databricks. Prices by workload:
      Data Analytics: Standard Tier £0.30/DBU-hour; Premium Tier £0.410/DBU-hour
      Data Engineering: Standard Tier £0.12/DBU-hour; Premium Tier £0.224/DBU-hour
      Data Engineering Light: Standard Tier £0.06/DBU-hour; Premium Tier £0.164/DBU-hour
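The Databricks bill described above has two parts, VM time and DBU consumption, and can be sketched as follows. The DBU rates are the ones listed on the slide; the function name, the VM price and the DBU-per-node figure in the example are assumptions for illustration, since the real values depend on the instance type chosen.

```python
# DBU rates in GBP per DBU-hour, as listed on the slide.
DBU_RATE_GBP = {
    ("data_analytics", "standard"): 0.30,
    ("data_analytics", "premium"): 0.410,
    ("data_engineering", "standard"): 0.12,
    ("data_engineering", "premium"): 0.224,
    ("data_engineering_light", "standard"): 0.06,
    ("data_engineering_light", "premium"): 0.164,
}


def databricks_cost(seconds, nodes, vm_price_per_hour, dbu_per_node_hour,
                    workload, tier):
    """Estimate cluster cost as VM cost plus DBU cost, billed per second."""
    hours = seconds / 3600
    dbu_rate = DBU_RATE_GBP[(workload, tier)]
    vm_cost = hours * nodes * vm_price_per_hour
    dbu_cost = hours * nodes * dbu_per_node_hour * dbu_rate
    return vm_cost + dbu_cost


# Example: a 3-node Data Engineering job on the Standard tier running for 2 hours.
job_cost = databricks_cost(
    seconds=7200,
    nodes=3,
    vm_price_per_hour=0.20,    # hypothetical VM price
    dbu_per_node_hour=0.75,    # hypothetical DBU consumption of the instance type
    workload="data_engineering",
    tier="standard",
)
```

Keeping the two terms separate makes it easy to see how much of a bill is infrastructure and how much is the Databricks service itself.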
  • 72. Azure Databricks & HDInsight - Pricing Azure Databricks also offers a pre-purchase plan. You can get up to 37% savings over pay-as-you-go DBU prices when you pre-purchase Azure Databricks Units (DBU) as Databricks Commit Units (DBCU) for either 1 or 3 years. HDInsight does not offer a pre-purchase plan.
  • 73. Azure Databricks & HDInsight - Speed Azure Databricks provides a series of performance enhancements on top of regular Apache Spark, which itself can run up to 100x faster than Hadoop MapReduce for in-memory workloads. HDInsight is very effective at rapidly collecting large amounts of data, and with it you can quickly spin up open source projects and clusters, with no hardware to install or infrastructure to manage. However, some processes can be slightly slower with HDInsight than with Databricks.
  • 74. Azure Databricks & HDInsight - Hadoop HDInsight uses Apache Hadoop, which is an open-source distributed data analysis solution. Hadoop manages the processing of large datasets across large clusters of computers and it detects and handles failures. Why Hadoop? Azure provides dynamic machines that are billed only when active. This enables elastic computing, where you can add machines for particular workloads or projects and then remove them when not needed. HDInsight can take advantage of this scalable platform. It can also capitalize on the security and management features of Azure, integration with Azure Active Directory and Log Analytics.
  • 75. Azure Databricks & HDInsight - Hadoop You can also make use of Hadoop with Azure Databricks, but as a storage function, rather than a function for data analysis and management.
  • 76. Azure Databricks & HDInsight - Learning Curve ● Databricks is a good technology to use regardless of the user's or developer's previous experience. Databricks’ vision is to make big data easy so that every organization can use it. It aims to make complex systems easier to work with and manage.
  • 77. Azure Databricks & HDInsight - Learning Curve • There is more of a learning curve when it comes to HDInsight. • Generally, comprehensive training is required, and a background knowledge of SQL is very helpful.
  • 78. Azure Databricks & HDInsight - Languages • While Azure Databricks is Spark-based, it allows commonly used programming languages such as Python, R, and SQL. Code in these languages is translated in the backend, through APIs, to interact with Spark.
  • 79. Azure Databricks & HDInsight - Languages HDInsight clusters, including Spark, HBase, Kafka, Hadoop, and others, support many programming languages. Some programming languages aren't installed by default. For libraries, modules, or packages that are not installed by default, you need to use a script action to install the component. By default, HDInsight supports: ● Java ● Python ● .NET ● Go HDInsight also supports Hadoop-specific languages - Pig, HiveQL and SparkSQL.
  • 80. Criterion | Azure Databricks | HDInsight
    Pricing | Per cluster time (VM cost + DBU processing time) | Per cluster time
    Engine | Apache Spark, optimized for Databricks | Apache Spark or Apache Hive
    Default environment | Databricks Notebooks, RStudio for Databricks | Ambari, or Zeppelin if using Spark
    De facto language | R, Python, Scala, Java, SQL, mostly open-source languages | HiveQL, open source
    Integration with Data Factory | Yes, to run notebooks or Spark scripts | Yes, to run MapReduce jobs, Pig, and Spark scripts
    Scalability | Easy to change machines, allows autoscaling | Not scalable
    Testing | Very easy, notebook functionality is extremely flexible | Easy, Ambari allows interactive query execution
    Setup and managing | Easy: clusters can be modified easily and Databricks offers two main types of services | Complex: must decide cluster types and sizes
    Learning curve | Very flexible | Flexible if user knows SQL
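The pricing comparison on this slide (VM cost plus DBU processing time for Databricks, cluster time alone for HDInsight) can be illustrated with a small calculation. This is only a sketch: the `NodeRate` class, the `cluster_cost` function, and every rate below are hypothetical and chosen for illustration, not taken from any Azure price list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeRate:
    """Hourly rates for one cluster node. All figures used here are hypothetical."""
    vm_per_hour: float          # infrastructure (VM) price per node-hour
    dbu_per_hour: float = 0.0   # DBUs one node consumes per hour (Databricks only)
    price_per_dbu: float = 0.0  # price of a single DBU (Databricks only)

    def hourly_cost(self) -> float:
        # VM cost plus DBU cost; the DBU term is zero for a plain per-cluster-time model.
        return self.vm_per_hour + self.dbu_per_hour * self.price_per_dbu


def cluster_cost(rate: NodeRate, nodes: int, hours: float) -> float:
    """Cost of running `nodes` identical nodes for `hours` hours, rounded to cents."""
    if nodes < 0 or hours < 0:
        raise ValueError("nodes and hours must be non-negative")
    return round(rate.hourly_cost() * nodes * hours, 2)


# Illustrative rates only; real prices depend on region, VM size, and service tier.
databricks = NodeRate(vm_per_hour=0.50, dbu_per_hour=1.5, price_per_dbu=0.40)
hdinsight = NodeRate(vm_per_hour=0.60)

print("Databricks:", cluster_cost(databricks, nodes=4, hours=10))
print("HDInsight: ", cluster_cost(hdinsight, nodes=4, hours=10))
```

Because both models bill by cluster time, shutting a cluster down when it is idle reduces cost in either case; the difference is that the Databricks figure carries the additional DBU term.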
  • 81. Azure Databricks and Data Lake Analytics Both Databricks and DLA can be used for batch processing. How can we decide which to choose over the other?
  • 82. Azure Databricks and Data Lake Analytics Data Lake Analytics is a distributed computing resource that uses the U-SQL language to carry out complex transformations and load the data into Azure and non-Azure databases and file systems. Data Lake Analytics combines the power of distributed processing with the ease of a SQL-like language, making it suitable for ad hoc data processing. Preferred use cases for DLA: ● Large amounts of data where conversion and loading are the only actions needed ● Processing data from relational databases into Azure ● Repetitive loads with no intermediary action
  • 83. Azure Databricks and Data Lake Analytics Azure Databricks is a notebook-based resource that lets you set up high-performance clusters, which perform computing using its in-memory architecture. Users can choose from a wide variety of programming languages and use their favorite libraries to perform transformations, data type conversions and modeling. Databricks also comes with a wide range of API connectivity options, which enable connections to various data sources, including SQL databases, NoSQL stores, file systems and more. Preferred use cases for Databricks: ● Processes that require intermediary analysis of data ● ETL that requires more visibility during data transformation and modeling
  • 84. Criterion | Data Lake Analytics | Databricks
    Cost control | Pay-as-you-go | Manual
    Development tool | IDE + SDK based (U-SQL supported) | Notebook type
    Payment | Per job | Cluster properties, time duration and workload
    Scaling | Auto-scaling based on data | Auto-scaling for jobs running on cluster
    Data storage | Internal database available | Databricks File System (DBFS), direct access (storage)
    Manage usage | Portal (preferred); Azure SDK; Python; Java; Node.js; .NET | Spark framework: Scala, Java, R and Python; Spark SQL
    Monitoring jobs | Azure Portal, Visual Studio | Within Databricks
    Functionalities | Scheduling jobs, inclusion in Data Factory pipelines (U-SQL scripts) | Scheduling jobs, inclusion in Data Factory pipelines (Databricks notebooks)
  • 85. Data Lake Analytics and HDInsight These technologies can be used together. HDInsight is the analytics service, whereas Azure Data Lake Storage is the storage service. You most likely need both to have a functional analytics cluster. HDInsight provides the cluster and fully manages the open-source packages for analytics (Hadoop, Spark, etc.); you set up your cluster to use Azure Data Lake Storage, which supports the HDFS API (Hadoop FileSystem) on top of cloud storage. Essentially, HDInsight is a managed Hadoop service that provides compute, and Azure Data Lake Storage is a managed storage service that provides large amounts of storage.
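Because Azure Data Lake Storage exposes an HDFS-compatible API, a cluster addresses files in it by URI rather than by a local path. The sketch below builds such a URI, assuming a Data Lake Storage Gen2 account and its `abfs`/`abfss` scheme (the slide does not say which generation is meant). The `abfs_uri` helper and the container, account, and path names are hypothetical.

```python
from urllib.parse import quote


def abfs_uri(container: str, account: str, path: str = "", secure: bool = True) -> str:
    """Build an ABFS URI for a path in an Azure Data Lake Storage Gen2 account.

    Assumes the Gen2 layout: <scheme>://<container>@<account>.dfs.core.windows.net/<path>
    """
    if not container or not account:
        raise ValueError("container and account are required")
    scheme = "abfss" if secure else "abfs"  # abfss uses TLS
    clean_path = quote(path.lstrip("/"))    # percent-encode spaces etc., keep "/"
    return f"{scheme}://{container}@{account}.dfs.core.windows.net/{clean_path}"


# Hypothetical container, account, and file names.
print(abfs_uri("rawdata", "examplelake", "/sales/2020/orders.csv"))
```

A Hadoop or Spark job on the cluster would then read from or write to a URI of this form in the same way it would use an `hdfs://` path.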