191AIE503T CLOUD COMPUTING
UNIT - IV
AZURE CLOUD AND CORE SERVICES
Azure Synapse Analytics - HDInsight - Azure Databricks - Usage of Internet of Things (IoT) Hub - IoT Central - Azure Sphere - Azure Cloud Shell and Mobile Apps
Azure Synapse Analytics
Introduction
In mid-2016, Azure made the Azure SQL Data Warehouse service generally available for data
warehousing on the cloud. Since then, this service has gone through several iterations, and towards the end
of 2019, Microsoft announced that the Azure SQL Data Warehouse service would be rebranded as Azure
Synapse Analytics. This service is the de facto service for combining data warehousing and big data
analytics, with many new features of the service in preview as well.
High-Level Architecture
Online Transaction Processing (OLTP) workloads typically involve transactional data that is high-volume,
with frequent reads and writes. The data access pattern usually involves many scalar and tabular
datasets, and data ingestion generally happens through user transactions in small batches of rows. Online
Analytical Processing (OLAP) applications typically store and process large volumes of data collected from
various sources, which may be transformed and/or modeled in the OLAP repository; large datasets
are then aggregated for ad-hoc reporting and analytical use-cases. The latter is the use-case where Synapse
Analytics fits in the overall data landscape: Azure Data Lake Storage forms the bedrock
of big data storage, and Power BI forms the visualization layer.
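The two access patterns can be contrasted with a small, self-contained Python sketch (the tiny in-memory "sales" table is purely illustrative): OLTP touches individual rows as transactions arrive, while OLAP scans and aggregates the whole dataset for reporting.

```python
from collections import defaultdict

# A toy "sales" table: each row is one user transaction (OLTP-style ingestion,
# small batches of rows written as they occur).
sales = [
    {"region": "EU", "amount": 120},
    {"region": "US", "amount": 75},
    {"region": "EU", "amount": 60},
]

# OLTP-style access: read or write a single row.
sales.append({"region": "US", "amount": 20})

# OLAP-style access: scan the whole dataset and aggregate it for reporting.
def total_by_region(rows):
    totals = defaultdict(int)
    for row in rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)

print(total_by_region(sales))  # {'EU': 180, 'US': 95}
```

The OLAP call has to visit every row, which is why columnar storage and distributed scans (as in Synapse) pay off at large scale.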
Azure Synapse Components and Features
There are multiple components of Synapse Analytics architecture on Azure. Let’s understand all these
components one by one.
Synapse Analytics is an analytics service with virtually unlimited scale to support
analytics workloads
Synapse Workspaces (in preview as of Sept 2020) provides an integrated console to administer and
operate different components and services of Azure Synapse Analytics
Synapse Analytics Studio is a web-based IDE to enable code-free or low-code developer experience
to work with Synapse Analytics
Synapse supports a number of languages like SQL, Python, .NET, Java, Scala, and R that are
typically used by analytic workloads
Synapse supports two types of analytics runtimes – SQL-based and Spark-based (the latter in preview
as of Sept 2020) – that can process data in a batch, streaming, and interactive manner
Synapse is integrated with numerous Azure data services as well, for example, Azure Data Catalog,
Azure Data Lake Storage, Azure Databricks, Azure HDInsight, Azure Machine Learning, and Power BI
Synapse also provides integrated management, security, and monitoring related services to support
monitoring and operations on the data and services supported by Synapse
Data Lake Storage is suited for big-data-scale volumes organized in a data lake model.
This storage layer acts as the data source layer for Synapse; data is typically populated into Synapse
from Data Lake Storage for various analytical purposes
Now that we understand the different layers and components of the architecture, let’s look at the core pillars
of Synapse.
Azure Synapse Studio – This is a web-based SaaS tool that enables developers to work with
every aspect of Synapse Analytics from a single console. In an analytical solution development
lifecycle using Synapse, one generally starts by creating a workspace and launching this tool, which
provides access to different Synapse features: ingesting data using import mechanisms or data
pipelines, creating data flows, exploring data using notebooks, analyzing data with Spark jobs or SQL
scripts, and finally visualizing data for reporting and dashboarding purposes. This tool also provides
features for authoring artifacts, debugging code, optimizing performance by assessing metrics,
integration with CI/CD tools, etc.
Azure Synapse Data Integration – Different tools can be used to load data into
Synapse, but an integrated orchestration engine reduces the dependency on, and management
of, separate tool instances and data pipelines. This service comes with an integrated orchestration
engine, identical to Azure Data Factory, that creates data pipelines and provides rich data transformation
capabilities within the Synapse workspace itself. Key features include support for 90+ data sources,
including almost 15 Azure-based data sources, 26 open-source and cross-cloud data warehouses
and databases, 6 file-based data sources, 3 NoSQL data sources, 28 services and apps that
can serve as data providers, as well as 4 generic protocols like ODBC and REST that can serve
data. Pipelines can be created using built-in templates from Synapse Studio to integrate data from
these sources.
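A pipeline authored from such a template ultimately reduces to a JSON definition of activities with sources and sinks. The sketch below is a hypothetical, simplified copy-pipeline definition, loosely modeled on the Data Factory pipeline schema that Synapse pipelines share; all names in it are made up:

```python
import json

# A minimal copy-pipeline definition, loosely modeled on the Azure Data Factory
# pipeline JSON that Synapse pipelines share. All names here are hypothetical.
pipeline = {
    "name": "CopySalesToLake",
    "properties": {
        "activities": [
            {
                "name": "CopyFromSqlToAdls",
                "type": "Copy",
                "inputs": [{"referenceName": "SalesSqlTable", "type": "DatasetReference"}],
                "outputs": [{"referenceName": "SalesParquetInLake", "type": "DatasetReference"}],
                "typeProperties": {
                    "source": {"type": "SqlSource"},   # read side of the copy
                    "sink": {"type": "ParquetSink"},   # write side of the copy
                },
            }
        ]
    },
}

print(json.dumps(pipeline, indent=2))
```

The Studio UI generates and edits definitions of this shape; the orchestration engine then executes the listed activities in order or per their dependencies.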
Synapse SQL Pools – This feature provides the same data warehousing capabilities that were
available in earlier versions of the service, when it was branded as SQL DW. It is
available in a provisioned manner, where a fixed capacity of DWU units is allocated to
the instance of the service for data processing. Data can be imported into Synapse using different
mechanisms like SSIS, PolyBase, and Azure Data Factory. Synapse stores data in a columnar format
and enables distributed querying capabilities, which suit the performance profile of OLAP
workloads. SQL Pools have built-in support for data streaming, as well as a few AI functions out of
the box.
Generally, Synapse SQL Pools are part of an Azure SQL Server instance and can be browsed using
tools like SSMS as well. The Synapse SQL feature is also available in a serverless manner (in preview
as of Sept 2020), where no fixed infrastructure capacity needs to be provisioned. Instead,
Azure manages the required infrastructure capacity to meet the needs of the workloads. This is a
data virtualization feature supported by Synapse SQL. The pricing model in this case is based on
the volume of data processed instead of the number of DWUs allocated to the instance.
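The difference between the two pricing models can be made concrete with a back-of-the-envelope calculation. The unit prices below are hypothetical placeholders, not Azure list prices:

```python
# Back-of-the-envelope comparison of the two Synapse SQL pricing models.
# The unit prices below are HYPOTHETICAL placeholders, not Azure list prices.
DWU_PRICE_PER_HOUR = 1.2       # provisioned: per 100 DWU per hour (assumed)
SERVERLESS_PRICE_PER_TB = 5.0  # serverless: per TB of data processed (assumed)

def provisioned_cost(dwu_units, hours):
    """Provisioned pool: pay for allocated capacity, regardless of usage."""
    return (dwu_units / 100) * DWU_PRICE_PER_HOUR * hours

def serverless_cost(tb_processed):
    """Serverless: pay only for the data each query scans."""
    return tb_processed * SERVERLESS_PRICE_PER_TB

# A 500-DWU pool running 24h vs. ad-hoc queries scanning 10 TB that day.
print(provisioned_cost(500, 24))  # 144.0
print(serverless_cost(10))        # 50.0
```

Under these assumed rates, an intermittently queried dataset favors serverless, while a pool that is busy around the clock favors provisioned capacity.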
Apache Spark for Azure Synapse – This component of Synapse provides a Spark runtime to
perform data loading, data processing, data preparation, ETL, and other
tasks generally related to data warehousing. Azure also offers Databricks, a service
based on the Spark runtime with a certain set of optimizations, which is typically used for a
similar set of purposes. One advantage of this feature over Azure Databricks is that
no additional or separate clusters need to be managed to process data, as Spark is an integral part of
Synapse. It provides Spark-based processing with auto-scaling and supports features like .NET for
Spark, SparkML algorithms, Delta Lake, Azure ML integration for Apache Spark, and Jupyter-style
notebooks. In addition, it has multi-language support for C#, PySpark, Scala,
Spark SQL, and Java. Once a Synapse workspace is created, one can provision Apache Spark pools
or Synapse SQL pools from a common interface.
Azure Synapse Security – Apart from the above features, one key aspect to note is the array
of security features packed into Azure Synapse. It is already compliant with almost 30 industry-leading
compliance standards, such as ISO, SOC, FedRAMP, DISA, HIPAA, and FIPS.
o It supports Azure AD authentication, SQL-based authentication, as well as multi-factor
authentication
o It supports data encryption at rest and in transit as well as data classification for sensitive data
o It supports row-level, column-level, as well as object-level security along with dynamic data
masking
o It supports network-level security with virtual networks as well as firewalls
Azure Synapse is a tightly integrated suite of services covering the entire spectrum of tasks and processes
used in the workflow of an analytical solution. These architectural components provide a modular
view of the entire suite and a head start on working with it.
Azure Synapse Analytics Features
Centralized Data Management: Azure Synapse utilizes Massively Parallel Processing (MPP)
technology, which allows it to process and manage large workloads and handle large data
volumes efficiently. It delivers a unified experience by managing both data lakes and data warehouses.
Workload Isolation: This capability allows users to manage the execution of heterogeneous workloads.
It offers increased flexibility by reserving resources exclusively for a specific workload group, while
retaining complete control over warehouse resources to satisfy business SLAs.
Machine Learning Integration: By integrating Azure Machine Learning, Azure Synapse
Analytics enables users to leverage ML capabilities: trained models can score data and generate
predictions within the data warehouse itself. Further, existing trained ML
models can be brought into Synapse Analytics rather than recreated from scratch, which helps businesses
save time, money, and effort.
Businesses can also analyze data using machine learning algorithms and visualize the results
on a rich Power BI dashboard. This makes it a great tool for companies to manage real-time
analytics for:
Supply chain forecasting
Inventory reporting
Predictive maintenance
Anomaly detection
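Several of these scenarios, anomaly detection in particular, reduce to scoring new readings against historical statistics. The snippet below is a minimal, local z-score sketch of that idea; it is illustrative only and not the actual Synapse/Azure ML integration:

```python
import statistics

def is_anomaly(history, reading, threshold=3.0):
    """Flag a reading whose z-score against historical data exceeds threshold."""
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    z = abs(reading - mean) / stdev
    return z > threshold

# Hypothetical sensor readings, e.g. temperatures from an IoT device.
sensor_history = [20.1, 19.8, 20.3, 20.0, 19.9, 20.2]
print(is_anomaly(sensor_history, 20.1))  # False: a typical reading
print(is_anomaly(sensor_history, 35.0))  # True: far outside history
```

In Synapse, the same scoring logic would typically run via a registered ML model against warehouse tables rather than a Python list.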
Azure Synapse Analytics Benefits
Businesses today use a variety of tools to manage, store, and analyze workloads, and
things can go wrong when one of the interconnected systems faces downtime or another technical
challenge. Azure Synapse Analytics offers businesses centralized management of data lakes and data
warehouses, and Azure Synapse Studio offers a unified workspace for data preparation, data
management, data warehousing, big data, and artificial intelligence tasks.
Here are some of the salient benefits:
Accelerate Analytics & Reporting
Reduced manual efforts for collecting, collating and building reports
Instant scalability and flexibility mean no downtime as workloads vary
Faster DWH deployment
Better BI & Data Visualization
The seamless, native integration with Power BI makes reporting and analysis of key metrics
engaging and easy to use, and makes insights easier to share with relevant stakeholders across
business streams. A word of advice: fewer data silos lead to more visibility.
Increased IT productivity
Enables staff to automate infrastructure provisioning and administrative tasks (including DWH setup,
patch management, and maintenance)
Limitless Scaling
Being a cloud-based service, Azure Synapse Analytics can view, organize, and query data (relational
and non-relational) faster than traditional on-premises tools. In other words, you can efficiently
serve thousands of concurrent users and systems. And, in published comparisons with Google’s BigQuery,
Synapse reportedly ran the same query in roughly 75% less time over a petabyte of data.
Azure HDInsight
Azure HDInsight is a service offered by Microsoft that enables us to use open-source frameworks for big
data analytics. Azure HDInsight allows the use of frameworks like Hadoop, Apache Spark, Apache Hive,
LLAP, Apache Kafka, Apache Storm, and R for processing large volumes of data. These tools can be used
for extract, transform, and load (ETL), data warehousing, machine learning, and IoT workloads.
Azure HDInsight Features
The main features of Azure HDInsight that set it apart are:
Cloud and on-premises availability: Azure HDInsight supports big data analytics using
Hadoop, Spark, Interactive Query (LLAP), Kafka, Storm, etc., on the cloud as well as on-premises.
Scalable and economical: HDInsight can be scaled up or down as and when required. This elasticity
also means that you pay for only what you use: you can resize your HDInsight deployment
when required, which eliminates paying for unused resources.
Security: Azure HDInsight protects your assets with industry-standard security. The encryption and
integration with Active Directory make sure that your assets are safe in the Azure Virtual Network.
Monitoring and analytics: HDInsight’s integration with Azure Monitor helps us to closely watch
what is happening in our clusters and take actions based on that.
Global availability: Azure HDInsight is available in more regions than most other big data analytics
services.
Highly productive: Productive tools for Hadoop and Spark can be used with HDInsight in different
development environments, such as Visual Studio, VS Code, Eclipse, and IntelliJ, with support for Scala,
Python, R, Java, etc.
Azure HDInsight Architecture
Before getting into the uses of Azure HDInsight, let’s understand how to choose the right Architecture for
Azure HDInsight. Listed below are best practices for Azure HDInsight Architecture:
It is recommended that you migrate an on-premises Hadoop cluster to Azure HDInsight using
multiple workload-specific clusters rather than a single cluster; a single large cluster kept running
over time will increase your costs unnecessarily.
Use on-demand transient clusters so that the clusters are deleted after the workload is complete.
This reduces resource costs, since you pay for HDInsight clusters only while they are in use. Deleting a
cluster does not delete the associated metastores or storage accounts, so you can use them
to recreate the cluster if necessary.
Because HDInsight clusters can use storage from Azure Storage, Azure Data Lake
Storage, or both, it is best to separate data storage from processing. In addition to reducing storage
costs, this allows you to use transient clusters, share data, and scale storage and compute
independently.
Azure HDInsight Metastore Best Practices
The Apache Hive metastore is an important aspect of the Apache Hadoop architecture, since it serves as a
central schema repository for other big data access tools, including Apache Spark, Interactive Query
(LLAP), Presto, and Apache Pig. It is worth noting that HDInsight uses Azure SQL Database as its Hive
metastore database.
HDInsight metastores come in two types: default metastores and custom metastores.
A default metastore is created for free for any cluster type, but a default metastore cannot be
shared between clusters.
The use of custom metastores is recommended for production clusters since they can be created
and removed without loss of metadata. It is suggested to use a custom metastore to isolate compute
and metadata and to periodically back it up.
HDInsight immediately deletes the default Hive metastore upon cluster destruction. By storing the Hive
metastore in your own Azure SQL Database, you preserve it when the cluster is deleted.
Azure Log Analytics and the Azure portal provide tools for monitoring metastore performance.
Make sure that the HDInsight cluster and its metastore are in the same region.
Azure HDInsight Migration
The following are best practices for Azure HDInsight migration:
Script migration or replication can be used to migrate Hive metastore. You can migrate Hive metastore
with scripts by creating Hive DDLs from the existing metastore, editing the generated DDL to replace
HDFS URLs with WASB/ADLS/ABFS URLs, and then running the modified DDL on the metastore. Both
the on-premises and cloud versions of the metastore need to be compatible.
Migration Using DB Replication: When migrating your Hive metastores using DB replication, you can
use the Hive MetaTool to replace HDFS URLs with WASB/ADLS/ABFS URLs (the tool takes the new
location first, then the old one). For example:
./hive --service metatool -updateLocation wasb://@.blob.core.windows.net/ hdfs://nn1:8020/
Azure offers two approaches for migrating data from on-premises: migrating offline or migrating over TLS.
The best choice will largely depend on how much data you need to migrate.
Migrating over TLS: Microsoft Azure Storage Explorer, AzCopy, Azure PowerShell, and the Azure CLI
can be used to migrate data over TLS to Azure storage.
Migrating offline: Data Box, Data Box Disk, and Data Box Heavy devices are available for shipping
large amounts of data to Azure offline. Alternatively, you can use tools such as Apache
Hadoop DistCp, Azure Data Factory, or AzCopy to transfer data over the network.
Azure HDInsight Security and DevOps
To protect and maintain the cluster, it is wise to use the Enterprise Security Package (ESP), which provides
directory-based authentication, multi-user support, and role-based access control. ESP
can be used with a range of cluster types, including Apache Hadoop, Apache Spark, Apache HBase, Apache
Kafka, and Interactive Query (Hive LLAP).
To ensure your HDInsight deployment is secure, you need to take the following steps:
Azure Monitor: Use the Azure Monitor service for monitoring and alerting.
Stay on top of updates: Always upgrade HDInsight to the latest version, install OS patches, and reboot
your nodes.
Enforce end-to-end enterprise security, with features such as auditing, encryption, authentication,
authorization, and a private pipeline.
Protect your Azure Storage keys. By using Shared Access Signatures (SAS), you can limit
access to your Azure Storage resources. Azure Storage automatically encrypts written data using
Storage Service Encryption (SSE) and supports replication.
Make sure to update HDInsight at regular intervals. In order to do this, you can follow the steps
outlined below:
Set up a new test HDInsight cluster with the most recent HDInsight version applied.
Verify that existing jobs and workloads run correctly on the new cluster.
Modify applications or workloads as needed.
Back up any transient data stored locally on cluster nodes.
Delete the existing cluster.
Create the new cluster using the same default data store and metastore as before.
Import any transient data backups.
Finish pending jobs on the new cluster or start new ones.
Azure HDInsight Uses
The main scenarios in which we can use Azure HDInsight are:
Data Warehousing
Data warehousing is the storage of large volumes of data for retrieval and analysis at any point in time.
Data warehouses are maintained by businesses to analyze their data and make strategic decisions based on it.
HDInsight can be used for data warehousing by performing queries at very large scales on structured or
unstructured data.
Internet of Things (IoT)
We are surrounded by a large number of smart devices that make our lives easier. These IoT-enabled devices
take over the task of making small decisions about our devices for us.
IoT requires the processing and analytics of data coming in from millions of smart devices. This data is the
backbone of IoT and maintaining and processing it is vital for the proper functioning of IoT-enabled
devices.
Azure HDInsight can help in processing large volumes of data coming from numerous devices.
Data Science
Building applications that can analyze data and act on it is vital for AI-enabled solutions. These
apps need to be powerful enough to process large volumes of data and make decisions based on them.
An example worth noting would be the software used in self-driving cars. This software has to constantly
keep on learning from new experiences as well as from historical data to make real-time decisions.
Azure HDInsight helps in making applications that can extract vital information from analyzing large
volumes of data.
Hybrid Cloud
A hybrid cloud is when companies use both public and private clouds for their workflows, getting the
benefits of both, such as security, scalability, and flexibility.
Azure HDInsight can be used to extend a company’s on-premises infrastructure to the cloud for better
analytics and processing in a hybrid situation.
Azure Databricks
Databricks Introduction
Databricks is a software company founded by the creators of Apache Spark. The company has also
created well-known software such as Delta Lake, MLflow, and Koalas: popular open-source
projects that span data engineering, data science, and machine learning. Databricks develops a
web-based platform for working with Spark that provides automated cluster management and
IPython-style notebooks.
Databricks in Azure
Azure Databricks is a data analytics platform optimized for the Microsoft Azure cloud services
platform. Azure Databricks offers three environments:
Databricks SQL
Databricks data science and engineering
Databricks machine learning
Databricks SQL
Databricks SQL provides a user-friendly platform that helps analysts who work with SQL queries
to run queries on Azure Data Lake, create multiple visualizations, and build and share dashboards.
Databricks Data Science and Engineering
Databricks data science and engineering provide an interactive working environment for data
engineers, data scientists, and machine learning engineers. The two ways to send data through the
big data pipeline are:
Ingest into Azure through Azure Data Factory in batches
Stream real-time by using Apache Kafka, Event Hubs, or IoT Hub
Databricks Machine Learning
Databricks machine learning is a complete machine learning environment. It helps manage
services for experiment tracking, model training, feature development and management, and model
serving.
Pros and Cons of Azure Databricks
Next, let’s weigh the pros and cons of Azure Databricks to understand
how good it really is.
Pros
It can process large amounts of data, and since it is part of Azure, the data is cloud-native.
The clusters are easy to set up and configure.
It has an Azure Synapse Analytics connector as well as the ability to connect to Azure DB.
It is integrated with Active Directory.
It supports multiple languages. Scala is the main language, but it also works well with Python, SQL,
and R.
Cons
It did not initially integrate with Git or other version-control tools (Databricks Repos has since
added Git-synced folders).
Currently, it only supports HDInsight and not Azure Batch or AZTK.
Databricks SQL
Databricks SQL allows you to run quick ad-hoc SQL queries on the data lake. Integration with Azure Active
Directory enables complete Azure-based solutions built on Databricks SQL. Through integration with
Azure databases, Databricks SQL can work with data in Synapse Analytics, Azure Cosmos DB, Data Lake
Store, and Blob Storage. Integration with Power BI allows users to discover and share insights more
easily. BI tools, such as Tableau Software, can also be used to access Databricks.
Databricks SQL objects can be automated through the REST API.
Data Management
It has three parts:
Visualization: A graphical presentation of the result of running a query
Dashboard: A presentation of query visualizations and commentary
Alert: A notification that a field returned by a query has reached a threshold
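The alert concept (trigger a notification when a field returned by a query crosses a threshold) can be sketched in a few lines; the function and values below are illustrative, not the Databricks SQL API:

```python
# Sketch of the Databricks SQL "alert" concept: evaluate the value a scheduled
# query returned and decide whether a notification should fire.
def evaluate_alert(query_result_value, threshold, condition=">"):
    """Return True if the alert should trigger."""
    if condition == ">":
        return query_result_value > threshold
    if condition == "<":
        return query_result_value < threshold
    raise ValueError(f"unsupported condition: {condition}")

# e.g. alert when a daily error count returned by a query exceeds 100
print(evaluate_alert(157, 100))  # True -> send notification
print(evaluate_alert(42, 100))   # False -> stay quiet
```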
Computation Management
The following terms are key to running SQL queries in Databricks SQL:
Query: A valid SQL statement
SQL endpoint: A resource where SQL queries are executed
Query history: A list of previously executed queries and their characteristics
Authorization
User and group: A user is an individual who has access to the system; a set of users
is known as a group.
Personal access token: An opaque string used to authenticate to the REST API.
Access control list: A set of permissions attached to a principal that requires access to an object. The
ACL (access control list) specifies the object and the actions allowed on it.
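Authenticating to the REST API with a personal access token amounts to sending the token as a bearer credential. The sketch below builds, but does not send, such a request; the workspace URL and token are hypothetical placeholders:

```python
import urllib.request

# Build (but do not send) an authenticated request to the Databricks REST API.
# The workspace URL and token below are HYPOTHETICAL placeholders.
workspace_url = "https://adb-1234567890123456.7.azuredatabricks.net"
token = "dapiXXXXXXXXXXXXXXXX"  # personal access token (opaque string)

request = urllib.request.Request(
    url=f"{workspace_url}/api/2.0/clusters/list",
    headers={"Authorization": f"Bearer {token}"},  # token sent as bearer credential
    method="GET",
)

print(request.full_url)
print(request.get_header("Authorization"))
```

Sending the request would require a real workspace and token; the point here is only the shape of the authenticated call.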
Databricks Data Science & Engineering
Databricks Data Science & Engineering is sometimes also called the Workspace. It is an analytics platform
based on Apache Spark.
Databricks Data Science & Engineering comprises complete open-source Apache Spark cluster
technologies and capabilities. Spark in Databricks Data Science & Engineering includes the following
components:
Spark SQL and DataFrames: This is the Spark module for working with structured data. A
DataFrame is a distributed collection of data that is organized into named columns. It is very similar
to a table in a relational database or a data frame in R or Python.
Streaming: This integrates with HDFS, Flume, and Kafka, providing real-time data processing
and analysis for analytical and interactive applications.
MLlib: Short for Machine Learning Library, it consists of common learning algorithms and
utilities, including classification, regression, clustering, collaborative filtering, and dimensionality
reduction, as well as underlying optimization primitives.
GraphX: Graphs and graph computation for a broad scope of use cases from cognitive analytics to
data exploration.
Spark Core API: This has the support for R, SQL, Python, Scala, and Java.
Workspace
Workspace is the place for accessing all Azure Databricks assets. It organizes objects into folders and
provides access to data objects and computational resources.
The workspace contains:
Dashboard: It provides access to visualizations.
Library: A package made available to notebooks or jobs running on the cluster. We can also add our own
libraries.
Repo: A folder whose contents are co-versioned together by syncing them to a local Git repository.
Experiment: A collection of MLflow runs for training an ML model.
Interface
It supports a UI, an API, and a command-line interface (CLI).
UI: It provides a user-friendly interface to workspace folders and their resources.
Rest API: There are two versions, REST API 2.0 and REST API 1.2. REST API 2.0 has features
of REST API 1.2 along with some additional features. So, REST API 2.0 is the preferred version.
CLI: It is an open-source project that is available on GitHub. CLI is built on REST API 2.0.
Data Management
Databricks File System (DBFS): An abstraction layer over blob storage. It contains
directories that can hold files or further directories.
Database: It is a collection of information that can be managed and updated.
Table: Tables can be queried with Apache Spark SQL and Apache Spark APIs.
Metastore: It stores information about various tables and partitions in the data warehouse.
Computation Management
To run computations in Azure Databricks, we need to know about the following:
Cluster: It is a set of computation resources and configurations on which we can run notebooks and
jobs. These are of two types:
o All-purpose: We create an all-purpose cluster by using UI, CLI, or REST API. We can
manually terminate and restart an all-purpose cluster. Multiple users can share such clusters
to do collaborative, interactive analysis.
o Job: The Azure Databricks job scheduler creates a job cluster when we run a job on a new
job cluster and terminates the cluster when the job is complete. We cannot restart a job
cluster.
Pool: A set of idle, ready-to-use instances that reduce cluster start and auto-scaling times.
If the pool does not have enough resources, it expands itself. When an attached cluster is terminated,
the instances it used are returned to the pool and can be reused by a different cluster.
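The two cluster types can be illustrated with the kind of JSON spec submitted to the Clusters REST API (2.0). The field names below follow the public Clusters API, but the values are hypothetical and this is a sketch rather than a complete request:

```python
import json

# Sketch of a cluster spec as submitted to the Clusters REST API (2.0).
# Field names follow the public Clusters API; the values are hypothetical.
all_purpose_cluster = {
    "cluster_name": "interactive-analysis",
    "spark_version": "11.3.x-scala2.12",
    "node_type_id": "Standard_DS3_v2",
    "num_workers": 2,
    "autotermination_minutes": 60,  # all-purpose clusters can auto-terminate
}

# A job cluster is essentially the same spec embedded in a job definition; the
# scheduler creates it for the run and terminates it when the job completes.
job_definition = {
    "name": "nightly-etl",
    "new_cluster": {k: v for k, v in all_purpose_cluster.items()
                    if k != "autotermination_minutes"},
    "notebook_task": {"notebook_path": "/Jobs/nightly_etl"},
}

print(json.dumps(job_definition, indent=2))
```

The distinction mirrors the text: an all-purpose cluster is created and managed directly, while a job cluster exists only inside a job definition and lives for one run.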
Databricks Runtime
Databricks Runtime comprises the core components that run on clusters managed by Azure Databricks.
Several runtimes are offered:
The base Databricks Runtime includes Apache Spark but also adds numerous other features to improve big
data analytics.
Databricks Runtime for machine learning is built on Databricks runtime and provides a ready
environment for machine learning and data science.
Databricks Runtime for genomics is a version of Databricks runtime that is optimized for working
with genomic and biomedical data.
Databricks Light is the Azure Databricks packaging of the open-source Apache Spark runtime.
Job
Workload: There are two types of workloads with respect to the pricing schemes:
o Data engineering workload: This workload works on a job cluster.
o Data analytics workload: This workload runs on an all-purpose cluster.
Execution context: It is the state of a REPL environment. It supports Python, R, Scala, and SQL.
Model Management
The key concepts for building machine learning models are:
Model: This is a mathematical function that represents the relation between inputs and outputs.
Machine learning consists of training and inference steps. We can train a model by using an existing
data set and using that to predict the outcomes of new data.
Run: It is a collection of parameters, metrics, and tags that are related to training a machine learning
model.
Experiment: It is the primary unit of organization and access control for runs. All MLflow runs
belong to an experiment.
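These tracking concepts can be mimicked locally with plain data structures. The sketch below is a stand-in for MLflow tracking, showing how an experiment groups runs and each run records parameters, metrics, and tags; it is not the MLflow API itself:

```python
# Local stand-in for the MLflow tracking concepts described above: an
# experiment groups runs, and each run records parameters, metrics, and tags.
experiment = {"name": "churn-model", "runs": []}

def log_run(experiment, params, metrics, tags):
    run = {"params": params, "metrics": metrics, "tags": tags}
    experiment["runs"].append(run)
    return run

log_run(
    experiment,
    params={"max_depth": 5, "learning_rate": 0.1},  # training inputs
    metrics={"auc": 0.87},                          # training outcomes
    tags={"stage": "dev"},                          # free-form labels
)

# Pick the best run by a metric, as an experiment UI would.
best = max(experiment["runs"], key=lambda r: r["metrics"]["auc"])
print(best["params"])
```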
Authentication and Authorization
User and group: A user is an individual who has access to the system. A set of users is a group.
Access control list: Access control list (ACL) is a set of permissions that are attached to a principal,
which requires access to an object. ACL specifies the object and the actions allowed on it.
Databricks Machine Learning
Databricks machine learning is an integrated end-to-end machine learning platform incorporating managed
services for experiment tracking, model training, feature development and management, and feature and
model serving. Databricks machine learning automates the creation of a cluster that is optimized for
machine learning. Databricks Runtime ML clusters include the most popular machine learning libraries
such as TensorFlow, PyTorch, Keras, and XGBoost. It also includes libraries, such as Horovod, that are
required for distributed training.
With Databricks machine learning, we can:
Train models either manually or with AutoML
Track training parameters and models by using experiments with MLflow tracking
Create feature tables and access them for model training and inference
Share, manage, and serve models by using Model Registry
We also have access to all of the capabilities of Azure Databricks workspace such as notebooks, clusters,
jobs, data, Delta tables, security and admin controls, and many more.
When to use Databricks
1. Modernize your Data Lake – if you face challenges around performance and reliability in your
data lake, or your data lake has become a data swamp, consider Delta Lake as an option to modernize
your data lake.
2. Production Machine Learning – if your organization is doing data science work but having trouble
getting that work into the hands of business users, the Databricks platform was built to help data
scientists take their work from development to production.
3. Big Data ETL – from a cost/performance perspective, Databricks is best in its class.
4. Opening your Data Lake to BI users – If your analyst / BI group is consistently slowed down by the
major lift of the engineering team having to build a pipeline every time they want to access new
data, it might make sense to open the Data Lake to these users through a tool like SQL Analytics
within Databricks.
When not to use Databricks
There are a few scenarios when using Databricks is probably not the best fit for your use case:
1. Sub-second queries – Spark, being a distributed engine, has processing overhead that makes it
nearly impossible to achieve sub-second queries. Your data can still live in the data lake, but for
sub-second queries you will likely want to use a highly tuned speed layer.
2. Small data – Similar to the first point, you won't get the majority of the benefits of Databricks if you
are dealing with very small data (think GBs).
3. Pure BI without a supporting data engineering team – Databricks and SQL Analytics do not erase
the need for a data engineering team – in fact, that team is more critical than ever in unlocking the
potential of the Data Lake. That said, Databricks offers tools to empower the data engineering team
itself.
4. Teams requiring drag and drop ETL – Databricks has many UI components, but drag-and-drop ETL
is not currently one of them.
Usage of Internet of Things (IoT) Hub
Azure IoT hub allows you to get on with developing cool IoT stuff, and not worry about how it all gets
connected up and managed.
Internet of Things (IoT) offers businesses immediate and real-world opportunities to reduce costs,
increase revenue, and transform their businesses. Azure IoT hub is a managed IoT service which
is hosted in the cloud. It allows bi-directional communication between IoT applications and the devices it
manages. This cloud-to-device connectivity means that you can receive data from your devices, but you
can also send commands and policies back to the devices. How Azure IoT hub differs from the existing
solutions is that it also provides the infrastructure to authenticate, connect and manage the devices
connected to it.
Azure IoT Hub enables full-featured and scalable IoT solutions. Virtually any device can be connected
to Azure IoT Hub, and it can scale up to millions of devices. Events such as the creation, failure, and
connection of devices can be tracked and monitored.
Azure IoT Hub provides,
Device libraries for the most commonly used platforms and languages for easy device connectivity.
Secure communications with multiple options for device-to-cloud and cloud-to-device hyper-scale
communication.
Queryable storage of per-device state information as well as meta-data.
Managing devices with IoT Hub
The needs and requirements of IoT operators vary substantially in different industries, from transport
to manufacturing to agriculture to utilities. There is also a wide variation in the types of devices used
by IoT operators. IoT Hub is able to provide the capabilities, patterns and code libraries to allow
developers to build management solutions that can manage very diverse sets of devices.
Configuring and controlling devices
Devices which are connected to IoT Hub can be managed using an array of built-in functionality. This
means that:
Device metadata and state information for all your devices can be stored, synchronized and queried.
Device state can be set either per-device or in groups depending on common characteristics of the
devices.
A state change in a device can be automatically responded to by using message routing integration.
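The state synchronization described above can be sketched as merging desired properties into the device's reported state, much like a device applying a twin patch. This is a simplified, flat-key model for illustration, not the Azure SDK:

```python
def apply_desired(reported, desired):
    """Merge desired properties into the reported state, as a device would
    after receiving a twin patch (simplified: flat keys only)."""
    updated = dict(reported)
    for key, value in desired.items():
        if value is None:            # a null desired value removes the property
            updated.pop(key, None)
        else:
            updated[key] = value
    return updated

# The operator sets desired properties; the device reconciles its state.
reported = {"telemetryInterval": 60, "firmware": "1.0.2"}
desired = {"telemetryInterval": 30, "ledOn": True}
new_state = apply_desired(reported, desired)
```

In the real service the device would then write `new_state` back as its reported properties, which the operator can query to confirm the change took effect.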
The lifecycle of devices with IoT Hub
Plan
Operators can create a device metadata scheme that allows them to easily carry out bulk management
operations.
Provision
New devices can be securely provisioned to IoT Hub and operators can quickly discover device
capabilities. The IoT Hub identity registry is used to create device identities and credentials.
Configure
Device management operations, such as configuration changes and firmware updates can be done in
bulk or by direct methods, while still maintaining system security.
Monitor
Operators can be easily alerted to any issues that arise, and at the same time the health of the device
collection can be monitored, as well as the status of any ongoing operations.
Retire
Devices need to be replaced, retired or decommissioned. The IoT Hub identity registry is used to
withdraw device identities and credentials.
Device management patterns
IoT Hub supports a range of device management patterns including,
Reboot
Factory reset
Configuration
Firmware update
Reporting progress and status
These patterns can be extended to fit your exact situation. Alternatively, new patterns can be designed based
on these templates.
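These device management patterns can be modeled as a direct-method dispatcher running on the device: the cloud invokes a named method, the device runs a handler and reports progress and status. The method and handler names below are hypothetical, and this is a sketch of the pattern, not the Azure IoT SDK:

```python
class DeviceAgent:
    """Dispatches cloud-initiated management methods to local handlers and
    records progress (an illustrative sketch, not the Azure IoT SDK)."""

    def __init__(self):
        self.progress = []
        self._handlers = {
            "reboot": self._reboot,
            "factoryReset": self._factory_reset,
            "firmwareUpdate": self._firmware_update,
        }

    def invoke(self, method, payload=None):
        handler = self._handlers.get(method)
        if handler is None:
            return {"status": 404, "message": f"unknown method: {method}"}
        return handler(payload or {})

    def _reboot(self, payload):
        self.progress.append("rebooting")
        return {"status": 200, "message": "reboot scheduled"}

    def _factory_reset(self, payload):
        self.progress.append("resetting")
        return {"status": 200, "message": "factory reset scheduled"}

    def _firmware_update(self, payload):
        version = payload.get("version", "unknown")
        self.progress.append(f"updating to {version}")
        return {"status": 200, "message": f"update to {version} started"}

agent = DeviceAgent()
reply = agent.invoke("firmwareUpdate", {"version": "2.1.0"})
```

Extending the pattern for your own situation amounts to registering a new handler in the dispatch table.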
Connecting your devices
You can build applications which run on your devices and interact with IoT Hub using the Azure IoT device
SDK. Windows, Linux distributions, and real-time operating systems are supported platforms. Supported
languages currently include,
C
C#
Java
Python
Node.js.
Messaging Patterns
Azure IoT Hub supports a range of messaging patterns including,
Device to cloud telemetry
File upload from devices
Request-reply methods which enable devices to be controlled from the cloud
Message routing and event grid
Both IoT Hub message routing and IoT Hub integration with Event Grid make it possible to stream data
from your connected devices. However, there are differences. Message routing allows users to route device-
to-cloud messages to a range of supported service endpoints such as Event Hubs and Azure Storage
containers, while IoT Hub integration with Event Grid is a fully managed routing service which can be
extended into third-party business applications.
Device data can be routed
In Azure IoT Hub, the message routing functionality is built in. This allows you to set up automatic rules-
based message fan-out. You can use message routing to decide where your hub sends your devices’
telemetry. Routing messages to multiple endpoints doesn't incur any extra costs.
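Rules-based message fan-out can be sketched as a list of (condition, endpoint) pairs evaluated against each message's properties. The endpoint names here are made up for illustration; real routes use IoT Hub's routing query syntax rather than Python predicates:

```python
def route_message(message, routes):
    """Return every endpoint whose condition matches the message; messages
    matching no route fall through to the built-in default endpoint."""
    matched = [endpoint for condition, endpoint in routes if condition(message)]
    return matched or ["events"]

# Hypothetical routes: high temperature goes to an Event Hub, critical
# messages are archived to a storage container.
routes = [
    (lambda m: m.get("temperature", 0) > 75, "hot-alerts-eventhub"),
    (lambda m: m.get("level") == "critical", "storage-container"),
]

alerts = route_message({"temperature": 80, "level": "critical"}, routes)
normal = route_message({"temperature": 20}, routes)
```

Note the fan-out behavior: a message matching several routes is delivered to all of them, which is why routing to multiple endpoints matters for cost.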
Building end-to-end solutions
End-to-end solutions can be built by integrating IoT Hub with other Azure services. For example,
Business processes can be automated using Azure Logic Apps.
You can run analytic computations in real-time on the data from your devices using Azure Stream
Analytics.
AI models and machine learning can be added using Azure Machine Learning.
You can respond rapidly to critical events with Azure Event Grid.
Azure IoT Hub or Azure Event Hub?
Both Azure IoT Hub and Azure Event Hub are cloud services which can ingest, process and store large
amounts of data. However, they were designed with different purposes in mind. Event Hub was developed
for big data streaming while IoT Hub was designed specifically to connect IoT devices at scale to the Azure
Cloud. Therefore, which one you choose to use will depend on the demands of your business.
Security
Businesses face security, privacy, and compliance challenges which are unique to the IoT. Security for IoT
solutions means that devices need to be securely provisioned and there needs to be secure connectivity
between the devices and the cloud, as well as secure data protection in the cloud during processing and
storage.
IoT Hub allows data to be sent on secure communications channels. Each device connects securely to the
hub and each device can be managed securely. You can control access at the per-device level and devices
are automatically provisioned to the correct hub when the device first boots up.
There’s also a range of different types of authentication depending on device capabilities, including
SAS token-based authentication, individual X.509 certificate authentication for secure, standards-based
authentication, as well as X.509 CA authentication.
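SAS token generation can be sketched with the standard library, following the widely documented format: an HMAC-SHA256 signature over the URL-encoded resource URI and an expiry timestamp, keyed with the base64-decoded shared access key. This is a sketch for study purposes; verify the details against the official IoT Hub security documentation before using it:

```python
import base64
import hashlib
import hmac
import time
import urllib.parse

def generate_sas_token(resource_uri, b64_key, policy_name=None, ttl_seconds=3600):
    """Build a SharedAccessSignature: HMAC-SHA256 over '<uri>\\n<expiry>',
    keyed with the base64-decoded shared access key."""
    expiry = int(time.time()) + ttl_seconds
    uri = urllib.parse.quote_plus(resource_uri)
    to_sign = f"{uri}\n{expiry}".encode("utf-8")
    sig = base64.b64encode(
        hmac.new(base64.b64decode(b64_key), to_sign, hashlib.sha256).digest()
    ).decode("utf-8")
    token = f"SharedAccessSignature sr={uri}&sig={urllib.parse.quote_plus(sig)}&se={expiry}"
    if policy_name:            # device tokens omit skn; policy tokens include it
        token += f"&skn={policy_name}"
    return token

# Example with a throwaway key -- never embed real keys in code.
demo_key = base64.b64encode(b"device-primary-key").decode("utf-8")
token = generate_sas_token("myhub.azure-devices.net/devices/device-1", demo_key)
```

In practice the device SDKs generate and renew these tokens for you; doing it by hand is mainly useful for understanding what the SDK presents to the hub.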
High Availability and Disaster Recovery
Uptime goals vary from business to business. Azure IoT Hub offers three main High Availability (HA) and
Disaster Recovery (DR) features including:
Intra-region HA
The IoT Hub service provides intra-region HA by implementing redundancies in almost all layers of
the service. The SLA published by the IoT Hub service is achieved by making use of these
redundancies, which are available to developers automatically. However, transient failures should be
expected when using cloud computing; therefore, appropriate retry policies need to be built into
components which interact with the cloud in order to deal with these transient failures.
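Such a retry policy is commonly implemented as exponential backoff with jitter around any cloud call. In this sketch, `TransientError` is a stand-in for whatever throttling or connectivity exceptions the SDK you use actually raises:

```python
import random
import time

class TransientError(Exception):
    """Stand-in for a transient fault (throttling, dropped connection)."""

def with_retries(operation, max_attempts=5, base_delay=0.1):
    """Retry an operation on transient failure, doubling the delay each
    attempt and adding jitter; re-raise once the budget is exhausted."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts - 1:
                raise
            delay = base_delay * (2 ** attempt) * (0.5 + random.random() / 2)
            time.sleep(delay)

# Simulated cloud call that fails twice before succeeding.
calls = {"count": 0}
def flaky_send():
    calls["count"] += 1
    if calls["count"] < 3:
        raise TransientError("throttled")
    return "accepted"

result = with_retries(flaky_send, base_delay=0.001)
```

The jitter term spreads retries out in time so that many clients recovering from the same outage do not hammer the service in lockstep.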
Cross region DR
Situations may arise when a datacentre suffers from extended outages or some other physical
failure. It is rare but possible that intra-region HA capability may not be able to help in some of
these situations. However, IoT Hub has a number of possible solutions for recovering from extended
outages or physical failures. In these situations, a customer can have a Microsoft initiated failover or
a manual failover.
Both of these options offer defined recovery time objectives (RTO).
Achieving cross region HA
If the RTOs provided by either the Microsoft initiated failover or manual failover aren’t sufficient
for your uptime goals, then another option is to implement a per-device automatic cross region
failover mechanism. In this model, the IoT solution runs in a primary and secondary datacentre in
two different locations. If there’s an outage or a loss of network connectivity in the primary region,
the devices can use the secondary location.
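The per-device failover model can be sketched as trying the primary region's endpoint first and falling back to the secondary on a connection failure. This is deliberately simplified: a real solution also handles re-provisioning the device identity in the secondary hub:

```python
def send_telemetry(message, primary, secondary):
    """Try the primary region first; on a connection failure, fail over to
    the secondary (simplified per-device model)."""
    last_error = None
    for region_send in (primary, secondary):
        try:
            return region_send(message)
        except ConnectionError as err:
            last_error = err
    raise ConnectionError("both regions unavailable") from last_error

def primary_down(message):        # simulated outage in the primary region
    raise ConnectionError("primary unreachable")

def secondary_up(message):        # healthy secondary region
    return f"delivered: {message}"

result = send_telemetry("temp=21", primary_down, secondary_up)
```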
Choosing the right IoT Hub tier
Azure IoT hub offers two tiers, basic and standard. The basic tier which is uni-directional from
devices to the cloud is more suitable if the data is going to be gathered from devices and analyzed
centrally. However, if you want bi-directional communication, enabling you to, for example, control
devices remotely, then the standard tier is more appropriate. Both tiers have the same security and
authentication features.
Each tier has three different sizes (1, 2 and 3), depending on how much data they can handle in a
day. For instance, a level 3 unit can handle 300 million messages a day while a level 1 unit can
handle 400,000.
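A quick capacity check based on these per-unit quotas can be sketched as follows. The size-1 and size-3 figures come from the text above; the size-2 figure of 6 million messages per day is taken from Azure's published quotas and should be verified against current documentation:

```python
# Daily message quota per unit for each size. Sizes 1 and 3 are from the
# text above; size 2 (6 million/day) is an assumption from Azure's published
# quotas -- verify against current documentation.
MESSAGES_PER_UNIT_PER_DAY = {1: 400_000, 2: 6_000_000, 3: 300_000_000}

def units_needed(size, messages_per_day):
    """Smallest number of units of a given size covering the daily volume."""
    quota = MESSAGES_PER_UNIT_PER_DAY[size]
    return -(-messages_per_day // quota)   # ceiling division

small = units_needed(1, 1_000_000)      # three size-1 units
large = units_needed(3, 300_000_000)    # one size-3 unit
```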
IoT Central:
IoT Central is an IoT application platform as a service (aPaaS) that reduces the burden and cost of
developing, managing, and maintaining enterprise-grade IoT solutions. If you choose to build with IoT
Central, you'll have the opportunity to focus time, money, and energy on transforming your business
with IoT data, rather than just maintaining and updating a complex and continually evolving IoT
infrastructure.
The web UI lets you quickly connect devices, monitor device conditions, create rules, and manage
millions of devices and their data throughout their life cycle. Furthermore, it enables you to act on device
insights by extending IoT intelligence into line-of-business applications.
The key features of the Azure IoT Hub integration are:
Handling uplink messages: The Things Stack publishes uplink messages to an Azure IoT Central
Application
Automatic device provisioning: end devices are automatically registered in the Azure IoT Central
Application, using the LoRaWAN device repository information to provision the end device
template
Updating device state in Device Twin: update the device reported properties based on the decoded
payloads, and schedule downlinks based on the device desired properties
Architecture
The Azure IoT Central integration does not require any additional physical resources in your Azure
account. It connects to the Azure IoT Central Application using the underlying Azure IoT Device
Provisioning Service, then submits traffic using the Azure IoT Hub in which the application has been
provisioned.
The single resource deployed in your Azure account is the Azure IoT Central Application. The
integration is granted only the minimum permissions required to function.
Implementation details
Azure IoT Hub is designed around standalone end devices communicating directly with the hub. Each end
device must connect to the hub via one of the supported communication protocols (MQTT / AMQP). These
protocols are inherently stateful - each individual end device must have one connection always open in
order to send and receive messages from the Azure IoT Hub.
LoRaWAN end devices are in general low power, low resources devices with distinct traffic patterns.
Communication in the LoRaWAN world also does not have the concept of a connection, in the TCP sense,
but instead focuses on a communication session. Downlink traffic, which would map to IoT Hub cloud-to-
device messages, occurs rarely at application layer for most use cases. As such, keeping a connection open
per end device is both wasteful and hard to scale, as both communication protocols mentioned above
enforce that each end device has its own individual connection, and no subscription-group semantics
are available.
Based on the above arguments, the Azure IoT Central integration prefers to use an asynchronous, stateless
communication style. When uplink messages are received from an end device, the integration connects on
demand to the Azure IoT Hub and submits the message, and also updates the Device Twin. The data plane
protocol used between The Things Stack and Azure IoT Hub is MQTT, and the connections are always
secure using TLS 1.2.
Device Twin desired properties updates and device creation or deletion events are received by The Things
Stack using an IoT Central Data Export. The Data Export submits the data via HTTP requests which are
authenticated using the API key provided during the integration provisioning, and connections are always
done over TLS. This pipeline allows The Things Stack to avoid long running connections to the Azure IoT
Hub.
Azure Sphere
Microsoft’s website states that “Azure Sphere is a solution for creating highly secured, connected
Microcontroller (MCU) devices”. But it is not just about the MCU, of course.
The solution also includes an operating system and an application platform. This provides product
manufacturers with a chance to create secured, internet-connected devices that can be controlled,
updated, monitored and maintained remotely.
Azure Sphere is a secured, high-level application platform with built-in communication and security
features for internet-connected devices. It comprises a secured, connected, crossover microcontroller unit
(MCU), a custom high-level Linux-based operating system (OS), and a cloud-based security service that
provides continuous, renewable security.
The Azure Sphere MCU integrates real-time processing capabilities with the ability to run a high-level
operating system. An Azure Sphere MCU, along with its operating system and application platform,
enables the creation of secured, internet-connected devices that can be updated, controlled, monitored,
and maintained remotely. A connected device that includes an Azure Sphere MCU, either alongside or in
place of an existing MCU, provides enhanced security, productivity, and opportunity. For example:
A secured application environment, authenticated connections, and opt-in use of peripherals
minimize security risks due to spoofing, rogue software, or denial-of-service attacks, among
others.
Software updates can be automatically deployed from the cloud to any connected device to fix
problems, provide new functionality, or counter emerging methods of attack, thus enhancing the
productivity of support personnel.
Product usage data can be reported to the cloud over a secured connection to help in diagnosing
problems and designing new products, thus increasing the opportunity for product service, positive
customer interactions, and future development.
The Azure Sphere Security Service is an integral aspect of Azure Sphere. Using this service, Azure
Sphere MCUs safely and securely connect to the cloud and web. The service ensures that the device boots
only with an authorized version of genuine, approved software. In addition, it provides a secured channel
through which Microsoft can automatically download and install OS updates to deployed devices in the
field to mitigate security problems. Neither manufacturer nor end-user intervention is required, thus
closing a common security hole.
Azure Sphere consists of three main parts:
Secured Micro-controller Unit (MCU)
The first part is a crossover class of MCU with built-in Microsoft security technology and connectivity.
Each Azure Sphere MCU includes a wireless communications subsystem that facilitates an internet
connection.
It is worth mentioning that the Sphere’s MCU provides a kind of hardware firewall or “sandbox” that
ensures that only certain I/O peripherals are accessible to the core to which they are mapped. Consequently,
you cannot connect any sensors without first declaring them.
The application processor also features an ARM Cortex-A subsystem, responsible for executing the
operating system, applications and services. It supports two operating environments:
Normal World (NW) – executes code in both user mode and supervisor mode
Secure World (SW) – executes only the Microsoft-supplied Security Monitor.
Secured OS
The second component is a highly secured OS from Microsoft with a custom kernel running on top of
Microsoft’s Security Monitor. This creates a trustworthy defense-in-depth platform.
The purpose of the OS services is two-fold: to host the application container, and to facilitate
communication with the Azure Sphere Security Service described below. These services manage Wi-Fi
authentication, including the network firewall for all outbound traffic.
Cloud Security
The Azure Sphere Security Service guards every Azure Sphere device by renewing security, identifying
emerging threats, and brokering trust among devices and the cloud. It also provides certificate-based
authentication. Additionally, the remote attestation service connects with the device to test if it booted
with the correct software, including its version.
Furthermore, the Security Service distributes automatic updates for all Microsoft-supplied Azure Sphere
OS and OEM software. As a result, manufacturers can securely update their devices remotely without
having to worry about whether an update has been falsified.
Finally, there is a small crash-reporting module which provides crash reporting for deployed software.
How does Azure Sphere work in practice?
You might wonder how to use Azure Sphere in a real-life scenario. Let’s say that our company, Predica,
is a manufacturer of washing machines.
In our example, Predica provides high-class, intelligent washing machines that users can remotely
control from a mobile app. Each washing machine has an embedded Azure Sphere MCU.
Predica has a software development team responsible for developing both software for the washing
machines, as well as the mobile application. There is also a support team responsible for maintenance
and detection of potential errors.
Take a look at the diagram below that visualizes the scenario:
Microsoft – handles the security aspect. The Azure Sphere Security Service is used to send system
updates automatically, so Predica as the manufacturer does not have to worry about them
Predica software team – develops and releases revisions of software for the washing machines,
which is uploaded to the devices using Microsoft Azure cloud services
Predica support team – responsible for maintenance, checking the system and application versions
on each washer, as well as detecting possible issues.
Azure Sphere provides a way to monitor and control all devices in a secured and centralized way. This is the
real power of this solution.
How to begin your journey with Azure Sphere?
The Azure Sphere Development Board (hardware) is already available to you. You can order it from
the Seeed Studio online store. However, once you receive the board, there are a few additional things that
you will need to get started:
Visual Studio 2017 IDE – Enterprise, Professional or Community, version 15.7 or later
A PC running Windows 10 Anniversary Update or later
Azure Sphere SDK Preview for Visual Studio
An unused USB port on the PC.
It is important to note that at this time the tools for Azure Sphere are still in preview. You do not require a
Microsoft Azure cloud subscription to use Azure Sphere and start development.
Azure Sphere and the seven properties of highly secured devices
A primary goal of the Azure Sphere platform is to provide high-value security at a low cost, so that price-
sensitive, microcontroller-powered devices can safely and reliably connect to the internet. As network-
connected toys, appliances, and other consumer devices become commonplace, security is of utmost
importance. Not only must the device hardware itself be secured, its software and its cloud connections
must also be secured. A security lapse anywhere in the operating environment threatens the entire product
and, potentially, anything or anyone nearby.
Based on Microsoft's decades of experience with internet security, the Azure Sphere team has
identified seven properties of highly secured devices. The Azure Sphere platform is designed around these
seven properties:
Hardware-based root of trust. A hardware-based root of trust ensures that the device and its identity
cannot be separated, thus preventing device forgery or spoofing. Every Azure Sphere MCU is identified by
an unforgeable cryptographic key that is generated and protected by the Microsoft-designed Pluton security
subsystem hardware. This ensures a tamper-resistant, secured hardware root of trust from factory to end
user.
Defense in depth. Defense in depth provides for multiple layers of security and thus multiple mitigations
against each threat. Each layer of software in the Azure Sphere platform verifies that the layer above it is
secured.
Small trusted computing base. Most of the device's software remains outside the trusted computing base,
thus reducing the surface area for attacks. Only the secured Security Monitor, Pluton runtime, and Pluton
subsystem—all of which Microsoft provides—run on the trusted computing base.
Dynamic compartments. Dynamic compartments limit the reach of any single error. Azure Sphere MCUs
contain silicon counter-measures, including hardware firewalls, to prevent a security breach in one
component from propagating to other components. A constrained, "sandboxed" runtime environment
prevents applications from corrupting secured code or data.
Password-less authentication. The use of signed certificates, validated by an unforgeable cryptographic
key, provides much stronger authentication than passwords. The Azure Sphere platform requires every
software element to be signed. Device-to-cloud and cloud-to-device communications require further
authentication, which is achieved with certificates.
Error reporting. Errors in device software or hardware are typical in emerging security attacks; errors that
result in device failure constitute a denial-of-service attack. Device-to-cloud communication provides early
warning of potential errors. Azure Sphere devices can automatically report operational data and errors to a
cloud-based analysis system, and updates and servicing can be performed remotely.
Renewable security. The device software is automatically updated to correct known vulnerabilities or
security breaches, requiring no intervention from the product manufacturer or the end user. The Azure
Sphere Security Service updates the Azure Sphere OS and your applications automatically.
Azure Sphere architecture
Working together, the Azure Sphere hardware, software, and Security Service enable unique, integrated
approaches to device maintenance, control, and security.
The hardware architecture provides a fundamentally secured computing base for connected devices,
allowing you to focus on your product.
The software architecture, with a secured custom OS kernel running atop the Microsoft-written Security
Monitor, similarly enables you to concentrate your software efforts on value-added IoT and device-specific
features.
The Azure Sphere Security Service supports authentication, software updates, and error reporting over
secured cloud-to-device and device-to-cloud channels. The result is a secured communications
infrastructure that ensures that your products are running the most up-to-date Azure Sphere OS. For
architecture diagrams and examples of cloud architectures, see Browse Azure Architectures.
Hardware architecture
An Azure Sphere crossover MCU consists of multiple cores on a single die, as the following figure shows.
Azure Sphere MCU hardware architecture
Each core, and its associated subsystem, is in a different trust domain. The root of trust resides in the Pluton
security subsystem. Each layer of the architecture assumes that the layer above it may be compromised.
Within each layer, resource isolation and dynamic compartments provide added security.
Microsoft Pluton security subsystem
The Pluton security subsystem is the hardware-based (in silicon) secured root of trust for Azure Sphere. It
includes a security processor core, cryptographic engines, a hardware random number generator,
public/private key generation, asymmetric and symmetric encryption, support for elliptic curve digital
signature algorithm (ECDSA) verification for secured boot, and measured boot in silicon to support remote
attestation with a cloud service, as well as various tampering counter-measures including an entropy
detection unit.
As part of the secured boot process, the Pluton subsystem boots various software components. It also
provides runtime services, processes requests from other components of the device, and manages critical
components for other parts of the device.
High-level application core
The high-level application core features an ARM Cortex-A subsystem that has a full memory management
unit (MMU). It enables hardware-based compartmentalization of processes by using trust zone functionality
and is responsible for running the operating system, high-level applications, and services. It supports two
operating environments: Normal World (NW), which runs code in both user mode and supervisor mode,
and Secure World (SW), which runs only the Microsoft-supplied Security Monitor. Your high-level
applications run in NW user mode.
Real-time cores
The real-time cores feature an ARM Cortex-M I/O subsystem that can run real-time capable applications
as either bare-metal code or a real-time operating system (RTOS). Such applications can map peripherals
and communicate with high-level applications but cannot access the internet directly.
Connectivity and communications
The first Azure Sphere MCU provides an 802.11 b/g/n Wi-Fi radio that operates at both 2.4GHz and 5GHz.
High-level applications can configure, use, and query the wireless communications subsystem, but they
cannot program it directly. In addition to or instead of using Wi-Fi, Azure Sphere devices that are properly
equipped can communicate on an Ethernet network.
Multiplexed I/O
The Azure Sphere platform supports a variety of I/O capabilities, so that you can configure embedded
devices to suit your market and product requirements. I/O peripherals can be mapped to either the high-
level application core or to a real-time core.
Microsoft firewalls
Hardware firewalls are silicon countermeasures that provide "sandbox" protection to ensure that I/O
peripherals are accessible only to the core to which they are mapped. The firewalls impose
compartmentalization, thus preventing a security threat that is localized in the high-level application core
from affecting the real-time cores' access to their peripherals.
Integrated RAM and flash
Azure Sphere MCUs include a minimum of 4MB of integrated RAM and 16MB of integrated flash memory.
Software architecture and OS
The high-level application platform runs the Azure Sphere OS along with a device-specific high-level
application that can communicate both with the internet and with real-time capable applications that run on
the real-time cores. The following figure shows the elements of this platform.
Microsoft-supplied elements are shown in gray.
High-level Application Platform
Microsoft provides and maintains all software other than your device-specific applications. All software
that runs on the device, including the high-level application, is signed by the Microsoft certificate authority
(CA). Application updates are delivered through the trusted Microsoft pipeline, and the compatibility of
each update with the Azure Sphere device hardware is verified before installation.
Application runtime
The Microsoft-provided application runtime is based on a subset of the POSIX standard. It consists of
libraries and runtime services that run in NW user mode. This environment supports the high-level
applications that you create.
Application libraries support networking, storage, and communications features that are required by high-
level applications but do not support direct generic file I/O or shell access, among other constraints. These
restrictions ensure that the platform remains secured and that Microsoft can provide security and
maintenance updates. In addition, the constrained libraries provide a long-term stable API surface so that
system software can be updated to enhance security while retaining binary compatibility for applications.
OS services
OS services host the high-level application container and are responsible for communicating with the Azure
Sphere Security Service. They manage network authentication and the network firewall for all outbound
traffic. During development, OS services also communicate with a connected PC and the application that
is being debugged.
Custom Linux kernel
The custom Linux-based kernel runs in supervisor mode, along with a boot loader. The kernel is carefully
tuned for the flash and RAM footprint of the Azure Sphere MCU. It provides a surface for preemptable
execution of user-space processes in separate virtual address spaces. The driver model exposes MCU
peripherals to OS services and applications. Azure Sphere drivers include Wi-Fi (which includes a TCP/IP
networking stack), UART, SPI, I2C, and GPIO, among others.
Security Monitor
The Microsoft-supplied Security Monitor runs in SW. It is responsible for protecting security-sensitive
hardware, such as memory, flash, and other shared MCU resources and for safely exposing limited access
to these resources. The Security Monitor brokers and gates access to the Pluton Security Subsystem and the
hardware root of trust and acts as a watchdog for the NW environment. It starts the boot loader, exposes
runtime services to NW, and manages hardware firewalls and other silicon components that are not
accessible to NW.
Azure Sphere Security Service
The Azure Sphere Security Service comprises three components: password-less authentication, update, and
error reporting.
Password-less authentication. The authentication component provides remote attestation and
password-less authentication. The remote attestation service connects via a challenge-response
protocol that uses the measured boot feature on the Pluton subsystem. It verifies not only that the
device booted with the correct software, but that it booted with the correct version of that software.
After attestation succeeds, the authentication service takes over. The authentication service
communicates over a secured TLS connection and issues a certificate that the device can present to a
web service, such as Microsoft Azure or a company's private cloud. The web service validates the
certificate chain, thus verifying that the device is genuine, that its software is up to date, and that
Microsoft is its source. The device can then connect safely and securely with the online service.
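The challenge-response flow can be sketched roughly as follows, assuming a shared device key and an HMAC-based scheme. The `Device` and `AttestationService` names are illustrative assumptions; the real Pluton protocol uses hardware-backed keys and differs in detail.

```python
import hashlib
import hmac
import os

# Known-good measurement of the OS image the service expects devices to boot.
EXPECTED_MEASUREMENT = hashlib.sha256(b"os-image-v42").hexdigest()

class Device:
    """Holds a device-unique secret key and a measured-boot digest."""
    def __init__(self, key: bytes, booted_image: bytes):
        self._key = key
        self._measurement = hashlib.sha256(booted_image).hexdigest()

    def respond(self, challenge: bytes):
        # Sign the fresh challenge together with the boot measurement,
        # proving both possession of the key and what software booted.
        mac = hmac.new(self._key, challenge + self._measurement.encode(),
                       hashlib.sha256)
        return self._measurement, mac.hexdigest()

class AttestationService:
    """Server side: knows the device key and the expected measurement."""
    def __init__(self, key: bytes):
        self._key = key

    def attest(self, device: Device) -> bool:
        challenge = os.urandom(32)  # fresh nonce prevents replay
        measurement, mac = device.respond(challenge)
        expected = hmac.new(self._key, challenge + measurement.encode(),
                            hashlib.sha256).hexdigest()
        # Both the MAC and the reported software version must check out.
        return hmac.compare_digest(mac, expected) and measurement == EXPECTED_MEASUREMENT

key = os.urandom(32)
service = AttestationService(key)
assert service.attest(Device(key, b"os-image-v42"))      # correct software version
assert not service.attest(Device(key, b"os-image-v41"))  # wrong version is rejected
```

Only after this check succeeds would the authentication service issue the device its certificate.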
Update. The update service distributes automatic updates for the Azure Sphere OS and for
applications. The update service ensures continued operation and enables the remote servicing and
update of application software.
Error reporting. The error reporting service provides simple crash reporting for deployed software.
To obtain richer data, use the reporting and analysis features that are included with a Microsoft Azure
subscription.
Azure Cloud shell and Mobile Apps
Azure Cloud Shell is another Microsoft service that gives you a Bash or PowerShell console without
leaving your browser. Since the service is browser-based, there is no local setup to maintain for either
shell. Azure Cloud Shell is integrated directly with the cloud: if your only focus is the console, you
do not need to worry about the underlying infrastructure. The goal of Azure Cloud Shell is to let you
develop and manage Azure resources in a friendlier environment. The service offers a pre-configured,
browser-accessible shell experience for managing Azure resources without the additional cost of machine
maintenance, versioning, and installation. And since the whole idea is to provide interactive sessions,
the machine, which is allocated on a per-request basis, automatically terminates the session if it is
left idle for 20 minutes. The latest upgrades enable Azure Cloud Shell to run on Ubuntu 16.04 LTS.
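The per-request, idle-timeout behavior can be illustrated with a minimal sketch. The `Session` class and the scaled-down limit are illustrative assumptions, not a Cloud Shell API; Cloud Shell's actual limit is 20 minutes.

```python
import time

IDLE_LIMIT = 0.05  # seconds, scaled down for the sketch; Cloud Shell uses 20 minutes

class Session:
    """Tracks last activity and terminates once the idle limit is exceeded."""
    def __init__(self):
        self._last_activity = time.monotonic()
        self.active = True

    def run_command(self, cmd: str) -> None:
        if not self.active:
            raise RuntimeError("session terminated; start a new one")
        self._last_activity = time.monotonic()  # any command resets the idle clock

    def check_idle(self) -> None:
        if time.monotonic() - self._last_activity > IDLE_LIMIT:
            self.active = False  # machine is reclaimed after the idle limit

s = Session()
s.run_command("az account show")
time.sleep(0.1)      # exceed the (scaled-down) idle limit
s.check_idle()
assert not s.active  # session was terminated for inactivity
```

A terminated session costs you nothing further; reconnecting simply allocates a fresh machine with your persisted file share reattached.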
Getting Started with Azure Cloud Shell
The Cloud Shell service can be used within the Azure Container service, depending on your subscription
type. Not every subscriber has to pay for the storage account separately; if your subscription allows,
it can be created and associated with the current package.
Also, the storage account is tied to Cloud Shell and can be used right away. The container is mounted
under the PowerShell user profile. In short, Azure Cloud Shell is your Microsoft-managed admin
machine in Azure, which enables you to:
Get authenticated shell access to Azure from virtually anywhere
Use common programming languages and tools in a shell that's maintained and updated by
Microsoft
Persist your files across sessions in Azure Files
With Azure, you have the flexibility to choose the shell experience that best matches the way you
work. Both PowerShell and Bash experiences are available.
Microsoft Azure Cloud Shell Important Features
Here are the most important features of Azure Cloud Shell:
Automatic Authentication for Improved Security
Cloud Shell automatically and securely authenticates account access for PowerShell and the Azure CLI.
In addition, an interactive session terminates after 20 minutes of shell inactivity, which further
improves security.
Persistence Across Sessions
To persist your files across sessions, Cloud Shell walks you through attaching an Azure file share on
first launch. After that, Cloud Shell automatically attaches your storage and persists it for all future
sessions. Your home directory is saved as a .img file in your Azure file share. Files outside of your
home directory and machine state are not persisted across sessions. Refer to the Cloud Shell best
practices for storing secrets such as SSH keys.
Virtual Access from Anywhere
The service lets you connect to Azure using a browser-based, authenticated shell experience that is
hosted in the cloud and accessible from anywhere. Each user is automatically assigned a unique user
account, which is authenticated for every session for increased security. You can enjoy a modern CLI
experience from multiple access points, including the Azure mobile app, shell.azure.com, Azure docs
(such as the Azure PowerShell and Azure CLI documentation), the Azure portal, and the VS Code Azure
Account extension.
Common Programming Languages and Tools
Like any other Microsoft component, Cloud Shell is regularly updated and maintained by the platform.
The browser-based service comes with common CLI tools, including PowerShell modules, Linux shell
interpreters, source control, text editors, Azure tools, container tools, build tools, database tools,
and many more. Cloud Shell also supports a number of programming languages; the most popular ones
include Python, .NET, and Node.js.
Azure Drive
Cloud Shell in PowerShell starts in the Azure drive. This enables you to navigate through the entire
range of Azure resources, including Storage, Network, and Compute, with discovery and navigation that
work like filesystem navigation. Regardless of which drive you are in, you can always manage these
resources using the Azure PowerShell cmdlets. Whatever changes you make to the Azure resources are
reflected in the drive; to refresh the resources, run dir -Force.
Configured and Authenticated Azure Workstation
Cloud Shell is managed by Microsoft, which ensures the popular language support and command-line
tools mentioned earlier. Cloud Shell is also responsible for securely authenticating instant, automatic
access to your resources through the Azure CLI.
Seamless Deployment
One of the latest updates to Cloud Shell is a graphical text editor, based on the open-source Monaco
Editor. The editor enables you to create and edit files directly in the shell, which makes for seamless
and smooth deployment through Azure PowerShell or Azure CLI 2.0.
As far as pricing is concerned, the Cloud Shell machine hosting is free. However, a mounted Azure file
share is a prerequisite, so the regular storage costs apply for that share. The best way to get the
hang of Cloud Shell and use it to maximum benefit is through Azure training and certification, which
provide a more detailed understanding of the service.
Azure Mobile Apps
Azure Mobile Apps (also known as the Microsoft Datasync Framework) gives enterprise developers and
system integrators a mobile-application development platform that's highly scalable and globally
available. The framework provides your mobile app with:
Authentication
Data query
Offline data synchronization
Azure Mobile Apps is designed to work with Azure App Service. Since it's based on ASP.NET 6, it can
also be run as a container in Azure Container Apps or Azure Kubernetes Service.
Why Mobile Apps?
With the Mobile Apps SDKs, you can:
Build native and cross-platform apps: Build cloud-enabled apps for Android™, iOS, or Windows
using native SDKs.
Connect to your enterprise systems: Authenticate your users with Azure Active Directory, and
connect to enterprise data stores.
Build offline-ready apps with data sync: Make your mobile workforce more productive by building
apps that work offline. Use Azure Mobile Apps to sync data in the background.
Azure Mobile Apps features
The following features are important to cloud-enabled mobile development:
Authentication and authorization: Use Azure Mobile Apps to sign in users with social and
enterprise providers. Azure App Service supports Azure Active Directory, Facebook™, Google®,
Microsoft, Twitter®, and OpenID Connect®. Azure Mobile Apps supports any authentication
scheme that is supported by ASP.NET Core.
Data access: Azure Mobile Apps provides a mobile-friendly OData v4 data source that's linked to a
compatible database via Entity Framework Core. Any compatible database can be used including
Azure SQL, Azure Cosmos DB, or an on-premises Microsoft SQL Server.
Offline sync: Build robust and responsive mobile applications that operate on an offline dataset.
You can sync this dataset automatically with the service and handle conflicts with ease.
Client SDKs: There's a complete set of client SDKs that cover cross-platform development (.NET,
and Apache Cordova™). Each client SDK is available with an MIT license and is open-source.
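A client query against an OData v4 table endpoint like the one described under "Data access" might be sketched as follows. The endpoint URL, table name, and `build_query` helper are hypothetical illustrations of OData system query options, not the Azure Mobile Apps client SDK.

```python
from typing import Optional
from urllib.parse import urlencode

# Hypothetical Azure Mobile Apps table endpoint; OData v4 system query
# options ($filter, $orderby, $top, ...) are passed as URL parameters.
BASE_URL = "https://example.azurewebsites.net/tables/TodoItem"

def build_query(filter_expr: Optional[str] = None,
                orderby: Optional[str] = None,
                top: Optional[int] = None) -> str:
    """Builds an OData v4 query URL from the given system query options."""
    options = {}
    if filter_expr:
        options["$filter"] = filter_expr
    if orderby:
        options["$orderby"] = orderby
    if top is not None:
        options["$top"] = str(top)
    return BASE_URL + ("?" + urlencode(options) if options else "")

url = build_query(filter_expr="complete eq false", orderby="updatedAt desc", top=10)
assert url.startswith(BASE_URL) and "%24filter=complete+eq+false" in url
```

On the server, the OData source translates these options into queries against the Entity Framework Core model, so the mobile client only ever deals with URLs and JSON.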
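The conflict handling mentioned under "Offline sync" is commonly built on optimistic concurrency: the server rejects a write based on a stale version, and the client resolves and retries. The sketch below uses a simple version counter; the `SyncTable` and `ConflictError` names are illustrative assumptions, not the actual SDK API.

```python
from dataclasses import dataclass

@dataclass
class Record:
    id: str
    text: str
    version: int  # the server bumps this on every accepted write

class ConflictError(Exception):
    """Raised when a client pushes a write based on a stale version."""
    def __init__(self, server_record: "Record"):
        self.server_record = server_record

class SyncTable:
    """Minimal server table that rejects stale writes (optimistic concurrency)."""
    def __init__(self):
        self._rows = {}

    def push(self, rec: Record) -> Record:
        current = self._rows.get(rec.id)
        if current and current.version != rec.version:
            raise ConflictError(current)  # client must resolve and retry
        stored = Record(rec.id, rec.text, rec.version + 1)
        self._rows[rec.id] = stored
        return stored

server = SyncTable()
server.push(Record("1", "buy milk", 0))       # first write succeeds (now version 1)
try:
    server.push(Record("1", "buy bread", 0))  # offline edit based on a stale version
except ConflictError as e:
    # "Client wins" resolution: retry the local text with the server's version.
    server.push(Record("1", "buy bread", e.server_record.version))
```

Other policies ("server wins", or merging field by field) fit the same shape; only the code inside the `except` branch changes.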
Azure App Service features
The following platform features are useful for mobile production sites:
Autoscaling: With App Service, you can quickly scale up or scale out to handle any incoming
customer load. Manually select the number and size of VMs, or set up autoscaling to scale your
service based on load or schedule.
Staging environments: App Service can run multiple versions of your site. You can perform A/B
testing and do in-place staging of a new mobile service.
Continuous deployment: App Service can integrate with common source control
management (SCM) systems, allowing you to easily deploy a new version of your mobile service.
Virtual networking: App Service can connect to on-premises resources by using a virtual network,
Azure ExpressRoute, or hybrid connections.
Isolated and dedicated environments: For securely running Azure App Service apps, you can run
App Service in a fully isolated and dedicated environment. This environment is ideal for application
workloads that require high scale, isolation, or secure network access.