| © Copyright 2015 Hitachi Consulting1
Enterprise Cloud Data Platforms
with Microsoft Azure & Cortana Analytical Suite
Khalid M. Salama, Ph.D.
Business Insights & Analytics
Hitachi Consulting UK
We Make it Happen. Better.
Outline
 Microsoft Azure and Cloud Computing
 Azure Data Architecture and Services
 EDW with Azure SQL Data Warehouse
 Big Data Storage and Analytics on Azure Data Lake and HDInsight
 Data Processing and Movements with Azure Data Factory
 Real-time Processing with Azure Event Hubs and Stream Analytics
 Data Science and Machine Learning on Microsoft Azure
 Data Discoverability with Azure Data Catalog
Microsoft Azure and Cloud Computing
Cloud Computing
Enabling the future of computing
• On-demand Self-service: Web interfaces that deliver capacity in seconds or minutes
• Broad Network Access: Supports many devices, such as mobile phones, laptops, tablets, and workstations
• Resource Pooling: Thousands to hundreds of thousands of servers are immediately usable
• Rapid Elasticity: On-demand or automatic (threshold-based) scale-up or scale-down
• Measured Service: Pay as you use, for what you use; improves operational metrics and cost accounting
• Innovation: Modern IT capabilities and services are delivered frequently
Cloud Computing on Microsoft Azure
Computing Models
Hybrid
Cloud Computing on Microsoft Azure
Computing Models - Examples
Infrastructure as a Service (IaaS) – VMs and Networking Services
(build and maintain the software infrastructure, develop apps, use apps)
Platform as a Service (PaaS) – Data and App Services
(develop apps, use apps)
Software as a Service (SaaS) – SharePoint Online, Office 365, Microsoft Dynamics, Visual Studio Online, Azure Cognitive Services, etc.
(use apps)
Cloud Computing on Microsoft Azure
So why the cloud?
Agility
 Configure and provision in minutes
 Automatic/on-demand scaling up/down
Economics
 Pay as you use (100% utilization)
 Elastic capacity
Innovation
 Connectivity
 Innovative capabilities and services
 New compute models
Azure Data Architecture and Services
Enterprise Data Platforms
Challenges of Enterprise Information Management
Azure data services, grouped by role in the platform:
 Big Data Processing: Hive/Pig, U-SQL, Spark, Storm
 Real-time Stream Processing: Stream Analytics, Event Hubs, Storm, Spark Streaming
 Data Science & Machine Learning: Azure Machine Learning, Microsoft R Server, Spark ML, Azure Cognitive Services
 NoSQL Data Stores: Table Storage, HBase, DocumentDB
 Data Discovery & Search: Azure Data Catalog, Azure Search
 Data Orchestration & Movement: Azure Data Factory
 High Performance Computing: Azure Batch
Processing stages between the layers: Extract, Load → Filter, Reform, Transpose → Aggregate/Calculate
Azure Enterprise Data Platform
Layers and relevance
Layers, from source to consumption: Data Lake → Data Warehouse → Information Marts → Analytical & Reporting Products
Aspect | Data Lake end | Reporting end
Data | Raw - Unstructured | Processed - Structured
Value | Low - Unknown value | High-value
Size | Big (Everything) | Small (Focused)
Access | Strict/Limited | Available/Wide
Query | Batch | Interactive
Users | Data Professionals | Information Workers
Approach | Data → Structure | Structure → Data
Azure Enterprise Data Platform
Business Agility & IT Governance
Layers: Data Lake, Data Warehouse, Information Marts, Analytical & Reporting Products
Data Analysts and Data Scientists: build Proofs of Concept and Proofs of Value
 Import new data to a sandbox area in the mart
 Build a showcase Power Pivot model
 Delve into the data lake for exploratory data analysis
 Spin up an experimental VM or Azure services
Data Engineers: validate, integrate, and automate with the enterprise-wide data layers
 Driven by new business definitions and requirements
Azure Data Lake Storage and Analytics
Data Lake – The Concept
What is it, and why is it needed?
What is a Data Lake?
A repository that holds raw data file extracts from all the enterprise source systems.
James Dixon, co-founder and CTO of Pentaho (Hitachi Data Systems), coined the term ‘Data Lake’.
Why a Data Lake?
Data Lake – The Concept
Data Processing
Data is stored in its native form (structured, semi-structured, and unstructured): text files, JSON, XML, logs, mainframe extracts, etc.
Big Data technologies, such as MapReduce, Pig, Hive, Spark, and U-SQL, are used to process the data in a procedural fashion (TEL):
 Transpose/reshape data into a tabular form
 Extract/filter valuable and relevant data elements
 Load prepared datasets into a relational EDW
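The Transpose, Extract, Load steps can be sketched in plain Python. This is only an illustration under stated assumptions: the JSON records, field names, and the pipe-delimited output are hypothetical, and a real lake job would run on Hive, Spark, or U-SQL rather than in a single process.

```python
import csv
import io
import json

# Hypothetical raw extract: one JSON document per line, as it might land in the lake.
RAW_LINES = [
    '{"id": 1, "type": "order", "payload": {"customer": "C1", "amount": 20.5}}',
    '{"id": 2, "type": "heartbeat", "payload": {}}',
    '{"id": 3, "type": "order", "payload": {"customer": "C2", "amount": 7.0}}',
]

def transpose(line):
    """Reshape one nested JSON document into a flat, tabular row."""
    doc = json.loads(line)
    payload = doc.get("payload", {})
    return {
        "id": doc["id"],
        "type": doc["type"],
        "customer": payload.get("customer"),
        "amount": payload.get("amount"),
    }

def extract(rows):
    """Keep only the rows and columns that are relevant downstream."""
    for row in rows:
        if row["type"] == "order":
            yield {"id": row["id"], "customer": row["customer"], "amount": row["amount"]}

def load(rows):
    """Write the prepared dataset as delimited text, ready for a relational load."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["id", "customer", "amount"], delimiter="|", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

prepared = load(extract(transpose(line) for line in RAW_LINES))
print(prepared)
```

The order of the steps is the point: the raw documents are reshaped first, filtered second, and only the prepared rows are written out for the EDW.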
Azure Data Lake
Azure Data Lake Services
Data Lake Storage
 Built for data files, rather than Blobs
 WebHDFS REST APIs for Hadoop integration
 Optimized for analytical workloads
 Unlimited Storage, petabyte files
Data Lake Analytics
 Big Data Jobs as a Service
 U-SQL
 Unit of Cost: running Job
HDInsight
 Hadoop Cluster as a Service
 MapReduce | Pig | Hive | Spark | HBase | Oozie | Storm
 Unit of Cost: running nodes
Azure Data Lake Analytics
U-SQL
 C# meets SQL (.NET data types, C# expressions + SQL set statements)
 Natively parallel
 Set-based and procedural processing
 Works with structured and unstructured data
 Extensible (user-defined functions, operators, and aggregate functions)
 Reuse of custom-built .NET assemblies
Azure HDInsight
Hadoop on Microsoft Cloud
[Figure: HDInsight stack. Storage: Windows Azure Blob Storage | Data Lake (DFS). Resource management: Yet Another Resource Negotiator (YARN). Workloads: Batch, In-Memory, Stream, SQL (Spark SQL), NoSQL, Machine Learning. Supporting services: Acquisition, Orchestration, Management. Applications sit on top.]
Azure SQL Data Warehouse
Azure SQL Data Warehouse
Enterprise Data Warehouse – why?
“That is great! Now we have a data lake that has all the data that we want. Let’s spin up a VM, build a database, and produce the reports we want.”
– Said Mr. Quick-Win from the business.
“No. The data lake has raw data with no business structure. High-value data needs to be loaded into a canonical relational form, where enterprise-wide business entities, calculations, and rules are defined. Building reporting products on top of the data lake will result in unreliable information silos without a single version of the truth.”
– Said Mr. Enterprise Data Architect from IS.
Azure SQL Data Warehouse
A Parallel DW on the cloud
An elastic, Massively Parallel Processing (MPP), cloud-based data warehouse as a service, with enterprise-class features
Suitable for
 Batch data processing workloads
 Consolidating high-value data into a single enterprise-wide structured store
 Modelling, transforming, and aggregating data
 Performing query analysis across large datasets
Not Suitable for
 Operational workloads
 High-frequency reads and writes
 Singleton operations (use an Information Mart instead)
 Row-by-row processing needs (use the Data Lake instead)
 High concurrency (use an Information Mart instead)
Azure SQL Data Warehouse
Massively Parallel Processing (MPP): scale-out distributed queries
Azure SQL Data Warehouse
MPP - Logical Overview
 Data is spread across 60 distributions
 Each distribution holds a portion of the data, so queries can be processed in parallel
 Storage is decoupled from compute
[Figure: 60 SQL distributions sitting on a shared Storage layer]
Storage
Azure SQL Data Warehouse
MPP - Logical Overview
SQL
SQL
SQL
SQL
SQL
SQL
SQL
SQL
SQL
SQL
SQL
SQL
SQL
SQL
SQL
SQL
SQL
SQL
SQL
SQL
SQL
SQL
SQL
SQL
SQL
SQL
SQL
SQL
SQL
SQL
SQL
SQL
SQL
SQL
SQL
SQL
SQL
SQL
SQL
SQL
SQL
SQL
SQL
SQL
SQL
SQL
SQL
SQL
SQL
SQL
SQL
SQL
SQL
SQL
SQL
SQL
SQL
SQL
SQL
SQL
Compute: a scale-out distributed query engine
 You get more compute capacity by increasing Data Warehouse Units (DWUs)
 Elastic: scale up/down as needed, and pay per usage (billed at the hourly level)
 Scaling takes minutes because compute is independent of the stored data, so no data redistribution is needed
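Why scaling needs no data redistribution can be sketched as follows. This is an illustration under assumptions, not the service's actual algorithm: CRC32 stands in for the internal hash function, and the node counts (2 and 6) and the round-robin attachment of distributions to nodes are hypothetical.

```python
import zlib

NUM_DISTRIBUTIONS = 60  # the fixed figure quoted on the slide

def distribution_of(key):
    """Map a distribution-key value to a distribution (CRC32 is a stand-in hash)."""
    return zlib.crc32(str(key).encode("utf-8")) % NUM_DISTRIBUTIONS

def assign_to_nodes(compute_nodes):
    """Attach distributions to compute nodes round-robin; only ownership changes."""
    return {d: d % compute_nodes for d in range(NUM_DISTRIBUTIONS)}

rows = [f"customer-{i}" for i in range(1000)]
placement_before = {row: distribution_of(row) for row in rows}

small = assign_to_nodes(2)   # hypothetical low DWU setting
large = assign_to_nodes(6)   # hypothetical higher DWU setting

# Scaling changed which node serves each distribution, not where each row lives.
placement_after = {row: distribution_of(row) for row in rows}
per_node_small = sum(1 for node in small.values() if node == 0)
per_node_large = sum(1 for node in large.values() if node == 0)
print(per_node_small, "distributions per node with 2 nodes")
print(per_node_large, "distributions per node with 6 nodes")
```

Because the row-to-distribution mapping never depends on the node count, adding nodes only reduces how many distributions each node has to serve.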
Azure SQL Data Warehouse
Data Warehouse Units
[Figure: capacity by DWU level (DW100, DW200, DW300, DW400, DW500, DW600, DW1000, DW1200, DW1500, DW2000). Series plotted: Readers, Writers, Connections, Queries, Slots, small rc (MB/dist), medium rc (MB/dist), and Max transaction size (GB).]
Azure SQL Data Warehouse
Distribution vs. Partitioning
Distribution
 Aim to avoid data skew in order to maximize balanced parallel processing: if most of your data sits in one distribution, little parallelism is gained.
 Aim to avoid expensive data shuffling across data nodes.
 Consider columns used in JOIN or GROUP BY as distribution keys to reduce shuffling.
 Consider columns with high cardinality as distribution keys to avoid skew.
 Round-robin distribution is the default, and a good compromise.
 Use replication for small tables (not yet available at the time of writing).
Partitioning
 Within a given data node, aim to minimize the amount of data a query has to consider, that is, aim for partition elimination.
 Consider columns used in WHERE clauses as partition keys.
Azure SQL Data Warehouse
Indexing
Options, ordered from better query performance to better load performance:
 Column Store Index + Partitioning
 Column Store Index: used in tables expecting aggregation queries
 Row Index + Partitioning
 Row Index: used in staging/data processing tables
 Heap: used in data landing tables
Azure SQL Data Warehouse
Data Model
 Captures the elementary source data structure, main business entities, and relationships
 Atomic, fine-grained, integrated, detail-oriented, historical, bi-temporal
 Provides a flexible, yet unified, data foundation that allows agile business rules to construct new data products and information models
Modelling approaches:
 Data Vault (Hubs, Links, Satellites): Dan Linstedt
 3rd Normal Form (3NF): Bill Inmon
 Dimensional Model (Facts, Dimensions): Ralph Kimball
Azure SQL Data Warehouse
PolyBase
[Figure: one Control node and four Compute nodes, with PolyBase connecting the Compute nodes to external storage]
 Allows bringing data into and out of Azure SQL DW / SQL Server from/to Azure Blob Storage / Data Lake
 Create External Table (schema-on-read): identify the file location and format
 Parallel data movement: direct connectivity between the compute nodes and the data files (a "back door")
 The best way to get data into and out of Azure SQL DW
 Hadoop-based technology (very similar to Hive)
Azure SQL Data Warehouse
PolyBase
-- Where the files live. The container, account, and path parts are placeholders.
CREATE EXTERNAL DATA SOURCE MyAzureDataSource
WITH
(   TYPE = HADOOP
,   LOCATION = 'wasbs://[container]@accountname.blob.core.windows.net/path'
);

-- How the files are encoded: pipe-delimited, gzip-compressed text.
CREATE EXTERNAL FILE FORMAT MyTextFileFormat
WITH
(   FORMAT_TYPE = DELIMITEDTEXT
,   FORMAT_OPTIONS
    (   FIELD_TERMINATOR = '|'
    ,   STRING_DELIMITER = ','
    ,   DATE_FORMAT = 'yyyy-MM-dd'
    ,   USE_TYPE_DEFAULT = TRUE
    )
,   DATA_COMPRESSION = 'org.apache.hadoop.io.compress.GzipCodec'
);

-- Schema-on-read table over the files; no data is moved until it is queried.
CREATE EXTERNAL TABLE [asb].[FactOnlineSales]
(   [ProductKey]    int   NOT NULL
,   [StoreKey]      int   NOT NULL
,   [DateKey]       int   NOT NULL
,   [CustomerKey]   int   NOT NULL
,   [PromotionKey]  int   NOT NULL
,   [SalesQuantity] int   NOT NULL
,   [UnitPrice]     money NOT NULL
,   [SalesAmount]   money NOT NULL
)
WITH
(   LOCATION = '/filepath_or_directory'  -- path relative to the data source root
,   DATA_SOURCE = MyAzureDataSource
,   FILE_FORMAT = MyTextFileFormat
,   REJECT_TYPE = VALUE
,   REJECT_VALUE = 0                     -- fail as soon as one row is rejected
    -- REJECT_SAMPLE_VALUE applies only with REJECT_TYPE = PERCENTAGE, so it is omitted
);
Azure SQL Data Warehouse
Data Processing
ELT: Extract the data from the source, Load the data, then perform the set-based Transformations as SQL procedures to benefit from MPP. No SSIS data flow task!
 Avoid singleton inserts, deletes, and updates
 Use Create Table As Select (CTAS) for processing data
 Create and update statistics
 Materialize external tables before processing
 Consider distribution, partitioning, and indexing
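The CTAS pattern can be tried locally with SQLite, which also supports CREATE TABLE ... AS SELECT. This is a stand-in only: SQLite is not an MPP engine and has no distribution or index options, and the table and column names below are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stg_sales (product_key INTEGER, quantity INTEGER, unit_price REAL)")
conn.executemany(
    "INSERT INTO stg_sales VALUES (?, ?, ?)",
    [(1, 2, 10.0), (1, 1, 10.0), (2, 5, 3.0)],
)

# Set-based transformation: one CTAS statement builds the target table in a single pass,
# instead of looping over rows and issuing singleton inserts or updates.
conn.execute(
    """
    CREATE TABLE fact_sales_by_product AS
    SELECT product_key,
           SUM(quantity)              AS total_quantity,
           SUM(quantity * unit_price) AS total_amount
    FROM stg_sales
    GROUP BY product_key
    """
)

result = conn.execute(
    "SELECT product_key, total_quantity, total_amount "
    "FROM fact_sales_by_product ORDER BY product_key"
).fetchall()
print(result)
```

On Azure SQL DW the same statement shape would additionally carry a WITH clause for distribution and indexing choices, which is where the advice above comes together.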
Information Marts
Information Marts
Role in the Enterprise Data Platform
 Subject-oriented
 Owned by business departments
 Contains data aggregations and calculations to answer pre-defined business questions
 Dimensional model or pre-canned datasets
 May contain a sandbox area for experimentation
 Built on top of the single version of the truth in the EDW
 Suitable for interactive query and high concurrency
 Azure SQL Database | SQL Server 2016 on an Azure VM | Analysis Services model
 ETL: data is processed on the MPP platform (Azure SQL DW) and then loaded into the mart
End-to-end Data Processing
Data Lake <Azure Data Lake | Blob Storage>
 Big Data Processing (TEL) <Data Lake Analytics | HDInsight>
Enterprise DW <Azure SQL DW>, loaded from the Data Lake via PolyBase
 Massively Parallel Processing (ELT) <PolyBase + T-SQL>
Information Mart <Azure SQL DB | SQL Server 2016>
 Populate (ETL) <Azure Data Factory | SSIS>
Azure Data Factory
Azure Data Factory
Cloud data processing & movement services
A managed Azure cloud service for building and operating data processing and data movement pipelines
 Scalable
 Reliable
 Pay as you use
 Compose, monitor, and schedule data pipelines
 Automatic cloud resource management
Azure Data Factory
Concepts & Components
Linked Service
Connection information for a data store or a compute resource
Dataset
A pointer to the data object to be used as an input or an output of an Activity
Pipeline
A logical grouping of Activities with a defined sequence and schedule
Activity
An action to perform on data. Input: 0-N dataset(s). Output: 1-N dataset(s)
Azure Data Factory
Features
Linked Service – Compute
 Azure HDInsight (on-demand)
 Azure Batch
 Azure Machine Learning
 Azure Data Lake Analytics
 Azure SQL DW (stored procedure)
 Azure SQL DB (stored procedure)
 SQL Server (stored procedure)
Linked Service – Data
 Azure Blob
 Azure Table
 Azure SQL Database
 Azure SQL Data Warehouse
 Azure DocumentDB
 Azure Data Lake Store
 SQL Server
 ODBC data sources
 Hadoop Distributed File System (HDFS)
Activities
 Data Movement (copy data from source to sink)
 Data Transformation: Hive, Pig, MapReduce, Machine Learning, Stored Procedure, Data Lake U-SQL, Custom .NET
Data Gateway: connect to an on-premises data source
Azure Data Factory
Example
Input Dataset
{
    "name": "AzureBlobInput",
    "properties": {
        "type": "AzureBlob",
        "linkedServiceName": "AzureStorageLinkedService1",
        "typeProperties": {
            "fileName": "input.log",
            "folderPath": "adfgetstarted/inputdata",
            "format": {
                "type": "TextFormat",
                "columnDelimiter": ","
            }
        },
        "availability": {
            "frequency": "Month",
            "interval": 1
        },
        "external": true,
        "policy": {}
    }
}
Output Dataset
{
    "name": "AzureBlobOutput",
    "properties": {
        "type": "AzureBlob",
        "linkedServiceName": "AzureStorageLinkedService1",
        "typeProperties": {
            "folderPath": "adfgetstarted/partitioneddata",
            "format": {
                "type": "TextFormat",
                "columnDelimiter": ","
            }
        },
        "availability": {
            "frequency": "Month",
            "interval": 1
        }
    }
}
Pipeline
{
    "name": "MyFirstPipeline",
    "properties": {
        "description": "My first Azure Data Factory pipeline",
        "activities": [
            {
                "type": "HDInsightHive",
                "typeProperties": {
                    "scriptPath": "adfgetstarted/script/partitionweblogs.hql",
                    "scriptLinkedService": "AzureStorageLinkedService1",
                    "defines": {
                        "inputtable": "<storageaccounturl>",
                        "partitionedtable": "<folderurl>"
                    }
                },
                "inputs": [
                    { "name": "AzureBlobInput" }
                ],
                "outputs": [
                    { "name": "AzureBlobOutput" }
                ],
                "policy": {
                    "concurrency": 1,
                    "retry": 3
                },
                "scheduler": {
                    "frequency": "Month",
                    "interval": 1
                },
                "name": "RunSampleHiveActivity",
                "linkedServiceName": "HDInsightOnDemandLinkedService"
            }
        ],
        "start": "2016-04-01T00:00:00Z",
        "end": "2016-04-02T00:00:00Z",
        "isPaused": false
    }
}
Linked Service
{
    "name": "HDInsightOnDemandLinkedService",
    "properties": {
        "type": "HDInsightOnDemand",
        "typeProperties": {
            "version": "3.2",
            "clusterSize": 1,
            "timeToLive": "00:30:00",
            "linkedServiceName": "AzureStorageLinkedService1"
        }
    }
}
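How these definitions refer to one another by name can be checked with a small script. This is a sketch under assumptions: the wrapper keys datasets, linkedServices, and pipeline are hypothetical containers invented for the example (Data Factory deploys each definition separately), the definitions are trimmed to the fields the check needs, and AzureStorageLinkedService1 is assumed to be defined elsewhere.

```python
import json

# Trimmed copies of the definitions in the example; wrapper keys are hypothetical.
DEFINITIONS = json.loads("""
{
  "datasets": [
    {"name": "AzureBlobInput",  "properties": {"linkedServiceName": "AzureStorageLinkedService1"}},
    {"name": "AzureBlobOutput", "properties": {"linkedServiceName": "AzureStorageLinkedService1"}}
  ],
  "linkedServices": [
    {"name": "AzureStorageLinkedService1"},
    {"name": "HDInsightOnDemandLinkedService"}
  ],
  "pipeline": {
    "name": "MyFirstPipeline",
    "properties": {"activities": [{
      "name": "RunSampleHiveActivity",
      "inputs": [{"name": "AzureBlobInput"}],
      "outputs": [{"name": "AzureBlobOutput"}],
      "linkedServiceName": "HDInsightOnDemandLinkedService"
    }]}
  }
}
""")

def unresolved_references(defs):
    """List names that a dataset or activity refers to but that are not defined."""
    datasets = {d["name"] for d in defs["datasets"]}
    services = {s["name"] for s in defs["linkedServices"]}
    missing = []
    for dataset in defs["datasets"]:
        service = dataset["properties"]["linkedServiceName"]
        if service not in services:
            missing.append(service)
    for activity in defs["pipeline"]["properties"]["activities"]:
        for ref in activity["inputs"] + activity["outputs"]:
            if ref["name"] not in datasets:
                missing.append(ref["name"])
        if activity["linkedServiceName"] not in services:
            missing.append(activity["linkedServiceName"])
    return missing

print(unresolved_references(DEFINITIONS))
```

The check mirrors the concept model: an Activity names its input and output Datasets and a compute Linked Service, and each Dataset names the data Linked Service it lives on.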
Azure NoSQL Data Stores
What is NoSQL?
Key attributes:
 Non-relational
 Distributed
 Fault-tolerant
 Semi-structured
 Scalable
 Non-transactional
 Complements the RDBMS
 Flexible schema
 Random, real-time read/write access
Why NoSQL?
Suitability
Suitable for
 Reference data
 Variable data structures
 Singleton select/insert/update
 Random, real-time read/write access
Not suitable for
 Batch processing
 Complex analytical queries
 Joins
 Complex transactions
NoSQL Data Stores
Categories and breeds
 Memory Cache Store
 Key/Value Store
 Column Family Store
 Document Store
 Graph Store
 Others
Azure NoSQL Technologies
Store type | Characteristics | Azure technology
Key/Value Store | Flexible data structure; dictionary/lookup; value can be anything | Azure Table Storage
Column Family Store | Column-oriented access; Big Data with real-time random read/write access; extensible | HBase on Azure HDInsight
Document Store | Query-able data; objects (complex structure) in JSON, XML, etc.; CRUD apps | Azure DocumentDB
Graph Store | Social networks; fraud detection; relationship-heavy data | Microsoft Graph Engine (Trinity)
Memory Cache Store | Non-durable data; fast access; hot data | Azure Redis Cache
NoSQL Data Stores
Categories and breeds
[Figure: Memory Cache, Key/Value, Column Family, Document, and Graph (specialized) stores positioned against the RDBMS along four axes: Size, Latency, Query Complexity/Flexibility, and Defined Structure]
NoSQL Usage Patterns
A common scenario…
[Figure: Online Apps read from and write to the RDBMS (hot path); batch ETL moves data from the RDBMS to the Data Warehouse and writes results from the Data Warehouse to the NoSQL store (cold path); Online Apps read from the NoSQL store]
 Data is extracted, transformed, and loaded from OLTP to the EDW
 Aggregations, KPIs, and scores are computed via batch processing
 Results are populated to a NoSQL data store for reference use in apps
 App hot read, ETL cold write
 Document and Graph stores
E.g., Single Customer View:
 Customer matching, customer KPIs, segment assignment, and propensity scoring are performed as batch processing in the DW
 The output goes to NoSQL to be used for real-time recommendation, campaigning, targeted advertising, etc.
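The split between the cold write and the hot read can be sketched as below. Everything here is hypothetical: the order rows, the spend threshold of 200, the offer names, and the plain dict that stands in for a NoSQL key/value or document store.

```python
from collections import defaultdict

# Hypothetical warehouse rows: (customer_id, order_amount)
WAREHOUSE_ORDERS = [("C1", 120.0), ("C2", 15.0), ("C1", 80.0), ("C3", 300.0)]

def batch_score(orders):
    """Cold path: aggregate per customer and assign a segment in one batch pass."""
    totals = defaultdict(float)
    for customer_id, amount in orders:
        totals[customer_id] += amount
    return {
        customer_id: {"total_spend": total, "segment": "high" if total >= 200 else "standard"}
        for customer_id, total in totals.items()
    }

# A plain dict stands in for the NoSQL store that the batch job populates.
customer_view = {}
customer_view.update(batch_score(WAREHOUSE_ORDERS))  # ETL cold write

def recommend(customer_id):
    """Hot path: a single keyed read at request time, with no joins or aggregation."""
    profile = customer_view.get(customer_id)
    if profile is None:
        return "default-offer"
    return "premium-offer" if profile["segment"] == "high" else "standard-offer"

print(recommend("C1"), recommend("C2"), recommend("C9"))
```

The expensive work (aggregation and scoring) happens once per batch, so the app only ever performs the singleton lookup that NoSQL stores are suited to.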
Real-time Stream Processing on Microsoft Azure
What is Event & Stream Processing?
Data at rest vs. data in motion
Traditional – working with data at rest: data is bulk-loaded and batch-processed into a data store; you submit a query and get results.
Real-time – working with data in motion: a continuous data stream flows through continuous processing and queries, enriched with static reference data, and produces real-time continuous results, actions, and data archiving.
Lambda Architecture
The speed layer and stream processing
Hot Path
Cold Path
Events & Stream Processing Architecture
Azure Tools & Technologies
 Event Triggers: Applications, Web and social, Devices, Sensors
 Data Stream Collection (Ingress) and Message Queuing: Apache Kafka, Azure Event Hub, Azure Service Bus, Azure IoT Hub
 Stream Processing: Azure Stream Analytics, Storm on HDInsight, Spark Streaming on HDInsight, Azure ML
 Reference Data
 Storage and Batch Analysis: HDFS, Azure SQL DB/DW
 Presentation and Action: Power BI live dashboards, apps and devices to take actions
Events & Stream Processing Architecture
Data Sources
 Devices, websites, and apps that continuously produce data streams
Data Collection
 Listen to, collect, and transfer inbound events
Message Queuing
 Decouples data consumers from data producers
 Reliable, distributed, fault-tolerant, high-throughput, short-term storage
Stream Processing
 Aggregate / filter / join incoming event streams
 Temporal engine for analysing data across time-series windows
Reference Data
 High-throughput, random-access data store to support processing
 Usually NoSQL data stores
Storage
 Store processed/aggregated/filtered data (SQL/NoSQL)
 Consolidate and store raw data into files for batch analysis (DFS)
Presentation
 Rich interactive visualizations for real-time data analysis
 Application integration for process automation
Azure Stream Analytics
Fully-managed real-time processing
• Intake millions of events per second
• Processing on continuous streams of data
• Reference data lookup
• Output to live dashboards and data stores
Mission Critical Reliability
• Guaranteed event delivery
• Preserves event order on a per-device basis
• Guaranteed business continuity
• Auto-recovery from failures
No challenges with Scale
• Elasticity for scale up or scale down
• Distributed, scale-out architecture
• Pay only for the resources you use
Rapid Development & Deployment
• SQL-like Language
• Built-in temporal semantics
• Up and running in a few clicks
• Scheduling and Monitoring
A PaaS real-time complex event processing (CEP) on Microsoft Azure
Azure Stream Analytics
Stages: Data Source → Ingest/Queue → Process → Deliver → Consume
Event Inputs: Event Hub, Azure Blob, DocumentDB (coming soon)
Reference Data: Azure Blob, HBase (coming soon)
Transform: Temporal joins, Filter, Aggregates, Projections, Windows, REST APIs (coming soon)
Enrich: Azure ML
Outputs: SQL Azure, Azure Blobs, Event Hub, Service Bus Queue, Service Bus Topics, Table storage, DocumentDB, Power BI
Azure Storage
 Distributed
 Low latency
 High throughput
 Scalable and reliable
 Low cost
Consume: Power BI Dashboard
Azure Stream Analytics
Stream Analytics Query
Stream Analytics Query Language
Windowing Functions
 In data streams, a common requirement is to aggregate (max, min, sum, count, etc.) over the messages that arrive within a specified period of time (a window), in order to detect events.
 Each GROUP BY requires a windowing function
 Each window operation outputs a single event at the end of the window
 All windows have a fixed length
 Tumbling window: aggregate per fixed, non-overlapping time interval
 Hopping window: scheduled overlapping windows
 Sliding window: windows are constantly re-evaluated
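The three window types can be illustrated over a small in-memory event list. This is a simplified sketch, not the Stream Analytics engine: the events are hypothetical, tumbling and hopping windows are treated as half-open intervals [start, end), hopping windows with a negative start are not emitted, and the sliding window is evaluated only when an event arrives.

```python
from collections import defaultdict

# Hypothetical events: (timestamp in seconds, reading)
EVENTS = [(1, 10), (4, 20), (11, 5), (14, 7), (19, 1), (23, 8)]

def tumbling(events, size):
    """Non-overlapping windows: each event falls into exactly one window."""
    sums = defaultdict(int)
    for ts, value in events:
        start = (ts // size) * size
        sums[(start, start + size)] += value
    return dict(sorted(sums.items()))

def hopping(events, size, hop):
    """Overlapping windows that start every `hop` seconds and last `size` seconds."""
    sums = defaultdict(int)
    for ts, value in events:
        first = max(0, ((ts - size) // hop + 1) * hop)  # earliest window containing ts
        for start in range(first, ts + 1, hop):
            sums[(start, start + size)] += value
    return dict(sorted(sums.items()))

def sliding(events, size):
    """One result per event: the sum over the `size` seconds ending at that event."""
    return [(ts, sum(v for t, v in events if ts - size < t <= ts)) for ts, _ in events]

tumbling_sums = tumbling(EVENTS, size=10)
hopping_sums = hopping(EVENTS, size=10, hop=5)
sliding_sums = sliding(EVENTS, size=10)
print(tumbling_sums)
print(hopping_sums)
print(sliding_sums)
```

With a hop equal to half the window size, every event contributes to two hopping windows, which is why the hopping sums add up to more than the tumbling ones.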
Data Science and Machine Learning on Microsoft Azure
Data Science and Machine Learning
What?
[Figure: Data Mining at the intersection of Machine Learning, Statistics, Artificial Intelligence, Databases, and Other Technologies]
“Data mining, an interdisciplinary subfield of computer science, is the computational process of automatically discovering patterns in large data sets”
Other Related Technologies:
 Visualization
 Big Data
 High Performance Computing
 Cloud Computing
 Others
Data Science and Machine Learning
Why?
Machine learning and predictive analytics are core capabilities that are needed throughout your business:
 Vision analytics
 Recommendation engines
 Advertising analysis
 Weather forecasting for business planning
 Social network analysis
 Legal discovery and document archiving
 Pricing analysis
 Fraud detection
 Churn analysis
 Predictive maintenance
 Location-based tracking and services
 Personalized insurance
Data Science and Machine Learning
How?
Cross Industry Standard Process for Data Mining (CRISP-DM)
[Figure: CRISP-DM cycle around the Data, starting at Understanding the Business → Understanding the Data → Preparing the Data → Modelling → Evaluation, Interpretation, Communication → Deployment]
Data Science and Machine Learning
How?
 Classification Learning: build a model that can predict the target class of an input case
 Regression Modeling: build a model that can estimate the response value given an input case
 Cluster Analysis: discover natural groupings within the data points
 Association Rule Discovery: extract frequent patterns present in the data
 Time Series Analysis: analyse temporal data to forecast future values
 Probabilistic Modeling & Estimation: compute the probability of an event occurring
 Similarity Analysis: identify cases similar to a given input case
 Collaborative Filtering: filter information using techniques that involve collaboration among multiple viewpoints
[Figure: example classification rule list of the form IF .. AND .. AND .. THEN A, ELSE IF .. AND .. THEN C, ELSE IF .. AND .. THEN B, .., ELSE C]
Data Science and Machine Learning
Microsoft tools and technologies
 Data Mining – SQL Server Analysis Services
 Azure Machine Learning
 Spark ML – Azure HDInsight
 Microsoft R Server
 Azure Cognitive Services
Data Science and Machine Learning
Microsoft Analysis Services
 SQL Server Analysis Services (SQL Server on a VM; no PaaS offering)
 Processes data from many OLE DB and ODBC data sources
 Easy to build, interpret, deploy, and productionize
 Limited extensibility
 Custom APIs can be built to use the constructed models (via DMX)
 Limited pre-processing functionality
[Figure: Azure SQL DW/DB feeds SQL Server Analysis Services to build and retrain the model and to run batch scoring; users explore and interpret the model; online apps send DMX queries and receive results]
Data Science and Machine Learning
Microsoft Analysis Services
Explore/ Interpret Model
Data Source View Mining Structure
Mining Algorithm Mining Model
 Decision Tress
 Naïve-Bayes
 Linear Regression
 Neural Networks
 Association
 Clustering
 Sequence Clustering
 Time Series
Data Science and Machine Learning
Spark ML on HDInsight
 HDInsight-based ML services provided by Spark
 Spark MLlib and Spark ML functionality
 Stream mining functionality
 Imports data from HDFS (Blob Storage/Data Lake)
 Scalable – Highly Distributed
 Extensible – Python and R
 Suitable for Big Data - Batch Model Training and Data Scoring
[Diagram labels: Learning Data, Build and Save Model, Batch Scoring, Retrain Model]
Data Science and Machine Learning
Spark ML on HDInsight – Pipelines
Spark ML standardizes APIs for machine learning algorithms to make it easier to combine
multiple algorithms into a single pipeline, or workflow.
 Transformers – Used for data pre-processing. Input: DataFrame - Output: DataFrame
 Estimators – ML algorithms used to build a predictive model. Input: DataFrame - Output: Model
 Parameters – Configurations for Transformers and Estimators
 Pipeline – Chains Transformers and Estimators
[Diagram: ML Pipeline. Dataset (DataFrame) → Transformer A (pre-processing) → … → Transformer Z (pre-processing) → Estimator (ML algorithm) → Model → Evaluation, with Parameters configuring the stages]
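The Transformer / Estimator / Pipeline contract described on this slide can be sketched in plain Python. This is a toy illustration of the pattern only, not the Spark ML API: the classes `FixedRangeScaler`, `ThresholdEstimator`, `ThresholdModel`, `Pipeline`, and `PipelineModel` below are hypothetical stand-ins, and a list of dicts plays the role of a DataFrame.

```python
from statistics import mean
from typing import Dict, List

Row = Dict[str, float]
Frame = List[Row]  # toy stand-in for a Spark DataFrame


class FixedRangeScaler:
    """Transformer: Frame in, Frame out (rescales one column)."""

    def __init__(self, column: str, low: float, high: float) -> None:
        self.column, self.low, self.high = column, low, high  # parameters

    def transform(self, frame: Frame) -> Frame:
        span = self.high - self.low
        return [{**row, self.column: (row[self.column] - self.low) / span}
                for row in frame]


class ThresholdModel:
    """Model: a fitted Transformer that appends a prediction column."""

    def __init__(self, feature: str, threshold: float) -> None:
        self.feature, self.threshold = feature, threshold

    def transform(self, frame: Frame) -> Frame:
        return [{**row,
                 "prediction": 1.0 if row[self.feature] > self.threshold else 0.0}
                for row in frame]


class ThresholdEstimator:
    """Estimator: Frame in, Model out."""

    def __init__(self, feature: str, label: str) -> None:
        self.feature, self.label = feature, label

    def fit(self, frame: Frame) -> ThresholdModel:
        positives = mean(r[self.feature] for r in frame if r[self.label] == 1.0)
        negatives = mean(r[self.feature] for r in frame if r[self.label] == 0.0)
        return ThresholdModel(self.feature, (positives + negatives) / 2)


class PipelineModel:
    """Fitted pipeline: applies every fitted stage in order."""

    def __init__(self, stages: list) -> None:
        self.stages = stages

    def transform(self, frame: Frame) -> Frame:
        for stage in self.stages:
            frame = stage.transform(frame)
        return frame


class Pipeline:
    """Chains Transformers and Estimators into one workflow."""

    def __init__(self, stages: list) -> None:
        self.stages = stages

    def fit(self, frame: Frame) -> PipelineModel:
        fitted = []
        for stage in self.stages:
            if hasattr(stage, "fit"):  # Estimator: learn a Model first
                stage = stage.fit(frame)
            frame = stage.transform(frame)
            fitted.append(stage)
        return PipelineModel(fitted)


train = [{"x": 0.0, "label": 0.0}, {"x": 2.0, "label": 0.0},
         {"x": 8.0, "label": 1.0}, {"x": 10.0, "label": 1.0}]
model = Pipeline([FixedRangeScaler("x", 0.0, 10.0),
                  ThresholdEstimator("x", "label")]).fit(train)
scored = model.transform([{"x": 7.0}, {"x": 3.0}])
print([row["prediction"] for row in scored])
```

The point of the design is that fitting the pipeline yields a single object that replays the same pre-processing on new data before scoring, so training and scoring cannot drift apart.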
Data Science and Machine Learning
Spark ML on HDInsight - functionality
Transformers
Text Feature Extraction
 TF-IDF (HashingTF and IDF)
 Word2Vec
 CountVectorizer
 Tokenizer
 StopWordsRemover
 n-gram
Feature Selection
 VectorSlicer
 RFormula
 ChiSqSelector
Dimensionality Reduction
 PCA
Features Vector Preparation
 VectorAssembler
 VectorIndexer
 StringIndexer
 IndexToString
Feature Type Conversion
 Binarizer
 Discrete Cosine Transform (DCT)
 OneHotEncoder
 Bucketizer
 QuantileDiscretizer
Feature Scaling
 Normalizer
 StandardScaler
 MinMaxScaler
Feature Construction
 SQLTransformer
 ElementwiseProduct
 PolynomialExpansion
Estimators (supervised)
Classification
 Decision Trees – Ensembles
 Naïve-Bayes
 SVM
Regression
 Linear Regression
 SVM
Other (Unsupervised)
Clustering
Collaborative Filtering
Frequent Pattern Mining
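As an illustration of the Feature Scaling group above, the sketch below shows the arithmetic that scalers of this kind typically perform, written in plain Python rather than with Spark. The function names are hypothetical, and the handling of a constant column and the use of the sample standard deviation are assumptions of this sketch. Note the difference in direction: min-max and standard scaling work on one feature column across all cases, whereas a normalizer rescales one case's feature vector.

```python
from math import sqrt
from typing import List


def min_max_scale(column: List[float], new_min: float = 0.0,
                  new_max: float = 1.0) -> List[float]:
    """Linearly map a feature column onto [new_min, new_max]."""
    low, high = min(column), max(column)
    if high == low:  # constant column: assumption, map to the midpoint
        return [(new_min + new_max) / 2 for _ in column]
    scale = (new_max - new_min) / (high - low)
    return [(v - low) * scale + new_min for v in column]


def standard_scale(column: List[float]) -> List[float]:
    """Centre a feature column on zero with unit sample standard deviation."""
    n = len(column)
    mu = sum(column) / n
    sd = sqrt(sum((v - mu) ** 2 for v in column) / (n - 1))
    return [(v - mu) / sd for v in column]


def l2_normalize(vector: List[float]) -> List[float]:
    """Rescale one case's feature vector to unit Euclidean length."""
    norm = sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector]


print(min_max_scale([2.0, 4.0, 6.0]))
print(standard_scale([1.0, 2.0, 3.0]))
print(l2_normalize([3.0, 4.0]))
```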
Data Science and Machine Learning
Azure Machine Learning
 Cloud-based Machine Learning Services
 Imports data from everywhere
 Easy to develop and productionize – Web Services
 Extensible via R and Python scripts
 Limited in exploration and model interpretation
 Rich pre-processing functionality
[Diagram labels: Azure Machine Learning (build and deploy models in the cloud), Import Data, Publish, Web Services, Input, Result, Batch Scoring, Retrain Model]
Data Science and Machine Learning
Azure Machine Learning – algorithms cheat sheet
Data Science and Machine Learning
Azure Machine Learning – ML Studio
Data Science and Machine Learning
Microsoft R Server
Microsoft R Open (MRO)
 Based on the latest open-source R (3.2.2); built, tested, and distributed by Microsoft
 Enhanced by the Intel Math Kernel Library (MKL) to speed up linear algebra functions
 More efficient, multi-threaded math computation
 Revolution Analytics open-source R packages (ParallelR, RHadoop, AzureML, etc.)
 Compatible with all R-related software
Data Science and Machine Learning
Microsoft R Server
               | CRAN                               | MRO                                | MRS
Data size      | In-memory                          | In-memory                          | In-memory & disk
Efficiency     | Single-threaded                    | Multi-threaded                     | Multi-threaded, parallel processing across 1:N servers
Support        | Community                          | Community                          | Community + commercial
Functionality  | 7500+ innovative analytic packages | 7500+ innovative analytic packages | 7500+ innovative packages + commercial parallel high-speed functions
Licence        | Open source                        | Open source                        | Commercial licence
Data Science and Machine Learning
Microsoft R Server
 Powerful and scalable
 Installed on Windows, Linux, or as R Services in SQL Server 2016
 Works in different compute contexts – local, Hadoop, Spark, or SQL Server 2016
 Very suitable for interactive Data Science activities
 Models are deployed to SQL Server 2016, AzureML, or a Web Service on R Server (DeployR)
[Diagram labels: Import Data, Explore/Build (RStudio | RTVS), MSR Server (XDS), Deploy, SQL Server 2016, sp_execute_external_script, Online Apps, Result, Batch Scoring, Retrain Model]
Data Science and Machine Learning
Microsoft R Server – ScaleR functionality
Data Science and Machine Learning
Azure Cognitive Services - SaaS
Azure Data Catalog
Azure Data Catalog
Enterprise-wide metadata catalogue for data asset discovery
A cloud-based service in which data sources can be registered, described, and discovered by users across the enterprise
 Registration of central data sources
 Self-service business intelligence
 Capturing tribal knowledge
Discover
 Search
 Browse
 Filter
Understand
 Metadata
 Expert
 Description
Consume
 Your data
 Your Tools
 Your Way
Contribute
 Tag
 Document
 Publish
Register
 Data Sources
 Data Assets
 Locations
Azure Data Catalog
Terminology
 Catalog – Metadata repository in which data sources and data assets can be registered
 Data Source – Container that manages data assets (database server, Blob Storage account, etc.)
 Data Asset – Object within a data source that can be registered in the catalog (database table/view, Blob Storage file, etc.)
 Data Asset Location – Location of a data source or data asset, which a client application can use to connect to the source
 Structural Metadata – Metadata extracted from a data source that describes the structure of a data asset (names, data types, etc.)
 Descriptive Metadata – Describes the purpose and relevance of the data asset (free-form description, tags, documentation, etc.)
 Preview – Snapshot of up to 20 records that can be extracted from the data source during registration and stored in the catalog
 Profile – Snapshot of table-level and column-level metadata about the content of a registered data asset
 Owner – User with additional privileges for managing a data asset
 Expert – User identified as having an informed perspective on a data asset
 Glossary – Terms, parent terms, and definitions
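To show how these terms relate to one another, here is a small data-model sketch in Python. It is an illustration only and not the Azure Data Catalog API: the class and field names, the sample server and table, and the `search` helper are all hypothetical; the only figure taken from the slide is the 20-record preview limit.

```python
from dataclasses import dataclass, field
from typing import Dict, List

MAX_PREVIEW_RECORDS = 20  # the slide states a preview holds up to 20 records


@dataclass
class DataAsset:
    name: str
    location: str                                             # data asset location
    columns: Dict[str, str] = field(default_factory=dict)     # structural metadata
    description: str = ""                                      # descriptive metadata
    tags: List[str] = field(default_factory=list)              # descriptive metadata
    experts: List[str] = field(default_factory=list)
    owners: List[str] = field(default_factory=list)
    preview: List[dict] = field(default_factory=list)

    def set_preview(self, records: List[dict]) -> None:
        """Keep only a small snapshot of records, never the full data."""
        self.preview = list(records)[:MAX_PREVIEW_RECORDS]


@dataclass
class DataSource:
    name: str
    location: str
    assets: List[DataAsset] = field(default_factory=list)


@dataclass
class Catalog:
    sources: List[DataSource] = field(default_factory=list)

    def register(self, source: DataSource) -> None:
        self.sources.append(source)

    def search(self, term: str) -> List[DataAsset]:
        """Discover assets by name, description, or tag."""
        term = term.lower()
        return [asset
                for source in self.sources
                for asset in source.assets
                if term in asset.name.lower()
                or term in asset.description.lower()
                or term in (tag.lower() for tag in asset.tags)]


# Hypothetical registration of one database table.
sales = DataAsset(name="dbo.Sales", location="sqlserver01/Retail/dbo.Sales",
                  columns={"OrderId": "int", "Amount": "decimal"},
                  description="Daily sales transactions", tags=["finance"],
                  experts=["analyst@example.com"])
sales.set_preview([{"OrderId": i, "Amount": 1.0} for i in range(50)])

catalog = Catalog()
catalog.register(DataSource(name="Retail", location="sqlserver01", assets=[sales]))
print([asset.name for asset in catalog.search("finance")])
```

The catalog stores metadata and a bounded preview only; the data itself stays in the source, which is why each asset carries a location that client tools can connect to.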
Azure Data Catalog
Preview
Cortana Analytical Suite
https://gallery.cortanaintelligence.com
Acknowledgement
This guy is awesome
Thanks to Paul Lineham, the best big data architect I have worked with, for
supplying me with practical ideas and knowledge on architecting and
delivering data platforms.
My Background
Applying Computational Intelligence in Data Mining
• Honorary Research Fellow, School of Computing, University of Kent.
• Ph.D. Computer Science, University of Kent, Canterbury, UK.
• M.Sc. Computer Science, The American University in Cairo, Egypt.
• 25+ published journal and conference papers, focusing on:
– classification rules induction,
– decision trees construction,
– Bayesian classification modelling,
– data reduction,
– instance-based learning,
– evolving neural networks, and
– data clustering
• Journals: Swarm Intelligence, Swarm & Evolutionary Computation, Applied Soft Computing, and Memetic Computing.
• Conferences: ANTS, IEEE CEC, IEEE SIS, EvoBio, ECTA, IEEE WCCI, and INNS-BigData.
ResearchGate.org
Thank you!

The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
jerlynmaetalle
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
nscud
 
Opendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptxOpendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptx
Opendatabay
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
axoqas
 
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
vcaxypu
 
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
ewymefz
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
mbawufebxi
 
一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单
enxupq
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
ewymefz
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
haila53
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
ukgaet
 
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
Tiktokethiodaily
 
一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单
ocavb
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
John Andrews
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
AbhimanyuSinha9
 
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape ReportSOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar
 
Machine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptxMachine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptx
balafet
 

Recently uploaded (20)

Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
 
The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
 
Opendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptxOpendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptx
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
 
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
 
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
 
一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
 
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
 
一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
 
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape ReportSOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape Report
 
Machine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptxMachine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptx
 

Enterprise Cloud Data Platforms - with Microsoft Azure

  • 1. Enterprise Cloud Data Platforms with Microsoft Azure & Cortana Analytical Suite. Khalid M. Salama, Ph.D., Business Insights & Analytics, Hitachi Consulting UK. We Make it Happen. Better. © Copyright 2015 Hitachi Consulting
  • 2. Outline
      - Microsoft Azure and Cloud Computing
      - Azure Data Architecture and Services
      - EDW with Azure SQL Data Warehouse
      - Big Data Storage and Analytics on Azure Data Lake and HDInsight
      - Data Processing and Movements with Azure Data Factory
      - Real-time Processing with Azure Event Hubs and Stream Analytics
      - Data Science and Machine Learning on Microsoft Azure
      - Data Discoverability with Azure Data Catalog
  • 3. Microsoft Azure and Cloud Computing
  • 4. Cloud Computing: Enabling the future of computing
      - On-demand Self-service: Web interfaces that deliver capacity in seconds or minutes
      - Broad Network Access: Supports many devices, such as mobile phones, laptops, tablets, and workstations
      - Resource Pooling: Thousands to hundreds of thousands of servers are immediately usable
      - Rapid Elasticity: On-demand or automatic (threshold-based) scale-up or scale-down
      - Measured Service: Pay as you use, for what you use; improved operational metrics and cost accounting
      - Innovation: Modern IT capabilities and services provided frequently
  • 5. Cloud Computing on Microsoft Azure: Computing Models
  • 6. Cloud Computing on Microsoft Azure: Computing Models
  • 7. Cloud Computing on Microsoft Azure: Computing Models
  • 8. Cloud Computing on Microsoft Azure: Computing Models (Hybrid)
  • 9. Cloud Computing on Microsoft Azure: Computing Models - Examples
      - Infrastructure as a Service (IaaS): VMs and Networking Services (build software infrastructure, develop apps, use apps, software maintained infrastructure)
      - Platform as a Service (PaaS): Data and App Services (develop apps, use apps)
      - Software as a Service (SaaS): SharePoint Online, Office 365, Microsoft Dynamics, Visual Studio Online, Azure Cognitive Services, etc. (use apps)
  • 10. Cloud Computing on Microsoft Azure: So why the cloud?
      - Agility: Configure and provision in minutes; automatic/on-demand scaling up/down
      - Economics: Pay as you use (100% utilization); elastic capacity
      - Innovation: Connectivity; innovative capabilities and services; new compute models
  • 11. Azure Data Architecture and Services
  • 12. Enterprise Data Platforms: Challenges of Enterprise Information Management
  • 14. [Diagram labels] Filter, Reform, Transpose; Extract, Load
  • 15. [Diagram labels] Aggregate/Calculate; Filter, Reform, Transpose; Extract, Load
  • 16. [Diagram labels] Aggregate/Calculate; Filter, Reform, Transpose; Extract, Load
  • 17. [Diagram labels] Aggregate/Calculate; Filter, Reform, Transpose; Extract, Load
      - Big Data Processing: Hive/Pig, U-SQL, Spark, Storm
      - Real-time Stream Processing: Stream Analytics, Event Hubs, Storm, Spark Streaming
      - Data Science & Machine Learning: Azure Machine Learning, Microsoft R Server, Spark ML, Azure Cognitive Services
      - NoSQL Data Stores: Table Storage, HBase, DocumentDB
      - Data Discovery & Search: Azure Data Catalog, Azure Search
      - Data Orchestration & Movement: Azure Data Factory
      - High Performance Computing: Azure Batch
  • 18. Azure Enterprise Data Platform: Layers and relevance
      - Layers: Data Lake, Data Warehouse, Information Marts, Analytical & Reporting Products
      - Each attribute below ranges from the Data Lake end to the Analytical & Reporting Products end:
          - Data: Raw, unstructured → Processed, structured
          - Access: Strict/limited → Available/wide
          - Query: Batch → Interactive
          - Size: Big (everything) → Small (focused)
          - Users: Data professionals → Information workers
          - Value: Low or unknown → High-value
          - Approach: Data → Structure vs. Structure → Data
  • 19. Azure Enterprise Data Platform: Business Agility & IT Governance
      - Layers: Data Lake, Data Warehouse, Information Marts, Analytical & Reporting Products
  • 20. Azure Enterprise Data Platform: Business Agility & IT Governance
      - Layers: Data Lake, Data Warehouse, Information Marts, Analytical & Reporting Products
      - Build Proofs of Concept and Proofs of Value
  • 21. Azure Enterprise Data Platform: Business Agility & IT Governance
      - Layers: Data Lake, Data Warehouse, Information Marts, Analytical & Reporting Products
      - Build Proofs of Concept and Proofs of Value
      - Data Analysts and Data Scientists:
          - Import new data to a sandbox area in the mart
          - Build a showcase Power Pivot model
          - Delve into the data lake for exploratory data analysis
          - Spin up an experimental VM or Azure services
  • 22. Azure Enterprise Data Platform: Business Agility & IT Governance
      - Layers: Data Lake, Data Warehouse, Information Marts, Analytical & Reporting Products
      - Build Proofs of Concept and Proofs of Value
      - Data Analysts and Data Scientists:
          - Import new data to a sandbox area in the mart
          - Build a showcase Power Pivot model
          - Delve into the data lake for exploratory data analysis
          - Spin up an experimental VM or Azure services
      - Data Engineers:
          - Validate/integrate/automate with the enterprise-wide data layers
          - New business definitions and requirements
  • 23. Azure Data Lake Storage and Analytics
  • 24. Data Lake, The Concept: What is it, and why is it needed?
      - What is a Data Lake: A repository that holds raw data file extracts of all the enterprise source systems
      - Why a Data Lake
      - James Dixon, co-founder and CTO of Pentaho (Hitachi Data Systems), coined the term "Data Lake"
  • 25. Data Lake, The Concept: Data Processing
      - Data is stored in its native form (structured, semi-structured, and unstructured): text files, JSON, XML, logs, mainframe, etc.
      - Big Data technologies, such as MapReduce, Pig, Hive, Spark, and U-SQL, are used to process the data in a procedural fashion (TEL):
          - Transpose/reshape data into a tabular form
          - Extract/filter valuable and relevant data elements
          - Load prepared datasets into a relational EDW
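The Transpose, Extract, Load (TEL) steps on slide 25 can be sketched in a few lines. This is a minimal illustration in plain Python, not how MapReduce, Hive, or U-SQL jobs are written; the sample log lines, the field names, and the `transpose`, `extract`, and `load` helpers are all assumptions made up for the example.

```python
import json

# Hypothetical raw log lines as they might sit in a data lake (illustrative data only).
RAW_LINES = [
    '{"user": "a1", "event": "view", "item": {"sku": "X-1", "price": 9.5}}',
    '{"user": "b2", "event": "debug", "item": null}',
    '{"user": "c3", "event": "purchase", "item": {"sku": "Y-7", "price": 20.0}}',
]


def transpose(line):
    """Reshape one nested JSON record into a flat, tabular row."""
    rec = json.loads(line)
    item = rec.get("item") or {}
    return {
        "user": rec["user"],
        "event": rec["event"],
        "sku": item.get("sku"),
        "price": item.get("price"),
    }


def extract(rows):
    """Keep only the rows that are relevant to the warehouse."""
    return [r for r in rows if r["event"] in {"view", "purchase"}]


def load(rows, table):
    """Append prepared rows to a list standing in for a relational EDW table."""
    table.extend((r["user"], r["event"], r["sku"], r["price"]) for r in rows)
    return table


edw_table = load(extract([transpose(line) for line in RAW_LINES]), [])
```

The ordering matters: reshaping happens first, on the raw files, so that only the filtered, tabular result needs to reach the relational store.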
  • 26. Azure Data Lake: Azure Data Lake Services
      - Data Lake Storage
          - Built for data files, rather than blobs
          - WebHDFS REST APIs for Hadoop integration
          - Optimized for analytical workloads
          - Unlimited storage, petabyte files
      - Data Lake Analytics
          - Big Data jobs as a service
          - U-SQL
          - Unit of cost: running job
      - HDInsight
          - Hadoop cluster as a service
          - MapReduce | Pig | Hive | Spark | HBase | Oozie | Storm
          - Unit of cost: running nodes
  • 27. Azure Data Lake Analytics: U-SQL
      - C# meets SQL (.NET data types, C# expressions + SQL set statements)
      - Natively parallel
      - Set-based and procedural processing
      - Works with structured and unstructured data
      - Extensible (user-defined functions, operators, and aggregate functions)
      - Reuse of custom-built .NET assemblies
  • 28. Azure HDInsight: Hadoop on Microsoft Cloud
      - [Diagram labels] Storage: Windows Azure Blob Storage | Data Lake (DFS)
      - [Diagram labels] Resource management: Yet Another Resource Negotiator (YARN)
      - [Diagram labels] Applications: In-Memory, Stream, SQL, Spark-SQL, NoSQL, Machine Learning, Batch
      - [Diagram labels] Orchestration, Management, Acquisition
  • 29. Azure SQL Data Warehouse
  • 30. Azure SQL Data Warehouse: Enterprise Data Warehouse, why?
  • 31. Azure SQL Data Warehouse: Enterprise Data Warehouse, why?
      - "That is great! Now we have a data lake that has all the data that we want. Let's spin up a VM, build a database, and produce the reports we want," said Mr. Quick-win from the business.
  • 32. Azure SQL Data Warehouse: Enterprise Data Warehouse, why?
      - "That is great! Now we have a data lake that has all the data that we want. Let's spin up a VM, build a database, and produce the reports we want," said Mr. Quick-win from the business.
      - "No. The data lake has raw data with no business structure. High-value data needs to be loaded into a canonical relational form, where enterprise-wide business entities, calculations, and rules are defined. Building reporting products on top of the data lake will result in unreliable information silos without a single version of the truth," said Mr. Enterprise Data Architect from the IS.
  • 33. Azure SQL Data Warehouse: A Parallel DW on the cloud. An elastic, Massively Parallel Processing (MPP), cloud-based data warehouse as a service, with enterprise-class features.
      - Suitable for:
          - Batch data processing workloads
          - Consolidating high-value data into a single enterprise-wide structured store
          - Modeling, transforming, and aggregating data
          - Performing query analysis across large datasets
      - Not suitable for:
          - Operational workloads
          - High-frequency reads and writes
          - Singleton operations (→ Information Mart)
          - Row-by-row processing needs (→ Data Lake)
          - High concurrency (→ Information Mart)
  • 34. Massively Parallel Processing (MPP): Azure SQL Data Warehouse, Scale-out Distributed Queries
  • 35. Azure SQL Data Warehouse: MPP - Logical Overview
      - [Diagram: a Storage layer holding 60 SQL distributions]
      - 60 data distribution nodes
      - Each node contains a portion of the data (a distribution) for parallel processing
      - Storage is decoupled from compute
  • 36. Azure SQL Data Warehouse: MPP - Logical Overview
      - [Diagram: a Storage layer holding 60 SQL distributions, with a Compute layer]
      - A scale-out distributed query engine
  • 37. Azure SQL Data Warehouse: MPP - Logical Overview
      - [Diagram: a Storage layer holding 60 SQL distributions, with a Compute layer]
      - A scale-out distributed query engine
  • 38. Azure SQL Data Warehouse: MPP - Logical Overview
      - [Diagram: a Storage layer holding 60 SQL distributions, with a Compute layer]
      - A scale-out distributed query engine
      - You get more compute capacity by increasing Data Warehouse Units (DWUs)
  • 39. Azure SQL Data Warehouse: MPP - Logical Overview
      - [Diagram: a Storage layer holding 60 SQL distributions, with a Compute layer]
      - A scale-out distributed query engine
      - You get more compute capacity by increasing Data Warehouse Units (DWUs)
      - Elastic: scale up/down as needed, and pay per usage (at hourly level)
      - Scaling takes minutes because compute is independent of data, so no data redistribution is needed
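Why scaling needs no data redistribution can be shown with a small model: the 60 distributions stay fixed, and only their assignment to compute nodes changes. The sketch below assumes a simple round-robin assignment purely for illustration; the service's real mapping is internal and is not described in the slides, and `assign_distributions` is a hypothetical helper.

```python
NUM_DISTRIBUTIONS = 60  # fixed number of distributions, as stated on slide 35


def assign_distributions(num_compute_nodes):
    """Map every distribution id to a compute node (assumed round-robin)."""
    mapping = {node: [] for node in range(num_compute_nodes)}
    for dist in range(NUM_DISTRIBUTIONS):
        mapping[dist % num_compute_nodes].append(dist)
    return mapping


# Scaling from 2 to 6 compute nodes only changes which node serves which
# distribution; the distributions themselves (the stored data) are untouched.
small = assign_distributions(2)   # 30 distributions per node
large = assign_distributions(6)   # 10 distributions per node
```

In this model, moving between `small` and `large` is a metadata change, which is consistent with the slide's point that scaling takes minutes rather than a full reload.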
  • 40. Azure SQL Data Warehouse: Data Warehouse Units
      - [Chart: max transaction size (GB), on a 0 to 1000 axis, for DW100, DW200, DW300, DW400, DW500, DW600, DW1000, DW1200, DW1500, DW2000]
  • 41. Azure SQL Data Warehouse: Data Warehouse Units
      - [Chart: max transaction size (GB), on a 0 to 1000 axis, for DW100 to DW2000]
      - [Chart: connections, queries, slots, small rc (MB/dist), and medium rc (MB/dist), on a 1 to 1000 log axis]
  • 42. Azure SQL Data Warehouse: Data Warehouse Units
      - [Table: DWU, readers, writers]
      - [Chart: connections, queries, slots, small rc (MB/dist), and medium rc (MB/dist), on a 1 to 1000 log axis]
      - [Chart: max transaction size (GB), on a 0 to 1000 axis, for DW100 to DW2000]
  • 43. Azure SQL Data Warehouse: Distribution vs. Partitioning
      - Distribution
          - Aim to avoid skew so that parallel processing stays balanced; if most of your data sits in one node, no parallel processing is gained.
          - Aim to avoid expensive data shuffling across data nodes.
          - Consider columns used in JOIN or GROUP BY as distribution keys to reduce shuffling.
          - Consider columns with high cardinality as distribution keys to avoid skew.
          - Round-robin distribution is the default, and a good compromise.
          - Use replication for small tables (yet to be available).
      - Partitioning
          - Within a given data node, aim to minimize the amount of data considered by the query, that is, partition elimination.
          - Consider columns used in WHERE clauses as partition keys.
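The advice on cardinality and skew can be checked with a small simulation. The sketch below hashes keys onto 60 distributions and compares a high-cardinality key with a three-value key. It uses MD5 only as a convenient stable hash; the warehouse's actual hash function is not given in the slides, and the key values and the `skew` metric are assumptions for illustration.

```python
import hashlib
from collections import Counter

NUM_DISTRIBUTIONS = 60


def distribution_of(key):
    """Stable hash of a key onto a distribution (illustrative hash, not the service's)."""
    digest = hashlib.md5(str(key).encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_DISTRIBUTIONS


def skew(keys):
    """Rows in the fullest distribution divided by the ideal even share (1.0 is perfect)."""
    counts = Counter(distribution_of(k) for k in keys)
    ideal = len(keys) / NUM_DISTRIBUTIONS
    return max(counts.values()) / ideal


rows = 60_000
high_cardinality = [f"customer-{i}" for i in range(rows)]           # one key per row
low_cardinality = [("UK", "US", "DE")[i % 3] for i in range(rows)]  # only three keys

high_skew = skew(high_cardinality)  # close to 1: rows spread over all distributions
low_skew = skew(low_cardinality)    # at least 20: rows land in at most 3 distributions
```

With only three distinct key values, at most three of the 60 distributions ever receive rows, so the fullest one holds at least 20 times its even share, which is the "no parallel processing is gained" case from the slide.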
  • 44. Azure SQL Data Warehouse: Indexing
      - Index options, on an axis from better query performance to better load performance: Column Store Index + Partitioning, Column Store Index, Row Index + Partitioning, Row Index, Heap
      - [Diagram labels] Used in data landing tables; used in tables expecting aggregation queries; used in staging/data processing tables
  • 45. Azure SQL Data Warehouse: Data Model
      - Captures elementary source data structure, main business entities, and relationships
      - Atomic, fine-grained, integrated, detail-oriented, historical, bi-temporal
      - Provides a flexible, yet unified, data foundation that allows agile business rules to construct new data products and information models
      - Modeling approaches: Data Vault (Hubs, Links, Satellites), Dan Linstedt; 3rd Normal Form (3NF), Bill Inmon; Dimensional Model (Facts, Dimensions), Ralph Kimball
  • 46. Azure SQL Data Warehouse: PolyBase
      - [Diagram labels] Control node; four Compute nodes; PolyBase
      - Allows bringing data into and out of Azure SQL DW/SQL, from and to Azure Blob Storage/Data Lake
      - Create an external table (schema-on-read): identify the file location and format
      - Parallel data movement: direct connectivity between compute and data files (back door)
      - Best way to get data in and out of Azure SQL DW
      - Hadoop-based technology (very similar to Hive)
  • 47. Azure SQL Data Warehouse: PolyBase
        -- Where the external files live
        CREATE EXTERNAL DATA SOURCE MyAzureDataSource
        WITH (
            TYPE = HADOOP,
            LOCATION = 'wasbs://[container]@accountname.blob.core.windows.net/path'
        );

        -- How the external files are encoded
        CREATE EXTERNAL FILE FORMAT MyTextFileFormat
        WITH (
            FORMAT_TYPE = DELIMITEDTEXT,
            FORMAT_OPTIONS (
                FIELD_TERMINATOR = '|',
                STRING_DELIMITER = ',',
                DATE_FORMAT = 'yyyy-MM-dd',
                USE_TYPE_DEFAULT = TRUE
            ),
            DATA_COMPRESSION = 'org.apache.hadoop.io.compress.GzipCodec'
        );

        -- Schema-on-read table over the external files
        CREATE EXTERNAL TABLE [asb].[FactOnlineSales] (
            [ProductKey] int NOT NULL,
            [StoreKey] int NOT NULL,
            [DateKey] int NOT NULL,
            [CustomerKey] int NOT NULL,
            [PromotionKey] int NOT NULL,
            [SalesQuantity] int NOT NULL,
            [UnitPrice] money NOT NULL,
            [SalesAmount] money NOT NULL
        )
        WITH (
            LOCATION = 'wasbs://filepath_or_directory',
            DATA_SOURCE = MyAzureDataSource,
            FILE_FORMAT = MyTextFileFormat,
            REJECT_TYPE = VALUE,
            REJECT_VALUE = 0,
            REJECT_SAMPLE_VALUE = 1000
        );
  • 48. Azure SQL Data Warehouse: Data Processing
      - ELT: Extract the data from the source, Load the data, then perform the set-based Transformations as SQL procedures to benefit from MPP. No SSIS data flow task!
      - Avoid singleton inserts, deletes, and updates
      - Use Create Table As Select (CTAS) for processing data
      - Create and update statistics
      - Materialize external tables before processing
      - [Diagram labels] Distribution, Partitioning, Indexing
  • 49. Information Marts
  • 50. Information Marts: Role in the Enterprise Data Platform
      - Subject-oriented
      - Owned by business departments
      - Contain data aggregations and calculations to answer pre-defined business questions
      - Dimensional model or pre-canned datasets
      - May contain a sandbox area for experimentation
      - Built on top of the single version of the truth in the EDW
      - Suitable for interactive query and high concurrency
      - Azure SQL Database | Azure SQL 2016 VM | Analysis Services model
      - ETL: data is processed on the MPP platform (Azure SQL DW) and then loaded into the mart
  • 51. End-to-end Data Processing
      - Data Lake <Azure Data Lake | Blob Storage>: Big Data Processing (TEL) <Data Lake Analytics | HDInsight>
      - PolyBase, from the Data Lake into the Enterprise DW
      - Enterprise DW <Azure SQL DW>: Massively Parallel Processing (ELT) <PolyBase + T-SQL>
      - Populate (ETL) <Azure Data Factory | SSIS>, from the Enterprise DW into the Information Mart
      - Information Mart <Azure SQL DB | SQL Server 2016>
  • 52. Azure Data Factory
  • 53. Azure Data Factory: Cloud data processing & movement services
      - A managed Azure cloud service for building and operating data processing and movements
      - Scalable, reliable, pay-as-you-use
      - Compose, monitor, and schedule data pipelines
      - Automatic cloud resource management
  • 54. Azure Data Factory: Concepts & Components
      - Linked Service: connection information for a data store or a compute resource
      - Dataset: a pointer to the data object to be used as an input or an output of an Activity
      - Pipeline: a logical grouping of Activities with a certain sequence and schedule
  • 55. Azure Data Factory: Concepts & Components
      - Linked Service: connection information for a data store or a compute resource
      - Dataset: a pointer to the data object to be used as an input or an output of an Activity
      - Pipeline: a logical grouping of Activities with a certain sequence and schedule
      - Activity: actions to perform on data; input 0-N dataset(s), output 1-N dataset(s)
  • 56. Azure Data Factory: Concepts & Components
      - Linked Service: connection information for a data store or a compute resource
      - Dataset: a pointer to the data object to be used as an input or an output of an Activity
      - Pipeline: a logical grouping of Activities with a certain sequence and schedule
      - Activity: actions to perform on data; input 0-N dataset(s), output 1-N dataset(s)
      - [Diagram labels] Data Factory, Pipeline, two Linked Service and Dataset pairs
  • 57. Azure Data Factory: Concepts & Components
      - Linked Service: connection information for a data store or a compute resource
      - Dataset: a pointer to the data object to be used as an input or an output of an Activity
      - Pipeline: a logical grouping of Activities with a certain sequence and schedule
      - [Diagram labels] Data Factory, two Pipelines, five Datasets
  • 58. Azure Data Factory: Concepts & Components
  • 59. Azure Data Factory: Features
      - Linked Service, Compute: Azure HDInsight On-Demand, Azure Batch, Azure Machine Learning, Azure Data Lake Analytics, Azure SQL DW (StoredProc), Azure SQL DB (StoredProc), SQL Server (StoredProc)
      - Linked Service, Data: Azure Blob, Azure Table, Azure SQL Database, Azure SQL Data Warehouse, Azure DocumentDB, Azure Data Lake Store, SQL Server, ODBC data sources, Hadoop Distributed File System (HDFS)
      - Activities:
          - Data Movement (copy data from source to sink)
          - Data Transformation: Hive, Pig, MapReduce, Machine Learning, Stored Procedure, Data Lake U-SQL, Custom .NET
      - Data Gateway: connect to an on-premises data source
  • 60. Azure Data Factory: Example
      Input Dataset:
        {
          "name": "AzureBlobInput",
          "properties": {
            "type": "AzureBlob",
            "linkedServiceName": "AzureStorageLinkedService1",
            "typeProperties": {
              "fileName": "input.log",
              "folderPath": "adfgetstarted/inputdata",
              "format": { "type": "TextFormat", "columnDelimiter": "," }
            },
            "availability": { "frequency": "Month", "interval": 1 },
            "external": true,
            "policy": {}
          }
        }
      Output Dataset:
        {
          "name": "AzureBlobOutput",
          "properties": {
            "type": "AzureBlob",
            "linkedServiceName": "AzureStorageLinkedService1",
            "typeProperties": {
              "folderPath": "adfgetstarted/partitioneddata",
              "format": { "type": "TextFormat", "columnDelimiter": "," }
            },
            "availability": { "frequency": "Month", "interval": 1 }
          }
        }
      Pipeline:
        {
          "name": "MyFirstPipeline",
          "properties": {
            "description": "My first Azure Data Factory pipeline",
            "activities": [
              {
                "type": "HDInsightHive",
                "typeProperties": {
                  "scriptPath": "adfgetstarted/script/partitionweblogs.hql",
                  "scriptLinkedService": "AzureStorageLinkedService1",
                  "defines": {
                    "inputtable": "<storageaccounturl>",
                    "partitionedtable": "<folderurl>"
                  }
                },
                "inputs": [ { "name": "AzureBlobInput" } ],
                "outputs": [ { "name": "AzureBlobOutput" } ],
                "policy": { "concurrency": 1, "retry": 3 },
                "scheduler": { "frequency": "Month", "interval": 1 },
                "name": "RunSampleHiveActivity",
                "linkedServiceName": "HDInsightOnDemandLinkedService"
              }
            ],
            "start": "2016-04-01T00:00:00Z",
            "end": "2016-04-02T00:00:00Z",
            "isPaused": false
          }
        }
      Linked Service:
        {
          "name": "HDInsightOnDemandLinkedService",
          "properties": {
            "type": "HDInsightOnDemand",
            "typeProperties": {
              "version": "3.2",
              "clusterSize": 1,
              "timeToLive": "00:30:00",
              "linkedServiceName": "AzureStorageLinkedService1"
            }
          }
        }
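The four JSON documents on slide 60 refer to each other by name: the pipeline's activity names its input and output datasets and its compute linked service. The sketch below shows that wiring as a consistency check in plain Python. It is not part of Azure Data Factory or any SDK; `unresolved_references` is a hypothetical helper, and the pipeline dictionary is trimmed to the fields the check needs.

```python
# Names defined by the dataset and linked-service documents on slide 60.
datasets = {"AzureBlobInput", "AzureBlobOutput"}
linked_services = {"AzureStorageLinkedService1", "HDInsightOnDemandLinkedService"}

# The pipeline document, reduced to the cross-referencing fields.
pipeline = {
    "name": "MyFirstPipeline",
    "properties": {
        "activities": [
            {
                "name": "RunSampleHiveActivity",
                "type": "HDInsightHive",
                "inputs": [{"name": "AzureBlobInput"}],
                "outputs": [{"name": "AzureBlobOutput"}],
                "linkedServiceName": "HDInsightOnDemandLinkedService",
            }
        ]
    },
}


def unresolved_references(pipeline, datasets, linked_services):
    """Return (kind, name) pairs that an activity references but nothing defines."""
    missing = []
    for activity in pipeline["properties"]["activities"]:
        for ref in activity.get("inputs", []) + activity.get("outputs", []):
            if ref["name"] not in datasets:
                missing.append(("dataset", ref["name"]))
        if activity["linkedServiceName"] not in linked_services:
            missing.append(("linkedService", activity["linkedServiceName"]))
    return missing


problems = unresolved_references(pipeline, datasets, linked_services)
```

An empty `problems` list means every name the activity uses is defined by one of the other documents, which matches the Linked Service, Dataset, Activity, Pipeline relationships described on slides 54 to 57.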
  • 61. | © Copyright 2015 Hitachi Consulting61 Azure NoSQL Data Stores
  • 62. | © Copyright 2015 Hitachi Consulting62 What is NoSQL? key attributes.. Non-relationalDistributed Fault-tolerant Semi-structuredScalable Non-Transactional Complement RDBMS Flexible Schema Random, real-time read/write access
  • 63. | © Copyright 2015 Hitachi Consulting63 Why NoSQL? Suitable for Not Suitable for Batch Processing Complex Analytical Queries Joins Complex Transactions Reference Data Variable Data Structures Singleton Select/ Inset/ update Random, real-time read/write access Suitability
  • 64. | © Copyright 2015 Hitachi Consulting64 NoSQL Data Stores Categories and breads Memory Cache Store Graph Store Column Family Store Key/Value Store Document Store Others
  • 65. | © Copyright 2015 Hitachi Consulting65 Azure NoSQL Technologies Memory Cache Store Document Store Graph Store Column Family Store Key/Value Store  Flexible data structure  Dictionary/ lookup  Value can be anything  Column-oriented access  Big Data with real-time read/write random access  Extensible  Query-able data  Objects (complex structure) in JSON, XML, etc.  CRUD apps  Social networks  Fraud detection  Relationship-heavy data  Non-durable data  Fast access  Hot data Azure Table Storage Azure Hbase on HDInsight Azure DocumentDB Microsoft Graph Engine (Trinity) Azure Redis Cache
  • 66. NoSQL Data Stores: categories and breeds. [Chart comparing Memory Cache Store, Key/Value Store, Column Family Store, Document Store, Graph Store (specialized), and RDBMS along: Size; Latency; Query Complexity/Flexibility; Defined Structure.]
  • 67. NoSQL Usage Patterns: a common scenario
       Diagram: Online Apps read/write the RDBMS; ETL (batch) loads the Data Warehouse; ETL (batch) writes results to NoSQL (cold path); Online Apps read from NoSQL (hot path)
       Data is extracted, transformed, and loaded from OLTP to the EDW
       Aggregations, KPIs, and scores are computed via batch processing
       Results are populated to a NoSQL data store for reference use in apps
       App hot read, ETL cold write
       Document & Graph stores
       E.g., Single Customer View: customer matching, customer KPIs, segment assignment, and propensity scoring are performed as batch processing in the DW; the output goes to NoSQL to be used for real-time recommendation, campaigning, targeted advertising, etc.
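The hot-read/cold-write pattern on this slide can be sketched in a few lines of Python. This is an illustration under stated assumptions, not an Azure API: a plain dict stands in for the NoSQL store, and the order rows, the spend threshold, and the names `batch_score` and `recommend` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical order rows as they might come out of the data warehouse.
orders = [
    {"customer_id": "c1", "amount": 120.0},
    {"customer_id": "c2", "amount": 15.0},
    {"customer_id": "c1", "amount": 80.0},
]


def batch_score(rows, high_value_threshold=150.0):
    """Cold path: aggregate per customer and assign a segment."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["customer_id"]] += row["amount"]
    return {
        cid: {
            "total_spend": total,
            "segment": "high" if total >= high_value_threshold else "standard",
        }
        for cid, total in totals.items()
    }


# A dict stands in for the NoSQL store: one document per customer key.
customer_view = {}
customer_view.update(batch_score(orders))  # ETL cold write


def recommend(customer_id):
    """Hot path: a single key lookup, no joins or aggregation at request time."""
    doc = customer_view.get(customer_id)
    if doc is None:
        return "default-offer"
    return "premium-offer" if doc["segment"] == "high" else "default-offer"


print(recommend("c1"), recommend("c2"), recommend("unknown"))
```

The expensive work (aggregation and scoring) happens once per batch run, so the app's read path stays a constant-time lookup.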
  • 68. Real-time stream processing on Microsoft Azure
  • 69. What is Event & Stream Processing? Data at rest vs. data in motion. Traditional (working with data at rest): Data Store; Bulk-load & Batch Processing; Submit Query; Get Results. Real-time (working with data in motion): Continuous Data Stream; Continuous Processing & Query; Static Reference Data; Real-time Continuous Results; Actions & Data Archiving.
  • 70. Lambda Architecture: the speed layer and stream processing. Hot Path; Cold Path.
  • 75. Events & Stream Processing Architecture: Azure Tools & Technologies
       Event Triggers: Applications; Web and social; Devices; Sensors
       Data Stream Collection (Ingress) and Message Queuing: Apache Kafka; Azure Event Hub; Azure Service Bus; Azure IoT Hub
       Stream Processing: Azure Stream Analytics; Storm on HDInsight; Spark Streaming on HDInsight; Azure ML
       Reference Data
       Storage and Batch Analysis: HDFS; Azure SQL DB/DW
       Presentation and Action: PowerBI Live Dashboards; Apps and Devices to take actions
  • 76. Events & Stream Processing Architecture
       Data Sources: devices, websites, and apps that continuously produce data streams
       Data Collection: listen to, collect, and transfer inbound events
       Message Queuing: de-couples data consumers from data producers; reliable, distributed, fault-tolerant, high-throughput short-term storage
       Stream Processing: aggregate/filter/join incoming event streams; temporal engine for analysing data across time-series windows
       Reference Data: high-throughput, random-access data store to support processing; usually NoSQL data stores
       Storage: store processed/aggregated/filtered data (SQL/NoSQL); consolidate and store raw data into files for batch analysis (DFS)
       Presentation: rich interactive visualizations for real-time data analysis; application integration for process automation
  • 77. Azure Stream Analytics: a PaaS real-time complex event processing (CEP) service on Microsoft Azure
       Fully-managed real-time processing: intake of millions of events per second; processing on continuous streams of data; reference data lookup; output to live dashboards and data stores
       Mission-critical reliability: guaranteed event delivery; preserves event order on a per-device basis; guaranteed business continuity; auto-recovery from failures
       No challenges with scale: elasticity for scale-up or scale-down; distributed, scale-out architecture; pay only for the resources you use
       Rapid development & deployment: SQL-like language; built-in temporal semantics; up and running in a few clicks; scheduling and monitoring
  • 78. Azure Stream Analytics
       Stages: Data Source; Ingest/Queue; Process; Consume; Deliver
       Event Inputs: Event Hub; Azure Blob; DocumentDB (coming soon)
       Transform: temporal joins; filter; aggregates; projections; windows; REST APIs (coming soon)
       Enrich: Azure ML
       Outputs: SQL Azure; Azure Blobs; Event Hub; Service Bus Queue; Service Bus Topics; Table storage; DocumentDB; PowerBI
       Azure Storage: distributed; low latency; high throughput; scalable and reliable; low cost
       Reference Data: Azure Blob; HBase (coming soon)
       Power BI Dashboard
  • 79. Azure Stream Analytics: Stream Analytics Query
  • 80. Stream Analytics Query Language: Windowing Functions
       In data streams, a common requirement is to aggregate (max, min, sum, count, etc.) over messages that arrive within a specified period of time (a window) in order to detect events.
       Each Group By requires a windowing function
       Each window operation outputs a single event at the end of the window
       All windows have a fixed length
       Tumbling window: aggregate per time interval
       Hopping window: schedule overlapping windows
       Sliding window: windows constantly re-evaluated
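To make the difference between tumbling and hopping windows concrete, here is a small plain-Python sketch. It is not the Stream Analytics query language: the event data is made up, the function names are hypothetical, and the half-open boundary convention [start, start + size) is an assumption chosen for the illustration.

```python
from collections import Counter

# (timestamp_seconds, device_id) pairs; the data is made up for illustration.
events = [(1, "a"), (4, "b"), (7, "a"), (12, "a"), (14, "b"), (21, "a")]


def tumbling_counts(events, size):
    """Count events per fixed, non-overlapping window [start, start + size)."""
    counts = Counter()
    for ts, _ in events:
        counts[(ts // size) * size] += 1
    return dict(sorted(counts.items()))


def hopping_counts(events, size, hop):
    """Count events per window of length `size` that starts every `hop` seconds.

    With hop < size the windows overlap, so one event is counted in several.
    """
    counts = Counter()
    for ts, _ in events:
        # Latest window start at or before ts, then walk back while ts is covered.
        start = (ts // hop) * hop
        while start > ts - size and start >= 0:
            counts[start] += 1
            start -= hop
    return dict(sorted(counts.items()))


print(tumbling_counts(events, 10))    # keys are window start times
print(hopping_counts(events, 10, 5))
```

With a 10-second tumbling window each event lands in exactly one bucket; with a 10-second window hopping every 5 seconds, most events are counted twice, once in each window that covers them.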
  • 81. Data Science and Machine Learning on Microsoft Azure
  • 82. Data Science and Machine Learning: What? Data Mining draws on: Machine Learning; Statistics; Artificial Intelligence; Databases; Other Technologies. “Data mining, an interdisciplinary subfield of computer science, is the computational process of automatic discovering patterns in large data sets” Other Related Technologies:  Visualization  Big Data  High Performance Computing  Cloud Computing  Others
  • 83. Data Science and Machine Learning: Why? Machine learning & predictive analytics are core capabilities that are needed throughout your business: Vision Analytics; Recommendation engines; Advertising analysis; Weather forecasting for business planning; Social network analysis; Legal discovery and document archiving; Pricing analysis; Fraud detection; Churn analysis; Predictive Maintenance; Location-based tracking and services; Personalized Insurance.
  • 84. Data Science and Machine Learning: How? Cross Industry Standard Process for Data Mining (CRISP-DM), a cycle around the Data, from Start: Understanding the Business; Understanding the Data; Preparing the Data; Modelling; Evaluation, Interpretation, Communication; Deployment.
  • 85. Data Science and Machine Learning: How?
       Classification Learning: build a model that can predict the target class of an input case
       Cluster Analysis: discover natural groupings within the data points
       Association Rule Discovery: extract frequent patterns present in the data
       Regression Modeling: build a model that can estimate the response value given an input case
       Time Series Analysis: analyse temporal data to forecast future values
       Probabilistic Modeling & Estimation: compute the probability that an event occurs
       Similarity Analysis: identify cases similar to a given input case
       Collaborative Filtering: filter information using techniques involving collaboration among multiple viewpoints
       Example rule list: IF .. AND .. AND .. THEN A ELSE IF .. AND .. THEN C ELSE IF .. AND .. THEN B .. .. ELSE C
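The IF .. THEN .. ELSE IF chain on this slide is an ordered rule list: the first rule whose conditions all hold decides the class, and the final ELSE supplies a default. A minimal Python sketch follows; the attributes, values, and class labels are invented for illustration, and `classify` is a hypothetical helper rather than any library's API.

```python
# Each rule is (list of conditions, predicted class); conditions are
# (attribute, required value) pairs that must all hold (the AND on the slide).
rules = [
    ([("outlook", "sunny"), ("humidity", "high")], "stay_in"),
    ([("outlook", "rainy"), ("windy", True)], "stay_in"),
    ([("outlook", "overcast")], "go_out"),
]
DEFAULT_CLASS = "go_out"  # the final ELSE branch


def classify(case, rules, default=DEFAULT_CLASS):
    """Return the class of the first rule whose conditions all match."""
    for conditions, label in rules:
        if all(case.get(attr) == value for attr, value in conditions):
            return label
    return default


print(classify({"outlook": "sunny", "humidity": "high"}, rules))
print(classify({"outlook": "sunny", "humidity": "normal"}, rules))
```

Because evaluation stops at the first match, rule order matters; a learning algorithm that induces such a list has to choose both the rules and their order.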
  • 86. Data Science and Machine Learning: Microsoft tools and technologies. Data Mining in SQL Server Analysis Services; Azure Machine Learning; Spark ML on Azure HDInsight; Microsoft R Server; Azure Cognitive Services.
  • 87. Data Science and Machine Learning: Microsoft Analysis Services  SQL Server Analysis Services (SQL Server on a VM, no PaaS)  Process data from many OLEDB and ODBC data sources  Easy to build, interpret, deploy, and productionize  Limited extensibility  Custom APIs can be built to use constructed models (DMX)  Limited pre-processing functionality. Diagram labels: Azure SQL DW/DB; SQL Server Analysis Services; Online Apps; Build Model; Result; Explore/Interpret Model; DMX Query; Batch Scoring; Retrain Model.
  • 88. Data Science and Machine Learning: Microsoft Analysis Services. Data Source View; Mining Structure; Mining Model; Explore/Interpret Model. Mining Algorithm:  Decision Trees  Naïve-Bayes  Linear Regression  Neural Networks  Association  Clustering  Sequence Clustering  Time Series
  • 89. Data Science and Machine Learning: Microsoft Analysis Services
  • 90. Data Science and Machine Learning: Spark ML on HDInsight  HDInsight-based ML services provided by Spark  Spark MLlib and Spark ML functionality  Stream mining functionality  Imports data from HDFS (Blob Storage/Data Lake)  Scalable: highly distributed  Extensible: Python and R  Suitable for Big Data: batch model training and data scoring. Diagram labels: Learning Data; Build and Save Model; Batch Scoring; Retrain Model.
  • 91. Data Science and Machine Learning: Spark ML on HDInsight – Pipelines
      Spark ML standardizes APIs for machine learning algorithms to make it easier to combine multiple algorithms into a single pipeline, or workflow.
       Transformers: used for data pre-processing. Input: DataFrame. Output: DataFrame.
       Estimators: ML algorithms used to build a predictive model. Input: DataFrame. Output: Model.
       Parameters: configurations for Transformers and Estimators
       Pipeline: chains Transformers and Estimators
      ML Pipeline diagram: Dataset (DataFrame); Transformer A (pre-processing) … Transformer Z (pre-processing); Estimator (ML learning algorithm); Model; Evaluation; Parameters
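The Transformer/Estimator/Pipeline contract described on this slide can be illustrated without Spark. The sketch below is plain Python, not the Spark ML API: rows are dicts rather than DataFrames, every class name is hypothetical, and the pipeline is simplified to a chain of transformers ending in a single estimator.

```python
class Lowercase:
    """Transformer: rows in, rows out."""
    def transform(self, rows):
        return [{**r, "text": r["text"].lower()} for r in rows]


class Tokenize:
    """Transformer: adds a `tokens` column."""
    def transform(self, rows):
        return [{**r, "tokens": r["text"].split()} for r in rows]


class KeywordModel:
    """Model produced by the estimator; it is itself a transformer."""
    def __init__(self, keywords):
        self.keywords = keywords

    def transform(self, rows):
        return [
            {**r, "prediction": int(any(t in self.keywords for t in r["tokens"]))}
            for r in rows
        ]


class KeywordEstimator:
    """Estimator: learns which tokens appear only in positive examples."""
    def fit(self, rows):
        pos = {t for r in rows if r["label"] == 1 for t in r["tokens"]}
        neg = {t for r in rows if r["label"] == 0 for t in r["tokens"]}
        return KeywordModel(pos - neg)


class FittedPipeline:
    """Pre-processing stages followed by the learned model."""
    def __init__(self, stages):
        self.stages = stages

    def transform(self, rows):
        for stage in self.stages:
            rows = stage.transform(rows)
        return rows


class Pipeline:
    """Chains transformers and ends with one estimator."""
    def __init__(self, transformers, estimator):
        self.transformers = transformers
        self.estimator = estimator

    def fit(self, rows):
        for t in self.transformers:
            rows = t.transform(rows)
        model = self.estimator.fit(rows)
        return FittedPipeline(self.transformers + [model])


train = [
    {"text": "Win a FREE prize", "label": 1},
    {"text": "Meeting at noon", "label": 0},
    {"text": "Free entry, win now", "label": 1},
]
model = Pipeline([Lowercase(), Tokenize()], KeywordEstimator()).fit(train)
scored = model.transform([{"text": "FREE lunch"}, {"text": "Lunch at noon"}])
print([r["prediction"] for r in scored])
```

The point of the shared `transform`/`fit` interface is that the same pre-processing is applied automatically at training and scoring time, which is what makes a fitted pipeline safe to save and reuse.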
  • 92. Data Science and Machine Learning: Spark ML on HDInsight – functionality
      Transformers
       Text Feature Extraction: TF-IDF (HashingTF and IDF); Word2Vec; CountVectorizer; Tokenizer; StopWordsRemover; n-gram
       Feature Selection: VectorSlicer; RFormula; ChiSqSelector
       Dimensionality Reduction: PCA
       Features Vector Preparation: VectorAssembler; VectorIndexer; StringIndexer; IndexToString
       Feature Type Conversion: Binarizer; Discrete Cosine Transform (DCT); OneHotEncoder; Bucketizer; QuantileDiscretizer
       Feature Scaling: Normalizer; StandardScaler; MinMaxScaler
       Feature Construction: SQLTransformer; ElementwiseProduct; PolynomialExpansion
      Estimators (supervised)
       Classification: Decision Trees – Ensembles; Naïve-Bayes; SVM
       Regression: Linear Regression; SVM
      Other (unsupervised)
       Clustering; Collaborative Filtering; Frequent Pattern Mining
  • 93. Data Science and Machine Learning: Azure Machine Learning (build and deploy models in the cloud)  Cloud-based Machine Learning Services  Imports data from everywhere  Easy to develop and productionize: Web Services  Extensible via R and Python scripts  Limited in exploration and model interpretation  Rich pre-processing functionality. Diagram labels: Import Data; Azure Machine Learning; Publish; Web Services; Input; Result; Batch Scoring; Retrain Model.
  • 94. Data Science and Machine Learning: Azure Machine Learning – algorithms cheat sheet
  • 95. Data Science and Machine Learning: Azure Machine Learning – ML Studio
  • 96. Data Science and Machine Learning: Azure Machine Learning – ML Studio
  • 97. Data Science and Machine Learning: Microsoft R Server. Microsoft R Open (MRO)  Based on the latest open-source R (3.2.2); built, tested, and distributed by Microsoft  Enhanced by the Intel Math Kernel Library (MKL) to speed up linear algebra functions  More efficient, multi-threaded math computation  Revolutions Open-Source R packages (ParallelR, RHadoop, AzureML, etc.)  Compatible with all R-related software
  • 98. Data Science and Machine Learning: Microsoft R Server
  • 99. Data Science and Machine Learning: Microsoft R Server. Comparison of CRAN R, MRO (Microsoft R Open), and MRS (Microsoft R Server)
       Data size: CRAN in-memory; MRO in-memory; MRS in-memory & disk
       Efficiency: CRAN single-threaded; MRO multi-threaded; MRS multi-threaded, parallel processing across 1:N servers
       Support: CRAN community; MRO community; MRS community + commercial
       Functionality: CRAN 7500+ innovative analytic packages; MRO 7500+ innovative analytic packages; MRS 7500+ innovative packages + commercial parallel high-speed functions
       Licence: CRAN open source; MRO open source; MRS commercial licence
  • 100. Data Science and Machine Learning: Microsoft R Server  Powerful and scalable  Installed on Windows, Linux, or as R Services in SQL Server 2016  Works in different compute contexts: local, Hadoop, Spark, or SQL Server 2016  Very suitable for interactive Data Science activities  Models are deployed to SQL Server 2016, AzureML, or a Web Service on R Server (DeployR). Diagram labels: Import Data; MSR Server (XDS); Explore/Build (RStudio | RTVS); Deploy; SQL Server 2016; sp_execute_external_script; Online Apps; Result; Batch Scoring; Retrain Model.
  • 101. Data Science and Machine Learning: Microsoft R Server – ScaleR functionality
  • 102. Data Science and Machine Learning: Azure Cognitive Services - SaaS
  • 103. Azure Data Catalog
  • 104. Azure Data Catalog: enterprise-wide metadata catalogue for data asset discovery. A cloud-based service in which data sources can be registered, described, and discovered by enterprise-wide users  Registration of central data sources  Self-service business intelligence  Capturing tribal knowledge. Register: Data Sources; Data Assets; Locations. Discover: Search; Browse; Filter. Understand: Metadata; Expert; Description. Consume: Your Data; Your Tools; Your Way. Contribute: Tag; Document; Publish.
  • 112. Azure Data Catalog: Terminology
       Catalog: metadata repository in which data sources and data assets can be registered
       Data Source: container that manages data assets (database server, Blob Storage account, etc.)
       Data Asset: object within a data source that holds data (database table/view, Blob Storage file, etc.)
       Data Asset Location: location of a data source or data asset, which can be used to connect to the source using a client application
       Structural Metadata: metadata extracted from a data source that describes the structure of a data asset (names, data types, etc.)
       Descriptive Metadata: describes the purpose and relevance of the data asset (free-form description, tags, documentation, etc.)
       Preview: a snapshot of up to 20 records that can be extracted from the data source during registration and stored in the catalog
       Profile: a snapshot of table-level and column-level metadata about the content of a registered data asset
       Expert: identified as having an informed perspective on a data asset
       Owner: has additional privileges for managing a data asset
       Glossary: terms, parent terms, and definitions
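One way to picture how these terms relate is as a small data model. The Python sketch below is purely illustrative and is not the Azure Data Catalog API or schema: the class and field names are assumptions, and the search is a naive substring match over descriptive metadata.

```python
from dataclasses import dataclass, field


@dataclass
class DataAsset:
    name: str                      # e.g. a table, view, or file
    location: str                  # what a client application would connect to
    columns: dict                  # structural metadata: column name -> type
    description: str = ""          # descriptive metadata
    tags: set = field(default_factory=set)
    experts: set = field(default_factory=set)


@dataclass
class DataSource:
    name: str                      # e.g. a database server or storage account
    assets: list = field(default_factory=list)


class Catalog:
    """Metadata repository in which sources and their assets are registered."""

    def __init__(self):
        self.sources = []

    def register(self, source):
        self.sources.append(source)

    def search(self, term):
        """Discover assets whose name, description, or tags mention `term`."""
        term = term.lower()
        return [
            asset.name
            for source in self.sources
            for asset in source.assets
            if term in asset.name.lower()
            or term in asset.description.lower()
            or any(term in tag.lower() for tag in asset.tags)
        ]


# Hypothetical registration of one source with two assets.
sales_db = DataSource("sales-sql-server", [
    DataAsset("dbo.Customer", "sales-sql-server/SalesDb/dbo.Customer",
              {"CustomerId": "int", "Name": "nvarchar"},
              description="Master list of customers", tags={"CRM"}),
    DataAsset("dbo.Orders", "sales-sql-server/SalesDb/dbo.Orders",
              {"OrderId": "int", "CustomerId": "int"},
              description="Order headers", experts={"jane.doe"}),
])
catalog = Catalog()
catalog.register(sales_db)
print(catalog.search("customer"), catalog.search("crm"))
```

Note that the catalog stores only metadata and locations; the data itself stays in the registered source.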
  • 113. Azure Data Catalog: Preview
  • 114. Cortana Analytical Suite https://gallery.cortanaintelligence.com
  • 115. Acknowledgement: This guy is awesome. Thanks to Paul Lineham, the best big data architect I have worked with, for supplying me with practical ideas and knowledge on architecting and delivering data platforms.
  • 116. My Background: Applying Computational Intelligence in Data Mining • Honorary Research Fellow, School of Computing, University of Kent. • Ph.D. Computer Science, University of Kent, Canterbury, UK. • M.Sc. Computer Science, The American University in Cairo, Egypt. • 25+ published journal and conference papers, focusing on: classification rules induction, decision trees construction, Bayesian classification modelling, data reduction, instance-based learning, evolving neural networks, and data clustering. • Journals: Swarm Intelligence, Swarm & Evolutionary Computation, Applied Soft Computing, and Memetic Computing. • Conferences: ANTS, IEEE CEC, IEEE SIS, EvoBio, ECTA, IEEE WCCI, and INNS-BigData. ResearchGate.org
  • 117. Thank you!