1. Data Virtualization & SQL Server 2019
presented by: Matthew Bowers | Director – Data & Analytics
2. Who is Oakwood?
10 Gold Competencies
Since 1981, Oakwood has been helping companies of all
sizes, across all industries, solve their business problems.
We bring world-class consultants to architect, design and
deploy technology solutions to move your company
forward.
Our proven approach guarantees better business
outcomes. With flexible engagement options, your
project is delivered on time and on budget.
As a Microsoft Gold Partner, we are a leading provider of
transformative digital and cloud services, managed
business services and custom application development.
When you choose to engage with us, you’ll enjoy
improved customer relationships, enhanced productivity,
reduced IT costs and less responsibility for your
technology. With our expertise and industry insights,
we’ll deliver better business outcomes with speed and
certainty.
Thousands of Successful Software Projects over 35+ years
100+ Dev Experts to Help You Scale
40% Faster Development
10-30% Cost Savings Over Traditional In-house Staff
On-budget Delivery
Thousands of Clients in the Software and Digital Practice
3. Microsoft Cloud Solution Provider
With Microsoft’s Cloud Solution Provider (CSP) program,
Oakwood can provide and help manage your Azure and
Office 365 licenses, giving you the flexibility and scalability
your enterprise needs. When combined with our Managed
Services, you’ll have peace of mind knowing that your
Azure usage is monitored and optimized by our team
of in-house experts.
Microsoft Cloud Solution Provider strengths
Tier 1 Cloud Solution Provider
• Access to additional advisory services
Provision Any O365 or Azure Resources
• Work with you to select the appropriate SKU for every
situation
Microsoft Gold Partner
• More impactful Microsoft communications
• Premium Support
• Deeper understanding of the Microsoft ecosystem
Actively manage spend to optimize service without overpaying
4. What is Data Virtualization?
“Data virtualization is any approach to data management that allows an application to retrieve and
manipulate data without requiring technical details about the data, such as how it is formatted at
source, or where it is physically located, and can provide a single customer view of the overall data.”
Unlike the traditional extract, transform, load ("ETL") process, the data remains in place, and real-time
access is given to the source system for the data.
Data virtualization is a real-time, agile data integration methodology that provides a logical view of
all enterprise data without replicating it into a physical repository, which costs
time, money, and resources. It has been around for more than a decade and has matured over the
years into an enterprise-wide capability. A Forrester report notes that “…many implementations have moved from
single-use case deployments to more enterprise-wide strategies supporting multiple use cases….”
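To make the "data remains in place" idea concrete, here is a minimal sketch in Python. It assumes two hypothetical source systems, simulated as separate in-memory SQLite databases, and a hypothetical function `customer_totals` that plays the role of the virtual view. None of these names come from the presentation, and this is not how any particular product implements virtualization.

```python
import sqlite3

# Two independent "source systems", simulated as separate in-memory databases.
# In practice these would be different servers, formats, or locations.
crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
crm.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Acme"), (2, "Globex")])

billing = sqlite3.connect(":memory:")
billing.execute("CREATE TABLE invoices (customer_id INTEGER, amount REAL)")
billing.executemany(
    "INSERT INTO invoices VALUES (?, ?)", [(1, 100.0), (1, 50.0), (2, 75.0)]
)


def customer_totals():
    """Hypothetical virtual view: reads both sources at query time, stores nothing."""
    totals = {}
    for customer_id, amount in billing.execute(
        "SELECT customer_id, amount FROM invoices"
    ):
        totals[customer_id] = totals.get(customer_id, 0.0) + amount
    return [
        (name, totals.get(cid, 0.0))
        for cid, name in crm.execute("SELECT id, name FROM customers ORDER BY id")
    ]


print(customer_totals())

# A change in a source system is visible on the next query, with no reload step.
billing.execute("INSERT INTO invoices VALUES (2, 25.0)")
print(customer_totals())
```

Because nothing is copied, every call pays the cost of reading the sources, which is the trade-off behind the response-time drawback listed later in the deck.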
7. Benefits of Data Virtualization
Data Virtualization:
• Reduces the risk of data errors
• Reduces the need for workloads that move data that may never be used
• Reduces system workloads
• Enhances performance and speed of access to data on a real-time basis
• Significantly reduces development and support time
• Increases governance
• Reduces storage costs
• Does not attempt to impose a single data model on the data
• Allows for the integration of data from multiple disparate sources, locations and formats, without the need for data replication or complete ETL/ELT
• Allows creation of a single “virtual” data layer
8. Capabilities of Data Virtualization
Data Virtualization software may
provide many of the following
capabilities:
• Abstraction
• Virtualized Data Access
• Transformation
• Data Federation
• Data Delivery
9. Potential Drawbacks of Data Virtualization
Data Virtualization has potential
drawbacks:
• May impact Operational Systems
response times
• Does not impose a homogeneous
data model
• Requires a defined governance
model to avoid budgeting issues
with shared services
• Not suitable for recording historic
snapshots of data for rolling
reporting (EDW)
• Change management overhead can be
significant, as all stakeholders need to
agree to changes
10. Not Always the Best Option
Data Virtualization is not the
“be-all and end-all” and should not be
used in certain cases:
• Operational systems or data stores where response times are critical success factors
• When a homogeneous data model is required
• Use cases that require building historical data snapshots
• When there is a need for significant data transformation or cleansing
11. Business Use Cases
Common use cases include:
• Virtual data warehouses
• Virtual data lakes
• Prototyping for physical integration and defining the requirements and architecture
• Vendor-agnostic analytics data access and semantic layer
• Developing a logical data warehouse architecture
• Agile data preparation
• Virtual operational data store for single application data
• Registry-style master data management
• Legacy system migration
https://simplicable.com/new/data-virtualization-vs-data-federation
12. Data Federation
Data federation is described by many as a
“type of data virtualization”, but with
subtle differences.
Data federation is typically a term used for
techniques that resemble virtual
databases, with strict data models.
Data virtualization is a term typically
used to describe a service that does not
impose a strict data model, while providing
a single pane of glass to the data.
https://simplicable.com/new/data-virtualization-vs-data-federation
13. Data Federation vs. Virtualization
Data Federation:
• Virtual database(s)
• Provides a unified data model
• Accessing distributed data with
different data models
• Does impose a data model
Data Virtualization:
• A single interface or layer
• Accessing distributed data with
different data models
• Does not require a strict data model
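One way to picture the distinction above is the following Python sketch. It is only an illustration under stated assumptions: the two sources, the field names, and the functions `virtual_read` and `federated_customers` are all hypothetical, and real products blur this line considerably.

```python
# Two hypothetical sources whose rows have different native shapes.
SOURCES = {
    "crm": [{"cust_name": "Acme", "region": "US"}],
    "erp": [{"customer": "Globex", "credit_limit": 5000}],
}


def virtual_read(source):
    """Virtualization-style access: one interface, rows keep their native shape."""
    return [dict(row) for row in SOURCES[source]]


# Federation-style access: every source is mapped onto one strict schema.
# This mapping is the "imposed data model"; it must be maintained per source.
NAME_FIELD = {"crm": "cust_name", "erp": "customer"}


def federated_customers():
    """Return all customers in a single unified shape: source + customer_name."""
    return [
        {"source": source, "customer_name": row[NAME_FIELD[source]]}
        for source in sorted(SOURCES)
        for row in SOURCES[source]
    ]


print(virtual_read("erp"))
print(federated_customers())
```

In the federated function, fields that do not fit the unified schema (`region`, `credit_limit`) are dropped, while the virtual reader passes them through untouched.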
14. The Microsoft Story
• Arguably one of the most eagerly anticipated new features of Microsoft SQL Server in the new release of SQL Server 2019 is data virtualization
• SQL Server 2016 added PolyBase, which provides some limited data virtualization capabilities against data stored in Hadoop, Azure Blob Storage and Azure Data Lake
• In 2019, this functionality has been expanded to include SQL Server, Oracle, Teradata and MongoDB
• Data virtualization in SQL Server 2019 is accomplished using some significant enhancements made to PolyBase, and the use of an external table
• For more information: https://docs.microsoft.com/en-us/sql/relational-databases/polybase/polybase-guide?view=sql-server-ver15
15. The Microsoft Story
• PolyBase is used to connect to numerous data sources and file formats
• In addition to PolyBase, the other feature set related to data virtualization, which allows for the combination of large volumes of relational and non-relational data, is Big Data Clusters
• SQL Server 2019 Big Data Clusters with the enhancements to PolyBase act as a virtual data layer to integrate structured and unstructured data from across the entire data estate (SQL Server, Azure SQL Database, Azure SQL Data Warehouse, Azure Cosmos DB, MySQL, PostgreSQL, MongoDB, Oracle, Teradata, HDFS, Blob Storage, Azure Data Lake Store) using familiar programming frameworks and data analysis tools (James Serra)
• You can virtualize the data in a SQL Server instance so that it can be queried there like any other table in SQL Server
21. Install & Configure
Ensure the SQL Server PolyBase Data Movement service and the SQL Server PolyBase
Engine service are both enabled and running (SQL Server Configuration Manager)
22. Install & Configure
Enable TCP/IP in Protocols for MSSQLSERVER
under SQL Server Configuration Manager (if not already enabled)
Restart the SQL Server service (only if TCP/IP was disabled and you have just enabled it)
24. Install & Configure
Launch Azure Data Studio
Connect to your SQL Server instance
Configure the PolyBase services
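The slide does not show the commands used in this step. As a hedged sketch, the statements below are the documented T-SQL for checking that PolyBase is installed and for enabling it; they can be pasted into an Azure Data Studio query window. They are held in a Python list only so they can be printed or run from a script, and `run_setup` is a hypothetical helper that assumes a DB-API style cursor from whichever SQL Server driver you use.

```python
# T-SQL statements commonly used to check for and enable PolyBase.
POLYBASE_SETUP = [
    # Returns 1 when the PolyBase feature is installed on the instance.
    "SELECT SERVERPROPERTY('IsPolyBaseInstalled') AS IsPolyBaseInstalled;",
    # Turn the feature on, then apply the configuration change.
    "EXEC sp_configure @configname = 'polybase enabled', @configvalue = 1;",
    "RECONFIGURE;",
]


def run_setup(cursor):
    """Execute each statement in order on a DB-API style cursor (hypothetical helper)."""
    for statement in POLYBASE_SETUP:
        cursor.execute(statement)


# Print the script so it can be copied into a query window.
print("\n".join(POLYBASE_SETUP))
```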
25. Install & Configure
Launch Azure Data Studio
Go to extensions
Install the external data wizards
(Data Virtualization extension)
26. Create an External Table
Two methods, both using Azure Data Studio:
• Manual creation using T-SQL commands
• The “Create External Table Wizard”
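For the manual T-SQL route, the sketch below shows the general shape of the statements involved when the remote source is another SQL Server instance. The Python function `external_table_script` is a hypothetical helper that only renders the T-SQL text. Every object name in the example call (`RemoteSales`, `sales-host`, `RemoteSalesCredential`, `dbo.ext_customer`, `SalesDb.dbo.customer`) is a placeholder, not something from the presentation. It assumes a database master key and a database scoped credential already exist, and it does no escaping, so it should only be fed trusted values.

```python
def external_table_script(data_source, host, credential, table, columns, remote_location):
    """Render T-SQL that exposes a remote SQL Server table as an external table.

    Assumes a master key and the named database scoped credential were created
    beforehand (CREATE MASTER KEY / CREATE DATABASE SCOPED CREDENTIAL).
    """
    column_list = ",\n    ".join(f"{name} {sql_type}" for name, sql_type in columns)
    return [
        # 1. Describe where the remote data lives and how to authenticate.
        f"CREATE EXTERNAL DATA SOURCE {data_source}\n"
        f"WITH (LOCATION = 'sqlserver://{host}', CREDENTIAL = {credential});",
        # 2. Declare the local table definition; the rows stay on the remote server.
        f"CREATE EXTERNAL TABLE {table} (\n    {column_list}\n)\n"
        f"WITH (LOCATION = '{remote_location}', DATA_SOURCE = {data_source});",
        # 3. Query it like any other table.
        f"SELECT TOP (10) * FROM {table};",
    ]


statements = external_table_script(
    data_source="RemoteSales",           # placeholder name
    host="sales-host",                   # placeholder host
    credential="RemoteSalesCredential",  # placeholder credential
    table="dbo.ext_customer",
    columns=[("customer_id", "INT NOT NULL"), ("name", "NVARCHAR(100)")],
    remote_location="SalesDb.dbo.customer",
)
print("\n\n".join(statements))
```

The column definitions in the external table need to be compatible with the remote table, which is one reason the wizard can be the easier starting point.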