This session shows, via live demonstration, how to use Integration Services, Data Quality Services and Master Data Services to create a closed-loop information management solution that cleans, standardizes, merges and purges data, all with the new data curation tools of SQL Server 2012. The session will also cover principles and best practices for each of the technologies used.
SQLSaturday #188 - Enterprise Information Management
1. Closed Loop in Enterprise Information Management
Oliver Engels & Tillmann Eitelberg
2. Who we are
Oliver:
CEO of oh22data AG, German MS Gold Partner
SQL MVP, Microsoft vTSP
Tillmann:
CTO of oh22information services GmbH
Both:
PASS Germany Board Members
Regional Mentors for Germany
SQL Information Services Advisory Board Members
Data Quality Maniacs
4. What Are Your Professional Development Goals?
I want to take the path from DBA to Data Analytics Guru
I want to upgrade my skills
I want to give my career a competitive edge
I want to expand my network in the business analytics industry
Sound familiar? Get a head start and join us today at:
www.passbaconference.com
#passbac
Enjoy $150 off registration: use code CHM2D
5. Upcoming SQL Server events:
XXXIII SQLPort Community Meeting
Event date: 23 April 2013 - 18:30
Venue: Microsoft Auditorium, Parque das Nações, Lisbon
18:30 - Opening and reception.
19:10 - "Analyzing Twitter Data" - Niko Neugebauer (SQL Server MVP, Community Evangelist, PASS)
20:15 - Coffee break
20:30 - "First Approach to SQL Server Analysis Services" - João Fialho (Independent BI Consultant)
21:30 - Prize draw
XXXIV SQLPort Community Meeting
Event date: 7 May 2013 - 19:00
Venue: Porto
18:30 - Opening and reception.
19:00 - «Presentation for Developers» - to be defined
20:15 - Coffee break
20:30 - «Presentation to be defined» - to be defined
21:30 - Prize draw
6. Volunteers:
They spend their FREE time to give you this
event. (2 months per person)
Because they are crazy.
Because they want YOU
to learn from the BEST IN THE WORLD.
If you see a guy with “STAFF” on their back –
buy them a beer, they deserve it.
12. Takeaways
EIM. SQL IS. Data Curation... what? Some explanations
Understanding of the building blocks of EIM in the Microsoft BI Stack: SSIS, DQS, MDS
Closed loop: Bring 'em all together
What's possible
If time allows: First impressions of self-service ETL: Data Explorer Preview
15. Definition:
EIM: Enterprise Information Management
Wiki:
Enterprise information management combines business intelligence (BI) and enterprise content management (ECM)
Where BI and ECM respectively manage structured and unstructured information, EIM does not make this "technical" distinction.
It approaches the management of information from the perspective of enterprise information strategy, based on the needs of information workers.
16. Definition: SQL Information Services
SQL Information Services charter:
Enrich enterprise data with the world's data
Empower developers to build new services and applications
Connecting with the world's data to turn data into action
Vibrant marketplace ecosystem for the world's data
17. IT Pro: Surface all information as a service to the organization, while maintaining the right level of control
Knowledge Worker: Enable any user to find reliable, trusted information needed to do their job
Developer: Immediate access to the data and services they need to build new services and applications
Data Analyst: Democratize the broad adoption of advanced analytics to empower businesses
Activities: discover, secure, create, govern, clean, curate, publish, operationalize, recommend, transform, analyze
18. SQL Information Services Portfolio
Building the tools for Enterprise Information Management
Integration Services
BizTalk
Master Data Services
Data Quality Services
Data Explorer
Big Data
Azure Data Market
StreamInsight
Other IS Tools
19. Data curation
Data curation components for EIM
Data Quality Services: Cleanse
Master Data Services: Manage
SSIS/BizTalk: Integrate
20. PoC: Role definition
Discover and Access Data and Services
Mash, Improve Quality, Enrich and Analyze
Share and Collaborate
Information Worker: Simplified, trusted consumption of data
Data Steward: Data Management
IT Professional: Service Management
Provision, Deploy, Maintain SLA
Publish: Add data sources to source catalog
Investigate: Identify data usage, artifacts and data relations issues
Monitor usage
Govern: Assess, configure and oversee
Respond to incidents
Manage Assets: Usage and Policy
Improve Quality of Data and Metadata: Cleanse, Enrich, Curate
Build the plumbing, connect the assets to the service
23. DQS: Data Quality Services
Main driver for data quality: Costs!
Data quality cost
Costs because of bad data quality: Direct, Indirect
Cost of optimizing data quality: Prevention, Discovery, Cleansing
24. DQS: Data Quality Services
Microsoft's DQM approach:
Data Quality Services (DQS) is a knowledge-driven data quality solution enabling data stewards to easily improve the quality of their data
Easy = Information Worker Driven
Knowledge driven = Capturing knowledge of good and bad data in a knowledge base
25. DQS: Data Quality Services
Domain concept
A domain (e.g. Street) has
Domain values
(List of correct and incorrect values)
Reference data
(External data references, e.g. D&B)
Rules
(Checking whether data is valid or invalid)
Term-based Relations
(Replace abbreviations)
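The domain concept above can be illustrated with a small sketch. This is a toy model in plain Python, not the DQS object model or API: the class `Domain`, its fields and the `cleanse` method are hypothetical names chosen for illustration, and reference data is left out. It only shows how domain values, rules and term-based relations could interact for one value.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple


@dataclass
class Domain:
    """Toy stand-in for a knowledge-base domain (hypothetical, not the DQS API)."""
    name: str
    # Domain values: known incorrect value -> correct value
    corrections: Dict[str, str] = field(default_factory=dict)
    # Domain values that are flagged as invalid
    invalid_values: Set[str] = field(default_factory=set)
    # Term-based relations: abbreviation -> full term
    term_relations: Dict[str, str] = field(default_factory=dict)
    # Rules: each returns True when the value is valid
    rules: List[Callable[[str], bool]] = field(default_factory=list)

    def cleanse(self, value: str) -> Tuple[str, str]:
        """Return (cleansed value, status); status is correct, corrected or invalid."""
        original = value.strip()
        # Apply term-based relations token by token
        expanded = " ".join(self.term_relations.get(tok, tok) for tok in original.split())
        if expanded in self.invalid_values:
            return expanded, "invalid"
        if expanded in self.corrections:
            return self.corrections[expanded], "corrected"
        if not all(rule(expanded) for rule in self.rules):
            return expanded, "invalid"
        return expanded, "corrected" if expanded != original else "correct"


# Example domain "Street" with made-up knowledge
street = Domain(
    name="Street",
    corrections={"Main Stret": "Main Street"},
    invalid_values={"N/A"},
    term_relations={"St.": "Street", "Rd.": "Road"},
    rules=[lambda v: any(ch.isalpha() for ch in v)],  # must contain a letter
)

for raw in ["Main St.", "Main Stret", "N/A", "Elm Road"]:
    print(raw, "->", street.cleanse(raw))
```

In this sketch the order of checks (term relations, then domain values, then rules) is an assumption made for readability; the real product decides this internally.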
27. DQS: Data Quality Services
Domain values
List of values
By Excel Import
By knowledge discovery
By hand
Correction values
Invalid values
29. DQS: Data Quality Services
Reference data (RDS)
External cloud or on-premises data streams with enrichment functions
30. DQS: Data Quality Services
Reference data (RDS)
DQS delivers the address; the RDS service delivers the correction or the geocode
DQS delivers the name and address; the RDS service delivers the new address if the person has moved
All kinds of services available
Exchange rates, Translations, Geocoding, Gender definition
32. DQS: Data Quality Services
Matching
Second functionality in DQS: detection of redundant data. After cleansing, values are standardized and suitable for comparison
No simple comparison! Comparison is handled by complex fuzzy algorithms, based on matching policies that the data steward tests and sets up
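To make the idea of a matching policy more concrete, here is a minimal sketch. It does not reproduce the DQS matching algorithms: Python's standard `difflib.SequenceMatcher` is swapped in as a simple similarity measure, and the policy weights, the threshold and the sample records are all assumptions made for illustration.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Tuple

# Hypothetical matching policy: weight per field (weights sum to 1.0)
POLICY: Dict[str, float] = {"name": 0.6, "city": 0.4}
MIN_SCORE = 0.8  # hypothetical minimum matching score


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_score(rec_a: Dict[str, str], rec_b: Dict[str, str]) -> float:
    """Weighted sum of the per-field similarities defined in the policy."""
    return sum(w * similarity(rec_a[f], rec_b[f]) for f, w in POLICY.items())


def find_matches(records: List[Dict[str, str]]) -> List[Tuple[int, int, float]]:
    """Compare every pair of records and keep those scoring at least MIN_SCORE."""
    pairs = []
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            score = match_score(records[i], records[j])
            if score >= MIN_SCORE:
                pairs.append((i, j, score))
    return pairs


suppliers = [
    {"name": "Contoso Ltd", "city": "Lisbon"},
    {"name": "Contoso Ltd.", "city": "Lisbon"},
    {"name": "Fabrikam Inc", "city": "Porto"},
]
matches = find_matches(suppliers)
for i, j, score in matches:
    print(f"records {i} and {j} look redundant (score {score:.2f})")
```

The pairwise loop is fine for a demo but grows quadratically; tuning weights and threshold against sample data is the part that corresponds to the data steward testing a policy.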
34. DQS: Data Quality Services
[Diagram: DQS process overview; labels recovered from the slide]
Uncleaned data
Cleansing: Standardize, structure and enrich
Cleaned data
Matching: Discover redundancy
Classification: Classified data
Knowledge Base (KB): Discovery, Domain Values, Reference Data (Azure), Rules, Term-based Relations, Policy
Monitoring: Profiling & Notifications
37. MDS: Master Data Services
Problem in EIM
Heterogeneous system environment with several line-of-business [LOB] applications that produce and consume data for the same business entities
Core entities
Customer
Product
Chart of accounts etc.
Operational and Analytical Problem:
39. MDS: Master Data Services
Operational MDM
LOBs write to and read from MDM to achieve a single point of truth
MDM enforces the single point of truth [SPOT] through rules, security, versioning
LOB systems provide and consume the SPOT of an entity and the related attributes
Open interfaces for data exchange
All through an LOB-independent UI
40. MDS: Master Data Services
Analytical MDM
Instead of loading the data from different LOBs into the DWH landing area and standardizing it in the staging area, the MDM solution acts as the gatekeeper
The gatekeeper function of MDM is achieved through rules, standardized hierarchies, versioning, approval workflows, dimension modeling (SCD etc.)
All through an LOB-independent UI
48. MDS: Master Data Services
Business Rules:
Allow data owners to validate data without writing T-SQL
Compiled into stored procedures
Use IF..THEN structures
Can use AND and OR logical operators to create complex rules up to 7 levels
Rules using the OR logical operator can be broken down into simpler rules
Applied to the attribute values of members to validate them
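The IF..THEN structure, and the point that an OR rule can be broken down into simpler rules, can be sketched as follows. This is an illustration in plain Python, not how MDS stores or compiles rules; `BusinessRule`, `validate_member` and the supplier attributes are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Member = Dict[str, str]


@dataclass
class BusinessRule:
    """IF condition THEN check (hypothetical sketch, not the MDS rule format)."""
    name: str
    condition: Callable[[Member], bool]  # IF part
    check: Callable[[Member], bool]      # THEN part, must hold when the IF part is true

    def is_satisfied(self, member: Member) -> bool:
        # A rule only fails when its condition applies and its check does not hold
        return (not self.condition(member)) or self.check(member)


def validate_member(member: Member, rules: List[BusinessRule]) -> List[str]:
    """Return the names of all rules the member violates."""
    return [rule.name for rule in rules if not rule.is_satisfied(member)]


def has_postal_code(member: Member) -> bool:
    return bool(member.get("postal_code", "").strip())


# One compound rule using OR ...
compound = [
    BusinessRule(
        "DE or AT requires postal code",
        lambda m: m["country"] == "DE" or m["country"] == "AT",
        has_postal_code,
    )
]

# ... and the same logic broken down into two simpler rules
simple = [
    BusinessRule("DE requires postal code", lambda m: m["country"] == "DE", has_postal_code),
    BusinessRule("AT requires postal code", lambda m: m["country"] == "AT", has_postal_code),
]

bad_supplier = {"name": "Contoso", "country": "AT", "postal_code": ""}
good_supplier = {"name": "Fabrikam", "country": "PT", "postal_code": ""}

print(validate_member(bad_supplier, compound))
print(validate_member(bad_supplier, simple))
print(validate_member(good_supplier, compound))
```

Both rule sets reject exactly the same members; the simpler rules just report more precisely which case was violated.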
49. MDS: Master Data Services
Business rules accommodate various
requirements
Connecting data sources and set overrides
Multi-level processes
Workflow and approval – internal (Master Data
Services) and external (Service Broker > SharePoint)
Multiple or compound business rules provide for more
complex requirements
Logical operators (AND / OR)
Control priority of activation
Enable/disable rules
50. MDS: Master Data Services
Role-based user access for master data stewards
[Architecture diagram; labels recovered from the slide]
Front ends: Excel Add-in, Silverlight UI, MDS App
Systems: LOB [1-n], DWH
Integration: SSIS, BizTalk, Stream
MDS DB: Stage Table, Subscription Views, SQL Views
52. EIM: Closed Loop
Combine MDS and DQS Functionalities
Use Integration Services to build a closed
loop workflow:
DQS Knowledge base for cleaning
MDS Model for standardization and audit
SSIS for data import, control flow and export
53. EIM Closed Loop
Demo case:
Sample available as a download from Microsoft for everybody to play with:
http://www.microsoft.com/en-us/download/details.aspx?id=35462
Today using the new SSDT 2012
54. EIM Closed Loop
Business case:
Supplier data list from an external source
Needs to be checked for new suppliers
New data needs to be checked against the data quality standards set up by the Data Steward
Correct/corrected data needs to be validated against Master Data Management to apply business rules and add new data to the master
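The business case can be walked through with a small sketch. It is plain Python standing in for what the demo does with SSIS, DQS and MDS; the functions `cleanse`, `validate` and `process_supplier_list`, the record layout and the sample data are all hypothetical.

```python
from typing import Dict, List, Tuple

Record = Dict[str, object]


def cleanse(record: Record) -> Record:
    """Stand-in for the data quality step: normalize whitespace and casing of the name."""
    name = " ".join(str(record["name"]).split()).title()
    return {**record, "name": name}


def validate(record: Record) -> List[str]:
    """Stand-in for master data business rules: return a list of violations."""
    errors = []
    if not record["name"]:
        errors.append("name is required")
    if len(str(record["country"])) != 2:
        errors.append("country must be a 2-letter code")
    return errors


def process_supplier_list(
    incoming: List[Record], master: Dict[int, Record]
) -> Tuple[List[Record], List[Tuple[Record, List[str]]]]:
    """Add new, clean, valid suppliers to the master; collect the rest for review."""
    added, rejected = [], []
    for record in incoming:
        if record["id"] in master:
            continue  # supplier already known, nothing to do
        cleaned = cleanse(record)
        errors = validate(cleaned)
        if errors:
            rejected.append((cleaned, errors))
        else:
            master[cleaned["id"]] = cleaned
            added.append(cleaned)
    return added, rejected


master = {1: {"id": 1, "name": "Contoso Ltd", "country": "PT"}}
incoming = [
    {"id": 1, "name": "Contoso Ltd", "country": "PT"},
    {"id": 2, "name": "  fabrikam   inc ", "country": "DE"},
    {"id": 3, "name": "", "country": "DE"},
]
added, rejected = process_supplier_list(incoming, master)
print("added:", added)
print("rejected:", rejected)
```

The loop closes when rejected records are reviewed by a data steward and fed back in, which this sketch leaves out.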
56. EIM Closed Loop
Version 2 (Advanced version)
[Data flow diagram; labels recovered from the slide]
Source
Lookup MDS via ID (Match / No Match)
Cleaning with DQS KB
Split (Correct / Corrected / New)
Lookup corrected MDS (Match: Yes / No)
Split (>= Confidence / < Confidence)
Review by DQS Data Steward
Union for DQS
Review by MDS Data Steward
Union for MDS
Union data stream
Stage
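One possible reading of the split steps in the diagram (Correct / Corrected / New and >= Confidence / < Confidence) is sketched below. The routing logic is an interpretation of the slide labels, not the demo package itself, and the threshold value, field names and function names are hypothetical.

```python
from typing import Dict, List

CONFIDENCE_THRESHOLD = 0.8  # hypothetical value, set by the data steward in practice

Record = Dict[str, object]


def route(record: Record) -> str:
    """Decide where a cleansed record goes next.

    Expects 'status' (correct, corrected or new) and 'confidence' (0.0 to 1.0),
    as a cleansing step might report them.
    """
    if record["status"] == "correct":
        return "mds"
    if record["status"] == "corrected" and record["confidence"] >= CONFIDENCE_THRESHOLD:
        return "mds"
    # New values and low-confidence corrections need a human decision
    return "dqs_review"


def split(records: List[Record]) -> Dict[str, List[Record]]:
    """Split a data stream into one output per route, like a conditional split."""
    outputs: Dict[str, List[Record]] = {"mds": [], "dqs_review": []}
    for record in records:
        outputs[route(record)].append(record)
    return outputs


stream = [
    {"id": 1, "status": "correct", "confidence": 1.0},
    {"id": 2, "status": "corrected", "confidence": 0.92},
    {"id": 3, "status": "corrected", "confidence": 0.55},
    {"id": 4, "status": "new", "confidence": 0.0},
]
outputs = split(stream)
print("to MDS:", [r["id"] for r in outputs["mds"]])
print("to DQS data steward review:", [r["id"] for r in outputs["dqs_review"]])
```

Records approved in review would be unioned back into the stream towards MDS, which is the closed-loop part of the design.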