SQLSaturday #188 - Enterprise Information Management

This session demonstrates live how to use Integration Services, Data Quality Services, and Master Data Services to build a closed-loop information management solution that cleans, standardizes, merges, and purges data, all with the new data curation tools of SQL Server 2012. The session also covers principles and best practices for each of the technologies used.

  1. 1. Closed Loop in Enterprise Information Management Oliver Engels & Tillmann Eitelberg
  2. 2. Who we are. Oliver: CEO of oh22data AG (German MS Gold Partner), SQL MVP, Microsoft vTSP. Tillmann: CTO of oh22information services GmbH. Both: PASS Germany Board Members, Regional Mentors for Germany, SQL Information Services Advisory Board Members, Data Quality Maniacs.
  3. 3. Our Sponsors:
  4. 4. What Are Your Professional Development Goals? "I want to take the path from DBA to Data Analytics Guru." "I want to upgrade my skills." "I want to give my career a competitive edge." "I want to expand my network in the business analytics industry." Sound familiar? Get a head start and join us today at: #passbac. Enjoy $150 off registration: use code CHM2D
  5. 5. Upcoming SQL Server events: XXXIII Encontro da Comunidade SQLPort (33rd SQLPort Community Meeting). Date: 23 April 2013, 18:30. Venue: Auditório Microsoft, Parque das Nações, Lisboa. 18:30 - Opening and reception; 19:10 - "Analyzing Twitter Data" - Niko Neugebauer (SQL Server MVP, Community Evangelist – PASS); 20:15 - Coffee break; 20:30 - "First Approach to SQL Server Analysis Services" - João Fialho (Independent BI Consultant); 21:30 - Prize draw. XXXIV Encontro da Comunidade SQLPort (34th SQLPort Community Meeting). Date: 7 May 2013, 19:00. Venue: Porto. 18:30 - Opening and reception; 19:00 - "Presentation for Developers" - to be defined; 20:15 - Coffee break; 20:30 - "Presentation to be defined" - to be defined; 21:30 - Prize draw.
  6. 6. Volunteers:  They spend their FREE time to give you this event. (2 months per person)  Because they are crazy.  Because they want YOU to learn from the BEST IN THE WORLD.  If you see a guy with “STAFF” on their back – buy them a beer, they deserve it.
  7. 7. Paulo Matos:
  8. 8. Paulo Borges:
  9. 9. João Fialho:
  10. 10. Bruno Basto:
  11. 11. Niko Neugebauer:
  12. 12. Takeaways: EIM, SQL IS, data curation... what? Some explanations. An understanding of the building blocks of EIM in the Microsoft BI stack: SSIS, DQS, MDS. Closed loop: bring 'em all together and see what's possible. If time allows: first impressions of self-service ETL with the Data Explorer Preview.
  13. 13. Agenda: Definitions; Data Quality Services; Master Data Services; Closed loop: SSIS, DQS, MDS team up!
  14. 14. The situation
  15. 15. Definition: EIM (Enterprise Information Management). Wiki: Enterprise information management combines business intelligence (BI) and enterprise content management (ECM). Where BI and ECM respectively manage structured and unstructured information, EIM does not make this "technical" distinction. It approaches the management of information from the perspective of enterprise information strategy, based on the needs of information workers.
  16. 16. Definition: SQL Information Services. SQL Information Services charter: enrich enterprise data with the world's data; empower developers to build new services and applications; connect with the world's data to turn data into action; a vibrant marketplace ecosystem for the world's data.
  17. 17. IT Pro: surface all information as a service to the organization, while maintaining the right level of control. Knowledge Worker: enable any user to find reliable, trusted information needed to do their job. Developer: immediate access to the data and services they need to build new services and applications. Data Analyst: democratize the broad adoption of advanced analytics to empower businesses. Activities: discover, secure, create, govern, clean, curate, publish, operationalize, recommend, transform, analyze.
  18. 18. SQL Information Services Portfolio. Building the tools for Enterprise Information Management: Integration Services, BizTalk, Master Data Services, Data Quality Services, Data Explorer, Big Data, Azure Data Market, StreamInsight, other IS tools.
  19. 19. Data curation. Data curation components for EIM: Data Quality Services (cleanse), Master Data Services (manage), SSIS/BizTalk (integrate).
  20. 20. PoC: Role definition. Roles: Information Worker (simplified, trusted consumption of data), Data Steward (data management), IT Professional (service management). Activities shown: discover and access data and services; mash, improve quality, enrich and analyze; share and collaborate; provision, deploy, maintain SLA; publish (add data sources to source catalog); investigate; identify data usage, artifacts and data relations issues; monitor usage; govern (assess, configure and oversee); respond to incidents; manage assets, usage and policy; improve quality of data and metadata (cleanse, enrich, curate); build the plumbing, connect the assets to the service.
  21. 21. Enterprise Information Management. DQS: DATA QUALITY SERVICES
  22. 22. Data curation
  23. 23. DQS: Data Quality Services. Main driver for data quality: costs! Data quality costs split into costs caused by bad data quality (direct, indirect) and costs of optimizing data quality (prevention, discovery, cleansing).
  24. 24. DQS: Data Quality Services. Microsoft's DQM approach: Data Quality Services (DQS) is a knowledge-driven data quality solution enabling data stewards to easily improve the quality of their data. Easy = information-worker driven. Knowledge-driven = capturing knowledge of good and bad data in a knowledge base.
  25. 25. DQS: Data Quality Services. Domain concept: a domain (e.g. Street) has domain values (a list of correct and incorrect values), reference data (external data references, e.g. D&B), rules (checks whether data is valid or invalid), and term-based relations (change abbreviations).
  26. 26. Data Quality Services DEMO
  27. 27. DQS: Data Quality Services. Domain values: list of values (by Excel import, by knowledge discovery, by hand); correction values; invalid values.
  28. 28. DQS: Data Quality Services. Domain rules: regular expressions, logical expressions, matching values, term-based relations.
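
The domain concept on the slides above (domain values with corrections, a domain rule, term-based relations) can be pictured with a small Python sketch. This is not the DQS API: the `Domain` class, its field names, and the sample street data are hypothetical and only mirror the idea of a knowledge base entry that labels each value as correct, corrected, or invalid.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Domain:
    """Hypothetical stand-in for a DQS domain; not the DQS API."""
    name: str
    corrections: dict = field(default_factory=dict)     # known wrong value -> correct value
    term_relations: dict = field(default_factory=dict)  # term -> replacement term
    rule: str = r".+"                                    # regex the cleaned value must match

    def cleanse(self, raw):
        """Return (value, status); status is 'correct', 'corrected', or 'invalid'."""
        value = raw.strip()
        # Term-based relations: replace whole terms such as abbreviations.
        value = " ".join(self.term_relations.get(t, t) for t in value.split())
        # Domain values: map a known incorrect value to its correction.
        value = self.corrections.get(value, value)
        # Domain rule: reject values that fail the regular expression.
        if not re.fullmatch(self.rule, value):
            return value, "invalid"
        return value, "correct" if value == raw else "corrected"


# Sample domain with made-up values, for illustration only.
street = Domain(
    name="Street",
    corrections={"Main Stret": "Main Street"},
    term_relations={"St.": "Street", "Rd.": "Road"},
    rule=r"[A-Za-z ]+ (Street|Road)",
)
```

Calling `street.cleanse("Main St.")` expands the abbreviation and reports the value as corrected, while a value such as `"12345"` fails the rule and is reported as invalid.
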
  29. 29. DQS: Data Quality Services. Reference data (RDS): external cloud or on-premises data streams with enrichment functions.
  30. 30. DQS: Data Quality Services. Reference data (RDS): DQS delivers the address, and the RDS service delivers the correction or the geocode. DQS delivers the name and address, and the RDS service delivers the new address if the person has moved. All kinds of services are available: exchange rates, translations, geocoding, gender determination.
  31. 31. DQS: Data Quality Services. Knowledge base (KB). [Diagram: a knowledge base built by knowledge discovery, containing Domain 1 to Domain 9 and Composite Domains 1 to 3, together with reference data, domain values, rules, term-based relations, and a matching policy.]
  32. 32. DQS: Data Quality Services. Matching: the second function of DQS, the detection of redundant data. After cleansing, values are standardized and suitable for comparison. This is no simple comparison: it is handled by complex fuzzy algorithms driven by matching policies that the data steward tests and sets up.
  33. 33. DQS: Data Quality Services  Matching policies:
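
A matching policy of the kind described on the matching slides can be sketched as weighted fuzzy comparison of fields. The sketch below is an illustration only: it uses Python's standard `difflib.SequenceMatcher` as a stand-in for the DQS matching algorithms, and the field names, weights, and the 0.8 minimum score are assumptions.

```python
from difflib import SequenceMatcher

# Hypothetical matching policy: per-field weights (summing to 1.0) and a minimum score.
POLICY = {"weights": {"name": 0.6, "city": 0.4}, "min_score": 0.8}


def similarity(a, b):
    """Fuzzy similarity in [0, 1] between two strings, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_score(rec_a, rec_b, policy=POLICY):
    """Weighted sum of the per-field similarities defined by the policy."""
    return sum(
        weight * similarity(rec_a[fld], rec_b[fld])
        for fld, weight in policy["weights"].items()
    )


def is_match(rec_a, rec_b, policy=POLICY):
    """True when the weighted score reaches the policy's minimum score."""
    return match_score(rec_a, rec_b, policy) >= policy["min_score"]
```

With this policy, `{"name": "Contoso Ltd", "city": "Lisbon"}` and `{"name": "Contoso Ltd.", "city": "Lisbon"}` count as a match, which is why standardizing values before matching matters: the closer the cleansed values, the more reliable the score.
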
  34. 34. DQS: Data Quality Services. [Process diagram: uncleaned data passes through cleansing (standardize, structure and enrich) and matching (discover redundancy), yielding cleaned and classified data. The Knowledge Base (KB) holds domain values, rules, term-based relations, reference data (Azure), and the matching policy, and is filled through discovery. Further labels: monitoring, classification, profiling & notifications.]
  35. 35. Enterprise Information Management. MDS: MASTER DATA SERVICES
  36. 36. Master Data Management. [Diagram: CRM, ERP, and WWW each keep their own Customer and Product data, and all feed the DWH.]
  37. 37. MDS: Master Data Services. Problem in EIM: a heterogeneous system environment with several line-of-business (LOB) applications that produce and consume data for identical business entities. Core entities: customer, product, chart of accounts, etc. This is both an operational and an analytical problem.
  38. 38. MDS: Master Data Services. [Diagram: CRM, ERP, and WWW, each with Customer and Product data, shown together with MDM [operational], MDM [analytical], and the DWH.]
  39. 39. MDS: Master Data Services. Operational MDM: LOBs write to and read from MDM to achieve a single point of truth. MDM enforces the single point of truth (SPOT) through rules, security, and versioning. LOB systems provide and consume the SPOT of an entity and its related attributes. Open interfaces for data exchange. All through an LOB-independent UI.
  40. 40. MDS: Master Data Services. Analytical MDM: instead of loading the data from different LOBs into the DWH landing area and standardizing it in the staging area, the MDM solution acts as the gatekeeper. The gatekeeper function is achieved through rules, standardized hierarchies, versioning, approval workflows, and dimension modeling (SCD etc.). All through an LOB-independent UI.
  41. 41. MDS: Master Data Services. [Diagram: CRM, ERP, and WWW around MDM [operational] with a single Customer and a single Product, and MDM [analytical] with the DWH.]
  42. 42. MDS: Master Data Services  Basic Model
  43. 43. Master Data Services DEMO
  44. 44. MDS: Master Data Services  Administration Screen
  45. 45. MDS: Master Data Services. Excel Add-in for information workers
  46. 46. MDS: Master Data Services. Central hierarchy management
  47. 47. MDS: Master Data Services  Collection
  48. 48. MDS: Master Data Services. Business rules: allow data owners to validate data without writing T-SQL; are compiled into stored procedures; use IF..THEN structures; can combine AND and OR logical operators to create complex rules up to 7 levels deep; rules using the OR operator can be broken down into simpler rules; are applied to attribute members to validate them.
  49. 49. MDS: Master Data Services. Business rules accommodate various requirements: connecting data sources and setting overrides; multi-level processes; workflow and approval, internal (Master Data Services) and external (Service Broker > SharePoint). Multiple or compound business rules provide for more complex requirements: logical operators (AND / OR); control of activation priority; enabling/disabling rules.
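
The IF..THEN structure, priority, and enable/disable switches of the business rules above can be illustrated with a short Python sketch. MDS itself compiles rules into stored procedures; the `BusinessRule` class, the `validate` helper, and the sample rules below are hypothetical and exist only to show the shape of such rules.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class BusinessRule:
    """Hypothetical IF..THEN rule over a member (a dict of attribute values)."""
    name: str
    condition: Callable[[dict], bool]  # IF part
    check: Callable[[dict], bool]      # THEN part: must hold when the condition is true
    priority: int = 10                 # lower numbers are evaluated first
    enabled: bool = True


def validate(member, rules):
    """Return the names of the enabled rules the member violates, in priority order."""
    active = sorted((r for r in rules if r.enabled), key=lambda r: r.priority)
    return [r.name for r in active if r.condition(member) and not r.check(member)]


# Made-up rules for illustration only.
RULES = [
    BusinessRule(
        name="EU suppliers need a VAT id",
        condition=lambda m: m.get("region") == "EU" and m.get("type") == "supplier",  # AND
        check=lambda m: bool(m.get("vat_id")),
        priority=1,
    ),
    BusinessRule(
        name="Name is required",
        condition=lambda m: True,
        check=lambda m: bool(m.get("name")),
        priority=2,
    ),
]
```

A member such as `{"name": "", "region": "EU", "type": "supplier"}` violates both rules, and the violations come back in priority order.
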
  50. 50. MDS: Master Data Services. Role-based user access for master data stewards. [Architecture diagram with the components: Stream, Excel Add-in, Silverlight UI, MDS App, LOB [1-n], DWH, LOB, SSIS, BizTalk, MDS DB, SQL views, stage table, subscription views.]
  51. 51. Enterprise Information Management CLOSED LOOP: TEAM UP
  52. 52. EIM: Closed Loop. Combine MDS and DQS functionality. Use Integration Services to build a closed-loop workflow: a DQS knowledge base for cleansing; an MDS model for standardization and audit; SSIS for data import, control flow, and export.
  53. 53. EIM Closed Loop  Demo case:  Sample available as download from MS for everybody to play with (  Today using new SSDT 2012 )
  54. 54. EIM Closed Loop. Business case: a supplier data list arrives from an external source. It needs to be checked for new suppliers. New data needs to be checked against the data quality standards set up by the data steward. Correct and corrected data needs to be validated against Master Data Management to apply business rules and add new data to the master.
  55. 55. EIM Closed Loop. Version 1 (simple version): Source > Cleaning with DQS KB; correct/corrected records > Fuzzy Grouping > Filter Duplicates > MDS Stage; incorrect records > Audit.
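
The simple version of the loop can be traced in a few lines of Python. This is a sketch of the data flow only, not an SSIS package: `cleanse` is a hypothetical stand-in for the DQS cleansing component, the fuzzy grouping uses `difflib.SequenceMatcher` with an assumed 0.9 threshold, and the returned lists stand in for the MDS staging table and the audit output.

```python
from difflib import SequenceMatcher


def cleanse(record):
    """Hypothetical stand-in for the DQS cleansing step; returns (record, status)."""
    name = " ".join(record.get("name", "").split())  # collapse stray whitespace
    if not name:
        return record, "incorrect"
    status = "correct" if name == record["name"] else "corrected"
    return {**record, "name": name}, status


def filter_duplicates(records, threshold=0.9):
    """Fuzzy grouping: keep the first record of every group of similar names."""
    kept = []
    for rec in records:
        is_dup = any(
            SequenceMatcher(None, rec["name"].lower(), k["name"].lower()).ratio() >= threshold
            for k in kept
        )
        if not is_dup:
            kept.append(rec)
    return kept


def run_version_1(source):
    """Return (stage, audit): rows for the MDS staging table and rejected rows."""
    good, audit = [], []
    for record in source:
        cleaned, status = cleanse(record)
        (audit if status == "incorrect" else good).append(cleaned)
    return filter_duplicates(good), audit
```

Feeding in a supplier list with a near-duplicate name and an empty name yields one staged row per distinct supplier and sends the empty row to audit.
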
  56. 56. EIM Closed Loop. Version 2 (advanced version). [Data flow diagram with the steps: Source; Cleaning with DQS KB; Split into Correct, Corrected, and New; Lookup MDS via ID; Lookup corrected MDS (Yes / No Match); Union data stream; Match split by confidence (>= Confidence / < Confidence); Union for MDS and Review by MDS Data Steward; Union for DQS and Review by DQS Data Steward; Stage.]
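
The confidence split in the advanced version can be sketched as a simple router. The slide only shows the ">= Confidence" and "< Confidence" branches; that high-confidence matches go to staging and the rest go to a data steward for review is an assumption here, as are the `confidence` key and the 0.8 threshold.

```python
def route_by_confidence(matches, threshold=0.8):
    """Split lookup matches into (stage, review) by their confidence score.

    Assumption: rows at or above the threshold are staged, the rest are reviewed.
    """
    stage, review = [], []
    for match in matches:
        target = stage if match["confidence"] >= threshold else review
        target.append(match)
    return stage, review
```

Keeping the threshold as a parameter mirrors the idea that the data steward, not the package developer, decides how much uncertainty is acceptable before a human looks at a record.
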
  57. 57. Thank you!