Get the Most Out of Your Tools: Data Management Technologies

Published in: Technology

Transcript

  • 1. Welcome! Get the Most out of Your Tools: Data Management Technologies. Date: November 13, 2012. Time: 2:00 PM ET. Presenter: Dr. Peter Aiken. Produced by: Data Blueprint, 10124-C W. Broad St, Glen Allen, VA 23060. Classification: Education. Date: 11/06/12. © Copyright this and previous years by Data Blueprint - all rights reserved!
  • 2. Get Social With Us! • Live Twitter Feed: Join the conversation! Follow us: @datablueprint, @paiken. Ask questions and submit your comments: #dataed • Like Us on Facebook: www.facebook.com/datablueprint. Post questions and comments. Find industry news, insightful content and event updates. • Join the Group: Data Management & Business Intelligence. Ask questions, gain insights and collaborate with fellow data management professionals.
  • 3. Meet Your Presenter: Dr. Peter Aiken • Internationally recognized thought-leader in the data management field – 30 years of experience • Recipient of multiple international awards • Founder, Data Blueprint http://datablueprint.com • 7 books and dozens of articles • Experienced w/ 500+ data management practices in 20 countries • Multi-year immersions with organizations as diverse as the US DoD, Deutsche Bank, Nokia, Wells Fargo, the Commonwealth of Virginia and Walmart
  • 4. Data Management Technologies
  • 5. Outline 1. Data Management Overview 2. Data Management Tools Overview 3. Data Technology Architecture 4. CASE Tools 5. Repositories 6. Profiling/Discovery Tools 7. Data Quality Engineering Tools 8. Data Life Cycle 9. Other Technologies 10. Q&A Tweeting now: #dataed
  • 6. The DAMA Guide to the Data Management Body of Knowledge. Published by DAMA International • The professional association for Data Managers (40 chapters worldwide). DMBoK organized around • Primary data management functions focused around data delivery to the organization • Organized around several environmental elements. [Figure: Data Management Functions]
  • 7. The DAMA Guide to the Data Management Body of Knowledge. Amazon: http://www.amazon.com/DAMA-Guide-Management-Knowledge-DAMA-DMBOK/dp/0977140083 Or enter the terms "dama dm bok" at the Amazon search engine. [Figure: Environmental Elements]
  • 8. What is the CDMP? • Certified Data Management Professional • DAMA International and ICCP • Membership in a distinct group made up of your fellow professionals • Recognition for your specialized knowledge in a choice of 17 specialty areas • Series of 3 exams • For more information, please visit: – http://www.dama.org/i4a/pages/index.cfm?pageid=3399 – http://iccp.org/certification/designations/cdmp #dataed
  • 9. Data Management
  • 10. Data Management • Data Program Coordination: Manage data coherently. • Organizational Data Integration: Share data across boundaries. • Data Stewardship: Assign responsibilities for data. • Data Development: Engineer data delivery systems. • Data Support Operations: Maintain data availability.
  • 11. Data Management
  • 13. Tools and Methods Are Required!
  • 14. Sample Existing Environment [Figure: organizational units (Marketing, Logistics, HR, Finance, Manufacturing, R&D #1, R&D #2, R&D #3, BackOffice Applications) and their data stores (systems, flat files, RDBMS 1, RDBMS 2, network database)]
  • 15. Reengineering is typically the problem solution… [Figure: reverse engineering works from existing as-is data implementation assets back to as-is data design assets and as-is information requirements assets; forward engineering proceeds from to-be requirements assets through to-be design assets to new to-be implementation assets]
  • 16. Bibiana Duets Example [Figure: Query Outputs]
  • 17. Data Management Technologies • Managing data technology should follow the same principles and standards for managing any technology • Leading reference model for technology management is the Information Technology Infrastructure Library (ITIL): http://www.itil-officialsite.com/home/home.asp from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
  • 18. Understanding Data Technology Requirements. Need to understand: • How the technology works • How it provides value in the context of a particular business • Requirements of a data technology before determining what technical solution to choose for a particular situation. Suggested questions: • What problem does this data technology mean to solve? • What sets this data technology apart from others? • Are there specific hardware/software/operating systems/storage/network/connectivity requirements? • Does this technology include data security functionality? from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
  • 20. Defining Data Technology Architecture • Data technology is part of the overall technology architecture • It is also often considered part of the enterprise’s data architecture • Data technology architecture addresses 3 questions: – What technologies are standard/required/preferred/acceptable? – Which technologies apply to which purposes and circumstances? – In a distributed environment, which technologies exist where, and how does data move from one node to another? from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
  • 21. Data Technology Architecture, cont’d. Data technologies to be included in the technology architecture: • Database management systems (DBMS) software • Related database management utilities • Data modeling and model management software • Business intelligence software for reporting and analysis • Extract-transform-load (ETL) and other data integration tools • Data quality analysis and data cleansing tools • Metadata management software, including metadata repositories from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
  • 22. Data Technology Architecture, cont’d • The technology roadmap for the organization consists of technology objectives as well as reviewed, approved, and published technology architecture components • This strategic roadmap can be used to inform and direct future data technology research and project work from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
  • 23. Polling Question #1: What is one important thing to understand about technology? a) It is sometimes free b) Buying the same technology that everyone else is using, and using it in the same way, will create business value c) It should always be regarded as the means to an end, rather than the end itself
  • 24. Data Technology Architecture, cont’d • It is important to understand several things about technology: – It is never free. Even open-sourced technology requires care and feeding. – It should always be regarded as the means to an end, rather than the end itself. – Most importantly: Buying the same technology that everyone else is using, and using it in the same way, does not create business value or competitive advantage. from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
  • 26. CASE Tools: Computer Aided Software/Systems Engineering Tools. Computer-aided software engineering (CASE) is: • The scientific application of a set of tools and methods to a software system which is meant to result in high-quality, defect-free, and maintainable software products • It also refers to methods for the development of information systems together with automated tools that can be used in the software development process • CASE functions include analysis, design, and programming. Source: http://en.wikipedia.org/wiki/
  • 27. CASE Tools: Example(s) • Microsoft – Visio – PowerPoint – Excel • ERwin • ER/Studio. List of CASE Tools: http://www.unl.csi.cuny.edu/faqs/software-enginering/tools.html
  • 28. Figure 18.2 Sample budget for implementing a $2500/seat CASE technology can be $2.5 million over a 5-year period [adapted from Huff "Elements of a Realistic CASE Tool Adoption Budget" © 1992 Communications of the ACM] • $187K = $2500/seat × 75 seats • $360K = training • $500K = workstations • $150K = assessment costs • $910K = total initial investment • $150K = in-house support • $55K = hardware and software maintenance • $60K = ongoing training and misc. • $265K = annual additional investment • × 5 years = $1325K investment over 5 years
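The budget arithmetic on this slide can be checked with a short sketch. It uses only the line items as transcribed, in thousands of dollars; the variable names are illustrative. Note that the four initial line items as transcribed add up to $1,197.5K rather than the $910K subtotal shown, and that adding the $1,325K five-year recurring cost to the line-item sum gives roughly the $2.5 million in the slide title. Which figure was intended on the original slide cannot be determined from the transcript.

```python
# All amounts are in thousands of dollars, copied from the slide's line items.
SEATS = 75
PRICE_PER_SEAT = 2.5  # $2500 per seat
YEARS = 5

initial_items = {
    "licenses": PRICE_PER_SEAT * SEATS,
    "training": 360,
    "workstations": 500,
    "assessment": 150,
}
annual_items = {
    "in_house_support": 150,
    "hardware_software_maintenance": 55,
    "ongoing_training_misc": 60,
}

initial_total = sum(initial_items.values())    # sum of the listed initial items
annual_total = sum(annual_items.values())      # recurring cost per year
recurring_total = annual_total * YEARS         # recurring cost over the period
grand_total = initial_total + recurring_total  # initial plus recurring

print(f"initial line items: ${initial_total:,.1f}K")
print(f"annual additional:  ${annual_total:,.1f}K")
print(f"over {YEARS} years:       ${recurring_total:,.1f}K")
print(f"grand total:        ${grand_total:,.1f}K")
```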
  • 29. CASE Tool: "Taxonomy" • Senders: flows from the CASE effort that can inform the re-architecting effort. • Receivers: flows from the project that can inform the CASE effort. • Senders and receivers: some elements, such as restructuring and reengineering, are both senders and receivers. [adapted from Jones Assessment and Control of Software © 1994 Prentice-Hall]
  • 30. CASE-based XML Support http://www.visible.com
  • 31. Changing Model of CASE Tool Usage. Before: • Everything must "fit" into one CASE technology • CASE tool-specific methods and technologies • Limited access from outside the CASE technology environment • Limited additional metadata use. After: • A variety of CASE-based methods and technologies can access and update the metadata • XML Integration • Additional metadata uses accessible via: web; portal; XML; RDBMS
  • 33. Repositories have been difficult to "sell". 21 September 1999, Michael Blechar, Lisa Wallace. Management Summary: Most executive and IS managers view an IT metadata repository as an esoteric technology that is not directly related to the business. However, as will be seen, an IT metadata repository can substantially help IS organizations support the applications, which in turn support the business. An IT metadata repository is a pre-built system and reference database where the IS organizations can track and manage the information about the applications and databases they build and maintain; think of it as the inventory and change impact reporting system for IS. These repositories track metadata such as the descriptions of jobs, programs, modules, screens, data and databases, and the interrelationships between them. Metadata differs from the actual data being described. Metadata is information about data. For example, the metadata descriptions in the repository tell one that the field "customer number" appears in Databases A, B and F ... [From gartner.com]
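The "inventory and change impact reporting" idea in the summary above can be sketched as a small dependency index. This is a minimal illustration, not a description of any repository product: the `ImpactIndex` class and the program and job names are hypothetical, and only the "customer number" field and Databases A, B and F come from the quoted text.

```python
from collections import defaultdict, deque


class ImpactIndex:
    """Tracks which items depend on which, so a change can be traced forward."""

    def __init__(self):
        # item -> set of items that directly use it
        self._users = defaultdict(set)

    def add_dependency(self, used, user):
        self._users[used].add(user)

    def impacted_by(self, item):
        """Return everything reachable from `item` through 'is used by' links."""
        seen = set()
        queue = deque([item])
        while queue:
            current = queue.popleft()
            for dependent in self._users.get(current, ()):
                if dependent not in seen:
                    seen.add(dependent)
                    queue.append(dependent)
        return seen


index = ImpactIndex()
# From the quoted example: the field appears in Databases A, B and F.
for db in ("A", "B", "F"):
    index.add_dependency("field:customer number", f"database:{db}")
# Hypothetical downstream items, added only to show a multi-step trace.
index.add_dependency("database:A", "program:billing")
index.add_dependency("program:billing", "job:nightly-invoice")

impacted = index.impacted_by("field:customer number")
print(sorted(impacted))
```

A breadth-first walk is used so that indirect dependents (a job that runs a program that reads a database) are reported along with the direct ones.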
  • 34. Repository Technologies in Use. What tools do you use? • Almost one in two organizations (45%) doesn't use repository technology • Almost one in four organizations (23%) is building their own repository technology • The "traditional" players (CA & Rochade) are in use in 16% of organizations surveyed. [Chart: None 45%; HomeGrown 23%; Other 13%; CA Platinum 9%; Rochade 7%; Universal Repository 2%; DesignBank 1%; DWGuide 1%; InfoManager 1%; Interface Metadata Tool 1%] Number Responding=181
  • 35. Repository Evolution. Traditional: § Passive Analysis § Relational & Data Warehouse § Batch & Reports § Optional not critical § Proprietary & OIM. Evolving: § Standards – investment protection: MOF § Openness, Simplification & Choice: XMI § Diverse metadata management (including messaging) § Real time and ad hoc for decision support § Daily business value within a production architecture
  • 36. Metadata Repositories 2004: "However, due to cost (these tools start at about $150,000, but frequently exceed $1 million) and being slow to market in terms of support for new service-oriented architectures (SOAs), CA and ASG have opened the door to smaller competitors"
  • 37. IBM's AD/Cycle Information Model • Application Build Model: Defines the tools, parameters and environment required to build an automated Business Application. • Applications Structure Model: Defines the overall scope of an automated Business Application, the components of the application and how they fit together. • Business Goals Model: Defines the mission of the enterprise, its long-range goals, and the business policies and assumptions that affect its operations. • Business Rules Model: Records rules that govern the operation of the business and the Business Events that trigger execution of Business Processes. • Data Structures Model: Defines the data structures and their elements used in an automated Business Application. • DB2 Model: Refines the definition of a Relational Database design to a DB2-specific design. • Derivations/Constraints Model: Records the rules for deriving legal values for instances of Entity-Relationship Model components, and for controlling the use or existence of E-R instance. • Enterprise Structure Model: Defines the scope of the enterprise to be modeled. Assigns a name to the model that serves to qualify each component of the model. • Entity-Relationship Model: Defines the Business Entities, their properties (attributes) and the relationships they have with other Business Entities. • Extension Support Model: Provides for tactical Information Model extensions to support special tool needs. • Flow Model: Specifies which of the Entity Relationship Model component instances are passed between Process Model components. • Global Text Model: Supports recording of extended descriptive text for many of the Information Model components. • IMS Structures Model: Defines the component structures and elements and the application program views of an IMS Database. • Info Usage Model: Specifies which of the Entity-Relationship Model component instances are used by other Information Model components. • Library Model: Records the existence of non-repository files and the role they play in defining and building an automated Business Application. • Organization/Location Model: Records the organization structure and location definitions for use in describing the enterprise. • Panel/Screen Model: Identifies the Panels and Screens and the fields they contain as elements used in an automated Business Application. • Process Model: Defines Business Processes, their sub processes and components. • Program Elements Model: Identifies the various pieces and elements of application program source that serve as input to the application build process. • Relational Database Model: Describes the components of a Relational Database design in terms common to all SAA relational DBMSs. • Resource/Problem Model: Identifies the problems and needs of the enterprise, the projects designed to address those needs, and the resources required. • Strategy Model: Records business strategies to resolve problems, address goals, and take advantage of business opportunities. It also records the actions and steps to be taken. • Test Model: Identifies the various files (test procedures, test cases, etc.) affiliated with an automated Business Application for use in testing that application. • Value Domain Model: Defines the data characteristics and allowed values for information items.
  • 38. Implementing Metadata Repository Functionality • "The repository" does not have to be an integrated solution – it must be an easily integrable solution • Repository functionality does not equal a repository – metadata must easily evolve to a repository solution • Multiple repositories are not necessarily bad – as interim solutions, Excel has been working quite well • Minimal functionality includes ability to create, read, update, delete, and evolve metadata items • Remember the 1st law of data management – In order to manage metadata, you need metadata repository functions
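The minimal functionality listed on this slide (create, read, update, delete, evolve) can be sketched as a small in-memory class. This is an illustration under stated assumptions, not a design from the presentation: the `MetadataRepository` name and its methods are hypothetical, and "evolve" is interpreted here as recording a new version of an item while keeping its history, which the slide does not define.

```python
class MetadataRepository:
    """In-memory sketch: each item keeps a list of versions, latest last."""

    def __init__(self):
        self._items = {}

    def create(self, name, **attributes):
        if name in self._items:
            raise KeyError(f"{name!r} already exists")
        self._items[name] = [dict(attributes)]

    def read(self, name, version=-1):
        # Return a copy so callers cannot silently change stored metadata.
        return dict(self._items[name][version])

    def update(self, name, **changes):
        # In-place correction of the current version (no new version).
        self._items[name][-1].update(changes)

    def evolve(self, name, **changes):
        # Assumed meaning of "evolve": add a new version, keep the old ones.
        new_version = dict(self._items[name][-1])
        new_version.update(changes)
        self._items[name].append(new_version)
        return len(self._items[name])

    def delete(self, name):
        del self._items[name]


repo = MetadataRepository()
repo.create("customer number", datatype="CHAR(10)", databases=["A", "B", "F"])
repo.update("customer number", steward="unassigned")
version_count = repo.evolve("customer number", datatype="CHAR(12)")

current = repo.read("customer number")
original = repo.read("customer number", version=0)
print(version_count, current["datatype"], original["datatype"])
```

The same five operations could sit on top of a spreadsheet or a relational table; the point of the sketch is only that repository functionality is a small set of functions, separate from any particular repository product.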
  • 40. Data Profiling/Discovery/Analysis Technologies • Data analysis software technologies deliver up to 10X productivity over manual approaches • Based on a powerful computing technology that allows data engineers to quickly form candidate hypotheses with respect to the existing data structures • Hypotheses are then presented to the SMEs (both business and technical) who confirm, refine, or deny them • Allows existing data structures to be inferred at a rate that is an order of magnitude more effective than previous manual approaches • Pioneers include Evoke->CSI, Metagenix->Ascential->IBM, Sypherlink
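One kind of "candidate hypothesis" a discovery tool might form is a guess at which columns uniquely identify a row. The sketch below is a simplified illustration, not how any of the named products work: the `candidate_keys` function and the sample rows are hypothetical, and uniqueness in a sample is only evidence, which is why the slide has SMEs confirm, refine, or deny the result.

```python
from itertools import combinations


def candidate_keys(rows, max_width=2):
    """Propose minimal column combinations whose values are unique in `rows`."""
    if not rows:
        return []
    columns = list(rows[0])
    found = []
    for width in range(1, max_width + 1):
        for combo in combinations(columns, width):
            # Skip combinations that merely extend a key already found.
            if any(set(key) <= set(combo) for key in found):
                continue
            values = [tuple(row[col] for col in combo) for row in rows]
            if len(set(values)) == len(rows):
                found.append(combo)
    return found


# Hypothetical sample data.
sample = [
    {"cust_no": 1, "region": "E", "seq": 1, "name": "Ann"},
    {"cust_no": 2, "region": "E", "seq": 2, "name": "Bob"},
    {"cust_no": 3, "region": "W", "seq": 1, "name": "Ann"},
]

hypotheses = candidate_keys(sample)
for key in hypotheses:
    print("candidate key:", key)
```

On this sample the function proposes `cust_no`, `(region, seq)` and `(region, name)`. The last is unique only by coincidence in three rows, which is the kind of hypothesis a subject matter expert would be expected to reject.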
  • 41. How has this been done in the past? Old: • Manually • Brute force • Repository dependent • Quality indifferent • Not repeatable. New: • Semi-automated • Engineered • Repository independent • Integrated quality • Repeatable • Currency • Accuracy
  • 42. [Screenshot callouts] Select an attribute to get a list of values. Double-click a value to see rows with that value.
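The two interactions called out on this screenshot (list the values of an attribute, then drill into the rows behind one value) can be approximated in a few lines. The function names and sample rows are hypothetical and stand in for whatever the pictured tool does internally.

```python
from collections import Counter


def value_list(rows, attribute):
    """Distinct values of one attribute with their frequencies, most common first."""
    return Counter(row[attribute] for row in rows).most_common()


def rows_with_value(rows, attribute, value):
    """The drill-down step: every row that carries the selected value."""
    return [row for row in rows if row[attribute] == value]


# Hypothetical sample data with one inconsistently cased value.
customers = [
    {"id": 1, "state": "VA"},
    {"id": 2, "state": "VA"},
    {"id": 3, "state": "va"},
    {"id": 4, "state": "MD"},
]

values = value_list(customers, "state")
suspects = rows_with_value(customers, "state", "va")
print(values)
print(suspects)
```

Rare values near the bottom of the list (here the lowercase `va`) are often the first places to look for quality problems.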
  • 44. Data Quality Engineering Tools. 4 categories of activities: 1) Analysis 2) Cleansing 3) Enhancement 4) Monitoring. Principal tools: 1) Data Profiling 2) Parsing and Standardization 3) Data Transformation 4) Identity Resolution and Matching 5) Enhancement 6) Reporting from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
  • 45. DQ Tools: (1) Data Profiling • Need to be able to distinguish between good and bad data before making any improvements • Data profiling is a set of algorithms for 2 purposes: – Statistical analysis and assessment of the quality of data values within a data set – Exploring relationships that exist between value collections within and across data sets. DQ Tools: (2) Parsing & Standardization • Data parsing tools enable the definition of patterns that feed into a rules engine used to distinguish between valid and invalid data values • Actions are triggered upon matching a specific pattern • When an invalid pattern is recognized, the application may attempt to transform the invalid value into one that meets expectations from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
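The parsing and standardization idea above (patterns feed a rules engine, a matching pattern triggers an action, and a recognized non-standard pattern is transformed into the expected form) can be sketched with regular expressions. The rule table, the phone-number formats, and the target format are all assumptions chosen for illustration; they are not taken from the DMBOK.

```python
import re


def join_groups(match):
    """Action: rebuild the value in the target NNN-NNN-NNNN representation."""
    return "-".join(match.groups())


# Each rule is (name, pattern, action). Rules are tried in order.
RULES = [
    ("canonical", re.compile(r"^(\d{3})-(\d{3})-(\d{4})$"), join_groups),
    ("parenthesized", re.compile(r"^\((\d{3})\)\s*(\d{3})-(\d{4})$"), join_groups),
    ("dotted", re.compile(r"^(\d{3})\.(\d{3})\.(\d{4})$"), join_groups),
    ("digits_only", re.compile(r"^(\d{3})(\d{3})(\d{4})$"), join_groups),
]


def standardize(value):
    """Return (standardized value, rule name), or (None, 'unrecognized')."""
    text = value.strip()
    for name, pattern, action in RULES:
        match = pattern.match(text)
        if match:
            return action(match), name
    return None, "unrecognized"


for raw in ["804-555-0100", "(804) 555-0100", "804.555.0100", "8045550100", "555-0100"]:
    print(repr(raw), "->", standardize(raw))
```

Values that match no pattern are returned as unrecognized rather than guessed at, so they can be routed to manual review.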
  • 46. DQ Tools: (3) Data Transformation • Upon identification of data errors, trigger data rules to transform the flawed data • Perform standardization and guide rule-based transformations by mapping data values in their original formats and patterns into a target representation • Parsed components of a pattern are subjected to rearrangement, corrections, or any changes as directed by the rules in the knowledge base. DQ Tools: (4) Identity Resolution & Matching. 2 basic approaches to matching: • Deterministic – Relies on defined patterns and rules for assigning weights and scores to determine similarity – Predictable – Only as good as anticipations of the rules developers • Probabilistic – Relies on statistical techniques for assessing the probability that any pair of records represents the same entity – Not reliant on rules – Probabilities can be refined based on experience -> matchers can improve precision as more data is analyzed from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
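The two matching approaches above can be contrasted in a short sketch. The deterministic scorer uses hand-assigned weights, as the slide describes. For the probabilistic side, the sketch swaps in a Fellegi-Sunter style agreement weight, which is one common statistical technique and is not named in the source; the records, weights, and the m/u probabilities are all hypothetical values chosen for illustration.

```python
import math


def deterministic_score(a, b, weights):
    """Sum the rule-assigned weight of every field on which the records agree."""
    return sum(w for field, w in weights.items() if a[field] == b[field])


def probabilistic_weight(a, b, m_u):
    """Fellegi-Sunter style log-likelihood weight.

    m = P(field agrees | same entity), u = P(field agrees | different entities).
    Agreement adds log2(m/u); disagreement adds log2((1-m)/(1-u)).
    """
    total = 0.0
    for field, (m, u) in m_u.items():
        if a[field] == b[field]:
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total


# Hypothetical records and parameters.
r1 = {"last_name": "smith", "zip": "23060", "birth_year": 1970}
r2 = {"last_name": "smith", "zip": "23060", "birth_year": 1971}
r3 = {"last_name": "jones", "zip": "10001", "birth_year": 1950}

WEIGHTS = {"last_name": 50, "zip": 30, "birth_year": 20}
M_U = {"last_name": (0.95, 0.01), "zip": (0.90, 0.05), "birth_year": (0.85, 0.02)}

det_close = deterministic_score(r1, r2, WEIGHTS)
prob_close = probabilistic_weight(r1, r2, M_U)
prob_far = probabilistic_weight(r1, r3, M_U)
print(det_close, round(prob_close, 2), round(prob_far, 2))
```

In the deterministic version the weights are fixed by whoever wrote the rules. In the probabilistic version the m and u values could be re-estimated as more matched pairs are reviewed, which is the "refined based on experience" point on the slide.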
  • 47. DQ Tools: (5) Enhancement. Definition: • A method for adding value to information by accumulating additional information about a base set of entities and then merging all the sets of information to provide a focused view. Examples of data enhancements: • Time/date stamps • Auditing information • Contextual information • Geographic information • Demographic information • Psychographic information. DQ Tools: (6) Reporting. Good reporting supports: • Inspection and monitoring of conformance to data quality expectations • Monitoring performance of data stewards conforming to data quality SLAs • Workflow processing for data quality incidents • Manual oversight of data cleansing and correction. Associate report results w/: • Data quality measurement • Metrics • Activity from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
  • 49. Traditional Quality Life Cycle
  • 50. Data Life Cycle Model (figure; phases shown: Metadata Creation, Metadata Structuring, Metadata Refinement, Data Creation, Data Storage, Data Assessment, Data Utilization, Data Manipulation, Data Refinement)
  • 51. Extended data life cycle model with metadata sources and uses (figure)
    Phases and activities:
    • Metadata Creation
      – Define Data Architecture
      – Define Data Model Structures
    • Metadata Refinement
      – Correct Structural Defects
      – Update Implementation
    • Metadata Structuring
      – Implement Data Model Views
      – Populate Data Model Views
    • Data Refinement
      – Correct Data Value Defects
      – Re-store Data Values
    • Metadata & Data Storage
    • Data Creation
      – Create Data
      – Verify Data Values
    • Data Assessment
      – Assess Data Values
      – Assess Metadata
    • Data Utilization
      – Inspect Data
      – Present Data
    • Data Manipulation
      – Manipulate Data
      – Update Data
    Flow labels: architecture; data architecture refinements; corrected data; data architecture and data models; data performance metadata; facts & meanings; shared data; updated data
    Entry points marked on the figure: starting point for new system development; starting point for existing systems
  • 52. Outline: 1. Data Management Overview 2. Data Management Tools Overview 3. Data Technology Architecture 4. CASE Tools 5. Repositories 6. Profiling/Discovery Tools 7. Data Quality Engineering Tools 8. Data Life Cycle 9. Other Technologies 10. Q&A. Tweeting now: #dataed
  • 53. Other Technologies: Data Integration
    Definition:
    • Pulling together and reconciling, for analytic purposes, dispersed data that organizations have maintained in multiple, heterogeneous systems. Data needs to be accessed and extracted, moved and loaded, validated and cleaned, standardized and transformed.
    • Other tools include:
      – Servers
      – EII technologies
      – Portals
      – Conversion tools
    Source: http://www.information-management.com
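The integration steps named on slide 53 (extract, validate and clean, standardize and transform, load) can be sketched for two heterogeneous sources. Everything specific here is a hypothetical assumption for illustration: the two source layouts ("CRM" and "billing"), the field names, the country-code lookup, and the rule that the first valid record per id wins.

```python
# Hypothetical lookup used to standardize country values (assumed, not from the slides).
COUNTRY_CODES = {"uk": "GB", "gb": "GB", "us": "US", "usa": "US"}


def from_crm(row: dict) -> dict:
    """Extract: map the hypothetical CRM layout to a common shape."""
    return {"id": int(row["CustID"]), "name": row["Name"], "country": row["Country"]}


def from_billing(row: dict) -> dict:
    """Extract: map the hypothetical billing layout to the same shape."""
    return {"id": int(row["customer_id"]), "name": row["full_name"],
            "country": row["country_code"]}


def standardize(record: dict) -> dict:
    """Clean and transform: collapse whitespace, map country to a 2-letter code."""
    name = " ".join(record["name"].split())
    country = COUNTRY_CODES.get(record["country"].strip().lower())
    return {"id": record["id"], "name": name, "country": country}


def is_valid(record: dict) -> bool:
    """Validate: positive id, non-empty name, recognized country."""
    return record["id"] > 0 and bool(record["name"]) and record["country"] is not None


def integrate(crm_rows, billing_rows):
    """Load valid records into one target keyed by id; collect the rejects."""
    target, rejected = {}, []
    for extract, rows in ((from_crm, crm_rows), (from_billing, billing_rows)):
        for row in rows:
            record = standardize(extract(row))
            if is_valid(record):
                target.setdefault(record["id"], record)  # first valid source wins
            else:
                rejected.append(record)
    return target, rejected
```

Keeping the rejected records, rather than silently dropping them, gives the reporting and data stewardship activities something to work from.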
  • 54. Polling Question #2: Which is not a strategic technology trend in 2013? a) Hybrid IT and Cloud Computing b) App and Cloud Computing c) Personal Cloud
  • 55. Top 10 Strategic Tech Trends in 2013
    1. Mobile Device Battles: By 2013 mobile phones will overtake PCs as the most common Web access device worldwide.
    2. Mobile Applications and HTML5: For the next few years, no single tool will be optimal for all types of mobile application, so expect to employ several.
    3. Personal Cloud: The personal cloud will gradually replace the PC as the location where individuals keep their personal content.
    4. Enterprise App Stores: Enterprises face a complex app store future as some vendors will limit their stores to specific devices and types of apps, forcing the enterprise to deal with multiple stores.
    5. The Internet of Things: The Internet of Things (IoT) is a concept that describes how the Internet will expand as physical items such as consumer devices and physical assets are connected to the Internet.
    Source: http://www.gartner.com/it/page.jsp?id=2209615
  • 56. Top 10 Strategic Tech Trends in 2013
    6. Hybrid IT and Cloud Computing: As staffs have been asked to do more with less, IT departments must play multiple roles in coordinating IT-related activities, and cloud computing is now pushing that change to another level.
    7. Strategic Big Data: Big Data is moving from a focus on individual projects to an influence on enterprises’ strategic information architecture.
    8. Actionable Analytics: Analytics is increasingly delivered to users at the point of action and in context.
    9. In Memory Computing: In memory computing (IMC) can also provide transformational opportunities.
    10. Integrated Ecosystems: The market is undergoing a shift to more integrated systems and ecosystems and away from loosely coupled heterogeneous approaches.
    Source: http://www.gartner.com/it/page.jsp?id=2209615
  • 57. XML Server Types: Integration, Mediation, Repository
    XML Integration Server Requirements
    • Traditional Integration with Existing Systems
      – Message Oriented Middleware
      – “EAI” Adapters
    • Validation
      – Using XML Schema or DTD
    • Query Multiple Integration Points using XQuery
    • Ease of Defining Mappings
      – XML to Existing Systems
      – Existing Systems Creating XML
    • APIs for XML
    Adapted from Steve Hamby, "Understanding XML Servers," DAMA/Metadata Conference, April 2003, Orlando, FL
  • 58. XML Server Types: Integration, Mediation, Repository
    XML Mediation Server Requirements
    • XML Standards Based
      – Ensures eXtensibility
      – Changing documents / applications
      – Transformation to new outputs
    • Validation
      – Using XML Schema or DTD
      – Business Rules
    • Integration with Existing Systems / Integration Servers
    • Ease of Defining Rules via GUI for Business User
      – IT Should Not Have to be Involved
    Adapted from Steve Hamby, "Understanding XML Servers," DAMA/Metadata Conference, April 2003, Orlando, FL
  • 59. XML Server Types: Integration, Mediation, Repository
    XML Repository Server Requirements
    • XML Optimization
      – Document Instance
    • XML Storage
      – Stores Document in Native Format
        • Better performance
        • Non-repudiation
      – Compression
    • XML Standards Support
      – Faster Development
      – Ensures Extensibility
    • Support Data Access Security at Node level
    Adapted from Steve Hamby, "Understanding XML Servers," DAMA/Metadata Conference, April 2003, Orlando, FL
  • 60. Portal Options [Adapted from Terry Lanham, "Designing Innovative Enterprise Portals and Implementing Them Into Your Content Strategies: Lockheed Martin’s Compelling Case Study," Web Content II: Leveraging Best-of-Breed Content Strategies, San Francisco, CA, 23 January 2001]
  • 61. Top Tier Demo
  • 62. Portals as a Data Quality Tool
  • 63. Meta-Matrix Integration Example
  • 64. ItemField (http://www.itemfield.com/)
    • Data extraction and conversion software solutions for transforming complex, unstructured data formats into XML for Enterprise Application Integration
      – RTF
      – HTML
      – HL7
      – Positional (Offset-Based) reports
      – TAB-delimited and other delimited reports
      – EDI
    • Binary documents are automatically converted to a suitable text for parsing for:
      – Microsoft Word documents
      – Microsoft Excel documents
      – PDF documents
      – COBOL programs
    Also pictured: BizTalk, Tamino
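The kind of conversion slide 64 describes (delimited and positional reports into XML) can be sketched with Python's standard library. This is not ItemField's product or API; the function names, the `report`/`row` tag names, and the layout format are hypothetical. The sketch also assumes every column name is already a valid XML element name.

```python
import csv
import io
import xml.etree.ElementTree as ET


def delimited_to_xml(text, delimiter="\t", root_tag="report", row_tag="row"):
    """Convert a delimited report (first line = column names) to an XML string."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    root = ET.Element(root_tag)
    for row in reader:
        row_el = ET.SubElement(root, row_tag)
        for name, value in row.items():
            # Assumes each column name is a valid XML element name.
            ET.SubElement(row_el, name).text = value
    return ET.tostring(root, encoding="unicode")


def positional_to_xml(text, layout, root_tag="report", row_tag="row"):
    """Convert a positional (offset-based) report to an XML string.

    layout maps field name -> (start, end) character offsets, end exclusive.
    """
    root = ET.Element(root_tag)
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        row_el = ET.SubElement(root, row_tag)
        for name, (start, end) in layout.items():
            ET.SubElement(row_el, name).text = line[start:end].strip()
    return ET.tostring(root, encoding="unicode")
```

For example, `delimited_to_xml("id\tname\n1\tWidget\n")` yields one `row` element holding an `id` and a `name` child. Real reports usually need more than this, such as header and footer stripping, escaping of column names, and type handling.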
  • 65. More Data Management Tools, from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
  • 66. More Data Management Tools, from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
  • 67. Outline: 1. Data Management Overview 2. Data Management Tools Overview 3. Data Technology Architecture 4. CASE Tools 5. Repositories 6. Profiling/Discovery Tools 7. Data Quality Engineering Tools 8. Data Life Cycle 9. Other Technologies 10. Q&A. Tweeting now: #dataed
  • 68. Questions? It’s your turn! Use the chat feature or Twitter (#dataed) to submit your questions to Peter now.
  • 69. Upcoming Events
    December Webinar: Show Me the Money: The Business Value of Data and ROI
    December 11, 2012 @ 2:00 PM – 3:30 PM ET (11:00 AM – 12:30 PM PT)
    Sign up here:
    • www.datablueprint.com/webinar-schedule
    • www.Dataversity.net