DWPresentation.ppt
  • 1. Data Administration: Data Warehouse Environment (DWE) Implementation, 8/19/04
  • 3. DWE Terms
    • Source Data: Operational data from internal systems, such as IDMS (FES, FRS, HRS, SIS), Oracle, etc.
    • External Data: Data from systems external to the University, such as economic and census data collected by the government.
    • Data Staging Area: Storage and processing area for data extracted from the internal and external systems before it is loaded into the Warehouse, Data Marts or Ad Hoc Query Repository. Some of the data will remain uncleansed, an exact replica of the data in the online systems, for subsequent loading into the Ad Hoc Query Repository. Other data will be cleansed and transformed before being moved to the Data Warehouse and Data Marts for analysis. Some data will be located in multiple places and in multiple forms and aggregations. (Also known as an ETL, or Extract, Transform and Load, server.)
    • Metadata: Data that describes or specifies other data. It defines all of the characteristics of data required to build databases and applications, and supports knowledge workers and information producers. This includes data element name, meaning, format, domain values, business integrity rules, relationships, owner, etc.
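    The two paths through the Data Staging Area (an exact replica for the Ad Hoc Query Repository, and a cleansed copy for the Warehouse and Data Marts) can be sketched as below. This is an illustration only: the `SourceRow` shape, the `stage` function and the F/M/U gender codes are hypothetical, not the University's actual extract layout or cleansing rules.

```python
from dataclasses import dataclass


# Hypothetical source row; real IDMS/Oracle extracts will have other columns.
@dataclass(frozen=True)
class SourceRow:
    student_id: str
    gender: str
    term: str


def stage(rows):
    """Split one extract into the two staging-area paths."""
    # Path 1: exact, uncleansed replica destined for the Ad Hoc Query Repository.
    replica = list(rows)

    # Path 2: cleansed/transformed copy destined for the Data Warehouse and Data Marts.
    cleansed = []
    for r in rows:
        gender = r.gender.strip().upper()
        if gender not in {"F", "M"}:
            gender = "U"  # hypothetical "unknown" code for invalid values
        cleansed.append(SourceRow(r.student_id.strip(), gender, r.term.strip()))
    return replica, cleansed
```

    Keeping the replica untouched is what lets the Repository serve "the most current and un-standardized source data", while the cleansing step feeds the analytical stores.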
  • 4. DWE Terms
    • Ad Hoc Query Repository: A collection of enterprise data from multiple sources, used for ad hoc and operational reporting where the most current, unstandardized source data is required. The Repository will typically contain only the most recent one or two years of data, unless regulatory or statutory requirements dictate otherwise. (Also known as an Operational Data Store, or ODS.)
    • Data Warehouse: An enterprise-wide, cross-functional, cross-organizational database typically composed of data extracted, cleansed and/or summarized from multiple online transaction processing systems and other stores of data (Purdue University; Stanford University). It is designed for query and analysis, typically contains historical data, and is used to present information that supports decision-making and tactical and strategic business processes. A data warehouse tends to start from an analysis of what data already exists and how it can be collected so that the data can later be used. In general, a data warehouse tends to be a strategic but somewhat unfinished concept; a data mart tends to be tactical and aimed at meeting an immediate need. (Improving Data Warehouse and Business Information Quality, Larry P. English, 1999.)
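    The retention rule for the Ad Hoc Query Repository (keep only the most recent one or two years unless regulatory or statutory rules say otherwise) might be expressed as a purge filter like the one below. The `(table_name, row_date)` row shape, the calendar-year interpretation of "years" and the `regulatory_tables` exemption set are all assumptions for illustration.

```python
from datetime import date


def retained(rows, today, years=2, regulatory_tables=frozenset()):
    """Return the rows the Repository would keep after a purge.

    rows: iterable of (table_name, row_date) pairs (hypothetical shape).
    years: number of calendar years to keep, counting the current year.
    regulatory_tables: tables exempt from purging because of regulatory
        or statutory retention requirements (hypothetical list).
    """
    first_year = today.year - (years - 1)
    return [
        (table, d)
        for table, d in rows
        if d.year >= first_year or table in regulatory_tables
    ]
```

    For example, with `today=date(2004, 8, 19)` and `years=2`, rows dated 2003 or 2004 are kept, and older rows survive only if their table is listed in `regulatory_tables`.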
  • 5. DWE Terms
    • Data Mart: A subset of enterprise data from the Data Warehouse that is summarized and stored in an optimal fashion for analysis and presentation of information to support trend analysis and tactical decisions and processes. Data Marts are typically designed based on an analysis of user needs to answer specific questions in the pursuit of specific goals. The scope can be that of a complete data subject, such as Student, or of a particular business area or line of business, such as Enrollment. (Improving Data Warehouse and Business Information Quality, Larry P. English, 1999.)
    • Enterprise Reporting: A category of software technology that enables the development, organization, sharing, execution, delivery and scheduling of reports via a web platform.
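    To make "a subset of enterprise data ... that is summarized" concrete, the sketch below rolls hypothetical warehouse detail rows up into an Enrollment-style mart keyed by term and college. The column names (`term`, `college`, `student_id`, `credit_hours`) and the choice of headcount and credit hours as measures are assumptions, not the actual mart design.

```python
from collections import defaultdict


def build_enrollment_mart(warehouse_rows):
    """Summarize detail rows into {(term, college): {headcount, credit_hours}}.

    warehouse_rows: iterable of dicts with hypothetical keys
    'term', 'college', 'student_id' and 'credit_hours'.
    """
    students = defaultdict(set)   # distinct students per (term, college)
    hours = defaultdict(float)    # total credit hours per (term, college)
    for row in warehouse_rows:
        key = (row["term"], row["college"])
        students[key].add(row["student_id"])
        hours[key] += row["credit_hours"]
    return {
        key: {"headcount": len(students[key]), "credit_hours": hours[key]}
        for key in students
    }
```

    Counting distinct student IDs (rather than rows) keeps a student enrolled in several courses from inflating the headcount.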
  • 6. DWE Terms (Continued)
    • On-Line Analytical Processing (OLAP): A category of software technology that enables analysts, managers and executives to gain insight into data through fast, consistent, interactive access to a wide variety of possible views of information that has been transformed from raw data to reflect the real dimensionality of the enterprise as understood by the user. OLAP helps the user synthesize enterprise information through comparative, personalized viewing, as well as through analysis of historical and projected data in various "what-if" data model scenarios. This is achieved through use of an OLAP Server. (http://www.moulton.com/olap/olap.glossary.html) Functionality includes multidimensional analysis, slicing, drill-down and rotation.
    • Data Mining: A class of database applications that look for hidden patterns in a group of data. For example, data mining software can help retail companies find customers with common interests. The term is commonly misused to describe software that presents data in new ways. True data mining software doesn't just change the presentation, but actually discovers previously unknown relationships among the data. (http://www.webopedia.com/TERM/d/data_mining.html)
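    The OLAP operations named above (slicing, drill-down and rotation) can be illustrated on a tiny in-memory fact table. This is a teaching sketch with hypothetical data and function names, not how an OLAP Server is implemented: "drill-down" is modeled as re-aggregating at a finer set of dimensions, "slice" as fixing a dimension value, and "rotation" as swapping the row and column axes of a two-dimensional view.

```python
from collections import defaultdict

# Hypothetical fact rows: dimensions (year, term, college) and one measure.
FACTS = [
    {"year": 2003, "term": "Fall", "college": "ENG", "headcount": 120},
    {"year": 2003, "term": "Spring", "college": "ENG", "headcount": 110},
    {"year": 2003, "term": "Fall", "college": "A&S", "headcount": 300},
    {"year": 2004, "term": "Fall", "college": "ENG", "headcount": 130},
]


def aggregate(facts, dims, measure="headcount"):
    """Total the measure by the given dimensions (fewer dims = coarser view)."""
    totals = defaultdict(int)
    for f in facts:
        totals[tuple(f[d] for d in dims)] += f[measure]
    return dict(totals)


def slice_(facts, **fixed):
    """Slice: keep only facts whose dimensions match the fixed values."""
    return [f for f in facts if all(f[k] == v for k, v in fixed.items())]


def pivot(totals):
    """Rotate a 2-D view {(row, col): value} so rows become columns."""
    return {(c, r): v for (r, c), v in totals.items()}


by_year = aggregate(FACTS, ["year"])                       # summary view
by_year_term = aggregate(FACTS, ["year", "term"])          # drill-down
eng_by_year = aggregate(slice_(FACTS, college="ENG"), ["year"])  # slice
college_by_year = pivot(aggregate(FACTS, ["year", "college"]))   # rotation
```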
  • 7. DWE Terms (Continued)
    • Executive Information System (EIS): An application developed to give senior management direct access to information relevant to an organization’s goals and performance, such as a dashboard. These applications gather, analyze and integrate internal and external data to provide management with insight into key performance indicators, potential problems, and changes in the environment. Typical features include extensive use of graphics, simple navigational controls, automatic replacement of report contents, drill-down analysis, trend analysis capabilities, exception reporting or alerts, graphical charts with links to underlying reports, provision of data from multiple sources, and the highlighting of information an executive feels is critical. (The Data Warehouse Lifecycle Toolkit, Ralph Kimball, et al.)
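    The "exception reporting or alerts" feature of an EIS usually amounts to comparing each key performance indicator against an acceptable range. The sketch below assumes a simple low/high band per KPI; the `Kpi` class, the thresholds and the message wording are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Kpi:
    name: str
    value: float
    low: float   # alert if the value falls below this (hypothetical threshold)
    high: float  # alert if the value rises above this (hypothetical threshold)


def exceptions(kpis):
    """Return alert messages for KPIs outside their [low, high] band."""
    alerts = []
    for k in kpis:
        if k.value < k.low:
            alerts.append(f"{k.name} below target: {k.value} < {k.low}")
        elif k.value > k.high:
            alerts.append(f"{k.name} above target: {k.value} > {k.high}")
    return alerts
```

    Only out-of-range indicators produce messages, which is what lets an executive view highlight problems instead of listing every metric.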
  • 12. DWE Current Resources
      • Query Repository
          • Production: PowerEdge 6650, four 2.8 GHz CPUs, 4 GB RAM, 1.2 TB storage, Windows Server 2003
          • Development: PowerEdge 2650, one 3.0 GHz CPU, 2 GB RAM, 252 GB storage, Windows Server 2003
          • Software: Oracle Enterprise
      • ETL
          • Production: Dell PowerEdge 6650, four 2.0 GHz CPUs, 2 TB storage, Windows 2000 Advanced Server
          • Development: Dell PowerEdge 6650, two 2.0 GHz CPUs, 1 TB storage, Windows 2000 Advanced Server
          • Software: Informatica PowerCenter
      • Enterprise Reporting
          • Production: PowerEdge 2650, two 2.8 GHz CPUs, 4 GB RAM, 291 GB storage, Windows Server 2003 Standard
          • Development: PowerEdge 2550, two 1.27 GHz CPUs, 1 GB RAM, 220 GB storage, Windows 2000 Server
          • Software: WebFOCUS
      • Statistical Analysis
          • Server: Dell PowerEdge 2550, two 1.4 GHz CPUs, 4 GB RAM, 144 GB storage, Windows 2000
          • Software: SAS Enterprise Miner, Enterprise Guide, etc.
  • 13. DWE Tasks
      • DBA (1-2 FTE) – Design Oracle databases, write and run ETL jobs, and provide production support (e.g., monitor system and database performance, enforce security, schedule backups)
      • Data Administration (2-3 FTE) – Serve as the interface to users, develop requirements documents for all DW projects and new views, evaluate data quality, develop specialized reports, test, train users and coordinate projects
      • Reporting (1-2 FTE) – Develop enterprise reports
      • All – Infrastructure design (with Systems staff), and tool evaluation (ETL, OLAP and desktop reporting) with help from the C/S group.
  • 14. Implementation Strategy - Educate Users
    • Basics – “What is a Data Warehouse?” Create a “single source of truth.” “What it’s not!” (It is not all the data, with daily updates and online storage.)
    • Change in culture – “Let’s make better decisions based on objective analysis of data.”
    • Set realistic expectations – No silver bullet. It can help you make better decisions, but you are still responsible for implementing those decisions.
    • Focus on institutional goals – “What is it we need to achieve? What metrics do we need to evaluate our progress in attaining goals?”
    • Importance of business sponsors – Make timely business decisions and support requests for necessary resources.
  • 15. Implementation Strategy - Requirements
    • Develop DWE in a phased approach.
    • Develop detailed requirements documents with users and institutional administrators for applications within the DWE (DW/DM and reports).
  • 16. Course Management (I.V.C.)
  • 17. Implementation Strategy – Data Quality
    • Focus on improving data quality and establishing standards for data view names, element names and data content.
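    One way metadata supports data quality work is by recording each element's valid domain values, so that data can be checked against them automatically. The sketch below assumes the metadata is available as a simple element-to-allowed-values mapping; the element names and code values in `DOMAINS` are hypothetical examples in the Prime_Qualifier_Class style, not entries from the actual Metadata application.

```python
# Hypothetical metadata: element name -> documented domain values.
DOMAINS = {
    "STDNT_GNDR_CD": {"F", "M"},
    "CRS_LVL_CD": {"LD", "UD", "GR"},
}


def domain_violations(rows, domains=DOMAINS):
    """Return (row_index, element, value) for values outside the documented domain."""
    problems = []
    for i, row in enumerate(rows):
        for element, allowed in domains.items():
            # Only elements present in the row are checked.
            if element in row and row[element] not in allowed:
                problems.append((i, element, row[element]))
    return problems
```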
  • 18. Implementation Strategy – Enterprise Reports
    • Gather user input on most important reports required by many users, and develop these reports with an enterprise reporting tool that allows us to deliver pre-defined parameter-driven reports via the web.
  • 19. 2001-2002: Infrastructure and Planning
    • Create IDMS data dump to Oracle
    • Implement WebFOCUS
    • Purchase data mining tools and server for IR
    • Create views for Query Repository (ad hoc reporting repository)
    • Establish enterprise standards for key data – Analysis and recommendations are ongoing
    • Identify and prioritize data mart development – Course Management Data Mart top priority for Data Stewards
  • 20. 2001-2002: Infrastructure and Planning (Continued)
    • Initiate GASB – Phase I
    • Initiate data quality projects
    • Review Desktop Reporting Tools – Ongoing review and testing of:
        • Brio
        • Crystal Reports
        • SAS
        • WebFOCUS
  • 21. 2002-2003: Data Mart Development, etc.
    • Complete GASB – Phase I
    • Implement SAS data mining server
    • Conduct data quality projects – vendor, facilities, FRS, TA data
    • Select and Purchase ETL Tool
    • Begin requirements on Course Management DM
    • Define standards for data view and element names
  • 22. 2003-2004: DWE Upgrades and User Support
    • Implement ETL tool
    • Upgrade database servers
    • Create Metadata application – “Data about data”
    • Conduct SAS data mining project on freshmen data
    • Provide user and technical training on reporting tools, support listservs and web page
    • Purchase enterprise reporting tool and develop reports
  • 23. 2003-2004: DWE Upgrades and User Support (Continued)
    • Create new data views with standardized names
    • Complete GASB - Phase II
    • Continue development of the Course Management DM requirements
    • Initiate development of the requirements for the Resource Management DM
  • 24. 2004-2005: SAP, etc.
    • Complete standardization of remaining data views
    • Create additional enterprise reports
    • Evaluate SAP Business Warehouse (BW)
    • Conduct extensive data quality analysis for SAP
  • 25. Reporting Web Site and Metadata
    • Reporting URL: https://reporting.uky.edu/
    • Metadata URL: http://iweb.uky.edu/RptDataDesc/
    • Metadata directions URL: http://www.uky.edu/IS/DataAdmin/DOCS/metadata/MetadataDirections.pdf
    • Data element standards URL: http://www.uky.edu/IS/DataAdmin/DOCS/ware/IUUN0020-QRVE/QRVE-NamingStds/DataElementNamingStds.pdf
    • Data Administration URL: http://www.uky.edu/IT/DataAdmin/
  • 26. Naming Standards
    • All data view names start with “V_”.
    • All standard element names are comprised of words:
      • Prime (required) – describes the subject area of the data (e.g., account, student, department, course),
      • Qualifier (optional) – further defines and distinguishes the “prime” and “class” words (e.g., gender, ethnic, first),
      • Class (required) – describes the major classifications or types of data (e.g., name, date, code, amount).
    • Standard name format: Prime_Qualifier_Class, built from standard abbreviations
    • Examples: view V_POSTN; element POSTN_BEG_DT (position begin date)
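    The naming rules above are mechanical enough to check automatically. The sketch below assumes an element name has exactly one Prime word, at most one Qualifier, and exactly one Class word, matching the stated Prime_Qualifier_Class format. Only POSTN, BEG and DT come from the example above; the other abbreviations are hypothetical placeholders for the lists in the Data Element Naming Standards document.

```python
# Partial, mostly hypothetical abbreviation lists (POSTN, BEG, DT are from the slide).
PRIME = {"POSTN", "STDNT", "ACCT", "DEPT", "CRS"}
QUALIFIER = {"BEG", "END", "FIRST", "GNDR", "ETHNC"}
CLASS = {"NM", "DT", "CD", "AMT"}


def check_view(name):
    """All data view names start with 'V_'."""
    return name.startswith("V_") and len(name) > 2


def check_element(name):
    """Element name = Prime, optional single Qualifier, then Class."""
    words = name.split("_")
    if len(words) == 2:
        prime, cls = words
        return prime in PRIME and cls in CLASS
    if len(words) == 3:
        prime, qualifier, cls = words
        return prime in PRIME and qualifier in QUALIFIER and cls in CLASS
    return False
```

    For example, `check_element("POSTN_BEG_DT")` passes, while `BEG_POSTN_DT` fails because the Prime word must come first.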
  • 27. Current Query Repository Data
    • UKFRS_FOC and UKHRS_FOC: to be used by WebFOCUS.
    • UKFRS_SYB: will be removed within 3-4 months.
    • GASB: non-standard views used by OC in producing institutional financial statements.
    • UKFRS_RPT, UKHRS_RPT, UKSIS_RPT and UKSIS_FAMSBR: standardized views will be created over the next couple of months, and old views will be removed 90 days after the new views are available. Purchasing views in UKFRS_RPT are in development. UKHRS_RPT also contains standard Labor Distribution views.
    • UKHRS_STAT_RPT: HRS Stat File standard views, currently in development and being tested.
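    The 90-day rule for retiring old views is simple date arithmetic. The sketch below is illustrative only; the function name and the sample go-live date are hypothetical.

```python
from datetime import date, timedelta


def old_view_removal_date(new_views_available: date, grace_days: int = 90) -> date:
    """Old views are removed a fixed number of days after the new views go live."""
    return new_views_available + timedelta(days=grace_days)
```

    If, hypothetically, the new views became available on October 1, 2004, the old views would be removed on December 30, 2004.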
  • 28. DWE/SAP Issues
      • How does the SAP Business Warehouse functionality compare to what we originally planned for the DWE?
      • Will the SAP BW replace our Data Warehouse/Marts?
      • Should we continue our plans for the historical legacy data in the DWE, and use the SAP BW for data “from this point forward”?
      • Can we, and if so how do we, “merge/join” historical data with the new data in SAP?
      • What are our options to “interface” the SAP BW with our DWE (API, etc.)?
      • Should the SAP BW feed our DWE or vice versa?
  • 29. DWE/SAP Issues (Continued)
      • How many years of data should we load into the SAP OLTP system?
      • How many years of data should we load directly into the SAP BW?
      • What level of detail data should be loaded into the SAP BW, if the corresponding data is not available in the OLTP system?
      • Should we continue with the “data mart” concept within the SAP environment?
      • How easy is it to add new functionality to the SAP BW (data, reports, “cubes”, etc.)?
  • 30. Data Administration QUESTIONS?