Uploaded on

 

  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Be the first to comment
    Be the first to like this
No Downloads

Views

Total Views
168
On Slideshare
0
From Embeds
0
Number of Embeds
0

Actions

Shares
Downloads
3
Comments
0
Likes
0

Embeds 0

No embeds

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
    No notes for slide
  • Overview Discuss concepts Define terms High level view of data warehousing
  • A little history Mainframe systems ISIS for 15 years IBIS for 9 years Millions of records Adabas data base Natural is not an end user tool Requires programming staff in departments and colleges to get data
  • Steps required to get data from the mainframe for ad-hoc reporting: Get security authorization. Paper or IBIS form. Specify purpose of request, what data is required. ASR signs. Must be approved by all data stewards (may be several). Write a Natural program. Natural is a complex language, requires programming skills and extensive training. Adabas file structures are complex. Write job control language. Use Roscoe. Submit the job. Goes into a queue that runs only at night. Queues can be quite long. Some jobs will wait several days in the queue until they reach the top. Check the status. If anything failed, fix problems and repeat the steps. Experienced Natural programmer writing a simple job will take several days of coding, testing, running. Anything complicated can take weeks.
  • Steps required to get data from the mainframe for ad-hoc reporting: Get security authorization. Paper or IBIS form. Specify purpose of request, what data is required. ASR signs. Must be approved by all data stewards (may be several). Write a Natural program. Natural is a complex language, requires programming skills and extensive training. Adabas file structures are complex. Write job control language. Use Roscoe. Submit the job. Goes into a queue that runs only at night. Queues can be quite long. Some jobs will wait several days in the queue until they reach the top. Check the status. If anything failed, fix problems and repeat the steps. Experienced Natural programmer writing a simple job will take several days of coding, testing, running. Anything complicated can take weeks.
  • Steps required to get data from the mainframe for ad-hoc reporting: Get security authorization. Paper or IBIS form. Specify purpose of request, what data is required. ASR signs. Must be approved by all data stewards (may be several). Write a Natural program. Natural is a complex language, requires programming skills and extensive training. Adabas file structures are complex. Write job control language. Use Roscoe. Submit the job. Goes into a queue that runs only at night. Queues can be quite long. Some jobs will wait several days in the queue until they reach the top. Check the status. If anything failed, fix problems and repeat the steps. Experienced Natural programmer writing a simple job will take several days of coding, testing, running. Anything complicated can take weeks.
  • Steps required to get data from the mainframe for ad-hoc reporting: Get security authorization. Paper or IBIS form. Specify purpose of request, what data is required. ASR signs. Must be approved by all data stewards (may be several). Write a Natural program. Natural is a complex language, requires programming skills and extensive training. Adabas file structures are complex. Write job control language. Use Roscoe. Submit the job. Goes into a queue that runs only at night. Queues can be quite long. Some jobs will wait several days in the queue until they reach the top. Check the status. If anything failed, fix problems and repeat the steps. Experienced Natural programmer writing a simple job will take several days of coding, testing, running. Anything complicated can take weeks.
  • Steps required to get data from the mainframe for ad-hoc reporting: Get security authorization. Paper or IBIS form. Specify purpose of request, what data is required. ASR signs. Must be approved by all data stewards (may be several). Write a Natural program. Natural is a complex language, requires programming skills and extensive training. Adabas file structures are complex. Write job control language. Use Roscoe. Submit the job. Goes into a queue that runs only at night. Queues can be quite long. Some jobs will wait several days in the queue until they reach the top. Check the status. If anything failed, fix problems and repeat the steps. Experienced Natural programmer writing a simple job will take several days of coding, testing, running. Anything complicated can take weeks.
  • Steps required to get data from the mainframe for ad-hoc reporting: Get security authorization. Paper or IBIS form. Specify purpose of request, what data is required. ASR signs. Must be approved by all data stewards (may be several). Write a Natural program. Natural is a complex language, requires programming skills and extensive training. Adabas file structures are complex. Write job control language. Use Roscoe. Submit the job. Goes into a queue that runs only at night. Queues can be quite long. Some jobs will wait several days in the queue until they reach the top. Check the status. If anything failed, fix problems and repeat the steps. Experienced Natural programmer writing a simple job will take several days of coding, testing, running. Anything complicated can take weeks.
  • Steps required to get data from the mainframe for ad-hoc reporting: Get security authorization. Paper or IBIS form. Specify purpose of request, what data is required. ASR signs. Must be approved by all data stewards (may be several). Write a Natural program. Natural is a complex language, requires programming skills and extensive training. Adabas file structures are complex. Write job control language. Use Roscoe. Submit the job. Goes into a queue that runs only at night. Queues can be quite long. Some jobs will wait several days in the queue until they reach the top. Check the status. If anything failed, fix problems and repeat the steps. Experienced Natural programmer writing a simple job will take several days of coding, testing, running. Anything complicated can take weeks.
  • Classic definition Move the data from the mainframe to a server where it can be accessed from the user’s PC.
  • Snapshot data, not live. Data extracted periodically. Changes on the mainframe may not be reflected on the warehouse for a week or more. Used for data analysis, not operational Transcripts are on the warehouse, but official transcripts are only available through ISIS.
  • We have now evolved to a 3-tier data structure at Penn State. Years ago we had only the bottom tier. Bottom -- transaction processing systems on the mainframe. Detail data updated immediately. From that, extract the middle tier: data warehouse. Detail data (some but not all). Not live, extracted periodically. From that, extract the top tier: executive information system. Summary data.
  • Data goes through a series of steps as it is moved to the warehouse Extract programs get data out of the mainframe legacy data base Verify accuracy and consistency of data -- “data legibility” Create normalized tables on the warehouse -- no data redundancy (I.e. address appears in one place only) Load tables with extracted data Establish a schedule to refresh the data -- frequency depends on volatility of the data. Some refreshed weekly, some once per semester
  • Data goes through a series of steps as it is moved to the warehouse Extract programs get data out of the mainframe legacy data base Verify accuracy and consistency of data -- “data legibility” Create normalized tables on the warehouse -- no data redundancy (I.e. address appears in one place only) Load tables with extracted data Establish a schedule to refresh the data -- frequency depends on volatility of the data. Some refreshed weekly, some once per semester
  • Data goes through a series of steps as it is moved to the warehouse Extract programs get data out of the mainframe legacy data base Verify accuracy and consistency of data -- “data legibility” Create normalized tables on the warehouse -- no data redundancy (I.e. address appears in one place only) Load tables with extracted data Establish a schedule to refresh the data -- frequency depends on volatility of the data. Some refreshed weekly, some once per semester
  • Data goes through a series of steps as it is moved to the warehouse Extract programs get data out of the mainframe legacy data base Verify accuracy and consistency of data -- “data legibility” Create normalized tables on the warehouse -- no data redundancy (I.e. address appears in one place only) Load tables with extracted data Establish a schedule to refresh the data -- frequency depends on volatility of the data. Some refreshed weekly, some once per semester
  • Data goes through a series of steps as it is moved to the warehouse Extract programs get data out of the mainframe legacy data base Verify accuracy and consistency of data -- “data legibility” Create normalized tables on the warehouse -- no data redundancy (I.e. address appears in one place only) Load tables with extracted data Establish a schedule to refresh the data -- frequency depends on volatility of the data. Some refreshed weekly, some once per semester
  • Users use query tools to get data. Tools are off-the-shelf software that runs on the PC. Users buy. Users can purchase whatever package they want based on platform (Windows vs. Mac), price ($50 - 600), preference (might be something you already use)
  • Results are returned quickly -- minutes -- and query criteria can be adjusted to modify results or obtain additional data Results returned to desktop and can be easily exported to various packages for manipulation and analysis Amount of data retrieved limited by query tool and size of PC hard drive GUI -- point and click
  • Results are returned quickly -- minutes -- and query criteria can be adjusted to modify results or obtain additional data Results returned to desktop and can be easily exported to various packages for manipulation and analysis Amount of data retrieved limited by query tool and size of PC hard drive GUI -- point and click
  • Results are returned quickly -- minutes -- and query criteria can be adjusted to modify results or obtain additional data Results returned to desktop and can be easily exported to various packages for manipulation and analysis Amount of data retrieved limited by query tool and size of PC hard drive GUI -- point and click
  • Results are returned quickly -- minutes -- and query criteria can be adjusted to modify results or obtain additional data Results returned to desktop and can be easily exported to various packages for manipulation and analysis Amount of data retrieved limited by query tool and size of PC hard drive GUI -- point and click
  • Results are returned quickly -- minutes -- and query criteria can be adjusted to modify results or obtain additional data Results returned to desktop and can be easily exported to various packages for manipulation and analysis Amount of data retrieved limited by query tool and size of PC hard drive GUI -- point and click
  • Results are returned quickly -- minutes -- and query criteria can be adjusted to modify results or obtain additional data Results returned to desktop and can be easily exported to various packages for manipulation and analysis Amount of data retrieved limited by query tool and size of PC hard drive GUI -- point and click
  • Can purchase any query tool with two criteria: Must do SQL Must support ODBC Most software packages on the market satisfy these criteria
  • If you have Microsoft Office on your PC, you have several query tools available to you that you might already be using. Access Excel Query (included in Excel)
  • Power user concept developed for data warehouse Hundreds of power users in depts, colleges, campuses Two criteria for power user: Understand the data. This is critical and not an easy task. If you’ve used AIDA or the mainframe systems, you already have some of this knowledge. Takes time and training. Familiar with query tools. Training available.
  • Power user concept developed for data warehouse Hundreds of power users in depts, colleges, campuses Two criteria for power user: Understand the data. This is critical and not an easy task. If you’ve used AIDA or the mainframe systems, you already have some of this knowledge. Takes time and training. Familiar with query tools. Training available.
  • Power user concept developed for data warehouse Hundreds of power users in depts, colleges, campuses Two criteria for power user: Understand the data. This is critical and not an easy task. If you’ve used AIDA or the mainframe systems, you already have some of this knowledge. Takes time and training. Familiar with query tools. Training available.
  • Must be connected to the backbone. SQL Client License is a piece of paper that makes it legal for your PC to access the data warehouse server For Windows, license comes with client tools, software which must be installed on PC. Mac users don’t need to install any extra software. Must have an ODBC-compliant query tool.
  • Request access. Each database must be requested separately. Requests via e-mail. Instructions on the web site. Set up your workstation. Purchase client license. Install client tools for Windows. Instructions on the web site. Select a query tool. ODBC-compliant. Install on your PC. Get training through HRDC and OAS: hands-on for Access, Excel, FilemakerPro Learn the data. Work with it, documentation on the web site, help in steward offices. If you have problems, contact user groups, listservs, web site.
  • Request access. Each database must be requested separately. Requests via e-mail. Instructions on the web site. Set up your workstation. Purchase client license. Install client tools for Windows. Instructions on the web site. Select a query tool. ODBC-compliant. Install on your PC. Get training through HRDC and OAS: hands-on for Access, Excel, FilemakerPro Learn the data. Work with it, documentation on the web site, help in steward offices. If you have problems, contact user groups, listservs, web site.
  • Request access. Each database must be requested separately. Requests via e-mail. Instructions on the web site. Set up your workstation. Purchase client license. Install client tools for Windows. Instructions on the web site. Select a query tool. ODBC-compliant. Install on your PC. Get training through HRDC and OAS: hands-on for Access, Excel, FilemakerPro Learn the data. Work with it, documentation on the web site, help in steward offices. If you have problems, contact user groups, listservs, web site.
  • Request access. Each database must be requested separately. Requests via e-mail. Instructions on the web site. Set up your workstation. Purchase client license. Install client tools for Windows. Instructions on the web site. Select a query tool. ODBC-compliant. Install on your PC. Get training through HRDC and OAS: hands-on for Access, Excel, FilemakerPro Learn the data. Work with it, documentation on the web site, help in steward offices. If you have problems, contact user groups, listservs, web site.
  • Request access. Each database must be requested separately. Requests via e-mail. Instructions on the web site. Set up your workstation. Purchase client license. Install client tools for Windows. Instructions on the web site. Select a query tool. ODBC-compliant. Install on your PC. Get training through HRDC and OAS: hands-on for Access, Excel, FilemakerPro Learn the data. Work with it, documentation on the web site, help in steward offices. If you have problems, contact user groups, listservs, web site.
  • Request access. Each database must be requested separately. Requests via e-mail. Instructions on the web site. Set up your workstation. Purchase client license. Install client tools for Windows. Instructions on the web site. Select a query tool. ODBC-compliant. Install on your PC. Get training through HRDC and OAS: hands-on for Access, Excel, FilemakerPro Learn the data. Work with it, documentation on the web site, help in steward offices. If you have problems, contact user groups, listservs, web site.
  • Web site has lots of useful information. Growing Knowledge base has tips, sample queries, troubleshooting
  • Can get to data to do analysis. We can work with better information, make decisions based on data. Mainframe cycles can be used for transaction processing. Programmers can use talents to enhance the mainframe systems instead of writing ad-hoc reports. Can retrieve data in minutes rather than days. Data can be retrieved with limitless combination of criteria and can be exported into other desktop tools for analysis and manipulation.
  • Can get to data to do analysis. We can work with better information, make decisions based on data. Mainframe cycles can be used for transaction processing. Programmers can use talents to enhance the mainframe systems instead of writing ad-hoc reports. Can retrieve data in minutes rather than days. Data can be retrieved with limitless combination of criteria and can be exported into other desktop tools for analysis and manipulation.
  • Can get to data to do analysis. We can work with better information, make decisions based on data. Mainframe cycles can be used for transaction processing. Programmers can use talents to enhance the mainframe systems instead of writing ad-hoc reports. Can retrieve data in minutes rather than days. Data can be retrieved with limitless combination of criteria and can be exported into other desktop tools for analysis and manipulation.
  • Can get to data to do analysis. We can work with better information, make decisions based on data. Mainframe cycles can be used for transaction processing. Programmers can use talents to enhance the mainframe systems instead of writing ad-hoc reports. Can retrieve data in minutes rather than days. Data can be retrieved with limitless combination of criteria and can be exported into other desktop tools for analysis and manipulation.
  • Data warehouse will continue to grow. New databases, new tables, new fields. Mainframe used for transaction processing. Very few ad-hoc reports will be run there. Only reports that require live data. No new AIDA development. No enhancements to existing AIDA’s.

Transcript

  • 1. INTRODUCTION TO THE PENN STATE DATA WAREHOUSE
  • 2. Where Have We Been?
    • Transactions processed on the mainframe -- ISIS, IBIS, ADIS
    • Many years of historical data, millions of mainframe records
    • Accessing mainframe data requires knowledge and skills in Natural programming language
  • 3. Steps Required to get Data from the Mainframe
    • Security forms to get access to files
      • Complete form. Specify purpose of request, what data is required. Signed by Access and Security Representative (ASR). Must be approved by all data stewards (may be several).
  • 4. Steps Required to get Data from the Mainframe
    • Security forms to get access to files
    • Write a Natural program
      • Natural is a complex language, requires programming skills and extensive training. File structures are complex.
  • 5. Steps Required to get Data from the Mainframe
    • Security forms to get access to files
    • Write a Natural program
    • Write the JCL
      • Use Roscoe system to write J ob C ontrol L anguage.
  • 6. Steps Required to get Data from the Mainframe
    • Security forms to get access to files
    • Write a Natural program
    • Write the JCL
    • Submit the job
      • Goes into a queue with other similar jobs: first in, first out.
  • 7. Steps Required to get Data from the Mainframe
    • Security forms to get access to files
    • Write a Natural program
    • Write the JCL
    • Submit the job
    • Wait at least overnight
      • Jobs in queue run only at night. Queues can be quite long. Some jobs will wait several days in the queue until they reach the top.
  • 8. Steps Required to get Data from the Mainframe
    • Security forms to get access to files
    • Write a Natural program
    • Write the JCL
    • Submit the job
    • Wait at least overnight
    • Job failed? Repeat steps
      • Check the status. If anything failed, fix problems and repeat the steps.
  • 9. Steps Required to get Data from the Mainframe
    • Security forms to get access to files
    • Write a Natural program
    • Write the JCL
    • Submit the job
    • Wait at least overnight
    • Job failed? Repeat steps
    • Days or weeks to complete
      • Experienced Natural programmer writing a simple job will take several days of coding, testing, running. Anything complicated can take longer.
  • 10. Solution to a problem -- AIDA
    • A dministrative I nformation D ecision A id (AIDA)
    • Available for ISIS (registration, admissions, enrollment) and IBIS (human resources) data
    • Run on the mainframe against extract (i.e. not “live” data) files
    • Create reports or data files for downloading
  • 11. Solution to a problem -- AIDA
    • AIDA’s were developed to fulfill ad-hoc reporting needs
    • ADVANTAGES : DISADVANTAGES :
    • No programming required Inflexible
    • Easy to use Outdated technology
    • Available to everyone, Long development time
    • even from terminals for a new AIDA
    • Eliminated a lot of ad-hoc Difficult to change or to
    • programming add new features
  • 12. What is a Data Warehouse?
    • The consolidation of data from mainframe legacy systems into subject-oriented tables that are accessible through desktop tools
  • 13. What is a Data Warehouse?
    • Move the data from the mainframe to a server where it can be accessed from the user’s PC
    • Snapshot data, NOT live
    • Use for data analysis, NOT operational
    Data extracted periodically. Changes on the mainframe may not be reflected on the warehouse for a week or more. Transcripts are on the warehouse, but official transcripts are only available through ISIS.
  • 14. AIS Three-Tier Data Structure EIS summary data Data Warehouse detail data extracted periodically ISIS, IBIS, ADIS detail transactions processed immediately Enterprise Information System extracts data from the warehouse and summarizes it Data Warehouse extracts detail data from the mainframe on a periodic schedule On-line transactions update the mainframe systems immediately
  • 15. Data Transformation
    • Data goes through a series of steps as
    • it is moved to the warehouse:
    • Extract programs
      • Write Natural programs to extract data from the mainframe data base
  • 16. Data Transformation
    • Data goes through a series of steps as
    • it is moved to the warehouse:
    • Extract programs
    • Verify data
        • Verify accuracy and consistency of data -- ensure “data legibility”
  • 17. Data Transformation
    • Data goes through a series of steps as
    • it is moved to the warehouse:
    • Extract programs
    • Verify data
    • Create tables
      • Create “normalized” tables on the warehouse -- eliminate data redundancy (i.e. address appears in one place only)
  • 18. Data Transformation
    • Data goes through a series of steps as
    • it is moved to the warehouse:
    • Extract programs
    • Verify data
    • Create tables
    • Load tables
        • Load warehouse tables with extracted data
  • 19. Data Transformation
    • Data goes through a series of steps as it is moved
    • to the warehouse:
    • Extract programs
    • Verify data
    • Create tables
    • Load tables
    • Refresh data
        • Establish a schedule to refresh the data. Frequency depends on volatility of the data. Some refreshed weekly, some once per semester
  • 20. How is the Warehouse Accessed? Data Warehouse End Users Query Tool to retrieve data Tools are off-the-shelf software that run on the desktop. Users can purchase whatever package they want based on platform, price, preference.
  • 21. Features of Query Tools
    • Easy access to data
      • Programming ability not required. Tools have G raphical U ser I nterface -- point and click.
  • 22. Features of Query Tools
    • Easy access to data
    • Quick results
      • Results are returned quickly, often within minutes.
  • 23. Features of Query Tools
    • Easy access to data
    • Quick results
    • Interactive approach to creating reports
      • Query criteria can be easily and quickly adjusted to modify results or obtain additional data
  • 24. Features of Query Tools
    • Easy access to data
    • Quick results
    • Interactive approach to creating reports
    • Data available on the user’s desktop
      • Results returned directly to desktop
  • 25. Features of Query Tools
    • Easy access to data
    • Quick results
    • Interactive approach to creating reports
    • Data available on the user’s desktop
    • Manipulation of data for customized report layout
      • Data can be exported to other desktop software for formatting and analysis
  • 26. Features of Query Tools
    • Easy access to data
    • Quick results
    • Interactive approach to creating reports
    • Data available on the user’s desktop
    • Manipulation of data for customized report layout
    • Unlimited retrieval of data
      • Amount of data retrieved limited only by query tool and size of PC hard drive
  • 27. Query Tool Requirements
    • To access Penn State’s data warehouse, you can purchase any query tool that satisfies the following two criteria:
    • Uses Structured Query Language (SQL)
    • Supports Open Data Base Connectivity (ODBC)
  • 28. Readily Available Query Tools
    • Microsoft Access
    • Microsoft Excel
    • Microsoft Query
    If you have Microsoft Office, you already have three query tools on your desktop that will work with Penn State’s data warehouse.
  • 29. Who Can Access the Warehouse?
    • Users in departments, colleges, campuses
      • Hundreds of Penn State data warehouse users in departments, colleges, campuses.
  • 30. Who Can Access the Warehouse?
    • Users in departments, colleges, campuses
    • Two requirements for warehouse users:
      • Understand the data
        • This understanding is critical and not simple to acquire. If you’ve used AIDA or the mainframe systems, you already have some of this knowledge. Takes time and training.
  • 31. Who Can Access the Warehouse?
    • Users in departments, colleges, campuses
    • Two requirements for warehouse users:
      • Understand the data
      • Familiar with query tools
        • Hands-on training available
  • 32. How to Get Connected
    • A PC or Macintosh connected to the Penn State
    • Backbone with TCP/IP communications software
    • SQL Client License or MS BackOffice license
    • (Purchased from the M icrocomputer O rder C enter)
    • For Windows, SQL Client Tools (provided by the
    • MOC when the SQL Client License is purchased)
    • must be installed on the PC. Not required for Macintosh.
    • Open Data Base Connectivity (ODBC) compliant
    • query tool
  • 33. First Steps for New Users
    • Be authorized as a warehouse user
      • Request access for each database separately. Requests submitted via e-mail. Instructions on the web site.
  • 34. First Steps for New Users
    • Be authorized as a warehouse user
    • Set up your workstation
      • Purchase client license. Install client tools for Windows. Instructions on the web site.
  • 35. First Steps for New Users
    • Be authorized as a warehouse user
    • Set up your workstation
    • Select a query tool
      • ODBC-compliant. Install on your PC.
  • 36. First Steps for New Users
    • Be authorized as a warehouse user
    • Set up your workstation
    • Select a query tool
    • Get training
      • Training offered through HRDC and TLT: hands-on and web-based
  • 37. First Steps for New Users
    • Be authorized as a warehouse user
    • Set up your workstation
    • Select a query tool
    • Get training
    • Learn the data
      • Work with it; documentation on the web site; help available from steward offices.
  • 38. First Steps for New Users
    • Be authorized as a warehouse user
    • Set up your workstation
    • Select a query tool
    • Get training
    • Learn the data
    • Get help
      • User groups, listservs, web site
  • 39. More Detailed Information ... on our Web Site http://ais.its.psu.edu/dataware/ Web site contains data documentation, instructions for setting up your PC to access the warehouse, sample queries, schedules for training and user group meetings.
  • 40. Benefits of the Warehouse
    • Improves access to administrative information for faculty and staff
      • Can get data quickly and easily to do analysis. We can work with better information, make decisions based on data.
  • 41. Benefits of the Warehouse
    • Improves access to administrative information for faculty and staff
    • Reduces cost
      • Mainframe can be used for transaction processing. Programmers can use skills to enhance the mainframe systems instead of writing ad-hoc reports.
  • 42. Benefits of the Warehouse
    • Improves access to administrative information for faculty and staff
    • Reduces cost
    • Timely retrieval of data
      • Can retrieve data in minutes rather than days.
  • 43. Benefits of the Warehouse
    • Improves access to administrative information for faculty and staff
    • Reduces cost
    • Timely retrieval of data
    • Flexibility
      • Data can be retrieved with limitless combinations of criteria and can be exported into other desktop tools for analysis and manipulation.
  • 44. What Does the Future Hold?
    • More data will be added to the warehouse
    • Mainframe will primarily be used for transaction processing