ECU’s Extract, Transform, and Load (ETL) Framework consists of two paths for loading external data into the Operational Data Store (ODS): one for non-Oracle data sources (Microsoft SQL Server, MS Access databases, web services) and one for Oracle data sources. The path is determined by the external system and by the mechanism available to connect to it and extract its data. When the external system does not allow an Oracle-to-Oracle connection, Microsoft SQL Server Integration Services (SSIS) is used as the foundation for the non-Oracle data source path. When the external system allows an Oracle-to-Oracle connection, the Oracle data source path is selected.
In this session we will present several major projects showcasing how ECU leverages Microsoft SQL Server Integration Services (SSIS), Oracle Streams, and the Ellucian/Banner ODS ETL process to load various types of external data into the Ellucian/Banner Operational Data Store (ODS).
ECU ODS data integration using OWB and SSIS UNC Cause 2013
1. UNC CAUSE 2013
Integrating Oracle and non-Oracle External Data into the Ellucian/Banner ODS using Oracle Warehouse Builder (OWB) and Microsoft SQL Server Integration Services (SSIS)
East Carolina University Enterprise Analytics
Ruben Villasmil - villasmilr@ecu.edu
Keith Washer - washerk@ecu.edu
2. Integrating Oracle and non-Oracle External Data into ODS
Why this session?
East Carolina University relies on numerous information systems to accomplish its mission and needs unified reporting across them.
In this session we will present several projects showcasing how ECU leverages the Ellucian/Banner ETL methodology, Oracle Streams, OWB, and Microsoft SQL Server Integration Services (SSIS) to load various types of external data into the ECU Operational Data Store (ODS).
5. Integrating Oracle and non-Oracle External Data into ODS
ECU's Extract Transform and Load (ETL) Framework consists of two paths for loading
external data into the Ellucian/Banner Operational Data Store (ODS):
• Oracle Data Source Path:
1. Data resides in Oracle.
2. ECU DBAs manage the data sources (e.g., BlackBoard, DegreeWorks).
3. Tools and methods:
   • Ellucian’s ODS ETL process
   • Oracle Warehouse Builder and, where necessary, SQL and PL/SQL scripts
4. Rationale:
   • The infrastructure for this path was already in place as a result of the ODS implementation. No additional tools were required and no additional cost was incurred for this path.
• Non-Oracle and Oracle (non-managed) Data Source Path:
1. Data resides in non-Oracle systems such as Microsoft SQL Server, MS Access databases, web services, or flat files.
2. Data resides in an external Oracle system and ECU DBAs do not manage the data source.
3. Tools and methods:
   • Microsoft SQL Server Integration Services (SSIS) with best practices and standards.
4. Rationale:
   • In-house expertise with Microsoft’s Business Intelligence stack (Reporting Services, Integration Services, Analysis Services) architecture and products: MS SQL Server, BIDS/Visual Studio, SharePoint.
   • The tool provides native connectivity components to heterogeneous systems.
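The choice between the two paths can be sketched as a small decision function. This is only an illustration of the rule described on this slide, not ECU's tooling: the `DataSource` type, its field names, and the platform strings are assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSource:
    """Hypothetical description of an external system (names are illustrative)."""
    name: str
    platform: str              # e.g. "oracle", "sqlserver", "access", "webservice", "flatfile"
    managed_by_ecu_dbas: bool  # True when ECU DBAs manage the source database


def choose_etl_path(source: DataSource) -> str:
    """Return "oracle" for the Oracle data source path, otherwise "ssis".

    The Oracle path applies only when the data resides in Oracle AND ECU DBAs
    manage the source; every other case goes through SSIS.
    """
    if source.platform.lower() == "oracle" and source.managed_by_ecu_dbas:
        return "oracle"
    return "ssis"


if __name__ == "__main__":
    for src in (
        DataSource("DegreeWorks", "oracle", True),
        DataSource("RAMSeS", "sqlserver", False),
        DataSource("External vendor DB", "oracle", False),
    ):
        print(src.name, "->", choose_etl_path(src))
```

Keeping the rule in one place makes it easy to see that an externally managed Oracle system is treated like any other non-Oracle source.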
6. Integrating Oracle and non-Oracle External Data into ODS (Continued)
Oracle Data Source Path projects include:
• BlackBoard: Reduced from 1 billion records to 20 million+ records for reporting. Oracle Streams is used to pre-filter which data is replicated, and summary Object Access views are used to load the final tables for reporting.
• Sciquest XML data extracts: Oracle Streams is used to replicate the CLOB containing the Sciquest XML message. Oracle XML syntax is then used in the ETL process to parse the XML message and load Requisition and Purchasing data into the ODS.
• DegreeWorks (DW): Oracle Streams is used to replicate the necessary DW tables. The standard ETL process is then used to load DW data into the ODS. Data is presented to users via 38 CPA reporting views.
Non-Oracle and Oracle (non-managed) Data Source Path projects include:
• SSIS Infrastructure: ECU SSIS Package Automation Tool, ETL package logging, ETL package execution reporting
• RAMSeS (Research Administration Management System & eSubmission): a comprehensive web-based Electronic Research Administration (eRA) system to manage research more efficiently and effectively. RAMSeS includes several years of proposal and award data.
10. Oracle Data Source Path: ECU ETL OBJECT SUMMARY
ECU custom ETLs and reporting views footprint are approaching Ellucian’s.
ECU ETL Footprint
Load Group            Streamed Tables   OWB Maps
ECU and ECUODS        50*               179 (ECU schema 140, ECUODS schema 39)
Census Day Freeze                       14
Sciquest              1                 21
BlackBoard (BB)**     13                4
DegreeWorks (DW)      22                22
Total                                   240
* In addition to Ellucian-delivered streamed tables
** Currently developing a new ETL process to freeze BB student grade book data. The process will add 7 additional OWB maps.
Other objects related to ECU ETLs
Reporting views: 452
ELLUCIAN’S ODS FOOTPRINT
ODSMGR OWB ETL
Load Maps      252
Update Maps    225
Delete Maps    226
ODSMGR Reporting views: 505
Streamed Tables
Schema        (PBAN)    stage (ODS)
FAISMGR          403             94
FIMSMGR          571            206
GENERAL          358            111
ONESTOP *        297             36
PAYROLL          485            168
PORTAL2 *        144              8
POSNCTL          146             31
SATURN         1,230            436
TAISMGR          158             49
Total          3,792          1,139
* ECU developed schemas
11. ORACLE Data Path: BlackBoard
• System: BlackBoard Learning Management System.
• Requirements: Identify tool/application utilization per Academic Period.
• Tools: Oracle, BB Data dictionary. UMBC BB Project (http://www.umbc.edu/oit/newmedia/blackboard/stats/ )
• Challenges:
• Managing 1 billion records for reporting (the number of records needed to determine utilization by college, student profile, and course attributes).
• Identifying application paths for summary data.
• Joining BB data with ODS course data.
• Project summary: Developed 4 OWB maps to extract data from streamed tables. Developed 7 reporting views and 7 BIDS reports.
12. ORACLE Data Path: BlackBoard
Solving the Challenge:
•Developed summary composite views grouped by month, course, user, and application (reducing the data for reporting from 1 billion to 20 million records).
•Identified additional application paths by extracting the information from the activity accumulator “DATA” column.
•Mapped BB users to the ODS by Banner ID. Used ODS person data to get student demographics.
•Mapped BB course identifiers to the ODS by parsing the course batch ID (batch_uid):
SUBSTR (C.batch_uid, 7) bb_academic_period,
SUBSTR (C.batch_uid, 1, 5) bb_crn where INSTR (batch_uid, '.') = 6.
•Created an OWB map to extract ODS Academic Study data into a separate table to improve performance.
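Two of the steps above, parsing the batch ID and summarizing activity by month, course, user, and application, can be sketched in Python. This only mirrors the logic of the SUBSTR/INSTR expressions and the grouping described on the slide; the sample value `12345.201380`, the event dictionary keys, and the function names are assumptions made for illustration.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple


def parse_batch_uid(batch_uid: str) -> Optional[Tuple[str, str]]:
    """Return (crn, academic_period), or None when the first '.' is not character 6.

    Mirrors: SUBSTR(batch_uid, 1, 5), SUBSTR(batch_uid, 7), INSTR(batch_uid, '.') = 6.
    Oracle positions are 1-based, Python indexes are 0-based.
    """
    if batch_uid.find(".") != 5:
        return None
    return batch_uid[:5], batch_uid[6:]


def summarize_activity(events: Iterable[dict]) -> Counter:
    """Count events per (month, course, user, application).

    Each event is assumed to carry an ISO-style "timestamp" string (YYYY-MM-...),
    plus "course", "user", and "application" keys (hypothetical names).
    """
    counts: Counter = Counter()
    for event in events:
        month = event["timestamp"][:7]  # "YYYY-MM"
        counts[(month, event["course"], event["user"], event["application"])] += 1
    return counts


if __name__ == "__main__":
    print(parse_batch_uid("12345.201380"))
    sample = [
        {"timestamp": "2013-09-02T10:00", "course": "12345", "user": "u1", "application": "gradebook"},
        {"timestamp": "2013-09-15T11:30", "course": "12345", "user": "u1", "application": "gradebook"},
        {"timestamp": "2013-10-01T09:00", "course": "12345", "user": "u1", "application": "gradebook"},
    ]
    for key, n in summarize_activity(sample).items():
        print(key, n)
```

Collapsing many raw events into one row per group is what makes the reduction in row count possible; the actual reduction depends on how many distinct groups the data contains.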
16. Oracle Data Source Path: Sciquest
• System: SCIQUEST Requisition (PR) and Purchasing (PO) system (Third party vendor).
• Requirements: Extract PR and PO data from xml message delivered nightly by Sciquest.
• Tools: Oracle, XMLSpy
• Challenges:
• Security: Validating the “original” XML required Oracle to go over the internet to Sciquest.
• Performance: Oracle XDB parsing/validating was impacting other processes on the production database.
• Performance: Extracting XML data via views for reporting is sluggish (messages are 10 MB+).
• Oracle issues when pivoting data extracted from XML via relational views.
• Handling PO/PR updates: each Sciquest XML message contains the latest PO/PR information.
• Project summary: Developed 21 OWB maps to extract XML data from 1 streamed table. Developed 16 PO reporting views and 13 PR reporting views. The user is developing a report solution in BIDS.
17. Oracle Data Source Path: Sciquest
Solving the Challenge:
•Security: Removed Sciquest Schema reference from XML message.
REGEXP_REPLACE (SUBSTR (HTTP_CONTENT, 1, 1000),
'(.*?)(<!DOCTYPE.*?">)(.*)|(xmlns="http://solutions.*?xsd")(.*)', '\1\3')
•Performance validating XML: Streamed the xmlreceipt table to the ODS. Leveraged the Oracle 11g XMLType/CLOB, which allows XML extract functions to be used without validating the entire content (the user assumes the XML is valid). There is no need for XDB to parse the XML.
•Performance querying XMLType: Changed the approach for reporting. Data is extracted nightly and appended to composite tables. Reporting views are based on the composite tables, with no XML extracts in the reporting views.
•PO/PR updates: Created an OWB map to track the transactions loaded. Created delete OWB maps for the PO and PR composite tables. The PO and PR OWB maps are set to insert only.
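The security fix, removing the external schema reference from the head of the message so nothing has to be fetched over the internet, can be sketched in Python. This is a rough analogue of the idea rather than the Oracle expression itself: it only handles the DOCTYPE case, and the function name, the 1000-character window default, and the sample message are assumptions.

```python
import re

# Matches a DOCTYPE declaration that ends with a quoted system identifier, e.g.
# <!DOCTYPE Msg SYSTEM "http://example.invalid/msg.dtd">
DOCTYPE_PATTERN = re.compile(r'<!DOCTYPE.*?">', re.DOTALL)


def strip_doctype(xml_text: str, head_chars: int = 1000) -> str:
    """Remove the first DOCTYPE declaration found in the first `head_chars` characters.

    Only the head of the message is searched, so large message bodies are never
    scanned. A declaration that straddles the window boundary is left untouched.
    """
    head, tail = xml_text[:head_chars], xml_text[head_chars:]
    return DOCTYPE_PATTERN.sub("", head, count=1) + tail


if __name__ == "__main__":
    message = (
        '<?xml version="1.0"?>\n'
        '<!DOCTYPE Msg SYSTEM "http://example.invalid/msg.dtd">\n'
        "<Msg><Id>1</Id></Msg>"
    )
    print(strip_doctype(message))
```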
20. Oracle Data Source Path: Sciquest
Used in first OWB map to load the latest xml message
Sample xml extract used in other OWB maps
(18 composite views use this method)
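As a rough illustration of what extracting fields from an XML message into flat rows involves, the sketch below uses Python's standard `xml.etree.ElementTree`. It is not the Oracle XML syntax used in the OWB maps, and every element name (`PurchaseOrder`, `POId`, `Supplier`, `Total`) is hypothetical; the real Sciquest message layout is not shown here.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional


def extract_po_rows(xml_text: str) -> List[Dict[str, Optional[str]]]:
    """Flatten each (hypothetical) <PurchaseOrder> element into one row.

    Missing child elements yield None, so a row is still produced.
    """
    root = ET.fromstring(xml_text)
    rows = []
    for po in root.iter("PurchaseOrder"):
        rows.append(
            {
                "po_id": po.findtext("POId"),
                "supplier": po.findtext("Supplier"),
                "total": po.findtext("Total"),
            }
        )
    return rows


if __name__ == "__main__":
    message = (
        "<Message>"
        "<PurchaseOrder><POId>P1</POId><Supplier>Acme</Supplier><Total>10.50</Total></PurchaseOrder>"
        "<PurchaseOrder><POId>P2</POId><Supplier>Globex</Supplier></PurchaseOrder>"
        "</Message>"
    )
    for row in extract_po_rows(message):
        print(row)
```

Rows produced this way can be appended to a composite table, which is the shape the reporting views then read from.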
21. Oracle Data Source Path: Sciquest
Delete map: deletes the same PO/PR ID if it exists in the composite tables.
Sample OWB Map:
LOAD_EFT_PO_HDR_CUST_FIELDS
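The delete-then-insert pattern behind the delete maps can be sketched with an in-memory SQLite database. This is an assumption-laden stand-in for the OWB maps: the table name `po_composite`, its columns, and the function name are hypothetical, and SQLite is used only so the example runs anywhere.

```python
import sqlite3
from typing import Dict, List


def load_latest(conn: sqlite3.Connection, rows: List[Dict[str, str]]) -> None:
    """Replace any existing rows for the incoming PO IDs, then insert the new rows.

    Step 1 plays the role of the delete map; step 2 is an insert-only load.
    Both run in one transaction so readers never see a half-applied batch.
    """
    ids = [(row["po_id"],) for row in rows]
    with conn:  # commits on success, rolls back on error
        conn.executemany("DELETE FROM po_composite WHERE po_id = ?", ids)
        conn.executemany(
            "INSERT INTO po_composite (po_id, status) VALUES (:po_id, :status)", rows
        )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE po_composite (po_id TEXT, status TEXT)")
    load_latest(conn, [{"po_id": "P1", "status": "open"}])
    # A later message carries the latest state of P1 plus a new PO.
    load_latest(conn, [{"po_id": "P1", "status": "closed"}, {"po_id": "P2", "status": "open"}])
    print(conn.execute("SELECT po_id, status FROM po_composite ORDER BY po_id").fetchall())
```

Because each message carries the latest state of a PO or PR, deleting by ID before inserting keeps exactly one current version per ID without needing update logic.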
22. Oracle Data Source Path: DegreeWorks
• System: Ellucian DegreeWorks, a curriculum and planning tool for students and advisors.
• Requirements: Provide access to DegreeWorks audit and planning data to the Registrar's Office and advisors via the ODS.
• Tools: Oracle, DW CPA reporting guide, DW sample reports (earlier versions).
• Challenges:
• Data structures in DW (CHAR vs. VARCHAR).
• Project summary: Developed 22 OWB maps. Developed 38 DW reporting views. During development, issues with the audit data were identified, and Ellucian provided updated software to correct them. Currently working with the users to develop a report solution.
23. ORACLE Data Path: DegreeWorks
Solving the Challenge:
•Data structures: Developed a script that leverages the data dictionary information to automate the DDL creation for composite views. Composite views cast and trim each source table column based on the column data type:
CAST (TRIM (DAP_STU_ID) AS VARCHAR2 (10)) DAP_STU_ID,
CAST (TRIM (DAP_SCHOOL) AS VARCHAR2 (12)) DAP_SCHOOL,
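A script of this kind can be sketched in Python as a generator that turns column metadata into view DDL. This is a minimal sketch, not ECU's script: it assumes the metadata arrives as (column name, data type, length) tuples, such as might be read from Oracle's `ALL_TAB_COLUMNS`, and the view and table names in the example are hypothetical.

```python
from typing import Iterable, Tuple

Column = Tuple[str, str, int]  # (column_name, data_type, data_length)


def composite_view_ddl(view_name: str, source_table: str, columns: Iterable[Column]) -> str:
    """Build CREATE VIEW DDL that trims CHAR columns and casts them to VARCHAR2.

    Non-CHAR columns are passed through unchanged.
    """
    select_items = []
    for name, data_type, length in columns:
        if data_type.upper() == "CHAR":
            select_items.append(f"CAST (TRIM ({name}) AS VARCHAR2 ({length})) {name}")
        else:
            select_items.append(name)
    select_list = ",\n    ".join(select_items)
    return (
        f"CREATE OR REPLACE VIEW {view_name} AS\n"
        f"SELECT\n    {select_list}\n"
        f"FROM {source_table}"
    )


if __name__ == "__main__":
    # Column names and lengths follow the slide; the table, view, and
    # DAP_CREATE_DATE column are made up for the example.
    print(
        composite_view_ddl(
            "DW_COMPOSITE_VIEW",
            "DW_SOURCE_TABLE",
            [("DAP_STU_ID", "CHAR", 10), ("DAP_SCHOOL", "CHAR", 12), ("DAP_CREATE_DATE", "DATE", 7)],
        )
    )
```

Generating the DDL from metadata keeps the 38 reporting views consistent and avoids hand-writing a cast for every padded CHAR column.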
30. SSIS Data Path: RAMSeS
• Source System: Microsoft SQL Server (hosted at UNC-Chapel Hill)
• Application: RAMSeS
• Objects analyzed: 125 Tables + 23 Views = 148 Source Objects
• Tools used: Microsoft Business Intelligence Development Studio, C# and .NET, SQL Server Integration Services, ECU SSIS Package Automation Tool, TOAD, SQL scripts
• Challenges:
• Previously developed ETL packages took several days to several weeks to complete, with only 6-12 source objects.
• Inconsistency in applying SSIS package naming and ETL standards alongside standard SSIS design.
• Previously developed SSIS packages had no robust logging or package execution reporting.
• Project summary:
Microsoft SQL Server Integration Services and ECU’s SSIS Package Automation Tool are used to create an ETL package that extracts data from the RAMSeS database and loads it into the Ellucian/Banner ODS. Integrating the RAMSeS data into the ODS allows the department of Institutional Research to compile an annual report in under 4 hours, which previously required a full 8 hours. Creating the initial SSIS packages has been reduced to under 5 minutes using the ECU SSIS Package Automation Tool. SSIS logging within the package is used to track execution errors, warnings, and package duration. The existing BI stack (Reporting Services) hosts an SSIS Package Execution Summary Report for daily monitoring of SSIS ETL package executions.
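The kind of roll-up an execution summary report performs, errors, warnings, and duration per package, can be sketched as follows. This is a guess at the shape of the report rather than its actual query: the log row keys are hypothetical, and the event names `PackageStart`, `PackageEnd`, `OnError`, and `OnWarning` are assumed to follow SSIS log event naming.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable


def summarize_executions(log_rows: Iterable[dict]) -> Dict[str, dict]:
    """Roll log rows up to one summary per package.

    Each row is assumed to have "package", "event", and "time" (datetime) keys.
    Duration is None when either the start or end event is missing.
    """
    state = defaultdict(lambda: {"errors": 0, "warnings": 0, "start": None, "end": None})
    for row in log_rows:
        entry = state[row["package"]]
        if row["event"] == "OnError":
            entry["errors"] += 1
        elif row["event"] == "OnWarning":
            entry["warnings"] += 1
        elif row["event"] == "PackageStart":
            entry["start"] = row["time"]
        elif row["event"] == "PackageEnd":
            entry["end"] = row["time"]

    report = {}
    for package, entry in state.items():
        complete = entry["start"] is not None and entry["end"] is not None
        report[package] = {
            "errors": entry["errors"],
            "warnings": entry["warnings"],
            "duration_seconds": (entry["end"] - entry["start"]).total_seconds() if complete else None,
        }
    return report


if __name__ == "__main__":
    rows = [
        {"package": "RAMSES_LOAD", "event": "PackageStart", "time": datetime(2013, 10, 1, 2, 0, 0)},
        {"package": "RAMSES_LOAD", "event": "OnWarning", "time": datetime(2013, 10, 1, 2, 1, 0)},
        {"package": "RAMSES_LOAD", "event": "PackageEnd", "time": datetime(2013, 10, 1, 2, 5, 0)},
    ]
    print(summarize_executions(rows))
```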
31. SSIS Data Path: RAMSeS
Solving the Challenge:
•Implemented staging and target schemas in the ODS.
•Used the SSIS Import/Export Wizard to quickly generate the RAMSeS stage and target destination tables.
•Developed the ECU SSIS Package Automation Tool (Script Task, C# .NET), integrating ECU’s methodology and existing ETL standards to efficiently build a standardized ETL package.
•Developed an object mapping table to support validation.
•Leveraged existing SSIS logging features, which the automation tool configures automatically during package creation.
•Leveraged the existing Reporting Services instance to host an SSIS Package Execution Summary Dashboard, developed in BIDS, for daily monitoring of package/ETL job execution.
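One way an object mapping table might support validation is by checking that every source object has a mapping and that no two sources share a target. The sketch below illustrates that idea only; the mapping row layout (`source`, `target`), the function name, and the sample object names are assumptions, not the structure of ECU's table.

```python
from collections import Counter
from typing import Dict, Iterable, List


def validate_mapping(source_objects: Iterable[str], mapping: List[Dict[str, str]]) -> Dict[str, List[str]]:
    """Report source objects with no mapping and targets mapped more than once."""
    mapped_sources = {row["source"] for row in mapping}
    unmapped = sorted(set(source_objects) - mapped_sources)

    target_counts = Counter(row["target"] for row in mapping)
    duplicate_targets = sorted(t for t, n in target_counts.items() if n > 1)

    return {"unmapped": unmapped, "duplicate_targets": duplicate_targets}


if __name__ == "__main__":
    sources = ["dbo.Proposal", "dbo.Award", "dbo.Sponsor"]  # hypothetical source objects
    mapping = [
        {"source": "dbo.Proposal", "target": "STG_PROPOSAL"},
        {"source": "dbo.Award", "target": "STG_PROPOSAL"},  # deliberate clash
    ]
    print(validate_mapping(sources, mapping))
```

With 148 source objects, a check like this makes gaps visible before packages are generated rather than after a load fails.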
32. ECU ETL SSIS ARCHITECTURE for External Data Sources
1. SSIS Import Export Wizard (Creates Destination Tables in ODS)
2. Create Object Mapping Table
3. Generate SSIS ETL Package Encoded with design/ETL Standards
4. Deploy Package to SSIS ETL Server
[Architecture diagram. SOURCE: Web Services, RDBMS, Flat Files. DESTINATION: ODS. Other components shown: Integration Services, SQL Agent Job, SSIS Package Store, SSIS Logging, ETL Job Execution Reporting. Step markers 5 and 6 appear in the diagram.]
33. SQL Server Import Export Wizard:
Creating Destination Tables in the ECUBIC SSIS Staging Schema (ODS)
Enterprise system objective: any ECU system used for enterprise-wide decision making must have its data reside in the ODS.
Reason for this session: the complexity of an enterprise data store system.
ECU is committed to making the ODS an enterprise-wide university data repository.