Bhawani prasad data integration-ppt
Published in Technology, Education
Transcript

  • 1. Data Integration Strategy. April 10, 2013. Bhawani Nandan Prasad, BI Practice Head; SMP (IIM Calcutta), MBA (Stratford University, USA), B.E. (IT)
  • 2. Integration Strategy Agenda
    – IIF Moment
    – Integration Scope and Frame
    – Business Requirements
    – Integration Technology Comparison (Best Practices and Industry Research)
    – Integration Strategy Decisions
    – Recommendation: Integration Strategy and Architecture
    – Appendix: Lessons Learned
  • 3. IT Integration Scope and Frame
  • 4. R00#1 Frame and Data Integration Charter
    In the frame:
    – Define IT enablement needs for the key Business Process Areas that are not met by Advanced Technical Applications, e.g., L&D Support Systems, portals to facilitate workflow, Management Reporting System(s).
    – Define standards for data definition and information flow. Design the data architecture and repositories to enable reporting and data sharing across applications, including the “Given” applications.
    – Define standards for easy integration of new applications (design for growth).
    – Support the Advanced Technical Apps team in meeting ESAS and other standards and in ensuring data integration.
    – Validate the infrastructure needs of applications against the PCIS environment. Initiate change if necessary.
    – Phase 4 planning: Proofs of Concept, e.g., implement database structures and BI tools (internal pilots).
    – Application and Information Governance design.
    Data Integration Charter (“Robust”; shows the R00#1 match-play selection):
    – Messaging, data transformation, federation architecture
    – Data warehousing, archive
    – Enterprise data layer
    – Business integration/analytics
    – Bi-directional transfer of data
  • 5. IT Integration Scope [diagram; legend: Key Focus Area, Related Area]
  • 6. Conceptual Architecture: Integration Focus Area [diagram; legend: Key Focus Area, Related Area]
    – Presentation: Refining Optimization Center; Field Reliability Center
    – Business Analytics (Dimensional Models)
    – Integration Services: Physical Integration (Project DB; Extract, Transform & Load); Virtual Integration
    – Information Integration Services
    – Process & Workflow
    – Business Services: Scheduling; Simulations; Planning; Shipping; Maintenance; Blending; Historian; Production Operations
    – Information Sources: Data Integration; Application Data; Document Repositories; Data Sources; DCS & instruments; OE/Reliability; Test Results
    – Data Governance
    – Master Data Top 10: Material; Production; Wharf; Tank
  • 7. Solution Integration Architecture and Information Architecture
    Solution Integration Architecture focus areas:
    – Process Integration: Workflow, Orchestration, ESB
    – Business Performance Mgmt: Business Activity Monitoring
    – Collaboration: Communications
    Information Architecture focus areas:
    – Data Integration: EAI, ETL, EII, CDC, Replication
    – Data Management: Repository for Core Data standardization; Data Governance; Master Data Management; Taxonomy
    – Data Standardization: Chevron/industry standard model definition; Semantic model
    – Business Intelligence (BI): Reporting; Dashboard
    – Unstructured Data Management: Semantic Web; Ontology
  • 8. IT Integration Business Requirements
  • 9. Console Operator’s Data View [diagram; legend: IT scope, AT scope, Existing]
    Systems around the Console Operator HMI 360 view: ROC; Oil Movement Systems; Optimization (V-mesa, GDOT, DMCplus); Early Event Detection (EED); Mass Balance; Alarm System (CAMMS); Procedure Mgmt System (ExaPilot); Root Cause Analysis; Plant Resource Mgmt System (PRM); Business Event Log; Global Notification; Communication; Alert Mgmt System; Document Mgmt; Knowledge Mgmt; Learning Mgmt; Console Scheduling Instruction; Enterprise Asset Management (Maximo); Erroneous Data Log; IT Asset Mgmt (ITSM); KPI Tracking & Dashboard; Lab System (STARLIMS); Optimization Log; Scheduling Systems (Lynx, SIMTO/MBO/RVT); Simulator; Shift Turnover; Structure Round; Safety Instrumented System (SIS); LP System; Flying Petro; Loss Prevention & HES Systems
  • 10. System Data Flow (partial example) [diagram; legend: IT scope, AT scope, Existing]
    Based on the BPR Business Requirements and IA Assessment details, every system and its data inputs and outputs are depicted in diagrams like this one and vetted with SMEs to ensure that the Business Requirements are captured correctly in relation to data. The diagrams then serve as the basis for our detailed RICEF (Reports, Interfaces, Conversions, Extensions & Forms) list.
  • 11. RICEF List Summary
    RICEF is an acronym for Reports, Interfaces, Conversions, Extensions, and Forms, all of which form the basis for Data Integration.
    – Reports (6): Shift Turnover Report; Console Scheduling Instructions; Training Level Assessment; Equipment Maximo List; Lab Stream Reports
    – Web Reports / Dashboard (>100): General KPIs; Blending KPIs; EED & Alarms KPIs; Reliability KPIs; SIS KPIs; Energy KPIs; Optimization KPIs; Petro Process Control KPIs (?); Others
    – Conversions (10): Document Management System (DMS); Knowledge Management System (KMS); Turnover documents into Shift Turnover System; Plant Procedures into ExaPilot; Console Scheduling Instructions (CSI); Learning Management System (LMS); Plant economics information to Area of Optimization; P&IDs, drawings; ITSM conversion; Facilities Phone list to Refinery Phone directory
    – Extensions (Forms) (3): Maximo; Lab; Knowledge Management System
  • 12. RICEF List Summary (example)
    Total of 36 “system to system” data interfaces in IT Scope:
    – 25 interfaces with bulk data on a scheduled basis
    – 7 interfaces on demand with low volume of data
    – 4 interfaces with real-time streaming data
    – 6+ of the 36 have process & workflow requirements
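The interface counts on this slide can be cross-checked with a small tally. This is a minimal sketch; the dictionary keys are illustrative labels chosen here, not names from the deck.

```python
# Interface counts as stated on the slide; the key names are illustrative.
interface_counts = {
    "bulk_scheduled": 25,  # bulk data on a scheduled basis
    "on_demand": 7,        # on demand, low volume of data
    "real_time": 4,        # real-time streaming data
}


def total_interfaces(counts):
    """Sum the per-category interface counts."""
    return sum(counts.values())


def share_by_category(counts):
    """Return each category's share of the total as a whole percent."""
    total = total_interfaces(counts)
    return {name: round(100 * n / total) for name, n in counts.items()}
```

The three categories add up to the stated total of 36, and roughly two thirds of the interfaces are scheduled bulk transfers, which is consistent with the weight the deck later gives to ETL.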
  • 13. RICEF List: Interfaces in IT Scope
    Bulk data (large volumes) on a scheduled basis (25):
    – Scheduling (SIMTO/MBO) to Console Scheduling Instructions (CSI) *
    – Scheduling (SIMTO/MBO) to RVT
    – Erroneous Data Log (EDL) to RVT (Lynx) *
    – Structure Rounds to Shift Turnover
    – Event Log to Shift Turnover
    – Maximo to Shift Turnover
    – Shift Turnover to Knowledge Management System (KMS) **
    – Optimization Log to KMS **
    – Document Management System (DMS) to KMS **
    – PI to Health, Environmental and Safety (HES)
    – PI to Accounting systems
    – PI to Lab
    – Lab to DMS
    – Learning Management System (LMS) to DMS
    – LMS to People Data
    – IT Asset Management System (ITSM) to Simulator Scheduler
    – CSI/SIMTO to RVT
    – SIMTO to Area of Optimization
    – Oil Movement System to Wharf/Pipeline Scheduling (2)
    – Maximo / Lab / Reliability / PI to KPI (4+)
    * Indicates possible workflow
    ** Indicates unstructured data
  • 14. RICEF List: Interfaces in IT Scope
    On demand and lower volume data (7):
    – Maximo to Maintenance Schedule
    – Maximo to Maintenance Request (approval/denial) *
    – Learning Management System (LMS) to Simulation Systems
    – LMS to Simulator Scheduling System
    – Document Management System (DMS) to Simulator Scheduling System
    – DMS to IT Asset Management System (ITSM)
    – LMS to ITSM
    Real-time and streaming data (4):
    – Oil Movement Systems (OMS) to Alert System
    – Lab to Alert System
    – Task System to Alert System
    – Alert System to Global Notification System
    Process interfaces with workflow (6):
    – Shift Turnover to Knowledge Management System (KMS) (approval) *
    – Maximo to Maintenance Request (approval/denial) *
    – Document Management System (DMS) to KMS (approval) *
    – Optimization Log to KMS (approval) *
    – Scheduling (SIMTO/MBO) to Console Scheduling Instructions (CSI) *
    – Erroneous Data Log (EDL) to RVT (Lynx) *
    * Indicates possible workflow
  • 15. RICEF List: Interfaces in PCIS Scope
    PCIS Scope (11):
    – PI to Mass Balance
    – PRM to Alarm System
    – DCS to Alarm System
    – PRM to Maximo
    – Lab to Mass Balance
    – DCS to APC
    – DCS to ExaPilot
    – DCS to OMS
    – DCS to PI
    – PI to APC
    – Scheduling (SIMTO/MBO) to OMS
  • 16. Integration Landscape [diagram; legend: IT scope, AT scope, Existing; connection types: Real Time, Scheduled, On Request; zones: PCIS Scope, Level 3.5, IT Scope]
    Systems shown: Oil Movement System; Early Event Detection; Mass Balance; Alarm System; Procedure Mgmt System (ExaPilot); Plant Resource Mgmt System; Business Event Log; Alert Mgmt System; Knowledge Mgmt System; Learning Mgmt System; Console Scheduling Instruction; Enterprise Asset Management; Erroneous Data Log; IT Asset Mgmt (ITSM); KPI Tracking & Dashboard; Lab Systems (STARLIMS); Optimization Log; Shift Turnover; Structure Round; PI (ODS with History); DCS; Simulator Scheduling; Outlook; People Data; Safety Instrumented System; Scheduling; RVT; Scheduling Systems (SIMTO/MBO); Simulation; Communication; Global Notification; Document Mgmt System; Flying Petro; Optimization
  • 17. Integration Technology Comparison
  • 18. The Integration Stack & Products [diagram; axes: Vendor Capability (Low to High), Business Capability of adopting Integration (Low to High), Centralized Integration (More to Less)]
    Stack stages: Data Transfer; Ad-Hoc Interfaces; Point-to-point Interfaces; EAI; BPM & SOA; Composite Solutions; Integration 2.0
    Components named across the stages: File Transfer; ETL; Adapters; WS*-Communication; ESB; Orchestration Engine; BAM; Service Registry; Web 2.0; EDA
    Lower end of the stack: mature integration techniques; proven & robust technology
    Upper end of the stack: innovative integration techniques; complex & flexible technology
    Progression: integration technology today moving data... to synchronize and rationalize data for systems... leveraging functionality in applications... to create new processes and services to support business needs... predicting future business while sharing business capabilities with partners... anywhere, anytime, and through any standard means
  • 19. Key Integration Components
    Custom code remains a popular option.
    Key Data Integration Methods:
    – EII (Enterprise Information Integration)
    – CDC (Change Data Capture)
    – EAI (Enterprise Application Integration)
    – Data Replication
    – ETL (Extract, Transform, Load)
    SOA (Service Oriented Architecture) Framework
    Key Process Integration Methods:
    – Workflow
    – EAI Orchestration + ESB
    – ESB (Enterprise Service Bus)
    – Web Services
  • 20. Overview of Integration Components
  • 21. Enterprise Information Integration (EII): Data Virtualization
    As project-oriented DI middleware, data virtualization is often referred to as virtual data federation, high-performance query, or EII. As enterprise architecture, it is frequently described as a virtualized data layer, an information grid, an information fabric, or data services in SOA environments.
    EII is a middle-tier query server:
    – It contains a metadata layer with consolidated business definitions.
    – It communicates through web services, database connections, or XML.
    – A listener waits for a request, then sends whatever queries are needed across whatever data sources are required to return data to the requestor.
    – Metadata robustness is the differentiator.
    – Federated data stores provide access to enterprise data without forcing central control.
  • 22. Technology Overview: EII (Enterprise Information Integration)
    EII tools: Create virtual views of distributed enterprise data through queries executed in real time, without physically moving or copying data. Also known as data federation, virtual data warehousing, or data virtualization.
    Benefits:
    – Latency: Through federated queries, information can be accessed within milliseconds.
    – Storage: Data is not moved or copied from source systems, so additional storage is not required.
    Drawbacks:
    – Volume: Should only be used for small, targeted data sets.
    – Quality: Minimal transformation capabilities; adding transformations will negatively impact latency.
    Data federation never made a material impact as a pure-play market, but it is an important part of the data integration platform. Watch out for the high level of integration and maintenance effort.
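The federated-query idea described for EII can be sketched with two independent in-memory SQLite databases and a small metadata layer. This is only an illustration under stated assumptions: the table names, columns, tags, and the `federated_view` helper are hypothetical and do not come from the deck, and a real EII product would add query planning, security, and caching.

```python
import sqlite3


def make_source(ddl, insert_sql, rows):
    """Create an in-memory database standing in for one source system."""
    conn = sqlite3.connect(":memory:")
    conn.execute(ddl)
    conn.executemany(insert_sql, rows)
    conn.commit()
    return conn


# Two hypothetical, independent sources (names and columns are assumptions).
assets = make_source(
    "CREATE TABLE asset (tag TEXT PRIMARY KEY, description TEXT)",
    "INSERT INTO asset VALUES (?, ?)",
    [("P-101", "Feed pump"), ("T-200", "Crude tank")],
)
lab = make_source(
    "CREATE TABLE sample (tag TEXT, result REAL)",
    "INSERT INTO sample VALUES (?, ?)",
    [("T-200", 0.82), ("T-200", 0.85)],
)

# Metadata layer: one business name mapped to a source and a query.
METADATA = {
    "asset_description": (assets, "SELECT description FROM asset WHERE tag = ?"),
    "lab_results": (lab, "SELECT result FROM sample WHERE tag = ? ORDER BY rowid"),
}


def federated_view(tag):
    """Answer a request by querying each source at call time; nothing is copied."""
    view = {"tag": tag}
    for name, (conn, sql) in METADATA.items():
        view[name] = [row[0] for row in conn.execute(sql, (tag,))]
    return view
```

Calling `federated_view("T-200")` assembles one combined answer from both sources on demand. The sketch also shows why the slide lists volume and quality as drawbacks: every request re-runs the source queries, and there is no place to apply heavy transformation.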
  • 23. Enterprise Application Integration (EAI)
    EAI is focused on moving data between enterprise applications with business logic applied. It picks up an application transaction and initiates a transaction in another system. For example, a CRM system picks up a new order and enters it into the financial application.
    – Driven by business events
    – Connectivity between applications
    – Information consistency is a key requirement
    – Bus/hub with application adaptors
    – Wire-level messaging protocols
  • 24. Technology Overview: EAI (Enterprise Application Integration)
    EAI tools: These products started out as rudimentary software that supported basic messaging, routing, and data transformation needs. They have grown into more sophisticated tools that now also provide full support for SOA as well as electronic data interchange (EDI).
    Benefits:
    – Latency: Through message-based orchestration, information can be transferred within seconds to service real-time data integration.
    – Event based: Data transfer can be triggered by an event.
    Drawbacks:
    – Proprietary: Traditional EAI vendors used proprietary protocols.
    – Quality: Data validation can be performed, but matching and merging data from multiple source systems is not a strength of this toolset.
    An EAI toolset can be brought in as a middleware framework to support SOA. However, with insufficient in-house experience, a POC is recommended.
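The event-driven pattern described for EAI, using the CRM-order-to-financial-application example from the overview slide, might look like the sketch below. The `MessageHub` and `FinanceApp` classes, the event name `order.created`, and the field names are all hypothetical; this is a toy hub, not the behaviour of any particular EAI product.

```python
from collections import defaultdict


class MessageHub:
    """A toy hub: applications subscribe adapters to business event types."""

    def __init__(self):
        self._adapters = defaultdict(list)

    def subscribe(self, event_type, adapter):
        self._adapters[event_type].append(adapter)

    def publish(self, event_type, payload):
        """Deliver one business event to every subscribed adapter."""
        return [adapter(payload) for adapter in self._adapters[event_type]]


class FinanceApp:
    """Stand-in for a financial application that books orders."""

    def __init__(self):
        self.ledger = []

    def book_order(self, order):
        # Business logic applied in transit: map CRM fields to ledger fields.
        entry = {
            "order_id": order["id"],
            "amount": order["qty"] * order["unit_price"],
        }
        self.ledger.append(entry)
        return entry


finance = FinanceApp()
hub = MessageHub()
hub.subscribe("order.created", finance.book_order)
```

Publishing `hub.publish("order.created", {"id": "SO-1", "qty": 3, "unit_price": 10.0})` books a single ledger entry. Each message carries one transaction, which matches the "single message or transaction" volume profile the deck attributes to EAI.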
  • 25. Extract, Transform, Load (ETL)
    As a data integration hub, ETL products connect to a broad array of databases, systems, and applications as well as other integration hubs. ETL batch architecture is generally split into 4 major components: Extract, Clean, Transform, and Load. ETL products:
    – Provide expanded functionality, especially in the areas of data quality, transformation, and administration
    – Coordinate and exchange metadata among heterogeneous systems to deliver a highly integrated environment that is easy to use and adapts well to change
    – Capture and process data in batch or near-real time using a standardized information delivery architecture
    – Provide greater performance, throughput, and scalability to process larger volumes of data at higher speeds
    – Load data more quickly and reliably by adding change data capture techniques, continuous processing, and improved runtime operations
  • 26. Technology Overview: ETL (Extract, Transform and Load)
    ETL tools:
    – Batch or incremental extraction of high volumes of data from one or more sources
    – Able to run complex transformations on the data, which can include cleansing, reformatting, standardization, aggregation, or the application of any number of business rules
    – Load the resulting data set into specified target systems
    Benefits:
    – Volume: Manages extremely high volumes of data movement.
    – Quality: Allows for complex data transformations, enabling much higher quality and hence more usable information.
    – Re-use: Routines to extract, transform, and load can be re-used by many applications.
    Drawbacks:
    – Latency: Works best when scheduled as batch data movement rather than real-time or on-demand. By reducing the volume moved per run (with CDC), ETL can be used to meet operational near-real-time requirements.
    – Performance: Extracts can cause performance impacts on source production systems, so low-impact batch extraction “windows” need to be identified, or CDC used instead.
    ETL has spread beyond data warehousing and can support near-real-time data integration for both operational and BI applications.
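The four ETL components named on the overview slide (Extract, Clean, Transform, Load) can be sketched as four small functions over a batch. The sample data, column names, and the `level` target table are assumptions made up for this illustration; a real ETL tool would add scheduling, metadata, and error handling.

```python
import csv
import io
import sqlite3

# Hypothetical raw extract with a blank reading and inconsistent formatting.
RAW = """tag,reading,unit
T-200, 12.5 ,m
T-201,,m
t-202,7.0,m
"""


def extract(text):
    """Extract: read rows from a delimited source."""
    return list(csv.DictReader(io.StringIO(text)))


def clean(rows):
    """Clean: drop rows with no reading and trim stray whitespace."""
    return [
        {key: value.strip() for key, value in row.items()}
        for row in rows
        if row["reading"].strip()
    ]


def transform(rows):
    """Transform: standardize tag case and convert readings to numbers."""
    return [(row["tag"].upper(), float(row["reading"]), row["unit"]) for row in rows]


def load(rows, conn):
    """Load: write the whole batch into the target table in one transaction."""
    with conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS level (tag TEXT, reading REAL, unit TEXT)"
        )
        conn.executemany("INSERT INTO level VALUES (?, ?, ?)", rows)
    return len(rows)


def run_batch(text, conn):
    """Run the four stages in order and return the number of rows loaded."""
    return load(transform(clean(extract(text))), conn)
```

Running `run_batch(RAW, sqlite3.connect(":memory:"))` loads the two valid rows and skips the one with no reading. Keeping the stages as separate functions mirrors the slide's re-use point: the same clean or transform routine can serve several feeds.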
  • 27. Change Data Capture (CDC)
    A CDC integration suite provides secure, high-volume, real-time, and bi-directional data integration and transformation between applications. This product supports a wide range of databases, including those that run on legacy, back-office, and other operational systems on different platforms.
    [Diagram: Publisher (Journal Log, Redo/Archive Logs, Publisher Engine and Metadata) connected over TCP/IP to Subscriber (Subscriber Engine and Metadata); GUI unified admin point with monitoring; targets: Database, Audit Database, Message Queue, Web Services, Business Process]
    – Provides pseudo to actual real-time update capabilities
    – Heterogeneous system and platform support
    – Real-time selective data capture and delivery
    – Limited data transformation
    – High performance even with very high volumes
    – Guaranteed integrity of data transactions (2-phase commit)
  • 28. Technology Overview: CDC (Change Data Capture)
    CDC tools: The optimal approach is to capture “deltas”, the changes in source data created or updated in operational systems, as they are written to the DBMS log files, and to make them immediately available in real time to downstream applications.
    Benefits:
    – Volume: Captures only the changes or “deltas” since the last pull from source databases, reducing the amount of data that needs to be moved.
    – Performance: With the option to read database log files rather than the production database, there is no performance impact on source operational systems.
    – Latency: Can enable continuous updates throughout the day.
    Drawbacks:
    – Latency: No latency issue as such. Because CDC reads the source log, source system downtime can create extra administrative work to synchronize and monitor data.
    – Performance: Data transformation ability is more limited than in ETL tools.
    CDC could be combined with ETL to enable near-real-time delivery and more throughput, with less impact on source systems.
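The "capture only the deltas from the log" idea can be sketched with an append-only log and a subscriber that remembers the last sequence number it applied. The `ChangeLog` and `Subscriber` classes are hypothetical stand-ins written for this illustration; a real CDC product reads the DBMS journal or redo log rather than an in-memory list.

```python
class ChangeLog:
    """Append-only log standing in for a DBMS journal or redo log."""

    def __init__(self):
        self.entries = []  # tuples of (sequence, operation, key, value)

    def write(self, operation, key, value=None):
        self.entries.append((len(self.entries) + 1, operation, key, value))


class Subscriber:
    """Applies only the changes it has not yet seen to a target copy."""

    def __init__(self):
        self.target = {}
        self.last_sequence = 0

    def pull(self, log):
        """Apply every log entry newer than the last one seen; return the count."""
        deltas = [entry for entry in log.entries if entry[0] > self.last_sequence]
        for sequence, operation, key, value in deltas:
            if operation == "delete":
                self.target.pop(key, None)
            else:  # insert or update
                self.target[key] = value
            self.last_sequence = sequence
        return len(deltas)
```

A second `pull` right after the first moves nothing, which is the volume benefit the slide describes. The subscriber never queries the source tables themselves, only the log, which is the basis of the performance benefit.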
  • 29. Enterprise Service Bus (ESB)
    – ESB is a product category in the integration market.
    – ESBs (as products) are a set of technologies developed to support program-to-program communication/integration (such as Web services, object request brokers, remote procedure calls, MOM (Message Oriented Middleware), etc.) and SOA.
    – ESBs are seen as a potential evolution of middleware technologies:
      – All in one package (administration and management services, service definition tools, and repository services)
      – Combine features from previous integration technologies
      – Provide value-added services: intelligent routing, message validation, transformation, security, load balancing, etc.
      – Support highly distributed architectures
    [Diagram: Enterprise Service Bus connecting a .NET Application (SOAP/HTTP), a J2EE Application (JMS/JCA), a Legacy Application (JCA / MQ Gateway), a Partner Web Service (SOAP/HTTP), Enterprise Apps (Adapters), and Databases (Distributed Query Engine)]
    Copyright © 2006 Accenture. All Rights Reserved.
  • 30. Technology Overview: ESB (Enterprise Service Bus)
    ESB tools: These technologies typically incorporate adapter technology to connect to a variety of application and database types, the ability to route transactions according to business rules, and the ability to transport transactions from source to target with low latency.
    Benefits:
    – Open: More open than EAI tools. Universal support for distributed processing.
    – External entities: ESB is easier to configure and implement, and hence is often chosen to support B2B applications.
    Drawbacks:
    – Vendor: ESB-only vendors are smaller than EAI vendors.
    – Experience: Lack of internal Chevron working and support knowledge.
    ESB is best used for establishing business process (BPM) and orchestration infrastructure that will leverage a business services layer to support SOA across the entire enterprise. ESB federation can also be implemented to mitigate the drawbacks.
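Three of the value-added ESB services listed in the deck (message validation, transformation, and rule-based routing) can be sketched as a small pipeline. The `ServiceBus` class, the message fields `type` and `body`, and the endpoint names are hypothetical; security, load balancing, and transport adapters are left out.

```python
class ValidationError(Exception):
    """Raised when a message is missing a required field."""


def validate(message, required=("type", "body")):
    """Reject messages that lack any of the required fields."""
    missing = [field for field in required if field not in message]
    if missing:
        raise ValidationError(f"missing fields: {missing}")
    return message


def transform(message):
    """Normalize the message type so routing rules can match it."""
    return {**message, "type": message["type"].strip().lower()}


class ServiceBus:
    """A toy bus that routes each message to the first matching endpoint."""

    def __init__(self):
        self._routes = []  # (predicate, endpoint) pairs, checked in order

    def add_route(self, predicate, endpoint):
        self._routes.append((predicate, endpoint))

    def send(self, message):
        """Validate, transform, then deliver to the first matching endpoint."""
        message = transform(validate(message))
        for predicate, endpoint in self._routes:
            if predicate(message):
                return endpoint(message)
        raise LookupError(f"no route for message type {message['type']!r}")
```

Routes are checked in registration order, so a catch-all route added last acts as a dead-letter endpoint. The senders never name a target system, which is the decoupling an ESB is meant to provide.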
  • 31. Differences Between EAI and ETL (focus on the differences)
    – Focus: EAI: Application Integration (Process, B2B). ETL: Data Integration (Analytic, KPI).
    – Timing: EAI: Real-time. ETL: Batch, near-real time.
    – Data: EAI: Transactional. ETL: Historical.
    – Transformation: EAI: Minimal. ETL: Complex.
    – Interfaces: EAI: Predictable. ETL: Evolutionary.
    – Volume: EAI: Single message or transaction. ETL: Bulk (hour, day, week, etc.).
  • 32. IT Integration Strategy Decision
  • 33. Approach for Integration Strategy Decision
    1. Translate BPR business process and functional requirements into system data flow diagrams, vetted and confirmed with business SMEs and AT teams. Categorize data integration requirements based on the system data flow diagrams and create the RICEF list.
    2. Study standards and seek to understand the environment.
    3. Leverage other Information Management initiatives and EA direction. Capture lessons learned from other projects.
    4. Conduct technology scanning and gather industry information from Gartner, Forrester, Open O&M, and vendors.
    5. Develop an integration strategy, focused on Data Integration, that supports Requirements, Process Integration, and Data Management.
    6. Vet with architects (IT AA, IA, and SIA teams) to get feedback.
    7. Present the integration strategy recommendation to the technical review team.
    8. Incorporate stakeholder and technical review board feedback and update the recommendation.
  • 34. Objective and Criteria for Data Integration Strategy Decision
    Decision objective: Must support the business’ need for timely and well-integrated data with consistent naming, content, and meaning, providing Console Operators a complete and concise view of trusted data.
    Decision criteria:
    – Requirements: How well does the DI decision satisfy the requirements?
    – Standards: How well does the DI decision align with Chevron standards?
    – Reliability: Is the DI decision proven with robust technologies?
    – Interoperability: How well do the DI components interoperate?
    – Supportability: Does the DI decision match organization capability?
    – Total Cost of Ownership: Does this decision offer the optimum TCO?
    – Sustainability: Can the DI decision easily adapt to business changes?
    – Data Management: Does the DI decision support information management disciplines?
  • 35. Key Integration Alternatives
    Custom code remains a popular option.
    Key Data Integration Methods:
    – EII (Enterprise Information Integration)
    – CDC (Change Data Capture)
    – EAI (Enterprise Application Integration)
    – Data Replication
    – ETL (Extract, Transform, Load)
    SOA (Service Oriented Architecture) Framework
    Key Process Integration Methods:
    – Workflow
    – EAI Orchestration + ESB
    – ESB (Enterprise Service Bus)
    – Web Services
  • 36. Step 1: Business Requirement
    36 “system to system” data interfaces in R00#1 IT Scope:
    – 25 bulk data on a scheduled basis
    – 7 on demand with low volume of data
    – 4 real-time streaming data
  • 37. Step 2: Standard and Usage (Component / Technology / Standard)
    – Data Integration / ETL tools / Managed Choice: 1. IBM InfoSphere DataStage; 2. MS SSIS
    – Process Integration / EAI tools / Managed Choice: 1. SAP XI (version 3); 2. BizTalk
    – Integration Middleware / Managed Choice: 1. Integration Brokers (EAI tools); 2. Web Services; 3. Batch file transfer; 4. Direct access; 5. Intermediate databases; 6. Custom built
  • 38. Step 3: Lessons learned from other projects
    – Selected an SOA architecture to facilitate multiple data integration points with real-time BI integration.
    – Realized the value of Master Data Management with their SOA implementation.
    – Selected a hub and spoke architecture to facilitate multiple data integration points with complex data translations. Selected an ETL platform for data movement for all planning and scheduling data.
    – Realized the value of web services to facilitate workflow for data validation processes within the refineries.
    – Selected a hybrid architecture to facilitate multiple data integration points with complex data translations. Most data was required in real time to capture trade deals.
    – Selected an ETL platform for application integration with robust transformation.
    – Selected orchestration to facilitate workflow and data integration with external parties and systems.
  • 39. Step 4: Integration Capability Comparisons [matrix; legend: full support, partial support, no support]
    Data integration technologies compared: ETL; EAI (Orchestration); Workflow; EII; ESB
    Capabilities rated: Bulk Data Transfer; Real Time Messaging Routing; On Demand Data Integration; Metadata Data Management; Data Transformation; Process Orchestration; Distributed Processing; Data Standardization; Human Interfaces; SOA and Web Services Integration
  • 40. Step 5: Integration Components Chosen [diagram; the chosen components are marked “Selection”]
    Custom code remains a popular option.
    Key Data Integration Methods:
    – EII (Enterprise Information Integration)
    – CDC (Change Data Capture)
    – EAI (Enterprise Application Integration)
    – Data Replication
    – ETL (Extract, Transform, Load)
    SOA (Service Oriented Architecture) Framework
    Key Process Integration Methods:
    – Workflow
    – EAI Orchestration + ESB
    – ESB (Enterprise Service Bus)
    – Web Services
  • 41. Integration Strategy Decision
    – Include both Process and Data Integration as a hybrid architecture:
      – Process Integration includes EAI Orchestration and Workflow
      – Application Integration includes EAI for real-time data integration
      – Data Integration includes ETL for non-real-time bulk data integration
      – The ETL platform can be used to add on a Data Management toolset
    – Exclude technologies that do not meet requirements or criteria:
      – Custom coding does not meet the supportability and TCO criteria
      – EII does not meet the Info Mgmt data standardization criteria
      – CDC for near-real-time data can be handled by EAI
      – Replication/CDC is already used by PI (ODS), but is not extensible
      – ESB is not yet in the CVX standard; EAI has some ESB features
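The decision on this slide can be read as a lookup from interface category to chosen technology, plus a list of excluded options with their stated reasons. The sketch below encodes only what the slide states; the category keys and the `choose_technology` helper are illustrative names, and the on-demand category is deliberately left unmapped because this slide does not assign it.

```python
# Technology per interface category, following the decision on the slide.
# The category keys are illustrative names, not identifiers from the deck.
TECHNOLOGY_BY_CATEGORY = {
    "bulk_scheduled": "ETL",
    "real_time": "EAI",
    "workflow": "EAI Orchestration + Workflow",
}

# Technologies the decision excludes, with the reason stated on the slide.
EXCLUDED = {
    "Custom code": "does not meet the supportability and TCO criteria",
    "EII": "does not meet the data standardization criteria",
    "ESB": "not yet in the CVX standard",
}


def choose_technology(category):
    """Return the chosen technology for a category, or raise if none is stated."""
    try:
        return TECHNOLOGY_BY_CATEGORY[category]
    except KeyError:
        raise ValueError(f"no technology stated for category {category!r}") from None
```

Raising on an unmapped category, rather than guessing a default, keeps the sketch faithful to the slide: a category without a stated technology is surfaced for a decision instead of being silently routed.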
  • 42. Integration Conceptual Architecture: Hybrid Technology [diagram]
    – Process Integration, Orchestration (EAI): Filter; Route; Transform; Guarantee; XML messages; Service Calls; Service wrapper; Human Workflow; requesting applications; receiving applications; other
    – Data Integration Hub (ETL): Extract; Transform; Load; Profile; Quality; Metadata Mgmt; MDM
    – Data stores: Source Systems; ODS (PI); Staging Area; Data Warehouse; Data Mart; OLAP cube
    – Target Systems: HMI; Operational BI; Analytical BI
  • 43. Data Integration Strategy Decision Rationale
    Requirements:
    – The combination of ETL, EAI, and Workflow components satisfies the business requirements of bulk data transfer, real-time data integration, and workflow.
    – An ETL platform is necessary to facilitate the analytical BI environment. A mature ETL platform incorporates an information management and data standardization toolset.
    – EAI is required to facilitate the real-time application integration and automated work processes defined by the BPR teams.
    Standards:
    – Technology standards and best practices include Data Integration (ETL) and Application Integration (EAI) toolsets.
    – Select a Data Management standard for Master Data Management, Metadata Management, and Data Quality.
    Reliability:
    – Use the ETL toolset to provide bulk and scheduled data interfaces as the baseline. This technology has been proven and used by many projects.
    – EAI technology is mature; the EAI toolset is BizTalk.
  • 44. Data Integration Strategy Decision Rationale (continued)
    Interoperability:
    – Partner orchestration of BizTalk and the SharePoint portal.
    – Standard ETL and BI toolsets with web services to prove interoperability across the platforms.
    Supportability:
    – Continue to use PI as the data transfer hub between the Plant Information Network (PIN) and the Process Control Network (PCN). Leveraging what exists increases supportability. Apply PCIS standards to utilize OPC for PI integration.
    – GDST and the Refinery have experience with implementing and supporting ETL. ITC provides services for ETL support, database support, and BI support.
    Total Cost of Ownership:
    – Using the standard EAI and ETL toolsets to facilitate centrally managed refinery process and data flows would bring cost benefits by leveraging enterprise support and license costs.
    – Infrastructure and resources for the ETL toolset may be shared with Lynx within refineries, which can greatly reduce license and support costs.
  • 45. Data Integration Architecture Decision Rationale (continued)
    Sustainability:
    – EAI and workflow provide the foundation for an SOA framework, and adoption of newer technology (composite software and Web 2.0) is feasible.
    – Once we gain more experience with the EAI and workflow toolset, it can be expanded to handle more integrations to accelerate SOA.
    – We will maintain ETL as a foundation on which to add real-time and on-demand components.
    – SOA is still maturing. Work with ITC to ensure we remain consistent with the company direction on SOA.
    Data Management:
    – Info Mgmt disciplines provide Data Governance, Master Data Management, Metadata Management, and Data Quality improvement.
    – Using the data integration hub to provide a standardized data layer gives a good foundation for information management.
  • 46. IT Recommendation: Integration Strategy and Architecture
  • 47. Integration Strategy Recommendation [matrix of business requirements against ETL, EAI, and Workflow]
    – Near-real-time and scheduled bulk data conversion and interfaces: X
    – Integrate data from Operational BI to Analytical BI (load ODS and staging data into DW/Data Mart/OLAP): X
    – Real-time integration (application to application or Integration Hub to HMI): X
    – On-demand low volume of data (event-triggered data delivery): X
    – Human-centric workflow with orchestration: X X
    – Provide services to HMI in connecting all portals, application data, workflow data, integration hub data, and collaboration data: X X
    – Data transformation, metadata management, data cleansing: X
  • 48. Integration Architecture
  • 49. Next Steps
    – Review Proof of Technology findings of EAI tools
    – Gather and review feedback to update the DI Strategy recommendations:
      – Internal: IT EA and AA teams; AT team, if feasible
      – External: Information Architects
    – Recommend Data Integration toolsets
  • 50. IT Integration Appendix: Lessons Learned
  • 51. Lessons learned 1
    – SOA architecture to facilitate multiple data integration points with real-time BI integration, instead of using an integration middleware
      – However, this alternative carries a very large architectural footprint, higher costs, and greater demands for technology expertise.
    – Master Data Management with their SOA implementation
      – A reference data model was added to the SOA implementation when data quality issues surfaced due to disparate data sources.
  • 52. Upstream Foundation Services vs Data Integration
    SO:
    – Small messages on demand
    – Transformations tend to be simple
    BI:
    – Infrequent exchanges of (large) amounts of data
    – Transformations complex
    – Increasing drive for real-time DW
    SOBI:
    – Leverages the strengths at the extremes
    – Exploits the middle ground
    [Diagram: Messages vs. Data spectrum from SO to BI: Fine Grain Services / Real-time events; Medium Grain Services; Coarse Grain Import / Export / ETL]
  • 53. SOBI Summary
    Service Orientation (SO):
    – Provides application-to-application integration
    – Well suited to events and real-time data (high frequency)
    – Allows agile change in business processes
    – Supports reuse of enterprise components
    – Encapsulates and abstracts functionality
    – Tightly defined data formats and structures
    Business Intelligence (BI):
    – Well suited for data-to-data integration
    – Can handle large data volumes
    – Provides foundation for business decisions
    – Provides a combined model of the enterprise data
    – Good tools and mechanisms for transforming data
    – Ability to question the data and to answer key business questions
  • 54. Solution Architecture: Services Integration pattern [diagram]
    – Presentation: Presentation Services (Analysis & Reporting)
    – Business Analytics & Analysis Services (Dimensional Models)
    – Integration Services: Physical Integration (Project DB; Extract, Transform & Load); Virtual Integration; Data Integration
    – Services: Atomic & Composite Entity Services; Process & Workflow
    – Business Services: Production; Drilling; HES; Maintenance; Financial; Well; Reservoir; Surveillance; Analytics; Notification
    – Data levels: Enterprise; OPCO; SBU; Asset
    – Facades: Application Data Facade; Document Repositories Facades; Data Sources Facades; Systems of Record (SoR)
    – Hierarchy & Cross Reference Services; Master, Reference & Hierarchy
    – SUPER 7: Well; Reservoir; Equipment; Field; Property; Location; Facility
    – Business Message Standards (schemas & semantics) apply between the layers
  • 55. 55 Lessons learned 2
      – Select a hub and spoke architecture to facilitate multiple data integration points with complex data translations. Most data required for 1-7 day plans.
      – Use ETL platform for data movement for all planning and scheduling data.
          • Several ODS tables and data warehouse structures were built in the central hub (San Ramon) with supporting individual hubs within each refinery.
          • A robust cross reference model was used for the numerous codes and data sources to provide a consistent name and definition of master data across the supply chain.
      – Use the value of web services to facilitate work flow for data validation processes.
          • A web services front end was added to the Validation Tool that provides updates and corrections for data to be used in the scheduling tool (SIMTO).
  • 56. 56 Conceptual Architecture: Hub and Spoke Pattern
      [Diagram. Labels recoverable from the figure:]
          – External Source Systems: SRA (Crude), ICTS, SAP PS, DF, RSPF, RBS&OP, TI (SRA)
          – ETL "Availability"
          – "Integration": P to P Interfaces (Driven by SubTeams) – (Stored Procs, ETL or Connect Direct)
          – "Full visibility" with limited event notification capabilities
          – SQL Server database: Operational Data Stores/Staging, ETL, Common Data Model, Master Data Management, Common Business Transformations
          – Lynx Data Warehouses (regional & global)
          – Lynx Reporting/Analytics Architecture: Web access, Dashboard/KPI, Ad hoc Reporting/Queries, Drill Down, OLAP, Metadata, Reporting/Analytics Tool
  • 57. 57 Lessons learned 3
      – Hybrid architecture – To facilitate multiple data integration points with complex data translations. Most data was required in real time to capture trade deals.
      – ETL platform for application integration with robust transformation.
      – Orchestration tool (BizTalk) facilitates work flow and data integration with external parties and systems.
  • 58. 58 Logical Architecture - Hybrid integration pattern
      [Diagram. Labels recoverable from the figure:]
          – External parties: Service Providers, Transport Providers (Pipeline, Ship, Ports, Rail, Trucks, Terminal), 3rd Party Leases, Extex (Royalty Payments), Market Data Providers, Deal Confirm, Exchanges, Banks, Counterparty, Inspection, Brokers, Rating Agency, Refineries, Clients
          – Systems: SAP, SAP XI, Rolfe & Nolan (R&N), NAVARIK, 4GEN, Tax, "SOG", "Corporate Credit", Cashflow, Netback, RailTrac, CVMS/Shipnet, Trading 1, Trading 2, Trading 3, Price, Credit, Credit Engine, Valuation Libraries, Movement Tools, Document Management, Consolidated Position Viewer
          – Master data: "Master Data" - EA, Facilities - SAP, Enterprise Facilities, MDM + Xref
          – Integration: Orchestration (shown on both sides), RTR, Intraday Position Snapshot DB, ETL, BI, MRA, MPA
          – Services: Corporate Credit Services, SOG Services, Credit Services, R&N Services, Risk Services, Price Services, Master & Xref Services, CP Services, Banking Services, Risk Algo, License Mgm
          – Data flows: Price, Noms, Confirms, Ship Status, Lifting Schedules, Ratings, Deals, Statements, Royalty Vols, Port Activity, Credit Limit, Master Data, Unstructured Market/CP data, Master Contracts, Lease Vols, Rates, Payments, Actual Volumes, Inspection Reports, Tickets, Schedules, Exchange Allocation, Exchange Cuts, News/Data, AR/AP/GL, Invoices, Credit Exposure, Plans, Rail Car, Ship info, Ship Schedules
  • 59. 59 59 IT Integration Best Practices
  • 60. 60 60 Best Practices for Data Integration
      1. Don’t lose sight of the DI architecture vision; however, include tactical data integration solutions for specific business requirements. (phased approach)
      2. Categorize data by business value and usage. (prioritize)
      3. Prioritize the sequence of implementing data integration. (sequence)
      4. Document the data migration and infrastructure deployment roadmap.
      5. Establish new standards for naming, data types and metadata. (governance)
      6. Publish metadata definitions and glossaries of business terms.
      7. Establish a coexistence strategy with legacy systems. Always have a migration plan.
      8. Establish physical reference architecture and tools.
      9. Implement environments for the foundation components ahead of time.
      10. Begin data migration into the integrated environment.
  • 61. 61 61 Planning for Data Migration
      • Data migration (conversion) from legacy systems to the newly integrated environment needs to be considered carefully by weighing highest value vs. highest usage.
          – Foundation data migration – Implementing the main lookup data, or master data, for the enterprise
          – Core transactional data migration – Detailed transactions for the basic enterprise events
          – Application data migration – Supports specific company functions
      • This strategy first builds the foundational master data that end users query most often, then adds core transactional data, so that business value grows incrementally as the data becomes richer in content.
  • 62. 62 62 Data Management Foundation
      [Diagram. Labels recoverable from the figure:]
          – Pillars: Data Governance, Data Structure, Data Quality
          – Items listed under the pillars: Data Ownership, Data Stewardship, Data Policies, Data Standards, Data Workflow, Data Modeling, Data Taxonomy, Business Process Flows, Data Profiling, Data Cleansing, Data Transformation, Data Monitoring, Data Compliance, Data Traceability
          – Master Data & Metadata: Master Data Management, Reference Data Management, Metadata Management
          – Data Management Capabilities: Data Creation, Data Storage, Data Movement, Data Usage, Data Retirement
  • 63. 63 63 Integrating Data Content and Meaning
      • Another aspect of data integration is standardizing the usage of data content and meaning. This type of data content integration yields business efficiencies and better data quality.
          – Integration of content standardizes data values, e.g. lookup codes, across different databases. (For example, if a PI Tag or P&ID needs to be uniquely identified at the global level across all refineries, a newly defined unique ID can be created and tied to the existing ID.) Depending on whether the request is for local operation or global data analysis, the two sets of IDs can be translated and delivered to satisfy the user request.
          – Besides the physical data movement and storage of integrated databases, the meaning of the data needs to be standardized. Metadata provides definitions of subject areas, tables, and columns in a data repository.
          – When all users refer to the data repository, the meaning of each data element is standardized to a common definition.
          – Additional metadata can be provided that displays calculations for derived data elements, glossaries of business terms, and lineage of the source of data.
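The ID translation described on this slide can be sketched as a small cross-reference table. This is a minimal illustration, not the deck's implementation: the class name `TagCrossReference`, the `GT-` global ID format, and the site and tag names are all hypothetical.

```python
from itertools import count


class TagCrossReference:
    """Ties each existing (site, local ID) pair to one newly defined global ID.

    Hypothetical sketch: the "GT-" prefix and six-digit format are assumptions.
    """

    def __init__(self):
        self._next = count(1)      # source of new global sequence numbers
        self._to_global = {}       # (site, local_id) -> global_id
        self._to_local = {}        # global_id -> (site, local_id)

    def register(self, site, local_id):
        """Return the global ID for a local tag, creating one if needed."""
        key = (site, local_id)
        if key not in self._to_global:
            global_id = f"GT-{next(self._next):06d}"
            self._to_global[key] = global_id
            self._to_local[global_id] = key
        return self._to_global[key]

    def to_global(self, site, local_id):
        """Translate a local ID for global data analysis."""
        return self._to_global[(site, local_id)]

    def to_local(self, global_id):
        """Translate a global ID back for local operation."""
        return self._to_local[global_id]


xref = TagCrossReference()
tag_a = xref.register("RefineryA", "FIC-101")
tag_b = xref.register("RefineryB", "FIC-101")  # same local ID, different site
```

The same local tag name at two refineries receives two distinct global IDs, and either ID can be translated to the other depending on whether the request is local or global.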
  • 64. 64 64 Data Integration Architecture considerations
      Commonality, consistency and interoperability of DI components:
          – Minimal number of products or product suites supporting all data deliveries
          – Single metadata repository and/or the ability to share metadata across all components
          – Common design environment to support all deliverables
          – Interoperability with other integration tools and applications
          – Efficient support for all data deliveries regardless of runtime architecture (centralized vs. distributed)
  • 65. 65 65 Decision Making methodology: Top-down
      • Integrate Use Case with Pattern Matching
      • Using integration-pattern matching, look for matches by comparing specific use cases with “typically deployed” Data Integration patterns. Examples:
          – To improve Global Manufacturing-wide reliability reporting, the appropriate integration pattern would be an enterprise data warehouse that physically consolidates and summarizes OE data from across all refineries.
          – To provide operational DCS information to business level applications and for operational BI, a replicated operational data store that stores up-to-the-second transactional data would be the best fit.
          – To support upstream or downstream product movement analysis and establish a performance, a data mart or an OLAP cube sourced from the ODS or Data Warehouse would be the best pattern.
  • 66. 66 66 Decision Making methodology: Bottom-up
      Assessing integration factors. This is often valuable where the DI decision is complex and/or where a clear integration pattern match is not obvious. For example, to determine whether to use a virtual, physical or hybrid combination:
          – If data extracted from many source systems could be used by many other systems, then a physical data store is good for data reuse and future expansion.
          – If significant data cleansing and complex transformation are required, then physical data consolidation is typically the most practical choice.
          – If harmonized data needs to be aggregated and summarized for analytical dashboards, then a physical data store is needed to load into a Data Warehouse/Data Mart and/or OLAP cubes.
          – If source systems are mostly available as systems of record and data can be passed between systems without significant data matching, merging or harmonizing, then virtual makes sense.
          – A hybrid combination may be a good choice if a project has both real-time business process integration and a large amount of data interfaces.
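The bottom-up factors above can be read as a small rule set. The sketch below encodes them under stated assumptions: the field names, the function name `recommend_pattern`, and especially the order in which the rules are checked are illustrative choices, since the slide does not rank the factors.

```python
from dataclasses import dataclass


@dataclass
class IntegrationFactors:
    """One flag per factor on the slide (field names are hypothetical)."""
    data_reused_by_many_systems: bool = False
    complex_cleansing_or_transformation: bool = False
    needs_aggregation_for_dashboards: bool = False
    sources_are_systems_of_record: bool = False
    needs_realtime_process_integration: bool = False
    large_volume_of_data_interfaces: bool = False


def recommend_pattern(f: IntegrationFactors) -> str:
    """Return 'hybrid', 'physical', 'virtual' or 'undetermined'.

    Assumed precedence: hybrid is checked first, then physical, then virtual.
    """
    if f.needs_realtime_process_integration and f.large_volume_of_data_interfaces:
        return "hybrid"
    if (
        f.data_reused_by_many_systems
        or f.complex_cleansing_or_transformation
        or f.needs_aggregation_for_dashboards
    ):
        return "physical"
    if f.sources_are_systems_of_record:
        return "virtual"
    return "undetermined"


dashboard_case = IntegrationFactors(needs_aggregation_for_dashboards=True)
passthrough_case = IntegrationFactors(sources_are_systems_of_record=True)
```

A real assessment would weigh the factors rather than short-circuit on the first match; the function only makes the slide's if-then structure explicit.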
  • 67. 67 67 Data Integration Patterns
  • 68. 68 68 IT Integration Industry Research
  • 69. 69 69 Integration Product Comparison
  • 70. 70 70 Magic Quadrant Data Integration Tools
  • 71. 71 71 Traditional EAI vs. ESB
      ESB | Traditional EAI
      Lightweight, distributed, standards-based and inexpensive | Complex, proprietary, centralized, and costly integration
      Flexible and adaptive business logic | Lack of support for new business logic
      Abstraction | Known Implementation
      Message Oriented | Object and Message Oriented
      Loosely Coupled with coarse-grained Business Services | Tightly Coupled with use of proprietary adapters
      Services Orchestration | Application Block
      Designed to change | Designed to last
      Process Oriented | Functionality Oriented
      Service Oriented Architectures | Hub-and-spoke architecture
  • 72. 72 72 Use Cases of EAI, ETL, EAI + ETL
      • EAI software
          An example: during the Internet boom, companies flocked to EAI to connect e-commerce with back-end inventory and shipping systems to reflect product availability and delivery times.
      • ETL toolset in an ‘always awake’ mode – near real time
          To deliver near-real-time capabilities, the ETL tools typically use application-level interfaces to detect new transactions or events as soon as they are generated by the source application. They then deliver these events to any application that needs them either immediately (near real time), at predefined intervals (scheduled), or when the target application asks for them (publish and subscribe).
      • EAI plus ETL
          EAI tools capture data and application events in real time and pass them to the ETL tools, which transform the data and load it into the BI environment.
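As a rough sketch of the EAI plus ETL hand-off, the example below models two of the delivery modes named on the slide: immediate delivery and delivery at a predefined interval. Everything here is hypothetical, including the class name `NearRealTimeEtl`, the `normalize_order` transformation, and the list that stands in for the BI environment; the on-request mode is left out for brevity.

```python
from collections import deque


def normalize_order(event):
    """Hypothetical transformation: rename a field and cast its type."""
    return {"order_id": event["order_id"], "quantity": int(event["qty"])}


class NearRealTimeEtl:
    """Receives events from an EAI layer, transforms and loads them."""

    def __init__(self, transform):
        self._transform = transform
        self._pending = deque()   # events waiting for the next scheduled run
        self.warehouse = []       # stand-in for the BI environment

    def on_event(self, event, immediate=True):
        """Called by the EAI layer as soon as a source event is detected."""
        if immediate:
            self.warehouse.append(self._transform(event))
        else:
            self._pending.append(event)

    def flush(self):
        """Scheduled run: load everything queued since the last run."""
        loaded = 0
        while self._pending:
            self.warehouse.append(self._transform(self._pending.popleft()))
            loaded += 1
        return loaded


etl = NearRealTimeEtl(normalize_order)
etl.on_event({"order_id": 7, "qty": "3"})                   # loaded at once
etl.on_event({"order_id": 8, "qty": "2"}, immediate=False)  # queued
loaded_on_schedule = etl.flush()
```

The split between `on_event` and `flush` mirrors the slide's distinction between near-real-time and scheduled delivery; in practice a scheduler or message broker would trigger `flush`.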
  • 73. 73 73 73 What vendors say about ESB?
      – Some stress the role of the ESB in eBusiness: its inter-organizational rather than intra-organizational role
      – Almost all believe that the ESB is more than the bus it runs on. Essentially, they are describing a service-oriented architecture from another viewpoint
      – Some see orchestration as part of the ESB architecture, others do not
      – Some package MOM and EAI in their ESB products
      – Some identify event monitoring as the major differentiator from MOM
      – Some consider services management as part of the ESB solution
      – Some see an ESB as strictly related to Web services and describe it as a Web Services Network
      All vendors are “flexible” in defining ESB. Their definition always manages to show that their current solutions are using it.
  • 74. 74 74 ESB, When to Consider
      – When deploying SOA across the enterprise
      – When establishing business processes (BPM) and orchestration infrastructure that will leverage a business services layer
      – When moving from a complex point-to-point or ‘spaghetti’ architecture to a more manageable and flexible IT infrastructure
      – When integrating with multiple and heterogeneous data sources and applications
      – When there is heavy business logic and security through the service bus to multiple end points
      – When further separation from composite applications is required (away from underlying implementations)
      – When flexible coupling is required
  • 75. 75 75 Information (Data) Services in SOA •For data to be a first-class citizen in the SOA world, a clear separation must exist between data consumers and data providers. This separation mirrors the principle that service consumers and providers must be distinct and separate in an SOA. Furthermore, this separation must be delineated by an interface, or contract, that both providers and consumers share
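One way to picture the shared contract between data providers and data consumers is an interface type that both sides depend on. The sketch below uses Python's `typing.Protocol`; the names `WellDataService`, `InMemoryWellProvider` and `well_report_line`, and the record fields, are hypothetical and chosen only to illustrate the separation.

```python
from typing import Protocol


class WellDataService(Protocol):
    """The contract: the only thing consumers know about providers."""

    def get_well(self, well_id: str) -> dict:
        ...


class InMemoryWellProvider:
    """A provider; it could equally wrap a database or a system of record."""

    def __init__(self, rows):
        self._rows = dict(rows)

    def get_well(self, well_id: str) -> dict:
        return dict(self._rows[well_id])  # return a copy, not internal state


def well_report_line(service: WellDataService, well_id: str) -> str:
    """A consumer; it depends on the contract, not on any provider class."""
    well = service.get_well(well_id)
    return f"{well['name']} ({well['field']})"


provider = InMemoryWellProvider({"W-1": {"name": "Alpha 1", "field": "North"}})
line = well_report_line(provider, "W-1")
```

Because the consumer is written against the contract alone, the provider can be replaced without changing consumer code, which is the separation the slide calls for.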
  • 76. 76 76 Gartner on SOA and Data Services
      Gartner suggested that success in loosely-coupled service-oriented business applications (SOBAs) becomes more difficult since each design point has to verify its own semantics, context and data structures.
      Key Findings
          Under a loosely-coupled architecture, data stewardship and governance best practices can be supported by data services within an SOA instead of embedding such practices within application logic. Where people and processes were formerly embedded in application design, they now fall under the domains of business process platforms and EIM (Enterprise Information Management).
      Predictions
          Based on lessons learned through data warehouse, data mart and operational data store implementation practices, 60% of failed information-as-a-service initiatives through 2009 will list a lack of an effective data governance strategy as one root cause of failure.
      Recommendations
          Organizations should begin their selection of data profiling, quality, mining and master data management tools with the end goal of deploying all the logic and processing within these tools as services that can interoperate and execute actions on behalf of and against data used by SOBAs, and as a callable service by business context services.
  • 77. 77 Composite solutions
      • Some of the approaches promoted by the Web 2.0 movement (mash-ups, RIA - Rich Internet Applications) are moving the integration challenges up to the presentation layer
      [Diagram: a composite application screen combining SAP Work Management & Purchasing, Personal Management, Drilling Information, Collaboration, Documents, Knowledge Management, Planning and Process Guides. It embeds three process flowcharts for Business Process 3.0 Set-up New Well: "As Is" sub-process 3.3 Set-up Well Ownership (Company: APC, Version 1.0, 2/28/01); "As Is" sub-process 3.1 Set-up Drilling AFE (Company: UPR, Version 1.1, 3/5/01); and "To Be" for 2001 sub-process 3.3 Set-up Well Ownership (Version 1.5, 7/18/01).]
      • Presentation Integration Servers enable the creation of Composite Applications by introducing a level of orchestration between the presentation layer of “legacy” and composed applications
      • Business processes are packaged and reused by BPM tools introducing business process layer composition
      • Solutions are built by combining capabilities at every level of the software stack: data, process and presentation
  • 78. Web 2.0 (1)
      One’s view of Web 2.0 is highly dependent on one’s background and interest, and can best be described by these three anchor points:
          – Technology and architecture – consists of the infrastructure of the Web and the concept of Web platforms. Examples of specific technologies include Ajax, Representational State Transfer (REST) and Really Simple Syndication (RSS). Technologists tend to gravitate toward this view.
          – Community and social – looks at the dynamics around social networks, communities and other personal content publish/share models, wikis, and other collaborative-content models. Most people tend to gravitate toward this view; hence, there is a lot of Web 2.0 focus on “the architecture of participation.”
          – Business and process – Web services-enabled business models and mashup/remix applications. (A mashup is a Web site or Web application that combines content from more than one source.) Examples include long-tail economics and advertising and subscription models such as software as a service (SaaS). Of course, business people tend to zero in on this angle.
  • 79. 79 Web 2.0 (2)
      • What's Old Is New Again
      • Most of what people call Web 2.0 is not entirely new. Many of the concepts and technologies have existed for some time:
          – For example, RSS is essentially the same as Resource Description Framework (RDF), a format popularized by Netscape during Web 1.0 and the hype around push technology.
          – Ajax is essentially JavaScript, dynamic HTML and asynchronous XML, all of which have existed for more than five years and have become well-known with the advent of high-profile implementations such as Google Maps.
          – Certainly, collaboration and advertising are not new.
          – Mashups bear a striking similarity to the SOA-derived term "composite applications."
      What is new is how some of these are used, and in what combinations.