Data integration ppt-bhawani nandan prasad - iim calcutta
Presentation Transcript

  • 1. Data Integration Strategy. April 10, 2013 – BHAWANI NANDAN PRASAD – BI Practice Head. SMP – IIM Calcutta, MBA – Stratford University USA, B.E. (IT)
  • 2. Integration Strategy Agenda: IIF Moment; Integration Scope and Frame; Business Requirements; Appendix – Lessons Learned; Best Practices and Industry Research; Recommendation – Integration Strategy and Architecture; Integration Strategy Decisions; Integration Technology Comparison
  • 3. IT Integration: Scope and Frame
  • 4. R00#1 Frame and Data Integration Charter
    IN THE FRAME
    – Define IT enablement needs for the key Business Process Areas that are not met by Advanced Technical Applications, e.g., L&D Support Systems, Portals to facilitate workflow, Management Reporting System(s).
    – Define standards for Data Definition and Information flow. Design architecture data and repositories to enable Reporting and data sharing across applications, including the “Given” applications.
    – Define standards for easy integration of new applications (design for growth).
    – Support the Advanced Technical Apps team to meet ESAS and other standards and to ensure data integration.
    – Validate the infrastructure needs of applications against the PCIS environment. Initiate change if necessary.
    – Phase 4 planning: Proofs of Concept, e.g. implement database structures, BI tools (internal pilots).
    – Application and Information Governance design.
    Diagram (shows the R00#1 Match-play selection): “Robust”; Messaging, Data transformation, Federation architecture; Data warehousing, archive; Enterprise data layer; Business integration/analytics; Bi-directional transfer of data.
  • 5. IT Integration Scope (diagram). Legend: Key Focus Area, Related Area
  • 6. Conceptual Architecture: Integration Focus Area (diagram)
    – Presentation: Refining Optimization Center
    – Business Analytics (Dimensional Models)
    – Integration Services: Physical Integration (Project DB; Extract, Transform & Load); Virtual Integration
    – Information Integration: Information Integration Services; Process & Workflow
    – Business Services: Scheduling, Simulations, Planning, Shipping, Maintenance, Blending, Historian, Production Operations
    – Information Sources (Data Integration): Application Data, Document Repositories, Data Sources
    – Data Governance; Master Data Top 10 (Material, Production, Wharf, Tank)
    – DCS & instruments; OE/Reliability; Test Results; Field Reliability Center
    Legend: Key Focus Area, Related Area
  • 7. Solution Integration Architecture and Information Architecture
    Solution Integration Architecture focus areas:
    – Process Integration: Workflow, Orchestration, ESB
    – Business Performance Mgmt: Business Activity Monitoring
    – Collaboration: Communications
    Information Architecture focus areas:
    – Data Integration: EAI, ETL, EII, CDC, Replication
    – Data Management: Repository for Core Data standardization; Data Governance; Master Data Management; Taxonomy
    – Data Standardization: Chevron/Industry standard model definition, Semantic model
    – Business Intelligence (BI): Reporting; Dashboard
    – Unstructured Data Management: Semantic Web; Ontology
  • 8. IT Integration: Business Requirements
  • 9. Console Operator’s Data View (diagram)
    Systems shown around the Console Operator HMI 360 view: ROC; Oil Movement Systems; Optimization (V-mesa, GDOT, DMCplus); Early Event Detection (EED); Mass Balance; Alarm System (CAMMS); Procedure Mgmt System (ExaPilot); Root Cause Analysis; Plant Resource Mgmt System (PRM); Business Event Log; Global Notification; Communication; Alert Mgmt System; Document Mgmt; Knowledge Mgmt; Learning Mgmt; Console Scheduling Instruction; Enterprise Asset Management (Maximo); Erroneous Data Log; IT Asset Mgmt (ITSM); KPI Tracking & Dashboard; Lab System (STARLIMS); Optimization Log; Scheduling Systems; Lynx; SIMTO/MBO/RVT; Simulator; Shift Turnover; Structure Round; Safety Instrumented System (SIS); LP System; Flying Petro; Loss Prevention & HES Systems
    Legend: IT scope, AT scope, Existing
  • 10. System Data Flow (partial example). Based on BPR Business Requirements and IA Assessment details, every system and its data inputs/outputs are depicted in data flow diagrams and vetted with SMEs to ensure that Business Requirements are captured correctly in relation to data. These diagrams then serve as the basis for our RICEF (Report, Interface, Conversion, Extension & Forms) detail list. Legend: IT scope, AT scope, Existing
  • 11. RICEF List Summary
    RICEF is an acronym for Reports, Interfaces, Conversions, Extensions, and Forms, all of which are the basis for Data Integration.
    – Reports (6): Shift Turnover Report; Console Scheduling Instructions; Training Level Assessment; Equipment Maximo List; Lab Stream Reports; Others
    – Web Reports / Dashboard (>100): General KPIs; Blending KPIs; EED & Alarms KPIs; Reliability KPIs; SIS KPIs; Energy KPIs; Optimization KPIs; Process Control KPIs (?)
    – Conversions (10): Document Management System (DMS); Knowledge Management System (KMS); Turnover documents into Shift Turnover System; Plant Procedures into ExaPilot; Console Scheduling Instructions (CSI); Learning Management System (LMS); Plant economics information to Area of Optimization Petro; P&IDs, drawings; ITSM conversion; Facilities Phone list to Refinery Phone directory
    – Extensions (Forms) (3): Maximo; Lab; Knowledge Management System
  • 12. RICEF List Summary (example)
    A total of 36 “system to system” data interfaces in IT Scope:
    – 25 interfaces with bulk data on a scheduled basis
    – 7 on-demand interfaces with a low volume of data
    – 4 interfaces with real-time streaming data
    6+ of the 36 have process & workflow requirements.
  • 13. RICEF List: Interfaces in IT Scope
    Bulk Data (large volumes) on a Scheduled Basis (25):
    – Scheduling (SIMTO/MBO) to Console Scheduling Instructions (CSI) *
    – Scheduling (SIMTO/MBO) to RVT
    – Erroneous Data Log (EDL) to RVT (Lynx) *
    – Structure Rounds to Shift Turnover
    – Event Log to Shift Turnover
    – Maximo to Shift Turnover
    – Shift Turnover to Knowledge Management System (KMS) **
    – Optimization Log to KMS **
    – Document Management System (DMS) to KMS **
    – PI to Health Environmental and Safety (HES)
    – PI to Accounting systems
    – PI to Lab
    – Lab to DMS
    – Learning Management System (LMS) to DMS
    – LMS to People Data
    – IT Asset Management System (ITSM) to Simulator Scheduler
    – CSI/SIMTO to RVT
    – SIMTO to Area of Optimization
    – Oil Movement System to Wharf/Pipeline Scheduling (2)
    – Maximo / Lab / Reliability / PI to KPI (4+)
    * Indicates possible workflow
    ** Indicates unstructured data
  • 14. RICEF List: Interfaces in IT Scope
    On Demand & Lower Volume Data (7):
    – Maximo to Maintenance Schedule
    – Maximo to Maintenance Request (approval/denial) *
    – Learning Management System (LMS) to Simulation Systems
    – LMS to Simulator Scheduling System
    – Document Management System (DMS) to Simulator Scheduling System
    – DMS to IT Asset Management System (ITSM)
    – LMS to ITSM
    Real-time and streaming data (4):
    – Oil Movement Systems (OMS) to Alert System
    – Lab to Alert System
    – Task System to Alert System
    – Alert System to Global Notification System
    Process Interfaces: Workflow (6):
    – Shift Turnover to Knowledge Management System (KMS) (approval) *
    – Maximo to Maintenance Request (approval/denial) *
    – Document Management System (DMS) to KMS (approval) *
    – Optimization Log to KMS (approval) *
    – Scheduling (SIMTO/MBO) to Console Scheduling Instructions (CSI) *
    – Erroneous Data Log (EDL) to RVT (Lynx) *
    * Indicates possible workflow
  • 15. RICEF List: Interfaces in PCIS Scope
    PCIS Scope (11):
    – PI to Mass Balance
    – PRM to Alarm System
    – DCS to Alarm System
    – PRM to Maximo
    – Lab to Mass Balance
    – DCS to APC
    – DCS to ExaPilot
    – DCS to OMS
    – DCS to PI
    – PI to APC
    – Scheduling (SIMTO/MBO) to OMS
  • 16. Integration Landscape (diagram)
    Systems shown: Oil Movement System; Early Event Detection; Mass Balance; Alarm System; Procedure Mgmt System (Exa-Pilot); Plant Resource Mgmt System; Business Event Log; Alert Mgmt System; Knowledge Mgmt System; Learning Mgmt System; Console Scheduling Instruction; Enterprise Asset Management; Erroneous Data Log; IT Asset Mgmt (ITSM); KPI Tracking & Dashboard; Lab Systems (Starlims); Optimization Log; Shift Turnover; Structure Round; PI (ODS with History); DCS; Simulator Scheduling; Outlook; People Data; Safety Instrumented System; Scheduling RVT; Scheduling Systems SIMTO/MBO; Simulation; Communication; Global Notification; Document Mgmt System; Flying Petro; Optimization
    Interface types: Real Time, Scheduled, On Request
    Boundaries: Level 3.5, PCIS Scope, IT Scope
    Legend: IT scope, AT scope, Existing
  • 17. Integration Technology Comparison
  • 18. The Integration Stack & Products (diagram)
    Axes: Integration Stack Vendor Capability (Low to High); Business Capability of adopting Integration (up to High); Centralized Integration (More to Less)
    Stages, with the components available at each stage:
    – Data Transfer (File Transfer, ETL): moving data
    – Point-to-point Interfaces (File Transfer, ETL, Ad-Hoc Interfaces): to synchronize and rationalize data for systems
    – EAI (ETL, Adapters): leveraging functionality in applications
    – BPM & SOA (WS*-Communication, ESB, Orchestration Engine, BAM, Service Registry): to create new processes and services to support business needs
    – Composite Solutions (WS*-Communication, ESB, Orchestration Engine, BAM, Service Registry, Web 2.0): predicting future business while sharing business capabilities with partners
    – Integration 2.0 (WS*-Communication, ESB, Orchestration Engine, BAM, Service Registry, Web 2.0, EDA): anywhere, anytime, and through any standard means
    Integration technology today: mature integration techniques with proven & robust technology at the lower stages; innovative integration techniques with complex & flexible technology at the higher stages.
  • 19. Key Integration Components (diagram)
    Custom code remains a popular option.
    – Key Data Integration Methods: EII (Enterprise Information Integration); CDC (Change Data Capture); EAI (Enterprise Application Integration); Data Replication; ETL (Extract, Transform, Load)
    – SOA (Service Oriented Architecture) Framework
    – Key Process Integration Methods: Workflow; EAI Orchestration + ESB; ESB (Enterprise Service Bus); Web Services
  • 20. Overview of Integration Components
  • 21. Enterprise Information Integration (EII): Data Virtualization
    As project-oriented DI middleware, data virtualization is often referred to as virtual data federation, high-performance query or EII. As enterprise architecture, it is frequently described as a virtualized data layer, an information grid, an information fabric or as data services in SOA environments.
    – EII is a middle-tier query server that contains a metadata layer with consolidated business definitions.
    – It communicates through web services, database connections, or XML.
    – A listener waits for a request, then sends whatever queries are needed across whatever data sources are required to return data to the requestor.
    – Metadata robustness is the differentiator.
    – Federated data stores provide access to enterprise data without forcing central control.
  • 22. Technology Overview: EII (Enterprise Information Integration)
    EII tools: Create virtual views of distributed enterprise data through queries executed in real time, without physically moving or copying data. Also known as data federation, virtual data warehousing, or data virtualization.
    Benefits:
    – Latency: Through federated queries, information can be accessed within milliseconds.
    – Storage: Data is not moved or copied from source systems, so additional storage is not required.
    Drawbacks:
    – Volume: Should only be used for small, targeted data sets.
    – Quality: Minimal transformation capabilities; adding transformations negatively impacts latency.
    Summary: Although it never made a material impact as a pure-play market, data federation is an important part of the data integration platform. Watch out for the high level of integration and maintenance effort.
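A minimal sketch of the federated "virtual view" idea described for EII, assuming two hypothetical source systems (a lab database and an oil-movement database) modeled as separate in-memory SQLite databases. The table names, column names and tank identifiers are illustrative assumptions, not taken from the deck. Each request queries the sources on the spot and merges the answers, so nothing is copied into a central store.

```python
import sqlite3


def make_source(ddl, insert_sql, rows):
    """Create one independent in-memory database standing in for a source system."""
    conn = sqlite3.connect(":memory:")
    conn.execute(ddl)
    conn.executemany(insert_sql, rows)
    conn.commit()
    return conn


# Hypothetical source systems (names and data are assumptions for illustration).
lab = make_source(
    "CREATE TABLE samples (tank TEXT, sulfur_ppm REAL)",
    "INSERT INTO samples VALUES (?, ?)",
    [("T-101", 12.5), ("T-102", 48.0)],
)
oms = make_source(
    "CREATE TABLE tanks (tank TEXT, level_pct REAL)",
    "INSERT INTO tanks VALUES (?, ?)",
    [("T-101", 71.0), ("T-102", 33.5)],
)


def tank_view(tank):
    """Virtual view: query each source at request time and merge the results."""
    sulfur = lab.execute(
        "SELECT sulfur_ppm FROM samples WHERE tank = ?", (tank,)
    ).fetchone()
    level = oms.execute(
        "SELECT level_pct FROM tanks WHERE tank = ?", (tank,)
    ).fetchone()
    return {
        "tank": tank,
        "sulfur_ppm": sulfur[0] if sulfur else None,
        "level_pct": level[0] if level else None,
    }
```

Because every call fans out to the sources, this pattern suits small, targeted lookups, which matches the volume drawback listed on the slide.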
  • 23. Enterprise Application Integration (EAI)
    EAI is focused on moving data between Enterprise Applications with business logic applied. It picks up an application transaction and initiates a transaction in another system; for example, a new order captured in a CRM system is entered into your Financial Application.
    – Driven by business events
    – Connectivity between applications
    – Information consistency is a key requirement
    – Bus/hub with application adaptors
    – Wire-level messaging protocols
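The order example on this slide can be sketched as a tiny event hub with one application adapter. This is an illustrative sketch only: the `Hub` class, the `order.created` event name, the adapter and the ledger fields are all hypothetical and do not represent any EAI product's API.

```python
from collections import defaultdict


class Hub:
    """Minimal message hub: adapters subscribe to business events by type."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self._subscribers[event_type].append(handler)

    def publish(self, event_type, payload):
        # Deliver the event to every adapter registered for this event type.
        return [handler(payload) for handler in self._subscribers[event_type]]


# Stand-in for the financial application's ledger (hypothetical).
ledger = []


def finance_adapter(order):
    """Business logic applied in transit: turn a CRM order into a ledger entry."""
    entry = {
        "account": "AR",
        "ref": order["order_id"],
        "amount": order["qty"] * order["unit_price"],
    }
    ledger.append(entry)
    return entry


hub = Hub()
hub.subscribe("order.created", finance_adapter)

# The CRM side raises a business event; the hub initiates the finance transaction.
hub.publish("order.created", {"order_id": "SO-1", "qty": 3, "unit_price": 20.0})
```

The CRM system never calls the financial application directly; it only publishes an event, which is the hub-and-adapter arrangement the slide lists.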
  • 24. Technology Overview: EAI (Enterprise Application Integration)
    EAI tools: These products, which started out as rudimentary software that supported basic messaging, routing, and data transformation needs, have grown into more sophisticated tools that now also provide full support for SOA as well as electronic data interchange (EDI).
    Benefits:
    – Latency: Through message-based orchestration, information can be transferred within seconds to service real-time data integration.
    – Event based: Data transfer can be triggered by an event.
    Drawbacks:
    – Proprietary: Traditional EAI vendors used proprietary protocols.
    – Quality: Data validation can be performed; however, matching and merging data across multiple source systems is not a strength of this toolset.
    Summary: An EAI toolset can be brought in as a middleware framework to support SOA; however, with insufficient in-house experience, a POC is recommended.
  • 25. Extract Transform Load (ETL)
    As a data integration hub, ETL products connect to a broader array of databases, systems, and applications as well as other integration hubs. ETL batch architecture is generally split into 4 major components: Extract, Clean, Transform and Load. ETL products:
    – Provide expanded functionality, especially in the areas of data quality, transformation, and administration
    – Coordinate and exchange metadata among heterogeneous systems to deliver a highly integrated environment that is easy to use and adapts well to change
    – Capture and process data in batch or near-real time using a standardized information delivery architecture
    – Provide greater performance, throughput, and scalability to process larger volumes of data at higher speeds
    – Load data more quickly and reliably by adding change data capture techniques, continuous processing, and improved runtime operations
  • 26. Technology Overview: ETL (Extract, Transform and Load)
    ETL tools:
    – Batch or incremental extraction of high volumes of data from one or more sources
    – Able to run complex transformations on the data, which can include cleansing, reformatting, standardization, aggregation, or the application of any number of business rules
    – Load the resulting data set into specified target systems
    Benefits:
    – Volume: Manages extremely high volumes of data movement.
    – Quality: Allows for complex data transformations, enabling much higher quality and hence more usable information.
    – Re-use: Routines to extract/transform/load can be re-used by many applications.
    Drawbacks:
    – Latency: Optimized when scheduled as batch data movement as opposed to real-time or on-demand. By reducing the volume throughput (with CDC), ETL can be used to meet operational near real-time requirements.
    – Performance: Extracts can cause performance impacts on source production systems, so low-impact batch extraction “windows” need to be identified, or CDC should be used.
    Summary: ETL has spread beyond data warehousing and can support near real-time data integration for both operational and BI applications.
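The four batch components named on the ETL slides (Extract, Clean, Transform, Load) can be sketched as a small pipeline. This is a minimal illustration under stated assumptions: the CSV layout, the `tank_level` table and the cleansing rules are hypothetical, and SQLite stands in for a target system.

```python
import csv
import io
import sqlite3

# Hypothetical raw extract from a source system (note the blank and untidy values).
RAW = """tank,reading,unit
T-101, 71.0 ,pct
T-102,,pct
t-103,33.5,pct
"""


def extract(text):
    """Extract: read source rows as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def clean(rows):
    """Clean: drop rows whose reading is missing."""
    return [r for r in rows if r["reading"].strip()]


def transform(rows):
    """Transform: standardize tank codes and convert readings to numbers."""
    return [(r["tank"].strip().upper(), float(r["reading"])) for r in rows]


def load(conn, rows):
    """Load: write the standardized rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tank_level (tank TEXT PRIMARY KEY, level_pct REAL)"
    )
    conn.executemany("INSERT OR REPLACE INTO tank_level VALUES (?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(clean(extract(RAW))))
```

Keeping each stage as its own function mirrors the re-use benefit on the slide: the same `clean` and `transform` routines could serve several feeds.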
  • 27. Change Data Capture (CDC)
    CDC Integration Suite provides secure, high-volume, real-time, bi-directional data integration and transformation between applications. This product supports a wide range of databases, including those that run on legacy, back-office, and other operational systems on different platforms.
    Diagram: Publisher (Journal Log, Redo/Archive Logs, Publisher Engine and Metadata) connected over TCP/IP to Subscriber (Subscriber Engine and Metadata; targets: Database, Audit Database, Message Queue, Web Services, Business Process); GUI: Unified Admin Point with Monitoring.
    – Provides pseudo to actual real-time update capabilities
    – Heterogeneous system and platform support
    – Real-time selective data capture and delivery
    – Limited data transformation
    – High performance even with very high volumes
    – Guaranteed integrity of data transactions (2-phase commit)
  • 28. Technology Overview: CDC (Change Data Capture)
    CDC tools: The optimal approach is to capture “deltas”, or changes in the source data created or updated in operational systems, as they are written to the DBMS log files and make them immediately available in real time to downstream applications.
    Benefits:
    – Volume: Captures only changes or “deltas” since the last pull from source databases, reducing the amount of data that needs to be moved.
    – Performance: With the option to read database log files rather than the production database, there is no performance impact on source operational systems.
    – Latency: Can enable continuous updates throughout the day.
    Drawbacks:
    – Latency: No latency issue in itself; however, because CDC reads source logs, source system downtime can create extra administrative work to synchronize and monitor data.
    – Performance: Data transformation ability is more limited than in ETL tools.
    Summary: CDC could be combined with ETL to enable near real-time integration and more throughput, with less impact on source systems.
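A minimal sketch of the "deltas since last pull" idea, assuming a change log modeled as a list of entries with a monotonically increasing sequence number. The field names (`lsn`, `op`, `key`, `value`) are hypothetical and do not correspond to any specific DBMS log format.

```python
def capture_changes(log, last_lsn):
    """Return the log entries written after last_lsn and the new high-water mark."""
    deltas = [entry for entry in log if entry["lsn"] > last_lsn]
    new_lsn = max((entry["lsn"] for entry in deltas), default=last_lsn)
    return deltas, new_lsn


def apply_changes(target, deltas):
    """Replay captured changes, in order, against a downstream copy."""
    for entry in deltas:
        if entry["op"] == "delete":
            target.pop(entry["key"], None)
        else:  # insert or update
            target[entry["key"]] = entry["value"]


# Hypothetical change log written by the source system.
log = [
    {"lsn": 1, "op": "insert", "key": "T-101", "value": 70.0},
    {"lsn": 2, "op": "insert", "key": "T-102", "value": 30.0},
    {"lsn": 3, "op": "update", "key": "T-101", "value": 71.5},
    {"lsn": 4, "op": "delete", "key": "T-102", "value": None},
]

target = {}

# First pull: only the first two entries exist in the log so far.
first, mark = capture_changes(log[:2], last_lsn=0)
apply_changes(target, first)

# Second pull: only entries after the stored high-water mark are moved.
second, mark = capture_changes(log, last_lsn=mark)
apply_changes(target, second)
```

The second pull moves two entries instead of four, which is the volume benefit described on the slide; the source tables themselves are never queried.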
  • 29. Enterprise Service Bus (ESB). Copyright © 2006 Accenture All Rights Reserved.
    – ESB is a product category in the integration market.
    – ESBs (as products) are a set of technologies developed to support program-to-program communication/integration (such as Web services, Object request brokers, Remote procedure calls, MOM (Message Oriented Middleware), etc.) and SOA.
    – ESBs are seen as a potential evolution of middleware technologies:
      • All in one package (Administration and Management services, Service Definition tools and Repository services)
      • Combine features from previous integration technologies
      • Provide value-added services: Intelligent Routing, Message validation, Transformation, Security, Load balancing, etc.
      • Support highly distributed architectures
    Diagram: ENTERPRISE SERVICE BUS connecting a .NET Application (SOAP/HTTP), a J2EE Application (JMS/JCA), a Legacy Application (JCA / MQ Gateway), a Partner Web Service (SOAP/HTTP), Enterprise Apps (Adapters), and a Distributed Query Engine over Databases.
  • 30. Technology Overview: ESB (Enterprise Service Bus)
    ESB tools: These technologies typically incorporate adapter technology to connect to a variety of application and database types, the ability to route transactions according to business rules, and the ability to transport transactions from source to target with low latency.
    Benefits:
    – Open: More open than EAI tools. Universal support for distributed processing.
    – External Entities: ESB is easier to configure and implement; hence it is often chosen to support B2B applications.
    Drawbacks:
    – Vendor: ESB-only vendors are smaller than EAI vendors.
    – Experience: Lack of Chevron internal working and support knowledge.
    Summary: ESB is best used for establishing business processes (BPM) and orchestration infrastructure that will leverage a business services layer to support SOA across the entire enterprise. ESB federation can also be implemented to mitigate drawbacks.
  • 31. Differences Between EAI and ETL (focus on the differences)
    – Focus: EAI: Application Integration, Process, B2B. ETL: Data Integration, Analytic, KPI
    – Timing: EAI: Real-Time. ETL: Batch, Near-real time
    – Data: EAI: Transactional. ETL: Historical
    – Transformation: EAI: Minimal. ETL: Complex
    – Interfaces: EAI: Predictable. ETL: Evolutionary
    – Volume: EAI: Single Message or Transaction. ETL: Bulk (Hour, Day, Week, etc.)
  • 32. IT Integration: Strategy Decision
  • 33. Approach for Integration Strategy Decision
    1. Translate BPR business process and functional requirements into system data flow diagrams, vetted and confirmed with business SMEs and AT teams. Categorize data integration requirements based on the system data flow diagrams and create the RICEF list.
    2. Study standards and seek to understand the environment.
    3. Leverage other Information Management initiatives and EA direction. Capture lessons learned from other projects.
    4. Conduct technology scanning and gather industry information from Gartner, Forrester, Open O&M and vendors.
    5. Develop an Integration strategy, focused on Data Integration, that supports Requirements, Process Integration and Data Management.
    6. Vet with Architects (IT AA, IA and SIA teams) to get feedback.
    7. Present the Integration strategy recommendation to the technical review team.
    8. Include stakeholder and technical review board feedback and update the recommendation.
  • 34. Objective and Criteria for Data Integration Strategy Decision
    Decision Objective: Must support the business’ need in delivering timely & well integrated data with consistent naming, content and meaning, providing Console Operators a complete and concise view of trusted data.
    Decision Criteria:
    – Requirements: How well does the DI decision satisfy the requirements?
    – Standards: How well does the DI decision align with Chevron standards?
    – Reliability: Is the DI decision proven with robust technologies?
    – Interoperability: How well do the DI components interoperate?
    – Supportability: Does the DI decision match organization capability?
    – Total Cost of Ownership: Does this decision offer the optimum TCO?
    – Sustainability: Can the DI decision easily adapt to business changes?
    – Data Management: Does the DI decision support information management disciplines?
  • 35. Key Integration Alternatives (diagram)
    Custom code remains a popular option.
    – Key Data Integration Methods: EII (Enterprise Information Integration); CDC (Change Data Capture); EAI (Enterprise Application Integration); Data Replication; ETL (Extract, Transform, Load)
    – SOA (Service Oriented Architecture) Framework
    – Key Process Integration Methods: Workflow; EAI Orchestration + ESB; ESB (Enterprise Service Bus); Web Services
  • 36. Step 1: Business Requirement
    36 “system to system” data interfaces in R00#1 IT Scope:
    – 25 bulk data on a scheduled basis
    – 7 on demand with a low volume of data
    – 4 real-time streaming data
  • 37. Step 2: Standard and Usage
    Component and Technology Standard:
    – Data Integration (ETL tools). Managed Choice: 1. IBM InfoSphere DataStage; 2. MS SSIS
    – Process Integration (EAI tools). Managed Choice: 1. SAP XI (version 3); 2. BizTalk
    – Integration Middleware. Managed Choice: 1. Integration Brokers (EAI tools); 2. Web Services; 3. Batch file transfer; 4. Direct access; 5. Intermediate databases; 6. Custom built
  • 38. Step 3: Lessons learned from other projects
    – Selected SOA architecture to facilitate multiple data integration points with real time BI integration
    – Realized the value of Master Data Management with their SOA implementation
    – Selected a hub and spoke architecture to facilitate multiple data integration points with complex data translations. Selected ETL platform for data movement for all planning and scheduling data.
    – Realized the value of web services to facilitate work flow for data validation processes within the Refineries.
    – Selected a hybrid architecture to facilitate multiple data integration points with complex data translations. Most data was required in real time to capture trade deals.
    – Selected an ETL platform for application integration with robust transformation.
    – Selected orchestration to facilitate work flow and data integration with external parties and systems
  • 39. Step 4: Integration Capability Comparisons (matrix)
    Data integration technologies compared: ETL; EAI (Orchestration); EII; ESB
    Capabilities rated: Bulk Data Transfer; Real Time Messaging Routing; On Demand Data Integration; Metadata Data Management; Data Transformation; Process Orchestration; Distributed Processing; Data Standardization; Human Interfaces; SOA and Web Services Integration; Workflow
    Legend: full support, partial support, no support
  • 40. Step 5: Integration Components Chosen (diagram, with the selection highlighted)
    Custom code remains a popular option.
    – Key Data Integration Methods: EII (Enterprise Information Integration); CDC (Change Data Capture); EAI (Enterprise Application Integration); Data Replication; ETL (Extract, Transform, Load)
    – SOA (Service Oriented Architecture) Framework
    – Key Process Integration Methods: Workflow; EAI Orchestration + ESB; ESB (Enterprise Service Bus); Web Services
  • 41. Integration Strategy Decision
    – Include both Process and Data Integration as a hybrid architecture
      • Process Integration includes EAI Orchestration and Workflow
      • Application Integration includes EAI for real-time data integration
      • Data Integration includes ETL for non real-time bulk data integration
      • ETL platform can be used to add on Data Management toolset
    – Exclude technologies that do not meet requirements or criteria
      • Custom coding does not meet supportability & TCO criteria
      • EII does not meet Info Mgmt data standardization criteria
      • CDC for near real-time data can be handled by EAI
      • Replication/CDC is already used by PI (ODS), but is not extensible
      • ESB is not yet in CVX standard; EAI has some ESB features
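As a rough illustration of how this hybrid decision distributes the 36 in-scope interfaces, the sketch below maps each interface category to a technology and tallies the load. The bulk and real-time assignments follow the decision above; routing on-demand interfaces to EAI is an assumption made here (they are described in the deck as event-triggered), and the dictionary and function names are hypothetical.

```python
from collections import Counter

# Category -> technology. "on_demand" -> EAI is an assumption for illustration.
TECHNOLOGY = {
    "bulk_scheduled": "ETL",  # non real-time bulk data integration
    "real_time": "EAI",       # real-time application integration
    "on_demand": "EAI",       # assumed: event-triggered, low volume
}

# Interface counts from the Step 1 business requirement slide.
INTERFACE_COUNTS = {"bulk_scheduled": 25, "on_demand": 7, "real_time": 4}


def workload_by_technology(counts):
    """Sum the number of interfaces each technology would carry."""
    totals = Counter()
    for category, n in counts.items():
        totals[TECHNOLOGY[category]] += n
    return dict(totals)
```

Under these assumptions, roughly two thirds of the interfaces land on the ETL platform, which is consistent with the deck treating ETL as the baseline.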
  • 42. Integration Conceptual Architecture: Hybrid Technology (diagram)
    – Process Integration, Orchestration (EAI): Human Workflow; Filter, Route, Transform; XML messages; Service Calls; Service wrapper; Guarantee; Requesting applications; Receiving applications; Other
    – Data Integration Hub (ETL): Extract, Transform, Load; Profile; Quality; Metadata Mgmt; MDM; Staging Area
    – Data stores and consumers: Source Systems; ODS (PI); Data Warehouse; Data Mart; OLAP cube; HMI; Operational BI; Analytical BI; Target Systems
  • 43. Data Integration Strategy: Decision Rationale
    Requirements
    – The combination of ETL, EAI & Workflow components satisfies the business requirements of bulk data transfer, real-time data integration and workflow.
    – An ETL platform is necessary to facilitate the analytical BI environment. A mature ETL platform incorporates an information management and data standardization toolset.
    – EAI is required to facilitate the real-time application integration and automated work processes defined by the BPR teams.
    Standards
    – Technology standards and best practices include Data Integration (ETL) and Application Integration (EAI) Toolsets.
    – Select a Data Management standard for Master Data Management, Metadata Management and Data Quality.
    Reliability
    – Using the ETL toolset to provide bulk and scheduled data interfaces as the baseline. This technology has been proven and used by many projects.
    – EAI technology is mature, EAI toolset (BizTalk).
  • 44. Data Integration Strategy: Decision Rationale (continued)
    Interoperability
    – Partner orchestration of BizTalk and SharePoint portal.
    – Standard ETL and BI toolsets with web services to prove interoperability across the platforms.
    Supportability
    – Continue to use PI as the data transfer hub between the Plant Information Network (PIN) and the Process Control Network (PCN). Leveraging what exists increases supportability. Apply PCIS standards to utilize OPC for PI integration.
    – GDST and Refinery have experience with implementing and supporting ETL. ITC provides services for ETL support, database support and BI support.
    Total Cost of Ownership
    – Using the standard EAI and ETL toolsets to facilitate centrally managed refinery process and data flow would bring cost benefits by leveraging enterprise support and license costs.
    – Infrastructure and resources for the ETL toolset may be shared with Lynx within refineries, which can greatly reduce license and support costs.
  • 45. Data Integration Architecture: Decision Rationale (continued)
    Sustainability
    – EAI and workflow provide the foundation for an SOA framework, and adoption of newer technology (composite software and Web 2.0) is feasible.
    – Once we gain more experience with the EAI and workflow toolset, it can be expanded to handle more integrations to accelerate SOA.
    – We will maintain ETL as a foundation to add on real-time and on-demand components.
    – SOA is still maturing. Work with ITC to ensure we remain consistent with the company direction on SOA.
    Data Management
    – Info Mgmt disciplines provide Data Governance, Master Data Management, Metadata Management and Data Quality improvement.
    – Using the data integration hub to provide a standardized data layer gives a good foundation for information management.
  • 46. IT Recommendation: Integration Strategy and Architecture
  • 47. Integration Strategy Recommendation
    Columns: Business Requirements | ETL | EAI | Workflow
    – Near real time and scheduled Bulk Data Conversion and Interfaces: X
    – Integrate data from Operational BI to Analytical BI (load ODS and staging data into DW/Data Mart/OLAP): X
    – Real-time Integration (application to application or Integration Hub to HMI): X
    – On Demand low volume of data (event triggered data delivery): X
    – Human-centric Workflow with orchestration: X X
    – Provide services to HMI in connecting all portals, application data, workflow data, integration hub data and collaboration data: X X
    – Data Transformation, Meta Data Management, Data Cleansing: X
  • 48. Integration Architecture
  • 49. Next Steps
    – Review Proof of Technology findings of EAI Tools
    – Gather and review feedback to update the DI Strategy recommendations:
      • Internal: IT EA and AA teams; AT team, if feasible
      • External: Information Architects
    – Recommend Data Integration Toolsets
  • 50. IT Integration: Appendix – Lessons Learned
  • 51. Lessons learned 1
    – SOA architecture to facilitate multiple data integration points with real time BI integration instead of using an integration middleware
      • However, this alternative carries a very large architectural footprint, higher costs, and demands for technology expertise.
    – Master Data Management with their SOA implementation
      • A reference data model was added to the SOA implementation when data quality issues surfaced due to disparate data sources.
  • 52. Upstream Foundation Services vs Data Integration
    SO:
    – Small messages on demand
    – Transformations tend to be simple
    BI:
    – Infrequent exchanges of (large) amounts of data
    – Transformations complex
    – Increasing drive for Real Time DW
    SOBI:
    – Leverages the strengths at the extremes
    – Exploits the middle ground
    Messages vs. Data (from SO to BI): Fine Grain Services / Real-time events; Medium Grain Services; Coarse Grain Import / Export / ETL
  • 53. SOBI Summary
    Service Orientation (SO):
    – Provides application-to-application integration
    – Well suited to events and real-time data (high frequency)
    – Allows agile change in business processes
    – Supports reuse of enterprise components
    – Encapsulates and abstracts functionality
    – Tightly defined data formats and structures
    Business Intelligence (BI):
    – Well suited for data-to-data integration
    – Can handle large data volumes
    – Provides foundation for business decisions
    – Provides a combined model of the enterprise data
    – Good tools and mechanisms for transforming data
    – Ability to question the data and to answer key business questions
  • 54. Solution Architecture: Services Integration pattern (diagram)
    – Presentation: Presentation Services (Analysis & Reporting)
    – Business Analytics & Analysis Services (Dimensional Models)
    – Integration Services: Physical Integration (Project DB; Extract, Transform & Load); Virtual Integration
    – Services: Atomic & Composite Entity Services; Process & Workflow; Business Services (Production, Drilling, HES, Maintenance, Financial, Well, Reservoir, Surveillance, Analytics, Notification)
    – Data (Enterprise, OPCO, SBU, Asset): Application Data Facade; Document Repositories Facades; Data Sources Facades; Systems of Record (SoR); Hierarchy & Cross Reference Services
    – Master, Reference & Hierarchy "SUPER 7": Well, Reservoir, Equipment, Field, Property, Location, Facility
    – Message Standards (schemas & semantics) at each layer: Business Message Standards; Data Integration Message Standards
  • 55. Lessons learned 2
    – Select a hub and spoke architecture to facilitate multiple data integration points with complex data translations. Most data required for 1-7 day plans.
    – Use ETL platform for data movement for all planning and scheduling data.
      • Several ODS tables and data warehouse structures were built in the central hub (San Ramon) with supporting individual hubs within each refinery.
      • A robust cross reference model was used for the numerous codes and data sources to provide a consistent name and definition of master data across the supply chain.
    – Use the value of web services to facilitate work flow for data validation processes.
      • A web services front end was added to the Validation Tool that provides updates and corrections for data to be used in the scheduling tool (SIMTO).
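The cross reference model mentioned in this lesson can be pictured as a lookup from each system's local code to one shared master identifier. The sketch below is an assumption-laden illustration: the system names, local codes and master identifier are invented, and a real model would live in governed master data tables rather than a dictionary.

```python
# Hypothetical cross-reference table: (source system, local code) -> master identifier.
XREF = {
    ("scheduling", "TK101"): "T-101",
    ("lab", "Tank 101"): "T-101",
    ("maintenance", "EQ-000101"): "T-101",
}


def to_master(system, code):
    """Resolve a source-specific code to the shared master identifier."""
    try:
        return XREF[(system, code)]
    except KeyError:
        raise KeyError(f"no cross-reference for {code!r} in {system!r}") from None


def same_entity(a, b):
    """True when two (system, code) pairs refer to the same master record."""
    return to_master(*a) == to_master(*b)
```

Failing loudly on an unmapped code, rather than passing it through, is one way to keep inconsistent names from leaking into the hub; whether the original project did this is not stated in the deck.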
  • 56. Conceptual Architecture: Hub and Spoke Pattern (diagram)
    – “Integration”: External Source Systems (including SRA (Crude), ICTS, SAP) feeding the hub via ETL; P to P Interfaces (driven by Sub Teams) using Stored Procs, ETL or Connect Direct
    – “Availability”: Operational Data Stores/Staging; Lynx Data Warehouses (regional & global); ETL; Common Data Model; Master Data Management; Common Business Transformations; SQL Server database
    – “Full visibility” with limited event notification capabilities: Lynx Reporting/Analytics Architecture (Web access; Dashboard/KPI; ADHOC Reporting/Queries; Drill Down, OLAP; Metadata; Reporting/Analytics Tool)
  • 57 Lessons learned 3
    – Hybrid architecture, to facilitate multiple data integration points with complex data translations. Most data was required in real time to capture trade deals.
    – ETL platform for application integration with robust transformation.
    – Orchestration tool (BizTalk) facilitates work flow and data integration with external parties and systems.
  • 58 Logical Architecture: Hybrid integration pattern
    [Diagram. Recoverable labels: External parties: Service Providers; Transport Providers (Pipeline, 3rd Party, Ship, Ports, Rail, Trucks); Terminal; Inspection; Leases; Extex (Royalty Payments); Market Data Providers; Deal Confirm; Exchanges; Banks; Counterparty; Brokers; Rating Agency; Refineries. Internal systems: SAP; SAP XI; Rolfe & Nolan; NAVARIK; Trading 1, Trading 2, Trading 3; 4GEN Tax; "SOG"; "Corporate Credit"; Cashflow; Netback; "Master Data" (EA Master Data, EA Facilities, SAP); MDM + Xref; RTR Intraday Position Snapshot DB; ETL; BI; Document Management; Consolidated Position Viewer; Credit Engine; Valuation Libraries; Movement Tools; RailTrac; CVMS/Shipnet. Two ORCHESTRATION layers connect external parties and internal systems through services: Price, Credit, Risk, Master & Xref, CP, Banking, Exchange, SOG, Corporate Credit and R&N Services. Data flows shown include prices, noms, confirms, ship status, lifting schedules, ratings, deals, statements, royalty vols, lease vols, rates, payments, actual volumes, inspection reports, tickets, schedules, invoices, credit exposure, credit limits, plans, AR/AP/GL, master contracts, and unstructured market/CP data.]
  • 59 IT Integration: Best Practices
  • 60 Best Practices for Data Integration
    1. Don’t lose sight of the DI Architecture vision, but include tactical data integration solutions for specific business requirements. (phased approach)
    2. Categorize data by business value and usage. (prioritize)
    3. Prioritize the sequence of implementing data integration. (sequence)
    4. Document the data migration and infrastructure deployment roadmap.
    5. Establish new standards for naming, data types and metadata. (governance)
    6. Publish metadata definitions and glossaries of business terms.
    7. Establish a coexistence strategy with legacy systems. Always have a migration plan.
    8. Establish physical reference architecture and tools.
    9. Implement environments for the foundation components ahead of time.
    10. Begin data migration into the integrated environment.
  • 61 Planning for Data Migration
    • Data migration (conversion) from the legacy system to the newly integrated environment needs to be considered carefully by weighing highest value vs. highest usage.
        – Foundation data migration: implementing the main lookup data, or master data, for the enterprise
        – Core transactional data migration: detailed transactions for the basic enterprise events
        – Application data migration: supports specific company functions
    • This strategy first builds the foundational master data that end users query most often, then adds core transactional data, so that business value increases incrementally as the data becomes richer in content.
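The sequencing described above can be sketched in a few lines of Python. This is only an illustration: the dataset names and the 1-5 `value` and `usage` scores are hypothetical, and summing the two scores is an assumed way of "weighing highest value vs. highest usage", not something the slides prescribe.

```python
# Migration phases in the order the slide lists them.
PHASE_ORDER = {"foundation": 0, "core_transactional": 1, "application": 2}


def migration_sequence(datasets):
    """Order datasets by phase, then by combined value and usage (highest first)."""
    return sorted(
        datasets,
        key=lambda d: (PHASE_ORDER[d["phase"]], -(d["value"] + d["usage"])),
    )


# Hypothetical inventory; scores are made up for the example.
datasets = [
    {"name": "maintenance_app_data", "phase": "application", "value": 2, "usage": 3},
    {"name": "production_transactions", "phase": "core_transactional", "value": 5, "usage": 4},
    {"name": "equipment_master", "phase": "foundation", "value": 4, "usage": 5},
    {"name": "location_codes", "phase": "foundation", "value": 3, "usage": 3},
]
ordered = [d["name"] for d in migration_sequence(datasets)]
```

Master data sorts first regardless of score, which reflects the slide's point that foundation data is migrated before transactional and application data.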
  • 62 Data Management Foundation
    [Diagram. Pillars: Data Governance, Data Structure, Data Quality, Master Data & Metadata.
    Data Management Capabilities: Data Creation, Data Storage, Data Movement, Data Usage, Data Retirement.
    Topics listed under the pillars: Data Ownership, Data Stewardship, Data Policies, Data Standards, Data Workflow, Data Modeling, Data Taxonomy, Business Process Flows, Data Profiling, Data Cleansing, Data Transformation, Data Monitoring, Data Compliance, Data Traceability.
    Master Data & Metadata: Master Data Management, Reference Data Management, Metadata Management.]
  • 63 Integrating Data Content and Meaning
    • Another aspect of data integration is standardizing the usage of data content and meaning. This type of data content integration yields business efficiencies and better data quality.
        – Integration of content standardizes data values, e.g. lookup codes, across different databases. (For example, if a PI Tag or P&ID needs to be uniquely identified at the global level across all refineries, a newly defined unique ID can be created and tied to the existing ID.) Depending on whether the request is for local operation or global data analysis, the two sets of IDs can be translated and delivered to satisfy the user request.
        – Besides the physical data movement and storage of integrated databases, the meaning of the data needs to be standardized. Metadata provides definitions of subject areas, tables, and columns in a data repository.
        – When all users refer to the data repository, the meaning of each data element is standardized to a common definition.
        – Additional metadata can be provided that displays calculations for derived data elements, glossaries of business terms, and lineage of the source of data.
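A minimal Python sketch of the global/local ID translation described above. The class name, the refinery names, the tag value `FI-101` and the `GLOBAL-000001` ID format are all hypothetical; the slides do not say how the unique IDs are generated or stored.

```python
class CrossReference:
    """Ties each (site, local ID) pair to one newly defined global ID."""

    def __init__(self):
        self._to_global = {}   # (site, local_id) -> global_id
        self._to_local = {}    # global_id -> (site, local_id)

    def register(self, site, local_id):
        """Return the global ID for a local ID, creating it on first use."""
        key = (site, local_id)
        if key not in self._to_global:
            global_id = f"GLOBAL-{len(self._to_global) + 1:06d}"
            self._to_global[key] = global_id
            self._to_local[global_id] = key
        return self._to_global[key]

    def to_local(self, global_id):
        """Translate a global ID back to its site and local ID."""
        return self._to_local[global_id]


xref = CrossReference()
a = xref.register("Refinery A", "FI-101")
b = xref.register("Refinery B", "FI-101")   # same local tag, different site
```

Local operations keep using the existing tag, while global analysis uses the global ID; the two-way mapping is what allows either set of IDs to be delivered on request.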
  • 64 Data Integration Architecture considerations
    Commonality, consistency and interoperability of DI components:
    – Minimal number of products or product suites supporting all data deliveries
    – Single metadata repository and/or the ability to share metadata across all components
    – Common design environment to support all deliverables
    – Interoperability with other integration tools and applications
    – Efficient support for all data deliveries regardless of runtime architecture (centralized vs. distributed)
  • 65 Decision Making methodology: Top-down
    • Integrate Use Case with Pattern Matching
    • Using integration-pattern matching, look for matches by comparing the specific use cases with “typically deployed” Data Integration patterns.
    Examples:
        • To improve Global Manufacturing-wide reliability reporting, the appropriate integration pattern would be an enterprise data warehouse that physically consolidates and summarizes OE data from across all refineries.
        • To provide operational DCS information to business-level applications and for operational BI, a replicated operational data store that stores up-to-the-second transactional data would be the best fit.
        • To support upstream or downstream product movement analysis and establish a performance, a data mart or an OLAP cube sourced from the ODS or Data Warehouse would be the best pattern.
  • 66 Decision Making methodology: Bottom-up
    Assessing integration factors
    This is often valuable where the DI decision is complex and/or where a clear integration pattern match is not obvious. For example, to determine whether to use virtual integration, physical integration or a hybrid combination:
    – If data extracted from many source systems could be used by many other systems, then a physical data store is good for data reuse and future expansion.
    – If significant data cleansing and complex transformation are required, then physical data consolidation is typically the most practical choice.
    – If harmonized data needs to be aggregated and summarized for an analytical dashboard, then a physical data store is needed to load into the Data Warehouse/Data Mart and/or OLAP cubes.
    – If source systems are mostly available as systems of record and data can be passed between systems without significant data matching, merging or harmonizing, then virtual integration makes sense.
    – A hybrid combination may be a good choice if a project has both real-time business process integration and a large number of data interfaces.
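The factors above can be written down as a small rule set. The Python sketch below is one possible encoding: the field names are hypothetical, and the order in which the rules are checked (hybrid first, then physical, then virtual) is an assumption, since the slide lists the factors without ranking them.

```python
from dataclasses import dataclass


@dataclass
class IntegrationFactors:
    many_consumers: bool                # extracted data reused by many other systems
    heavy_cleansing: bool               # significant cleansing or complex transformation
    needs_aggregation: bool             # harmonized data summarized for dashboards
    sources_are_sor: bool               # sources mostly available as systems of record
    realtime_process_integration: bool  # real-time business process integration
    many_data_interfaces: bool          # large number of data interfaces


def recommend_pattern(f: IntegrationFactors) -> str:
    """Return 'hybrid', 'physical', 'virtual' or 'undecided' (assumed precedence)."""
    if f.realtime_process_integration and f.many_data_interfaces:
        return "hybrid"
    if f.many_consumers or f.heavy_cleansing or f.needs_aggregation:
        return "physical"
    if f.sources_are_sor:
        return "virtual"
    return "undecided"


dashboard_case = IntegrationFactors(
    many_consumers=False, heavy_cleansing=True, needs_aggregation=True,
    sources_are_sor=False, realtime_process_integration=False,
    many_data_interfaces=False,
)
pass_through_case = IntegrationFactors(
    many_consumers=False, heavy_cleansing=False, needs_aggregation=False,
    sources_are_sor=True, realtime_process_integration=False,
    many_data_interfaces=False,
)
```

In practice such a function would only be a starting point for discussion; the slide presents the factors as guidance rather than a mechanical rule.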
  • 67 Data Integration Patterns
  • 68 IT Integration: Industry Research
  • 69 Integration Product Comparison
  • 70 Magic Quadrant: Data Integration Tools
  • 71 Traditional EAI vs. ESB
    – ESB: Lightweight, distributed, standards-based and inexpensive. Traditional EAI: Complex, proprietary, centralized, and costly integration.
    – ESB: Flexible and adaptive business logic. Traditional EAI: Lack of support for new business logic.
    – ESB: Abstraction. Traditional EAI: Known implementation.
    – ESB: Message oriented. Traditional EAI: Object and message oriented.
    – ESB: Loosely coupled with coarse-grained business services. Traditional EAI: Tightly coupled with use of proprietary adapters.
    – ESB: Services orchestration. Traditional EAI: Application block.
    – ESB: Designed to change. Traditional EAI: Designed to last.
    – ESB: Process oriented. Traditional EAI: Functionality oriented.
    – ESB: Service Oriented Architectures. Traditional EAI: Hub-and-spoke architecture.
  • 72 Use Cases of EAI, ETL, EAI + ETL
    • EAI software
      An example: during the Internet boom, companies flocked to EAI to connect e-commerce with back-end inventory and shipping systems to reflect product availability and delivery times.
    • ETL toolset in an ‘always awake’ mode (near real time)
      To deliver near-real-time capabilities, the ETL tools typically use application-level interfaces to detect new transactions or events as soon as they are generated by the source application. They then deliver these events to any application that needs them, either immediately (near real time), at predefined intervals (scheduled), or when the target application asks for them (publish and subscribe).
    • EAI plus ETL
      EAI tools capture data and application events in real time and pass them to the ETL tools, which transform the data and load it into the BI environment.
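To make the delivery modes concrete, here is a small Python sketch of the "EAI plus ETL" flow: an event is captured, transformed, pushed immediately to registered targets, and also held for scheduled or on-request delivery. The class `NearRealTimeFeed`, its method names and the event fields are hypothetical; real EAI and ETL products expose their own interfaces.

```python
from collections import deque


class NearRealTimeFeed:
    """Captures source events, transforms them, and delivers them."""

    def __init__(self, transform):
        self._transform = transform   # ETL-style transformation function
        self._push_targets = []       # loaders that receive rows immediately
        self._pending = deque()       # rows held for scheduled/on-request delivery

    def add_push_target(self, load):
        self._push_targets.append(load)

    def capture(self, event):
        """Called when the source application generates an event."""
        row = self._transform(event)
        for load in self._push_targets:   # immediate (near real time) delivery
            load(row)
        self._pending.append(row)

    def drain(self):
        """Called by a scheduler at an interval, or by a target when it asks."""
        rows = list(self._pending)
        self._pending.clear()
        return rows


warehouse = []   # stands in for the BI environment
feed = NearRealTimeFeed(lambda e: {"order_id": e["id"], "qty": int(e["qty"])})
feed.add_push_target(warehouse.append)
feed.capture({"id": "A1", "qty": "3"})
batch = feed.drain()
```

The same captured event serves both the immediate consumer and the batch consumer, which is the division of labour the slide describes between event capture and transform-and-load.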
  • 73 What vendors say about ESB?
    – Some stress the role of the ESB in eBusiness: its inter-organizational rather than intra-organizational role
    – Almost all believe that the ESB is more than the bus it runs on. Essentially, they are describing a service-oriented architecture from another viewpoint
    – Some see orchestration as part of the ESB architecture, others do not
    – Some package MOM and EAI in their ESB products
    – Some identify event monitoring as the major differentiator from MOM
    – Some consider services management as part of the ESB solution
    – Some see an ESB as strictly related to Web services and describe it as a Web Services Network
    All vendors are “flexible” in defining ESB. Their definition always manages to show that their current solutions are using it.
  • 74 ESB, When to Consider
    – When deploying SOA across the enterprise
    – When establishing business processes (BPM) and orchestration infrastructure that will leverage a business services layer
    – When moving from a complex point-to-point or ‘spaghetti’ architecture to a more manageable and flexible IT infrastructure
    – When integrating to multiple and heterogeneous data sources and applications
    – When there is heavy business logic and security through the service bus to multiple end points
    – When further separation from composite applications is required (away from underlying implementations)
    – When flexible coupling is required
  • 75 Information (Data) Services in SOA
    • For data to be a first-class citizen in the SOA world, a clear separation must exist between data consumers and data providers. This separation mirrors the principle that service consumers and providers must be distinct and separate in an SOA. Furthermore, this separation must be delineated by an interface, or contract, that both providers and consumers share.
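The provider/consumer separation can be illustrated with a shared contract in Python. Everything named here (`WellDataService`, `InMemoryWellProvider`, `well_summary`, the record fields) is hypothetical; the sketch only shows that the consumer depends on the contract and not on any particular provider.

```python
from typing import Protocol


class WellDataService(Protocol):
    """The contract shared by data providers and data consumers."""

    def get_well(self, well_id: str) -> dict: ...


class InMemoryWellProvider:
    """One possible provider; a database- or API-backed one could replace it."""

    def __init__(self, rows):
        self._rows = dict(rows)

    def get_well(self, well_id: str) -> dict:
        return dict(self._rows[well_id])   # return a copy, not internal state


def well_summary(service: WellDataService, well_id: str) -> str:
    """A consumer: it only knows the contract, not the provider behind it."""
    well = service.get_well(well_id)
    return f"{well['name']} ({well['field']})"


provider = InMemoryWellProvider({"W-1": {"name": "Alpha 1", "field": "North"}})
summary = well_summary(provider, "W-1")
```

Swapping the provider for a different implementation would not require any change to `well_summary`, which is the practical benefit of delineating the separation with an interface.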
  • 76 Gartner on SOA and Data Services
    Gartner suggested that success in loosely-coupled service-oriented business applications (SOBAs) becomes more difficult since each design point has to verify its own semantics, context and data structures.
    Key Findings
    Under a loosely-coupled architecture, data stewardship and governance best practices can be supported by data services within an SOA instead of embedding such practices within application logic. Where people and processes were formerly embedded in application design, they now fall under the domains of business process platforms and EIM (Enterprise Information Management).
    Predictions
    Based on lessons learned through data warehouse, data mart and operational data store implementation practices, 60% of failed information-as-a-service initiatives through 2009 will list a lack of an effective data governance strategy as one root cause of failure.
    Recommendations
    Organizations should begin their selection of data profiling, quality, mining and master data management tools with the end goal of deploying all the logic and processing within these tools as services that can interoperate and execute actions on behalf of and against data used by SOBAs, and as a callable service by business context services.
  • 77 Composite solutions
    • Some of the approaches promoted by the Web 2.0 movement (mash-ups, RIA - Rich Internet Applications) are moving the integration challenges up to the presentation layer.
    [Diagram. Composite screen combining: SAP Work Management & Purchasing; Personal Management; Drilling Information; Collaboration; Documents; Knowledge Management; Planning; Process Guides. Embedded process flowcharts for Business Process 3.0 Set-up New Well: "As Is" sub-process 3.3 Set-up Well Ownership (Company: APC, Version 1.0, 2/28/01); "As Is" sub-process 3.1 Set-up Drilling AFE (Company: UPR, Version 1.1, 3/5/01); "To Be" for 2001 sub-process 3.3 Set-up Well Ownership (Version 1.5, 7/18/01).]
    • Presentation Integration Servers enable the creation of Composite Applications by introducing a level of orchestration between the presentation layer of “legacy” and composed applications.
    • Business processes are packaged and reused by BPM tools, introducing business process layer composition.
    • Solutions are built by combining capabilities at every level of the software stack: data, process and presentation.
  • 78 Web 2.0 (1)
    One’s view of Web 2.0 is highly dependent on one’s background and interest, and can best be described by these three anchor points:
    – Technology and architecture: consists of the infrastructure of the Web and the concept of Web platforms. Examples of specific technologies include Ajax, Representational State Transfer (REST) and Really Simple Syndication (RSS). Technologists tend to gravitate toward this view.
    – Community and social: looks at the dynamics around social networks, communities and other personal content publish/share models, wikis, and other collaborative-content models. Most people tend to gravitate toward this view; hence, there is a lot of Web 2.0 focus on “the architecture of participation.”
    – Business and process: Web services-enabled business models and mashup/remix applications. (A mashup is a Web site or Web application that combines content from more than one source.) Examples include long-tail economics and advertising and subscription models such as software as a service (SaaS). Of course, business people tend to zero in on this angle.
  • 79 Web 2.0 (2)
    • What's Old Is New Again
    • Most of what people call Web 2.0 is not entirely new. Many of the concepts and technologies have existed for some time:
        • For example, RSS is essentially the same as the Resource Description Framework (RDF), a format popularized by Netscape during Web 1.0 and the hype around push technology.
        • Ajax is essentially JavaScript, dynamic HTML and asynchronous XML, all of which have existed for more than five years and have become well-known with the advent of high-profile implementations such as Google Maps.
        • Certainly, collaboration and advertising are not new.
        • Mashups bear a striking similarity to the SOA-derived term "composite applications."
    • What is new is how some of these are used, and in what combinations.