Data Integration framework services = Data Integration toolset and services
It is designed to help data integration project development teams and support teams. It:
- Provides common, standard, reusable components and services to perform data integration tasks (file loader, change data capture, rejects recycling, publication/subscription, notifications, ...).
- Provides a metadata-driven, highly configurable development environment.
- Collects and stores operational metadata for all components and processes involved in data integration projects.
- Provides a single web entry point (reporting tool) to monitor end-to-end project activity (daily monitoring, performance analysis, capacity planning, impact analysis, ...).
- Lets the development team focus on business rules development (the project core).
LinkedIn 4th Round Table, 2012-06-01
Réseau des Professionnels de la Business Intelligence en Suisse Romande (network of business intelligence professionals in French-speaking Switzerland)
4th Round Table
Lausanne, 1 June 2012
Dario Mangano, Head of Knowledge Management, Nestlé Nespresso S.A. HQ
AGENDA
- 14h00 Welcome
- 14h30 The LinkedIn group
- 14h45 Load metadata
- 15h30 Coffee break
- 15h45 Loading by time zone
- 16h30 Future round tables
- 16h45 Coffee
- 17h30 End
AGENDA: The LinkedIn group
Dario Mangano, Head of Knowledge Management, Nestlé Nespresso S.A.
Data Integration Framework Overview
Data Management Services
Data Integration Best Practices oriented
Data integration is a family of techniques, most commonly including ETL (extract, transform, and load), but also many related techniques that are unavoidable when dealing with data integration: metadata, change data capture, file loading, publication, data quality, ... Moreover, it always involves different technologies: DB server, DB scripting, shell scripting, ...
All these techniques and technologies require development and support for a wide range of interfaces, using solutions that can be hand-coded, based on a vendor's tool, or a mix of both.
With such complexity in data integration systems, developing and supporting these solutions is becoming very challenging.
Having best practices and standards ensures that all systems are developed in a way that makes them much easier to support, and also safer and more scalable to meet future needs and data volumes.
The Data Integration Framework is a metadata-driven development environment that provides turnkey solutions for all these tasks around data integration:
- Metadata
- Change Data Capture
- File loading
- Data quality
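As a rough illustration of the change data capture task named on this slide, the sketch below compares two snapshots of a source table by business key and row hash. The slides do not say how the framework implements CDC; snapshot comparison is only one common approach, and the function names, column names, and sample rows are all hypothetical.

```python
import hashlib


def row_hash(row, columns):
    """Return a stable digest of the tracked columns of one row."""
    payload = "\x1f".join(str(row[c]) for c in columns)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def capture_changes(previous, current, key, columns):
    """Compare two snapshots (lists of dicts) and classify rows by business key."""
    old = {r[key]: row_hash(r, columns) for r in previous}
    new = {r[key]: row_hash(r, columns) for r in current}
    return {
        "insert": sorted(k for k in new if k not in old),
        "update": sorted(k for k in new if k in old and new[k] != old[k]),
        "delete": sorted(k for k in old if k not in new),
    }


# Hypothetical snapshots of a customer table (yesterday vs. today).
previous = [
    {"id": 1, "name": "Alice", "city": "Lausanne"},
    {"id": 2, "name": "Bob", "city": "Geneva"},
    {"id": 3, "name": "Carol", "city": "Vevey"},
]
current = [
    {"id": 1, "name": "Alice", "city": "Lausanne"},
    {"id": 2, "name": "Bob", "city": "Nyon"},
    {"id": 4, "name": "Dave", "city": "Morges"},
]

changes = capture_changes(previous, current, key="id", columns=["name", "city"])
print(changes)  # {'insert': [4], 'update': [2], 'delete': [3]}
```

Hashing only the tracked columns keeps the comparison cheap and lets a project decide, through configuration, which columns count as a change.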
Metadata Management oriented
Metadata is a key feature in data integration and data warehousing. It is the only way to get answers to the following questions:
- Which column did this data come from?
- When was this data populated in the system?
- How is this result on my report calculated?
- Is my report up to date?
- Is my system scalable?
Having these answers increases trust in the data, enables proactive monitoring of data integration processes, ensures that the data is loaded in an effective manner and, in the end, prevents our system from losing value over time by reducing and absorbing the costs of understanding, maintenance and repair.
The Data Integration Framework provides a metadata management solution without any development effort required from the project team:
- Collecting operational metadata in real time
- Capturing business and technical metadata related to data integration processes
- Integrating all these metadata in a metadata repository
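To make the idea of collecting operational metadata concrete, here is a minimal sketch of a wrapper that records start time, end time, status, and row count for each process run. It uses an in-memory SQLite database as a stand-in for the metadata repository; the table layout, the `tracked_run` helper, and the process names are assumptions made for illustration, not the framework's actual data model.

```python
import sqlite3
import time
from contextlib import contextmanager


def open_repository():
    """Create an in-memory stand-in for the operational metadata repository."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE process_run ("
        " run_id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " process_name TEXT, started_at REAL, ended_at REAL,"
        " status TEXT, rows_processed INTEGER, message TEXT)"
    )
    return conn


@contextmanager
def tracked_run(conn, process_name):
    """Record one process run; the caller fills in stats['rows_processed']."""
    cur = conn.execute(
        "INSERT INTO process_run (process_name, started_at, status)"
        " VALUES (?, ?, 'RUNNING')",
        (process_name, time.time()),
    )
    run_id = cur.lastrowid
    stats = {"rows_processed": 0}
    update = (
        "UPDATE process_run SET ended_at = ?, status = ?,"
        " rows_processed = ?, message = ? WHERE run_id = ?"
    )
    try:
        yield stats
    except Exception as exc:
        # Keep the failure in the repository, then let the error propagate.
        conn.execute(update, (time.time(), "FAILED", stats["rows_processed"], str(exc), run_id))
        conn.commit()
        raise
    else:
        conn.execute(update, (time.time(), "SUCCEEDED", stats["rows_processed"], None, run_id))
        conn.commit()


repo = open_repository()

with tracked_run(repo, "load_customers") as stats:
    stats["rows_processed"] = 1250  # the real work would happen here

try:
    with tracked_run(repo, "load_orders") as stats:
        raise RuntimeError("source file missing")
except RuntimeError:
    pass

runs = repo.execute(
    "SELECT process_name, status, rows_processed, message"
    " FROM process_run ORDER BY run_id"
).fetchall()
for run in runs:
    print(run)
```

Because the wrapper does the bookkeeping, project code only has to report its row count, which matches the slide's claim that no development effort is required from the project team.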
Data Integration Framework (DIF): Data Management Framework
[Diagram: DIF components. Recoverable labels: Reporting services, Metadata, Monitoring, Data publication, Change Data Capture, File loading, Archiving, Notification, Graphical User Interface, Business Rules, Source systems, Downstream systems. Support Teams monitor; Development Teams develop and use.]
(see slide notes for comments)
Design methods and tools to perform data integration services
[Diagram: reference methods, DIF architecture components, and standards. Recoverable labels:
- ETL Development Methodology: Event-Triggered ETL, Batch ETL; Wrappers, File Loader
- Standard Integration Methods by Subject Area: Parsing, Matching & Merging, Consolidation, ...; Change Data Capture, Reject Management
- Data Quality Control Methods: Quality Auditing; DQ module
- Pub/Sub Pattern: Event Pub/Sub, Bulk Pub/Sub; Publisher Module
- Metadata Management (Operational & Technical): Op. & Technical Metadata Collection; Metadata data model, Metadata collection daemon]
DIF: Back-end modular architecture
- Metadata Repository
- Scheduler Wrapper module
- FileLoader module
- Publisher module
- Archiver module
- Purge module
- Data Quality module
- Notification module
- HP OV metrics collection services
- PowerCenter metrics collection services
- Re-usable includes (logging routines, mail-sending routines, ...)
- Project-specific code to apply business rules and requirements (PowerCenter, shell scripts, SQL scripts, stored procedures, ...)
Legend: DIF minimal installation; DIF available modules/services/re-usable components
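The following sketch shows what a metadata-driven file loader with reject handling could look like: the file layout lives in a configuration structure rather than in code, and rows that fail validation are set aside with a reason so they can be recycled later. The `LOADER_CONFIG` format, the `load_file` function, and the sample file are hypothetical; the slides do not describe the FileLoader module's real configuration.

```python
import csv
import io

# Hypothetical metadata describing one inbound file layout.
LOADER_CONFIG = {
    "delimiter": ";",
    "columns": [
        {"name": "customer_id", "type": int, "required": True},
        {"name": "country", "type": str, "required": True},
        {"name": "amount", "type": float, "required": False},
    ],
}


def load_file(handle, config):
    """Parse a delimited file according to config; return (accepted, rejected)."""
    accepted, rejected = [], []
    reader = csv.DictReader(handle, delimiter=config["delimiter"])
    # Line 1 is the header, so data rows start at line 2.
    for line_no, raw in enumerate(reader, start=2):
        try:
            row = {}
            for col in config["columns"]:
                value = (raw.get(col["name"]) or "").strip()
                if not value:
                    if col["required"]:
                        raise ValueError(f"missing {col['name']}")
                    row[col["name"]] = None
                else:
                    row[col["name"]] = col["type"](value)
            accepted.append(row)
        except ValueError as exc:
            # Keep the raw row and the reason so the reject can be recycled.
            rejected.append({"line": line_no, "reason": str(exc), "raw": dict(raw)})
    return accepted, rejected


sample = io.StringIO(
    "customer_id;country;amount\n"
    "1;CH;10.5\n"
    "abc;FR;3\n"
    "3;;7\n"
    "4;IT;\n"
)
accepted, rejected = load_file(sample, LOADER_CONFIG)
print(accepted)
print([(r["line"], r["reason"]) for r in rejected])
```

Supporting a new file then means adding a configuration entry rather than writing a new loader, which is the sense in which such a component is re-usable across projects.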
Potential for Global Monitoring
[Diagram: application environments (APP1, APP2, APP3) use a set of reusable components and a shared metadata repository. Recoverable labels:
- Users: Project Team, Support Team, Middleware Team
- Reporting: Web Server, Reports, Data access layer, Shared Metadata Repository
- Reusable components: Autosys wrapper, File loader, CDC, Rejects recycling, Archiver, Publisher, ...
- Publication, Extract, External system, Engines
- Retrieve key metadata from infrastructure and middleware components: application process, Unix servers, HP OpenView, PowerCenter, Scheduler, Oracle DBs]
Reporting services (Cognos/BO reports) - Ex1
Using the reporting layer, we have access to the integrated metadata repository for all kinds of reports or ad hoc queries:
- monitoring report
- capacity planning
- impact analysis
Example of a monitoring report with embedded navigation capabilities (screenshot callouts: drill-down button; open log file for more details).
Reporting services (Cognos/BO reports) - Ex2
The added value of having integrated metadata is that a report can show correlated metadata in the same view.
For example, this Gantt view execution report shows whether there is a correlation between a given interface execution and the server workload.
This is very useful for understanding performance issues, but also for capacity planning purposes.
(Screenshot callouts: drill through to interface step details; drill-down report; Gantt view for this interface.)
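The correlation described on this slide can be approximated in a few lines once run intervals and workload samples sit in the same repository. The sketch below averages the CPU samples that fall inside each interface run; the data, the time unit (minutes since midnight), and the `workload_during` helper are invented for illustration and do not come from the reports shown.

```python
from statistics import mean


def workload_during(run, samples):
    """Average the CPU samples whose timestamp falls inside the run interval."""
    inside = [cpu for ts, cpu in samples if run["start"] <= ts <= run["end"]]
    return mean(inside) if inside else None


# Hypothetical interface runs, in minutes since midnight.
runs = [
    {"interface": "sales_load", "start": 10, "end": 30},
    {"interface": "stock_load", "start": 40, "end": 50},
]

# Hypothetical server workload samples: (minute, CPU percent).
samples = [(0, 20), (10, 80), (20, 90), (30, 70), (40, 25), (50, 35), (60, 20)]

report = {run["interface"]: workload_during(run, samples) for run in runs}
for interface, cpu in report.items():
    print(f"{interface}: average CPU {cpu}%")
```

With this sample data, the first interface coincides with a much higher server load than the second, which is the kind of signal the Gantt view is meant to surface visually.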
Reporting services (Cognos/BO reports) - Ex3
Another example of the details we can get from the reports.
Using the publication module, the metadata tells you which XML files were produced, how many rows were extracted from the database, ... and also to which downstream applications the package was pushed.
Data Integration Application Architecture
[Diagram. Recoverable labels:
- Reporting services (Cognos), used by Level 2 Support, DEV Team (L3 support), and Business Users
- Operational Metadata Repository: configuration, exception metadata, metrics, business metadata, logs, objects
- Data Integration Framework Services (modules, monitoring services): data movement, change data capture, rejection recycle, exception management, data quality, notification, archiving, auditing, workflow, publication
- Data flow: flat file and source system, FileLoader, staging layer, integration layer, DWH layer, publication layer, Publisher
- Each layer is built from ETL tasks, scripts, Oracle stored procedures, and SQL scripts over Oracle DB tables, run as scheduler steps]
Monitoring Implementation
Service-Level View of the Data Integration Application
[Diagram labels: DB servers, PowerCenter, application & web servers, scheduler, DIF modules and services, reports & dashboards, metadata repository]
- 1st Level Support: the application/service-level view enables the Service Desk to rapidly intercept fatal alerts and communicate service outages to affected users.
- 2nd Level Support: identify the root cause of an issue and take effective action. Data is available for analysis to anticipate issues and bottlenecks.
AGENDA: Loading by time zone
Cedric Zbinden, BI architect, Nestlé Nespresso S.A.
Dario Mangano, Head of Knowledge Management, Nestlé Nespresso S.A.
Loading by time zone
Question: How do we manage the consistency and integrity of the DWH data when the data is loaded by geographical zone and by time zone?
Loading by time zone
Discussion - proposals?
Loading by time zone
Summary of the discussions:
- HQ and the markets do not have the same data refresh needs: check whether HQ can be satisfied with D-2 (data two days old)?
- Load into different schemas so that queries never hit the schema that is currently being loaded, then do a drop partition at the end?
- Use Cognos master cubes
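The second proposal, keeping readers away from data that is still being loaded, can be sketched as a load into a staging table followed by a quick swap. This is only an illustration under stated assumptions: SQLite stands in for the warehouse database, a table rename stands in for the schema or partition switch raised in the discussion, and the table names and `publish_refresh` function are hypothetical.

```python
import sqlite3


def publish_refresh(conn, fresh_rows):
    """Load fresh rows into a staging table, then swap it in for readers."""
    # Slow part: readers keep querying the untouched 'sales' table meanwhile.
    conn.execute("DROP TABLE IF EXISTS sales_staging")
    conn.execute("CREATE TABLE sales_staging (market TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales_staging VALUES (?, ?)", fresh_rows)

    # Fast part: swap the tables inside one explicit transaction.
    conn.execute("BEGIN")
    try:
        conn.execute("ALTER TABLE sales RENAME TO sales_old")
        conn.execute("ALTER TABLE sales_staging RENAME TO sales")
        conn.execute("DROP TABLE sales_old")
        conn.execute("COMMIT")
    except sqlite3.Error:
        conn.execute("ROLLBACK")
        raise


# isolation_level=None lets the sketch control transactions explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE sales (market TEXT, amount REAL)")
conn.execute("INSERT INTO sales VALUES ('CH', 100.0)")

publish_refresh(conn, [("CH", 120.0), ("JP", 80.0)])

rows = conn.execute("SELECT market, amount FROM sales ORDER BY market").fetchall()
print(rows)  # [('CH', 120.0), ('JP', 80.0)]
```

The point of the pattern is that the long-running load and the reader-visible change are separated: queries see either the old data or the new data, never a half-loaded table.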