Data Quality Integration (ETL) Open Source
Published in: Technology
  • Data Profiling: the process of examining the data that exists in the source systems and collecting statistics and information about it. Data Cleansing: the process of detecting and correcting corrupt, inconsistent or erroneous data. Data Integrity: the process of analyzing the consistency of the data and the relationships between different data sets. Data Validation: the process of applying validation rules to the data, based on data dictionaries and/or business rules. Master Data Management: the set of processes, policies, standards and tools used to manage an organization's master data (usually non-transactional information). Data Auditing: the process of managing how well the data fits the purposes defined by the organization; the necessary policies must be established (act and monitor). Data Governance: the concept that encompasses all of the processes above and allows an organization to have reliable information.
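The Data Validation definition above (rules derived from a data dictionary and/or business rules) can be illustrated with a minimal Python sketch. The field names, the allowed country codes and the `validate` helper are hypothetical, chosen only for illustration; they do not come from the slides.

```python
import re

# Hypothetical data dictionary: field name -> rule that a valid value must satisfy.
RULES = {
    "customer_id": lambda v: isinstance(v, int) and v > 0,
    "email": lambda v: isinstance(v, str)
    and re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", v) is not None,
    "country": lambda v: v in {"ES", "BR", "PT"},
}


def validate(record):
    """Return the sorted names of the fields that break their rule.

    A missing field is read as None and therefore fails its rule.
    """
    return sorted(name for name, rule in RULES.items() if not rule(record.get(name)))


good = {"customer_id": 1, "email": "ana@example.com", "country": "ES"}
bad = {"customer_id": -5, "email": "not-an-email", "country": "ES"}
print(validate(good))
print(validate(bad))
```

Keeping the rules in one dictionary means the same definitions can be reused at every point of the load where validation is needed.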
  • Transcript

    • 1. Data Integration & Data Quality. Your open source based BI solution!
    • 2. Table of contents. Introduction to Data Quality: What is Data Quality?, Why Data Quality?, Concepts, Data Quality advantages. Data Quality & Business Intelligence: BI tenets, Data integration, Best practices. Open Source & Data Quality: Data Quality & Pentaho Data Integration (PDI), PDI / ETLs / Integrity / Validation, Data Cleaner. Integration: Data Cleaner and PDI.
    • 3. Initial Contact
    • 4. Customer Successes: Private Sector, Public Sector
    • 5. Introduction to Data Quality. http://optimizeyourdataquality.wordpress.com/
    • 6. Introduction: What is Data Quality? Non-standard definition: “The processes and technologies involved in ensuring the conformance of data values to business requirements and acceptance criteria.” Attributes sought in the data: accuracy, consistency, integrity, validity. http://unitar.org
    • 7. Introduction: Why Data Quality?
    • 8. Introduction: Concepts
    • 9. Introduction: Data governance. Improved and faster strategic decision making. Managing data quality is a critical issue. Data Quality tasks must be performed in the data integration stage.
    • 10. Introduction: Data Quality benefits. Suitable customer segmentation → customer satisfaction. Avoiding the processing of unreliable data → cost reduction. Trustworthy and valuable information. Improved business processes. Increased profits.
    • 11. Data Quality & Business Intelligence
    • 12. Data Quality & Business Intelligence: Business Intelligence tenets. What is Business Intelligence (BI)? “The ability to apprehend the interrelationships of presented facts in such a way as to guide action towards a desired goal.” Visual tools for optimal and simple analysis. Robust and trustworthy data. Processes involved: data integration; efficient usage of company information.
    • 13. Data Quality & Business Intelligence: Data Integration. Key for any BI project. ETL = Extract, Transform and Load. The data integration process involves moving data from different sources, transforming it and storing it in unified databases: data warehouse / data marts. Main tasks: extract data from multiple sources; ensure clean, consistent data; combine data; load data into a DW. Sources shown: CRM, ERP, BPM, CMS. http://blog.bootstraptoday.com
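The extract, transform and load tasks listed on slide 13 can be sketched in a few lines of Python using only the standard library. This is an illustrative sketch, not how PDI implements them: the inline CSV standing in for a CRM export, the `dim_customer` table name and the three helper functions are all assumptions.

```python
import csv
import io
import sqlite3

# Hypothetical CRM export; a real job would read a file, a database or an API.
CRM_CSV = "id,name,city\n1, Ana ,madrid\n2,Luis,\n2,Luis,\n"


def extract(text):
    """Extract: parse the CSV text into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: trim whitespace, normalize city casing, drop exact duplicates."""
    seen, clean = set(), []
    for row in rows:
        # An empty city becomes None so it is stored as NULL.
        city = row["city"].strip().title() or None
        rec = (int(row["id"]), row["name"].strip(), city)
        if rec not in seen:
            seen.add(rec)
            clean.append(rec)
    return clean


def load(rows, conn):
    """Load: write the cleaned rows into a warehouse-style table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS dim_customer (id INTEGER, name TEXT, city TEXT)"
    )
    conn.executemany("INSERT INTO dim_customer VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")  # in-memory stand-in for the data warehouse
load(transform(extract(CRM_CSV)), conn)
print(conn.execute("SELECT id, name, city FROM dim_customer ORDER BY id").fetchall())
```

Here the cleaning happens in the transform step, before the load, which matches the deck's point that data quality tasks belong in the data integration stage.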
    • 14. Data Quality & Business Intelligence: Data Integration. Data integration and Data Quality are closely related concepts. Challenges: heterogeneous data sources; large data volumes; improving operational efficiency; data source synchronization; scalability.
    • 15. Data Quality & Business Intelligence: Data integration. Data Quality tasks as a part of the Data Integration process (ETL). The Data Quality process can be performed in different ways. Manual → ad hoc queries, file searching, etc. Automated → included in the data integration process. The two approaches are complementary, though.
    • 16. Data Quality & Business Intelligence: Best ETL practices. Centralize procedures: ensure homogeneity and consistency of data from a great variety of sources. Avoid redundant calculations: if a value has already been computed, do not repeat the operation; this improves performance and avoids possible inconsistencies. Establish “quality control” points: verify the execution of the process at key points and record tracking data for future audits. Implement information reloading processes: useful for recovering from initial loading issues or failures. Use intermediate structures: they ease monitoring of the process.
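The “quality control” points on slide 16 can be sketched as a small function called between ETL stages. This is one possible interpretation, not the authors' design: the `checkpoint` function, the `CheckpointError` exception and the 10% null threshold are hypothetical.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("etl.checkpoint")


class CheckpointError(Exception):
    """Raised when a batch fails a quality-control point."""


def checkpoint(name, rows, required, max_null_ratio=0.1):
    """Log row and null counts for later audits and stop the run on bad batches.

    `rows` is a list of dicts and `required` lists the fields that must be filled.
    The 10% default threshold is an arbitrary, illustrative value.
    """
    total = len(rows)
    nulls = {f: sum(1 for r in rows if r.get(f) in (None, "")) for f in required}
    # The log line is the audit trail: which checkpoint ran and what it saw.
    log.info("checkpoint=%s rows=%d nulls=%s", name, total, nulls)
    failed = [f for f, n in nulls.items() if total and n / total > max_null_ratio]
    if failed:
        raise CheckpointError(f"{name}: too many nulls in {failed}")
    return nulls


batch = [{"id": 1, "email": "a@example.com"}, {"id": 2, "email": "b@example.com"}]
checkpoint("after_extract", batch, ["id", "email"])
```

Raising an exception stops the load before unreliable data reaches the warehouse, while the logged counts remain available for audits.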
    • 17. Data Quality & Business Intelligence: Best ETL practices. Centralized and standardized processes, checkpoints and records, and intermediate structures. These allow: applying BI techniques to the data quality process; analyzing and making the most of data quality results.
    • 18. Open Source & Data Quality
    • 19. Open Source & Data Quality: ETL tools and Data Quality. Main ETL open source solutions: Pentaho Data Integration, Talend Open Studio. Data Quality open source solutions: DataCleaner, Talend Data Quality, Google Refine.
    • 20. Open Source & Data Quality: Data Quality & Pentaho Data Integration. Intuitive ETL tool based on jobs and transformations. Freedom to decide where and how to perform tasks (profiling, cleansing, integrity, validation), based on metadata. Data Quality oriented components are available in PDI transformations. Not a pure profiling tool; however, DataCleaner can be integrated. Plug-in architecture that allows its functionality to be extended.
    • 21. Open Source & Data Quality: Data Quality & Pentaho Data Integration. Component variety: cleansing, scripting (SQL, JavaScript), validation, statistics, etc.
    • 22. Open Source & Data Quality: Data Quality & Pentaho Data Integration. A well-structured ETL divided into several phases is essential: 1. Preparation process 2. Data receipt 3. Data processing 4. Final load 5. Result reports 6. Activity control. This approach allows: standardizing processes in an organization; scaling better as the number of sources increases; centralized control of process results.
    • 23. Open Source & Data Quality: Data Cleaner. Profiling tool recommended by Pentaho. Alternatives: desktop tools, web tools, PDI plugin.
    • 24. Open Source & Data Quality: Data Cleaner Desktop. Functionalities: data cleansing; data dictionary definition; search for patterns, duplicates, null checks, etc.; monitoring; complete execution stats; etc.
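The kind of checks listed on slide 24 (patterns, duplicates, nulls) can be approximated for a single column in plain Python. This sketch does not use or reproduce DataCleaner's API; `pattern_of` and `profile` are hypothetical helpers, and treating empty strings as nulls is an assumption.

```python
import re
from collections import Counter


def pattern_of(value):
    """Reduce a string to its shape: ASCII letters become 'a', digits become '9'."""
    return re.sub(r"[0-9]", "9", re.sub(r"[A-Za-z]", "a", value))


def profile(values):
    """Profile one column: row count, null count, duplicated values, value shapes."""
    present = [v for v in values if v not in (None, "")]
    counts = Counter(present)
    return {
        "rows": len(values),
        "nulls": len(values) - len(present),
        "duplicates": sorted(v for v, n in counts.items() if n > 1),
        "patterns": Counter(pattern_of(v) for v in present),
    }


phones = ["91-788", "91-788", None, "ab-123", ""]
report = profile(phones)
print(report["nulls"], report["duplicates"], dict(report["patterns"]))
```

A rare pattern in the output, such as letters in a column that should hold only digits, is often the quickest hint that a column needs cleansing.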
    • 25. Open Source & Data Quality: Data Cleaner Monitor (web). Functionalities: centralized monitoring; smart visualization; scheduled execution of Data Cleaner and PDI jobs; custom metrics; etc.
    • 26. Open Source & Data Quality: Integration Data Cleaner / PDI. After installing the PDI Data Cleaner plug-in, there are two usage possibilities. Option A: profile data using a PDI step.
    • 27. Open Source & Data Quality: Integration Data Cleaner / PDI. After installing the PDI Data Cleaner plug-in, there are two usage possibilities. Option B: execute a Data Cleaner job.
    • 28. References. International Association for Information and Data Quality: http://iaidq.org/ Pentaho Data Integration: http://www.pentaho.com/explore/pentaho-data-integration/ Data Cleaner: http://datacleaner.org/
    • 29. About us. www.TodoBI.com, info@stratebi.com, www.stratebi.com. More information: Tel. 91.788.34.10. Madrid: Pº de la Castellana, 164, 1º. Barcelona: C/ Valencia, 63. Brasil: Av. Paulista, 37 4 andar.
