Data Integration & Data Quality
Your open source based BI solution!!
by
Table of contents
Introduction to Data Quality
What is Data Quality?
Why Data Quality?
Concepts
Data Quality advantages
Data Quality & Business Intelligence
BI Tenets
Data integration
Best practices
Open Source & Data Quality
Data Quality & Pentaho Data Integration (PDI)
PDI / ETLs / Integrity / Validation
Data Cleaner
Integration Data Cleaner and PDI
Initial Contact
Customer Successes
Private Sector
Public Sector
Introduction to Data Quality
http://optimizeyourdataquality.wordpress.com/
Introduction
What is Data Quality?
Non-standard definition
“The processes and technologies involved in
ensuring the conformance of data values to
business requirements and acceptance criteria”
Attributes sought in data:
Accuracy
Consistency
Integrity
Validity
http://unitar.org
Why Data Quality?
Concepts
Data governance
Improved and faster strategic decision making
Managing data quality: a critical issue
Data Quality tasks must be performed in the data integration stage
Data Quality benefits
Suitable customer segmentation → Customer satisfaction
Avoid processing unreliable data → Cost reduction
Trustworthy and valuable information
Improved business processes → Increased profits
Data Quality & Business Intelligence
What is Business Intelligence (BI)?
The ability to apprehend the interrelationships of presented facts in such a way as to guide action towards a desired goal
Data Quality & Business Intelligence
Visual tools for optimal and simple analysis
Robust and trustworthy data
Business Intelligence Tenets
Processes involved:
•Data integration
•Efficient usage of company information
Data Integration
Key for any BI project
ETL = Extract, Transform and Load
The data integration process involves moving data from different sources, transforming it, and storing it in unified databases: data warehouses / data marts.
Main tasks:
Extract data from multiple sources
Ensure clean, consistent data
Combine data
Load data into a DW
Source systems shown in the figure (image from http://blog.bootstraptoday.com): CRM, ERP, BPM, CMS
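The four main tasks listed above (extract, clean, combine, load) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how PDI or any other tool implements them: the two in-memory source lists, the `dim_customer` table, and all function names are hypothetical, and an in-memory SQLite database stands in for the data warehouse.

```python
import sqlite3

# Hypothetical source extracts; in practice these would come from CRM/ERP systems.
CRM_ROWS = [
    {"customer_id": "1", "email": " Ana@Example.com "},
    {"customer_id": "2", "email": None},
]
ERP_ROWS = [
    {"customer_id": "1", "total_orders": "3"},
    {"customer_id": "3", "total_orders": "5"},
]


def extract():
    """Extract: return raw rows from each (hypothetical) source."""
    return CRM_ROWS, ERP_ROWS


def clean(crm_rows):
    """Ensure clean, consistent data: trim and lowercase emails, drop rows without one."""
    cleaned = []
    for row in crm_rows:
        email = (row["email"] or "").strip().lower()
        if email:
            cleaned.append({"customer_id": int(row["customer_id"]), "email": email})
    return cleaned


def combine(crm_rows, erp_rows):
    """Combine: attach ERP order counts to CRM customers by customer_id."""
    orders = {int(r["customer_id"]): int(r["total_orders"]) for r in erp_rows}
    return [
        (r["customer_id"], r["email"], orders.get(r["customer_id"], 0))
        for r in crm_rows
    ]


def load(rows, conn):
    """Load: write the unified rows into a warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS dim_customer "
        "(customer_id INTEGER PRIMARY KEY, email TEXT, total_orders INTEGER)"
    )
    conn.executemany("INSERT INTO dim_customer VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl(conn):
    """Run the four tasks in order and return what ended up in the warehouse."""
    crm, erp = extract()
    load(combine(clean(crm), erp), conn)
    return conn.execute("SELECT * FROM dim_customer ORDER BY customer_id").fetchall()
```

Calling `run_etl(sqlite3.connect(":memory:"))` loads only the customer with a usable email, which shows the cleaning step acting as a filter before the load.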
CHALLENGES:
Heterogeneous data sources
Large data volumes
Improve operational efficiency
Data source synchronization
Scalability
Data integration and Data Quality, closely related concepts
Data Integration
The Data Quality process can be performed in different ways:
Manual → Ad-hoc queries, file searching, etc.
Automated → Included in the data integration process
Both are complementary, though:
Data Quality tasks as a part of the Data Integration process (ETL)
Data integration
Best ETL practices
Centralize procedures: ensure homogeneity and consistency of data from a
great variety of sources.
Avoid redundant calculations: if a value has already been computed,
do not repeat the operation. This improves performance and
avoids possible inconsistencies.
Establish quality-control checkpoints: verify the execution of the process at
key points and record tracking data for future audits.
Implement information reloading processes: useful for dealing with initial loading
issues or failures.
Use intermediate structures: eases process monitoring.
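A quality-control checkpoint of the kind described above can be sketched as a small helper that splits rows into accepted and rejected sets and records counts for later audits. This is a minimal sketch under assumptions: `checkpoint`, `audit_log`, and the sample rows are hypothetical names, and a real process would persist the audit trail rather than keep it in memory.

```python
from datetime import datetime, timezone

audit_log = []  # hypothetical in-memory audit trail; a real process would persist it


def checkpoint(stage, rows, is_valid):
    """Quality-control point: split rows and record counts for future audits."""
    accepted, rejected = [], []
    for row in rows:
        # Evaluate the rule once per row (no redundant calculation).
        (accepted if is_valid(row) else rejected).append(row)
    audit_log.append({
        "stage": stage,
        "at": datetime.now(timezone.utc).isoformat(),
        "rows_in": len(rows),
        "rows_out": len(accepted),
        "rejected": len(rejected),
    })
    return accepted, rejected


# Intermediate structures: each stage keeps its own accepted/rejected lists.
staged = [
    {"id": 1, "amount": 10.0},
    {"id": 2, "amount": -4.0},
    {"id": None, "amount": 7.5},
]
with_id, no_id = checkpoint("has_id", staged, lambda r: r["id"] is not None)
valid, negative = checkpoint("non_negative_amount", with_id, lambda r: r["amount"] >= 0)
```

Keeping the rejected rows in their own lists, instead of discarding them, is what makes a later reload or audit possible.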
Centralized and standardized processes
Checkpoints and logs
Intermediate structures
This allows:
Applying BI techniques to the data quality process
Analyzing and making the most of data quality results
Open Source & Data Quality
ETL tools and Data Quality
Main ETL Open Source solutions:
Pentaho Data Integration
Talend Open Studio
Data Quality Open Source solutions:
DataCleaner
Talend Data Quality
Google Refine
Data Quality & Pentaho Data Integration
Intuitive ETL tool based on jobs and transformations.
Freedom to decide where and how to perform tasks (profiling, cleansing,
integrity, validation), based on metadata.
Data Quality-oriented components available in PDI transformations.
Not a pure profiling tool; however, DataCleaner can be integrated.
Plug-in architecture that allows extending its functionality.
Component variety:
Cleansing
Scripting (SQL, JavaScript)
Validation
Statistics
Etc…
A well-defined ETL process divided into several phases is essential:
1. Preparation process
2. Data receipt
3. Data processing
4. Final load
5. Result reports
6. Activity control
This approach allows:
Standardizing processes in an organization
Scaling better as the number of sources grows
Centralizing control of process results
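The phased approach above can be illustrated with a small runner that executes named phases in order and records the outcome of each one. This is a sketch under assumptions, not a PDI job definition: the phase functions, the shared `context` dictionary, and `run_phases` are hypothetical, and the sixth phase (activity control) is represented by the list of outcomes the runner returns.

```python
def run_phases(phases, context):
    """Run named phases in order; record the outcome of each for activity control."""
    activity = []
    for name, phase in phases:
        try:
            phase(context)
            activity.append((name, "ok"))
        except Exception as exc:  # stop at the first failing phase
            activity.append((name, f"failed: {exc}"))
            break
    return activity


# Hypothetical phase implementations operating on a shared context dict.
def prepare(ctx):
    ctx["target"] = []


def receive(ctx):
    ctx["raw"] = ["10", "x", "32"]


def process(ctx):
    ctx["clean"] = [int(v) for v in ctx["raw"] if v.isdigit()]


def final_load(ctx):
    ctx["target"].extend(ctx["clean"])


def report(ctx):
    ctx["report"] = {"received": len(ctx["raw"]), "loaded": len(ctx["target"])}


PHASES = [
    ("preparation", prepare),
    ("data receipt", receive),
    ("data processing", process),
    ("final load", final_load),
    ("result reports", report),
]

context = {}
activity = run_phases(PHASES, context)
```

Because every source goes through the same list of phases, adding a new source means supplying new phase functions rather than a new process, and the `activity` list gives one central place to check results.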
Data Cleaner
Profiling tool recommended by Pentaho
Alternative tools:
Desktop tools
Web tools
PDI Plugin
Data Cleaner Desktop
Functionalities:
Data cleansing
Data dictionary definition
Search for patterns, duplicates, null checks, etc.
Monitoring
Complete execution stats
Etc.
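To make the profiling checks listed above concrete (patterns, duplicates, null checks), here is a minimal Python sketch of what such a profile computes for one column. It does not use or reproduce the Data Cleaner API; `pattern_of`, `profile_column`, and the sample values are hypothetical.

```python
import re
from collections import Counter


def pattern_of(value):
    """Reduce a string to a shape: letters become 'a', digits become '9'."""
    return re.sub(r"[0-9]", "9", re.sub(r"[A-Za-z]", "a", value))


def profile_column(values):
    """Return null count, duplicated values, and pattern frequencies for one column."""
    present = [v for v in values if v not in (None, "")]
    counts = Counter(present)
    return {
        "nulls": len(values) - len(present),
        "duplicates": sorted(v for v, n in counts.items() if n > 1),
        "patterns": Counter(pattern_of(v) for v in present),
    }


# Hypothetical phone column with a duplicate, two empty values, and mixed formats.
phones = ["555-0100", "555-0100", None, "5550100", "n/a", ""]
result = profile_column(phones)
```

The pattern frequencies make format inconsistencies visible at a glance: here the column mixes `999-9999`, `9999999`, and a placeholder shaped `a/a`.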
Data Cleaner Monitor (web)
Functionalities:
Centralized monitoring
Smart visualization
Schedule execution of Data Cleaner and PDI jobs
Create custom metrics
Etc.
Integration Data Cleaner / PDI
After installing the Data Cleaner plug-in for PDI, there are two ways to use it:
Option A: Profile data using a PDI step
Option B: Execute a Data Cleaner job
References
International Association for Information and Data Quality:
http://iaidq.org/
Pentaho Data Integration:
http://www.pentaho.com/explore/pentaho-data-integration/
Data Cleaner:
http://datacleaner.org/
About us
www.TodoBI.com
info@stratebi.com
www.stratebi.com
More information:
Tel.: 91.788.34.10
Madrid: Pº de la Castellana, 164, 1º
Barcelona: C/ Valencia, 63
Brazil: Av. Paulista, 37 4 andar

SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 

Data Quality Integration (ETL) Open Source

  • 1. Data Integration & Data Quality. Your open source based BI solution!! by
  • 2. Table of contents: Introduction to Data Quality; What is Data Quality?; Why Data Quality?; Concepts; Data Quality advantages; Data Quality & Business Intelligence; BI Tenets; Data integration; Best practices; Open Source & Data Quality; Data Quality & Pentaho Data Integration (PDI); PDI / ETLs / Integrity / Validation; Data Cleaner; Integration Data Cleaner and PDI.
  • 5. Introduction to Data Quality. http://optimizeyourdataquality.wordpress.com/
  • 6. Introduction: What is Data Quality? Non-standard definition: “The processes and technologies involved in ensuring the conformance of data values to business requirements and acceptance criteria.” Attributes sought in data: accuracy, consistency, integrity, validity. http://unitar.org
  • 9. Introduction: Data governance. Improved and faster strategic decision making. Managing data quality: a critical issue. Data Quality tasks must be performed in the data integration stage.
  • 10. Introduction: Data Quality benefits. Suitable customer segmentation → customer satisfaction. Avoid processing unreliable data → cost reduction. Trustworthy and valuable information. Improving business processes, increasing profits.
  • 12. Data Quality & Business Intelligence: What is Business Intelligence (BI)? The ability to apprehend the interrelationships of presented facts in such a way as to guide action towards a desired goal. Visual tools for optimal and simple analysis. Robust and trustworthy data. Business Intelligence tenets. Processes involved: data integration; efficient usage of company information.
  • 13. Data Quality & Business Intelligence: Data Integration. Key for any BI project. ETL = Extract, Transform and Load. The data integration process involves moving data from different sources, transforming it, and storing it in unified databases: data warehouse / data marts. Main tasks: extract data from multiple sources; ensure clean, consistent data; combine data; load data into a DW. (Diagram labels: CRM, ERP, BPM, CMS.) http://blog.bootstraptoday.com
  • 14. Data Quality & Business Intelligence: Data Integration. Challenges: heterogeneous data sources; large data volumes; improving operational efficiency; data source synchronization; scalability. Data integration and Data Quality are closely related concepts.
  • 15. Data Quality & Business Intelligence: Data integration. The Data Quality process can be performed in different ways: Manual → ad-hoc queries, file searching, etc.; Automated → included in the data integration process. The two are complementary, though. Data Quality tasks as a part of the Data Integration process (ETL).
  • 16. Data Quality & Business Intelligence: Best ETL practices. Centralize procedures: ensure homogeneity and consistency of data from a great variety of sources. Avoid redundant calculations: if a value has already been calculated, do not repeat the operation; this improves performance and avoids possible inconsistencies. Establish “quality control” points: they verify the execution of the process at key points and allow tracking data to be recorded for future audits. Implement information reloading processes: useful to avoid initial loading issues or failures. Use intermediate structures: they ease process monitoring.
  • 17. Data Quality & Business Intelligence: Best ETL practices. Centralized and standardized processes, checkpoints and logs, and intermediate structures allow: applying BI techniques to the data quality process; analyzing and making the most of data quality results.
  • 19. Open Source & Data Quality: ETL tools and Data Quality. Main ETL Open Source solutions: Pentaho Data Integration, Talend Open Studio. Data Quality Open Source solutions: DataCleaner, Talend Data Quality, Google Refine.
  • 20. Open Source & Data Quality: Data Quality & Pentaho Data Integration. Intuitive ETL tool based on jobs and transformations. Freedom to decide where and how to perform tasks (profiling, cleansing, integrity, validation), based on metadata. Data Quality oriented components are available in PDI transformations. Not a pure profiling tool; however, DataCleaner can be integrated. Plug-in architecture that allows its functionality to be expanded.
  • 21. Open Source & Data Quality: Data Quality & Pentaho Data Integration. Component variety: cleansing; scripting (SQL, JavaScript); validation; statistics; etc.
  • 22. Open Source & Data Quality: Data Quality & Pentaho Data Integration. A well-defined ETL process divided into several phases is essential: 1. Preparation process; 2. Data receipt; 3. Data processing; 4. Final load; 5. Result reports; 6. Activity control. This approach allows: standardizing processes in an organization; scaling better as the number of sources increases; centralized control of process results.
  • 23. Open Source & Data Quality: Data Cleaner. Profiling tool recommended by Pentaho. Alternative tools: desktop tools; web tools; PDI plugin.
  • 24. Open Source & Data Quality: Data Cleaner Desktop. Functionalities: data cleansing; data dictionary definition; search for patterns, duplicates, null checks, etc.; monitoring; complete execution stats; etc.
  • 25. Open Source & Data Quality: Data Cleaner Monitor (web). Functionalities: centralized monitoring; smart visualization; scheduled execution of Data Cleaner and PDI jobs; custom metrics; etc.
  • 26. Open Source & Data Quality: Integration Data Cleaner / PDI. After installing the PDI Data Cleaner plug-in, there are two usage options. Option A: profile data using a PDI step.
  • 27. Open Source & Data Quality: Integration Data Cleaner / PDI. After installing the PDI Data Cleaner plug-in, there are two usage options. Option B: execute a Data Cleaner job.
  • 28. References. International Association for Information and Data Quality: http://iaidq.org/; Pentaho Data Integration: http://www.pentaho.com/explore/pentaho-data-integration/; Data Cleaner: http://datacleaner.org/
  • 29. About us. www.TodoBI.com; info@stratebi.com; www.stratebi.com. More information: Phone: 91.788.34.10. Madrid: Pº de la Castellana, 164, 1º. Barcelona: C/ Valencia, 63. Brazil: Av. Paulista, 37 4 andar.
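
Slides 15 to 17 and 22 describe data quality tasks running inside the ETL process, with "quality control" points that record tracking data for audits and intermediate structures that ease monitoring. The deck shows no code, so the following is only a minimal Python sketch of that idea and not a PDI transformation. Every name in it (`Checkpoint`, `extract`, `clean`, `load`, `run_etl`, `SOURCES`) and the e-mail based rejection rule are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Checkpoint:
    """Hypothetical audit record written at a quality-control point."""
    stage: str
    rows_in: int
    rows_out: int
    rejected: list = field(default_factory=list)


def extract(sources):
    """Extract: merge rows from several sources, tagging each with its origin."""
    return [dict(row, source=name) for name, rows in sources.items() for row in rows]


def clean(rows, audit):
    """Transform: normalise e-mails, reject rows with no e-mail or a duplicate one."""
    seen, kept, rejected = set(), [], []
    for row in rows:
        email = (row.get("email") or "").strip().lower()
        if not email or email in seen:
            rejected.append(row)
            continue
        seen.add(email)
        kept.append(dict(row, email=email))
    audit.append(Checkpoint("clean", len(rows), len(kept), rejected))
    return kept


def load(rows, warehouse, audit):
    """Load: append the cleaned rows to the (in-memory) warehouse."""
    warehouse.extend(rows)
    audit.append(Checkpoint("load", len(rows), len(rows)))
    return warehouse


def run_etl(sources):
    """Run the phases in order and return the warehouse plus the audit trail."""
    audit, warehouse = [], []
    staged = extract(sources)  # intermediate structure, inspectable on its own
    audit.append(Checkpoint("extract", len(staged), len(staged)))
    load(clean(staged, audit), warehouse, audit)
    return warehouse, audit


# Made-up sample data standing in for two source systems.
SOURCES = {
    "crm": [{"email": "Ana@Example.com "}, {"email": None}],
    "erp": [{"email": "ana@example.com"}, {"email": "luis@example.com"}],
}

if __name__ == "__main__":
    warehouse, audit = run_etl(SOURCES)
    for checkpoint in audit:
        print(checkpoint.stage, checkpoint.rows_in, checkpoint.rows_out)
```

Keeping rejected rows in the audit trail, rather than silently dropping them, is one way to support the reloading and auditing practices listed on slide 16.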

Editor's Notes

  1. Data Profiling: the process of examining the data that exists in the source systems and collecting statistics and information about it. Data Cleansing: the process of detecting and correcting corrupt, inconsistent, or erroneous data. Data Integrity: the process of analyzing the consistency of the data and the relationships between the different data sets. Data Validation: the process of applying validation rules to the data based on data dictionaries and/or business rules. Master Data Management: the set of processes, policies, standards, and tools used to manage an organization's master data (usually non-transactional information). Data Auditing: the process of managing how well the data fits the purposes defined by the organization. The necessary policies must be established. Act + monitor. Data Governance: a concept that encompasses all of the above processes and allows an organization to have reliable information.
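
To make three of these definitions concrete, here is a small Python sketch of profiling, cleansing, and validation on one column. It is an illustration under stated assumptions, not how PDI or DataCleaner implement these tasks: the function names (`profile`, `cleanse`, `validate`), the sample rows, and the `COUNTRY_DICTIONARY` data dictionary are all hypothetical.

```python
def profile(rows, column):
    """Profiling: collect simple statistics about one column."""
    values = [row.get(column) for row in rows]
    present = [v for v in values if v not in (None, "")]
    return {
        "rows": len(values),
        "nulls": len(values) - len(present),
        "distinct": len(set(present)),
    }


def cleanse(rows, column):
    """Cleansing: trim and upper-case text values; empty strings become None."""
    cleaned = []
    for row in rows:
        value = row.get(column)
        if isinstance(value, str):
            value = value.strip().upper() or None
        cleaned.append(dict(row, **{column: value}))
    return cleaned


def validate(rows, column, dictionary):
    """Validation: split rows by whether the value appears in a data dictionary."""
    valid = [row for row in rows if row.get(column) in dictionary]
    invalid = [row for row in rows if row.get(column) not in dictionary]
    return valid, invalid


# Made-up sample data and data dictionary.
ROWS = [
    {"id": 1, "country": " es "},
    {"id": 2, "country": "ES"},
    {"id": 3, "country": ""},
    {"id": 4, "country": "XX"},
    {"id": 5, "country": None},
]
COUNTRY_DICTIONARY = {"ES", "BR"}

if __name__ == "__main__":
    print(profile(ROWS, "country"))
    cleaned = cleanse(ROWS, "country")
    print(profile(cleaned, "country"))
    valid, invalid = validate(cleaned, "country", COUNTRY_DICTIONARY)
    print(len(valid), "valid,", len(invalid), "invalid")
```

Profiling before and after cleansing shows the effect of the cleansing rule: the two spellings of the same country collapse into one distinct value, while the invalid code is left for the validation step to flag.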