EXTRACT, TRANSFORM, LOAD (ETL)
ETL—meaning extract, transform, load—is a data integration process that
combines, cleans and organizes data from multiple sources into a single,
consistent data set for storage in a data warehouse, data lake or other
target system.
ETL pipelines are often used by organizations to:
• Extract data from legacy systems
• Cleanse the data to improve data quality and establish consistency
• Load data into a target database
Functions of ETL
• Reporting & Dashboards: Share key performance indicators (KPIs)
with decision makers.
• Forecasting: Project future sales, demand, and maintenance
requirements.
• Visualization: Provide a visual way to interact with data and gain
new insights.
Architecture
The ETL function lies at the core of business intelligence systems.
With ETL, enterprises can obtain historical, current, and predictive
views of real business data. Let’s look at some ETL features that are
necessary for business intelligence.
How does ETL work?
ETL systems are designed to accomplish three complex database
functions: extract, transform, and load.
1. Extraction
The extraction phase maps the data from different sources into a
unified format before processing. While extracting data, ETL systems
take care of the following:
• Removing redundant (duplicate) or fragmented data
• Removing spam or unwanted data
• Reconciling records with source data
• Checking data types and key attributes
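The extraction checks above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two sources (`CRM_CSV`, `SHOP_ROWS`), their field names, and the unified `id`/`name`/`age` layout are hypothetical and do not come from any particular ETL tool.

```python
import csv
import io

# Hypothetical sources that describe the same entity with different field names.
CRM_CSV = "id,full_name,age\n1,Ann Lee,34\n2,Bo Chen,41\n2,Bo Chen,41\n"
SHOP_ROWS = [{"customer_id": "3", "name": "Cy Diaz", "age": "n/a"}]


def extract_crm(text):
    """Read CSV rows and map them to the unified (id, name, age) layout."""
    for row in csv.DictReader(io.StringIO(text)):
        yield {"id": row["id"], "name": row["full_name"], "age": row["age"]}


def extract_shop(rows):
    """Map the shop export to the same unified layout."""
    for row in rows:
        yield {"id": row["customer_id"], "name": row["name"], "age": row["age"]}


def extract_all():
    """Deduplicate, type-check, and count records for reconciliation."""
    seen_ids = set()
    accepted, rejected = [], []
    duplicates = 0
    source_count = 0
    for record in list(extract_crm(CRM_CSV)) + list(extract_shop(SHOP_ROWS)):
        source_count += 1
        if record["id"] in seen_ids:
            duplicates += 1  # redundant record: skip it
            continue
        seen_ids.add(record["id"])
        # Data type check: id and age must both be whole numbers.
        if not (record["id"].isdigit() and record["age"].isdigit()):
            rejected.append(record)
            continue
        accepted.append(
            {"id": int(record["id"]), "name": record["name"], "age": int(record["age"])}
        )
    return accepted, rejected, duplicates, source_count


accepted, rejected, duplicates, source_count = extract_all()
# Reconciliation: every source record is accepted, rejected, or a duplicate.
assert source_count == len(accepted) + len(rejected) + duplicates
```

Keeping rejected records in their own list, rather than dropping them silently, is what makes the final reconciliation against the source counts possible.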
2. Transformation
This stage involves applying algorithms and modifying data according
to business-specific rules. Common operations in the transformation
stage are computation, concatenation, filtering, and string operations
such as currency, time, and data format conversions. It also applies
the following checks:
• Data cleaning, such as replacing null values with 0
• Threshold validation, such as an age that cannot be more than two
digits
• Data standardization according to the rules and a lookup table
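A minimal sketch of these transformation rules in Python follows. The record fields (`first_name`, `last_name`, `age`, `amount`, `country`) and the `COUNTRY_LOOKUP` table are assumptions made up for the example; real pipelines take their rules from the business.

```python
# Hypothetical lookup table used for standardization.
COUNTRY_LOOKUP = {"usa": "US", "united states": "US", "uk": "GB", "u.k.": "GB"}


def transform(record):
    """Apply cleaning, validation, standardization, and derived fields."""
    out = dict(record)

    # Data cleaning: replace a null amount with 0.
    if out.get("amount") is None:
        out["amount"] = 0

    # Threshold validation: age must fit in two digits.
    age = out.get("age")
    if age is None or not 0 <= age <= 99:
        raise ValueError(f"age out of range: {age!r}")

    # Standardization: map free-text country names through the lookup table.
    raw_country = out["country"].strip().lower()
    out["country"] = COUNTRY_LOOKUP.get(raw_country, "UNKNOWN")

    # Concatenation: build one full-name field from two source fields.
    out["full_name"] = f"{out.pop('first_name')} {out.pop('last_name')}"

    # Computation: store the amount as whole cents.
    out["amount_cents"] = round(out["amount"] * 100)
    return out


row = transform(
    {"first_name": "Ann", "last_name": "Lee", "age": 34,
     "amount": None, "country": " USA "}
)
```

Raising on a failed threshold check is one possible policy; a pipeline could equally route such rows to a reject table for later review.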
3. Loading
Loading is the process of migrating structured data into the
warehouse. Usually, large volumes of data need to be loaded in a short
time. ETL applications play a crucial role in optimizing the load
process and provide efficient recovery mechanisms for loading
failures.
A typical ETL process involves three types of loading functions:
• Initial load: populates the records in the data warehouse for the
first time.
• Incremental load: applies changes (updates) periodically, as
required.
• Full refresh: erases the old contents and reloads the warehouse
with fresh records.
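The three load types, together with a simple recovery mechanism, can be illustrated with Python's built-in `sqlite3` module standing in for the warehouse. The `customers` table and the function names are hypothetical. Each load runs inside a transaction (`with conn:`), so a failure partway through rolls back and leaves the previous contents intact.

```python
import sqlite3


def initial_load(conn, rows):
    """Create the target table and populate it for the first time."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customers (id INTEGER PRIMARY KEY, name TEXT)"
    )
    with conn:  # commits on success, rolls back on error
        conn.executemany("INSERT INTO customers (id, name) VALUES (?, ?)", rows)


def incremental_load(conn, rows):
    """Apply changes: new ids are inserted, existing ids are overwritten."""
    with conn:
        conn.executemany(
            "INSERT OR REPLACE INTO customers (id, name) VALUES (?, ?)", rows
        )


def full_refresh(conn, rows):
    """Erase the old contents and reload, all in one transaction."""
    with conn:
        conn.execute("DELETE FROM customers")
        conn.executemany("INSERT INTO customers (id, name) VALUES (?, ?)", rows)


def snapshot(conn):
    return conn.execute("SELECT id, name FROM customers ORDER BY id").fetchall()


conn = sqlite3.connect(":memory:")
initial_load(conn, [(1, "Ann"), (2, "Bo")])
incremental_load(conn, [(2, "Bo Chen"), (3, "Cy")])
after_incremental = snapshot(conn)

try:
    # The second row repeats the primary key, so the refresh fails midway.
    full_refresh(conn, [(9, "Zed"), (9, "Dup")])
except sqlite3.IntegrityError:
    pass
after_failed_refresh = snapshot(conn)  # rollback kept the old rows

full_refresh(conn, [(7, "Di")])
after_refresh = snapshot(conn)
```

Real warehouses typically offer bulk-load utilities that are much faster than row-by-row inserts; the sketch only shows how the three load types differ in effect.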
Why is ETL important?
Organizations today have both structured and unstructured data from
various sources.
By applying the process of extract, transform, and load (ETL), individual
raw datasets can be prepared in a format and structure that is more
consumable for analytics purposes, resulting in more meaningful
insights.
For example, online retailers can analyze data from points of sale to
forecast demand and manage inventory. Marketing teams can
integrate CRM data with customer feedback on social media to study
consumer behavior.
How does ETL benefit business intelligence?
Extract, transform, and load (ETL) improves business intelligence and
analytics by making the process more reliable, accurate, detailed, and
efficient.
Historical context
ETL gives deep historical context to the organization’s data. An
enterprise can combine legacy data with data from new platforms and
applications. You can view older datasets alongside more recent
information, which gives you a long-term view of data.
What is ELT?
Extract, load, and transform (ELT) is a variant of extract, transform,
and load (ETL) that reverses the order of the last two operations. You
can load data directly into the target system before processing it.
The intermediate staging area is not required because the target data
warehouse has data mapping capabilities within it.
ELT has become more popular with the adoption of cloud infrastructure,
which gives target databases the processing power they need for
transformations.
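ETL compared to ELT
The primary difference between ETL (Extract, Transform, Load) and ELT
(Extract, Load, Transform) is the order in which data is processed:
ETL transforms data before loading it into the target system, whereas
ELT loads the raw data first and transforms it inside the target.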
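The difference in ordering can be made concrete with a small sketch, again using `sqlite3` as a stand-in target system. The `RAW` rows and the table names are hypothetical. In the ETL path, Python code transforms the rows before they are loaded; in the ELT path, the raw rows are loaded as they are and the target database performs the same transformation in SQL.

```python
import sqlite3

# Hypothetical raw extract: lowercase names and amounts stored as text.
RAW = [("ann", "12.50"), ("bo", "7.00")]


def run_etl(conn):
    """Transform in the pipeline first, then load the finished rows."""
    cleaned = [(name.title(), round(float(amount) * 100)) for name, amount in RAW]
    conn.execute("CREATE TABLE sales_etl (name TEXT, amount_cents INTEGER)")
    conn.executemany("INSERT INTO sales_etl VALUES (?, ?)", cleaned)


def run_elt(conn):
    """Load the raw rows first, then transform inside the target with SQL."""
    conn.execute("CREATE TABLE raw_sales (name TEXT, amount TEXT)")
    conn.executemany("INSERT INTO raw_sales VALUES (?, ?)", RAW)
    conn.execute(
        """
        CREATE TABLE sales_elt AS
        SELECT upper(substr(name, 1, 1)) || substr(name, 2) AS name,
               CAST(round(CAST(amount AS REAL) * 100) AS INTEGER) AS amount_cents
        FROM raw_sales
        """
    )


conn = sqlite3.connect(":memory:")
run_etl(conn)
run_elt(conn)
etl_rows = conn.execute("SELECT * FROM sales_etl ORDER BY name").fetchall()
elt_rows = conn.execute("SELECT * FROM sales_elt ORDER BY name").fetchall()
assert etl_rows == elt_rows  # same result, different place of transformation
```

Note that the ELT path keeps `raw_sales` in the target, so the transformation can be rerun or changed later without extracting the data again.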