2. Introduction
• ETL is a process that extracts data from different source systems, transforms the data, and finally loads it into the data warehouse system. ETL stands for Extract, Transform and Load.
• The ETL process requires active input from various stakeholders, including developers, analysts, testers and top executives, and it is technically challenging.
• ETL is a recurring activity (daily, weekly, monthly) of a data warehouse system.
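The three ETL steps can be illustrated with a minimal Python sketch. It assumes a hypothetical CSV export as the source system and uses an in-memory SQLite database (Python's standard `sqlite3` module) as a stand-in for the data warehouse. The table name `fact_sales` and the functions `extract`, `transform` and `load` are illustrative only, not part of any ETL tool.

```python
import csv
import io
import sqlite3

# Hypothetical source: a CSV export from an operational system.
SOURCE_CSV = """order_id,region,amount
1,north,120.50
2,south,80.00
3,north,
4,east,45.25
"""

def extract(text):
    """Extract: read the raw rows from the source without modifying it."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: drop rows with a missing amount, cast types, normalise case."""
    clean = []
    for row in rows:
        if not row["amount"]:
            continue  # validation rule: amount is required
        clean.append((int(row["order_id"]), row["region"].upper(), float(row["amount"])))
    return clean

def load(conn, rows):
    """Load: write the transformed rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_sales "
        "(order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")  # stand-in for the data warehouse
load(conn, transform(extract(SOURCE_CSV)))
print(conn.execute("SELECT COUNT(*), SUM(amount) FROM fact_sales").fetchone())  # (3, 245.75)
```

Keeping each step in its own function mirrors the Extract, Transform and Load stages, so each stage can be tested or rescheduled on its own.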
4. Extraction of data from source systems
• Source systems can be relational databases (RDBMS) and files.
• Data is extracted from the source systems.
• The main objective of this step is to retrieve all the required data from the source systems.
• The extraction step should be designed so that it does not have a negative effect on the source systems, for example by slowing down their normal operations.
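One common way to keep extraction light on a source system is incremental extraction: each run reads only the rows changed since the previous run, tracked by a "watermark" value. The sketch below assumes a hypothetical source table `orders` with an `updated_at` column holding ISO dates (which sort correctly as text). It is one technique among several, not the only way to design the extraction step.

```python
import sqlite3

# Hypothetical source system table with a last-modified timestamp column.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL, updated_at TEXT)")
source.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 10.0, "2024-01-01"), (2, 20.0, "2024-01-02"), (3, 30.0, "2024-01-03")],
)

def extract_incremental(conn, last_watermark):
    """Read only the rows changed after the previous run's watermark.

    One narrow, parameterised query puts far less load on the source
    than re-reading the whole table on every run.
    """
    rows = conn.execute(
        "SELECT id, amount, updated_at FROM orders "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_watermark,),
    ).fetchall()
    # New watermark = latest timestamp seen; unchanged if nothing new arrived.
    new_watermark = rows[-1][2] if rows else last_watermark
    return rows, new_watermark

rows, watermark = extract_incremental(source, "2024-01-01")
print(rows, watermark)
```

The returned watermark would be stored by the ETL job and passed to the next run.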
5. Data Transformation
• This step includes cleaning, filtering and validating the extracted data and applying rules to it.
• The main objective of this step is to load the extracted data into the target database in a clean and general format.
• Data is extracted from different sources, each having its own format.
• E.g. two sources may store dates as dd/mm/yyyy and yyyy/mm/dd; both must be converted to one common format.
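The date-format example can be handled with Python's standard `datetime.strptime`. This is a minimal sketch: the source names `source_a` and `source_b` are hypothetical, and ISO 8601 (yyyy-mm-dd) is assumed as the common target format.

```python
from datetime import date, datetime

# Input format per source (hypothetical source names, formats from the example above).
SOURCE_DATE_FORMATS = {
    "source_a": "%d/%m/%Y",  # dd/mm/yyyy
    "source_b": "%Y/%m/%d",  # yyyy/mm/dd
}

def normalise_date(value: str, source: str) -> str:
    """Parse a date using its source's format and return it as yyyy-mm-dd."""
    parsed: date = datetime.strptime(value, SOURCE_DATE_FORMATS[source]).date()
    return parsed.isoformat()

print(normalise_date("05/03/2024", "source_a"))  # 2024-03-05
print(normalise_date("2024/03/05", "source_b"))  # 2024-03-05
```

The format is chosen by source rather than guessed from the value, because a string such as 01/02/2024 is ambiguous on its own.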
6. Loading
• The third and final step of the ETL process is loading. In this step, the
transformed data is finally loaded into the data warehouse.
• Sometimes the data is loaded into the data warehouse very frequently, and sometimes at longer but regular intervals.
• The rate and period of loading depend solely on the requirements and vary from system to system.
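Because loading repeats on a schedule, it helps if a repeated or re-run batch cannot create duplicates. A minimal sketch, assuming a hypothetical `fact_sales` table keyed by `order_id` in SQLite: `INSERT OR IGNORE` skips rows whose key is already present, and each batch runs in a single transaction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the data warehouse
conn.execute("CREATE TABLE fact_sales (order_id INTEGER PRIMARY KEY, amount REAL)")

def load_batch(conn, rows):
    """Append one batch of facts; rows whose key is already loaded are skipped,
    so re-running a repeated or failed batch does not create duplicates."""
    with conn:  # one transaction: commit on success, roll back on error
        conn.executemany("INSERT OR IGNORE INTO fact_sales VALUES (?, ?)", rows)

load_batch(conn, [(1, 10.0), (2, 20.0)])  # e.g. Monday's scheduled run
load_batch(conn, [(2, 20.0), (3, 30.0)])  # Tuesday's run re-sends order 2
print(conn.execute("SELECT COUNT(*), SUM(amount) FROM fact_sales").fetchone())  # (3, 60.0)
```

The same `load_batch` works whether it is called every few minutes or once a month, which is why the loading frequency can be set purely by requirements.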
7. Introduction to data warehouses
• A data warehouse is built by combining data from multiple diverse sources.
• Data warehousing is a step-by-step approach to constructing and using a data warehouse.
• After the data is loaded, it is often cleansed, transformed, and checked for quality.
8. What is a data warehouse?
• A data warehouse is a system, built with a collection of software tools, that facilitates analysis of large sets of business data to help an organization make decisions.
• Much of the data in a data warehouse comes from numerous sources, such as internal applications (marketing, sales, and finance) and customer-facing apps.
• It is a centralized data repository that analysts can query whenever required, for business benefit.
10. What is data warehousing?
The process of creating data warehouses to store large amounts of data is called data warehousing.
Data warehousing improves the speed and efficiency of accessing different data sets
and makes it easier for company decision-makers to obtain insights that help the business.
11. The main goal of data warehousing
To create a store of historical data that can be retrieved and analyzed
to provide helpful insight into the organization’s operations.
12. Need for data warehousing
Data warehousing is an increasingly essential tool for business intelligence.
It allows organizations to make quality business decisions.
• Serves business users
• Maintains consistency
• Helps make strategic decisions
• Provides fast query response times
13. Characteristics of a data warehouse
1. Subject-oriented: A data warehouse is subject-oriented because it delivers information organized around a particular theme rather than around day-to-day operations. Typical themes are sales, distribution, etc.
2. Integrated: A data warehouse is created by integrating data from numerous different sources, such as mainframe computers and relational databases, into one consistent format.
3. Non-volatile: Data in the data warehouse is permanent. Once loaded, it is not changed or deleted by routine operations; new data is added over time and the historical data is kept.
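The non-volatile property can be enforced at the database level. A minimal sketch, assuming a hypothetical `fact_sales` table in SQLite: two triggers abort any `UPDATE` or `DELETE`, while inserting new rows still works. Real warehouse products usually control this through permissions and load processes instead; the triggers are only an illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact_sales (order_id INTEGER PRIMARY KEY, amount REAL)")

# Hypothetical guard: triggers that abort any UPDATE or DELETE on the table,
# so rows stay exactly as they were loaded.
for action in ("UPDATE", "DELETE"):
    conn.execute(
        f"CREATE TRIGGER no_{action.lower()} BEFORE {action} ON fact_sales "
        "BEGIN SELECT RAISE(ABORT, 'fact_sales is append-only'); END"
    )

conn.execute("INSERT INTO fact_sales VALUES (1, 99.0)")  # loading new data still works

rejected_message = None
try:
    conn.execute("DELETE FROM fact_sales WHERE order_id = 1")
except sqlite3.DatabaseError as exc:  # the trigger's RAISE surfaces as a database error
    rejected_message = str(exc)

print(rejected_message)
print(conn.execute("SELECT COUNT(*) FROM fact_sales").fetchone())  # (1,)
```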
14. Latest tools and technologies for data warehousing:
1. Amazon Redshift
2. Microsoft Azure
3. Google BigQuery
4. Snowflake
5. Micro Focus Vertica
6. Teradata
7. Amazon DynamoDB
8. PostgreSQL
9. Amazon RDS
10. Amazon S3
15. What is a data mart?
A data mart is a simple form of data warehouse focused on a single subject or line of business.
With a data mart, teams can access data and gain insights faster, because they don’t have to spend time searching within a more complex data warehouse or manually aggregating data from different sources.
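A data mart can be as light as a view that exposes one subject of the warehouse in a ready-to-use shape; many data marts are instead separate tables or databases. In this minimal SQLite sketch, the warehouse table `fact_transactions`, its `dept` column and the view `mart_sales_by_region` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the enterprise data warehouse
conn.execute("CREATE TABLE fact_transactions (dept TEXT, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO fact_transactions VALUES (?, ?, ?)",
    [("sales", "north", 100.0), ("sales", "south", 50.0),
     ("finance", "north", 999.0), ("sales", "north", 25.0)],
)

# Data mart for the sales team: only its subject, already aggregated,
# so analysts do not search or aggregate the full warehouse themselves.
conn.execute(
    """CREATE VIEW mart_sales_by_region AS
       SELECT region, SUM(amount) AS total_sales
       FROM fact_transactions
       WHERE dept = 'sales'
       GROUP BY region"""
)
print(conn.execute("SELECT * FROM mart_sales_by_region ORDER BY region").fetchall())
# [('north', 125.0), ('south', 50.0)]
```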
16. Why create a data mart?
A data mart provides easier access to data required by a specific team or
line of business within your organization.
Teams forced to locate data from various sources most often rely on
spreadsheets to share this data and collaborate.
This usually results in human errors, confusion, complex reconciliations,
and multiple sources of truth—the so-called “spreadsheet nightmare.”
For comparison, a data warehouse is a data management system designed to support business intelligence and analytics for an entire organization. Data warehouses often contain large amounts of data, including historical data.
A data mart, in contrast, is a simple form of a data warehouse that is focused on a single subject or line of business, such as sales, finance, or marketing.
18. Benefits of data marts
• A single source of truth.
• Quicker access to data.
• Faster insights, leading to faster decision making.
• Simpler and faster implementation.
• Agile and scalable data management.
• Support for transient analysis (short-term, temporary analytical projects).
19. Architecture and components of a data warehouse
Data warehouse architecture defines the overall structure of data processing and presentation used for data analysis and decision making within an enterprise. Each organization builds its data warehouse according to its own needs, but all data warehouses share some standard components.
Data warehouse applications are designed to support users’ data requirements; online analytical processing (OLAP) is one example. OLAP includes functions such as forecasting, profiling, summary reporting, and trend analysis.
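Summary reporting and trend analysis usually start from an aggregation over the fact data. A minimal sketch, assuming a hypothetical SQLite `fact_sales` table with ISO-formatted `sale_date` values: SQL rolls daily sales up to monthly totals, and Python computes the change from one month to the next. Dedicated OLAP engines offer far richer operations; this only shows the basic idea.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact_sales (sale_date TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?)",
    [("2024-01-05", 100.0), ("2024-01-20", 50.0),
     ("2024-02-03", 120.0), ("2024-03-15", 90.0), ("2024-03-28", 90.0)],
)

# Summary reporting: roll daily facts up to monthly totals ('yyyy-mm' prefix).
monthly = conn.execute(
    """SELECT substr(sale_date, 1, 7) AS month, SUM(amount)
       FROM fact_sales GROUP BY month ORDER BY month"""
).fetchall()

# Trend analysis: change in each month's total versus the previous month.
trend = [(m, total, total - prev) for (m, total), (_, prev) in zip(monthly[1:], monthly)]
print(monthly)  # [('2024-01', 150.0), ('2024-02', 120.0), ('2024-03', 180.0)]
print(trend)    # [('2024-02', 120.0, -30.0), ('2024-03', 180.0, 60.0)]
```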
The architecture of a data warehouse is mainly the proper arrangement of its elements, both software and hardware components, to build an efficient data warehouse. These elements and components vary with each organization’s requirements and circumstances.
21. Component 1: Source data
In the data warehouse, source data comes from different places. It is grouped into four categories:
• External data
• Internal data
• Operational system data
• Flat files
22. Component 2: Data staging
After the data is extracted from the various sources, it must be prepared for storage in the data warehouse. The extracted data must be transformed into a format that is suitable for storing in the data warehouse for querying and analysis.
23. Data staging performs three primary functions:
• Data extraction
• Data transformation
• Data loading
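The three staging functions can be sketched with a staging table that sits between the sources and the warehouse. In this minimal SQLite example, the tables `stg_orders` and `fact_orders` and the rule "amount must not be empty" are hypothetical. Raw rows land in staging as text, and only validated, typed rows are moved into the warehouse table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Staging table: extracted data lands here unchanged (everything as text).
conn.execute("CREATE TABLE stg_orders (order_id TEXT, amount TEXT)")
# Warehouse table: only typed, validated rows are loaded here.
conn.execute("CREATE TABLE fact_orders (order_id INTEGER PRIMARY KEY, amount REAL NOT NULL)")

# 1. Extraction: copy source rows into staging without changing them.
conn.executemany("INSERT INTO stg_orders VALUES (?, ?)",
                 [("1", "10.5"), ("2", ""), ("3", "7")])

# 2. Transformation: validate and cast while reading from staging.
# 3. Loading: move only the rows that pass into the warehouse table.
with conn:
    conn.execute(
        """INSERT INTO fact_orders (order_id, amount)
           SELECT CAST(order_id AS INTEGER), CAST(amount AS REAL)
           FROM stg_orders WHERE amount <> ''"""
    )
    conn.execute("DELETE FROM stg_orders")  # staging is emptied after each run

print(conn.execute("SELECT * FROM fact_orders ORDER BY order_id").fetchall())
# [(1, 10.5), (3, 7.0)]
```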
24. Component 3: Data storage in the warehouse
Data storage for data warehousing is split into multiple repositories.
• Metadata: Metadata is data about data, i.e. it summarizes basic details about the stored data, which helps in finding and working with specific instances of data.
• Raw data: Raw data is data delivered from a particular source that has not yet been processed by machine or human.
• Summary data: Summary data is a brief conclusion drawn from a large body of data. Analysts typically write code that aggregates the detailed data and present the final result as summarized data.
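The three repositories can be pictured as three kinds of tables. This minimal SQLite sketch is illustrative only: the tables `raw_sales`, `summary_sales` and `metadata`, and the metadata fields chosen (description, row count, build time), are assumptions rather than a standard layout.

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
# Raw data: detail rows exactly as delivered by the source.
conn.execute("CREATE TABLE raw_sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO raw_sales VALUES (?, ?)",
                 [("north", 10.0), ("north", 5.0), ("south", 8.0)])

# Summary data: a precomputed aggregate that answers common questions quickly.
conn.execute(
    """CREATE TABLE summary_sales AS
       SELECT region, SUM(amount) AS total, COUNT(*) AS n_sales
       FROM raw_sales GROUP BY region"""
)

# Metadata: data about the data (what each table holds, its size, when it was built).
conn.execute(
    "CREATE TABLE metadata (table_name TEXT, description TEXT, row_count INTEGER, built_at TEXT)"
)
for name, description in [("raw_sales", "sales rows as delivered"),
                          ("summary_sales", "sales totals per region")]:
    # Table names come from the fixed list above, so the f-string is safe here.
    (row_count,) = conn.execute(f"SELECT COUNT(*) FROM {name}").fetchone()
    conn.execute("INSERT INTO metadata VALUES (?, ?, ?, ?)",
                 (name, description, row_count, datetime.now(timezone.utc).isoformat()))

print(conn.execute("SELECT table_name, row_count FROM metadata ORDER BY table_name").fetchall())
# [('raw_sales', 3), ('summary_sales', 2)]
```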
25. Component 4: Data marts
Data marts are also part of the storage component of a data warehouse. A data mart stores the information of a specific function of an organization that is handled by a single authority. An organization may have any number of data marts, depending on its functions. In short, data marts contain subsets of the data stored in the data warehouse.
Users and analysts can then use the data for applications such as reporting, analysis, and mining. The data is made available to them whenever required.