- Business Intelligence refers to tools and technologies used to collect, integrate, analyze and visualize data. Raw data from sources is transformed into meaningful information using BI technologies.
- Data warehousing involves storing data from different sources in databases to provide insights for analysis and queries. It follows the relational database model.
- The ETL process extracts data from source systems, transforms it to match business needs, and loads it into the data warehouse. This involves activities like selecting, calculating, joining, and aggregating data.
2. What is Business Intelligence
The term Business Intelligence refers collectively to the tools and technologies used for
the collection, integration, analysis, and visualization of data. The raw data that we
collect from different data sources is transformed into comprehensible data or meaningful
information using BI technologies.
simpo313@gmail.com 2
12/07/2023
3. To simplify the concept, we collect raw data from various sources and, with the
help of Business Intelligence tools, transform it into meaningful information. We
can store such data in data warehouses or data lakes in specific data structures.
From the data warehouses, we can retrieve stored data in the form of a report,
query, or dashboard to conduct data analysis. The data reaches the warehouse through
the process known as ETL (Extract, Transform, Load).
4. So What is Data Warehousing?
Data warehousing is the process of storing data in data warehouses, which are
databases following the relational database model. Data is selected from
different data sources, aggregated, organized and managed to provide
meaningful insights into data for analysis & queries.
5. A data warehouse is known by several other terms like Decision Support
System (DSS), Executive Information System, Management Information
System, Business Intelligence Solution, Analytic Application.
We call it Decision Support System as it provides useful insights and patterns
shown by data as a result of the analysis which makes taking important
decisions in business easy and safe.
6. How Does Data Warehousing Work?
In data warehousing, data is de-normalized, e.g. converted from 3NF to 2NF.
De-normalization increases data redundancy, and so the data size increases; this
is the sense in which warehouse data is called big here. The main purpose of creating data
warehouses is to retrieve processed data quickly.
Warehouses also provide aggregate data like totals, averages, general trends, etc. for
enterprises to analyze and make decisions that are good for their business and functioning
in the industry.
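As a rough illustration of the trade-off described above, the following Python sketch de-normalizes two small tables and then computes an aggregate without further joins. The tables, names and amounts are hypothetical sample data, not taken from any real system.

```python
# Normalized operational tables (hypothetical sample data).
customers = {
    1: {"name": "Asha", "region": "North"},
    2: {"name": "Ben", "region": "South"},
}
orders = [
    {"order_id": 10, "customer_id": 1, "amount": 120.0},
    {"order_id": 11, "customer_id": 1, "amount": 80.0},
    {"order_id": 12, "customer_id": 2, "amount": 50.0},
]


def denormalize(orders, customers):
    """Copy the customer attributes into every order row (adds redundancy)."""
    return [{**order, **customers[order["customer_id"]]} for order in orders]


def totals_by_region(rows):
    """Aggregate the wide rows directly; no join is needed at query time."""
    totals = {}
    for row in rows:
        totals[row["region"]] = totals.get(row["region"], 0.0) + row["amount"]
    return totals


wide_rows = denormalize(orders, customers)
region_totals = totals_by_region(wide_rows)
print(region_totals)
```

The customer name and region are now repeated on every order row, which is the redundancy that makes the data larger but quicker to read.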
7. Components of Data Warehouse
Operational Systems: These are the different operational domains in an
enterprise, each of which serves a unique purpose and contributes in its own way to the
proper functioning of the enterprise.
Different operational systems can be marketing, sales, Enterprise Resource
Planning (ERP), etc. Each of these systems has its own normalized database.
8. Integration Layer:
The normalized data present in the operational systems must not be manipulated.
Instead, we take a copy of that data into an integration layer (staging area), where
we manipulate and transform it in specific ways.
One basic operation is bringing the copied data into a single standardized
format because, in the operational systems, data is not present in the same format.
For instance, in a data field, the data can be in pounds in one table and dollars in
another.
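The pounds-versus-dollars case can be sketched in Python as follows. This is only a minimal illustration: the exchange rate, the source names and the field names are assumptions made up for the example.

```python
# Hypothetical, fixed exchange rates to one reporting currency (USD).
RATES_TO_USD = {"USD": 1.0, "GBP": 1.25}


def standardize_amount(amount, currency):
    """Convert an amount to USD using the assumed rate table."""
    return round(amount * RATES_TO_USD[currency], 2)


# Rows copied from two operational systems into the staging area.
staged = [
    {"source": "sales_uk", "amount": 100.0, "currency": "GBP"},
    {"source": "sales_us", "amount": 100.0, "currency": "USD"},
]

# The operational rows are left untouched; only the staged copy is transformed.
standardized = [
    {"source": row["source"],
     "amount_usd": standardize_amount(row["amount"], row["currency"])}
    for row in staged
]
print(standardized)
```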
9. Data Warehouse:
The transformed and standardized data flows into the next element, known
as the data warehouse, which is a very large database. Data from all over
the enterprise is stored in this data vault in the second normal form,
with a uniform format and structure.
10. Data Marts:
These are purpose-specific sub-databases of the data warehouse, each containing only some
part of the entire data. In each data mart, only the data that is useful for a particular
use is available; for example, there will be different data marts for analysis related to marketing,
finance, administration, etc.
These databases do not overlap or share their data with each other, and operations
performed in one of them do not influence the others. This makes fetching data from the
data marts much faster than fetching it from the much larger data warehouse.
11. Business Intelligence and Data Warehousing
Data warehousing and Business Intelligence often go hand in hand, because the
data made available in the data warehouses is central to the use of Business
Intelligence tools.
BI tools like Tableau, Sisense, Chartio, Looker, etc. use data from the data
warehouses for purposes like query, reporting, analytics, and data mining.
12. In any enterprise, Business Intelligence plays a central
role in its smooth and cost-effective functioning.
BI is helpful in operational efficiency, which includes ERP
reporting, KPI tracking, risk management, product profitability, costing,
logistics, etc.
It also helps in customer interaction, which includes sales analysis, sales forecasting,
segmentation, campaign planning, customer profitability, etc.
13. From our prior discussions, we know that data warehouses store processed
and aggregated data. Business Intelligence tools require such data from the
data warehouses.
The data is made available to them through Online Analytical Processing (OLAP). Data
warehousing and OLAP have proved to be a much-needed jump from the old
decision-making apps, which used OLTP.
15. Architecture and Process of Data Warehousing and BI
In this section, we will see how to extract, transform and load raw data into
data warehouses. We also discuss how BI tools use it for analytical purposes.
Refer to the image given below to understand the process better.
17. Step 1: Raw data is extracted from data sources like traditional data, workbooks, Excel
files, etc.
Step 2: The raw data collected from different data sources is consolidated
and integrated to be stored in a special database called a data warehouse. The process
by which we fetch the data into data warehouses from the sources is ETL (Extract,
Transform, Load). It extracts raw data from the original sources, transforms or
manipulates it in different ways, and loads it into the data warehouse.
18. Step 3: If you wish to use data from the data warehouse for specific purposes
like marketing analysis, financial analysis, etc., subsets of the data warehouse,
known as data marts and data cubes, are created. Data moving from the data
warehouse to the data marts also goes through ETL.
19. Step 4: From both the data warehouse and the data marts, data is directed to data
cubes or OLAP cubes, which are multi-dimensional data sets whose data is ready to
be used by front-end BI tools or clients.
At the front end, there are BI tools for querying, reporting, analysis,
and data mining. These BI tools query data from the OLAP cubes and use it for
analysis.
20. Summary
Thus, Business Intelligence and Data Warehousing are two important pillars in the
survival of an enterprise. They help to keep a check on critical elements like CRM,
ERP, supply chain, products, and customers.
Business Intelligence and Data Warehousing technologies give accurate,
comprehensive, integrated and up-to-date information on the current situation of
an enterprise, which supports taking the required steps and making important
decisions for the company’s growth.
21. Understanding Data Warehouse: Its Features
Definition
A data warehouse is a subject-oriented, integrated, time-variant and non-volatile
collection of data that supports management's decision-making process.
Food for thought:
“What is the difference between data warehouses and operational databases?”
22. Data Warehouse—Subject-Oriented
Organized around major subjects, such as customer, product, sales,
employees.
This subject-specific design helps reduce the query response
time, since only a few records need to be searched to answer the
user's question.
23. Data Warehouse—Integrated
Constructed by integrating multiple, heterogeneous data sources:
relational databases, flat files, on-line transaction records.
Data cleaning and data integration techniques are applied.
These ensure consistency in naming conventions, encoding structures, attribute measures, etc. among
different data sources.
E.g., when shortlisting your top 20 customers, you must know that “HAL” and “Hindustan Aeronautics
Limited” are one and the same. Much of the transformation and loading work that goes into the data
warehouse is centered on integrating data and standardizing it.
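One simple way to handle the “HAL” case is an alias table that maps every known spelling to one canonical name before sales are totalled. The Python sketch below assumes such a hand-maintained table; the alias entries, the extra customer "Acme" and the amounts are hypothetical.

```python
# Hypothetical alias table: every known spelling maps to one canonical name.
ALIASES = {
    "hal": "Hindustan Aeronautics Limited",
    "hindustan aeronautics limited": "Hindustan Aeronautics Limited",
}


def canonical_name(raw):
    """Normalize case and spacing, then look the name up in the alias table."""
    key = " ".join(raw.lower().split())
    return ALIASES.get(key, raw.strip())


def sales_by_customer(rows):
    """Total sales per customer after names have been standardized."""
    totals = {}
    for name, amount in rows:
        customer = canonical_name(name)
        totals[customer] = totals.get(customer, 0.0) + amount
    return totals


rows = [
    ("HAL", 500.0),
    ("Hindustan Aeronautics Limited", 300.0),
    ("Acme", 100.0),
]
customer_totals = sales_by_customer(rows)
print(customer_totals)
```

Without the alias step, the same customer would appear twice in a top-20 ranking with its sales split across two names.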
24. Data Warehouse—Time Variant/time
referenced data
The time horizon for the data warehouse is significantly longer than
that of operational systems.
Operational database: current-value data.
Data warehouse data: provides information from a historical perspective (e.g.,
past 5-10 years).
For example, the user may ask “What were the total sales of product
A for the past three years on New Year’s Day across region Y?”
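A question like the one above could be answered with a query along these lines. This is a sketch using Python's built-in sqlite3 module; the table layout, column names and figures are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (sale_date TEXT, product TEXT, region TEXT, amount REAL)"
)
# Hypothetical history kept in the warehouse (dates stored as YYYY-MM-DD).
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("2021-01-01", "A", "Y", 100.0),
        ("2022-01-01", "A", "Y", 150.0),
        ("2023-01-01", "A", "Y", 200.0),
        ("2023-01-02", "A", "Y", 999.0),  # not New Year's Day
        ("2023-01-01", "A", "Z", 50.0),   # different region
        ("2023-01-01", "B", "Y", 70.0),   # different product
    ],
)

# Total sales of product A in region Y on 1 January, one row per year.
rows = conn.execute(
    """
    SELECT substr(sale_date, 1, 4) AS year, SUM(amount)
    FROM sales
    WHERE product = 'A' AND region = 'Y' AND sale_date LIKE '%-01-01'
    GROUP BY year
    ORDER BY year
    """
).fetchall()
print(rows)
```

An operational database that keeps only current values could not answer this, because the earlier years would already have been overwritten.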
25. And………
Time-referenced data when analyzed can also help in spotting the hidden
trends between different associative data elements, which may not be
obvious to the naked eye. This exploration activity is termed “data mining”.
26. Data Warehouse—Non-Volatile
Once data is in, it will not change: historical data in the DW
should never be changed. This enables management to get a
consistent picture of the business.
27. Metadata
Metadata is simply defined as data about data. For example, the index of a book serves
as metadata for the contents of the book. In other words, we can say that metadata
is the summarized data that leads us to the detailed data.
In terms of a data warehouse, we can understand metadata as follows:
Metadata is a road map to the data warehouse.
The metadata acts as a directory. This directory helps the decision support system to
locate the contents of the data warehouse.
28. Metadata Repository
The Metadata Repository is an integral part of a data warehouse system. The
Metadata Repository contains the following metadata:
Business Metadata - This metadata has the data ownership information, business
definition and changing policies.
Operational Metadata - This metadata includes currency of data and data lineage.
Currency of data means whether the data is active or archived. Lineage of data means
the history of the data migrated and the transformations applied to it.
29. Data for mapping from operational environment to data warehouse - This
metadata includes source databases and their contents, data extraction, data
partition, cleaning, transformation rules, data refresh and purging rules.
The algorithms for summarization - This includes dimension algorithms, data
on granularity, aggregation, summarizing, etc.
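To make the four kinds of metadata more concrete, one entry of a metadata repository might be modelled as below. This is only an illustrative Python sketch: the class, its field names and the sample values are hypothetical and do not describe any particular product.

```python
from dataclasses import dataclass, field


@dataclass
class TableMetadata:
    table: str
    owner: str           # business metadata: data ownership
    definition: str      # business metadata: business definition
    status: str          # operational metadata: currency ("active" or "archived")
    granularity: str     # summarization metadata: level of detail
    source_tables: list = field(default_factory=list)  # mapping metadata
    lineage: list = field(default_factory=list)        # operational metadata


# The repository acts as a directory keyed by warehouse table name.
repository = {
    "fact_sales": TableMetadata(
        table="fact_sales",
        owner="Sales department",
        definition="One row per product, region and day",
        status="active",
        granularity="daily",
        source_tables=["erp.orders", "crm.customers"],
        lineage=["extracted from erp.orders", "amounts converted to one currency"],
    )
}


def locate(repository, table):
    """Return the metadata entry for a table, or None if it is unknown."""
    return repository.get(table)


entry = locate(repository, "fact_sales")
print(entry.owner, entry.status, entry.source_tables)
```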
30. Data cube
A data cube helps us to represent data in multiple dimensions. The data cube is defined by
dimensions and facts.
Illustration of Data cube
Suppose a company wants to keep track of sales records with the help of a sales data warehouse with
respect to time, item, branch and location. These dimensions allow it to keep track of monthly
sales and of the branch at which the items were sold. There is a dimension table associated
with each dimension. This dimension table further describes the dimension. For example, the
"item" dimension table may have attributes such as item_name, item_type and item_brand.
31. The following table represents a 2-D view of sales data for a company with
respect to the time, item and location dimensions.
32. But here in this 2-D table we have records with respect to time and item only.
The sales for New Delhi are shown with respect to the time and item dimensions
according to the type of item sold.
Suppose we want to view the sales data with one more dimension, say the location
dimension. The 3-D view of the sales data with respect to time, item, and
location is shown in the table below:
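The step from a 2-D view to a 3-D view can also be sketched in Python. The records below are hypothetical sample figures, not the values from the slide tables; the 2-D view fixes one location, and the 3-D view simply repeats it for every location.

```python
from collections import defaultdict

# Hypothetical sales records: (location, time, item, units sold).
sales = [
    ("New Delhi", "Q1", "Mobile", 605),
    ("New Delhi", "Q1", "Modem", 825),
    ("New Delhi", "Q2", "Mobile", 680),
    ("Chennai", "Q1", "Mobile", 340),
    ("Chennai", "Q2", "Modem", 952),
]


def view_2d(records, location):
    """Time x item table for a single location."""
    table = defaultdict(dict)
    for loc, time, item, units in records:
        if loc == location:
            table[time][item] = table[time].get(item, 0) + units
    return dict(table)


def view_3d(records):
    """Location x time x item: one 2-D table per location."""
    locations = sorted({record[0] for record in records})
    return {loc: view_2d(records, loc) for loc in locations}


new_delhi = view_2d(sales, "New Delhi")
cube = view_3d(sales)
print(new_delhi)
print(cube["Chennai"]["Q2"]["Modem"])
```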
34. The above 3-D table can be represented as 3-D data cube as
shown in the following figure:
35. Data mart
A data mart contains a subset of organisation-wide data. This subset of data is valuable to
a specific group within an organisation. In other words, a data mart contains only the
data that is specific to a particular group.
For example, the marketing data mart may contain only data related to items, customers
and sales. Data marts are confined to subjects.
36. Points to remember about data marts:
Data marts are small in size.
Data marts are customized by department.
The source of a data mart is a departmentally structured data warehouse.
Data marts are flexible.
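As a small illustration of a data mart being a subject-confined subset, the Python sketch below copies only the relevant subject tables out of a toy warehouse. The table names, the department-to-subject mapping and the rows are hypothetical.

```python
# A toy warehouse: subject name -> rows (hypothetical sample data).
warehouse = {
    "items": [{"item_id": 1, "item_name": "Mobile"}],
    "customers": [{"customer_id": 7, "name": "Asha"}],
    "sales": [{"item_id": 1, "customer_id": 7, "amount": 250.0}],
    "payroll": [{"employee_id": 3, "salary": 4000.0}],
}

# Assumed mapping of departments to the subjects they need.
MART_SUBJECTS = {
    "marketing": ("items", "customers", "sales"),
}


def build_data_mart(warehouse, department):
    """Copy only the department's subjects, so the mart is independent."""
    return {
        subject: [dict(row) for row in warehouse[subject]]
        for subject in MART_SUBJECTS[department]
    }


marketing_mart = build_data_mart(warehouse, "marketing")
print(sorted(marketing_mart))
```

Because the rows are copied, changes made inside the mart do not affect the warehouse.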
38. Process Flow in Data Warehouse:
The ETL Process
Everyone understands the three letters:
You get the data out of its original source location (E), you
do something to it (T), and then you load it (L) into a final
set of tables for the business users to query.
39. The ETL Process
Extract, transform, and load (ETL) is a process in data warehousing that
involves:
extracting data from source systems (these are the Online Transaction
Processing (OLTP) systems);
transforming the extracted data to match business needs;
loading the transformed data into the data warehouse.
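The three steps above can be sketched end to end with Python's standard library. This is a minimal example under stated assumptions: the CSV text stands in for a source system, an in-memory SQLite database stands in for the warehouse, and the table and column names are hypothetical.

```python
import csv
import io
import sqlite3

# Stand-in for an export from a source (OLTP) system.
SOURCE_CSV = """order_id,qty,unit_price
1,2,10.00
2,1,99.50
"""


def extract(text):
    """E: read the raw rows from the source export."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """T: cast the text fields and derive sale_amount = qty * unit_price."""
    return [
        (int(row["order_id"]), int(row["qty"]) * float(row["unit_price"]))
        for row in rows
    ]


def load(conn, rows):
    """L: write the transformed rows into a warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_sales "
        "(order_id INTEGER PRIMARY KEY, sale_amount REAL)"
    )
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(SOURCE_CSV)))
total = conn.execute("SELECT SUM(sale_amount) FROM fact_sales").fetchone()[0]
print(total)
```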
40. Extract
The first part of an ETL process is to extract data from the source systems.
Data warehousing projects consolidate data from different source systems.
Each separate system may also use a different data format.
41. Transform
This phase applies a series of rules or functions to the extracted data to
derive the required data format to be loaded in the data warehouse.
Some data sources will require very little manipulation of data. In other
cases, one or more of the following transformation types may be required:
42. POSSIBLE DATA TRANSFORMATIONS
1. Selecting only certain columns to load (or selecting null columns not to
load)
2. Translating coded values (e.g., if the source system stores M for male and F
for female, but the warehouse stores 1 for male and 2 for female)
3. Deriving a new calculated value (e.g., sale_amount = qty * unit_price)
43. Summarizing multiple rows of data (e.g., total sales for each region)
Joining together data from multiple sources (e.g., lookup, merge,
etc.)
Splitting a column into multiple columns (e.g., putting a comma-
separated list specified as a string in one column as individual values
in different columns)
Generating surrogate key values
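Several of the transformations listed above can be combined in one small Python sketch: selecting columns, translating coded values, deriving a calculated value, splitting a column, generating surrogate keys and summarizing rows. The source rows, column names and key scheme are hypothetical and chosen only for illustration.

```python
from itertools import count

GENDER_CODES = {"M": 1, "F": 2}  # translate source codes to warehouse codes

_next_key = count(1)
_surrogate_keys = {}


def surrogate_key(source_id):
    """Assign a warehouse-owned key the first time a source id is seen."""
    if source_id not in _surrogate_keys:
        _surrogate_keys[source_id] = next(_next_key)
    return _surrogate_keys[source_id]


def transform_row(row):
    first_tag, second_tag = row["tags"].split(",", 1)  # split one column in two
    return {
        # Only the selected columns are carried over; "notes" is dropped.
        "customer_key": surrogate_key(row["customer_id"]),
        "gender": GENDER_CODES[row["gender"]],
        "region": row["region"],
        "sale_amount": row["qty"] * row["unit_price"],  # derived value
        "tag_1": first_tag,
        "tag_2": second_tag,
    }


source_rows = [
    {"customer_id": "CRM-9", "gender": "F", "region": "North", "qty": 2,
     "unit_price": 5.0, "tags": "red,large", "notes": None},
    {"customer_id": "ERP-4", "gender": "M", "region": "North", "qty": 1,
     "unit_price": 3.0, "tags": "blue,small", "notes": None},
    {"customer_id": "CRM-9", "gender": "F", "region": "South", "qty": 4,
     "unit_price": 2.5, "tags": "red,small", "notes": None},
]
transformed = [transform_row(row) for row in source_rows]

# Summarize multiple rows: total sales for each region.
region_totals = {}
for row in transformed:
    region_totals[row["region"]] = (
        region_totals.get(row["region"], 0.0) + row["sale_amount"]
    )
print(region_totals)
```

The same source customer always receives the same surrogate key, whichever system the row came from.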
44. Transformation types
Data must be merged from different systems, e.g. one source may store the
same information with a different structure.
Data must be scrubbed for inconsistencies, e.g. spelling errors or
variations. It is a good idea to use surrogate keys: keys maintained at the
data warehouse that are independent of keys from the data sources.
Data must be pre-aggregated for faster analysis.
45. Load
The load phase loads the data into the data warehouse. The data loaded can
be used to support BI, e.g. for reporting purposes.