Metadata for a Data Mart
Metadata is information about the data in a data mart. For a data mart,
metadata includes:
* Description of the sources of the data.
* Information about the data mart, its tables, attributes, relationships, etc.
* Frequency of refreshing the data.
* Definitions of all types.
* Description of any customization that may have taken place while loading.
The primary objective of metadata is to provide the technical and business
views of the data mart. The metadata of a data mart is created and updated
by the load programs that move data from the data warehouse to the data mart.
Metadata in a data mart can be categorized as:
1. Technical metadata
2. Business metadata
Technical metadata:-
This consists of metadata created during the creation of the data mart, as
well as metadata that supports the management of the data mart. It includes
data acquisition rules and the rules for transforming source data into target
data.
Business metadata:-
It allows end users to understand what information is available in the data
mart and how it can be accessed.
The metadata of the individual data mart should also be available to the end
users for effective usage.
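The metadata items listed above can be captured in a simple record. The following is a minimal sketch, assuming a hypothetical `DataMartMetadata` structure; the class, its field names and the sample values are invented for illustration and do not come from any particular tool.

```python
from dataclasses import dataclass, field


@dataclass
class DataMartMetadata:
    """Hypothetical container for the metadata items of a data mart."""
    sources: list[str]                 # description of the data sources
    tables: dict[str, list[str]]       # table name -> attribute names
    refresh_frequency: str             # how often the data is refreshed
    definitions: dict[str, str]        # business term -> definition
    load_customizations: list[str] = field(default_factory=list)

    def business_view(self) -> dict[str, str]:
        """Business metadata: what is available, in business terms."""
        return dict(self.definitions)

    def technical_view(self) -> dict[str, object]:
        """Technical metadata: structures, sources and load rules."""
        return {
            "sources": self.sources,
            "tables": self.tables,
            "refresh_frequency": self.refresh_frequency,
            "load_customizations": self.load_customizations,
        }


# Sample values are invented for illustration only.
sales_mart = DataMartMetadata(
    sources=["data warehouse: sales subject area"],
    tables={"sales_fact": ["product_key", "amount"]},
    refresh_frequency="daily",
    definitions={"amount": "Sale value in local currency"},
    load_customizations=["amounts rounded to two decimals"],
)
```

Keeping both views on one record reflects the idea that the same metadata serves technical staff and end users; a real data mart would normally store this in tables maintained by the load programs.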
Designing the data mart:-
The main business driver for the data mart is the need for information, and
the best way to start the design process is by identifying the business needs.
The steps in designing a data mart are:-
* Defining the scope of the data mart project.
* Defining the requirements for the data mart.
* Data mart design.
Defining the scope of the data mart project:-
The scope of a data mart defines the boundaries of the project and is usually
expressed in some combination of geography, organization and application.
Defining the scope usually requires making compromises as you try to balance
resources (such as people, systems and budget) with the data and the
capabilities you promised to deliver.
Some important points to remember while defining the scope of the data mart
are that the scope:
1. Sets the right expectations.
2. Prioritizes incremental development.
3. Highlights risks and issues.
4. Allows you to estimate costs.
Defining the requirements for the data mart:-
To start the implementation of the data mart, one needs to define the business
and technical requirements. This section contains the following topics:
* Define business requirements.
* Define technical requirements.
* How do you know if you have done it right?
Define business requirements:-
The purpose of the data mart is to provide access to data about a particular
subject. The data provided for analysis should be meaningful and should answer
the users' queries. Data should be presented in business terms that the users
can easily understand. Hence the business persons should collect the data that
helps personnel make accurate decisions.
The best way to understand the business processes is through interviews and
questionnaires. The set of questions or interview template should be
consistent, i.e. it should not be changed on a regular basis. The questions
should focus on the users' information requirements, such as content,
priorities, etc. The requirements identified as a result of these interviews
comprise the business requirements for the data mart.
The requirements definition process:
* Involves end users throughout the process.
* Classifies the requirements in an analysis framework that includes
  requirements for the business sponsor, the architect, the data mart
  developer and the end users.
* Manages the expectations of the end users.
Data Mart Design:-
Here we create the logical and physical design for the data mart, and define
the specific data content, the relationships within groups of data and the
frequency with which data is refreshed. The data mart design comprises two
designs:-
* Logical design
* Physical design
Logical design:-
The logical design is concerned with general ideas rather than real things or
events. In logical design, we look at the logical relationships among the
objects.
Physical design:-
In physical design, we look at the most effective way of storing and
retrieving the objects.
Defining technical requirements:-
The technical requirements specify where you get the data that feeds the data
mart. The primary sources of data for data marts are the operational systems
that handle the day-to-day transactional activities. A data mart may be fed
from more than one operational source. The main things to keep in mind while
defining technical requirements are:-
* The data cannot be transferred from the operational systems into the data
  mart without intermediate processing.
* One should understand how clean the operational data is and how much
  formatting is needed to integrate it with other sources.
* We need to determine how often the data must be updated.
How do you know if you have done it right:-
When we finish the interviews, we have a set of information and performance
requirements that the data mart application must meet. We should prioritize
the needs and then draw up a list of success criteria.
Following are some guidelines that help in designing the data mart:-
Data mart design should be driven entirely by the needs of the end users. End
users basically want to perform analysis and look at aggregated data, rather
than individual transactions. A well-planned design should allow changes, if
required, to be made easily and should support growth. This section comprises
the following topics:
* Creating a logical design
* Creating a wish list of data
* Identifying sources
* Classifying data for the data mart schema
* Designing the star schema
* Moving from logical to physical design
Creating a logical design:-
The process of logical design involves arranging data into a series of logical
relationships called entities and attributes.
Entity:- An entity represents a set of information. In an RDBMS, an entity
often corresponds to a table.
Attribute:- It is a component of an entity and helps define the uniqueness of
the entity. An attribute corresponds to a column.
The physical implementation of the logical data mart model may require some
changes due to system parameters such as the size of the computer, number of
users, storage capacity and software. The decisions that should be taken when
we develop the logical design are:
* Facts and dimensions
* Relationships between the entities
* Duration of data
Creating a wish list of data:-
The wish list of data elements is generated from the user requirements. The
scope of the data mart is fully specified by the users: if the users' needs
are properly considered, the scope of the data mart is wide, and if the needs
are not considered properly, the scope is limited.
After taking all the requirements provided by the users and considering all
the factors involved, we should have the following details with us:
- A list of data elements, in both raw form and calculated form.
- Attributes of the data, such as character or numeric data types.
- Grouping of the data, such as geographical grouping for the elements.
- An idea of the relationships between the data, such as a city being within a
  country (a region having its own local government).
Users may also provide us with reports. These reports give an idea of the
users' requirements. Reports act as a good medium for learning the needs of
the users and then developing a data mart that suits them.
Identifying sources:-
After having the list of dimensions and facts that are required for the data
mart, the next issue is how to collect the data. Data sources range from
operational systems to spreadsheets. Typically:
- A large percentage of the data comes from one or two sources.
- If facts are in raw form, they are associated with transactional tables.
- The transactional data is first aggregated and then used in the data mart.
- The data is aggregated on the basis of granularity. Granularity is the
  lowest level of information that the user might want.
Example:
In a telecommunication company, calls can be aggregated easily by area code.
However, suppose the data mart needs data by postal code. Because an area code
contains multiple postal codes and one postal code may span multiple area
codes, it is difficult to derive postal-code totals from data that has already
been aggregated by area code.
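A minimal sketch of the granularity point in this example, using invented call records (the area codes, postal codes and minutes are made up). It assumes the calls are kept at the finest grain, one row per call, so that they can be aggregated by either column.

```python
from collections import Counter

# Invented call records at the finest grain: one row per call.
calls = [
    {"area_code": "212", "postal_code": "10001", "minutes": 5},
    {"area_code": "212", "postal_code": "10002", "minutes": 7},
    {"area_code": "646", "postal_code": "10001", "minutes": 3},
]


def total_minutes(records, key):
    """Aggregate call minutes by the given grouping column."""
    totals = Counter()
    for row in records:
        totals[row[key]] += row["minutes"]
    return dict(totals)


by_area = total_minutes(calls, "area_code")
by_postal = total_minutes(calls, "postal_code")
```

Here postal code 10001 receives minutes from both area codes, so `by_postal` cannot be reconstructed from `by_area` alone; it has to be computed from the detailed rows.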
Classifying data for the data mart schema:-
A common representation of facts, dimensions and the relationships between
them in a data mart application is the star schema. It is called a star schema
because the graph representation looks like a star, with a large fact table in
the center and the smaller dimension tables arranged around it.
This section contains the following topics:
* Dimensions
* Facts
* Granularity
Dimensions:-
Dimensions are the entities with respect to which an organization wants to
keep records. The big design issue is to decide when a field is just an item
in a dimension and when it should have its own dimension. In the star schema,
the sequence of the dimension tables does not matter as long as they are
created before the fact table. Hence, all the dimension tables are created
first.
Facts:-
Facts are the numeric metrics of the business. They support mathematical
calculations used to report on and analyze the business. Database size and
performance improve if borderline fields are categorized as dimensions.
Granularity:-
When we define the facts and dimensions, we determine the appropriate
granularity (level of information) for the data in the data mart. We need to
estimate the requirements to achieve the desired level of granularity and
decide whether or not we can support it.
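The fact and dimension tables described in this section can be sketched as follows. This is a minimal sketch using Python's built-in `sqlite3` module; the table names (`product_dim`, `region_dim`, `sales_fact`) and the sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Dimension tables are created first, in any order.
conn.execute(
    "CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, product_name TEXT)"
)
conn.execute(
    "CREATE TABLE region_dim (region_key INTEGER PRIMARY KEY, region_name TEXT)"
)

# The fact table references the dimensions; following the text, it is created last.
conn.execute("""
    CREATE TABLE sales_fact (
        product_key INTEGER REFERENCES product_dim(product_key),
        region_key  INTEGER REFERENCES region_dim(region_key),
        amount      REAL,
        PRIMARY KEY (product_key, region_key)
    )
""")

conn.executemany("INSERT INTO product_dim VALUES (?, ?)", [(1, "Widget"), (2, "Gadget")])
conn.executemany("INSERT INTO region_dim VALUES (?, ?)", [(1, "North"), (2, "South")])
conn.executemany(
    "INSERT INTO sales_fact VALUES (?, ?, ?)",
    [(1, 1, 100.0), (1, 2, 50.0), (2, 1, 25.0)],
)

# Aggregate the numeric fact by a dimension attribute.
sales_by_region = conn.execute("""
    SELECT r.region_name, SUM(f.amount)
    FROM sales_fact AS f
    JOIN region_dim AS r ON r.region_key = f.region_key
    GROUP BY r.region_name
    ORDER BY r.region_name
""").fetchall()
```

The query shows the typical usage pattern: end users look at the fact (`amount`) aggregated by a dimension (`region_name`) rather than at individual rows.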
Designing the star schema:-
After having a list of all facts, dimensions and the desired level of
granularity, we are ready to create the star schema. Following are some things
to keep in mind while making a star schema:
* The relationships between the fact and dimension tables are defined using
  keys.
* The primary key of the fact table can consist of several columns; such a key
  is called a composite key.
* It is a good idea to use system-generated keys (synthetic keys) in place of
  natural keys to link the facts and the dimensions.
* A synthetic key is a generated sequence of integers. Synthetic keys are
  stored in the dimension table in addition to the natural key.
Benefits of synthetic keys over natural keys:-
1. Natural keys are often long character strings. Because synthetic keys are
integers, response time to queries is improved.
2. The data mart administrator has control over the synthetic key. If a
manufacturing group changes the product code naming conventions, the changes
do not affect the structure of the data mart.
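The second benefit can be illustrated with a short sketch, again using Python's built-in `sqlite3` module. The table names and product codes are hypothetical; `product_key` plays the role of the synthetic key and `product_code` the natural key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# product_key is the synthetic key (an integer generated by the database);
# product_code is the natural key supplied by the source system.
conn.execute("""
    CREATE TABLE product_dim (
        product_key  INTEGER PRIMARY KEY AUTOINCREMENT,
        product_code TEXT UNIQUE,
        product_name TEXT
    )
""")
conn.execute("CREATE TABLE sales_fact (product_key INTEGER, amount REAL)")

cur = conn.execute(
    "INSERT INTO product_dim (product_code, product_name) VALUES (?, ?)",
    ("MFG-000123-WIDGET", "Widget"),
)
widget_key = cur.lastrowid
conn.execute("INSERT INTO sales_fact VALUES (?, ?)", (widget_key, 100.0))

# The naming convention changes: only the dimension row is updated,
# and the fact rows keep pointing at the same integer key.
conn.execute(
    "UPDATE product_dim SET product_code = ? WHERE product_key = ?",
    ("W-123", widget_key),
)

row = conn.execute("""
    SELECT p.product_code, f.amount
    FROM sales_fact AS f
    JOIN product_dim AS p ON p.product_key = f.product_key
""").fetchone()
```

Because the fact table stores only the integer key, renaming the product code touches a single dimension row and leaves the fact table untouched.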
Moving from logical to physical design:-
During the physical design process, the data gathered during the logical
design phase is converted into a description of the physical database,
including tables and constraints.
Physical design decisions have a huge impact on query performance.
Scalability, the ability to increase the volume of data and number of users,
is an important consideration when we move from logical design to physical
design. Scalability can be improved by minimizing the limitations imposed by
factors such as hardware, software and bandwidth.
Steps in implementing a data mart:-
The major steps in implementing a data mart are:
1. Designing the schema
2. Constructing the physical storage
3. Populating the data mart
4. Accessing it to make decisions
5. Managing it over time
Designing:-
The design step is first in the data mart process. The design step involves
the following tasks:
1. Gathering the business and technical requirements.
2. Identifying the data sources.
3. Selecting the appropriate subset of data.
4. Designing the logical and physical structure of the data mart.
Constructing:-
This step deals with the creation of the physical database and the logical
structures that give fast and efficient access to the data. Following are the
tasks performed in this step:
1. Creating the physical database and storage structures.
2. Creating the schema objects, such as the tables defined in the design step.
3. Determining how to set up the tables and structures in the best possible
way.
Populating:-
This step covers all the tasks related to getting the data from the source,
cleaning it up and moving it into the data mart. This step involves the
following tasks:
1. Mapping data sources to appropriate data structures.
2. Extracting data.
3. Cleansing and transforming the data.
4. Loading data into the data mart.
5. Creating and storing metadata.
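The populating tasks can be sketched as a small extract, cleanse/transform and load pipeline. This is only an illustration under stated assumptions: the source rows, the `sales` and `load_metadata` tables and the cleansing rules are all invented, and a real load program would read from operational systems rather than an in-memory list.

```python
import sqlite3

# Invented source rows, as they might arrive from an operational system.
source_rows = [
    {"product": " widget ", "amount": "100.50"},
    {"product": "GADGET", "amount": "25"},
    {"product": "", "amount": "n/a"},  # unusable row
]


def extract():
    """Extract: read rows from the source (here, an in-memory list)."""
    return list(source_rows)


def cleanse_and_transform(rows):
    """Cleanse and transform: drop bad rows, normalize names and types."""
    cleaned = []
    for row in rows:
        name = row["product"].strip().title()
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip rows whose amount is not numeric
        if name:
            cleaned.append((name, amount))
    return cleaned


def load(conn, rows):
    """Load: write the cleaned rows and record simple load metadata."""
    conn.execute("CREATE TABLE sales (product TEXT, amount REAL)")
    conn.execute("CREATE TABLE load_metadata (target TEXT, rows_loaded INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.execute("INSERT INTO load_metadata VALUES (?, ?)", ("sales", len(rows)))


conn = sqlite3.connect(":memory:")
load(conn, cleanse_and_transform(extract()))
loaded = conn.execute("SELECT product, amount FROM sales ORDER BY product").fetchall()
```

Recording the row count in `load_metadata` is one simple way to cover the last task, creating and storing metadata, as part of the load itself.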
Accessing:-
This step involves:
1. Putting the data to use.
2. Querying the data.
3. Analyzing it.
4. Creating reports.
5. Creating charts and graphs.
This step involves the following tasks:
* Set up an intermediate layer for the front-end tool to use.
* Translate database structures and object names into business terms.
* Manage and maintain these business interfaces.
* Set up and manage database structures.
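One possible way to provide such an intermediate layer is a database view that exposes business names over the physical columns. The sketch below assumes this approach; the cryptic table `sls_fct`, the view `sales_report` and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Physical table with cryptic column names (hypothetical).
conn.execute("CREATE TABLE sls_fct (prd_nm TEXT, amt REAL)")
conn.executemany(
    "INSERT INTO sls_fct VALUES (?, ?)",
    [("Widget", 100.0), ("Widget", 50.0), ("Gadget", 25.0)],
)

# The view acts as the intermediate layer: the front-end tool sees
# business terms instead of the physical column names.
conn.execute("""
    CREATE VIEW sales_report AS
    SELECT prd_nm AS product_name, SUM(amt) AS total_sales
    FROM sls_fct
    GROUP BY prd_nm
""")

report = conn.execute(
    "SELECT product_name, total_sales FROM sales_report ORDER BY product_name"
).fetchall()
```

A reporting tool querying `sales_report` does not need to know the physical table layout, so the underlying structures can be changed as long as the view is maintained.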
Managing:-
This step comprises managing the data mart over its lifetime. This step
involves the following tasks:
1. Providing secure access to the data.
2. Managing the growth of the data.
3. Optimizing the system for better performance.
4. Ensuring the availability of data even with system failures.