CHAPTER 5
DEFINING THE BUSINESS REQUIREMENTS
CHAPTER OBJECTIVES
Discuss how and why defining requirements is different for a data
warehouse
Understand the role of business dimensions
Learn about information packages and their use in defining
requirements
Review methods for gathering requirements
Grasp the significance of a formal requirements definition document
A data warehouse is an information delivery system for business
intelligence. It is not about technology, but about solving users' problems
and providing strategic information to the user. In the phase of defining
requirements, you need to concentrate on what information the users need,
not so much on how you are going to provide the required information. The
actual methods for providing information will come later, not while you are
collecting requirements.
Before we proceed, let us clarify the scope and content of this chapter. In
this chapter, we will focus on determining requirements for what was
referred to as the practical approach in an earlier chapter. As we discussed
in that chapter, we plan for an enterprise-wide solution; then we gather
requirements for each data mart, subject by subject. These requirements
will enable us to implement conformed data marts, one by one, on a
priority basis, eventually to cover the needs of the entire enterprise.
On the other hand, if the goal is to take a strictly top-down approach and
build an enterprise-wide data warehouse first, determining the
requirements will be different from what is discussed in this chapter. With
the purely top-down approach, the data warehouse will be developed based
on the third normal form relational data model. This relational database
will form the data warehouse. In turn, this enterprise data warehouse will
feed the dependent data marts. However, these dependent data marts will
be developed based on the requirements determination elaborated on in
this chapter.
Now, on with our discussion of requirements determination for the data
marts: the outcome of either the practical approach or the derived
products of the purely top-down approach.
Most of the developers of data warehouses come from a background of
developing operational or OLTP (online transaction processing) systems.
OLTP systems are primarily data capture systems. On the other hand, data
warehouse systems are information delivery systems. When you begin to
collect requirements for your proposed data warehouse, your mindset will
have to be different. You have to go from a data capture model to an
information delivery model. This difference will have to show through all
phases of the data warehouse project.
The users also have a different perspective about a data warehouse system.
Unlike an OLTP system, which is needed to run the day-to-day business, no
immediate payout is seen in a decision support system. The users do not
immediately perceive a compelling need to use a decision support system,
whereas they cannot refrain from using an operational system, without
which they cannot run their business.
DIMENSIONAL ANALYSIS
In several ways, building a data warehouse is very different from building
an operational system. This becomes notable especially in the requirements
gathering phase. Because of this difference, the traditional methods of
collecting requirements that work well for operational systems cannot be
directly applied to data warehouses.
USAGE OF INFORMATION UNPREDICTABLE
Let us imagine you are building an operational system for order processing
in your company. For gathering requirements, you interview the users in
the order processing department. The users will list all the functions that
need to be performed. They will inform you how they receive the orders,
check stock, verify customers' credit arrangements, price the order,
determine the shipping arrangements, and route the order to the
appropriate warehouse. They will show you how they would like the various
data elements to be presented on the GUI (graphical user interface) screen
for the application. The users will also give you a list of reports they would
need from the order processing application. They will be able to let you
know how, when, and where they would use the application daily.
In providing information about the requirements for an operational system,
the users are able to give you precise details of the required functions,
information content, and usage patterns. In striking contrast, for a data
warehousing system, the users are generally unable to define their
requirements clearly. They cannot define precisely what information they
really want from the data warehouse, nor can they express how they would
like to use the information or process it.
For most of the users, this could be the very first data warehouse they are
being exposed to. The users are familiar with operational systems because
they use these in their daily work, so they are able to visualize the
requirements for other new operational systems. They cannot relate a data
warehouse system to anything they have used before.
If, therefore, the whole process of defining requirements for a data
warehouse is so nebulous, how can you proceed as one of the analysts in the
data warehouse project? You are in a quandary. To be on the safe side, do
you then include every piece of data in the data warehouse you think the
users will be able to use? How can you build something the users are unable
to define clearly and precisely?
Initially, you may collect data on the overall business of the organization.
You may check on the industry's best practices. You may gather some
business rules guiding the day-to-day decision making. You may find out
how products are developed and marketed. But these are generalities and
are not sufficient to determine detailed requirements.
DIMENSIONAL NATURE OF BUSINESS DATA
Fortunately, the situation is not as hopeless as it seems. Even though the
users cannot fully describe what they want in a data warehouse, they can
provide you with very important insights into how they think about the
business. They can tell you what measurement units are important for
them. Each user department can let you know how they measure success in
that particular department. The users can give you insights into how they
combine the various pieces of information for strategic decision making.
Managers think of the business in terms of business dimensions. Figure 5-1
shows the kinds of questions managers are likely to ask for decision
making. The figure shows what questions a typical marketing vice
president, a marketing manager, and a financial controller may ask.
Let us briefly examine these questions. The marketing vice president is
interested in the revenue generated by her new product, but she is not
interested in a single number. She is interested in the revenue numbers by
month, in a certain division, by customer demographics, by sales office,
relative to the previous product version, and compared to plan. So the
marketing vice president wants the revenue numbers broken down by
month, division, customer demographics, sales office, product version, and
plan. These are her business dimensions along which she wants to analyze
her numbers.
Figure 5-1 Managers think in business dimensions.
Figure 5-2 Dimensional nature of business data.
Similarly, for the marketing manager, his business dimensions are product,
product category, time (day, week, month), sales district, and distribution
channel. For the financial controller, the business dimensions are budget
line, time (month, quarter, year), district, and division.
If your users of the data warehouse think in terms of business dimensions
for decision making, you should also think of business dimensions while
collecting requirements. Although the actual proposed usage of a data
warehouse could be unclear, the business dimensions used by the managers
for decision making are not nebulous at all. The users will be able to
describe these business dimensions to you. You are not totally lost in the
process of requirements definition. You can find out about the business
dimensions.
Let us try to get a good grasp of the dimensional nature of business
data. Figure 5-2 shows the analysis of sales units along the three business
dimensions of product, time, and geography. These three dimensions are
plotted against three axes of coordinates. You will see that the three
dimensions form a collection of cubes. In each of the small dimensional
cubes, you will find the sales units for that particular slice of time, product,
and geographical division. In this case, the business data of sales units is
three dimensional because there are just three dimensions used in this
analysis. If there are more than three dimensions, we extend the concept to
multiple dimensions and visualize multidimensional cubes, also called
hypercubes.
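The cube of sales units along product, time, and geography can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample figures, the dictionary name sales_units, and the helper roll_up are hypothetical and do not come from the figure. Each key of the dictionary identifies one small cube, and summing over the dimensions you leave out gives the totals along the dimensions you keep.

```python
from collections import defaultdict

# Hypothetical sales-unit facts: (product, time, geography) -> units sold.
# Each key identifies one small cube in the three-dimensional picture.
sales_units = {
    ("widget", "2024-01", "east"): 120,
    ("widget", "2024-01", "west"): 80,
    ("widget", "2024-02", "east"): 100,
    ("gadget", "2024-01", "east"): 50,
    ("gadget", "2024-02", "west"): 70,
}

DIMENSIONS = ("product", "time", "geography")


def roll_up(cube, keep):
    """Sum the cells of the cube over every dimension not listed in keep."""
    positions = [DIMENSIONS.index(name) for name in keep]
    totals = defaultdict(int)
    for cell, units in cube.items():
        key = tuple(cell[p] for p in positions)
        totals[key] += units
    return dict(totals)


# Sales units by product only, then by time and geography together.
by_product = roll_up(sales_units, ["product"])
by_time_and_geography = roll_up(sales_units, ["time", "geography"])
print(by_product)
print(by_time_and_geography)
```

Adding a fourth dimension would only lengthen the key tuple and the DIMENSIONS list, which is one way to picture the step from a cube to a hypercube.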
EXAMPLES OF BUSINESS DIMENSIONS
The concept of business dimensions is fundamental to the requirements
definition for a data warehouse. Therefore, we want to look at some more
examples of business dimensions in a few other cases. Figure 5-3 displays
the business dimensions in four different cases.
Let us quickly look at each of these examples. For the supermarket chain,
the measurements that are analyzed are the sales units. These are analyzed
along four business dimensions. When you are looking for the hypercubes,
the sides of such cubes are time, promotion, product, and store. If you are
the marketing manager for the supermarket chain, you would want your
sales broken down by product, at each store, in time sequence, and in
relation to the promotions that take place.
Figure 5-3 Examples of business dimensions.
For the insurance company, the business dimensions are different and
appropriate for that business. Here you would want to analyze the claims
data by agent, individual claim, time, insured party, individual policy, and
status of the claim. The example of the airlines company shows the
dimensions for analysis of frequent flyer data. Here the business
dimensions are time, customer, specific flight, fare class, airport, and
frequent flyer status.
The example analyzing shipments for a manufacturing company shows
some other business dimensions. In this case, the business dimensions
used for the analysis of shipments are the ones relevant to that business
and the subject of the analysis. Here you see the dimensions of time, ship-to
and ship-from locations, shipping mode, product, and any special deals.
What we find from these examples is that the business dimensions are
different and relevant to the industry and to the subject for analysis. We
also find the time dimension to be a common dimension in all examples.
Almost all business analyses are performed over time.
INFORMATION PACKAGES–A USEFUL CONCEPT
We will now introduce a novel idea for determining and recording
information requirements for a data warehouse. This concept helps us to
give a concrete form to the various insights, nebulous thoughts, and
opinions expressed during the process of collecting requirements. The
information packages, put together while collecting requirements, are very
useful for taking the development of the data warehouse to the next phases.
REQUIREMENTS NOT FULLY DETERMINATE
As we have discussed, the users are unable to describe fully what they
expect to see in the data warehouse. You are unable to get a handle on what
pieces of information you want to keep in the data warehouse. You are
unsure of the usage patterns. You cannot determine how each class of users
will use the new system. So, when requirements cannot be fully
determined, we need a new and innovative concept to gather and record the
requirements. The traditional methods applicable to operational systems
are not adequate in this context. We cannot start with the functions,
screens, and reports. We cannot begin with the data structures. We have
noted that the users tend to think in terms of business dimensions and
analyze measurements along such business dimensions. This is a significant
observation and can form the very basis for gathering information.
The new methodology for determining requirements for a data warehouse
system is based on business dimensions. It flows out of the need of the
users to base their analysis on business dimensions. The new concept
incorporates the basic measurements and the business dimensions along
which the users analyze these basic measurements. Using the new
methodology, you come up with the measurements and the relevant
dimensions that must be captured and kept in the data warehouse. You
come up with what is known as an information package for the specific
subject.
Let us look at an information package for analyzing sales for a certain
business. Figure 5-4 contains such an information package. The subject
here is sales. The measured facts or the measurements that are of interest
for analysis are shown in the bottom section of the package diagram. In this
case, the measurements are actual sales, forecast sales, and budget sales.
The business dimensions along which these measurements are to be
analyzed are shown at the top of the diagram as column headings. In our
example, these dimensions are time, location, product, and demographic
age group. Each of these business dimensions contains a hierarchy or
levels. For example, the time dimension has the hierarchy going from year
down to the level of individual day. The other intermediary levels in the
time dimension could be quarter, month, and week. These levels or
hierarchical components are shown in the information package diagram.
Figure 5-4 An information package.
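The parts of the sales information package described above (a subject, the business dimensions with their hierarchy levels, and the measurements) can be written down as a small data structure. The following is a minimal sketch in Python, not a prescribed format: the class names Dimension and InformationPackage are hypothetical. Only the time hierarchy is spelled out in the discussion, so the levels of the other dimensions are left empty here rather than guessed.

```python
from dataclasses import dataclass, field


@dataclass
class Dimension:
    """One business dimension (hypothetical helper class)."""
    name: str
    # Levels ordered from most summarized to most detailed.
    hierarchy: list = field(default_factory=list)


@dataclass
class InformationPackage:
    """Subject, dimensions, and measurements (hypothetical helper class)."""
    subject: str
    dimensions: list = field(default_factory=list)
    measurements: list = field(default_factory=list)

    def dimension_names(self):
        return [dimension.name for dimension in self.dimensions]


sales_package = InformationPackage(
    subject="Sales",
    dimensions=[
        Dimension("Time", ["year", "quarter", "month", "week", "day"]),
        # Levels for the remaining dimensions appear only in the figure,
        # so they are deliberately left empty in this sketch.
        Dimension("Location"),
        Dimension("Product"),
        Dimension("Demographic age group"),
    ],
    measurements=["actual sales", "forecast sales", "budget sales"],
)

print(sales_package.dimension_names())
print(sales_package.measurements)
```

Keeping the hierarchy as an ordered list preserves the drill-down order from year to individual day.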
Your primary goal in the requirements definition phase is to compile
information packages for all the subjects for the data warehouse. Once you
have firmed up the information packages, you'll be able to proceed to the
other phases.
Essentially, information packages enable you to:
Define the common subject areas
Design key business metrics
Decide how data must be presented
Determine how users will aggregate or roll up
Decide the data quantity for user analysis or query
Decide how data will be accessed
Establish data granularity
Estimate data warehouse size
Determine the frequency for data refreshing
Ascertain how information must be packaged
BUSINESS DIMENSIONS
As we have seen, business dimensions form the underlying basis of the new
methodology for requirements definition. Data must be stored to provide
for the business dimensions. The business dimensions and their
hierarchical levels form the basis for all further development phases. So we
want to take a closer look at business dimensions. We should be able to
identify business dimensions and their hierarchical levels. We must be able
to choose the proper and optimal set of dimensions related to the
measurements.
We begin by examining the business dimensions for an automobile
manufacturer. Let us say that the goal is to analyze sales. We want to build
a data warehouse that will allow the user to analyze automobile sales in a
number of ways. The first obvious dimension is the product dimension.
Again, for the automaker, analysis of sales must include analysis by
breaking the sales down by dealers. Dealer, therefore, is another important
dimension for analysis. As an automaker, you would want to know how
your sales break down along customer demographics. You would want to
know who is buying your automobiles and in what quantities. Customer
demographics would be another useful business dimension for analysis.
How do the customers pay for the automobiles? What effect does financing
for the purchases have on the sales? These questions can be answered by
including the method of payment as another dimension for analysis. What
about time as a business dimension? Almost every query or analysis
involves the time element. In summary, we have come up with the
following dimensions for the subject of sales for an automaker: product,
dealer, customer demographic, method of payment, and time.
Let us take one more example. In this case, we want to come up with an
information package for a hotel chain. The subject in this case is hotel
occupancy. We want to analyze occupancy of the rooms in the various
branches of the hotel chain. We want to analyze the occupancy by
individual hotels and by room types. So, hotel and room type are critical
business dimensions for the analysis. As in the other case, we also need to
include the time dimension. In the hotel occupancy information package,
the dimensions to be included are hotel, room type, and time.
DIMENSION HIERARCHIES AND CATEGORIES
When a user analyzes the measurements along a business dimension, the
user usually would like to see the numbers first in summary and then at
various levels of detail. What the user does here is to traverse the
hierarchical levels of a business dimension for getting the details at various
levels. For example, the user first sees the total sales for the entire year.
Then the user moves down to the level of quarters and looks at the sales by
individual quarters. After this, the user moves down further to the level of
individual months to look at monthly numbers. What we notice here is that
the hierarchy of the time dimension consists of the levels of year, quarter,
and month. The dimension hierarchies are the paths for drilling down or
rolling up in our analysis.
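The drill-down path from year to quarter to month can be illustrated with a short Python sketch. The sample records and the function name totals_at_level are hypothetical, chosen only to show the idea: the same monthly figures are totaled at whichever level of the time hierarchy the user asks for.

```python
from collections import defaultdict

# Hypothetical monthly sales records: (year, quarter, month, sales amount).
monthly_sales = [
    (2024, "Q1", "Jan", 100),
    (2024, "Q1", "Feb", 110),
    (2024, "Q1", "Mar", 90),
    (2024, "Q2", "Apr", 120),
    (2024, "Q2", "May", 130),
    (2024, "Q2", "Jun", 150),
]

# Levels of the time hierarchy, from most summarized to most detailed.
TIME_HIERARCHY = ("year", "quarter", "month")


def totals_at_level(records, level):
    """Total the sales at one level of the time hierarchy."""
    depth = TIME_HIERARCHY.index(level) + 1
    totals = defaultdict(int)
    for record in records:
        key = record[:depth]        # e.g. (2024, "Q1") at the quarter level
        totals[key] += record[-1]   # the last field is the sales amount
    return dict(totals)


# Drill down: the whole year first, then quarters, then months.
for level in TIME_HIERARCHY:
    print(level, totals_at_level(monthly_sales, level))
```

Rolling up is the same traversal in the opposite direction: moving from the month level back to the quarter or year level.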
Within each major business dimension there are categories of data
elements that can also be useful for analysis. In the time dimension, you
may have a data element to indicate whether a particular day is a holiday.
This data element would enable you to analyze by holidays and see how
sales on holidays compare with sales on other days. Similarly, in the
product dimension, you may want to analyze by type of package. The
package type is one such data element within the product dimension. The
holiday flag in the time dimension and the package type in the product
dimension do not necessarily indicate hierarchical levels in these
dimensions. Such data elements within the business dimension may be
called categories.
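A category such as the holiday flag groups the measurements without being a level in any hierarchy. As a rough sketch, assuming hypothetical daily rows and a hypothetical helper named average_sales_by_flag, the comparison of holiday sales with sales on other days might look like this in Python:

```python
# Hypothetical daily rows: the holiday flag is a category in the time dimension.
daily_sales = [
    {"date": "2024-07-03", "holiday": False, "sales": 200},
    {"date": "2024-07-04", "holiday": True, "sales": 350},
    {"date": "2024-07-05", "holiday": False, "sales": 220},
    {"date": "2024-12-25", "holiday": True, "sales": 150},
]


def average_sales_by_flag(rows, flag):
    """Average the sales for each value of a category flag."""
    groups = {}
    for row in rows:
        groups.setdefault(row[flag], []).append(row["sales"])
    return {value: sum(amounts) / len(amounts) for value, amounts in groups.items()}


holiday_comparison = average_sales_by_flag(daily_sales, "holiday")
print(holiday_comparison)
```

The same helper would work for any other category, such as package type in the product dimension, provided the rows carry that data element.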
Hierarchies and categories are included in the information packages for
each dimension. Let us go back to the two examples in the previous section
and find out which hierarchical levels and categories must be included for
the dimensions. Let us examine the product dimension. Here, the product
is the basic automobile. Therefore, we include the data elements relevant to
product as hierarchies and categories. These would be model name, model
year, package styling, product line, product category, exterior color, interior
color, and first model year. Looking at the other business dimensions for
the auto sales analysis, we summarize the hierarchies and categories for
each dimension as follows:
Product: Model name, model year, package styling, product line, product category,
exterior color, interior color, first model year
Dealer: Dealer name, city, state, single brand flag, date of first operation
Customer Demographics: Age, gender, income range, marital status, household
size, vehicles owned, home value, own or rent
Payment Method: Finance type, term in months, interest rate, agent
Time: Date, month, quarter, year, day of week, day of month, season, holiday flag
Let us go back to the hotel occupancy analysis. We have included three
business dimensions. Let us list the possible hierarchies and categories for
the three dimensions.
Hotel: Hotel line, branch name, branch code, region, address, city, state, zip code,
manager, construction year, renovation year
Room Type: Room type, room size, number of beds, type of bed, maximum
occupants, suite, refrigerator, kitchenette
Time: Date, day of month, day of week, month, quarter, year, holiday flag
KEY BUSINESS METRICS OR FACTS
So far we have discussed the business dimensions in the above two
examples. These are the business dimensions relevant to the users of these
two data marts for performing analysis. The respective users think of their
business subjects in terms of these business dimensions for obtaining
information and for doing analysis.
But using these business dimensions, what exactly are the users analyzing?
What numbers are they analyzing? The numbers the users analyze are the
measurements or metrics that measure the success of their departments.
These are the facts that indicate to the users how their departments are
doing in fulfilling their departmental objectives.
In the case of the automaker, these metrics relate to the sales. These are the
numbers that tell the users about their performance in sales. These are
numbers about the sale of each individual automobile. The set of
meaningful and useful metrics for analyzing automobile sales is as follows:
Actual sale price
MSRP
Options price
Full price
Dealer add-ons
Dealer credits
Dealer invoice
Amount of down payment
Manufacturer proceeds
Amount financed
In the second example of hotel occupancy, the numbers or metrics are
different. The nature of the metrics depends on what is being analyzed. For
hotel occupancy, the metrics would therefore relate to the occupancy of
rooms in each branch of the hotel chain. Here is a list of metrics for
analyzing hotel occupancy:
Occupied rooms
Vacant rooms
Unavailable rooms
Number of occupants
Revenue
Now putting it all together, let us discuss what goes into the information
package diagrams for these two examples. In each case, the metrics or facts
go into the bottom section of the information package. The business
dimensions will be the column headings. In each column, you will include
the hierarchies and categories for the business dimensions.
Figures 5-5 and 5-6 show the information packages for the two examples
we just discussed.
Figure 5-5 Information package: automaker sales.
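The layout rule just described (dimensions as column headings, hierarchies and categories listed under each heading, metrics in the bottom section) can be sketched as a small text renderer. The Python below is only an illustration for the hotel occupancy example; the function name render_information_package and the exact text layout are assumptions, not a reproduction of the figure. The dimension entries and metrics are the ones listed earlier in this chapter.

```python
from itertools import zip_longest

# Hierarchies and categories for the hotel occupancy example.
hotel_dimensions = {
    "Hotel": ["Hotel line", "Branch name", "Branch code", "Region", "Address",
              "City", "State", "Zip code", "Manager", "Construction year",
              "Renovation year"],
    "Room Type": ["Room type", "Room size", "Number of beds", "Type of bed",
                  "Maximum occupants", "Suite", "Refrigerator", "Kitchenette"],
    "Time": ["Date", "Day of month", "Day of week", "Month", "Quarter",
             "Year", "Holiday flag"],
}
hotel_metrics = ["Occupied rooms", "Vacant rooms", "Unavailable rooms",
                 "Number of occupants", "Revenue"]


def render_information_package(subject, dimensions, metrics):
    """Lay out dimensions as columns and metrics as the bottom section."""
    labels = list(dimensions) + [e for entries in dimensions.values() for e in entries]
    width = max(len(label) for label in labels) + 2
    lines = [f"Information package: {subject}"]
    # Column headings: one per business dimension.
    lines.append("".join(name.ljust(width) for name in dimensions))
    # Rows: hierarchies and categories, padded where a column is shorter.
    for row in zip_longest(*dimensions.values(), fillvalue=""):
        lines.append("".join(entry.ljust(width) for entry in row))
    # Bottom section: the metrics or facts.
    lines.append("Metrics: " + ", ".join(metrics))
    return "\n".join(lines)


package_text = render_information_package("Hotel occupancy", hotel_dimensions, hotel_metrics)
print(package_text)
```

The automaker sales package could be rendered the same way by passing its five dimensions and ten metrics.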
REQUIREMENTS GATHERING METHODS
Now that we have a way of formalizing requirements definition through
information package diagrams, let us discuss the methods for gathering
requirements. Remember that a data warehouse is an information delivery
system for providing information for strategic decision making. It is not a
system for running the day-to-day business. Who are the users that can
make use of the information in the data warehouse? Where do you go for
getting the requirements?
Broadly, we can classify the users of the data warehouse as follows:
Senior executives (including the sponsors)
Key departmental managers
Business analysts
Operational system database administrators (DBAs)
Others nominated by the above
Executives will give you a sense of direction and scope for your data
warehouse. They are the ones closely involved in the focused area. The key
departmental managers are the ones who report to the executives in the
area of focus. Business analysts are the ones who prepare reports and
analyses for the executives and managers. The operational system DBAs
and IT applications staff will give you information about the data sources
for the warehouse.
What requirements do you need to gather? Here is a broad list:
Data elements: fact classes, dimensions
Recording of data in terms of time
Data extracts from source systems
Business rules: attributes, ranges, domains, operational records
You will have to go to different groups of people in the various departments
to gather the requirements. Three basic techniques are universally adopted
for obtaining information from groups of people: (1) interviews, one-on-one
or in small groups; (2) joint application development (JAD) sessions; (3)
questionnaires.
One-on-one interviews are the most interactive of the three methods. JAD
or group sessions are a little less interactive. Questionnaires are not
interactive. An interactive method has several advantages in getting
confirmed information.
A few thoughts about these three basic approaches follow.
Interviews
Two or three persons at a time
Easy to schedule
Good approach when details are intricate
Some users are comfortable only with one-on-one interviews
Need good preparation to be effective
Always conduct pre-interview research
Establish objectives for each interview
Decide on the question types
Also encourage users to prepare for the interview
Group Sessions
Groups of 20 or fewer persons at a time
Use only after getting a baseline understanding of the requirements
Not good for initial data gathering
Use when free flow of ideas is essential
Useful for confirming requirements
Efficient when users are scattered across locations
Need to be very well organized
Questionnaires
Can gather lots of requirements quickly
Useful when people to be questioned are widely dispersed
Good in exploration phase to get overall reactions
May be used for people whose work schedule is too tight for
interviews
However, questionnaires do not permit interactive responses like
interviews
TYPES OF QUESTIONS
In all these three different methods of gathering requirements we use
questions to elicit information. It is, therefore, very important to
understand the types of questions that may be used and the effectiveness of
each.
Open-Ended Questions. These open up options for interviewees to respond. The
benefits of using open-ended questions are that they put interviewees at ease, allow
insights into values and beliefs, provide exposure to interviewees' vocabulary,
open up opportunities for more questioning, and are interesting and spontaneous.
Drawbacks are that they could result in too much unnecessary detail, they carry
the risk of losing control of the interview, and they may take too much time, not
proportional to the information gathered.
Closed Questions. These allow limited responses to interviewees. Some closed
questions are bipolar in the sense that these look for “Yes or No” type answers.
Closed questions enable you to save time and get to the point quickly and easily.
Closed questions allow for interviews to be compared, provide control over the
interview and the ability to cover a lot of ground quickly, and are likely to gather
only the relevant information. Drawbacks are the inability to get rich details, less
chance for building trust and rapport between interviewer and interviewee, and
they may become boring and dull.
Probes. These are really follow-up questions. Probes may be used after open-ended
or closed questions. The intention would be to go beyond the initial questions and
answers. Probes are useful in drawing out an interviewee's point of view.
ARRANGEMENT OF QUESTIONS
For information gathering, using the proper types of questions alone is not
enough. You must be able to arrange the questioning in an effective
sequence to suit the audience and the purpose. The following structures for
arranging questions are used in practice.
Pyramid Structure. This is an inductive method of arranging the questions. You
begin with very specific closed questions and then expand the topics with open-
ended questions. This structure is useful when the interviewee needs to warm up to
the topics being discussed. Use this structure when general views of the topics are
to be extracted at the end.
Funnel Structure. This is a deductive method. Begin with general open-ended
questions and then narrow the topics with specific, closed questions. This structure
is useful when the interviewee is emotional about the topics under discussion. Use
this structure when gradual levels of details are needed at the end.
Diamond-Shaped Structure. In this case, you warm up the interview with specific
closed questions. You then proceed towards broad, general, open-ended questions.
Finally you narrow the interview and achieve closure with specific closed
questions. Usually, this structure is better than the other two. However, this
structure may lengthen the interview.
INTERVIEW TECHNIQUES
The interview sessions can use up a good percentage of the project time.
Therefore, these will have to be organized and managed well. Before your
project team launches the interview process, make sure the following major
tasks are completed.
Select and train the project team members conducting the interviews
Assign specific roles for each team member (lead interviewer/scribe)
Prepare a list of users to be interviewed and prepare a broad schedule
List your expectations from each set of interviews
Complete pre-interview research
Prepare interview questionnaires
Prepare the users for the interviews
Conduct a kick-off meeting of all users to be interviewed
Most of the users you will be interviewing fall into three broad categories:
senior executives, departmental managers/analysts, and IT department
professionals. What are the expectations from interviewing each of these
categories? Figure 5-7 shows the baseline expectations.
Figure 5-7 Expectations from interviews.
Pre-interview research is important for the success of the interviews. Here
is a list of some key research topics:
History and current structure of the business unit
Number of employees and their roles and responsibilities
Locations of the users
Primary purpose of the business unit in the enterprise
Relationship of the business unit to the strategic initiatives of the
enterprise
Secondary purposes of the business unit
Relationship of the business unit to other units and to outside
organizations
Contribution of the business unit to corporate revenues and costs
The company's market
Competition in the market
Some tips on the nature of questions to be asked in the interviews follow.
Current Information Sources
Which operational systems generate data about important business subject
areas?
What are the types of computer systems that support these subject areas?
What information is currently delivered in existing reports and online
queries?
How about the level of detail in the existing information delivery systems?
Subject Areas
Which subject areas are most valuable for analysis?
What are the business dimensions? Do these have natural
hierarchies?
What are the business partitions for decision making?
Do the various locations need global information or just local
information for decision making? What is the mix?
Are certain products and services offered only in certain areas?
Key Performance Metrics
How is the performance of the business unit currently measured?
What are the critical success factors and how are these monitored?
How do the key metrics roll up?
Are all markets measured in the same way?
Information Frequency
How often must the data be updated for decision making?
What is the time frame?
How does each type of analysis compare the metrics over time?
What is the timeliness requirement for the information in the data
warehouse?
As initial documentation for the requirements definition, prepare interview
write-ups using this general outline:
1. User profile
2. Background and objectives
3. Information requirements
4. Analytical requirements
5. Current tools used
6. Success criteria
7. Useful business metrics
8. Relevant business dimensions
ADAPTING THE JAD METHODOLOGY
If you are able to gather a lot of baseline data up front from different
sources, group sessions may be a good substitute for individual interviews.
In this method, you are able to get a number of interested users to meet
together in group sessions. On the whole, this method could result in fewer
group sessions than individual interview sessions. The overall time for
requirements gathering may prove to be less and, therefore, shorten the
project. Also, group sessions may be more effective if the users are
dispersed in remote locations.
Joint application development (JAD) techniques were successfully utilized
to gather requirements for operational systems in the 1980s. Users of
computer systems had grown to be more computer-savvy and their direct
participation in the development of applications proved to be very useful.
As the name implies, JAD is a joint process, with all the concerned groups
getting together for a well-defined purpose. It is a methodology for
developing computer applications jointly by the users and the IT
professionals in a well-structured manner. JAD centers around discussion
workshops lasting a certain number of days under the direction of a
facilitator. Under suitable conditions, the JAD approach may be adapted for
building a data warehouse.
JAD consists of a five-phased approach:
Project definition
    Complete high-level interviews
    Conduct management interviews
    Prepare management definition guide
Research
    Become familiar with the business area and systems
    Document user information requirements
    Document business processes
    Gather preliminary information
    Prepare agenda for the sessions
Preparation
    Create working document from previous phase
    Train the scribes
    Prepare visual aids
    Conduct presession meetings
    Set up a venue for the sessions
    Prepare a checklist for objectives
JAD sessions
    Open with a review of the agenda and purpose
    Review assumptions
    Review data requirements
    Review business metrics and dimensions
    Discuss dimension hierarchies and roll-ups
    Resolve all open issues
    Close sessions with lists of action items
Final document
    Convert the working document
    Map the gathered information
    List all data sources
    Identify all business metrics
    List all business dimensions and hierarchies
    Assemble and edit the document
    Conduct review sessions
    Get final approvals
    Establish procedure to change requirements
The success of a project using the JAD approach very much depends on the
composition of the JAD team. The size and mix of the team will vary based
on the nature and purpose of the data warehouse. The typical composition,
however, must have pertinent roles present in the team. For each of the
following roles, usually one or more persons are assigned.
Executive sponsor—Person controlling the funding, providing the direction, and
empowering the team members
Facilitator—Person guiding the team throughout the JAD process
Scribe—Person designated to record all decisions
Full-time participants—Everyone involved in making decisions about the data
warehouse
On-call participants—Persons affected by the project, but only in specific areas
Observers—Persons who would like to sit in on specific sessions without
participating in the decision making
USING QUESTIONNAIRES
Because using questionnaires is not interactive, questionnaires must be
designed with much care. We note important points relating to significant
aspects of administering questionnaires. If properly done, questionnaires
may form an important method for gathering requirements for your data
warehouse.
Type and Choice of Questions. You may use both open-ended and closed questions
in a questionnaire. Choice of language is important. Use the language of the
respondents, not cryptic technical jargon. Be specific, not vague. Keep questions
short and precise. Avoid objectionable, politically incorrect language. Avoid
talking down to the respondents. Target the questions to the appropriate respondent
group.
Application of Scales. Questionnaires usually contain nominal and interval scales.
These make it easy to respond. Nominal scales are used to classify things. Interval
scales are used for quantitative analysis.
Questionnaire Design. The order of the questions is important. Start the
questionnaire with less controversial, highly important questions. Cluster questions
with similar content. The design must be inviting and pleasing. Allow ample white
space. Provide sufficient space for responses. Make it easy to mark or indicate
responses while using scales. Maintain a consistent style.
Administering Questionnaires. Carefully decide on who gets the questionnaire.
Ensure that there are no omissions. Ways of administering the questionnaires
include distribution at an initial group session, personal delivery with later
collection, self-administration by respondents, mail to respondent locations,
and electronic delivery via e-mail.
REVIEW OF EXISTING DOCUMENTATION
Although most of the requirements gathering will be done through
interviews, group sessions, and questionnaires, you will be able to gather
useful information from the review of existing documentation. Review of
existing documentation can be done by the project team without too much
involvement from the users of the business units. Scheduling of the review
of existing documentation involves only the members of the project team.
Documentation from User Departments. What can you get out of the
existing documentation? First, let us look at the reports and screens used
by the users in the business areas that will be using the data warehouse.
You need to find out everything about the functions of the business units,
the operational information gathered and used by these users, what is
important to them, and whether they use any of the existing reports for
analysis. You need to look at the user documentation for all the operational
systems used. You need to grasp what is important to the users.
The business units usually have documentation on the processes and
procedures in those units. How do the users perform their functions?
Review in detail all the processes and procedures. You are trying to find out
what types of analyses the users in these business units are likely to be
interested in. Review the documentation and then augment what you have
learned from the documentation prepared from the interview sessions.
Documentation from IT. The documentation from the users and the
interviews with the users will give you information on the metrics used for
analysis and the business dimensions along which the analysis gets done.
But from where do you get the data for the metrics and business
dimensions? These will have to come from internal operational systems.
You need to know what is available in the source systems.
Where do you turn for information available in the source systems? This is
where the operational system DBAs and application experts from IT
become very important for gathering data. The DBAs will provide you with
all the data structures, individual data elements, attributes, value domains,
and relationships among fields and data structures. From the information
you have gathered from the users, you will then be able to relate the user
information to the source systems as ascertained from the IT personnel.
Work with your DBAs to obtain copies of the data dictionary or data catalog
entries for the relevant source systems. Study the data structures, data
fields, and relationships. Eventually, you will be populating the data
warehouse from these source systems, so you need to understand
completely the source data, the source platforms, and the operating
systems.
Now let us turn to the IT application experts. These professionals will give
you the business rules and help you to understand and appreciate the
various data elements from the source systems. You will learn about data
ownership, about people responsible for data quality, and how data is
gathered and processed in the source systems. Review the programs and
modules that make up the source systems. Look at the copy books inside
the programs to understand how the data structures are used in the
programs.
REQUIREMENTS DEFINITION: SCOPE AND CONTENT
Formal documentation is often neglected in computer system projects. The
project team goes through the requirements definition phase. They conduct
the interviews and group sessions. They review the existing documentation.
They gather enough material to support the next phases in the system
development life cycle. But they skip the detailed documentation of the
requirements definition.
There are several reasons why you should commit the results of your
requirements definition phase to writing. First of all, the requirements
definition document is the basis for the next phases. If project team
members have to leave the project for any reason at all, the project will not
suffer from people walking away with the knowledge they have gathered.
The formal documentation will also validate your findings when reviewed
with the users.
We will come up with a suggested outline for the formal requirements
definition document. Before that, let us look at the types of information this
document must contain.
DATA SOURCES
This piece of information is essential in the requirements definition
document. Include all the details you have gathered about the source
systems. You will be using the source system data in the data warehouse.
You will collect the data from these source systems, merge and integrate it,
transform the data appropriately, and populate the data warehouse.
Typically, the requirements definition document should include the
following information:
Available data sources
Data structures within the data sources
Location of the data sources
Operating systems, networks, protocols, and client architectures
Data extraction procedures
Availability of historical data
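As an illustration only, the inventory items listed above could be recorded in a simple structure so that gaps are easy to spot before the document is finalized. The sketch below is a minimal example; the class name, field names, and the sample system `order_entry` are all assumptions made for illustration, not part of the chapter.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataSource:
    """One source-system entry for the requirements definition document."""
    name: str                          # hypothetical system name
    location: str                      # where the source resides
    platform: str                      # operating system, network, protocol notes
    structures: List[str] = field(default_factory=list)  # files or tables
    extraction_procedure: str = ""     # how the data will be pulled
    history_months: int = 0            # months of historical data available


def missing_details(source: DataSource) -> List[str]:
    """Return the names of inventory items that are still blank."""
    gaps = []
    if not source.structures:
        gaps.append("structures")
    if not source.extraction_procedure:
        gaps.append("extraction_procedure")
    if source.history_months <= 0:
        gaps.append("history_months")
    return gaps


# Hypothetical entry: structures are known, the rest is not yet documented.
orders = DataSource(
    name="order_entry",
    location="head office",
    platform="mainframe",
    structures=["ORDER_HEADER", "ORDER_LINE"],
)
print(missing_details(orders))
```

A check like this is only a convenience; the substance still comes from the DBAs and application experts who supply the details.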
DATA TRANSFORMATION
It is not sufficient just to list the possible data sources. You will list relevant
data structures as possible sources because of the relationships of the data
structures with the potential data in the data warehouse. Once you have
listed the data sources, you need to determine how the source data will have
to be transformed appropriately into the type of data suitable to be stored
in the data warehouse.
In your requirements definition document, include details of data
transformation. This will necessarily involve mapping of source data to the
data in the data warehouse. Indicate where the data about your metrics and
business dimensions will come from. Describe the merging, conversion,
and splitting that need to take place before moving the data into the data
warehouse.
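To make the ideas of merging, conversion, and splitting concrete, here is a small sketch of how one source record might be mapped to warehouse fields. Every field name, the sample record, and the region lookup are hypothetical; a real mapping would come from the data dictionary and the business rules gathered from IT.

```python
from datetime import datetime

# Hypothetical record from an operational order system.
source_row = {
    "cust_id": "C001",
    "cust_name": "Jane Smith",
    "order_dt": "03/15/2024",
    "amt_cents": "125000",
}

# Hypothetical second source, keyed by customer id, used for merging.
region_lookup = {"C001": "Northeast"}


def split_name(full_name):
    """Splitting: one source field becomes two target fields."""
    first, _, last = full_name.partition(" ")
    return {"customer_first_name": first, "customer_last_name": last}


def convert_date(mmddyyyy):
    """Conversion: MM/DD/YYYY text becomes an ISO 8601 date string."""
    return datetime.strptime(mmddyyyy, "%m/%d/%Y").date().isoformat()


def transform(row, regions):
    """Map one source record to the target layout."""
    target = {}
    target.update(split_name(row["cust_name"]))
    target["order_date"] = convert_date(row["order_dt"])
    # Conversion: cents stored as text become a numeric amount.
    target["sale_amount"] = int(row["amt_cents"]) / 100
    # Merging: bring in an attribute held in a different source.
    target["region"] = regions.get(row["cust_id"], "Unknown")
    return target


print(transform(source_row, region_lookup))
```

In the requirements definition document itself, the same information would normally appear as a mapping table rather than code; the sketch only shows what each kind of transformation does to the data.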
DATA STORAGE
From your interviews with the users, you would have found out the level of
detailed data you need to keep in the data warehouse. You will have an idea
of the number of data marts you need for supporting the users. Also, you
will know the details of the metrics and the business dimensions.
When you find out about the types of analyses the users will usually do, you
can determine the types of aggregations that must be kept in the data
warehouse. This will give you information about additional storage
requirements.
Your requirements definition document must include sufficient details
about storage requirements. Prepare preliminary estimates on the amount
of storage needed for detailed and summary data. Estimate how much
historical and archived data needs to be in the data warehouse.
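A preliminary storage estimate can be as simple as a few multiplications. The sketch below assumes a rough model in which summary data is a fixed fraction of detail data and an overhead factor covers indexes and working space. The function name, the 10 percent summary ratio, the 1.5 overhead factor, and the sample volumes are all illustrative assumptions, not figures from the chapter.

```python
def estimate_storage_gb(rows_per_day, bytes_per_row, days_of_history,
                        summary_ratio=0.10, overhead=1.5):
    """Return rough storage estimates in gigabytes (1 GB = 10**9 bytes).

    summary_ratio and overhead are assumed planning factors; replace them
    with values appropriate to your own environment.
    """
    detail_bytes = rows_per_day * bytes_per_row * days_of_history
    summary_bytes = detail_bytes * summary_ratio
    total_bytes = (detail_bytes + summary_bytes) * overhead
    return {
        "detail_gb": detail_bytes / 1e9,
        "summary_gb": summary_bytes / 1e9,
        "total_gb": total_bytes / 1e9,
    }


# Hypothetical volumes: 200,000 detail rows a day, 100 bytes each,
# two years (730 days) of history.
estimate = estimate_storage_gb(200_000, 100, 730)
for name, value in estimate.items():
    print(f"{name}: {value:.2f}")
```

With these hypothetical inputs the detail data alone comes to 14.6 GB before summaries and overhead are added. The point of such an estimate is the order of magnitude, not precision.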
INFORMATION DELIVERY
Your requirements definition document must contain the following
requirements on information delivery to the users:
Drill-down analysis
Roll-up analysis
Drill-through analysis
Slicing and dicing analysis
Ad hoc reports
Online monitoring tools such as dashboards and scorecards
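For readers less familiar with these terms, the following sketch shows roll-up, drill-down, and slicing on a handful of rows. The data, the year-quarter-month hierarchy, and the function names are hypothetical, and a real warehouse would do this with query tools rather than hand-written code.

```python
from collections import defaultdict

# Hypothetical detail rows: (year, quarter, month, product, unit_sales)
sales = [
    (2024, "Q1", "Jan", "Widget", 100),
    (2024, "Q1", "Feb", "Widget", 120),
    (2024, "Q2", "Apr", "Widget", 90),
    (2024, "Q1", "Jan", "Gadget", 40),
]

# Number of leading columns that identify each level of the time hierarchy.
LEVELS = {"year": 1, "quarter": 2, "month": 3}


def roll_up(rows, level):
    """Aggregate unit sales to the given level of the time hierarchy."""
    depth = LEVELS[level]
    totals = defaultdict(int)
    for row in rows:
        totals[row[:depth]] += row[4]
    return dict(totals)


def slice_rows(rows, product):
    """Slicing: keep only the rows for one value of the product dimension."""
    return [row for row in rows if row[3] == product]


print(roll_up(sales, "year"))       # roll-up to the highest level
print(roll_up(sales, "quarter"))    # drill-down one level
print(roll_up(slice_rows(sales, "Widget"), "quarter"))  # slice, then summarize
```

Moving from the `year` result to the `quarter` result is a drill-down; moving the other way is a roll-up.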
INFORMATION PACKAGE DIAGRAMS
The presence of information package diagrams in the requirements
definition document is the major and significant difference between
operational systems and data warehouse systems. Remember that
information package diagrams are the best approach for determining
requirements for a data warehouse.
The information package diagrams crystallize the information
requirements for the data warehouse. They contain the critical metrics
measuring the performance of the business units, the business dimensions
along which the metrics are analyzed, and the details of how drill-down and
roll-up analyses are done.
Spend as much time as needed to make sure that the information package
diagrams are complete and accurate. Your data design for the data
warehouse will be totally dependent on the accuracy and adequacy of the
information package diagrams.
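One way to picture what an information package holds is as a small data structure: a subject, the business dimensions with their hierarchy levels, and the facts. The sketch below is an assumption about how such a package could be represented in code; the class, its methods, and the sample sales package are illustrative and are not the notation used for the diagrams themselves.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class InformationPackage:
    """Hypothetical in-memory form of an information package."""
    subject: str
    # Each dimension maps to its hierarchy levels, highest level first.
    dimensions: Dict[str, List[str]]
    facts: List[str]

    def drill_down(self, dimension: str, level: str) -> Optional[str]:
        """Return the next lower level in the hierarchy, or None at the bottom."""
        levels = self.dimensions[dimension]
        position = levels.index(level)
        return levels[position + 1] if position + 1 < len(levels) else None

    def roll_up(self, dimension: str, level: str) -> Optional[str]:
        """Return the next higher level in the hierarchy, or None at the top."""
        levels = self.dimensions[dimension]
        position = levels.index(level)
        return levels[position - 1] if position > 0 else None


# Hypothetical package for sales analysis.
sales_package = InformationPackage(
    subject="Sales analysis",
    dimensions={
        "time": ["year", "quarter", "month"],
        "product": ["category", "product"],
        "geography": ["region", "district", "store"],
    },
    facts=["unit_sales", "sale_amount"],
)

print(sales_package.drill_down("time", "year"))
print(sales_package.roll_up("geography", "store"))
```

Writing the package down this explicitly makes omissions visible: a dimension with no hierarchy levels or a package with no facts is a sign that more requirements gathering is needed.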
REQUIREMENTS DEFINITION DOCUMENT OUTLINE
1. Introduction. State the purpose and scope of the project. Include
   broad project justification. Provide an executive summary of each
   subsequent section.
2. General Requirements Descriptions. Describe the source systems
   reviewed. Include interview summaries. Broadly state what types of
   information requirements are needed in the data warehouse.
3. Specific Requirements. Include details of source data needed. List the
   data transformation and storage requirements. Describe the types of
   information delivery methods needed by the users.
4. Information Packages. Provide as much detail as possible for each
   information package. Include this in the form of package diagrams.
5. Other Requirements. Cover miscellaneous requirements such as data
   extract frequencies, data loading methods, and locations to which
   information must be delivered.
6. User Expectations. State the expectations in terms of problems and
   opportunities. Indicate how the users expect to use the data
   warehouse.
7. User Participation and Sign-Off. List the tasks and activities in which
   the users are expected to participate throughout the development life
   cycle.
8. General Implementation Plan. At this stage, give a high-level plan for
   implementation.
CHAPTER SUMMARY
Unlike the requirements for an operational system, the requirements
  for a data warehouse are quite nebulous.
Business data is dimensional in nature and the users of the data
  warehouse think in terms of business dimensions.
A requirements definition for the data warehouse can, therefore, be
  based on business dimensions such as product, geography, time, and
  promotion.
Information packages—a new and useful concept—are the backbone
  of the requirements definition. An information package records the
  critical measurements or facts and business dimensions along which
  the facts are normally analyzed.
Interviews, group sessions, and questionnaires are standard methods
  for collecting requirements.
Key people to be interviewed or to be included in group sessions are
  senior executives (including the sponsors), departmental managers,
  business analysts, and operational systems DBAs.
Review all existing documentation of related operational systems.
Scope and content of the requirements definition document include
  data sources, data transformation, data storage, information delivery,
  and information package diagrams.
REVIEW QUESTIONS
1. What are the essential differences between defining requirements for
   operational systems and for data warehouses?
2. Explain business dimensions. Why and how can business dimensions
   be useful for defining requirements for the data warehouse?
3. What data does an information package contain?
4. What are dimension hierarchies? Give three examples.
5. Explain business metrics or facts with five examples.
6. List the types of users who must be interviewed for collecting
   requirements. What information can you expect to get from them?
7. In which situations can JAD methodology be successful for collecting
   requirements?
8. Why are reviews of existing documents important? What can you
   expect to get out of such reviews?
9. Various data sources feed the data warehouse. What are the pieces of
   information you need to get about data sources?
10. Name any five major components of the formal requirements
    definition document. Describe what goes into each of these
    components.
EXERCISES
1. Indicate if true or false:
   1. Requirements definitions for a sales processing operational
      system and a sales analysis data warehouse are very similar.
   2. Managers think in terms of business dimensions for analysis.
   3. Unit sales and product costs are examples of business
      dimensions.
   4. Dimension hierarchies relate to drill-down analysis.
   5. Categories are attributes of business dimensions.
   6. JAD is a methodology for one-on-one interviews.
   7. Questionnaires provide the least interactive method for
      gathering requirements.
   8. The departmental users provide information about the
      company's overall direction.
   9. Departmental managers are very good sources for information
      on data structures of operational systems.
   10. Information package diagrams are essential parts of the
       formal requirements definition document.
2. You are the vice president of marketing for a nation-wide appliance
   manufacturer with three production plants. Describe any three
   different ways you will tend to analyze your sales. What are the
   business dimensions for your analysis?
3. BigBook, Inc. is a large book distributor with domestic and
   international distribution channels. The company orders from
   publishers and distributes publications to all the leading booksellers.
   Initially, you want to build a data warehouse to analyze shipments
   that are made from the company's many warehouses. Determine the
   metrics or facts and the business dimensions. Prepare an information
   package diagram.
4. You are on the data warehouse project of AuctionsPlus.com, an
   Internet auction company selling upscale works of art. Your
   responsibility is to gather requirements for sales analysis. Find out
   the key metrics, business dimensions, hierarchies, and categories.
   Draw the information package diagram.
5. Create a detailed outline for the formal requirements definition
   document for a data warehouse to analyze product profitability of a
   large department store chain.