HOW TO CREATE A BROCHURE
To print (and preserve) these brochure instructions, click Print on the File menu. Press ENTER to print the brochure.
Using this template, you can create a professional brochure. Here’s how:
Insert your words in place of these words, using or re-arranging the preset paragraph styles.
Print pages 1 and 2 back-to-back onto sturdy, letter size paper.
Fold the paper like a letter to create a three-fold brochure (positioning the panel with the large picture on the front).
What Else Should I Know?
To change the style of any paragraph, select the text by positioning your cursor anywhere in the paragraph. Then, select a style from the Style list on the Formatting toolbar.
To change the picture, click it to select it. Click Picture on the Insert menu, and then click From File. Select a new picture, and then click Insert.
Company Name
Street Address
Address 2
City, ST ZIP Code
Phone (704) 555-0125
Fax (704) 555-0145
Web site address
Future Solutions Now
Customized Turnkey Training Courseware
Adventure Works
Date of publication
How to Customize This Brochure
You’ll probably want to customize all your templates once you discover how editing and saving them makes creating future documents easier. To customize this brochure template:
1. Insert your company information in place of the sample text.
2. Click Save As on the File menu. Click Document Template in the Save as Type box (the file name extension should change from .doc to .dot).
3. The next time you want to use the template, click New on the File menu, and then double-click your template.
About the “Picture” Fonts
The “picture” fonts in this brochure are Wingdings typeface symbols. To insert a new symbol, select the symbol character and click Symbol on the Insert menu. Select a new symbol from the map, click Insert, and then click Close.
Working with Breaks
Breaks in a Microsoft Word document appear as labeled dotted lines on the screen. Using the Break command, you can insert manual page breaks, column breaks, and section breaks.
To insert a break, click Break on the Insert menu. Select an option. Click OK to accept your choice.
Working with Spacing
To reduce the spacing between, for example, body text paragraphs, click in this paragraph, and click Paragraph on the Format menu. Reduce Spacing After to 6 points, and make additional adjustments as needed.
To save your style changes (with the insertion point in the changed paragraph), click the style in the Style list on the Formatting toolbar. Press ENTER to save the changes and update all similar styles.
To adjust character spacing, select the text to be modified and click Font on the Format menu. Click Character Spacing, and then enter a new value.
Other Brochure Tips
To change a font size, click Font on the Format menu. Adjust the size as needed, and then click OK or Cancel.
To change the shading of shaded paragraphs, click Borders and Shading on the Format menu. Select a new shade or pattern, and then click OK. Experiment to achieve the best shade for your printer.
To remove a character style, select the text and press CTRL+SPACEBAR. You can also click Default Paragraph Font on the Style list.
Brochure Ideas
“Picture” fonts, like Wingdings, are gaining popularity. Consider using other symbol fonts to create highly customized icons.
Consider printing your brochure on colorful, preprinted brochure paper, available from many paper suppliers.
Company Name
Street Address
Address 2
City, ST ZIP Code
Phone (555) 555-0125
Fax (555) 555-0145
Web site address
rpsgroup.com/energy
National Data Repositories (NDR)
define, develop, deliver
Introduction
RPS Energy helps companies develop natural energy resources across the complete asset life cycle, combining our technical and commercial skills with an in-depth knowledge of environmental issues.
The expertise within RPS Energy is applied worldwide to a broad range of projects across a number of industry sectors. In each of these areas, we provide our clients with independent, flexible support to help them achieve their technical and commercial goals.
RPS Energy has major regional offices across the UK, Australia, USA and Canada as well as local offices and agencies in many other areas.
Oil and gas projects remain a central part of our work, but we are also world leaders in advice to wind farm operators and are increasingly involved in other forms of renewable energy. Transferring skills across these sectors is a core capability for RPS Energy.
Our clients include governments, NOCs, IOCs, independents and financial institutions, as well as companies in the wider energy industry and other infrastructure and asset owners.
Increasingly we operate on projects where the issues surrounding the development of energy resources and the preservation of the environment converge. RPS Energy brings a unique combination of such skills to all our projects.
RPS Energy, through its acquisition of Paras Consulting, is one of the world’s leading independent consulting companies in the field of Information Systems and Processes. This combination of vendor/service neutrality with E&P technical know-how makes RPS Energy uniquely placed to offer services to governments to implement an NDR.
The large system vendors are vital stakeholders in the NDR domain and RPS Energy maintains excellent relationships with these companies. We manage technology procurement projects for both energy companies and governments, which gives us a unique understanding of the capabilities of the technologies on the market.
Typically, the role of any governing body with responsibility for a country’s oil and gas industry is to act as the domestic petroleum resource owner, manager, and regulator of the country’s E&P industry. The role carries a number of responsibilities, including:
• Regulatory compliance
• Promotion of inward investment
• Cost savings
• Long-term preservation of data for scientific purposes
As investment within a country’s E&P industry increases, so do the volumes and complexity of data. An existing, efficient data repository is an attractive proposition to any exploration company looking for new regions to explore.
The challenges facing government departments include:
• Improved recognition of data as an asset and as a key enabler for potential new investment
• Adding value to current oil and gas assets and realising new opportunities in an environment of business, technological and political fluctuations
• Managing increasing volumes of multi-sourced data of many types in a multi-client environment
• Ready availability of high-integrity, integrated datasets
• Continuous monitoring of data quality and improved understanding of the integrity of datasets
• Improved interaction between Government and Industry
• Improved methods of addressing security and entitlement issues
• Improved standards and procedures to preserve the data and increase its value, including regulatory compliance
• Establishing an improved strategy for storage and archiving of data
National Data Repositories
An NDR would provide long-term storage for future use, ensuring that data retains value into the future.
How can we help?
[Figure: National Data Repositories Lifecycle: Define, Develop, Deliver, Ongoing Audit]
RPS Credentials
We were among the pioneers of the concept of NDRs through work with energy companies and government in the UK. We managed the design and set-up of the UK’s Common Data Access (CDA) initiative and have been involved in its continual evolution since inception. Our work as leaders in this field has taken us to Norway, Colombia, Venezuela, Algeria, Peru, Russia, and Romania. In parallel with this NDR work, RPS Energy continues to advise companies on data management strategies. We are therefore able to integrate a given country strategy into a company’s internal data management strategy. This has worked very effectively in Norway.
Recent Project Examples
Specifically, RPS Energy is experienced in working with a number of government bodies around the world in developing strategies for NDR implementation, and has undertaken a number of projects in this area. These include:
• Original design and implementation of the UK Common Data Access (CDA) systems. CDA provides data storage and access systems to a consortium of exploration and production companies on the UK Continental Shelf.
• Enhancement of the system to provide UKDEAL, the web-based information system available on the internet that shows data availability to interested parties.
• Development of the commercial and process models for the UK National Hydrocarbon Data Archive (NHDA), which is designed to provide long-term storage of data that is of no current interest to the E&P industry.
• Development of the Banco de Información Petroleó (BIP) for Ecopetrol in Colombia. The system was designed to hold all of the information Ecopetrol shared with the operating companies in Colombia, plus all the data from within Ecopetrol.
• Assisting Perupetro with the definition and tendering of a commercial solution for their petroleum data system.
• Carrying out a feasibility study on behalf of the West Australian Government and industry for development of a Petroleum Data Centre supporting both government and industry requirements. Further work was carried out to update the proposed roadmap as a result of ongoing differences between Government and Industry on the way forward.
• Working with the Malay Thai Joint Authority (MTJA) to help it manage its interactions with the Joint Development Area (JDA) license holders more effectively. The work includes streamlining decision-making processes, developing data analysis capabilities to allow MTJA to better assess operational activities, and investigating a shared system for managing common data and information. A common document management system and a data analysis technology have been implemented.
In addition to our work with national authorities, we have extensive experience in developing systems and processes for managing data and information in oil companies. Our clients in this area include:
• BP
• BG
• OMV Group, including Petrom
• Shell
• Anadarko
• Hess
• UK DTI Oil and Gas division (now part of DECC)
• Maersk Oil
• GDF Suez
Our Approach
RPS Energy has a structured approach to undertaking projects of this nature, based on a “define, develop, deliver” methodology.
Define
We are the E&P industry’s leading independent data management consultancy, and have the resource capacity to support large and complex programmes. RPS Energy has no connections with data management products or contracting services, and therefore provides a service focused entirely on adding value to the business.
We have an unrivalled reputation for taking on complex problems and breaking them down into manageable elements to form an integrated programme, using a wide range of specialist expertise in the Technical Information Management arena.
• Understand the stakeholder drivers (NOC/oil companies/ministry etc.)
• Set up an effective governance structure
• Understand data volumes, owners and locations
• Build the business case and assist in promoting the concept
• Outline the statement of requirements
• Assist in securing funding
Develop
The core capability of an NDR is organizing and exploiting large volumes of data covering a comprehensive geographical area to serve a community of users and prospective investors, allowing improved promotion of acreage and improved management of existing acreage.
An NDR would therefore be of national strategic importance for the prosperity of the country’s petroleum industry.
An NDR would be able to store data and provide access to it through a single portal, reducing the number of applications required to access the information and consolidating data sources into a single framework. An NDR would hold data from all exploration, drilling and production activity, both national and private enterprise. However, there is scope to include other nationally important data, for example planning and environmental information.
Additionally, the stored data can be structured or unstructured, and can include the majority of raw data formats.
• Write the technology and service RFP (either as one tender or as two)
• Manage procurement
• Define the entitlements system
• Assist in setting up the data release policies
• Advise on the optimal vendor/service combination
• Assist in contract negotiations with the chosen vendor
Deliver
Through the implementation of the plans developed by RPS Energy, the challenges facing government departments are met by delivering the following:
• Implementation plan
• Map out processes
• Document policies and procedures
• Detailed data loading planning
• System roll-out
• Testing
• Integration testing
• Training
• Manage change
Ongoing Audit and Support
RPS Energy has a continuing relationship with many of the governments we have worked with in the past, ensuring that the systems put in place continue to function fully. RPS Energy offers the following services:
• Validating the security of systems, including the entitlements system
• Retendering of systems as and when required
• Validating the processes used to capture and manage data, improving them where appropriate
In addition to the services outlined above, RPS Energy is also involved in advising governments on inward investment into E&P. Data is a vital part of such a process and is therefore integral to a country’s strategy in attracting investment from the oil industry.
RPS Energy is a part of RPS Group Plc, a consultancy organisation employing over 5000 professionals with a unique blend of skills and experience. We operate worldwide from regional offices in North America, Europe, Australia and SE Asia.
We have a reputation for successfully meeting the challenges posed by large complex projects and for providing reliable and practical advice to clients in all sectors of the economy. RPS Energy conducts business in an open and fair manner, contributing to society in a positive way.
UK | USA | Canada | Australia | Malaysia | Singapore | Russia
For more information about our
Energy Services please contact:
[email protected]
Data Repository & Customer Portal
With the print data from your customer accounts backed up in the FMAudit™ Central™ database repository at your location, you are the owner of a powerful informational resource that will help you provide better services for your customers, increase operational efficiencies and grow your business.
Fuel Your Growth Engine with:
Reports and/or Billing > By connecting Central™ to your accounting, ERP or CRM system, you can generate reports and/or billing invoices. With meter synchronization with Onsite™, your accurate and timely invoices sail through customer approval with ease, resulting in quicker collection of your money!
Online Customer Meter Validation > As a customer portal, Central™ facilitates customer validation of meter readings online, a convenience for you and your customers and a huge source of savings for you! (Note: You can also extract data as .xls, csv, http and xml files.)
Contract Device Management > You get the data you need to optimize contract efficiency for both you and your customers: savings for your customers, increased revenues for you.
Cross Account Analysis > You can compare account assets by consumption, geography, industry, account manager and other variables to spot trends, manage sales and mine data for golden opportunities to increase business.
Better Technical Services > Central’s Device Dashboard gives you an easy-to-read virtual representation of critical service alerts (technical support, toner low, etc.), so your customers get better service and you get better revenues.
How FMAudit™ Central™ Works
Residing at your location, Central™ is a backup database repository for the print asset and metering data gathered using any or all of FMAudit’s family of data collection products:
Viewer USB™ (portable USB device; download collected data for backup)
Onsite™ (resident on customer network; automatic, synchronized feed into Central™)
WebAudit™ (internet-based data collection; synchronized feed into Central™)
When used with Viewer USB™
Central™ allows you to protect and save valuable meter readings and audits in one central repository that stays with you, regardless of where the USB key goes.
When used with Onsite™
Central™ can synchronize and consolidate as many Onsite™ metering data feeds as you need without any third-party involvement. No ASP! The data is secure and confidential.
Start revving your growth engine with print data that is accurate, timely and all yours to explore, to spur growth and to speed through revenues.
> DEALER PRINCIPALS
> SALES MANAGERS
> OPERATIONS
> SERVICE
> FINANCE
Call: 1-573-632-2461
E-mail: [email protected]
Website: www.FMAudit.com
Minimum Technical Requirements
Customer Workstation or Server
Windows 2000 or higher
Connected to target network
308 East High Street, Suite 109 Jefferson City, MO 65101
Q3 How Do Organizations Use Data Warehouses and Data
Marts to Acquire Data? 305
is shown to be -600. This result, an error, occurred because the operational data showed that 19,800 units were ordered and 19,800 units were sold. However, the operational data also showed that 600 units were damaged. Clearly, something is wrong, somewhere. It could be due to a keying mistake by someone on the receiving dock, it could be that the vendor subsequently shipped replacement items that were not charged and therefore did not appear in the accounts payable database that Lucas queried, or it could be due to some other reason.
Such a discrepancy is not unusual for BI analyses. When data are integrated from several or many different sources, the resulting collection is frequently inconsistent. The only safeguard against inaccurate analyses from such inconsistent data is for the analysts and knowledge workers to know that such inconsistencies are possible, to be on the lookout for them, and to apply a critical eye to BI results.
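The arithmetic behind the -600 result can be made explicit. The sketch below assumes the reported figure is units remaining, computed as units ordered minus units sold minus units damaged (the full query is not shown here), and adds a simple sanity check that flags impossible negative results, one small way to apply a critical eye automatically. The function names are hypothetical.

```python
def units_on_hand(ordered: int, sold: int, damaged: int) -> int:
    """Remaining units, assuming on hand = ordered - sold - damaged."""
    return ordered - sold - damaged


def check_inventory(item: str, ordered: int, sold: int, damaged: int) -> list[str]:
    """Return warnings for results that cannot be correct (hypothetical rule: no negatives)."""
    warnings = []
    on_hand = units_on_hand(ordered, sold, damaged)
    if on_hand < 0:
        warnings.append(
            f"{item}: computed inventory is {on_hand}; source data are inconsistent"
        )
    return warnings


# Figures from the example above: 19,800 ordered, 19,800 sold, 600 damaged.
print(units_on_hand(19_800, 19_800, 600))  # -600
print(check_inventory("example item", 19_800, 19_800, 600))
```

The check cannot say which source is wrong (a keying mistake, uncharged replacements, or something else); it only tells the analyst to investigate before publishing.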
Addison, Drew, and Lucas would use a process similar to that just discussed to finish their analysis. They would likely add costs to the data they've already gathered and analyze it so as to produce an average cost per item for each vendor and other similar results. The particulars are not important here; just realize they would continue in a similar vein until they were finished.
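As a sketch of the averaging step, assume each purchase record holds a vendor, a unit count, and a total cost; the vendors and figures below are invented. The code computes a weighted average (total cost divided by total units per vendor) rather than averaging per-order unit prices, which is one reasonable choice among several.

```python
from collections import defaultdict

# Hypothetical purchase records: (vendor, units, total_cost).
purchases = [
    ("Vendor A", 100, 1_250.00),
    ("Vendor A", 50, 700.00),
    ("Vendor B", 200, 2_000.00),
]


def average_cost_per_item(records):
    """Weighted average cost per item for each vendor: total cost / total units."""
    units = defaultdict(int)
    cost = defaultdict(float)
    for vendor, qty, total in records:
        units[vendor] += qty
        cost[vendor] += total
    # Skip vendors with zero units to avoid dividing by zero.
    return {v: cost[v] / units[v] for v in units if units[v] > 0}


for vendor, avg in sorted(average_cost_per_item(purchases).items()):
    print(f"{vendor}: {avg:.2f}")
```

With these sample records, Vendor A averages 13.00 per item (1,950 / 150) and Vendor B 10.00 (2,000 / 200).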
At that point, according to the process summary in Figure 9-3, they would publish their results. Several possibilities exist:
• Print and distribute the results via email or a collaboration tool.
• Publish via a Web server or SharePoint.
• Publish on a BI server.
• Automate the results via a Web service.
We will discuss these alternatives in more detail in Q7. For now, just realize that GearUp would choose among these alternatives according to its needs. If the business intelligence is only created to provide guidance for buyers, Addison and Drew might be content just to print their results and email them to buyers or share them using a collaboration tool. As an alternative, they could also produce the report in HTML and place it on a Web server. As an extension to that option, they could use SharePoint to publish the results. Although we didn't discuss them in Chapter 2, SharePoint has extensive features and functions for BI reporting. Addison and Drew could integrate their analyses with these features and functions so that users could go to a SharePoint site for the latest data. As a fourth option, they could publish via a BI server, which is a Web server application that is specialized for publishing BI results. Finally, Lucas might assign a programmer in his department to create a Web service that would make it possible for other programs to obtain the BI results programmatically. Most likely, for their situation, they will print the results and email them or share them via a collaboration tool.
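To illustrate the last option, a Web service that lets other programs obtain BI results programmatically, here is a minimal sketch using Python's standard `wsgiref` module. The `/results` path, the JSON format, and the sample results are assumptions; a production service would add authentication and read its results from the warehouse rather than from a constant.

```python
import json
from wsgiref.simple_server import make_server

# Hypothetical BI results (average cost per item by vendor).
BI_RESULTS = {"Vendor A": 13.0, "Vendor B": 10.0}


def bi_results_app(environ, start_response):
    """Minimal WSGI app: GET /results returns the BI results as JSON."""
    if environ.get("REQUEST_METHOD") == "GET" and environ.get("PATH_INFO") == "/results":
        body = json.dumps(BI_RESULTS).encode("utf-8")
        start_response("200 OK", [
            ("Content-Type", "application/json"),
            ("Content-Length", str(len(body))),
        ])
        return [body]
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"not found"]


def serve(host: str = "localhost", port: int = 8000) -> None:
    """Run the service until interrupted (host and port are arbitrary choices)."""
    with make_server(host, port, bi_results_app) as server:
        server.serve_forever()
```

Calling `serve()` starts the service; another program could then fetch `http://localhost:8000/results` and parse the JSON, instead of a person reading a printed report.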
With this example in mind, we will now discuss each of the elements of Figure 9-3 in greater detail.
How Do Organizations Use Data Warehouses and Data Marts to Acquire Data?
Although it is possible to create basic reports and perform simple analyses from operational data, this course is not usually recommended. For reasons of security and control, IS professionals do not want business analysts like Addison processing operational data. If Addison makes an error, that error could cause a serious disruption in GearUp's operations. Also, operational data is structured for fast and reliable transaction processing. It is seldom structured in a way that readily supports BI
306 CHAPTER 9 Business Intelligence Systems
[Figure 9-11: Components of a Data Warehouse. Labels: Production Databases; Other Internal Data; External Data; Data Extraction/Cleaning/Preparation Programs; Data Warehouse Metadata; Data Warehouse Database; Data Warehouse DBMS]
reduce system performance.
For these reasons, most organizations extract operational data for BI processing.
For a small organization like GearUp, the extraction may be as simple as an Access database. Larger organizations, however, typically create and staff a group of people who manage and run a data warehouse, which is a facility for managing an organization's BI data. The functions of a data warehouse are to:
• Obtain data
• Cleanse data
• Organize and relate data
• Catalog data
Figure 9-11 shows the components of a data warehouse. Programs read production and other data and extract, clean, and prepare that data for BI processing. The prepared data are stored in a data warehouse database using a data warehouse DBMS, which can be different from the organization's operational DBMS. For example, an organization might use Oracle for its operational processing, but use SQL Server for its data warehouse. Other organizations use SQL Server for operational processing, but use DBMSs from statistical package vendors such as SAS or SPSS in the data warehouse.
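The extract, clean, and load steps can be sketched with Python's built-in `sqlite3` module. For simplicity both databases here are SQLite in-memory databases, even though, as noted above, the warehouse DBMS may differ from the operational one; the table names, columns, and cleansing rules are hypothetical.

```python
import sqlite3

# Hypothetical schema shared by the operational table and the warehouse table.
SCHEMA = "order_id INTEGER PRIMARY KEY, customer_age INTEGER, amount REAL"


def extract(op_conn):
    """Obtain: read order rows from the operational database (read-only query)."""
    return op_conn.execute("SELECT order_id, customer_age, amount FROM orders").fetchall()


def clean(rows):
    """Cleanse: drop rows with missing or impossible values (hypothetical rules)."""
    return [
        (order_id, age, amount)
        for order_id, age, amount in rows
        if age is not None and 0 <= age <= 120
        and amount is not None and amount >= 0
    ]


def load(wh_conn, rows):
    """Store the prepared rows in the warehouse database."""
    wh_conn.execute(f"CREATE TABLE IF NOT EXISTS fact_orders ({SCHEMA})")
    wh_conn.executemany("INSERT OR REPLACE INTO fact_orders VALUES (?, ?, ?)", rows)
    wh_conn.commit()


# In-memory databases stand in for the operational DBMS and the warehouse DBMS.
op = sqlite3.connect(":memory:")
op.execute(f"CREATE TABLE orders ({SCHEMA})")
op.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 34, 59.90), (2, 213, 20.00), (3, None, 15.50)],
)
wh = sqlite3.connect(":memory:")
load(wh, clean(extract(op)))
print(wh.execute("SELECT order_id FROM fact_orders").fetchall())  # [(1,)]
```

Because analysts query only the warehouse copy, a mistake in their analysis cannot disrupt the operational system.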
Data warehouses include data that are purchased from outside sources. The purchase of data about other companies is not unusual or particularly concerning from a privacy standpoint. However, some companies, like Fox Lake, might choose to buy personal, consumer data (like marital status) from data vendors like Acxiom Corporation. Figure 9-12 lists some of the consumer data that can be readily purchased. An amazing (and from a privacy standpoint, frightening) amount of data is available.
Figure 9-12: Examples of Consumer Data for Sale
• Name, address, phone
• Age
• Gender
• Ethnicity
• Religion
• Income
• Education
• Voter registration
• Home ownership
• Vehicles
• Magazine subscriptions
• Hobbies
• Catalog orders
• Marital status, life stage
• Height, weight, hair and eye color
• Spouse name, birth date
• Children's names and birth dates
Metadata concerning the data (its source, its format, its assumptions and constraints, and other facts about the data) is kept in a data warehouse metadata database. The data warehouse DBMS extracts and provides data to BI applications.
Most operational and purchased data have problems that inhibit their usefulness for business intelligence. Figure 9-13 lists the major problem categories. First, although data that are critical for successful operations must be complete and accurate, data that are only marginally necessary need not be. For example, some systems gather demographic data in the ordering process. But, because such data are not needed to fill, ship, and bill orders, their quality suffers.
Problematic data are termed dirty data. Examples are a value of B for customer gender and of 213 for customer age. Other examples are a value of 999-999-9999 for a U.S. phone number, a part color of gren, and an email address of [email protected] WhoLAM.org. All of these values can be problematic for BI purposes.
Purchased data often contain missing elements. Most data vendors state the percentage of missing values for each attribute in the data they sell. An organization buys such data because for some uses some data are better than no data at all. This is especially true for data items whose values are difficult to obtain, such as Number of Adults in Household, Household Income, Dwelling Type, and Education of Primary Income Earner. However, care is required here because for some BI applications a few missing or erroneous data points can seriously bias the analysis.
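The percentage of missing values per attribute, the figure data vendors typically report, is a one-line aggregation. The household records and attribute names below are invented; an absent key and an explicit `None` both count as missing.

```python
def missing_percentages(rows: list[dict], attributes: list[str]) -> dict[str, float]:
    """Percentage of rows in which each attribute is missing (absent or None)."""
    if not rows:
        return {a: 0.0 for a in attributes}
    return {
        a: 100.0 * sum(1 for r in rows if r.get(a) is None) / len(rows)
        for a in attributes
    }


# Hypothetical purchased household data.
households = [
    {"adults": 2, "income": 55_000, "education": "BS"},
    {"adults": None, "income": None, "education": "HS"},
    {"adults": 1, "income": None},
    {"adults": 3, "income": 72_000, "education": None},
]
print(missing_percentages(households, ["adults", "income", "education"]))
# {'adults': 25.0, 'income': 50.0, 'education': 50.0}
```

Knowing that half the income values are missing helps an analyst judge whether an income-based analysis is likely to be biased.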
Inconsistent data, the third problem in Figure 9-13, is particularly common for data that have been gathered over time. When an area code changes, for example, the phone number for a given customer before the change will not match the customer's number after the change. Likewise, part codes can change, as can sales territories. Before such data can be used, they must be recoded for consistency over the period of the study.
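Recoding can be as simple as mapping retired codes to their replacements so that records from before and after a change line up. The area codes 111 and 222 and the NNN-NNN-NNNN format are hypothetical; the same pattern would apply to changed part codes or sales territories.

```python
# Hypothetical mapping of retired area codes to their replacements.
AREA_CODE_MAP = {"111": "222"}


def recode_phone(phone: str) -> str:
    """Rewrite an 'NNN-NNN-NNNN' number to use the current area code."""
    area, sep, rest = phone.partition("-")
    return AREA_CODE_MAP.get(area, area) + sep + rest


before = "111-555-0100"  # recorded before the area-code change
after = "222-555-0100"   # same customer, recorded after the change
print(before == after)                              # False
print(recode_phone(before) == recode_phone(after))  # True
```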
Some data inconsistencies arise from the nature of the business activity. Consider a Web-based order-entry system used by customers worldwide. When the Web server records the time of order, which time zone does it use? The server's system clock time is irrelevant to an analysis of customer behavior. Coordinated Universal Time (formerly called Greenwich Mean Time) is also meaningless. Somehow, Web server time must be adjusted to the time zone of the customer.
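One way to make that adjustment is sketched below. It assumes the Web server stores order times in UTC and that each customer's UTC offset is known (for example, from the shipping address). Fixed offsets are a simplification that ignores daylight saving time; a production system would use a time zone database instead.

```python
from datetime import datetime, timedelta, timezone


def to_customer_local(server_time_utc: datetime, customer_utc_offset_hours: float) -> datetime:
    """Convert a timezone-aware UTC order timestamp to the customer's local time."""
    if server_time_utc.tzinfo is None:
        raise ValueError("server timestamp must be timezone-aware")
    customer_tz = timezone(timedelta(hours=customer_utc_offset_hours))
    return server_time_utc.astimezone(customer_tz)


order_time = datetime(2024, 3, 1, 2, 30, tzinfo=timezone.utc)
print(to_customer_local(order_time, -5).isoformat())  # 2024-02-29T21:30:00-05:00
```

Note that the local order falls on a different calendar day than the server's UTC timestamp, which matters for any analysis of orders by day.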
Another problem is nonintegrated data. A particular BI analysis might require data from an ERP system, an e-commerce system, and a social networking application. Analysts may wish to integrate that organizational data with purchased consumer data. Such a data collection will likely have relationships that are not represented in primary key/foreign key relationships. It is the function of personnel in the data warehouse to integrate such data, somehow.
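The text leaves the integration method open ("somehow"). One informal approach is to match records on a field that both sources happen to share, after normalizing it. The sketch below assumes that both an ERP extract and an e-commerce order log carry the customer's email address; the record layouts, field names, and values are all hypothetical.

```python
def normalize_email(email: str) -> str:
    """Trim whitespace and lowercase so trivially different spellings match."""
    return email.strip().lower()


# Hypothetical extracts from two systems that share no primary/foreign key
erp_customers = [
    {"cust_id": 101, "email": "Pat.Lee@Example.com ", "credit_limit": 5000},
    {"cust_id": 102, "email": "kim@example.com", "credit_limit": 2000},
]
web_orders = [
    {"order_no": "W-1", "email": "pat.lee@example.com", "total": 120.0},
    {"order_no": "W-2", "email": "KIM@example.com", "total": 35.5},
    {"order_no": "W-3", "email": "new@example.com", "total": 60.0},
]


def integrate(erp, web):
    """Attach the ERP customer ID to each web order whose email matches."""
    by_email = {normalize_email(c["email"]): c for c in erp}
    matched, unmatched = [], []
    for order in web:
        customer = by_email.get(normalize_email(order["email"]))
        if customer is None:
            unmatched.append(order)  # set aside for manual review
        else:
            matched.append({**order, "cust_id": customer["cust_id"]})
    return matched, unmatched


matched, unmatched = integrate(erp_customers, web_orders)
print([(m["order_no"], m["cust_id"]) for m in matched])  # [('W-1', 101), ('W-2', 102)]
print([u["order_no"] for u in unmatched])                # ['W-3']
```

Unmatched records are kept rather than dropped, because silently discarding them could introduce the kind of bias described for missing values.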
Data can also have the wrong granularity, a term that refers to the level of detail represented by the data. Granularity can be too fine or too coarse. For the former, suppose we want to analyze the placement of graphics and controls on an order-entry Web page. It is possible to capture the customers' clicking behavior in what is termed clickstream data. Those data, however, include everything the customer does at the Web site. In the middle of the order stream are data for clicks on the news, email, instant chat, and a weather check. Although all of that data may be useful for a study of consumer browsing behavior, it will be overwhelming if all we want to know is how customers respond to an ad located differently on the screen. To proceed, the data analysts must throw away millions and millions of clicks.

Figure 9-13 Possible Problems with Source Data
• Dirty data
• Missing values
• Inconsistent data
• Data not integrated
• Wrong granularity
  - Too fine
  - Not fine enough
• Too much data
  - Too many attributes
  - Too many data points

308 CHAPTER 9 Business Intelligence Systems
Data can also be too coarse. For example, a file of regional sales totals cannot be used to investigate the sales in a particular store in a region, and total sales for a store cannot be used to determine the sales of particular items within a store. Instead, we need to obtain data that is fine enough for the lowest-level report we want to produce.
In general, it is better to have too fine a granularity than too coarse. If the granularity is too fine, the data can be made coarser by summing and combining. Only analysts' labor and computer processing are required. If the granularity is too coarse, however, there is no way to separate the data into constituent parts.
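Making fine-grained data coarser really is just summing and combining. The minimal sketch below takes hypothetical item-level sales records and rolls them up first to store totals and then to regional totals. Nothing in the totals lets you go back the other way.

```python
from collections import defaultdict

# Hypothetical item-level sales records: (region, store, item, amount)
sales = [
    ("West", "Store 1", "mask", 40.0),
    ("West", "Store 1", "fins", 60.0),
    ("West", "Store 2", "mask", 45.0),
    ("East", "Store 3", "tank", 300.0),
]


def roll_up(records, key):
    """Sum the amount field over whatever grouping key(record) returns."""
    totals = defaultdict(float)
    for record in records:
        totals[key(record)] += record[3]
    return dict(totals)


store_totals = roll_up(sales, key=lambda r: (r[0], r[1]))
region_totals = roll_up(sales, key=lambda r: r[0])
print(store_totals)   # {('West', 'Store 1'): 100.0, ('West', 'Store 2'): 45.0, ('East', 'Store 3'): 300.0}
print(region_totals)  # {'West': 145.0, 'East': 300.0}
```

Given only `region_totals`, there is no way to recover how the 145.0 for the West splits between stores or items.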
The final problem listed in Figure 9-13 is to have too much data. As shown in the figure, we can have either too many attributes or too many data points. Think back to the discussion of tables in Chapter 5. We can have too many columns or too many rows.
Consider the first problem: too many attributes. Suppose we want to know the factors that influence how customers respond to a promotion. If we combine internal customer data with purchased customer data, we will have more than a hundred different attributes to consider. How do we select among them? Because of a phenomenon called the curse of dimensionality, the more attributes there are, the easier it is to build a model that fits the sample data but that is worthless as a predictor. There are other good reasons for reducing the number of attributes, and one of the major activities in data mining concerns efficient and effective ways of selecting attributes.
The second way to have too much data is to have too many data points, that is, too many rows of data. Suppose we want to analyze clickstream data on CNN.com. How many clicks does that site receive per month? Millions upon millions! In order to meaningfully analyze such data, we need to reduce the amount of data. One good solution to this problem is statistical sampling. Organizations should not be reluctant to sample data in such situations.
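The text does not prescribe a sampling method. When records arrive as a stream too large to hold in memory, one standard way to draw a uniform random sample is reservoir sampling. The sketch below uses that technique as an assumption for illustration, and its click IDs are simulated rather than real CNN.com data.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int, rng: random.Random) -> List[T]:
    """Return k items chosen uniformly at random from a stream of unknown length."""
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)  # fill the reservoir with the first k items
        else:
            # Keep item i with probability k / (i + 1), replacing a random slot
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir


# Simulated click IDs; a real stream would be read from logs one record at a time
clicks = (f"click-{n}" for n in range(100_000))
sample = reservoir_sample(clicks, k=1000, rng=random.Random(42))
print(len(sample))  # 1000
```

Only k records are ever held in memory, so the same function works whether the stream holds a hundred thousand clicks or hundreds of millions.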
Data Warehouses Versus Data Marts
To understand the difference between data warehouses and data marts, think of a data warehouse as a distributor in a supply chain. The data warehouse takes data from the data manufacturers (operational systems and purchased data), cleans and processes the data, and locates the data on the shelves, so to speak, of the data warehouse. The people who work with a data warehouse are experts at data management, data cleaning, data transformation, data relationships, and the like. However, they are not usually experts in a given business function.
A data mart is a data collection, smaller than the data warehouse, that addresses the needs of a particular department or functional area of the business. If the data warehouse is the distributor in a supply chain, then a data mart is like a retail store in a supply chain. Users in the data mart obtain data that pertain to a particular business function from the data warehouse. Such users do not have the data management expertise that data warehouse employees have, but they are knowledgeable analysts for a given business function.
Figure 9-14 illustrates these relationships. The data warehouse takes data from the data producers and distributes the data to three data marts. One data mart is used to analyze clickstream data for the purpose of designing Web pages. A second analyzes store sales data and determines which products tend to be purchased together. This information is used to train salespeople on the best way to up-sell to customers. The third data mart is used to analyze customer order data for the purpose of reducing labor for item picking from the warehouse. A company like Amazon.com, for example, goes to great lengths to organize its warehouses to reduce picking expenses.

Figure 9-14 Data Mart Examples (figure labels: Data Warehouse; Metadata; Inventory History Data; Inventory Data Mart; BI tools for inventory management; BI tools for store management; Web page design features; Market-basket analysis for sales training; Inventory layout for optimal item picking)
As you can imagine, it is expensive to create, staff, and operate data warehouses and data marts. Only large organizations with deep pockets can afford to operate a system like that shown in Figure 9-11. Smaller organizations like GearUp operate subsets of this system, but they must find ways to solve the basic problems that data warehouses solve, even if those ways are informal.
How Do Organizations Use Typical Reporting Applications?
A reporting application is a BI application that inputs data from one or more sources and applies reporting operations to that data to produce business intelligence. We will first summarize reporting operations and then illustrate two important reporting applications: RFM analysis and OLAP.
Reporting applications produce business intelligence using five basic operations:
• Sorting
• Filtering
• Grouping
• Calculating
• Formatting
None of these operations is particularly sophisticated; they can all be accomplished using SQL and basic HTML or a simple report-writing tool.
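To show that all five operations are within reach of plain SQL, here is a minimal sketch using Python's built-in `sqlite3` module. The `sales` table, its columns, and the vendor data are hypothetical, loosely modeled on a vendor shortage report, and each clause is labeled with the operation it performs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (vendor_id INTEGER, item TEXT, "
    "units_ordered INTEGER, units_shipped INTEGER, price REAL)"
)
# Hypothetical order lines
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Tent", 50, 40, 120.0),
        (1, "Stove", 30, 30, 60.0),
        (2, "Boots", 80, 50, 90.0),
        (2, "Socks", 200, 190, 8.0),
        (3, "Pack", 10, 10, 75.0),
    ],
)

rows = conn.execute(
    """
    SELECT vendor_id,
           SUM((units_ordered - units_shipped) * price) AS shortage_value  -- calculating
    FROM sales
    WHERE units_ordered > units_shipped                                    -- filtering
    GROUP BY vendor_id                                                     -- grouping
    ORDER BY shortage_value DESC                                           -- sorting
    """
).fetchall()

for vendor_id, shortage in rows:  # formatting
    print(f"Vendor {vendor_id:>3}: ${shortage:,.2f}")
# Vendor   2: $2,780.00
# Vendor   1: $1,200.00
```

Vendor 3 has no shortage, so the filter removes it before grouping. A report-writing tool performs the same five steps behind a graphical interface.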
Addison at GearUp used Access to apply all five of these operations in the preparation of the reports discussed in Q2. Examine, for example, Figure 9-7 (page 301). The results are sorted and grouped by VendorID and, within a vendor, sorted in decreasing order by value of SalesShortage. The value of SalesShortage as well as the
A Group Exercise
Do You Have a Club Card?
A data aggregator is a company that obtains data from public and private sources and stores, combines, and publishes it in sophisticated ways. When you use your grocery store club card, the data from your grocery shopping trip are sold to a data aggregator. Credit card data, credit data, public tax records, insurance records, product warranty card data, voter registration data, and hundreds of other types of data are sold to aggregators.
Not all of the data are identified in the same way (or, in terms of Chapter 5, not all of it has the same primary key). But, using a combination of phone number, address, email address, name, and other partially identifying data, such companies can integrate that disparate data into an integrated, coherent whole. They then query, report, and mine the integrated data to form detailed descriptions about companies, communities, zip codes, households, and individuals.
As you will learn in Chapter 12, laws limit the types of data that federal and other governmental agencies can acquire and store. There are also some legal safeguards on data maintained by credit bureaus and medical facilities. However, no such laws limit data storage by most companies (nor are there laws that prohibit governmental agencies from buying results from data aggregators).
Acxiom Corporation, a data aggregator with $1.2 billion in sales in 2009, has been described as the "biggest company you never heard of." Visit www.acxiom.com and complete the following tasks:
1. Navigate the Acxiom Web site and make a list of 10 different products that Acxiom provides.
2. Describe Acxiom's top customers.
3. Examine your answers to items 1 and 2 and describe, in general terms, the kinds of data that Acxiom must collect to be able to provide these products to its customers.
4. In what ways might companies like Acxiom need to limit their marketing so as to avoid a privacy outcry from the public?
5. According to the Web site, what is Acxiom's privacy policy? Are you reassured by its policy? Why or why not?
6. Should there be laws governing companies like Acxiom? Why or why not?
7. Prepare a 3-minute presentation of your answers to items 3, 4, 5, and 6. Give your presentation to the rest of the class.
Data mining and other business intelligence systems are useful, but they are not without problems, as discussed in the Guide on pages 330-331.
How Do Organizations Use Typical Data Mining Applications?
Data mining is the application of statistical techniques to find patterns and relationships among data for classification and prediction. As shown in Figure 9-19, data mining resulted from a convergence of disciplines. Data mining techniques emerged from statistics and mathematics and from artificial intelligence and machine-learning fields in computer science. As a result, data mining terminology is an odd blend of terms from these different disciplines. Sometimes people use the term knowledge discovery in databases (KDD) as a synonym for data mining.
Data mining techniques take advantage of developments in data management for processing the enormous databases that have emerged in the last 10 years. Of course, these data would not have been generated were it not for fast and cheap computers, and without such computers the new techniques would be impossible to compute.
Figure 9-19 (figure labels: Statistics/Mathematics; Artificial Intelligence; Machine Learning; Cheap Computer Processing and Storage; Huge Databases; Data Management Technology; Marketing, Finance, and Other Business Professionals)
Most data mining techniques are sophisticated, and many are difficult to use well. Such techniques are valuable to organizations, however, and some business professionals, especially those in finance and marketing, have become expert in their use. In fact, today there are many interesting and rewarding careers for business professionals who are knowledgeable about data mining techniques.
Data mining techniques fall into two broad categories: unsupervised and supervised. We explain both types in the following sections.
With unsupervised data mining, analysts do not create a model or hypothesis before running the analysis. Instead, they apply a data mining application to the data and observe the results. With this method, analysts create hypotheses after the analysis, in order to explain the patterns found.
One common unsupervised technique is cluster analysis. With it, statistical techniques identify groups of entities that have similar characteristics. A common use for cluster analysis is to find groups of similar customers from customer order and demographic data.
For example, suppose a cluster analysis finds two very different customer groups: One group has an average age of 33, owns three Android phones and two iPads, has an expensive home entertainment system, drives a Lexus SUV, and tends to buy expensive children's play equipment. The second group has an average age of 64, owns Arizona vacation property, plays golf, and buys expensive wines. Suppose the analysis also finds that both groups buy designer children's clothing.
These findings are obtained solely by data analysis. There is no prior model about the patterns and relationships that exist. It is up to the analyst to form hypotheses, after the fact, to explain why two such different groups are both buying designer children's clothes.
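Cluster analysis can be carried out with many algorithms. The sketch below uses k-means, one common choice, as an assumption for illustration. It groups hypothetical customers by two attributes (age and annual spending in thousands of dollars) by repeatedly assigning each customer to the nearest cluster center and then moving each center to the mean of its members. A real analysis would scale the attributes, use many more of them, and try several starting points.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(points: Sequence[Point], centers: List[Point], iterations: int = 10):
    """Plain k-means: assign points to the nearest center, then recompute centers."""
    clusters: List[List[Point]] = []
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # Move each center to the mean of its members (an empty cluster keeps its old center)
        centers = [
            tuple(sum(values) / len(members) for values in zip(*members)) if members else centers[i]
            for i, members in enumerate(clusters)
        ]
    return centers, clusters


# Hypothetical customers: (age, annual spending in $ thousands)
customers = [(31, 12), (33, 15), (35, 11), (62, 9), (64, 8), (66, 10)]
centers, clusters = kmeans(customers, centers=[customers[0], customers[-1]])
print([round(c[0], 1) for c in centers])  # [33.0, 64.0]
```

The algorithm only reports that two groups exist and where their centers lie. Explaining why those groups behave as they do is still the analyst's job, as the text notes.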
Supervised Data Mining
With supervised data mining, data miners develop a model prior to the analysis and apply statistical techniques to data to estimate parameters of the model. For example, suppose marketing experts in a communications company believe that cell phone usage on weekends is determined by the age of the customer and the number of months the customer has had the cell phone account. A data mining analyst would then run an analysis that estimates the impact of customer and account age.
One such analysis, which measures the impact of a set of variables on another variable, is called a regression analysis. A sample result for the cell phone example is:

$$\text{CellphoneWeekendMinutes} = 12 + (17.5 \times \text{CustomerAge}) + (23.7 \times \text{NumberMonthsOfAccount})$$

Many problems arise with classification schemes, especially those that classify people. The Ethics Guide on pages 318-319 examines some of these problems.
Using this equation, analysts can predict the number of minutes of weekend cell phone use by summing 12, plus 17.5 times the customer's age, plus 23.7 times the number of months of the account.
As you will learn in your statistics classes, considerable skill is required to interpret the quality of such a model. The regression tool will create an equation, such as the one shown. Whether that equation is a good predictor of future cell phone usage depends on statistical factors, such as t values, confidence intervals, and related statistical techniques.
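The arithmetic of the sample equation is easy to check in code. The sketch below evaluates the equation and, to show where such coefficients come from, fits the same two-variable model by ordinary least squares, solving the normal equations with Gaussian elimination. The customer data are hypothetical and generated exactly from the sample equation, so the fit recovers 12, 17.5, and 23.7. Real data contain noise, which is why the statistical factors mentioned above matter. In practice a statistics package would do this fitting.

```python
from typing import List, Sequence


def predict_weekend_minutes(customer_age: float, months_of_account: float) -> float:
    """Evaluate the sample regression equation from the text."""
    return 12 + 17.5 * customer_age + 23.7 * months_of_account


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ols(rows: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Least-squares [intercept, b1, b2, ...] from the normal equations (X'X) b = X'y."""
    X = [[1.0, *r] for r in rows]  # prepend a column of 1s for the intercept
    p = len(X[0])
    xtx = [[sum(xi[i] * xi[j] for xi in X) for j in range(p)] for i in range(p)]
    xty = [sum(xi[i] * yi for xi, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


# Hypothetical customers: (age, months of account)
customers = [(25, 6), (32, 18), (41, 3), (55, 30), (38, 12), (60, 24)]
minutes = [predict_weekend_minutes(age, months) for age, months in customers]
coeffs = fit_ols(customers, minutes)

print(predict_weekend_minutes(30, 10))  # 774.0
print([round(c, 6) for c in coeffs])    # [12.0, 17.5, 23.7]
```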
Neural networks are another popular supervised data mining application used to predict values and make classifications such as "good prospect" or "poor prospect" customers. The term neural networks is misleading because it connotes a biological process similar to that in animal brains. In fact, although the original idea of neural nets may have come from the anatomy and physiology of neurons, a neural network is nothing more than a complicated set of possibly nonlinear equations. Explaining the techniques used for neural networks is beyond the scope of this text. If you want to learn more, search http://kdnuggets.com for the term neural network.
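To make "a complicated set of possibly nonlinear equations" concrete, here is a tiny network written out by hand. It has two inputs, two hidden nodes, and one output, and each node computes a weighted sum passed through a nonlinear (sigmoid) function. The inputs, the weights, and the 0.5 cutoff are hypothetical and chosen by hand. In practice the weights are estimated from data by a training algorithm, which this sketch does not show.

```python
import math


def sigmoid(z: float) -> float:
    """Nonlinear 'squashing' function mapping any number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical, hand-picked weights; a real network learns these from data.
HIDDEN_WEIGHTS = [(0.04, 0.8), (-0.02, 1.5)]  # per hidden node: (age weight, prior-purchases weight)
HIDDEN_BIAS = [-2.0, 0.5]
OUTPUT_WEIGHTS = [1.2, -0.7]
OUTPUT_BIAS = 0.1


def prospect_score(age: float, prior_purchases: float) -> float:
    # Layer 1: each hidden node is one nonlinear equation of the inputs
    hidden = [
        sigmoid(w_age * age + w_buy * prior_purchases + bias)
        for (w_age, w_buy), bias in zip(HIDDEN_WEIGHTS, HIDDEN_BIAS)
    ]
    # Layer 2: the output is a nonlinear equation of the hidden nodes
    return sigmoid(sum(w * h for w, h in zip(OUTPUT_WEIGHTS, hidden)) + OUTPUT_BIAS)


def classify(age: float, prior_purchases: float, cutoff: float = 0.5) -> str:
    return "good prospect" if prospect_score(age, prior_purchases) >= cutoff else "poor prospect"


print(classify(40, 3))  # good prospect (score is about 0.61)
```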
In the next sections, we will describe and illustrate two typical data mining tools (market-basket analysis and decision trees) and show applications of those techniques. From this discussion, you can gain a sense of the nature of data mining. These examples should give you, a future manager, a sense of the possibilities of data mining techniques. You will need additional coursework in statistics, data management, marketing, and finance, however, before you will be able to perform such analyses yourself.
Suppose you run a dive shop, and one day you realize that one of your salespeople is much better at up-selling to your customers. Any of your sales associates can fill a customer's order, but this one salesperson is especially good at selling customers items in addition to those for which they ask. One day, you ask him how he does it.

"It's simple," he says. "I just ask myself what is the next product they would want to buy. If someone buys a dive computer, I don't try to sell her fins. If she's buying a dive computer, she's already a diver and she already has fins. But, these dive computer displays are hard to read. A better mask makes it easier to read the display and get the full benefit from the dive computer."
A market-basket analysis is an unsupervised data mining technique for determining sales patterns. A market-basket analysis shows the products that customers tend to buy together. In marketing transactions, the fact that customers who buy product X also buy product Y creates a cross-selling opportunity; that is, "If they're buying X, sell them Y" or "If they're buying Y, sell them X."
Figure 9-20 shows hypothetical sales data from 400 sales transactions at a dive shop. In the Num Trans table, the number where an item's row and column meet is the total number of times that item was sold. For example, the 270 in the Mask row and Mask column means that 270 of the 400 transactions included masks. The 120 in the Dive Computer row and column means that 120 of the 400 transactions included dive computers. We can use these totals to estimate the probability that a customer will purchase an item. Because 270 of the 400 transactions were masks, we can estimate the probability that a customer will buy a mask to be 270/400, or .675.
In market-basket terminology, support is the probability that two items will be purchased together. To estimate that probability, we examine sales transactions and count the number of times that two items occurred in the same transaction. For the data in Figure 9-20, fins and masks appeared together 250 times, and thus the support for fins and a mask is 250/400, or .625. Similarly, the support for fins and weights is 20/400, or .05.
These data are interesting by themselves, but we can refine the analysis by taking another step and considering additional probabilities. For example, what proportion of the customers who bought a mask also bought fins? Masks were purchased 270 times, and of those individuals who bought masks, 250 also bought fins. Thus, given that a customer bought a mask, we can estimate the probability that he or she will buy fins to be 250/270, or .926. In market-basket terminology, such a conditional probability estimate is called the confidence.
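These definitions translate directly into counting. The sketch below computes support and confidence from a list of transactions. The six transactions are hypothetical (they are not the 400 behind Figure 9-20) and were chosen so the arithmetic is easy to follow: masks appear in 4 carts and masks with fins in 3, so support for the pair is 3/6 and confidence of fins given a mask is 3/4.

```python
from typing import List, Set


def count(transactions: List[Set[str]], items: Set[str]) -> int:
    """Number of transactions containing every item in `items`."""
    return sum(1 for t in transactions if items <= t)


def support(transactions: List[Set[str]], items: Set[str]) -> float:
    """Estimated probability that all of `items` are bought together."""
    return count(transactions, items) / len(transactions)


def confidence(transactions: List[Set[str]], antecedent: Set[str], consequent: Set[str]) -> float:
    """Estimated P(consequent | antecedent): carts with both / carts with the antecedent."""
    return count(transactions, antecedent | consequent) / count(transactions, antecedent)


# Hypothetical transactions (one set of items per cart)
transactions = [
    {"mask", "fins"},
    {"mask", "fins", "weights"},
    {"mask", "dive computer"},
    {"fins"},
    {"tank", "weights"},
    {"mask", "fins"},
]

print(support(transactions, {"mask", "fins"}))       # 0.5
print(confidence(transactions, {"mask"}, {"fins"}))  # 0.75
```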
Figure 9-20 Market-Basket Analysis at a Dive Shop

Num Trans
               Mask   Tank   Fins   Weights   Dive Computer
Mask            270     10    250        10              90
Tank             10    200     40       130              30
Fins            250     40    280        20              20
Weights          10    130     20       130              10
Dive Computer    90     30     20        10             120

Support
               Mask    Tank    Fins    Weights   Dive Computer
Mask           0.675   0.025   0.625     0.025           0.225
Tank           0.025   0.500   0.100     0.325           0.075
Fins           0.625   0.100   0.700     0.050           0.050
Weights        0.025   0.325   0.050     0.325           0.025
Dive Computer  0.225   0.075   0.050     0.025           0.300

Confidence (probability of buying the row item, given the column item)
               Mask    Tank    Fins    Weights   Dive Computer
Mask           1.000   0.050   0.893     0.077           0.750
Tank           0.037   1.000   0.143     1.000           0.250
Fins           0.926   0.200   1.000     0.154           0.167
Weights        0.037   0.650   0.071     1.000           0.083
Dive Computer  0.333   0.150   0.071     0.077           1.000

Lift (Improvement)
               Mask    Tank    Fins    Weights   Dive Computer
Mask               -   0.074   1.323     0.114           1.111
Tank           0.074       -   0.286     2.000           0.500
Fins           1.323   0.286       -     0.220           0.238
Weights        0.114   2.000   0.220         -           0.256
Dive Computer  1.111   0.500   0.238     0.256               -
Reflect on the meaning of this confidence value. The likelihood of someone walking in the door and buying fins is 280/400, or .7. But the likelihood of someone buying fins, given that he or she bought a mask, is .926. Thus, if someone buys a mask, the likelihood that he or she will also buy fins increases substantially, from .7 to .926. All sales personnel should therefore be trained to try to sell fins to anyone buying a mask.
Now consider dive computers and fins. Of the 400 transactions, fins were sold 280 times, so the probability that someone walks into the store and buys fins is .7. But of the 120 purchases of dive computers, only 20 appeared with fins. So the likelihood of someone buying fins, given he or she bought a dive computer, is 20/120, or .167. Thus, when someone buys a dive computer, the likelihood that she will also buy fins falls from .7 to .167.
The ratio of confidence to the base probability of buying an item is called lift. Lift shows how much the base probability increases or decreases when other products are purchased. The lift of fins and a mask is the confidence of fins given a mask, divided by the base probability of fins. In Figure 9-20, the lift of fins and a mask is .926/.7, or 1.32. Thus, the likelihood that people buy fins when they buy a mask increases by 32 percent.
Surprisingly, it turns out that the lift of fins and a mask is the same as the lift of a mask and fins. Both are 1.32.
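The sketch below recomputes confidence and lift from the counts shown in Figure 9-20: 400 transactions, 270 with masks, 280 with fins, 120 with dive computers, 250 with both masks and fins, and 20 with both fins and dive computers. It also suggests why the symmetry is less surprising than it looks. Lift can be rewritten as the support of the pair divided by the product of the two base probabilities, P(A and B) / (P(A) × P(B)), and that expression does not change when A and B swap places.

```python
TOTAL = 400  # transactions in Figure 9-20
counts = {"mask": 270, "fins": 280, "dive computer": 120}
pair_counts = {
    frozenset({"mask", "fins"}): 250,
    frozenset({"fins", "dive computer"}): 20,
}


def confidence(consequent: str, antecedent: str) -> float:
    """Estimated P(consequent | antecedent)."""
    return pair_counts[frozenset({consequent, antecedent})] / counts[antecedent]


def lift(consequent: str, antecedent: str) -> float:
    """Confidence divided by the base probability of the consequent."""
    return confidence(consequent, antecedent) / (counts[consequent] / TOTAL)


def lift_from_support(a: str, b: str) -> float:
    """Equivalent symmetric form: P(a and b) / (P(a) * P(b))."""
    p_ab = pair_counts[frozenset({a, b})] / TOTAL
    return p_ab / ((counts[a] / TOTAL) * (counts[b] / TOTAL))


print(round(confidence("fins", "mask"), 3))     # 0.926
print(round(lift("fins", "mask"), 2))           # 1.32
print(round(lift("mask", "fins"), 2))           # 1.32
print(round(lift("fins", "dive computer"), 2))  # 0.24
```

A lift below 1, as for fins and dive computers, signals that buying one item makes the other less likely, which matches the salesperson's advice not to push fins on dive-computer buyers.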
We need to be careful here, though, because this analysis only shows shopping carts with two items. We cannot say from this data what the likelihood is that customers, given that they bought a mask, will buy both weights and fins. To assess that probability, we need to analyze shopping carts with three items. This statement illustrates, once again, that we need to know what problem we're solving before we start to build the information system to mine the data. The problem definition will help us decide if we need to analyze three-item, four-item, or some other sized shopping cart.
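The warning about three-item carts can be demonstrated with two small hypothetical sets of transactions. They have identical counts for every single item and every pair among masks, fins, and weights. Yet they disagree on how often all three are bought together, so the estimated probability of buying fins and weights given a mask is 0.5 in one and 0 in the other. Only counting three-item carts directly settles the question.

```python
from itertools import combinations

ITEMS = ["mask", "fins", "weights"]

# Two hypothetical stores' transactions (one set of items per cart)
store_a = [{"mask", "fins", "weights"}, {"mask"}, {"fins"}, {"weights"}]
store_b = [{"mask", "fins"}, {"mask", "weights"}, {"fins", "weights"}, {"tank"}]


def count(transactions, items):
    """Number of carts containing every item in `items`."""
    return sum(1 for t in transactions if set(items) <= t)


def pair_profile(transactions):
    """Single-item and two-item counts: everything a two-item analysis sees."""
    singles = {i: count(transactions, [i]) for i in ITEMS}
    pairs = {p: count(transactions, p) for p in combinations(ITEMS, 2)}
    return singles, pairs


for name, store in (("A", store_a), ("B", store_b)):
    conf = count(store, ITEMS) / count(store, ["mask"])
    print(name, pair_profile(store), "P(fins and weights | mask) =", conf)
```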
Many organizations are benefiting from market-basket analysis today. You can expect that this technique will become a standard CRM analysis