Data warehouse
1. Data Warehouse
Lutfi Freij
Konstantin Rimarchuk
Vasken Chamlaian
John Sahakian
Suzan Ton
2. Inmon
Father of the data warehouse.
Co-creator of the Corporate Information Factory.
He has 35 years of experience in database technology management and data warehouse design.
3. Inmon-Cont’d
Bill has written about a variety of topics on the building, usage, and maintenance of the data warehouse and the Corporate Information Factory.
He has written more than 650 articles (Datamation, ComputerWorld, and Byte Magazine).
Inmon has published 45 books.
Many of his books have been translated into Chinese, Dutch, French, German, Japanese, Korean, Portuguese, Russian, and Spanish.
4. Introduction
What is a Data Warehouse?
A data warehouse is a collection of integrated databases designed to support a decision support system (DSS).
According to the definition by Inmon, the father of data warehousing (Inmon, 1992a, p. 5):
It is a collection of integrated, subject-oriented databases designed to support the DSS function, where each unit of data is non-volatile and relevant to some moment in time.
5. Introduction-Cont’d.
Where is it used?
It is used for evaluating future strategy.
It needs a successful technician who is:
Flexible.
A team player.
Well balanced in business and technical understanding.
6. Introduction-Cont’d.
The ultimate use of the data warehouse is mass customization.
For example, it increased Capital One’s customers from 1 million to approximately 9 million in 8 years.
Just like a muscle, the DW increases in strength with active use.
With each new test and product, valuable information is added to the DW, allowing the analyst to learn from the successes and failures of the past.
The key to survival is the ability to analyze, plan, and react to changing business conditions much more rapidly.
7. Data Warehouse
In order for data to be effective, the DW must be:
Consistent.
Well integrated.
Well defined.
Time stamped.
DW environment:
The data store, the data mart, and the metadata.
8. The Data Store
An operational data store (ODS) stores data for a specific application. It feeds the data warehouse a stream of desired raw data.
It is the most common component of the DW environment.
The data store is generally subject oriented, volatile, and current, and is commonly focused on customers, products, orders, policies, claims, etc.
9. Data Store & Data Warehouse
Data store and data warehouse compared: Table 10-1, page 296.
10. The Data Store-Cont’d.
Its day-to-day function is to store the data for a single, specific set of operational applications.
Its other function is to feed data to the data warehouse for the purpose of analysis.
11. The Data Mart
It is a lower-cost, scaled-down version of the DW.
Data marts offer a targeted and less costly method of gaining the advantages associated with data warehousing and can be scaled up to a full DW environment over time.
12. The Metadata
The last component of the DW environment.
It is information that is kept about the warehouse rather than information kept within the warehouse.
Legacy systems generally don’t keep a record of the characteristics of the data (such as what pieces of data exist and where they are located).
Metadata is simply data about data.
13. Conclusion
A data warehouse is a collection of integrated, subject-oriented databases designed to support a DSS.
Each unit of data is non-volatile and relevant to some moment in time.
An operational data store (ODS) stores data for a specific application. It feeds the data warehouse a stream of desired raw data.
A data mart is a lower-cost, scaled-down version of a data warehouse, usually designed to support a small group of users (rather than the entire firm).
The metadata is information that is kept about the warehouse.
15. Characteristics of Data Warehouse
Subject oriented. Data are organized based on how the users refer to them.
Integrated. All inconsistencies regarding naming conventions and value representations are removed.
Nonvolatile. Data are stored in read-only format and do not change over time.
Time variant. Data are not current but normally time series.
16. Characteristics of Data Warehouse
Summarized. Operational data are mapped into a decision-usable format.
Large volume. Time series data sets are normally quite large.
Not normalized. DW data can be, and often are, redundant.
Metadata. Data about data are stored.
Data sources. Data come from internal and external unintegrated operational systems.
17. A Data Warehouse is Subject Oriented
18. Subject Orientation
Application environment: Design activities must be equally focused on both process and database design.
Data warehouse environment: The DW world is primarily void of process design and tends to focus exclusively on issues of data modeling and database design.
19. Data Integrated
Integration: consistent naming conventions and measurement attributes, accuracy, and common aggregation.
Establishment of a common unit of measure for all synonymous data elements from dissimilar databases.
The data must be stored in the DW in an integrated, globally acceptable manner.
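The integration step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the source systems (`app_a`, `app_b`, `app_c`), their attribute names, the warehouse name `length_cm`, and the choice of centimetres as the common unit are all hypothetical and do not come from the slides.

```python
# Hypothetical conversion factors to one common unit (centimetres).
TO_CM = {"cm": 1.0, "m": 100.0, "in": 2.54, "ft": 30.48}

# Hypothetical mapping of synonymous source attribute names to one warehouse name.
NAME_MAP = {"pipe_len": "length_cm", "pipe_length": "length_cm", "length": "length_cm"}

# The same attribute as it might arrive from three dissimilar databases.
source_rows = [
    {"source": "app_a", "pipe_len": 2.0, "unit": "m"},
    {"source": "app_b", "pipe_length": 10.0, "unit": "in"},
    {"source": "app_c", "length": 150.0, "unit": "cm"},
]


def integrate(row):
    """Rename the synonymous attribute and convert it to the common unit."""
    out = {"source": row["source"]}
    for source_name, target_name in NAME_MAP.items():
        if source_name in row:
            out[target_name] = round(row[source_name] * TO_CM[row["unit"]], 2)
    return out


integrated = [integrate(r) for r in source_rows]
for r in integrated:
    print(r)
```

After this step every row carries the same attribute name and the same unit, which is what lets the warehouse aggregate values that originated in dissimilar systems.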
21. Time Variant
In an operational application system, the expectation is that all data within the database are accurate as of the moment of access. In the DW, data are simply assumed to be accurate as of some moment in time and not necessarily right now.
One of the places where DW data display time variance is in the structure of the record key. Every primary key contained within the DW must contain, either implicitly or explicitly, an element of time (day, week, month, etc.).
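A record key with an explicit element of time might look like the following sketch, which uses Python's standard `sqlite3` module with an in-memory database. The table name `daily_sales`, its columns, the dates, and the second amount are assumptions made for illustration; only the store number, product code, and first amount echo the sales example used later in this deck.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE daily_sales (
        store_id   INTEGER NOT NULL,
        product_id TEXT    NOT NULL,
        sales_date TEXT    NOT NULL,  -- explicit element of time (day)
        amount     REAL    NOT NULL,
        PRIMARY KEY (store_id, product_id, sales_date)
    )
    """
)

# Two snapshots of the same store/product, distinguished only by the date.
rows = [
    (4056, "KJ596", "2004-10-01", 223.45),
    (4056, "KJ596", "2004-10-02", 198.10),
]
conn.executemany("INSERT INTO daily_sales VALUES (?, ?, ?, ?)", rows)

# Each row is accurate as of its own day, so history is a simple ordered read.
history = conn.execute(
    "SELECT sales_date, amount FROM daily_sales "
    "WHERE store_id = ? AND product_id = ? ORDER BY sales_date",
    (4056, "KJ596"),
).fetchall()
print(history)
```

Because the date is part of the primary key, a new day's figure is stored as a new row rather than overwriting the previous one.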
22. Time Variant
Every piece of data contained within the warehouse must be associated with a particular point in time if any useful analysis is to be conducted with it.
Another aspect of time variance in DW data is that, once recorded, data within the warehouse cannot be updated or changed.
23. Nonvolatility
Typical activities such as deletes, inserts, and changes that are performed in an operational application environment are completely nonexistent in a DW environment.
Only two data operations are ever performed in the DW: data loading and data access.
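The two-operation rule can be illustrated with a toy in-memory store. This is a sketch, not a real warehouse engine: the class name `NonvolatileStore` and its method names are hypothetical, and the second sample row is invented for the example.

```python
class NonvolatileStore:
    """Toy warehouse table exposing only the two DW operations: load and access."""

    def __init__(self):
        self._rows = []

    def load(self, batch):
        """Append a batch of records; rows already loaded are never touched."""
        self._rows.extend(tuple(record) for record in batch)

    def access(self, predicate=lambda row: True):
        """Return matching rows as an immutable tuple (read-only access)."""
        return tuple(row for row in self._rows if predicate(row))


store = NonvolatileStore()
store.load([(4056, "KJ596", 223.45)])
store.load([(4057, "KJ596", 10.00)])

result = store.access(lambda row: row[0] == 4056)
print(result)
```

The interface deliberately offers no update or delete method, mirroring the point that such operations do not occur in the DW environment.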
24. Nonvolatility
Application: The design issues must focus on data integrity and update anomalies. Complex processes must be coded to ensure that the data update processes allow for high integrity of the final product.
DW: Such issues are of no concern in a DW environment because data update is never performed.
Application: Data are placed in normalized form to ensure minimal redundancy (totals that could be calculated would never be stored).
DW: Designers find it useful to store many such calculations or summarizations.
Application: The technologies necessary to support issues of transaction and data recovery, rollback, and detection and remedy of deadlock are quite complex.
DW: Relative simplicity in technology.
25. Issues of Data Redundancy between DW and operational environments
The lack of relevancy of issues such as data normalization in the DW environment may suggest the existence of massive data redundancy within the data warehouse and between the operational and DW environments.
Inmon (1992) pointed out, and proved, that this is not true.
26. Issues of Data Redundancy between DW and operational environments
The data being loaded into the DW are filtered and “cleansed” as they pass from the operational database to the warehouse. Because of this cleansing, much of the data that exist in the operational environment never pass to the data warehouse. Only the data necessary for processing by the DSS or EIS are ever actually loaded into the DW.
The time horizons for warehouse and operational data elements are different. Data in the operational environment are fresh, whereas warehouse data are generally much older, so there is minimal opportunity for the data to overlap between the two environments.
The data loaded into the DW often undergo a radical transformation as they pass from the operational to the DW environment, so the data in the DW are not the same.
Given these factors, Inmon suggests that data redundancy between the two environments is a rare occurrence, with a typical redundancy factor of less than 1%.
27. The Data Warehouse Architecture
The architecture consists of various interconnected elements:
Operational and external database layer: the source data for the DW
Information access layer: the tools the end user accesses to extract and analyze the data
Data access layer: the interface between the operational and information access layers
Metadata layer: the data directory or repository of metadata information
28. Components of the Data Warehouse Architecture
29. The Data Warehouse Architecture
Additional layers are:
Process management layer: the scheduler or job controller
Application messaging layer: the “middleware” that transports information around the firm
Physical data warehouse layer: where the actual data used in the DSS are located
Data staging layer: all of the processes necessary to select, edit, summarize, and load warehouse data from the operational and external databases
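The four staging steps named above (select, edit, summarize, load) can be sketched as a small pipeline. Everything here is an assumption made for illustration: the row layout, the `status` field, the rule that only `posted` rows are selected, and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical rows as they might sit in an operational database.
operational_rows = [
    {"store": 4056, "product": "KJ596", "amount": "223.45", "status": "posted"},
    {"store": 4056, "product": "KJ596", "amount": " 100.00", "status": "posted"},
    {"store": 4056, "product": "ZZ001", "amount": "12.00", "status": "void"},
    {"store": 4100, "product": "KJ596", "amount": "50.55", "status": "posted"},
]


def select(rows):
    """Keep only the rows needed for decision support (here: posted sales)."""
    return [r for r in rows if r["status"] == "posted"]


def edit(rows):
    """Cleanse values: strip stray whitespace and convert amounts to numbers."""
    return [
        {"store": r["store"], "product": r["product"], "amount": float(r["amount"].strip())}
        for r in rows
    ]


def summarize(rows):
    """Aggregate amounts per (store, product) into decision-usable totals."""
    totals = defaultdict(float)
    for r in rows:
        totals[(r["store"], r["product"])] += r["amount"]
    return [(store, product, round(total, 2)) for (store, product), total in sorted(totals.items())]


def load(summary, warehouse):
    """Append the summarized rows to the warehouse table."""
    warehouse.extend(summary)


warehouse = []
load(summarize(edit(select(operational_rows))), warehouse)
print(warehouse)
```

Note that the voided row never reaches the warehouse, which is one small instance of operational data being filtered out during staging.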
30. Data Warehousing Typology
The virtual data warehouse: the end users have direct access to the data stores, using tools enabled at the data access layer
The central data warehouse: a single physical database contains all of the data for a specific functional area
The distributed data warehouse: the components are distributed across several physical databases
31. The Metadata
The name suggests some high-level technological concept, but it really is fairly simple. Metadata is “data about data”.
With the emergence of the data warehouse as a decision support structure, the metadata are considered as much a resource as the business data they describe.
Metadata are abstractions: they are high-level data that provide concise descriptions of lower-level data.
32. The Metadata
For example, a line in a sales database may contain:
4056 KJ596 223.45
This is mostly meaningless until we consult the metadata that tells us it was store number 4056, product KJ596, and sales of $223.45.
The metadata are essential ingredients in the transformation of raw data into knowledge. They are the “keys” that allow us to handle the raw data.
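The sales line above can be interpreted programmatically once its metadata is written down. In this sketch the metadata layout (field name, type, description per position) and the names `SALES_METADATA` and `interpret` are hypothetical; only the sample line and its three meanings come from the slide.

```python
# Hypothetical metadata: one (name, type, description) entry per field position.
SALES_METADATA = [
    ("store_number", int, "Store number"),
    ("product_code", str, "Product code"),
    ("sales_amount", float, "Sales in dollars"),
]


def interpret(line, metadata):
    """Use the metadata to turn a raw whitespace-separated line into named, typed fields."""
    values = line.split()
    return {name: cast(value) for (name, cast, _description), value in zip(metadata, values)}


record = interpret("4056 KJ596 223.45", SALES_METADATA)
print(record)
```

Without `SALES_METADATA` the three tokens are just strings; with it, they become a store number, a product code, and a sales amount.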
33. General Metadata Issues
General metadata issues associated with data warehouse use:
What tables, attributes, and keys does the DW contain?
Where did each set of data come from?
What transformations were applied with cleansing?
How have the metadata changed over time?
How often do the data get reloaded?
Are there so many data elements that you need to be careful what you ask for?
34. Components of the Metadata
Transformation maps: records that show what transformations were applied
Extraction & relationship history: records that show what data was analyzed
Algorithms for summarization: methods available for aggregating and summarizing
Data ownership: records that show origin
Patterns of access: records that show what data are accessed and how often
35. Typical Mapping Metadata
Transformation mapping records include:
Identification of original source
Attribute conversions
Physical characteristic conversions
Encoding/reference table conversions
Naming changes
Key changes
Values of default attributes
Logic to choose from multiple sources
Algorithmic changes
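A transformation mapping record could be represented roughly as follows. This is a sketch covering only a few of the items listed above (original source, naming change, attribute conversion, encoding table conversion, default value); the class `TransformationMap`, the `billing` source system, and all field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class TransformationMap:
    """One hypothetical mapping record for a single warehouse attribute."""

    source_system: str      # identification of original source
    source_name: str        # attribute name in the source system
    target_name: str        # naming change: attribute name in the warehouse
    convert: Callable       # attribute or encoding conversion
    default: object = None  # value of the default attribute

    def apply(self, source_row):
        """Return (target_name, converted value), using the default when the source is missing."""
        raw = source_row.get(self.source_name)
        value = self.default if raw is None else self.convert(raw)
        return self.target_name, value


# Hypothetical encoding/reference table conversion.
GENDER_CODES = {"M": "male", "F": "female"}

maps = [
    TransformationMap("billing", "cust_no", "customer_id", int),
    TransformationMap("billing", "sex", "gender", GENDER_CODES.get, default="unknown"),
]

warehouse_row = dict(m.apply({"cust_no": "00042", "sex": "F"}) for m in maps)
incomplete_row = dict(m.apply({"cust_no": "7"}) for m in maps)
print(warehouse_row, incomplete_row)
```

Keeping these records alongside the data is what lets a later reader answer where a warehouse value came from and how it was changed on the way in.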
36. Implementing the Data Warehouse
Kozar’s list of “seven deadly sins” of data warehouse implementation:
1. “If you build it, they will come”: the DW needs to be designed to meet people’s needs
2. Omission of an architectural framework: you need to consider the number of users, volume of data, update cycle, etc.
3. Underestimating the importance of documenting assumptions: the assumptions and potential conflicts must be included in the framework
37. “Seven Deadly Sins”, continued
4. Failure to use the right tool: a DW project needs different tools than those used to develop an application
5. Life cycle abuse: in a DW, the life cycle really never ends
6. Ignorance about data conflicts: resolving these takes a lot more effort than most people realize
7. Failure to learn from mistakes: since one DW project tends to beget another, learning from the early mistakes will yield higher quality later
38. Data Warehouse Technologies
No one currently offers an end-to-end DW solution. Organizations buy bits and pieces from a number of vendors and hope to make them work together.
SAS, IBM, Software AG, Information Builders, and Platinum offer solutions that are at least fairly comprehensive.
The market is very competitive. Table 10-6 in the text lists 90 firms that produce DW products.
39. The Future of Data Warehousing
As the DW becomes a standard part of an organization, there will be efforts to find new ways to use the data. This will likely bring with it several new challenges:
Regulatory constraints may limit the ability to combine sources of disparate data.
These disparate sources are likely to contain unstructured data, which is hard to store.
The Internet makes it possible to access data from virtually “anywhere”. Of course, this just increases the disparity.
40. Objective
Interesting Facts
Data Can be Used To
Robust Infrastructure
Success of Data Warehouse Projects
Implementing Data Warehouse
Real Time Alerts & Integration
Identity Theft
What Can You Do?
41. Interesting Facts
Harrah’s Entertainment’s data warehouse holds 30 terabytes, or 30 trillion bytes, of data, roughly three times the number of printed characters in the Library of Congress.
Casinos, retailers, airlines, and banks are piling up data so vast that it would have been unthinkable years ago, a result of the curse of cheap storage.
42. Interesting Facts
Storage shipments as of 2004: 22 exabytes, or 22 million trillion bytes, of hard disk space, double the amount in 2002.
Equivalent to four times the space needed to store every word ever spoken by every human being who has ever lived.
Should double again in 2006.
43. Data Can be Used To
Quantify the volume impact of vehicles across the marketing matrix
Account for decay and saturation factors in the determination of investment choices and returns
Execute “what-if” simulations of pricing or promotional scenarios before a proposed action is taken
Provide a continuous planning, measurement, analysis, and optimization cycle supported by a software structure
Deliver robust data feeds into other systems supporting supply chain, sales, and financial reporting endeavors
44. Robust Infrastructure
Data Identification and Acquisition
Data Cleansing, Mapping, and Transformation
Production System Loading and Ongoing Update
45. Success of Data Warehouse Projects
Over half of data warehouse projects are doomed
Fail due to lack of attention to data quality issues
More than half only have limited acceptance
Consistency and accuracy of data
Most businesses fail to use business intelligence (BI) strategically
IT organizations build data warehouses with little to no business involvement
46. “A real-time enterprise without real-time business intelligence is a real fast, dumb organization.”
Stephen Brobst
Chief Technology Officer
Teradata
47. Success of Data Warehouse Projects
Most challenging type of deployment for an enterprise
Large-scale and complex system configurations
Sophisticated data modeling and analysis tools
High visibility in a broad range of important business functions within the company
Adoption of Linux-based platform
48. Implementing Data Warehouse
Challenges:
Identifying new processes
Assuring they were of real use
Implementing and ensuring cultural shifts
Managing content and new communities towards a common benefit
Linear models
Standards, Governance, Controls, Valuation
49. Teradata
Division of NCR in Dayton, Ohio
Competitor of IBM and Oracle
Multi-million-dollar machines to run the world’s biggest data warehouses
Wal-Mart
Bank of America
Verizon Wireless
50. Teradata’s Success
Conventional IBM or Sun Microsystems machines overload for a couple of hours to days on a few terabytes and/or data queries
IBM cannot return computation on certain complex requests
Equivalent to having data but not being able to use it.
51. Real Time Alerts & Integration
Teradata Version 8.0, released in Oct 2004
Improves real-time alerts and integration
Businesses can analyze operational info against historical info to identify events in near real-time using the new table design
Used by:
Continental Airlines in the US: rerouting passengers on delayed flights, reissuing tickets, reserving a room in a hotel booking system
Southwest Airlines: savings between $1.2 million and $1.4 million
52. Identity Theft
Government regulation of personal data is needed (National Consumer Protection Standards)
ChoicePoint Folly
Georgia-based data-collection company
Founded in 1997 to analyze insurance claims information, but now provides data to customers including finance companies, law enforcement, and government
Obtains personal information by perusing public records or purchasing the information from other companies
53. Identity Theft
Duped by scammers who set up 150 phony accounts to access personal data of as many as 145,000 people nationwide
Scammers set up user accounts by faxing in phony business licenses, undetected for one year
750 people had their identities stolen
Theft would have gone unnoticed without California identity theft law SB 1386
54. Identity Theft
MSN Event
Data Warehouse Information Gathering
Over the Phone Interviews
Trash Can Hunting
Gathered from Doctors, Internet Transactions, Telephone Operators (Overseas or Prisoners)
56. What Can You Do?
Carefully monitor your credit card bills and credit reports
Request a free once-a-year credit report via the three big credit agencies.
Equifax, Experian, TransUnion
Victims: contact the Federal Trade Commission to report the theft and monitor credit reports.
1-800-IDTHEFT
57. References
Decision Support Systems in the 21st Century, 2nd Edition, by George M. Marakas, Prentice Hall, Upper Saddle River, NJ, 2003
http://seattletimes.nwsource.com/html/editorialsopinion/2002191098_credited27.ht Seattle times, plugging holes in data warehousing
Teradata warehouse improves real-time alerts and integration. Cliff Saran. Computer Weekly. Sutton: Oct 12, 2004. p. 22 (1 page)
ON THE MARK. Mark Hall. Computerworld. Framingham: Oct 18, 2004. Vol. 38, Iss. 42; p. 6 (1 page)
Optimization: It's All About the Data. Brandweek: Ellen Pederson, Mark Anderson
THE NO-SACRIFICE, AFFORDABLE DATA WAREHOUSE APP. Intelligent Enterprises, Michael Gonzalez
58. References
http://www.dmreview.com/article_sub.cfm?articleId=7071 Convergence - Beyond the Data Warehouse
http://www.computerworld.com/printthis/2001/0,4814,56969,00.html Micro-segmentation – Computerworld
Too Much Information. Forbes article on data warehouse
http://reviews.cnet.com/4520-3513_7-5690533-1.html When identity thieves strike data warehouses
Over half of data warehouse projects doomed. VNU Business Publications Limited, Robert Jaques, 25 February 2005
http://www.linuxworld.com/magazine/?issueid=571 Linux World Article
To maintain a robust infrastructure, you should follow these guidelines.
Which data sources reflect the dynamics of the contemporary and historical market place.
Validity Checks,
The above is loaded into the integrated data repository, enabling a continuous cycle of near-time data availability.
These challenges are especially prevalent in dot-com companies.
The key word in implementing a data warehouse is convergence!
The inherent linear relationship was that either business drove requirements for IT or IT was an enabler of business. This separation was a practical need.
Structural changes in business have come as a result of endemic information technology (information power on all desktops, many businesspeople acting as their own information centers, and automated information as a basic lubricant of all business processes).
Convergence means that info is no longer a separate commodity.
A final note: information, like any other business commodity, deserves the same treatment as any other core component of the business.
IBM and Oracle systems are designed more to handle online transaction processing.