3. What is data warehousing
In computing, a data warehouse is a database
used for reporting and data analysis. Data
warehouses store current and historical data and
are used to create trend reports for senior
management, such as annual and quarterly
comparisons.
5. Characteristics of a data warehouse
• Subject oriented: A data warehouse is organized around
major subjects such as customers, products, and sales. Data is
organized according to subject instead of application. The
data organized by subject contains only the information
necessary for decision support processing.
• Integrated: A data warehouse is usually constructed
by integrating multiple relational databases. Data
integration techniques are applied to maintain
consistency in naming conventions.
• Non-volatile: A data warehouse is always a physically
separate store of data. Because of this separation, data
warehouses do not require transaction processing,
recovery, or concurrency control. Data is not updated
or changed in any way once it enters the data
warehouse; it is only loaded, refreshed, and accessed
for queries.
• Time variant: Data is stored in a data warehouse to
provide a historical perspective. The data warehouse
contains a place for storing data that is 5 to 10 years
old, or older, to be used for comparisons.
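The time-variant and non-volatile properties above can be illustrated with a small sketch. It assumes a hypothetical sales_fact table held in an in-memory SQLite database; the table name, columns, and figures are invented for illustration and are not from the slides. Rows are only ever inserted, never updated, and each row carries its year and quarter so that periods can be compared.

```python
import sqlite3

# In-memory database standing in for a warehouse (illustrative assumption).
conn = sqlite3.connect(":memory:")

# Hypothetical subject-oriented fact table: one row per customer, product, and period.
conn.execute(
    """
    CREATE TABLE sales_fact (
        customer TEXT,
        product  TEXT,
        year     INTEGER,
        quarter  INTEGER,
        amount   REAL
    )
    """
)

# Non-volatile: history is loaded with INSERT only; existing rows are never changed.
rows = [
    ("acme", "widget", 2022, 1, 100.0),
    ("acme", "widget", 2023, 1, 150.0),
    ("beta", "gadget", 2022, 1, 200.0),
    ("beta", "gadget", 2023, 1, 180.0),
]
conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?, ?)", rows)


def quarterly_totals(conn, quarter):
    """Return (year, total amount) pairs for one quarter, oldest year first."""
    cur = conn.execute(
        "SELECT year, SUM(amount) FROM sales_fact "
        "WHERE quarter = ? GROUP BY year ORDER BY year",
        (quarter,),
    )
    return cur.fetchall()


# Time variant: compare the same quarter across years.
totals = quarterly_totals(conn, 1)
print(totals)
```

Because every row keeps its period, a year-over-year comparison is a plain aggregate query rather than a reconstruction from overwritten values.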
7. Advantages and Disadvantages
Advantages
• Clean data.
• Query processing: multiple options.
• Security: data and access.
Disadvantages
• Long initial implementation time and associated high
cost.
• Adding new data sources takes time and carries a
high cost.
• Typically, data is static and dated.
8. What is data mining
Data mining is the process of extracting hidden
information from large databases. It is a powerful
technology that helps companies focus on the
most important information in their data
warehouses. Data mining tools predict future
trends and behaviors, allowing businesses to
make knowledge-driven decisions.
10. Methods of data mining
• Descriptive method: It is a method of finding
human-interpretable patterns that describe the
data. For example, it can group together similar
documents returned by a search engine according
to their context.
• Predictive method: In this method, we use
some variables to predict unknown or future
values of other variables. For example, it can
predict whether a newly arrived customer will
spend more than $100 at a department store.
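The predictive method can be sketched with a tiny k-nearest-neighbours classifier for the department store example. This is only one of many possible predictive techniques and is not named in the slides; the features (age, visits per month), the training rows, and the function name predict_big_spender are all hypothetical.

```python
from math import dist

# Hypothetical history: (age, visits per month) -> spent more than $100?
history = [
    ((25, 1), False),
    ((30, 2), False),
    ((45, 6), True),
    ((50, 8), True),
    ((35, 5), True),
]


def predict_big_spender(features, training, k=3):
    """Majority vote among the k training customers closest to `features`."""
    nearest = sorted(training, key=lambda row: dist(row[0], features))[:k]
    votes = sum(1 for _, spent_over_100 in nearest if spent_over_100)
    return votes > k / 2


# Known variables of a newly arrived customer predict the unknown one.
print(predict_big_spender((48, 7), history))
print(predict_big_spender((27, 1), history))
```

In practice the features would be scaled first, since raw Euclidean distance lets the variable with the largest range dominate.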
11. Data mining techniques
• Anomaly detection: The identification of unusual data records that might
be interesting, or of data errors that require further investigation.
• Association rule learning: Searches for relationships between variables.
For example, a supermarket might gather data on customer purchasing
habits. Using association rule learning, the supermarket can determine
which products are frequently bought together and use this information
for marketing purposes.
• Clustering: The task of discovering groups and structures in the data that
are in some way or another "similar", without using known structures in
the data.
• Regression: Attempts to find a function that models the data with the
least error.
• Summarization: Provides a more compact representation of the data set,
including visualization and report generation.
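Two of the techniques above are small enough to sketch directly. The first part counts how often pairs of products appear in the same basket (support) and how often one product accompanies another (confidence), which is the core of the supermarket example. The second part fits a straight line by ordinary least squares as a minimal case of regression. The baskets, numbers, and function names are illustrative assumptions, not data from the slides.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, one set of products per customer visit.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "eggs"},
]


def pair_support(baskets):
    """Fraction of baskets that contain each pair of products."""
    counts = Counter()
    for basket in baskets:
        # Sort so each pair has one canonical (alphabetical) key.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: n / len(baskets) for pair, n in counts.items()}


def confidence(baskets, antecedent, consequent):
    """Of the baskets containing `antecedent`, the share that also contain `consequent`."""
    with_antecedent = [b for b in baskets if antecedent in b]
    if not with_antecedent:
        return 0.0
    return sum(1 for b in with_antecedent if consequent in b) / len(with_antecedent)


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


support = pair_support(baskets)
print(support[("bread", "butter")])              # support of the pair
print(confidence(baskets, "butter", "bread"))    # rule: butter -> bread
print(fit_line([1, 2, 3], [2, 4, 6]))            # slope and intercept
```

Real association rule miners such as Apriori prune the search over larger item sets; counting pairs only is enough to show what support and confidence measure.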
12. Advantages and Disadvantages
Advantages
• Help with decision making.
• Improve company revenue and lower costs.
• Market basket analysis.
Disadvantages
• High cost at the implementation stage.
• Possible misuse of information.
• Possible inaccuracy of data.
13. Conclusion
Organizations today are under tremendous
pressure to compete in an environment of
tight deadlines. Business processes that
require data to be extracted and manipulated
prior to use will no longer be acceptable.
Instead, enterprises need rapid decision
support based on the analysis and forecasting
of predictive behavior. Data-warehousing and
data-mining techniques provide this capability.