Simplicity: The most frequently mentioned advantage of data warehousing is summarised as “simplicity.” Data warehousing makes business simple because a data warehouse provides a single image of business reality by integrating various data. Data warehouses allow existing legacy systems to continue in operation, consolidate inconsistent data from various legacy systems into one coherent set, and reap benefits from vital information about current operations. Current operations can be monitored and compared with past operations, predictions of future operations can be rationally made, new business processes can be devised, and new operational systems can be quickly spawned to support those processes. Data warehouses can also store the large amounts of historical and corporate-wide data that companies need to turn into vital business information. Data warehouses offer the benefit of a single, centralised data location while maintaining local client/server distribution. Furthermore, because data warehouses are company-wide systems, they improve corporate-wide communication.
Better quality data; improved productivity: Another frequently mentioned advantage is better quality data. Other data quality issues include consistency, accuracy, and documentation. Improved decision making through online analytical processing (OLAP) and data mining analysis can lead to improvements in productivity.
Fast access: Since data warehouses allow users to retrieve necessary data by themselves, the workload of the IS department is reduced. The necessary data is in one place, so system response time should be reduced.
Easy to use: Data warehouses focus on subjects, are targeted at end users, and support on-time, ad-hoc queries for fast decision-making as well as regular reporting.
Separate decision-support operation from production operation: Another advantage is that data warehouses separate operational, continually updated transaction data from the historical, more static data required for business analysis. Because some of the operational data is copied into a separate database, managers and analysts can query historical data for their decision-making activities without interfering with, or slowing down, the production operation.
Gives competitive advantage: Data warehouses better manage and utilise corporate knowledge, which in turn helps a business become more competitive, better understand customers, and more rapidly meet market demands.
Ultimate distributed database: Data warehouses pull together information from disparate and potentially incompatible locations throughout the organisation and put it to good use. Middleware, data transfer software and other client/server tools are used to link those disparate data sources. In this sense, a data warehouse is the ultimate distributed database.
Information flow management: Data warehouses handle a large amount of data from various operational data sources, and they manage the flow of information rather than just collecting data. To respond to changing business needs, production systems are constantly changing, along with their data encoding and structures. Data warehouses, and especially their metadata, support the continuous, incremental refinement needed to track both the production systems and the changing business environment.
Enables parallel processing: Users can ask questions that were previously too process-intensive to answer, and a data warehouse can handle more customers, users, transactions, queries, and messages. It supports the higher performance demands of a client/server environment and provides greater scalability and, thus, better price/performance.
Robust processing engines: Data warehouses allow users to directly obtain and refine data from different software applications without affecting the operational databases, and to integrate different business tasks into a single, streamlined process supported by real-time information. This provides users with robust processing engines.
Security: Since clients of the data warehouse cannot directly query the production databases, both the security and the productivity of the production databases are increased. Some warehouses also provide management services for handling security.
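The separation of decision-support data from production data described above can be illustrated with a minimal sketch. It assumes two SQLite in-memory databases standing in for the operational system and the warehouse; the table names `orders` and `daily_sales`, the function `load_warehouse`, and the sample rows are all hypothetical and are not taken from any particular product.

```python
import sqlite3


def load_warehouse(operational: sqlite3.Connection,
                   warehouse: sqlite3.Connection) -> int:
    """Copy a summarised view of operational data into the warehouse.

    Returns the number of summary rows loaded.
    """
    warehouse.execute(
        "CREATE TABLE IF NOT EXISTS daily_sales ("
        "day TEXT, region TEXT, total REAL, PRIMARY KEY (day, region))"
    )
    # One read against the production database, then no further contact.
    rows = operational.execute(
        "SELECT day, region, SUM(amount) FROM orders GROUP BY day, region"
    ).fetchall()
    warehouse.executemany(
        "INSERT OR REPLACE INTO daily_sales (day, region, total) "
        "VALUES (?, ?, ?)",
        rows,
    )
    warehouse.commit()
    return len(rows)


# Hypothetical production (operational) database with transaction data.
operational = sqlite3.connect(":memory:")
operational.execute("CREATE TABLE orders (day TEXT, region TEXT, amount REAL)")
operational.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("2024-01-01", "north", 100.0),
        ("2024-01-01", "north", 50.0),
        ("2024-01-01", "south", 75.0),
        ("2024-01-02", "north", 20.0),
    ],
)
operational.commit()

# Separate database used only for analysis.
warehouse = sqlite3.connect(":memory:")
loaded = load_warehouse(operational, warehouse)

# Analysts query the warehouse; the production database is not touched.
north_total = warehouse.execute(
    "SELECT SUM(total) FROM daily_sales WHERE region = 'north'"
).fetchone()[0]
```

In this sketch the only load placed on the production database is the single extraction query; every later analytical query runs against the separate copy.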
Complexity and anticipation in development: The disadvantage mentioned most frequently is complexity in development. IS cannot just buy a data warehouse; IS has to build one, because each warehouse has a unique architecture and a set of requirements that spring from the individual needs of the organisation. IS needs to ask a wide range of questions in building it. Builders need to pay as much attention to the structure, definitions, and flow of data as they do to choosing hardware and software. Data warehouse construction requires a sense of anticipation about future ways to use the collected records. Developers need to be aware of the constantly changing needs of their company’s business and of the capabilities of the available and emerging hardware and software. Deciding how to scale the warehouse to meet increasing user demand, in both volume and complexity, makes its development more complex still. There are also difficulties in choosing the right products. In summary, developing such a large database requires an expert.
Takes time to build: Building a data warehouse takes time (2 to 3 years). In a situation where strong executive sponsorship is not present, IS directors or others wishing to develop a warehouse may spend an inordinate amount of time justifying the need.
Expensive to build: Similarly, a data warehouse is expensive to build ($2 to $3 million). One reason data warehouses are so expensive is that data must be moved or copied from existing databases, sometimes manually, and translated into a common format.
End-user training: It is necessary to create a new “mind-set” among all employees, who must be prepared to capitalise upon the innovative data analysis provided by data warehouses; those end users require extensive training. A communication plan is essential to educate all constituents.
Complexity involved in SMP and MPP: The complexity of data warehousing increases if the warehouse involves symmetric multiprocessing (SMP) or massively parallel processing (MPP), because synchronisation and shared access are difficult.
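The synchronisation issue can be seen even at a very small scale. The following is only an illustrative sketch using Python threads, not a model of any SMP or MPP warehouse product; the function `parallel_sum` and the partitioning scheme are assumptions made for the example. Each worker can aggregate its own partition independently, but the update to the shared total has to be coordinated.

```python
import threading


def parallel_sum(partitions):
    """Sum several data partitions in parallel using a shared running total."""
    total = 0
    lock = threading.Lock()

    def worker(rows):
        nonlocal total
        partial = sum(rows)  # independent work: no coordination needed
        with lock:           # shared state: only one worker may update at a time
            total += partial

    threads = [threading.Thread(target=worker, args=(p,)) for p in partitions]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total


# Hypothetical data set split into four partitions.
partitions = [list(range(i, 1000, 4)) for i in range(4)]
result = parallel_sum(partitions)
```

The lock is the part that grows harder in a real system: the more processors share the same data, the more carefully access to it must be coordinated.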
Traditionally, the task of identifying and utilising information hidden in data has been achieved through some form of statistical method.
Typically, this involves a user formulating a guess about a possible relationship in the data and evaluating this hypothesis via a statistical test. This is a largely time-intensive, user-driven, top-down approach to data analysis.
With data mining, the interrogation of the data is done by the data mining algorithm rather than by the user.
Data mining is a self-organising, data-influenced, bottom-up approach to data analysis.
Simply put, what data mining does is sort through masses of data to uncover patterns and relationships, then build models to predict behaviours.
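To make the bottom-up idea concrete, the sketch below lets an algorithm scan every pair of items in a set of purchase records and report the pairs that occur together often, without the user proposing any pair in advance. Frequent-pair counting is used here only as one simple illustration of pattern discovery; the function `frequent_pairs`, the `min_support` parameter, and the sample baskets are assumptions made for the example.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(transactions, min_support):
    """Return item pairs whose share of all transactions is >= min_support."""
    pair_counts = Counter()
    for basket in transactions:
        # Count each unordered pair at most once per transaction.
        for pair in combinations(sorted(set(basket)), 2):
            pair_counts[pair] += 1
    n = len(transactions)
    return {
        pair: count / n
        for pair, count in pair_counts.items()
        if count / n >= min_support
    }


# Hypothetical purchase records.
transactions = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "butter"],
    ["bread", "milk", "eggs"],
]
patterns = frequent_pairs(transactions, min_support=0.6)
# patterns == {("bread", "milk"): 0.6}
```

In the top-down approach a user would have had to guess that bread and milk are related and then test that guess; here the relationship is surfaced by the scan itself.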
Statistics – the most mature of the data mining technologies, but often not applicable because they need clean data. In addition, many statistical procedures assume linear relationships, which limits their use.
Neural networks, genetic algorithms, fuzzy logic – these technologies are able to work with complicated and imprecise data. Their broad applicability has made them popular in the field.
Decision trees – these technologies are conceptually simple and have gained in popularity as better tree-growing software has been introduced. Because of the way they are used, they are perhaps better called “classification” trees.
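The conceptual simplicity of decision trees can be seen in a single split, the basic step that tree-growing software repeats. The sketch below picks the threshold on one numeric attribute that best separates two classes, using Gini impurity as the split criterion. Gini impurity is one common choice and is assumed here; the functions `gini` and `best_split` and the income/default data are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure group)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split(values, labels):
    """Return (threshold, weighted impurity) for the best split 'value <= threshold'.

    Returns None if all values are identical and no split is possible.
    """
    best = None
    n = len(values)
    # The largest value is skipped: splitting there leaves the right side empty.
    for threshold in sorted(set(values))[:-1]:
        left = [lab for v, lab in zip(values, labels) if v <= threshold]
        right = [lab for v, lab in zip(values, labels) if v > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if best is None or score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical records: annual income (in thousands) and whether a loan defaulted.
incomes = [18, 22, 25, 40, 52, 60]
defaulted = ["yes", "yes", "yes", "no", "no", "no"]
threshold, impurity = best_split(incomes, defaulted)
# threshold == 25, impurity == 0.0
```

A full tree applies the same search recursively to each resulting group, which is why the result reads as a set of classification rules such as "income <= 25: likely to default".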
Pharmaceuticals: Massive amounts of biological and clinical information can be analysed with data mining methods to discover new uses for existing drugs.
Healthcare: Hospitals are using data mining to perform utilisation analysis and pricing analysis, to estimate outcomes, to improve preventive care, and to detect fraud and questionable practices.
Banking: Data mining tools help banks to understand customer behaviour, conduct profitability analysis, improve cross-selling efforts, identify credit risk, identify customers for loan campaigns, tailor financial products to meet customer needs, seek new customers, and enhance customer service.
Credit card companies: Predictors for credit card customer attrition and fraud are frequently identified via data mining. Successful users of data mining include American Express and Citibank.
Financial services: Security analysts are using data mining extensively to analyse large volumes of financial data in order to build trading and risk models for developing investment strategies.
Telemarketing and direct marketing: In this sector, companies have achieved large savings and are able to target customers more accurately by using data mining. Direct marketers are configuring and mailing their product catalogs based on customers' purchase history and demographic data.
Airlines: As the competition in the airline business increases, understanding customers' needs has become imperative. Airlines capture customer data in order to make strategic moves such as expanding their services to new routes.
Manufacturers: Data mining is widely used in manufacturing industries to control and schedule technical production processes.
Insurance companies: The insurance industry is data intensive. Data mining has recently provided insurers with a wealth of useful information extracted from huge databases for decision making.
Telecommunications: By applying the insights learned through data mining, telecommunications companies can identify products and services that maximise value and then use this information to establish marketing campaigns to improve market share. A common example in this industry is identifying factors that influence customer retention. In the US, telephone companies were famous for their price-cutting strategy in the past, but the new strategy is to know their customers better. Using data mining, telephone companies are able to provide customers with a great variety of new services they are likely to purchase.
Distribution and retailing: With the huge amount of consumer data flowing in daily from different sources, especially from e-commerce Web sites, data mining helps companies learn more about their customers and develop insights into their buying habits. Knowing the behaviours (e.g. likes and dislikes) of customers leads to better customer service and allows companies to create one-to-one relationships with customers, hopefully prolonging loyalty and prompting repeat business. As such, data mining is used extensively in the area of customer relationship management. Large users of data mining in the retailing industry include Wal-Mart and Victoria's Secret.
Remotely sensed data: Huge amounts of remotely sensed data are collected every day from satellite images and other related sources. Data mining is used to predict weather and to monitor and reason about ozone depletion, among other tasks.
Provide better information to achieve competitive edge
This advantage is the primary motivation for data mining. Data mining has a powerful analytical ability to generate information, which allows an organisation to better understand itself, its customers, and the marketplace it competes in. When used as a marketing tool, data mining often results in a sharper competitive edge, an evidence-based selling approach, a customer-oriented marketing plan, shorter selling cycles, and reduced operational costs.
Add value to a data warehouse
A data warehouse by itself is just a large repository of unstructured data, and data mining is the process of analysing the data and transforming it into useful information. Organisations have experienced a payback of 10 to 70 times their data warehouse investment after data mining components are added.
Increase operating efficiency
Data mining's ability to quickly organise and analyse a large pool of data has dramatically increased workplace efficiency. It allows users to create complex financial statements in minutes, compared with weeks using traditional methods.
With data mining, users gain control over the data. Instead of letting the system push the data, users are now able to pull the data they need. Users can let their imagination run and manipulate data in various ways to answer their questions. The easy-to-use interface of data mining tools and client/server technology has made the information directly accessible by individual users.
Reduce operating costs
Modern data mining tools are built from highly sophisticated hardware and software components, which allow them to analyse massive data sets efficiently and at reduced operating cost. For example, the high costs faced by public sector organisations such as healthcare providers when asked to answer a “parliamentary question” raised in the Oireachtas could be reduced by the use of data warehouses and data mining.
Unlike traditional data analysis methods, data mining requires little pre-processing of data prior to analysis. It can use a mixture of numeric, categorical, and date data, and can tolerate missing and noisy data. The results are in the form of ready-to-use business rules, with almost no statistical expertise or guesswork needed.
Solve research bottleneck
In many social science and business situations, conducting real experiments is almost impossible. Data mining is able to provide these research agendas with a more limited set of working hypotheses for further investigation based on large, unstructured data sets.
Data mining yields useful insights and clues but no definitive answers. The definitive answers need to be achieved through much more rigorous scientific experimentation. Experiences from Wall Street have shown that this technology may not outperform traditional methods. Therefore, users should have a realistic expectation of the results of data mining.
The cost of implementing data mining is quite high; thus, it may not be appropriate in some business environments. The return on investment (ROI) needs to be justified through cost-benefit analysis.
Complex and lengthy project
Experience from data mining system developers has shown that it takes a long time to get the project right. Developers suggest focusing on incremental development and benefits.
The detailed data about individuals used in data mining might involve a violation of privacy. This problem worsens when the World Wide Web is involved, because detailed personal information is easily accessible and can fall into the wrong hands.
Despite its increasingly simple interface and automation of the thinking processes, data mining is more suitable for people with statistical, operations research, and management science backgrounds. Ease of use therefore becomes a critical factor in attracting more businesses to invest in this technology.
Many authors have suggested that organisations must increase the size of their databases tremendously in order to do data mining. However, some are concerned that this will result in unmanageable and unnecessary databases.
Wrong information from errors in data
The massive volumes of data used in data mining inevitably contain mistakes caused by human error. Information generated should be used with caution to avoid lawsuits in areas such as hiring. Experts suggest using only relevant information for mining to reduce such risks.