ABSTRACT
We live in the age of information. Data is the most valuable resource of an enterprise. In today’s competitive global business environment, understanding and managing enterprise-wide information is crucial for making timely decisions and responding to changing business conditions. Many companies are realizing a business advantage by leveraging one of their key assets: business data. Day-to-day operational applications generate a tremendous amount of data. In addition, valuable data is available from external sources such as market research organizations, independent surveys and quality testing labs. Studies indicate that the amount of data in a given organization doubles every 5 years.
Data warehousing has emerged as an increasingly popular and powerful way of applying information technology to turn these huge islands of data into meaningful information for better business. Data mining, the extraction of hidden predictive information from large databases, is a powerful new technology with great potential to help companies focus on the most important information in their data warehouses.
Data mining tools predict future trends and behaviors, allowing businesses to make proactive, knowledge-driven decisions.
This paper describes the practicalities and constraints of data mining and data warehousing and their advances over earlier technologies.

INTRODUCTION
Data Warehousing
A data warehouse can be defined as any centralized data repository that can be queried for business benefit. Warehousing makes it possible to:
o Extract archived operational data
o Overcome inconsistencies between different legacy data formats
o Integrate data throughout an enterprise, regardless of location, format, or communication requirements
o Incorporate additional or expert information

Data Mining
Data mining is not an “intelligence” tool or framework. Business intelligence, typically drawn from an enterprise data warehouse, is used to analyze and uncover information about past performance on an aggregate level. Data warehousing and business intelligence provide a method for users to anticipate future trends by analyzing past patterns in organizational data. Data mining is more intuitive, allowing for increased insight beyond data warehousing.
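The aggregate, backward-looking reporting described above is typically produced by a warehouse query that groups facts by dimensions, e.g. sales counted per territory and salesperson over a date range. The following is a minimal Python sketch, not a real warehouse interface: the `SALES` rows, their field layout and the function name `sales_by_territory_and_person` are all hypothetical, and in practice this would usually be a SQL `GROUP BY` over a fact table.

```python
from collections import defaultdict
from datetime import date

# Hypothetical warehouse fact rows: (sale_date, territory, salesperson, amount).
SALES = [
    (date(1999, 5, 3), "North", "Alice", 120.0),
    (date(1999, 5, 17), "North", "Bob", 80.0),
    (date(1999, 6, 2), "South", "Alice", 200.0),
    (date(1999, 6, 28), "North", "Alice", 50.0),
    (date(1999, 7, 1), "South", "Bob", 999.0),  # outside May-June, so excluded
]


def sales_by_territory_and_person(rows, start, end):
    """Count sales and total revenue per (territory, salesperson) within [start, end]."""
    summary = defaultdict(lambda: {"count": 0, "total": 0.0})
    for sale_date, territory, person, amount in rows:
        if start <= sale_date <= end:
            cell = summary[(territory, person)]
            cell["count"] += 1
            cell["total"] += amount
    return dict(summary)


report = sales_by_territory_and_person(SALES, date(1999, 5, 1), date(1999, 6, 30))
for key in sorted(report):
    print(key, report[key])
```

The result only summarizes what already happened; it predicts nothing, which is the gap data mining is meant to fill.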
An implementation of data mining in an organization will serve as a guide to uncover inherent trends and tendencies in historical information, as well as allow for statistical predictions, groupings and classification of data.
Typical data warehousing implementations allow users to ask and answer questions such as “How many sales were made, by territory, by salesperson, between May and June 1999?” Data mining allows business decision makers to ask and answer questions such as “Who is my core customer for a particular product we sell?” or “Geographically, how well would a line of products sell in a particular region, and who would purchase them, given the sales of similar products in that region?”

WHAT IS DATA MINING?
Generally, data mining (sometimes called data or knowledge discovery) is the process of analyzing data from different perspectives and summarizing it into useful information: information that can be used to increase revenue, cut costs, or both. Data mining software is one of a number of analytical tools for analyzing data. It allows users to analyze data from many different dimensions or angles, categorize it,
and summarize the relationships identified. Technically, data mining is the process of finding correlations or patterns among dozens of fields in large relational databases. Although data mining is a relatively new term, the technology is not. Companies have used powerful computers to sift through volumes of supermarket scanner data and analyze market research reports for years. However, continuous innovations in computer processing power, disk storage, and statistical software are dramatically increasing the accuracy of analysis while driving down the cost.

WHAT IS DATA WAREHOUSING?
Dramatic advances in data capture, processing power, data transmission, and storage capabilities are enabling organizations to integrate their various databases into data warehouses. Data warehousing is defined as a process of centralized data management and retrieval. Data warehousing, like data mining, is a relatively new term, although the concept itself has been around for years. Data warehousing represents an ideal vision of maintaining a central repository of all organizational data. Centralization of data is needed to maximize user access and analysis. Dramatic technological advances are making this vision a reality for many companies, and equally dramatic advances in data analysis software are allowing users to access this data freely. The data analysis software is what supports data mining.
According to Bill Inmon, author of Building the Data Warehouse and widely considered the originator of the data warehousing concept, four characteristics generally describe a data warehouse:
1. Subject-oriented: data are organized according to subject instead of application. For example, an insurance company using a data warehouse would organize its data by customer, premium, and claim instead of by product (auto, life, etc.). The data organized by subject contain only the information necessary for decision support processing.
2. Integrated: when data reside in many separate applications in the operational environment, encoding of data is often inconsistent. For instance, one application might code gender as “m” and “f” and another as 0 and 1. When data are moved from the operational environment into the data warehouse, they assume a consistent coding convention, e.g. gender data is transformed to “m” and “f”.
3. Time-variant: the data warehouse contains a place for storing data that are five to ten years old, or older, to be used for comparisons, trends, and forecasting.
4. Non-volatile: these data are not updated once they are in the warehouse.

An Overview of Data Mining Techniques
Some of the most common data mining algorithms in use today fall into two broad groups: 1) classical techniques such as statistics, neighborhoods and clustering, and 2) next-generation techniques such as trees, networks and rules.

HOW DO DATA MINING AND DATA WAREHOUSING WORK TOGETHER?
Extracting meaningful information from numerous databases and cross-referencing it to find patterns, trends and correlations that might otherwise be overlooked is called “data mining.” Assembling the information in one place is called “data warehousing.”
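The Integrated characteristic, and the step of assembling information in one place, can be illustrated with the gender-coding example from the text. This is a minimal sketch assuming two hypothetical source applications (`APP_A` codes gender as “m”/“f”, `APP_B` as 0/1); the mapping of 0 to “m” and 1 to “f” is an assumption, since the text does not say which number stands for which value.

```python
# Hypothetical extracts from two operational applications with different encodings.
APP_A = [{"id": 1, "gender": "m"}, {"id": 2, "gender": "f"}]
APP_B = [{"id": 3, "sex": 0}, {"id": 4, "sex": 1}]

# Assumed mapping definition from APP_B's codes to the warehouse convention.
B_TO_WAREHOUSE = {0: "m", 1: "f"}


def integrate(app_a, app_b):
    """Load both sources into one table that uses the warehouse's 'm'/'f' coding."""
    warehouse = []
    for row in app_a:
        # APP_A already uses the warehouse convention, so copy the code as-is.
        warehouse.append({"customer_id": row["id"], "gender": row["gender"], "source": "A"})
    for row in app_b:
        # APP_B's numeric code is translated through the mapping definition.
        warehouse.append(
            {"customer_id": row["id"], "gender": B_TO_WAREHOUSE[row["sex"]], "source": "B"}
        )
    return warehouse


warehouse = integrate(APP_A, APP_B)
for row in warehouse:
    print(row)
```

Keeping the mapping in a separate table rather than hard-coding it mirrors the “mapping definition” and metadata that warehousing tools maintain for each source.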
Data Mining and Data Warehousing
1. All the information is stored in information repositories.
2. The data warehouse takes in the cleaned and integrated data.
3. Data from the warehouse is selected and transformed, and the relevant data is passed to data mining.
4. The output of data mining is evaluated and presented.

APPLICATIONS
Data Warehousing
Insulate data, i.e. the current operational information
 o Preserves the security and integrity of mission-critical OLTP applications
 o Gives access to the broadest possible base of data
Retrieve data from a variety of heterogeneous operational databases
 o Data is transformed and delivered to the data warehouse/store based on a selected model (or mapping definition)
 o Metadata: information describing the model and the definition of the source data elements
Data cleansing: removal of certain aspects of operational data, such as low-level transaction information, which slow down query times
Transfer: processed data is transferred to the data warehouse, a large database on a high-performance machine

Data Mining
Medicine: drug side effects, hospital cost analysis, genetic sequence analysis, prediction, etc.
Finance: stock market prediction, credit assessment, fraud detection, etc.
Marketing/sales: product analysis, buying patterns, sales prediction, target mailing, identifying “unusual behavior”, etc.
Knowledge acquisition
Scientific discovery: superconductivity research, etc.
Engineering: automotive diagnostic expert systems, fault detection, etc.

ADVANTAGES:
1. Enhances end-user access to a wide variety of data.
2. Business decision makers can obtain various kinds of trend reports, e.g. the item with the most sales in a particular area or country over the last two years.
A data warehouse can be a significant enabler of commercial business applications, most notably Customer Relationship Management (CRM).
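The four-step flow described at the start of this section can be sketched end to end for the “buying patterns” use listed under Marketing/sales. Everything here is hypothetical: the repositories, item names, function names and the 0.5 threshold. Support is taken as the fraction of baskets containing a given pair of items; a production system would more likely use a dedicated frequent-itemset algorithm such as Apriori than count every pair.

```python
from collections import Counter
from itertools import combinations

# Step 1 (hypothetical): raw transactions held in two information repositories.
REPOSITORIES = [
    [("t1", ["bread", "milk"]), ("t2", ["bread", "butter", "milk"])],
    [("t3", []), ("t4", ["Milk", "bread"]), ("t5", ["butter", "jam"])],
]


def clean_and_integrate(repositories):
    """Step 2: merge repositories, drop empty baskets, normalise item names."""
    baskets = []
    for repo in repositories:
        for _tid, items in repo:
            if items:
                baskets.append(frozenset(item.lower() for item in items))
    return baskets


def select_and_transform(baskets, min_size=2):
    """Step 3a: keep only baskets large enough to contain a pair of items."""
    return [b for b in baskets if len(b) >= min_size]


def mine_pairs(baskets):
    """Step 3b: count how often each pair of items is bought together."""
    counts = Counter()
    for basket in baskets:
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts


def evaluate(counts, n_baskets, min_support=0.5):
    """Step 4: keep pairs whose support (fraction of baskets) meets the threshold."""
    return {pair: c / n_baskets for pair, c in counts.items() if c / n_baskets >= min_support}


baskets = select_and_transform(clean_and_integrate(REPOSITORIES))
patterns = evaluate(mine_pairs(baskets), len(baskets))
print(patterns)  # {('bread', 'milk'): 0.75}
```

Only the bread/milk pair appears in at least half of the four cleaned baskets, so it is the single pattern presented.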
DISADVANTAGES:
Data mining systems rely on databases to supply the raw data for input, and this raises problems because databases tend to be dynamic, incomplete, noisy, and large. Other problems arise from the adequacy and relevance of the information stored.
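Incomplete and noisy input, two of the problems named above, can at least be measured before mining starts. This is a minimal sketch assuming hypothetical customer records in which `None` marks a missing value; the valid range per attribute is a hand-chosen assumption standing in for domain knowledge about what counts as noise.

```python
# Hypothetical input records; None marks a missing value.
RECORDS = [
    {"age": 34, "monthly_spend": 120.0},
    {"age": None, "monthly_spend": 80.0},
    {"age": 29, "monthly_spend": None},
    {"age": 420, "monthly_spend": -5.0},  # implausible values: likely noise
]

# Assumed plausible ranges; real limits depend on the application domain.
VALID_RANGES = {"age": (0, 120), "monthly_spend": (0.0, 10000.0)}


def profile(records, valid_ranges):
    """Count missing and out-of-range values for each attribute."""
    report = {}
    for attr, (low, high) in valid_ranges.items():
        values = [r.get(attr) for r in records]
        missing = sum(v is None for v in values)
        out_of_range = sum(v is not None and not (low <= v <= high) for v in values)
        report[attr] = {"missing": missing, "out_of_range": out_of_range}
    return report


quality = profile(RECORDS, VALID_RANGES)
print(quality)
```

A profile like this does not fix anything by itself, but it shows how much of the input is affected before deciding how to treat it.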
Limited Information
A database is often designed for purposes different from data mining, and sometimes the properties or attributes that would simplify the learning task are neither present nor obtainable from the real world. Incomplete data causes problems because, if some attributes essential to knowledge about the application domain are not present in the data, it may be impossible to discover significant knowledge about that domain. For example, one cannot diagnose malaria from a patient database if that database does not contain the patients’ red blood cell counts.
Missing data can be treated by discovery systems in a number of ways, such as:
o Simply disregard missing values
o Omit the corresponding records
o Infer missing values from known values
o Treat missing data as a special value to be included additionally in the attribute domain
o Average over the missing values using Bayesian techniques

FUTURE VIEWS
The future of data mining lies in predictive analytics. The technology innovations in data mining since 2000 have been truly Darwinian and show promise of consolidating and stabilizing around predictive analytics. Variations, novelties and new candidate features have been expressed in a proliferation of small start-ups that have been ruthlessly culled from the herd by a perfect storm of bad economic news. Nevertheless, the emerging market for predictive analytics has been sustained by professional services, service bureaus (rent a recommendation) and profitable applications in verticals such as retail, consumer finance, telecommunications, travel and leisure, and related analytic applications. Predictive analytics has successfully proliferated into applications that support customer recommendations, customer value and churn management, campaign optimization, and fraud detection. On the product side, success stories in demand planning, just-in-time inventory and market basket optimization are a staple of predictive analytics.
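Churn management, one of the applications listed above, is a classification problem: predict whether a customer will leave, based on past customers whose outcome is known. The sketch uses k-nearest neighbors, a “neighborhood” technique of the kind the document groups under classical methods; the choice of algorithm, the two features, the training data, the function name and k = 3 are all assumptions for illustration.

```python
import math
from collections import Counter

# Hypothetical labelled history: (months_as_customer, support_calls_last_quarter) -> churned?
HISTORY = [
    ((24, 0), False),
    ((36, 1), False),
    ((30, 0), False),
    ((3, 5), True),
    ((5, 4), True),
    ((2, 6), True),
]


def predict_churn(customer, history, k=3):
    """Majority vote among the k past customers nearest in Euclidean distance."""
    nearest = sorted(history, key=lambda item: math.dist(customer, item[0]))[:k]
    votes = Counter(label for _features, label in nearest)
    return votes.most_common(1)[0][0]


print(predict_churn((4, 5), HISTORY))   # True: short tenure, many calls
print(predict_churn((28, 1), HISTORY))  # False: long tenure, few calls
```

The features are not rescaled here; because tenure spans a wider numeric range than call counts, real data would normally be standardized first so that one feature does not dominate the distance.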
Predictive analytics should be used to get to know the customer, segment and predict customer behavior, and forecast product demand and related market dynamics. Be realistic about the complex mixture of business acumen, statistical processing and information technology support that is required, as well as the fragility of the resulting predictive model, but make no assumptions about the limits of predictive analytics. Breakthroughs often occur when the tools and methods are applied to new commercial opportunities.
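Forecasting product demand, mentioned above, can be as simple as fitting a straight-line trend to past sales and extending it. This is a minimal sketch using ordinary least squares on hypothetical monthly unit sales; it is one simple technique chosen for illustration, not a method the text prescribes, and the function names are hypothetical.

```python
def fit_linear_trend(demand):
    """Least-squares fit of demand[t] = a + b * t for t = 0, 1, ..., n - 1."""
    n = len(demand)
    if n < 2:
        raise ValueError("need at least two periods")
    t_mean = (n - 1) / 2
    d_mean = sum(demand) / n
    cov = sum((t - t_mean) * (d - d_mean) for t, d in enumerate(demand))
    var = sum((t - t_mean) ** 2 for t in range(n))
    b = cov / var          # slope: change in demand per period
    a = d_mean - b * t_mean  # intercept at t = 0
    return a, b


def forecast(demand, periods_ahead=1):
    """Extend the fitted trend to the next `periods_ahead` periods."""
    a, b = fit_linear_trend(demand)
    n = len(demand)
    return [a + b * (n + h) for h in range(periods_ahead)]


monthly_units = [100, 110, 120, 130]  # hypothetical history
print(forecast(monthly_units, 2))  # [140.0, 150.0]
```

A straight line ignores seasonality and sudden market shifts, which is a concrete instance of the fragility of predictive models noted above.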
CONCLUSION:
Comprehensive data warehouses that integrate operational data with customer, supplier, and market information have resulted in an explosion of information. Competition requires timely and sophisticated analysis on an integrated view of the data. However, there is a growing gap between more powerful storage and retrieval systems and the users’ ability to effectively analyze and act on the information they contain. Both relational and OLAP technologies have tremendous capabilities for navigating massive data warehouses, but brute-force navigation of data is not enough. A new technological leap is needed to structure and prioritize information for specific end-user problems. Data mining tools can make this leap. Quantifiable business benefits have been proven through the integration of data mining with current information systems, and new products are on the horizon that will bring this integration to an even wider audience of users.
o Data mining has a lot of potential.
o The field is diverse in its applications.
o The estimated market for data mining is $500 million.