PRESENTATION ON DATA WAREHOUSING AND DATA MINING
SUBMITTED TO:
MRS. MANISHA BHATNAGAR (HOD OF COMP. SCI. DEPT)
MRS. HARKAWNALJEET KAUR (ASST. PROF. OF COMP. SCIENCE DEPT)
SUBMITTED BY:
MCA-III
ROLL NO: 9
CONTENTS
• DATA WAREHOUSE
• CHARACTERISTICS OF DATA WAREHOUSE
• ARCHITECTURE OF DATA WAREHOUSE
• DATA STORING IN DATA WAREHOUSE
• DATA WAREHOUSE DIMENSIONAL MODELLING
• INSTALLING THE SERVICE MANAGER DATA WAREHOUSE SERVER
• EXAMPLE OF DATA WAREHOUSE
• ADVANTAGES OF DATA WAREHOUSE
• DISADVANTAGES OF DATA WAREHOUSE
CONTENTS
• DATA MINING
• ELEMENTS OF DATA MINING
• DATA MINING PROCESS
• ARCHITECTURE OF DATA MINING
• ADDING THE DATA MINING OPTION TO A DATABASE
• ADVANTAGES OF DATA MINING
• DISADVANTAGES OF DATA MINING
DATA WAREHOUSE
A data warehouse is a relational database that is designed for query and analysis rather than for transaction processing. It usually contains historical data derived from transaction data, but can include data from other sources.
DEFINITION OF DATA WAREHOUSE
“A data warehouse is simply a single, complete, and consistent store of data obtained from a variety of different sources and made available to end users in a way they can understand and use it in a business context.” (BARRY DEVLIN, IBM CONSULTANT)
CHARACTERISTICS OF DATA WAREHOUSING
Subject Oriented: Data are organized according to subject instead of application. Data warehouses are designed to help you analyze data by subject, such as sales.
Integrated: Integration is closely related to subject orientation. Data warehouses must put data from disparate sources into a consistent format.
Nonvolatile: Nonvolatile means that, once entered into the data warehouse, data should not change.
Time Variant: A data warehouse maintains historical data, which are used to analyze business or market trends and facilitate future predictions.
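The nonvolatile and time-variant characteristics can be illustrated with a minimal in-memory sketch. The names `WarehouseTable`, `Snapshot` and `load_date` are hypothetical; a real warehouse would keep such rows in database tables, but the idea is the same: rows are only appended, and each row records when it was loaded so past states can still be queried.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Snapshot:
    """One row of a hypothetical warehouse table."""
    customer_id: int
    city: str
    load_date: date  # time variant: every row records when it was loaded


class WarehouseTable:
    def __init__(self) -> None:
        self._rows: List[Snapshot] = []

    def load(self, row: Snapshot) -> None:
        # Nonvolatile: new data is appended; existing rows are never updated or deleted.
        self._rows.append(row)

    def as_of(self, customer_id: int, when: date) -> Optional[Snapshot]:
        # Time variant: return the most recent row loaded on or before `when`.
        candidates = [r for r in self._rows
                      if r.customer_id == customer_id and r.load_date <= when]
        return max(candidates, key=lambda r: r.load_date, default=None)


table = WarehouseTable()
table.load(Snapshot(1, "Delhi", date(2020, 1, 1)))
table.load(Snapshot(1, "Mumbai", date(2022, 6, 1)))  # customer moved; the old row is kept
print(table.as_of(1, date(2021, 1, 1)).city)  # Delhi
print(table.as_of(1, date(2023, 1, 1)).city)  # Mumbai
```

Because the 2020 row is never overwritten, a 2021 report and a 2023 report each see the customer's city as it was at that time.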
Data Warehouse Architectures
Data warehouses and their architectures vary depending upon the specifics of an organization's situation. Three common architectures are:
■ Data Warehouse Architecture (Basic)
■ Data Warehouse Architecture (with a Staging Area)
■ Data Warehouse Architecture (with a Staging Area and Data Marts)
Data Warehouse Architecture (Basic)
The figure shows a simple architecture for a data warehouse. End users directly access data derived from several source systems through the data warehouse.
Data Warehouse Architecture (with a Staging Area)
You need to clean and process your operational data before putting it into the warehouse. You can do this programmatically, although most data warehouses use a staging area instead. A staging area simplifies building summaries and general warehouse management. The figure illustrates this typical architecture.
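The cleaning step that a staging area performs can be sketched in a few lines. The records, field names and cleaning rules below are hypothetical examples (drop duplicates, reject unparseable amounts, standardize names); real staging logic depends on the source systems.

```python
# Hypothetical raw records extracted from operational systems.
raw_orders = [
    {"order_id": "1", "customer": " asha ", "amount": "250.00"},
    {"order_id": "2", "customer": "RAVI", "amount": "120.5"},
    {"order_id": "3", "customer": "Meena", "amount": "n/a"},    # unparseable amount
    {"order_id": "1", "customer": "Asha", "amount": "250.00"},  # duplicate of order 1
]


def clean_in_staging(rows):
    """Clean and standardize rows in the staging area before loading them."""
    staged, seen = [], set()
    for row in rows:
        if row["order_id"] in seen:
            continue  # drop duplicate orders
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # reject rows whose amount cannot be parsed
        seen.add(row["order_id"])
        staged.append({
            "order_id": int(row["order_id"]),
            "customer": row["customer"].strip().title(),  # one consistent name format
            "amount": amount,
        })
    return staged


warehouse = []  # stands in for the warehouse fact table
warehouse.extend(clean_in_staging(raw_orders))
print(warehouse)
# [{'order_id': 1, 'customer': 'Asha', 'amount': 250.0},
#  {'order_id': 2, 'customer': 'Ravi', 'amount': 120.5}]

# Summaries are easy to build once the data is clean.
total = sum(r["amount"] for r in warehouse)
print(total)  # 370.5
```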
[Figure: Data Warehouse Architecture (with a Staging Area)]
Data Warehouse Architecture (with a Staging Area and Data Marts)
Although the architecture in the figure is quite common, you may want to customize your warehouse's architecture for different groups within your organization. You can do this by adding data marts, which are systems designed for a particular line of business. The figure illustrates an example where purchasing, sales, and inventories are separated. In this example, a financial analyst might want to analyze historical data for purchases and sales.
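Deriving data marts from the warehouse can be sketched as filtering the integrated data down to one line of business and pre-aggregating it. The rows and the `build_data_mart` helper are hypothetical; in practice each mart would usually be its own set of tables.

```python
from collections import defaultdict

# Hypothetical integrated warehouse rows covering several lines of business.
warehouse_rows = [
    {"area": "sales", "year": 2022, "amount": 500},
    {"area": "sales", "year": 2023, "amount": 700},
    {"area": "purchasing", "year": 2023, "amount": 300},
    {"area": "inventory", "year": 2023, "amount": 900},
]


def build_data_mart(rows, area):
    """Keep only one line of business and pre-aggregate its amounts by year."""
    mart = defaultdict(int)
    for row in rows:
        if row["area"] == area:
            mart[row["year"]] += row["amount"]
    return dict(mart)


sales_mart = build_data_mart(warehouse_rows, "sales")
purchasing_mart = build_data_mart(warehouse_rows, "purchasing")
print(sales_mart)       # {2022: 500, 2023: 700}
print(purchasing_mart)  # {2023: 300}
```

The financial analyst from the example would work with the sales and purchasing marts, while inventory data stays out of their way.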
[Figure: Data Warehouse Architecture (with a Staging Area and Data Marts)]
DATA STORING IN DATA WAREHOUSE
FACT TABLE: The central table that contains the fact data. Fact tables represent data, usually numeric, that are analyzed and examined.
DIMENSION TABLE: Dimension tables store the descriptive information you normally use to constrain (filter and group) queries.
Data Warehouse Dimensional Modelling (Types of Schemas)
Four schema types are commonly named in data warehousing:
■ Star Schema
■ Snowflake Schema
■ Fact Constellation Schema
■ Galaxy Schema
Fact Constellation Schema and Galaxy Schema are two names for the same design.
Star Schema
A star schema is one in which a central fact table is surrounded by denormalized dimension tables.
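A star schema and a typical star join can be tried out with Python's built-in `sqlite3` module. The table and column names loosely follow the sale fact table used in this presentation; the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Denormalized dimensions: brand and geography are stored inline.
CREATE TABLE product (prod_id INTEGER PRIMARY KEY, name TEXT, brand TEXT);
CREATE TABLE store   (store_id INTEGER PRIMARY KEY, name TEXT, state TEXT, country TEXT);
CREATE TABLE sale_fact (
    number   INTEGER PRIMARY KEY,
    prod_id  INTEGER REFERENCES product(prod_id),
    store_id INTEGER REFERENCES store(store_id),
    quantity INTEGER
);
INSERT INTO product VALUES (1, 'Pen', 'Acme'), (2, 'Notebook', 'Acme');
INSERT INTO store   VALUES (10, 'Central', 'Punjab', 'India'), (11, 'North', 'Delhi', 'India');
INSERT INTO sale_fact VALUES (100, 1, 10, 5), (101, 1, 11, 3), (102, 2, 10, 7);
""")

# Star join: the fact table joins directly to each dimension; dimension
# attributes constrain and group the query, and the numeric fact is summed.
rows = conn.execute("""
    SELECT s.state, p.name, SUM(f.quantity)
    FROM sale_fact f
    JOIN product p ON p.prod_id = f.prod_id
    JOIN store   s ON s.store_id = f.store_id
    WHERE s.country = 'India'
    GROUP BY s.state, p.name
    ORDER BY s.state, p.name
""").fetchall()
print(rows)  # [('Delhi', 'Pen', 3), ('Punjab', 'Notebook', 7), ('Punjab', 'Pen', 5)]
```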
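Snowflake Schema
A snowflake schema is an extension of the star schema in which dimension tables are normalized into additional related tables. For example, a product dimension can refer to a separate brand table.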
Snowflake Schema (example)
Sale fact table: Number, Prod_id, Store_id, Quantity
Product dimension: Prod_id, Brand_id, Cost
Brand (normalized from Product): Id, Brand
Store dimension: Id, Name, Geography_id
Geography (normalized from Store): Id, State, Country
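The snowflake example can be sketched in SQLite as well. The tables follow the recovered diagram (Product refers to Brand, Store refers to Geography); the sample rows are hypothetical. Note the extra joins needed to reach brand and country.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Sub-dimensions produced by normalizing the Product and Store dimensions.
CREATE TABLE brand     (id INTEGER PRIMARY KEY, brand TEXT);
CREATE TABLE geography (id INTEGER PRIMARY KEY, state TEXT, country TEXT);
CREATE TABLE product   (prod_id INTEGER PRIMARY KEY, brand_id INTEGER REFERENCES brand(id), cost REAL);
CREATE TABLE store     (id INTEGER PRIMARY KEY, name TEXT, geography_id INTEGER REFERENCES geography(id));
CREATE TABLE sale_fact (number INTEGER PRIMARY KEY, prod_id INTEGER, store_id INTEGER, quantity INTEGER);

INSERT INTO brand     VALUES (1, 'Acme'), (2, 'Zenith');
INSERT INTO geography VALUES (1, 'Punjab', 'India');
INSERT INTO product   VALUES (1, 1, 10.0), (2, 2, 25.0), (3, 1, 4.0);
INSERT INTO store     VALUES (10, 'Central', 1);
INSERT INTO sale_fact VALUES (100, 1, 10, 5), (101, 2, 10, 2), (102, 3, 10, 10);
""")

# Reaching brand and country takes extra joins (fact -> product -> brand,
# fact -> store -> geography): the price of storing each attribute only once.
rows = conn.execute("""
    SELECT b.brand, SUM(f.quantity * p.cost) AS sales_value
    FROM sale_fact f
    JOIN product   p ON p.prod_id = f.prod_id
    JOIN brand     b ON b.id = p.brand_id
    JOIN store     s ON s.id = f.store_id
    JOIN geography g ON g.id = s.geography_id
    WHERE g.country = 'India'
    GROUP BY b.brand
    ORDER BY b.brand
""").fetchall()
print(rows)  # [('Acme', 90.0), ('Zenith', 50.0)]
```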
Galaxy Schema
A galaxy schema contains many fact tables with some common dimensions (conformed dimensions). It is also known as a Fact Constellation Schema.
Galaxy Schema (example)
Sale fact table: Number, Prod_id, Retail_id, Quantity
Purchase fact table: Number, Prod_id, Supplier_id, Quantity
Product dimension (shared by both fact tables): Prod_id, Type, Cost
Retailer dimension: Retail_id, Name, City
Supplier dimension: Supplier_id, Name, Country
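The galaxy example can be sketched with two fact tables that share the Product dimension. The sample rows are hypothetical. The query uses a "drill-across" style: each fact table is summarized separately by product and the results are then combined through the conformed dimension.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product  (prod_id INTEGER PRIMARY KEY, type TEXT, cost REAL);  -- conformed dimension
CREATE TABLE retailer (retail_id INTEGER PRIMARY KEY, name TEXT, city TEXT);
CREATE TABLE supplier (supplier_id INTEGER PRIMARY KEY, name TEXT, country TEXT);
CREATE TABLE sale_fact     (number INTEGER PRIMARY KEY, prod_id INTEGER, retail_id INTEGER, quantity INTEGER);
CREATE TABLE purchase_fact (number INTEGER PRIMARY KEY, prod_id INTEGER, supplier_id INTEGER, quantity INTEGER);

INSERT INTO product  VALUES (1, 'Stationery', 10.0), (2, 'Electronics', 200.0);
INSERT INTO retailer VALUES (1, 'City Mart', 'Amritsar');
INSERT INTO supplier VALUES (1, 'Global Supply', 'India');
INSERT INTO sale_fact     VALUES (1, 1, 1, 40), (2, 2, 1, 3);
INSERT INTO purchase_fact VALUES (1, 1, 1, 50), (2, 2, 1, 5);
""")

# Summarize each fact table by the shared product key, then combine.
rows = conn.execute("""
    WITH sold   AS (SELECT prod_id, SUM(quantity) AS qty FROM sale_fact GROUP BY prod_id),
         bought AS (SELECT prod_id, SUM(quantity) AS qty FROM purchase_fact GROUP BY prod_id)
    SELECT p.type, bought.qty AS purchased, sold.qty AS sold
    FROM product p
    JOIN sold   ON sold.prod_id = p.prod_id
    JOIN bought ON bought.prod_id = p.prod_id
    ORDER BY p.type
""").fetchall()
print(rows)  # [('Electronics', 5, 3), ('Stationery', 50, 40)]
```

Aggregating each fact table before combining avoids the row multiplication that a direct join between two fact tables would cause.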
Installing the Service Manager Data Warehouse Server
The data warehouse server is installed by using Service Manager Setup.
EXAMPLE OF DATA WAREHOUSE
McMaster’s Data Warehouse Design
EXAMPLE OF DATA WAREHOUSE ARCHITECTURE (PRODUCTION ENVIRONMENT)
We are considering implementing the following three-tier platform, which will allow us to scale horizontally in the future.
Our development environment consists of a server with 2 x Intel Xeon 2.8 GHz processors and 2 GB of RAM, running Windows 2000 Service Pack 4.
We are considering the following for the scaled roll-out of our production environment.
A. Hardware
1. Server 1: SAS® Data Server
- 4-way 64-bit 1.5 GHz Itanium2 server
- 16 GB RAM
- 2 x 73 GB drives (RAID 1) for the OS
- 1 10/100/1Gb Cu Ethernet card
- 1 Windows 2003 Enterprise Edition for Itanium
2. Mid-Tier (Web) Server
- 2-way 32-bit 3 GHz Xeon server
- 4 GB RAM
- 1 10/100/1Gb Cu Ethernet card
- 1 Windows 2003 Enterprise Edition for x86
3. SAN Drive Array (modular and can grow with the warehouse)
- 6 x 72 GB drives (RAID 5), 360 GB total for SAS® and data
B. Software
1. Server 1: SAS® Data Server
- SAS® 9.1.3
- SAS® Metadata Server
- SAS® WorkSpace Server
- SAS® Stored Process Server
- Platform JobScheduler
2. Mid-Tier Server
- SAS® Web Report Studio
- SAS® Information Delivery Portal
- BEA WebLogic for future SAS® SPM Platform
- Xythos Web File System (WFS)
3. Client-Tier Server
- SAS® Enterprise Guide
- SAS® Add-In for Microsoft Office
BENEFITS OF DATA WAREHOUSE
1. A Data Warehouse Delivers Enhanced Business Intelligence
Insights are gained through improved information access. Managers and executives are freed from making decisions based on limited data and their own “gut feelings”. Decisions that affect the strategy and operations of organizations can be based upon credible facts and backed up by evidence and actual organizational data.
2. A Data Warehouse Saves Time
Since business users can quickly access critical data from a number of sources, all in one place, they can rapidly make informed decisions on key initiatives. They won’t waste precious time retrieving data from multiple sources.
3. A Data Warehouse Enhances Data Quality and Consistency
A data warehouse implementation includes the conversion of data from numerous source systems into a common format, so you can have more confidence in the accuracy of your data. Accurate data is the basis for strong business decisions.
4. A Data Warehouse Provides Historical Intelligence
A data warehouse stores large amounts of historical data, so you can analyze different time periods and trends in order to make future predictions. Such data typically cannot be stored in a transactional database or used to generate reports from a transactional system.
5. A Data Warehouse Generates a High ROI
Finally, the pièce de résistance: return on investment. Companies that have implemented data warehouses and complementary BI systems have often generated more revenue and saved more money than companies that haven’t invested in BI systems and data warehouses.
DISADVANTAGES OF DATA WAREHOUSE
• Long initial implementation time and associated high cost
• Adding new data sources takes time and has an associated high cost
• Limited flexibility of use and types of users: multiple separate data marts are required for multiple uses and types of users
• Typically, data is static and dated
• Difficult to accommodate changes in data types and ranges, and in data source schemas
DATA MINING
Data mining is the process of discovering hidden, previously unknown and usable information from a large amount of data. It is often defined as finding hidden information in a database.
DEFINITION OF DATA MINING
“The efficient discovery of valuable, non-obvious information from a large collection of data.” [BIGUS 96]
Elements of Data Mining
• Extract, transform, and load transaction data onto the data warehouse system.
• Store and manage the data in a multidimensional database system.
• Provide data access to business analysts and information technology professionals.
• Analyze the data with application software.
• Present the data in a useful format, such as a graph or table.
DATA MINING PROCESS
SELECTION: Selecting the data according to some criteria.
PREPROCESSING: This is the data cleansing stage, where information that is unnecessary and may slow down queries is removed.
TRANSFORMATION: The data is not merely transferred across but transformed; for example, overlays may be added, such as the demographic overlays commonly used in market research.
DATA MINING: This stage is concerned with the extraction of patterns from the data. A pattern can be defined as follows: given a set of facts (data) F, a language L, and some measure of certainty C, a pattern is a statement S in L that describes relationships among a subset F_S of F with a certainty C.
INTERPRETATION AND EVALUATION: The patterns identified by the system are interpreted into knowledge, which can then be used to support human decision making.
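The five stages can be walked through on a toy data set. Everything here is a hypothetical illustration: the records, the age-group overlay, and the choice of rule confidence as the certainty measure C (one common choice, not the only one). The statement S is a rule "phone -> case", evaluated over all facts F and over a subset F_S.

```python
# Hypothetical transaction facts: (year, customer age, items bought; None = corrupt record)
records = [
    (2023, 22, {"phone", "case"}),
    (2023, 25, {"phone", "case", "charger"}),
    (2023, 41, {"phone"}),
    (2023, 35, None),
    (2019, 30, {"phone", "case"}),
]

# 1. Selection: keep records that meet the criterion (year 2023).
selected = [r for r in records if r[0] == 2023]

# 2. Preprocessing: remove records that cannot be used.
cleaned = [r for r in selected if r[2] is not None]

# 3. Transformation: add a simple demographic overlay (age group).
transformed = [("under_30" if age < 30 else "30_plus", items)
               for _, age, items in cleaned]


def certainty(facts, antecedent, consequent):
    """Confidence of the statement 'antecedent -> consequent' over a set of facts."""
    covered = [items for items in facts if antecedent in items]
    return sum(consequent in items for items in covered) / len(covered)


# 4. Data mining: evaluate S = "phone -> case" over all facts F
#    and over the subset F_S of customers under 30.
all_facts = [items for _, items in transformed]
young_facts = [items for group, items in transformed if group == "under_30"]
patterns = {
    "phone -> case (all customers)": certainty(all_facts, "phone", "case"),
    "phone -> case (under 30)": certainty(young_facts, "phone", "case"),
}

# 5. Interpretation/evaluation: keep only patterns whose certainty passes a threshold.
MIN_CERTAINTY = 0.8  # hypothetical threshold
for statement, c in patterns.items():
    if c >= MIN_CERTAINTY:
        print(f"{statement}: certainty {c:.2f}")
# prints: phone -> case (under 30): certainty 1.00
```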
ARCHITECTURE OF DATA MINING
There are three tiers in the tight-coupling data mining architecture:
Data layer: The data layer can be a database and/or data warehouse system. This layer is an interface for all data sources. Data mining results are stored in the data layer so that they can be presented to the end user in the form of reports or other kinds of visualization.
Application layer: The data mining application layer is used to retrieve data from the database. Some transformation routines can be performed here to transform data into the desired format.
Front-end layer: The front-end layer provides an intuitive and friendly user interface for the end user to interact with the data mining system. Data mining results are presented to the user in visual form in the front-end layer.
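The three tiers can be sketched as three small classes. This is a minimal illustration, assuming an in-memory SQLite database as the data layer and a per-region average as a stand-in for a real mining algorithm; the class names are hypothetical.

```python
import sqlite3


class DataLayer:
    """Data layer: wraps the database and also stores the mining results."""
    def __init__(self) -> None:
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
        self.conn.execute("CREATE TABLE results (region TEXT, avg_amount REAL)")
        self.conn.executemany("INSERT INTO sales VALUES (?, ?)",
                              [("north", 100.0), ("north", 300.0), ("south", 50.0)])


class ApplicationLayer:
    """Application layer: retrieves data, transforms it, and runs the analysis."""
    def __init__(self, data: DataLayer) -> None:
        self.data = data

    def run(self) -> None:
        rows = self.data.conn.execute("SELECT region, amount FROM sales").fetchall()
        grouped: dict = {}
        for region, amount in rows:  # transformation: group amounts by region
            grouped.setdefault(region, []).append(amount)
        summary = [(r, sum(a) / len(a)) for r, a in sorted(grouped.items())]
        # Results are written back to the data layer for later presentation.
        self.data.conn.executemany("INSERT INTO results VALUES (?, ?)", summary)


class FrontEnd:
    """Front-end layer: presents the stored results as a simple text chart."""
    def __init__(self, data: DataLayer) -> None:
        self.data = data

    def report(self) -> str:
        rows = self.data.conn.execute(
            "SELECT region, avg_amount FROM results ORDER BY region")
        return "\n".join(f"{region:<6} {'#' * int(avg // 50)} {avg:.0f}"
                         for region, avg in rows)


data = DataLayer()
ApplicationLayer(data).run()
print(FrontEnd(data).report())
# north  #### 200
# south  # 50
```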
Adding the Data Mining Option to a Database
Once you have installed the Oracle Database software, you can build databases as needed. You might build a database without the Data Mining option but later decide to add it.
Advantages of Data Mining
Marketing / Retail
Data mining helps marketing companies build models based on historical data to predict who will respond to new marketing campaigns, such as direct mail and online marketing campaigns.
Data mining brings retail companies many of the same benefits as it brings marketing. Through market basket analysis, a store can arrange its products so that customers can conveniently buy frequently co-purchased products together.
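Market basket analysis can be illustrated by computing support and confidence for item pairs. This is a simplified sketch that only looks at pairs (full algorithms such as Apriori also handle larger itemsets); the baskets and thresholds are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets from a store's transaction data.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
    {"bread", "butter", "jam"},
]


def pair_rules(baskets, min_support=0.4, min_confidence=0.7):
    """Return sorted (a, b, support, confidence) tuples for rules a -> b."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in basket)
    pair_counts = Counter(pair for basket in baskets
                          for pair in combinations(sorted(basket), 2))
    rules = []
    for (x, y), count in pair_counts.items():
        support = count / n  # share of baskets containing both items
        if support < min_support:
            continue
        for a, b in ((x, y), (y, x)):
            confidence = count / item_counts[a]  # share of baskets with a that also have b
            if confidence >= min_confidence:
                rules.append((a, b, support, confidence))
    return sorted(rules)


for a, b, s, c in pair_rules(baskets):
    print(f"{a} -> {b}: support={s:.2f}, confidence={c:.2f}")
# bread -> butter: support=0.60, confidence=0.75
# butter -> bread: support=0.60, confidence=0.75
# jam -> bread: support=0.40, confidence=1.00
# milk -> butter: support=0.40, confidence=1.00
```

A rule such as "jam -> bread" with high confidence suggests placing the two products near each other.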
Finance / Banking
Data mining gives financial institutions information about loans and credit reporting. By building a model from previous customers' data with common characteristics, banks and other financial institutions can estimate which loans are good or bad and their risk level. In addition, data mining can help banks detect fraudulent credit card transactions, helping card owners prevent losses.
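Building a model from previous customers' data can be sketched with k-nearest neighbours, one simple classification technique among many. The features (income and existing debt), the history, and `predict_loan` are hypothetical; the share of "bad" neighbours serves as a rough risk level.

```python
import math

# Hypothetical history of previous customers:
# (annual income in thousands, existing debt in thousands) -> loan outcome
history = [
    ((60, 5), "good"), ((75, 10), "good"), ((80, 2), "good"),
    ((25, 30), "bad"), ((30, 25), "bad"), ((20, 15), "bad"),
]


def predict_loan(applicant, k=3):
    """Label an applicant by the majority outcome of the k most similar
    previous customers; also return the share of 'bad' neighbours as risk."""
    nearest = sorted(history, key=lambda row: math.dist(row[0], applicant))[:k]
    bad_share = sum(label == "bad" for _, label in nearest) / k
    return ("bad" if bad_share > 0.5 else "good"), bad_share


print(predict_loan((70, 8)))   # ('good', 0.0)
print(predict_loan((28, 20)))  # ('bad', 1.0)
```

The features are not scaled here to keep the sketch short; with real data, features on different scales would normally be normalized first.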
Manufacturing
By applying data mining to operational engineering data, manufacturers can detect faulty equipment and determine optimal control parameters.
Governments
Data mining helps government agencies dig into and analyze records of financial transactions to build patterns that can detect money laundering or criminal activity.
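Detecting faulty equipment from operational data can be sketched as simple outlier detection with z-scores. The machine names, readings and threshold are hypothetical; production systems typically use more robust statistics or trained models.

```python
import statistics

# Hypothetical vibration readings (mm/s) from identical machines on one line.
readings = {
    "M1": 2.1, "M2": 2.3, "M3": 1.9, "M4": 2.2,
    "M5": 2.0, "M6": 6.8, "M7": 2.1, "M8": 2.4,
}


def flag_faulty(readings, threshold=2.0):
    """Flag machines whose reading lies more than `threshold` standard
    deviations from the mean of all machines (a z-score test)."""
    values = list(readings.values())
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)
    return [name for name, value in readings.items()
            if abs(value - mean) / stdev > threshold]


print(flag_faulty(readings))  # ['M6']
```

With very few machines, a single large outlier also inflates the standard deviation, which is one reason median-based measures are often preferred in practice.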
Disadvantages of Data Mining
Privacy Issues
Concerns about personal privacy have increased enormously in recent years, especially with the boom of social networks on the internet.
Security Issues
Security is a big issue. Businesses own information about their employees and customers, including social security numbers, birthdays, payroll, etc. However, how properly this information is handled is still in question. There have been many cases in which hackers accessed and stole large amounts of customer data from big corporations such as Ford Motor Credit Company, Sony… With so much personal and financial information available, credit card theft and identity theft have become big problems.
Misuse of Information / Inaccurate Information
Information collected through data mining for marketing or other ethical purposes can be misused. Such information can be exploited by unethical people or businesses to take advantage of vulnerable people or to discriminate against a group of people.