Data warehousing

  1. PRESENTATION ON DATA WAREHOUSING AND DATA MINING
     SUBMITTED TO: MRS. MANISHA BHATNAGAR (HOD OF COMP. SCI DEPT) and MRS. HARKAWNALJEET KAUR (ASST. PROF OF COMP. SCIENCE DEPT)
     SUBMITTED BY: MCA-III, ROLL NO: 9
  2. 2. CONTENTSDATA WAREHOUSECHARACTERSTICS OF DATA WAREHOUSEARCHITECTURE OF DATA WAREHOUSEDATA STORING IN DATA WAREHOUSEDATA WAREHOUSE DIMESIONAL MODELLINGINSTALLING THE SERVICE MANAGER DATAWAREHOUSE SERVEREXAMPLE OF DATA WAREHOUSEADVANTAGE OF DATA WAREHOUSEDISADVANTAGE OF DATA WAREHOUSE
  3. 3. CONTENTSDATA MININGELEMENTS OF DATA MININGDATA MINING PROCESSARCHITECTURE OF DATA MININGADDING THE OPTION OF DATA MINING TO ADATABASEADVANTAGES OF DATA MININGDISADVANTAGES OF DATA MINING
  4. DATA WAREHOUSE
     A data warehouse is a relational database that is designed for query and analysis rather than for transaction processing. It usually contains historical data derived from transaction data, but can include data from other sources.
     DEFINITION OF DATA WAREHOUSE
     "A data warehouse is simply a single, complete, and consistent store of data obtained from a variety of different sources and made available to end users in a way they can understand and use in a business context." (Barry Devlin, IBM consultant)
  5. CHARACTERISTICS OF DATA WAREHOUSING
     Subject oriented: Data are organized according to subject (for example, sales) instead of application. Data warehouses are designed to help you analyze data about such subjects.
     Integrated: Integration is closely related to subject orientation. Data warehouses must put data from disparate sources into a consistent format.
     Nonvolatile: Once entered into the data warehouse, data should not change.
     Time variant: A data warehouse maintains historical data, which is used to analyze business or market trends and to support future predictions.
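The nonvolatile and time-variant properties can be illustrated with a minimal sketch, assuming a hypothetical `account_balance_history` table in an in-memory SQLite database: each load appends a dated snapshot, and triggers reject any attempt to update or delete history that has already been loaded.

```python
import sqlite3
from datetime import date

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE account_balance_history (account_id INTEGER, balance REAL, load_date TEXT);
-- Nonvolatile: loaded history may not be changed or removed.
CREATE TRIGGER no_update BEFORE UPDATE ON account_balance_history
BEGIN SELECT RAISE(ABORT, 'warehouse history is read-only'); END;
CREATE TRIGGER no_delete BEFORE DELETE ON account_balance_history
BEGIN SELECT RAISE(ABORT, 'warehouse history is read-only'); END;
""")

def load_snapshot(conn, snapshot, load_date):
    """Time variant: append one period's snapshot, stamped with its load date."""
    conn.executemany(
        "INSERT INTO account_balance_history VALUES (?, ?, ?)",
        [(acct, bal, load_date.isoformat()) for acct, bal in snapshot.items()],
    )

load_snapshot(conn, {1: 100.0, 2: 250.0}, date(2024, 1, 31))
load_snapshot(conn, {1: 120.0, 2: 230.0}, date(2024, 2, 29))

# Trend analysis reads the full history, not just the current value.
history = conn.execute(
    "SELECT load_date, balance FROM account_balance_history"
    " WHERE account_id = 1 ORDER BY load_date"
).fetchall()
print(history)  # [('2024-01-31', 100.0), ('2024-02-29', 120.0)]

# Attempting to overwrite history is rejected by the trigger.
try:
    conn.execute("UPDATE account_balance_history SET balance = 0 WHERE account_id = 1")
    update_rejected = False
except sqlite3.DatabaseError:
    update_rejected = True
```

In a real warehouse, read-only behaviour is usually enforced through load processes and permissions rather than triggers; the triggers here just make the rule visible.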
  6. Data Warehouse Architectures
     Data warehouses and their architectures vary depending upon the specifics of an organization's situation. Three common architectures are:
     ■ Data Warehouse Architecture (Basic)
     ■ Data Warehouse Architecture (with a Staging Area)
     ■ Data Warehouse Architecture (with a Staging Area and Data Marts)
  7. Data Warehouse Architecture (Basic)
     The figure shows a simple architecture for a data warehouse. End users directly access data derived from several source systems through the data warehouse.
  8. Data Warehouse Architecture (with a Staging Area)
     You need to clean and process your operational data before putting it into the warehouse. You can do this programmatically, although most data warehouses use a staging area instead. A staging area simplifies building summaries and general warehouse management. The figure illustrates this typical architecture.
  9. Data Warehouse Architecture (with a Staging Area)
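The staging step can be sketched as follows, assuming two hypothetical operational extracts (`crm_rows`, `erp_rows`) with inconsistent formats and an assumed country-code mapping. Rows are cleaned and standardized in staging first, then loaded into the warehouse table, where a summary becomes a single query.

```python
import sqlite3

# Hypothetical raw extracts from two operational systems.
crm_rows = [{"cust": " Alice ", "country": "in", "amount": "120.50"},
            {"cust": "Bob", "country": "IN", "amount": ""}]  # missing amount
erp_rows = [("CAROL", "India", 75.0)]

COUNTRY_CODES = {"in": "IN", "india": "IN"}  # assumed mapping for this sketch

def to_staging(crm, erp):
    """Clean and standardize source rows before they reach the warehouse."""
    staged = []
    for r in crm:
        if not r["amount"]:  # drop incomplete records
            continue
        staged.append((r["cust"].strip().title(),
                       COUNTRY_CODES[r["country"].lower()],
                       float(r["amount"])))
    for name, country, amount in erp:
        staged.append((name.title(), COUNTRY_CODES[country.lower()], amount))
    return staged

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (customer TEXT, country TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", to_staging(crm_rows, erp_rows))

# Summaries are straightforward once the data is consistent.
summary = conn.execute(
    "SELECT country, SUM(amount) FROM sales GROUP BY country").fetchall()
print(summary)  # [('IN', 195.5)]
```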
  10. Data Warehouse Architecture (with a Staging Area and Data Marts)
      Although the architecture with a staging area is quite common, you may want to customize your warehouse's architecture for different groups within your organization. You can do this by adding data marts, which are systems designed for a particular line of business.
      The figure illustrates an example where purchasing, sales, and inventories are separated. In this example, a financial analyst might want to analyze historical data for purchases and sales.
  11. Data Warehouse Architecture (with a Staging Area and Data Marts)
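As a simplified sketch, data marts can be approximated as SQL views over one warehouse table (real data marts are often separate databases). The table `warehouse_facts` and the views `purchasing_mart` and `sales_mart` are hypothetical names; the final query mirrors the financial analyst comparing purchases and sales.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE warehouse_facts (area TEXT, month TEXT, amount REAL);
INSERT INTO warehouse_facts VALUES
    ('purchasing', '2024-01', 500.0),
    ('sales',      '2024-01', 800.0),
    ('inventory',  '2024-01', 300.0),
    ('sales',      '2024-02', 900.0);
-- Each "data mart" exposes only one line of business.
CREATE VIEW purchasing_mart AS
    SELECT month, amount FROM warehouse_facts WHERE area = 'purchasing';
CREATE VIEW sales_mart AS
    SELECT month, amount FROM warehouse_facts WHERE area = 'sales';
""")

# A financial analyst comparing historical sales and purchases per month.
rows = conn.execute("""
    SELECT s.month, s.amount AS sales, COALESCE(p.amount, 0.0) AS purchases
    FROM sales_mart s LEFT JOIN purchasing_mart p ON p.month = s.month
    ORDER BY s.month
""").fetchall()
print(rows)  # [('2024-01', 800.0, 500.0), ('2024-02', 900.0, 0.0)]
```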
  12. DATA STORING IN DATA WAREHOUSE
      FACT TABLE: The central table that contains the fact data. Fact tables hold the data, usually numeric (for example, quantities or amounts), that is analyzed and examined.
      DIMENSION TABLE: Dimension tables store the descriptive information you normally use to constrain queries (for example, product or store attributes).
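The split between the two kinds of table can be shown in plain Python, assuming hypothetical `Product` (dimension) and `SaleFact` (fact) records: the fact rows carry a key plus numeric measures, and a dimension attribute is used to constrain which facts are aggregated.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Product:      # dimension row: descriptive attributes
    prod_id: int
    name: str
    category: str

@dataclass(frozen=True)
class SaleFact:     # fact row: a foreign key plus numeric measures
    prod_id: int
    quantity: int
    amount: float

products = {1: Product(1, "Pen", "Stationery"), 2: Product(2, "Mug", "Kitchen")}
facts = [SaleFact(1, 10, 15.0), SaleFact(2, 3, 24.0), SaleFact(1, 5, 7.5)]

def total_quantity(facts, products, category):
    """Sum a fact measure, constrained by a dimension attribute."""
    return sum(f.quantity for f in facts if products[f.prod_id].category == category)

print(total_quantity(facts, products, "Stationery"))  # 15
```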
  13. Data Warehouse Dimensional Modelling (Types of Schemas)
      The diagram lists the schema types used in data warehouses: star schema, snowflake schema, and fact constellation schema (also called galaxy schema).
  14. Star Schema
      A star schema is one in which a central fact table is surrounded by denormalized dimension tables.
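A minimal star schema in SQLite, with hypothetical tables `fact_sale`, `dim_product`, and `dim_store`: because the dimensions are denormalized (brand and state live directly in them), every dimension attribute is a single join away from the fact table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (prod_id INTEGER PRIMARY KEY, name TEXT, brand TEXT, category TEXT);
CREATE TABLE dim_store   (store_id INTEGER PRIMARY KEY, name TEXT, state TEXT, country TEXT);
CREATE TABLE fact_sale   (prod_id INTEGER REFERENCES dim_product,
                          store_id INTEGER REFERENCES dim_store,
                          quantity INTEGER);
INSERT INTO dim_product VALUES (1, 'Pen', 'Acme', 'Stationery'), (2, 'Mug', 'Zen', 'Kitchen');
INSERT INTO dim_store VALUES (10, 'Central', 'Punjab', 'India'), (20, 'East', 'Delhi', 'India');
INSERT INTO fact_sale VALUES (1, 10, 4), (1, 20, 6), (2, 10, 2);
""")

# One join per dimension: the "points" of the star.
rows = conn.execute("""
    SELECT p.brand, s.state, SUM(f.quantity)
    FROM fact_sale f
    JOIN dim_product p ON p.prod_id = f.prod_id
    JOIN dim_store   s ON s.store_id = f.store_id
    GROUP BY p.brand, s.state
    ORDER BY p.brand, s.state
""").fetchall()
print(rows)  # [('Acme', 'Delhi', 6), ('Acme', 'Punjab', 4), ('Zen', 'Punjab', 2)]
```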
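  15. Snowflake Schema
      A snowflake schema is a variation of the star schema in which the dimension tables are normalized: a dimension is split into additional related tables (for example, product into product and brand, store into store and geography).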
  16. Snowflake Schema
      [Figure: snowflake schema example]
      Sale fact table: Number, Prod_id, store_id, quantity
      Store dimension: Id, Name, Geography_id
      Geography (normalized out of Store): Id, State, country
      Product dimension: Prod_id, Brand_id, cost
      Brand (normalized out of Product): Id, Brand
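The snowflake example's tables can be sketched in SQLite using the column names from the slide (the sample rows are hypothetical). Reaching `country` or `brand` from the sale fact table now takes two joins each, which is the usual trade-off of normalizing dimensions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE geography (id INTEGER PRIMARY KEY, state TEXT, country TEXT);
CREATE TABLE store     (id INTEGER PRIMARY KEY, name TEXT,
                        geography_id INTEGER REFERENCES geography);
CREATE TABLE brand     (id INTEGER PRIMARY KEY, brand TEXT);
CREATE TABLE product   (prod_id INTEGER PRIMARY KEY,
                        brand_id INTEGER REFERENCES brand, cost REAL);
CREATE TABLE sale      (number INTEGER PRIMARY KEY,
                        prod_id INTEGER REFERENCES product,
                        store_id INTEGER REFERENCES store, quantity INTEGER);
INSERT INTO geography VALUES (1, 'Punjab', 'India'), (2, 'Ontario', 'Canada');
INSERT INTO store VALUES (10, 'Central', 1), (20, 'Lakeside', 2);
INSERT INTO brand VALUES (100, 'Acme');
INSERT INTO product VALUES (1, 100, 2.5), (2, 100, 4.0);
INSERT INTO sale VALUES (1, 1, 10, 3), (2, 2, 10, 1), (3, 1, 20, 5);
""")

# sale -> store -> geography and sale -> product -> brand.
rows = conn.execute("""
    SELECT g.country, b.brand, SUM(s.quantity)
    FROM sale s
    JOIN store st    ON st.id = s.store_id
    JOIN geography g ON g.id = st.geography_id
    JOIN product p   ON p.prod_id = s.prod_id
    JOIN brand b     ON b.id = p.brand_id
    GROUP BY g.country, b.brand
    ORDER BY g.country
""").fetchall()
print(rows)  # [('Canada', 'Acme', 5), ('India', 'Acme', 4)]
```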
  17. Galaxy Schema
      A galaxy schema contains multiple fact tables that share some common dimensions (conformed dimensions). It is also known as a fact constellation schema.
  18. Galaxy Schema
      [Figure: galaxy schema example]
      Sale fact table: Number, Prod_id, Retail_id, quantity
      Purchase fact table: Number, Prod_id, supplier_id, quantity
      Retailer dimension: Retail_id, Name, city
      Supplier dimension: Supplier_id, Name, country
      Product dimension (shared by both fact tables): Prod_id, Type, cost
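The galaxy example can be sketched with the slide's tables (sample rows hypothetical). The query combines the two fact tables through the conformed `product` dimension to get purchased minus sold quantity per product. Each fact table is aggregated separately in its own subquery, because joining two fact tables directly on a shared dimension can multiply rows and inflate the sums.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product  (prod_id INTEGER PRIMARY KEY, type TEXT, cost REAL);
CREATE TABLE retailer (retail_id INTEGER PRIMARY KEY, name TEXT, city TEXT);
CREATE TABLE supplier (supplier_id INTEGER PRIMARY KEY, name TEXT, country TEXT);
CREATE TABLE sale_fact     (number INTEGER PRIMARY KEY, prod_id INTEGER REFERENCES product,
                            retail_id INTEGER REFERENCES retailer, quantity INTEGER);
CREATE TABLE purchase_fact (number INTEGER PRIMARY KEY, prod_id INTEGER REFERENCES product,
                            supplier_id INTEGER REFERENCES supplier, quantity INTEGER);
INSERT INTO product  VALUES (1, 'Pen', 2.5), (2, 'Mug', 4.0);
INSERT INTO retailer VALUES (1, 'ShopA', 'Jaipur');
INSERT INTO supplier VALUES (1, 'SupplyCo', 'India');
INSERT INTO sale_fact     VALUES (1, 1, 1, 30), (2, 2, 1, 5);
INSERT INTO purchase_fact VALUES (1, 1, 1, 50), (2, 2, 1, 5);
""")

# Purchased minus sold per product, via the shared (conformed) product dimension.
rows = conn.execute("""
    SELECT p.type,
           (SELECT COALESCE(SUM(quantity), 0) FROM purchase_fact WHERE prod_id = p.prod_id)
         - (SELECT COALESCE(SUM(quantity), 0) FROM sale_fact WHERE prod_id = p.prod_id)
    FROM product p
    ORDER BY p.prod_id
""").fetchall()
print(rows)  # [('Pen', 20), ('Mug', 0)]
```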
  19. Installing the Service Manager Data Warehouse Server
      The data warehouse server is installed by using Service Manager Setup.
  20. EXAMPLE OF DATA WAREHOUSE
      McMaster’s Data Warehouse design
  21. EXAMPLE OF DATA WAREHOUSE ARCHITECTURE (PRODUCTION ENVIRONMENT)
      We are considering implementing the following three-tier platform, which will allow us to scale horizontally in the future.
      Our development environment consists of a server with 2 x Intel Xeon 2.8 GHz processors and 2 GB of RAM, running Windows 2000 Service Pack 4.
      We are considering the following for the scaled roll-out of our production environment.
      A. Hardware
      1. Server 1: SAS® Data Server
         - 4-way 64-bit 1.5 GHz Itanium 2 server
         - 16 GB RAM
         - 2 x 73 GB drives (RAID 1) for the OS
         - 1 x 10/100/1Gb Cu Ethernet card
  22. EXAMPLE OF DATA WAREHOUSE ARCHITECTURE (PRODUCTION ENVIRONMENT)
         - 1 x Windows 2003 Enterprise Edition for Itanium 2
      2. Mid-Tier (Web) Server
         - 2-way 32-bit 3 GHz Xeon server
         - 4 GB RAM
         - 1 x 10/100/1Gb Cu Ethernet card
         - 1 x Windows 2003 Enterprise Edition for x86
      3. SAN Drive Array (modular; can grow with the warehouse)
         - 6 x 72 GB drives (RAID 5), 360 GB total usable for SAS® and data
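The usable capacities in the hardware list follow from standard RAID arithmetic: RAID 5 spends one drive's worth of capacity on parity, and RAID 1 mirrors drives, so only one drive's capacity is usable. A small sketch of that calculation (the function names are illustrative):

```python
def raid5_usable_gb(drive_count: int, drive_size_gb: float) -> float:
    """RAID 5: one drive's worth of capacity holds parity."""
    if drive_count < 3:
        raise ValueError("RAID 5 needs at least 3 drives")
    return (drive_count - 1) * drive_size_gb

def raid1_usable_gb(drive_count: int, drive_size_gb: float) -> float:
    """RAID 1: every drive holds a full mirror copy."""
    if drive_count < 2:
        raise ValueError("RAID 1 needs at least 2 drives")
    return drive_size_gb

print(raid5_usable_gb(6, 72))  # 360  (SAN array: 6 x 72 GB)
print(raid1_usable_gb(2, 73))  # 73   (OS mirror: 2 x 73 GB)
```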
  23. EXAMPLE OF DATA WAREHOUSE ARCHITECTURE (PRODUCTION ENVIRONMENT)
      B. Software
      1. Server 1: SAS® Data Server
         - SAS® 9.1.3
         - SAS® Metadata Server
         - SAS® Workspace Server
         - SAS® Stored Process Server
         - Platform JobScheduler
      2. Mid-Tier Server
         - SAS® Web Report Studio
         - SAS® Information Delivery Portal
         - BEA WebLogic for future SAS® SPM Platform
         - Xythos Web File System (WFS)
  24. EXAMPLE OF DATA WAREHOUSE ARCHITECTURE (PRODUCTION ENVIRONMENT)
      3. Client-Tier Server
         - SAS® Enterprise Guide
         - SAS® Add-In for Microsoft Office
  25. BENEFITS OF DATA WAREHOUSE
      1. A Data Warehouse Delivers Enhanced Business Intelligence
      Insights will be gained through improved information access. Managers and executives will be freed from making their decisions based on limited data and their own “gut feelings”. Decisions that affect the strategy and operations of organizations will be based upon credible facts and will be backed up with evidence and actual organizational data.
      2. A Data Warehouse Saves Time
      Since business users can quickly access critical data from a number of sources, all in one place, they can rapidly make informed decisions on key initiatives. They won’t waste precious time retrieving data from multiple sources.
  26. BENEFITS OF DATA WAREHOUSE
      3. A Data Warehouse Enhances Data Quality and Consistency
      A data warehouse implementation includes the conversion of data from numerous source systems into a common format, so you can have more confidence in the accuracy of your data. Accurate data is the basis for strong business decisions.
      4. A Data Warehouse Provides Historical Intelligence
      A data warehouse stores large amounts of historical data so you can analyze different time periods and trends in order to make future predictions. Such data typically is not kept in a transactional database, and a transactional system is poorly suited to generating reports over it.
  27. BENEFITS OF DATA WAREHOUSE
      5. A Data Warehouse Generates a High ROI
      Finally, the pièce de résistance: return on investment. Companies that have implemented data warehouses and complementary BI systems have generated more revenue and saved more money than companies that haven’t invested in BI systems and data warehouses.
  28. DISADVANTAGES OF DATA WAREHOUSE
      • Long initial implementation time and associated high cost
      • Adding new data sources takes time and has an associated high cost
      • Limited flexibility of use and types of users: requires multiple separate data marts for multiple uses and types of users
      • Typically, data is static and dated
      • Difficult to accommodate changes in data types and ranges, or in data source schemas
  29. DATA MINING
      Data mining is the process of discovering hidden, previously unknown, and usable information from a large amount of data. It is often defined as finding hidden information in a database.
      DEFINITION OF DATA MINING
      “The efficient discovery of valuable, non-obvious information from a large collection of data.” [BIGUS 96]
  30. Elements of Data Mining
      • Extract, transform, and load transaction data onto the data warehouse system.
      • Store and manage the data in a multidimensional database system.
      • Provide data access to business analysts and information technology professionals.
      • Analyze the data by application software.
      • Present the data in a useful format, such as a graph or table.
  31. DATA MINING PROCESS
      [Figure: data mining process]
  32. DATA MINING PROCESS
      SELECTION: Selecting the data according to some criteria.
      PREPROCESSING: The data cleansing stage, where information that is unnecessary and may slow down queries is removed.
      TRANSFORMATION: The data is not merely transferred across but transformed; for example, overlays may be added, such as the demographic overlays commonly used in market research.
  33. DATA MINING PROCESS
      DATA MINING: This stage is concerned with the extraction of patterns from the data. A pattern can be defined as follows: given a set of facts (data) F, a language L, and some measure of certainty C, a pattern is a statement S in L that describes relationships among a subset F_S of F with a certainty C.
      INTERPRETATION AND EVALUATION: The patterns identified by the system are interpreted into knowledge, which can then be used to support human decision making.
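The five stages can be sketched as a chain of small functions over hypothetical customer records `(age, region, product)`. Here the "language" L is rules of the form "age band → product", and the certainty C is the share of records in that age band that bought the product; both choices are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical raw records: (customer_age, region, product)
raw = [(23, "north", "phone"), (27, "north", "phone"), (25, "north", "laptop"),
       (41, "north", "laptop"), (45, "north", "laptop"), (None, "north", "phone"),
       (30, "south", "phone")]

def select(records, region):             # SELECTION: pick data by a criterion
    return [r for r in records if r[1] == region]

def preprocess(records):                 # PREPROCESSING: drop incomplete rows
    return [r for r in records if r[0] is not None]

def transform(records):                  # TRANSFORMATION: add an age-band overlay
    return [("under_35" if age < 35 else "35_plus", product)
            for age, _, product in records]

def mine(rows):                          # DATA MINING: "band -> product" with certainty C
    band_counts = Counter(band for band, _ in rows)
    pair_counts = Counter(rows)
    return {(band, prod): n / band_counts[band]
            for (band, prod), n in pair_counts.items()}

def interpret(patterns, min_certainty):  # INTERPRETATION/EVALUATION: keep strong patterns
    return {p: c for p, c in patterns.items() if c >= min_certainty}

result = interpret(mine(transform(preprocess(select(raw, "north")))), 0.6)
print(result)  # keeps ('under_35', 'phone') at about 0.67 and ('35_plus', 'laptop') at 1.0
```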
  34. ARCHITECTURE OF DATA MINING
      There are three tiers in the tight-coupling data mining architecture:
      Data layer: The data layer can be database and/or data warehouse systems. This layer is an interface for all data sources. Data mining results are stored in the data layer so they can be presented to the end user in the form of reports or other kinds of visualization.
      Application layer: The data mining application layer is used to retrieve data from the database. Transformation routines can be performed here to transform data into the desired format.
      Front-end layer: The front-end layer provides an intuitive and friendly user interface for the end user to interact with the data mining system. Data mining results are presented to the user in visual form in the front-end layer.
  35. ARCHITECTURE OF DATA MINING
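The three tiers can be sketched as three hypothetical classes: `DataLayer` wraps an in-memory SQLite database and also stores the results, `ApplicationLayer` retrieves and transforms the data and runs a deliberately trivial mining step (distinct buyers per item), and `FrontEnd` formats the stored results as a text report.

```python
import sqlite3
from collections import Counter

class DataLayer:
    """Data layer: interface to the data source; also stores mining results."""

    def __init__(self) -> None:
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE purchases (customer TEXT, item TEXT)")
        self.conn.execute("CREATE TABLE results (item TEXT, buyers INTEGER)")

    def load(self, rows):
        self.conn.executemany("INSERT INTO purchases VALUES (?, ?)", rows)

    def fetch_purchases(self):
        return self.conn.execute("SELECT customer, item FROM purchases").fetchall()

    def save_results(self, results):
        self.conn.executemany("INSERT INTO results VALUES (?, ?)", results)

    def fetch_results(self):
        return self.conn.execute(
            "SELECT item, buyers FROM results ORDER BY buyers DESC, item").fetchall()


class ApplicationLayer:
    """Application layer: retrieves data, transforms it, and runs a mining step."""

    def __init__(self, data: DataLayer) -> None:
        self.data = data

    def run(self) -> None:
        rows = self.data.fetch_purchases()
        # Transformation: normalize item names and de-duplicate per customer.
        cleaned = {(customer, item.strip().lower()) for customer, item in rows}
        # Trivial mining step: number of distinct buyers per item.
        buyers = Counter(item for _, item in cleaned)
        self.data.save_results(buyers.items())


class FrontEnd:
    """Front-end layer: presents stored results to the end user."""

    def __init__(self, data: DataLayer) -> None:
        self.data = data

    def report(self) -> str:
        return "\n".join(f"{item}: {n} buyer(s)" for item, n in self.data.fetch_results())


data = DataLayer()
data.load([("ann", "Milk"), ("ann", "milk "), ("bob", "Bread"), ("bob", "Milk")])
ApplicationLayer(data).run()
report = FrontEnd(data).report()
print(report)
```

Keeping the results in the data layer, rather than passing them straight to the front end, follows the slide's description that mining results are stored there for later reporting.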
  36. Adding the Data Mining Option to a Database
      Once you have installed the Oracle Database software, you can build databases as needed. You might build a database without the Data Mining option but later decide to add it.
  37. Advantages of Data Mining
      Marketing / Retail
      Data mining helps marketing companies build models based on historical data to predict who will respond to new marketing campaigns, such as direct mail or online marketing campaigns.
      Data mining brings many benefits to retail companies in the same way as to marketing. Through market basket analysis, a store can arrange its products so that customers can conveniently buy frequently co-purchased products together.
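Market basket analysis is commonly expressed as association rules with a support (how often items appear together) and a confidence (how often the consequent appears when the antecedent does). The sketch below counts item pairs by brute force over hypothetical baskets; it is not the full Apriori algorithm, which prunes candidate itemsets of any size.

```python
from collections import Counter
from itertools import combinations

def pair_rules(baskets, min_support, min_confidence):
    """Return rules (a, b, support, confidence) meaning 'buyers of a also buy b'."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in set(basket))
    pair_counts = Counter(pair for basket in baskets
                          for pair in combinations(sorted(set(basket)), 2))
    rules = []
    for (x, y), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        for a, b in ((x, y), (y, x)):          # try the rule in both directions
            confidence = count / item_counts[a]
            if confidence >= min_confidence:
                rules.append((a, b, support, confidence))
    return sorted(rules)

baskets = [{"bread", "butter", "milk"}, {"bread", "butter"},
           {"bread", "jam"}, {"milk", "butter"}]
rules = pair_rules(baskets, min_support=0.5, min_confidence=0.6)
for a, b, s, c in rules:
    print(f"{a} -> {b}: support={s:.2f}, confidence={c:.2f}")
```

A store could use rules such as "milk -> butter" (confidence 1.00 in this toy data) to place those products near each other.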
  38. Advantages of Data Mining
      Finance / Banking
      Data mining gives financial institutions information about loans and credit reporting. By building a model from previous customers' data with common characteristics, banks and financial institutions can estimate which loans are good or bad and their risk level. In addition, data mining can help banks detect fraudulent credit card transactions, helping card owners prevent losses.
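One simple way to "build a model from previous customers' data with common characteristics" is k-nearest neighbours, used here as an illustrative choice rather than the method any particular bank uses. The sketch labels a new applicant by majority vote of the k most similar past customers; the features (income and existing debt, in thousands) and the history are hypothetical, and a real model would at least scale the features.

```python
import math
from collections import Counter

# Hypothetical history: (income, existing_debt, outcome), amounts in thousands
history = [(80, 5, "good"), (65, 10, "good"), (90, 2, "good"),
           (30, 25, "bad"), (25, 20, "bad"), (40, 30, "bad")]

def classify_loan(applicant, history, k=3):
    """Majority vote of the k nearest past customers (Euclidean distance)."""
    nearest = sorted(history, key=lambda row: math.dist(applicant, row[:2]))[:k]
    votes = Counter(label for *_, label in nearest)
    label, count = votes.most_common(1)[0]
    return label, count / k  # share of agreeing neighbours: a rough risk signal

print(classify_loan((70, 8), history))  # ('good', 1.0)
```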
  39. Advantages of Data Mining
      Manufacturing
      By applying data mining to operational engineering data, manufacturers can detect faulty equipment and determine optimal control parameters.
      Governments
      Data mining helps government agencies dig into and analyze records of financial transactions to build patterns that can detect money laundering or criminal activity.
  40. Disadvantages of Data Mining
      Privacy Issues
      Concerns about personal privacy have increased enormously in recent years, especially with the boom of social networks on the internet.
      Security Issues
      Security is a big issue. Businesses own information about their employees and customers, including Social Security numbers, birthdays, payroll, etc. However, how well this information is protected is still in question. There have been many cases in which hackers accessed and stole large amounts of customer data from big corporations such as Ford Motor Credit Company and Sony. With so much personal and financial information available, credit card theft and identity theft have become a big problem.
  41. Disadvantages of Data Mining
      Misuse of Information / Inaccurate Information
      Information collected through data mining for ethical purposes such as marketing can be misused. It can be exploited by unethical people or businesses to take advantage of vulnerable people or to discriminate against a group of people.
  42. QUERIES?
  43. THANK YOU

×