Data warehousing and Data mining



This deck explains why we need data warehousing and data mining and describes their applications.

Published in: Technology, Business

  • Data warehousing can be said to be the process of centralizing or aggregating data from multiple sources into one common repository. It is a process of transforming data into information and making it available to users in a timely enough manner to make a difference.
  • Statistics: e.g., regression analysis, standard distributions, standard deviation, etc.
  • Machine learning attempts to let computer programs learn about the data they study, so that the programs make different decisions based on the qualities of that data. It uses statistics for its fundamental concepts and adds more advanced AI heuristics and algorithms to achieve its goals. Data mining, in many ways, is fundamentally the adaptation of machine learning techniques to business applications. It is best described as the union of historical and recent developments in statistics, AI, and machine learning. These techniques are used together to study data and find previously hidden trends or patterns within it.
  • Data mining applications in sales/marketing: Data mining enables businesses to understand the hidden patterns inside historical purchase transaction data, which helps them plan and launch new marketing campaigns promptly and cost-effectively. Several data mining applications in sales and marketing follow. Data mining is used for market basket analysis, which shows which product combinations were purchased together, when they were bought, and in what sequence. This information helps businesses promote their most profitable products and maximize profit. It also encourages customers to purchase related products that they may have missed or overlooked. Retail companies use data mining to identify customers' buying-behavior patterns.
  • Several data mining techniques, e.g., distributed data mining, have been researched, modeled and developed to help detect credit card fraud. Data mining is used to identify customer loyalty by analyzing customers' purchasing activity, such as the frequency of purchases in a period of time, the total monetary value of all purchases, and when the last purchase was made. After analyzing those dimensions, a relative score is generated for each customer: the higher the score, the more loyal the customer is relative to the others. Data mining is also applied to help banks retain credit card customers: by analyzing past data, it can predict which customers are likely to change their credit card affiliation, so that banks can plan and launch special offers to retain them. Credit card spending by customer group can be identified using data mining. Hidden correlations between different financial indicators can be discovered using data mining. Data mining also makes it possible to identify stock trading rules from historical market data.
  • The growth of the insurance industry depends on the ability to convert data into knowledge, information, or intelligence about customers, competitors, and markets. Data mining was adopted in the insurance industry only recently, but it has brought tremendous competitive advantages to the companies that have implemented it successfully. Data mining applications in the insurance industry include: claims analysis, such as identifying which medical procedures are claimed together; forecasting which customers are likely to purchase new policies; detecting risky customer behavior patterns; and detecting fraudulent behavior.
  • Data mining helps determine the distribution schedules among warehouses and outlets and analyze loading patterns.
  • Data mining makes it possible to characterize patient activities in order to anticipate incoming office visits. Data mining helps identify the patterns of successful medical therapies for different illnesses.
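The customer-loyalty scoring in the banking note (purchase frequency, total monetary value, and time since the last purchase) is often called RFM analysis. A minimal Python sketch follows, assuming purchases arrive as (customer, day, amount) tuples and that the score is simply the sum of each customer's rank on the three dimensions. Both the input format and the equal-weight ranking are illustrative assumptions, and `rfm_scores` is a hypothetical name.

```python
from collections import defaultdict


def rfm_scores(purchases, today):
    """Return a relative loyalty score per customer (higher = more loyal).

    purchases: list of (customer_id, day_number, amount) tuples (assumed format).
    today: day number used to measure recency.
    """
    last_day = {}
    frequency = defaultdict(int)
    monetary = defaultdict(float)
    for customer, day, amount in purchases:
        frequency[customer] += 1
        monetary[customer] += amount
        last_day[customer] = max(day, last_day.get(customer, day))

    customers = sorted(frequency)
    recency = {c: today - last_day[c] for c in customers}

    def rank(values, reverse):
        # Rank customers 1..n on one dimension; the best customer gets n.
        ordered = sorted(customers, key=lambda c: values[c], reverse=reverse)
        return {c: i + 1 for i, c in enumerate(ordered)}

    # A small recency (recent buyer) is better, so sort recency descending.
    r = rank(recency, reverse=True)
    f = rank(frequency, reverse=False)
    m = rank(monetary, reverse=False)
    return {c: r[c] + f[c] + m[c] for c in customers}


purchases = [
    ("alice", 1, 50.0), ("alice", 28, 120.0), ("alice", 29, 30.0),
    ("bob", 5, 20.0),
    ("carol", 20, 300.0), ("carol", 25, 40.0),
]
print(rfm_scores(purchases, today=30))  # {'alice': 8, 'bob': 3, 'carol': 7}
```

Ranking keeps the three dimensions on the same scale without choosing units. A production scheme might instead bin each dimension into quintiles or weight them differently.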

    1. Data Mining and Data Warehousing Techniques. Presented to: Muhammad Faisal. Presented by: Faizan Saleem, Pireh Pirzada, Ahmed Hassan, Muhammad Usman. BSE-4 | DATABASE MANAGEMENT SYSTEM
    2. Topics: Why do we need data warehouses and data mining? What are data warehouses and data mining? History of data warehouses and data mining. Techniques of data warehouses and data mining.
    3. Why We Need Data Mining and Warehousing: problem scenario; solution; needs of data warehouses and data mining.
    4. Why a Data Warehouse? Necessity is the mother of invention.
    5. Information
    6. Problem Scenario 1: ABC Pvt Ltd is a company with branches in Karachi, Lahore, Peshawar and Islamabad. The sales manager wants a quarterly sales report. Each branch has a separate operational system.
    7. [Diagram: ABC Pvt Ltd. The Karachi, Lahore, Peshawar and Islamabad branches each hold their own data; the sales manager needs sales per item type per branch for the first quarter.]
    8. Solution for ABC Pvt Ltd.: Extract sales information from each branch database and store it in a common repository at a single site.
    9. [Diagram: Solution for ABC Pvt Ltd. The Karachi, Lahore, Peshawar and Islamabad branches feed a data warehouse; the sales manager uses query and analysis tools on it to produce reports.]
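The extract-and-consolidate solution for ABC Pvt Ltd. can be sketched in a few lines of Python. This is a minimal illustration, assuming each branch system can be read as a list of (sale_date, item_type, quantity) records. The branch data, the field names, and the functions `extract_and_load` and `quarterly_report` are all hypothetical.

```python
from collections import defaultdict
from datetime import date

# Hypothetical operational data: each branch keeps its own list of
# (sale_date, item_type, quantity) records in a separate system.
BRANCH_SYSTEMS = {
    "Karachi": [(date(2024, 1, 5), "TV", 3), (date(2024, 2, 9), "Radio", 5)],
    "Lahore": [(date(2024, 3, 30), "TV", 2), (date(2024, 4, 2), "TV", 7)],
    "Peshawar": [(date(2024, 1, 15), "Radio", 1)],
    "Islamabad": [],
}


def extract_and_load(branch_systems):
    """Copy every branch's sales into one common repository (a list of rows)."""
    warehouse = []
    for branch, records in branch_systems.items():
        for sale_date, item_type, quantity in records:
            warehouse.append({"branch": branch, "date": sale_date,
                              "item_type": item_type, "quantity": quantity})
    return warehouse


def quarterly_report(warehouse, year, quarter):
    """Total quantity sold per (branch, item_type) in the given quarter."""
    totals = defaultdict(int)
    for row in warehouse:
        d = row["date"]
        if d.year == year and (d.month - 1) // 3 + 1 == quarter:
            totals[(row["branch"], row["item_type"])] += row["quantity"]
    return dict(totals)


warehouse = extract_and_load(BRANCH_SYSTEMS)
print(quarterly_report(warehouse, 2024, 1))
# {('Karachi', 'TV'): 3, ('Karachi', 'Radio'): 5, ('Lahore', 'TV'): 2, ('Peshawar', 'Radio'): 1}
```

The point of the design is that the report runs against the single consolidated repository, so the sales manager does not need to query four separate branch systems.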
    10. Problem Scenario 2: A shopping supermarket has a huge operational database. Whenever executives want a report, the OLTP system becomes slow and data entry operators have to wait.
    11. [Diagram: Problem. Data entry operators and management both use the operational database; while management's report runs, the operators must wait.]
    12. Solutions for the Shopping Mart: Extract the data needed for analysis from the operational database and store it in a warehouse. Refresh the warehouse at regular intervals so that it contains up-to-date information for analysis. The warehouse will contain data with a historical perspective.
    13. [Diagram: Solution. Data entry operators send transactions to the operational database; data is extracted from it into a data warehouse, which serves the manager's reports.]
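A periodic warehouse refresh like the one described for the shopping mart might look like the following sketch. It uses Python's built-in `sqlite3` with two separate in-memory databases standing in for the OLTP system and the warehouse. The table names, columns, and the id-based "watermark" strategy are assumptions chosen for illustration, not a prescribed design.

```python
import sqlite3


def refresh_warehouse(oltp, dw):
    """Copy sales rows added to the operational DB since the last refresh.

    The highest sale id already in the warehouse acts as a watermark, so
    each refresh reads only new rows from the OLTP system.
    """
    (last_id,) = dw.execute(
        "SELECT COALESCE(MAX(id), 0) FROM sales_history").fetchone()
    new_rows = oltp.execute(
        "SELECT id, sold_on, item, amount FROM sales WHERE id > ? ORDER BY id",
        (last_id,),
    ).fetchall()
    dw.executemany("INSERT INTO sales_history VALUES (?, ?, ?, ?)", new_rows)
    dw.commit()
    return len(new_rows)


# Hypothetical schemas: the operational DB and the warehouse are separate
# databases, so management reports never run against the OLTP tables.
oltp = sqlite3.connect(":memory:")
dw = sqlite3.connect(":memory:")
oltp.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, sold_on TEXT, item TEXT, amount REAL)")
dw.execute("CREATE TABLE sales_history (id INTEGER PRIMARY KEY, sold_on TEXT, item TEXT, amount REAL)")

oltp.executemany("INSERT INTO sales (sold_on, item, amount) VALUES (?, ?, ?)",
                 [("2024-01-02", "milk", 2.5), ("2024-01-02", "bread", 1.2)])
first = refresh_warehouse(oltp, dw)   # copies the 2 existing rows
oltp.execute("INSERT INTO sales (sold_on, item, amount) VALUES ('2024-01-03', 'eggs', 3.0)")
second = refresh_warehouse(oltp, dw)  # copies only the 1 new row
print(first, second)  # 2 1
```

Because the warehouse keeps every copied row, it accumulates history over successive refreshes. Note that this watermark approach only picks up inserted rows; updates or deletes in the operational database would need a different change-capture strategy.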
    14. Need for Data Warehousing: Industry has a huge amount of operational data. Knowledge workers want to turn this data into useful information, which they use to support strategic decision making.
    15. Need for Data Warehousing: It is a platform for consolidated historical data for analysis. It stores data of good quality so that knowledge workers can make correct decisions.
    16. Need for Data Warehousing, from a business perspective: It is the latest marketing weapon. It helps keep customers by learning more about their needs. It is a valuable tool in today's competitive, fast-evolving world.
    17. Why Mine Data? Commercial viewpoint: Lots of data is being collected and warehoused, such as web data and e-commerce, purchases at department and grocery stores, and bank and credit card transactions. Computers have become cheaper and more powerful. Competitive pressure is strong: companies provide better, customized services for an edge (customer relationship management).
    18. Why Mine Data? Scientific viewpoint: Data is collected and stored at enormous speeds (GB/hour) by remote sensors on satellites, telescopes scanning the skies, microarrays generating gene expression data, and scientific simulations generating terabytes of data.
    19. What Are Data Mining and Warehousing? Definition of a data warehouse; data warehouse uses; definition of data mining; data mining uses; data warehousing versus data mining; examples.
    20. What Is Data Warehousing? Data warehousing can be said to be the process of centralizing or aggregating data from multiple sources into one common repository. It is a process of transforming data into information and making it available to users in a timely enough manner to make a difference. [Diagram: Data → Information]
    21. Data Warehousing Uses: Reporting and data analysis. Data warehouses store current as well as historical data and are used to create trending reports for senior management, such as annual and quarterly comparisons.
    22. What Is Data Mining? Data mining is the process of discovering new information, in the form of patterns or rules, from vast amounts of data, using methods at the intersection of artificial intelligence, machine learning, statistics, and database systems.
    23. What Is Data Mining? It extracts information and transforms it into an understandable structure. It uses past data to analyze the outcome of a particular problem or situation.
    24. Data Mining Uses: Companies use it to decide on marketing strategies for their products and to compare and contrast themselves with competitors. Data mining turns data into real-time analysis that can be used to increase sales, promote new products, or drop products that add no value to the company.
    25. Data Mining Works with Warehouse Data: Data warehousing provides the enterprise with a memory; data mining provides the enterprise with intelligence.
    26. Data Warehousing vs. Data Mining: Data warehousing occurs before any data mining process; it is the process of compiling and organizing data into one common database. Data mining relies on data warehouse data to detect meaningful patterns; it is the process of extracting meaningful data from that database.
    27. Examples of Data Mining: Credit card fraud detection. Collecting data on shoppers to find patterns in their shopping habits. A great example of data warehousing that everyone can relate to is what Facebook does.
    28. History of Data Mining and Warehousing: data warehouse history; data mining history.
    29. History of Data Warehousing. 1960s: General Mills and Dartmouth College, in a joint research project, develop the terms dimensions and facts. 1970s: ACNielsen and IRI provide dimensional data marts for retail sales. 1970s: Bill Inmon begins to define and discuss the term "data warehouse".
    30. History of Data Warehousing. 1975: Sperry Univac introduces MAPPER (MAintain, Prepare, and Produce Executive Reports), a database management and reporting system that includes the world's first 4GL.
    31. History of Data Warehousing. 1983: Teradata introduces a database management system specifically designed for decision support. 1983: Martyn Richard Jones of Sperry Corporation defines the Sperry Information Center approach, which, while not a true data warehouse in the Inmon sense, did contain many of the characteristics of data warehouse structures.
    32. History of Data Warehousing. 1984: Metaphor Computer Systems releases the Data Interpretation System (DIS), a hardware/software package and GUI for business users to create a database management and analytic system.
    33. History of Data Warehousing. 1988: Barry Devlin and Paul Murphy publish an article in IBM Systems Journal in which they introduce the term "business data warehouse". 1990: Red Brick Systems, founded by Ralph Kimball, introduces Red Brick Warehouse, a database management system specifically for data warehousing. 1991: Prism Solutions, founded by Bill Inmon, introduces Prism Warehouse Manager, software for developing a data warehouse.
    34. History of Data Warehousing. 1992: Bill Inmon publishes the book Building the Data Warehouse. 1995: The Data Warehousing Institute, a for-profit organization that promotes data warehousing, is founded.
    35. History of Data Warehousing. 1996: Ralph Kimball publishes the book The Data Warehouse Toolkit. 2000: Daniel Linstedt releases the Data Vault, enabling real-time, auditable data warehouses.
    36. Brief History of Data Mining: The term "data mining" was introduced in the 1990s. Data mining can be traced through classical statistics, artificial intelligence, and machine learning. Statistics is the foundation of most technologies on which data mining is built; all of these are used to study data and data relationships.
    37. Brief History of Data Mining: Artificial intelligence (AI), which is built upon heuristics as opposed to statistics, attempts to apply human-thought-like processing to statistical problems. AI concepts were adopted in RDBMS query processors.
    38. Brief History of Data Mining: Machine learning is the union of statistics and AI. It could be considered an evolution of AI, because it blends AI heuristics with advanced statistical analysis.
    39. Data Mining Techniques: tasks of data mining; applications of data mining.
    40. Processes Used in Data Mining: Data mining is done with two kinds of methods: • Prediction methods • Description methods
    41. How It Works: Data mining involves six common tasks: • Classification [predictive] • Clustering [descriptive] • Association rule discovery [descriptive] • Sequential pattern discovery [descriptive] • Regression [predictive] • Deviation detection [predictive]
    42. Anomaly Detection: What is anomaly detection? Types of anomaly detection: • Unsupervised anomaly detection • Supervised anomaly detection • Semi-supervised anomaly detection
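Anomaly (deviation) detection flags records that differ markedly from the rest of the data. As a hedged illustration of the unsupervised variant, where no labeled examples of anomalies are available, the Python sketch below flags values whose z-score exceeds a threshold. Z-scores are just one simple choice of detector, and the spending figures and the function name `zscore_anomalies` are hypothetical.

```python
from statistics import mean, pstdev


def zscore_anomalies(values, threshold=3.0):
    """Return indexes of values whose z-score magnitude exceeds threshold.

    Unsupervised: no labels are needed; a point is flagged only because it
    lies far from the mean of the data set, measured in standard deviations.
    """
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values identical: nothing stands out
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


# Hypothetical daily credit card spending for one customer.
daily_spend = [20, 22, 19, 21, 20, 23, 18, 500]
print(zscore_anomalies(daily_spend, threshold=2.5))  # [7]
```

The example uses a lower threshold because, with only eight values, no single point can have a population z-score above √7 ≈ 2.65. Supervised detection would instead train a classifier on records already labeled normal or anomalous.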
    43. Association Rule Learning: What is association rule learning? Examples: • In a supermarket • Inventory management
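Association rule learning in the supermarket example can be illustrated with the two standard rule measures, support and confidence. The Python sketch below only mines rules between single items. A full algorithm such as Apriori would also grow larger itemsets level by level. The function name `pair_rules`, the thresholds, and the baskets are illustrative assumptions.

```python
from collections import Counter
from itertools import combinations


def pair_rules(transactions, min_support=0.5, min_confidence=0.6):
    """Find rules {a} -> {b} between single items.

    support(a, b)    = fraction of transactions containing both a and b
    confidence(a->b) = support(a, b) / support(a)
    """
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in transactions:
        items = set(basket)
        item_counts.update(items)
        pair_counts.update(combinations(sorted(items), 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # Check the rule in both directions: a -> b and b -> a.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
print(pair_rules(baskets, min_support=0.5, min_confidence=0.8))
# [('beer', 'diapers', 0.6, 1.0)]
```

Here "beer → diapers" holds in 60% of baskets, and every basket with beer also contains diapers. This is the kind of co-purchase pattern a store could use for product placement or promotions.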
    44. Classification: What is it? Given a collection of records (the training set), find a model for the class attribute as a function of the values of the other attributes. Goal: previously unseen records should be assigned a class as accurately as possible. [Example figure]
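The classification recipe on this slide (learn a model from a labeled training set, then assign classes to unseen records) can be shown with one of the simplest classifiers, a nearest-centroid model. This is only one possible technique, chosen for brevity. The loan-applicant attributes (income, years employed), the labels, and the function names are hypothetical.

```python
from math import dist


def train_nearest_centroid(records, labels):
    """Learn one centroid (mean attribute vector) per class from the training set."""
    sums, counts = {}, {}
    for x, y in zip(records, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
            counts[y] = 0
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def predict(model, x):
    """Assign a previously unseen record to the class with the nearest centroid."""
    return min(model, key=lambda y: dist(model[y], x))


# Hypothetical training set: (income in thousands, years employed) -> outcome.
records = [(25, 1), (30, 2), (28, 1), (80, 10), (95, 12), (85, 8)]
labels = ["default", "default", "default", "repaid", "repaid", "repaid"]
model = train_nearest_centroid(records, labels)
print(predict(model, (33, 3)), predict(model, (70, 9)))  # default repaid
```

The "model" here is just the per-class centroids. Decision trees, naive Bayes, or neural networks fill the same role with richer models.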
    45. Clusters: What is it? [Example figure]
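Clustering is descriptive: it groups similar records without any class labels. A common technique is k-means (Lloyd's algorithm), sketched below in plain Python as one illustrative choice. The 2-D points, the function name `k_means`, and its parameters are assumptions for this example.

```python
import random
from math import dist


def k_means(points, k, iterations=100, seed=0):
    """Group points into k clusters using Lloyd's algorithm.

    Returns (centroids, labels), where labels[i] is the cluster of points[i].
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen data points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its member points.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(coord) / len(members) for coord in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: assignments no longer change the centroids
        centroids = new_centroids
    return centroids, labels


points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 8.5), (8.5, 9)]
centroids, labels = k_means(points, k=2)
print(labels)  # the first three points share one label, the last three the other
```

k-means needs the number of clusters up front and can be sensitive to the starting centroids, which is why the sketch takes a seed.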
    46. Sequential Pattern Discovery: What is it? Example, in point-of-sale transaction sequences: Computer bookstore: (Intro_To_Visual_C) (C++_Primer) --> (Perl_for_dummies, Tcl_Tk). Athletic apparel store: (Shoes) (Racket, Racketball) --> (Sports_Jacket). [Diagram: sequence (A B) (C) (D E)]
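A sequential pattern such as the bookstore example holds for a customer if its itemsets appear in that customer's purchase history in the same order, though not necessarily consecutively. The Python sketch below checks that containment and computes a pattern's support. The purchase histories are hypothetical, including the extra titles `SQL_Basics` and `Java_Intro`. A real miner such as GSP or PrefixSpan would also generate candidate patterns rather than test a given one.

```python
def contains(sequence, pattern):
    """True if pattern (a list of itemsets) occurs in order within sequence.

    Each pattern element must be a subset of some later element of the
    customer's sequence; matched elements need not be adjacent.
    """
    pos = 0
    for element in pattern:
        # Greedily find the earliest remaining transaction containing the element.
        while pos < len(sequence) and not set(element) <= set(sequence[pos]):
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1  # the next pattern element must match a later transaction
    return True


def sequence_support(sequences, pattern):
    """Fraction of customer sequences that contain the pattern."""
    return sum(contains(s, pattern) for s in sequences) / len(sequences)


# Hypothetical purchase histories, one list of transactions per customer.
histories = [
    [{"Intro_To_Visual_C"}, {"C++_Primer"}, {"Perl_for_dummies", "Tcl_Tk"}],
    [{"Intro_To_Visual_C", "SQL_Basics"}, {"C++_Primer"}, {"Java_Intro"},
     {"Perl_for_dummies", "Tcl_Tk"}],
    [{"C++_Primer"}, {"Intro_To_Visual_C"}],
    [{"Perl_for_dummies"}],
]
pattern = [{"Intro_To_Visual_C"}, {"C++_Primer"}, {"Perl_for_dummies", "Tcl_Tk"}]
print(sequence_support(histories, pattern))  # 0.5
```

The third customer bought both introductory books but in the opposite order, so the pattern does not count for that customer.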
    47. Regression: What is it? Example: PageRank as used by Google. • Page structure implicitly holds the importance of a page. • Important pages are linked to by important pages.
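The slide's example, PageRank, ranks pages by link analysis (a page is important if important pages link to it) rather than by fitting a regression model. To show what a regression task looks like, here is a minimal Python sketch of simple linear regression fitted by ordinary least squares, which predicts a numeric target from one attribute. The advertising-spend and sales figures and the name `fit_line` are hypothetical.

```python
def fit_line(xs, ys):
    """Fit y = a + b*x by ordinary least squares; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the n terms cancel.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x  # the fitted line passes through the means
    return a, b


# Hypothetical data: advertising spend vs. units sold.
ad_spend = [1, 2, 3, 4, 5]
sales = [3.1, 4.9, 7.2, 8.8, 11.0]
a, b = fit_line(ad_spend, sales)
print(round(a, 2), round(b, 2))  # 1.09 1.97
print(round(a + b * 6, 2))       # predicted sales for a spend of 6: 12.91
```

Regression is predictive because, once fitted, the line estimates the target for attribute values that were never observed.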
    48. Applications of Data Mining: sales/marketing; banking/finance; health care and insurance; transportation; medicine.
    49. Data Mining Applications in Sales/Marketing: Data mining enables businesses to understand the hidden patterns inside historical purchasing transactions. • Market basket analysis • Identifying customer behavior
    50. Data Mining Applications in Banking/Finance: • Credit card fraud detection • Identifying customer loyalty • Identifying stock trading rules • Identifying users by method of payment/transaction
    51. Data Mining Applications in Health Care and Insurance: • Claims analysis • Forecasting which customers will buy new policies • Detecting risky customers • Detecting fraudulent behavior
    52. Data Mining Applications in Transportation: • Determining distribution schedules
    53. Data Mining Applications in Medicine: • Characterizing patient activities • Identifying patterns of successful therapies
    54. Data Warehousing Techniques: star schema; elements; example; star schema vs. snowflake schema.
    55. Star Schema: The star schema is the simplest form of a dimensional model, in which data is organized into facts and dimensions. A star schema is diagrammed by surrounding each fact with its associated dimensions; the resulting diagram resembles a star. Star schemas are optimized for querying large data sets and are used in data warehouses and data marts to support OLAP cubes, business intelligence and analytic applications, and queries.
    56. Elements of a Star Schema: Dimension tables. A dimension contains reference information about the fact, such as date, product, or customer. Dimension tables hold a denormalized, decoded, and cleaned set of descriptive data elements. For example, geography dimension tables describe location data, such as country, state, or city, and employee dimension tables describe employees, such as salespeople.
    57. Fact Tables: A fact is an event that is counted or measured, such as a sale or a login. A fact table contains foreign keys referencing dimension records, and either additive or semi-additive measures for analysis.
    58. 58. Example Each dimension table has a primary key on its Id column, relatingto one of the columns (viewed as rows in the example schema) ofthe Fact_Sales tables three-column (compound) primary key(Date_Id, Store_Id, Product_Id). The non-primary key Units_Sold column of the fact table in thisexample represents a measure or metric that can be used incalculations and analysis. The non-primary key columns of the dimension tables representadditional attributes of the dimensions (such as the Year of theDim_Date dimension). For example, the following query answers how many TV sets havebeen sold, for each brand and country, in 1997: SELECT P.Brand, S.Country, SUM(F.Units_Sold)FROMFact_Sales FINNER JOIN Dim_Date D ON F.Date_Id = D.IdINNERJOIN Dim_Store S ON F.Store_Id = S.IdINNER JOIN Dim_Product PON F.Product_Id = P.IdWHERE D.YEAR = 1997ANDP.Product_Category = tvGROUP BY P.Brand, S.Country
    59. Snowflake Schema vs. Star Schema:
        • Ease of maintenance/change: The snowflake schema has no redundancy, so it is easier to maintain and change. The star schema has redundant data, so it is harder to maintain and change.
        • Ease of use: The snowflake schema needs more complex queries, so it is harder to understand. The star schema has less complex queries and is easier to understand.
        • Query performance: The snowflake schema has more foreign keys, so queries take longer to execute. The star schema has fewer foreign keys, so query execution time is shorter.
        • Normalization: The snowflake schema has normalized tables. The star schema has denormalized tables.
    60. Snowflake Schema vs. Star Schema, continued:
        • Type of data warehouse: The snowflake schema is good for the data warehouse core, to simplify complex (many-to-many) relationships. The star schema is good for data marts with simple (1:1 or 1:many) relationships.
        • Joins: The snowflake schema needs more joins; the star schema needs fewer joins.
        • Dimension tables: The snowflake schema may have more than one dimension table for each dimension. The star schema has only a single dimension table for each dimension.
        • When to use: When a dimension table is relatively big, snowflaking is better because it reduces space. When a dimension table has fewer rows, a star schema is a good choice.
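The "more joins" trade-off in the comparison can be made concrete. In the hypothetical sketch below (again using Python's `sqlite3`), the product dimension from the star schema example is snowflaked: the category text moves into its own normalized table, `Dim_Category`. Filtering on category then needs one more join. Table contents and the `Category_Id` column are illustrative assumptions.

```python
import sqlite3

# Hypothetical snowflaked product dimension: the category text is moved
# out of Dim_Product into its own normalized table, Dim_Category.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Dim_Category (Id INTEGER PRIMARY KEY, Name TEXT UNIQUE);
CREATE TABLE Dim_Product  (Id INTEGER PRIMARY KEY, Brand TEXT,
                           Category_Id INTEGER REFERENCES Dim_Category(Id));
CREATE TABLE Fact_Sales   (Product_Id INTEGER REFERENCES Dim_Product(Id),
                           Units_Sold INTEGER);
INSERT INTO Dim_Category VALUES (1, 'tv'), (2, 'radio');
INSERT INTO Dim_Product  VALUES (1, 'BrandA', 1), (2, 'BrandB', 1), (3, 'BrandA', 2);
INSERT INTO Fact_Sales   VALUES (1, 10), (2, 7), (3, 50), (1, 4);
""")

# Filtering on the category now needs one extra join compared with the
# star schema, where Product_Category sat directly in Dim_Product.
rows = con.execute("""
SELECT P.Brand, SUM(F.Units_Sold)
FROM Fact_Sales F
INNER JOIN Dim_Product P  ON F.Product_Id = P.Id
INNER JOIN Dim_Category C ON P.Category_Id = C.Id
WHERE C.Name = 'tv'
GROUP BY P.Brand
ORDER BY P.Brand
""").fetchall()
print(rows)  # [('BrandA', 14), ('BrandB', 7)]
```

In exchange for the extra join, renaming a category now touches one row in `Dim_Category` instead of every matching product row. This is the maintenance benefit listed for the snowflake schema.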
    61. References
    62. Thank you for your attention. Any questions?
    63. Presented by Engr. Faizan Saleem, Software Engineer, Bahria University Karachi