Data warehousing

Speaker notes
  • Our bag is a data warehouse: it holds material on different subjects in different formats (books, notes, PPTs).
  • Example of Samsung products: the sales manager wants to know quarterly sales all over India.
  • Data warehousing

    1. Data Warehousing & Data Mining, by Mandar Kulkarni, PRN 10030141129, MBA-IT, SICSR
    2. Contents
        • Data warehousing
        • Understanding data warehousing
        • Data warehouse architecture
        • Data mining
        • Data mining techniques
    3. Warehouse? A real-life example?
    4. Data Warehousing
    5. Samsung: each branch (Mumbai, Delhi, Chennai, Bangalore) records sales per item type; the sales manager needs the sales for the first quarter.
    6. • Now the sales manager wants to know the sales for the first quarter. How?
        • Solution: extract the information from each branch database, store it in a single place, and process it using operational systems!
    7. Solution: the sales data from Mumbai, Delhi, Chennai and Bangalore are loaded into a warehouse, and the sales manager uses report, query and data analysis tools on the warehouse.
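The consolidation on slides 5 to 7 can be sketched with Python's standard-library `sqlite3` module. This is a minimal sketch under stated assumptions, not a production design: the branch records, the table name `sales` and the helpers `build_warehouse` and `first_quarter_sales` are hypothetical, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3

# Hypothetical per-branch records: (item_type, sale_date, amount).
# Each branch keeps its own operational data, as in the Samsung example.
branch_sales = {
    "Mumbai":    [("TV", "2024-01-15", 500.0), ("Phone", "2024-04-02", 300.0)],
    "Delhi":     [("TV", "2024-02-10", 450.0), ("Phone", "2024-03-28", 250.0)],
    "Chennai":   [("Phone", "2024-01-05", 200.0)],
    "Bangalore": [("TV", "2024-03-30", 550.0), ("TV", "2024-05-01", 600.0)],
}

def build_warehouse(sources):
    """Extract rows from every branch and load them into one table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales (branch TEXT, item_type TEXT, sale_date TEXT, amount REAL)"
    )
    for branch, rows in sources.items():
        conn.executemany(
            "INSERT INTO sales VALUES (?, ?, ?, ?)",
            [(branch, item, day, amount) for item, day, amount in rows],
        )
    return conn

def first_quarter_sales(conn, year):
    """Total sales per branch and item type for January to March of `year`."""
    # ISO-formatted date strings compare correctly as text, so BETWEEN works.
    return conn.execute(
        """
        SELECT branch, item_type, SUM(amount)
        FROM sales
        WHERE sale_date BETWEEN ? AND ?
        GROUP BY branch, item_type
        ORDER BY branch, item_type
        """,
        (f"{year}-01-01", f"{year}-03-31"),
    ).fetchall()

conn = build_warehouse(branch_sales)
for row in first_quarter_sales(conn, 2024):
    print(row)
```

The manager's question becomes one `GROUP BY` query against a single store instead of four separate lookups.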
    8. Operational Systems
        • Running the business in real time
        • Routine tasks
        • Decision Support Systems (DSS): help in taking actions!
        • Used by people who deal with customers and products
        • Increasingly used by customers themselves
    9. Data Warehouse
        • A single, complete and consistent store of data obtained from a variety of different sources and made available to end users in a way they can understand and use in a business context
        • A process of transforming data into information and making it available to users soon enough to make a difference
    10. Definition
        • An integrated, subject-oriented, time-variant, nonvolatile database that provides support for decision making
    11. Data warehouse architecture
    12. [Figure: data warehouse architecture. Source data (production, internal, archived, external) → data staging → data warehouse DBMS, MDDB, data marts and metadata → information delivery (report/query, data mining); management & control.]
    13. Components
        • Source data
        • Data staging (data extraction, cleaning and loading), e.g. Talend, an open-source ETL tool
        • Data storage
        • Information delivery (EIS)
        • Management and control
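The data staging component (extraction, cleaning and loading) can be illustrated with a small Python sketch. The two source formats, their field names, the helpers `clean_a`, `clean_b` and `stage`, and the non-negative-amount validation rule are all hypothetical; ETL tools such as Talend perform these steps at scale through their own configuration, which this sketch does not model.

```python
from datetime import datetime

# Hypothetical raw extracts from two source systems with inconsistent formats.
source_a = [{"city": "mumbai ", "date": "15/01/2024", "amount": "500"}]
source_b = [{"City": "DELHI", "Date": "2024-02-10", "Amt": 450}]

def clean_a(rec):
    """Source A: lower-case padded city names, DD/MM/YYYY dates, amounts as text."""
    return {
        "city": rec["city"].strip().title(),
        "date": datetime.strptime(rec["date"], "%d/%m/%Y").date(),
        "amount": float(rec["amount"]),
    }

def clean_b(rec):
    """Source B: upper-case city names, ISO dates, different field names."""
    return {
        "city": rec["City"].strip().title(),
        "date": datetime.strptime(rec["Date"], "%Y-%m-%d").date(),
        "amount": float(rec["Amt"]),
    }

def stage(sources):
    """Extract -> clean -> load into one list (a stand-in for a warehouse table)."""
    warehouse = []
    for records, cleaner in sources:
        for rec in records:
            row = cleaner(rec)
            if row["amount"] >= 0:  # assumed validation rule: drop negative amounts
                warehouse.append(row)
    return warehouse

print(stage([(source_a, clean_a), (source_b, clean_b)]))
```

Each source gets its own cleaning function, so every row that reaches storage has the same field names, types and spelling.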
    14. OLAP
        • Online Analytical Processing tools
        • DSS tools that use multidimensional data analysis techniques
          – Support for a DSS data store
          – Data extraction and integration filter
          – Specialized presentation interface
        • Oracle OLAP 11g
    15. Multidimensional analysis
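Multidimensional analysis treats the data as a cube with one axis per dimension. The sketch below assumes three hypothetical dimensions (branch, quarter, item type) and shows two common OLAP operations: roll-up, which aggregates away dimensions, and slice, which fixes one dimension to a single value. OLAP servers such as Oracle OLAP precompute and index such aggregates; this plain-Python version only illustrates the idea.

```python
from collections import defaultdict

# Hypothetical fact rows: (branch, quarter, item_type, sales).
facts = [
    ("Mumbai", "Q1", "TV", 500), ("Mumbai", "Q1", "Phone", 300),
    ("Mumbai", "Q2", "TV", 400), ("Delhi", "Q1", "TV", 450),
    ("Delhi", "Q2", "Phone", 250), ("Chennai", "Q1", "Phone", 200),
]
DIMS = ("branch", "quarter", "item_type")

def roll_up(rows, keep):
    """Sum sales over every dimension not listed in `keep`."""
    idx = [DIMS.index(d) for d in keep]
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[i] for i in idx)] += row[-1]
    return dict(totals)

def slice_cube(rows, dim, value):
    """Keep only the rows where dimension `dim` equals `value`."""
    i = DIMS.index(dim)
    return [row for row in rows if row[i] == value]

# Sales per quarter across all branches and items.
print(roll_up(facts, ["quarter"]))  # {('Q1',): 1450, ('Q2',): 650}
# First-quarter sales per branch: a slice followed by a roll-up.
print(roll_up(slice_cube(facts, "quarter", "Q1"), ["branch"]))
# {('Mumbai',): 800, ('Delhi',): 450, ('Chennai',): 200}
```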
    16. OLAP architecture
    17. 12 Rules of Data Warehouse
        1. Data warehouse and operational environments are separated
        2. Data is integrated
        3. Contains historical data over a long period of time
        4. Data is snapshot data captured at a given point in time
        5. Data is subject-oriented
    18. 12 Rules of Data Warehouse (continued)
        6. Mainly read-only, with periodic batch updates
        7. The development life cycle follows a data-driven approach rather than the traditional process-driven approach
        8. Data contains several levels of detail: current, old, lightly summarized, highly summarized
    19. 12 Rules of Data Warehouse (continued)
        9. The environment is characterized by read-only transactions on very large data sets
        10. The system traces data sources, transformations, and storage
        11. Metadata is a critical component: source, transformation, integration, storage, relationships, history, etc.
        12. Contains a chargeback mechanism for resource usage that enforces optimal use of data by end users
    20. OLTP v/s Data warehousing
        OLTP                        Data warehousing
        • Application oriented      • Subject oriented
        • Used to run business      • Used to analyze business
        • Detailed data             • Summarized and refined
        • Current, up-to-date       • Snapshot data
        • Isolated data             • Integrated data
        • Repetitive access         • Ad-hoc access
        • Performance sensitive     • Performance relaxed
        • Few records accessed      • Large volume accessed at a time
        • Read/update access        • Mostly read
    21. Data Warehouse summary
        • Integrated platform for OLAP and DSS
        • Helps optimize business operations
        • Easy access to multidimensional data
    22. Data Mining
    23. Why Data Mining?
        • Wealth generation
        • Analyzing trends
        • Strategic decision making
        • Security
    24. Data Mining
        • Looks for hidden patterns and trends in data that are not immediately apparent from summarizing the data
        • No query…
        • …but an “interestingness criterion”
    25. Data Mining: Data + Interestingness criteria = Hidden patterns
    26. Data Mining: Data + Interestingness criteria = Hidden patterns (highlighting the type of patterns)
    27. Data Mining: Data + Interestingness criteria = Hidden patterns (highlighting the type of data and the type of interestingness criteria)
    28. Type of Data
        • Tabular (e.g. transaction data)
          – Relational
          – Multi-dimensional
        • Tree (e.g. XML data)
        • Graphs
        • Sequence (e.g. DNA, activity logs)
        • Text, multimedia …
    29. Type of Interestingness
        • Frequency
        • Rarity
        • Correlation
        • Length of occurrence (for sequence and temporal data)
        • Consistency
        • Repetition / periodicity
        • “Abnormal” behavior
        • Other patterns of interestingness…
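Two of the criteria above, frequency and correlation, are often measured for item pairs as support and lift. In this sketch the transactions and the `pair_measures` helper are hypothetical: support is the fraction of transactions that contain both items, and lift divides that by the value independence would predict, so a lift above 1 suggests a positive correlation and a lift below 1 a negative one.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market-basket transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def pair_measures(transactions):
    """Return {(a, b): (support, lift)} for every pair that co-occurs."""
    n = len(transactions)
    item_counts = Counter(item for t in transactions for item in t)
    # Sort items so each unordered pair is counted under one key.
    pair_counts = Counter(
        pair for t in transactions for pair in combinations(sorted(t), 2)
    )
    result = {}
    for (a, b), count in pair_counts.items():
        support = count / n
        expected = (item_counts[a] / n) * (item_counts[b] / n)
        result[(a, b)] = (support, support / expected)
    return result

for pair, (support, lift) in sorted(pair_measures(transactions).items()):
    print(pair, f"support={support:.2f}", f"lift={lift:.2f}")
```

Here bread and butter appear together more often than chance would predict, while butter and milk appear together less often.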
    30. Data Mining vs Statistical Inference
        Statistics: Conceptual model (hypothesis) → Statistical reasoning → “Proof” (validation of hypothesis)
    31. Data Mining vs Statistical Inference
        Data mining: Data → Mining algorithm based on interestingness → Pattern (model, rule, hypothesis) discovery
    32. Used for…
        • Data mining is used for
          – Frequent item-sets
          – Associations
          – Classification
          – Clustering
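Frequent item-sets are classically mined with the Apriori algorithm, which the techniques slide lists. The sketch below is a minimal, unoptimized version of its level-wise idea, assuming transactions are Python sets and `min_support` is a fraction of all transactions: frequent k-itemsets are joined into (k+1)-item candidates, and any candidate with an infrequent subset is pruned before its support is counted. The function name `apriori` and the sample data are illustrative only.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every frequent itemset."""
    n = len(transactions)
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        # Fraction of transactions that contain every item in `itemset`.
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    frequent = {s: support(s) for s in current}

    k = 2
    while current:
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
        frequent.update({c: support(c) for c in current})
        k += 1
    return frequent

baskets = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"}]
for itemset, s in apriori(baskets, 0.5).items():
    print(sorted(itemset), s)
```

The prune step is what makes Apriori cheaper than brute force: here {bread, milk, butter} is discarded without being counted because {milk, butter} is not frequent.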
    33. Techniques
        • Algorithms
          – Apriori algorithm
          – Decision tree
            • SLIQ: Supervised Learning in QUEST (IBM)
        • “GROUP BY”
          mysql> select sum(sal), deptno from emp group by deptno;
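Decision-tree learners such as SLIQ choose each split by minimizing an impurity measure; SLIQ evaluates splits with the Gini index. The following is a simplified, single-attribute sketch of that split search with hypothetical data and helper names (`gini`, `best_split`): it sorts one numeric attribute and tests thresholds at midpoints between consecutive distinct values. SLIQ's own contribution, pre-sorted attribute lists and a class list that let it scale to large data sets, is not attempted here.

```python
def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def best_split(values, labels):
    """Return (weighted_gini, threshold) for the best test `value <= threshold`."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (float("inf"), None)
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # equal values cannot be separated by a threshold
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        # Weight each side's impurity by its share of the records.
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[0]:
            best = (score, threshold)
    return best

# Hypothetical records: customer age and whether they bought a product.
ages = [25, 35, 45, 20, 50]
bought = ["no", "yes", "yes", "no", "yes"]
print(best_split(ages, bought))
```

A full tree learner would apply this search to every attribute, split on the best one, and recurse on each side.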
    34. Data Mining Summary
        • Helps in pattern analysis and thus in taking actions, both in real time and for the future
        • Analyzes trends and clusters in business operations
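Clustering, one of the data mining uses listed earlier, can be illustrated with k-means. The slides do not name a clustering algorithm, so k-means is an assumption chosen as a common example. The sketch uses hypothetical 2-D points, seeds the centroids with the first k points for determinism, and alternates assignment and update steps until the centroids stop moving.

```python
import math

def kmeans(points, k, iterations=100):
    """Plain k-means (Lloyd's algorithm) on 2-D points.

    Initial centroids are simply the first k points, a deterministic choice
    for this sketch; practical tools use smarter seeding.
    """
    centroids = [tuple(p) for p in points[:k]]
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for j, cluster in enumerate(clusters):
            if cluster:
                new_centroids.append(tuple(sum(c) / len(cluster) for c in zip(*cluster)))
            else:
                new_centroids.append(centroids[j])  # empty cluster keeps its centroid
        if new_centroids == centroids:
            break  # converged: assignments no longer change the centroids
        centroids = new_centroids
    return centroids, clusters

# Hypothetical customers described by (visits per month, average spend).
customers = [(1, 1), (10, 10), (1, 2), (2, 1), (10, 11), (11, 10)]
centroids, clusters = kmeans(customers, 2)
print(centroids)
print(clusters)
```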
    36. Thank you. Any questions?
