Datawarehousing & DSS


Published in: Education, Technology, Business
  • The data model is a diagram that represents the entities in the database and their relationships. An entity is a person, place, thing, or event about which information is maintained. A record generally describes an entity. An attribute is a particular characteristic or quality of a particular entity. The primary key is a field that uniquely identifies a record. Secondary keys are other fields that carry some identifying information but typically do not identify a record with complete accuracy.
  • Datawarehousing & DSS

    1. 1. Presented By: Mahesh Choudhari - 10, Deepali Raut - 54, Rohit Muslonkar - 30, Prajakta Mali - 25, Sheetal Sonawane - 49
    2. 2. Database Approach and Design <ul><li>A DBMS minimizes the following problems: </li></ul><ul><ul><li>Data redundancy </li></ul></ul><ul><ul><li>Data isolation </li></ul></ul><ul><ul><li>Data inconsistency </li></ul></ul><ul><li>A database is designed by means of: </li></ul><ul><ul><li>Tables and constraints </li></ul></ul><ul><ul><li>Data dictionary </li></ul></ul><ul><ul><li>Normalization </li></ul></ul><ul><ul><li>Indexing </li></ul></ul>
    3. 3. Database Table
    4. 4. Definition – Data Dictionary <ul><li>It is an integral part of a database that holds the metadata, i.e., data about data </li></ul><ul><li>Advantages of a data dictionary: </li></ul><ul><li>Creating an informative and well-designed database </li></ul><ul><li>Identifying table structures and types </li></ul>
    5. 5. Data Dictionary
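As a minimal sketch of "data about data", the Python snippet below uses the standard-library sqlite3 module to read a table's column metadata back from the database catalog. The `customer` table, its columns, and the helper `describe_table` are hypothetical and not taken from the slides.

```python
import sqlite3

def describe_table(conn, table):
    """Return (column name, declared type, is primary key) for each column."""
    # PRAGMA table_info yields rows of (cid, name, type, notnull, dflt_value, pk).
    # The table name is interpolated directly, so it must come from trusted code.
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return [(name, col_type, bool(pk)) for _, name, col_type, _, _, pk in rows]

conn = sqlite3.connect(":memory:")
# Hypothetical table used only for illustration.
conn.execute(
    "CREATE TABLE customer ("
    " customer_id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " city TEXT)"
)

dictionary = describe_table(conn, "customer")
for entry in dictionary:
    print(entry)
```

Other database systems expose the same kind of information through their own catalog tables or views; the SQLite pragma is used here only because it needs no setup.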
    6. 6. What is Normalization? <ul><li>Normalization is a method for analyzing and reducing a relational database to its most streamlined form </li></ul><ul><ul><li>Minimum redundancy </li></ul></ul><ul><ul><li>Maximum data integrity </li></ul></ul><ul><ul><li>Best processing performance </li></ul></ul>
    7. 7. Non-Normalized Relation
    8. 8. Normalizing the Database
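The step from a non-normalized relation to normalized tables can be sketched in plain Python. The rows, column names, and the `normalize` helper below are assumptions made for illustration; the tables shown on the original slides are not available in this transcript.

```python
# Hypothetical non-normalized relation: customer details repeat on every order row.
flat_orders = [
    {"order_id": 1, "customer": "Asha", "city": "Pune", "product": "Router"},
    {"order_id": 2, "customer": "Asha", "city": "Pune", "product": "Modem"},
    {"order_id": 3, "customer": "Ravi", "city": "Mumbai", "product": "Router"},
]

def normalize(rows):
    """Split the flat rows into a customers table and an orders table."""
    customers = {}  # (name, city) -> surrogate customer_id
    orders = []
    for row in rows:
        key = (row["customer"], row["city"])
        # Assign a new id only the first time a customer is seen.
        customer_id = customers.setdefault(key, len(customers) + 1)
        orders.append(
            {
                "order_id": row["order_id"],
                "customer_id": customer_id,
                "product": row["product"],
            }
        )
    customer_table = [
        {"customer_id": cid, "customer": name, "city": city}
        for (name, city), cid in customers.items()
    ]
    return customer_table, orders

customer_table, order_table = normalize(flat_orders)
print(customer_table)
print(order_table)
```

Each customer's city is now stored once, so a change of address touches a single row instead of every order, which is the redundancy and integrity benefit the slides describe.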
    9. 9. What is Indexing? <ul><li>An index is a database object used to improve the speed of data retrieval operations </li></ul><ul><li>Indexes can be created on one or more columns of a database table that are frequently used together </li></ul><ul><li>They provide the basis for rapid random lookups and efficient access to ordered records </li></ul><ul><li>Function-based indexes support searches on a computed value, e.g., case-insensitive searches on the upper- or lower-cased form of a column </li></ul>
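A small sketch of both kinds of index mentioned on this slide, using Python's sqlite3 module. The table, data, and index names are hypothetical, and the expression index assumes a SQLite build of version 3.9 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT, city TEXT)"
)
conn.executemany(
    "INSERT INTO customer (name, city) VALUES (?, ?)",
    [("Asha", "Pune"), ("RAVI", "Mumbai"), ("ravi", "Pune")],
)

# Composite index on two columns that are assumed to be queried together.
conn.execute("CREATE INDEX idx_customer_city_name ON customer (city, name)")
# Expression (function-based) index supporting case-insensitive lookups on name.
conn.execute("CREATE INDEX idx_customer_lower_name ON customer (lower(name))")

def find_by_name(conn, name):
    """Case-insensitive lookup; the predicate matches the indexed expression."""
    return conn.execute(
        "SELECT customer_id, name FROM customer"
        " WHERE lower(name) = ? ORDER BY customer_id",
        (name.lower(),),
    ).fetchall()

matches = find_by_name(conn, "Ravi")
print(matches)

# EXPLAIN QUERY PLAN reports whether SQLite chose an index for the query.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT customer_id FROM customer WHERE lower(name) = 'ravi'"
).fetchall()
print(plan)
```

The query must use the same expression as the index, here `lower(name)`, for the optimizer to consider it.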
    10. 11. Why a data warehouse? <ul><ul><li>Data - scattered, different versions, subtle differences </li></ul></ul><ul><ul><li>Poor data documentation </li></ul></ul><ul><ul><li>Requires Data transformation </li></ul></ul><ul><li>Traditional data management approach is query driven, i.e., lazy and on-demand </li></ul>
    11. 12. Why a data warehouse? (cont’d) <ul><li>Query driven approach has its problems </li></ul><ul><ul><li>Delay in query processing </li></ul></ul><ul><ul><ul><li>Unavailability of a data source </li></ul></ul></ul><ul><ul><ul><li>Need to filter and integrate results </li></ul></ul></ul><ul><ul><li>Frequent queries are usually inefficient and expensive </li></ul></ul><ul><ul><ul><li>Difficult to implement caching </li></ul></ul></ul><ul><ul><ul><li>Lack of standards </li></ul></ul></ul><ul><ul><li>Need to compete with local processing resources </li></ul></ul>
    12. 13. Data Warehouse Definition <ul><li>Subject-oriented </li></ul><ul><li>Integrated </li></ul><ul><li>Time-variant </li></ul><ul><li>Non-volatile collection of data </li></ul>
    13. 14. Data Warehouse Definition… <ul><li>Subject-Oriented: </li></ul><ul><ul><li>The data warehouse is organized around the key subjects (or high-level entities) of the enterprise. Major subjects include </li></ul></ul><ul><ul><ul><li>Customers </li></ul></ul></ul><ul><ul><ul><li>Suppliers </li></ul></ul></ul><ul><ul><ul><li>Revenues </li></ul></ul></ul><ul><ul><ul><li>Products ,etc. </li></ul></ul></ul>
    14. 15. Data Warehouse Definition… <ul><li>Integrated </li></ul><ul><ul><li>The data housed in the data warehouse are defined using consistent </li></ul></ul><ul><ul><ul><li>Naming conventions </li></ul></ul></ul><ul><ul><ul><li>Formats </li></ul></ul></ul><ul><ul><ul><li>Encoding Structures </li></ul></ul></ul><ul><ul><ul><li>Related Characteristics </li></ul></ul></ul>
    15. 16. Data Warehouse Definition… <ul><li>Time-variant </li></ul><ul><ul><li>The data in the warehouse contain a time dimension so that they may be used as a historical record of the business </li></ul></ul><ul><li>Non-volatile </li></ul><ul><ul><li>Data in the data warehouse are loaded and refreshed from operational systems, but cannot be updated by end-users </li></ul></ul>
    16. 17. The Data Warehouse advantage <ul><li>Semantic reconciliation </li></ul><ul><ul><li>Data sources are distributed in many businesses </li></ul></ul><ul><ul><li>Different encoding of the same entities </li></ul></ul><ul><ul><li>A warehouse encompasses the full volume of data in a single unified schema </li></ul></ul><ul><li>Performance </li></ul><ul><ul><li>Managers need different views of the same data </li></ul></ul><ul><ul><li>Efficiently supports OLAP operations </li></ul></ul>
    17. 18. The data warehouse advantage (cont’d...) <ul><li>Improves data quality </li></ul><ul><ul><li>Data from a source usually needs “cleaning” </li></ul></ul><ul><ul><li>The warehouse acts as a “cleaning buffer” </li></ul></ul><ul><ul><li>Thus, minimizes data error </li></ul></ul><ul><li>There is clear ROI (Return on Investment) for organizations implementing a data warehouse </li></ul><ul><ul><li>Quick and easy access to data </li></ul></ul><ul><ul><li>Extensive analysis of data for Decision making </li></ul></ul><ul><ul><li>Consolidated view of organizational data </li></ul></ul>
    18. 19. Evolution of Data warehouse
    19. 20. What is a Data Warehouse? A Practitioner's Viewpoint <ul><li>A single, complete, and consistent store of data obtained from a variety of sources and made available to end users in a way they can understand and use in a business context </li></ul>
    20. 21. Data Warehouse Architecture <ul><li>Source Systems: execution systems (CRM, ERP, legacy, e-commerce) and external data (purchased market data, spreadsheets). Sample technologies: PeopleSoft, SAP, Siebel, Oracle Applications, Manugistics, custom systems </li></ul><ul><li>Extract, Transformation, and Load (ETL) Layer: cleanse data, filter records, standardize values, decode values, apply business rules, householding, dedupe records, merge records. ETL tools: Informatica PowerMart, Ab Initio, DataStage, Oracle Warehouse Builder, custom programs, SQL scripts </li></ul><ul><li>Data and Metadata Repository Layer: enterprise data warehouse and metadata repository. Sample technologies: Oracle, SQL Server, Teradata, DB2 </li></ul><ul><li>Presentation Layer: reporting tools, OLAP tools, ad hoc query tools, data mining tools. Sample technologies: custom tools, HTML reports, Cognos, Business Objects, MicroStrategy, Oracle Discoverer, Brio, data mining tools, portals </li></ul>
    21. 22. Typical Data Warehouse Architecture
    22. 23. DW Architecture <ul><li>Generic Two-Level Architecture </li></ul><ul><li>Independent Data Mart </li></ul><ul><li>Dependent Data Mart and Operational Data Store </li></ul><ul><li>Logical Data Mart and active Warehouse </li></ul>
    23. 24. Tools used in Data Warehousing
        Component | Product used | Purpose
        Reporting | Crystal Reports | Create presentation-style reports with charts and graphs
        Querying | Access 2000 | Create complex ad hoc queries against a variety of data sources
        OLAP | Crystal Analysis Professional | Access data cubes for designing views to pivot, filter and aggregate facts on pre-defined dimensions for specific subject areas
        Data Mining/Statistical Analysis | SAS | Statistical analysis and churn analysis
    24. 25. Components of Data warehouse <ul><li>Operational Source System </li></ul><ul><li>Data Staging Area </li></ul><ul><li>-- Services: clean, combine and standardize </li></ul><ul><li>-- Data store: flat files and relational tables </li></ul><ul><li>-- Processing: sorting and sequential processing </li></ul><ul><li>Data Presentation Area </li></ul><ul><li>-- Data marts: the data divided into separate blocks according to requirement or application area </li></ul><ul><li>Data Access Tools </li></ul>
    25. 26. ETL – Extract, Transform and Load <ul><li>Extract, Transform and Load (ETL) is a process that involves extracting data from multiple sources in various formats, transforming it to fit business needs, and ultimately loading it into a target system. </li></ul><ul><li>The target system will generally be configured as a data warehouse or data mart, though ETL can refer to a process that loads to any type of data storage structure. </li></ul><ul><li>The structure itself will typically be a database, but may also be an application, file or other storage facility. </li></ul><ul><li>The purpose of ETL is to reformat, cleanse and standardize data so that it can be analyzed or exchanged to address business needs and/or promote interoperability. </li></ul><ul><li>Note that ETT (extraction, transformation, transportation) and ETM (extraction, transformation, move) may be used synonymously with ETL; ELT (extraction, load, transform) is a closely related variant in which the transformation runs after the load. </li></ul>
    26. 27. ETL Data Flow…
    27. 28. ETL process <ul><li>Stands for Extract, Transform and Load </li></ul><ul><li>It is the process in data warehousing responsible for pulling data out of the source systems and placing it into a data warehouse. </li></ul><ul><li>Involves the following tasks: </li></ul><ul><li>1. Extracting the data from source systems (SAP, ERP, other operational systems); data from the different source systems is converted into one consolidated data warehouse format that is ready for transformation processing. </li></ul><ul><li>2. Transforming the data: </li></ul><ul><ul><li>applying business rules (such as derivations, calculating new measures and dimensions), </li></ul></ul><ul><ul><li>cleaning (e.g., mapping NULL to 0, or "Male" to "M" and "Female" to "F"), </li></ul></ul><ul><ul><li>filtering (e.g., selecting only certain columns to load), </li></ul></ul><ul><ul><li>splitting a column into multiple columns and vice versa, </li></ul></ul><ul><ul><li>joining together data from multiple sources (e.g., lookup, merge), </li></ul></ul><ul><ul><li>transposing rows and columns, </li></ul></ul><ul><ul><li>applying any kind of simple or complex data validation (e.g., if the first 3 columns in a row are empty, reject the row from processing) </li></ul></ul><ul><li>3. Loading the data into a data warehouse, data repository, or other reporting application </li></ul>
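The three tasks on this slide can be sketched end to end in a few lines of Python. This is only an illustration under stated assumptions: the source rows, the column names, and the `customer_fact` target table are hypothetical, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3

# Hypothetical source rows, as they might arrive from an operational system.
SOURCE_ROWS = [
    {"id": 1, "name": "Asha", "gender": "Female", "amount": 250.0, "notes": "x"},
    {"id": 2, "name": "Ravi", "gender": "Male", "amount": None, "notes": "y"},
    {"id": None, "name": None, "gender": None, "amount": 10.0, "notes": "z"},
]

GENDER_CODES = {"Male": "M", "Female": "F"}

def extract():
    """Extract: return a copy of the rows from the (in-memory) source."""
    return [dict(row) for row in SOURCE_ROWS]

def transform(rows):
    """Transform: validate, clean, and filter the extracted rows."""
    cleaned = []
    for row in rows:
        # Validation: reject the row if its first three columns are all empty.
        if all(row[col] is None for col in ("id", "name", "gender")):
            continue
        cleaned.append(
            {
                "id": row["id"],
                "name": row["name"],
                # Cleaning: map "Male"/"Female" to "M"/"F".
                "gender": GENDER_CODES.get(row["gender"], row["gender"]),
                # Cleaning: map NULL amounts to 0.
                "amount": row["amount"] if row["amount"] is not None else 0.0,
                # Filtering: the "notes" column is deliberately not carried over.
            }
        )
    return cleaned

def load(conn, rows):
    """Load: write the transformed rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customer_fact"
        " (id INTEGER, name TEXT, gender TEXT, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO customer_fact VALUES (:id, :name, :gender, :amount)", rows
    )
    conn.commit()

conn = sqlite3.connect(":memory:")
load(conn, transform(extract()))
loaded = conn.execute(
    "SELECT id, name, gender, amount FROM customer_fact ORDER BY id"
).fetchall()
print(loaded)
```

Keeping extract, transform, and load as separate functions mirrors the slide's task list and makes each stage testable on its own.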
    28. 29. ETL Tools: Informatica PowerCenter; IBM WebSphere DataStage (formerly known as Ascential DataStage); SAP BusinessObjects Data Integrator; IBM Cognos Data Manager (formerly known as Cognos DecisionStream); Microsoft SQL Server Integration Services; Oracle Data Integrator (formerly known as Sunopsis Data Conductor); SAS Data Integration Studio; Oracle Warehouse Builder; Ab Initio; Information Builders Data Migrator; Pentaho Data Integration; Embarcadero Technologies DT/Studio; IKAN ETL4ALL; IBM DB2 Warehouse Edition; Pervasive Data Integrator; ETL Solutions Ltd. Transformation Manager; Group 1 Software (Sagent) DataFlow; Sybase Data Integrated Suite ETL; Talend Open Studio; Expressor Software Expressor Semantic Data Integration System; Elixir Repertoire; OpenSys CloverETL
    29. 30. OLTP – Online Transaction Processing <ul><li>Facilitates and manages transaction-oriented applications in a business or commercial context </li></ul><ul><li>E.g., ATMs, electronic banking, order processing, employee time clock systems, e-commerce and many more </li></ul><ul><li>Advantages: simplicity, efficiency and speed </li></ul><ul><li>Disadvantages: security and reliability concerns, and susceptibility to direct attack </li></ul>
    30. 31. OLAP – Online Analytical Processing <ul><li>Generally synonymous with terms such as Decision Support, Business Intelligence, and Executive Information System </li></ul><ul><li>OLAP is: </li></ul><ul><li>Fast </li></ul><ul><li>Analysis </li></ul><ul><li>Shared </li></ul><ul><li>Multidimensional </li></ul><ul><li>A powerful visualization paradigm </li></ul>
    31. 32. OLTP vs. OLAP
    32. 33. Example: Finding the invoice/bill amount for a specific customer, based on the CAF Number or MDN, is a lookup against the transactional system, which here is ADC. Finding the number of customers whose invoice/bill was greater than Rs. 1000.00 over the past three months calls for an OLAP system, which here is the DSS.
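The two questions in this example can be contrasted with a short Python sketch over an in-memory SQLite table. The `bill` table, the MDN values, and the months are invented for illustration, and "greater than Rs. 1000.00 for the past three months" is read here as "above the threshold in each of the three months", which is an assumption about the slide's intent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bill (mdn TEXT, bill_month TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO bill VALUES (?, ?, ?)",
    [
        ("9000000001", "2009-04", 1200.0),
        ("9000000001", "2009-05", 1500.0),
        ("9000000001", "2009-06", 1100.0),
        ("9000000002", "2009-04", 800.0),
        ("9000000002", "2009-05", 1300.0),
        ("9000000002", "2009-06", 950.0),
        ("9000000003", "2009-04", 2000.0),
        ("9000000003", "2009-05", 1050.0),
        ("9000000003", "2009-06", 1001.0),
    ],
)

def bill_amount(conn, mdn, month):
    """OLTP-style point lookup: one customer, one bill."""
    row = conn.execute(
        "SELECT amount FROM bill WHERE mdn = ? AND bill_month = ?", (mdn, month)
    ).fetchone()
    return row[0] if row else None

def customers_above(conn, threshold, months):
    """OLAP-style aggregate: customers above the threshold in every given month.

    Assumes at most one bill per customer per month.
    """
    placeholders = ", ".join("?" for _ in months)
    query = (
        "SELECT COUNT(*) FROM ("
        " SELECT mdn FROM bill"
        f" WHERE bill_month IN ({placeholders})"
        " GROUP BY mdn"
        " HAVING MIN(amount) > ? AND COUNT(*) = ?"
        ")"
    )
    return conn.execute(query, (*months, threshold, len(months))).fetchone()[0]

single_bill = bill_amount(conn, "9000000001", "2009-05")
heavy_users = customers_above(conn, 1000.0, ["2009-04", "2009-05", "2009-06"])
print(single_bill, heavy_users)
```

The first query touches one row through a key; the second scans and aggregates many rows, which is the workload a DSS is optimized for.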
    33. 34. Data Warehouse for Decision Support <ul><li>Using information technology to help the organization make faster and better decisions </li></ul><ul><ul><li>Which of my customers are most likely to go to the competition? </li></ul></ul><ul><ul><li>Which product promotions have the biggest impact on revenue? </li></ul></ul><ul><ul><li>How did the share price of software companies correlate with profits over the last 10 years? </li></ul></ul>
    34. 35. DSS – Decision Support System <ul><li>An interactive computer-based system </li></ul><ul><li>Used to manage and control business </li></ul><ul><li>Data is historical or point-in-time </li></ul><ul><li>Optimized for inquiry rather than update </li></ul><ul><li>Use of the system is loosely defined and can be ad hoc </li></ul><ul><li>Used to understand the business and make judgments </li></ul>
    35. 37. DSS Development Process <ul><li>Requirement Analysis: understand </li></ul><ul><ul><li>User requirements </li></ul></ul><ul><ul><li>Business process </li></ul></ul><ul><ul><li>Key Result Areas to be analysed in the report </li></ul></ul><ul><ul><li>Source system on which the report is to be built </li></ul></ul><ul><ul><li>Agree on the business logic and the timeline for implementing reports in a phased manner </li></ul></ul><ul><li>Application Development: develop </li></ul><ul><ul><li>Logical and physical data model </li></ul></ul><ul><ul><li>Programs </li></ul></ul><ul><ul><li>Database to suit the business need </li></ul></ul><ul><ul><li>Multiple programs are required to develop the database; this involves integrating the programs in an optimized manner </li></ul></ul><ul><li>Exhaustive Testing: testing involves </li></ul><ul><ul><li>Data validation with reference to the source system and the business rules agreed upon with users </li></ul></ul><ul><ul><li>User acceptance </li></ul></ul><ul><ul><li>This can be an iterative process until final acceptance by the user </li></ul></ul><ul><li>Quality Assurance: QA ensures that </li></ul><ul><ul><li>Application development is in accordance with the development process defined at DSS </li></ul></ul><ul><ul><li>Reports are delivered in a consistent manner </li></ul></ul><ul><li>Release Report: </li></ul><ul><ul><li>Release indicates that the report has been put into production </li></ul></ul><ul><ul><li>The necessary user guide and training are given to users to facilitate use of the reports </li></ul></ul><ul><ul><li>User IDs are created and access rights for the reports are assigned </li></ul></ul>
    36. 38. Application Areas
        Industry | Application
        Finance | Credit card analysis
        Insurance | Claims, fraud analysis
        Telecommunication | Call record analysis
        Transport | Logistics management
        Consumer goods | Promotion analysis
        Data service providers | Value-added data
        Utilities | Power usage analysis
    37. 39. Benefits of DSS <ul><li>Improving Personal Efficiency </li></ul><ul><li>Expediting Problem Solving </li></ul><ul><li>Facilitating Interpersonal Communications </li></ul><ul><li>Promoting Learning or Training </li></ul><ul><li>Increasing Organizational Control </li></ul>
    38. 40. Need for DSS at Different Levels in an Organization
    39. 41. Case Study Telecom Industry
    40. 42. DSS Data warehouse Architecture
    41. 43. Component Details <ul><li>Source systems that the DSS accesses or receives feeds from. </li></ul><ul><li>-- ADC (Billing), Clarify (Customer Master Data), Interconnect (for CDRs) </li></ul><ul><li>ETL box on which DataStage is installed. </li></ul><ul><li>-- Stores the in-process temporary files. </li></ul><ul><li>Repository databases, i.e., the PRODDSS and PRODBILL databases. </li></ul><ul><li>-- All Business Objects reports are taken from both of these servers. </li></ul><ul><li>For SAP BIW applications there are 2 boxes. </li></ul><ul><li>-- One box is the server for SAP BIW and </li></ul><ul><li>-- the other is the application box for SAP BIW. </li></ul><ul><li>All BIW reports are taken from these boxes. </li></ul><ul><li>The data is segregated from the servers using a SAN box. </li></ul>
    42. 44. Users Involved <ul><li>COO/CIO/CEO </li></ul><ul><li>Customer Support Executive </li></ul><ul><li>Revenue Assurance Manager </li></ul><ul><li>Sales Manager </li></ul><ul><li>Account Manager </li></ul><ul><li>Circle Head </li></ul><ul><li>Service Assurance Manager, etc. </li></ul>
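    43. 45. Sample Reports Delivery: Circle Refund Pendency Report
        Circle      | JAN | FEB | MAR | APR | MAY  | JUNE | Total refund pendency
        AP          |   1 |   - |   - |   - |  209 |   90 |  300
        DL          |   4 |   - |   2 |   - |  112 |   23 |  141
        GJ          |   - |   - |   - |   - |  411 |  123 |  534
        KA          |   1 |   1 |   6 |   - |   84 |   27 |  119
        KL          |   - |   - |   - |   - |   31 |   10 |   41
        MH          |   1 |   - |   6 |  10 |   53 |   28 |   98
        MP          |  12 |  13 |  53 |   - |   82 |    - |  160
        MU          |   - |   - |   - |   - |  150 |   61 |  211
        PB          |   - |  20 |   8 |  16 |   52 |    2 |   98
        RJ          |   - |   - |   - |   2 |    9 |    4 |   15
        TN          |   1 |   2 |   2 |  13 |  153 |   75 |  246
        UP          |   - |   5 |   5 |   1 |   58 |    9 |   78
        WB          |   - |   - |   1 |   - |   97 |   64 |  162
        Grand Total |  20 |  41 |  83 |  42 | 1501 |  516 | 2203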
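A report of this shape is a pivot of detail records by circle and month. The sketch below shows one way to build it in Python; the long-form `records` layout is an assumption about how the underlying data might be stored, and only the AP and DL figures from the report are used.

```python
from collections import defaultdict

MONTHS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUNE"]

# (circle, month, pending refunds); values copied from the AP and DL rows of the report.
records = [
    ("AP", "JAN", 1), ("AP", "MAY", 209), ("AP", "JUNE", 90),
    ("DL", "JAN", 4), ("DL", "MAR", 2), ("DL", "MAY", 112), ("DL", "JUNE", 23),
]

def pivot(records):
    """Aggregate long-form records into one row per circle with a Total column."""
    table = defaultdict(lambda: dict.fromkeys(MONTHS, 0))
    for circle, month, count in records:
        table[circle][month] += count
    return {
        circle: {**by_month, "Total": sum(by_month.values())}
        for circle, by_month in table.items()
    }

report = pivot(records)
for circle, row in report.items():
    # Months with no pending refunds are shown as "-", as in the report.
    print(circle, [row[m] or "-" for m in MONTHS], row["Total"])
```

The row totals computed this way agree with the Total column of the report for these two circles.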
    44. 46. Thank You
    45. 47. Generic two-level architecture: periodic extraction, so data in the warehouse is not completely current
    46. 48. Independent Data Mart: separate ETL for each independent data mart; data access is complex because of the multiple data marts
    47. 49. Dependent data mart with operational data store: single ETL for the enterprise data warehouse (EDW); dependent data marts are loaded from the EDW
    48. 50. Logical data mart and @active data warehouse: near real-time ETL for the @active data warehouse; data marts are NOT separate databases but logical views of the data warehouse, so new data marts are easier to create; the ODS and the data warehouse are one and the same
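The idea that a logical data mart is a view rather than a separate database can be sketched with SQLite views from Python. The `warehouse_sales` table, the `west_sales_mart` view, and the data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE warehouse_sales (region TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO warehouse_sales VALUES (?, ?, ?)",
    [("West", "Router", 100.0), ("West", "Modem", 50.0), ("North", "Router", 70.0)],
)

# The "data mart" is only a view: no data is copied out of the warehouse.
conn.execute(
    "CREATE VIEW west_sales_mart AS"
    " SELECT product, SUM(amount) AS total"
    " FROM warehouse_sales WHERE region = 'West'"
    " GROUP BY product"
)

def mart_rows(conn):
    """Read the logical data mart; results always reflect the warehouse table."""
    return conn.execute(
        "SELECT product, total FROM west_sales_mart ORDER BY product"
    ).fetchall()

before = mart_rows(conn)
# A new warehouse row is visible through the view immediately, with no extra ETL.
conn.execute("INSERT INTO warehouse_sales VALUES ('West', 'Router', 25.0)")
after = mart_rows(conn)
print(before)
print(after)
```

Creating another mart is just another `CREATE VIEW`, which is why the slide notes that new data marts are easier to create in this architecture.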