
Datawarehousing & DSS

MIM-Sem IV @JBIMS

Published in: Education, Technology, Business
  • The data model is a diagram that represents the entities in the database and their relationships. An entity is a person, place, thing, or event about which information is maintained. A record generally describes an entity. An attribute is a particular characteristic or quality of a particular entity. The primary key is a field that uniquely identifies a record. Secondary keys are other fields that carry some identifying information but typically do not identify a record with complete accuracy.
  • Transcript of "Datawarehousing & DSS"

    1. 1. Presented By: Mahesh Choudhari - 10, Deepali Raut - 54, Rohit Muslonkar - 30, Prajakta Mali - 25, Sheetal Sonawane - 49
    2. 2. Database Approach and Design <ul><li>A DBMS minimizes the following problems:</li></ul><ul><ul><li>Data redundancy</li></ul></ul><ul><ul><li>Data isolation</li></ul></ul><ul><ul><li>Data inconsistency</li></ul></ul><ul><li>A database is designed by means of:</li></ul><ul><ul><li>Tables and constraints</li></ul></ul><ul><ul><li>Data dictionary</li></ul></ul><ul><ul><li>Normalization</li></ul></ul><ul><ul><li>Indexing</li></ul></ul>
    3. 3. Database Table
    4. 4. Definition – Data Dictionary <ul><li>It is an integral part of a database that holds the metadata, i.e. data about data</li></ul><ul><li>Advantages of a data dictionary:</li></ul><ul><ul><li>Helps create an informative and well-designed database</li></ul></ul><ul><ul><li>Identifies table structures and types</li></ul></ul>
    5. 5. Data Dictionary
    6. 6. What is Normalization? <ul><li>Normalization is a method for analyzing and reducing a relational database to its most streamlined form </li></ul><ul><ul><li>Minimum redundancy </li></ul></ul><ul><ul><li>Maximum data integrity </li></ul></ul><ul><ul><li>Best processing performance </li></ul></ul>
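The idea of reducing redundancy can be illustrated with a minimal sketch. The rows, column names, and the `normalize` helper below are hypothetical and not taken from the slides; the sketch only shows customer details that repeat on every order being moved into their own table.

```python
# Hypothetical non-normalized rows: customer details repeat on every order.
orders_flat = [
    {"order_id": 1, "customer_id": 10, "customer_name": "Asha", "city": "Mumbai", "amount": 500},
    {"order_id": 2, "customer_id": 10, "customer_name": "Asha", "city": "Mumbai", "amount": 1200},
    {"order_id": 3, "customer_id": 11, "customer_name": "Ravi", "city": "Pune", "amount": 800},
]


def normalize(rows):
    """Split flat rows into a customers table and an orders table."""
    customers = {}  # keyed by customer_id, so each customer is stored once
    orders = []     # each order keeps only a reference to its customer
    for row in rows:
        customers[row["customer_id"]] = {
            "customer_name": row["customer_name"],
            "city": row["city"],
        }
        orders.append({
            "order_id": row["order_id"],
            "customer_id": row["customer_id"],
            "amount": row["amount"],
        })
    return customers, orders


customers, orders = normalize(orders_flat)
```

After the split, a change to a customer's city is made in one place instead of on every order row, which is the integrity benefit the slide refers to.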
    7. 7. Non-Normalized Relation
    8. 8. Normalizing the Database
    9. 9. What is Indexing? <ul><li>An index is a database object used to improve the speed of data retrieval operations</li></ul><ul><li>Indexes can be created on one or more columns of a database table that are frequently used together</li></ul><ul><li>They provide the basis for rapid random lookups and efficient access to ordered records</li></ul><ul><li>Function-based indexes allow case-insensitive search, i.e. matching regardless of upper or lower case</li></ul>
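A small sketch of both kinds of index, using Python's built-in `sqlite3` module as a stand-in for whichever DBMS is actually in use. The `customers` table, its columns, and the index names are assumptions made for illustration; SQLite calls a function-based index an "index on an expression".

```python
import sqlite3


def find_customer_case_insensitive(name):
    """Look up customers by name, ignoring upper/lower case."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
    conn.executemany(
        "INSERT INTO customers (name, city) VALUES (?, ?)",
        [("Asha", "Mumbai"), ("ravi", "Pune"), ("RAVI", "Nagpur")],
    )
    # Composite index on two columns assumed to be queried together.
    conn.execute("CREATE INDEX idx_customers_city_name ON customers (city, name)")
    # Expression (function-based) index on the upper-cased name.
    conn.execute("CREATE INDEX idx_customers_upper_name ON customers (upper(name))")
    rows = conn.execute(
        "SELECT id, name, city FROM customers WHERE upper(name) = upper(?) ORDER BY id",
        (name,),
    ).fetchall()
    conn.close()
    return rows


matches = find_customer_case_insensitive("Ravi")
```

Whether the query planner actually uses an index depends on the database engine and the data; the sketch only shows how such indexes are declared and queried.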
    10. 11. Why a data warehouse? <ul><ul><li>Data - scattered, different versions, subtle differences </li></ul></ul><ul><ul><li>Poor data documentation </li></ul></ul><ul><ul><li>Requires Data transformation </li></ul></ul><ul><li>Traditional data management approach is query driven, i.e., lazy and on-demand </li></ul>
    11. 12. Why a data warehouse? (cont’d) <ul><li>Query driven approach has its problems </li></ul><ul><ul><li>Delay in query processing </li></ul></ul><ul><ul><ul><li>Unavailability of a data source </li></ul></ul></ul><ul><ul><ul><li>Need to filter and integrate results </li></ul></ul></ul><ul><ul><li>Frequent queries are usually inefficient and expensive </li></ul></ul><ul><ul><ul><li>Difficult to implement caching </li></ul></ul></ul><ul><ul><ul><li>Lack of standards </li></ul></ul></ul><ul><ul><li>Need to compete with local processing resources </li></ul></ul>
    12. 13. Data Warehouse Definition <ul><li>Subject-oriented </li></ul><ul><li>Integrated </li></ul><ul><li>Time-variant </li></ul><ul><li>Non-volatile collection of data </li></ul>
    13. 14. Data Warehouse Definition… <ul><li>Subject-Oriented: </li></ul><ul><ul><li>The data warehouse is organized around the key subjects (or high-level entities) of the enterprise. Major subjects include </li></ul></ul><ul><ul><ul><li>Customers </li></ul></ul></ul><ul><ul><ul><li>Suppliers </li></ul></ul></ul><ul><ul><ul><li>Revenues </li></ul></ul></ul><ul><ul><ul><li>Products, etc. </li></ul></ul></ul>
    14. 15. Data Warehouse Definition… <ul><li>Integrated </li></ul><ul><ul><li>The data housed in the data warehouse are defined using consistent </li></ul></ul><ul><ul><ul><li>Naming conventions </li></ul></ul></ul><ul><ul><ul><li>Formats </li></ul></ul></ul><ul><ul><ul><li>Encoding Structures </li></ul></ul></ul><ul><ul><ul><li>Related Characteristics </li></ul></ul></ul>
    15. 16. Data Warehouse Definition… <ul><li>Time-variant </li></ul><ul><ul><li>The data in the warehouse contain a time dimension so that they may be used as a historical record of the business </li></ul></ul><ul><li>Non-volatile </li></ul><ul><ul><li>Data in the data warehouse are loaded and refreshed from operational systems, but cannot be updated by end-users </li></ul></ul>
    16. 17. The Data Warehouse advantage <ul><li>Semantic reconciliation </li></ul><ul><ul><li>Data sources are distributed in many businesses </li></ul></ul><ul><ul><li>Different encoding of the same entities </li></ul></ul><ul><ul><li>A warehouse encompasses the full volume of data in a single unified schema </li></ul></ul><ul><li>Performance </li></ul><ul><ul><li>Managers need different views of the same data </li></ul></ul><ul><ul><li>Efficiently supports OLAP operations </li></ul></ul>
    17. 18. The data warehouse advantage (cont’d...) <ul><li>Improves data quality </li></ul><ul><ul><li>Data from a source usually needs “cleaning” </li></ul></ul><ul><ul><li>The warehouse acts as a “cleaning buffer” </li></ul></ul><ul><ul><li>Thus, minimizes data error </li></ul></ul><ul><li>There is clear ROI (Return on Investment) for organizations implementing a data warehouse </li></ul><ul><ul><li>Quick and easy access to data </li></ul></ul><ul><ul><li>Extensive analysis of data for Decision making </li></ul></ul><ul><ul><li>Consolidated view of organizational data </li></ul></ul>
    18. 19. Evolution of Data warehouse
    19. 20. What is a Data Warehouse? A Practitioner's Viewpoint <ul><li>A single, complete, and consistent store of data obtained from a variety of sources and made available to end users in a way they can understand and use in a business context </li></ul>
    20. 21. Data Warehouse Architecture <ul><li>Source Systems</li></ul><ul><ul><li>Execution Systems: CRM, ERP, Legacy, e-Commerce</li></ul></ul><ul><ul><li>External Data: Purchased Market Data, Spreadsheets</li></ul></ul><ul><ul><li>Sample Technologies: PeopleSoft, SAP, Siebel, Oracle Applications, Manugistics, Custom Systems</li></ul></ul><ul><li>Extract, Transformation, and Load (ETL) Layer</li></ul><ul><ul><li>Tasks: Cleanse Data, Filter Records, Standardize Values, Decode Values, Apply Business Rules, Householding, Dedupe Records, Merge Records</li></ul></ul><ul><ul><li>ETL Tools: Informatica PowerMart, Ab Initio, Data Stage, Oracle Warehouse Builder, Custom programs, SQL scripts</li></ul></ul><ul><li>Data and Metadata Repository Layer</li></ul><ul><ul><li>Enterprise Data Warehouse, Metadata Repository</li></ul></ul><ul><ul><li>Sample Technologies: Oracle, SQL Server, Teradata, DB2</li></ul></ul><ul><li>Presentation Layer</li></ul><ul><ul><li>Reporting Tools, OLAP Tools, Ad Hoc Query Tools, Data Mining Tools</li></ul></ul><ul><ul><li>Sample Technologies: Custom Tools, HTML Reports, Cognos, Business Objects, MicroStrategy, Oracle Discoverer, Brio, Data Mining Tools, Portals</li></ul></ul>
    21. 22. Typical Data Warehouse Architecture
    22. 23. DW Architecture <ul><li>Generic Two-Level Architecture </li></ul><ul><li>Independent Data Mart </li></ul><ul><li>Dependent Data Mart and Operational Data Store </li></ul><ul><li>Logical Data Mart and active Warehouse </li></ul>
    23. 24. Tools used in Data Warehousing
<table>
<tr><th>Component</th><th>Product used</th><th>Purpose</th></tr>
<tr><td>Reporting</td><td>Crystal Reports</td><td>Create presentation-style reports with charts and graphs</td></tr>
<tr><td>Querying</td><td>Access 2000</td><td>Create complex ad-hoc queries against a variety of data sources</td></tr>
<tr><td>OLAP</td><td>Crystal Analysis Professional</td><td>Access data cubes for designing views to pivot, filter and aggregate facts on pre-defined dimensions for specific subject areas</td></tr>
<tr><td>Data Mining/Statistical Analysis</td><td>SAS</td><td>Statistical analysis and churn analysis</td></tr>
</table>
    24. 25. Components of Data warehouse <ul><li>Operational Source System </li></ul><ul><li>Data Staging Area </li></ul><ul><li>-- Services: clean, combine and standardize </li></ul><ul><li>-- Data store: flat files and relational tables </li></ul><ul><li>-- Processing: sorting and sequential processing </li></ul><ul><li>Data Presentation Area </li></ul><ul><li>-- Data marts: data divided into blocks according to requirement or application area </li></ul><ul><li>Data Access Tools </li></ul>
    25. 26. ETL – Extract, Transform and Load <ul><li>Extract, Transform and Load (ETL) is a process that involves extracting data from multiple sources in various formats, transforming it to fit business needs, and ultimately loading it into a target system. </li></ul><ul><li>The target system will generally be configured as a data warehouse or data mart, though ETL can refer to a process that loads to any type of data storage structure. </li></ul><ul><li>The structure itself will typically be a database, but may also be an application, file or other storage facility. </li></ul><ul><li>The purpose of ETL is to reformat, cleanse and standardize data so that it can be analyzed or exchanged to address business needs and/or promote interoperability. </li></ul><ul><li>Note that ETT (extraction, transformation, transportation) and ETM (extraction, transformation, move) are sometimes used synonymously with ETL; ELT (extraction, load, transform) is a related variant that loads the data before transforming it. </li></ul>
    26. 27. ETL Data Flow…
    27. 28. ETL process <ul><li>Stands for Extract, Transform and Load </li></ul><ul><li>It is the process in data warehousing responsible for pulling data out of the source systems and placing it into a data warehouse. </li></ul><ul><li>Involves the following tasks: </li></ul><ul><li>1. Extracting the data from source systems (SAP, ERP, other operational systems). Data from the different source systems is converted into one consolidated data warehouse format that is ready for transformation processing. </li></ul><ul><li>2. Transforming the data: </li></ul><ul><ul><li>applying business rules (such as derivations, calculating new measures and dimensions), </li></ul></ul><ul><ul><li>cleaning (e.g., mapping NULL to 0, or "Male" to "M" and "Female" to "F"), </li></ul></ul><ul><ul><li>filtering (e.g., selecting only certain columns to load), </li></ul></ul><ul><ul><li>splitting a column into multiple columns and vice versa, </li></ul></ul><ul><ul><li>joining together data from multiple sources (e.g., lookup, merge), </li></ul></ul><ul><ul><li>transposing rows and columns, </li></ul></ul><ul><ul><li>applying any kind of simple or complex data validation (e.g., if the first 3 columns in a row are empty, reject the row from processing) </li></ul></ul><ul><li>3. Loading the data into a data warehouse, data repository or other reporting application </li></ul>
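The three tasks above can be sketched in a few lines of plain Python. This is a toy pipeline under stated assumptions, not how any particular ETL tool works: the source rows, column names, and the in-memory `warehouse` list are hypothetical, and only the cleaning, filtering, and validation rules are the ones named on the slide.

```python
GENDER_CODES = {"Male": "M", "Female": "F"}      # cleaning rule from the slide
KEEP_COLUMNS = ("id", "name", "gender", "amount")  # filtering: columns to load


def extract():
    """Stand-in for pulling rows from a source system (hypothetical data)."""
    return [
        {"id": 1, "name": "Asha", "gender": "Female", "amount": None, "notes": "x"},
        {"id": 2, "name": "Ravi", "gender": "Male", "amount": 250, "notes": "y"},
        {"id": None, "name": None, "gender": None, "amount": 99, "notes": "z"},
    ]


def transform(rows):
    """Apply validation, filtering, and cleaning to the extracted rows."""
    cleaned = []
    for row in rows:
        values = list(row.values())
        # Validation: reject the row if its first three columns are empty.
        if all(v in (None, "") for v in values[:3]):
            continue
        # Filtering: keep only the columns selected for loading.
        out = {col: row[col] for col in KEEP_COLUMNS}
        # Cleaning: map NULL to 0 and spell-out genders to one-letter codes.
        out["amount"] = 0 if out["amount"] is None else out["amount"]
        out["gender"] = GENDER_CODES.get(out["gender"], out["gender"])
        cleaned.append(out)
    return cleaned


def load(rows, target):
    """Append transformed rows to the target store and report the count."""
    target.extend(rows)
    return len(rows)


warehouse = []  # in-memory list standing in for the warehouse table
loaded = load(transform(extract()), warehouse)
```

Keeping the three stages as separate functions mirrors the staging described in the slides: each stage can be tested or replaced on its own.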
    28. 29. ETL Tools <ul><li>Informatica PowerCenter</li><li>IBM WebSphere DataStage (formerly known as Ascential DataStage)</li><li>SAP BusinessObjects Data Integrator</li><li>IBM Cognos Data Manager (formerly known as Cognos DecisionStream)</li><li>Microsoft SQL Server Integration Services</li><li>Oracle Data Integrator (formerly known as Sunopsis Data Conductor)</li><li>SAS Data Integration Studio</li><li>Oracle Warehouse Builder</li><li>Ab Initio</li><li>Information Builders Data Migrator</li><li>Pentaho Data Integration</li><li>Embarcadero Technologies DT/Studio</li><li>IKAN ETL4ALL</li><li>IBM DB2 Warehouse Edition</li><li>Pervasive Data Integrator</li><li>ETL Solutions Ltd. Transformation Manager</li><li>Group 1 Software (Sagent) DataFlow</li><li>Sybase Data Integrated Suite ETL</li><li>Talend Open Studio</li><li>Expressor Software Expressor Semantic Data Integration System</li><li>Elixir Repertoire</li><li>OpenSys CloverETL</li></ul>
    29. 30. OLTP – OnLine Transaction Processing <ul><li>Facilitates and manages transaction-oriented applications in a business or commercial context </li></ul><ul><li>E.g. ATMs, electronic banking, order processing, employee time clock systems, e-commerce and many more </li></ul><ul><li>Advantages: simplicity, efficiency and speed </li></ul><ul><li>Disadvantages: security and reliability concerns, and susceptibility to direct attack </li></ul>
    30. 31. OLAP – OnLine Analytical Processing <ul><li>Generally synonymous with terms such as Decision Support, Business Intelligence, Executive Information System </li></ul><ul><li>OLAP is: </li></ul><ul><li>Fast </li></ul><ul><li>Analysis </li></ul><ul><li>Shared </li></ul><ul><li>Multidimensional </li></ul><ul><li>A powerful visualization paradigm </li></ul>
    31. 32. OLTP vs. OLAP
    32. 33. Example: Finding the invoice/bill amount for a specific customer, based on CAF number or MDN, is a job for a transactional (OLTP) system, which here is ADC. Finding the number of customers whose invoice/bill exceeded Rs. 1000.00 over the past three months calls for an OLAP system, which here is DSS.
    33. 34. Data Warehouse for Decision Support <ul><li>Putting information technology to work to help the organization make faster and better decisions </li></ul><ul><ul><li>Which of my customers are most likely to go to the competition? </li></ul></ul><ul><ul><li>What product promotions have the biggest impact on revenue? </li></ul></ul><ul><ul><li>How did the share price of software companies correlate with profits over the last 10 years? </li></ul></ul>
    34. 35. DSS – Decision Support System <ul><li>An interactive computer-based system </li></ul><ul><li>Used to manage and control business </li></ul><ul><li>Data is historical or point-in-time </li></ul><ul><li>Optimized for inquiry rather than update </li></ul><ul><li>Use of the system is loosely defined and can be ad-hoc </li></ul><ul><li>Used to understand the business and make judgments </li></ul>
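The two kinds of question in this example can be contrasted with a short sketch using Python's built-in `sqlite3`. The `invoices` table, its columns, and the sample MDNs and amounts are hypothetical, and the sketch reads "greater than Rs. 1000 for the past three months" as "above the threshold in each of the three months", which is an assumption about the slide's intent. In practice the first query would run against the billing system and the second against the warehouse; one in-memory table is used here only to keep the sketch self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoices (mdn TEXT, bill_month TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO invoices VALUES (?, ?, ?)",
    [
        ("9000000001", "2010-01", 1500.0),
        ("9000000001", "2010-02", 1200.0),
        ("9000000001", "2010-03", 1100.0),
        ("9000000002", "2010-01", 400.0),
        ("9000000002", "2010-02", 2500.0),
        ("9000000002", "2010-03", 900.0),
    ],
)


def bill_for_customer(mdn, month):
    """OLTP-style point lookup: one customer's bill for one month."""
    row = conn.execute(
        "SELECT amount FROM invoices WHERE mdn = ? AND bill_month = ?",
        (mdn, month),
    ).fetchone()
    return row[0] if row else None


def customers_above(threshold, months):
    """OLAP-style aggregate: customers billed above threshold in every month."""
    placeholders = ",".join("?" for _ in months)
    query = f"""
        SELECT COUNT(*) FROM (
            SELECT mdn FROM invoices
            WHERE bill_month IN ({placeholders})
            GROUP BY mdn
            HAVING MIN(amount) > ? AND COUNT(*) = ?
        )
    """
    return conn.execute(query, (*months, threshold, len(months))).fetchone()[0]
```

The lookup touches a single row, while the aggregate scans and groups many rows, which is why the slides route the second kind of question to the warehouse rather than the billing system.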
    35. 37. DSS Development Process <ul><li>Requirement Analysis: understand </li></ul><ul><ul><li>User requirement </li></ul></ul><ul><ul><li>Business process </li></ul></ul><ul><ul><li>Key Result Areas to be analysed in the report </li></ul></ul><ul><ul><li>Source system on which the report is to be built </li></ul></ul><ul><ul><li>Agree upon the business logic and timeline for implementing reports in a phased manner </li></ul></ul><ul><li>Application Development: develop </li></ul><ul><ul><li>Logical and physical data model </li></ul></ul><ul><ul><li>Programs </li></ul></ul><ul><ul><li>Database to suit the business need </li></ul></ul><ul><ul><li>Multiple programs are required to develop the database. This involves integrating the programs in an optimized manner </li></ul></ul><ul><li>Exhaustive Testing: testing involves </li></ul><ul><ul><li>Data validation with reference to the source system and the business rules agreed upon with users </li></ul></ul><ul><ul><li>User acceptance </li></ul></ul><ul><ul><li>This can be an iterative process until final acceptance by the user </li></ul></ul><ul><li>Quality Assurance: QA ensures </li></ul><ul><ul><li>Application development is in accordance with the development process defined at DSS </li></ul></ul><ul><ul><li>Delivery of reports in a consistent manner </li></ul></ul><ul><li>Release Report </li></ul><ul><ul><li>Release indicates the report is productionised </li></ul></ul><ul><ul><li>The necessary user guide and training are given to users to facilitate use of the reports </li></ul></ul><ul><ul><li>User IDs are created and access rights for reports are assigned </li></ul></ul>
    36. 38. Application Areas
<table>
<tr><th>Industry</th><th>Application</th></tr>
<tr><td>Finance</td><td>Credit card analysis</td></tr>
<tr><td>Insurance</td><td>Claims, fraud analysis</td></tr>
<tr><td>Telecommunication</td><td>Call record analysis</td></tr>
<tr><td>Transport</td><td>Logistics management</td></tr>
<tr><td>Consumer goods</td><td>Promotion analysis</td></tr>
<tr><td>Data service providers</td><td>Value added data</td></tr>
<tr><td>Utilities</td><td>Power usage analysis</td></tr>
</table>
    37. 39. Benefits of DSS <ul><li>Improving Personal Efficiency </li></ul><ul><li>Expediting Problem Solving </li></ul><ul><li>Facilitating Interpersonal Communications </li></ul><ul><li>Promoting Learning or Training </li></ul><ul><li>Increasing Organizational Control </li></ul>
    38. 40. Need for DSS at Different Levels in an Organization
    39. 41. Case Study Telecom Industry
    40. 42. DSS Data warehouse Architecture
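    41. 43. Component Details <ul><li>Source systems which DSS accesses or gets feeds from. </li></ul><ul><li>-- ADC (Billing), Clarify (Customer Master Data), Interconnect (for CDRs) </li></ul><ul><li>ETL box on which DataStage is installed. </li></ul><ul><li>-- Used to store the in-process temporary files. </li></ul><ul><li>Repository databases, i.e. the PRODDSS and PRODBILL databases </li></ul><ul><li>-- All Business Objects reports are taken from both of these servers. </li></ul><ul><li>For SAP BIW applications there are 2 boxes. </li></ul><ul><li>-- One box is the server for SAP BIW and </li></ul><ul><li>-- the other is the application box for SAP BIW. </li></ul><ul><li>All BIW reports are taken from these boxes. </li></ul><ul><li>The data is segregated from the servers using a SAN box. </li></ul>
    42. 44. Users Involved <ul><li>COO/CIO/CEO </li></ul><ul><li>Customer Support Executive </li></ul><ul><li>Revenue Assurance Manager </li></ul><ul><li>Sales Manager </li></ul><ul><li>Account Manager </li></ul><ul><li>Circle Head </li></ul><ul><li>Service Assurance Manager, etc. </li></ul>
    43. 45. Sample Reports: Refund Pendency Report
<table>
<tr><th>Delivery Circle</th><th>JAN</th><th>FEB</th><th>MAR</th><th>APR</th><th>MAY</th><th>JUNE</th><th>Total refund pendency</th></tr>
<tr><td>AP</td><td>1</td><td>-</td><td>-</td><td>-</td><td>209</td><td>90</td><td>300</td></tr>
<tr><td>DL</td><td>4</td><td>-</td><td>2</td><td>-</td><td>112</td><td>23</td><td>141</td></tr>
<tr><td>GJ</td><td>-</td><td>-</td><td>-</td><td>-</td><td>411</td><td>123</td><td>534</td></tr>
<tr><td>KA</td><td>1</td><td>1</td><td>6</td><td>-</td><td>84</td><td>27</td><td>119</td></tr>
<tr><td>KL</td><td>-</td><td>-</td><td>-</td><td>-</td><td>31</td><td>10</td><td>41</td></tr>
<tr><td>MH</td><td>1</td><td>-</td><td>6</td><td>10</td><td>53</td><td>28</td><td>98</td></tr>
<tr><td>MP</td><td>12</td><td>13</td><td>53</td><td>-</td><td>82</td><td>-</td><td>160</td></tr>
<tr><td>MU</td><td>-</td><td>-</td><td>-</td><td>-</td><td>150</td><td>61</td><td>211</td></tr>
<tr><td>PB</td><td>-</td><td>20</td><td>8</td><td>16</td><td>52</td><td>2</td><td>98</td></tr>
<tr><td>RJ</td><td>-</td><td>-</td><td>-</td><td>2</td><td>9</td><td>4</td><td>15</td></tr>
<tr><td>TN</td><td>1</td><td>2</td><td>2</td><td>13</td><td>153</td><td>75</td><td>246</td></tr>
<tr><td>UP</td><td>-</td><td>5</td><td>5</td><td>1</td><td>58</td><td>9</td><td>78</td></tr>
<tr><td>WB</td><td>-</td><td>-</td><td>1</td><td>-</td><td>97</td><td>64</td><td>162</td></tr>
<tr><td>Grand Total</td><td>20</td><td>41</td><td>83</td><td>42</td><td>1501</td><td>516</td><td>2203</td></tr>
</table>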
    44. 46. Thank You
    45. 47. Generic two-level architecture: periodic extraction through ETL, so data in the warehouse is not completely current
    46. 48. Independent Data Mart: separate ETL for each independent data mart; data access complexity due to multiple data marts
    47. 49. Dependent data mart with operational data store: single ETL for the enterprise data warehouse (EDW); dependent data marts loaded from the EDW
    48. 50. Logical data mart and @active data warehouse: near real-time ETL for the @active data warehouse; data marts are NOT separate databases but logical views of the data warehouse, which makes new data marts easier to create; the ODS and data warehouse are one and the same
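The point that a logical data mart is a view rather than a separate database can be sketched with Python's built-in `sqlite3`. The table name `warehouse_refunds`, the view name `mart_mh_refunds`, and the column names are hypothetical; the figures are a few cells from the refund pendency sample report, except the final inserted row, which is made up to show the view updating.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Single warehouse table (hypothetical name and columns).
conn.execute("CREATE TABLE warehouse_refunds (circle TEXT, month TEXT, pending INTEGER)")
conn.executemany(
    "INSERT INTO warehouse_refunds VALUES (?, ?, ?)",
    [("MH", "JAN", 1), ("MH", "MAR", 6), ("DL", "JAN", 4), ("DL", "MAR", 2)],
)
# The "data mart" is only a view over the warehouse: no data is copied.
conn.execute(
    "CREATE VIEW mart_mh_refunds AS "
    "SELECT month, pending FROM warehouse_refunds WHERE circle = 'MH'"
)


def mart_total():
    """Total pending refunds as seen through the logical data mart."""
    return conn.execute("SELECT SUM(pending) FROM mart_mh_refunds").fetchone()[0]


total_before = mart_total()
# A new warehouse row (made-up value) is visible through the view at once.
conn.execute("INSERT INTO warehouse_refunds VALUES ('MH', 'APR', 10)")
total_after = mart_total()
```

Because the view holds no data of its own, creating another mart is one more `CREATE VIEW` statement and needs no separate ETL, which matches the "easier to create new data marts" point on the slide.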
