Csci12 report aug18

Data Warehousing and related components

Published in: Technology

  1. GROUP 5 Master Jenniferson Napallatan Neri Openiano Ostil
  2. TOPICS <ul><li>Index</li><li>Indexed Sequential File</li><li>Properties of Indexed Sequential File</li><li>Data Warehousing</li></ul>
  3. Index <ul><li>Indexes provide fast searching of a table based on one or more key columns.</li><li>Indexes on foreign keys can also greatly improve the performance of joins.</li></ul>
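To make the index slide concrete, the sketch below uses Python's built-in `sqlite3` module with an in-memory database to compare SQLite's query plan for a lookup on a foreign-key column before and after creating an index. The tables, the column names, and the index name `idx_orders_customer_id` are hypothetical; a join on `orders.customer_id` performs this same kind of key lookup for each matching customer row.

```python
import sqlite3

def query_plan(conn, sql, params=()):
    """Return the 'detail' text of each row of SQLite's EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]  # detail is the last column

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),  -- foreign key column
        total REAL
    );
""")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ana"), (2, "Ben")])
conn.executemany("INSERT INTO orders (customer_id, total) VALUES (?, ?)",
                 [(1, 10.0), (2, 25.5), (1, 7.25)])

lookup = "SELECT total FROM orders WHERE customer_id = ?"
before = query_plan(conn, lookup, (1,))  # no index yet: full table scan

conn.execute("CREATE INDEX idx_orders_customer_id ON orders(customer_id)")
after = query_plan(conn, lookup, (1,))   # planner can now search the index

print(before)
print(after)
```

The exact wording of the plan text depends on the SQLite version, but the first plan should report a `SCAN` of `orders` and the second a `SEARCH` using the new index.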
  4. Indexed Sequential File <ul><li>A file combining properties of random-access files and sequential files.</li></ul>
  5. <ul><li>Records in indexed sequential files are stored in the order that they are written to the disk.</li><li>Records may be retrieved in sequential order, or in random order using a numeric index that represents the record number in the file.</li></ul>
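As described in the slide above, records kept in write order can be reached either sequentially or directly by record number. With fixed-length records, record number `n` starts at byte offset `n * record_size`, so random access is one seek. The sketch below assumes fixed-length records padded with NUL bytes (so records must not end in NUL themselves); the class name `RecordFile` and its methods are hypothetical, and the 1 to 8000 byte limit is the range quoted in these slides.

```python
import io

class RecordFile:
    """Hypothetical sketch: fixed-length records in write order, read by sequence or number."""

    def __init__(self, stream, record_size):
        if not 1 <= record_size <= 8000:  # size range quoted in the slides
            raise ValueError("record_size must be between 1 and 8000 bytes")
        self.stream = stream
        self.record_size = record_size

    def append(self, data: bytes) -> int:
        """Write a record at the end of the file and return its record number."""
        if len(data) > self.record_size:
            raise ValueError("record too long")
        self.stream.seek(0, io.SEEK_END)
        record_no = self.stream.tell() // self.record_size
        self.stream.write(data.ljust(self.record_size, b"\x00"))  # pad to fixed length
        return record_no

    def read(self, record_no: int) -> bytes:
        """Random access: jump straight to byte offset record_no * record_size."""
        self.stream.seek(record_no * self.record_size)
        raw = self.stream.read(self.record_size)
        if len(raw) < self.record_size:
            raise IndexError(record_no)
        return raw.rstrip(b"\x00")

    def scan(self):
        """Sequential access: yield every record in the order it was written."""
        self.stream.seek(0)
        while True:
            raw = self.stream.read(self.record_size)
            if len(raw) < self.record_size:
                break
            yield raw.rstrip(b"\x00")

f = RecordFile(io.BytesIO(), record_size=16)
numbers = [f.append(name) for name in (b"alpha", b"bravo", b"charlie")]
print(numbers)         # record numbers assigned in write order
print(f.read(2))       # random access by record number
print(list(f.scan()))  # sequential access from the beginning
```

An in-memory `io.BytesIO` stands in for a disk file here; any seekable binary file object opened in `r+b` mode would work the same way.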
  6. Properties <ul><li>1. Primary Storage Area:</li><li>Records in indexed sequential files are stored in the order that they are written to the disk. Records may be retrieved in sequential order, or in random order using a numeric index that represents the record number in the file.</li></ul>
  7. Properties <ul><li>Records are stored sequentially, originally to speed access on a tape system. In contrast, a relational database uses a query optimizer that automatically selects indexes.</li><li>The record size, specified when the file is created, may range from 1 to 8000 bytes.</li></ul>
  8. Properties <ul><li>2. Separate Indexes:</li><li>Indexed access for reading or writing data produces the desired result only if the file is actually organized as an ISAM (indexed sequential access method) file with the appropriate, previously defined keys. Access to data via the previously defined key(s) is extremely fast.</li></ul>
  9. Properties <ul><li>Multiple keys, overlapping keys, and key compression within the hash tables are supported.</li><li>A utility to define or redefine keys in existing files is provided.</li><li>Records can be deleted, although "garbage collection" (reclaiming the space of deleted records) is done by a separate utility.</li></ul>
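The separate-index properties above (keyed access only through keys defined up front, several keys per file, hash-based lookup, deletion with a separate garbage-collection step) can be sketched with Python dictionaries, which are hash tables. Everything here is hypothetical and simplified: `KeyedFile`, `find`, and `compact` are illustrative names, and overlapping keys, key compression, and key redefinition are not modelled.

```python
from collections import defaultdict

class KeyedFile:
    """Hypothetical sketch: records plus separately maintained hash indexes."""

    def __init__(self, key_fields):
        self.records = []  # record number -> dict, or None once deleted
        self.indexes = {field: defaultdict(list) for field in key_fields}

    def insert(self, record):
        record_no = len(self.records)
        self.records.append(record)
        for field, index in self.indexes.items():
            index[record[field]].append(record_no)  # a key may map to many records
        return record_no

    def find(self, field, value):
        """Keyed access: only works for fields declared as keys up front."""
        if field not in self.indexes:
            raise KeyError(f"{field!r} is not a defined key")
        return [self.records[n] for n in self.indexes[field].get(value, [])
                if self.records[n] is not None]

    def delete(self, record_no):
        """Logical delete: the slot stays allocated until compact() runs."""
        self.records[record_no] = None

    def compact(self):
        """'Garbage collection': drop deleted slots and rebuild every index."""
        live = [r for r in self.records if r is not None]
        self.records = []
        self.indexes = {field: defaultdict(list) for field in self.indexes}
        for record in live:
            self.insert(record)

f = KeyedFile(["emp_id", "dept"])
f.insert({"emp_id": 1, "dept": "HR", "name": "Ana"})
f.insert({"emp_id": 2, "dept": "IT", "name": "Ben"})
f.insert({"emp_id": 3, "dept": "IT", "name": "Cy"})
f.delete(1)
it_staff = f.find("dept", "IT")  # deleted record is skipped
slots_before = len(f.records)    # deleted slot still occupies space
f.compact()
print(it_staff, slots_before, len(f.records))
```

Keeping deletion cheap and deferring space reclamation to `compact()` mirrors the slide's point that garbage collection is a separate step.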
  10. Properties <ul><li>3. Overflow Area:</li><li>When an ISAM file is created, index nodes are fixed, and their pointers do not change during later inserts and deletes; only the contents of leaf nodes change afterwards.</li></ul>
  11. Properties <ul><li>If inserts to a leaf node exceed the node's capacity, new records are stored in overflow chains.</li><li>If there are more inserts than deletions from a table, these overflow chains can gradually become very large, which increases the time required to retrieve a record.</li></ul>
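The overflow behaviour described in the two slides above can be sketched as follows: the index (the first key of each leaf) is built once and never changes, each leaf holds at most a fixed number of records, and inserts into a full leaf go to that leaf's overflow chain. This is a simplified, hypothetical model (`StaticISAM` is an illustrative name); the `search` method returns how many records it examined to show how long chains slow retrieval.

```python
import bisect

class StaticISAM:
    """Hypothetical sketch of an ISAM-style file with a fixed index and overflow chains."""

    def __init__(self, sorted_keys, leaf_capacity):
        if not sorted_keys or leaf_capacity < 1:
            raise ValueError("need at least one key and a positive leaf capacity")
        self.capacity = leaf_capacity
        # Primary area: leaves filled in key order, built once at creation time.
        self.leaves = [list(sorted_keys[i:i + leaf_capacity])
                       for i in range(0, len(sorted_keys), leaf_capacity)]
        # Index: first key of every leaf after the first. Never modified later.
        self.separators = [leaf[0] for leaf in self.leaves[1:]]
        self.overflow = [[] for _ in self.leaves]  # one overflow chain per leaf

    def _leaf_for(self, key):
        return bisect.bisect_right(self.separators, key)

    def insert(self, key):
        n = self._leaf_for(key)
        if len(self.leaves[n]) < self.capacity:
            bisect.insort(self.leaves[n], key)  # room left in the primary leaf
        else:
            self.overflow[n].append(key)        # leaf full: append to overflow chain

    def delete(self, key):
        n = self._leaf_for(key)
        for area in (self.leaves[n], self.overflow[n]):
            if key in area:
                area.remove(key)                # frees a slot; index stays unchanged
                return True
        return False

    def search(self, key):
        """Scan the leaf, then its overflow chain; return (found, records examined)."""
        n = self._leaf_for(key)
        examined = 0
        for k in self.leaves[n] + self.overflow[n]:
            examined += 1
            if k == key:
                return True, examined
        return False, examined

isam = StaticISAM([10, 20, 30, 40, 50, 60], leaf_capacity=2)
for k in (31, 32, 33):
    isam.insert(k)       # all land in the already full leaf [30, 40]
print(isam.separators)   # index unchanged by the inserts
print(isam.search(40))   # found inside the primary leaf
print(isam.search(33))   # found only after walking the overflow chain
```

Because the separators never move, every insert that targets a full leaf lengthens the same chain, which is why a workload with more inserts than deletes degrades lookups over time.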
  12. Properties <ul><li>Indexed sequential files are commonly used for transaction files because they take less disk space than keyed files and are faster to read from beginning to end than a keyed file.</li></ul>
  13. Data Warehousing <ul><li>What is a Data Warehouse?</li><li>A data warehouse (DW) is a subject-oriented, integrated, time-variant, and nonvolatile collection of data intended to support management decision making.</li></ul>
  15. Data Warehousing <ul><li>DATABASE vs DATA WAREHOUSE</li><li>Database: transactional (relational, object-oriented, network, hierarchical)</li><li>Data Warehouse: mainly intended for decision support applications; <strong>optimized for retrieval, not routine transaction processing</strong></li></ul>
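To illustrate "optimized for retrieval", the sketch below sets up a tiny star schema in an in-memory SQLite database: one fact table of sales rows and one product dimension table. It then runs a typical decision-support query that reads many rows and aggregates them by subject attributes, rather than updating individual records. The schema, table names, and sample rows are hypothetical toy data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension table: descriptive attributes of one subject (products).
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
    -- Fact table: one row per sale, loaded in bulk and afterwards only read.
    CREATE TABLE fact_sales (
        product_key INTEGER REFERENCES dim_product(product_key),
        sale_date TEXT,
        amount REAL
    );
""")
conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)", [
    (1, "Pen", "Office"), (2, "Paper", "Office"), (3, "Mouse", "Electronics"),
])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)", [
    (1, "2024-07-01", 5.0), (2, "2024-07-01", 12.0),
    (3, "2024-07-02", 20.0), (1, "2024-08-01", 5.0),
])

# Decision-support style query: read many rows, aggregate by category and month.
report = conn.execute("""
    SELECT p.category, substr(f.sale_date, 1, 7) AS month, SUM(f.amount)
    FROM fact_sales AS f JOIN dim_product AS p USING (product_key)
    GROUP BY p.category, month
    ORDER BY p.category, month
""").fetchall()
print(report)
```

An OLTP database would mostly run short transactions that insert or update single rows; the warehouse workload is dominated by read-only aggregate queries like this one.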
  16. Data Warehousing <ul><li>What is Data Warehousing?</li><li>Combining multiple and usually varied sources into one comprehensive and easily manipulated database (wiseGEEK.com).</li></ul>
  17. Data Warehousing <ul><li>Properties:</li><li>1. Organized around major subject areas of an organization (e.g., sales, suppliers, products)</li><li>2. Integrated from multiple operational OLTP data sources (OLTP = online transaction processing databases)</li></ul>
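Integration from multiple operational sources, as listed in the properties above, usually means mapping each source's conventions onto one shared schema for a subject area. The sketch assumes two hypothetical OLTP sources for the subject "sales": a point-of-sale system that stores amounts in cents with US-style dates, and a web shop that stores amounts in currency units with ISO dates. All field names and rows are invented for illustration.

```python
from datetime import datetime

# Two hypothetical operational (OLTP) sources describing the same subject: sales.
store_pos = [  # point-of-sale system: amounts in cents, MM/DD/YYYY dates
    {"sku": "pen-01", "sold": "07/01/2024", "cents": 500},
    {"sku": "ppr-02", "sold": "07/02/2024", "cents": 1200},
]
web_shop = [   # web shop: amounts in currency units, ISO dates, upper-case codes
    {"product": "PEN-01", "date": "2024-07-03", "price": 5.0},
]

def from_pos(row):
    """Map a point-of-sale row onto the shared sales schema."""
    return {
        "product_code": row["sku"].upper(),
        "sale_date": datetime.strptime(row["sold"], "%m/%d/%Y").date().isoformat(),
        "amount": row["cents"] / 100,
        "source": "pos",
    }

def from_web(row):
    """Map a web-shop row onto the shared sales schema."""
    return {
        "product_code": row["product"].upper(),
        "sale_date": row["date"],  # already ISO 8601
        "amount": float(row["price"]),
        "source": "web",
    }

# Integrated, subject-oriented sales data: one schema, one set of conventions.
sales = [from_pos(r) for r in store_pos] + [from_web(r) for r in web_shop]
sales.sort(key=lambda r: (r["sale_date"], r["product_code"]))
for row in sales:
    print(row)
```

Keeping a `source` column is one common design choice: it lets analysts trace integrated rows back to the operational system they came from.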
  18. Data Warehousing <ul><li>Properties:</li><li>3. Periodic updates (based on schedules). The trend is toward near real-time updates for business analytics reporting.</li></ul>
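A scheduled (periodic) update is often implemented as an incremental load: each run copies only the source rows changed since the previous run, tracked by a "watermark". The sketch below is hypothetical and assumes every source row carries an ISO-8601 `updated_at` string (so string comparison orders timestamps correctly). Warehouse rows are only appended, never overwritten, which matches the nonvolatile and time-variant properties in the definition of a data warehouse; a source row that changes again is appended again as a new version.

```python
def incremental_load(source_rows, warehouse, watermark, load_time):
    """Append rows changed after `watermark` and return the new watermark.

    Hypothetical sketch: assumes ISO-8601 'updated_at' strings in every row.
    Existing warehouse rows are never modified (nonvolatile); each appended
    row records when it was loaded (time-variant).
    """
    new_rows = [r for r in source_rows if r["updated_at"] > watermark]
    for r in new_rows:
        warehouse.append({**r, "loaded_at": load_time})
    return max((r["updated_at"] for r in new_rows), default=watermark)

source = [
    {"id": 1, "updated_at": "2024-07-01T09:00"},
    {"id": 2, "updated_at": "2024-07-01T17:30"},
]
warehouse = []
wm = incremental_load(source, warehouse, "", "2024-07-01T23:00")  # first nightly run
source.append({"id": 3, "updated_at": "2024-07-02T10:15"})       # new activity
wm = incremental_load(source, warehouse, wm, "2024-07-02T23:00")  # next scheduled run
print(len(warehouse), wm)
```

Running the same function more often (for example every few minutes instead of nightly) is one simple way to move toward the near real-time reporting mentioned above.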
  19. Data Warehousing <ul><li>Advantages:</li><li>1. Competitive advantage</li><li>2. Increased productivity of corporate decision makers</li><li>3. Potentially high return on investment as the organization finds the best way to improve efficiency and/or profitability</li></ul>
  20. Data Warehousing <ul><li>Encountered Problems:</li><li>1. Underestimation of resources required to load the data</li><li>2. Hidden data integrity problems in source data</li><li>3. Omitting data later found to be required</li></ul>
  21. Data Warehousing <ul><li>Encountered Problems:</li><li>4. Ever-increasing end-user demands</li><li>5. Consolidating data from disparate data sources</li><li>6. High resource demands (huge amounts of storage; queries that process millions of rows)</li><li>7. Ownership of data</li></ul>
  22. Data Warehousing <ul><li>Encountered Problems:</li><li>8. Difficulty in determining what the business really wants or needs to analyze</li><li>9. “Big Bang” projects that seem never-ending</li></ul>
