The basic concepts of DBMS and data mining

Published in: Education, Technology


  1. DBMS. By: Bhalwinder Kaur, I.T.
  2. Data Structure vs. File Structure
     Data structure: how to arrange data in memory (RAM).
     File structure: how to arrange data on disk or other secondary storage.
     Database and database management system: users do not have to care about how data is stored in a file; the DBMS handles the details. Users can access the database with SQL (Structured Query Language), either through interactive commands or from within a program (embedded SQL).
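A minimal sketch of the "embedded SQL from a program" idea, using Python's built-in sqlite3 module; the table name, columns, and data are invented for illustration.

```python
# Embedded SQL sketch: the program issues SQL statements and the DBMS
# handles all storage details. Table/column names are illustrative.
import sqlite3

conn = sqlite3.connect(":memory:")        # an in-memory database
cur = conn.cursor()
cur.execute("CREATE TABLE student (id INTEGER, name TEXT)")
cur.execute("INSERT INTO student VALUES (?, ?)", (1, "Asha"))
conn.commit()

# The user never sees how the rows are laid out in the file.
rows = cur.execute("SELECT name FROM student WHERE id = 1").fetchall()
```

The same SELECT could be typed at an interactive SQL prompt; embedding it in a program only changes who issues the statement, not how the data is stored.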
  3. File Types
     Sequential file: a file accessed in serial order from beginning to end, e.g. audio, video, text, programs.
     Text file: a sequential file in which each logical record is a single character (ASCII: 1 byte per character; Unicode: 2 bytes per character).
  4. Sequential Files
     A sequential file is one whose contents can only be read in order; the reader must be able to detect end-of-file (EOF).
     Data can be stored in logical records sorted by a key field, which greatly increases the speed of batch updates.
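A small sketch of sequential reading with EOF detection; the record layout (one "key name" record per line) and the contents are illustrative assumptions, with io.StringIO standing in for a real file.

```python
# Reading a sequential file in order until end-of-file (EOF).
import io

data = io.StringIO("101 Adams\n102 Baker\n103 Clark\n")  # stands in for open("payroll.txt")

records = []
while True:
    line = data.readline()
    if line == "":                 # readline() returns "" at EOF
        break
    key, name = line.split()
    records.append((int(key), name))
```

Because the records are already sorted by the key field (101, 102, 103), a batch update can walk this file and an update file in a single pass.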
  5. Text Files
     The simplest file structure, extendable to more complex structures using markup languages (XHTML, HTML), which control how the file is displayed on a monitor. XML is a standard for defining markup languages.
  6. Converting data from two's complement notation into ASCII for storage in a text file
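The conversion in the figure can be sketched numerically: a value held internally as a two's complement bit pattern is turned into ASCII digit characters before being written to a text file. The 8-bit width is an assumption for the example.

```python
# Two's complement (in memory) vs. ASCII digits (in a text file).
value = -25                        # 8-bit two's complement: 11100111
twos_complement = value & 0xFF     # the in-memory bit pattern as an unsigned byte

text = str(value)                  # "-25": the characters written to the text file
ascii_codes = [ord(ch) for ch in text]   # one ASCII byte per character
```

Note the size difference: the value fits in one byte in two's complement, but takes three bytes ('-', '2', '5') as ASCII text.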
  7. A procedure for merging two sequential files
  8. Applying the merge algorithm (letters represent entire records; the particular letter indicates the value of the record's key field)
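The merge procedure from the two figures can be sketched as follows, with lists standing in for the two sorted sequential files and single letters for entire records, as in the slide.

```python
# Classic merge: look at the front record of each sorted input, copy the
# one with the smaller key to the output and advance that input; when one
# input is exhausted, copy the rest of the other.
def merge(file_a, file_b):
    out, i, j = [], 0, 0
    while i < len(file_a) and j < len(file_b):
        if file_a[i] <= file_b[j]:
            out.append(file_a[i]); i += 1
        else:
            out.append(file_b[j]); j += 1
    out.extend(file_a[i:])   # remaining records, if any
    out.extend(file_b[j:])
    return out
```

Each input is read strictly front to back, which is exactly the access pattern a sequential file supports.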
  9. XML
     Example:
       <note>
         <to>Tove</to>
         <from>Jani</from>
         <heading>Reminder</heading>
         <body>Don't forget me this weekend!</body>
       </note>
     Source: http://www.w3schools.com/xml/xml_whatis.asp
  10. Indexed File
     Index: a list of key values and the locations of their associated records.
     Figure: an index file whose entries (keys Brown, Johnson, Jones, Smith, Watson with pointers P1-P5) locate records in the data file; e.g. the pointer for Brown leads to the data item for Brown.
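A sketch of the index idea with dictionaries standing in for the index file and data file; the key-to-pointer assignments are illustrative (the figure's exact pairing did not survive extraction).

```python
# Index file: a small table of (key, pointer) pairs consulted first,
# so only one data-file record has to be fetched.
index = {"Brown": "P1", "Johnson": "P3", "Smith": "P4", "Watson": "P5"}
data_file = {"P1": "Data item for Brown", "P3": "Data item for Johnson",
             "P4": "Data item for Smith", "P5": "Data item for Watson"}

def lookup(key):
    pointer = index.get(key)          # search the (small) index
    return data_file[pointer] if pointer else None
```

The point of the structure: the index is small enough to search quickly (or keep in memory), so a record is found with a single access to the large data file.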
  11. An inverted file
  12. A file with a partial index
  13. Opening an indexed file
  14. Hashing
     In hashing, the index file is replaced by a hash function, and the storage space is divided into buckets.
     Each record has a key field and is stored in the bucket corresponding to the hash of its key; the hash function computes a bucket number for each key value.
     Advantage: no index table is needed.
     Disadvantages: (i) the hash function needs careful design; (ii) performance can be unpredictable.
  15. Terminology
     Bucket: a section of the data storage area.
     Key: an identifier for a block of information.
     Hash function: takes a key as input and outputs a bucket number.
     Collision: two keys yield the same bucket number.
  16. Hash Functions
     Requirements: easily and quickly computed; values evenly spread over the bucket numbers.
     What can go wrong: a bucket number computed from the 1st and 3rd characters of a name sends Brown, Brook, Broom, Broadhead, Biot, Bloom, ... all to the same bucket.
     Examples of hash functions:
       Mid-square: compute key x key and take the middle digits as the bucket number.
       Extraction: select digits from certain positions within the key.
       Division: divide the key by the number of buckets and use the remainder.
  17. Hashing the key field value 25X3Z to one of 41 buckets
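A sketch of the division method from the Hash Functions slide, applied to the key 25X3Z and 41 buckets. The step that turns the key's characters into a number is an assumption (summing character codes); the figure may use a different encoding.

```python
# Division method: convert the key to a number, then take the remainder
# after dividing by the number of buckets.
NUM_BUCKETS = 41

def bucket_for(key):
    numeric = sum(ord(ch) for ch in key)   # one simple key-to-number encoding
    return numeric % NUM_BUCKETS

b = bucket_for("25X3Z")   # a bucket number in the range 0..40
```

Whatever encoding is used, the remainder is always in the range 0 to 40, so every key maps to a valid bucket.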
  18. The rudiments of a hashing system, in which each bucket holds those records that hash to that bucket number
  19. Collisions in Hashing
     Collision: two keys hashing to the same bucket.
     Clustering problem: a poorly designed hash function distributes keys unevenly across the buckets.
     Collisions also become a problem when there are not enough buckets; the probability rises sharply as the load factor (the percentage of buckets filled) approaches 75%.
     Solution: somewhere between 50% and 75% load factor, increase the number of buckets and rehash all the data.
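The grow-and-rehash solution can be sketched as a tiny hash structure that doubles its bucket count when the load factor passes 0.75. This uses the common records-per-bucket definition of load factor and chains colliding records in lists; both are simplifying assumptions.

```python
# When the load factor exceeds 0.75, double the number of buckets
# and rehash all stored keys with the new bucket count.
class HashFile:
    def __init__(self, num_buckets=4):
        self.buckets = [[] for _ in range(num_buckets)]
        self.count = 0

    def insert(self, key):
        if (self.count + 1) / len(self.buckets) > 0.75:
            old_keys = [k for b in self.buckets for k in b]
            self.buckets = [[] for _ in range(2 * len(self.buckets))]
            for k in old_keys:                       # rehash all the data
                self.buckets[hash(k) % len(self.buckets)].append(k)
        self.buckets[hash(key) % len(self.buckets)].append(key)
        self.count += 1
```

Rehashing is necessary because the bucket number depends on the bucket count: a key that hashed to bucket 3 of 4 may hash to bucket 7 of 8.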
  20. Handling bucket overflow
  21. A large file partitioned into buckets to be accessed by hashing
  22. The role of an operating system when accessing a file (system calls)
  23. Maintaining a file's order by means of a File Allocation Table (FAT)
  24. Information Required on a Hard Drive to Load an OS
     Startup BIOS: POST, then load the MBR.
     Master Boot Record (MBR): the master boot program and the partition table (4 entries of 16 bytes each).
     OS boot record (boot sector): loads the first program file of the OS.
     Boot loader program: begins the process of loading the OS into memory.
  25. How Data Is Logically Stored on a Floppy Disk
     All floppy and hard disk drives are divided into tracks and sectors; tracks are concentric circles on the disk.
     Sector: always 512 bytes; the physical organization of a disk; the BIOS manages the disk as sectors.
     Cluster (file allocation unit): a group of sectors; the logical organization of a disk; the OS views the disk as a list of clusters.
  26. The Boot Record
     Located at track 0, sector 1 of a floppy disk. Contains basic information about how the disk is organized, including the bootstrap program, which can be used to boot from the disk. The uniform layout and content of the boot record allow any version of DOS or Windows to read any DOS or Windows disk.
  27. The File Allocation Table (FAT)
     Lists the location of files on disk in a one-column table. A floppy disk's FAT is 12 bits wide (FAT12). Each entry describes how a cluster on the disk is used; a bad cluster is marked in the FAT.
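A sketch of how the FAT chains a file's clusters: the entry for each cluster holds the number of the file's next cluster, and a special marker ends the chain. The table contents and the end-of-file marker value are illustrative.

```python
# A toy FAT: entry for cluster c gives the next cluster of the same file.
EOF_MARK = -1
fat = {2: 5, 5: 6, 6: EOF_MARK,    # file A occupies clusters 2 -> 5 -> 6
       3: 4, 4: EOF_MARK}          # file B occupies clusters 3 -> 4

def clusters_of(first_cluster):
    chain, c = [], first_cluster
    while c != EOF_MARK:
        chain.append(c)
        c = fat[c]                 # follow the link to the next cluster
    return chain
```

The directory entry stores only the first cluster number; the OS recovers the rest of the file's order by walking the chain, even though the clusters need not be contiguous on disk.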
  28. The Root Directory
     Lists all the files assigned to this table, in a fixed number of entries. Items include: filename and extension; time and date of creation or last update; file attributes; first cluster number.
  29. How a Hard Drive Is Logically Organized to Hold Data
     Low-level format: creates the tracks and sectors; done at the factory.
     Partitioning the hard drive (FDISK.EXE): creates the partition table at the beginning of the drive.
     High-level format: done by the OS for each logical drive.
     The Master Boot Record (MBR) is the first 512 bytes of a hard drive; its master boot program (446 bytes) calls a boot program to load the OS, and its partition table records each partition's description, location, and size.
  30. Partitions and Logical Drives
  31. FAT16
     Supported by DOS and all versions of Windows. Uses 16 bits for each cluster entry. As the size of the logical drive increases, FAT16 cluster size increases dramatically.
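Why cluster size grows: a 16-bit entry can number at most 2^16 = 65,536 clusters, so larger drives force larger clusters. A simplified calculation (real FAT16 reserves a few entry values and rounds differently, so treat this as an approximation):

```python
# Smallest power-of-two cluster size (in bytes) that lets a drive fit
# within FAT16's 65,536 addressable clusters.
MAX_CLUSTERS = 2 ** 16
SECTOR = 512                       # cluster size is a multiple of the sector size

def min_cluster_size(drive_bytes):
    size = SECTOR
    while drive_bytes / size > MAX_CLUSTERS:
        size *= 2                  # double until every cluster is addressable
    return size

small = min_cluster_size(128 * 2**20)   # 128 MB drive -> 2 KB clusters
large = min_cluster_size(2 * 2**30)     # 2 GB drive  -> 32 KB clusters
```

With 32 KB clusters, even a 1-byte file consumes 32 KB of disk, which is the inefficiency FAT32's smaller clusters address on the next slide.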
  32. FAT32
     Became available with Windows 95 OSR2. Uses 32 bits per FAT entry, although only 28 bits hold cluster numbers. More efficient than FAT16 in terms of cluster size.
  33. NTFS
     Supported by Windows NT/2000/XP. Provides greater security. Uses a database called the master file table (MFT) to locate files and directories. Supports large hard drives.
  34. Comparing FAT16, FAT32, and NTFS
  35. A file versus a database organization (1/2)
  36. A file versus a database organization (2/2)
  37. The conceptual layers of a database
  38. Schemas
     Schema: a description of the structure of an entire database, used by database software to maintain the database.
     Subschema: a description of only the portion of the database pertinent to a particular user's needs, used to prevent sensitive data from being accessed by unauthorized personnel.
  39. Database Management Systems
     Database management system (DBMS): a software layer that manipulates a database in response to requests from applications.
     Distributed database: a database stored on multiple machines; the DBMS masks this organizational detail from its users.
     Data independence: the ability to change the organization of a database without changing the application software that uses it.
  40. Database Models
     Database model: a conceptual view of a database. Examples: the relational database model and the object-oriented database model.
  41. Relational Database Model
     Relation: a rectangular table. Attribute: a column in the table. Tuple: a row in the table.
     Relational design: avoid mixing multiple concepts within one relation; doing so can lead to redundant data, and deleting a tuple could then also delete necessary but unrelated information.
  42. Improving a Relational Design
     Decomposition: dividing the columns of a relation into two or more relations, duplicating those columns necessary to maintain relationships.
     Lossless (nonloss) decomposition: a "correct" decomposition that does not lose any information.
  43. A relation containing employee information
  44. A relation containing redundancy
  45. An employee database consisting of three relations
  46. Finding the departments in which an employee has worked
  47. A relation and a proposed decomposition
  48. Relational Operations
     SELECT: choose rows. PROJECT: choose columns. JOIN: assemble information from two or more relations.
  49. The SELECT operation
  50. The PROJECT operation
  51. The JOIN operation
  52. Another example of the JOIN operation
  53. An application of the JOIN operation
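The three operations from slide 48 can be sketched on relations represented as lists of dictionaries; the relation names, attributes, and data below are invented for illustration.

```python
# Toy relations: each dictionary is a tuple, each dict key an attribute.
employees = [{"EmplId": "25X15", "Name": "Joe E. Baker"},
             {"EmplId": "34Y70", "Name": "Cheryl H. Clark"}]
jobs      = [{"EmplId": "25X15", "JobTitle": "Floor manager"},
             {"EmplId": "34Y70", "JobTitle": "Secretary"}]

def select(relation, predicate):                 # SELECT: choose rows
    return [t for t in relation if predicate(t)]

def project(relation, attributes):               # PROJECT: choose columns
    return [{a: t[a] for a in attributes} for t in relation]

def join(r1, r2, attr):                          # JOIN: combine matching tuples
    return [{**t1, **t2} for t1 in r1 for t2 in r2 if t1[attr] == t2[attr]]
```

For example, `project(join(employees, jobs, "EmplId"), ["Name", "JobTitle"])` answers "which job does each employee hold?" by joining on the shared EmplId attribute and then keeping only the two columns of interest.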
  54. The associations between objects in an object-oriented database
  55. Maintaining Database Integrity (1/2)
     Transaction: a sequence of operations that must all happen together, e.g. transferring money between bank accounts.
     Transaction log: a non-volatile record of each transaction's activities, built before the transaction is allowed to execute.
     Commit point: the point at which a transaction has been recorded in the log.
     Roll-back: the process of undoing a transaction.
  56. Maintaining Database Integrity (2/2)
     Simultaneous access problems: the incorrect summary problem and the lost update problem.
     Locking: preventing others from accessing data being used by a transaction. Shared lock: used when reading data. Exclusive lock: used when altering data.
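A sketch of commit and roll-back using sqlite3: a money transfer that would overdraw an account fails partway, and rolling back undoes the half-completed transfer. The table, account names, and balances are illustrative.

```python
# A transaction that must happen entirely or not at all.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT, balance INTEGER CHECK (balance >= 0))")
conn.execute("INSERT INTO account VALUES ('A', 100), ('B', 50)")
conn.commit()

try:
    conn.execute("UPDATE account SET balance = balance + 200 WHERE name = 'B'")  # credit B
    conn.execute("UPDATE account SET balance = balance - 200 WHERE name = 'A'")  # fails: A would go negative
    conn.commit()                  # commit point: never reached here
except sqlite3.IntegrityError:
    conn.rollback()                # roll-back: undo B's credit too

balances = dict(conn.execute("SELECT name, balance FROM account"))
```

After the roll-back, both balances are unchanged; without it, B would be 200 richer while A never paid, which is exactly the inconsistency the all-or-nothing rule prevents.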
  57. Data Mining
     Data mining: the area of computer science that deals with discovering patterns in collections of data.
     Data warehouse: a static data collection to be mined. Data cube: data presented from many perspectives to enable mining.
     Data mining strategies: class description, class discrimination, cluster analysis, association analysis, outlier analysis, sequential pattern analysis.
  58. Social Impact of Database Technology
     Problems: massive amounts of personal data are collected, often without the knowledge or meaningful consent of the people affected; data merging produces new, more invasive information; errors are widely disseminated and hard to correct.
     Remedies: existing legal remedies are often difficult to apply; negative publicity may be more effective.