Published on

Introduction to database ; This lecture is about the basic things like what is data and how to make data into information etc

Published in: Technology
  • Be the first to comment

  • Be the first to like this

No Downloads
Total views
On SlideShare
From Embeds
Number of Embeds
Embeds 0
No embeds

No notes for slide


  1. 1. 1-1What is “Data”? ANSI definition:¡Data¢A representation of facts, concepts, or instructions in aformalized manner suitable for communication, interpretation,or processing by humans or by automatic means.£Any representation such as characters or analog quantities towhich meaning is or might be assigned. Generally, we performoperations on data or data items to supply some informationabout an entity. Volatile vs. persistent data¡Our concern is primarily with persistent data1-2Early Data Management -Ancient History¤Data are not stored on disk¤Programmer defines both logical data structure and physicalstructure (storage structure, access methods, I/O modes, etc)¤One data set per program. High data redundancy.PROGRAM 2DataManagementPROGRAM 3DataManagementPROGRAM 1DataManagementDATA SET 1DATA SET 2DATA SET 3
  2. 2. 1-3Problems¥There is no persistence.¦All data is transient and disappears when the programterminates.¥Random access memory (RAM) is expensive andlimited¦All data may not fit available memory¥Programmer productivity low¦The programmer has to do a lot of tedious work.1-4File Processing -Recent (and Current) History§Data are stored in files with interface between programs andfiles.§Various access methods exist (e.g., sequential, indexed, random)§One file corresponds to one or several programs.PROGRAM 1DataManagement FILE 1FILE 2RedundantDataPROGRAM 2DataManagementPROGRAM 3DataManagementFileSystemServices
  3. 3. 1-5File System Functions¨Mapping between logical files and physical files©Logical files: a file viewed by users and programs.Data may be viewed as a collection of bytes or as a collectionof records (collection of bytes with a particular structure)Programs manipulate logical files©Physical files: a file as it actually exists on a storagedevice.Data usually viewed as a collection of bytes located at aphysical address on the deviceOperating systems manipulate physical files.¨A set of services and an interface (usually calledapplication independent interface – API)1-6Problems With File Systems¨Data are highly redundant©sharing limited and at the file level¨Data is unstructured©“flat” files¨High maintenance costs©data dependence; accessing data is difficult©ensuring data consistency and controlling access to data¨Sharing granularity is very coarse¨Difficulties in developing new applications
  4. 4. 1-7PROGRAM 1PROGRAM 1PROGRAM 2IntegratedDatabaseDBMSQuery ProcessorTransaction Mgr…Database Approach1-8What is a Database?A database is an integrated and structuredcollection of stored operational data used (shared)by application systems of an enterpriseManufacturing Product dataUniversity Student data, coursesHospital Patient data, facilitiesBank Account data
  5. 5. 1-9What is a Database?A database (DB) is a structured collection of dataabout the entities that exist in the environment thatis being modeled.The structure of the database is determined by theabstract data model that is used.A database management system (DBMS) is thegeneralized tool that facilitates the management ofand access to the database.1-10Data ModelFormalism that defines what the structure of thedata arewithin a filebetween filesFile systems can at best specify data organizationwithin one fileAlternatives for business dataHierarchical; networkRelationalObject-orientedThis structure is usually called the schemaA schema can have many instances
  6. 6. 1-11Example Relation InstancesENO ENAME TITLEE1 J. Doe Elect. Eng.E2 M. Smith Syst. Anal.E3 A. Lee Mech. Eng.E4 J. Miller ProgrammerE5 B. Casey Syst. Anal.E6 L. Chu Elect. Eng.E7 R. Davis Mech. Eng.E8 J. Jones Syst. Anal.EMPENO PNO RESPE1 P1 Manager 12DURE2 P1 Analyst 24E2 P2 Analyst 6E3 P3 Consultant 10E3 P4 Engineer 48E4 P2 Programmer 18E5 P2 Manager 24E6 P4 Manager 48E7 P3 Engineer 36E8 P3 Manager 40WORKSE7 P5 Engineer 23PROJPNO PNAME BUDGETP1 Instrumentation 150000P3 CAD/CAM 250000P2 Database Develop. 135000P4 Maintenance 310000P5 CAD/CAM 5000001-12Data constitute an organizational asset IntegratedcontrolReduction of redundancyAvoidance of inconsistencySharabilityStandardsImproved securityData integrityProgrammer productivity Data IndependenceWhy Database Technology
  7. 7. 1-13Programmer productivityHigh data independenceWhy Database TechnologyTimeMaintenanceNew systemdevelopmentTotalSystemCost1-14Invisibility (transparency) of the details ofconceptual organization, storage structure andaccess strategy to the usersLogicaltransparency of the conceptual organizationtransparency of logical access strategyPhysicaltransparency of the physical storage organizationtransparency of physical access pathsData Independence
  8. 8. 1-15ANSI/SPARC ArchitectureExternalSchemaConceptualSchemaInternalSchemaInternalviewConceptualviewExternalviewExternalviewExternalviewUsersDBMSEMP(ENO: string, ENAME: string, TITLE: string)PROJ(PNO: string, PNAME: string, BUDGET: integer)WORKS(ENO: string, PNO: string, RESP: string,DUR: integer)Store all the relations as unsorted files.Build indexes on EMP(ENO), PROJ(PNO) andWORKS(ENO,PNO).ASSIGNMENT(ENO,PNO,ENAME,PNAME)1-16Database FunctionalityIntegrated schema!Users have uniform view of data!They see things only as relations (tables) in therelational modelDeclarative integrity and consistency enforcement!24000 ≤ Salary ≤ 250000!No employee can have a salary greater than his/hermanager.!User specifies and system enforces.Individualized views!Restrictions to certain relations!Reorganization of relations for certain classes of users
  9. 9. 1-17Database Functionality (cont’d)Declarative access#Query language - SQL$Find the names of all electrical engineers.SELECT ENAMEFROM EMPWHERE TITLE = “Elect. Eng.”$Find the names of all employees who have worked on a projectas a manager for more than 12 months.SELECT EMP.ENAMEFROM EMP,ASGWHERE RESP = “Manager”AND DUR 12AND EMP.ENO = ASG.ENOSystem determined execution#Query processor optimizer1-18Database Functionality (cont’d)Transactions#Execute user requests as atomic units#May contain one query or multiple queries#Provide$Concurrency transparency%Multiple users may access the database, but they each see thedatabase as their own personal data%Concurrency control$Failure transparency%Even when system failure occurs, database consistency is notviolated%Logging and recovery
  10. 10. 1-19Database Functionality (cont’d)Transaction PropertiesAtomicity(All-or-nothing propertyConsistency(Each transaction is correct and does not violate databaseconsistencyIsolation(Concurrent transactions do not interfere with each otherDurability(Once the transaction completes its work (commits), its effectsare guaranteed to be reflected in the database regardless ofwhat may occur1-20Application ProgramsOperating System (?)DBMSApp. development toolsHardwarePlace in a Computer System
  11. 11. 1-21System StructureNaïve users ApplicationprogrammersCasual users DatabaseadministratorFormsApplicationFront ends DML Interface CLI DDLIndexes SystemCatalogDatafilesDDLCompilerDisk Space ManagerBuffer ManagerFile Access MethodsQuery Evaluation EngineSQL CommandsRecoveryManagerTransactionLockManagerDBMS1-22Database Users)End user0Naïve or casual user0Accesses database either through forms or through applicationfront-ends0More sophisticated ones generate ad hoc queries using DML)Application programmer/developer0Designs and implements applications that access the database(some may be used by end users))Database administrator (DBA)0Defines and manages the conceptual schema0Defines application and user views0Monitors and tunes DBMS performance (defines/modifies internalschema)0Loads and reformats the database0Responsible for security and reliability
  12. 12. 1-23DBMS Languages1Data Definition Language (DDL)2Defines conceptual schema, external schema, and internal schema,as well as mappings between them2Language used for each level may be different2The definitions and the generated information is stored in systemcatalog1Data Manipulation Language (DML)2Can be3embedded query language in a host language3“stand-alone” query language2Can be3Procedural: specify where and how (navigational)3Declarative: specify what1-24Brief History of Databases41960s:5Early 1960s: Charles Bachmann developed first DBMS atHoneywell (IDS)6Network model where data relationships are represented as agraph.5Late 1960s: First commercially successful DBMSdeveloped at IBM (IMS)6Hierarchical model where data relationships are represented as atree6Still in use today (SABRE reservations; Travelocity)5Late 1960s: Conference On DAta Systems Languages(CODASYL) model defined. This is the network model,but more standardized.
  13. 13. 1-25Brief History of Databases71970s:81970: Ted Codd defined the relational data model atIBM San Jose Laboratory (now IBM Almaden)8Two major projects start (both were operational in late1970s)9INGRES at University of California, Berkeley@Became commercial INGRES, followed-up by POSTGRESwhich was incorporated into Informix9System R at IBM San Jose Laboratory@Became DB281976: Peter Chen defined the Entity-Relationship (ER)model1-26Brief History of Databases71980s8Maturation of relational database technology8SQL standardization (mid-to-late 1980s) through ISO8The real growth period71990s8Continued expansion of relational technology andimprovement of performance8Distribution becomes a reality8New data models: object-oriented, deductive8Late 1990s: incorporation of object-orientation inrelational DBMSs → Object-Relational DBMSs8New application areas: Data warehousing and OLAP,Web and Internet, interest in text and multimedia