Chapter - Objectives
• Basics of Internet, Web, HTTP, HTML, URLs.
• Difference between two-tier and three-tier client-server architecture.
• Advantages and disadvantages of Web as a database platform.
• Approaches for integrating databases into Web:
  • The Common Gateway Interface (CGI).
  • Server-Side Includes.
  • HTTP Cookies.
Introduction
• World-Wide Web (Web, WWW, or W3) possibly most popular and powerful networked information system to date.
• As architecture of Web was designed to be platform-independent, can significantly lower deployment and training costs.
• Organizations using Web as strategic platform for innovative business solutions, in effect becoming Web-centric.
• Many Web sites today are file-based where each Web document is stored in separate file.
• For large sites, this can lead to significant management problems.
• Also many Web sites now contain more dynamic information, such as product and pricing data.
• Maintaining such data in both a database and in separate HTML files is problematic.
• Accessing database directly from Web would be a better approach.
Internet
Worldwide collection of interconnected networks.
• Began in late 1960s in ARPANET, a US DOD project, investigating how to build networks that could withstand partial outages.
• Starting with a few nodes, Internet estimated to have over 100 million users in 1997, and over 270 million users in over 100 countries in 1998, with one million new users joining each month.
• May be 199 million users of Web by year 2000.
Intranet and Extranet
• Intranet - A Web site or group of sites belonging to an organization, accessible only by the members of an organization.
• Extranet - An intranet that is partially accessible to authorized outsiders.
• Whereas intranet resides behind firewall and is accessible only to people who are members of same organization, extranet provides various levels of accessibility to outsiders.
The Web
Hypermedia-based system that provides a simple ‘point and click’ means of browsing information on the Internet using hyperlinks.
• Information presented on Web pages, which can contain text, graphics, pictures, sound, and video.
• Can also contain hyperlinks to other Web pages, which allow users to navigate in a non-sequential way through information.
• Web documents written using HTML.
• Web consists of network of computers that can act in two roles:
  • as servers, providing information;
  • as clients (browsers), requesting information.
• Protocol that governs exchange of information between Web server and browser is HTTP and locations within documents identified as a URL.
• Much of Web’s success is due to its simplicity and platform-independence.
[Figure: Basic components of Web environment]
HyperText Transfer Protocol (HTTP)
Protocol used to transfer Web pages through Internet.
• Based on request-response paradigm:
  • Connection - Client establishes connection with Web server.
  • Request - Client sends request to Web server.
  • Response - Web server sends response (HTML document) to client.
  • Close - Connection closed by Web server.
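The four stages of the request-response paradigm can be sketched in Python. This is a minimal illustration, not a full HTTP client: the function names `build_request`, `parse_response`, and `fetch` are hypothetical, and the host and path in the usage note are placeholders.

```python
import socket


def build_request(path, host):
    """Format a minimal HTTP/1.0 GET request (Host header is optional in 1.0)."""
    lines = ["GET " + path + " HTTP/1.0", "Host: " + host, "", ""]
    return "\r\n".join(lines).encode("ascii")


def parse_response(raw):
    """Split a raw response into (status code, headers dict, body bytes)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    status = int(status_line.split(" ", 2)[1])
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status, headers, body


def fetch(host, path, port=80):
    """Run the four stages against a real server."""
    with socket.create_connection((host, port), timeout=10) as conn:  # 1. Connection
        conn.sendall(build_request(path, host))                       # 2. Request
        chunks = []
        while True:                                                   # 3. Response
            chunk = conn.recv(4096)
            if not chunk:                                             # 4. Close (by server)
                break
            chunks.append(chunk)
    return parse_response(b"".join(chunks))
```

A call such as `fetch("www.example.com", "/index.html")` would open a connection, send one request, read until the server closes the connection, and return the parsed response.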
HyperText Transfer Protocol (HTTP)
• HTTP/1.0 is stateless protocol - each connection is closed once server provides response.
• This makes it difficult to support concept of a session that is essential to basic DBMS transactions.
HyperText Markup Language (HTML)
Document formatting language used to design most Web pages.
• A simple, yet powerful, platform-independent document language.
• HTML is an application of Standard Generalized Markup Language (SGML), a system for defining structured document types and markup languages to represent instances of those document types.
Uniform Resource Locators (URLs)
String of alphanumeric characters that represents location or address of a resource on Internet and how that resource should be accessed.
• Defines uniquely where documents (resources) can be found.
• Uniform Resource Identifiers (URIs) - generic set of all Internet resource names/addresses.
• Uniform Resource Names (URNs) - persistent, location-independent name. Relies on name lookup services.
• URL consists of three basic parts:
  • protocol used for the connection,
  • host name,
  • path name on host where resource stored.
• Can optionally specify:
  • port through which connection to host should be made,
  • query string.
• Example: http://www.w3.org/WWW/MarkUp.html
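The parts of a URL can be pulled apart with Python's standard `urllib.parse` module. The helper name `describe_url` is hypothetical, and the second URL in the usage note is an invented example used only to show the optional port and query string.

```python
from urllib.parse import urlsplit, parse_qs


def describe_url(url):
    """Return the basic and optional parts of a URL as a dictionary."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,         # protocol used for the connection
        "host": parts.hostname,           # host name
        "path": parts.path,               # path name on the host
        "port": parts.port,               # optional; None when not given
        "query": parse_qs(parts.query),   # optional query string, decoded
    }
```

For the slide's example, `describe_url("http://www.w3.org/WWW/MarkUp.html")` reports protocol `http`, host `www.w3.org`, and path `/WWW/MarkUp.html`, with no port or query. A hypothetical URL such as `http://www.example.com:8080/catalog/search?item=book` would also fill in the port and query fields.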
Static and Dynamic Web Pages
• HTML document stored in file is static Web page.
• Content of dynamic Web page is generated each time it is accessed.
• Thus, dynamic Web page can:
  • respond to user input from browser.
  • be customized by and for each user.
• Requires hypertext to be generated by servers.
• Need scripts that perform conversions from different data formats into HTML ‘on-the-fly’.
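A conversion script of this kind can be sketched as a function that turns data rows into HTML each time it is called. The function name `render_price_list` and the product data are assumptions made for illustration; in practice the rows would come from a database query.

```python
from html import escape


def render_price_list(title, rows):
    """Generate an HTML page 'on-the-fly' from (name, price) rows."""
    body = "".join(
        "<tr><td>{}</td><td>{:.2f}</td></tr>".format(escape(name), price)
        for name, price in rows
    )
    return (
        "<html><head><title>" + escape(title) + "</title></head>"
        "<body><table>" + body + "</table></body></html>"
    )
```

Calling `render_price_list("Prices", [("Nuts & bolts", 4.5)])` produces a fresh page for the current data, with special characters such as `&` escaped so the data cannot break the markup.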
Requirements for Web-DBMS Integration
• Ability to access valuable corporate data in a secure manner.
• Data and vendor independent connectivity to allow freedom of choice in DBMS selection.
• Ability to interface to database independent of any proprietary browser or Web server.
• Connectivity solution that takes advantage of all the features of an organization’s DBMS.
• Open-architecture to allow interoperability with a variety of systems and technologies. For example:
  • different Web servers;
  • Microsoft’s (Distributed) Component Object Model (DCOM/COM);
  • CORBA/IIOP (Internet Inter-ORB Protocol);
  • Java/Remote Method Invocation.
• Cost-effective solution that allows for scalability, growth, and changes in strategic directions, and helps reduce applications development costs.
• Support for transactions that span multiple HTTP requests.
• Support for session- and application-based authentication.
• Acceptable performance.
• Minimal administration overhead.
• Set of high-level productivity tools to allow applications to be developed, maintained, and deployed with relative ease and speed.
[Figure: Two-tier client-server architecture]
Three-Tier Client-Server Architecture
• Client side of two-tier model presented two problems preventing true scalability:
  • ‘Fat’ client, requiring considerable resources on client’s computer to run effectively.
  • Significant client side administration overhead.
• By 1995, three layers proposed, each potentially running on a different platform.
• Advantages:
  • ‘Thin’ client, requiring less expensive hardware.
  • Application maintenance centralized.
  • Easier to modify or replace one tier without affecting others.
  • Separating business logic from database functions makes it easier to implement load balancing.
  • Maps quite naturally to Web environment.
Disadvantages of Web-DBMS Approach
• Reliability
• Security
• Cost
• Scalability
• Limited functionality of HTML
• Statelessness
• Bandwidth
• Performance
• Immaturity of development tools
Common Gateway Interface (CGI)
Specification for transferring information between a Web server and a CGI program.
• Server only intelligent enough to send documents and to tell browser what kind of document it is.
• But server also knows how to launch other programs.
• When server sees that URL points to a program (script), it executes script and sends back script’s output to browser as if it were a file.
[Figure: CGI environment]
Common Gateway Interface (CGI)
• CGI defines how scripts communicate with Web servers.
• A CGI script is any script designed to accept and return data that conforms to the CGI specification.
• Before server launches script, prepares number of environment variables representing current state of the server, who is requesting the information, and so on.
• Script picks this up and reads STDIN.
• Then performs necessary processing and writes its output to STDOUT.
• Script responsible for sending MIME header, which allows browser to differentiate between components.
• CGI scripts can be written in almost any language, provided it supports reading and writing of an operating system’s environment variables.
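The output side of a CGI script can be sketched as follows. The function name `run_cgi` is hypothetical; `REQUEST_METHOD` and `REMOTE_ADDR` are standard CGI environment variables, and the page content is invented for illustration. The script writes the MIME header, a blank line, and then the document to standard output.

```python
import os
import sys
from html import escape


def run_cgi(environ, stdout):
    """Read server-provided environment variables and write a response."""
    method = environ.get("REQUEST_METHOD", "GET")
    client = environ.get("REMOTE_ADDR", "unknown")
    # MIME header first, then a blank line, then the document itself.
    stdout.write("Content-Type: text/html\r\n\r\n")
    stdout.write(
        "<html><body><p>{} request from {}</p></body></html>\n".format(
            escape(method), escape(client)
        )
    )


if __name__ == "__main__":
    # When launched by a Web server, the real environment and STDOUT are used.
    run_cgi(os.environ, sys.stdout)
```

Passing the environment and output stream as parameters is a design choice made here so the same function can be exercised without a Web server.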
Common Gateway Interface (CGI)
• Four primary methods for passing information from browser to a CGI script:
  • Passing parameters on the command line.
  • Passing environment variables to CGI programs.
  • Passing data to CGI programs via standard input.
  • Using extra path information.
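A script might gather input from all four channels roughly as shown below. This is a sketch under assumptions: the function name `read_cgi_input` and the returned dictionary layout are hypothetical, while `QUERY_STRING`, `REQUEST_METHOD`, `CONTENT_LENGTH`, and `PATH_INFO` are standard CGI environment variables.

```python
from urllib.parse import parse_qs


def read_cgi_input(argv, environ, stdin):
    """Collect request data from command line, environment, STDIN, and extra path."""
    # Environment variable: query string appended to the URL (GET data).
    form = parse_qs(environ.get("QUERY_STRING", ""))
    # Standard input: form data sent in the body of a POST request.
    if environ.get("REQUEST_METHOD") == "POST":
        length = int(environ.get("CONTENT_LENGTH") or 0)
        form.update(parse_qs(stdin.read(length)))
    return {
        "command_line": list(argv[1:]),              # parameters on the command line
        "extra_path": environ.get("PATH_INFO", ""),  # extra path information
        "form": form,
    }
```

In a real script the arguments would be `sys.argv`, `os.environ`, and `sys.stdin`. Note that in this sketch a field posted in the body replaces a field of the same name in the query string.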
[Figure: CGI - passing parameters on command line]
CGI - Advantages
• CGI is the de facto standard for interfacing Web servers with external applications.
• Possibly most commonly used method for interfacing Web applications to data sources.
• Advantages:
  • simplicity,
  • language independence,
  • Web server independence,
  • wide acceptance.
CGI - Disadvantages
• Communication between client and database server must always go through Web server.
• Lack of efficiency and transaction support, and difficulty validating user input inherited from statelessness of HTTP protocol.
• HTTP never intended for long exchanges or interactivity.
• Server has to generate a new process or thread for each CGI script.
• Security.
Server-Side Includes (SSI)
• Allows a program to be executed, like CGI, and to incorporate its output into the document.
• Generally, end result is a text document.
• SSI is not governed by an Internet RFC or other standard. Each server vendor is free to implement SSI on an ad-hoc basis, if at all.
• Most servers follow NCSA’s specification.
• All SSI commands are embedded within regular HTML comments, making the HTML portable.
• Security risks of SSI are similar to those of CGI.
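Because SSI commands sit inside HTML comments, a server can expand them with a simple text substitution. The sketch below handles only the NCSA-style `echo` command and is not a full SSI implementation; the function name `expand_ssi`, the variable values, and the `(none)` placeholder for undefined variables are assumptions.

```python
import re

# Matches commands of the form <!--#echo var="NAME" -->
SSI_ECHO = re.compile(r'<!--#echo\s+var="([^"]+)"\s*-->')


def expand_ssi(html, variables):
    """Replace each echo command with the value of the named variable."""
    def replace(match):
        return str(variables.get(match.group(1), "(none)"))
    return SSI_ECHO.sub(replace, html)
```

A server that does not support SSI would send the page unchanged and the browser would ignore the command as an ordinary comment, which is why the HTML stays portable.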
HTTP Cookies
• Cookies can make CGI scripts more interactive.
• Cookies are small text files stored on Web client.
• CGI script creates cookie and has Web server send it to client’s browser to store on hard disk.
• Later, when client revisits Web site and uses a CGI script that requests this cookie, client’s browser sends information stored in the cookie.
• However, not all browsers support cookies.
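The cookie exchange can be sketched with Python's standard `http.cookies` module. The helper names `make_set_cookie` and `read_cookie` and the cookie name `session_id` are hypothetical. The first function builds the header a script sends to the browser; the second reads the value the browser sends back on a later visit.

```python
from http.cookies import SimpleCookie


def make_set_cookie(name, value):
    """Build the Set-Cookie header line a script writes before its MIME header."""
    cookie = SimpleCookie()
    cookie[name] = value
    cookie[name]["path"] = "/"
    return cookie.output()


def read_cookie(header_value, name):
    """Look up one cookie in the Cookie header value sent by the browser."""
    cookie = SimpleCookie()
    cookie.load(header_value)
    morsel = cookie.get(name)
    return morsel.value if morsel else None
```

In a CGI setting the header value would typically come from the `HTTP_COOKIE` environment variable. Since some browsers reject cookies, a script should be prepared for `read_cookie` to return `None`.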
Extending the Web Server
• To overcome limitations of CGI, many servers provide an API that adds functionality to server.
• Two of main APIs are Netscape’s NSAPI and Microsoft’s ISAPI.
• Scripts are loaded in as part of the server, giving back-end applications full access to all the I/O functions of server.
• One copy of application is loaded and shared between multiple requests to server.
• Approach more complex than CGI, possibly requiring specialized programmers.
• Can provide very flexible and powerful solution.
• API extensions can provide same functionality as a CGI program, but as API runs as part of the server, API approach can perform significantly better than CGI.
• Extending Web server is potentially dangerous, since server executable is being changed.
Comparison of CGI and API
• CGI and API both extend capabilities of server.
• CGI scripts run in environment created by Web server program.
• Scripts only execute once Web server interprets request from browser, then returns results back to the server.
• API approach not nearly so limited in its ability to communicate.
• API-based extensions are loaded into same address space as Web server.
Java
• Proprietary language developed by Sun and currently marketed by JavaSoft.
• Originally intended to support environment of networked machines and embedded systems.
• Now, Java is rapidly becoming de facto language for Web computing.
• Interesting because of its potential for building Web applications (applets) and server applications (servlets).
• Java is ‘a simple, object-oriented, distributed, interpreted, robust, secure, architecture neutral, portable, high-performance, multi-threaded and dynamic language’.
• Has a machine-independent target architecture, the Java Virtual Machine (JVM).
• Since almost every Web browser vendor has already licensed Java and implemented an embedded JVM, Java applications can currently be deployed on most end-user platforms.
• Before Java application can be executed, it must first be loaded into memory.
• Done by Class Loader, which takes ‘.class’ file(s) containing bytecodes and transfers them into memory.
• Class file can be loaded from local hard drive or downloaded from network.
• Finally, bytecodes must be verified to ensure that they are valid and do not violate Java’s security restrictions.
• Loosely speaking Java is a ‘safe’ C++.
• Safety features include strong static type checking, automatic garbage collection, and absence of machine pointers at language level.
• Safety is central design goal: ability to safely transmit Java code across Internet.
• Security is also integral part of Java’s design - sandbox ensures untrusted application cannot gain access to system resources.
JDBC
• Modeled after ODBC, JDBC API supports basic SQL functionality.
• With JDBC, Java can be used as host language for writing database applications.
• On top of JDBC, higher-level APIs can be built.
• Currently, two types of higher-level APIs:
  • An embedded SQL for Java.
  • A direct mapping of relational database tables to Java classes.
• JDBC API consists of two main interfaces: an API for application writers, and a lower-level driver API for driver writers.
• Applications and applets can access databases using:
  • JDBC API with pure Java JDBC drivers,
  • ODBC drivers and existing database client libraries.
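The call-level style that JDBC inherits from ODBC (open a connection, send an SQL statement with parameters, read back rows) can be illustrated with an analogous API. The sketch below is not JDBC: it uses Python's DB-API through the standard `sqlite3` module as a stand-in, and the table, column, and function names are invented for illustration.

```python
import sqlite3


def demo_connection():
    """Open an in-memory database and load a small hypothetical table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE product (name TEXT, price REAL)")
    conn.executemany(
        "INSERT INTO product VALUES (?, ?)",
        [("widget", 4.5), ("gadget", 12.0)],
    )
    conn.commit()
    return conn


def products_up_to(conn, max_price):
    """Run a parameterized query and return the result rows."""
    cursor = conn.execute(
        "SELECT name, price FROM product WHERE price <= ? ORDER BY price",
        (max_price,),
    )
    return cursor.fetchall()
```

The host language supplies the control flow while SQL is passed to the driver as strings, which is the same division of labour a Java program has when it uses JDBC.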
JDBC - Advantages/Disadvantages
• Advantage of using ODBC drivers is that they are a de facto standard for PC database access, and are available for many DBMSs, for a very low price.
• Disadvantages with this approach:
  • Non-pure JDBC driver will not necessarily work with a Web browser.
  • Currently downloaded applet can connect only to database located on host machine.
  • Deployment costs increase.
JSQL
• Another JDBC-based approach uses Java with static embedded SQL.
• JSQL comprises a set of clauses that extend Java to include SQL constructs as statements and expressions.
• JSQL translator transforms JSQL clauses into standard Java code that accesses database through a CLI.
Java Relational Binding (JRB)
• Middleware product that bridges from Java to RDBMSs.
• Provides orthogonal persistence through three-stage process:
  • database creation,
  • an import program,
  • JRB API.
• API is set of public classes used by programmer, which includes methods to connect to database server, open database, start/end transactions, create/update/read objects.
• JRB relies on security and integrity of underlying RDBMS:
  • references between objects implemented as FKs;
  • OIDs modeled as system generated PKs.
• JRB also provides notion of class extents.
• Can associate predicate with class extent and retrieve objects based on their contents (in a select-from-where style).
• The runtime system is implemented on top of a JDBC-compliant interface layer.
Microsoft Active Platform
• Microsoft Active Platform is an ‘open, standards-based software architecture for delivering applications over the Internet and intranets’.
• Contains HTML, scripting languages, and components (Java, ActiveX).
• On client machine called an Active Desktop.
• On Web server called an Active Server.
• Active Platform is encompassing term given to these related technologies.
Object Linking and Embedding for DataBases (OLE DB)
• Microsoft has defined set of data objects, collectively known as OLE DB.
• Allows OLE-oriented applications to share and manipulate sets of data as objects.
• OLE DB is an object-oriented specification based on C++ API.
• Components can be treated as data consumers and data providers. Consumers take data from OLE DB interfaces and providers expose OLE DB interfaces.
Active Server Pages (ASP)
• ASP is programming model that allows dynamic, interactive Web pages to be created on server.
• ASP provides flexibility of CGI, without performance overhead discussed previously.
• ASP runs in-process with the server, and is optimized to handle large volume of users.
• When an ‘.asp’ file is requested, Web server calls ASP, which reads requested file, executes any commands, and sends generated HTML page back to browser.
Active Data Objects (ADO)
• Programming extension of ASP supported by Microsoft IIS for database connectivity.
• Supports following key features:
  • Independently-created objects.
  • Support for stored procedures.
  • Support for different cursor types.
  • Batch updating.
  • Support for limits on number of returned rows.
• Designed as an easy-to-use interface to OLE DB.
Microsoft Internet Database Connector (IDC)
• Similar approach to ASP, again specific to Microsoft Internet Information Server.
• IDC is an ISAPI that reads an ‘.idc’ file that contains SQL commands.
• IDC communicates with a DBMS’s ODBC driver to retrieve necessary data from database and format it using information in an ‘.htx’ file.
Oracle Network Computing Architecture (NCA)
• NCA aimed at providing extensibility for distributed environments.
• It is three-tier architecture based on industry standards such as:
  • OMG’s CORBA 2.0 technology.
  • HTTP and HTML for Web enablement.
  • IIOP for object interoperability.
  • OMG’s IDL for language neutral interfaces.
Security
• All Internet traffic travels ‘in the clear’ and anyone who monitors traffic can read it.
• Need to ensure with communication that:
  • It is inaccessible to anyone but sender and receiver (privacy).
  • It has not been changed during transmission (integrity).
  • Receiver can be sure it came from sender (authenticity).
  • Sender can be sure receiver is genuine (non-fabrication).
  • Sender cannot deny he or she sent it (non-repudiation).
• Must also protect information once it has reached Web server.
• Download may have executable content, which can perform following malicious actions:
  • Corrupt data or execution state of programs.
  • Reformat complete disks.
  • Perform a total system shutdown.
  • Collect and download confidential data.
  • Usurp identity and impersonate user.
  • Lock up resources.
  • Cause non-fatal but unwelcome effects.
• Look at:
  • Proxy Servers.
  • Firewalls.
  • Message Digest Algorithms.
  • Digital Signatures.
  • Digital Certificates.
  • Kerberos.
  • SSL and S-HTTP.
  • SET and SST.
  • Java Security.
  • ActiveX Security.
Proxy Servers
• Proxy server is computer that sits between browser and Web server.
• It intercepts all requests to Web server to try to fulfil requests itself.
• Has two main purposes:
  • improve performance,
  • filter requests.
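Both purposes can be shown in a small in-memory model. This is a sketch only: the class name `CachingProxy`, the `fetch` callback standing in for a request to the origin Web server, and the 403 reply for blocked hosts are assumptions, and a real proxy would also handle cache expiry.

```python
class CachingProxy:
    """Toy proxy that filters requests and caches responses."""

    def __init__(self, fetch, blocked_hosts=()):
        self._fetch = fetch                # function(host, path) -> page content
        self._blocked = set(blocked_hosts)
        self._cache = {}

    def get(self, host, path):
        # Filter requests: refuse hosts on the blocked list.
        if host in self._blocked:
            return 403, "blocked by proxy"
        # Improve performance: contact the origin server only on a cache miss.
        key = (host, path)
        if key not in self._cache:
            self._cache[key] = self._fetch(host, path)
        return 200, self._cache[key]
```

A second request for the same page is answered from the cache, so the origin server is contacted once however many browsers ask for it.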
Firewalls
• Designed to prevent unauthorized access to/from a private network.
• Can be implemented in both hardware and software, or a combination of both.
• Several types of firewall techniques:
  • Packet filter.
  • Application gateway.
  • Circuit-level gateway.
  • Proxy server.
Message Digest Algorithms
• Message digest algorithm takes an arbitrary-sized string (message) and generates fixed-length string (digest or hash).
• A digest has following characteristics:
  • It should be computationally infeasible to find another message that will generate same digest.
  • Digest does not reveal anything about message.
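The fixed-length property is easy to observe with Python's standard `hashlib` module. The choice of SHA-256 here is an assumption made for illustration, since the slide does not name a specific algorithm, and the helper name `digest` is hypothetical.

```python
import hashlib


def digest(message):
    """Return the SHA-256 digest of a byte string as hexadecimal text."""
    return hashlib.sha256(message).hexdigest()
```

Whatever the length of the input, the result is 64 hexadecimal characters (256 bits), and changing a single character of the message gives a different digest.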
Digital Signatures
• Digital signature consists of two parts:
  • string of bits computed from data being ‘signed’,
  • private key of individual or organization wishing to sign.
• Can be used to verify data comes from this individual or organization.
• Digital signature has many useful properties:
  • Authenticity can be verified, using public key.
  • Cannot be forged (assuming private key is kept secret).
  • Function of data signed and cannot be claimed to be signature for any other data.
  • Signed data cannot be changed or signature will no longer verify data as being authentic.
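These properties can be demonstrated with a toy RSA-style scheme. This is a teaching sketch, not a secure implementation: the key is far too small, there is no padding, and the primes, exponent, and function names are chosen here for illustration. The signer applies the private exponent to a digest of the data, and anyone holding the public exponent can check the result.

```python
import hashlib

# Toy key pair built from two known primes (far too small for real use).
P = 2**61 - 1
Q = 2**31 - 1
N = P * Q                               # public modulus
E = 65537                               # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))       # private exponent (Python 3.8+)


def _digest_as_int(data):
    """Reduce the SHA-256 digest of the data to an integer below N."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N


def sign(data):
    """Compute the signature from the data's digest and the private key."""
    return pow(_digest_as_int(data), D, N)


def verify(data, signature):
    """Check the signature using only the public key (E, N)."""
    return pow(signature, E, N) == _digest_as_int(data)
```

Because the signature is a function of the data, verification fails if either the data or the signature is altered after signing.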
Digital Certificates
• Attachment to electronic message used for security purposes (e.g. verify user sending message), and provide receiver with means to encode reply.
• Sender applies for certificate from Certificate Authority (CA).
• CA issues encrypted certificate containing applicant’s public key and other identification information.
• CA makes its own public key readily available.
• Recipient uses CA’s public key to decode certificate attached to message, verifies it as issued by CA, and obtains sender’s public key and identification information held within certificate.
• With this information, recipient can send an encrypted reply.
• CA’s role is critical, acting as go-between in relationship between two parties.
Kerberos
• A server of secured user names and passwords.
• Provides one centralized security server for all data and resources on network.
• Database access, login, authorization control, and other security features are centralized on trusted Kerberos servers.
• Has similar function to that of Certificate server: to identify and validate a user.
Secure Sockets Layer (SSL)
• Encryption protocol for transmitting private documents.
• Designed to prevent eavesdropping, tampering, and message forgery.
• Works by using public-key cryptography to agree a secret session key, which is then used to encrypt data that is transferred over SSL connection.
• Layered between application-level protocols such as HTTP and TCP/IP transport-level protocol.
• Thus, may be used for other application-level protocols such as FTP and NNTP.
Secure-HTTP (S-HTTP)
• Protocol for securely transmitting individual messages over Web.
• Both SSL and S-HTTP use techniques such as encryption and digital signatures, and:
  • Allow browsers and servers to authenticate each other.
  • Allow controlled access to Web site.
  • Ensure data exchanged between browser and server is secure and reliable.
Secure Electronic Transactions (SET)
• Open, interoperable standard for processing credit card transactions over Internet, in simple and secure way.
• Transaction is split in such a way that merchant has access to information about:
  • what is being purchased,
  • how much it costs,
  • whether payment is approved,
  • but no information on what payment method customer is using.
• Card issuer (e.g. Visa) has access to purchase price, but no information on type of merchandise involved.
• Certificates are heavily used by SET, both for certifying cardholder and for certifying that merchant has relationship with financial institution.
Secure Transaction Technology (SST)
• Protocol designed to handle secure bank payments over Internet.
• Uses DES encryption of information, RSA encryption of bankcard information, and strong authentication of all parties involved in transaction.
Java Security
• Sandbox ensures untrusted application cannot gain access to system resources.
• Involves three components:
  • class loader,
  • bytecode verifier,
  • security manager.
• Safety features provided by language and JVM, and enforced by compiler and runtime system.
• Security is a policy built on top of safety layer.
Class Loader
• Allocates a (hierarchically structured) namespace for each class.
• Never allows class from ‘less protected’ namespace to replace class from more protected namespace.
• Thus, I/O primitives, defined in local Java class, cannot be invoked or overridden by classes from outside local machine.
• As browsers and Java applications can provide their own class loader, this may be viewed as weakness in security.
Bytecode Verifier
• JVM verifies bytecode instructions before allowing application/applet to run.
• Typical checks include verifying:
  • Compiled code is correctly formatted.
  • Internal stacks will not overflow/underflow.
  • No ‘illegal’ data conversions will occur.
  • Bytecode instructions are appropriately typed.
  • All class member accesses are valid.
Security Manager
• Each Java application defines and implements its own security policy.
• A Java-enabled browser contains its own Security Manager, and any applets it downloads are subject to its policies.
• Generally, downloaded applets are prevented from:
  • Reading and writing files on client’s file system.
  • Making network connections to machines other than host.
  • Starting other programs on the client.
  • Loading libraries.
  • Defining native method calls.
• These restrictions apply to applets downloaded over Internet/intranet.
• They do not apply to applets on client’s local disk and in directory on CLASSPATH.
• Local applets are loaded by file system loader and can read and write files, exit JVM, and are not passed through bytecode verifier.
ActiveX Security
• ActiveX security model places no restrictions on what a control can do.
• Instead, each ActiveX control can be digitally signed by its author using system called Authenticode™.
• Digital signatures are then certified by CA.
• This security model places responsibility for the computer’s security on the user.
HTTP/1.1
• Number of new features added. Look at two:
  • Persistent connections become default behavior. While open, client can send synchronous or asynchronous messages, and server can respond to them in order.
  • Digest authentication provided as replacement for basic authentication. Password remains secret between client and server. Client and server each compute digest value using MD5, and only digest is sent across network.
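The digest computation can be sketched as below, following the basic form defined for HTTP digest authentication (without the optional quality-of-protection extension). The function names and all sample values in the usage note are assumptions. The server sends a fresh nonce with its challenge, the client returns the digest, and the server recomputes it from its own copy of the credentials.

```python
import hashlib
import hmac


def _md5_hex(text):
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username, realm, password, method, uri, nonce):
    """Compute the value the client sends instead of the password."""
    ha1 = _md5_hex(username + ":" + realm + ":" + password)
    ha2 = _md5_hex(method + ":" + uri)
    return _md5_hex(ha1 + ":" + nonce + ":" + ha2)


def server_accepts(received, username, realm, password, method, uri, nonce):
    """Recompute the digest on the server and compare in constant time."""
    expected = digest_response(username, realm, password, method, uri, nonce)
    return hmac.compare_digest(received, expected)
```

For example, a client might call `digest_response("alice", "catalog", "s3cret", "GET", "/prices.html", nonce)` with the nonce taken from the server's challenge. Because the nonce changes between challenges, a captured digest cannot simply be replayed later.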
XML (eXtensible Markup Language)
• XML is new standard that could preserve general application independence that makes HTML portable and powerful.
• Pared-down version of SGML, designed especially for Web documents.
• Designers can create their own customized tags to provide functionality not available with HTML.
• SGML allows document to be logically separated into two:
  • one containing the Document Type Definition (DTD),
  • other containing the text itself.
• Useful features include:
  • Database Schema Definition.
  • Linking to relative objects or elements.
  • Support for bi-directional links.
• Simplicity may be lost with move to XML.
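Customized tags let a document describe its data rather than its layout, so a program can read it directly. The sketch below uses Python's standard `xml.etree.ElementTree` module; the tag names `catalog`, `product`, `name`, and `price`, the sample data, and the function name are invented for illustration.

```python
import xml.etree.ElementTree as ET

# A hypothetical document using tags that do not exist in HTML.
DOC = """
<catalog>
  <product id="p1"><name>Widget</name><price currency="GBP">4.50</price></product>
  <product id="p2"><name>Gadget</name><price currency="GBP">12.00</price></product>
</catalog>
"""


def product_prices(xml_text):
    """Map each product name to its price, read from the custom tags."""
    root = ET.fromstring(xml_text)
    return {
        product.findtext("name"): float(product.findtext("price"))
        for product in root.iter("product")
    }
```

With HTML the same information would be buried in presentation markup such as table cells, and a program would have to guess which cell holds the price.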