MELJUN CORTES JEDI Course Notes: Web Programming, Lesson 1: Introduction to the Course
Introduction to Web Programming

Why the Web?

Welcome to this course on web programming. To start things off, let's begin with a better appreciation of why it is worthwhile for companies and programmers alike to focus on web programming.

Technology-Neutral Environment

First of all, one of the great things about applications on the Internet is that the Internet is a technology-neutral environment. Communication with any application on the web is done through widely adopted open standards (HTML and HTTP) that do not require the user to have a particular operating system or a client written in a particular programming language or framework. All that users need is a web browser, an application that is now bundled as standard with virtually every operating system. This translates into a wider possible audience for any web-based application.

Ease of Distribution/Updates

Since the only program that the user needs is a web browser, there is no need to give away programs on CDs. There is also no need for the user to go through a possibly lengthy installation sequence; all they need is the location of the application on the Internet, and they are ready to go.

Another benefit of having the actual binaries of the program reside on an accessible server instead of on the user's computer is that the usual problems related to program updates, such as the need to periodically check for newer versions and the problem of how to actually obtain them, are eliminated altogether. The user need not even be informed of an update; all that is needed is to update the codebase on the web server, and every user who makes use of the application afterwards automatically enjoys the benefits of the update.

Client-Server Architecture

Thick and Thin Clients

A web application is a kind of application that makes use of what is called a client-server architecture. In this kind of architecture, a client program connects to a server to retrieve the information it needs to complete the tasks that the user has set for it. There are what are called thin clients, and there are thick clients.

Thin clients are clients containing only the minimum of what is required for the user experience, mostly only an interface. All business logic, and all data aside from what the user provides, reside on the server. Thick clients are clients that, aside from an interface, also contain some, if not much, of the processing logic required for user-specified tasks.
Client-Server Architecture from a Web Perspective

From the definition above, we can tell that the client used for web applications is what we call a thin client. The client program, a browser in this case, is only an interface that the user makes use of to perform tasks. Everything else, from the data that the user needs to operate on to the logic that determines program flow and execution, resides on the server.

From a more web-based perspective, here are the duties of the server and the client:

Web server

[Figure 1: Responsibility of Server. A machine running a web browser sends a client request (containing the name and address of the item the client is looking for) to a machine running a web server. The server processes the client request by looking for the requested resource and sends back a server response (containing the document requested by the user, or an error code if the item does not exist).]

Basically, the server takes in requests from web browser clients and returns a response. Any request coming in from the client includes the name and address of the item the client is looking for, as well as any user-provided data. The server takes in that request, processes it, and returns as a response either the data the client was looking for or an error code indicating that the item does not exist on the server.

Web client

It is the browser's responsibility to provide the user with an interface with which to issue requests to the server and to view the server's response.

When the user issues a request to the server (for example, to retrieve a document, or to submit a form), it is the browser that formats that request into something the server can understand. Once the server has finished processing the request and has sent a response, it is the browser that extracts the required data from the server response and renders it for display to the user.

HTML

How does the browser know what to display to the user? Most web sites do not have just simple text content, but instead employ graphics or have forms that retrieve data. How does each browser know what to display?
The answer lies with HTML, an acronym for Hypertext Markup Language. HTML can be thought of as a set of instructions for the web browser on how to present content to the user. It is an open standard maintained by the W3C, the World Wide Web Consortium. Since it is an open standard, everybody has access to it, and browsers are developed with that standard in mind. This means that all browsers know what to do when they encounter HTML, although some older browsers might have problems rendering pages written in versions of HTML that were released after those browsers were developed.

HTTP

Definition

HTTP stands for Hypertext Transfer Protocol. It is a network protocol with Web-specific features that runs on top of two other protocol layers, TCP and IP. TCP is a protocol that is responsible for making sure that data sent from one end of a network is delivered completely and successfully to its destination. IP is a protocol that routes the pieces of that data from one host to another on the way to their destination. HTTP uses these two protocols to make sure that requests and responses are delivered completely between each end of the communication.

HTTP uses a Request/Response sequence: an HTTP client opens a connection and sends a request message to an HTTP server; the server then returns a response message, usually containing the resource that was requested. In HTTP/1.0 the server closes the connection after delivering the response; HTTP/1.1 may keep the connection open for further requests. Either way, HTTP is a stateless protocol: the server does not retain any information about a client between one transaction and the next.

The formats of the request and response messages are similar and English-oriented. Both kinds of messages consist of:
 • an initial line,
 • zero or more header lines,
 • a blank line (i.e. a CRLF by itself), and
 • an optional message body (e.g. a file, or query data, or query output).

HTTP Requests

Requests from the client to the server contain information about the kind of data the user is requesting. One of the items of information encapsulated in the HTTP request is a method name. This tells the server the kind of request being made, as well as how the rest of the message from the client is formatted. There are two methods that you'll likely encounter and use: GET and POST.

GET

GET is the simplest HTTP method and is used mainly to request a particular resource from the server, whether it be a web page, a graphic image file, a document, etc.

GET can also be used to send data over to the server, though doing this has its limitations. For one, the total number of characters that can be carried in a GET request is limited in practice (browsers and servers cap the length of a URL), so in situations where a lot of data needs to be sent to the server, not all of the message may come through.

Another limitation of the GET request method when it comes to sending data is that the data you send using this method is simply appended to the URL you send to the server. (For now, think of a URL as the unique address you send to the server denoting the location of whatever it is you are requesting.) One of the problems with this is that the URL of any request you make to the server is displayed in the browser's address bar. This means that any sensitive data, such as passwords or contact information, can be exposed to anybody who can see the screen.

The advantage of using GET to send data over to the server is that the URL resulting from a GET request can be bookmarked by the browser. This means that the user can simply bookmark the request and return to it every now and then instead of having to go through the whole process every time. Take note, though, that this can also be dangerous; if bookmarking is not something that you want your users to be able to do, use another method instead.

Here is what a URL generated with a GET request may look like:

    http://jedi-master.dev.java.net/servlets/NewsItemView?newsItemID=2359&filter=true

All of the items before the question mark (?) make up the original URL of the request (in this case, http://jedi-master.dev.java.net/servlets/NewsItemView). Everything after it is the set of parameters, or data, that you send along to the server.

Let's take a closer look at that part. Here are the parameters added to that request:

    newsItemID=2359&filter=true

In GET requests, parameters are encoded as name and value pairs. You don't send data values over to the server without telling it specifically what each value is for. The name and value pairs are encoded as:

    name=value

Also, if there is more than one parameter, the pairs are separated using the ampersand symbol (&). So, in this case, the parameter names we are specifying for the server are newsItemID and filter, with the values of 2359 and true, respectively.

POST

The other kind of request method that you are most likely to use is the POST request. These kinds of requests are designed so that the browser can make complex requests of the server. That is, they are designed so that the user, through the browser, can send a lot of data to the server. Complex forms are generally submitted using POST requests, as are simple forms that require the uploading of files to the server.

One apparent difference between the GET and POST methods is the way they send data to the server. As stated before, GET simply appends the data to the URL it sends over. POST, on the other hand, places the data inside the body of the message it sends, which keeps the data out of the address bar (though it does not encrypt it). When the server receives the request and determines that it is a POST request, it looks in the message body for this data.
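To make the difference concrete, here is a minimal sketch in Java that splits the query string from the example URL into name and value pairs, and then lays out the same parameters first as a GET request and then as a POST request. The class and method names (HttpRequestSketch, parseQuery, buildGet, buildPost) are made up for illustration, and the sketch assumes parameter values contain no special characters, so it skips the percent-encoding that real browsers apply.

```java
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper class for illustration; not part of any Java web API.
class HttpRequestSketch {

    /** Splits "a=1&b=2" into name/value pairs, preserving their order. */
    public static Map<String, String> parseQuery(String query) {
        Map<String, String> params = new LinkedHashMap<>();
        if (query == null || query.isEmpty()) {
            return params;
        }
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            params.put(name, value);
        }
        return params;
    }

    /** Joins the pairs back into "name=value&name=value" form (no percent-encoding). */
    public static String encode(Map<String, String> params) {
        StringJoiner joiner = new StringJoiner("&");
        params.forEach((name, value) -> joiner.add(name + "=" + value));
        return joiner.toString();
    }

    /** GET: the parameters ride on the URL in the initial line; there is no body. */
    public static String buildGet(String host, String path, Map<String, String> params) {
        String target = params.isEmpty() ? path : path + "?" + encode(params);
        return "GET " + target + " HTTP/1.1\r\n"   // initial line
             + "Host: " + host + "\r\n"            // header line
             + "\r\n";                             // blank line ends the headers
    }

    /** POST: the same parameters travel in the message body instead. */
    public static String buildPost(String host, String path, Map<String, String> params) {
        String body = encode(params);
        int length = body.getBytes(StandardCharsets.UTF_8).length;
        return "POST " + path + " HTTP/1.1\r\n"
             + "Host: " + host + "\r\n"
             + "Content-Type: application/x-www-form-urlencoded\r\n"
             + "Content-Length: " + length + "\r\n"
             + "\r\n"
             + body;
    }

    public static void main(String[] args) {
        URI uri = URI.create(
            "http://jedi-master.dev.java.net/servlets/NewsItemView?newsItemID=2359&filter=true");
        Map<String, String> params = parseQuery(uri.getRawQuery());
        System.out.println(params);
        System.out.println(buildGet(uri.getHost(), uri.getRawPath(), params));
        System.out.println(buildPost(uri.getHost(), uri.getRawPath(), params));
    }
}
```

Both messages follow the same layout (initial line, header lines, a blank line, optional body); the only thing that moves is where the name and value pairs sit.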
HTTP Response

HTTP responses from the server contain both headers and a message body, like HTTP requests do. They use a different set of headers, though we won't go into too much detail about those here. It is sufficient to say that the initial line of the response states the version of the HTTP protocol that the server is using, and that the headers describe, among other things, the type of content that is carried in the message body. The value for the content type is called the MIME type. This tells the browser whether the message contains HTML, a picture, or some other type of content.

Dynamic Over Static Pages

The kind of content that can be served up by the web server can be either static or dynamic. Static content is content that does not change. This kind of content usually just sits in storage where the server can access it and is brought up on request. When it is sent as a response from the server, it is sent exactly as it was stored on the server. Examples of static content include archived newspaper articles, family pictures from an online photo gallery, or even possibly an online copy of this document!

Dynamic content, on the other hand, changes according to user input. For this type of content, applications on the server work from a kind of template that describes what the document to be sent looks like in general. This template is then filled in according to the parameters sent in by the user and returned to the client.

Suffice it to say, dynamic pages have a lot more flexibility and utility than static pages. Here are a few scenarios where dynamic content is the only thing that will fit the bill:
 • The Web page is based on data submitted by the user. For example, the results pages from search engines are generated this way. Programs that process orders for e-commerce sites do this as well.
 • The data changes frequently. A weather-report or news headlines page might build the page dynamically, perhaps returning a previously built page if it is still up to date.
 • The Web page uses information from corporate databases or other such sources.

It is important to realize, though, that web servers by themselves do not have the capability to serve dynamic content. Web servers need to have access to applications that can build dynamic content. Also, aside from needing separate applications for creating dynamic content, web servers need separate applications that will store pertinent user information (such as data collected from forms). You can't expect to create a form, have the user enter data into it and submit it to the server, and have the server automatically know what to do with that data.

We have now reached the part of our discussion where we can explicitly point out that the creation of these web applications forms the basis of our course. So, how do we go about creating these applications?

In this course, we will be turning primarily to Java-based technologies to create our web applications. More specifically, we will be making extensive use of the APIs provided in the web tier of the J2EE (Java 2 Enterprise Edition) specification.

J2EE Web Tier Overview

The Java 2 Enterprise Edition (J2EE) platform is a platform introduced for the development of enterprise applications in a component-based manner. The application model used by this platform is called a distributed multi-tier application model. The distributed aspect of this model simply means that most applications designed and developed with this platform in mind can have their different components installed on different machines. The multi-tier part means that the applications are designed with multiple degrees of separation between their major components. An example of a multi-tiered application is a web application: the presentation layer (the client browser), the business logic layer (the program that resides on the web server), and the storage layer (the database that handles the application data) are distinctly separated, but all are needed to make up one application for the user.

One of the tiers in the J2EE platform, as previously mentioned, is the Web tier. This tier is described as the layer that interacts with browsers in order to create dynamic content. There are two technologies within this layer: servlets and JavaServer Pages.

Figure 2: The Web Tier in the J2EE Platform (Image from J2EE Tutorial)

Since these will be tackled more intensively later, only a brief description will be given here.

Servlets

Servlet technology is Java's primary answer for adding functionality to servers that use a request-response model. Servlets have the ability to read data contained in the requests passed to the server and generate a dynamic response based on that data.

Servlets are not necessarily limited to HTTP-based situations; as stated before, they are applicable to any scenario requiring the request-response model. HTTP-based situations are currently their primary use, so Java provides an HTTP-specific version that implements HTTP-specific features.
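The idea of reading request data and generating a response from it can be sketched without the servlet API at all. The example below is only a stand-in: it uses the small HTTP server that ships with the JDK (com.sun.net.httpserver), not a servlet or a J2EE container, and the class name DynamicGreeting, the /hello path, and the name parameter are all made up for illustration. Real servlets are covered later in the course.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URI;
import java.nio.charset.StandardCharsets;

// Stand-in for illustration only: JDK built-in HTTP server, NOT the servlet API.
class DynamicGreeting {

    /** Builds the page from the request's query string; this is the dynamic part. */
    public static String render(String query) {
        String name = "stranger";
        if (query != null) {
            for (String pair : query.split("&")) {
                // Percent-decoding is skipped to keep the sketch short.
                if (pair.startsWith("name=") && pair.length() > 5) {
                    name = pair.substring(5);
                }
            }
        }
        return "<html><body><h1>Hello, " + escape(name) + "!</h1></body></html>";
    }

    /** Minimal HTML escaping so user input cannot inject markup. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) throws IOException {
        // Port 0 asks the operating system for any free port.
        HttpServer server = HttpServer.create(new InetSocketAddress(0), 0);
        server.createContext("/hello", exchange -> {
            byte[] body = render(exchange.getRequestURI().getRawQuery())
                    .getBytes(StandardCharsets.UTF_8);
            // The Content-Type header carries the MIME type of the body.
            exchange.getResponseHeaders().set("Content-Type", "text/html; charset=utf-8");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        int port = server.getAddress().getPort();
        try (InputStream in = URI.create("http://localhost:" + port + "/hello?name=Ana")
                .toURL().openStream()) {
            // Act as the client once, print what the server sent back, then shut down.
            System.out.println(new String(in.readAllBytes(), StandardCharsets.UTF_8));
        } finally {
            server.stop(0);
        }
    }
}
```

The same request sent with a different name parameter yields a different page, which is the behavior a servlet provides inside a container.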
JavaServer Pages

One of the disadvantages of using servlets to generate a response to the client is formatting the HTML to be sent back. Since servlets are simply Java language classes, they produce output the way other Java programs would: by printing characters as Strings into the output stream, in this case the HTTP response. However, HTML can be quite complex, and it can be very hard to encode HTML through the use of String literals. Also, engaging the services of a dedicated graphics and web page designer to help with the static parts of the pages is hard, if not impossible, since we would be expecting the designer to have at least a minimum knowledge of Java.

This is where JavaServer Pages (JSP) technology comes in. A JSP looks just like HTML, only it has access to all the dynamic capabilities of servlets through the use of scripts and expression languages. Since it looks just like HTML, designers can concentrate on simple HTML design and simply leave placeholders for developers to fill with dynamic content.

Containers

Central to the concept of any J2EE application is the container. All J2EE components, including web components (servlets, JSPs), rely on the existence of a container; without the appropriate container, they would not run.

Perhaps another way to explain this would be to think of the normal mode of execution of Java programs. Java programs, in order to be run, must have a main method defined; this marks the start of program execution and is the method performed when the program is executed from the command line.

Figure 3: Containers in the J2EE Platform (Image from J2EE Tutorial)

But, as we will see later, servlets do not have a main method defined. And even if one is defined (which is poor design), it does not mark the start of program execution. When a user makes an HTTP request for a servlet, its methods are not called directly. Instead, the server hands the request not to the servlet, but to the container in which the servlet is deployed. The container is then the one responsible for calling the appropriate method in the servlet, depending on the type of user request.

Features provided by the container:
 • Communications support. The container handles all of the code necessary for your servlet to communicate with the web server. Without the container, developers would need to write code that creates a socket connection from the server to the servlet (and vice versa) and manages how they talk to each other, every single time.
 • Lifecycle management. The container handles everything in the life of your servlet, from class loading, instantiation, and initialization to its eventual garbage collection.
 • Multi-threading support. The container takes on the duty of running each call to a servlet on its own thread. NOTE: The container is NOT responsible for the thread safety of your servlet.
 • Declarative security. A container supports the use of an XML configuration file that can handle security for your web application without needing to hard-code any of it into your servlets.
 • JSP support. JSP pages, in order to work, must be translated into Java code and compiled. The container manages the task of translating your JSP pages into Java code, compiling it, and calling the appropriate methods in that code.

Basic Structure of a Java Web App

For a container to recognize your application as a valid web application, it must conform to a specific directory structure:

    Document Root    Contains HTML, images, other static content, plus JSPs
      META-INF       Contains meta-information about your application (optional)
      WEB-INF        All contents of this folder cannot be seen from the web browser
        classes      Contains class files of Java classes created for this application (optional)
        lib          Contains JAR files of any third-party libraries used by your app (optional)
        web.xml      XML file storing the configuration entries for your application

    Figure 4: Directory Structure of Java Web Application

The illustration above shows the directory structure required by the container to recognize your application.

Some points regarding this structure:

One: The top-level folder (the one containing your application) does NOT need to be named Document Root. It can, in fact, be named any way that you like, though it is highly recommended that the top-level folder have the same name as your application. It is only named Document Root in the figure to indicate that it serves as the root folder of the files or documents in your application.

Two: Any other folder can be created within this directory structure. For example, developers wishing to organize their content can create an images folder within the document root to hold all their graphics files, or a config directory inside the WEB-INF folder to hold additional configuration information. As long as the prescribed structure shown above is followed, the container will allow additions.

Three: The capitalization of the WEB-INF folder is intentional. The lowercase on classes and lib is intentional as well. Not following the capitalization on any of these folders will result in your application not being able to see the contents of these folders.

Four: None of the contents of the WEB-INF folder can be seen from the browser. The container automatically manages things such that, for the browser, this folder does not exist. This mechanism protects your sensitive resources, such as Java class files and application configuration. The contents of this folder can only be accessed by your application.

Five: There MUST be a file named web.xml inside the WEB-INF folder. Even if, for example, your web application contains only static content and does not make use of Java classes or library files, the container will still require your application to have these two items: the WEB-INF folder and the web.xml file inside it.

Exercise

Answer the following questions:
1. What kind of architecture does a web application make use of? Who are the participants in such an architecture, and what are their roles?
2. What markup language is used to instruct the browser on how to present content to the user?
3. HTTP is a (stateful | stateless) connection protocol. (Underline the best answer.)
4. The two most used HTTP request methods are GET and POST. How are they different? When is it better to use one over the other?
5. How are request parameters sent to the server using the GET method?
6. What component is absolutely necessary to be able to run web applications?
7. What are the non-optional elements of a web application's directory structure?
8. What is the name of the XML file used for configuring the web application? In what directory can it be found?
9. Which folder contains the JAR files of the software libraries used by your application?
10. What folder will contain the class files of the Java code used by the application?