Web Information Systems (WE-DINF-11912): Lecture 02 - Web Architectures

    Web Information Systems (WE-DINF-11912): Lecture 02 - Web Architectures - Presentation Transcript

    1. Web Information Systems
       Web Architectures
       Prof. Beat Signer
       Department of Computer Science
       Vrije Universiteit Brussel
       http://vub.academia.edu/BeatSigner
       2 December 2005
    2. Web Information Systems  A web information system uses web technologies for information and service delivery  Modern web information systems and web architectures have to  be extensible to cater for emerging technolgies and new forms of interaction (e.g. multimodal interaction)  manage heterogenous information such as documents, structured data, multimedia resources, semi-structured information, ...  integrate various sources (e.g. DBs) via multi-tier architectures  offer a notion of state to reflect the current application context  deal with information about users and their environment (context)  ... October 1, 2009 Beat Signer - Department of Computer Science - bsigner@vub.ac.be 2
    3. Basic Client-Server Web Architecture
       [Figure: a client sends an HTTP request over the Internet to a server, which returns an HTTP response]
       - Effect of typing http://www.vub.ac.be in the browser's address bar
         (1) use the Domain Name System (DNS) to get the IP address for www.vub.ac.be (answer: 134.184.129.2)
         (2) create a TCP connection to 134.184.129.2
         (3) send an HTTP request message over the TCP connection
         (4) visualise the received HTTP response message in the browser
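
The four steps can be traced in a minimal Java sketch that uses only the standard library: `InetAddress` for the DNS lookup and a plain `Socket` for the TCP connection. The class name `BrowserSteps` is illustrative, the `Connection: close` header is an added assumption that keeps the exchange short, and step (4) is reduced to printing the status line. Today the server may well answer with a redirect rather than 200 OK.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

class BrowserSteps {
    // Step (3): a minimal HTTP/1.1 GET request; every line ends with CRLF
    // and an empty line terminates the header section
    static String buildGetRequest(String host, String path) {
        return "GET " + path + " HTTP/1.1\r\n"
             + "Host: " + host + "\r\n"
             + "Connection: close\r\n"
             + "\r\n";
    }

    static String fetchStatusLine(String host, String path) throws IOException {
        // Step (1): DNS lookup of the host name
        InetAddress address = InetAddress.getByName(host);
        // Step (2): TCP connection to the default HTTP port 80
        try (Socket socket = new Socket(address, 80)) {
            OutputStream out = socket.getOutputStream();
            out.write(buildGetRequest(host, path).getBytes(StandardCharsets.US_ASCII));
            out.flush();
            // Step (4) would render the response; here we only read its start line
            BufferedReader in = new BufferedReader(
                new InputStreamReader(socket.getInputStream(), StandardCharsets.ISO_8859_1));
            return in.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(fetchStatusLine("www.vub.ac.be", "/"));
    }
}
```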
    4. Web Servers  Tasks of a web server (1) setup connection (2) receive and process HTTP request (3) fetch resource (4) create and send HTTP response Worldwide Web Servers, http://news.netcraft.com (5) logging  The most prominent web servers are the Apache HTTP Server and Microsoft's Internet Information Services (IIS)  A lot of devices have an embedded web server  printers, WLAN routers, TVs, ... October 1, 2009 Beat Signer - Department of Computer Science - bsigner@vub.ac.be 4
    5. Example HTTP Request Message
         GET / HTTP/1.1
         Host: www.vub.ac.be
         User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-GB; rv:1.9.1.3) Gecko/20090824 Firefox/3.5.3
         Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
         Accept-Language: en-gb,en;q=0.5
         Accept-Encoding: gzip,deflate
         Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
         Keep-Alive: 300
         Connection: keep-alive
    6. Example HTTP Response Message
         HTTP/1.1 200 OK
         Date: Wed, 30 Sep 2009 13:01:41 GMT
         Server: Apache/1.3.33 (Unix) PHP/5.2.8
         X-Powered-By: PHP/5.2.8
         Keep-Alive: timeout=15, max=1000
         Connection: Keep-Alive
         Transfer-Encoding: chunked
         Content-Type: text/html

         <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
         <html xmlns="http://www.w3.org/1999/xhtml"><!-- InstanceBegin template="/Templates/main.dwt" codeOutsideHTMLIsLocked="false" -->
         <head>
         <title>Vrije Universiteit Brussel</title>
         <link rel="shortcut icon" href="/favicon.ico" type="image/x-icon" />
         ...
         </html>
    7. HTTP Protocol  Request/response communication model  HTTP Request  HTTP Response  Communication always has to be initiated by the client  Stateless protocol  HTTP can be used on top of various reliable protocols  TCP is by far the most commonly used one  runs on TCP port 80 by default  Latest version: HTTP/1.1  Use HTTPS scheme for encrypted connections October 1, 2009 Beat Signer - Department of Computer Science - bsigner@vub.ac.be 7
    8. Uniform Resource Identifier (URI)  A Uniform Resources Identifier (URI) uniquely identifies a resource.  There are two types of URIs  Uniform Resource Locator (URL) - contains information about the exact location of a resource - consists of a scheme, a host and the path (resource name) - e.g. http://wise.vub.ac.be/members/beat/ - problem: URL changes if resource is moved! • idea of Persistent Uniform Resource Locators (PURLs) [http://purl.oclc.org]  Uniform Resource Name (URN) - unique and location independent name for a resource - consists of a scheme name, a namespace identifier and a namespace-specific string (separated by colons) - e.g. urn:ISBN:3837027139 October 1, 2009 Beat Signer - Department of Computer Science - bsigner@vub.ac.be 8
    9. HTTP Message Format
         HTTP/1.1 200 OK                              <- start line
         Date: Wed, 30 Sep 2009 13:01:41 GMT          <- header field(s)
         Server: Apache/1.3.33 (Unix) PHP/5.2.8
         X-Powered-By: PHP/5.2.8
         Transfer-Encoding: chunked
         Content-Type: text/html
                                                      <- blank line (CRLF)
         <html> ... </html>                           <- message body (optional)
       - Request and response messages have the same format
         HTTP_message = start_line , {header} , "CRLF" , [body];
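
Since requests and responses share this format, one small parser can split either kind of message at the first blank line (CRLF CRLF). This is a simplified sketch with a hypothetical class `HttpMessage`: header names are lower-cased because HTTP field names are case-insensitive, repeated fields simply overwrite each other, and chunked bodies are not decoded.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

class HttpMessage {
    final String startLine;
    final Map<String, String> headers = new LinkedHashMap<>();
    final String body;

    HttpMessage(String raw) {
        // The first empty line separates the header section from the body
        int split = raw.indexOf("\r\n\r\n");
        if (split < 0) throw new IllegalArgumentException("missing blank line");
        String[] lines = raw.substring(0, split).split("\r\n");
        startLine = lines[0];
        for (int i = 1; i < lines.length; i++) {
            int colon = lines[i].indexOf(':');
            if (colon < 0) throw new IllegalArgumentException("bad header: " + lines[i]);
            // Field names are case-insensitive, so normalise them to lower case
            headers.put(lines[i].substring(0, colon).trim().toLowerCase(Locale.ROOT),
                        lines[i].substring(colon + 1).trim());
        }
        body = raw.substring(split + 4); // may be empty: the body is optional
    }
}
```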
    10. HTTP Request Message  Request-specific start line start_line = method, " " , resource , " " , version; method = "GET" , "HEAD" , "POST" , "PUT" , "TRACE" , "OPTIONS" , "DELETE"; resource = complete_URL | path; version = "HTTP/" , major_version, "." , minor_version;  Methods  GET : get a resource from the server  HEAD : get the header only (no body)  POST : send data (in the body) to the server  PUT : store request body on server  TRACE : get the "final" request (after it has potentially been modified by proxies)  OPTIONS : get a list of methods supported by the server  DELETE: delete a resource on the server October 1, 2009 Beat Signer - Department of Computer Science - bsigner@vub.ac.be 10
    11. HTTP Response Message  Response-specific start line start_line = version , status_code , reason; version = "HTTP/" , major_version, "." , minor_version; status_code = digit , digit , digit; reason = string_phrase;  Status codes  100-199 : informational  200-299 : success (e.g. 200 for 'OK')  300-399 : redirection  400-499 : client error (e.g. 404 for 'Not Found')  500-599 : server error (e.g. 503 for 'Service Unavailable') October 1, 2009 Beat Signer - Department of Computer Science - bsigner@vub.ac.be 11
    12. HTTP Header Fields  There exist general headers (for requests and responses), request headers, response headers, entity headers and extension headers  Some important headers  Accept - request header definining the Multipurpose Internet Mail Extensions (MIME) that the client will accept  User-Agent - request header specifying the type of client  Keep-Alive (HTTP/1.0) and Persistent (HTTP/1.1) - general header helping to improve the performance since otherwise a new HTTP connection has to be established for every single webpage element  Content-Type - entity header specifing the body's MIME type October 1, 2009 Beat Signer - Department of Computer Science - bsigner@vub.ac.be 12
    13. HTTP Header Fields  Some important headers ...  If-Modified-Since - request header that is used in combination with a GET request (conditional GET); the resource is only returned if it has been modified since the specified date October 1, 2009 Beat Signer - Department of Computer Science - bsigner@vub.ac.be 13
    14. MIME Types  The MIME type defines the request or response body's content and is used for the appropiate processing mime = toplevel_type , "/" , subtype;  Standard MIME types are registered with the Internet Assigned Numbers Authority (IANA) [RFC-2045] MIME Type Description text/plain Human-readable text without formatting information text/html HTML document image/jpeg JPEG-encoded image ... ... October 1, 2009 Beat Signer - Department of Computer Science - bsigner@vub.ac.be 14
    15. HTTP Message Information  Various tools for HTTP message logging  e.g. HttpFox add-on for Firefox browser  Simple telnet connection telnet wise.vub.ac.be 80 (press Enter) GET /members/beat/ HTTP/1.1 Host: wise.vub.ac.be (press Enter 2 times)  Until 1999 the W3C has been working on the HTTP Next Generation (HTTP-NG) protocol as a replacement for HTTP/1.1  never introduced October 1, 2009 Beat Signer - Department of Computer Science - bsigner@vub.ac.be 15
    16. Proxies
       [Figure: a proxy placed between the client and the server on the Internet]
       - Web proxy between client and server
         - acts as a server to the client and as a client to the server; may, for example, be specified in the browser settings
         - used for
           - firewalls and content filters
           - transcoding (on-the-fly transformation of the HTTP message body)
           - content routers (e.g. selecting the optimal server in content distribution networks)
           - anonymous browsing, ...
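
One concrete proxy task: a client configured to use a proxy sends the complete URL in the request line (the `complete_URL` alternative of the request grammar), and the proxy, now acting as a client, can forward the request to the origin server with just the path plus a `Host` header. A minimal sketch; `ProxyForwarding` is a hypothetical name and copying of the remaining header fields is omitted.

```java
import java.net.URI;

class ProxyForwarding {
    // "GET http://host/path HTTP/1.1" -> "GET /path HTTP/1.1" + "Host: host"
    static String toOriginRequest(String requestLine) {
        String[] parts = requestLine.split(" ");
        if (parts.length != 3) throw new IllegalArgumentException("bad request line: " + requestLine);
        URI target = URI.create(parts[1]);
        if (target.getHost() == null) {
            throw new IllegalArgumentException("expected a complete URL: " + parts[1]);
        }
        String raw = target.getRawPath();
        String path = (raw == null || raw.isEmpty()) ? "/" : raw;
        if (target.getRawQuery() != null) path += "?" + target.getRawQuery();
        // Keep an explicit port in the Host header
        String host = target.getPort() == -1 ? target.getHost() : target.getHost() + ":" + target.getPort();
        return parts[0] + " " + path + " " + parts[2] + "\r\n" + "Host: " + host + "\r\n";
    }
}
```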
    17. Caches
       [Figure: Client 1 and Client 2 accessing a server over the Internet via a shared proxy cache]
       - A proxy cache is a special type of proxy server
         - can reduce server load if multiple clients share the same cache
         - often multi-level hierarchies of caches (e.g. continent, country and regional level) with communication between sibling and parent caches as defined by the Internet Cache Protocol (ICP)
         - passive and active (prefetching) caches
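
The effect of a shared proxy cache on server load can be sketched with a least-recently-used map: the first client's request is a miss that reaches the origin server, while a second client's request for the same URL is a hit. `ProxyCache` is hypothetical and deliberately ignores freshness and validation headers; the `origin` function stands in for the real HTTP fetch.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

class ProxyCache {
    private final Map<String, String> entries;
    private int hits, misses;

    ProxyCache(int capacity) {
        // Access-ordered LinkedHashMap that evicts the least recently used entry
        this.entries = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > capacity;
            }
        };
    }

    // Serves from the cache if possible, otherwise fetches from the origin server
    synchronized String get(String url, Function<String, String> origin) {
        String cached = entries.get(url);
        if (cached != null) {
            hits++;
            return cached;
        }
        misses++;
        String fresh = origin.apply(url);
        entries.put(url, fresh);
        return fresh;
    }

    synchronized double hitRate() {
        int total = hits + misses;
        return total == 0 ? 0.0 : (double) hits / total;
    }
}
```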
    18. Caches ...  Special HTTP cache control header fields  Expires - expiration date after which the cached resource has to be refetched  Cache-Control: max-age - maximum age of a document (in s) after it has been added to the cache  Cache-Control: no-cache - response cannot be directly served from the cache (has to be revalidated first)  ...  Validators  Last-modified time as validator - cache with resource that has been last modified at time t uses an If-Modified-Since t request for updates  Entity tags (ETag) - changed by the publisher if content has changed; If-None-Match etag request October 1, 2009 Beat Signer - Department of Computer Science - bsigner@vub.ac.be 18
    19. Caches ...  Advantages  reduces latency and used network bandwidth  reduces server load (client and reverse proxy caches)  transparent to client and server  Disadvantages  additional resources (hardware) required  might get stale data out of the cache  creates additional network traffic if we use an aggressive caching approach (prefetching) but achieve a low cache hit rate  server loses control (e.g. access statistics) since no longer all requests have to be sent to the server October 1, 2009 Beat Signer - Department of Computer Science - bsigner@vub.ac.be 19
    20. Tunnels
       [Figure: an SSL client and an SSL server communicating over the Internet, with the SSL connection carried inside HTTP (HTTP[SSL])]
       - Implement one protocol on top of another protocol
         - e.g. HTTP as a carrier for SSL connections
       - Often used to "open" a firewall to protocols that would otherwise be blocked
         - e.g. tunneling of SSL connections through an open HTTP port
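
In HTTP/1.1 a client asks a proxy to open such a tunnel with the CONNECT method (reserved for this purpose in RFC 2616, section 9.9): after a 2xx response, the proxy relays the encrypted SSL/TLS bytes in both directions without inspecting them. The sketch covers only the two HTTP-level steps; `HttpTunnel` is an illustrative name and the actual socket relay is left out.

```java
class HttpTunnel {
    // Request sent to the proxy: open a raw TCP tunnel to host:port
    static String connectRequest(String host, int port) {
        return "CONNECT " + host + ":" + port + " HTTP/1.1\r\n"
             + "Host: " + host + ":" + port + "\r\n"
             + "\r\n";
    }

    // Any 2xx status from the proxy means the tunnel is open; from then on
    // the client starts the SSL/TLS handshake over the same connection
    static boolean tunnelEstablished(String statusLine) {
        String[] parts = statusLine.split(" ");
        return parts.length >= 2 && parts[1].length() == 3 && parts[1].startsWith("2");
    }
}
```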
    21. Gateways
       [Figure: an HTTP client talks HTTP to a gateway, which talks FTP to an FTP server]
       - A gateway can act as a kind of "glue" between applications (clients) and resources (servers)
         - translate between two protocols (e.g. from HTTP to FTP)
         - security accelerator (e.g. HTTPS/HTTP on the server side)
         - often the gateway and the destination server are combined in a single application server (HTTP to server application translator)
    22. Session Management  HTTP is a stateles protocol  Session (state) tracking solutions  use of IP address - problem: IP address is often not uniquely assigned to a single user  browser login - use of special HTTP authenticate headers - after a login the browser sends the user information in each request  URL rewriting - add information to URL in each request  hidden form fields - similar to URL rewriting but information can also be in body (POST request)  cookies - the server stores a piece of information on the client which is then sent back to the server with each request October 1, 2009 Beat Signer - Department of Computer Science - bsigner@vub.ac.be 22
    23. Cookies  Introduced by Netscape in June 1994  A cookie is a piece of information that is assigned to a client on their first visit  list of <key,value> pairs  often just a unique identifier  sent via Set-Cookie or Set-Cookie2 HTTP response headers  The browser stores the info in a "cookie database" and sends it back every time the same server is accessed  Potential privacy issues  third-party websites might use persistent cookies for user tracking  Cookies can be disabled in the browser settings October 1, 2009 Beat Signer - Department of Computer Science - bsigner@vub.ac.be 23
    24. Hypertext Markup Language (HTML)
         <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
         <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
           <head>
             <title>Beat Signer: Interactive Paper, PaperWorks, Paper++, ...</title>
           </head>
           <body>
             Beat Signer is Assistant Professor of Computer Science at the VUB ...
           </body>
         </html>
       - Dominant markup language for webpages
       - If you have never heard of HTML, have a look at
         - http://www.w3schools.com/html/
       - More details in the exercise and in the next lecture
    25. Dynamic Web Content  Often it is not enough to serve static web pages and content should be changed on the client or server side  Server-side processing  Common Gateway Interface (CGI)  Java Servlets  JavaServer Pages (JSP)  PHP: Hypertext Preprocessor (PHP)  ...  Client-side processing  JavaScript  Java Applets  Adobe Flash  ... October 1, 2009 Beat Signer - Department of Computer Science - bsigner@vub.ac.be 25
    26. Common Gateway Interface (CGI)
       [Figure: the client sends an HTTP request over the Internet; the server passes it via CGI to a program (Perl, Tcl, C, C++, Java, ...) and returns HTML pages in the HTTP response]
       - CGI was the first server-side processing solution
         - transparent to the user
         - certain requests (e.g. /account.pl) are forwarded over CGI to a program by creating a new process
         - the program processes the request and creates an answer and optional HTTP response headers
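
A CGI program receives request data from the web server in environment variables such as `REQUEST_METHOD` and `QUERY_STRING`, and writes its response header lines, a blank line and the body to standard output (RFC 3875). A minimal sketch in Java; the class `HelloCgi` and the `name` query parameter are hypothetical, and the wrapper script that starts a JVM for every request is not shown.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

class HelloCgi {
    // Minimal HTML escaping so that query parameters cannot inject markup
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Builds the CGI output: header line(s), a blank line, then the body
    static String respond(String method, String queryString) {
        String name = "World";
        if (queryString != null) {
            for (String pair : queryString.split("&")) {
                String[] kv = pair.split("=", 2);
                if (kv.length == 2 && kv[0].equals("name")) {
                    name = URLDecoder.decode(kv[1], StandardCharsets.UTF_8);
                }
            }
        }
        return "Content-Type: text/html\n\n"
             + "<html><body>Hello " + escape(name) + " (" + method + ")</body></html>\n";
    }

    public static void main(String[] args) {
        // The web server passes request data to the new process via environment variables
        System.out.print(respond(System.getenv("REQUEST_METHOD"), System.getenv("QUERY_STRING")));
    }
}
```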
    27. Common Gateway Interface (CGI) ...  CGI Problems  a new process has to be started for each request  if the CGI program for example acts as a gateway to a database, a new DB connection has to be established for each request which results in a very poor performance  FastCGI solves some of the problems by introducing persistent processes and process pools  CGI/FastCGI becomes more and more replaced by other technologies (e.g. Java Servlets) October 1, 2009 Beat Signer - Department of Computer Science - bsigner@vub.ac.be 27
    28. Java Servlets
       [Figure: the client sends an HTTP request over the Internet; a servlet container on the server runs the servlets that produce the HTML pages returned in the HTTP response]
       - A Java servlet is a Java class that has to extend the abstract HttpServlet class
       - The Java servlet class is loaded by a servlet container, and relevant requests (based on the servlet binding) are forwarded to the servlet instance for further processing
    29. Java Servlets ...  Main HttpServlet methods doGet(HttpServletRequest req, HttpServletResponse resp) doPost(HttpServletRequest req, HttpServletResponse resp) init(ServletConfig config) destroy()  Servlet life cycle  a servlet is initialised once via the init() method  the doGet(), doPost() methods may be executed multiple times (by different HTTP requests)  finally the servlet container may unload a servlet (upcall of the destroy() method before that happens)  Servlet container (e.g. Apache Tomcat) integrated with Web Server or as standalone component October 1, 2009 Beat Signer - Department of Computer Science - bsigner@vub.ac.be 29
    30. Java Servlet Example
         package org.vub.wise;

         import java.io.*;
         import java.util.Date;
         import javax.servlet.*;
         import javax.servlet.http.*;

         public class HelloWorldServlet extends HttpServlet {
           // Called by the servlet container for every HTTP GET request bound to this servlet
           public void doGet(HttpServletRequest req, HttpServletResponse res)
               throws ServletException, IOException {
             res.setContentType("text/html"); // set the MIME type before writing the body
             PrintWriter out = res.getWriter();
             out.println("<html>");
             out.println("<head><title>Hello World</title></head>");
             out.println("<body>The time is " + new Date().toString() + "</body>");
             out.println("</html>");
             out.close();
           }
         }
       - In the exercise you will learn how to process parameters etc.
    31. JavaServer Pages (JSP)  A "drawback" of Java Servlets is that the whole page (HTML) has to be defined within the servlet  not easy to share tasks between web designer and programmer  Add program code through scriptlets and markup to existing HTML pages  These JSP documents are then either interpreted on the fly (Apache Tomcat) or compiled into Java Servlets  The JSP approach is similar to PHP or Active Server Pages (ASP)  Note that Java Servlets become more and more and enabling technology (as with JSP) October 1, 2009 Beat Signer - Department of Computer Science - bsigner@vub.ac.be 31
    32. JavaScript  Interpreted scripting language for client-side processing  JavaScript functionality often embedded in HTML documents but can also be provided in separate files  JavaScript often used to  validate data (e.g. in a form)  dynamically add content to a webpage  process events (onLoad, onFocus, etc.)  change parts of the original HTML document  create cookies  ...  Note: Java and JavaScript are completely different languages! October 1, 2009 Beat Signer - Department of Computer Science - bsigner@vub.ac.be 32
    33. JavaScript Example
         <html>
           <body>
             <script type="text/javascript">
               document.write("<h1>Hello World!</h1>");
             </script>
           </body>
         </html>
       - Please have a look at the following JavaScript tutorial to learn some of the basic constructs (operators, control statements, etc.)
         - http://www.w3schools.com/JS/
       - In the exercise session you will use JavaScript to implement a web application
    34. Java Applets  A Java applet is a program delivered to the client-side in the form of Java bytecode  executed in the browser using a Java Virtual Machine (JVM)  an applet has to extend the Applet or JApplet class  runs in the sandbox  Advantages  the user automatically has always the most recent version  high security for untrusted applets  full Java API available  Disadvantages  requires a browser Java plug-in October 1, 2009 Beat Signer - Department of Computer Science - bsigner@vub.ac.be 34
    35. Java Applets ...  Disadvantages ...  Only signed applets can get more advanced functionality - e.g. network connections to other machines than the source machine  More recently Java Web Start (JavaWS) is replacing Java Applets  program no longer runs within the browser - less problematic security restrictions - less browser compatibility issues  Java Chess Applet Example  http://english.op.org/~peter/ChessApp/ October 1, 2009 Beat Signer - Department of Computer Science - bsigner@vub.ac.be 35
    36. Exercise 2  Hands-on experience with various web technologies  HTTP  Java Servlets  JavaServer Pages  Apache Tomcat  HTML  JavaScript  ... October 1, 2009 Beat Signer - Department of Computer Science - bsigner@vub.ac.be 36
    37. References  David Gourley et al., HTTP: The Definitive Guide, O'Reilly Media, September 2002  R. Fielding et al., RFC2616 - Hypertext Transfer Protocol - HTTP/1.1  http://www.faqs.org/rfcs/rfc2616.html  N. Freed et al., RFC2045 - Multipurpose Internet Mail Extensions (MIME)  http://www.faqs.org/rfcs/rfc2045.html  HTML and JavaScript Tutorials  http://www.w3schools.com October 1, 2009 Beat Signer - Department of Computer Science - bsigner@vub.ac.be 37
    38. References ...  Java Servlet Tutorial  http://java.sun.com/j2ee/tutorial/1_3-fcs/doc/Servlets.html October 1, 2009 Beat Signer - Department of Computer Science - bsigner@vub.ac.be 38
    39. Next Week
       Markup Languages
       2 December 2005
