application layer!?
Presentation Transcript

  • INFO 330 Computer Networking Technology I Chapter 2 The Application Layer Glenn Booker
  • Application Layer
    • The Application Layer is the reason the rest of the network exists – to serve applications
    • Most of the software familiar to end users consists of applications
      • Email, FTP, newsgroups, chat, the Web, streaming video, video conferencing, etc.
    • We focus first on key concepts related to the Application Layer, then discuss some specific applications in detail
  • Application Layer
    • New applications designed for network implementation need to decide whether the application is based on
      • Client-server architecture
      • Peer to peer (P2P)
      • Or some hybrid combination of the two
  • Client-server Architecture
    • In client-server architecture, the server
      • Handles requests from many clients, and
      • Is generally always available
      • Often has a fixed IP address
    • Clients generally don’t communicate with each other, and may be on or off independently of each other and the server
      • Client-server applications include email, FTP, the Web, remote login
  • Architecture
    • Multiple servers may be needed to keep up with the volume of client requests, hence the need for a server farm
    • P2P architecture assumes the clients are on or off at will, and all are treated equally as potential servers and/or clients
      • Apps include Gnutella, Morpheus, BitTorrent, Kazaa and more
  • P2P Architecture
    • P2P architecture is inherently self-scalable – millions of computers may participate, because each computer adds capacity at the same time it adds possible workload
    • Managing contents of a P2P application can be difficult – only one computer may have a particular file, and there’s no control over when that computer is available
  • Hybrid Architecture
    • Combinations of client-server and P2P exist
      • Napster is the best known for file sharing
        • Obtains file location and description information from a P2P network, but maintains that information on a central server farm
      • Instant messaging (IM) is also hybrid
        • Chats are all P2P, but logging into the system is centralized
        • Includes ICQ, AOL IM, MSN Messenger, and more
  • Process Communication
    • Any network application (no matter which architecture) needs to communicate between hosts using processes
      • In this sense, a process is a program running on a client, server, or peer host
      • Processes may communicate with other processes on the same host; this is controlled by the host’s operating system (OS)
      • We are interested in processes that communicate between hosts
  • Process Communication
    • Processes exchange messages
      • The sending or client process creates a message and sends it into the network
      • The receiving or server process gets the message from the network and might reply
    • Notice that the client and server process roles here refer only to who sends and who receives a message, not to the client-server or other architectures mentioned earlier
  • Sockets
    • A socket is the doorway through which the process sends a message to the network
    • The message goes through a socket on the client process, passes through the network, then enters the server process through another socket
    • A socket bridges the application and transport layers within each host
  • Sockets
    • [Figure] Could be UDP on both ends
  • Sockets
    • A socket is the Application Programming Interface (API) between application and the network
      • The API is all the developer sees of the network connection
        • The developer can choose to use TCP or UDP, and maybe tweak a few transport layer parameters
      • For example, Winsock is the Microsoft socket API
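
The socket idea above can be sketched with Python's standard `socket` module. This is a minimal illustration, not part of the original slides: both "hosts" are the local machine, the port is chosen by the OS, and the function name `udp_round_trip` is made up for the example. It uses UDP; a TCP version would use `SOCK_STREAM` and an explicit connection.

```python
import socket


def udp_round_trip(payload: bytes) -> bytes:
    """Send payload from a client socket to a server socket and return the reply."""
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        server.bind(("127.0.0.1", 0))        # port 0: let the OS pick a free port
        server.settimeout(2.0)
        client.settimeout(2.0)
        server_address = server.getsockname()  # (IP address, port number)

        client.sendto(payload, server_address)        # client -> network
        data, client_address = server.recvfrom(2048)  # network -> server
        server.sendto(data.upper(), client_address)   # server replies
        reply, _ = client.recvfrom(2048)
        return reply
    finally:
        client.close()
        server.close()


if __name__ == "__main__":
    print(udp_round_trip(b"hello"))
```

The application code only touches the socket calls; everything below the transport layer stays hidden, which is the point of the API.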
  • Addressing Processes
    • For the server process to get the message, it has to be addressed correctly
    • The host address and receiving process are the key parts of the address
      • The host address is its IP address (the 32- or 128-bit address of the host’s network interface)
      • The receiving process is identified by its port number, since many processes can be running at once
  • Addressing Processes
    • Sockets send packets
    • Ports listen for them
  • Port Number
    • Port numbers follow default values, set by the IANA, unless specified otherwise
      • 21 = FTP
      • 23 = Telnet
      • 25 = SMTP
      • 53 = DNS
      • 80 = HTTP, http://mine.com implies http://mine.com:80
      • 110 = POP3
      • 194 = IRC, and hundreds more
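
A small sketch of how a default port is implied when a URL does not name one. The table repeats the defaults listed above; the function name `effective_port` and the scheme labels used as dictionary keys are assumptions made for this example.

```python
from urllib.parse import urlsplit

# Default ports from the list above (the keys are labels chosen for this sketch).
DEFAULT_PORTS = {
    "ftp": 21,
    "telnet": 23,
    "smtp": 25,
    "dns": 53,
    "http": 80,
    "pop3": 110,
    "irc": 194,
}


def effective_port(url: str) -> int:
    """Return the explicit port in the URL, or the default for its scheme."""
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port
    return DEFAULT_PORTS[parts.scheme]


if __name__ == "__main__":
    print(effective_port("http://mine.com"))       # default for http
    print(effective_port("http://mine.com:8080"))  # explicit port wins
```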
  • More Protocols
    • Application-layer protocols define how a particular application’s processes are structured
      • What types of messages are allowed
      • The syntax of those messages
      • The meaning of the fields in the syntax
      • Rules for processing messages – when and how to send messages, how to reply, etc.
  • Application vs its protocols
    • A single application often needs to use several application-layer protocols
      • A web browser might use HTTP, but also FTP, telnet, gopher, etc.
      • An email application might use POP3, SMTP, IMAP, etc.
    • Many app protocols are defined in RFCs
      • But some application-layer protocols are proprietary
  • RFC Summary
    • For an RFC which lists the current RFC standards, look in the RFC Index for “Internet Official Protocol Standards”
      • The current one is RFC 5000, dated May 2008
  • Application Services
    • The transport layer connects the application layer to everything else
    • Have a choice of two protocols, TCP and UDP, unless you want to write your own!
    • Key services include
      • Reliable data transfer – how important is it? Or is your app loss-tolerant?
  • Application Services
    • How much bandwidth or throughput does your app need?
      • Does sending rate have to equal receiving rate?
      • Some apps are elastic – can tolerate wide ranges of available bandwidth
    • How sensitive is your app to timing?
      • Games and telephony tend to be sensitive to slow or erratic transmission delays
    • How important is security?
  • TCP Services
    • TCP provides a connection-oriented service, where the sockets of the client and server recognize a connection for the duration of the session
      • Connection is duplex – messages can go both ways at once
      • TCP is highly reliable – the bits leaving one side all get to the other side, and get put back in the original order
  • TCP Services
    • TCP also provides congestion control, for benefit of the Internet
      • This throttles the sending processes when the connection is congested, and can limit bandwidth
    • TCP does not guarantee any level of transmission rate, or provide delay guarantees
    • So you’ll get your data across, but we don’t know when
  • UDP Services
    • UDP is a lightweight protocol – meaning it doesn’t do much!
      • UDP is connectionless
      • UDP is unreliable – data may never get there
      • UDP packets may arrive out of order and not realize it
      • There are no transmission rate guarantees
  • Services NOT Provided
    • TCP and UDP do not provide guarantees of throughput or timing
    • TCP does nothing for security per se, but SSL can be added on at the transport layer
      • See Chapter 7 for INFO 331
  • Application Protocols
    • We’ll examine protocols for Internet-based applications
      • HTTP
      • FTP
      • SMTP
      • POP3
      • IMAP
      • DNS
  • The Web and HTTP
    • Through the 1980s, the Internet was used mostly for remote login, file transfer, newsgroups, and email
    • The World Wide Web changed all that, and made the Internet visible to the public
      • Comparable in significance to inventing movable type, the telephone, radio, or TV
      • The Web provides demand-based information, vs. broadcast info on radio and TV
  • HTTP
    • The HyperText Transfer Protocol (HTTP) is the heart of the Web
      • Defined by RFCs 1945 (v1.0) and 2616 (v1.1)
      • Has client and server programs which communicate via HTTP messages
    • Web pages contain objects – files of various sorts, such as a base HTML file, which cites JPG and/or GIF images, etc.
    • The typical application that uses HTTP is a browser
  • HTTP
    • A Web server houses the objects
      • Apache and Microsoft Internet Information Services (IIS) are common Web server apps
    • HTTP defines the messages that pass between client and server
      • Uses TCP for transport protocol
      • HTTP has no memory of previous actions (is a stateless protocol) – so if you ask for the same file 126 times, it will send the file 126 times
  • HTTP
    • HTTP can use persistent or non-persistent connections – persistent is the default, but non-persistent can be specified
    • A non-persistent connection to get a web page might work like this:
      • Client requests a TCP connection to web server on port 80
      • Client requests the HTML page
      • Server retrieves the HTML page, and sends it
  • HTTP
      • Server closes the TCP connection
      • Client closes the TCP connection
      • Client reads the HTML file, and finds 10 JPGs referenced
      • Client repeats steps 1-4 ten times (!) to download each of the JPG images
    • Not very efficient!
    • Browser can determine how many parallel TCP connections are used (typically 5-10)
  • More Delays!
    • How long does this process take?
      • The round-trip time (RTT) is for a packet to go from client to server and back
      • Includes propagation delays, queuing delays, processing delays
    • The first two messages of the TCP handshake pass between client (C) and server (S): C-S, then S-C
    • Then the client requests the file (C-S), and gets the file from the server (S-C)
  • RTT Delay
    • So the time for getting one file is two times the RTT, plus the transmission time for uploading the file from the server (Fig. 2.7, p. 92)
    • In the non-persistent connection example, this is done 11 times for one HTML file and 10 JPGs
  • Persistent Connection
    • If there’s a persistent connection, the TCP connection stays, so the handshake is done once not only for the web page in the example, but for many HTTP requests
      • Connection is closed after some period of inactivity
    • Persistent connections can be with or without pipelining
  • Persistent Connection
    • Without pipelining, the client requests a new object only after the previous request has been filled
    • With pipelining, the client requests new objects as needed, and may be waiting for several responses at once
      • This is the default setting for web browsers
      • Could reduce the delay for all the referenced objects on a web page to about one RTT, vs. 22 RTTs for the whole page over non-persistent connections!
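
The RTT counts in this example can be checked with a rough model. The sketch below is a simplification and its assumptions are mine, not the slides': file transmission times are ignored, no parallel connections are used, and the pipelined case charges one RTT for all referenced objects together. The function names are hypothetical.

```python
def non_persistent_rtts(num_objects: int) -> int:
    """Each object costs one RTT for the handshake and one for request/response."""
    return 2 * num_objects


def persistent_rtts(num_referenced: int, pipelined: bool) -> int:
    """One handshake, one RTT for the base page, then the referenced objects."""
    handshake = 1
    base_page = 1
    if pipelined:
        referenced = 1 if num_referenced else 0  # all requests sent back to back
    else:
        referenced = num_referenced              # one RTT per object, in turn
    return handshake + base_page + referenced


if __name__ == "__main__":
    print(non_persistent_rtts(11))                # 1 HTML file + 10 JPGs
    print(persistent_rtts(10, pipelined=False))
    print(persistent_rtts(10, pipelined=True))
```

Under these assumptions the page costs 22 RTTs non-persistently, 12 with a persistent connection and no pipelining, and 3 with pipelining (of which only one RTT is spent on the ten images).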
  • HTTP vs HTML
    • Don’t confuse HTTP with HTML
      • HTTP is the protocol used to define how files are requested and transferred between server and clients
      • HTML is the format of web pages
    • So an HTML file might be the structure of an entity body transferred using HTTP
  • HTTP Messages
    • HTTP messages are two types, request messages (from client) and response messages (from server)
      • All HTTP messages are plain ASCII text
        • ‘Both types of message consist of a start-line, zero or more header fields (also known as "headers"), an empty line (i.e., a line with nothing preceding the CRLF) indicating the end of the header fields, and possibly a message-body.’ [RFC 2616, para 4.1]
        • CRLF is a “carriage return and line feed”
  • HTTP Messages
    • There are many headers which could appear in requests or responses
      • Cache-Control, Connection, Date, Pragma, Trailer, Transfer-Encoding, Upgrade, Via, and/or Warning [RFC 2616, para 4.5]
      • Disclaimer : RFC 2616 is 176 pages long – so we’re just providing a summary of where to look for info if you’re curious about the details of these messages
  • HTTP Requests
    • Request messages have variable number of lines, depending on the method called
    • General request syntax is
      • Method Request-URI HTTP-Version
      • Methods are OPTIONS, GET, HEAD, POST, PUT, DELETE, TRACE, or CONNECT [RFC 2616, para 5.1.1]
        • Most commonly used is GET
      • Request-URI is the desired Uniform Resource Identifier (URI, commonly called a URL)
  • HTTP Requests
      • HTTP-Version is what it sounds like, e.g. HTTP/1.1
    • There are many possible request headers
      • Accept, Accept-Charset, Accept-Encoding, Accept-Language, Authorization, Expect, From, Host, If-Match, If-Modified-Since, If-None-Match, If-Range, If-Unmodified-Since, Max-Forwards, Proxy-Authorization, Range, Referer, TE (extension transfer-codings), and/or User-Agent [RFC 2616, para 5.3]
  • HTTP Responses
    • HTTP responses go from server to client
    • General syntax starts with
      • HTTP-Version Status-Code Reason-Phrase [RFC 2616, para 6.1]
      • The Status-Code could be dozens of values
        • "200" OK
        • "403" Forbidden
        • "404" Not Found
      • The Reason-Phrase is any text phrase assigned
  • HTTP Responses
    • Response headers can include
      • Accept-Ranges, Age, ETag, Location, Proxy-Authenticate, Retry-After, Server, Vary, and/or WWW-Authenticate [RFC 2616, para 6.2]
    • Responses usually include entities, unless the HEAD method was used
  • HTTP Entities
    • An entity is the object sent or returned with an HTTP message
    • Entities can be with requests or responses
      • Entity headers include Allow, Content-Encoding, Content-Language, Content-Length (bytes), Content-Location, Content-MD5, Content-Range, Content-Type, Expires, Last-Modified, and/or extension-header [RFC 2616, para 7.1]
        • Where extension-header is any allowable message-header for that kind of message
  • HTTP
    • So HTTP describes request and response message formats
      • Both types typically have a first line which tells its purpose (the request or status line)
      • There can be many header lines
      • There might be an entity attached
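
To make the message layout concrete, here is a minimal sketch that builds a request and splits a response into its start line, headers, and body. The helper names and the host `www.example.com` are assumptions for illustration; real clients handle many more cases (repeated headers, chunked bodies, and so on).

```python
def build_request(method, uri, headers, version="HTTP/1.1"):
    """Request line, header lines, then an empty line; every line ends in CRLF."""
    lines = [f"{method} {uri} {version}"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def parse_response(raw):
    """Split a response into (version, status code, reason, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")   # empty line ends the headers
    status_line, *header_lines = head.decode("ascii").split("\r\n")
    version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return version, int(code), reason, headers, body


if __name__ == "__main__":
    print(build_request("GET", "/index.html", {"Host": "www.example.com"}))
    print(parse_response(b"HTTP/1.1 404 Not Found\r\nContent-Length: 0\r\n\r\n"))
```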
  • Cookies!
    • HTTP is stateless
    • But some would like to remember a little information about web site visitors, hence cookies were defined with RFC 2965
    • Cookies require four parts
      • A cookie header in HTTP responses
      • A cookie header in HTTP requests
      • Cookie files on the user’s computer
      • A database on the web server
  • Cookies
    • When a user visits a cookied web site the first time, they are assigned a unique ID number, which is stored in the database
    • A Set-cookie header is used in the response to carry that ID number
      • Set-cookie: 1678
    • All subsequent HTTP interaction with that site, even years later, will flag that cookie number and identify the user
  • Cookies
      • Cookie: 1678
    • This provides a way for web sites to automate login for repeat customers, and track browsing and spending patterns
      • One-click shopping is only possible with cookies
      • The price for convenience is the lack of privacy
    • Ads on web sites can be targeted to match the user’s preferences
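
A hedged sketch of the two cookie headers using Python's standard `http.cookies` module. Real cookies are name=value pairs, so the cookie name `id` is an assumption added for this example; the slides show only the bare number.

```python
from http.cookies import SimpleCookie


def set_cookie_header(user_id: str) -> str:
    """Header line a server could add to its response on the first visit."""
    cookie = SimpleCookie()
    cookie["id"] = user_id          # "id" is a made-up cookie name
    return cookie.output()


def read_cookie_header(header_value: str) -> str:
    """Recover the ID from the Cookie header value sent back by the browser."""
    cookie = SimpleCookie()
    cookie.load(header_value)
    return cookie["id"].value


if __name__ == "__main__":
    print(set_cookie_header("1678"))
    print(read_cookie_header("id=1678"))
```

The server would then use the recovered ID as the key into its own database.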
  • Other HTTP Content
    • So far we assumed the file content for HTTP was HTML files, JPGs, GIFs, etc.
    • Entities can be many other file formats
      • XML files, which are structured text
      • VoiceXML , WML (web pages for mobile phones), streaming audio and video, and P2P file sharing
  • Web Caching
    • A Web cache, or proxy server, acts as an intermediate between clients and servers
      • The cache stores recently used files, so they don’t have to be requested again
      • The cache acts as client and server
    • ISPs typically use web caching to cut down on outgoing web traffic (to the servers) and lower request response time
  • Web Caching
    • Tends to work well when the client-cache connection is faster than the cache-server connection
    • Often helps avoid upgrading the cache-server connection speed, which saves money
    • Implement by using a conditional GET method in HTTP
      • With the If-Modified-Since request header
      • If the cache is still current, don’t download the file
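
The conditional GET decision can be sketched from the server's side. This assumes the server compares the object's last-modified time with the date in the If-Modified-Since header and answers 304 (Not Modified) when the cached copy is still current; the function name is hypothetical.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def conditional_get_status(last_modified: datetime, if_modified_since: str) -> int:
    """Return 200 (send the file again) or 304 (the cached copy is current)."""
    cached_copy_time = parsedate_to_datetime(if_modified_since)
    return 200 if last_modified > cached_copy_time else 304


if __name__ == "__main__":
    cached = datetime(2020, 1, 1, tzinfo=timezone.utc)
    header = format_datetime(cached, usegmt=True)   # HTTP-style date string
    print("If-Modified-Since:", header)
    print(conditional_get_status(cached, header))
    print(conditional_get_status(datetime(2020, 1, 2, tzinfo=timezone.utc), header))
```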
  • FTP
    • The File Transfer Protocol is one of the oldest Internet applications (now RFC 959, but started as RFC 114)
    • While HTTP and FTP both send files
      • FTP uses two connections – one for control, one for data (control information is out-of-band)
        • User login and commands are on the control connection, files move on the data connection
      • HTTP uses one connection for both purposes (control information is in-band)
  • FTP
    • FTP uses TCP, and usually connects to the server on port 21 for control and port 20 for data
    • The client sends user ID and password
      • FTP may be done to some sites with generic ID, known as anonymous FTP
    • Once logged in, the user may navigate and view directories, and upload (STOR or PUT) or download (RETR or GET) files
  • FTP
    • Commands and replies are very basic
      • Most commands are three or four-letter abbreviations
      • Replies are three-digit codes, followed by text
    • Command connection is based on Telnet, incidentally [RFC 959, para 2.3]
    • Due to its age, FTP has provisions for a huge range of data types (ASCII or EBCDIC) and file, record, and page structures
  • Electronic Mail
    • E-mail is another ancient Internet application, with origins in RFC 772, in 1980
    • It provides asynchronous text communication and allows files to be attached to messages
      • Even voice and video messages
    • Main elements are users (sender and recipient), mail servers, and the Simple Mail Transfer Protocol (SMTP)
      • Careful, there’s also an SNTP for network time
  • Electronic Mail
    • Email is composed in a client, which sends it to a mail queue in the sender’s mail server
    • The sending mail server uses SMTP to send the message to the recipient’s mail server
      • If mail can’t be sent successfully, the sender’s mail server will keep trying (typically for 3 days)
    • The recipient is notified that the message is present, which they read with their client
  • Electronic Mail
    • Each user has a mailbox on the mail server
      • Access to the mailbox is controlled with user name and password
    • SMTP is the main protocol to get email from one mail server to another
      • It uses TCP, not surprisingly
      • Defined in proposed standard RFC 2821
      • Only uses 7-bit ASCII for message headers AND body
        • Forces binary files to be converted to ASCII & back
  • SMTP
    • After the TCP connection is established, SMTP does a handshake with port 25 of the recipient’s mail server
    • The client then sends the message
    • Multiple messages can be sent if needed, then the connection is closed
    • Client commands include HELO, MAIL FROM:, RCPT TO:, DATA (then the message body), and QUIT
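
A sketch of the lines a client would send for one message, in order. The server replies are left out, and the host and mailbox names are placeholders. The single "." line that ends the DATA section is part of SMTP; escaping of body lines that themselves start with a dot is omitted here.

```python
def smtp_client_commands(client_host, sender, recipients, body_lines):
    """Return the command sequence for sending one message."""
    commands = [f"HELO {client_host}", f"MAIL FROM:<{sender}>"]
    commands += [f"RCPT TO:<{recipient}>" for recipient in recipients]
    commands.append("DATA")
    commands += body_lines
    commands.append(".")        # a line holding only a dot ends the message
    commands.append("QUIT")
    return commands


if __name__ == "__main__":
    for line in smtp_client_commands(
        "client.example.org",
        "alice@example.org",
        ["bob@example.com"],
        ["Subject: lunch", "", "Do you like ketchup?"],
    ):
        print(line)
```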
  • SMTP
    • Other commands include (with comments in parentheses)
      • RSET (abort current transaction)
      • SEND FROM:<reverse-path>
      • SOML FROM:<reverse-path> (send or mail)
      • SAML FROM:<reverse-path> (send and mail)
      • VRFY <string> (verify a user name)
      • EXPN <string> (expand mailing list)
      • HELP [ <string>]
      • NOOP (just send an OK reply)
      • TURN (your turn to be client or server)
  • SMTP vs HTTP
    • SMTP and HTTP can both move files using persistent TCP connections
      • SMTP pushes messages to the recipient’s mail server
        • HTTP pulls contents when desired from a web server
      • SMTP incorporates attachments into the body of the message as one big object
        • HTTP downloads attachments in separate responses
  • Mail Message Formats
    • Email contains header information defined by RFC 822 (Standard for ARPA Internet Text Messages)
      • The sender headers can include: FROM, SENDER, REPLY-TO, RESENT-FROM, RESENT-SENDER, and RESENT-REPLY-TO
      • The receiver headers can be: TO, CC, and BCC
      • Reference headers can be: MESSAGE-ID, IN-REPLY-TO, REFERENCES and KEYWORDS
  • Mail Message Formats
      • Other allowable header fields are: SUBJECT, COMMENTS, ENCRYPTED, and possibly some extension fields or user-defined fields
    • While many of these headers also sound like SMTP commands, they are part of the email message
    • This works fine for ASCII data
      • For anything outside of that, call a MIME
  • MIME
    • Multipurpose Internet Mail Extensions (MIME) are used for handling non-ASCII contents in email, e.g. non-Latin character sets, binary files, images, audio, video, etc.
    • MIME (RFC 2045) adds the ability to handle
      • (1) textual message bodies in character sets other than US-ASCII, (2) an extensible set of different formats for non-textual message bodies, (3) multi-part message bodies, and (4) textual header information in character sets other than US-ASCII.
  • MIME
    • The key three parts of MIME are defining the version of MIME, the encoding scheme, and the type of content
      • MIME-Version: 1.0
      • Content-Transfer-Encoding: can be "7bit" / "8bit" / "binary" / "quoted-printable" / "base64"
      • Content-Type: describes the type and subtype
        • Type is discrete ("text" / "image" / "audio" / "video" / "application") or composite ("message" / "multipart")
  • MIME
        • Subtype is an ietf-token (an extension token defined by a standards-track RFC and registered with IANA) or an X-token (the two characters "X-" or "x-" followed, with no intervening white space, by an ASCII text string)
    • There are many other variations of type and subtype (see RFC 2046), including for
      • Other character sets (Content-type: text/plain; charset=iso-8859-1), or proprietary formats (image/JPEG, application/postscript, etc.)
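
The three key MIME headers can be seen on a message built with Python's standard `email` package. This is an illustrative sketch: the addresses, subject, and filename are placeholders, and the library picks the transfer encoding for the binary part.

```python
from email.message import EmailMessage


def build_message_with_attachment(text: str, blob: bytes) -> EmailMessage:
    """Build a multipart message: a text body plus one binary attachment."""
    msg = EmailMessage()
    msg["From"] = "sender@example.org"        # placeholder addresses
    msg["To"] = "recipient@example.org"
    msg["Subject"] = "MIME demo"
    msg.set_content(text)                     # text/plain part
    msg.add_attachment(
        blob,
        maintype="application",
        subtype="octet-stream",
        filename="data.bin",
    )
    return msg


if __name__ == "__main__":
    message = build_message_with_attachment("See attached.", b"\x00\x01\x02")
    print(message["MIME-Version"])
    print(message.get_content_type())
    for part in message.iter_attachments():
        print(part.get_content_type(), part["Content-Transfer-Encoding"])
```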
  • MIME
    • The received message also includes a Received: header added to the top of the message
    • This is familiar in email if you look at the full headers
  • Uuencode and uudecode
    • Historic note:
      • Before MIME, uuencode was used to convert non-ASCII files to text
        • Doing so expanded the file in size about 35%, because every 3 bytes become 4 printable characters, plus control information
      • Uudecode reversed the operation after the file was received
      • These commands still exist under UNIX
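
The size growth can be measured with `binascii.b2a_uu`, which encodes one uuencode line (at most 45 input bytes) at a time. The helper name is hypothetical, and the measurement covers only the encoded lines, not the begin and end lines a full uuencoded file carries.

```python
import binascii


def uu_expansion(raw: bytes) -> float:
    """Ratio of uuencoded size to original size."""
    encoded = b"".join(
        binascii.b2a_uu(raw[i:i + 45])   # length character + 60 characters + newline
        for i in range(0, len(raw), 45)
    )
    return len(encoded) / len(raw)


if __name__ == "__main__":
    print(round(uu_expansion(bytes(4500)), 3))
```

Each full line turns 45 bytes into 62, so the growth comes out a little over a third.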
  • Mail Access Protocols
    • If you log directly into your email server, SMTP is all you need to handle email
    • But if you wish to access email from a local host, you need to use a mail access protocol
    • The biggies at present are
      • Post Office Protocol version 3 (POP3) and
      • Internet Message Access Protocol (IMAP)
  • POP3
    • POP3 is defined in RFC 1939
      • It’s a pretty simple protocol compared to many
    • SMTP sends mail between mail servers, and from the user agent (email app) to their mail server
    • POP3 transfers mail from your mail server to your user agent
    • From a user’s view, SMTP handles outgoing email, and POP3 handles incoming email
  • POP3
    • POP3 uses TCP, and connects to port 110 on the mail server
    • POP3 does three things – authorization, transaction, and update
      • Authorization verifies the user identity
      • Transaction retrieves email, marks messages for deletion, and gets mail statistics
      • Update ends the session, and deletes flagged messages
  • POP3
    • POP3 communicates with the mail server by commands, each of which gets a +OK response if it worked, and an -ERR response of some kind if it didn’t work
      • Authorization uses commands ‘user’ and ‘pass’
      • Transaction uses commands
        • ‘list’ to see list of messages
        • ‘dele x’ to delete message number x
        • ‘retr x’ to retrieve message number x
        • ‘quit’ ends the session
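
The three phases can be mimicked with a toy, in-memory server. This is a sketch under simplifying assumptions: one mailbox, single-line replies (a real `list` or `retr` reply spans several lines), and a made-up class name. It only illustrates the +OK / -ERR pattern and the fact that deletions take effect at `quit`.

```python
class ToyPop3Server:
    """In-memory stand-in for a POP3 server (hypothetical, single mailbox)."""

    def __init__(self, user, password, messages):
        self.user = user
        self.password = password
        self.messages = dict(enumerate(messages, start=1))  # number -> text
        self.marked = set()          # session state: messages flagged for deletion
        self.claimed_user = None
        self.authorized = False

    def _live(self, arg):
        """Return the message number if it exists and is not flagged, else None."""
        if arg.isdigit() and int(arg) in self.messages and int(arg) not in self.marked:
            return int(arg)
        return None

    def handle(self, line):
        cmd, _, arg = line.partition(" ")
        cmd = cmd.lower()
        if not self.authorized:                          # authorization phase
            if cmd == "user":
                self.claimed_user = arg
                return "+OK"
            if cmd == "pass" and self.claimed_user == self.user and arg == self.password:
                self.authorized = True
                return "+OK"
            return "-ERR not authorized"
        if cmd == "list":                                # transaction phase
            live = [n for n in sorted(self.messages) if n not in self.marked]
            return "+OK " + " ".join(f"{n}:{len(self.messages[n])}" for n in live)
        if cmd in ("retr", "dele"):
            number = self._live(arg)
            if number is None:
                return "-ERR no such message"
            if cmd == "retr":
                return "+OK " + self.messages[number]
            self.marked.add(number)
            return "+OK"
        if cmd == "quit":                                # update phase
            for number in self.marked:
                del self.messages[number]
            self.marked.clear()
            return "+OK"
        return "-ERR unknown command"


if __name__ == "__main__":
    server = ToyPop3Server("bob", "hungry", ["hi", "bye"])
    for command in ["user bob", "pass hungry", "list", "retr 1", "dele 1", "quit"]:
        print(command, "->", server.handle(command))
```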
  • POP3
    • POP3 allows two modes, depending on whether you delete the messages after retrieving them
      • If you download-and-delete messages from the server, you only download them to one local host
      • If you download-and-keep the messages on the server, then you can download them to more than one local host (e.g. home and work)
        • Disadvantage is that the volume of mail on the server can be too big
  • POP3
    • POP3 maintains a little state information during a session, such as which messages have been marked for deletion
    • However after a session is over, all state information is gone
      • This makes a POP3 server a fairly simple beast
    • Users use folders locally (on their email app) to store and organize messages
  • IMAP
    • IMAP, defined in RFC 3501, allows folders to be defined on the mail server to organize email there
      • Messages are associated with a folder – first the generic INBOX, then moved by the user
      • Hence state information about the folder for each message must be saved across sessions
    • IMAP also provides search capability within the mailbox
  • IMAP
    • Users can also get just the headers of messages, and avoid downloading the MIME portion
      • Handy when on a low speed connection
  • Web Email
    • Hotmail (now owned by Microsoft) introduced web-based email shortly after the Web became popular
      • Mail is accessed by HTTP not POP3 or IMAP
      • But the server-to-server connection is still SMTP
    • Very convenient for accessing mail with limited bandwidth or from many locations
    • Widely imitated (Gmail, Yahoo, AOL, etc.)
  • DNS
    • A key need, once the Internet grew beyond a few thousand hosts, was to automate converting human-readable* addresses or hostnames (www.microsoft.com) to IP addresses (207.46.198.60)
    • That is the purpose of the Domain Name System (DNS)
      • Before DNS, really big lookup tables were used!
    * Humans who read English, at least!
  • Host vs Domain Names
    • A hostname is the name of a particular host computer, such as banner.drexel.edu
      • May really represent multiple computers, but logically they are all the same host
    • A domain name is the top level domain and the specific domain name, like drexel.edu
    • Top level domains are com, edu, gov, mil, org, net, and the country codes uk, de, fr, etc.
  • IP Addresses
    • IPv4 addresses have four bytes, each with a value from 0 to 255, separated by periods
      • Why called bytes? Each value from 0 to 255 corresponds to a value from 0 to (2^8 - 1), and a byte is eight bits
    • IP addresses are typically static (fixed) for servers and other semi-permanent Internet connections, and dynamic for temporary connections (e.g. dial-up, wireless)
  • DNS
    • DNS runs over UDP, port 53 (something uses UDP!)
    • DNS is managed by DNS servers, typically running Berkeley Internet Name Domain (BIND) software
    • DNS is used by other applications (HTTP, SMTP, FTP) to translate host names to IP addresses
      • You can also do a reverse DNS lookup (convert 205.188.97.2 to www-vd03.evip.aol.com)
  • Reverse DNS Lookup
    • So if you try to look up a random IP address like 123.45.67.89, dnsstuff.com gives
      • The reverse DNS entry for an IP is found by reversing the IP, adding it to "in-addr.arpa", and looking up the PTR record. So, the reverse DNS entry for 123.45.67.89 is found by looking up the PTR record for 89.67.45.123.in-addr.arpa.
        • “tinnie.arin.net (an authoritative nameserver for 123.in-addr.arpa., which is in charge of the reverse DNS for 123.45.67.89) says that there are no PTR records for 123.45.67.89.”
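
The name used for the PTR lookup can be built by hand or with the standard `ipaddress` module. The sketch below only constructs the name; it does not query any DNS server. The function names are made up for the example.

```python
import ipaddress


def reverse_name_by_hand(ip: str) -> str:
    """Reverse the four bytes and append the in-addr.arpa suffix."""
    return ".".join(reversed(ip.split("."))) + ".in-addr.arpa"


def reverse_name(ip: str) -> str:
    """Same result via the standard library."""
    return ipaddress.ip_address(ip).reverse_pointer


if __name__ == "__main__":
    print(reverse_name_by_hand("123.45.67.89"))
    print(reverse_name("123.45.67.89"))
```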
  • DNS
    • DNS also provides other key services
      • Host aliasing allows the true or canonical hostname to have aliases
        • When blah.com works to get to www.blah.com, it’s because blah.com is a host alias of www.blah.com
      • Mail server aliasing – same concept, but for mail server names
      • Load distribution across many servers for the same hostname – so everyone in the world doesn’t use one IP address for microsoft.com
  • DNS Structure
    • DNS is highly decentralized, so that many servers can share the workload
      • Good for speed, redundancy, and security
    • There are three levels of structure – the job of looking up a given address is partitioned among them
      • Root DNS Servers – are 13 sets of servers around the world that provide top level delegation of DNS information
  • DNS Structure
      • Top-Level Domain (TLD) DNS Servers – sets of servers are maintained for each of the top level domains, including country codes
        • Network Solutions Inc maintains the .COM domain
      • Authoritative DNS Servers – everyone who has publicly visible web or mail servers has to maintain DNS records
        • Drexel, large ISPs, etc. all can maintain DNS servers
      • Local DNS servers – are used to forward to the nearest authoritative DNS server
  • DNS Lookup
    • DNS lookup typically follows the pattern at right
      • A request to the local DNS server finds the TLD server from root
      • Then the auth. server from the TLD server, who gives the desired IP address
  • Recursive vs Iterative Queries
    • DNS queries which ask another server to get information are recursive
      • Query 1 on previous slide is recursive
    • DNS queries which get the information directly are iterative
      • Queries 2, 4, and 6 are iterative
    • All DNS queries can, in general, be recursive or iterative – the example shown is typical
  • DNS Lookup
    • This would be terribly tedious without caching
      • Common queries are stored on each level of DNS server, so they don’t have to be looked up constantly
      • Cached values are cleared typically every two days or less, in case the data changes
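
The lookup chain and the effect of caching can be sketched with plain dictionaries standing in for the root, TLD, and authoritative servers. Everything here is a toy: the server labels and the address 192.0.2.10 (from a range reserved for documentation) are invented, the default lifetime of two days follows the slide, and no real DNS traffic is sent.

```python
import time

# Toy data: which server to ask next, and finally the address (all hypothetical).
ROOT = {"edu": "tld-edu"}
TLD = {"tld-edu": {"drexel.edu": "auth-drexel"}}
AUTH = {"auth-drexel": {"banner.drexel.edu": "192.0.2.10"}}


class LocalResolver:
    """Local DNS server that walks root -> TLD -> authoritative, with a cache."""

    def __init__(self, ttl_seconds=2 * 24 * 3600, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self.cache = {}          # hostname -> (address, expiry time)
        self.queries_sent = 0

    def _ask(self, table, key):
        self.queries_sent += 1   # one query to one remote server
        return table[key]

    def resolve(self, hostname):
        hit = self.cache.get(hostname)
        if hit and hit[1] > self.clock():
            return hit[0]        # answered from cache, no queries needed
        labels = hostname.split(".")
        tld_server = self._ask(ROOT, labels[-1])
        auth_server = self._ask(TLD[tld_server], ".".join(labels[-2:]))
        address = self._ask(AUTH[auth_server], hostname)
        self.cache[hostname] = (address, self.clock() + self.ttl)
        return address


if __name__ == "__main__":
    resolver = LocalResolver()
    print(resolver.resolve("banner.drexel.edu"), resolver.queries_sent)
    print(resolver.resolve("banner.drexel.edu"), resolver.queries_sent)
```

The second call returns the same address without sending any further queries.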
  • DNS Records
    • Data about a hostname, its aliases, domain, and mail servers are captured in resource records (RR)
    • Each RR is a line with four fields
      • (Name, Value, Type, and TTL)
        • Name is a hostname, domain name, or canonical host or mail server name (depending on the Type)
        • Value is the IP address, mail server, DNS server, or canonical hostname of the Name
        • Type is the record type
        • TTL is the time (in seconds) after which the resource should be removed from cache
  • DNS Records
    • DNS RR types are one of several options
      • Type=A gives the IP address Value for a hostname Name
        • (relay1.bar.foo.com, 145.37.93.126, A) (TTL not shown)
      • Type=NS (name server) gives the authoritative DNS server Value for a domain Name
        • (foo.com, dns.foo.com, NS)
      • Type=CNAME defines the alias Name for the canonical hostname Value
        • (foo.com, relay1.bar.foo.com, CNAME)
  • DNS Records
      • Type=MX gives the canonical mail server Value for an alias hostname Name
        • (foo.com, mail.bar.foo.com, MX)
      • Most hostnames have many RRs
    • The Start of Authority (SOA) resource record indicates that this DNS name server is the best source of information for the data within this DNS domain
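
The example records from these slides can be put in a small table and queried. This is a sketch, not how a DNS server stores data: the TTL of 3600 seconds is an assumption (the slides leave TTL out), and the helper names are invented.

```python
from typing import NamedTuple


class RR(NamedTuple):
    name: str
    value: str
    type: str
    ttl: int


# Records from the slides; the TTL values are placeholders.
RECORDS = [
    RR("relay1.bar.foo.com", "145.37.93.126", "A", 3600),
    RR("foo.com", "dns.foo.com", "NS", 3600),
    RR("foo.com", "relay1.bar.foo.com", "CNAME", 3600),
    RR("foo.com", "mail.bar.foo.com", "MX", 3600),
]


def lookup(name, rtype):
    """All Values stored for this Name and Type."""
    return [rr.value for rr in RECORDS if rr.name == name and rr.type == rtype]


def resolve_address(name, max_hops=5):
    """Follow CNAME aliases until an A record is found (or give up)."""
    for _ in range(max_hops):
        addresses = lookup(name, "A")
        if addresses:
            return addresses[0]
        aliases = lookup(name, "CNAME")
        if not aliases:
            return None
        name = aliases[0]
    return None


if __name__ == "__main__":
    print(resolve_address("foo.com"))
    print(lookup("foo.com", "MX"))
```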
  • DNS Messages
    • The same format DNS messages are used to both query a DNS server, and receive the reply
    • The messages have a header section, the question, the answer, a section for other authoritative servers, and possibly additional information (such as A records for mail servers)
  • nslookup
    • The command nslookup provides basic IP data for a hostname or domain
    • nslookup snip.net
      • Server: ns2.snip.net
      • Address: 209.204.64.3
      • Name: snip.net
      • Address: 216.83.103.123
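    • A similar lookup can be sketched from a program using Python's standard socket module. This goes through the operating system's resolver rather than running nslookup itself, so it does not report which DNS server answered; the function name lookup_ipv4 is hypothetical.

```python
import socket


def lookup_ipv4(hostname):
    """Return the sorted, de-duplicated IPv4 addresses for a hostname."""
    infos = socket.getaddrinfo(hostname, None, family=socket.AF_INET,
                               type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, (address, port))
    return sorted({info[4][0] for info in infos})
```

    • Calling lookup_ipv4("snip.net") would return whatever addresses the resolver reports today, which may differ from the output shown above.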
  • DNS Changes
    • A registrar makes changes to the DNS database
      • The list of registrars is at http://www.internic.net/ (the text is full of typos!)
    • Changes to DNS records typically take from hours to a couple of days to become available; less if lots of people are requesting a new domain
      • Likewise, email won’t find you right away
  • Peer-to-Peer File Sharing
    • Peer-to-Peer (P2P) file sharing occupies much of the volume of Internet traffic
    • It allows a user to find a file on another user’s computer, and download it directly from them
      • Everyone can be client and server, even at the same time
      • Napster used a centralized index, but true P2P just indexes the files you will share
        • Please don’t share your entire hard drive!
  • P2P File Distribution
    • P2P can be used to distribute a file from one source (e.g. a new Linux kernel) to hundreds of peer servers
    • P2P is inherently scalable
      • Client-server file distribution time increases linearly with the number of nodes on the network
      • P2P distribution time levels off asymptotically
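    • The scaling claim above can be checked with a small calculation. The sketch below uses the usual textbook lower bounds on distribution time, under the simplifying assumptions that every peer uploads at the same rate and that the only bottlenecks are the upload and download links; all parameter names are illustrative.

```python
def client_server_time(file_bits, n_peers, server_up, min_peer_down):
    """Lower bound: the server must upload all N copies by itself."""
    return max(n_peers * file_bits / server_up, file_bits / min_peer_down)


def p2p_time(file_bits, n_peers, server_up, min_peer_down, peer_up):
    """Lower bound when every peer re-uploads at rate peer_up."""
    total_up = server_up + n_peers * peer_up
    return max(file_bits / server_up,          # server sends one full copy
               file_bits / min_peer_down,      # slowest peer must receive it
               n_peers * file_bits / total_up) # N copies over all upload links


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, client_server_time(1.0, n, 10.0, 100.0),
              p2p_time(1.0, n, 10.0, 100.0, 1.0))
```

    • With these sample numbers the client-server bound grows in proportion to the number of peers, while the P2P bound approaches a fixed limit because each new peer also adds upload capacity.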
  • BitTorrent
    • Bittorrent.org manages the protocol used by most file sharing (30% of all Internet backbone traffic!)
      • µTorrent is a commercial version; see also Azureus/Vuze, BitComet, etc.
    • A torrent is the set of peers participating in distribution of a file
      • A tracker node keeps track of which nodes are in the torrent
  • BitTorrent
    • When you join a torrent, you identify up to 50 neighboring peers already in the torrent
      • Then find what chunks of the file each has, and get the rarest first
    • When responding to requests for file chunks, focus on neighbors with the highest data rate
      • Peers also send chunks to random neighbors
      • To get good download rates, you must share nicely with others (no free-riding!)
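    • The two selection rules above (request the rarest chunk first, and favor the neighbors with the highest data rate) can be sketched as follows. This is a simplified illustration and not the BitTorrent protocol itself: the data structures, the tie-breaking rule, and the default of four upload slots are assumptions.

```python
from collections import Counter


def rarest_first(my_chunks, neighbor_chunks):
    """Pick the missing chunk held by the fewest neighbors.

    neighbor_chunks maps a peer name to the set of chunk numbers it holds.
    Returns (chunk, peers_that_have_it), or None if nothing useful is on offer.
    """
    counts = Counter()
    for chunks in neighbor_chunks.values():
        counts.update(chunks - my_chunks)     # only chunks we still need
    if not counts:
        return None
    # Lowest copy count wins; ties go to the lowest chunk number
    chunk = min(counts, key=lambda c: (counts[c], c))
    holders = sorted(p for p, chunks in neighbor_chunks.items() if chunk in chunks)
    return chunk, holders


def top_uploaders(rates, slots=4):
    """Choose the neighbors currently sending to us at the highest rates."""
    return sorted(rates, key=rates.get, reverse=True)[:slots]
```

    • A real client would repeat both choices periodically, and would also send to a randomly chosen neighbor from time to time so that new peers get a chance to start trading.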
  • Peer-to-Peer File Sharing
    • TCP connections between the computers and FTP make it possible
      • The server computer is a transient Web server
    • Gnutella uses its own protocol, which is not an IETF standard (not everything is an RFC!)
      • A request for a file produces query flooding to find that file in neighboring peers, and collects query hits; from those hits, an HTTP GET command downloads the file
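    • Query flooding as described above can be sketched as a breadth-first search over the overlay of peers. The graph and file tables are hypothetical in-memory stand-ins for real network messages, and the max_hops limit is an assumed parameter that bounds how far the query spreads.

```python
from collections import deque


def flood_query(graph, files, start, wanted, max_hops):
    """Breadth-first flood of a query; returns the peers that report a hit."""
    hits = []
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        peer, hops = queue.popleft()
        if peer != start and wanted in files.get(peer, set()):
            hits.append(peer)                 # a query hit
        if hops == max_hops:
            continue                          # scope limit reached
        for neighbor in graph.get(peer, []):
            if neighbor not in seen:          # do not forward a query twice
                seen.add(neighbor)
                queue.append((neighbor, hops + 1))
    return hits
```

    • Raising max_hops finds more copies of the file but makes every search touch more peers, which is the traffic cost of flooding.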
  • Peer-to-Peer File Sharing
      • More refined limited-scope query flooding is now done to minimize the Internet traffic required per user
        • Only looks at nearby peers in decreasing numbers
      • Gnutella also manages how people find peers on the network (bootstrapping), and checks whether they are still online by pinging them
    • KaZaA and Morpheus borrowed from both Napster and Gnutella
      • They search nearby peers, but not all peers are equal: some have higher bandwidth and more to share
  • Peer-to-Peer File Sharing
      • More powerful peers are group leaders (super peers) for those around them, acting like mini hubs of the network
        • Group leaders connect via TCP, and map out what’s available from their local peers
      • Other tricks include
        • Limiting the number of simultaneous downloads
        • Giving priority to those who upload more than download
        • Downloading parts of the same file in parallel from multiple sources
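    • The group leader idea above can be sketched as a small index. This is a toy model: the GroupLeader class and its methods are hypothetical, and leaders call each other directly here instead of exchanging messages over TCP.

```python
class GroupLeader:
    """Toy super peer that indexes the files offered by its local peers."""

    def __init__(self, name):
        self.name = name
        self.index = {}            # filename -> set of local peer names
        self.other_leaders = []    # leaders this one can forward searches to

    def register(self, peer, filenames):
        # A local peer reports what it is willing to share
        for filename in filenames:
            self.index.setdefault(filename, set()).add(peer)

    def search(self, filename, ask_others=True):
        found = set(self.index.get(filename, set()))
        if ask_others:
            for leader in self.other_leaders:
                # One level of forwarding only, to keep the sketch loop-free
                found.update(leader.search(filename, ask_others=False))
        return sorted(found)
```

    • An ordinary peer only has to contact its own group leader, which keeps searches much smaller than flooding every neighbor.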
  • Skype
    • Skype is a popular P2P Internet telephony app, which goes beyond file distribution and sharing in the P2P world
    • Nodes in Skype are in a hierarchical overlay (like the super peer concept), which makes it faster to locate a user
    • Skype uses relays to establish calls across NAT-hidden local networks
  • Peer-to-Peer File Sharing
    • A massive issue for P2P file sharing is the intellectual property rights of the files being shared
      • Music and video industry lawyers have claimed enormous losses from file sharing, and have vigorously fought file sharing applications
      • Napster, BearShare, Grokster, Morpheus, iMesh, DVDxCopy, KaZaA, and others are involved in such ongoing disputes