The Internet
Open-access networks
1983: MILNET - 113 nodes connected universities and organizations involved in DoD-sponsored research
NSFNET (1985-1995)
The Internet
Designed for use both within local area networks (LANs) and between networks
Important
TCP/IP
IP is the fundamental protocol defining the Internet (as the name implies!)
IP Function
Data itself
Route: the sequence of computers that a packet travels through from source to destination
Limitations of IP:
TCP/IP splits long messages into shorter pieces for transport over the Internet and transparently reassembles them on the receiving side (fragmentation and reassembly)
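The splitting and reassembly idea can be illustrated with a minimal sketch. It is not how a real TCP/IP stack is implemented; the function names `fragment` and `reassemble`, the 8-byte piece size, and the use of simple sequence numbers are assumptions made only for illustration.

```python
def fragment(message: bytes, max_size: int) -> list[tuple[int, bytes]]:
    """Split a message into numbered pieces of at most max_size bytes."""
    if max_size <= 0:
        raise ValueError("max_size must be positive")
    return [
        (seq, message[start:start + max_size])
        for seq, start in enumerate(range(0, len(message), max_size))
    ]


def reassemble(pieces: list[tuple[int, bytes]]) -> bytes:
    """Rebuild the message, even if the pieces arrived out of order."""
    return b"".join(data for _, data in sorted(pieces))


message = b"A long message that does not fit in one packet"
pieces = fragment(message, max_size=8)
pieces.reverse()                 # simulate out-of-order arrival
restored = reassemble(pieces)
print(restored == message)
```

The sequence number attached to each piece is what lets the receiver restore the original order without the application noticing.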
Builds on IP
Heavyweight (TCP)
DNS (Domain Name System) provides a mechanism to map back and forth between host names and IP addresses
A computer's DNS service (which converts a host name to an IP address) uses UDP software to send a UDP message to a DNS server
Host names
Top-level domains are divided into subdomains (second-level domains), which can be further divided into subdomains, etc.
A host name plus domain name information is called the fully qualified domain
name (FQDN) of the computer
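As a small illustration of how an FQDN breaks down into a host name plus domain information, here is a sketch; the helper name `split_fqdn` and the dictionary keys are hypothetical.

```python
def split_fqdn(fqdn: str) -> dict:
    """Split a fully qualified domain name into its labelled parts."""
    labels = fqdn.rstrip(".").split(".")   # tolerate a trailing root dot
    return {
        "host": labels[0],                 # leftmost label: the host name
        "domain": ".".join(labels[1:]),    # everything after the host name
        "top_level_domain": labels[-1],    # rightmost label
    }


parts = split_fqdn("www.example.org")
print(parts)
```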
User-level tool to query Internet DNS: a program that provides command-line access to DNS (on most systems)
(E.g., 192.0.34.166 - www.example.org, www.example.com)
Even though multiple qualified names may be associated with an IP address, only one of the names is returned by a reverse lookup. This name is called the canonical name of the host (www.example.com); the others are referred to as aliases
Higher-level Protocols
These protocols are used to communicate once a TCP connection has been established
Telephone analogy: TCP specifies how we initiate and terminate the phone
call, but some other protocol specifies how we carry on the actual
conversation
Some examples:
First Internet chat software: IRC (Internet Relay Chat), with private and public chat facilities
Client and server communicate over the Internet using HTTP, a communication protocol built on top of TCP/IP
Gopher has links, but its documents are plain text; Archie and WAIS provide no support for links; HTML offers hyperlinks, page layout facilities, and inline graphics
Definition:
The World Wide Web is the collection of machines (Web servers) on the
Internet (collection of machines globally connected via IP) that provide
information, particularly HTML documents, via HTTP.
Machines that access information on the Web are known as Web clients. A Web
browser is software used by an end user to access the Web.
The protocol does not require the server to remember anything about the client
between requests.
Normally implemented over a TCP connection (80 is IANA standard port number for
HTTP)
Browser displays body of response in the client area of the browser window
The Internet's Telnet protocol can be used to simulate a browser request and view the server's response
start line
blank line
Start line
HTTP version
Request URI
URI
In addition to http, some other URL schemes are https, ftp, mailto,
telnet and file
Request-URI
addresses
Ex: In http://www.example.com/
the scheme is http
Request-URI is the portion of the requested URI that follows the host name
(which is supplied by the required Host header field)
Request methods
GET
POST
HEAD: requests that only header fields (no body) be returned in the response
OPTIONS
DELETE
PUT
TRACE
HTTP Request
start line
header field(s)
blank line
optional body
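The four parts listed above can be assembled into a request message as follows. This is a minimal sketch: the helper name `build_request` is hypothetical, and the host www.example.com is reused from the URL example earlier in these notes.

```python
def build_request(method: str, request_uri: str, headers: dict, body: bytes = b"") -> bytes:
    """Assemble start line, header fields, blank line, and optional body."""
    lines = [f"{method} {request_uri} HTTP/1.1"]                      # start line
    lines += [f"{name}: {value}" for name, value in headers.items()]  # header fields
    head = "\r\n".join(lines) + "\r\n\r\n"                            # blank line ends the header
    return head.encode("ascii") + body                                # optional body


request = build_request("GET", "/", {"Host": "www.example.com"})
print(request.decode("ascii"))
```

Each line ends with a carriage return and line feed, and the empty line is what tells the server that the header section is complete.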
A field value may continue over multiple lines (more than one value) by starting each continuation line with one or more spaces or tabs
Field values may contain MIME types, quality values, and wildcard characters (*)
A MIME type specifies the content type of a message and has two parts (a top-level type and a subtype), both case-insensitive strings
Note the use of wildcards to specify quality 0.1 for any MIME type not specified earlier
Top-level MIME content types:
application
audio
image
message
model
multipart
text
video
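To make quality values and wildcards concrete, the sketch below ranks the entries of an Accept field value. The header value and the function name `parse_accept` are made up for illustration; an entry without an explicit q parameter is treated as quality 1.

```python
def parse_accept(value: str) -> list[tuple[str, float]]:
    """Return (MIME type, quality) pairs, highest quality first."""
    entries = []
    for item in value.split(","):
        parts = [p.strip() for p in item.split(";")]
        mime_type = parts[0].lower()        # MIME types are case insensitive
        if not mime_type:
            continue
        quality = 1.0                       # default when no q parameter is given
        for param in parts[1:]:
            name, _, val = param.partition("=")
            if name.strip().lower() == "q":
                quality = float(val)
        entries.append((mime_type, quality))
    # sorted() is stable, so entries with equal quality keep their order
    return sorted(entries, key=lambda entry: entry[1], reverse=True)


accept = "text/html, text/plain;q=0.8, image/gif;q=0.2, */*;q=0.1"
ranked = parse_accept(accept)
print(ranked)
```

The final `*/*;q=0.1` entry is the wildcard case: any MIME type not listed earlier is acceptable, but with the lowest preference.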
Referer: URL of document containing link that supplied URI for this HTTP
request
HTTP Response
status line
header field(s)
blank line
Status line
specified class
Status code
Three-digit number
301 Moved Permanently - URI for the requested resource has changed
307 Temporary Redirect - URI for the requested resource has temporarily changed
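The first digit of the three-digit status code identifies its class. A minimal sketch of that mapping follows; the class names are the standard HTTP ones, while the function name `status_class` is hypothetical.

```python
STATUS_CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def status_class(code: int) -> str:
    """Map a three-digit HTTP status code to its class via the first digit."""
    if not 100 <= code <= 599:
        raise ValueError(f"not a valid HTTP status code: {code}")
    return STATUS_CLASSES[code // 100]


for code in (200, 301, 307, 404):
    print(code, status_class(code))
```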
HTTP Response
status line
header field(s)
blank line
optional body
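The response structure above can be taken apart again on the client side. The sketch below is a simplification (it ignores continuation lines and repeated header fields); the sample response and the name `parse_response` are assumptions for illustration.

```python
def parse_response(raw: bytes):
    """Split a raw HTTP response into status line parts, header fields, and body."""
    head, _, body = raw.partition(b"\r\n\r\n")        # blank line separates header and body
    lines = head.decode("iso-8859-1").split("\r\n")
    version, code, reason = lines[0].split(" ", 2)    # status line
    headers = {}
    for line in lines[1:]:                            # header fields
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return version, int(code), reason, headers, body


raw = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html\r\n"
    b"Content-Length: 13\r\n"
    b"\r\n"
    b"<p>Hello</p>\n"
)
version, code, reason, headers, body = parse_response(raw)
print(version, code, reason, headers, body)
```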
Last-Modified: date and time the requested resource was last modified on the
server
Expires: date and time after which the client's copy of the resource will be out-of-date
ETag: a unique identifier for this version of the requested resource (changes if
resource changes)
Most web browsers use cache to store requested resources so that subsequent requests
to the same resource will not necessarily require an HTTP request/response
Client Caching
Cache advantages
Cache disadvantage
1. Check the Last-Modified header in the response to a HEAD request. If the response message contains a Last-Modified time and this time precedes the value of the Date header field returned with the cached resource, the cached copy is valid; otherwise the cached copy is invalid and the browser should send a normal GET request for the resource
2. Check the ETag header in the response: compare the ETag returned by a HEAD request with that of the cached resource. If the ETag values match, the cached copy is valid
3. If the server can determine in advance the earliest time at which a resource will change, it can return that time in an Expires header. In this case, as long as the Expires time has not been reached, the client may use the cached copy of the resource without validating it with the server. If no Expires header was sent with the response, the client uses a heuristic algorithm to estimate a value for Expires
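The three checks above can be sketched as two small functions. This is an illustration under stated assumptions, not how any particular browser implements its cache: headers are held in plain dictionaries, the function names are hypothetical, an ETag comparison is preferred over Last-Modified when both are available, and the heuristic estimate for a missing Expires header is left out.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def is_fresh(cached_headers: dict, now: datetime) -> bool:
    """Check 3: usable without validation while Expires has not been reached."""
    expires = cached_headers.get("Expires")
    return expires is not None and now < parsedate_to_datetime(expires)


def is_still_valid(cached_headers: dict, head_headers: dict) -> bool:
    """Checks 1 and 2: compare a HEAD response against the cached copy."""
    cached_etag = cached_headers.get("ETag")
    if cached_etag is not None and "ETag" in head_headers:
        return head_headers["ETag"] == cached_etag          # check 2
    last_modified = head_headers.get("Last-Modified")
    cached_date = cached_headers.get("Date")
    if last_modified and cached_date:                       # check 1
        return parsedate_to_datetime(last_modified) < parsedate_to_datetime(cached_date)
    return False   # nothing to compare: fall back to a normal GET


cached = {
    "Date": "Mon, 01 Jan 2024 12:00:00 GMT",
    "Expires": "Tue, 02 Jan 2024 12:00:00 GMT",
    "ETag": '"v1"',
}
now = datetime(2024, 1, 1, 13, 0, tzinfo=timezone.utc)
print(is_fresh(cached, now), is_still_valid(cached, {"ETag": '"v1"'}))
```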
Character Sets
Accept-Charset: request header listing character sets that the client can
recognize
Content-Type: can include character set used to represent the body of the
HTTP message
US-ASCII character set can be used for such documents, but is not recommended
UTF-8 can represent all ASCII characters using a single byte each and
arbitrary Unicode characters using up to 4 bytes each
If 21 bits are used for every character, the request and response messages exchanged between client and server become long (three times longer than ASCII)
Character encoding is a bit string that must be decoded into a code-point integer that
is then mapped to a character according to the definition provided by some character
set
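The relationship between characters, code points, and UTF-8 byte strings can be checked directly. The sample characters below are arbitrary choices for illustration, and `describe` is a hypothetical helper name.

```python
def describe(ch: str) -> tuple[int, int]:
    """Return (code point, number of bytes in the UTF-8 encoding)."""
    return ord(ch), len(ch.encode("utf-8"))


# An ASCII letter, an accented letter, the euro sign, and an emoji
for ch in ["A", "\u00e9", "\u20ac", "\U0001F600"]:
    code_point, size = describe(ch)
    print(f"U+{code_point:04X} uses {size} byte(s) in UTF-8")

# Decoding goes the other way: bytes -> code point -> character
euro = b"\xe2\x82\xac".decode("utf-8")
print(euro == "\u20ac")
```

ASCII characters keep their single-byte form, which is why UTF-8 avoids the size penalty of a fixed-width encoding for mostly-ASCII documents.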
Web Clients
A web client is software that accesses a web server by sending an HTTP request message and processing the resulting HTTP response
Some web clients are not designed to be used directly by humans, e.g., software robots (software-only clients such as search engine crawlers)
Any web client that is designed to directly support user access to web servers is known as a user agent
First graphical browser running on general-purpose platforms: Mosaic (1993), from NCSA (National Center for Supercomputing Applications)
Web Browsers
Browser Bars
Title bar: displays the title that the author gave the currently displayed document, the browser name, and the standard window management controls
Location bar: enter a URL to display the document located at that URL
Status bar: displays messages and icons related to the status of the browser (Resolving host, Connecting to, Waiting for, Transferring data from, Done)
Send an HTTP request over a TCP connection and wait for the server's response
Render (position the text and graphics appropriately within the browser
window and display) documents returned by a server
HTTP URLs
Fragment identifier not sent to server (used to scroll browser client area)
http-scheme URL
Path: the portion from the slash following the authority up to the question mark (?), or through the end of the URL if there is no question mark (e.g., /a/b/c.txt)
Query string: following the path there may be a question mark followed by information in string form (e.g., to pass search terms to a web server), up to a number sign (#)
The browser forms the Request-URI portion of an HTTP request message from a URL by concatenating the path and query portions of the URL (/a/b/c.txt?t=win&s=chess)
Fragment: the final, optional part of a URL, excluding the number sign. The string contained in the fragment is known as the fragment identifier and is used by the browser to scroll HTML documents
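The parts of an http-scheme URL can be pulled apart with Python's standard `urllib.parse.urlsplit`. In the sketch below the path and query come from the example in these notes, while the fragment `toc` is made up for illustration.

```python
from urllib.parse import urlsplit

url = "http://www.example.org:56789/a/b/c.txt?t=win&s=chess#toc"
parts = urlsplit(url)

print(parts.scheme)      # scheme
print(parts.netloc)      # authority (host and optional port)
print(parts.path)        # path
print(parts.query)       # query string
print(parts.fragment)    # fragment identifier, kept by the browser

# The Request-URI is the path plus the query; the fragment is never sent
request_uri = parts.path + ("?" + parts.query if parts.query else "")
print(request_uri)
```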
URL
The user types a URL in the location bar; the resulting HTTP request message begins with the start line:
GET /a/b/c.txt?t=win&s=chess HTTP/1.1
..
Host: www.example.org:56789
The fragment portion of the URL is not sent to the server, but is used by the browser to scroll the HTML document in the HTTP response
User-controllable Features
Standard features
Set preferences (Edit|Preferences) to customize browser functionality:
Accept-Language: Navigator|Languages -> Language For Web pages
Character set: Navigator|Languages -> Character Coding
Cache properties: Advanced -> Cache -> Set Cache Options box
HTTP parameters: Advanced -> HTTP Networking -> Direct Connections Option Box
Modify display style (e.g., increase font sizes): View -> Use Style, View -> Text Zoom
Document meta information - Display raw HTML and HTTP header info (e.g.,
Last-Modified)
Web Browsers
Additional functionality:
Event handling (e.g., mouse clicks, mouse movement, and events not under user control, such as the browser finishing its rendering)
Managing Cookies
Web Servers
Basic functionality:
Server calls TCP software and waits for connection requests on one or more ports
When a connection request is received, the server dedicates a subtask to handle this connection
Map Host header to specific virtual host (one of many host names sharing an IP address)
Map type of resource to appropriate MIME type and use to set Content-Type header in HTTP response
If the TCP connection is kept alive, the server monitors it for further requests from the client until a certain length of time has elapsed
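Two of the steps above, mapping the Host header to a virtual host and mapping a resource to a MIME type, can be sketched as follows. The `VIRTUAL_HOSTS` table, its directory paths, and the function name `resolve` are hypothetical; a real server would also guard against paths that escape the document root.

```python
import mimetypes

# Hypothetical virtual hosts sharing one IP address, each with its own directory
VIRTUAL_HOSTS = {
    "www.example.org": "/srv/www/example-org",
    "www.example.com": "/srv/www/example-com",
}


def resolve(host_header: str, request_path: str):
    """Map a Host header and request path to (file path, Content-Type)."""
    host = host_header.split(":")[0].lower()     # drop an optional port number
    root = VIRTUAL_HOSTS.get(host)
    if root is None:
        return None                              # unknown virtual host
    content_type, _ = mimetypes.guess_type(request_path)
    return root + request_path, content_type or "application/octet-stream"


print(resolve("www.example.org:56789", "/a/b/c.txt"))
```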
Web Servers-History
Several individuals running httpd created updates to the open-source httpd software. These updates are called patches, hence "a patchy server": the Apache server. The first public release of this free, open-source software came in April 1995, and it became a widely used server
IIS vs. Apache
The servlet container provides a JVM (Java Virtual Machine) to run Java servlets and provides communication between the servlets and Apache/IIS
Server configuration is broken into two areas: external communication and internal processing
Allowed/blocked IP addresses
These parameters determine the performance of a server; changing the values of these and similar parameters in order to optimize performance is often referred to as tuning the server
Tuning is done by trial and error, changing the parameter values. Load generation and stress test tools are used to simulate requests to a web server and analyze its performance
Logging preferences
Browse to
http://localhost:8080
and click on Server Administration link
Connector Component
Each host and context is associated with a directory in the server's file system
Logging
The primary web server log recording normal activity is an access log, a file that records information about every HTTP request processed by the server
Information in a log entry:
Host name
User name
Date and time of response, plus the time zone
Start line of HTTP request
HTTP status code of response
Number of bytes sent in body of response
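The fields listed above can be extracted from an access log entry with a regular expression. The sketch assumes the widely used common log format, in which an extra identity field (usually "-") sits between the host name and the user name; the sample line and the pattern name are made up for illustration.

```python
import re

# Assumed layout: host ident user [date time zone] "start line" status bytes
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\d+|-)'
)

line = ('www.example.org - admin [20/Jul/2005:08:03:22 -0500] '
        '"GET /admin/frameset.jsp HTTP/1.1" 200 920')

match = LOG_PATTERN.match(line)
entry = match.groupdict() if match else {}
print(entry)
```

A log analyzer would apply the same pattern to every line and then aggregate, for example by counting entries per day or per status code.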
Advantage: log files are used by log analyzers, which produce reports on various aspects of site usage (number of accesses per day, percentage of requests that received an error status code, breakdown of accesses by domain)
Access Control
Valve component: objects of type Remote Host Valve and Remote Address Valve specify allow and deny lists of clients (e.g., *.example.org in the allow list, baduser.example.org in the deny list)
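The allow/deny idea can be sketched with shell-style wildcard matching. This is not the server's actual valve implementation; the function name `is_allowed` and the rule that a deny match takes precedence over an allow match are assumptions for illustration.

```python
from fnmatch import fnmatchcase


def is_allowed(host: str, allow: list[str], deny: list[str]) -> bool:
    """Return True if the host matches the allow list and not the deny list."""
    host = host.lower()
    if any(fnmatchcase(host, pattern.lower()) for pattern in deny):
        return False                 # assumed: a deny match always wins
    return any(fnmatchcase(host, pattern.lower()) for pattern in allow)


allow_list = ["*.example.org"]
deny_list = ["baduser.example.org"]

for client in ("www.example.org", "baduser.example.org", "www.example.com"):
    print(client, is_allowed(client, allow_list, deny_list))
```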
Secure Server
HTTP request and response messages are simple text carried by TCP/IP; each message travels through a number of machines before reaching its destination
Eavesdropper: any machine other than the sender or receiver that extracts information from network messages
[RFC-2246]
During the handshake, client and server agree on various parameters used to encrypt messages
The server sends a certificate to the client; the certificate helps avoid man-in-the-middle attacks and unauthorized access
Case Study
The user (blogger) is able to add text entries to the blog. The most recent entry appears at the beginning of the web page, followed by the next most recent, and so on for all entries made during the current month.