
Unit - IV

ADVANCED SOCKETS

4.1 IPv4 and IPv6 Interoperability

4.1.1 Introduction

The Internet will probably move from IPv4 (the current version) to IPv6 (the next generation) gradually, so it is important that existing IPv4 applications continue to work with newer IPv6 applications.

• A vendor should not have to provide two Telnet clients, one that works with IPv4 servers and one that works with IPv6 servers.
• Instead, there should be one IPv6 Telnet client that can work with both IPv4 and IPv6 servers.
• Likewise, there should be one Telnet server that can work with both IPv4 and IPv6 clients.
• This is possible when the hosts are running dual stacks, that is, both an IPv4 protocol stack and an IPv6 protocol stack.

There are four combinations of clients and servers using either IPv4 or IPv6, and we show these in Figure 4.1.
                 IPv4 server                  IPv6 server
IPv4 client      Almost all existing          Discussed in Section 4.1.2
                 clients and servers
IPv6 client      Discussed in Section 4.1.3   Simple modification to most
                                              existing clients and servers

Fig 4.1 Combinations of clients and servers using IPv4 or IPv6.

The following concepts are discussed in this chapter.

• IPv4 client, IPv6 server over dual-stack server host
• IPv6 client over dual-stack client host, IPv4 server
• IPv6 address macro, function and option
• Source code portability

4.1.2 IPv4 Client, IPv6 Server


A general property of a dual-stack host is that IPv6 servers can handle both IPv4 and IPv6 clients. This is done using IPv4-mapped IPv6 addresses. Figure 4.2 shows an example of this.
[Figure 4.2: an IPv6 client and an IPv4 client on the left; on the right, an IPv6 server on a dual-stack host with an IPv6 listening socket bound to 0::0, port 8888. The server host has IPv4 address 206.62.226.42 and IPv6 address 5f1b:df00:ce3e:e200:20:800:2b37:6426. The IPv4 client's frame is Enet hdr (type 0800), IPv4 hdr, TCP hdr (dport 8888), TCP data; the IPv6 client's frame is Enet hdr, IPv6 hdr, TCP hdr, TCP data. The IPv4 client's address reaches the server as an IPv4-mapped IPv6 address.]

Fig 4.2 IPv6 server on dual-stack host serving IPv4 and IPv6 clients

From Fig 4.2:

• An IPv4 client and an IPv6 client are on the left. The server on the right is written using IPv6 and is running on a dual-stack host.
• The server has created an IPv6 listening TCP socket that is bound to the IPv6 wildcard address and TCP port 8888.
• Assume that the clients and server are on the same Ethernet.
• Routers could also connect them, as long as all the routers support IPv4 and IPv6.
• Assume that both clients send SYN segments to establish a connection with the server.

IPv4 client

• The IPv4 client's host will send the SYN in an IPv4 datagram.
• The TCP segment from the IPv4 client appears on the wire as an Ethernet header followed by an IPv4 header, a TCP header, and the TCP data.
• The Ethernet header contains a type field of 0x0800, which identifies the frame as an IPv4 frame.
• The TCP header contains the destination port of 8888.
• The destination IP address in the IPv4 header would be 206.62.226.42.


NetworkProgramming
and Management

IPv6 client

• The IPv6 client's host will send the SYN in an IPv6 datagram.
• The TCP segment from the IPv6 client appears on the wire as an Ethernet header followed by an IPv6 header, a TCP header, and the TCP data.
• The Ethernet header contains a type field of 0x86dd, which identifies the frame as an IPv6 frame.
• The TCP header in the IPv6 packet has the same format as the TCP header in the IPv4 packet, and contains the destination port of 8888.
• The destination IP address in the IPv6 header would be 5f1b:df00:ce3e:e200:20:800:2b37:6426.

The receiving datalink looks at the Ethernet type field and passes each frame to the appropriate IP module.
Address conversion

• The IPv4 module, probably in conjunction with the TCP module, detects that the destination socket is an IPv6 socket, and the source IPv4 address in the IPv4 header is converted into the equivalent IPv4-mapped IPv6 address.
• That mapped address is returned to the IPv6 socket as the client's IPv6 address when accept returns to the server with the IPv4 client connection.
• All remaining datagrams for this connection are IPv4 datagrams.
• When accept returns to the server with the IPv6 client connection, the client's IPv6 address does not change from whatever source address appears in the IPv6 header.
• All remaining datagrams for this connection are IPv6 datagrams.

We can summarize the steps that allow an IPv4 TCP client to communicate with an IPv6 server.
1. The IPv6 server starts, creates an IPv6 listening socket, and binds the wildcard address to the socket.
2. The IPv4 client calls gethostbyname and finds an A record for the server. The server host will have both an A record and a AAAA record, since it supports both protocols, but the IPv4 client asks for only an A record.
3. The client calls connect and the client's host sends an IPv4 SYN to the server.
4. The server host receives the IPv4 SYN directed to the IPv6 listening socket, sets a flag indicating that this connection is using IPv4-mapped IPv6 addresses, and responds with an IPv4 SYN/ACK. When the connection is established, the address returned to the server by accept is the IPv4-mapped IPv6 address.
5. All communication between this client and server takes place using IPv4 datagrams.
6. Unless the server explicitly checks whether this IPv6 address is an IPv4-mapped IPv6 address (using the IN6_IS_ADDR_V4MAPPED macro), the server never knows that it is communicating with an IPv4 client. The dual protocol stack handles this detail. Similarly, the IPv4 client has no idea that it is communicating with an IPv6 server.
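The check mentioned in step 6 can be sketched as follows. This is a minimal illustration, not code from the text: the helper name peer_to_text is hypothetical, and it assumes the server already holds the sockaddr_in6 filled in by accept.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>

/* Hypothetical helper: write a printable form of the peer address into buf.
 * For an IPv4-mapped IPv6 address the embedded IPv4 address is printed,
 * otherwise the full IPv6 address is printed.
 * Returns 1 if the peer is really an IPv4 client, 0 if it is an IPv6 client. */
int peer_to_text(const struct sockaddr_in6 *peer, char *buf, socklen_t len)
{
    assert(peer != NULL && buf != NULL);

    if (IN6_IS_ADDR_V4MAPPED(&peer->sin6_addr)) {
        struct in_addr v4;

        /* the IPv4 address occupies the last 4 of the 16 bytes */
        memcpy(&v4, &peer->sin6_addr.s6_addr[12], sizeof(v4));
        inet_ntop(AF_INET, &v4, buf, len);
        return 1;
    }
    inet_ntop(AF_INET6, &peer->sin6_addr, buf, len);
    return 0;
}
```

A server would typically call this right after accept returns, passing the client address structure and a buffer of at least INET6_ADDRSTRLEN bytes.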

An underlying assumption in this scenario is that the dual-stack server host has both an IPv4 address and an IPv6 address. This will work until all the IPv4 addresses are taken.

The scenario is similar for an IPv6 UDP server, but the address format can change for each datagram. For example, if the IPv6 server receives a datagram from an IPv4 client, the address returned by recvfrom will be the client's IPv4-mapped IPv6 address. The server responds to this client's request by calling sendto with the IPv4-mapped IPv6 address as the destination. This address format tells the kernel to send an IPv4 datagram to the client. But the next datagram received for the server could be an IPv6 datagram, and recvfrom will return the IPv6 address. If the server responds, the kernel will generate an IPv6 datagram.

Figure 4.3 summarizes how a received IPv4 or IPv6 datagram is processed, depending on the type of the receiving socket, for TCP and UDP, assuming a dual-stack host.
[Figure 4.3: IPv4 sockets (AF_INET, SOCK_STREAM or SOCK_DGRAM, sockaddr_in) and IPv6 sockets (AF_INET6, SOCK_STREAM or SOCK_DGRAM, sockaddr_in6) above TCP and UDP, which sit above the IPv4 and IPv6 protocol boxes. Arrows show the address returned by accept or recvfrom for a received IPv4 datagram and for a received IPv6 datagram.]

Fig. 4.3 Processing of received IPv4 or IPv6 datagrams, depending on type of receiving socket

• If an IPv4 datagram is received for an IPv4 socket, nothing special is done. These are the two arrows labeled "IPv4" in the figure, one to TCP and one to UDP. IPv4 datagrams are exchanged between the client and server.
• If an IPv6 datagram is received for an IPv6 socket, nothing special is done. These are the two arrows labeled "IPv6" in the figure, one to TCP and one to UDP. IPv6 datagrams are exchanged between the client and server.
• But when an IPv4 datagram is received for an IPv6 socket, the kernel returns the corresponding IPv4-mapped IPv6 address as the address returned by accept (TCP) or recvfrom (UDP). These are the two dashed arrows in the figure. This mapping is possible because an IPv4 address can always be represented as an IPv6 address. IPv4 datagrams are exchanged between the client and server.

The converse of the previous bullet (an IPv6 datagram is received for an IPv4 socket) is false: in general an IPv6 address cannot be represented as an IPv4 address. Therefore there are no arrows from the IPv6 protocol box to the two IPv4 sockets.
Most dual-stack hosts should use the following rules in dealing with listening sockets:

1. A listening IPv4 socket can accept incoming connections from only IPv4 clients.
2. If a server has a listening IPv6 socket that has bound the wildcard address, that socket can accept incoming connections from either IPv4 clients or IPv6 clients. For a connection from an IPv4 client, the server's local address for the connection will be the corresponding IPv4-mapped IPv6 address.
3. If a server has a listening IPv6 socket that has bound an IPv6 address other than an IPv4-mapped IPv6 address, that socket can accept incoming connections from IPv6 clients only.

4.1.3 IPv6 Client, IPv4 Server

Consider an IPv6 TCP client running on a dual-stack host.

1. An IPv4 server starts on an IPv4-only host and creates an IPv4 listening socket.
2. The IPv6 client starts and calls gethostbyname asking for only IPv6 addresses (it enables the RES_USE_INET6 option). Since the IPv4-only server host has only A records, an IPv4-mapped IPv6 address is returned to the client.
3. The IPv6 client calls connect with the IPv4-mapped IPv6 address in the IPv6 socket address structure. The kernel detects the mapped address and automatically sends an IPv4 SYN to the server.
4. The server responds with an IPv4 SYN/ACK, and the connection is established using IPv4 datagrams.

We can summarize this scenario in Figure 4.4.

[Figure 4.4: IPv4 sockets (AF_INET, SOCK_STREAM or SOCK_DGRAM, sockaddr_in) and IPv6 sockets (AF_INET6, SOCK_STREAM or SOCK_DGRAM, sockaddr_in6) above TCP and UDP, which sit above the IPv4 and IPv6 protocol boxes. Arrows show whether an IPv4 datagram or an IPv6 datagram is sent for each address type.]

Fig. 4.4 Processing of client requests, depending on address type and socket type

• If an IPv4 TCP client calls connect specifying an IPv4 address, or if an IPv4 UDP client calls sendto specifying an IPv4 address, nothing special is done. These are the two arrows labeled "IPv4" in the figure.
• If an IPv6 TCP client calls connect specifying an IPv6 address, or if an IPv6 UDP client calls sendto specifying an IPv6 address, nothing special is done. These are the two arrows labeled "IPv6" in the figure.
• If an IPv6 TCP client specifies an IPv4-mapped IPv6 address to connect, or if an IPv6 UDP client specifies an IPv4-mapped IPv6 address to sendto, the kernel detects the mapped address and causes an IPv4 datagram to be sent, instead of an IPv6 datagram. These are the two dashed arrows in the figure.
• An IPv4 client cannot specify an IPv6 address to either connect or sendto, because a 16-byte IPv6 address does not fit in the 4-byte in_addr structure within the IPv4 sockaddr_in structure. Therefore there are no arrows from the IPv4 clients to the IPv6 protocol box in the figure.

For an IPv4 datagram arriving for an IPv6 server socket, the conversion of the received address to the IPv4-mapped IPv6 address is done by the kernel and returned transparently to the application by accept or recvfrom.

For an IPv4 datagram needing to be sent on an IPv6 socket, the conversion of the IPv4 address to the IPv4-mapped IPv6 address is done by the resolver, and the mapped address is then passed transparently by the application to connect or sendto.
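To make the mapped format concrete, the conversion that the resolver performs can be sketched by hand. This is only an illustration under stated assumptions: the function name v4_to_mapped is hypothetical, and real programs normally receive the mapped address from the resolver or the kernel rather than building it themselves.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>

/* Hypothetical helper: build the IPv4-mapped IPv6 address ::ffff:a.b.c.d
 * that corresponds to the IPv4 address *v4. */
void v4_to_mapped(const struct in_addr *v4, struct in6_addr *mapped)
{
    assert(v4 != NULL && mapped != NULL);

    memset(mapped, 0, sizeof(*mapped));            /* bytes 0-9 are zero      */
    mapped->s6_addr[10] = 0xff;                    /* bytes 10-11 are 0xffff  */
    mapped->s6_addr[11] = 0xff;
    memcpy(&mapped->s6_addr[12], v4, sizeof(*v4)); /* bytes 12-15: IPv4 addr  */
}
```

An IPv6 client that stores this result in the sin6_addr member of a sockaddr_in6 and calls connect would, on a dual-stack host, cause IPv4 datagrams to be sent.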
Summary of interoperability

Figure 4.5 summarizes this section and the previous section and the combinations of clients and servers.

                                IPv4 server     IPv6 server     IPv4 server      IPv6 server
                                IPv4-only host  IPv6-only host  dual-stack host  dual-stack host
                                (A only)        (AAAA only)     (A and AAAA)     (A and AAAA)
IPv4 client, IPv4-only host     IPv4            (no)            IPv4             IPv4
IPv6 client, IPv6-only host     (no)            IPv6            (no)             IPv6
IPv4 client, dual-stack host    IPv4            (no)            IPv4             IPv4
IPv6 client, dual-stack host    IPv4            IPv6            (no*)            IPv6

Figure 4.5 Summary of interoperability between IPv4 and IPv6 clients and servers

Each box contains "IPv4" or "IPv6" if the combination is OK, indicating which protocol is used, or "(no)" if the combination is invalid.

The third column on the final row is marked with an asterisk because interoperability depends on the address chosen by the client. Choosing the AAAA record and sending an IPv6 datagram will not work. But choosing the A record, which is returned to the client as an IPv4-mapped IPv6 address, causes an IPv4 datagram to be sent, which will work.

Although it appears that one-fourth of the table will not interoperate, in the real world for the foreseeable future, most implementations of IPv6 will be on dual-stack hosts and will not be IPv6-only implementations. If we then remove the second row and second column, all of the "(no)" entries disappear and the only problem is the entry with the asterisk.
4.1.4 IPv6 Address Testing Macros

IPv6 Address Macro, Function, Option

• IPv6 address testing macros: IN6_IS_ADDR_* (e.g. IN6_IS_ADDR_V4MAPPED)
• Protocol-independent socket address functions: sock_* (e.g. sock_cmp_addr)
• IPV6_ADDRFORM socket option: changes a socket type between IPv4 and IPv6, by calling the setsockopt function with the IPV6_ADDRFORM option

There is a small class of IPv6 applications that must know whether they are talking to an IPv4 peer. These applications need to know if the peer's address is an IPv4-mapped IPv6 address. Twelve macros are defined to test an IPv6 address for certain properties.

#include <netinet/in.h>

int IN6_IS_ADDR_UNSPECIFIED(const struct in6_addr *aptr);
int IN6_IS_ADDR_LOOPBACK(const struct in6_addr *aptr);
int IN6_IS_ADDR_MULTICAST(const struct in6_addr *aptr);
int IN6_IS_ADDR_LINKLOCAL(const struct in6_addr *aptr);
int IN6_IS_ADDR_SITELOCAL(const struct in6_addr *aptr);
int IN6_IS_ADDR_V4MAPPED(const struct in6_addr *aptr);
int IN6_IS_ADDR_V4COMPAT(const struct in6_addr *aptr);
int IN6_IS_ADDR_MC_NODELOCAL(const struct in6_addr *aptr);
int IN6_IS_ADDR_MC_LINKLOCAL(const struct in6_addr *aptr);
int IN6_IS_ADDR_MC_SITELOCAL(const struct in6_addr *aptr);
int IN6_IS_ADDR_MC_ORGLOCAL(const struct in6_addr *aptr);
int IN6_IS_ADDR_MC_GLOBAL(const struct in6_addr *aptr);

All return: nonzero if IPv6 address is of specified type, 0 otherwise


The first seven macros test the basic type of IPv6 address. The final five macros test the scope of an IPv6 multicast address.

An IPv6 client could call the IN6_IS_ADDR_V4MAPPED macro to test the IPv6 address returned by the resolver. An IPv6 server could call this macro to test the IPv6 address returned by accept or recvfrom.
4.1.5 IPV6_ADDRFORM Socket Option

The IPV6_ADDRFORM socket option can change a socket from one type to another, subject to the following restrictions:

1. An IPv4 socket can always be changed to an IPv6 socket. Any IPv4 addresses already associated with the socket are converted to IPv4-mapped IPv6 addresses.
2. An IPv6 socket can be changed to an IPv4 socket only if any addresses already associated with the socket are IPv4-mapped IPv6 addresses.

The reason for wanting to change the address format of a socket is that descriptors can be passed between processes easily under Unix.

As an example, consider a process that creates a listening IPv4 socket and then accepts a connection from an IPv4 client. This server then calls fork and exec, starting a new program to handle the client. Assume that the convention with this application is that the connected socket is passed to the new program as standard input, standard output, and standard error. We could have the pseudocode shown in Figure 4.6. The only difference from our concurrent server is duplicating the connected socket to the agreed-on descriptors and then calling exec.
But the program that is execed expects an IPv6 socket. We can use the IPV6_ADDRFORM socket option to convert the socket's address format, as shown in Figure 4.7.
int                 listenfd, connfd;
socklen_t           clilen;
struct sockaddr_in  serv, cli;                  /* IPv4 structs */

listenfd = Socket(AF_INET, SOCK_STREAM, 0);     /* IPv4 socket */

/* fill in serv{} with well-known port */

Bind(listenfd, &serv, sizeof(serv));
Listen(listenfd, LISTENQ);

for ( ; ; ) {
    clilen = sizeof(cli);
    connfd = Accept(listenfd, &cli, &clilen);

    if (Fork() == 0) {                          /* child */
        Close(listenfd);
        Dup2(connfd, STDIN_FILENO);
        Dup2(connfd, STDOUT_FILENO);
        Dup2(connfd, STDERR_FILENO);
        Close(connfd);

        /* start new program */
        Exec(...);
    }

    Close(connfd);                              /* parent */
}

Fig. 4.6 Server that accepts incoming connection and execs new program

int                  af;
socklen_t            clilen;
struct sockaddr_in6  cli;                       /* IPv6 struct */
struct hostent       *ptr;

af = AF_INET6;
setsockopt(STDIN_FILENO, IPPROTO_IPV6, IPV6_ADDRFORM, &af, sizeof(af));

clilen = sizeof(cli);
Getpeername(0, &cli, &clilen);

ptr = gethostbyaddr(&cli.sin6_addr, 16, AF_INET6);

Fig 4.7 Converting an IPv4 socket to an IPv6 socket

In Fig 4.7, the call to setsockopt changes the address format of the socket from IPv4 to IPv6, and the call to getpeername will return an IPv4-mapped IPv6 address, assuming the socket was an IPv4 socket. But if this program is execed with an IPv6 socket on standard input, the call to setsockopt does nothing, as the address format is already IPv6.

If getsockopt is called for IPV6_ADDRFORM, the returned value is either AF_INET or AF_INET6, depending on the address format of the socket. The second argument to getsockopt or setsockopt can be either IPPROTO_IP or IPPROTO_IPV6.
4.1.6 Source Code Portability

• Automatic program conversion from IPv4 to IPv6 (sockaddr_in, AF_INET, etc.), with #ifdef to use IPv6 when possible
• Deal with socket address structures as opaque objects
• Remove gethostbyname and gethostbyaddr; use getaddrinfo and getnameinfo (which use #ifdef internally)
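The last two points can be sketched with a pair of small helpers that never look inside the address structure. The names text_to_sock and sock_to_text are hypothetical; the numeric-only flags are used here so that the example does not depend on DNS.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: turn a numeric host and port into a socket address
 * of whichever family (IPv4 or IPv6) the text implies.
 * Returns 0 if OK, or a getaddrinfo EAI_xxx error code. */
int text_to_sock(const char *host, const char *port,
                 struct sockaddr_storage *ss, socklen_t *len)
{
    struct addrinfo hints, *res;
    int rc;

    assert(ss != NULL && len != NULL);
    memset(&hints, 0, sizeof(hints));
    hints.ai_family   = AF_UNSPEC;              /* IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags    = AI_NUMERICHOST | AI_NUMERICSERV;

    rc = getaddrinfo(host, port, &hints, &res);
    if (rc != 0)
        return rc;
    memcpy(ss, res->ai_addr, res->ai_addrlen);  /* treat address as opaque */
    *len = res->ai_addrlen;
    freeaddrinfo(res);
    return 0;
}

/* Hypothetical helper: format any socket address as numeric host and port.
 * Returns 0 if OK, or a getnameinfo EAI_xxx error code. */
int sock_to_text(const struct sockaddr *sa, socklen_t salen,
                 char *host, socklen_t hostlen,
                 char *serv, socklen_t servlen)
{
    assert(sa != NULL);
    return getnameinfo(sa, salen, host, hostlen, serv, servlen,
                       NI_NUMERICHOST | NI_NUMERICSERV);
}
```

Because the caller only ever handles a sockaddr_storage and its length, the same code path serves both address families.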
4.2 Threaded Servers

4.2.1 Threads Introduction


Concurrency can be achieved using the fork approach (earlier) or the thread approach (current).

Problems with the fork approach for concurrency

1. Expensive
   • Memory is copied from parent to child
   • All descriptors are duplicated in the child
   • Can be implemented using copy-on-write (don't copy until the child needs its own copy)
2. IPC (inter-process communication) is required to pass information between parent and child after fork.

Threads are used to avoid these problems. All threads within a process share:

• Process instructions,
• Most data,
• Open files (e.g., descriptors),
• Signal handlers and signal dispositions,
• Current working directory, and
• User and group IDs.

But each thread has its own:

• Thread ID,
• Set of registers, including program counter and stack pointer,
• Stack (for local variables and return addresses),
• errno,
• Signal mask, and
• Priority.

4.2.2 Advantages of Threads

• Lightweight process
  - Thread creation can be 10-100 times faster than process creation.
  - Lower context-switching overhead.
  - All threads within a process share the same global memory.
• POSIX threads standard
  1. Pthreads library
  2. Supported by Linux and portable to most UNIX systems
  3. Has been ported to Windows
• Increased concurrency (i.e., interleave blocking operations, e.g., computation and I/O in one process)
• Simplified programming model (cf. IRP-driven programming)
• Increased responsiveness (balance of I/O and compute-bound threads)
• Resource sharing (i.e., leads to less memory being used)
• Economy (i.e., cheaper to create threads)
• Potential performance gain: utilization of multi-CPU architectures

4.2.3 Disadvantages

• Global variables are shared between threads: inadvertent modification of shared variables can be disastrous (need for concurrency control).
• Many library functions are not thread-safe.
  1. Library functions that return pointers to internal static arrays are not thread-safe.
  2. To make such a function thread-safe, the caller allocates space for the result and passes that pointer as an argument to the function.
• Lack of robustness: if one thread crashes, the whole application crashes.
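The point about library functions that return pointers to static arrays can be illustrated with a pair of made-up functions. Both names (port_label and port_label_r) are hypothetical; the _r suffix follows the common convention for reentrant variants.

```c
#include <assert.h>
#include <stdio.h>

/* Not thread-safe: every caller shares the one static buffer, so two threads
 * calling this at the same time can overwrite each other's result. */
const char *port_label(int port)
{
    static char buf[32];

    snprintf(buf, sizeof(buf), "port %d", port);
    return buf;
}

/* Thread-safe variant: the caller allocates space for the result and
 * passes a pointer to it, so no state is shared between callers. */
char *port_label_r(int port, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    snprintf(buf, len, "port %d", port);
    return buf;
}
```

Each thread would call port_label_r with a buffer on its own stack, which is private to that thread.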

4.2.4 Processes vs. Threads

Process (traditional)

• Unit of resource ownership with respect to the execution of a single program
• Can encompass more than one thread of execution
• One thread of control through a large, potentially sparse address space
• Address space may be shared with other processes (shared memory)
• Collection of system resources (files, semaphores)
• Expensive
• Private multiple address spaces
• Less required synchronization

Thread (lightweight process)

• Unit of execution
• Belongs to a process
• Can be traced (i.e., list the sequence of instructions)
• A flow of control through an address space
• Each address space can have multiple concurrent control flows
• Each thread has access to the entire address space
• Potentially parallel execution, minimal state (low overheads)
• May need synchronization to control access to shared variables
• Cheap
• Shared single address space
• Fast IPC (inter-process communication) and synchronization

[Figure: Process 1, Process 2, and Process 3 above the kernel, compared with a single process containing several threads above the kernel.]

Use
• Processes are largely independent.
• Threads are part of the same "job" and are actively and closely cooperating.

Each thread has its own stack, PC, and registers; the threads share the address space, files, ...

[Figure 4.8: the thread stacks inside a process's virtual address space, which is gigabytes in size.]

Fig. 4.8 Thread Stack(s)

4.2.5 Why Use Threads?

• Large multiprocessors need many computing entities (one per CPU)
• Switching between processes incurs high overhead
• With threads, an application can avoid per-process overheads: thread creation, deletion, and switching are cheaper than the same operations on processes
• Threads have full access to the address space (easy sharing)
• Threads can execute in parallel on multiprocessors

4.2.6 Why Threads?

• Threads retain the idea of sequential processes with blocking system calls, and yet achieve parallelism
• Software engineering perspective
  - Applications are easier to structure as a collection of threads
  - Each thread performs several [mostly independent] tasks

4.2.7 Thread Management

• Creation and deletion of threads
  - Static versus dynamic
• Critical sections
  - Synchronization primitives: blocking, spin-lock (busy-wait)
  - Condition variables
• Global thread variables
• Kernel versus user-level threads

4.2.8 Thread Packages

• POSIX Threads (pthreads)
  - Widely used threads package
  - Conforms to the POSIX standard
  - Sample calls: pthread_create, ...
  - Typically used in C/C++ applications
  - Can be implemented as user-level or kernel-level or via LWPs
• Java Threads
  - Native thread support built into the language
  - Threads are scheduled by the JVM

4.2.9 Implementation of Threads - Overview

• In user space
• In kernel space

User-level Threads

• Thread management is done at the user level, e.g., in a threads library (a.k.a. a thread run-time system)
• The kernel knows nothing about the existence of threads
• Examples
  - POSIX Pthreads
  - Mach C-threads
  - Solaris threads

[Figure 4.9: user-level threads. The thread run-time system and the thread table live in user space; the kernel holds only the process control blocks (PCBs).]

Fig. 4.9 User-level threads

Calls into the run-time are like (local) procedure calls (no kernel mode switch, no trap, no context switch required).
Kernel Threads

• Supported by the kernel
• Thread creation/destruction is done in the kernel; kernel data structures are manipulated
• Same thread management info as for user-level threads
• Mgmt. info is a subset of the process context
• Process context is also additionally managed

[Figure 4.10: kernel threads, with the thread table kept in the kernel.]

Fig. 4.10 Kernel Thread

• Blocking calls yield scheduling decisions (same & diff. process)
• Thread recycling
• Considerable management cost and performance implications
• Examples: Windows 95/98/NT/2000, Solaris, Tru64 UNIX, Linux

4.2.10 Single and Multithreaded Processes

[Figure 4.11: a single-threaded process (code, data, files, registers, stack, one thread) beside a multithreaded process (code, data, and files, with registers and a stack for each thread).]

Fig. 4.11 Single and Multithreaded Processes

4.2.11 Shared and Private Items of Threads

Per-thread items
• Program counter
• Registers
• Stack
• Thread state

Per-process items
• Address space
• Global variables
• Open files
• Child processes
• Pending alarms
• Signals and signal handlers
• Accounting information

4.2.12 Client and Server with Threads

[Figure 4.12: a client in which thread 2 makes requests to the server, and a server in which receipt and queuing feeds N threads performing input-output.]

Fig. 4.12 Client and server with threads

4.2.13 Threaded Server Architectures

[Figure 4.13: three threaded server architectures: (a) thread-per-request, with workers; (b) thread-per-connection, with per-connection threads; (c) thread-per-object, with per-object threads.]

Fig. 4.13 Threaded Server Architectures

The thread-per-request architecture

• A coordinator thread reads incoming requests
• As soon as a request is read, it spawns (creates) a thread to handle the request
• The new thread
  - decodes the request;
  - calls the servant to perform the request;
  - exits

The thread-per-connection architecture

• The coordinator thread detects a new client
• It connects the incoming client to a new thread
• The new thread
  - decodes the request;
  - calls the servant to perform the request;
  - returns to read the next request from the same client
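The thread-per-connection architecture can be sketched in C with pthreads. This is a minimal sketch under stated assumptions: the function names are hypothetical, the "servant" is reduced to echoing the request back, and error handling is kept short.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <unistd.h>

/* Per-connection thread: serve requests from one client until it closes. */
static void *serve_connection(void *arg)
{
    int fd = *(int *)arg;
    char buf[512];
    ssize_t n;

    free(arg);                                /* allocated by the coordinator */
    while ((n = read(fd, buf, sizeof(buf))) > 0) {
        if (write(fd, buf, (size_t)n) != n)   /* "perform the request": echo */
            break;
    }
    close(fd);
    return NULL;
}

/* Connect one client descriptor to a new detached thread.
 * Returns 0 if OK, positive Exxx value on error. */
int start_connection_thread(int connfd)
{
    pthread_t tid;
    int rc;
    int *fdp = malloc(sizeof(*fdp));

    assert(connfd >= 0);
    if (fdp == NULL)
        return ENOMEM;
    *fdp = connfd;                            /* each thread gets its own copy */
    rc = pthread_create(&tid, NULL, serve_connection, fdp);
    if (rc != 0) {
        free(fdp);
        return rc;
    }
    pthread_detach(tid);                      /* nobody joins these threads */
    return 0;
}

/* Coordinator thread: detect each new client and hand it to a new thread.
 * Stops on any accept error; a fuller server would retry on EINTR. */
void coordinator_loop(int listenfd)
{
    for (;;) {
        int connfd = accept(listenfd, NULL, NULL);

        if (connfd < 0)
            break;
        if (start_connection_thread(connfd) != 0)
            close(connfd);
    }
}
```

Passing a heap copy of the descriptor, rather than the address of the coordinator's local variable, avoids a race in which the next accept overwrites the value before the new thread has read it.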

The thread-per-object server architecture

• Provides means to access objects remotely according to different activation policies
• Object creation on invocation vs. during server initialization
• Separate memory segments vs. sharing code (class definition)
• Thread policies (single, one per object, one per request) and pool vs. on-demand object adapter

4.2.14 Single-Threaded Web Server Implementation Issues

Overview

• Sequential processing of requests
• Finite state machine (event-driven programming model, interrupt-driven model)
• Processes instead of threads

Explanation

Sequential processing of requests
- Gets request, processes it, gets next
- CPU idle while data is retrieved from disk
- Poor performance

Finite state machine (interrupt/event-driven prg. model)
- Use non-blocking system calls (read)
- Record state of current request
- Event: Get next request
- Event: On reply from disk (signal/interrupt) process data read
- Acceptable performance
- Complicated to develop, debug, ...

Processes instead of threads
- Use of IPC facilities for communication (e.g., many CGI (Common Gateway Interface) and Web server implementations)
- Heavyweight solution

Single-thread Java server

• server: echoes lines from client

import java.io.*;
import java.net.*;

public class server {
    static String port = "5194";

    public static void main(String[] argv) {
        if (argv.length == 0)
            new server(port);
        else
            new server(argv[0]);
    }

    server(String port) {                 // tcp/ip version
        try {
            ServerSocket srv = new ServerSocket(Integer.parseInt(port));
            while (true) {
                Socket sock = srv.accept();
                System.err.println("server socket " + sock);
                new echo(sock);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

class echo {
    Socket sock;

    echo(Socket sock) throws IOException {
        BufferedReader in = new BufferedReader(
            new InputStreamReader(sock.getInputStream()));     // from socket
        BufferedWriter out = new BufferedWriter(
            new OutputStreamWriter(sock.getOutputStream()));   // to socket
        String s;
        while ((s = in.readLine()) != null) {
            out.write(s);
            out.newLine();
            out.flush();
            if (s.equals("exit"))
                break;
        }
        sock.close();
    }
}

• This is single-threaded: it services only one client at a time.
4.2.15 Multithreaded Server

Introduction to Multithreading

• A thread is an independent stream of execution
• Different threads in the same process share a global environment (e.g. the same object instance)
• The accesses of threads to shared resources must be coordinated (synchronized)
• Different threads may run on different processors if available, or share a single processor
• Threads are usually provided by operating systems, not programming languages
• Threads are used to increase performance:
  - Application-level parallelism
  - Parallelism on multiprocessor machines
  - Local communication is cheaper (vs. IPC context switching)
• Threads are used to improve the structure of a process or of a large program

Multi-threading Issues

• Safety: how to synchronize threads so that they do not interfere with one another
• Liveness: how to avoid deadlock situations to ensure that all threads make progress
• Performance: overhead (performance penalty) from context switching and synchronization
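The safety issue can be sketched with a shared counter protected by a mutex. The names and the thread and loop counts below are arbitrary choices for this illustration. Without the lock, the increments from different threads could interleave and some updates would be lost.

```c
#include <assert.h>
#include <pthread.h>

#define NTHREADS 4
#define NLOOPS   10000

static long counter;                         /* shared by all threads */
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

static void *add_to_counter(void *arg)
{
    (void)arg;
    for (int i = 0; i < NLOOPS; i++) {
        pthread_mutex_lock(&counter_lock);   /* enter critical section */
        counter++;
        pthread_mutex_unlock(&counter_lock); /* leave critical section */
    }
    return NULL;
}

/* Runs NTHREADS threads and returns the final counter value. */
long run_counter_demo(void)
{
    pthread_t tid[NTHREADS];

    counter = 0;
    for (int i = 0; i < NTHREADS; i++) {
        int rc = pthread_create(&tid[i], NULL, add_to_counter, NULL);
        assert(rc == 0);
    }
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(tid[i], NULL);          /* wait for every thread */
    return counter;
}
```

The lock also shows the performance cost mentioned above: every increment now pays for a lock and an unlock, so critical sections are usually kept as short as possible.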

Multithreaded server design issues:

• Simplifies server code
• Exploits parallelism for high performance (multiprocessor workstation)

[Figure 4.14: a request coming in from the network arrives at the server's dispatcher thread and is dispatched to a worker thread.]

Fig. 4.14 Multithreaded Server

From Fig. 4.14:

• The dispatcher thread waits for requests
• For each request, it chooses an idle worker thread
• The worker thread uses blocking system calls to service the web request

[Figure 4.15: a web server process in user space containing the request dispatch thread and the web server cache; the kernel in kernel space sits between the process and the network connection and disk.]

Fig. 4.15 Multithreaded web server

Model                      Characteristics
Threads                    Parallelism, blocking system calls
Single-threaded process    No parallelism, blocking system calls
Finite-state machine       Parallelism, nonblocking system calls

These are the three ways to construct a server.

Multi-threaded server

import java.io.*;
import java.net.*;

public class multiserver {
    static String port = "5194";

    public static void main(String[] argv) {
        if (argv.length == 0)
            multiserver(port);
        else
            multiserver(argv[0]);
    }

    public static void multiserver(String port) {    // tcp/ip version
        try {
            ServerSocket srv = new ServerSocket(Integer.parseInt(port));
            while (true) {
                Socket sock = srv.accept();
                System.err.println("multiserver " + sock);
                // echo is assumed here to service the client in its own
                // thread; that variant of the class is not shown
                new echo(sock);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
4.2.16 Multithreaded Client

Browsers such as Internet Explorer are multi-threaded. Such browsers can display data before the entire document is downloaded, and they perform multiple simultaneous tasks:

1. Fetch the main HTML page, and activate separate threads for the other parts
2. Each thread sets up a separate connection with the server
   • Uses blocking calls
3. Each part (e.g., a GIF image) is fetched separately and in parallel
4. Advantage: connections can be set up to different sources like an ad server, an image server, and a web server.

Example: Web Browsers

• Interact with the human user and the remote server in parallel (non-blocking towards the user)
• Hide communication latencies: do something else in parallel (e.g. Web browser)
• Load balancing (only useful if supported on the server side): connections to different replicas, data transfer in parallel
• Local data processing
• Compound documents: drag-and-drop; in-place editing (notifications)
• Components for distribution transparency

4.2.17 Common Thread Interface

• thread_create(...): creates a thread
• thread_wait(...): waits for a specific thread to exit
• thread_exit(...): terminates the calling thread
• thread_yield(...): the calling thread voluntarily passes control to another thread

4.3 Thread Creation and Termination

4.3.1 Basic Thread Functions

Note: pthread = POSIX thread

4.3.1.1 pthread_create Function

When a program is started by exec, a single thread is created, called the initial thread or main thread. Additional threads are created by pthread_create.

#include <pthread.h>

int pthread_create(pthread_t *tid, const pthread_attr_t *attr,
                   void *(*func)(void *), void *arg);

Returns: 0 if OK, positive Exxx value on error

• When a program is started, a single thread is created (the main thread).
• Create more threads by calling pthread_create().
• tid: pthread_t is often an unsigned int; the new thread ID is returned through this pointer.
• attr: the new thread's attributes: priority, initial stack size, daemon thread or not. To get the default attributes, pass NULL.
• func: address of a function to execute when the thread starts.
• arg: pointer to the argument to the function (for multiple arguments, package them into a structure and pass the address of the structure to the function).

Example

void *func(void *);     /* function prototype */
pthread_t tid;          /* to hold thread ID */

pthread_create(&tid, NULL, func, NULL);

void *func(void *arg)
{
    /* thread body */
    return NULL;
}

The return value from the pthread functions is normally 0 if OK or nonzero on an error. But unlike the socket functions, and most system calls, which return -1 on an error and set errno to a positive value, the pthread functions return the positive error indication (an Exxx value as in <sys/errno.h>) as the function's return value.
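A slightly fuller sketch shows both points at once: several arguments packaged into one structure, and the return value of pthread_create treated as the error number. The structure and function names are hypothetical and chosen only for this example.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

struct sum_args {          /* several arguments packaged into one structure */
    int a;
    int b;
    int result;            /* filled in by the thread */
};

static void *sum_thread(void *arg)
{
    struct sum_args *p = arg;

    p->result = p->a + p->b;
    return NULL;
}

/* Returns 0 if OK, positive Exxx value on error (the pthread convention). */
int sum_in_thread(struct sum_args *args)
{
    pthread_t tid;
    int rc;

    assert(args != NULL);
    rc = pthread_create(&tid, NULL, sum_thread, args);
    if (rc != 0) {
        /* rc itself is the error number; errno is not set */
        fprintf(stderr, "pthread_create: %s\n", strerror(rc));
        return rc;
    }
    return pthread_join(tid, NULL);   /* wait so that args->result is valid */
}
```

Note that the error is reported with strerror(rc) rather than perror, because perror reads errno, which the pthread functions do not set.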
4.3.1.2 pthread_join Function

#include <pthread.h>

int pthread_join(pthread_t tid, void **status);

Returns: 0 if OK, positive Exxx value on error

• Waits for a given thread to terminate (similar to waitpid() for Unix processes)
• We must specify the thread ID (tid) of the thread that we wish to wait for
• If the status argument is non-null, the return value from the thread (a pointer to some object) is stored in the location pointed to by status
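Retrieving a thread's return value through the status argument can be sketched as follows. The function names are hypothetical. The result is allocated on the heap in this sketch so that it remains valid after the thread has terminated.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Thread function: returns a heap-allocated result as its exit status. */
static void *square_thread(void *arg)
{
    int n;
    int *out;

    assert(arg != NULL);
    n = *(int *)arg;
    out = malloc(sizeof(*out));
    if (out != NULL)
        *out = n * n;
    return out;                  /* becomes the status seen by pthread_join */
}

/* Returns n*n computed in another thread, or -1 on failure. */
int square_in_thread(int n)
{
    pthread_t tid;
    void *status = NULL;
    int value = -1;

    /* &n stays valid because we join before this function returns */
    if (pthread_create(&tid, NULL, square_thread, &n) != 0)
        return -1;
    if (pthread_join(tid, &status) != 0)
        return -1;
    if (status != NULL) {
        value = *(int *)status;
        free(status);            /* the joining thread now owns the object */
    }
    return value;
}
```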
4.3.1.3 pthread_self Function

A thread fetches its own thread ID using pthread_self (similar to getpid() for Unix processes).

#include <pthread.h>
pthread_t pthread_self(void);

Returns: thread ID of calling thread

4.3.1.4 pthread_detach Function

#include <pthread.h>
int pthread_detach(pthread_t tid);

Returns: 0 if OK, positive Exxx value on error

A thread is either joinable (the default) or detached.

When a joinable thread terminates, its thread ID and exit status are retained until another thread calls pthread_join.

When a detached thread terminates, all its resources are released, and we cannot wait for it to terminate.

When one thread needs to know when another thread terminates, it is best to leave the thread as joinable.

The pthread_detach function changes the specified thread so that it is detached.

Example: pthread_detach(pthread_self());

4.3.1.5 pthread_exit Function

One way for a thread to terminate is to call pthread_exit.

#include <pthread.h>
void pthread_exit(void *status);

Does not return to caller

If the thread is not detached, its thread ID and exit status are retained for a later pthread_join by another thread in the calling process.

The pointer status must not point to an object that is local to the calling thread, since that object disappears when the thread terminates.

Other ways for a thread to terminate:

The function that started the thread (the third argument to pthread_create) returns, with its return value being the exit status of the thread.

The main function of the process returns, or any thread calls exit. In such a case, the process terminates, including any threads.
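A short sketch of the first way to terminate, under the assumption that the exit status is kept in static storage so that it outlives the thread. The names exit_code, finish, exit_worker, and exit_status_demo are hypothetical.

```c
#include <assert.h>
#include <pthread.h>

static int exit_code;   /* static storage: outlives the thread's stack */

/* Helper called from the thread; terminates the thread from a nested call. */
static void finish(int code)
{
    exit_code = code;
    pthread_exit(&exit_code);   /* does not return */
}

static void *exit_worker(void *arg)
{
    (void) arg;
    finish(42);
    return NULL;                /* never reached */
}

/* Returns the value the thread passed to pthread_exit, or -1 on error. */
int exit_status_demo(void)
{
    pthread_t tid;
    void *status = NULL;

    if (pthread_create(&tid, NULL, exit_worker, NULL) != 0)
        return -1;
    if (pthread_join(tid, &status) != 0)
        return -1;
    assert(status == &exit_code);   /* same pointer the thread handed over */
    return *(int *) status;
}
```

Unlike a return from the start function, pthread_exit can end the thread from any depth of the call chain, which is what finish does here.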

4.3.2 str_cli Function using Threads (TCP Echo Client using threads)

We recode the str_cli function to use threads instead of fork(). The design of our threads version is shown in Fig. 4.16.

Fig. 4.16 Recoding str_cli to use threads.

From Fig. 4.16:

unpthread.h header
It includes our normal unp.h header, followed by the <pthread.h> header, and then defines the function prototypes for our wrapper versions of the pthread_XXX functions, which all begin with Pthread_.

Save arguments in externals
We store the values of the two arguments to str_cli, fp (the standard I/O FILE pointer for the input file) and sockfd (the TCP socket that is connected to the server), in external variables.

Create new thread
The thread is created and the new thread ID is saved in tid.

The function that is executed by the new thread is copyto.

No arguments are passed to the thread.

#include "unpthread.h"

void *copyto(void *);

static int   sockfd;     /* global for both threads to access */
static FILE *fp;

void
str_cli(FILE *fp_arg, int sockfd_arg)
{
    char      recvline[MAXLINE];
    pthread_t tid;

    sockfd = sockfd_arg;    /* copy arguments to externals */
    fp = fp_arg;

    Pthread_create(&tid, NULL, copyto, NULL);

    while (Readline(sockfd, recvline, MAXLINE) > 0)
        Fputs(recvline, stdout);
}

void *
copyto(void *arg)
{
    char sendline[MAXLINE];

    while (Fgets(sendline, MAXLINE, fp) != NULL)
        Writen(sockfd, sendline, strlen(sendline));

    Shutdown(sockfd, SHUT_WR);    /* EOF on stdin, send FIN */

    return (NULL);    /* return (i.e., thread terminates) when end-of-file on stdin */
}
Fig. 4.17 str_cli function using threads
From Fig. 4.17:

Main thread loop: copy socket to standard output
The main thread calls readline and fputs, copying from the socket to the standard output.

Terminate
When the str_cli function returns, the main function terminates by calling exit, which terminates all threads in the process.

Copyto thread
This thread just copies from standard input to the socket.

When it reads an end-of-file on standard input, a FIN is sent across the socket by shutdown and the thread returns.

The return from this function terminates the thread.

4.4 TCP Echo Server using Threads

The TCP echo server uses one thread per client instead of one child process per client.

It is made protocol-independent by using our tcp_listen function.

#include "unpthread.h"

static void *doit(void *);    /* each thread executes this function */

int
main(int argc, char **argv)
{
    int              listenfd, connfd;
    socklen_t        addrlen, len;
    struct sockaddr *cliaddr;

    if (argc == 2)
        listenfd = Tcp_listen(NULL, argv[1], &addrlen);
    else if (argc == 3)
        listenfd = Tcp_listen(argv[1], argv[2], &addrlen);
    else
        err_quit("usage: tcpserv01 [ <host> ] <service or port>");

    cliaddr = Malloc(addrlen);

    for ( ; ; ) {
        len = addrlen;
        connfd = Accept(listenfd, cliaddr, &len);
        /* NULL thread-ID pointer kept from the original listing;
           portable code passes the address of a pthread_t */
        Pthread_create(NULL, NULL, &doit, (void *) connfd);
    }
}

static void *
doit(void *arg)
{
    Pthread_detach(pthread_self());
    str_echo((int) arg);    /* same function as before */
    Close((int) arg);       /* we are done with connected socket */
    return (NULL);
}
Fig. 4.18 TCP Echo server using Threads

From Fig. 4.18:

Create a thread
When accept returns, pthread_create is called instead of fork.

The first argument is a null pointer in this listing, since the thread ID is not used afterwards. POSIX does not define the result of passing a null pointer here, and many implementations require the address of a pthread_t variable.

The single argument that we pass to the doit function is the connected socket descriptor, connfd.

Thread function
doit is the function executed by the thread.

The thread detaches itself, since there is no reason for the main thread to wait for each thread that it creates.

The function str_echo does not change.

When this function returns, we must close the connected socket, since the thread shares all descriptors with the main thread.

With fork, the child did not need to close the connected socket because the child then terminated, and all open descriptors are closed on process termination.

The main thread does not close the connected socket, which we always did with a concurrent server that calls fork.

This is because all threads within a process share the descriptors, so if the main thread were to call close, it would terminate the connection.

Creating a new thread does not affect the reference counts for open descriptors, which is different from fork.

4.4.1 Passing Arguments to New Threads

The pthread_create() routine permits the programmer to pass one argument to the thread start routine.

For cases where multiple arguments must be passed, this limitation is easily overcome by creating a structure which contains all of the arguments, and then passing a pointer to that structure in the pthread_create() routine.

In Fig. 4.18 the integer variable connfd is cast to a void pointer, but this is not guaranteed to work on all systems. To handle this correctly requires additional work.

First notice that we cannot just pass the address of connfd to the new thread. That is, the following does not work:
int
main(int argc, char **argv)
{
    int       listenfd, connfd;
    pthread_t tid;
    /* ... */
    for ( ; ; ) {
        len = addrlen;
        connfd = Accept(listenfd, cliaddr, &len);
        Pthread_create(&tid, NULL, &doit, &connfd);    /* wrong: every thread gets the same address */
    }
}

static void *
doit(void *arg)
{
    int connfd;

    connfd = *((int *) arg);
    Pthread_detach(pthread_self());
    str_echo(connfd);    /* same function as before */
    Close(connfd);       /* we are done with connected socket */
    return (NULL);
}
From an ANSI C perspective this is acceptable: we can cast the integer pointer to a void * and then cast this pointer back to an integer pointer.

The problem in this program is what this pointer actually points to.

There is one integer variable, connfd, in the main thread, and each call to accept overwrites this variable with a new value (the connected descriptor). The following scenario can occur:

1. accept returns, connfd is stored, and the main thread calls pthread_create. The pointer to connfd is the final argument to pthread_create.

2. A thread is created and the doit function is scheduled to start executing.

3. Another connection is ready and the main thread runs again (before the newly created thread). accept returns, connfd is stored, and the main thread calls pthread_create.

Even though two threads are created, both will operate on the final value stored in connfd. The problem is that multiple threads are accessing a shared variable (the integer value in connfd) with no synchronization.

This problem is solved in the TCP echo server using threads (Fig. 4.18) by passing the value of connfd to pthread_create, instead of a pointer to the value.

This is fine given the way that C passes integer values to a called function (a copy of the value is pushed onto the stack for the called function).
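The pass-by-value idea can be sketched without sockets. This is an illustration only, not the text's listing: it assumes the platform provides intptr_t from <stdint.h> (an optional but widely available type), and the names seen, record, and pass_by_value_demo are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define NTHREADS 4

static int seen[NTHREADS];   /* slot i is written only by thread i */

/* The argument is the integer itself, carried inside the pointer value. */
static void *record(void *arg)
{
    int id = (int) (intptr_t) arg;

    assert(id >= 0 && id < NTHREADS);
    seen[id] = id + 100;
    return NULL;
}

/* Starts NTHREADS threads, each given its own loop index by value.
   Returns 1 if every thread saw the index intended for it. */
int pass_by_value_demo(void)
{
    pthread_t tid[NTHREADS];
    int i, started = 0, ok = 1;

    for (i = 0; i < NTHREADS; i++) {
        /* A copy of i is passed, so later changes to i cannot affect the thread. */
        if (pthread_create(&tid[i], NULL, record, (void *) (intptr_t) i) != 0)
            break;
        started++;
    }
    for (i = 0; i < started; i++)
        pthread_join(tid[i], NULL);
    for (i = 0; i < NTHREADS; i++)
        ok = ok && (seen[i] == i + 100);
    return ok;
}
```

Going through intptr_t avoids the size mismatch warnings that a direct cast between int and void * can produce, but it still relies on the integer surviving a round trip through a pointer.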

Fig. 4.19 shows a better solution to this problem.


#include "unpthread.h"

static void *doit(void *);    /* each thread executes this function */

int
main(int argc, char **argv)
{
    int              listenfd, *iptr;
    socklen_t        addrlen, len;
    struct sockaddr *cliaddr;

    if (argc == 2)
        listenfd = Tcp_listen(NULL, argv[1], &addrlen);
    else if (argc == 3)
        listenfd = Tcp_listen(argv[1], argv[2], &addrlen);
    else
        err_quit("usage: tcpserv01 [ <host> ] <service or port>");

    cliaddr = Malloc(addrlen);

    for ( ; ; ) {
        len = addrlen;
        iptr = Malloc(sizeof(int));
        *iptr = Accept(listenfd, cliaddr, &len);
        /* NULL thread-ID pointer kept from the original listing;
           portable code passes the address of a pthread_t */
        Pthread_create(NULL, NULL, &doit, iptr);
    }
}

static void *
doit(void *arg)
{
    int connfd;

    connfd = *((int *) arg);
    free(arg);

    Pthread_detach(pthread_self());
    str_echo(connfd);    /* same function as before */
    Close(connfd);       /* we are done with connected socket */
    return (NULL);
}
Fig. 4.19 TCP Echo server using threads with more portable argument passing.

From Fig. 4.19:

Each time we call accept, we first call malloc and allocate space for an integer variable, the connected descriptor.

This gives each thread its own copy of the connected descriptor.

The thread fetches the value of the connected descriptor and then calls free to release the memory.

Historically, the malloc and free functions have been non-reentrant.

That is, calling either function from a signal handler while the main thread is in the middle of one of these two functions has been a recipe for disaster, because of static data structures that are manipulated by these two functions.

POSIX therefore requires these two functions, along with many others, to be thread-safe. This is normally done by some form of synchronization performed within the library functions that is transparent to us.

4.4.2 Thread-safe Functions
We have to be careful with libraries.

If a function uses static variables (or global memory) without synchronization, it is not safe to use with threads.

A piece of code is thread-safe if it functions correctly during simultaneous execution by multiple threads.

It must satisfy the need for multiple threads to access the same shared data, and the need for a shared piece of data to be accessed by only one thread at any given time.

There are a few ways to achieve thread-safety:

re-entrancy: Basically, writing code in such a way that it can be interrupted during one task, reentered to perform another task, and then resumed on its original task. This usually precludes the saving of state information, such as by using static or global variables.

mutual exclusion: Access to shared data is serialized using mechanisms that ensure only one thread is accessing the shared data at any time. Great care is required if a piece of code accesses multiple shared pieces of data. Problems include race conditions, deadlocks, livelocks, starvation, and various other ills enumerated in many operating systems textbooks.

thread-local storage: Variables are localized so that each thread has its own private copy. The variables retain their values across subroutine and other code boundaries, and the code which accesses them might not be reentrant, but since they are local to each thread, they are thread-safe.

atomic operations: Shared data are accessed by using atomic operations which cannot be interrupted by other threads. This usually requires using special machine language instructions, which might be available in a runtime library. Since the operations are atomic, the shared data are always kept in a valid state, no matter what order threads access it. Atomic operations form the basis of many thread locking mechanisms.

A commonly used idiom combines these approaches:

make changes to a private copy of the shared data and then atomically update the shared data from the private copy. Thus, most of the code is concurrent, and little time is spent serialized.

A common technique for making a function thread-safe is to define a new function whose name ends in _r and that takes its state from the caller instead of keeping it in static storage.
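A familiar instance of the _r convention is the POSIX function strtok_r, the reentrant counterpart of strtok. The sketch below assumes a POSIX system; the function name count_words is hypothetical. The caller supplies the save pointer that strtok would otherwise keep in a hidden static variable.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>

/* Counts space-separated words; modifies buf in place, as strtok does. */
int count_words(char *buf)
{
    char *save = NULL;   /* caller-owned state replaces strtok's hidden static */
    char *tok;
    int n = 0;

    assert(buf != NULL);
    for (tok = strtok_r(buf, " ", &save); tok != NULL;
         tok = strtok_r(NULL, " ", &save))
        n++;
    return n;
}
```

Because each call carries its own save pointer, two threads can tokenize different buffers at the same time without interfering with each other.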

4.4.3 Thread-Specific Data

When converting existing functions to run in a threads environment, a common problem encountered is due to static variables, because multiple threads share the static buffer.
There are various solutions:

1. Use thread-specific data.

2. Change the calling sequence so that the caller packages all the arguments into a structure, and also store the static variables in the structure. Fig. 4.20 shows the new structure and the new function prototypes.

typedef struct {
    int     read_fd;        /* caller's descriptor to read from */
    char   *read_ptr;       /* caller's buffer to read into */
    size_t  read_maxlen;    /* caller's max # bytes to read */
    /* next three are used internally by the function */
    int     rl_cnt;         /* initialize to 0 */
    char   *rl_bufptr;      /* initialize to rl_buf */
    char    rl_buf[MAXLINE];
} Rline;

void    readline_rinit(int, void *, size_t, Rline *);
ssize_t readline_r(Rline *);
ssize_t Readline_r(Rline *);
Fig. 4.20 Data structure and function prototypes for reentrant version of readline.

3. Restructure the interface to avoid any static variables so that the function is thread-safe.

Thread-specific data is a common technique for making an existing function thread-safe.

Before describing the pthread functions that work with thread-specific data, the concept and a possible implementation are described.

Each system supports a limited number of thread-specific data items.

POSIX.1 requires this limit be no less than 128 (per process), and we assume this limit in the following example. The system (probably the threads library) maintains one array of structures per process, which we call key structures, as we show in Fig. 4.21.

[Figure: process-wide array Key[0] through Key[127]; each element holds a flag and a destructor pointer.]
Fig. 4.21 Possible implementation of thread-specific data

The flag in the key structure indicates whether this array element is currently in use, and all the flags are initialized to be "not in use".

When a thread calls pthread_key_create to create a new thread-specific data item, the system searches through its array of key structures and finds the first one not in use.

Its index, 0 through 127, is called the key, and this index is returned to the calling thread.

The "destructor pointer" is the other member of the key structure.

In addition to the process-wide array of key structures, the system maintains numerous pieces of information about each thread within a process. We call this a Pthread structure, and part of this information is a 128-element array of pointers, which we call the pkey array. We show this in Fig. 4.22.
[Figure: each thread (thread 0 through thread n) has a Pthread structure holding other thread information and the thread-specific data items pkey[0] through pkey[127], each a pointer that is initially NULL.]
Fig. 4.22 Information maintained by the system about each thread.

All entries in the pkey array are initialized to null pointers.

These 128 pointers are the "values" associated with each of the possible 128 "keys" in the process.

When we create a key with pthread_key_create, the system tells us its key (index).

Each thread can then store a value (pointer) for the key, and each thread normally obtains the pointer by calling malloc.

We now go through an example of how thread-specific data is used, assuming that our readline function uses thread-specific data to maintain per-thread state across successive calls to the function. Our readline function is modified to follow these steps.

1. A process is started and multiple threads are created.

2. One of the threads will be the first to call readline, and it in turn calls pthread_key_create. The system finds the first unused key structure in Fig. 4.21 and returns its index (0-127) to the caller. We assume an index of 1 in this example. The pthread_once function is used to guarantee that pthread_key_create is called only by the first thread to call readline.

3. readline calls pthread_getspecific to get the pkey[1] value for this thread, and the returned value is a null pointer. Therefore, readline calls malloc to allocate the memory that it needs to keep the per-thread information across successive calls to readline for this thread. readline initializes this memory as needed and calls pthread_setspecific to set the thread-specific data pointer (pkey[1]) for this key to point to the memory that it just allocated. Fig. 4.23 shows this, assuming that the calling thread is thread 0 in the process.
[Figure: thread 0's pkey[1] points to the malloced region; the Pthread structures of thread 0 through thread n are system data structures.]
Fig. 4.23 Associating malloced region with thread-specific data pointer

In this figure the Pthread structure is maintained by the system, but the actual thread-specific data that we malloc is maintained by our function. All that pthread_setspecific does is set the pointer for this key in the Pthread structure to point to our allocated memory. Similarly, all that pthread_getspecific does is return that pointer to us.

4. Another thread, say thread n, calls readline, perhaps while thread 0 is still executing within readline.
readline calls pthread_once to initialize the key for this thread-specific data item, but since the initialization function has already been called, it is not called again.

5. readline calls pthread_getspecific to fetch the pkey[1] pointer for this thread, and a null pointer is returned. This thread then calls malloc followed by pthread_setspecific, just like thread 0, initializing its thread-specific data for this key (1). It is shown in Fig. 4.24.

[Figure: thread 0 and thread n each have pkey[1] pointing to memory allocated by that thread; the Pthread structures are system data structures.]
Fig. 4.24 Data structures after thread n initializes its thread-specific data.

6. Thread n continues executing in readline, using and modifying its own thread-specific data.

What happens when a thread terminates?

If the thread has called our readline function, that function allocated a region of memory that needs to be freed. This is where the "destructor pointer" from Fig. 4.21 is used. When the thread that creates the thread-specific data item calls pthread_key_create, one argument to this function is a pointer to a destructor function.
When a thread terminates, the system goes through that thread's pkey array, calling the corresponding destructor function for each non-null pkey pointer.
What we mean by "corresponding destructor" is the function pointer stored in the key array in Fig. 4.21. This is how the thread-specific data is freed when a thread terminates.
The two functions that are normally called when dealing with thread-specific data are pthread_once and pthread_key_create.

#include <pthread.h>
int pthread_once(pthread_once_t *onceptr, void (*init)(void));
int pthread_key_create(pthread_key_t *keyptr, void (*destructor)(void *value));

Both return: 0 if OK, positive Exxx value on error

pthread_once() is normally called every time a function that uses thread-specific data is called, but it uses the value in the variable pointed to by onceptr to guarantee that the init() function is called only one time per process.
pthread_key_create() must be called only one time for a given key within a process. The destructor function will be called when a thread terminates, provided that thread has stored a non-null value for the key.
Typical usage of these two functions (ignoring error returns) is as follows:

pthread_key_t  rl_key;
pthread_once_t rl_once = PTHREAD_ONCE_INIT;

void
readline_destructor(void *ptr)
{
    free(ptr);
}

void
readline_once(void)
{
    pthread_key_create(&rl_key, readline_destructor);
}

ssize_t
readline(...)
{
    ...
    pthread_once(&rl_once, readline_once);

    if ((ptr = pthread_getspecific(rl_key)) == NULL) {
        ptr = Malloc(...);
        pthread_setspecific(rl_key, ptr);
        /* initialize memory pointed to by ptr */
    }
    ...
    /* use the values pointed to by ptr */
}

Every time readline is called, it calls pthread_once.

This function uses the value pointed to by its onceptr argument (the contents of the variable rl_once) to make certain that its init function is called only one time.

This initialization function, readline_once, creates the thread-specific data key that is stored in rl_key, and which readline then uses in calls to pthread_getspecific and pthread_setspecific.

The pthread_getspecific and pthread_setspecific functions are used to fetch and store the value associated with a key. This value is what we called the "pointer" in Fig. 4.22.
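The typical usage shown for readline can be turned into a small complete sketch. It is an illustration under assumptions, not the text's readline: instead of a read buffer, each thread keeps a private call counter, and the names counter_key, next_count, and tsd_demo are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

static pthread_key_t  counter_key;
static pthread_once_t counter_once = PTHREAD_ONCE_INIT;

static void counter_destructor(void *ptr)
{
    free(ptr);   /* run by the system when a thread that stored a value exits */
}

static void counter_make_key(void)
{
    pthread_key_create(&counter_key, counter_destructor);
}

/* Returns how many times the calling thread has called this function. */
int next_count(void)
{
    int *p;

    pthread_once(&counter_once, counter_make_key);   /* key is created once */
    if ((p = pthread_getspecific(counter_key)) == NULL) {
        p = calloc(1, sizeof(*p));                   /* first call in this thread */
        assert(p != NULL);
        pthread_setspecific(counter_key, p);
    }
    return ++(*p);
}

static void *call_three_times(void *arg)
{
    int *last = arg;

    next_count();
    next_count();
    *last = next_count();
    return NULL;
}

/* Runs two threads; returns 1 if each saw its own count reach 3. */
int tsd_demo(void)
{
    pthread_t a, b;
    int last_a = 0, last_b = 0;

    if (pthread_create(&a, NULL, call_three_times, &last_a) != 0)
        return 0;
    if (pthread_create(&b, NULL, call_three_times, &last_b) != 0) {
        pthread_join(a, NULL);
        return 0;
    }
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return last_a == 3 && last_b == 3;
}
```

If the counter were a single static int, the two threads would share it and neither could rely on seeing exactly 3. With thread-specific data each thread reaches 3 independently, and the destructor releases each thread's allocation when that thread terminates.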

#include <pthread.h>
void *pthread_getspecific(pthread_key_t key);

Returns: pointer to thread-specific data (possibly a null pointer)

int pthread_setspecific(pthread_key_t key, const void *value);

Returns: 0 if OK, positive Exxx value on error

These two functions are used to fetch and store the value associated with a key.

The value is normally a pointer, which normally points to dynamically allocated memory.

The argument to pthread_key_create is a pointer to the key, while the arguments to the get and set functions are the key itself.

4.5 Mutexes
Mechanisms for thread coordination and synchronization:

1. Semaphores

2. Mutex calls

Semaphores

A semaphore can allow some N threads to enter a critical region.

It is used when there is a limited number (but more than one) of copies of a shared resource.

It can be dynamically initialized. A thread calls a semaphore wait function before it enters a critical region.

A semaphore is a generalization of a mutex.
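A minimal sketch of a counting semaphore, assuming a system that supports unnamed POSIX semaphores through sem_init (Linux does; some systems only provide named semaphores). The names slots, use_resource, and semaphore_demo are hypothetical. Six threads compete for two copies of a resource, and a mutex-protected statistic records how many were inside at once.

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>

#define NWORKERS 6
#define NSLOTS   2    /* copies of the shared resource */

static sem_t slots;
static pthread_mutex_t stat_lock = PTHREAD_MUTEX_INITIALIZER;
static int inside, max_inside;

static void *use_resource(void *arg)
{
    (void) arg;
    sem_wait(&slots);                 /* blocks while NSLOTS threads are inside */

    pthread_mutex_lock(&stat_lock);
    if (++inside > max_inside)
        max_inside = inside;
    assert(inside <= NSLOTS);         /* the semaphore enforces this bound */
    pthread_mutex_unlock(&stat_lock);

    /* ... work with one copy of the resource would go here ... */

    pthread_mutex_lock(&stat_lock);
    inside--;
    pthread_mutex_unlock(&stat_lock);

    sem_post(&slots);                 /* leave the critical region */
    return NULL;
}

/* Returns the largest number of threads observed inside at once, or -1. */
int semaphore_demo(void)
{
    pthread_t tid[NWORKERS];
    int i, started = 0;

    inside = max_inside = 0;
    if (sem_init(&slots, 0, NSLOTS) != 0)   /* 0: shared between threads only */
        return -1;
    for (i = 0; i < NWORKERS; i++) {
        if (pthread_create(&tid[i], NULL, use_resource, NULL) != 0)
            break;
        started++;
    }
    for (i = 0; i < started; i++)
        pthread_join(tid[i], NULL);
    sem_destroy(&slots);
    return max_inside;
}
```

With NSLOTS set to 1 the semaphore behaves like a mutex, which is the sense in which a semaphore generalizes one.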

Mutex

Mutex is an abbreviation for "mutual exclusion". Mutex variables are one of the primary means of implementing thread synchronization and of protecting shared data when multiple writes occur.

A mutex variable acts like a "lock" protecting access to a shared data resource.

The basic concept of a mutex as used in Pthreads is that only one thread can lock (or own) a mutex variable at any given time. Thus, even if several threads try to lock a mutex, only one thread will be successful. No other thread can own that mutex until the owning thread unlocks that mutex. Threads must "take turns" accessing protected data.

A mutex allows one thread to enter a critical region.

Threads can create a mutex and initialize it. Before entering a critical region, lock the mutex.

Unlock the mutex after exiting the critical region.

Very often the action performed by a thread owning a mutex is the updating of global variables. This is a safe way to ensure that when several threads update the same variable, the final value is the same as what it would be if only one thread performed the update. The variables being updated belong to a "critical section".

A typical sequence in the use of a mutex is as follows:
o Create and initialize a mutex variable
o Several threads attempt to lock the mutex
o Only one succeeds and that thread owns the mutex
o The owner thread performs some set of actions
o The owner unlocks the mutex
o Another thread acquires the mutex and repeats the process
o Finally the mutex is destroyed

When several threads compete for a mutex, the losers block at that call. A non-blocking call is available with "trylock" instead of the "lock" call.

When protecting shared data, it is the programmer's responsibility to make sure every thread that needs to use a mutex does so. For example, if 4 threads are updating the same data, but only one uses a mutex, the data can still be corrupted.

Mutexes are used to prevent data inconsistencies due to race conditions.

A race condition often occurs when two or more threads need to perform operations on the same memory area, but the results of the computations depend on the order in which these operations are performed.

Mutexes are used for serializing access to shared resources. Anytime a global resource is accessed by more than one thread, the resource should have a mutex associated with it.

One can apply a mutex to protect a segment of memory ("critical region") from other threads.

By default, mutexes apply only to threads in a single process and do not work between processes as semaphores do.

Example threaded function:

Without Mutex

int counter = 0;

/* Function C */
void functionC()
{
    counter++;
}

With Mutex

/* Note scope of variable and mutex are the same */
pthread_mutex_t mutex1 = PTHREAD_MUTEX_INITIALIZER;
int counter = 0;

/* Function C */
void functionC()
{
    pthread_mutex_lock(&mutex1);
    counter++;
    pthread_mutex_unlock(&mutex1);
}

Possible execution sequence

Without Mutex:
Thread 1        Thread 2
counter = 0     counter = 0
counter = 1     counter = 1

With Mutex:
Thread 1        Thread 2
counter = 0     counter = 0
counter = 1     Thread 2 locked out. Thread 1 has exclusive use of variable counter
                counter = 2

When a thread terminates, the main loop decrements both nconn and nlefttoread.

void *
do_get_read(void *vptr)
{
    int          fd, n;
    char         line[MAXLINE];
    struct file *fptr;

    fptr = (struct file *) vptr;

    fd = Tcp_connect(fptr->f_host, SERV);
    fptr->f_fd = fd;
    printf("do_get_read for %s, fd %d, thread %d\n",
           fptr->f_name, fd, fptr->f_tid);

    write_get_cmd(fptr);    /* write() the GET command */

    /* Read server's reply */
    for ( ; ; ) {
        if ((n = Read(fd, line, MAXLINE)) == 0)
            break;    /* server closed connection */

        printf("read %d bytes from %s\n", n, fptr->f_name);
    }
    printf("end-of-file on %s\n", fptr->f_name);
    Close(fd);
    fptr->f_flags = F_DONE;    /* clears F_READING */

    return (fptr);    /* terminate thread */
}
/* end do_get_read */
Fig. 4.25 do_get_read function

From Fig. 4.25:

If these two decrements were instead placed in the do_get_read function, so that each thread decrements these two counters immediately before the thread terminates, the result would be a subtle concurrent programming error.

The problem with placing the code in the function that each thread executes is that these two variables are global, not thread-specific. If one thread is in the middle of decrementing a variable, that thread is suspended, and another thread executes and decrements the same variable, an error can result.

For example, assume that the C compiler turns the decrement operator into three instructions:

Load from memory into a register.

Decrement the register.

Store from the register into memory.

Possible scenario:

1. Thread A is running and it loads the value of nconn (3) into a register.

2. The system switches threads from A to B. A's registers are saved and B's registers are restored.

3. Thread B executes the three instructions corresponding to the C expression nconn--, storing the new value of 2.

4. Sometime later the system switches threads from B to A. A's registers are restored and A continues where it left off, at the second machine instruction in the three-instruction sequence: the value of the register is decremented from 3 to 2 and the value of 2 is stored in nconn.

The end result is that nconn is 2 when it should be 1. This is wrong. These types of concurrent programming errors are hard to find for numerous reasons:

They occur rarely. Nevertheless, it is an error and it will fail (Murphy's law).

The error is hard to duplicate, since it depends on the nondeterministic timing of many events.

On some systems the hardware instructions might be atomic; that is, there exists a hardware instruction to decrement an integer in memory and the hardware cannot be interrupted during this instruction. But this is not guaranteed by all systems, so the code works on one system but not on another.
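Where a C11 compiler with <stdatomic.h> is available (an assumption, since the text predates C11), an atomic update can be requested portably instead of relying on what the hardware happens to do. The sketch below uses the hypothetical names hits, bump, and atomic_demo.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define NLOOP 5000

static atomic_int hits;   /* updated with atomic read-modify-write operations */

static void *bump(void *arg)
{
    int i;

    (void) arg;
    for (i = 0; i < NLOOP; i++)
        atomic_fetch_add(&hits, 1);   /* load, add, and store as one indivisible step */
    return NULL;
}

/* Two threads increment the same counter; returns the final value, or -1. */
int atomic_demo(void)
{
    pthread_t a, b;
    int total;

    atomic_store(&hits, 0);
    if (pthread_create(&a, NULL, bump, NULL) != 0)
        return -1;
    if (pthread_create(&b, NULL, bump, NULL) != 0) {
        pthread_join(a, NULL);
        return -1;
    }
    pthread_join(a, NULL);
    pthread_join(b, NULL);

    total = atomic_load(&hits);
    assert(total <= 2 * NLOOP);   /* can never exceed the number of increments */
    return total;
}
```

No increment is lost here because the three-instruction sequence described above is replaced by a single atomic operation. This covers only a single variable; protecting several related variables together still calls for a mutex.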

Threads programming is also called concurrent programming or parallel programming, since multiple threads can be running concurrently (in parallel), accessing the same variables.

Fig. 4.26 is a simple program that creates two threads and then has each thread increment a global variable 5000 times.
#include "unpthread.h"

#define NLOOP 5000

int counter;    /* this is incremented by the threads */

void *doit(void *);

int
main(int argc, char **argv)
{
    pthread_t tidA, tidB;

    Pthread_create(&tidA, NULL, &doit, NULL);
    Pthread_create(&tidB, NULL, &doit, NULL);

    /* wait for both threads to terminate */
    Pthread_join(tidA, NULL);
    Pthread_join(tidB, NULL);

    exit(0);
}

void *
doit(void *vptr)
{
    int i, val;

    /*
     * Each thread fetches, prints, and increments the counter NLOOP times.
     * The value of the counter should increase monotonically.
     */
    for (i = 0; i < NLOOP; i++) {
        val = counter;
        printf("%d: %d\n", pthread_self(), val + 1);
        counter = val + 1;
    }

    return (NULL);
}
Fig. 4.26 Two threads that increment a global variable incorrectly

We exacerbate the potential for a problem by fetching the current value of counter, printing the new value, and then storing the new value.

Multiple threads updating a shared variable is the simplest of these problems. The solution is to protect the shared variable with a mutex (which stands for "mutual exclusion") and access the variable only when we hold the mutex. In terms of Pthreads, a mutex is a variable of type pthread_mutex_t. We lock and unlock the mutex using the following functions.

#include <pthread.h>
int pthread_mutex_lock(pthread_mutex_t *mptr);
int pthread_mutex_trylock(pthread_mutex_t *mptr);
int pthread_mutex_unlock(pthread_mutex_t *mptr);

All return: 0 if OK, positive Exxx value on error

If we try to lock a mutex that is already locked by some other thread, we are blocked until the mutex is unlocked.

The pthread_mutex_lock() routine is used by a thread to acquire a lock on the specified mutex variable. If the mutex is already locked by another thread, this call will block the calling thread until the mutex is unlocked.

pthread_mutex_trylock() will attempt to lock a mutex. However, if the mutex is already locked, the routine will return immediately with a "busy" error code (EBUSY). This routine may be useful in preventing deadlock conditions, as in a priority-inversion situation.

pthread_mutex_unlock() will unlock a mutex if called by the owning thread. Calling this routine is required after a thread has completed its use of protected data if other threads are to acquire the mutex for their work with the protected data. An error may be returned (depending on the mutex type) if:
o The mutex was already unlocked
o The mutex is owned by another thread

Usage
Mutex variables must be declared with type pthread_mutex_t, and must be initialized before they can be used. There are two ways to initialize a mutex variable:

1. Statically, when it is declared. For example:
pthread_mutex_t mymutex = PTHREAD_MUTEX_INITIALIZER;

2. Dynamically, with the pthread_mutex_init() routine. This method permits setting mutex object attributes, attr.
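The dynamic form of initialization and the non-blocking trylock call can be seen together in a short sketch. The names lock, try_once, and trylock_demo are hypothetical; default attributes are requested by passing NULL to pthread_mutex_init.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

static pthread_mutex_t lock;   /* initialized dynamically in trylock_demo */

/* Tries to take the lock without blocking and reports the result. */
static void *try_once(void *arg)
{
    int *result = arg;

    assert(result != NULL);
    *result = pthread_mutex_trylock(&lock);
    if (*result == 0)
        pthread_mutex_unlock(&lock);   /* only unlock what we actually own */
    return NULL;
}

/* Returns what trylock reported in another thread while we held the lock. */
int trylock_demo(void)
{
    pthread_t tid;
    int result = -1;

    if (pthread_mutex_init(&lock, NULL) != 0)   /* NULL: default attributes */
        return -1;

    pthread_mutex_lock(&lock);
    if (pthread_create(&tid, NULL, try_once, &result) == 0)
        pthread_join(tid, NULL);
    pthread_mutex_unlock(&lock);

    pthread_mutex_destroy(&lock);   /* pairs with pthread_mutex_init */
    return result;
}
```

Because the creating thread holds the mutex for the whole lifetime of the second thread, the trylock call cannot succeed and returns EBUSY immediately instead of blocking.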

Fig. 4.27 is a corrected version of Fig. 4.26 that uses a single mutex to lock the counter between the two threads.

#include "unpthread.h"

#define NLOOP 5000

int             counter;    /* this is incremented by the threads */
pthread_mutex_t counter_mutex = PTHREAD_MUTEX_INITIALIZER;

void *doit(void *);

int
main(int argc, char **argv)
{
    pthread_t tidA, tidB;

    Pthread_create(&tidA, NULL, &doit, NULL);
    Pthread_create(&tidB, NULL, &doit, NULL);

    /* wait for both threads to terminate */
    Pthread_join(tidA, NULL);
    Pthread_join(tidB, NULL);

    exit(0);
}

void *
doit(void *vptr)
{
    int i, val;

    /*
     * Each thread fetches, prints, and increments the counter NLOOP times.
     * The value of the counter should increase monotonically.
     */
    for (i = 0; i < NLOOP; i++) {
        Pthread_mutex_lock(&counter_mutex);

        val = counter;
        printf("%d: %d\n", pthread_self(), val + 1);
        counter = val + 1;

        Pthread_mutex_unlock(&counter_mutex);
    }

    return (NULL);
}
Fig. 4.27 Corrected version of Fig. 4.26 using a mutex to protect the shared variable
We declare a mutex named counter_mutex, and this mutex must be locked by the thread before the thread manipulates the counter variable.

When we run this program, the output is always correct: the value is incremented monotonically and the final value printed is always 10000.

4.6 Condition variables

4.6.1 Introduction

Condition variables provide yet another way for threads to synchronize. While mutexes implement synchronization by controlling thread access to data, condition variables allow threads to synchronize based upon the actual value of data.

Without condition variables, the programmer would need to have threads continually polling (possibly in a critical section) to check if the condition is met. This can be very resource consuming, since the thread would be continuously busy in this activity. A condition variable is a way to achieve the same goal without polling.

A condition variable is always used in conjunction with a mutex lock.

Main Thread
. Declare and initialize global data/variables which require synchronization (such as "count").
. Declare and initialize a condition variable object.
. Declare and initialize an associated mutex.
. Create threads A and B to do work.

Thread A
. Do work up to the point where a certain condition must occur (such as "count" must reach a specified value).
. Lock the associated mutex and check the value of a global variable.
. Call pthread_cond_wait() to perform a blocking wait for a signal from Thread B. Note that a call to pthread_cond_wait() automatically and atomically unlocks the associated mutex variable so that it can be used by Thread B.
. When signalled, wake up. The mutex is automatically and atomically locked.
. Explicitly unlock the mutex.
. Continue.

Thread B
. Do work.
. Lock the associated mutex.
. Change the value of the global variable that Thread A is waiting upon.
. Check the value of the global variable that Thread A is waiting upon. If it fulfills the desired condition, signal Thread A.
. Unlock the mutex.
. Continue.
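
The three roles above can be sketched as follows. This is an illustrative example only: COUNT_LIMIT, the function names, and the choice of having Thread B signal exactly when the limit is reached are assumptions made for the sketch.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define COUNT_LIMIT 10   /* hypothetical value that "count" must reach */

static int count;
static pthread_mutex_t count_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t count_cond = PTHREAD_COND_INITIALIZER;

/* Thread A: block until count reaches COUNT_LIMIT. */
static void *waiter(void *arg)
{
    int *seen = arg;

    pthread_mutex_lock(&count_mutex);
    while (count < COUNT_LIMIT)                 /* re-test after every wakeup */
        pthread_cond_wait(&count_cond, &count_mutex);
    *seen = count;                              /* the mutex is held again here */
    pthread_mutex_unlock(&count_mutex);
    return NULL;
}

/* Thread B: change count and signal once the condition holds. */
static void *incrementer(void *arg)
{
    (void)arg;
    for (int i = 0; i < COUNT_LIMIT; i++) {
        pthread_mutex_lock(&count_mutex);
        count++;
        if (count == COUNT_LIMIT)
            pthread_cond_signal(&count_cond);
        pthread_mutex_unlock(&count_mutex);
    }
    return NULL;
}

/* Main thread: create A and B, wait for both, return the value A observed. */
int run_wait_signal(void)
{
    pthread_t tid_a, tid_b;
    int seen = -1;
    int rc;

    count = 0;
    rc = pthread_create(&tid_a, NULL, waiter, &seen);
    assert(rc == 0);
    rc = pthread_create(&tid_b, NULL, incrementer, NULL);
    assert(rc == 0);
    pthread_join(tid_a, NULL);
    pthread_join(tid_b, NULL);
    return seen;
}
```

Because Thread A tests the predicate before waiting, the sketch works whether or not Thread B finishes before Thread A first acquires the mutex.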
A condition variable is only needed where:
. A set of threads is using a mutex to provide mutually exclusive access to some resource.
. Once a thread acquires the resource, it needs to wait for a particular condition to occur.
If no condition variables are available, the only alternative is some form of busy waiting, in which the thread
repeatedly acquires the mutex, tests the condition, and then releases the
mutex (a wasteful solution).


A condition variable allows a thread to release a mutex and block on a condition
atomically.

A mutex is fine to prevent simultaneous access to a shared variable, but we need
something else to let us go to sleep waiting for some condition to occur. Let's demonstrate
this with an example. We cannot call pthread_join until we know that a thread has
terminated. We first declare a global that counts the number of terminated threads and protect
it with a mutex:

int ndone;    /* number of terminated threads */
pthread_mutex_t ndone_mutex = PTHREAD_MUTEX_INITIALIZER;

We then require that each thread increment this counter when it terminates, being careful
to use the associated mutex.
4.6.2 Functions used in conjunction with the condition variable:
. Creating/Destroying:
  * pthread_cond_init
  * pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
  * pthread_cond_destroy
. Waiting on condition:
  * pthread_cond_wait
  * pthread_cond_timedwait - place limit on how long it will block.
. Waking thread based on condition:
  * pthread_cond_signal
  * pthread_cond_broadcast - wake up all threads blocked by the specified condition variable.

4.6.2.1 pthread_cond_init Subroutine

Purpose
Initializes a condition variable and sets its attributes.
Syntax
#include <pthread.h>
int pthread_cond_init(pthread_cond_t *condition, const pthread_condattr_t *attr);
Returns: 0 if OK, positive Exxx value on error

Description
The pthread_cond_init subroutine initializes a new condition variable and sets its
attributes according to the condition attributes object attr.
After initialization of the condition variable, the condition attributes object can be
reused for another condition variable initialization, or deleted.
Parameters
condition - Specifies the condition to be created.
attr - Specifies the condition attributes object to use for initializing the condition variable.
If the value is NULL, the default attribute values are used.
4.6.2.2 pthread_cond_destroy Subroutine

Purpose
Deletes a condition variable.
Syntax
#include <pthread.h>
int pthread_cond_destroy(pthread_cond_t *condition);
Returns: 0 if OK, positive Exxx value on error
Description
The pthread_cond_destroy subroutine deletes the condition variable. After deletion of
the condition variable, the condition parameter is not valid until it is initialized again by a
call to the pthread_cond_init subroutine.
Parameter
condition - Specifies the condition variable to delete.
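
As a small illustration of the two subroutines, the sketch below initializes a condition variable at run time (instead of using PTHREAD_COND_INITIALIZER) and destroys it again. The struct event type and both function names are hypothetical; only the pthread calls themselves are standard.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* A small object whose mutex and condition variable are set up at run time. */
struct event {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    int             ready;
};

/* Returns 0 on success or a positive Exxx value, like the pthread calls. */
int event_init(struct event *ev)
{
    int rc;

    assert(ev != NULL);
    rc = pthread_mutex_init(&ev->lock, NULL);
    if (rc != 0)
        return rc;
    rc = pthread_cond_init(&ev->cond, NULL);   /* NULL: default attributes */
    if (rc != 0) {
        pthread_mutex_destroy(&ev->lock);
        return rc;
    }
    ev->ready = 0;
    return 0;
}

/* Must only be called when no thread is blocked on ev->cond. */
int event_destroy(struct event *ev)
{
    int rc;

    assert(ev != NULL);
    rc = pthread_cond_destroy(&ev->cond);
    if (rc != 0)
        return rc;
    return pthread_mutex_destroy(&ev->lock);
}
```

After event_destroy returns 0, the object may be passed to event_init again, which matches the rule that a destroyed condition variable is invalid until it is reinitialized.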
4.6.2.3 pthread_cond_wait or pthread_cond_timedwait Subroutine

Purpose - Blocks the calling thread on a condition.

Syntax
#include <pthread.h>
int pthread_cond_wait(pthread_cond_t *cond, pthread_mutex_t *mutex);
int pthread_cond_timedwait(pthread_cond_t *cond, pthread_mutex_t *mutex,
                           const struct timespec *abstime);
Both return: 0 if OK, positive Exxx value on error


Description
The pthread_cond_wait and pthread_cond_timedwait functions are used to block on a
condition variable. They must be called with mutex locked by the calling thread, or undefined
behaviour will result.
These functions atomically release mutex and cause the calling thread to block on the
condition variable cond; atomically here means "atomically with respect to access by another
thread to the mutex and then the condition variable". That is, if another thread is able to
acquire the mutex after the about-to-block thread has released it, then a subsequent call to
pthread_cond_signal or pthread_cond_broadcast in that thread behaves as if it were issued
after the about-to-block thread has blocked.
Upon successful return, the mutex has been locked and is owned by the calling thread.
When using condition variables there is always a boolean predicate involving shared
variables associated with each condition wait that is true if the thread should proceed. Spurious
wakeups from the pthread_cond_wait or pthread_cond_timedwait functions may occur. Since
the return from pthread_cond_wait or pthread_cond_timedwait does not imply anything about
the value of this predicate, the predicate should be re-evaluated upon such return.
The effect of using more than one mutex for concurrent pthread_cond_wait or
pthread_cond_timedwait operations on the same condition variable is undefined; that is, a
condition variable becomes bound to a unique mutex when a thread waits on the condition
variable, and this (dynamic) binding ends when the wait returns.
A thread that has been unblocked because it has been canceled while blocked in a
call to pthread_cond_wait or pthread_cond_timedwait does not consume any condition signal
that may be directed concurrently at the condition variable if there are other threads blocked
on the condition variable.
The pthread_cond_timedwait function is the same as pthread_cond_wait except that
an error is returned if the absolute time specified by abstime passes (that is, system time
equals or exceeds abstime) before the condition cond is signaled or broadcast, or if the
absolute time specified by abstime has already been passed at the time of the call. When such
time-outs occur, pthread_cond_timedwait will nonetheless release and reacquire the mutex
referenced by mutex. The function pthread_cond_timedwait is also a cancellation point.
If a signal is delivered to a thread waiting for a condition variable, upon return from
the signal handler the thread resumes waiting for the condition variable as if it was not
interrupted, or it returns zero due to spurious wakeup.
Parameters
cond - Specifies the condition variable to wait on.
mutex - Specifies the mutex used to protect the condition variable. The mutex must be locked when the subroutine is called.
abstime - Points to the absolute time structure specifying the blocked state timeout.

4.6.2.4 pthread_cond_signal or pthread_cond_broadcast Subroutine
Purpose - Unblocks one or more threads blocked on a condition.
Syntax
#include <pthread.h>
int pthread_cond_signal(pthread_cond_t *condition);
int pthread_cond_broadcast(pthread_cond_t *condition);
Both return: 0 if OK, positive Exxx value on error

Description
These subroutines unblock one or more threads blocked on the condition specified by
condition. The pthread_cond_signal subroutine unblocks at least one blocked thread, while
the pthread_cond_broadcast subroutine unblocks all the blocked threads.
If more than one thread is blocked on a condition variable, the scheduling policy
determines the order in which threads are unblocked. When each thread unblocked as a result
of a pthread_cond_signal or pthread_cond_broadcast returns from its call to
pthread_cond_wait or pthread_cond_timedwait, the thread owns the mutex with which it
called pthread_cond_wait or pthread_cond_timedwait. The thread(s) that are unblocked
contend for the mutex according to the scheduling policy (if applicable), and as if each had
called pthread_mutex_lock.
The pthread_cond_signal or pthread_cond_broadcast functions may be called by a thread
whether or not it currently owns the mutex that threads calling pthread_cond_wait or
pthread_cond_timedwait have associated with the condition variable during their waits;
however, if predictable scheduling behavior is required, then that mutex should be locked by the
thread calling pthread_cond_signal or pthread_cond_broadcast.
If no thread is blocked on the condition, the subroutine succeeds, but the signal is not
remembered: the next thread calling pthread_cond_wait will still be blocked.
Parameter
condition - Specifies the condition to signal.
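
To illustrate the difference from pthread_cond_signal, the following sketch releases several waiting threads with a single pthread_cond_broadcast. NWAITERS and the function names are hypothetical. Since a broadcast issued before a thread starts waiting is not remembered, the waiters rely on the shared predicate go rather than on the broadcast alone.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NWAITERS 4   /* hypothetical number of waiting threads */

static pthread_mutex_t go_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  go_cond  = PTHREAD_COND_INITIALIZER;
static int go;        /* predicate shared by all waiters */
static int released;  /* how many waiters got past the wait */

static void *wait_for_go(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&go_mutex);
    while (!go)
        pthread_cond_wait(&go_cond, &go_mutex);
    released++;                         /* still protected by go_mutex */
    pthread_mutex_unlock(&go_mutex);
    return NULL;
}

/* Start NWAITERS threads, release them all at once, return how many ran. */
int run_broadcast(void)
{
    pthread_t tid[NWAITERS];

    go = 0;
    released = 0;
    for (int i = 0; i < NWAITERS; i++) {
        int rc = pthread_create(&tid[i], NULL, wait_for_go, NULL);
        assert(rc == 0);
    }

    pthread_mutex_lock(&go_mutex);
    go = 1;                             /* change the predicate first ... */
    pthread_cond_broadcast(&go_cond);   /* ... then wake every blocked waiter */
    pthread_mutex_unlock(&go_mutex);

    for (int i = 0; i < NWAITERS; i++)
        pthread_join(tid[i], NULL);
    return released;
}
```

With pthread_cond_signal in place of the broadcast, only one blocked waiter would be guaranteed to wake up, and the joins could hang.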
4.7 Raw sockets

4.7.1 Introduction
. Raw sockets provide access to internal network protocols and interfaces.
. They provide access to ICMP.
. Raw sockets allow an application to have direct access to lower-level communication protocols.
. Raw sockets are intended for advanced users who want to take advantage of some protocol feature that is not directly accessible through a normal interface, or who want to build new protocols on top of existing low-level protocols.
. Raw sockets are normally datagram-oriented, although their exact characteristics are dependent on the interface provided by the protocol.
. Raw sockets are not for most applications.
. They are provided to support developing new communication protocols or for access to the more obscure facilities of an existing protocol.
. Only a superuser process may use a raw socket.
. The socket type is SOCK_RAW.

Raw sockets provide three features not provided by normal TCP and UDP sockets:

1. Raw sockets are used to read and write ICMPv4, IGMPv4, and ICMPv6 packets. The ping program, for example, sends ICMP echo requests and receives ICMP echo replies. The multicast routing daemon, mrouted, sends and receives IGMPv4 packets. This capability also allows ICMP or IGMP applications to be handled entirely as user processes, instead of putting more code into the kernel. The router discovery daemon, for example, processes two ICMP messages (router advertisement and router solicitation) that the kernel knows nothing about.

2. With a raw socket a process can read and write IPv4 datagrams with an IPv4 protocol field that is not processed by the kernel. Most kernels only process datagrams containing values of 1 (ICMP), 2 (IGMP), 6 (TCP), and 17 (UDP). The OSPF routing protocol does not use TCP or UDP but uses IP directly, setting the protocol field of the IP datagrams to 89. The gated program that implements OSPF must use a raw socket to read and write these IP datagrams. This capability carries over to IPv6 also.

3. With a raw socket a process can build its own IPv4 header, using the IP_HDRINCL socket option.

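Format of the IPv4 header (20 bytes without options):

bits 0-3: version | bits 4-7: header length | bits 8-15: type of service | bits 16-31: total length
identification (16 bits) | flags 0, DF, MF (3 bits) | fragment offset (13 bits)
time to live (8 bits) | protocol (8 bits) | header checksum (16 bits)
32-bit source IPv4 address
32-bit destination IPv4 address
options (if any)
data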

Protocol field: 1 - ICMP, 2 - IGMP, 6 - TCP, 17 - UDP

1. R/W ICMPv4, IGMPv4, ICMPv6 (ex. ping)
2. R/W other datagrams not processed by the kernel (ex. gated, which implements OSPF; protocol = 89)
3. Build one's own IPv4 header (ex. traceroute)

4.8 Raw socket creation

The steps involved in creating a raw socket are as follows.

1. The socket function creates a raw socket when the second argument is SOCK_RAW. The third argument (the protocol) is normally nonzero. For example, to create an IPv4 raw socket we would write

int sockfd;
sockfd = socket(AF_INET, SOCK_RAW, protocol);

Here,
AF_INET - Denotes the ARPA Internet addresses.
SOCK_RAW - Denotes a raw socket.
protocol - Specifies a particular protocol to be used with the socket.

Specifying a protocol parameter of 0 causes the socket subroutine to default to the
typical protocol for the requested type of returned socket. For SCTP sockets, the protocol
parameter will be IPPROTO_SCTP. Here protocol can be one of the IPPROTO_xxx constants, such as
IPPROTO_RAW or IPPROTO_ICMP, which are defined in the <netinet/in.h> header.
Only the superuser can create a raw socket. This prevents normal users from writing
their own IP datagrams to the network.
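
The creation step, including the privilege check, can be sketched as below. The helper name open_raw_socket and its err output parameter are hypothetical; the exact errno reported to an unprivileged caller is system dependent (EPERM or EACCES is typical), so the sketch only records it.

```c
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <stddef.h>
#include <sys/socket.h>

/*
 * Try to create an IPv4 raw socket for the given protocol, for example
 * IPPROTO_ICMP. Returns the descriptor and stores 0 in *err on success;
 * returns -1 and stores the errno value in *err on failure.
 */
int open_raw_socket(int protocol, int *err)
{
    int fd;

    assert(err != NULL);
    fd = socket(AF_INET, SOCK_RAW, protocol);
    *err = (fd < 0) ? errno : 0;
    return fd;
}
```

A caller without superuser privileges should expect the -1 branch, which is the restriction described in step 1.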
2. The IP_HDRINCL socket option specifies whether the process or the kernel builds the
IP header. Receiving all IP protocols via IPPROTO_RAW is not possible using
raw sockets.

IP header fields modified on sending with IP_HDRINCL:

IP Checksum       Always filled in.
Source Address    Filled in when zero.
Packet Id         Filled in when zero.
Total Length      Always filled in.

It can be set as:

const int on = 1;
if (setsockopt(sockfd, IPPROTO_IP, IP_HDRINCL, &on, sizeof(on)) < 0)
    /* handle the error */;

If IP_HDRINCL is specified and the IP header has a nonzero destination address,
then the destination address of the socket is used to route the packet.
If IP_HDRINCL isn't set, then IP header options can be set on raw sockets with
setsockopt.

3. bind can be called on the raw socket, but this is rare. This function sets only the local
address; there is no concept of a port number with a raw socket. With regard to output,
calling bind sets the source IP address that will be used for datagrams sent on the raw
socket (but only if the IP_HDRINCL socket option is not set). If bind is not called, the
kernel sets the source IP address to the primary IP address of the outgoing interface. A
raw socket can be bound to a specific local address using the bind() call. If it isn't
bound, all packets with the specified IP protocol are received. In addition, a raw socket
can be bound to a specific network device using SO_BINDTODEVICE.

4. connect can be called on the raw socket, but this is rare. This function sets only the
foreign address; again, there is no concept of a port number with a raw socket. With
regard to output, calling connect lets us call write or send instead of sendto, since the
destination IP address is already specified.

4.9 Raw Socket Output

4.9.1 Output on a raw socket is governed by the following rules:

1. Normal output is performed by calling sendto or sendmsg and specifying the
destination IP address. write, writev, or send can also be called if the socket has
been connected.

2. If the IP_HDRINCL option is not set, the starting address of the data for the
kernel to write specifies the first byte following the IP header, because the kernel
will build the IP header and prepend it to the data from the process. The kernel sets
the protocol field of the IPv4 header that it builds to the third argument from the
call to socket.


3. If the IP_HDRINCL option is set, the starting address of the data for the kernel to
write specifies the first byte of the IP header. The amount of data to write must
include the size of the caller's IP header. The process builds the entire IP header,
except that (a) the IPv4 identification field can be set to zero, which tells the kernel to
set this value, and (b) the kernel always calculates and stores the IPv4 header
checksum.

4. The kernel fragments raw packets that exceed the outgoing interface MTU.

A more network-friendly and faster alternative is to implement path MTU discovery, as
described for the IP_MTU_DISCOVER socket option.

Raw Socket Output (summary)

Raw sockets: IPv4 = by application, IPv6 = by kernel

1. sendto / sendmsg + destination IP, or connect + write / writev / send
2. IP_HDRINCL not set: starting address for the kernel to write = first byte following the IP header
3. IP_HDRINCL set: starting address for the kernel to write = first byte of the IP header
4. Fragmentation by kernel

With IPv4 it is the responsibility of the user process to calculate and set any header
checksums contained in whatever follows the IPv4 header. For example, in the ping program we
must calculate the ICMPv4 checksum and store it in the ICMPv4 header before calling sendto.
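
A minimal sketch of that checksum step is shown below. It is not the code of the ping program discussed in this unit: the function names are hypothetical, the packet is built byte by byte to avoid depending on a particular struct icmp layout, and the data area is simply zeroed (no timestamp is stored).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Internet checksum: one's-complement sum of 16-bit big-endian words. */
uint16_t inet_checksum(const uint8_t *buf, size_t len)
{
    uint32_t sum = 0;

    while (len > 1) {
        sum += (uint32_t)((buf[0] << 8) | buf[1]);
        buf += 2;
        len -= 2;
    }
    if (len == 1)                       /* odd trailing byte, padded with 0 */
        sum += (uint32_t)(buf[0] << 8);
    while (sum >> 16)                   /* fold carries back into 16 bits */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/*
 * Fill pkt with an ICMPv4 echo request (type 8, code 0) followed by
 * datalen bytes of zeroed data, and store the checksum in bytes 2-3.
 * Returns the total ICMP length, or 0 if pkt is too small.
 */
size_t build_echo_request(uint8_t *pkt, size_t cap,
                          uint16_t id, uint16_t seq, size_t datalen)
{
    size_t len = 8 + datalen;           /* 8-byte ICMP header + data */
    uint16_t cksum;

    assert(pkt != NULL);
    if (cap < len)
        return 0;
    memset(pkt, 0, len);                /* checksum field must be 0 first */
    pkt[0] = 8;                         /* type: echo request */
    pkt[1] = 0;                         /* code */
    pkt[4] = (uint8_t)(id >> 8);
    pkt[5] = (uint8_t)id;
    pkt[6] = (uint8_t)(seq >> 8);
    pkt[7] = (uint8_t)seq;

    cksum = inet_checksum(pkt, len);    /* computed over header and data */
    pkt[2] = (uint8_t)(cksum >> 8);
    pkt[3] = (uint8_t)cksum;
    return len;
}
```

A receiver can verify such a packet by running inet_checksum over the whole message, checksum field included: a result of 0 means the checksum is consistent.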
4.9.2 Using IPv6 Raw Sockets
Raw sockets are used in both IPv4 and IPv6 to bypass the TCP and UDP transport
layers.

Item | IPv4 | IPv6
Use | Access ICMPv4, IGMPv4, and read and write IPv4 datagrams that contain a protocol field the kernel does not recognize. | Access ICMPv6, and read and write IPv6 datagrams that contain a Next Header field the kernel does not recognize.
Byte order | Not specified. | Network byte order for all data sent and received.
Send and receive complete packets | Yes | No. Uses ancillary data objects to transfer extension headers and hop limit information.

4.9.3 IPv6 differences

There are a few differences with raw IPv6 sockets.
. All fields in the protocol headers sent or received on a raw IPv6 socket are in
network byte order.
. There is nothing similar to the IPv4 IP_HDRINCL socket option with
IPv6. Complete IPv6 packets cannot be read or written on an IPv6 raw socket.
Almost all fields in an IPv6 header and all extension headers are available to the
application through socket options or ancillary data. Should an application need to
read or write complete IPv6 datagrams, datalink access must be used.
. Checksums on raw IPv6 sockets are handled differently.

4.9.4 IPV6_CHECKSUM Socket option

For an ICMPv6 raw socket the kernel always calculates and stores the checksum in the
ICMPv6 header. This differs from an ICMPv4 raw socket, where the application must
do this itself.
While ICMPv4 and ICMPv6 both require the sender to calculate the checksum, ICMPv6
includes a pseudoheader in its checksum. One of the fields in the pseudoheader is the source
IPv6 address, and normally the application lets the kernel choose this value. To prevent the
application from having to try to choose this address just to calculate the checksum, it is
easier to let the kernel calculate the checksum.
For other raw IPv6 sockets, a socket option tells the kernel whether to calculate and store a
checksum in outgoing packets and verify the checksum in received packets.
By default this option is disabled, and it is enabled by setting the option value to a nonnegative value, as in

int offset = 2;
if (setsockopt(sockfd, IPPROTO_IPV6, IPV6_CHECKSUM, &offset, sizeof(offset)) < 0)
    error

This not only enables checksums on this socket, it also tells the kernel the byte offset
of the 16-bit checksum: 2 bytes from the start of the application data in this example. To
disable the option, it must be set to -1. When enabled, the kernel will calculate and store the
checksum for outgoing packets sent on the socket and also verify the checksums for the
packets received on the socket.
Using IPv6 raw sockets, an application can access the following information:
. ICMPv6 messages
. IPv6 header
. Routing header
. IPv6 options headers: hop-by-hop options header and destination options header

4.10 Raw Socket Input

The kernel passes received IP datagrams to raw sockets according to the following rules.

1. Received UDP packets and received TCP packets are never passed to a raw socket. If
a process wants to read IP datagrams containing UDP or TCP packets, the packets
must be read at the datalink layer.

2. Most ICMP packets are passed to a raw socket, after the kernel has finished processing
the ICMP message. Berkeley-derived implementations pass all received ICMP packets
to a raw socket other than echo request, timestamp request, and address mask request.
These three ICMP messages are processed entirely by the kernel.

3. All IGMP packets are passed to a raw socket, after the kernel has finished processing
the IGMP message.

4. All IP datagrams with a protocol field that the kernel does not understand are passed to
a raw socket. The only kernel processing done on these packets is the minimal verification
of some IP header fields: the IP version, the IPv4 header checksum, the header length, and
the destination IP address.

5. If the datagram arrives in fragments, nothing is passed to the raw socket until all
fragments have arrived and have been reassembled.

When the kernel has an IP datagram to pass to the raw sockets, all raw sockets for all
processes are examined, looking for all matching sockets. A copy of the IP datagram is delivered
to each matching socket.
The following tests are performed for each raw socket, and only if all three tests are true
is the datagram delivered to the socket.

1. If a nonzero protocol is specified when the raw socket is created (the third
argument to socket), then the received datagram's protocol field must match this
value, or the datagram is not delivered to this socket.

2. If a local IP address is bound to the raw socket by bind, then the destination IP
address of the received datagram must match this bound address, or the datagram
is not delivered to this socket.

3. If a foreign IP address was specified for the raw socket by connect, then the
source IP address of the received datagram must match this connected address, or
the datagram is not delivered to this socket.

If a raw socket is created with a protocol of zero, and neither bind nor connect is
called, then that socket receives a copy of every raw datagram that the kernel passes to raw
sockets.
Whenever a received datagram is passed to a raw IPv4 socket, the entire datagram,
including the IP header, is passed to the process.
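
The three tests can be modeled as a small predicate. This is only an illustrative model, not kernel code: the struct raw_sock_state type and the function name are hypothetical, addresses are plain 32-bit values, and 0 stands for "not specified" in each field.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one raw socket's state; 0 means "not specified". */
struct raw_sock_state {
    int      protocol;     /* third argument to socket() */
    uint32_t bound_addr;   /* local address set by bind */
    uint32_t peer_addr;    /* foreign address set by connect */
};

/* Apply the three tests; the datagram is delivered only if all of them pass. */
bool raw_sock_matches(const struct raw_sock_state *s,
                      int dg_protocol, uint32_t dg_src, uint32_t dg_dst)
{
    assert(s != NULL);
    if (s->protocol != 0 && s->protocol != dg_protocol)
        return false;               /* test 1: protocol field */
    if (s->bound_addr != 0 && s->bound_addr != dg_dst)
        return false;               /* test 2: bound address = destination IP */
    if (s->peer_addr != 0 && s->peer_addr != dg_src)
        return false;               /* test 3: connected address = source IP */
    return true;
}
```

A socket with all three fields left at 0 matches every datagram, which corresponds to the case of a protocol of zero with neither bind nor connect called.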

Raw Socket Input (summary)

1. UDP / TCP: never passed to a raw socket
2. Most ICMP: kernel => raw socket
3. All IGMP: kernel => raw socket
4. All unknown IP datagrams: kernel => raw socket
5. Fragments in: reassemble => raw socket

Matching tests for raw sockets: protocol field; bound addr. = dest. IP; connected addr. = source IP

4.10.1 ICMPv6 type filtering

A raw ICMPv4 socket receives most ICMPv4 messages received by the
kernel. But ICMPv6 is a superset of ICMPv4, including the functionality of ARP
and IGMP. Therefore a raw ICMPv6 socket can potentially receive many more
packets compared to a raw ICMPv4 socket.
To reduce the number of packets passed from the kernel to the application across a raw
ICMPv6 socket, an application-specified filter is provided. A filter is declared with a datatype
of struct icmp6_filter, which is defined by including <netinet/icmp6.h>. The current filter
for a raw ICMPv6 socket is set and fetched using setsockopt and getsockopt with a level of
IPPROTO_ICMPV6 and an option name of ICMP6_FILTER.

Macro | Description
ICMP6_FILTER_SETPASSALL | Passes all ICMPv6 messages to an application.
ICMP6_FILTER_SETBLOCKALL | Blocks all ICMPv6 messages from being passed to an application.
ICMP6_FILTER_SETPASS | Passes ICMPv6 messages of a specified type to an application.
ICMP6_FILTER_SETBLOCK | Blocks ICMPv6 messages of a specified type from being passed to an application.
ICMP6_FILTER_WILLPASS | Returns true if the specified message type is passed to an application.
ICMP6_FILTER_WILLBLOCK | Returns true if the specified message type is blocked from being passed to an application.

Six macros operate on the icmp6_filter structure.

#include <netinet/icmp6.h>
void ICMP6_FILTER_SETPASSALL(struct icmp6_filter *filt);
void ICMP6_FILTER_SETBLOCKALL(struct icmp6_filter *filt);
void ICMP6_FILTER_SETPASS(int msgtype, struct icmp6_filter *filt);
void ICMP6_FILTER_SETBLOCK(int msgtype, struct icmp6_filter *filt);
int ICMP6_FILTER_WILLPASS(int msgtype, const struct icmp6_filter *filt);
int ICMP6_FILTER_WILLBLOCK(int msgtype, const struct icmp6_filter *filt);
Both return: 1 if filter will pass (block) message type, 0 otherwise

filt - is a pointer to an icmp6_filter variable that is modified by the first four macros and
examined by the final two macros.
msgtype - is a value between 0 and 255 specifying the ICMP message type.
For example, to enable filtering of ICMPv6 messages, use the ICMP6_FILTER option,
as follows:

struct icmp6_filter myfilter;
/* set up myfilter with the macros above before installing it */
if (setsockopt(fd, IPPROTO_ICMPV6, ICMP6_FILTER, &myfilter, sizeof(myfilter)) < 0)
    perror("setsockopt: ICMP6_FILTER error");
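
As a concrete case, the sketch below prepares a filter that passes only router advertisements, the kind of filter a router discovery daemon might install. The function name make_router_advert_filter is hypothetical; ND_ROUTER_ADVERT is the type constant from <netinet/icmp6.h>. The filter is only built here, since installing it with setsockopt needs a raw ICMPv6 socket and therefore superuser privileges.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <netinet/in.h>
#include <netinet/icmp6.h>

/*
 * Prepare a filter that blocks every ICMPv6 message type except
 * router advertisements (ND_ROUTER_ADVERT).
 */
void make_router_advert_filter(struct icmp6_filter *filt)
{
    assert(filt != NULL);
    ICMP6_FILTER_SETBLOCKALL(filt);                 /* start from "block all" */
    ICMP6_FILTER_SETPASS(ND_ROUTER_ADVERT, filt);   /* then open one type */
}
```

The resulting structure would then be passed to setsockopt with level IPPROTO_ICMPV6 and option ICMP6_FILTER, and ICMP6_FILTER_WILLPASS can be used to check which types it lets through.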
4.11 Ping program
. The ping utility tests responsiveness between two hosts (computers, routers,
switches, etc.). Ping is used primarily to find out if a computer is reachable.
. It is often believed that "Ping" is an abbreviation for Packet Internet Groper.


. Ping accomplishes this task by sending out a special packet called the Internet
Control Message Protocol (ICMP) echo request packet. ICMP packets are special
IP messages that are used to send network information between two hosts
(computers, routers, switches, etc.).
. When a machine receives an echo request, it responds with an echo reply. One
ICMP echo request packet is sent every second.

When the ping program gets an echo reply back from the remote host, it prints out the
response, giving several pieces of information:
1. IP address of where the echo reply came from
2. Number of bytes of data sent
3. Round-trip time it took for a packet to go to and from the remote host
4. Time-to-live (TTL) field

Every packet that gets sent out has a TTL field, which is set to a relatively high
number (ping packets get a TTL of 255).
o As the packet travels over the network, the TTL field is decreased by one
for each node, server, or router it passes through.
o When the TTL drops to 0, the packet is discarded by the router.
o The main purpose of this is so that a packet doesn't live forever on the network
and will eventually die when it is deemed "lost."
o If the TTL field varies in successive pings, it could indicate that the successive
reply packets are going via different routes.
o This could indicate that certain network routes may be experiencing problems, or that
packets are being sent along different paths (and not the same path each time)
in an attempt to find the quickest alternative route.
o The time field is an indication of the round-trip time to get a packet to the
remote host.
o The reply time is measured in milliseconds. In general, it's best if round-trip times
are under 200 milliseconds.
o The time it takes a packet to reach its destination is called latency.
If there is a large variance in the round-trip times, the network may be experiencing
problems.


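ICMP echo request <type = 128, code = 0>; ICMP echo reply <type = 129, code = 0>
(these are the ICMPv6 values; the ICMPv4 types are 8 and 0)

bits 0-7: type | bits 8-15: code | bits 16-31: checksum
identifier (16 bits) | sequence number (16 bits)
optional data
(the header occupies 8 bytes)

Fig 4.28 Format of ICMPv4 and ICMPv6 Echo request and Echo reply messages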

From Fig. 4.28:
type - values for ICMP messages [0 - Echo reply, 3 - Destination unreachable, 4 - Source quench,
5 - Redirect, 8 - Echo request, 9 - Router advertisement, 10 - Router solicitation,
11 - Time exceeded, 12 - Parameter problem, 13 - Timestamp request, 14 - Timestamp reply,
15 - Information request, 16 - Information reply, 17 - Address mask request, 18 - Address mask reply]
code - is 0 for echo messages; in general it contains the error code for the datagram reported on
by this ICMP message. Its interpretation is dependent upon the message type.
identifier - is set to the process ID of the ping process.
sequence number - is incremented by one for each packet that we send.
optional data - an 8-byte timestamp is stored here when the packet is sent.
RTT (round-trip time) - is calculated from the timestamp stored in the packet when
the reply is received.
The "ping" program contains a client interface to ICMP. It may be used by a user to
verify that an end-to-end Internet path is operational. The ping program also collects performance
statistics (i.e., the measured round-trip time and the number of times the remote server fails to
reply).
Each time an ICMP echo reply message is received, the ping program displays a single
line of text. The text printed by ping shows the received sequence number and the measured
round-trip time (in milliseconds).
Each ICMP Echo message contains a sequence number (starting at 0) that is incremented
after each transmission, and a timestamp value indicating the transmission time.
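
The RTT calculation amounts to subtracting the timestamp carried in the packet from the time at which the reply arrives. The sketch below shows the arithmetic; tv_sub is the name of a helper declared in the ping.h header of this unit, but its body is not shown there, so the body here (and the helper rtt_ms) is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/* out -= in, borrowing one second when the microseconds go negative. */
void tv_sub(struct timeval *out, const struct timeval *in)
{
    assert(out != NULL && in != NULL);
    if ((out->tv_usec -= in->tv_usec) < 0) {
        --out->tv_sec;
        out->tv_usec += 1000000;
    }
    out->tv_sec -= in->tv_sec;
}

/* RTT in milliseconds between the send time stored in the packet and the
 * time at which the reply was received. */
double rtt_ms(struct timeval sent, struct timeval received)
{
    tv_sub(&received, &sent);
    return received.tv_sec * 1000.0 + received.tv_usec / 1000.0;
}
```

For example, a request stamped at 10.900000 s whose reply arrives at 11.100000 s gives an RTT of 200 ms.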

[Figure: the client sends an ICMP echo request to the destination ES specified in the IP
destination address; the server receives the echo request, copies the payload data, and returns
an ICMP echo reply with the source and destination IP addresses reversed; the client then
reports "sysa is alive".]

Fig. 4.29 Use of the ping program to test whether the computer "sysa" is operational.

The operation of ICMP is illustrated in the frame transition diagram shown above. In
this case there is only one Intermediate System (IS) (router).
A version of the ping program that works with both IPv4 and IPv6 is developed here. We develop
our own program, instead of presenting the publicly available source code, for two reasons.
. The publicly available ping program suffers from a common programming disease
known as creeping featurism: it supports a dozen different options. Our goal in
examining a ping program is to understand the network programming concepts
and techniques without being distracted by all these options. Our version of
ping supports only one option and is about five times smaller than the public
version.
. The public version works only with IPv4, and we want to show a version that also
supports IPv6.

The operation of ping is extremely simple: an ICMP echo request is sent to some IP
address and that node responds with an ICMP echo reply. Fig. 4.30 shows sample output
of our ping program, using IPv4.
solaris # ping gemini.tuc.noao.edu
PING gemini.tuc.noao.edu (140.252.4.54): 56 data bytes
64 bytes from 140.252.4.54: seq=0, ttl=248, rtt=37.542 ms
64 bytes from 140.252.4.54: seq=1, ttl=248, rtt=34.596 ms
64 bytes from 140.252.4.54: seq=2, ttl=248, rtt=29.204 ms
64 bytes from 140.252.4.54: seq=3, ttl=248, rtt=52.630 ms
Fig. 4.30 Sample output from our ping program


Fig. 4.31 is an overview of the functions that comprise our ping program.

[Figure: main establishes the signal handler for SIGALRM and enters an infinite receive loop,
which calls proc_v4 or proc_v6 for each received packet; the handler sig_alrm sends an echo
request once a second.]

Fig. 4.31 Overview of the functions in our ping program

The program operates in two parts.
1. It reads everything received on a raw socket, printing the ICMP echo replies.
2. It sends an ICMP echo request once a second (this part is driven by a SIGALRM signal once a
second).

Fig. 4.32 shows our ping.h header that is included by all our program files.

#include "unp.h"
#include <netinet/in_systm.h>
#include <netinet/ip.h>
#include <netinet/ip_icmp.h>

#define BUFSIZE 1500

/* globals */
char recvbuf[BUFSIZE];
char sendbuf[BUFSIZE];

int    datalen;    /* #bytes of data, following ICMP header */
char  *host;
int    nsent;      /* add 1 for each sendto() */
pid_t  pid;        /* our PID */
int    sockfd;
int    verbose;

/* function prototypes */
void proc_v4(char *, ssize_t, struct timeval *);
void proc_v6(char *, ssize_t, struct timeval *);
void send_v4(void);
void send_v6(void);
void readloop(void);
void sig_alrm(int);
void tv_sub(struct timeval *, struct timeval *);

struct proto {
    void (*fproc)(char *, ssize_t, struct timeval *);
    void (*fsend)(void);
    struct sockaddr *sasend;    /* sockaddr{} for send, from getaddrinfo */
    struct sockaddr *sarecv;    /* sockaddr{} for receiving */
    socklen_t salen;            /* length of sockaddr{}s */
    int icmpproto;              /* IPPROTO_xxx value for ICMP */
} *pr;

#ifdef IPV6
#include "ip6.h"      /* should be <netinet/ip6.h> */
#include "icmp6.h"    /* should be <netinet/icmp6.h> */
#endif
Fig. 4.32 ping.h header

From Fig. 4.32:

Include IPv4 and ICMPv4 headers
. The basic IPv4 and ICMPv4 headers are included, and some global
variables and our function prototypes are defined.

Define proto structure
. The proto structure is used to handle the differences between IPv4 and IPv6.
. This structure contains two function pointers, two pointers to socket address
structures, the size of the socket address structures, and the protocol value for
ICMP.
. The global pointer pr will point to one of these structures, which is
initialized for either IPv4 or IPv6.

Include IPv6 and ICMPv6 headers
. Two headers are included that define the IPv6 and ICMPv6 structures and
constants.

The main function is shown in Fig. 4.33.


#include "ping.h"

struct proto proto_v4 = { proc_v4, send_v4, NULL, NULL, 0, IPPROTO_ICMP };

#ifdef IPV6
struct proto proto_v6 = { proc_v6, send_v6, NULL, NULL, 0, IPPROTO_ICMPV6 };
#endif

int datalen = 56;    /* data that goes with ICMP echo request */

int
main(int argc, char **argv)
{
    int c;
    struct addrinfo *ai;

    opterr = 0;    /* don't want getopt() writing to stderr */
    while ( (c = getopt(argc, argv, "v")) != -1) {
        switch (c) {
        case 'v':
            verbose++;
            break;

        case '?':
            err_quit("unrecognized option: %c", c);
        }
    }

    if (optind != argc - 1)
        err_quit("usage: ping [ -v ] <hostname>");
    host = argv[optind];

    pid = getpid();
    Signal(SIGALRM, sig_alrm);

    ai = Host_serv(host, NULL, 0, 0);

    printf("PING %s (%s): %d data bytes\n", ai->ai_canonname,
           Sock_ntop_host(ai->ai_addr, ai->ai_addrlen), datalen);

    /* initialize according to protocol */
    if (ai->ai_family == AF_INET) {
        pr = &proto_v4;
#ifdef IPV6
    } else if (ai->ai_family == AF_INET6) {
        pr = &proto_v6;
        if (IN6_IS_ADDR_V4MAPPED(&(((struct sockaddr_in6 *)
                                 ai->ai_addr)->sin6_addr)))
            err_quit("cannot ping IPv4-mapped IPv6 address");
#endif
    } else
        err_quit("unknown address family %d", ai->ai_family);

    pr->sasend = ai->ai_addr;
    pr->sarecv = Calloc(1, ai->ai_addrlen);
    pr->salen = ai->ai_addrlen;

    readloop();

    exit(0);
}
Fig. 4.33 main function

Define proto structures for IPv4 and IPv6

• A proto structure is defined for IPv4 and another for IPv6.
• The socket address structure pointers are initialized to null pointers.

Length of optional data

• The amount of optional data that is sent with the ICMP echo request is set to 56 bytes.
• This yields an 84-byte IPv4 datagram (20-byte IPv4 header, 8-byte ICMP header and 56 bytes of data) or a 104-byte IPv6 datagram (40-byte IPv6 header, 8-byte ICMPv6 header and 56 bytes of data).
• Any data that accompanies an echo request must be sent back in the echo reply.
• The time at which an echo request is sent is stored in the first 8 bytes of this data area; when the echo reply is received, this value is used to calculate and print the RTT.
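
The datagram sizes quoted above follow from simple addition, which the following sketch spells out. The constant and function names are hypothetical and chosen for this illustration only; the header sizes assume an IPv4 header without options and an IPv6 header without extension headers.

```c
#include <assert.h>

enum {
    IPV4_HDR_LEN = 20,  /* IPv4 header without options */
    IPV6_HDR_LEN = 40,  /* fixed IPv6 header */
    ICMP_HDR_LEN = 8    /* ICMPv4 or ICMPv6 echo header */
};

/* Total datagram size for an echo request carrying datalen data bytes. */
int echo_datagram_len(int ip_hdr_len, int datalen)
{
    assert(datalen >= 0);
    return ip_hdr_len + ICMP_HDR_LEN + datalen;
}
```

With datalen set to 56, echo_datagram_len(IPV4_HDR_LEN, 56) gives 84 and echo_datagram_len(IPV6_HDR_LEN, 56) gives 104, matching the figures in the text.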

Handle command line options

• The only command-line option supported is -v, which prints most received ICMP messages.
• Echo replies that belong to another running copy of ping are not printed.
• A signal handler is established for SIGALRM; this signal is generated once a second and causes an ICMP echo request to be sent.

Process hostname argument

• A hostname or IP address string is a required argument, and it is processed by our host_serv function.
• The returned addrinfo structure contains the protocol family, either AF_INET or AF_INET6.
• The pr global is initialized to point to the correct proto structure. For an IPv6 address, IN6_IS_ADDR_V4MAPPED is called to make sure that the address is not really an IPv4-mapped IPv6 address, because even though the returned address is an IPv6 address, IPv4 packets would be sent to the host.
• The socket address structure that has already been allocated by the getaddrinfo function is used as the one for sending, and another socket address structure of the same size is allocated for receiving.
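
The IPv4-mapped check described above can be tried in isolation. This is a minimal sketch, assuming a POSIX system that provides inet_pton and the IN6_IS_ADDR_V4MAPPED macro; the helper name is_v4mapped is hypothetical and does not appear in the ping program, which applies the macro to the address returned by getaddrinfo instead of parsing a string.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Returns 1 if text is an IPv4-mapped IPv6 address, 0 if it is some
 * other valid IPv6 address, and -1 if it cannot be parsed as IPv6. */
int is_v4mapped(const char *text)
{
    struct in6_addr addr;

    assert(text != NULL);
    if (inet_pton(AF_INET6, text, &addr) != 1)
        return -1;
    return IN6_IS_ADDR_V4MAPPED(&addr) ? 1 : 0;
}
```

An address such as ::ffff:192.0.2.1 is reported as mapped, while an ordinary IPv6 address such as 2001:db8::1 is not. A program like ping would refuse the first kind, since the packets actually sent would be IPv4.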

The function readloop is where the processing takes place; it is shown in Fig. 4.34.


#include "ping.h"

void
readloop(void)
{
    int size;
    char recvbuf[BUFSIZE];
    socklen_t len;
    ssize_t n;
    struct timeval tval;

    sockfd = Socket(pr->sasend->sa_family, SOCK_RAW, pr->icmpproto);
    setuid(getuid());    /* don't need special permissions any more */

    size = 60 * 1024;    /* OK if setsockopt fails */
    setsockopt(sockfd, SOL_SOCKET, SO_RCVBUF, &size, sizeof(size));

    sig_alrm(SIGALRM);    /* send first packet */

    for ( ; ; ) {
        len = pr->salen;
        n = recvfrom(sockfd, recvbuf, sizeof(recvbuf), 0, pr->sarecv, &len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            else
                err_sys("recvfrom error");
        }

        Gettimeofday(&tval, NULL);
        (*pr->fproc)(recvbuf, n, &tval);
    }
}
Fig. 4.34 readloop function

Create socket

• A raw socket of the appropriate protocol is created.
• The call to setuid sets our effective user ID to our real user ID.
• The program must have superuser privileges to create the raw socket, but once the socket is created, the extra privileges are given up.
• It is always best to give up an extra privilege when it is no longer needed.

Set socket receive buffer

• The socket receive buffer size is set to 61,440 bytes (60 x 1024), which should be larger than the default.
• This is done in case the user pings either the IPv4 broadcast address or a multicast address, either of which can generate lots of replies.
• By making the buffer larger, there is a smaller chance that the socket receive buffer will overflow.


Send first packet

• The signal handler is called, which sends a packet and schedules a SIGALRM for one second in the future.
• It is not common to see a signal handler called directly, but it is acceptable.
• A signal handler is just a C function, even though it is normally called asynchronously by the kernel.

Infinite loop reading all ICMP messages

• The main loop of the program is an infinite loop that reads all packets returned on the raw ICMP socket.
• gettimeofday is called to record the time at which the packet was received, and then the appropriate protocol function (proc_v4 or proc_v6) is called to process the ICMP message.

Fig. 4.35 shows the proc_v4 function, which processes all received ICMPv4 messages.

• When an ICMPv4 message is received by the process on the raw socket, the kernel has already verified that the basic fields in the IPv4 header and in the ICMPv4 header are valid.

#include "ping.h"

void
proc_v4(char *ptr, ssize_t len, struct timeval *tvrecv)
{
    int hlen1, icmplen;
    double rtt;
    struct ip *ip;
    struct icmp *icmp;
    struct timeval *tvsend;

    ip = (struct ip *) ptr;    /* start of IP header */
    hlen1 = ip->ip_hl << 2;    /* length of IP header */

    icmp = (struct icmp *) (ptr + hlen1);    /* start of ICMP header */
    if ( (icmplen = len - hlen1) < 8)
        err_quit("icmplen (%d) < 8", icmplen);

    if (icmp->icmp_type == ICMP_ECHOREPLY) {
        if (icmp->icmp_id != pid)
            return;    /* not a response to our ECHO_REQUEST */
        if (icmplen < 16)
            err_quit("icmplen (%d) < 16", icmplen);

        tvsend = (struct timeval *) icmp->icmp_data;
        tv_sub(tvrecv, tvsend);
        rtt = tvrecv->tv_sec * 1000.0 + tvrecv->tv_usec / 1000.0;

        printf("%d bytes from %s: seq=%u, ttl=%d, rtt=%.3f ms\n",
               icmplen, Sock_ntop_host(pr->sarecv, pr->salen),
               icmp->icmp_seq, ip->ip_ttl, rtt);

    } else if (verbose) {
        printf("  %d bytes from %s: type = %d, code = %d\n",
               icmplen, Sock_ntop_host(pr->sarecv, pr->salen),
               icmp->icmp_type, icmp->icmp_code);
    }
}
Fig. 4.35 proc_v4 function: process ICMPv4 message

Get pointer to ICMP header

• The IPv4 header length field is multiplied by four, giving the size of the IPv4 header in bytes.
• icmp is set to point to the beginning of the ICMP header.
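
The header length calculation can be sketched without struct ip by reading the first byte of the IPv4 header directly. The function name ipv4_header_len is hypothetical; the sketch relies only on the fact that the low-order 4 bits of that byte hold the header length in 32-bit words, which is the quantity that ip_hl << 2 converts to bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Header length in bytes, taken from the first byte of an IPv4 header.
 * The high 4 bits are the version, the low 4 bits the length in words. */
int ipv4_header_len(const unsigned char *packet)
{
    assert(packet != NULL);
    return (packet[0] & 0x0f) << 2;
}
```

A first byte of 0x45 (version 4, length 5 words) gives 20 bytes, the size of a header without options, and 0x4f gives the 60-byte maximum. The ICMP header then starts at that offset from the beginning of the buffer.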

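Fig. 4.36 shows the various headers, pointers and lengths used by the code.

[Figure: a received datagram of len bytes. The IPv4 header (20 bytes) and IPv4 options (0-40 bytes) together occupy hlen1 bytes; the ICMPv4 header and ICMP data occupy the remaining icmplen bytes. ip points to the start of the IPv4 header and icmp to the start of the ICMPv4 header.]

Fig. 4.36 Headers, pointers and lengths in processing ICMPv4 reply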

Check for ICMP echo reply

• If the message is an ICMP echo reply, the identifier field is checked to see if this reply is in response to a request that our process sent.
• If the ping program is running multiple times on this host, each process gets a copy of all received ICMP messages.
• The RTT is calculated by subtracting the time the message was sent from the current time.
• The RTT is converted from microseconds to milliseconds and printed, along with the sequence number field and the received TTL.
• The sequence number field lets the user see if packets were dropped, reordered or duplicated, and the TTL gives an indication of the number of hops between the two hosts.


Print all received ICMP messages if verbose option specified

• If the user specified the -v command-line option, the type and code fields are printed from all other received ICMP messages.

Fig. 4.37 shows the tv_sub function, which subtracts two timeval structures, storing the result in the first structure.
#include "unp.h"

void
tv_sub(struct timeval *out, struct timeval *in)
{
    if ( (out->tv_usec -= in->tv_usec) < 0) {    /* out -= in */
        --out->tv_sec;
        out->tv_usec += 1000000;
    }
    out->tv_sec -= in->tv_sec;
}
Fig. 4.37 tv_sub function: subtract two timeval structures
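
The borrow step in tv_sub and the conversion to milliseconds used by proc_v4 can be checked with a small standalone sketch. The names timeval_sub and rtt_ms are hypothetical; the subtraction follows the same logic as tv_sub, and the conversion is the expression tv_sec * 1000.0 + tv_usec / 1000.0 from the listings.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/* Same borrow logic as tv_sub: out -= in. */
void timeval_sub(struct timeval *out, const struct timeval *in)
{
    assert(out != NULL && in != NULL);
    if ((out->tv_usec -= in->tv_usec) < 0) {
        --out->tv_sec;              /* borrow one second */
        out->tv_usec += 1000000;
    }
    out->tv_sec -= in->tv_sec;
}

/* Elapsed time between send and receive, in milliseconds. */
double rtt_ms(struct timeval sent, struct timeval received)
{
    timeval_sub(&received, &sent);
    return received.tv_sec * 1000.0 + received.tv_usec / 1000.0;
}
```

For a request sent at 10.900000 s and a reply received at 11.150000 s, the microsecond field first becomes negative, one second is borrowed, and the result is 0 s and 250000 microseconds, that is, 250 ms.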

The processing of ICMPv6 messages is handled by the proc_v6 function shown in Fig. 4.38. It is similar to the proc_v4 function.
#include "ping.h"

void
proc_v6(char *ptr, ssize_t len, struct timeval *tvrecv)
{
#ifdef IPV6
    int hlen1, icmp6len;
    double rtt;
    struct ip6_hdr *ip6;
    struct icmp6_hdr *icmp6;
    struct timeval *tvsend;

    ip6 = (struct ip6_hdr *) ptr;    /* start of IPv6 header */
    hlen1 = sizeof(struct ip6_hdr);
    if (ip6->ip6_nxt != IPPROTO_ICMPV6)
        err_quit("next header not IPPROTO_ICMPV6");

    icmp6 = (struct icmp6_hdr *) (ptr + hlen1);
    if ( (icmp6len = len - hlen1) < 8)
        err_quit("icmp6len (%d) < 8", icmp6len);

    if (icmp6->icmp6_type == ICMP6_ECHO_REPLY) {
        if (icmp6->icmp6_id != pid)
            return;    /* not a response to our ECHO_REQUEST */
        if (icmp6len < 16)
            err_quit("icmp6len (%d) < 16", icmp6len);

        tvsend = (struct timeval *) (icmp6 + 1);
        tv_sub(tvrecv, tvsend);
        rtt = tvrecv->tv_sec * 1000.0 + tvrecv->tv_usec / 1000.0;

        printf("%d bytes from %s: seq=%u, hlim=%d, rtt=%.3f ms\n",
               icmp6len, Sock_ntop_host(pr->sarecv, pr->salen),
               icmp6->icmp6_seq, ip6->ip6_hlim, rtt);

    } else if (verbose) {
        printf("  %d bytes from %s: type = %d, code = %d\n",
               icmp6len, Sock_ntop_host(pr->sarecv, pr->salen),
               icmp6->icmp6_type, icmp6->icmp6_code);
    }
#endif /* IPV6 */
}
Fig. 4.38 proc_v6 function: process received ICMPv6 message

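Get pointer to ICMPv6 header

• The size of the IPv6 header is fixed (40 bytes), and it is verified that the next header is ICMPv6.

Fig. 4.39 shows the various headers, pointers and lengths used by the code.

[Figure: a received datagram of len bytes. The IPv6 header (40 bytes) occupies hlen1 bytes; the ICMPv6 header and ICMP data occupy the remaining icmp6len bytes. ip6 points to the start of the IPv6 header and icmp6 to the start of the ICMPv6 header.]

Fig. 4.39 Headers, pointers and lengths in processing ICMPv6 reply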

Check for ICMP echo reply

• If the ICMP message type is an echo reply, the identifier field is checked to see if the reply is for us.
• The RTT is calculated and then printed along with the sequence number and the IPv6 hop limit.


Print all received ICMP messages if verbose option specified

• If the user specified the -v command-line option, the type and code fields are printed from all other received ICMP messages.

Our signal handler for the SIGALRM signal is the sig_alrm function shown in Fig. 4.40.

• Our readloop function calls this signal handler once at the beginning to send the first packet.
• This function just calls the protocol-dependent function to send an ICMP echo request (send_v4 or send_v6) and then schedules another SIGALRM for 1 second in the future.

#include "ping.h"

void
sig_alrm(int signo)
{
    (*pr->fsend)();

    alarm(1);
    return;    /* probably interrupts recvfrom() */
}
Fig. 4.40 sig_alrm function: SIGALRM signal handler

The function send_v4, shown in Fig. 4.41, builds an ICMPv4 echo request message and writes it to the raw socket.
#include "ping.h"

void
send_v4(void)
{
    int len;
    struct icmp *icmp;

    icmp = (struct icmp *) sendbuf;
    icmp->icmp_type = ICMP_ECHO;
    icmp->icmp_code = 0;
    icmp->icmp_id = pid;
    icmp->icmp_seq = nsent++;
    Gettimeofday((struct timeval *) icmp->icmp_data, NULL);

    len = 8 + datalen;    /* checksum ICMP header and data */
    icmp->icmp_cksum = 0;
    icmp->icmp_cksum = in_cksum((u_short *) icmp, len);

    Sendto(sockfd, sendbuf, len, 0, pr->sasend, pr->salen);
}
Fig. 4.41 send_v4 function: build an ICMPv4 echo request and send it


Build an ICMPv4 message

• The ICMPv4 message is built.
• The identifier field is set to our process ID, and the sequence number field is set to the global nsent, which is then incremented for the next packet.
• The current time of day is stored in the data portion of the ICMP message.

Calculate ICMP checksum

• To calculate the ICMP checksum, the checksum field is set to 0 and the function in_cksum is called; the result is stored in the checksum field.
• The ICMPv4 checksum is calculated from the ICMPv4 header and any data that follows.

Send datagram

• The ICMP message is sent on the raw socket.
• Since the IP_HDRINCL socket option is not set, the kernel builds the IPv4 header and prepends it to our buffer.
• The Internet checksum is the ones complement of the ones-complement sum of the 16-bit values to be checksummed.
• If the data length is an odd number, one byte of zero is logically appended to the end of the data just for the checksum computation.
• This algorithm is used for the IPv4, ICMPv4, IGMPv4, ICMPv6, UDP and TCP checksums.

The in_cksum function shown in Fig. 4.42 calculates the checksum.

unsigned short
in_cksum(unsigned short *addr, int len)
{
    int nleft = len;
    int sum = 0;
    unsigned short *w = addr;
    unsigned short answer = 0;

    /*
     * Our algorithm is simple, using a 32 bit accumulator (sum), we add
     * sequential 16 bit words to it, and at the end, fold back all the
     * carry bits from the top 16 bits into the lower 16 bits.
     */
    while (nleft > 1) {
        sum += *w++;
        nleft -= 2;
    }

    /* mop up an odd byte, if necessary */
    if (nleft == 1) {
        *(unsigned char *)(&answer) = *(unsigned char *)w;
        sum += answer;
    }

    /* add back carry outs from top 16 bits to low 16 bits */
    sum = (sum >> 16) + (sum & 0xffff);    /* add hi 16 to low 16 */
    sum += (sum >> 16);                    /* add carry */
    answer = ~sum;                         /* truncate to 16 bits */
    return(answer);
}
Fig. 4.42 in_cksum function: calculates the Internet checksum

Internet checksum algorithm

• The first while loop calculates the sum of all the 16-bit values.
• If the length is odd, the final byte is added into the sum.
• This algorithm is fine for the ping program but inadequate for the high volumes of checksum computations performed by the kernel.
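
A useful property of this checksum is that recomputing it over a message that already contains its checksum yields zero, which is how a receiver validates a message. The sketch below illustrates that property. It is a variant written for this illustration, not the in_cksum listing itself: the names inet_checksum, checksum_ok and sample_roundtrip are hypothetical, an unsigned 32-bit accumulator is used, and 16-bit words are copied out with memcpy so that the buffer need not be aligned.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Ones-complement Internet checksum over len bytes.  Words are read in
 * memory order, so the result can be stored directly into the message. */
uint16_t inet_checksum(const void *data, size_t len)
{
    const unsigned char *p = data;
    uint32_t sum = 0;
    uint16_t word;

    assert(data != NULL || len == 0);
    while (len > 1) {
        memcpy(&word, p, sizeof(word));
        sum += word;
        p += 2;
        len -= 2;
    }
    if (len == 1) {                      /* odd byte, padded with zero */
        word = 0;
        memcpy(&word, p, 1);
        sum += word;
    }
    sum = (sum >> 16) + (sum & 0xffff);  /* fold high 16 bits into low 16 */
    sum += (sum >> 16);                  /* add the carry from the fold */
    return (uint16_t)~sum;
}

/* A message verifies if the checksum over it, field included, is zero. */
int checksum_ok(const void *msg, size_t len)
{
    return inet_checksum(msg, len) == 0;
}

/* Build a sample 12-byte echo-request-like message, store its checksum
 * in bytes 2-3 (where the ICMP checksum field sits), then verify it. */
int sample_roundtrip(void)
{
    unsigned char msg[12] = { 8, 0, 0, 0, 0x12, 0x34, 0, 1,
                              'p', 'i', 'n', 'g' };
    uint16_t ck = inet_checksum(msg, sizeof(msg));

    memcpy(msg + 2, &ck, sizeof(ck));
    return checksum_ok(msg, sizeof(msg));
}
```

The checksum field must be zero while the checksum is first computed, which is why send_v4 clears icmp_cksum before calling in_cksum.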

The final function for our ping program is send_v6, shown in Fig. 4.43, which builds and sends an ICMPv6 echo request.
#include "ping.h"

void
send_v6()
{
#ifdef IPV6
    int len;
    struct icmp6_hdr *icmp6;

    icmp6 = (struct icmp6_hdr *) sendbuf;
    icmp6->icmp6_type = ICMP6_ECHO_REQUEST;
    icmp6->icmp6_code = 0;
    icmp6->icmp6_id = pid;
    icmp6->icmp6_seq = nsent++;
    Gettimeofday((struct timeval *) (icmp6 + 1), NULL);

    len = 8 + datalen;    /* 8-byte ICMPv6 header */

    Sendto(sockfd, sendbuf, len, 0, pr->sasend, pr->salen);
    /* kernel calculates and stores checksum for us */
#endif /* IPV6 */
}
Fig. 4.43 send_v6 function: build and send an ICMPv6 echo request message


This function is similar to send_v4, except that it does not compute the ICMPv6 checksum.

Since the ICMPv6 checksum uses the source address from the IPv6 header in its computation, this checksum is calculated by the kernel for us, after the kernel chooses the source address.

4.12 Traceroute program

4.12.1 Introduction

Like the "ping" program, traceroute may be used to verify that an end-to-end Internet path is operational, but it also provides information on each of the intermediate systems (i.e. IP routers) found along the IP path from the sender to the receiver.

• The traceroute utility checks responsiveness and also traces the path a packet takes to get from one host to the other.
• Some devices on a network will not respond to ping or traceroute requests.
• In this case, the traceroute utility can show the location right before the host that is not responding.
• Traceroute (tracert) works by sending a packet to an unused UDP port on a destination machine.
• For the initial three packets, traceroute sets the TTL to 1 and releases the packet.
• The packet then gets transferred to the first router (completing the first hop), and the TTL gets decremented by the router from 1 to 0.
• The router then discards the packet and sends an ICMP notification packet to the original host with the message that the TTL expired at the router.
• This tells traceroute what the first hop is and how long it takes to get there.
• Traceroute repeats this, gradually incrementing the TTL, until a path to the remote host is traced and it gets back an ICMP port unreachable message, indicating that the remote host has been reached.
• Response times may vary dramatically, sometimes because the packet is crossing long distances and at other times because of network congestion.
• When "traceroute" encounters a router that does not respond, it prints a "*" character.
• The "traceroute" program also contains a client interface to ICMP.
• Some implementations (such as the Windows tracert command) send ICMP echo messages instead of UDP datagrams.
• These are addressed to the target IP address.


• The sender manipulates the TTL (hop count) value at the IP layer to force each hop in turn to return an error message.

Traceroute is a computer network tool used to determine the route taken by packets across an IP network. An IPv6 variant, traceroute6, is also widely available.

4.12.2 Uses

• Traceroute is often used for network troubleshooting.
• By showing a list of routers traversed, it allows the user to identify the path taken to reach a particular destination on the network.
• This can help identify routing problems or firewalls that may be blocking access to a site. Traceroute is also used by penetration testers to gather information about network infrastructure and IP ranges around a given host.
• It can also be used when downloading data: if there are multiple mirrors available for the same piece of data, one can trace each mirror to get a good idea of which mirror would be the fastest to use.

Fig. 4.44 shows our trace.h header, which all of our program files include.


#include "unp.h"
#include <netinet/in_systm.h>
#include <netinet/ip.h>
#include <netinet/ip_icmp.h>
#include <netinet/udp.h>

#define BUFSIZE 1500

struct rec {                   /* format of outgoing UDP data */
    u_short rec_seq;           /* sequence number */
    u_short rec_ttl;           /* TTL packet left with */
    struct timeval rec_tv;     /* time packet left */
};

/* globals */
char recvbuf[BUFSIZE];
char sendbuf[BUFSIZE];

int datalen;            /* # bytes of data, following ICMP header */
char *host;
u_short sport, dport;
int nsent;              /* add 1 for each sendto() */
pid_t pid;              /* our PID */
int probe, nprobes;
int sendfd, recvfd;     /* send on UDP sock, read on raw ICMP sock */
int ttl, max_ttl;
int verbose;

/* function prototypes */
char *icmpcode_v4(int);
char *icmpcode_v6(int);
int recv_v4(int, struct timeval *);
int recv_v6(int, struct timeval *);
void sig_alrm(int);
void traceloop(void);
void tv_sub(struct timeval *, struct timeval *);

struct proto {
    char *(*icmpcode)(int);
    int (*recv)(int, struct timeval *);
    struct sockaddr *sasend;    /* sockaddr{} for send, from getaddrinfo */
    struct sockaddr *sarecv;    /* sockaddr{} for receiving */
    struct sockaddr *salast;    /* last sockaddr{} for receiving */
    struct sockaddr *sabind;    /* sockaddr{} for binding source port */
    socklen_t salen;            /* length of sockaddr{}s */
    int icmpproto;              /* IPPROTO_xxx value for ICMP */
    int ttllevel;               /* setsockopt() level to set TTL */
    int ttloptname;             /* setsockopt() name to set TTL */
} *pr;

#ifdef IPV6
#include "ip6.h"      /* should be <netinet/ip6.h> */
#include "icmp6.h"    /* should be <netinet/icmp6.h> */
#endif
Fig. 4.44 trace.h header

Include IPv4 headers

• The standard IPv4 headers are included that define the IPv4, ICMPv4 and UDP structures and constants.
• The rec structure defines the data portion of the UDP datagram that is sent, but the data never needs to be examined. It is sent mainly for debugging purposes.

Define proto structure

• The proto structure is used to handle the protocol differences between IPv4 and IPv6.
• This structure contains function pointers, pointers to socket address structures, and other constants that differ between the two IP versions.
• The global pointer pr will point to one of these structures, initialized for either IPv4 or IPv6 after the destination address is processed by the main function.

Include IPv6 headers

• The headers are included that define the IPv6 and ICMPv6 structures and constants.

The main function is shown in Fig. 4.45. It processes the command-line arguments, initializes the pr pointer for either IPv4 or IPv6, and calls our traceloop function.

#include "trace.h"

struct proto proto_v4 = { icmpcode_v4, recv_v4, NULL, NULL, NULL, NULL, 0,
                          IPPROTO_ICMP, IPPROTO_IP, IP_TTL };
#ifdef IPV6
struct proto proto_v6 = { icmpcode_v6, recv_v6, NULL, NULL, NULL, NULL, 0,
                          IPPROTO_ICMPV6, IPPROTO_IPV6, IPV6_UNICAST_HOPS };
#endif

int datalen = sizeof(struct rec);    /* defaults */
int max_ttl = 30;
int nprobes = 3;
u_short dport = 32768 + 666;

int
main(int argc, char **argv)
{
    int c;
    struct addrinfo *ai;

    opterr = 0;    /* don't want getopt() writing to stderr */
    while ( (c = getopt(argc, argv, "m:v")) != -1) {
        switch (c) {
        case 'm':
            if ( (max_ttl = atoi(optarg)) <= 1)
                err_quit("invalid -m value");
            break;
        case 'v':
            verbose++;
            break;
        case '?':
            err_quit("unrecognized option: %c", c);
        }
    }
    if (optind != argc - 1)
        err_quit("usage: traceroute [ -m <maxttl> -v ] <hostname>");
    host = argv[optind];

    pid = getpid();
    Signal(SIGALRM, sig_alrm);

    ai = Host_serv(host, NULL, 0, 0);
    printf("traceroute to %s (%s): %d hops max, %d data bytes\n",
           ai->ai_canonname,
           Sock_ntop_host(ai->ai_addr, ai->ai_addrlen),
           max_ttl, datalen);

    /* initialize according to protocol */
    if (ai->ai_family == AF_INET) {
        pr = &proto_v4;
#ifdef IPV6
    } else if (ai->ai_family == AF_INET6) {
        pr = &proto_v6;
        if (IN6_IS_ADDR_V4MAPPED(&(((struct sockaddr_in6 *)
                                 ai->ai_addr)->sin6_addr)))
            err_quit("cannot traceroute IPv4-mapped IPv6 address");
#endif
    } else
        err_quit("unknown address family %d", ai->ai_family);

    pr->sasend = ai->ai_addr;    /* contains destination address */
    pr->sarecv = Calloc(1, ai->ai_addrlen);
    pr->salast = Calloc(1, ai->ai_addrlen);
    pr->sabind = Calloc(1, ai->ai_addrlen);
    pr->salen = ai->ai_addrlen;

    traceloop();

    exit(0);
}
Fig. 4.45 Main function for traceroute program

Define proto structures

• The two proto structures are defined, one for IPv4 and one for IPv6, although the pointers to the socket address structures are not allocated until the end of this function.

Set defaults

• The maximum TTL or hop limit that the program uses defaults to 30.
• The -m command-line option is provided to let the user change this.
• For each TTL, three probe packets are sent, but this could be changed with another command-line option.
• The initial destination port is 32768 + 666, and this is incremented by one each time a UDP datagram is sent.
• It is hoped that these ports are not in use on the destination host when the datagrams finally reach the destination, but there is no guarantee.

Process command-line arguments

• The -v command-line option causes most received ICMP messages to be printed.

Process hostname or IP address argument and finish initialization

• The destination hostname or IP address is processed by our host_serv function, which returns a pointer to an addrinfo structure.
• Depending on the type of the returned address, IPv4 or IPv6, initialization is finished by storing a pointer to the correct proto structure in the pr global and allocating additional socket address structures of the correct size.

The function traceloop, shown in Fig. 4.46, sends the datagrams and reads the returned ICMP messages. This is the main loop of the program.

#include "trace.h"

void
traceloop(void)
{
    int seq, code, done;
    double rtt;
    struct rec *rec;
    struct timeval tvrecv;

    recvfd = Socket(pr->sasend->sa_family, SOCK_RAW, pr->icmpproto);
    setuid(getuid());    /* don't need special permissions any more */

    sendfd = Socket(pr->sasend->sa_family, SOCK_DGRAM, 0);

    pr->sabind->sa_family = pr->sasend->sa_family;
    sport = (getpid() & 0xffff) | 0x8000;    /* our source UDP port # */
    sock_set_port(pr->sabind, pr->salen, htons(sport));
    Bind(sendfd, pr->sabind, pr->salen);

    sig_alrm(SIGALRM);

    seq = 0;
    done = 0;
    for (ttl = 1; ttl <= max_ttl && done == 0; ttl++) {
        Setsockopt(sendfd, pr->ttllevel, pr->ttloptname, &ttl, sizeof(int));
        bzero(pr->salast, pr->salen);

        printf("%2d ", ttl);
        fflush(stdout);

        for (probe = 0; probe < nprobes; probe++) {
            rec = (struct rec *) sendbuf;
            rec->rec_seq = ++seq;
            rec->rec_ttl = ttl;
            Gettimeofday(&rec->rec_tv, NULL);

            sock_set_port(pr->sasend, pr->salen, htons(dport + seq));
            Sendto(sendfd, sendbuf, datalen, 0, pr->sasend, pr->salen);

            if ( (code = (*pr->recv)(seq, &tvrecv)) == -3)
                printf(" *");    /* timeout, no reply */
            else {
                char str[NI_MAXHOST];

                if (sock_cmp_addr(pr->sarecv, pr->salast, pr->salen) != 0) {
                    if (getnameinfo(pr->sarecv, pr->salen, str, sizeof(str),
                                    NULL, 0, 0) == 0)
                        printf(" %s (%s)", str,
                               Sock_ntop_host(pr->sarecv, pr->salen));
                    else
                        printf(" %s",
                               Sock_ntop_host(pr->sarecv, pr->salen));
                    memcpy(pr->salast, pr->sarecv, pr->salen);
                }
                tv_sub(&tvrecv, &rec->rec_tv);
                rtt = tvrecv.tv_sec * 1000.0 + tvrecv.tv_usec / 1000.0;
                printf("  %.3f ms", rtt);

                if (code == -1)    /* port unreachable; at destination */
                    done++;
                else if (code >= 0)
                    printf(" (ICMP %s)", (*pr->icmpcode)(code));
            }
            fflush(stdout);
        }
        printf("\n");
    }
}
Fig. 4.46 traceloop function: main processing loop

Create two sockets

• Two sockets are needed: a raw socket on which all returned ICMP messages are read, and a UDP socket on which the probe packets are sent with increasing TTLs.
• After creating the raw socket, our effective user ID is reset to our real user ID, since superuser privileges are no longer required.

Bind source port of UDP socket

• A source port is bound to the UDP socket that is used for sending, using the low-order 15 bits of our process ID with the high-order bit set to 1.
• Since it is possible for multiple copies of the traceroute program to be running at any given time, it is necessary to determine whether a received ICMP message was generated in response to one of our datagrams or in response to a datagram sent by another copy of the program.
• The source port in the UDP header is used to identify the sending process, because the ICMP message always returns the UDP header from the datagram that caused the ICMP error.
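
The port arithmetic used to recognize our own probes can be sketched as follows. The function names are hypothetical and the ports are assumed to be in host byte order already; in the program itself the comparison is made against htons(sport) and htons(dport + seq) on the UDP header returned inside the ICMP error.

```c
#include <assert.h>
#include <stdint.h>

/* Source port derived from a process ID: keep the low-order bits and
 * force the high-order bit of the 16-bit port on, as traceloop does. */
uint16_t probe_source_port(long pid)
{
    return (uint16_t)((pid & 0xffff) | 0x8000);
}

/* Destination port of the probe with the given sequence number. */
uint16_t probe_dest_port(uint16_t base_port, int seq)
{
    assert(seq >= 0);
    return (uint16_t)(base_port + seq);
}

/* A returned UDP header belongs to our probe only if both ports match. */
int is_our_probe(uint16_t ret_sport, uint16_t ret_dport,
                 long pid, uint16_t base_port, int seq)
{
    return ret_sport == probe_source_port(pid) &&
           ret_dport == probe_dest_port(base_port, seq);
}
```

For a process ID of 1234 the source port is 34002, and with the default base port of 32768 + 666 the first probe (sequence number 1) goes to port 33435. Forcing the high-order bit keeps the source port at 32768 or above, away from the low-numbered ports.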

Establish signal handler for SIGALRM

• The sig_alrm function is established as the signal handler for SIGALRM because, each time a UDP datagram is sent, we wait 3 seconds for an ICMP message before sending the next probe.

Main loop: set TTL or hop limit and send 3 probes

• The main loop of the function is a doubly nested for loop.
• The outer loop starts the TTL at 1 and increases it by 1, while the inner loop sends 3 probes to the destination.
• Each time the TTL changes, setsockopt is called to set the new value using either the IP_TTL or the IPV6_UNICAST_HOPS socket option.
• Each time around the outer loop, the socket address structure pointed to by salast is initialized to 0.


• This structure will be compared to the socket address structure returned by recvfrom when the ICMP message is read, and if the 2 structures are different, the IP address from the new structure is printed.
• Using this technique, the IP address corresponding to the first probe for each TTL is printed.
• If the IP address changes for a given value of TTL, the new IP address is then printed.

Read ICMP message

The function recv_v4 or recv_v6 calls recvfrom to read and process the ICMP messages that are returned. These 2 functions return -3 if a timeout occurs, -2 if an ICMP "time exceeded in transit" error is received, -1 if an ICMP "port unreachable" error is received, or the non-negative ICMP code if some other ICMP destination unreachable error is received.
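
The way the caller interprets these return values can be summarized in a short sketch. The enumerator and function names below are hypothetical labels for the numeric codes described above; the traceloop function itself tests the plain integers.

```c
#include <assert.h>

/* Return values described in the text for recv_v4 and recv_v6. */
enum {
    RECV_TIMEOUT = -3,        /* no reply before the alarm expired */
    RECV_TIME_EXCEEDED = -2,  /* an intermediate router answered */
    RECV_PORT_UNREACH = -1    /* the destination was reached */
};

enum probe_action { PRINT_STAR, KEEP_GOING, DONE, PRINT_ICMP_CODE };

/* Map a return value to what the main loop does with it. */
enum probe_action classify_reply(int code)
{
    assert(code >= RECV_TIMEOUT);
    if (code == RECV_TIMEOUT)
        return PRINT_STAR;
    if (code == RECV_TIME_EXCEEDED)
        return KEEP_GOING;
    if (code == RECV_PORT_UNREACH)
        return DONE;
    return PRINT_ICMP_CODE;   /* non-negative ICMP unreachable code */
}
```

Using negative numbers for the three special cases leaves the whole non-negative range free for ICMP codes, so a single int return value is enough.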
Print reply

• If this is the first reply for a given TTL, or if the IP address of the node sending the ICMP message has changed for this TTL, the hostname and IP address are printed.
• The RTT is calculated as the time difference from when the probe was sent to when the ICMP message was returned, and it is printed.

Our recv_v4 function is shown in Fig. 4.47.

#include "trace.h"

/*
 * Return: -3 on timeout
 *         -2 on ICMP time exceeded in transit (caller keeps going)
 *         -1 on ICMP port unreachable (caller is done)
 */

int
recv_v4(int seq, struct timeval *tv)
{
    int hlen1, hlen2, icmplen;
    socklen_t len;
    ssize_t n;
    struct ip *ip, *hip;
    struct icmp *icmp;
    struct udphdr *udp;

    alarm(3);
    for ( ; ; ) {
        len = pr->salen;
        n = recvfrom(recvfd, recvbuf, sizeof(recvbuf), 0, pr->sarecv, &len);
        if (n < 0) {
            if (errno == EINTR)
                return(-3);    /* alarm expired */
            else
                err_sys("recvfrom error");
        }
        Gettimeofday(tv, NULL);    /* get time of packet arrival */

        ip = (struct ip *) recvbuf;    /* start of IP header */
        hlen1 = ip->ip_hl << 2;        /* length of IP header */

        icmp = (struct icmp *) (recvbuf + hlen1);    /* start of ICMP header */
        if ( (icmplen = n - hlen1) < 8)
            err_quit("icmplen (%d) < 8", icmplen);

        if (icmp->icmp_type == ICMP_TIMXCEED &&
            icmp->icmp_code == ICMP_TIMXCEED_INTRANS) {
            if (icmplen < 8 + 20 + 8)
                err_quit("icmplen (%d) < 8 + 20 + 8", icmplen);

            hip = (struct ip *) (recvbuf + hlen1 + 8);
            hlen2 = hip->ip_hl << 2;
            udp = (struct udphdr *) (recvbuf + hlen1 + 8 + hlen2);
            if (hip->ip_p == IPPROTO_UDP &&
                udp->uh_sport == htons(sport) &&
                udp->uh_dport == htons(dport + seq))
                return(-2);    /* we hit an intermediate router */

        } else if (icmp->icmp_type == ICMP_UNREACH) {
            if (icmplen < 8 + 20 + 8)
                err_quit("icmplen (%d) < 8 + 20 + 8", icmplen);

            hip = (struct ip *) (recvbuf + hlen1 + 8);
            hlen2 = hip->ip_hl << 2;
            udp = (struct udphdr *) (recvbuf + hlen1 + 8 + hlen2);
            if (hip->ip_p == IPPROTO_UDP &&
                udp->uh_sport == htons(sport) &&
                udp->uh_dport == htons(dport + seq)) {
                if (icmp->icmp_code == ICMP_UNREACH_PORT)
                    return(-1);    /* have reached destination */
                else
                    return(icmp->icmp_code);    /* 0, 1, 2, ... */
            }
        } else if (verbose) {
            printf(" (from %s: type = %d, code = %d)\n",
                   Sock_ntop_host(pr->sarecv, pr->salen),
                   icmp->icmp_type, icmp->icmp_code);
        }
        /* Some other ICMP error, recvfrom() again */
    }
}
Fig. 4.47 recv_v4 function: read and process ICMPv4 messages
Set alarm and read each ICMP message

An alarm is set for 3 seconds in the future, and the function enters a loop that calls recvfrom, reading each ICMPv4 message returned on the raw socket.

Get pointer to ICMP header

ip points to the beginning of the IPv4 header and icmp points to the beginning of the ICMP header. Fig. 4.48 shows the various headers, pointers, and lengths used by the code.
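[Figure: a received datagram of n bytes. The outer IPv4 header (20 bytes) and IPv4 options (0-40 bytes) occupy hlen1 bytes. The remaining icmplen bytes hold the 8-byte ICMPv4 header followed by the IPv4 datagram that generated the ICMP error: its IPv4 header (20 bytes) and IPv4 options (0-40 bytes), which occupy hlen2 bytes, and its UDP header. ip, icmp, hip and udp point to the start of the outer IPv4 header, the ICMPv4 header, the inner IPv4 header and the UDP header, respectively.]

Fig. 4.48 Headers, pointers and lengths in processing ICMPv4 error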

Process ICMP time exceeded in transit message

If the ICMP message is a "time exceeded in transit" message, it is possibly a reply
to one of our probes. hip points to the IPv4 header that is returned in the ICMP message,
following the 8-byte ICMP header. udp points to the UDP header that follows. If the ICMP
message was generated by a UDP datagram and if the source and destination ports of that
datagram are the values that we sent, then this is a reply to our probe from an intermediate
router.
Process ICMP port unreachable message

If the ICMP message is a "destination unreachable", then we look at the UDP header
returned in the ICMP message to see whether the message is a response to our probe. If it is
and the ICMP code is "port unreachable", -1 is returned because the final destination has
been reached. If the ICMP message is from one of our probes but it is not a "port
unreachable", then the ICMP code value is returned.
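
The two cases above can be sketched as one decision function working on raw bytes. This is a simplified illustration under stated assumptions, not the recv_v4 code: classify_icmp4, be16 and the NOT_OURS value (-4) are hypothetical, the ports are passed in host byte order, and a message that is too short is ignored here, whereas recv_v4 calls err_quit. The numeric type and code values are the standard ICMPv4 assignments.

```c
#include <stddef.h>
#include <stdint.h>

#define ICMP4_UNREACH        3   /* type: destination unreachable */
#define ICMP4_TIMXCEED      11   /* type: time exceeded */
#define ICMP4_TIMX_INTRANS   0   /* code: TTL expired in transit */
#define ICMP4_UNREACH_PORT   3   /* code: port unreachable */
#define PROTO_UDP           17
#define NOT_OURS            -4   /* hypothetical: caller reads again */

/* Read a 16-bit big-endian (network order) value. */
static uint16_t
be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* icmp points at the ICMP header, icmplen is the number of bytes
 * from there to the end of the datagram. Returns -2 for a reply
 * from an intermediate router, -1 when the destination is reached,
 * another unreachable code (>= 0), or NOT_OURS. */
int
classify_icmp4(const uint8_t *icmp, size_t icmplen,
               uint16_t sport, uint16_t dport)
{
    if (icmplen < 8 + 20 + 8)
        return NOT_OURS;
    uint8_t type = icmp[0], code = icmp[1];
    if (type != ICMP4_UNREACH &&
        !(type == ICMP4_TIMXCEED && code == ICMP4_TIMX_INTRANS))
        return NOT_OURS;

    const uint8_t *hip = icmp + 8;               /* returned IPv4 header */
    size_t hlen2 = (size_t)(hip[0] & 0x0f) << 2;
    if (hlen2 < 20 || icmplen < 8 + hlen2 + 8)
        return NOT_OURS;
    const uint8_t *udp = hip + hlen2;            /* returned UDP header */

    /* Byte 9 of the IPv4 header is the protocol field. */
    if (hip[9] != PROTO_UDP || be16(udp) != sport || be16(udp + 2) != dport)
        return NOT_OURS;
    if (type == ICMP4_TIMXCEED)
        return -2;
    return code == ICMP4_UNREACH_PORT ? -1 : code;
}
```

The caller would pass dport + seq as the expected destination port, matching the comparison made in recv_v4.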
Handle other ICMP messages

All other ICMP messages are printed if the -v flag was specified. The next function,
recv_v6, is shown in Fig.4.49.
#include "trace.h"

/*
 * Return: -3 on timeout
 *         -2 on ICMP time exceeded in transit (caller keeps going)
 *         -1 on ICMP port unreachable (caller is done)
 */

int
recv_v6(int seq, struct timeval *tv)
{
#ifdef IPV6
    int                 hlen1, hlen2, icmp6len;
    ssize_t             n;
    socklen_t           len;
    struct ip6_hdr      *ip6, *hip6;
    struct icmp6_hdr    *icmp6;
    struct udphdr       *udp;

    alarm(3);
    for ( ; ; ) {
        len = pr->salen;
        n = recvfrom(recvfd, recvbuf, sizeof(recvbuf), 0, pr->sarecv, &len);
        if (n < 0) {
            if (errno == EINTR)
                return(-3);     /* alarm expired */
            else
                err_sys("recvfrom error");
        }
        Gettimeofday(tv, NULL);     /* get time of packet arrival */

        ip6 = (struct ip6_hdr *) recvbuf;   /* start of IPv6 header */
        hlen1 = sizeof(struct ip6_hdr);

        icmp6 = (struct icmp6_hdr *) (recvbuf + hlen1);     /* ICMP hdr */
        if ( (icmp6len = n - hlen1) < 8)
            err_quit("icmp6len (%d) < 8", icmp6len);

        if (icmp6->icmp6_type == ICMP6_TIME_EXCEEDED &&
            icmp6->icmp6_code == ICMP6_TIME_EXCEED_TRANSIT) {
            if (icmp6len < 8 + 40 + 8)
                err_quit("icmp6len (%d) < 8 + 40 + 8", icmp6len);
            hip6 = (struct ip6_hdr *) (recvbuf + hlen1 + 8);
            hlen2 = sizeof(struct ip6_hdr);
            udp = (struct udphdr *) (recvbuf + hlen1 + 8 + hlen2);
            if (hip6->ip6_nxt == IPPROTO_UDP &&
                udp->uh_sport == htons(sport) &&
                udp->uh_dport == htons(dport + seq))
                return(-2);     /* we hit an intermediate router */
        } else if (icmp6->icmp6_type == ICMP6_DST_UNREACH) {
            if (icmp6len < 8 + 40 + 8)
                err_quit("icmp6len (%d) < 8 + 40 + 8", icmp6len);
            hip6 = (struct ip6_hdr *) (recvbuf + hlen1 + 8);
            hlen2 = 40;
            udp = (struct udphdr *) (recvbuf + hlen1 + 8 + hlen2);
            if (hip6->ip6_nxt == IPPROTO_UDP &&
                udp->uh_sport == htons(sport) &&
                udp->uh_dport == htons(dport + seq)) {
                if (icmp6->icmp6_code == ICMP6_DST_UNREACH_NOPORT)
                    return(-1);     /* have reached destination */
                else
                    return(icmp6->icmp6_code);  /* 0, 1, 2, ... */
            }
        } else if (verbose) {
            printf(" (from %s: type = %d, code = %d)\n",
                   Sock_ntop_host(pr->sarecv, pr->salen),
                   icmp6->icmp6_type, icmp6->icmp6_code);
        }
        /* Some other ICMP error, recvfrom() again */
    }
#endif
}
Fig.4.49 recv_v6 function: read and process ICMPv6 messages.

This function is nearly identical to recv_v4, except for the different constant
names and the different structure member names. Also, the size of the IPv6 header is a
fixed 40 bytes, whereas with IPv4 the header length field must be fetched and multiplied by 4
to account for any IP options. Fig.4.50 shows the various headers, pointers and lengths used by
the code.

[Figure: layout of the received datagram, showing the headers, pointers and lengths used by recv_v6.]

Fig.4.50 Headers, pointers and lengths in processing ICMPv6 error.
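
The difference in how the two functions obtain the header length can be sketched with a small helper that inspects the version nibble of a raw IP header. The function ip_header_len is a hypothetical name used only for illustration and does not appear in the traceroute program.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns the IP header length in bytes, or 0 if the version is
 * not 4 or 6 or the IPv4 header length field is below its minimum.
 * ip_header_len is an illustrative helper, not part of trace.h. */
size_t
ip_header_len(const uint8_t *pkt)
{
    unsigned version = pkt[0] >> 4;

    if (version == 4) {
        /* Low nibble is the header length in 32-bit words,
         * so multiply by 4 to account for any IP options. */
        size_t hlen = (size_t)(pkt[0] & 0x0f) << 2;
        return hlen >= 20 ? hlen : 0;
    }
    if (version == 6)
        return 40;      /* the IPv6 header is always 40 bytes */
    return 0;
}
```

An IPv4 header starting with 0x45 yields 20 bytes and one starting with 0x4f yields 60 bytes, while any IPv6 header (first nibble 6) yields 40 bytes.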

Two functions are defined, icmpcode_v4 and icmpcode_v6, that can be called from the
bottom of the traceloop function to print a description string corresponding to an ICMP
destination unreachable error.
Fig.4.51 shows just the IPv6 function. The IPv4 function is similar, although longer, as
there are more ICMPv4 destination unreachable codes.

#include "trace.h"

char *
icmpcode_v6(int code)
{
    switch (code) {
    case ICMP6_DST_UNREACH_NOROUTE:
        return("no route to host");
    case ICMP6_DST_UNREACH_ADMIN:
        return("administratively prohibited");
    case ICMP6_DST_UNREACH_NOTNEIGHBOR:
        return("not a neighbor");
    case ICMP6_DST_UNREACH_ADDR:
        return("address unreachable");
    case ICMP6_DST_UNREACH_NOPORT:
        return("port unreachable");
    default:
        return("[unknown code]");
    }
}

Fig.4.51 Return the string corresponding to an ICMPv6 unreachable code.
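
The IPv4 counterpart is not shown in this unit. As a rough idea of what such a function might look like, the sketch below maps the standard ICMPv4 destination unreachable codes 0 through 15 to strings using a lookup table. The name icmpcode_v4_sketch, the table approach and the exact wording of the strings are assumptions made for illustration; the actual icmpcode_v4 may differ.

```c
#include <stddef.h>

/* Illustrative only: maps an ICMPv4 destination unreachable code
 * to a description. Not the icmpcode_v4 from the program. */
const char *
icmpcode_v4_sketch(int code)
{
    static const char *const names[] = {
        "network unreachable",                              /* 0 */
        "host unreachable",                                 /* 1 */
        "protocol unreachable",                             /* 2 */
        "port unreachable",                                 /* 3 */
        "fragmentation required but DF bit set",            /* 4 */
        "source route failed",                              /* 5 */
        "destination network unknown",                      /* 6 */
        "destination host unknown",                         /* 7 */
        "source host isolated (obsolete)",                  /* 8 */
        "destination network administratively prohibited",  /* 9 */
        "destination host administratively prohibited",     /* 10 */
        "network unreachable for TOS",                      /* 11 */
        "host unreachable for TOS",                         /* 12 */
        "communication administratively prohibited",        /* 13 */
        "host precedence violation",                        /* 14 */
        "precedence cutoff in effect",                      /* 15 */
    };

    if (code < 0 || (size_t)code >= sizeof(names) / sizeof(names[0]))
        return "[unknown code]";
    return names[code];
}
```

A table keeps the sixteen cases compact; a switch statement in the style of icmpcode_v6 would work equally well.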

The final function in our traceroute program is our SIGALRM handler, the sig_alrm
function shown in Fig.4.52.
#include "trace.h"

void
sig_alrm(int signo)
{
    return;     /* just interrupt the recvfrom() */
}
Fig.4.52 sig_alrm function

All this function does is return, causing an error return of EINTR from the recvfrom in
either recv_v4 or recv_v6.
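
The timeout mechanism can be tried in isolation. The sketch below is a stand-alone illustration, not the program's code: recv_with_alarm and on_alarm are hypothetical names, a UDP socket on the loopback address stands in for the raw socket, and the handler is installed with sigaction without SA_RESTART on the assumption that recvfrom must not be restarted automatically after the signal.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

static void
on_alarm(int signo)
{
    (void)signo;    /* nothing to do: returning interrupts recvfrom() */
}

/* Waits up to secs seconds for a datagram on a fresh UDP socket that
 * nobody sends to. Returns -3 on timeout (the value recv_v4 and
 * recv_v6 use), the byte count if data arrives, -1 on other errors. */
int
recv_with_alarm(unsigned secs)
{
    struct sigaction sa;
    struct sockaddr_in addr;
    char buf[64];
    ssize_t n;
    int fd, saved;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;                /* no SA_RESTART: expect EINTR */
    if (sigaction(SIGALRM, &sa, NULL) < 0)
        return -1;

    if ((fd = socket(AF_INET, SOCK_DGRAM, 0)) < 0)
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;              /* let the kernel choose a port */
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(fd);
        return -1;
    }

    alarm(secs);
    n = recvfrom(fd, buf, sizeof(buf), 0, NULL, NULL);
    saved = errno;                  /* keep errno across alarm/close */
    alarm(0);                       /* cancel any pending alarm */
    close(fd);

    if (n < 0)
        return saved == EINTR ? -3 : -1;
    return (int)n;
}
```

Calling recv_with_alarm(1) blocks for about one second and is then expected to report the timeout, because no datagram is sent to the socket.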
