
Introduction to Computer Networks

Slides courtesy: T. S. Eugene Ng

1
Organizing Network Functionality

• Many kinds of networking functionality


– e.g., encoding, framing, routing, addressing, reliability, etc.
• Many different network styles and technologies
– circuit-switched vs packet-switched, etc.
– wireless vs wired vs optical, etc.
• Many different applications
– ftp, email, web, P2P, etc.

• Network architecture
– How should different pieces be organized?
– How should different pieces interact?

2
Problem

Application SMTP SSH FTP HTTP

Transmission Coaxial Fiber Packet


Media cable optic radio

• new application has to interface to all existing media


– adding new application requires O(m) work, m = number of media
• new media requires all existing applications be modified
– adding new media requires O(a) work, a = number of applications
• total work in system O(ma) → eventually too much work to add
apps/media
• Application end points may not be on the same media!

3
Solution: Indirection
• Solution: introduce an intermediate layer that provides a single
abstraction for various network technologies
– O(1) work to add app/media
– Indirection is an often used technique in computer science

Application SMTP SSH NFS HTTP

Intermediate
layer

Transmission Coaxial Fiber 802.11


Media cable optic LAN

4
Network Architecture

• Architecture is not the implementation itself

• Architecture is how to “organize” implementations


– what interfaces are supported
– where functionality is implemented

• Architecture is the modular design of the network

5
Software Modularity

Break system into modules:

• Well-defined interfaces give flexibility


– can change implementation of modules
– can extend functionality of system by adding new modules

• Interfaces hide information


– allows for flexibility
– but can hurt performance

6
Network Modularity

Like software modularity, but with a twist:

• Implementation distributed across routers and hosts

• Must decide both:


– how to break system into modules
– where modules are implemented

7
Outline

• Layering
– how to break network functionality into modules

• The End-to-End Argument


– where to implement functionality

8
Layering

• Layering is a particular form of modularization

• The system is broken into a vertical hierarchy of


logically distinct entities (layers)

• The service provided by one layer is based solely


on the service provided by layer below

• Rigid structure: easy reuse, performance suffers

9
ISO OSI Reference Model

• ISO – International Organization for Standardization
• OSI – Open Systems Interconnection
• Goal: a general open standard
– allow vendors to enter the market by using their own
implementation and protocols

10
ISO OSI Reference Model
• Seven layers
– Lower two layers are peer-to-peer
– Network layer involves multiple switches
– Next four layers are end-to-end

Host 1 Intermediate switch Host 2

Application Application
Presentation Presentation
Session Session
Transport Transport
Network Network Network
Datalink Datalink Datalink
Physical Physical Physical
Physical medium A Physical medium B

11
Layering Solves Problem

• Application layer doesn’t know about anything below


the presentation layer, etc.

• Information about network is hidden from higher


layers

• This ensures that we only need to implement an


application once!

12
Key Concepts
• Service – says what a layer does
– Ethernet: unreliable subnet unicast/multicast/broadcast
datagram service
– IP: unreliable end-to-end unicast datagram service
– TCP: reliable end-to-end bi-directional byte stream service
– Guaranteed bandwidth/latency unicast service
• Service Interface – says how to access the service
– E.g. UNIX socket interface
• Protocol – says how the service is implemented
– a set of rules and formats that govern the communication
between two peers

13
Physical Layer (1)

• Service: move information between two systems


connected by a physical link

• Interface: specifies how to send a bit

• Protocol: coding scheme used to represent a bit,


voltage levels, duration of a bit

• Examples: coaxial cable, optical fiber links;


transmitters, receivers

14
Datalink Layer (2)

• Service:
– framing (attach frame separators)
– send data frames between peers
– others:
• arbitrate the access to common physical media
• per-hop reliable transmission
• per-hop flow control

• Interface: send a data unit (packet) to a machine


connected to the same physical media
• Protocol: layer addresses, implement Medium Access
Control (MAC) (e.g., CSMA/CD)…

15
Network Layer (3)

• Service:
– deliver a packet to specified network destination
– perform segmentation/reassembly
– others:
• packet scheduling
• buffer management

• Interface: send a packet to a specified destination


• Protocol: define global unique addresses; construct
routing tables

16
Transport Layer (4)

• Service:
– Multiplexing/demultiplexing
– optional: error-free and flow-controlled delivery

• Interface: send message to specific destination

• Protocol: implements reliability and flow control

• Examples: TCP and UDP

17
Session Layer (5)

• Service:
– full-duplex
– access management (e.g., token control)
– synchronization (e.g., provide check points for long transfers)

• Interface: depends on service

• Protocol: token management; insert checkpoints,


implement roll-back functions

18
Presentation Layer (6)

• Service: convert data between various


representations

• Interface: depends on service

• Protocol: define data formats, and rules to convert


from one format to another

19
Application Layer (7)

• Service: any service provided to the end user

• Interface: depends on the application

• Protocol: depends on the application

• Examples: FTP, Telnet, WWW browser

20
Who Does What?

Host A Host B
Application Application
Presentation Presentation
Session Session
Router
Transport Transport
Network Network Network
Datalink Datalink Datalink
Physical Physical Physical
Physical medium

21
Logical Communication

• Each layer interacts with the corresponding layer on its peer

Host A Host B
Application Application
Presentation Presentation
Session Session
Router
Transport Transport
Network Network Network
Datalink Datalink Datalink
Physical Physical Physical
Physical medium

22
Physical Communication

• Communication goes down to physical network, then


to peer, then up to relevant layer

Host A Host B
Application Application
Presentation Presentation
Session Session
Router
Transport Transport
Network Network Network
Datalink Datalink Datalink
Physical Physical Physical
Physical medium

23
Encapsulation
• A layer can use only the service provided by the layer
immediately below it
• Each layer may change and add a header to data packet


24
Example: Postal System

Standard process (historical):


• Write letter
• Drop an addressed letter off in your local mailbox
• Postal service delivers to address
• Addressee reads letter (and perhaps responds)

25
Postal Service as Layered System

Layers:
• Letter writing/reading Customer Customer
• Delivery

Information Hiding:
• Network need not know letter contents Post Office Post Office

• Customer need not know how the


postal network works

Encapsulation:
• Envelope

26
Internet Protocol Architecture

• The TCP/IP protocol suite is the basis for the networks
that we call the Internet.
• The TCP/IP suite has four layers: Application,
Transport, Network, and (Data) Link Layer.
• Computers (hosts) implement all four layers.
Routers (gateways) only have the bottom two layers.

Layer               Examples
Application Layer   telnet, ftp, email
Transport Layer     TCP, UDP
Network Layer       IP, ICMP, IGMP
(Data) Link Layer   Device Drivers

27
Functions of the Layers

Application Layer (telnet, ftp, email, www, AFS)
– Service: Handles details of application programs.
– Functions:

Transport Layer (TCP, UDP)
– Service: Controls delivery of data between hosts.
– Functions: Connection establishment/termination,
error control, flow control, congestion control, etc.

Network Layer (IP, ICMP, OSPF, RIP, BGP)
– Service: Moves packets inside the network.
– Functions: Routing, addressing, switching, etc.

(Data) Link Layer (Ethernet, WiFi, T1)
– Service: Reliable transfer of frames over a link.
– Functions: Synchronization, error control, flow
control, etc.

28
Internet Protocol Architecture

FTP program      <------------- FTP protocol ------------->      FTP program
TCP              <------------- TCP protocol ------------->      TCP
IP               <-- IP protocol -->   IP   <-- IP protocol -->  IP
Ethernet Driver  <-- Ethernet protocol -->  Ethernet Driver | ATM Driver  <-- ATM protocol -->  ATM Driver

29
Internet Protocol Architecture

MPEG Server program  <----------- RTP protocol ----------->  MPEG Player program
UDP              <------------- UDP protocol ------------->      UDP
IP               <-- IP protocol -->   IP   <-- IP protocol -->  IP
Ethernet Driver  <-- Ethernet protocol -->  Ethernet Driver | ATM Driver  <-- ATM protocol -->  ATM Driver

30
Encapsulation
• As data is moving down the protocol stack, each protocol
is adding layer-specific control information.
User data
Application:      Application Header | User data
TCP:              TCP Header | Application data   (TCP segment)
IP:               IP Header | TCP Header | Application data   (IP datagram)
Ethernet Driver:  Ethernet Header | IP Header | TCP Header | Application data | Ethernet Trailer   (Ethernet frame)
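The header stacking shown above can be sketched in C. This is only an illustration, not real protocol code: `wrap` and `frame_length` are hypothetical helpers, the headers are filled with placeholder bytes, and the sizes (20-byte TCP header, 20-byte IP header, 14-byte Ethernet header, 4-byte Ethernet trailer) are the usual minimum values, assumed here for the example.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: builds out = header + payload + trailer, where the
 * header and trailer are filled with the placeholder byte `tag`.
 * Returns the new length, or 0 if `out` is too small. */
size_t wrap(unsigned char *out, size_t cap,
            const unsigned char *payload, size_t len,
            size_t hdr_len, size_t trl_len, unsigned char tag)
{
    if (hdr_len + len + trl_len > cap)
        return 0;
    memset(out, tag, hdr_len);
    memcpy(out + hdr_len, payload, len);
    memset(out + hdr_len + len, tag, trl_len);
    return hdr_len + len + trl_len;
}

/* Walks application data down the stack; returns the frame length. */
size_t frame_length(const char *app_data)
{
    unsigned char seg[1600], dgram[1600], frame[1600];
    size_t n = strlen(app_data);

    n = wrap(seg, sizeof seg, (const unsigned char *)app_data, n, 20, 0, 'T');
    n = wrap(dgram, sizeof dgram, seg, n, 20, 0, 'I');    /* TCP segment -> IP datagram */
    n = wrap(frame, sizeof frame, dgram, n, 14, 4, 'E');  /* IP datagram -> Ethernet frame */
    printf("Ethernet frame: %zu bytes\n", n);
    return n;
}
```

Calling `frame_length("hello")` from `main` wraps 5 bytes of user data in 58 bytes of headers and trailer. Each layer treats everything it receives from the layer above as opaque payload.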

31
Hourglass

Note: Additional protocols like routing


protocols (RIP, OSPF) needed to make
IP work

32
Implications of Hourglass

A single Internet layer module:

• Allows all networks to interoperate


– all network technologies that support IP can exchange
packets

• Allows all applications to function on all networks


– all applications that can run on IP can use any network

• Simultaneous developments above and below IP

33
Reality

• Layering is a convenient way to think about networks


• But layering is often violated
– Firewalls
– Transparent caches
– NAT boxes

34
Summary

• Layering is a good way to organize network functions

• Unified Internet layer decouples apps from networks

• E2E argument argues to keep IP simple

• Be judicious when thinking about adding to the


network layer

35
OSI & Internet protocol suite

36
Where we work?

Sockets
API

Open/X
Transport
Interface

37
Two reasons for this design
• The upper three layers handle all the details of the
application and know little about communication
(sending and receiving data, etc.)
• The upper three layers form a user process, while the
lower four layers are provided as part of the operating
system (the kernel).
About kernel
Kernel
• the part of the operating system that is mandatory
and common to all other software
• simply the name given to the lowest level of
abstraction that is implemented in software
Functionalities of Kernel
• Process Management
• Memory Management
• Device Management
• System Calls
Process Management
• A kernel typically sets up an address space for the
process,
• loads the file containing the code into memory, sets
up a stack for the program and branches to a given
location inside the program, thus starting its
execution
Memory Management
• The kernel has full access to the system's memory and must
allow processes to safely access this memory as they require it.
• Virtual addressing allows the kernel to make a given physical
address appear to be another address, the virtual address.
• Virtual address spaces may be different for different processes;
Device Management
• Processes need access to the peripherals connected to the
computer, which are controlled by the kernel through device
drivers.
• For example, to show the user something on the screen, an
application would make a request to the kernel, which would
forward the request to its display driver, which is then
responsible for actually plotting the character/pixel
System Calls
• A process must be able to access the services provided by the
kernel. This is implemented differently by each kernel, but most
provide a C library or an API, which in turn invokes the related
kernel functions
• Typically implemented using software interrupts (traps)
Programs and Processes
• A program is an executable file residing on disk. A
program is read into memory and executed by the
kernel
• An executing instance of a program is called a
process
• Every process has a unique non-negative identifier
called process id (PID)
Process Environment
• What happens when we execute a C program?
./a.out
• How are the command-line arguments passed to the
process?
• Memory layout of a process
What happens when we execute a C program?

• int main(int argc, char *argv[]);


• When a C program is executed by the kernel (through one of the exec
functions), a special start-up routine is called before the main
function.
• The executable program file specifies this routine as the starting
address for the program.
• This start-up routine takes the command-line arguments and the
environment from the kernel and then calls main.
Memory Layout of C Program
• Code - text segment
• Initialized data – data segment
• Uninitialized data – bss segment
• Heap
• Stack
Memory Layout of C Program
• Code - text segment
– Machine instructions that the CPU executes
– Sharable
– Read-only
Memory Layout of C Program
• Initialized data – data segment
– A variable declared outside any function with a non-zero initial
value is stored in the initialized data segment together with
that value.
– Static and global data that are initialized with
nonzero values live in the data segment
Memory Layout of C Program
• Uninitialized data – bss segment
– BSS stands for ‘Block Started by Symbol’.
– Global and statically allocated data that are initialized to zero
by default are kept here
Memory Layout
• Stack
– The stack segment is where local (automatic) variables are allocated.
– Data is pushed onto and popped off the stack following the Last In First
Out (LIFO) rule.
– When a function is called, a stack frame is created and PUSHed onto the
top of the stack. This stack frame contains information such as the address
from which the function was called and where to jump back to when the
function is finished (return address), parameters, local variables, and any
other information needed by the invoked function.
– When a function returns, the stack frame is POPped from the stack.
Typically the stack grows downward, meaning that items deeper in the call
chain are at numerically lower addresses and toward the heap.
Stack
Memory Layout of C Program
• Heap
– The heap is where dynamic memory (obtained by malloc(), calloc(),
realloc()) comes from.
– It is typical for the heap to grow upward. This means that successive items
that are added to the heap are added at addresses that are numerically
greater than previous items.
– The end of the heap is marked by a pointer known as the break. You cannot
reference past the break. You can, however, move the break pointer (via
brk() and sbrk() system calls) to a new position to increase the amount of
heap memory available.
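To see the segments described above on a real system, one option is to print one sample address from each region. This is a rough sketch; `print_layout` and the two global variable names are made up for the example, and the actual addresses (and their relative order) depend on the platform and on address-space randomization.

```c
#include <stdio.h>
#include <stdlib.h>

int initialized_global = 42;   /* data segment */
int uninitialized_global;      /* bss segment, zeroed before main runs */

/* Prints one sample address from each region of the process. */
void print_layout(void)
{
    int local = 0;                            /* stack */
    int *dynamic = malloc(sizeof *dynamic);   /* heap */

    /* Converting a function pointer to void * is not strict ISO C,
     * but it is supported on POSIX systems. */
    printf("text  (function):      %p\n", (void *)print_layout);
    printf("data  (initialized):   %p\n", (void *)&initialized_global);
    printf("bss   (uninitialized): %p\n", (void *)&uninitialized_global);
    printf("heap  (malloc):        %p\n", (void *)dynamic);
    printf("stack (local):         %p\n", (void *)&local);
    free(dynamic);
}
```

Calling `print_layout()` from `main` on a typical system shows the text, data, and bss addresses close together, with the heap and stack addresses far from each other.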
Environment Variables
• Stored in process memory
• Set of parameters that are inherited from process to process.
• Each program is also passed an environment list like the
argument list.
• The environment list is an array of character pointers, with each
pointer pointing to a null-terminated string of the form name=value.
Environment Variables
Listing all arguments and environment vars
#include <stdio.h>
#include <stdlib.h>

int
main (int argc, char *argv[])
{
    int i;
    char **ptr;
    extern char **environ;

    for (i = 0; i < argc; i++)  /* echo all command-line args */
        printf ("argv[%d]: %s\n", i, argv[i]);
    for (ptr = environ; *ptr != 0; ptr++)  /* and all env strings */
        printf ("%s\n", *ptr);
    exit (0);
}
Functions to access environment variables
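The usual POSIX functions for this are `getenv()`, `setenv()`, and `unsetenv()`. The sketch below exercises all three; the helper name `env_round_trip` is invented for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Sets name=value, reads it back, then removes it again.
 * Returns 1 if every step behaved as expected, 0 otherwise. */
int env_round_trip(const char *name, const char *value)
{
    if (setenv(name, value, 1) != 0)   /* 1 = overwrite an existing value */
        return 0;

    const char *got = getenv(name);    /* NULL if the variable is not set */
    if (got == NULL || strcmp(got, value) != 0)
        return 0;
    printf("%s=%s\n", name, got);

    if (unsetenv(name) != 0)
        return 0;
    return getenv(name) == NULL;
}
```

For instance, `env_round_trip("DEMO_VAR", "hello")` prints `DEMO_VAR=hello` and leaves the environment as it found it. Changes made this way affect only the calling process and the children it creates afterwards.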
Process Control
• Every process has a unique process ID, a non-
negative integer.
• Although unique, process IDs are reused. As
processes terminate, their IDs become candidates for
reuse.
• Process ID 0 is usually the scheduler process and is
often known as the swapper.
Process Control
• Process ID 1 is usually the init process and is invoked by the
kernel at the end of the bootstrap procedure. This process is
responsible for bringing up a UNIX system after the kernel has
been bootstrapped.
• The init process never dies. It is a normal user process, not a
system process within the kernel, although it does run with
super user privileges.
• init becomes the parent process of any orphaned child process.
Process Identifiers

#include <unistd.h>
• pid_t getpid(void);
Returns: process ID of calling process
• pid_t getppid(void);
Returns: parent process ID of calling process
• uid_t getuid(void);
Returns: real user ID of calling process
• uid_t geteuid(void);
Returns: effective user ID of calling process
• gid_t getgid(void);
Returns: real group ID of calling process
• gid_t getegid(void);
Returns: effective group ID of calling process
fork()
• An existing process can create a new one by calling the fork function.
#include <unistd.h>
pid_t fork(void);
Returns: 0 in child, process ID of child in parent, -1 on error
• The new process created by fork is called the child process. This
function is called once but returns twice. The only difference in the
returns is that the return value in the child is 0, whereas the return value
in the parent is the process ID of the new child
fork()
• Both the child and the parent continue executing with the
instruction that follows the call to fork.
• The child is a copy of the parent. For example, the child gets a
copy of the parent's data space, heap, and stack. Note that this
is a copy for the child; the parent and the child do not share
these portions of memory. The parent and the child share the
text segment
copy-on-write (COW)
• Modern implementations don't perform a complete copy of the parent's
data, stack, and heap
• These regions are shared by the parent and the child and have
their protection changed by the kernel to read-only
• If either process tries to modify these regions, the kernel then
makes a copy of that piece of memory only, typically a "page" in
a virtual memory system.
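#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

int glob = 6;   /* global variable, in the data segment */

int
main (void)
{
    int var;    /* automatic variable, on the stack */
    pid_t pid;

    var = 88;
    printf ("Before fork\n");
    if ((pid = fork ()) < 0)
    {
        perror ("fork");   /* print the reason fork failed */
        exit (1);
    }
    else if (pid == 0)
    {
        /* child: modifies its own copies of glob and var */
        glob++;
        var++;
        printf ("pid = %ld, glob=%d, var=%d\n", (long) getpid (), glob, var);
        exit (0);
    }
    else
    {
        /* parent: its copies are unchanged */
        printf ("pid = %ld, glob=%d, var=%d\n", (long) getpid (), glob, var);
        exit (0);
    }
}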
fork()
• In general, we never know whether the child starts executing
before the parent or vice versa. This depends on the
scheduling algorithm used by the kernel.
• To synchronize child and parent, some form of interprocess
communication is required.
File sharing between parent and child

• One characteristic of fork is that all file descriptors that are open
in the parent are duplicated in the child.
• The parent and the child share a file table entry for every open
descriptor, so they also share the file offset.
• A shell process generally has three files open for
standard input, standard output, and standard error. When a
command is executed as a child process, it inherits them.
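The shared file table entry can be observed directly: if the child writes through an inherited descriptor, the parent sees the file offset move. The following is a minimal sketch; the function name and the temporary file name pattern are assumptions made for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* The child writes 5 bytes through an inherited descriptor; the parent
 * then reads the shared file offset. Returns that offset, or -1 on error. */
long shared_offset_after_child_write(void)
{
    char path[] = "/tmp/fork_demo_XXXXXX";   /* hypothetical temp file name */
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                 /* the file vanishes once fd is closed */

    pid_t pid = fork();
    if (pid < 0) {
        close(fd);
        return -1;
    }
    if (pid == 0) {               /* child: write via the inherited fd */
        ssize_t w = write(fd, "hello", 5);
        _exit(w == 5 ? 0 : 1);
    }

    waitpid(pid, NULL, 0);        /* parent: wait until the child is done */
    off_t pos = lseek(fd, 0, SEEK_CUR);   /* query the offset, do not move it */
    close(fd);
    return (long)pos;
}
```

The parent never writes to the file, yet the offset it reads back is 5, because both descriptors refer to the same file table entry.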
vfork()
• The vfork function is intended to create a new process when the
purpose of the new process is to exec a new program
• The vfork function creates the new process, just like fork,
without copying the address space of the parent into the child,
as the child won't reference that address space
• vfork guarantees that the child runs first, until the child calls
exec or exit. When the child calls either of these functions, the
parent resumes.
What child inherits?
• Real user ID, real group ID, effective user ID, effective group ID
• Current working directory
• Root directory
• File mode creation mask
• Environment
• Process group ID
• Session ID
• Controlling terminal
• Attached shared memory segments
• Memory mappings
• Resource limits
What values in child are different from parent?

• The return value from fork


• The process IDs are different
• The two processes have different parent process IDs: the parent
process ID of the child is the parent; the parent process ID of the parent
doesn't change
• The child's tms_utime, tms_stime, tms_cutime, and tms_cstime values
are set to 0
• File locks set by the parent are not inherited by the child
• Pending alarms are cleared for the child
• The set of pending signals for the child is set to the empty set
Process Termination
• Normal Termination
– Return from main
– Calling exit
– Calling _exit or _Exit
– Return of the last thread from its start routine
– Calling pthread_exit from the last thread
• Abnormal termination
– Calling abort
– Receipt of a signal
– Response of the last thread to a cancellation request
Process Termination
• Regardless of how a process terminates, the same code in the kernel is
eventually executed. This kernel code closes all the open descriptors
for the process, releases the memory that it was using, and the like.
• To be able to notify its parent how it terminated, the child passes an exit status
as the argument to one of the exit functions (exit, _exit, or _Exit).
• In the case of an abnormal termination, however, the kernel, not the
process, generates a termination status to indicate the reason for the
abnormal termination.
• In any case, the parent of the process can obtain the termination
status using wait or the waitpid function
Process Termination
• When a process terminates, either normally or abnormally, the
kernel notifies the parent by sending the SIGCHLD signal to the
parent.
• This signal is the asynchronous notification from the kernel to
the parent. The parent can choose to ignore this signal, or it can
provide a function that is called when the signal occurs: a signal
handler.
• The default action for this signal is that it is ignored.
wait() & waitpid()
• Parent can obtain termination status from kernel using these
calls
• Process that calls wait or waitpid can
– Block, if all of its children are still running
– Return immediately with the termination status of a child, if a child
has terminated and is waiting for its termination status to be fetched
– Return immediately with an error, if it doesn't have any child
processes
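A minimal sketch of how a parent fetches and decodes the termination status is shown below. The helper name `child_exit_status` is invented for the example; `WIFEXITED`, `WEXITSTATUS`, `WIFSIGNALED`, and `WTERMSIG` are the standard macros from `<sys/wait.h>`.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks a child that exits with `code`; the parent collects it.
 * Returns the child's exit status, or -1 on error or abnormal exit. */
int child_exit_status(int code)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(code);              /* child: terminate normally */

    int status;
    if (waitpid(pid, &status, 0) != pid)   /* blocks until the child ends */
        return -1;

    if (WIFEXITED(status)) {      /* normal termination */
        printf("child %ld exited with status %d\n",
               (long)pid, WEXITSTATUS(status));
        return WEXITSTATUS(status);
    }
    if (WIFSIGNALED(status))      /* abnormal termination */
        printf("child %ld killed by signal %d\n",
               (long)pid, WTERMSIG(status));
    return -1;
}
```

For example, `child_exit_status(7)` returns 7. Because the parent waits for the child, the child does not linger as a zombie.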
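Syntax
#include <sys/wait.h>
pid_t wait(int *status);
pid_t waitpid(pid_t pid, int *status, int options);
Both return: process ID if OK, -1 on error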
waitpid()
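#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int
main (void)
{
    int i = 0, j = 0;
    pid_t ret;
    int status;

    ret = fork ();
    if (ret < 0)
    {
        perror ("fork");
        return 1;
    }
    if (ret == 0)
    {
        /* child */
        for (i = 0; i < 5000; i++)
            printf ("Child: %d\n", i);
        printf ("Child ends\n");
    }
    else
    {
        /* parent: block until the child terminates */
        wait (&status);
        printf ("Parent resumes.\n");
        for (j = 0; j < 5000; j++)
            printf ("Parent: %d\n", j);
    }
    return 0;
}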
What happens if parent terminates before child?

• The init process becomes the parent of any
process whose parent terminates (the process is said
to be inherited by init)
• The parent process ID of the surviving process is
changed to 1 (the process ID of init). This way,
we're guaranteed that every process has a parent.
What happens when a child terminates before
its parent ?
• Kernel keeps small amount of information (process
ID, the termination status of the process, and the
amount of CPU time taken by the process ) until
parent asks for it
• a process that has terminated, but whose parent has
not yet waited for it, is called a zombie
exec functions
• The fork function creates a new process (the child), which then typically causes
another program to be executed by calling one of the exec functions.
• When a process calls one of the exec functions, that process is
completely replaced by the new program, and the new program starts
executing at its main function.
• The process ID does not change across an exec, because a new
process is not created;
• exec replaces the current process, its text, data, heap, and stack
segments with a new program from disk.
#include <unistd.h>
• int execl(const char *pathname, const char *arg0, ... /*
(char *)0 */ );
• int execv(const char *pathname, char *const argv []);
• int execle(const char *pathname, const char *arg0, ... /*
(char *)0, char *const envp[] */ );
• int execve(const char *pathname, char *const argv[], char
*const envp []);
• int execlp(const char *filename, const char *arg0, ... /*
(char *)0 */ );
• int execvp(const char *filename, char *const argv []);
Remembering arguments

Function           pathname  filename  arg list  argv[]  environ  envp[]
execl                 •                   •                 •
execlp                          •         •                 •
execle                •                   •                          •
execv                 •                             •       •
execvp                          •                   •       •
execve                •                             •                •
(letter in name)                p         l         v                e
Example

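Output: Executes the ls command with the -l option

#include <stdio.h>
#include <unistd.h>

int main ()
{
    execl ("/bin/ls", "ls", "-l", (char *) 0);
    printf ("hello");   /* reached only if execl fails */
    return 1;
}
• Input: a command to execute and its arguments
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    if (argc < 2)
        return 1;                /* no command given */
    execvp(argv[1], argv+1);
    perror("execvp");            /* reached only if execvp fails */
    return 1;
}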
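In practice exec is usually combined with fork and wait, so that the calling program survives and can inspect the result. A minimal sketch of that pattern follows; `run_command` is a hypothetical helper, and the exit code 127 for "could not execute" is a shell convention adopted here as an assumption.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs argv[0] with arguments argv[1..] in a child process.
 * Returns the command's exit status, or -1 on failure. */
int run_command(char *const argv[])
{
    fflush(stdout);               /* avoid duplicating buffered output */
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execvp(argv[0], argv);    /* returns only if the exec failed */
        perror("execvp");
        _exit(127);
    }

    int status;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

With `char *cmd[] = {"sh", "-c", "exit 3", NULL};`, the call `run_command(cmd)` returns 3. The process ID of the child stays the same across `execvp`, which is why the parent can still wait for it by that ID.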
Signals
• A signal is an asynchronous event which is delivered
to a process.

• Asynchronous means that the event can occur at any


time
– may be unrelated to the execution of the process
– e.g. user types ctrl-C, or the modem hangs up
Signals
Name      Description                   Default Action
SIGINT    Interrupt character typed     terminate process
SIGQUIT   Quit character typed (^\)     terminate + create core image
SIGKILL   kill -9                       terminate process
SIGSEGV   Invalid memory reference      terminate + create core image
SIGPIPE   Write on pipe but no reader   terminate process
SIGALRM   alarm() clock ‘rings’         terminate process
SIGUSR1   user-defined signal type      terminate process
SIGUSR2   user-defined signal type      terminate process

• See man 7 signal


Signal Sources
• Terminal-generated signals: SIGINT, SIGQUIT
• Hardware exceptions generate signals: SIGFPE, SIGSEGV
• kill function allows a process to send any signal to another
process or process group
• The kill command allows us to send signals to other processes.
• Software conditions: SIGURG, SIGPIPE, SIGALRM
kill() and raise() functions
• Send a signal to a process (or group of processes).

#include <signal.h>
int kill( pid_t pid, int signo );
int raise(int signo);

• pid > 0: send signal to process pid
• pid == 0: send signal to all processes whose process group ID
equals the sender’s pgid, e.g. parent kills all children
• Return 0 if ok, -1 on error.
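A short sketch of the `pid > 0` case: the parent sends SIGTERM to one child and then reads back how the child ended. The function name `terminate_child` is made up for the example, and the sketch assumes SIGTERM still has its default disposition in the child.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks a child that sleeps, sends it SIGTERM, and reports what the
 * kernel recorded. Returns the terminating signal number, or -1. */
int terminate_child(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        for (;;)
            pause();              /* child: sleep until a signal arrives */
    }

    if (kill(pid, SIGTERM) != 0)  /* pid > 0: signal exactly that process */
        return -1;

    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

Here the termination status is generated by the kernel, not by the child, and `WTERMSIG` recovers the signal number (SIGTERM in this case).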
Responding to a Signal
• A process can:
– ignore/discard the signal (not possible with SIGKILL or
SIGSTOP)
– Catch the signal and execute a signal handler function, and
then possibly resume execution
– Let the default action apply. Every signal has a default action
• The choice is called the signal disposition
Signal Handler Function

• Specify a signal handler function to deal with a signal type.


• #include <signal.h>
typedef void Sigfunc(int); /* my defn */
Sigfunc *signal( int signo, Sigfunc *handler );
– signal returns a pointer to a function that takes an int (i.e. it returns a
pointer to Sigfunc)
• Returns previous signal disposition if ok, SIG_ERR on error.
Example

#include <signal.h>

void foo( int signo );   /* handler, defined below */

int main()
{
    signal( SIGINT, foo );
    /* ... */
    /* do usual things until SIGINT */
    return 0;
}

void foo( int signo )
{
    /* ... deal with SIGINT signal ... */
    return;   /* return to program */
}
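A complete, self-contained variant of this pattern is sketched below. To keep it deterministic it uses SIGUSR1 and sends the signal to itself with `raise()`; the names `on_usr1` and `catch_sigusr1` are invented for the example.

```c
#include <signal.h>
#include <stdio.h>

static volatile sig_atomic_t got_signal = 0;

/* Handler: only sets a flag, which is safe inside a signal handler. */
static void on_usr1(int signo)
{
    (void)signo;
    got_signal = 1;
}

/* Installs the handler, sends SIGUSR1 to the current process, and
 * restores the previous disposition. Returns 1 if the handler ran. */
int catch_sigusr1(void)
{
    void (*previous)(int) = signal(SIGUSR1, on_usr1);
    if (previous == SIG_ERR)
        return 0;

    got_signal = 0;
    raise(SIGUSR1);               /* the handler runs before raise() returns */
    signal(SIGUSR1, previous);    /* put the old disposition back */

    printf("handler ran: %d\n", (int)got_signal);
    return got_signal;
}
```

Keeping the handler down to setting a `volatile sig_atomic_t` flag is a common design choice, because most library functions (including `printf`) are not safe to call from a handler.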
Special Sigfunc * Values

• Value Meaning

SIG_IGN Ignore / discard the signal.

SIG_DFL Use default action to handle signal.

SIG_ERR Returned by signal() as an error.


Signals - History

• Unreliable Signals - Original System V (SVR2 and earlier)
implementation.
– Handlers are not persistent; a signal that arrives while the handler
is being reinstalled can be lost
– No way to block a signal while executing critical code
• Reliable Signals - BSD and SVR3.
– Handlers are persistent
– Process can tell kernel to block or unblock signals
Signals Overview
• Three phases to processing signals:
– Signal is generated
• when the event that causes the signal occurs
– Signal is delivered
• signal is said to be delivered to the process when process takes
action for the signal
– Signal is pending
• during the time between generation and delivery, the signal is
said to be pending
Signal blocking
• Blocking the delivery of a signal
– the process tells the kernel which signals are to be blocked
– When such signal is generated for the process, if the action
is not ignore, that signal remains pending until the process
either unblocks it or changes action to ignore
Multiple Signals
• If a blocked signal is generated more than once then
in most systems the signal is delivered only once.
That is the signal is not queued.
• If many signals of different types are ready to be
delivered (e.g. a SIGINT, SIGSEGV, SIGUSR1), they are
not delivered in any fixed order.
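The "not queued" behaviour can be checked with a small experiment: block a signal, generate it twice, unblock it, and count the handler invocations. This is a sketch under the assumption that the system does not queue standard signals (true of Linux, for instance); the function and variable names are made up for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdio.h>
#include <string.h>

static volatile sig_atomic_t deliveries = 0;

static void count_delivery(int signo)
{
    (void)signo;
    deliveries++;
}

/* Blocks SIGUSR1, generates it twice, then unblocks it.
 * Returns how many times the handler actually ran, or -1 on error. */
int deliveries_after_two_raises(void)
{
    struct sigaction sa;
    sigset_t block, old, pending;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = count_delivery;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;

    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    sigprocmask(SIG_BLOCK, &block, &old);

    deliveries = 0;
    raise(SIGUSR1);               /* generated, but held pending */
    raise(SIGUSR1);               /* same signal again: not queued */

    sigpending(&pending);
    printf("SIGUSR1 pending: %d\n", sigismember(&pending, SIGUSR1));

    sigprocmask(SIG_SETMASK, &old, NULL);   /* pending signal delivered here */
    return deliveries;
}
```

On such a system the function returns 1 even though the signal was generated twice, provided SIGUSR1 was not already blocked by the caller.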
Signal Sets
• A data type to represent multiple signals
• #include <signal.h>
– int sigemptyset(sigset_t *set);
– int sigfillset(sigset_t *set);
– int sigaddset(sigset_t *set, int signo);
– int sigdelset(sigset_t *set, int signo);
All four return: 0 if OK, -1 on error
– int sigismember(const sigset_t *set, int signo);
– Returns: 1 if true, 0 if false, -1 on error
sigprocmask()

• A process uses a signal set to create a mask which
defines the signals it is blocking from delivery.
– good for critical sections where you want to block certain signals.
• #include <signal.h>
int sigprocmask( int how,
const sigset_t *set,
sigset_t *oldset);
• how – indicates how mask is modified
‘how’ Meanings

• Value Meaning

SIG_BLOCK set signals are added to mask

SIG_UNBLOCK set signals are removed from mask

SIG_SETMASK set becomes new mask


A Critical Code Region

sigset_t newmask, oldmask;

sigemptyset( &newmask );
sigaddset( &newmask, SIGINT );

/* block SIGINT; save old mask */


sigprocmask( SIG_BLOCK, &newmask, &oldmask );

/* critical region of code */

/* reset mask which unblocks SIGINT */


sigprocmask( SIG_SETMASK, &oldmask, NULL );
sigaction()

• Supersedes (more powerful than) signal()
– sigaction() can be used to code a non-resetting signal()
• #include <signal.h>
int sigaction(int signo,
const struct sigaction *act,
struct sigaction *oldact );
sigaction Structure
struct sigaction
{
void (*sa_handler)( int );
/* action to be taken or SIG_IGN, SIG_DFL */
sigset_t sa_mask; /* additional signal to be blocked */
int sa_flags; /* modifies action of the signal */
void (*sa_sigaction)( int, siginfo_t *, void * );
/*The sa_sigaction field is an alternate signal handler used when the
SA_SIGINFO flag is used with sigaction. */
};
• sa_flags –
– SA_RESETHAND: reset the handler to SIG_DFL on entry to the handler
– SA_SIGINFO: extra information is passed to the handler (i.e. specifies the
use of the “second” handler in the structure, sa_sigaction)
sigaction() Behavior

• A signo signal causes the sa_handler signal handler to be
called.
• While sa_handler executes, the signals in sa_mask are blocked.
Any more signo signals are also blocked.
• sa_handler remains installed until it is changed by another
sigaction() call. No reset problem.
• sa_sigaction specifies the handler if the SA_SIGINFO flag is set.

struct siginfo {
int si_signo; /* signal number */
int si_errno; /* if nonzero, errno value from <errno.h> */
int si_code; /* additional info (depends on signal) */
pid_t si_pid; /* sending process ID */
uid_t si_uid; /* sending process real user ID */
void *si_addr; /* address that caused the fault */
int si_status; /* exit value or signal number */
long si_band; /* band number for SIGPOLL */
/* possibly other fields also */
};
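A minimal sketch of the SA_SIGINFO form follows. The handler and function names are invented for the example, and the check on `si_pid` assumes the signal was sent with `kill()` from the same process, as it is here.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

static volatile sig_atomic_t seen_signo = 0;
static volatile sig_atomic_t sender_is_self = 0;

/* Three-argument handler, selected by the SA_SIGINFO flag. */
static void on_signal(int signo, siginfo_t *info, void *context)
{
    (void)context;
    seen_signo = signo;
    sender_is_self = (info->si_pid == getpid());
}

/* Installs the handler and sends SIGUSR1 to the current process.
 * Returns the signal number seen by the handler if the recorded sender
 * was this process, or -1 otherwise. */
int signo_seen_by_handler(void)
{
    struct sigaction sa, old;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_signal;  /* used instead of sa_handler */
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_SIGINFO;
    if (sigaction(SIGUSR1, &sa, &old) != 0)
        return -1;

    /* With SIGUSR1 unblocked, the signal is delivered before kill() returns. */
    kill(getpid(), SIGUSR1);
    sigaction(SIGUSR1, &old, NULL);   /* restore the previous action */

    return sender_is_self ? seen_signo : -1;
}
```

Unlike the old `signal()` behaviour on some systems, the handler installed here stays in place until another `sigaction()` call changes it, which is why the sketch restores the old action explicitly.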
Other POSIX Functions
• sigpending(): examine signals that are blocked and pending
• sigsetjmp(), siglongjmp(): jump functions for use in signal
handlers which handle masks correctly
• sigsuspend(): atomically reset mask and sleep
pause()
• Suspend the calling process until a signal is caught.
• #include <unistd.h>
int pause(void);
• Returns -1 with errno assigned EINTR.
• pause() only returns after a signal handler has returned.
alarm()

• Set an alarm timer that will ‘ring’ after a specified


number of seconds
– a SIGALRM signal is generated

• #include <unistd.h>
unsigned int alarm(unsigned int secs);

• Returns 0 or number of seconds until previously set


alarm would have ‘rung’.
Some aspects of alarm()
• A process can have at most one alarm timer running
at once.
• If alarm() is called when there is an existing alarm
set then it returns the number of seconds remaining
for the old alarm, and sets the timer to the new alarm
value.
• An alarm(0) call causes the previous alarm to be
cancelled.
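Both points can be tried out with a short sketch. The first function shows the return value of a replacing `alarm()` call; the second waits for an alarm using `sigsuspend()` rather than `pause()`, so the signal cannot arrive before the process starts waiting. All function and variable names are assumptions made for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <unistd.h>

static volatile sig_atomic_t alarm_fired = 0;

static void on_alarm(int signo)
{
    (void)signo;
    alarm_fired = 1;
}

/* Sets a 10-second alarm and cancels it at once.
 * Returns the seconds that were left on the cancelled alarm. */
unsigned int cancel_pending_alarm(void)
{
    alarm(10);
    return alarm(0);              /* alarm(0) cancels and reports the rest */
}

/* Sleeps until a `secs`-second alarm fires. SIGALRM is blocked before the
 * timer starts, so it cannot slip in before sigsuspend() begins waiting.
 * Returns 1 once the alarm has fired, 0 if the handler cannot be installed. */
int wait_for_alarm(unsigned int secs)
{
    struct sigaction sa;
    sigset_t block, old, wait_mask;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGALRM, &sa, NULL) != 0)
        return 0;

    sigemptyset(&block);
    sigaddset(&block, SIGALRM);
    sigprocmask(SIG_BLOCK, &block, &old);

    wait_mask = old;
    sigdelset(&wait_mask, SIGALRM);   /* mask to use while sleeping */

    alarm_fired = 0;
    alarm(secs);
    while (!alarm_fired)
        sigsuspend(&wait_mask);   /* atomically unblock SIGALRM and sleep */

    sigprocmask(SIG_SETMASK, &old, NULL);
    return 1;
}
```

A call such as `wait_for_alarm(1)` returns after about one second. The loop around `sigsuspend()` matters because any caught signal, not only SIGALRM, wakes the process.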
setjmp() and longjmp()

• In C we cannot use goto to jump to a label in another function
– use setjmp() and longjmp() for those ‘long jumps’
• Uses:
– error handling which requires a deeply nested function to recover to
a higher level (e.g. back to main())
– coding timeouts with signals
Prototypes

• #include <setjmp.h>
int setjmp( jmp_buf env );
• Returns 0 if called directly, non-zero if returning from a call to longjmp().
• #include <setjmp.h>
void longjmp( jmp_buf env, int val );
• In the setjmp() call, env is initialized to information about the current
state of the stack.
• The longjmp() call causes the stack to be reset to its env value.
• Execution restarts after the setjmp() call, but this time setjmp()
returns val.
Example
#include <stdio.h>
#include <setjmp.h>

jmp_buf env; /* global */
void process_line( char * ptr );
void cmd_add( void );
int get_token( void );

int main(){
char line[MAX];
int errval;
if(( errval = setjmp(env) ) != 0 )
printf( "error %d: restart\n", errval );
while( fgets( line, MAX, stdin ) != NULL )
process_line(line);
return 0;
}

Example (continued)
:
void process_line( char * ptr )
{
:
cmd_add();
:
}

void cmd_add()
{
int token;
token = get_token();
if( token < 0 ) /* bad error */
longjmp( env, 1 );
/* normal processing */
}
int get_token()
{
if( some error )
longjmp( env, 2 );
}
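A compact, compilable variant of the same error-recovery pattern may help. This is only a sketch: process_tokens, handle_command, parse_token and the "negative token means error" rule are assumptions made for illustration, not part of the slides.

```c
#include <setjmp.h>

static jmp_buf recover_point;
static volatile int error_code;   /* set just before longjmp() */

/* Deepest level: abandons the whole call chain on a negative token. */
static void parse_token(int token)
{
    if (token < 0) {
        error_code = 2;
        longjmp(recover_point, 1);
    }
}

static void handle_command(int token)
{
    parse_token(token);           /* one extra stack frame to unwind */
}

/* Returns 0 when every token was processed, or the error code (2)
 * when processing was abandoned part-way. *processed receives the
 * number of tokens handled before that point. */
int process_tokens(const int *tokens, int count, int *processed)
{
    /* volatile: changed between setjmp() and longjmp(), read afterwards */
    volatile int done = 0;

    if (setjmp(recover_point) != 0) {   /* arrives here via longjmp() */
        *processed = done;
        return error_code;
    }
    for (int i = 0; i < count; i++) {
        handle_command(tokens[i]);
        done++;
    }
    *processed = done;
    return 0;
}
```

The counter is declared volatile because the value of an ordinary automatic variable modified after setjmp() is indeterminate once longjmp() returns control.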
Stack Frames before calling longjmp()
[Figure: the main() stack frame is at the top of the stack; setjmp(env) returns 0 and env records the stack frame info; an arrow shows the direction of stack growth.]
Stack Frames after longjmp()
[Figure: the stack holds the main(), process_line(), ..., and cmd_add() stack frames in the direction of stack growth; longjmp(env,1) causes the stack frames to be reset.]
What happens if longjmp() is called in signal
handler?
• A signal is automatically added to the signal mask (which prevents it from further delivery) when its signal handler is entered. When the signal handler returns, the signal is removed from the mask.
• When longjmp() is called in a signal handler, the handler never returns, so the signal may remain blocked.
siglongjmp & sigsetjmp
• POSIX does not specify whether longjmp will restore the signal context. If you
want to save and restore signal masks, use siglongjmp.
• POSIX does not specify whether setjmp will save the signal context. If you
want to save signal masks, use sigsetjmp.

• #include <setjmp.h>
• int sigsetjmp(sigjmp_buf env, int savemask);
Returns: 0 if called directly, nonzero if returning from a call to siglongjmp
• void siglongjmp(sigjmp_buf env, int val);
Inter Process Communication

Why do processes communicate?

• To share resources
• Client/server paradigms
• Inherently distributed applications
• Reusable software components
• etc.

Types of IPC
• Message Passing
– Pipes, FIFOs, and Message Queues
• Synchronization
– Mutexes, condition variables, read-write locks, file and record locks,
and semaphores
• Shared memory
• Remote Procedure Calls
– Solaris doors and Sun RPC
Sharing of information
What is IPC?
• Each process has a private address space. Normally, no
process can write to another process’s space. How to get
important data from process A to process B?
• Message passing between different processes running on
the same operating system is IPC
• Synchronization is required in case of IPC through shared
memory or file system
Pipes
• Pipes are the oldest form of UNIX System IPC and are provided
by all UNIX systems
• Most commonly used form of IPC
• Historically, they have been half duplex (i.e., data flows in only
one direction).
• Because they don’t have names, pipes can be used only
between processes that have a common ancestor.
– Normally, a pipe is created by a process, that process calls fork,
and the pipe is used between the parent and the child.
UNIX Pipes
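[Figure: the parent process p1 holds the info to be shared and the child process p2 receives a copy of it through the pipe for p1 and p2. The write function puts "hello" into the pipe's FIFO buffer (size = 4096 characters) and the read function takes it out at the other end.]
int p[2];
pipe(p);
write(p[1], "hello", size); /* parent process, p1 */
read(p[0], inbuf, size); /* child process, p2 */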
Pipes
• #include <unistd.h>
• int pipe(int fd[2]); returns 0 if OK, else -1
• fd[0] is for reading, fd[1] is for writing
Pipes
• Pipes are rarely used in a single process. They are generally
used between parent and child
Pipes
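#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>

int main (void)
{
int p[2];
pid_t ret;
char buf[100];
if (pipe (p) == -1) //creating pipe
{
perror ("pipe");
return 1;
}
ret = fork ();
if (ret == 0)
{
close (p[0]); //child does not read
write (p[1], "hello", 6);//writing to parent through pipe
}
if (ret > 0)
{
close (p[1]); //parent does not write
read (p[0], buf, 6); //reading from child via pipe
printf ("Child Said:%s\n", buf); //printing to stdout
}
return 0;
}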
Pipes: who|sort

who|sort
• Create a pipe in the parent
• Fork a child
• In the child, duplicate the write end of the pipe onto the standard output descriptor
• Exec the ‘who’ program
• In the parent, wait for the child.
• Duplicate the read end of the pipe onto the standard input descriptor
• Exec the ‘sort’ program
who|sort
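#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

int main (void)
{
int p[2];
pid_t ret;
if (pipe (p) == -1)
return 1;
ret = fork ();
if (ret == 0)
{
close (1);
dup (p[1]); //lowest free descriptor is now 1 (stdout)
close (p[0]);
close (p[1]);
execlp ("who", "who", (char *) 0);
_exit (127); //reached only if exec fails
}
if (ret > 0)
{
close (0);
dup (p[0]); //lowest free descriptor is now 0 (stdin)
close (p[0]);
close (p[1]);
wait (NULL); //safe only while who's output fits in the pipe buffer
execlp ("sort", "sort", (char *) 0);
}
return 1;
}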
dup and dup2 Functions
• #include <unistd.h>
• int dup(int filedes);
• int dup2(int filedes, int filedes2);
Both return: new file descriptor if OK, -1 on error
• The new file descriptor returned by dup is guaranteed to be the lowest-
numbered available file descriptor.
• With dup2, we specify the value of the new descriptor with the filedes2
argument. If filedes2 is already open, it is first closed. If filedes equals
filedes2, then dup2 returns filedes2 without closing it.
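As a rough illustration of dup and dup2 together, the sketch below temporarily points standard output at a pipe and then restores it. The function name capture_stdout is hypothetical, and the message is assumed to be small enough to fit in the pipe buffer.

```c
#include <string.h>
#include <unistd.h>

/* Write msg to descriptor 1 while it is redirected into a pipe, then
 * read the bytes back into out. Returns the byte count, or -1. */
ssize_t capture_stdout(const char *msg, char *out, size_t outsz)
{
    int p[2];
    if (pipe(p) == -1)
        return -1;

    int saved = dup(STDOUT_FILENO);          /* remember the real stdout */
    if (saved == -1 || dup2(p[1], STDOUT_FILENO) == -1) {
        if (saved != -1)
            close(saved);
        close(p[0]);
        close(p[1]);
        return -1;
    }
    close(p[1]);                             /* fd 1 now is the write end */

    ssize_t written = write(STDOUT_FILENO, msg, strlen(msg));

    dup2(saved, STDOUT_FILENO);              /* closes the pipe copy on fd 1 */
    close(saved);

    ssize_t n = -1;
    if (written >= 0)
        n = read(p[0], out, outsz - 1);
    close(p[0]);
    if (n >= 0)
        out[n] = '\0';
    return n;
}
```

dup2 is used for the redirection because it names the target descriptor explicitly, which avoids the close-then-dup ordering that the lowest-available-descriptor rule otherwise requires.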
dup and dup2
Popen
• #include <stdio.h>
• FILE *popen(const char *cmdstring, const char *type);

• Returns: file pointer if OK, NULL on error


• int pclose(FILE *fp);
popen
• Together, popen and pclose take care of
– creating a pipe, forking a child, closing the unused ends of the pipe, executing a shell to run the command, and waiting for the command to terminate
– e.g. fp = popen("ls *.c", "r");
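A minimal sketch of reading a command's output through popen follows. The helper name first_line_of is hypothetical, and the caller is assumed to pass a buffer of at least one byte.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>

/* Run cmd through the shell and copy its first output line (without the
 * newline) into out. Returns the status reported by pclose(), or -1 if
 * the command could not be started. */
int first_line_of(const char *cmd, char *out, size_t outsz)
{
    FILE *fp = popen(cmd, "r");          /* "r": we read the command's stdout */
    if (fp == NULL)
        return -1;
    if (fgets(out, (int)outsz, fp) == NULL)
        out[0] = '\0';
    out[strcspn(out, "\n")] = '\0';      /* strip the trailing newline */
    return pclose(fp);                   /* waits for the command to finish */
}
```

For example, first_line_of("echo hello", buf, sizeof buf) leaves "hello" in buf.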
FIFOs
• Pipes have no names; their biggest disadvantage is that they can be used only
between processes that have a common ancestor
• A FIFO is similar to a pipe. FIFO stands for First In, First Out. FIFOs are
also known as ‘named pipes’
• Half duplex
• Unlike a pipe, a FIFO has a pathname associated with it
– This allows unrelated processes to access a single FIFO
Name Spaces
• When two unrelated processes use some type of IPC
to exchange information, the IPC object must have a
name or identifier of some form
• The set of possible names for a given type of IPC is
called its name space
• FIFOs use a pathname in the file system as their identifier
FIFOs
• Create a FIFO
– #include <sys/types.h>
– #include <sys/stat.h>
– int mkfifo(const char *pathname, mode_t mode);
//returns 0 if OK or -1 on error
• Ex: if (mkfifo("fifo1", 0666) < 0) perror("mkfifo");
– mkfifo fails with errno set to EEXIST if the FIFO already exists at the
given path
FIFOs
• Once a FIFO is created, it should be opened either for reading
or for writing
– wfd = open("fifo1", O_WRONLY); or
– FILE *fp = fopen("fifo1", "w");
• A FIFO should not be opened for both reading and writing at the same
time (POSIX leaves O_RDWR on a FIFO undefined)
• Unlike a pipe, a FIFO is not deleted as soon as all the processes
referring to it exit. It has to be explicitly deleted from the system.
– unlink("fifo1");
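The create, open, transfer and unlink steps can be put together as in the sketch below. The function name fifo_roundtrip and the use of a forked child as the writer are assumptions for illustration; two unrelated programs could use the same pathname in the same way.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Create a FIFO at path, let a child write msg into it, read it back in
 * the parent, then remove the FIFO. Returns the byte count, or -1. */
ssize_t fifo_roundtrip(const char *path, const char *msg,
                       char *out, size_t outsz)
{
    if (mkfifo(path, 0600) == -1 && errno != EEXIST)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        unlink(path);
        return -1;
    }
    if (pid == 0) {                       /* writer */
        int wfd = open(path, O_WRONLY);   /* blocks until a reader opens */
        if (wfd == -1)
            _exit(1);
        write(wfd, msg, strlen(msg));
        close(wfd);
        _exit(0);
    }

    ssize_t n = -1;
    int rfd = open(path, O_RDONLY);       /* blocks until a writer opens */
    if (rfd != -1) {
        n = read(rfd, out, outsz - 1);
        close(rfd);
    }
    if (n >= 0)
        out[n] = '\0';
    waitpid(pid, NULL, 0);
    unlink(path);                         /* the name must be removed explicitly */
    return n;
}
```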
FIFOs between parent and child
Properties of FIFO
FIFOs between parent and child

Swap these two calls and see


Non-blocking option
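• A descriptor can be set non-blocking in one of two ways
– the O_NONBLOCK flag can be specified when open() is called
Or
– if the descriptor is already open, fcntl() can be called to enable the O_NONBLOCK flag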
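A descriptor is typically made non-blocking either by passing O_NONBLOCK to open() or by enabling the flag afterwards with fcntl(). The sketch below shows both; the helper names are hypothetical, and the original slide's own snippets are not available in the text.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Way 1: request non-blocking mode when the file is opened. */
int open_nonblocking(const char *path, int flags)
{
    return open(path, flags | O_NONBLOCK);
}

/* Way 2: turn on O_NONBLOCK for a descriptor that is already open. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Returns 1 if reading an empty non-blocking pipe fails with EAGAIN
 * instead of blocking, 0 if not, -1 on setup error. */
int empty_pipe_read_would_block(void)
{
    int p[2];
    char c;
    int result = -1;

    if (pipe(p) == -1)
        return -1;
    if (set_nonblocking(p[0]) == 0) {
        ssize_t n = read(p[0], &c, 1);
        result = (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK));
    }
    close(p[0]);
    close(p[1]);
    return result;
}
```

The fcntl() route reads the current flags first so that enabling O_NONBLOCK does not clear any other file status flag.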
Read and write operations Pipe and FIFO
Writing to pipe/fifo when pipe/fifo is open for
reading
• If the data size is less than or equal to PIPE_BUF, the write is atomic, i.e.
either all the data is written or no data is written
• If there is no room in the pipe for the requested data (<= PIPE_BUF), the write
blocks by default.
– If the O_NONBLOCK option is set, an EAGAIN error is returned
• If the data is > PIPE_BUF and the O_NONBLOCK option is set, then even if only 1 byte
of space is available in the pipe, write will write as much data as fits and return
– Atomicity is not guaranteed
Message Queues
• A message queue is a linked list of messages stored within the
kernel and identified by a message queue identifier
• Any process with adequate privileges can place the message
into the queue and any process with adequate privileges can
read from queue
• There is no requirement that some process must be waiting to
receive message before sending the message
Message Queues
• Every message queue has following structure in kernel
Message Queues
Permissions
• struct ipc_perm {
uid_t uid; /* owner's effective user id */
gid_t gid; /* owner's effective group id */
uid_t cuid; /* creator's effective user id */
gid_t cgid; /* creator's effective group id */
mode_t mode; /* access modes */ . . . };
• Permission Bit
– user-read 0400
– user-write (alter) 0200
– group-read 0040
– group-write (alter) 0020
– other-read 0004
– other-write (alter) 0002
Message Queues
• First msgget is used to either open an existing queue or create a new
queue
• #include <sys/msg.h>
int msgget(key_t key, int flag);
– Returns: message queue ID if OK, -1 on error
• Key value can be IPC_PRIVATE, a key generated by ftok(), or any other key
(an integer of type key_t)
• Flag value, ORed with the permission bits, must include
– IPC_CREAT if a new queue has to be created
– IPC_CREAT and IPC_EXCL to create a new queue and fail (EEXIST) rather than
reference an existing one
Key Values
• The server can create a new IPC structure by specifying a key of
IPC_PRIVATE
– Kernel generates a unique id
• The client and the server can agree on a key by defining the key in a
common header.
• The client and the server can agree on a pathname and project ID
and call the function ftok to convert these two values into a key.
– #include <sys/ipc.h>
– key_t ftok(const char *path, int id);
– The path argument must refer to an existing file. Only the lower 8 bits of
id are used when generating the key.
Message Queues
• When a new queue is created, the following members of the
msqid_ds structure are initialized.
– The ipc_perm structure is initialized
– msg_qnum, msg_lspid, msg_lrpid, msg_stime, and msg_rtime are
all set to 0.
– msg_ctime is set to the current time.
– msg_qbytes is set to the system limit.
• On success, msgget returns the non-negative queue ID. This
value is then used with the other three message queue
functions.
Messages
• Each message is composed of a positive long integer type field and the actual
data bytes. Messages are always placed at the end of the queue.
• Message Template

• Most applications define their own message structure according to the needs of
the application
Sending Messages
• #include <sys/msg.h>
int msgsnd(int msqid, const void *ptr, size_t nbytes, int
flag);
• msqid is the ID returned by the msgget system call
• The ptr argument is a pointer to a message structure
• nbytes is the length of the user data, i.e. sizeof(struct mesg) – sizeof(long).
The length can be zero.
• A flag value of 0 or IPC_NOWAIT can be specified
• With a flag of 0, msgsnd() blocks until one of the following occurs
– Room exists for the message
– The message queue is removed (an EIDRM error is returned)
– It is interrupted by a signal (EINTR is returned)

Receiving Messages

• #include <sys/msg.h>
ssize_t msgrcv(int msqid, void *ptr, size_t nbytes, long type, int flag);
• ptr points to the message structure where the message will be stored
• nbytes is the size of the data portion available in the message structure, excluding
sizeof(long)
• type indicates the message desired on the message queue
• flag can be 0, IPC_NOWAIT, or MSG_NOERROR

Receiving Messages
• The type argument lets us specify which message we want.
– type == 0: The first message on the queue is returned.
– type > 0:The first message on the queue whose message type equals type
is returned.
– type < 0:The first message on the queue whose message type is the lowest
value less than or equal to the absolute value of type is returned.
• A nonzero type is used to read the messages in an order other than
first in, first out.
– Priority to messages, Multiplexing

Receiving Messages
• IPC_NOWAIT flag makes the operation nonblocking, causing msgrcv to
return -1 with errno set to ENOMSG if a message of the specified type
is not available.
• If IPC_NOWAIT is not specified, the operation blocks until
– a message of the specified type is available,
– the queue is removed from the system (-1 is returned with errno set to
EIDRM)
– a signal is caught and the signal handler returns (causing msgrcv to return -1
with errno set to EINTR).

Receiving Messages
• If the returned message is larger than nbytes and the
MSG_NOERROR bit in flag is set, the message is truncated.
– no notification is given to us that the message was truncated, and
the remainder of the message is discarded.
• If the message is too big and MSG_NOERROR is not specified,
an error of E2BIG is returned instead (and the message stays
on the queue).
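The type argument rules can be exercised on a private queue. The sketch below is illustrative only: struct demo_msg, send_typed, recv_type and type_selection_demo are hypothetical names, and the queue is created with IPC_PRIVATE so that no key needs to be agreed.

```c
#include <errno.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/msg.h>

struct demo_msg {
    long mtype;          /* must be > 0 */
    char mtext[16];
};

static int send_typed(int qid, long type, const char *text)
{
    struct demo_msg m;
    m.mtype = type;
    strncpy(m.mtext, text, sizeof m.mtext - 1);
    m.mtext[sizeof m.mtext - 1] = '\0';
    /* nbytes counts only the data, not the long type field */
    return msgsnd(qid, &m, strlen(m.mtext) + 1, 0);
}

/* Returns the type of the message received, or -1 on error. */
static long recv_type(int qid, long type, int flag)
{
    struct demo_msg m;
    if (msgrcv(qid, &m, sizeof m.mtext, type, flag) == -1)
        return -1;
    return m.mtype;
}

/* Queue holds types 3, 1, 2 (in that order). got[] receives the types
 * returned for type == 2, type == -3 and type == 0; *empty_errno gets
 * errno from an IPC_NOWAIT receive on the then-empty queue. */
int type_selection_demo(long got[3], int *empty_errno)
{
    int rc = -1;
    int qid = msgget(IPC_PRIVATE, IPC_CREAT | 0600);
    if (qid == -1)
        return -1;

    if (send_typed(qid, 3, "three") == 0 &&
        send_typed(qid, 1, "one") == 0 &&
        send_typed(qid, 2, "two") == 0) {
        got[0] = recv_type(qid, 2, 0);    /* first message of type 2 */
        got[1] = recv_type(qid, -3, 0);   /* lowest type <= 3 */
        got[2] = recv_type(qid, 0, 0);    /* first message left */
        errno = 0;
        recv_type(qid, 0, IPC_NOWAIT);    /* queue is empty now */
        *empty_errno = errno;
        rc = 0;
    }
    msgctl(qid, IPC_RMID, NULL);          /* remove the queue */
    return rc;
}
```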

Control Operations on Message Queues
• #include <sys/msg.h>
int msgctl(int msqid, int cmd, struct msqid_ds *buf );
• IPC_STAT: Fetch the msqid_ds structure for this queue, storing it in the
structure pointed to by buf.
• IPC_SET: Copy the following fields from the structure pointed to by buf to the
msqid_ds structure associated with this queue: msg_perm.uid, msg_perm.gid,
msg_perm.mode, and msg_qbytes.
• IPC_RMID: Remove the message queue from the system and any data still on
the queue. This removal is immediate.
– Any other process still using the message queue will get an error of EIDRM on its next attempted operation on the queue.
– The IPC_SET and IPC_RMID commands can be executed only by a process whose effective user ID equals msg_perm.cuid or
msg_perm.uid, or by a process with superuser privileges

Server.c
/*key.h*/
#define MSGQ_PATH "/home/students/f2007045/msgq_server.c"

/*Server.c*/
#include <stdio.h>
#include <stdlib.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include "key.h"

struct my_msgbuf
{
long mtype;
char mtext[200];
};

int main (void)
{
struct my_msgbuf buf;
int msqid;
key_t key;
if ((key = ftok (MSGQ_PATH, 'B')) == -1)
{
perror ("ftok");
exit (1);
}
if ((msqid = msgget (key, IPC_CREAT | 0644)) == -1)
{
perror ("msgget");
exit (1);
}
printf ("server: ready to receive messages\n");
for (;;)
{
/* nbytes is the size of the data part only, not the whole structure */
if (msgrcv (msqid, &buf, sizeof (buf.mtext), 0, 0) == -1)
{
perror ("msgrcv");
exit (1);
}
printf ("server: \"%s\"\n", buf.mtext);
}
return 0;
}
Client.c
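#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include "key.h"

struct my_msgbuf
{
long mtype;
char mtext[200];
};

int main (void)
{
struct my_msgbuf buf;
int msqid;
key_t key;
if ((key = ftok (MSGQ_PATH, 'B')) == -1)
{
perror ("ftok");
exit (1);
}
if ((msqid = msgget (key, 0)) == -1)
{
perror ("msgget");
exit (1);
}
printf ("Enter lines of text, ^D to quit:\n");
buf.mtype = 1;
/* fgets replaces gets, which cannot limit the input length */
while (fgets (buf.mtext, sizeof (buf.mtext), stdin) != NULL)
{
buf.mtext[strcspn (buf.mtext, "\n")] = '\0';
/* send only the text and its terminating null byte */
if (msgsnd (msqid, &buf, strlen (buf.mtext) + 1, 0) == -1)
perror ("msgsnd");
}
if (msgctl (msqid, IPC_RMID, NULL) == -1)
{
perror ("msgctl");
exit (1);
}
return 0;
}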
Multiplexing Messages

• Possibility of deadlock

Multiplexing Messages

System V Semaphores
• A semaphore is a primitive used to provide synchronization
between various processes (or between various threads in a
given process)
• Binary Semaphores: a semaphore that can assume only values
0 or 1
• Counting Semaphores: semaphore is initialized to N indicating
the number of resources

System V Semaphores

• Semaphores are maintained by kernel

Semaphore operations
• Create a semaphore and initialize it
– should be atomically done
• Wait for a semaphore: This tests the value of the semaphore, waits
(blocks) if the value is less than or equal to 0, and then decrements the
semaphore value once it is greater than 0 (aka P, lock, wait)
– Testing and decrementing should be a single atomic operation
• Post a semaphore: This increments the semaphore value. If any
processes are blocked waiting for this semaphore's value to be greater
than 0, one of those processes is woken up (aka V, unlock, signal)

Producer Consumer Problem

• Producer produces one item and places it in the buffer.
• Consumer removes that item for processing
• How to synchronize?

Producer Consumer Problem

• Semaphore put controls whether the producer can place an item into the
shared buffer
• Semaphore get controls whether the consumer can remove an item from the
shared buffer

System V Semaphores
• Add one more level of detail by defining “a set of
counting semaphores”
• When we say System V semaphore, it refers to a set
of counting semaphores (the maximum size of a set is system-dependent, e.g. 25)

System V Semaphores
• Kernel maintains the following structure for every set

• The sem structure maintains info about each semaphore. sem_base
contains a pointer to an array of these structures

System V Semaphores
• Kernel structure for a semaphore set having 2 counting
semaphores

Creating Semaphores

• #include <sys/sem.h>
int semget(key_t key, int nsems, int flag);
• The number of semaphores in the set is nsems. If a new set is being
created, we must specify nsems. If we are referencing an existing set, we
can specify nsems as 0.
• When a new set is created, the following members of the semid_ds
structure are initialized.
– The ipc_perm structure
– sem_otime is set to 0.
– sem_ctime is set to the current time.
– sem_nsems is set to nsems.

Initializing a semaphore value

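• #include <sys/sem.h>
int semctl(int semid, int semnum, int cmd, ... /* union semun arg */);
• semnum specifies which semaphore in the set (0, 1, 2, …)
• The semun union is used as the fourth argument for some commands
• This union doesn’t appear in any system header; it should be declared in
your program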

Testing whether semaphore has been initialized
• When process P1 creates the semaphore, sem_otime is
set to zero.
• When P1 calls semctl to initialize it and then calls semop,
sem_otime is set to the current time.
• When process P2 sees that sem_otime is nonzero, it
knows that the semaphore has been initialized.

semctl() commands
• IPC_STAT, IPC_SET, IPC_RMID same as in message queues
• GETVAL: Return the value of semval for the member semnum.
• SETVAL: Set the value of semval for the member semnum. The value is
specified by arg.val.
• GETPID: Return the value of sempid for the member semnum.
• GETNCNT: Return the value of semncnt for the member semnum.
• GETZCNT: Return the value of semzcnt for the member semnum.
• GETALL: Fetch all the semaphore values in the set. These values are stored in
the array pointed to by arg.array.
• SETALL: Set all the semaphore values in the set to the values pointed to by
arg.array

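Semaphore operations
• #include <sys/sem.h>
int semop(int semid, struct sembuf *opsptr, size_t nops);
• opsptr points to an array of the following structure
struct sembuf {
unsigned short sem_num; /* member number in the set (0, 1, ...) */
short sem_op; /* operation (negative, 0, or positive) */
short sem_flg; /* IPC_NOWAIT, SEM_UNDO */
};
• nops specifies the number of structures in the array
• semop guarantees that either all of these operations are done or
none are done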

Semaphore operations
• The operation on each member of the set is specified by the
corresponding sem_op value. This value can be negative, 0, or
positive.
• If sem_op > 0:
– this corresponds to the process returning resources.
– semval += sem_op
– If the SEM_UNDO flag is specified, semadj -= sem_op, i.e. sem_op is
subtracted from the semaphore's adjustment value for this process.

Semaphore operations
• If sem_op < 0
– the process wants to obtain resources that the semaphore controls.
• If semval >= |sem_op|
– the resources are available
– semval -= |sem_op|
– If the SEM_UNDO flag is specified, semadj += |sem_op|, i.e. the absolute value
of sem_op is added to the semaphore's adjustment value for this process.

Semaphore operations
• If semval < |sem_op|
– the resources are not available
– If IPC_NOWAIT is specified, semop returns with an error of EAGAIN.
– If IPC_NOWAIT is not specified, the semncnt value for this semaphore is incremented
(since the caller is about to go to sleep), and the calling process is suspended until
one of the following occurs.
• semval >= |sem_op|, i.e. some other process has released some resources. semncnt--
• The semaphore is removed from the system. In this case, the function returns an error of
EIDRM.
• A signal is caught by the process, and the signal handler returns. In this case, semncnt-- and
the function returns an error of EINTR.

Semaphore operations
• If sem_op = 0,
– this means that the calling process wants to wait until the semaphore's value becomes 0.
• If the semaphore's value is currently 0, the function returns immediately.
• If the semaphore's value is nonzero, the following conditions apply.
– If IPC_NOWAIT is specified, return is made with an error of EAGAIN.
– If IPC_NOWAIT is not specified, semzcnt++, and the calling process is suspended until one of the
following occurs.
• The semaphore's value becomes 0. semzcnt--
• The semaphore is removed from the system. In this case, the function returns an error of EIDRM.
• A signal is caught by the process, and the signal handler returns. the function returns an error of EINTR. Semzcnt--

Semval adjustment on process
termination
• It is a problem if a process terminates while it has resources allocated through a
semaphore.
• Whenever we specify the SEM_UNDO flag for a semaphore operation and we
allocate resources (a sem_op value less than 0), the kernel remembers how
many resources we allocated from that particular semaphore (the absolute value
of sem_op).
• When the process terminates, either voluntarily or involuntarily, the kernel
checks whether the process has any outstanding semaphore adjustments and, if
so, applies the adjustment to the corresponding semaphore value.
• If we set the value of a semaphore using semctl, with either the SETVAL or
SETALL commands, the adjustment value for that semaphore in all processes is
set to 0.
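The effect of SEM_UNDO can be observed with a child that takes one unit and exits without giving it back. This is a sketch under stated assumptions: value_after_child_exit is a hypothetical helper, and the union is named semun_arg here (rather than semun) only to avoid clashing with systems whose headers already declare semun.

```c
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* The fourth semctl() argument; declared by the application. */
union semun_arg {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

/* Create a one-member set with the given initial value, let a child
 * take one unit and exit without returning it, and report the value
 * seen by the parent afterwards (-1 on error). */
int value_after_child_exit(int initial, int use_undo)
{
    int value = -1;
    int id = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
    if (id == -1)
        return -1;

    union semun_arg arg;
    arg.val = initial;
    if (semctl(id, 0, SETVAL, arg) != -1) {
        pid_t pid = fork();
        if (pid == 0) {
            struct sembuf take = {
                .sem_num = 0,
                .sem_op  = -1,                       /* obtain one resource */
                .sem_flg = (short)(use_undo ? SEM_UNDO : 0)
            };
            semop(id, &take, 1);
            _exit(0);                                /* never releases it */
        }
        if (pid > 0) {
            waitpid(pid, NULL, 0);
            value = semctl(id, 0, GETVAL);
        }
    }
    semctl(id, 0, IPC_RMID);                         /* remove the set */
    return value;
}
```

With SEM_UNDO the kernel applies the child's adjustment at exit, so the parent sees the initial value again; without it the unit stays lost.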

Producer Consumer
unsigned short val[1]; /* buffer for GETALL */

/* Producer */
id = semget (KEY, 1, IPC_CREAT | 0666);
setval.val = 2;
semctl (id, 0, SETVAL, setval);

operations[0].sem_num = 0;
operations[0].sem_op = 0;
operations[0].sem_flg = 0;

operations[1].sem_num = 0;
operations[1].sem_op = 10;
operations[1].sem_flg = 0;
for (;;)
{
retval = semop (id, operations, 2);
if (retval == 0)
{
printf ("Producer: Adding 10 objects\n");
getval.array = val;
semctl (id, 0, GETALL, getval);
printf ("Sem Val: %d\n", getval.array[0]);
}
}

/* Consumer */
id = semget (KEY, 1, 0666);
operations[0].sem_num = 0;
operations[0].sem_op = -1;
operations[0].sem_flg = 0;
for (;;)
{
retval = semop (id, operations, 1);
if (retval == 0)
{
printf ("Consumer: Getting one object from shelf.\n");
setval.array = val;
semctl (id, 0, GETALL, setval);
printf ("Sem Value: %d\n", setval.array[0]);
}
}
Shared Memory
• Shared memory allows two or more processes to
share a given region of memory.
• This is the fastest form of IPC, because the data
does not need to be copied between the client and
the server

Message Passing

• Takes 4 copies to transfer data between two processes

Shared Memory

• Takes only two steps


• Kernel is not involved in transferring data but it is involved in
creating shared memory

Memory mapped files

Memory mapped files
• The prot argument for read-write access is
PROT_READ | PROT_WRITE
• flags must be either MAP_SHARED or
MAP_PRIVATE
• MAP_SHARED is used to share
memory with other processes

Why mmap()?
• It makes file handling easy. We open some file and
map that file into our process address space. To write
or read from file we don’t have to use read(), write()
or lseek()
• Another use is to provide shared memory between
unrelated processes

Counter Example

• Closing the file has no effect on the memory mapping
• Memory mappings are propagated to a newly created child

System V Shared Memory
• For every shared memory segment kernel maintains
the following structure

System V Shared Memory
• Creating or opening shared memory
– #include <sys/shm.h>
– int shmget(key_t key, size_t size, int flag);

– size is the size of the memory segment in bytes
– Size is given as zero if we are referencing existing shared
memory segment
– When a new segment is created, the contents of the
segment are initialized with zeros

Attaching shared memory to a process
• Once a shared memory segment has been created, a process attaches
it to its address space by calling shmat.
– #include <sys/shm.h>
– void *shmat(int shmid, const void *addr, int flag);
Returns: pointer to shared memory segment if OK, (void *)-1 on error
• The address in the calling process at which the segment is attached
depends on the addr argument
• If addr is 0, the segment is attached at the first available address
selected by the kernel. This is the recommended technique.

Detaching shared memory from a process
• #include <sys/shm.h>
• int shmdt(const void *addr);
• This does not remove the identifier and its associated data
structure from the system.
• The identifier remains in existence until some process (often a
server) specifically removes it by calling shmctl with a command
of IPC_RMID.

shmctl
• #include <sys/shm.h>
• int shmctl(int shmid, int cmd, struct shmid_ds *buf);
• IPC_STAT, IPC_SET same as other XSI IPC.
• IPC_RMID:
• Remove the shared memory segment from the system. The
segment is not actually removed until the last process using it
terminates or detaches it.
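The shmget, shmat, shmdt and shmctl calls fit together roughly as follows. The sketch uses a hypothetical helper, shm_roundtrip, and an IPC_PRIVATE segment shared with a forked child; unrelated processes would agree on a key instead.

```c
#include <string.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define SEG_SIZE 4096

/* A child writes msg into a shared segment and the parent copies it
 * out again. Returns 0 on success, -1 on error. */
int shm_roundtrip(const char *msg, char *out, size_t outsz)
{
    int rc = -1;
    int id = shmget(IPC_PRIVATE, SEG_SIZE, IPC_CREAT | 0600);
    if (id == -1)
        return -1;

    char *seg = shmat(id, NULL, 0);       /* kernel picks the address */
    if (seg == (void *)-1) {
        shmctl(id, IPC_RMID, NULL);
        return -1;
    }

    pid_t pid = fork();                   /* child inherits the attachment */
    if (pid == 0) {
        /* the new segment is zero-filled, so the copy stays terminated */
        strncpy(seg, msg, SEG_SIZE - 1);
        shmdt(seg);
        _exit(0);
    }
    if (pid > 0) {
        waitpid(pid, NULL, 0);            /* simple synchronization */
        strncpy(out, seg, outsz - 1);
        out[outsz - 1] = '\0';
        rc = 0;
    }
    shmdt(seg);
    shmctl(id, IPC_RMID, NULL);           /* removed after the last detach */
    return rc;
}
```

waitpid() stands in for real synchronization here; in general a semaphore or similar mechanism is needed so that the reader does not look at the segment too early.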

Memory Mapping of /dev/zero
• Shared memory can be used between unrelated processes. But if the processes
are related, some implementations provide a different technique.
• The device /dev/zero is an infinite source of 0 bytes when read. This device also
accepts any data that is written to it, ignoring the data.
• An unnamed memory region is created and is initialized to 0.
• Multiple processes can share this region if a common ancestor specifies the
MAP_SHARED flag to mmap.

int fd;
void *area;
if ((fd = open("/dev/zero", O_RDWR)) < 0) perror("open error");
if ((area = mmap(0, SIZE, PROT_READ | PROT_WRITE,
MAP_SHARED, fd, 0)) == MAP_FAILED) perror("mmap error");
close(fd);
Anonymous Memory Mapping
• A facility similar to the /dev/zero feature. To use this facility, we specify
the MAP_ANON flag to mmap and specify the file descriptor as -1.
• The resulting region is anonymous (since it's not associated with a
pathname through a file descriptor) and creates a memory region that
can be shared with descendant processes.

• In this call, we specify the MAP_ANON flag and set the file descriptor to
-1.

void *area;
if ((area = mmap(0, SIZE, PROT_READ | PROT_WRITE,
MAP_ANON | MAP_SHARED, -1, 0)) == MAP_FAILED)
perror("mmap error");
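To see what MAP_SHARED changes for an anonymous mapping, the sketch below lets a child increment a counter in the mapped region and then reads it in the parent. The helper name counter_after_child is hypothetical.

```c
#define _DEFAULT_SOURCE
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef MAP_ANON
#define MAP_ANON MAP_ANONYMOUS
#endif

/* Map one anonymous counter, let a child increment it `times` times,
 * and return the value the parent sees afterwards (-1 on error).
 * shared != 0 selects MAP_SHARED, otherwise MAP_PRIVATE. */
long counter_after_child(long times, int shared)
{
    long result = -1;
    int flags = MAP_ANON | (shared ? MAP_SHARED : MAP_PRIVATE);
    long *counter = mmap(NULL, sizeof *counter, PROT_READ | PROT_WRITE,
                         flags, -1, 0);
    if (counter == MAP_FAILED)
        return -1;
    *counter = 0;

    pid_t pid = fork();                 /* the mapping is inherited */
    if (pid == 0) {
        for (long i = 0; i < times; i++)
            (*counter)++;
        _exit(0);
    }
    if (pid > 0) {
        waitpid(pid, NULL, 0);
        result = *counter;
    }
    munmap(counter, sizeof *counter);
    return result;
}
```

With MAP_SHARED the parent sees the child's increments; with MAP_PRIVATE the child works on its own copy and the parent still reads 0.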

Shared Memory
• Between unrelated processes:
– XSI or System V shared memory
– mmap can map the same file into each process's
address space using the MAP_SHARED flag.
• Between related processes
– Memory mapping of /dev/zero
– Anonymous memory mapping

• Pipes and FIFOs
• System V Message Queues, Semaphores, Shared Memory
• POSIX Message Queues, Semaphores, Shared Memory

Effect of fork, exec, _exit on IPC

TCP/UDP
TCP/IP
TCP or UDP
• At the internet layer, a destination address identifies a host
computer; no further distinction is made regarding which
process will receive the datagram
• TCP or UDP add a mechanism that distinguishes among
destinations within a given host, allowing multiple processes to
send and receive datagrams independently
UDP (User Datagram Protocol)

• UDP provides an unreliable connectionless delivery service
• UDP uses IP to deliver datagrams to the right host.
• UDP uses ports to provide communication services to
individual processes.
Ports
• TCP/IP uses an abstract destination point called a
protocol port.
• Ports are identified by a positive integer.
• Operating systems provide some mechanism that
processes use, to specify a port.
Port Numbers
• The port numbers are divided into three ranges by Internet Assigned
Numbers Authority
• The well-known ports: 0 through 1023. These port numbers are
controlled and assigned by the IANA.
• The registered ports: 1024 through 49151. These are not controlled by
the IANA, but the IANA registers and lists the uses of these ports as a
convenience to the community.
• The dynamic or private ports, 49152 through 65535. The IANA says
nothing about these ports. These are what we call ephemeral ports.
(The magic number 49152 is three-fourths of 65536.)
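The three ranges translate directly into a small classifier. The enum and the function name classify_port are hypothetical, introduced only to make the boundaries concrete.

```c
enum port_range {
    PORT_WELL_KNOWN,   /* 0 through 1023 */
    PORT_REGISTERED,   /* 1024 through 49151 */
    PORT_DYNAMIC,      /* 49152 through 65535 (ephemeral) */
    PORT_INVALID       /* outside the 16-bit port space */
};

/* Map a port number onto the IANA range it falls in. */
enum port_range classify_port(long port)
{
    if (port < 0 || port > 65535)
        return PORT_INVALID;
    if (port <= 1023)
        return PORT_WELL_KNOWN;
    if (port <= 49151)
        return PORT_REGISTERED;
    return PORT_DYNAMIC;
}
```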
Ports
UDP header

• Header size is 8 bytes


• Lack of reliability: If a datagram reaches its final destination but the checksum
detects an error, or if the datagram is dropped in the network, it is not delivered
to the UDP socket and is not automatically retransmitted.
• If we want to be certain that a datagram reaches its destination, we can build
lots of features into our application: acknowledgments from the other end,
timeouts, retransmissions, and the like.
Some standard UDP based services and their
ports
TCP
Transmission Control Protocol
• TCP provides connections between clients and servers.
• TCP uses the connection, not the protocol port, as its fundamental
abstraction.
• Connections are identified by a pair of endpoints.
– Endpoint means (ip, port)
• TCP provides:
– Connection-oriented
– Reliable
– Full-duplex
– Byte-Stream
Connection-Oriented
• Connection oriented means that a virtual connection is
established before any user data is transferred.
• A TCP client establishes a connection with a given server,
exchanges data with that server across the connection, and
then terminates the connection.
• If the connection cannot be established - the user program is
notified.
• If the connection is ever interrupted - the user program(s) is
notified.
Reliable
• TCP also provides reliability. When TCP sends data to the other
end, it requires an acknowledgment in return.
• If an acknowledgment is not received, TCP automatically
retransmits the data and waits a longer amount of time.
• After some number of retransmissions, TCP will give up
– the total amount of time spent trying to send the data is typically between
4 and 10 minutes (depending on the implementation).
Reliable
• How can TCP provide reliable transfer if the
underlying communication system offers only
unreliable packet delivery?
• Answer is positive acknowledgement with
retransmission.
Positive Acknowledgement with Retransmission
Reliability - duplicates
• The underlying packet delivery system may duplicate packets.
– Duplicates can arise when networks experience high delays that cause
premature retransmission.
– Both packets and acknowledgements can be duplicated.
• Duplicate packets are detected by assigning each packet a sequence
number and requiring the receiver to remember which sequence
numbers it has received.
• To avoid confusion caused by delayed or duplicated
acknowledgements, TCP acknowledgement specifies the sequence
number of the next octet that the receiver expects to receive.
Byte Stream
• Stream means that the connection is treated as a
stream of bytes.
– If payroll data is being sent, there are no boundaries in the
stream differentiating employee records
• The user application does not need to package data
in individual datagrams (as with UDP).
Buffering
• TCP is responsible for buffering data and determining
when it is time to send a datagram.
• It is possible for an application to tell TCP to send the
data it has buffered without waiting for a buffer to fill
up.
Full Duplex
• TCP provides transfer in both directions.
• To the application program these appear as 2
unrelated data streams, although TCP can piggyback
control and data communication by providing control
information (such as an ACK) along with user data.
TCP Ports
• Interprocess communication via TCP is achieved with
the use of ports (just like UDP).
• UDP ports have no relation to TCP ports (different
name spaces).
TCP Segments
• TCP views the data stream as a sequence of bytes that it
divides into segments for transmission. Segments carry varying
sizes of data.
• The chunk of data that TCP asks IP to deliver is called a TCP
segment.
• Each segment contains:
– data bytes from the byte stream
– control information that identifies the data bytes
TCP Segment Format
TCP Segments
• Segments are exchanged to establish connections, transfer
data, send acknowledgements, advertise window sizes, and
close connections.
• Because TCP uses piggybacking, acknowledgement can be
sent along with data
– an acknowledgement traveling from machine A to machine B may
travel in the same segment as data traveling from machine A to
machine B, even though the acknowledgement refers to data sent
from B to A
Flags

• TCP advertises how much data it is willing to accept every time
it sends a segment by specifying its buffer size in the WINDOW
field.
Sliding Window
• TCP uses a specialized sliding window mechanism to solve two
important problems
– efficient transmission
– flow control.
• The TCP window mechanism makes it possible to send multiple
segments before an acknowledgement arrives.
• The TCP form of a sliding window protocol also solves the end-to-end
flow control problem, by allowing the receiver to restrict transmission
until it has sufficient buffer space to accommodate more data.
TCP Sliding Window
• Three markers are maintained, dividing the stream into four regions:
– octets up to 2 have been sent and acknowledged,
– octets 3 through 6 have been sent but not acknowledged,
– octets 7 through 9 have not been sent but will be sent without delay,
– octets 10 and higher cannot be sent until the window moves.
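The four regions in the slide above can be sketched as a small classifier. This is a minimal illustration, not TCP's actual bookkeeping: the struct and function names are hypothetical, the three markers are assumed to be plain octet numbers, and 32-bit sequence wraparound is ignored.

```c
#include <assert.h>
#include <stdint.h>

/* The four regions of the send stream relative to the window. */
enum octet_state { ACKED, IN_FLIGHT, SENDABLE, BLOCKED };

/* Hypothetical sender state holding the three markers. */
struct send_window {
    uint32_t last_acked;   /* highest octet sent and acknowledged   */
    uint32_t last_sent;    /* highest octet sent so far             */
    uint32_t window_edge;  /* highest octet the window permits      */
};

enum octet_state classify_octet(const struct send_window *w, uint32_t octet)
{
    /* The markers must be ordered left to right. */
    assert(w->last_acked <= w->last_sent && w->last_sent <= w->window_edge);
    if (octet <= w->last_acked)  return ACKED;
    if (octet <= w->last_sent)   return IN_FLIGHT;
    if (octet <= w->window_edge) return SENDABLE;
    return BLOCKED;
}
```

With the markers at 2, 6 and 9, octet 4 is in flight, octet 8 may be sent without delay, and octet 10 has to wait for the window to move.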
Variable Window Size and Flow Control

• Each acknowledgement contains a window advertisement that specifies
how many additional octets of data the receiver is prepared to accept.
• In response to an increased window advertisement, the sender
increases the size of its sliding window
• In response to a decreased window advertisement, the sender
decreases the size of its window and stops sending octets beyond the
boundary.
• In the extreme case, the receiver advertises a window size of zero to
stop all transmissions.
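As a rough sketch of how a sender might react to each advertisement, the function below computes how many new octets may be sent. The function name and parameters are assumptions made for illustration, and sequence wraparound is again ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Number of new octets the sender may transmit right now.
 *   last_acked: highest octet acknowledged by the receiver
 *   last_sent:  highest octet already transmitted
 *   advertised: window size carried in the latest acknowledgement */
uint32_t usable_window(uint32_t last_acked, uint32_t last_sent,
                       uint32_t advertised)
{
    assert(last_sent >= last_acked);
    uint32_t in_flight = last_sent - last_acked;
    /* A decreased window can be smaller than what is already in flight;
     * in that case nothing more may be sent. */
    return advertised > in_flight ? advertised - in_flight : 0;
}
```

An advertisement of zero always yields zero, which is the "stop all transmissions" case.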
TCP Connection Establishment

• Three-way handshake
• It accomplishes two important functions.
– It guarantees that both sides are ready to transfer data (and that
they know they are both ready)
– it allows both sides to agree on initial sequence numbers.
• Sequence numbers are sent and acknowledged during the
handshake. Each machine must choose an initial sequence
number at random that it will use to identify bytes in the stream
it is sending.
TCP Connection Establishment

• When a client requests a connection, it sends a
“SYN” segment (a special TCP segment) to the
server port.
• SYN stands for synchronize. The SYN message
includes the client’s ISN.
• ISN is Initial Sequence Number.
TCP Connection Establishment

• Every TCP segment includes a Sequence Number
that refers to the first byte of data included in the
segment.
• Every TCP segment includes a Request Number
(Acknowledgement Number) that indicates the byte
number of the next data that is expected to be
received.
– All bytes before this number have already been
received.
TCP Connection Establishment

• A server accepts a connection.
– Must be looking for new connections!
• A client requests a connection.
– Must know where the server is!
Client Starts
• A client starts by sending a SYN segment with the
following information:
– Client’s ISN (generated pseudo-randomly)
– Maximum Receive Window for client.
– Optionally (but usually) MSS (largest segment payload accepted).
– No payload! (Only TCP headers)
Server Response
• When a waiting server sees a new connection
request, the server sends back a SYN+ACK segment with:
– Server’s ISN (generated pseudo-randomly)
– Request Number is Client ISN+1
– Maximum Receive Window for server.
– Optionally (but usually) MSS
– No payload! (Only TCP headers)
Finally
• When the Server’s SYN is received, the client sends
back an ACK with:
– Request Number is Server’s ISN+1
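The numbers exchanged in the three messages can be worked through in code. This is only a sketch of the arithmetic: the struct layout and names are hypothetical and no packets are sent. Sequence numbers are 32 bits wide, so ISN+1 wraps around modulo 2^32.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified view of a handshake segment. */
struct seg {
    int syn, ack;       /* flag bits                         */
    uint32_t seq;       /* Sequence Number                   */
    uint32_t ack_no;    /* Request (Acknowledgement) Number  */
};

struct handshake { struct seg syn, syn_ack, final_ack; };

struct handshake three_way(uint32_t client_isn, uint32_t server_isn)
{
    struct handshake h;
    h.syn = (struct seg){ .syn = 1, .ack = 0,
                          .seq = client_isn, .ack_no = 0 };
    h.syn_ack = (struct seg){ .syn = 1, .ack = 1,
                              .seq = server_isn,
                              .ack_no = client_isn + 1 };
    h.final_ack = (struct seg){ .syn = 0, .ack = 1,
                                .seq = client_isn + 1,
                                .ack_no = server_isn + 1 };
    /* The client continues exactly where the server said it expects. */
    assert(h.final_ack.seq == h.syn_ack.ack_no);
    return h;
}
```

For example, with a client ISN of 1000 and a server ISN of 5000 the server requests 1001 and the client requests 5001.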
TCP Connection Establishment

• Why is the third message necessary?
– HINTS:
• TCP is a reliable service.
• IP delivers each TCP segment.
• IP is not reliable.
• Why doesn’t each connection start with the initial
sequence number 1?
TCP Options
• MSS option. Announces the maximum amount of data that the sender of the
option is willing to accept in each TCP segment on this connection.
• Window scale option. The maximum window that either TCP can
advertise to the other TCP is 65,535. This option specifies that the
advertised window in the TCP header must be scaled (left-shifted) by
0–14 bits, providing a maximum window of almost one gigabyte
(65,535 × 2^14).
• Timestamp option. This option is needed for high-speed connections to
prevent possible data corruption caused by old, delayed, or duplicated
segments.
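The window scale arithmetic above can be checked directly. The helper name below is an assumption for illustration; it simply left-shifts the 16-bit WINDOW field by the negotiated scale factor.

```c
#include <assert.h>
#include <stdint.h>

/* Effective window: the 16-bit WINDOW field left-shifted by the scale
 * factor agreed with the window scale option (0 to 14 bits). */
uint32_t scaled_window(uint16_t window_field, unsigned shift)
{
    assert(shift <= 14);
    return (uint32_t)window_field << shift;
}
```

With the largest field value and the largest shift, the result is 65,535 × 16,384 = 1,073,725,440 bytes, just under 2^30.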
TCP Buffers
• Both the client and server allocate buffers to hold
incoming and outgoing data
– The TCP layer does this.
• Both the client and server announce with every ACK
how much buffer space remains (the Window field in
a TCP segment).
Send Buffers
• The application gives the TCP layer some data to send.
• The data is put in a send buffer, where it stays until the data is
ACK’d.
– it has to stay, as it might need to be sent again!
• The TCP layer won’t accept data from the application unless (or
until) there is buffer space.
Connection Termination
• The TCP layer can send a RST segment that
terminates a connection if something is wrong.
• Usually the application tells TCP to terminate the
connection gracefully with a FIN segment.
Connection Termination
FIN
• Either end of the connection can initiate termination.
• A FIN is sent, which means the application is done
sending data.
• The FIN is ACK’d.
• The other end must now send a FIN.
• That FIN must be ACK’d.
Connection Termination
TCP Connection State Diagram

• There are 11 different states defined for a connection
– the rules of TCP dictate the transitions from one state to another,
based on the current state and the segment received in that state.
• One reason for showing the state transition diagram is to show
the 11 TCP states with their names. These states are displayed
by netstat, which is a useful tool when debugging client/server
applications.
What is the purpose of TIME_WAIT?
• Once a TCP connection has been terminated (the last ACK
sent) there is some unfinished business:
– What if the ACK is lost? The last FIN will be resent and it must be
ACK’d.
– What if there are lost or duplicated segments that finally reach a
new incarnation of the previous connection after a long delay?
• The MSL (maximum segment lifetime) is the maximum amount of time
that any given IP datagram can live in a network; TIME_WAIT lasts
twice the MSL.
Socket Pair
• The socket pair for a TCP connection is the four-tuple that defines the
two endpoints of the connection:
– the local IP address, local port, foreign IP address, and foreign port.
• A socket pair uniquely identifies every TCP connection on a network.
• The two values that identify each endpoint, an IP address and a port
number, are often called a socket.
• We can extend the concept of a socket pair to UDP, even though UDP
is connectionless.
Socket Pair
Writing to TCP Socket
Writing to UDP Socket
Sockets

TCP/IP Model
TCP/IP
• TCP/IP does not include an API definition.
• There are a variety of APIs for use with TCP/IP:
– Sockets
– TLI, XTI
– Winsock
– MacTCP
Functions needed:
• Specify local and remote communication endpoints
• Initiate a connection
• Wait for incoming connection
• Send and receive data
• Terminate a connection gracefully
• Error handling
Berkeley Sockets
• Generic:
– support for multiple protocol families.
– address representation independence
• Uses existing I/O programming interface as much as
possible.
– Socket API is similar to file I/O
Socket
• A socket is an abstract representation of a communication
endpoint.
• Sockets work with Unix I/O services just like files, pipes &
FIFOs.
• Sockets (obviously) have special needs over files:
– establishing a connection
– specifying communication endpoint addresses
Unix Descriptor Table
Socket Descriptor Data Structure
Creating a Socket

int socket(int family, int type, int protocol);

• family specifies the protocol family (AF_INET for
TCP/IP).
• type specifies the type of service (SOCK_STREAM,
SOCK_DGRAM).
• protocol specifies the specific protocol (usually 0,
which means the default).
socket()
• The socket() system call returns a socket
descriptor (small integer) or -1 on error.
• socket() allocates resources needed for a
communication endpoint - but it does not deal with
endpoint addressing.
Specifying an Endpoint Address
• Remember that the sockets API is generic.
• There must be a generic way to specify endpoint
addresses.
• TCP/IP requires an IP address and a port number for
each endpoint address.
• Other protocol suites (families) may use other
schemes.
Necessary Background Information:
POSIX data types

int8_t      signed 8-bit int
uint8_t     unsigned 8-bit int
int16_t     signed 16-bit int
uint16_t    unsigned 16-bit int
int32_t     signed 32-bit int
uint32_t    unsigned 32-bit int

u_char, u_short, u_int, u_long


More POSIX data types

sa_family_t   address family
socklen_t     length of struct
in_addr_t     IPv4 address
in_port_t     IP port number
Generic socket addresses
struct sockaddr {
    uint8_t     sa_len;
    sa_family_t sa_family;
    char        sa_data[14];
};

• sa_family specifies the address type.
• sa_data specifies the address value.
AF_INET
• For AF_INET we need:
– 16 bit port number
– 32 bit IP address
struct sockaddr_in (IPv4)
struct sockaddr_in {
    uint8_t        sin_len;
    sa_family_t    sin_family;
    in_port_t      sin_port;
    struct in_addr sin_addr;
    char           sin_zero[8];
};
A special kind of sockaddr structure, used for IPv4 sockets
struct in_addr
struct in_addr {
in_addr_t s_addr;
};
Byte Order
Network Byte Order
• Network communication uses big-endian byte order, also
known as Network Byte Order (NBO)
• All values stored in a sockaddr_in must be in
network byte order.
– sin_port a TCP/IP port number.
– sin_addr an IP address.
Network Byte Order Functions

‘h’ : host byte order    ‘n’ : network byte order
‘s’ : short (16 bit)     ‘l’ : long (32 bit)

uint16_t htons(uint16_t);
uint16_t ntohs(uint16_t);

uint32_t htonl(uint32_t);
uint32_t ntohl(uint32_t);
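To see what these conversions do to the bytes in memory, the sketch below stores a port number into a two-byte buffer in network byte order and reads it back. The helper names put_port and get_port are hypothetical; only htons and ntohs come from the sockets API.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Store a 16-bit port into a 2-byte wire buffer in network byte order. */
void put_port(uint8_t wire[2], uint16_t host_port)
{
    uint16_t net = htons(host_port);
    assert(wire != NULL);
    memcpy(wire, &net, sizeof net);
}

/* Read a 16-bit port back from the wire buffer into host byte order. */
uint16_t get_port(const uint8_t wire[2])
{
    uint16_t net;
    assert(wire != NULL);
    memcpy(&net, wire, sizeof net);
    return ntohs(net);
}
```

Whatever the host's own byte order, port 8080 (0x1F90) ends up with 0x1F as the first byte in the buffer, which is the big-endian layout.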
TCP/IP Addresses
• We don’t need to deal with sockaddr structures
since we will only deal with a real protocol family.
• We can use sockaddr_in structures.

BUT: The C functions that make up the sockets API
expect structures of type sockaddr.
Assigning an address to a socket

• The bind() system call is used to assign an address to an
existing socket.

int bind(int sockfd,
         const struct sockaddr *myaddr,   /* note the const! */
         int addrlen);

• bind returns 0 if successful or -1 on error.
bind()
• calling bind() assigns the address specified by the
sockaddr structure to the socket descriptor.
• You can give bind() a sockaddr_in structure:
bind( mysock,
(struct sockaddr*) &myaddr,
sizeof(myaddr) );
bind() Example
int mysock, err;
struct sockaddr_in myaddr;

mysock = socket(PF_INET, SOCK_STREAM, 0);
memset(&myaddr, 0, sizeof(myaddr));            /* clear unused fields */
myaddr.sin_family = AF_INET;
myaddr.sin_port = htons(portnum);
myaddr.sin_addr.s_addr = htonl(ipaddress);     /* s_addr holds the 32-bit value */

err = bind(mysock, (struct sockaddr *) &myaddr,
           sizeof(myaddr));
Uses for bind()

• There are a number of uses for bind():
– Server would like to bind to a well known address (port
number).
– Client can bind to a specific port.
– Client can ask the O.S. to assign any available port number.
IPv4 Address Conversion
int inet_aton(const char *, struct in_addr *);

Convert ASCII dotted-decimal IP address to network byte
order 32 bit value. Returns 1 on success, 0 on failure.

char *inet_ntoa(struct in_addr);

Convert network byte ordered value to ASCII dotted-
decimal (a string).
TCP Client Server
[Figure: socket calls made by a TCP server and a TCP client]
Server: socket() → bind() to the “well-known” port → listen() → accept() (blocks until connection) → read() request → write() reply → read() returns end-of-file → close()
Client: socket() → connect() (“handshake”) → write() request → read() reply → close()
TCP Client
[Figure: call sequence for a TCP client]
• sd = socket(family, type, protocol);
– family: PF_INET, PF_INET6, PF_UNIX, PF_X25
– type: STREAM, DGRAM, RAW
– protocol: 0 (used by RAW sockets)
• connect(sd, server_addr, addr_len); server_addr holds family, port and addr of the server (PORT#, IP-ADDR). CONNECT actions:
1. socket is valid
2. fill remote endpoint addr/port
3. choose local endpoint addr/port (ephemeral port, IP address chosen by routing)
4. initiate three-way handshaking
• write(sd, buff, mbytes); read(sd, buff, mbytes);
• close(sd); starts the disconnect sequence
TCP Server
[Figure: call sequence for a TCP server]
• sd = socket(family, type, protocol); creates the LISTEN socket
• bind(sd, server_addr, len); server_addr holds family, port (well-known port #) and addr (INADDR_ANY)
• listen(sd, backlog);
1. turns sd from active to passive
2. sets the queue length
• ssd = accept(sd, cliaddr, len); returns the CONNECT socket after three-way handshaking; cliaddr holds the client's family, port and addr
• read(ssd, buff, mbytes); write(ssd, buff, mbytes);
• close(ssd); starts the disconnect sequence
– closes socket for R/W
– non-blocking; attempts to send unsent data
– socket option SO_LINGER: block until data sent
socket()
Create a socket

int socket(int family, int type, int protocol);


• family is one of
– PF_INET (IPv4), PF_INET6 (IPv6), PF_LOCAL (local Unix),
– PF_ROUTE (access to routing tables), PF_KEY (encryption)
• type is one of
– SOCK_STREAM (TCP), SOCK_DGRAM (UDP)
– SOCK_RAW (for special IP packets, PING, etc. Must be root)
• protocol is 0 (used for some raw socket options)
• upon success returns socket descriptor
– Integer, like file descriptor
– Return -1 if failure
connect()
Connect to server

int connect(int sockfd, const struct sockaddr *servaddr,
            socklen_t addrlen);

• sockfd is socket descriptor from socket()
• servaddr is a pointer to a structure with:
– port number and IP address
– must be specified (unlike bind())
• addrlen is length of structure
• client doesn’t need bind()
– OS will pick ephemeral port
• returns 0 if ok, -1 on error
bind()
Assign a local protocol address (“name”) to a socket

int bind(int sockfd, const struct sockaddr *myaddr,


socklen_t addrlen);

• sockfd is socket descriptor from socket()


• myaddr is a pointer to address struct with:
– port number and IP address
– if port is 0, then
• host will pick ephemeral port (very rare for server)
• How do you know assigned port number? Call getsockname()
– if IP address is wildcard: INADDR_ANY (multiple net cards)
• host kernel will choose IP address
• INADDR_ constants are defined in <netinet/in.h>
• INADDR_ constants are in host byte order => use htonl(INADDR_ANY)
• addrlen is length of structure
• returns 0 if ok, -1 on error
– EADDRINUSE (“Address already in use”)
bind()
address and port

process specifies
IP address       port       result
wildcard         0          kernel chooses IP addr and port
wildcard         nonzero    kernel chooses IP, process specifies port
local IP addr    0          process specifies IP, kernel chooses port
local IP addr    nonzero    process specifies IP and port

Wildcard specified as INADDR_ANY


listen()
Change socket state to TCP server

int listen(int sockfd, int backlog);

• Sockets default to active (for a client)
– change to passive so OS will accept connection
• sockfd is socket descriptor from socket()
• backlog is maximum number of connections that the server
should queue for this socket
– historically 5
– rarely above 15, even on a moderately busy Web server!
listen()
listen()

• Possibility of SYN flooding attack


accept()
Return next completed connection

int accept(int sockfd, struct sockaddr *cliaddr,
           socklen_t *addrlen);

• sockfd is the listening socket descriptor (from socket(), after bind() and listen())
• cliaddr and addrlen return protocol address from client
• returns brand new descriptor, created by OS, or -1 on error
• if used with fork(), can create concurrent server
read() and write()

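ssize_t read (int sockfd, void *buff, size_t mbytes);
ssize_t write (int sockfd, const void *buff, size_t mbytes);

• Read and write bytes on the socket; each returns the number of bytes
transferred, which may be fewer than mbytes, or -1 on error
• Both are system calls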
close()
Close a socket

int close(int sockfd);

• sockfd is socket descriptor from socket()
• closes socket for reading/writing
– returns immediately (doesn’t block)
– attempts to send any unsent data
– socket option SO_LINGER
• block until data sent
• or discard any remaining data
– Returns -1 if error
Descriptor Reference Counts
• For every socket a reference count is maintained, as to how
many processes are accessing that socket
• When close() is called on a socket descriptor, the reference count is
decreased by 1
• The TCP four-segment termination sequence is initiated only if the
reference count goes to zero
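The reference-count rule can be observed without a network. The sketch below is a stand-in, not the slide's scenario: it uses a Unix-domain socketpair instead of TCP, and dup() in a single process instead of fork() to create the second reference. The function name is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns 1 if the peer sees end-of-file only after the LAST descriptor
 * referring to the socket has been closed, 0 otherwise. */
int eof_only_after_last_close(void)
{
    int sv[2], copy, flags, still_open, saw_eof;
    char c;

    int rc = socketpair(AF_UNIX, SOCK_STREAM, 0, sv);
    assert(rc == 0);                       /* setup must succeed */
    (void)rc;

    copy = dup(sv[0]);                     /* reference count is now 2 */
    flags = fcntl(sv[1], F_GETFL, 0);
    if (copy < 0 || flags < 0 ||
        fcntl(sv[1], F_SETFL, flags | O_NONBLOCK) < 0) {
        close(sv[0]);
        close(sv[1]);
        if (copy >= 0)
            close(copy);
        return 0;
    }

    close(sv[0]);                          /* count drops to 1 */
    still_open = read(sv[1], &c, 1) < 0 &&
                 (errno == EAGAIN || errno == EWOULDBLOCK);

    close(copy);                           /* count drops to 0 */
    saw_eof = read(sv[1], &c, 1) == 0;

    close(sv[1]);
    return still_open && saw_eof;
}
```

This is the reason the concurrent server's parent can close the connected socket right after fork(): the child still holds a reference, so the connection stays up.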
getsockname() and getpeername() Functions

• getsockname returns the local endpoint address associated with a
socket
• getpeername returns the foreign protocol address associated with a
socket
• #include <sys/socket.h>
int getsockname(int sockfd, struct sockaddr *localaddr,
                socklen_t *addrlen);
int getpeername(int sockfd, struct sockaddr *peeraddr,
                socklen_t *addrlen);
getsockname()
• In a TCP client that does not call bind, getsockname (called after
connect returns) gives the local IP address and local port number
assigned to the connection by the kernel.
• After calling bind with a port number of 0, getsockname returns the
local port number that was assigned.
• getsockname can be called to obtain the address family of a socket
• In a TCP server that binds the wildcard IP address, once a connection
is established with a client (accept returns successfully), the server can
call getsockname to obtain the local IP address assigned to the
connection.
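The port-0 case can be tried on the loopback interface. This is a minimal sketch assuming IPv4 loopback is available; the function name is hypothetical.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Bind a TCP socket to 127.0.0.1 with port 0 and return the port the
 * kernel picked (in host byte order), or -1 on error. */
int kernel_assigned_port(void)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    int port = -1;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(0);              /* 0 = let the kernel choose */

    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) == 0 &&
        getsockname(fd, (struct sockaddr *)&addr, &len) == 0) {
        assert(len == sizeof addr);        /* an IPv4 address came back */
        port = ntohs(addr.sin_port);
    }
    close(fd);
    return port;
}
```

The value returned differs from run to run, since the kernel picks any free ephemeral port.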
getpeername()
• When a server is execed by the process that calls
accept, the only way the server can obtain the
identity of the client is to call getpeername
• inetd server works by execing the respective server’s
image
getpeername() : inetd
TCP Echo Client
#include "unp.h"

int
main(int argc, char **argv)
{
    int sockfd;
    struct sockaddr_in servaddr;

    if (argc != 2)
        err_quit("usage: tcpcli <IPaddress>");

    sockfd = Socket(PF_INET, SOCK_STREAM, 0);

    bzero(&servaddr, sizeof(servaddr));
    servaddr.sin_family = AF_INET;
    servaddr.sin_port = htons(SERV_PORT);
    Inet_pton(AF_INET, argv[1], &servaddr.sin_addr);

    Connect(sockfd, (SA *) &servaddr, sizeof(servaddr));

    str_cli(stdin, sockfd);
    exit(0);
}
str_cli function
void
str_cli(FILE *fp, int sockfd)
{
    char sendline[MAXLINE], recvline[MAXLINE];
    ssize_t n;

    while (Fgets(sendline, MAXLINE, fp) != NULL) {
        Write(sockfd, sendline, strlen(sendline));

        if ( (n = Read(sockfd, recvline, MAXLINE - 1)) == 0)
            err_quit("str_cli: server terminated prematurely");

        recvline[n] = '\0';   /* Read does not null-terminate the data */
        Fputs(recvline, stdout);
    }
}
TCP Concurrent Server
#include "unp.h"

int
main(int argc, char **argv)
{
    int listenfd, connfd;
    pid_t childpid;
    socklen_t clilen;
    struct sockaddr_in cliaddr, servaddr;

    listenfd = Socket(AF_INET, SOCK_STREAM, 0);

    bzero(&servaddr, sizeof(servaddr));
    servaddr.sin_family = AF_INET;
    servaddr.sin_addr.s_addr = htonl(INADDR_ANY);
    servaddr.sin_port = htons(SERV_PORT);

    Bind(listenfd, (SA *) &servaddr, sizeof(servaddr));

    Listen(listenfd, LISTENQ);
    for ( ; ; ) {
        clilen = sizeof(cliaddr);
        connfd = Accept(listenfd, (SA *) &cliaddr, &clilen);

        if ( (childpid = Fork()) == 0) { /* child process */
            Close(listenfd);   /* close listening socket */
            str_echo(connfd);  /* process the request */
            exit(0);
        }
        Close(connfd);         /* parent closes connected socket */
    }
}


str_echo function
void
str_echo(int sockfd)
{
    ssize_t n;
    char buf[MAXLINE];

again:
    while ( (n = read(sockfd, buf, MAXLINE)) > 0)
        Write(sockfd, buf, n);

    if (n < 0 && errno == EINTR)
        goto again;
    else if (n < 0)
        err_sys("str_echo: read error");
}
TCP Concurrent Server
• Handling zombies
– while ( (pid = waitpid(-1, &stat, WNOHANG)) > 0) in SIGCHLD
signal handler
• Handling interrupted system calls
– when writing network programs that catch signals, we must be
cognizant of interrupted system calls, and we must handle them
– A slow system call is any system call that can block forever
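One common way to handle an interrupted slow system call is to restart it when errno is EINTR. The wrapper below is a minimal sketch of that idea for read(); the name read_retry is hypothetical and is not part of the sockets API or of unp.h.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* read() that restarts itself if a caught signal interrupts it.
 * Returns what read() returns: bytes read, 0 at end-of-file, or -1 with
 * errno set to something other than EINTR. */
ssize_t read_retry(int fd, void *buf, size_t count)
{
    ssize_t n;

    assert(buf != NULL);
    do {
        n = read(fd, buf, count);
    } while (n < 0 && errno == EINTR);
    return n;
}
```

The same loop shape applies to accept() and other calls that can block forever.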
Handling interrupted system calls

for ( ; ; ) {
    clilen = sizeof (cliaddr);
    if ( (connfd = accept (listenfd, (SA *) &cliaddr,
                           &clilen)) < 0) {
        if (errno == EINTR)
            continue; /* back to for () */
        else
            err_sys ("accept error");
    }
Connection Abort before accept Returns
• SVR4 returns an error of EPROTO; POSIX specifies
ECONNABORTED
• Berkeley-derived kernels never return any error
Termination of Server Process
• FIN is sent to client
• Client TCP sends ACK to server
• What if the client application doesn’t take note of it, and
sends data to the server?
SIGPIPE Signal
• When a process writes to a socket that has received
an RST, the SIGPIPE signal is sent to the process.
The default action of this signal is to terminate the
process, so the process must catch the signal to
avoid being involuntarily terminated.
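Besides catching the signal, a process can ignore SIGPIPE, in which case the write fails with EPIPE instead. The sketch below shows this with a pipe whose read end is closed, used here as a stand-in for a socket that has received an RST; the function name is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/* Ignore SIGPIPE, then write to a pipe that has no reader.
 * Returns the errno reported by write(), or 0 if write() succeeded. */
int errno_after_write_to_closed_pipe(void)
{
    int fds[2];
    int saved = 0;

    void (*prev)(int) = signal(SIGPIPE, SIG_IGN);
    assert(prev != SIG_ERR);
    (void)prev;

    if (pipe(fds) != 0)
        return errno;
    close(fds[0]);                     /* no reader remains */
    if (write(fds[1], "x", 1) < 0)
        saved = errno;                 /* EPIPE rather than termination */
    close(fds[1]);
    return saved;
}
```

With the default disposition the same write would have terminated the process.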
Crashing of Server Host
• Nothing is sent to client
• Client will try to reach the host, but will get errors
such as ETIMEDOUT, EHOSTUNREACH,
ENETUNREACH
Crashing and Rebooting of Server Host

• When client sends packets, server will respond with
RST
Shutdown of Server Host
• Init sends SIGTERM to all processes
• Then sends SIGKILL to all processes
• FIN is sent to the client
I/O Multiplexing

I/O Multiplexing
• We often need to be able to monitor multiple
descriptors:
– a generic TCP client (like telnet)
– need to be able to handle unexpected situations, perhaps a
server that shuts down without warning.
– A server that handles both TCP and UDP
Example - generic TCP client
• Input from standard input should be sent to a TCP
socket.
• Input from a TCP socket should be sent to standard
output.
• How do we know when to check for input from each
source?
Generic TCP Client
[Figure: a generic TCP client sitting between STDIN, STDOUT and a TCP socket]
Different Solutions
• Use nonblocking I/O.
– use fcntl() to set O_NONBLOCK
• Use alarm and signal handler to interrupt slow
system calls.
• Use multiple processes/threads.
• Use functions that support checking of multiple input
sources at the same time.
Non blocking I/O

• use fcntl() to set O_NONBLOCK:
int flags;
flags = fcntl(sock, F_GETFL, 0);
fcntl(sock, F_SETFL, flags | O_NONBLOCK);
• Now calls to read() (and other system calls) that would otherwise
block return -1 and set errno to EWOULDBLOCK.
while (! done) {
    if ( (n = read(STDIN_FILENO, …)) < 0) {
        if (errno != EWOULDBLOCK)
            /* ERROR */ ;
    } else
        write(tcpsock, …);

    if ( (n = read(tcpsock, …)) < 0) {
        if (errno != EWOULDBLOCK)
            /* ERROR */ ;
    } else
        write(STDOUT_FILENO, …);
}
The problem with nonblocking I/O
• Using blocking I/O allows the Operating System to
put your program to sleep when nothing is happening
(no input). Once input arrives the OS will wake up
your program and read() (or whatever) will return.
• With nonblocking I/O the process will waste
processor time in a busy-wait
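The behaviour of a nonblocking descriptor can be seen on a pipe. This is a small sketch with hypothetical helper names; note that read_would_block consumes one byte when data is available, so it is a demonstration rather than a general-purpose test.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Turn on O_NONBLOCK for fd. Returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    int flags;

    assert(fd >= 0);
    flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/* Try to read one byte. Returns 1 if the read would have blocked,
 * 0 otherwise (a byte was read, end-of-file, or a different error). */
int read_would_block(int fd)
{
    char c;
    ssize_t n = read(fd, &c, 1);

    /* POSIX allows either error code here. */
    return n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK);
}
```

A loop that keeps calling read_would_block until it returns 0 is exactly the busy-wait the slide warns about.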
Using alarms

signal(SIGALRM, sig_alrm);
alarm(MAX_TIME);
read(STDIN_FILENO,…);
...

signal(SIGALRM, sig_alrm);
alarm(MAX_TIME);
read(tcpsock,…);
...
Alarming Problem

• What will be the effect on response time ?

• What is the ‘right’ value for MAX_TIME?


Select()
• The select() system call allows us to use blocking I/O
on a set of descriptors (file, socket, …).
• For example, we can ask select to notify us when
data is available for reading on either STDIN or a
TCP socket.
I/O Models
• Blocking
• Non-Blocking
• IO Multiplexing
• Signal-driven IO
• Asynchronous IO
IO Models
• Two phases
– Waiting for the data
– Copying the data
Blocking I/O
[Figure: blocking I/O model. The application issues the recvfrom system call and the process blocks in it. The kernel waits for data (no datagram ready → datagram ready), then copies the datagram from kernel to user. When the copy is complete recvfrom returns OK and the application processes the datagram.]
Nonblocking I/O
[Figure: nonblocking I/O model. The application repeatedly calls recvfrom, waiting for an OK return (polling). While no datagram is ready the kernel returns EWOULDBLOCK at once. When a datagram is ready, the kernel copies it from kernel to user, recvfrom returns OK, and the application processes the datagram.]
I/O multiplexing (select and poll)
[Figure: I/O multiplexing model. The process blocks in a call to select, waiting for one of possibly many sockets to become readable. When a datagram is ready, select returns readable. The process then calls recvfrom and blocks while the data is copied from kernel to user into the application buffer. When the copy is complete recvfrom returns OK and the application processes the datagram.]
Signal-driven I/O (SIGIO)
[Figure: signal-driven I/O model. The application establishes a SIGIO signal handler with the sigaction system call, which returns immediately, and the process continues executing while the kernel waits for data. When a datagram is ready the kernel delivers SIGIO. The signal handler calls recvfrom, and the process blocks while the data is copied from kernel to user into the application buffer. When the copy is complete recvfrom returns OK and the application processes the datagram.]
Asynchronous I/O
[Figure: asynchronous I/O model. The application calls aio_read, which returns immediately, and the process continues executing. The kernel waits for data, and when a datagram is ready copies it from kernel to user. When the copy is complete the kernel delivers the signal specified in aio_read, and the signal handler processes the datagram.]
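Comparison of the I/O Models
[Figure: the five models (blocking, nonblocking, I/O multiplexing, signal-driven I/O, asynchronous I/O) compared across the two phases, wait for data and copy data from kernel to user.]
• Blocking: initiate, then blocked through both phases until complete
• Nonblocking: repeated checks during the first phase, then blocked during the copy until complete
• I/O multiplexing: check, blocked until ready, then initiate and blocked during the copy until complete
• Signal-driven I/O: notification when ready, then initiate and blocked during the copy until complete
• Asynchronous I/O: initiate, then notification when complete
• The first four handle the 1st phase differently and the 2nd phase the same; asynchronous I/O handles both phases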
Select()
int select( int maxfdp1,
            fd_set *readset,
            fd_set *writeset,
            fd_set *exceptset,
            struct timeval *timeout);
maxfdp1: highest-numbered descriptor to be tested, plus one.
readset: set of descriptors we want to read from.
writeset: set of descriptors we want to write to.
exceptset: set of descriptors to watch for exceptions.
timeout: maximum time select should wait
struct timeval
struct timeval {
    long tv_sec;  /* seconds */
    long tv_usec; /* microseconds */
};

struct timeval max = {1, 0}; /* one second */


Condition of select function
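• Wait forever: return only when a descriptor is ready (timeout = NULL)
• Wait up to a fixed amount of time: return when a descriptor is ready,
but no later than the time given in the timeval
• Do not wait at all: return immediately after checking
the descriptors (timeval set to 0 seconds and 0 microseconds)
• The wait in the first two cases is normally interrupted if the process
catches a signal and returns from the signal handler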
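The fixed-timeout case can be sketched as a small helper around select. The names wait_readable and wait_readable_demo are hypothetical, and the demo uses a pipe rather than a socket so that it runs without a network.

```c
#include <assert.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Wait until fd is readable or timeout_ms milliseconds elapse.
 * Returns 1 if readable, 0 on timeout, -1 on error. */
int wait_readable(int fd, long timeout_ms)
{
    fd_set rset;
    struct timeval tv;
    int n;

    assert(fd >= 0 && fd < FD_SETSIZE);   /* fd_set cannot hold more */
    FD_ZERO(&rset);
    FD_SET(fd, &rset);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    n = select(fd + 1, &rset, NULL, NULL, &tv);
    if (n < 0)
        return -1;
    return n > 0 && FD_ISSET(fd, &rset);
}

/* Self-check on a pipe: returns 1 if select timed out while the pipe was
 * empty and reported it readable once a byte had been written. */
int wait_readable_demo(void)
{
    int p[2], ok;

    if (pipe(p) != 0)
        return 0;
    ok = wait_readable(p[0], 50) == 0;            /* nothing written yet */
    ok = ok && write(p[1], "x", 1) == 1;
    ok = ok && wait_readable(p[0], 50) == 1;      /* data is waiting */
    close(p[0]);
    close(p[1]);
    return ok;
}
```

The timeval is rebuilt on every call because some systems modify it inside select.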
Select Function

• readset => descriptors to check for readability
• writeset => descriptors to check for writability
• exceptset => descriptors to check for
two exception conditions:
– arrival of out-of-band data for a socket
– the presence of control status information to be read from the
master side of a pseudo terminal
Descriptor sets

• fd_set: an array of integers, with each bit in each integer corresponding to a
descriptor.

• void FD_ZERO(fd_set *fdset); /* clear all bits in fdset */
• void FD_SET(int fd, fd_set *fdset); /* turn on the bit for fd in fdset */
• void FD_CLR(int fd, fd_set *fdset); /* turn off the bit for fd in fdset */
• int FD_ISSET(int fd, fd_set *fdset); /* is the bit for fd on in fdset ? */
Example of Descriptor sets function

fd_set rset;

FD_ZERO(&rset);    /* all bits off : initialize */
FD_SET(1, &rset);  /* turn on bit for fd 1 */
FD_SET(4, &rset);  /* turn on bit for fd 4 */
FD_SET(5, &rset);  /* turn on bit for fd 5 */
Maxfdp1
• specifies the number of descriptors to be tested.
• Its value is the maximum descriptor to be tested,
plus one
– (example: fds 1, 2, 5 => maxfdp1: 6)
• The constant FD_SETSIZE, defined by including
<sys/select.h>, is the number of descriptors in
the fd_set datatype (often 1024).
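To make the role of maxfdp1 concrete, the sketch below scans descriptors 0 through maxfdp1 - 1 and counts those whose bit is on. The function name count_set is hypothetical; only the FD_ macros come from the system headers.

```c
#include <assert.h>
#include <sys/select.h>

/* Count the descriptors in [0, maxfdp1) whose bit is on in set. */
int count_set(fd_set *set, int maxfdp1)
{
    int fd, count = 0;

    assert(maxfdp1 >= 0 && maxfdp1 <= FD_SETSIZE);
    for (fd = 0; fd < maxfdp1; fd++)
        if (FD_ISSET(fd, set))
            count++;
    return count;
}
```

With bits 1, 4 and 5 on, passing 6 finds all three descriptors, whereas passing 5 misses descriptor 5. The same off-by-one happens in select when the maximum descriptor is passed instead of the maximum plus one.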
When is the descriptor ready for reading?
• The number of bytes of data in the socket receive buffer is greater than or equal
to the current size of the low-water mark for the socket receive buffer.
SO_RCVLOWAT socket option. It defaults to 1 for TCP and UDP sockets
• The read half of the connection is closed (i.e., a TCP connection that has
received a FIN)
• The socket is a listening socket and the number of completed connections is
nonzero.
• A socket error is pending. A read operation on the socket will not block and will
return an error (–1) with errno set to the specific error condition.
– These pending errors can also be fetched and cleared by calling getsockopt and
specifying the SO_ERROR socket option.
When is the socket ready for writing?

• The number of bytes of available space in the socket send buffer is
greater than or equal to the current size of the low-water mark for the
socket send buffer, and either the socket is connected or the socket does
not require a connection (e.g., UDP)
• The write half of the connection is closed. A write operation on the
socket will generate SIGPIPE
• A socket using a non-blocking connect has completed the
connection, or the connect has failed
• A socket error is pending. A write operation on the socket will not
block and will return an error (–1) with errno set to the specific error
condition.
– These pending errors can also be fetched and cleared by calling
getsockopt with the SO_ERROR socket option.
When is the socket descriptor returned in
exception list?
• A socket has an exception condition pending if there
is out-of-band data for the socket
• or the socket is still at the out-of-band mark
Conditions that cause a socket to be ready for select

Condition                                     Readable?  Writable?  Exception?
Data to read                                      •
Read-half of the connection closed                •
New connection ready for listening socket         •
Space available for writing                                  •
Write-half of the connection closed                          •
Pending error                                     •          •
TCP out-of-band data                                                    •


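Conditions handled by select in str_cli
[Figure: the client calls select() for readability on either standard input or the socket. Standard input delivers data or EOF. The socket delivers an error, EOF or data, corresponding to an RST, a FIN or data arriving from the peer TCP.]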


Three conditions are handled with the socket

• If the peer TCP sends data, the socket becomes readable and read
returns greater than 0
• If the peer TCP sends a FIN (the peer process terminates), the socket
becomes readable and read returns 0 (end-of-file)
• If the peer TCP sends an RST (the peer host has crashed and rebooted),
the socket becomes readable, read returns -1, and errno contains
the specific error code
Implementation of str_cli function using select

void
str_cli(FILE *fp, int sockfd)
{
    int maxfdp1;
    fd_set rset;
    char sendline[MAXLINE], recvline[MAXLINE];

    FD_ZERO(&rset);
    for ( ; ; ) {
        FD_SET(fileno(fp), &rset);
        FD_SET(sockfd, &rset);
        maxfdp1 = max(fileno(fp), sockfd) + 1;
        Select(maxfdp1, &rset, NULL, NULL, NULL);

        if (FD_ISSET(sockfd, &rset)) { /* socket is readable */
            if (Readline(sockfd, recvline, MAXLINE) == 0)
                err_quit("str_cli: server terminated prematurely");
            Fputs(recvline, stdout);
        }

        if (FD_ISSET(fileno(fp), &rset)) { /* input is readable */
            if (Fgets(sendline, MAXLINE, fp) == NULL)
                return; /* all done */
            Writen(sockfd, sendline, strlen(sendline));
        }
    } /* for */
} /* str_cli */

Stop and wait
[Figure: stop-and-wait operation. The client sends a line to the server and then waits for the reply. time0 through time3 show the request travelling from client to server, time4 through time7 show the reply travelling back.]
Batch input

[Figure: batch input keeps the pipe full. At time 7, requests 5 through 8 are on their way to the server while replies 1 through 4 are on their way back. At time 8, requests 6 through 9 and replies 2 through 5 are in the pipe.]


Handling batch input
• The problem with our revised str_cli function
– When it sees end-of-file on input, the function returns to
main, and the program terminates.
– However, in batch mode, there may still be requests on their way to
the server and replies on their way back.
• We need a way to close one half of the TCP connection
– send a FIN to the server, telling it we have finished sending data,
but leave the socket descriptor open for reading: this is what the
shutdown function does
Shutdown function
• Close one half of the TCP connection
• The close function:
– decrements the descriptor’s reference count and closes the
socket only if the count reaches 0
– terminates both directions of data transfer (reading and writing)
• The shutdown function can close just one direction (reading
or writing), regardless of the reference count
Calling shutdown to close half of a TCP connection
[Figure: the client writes data twice and then calls shutdown, which sends a FIN. The server's reads return > 0 twice and then 0, and the server acknowledges the data and the FIN. The server then writes data twice and calls close, which sends its FIN. The client's reads return > 0 twice and then 0, and the client acknowledges the data and the FIN.]
Shutdown function

• #include <sys/socket.h>
int shutdown(int sockfd, int howto);
/* return : 0 if OK, -1 on error */
• howto argument
– SHUT_RD : read-half of the connection closed. No more reads can be issued
– SHUT_WR : write-half of the connection closed. Also called half-close. Buffered
data will be sent followed by termination sequence.
– SHUT_RDWR : both closed
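A half-close can be tried locally. The sketch below uses a Unix-domain socketpair as a stand-in for a TCP connection, so no FIN segment is actually sent, but the visible behaviour of SHUT_WR is the same: the peer reads the buffered data and then end-of-file, while the reverse direction stays open. The function name is hypothetical.

```c
#include <assert.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns 1 if the half-close behaved as described, 0 otherwise. */
int half_close_demo(void)
{
    int sv[2];
    char buf[8];
    int ok = 0;

    int rc = socketpair(AF_UNIX, SOCK_STREAM, 0, sv);
    assert(rc == 0);                              /* setup must succeed */
    (void)rc;

    if (write(sv[0], "req", 3) == 3 &&
        shutdown(sv[0], SHUT_WR) == 0 &&          /* done sending          */
        read(sv[1], buf, sizeof buf) == 3 &&      /* peer still gets data  */
        read(sv[1], buf, sizeof buf) == 0 &&      /* then end-of-file      */
        write(sv[1], "rep", 3) == 3 &&            /* reverse path is open  */
        read(sv[0], buf, sizeof buf) == 3)
        ok = 1;

    close(sv[0]);
    close(sv[1]);
    return ok;
}
```

This is the pattern a batch-mode client needs: finish sending, half-close, then keep reading until the peer's end-of-file arrives.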
Str_cli function using select and
shutdown
#include "unp.h"

void
str_cli(FILE *fp, int sockfd)
{
    int maxfdp1, stdineof;
    fd_set rset;
    char sendline[MAXLINE], recvline[MAXLINE];

    stdineof = 0;
    FD_ZERO(&rset);
    for ( ; ; ) {
        if (stdineof == 0) /* select on standard input for readability */
            FD_SET(fileno(fp), &rset);
        FD_SET(sockfd, &rset);
        maxfdp1 = max(fileno(fp), sockfd) + 1;
        Select(maxfdp1, &rset, NULL, NULL, NULL);

        if (FD_ISSET(sockfd, &rset)) { /* socket is readable */
            if (Readline(sockfd, recvline, MAXLINE) == 0) {
                if (stdineof == 1)
                    return; /* normal termination */
                else
                    err_quit("str_cli: server terminated prematurely");
            }
            Fputs(recvline, stdout);
        }

        if (FD_ISSET(fileno(fp), &rset)) { /* input is readable */
            if (Fgets(sendline, MAXLINE, fp) == NULL) {
                stdineof = 1;
                Shutdown(sockfd, SHUT_WR); /* send FIN */
                FD_CLR(fileno(fp), &rset);
                continue;
            }
            Writen(sockfd, sendline, strlen(sendline));
        }
    }
}
TCP echo server
• Single process server that uses select to handle any
number of clients, instead of forking one child per
client.
Data structure TCP server(1)

Before first client has established a connection

client[]:  [0] = -1, [1] = -1, [2] = -1, ..., [FD_SETSIZE-1] = -1
rset:      fd0 = 0, fd1 = 0, fd2 = 0, fd3 = 1
maxfd + 1 = 4

fd 0 (stdin), 1 (stdout), 2 (stderr)
fd 3 => listening socket fd
Data structure TCP server(2)

After first client connection is established

client[]                      rset:  fd0 fd1 fd2 fd3 fd4
[0]                4                  0   0   0   1   1
[1]               -1
[2]               -1          maxfd + 1 = 5
...
[FD_SETSIZE - 1]  -1

fd 3 => listening socket fd
fd 4 => client socket fd
Data structure TCP server(3)
After second client connection is established

client[]                      rset:  fd0 fd1 fd2 fd3 fd4 fd5
[0]                4                  0   0   0   1   1   1
[1]                5
[2]               -1          maxfd + 1 = 6
...
[FD_SETSIZE - 1]  -1

fd 3 => listening socket fd
fd 4 => client1 socket fd
fd 5 => client2 socket fd
Data structure TCP server(4)
After first client terminates its connection

client[]                      rset:  fd0 fd1 fd2 fd3 fd4 fd5
[0]               -1                  0   0   0   1   0   1
[1]                5
[2]               -1          maxfd + 1 = 6  (maxfd does not change)
...
[FD_SETSIZE - 1]  -1

fd 3 => listening socket fd
fd 4 => client1 socket fd (closed and removed from the set)
fd 5 => client2 socket fd
TCP echo server using single process
#include "unp.h"
int main(int argc, char **argv)
{
int i, maxi, maxfd, listenfd, connfd, sockfd;
int nready, client[FD_SETSIZE];
ssize_t n;
fd_set rset, allset;
char line[MAXLINE];
socklen_t clilen;
struct sockaddr_in cliaddr, servaddr;
listenfd = Socket(AF_INET, SOCK_STREAM, 0);
bzero(&servaddr, sizeof(servaddr));
servaddr.sin_family = AF_INET;
servaddr.sin_addr.s_addr = htonl(INADDR_ANY);
servaddr.sin_port = htons(SERV_PORT);
Bind(listenfd, (SA *) &servaddr, sizeof(servaddr));
Listen(listenfd, LISTENQ);
maxfd = listenfd; /* initialize */
maxi = -1; /* index into client[] array */
for (i = 0; i < FD_SETSIZE; i++)
client[i] = -1; /* -1 indicates available entry */
FD_ZERO(&allset);
FD_SET(listenfd, &allset);
for ( ; ; ) {
rset = allset; /* structure assignment */
nready = Select(maxfd+1, &rset, NULL, NULL, NULL);
if (FD_ISSET(listenfd, &rset)) { /* new client connection */
clilen = sizeof(cliaddr);
connfd = Accept(listenfd, (SA *) &cliaddr, &clilen);
for (i = 0; i < FD_SETSIZE; i++)
if (client[i] < 0) {
client[i] = connfd; /* save descriptor */
break;}
if (i == FD_SETSIZE)
err_quit("too many clients");
FD_SET(connfd, &allset); /* add new descriptor to set */
if (connfd > maxfd)
maxfd = connfd; /* maxfd for select */
if (i > maxi)
maxi = i; /* max index in client[] array */
if (--nready <= 0)
continue; /* no more readable descriptors */
}
for (i = 0; i <= maxi; i++) { /* check all clients for data */
if ( (sockfd = client[i]) < 0)
continue;
if (FD_ISSET(sockfd, &rset)) {
if ( (n = Readline(sockfd, line, MAXLINE)) == 0) {
/*connection closed by client */
Close(sockfd);
FD_CLR(sockfd, &allset);
client[i] = -1;
} else
Writen(sockfd, line, n);
if (--nready <= 0)
break; /* no more readable descriptors */
}
}
}
}
Denial of service attacks
• If a malicious client connects to the server, sends 1 byte of
data (other than a newline), and then goes to sleep:
=> the server calls readline and blocks waiting for the rest of the line, so it cannot service any other client.
Denial of service attacks
• Solution
– use nonblocking I/O
– have each client serviced by a separate thread of control
(spawn a process or a thread to service each client)
– place a timeout on the I/O operation
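As an illustration of the third option, the sketch below waits with select for a bounded time before calling read. The name read_timeout and the return code READ_TIMED_OUT are hypothetical, chosen for this example only.

```c
#include <sys/select.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

#define READ_TIMED_OUT (-2)   /* hypothetical return code for this sketch */

/*
 * Waits up to msec milliseconds for fd to become readable, then reads.
 * Returns the byte count from read, -1 on error, READ_TIMED_OUT on timeout.
 */
ssize_t read_timeout(int fd, void *buf, size_t nbytes, long msec)
{
    fd_set rset;
    struct timeval tv;
    int nready;

    FD_ZERO(&rset);
    FD_SET(fd, &rset);
    tv.tv_sec = msec / 1000;
    tv.tv_usec = (msec % 1000) * 1000;

    nready = select(fd + 1, &rset, NULL, NULL, &tv);
    if (nready < 0)
        return -1;               /* select error; errno is set */
    if (nready == 0)
        return READ_TIMED_OUT;   /* nothing arrived in time */
    return read(fd, buf, nbytes);
}
```

This returns whatever bytes are available rather than a complete line, so a line-oriented server built on it would still need to buffer partial lines per client.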
pselect function

#include <sys/select.h>
#include <signal.h>
#include <time.h>

int pselect(int maxfdp1, fd_set *readset, fd_set *writeset,
            fd_set *exceptset, const struct timespec *timeout,
            const sigset_t *sigmask);

The pselect function was introduced by POSIX.1g.


pselect function
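• struct timespec {
    time_t tv_sec;   /* seconds */
    long   tv_nsec;  /* nanoseconds */
  };
• sigmask => pointer to a signal mask that replaces the current mask while pselect waits; the original mask is restored when pselect returns.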
Name and Address Conversions
DNS

RFC 1034
RFC 1035
Hierarchical Namespace
Naming Authorities
DNS Record Types
Types
Sample DNS Records
aix IN A 192.168.42.2
IN AAAA 3ffe:b80:1f8d:2:204:acff:fe17:bf38
IN MX 5 aix.unpbook.com.
IN MX 10 mailhost.unpbook.com.
aix-4 IN A 192.168.42.2
aix-6 IN AAAA 3ffe:b80:1f8d:2:204:acff:fe17:bf38
aix-611 IN AAAA fe80::204:acff:fe17:bf38
Resolvers and Name Servers
DNS library functions

gethostbyname

gethostbyaddr

getservbyname

getservbyport

getaddrinfo

gethostbyname

struct hostent *gethostbyname(const char *hostname);

struct hostent is defined in netdb.h:

#include <netdb.h>

struct hostent

struct hostent {
    char  *h_name;       /* official (canonical) name */
    char **h_aliases;    /* other names */
    int    h_addrtype;   /* AF_INET or AF_INET6 */
    int    h_length;     /* address length (4 or 16) */
    char **h_addr_list;  /* array of ptrs to addresses */
};

struct hostent
gethostbyname and errors
• On error, gethostbyname returns a null pointer.
• gethostbyname sets the global variable h_errno (not errno) to indicate
the exact error:
– HOST_NOT_FOUND
– TRY_AGAIN
– NO_RECOVERY
– NO_DATA (identical to NO_ADDRESS)
Sample code using gethostbyname()
char *ptr, **pptr;
char str[INET_ADDRSTRLEN];
struct hostent *hptr;

while (--argc > 0) {
    ptr = *++argv;
    if ( (hptr = gethostbyname(ptr)) == NULL) {
        err_msg("gethostbyname error for host: %s: %s",
                ptr, hstrerror(h_errno));
        continue;
    }
    printf("official hostname: %s\n", hptr->h_name);

    for (pptr = hptr->h_aliases; *pptr != NULL; pptr++)
        printf("\talias: %s\n", *pptr);

    switch (hptr->h_addrtype) {
    case AF_INET:
        pptr = hptr->h_addr_list;
        for ( ; *pptr != NULL; pptr++)
            printf("\taddress: %s\n",
                   Inet_ntop(hptr->h_addrtype, *pptr, str, sizeof(str)));
        break;
    default:
        err_ret("unknown address type");
        break;
    }
}
gethostbyaddr
• #include <netdb.h>
struct hostent *gethostbyaddr (const char *addr, socklen_t
len, int family);
• The addr argument is not a char*, but is really a pointer to an in_addr
structure containing the IPv4 address. len is the size of this structure: 4
for an IPv4 address. The family argument is AF_INET.
• The function gethostbyaddr takes a binary IPv4 address and
tries to find the hostname corresponding to that address. This is
the reverse of gethostbyname
getservbyname and getservbyport

• Services are often known by names.


• mapping from the name to port number is contained
in a file (normally /etc/services)
• if the port number changes, all we need to modify is
one line in the /etc/services file instead of having to
recompile the applications.
getservbyname
• #include <netdb.h>
struct servent *getservbyname (const char *servname, const
char *protoname);
struct servent {
char *s_name; /* official service name */
char **s_aliases; /* alias list */
int s_port; /* port number, network-byte order */
char *s_proto; /* protocol to use */
};
• The service name servname must be specified. If a protocol is also specified
(protoname is a non-null pointer), then the entry must also have a matching protocol.
Some Internet services are provided using either TCP or UDP
Usage of getservbyname
struct servent *sptr;

sptr = getservbyname("domain", "udp"); /* DNS using UDP */


sptr = getservbyname("ftp", "tcp"); /* FTP using TCP */
sptr = getservbyname("ftp", NULL); /* FTP using TCP */
sptr = getservbyname("ftp", "udp"); /* this call will fail */
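The calls above do not show error handling or byte-order conversion. The following sketch wraps getservbyname so that a failed lookup is detected and the port is converted with ntohs before use; the helper name service_port is hypothetical, and the result depends on the contents of the local services database.

```c
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <stddef.h>

/*
 * Returns the port for servname/protoname in host byte order,
 * or -1 if the services database has no matching entry.
 * protoname may be NULL to accept any protocol.
 */
int service_port(const char *servname, const char *protoname)
{
    struct servent *sptr = getservbyname(servname, protoname);

    if (sptr == NULL)
        return -1;
    /* s_port is stored in network byte order */
    return ntohs((unsigned short) sptr->s_port);
}
```

When filling in a socket address structure, sptr->s_port can be assigned to sin_port directly, without htons, because it is already in network byte order.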
/etc/services file
• freebsd % grep -e ^ftp -e ^domain /etc/services

ftp-data 20/tcp #File Transfer [Default Data]


ftp 21/tcp #File Transfer [Control]
domain 53/tcp #Domain Name Server
domain 53/udp #Domain Name Server
ftp-agent 574/tcp #FTP Software Agent System
ftp-agent 574/udp #FTP Software Agent System
ftps-data 989/tcp # ftp protocol, data, over TLS/SSL
ftps 990/tcp # ftp protocol, control, over TLS/SSL
getservbyport
• looks up a service given its port number and an optional protocol
• usage
struct servent *sptr;

sptr = getservbyport (htons (53), "udp"); /* DNS using UDP */


sptr = getservbyport (htons (21), "tcp"); /* FTP using TCP */
sptr = getservbyport (htons (21), NULL); /* FTP using TCP */
sptr = getservbyport (htons (21), "udp"); /* this call will fail */
getaddrinfo
• The gethostbyname and gethostbyaddr functions only support IPv4
• getaddrinfo handles both
– name-to-address translation
– service-to-port translation
• returns
– sockaddr structures instead of a list of addresses.
• hides all the protocol dependencies
• The application deals only with the socket address structures that are
filled in by getaddrinfo
getaddrinfo
• #include <netdb.h>
int getaddrinfo (const char *hostname, const char *service,
const struct addrinfo *hints, struct addrinfo **result) ;
struct addrinfo {
int ai_flags; /* AI_PASSIVE, AI_CANONNAME */
int ai_family; /* AF_xxx */
int ai_socktype; /* SOCK_xxx */
int ai_protocol; /* 0 or IPPROTO_xxx for IPv4 and IPv6 */
socklen_t ai_addrlen; /* length of ai_addr */
char *ai_canonname; /* ptr to canonical name for host */
struct sockaddr *ai_addr; /* ptr to socket address structure */
struct addrinfo *ai_next; /* ptr to next structure in linked list */
};
Hints structure
• hints is either a null pointer or a pointer to an addrinfo structure that the
caller fills in with hints about the types of information the caller wants
returned.
• The members of the hints structure that can be set by the caller are:
– ai_flags (zero or more AI_XXX values OR'ed together)
– ai_family (an AF_xxx value)
– ai_socktype (a SOCK_xxx value)
– ai_protocol
• For example,
– if the specified service is provided for both TCP and UDP, set ai_socktype
member of the hints structure to SOCK_DGRAM. The only information
returned will be for datagram sockets.
ai_flags
• AI_PASSIVE
The caller will use the socket for a passive open.
• AI_CANONNAME
Tells the function to return the canonical name of the host.
• AI_NUMERICHOST
Prevents any kind of name-to-address mapping; the hostname argument
must be an address string.
• AI_NUMERICSERV
Prevents any kind of name-to-service mapping; the service argument must
be a decimal port number string.

ai_flags
• AI_V4MAPPED
If specified along with an ai_family of AF_INET6, then returns IPv4-mapped IPv6
addresses corresponding to A records if there are no available AAAA records.
• AI_ALL
If specified along with AI_V4MAPPED, then returns IPv4-mapped IPv6 addresses
in addition to any AAAA records belonging to the name.
• AI_ADDRCONFIG
Only looks up addresses for a given IP version if there is one or more interface that
is not a loopback interface configured with an IP address of that version.
Result
• linked list of addrinfo structures, linked through the
ai_next pointer.
• There are two ways that multiple structures can be
returned:
– Multiple IP addresses per hostname: one structure is returned for each
address
– Service is provided for multiple socket types: one structure can be returned for each type
(SOCK_STREAM and SOCK_DGRAM)
Usage
• Sockaddr structure in addrinfo structures is ready for
– a call to socket
– then either a call to connect or sendto (for a client), or bind (for a
server).
• The arguments to socket are the members ai_family,
ai_socktype, and ai_protocol.
• The second and third arguments to either connect or bind are
ai_addr, and ai_addrlen
Usage
• struct addrinfo hints, *res;

• bzero(&hints, sizeof(hints) ) ;
• hints.ai_flags = AI_CANONNAME;
• hints.ai_family = AF_INET;

• getaddrinfo("freebsd4", "domain", &hints, &res);
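A slightly fuller sketch of the same call sequence, with the return value checked, the result list walked through ai_next, and the storage released. The function name list_addresses and the fixed buffer sizes are assumptions of this example.

```c
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Looks up host/service and prints one line per returned addrinfo.
 * Returns the number of entries, or -1 on error.
 */
int list_addresses(const char *host, const char *service, int socktype)
{
    struct addrinfo hints, *res, *ai;
    char addr[128], port[32];
    int rc, count = 0;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;      /* IPv4 or IPv6 */
    hints.ai_socktype = socktype;     /* SOCK_STREAM or SOCK_DGRAM */

    rc = getaddrinfo(host, service, &hints, &res);
    if (rc != 0) {
        fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(rc));
        return -1;
    }
    for (ai = res; ai != NULL; ai = ai->ai_next) {
        /* numeric form of the address and port in this entry */
        if (getnameinfo(ai->ai_addr, ai->ai_addrlen, addr, sizeof(addr),
                        port, sizeof(port),
                        NI_NUMERICHOST | NI_NUMERICSERV) == 0)
            printf("family=%d socktype=%d %s port %s\n",
                   ai->ai_family, ai->ai_socktype, addr, port);
        count++;
    }
    freeaddrinfo(res);                /* release the whole list */
    return count;
}
```

For example, list_addresses("127.0.0.1", "53", SOCK_DGRAM) needs neither DNS nor /etc/services, because both arguments are already numeric.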


Passive sockets
• specifies the service but not the hostname, and
specifies the AI_PASSIVE flag in the hints structure.
• The socket address structures returned should
contain an IP address of INADDR_ANY (for IPv4) or
IN6ADDR_ANY_INIT (for IPv6).
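A sketch of the passive case for a UDP server: hostname is NULL, AI_PASSIVE is set, and each returned structure is tried with socket and bind until one succeeds. The helper name udp_server_socket is hypothetical.

```c
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns a UDP socket bound to the wildcard address for the given
 * service (name or decimal port string), or -1 on failure. */
int udp_server_socket(const char *service)
{
    struct addrinfo hints, *res, *ai;
    int fd = -1;

    memset(&hints, 0, sizeof(hints));
    hints.ai_flags = AI_PASSIVE;      /* wildcard address, suitable for bind */
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_DGRAM;

    if (getaddrinfo(NULL, service, &hints, &res) != 0)
        return -1;

    for (ai = res; ai != NULL; ai = ai->ai_next) {
        fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (fd < 0)
            continue;                 /* try the next entry */
        if (bind(fd, ai->ai_addr, ai->ai_addrlen) == 0)
            break;                    /* success */
        close(fd);
        fd = -1;
    }
    freeaddrinfo(res);
    return fd;
}
```

Passing "0" as the service asks the kernel for an ephemeral port, which is convenient for trying the function out.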
Errors: gai_strerror

• const char *gai_strerror (int error);


freeaddrinfo
• Storage returned by getaddrinfo (the addrinfo
structures, the ai_addr structures, and the
ai_canonname string) is obtained dynamically (e.g.,
from malloc).
• This storage is released by calling freeaddrinfo with the head of the list:
• void freeaddrinfo (struct addrinfo *ai);
getnameinfo function

• Takes a socket address and returns a character string
describing the host and another character string describing the
service

int getnameinfo(const struct sockaddr *sockaddr, socklen_t addrlen,
                char *host, socklen_t hostlen, char *serv, socklen_t servlen,
                int flags);
Elementary UDP Socket
Contents
◆ recvfrom and sendto Function
◆ UDP Echo Server (main, dg_echo Function)
◆ UDP Echo Client (main, dg_cli Function)
◆ Lost Datagrams
◆ Verifying Received Response
◆ Server Not Running
◆ Connect Function with UDP
◆ Lack of Flow Control with UDP
◆ Determining Outgoing Interface with UDP
◆ TCP and UDP Echo Server Using select
UDP

 connectionless
 unreliable
 datagram protocol
 commonly used by
 DNS(the Domain Name System)
 NFS(the Network File System)
 SNMP(Simple Network Management Protocol)
Socket functions for UDP client-server

UDP Client                               UDP Server
                                         socket( )
                                         bind( )
socket( )                                recvfrom( )
                                         (blocks until datagram
                                          received from a client)
sendto( )      ---- data (request) ---->
                                         process request
recvfrom( )    <---- data (reply) -----  sendto( )
close( )
recvfrom and sendto functions

#include <sys/socket.h>

ssize_t recvfrom(int sockfd, void *buff, size_t nbytes, int flags,
                 struct sockaddr *from, socklen_t *addrlen);

ssize_t sendto(int sockfd, const void *buff, size_t nbytes, int flags,
               const struct sockaddr *to, socklen_t addrlen);

Both return: number of bytes read or written if OK, -1 on error


Sending UDP Datagrams
ssize_t sendto( int sockfd,
const void *buff,
size_t nbytes,
int flags,
const struct sockaddr* to,
socklen_t addrlen);
sockfd is a UDP socket
buff is the address of the data (nbytes long)
to is the address of a sockaddr containing the destination address.
Return value is the number of bytes sent, or -1 on error.
sendto()

• You can send 0 bytes of data!


• Some possible errors :
EBADF, ENOTSOCK: bad socket descriptor
EFAULT: bad buffer address
EMSGSIZE: message too large
ENOBUFS: system buffers are full
More sendto()

• The return value of sendto() indicates how much data was


accepted by the O.S. for sending as a datagram - not how
much data made it to the destination.
• There is no error condition that indicates that the destination
did not get the data!!!
Receiving UDP Datagrams
ssize_t recvfrom( int sockfd,
void *buff,
size_t nbytes,
int flags,
struct sockaddr* from,
socklen_t *fromaddrlen);
sockfd is a UDP socket
buff is the address of a buffer (nbytes long)
from is the address of a sockaddr.
Return value is the number of bytes received and put into buff, or -1 on
error.
recvfrom()
• If buff is not large enough, any extra data is lost forever...
• You can receive 0 bytes of data!
• The sockaddr at from is filled in with the address of the sender.
• You should set fromaddrlen before calling.
• If from and fromaddrlen are NULL we don’t find out who sent
the data.
More recvfrom()

• Same errors as sendto, but also:


– EINTR: System call interrupted by signal.
• Unless you do something special - recvfrom doesn’t return
until there is a datagram available.
Server as we had with TCP

[Figure: the listening server forks one server child per client; each client has its own TCP connection to its server child.]

Summary of TCP client-server with two clients.


Server as with UDP

[Figure: both clients send UDP datagrams to a single server, which has one socket receive buffer.]

Summary of UDP client-server with two clients.


UDP Echo client: main Function
#include "unp.h"
int main(int argc, char **argv)
{
    int sockfd;
    struct sockaddr_in servaddr;

    if (argc != 2)
        err_quit("usage: udpcli <IPaddress>");
    bzero(&servaddr, sizeof(servaddr));
    servaddr.sin_family = AF_INET;
    servaddr.sin_port = htons(SERV_PORT);
    Inet_pton(AF_INET, argv[1], &servaddr.sin_addr);
    sockfd = Socket(AF_INET, SOCK_DGRAM, 0);
    dg_cli(stdin, sockfd, (SA *) &servaddr, sizeof(servaddr));
    exit(0);
}
UDP Echo Client: dg_cli Function

#include "unp.h"
void dg_cli(FILE *fp, int sockfd, const SA *pservaddr, socklen_t servlen)
{
    int n;
    char sendline[MAXLINE], recvline[MAXLINE + 1];

    while (Fgets(sendline, MAXLINE, fp) != NULL) {
        Sendto(sockfd, sendline, strlen(sendline), 0, pservaddr, servlen);
        n = Recvfrom(sockfd, recvline, MAXLINE, 0, NULL, NULL);
        recvline[n] = 0;    /* null terminate */
        Fputs(recvline, stdout);
    }
}

dg_cli function: client processing loop


Lost Datagrams
If the client datagram is lost, the client blocks forever in its call to recvfrom.
If the client datagram arrives at the server but the server's reply is
lost, the client will again block forever in its call to recvfrom.

The only way to prevent this is to place a timeout on the recvfrom.


Verify Received Response
#include "unp.h"
void dg_cli(FILE *fp, int sockfd, const SA *pservaddr, socklen_t servlen)
{
    int n;
    char sendline[MAXLINE], recvline[MAXLINE + 1];
    socklen_t len;
    struct sockaddr *preply_addr;

    preply_addr = Malloc(servlen);

    while (Fgets(sendline, MAXLINE, fp) != NULL) {
        Sendto(sockfd, sendline, strlen(sendline), 0, pservaddr, servlen);
        len = servlen;
        n = Recvfrom(sockfd, recvline, MAXLINE, 0, preply_addr, &len);
        /* ignore any datagram that did not come from the server */
        if (len != servlen || memcmp(pservaddr, preply_addr, len) != 0) {
            printf("reply from %s (ignored)\n",
                   Sock_ntop(preply_addr, len));
            continue;
        }
        recvline[n] = 0;    /* null terminate */
        Fputs(recvline, stdout);
    }
}

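If the server has not bound an IP address to its socket, the kernel chooses the
source address for the IP datagram. It is chosen to be the primary IP address
of the outgoing interface. On a multihomed server this can differ from the
address the client sent to, so the comparison above may reject a legitimate reply.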
Server Not Running

◆ The client blocks forever in the call to recvfrom.
◆ The server host answers with an ICMP "port unreachable" error, but this ICMP error is an asynchronous error.
◆ The basic rule is that asynchronous errors are not returned for
UDP sockets unless the socket has been connected.
connect Function with UDP
Calling connect on a UDP socket does not result in anything like a TCP connection: there is no three-way
handshake. Instead, the kernel just records the IP address and port
number of the peer.
With a connected UDP socket, three things change:
1. We can no longer specify the destination IP address and port for an output
operation. That is, we do not use sendto but use write or send instead.
2. We do not use recvfrom but read or recv instead.
3. Asynchronous errors are returned to the process for a connected UDP socket.
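A sketch of the first two changes, using three UDP sockets on the loopback interface. It assumes loopback networking is available; the helper names loopback_udp and connected_udp_demo and the 200 ms receive timeout are hypothetical choices for this example. After connect, the socket is used with write and read, and a datagram from an address other than the connected peer is not delivered to it.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Creates a UDP socket bound to 127.0.0.1 on an ephemeral port and
 * stores the bound address in *addr. Returns the descriptor or -1. */
static int loopback_udp(struct sockaddr_in *addr)
{
    socklen_t len = sizeof(*addr);
    struct timeval tv = { 0, 200000 };   /* 200 ms receive timeout */
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    if (fd < 0)
        return -1;
    memset(addr, 0, sizeof(*addr));
    addr->sin_family = AF_INET;
    addr->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr->sin_port = 0;                  /* kernel picks the port */
    if (bind(fd, (struct sockaddr *) addr, sizeof(*addr)) < 0 ||
        getsockname(fd, (struct sockaddr *) addr, &len) < 0 ||
        setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Returns 1 if the connected socket received the peer's datagram but
 * not the stranger's, 0 if it behaved otherwise, -1 on setup failure. */
int connected_udp_demo(void)
{
    struct sockaddr_in a_addr, peer_addr, other_addr;
    char buf[16];
    int a, peer, other, result = -1;

    a = loopback_udp(&a_addr);
    peer = loopback_udp(&peer_addr);
    other = loopback_udp(&other_addr);
    if (a < 0 || peer < 0 || other < 0)
        goto done;

    /* Record the peer's address; no packets are exchanged by this call. */
    if (connect(a, (struct sockaddr *) &peer_addr, sizeof(peer_addr)) < 0)
        goto done;

    /* write replaces sendto on the connected socket. */
    if (write(a, "hi", 2) != 2 || recv(peer, buf, sizeof(buf), 0) != 2)
        goto done;

    /* A datagram from another port is not delivered: read times out. */
    sendto(other, "no", 2, 0, (struct sockaddr *) &a_addr, sizeof(a_addr));
    result = (read(a, buf, sizeof(buf)) < 0);

    /* A datagram from the connected peer is delivered. */
    sendto(peer, "ok", 2, 0, (struct sockaddr *) &a_addr, sizeof(a_addr));
    result = result && (read(a, buf, sizeof(buf)) == 2);

done:
    if (a >= 0) close(a);
    if (peer >= 0) close(peer);
    if (other >= 0) close(other);
    return result;
}
```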
connect Function with UDP

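[Figure: connected UDP socket. The application's UDP socket stores the peer IP address and port number from connect. UDP datagrams are exchanged with the peer; a UDP datagram from some other IP address and/or port number is not delivered to this socket.]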
Lack of Flow Control with UDP

#include "unp.h"

#define NDG   2000    /* number of datagrams to send */
#define DGLEN 1400    /* length of each datagram */

void dg_cli(FILE *fp, int sockfd, const SA *pservaddr, socklen_t servlen)
{
    int i;
    char sendline[MAXLINE];

    for (i = 0; i < NDG; i++) {
        Sendto(sockfd, sendline, DGLEN, 0, pservaddr, servlen);
    }
}

dg_cli function that writes a fixed number of datagrams to the server


Lack of Flow Control with UDP
#include "unp.h"
static void recvfrom_int(int);
static int count;
void dg_echo(int sockfd, SA *pcliaddr, socklen_t clilen)
{
    socklen_t len;
    char mesg[MAXLINE];

    Signal(SIGINT, recvfrom_int);
    for ( ; ; ) {
        len = clilen;
        Recvfrom(sockfd, mesg, MAXLINE, 0, pcliaddr, &len);
        count++;
    }
}

static void recvfrom_int(int signo)
{
    printf("\nreceived %d datagrams\n", count);
    exit(0);
}
Lack of Flow Control with UDP

 Datagrams can be lost before they reach the server: the interface's buffers may have been full, or the datagrams
may have been discarded by the sending host.

 The counter "dropped due to full socket buffers" indicates how many
datagrams were received by UDP but were discarded because the
receiving socket's receive queue was full.

 The number of datagrams received by the server in this example is
nondeterministic. It depends on many factors, such as the network
load, the processing load on the client host, and the processing load on
the server host.

 Solutions
 Use a fast server with a slow client.
 Increase the size of the socket receive buffer.
TCP and UDP Echo Server Using select

#include "unp.h"
int main(int argc, char **argv)
{
    int listenfd, connfd, udpfd, nready, maxfdp1;
    char mesg[MAXLINE];
    pid_t childpid;
    fd_set rset;
    ssize_t n;
    socklen_t len;
    const int on = 1;
    struct sockaddr_in cliaddr, servaddr;
    void sig_chld(int);
TCP and UDP Echo Server Using select

    /* Create listening TCP socket */
    listenfd = Socket(AF_INET, SOCK_STREAM, 0);

    bzero(&servaddr, sizeof(servaddr));
    servaddr.sin_family = AF_INET;
    servaddr.sin_addr.s_addr = htonl(INADDR_ANY);
    servaddr.sin_port = htons(SERV_PORT);

    Setsockopt(listenfd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on));
    Bind(listenfd, (SA *) &servaddr, sizeof(servaddr));
    Listen(listenfd, LISTENQ);

    /* Create UDP socket */
    udpfd = Socket(AF_INET, SOCK_DGRAM, 0);

    bzero(&servaddr, sizeof(servaddr));
    servaddr.sin_family = AF_INET;
    servaddr.sin_addr.s_addr = htonl(INADDR_ANY);
    servaddr.sin_port = htons(SERV_PORT);

    Bind(udpfd, (SA *) &servaddr, sizeof(servaddr));


TCP and UDP Echo Server Using select
    Signal(SIGCHLD, sig_chld);    /* must call waitpid() */
    FD_ZERO(&rset);
    maxfdp1 = max(listenfd, udpfd) + 1;
    for ( ; ; ) {
        FD_SET(listenfd, &rset);
        FD_SET(udpfd, &rset);
        if ( (nready = select(maxfdp1, &rset, NULL, NULL, NULL)) < 0) {
            if (errno == EINTR)
                continue;    /* interrupted by SIGCHLD: back to for() */
            else
                err_sys("select error");
        }
        if (FD_ISSET(listenfd, &rset)) {
            len = sizeof(cliaddr);
            connfd = Accept(listenfd, (SA *) &cliaddr, &len);

            if ( (childpid = fork()) == 0) {    /* child process */
                Close(listenfd);      /* close listening socket */
                str_echo(connfd);     /* process the request */
                exit(0);
            }
            Close(connfd);    /* parent closes connected socket */
        }
TCP and UDP Echo Server Using select

        if (FD_ISSET(udpfd, &rset)) {
            len = sizeof(cliaddr);
            n = Recvfrom(udpfd, mesg, MAXLINE, 0, (SA *) &cliaddr, &len);

            Sendto(udpfd, mesg, n, 0, (SA *) &cliaddr, len);
        }
    } /* for */
} /* main */
Advanced UDP Sockets
When to use UDP instead of TCP?

• Advantages of UDP:
– UDP supports broadcasting and multicasting
– UDP has no connection setup or teardown
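• For a two-packet request-reply, TCP needs 8 extra packets
(connection setup and teardown) if a new connection is used for each exchange
• Minimum transaction time: UDP: RTT + SPT, TCP: 2 × RTT + SPT
(RTT = round-trip time, SPT = server processing time)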
When to use UDP instead of TCP?

• Features of TCP not provided by UDP:


– Positive acknowledgments, retransmission of lost packets,
duplicate detection, and sequencing of packets reordered by the
network
• Seq nos, estimate RTO
– Windowed flow control
– Slow start and congestion avoidance
• to determine the current network capacity and to handle periods of
congestion
When to use UDP instead of TCP?

• Recommendations:
– UDP must be used for broadcast and multicast applications
• Error control or reliability must be added at the application layer if required
– UDP can be used for simple request-reply applications, but error
detection must be built into the application
• Acknowledgments, timeouts, retransmissions
– UDP should not be used for bulk data transfer
• Bulk transfer requires flow control along with error control, which amounts to
reimplementing TCP at the application layer
Adding Reliability to a UDP Application

• UDP for a request-reply application


– Timeout and retransmission to handle datagrams that are
discarded
– Sequence numbers so the client can verify that a reply is for
the appropriate request
• Examples which use simple request-reply with
reliability:
– DNS resolvers, SNMP agents, TFTP, and RPC
Handling Timeout and Retransmission

• Old-fashioned approach: send a request and wait N seconds for the reply => linear
retransmit timer
• RTT on a network can vary from fractions of a second on a LAN
to many seconds on a WAN.
• Factors affecting the RTT are distance, network speed, and
congestion
• Timeout should take into account the actual RTTs that we
measure along with the changes in the RTT over time
Retransmission Timeout (RTO)
Jacobson's algorithm
• two statistical estimators: srtt is the smoothed RTT
estimator and rttvar is the smoothed mean deviation
estimator
RTO
• When the retransmission timer expires, an
exponential backoff must be used for the next RTO
– For example, if our first RTO is 2 seconds and the reply is
not received in this time, then the next RTO is 4 seconds. If
there is still no reply, the next RTO is 8 seconds, and then
16, and so on.
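The two estimators and the backoff rule can be sketched in a few lines of C. The gains (1/8 for srtt, 1/4 for rttvar), the formula RTO = srtt + 4 × rttvar, the initial values, and the 60-second cap are the conventional choices for this algorithm and are assumptions here, since the slides do not state them; the struct and function names are hypothetical.

```c
/* Estimator state; all times are in seconds. */
struct rtt_est {
    double srtt;     /* smoothed RTT estimator */
    double rttvar;   /* smoothed mean deviation estimator */
    double rto;      /* current retransmission timeout */
};

#define RTO_MAX 60.0   /* assumed upper bound on the RTO */

static double calc_rto(const struct rtt_est *p)
{
    return p->srtt + 4.0 * p->rttvar;
}

/* Start with no RTT sample: srtt = 0, rttvar = 0.75, so RTO = 3 s. */
void rtt_est_init(struct rtt_est *p)
{
    p->srtt = 0.0;
    p->rttvar = 0.75;
    p->rto = calc_rto(p);
}

/* Fold one measured RTT into the estimators and recompute the RTO. */
void rtt_est_update(struct rtt_est *p, double measured)
{
    double delta = measured - p->srtt;

    p->srtt += delta / 8.0;                   /* gain g = 1/8 */
    if (delta < 0.0)
        delta = -delta;                       /* |delta| */
    p->rttvar += (delta - p->rttvar) / 4.0;   /* gain h = 1/4 */
    p->rto = calc_rto(p);
}

/* Retransmission timer expired: exponential backoff, up to the cap. */
void rtt_est_backoff(struct rtt_est *p)
{
    p->rto *= 2.0;
    if (p->rto > RTO_MAX)
        p->rto = RTO_MAX;
}
```

Starting from the initial 3-second RTO, one measured RTT of 1.0 second gives srtt = 0.125, rttvar = 0.8125, and RTO = 3.375 seconds; a timeout after that doubles the RTO to 6.75 seconds.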
Retransmission ambiguity problem

• Jacobson's algorithms tell us how to calculate the
RTO each time we measure an RTT and how to
increase the RTO when we retransmit.
• But a problem arises when we have to retransmit a
packet and then receive a reply: we cannot tell which transmission the reply answers. This is called the
retransmission ambiguity problem.
Retransmission ambiguity problem
Retransmission ambiguity problem: Karn's
Algorithm
• The following rules apply whenever a reply is received for a
request that was retransmitted:
– If an RTT was measured, do not use it to update the estimators,
since we do not know to which request the reply corresponds.
– Since this reply arrived before our retransmission timer expired,
reuse this RTO for the next packet. Only when we receive a reply to
a request that is not retransmitted will we update the RTT
estimators and recalculate the RTO.
Concurrent UDP Servers
• two different types of servers:
– First is a simple UDP server that reads a client request, sends a
reply, and is then finished with the client
• fork a child and let it handle the request
– Second is a UDP server that exchanges multiple datagrams with
the client.
• Create a new socket for each client, bind an ephemeral port to that
socket, and use that socket for all its replies.
• The client looks at the port number of the server's first reply and sends
subsequent datagrams for this request to that port.
Concurrency in UDP server that exchanges
multiple datagrams with the client
Socket Options
abstraction

• Introduction
• getsockopt and setsockopt function
• socket state
• Generic socket option
• IPv4 socket option
• ICMPv6 socket option
• IPv6 socket option
• TCP socket option
• fcntl function
Introduction

• Three ways to get and set the options that
affect a socket
– getsockopt and setsockopt functions => IPv4 and IPv6
multicasting options
– fcntl function => nonblocking I/O, signal-driven I/O
– ioctl function => chapter 16
getsockopt and setsockopt function

#include <sys/socket.h>
int getsockopt(int sockfd, int level, int optname, void *optval,
               socklen_t *optlen);
int setsockopt(int sockfd, int level, int optname, const void *optval,
               socklen_t optlen);

• sockfd => open socket descriptor
• level => the code in the system that interprets the option (generic, IPv4, IPv6,
TCP)
• optname => name of the option (e.g., SO_KEEPALIVE)
• optval => pointer to a variable from which the new value of the option is
fetched by setsockopt, or into which the current value of the option is
stored by getsockopt.
• optlen => the size of the option variable (a value for setsockopt, a value-result argument for getsockopt).
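A minimal sketch of both calls at the generic SOL_SOCKET level: one helper sets an integer flag option (SO_REUSEADDR) and reads it back, the other fetches a read-only option (SO_TYPE). The helper names are hypothetical.

```c
#include <sys/socket.h>
#include <sys/types.h>

/* Turns SO_REUSEADDR on and reads the option back.
 * Returns 1 if the option is now on, 0 if off, -1 on error. */
int enable_reuseaddr(int sockfd)
{
    int on = 1, value = 0;
    socklen_t len = sizeof(value);   /* value-result argument */

    if (setsockopt(sockfd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) < 0)
        return -1;
    if (getsockopt(sockfd, SOL_SOCKET, SO_REUSEADDR, &value, &len) < 0)
        return -1;
    return value != 0;
}

/* Returns SOCK_STREAM, SOCK_DGRAM, ... for an open socket, or -1. */
int socket_type(int sockfd)
{
    int type;
    socklen_t len = sizeof(type);

    if (getsockopt(sockfd, SOL_SOCKET, SO_TYPE, &type, &len) < 0)
        return -1;
    return type;
}
```

Flag options are tested for nonzero rather than compared with 1, because an implementation may report any nonzero value for an enabled option.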
Generic socket option

• SO_BROADCAST => enables or disables the ability of the process to send
broadcast messages (datagram sockets only, on networks that support broadcast: Ethernet, token ring...)
• SO_DEBUG => the kernel keeps track of detailed information about all packets sent
or received by TCP (only supported by TCP)
• SO_DONTROUTE => outgoing packets are to bypass the normal routing
mechanisms of the underlying protocol.
• SO_ERROR => when an error occurs on a socket, the protocol module in a
Berkeley-derived kernel sets a variable named so_error for that socket.
The process can obtain the value of so_error by fetching the SO_ERROR
socket option.
SO_KEEPALIVE
• SO_KEEPALIVE => if no data is exchanged across the connection in either direction for 2 hours, TCP automatically sends a
keepalive probe to the peer.
– Peer response
• ACK (everything OK)
• RST (peer crashed and rebooted): ECONNRESET
• no response: ETIMEDOUT => socket closed
– example: Rlogin, Telnet…
– Normally used by servers
SO_LINGER

• SO_LINGER => specifies how the close function operates for a connection-oriented
protocol (default: close returns immediately)
– struct linger {
int l_onoff; /* 0 = off, nonzero = on */
int l_linger; /* linger time in seconds */
};
• l_onoff = 0: the option is turned off and l_linger is ignored
• l_onoff nonzero and l_linger 0: TCP aborts the connection (sends RST) and
discards any data remaining in the send buffer.
• l_onoff nonzero and l_linger nonzero: close blocks until the remaining data
has been sent and acknowledged, or until the linger time expires. If the socket has been set nonblocking, it will not
wait for the close to complete, even if the linger time is nonzero.
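A short sketch of setting and fetching SO_LINGER with struct linger. The helper names set_linger and get_linger are hypothetical.

```c
#include <sys/socket.h>
#include <sys/types.h>

/* Makes close() on sockfd linger for up to `seconds` seconds.
 * Passing 0 seconds requests an abortive close (RST). Returns 0 or -1. */
int set_linger(int sockfd, int seconds)
{
    struct linger lg;

    lg.l_onoff = 1;           /* option on */
    lg.l_linger = seconds;    /* linger time in seconds */
    return setsockopt(sockfd, SOL_SOCKET, SO_LINGER, &lg, sizeof(lg));
}

/* Fetches the current SO_LINGER setting into *lg. Returns 0 or -1. */
int get_linger(int sockfd, struct linger *lg)
{
    socklen_t len = sizeof(*lg);

    return getsockopt(sockfd, SOL_SOCKET, SO_LINGER, lg, &len);
}
```

Even with a positive linger time, a successful return from close only says the peer TCP acknowledged the data, not that the peer application has read it.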
SO_LINGER

client                                        server

write       ---- data ---->                   data queued by TCP
close       ---- FIN ----->
close returns
            <--- ACK of data and FIN ---
                                              application reads queued
                                              data and FIN
            <--- FIN ------                   close
            ---- ACK of data and FIN -->

Default operation of close: it returns immediately
SO_LINGER

client                                        server

write       ---- data ---->                   data queued by TCP
close       ---- FIN ----->
            <--- ACK of data and FIN ---
close returns
                                              application reads queued
                                              data and FIN
            <--- FIN ------                   close
            ---- ACK of data and FIN -->

Close with SO_LINGER socket option set and l_linger a positive value
SO_LINGER

client                                        server

write       ---- data ---->                   data queued by TCP
shutdown    ---- FIN ----->
read blocks
            <--- ACK of data and FIN ---
                                              application reads queued
                                              data and FIN
            <--- FIN ------                   close
read returns 0
            ---- ACK of data and FIN -->

Using shutdown to know that peer has received our data
• Another way to know that the peer application has read the data
– use an application-level acknowledgment, or application ACK
– client
char ack;
Write(sockfd, data, nbytes); /* data from client to server */
n = Read(sockfd, &ack, 1); /* wait for application-level ACK */
– server
nbytes = Read(sockfd, buff, sizeof(buff)); /* data from client */
/* server verifies it received the correct amount of data from the client */
Write(sockfd, "", 1); /* server's ACK back to client */
SO_RCVBUF , SO_SNDBUF
• let us change the default send-buffer and receive-buffer sizes.
– Default TCP send and receive buffer size:
• 4096 bytes (older implementations)
• 8192-61440 bytes (newer systems)
– Default UDP buffer size: 9000 bytes (send), 40000 bytes (receive)
• For TCP, SO_RCVBUF must be set before the connection is established.
– For a client, it should be set before calling connect()
– For a server, it should be set on the listening socket before calling listen()
• TCP socket buffer sizes should be at least three times the MSS
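A sketch of requesting a larger receive buffer and then asking the kernel what it actually granted. The helper name set_rcvbuf is hypothetical; the reported size may differ from the request because implementations are free to round, scale, or cap it.

```c
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Requests a receive buffer of `bytes` bytes for sockfd.
 * Returns the size the kernel reports afterwards, or -1 on error.
 */
int set_rcvbuf(int sockfd, int bytes)
{
    int actual = 0;
    socklen_t len = sizeof(actual);

    if (setsockopt(sockfd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes)) < 0)
        return -1;
    if (getsockopt(sockfd, SOL_SOCKET, SO_RCVBUF, &actual, &len) < 0)
        return -1;
    return actual;
}
```

For a TCP server this would be called on the listening socket before listen, and for a TCP client before connect, as stated above.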
SO_RCVLOWAT , SO_SNDLOWAT

• Every socket has a receive low-water mark and send low-water mark.
(used by select function)
• Receive low-water mark:
– the amount of data that must be in the socket receive buffer for select to
return “readable”.
– Default receive low-water mark : 1 for TCP and UDP
• Send low-water mark:
– the amount of available space that must exist in the socket send buffer for
select to return “writable”
– Default send low-water mark : 2048 for TCP
– For UDP, the available space in the send buffer never changes, because UDP does not keep a copy of the
datagrams the application sends.
SO_RCVTIMEO, SO_SNDTIMEO

• allow us to place a timeout on socket receives and
sends.
• Disabled by default
SO_REUSEADDR, SO_REUSEPORT

• Allows a listening server to start and bind its well-known port even if
previously established connections exist that use this port as their local
port.
• Allows multiple instances of the same server to be started on the same
port, as long as each instance binds a different local IP address.
• Allows a single process to bind the same port to multiple sockets, as
long as each bind specifies a different local IP address.
• Allows completely duplicate bindings: multicasting
SO_TYPE
• Return the socket type.
• Returned value is such as SOCK_STREAM,
SOCK_DGRAM...
SO_USELOOPBACK
• This option applies only to sockets in the routing
domain(AF_ROUTE).
• The socket receives a copy of everything sent on the
socket.
IPv4 socket option
• Level => IPPROTO_IP
• IP_HDRINCL => If this option is set for a raw IP
socket, we must build our IP header for all the
datagrams that we send on the raw socket.
IPv4 socket option
• IP_OPTIONS=>allows us to set IP option in IPv4
header.(chapter 24)
• IP_RECVDSTADDR=>This socket option causes the
destination IP address of a received UDP datagram
to be returned as ancillary data by recvmsg.
(chapter20)
IP_RECVIF
• Cause the index of the interface on which a UDP
datagram is received to be returned as ancillary data
by recvmsg.(chapter20)
IP_TOS
• lets us set the type-of-service(TOS) field in IP header
for a TCP or UDP socket.
• If we call getsockopt for this option, the current value
that would be placed into the TOS(type of service)
field in the IP header is returned
IP_TTL
• We can set and fetch the default TTL(time to live
field).
ICMPv6 socket option
• This socket option is processed by ICMPv6 and has
a level of IPPROTO_ICMPV6.
• ICMP6_FILTER => lets us fetch and set an
icmp6_filter structure that specifies which of the
256 possible ICMPv6 message types are passed to
the process on a raw socket. (chapter 25)
IPv6 socket option
• These socket options are processed by IPv6 and have a
level of IPPROTO_IPV6.
• IPV6_ADDRFORM=>allow a socket to be converted
from IPv4 to IPv6 or vice versa.(chapter 10)
• IPV6_CHECKSUM=>specifies the byte offset into the
user data of where the checksum field is located.
IPV6_DSTOPTS
• Specifies that any received IPv6 destination options
are to be returned as ancillary data by recvmsg.
IPV6_HOPLIMIT
• Setting this option specifies that the received hop
limit field be returned as ancillary data by recvmsg.
(chapter 20)
• Default off.
IPV6_HOPOPTS
• Setting this option specifies that any received IPv6
hop-by-hop option are to be returned as ancillary
data by recvmsg.(chapter 24)
IPV6_NEXTHOP
• This is not a socket option but the type of an ancillary
data object that can be specified to sendmsg. This
object specifies the next-hop address for a datagram
as a socket address structure.(chapter20)
IPV6_PKTINFO
• Setting this option specifies that the following two
pieces of information about a received IPv6
datagram are to be returned as ancillary data by
recvmsg: the destination IPv6 address and the
arriving interface index. (chapter 20)
IPV6_PKTOPTIONS
• Most of the IPv6 socket options assume a UDP
socket with the information being passed between
the kernel and the application using ancillary data
with recvmsg and sendmsg.
• A TCP socket fetches and stores these values using the
IPV6_PKTOPTIONS socket option.
IPV6_RTHDR
• Setting this option specifies that a received IPv6
routing header is to be returned as ancillary data by
recvmsg.(chapter 24)
• Default off
IPV6_UNICAST_HOPS
• This is similar to the IPv4 IP_TTL.
• Specifies the default hop limit for outgoing datagram
sent on the socket, while fetching the socket option
returns the value for the hop limit that the kernel will
use for the socket.
TCP socket option
• There are five socket options for TCP, but three are
new with Posix.1g and not widely supported.
• Specify the level as IPPROTO_TCP.
TCP_KEEPALIVE
• This is new with Posix.1g.
• It specifies the idle time in seconds for the connection
before TCP starts sending keepalive probes.
• Default: 2 hours
• This option is effective only when the
SO_KEEPALIVE socket option is enabled.
TCP_MAXRT
• This is new with Posix.1g.
• It specifies the amount of time in seconds before a
connection is broken once TCP starts retransmitting
data.
– 0: use the default
– -1: retransmit forever
– positive value: rounded up to the next retransmission time
TCP_MAXSEG
• This allows us to fetch or set the maximum segment
size(MSS) for TCP connection.
TCP_NODELAY
• This option disables TCP’s Nagle algorithm
(the algorithm is enabled by default).
• Purpose of the Nagle algorithm:
prevent a connection from having multiple small
packets outstanding at any time.
• Small packet => any packet smaller than the MSS.
Nagle algorithm

• Enabled by default.
• Reduces the number of small packets on a WAN.
• If a given connection has outstanding data, then no
small packets will be sent on the connection until the
existing data is acknowledged.
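A minimal sketch of turning the Nagle algorithm off and on with the TCP_NODELAY option described above. The helper names set_tcp_nodelay and get_tcp_nodelay are hypothetical (they are not part of the sockets API); only setsockopt, getsockopt, IPPROTO_TCP, and TCP_NODELAY are standard.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper: on != 0 sets TCP_NODELAY (Nagle disabled),
 * on == 0 clears it (Nagle enabled, the default).
 * Returns 0 if OK, -1 on error with errno set by setsockopt. */
int set_tcp_nodelay(int sockfd, int on)
{
    int flag = on ? 1 : 0;

    return setsockopt(sockfd, IPPROTO_TCP, TCP_NODELAY, &flag, sizeof(flag));
}

/* Hypothetical helper: returns 1 if TCP_NODELAY is set, 0 if it is
 * clear, -1 on error. */
int get_tcp_nodelay(int sockfd)
{
    int flag = 0;
    socklen_t len = sizeof(flag);

    if (getsockopt(sockfd, IPPROTO_TCP, TCP_NODELAY, &flag, &len) < 0)
        return -1;
    return flag != 0;
}
```

An interactive client that sends many small writes and cannot tolerate the delay might call set_tcp_nodelay(sockfd, 1) once, right after creating the socket.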
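Nagle algorithm disabled
[Figure: timeline with marks every 250 from 0 to 2000. The characters h, e, l, l, o, ! are typed at 0, 250, 500, 750, 1000, and 1250, and each is sent immediately in its own packet.]
Nagle algorithm enabled
[Figure: same typing timeline, with marks from 0 to 2500. h is sent alone at 0; later characters wait while data is unacknowledged and go out combined as el, lo, and !.]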
fcntl function
• File control
• This function performs various descriptor control
operations.
• Provides the following features
– nonblocking I/O (chapter 15)
– signal-driven I/O (chapter 22)
– setting the socket owner to receive the SIGIO signal
(chapter 21, 22)
#include <fcntl.h>
int fcntl(int fd, int cmd, ... /* int arg */ );
Returns: depends on cmd if OK, -1 on error

O_NONBLOCK : nonblocking I/O
O_ASYNC : signal-driven I/O notification
Nonblocking I/O using fcntl
int flags;
/* set socket nonblocking */
if ((flags = fcntl(fd, F_GETFL, 0)) < 0)
    err_sys("F_GETFL error");
flags |= O_NONBLOCK;
if (fcntl(fd, F_SETFL, flags) < 0)
    err_sys("F_SETFL error");

Each descriptor has a set of file flags that are fetched with
the F_GETFL command and set with the F_SETFL command.
Misuse of fcntl
/* wrong way to set socket nonblocking */
if (fcntl(fd, F_SETFL, O_NONBLOCK) < 0)
    err_sys("F_SETFL error");

/* wrong because it also clears all the other file status flags */

Turn off the nonblocking flag

flags &= ~O_NONBLOCK;
if (fcntl(fd, F_SETFL, flags) < 0)
    err_sys("F_SETFL error");
F_SETOWN
• The integer arg value can be either a positive value (a
process ID) or a negative value (whose absolute value is a
process group ID); it names who is to receive the signal.
• F_GETOWN => fcntl returns the socket owner, either a
process ID or a process group ID.
Unix Domain Protocols
Chapter 14

Unix domain protocol


contents
• Introduction
• unix domain socket address structure
• socketpair
• socket function
• unix domain stream client-server
• unix domain datagram client-server
• passing descriptors
• receiving sender credentials
Unix Domain Protocol

• Perform client-server communication on a single host using the same API
that is used for clients and servers on different hosts.
• Faster than the Internet protocol suite
– UNIX domain sockets only copy data; they have no protocol processing to
perform, no network headers to add or remove, no checksums to calculate,
no sequence numbers to generate, and no acknowledgements to send.
• The Unix domain protocols are an alternative to the interprocess
communication (IPC) methods described
Unix Domain Protocol
• Two types of sockets are provided in the Unix
domain:
– stream sockets (similar to TCP)
– datagram sockets (similar to UDP).
• The UNIX domain datagram service is reliable, however.
Messages are neither lost nor delivered out of order
Unix Domain Protocol
• Unix domain sockets are used for three reasons:
– Unix domain sockets are often twice as fast as a TCP socket when
both peers are on the same host
– used when passing descriptors between processes on the same
host.
– Unix domain sockets provide the client's credentials (user ID and
group IDs) to the server, which can provide additional security
checking
Unix Domain Protocol
• End Point Address
– pathnames within the normal filesystem
– The pathname associated with a Unix domain socket should
be an absolute pathname
unix domain socket address structure

• <sys/un.h>
struct sockaddr_un{
uint8_t sun_len;
sa_family_t sun_family; /*AF_LOCAL*/
char sun_path[104]; /*null terminated pathname*/
};
• sun_path => must be null terminated
socketpair Function

• Creates two sockets that are then connected
together (only available for Unix domain sockets)

#include <sys/socket.h>
int socketpair(int family, int type, int protocol, int sockfd[2]);
                                        Returns: 0 if OK, -1 on error
• family must be AF_LOCAL
• protocol must be 0
socketpair Function

• Although the socketpair function creates sockets that


are connected to each other, the individual sockets
don't have names.
• This means that they can't be addressed by unrelated
processes.
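A short sketch of socketpair in use: data written to one end is read from the other, with no bind or connect needed. The function name socketpair_roundtrip is hypothetical, and AF_UNIX is used as the POSIX spelling of the AF_LOCAL constant shown in the slides.

```c
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical demo: create a connected pair of unnamed Unix domain
 * stream sockets, write len bytes into one end and read them back
 * from the other. Returns the number of bytes read, or -1 on error. */
ssize_t socketpair_roundtrip(const void *msg, size_t len,
                             void *out, size_t outlen)
{
    int sv[2];
    ssize_t n;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    if (write(sv[0], msg, len) != (ssize_t) len)
        n = -1;
    else
        n = read(sv[1], out, outlen);   /* the pair is full duplex */
    close(sv[0]);
    close(sv[1]);
    return n;
}
```

In real programs the two descriptors are usually split between a parent and a child after fork, as in the my_open example later in this chapter.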
unix domain stream client-server

#include "unp.h"
int main(int argc, char **argv)
{
int listenfd, connfd;
pid_t childpid;
socklen_t clilen;
struct sockaddr_un cliaddr, servaddr;
void sig_chld(int);

listenfd = Socket(AF_LOCAL, SOCK_STREAM, 0);

unlink(UNIXSTR_PATH);
bzero(&servaddr, sizeof(servaddr));
servaddr.sun_family = AF_LOCAL;
strcpy(servaddr.sun_path, UNIXSTR_PATH);

Bind(listenfd, (SA *) &servaddr, sizeof(servaddr));


Listen(listenfd, LISTENQ);
Signal(SIGCHLD, sig_chld);
unix domain stream client-server(2)

for ( ; ; ) {
clilen = sizeof(cliaddr);
if ( (connfd = accept(listenfd, (SA *) &cliaddr,
&clilen)) < 0) {
if (errno == EINTR)
continue; /* back to for() */
else
err_sys("accept error");
}
if ( (childpid = Fork()) == 0) { /* child process */
Close(listenfd); /* close listening socket */
str_echo(connfd); /* process the request */
exit(0);
}
Close(connfd); /* parent closes connected socket */
}
}
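The server above copies UNIXSTR_PATH into sun_path with strcpy, which is safe only when the pathname is known to fit. A hedged sketch of the same unlink, bind, listen sequence with an explicit length check follows; unix_listen is a hypothetical helper, and plain system calls replace the capitalized wrapper functions from unp.h.

```c
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical helper: create a Unix domain stream socket bound to
 * path and put it in the listening state.
 * Returns the listening descriptor, or -1 on error with errno set. */
int unix_listen(const char *path, int backlog)
{
    struct sockaddr_un addr;
    int fd, saved;

    if (strlen(path) >= sizeof(addr.sun_path)) {
        errno = ENAMETOOLONG;       /* would not fit with its null byte */
        return -1;
    }
    if ((fd = socket(AF_UNIX, SOCK_STREAM, 0)) < 0)
        return -1;

    unlink(path);                   /* remove a stale socket file, if any */
    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path);    /* safe: length checked above */

    if (bind(fd, (struct sockaddr *) &addr, sizeof(addr)) < 0 ||
        listen(fd, backlog) < 0) {
        saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}
```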
passing descriptors

• Current Unix systems provide a way to pass any open descriptor from one process to any other
process (using sendmsg).
• The ability to pass an open file descriptor between processes is powerful. It can lead to different
ways of designing client-server applications.
• It allows one process (typically a server) to do everything that is required to open a file (involving
such details as translating a network name to a network address, dialing a modem, negotiating locks
for the file, etc.) and simply pass back to the calling process a descriptor that can be used with all the
I/O functions.
• All the details involved in opening the file or device are hidden from the client.
passing descriptors(2)

1. Create a Unix domain socket (stream or datagram).
2. One process opens a descriptor by calling any of the Unix functions that
return a descriptor.
3. The sending process builds a msghdr structure containing the
descriptor to be passed and calls sendmsg to send it across the
Unix domain socket.
4. The receiving process calls recvmsg to receive the descriptor on the
Unix domain socket.
Passing a descriptor is not passing a descriptor number; it
involves creating a new descriptor in the receiving process that
refers to the same file table entry within the kernel as the
descriptor that was sent by the sending process.
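The four steps above can be sketched as a pair of functions. This is an illustration under stated assumptions, not the send_fd/recv_fd code shown later: send_one_fd and recv_one_fd are hypothetical names, one dummy data byte accompanies the descriptor, and the control buffer is sized with CMSG_SPACE (treated as a compile-time constant, which holds on Linux).

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Control buffer big enough for one descriptor; the union keeps it
 * aligned for a cmsghdr structure. */
union fd_control {
    struct cmsghdr hdr;
    char buf[CMSG_SPACE(sizeof(int))];
};

/* Hypothetical helper: pass fd_to_send across Unix domain socket sock.
 * Returns 0 if OK, -1 on error. */
int send_one_fd(int sock, int fd_to_send)
{
    char dummy = 0;
    struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
    union fd_control ctl;
    struct msghdr msg;
    struct cmsghdr *cm;

    memset(&msg, 0, sizeof(msg));
    memset(&ctl, 0, sizeof(ctl));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);

    cm = CMSG_FIRSTHDR(&msg);
    cm->cmsg_level = SOL_SOCKET;
    cm->cmsg_type = SCM_RIGHTS;              /* passing access rights */
    cm->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cm), &fd_to_send, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical helper: receive one descriptor from sock.
 * Returns the new descriptor, or -1 on error. */
int recv_one_fd(int sock)
{
    char dummy;
    struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
    union fd_control ctl;
    struct msghdr msg;
    struct cmsghdr *cm;
    int fd;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);

    if (recvmsg(sock, &msg, 0) <= 0)
        return -1;
    cm = CMSG_FIRSTHDR(&msg);
    if (cm == NULL || cm->cmsg_level != SOL_SOCKET ||
        cm->cmsg_type != SCM_RIGHTS ||
        cm->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;                           /* no descriptor arrived */
    memcpy(&fd, CMSG_DATA(cm), sizeof(int)); /* new descriptor number */
    return fd;
}
```

The number returned by recv_one_fd generally differs from the sender's number; what the two share is the underlying file table entry.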
Passing Descriptor
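Descriptor passing example
[Figure: mycat after creating a stream pipe using socketpair; both ends, [0] and [1], are in mycat.]
[Figure: mycat program after invoking the openfile program; mycat forks and the child execs openfile (with command-line args). mycat keeps end [0], openfile keeps end [1], and the descriptor is passed back over the stream pipe.]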
recvmsg and sendmsg
#include <sys/socket.h>

ssize_t recvmsg (int sockfd, struct msghdr *msg, int flags);

ssize_t sendmsg (int sockfd, struct msghdr *msg, int flags);

struct msghdr {
void *msg_name; /* protocol address */
socklen_t msg_namelen; /* size of protocol address */
struct iovec *msg_iov; /* scatter/gather array */
size_t msg_iovlen; /* # elements in msg_iov */
void *msg_control; /* ancillary data; must be aligned
for a cmsghdr structure */
socklen_t msg_controllen; /* length of ancillary data */
int msg_flags; /* flags returned by recvmsg() */
};
recvmsg and sendmsg

[Figure 13.8: Data structures when recvmsg is called. msghdr{} holds msg_name, msg_namelen = 16, msg_iov, msg_iovlen = 3, msg_control, msg_controllen = 20, msg_flags = 0; msg_iov points to an array of three iovec{} structures whose iov_len values are 100, 60, and 80.]
recvmsg and sendmsg
[Figure 13.9: Update of Figure 13.8 when recvmsg r... The msghdr{} and iovec{} values are those of Figure 13.8; msg_name points to a sockaddr_in{} (16, AF_INET, 2000, 198.69.10.2) and msg_control points to a cmsghdr{} with cmsg_len = 16, cmsg_level = IPPROTO_IP, cmsg_type = IP_RECVDSTADDR, and data 206.62.226.35.]
Ancillary Data
• Ancillary data can be sent and received using the msg_control and
msg_controllen members of the msghdr structure with sendmsg and recvmsg
functions.

Ancillary Data
[Figure 13.12: Ancillary data containing two ancillar... msg_control points to two ancillary data objects laid out back to back over msg_controllen bytes. Each object is a cmsghdr{} (cmsg_len, cmsg_level, cmsg_type), pad, data, and trailing pad. cmsg_len and CMSG_LEN() cover the header through the data; CMSG_SPACE() also covers the trailing pad.]
Ancillary Data

[Figure 13.13: cmsghdr structure when used with... Two cmsghdr{} examples: one with cmsg_level = SOL_SOCKET, cmsg_type = SCM_RIGHTS, and a descriptor as data; one with cmsg_level = SOL_SOCKET, cmsg_type = SCM_CREDS, and an fcred{} as data.]
Control Message Header

struct cmsghdr {
socklen_t cmsg_len; /* data byte count, including header */
int cmsg_level; /* originating protocol */
int cmsg_type; /* protocol-specific type */
/* followed by the actual control message data */
};
Control Message Header

• To send a file descriptor,


– set cmsg_len to the size of the cmsghdr structure, plus the size
of an integer (the descriptor).
– The cmsg_level field is set to SOL_SOCKET, and cmsg_type is
set to SCM_RIGHTS, to indicate that we are passing access
rights. (SCM stands for socket-level control message.)
– Access rights can be passed only across a UNIX domain socket.
The descriptor is stored right after the cmsg_type field, using the
macro CMSG_DATA to obtain the pointer to this integer.
Control Message Header
#include <stdlib.h> /* malloc */
#include <sys/socket.h>

/* size of control buffer to send/recv one file descriptor */
#define CONTROLLEN CMSG_LEN(sizeof(int))

static struct cmsghdr *cmptr = NULL; /* malloc'ed first time */

/*
 * Pass a file descriptor to another process.
 * If fd<0, then -fd is sent back instead as the error status.
 */
int
send_fd(int fd, int fd_to_send)
{
    struct iovec iov[1];
    struct msghdr msg;
    char buf[2]; /* send_fd()/recv_fd() 2-byte protocol */

    iov[0].iov_base = buf;
    iov[0].iov_len = 2;
    msg.msg_iov = iov;
    msg.msg_iovlen = 1;
    msg.msg_name = NULL;
    msg.msg_namelen = 0;
    if (fd_to_send < 0) {
        msg.msg_control = NULL;
        msg.msg_controllen = 0;
        buf[1] = -fd_to_send; /* nonzero status means error */
        if (buf[1] == 0)
            buf[1] = 1;
    } else {
        if (cmptr == NULL && (cmptr = malloc(CONTROLLEN)) == NULL)
            return(-1);
        cmptr->cmsg_level = SOL_SOCKET;
        cmptr->cmsg_type = SCM_RIGHTS;
        cmptr->cmsg_len = CONTROLLEN;
        msg.msg_control = cmptr;
        msg.msg_controllen = CONTROLLEN;
        *(int *)CMSG_DATA(cmptr) = fd_to_send; /* the fd to pass */
        buf[1] = 0; /* zero status means OK */
    }
    buf[0] = 0; /* null byte flag to recv_fd() */
    if (sendmsg(fd, &msg, 0) != 2)
        return(-1);
    return(0);
}
Control Message Header
#include "apue.h"
#include <sys/socket.h> /* struct msghdr */

/* size of control buffer to send/recv one file descriptor */
#define CONTROLLEN CMSG_LEN(sizeof(int))

static struct cmsghdr *cmptr = NULL; /* malloc'ed first time */

/*
 * Receive a file descriptor from a server process. Also, any data
 * received is passed to (*userfunc)(STDERR_FILENO, buf, nbytes).
 * We have a 2-byte protocol for receiving the fd from send_fd().
 */
int
recv_fd(int fd, ssize_t (*userfunc)(int, const void *, size_t))
{
    int newfd, nr, status;
    char *ptr;
    char buf[MAXLINE];
    struct iovec iov[1];
    struct msghdr msg;

    status = -1;
    for ( ; ; ) {
        iov[0].iov_base = buf;
        iov[0].iov_len = sizeof(buf);
        msg.msg_iov = iov;
        msg.msg_iovlen = 1;
        msg.msg_name = NULL;
        msg.msg_namelen = 0;
        if (cmptr == NULL && (cmptr = malloc(CONTROLLEN)) == NULL)
            return(-1);
        msg.msg_control = cmptr;
        msg.msg_controllen = CONTROLLEN;
        if ((nr = recvmsg(fd, &msg, 0)) < 0) {
            err_sys("recvmsg error");
        } else if (nr == 0) {
            err_ret("connection closed by server");
            return(-1);
        }
        for (ptr = buf; ptr < &buf[nr]; ) {
            if (*ptr++ == 0) {
                if (ptr != &buf[nr-1])
                    err_dump("message format error");
                status = *ptr & 0xFF; /* prevent sign extension */
                if (status == 0) {
                    if (msg.msg_controllen != CONTROLLEN)
                        err_dump("status = 0 but no fd");
                    newfd = *(int *)CMSG_DATA(cmptr);
                } else {
                    newfd = -status;
                }
                nr -= 2;
            }
        }
        if (nr > 0 && (*userfunc)(STDERR_FILENO, buf, nr) != nr)
            return(-1);
        if (status >= 0) /* final data has arrived */
            return(newfd); /* descriptor, or -status */
    }
}
Ancillary Data
#include "unp.h"
int my_open(const char *, int);
int   main(int argc, char **argv)
{
int fd, n;
char buff[BUFFSIZE];

if (argc != 2)
err_quit("usage: mycat <pathname>");

if ( (fd = my_open(argv[1], O_RDONLY)) < 0)
err_sys("cannot open %s", argv[1]);

while ( (n = Read(fd, buff, BUFFSIZE)) > 0)
Write(STDOUT_FILENO, buff, n);

exit(0);
}
(mycat program shown in Figure 14.7)
#include "unp.h"

int
my_open(const char *pathname, int mode)
{
int fd, sockfd[2], status;
pid_t childpid;
char c, argsockfd[10], argmode[10];

Socketpair(AF_LOCAL, SOCK_STREAM, 0, sockfd);

if ( (childpid = Fork()) == 0) { /* child process */
Close(sockfd[0]);
snprintf(argsockfd, sizeof(argsockfd), "%d", sockfd[1]);
snprintf(argmode, sizeof(argmode), "%d", mode);
execl("./openfile", "openfile", argsockfd, pathname, argmode,
  (char *) NULL);
err_sys("execl error");
}

myopen function(1) : open a file and return a descriptor
/* parent process - wait for the child to terminate */
Close(sockfd[1]); /* close the end we don't use */

Waitpid(childpid, &status, 0);
if (WIFEXITED(status) == 0)
err_quit("child did not terminate");
if ( (status = WEXITSTATUS(status)) == 0)
Read_fd(sockfd[0], &c, 1, &fd);
else {
errno = status; /* set errno value from child's status */
fd = -1;
}

Close(sockfd[0]);
return(fd);
}

myopen function(2) : open a file and return a descriptor
receiving sender credentials

• User credentials via fcred structure

struct fcred {
uid_t fc_ruid; /* real user ID */
gid_t fc_rgid; /* real group ID */
char fc_login[MAXLOGNAME]; /* setlogin() name */
uid_t fc_uid; /* effective user ID */
short fc_ngroups; /* number of groups */
gid_t fc_groups[NGROUPS]; /* supplementary group IDs */
};
#define fc_gid fc_groups[0] /* effective group ID */
receiving sender credentials(2)

• Usually MAXLOGNAME is 16
• NGROUPS is 16
• fc_ngroups is at least 1

• The credentials are sent as ancillary data when data is sent on a Unix domain socket (only if
the receiver of the data has enabled the LOCAL_CREDS socket option).
• On a datagram socket, the credentials accompany every datagram.
• Credentials cannot be sent along with a descriptor.
• Users are not able to forge credentials.
Advanced I/O Functions
Outline
• Socket Timeouts
• recv and send Functions
• readv and writev Functions
• recvmsg and sendmsg Function
• Ancillary Data
• How much Data is Queued?
• Sockets and Standard I/O
Socket Timeouts
• Three ways to place a timeout on an I/O operation involving a
socket
– Call alarm, which generates the SIGALRM signal when the
specified time has expired.
– Block waiting for I/O in select, which has a time limit built in, instead
of blocking in a call to read or write.
– Use the newer SO_RCVTIMEO and SO_SNDTIMEO socket
options.
Connect with a Timeout Using SIGALRM

static void connect_alarm(int);


int connect_timeo(int sockfd, const SA *saptr, socklen_t salen, int nsec)
{
Sigfunc *sigfunc;
int n;
sigfunc = Signal(SIGALRM, connect_alarm);
if (alarm(nsec) != 0)
err_msg("connect_timeo: alarm was already set");
if ( (n = connect(sockfd, (struct sockaddr *) saptr, salen)) < 0) {
close(sockfd);
if (errno == EINTR)
errno = ETIMEDOUT;
}
alarm(0); /* turn off the alarm */
return(n);
}
static void
connect_alarm(int signo)
{
return; /* just interrupt the connect() */
}
recvfrom with a Timeout Using SIGALRM

static void sig_alrm(int);


void dg_cli(FILE *fp, int sockfd, const SA *pservaddr, socklen_t servlen)
{
int n;
char sendline[MAXLINE], recvline[MAXLINE + 1];
Signal(SIGALRM, sig_alrm);
while (Fgets(sendline, MAXLINE, fp) != NULL) {
Sendto(sockfd, sendline, strlen(sendline), 0, pservaddr, servlen);
alarm(5);
if ( (n = recvfrom(sockfd, recvline, MAXLINE, 0, NULL, NULL)) < 0) {
if (errno == EINTR)
fprintf(stderr, "socket timeout\n");
else
err_sys("recvfrom error");
} else {
alarm(0);
recvline[n] = 0; /* null terminate */
Fputs(recvline, stdout);
}
}
}
static void sig_alrm(int signo)
{
return; /* just interrupt the recvfrom() */
}
recvfrom with a Timeout Using select

int
readable_timeo(int fd, int sec)
{
fd_set rset;
struct timeval tv;

FD_ZERO(&rset);
FD_SET(fd, &rset);

tv.tv_sec = sec;
tv.tv_usec = 0;

return(select(fd+1, &rset, NULL, NULL, &tv));


/* > 0 if descriptor is readable */
}
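One way the select-based check above might be combined with the read itself is sketched here. read_timeo is a hypothetical helper (not from the slides), and reporting the timeout through errno = ETIMEDOUT is a design choice of this sketch.

```c
#include <errno.h>
#include <sys/select.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: wait up to sec seconds for fd to become
 * readable, then read from it. Returns the byte count from read,
 * or -1 with errno = ETIMEDOUT if nothing arrived in time. */
ssize_t read_timeo(int fd, void *buf, size_t nbytes, int sec)
{
    fd_set rset;
    struct timeval tv;
    int ready;

    FD_ZERO(&rset);
    FD_SET(fd, &rset);
    tv.tv_sec = sec;
    tv.tv_usec = 0;

    ready = select(fd + 1, &rset, NULL, NULL, &tv);
    if (ready < 0)
        return -1;              /* select error, errno already set */
    if (ready == 0) {
        errno = ETIMEDOUT;      /* time limit expired, nothing to read */
        return -1;
    }
    return read(fd, buf, nbytes);
}
```

Unlike the SIGALRM approach, this does not touch the process's signal handlers or any alarm that is already pending.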
Timeout Using the SO_RCVTIMEO and SO_SNDTIMEO Socket Options

• We set the option once for a descriptor, specifying the timeout
value, and this timeout then applies to all read operations
(SO_RCVTIMEO) or write operations (SO_SNDTIMEO) on that descriptor.
• Setting the option only once compares well with the previous two
methods, which required doing something before every
operation on which we wanted to place a time limit.
• Neither socket option can be used to set a timeout for a connect.
recvfrom with a Timeout Using the SO_RCVTIMEO Socket Option

int n;
char sendline[MAXLINE], recvline[MAXLINE + 1];
struct timeval tv;
tv.tv_sec = 5;
tv.tv_usec = 0;
Setsockopt(sockfd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));
while (Fgets(sendline, MAXLINE, fp) != NULL) {
Sendto(sockfd, sendline, strlen(sendline), 0, pservaddr, servlen);
n = recvfrom(sockfd, recvline, MAXLINE, 0, NULL, NULL);
if (n < 0) {
if (errno == EWOULDBLOCK) {
fprintf(stderr, "socket timeout\n");
continue;
} else
err_sys("recvfrom error");
}
recvline[n] = 0; /* null terminate */
Fputs(recvline, stdout);
}
recv and send Functions

#include <sys/socket.h>

ssize_t recv (int sockfd, void *buff, size_t nbytes, int flags);

ssize_t send (int sockfd, const void *buff, size_t nbytes, int flags);

Flag

M S G _D O N T
readv and writev Functions
#include <sys/uio.h>

ssize_t readv (int filedes, const struct iovec *iov, int iovcnt);

ssize_t writev (int filedes, const struct iovec *iov, int iovcnt);

struct iovec {
void *iov_base; /* starting address of buffer */
size_t iov_len; /* size of buffer */
};

– readv and writev let us read into or write from one or more
buffers with a single function call.
• These operations are called scatter read and gather write.
readv and writev Functions

– The readv and writev functions can be used with any descriptor, not just sockets.
– writev is an atomic operation. For a record-based protocol such as UDP, one call to
writev generates a single UDP datagram.
– One use of writev is related to the TCP_NODELAY socket option.
• A write of 4 bytes followed by a write of 396 bytes could invoke the Nagle algorithm; a
preferred solution is to call writev for the two buffers.
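A sketch of the gather write suggested above: a small header and a payload held in separate buffers go out in one writev call instead of two writes. write_record is a hypothetical helper, and the 4-byte header is sent in host byte order purely for illustration.

```c
#include <stdint.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical helper: write a 4-byte header followed by len bytes
 * of payload with a single gather write.
 * Returns the total number of bytes written, or -1 on error. */
ssize_t write_record(int fd, uint32_t header, const void *payload, size_t len)
{
    struct iovec iov[2];

    iov[0].iov_base = &header;              /* first buffer: header */
    iov[0].iov_len = sizeof(header);
    iov[1].iov_base = (void *) payload;     /* writev only reads from it */
    iov[1].iov_len = len;

    return writev(fd, iov, 2);
}
```

On a TCP socket this hands both pieces to the kernel together, so the small header is not sent as a separate small packet that the Nagle algorithm would then hold the payload behind.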
Nagle’s Algorithm

if there is new data to send
    if the window size >= MSS and available data is >= MSS
        send complete MSS segment now
    else
        if there is unconfirmed data still in the pipe
            enqueue data in the buffer until an acknowledge is received
        else
            send data immediately
        end if
    end if
end if
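The decision in the pseudocode above can be written as a pure function. This is only a model of the rule for illustration, not code from any TCP implementation; the enum and the name nagle_decide are hypothetical, and the function assumes it is called only when there is new data to send.

```c
#include <stddef.h>

enum nagle_action {
    NAGLE_SEND_FULL_SEGMENT,   /* send a complete MSS-sized segment now */
    NAGLE_SEND_NOW,            /* nothing outstanding: send the small data */
    NAGLE_QUEUE                /* hold the data until an ACK arrives */
};

/* Hypothetical model of the Nagle decision.
 * window:    current send window in bytes
 * available: bytes of new data waiting to be sent
 * mss:       maximum segment size
 * unacked:   bytes sent but not yet acknowledged */
enum nagle_action nagle_decide(size_t window, size_t available,
                               size_t mss, size_t unacked)
{
    if (window >= mss && available >= mss)
        return NAGLE_SEND_FULL_SEGMENT;
    if (unacked > 0)
        return NAGLE_QUEUE;
    return NAGLE_SEND_NOW;
}
```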
recvmsg and sendmsg
#include <sys/socket.h>

ssize_t recvmsg (int sockfd, struct msghdr *msg, int flags);

ssize_t sendmsg (int sockfd, struct msghdr *msg, int flags);

struct msghdr {
void *msg_name; /* protocol address */
socklen_t msg_namelen; /* size of protocol address */
struct iovec *msg_iov; /* scatter/gather array */
size_t msg_iovlen; /* # elements in msg_iov */
void *msg_control; /* ancillary data; must be aligned
for a cmsghdr structure */
socklen_t msg_controllen; /* length of ancillary data */
int msg_flags; /* flags returned by recvmsg() */
};
recvmsg and sendmsg

Flag
recvmsg and sendmsg

[Figure 13.8: Data structures when recvmsg is called. msghdr{} holds msg_name, msg_namelen = 16, msg_iov, msg_iovlen = 3, msg_control, msg_controllen = 20, msg_flags = 0; msg_iov points to an array of three iovec{} structures whose iov_len values are 100, 60, and 80.]
recvmsg and sendmsg
[Figure 13.9: Update of Figure 13.8 when recvmsg r... The msghdr{} and iovec{} values are those of Figure 13.8; msg_name points to a sockaddr_in{} (16, AF_INET, 2000, 198.69.10.2) and msg_control points to a cmsghdr{} with cmsg_len = 16, cmsg_level = IPPROTO_IP, cmsg_type = IP_RECVDSTADDR, and data 206.62.226.35.]
Ancillary Data
• Ancillary data can be sent and received using the msg_control and
msg_controllen members of the msghdr structure with sendmsg and recvmsg
functions.

Ancillary Data
[Figure 13.12: Ancillary data containing two ancillar... msg_control points to two ancillary data objects laid out back to back over msg_controllen bytes. Each object is a cmsghdr{} (cmsg_len, cmsg_level, cmsg_type), pad, data, and trailing pad. cmsg_len and CMSG_LEN() cover the header through the data; CMSG_SPACE() also covers the trailing pad.]
Ancillary Data

[Figure 13.13: cmsghdr structure when used with... Two cmsghdr{} examples: one with cmsg_level = SOL_SOCKET, cmsg_type = SCM_RIGHTS, and a descriptor as data; one with cmsg_level = SOL_SOCKET, cmsg_type = SCM_CREDS, and an fcred{} as data.]
How Much Data Is Queued?
• nonblocking I/O
• MSG_PEEK with MSG_DONTWAIT flag
• FIONREAD command of ioctl
Sockets and Standard I/O
• Standard I/O streams can be used with sockets, but there are a few
items to consider.

– A standard I/O stream can be created from any descriptor by calling the
fdopen function. Similarly, given a standard I/O stream, we can obtain the
corresponding descriptor by calling fileno.
– A stream opened for both reading and writing needs an fflush, fseek,
fsetpos, or rewind call between an output and an input operation. The problem
with fseek, fsetpos, and rewind is that they all call lseek, which fails on a
socket.
– The easiest way to handle this read-write problem is to open two standard
I/O streams for a given socket: one for reading, and one for writing.
Standard I/O buffers
• Fully buffered: I/O takes place only when the buffer is
full (8192 bytes), or on fflush() or exit()
• Line buffered: I/O takes place when a newline is
encountered, or on fflush() or exit()
• Unbuffered: I/O takes place each time a standard I/O
output function is called.
Standard I/O buffers
• Standard error is always unbuffered.
• Standard input and standard output are fully buffered,
unless they refer to a terminal device, in which case
they are line buffered.
• All other streams are fully buffered unless they refer
to a terminal device, in which case they are line
buffered.
Sockets and Standard I/O
#include "unp.h"

void
str_echo(int sockfd)
{
char line[MAXLINE];
FILE *fpin, *fpout;

fpin = Fdopen(sockfd, "r");


fpout = Fdopen(sockfd, "w");

for ( ; ; ) {
if (Fgets(line, MAXLINE, fpin) == NULL)
return; /* connection closed by other end */

Fputs(line, fpout);
}
}
Chapter 12.
Daemon Processes
and inetd Superserver
12.1 Introduction
• A daemon is a process that runs in the background and is
independent of control from all terminals.
• There are numerous ways to start a daemon:
1. the system initialization scripts ( /etc/rc )
2. the inetd superserver
3. the cron daemon
4. the at command
5. from user terminals

• Since a daemon does not have a controlling terminal, it
needs some way to output messages when something
happens, either normal informational messages or
emergency messages that need to be handled by an
administrator.
12.2 syslogd daemon
• The Berkeley-derived implementation of syslogd performs the following
actions upon startup.
1. The configuration file is read, specifying what to do with each type
of log message that the daemon can receive.
2. A Unix domain socket is created and bound to the pathname
/var/run/log ( /dev/log on some systems).
3. A UDP socket is created and bound to port 514.
4. The pathname /dev/klog is opened. Any error messages from
within the kernel appear as input on this device.

• We could send log messages to the syslogd daemon from our
daemons by creating a Unix domain datagram socket and sending our
messages to the pathname that the daemon has bound, but an easier
interface is the syslog function.
syslogd

[Figure: syslogd and its connections: Unix domain socket /dev/log, UDP socket port 514, /dev/klog, filesystem (/var/log/messages), console, and remote syslogd.]
12. 3 syslog function
#include <syslog.h>
void syslog(int priority, const char *message, . . . );

– The priority argument is a combination of a level and a
facility.
– The message is like a format string to printf, with the
addition of a %m specification, which is replaced with the
error message corresponding to the current value of errno.

Ex) syslog(LOG_INFO|LOG_LOCAL2, "rename(%s, %s): %m", file1, file2);
12. 3 syslog function

• Log messages have a level between 0 and 7.
level         value   description
LOG_EMERG     0       system is unusable (highest priority)
LOG_ALERT     1       action must be taken immediately
LOG_CRIT      2       critical conditions
LOG_ERR       3       error conditions
LOG_WARNING   4       warning conditions
LOG_NOTICE    5       normal but significant condition (default)
LOG_INFO      6       informational
LOG_DEBUG     7       debug-level message (lowest priority)
Figure 12.1 Levels of log messages.
12. 3 syslog function
• A facility identifies the type of process sending the
message.
facility       description
LOG_AUTH       security / authorization messages
LOG_AUTHPRIV   security / authorization messages (private)
LOG_CRON       cron daemon
LOG_DAEMON     system daemons
LOG_FTP        FTP daemon
LOG_KERN       kernel messages
LOG_LOCAL0     local use
LOG_LOCAL1     local use
LOG_LOCAL2     local use
LOG_LOCAL3     local use
LOG_LOCAL4     local use
LOG_LOCAL5     local use
LOG_LOCAL6     local use
LOG_LOCAL7     local use
LOG_LPR        line printer system
LOG_MAIL       mail system
LOG_NEWS       network news system
LOG_SYSLOG     messages generated internally by syslogd
LOG_USER       random user-level messages (default)
LOG_UUCP       UUCP system
Figure 12.2 Facilities of log messages.
12. 3 syslog function
• openlog and closelog
– openlog can be called before the first call to syslog, and
closelog can be called when the application has finished
sending log messages.

options
LOG_CONS
#include <syslog.h>
void openlog(const char *ident, int options, int facility);
void closelog(void);
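A small sketch that ties openlog and syslog together, following the rename example given earlier. The names log_init and rename_logged, and the choice of LOG_LOCAL2, are assumptions made for illustration.

```c
#include <errno.h>
#include <stdio.h>
#include <syslog.h>

/* Hypothetical helper: call once at startup. ident is prepended to
 * every message and LOG_PID adds the process ID. */
void log_init(const char *ident)
{
    openlog(ident, LOG_PID, LOG_LOCAL2);
}

/* Hypothetical helper: rename a file and log the outcome.
 * Returns 0 if OK, -1 on error with errno from rename preserved. */
int rename_logged(const char *from, const char *to)
{
    if (rename(from, to) < 0) {
        int saved = errno;
        /* %m expands to the error string for the current errno */
        syslog(LOG_ERR | LOG_LOCAL2, "rename(%s, %s): %m", from, to);
        errno = saved;
        return -1;
    }
    syslog(LOG_INFO | LOG_LOCAL2, "renamed %s to %s", from, to);
    return 0;
}
```

Where the messages end up depends on the syslogd configuration file, which maps each facility and level to a destination.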
Unix Login
Unix Login
Process Group
• process group is a collection of one or more
processes, usually associated with the same job
• int setpgid(pid_t pid, pid_t pgid);
• pid_t getpgid(pid_t pid);
• It is possible for a process group leader to create a
process group, create processes in the group, and
then terminate. The process group still exists, as long
as at least one process is in the group, regardless of
whether the group leader terminates

Process Groups in a Session

• The processes in a process group are usually placed
there by a shell pipeline
– proc1 | proc2 &
– proc3 | proc4 | proc5
Creating Session

• A process establishes a new session by calling the setsid
function.
• If the calling process is not a process group leader, this
function creates a new session. Three things happen.
– The process becomes the session leader of this new session. (A
session leader is the process that creates a session.) The process is
the only process in this new session.
– The process becomes the process group leader of a new process
group. The new process group ID is the process ID of the calling
process.
– The process has no controlling terminal. If the process had a
controlling terminal before calling setsid, that association is broken.
setsid

• pid_t setsid(void);
• This function returns an error if the caller is already a
process group leader.
• To ensure this is not the case, the usual practice is to
call fork and have the parent terminate and the child
continue. We are guaranteed that the child is not a
process group leader, because the process group ID
of the parent is inherited by the child, but the child
gets a new process ID. Hence, it is impossible for the
child's process ID to equal its inherited process group
ID
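The fork-then-setsid practice described above can be checked with a short sketch. setsid_in_child is a hypothetical function written for this illustration: the child calls setsid and reports through its exit status whether it became both session leader and process group leader.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: returns 0 if a forked child successfully became
 * the leader of a new session and of a new process group, 1 if it
 * did not, and -1 if fork or waitpid failed. */
int setsid_in_child(void)
{
    pid_t pid;
    int status;

    if ((pid = fork()) < 0)
        return -1;
    if (pid == 0) {
        /* child: has a new PID but the parent's process group ID,
         * so it cannot be a process group leader and setsid works */
        pid_t sid = setsid();
        int ok = (sid == getpid()) &&
                 (getsid(0) == getpid()) &&   /* session leader */
                 (getpgrp() == getpid());     /* process group leader */
        _exit(ok ? 0 : 1);
    }
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : 1;
}
```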
Controlling Terminal
12.4 daemon_init Function
#include <syslog.h>
#define MAXFD 64
extern int daemon_proc; /* defined in error.c */
void daemon_init(const char *pname, int facility)
{
int i;
pid_t pid;

if ( (pid = Fork()) != 0)
exit(0); /* parent terminates */
/* 1st child continues */
setsid(); /* become session leader */
Signal(SIGHUP, SIG_IGN);
if ( (pid = Fork()) != 0) exit(0); /* 1st child terminates */

/* 2nd child continues */


daemon_proc = 1; /* for our err_XXX() functions */
chdir("/"); /* change working directory */
umask(0); /* clear our file mode creation mask */

for (i = 0; i < MAXFD; i++)


close(i);

openlog(pname, LOG_PID, facility);


}
Daemon_init

1. We first call fork; the parent then terminates and
the child continues. If the process was started as a
shell command in the foreground, when the parent
terminates, the shell thinks the command is done.
This automatically runs the child process in the
background. Also, the child inherits the process
group ID from the parent but gets its own process
ID. This guarantees that the child is not a process
group leader, which is required for the next call to
setsid.
2. We call setsid. The process becomes the session leader of the new
session, becomes the process group leader of a
new process group, and has no controlling terminal.

• We ignore SIGHUP and call fork again. When this function


returns, the parent is really the first child and it terminates,
leaving the second child running. The purpose of this second
fork is to guarantee that the daemon cannot automatically
acquire a controlling terminal should it open a terminal device
in the future. When a session leader without a controlling
terminal opens a terminal device (that is not currently some
other session's controlling terminal), the terminal becomes the
controlling terminal of the session leader. But by calling fork a
second time, we guarantee that the second child is no longer a
session leader, so it cannot acquire a controlling terminal. We
must ignore SIGHUP because when the session leader
terminates (the first child), all processes in the session (our
second child) receive the SIGHUP signal.
12.5 inetd Daemon

• Problems on a typical Unix system where each service runs as its own daemon


1. All these daemons contained nearly identical startup code.
2. Each daemon took a slot in the process table, but each daemon
was asleep most of the time.

• inetd daemon fixes the two problems.


1. It simplifies writing daemon processes, since most of the startup
details are handled by inetd.
2. It allows a single process (inetd) to wait for incoming client
requests for multiple services, instead of one process for each
service.
12.5 inetd daemon

• Figure 12.7: steps performed by inetd for each service
1. socket()
2. bind()
3. listen() (if TCP socket)
4. select() for readability
5. accept() (if TCP socket)
6. fork()
inetd service specification

• For each service, inetd needs to know:


– the socket type and transport protocol
– wait/nowait flag.
– login name the process should run as.
– pathname of real server program.
– command line arguments to server program.
• Servers that are expected to deal with frequent requests are
typically not run from inetd
– mail, web, NFS.
Example /etc/inetd.conf
# Syntax for socket-based Internet services:
# <service_name> <socket_type> <proto> <flags> <user> <server_pathname> <args>
#
# comments start with #
echo stream tcp nowait root internal
echo dgram udp wait root internal
chargen stream tcp nowait root internal
chargen dgram udp wait root internal
ftp stream tcp nowait root /usr/sbin/ftpd ftpd -l
telnet stream tcp nowait root /usr/sbin/telnetd telnetd
finger stream tcp nowait root /usr/sbin/fingerd fingerd
# Authentication
auth stream tcp nowait nobody /usr/sbin/in.identd in.identd -l -e -o
# TFTP
tftp dgram udp wait root /usr/sbin/tftpd tftpd -s /tftpboot
wait/nowait

• WAIT specifies that inetd should not look for new clients for
the service until the child (the real server) has terminated.
• TCP servers usually specify nowait - this means inetd can
start multiple copies of the TCP server program - providing
concurrency
• Most UDP services run with inetd told to wait until the child
server has died.
Broadcasting

• Many networks support the notion of sending a


message from one host to all other hosts on the
network.
• A special address called the “broadcast address” is
often used.
• Some popular network services are based on
broadcasting (YP/NIS, rup, rusers)

Broadcasting 578
Broadcasting
• TCP works only with unicast addresses, UDP supports also broadcasting
and multicasting

Type        IPv4   IPv6   TCP   UDP
Unicast     yes    yes    yes   yes
Broadcast   yes    no     no    yes
Multicast   opt.   yes    no    yes

• Multicasting support is optional in IPv4, but mandatory in IPv6


• Broadcasting support is not provided in IPv6; an IPv4 application that uses
broadcasting must be recoded to use multicasting when it is ported
to IPv6

Broadcasting

Types of Casting:
Unicast: One to One
Anycast: a set to one in a set
Multicast: a set to all in a set
Broadcast: all to all

Useful over LAN only, and with UDP

Uses of Broadcasting
• Mainly used for resource discovery purposes (server is known to exist in the local
subnet, but IP address is not known)

– ARP (Address Resolution Protocol)


• Broadcast to find MAC address for known IP address – The owner of the
IP address is to reply
– BOOTP (Bootstrap Protocol)
• For a diskless workstation to discover its own IP address, the IP address
of a BOOTP server on the network, and a file to be loaded into memory to
boot the machine

– NTP (Network Time Protocol)


• To synchronize time and coordinate time distribution in a large network
– Routing daemons: broadcast their routing tables on the LAN

Broadcast Address Types
• IPv4 address: {netid; subnetid; hostid}
– Subnet-directed Broadcast Address:
• {netid; subnetid; -1} //-1 means all bits are 1’s
• netid = 128.7, subnetid: 6
Broadcast Address: 128.7.6.255
• Normally, routers do not forward these broadcasts

– All-subnets-directed Broadcast Address:


• {netid; -1; -1}
• All subnets on the specified network – very rarely used

– Network-directed Broadcast Address:


• {netid; -1}
• If a network has no subnetting – almost non-existent

Broadcast Address Types

– Limited Broadcast Address:


• {-1; -1; -1} or 255.255.255.255
• Must never be forwarded by a router

• Subnet-directed broadcast and limited broadcast are the most common


• Old systems do not understand subnet-directed broadcast
• For protocols like BOOTP, 255.255.255.255 is the only option
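As a worked example of the address forms above, a subnet-directed broadcast address is the host's address with every hostid bit set to 1. The sketch below assumes a 255.255.255.0 netmask for subnet 128.7.6 (the slides give only netid 128.7 and subnetid 6); the function names are hypothetical.

```c
#include <stdint.h>

/* Build a host-byte-order IPv4 address from four dotted-decimal octets. */
uint32_t ipv4(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
    return ((uint32_t)a << 24) | ((uint32_t)b << 16) | ((uint32_t)c << 8) | d;
}

/* Subnet-directed broadcast: keep netid and subnetid, set all hostid bits. */
uint32_t subnet_broadcast(uint32_t addr, uint32_t netmask)
{
    return addr | ~netmask;
}

/* Limited broadcast address 255.255.255.255: every bit is 1. */
uint32_t limited_broadcast(void)
{
    return UINT32_MAX;
}
```

With these helpers, both 128.7.6.5 and 128.7.6.99 map to the broadcast address 128.7.6.255 under the assumed netmask.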

Unicast Vs Broadcast

• In unicast, only the communicating peers process the datagram
• In broadcast, every host on the subnet has to receive the packet and
process it up to the transport layer, i.e., through the datalink, IP, and UDP layers
• Every non-IP host must also receive the frame at the datalink layer
• If broadcast datagrams arrive at a high rate, this processing can
severely affect performance

Unicast

[Figure: unicast UDP datagram on subnet 128.7.6]
• Sending host (128.7.6.99, Ethernet 02:60:8c:2f:4e:00): the application calls
sendto with Dest IP 128.7.6.5, Dest Port 7433
• Frame on the wire: Enet hdr (Dest Enet 08:00:20:03:f6:42, Frame type 0800),
IPv4 hdr (Dest IP 128.7.6.5, Protocol UDP), UDP hdr (Dest Port 7433), UDP data
• Receiving host (128.7.6.5, Ethernet 08:00:20:03:f6:42): the frame passes up
through the datalink (frame type = 0800), IPv4 (protocol = UDP), and UDP
(port = 7433) to the application
• Each host has its own unicast address and the subnet broadcast address
128.7.6.255
Broadcast

[Figure: broadcast UDP datagram on subnet 128.7.6]
• Sending host (128.7.6.99): the application sets the SO_BROADCAST option
using setsockopt(), then calls sendto with Dest IP 128.7.6.255, Dest Port 520
• Frame on the wire: Enet hdr (Dest Enet ff:ff:ff:ff:ff:ff, Frame type 0800),
IPv4 hdr (Dest IP 128.7.6.255, Protocol UDP), UDP hdr (Dest Port 520), UDP data
• Every host on the subnet receives the frame and passes it up through the
datalink (frame type = 0800) and IPv4 (protocol = UDP) to UDP
• One host discards the datagram at UDP; the receiving host (128.7.6.5)
delivers it to the application on port 520
Programming Requirements

• The SO_BROADCAST socket option has to be set before sending to a
broadcast address

• Setsockopt(sockfd,
SOL_SOCKET,SO_BROADCAST,&on,sizeof(on)).
• IP fragmentation: BSD does not fragment broadcast datagrams; it generates
EMSGSIZE if the datagram size exceeds the outgoing MTU
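The option can be set and read back as in the following sketch. The helper names udp_broadcast_socket and broadcast_enabled are hypothetical; only socket, setsockopt, getsockopt, and close are real calls.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/* Create a UDP socket with SO_BROADCAST enabled; returns the fd or -1. */
int udp_broadcast_socket(void)
{
    const int on = 1;
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    if (fd < 0)
        return -1;
    if (setsockopt(fd, SOL_SOCKET, SO_BROADCAST, &on, sizeof(on)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Report whether SO_BROADCAST is set: 1 = yes, 0 = no, -1 = error. */
int broadcast_enabled(int fd)
{
    int val = 0;
    socklen_t len = sizeof(val);

    if (getsockopt(fd, SOL_SOCKET, SO_BROADCAST, &val, &len) < 0)
        return -1;
    return val != 0;
}
```

A freshly created UDP socket reports the option as off, which is why a sendto to a broadcast address fails unless the application sets the option first.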

Race Condition

- A race condition occurs when multiple processes access shared data and the
output depends on the execution order of the processes.

void dg_cli(…) {
    setsockopt(sockfd, SOL_SOCKET, SO_BROADCAST, &on, sizeof(on));
    signal(SIGALRM, func);
    while (fgets(…) != NULL) {
        sendto(…);
        alarm(1);
        for ( ; ; ) {
            if ((n = recvfrom(…)) < 0) {
                if (errno == EINTR) break;
                else err_sys(…);
            } else {
                recvline[n] = 0;
                sleep(1);
                printf(…);
            }
        }
    }
}
void func(int signo) { return; }

Problem? If SIGALRM is delivered while the process is not blocked in recvfrom
(for example, during sleep or printf), the next recvfrom is never interrupted
and can block forever.
Solutions to Race Condition

1. By unblocking and blocking SIGALRM (signal generation and delivery is
controlled)

sigemptyset(&sig1);
sigaddset(&sig1, SIGALRM);
signal(SIGALRM, func);
while (fgets(…) != NULL) {
    sendto(…);
    alarm(5);
    for ( ; ; ) {
        sigprocmask(SIG_UNBLOCK, &sig1, NULL);
        n = recvfrom(…);
        sigprocmask(SIG_BLOCK, &sig1, NULL);
        if (n < 0) {
            if (errno == EINTR) break; else err_sys(…);
        } else { recvline[n] = 0; printf(…); }
    }
}
void func(…)
{ return; }

The window is reduced, but the problem still persists: the signal can be
delivered between the unblocking sigprocmask call and recvfrom.

2. pselect can be used: SIGALRM is first blocked, and then pselect is called
with an empty signal set as its last argument.

Because pselect sets the signal mask, tests the descriptors, and restores the
mask as one atomic operation, the earlier problem does not persist.
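A minimal sketch of the pselect approach follows. The names wait_readable_or_alarm and on_alarm are hypothetical, and the sketch assumes SIGALRM is the only signal that interrupts the wait; any other unblocked signal would also produce EINTR.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/select.h>
#include <unistd.h>

static void on_alarm(int signo) { (void)signo; } /* only interrupts pselect */

/* Returns 1 if fd became readable, 0 if the alarm fired first, -1 on error. */
int wait_readable_or_alarm(int fd, unsigned seconds)
{
    struct sigaction sa;
    sigset_t block_alrm, orig_mask, empty;
    fd_set rset;
    int n, result;

    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;                    /* no SA_RESTART: we want EINTR */
    if (sigaction(SIGALRM, &sa, NULL) < 0)
        return -1;

    /* Block SIGALRM so it can only be delivered inside pselect. */
    sigemptyset(&empty);
    sigemptyset(&block_alrm);
    sigaddset(&block_alrm, SIGALRM);
    if (sigprocmask(SIG_BLOCK, &block_alrm, &orig_mask) < 0)
        return -1;

    alarm(seconds);
    FD_ZERO(&rset);
    FD_SET(fd, &rset);
    /* The empty mask is installed only for the duration of the call. */
    n = pselect(fd + 1, &rset, NULL, NULL, NULL, &empty);
    if (n > 0)
        result = 1;
    else if (n < 0 && errno == EINTR)
        result = 0;
    else
        result = -1;

    alarm(0);                           /* cancel a timer that has not fired */
    sigprocmask(SIG_SETMASK, &orig_mask, NULL);
    return result;
}
```

In dg_cli, the inner loop would call a function like this before each recvfrom and break out when it returns 0.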

3. Using the non-local goto siglongjmp to jump from the signal
handler back to the caller.
signal(SIGALRM, func);
while (fgets(…) != NULL) {
    sendto(…);
    alarm(5);
    for ( ; ; ) {
        if (sigsetjmp(jmpbuf, 1) != 0)
            break;
        n = recvfrom(…);
        recvline[n] = 0;
        printf(…);
    }
}
void func(…) {
    siglongjmp(jmpbuf, 1);
}

4. Using IPC from the signal handler to the function
void dg_cli(…) {
    setsockopt(…);
    pipe(pipefd);
    FD_ZERO(&rset);
    signal(SIGALRM, func);
    while (fgets(…) != NULL) {
        sendto(…);
        alarm(5);
        for ( ; ; ) {
            FD_SET(sockfd, &rset);
            FD_SET(pipefd[0], &rset);
            if ((n = select(…)) < 0) {
                if (errno == EINTR) continue; else err_sys(…); }
            if (FD_ISSET(sockfd, &rset)) {
                recvfrom(…); printf(…); }
            if (FD_ISSET(pipefd[0], &rset)) {
                read(pipefd[0], &n, 1); break; }
        }
    }
}
void func(int signo) {
    write(pipefd[1], " ", 1); return; }

Multicasting

• IPv4 Class D addresses are multicast addresses


– Range 224.0.0.0 to 239.255.255.255

Bit layout of the 32-bit address (leading bits, then fields):

CLASS A: 0        NET-ID (7b)    HOST-ID (24b)
CLASS B: 1 0      NET-ID (14b)   HOST-ID (16b)
CLASS C: 1 1 0    NET-ID (21b)   HOST-ID (8b)
CLASS D: 1 1 1 0  GROUP-ID (28b)

– The 32-bit Class D address is called the group address

Multicasting 593
• A mapping from IPv4 multicast addresses to Ethernet addresses
is also defined
– High order 24 bits always 01:00:5e
– 25th bit is 0
– Low order 23 bits from lowest 23 bits of multicast group address
– Not one-to-one, many (32) multicast addresses to a single Ethernet
address
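The mapping rules above can be written as a short function. The name mcast_ip_to_ether is hypothetical; the group address is taken in host byte order.

```c
#include <stdint.h>

/*
 * Map an IPv4 multicast group address (host byte order) to its Ethernet
 * multicast address: 01:00:5e, a 0 bit, then the low-order 23 bits of
 * the group address.
 */
void mcast_ip_to_ether(uint32_t group, uint8_t mac[6])
{
    mac[0] = 0x01;
    mac[1] = 0x00;
    mac[2] = 0x5e;
    mac[3] = (uint8_t)((group >> 16) & 0x7f); /* 25th bit forced to 0 */
    mac[4] = (uint8_t)((group >> 8) & 0xff);
    mac[5] = (uint8_t)(group & 0xff);
}
```

For example, 224.0.1.1 (0xE0000101) and 225.128.1.1 (0xE1800101) differ only in the 5 bits that the mapping drops, so both map to 01:00:5e:00:01:01. This is the many-to-one behavior noted above.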

• Broadcasting is normally limited to LANs, whereas Multicasting


can be done in LANs or WANs

multicast address
• IPv4 class D address
– 224.0.0.0 ~ 239.255.255.255
– (224.0.0.1: all hosts group), (224.0.0.2: all-routers group)
Multicast Addresses Scope
Multicast Session
• Especially in the case of streaming multimedia, the combination
of an IP multicast address (either IPv4 or IPv6) and a transport-
layer port (typically UDP) is referred to as a session.
• For example, an audio/video teleconference may comprise two
sessions; one for audio and one for video. These sessions
almost always use different ports and sometimes also use
different groups for flexibility in choice when receiving.
Multicast vs Broadcast

[Figure: multicast UDP datagram on subnet 128.7.6]
• Sending host: the application calls sendto with Dest IP 224.0.1.1,
Dest Port 123
• Frame on the wire: Enet hdr (Dest Enet 01:00:5e:00:01:01, Frame type 0800),
IPv4 hdr (Dest IP 224.0.1.1, Protocol UDP), UDP hdr (Dest Port 123), UDP data
• Receiving host: the application joins 224.0.1.1 and uses port 123, so its
datalink receives frames sent to 01:00:5e:00:01:01
• Other host: imperfect hardware filtering based on the destination Ethernet
address at the datalink, then perfect software filtering based on the
destination IP address at IPv4
Multicasting on a WAN

[Figure: five multicast routers, MR1 to MR5, connected across a WAN]

Hosts joining a Multicast Group

[Figure: hosts H1 to H5 each join the group; the multicast routers MR1 to MR5
exchange this information using a multicast routing protocol (MRP)]

Sending packets on a WAN

[Figure: multicast routers MR1 to MR5 and hosts H1 to H5, each of which has
joined the group]

Multicasting
• Specifically, note that:
– All interested multicast routers receive the packets; MR5 does not
receive any since there are no interested hosts on its LAN
– Packets are put onto a specific LAN only if there are hosts on that LAN
to receive them; MR3 only forwards the packets
– Multicast router MR2 both puts packets on its LAN for hosts H2 & H3
and makes a copy of the packets and forwards them to MR3
– This behavior is unique to multicast forwarding

Source-Specific Multicast
• Multicasting on a WAN has been difficult to deploy for several
reasons.
– The biggest problem is that the multicast routing protocol (MRP) needs to
get the data from all the senders, which may be located anywhere in the
network, to all the receivers, which may similarly be located anywhere.
– Another large problem is multicast address allocation: There are
not enough IPv4 multicast addresses to statically assign them to
everyone who wants one, as is done with unicast addresses.
Source-Specific Multicast
• SSM combines the group address with a system's source address, which solves the
problems as follows:
– The receivers supply the sender's source address to the routers as part of joining the
group.
– This removes the rendezvous problem from the network, as the network now knows
exactly where the sender is.
– However, it retains the scaling properties of not requiring the sender to know who all
the receivers are. This simplifies multicast routing protocols immensely.
• It redefines the identifier from simply being a multicast group address to being a
combination of a unicast source and multicast destination (which SSM now calls
a channel).
• An SSM session is the combination of source, destination, and port
struct ip_mreq {
    struct in_addr imr_multiaddr; /* IPv4 class D multicast addr */
    struct in_addr imr_interface; /* IPv4 addr of local interface */
};

struct ipv6_mreq {
    struct in6_addr ipv6mr_multiaddr; /* IPv6 multicast addr */
    unsigned int ipv6mr_interface; /* interface index, or 0 */
};

struct group_req {
    unsigned int gr_interface; /* interface index, or 0 */
    struct sockaddr_storage gr_group; /* IPv4 or IPv6 multicast addr */
};

struct ip_mreq_source {
    struct in_addr imr_multiaddr; /* IPv4 class D multicast addr */
    struct in_addr imr_sourceaddr; /* IPv4 source addr */
    struct in_addr imr_interface; /* IPv4 addr of local interface */
};

struct group_source_req {
    unsigned int gsr_interface; /* interface index, or 0 */
    struct sockaddr_storage gsr_group; /* IPv4 or IPv6 multicast addr */
    struct sockaddr_storage gsr_source; /* IPv4 or IPv6 source addr */
};
Multicast Socket Options

• Use setsockopt() to modify socket options


– IP_ADD_MEMBERSHIP
• Join a multicast group on a specified local interface
– IP_DROP_MEMBERSHIP
• Leave a multicast group
– IP_MULTICAST_IF
• Specify the interface for outgoing multicast datagrams sent on this socket
– IP_MULTICAST_TTL
• Set the IPv4 TTL parameter (if not specified, default=1)
– IP_MULTICAST_LOOP
• Enable or disable local loopback (default is enabled)

Distributed Program Design
• Communication-Oriented Design (typical sockets approach)
– Design protocol first.
– Build programs that adhere to the protocol.
• Application-Oriented Design (RPC)
– Build application(s).
– Divide programs up and add communication protocols.

RPC
Remote Procedure Call
• Call a procedure (subroutine) that is running on
another machine.
• Issues:
– identifying and accessing the remote procedure
– parameters
– return value
Remote Subroutine
Client:
blah, blah, blah
bar = foo(a,b);
blah, blah, blah

(protocol between client and server)

Server:
int foo(int x, int y) {
    if (x>100)
        return(y-2);
    else if (x>10)
        return(y-x);
    else
        return(x+y);
}
Sun RPC
• There are a number of popular RPC specifications.
• Sun RPC (ONC RPC) is widely used.
• NFS (Network File System) is RPC based.
• Rich set of support tools.
Sun RPC Organization

Remote Program

Shared Global Data

Procedure 1 Procedure 2 Procedure 3


Procedure Arguments
• To reduce the complexity of the interface
specification, Sun RPC includes support for a single
argument to a remote procedure.*
• Typically the single argument is a structure that
contains a number of values.

* Newer versions can handle multiple args.


Procedure Identification

• Each procedure is identified by:


– Hostname (IP Address)
– Program identifier (32 bit integer)
– Procedure identifier (32 bit integer)

– Program Version identifier


• for testing and migration.
Program Identifiers
• Each remote program has a unique ID.
• Sun divided up the IDs:
0x00000000 - 0x1fffffff
0x20000000 - 0x3fffffff
0x40000000 - 0x5fffffff Sun
0x60000000 - 0xffffffff
SysAdmin
Transient
Reserved
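The ranges can be expressed as a small classifier. This is a sketch that assumes the four labels (Sun, SysAdmin, Transient, Reserved) pair with the four ranges in the order listed; the enum and function names are hypothetical.

```c
#include <stdint.h>

enum prog_range { PROG_SUN, PROG_SYSADMIN, PROG_TRANSIENT, PROG_RESERVED };

/* Classify a 32-bit RPC program number using the ranges listed above. */
enum prog_range classify_prognum(uint32_t prognum)
{
    if (prognum <= 0x1fffffffu)
        return PROG_SUN;
    if (prognum <= 0x3fffffffu)
        return PROG_SYSADMIN;
    if (prognum <= 0x5fffffffu)
        return PROG_TRANSIENT;
    return PROG_RESERVED;
}
```

Under that pairing, a program number such as 0x20000001 falls in the SysAdmin range, which is the range a locally written service would normally pick from.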
Procedure Identifiers &
Program Version Numbers
• Procedure Identifiers usually start at 1 and are
numbered sequentially

• Version Numbers typically start at 1 and are


numbered sequentially.
Iterative Server
• Sun RPC specifies that at most one remote
procedure within a program can be invoked at any
given time.

• If a 2nd procedure is called, the call blocks until the


1st procedure has completed.
Iterative can be good
• Having an iterative server is useful for applications
that may share data among procedures.
• Example: database - to avoid insert/delete/modify
collisions.

• We can provide concurrency when necessary...


Call Semantics
• What does it mean to call a local procedure?
– the procedure is run exactly one time.

• What does it mean to call a remote procedure?


– It might not mean "run exactly once"!
Remote Call Semantics
• To act like a local procedure (exactly one invocation
per call) - a reliable transport (TCP) is necessary.
• Sun RPC does not support reliable call semantics!
• "At Least Once" Semantics
• "Zero or More" Semantics
Sun RPC Call Semantics
• At Least Once Semantics
– if we get a response (a return value)

• Zero or More Semantics


– if we don't hear back from the remote subroutine.
Remote Procedure deposit()

deposit(DavesAccount,$100)

• Always remember that you don't know how many


times the remote procedure was run!
– The net can duplicate the request (UDP).
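The danger can be simulated locally. In the sketch below, a duplicated request is modeled as the same handler call made twice. The request ID (xid) guard is an illustrative assumption about how a server might detect a repeated request; the slides do not describe it as Sun RPC behavior, and all names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical server-side state for the deposit() example. */
struct account {
    long balance;
    uint32_t last_xid;   /* ID of the last request applied */
    int has_last_xid;    /* 0 until the first request arrives */
};

/* Naive handler: every copy of the request that arrives is applied. */
void deposit_naive(struct account *acct, long amount)
{
    acct->balance += amount;
}

/* Guarded handler: a repeated request ID is treated as a duplicate. */
void deposit_once(struct account *acct, uint32_t xid, long amount)
{
    if (acct->has_last_xid && acct->last_xid == xid)
        return;                  /* duplicate: already applied */
    acct->balance += amount;
    acct->last_xid = xid;
    acct->has_last_xid = 1;
}
```

If one $100 deposit request is delivered twice, the naive handler credits $200 while the guarded handler credits $100. Procedures that are safe to run more than once avoid the problem without any such bookkeeping.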
Network Communication
• The actual network communication is nothing new -
it's just TCP/IP.
• Many RPC implementations are built upon the
sockets library.
– the RPC library does all the work!

• We are just using a different API, the underlying stuff


is the same!
Dynamic Port Mapping
• Servers typically do not use well known protocol
ports!

• Clients know the Program ID (and host IP address).

• RPC includes support for looking up the port number


of a remote program.
Port Lookup Service
• A port lookup service runs on each host that contains
RPC servers.

• RPC servers register themselves with this service:


– "I'm program 17 and I'm looking for requests on port 1736"
The portmapper
• Each system which will support RPC servers runs a
port mapper server that provides a central registry for
RPC services.
• Servers tell the port mapper what services they offer.
More on the portmapper
• Clients ask a remote port mapper for the port number
corresponding to Remote Program ID.

• The portmapper is itself an RPC server!

• The portmapper is available on a well-known port


(111).
Sun RPC Programming
• The RPC library is a collection of tools for automating
the creation of RPC clients and servers.
• RPC clients are processes that call remote
procedures.
• RPC servers are processes that include procedure(s)
that can be called by clients.
RPC Programming
• RPC library
– XDR routines
– RPC run time library
• call rpc service
• register with portmapper
• dispatch incoming request to correct procedure
– Program Generator
RPC Run-time Library
• High- and Low-level functions that can be used by
clients and servers.

• High-level functions provide simple access to RPC


services.
High-level Client Library
int callrpc( char *host,
u_long prognum,
u_long versnum,
u_long procnum,
xdrproc_t inproc,
char *in,
xdrproc_t outproc,
char *out);
High-Level Server Library
int registerrpc(
u_long prognum,
u_long versnum,
u_long procnum,
char *(*procname)(),
xdrproc_t inproc,
xdrproc_t outproc);
High-Level Server Library (cont.)
void svc_run();

• svc_run() is a dispatcher.
• A dispatcher waits for incoming connections and
invokes the appropriate function to handle each
incoming request.
High-Level Library Limitation
• The High-Level RPC library calls support UDP only
(no TCP).
• You must use lower-level RPC library functions to
use TCP.
• The High-Level library calls do not support any kind
of authentication.
Low-level RPC Library
• Full control over all IPC options
– TCP & UDP
– Timeout values
– Asynchronous procedure calls
• Multi-tasking Servers
• Broadcasting

IPC is InterProcess Communication


RPCGEN
• There is a tool for automating the creation of RPC
clients and servers.
• The program rpcgen does most of the work for you.
• The input to rpcgen is a protocol definition in the
form of a list of remote procedures and parameter
types.
RPCGEN

[Figure: Input File (Protocol Description) -> rpcgen -> C Source Code:
Client Stubs, XDR filters, header file, Server skeleton]

rpcgen Output Files

> rpcgen -C foo.x

foo_clnt.c (client stubs)
foo_svc.c (server main)
foo_xdr.c (xdr filters)
foo.h (shared header file)
Client Creation

> gcc -o fooclient foomain.c foo_clnt.c foo_xdr.c -lnsl

• foomain.c is the client main() (and possibly other


functions) that call rpc services via the client stub
functions in foo_clnt.c
• The client stubs use the xdr functions.
Server Creation

> gcc -o fooserver fooservices.c foo_svc.c foo_xdr.c -lrpcsvc -lnsl

• fooservices.c contains the definitions of the actual remote


procedures.
Example Protocol Definition
struct twonums {
int a;
int b;
};
program UIDPROG {
version UIDVERS {
int RGETUID(string<20>) = 1;
string RGETLOGIN( int ) = 2;
int RADD(twonums) = 3;
} = 1;
} = 0x20000001;
RPC Programming with rpcgen
Issues:
– Protocol Definition File
– Client Programming
• Creating an "RPC Handle" to a server
• Calling client stubs
– Server Programming
• Writing Remote Procedures
Protocol Definition File

• Description of the interface of the remote procedures.


– Almost function prototypes
• Definition of any data structures used in the calls
(argument types & return types)
• Can also include shared C code (shared by client and
server).
XDR the language
• Remember that XDR data types are not C data types!
– There is a mapping from XDR types to C types – that's most
of what rpcgen does.

• Most of the XDR syntax is just like C


– Arrays, strings are different.
XDR Arrays

• Fixed Length arrays look just like C code:


int foo[100]
• Variable Length arrays look like this:

int foo<> or int foo<MAXSIZE>

Implicit maximum size is 2^32 - 1


What gets sent on the network

int x[n]

x0 x1 x2 ... xn-1

int y<m> (k is the actual array size, k ≤ m)

k y0 y1 y2 ... yk-1
XDR String Type
• Look like variable length arrays:
string s<100>
• What is sent: length followed by sequence of ASCII
chars:

n s0 s1 s2 s3 . . . sn-1

n is actual string length (sent as int)
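The layout above can be produced by hand, as in the following sketch. The function name encode_xdr_string is hypothetical (it is not the library's xdr_string). One detail comes from the XDR standard rather than from the slide: the length is a 4-byte big-endian integer, and the bytes are zero-padded to a multiple of 4.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Encode s as an XDR string into buf: a 4-byte big-endian length, the
 * bytes of the string, then zero padding up to a multiple of 4 bytes.
 * Returns the number of bytes written, or 0 if buf is too small.
 */
size_t encode_xdr_string(const char *s, unsigned char *buf, size_t buflen)
{
    size_t n = strlen(s);
    size_t padded = (n + 3) & ~(size_t)3;

    if (n > UINT32_MAX || buflen < 4 + padded)
        return 0;
    buf[0] = (unsigned char)((n >> 24) & 0xff);
    buf[1] = (unsigned char)((n >> 16) & 0xff);
    buf[2] = (unsigned char)((n >> 8) & 0xff);
    buf[3] = (unsigned char)(n & 0xff);
    memcpy(buf + 4, s, n);
    memset(buf + 4 + n, 0, padded - n);
    return 4 + padded;
}
```

Encoding "abcde" takes 12 bytes: the length 5, the five characters, and three padding bytes. A variable-length array follows the same pattern, with the element count first and each element encoded in turn.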


Linked Lists!
struct foo {
    int x;
    foo *next;
};

rpcgen recognizes this as a linked list.

The generated XDR filter uses xdr_pointer() to
encode/decode the stuff pointed to by a pointer.

Check the online example "linkedlist".


Declaring The Program
program SIMP_PROG {
version SIMP_VERSION {
type1 PROC1(operands1) = 1;
type2 PROC2(operands2) = 2;
} = 1;
} = 40000000;

Color Code:
Keywords Generated Symbolic Constants
Used to generate stub and procedure names
Procedure Numbers
• Procedure #0 is created for you automatically.
– Start at procedure #1!

• Procedure #0 is a dummy procedure that can help
debug things (sort of an RPC ping server).
Procedure Names
Rpcgen converts the procedure name to lower case and appends an underscore
and the version number:
rtype PROCNAME(arg)

Client stub:
rtype *procname_1(arg *, CLIENT *);
Server procedure:
rtype *procname_1_svc(arg *,
struct svc_req *);
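The naming rule can be sketched as a string function: lower-case the procedure name, then append an underscore, the version number, and "_svc" for the server side. The function rpcgen_name is hypothetical and only mimics the rule described here (new-style server names); it is not part of rpcgen.

```c
#include <ctype.h>
#include <stdio.h>

/*
 * Build the client stub name ("add_1") or the server procedure name
 * ("add_1_svc") for procedure procname in version vers.
 * Returns the length of the name, or -1 if out is too small.
 */
int rpcgen_name(const char *procname, unsigned vers, int server,
                char *out, size_t outlen)
{
    size_t i;
    int n;

    /* Copy the procedure name in lower case. */
    for (i = 0; procname[i] != '\0' && i + 1 < outlen; i++)
        out[i] = (char)tolower((unsigned char)procname[i]);
    if (procname[i] != '\0')
        return -1;

    /* Append "_<version>" and, for the server side, "_svc". */
    n = snprintf(out + i, outlen - i, "_%u%s", vers, server ? "_svc" : "");
    if (n < 0 || (size_t)n >= outlen - i)
        return -1;
    return (int)i + n;
}
```

For a procedure declared as ADD in version 1, this yields add_1 for the client stub and add_1_svc for the server procedure.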
Program Numbers
• Use something like:
555555555 or 22222222

• You can find the numbers currently used with "rpcinfo -p hostname"
Client Programming
• Create RPC handle.
– Establishes the address of the server.

• RPC handle is passed to client stubs (generated by


rpcgen).

• Type is CLIENT *
clnt_create

CLIENT *clnt_create(
    char *host,     /* hostname of server */
    u_long prog,    /* program number */
    u_long vers,    /* version number */
    char *proto);   /* can be "tcp" or "udp" */


Calling Client Stubs
• Remember:
– Return value is a pointer to what you expect.
– Argument is passed as a pointer.
– If you are passing a string, you must pass a char**
• When in doubt – look at the ".h" file generated by
rpcgen
Server Procedures
• Rpcgen writes most of the server.
• You need to provide the actual remote procedures.
• Look in the ".h" file for prototypes.
• Run "rpcgen -C -Ss" to generate (empty) remote
procedures!
Server Function Names
• Old Style (includes AIX): Remote procedure FOO,
version 1 is named foo_1()

• New Style (includes Sun,BSD,Linux): Remote


procedure FOO, version 1 is named foo_1_svc()
Running rpcgen
• Command line options vary from one OS to another.
• Sun/BSD/Linux – you need to use "-C" to get ANSI C
code!
• Rpcgen can help write the files you need to write:
– To generate sample server code: "-Ss"
– To generate sample client code: "-Sc"
Other porting issues
• Shared header file generated by rpcgen may have:
#include <rpc/rpc.h>

• Or Not!
RPC without rpcgen
• Can do asynchronous RPC
– Callbacks
– Single process is both client and server.
• Write your own dispatcher (and provide concurrency)
• Can establish control over many network parameters:
protocols, timeouts, resends, etc.
rpcinfo
rpcinfo -p host prints a list of all registered
programs on host.

rpcinfo -[ut] host program# makes a call to
procedure #0 of the specified RPC program (RPC
ping).
u : UDP
t : TCP
Sample Code
• simple – integer add and subtract
• ulookup – look up username and uid.
• varray – variable length array example.
• linkedlist – arg is linked list.
Example simp
• Standalone program simp.c
– Takes 2 integers from command line and prints out the sum
and difference.
– Functions:
int add( int x, int y );
int subtract( int x, int y );
Splitting simp.c
• Move the functions add() and subtract() to the server.

• Change simp.c to be an RPC client


– Calls stubs add_1(), sub_1()
• Create server that serves up 2 remote procedures
– add_1_svc() and sub_1_svc()
Protocol Definition: simp.x
struct operands {
int x;
int y;
};

program SIMP_PROG {
version SIMP_VERSION {
int ADD(operands) = 1;
int SUB(operands) = 2;
} = VERSION_NUMBER;
} = 555555555;
rpcgen -C simp.x
[Figure: simp.x -> rpcgen -> simp_clnt.c (Client Stubs), simp.h (header file),
simp_xdr.c (XDR filters), simp_svc.c (Server skeleton)]
xdr_operands XDR filter
bool_t xdr_operands( XDR *xdrs,
operands *objp){

if (!xdr_int(xdrs, &objp->x))
return (FALSE);
if (!xdr_int(xdrs, &objp->y))
return (FALSE);
return (TRUE);
}
simpclient.c
• This was the main program – is now the client.
• Reads 2 ints from the command line.
• Creates an RPC handle.
• Calls the remote add and subtract procedures.
• Prints the results.
simpservice.c

• The server main is in simp_svc.c.


• simpservice.c is what we write – it holds the add
and subtract procedures that simp_svc will call
when it gets RPC requests.
• The only thing you need to do is to match the
name/parameters that simp_svc expects (check
simp.h!).
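As a rough sketch, simpservice.c could look like the following, assuming rpcgen -C produced the usual version-1 stub names add_1_svc() and subtract_1_svc() (check simp.h for the exact prototypes). In a real build the operands type and struct svc_req come from simp.h and <rpc/rpc.h>; they are declared locally here, as stand-ins, only so the fragment compiles by itself.

```c
/* Sketch of simpservice.c. In a real build, replace the two local
 * declarations below with: #include "simp.h" */
struct svc_req;                                     /* stand-in for the <rpc/rpc.h> type */
typedef struct operands { int x; int y; } operands; /* mirrors simp.x */

/* Server procedures return a pointer to the result, so the result
 * must outlive the call: hence the static variable. */
int *add_1_svc(operands *argp, struct svc_req *rqstp)
{
    static int result;
    (void)rqstp;                 /* unused in this sketch */
    result = argp->x + argp->y;
    return &result;
}

int *subtract_1_svc(operands *argp, struct svc_req *rqstp)
{
    static int result;
    (void)rqstp;
    result = argp->x - argp->y;
    return &result;
}
```

The server skeleton in simp_svc.c decodes the request, calls one of these functions, and encodes whatever the returned pointer refers to.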
Raw Sockets
TCP/IP Stack
[Figure: demultiplexing in the TCP/IP stack. The frame type selects the network-layer protocol, the IP protocol field selects ICMP, IGMP, TCP, UDP, or a raw protocol, and the TCP/UDP port number (e.g., 21, 23, 25, 53, 67, 69, 161) selects the application.]
IP protocol field values shown in the figure:
1 ICMP
2 IGMP
6 TCP
8 EGP
17 UDP
41 IPv6
89 OSPF
What can raw sockets do?
• Bypass TCP/UDP layers
• Read and write ICMP and IGMP packets
– ping, traceroute, multicast routing daemon
• Read and write IP datagrams with an IP protocol field not processed by the
kernel
– OSPF
• Send and receive your own IP packets with your own IP header using the
IP_HDRINCL socket option
– can build and send TCP and UDP packets
– testing, hacking
– only superuser can create raw socket though
• You need to do all protocol processing at user-level
Creating Raw Sockets
• Only Superuser can create
• socket(AF_INET, SOCK_RAW, protocol)
– where protocol is one of the constants, IPPROTO_xxx, such as
IPPROTO_ICMP.
• bind can be called on the raw socket, but this is rare. This
function sets only the local address: There is no concept of a
port number with a raw socket.
• connect can be called on the raw socket, but this is rare. This
function sets only the foreign address: Again, there is no
concept of a port number with a raw socket.

RAW SOCKETS 694


Creating Raw Sockets: IP Header option

• The IP_HDRINCL socket option can be set as follows:
const int on = 1;
if (setsockopt(sockfd, IPPROTO_IP, IP_HDRINCL, &on, sizeof(on)) < 0)
    /* handle the error */;



Raw Socket Output
• Normal output is performed by calling sendto or sendmsg and specifying
the destination IP address
– write, writev, or send can also be called if the socket has been connected.
• If the IP_HDRINCL option is not set, kernel prepends the IP header
– The kernel sets the protocol field of the IPv4 header that it builds to the third
argument from the call to socket.
• If the IP_HDRINCL option is set, the starting address of the data for the
kernel to send specifies the first byte of the IP header.
– The process builds the entire IP header, except: (i) the IPv4 identification field
can be set to 0, which tells the kernel to set this value; (ii) the kernel always
calculates and stores the IPv4 header checksum; and (iii) IP options may or may
not be included
• The kernel fragments raw packets that exceed the outgoing interface MTU.



Raw Socket Input
• Which received IP datagrams does the kernel pass to raw sockets?
• Received UDP packets and received TCP packets are never passed to a raw
socket.
– To read these, a process must read at the datalink layer
• Most ICMP packets are passed to a raw socket after the kernel has finished
processing the ICMP message.
– Except echo request, timestamp request, and address mask request
• All IGMP packets are passed to a raw socket after the kernel has finished
processing the IGMP message.
• All IP datagrams with a protocol field that the kernel does not understand are
passed to a raw socket.
• If the datagram arrives in fragments, nothing is passed to a raw socket until all
fragments have arrived and have been reassembled.



Raw Socket Input
• When the kernel has an IP datagram, all raw sockets for all processes are
examined, looking for all matching sockets.
• A copy of the IP datagram is delivered to each matching socket.
• The following tests are performed for each raw socket and only if all three tests
are true is the datagram delivered to the socket:
– If a nonzero protocol is specified, protocol field must match
– If a local IP address is bound to the raw socket by bind, then the destination IP
address of the received datagram must match
– If a foreign IP address was specified for the raw socket by connect, then the source IP
address of the received datagram must match
• Notice that if a raw socket is created with a protocol of 0, and neither bind nor
connect is called, then that socket receives a copy of every raw datagram the
kernel passes to raw sockets.
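The three tests above can be modelled as a small predicate. This is only an illustrative sketch: struct raw_sock_filter and raw_sock_matches() are hypothetical names, not kernel types or calls, and a value of 0 stands for "not set".

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one raw socket's filter state (not a kernel type).
 * A value of 0 means "not set", i.e. a wildcard. */
struct raw_sock_filter {
    uint8_t  protocol;     /* third argument to socket(), 0 = any    */
    uint32_t local_addr;   /* address given to bind(), 0 = unbound   */
    uint32_t foreign_addr; /* address given to connect(), 0 = none   */
};

/* Returns true if a datagram with the given protocol, source address,
 * and destination address passes all three tests for this socket. */
bool raw_sock_matches(const struct raw_sock_filter *f,
                      uint8_t proto, uint32_t src, uint32_t dst)
{
    if (f->protocol != 0 && f->protocol != proto)
        return false;      /* test 1: protocol field must match      */
    if (f->local_addr != 0 && f->local_addr != dst)
        return false;      /* test 2: destination must match bind    */
    if (f->foreign_addr != 0 && f->foreign_addr != src)
        return false;      /* test 3: source must match connect      */
    return true;
}
```

A filter of {0, 0, 0} matches everything, which corresponds to the last bullet: protocol 0 with neither bind nor connect receives every datagram the kernel passes to raw sockets.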



Raw Socket Input
• Whenever a received datagram is passed to a raw IPv4 socket,
the entire datagram, including the IP header, is passed to the
process
• For a raw IPv6 socket, only the payload (i.e., no IPv6 header or
any extension headers) is passed to the socket



Example: Ping Program

• Send an ICMP echo request to some IP address and receive an ICMP echo reply.
• # ping 172.10.1.3
• Ping 172.10.1.3: 56 bytes of data
• Reply from 172.10.1.3: bytes=56 time<10ms ttl=255
• … (4 replies)
• If the host is not active: Request Timeout



ICMP Message

• We set the identifier to the PID of the ping process and increment the sequence
number by one for each packet we send.
• We store the 8-byte timestamp of when the packet is sent as the optional data.
The rules of ICMP require that the identifier, sequence number, and any optional
data be returned in the echo reply.
• Storing the timestamp in the packet lets us calculate the RTT when the reply is
received.
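A minimal sketch of building such an echo request in a byte buffer is given below. The function names in_cksum() and build_echo_request() are illustrative choices, and the timestamp is stored as a whole struct timeval, whose size is 8 bytes only on systems with 32-bit time fields. The checksum is the standard Internet checksum (one's-complement sum of 16-bit words).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/time.h>

/* Internet checksum: one's-complement sum of big-endian 16-bit words. */
uint16_t in_cksum(const void *data, size_t len)
{
    const uint8_t *p = data;
    uint32_t sum = 0;

    while (len > 1) {
        sum += (uint32_t)((p[0] << 8) | p[1]);
        p += 2;
        len -= 2;
    }
    if (len == 1)
        sum += (uint32_t)(p[0] << 8);      /* pad the odd byte with zero */
    while (sum >> 16)
        sum = (sum & 0xFFFF) + (sum >> 16); /* fold carries back in */
    return (uint16_t)~sum;
}

/* Fills buf with an ICMP echo request carrying the send time as
 * optional data. buf must hold 8 + sizeof(struct timeval) bytes.
 * Returns the total message length. */
size_t build_echo_request(uint8_t *buf, uint16_t id, uint16_t seq)
{
    struct timeval now;
    uint16_t ck;
    size_t len = 8 + sizeof now;

    buf[0] = 8;                            /* type: echo request */
    buf[1] = 0;                            /* code */
    buf[2] = buf[3] = 0;                   /* checksum is zero while summing */
    buf[4] = (uint8_t)(id >> 8);
    buf[5] = (uint8_t)(id & 0xFF);
    buf[6] = (uint8_t)(seq >> 8);
    buf[7] = (uint8_t)(seq & 0xFF);
    gettimeofday(&now, NULL);
    memcpy(buf + 8, &now, sizeof now);     /* optional data: send time */
    ck = in_cksum(buf, len);
    buf[2] = (uint8_t)(ck >> 8);
    buf[3] = (uint8_t)(ck & 0xFF);
    return len;
}
```

A receiver can validate a message by running in_cksum() over the whole message including the checksum field: the result is 0 for an intact message.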

ICMP Echo Message
[Figures: ICMP Message; ICMP Echo Message]


Ping Program

[Figure: structure of the ping program]
• main → Read loop: infinite receive loop (recvfrom → Proc_v4)
• Sig_Alrm → Send_v4: send an echo request once a second


Traceroute Example

• Determines the path IP datagrams follow
• Uses the TTL field (IPv4) or hop limit (IPv6) and two ICMP messages
• One UDP datagram is sent by the host with TTL=1 to the destination
• The 1st-hop router sends an ICMP “time exceeded in transit” error
• TTL is increased to 2, and another datagram is sent
• The process repeats until a datagram reaches the destination. The datagrams
are addressed to a port number not in use on the destination, so that the
destination sends an ICMP “port unreachable” error

Datalink Access
• Uses
– Watch packets on the interface
– Programs can be run as applications rather than as part of the kernel
• Ways to access the datalink
– BSD Packet Filter
– Datalink Provider Interface
– Linux SOCK_PACKET interface

Public library: libpcap

DATALINK ACCESS 710


BSD Packet filter
[Figure: two application processes, each reading through its own kernel buffer and filter attached to BPF, which taps the datalink below IPv4 and IPv6.]
• Writing is not frequent. Why?
• Filters: tcp, udp, tcp[15:1] (1 byte starting at offset 15)


BPF reduces its overhead by:

1. Filtering within the kernel
2. Passing only a part of each packet to the application
3. Using buffering for both read and write to reduce the
number of system calls.

Accessing a BPF: open a BPF device, then use ioctl to set properties: load the
filter, set the read timeout, set the buffer size, attach a datalink to the BPF
device, enable promiscuous mode, etc.


Linux : SOCK_PACKET

• Superuser privileges are required

• fd = socket(AF_INET, SOCK_PACKET, htons(ETH_P_ALL))
– Other frame types: ETH_P_IP, ETH_P_ARP, ETH_P_IPV6

Disadvantages:
1. No kernel buffering, hence more system calls
2. No device filtering, hence ETH_P_IP will give
packets from Ethernet, PPP, SLIP links, and loopback
devices


ICMP Format
[Figure: ICMP message format, including the subtype field]
Ping Program

• Create a raw socket to send/receive ICMP echo 
request and echo reply packets
• Install SIGALRM handler to process output
– Send an echo request packet every t seconds
– Build ICMP packets (type, code, checksum, id, 
seq, sending timestamp as optional data)
• Enter an infinite loop processing input
– Use recvmsg() to read from the network
– Parse the message and retrieve the ICMP packet
– Print ICMP packet information, e.g., peer IP 
address, round-trip time
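The "parse the message" step can be sketched as below, assuming an IPv4 raw socket (so the buffer starts with the IP header) and assuming the sender stored a struct timeval right after the 8-byte ICMP header. The function name echo_reply_rtt_ms() and its parameters are illustrative, not part of any library.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/time.h>

/* Parses a received IPv4 datagram. If it holds an ICMP echo reply whose
 * identifier equals id, stores the sequence number in *seq_out and
 * returns the round-trip time in milliseconds; otherwise returns -1.0. */
double echo_reply_rtt_ms(const uint8_t *dgram, size_t len, uint16_t id,
                         const struct timeval *recv_time, uint16_t *seq_out)
{
    size_t ip_hlen;
    const uint8_t *icmp;
    struct timeval sent;

    if (len < 20)
        return -1.0;
    ip_hlen = (size_t)(dgram[0] & 0x0F) * 4;  /* IHL counts 32-bit words */
    if (ip_hlen < 20 || len < ip_hlen + 8 + sizeof sent)
        return -1.0;
    icmp = dgram + ip_hlen;                   /* skip the IP header */
    if (icmp[0] != 0)                         /* type 0 = echo reply */
        return -1.0;
    if ((uint16_t)((icmp[4] << 8) | icmp[5]) != id)
        return -1.0;                          /* reply meant for another process */
    *seq_out = (uint16_t)((icmp[6] << 8) | icmp[7]);
    memcpy(&sent, icmp + 8, sizeof sent);     /* timestamp echoed back to us */
    return (recv_time->tv_sec - sent.tv_sec) * 1000.0 +
           (recv_time->tv_usec - sent.tv_usec) / 1000.0;
}
```

The identifier check matters because every raw ICMP socket on the host receives a copy of each echo reply, including replies to other ping processes.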
Traceroute program

• Create a UDP socket and bind the source port
– To send probe packets with increasing TTL
– For each TTL value, use a timer to send a probe every three seconds,
and send 3 probes in total
• Create a raw socket to receive ICMP packets
– On timeout, print “ *”
– On ICMP “port unreachable”, terminate
– On ICMP “TTL expired”, print the hostname of the router and the round
trip time to the router
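Sending probes with increasing TTL needs no special privilege on the UDP side: the TTL is an ordinary socket option. A minimal sketch, where set_probe_ttl() is a hypothetical helper name:

```c
#include <netinet/in.h>
#include <sys/socket.h>

/* Sets the IPv4 TTL used for datagrams subsequently sent on fd.
 * Returns 0 on success, -1 on error (errno set by setsockopt). */
int set_probe_ttl(int fd, int ttl)
{
    return setsockopt(fd, IPPROTO_IP, IP_TTL, &ttl, sizeof ttl);
}
```

A traceroute loop would call this once per TTL value before sending the probes for that hop with sendto(); only the raw socket that receives the ICMP errors requires superuser privileges.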
ISZC462

Lecture#8
Problem 1
• This problem is about implementing a local chat server and client in a system. The server
and client will facilitate the communication between multiple users of the system. You
should submit client_idno.c and server_idno.c for client and server respectively.
• The chat server supports the following functionalities.
• Let us say users B, C and D are currently in the chat. Then user A joins.
The server tells all the current chatters B, C and D: ‘A just joined’
– command: connect <username>
• A can say a message to every one “Hello! Everyone!” or A can whisper a message to C
alone ‘I want to tell a secret to you’. So server should facilitate one to all and one to one
communication.
– Command: talk * //to talk to all chatters
– Command: talk <username> to talk to one user
• A can also get the list of all chatters.
– Command: list
• A can disconnect from chat
– Command: disconnect
Problem 2
• The server program should
• start like ./server <path>
• since it runs within the system, it should use either FIFO/Message Queues for
inter process communication.
• use select() call for dealing with multiple users concurrently
• The client program should
• start like ./client <serverpath>
• take care of interpreting commands entered by user.
• process the command until Ctrl-D is pressed. When a user types and then
presses <ENTER>, that is the end of one message. But the program will still
wait for the next message until user presses Ctrl-D (EOF for fgets()).
• the client is capable of handling the sending and receiving simultaneously. Any
messages received while the user is typing the message to be sent, will be
simply flashed on the console.
Problem 3
• A simple TCP based chat server could allow two users to use
any TCP client (telnet, for example) to communicate with each
other. Consider a single process, single thread server that
can support exactly 2 clients at once, the server simply forwards
whatever is sent from one client to the other (in both directions).
As soon as something is sent from one client it is immediately
forwarded to the other client. As soon as either client terminates
the connection, the server exits. Provide server code with
comments.
Problem 4
1. When the server starts, it reads a file containing a list of domain
names to which access is forbidden. When an HTTP
request comes to the server, e.g. for
http://discovery.bits-pilani.ac.in/index.html, it checks whether the
domain name “discovery.bits-pilani.ac.in” is in the list. If it is,
the server sends back HTTP error 403 Forbidden to the client. If
not, it forwards the request to the actual server. When it gets the
reply, it sends the reply to the client.
2. Your server takes a port number on the command line. It can be
iterative server.
3. Your server will be tested with a browser.
Problem 5
• Suppose you are given a task of testing the validity of links in a given web
page. You are expected to test each url present in the web page and
report the result. URL is of this form:
• http://<domain name>/<directory1>/<directory2>/ … /<filename>
• Testing URL for validity means to test the existence of domain name, and
existence of file in the given path on remote server.
• To simplify the problem, you can take a list of URLs in a file; one url per
line. Your program takes this file name as command-line argument. Your
program should read each URL and validate the URL. The result is one
of {VALID, INCORRECT DOMAIN, FILE DOESNT EXIST}. Your
program should display the URL and result(s); each URL and its result
per one line on console
Problem 6
Consider the following network. There are n nodes connected in a ring topology.
Communication with any node in the network happens in the clockwise direction, i.e. through the
next node. Each node shares a set of files.
The nodes communicate using Sun RPC. When a node joins the network, it invokes
connectMe() on the next node and the previous node. The next-node and previous-node
addresses are supplied as command-line arguments. When a node searches for a file, it invokes
void* search(Node n, char* filename)
{
If search is successful then
Return the result set
Else
return search(nextNode(n), filename);
}
Write the protocol file. Take the help of rpcgen. Develop rpcclient and rpcserver. The demonstration
should print all communications on the console, indicating the IP, port, file, etc.
ISZC462

Tutorial 2
EC1 solutions
Q1
• Write TCP client and server programs for the
following. The connection between client and
server is persistent, i.e. multiple requests are sent
on the same connection. The client sends N
integers to the server. The server sums them up
and sends the result back to the client. The server
handles clients concurrently. The server must also
avoid leaving zombie processes around. [10]
Q1 Ans
Protocol:
Client → server: 4 bytes: N, 4 bytes: 1st int, 4 bytes: 2nd int, … until last integer
Server → client: 4 bytes: result

/* client.c */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <netdb.h>
#include <sys/socket.h>
#include <netinet/in.h>

#define MAXN 63 /* most integers per request; buf also holds N itself */

void error(char *msg)
{
    perror(msg);
    exit(1);
}
int main(int argc, char *argv[])
{
    int sockfd, portno, n, i, N, result;
    int buf[MAXN + 1];
    struct sockaddr_in serv_addr;
    struct hostent *server;
    if (argc < 3) {
        fprintf(stderr,"usage %s hostname port\n", argv[0]);
        exit(1);
    }
    portno = atoi(argv[2]);
    sockfd = socket(AF_INET, SOCK_STREAM, 0);
    if (sockfd < 0)
        error("ERROR opening socket");
    server = gethostbyname(argv[1]);
    if (server == NULL) {
        fprintf(stderr,"ERROR, no such host\n");
        exit(1);
    }
Q1 Ans
    memset(&serv_addr, 0, sizeof(serv_addr));
    serv_addr.sin_family = AF_INET;
    memcpy(&serv_addr.sin_addr.s_addr, server->h_addr_list[0],
           server->h_length);
    serv_addr.sin_port = htons(portno);
    if (connect(sockfd, (struct sockaddr *) &serv_addr, sizeof(serv_addr)) < 0)
        error("ERROR connecting");
    /* Protocol implementation. Integers are sent in host byte order,
       which is what the server below expects. */
    printf("Enter number of integers:");
    while (scanf("%d", &N) == 1 && N > 0 && N <= MAXN) {
        buf[0] = N;
        for (i = 0; i < N; i++) {
            printf("Enter number %d:", i + 1);
            if (scanf("%d", &buf[i + 1]) != 1)
                error("ERROR reading input");
        }
        if (write(sockfd, buf, (N + 1) * 4) < 0)
            error("ERROR writing to socket");
        n = read(sockfd, &result, 4);
        if (n <= 0) {
            printf("Server terminated prematurely\n");
            break;
        }
        printf("The result is: %d\n", result);
        printf("Enter number of integers(-1 to exit):");
    }
    close(sockfd);
    return 0;
}
Q1 Ans
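/*server.c*/
#include <stdio.h>
#include <stdlib.h>
#include <strings.h>
#include <signal.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <sys/socket.h>
#include <netinet/in.h>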
void
error (char *msg)
{
perror (msg);
exit (1);
}
void
sigchldhandler (int signo)
{
int pid;
while ((pid = waitpid (-1, NULL, WNOHANG)) > 0);
}
int
main (int argc, char *argv[])
{
int ret, i, N, val, sum;
int sockfd, newsockfd, portno;
socklen_t clilen; /* the type accept() expects */
char buffer[256];
struct sockaddr_in serv_addr, cli_addr;
int n;
signal (SIGCHLD, sigchldhandler);
if (argc < 2)
{
fprintf (stderr, "ERROR, no port provided\n");
exit (1);
}
sockfd = socket (AF_INET, SOCK_STREAM, 0);
if (sockfd < 0)
error ("ERROR opening socket");
Q1 Ans
bzero ((char *) &serv_addr, sizeof (serv_addr));
portno = atoi (argv[1]);
serv_addr.sin_family = AF_INET;
serv_addr.sin_addr.s_addr = INADDR_ANY;
serv_addr.sin_port = htons (portno);
if (bind (sockfd, (struct sockaddr *) &serv_addr, sizeof (serv_addr)) < 0)
error ("ERROR on binding");
listen (sockfd, 5);
for (;;)
{
clilen = sizeof (cli_addr);
newsockfd = accept (sockfd, (struct sockaddr *) &cli_addr, &clilen);
if (newsockfd < 0)
error ("ERROR on accept");
printf ("connection is accepted\n");
Q1 Ans
ret = fork ();
if (ret == 0)
{
close (sockfd);
n = read (newsockfd, &N, 4);
printf ("N=%d\n", N);
while (n > 0)
{
i = 0;
sum = 0;
while (i < N)
{
n = read (newsockfd, &val, 4);
if (n < 0)
error ("ERROR reading from socket");
if (n == 0)
return 0; /* client closed the connection mid-request */
printf ("val[%d]=%d\n", i, val);
sum = sum + val;
i++;
}
printf ("sum=%d\n", sum);
n = write (newsockfd, &sum, 4);
if (n < 0)
error ("ERROR writing to socket");
n = read (newsockfd, &N, 4);
}
return 0;
}
else if (ret > 0)
{
close (newsockfd);
continue;
}
}
}
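One caveat about the 4-byte reads in this answer: on a TCP socket, read() may return fewer bytes than requested, and it can fail with EINTR when a signal such as SIGCHLD arrives. A common remedy is a loop that keeps reading until the full count has arrived. The helper below is a sketch of that idea; the name readn() is a conventional choice, not something the answer above defines.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Reads exactly n bytes from fd unless EOF or an error occurs first.
 * Returns the number of bytes read (less than n only at EOF), or -1. */
ssize_t readn(int fd, void *buf, size_t n)
{
    size_t left = n;
    char *p = buf;

    while (left > 0) {
        ssize_t r = read(fd, p, left);
        if (r < 0) {
            if (errno == EINTR)
                continue;        /* interrupted by a signal: retry */
            return -1;
        }
        if (r == 0)
            break;               /* EOF: peer closed the connection */
        p += r;
        left -= (size_t)r;
    }
    return (ssize_t)(n - left);
}
```

With this helper, each `read (newsockfd, &x, 4)` could become `readn (newsockfd, &x, 4)`, treating any return value other than 4 as end of connection or error.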
Q2
1.Write a complete program to implement the
shell command ls –l|grep ^d| wc –l that
displays the number of sub directories in
the current directory. Use system calls
such as exec etc. and pipes for inter
process communication. [8]
Q2 Ans
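#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* ls -l | grep ^d | wc -l
   p2 carries ls -> grep, p1 carries grep -> wc */
static void close_all (int *p1, int *p2)
{
    close (p1[0]); close (p1[1]); close (p2[0]); close (p2[1]);
}
int main ()
{
    int pid, p1[2], p2[2];
    if (pipe (p1) < 0 || pipe (p2) < 0)
    {
        perror ("pipe");
        exit (1);
    }
    pid = fork ();
    if (pid == 0)
    {
        pid = fork ();
        if (pid > 0)
        {   /* grep: stdin from p2, stdout to p1. No wait() before exec:
               waiting for ls first would deadlock once its output
               fills the pipe. */
            dup2 (p2[0], 0);
            dup2 (p1[1], 1);
            close_all (p1, p2);
            execlp ("grep", "grep", "^d", (char *) NULL);
        }
        else if (pid == 0)
        {   /* ls: stdout to p2 */
            dup2 (p2[1], 1);
            close_all (p1, p2);
            execlp ("ls", "ls", "-l", (char *) NULL);
        }
    }
    else if (pid > 0)
    {   /* wc: stdin from p1; unused pipe ends are closed so that
           each reader sees EOF when its writer exits */
        dup2 (p1[0], 0);
        close_all (p1, p2);
        execlp ("wc", "wc", "-l", (char *) NULL);
    }
    perror ("fork or exec failed"); /* reached only on failure */
    exit (1);
}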
Q3
What is a connected UDP socket? How is it created?
What are the advantages of using it?

A connected UDP socket is one for which the UDP layer remembers
the association of local and remote end points. By
default this does not happen in UDP. It is achieved
by calling connect() on the socket. The advantage
is that asynchronous errors from the network (e.g., ICMP port
unreachable) are reported to the process. Also, there can be only
one destination communicating with the socket: datagrams
from any other address are not delivered to it.
This provides some protection against spoofing.
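A minimal sketch of creating a connected UDP socket follows; udp_connect() is a hypothetical helper name, and the address is given as a dotted-decimal IPv4 string to keep the example short.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Creates a UDP socket connected to ip:port.
 * Returns the descriptor, or -1 on error. */
int udp_connect(const char *ip, unsigned short port)
{
    struct sockaddr_in peer;
    int fd;

    memset(&peer, 0, sizeof peer);
    peer.sin_family = AF_INET;
    peer.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &peer.sin_addr) != 1)
        return -1;               /* not a valid IPv4 address string */
    fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;
    /* No packets are exchanged here; the kernel only records the peer. */
    if (connect(fd, (struct sockaddr *)&peer, sizeof peer) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

After this call the process can use write() or send() without naming a destination, and a later read() or write() can fail with ECONNREFUSED if the peer port turns out to be closed.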
Q3
Normally whenever a socket is closed using close()
system call, TCP termination sequence is
initiated. In a concurrent TCP server, when a
server process closes the connection socket, the
TCP termination is not initiated. Why?

close() initiates the termination sequence only if the
reference count of the socket descriptor reaches zero.
When a new connection comes to the server, a child
process is created with fork(), which duplicates the descriptor.
So the connection descriptor reference count is 2. If the parent
closes the socket, it becomes 1, so the termination sequence
doesn’t start until the child also closes it.
Q3
Why is a signal generated for the writer of a FIFO
after the reader disappears not for the reader of
FIFO after its writer disappears?
cat Bigfile | grep pattern | compute
If some error occurs in compute and it terminates, how
does the grep process come to know about it? The
filter program grep doesn't know, and has no way of
knowing, that its output has been redirected. So the
only way to tell it to stop writing to a broken pipe if
‘compute’ crashes is with a signal (SIGPIPE), since return values of
writes to STDOUT are rarely checked. A reader needs no signal:
once the writer disappears, read() returns 0 (end of file),
which every reader must already handle.
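The writer's side can be demonstrated with a short sketch. Here SIGPIPE is ignored so that the failure shows up as an error code instead of killing the process; write_to_broken_pipe() is a name made up for this illustration.

```c
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/* Writes one byte to a pipe whose read end is already closed, with
 * SIGPIPE ignored. Returns the errno from write(), or 0 if the write
 * unexpectedly succeeded. */
int write_to_broken_pipe(void)
{
    int fds[2];
    int err = 0;

    if (pipe(fds) < 0)
        return errno;
    signal(SIGPIPE, SIG_IGN);   /* the default action would terminate us */
    close(fds[0]);              /* the reader "disappears" */
    if (write(fds[1], "x", 1) < 0)
        err = errno;            /* EPIPE when no reader remains */
    close(fds[1]);
    return err;
}
```

With the default disposition the write would never return: the process would be terminated by SIGPIPE, which is what stops grep in the pipeline above.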
Q3
Write two advantages of using message queues over pipes?

– Message queues preserve message boundaries,
whereas pipes are stream based.
– In message queues, messages can be retrieved
in any order (e.g., by message type). In pipes, data is invariably
retrieved in FIFO order.
– Message queues can be operated asynchronously,
whereas pipes are strictly synchronous.
– Message queues are full duplex, whereas
pipes are half duplex.
Q4
Write a program ‘myprogram’ that takes the executable
name and its arguments on the command line and
executes it. Don’t use system() command.
$ myprogram exe arg1 arg2 arg3 ……..argn

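#include <stdio.h>
#include <unistd.h>
int main(int argc, char **argv)
{
if (argc < 2) {
fprintf(stderr, "usage: %s exe [args...]\n", argv[0]);
return 1;
}
execvp(argv[1], argv+1);
perror("execvp"); /* reached only if execvp fails */
return 1;
}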
Q4
Write a piece of code that is necessary for creating and
mapping shared memory segment onto a process.

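key = ftok ("shmget.c", 'R');
if (key == -1)
{
perror ("ftok failed");
exit (1);
}
if ((shmid = shmget (key, 1024, 0644 | IPC_CREAT)) == -1)
{
perror ("shmget failed");
exit (1);
}
data = shmat (shmid, (void *) 0, 0);
if (data == (void *) -1)
{
perror ("shmat failed");
exit (1);
}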
Q5
Consider the following program.
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
int glob = 6;
int
main ()
{
int var;
pid_t pid;
var = 88;
if (!fork())
{
glob++;
var++;
printf ("Child: pid = %d, glob=%d, var=%d\n", getpid (), glob, var);
}
glob++;
var++;
printf ("pid = %d, glob=%d, var=%d\n", getpid (), glob, var);
exit (0);
}
Q5 Ans
Write the output of the above program? Assume appropriate
logical pids for parent and child.
[3]
Child: pid = 11710, glob=7, var=89
pid = 11710, glob=8, var=90
pid = 11709, glob=7, var=89
Q5 Ans
Modify the above program such that child starts printing only after
parent has printed.
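#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>
void usr1_handler(int signo){
return;
}
int glob = 6;
int
main ()
{
int var;
pid_t pid;
sigset_t block, old;
var = 88;
/* Install the handler and block SIGUSR1 before fork(), so the signal
cannot arrive before the child is ready to wait for it */
signal(SIGUSR1,usr1_handler);
sigemptyset(&block);
sigaddset(&block, SIGUSR1);
sigprocmask(SIG_BLOCK, &block, &old);
pid=fork();
if (pid==0)
{
glob++;
var++;
sigsuspend(&old); /* atomically unblock SIGUSR1 and wait for it */
printf ("pid = %d, glob=%d, var=%d\n", getpid (), glob, var);
}
if(pid>0)
{
glob++;
var++;
printf ("pid = %d, glob=%d, var=%d\n", getpid (), glob, var);
fflush(stdout); /* make sure the parent's line is out first */
kill(pid,SIGUSR1);
int st;
wait(&st);
}
exit (0);
}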
Q5 Ans
Modify the above program such that parent waits for the child to exit and prints the
child’s status.
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>
int glob = 6;
int
main ()
{
int var;
pid_t pid;
var = 88;
pid=fork();
if (pid==0)
{
glob++;
var++;
printf ("pid = %d, glob=%d, var=%d\n", getpid (), glob, var);
}
if(pid>0)
{
int st;
wait(&st); /* blocks until the child has exited */
if (WIFEXITED(st))
printf ("child exit status = %d\n", WEXITSTATUS(st));
glob++;
var++;
printf ("pid = %d, glob=%d, var=%d\n", getpid (), glob, var);
}
exit (0);
}
