www.jntuworld.com

KATRAGADDA INNOVATIVE TRUST FOR EDUCATION

NETWORK PROGRAMMING

Notes prepared by D. Teja Santosh, Assistant Professor, KPES, Shabad, R.R. District.

UNIT-I

Introduction and TCP/IP

INTRODUCTION

When writing programs that communicate across a computer network, one must first invent a protocol, an agreement on how those programs will communicate. Before delving into the design details of a protocol, high-level decisions must be made about which program is expected to initiate communication and when responses are expected. For example, a Web server is typically thought of as a long-running program (or daemon) that sends network messages only in response to requests coming in from the network. The other side of the protocol is a Web client, such as a browser, which always initiates communication with the server. This organization into client and server is used by most network-aware applications.

OSI Model

A common way to describe the layers in a network is to use the International Organization for Standardization (ISO) open systems interconnection (OSI) model for computer communications. This is a seven-layer model that maps approximately onto the layers of the Internet protocol suite.

The sockets programming interfaces described are interfaces from the upper three layers (the "application") into the transport layer. Why do sockets provide the interface from the upper three layers of the OSI model into the transport layer? There are two reasons for this design:

First, the upper three layers handle all the details of the application (FTP, Telnet, or HTTP, for example) and know little about the communication details. The lower four layers know little about the application, but handle all the communication details: sending data, waiting for acknowledgments, sequencing data that arrives out of order, calculating and verifying checksums, and so on. The second reason is that the upper three layers often form what is called a user process while the lower four layers are normally provided as part of the operating system (OS) kernel. Unix provides this separation between the user process and the kernel, as do many other contemporary operating systems. Therefore, the interface between layers 4 and 5 is the natural place to build the API.

APPLICATION LEVEL VIEW OF A SOCKET

KERNEL LEVEL VIEW OF A SOCKET (IPv4)

The Big Picture

IPv4 Internet Protocol version 4. IPv4, which we often denote as just IP, has been the workhorse protocol of the IP suite since the early 1980s. It uses 32-bit addresses. IPv4 provides packet delivery service for TCP, UDP, SCTP, ICMP, and IGMP.

IPv6 Internet Protocol version 6. IPv6 was designed in the mid-1990s as a replacement for IPv4. The major change is a larger address comprising 128 bits, to deal with the explosive growth of the Internet in the 1990s. IPv6 provides packet delivery service for TCP, UDP, SCTP, and ICMPv6. We often use the word "IP" as an adjective, as in IP layer and IP address, when the distinction between IPv4 and IPv6 is not needed.

TCP Transmission Control Protocol. TCP is a connection-oriented protocol that provides a reliable, full-duplex byte stream to its users. TCP sockets are an example of stream sockets. TCP takes care of details such as acknowledgments, timeouts, retransmissions, and the like. Most Internet application programs use TCP. Notice that TCP can use either IPv4 or IPv6.

UDP User Datagram Protocol. UDP is a connectionless protocol, and UDP sockets are an example of datagram sockets. There is no guarantee that UDP datagrams ever reach their intended destination. As with TCP, UDP can use either IPv4 or IPv6.

SCTP Stream Control Transmission Protocol. SCTP is a connection-oriented protocol that provides a reliable full-duplex association. The word "association" is used when referring to a connection in SCTP because SCTP is multihomed, involving a set of IP addresses and a single port for each side of an association. SCTP provides a message service, which maintains record boundaries. As with TCP and UDP, SCTP can use either IPv4 or IPv6, but it can also use both IPv4 and IPv6 simultaneously on the same association.

ICMP Internet Control Message Protocol. ICMP handles error and control information between routers and hosts. These messages are normally generated and processed by the TCP/IP networking software itself, not by user processes, although programs such as ping and traceroute also use ICMP. We sometimes refer to this protocol as ICMPv4 to distinguish it from ICMPv6.

IGMP Internet Group Management Protocol. IGMP is used with multicasting, which is optional with IPv4.

ARP Address Resolution Protocol. ARP maps an IPv4 address into a hardware address (such as an Ethernet address). ARP is normally used on broadcast networks such as Ethernet, token ring, and FDDI, and is not needed on point-to-point networks.

RARP Reverse Address Resolution Protocol. RARP maps a hardware address into an IPv4 address. It is sometimes used when a diskless node is booting.

ICMPv6 Internet Control Message Protocol version 6. ICMPv6 combines the functionality of ICMPv4, IGMP, and ARP.

BPF BSD packet filter. This interface provides access to the datalink layer. It is normally found on Berkeley-derived kernels.

DLPI Datalink provider interface. This interface also provides access to the datalink layer. It is normally provided with SVR4.

We use the terms "IPv4/IPv6 host" and "dual-stack host" to denote hosts that support both IPv4 and IPv6.

USER DATAGRAM PROTOCOL [UDP]:-

The User Datagram Protocol (UDP) provides a connectionless, unreliable transport service. Connectionless means that a communication session between hosts is not established before exchanging data. UDP is often used for communications that use broadcast or multicast Internet Protocol (IP) packets. The UDP connectionless packet delivery service is unreliable because it does not guarantee data packet delivery or send a notification if a packet is not delivered.

Because delivery of UDP packets is not guaranteed, applications that use this protocol must supply their own mechanisms for reliability if necessary. Although UDP appears to have some limitations, it is useful in certain situations.

Each UDP datagram has a length. The length of a datagram is passed to the receiving application along with the data.
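
The point about datagram lengths can be seen in a small experiment. The sketch below is only an illustration: the function name udp_datagram_lengths is our own, and it assumes a host with a working loopback interface (127.0.0.1). Two datagrams of 5 and 2 bytes are sent to a UDP socket, and each recvfrom call reports the length of exactly one datagram.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: sends two datagrams over loopback UDP and stores the
 * length that each recvfrom() reports. Returns 0 on success, -1 on error. */
int udp_datagram_lengths(ssize_t lens[2])
{
    int rc = -1;
    int rx = socket(AF_INET, SOCK_DGRAM, 0);   /* receiving socket */
    int tx = socket(AF_INET, SOCK_DGRAM, 0);   /* sending socket */
    struct sockaddr_in addr;
    socklen_t alen = sizeof(addr);
    char buf[64];

    if (rx < 0 || tx < 0)
        goto out;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(0);                  /* let the kernel pick a port */
    if (bind(rx, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        goto out;
    /* Ask the kernel which port it assigned. */
    if (getsockname(rx, (struct sockaddr *)&addr, &alen) < 0)
        goto out;

    if (sendto(tx, "hello", 5, 0, (struct sockaddr *)&addr, sizeof(addr)) != 5)
        goto out;
    if (sendto(tx, "hi", 2, 0, (struct sockaddr *)&addr, sizeof(addr)) != 2)
        goto out;

    /* Each call returns one whole datagram together with its length. */
    lens[0] = recvfrom(rx, buf, sizeof(buf), 0, NULL, NULL);
    lens[1] = recvfrom(rx, buf, sizeof(buf), 0, NULL, NULL);
    rc = (lens[0] < 0 || lens[1] < 0) ? -1 : 0;
out:
    if (rx >= 0)
        close(rx);
    if (tx >= 0)
        close(tx);
    return rc;
}
```

With a byte-stream protocol such as TCP the two writes could instead be returned by a single read of 7 bytes, because a byte stream has no record boundaries.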

TRANSMISSION CONTROL PROTOCOL [TCP]:-

Connection oriented: An application requests a "connection" to a destination and uses that connection to transfer data.

Point-to-point: A TCP connection has two endpoints (no broadcast/multicast).

Reliability: TCP delivers data without loss or duplication and detects transmission errors, retransmitting segments when necessary. If delivery becomes impossible, TCP reports the failure to the application.

Full duplex: Endpoints can exchange data in both directions simultaneously.

Delivering TCP: TCP segments travel in IP datagrams. Internet routers look only at the IP header to forward datagrams. Each segment contains a sequence number.

Flow Control: Flow control is necessary when one computer transmits data faster than another computer can receive it. It requires feedback from the receiving peer, which TCP provides by having the receiver advertise its available buffer space, i.e., the window.

TCP contains algorithms to estimate the round-trip time (RTT) between a client and server dynamically so that it knows how long to wait for an acknowledgment. For example, the RTT on a LAN can be milliseconds while across a WAN, it can be seconds. Furthermore, TCP continuously estimates the RTT of a given connection, because the RTT is affected by variations in the network traffic.
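
These notes do not say which estimation algorithm is meant. As one possible illustration, the sketch below follows the smoothing rule standardized in RFC 6298 (gains of 1/8 for the smoothed RTT and 1/4 for its deviation). The names rtt_estimator and rtt_update are our own, and the sketch leaves out the lower bound and clock-granularity terms that a real TCP applies to the timeout.

```c
/* Estimator state, in milliseconds. Hypothetical type for illustration. */
struct rtt_estimator {
    double srtt;        /* smoothed round-trip time */
    double rttvar;      /* smoothed mean deviation of the RTT */
    int    has_sample;  /* 0 until the first measurement arrives */
};

/* Feeds one RTT measurement (ms) into the estimator and returns the
 * resulting retransmission timeout (ms): SRTT + 4 * RTTVAR. */
double rtt_update(struct rtt_estimator *e, double sample_ms)
{
    if (!e->has_sample) {
        /* First measurement: SRTT = R, RTTVAR = R / 2. */
        e->srtt = sample_ms;
        e->rttvar = sample_ms / 2.0;
        e->has_sample = 1;
    } else {
        double err = e->srtt - sample_ms;
        if (err < 0)
            err = -err;
        /* The deviation is updated first, using the old SRTT. */
        e->rttvar = 0.75 * e->rttvar + 0.25 * err;
        e->srtt = 0.875 * e->srtt + 0.125 * sample_ms;
    }
    return e->srtt + 4.0 * e->rttvar;
}
```

Starting from an empty estimator, a first sample of 100 ms gives a timeout of 300 ms. A second sample of 200 ms moves the smoothed RTT to 112.5 ms and the deviation to 62.5 ms, giving a timeout of 362.5 ms. The estimate follows the traffic but does not jump to each new measurement.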

TCP Connection Establishment

Three-Way Handshake

The following scenario occurs when a TCP connection is established:

1. The server must be prepared to accept an incoming connection. This is normally done by calling socket, bind, and listen and is called a passive open.

2. The client issues an active open by calling connect. This causes the client TCP to send a "synchronize" (SYN) segment, which tells the server the client's initial sequence number for the data that the client will send on the connection. Normally, there is no data sent with the SYN; it just contains an IP header, a TCP header, and possibly TCP options.

3. The server must acknowledge (ACK) the client's SYN and the server must also send its own SYN containing the initial sequence number for the data that the server will send on the connection. The server sends its SYN and the ACK of the client's SYN in a single segment.

4. The client must acknowledge the server's SYN.

TCP Connection Termination

1. One application calls close first, and we say that this end performs the active close. This end's TCP sends a FIN segment, which means it is finished sending data.

2. The other end that receives the FIN performs the passive close. The received FIN is acknowledged by TCP. The receipt of the FIN is also passed to the application as an end-of-file (after any data that may have already been queued for the application to receive), since the receipt of the FIN means the application will not receive any additional data on the connection.

3. Sometime later, the application that received the end-of-file will close its socket. This causes its TCP to send a FIN.

4. The TCP on the system that receives this final FIN (the end that did the active close) acknowledges the FIN.

Since a FIN and an ACK are required in each direction, four segments are normally required. We use the qualifier "normally" because in some scenarios, the FIN in Step 1 is sent with data. Also, the segments in Steps 2 and 3 are both from the end performing the passive close and could be combined into one segment.

Importance of TIME_WAIT State:

Undoubtedly, one of the most misunderstood aspects of TCP with regard to network programming is its TIME_WAIT state. The end that performs the active close goes through this state. The duration that this endpoint remains in this state is twice the maximum segment lifetime (MSL), sometimes called 2MSL.

Every implementation of TCP must choose a value for the MSL. The recommended value in RFC 1122 [Braden 1989] is 2 minutes, although Berkeley-derived implementations have traditionally used a value of 30 seconds instead. This means the duration of the TIME_WAIT state is between 1 and 4 minutes. The MSL is the maximum amount of time that any given IP datagram can live in a network. We know this time is bounded because every datagram contains an 8-bit hop limit with a maximum value of 255. Although this is a hop limit and not a true time limit, the assumption is made that a packet with the maximum hop limit of 255 cannot exist in a network for more than MSL seconds. The way in which a packet gets "lost" in a network is usually the result of routing anomalies. A router crashes or a link between two routers goes down and it takes the routing protocols seconds or minutes to stabilize and find an alternate path. During that time period, routing loops can occur (router A sends packets to router B, and B sends them back to A) and packets can get caught in these loops. In the meantime, assuming the lost packet is a TCP segment, the sending TCP times out and retransmits the packet, and the retransmitted packet gets to the final destination by some alternate path. But sometime later (up to MSL seconds after the lost packet started on its journey), the routing loop is corrected and the packet that was lost in the loop is sent to the final destination. This original packet is called a lost duplicate or a wandering duplicate. TCP must handle these duplicates.

THE FOLLOWING INFORMATION HAS BEEN TAKEN FROM:

It should be noted that the exchange is really two independent exchanges and it is possible to close the connection in one direction but not the other. This is known as a half close. The following example (due to Stevens) demonstrates the use of the half-close.

Consider the Unix command

    rsh remote sort < datafile

The effect of this is that the local file datafile is sorted on the remote host and the results are transferred back to the local host. The data flow is shown in the following diagram.

The problem here is that the sort program on the remote host cannot start sorting until it has read all the data. The end of the data is signaled when the local host closes its sending direction of the connection, which the sort program sees as an EOF indication. However, the "back" direction must remain open for the return of the sorted data.

Stevens suggests that the library call shutdown() be used with sockets programming to achieve a half close.
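
A half close can be tried out without a network. The sketch below is an illustration under stated assumptions: it uses a connected pair of Unix-domain stream sockets from socketpair as a stand-in for a TCP connection, and the name half_close_demo is our own. One end sends its data and calls shutdown with SHUT_WR; the other end reads until end-of-file and only then sends its answer back over the direction that is still open.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical demo. Sends msg, half-closes, and collects the peer's reply
 * in out (not NUL-terminated). Returns the reply length, or -1 on error. */
ssize_t half_close_demo(const char *msg, char *out, size_t cap)
{
    int sv[2];
    char buf[256];
    size_t got = 0;
    ssize_t n, back = -1;

    if (strlen(msg) > sizeof(buf))
        return -1;
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    /* "Local host": send the data, then close only the sending direction. */
    if (write(sv[0], msg, strlen(msg)) != (ssize_t)strlen(msg))
        goto out;
    if (shutdown(sv[0], SHUT_WR) < 0)
        goto out;

    /* "Remote host": read until the end-of-file produced by the half close. */
    while ((n = read(sv[1], buf + got, sizeof(buf) - got)) > 0)
        got += (size_t)n;
    if (n < 0)
        goto out;

    /* The remote end can still send its results back, then closes. */
    if (write(sv[1], buf, got) != (ssize_t)got)
        goto out;
    close(sv[1]);
    sv[1] = -1;

    /* The local end can still receive although it can no longer send. */
    back = 0;
    while ((size_t)back < cap &&
           (n = read(sv[0], out + back, cap - (size_t)back)) > 0)
        back += n;
out:
    if (sv[1] >= 0)
        close(sv[1]);
    close(sv[0]);
    return back;
}
```

Had the local end called close instead of shutdown, its descriptor would be gone and the reply could not be read. This is the situation the rsh example has to avoid.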

Once the final ACK has been sent on an active close, the port/connection cannot be released and re-used for the time period 2MSL. This is twice the maximum segment lifetime, and the constraint is imposed in case the final ACK is lost. If the final ACK is lost, the passive closing host will time out awaiting an ACK in response to its closing FIN and will resend the FIN. If the resent FIN arrives before the 2MSL time has expired, there is no problem, because the active closer still holds the connection state and can resend the ACK. After this time, the FIN would not appear to belong to any connection that might exist between the two hosts.

RFC 793 defines MSL (Maximum Segment Lifetime) as 120 seconds, but some implementations use 30 or 60 seconds. It is, basically, the maximum time for which it is reasonable to wait for a segment: if a segment doesn't reach its destination within MSL, it probably won't get there at all, and it can be assumed to have been lost.

There are two reasons for the TIME_WAIT state:

1. To implement TCP's full-duplex connection termination reliably

2. To allow old duplicate segments to expire in the network

The first reason can be explained by assuming that the final ACK is lost. The server will resend its final FIN, so the client must maintain state information, allowing it to resend the final ACK. If it did not maintain this information, it would respond with an RST (a different type of TCP segment), which would be interpreted by the server as an error. If TCP is performing all the work necessary to terminate both directions of data flow cleanly for a connection (its full-duplex close), then it must correctly handle the loss of any of these four segments. This example also shows why the end that performs the active close is the end that remains in the TIME_WAIT state: because that end is the one that might have to retransmit the final ACK.

To understand the second reason for the TIME_WAIT state, assume we have a TCP connection between 12.106.32.254 port 1500 and 206.168.112.219 port 21. This connection is closed and then sometime later, we establish another connection between the same IP addresses and ports: 12.106.32.254 port 1500 and 206.168.112.219 port 21. This latter connection is called an incarnation of the previous connection since the IP addresses and ports are the same. TCP must prevent old duplicates from a connection from reappearing at some later time and being misinterpreted as belonging to a new incarnation of the same connection. To do this, TCP will not initiate a new incarnation of a connection that is currently in the TIME_WAIT state. Since the duration of the TIME_WAIT state is twice the MSL, this allows MSL seconds for a packet in one direction to be lost, and another MSL seconds for the reply to be lost. By enforcing this rule, we are guaranteed that when we successfully establish a TCP connection, all old duplicates from previous incarnations of the connection have expired in the network.

USEFUL LINKS FOR TIME_WAIT IMPORTANCE:

Port Numbers

ALLOCATION OF PORT NUMBERS

INTRODUCTION TO CONCURRENT SERVERS:

SOCKETPAIR:

The socket pair for a TCP connection is the four-tuple that defines the two endpoints of the connection: the local IP address, local port, foreign IP address, and foreign port. A socket pair uniquely identifies every TCP connection on a network.
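
The four-tuple can be read back from a connected socket with getsockname (local address and port) and getpeername (foreign address and port). The sketch below is an illustration: the type socket_pair and both function names are our own, only IPv4 is handled, and the test connection assumes a working loopback interface.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical container for the four-tuple of one endpoint. */
struct socket_pair {
    struct sockaddr_in local;    /* local IP address and local port */
    struct sockaddr_in foreign;  /* foreign IP address and foreign port */
};

/* Fills sp with the socket pair of a connected IPv4 TCP socket. */
int get_socket_pair(int fd, struct socket_pair *sp)
{
    socklen_t len = sizeof(sp->local);
    if (getsockname(fd, (struct sockaddr *)&sp->local, &len) < 0)
        return -1;
    len = sizeof(sp->foreign);
    if (getpeername(fd, (struct sockaddr *)&sp->foreign, &len) < 0)
        return -1;
    return 0;
}

/* Builds one TCP connection over loopback so that there is something to
 * inspect. On success returns 0 and sets *cli and *srv to the two ends. */
int loopback_connection(int *cli, int *srv)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    int lfd = socket(AF_INET, SOCK_STREAM, 0);

    *cli = *srv = -1;
    if (lfd < 0)
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(0);                 /* ephemeral port */
    if (bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(lfd, 1) < 0 ||
        getsockname(lfd, (struct sockaddr *)&addr, &len) < 0)
        goto fail;
    *cli = socket(AF_INET, SOCK_STREAM, 0);
    if (*cli < 0 || connect(*cli, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        goto fail;
    *srv = accept(lfd, NULL, NULL);
    if (*srv < 0)
        goto fail;
    close(lfd);
    return 0;
fail:
    if (*cli >= 0)
        close(*cli);
    close(lfd);
    *cli = -1;
    return -1;
}
```

Calling get_socket_pair on both ends of the same connection shows the same four values with the roles swapped: the client's local port is the server's foreign port, and the reverse.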

NOTE: FOR MORE INFORMATION ABOUT FIRST 6 UNITS, PLEASE GO THROUGH THE FOLLOWING LINK:

UNIT-II

Socket Address Structure

Most socket functions require a pointer to a socket address structure as an argument. Each supported protocol suite defines its own socket address structure.

IPv4 Socket Address Structure (SAS)

An IPv4 socket address structure, commonly called an "Internet socket address structure," is named sockaddr_in and is defined by including the <netinet/in.h> header. The POSIX definition of the IPv4 SAS is shown below:

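struct in_addr {
    in_addr_t s_addr;           /* 32-bit IPv4 address, network byte order */
};

struct sockaddr_in {
    uint8_t        sin_len;     /* length of structure (16); not required by POSIX, absent on some systems such as Linux */
    sa_family_t    sin_family;  /* AF_INET */
    in_port_t      sin_port;    /* 16-bit TCP or UDP port number, network byte order */
    struct in_addr sin_addr;    /* 32-bit IPv4 address, network byte order */
    char           sin_zero[8]; /* unused */
};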

The diagrammatical representation of IPV4 SAS is:

Datatype, Description and Header File of IPV4 SAS Members

IMP NOTE: The 32-bit IPv4 address can be accessed in two different ways. For example, if serv is defined as an Internet socket address structure, then serv.sin_addr references the 32-bit IPv4 address as an in_addr structure, while serv.sin_addr.s_addr references the same 32-bit IPv4 address as an in_addr_t (typically an unsigned 32-bit integer). We must be certain that we are referencing the IPv4 address correctly, especially when it is used as an argument to a function, because compilers often pass structures differently from integers. Socket address structures are used only on a given host: The structure itself is not communicated between different hosts, although certain fields (e.g., the IP address and port) are used for communication.
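
The two ways of referencing the address can be put side by side. In the sketch below all three function names are our own and exist only for illustration. The structure is first zeroed and filled in, and the same address is then handed to one function that expects an in_addr structure and to another that expects an in_addr_t integer.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>

/* Hypothetical helper: builds an IPv4 socket address for 127.0.0.1 and the
 * given port (host byte order on input, network byte order when stored). */
struct sockaddr_in make_loopback_addr(unsigned short port)
{
    struct sockaddr_in serv;

    memset(&serv, 0, sizeof(serv));            /* also clears sin_zero */
    serv.sin_family = AF_INET;
    serv.sin_port = htons(port);
    serv.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    return serv;
}

/* Takes the address as a structure: call with serv.sin_addr. */
in_addr_t addr_from_struct(struct in_addr a)
{
    return a.s_addr;
}

/* Takes the address as a 32-bit integer: call with serv.sin_addr.s_addr. */
in_addr_t addr_from_integer(in_addr_t a)
{
    return a;
}
```

Both calls see the same 32 bits. Only the declared parameter type differs, and passing serv.sin_addr where an in_addr_t is expected (or the reverse) is a type error that the compiler should reject.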

Value-Result Arguments

Three functions, bind, connect, and sendto, pass a socket address structure from the process to the kernel. One argument to these three functions is the pointer to the socket address structure and another argument is the integer size of the structure. Since the kernel is passed both the pointer and the size of what the pointer points to, it knows exactly how much data to copy from the process into the kernel.

Four functions, accept, recvfrom, getsockname, and getpeername, pass a socket address structure from the kernel to the process, the reverse direction from the previous scenario. Two of the arguments to these four functions are the pointer to the socket address structure along with a pointer to an integer containing the size of the structure.

The size argument changes from an integer to a pointer to an integer because the size is both a value when the function is called (it tells the kernel the size of the structure so that the kernel does not write past the end of the structure when filling it in) and a result when the function returns (it tells the process how much information the kernel actually stored in the structure). This type of argument is called a value-result argument.
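
Both directions appear in the following sketch, in which the function name query_bound_port is our own. The call to bind passes the size by value, while getsockname takes a pointer to the size, which must be initialized before the call and holds the result after it.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: binds a TCP socket to an ephemeral port and asks the
 * kernel which port it chose. Returns the port in host byte order, or -1.
 * *len_out receives the size that the kernel reported back. */
int query_bound_port(socklen_t *len_out)
{
    struct sockaddr_in addr;
    socklen_t len;
    int port = -1;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(0);                  /* kernel chooses the port */

    /* Process to kernel: the size is passed by value. */
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) == 0) {
        /* Kernel to process: the size is passed by reference. */
        len = sizeof(addr);                    /* value: room available */
        if (getsockname(fd, (struct sockaddr *)&addr, &len) == 0) {
            *len_out = len;                    /* result: bytes stored */
            port = ntohs(addr.sin_port);
        }
    }
    close(fd);
    return port;
}
```

Forgetting to set len before the call is a common mistake: the kernel then sees whatever value happens to be in the variable and may truncate the address or report an error.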

Byte Ordering Functions

Consider a 16-bit integer that is made up of 2 bytes. There are two ways to store the two bytes in memory: with the low-order byte at the starting address, known as little-endian byte order, or with the high-order byte at the starting address, known as big-endian byte order.
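
A short function can report which of the two orders the host uses. This is a minimal sketch with a name of our own choosing (host_is_little_endian): it stores a known 16-bit value and looks at the byte that ends up at the starting address.

```c
#include <stdint.h>

/* Returns 1 if the host stores the low-order byte at the starting address
 * (little-endian), 0 if it stores the high-order byte there (big-endian). */
int host_is_little_endian(void)
{
    union {
        uint16_t      value;
        unsigned char bytes[2];
    } probe;

    probe.value = 0x0102;          /* high-order byte 0x01, low-order 0x02 */
    return probe.bytes[0] == 0x02;
}
```

On x86 and most ARM configurations this returns 1. Portable programs should not depend on the answer; they should convert values explicitly whenever they cross the network boundary.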

Network Byte Order – Big Endian Byte Order
Host Byte Order – Big Endian or Little Endian Byte Order

We must deal with these byte ordering differences as network programmers because networking protocols must specify a network byte order. For example, a TCP header carries 16-bit port numbers and an IPv4 header carries 32-bit IPv4 addresses. The sending protocol stack and the receiving protocol stack must agree on the order in which the bytes of these multibyte fields will be transmitted. The Internet protocols use big-endian byte ordering for these multibyte integers.

In theory, an implementation could store the fields in a socket address structure in host byte order and then convert to and from the network byte order when moving the fields to and from the protocol headers, saving us from having to worry about this detail. But both history and the POSIX specification say that certain fields in the socket address structures must be maintained in network byte order. Our concern is therefore converting between host byte order and network byte order. We use the following four functions to convert between these two byte orders: htons, htonl, ntohs, and ntohl.

In the names of these functions, h stands for host, n stands for network, s stands for short, and l stands for long. The terms "short" and "long" are historical artifacts from the Digital VAX implementation of 4.2BSD. We should instead think of s as a 16-bit value (such as a TCP or UDP port number) and l as a 32-bit value (such as an IPv4 address). Indeed, on the 64-bit Digital Alpha, a long integer occupies 64 bits, yet the htonl and ntohl functions operate on 32-bit values. NOTE: These functions are used exclusively for data functionality between sockets (storage).
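
The effect of the conversions is easiest to see by looking at the bytes in memory afterwards. In the sketch below the two function names are our own. Whatever the host byte order, the first byte in memory after htons or htonl is the high-order byte of the original value.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: stores in out[] the bytes of a 16-bit port number in
 * the order in which they are laid out after conversion with htons(). */
void port_wire_bytes(uint16_t host_port, unsigned char out[2])
{
    uint16_t net = htons(host_port);   /* host to network, "short" */
    memcpy(out, &net, sizeof(net));
}

/* The same for a 32-bit IPv4 address, using htonl(). */
void addr_wire_bytes(uint32_t host_addr, unsigned char out[4])
{
    uint32_t net = htonl(host_addr);   /* host to network, "long" */
    memcpy(out, &net, sizeof(net));
}
```

For the port 0x1234 the bytes are 0x12 then 0x34, and for the address 0xC0A80001 (192.168.0.1) they are 0xC0, 0xA8, 0x00, 0x01. ntohs and ntohl undo the conversion, so ntohs(htons(x)) always yields x again.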

Byte Manipulation Functions

There are two groups of functions that operate on multibyte fields, without interpreting the data, and without assuming that the data is a null-terminated C string. We need these types of functions when dealing with socket address structures because we need to manipulate fields such as IP addresses, which can contain bytes of 0, but are not C character strings.

The first group of functions, whose names begin with b (for byte), are from 4.2BSD and are still provided by almost any system that supports the socket functions. The second group of functions, whose names begin with mem (for memory), are from the ANSI C standard and are provided with any system that supports an ANSI C library.
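
The sketch below applies the mem functions to a 4-byte IPv4 address that contains zero bytes, with the corresponding 4.2BSD call noted in a comment. The three wrapper names are our own and exist only for illustration.

```c
#include <string.h>

/* Returns 1 if two 4-byte IPv4 addresses are equal, 0 otherwise. */
int ipv4_bytes_equal(const unsigned char a[4], const unsigned char b[4])
{
    return memcmp(a, b, 4) == 0;   /* 4.2BSD form: bcmp(a, b, 4) == 0 */
}

/* Copies all 4 bytes, including any 0 bytes. A string function such as
 * strcpy() would stop at the first 0 byte. */
void ipv4_bytes_copy(unsigned char dest[4], const unsigned char src[4])
{
    memcpy(dest, src, 4);          /* 4.2BSD form: bcopy(src, dest, 4) */
}

/* Sets all 4 bytes to 0. */
void ipv4_bytes_clear(unsigned char a[4])
{
    memset(a, 0, 4);               /* 4.2BSD form: bzero(a, 4) */
}
```

Note the argument order: memcpy takes the destination first, while bcopy takes the source first. Swapping them is an easy mistake to make when moving between the two groups.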

src might represent application space and dest might represent socket send buffer space (socket receive buffer space).

inet_aton, inet_addr, and inet_ntoa Functions

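The following functions convert IPv4 addresses between dotted-decimal text (for example, "206.168.112.96") and the 32-bit network-byte-order binary value stored in a socket address structure. These functions work only with IPv4.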
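
A round trip through inet_aton and inet_ntoa might look as follows. This is a sketch: the function name ipv4_round_trip is our own, and the feature-test macro on the first line is an assumption about glibc, where it makes inet_aton visible under strict compiler modes.

```c
#define _DEFAULT_SOURCE            /* glibc: expose inet_aton() declaration */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>

/* Hypothetical helper: parses dotted-decimal text into binary form and
 * formats it back into out. Returns 0 on success, -1 if text is not a
 * valid IPv4 address. */
int ipv4_round_trip(const char *text, char *out, size_t cap)
{
    struct in_addr addr;

    /* inet_aton() returns 1 for a valid string and 0 for an invalid one. */
    if (inet_aton(text, &addr) == 0)
        return -1;

    /* inet_ntoa() returns a pointer to a static buffer that the next call
     * overwrites, so the result is copied out immediately. */
    snprintf(out, cap, "%s", inet_ntoa(addr));
    return 0;
}
```

inet_addr performs the same conversion as inet_aton but reports errors by returning INADDR_NONE, which has the same bit pattern as the valid address 255.255.255.255. For this reason inet_aton is usually the better choice of the two.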

inet_pton and inet_ntop Functions

The following functions perform the same conversion between text and binary form for IPv6 addresses. They can also be used for IPv4 addresses (the 'family' argument specifies this).
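
The role of the family argument can be seen in a small sketch, where the function name addr_round_trip is our own. The same two calls handle AF_INET and AF_INET6, and only the family value and the buffer sizes change.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stddef.h>
#include <sys/socket.h>

/* Hypothetical helper: converts presentation text to binary and back for the
 * given family (AF_INET or AF_INET6). Returns 0 on success, -1 on failure. */
int addr_round_trip(int family, const char *text, char *out, socklen_t cap)
{
    /* Large enough for the binary form of either address family. */
    unsigned char bin[sizeof(struct in6_addr)];

    /* inet_pton() returns 1 on success, 0 if text is not a valid address
     * for this family, and -1 if the family itself is unsupported. */
    if (inet_pton(family, text, bin) != 1)
        return -1;

    /* inet_ntop() returns NULL if the output buffer is too small. */
    if (inet_ntop(family, bin, out, cap) == NULL)
        return -1;
    return 0;
}
```

A buffer of INET6_ADDRSTRLEN bytes is large enough for the text form of either family. An IPv6 string such as "::1" is rejected when the family is AF_INET, which shows that the family argument, not the text, decides how the input is parsed.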

sock_ntop Function

A basic problem with inet_ntop is that it requires the caller to pass a pointer to a binary address. This address is normally contained in a socket address structure, requiring the caller to know the format of the structure and the address family.

To solve this problem, sock_ntop() is used, which takes a pointer to a socket address structure as an argument, calls the appropriate function, and returns the presentation address.
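
sock_ntop is not a system function, so it has to be written by the programmer. The version below is only a sketch of the idea under our own assumptions: it is named sock_ntop_sketch to avoid suggesting it is the original, it writes into a buffer supplied by the caller, it handles only AF_INET and AF_INET6, and the "address:port" output format is our choice.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Formats the address held in sa as text in out and returns out, or NULL
 * for an unsupported family or a conversion error. */
char *sock_ntop_sketch(const struct sockaddr *sa, char *out, size_t cap)
{
    char host[INET6_ADDRSTRLEN];

    /* The family field tells us which concrete structure sa points to. */
    if (sa->sa_family == AF_INET) {
        const struct sockaddr_in *sin = (const struct sockaddr_in *)sa;
        if (inet_ntop(AF_INET, &sin->sin_addr, host, sizeof(host)) == NULL)
            return NULL;
        snprintf(out, cap, "%s:%u", host, (unsigned)ntohs(sin->sin_port));
        return out;
    }
    if (sa->sa_family == AF_INET6) {
        const struct sockaddr_in6 *sin6 = (const struct sockaddr_in6 *)sa;
        if (inet_ntop(AF_INET6, &sin6->sin6_addr, host, sizeof(host)) == NULL)
            return NULL;
        snprintf(out, cap, "[%s]:%u", host, (unsigned)ntohs(sin6->sin6_port));
        return out;
    }
    return NULL;
}
```

The caller passes whatever socket address structure it received, for example from accept or getpeername, and does not need to know the address family.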

readn, writen, and readline Functions

Stream sockets (e.g., TCP sockets) exhibit a behavior with the read and write functions that differs from normal file I/O. A read or write on a stream socket might input or output fewer bytes than requested, but this is not an error condition. The reason is that buffer limits might be reached for the socket in the kernel. All that is required to input or output the remaining bytes is for the caller to invoke the read or write function again. Some versions of Unix also exhibit this behavior when writing more than 4,096 bytes to a pipe. This scenario is always a possibility on a stream socket with read, but is normally seen with write only if the socket is nonblocking. Nevertheless, we always call our writen function instead of write, in case the implementation returns a short count.

The following functions overcome this problem.

Elementary TCP Sockets

Socket functions for elementary TCP client/server

Socket:

socket (af, type, protocol);

Creates a socket on demand (placing it in an unconnected state) and returns an integer identifying the socket (the descriptor). The arguments specify:

Address Family (af) - the protocol family the socket belongs to (for example, AF_INET for IPv4).
Type - type of communication socket:
    SOCK_STREAM - connection-oriented
    SOCK_DGRAM - connection-less
    SOCK_RAW - access to low-level protocols or network interfaces
Protocol - accommodates multiple protocols within a family.

Bind:

bind (socket, localaddr, addrlen);

A socket is created without any association to local or destination addresses, so a program uses bind to establish a local address for it.

Socket - integer descriptor of the socket.
Localaddr - structure that specifies the local address to be bound.
Addrlen - integer length of the address (in bytes).

Listen:

listen (socket, qlength);
The server creates a socket, binds it to a well-known port, and waits for requests. To avoid rejecting service requests that cannot be handled immediately, the server uses listen to create a queue of pending requests and then listen for incoming connections (passive mode). listen works only with sockets that use a reliable stream service.
Socket - integer descriptor of the socket.
Qlength - length of the request queue for that socket (historically at most 5 on BSD-derived systems; current systems allow larger values).

Connect:

connect (socket, destaddr, addrlen);
Binds a permanent destination to a socket, placing it in the connected state. Sockets using a connectionless service do not have to use connect (they can specify the address in every datagram), but they may.
Socket - socket descriptor.
Destaddr - sockaddr structure (which also includes the protocol port number) specifying the destination address.
Addrlen - length of the destination address (in bytes).


Accept:

accept (socket, addr, addrlen);
bind associates a socket with a port, but that socket is not connected to a foreign destination. When a request comes in, accept establishes the full connection. It blocks until a connection request arrives.
Addr - pointer to the sockaddr structure that receives the client's address.
Addrlen - pointer to an integer holding the size of the address.

Close: (a system call from the traditional UNIX environment)

close (socket descriptor);
When a client or server finishes with a socket, it calls close to deallocate its resources. close decrements the socket's reference count, and the socket is closed completely (terminating the connection) when the count reaches 0. The connection therefore terminates immediately unless several processes share the same socket.

Order of Socket System Calls:

Client Side (depends on connection type):

1. Socket
2. Connect
3. Write (may be repeated)
4. Read (may be repeated)
5. Close

Server Side (depends on connection type):

1. Socket
2. Bind
3. Listen
4. Accept
5. Read (may be repeated)
6. Write (may be repeated)
7. Close (go back to Accept)
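The call order above can be sketched in C as follows. This is a minimal illustration under stated assumptions, not a complete server: the helper names make_listener, connect_loopback, and loopback_addr are hypothetical, only IPv4 on the loopback address 127.0.0.1 is handled, and error handling is reduced to returning -1.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Build an IPv4 loopback (127.0.0.1) address for the given port. */
static struct sockaddr_in loopback_addr(uint16_t port)
{
    struct sockaddr_in addr;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);
    return addr;
}

/* Server side: Socket, Bind, Listen. Returns the listening descriptor or -1.
 * A port of 0 asks the kernel to choose a free port. */
int make_listener(uint16_t port, int qlength)
{
    struct sockaddr_in addr = loopback_addr(port);
    int fd;

    assert(qlength > 0);
    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, qlength) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Client side: Socket, Connect. Returns the connected descriptor or -1. */
int connect_loopback(uint16_t port)
{
    struct sockaddr_in addr = loopback_addr(port);
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;
    if (connect(fd, (struct sockaddr *)&addr, sizeof addr) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

After make_listener returns, the server would call accept on the listening descriptor, then read and write on the descriptor that accept returns, and finally close it before going back to accept.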


Shutdown:

shutdown (socket, direction);

The shutdown function applies to full-duplex sockets (such as a connected TCP socket) and is used to partially close the connection.
Socket - socket descriptor of a connected socket.
Direction - direction in which shutdown is desired:

0 = terminate further input (SHUT_RD).

1 = terminate further output (SHUT_WR).

2 = terminate both input and output (SHUT_RDWR). Unlike close, this does not release the descriptor.
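A common use of a partial close is to tell the peer "I have nothing more to send" while still waiting for its reply. The sketch below assumes a connected stream socket; the function name send_and_half_close is hypothetical.

```c
#include <assert.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Send a final message, then close only the write half of the connection.
 * The peer sees EOF after the message, but we can still read its reply.
 * Returns 0 on success, -1 on error. */
int send_and_half_close(int sockfd, const void *msg, size_t len)
{
    assert(msg != NULL);
    if (write(sockfd, msg, len) != (ssize_t)len)
        return -1;
    return shutdown(sockfd, SHUT_WR);   /* direction 1: no further output */
}
```

After this call the descriptor is still open: the caller keeps reading until read returns 0 and only then calls close.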

IMPORTANT NOTES:

File and Socket Descriptors:

A socket generalizes the UNIX file access mechanism and provides an endpoint for communication. Descriptors (maintained in the descriptor tables) are kept per process by the operating system to point to internal data structures for files and sockets. Descriptors are small integer values.

File Descriptor:
Bound to a file when open is called.

Socket Descriptor:
Created using socket, which does not bind it to a destination.
Unconnected - a UDP socket can specify the destination with every datagram.
Connected - a TCP socket specifies the destination once, in the connect call.


After a socket has been created (using socket), additional system calls are required to specify the details of its use.
Passive Socket - used by a server to wait for calls.
Active Socket - used by a client to initiate a connection.

Basic I/O Functions in UNIX:

UNIX and other operating systems provide a basic set of system functions used for I/O operations on files and other devices. Most operating systems provide similar variations to the five standard I/O operations that BSD UNIX uses.

I/O Functions:

Open - prepare for input / output.
Close - terminate the use of a device.
Write - transfer data from memory to an output device.
Read - transfer data from an input device to memory.
Lseek - move the current position within a file or device (for example, to a specific place on a disk).

The Socket Interface:

The Berkeley socket interface provides generalized functions that support network communication using many possible protocols.

Socket calls refer to all TCP/IP protocols as a single protocol family (protocol suite). The calls allow a programmer to specify the type of service required, rather than the name of a specific protocol.

The socket interface was created because no API (application program interface) for network communication is standardized by the protocols themselves; its design lies outside the scope of a protocol suite.


Concurrent Servers


getsockname and getpeername Functions

These two functions return either the local protocol address associated with a socket (getsockname) or the foreign protocol address associated with a socket (getpeername).

#include <sys/socket.h>

int getsockname(int sockfd, struct sockaddr *localaddr, socklen_t *addrlen);
int getpeername(int sockfd, struct sockaddr *peeraddr, socklen_t *addrlen);

Both return: 0 if OK, -1 on error

Notice that the final argument for both functions is a value-result argument. That is, both functions fill in the socket address structure pointed to by localaddr or peeraddr. We mentioned in our discussion of bind that the term "name" is misleading. These two functions return the protocol address associated with one of the two ends of a network connection, which for IPv4 and IPv6 is the combination of an IP address and port number. These functions have nothing to do with domain names.


These two functions are required for the following reasons:

After connect successfully returns in a TCP client that does not call bind, getsockname returns the local IP address and local port number assigned to the connection by the kernel.

After calling bind with a port number of 0 (telling the kernel to choose the local port number), getsockname returns the local port number that was assigned.

getsockname can be called to obtain the address family of a socket.

In a TCP server that binds the wildcard IP address, once a connection is established with a client (accept returns successfully), the server can call getsockname to obtain the local IP address assigned to the connection. The socket descriptor argument in this call must be that of the connected socket, and not the listening socket.

When a server is execed by the process that calls accept, the only way the server can obtain the identity of the client is to call getpeername.
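As an illustration of the cases listed above, the following sketch formats either end of an IPv4 socket as "ip:port". The helper name endpoint_to_string is hypothetical, and only AF_INET sockets are handled; note how the length argument is initialized before the call because it is a value-result argument.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Write "ip:port" for one end of an IPv4 socket into buf.
 * peer == 0 selects the local address (getsockname);
 * any other value selects the foreign address (getpeername).
 * Returns 0 on success, -1 on error (for example, socket not connected). */
int endpoint_to_string(int sockfd, int peer, char *buf, size_t buflen)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;    /* value-result: size in, length out */
    char ip[INET_ADDRSTRLEN];
    int rc;

    assert(buf != NULL && buflen > 0);
    memset(&addr, 0, sizeof addr);
    rc = peer ? getpeername(sockfd, (struct sockaddr *)&addr, &len)
              : getsockname(sockfd, (struct sockaddr *)&addr, &len);
    if (rc < 0 || addr.sin_family != AF_INET)
        return -1;
    if (inet_ntop(AF_INET, &addr.sin_addr, ip, sizeof ip) == NULL)
        return -1;
    snprintf(buf, buflen, "%s:%u", ip, (unsigned)ntohs(addr.sin_port));
    return 0;
}
```

Calling it with peer set to 0 right after bind with port 0 reveals the port the kernel chose; calling it with peer set to 1 on a connected socket identifies the client.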


UNIT-III

TCP Client/Server Example

Introduction

Our simple example is an echo server that performs the following steps:

1. The client reads a line of text from its standard input and writes the line to the server.

2. The server reads the line from its network input and echoes the line back to the client.

3. The client reads the echoed line and prints it on its standard output.


Normal Startup (with respect to the socket pair)

To initiate communication between the client and the server, we first start the server, which calls socket(). The socket pair at the server is written as

SP = (IPs:Ps , IPc:Pc)

where
IPs - IP address of the server
Ps - port number of the server
IPc - IP address of the client
Pc - port number of the client

Next comes bind(), after which SP = (localhost:33600 , IPc:Pc). Then comes listen(), after which SP is still (localhost:33600 , IPc:Pc). [The wildcard character '*' may be written for IPs, IPc, and Pc when they are not yet known.]

So, at the server the status is "Passive Open" and the format is:

Server
socket() - SP = (IPs:Ps , IPc:Pc)
bind() - SP = (localhost:33600 , IPc:Pc)
listen() - SP = (localhost:33600 , IPc:Pc) or (*:33600 , *:*)

Now the client requests a connection with the server. It first calls socket(), and its socket pair is

SP = (IPc:Pc , IPs:Ps)

So, at the client side, the status is "Active Open". The connection is then established (the TCP three-way handshake completes) as the client's connect() and the server's accept() return:

At the client, the call is connect() and SP = (localhost:33597 , x.y.z.w:33600)
At the server, the call is accept() and SP = (localhost:33600 , a.b.c.d:33597)

The format is:

Client
socket() - SP = (IPc:Pc , IPs:Ps)
connect() - SP = (localhost:33597 , x.y.z.w:33600)

Server
accept() - SP = (localhost:33600 , a.b.c.d:33597)

At this point, the normal startup of the client and server is complete.

The following steps take place with our Client/Server example:

1. The client calls str_cli, which will block in the call to fgets, because we have not typed a line of input yet.

2. When accept returns in the server, it calls fork and the child calls str_echo. This function calls readline, which calls read, which blocks while waiting for a line to be sent from the client.

3. The server parent, on the other hand, calls accept again, and blocks while waiting for the next client connection.
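The server side of these steps can be sketched as below. This is an illustration under assumptions rather than the course's listing: str_echo here uses plain read and write instead of readline, the name spawn_echo_child is hypothetical, and terminated children are not reaped in this fragment.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Child's work: echo everything received on sockfd until the client closes. */
static void str_echo(int sockfd)
{
    char buf[4096];

    for (;;) {
        ssize_t n = read(sockfd, buf, sizeof buf);
        if (n < 0 && errno == EINTR)
            continue;
        if (n <= 0)
            return;                     /* EOF (client closed) or error */
        for (ssize_t off = 0; off < n; ) {
            ssize_t w = write(sockfd, buf + off, (size_t)(n - off));
            if (w < 0 && errno == EINTR)
                continue;
            if (w <= 0)
                return;
            off += w;
        }
    }
}

/* Fork a child that serves connfd. The child closes its copy of the
 * listening socket (pass -1 if there is none); the parent closes its copy
 * of the connected socket. Returns the child's pid, or -1 if fork failed. */
pid_t spawn_echo_child(int listenfd, int connfd)
{
    pid_t pid;

    assert(connfd >= 0);
    pid = fork();
    if (pid == 0) {                     /* child */
        if (listenfd >= 0)
            close(listenfd);
        str_echo(connfd);
        _exit(0);
    }
    close(connfd);                      /* parent no longer needs it */
    return pid;
}

/* Parent's loop: block in accept, hand each new connection to a child. */
void concurrent_server_loop(int listenfd)
{
    for (;;) {
        int connfd = accept(listenfd, NULL, NULL);
        if (connfd < 0) {
            if (errno == EINTR)
                continue;
            return;
        }
        spawn_echo_child(listenfd, connfd);
    }
}
```

Because the parent closes connfd immediately, the connection stays open only through the child's copy of the descriptor, which is what lets the parent go straight back to accept.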


Normal Termination

We can follow through the steps involved in the normal termination of our client and server:

1. When we type our EOF character, fgets returns a null pointer and the function str_cli returns.

2. When str_cli returns to the client main function, the latter terminates by calling exit.

3. Part of process termination is the closing of all open descriptors, so the client socket is closed by the kernel. This sends a FIN to the server, to which the server TCP responds with an ACK. This is the first half of the TCP connection termination sequence. At this point, the server socket is in the CLOSE_WAIT state and the client socket is in the FIN_WAIT_2 state.

4. When the server TCP receives the FIN, the server child is blocked in a call to readline, and readline then returns 0. This causes the str_echo function to return to the server child main.

5. The server child terminates by calling exit.

6. All open descriptors in the server child are closed. The closing of the connected socket by the child causes the final two segments of the TCP connection termination to take place: a FIN from the server to the client, and an ACK from the client. At this point, the connection is completely terminated. The client socket enters the TIME_WAIT state.

7. Finally, the SIGCHLD signal is sent to the parent when the server child terminates. This occurs in this example, but we do not catch the signal in our code, and the default action of the signal is to be ignored. Thus, the child enters the zombie state. We can verify this with the ps command.

wait and waitpid Functions

We call the wait function to handle the terminated child.

#include <sys/wait.h>

pid_t wait (int *statloc);
pid_t waitpid (pid_t pid, int *statloc, int options);

Both return: process ID if OK, 0 or -1 on error


wait and waitpid both return two values: the return value of the function is the process ID of the terminated child, and the termination status of the child (an integer) is returned through the statloc pointer. There are three macros that we can call that examine the termination status and tell us if the child terminated normally, was killed by a signal, or was just stopped by job control. Additional macros let us then fetch the exit status of the child, or the value of the signal that killed the child, or the value of the job-control signal that stopped the child. We will use the WIFEXITED and WEXITSTATUS macros for this purpose. If there are no terminated children for the process calling wait, but the process has one or more children that are still executing, then wait blocks until the first of the existing children terminates.


waitpid gives us more control over which process to wait for and whether or not to block. First, the pid argument lets us specify the process ID that we want to wait for. A value of -1 says to wait for the first of our children to terminate. (There are other options, dealing with process group IDs, but we do not need them in this text.) The options argument lets us specify additional options. The most common option is WNOHANG. This option tells the kernel not to block if there are no terminated children.


Termination of Server Process

We will now start our client/server and then kill the server child process. This simulates the crashing of the server process, so we can see what happens to the client. The following steps take place:

1. We start the server and client and type one line to the client to verify that all is okay. That line is echoed normally by the server child.

2. We find the process ID of the server child and kill it. As part of process termination, all open descriptors in the child are closed. This causes a FIN to be sent to the client, and the client TCP responds with an ACK. This is the first half of the TCP connection termination.

3. The SIGCHLD signal is sent to the server parent and handled correctly.

4. Nothing happens at the client. The client TCP receives the FIN from the server TCP and responds with an ACK, but the problem is that the client process is blocked in the call to fgets waiting for a line from the terminal.

5. Running netstat at this point shows the state of the sockets.

    linux % netstat -a | grep 9877
    tcp        0      0 *:9877               *:*                  LISTEN
    tcp        0      0 localhost:9877       localhost:43604      FIN_WAIT2
    tcp        1      0 localhost:43604      localhost:9877       CLOSE_WAIT

6. We can still type a line of input to the client. Here is what happens at the client starting from Step 1:

    linux % tcpcli01 127.0.0.1              start client
    hello                                   the first line that we type
    hello                                   is echoed correctly
                                            here we kill the server child on the server host
    another line                            we then type a second line to the client
    str_cli: server terminated prematurely


When we type "another line," str_cli calls writen and the client TCP sends the data to the server. This is allowed by TCP because the receipt of the FIN by the client TCP only indicates that the server process has closed its end of the connection and will not be sending any more data. The receipt of the FIN does not tell the client TCP that the server process has terminated (which in this case, it has). When the server TCP receives the data from the client, it responds with an RST since the process that had that socket open has terminated. We can verify that the RST was sent by watching the packets with tcpdump.

7. The client process will not see the RST because it calls readline immediately after the call to writen and readline returns 0 (EOF) immediately because of the FIN that was received in Step 2. Our client is not expecting to receive an EOF at this point so it quits with the error message "server terminated prematurely."

8. When the client terminates, all its open descriptors are closed.

Crashing of Server Host

The following steps take place:

1. When the server host crashes, nothing is sent out on the existing network connections. That is, we are assuming the host crashes and is not shut down by an operator.

2. We type a line of input to the client; it is written by writen and sent by the client TCP as a data segment. The client then blocks in the call to readline, waiting for the echoed reply.

3. If we watch the network with tcpdump, we will see the client TCP continually retransmitting the data segment, trying to receive an ACK from the server. Section 25.11 of TCPv2 shows a typical pattern for TCP retransmissions: Berkeley-derived implementations retransmit the data segment 12 times, waiting for around 9 minutes before giving up. When the client TCP finally gives up (assuming the server host has not been rebooted during this time, or if the server host has not crashed but was unreachable on the network, assuming the host was still unreachable), an error is returned to the client process. Since the client is blocked in the call to readline, readline returns an error. Assuming the server host crashed and there were no responses at all to the client's data segments, the error is ETIMEDOUT. But if some intermediate router determined that the server host was unreachable and responded with an ICMP "destination unreachable" message, the error is either EHOSTUNREACH or ENETUNREACH.

Crashing and Rebooting of Server Host

The following steps take place:

1. We start the server and then the client. We type a line to verify that the connection is established.

2. The server host crashes and reboots. We type a line of input to the client, which is sent as a TCP data segment to the server host.

3. When the server host reboots after crashing, its TCP loses all information about connections that existed before the crash. Therefore, the server TCP responds to the received data segment from the client with an RST.

4. Our client is blocked in the call to readline when the RST is received, causing readline to return the error ECONNRESET.

Shutdown of Server Host

The previous two sections discussed the crashing of the server host, or the server host being unreachable across the network. We now consider what happens if the server host is shut down by an operator while our server process is running on that host. When a Unix system is shut down, the init process normally sends the SIGTERM signal to all processes (we can catch this signal), waits some fixed amount of time (often between 5 and 20 seconds), and then sends the SIGKILL signal (which we cannot catch) to any processes still running. This gives all running processes a short amount of time to clean up and terminate. If we do not catch SIGTERM and terminate, our server will be terminated by the SIGKILL signal. When the process terminates, all open descriptors are closed, and we then follow the same sequence of steps discussed in TERMINATION OF SERVER PROCESS. As stated there, we must use the select or poll function in our client to have the client detect the termination of the server process as soon as it occurs.


UNIT-IV

I/O Multiplexing: The select and poll functions

Introduction

We saw our TCP client handling two inputs at the same time: standard input and a TCP socket. We encountered a problem when the client was blocked in a call to fgets (on standard input) and the server process was killed. The server TCP correctly sent a FIN to the client TCP, but since the client process was blocked reading from standard input, it never saw the EOF until it read from the socket (possibly much later). What we need is the capability to tell the kernel that we want to be notified if one or more I/O conditions are ready (i.e., input is ready to be read, or the descriptor is capable of taking more output). This capability is called I/O multiplexing and is provided by the select and poll functions. We will also cover a newer POSIX variation of the former, called pselect.


I/O multiplexing is typically used in networking applications in the following scenarios:

When a client is handling multiple descriptors (normally interactive input and a network socket), I/O multiplexing should be used.

It is possible, but rare, for a client to handle multiple sockets at the same time.

If a TCP server handles both a listening socket and its connected sockets, I/O multiplexing is normally used.

If a server handles TCP and UDP, I/O multiplexing is normally used.

If a server handles multiple services and perhaps multiple protocols, I/O multiplexing is normally used.

There are normally two distinct phases for an input operation:

1. Waiting for the data to be ready

2. Copying the data from the kernel to the process

For an input operation on a socket, the first step normally involves waiting for data to arrive on the network. When the packet arrives, it is copied into a buffer within the kernel. The second step is copying this data from the kernel's buffer into our application buffer.

I/O Models

The five I/O models that are available to us under Unix are:

blocking I/O

nonblocking I/O

I/O multiplexing (select and poll)

signal-driven I/O (SIGIO)

asynchronous I/O (the POSIX aio_ functions)
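To make the difference between the first two models concrete, the sketch below switches a descriptor to nonblocking mode with fcntl. In the blocking model read sleeps until data arrives; in the nonblocking model it returns at once with EAGAIN or EWOULDBLOCK. The helper names and the -2 return convention are assumptions made for this example.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Put fd into nonblocking mode. Returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);

    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/* One nonblocking read attempt.
 * Returns the number of bytes read, 0 at EOF, -1 on error,
 * or -2 if no data is ready yet (the call would have blocked). */
ssize_t try_read(int fd, void *buf, size_t len)
{
    ssize_t n;

    assert(buf != NULL);
    n = read(fd, buf, len);
    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return -2;
    return n;
}
```

Calling try_read in a loop until it stops returning -2 is polling, which wastes CPU time; that cost is what the I/O multiplexing model avoids.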


BLOCKING I/O MODEL:


NONBLOCKING I/O MODEL:


I/O MULTIPLEXING


SIGNAL-DRIVEN I/O


ASYNCHRONOUS I/O MODEL


SELECT FUNCTION

select() - Synchronous I/O Multiplexing

This function is somewhat strange, but it's very useful. Take the following situation: you are a server and you want to listen for incoming connections as well as keep reading from the connections you already have.

No problem, you say, just an accept() and a couple of recv()s. Not so fast, buster! What if you're blocking on an accept() call? How are you going to recv() data at the same time? "Use non-blocking sockets!" No way! You don't want to be a CPU hog. What, then?

select() gives you the power to monitor several sockets at the same time. It'll tell you which ones are ready for reading, which are ready for writing, and which sockets have raised exceptions, if you really want to know that.


This being said, in modern times select(), though very portable, is one of the slowest methods for monitoring sockets. One possible alternative is libevent, or something similar, that encapsulates all the system-dependent stuff involved with getting socket notifications.

Without any further ado, I'll offer the synopsis of select():

#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

int select(int numfds, fd_set *readfds, fd_set *writefds,
           fd_set *exceptfds, struct timeval *timeout);

The function monitors "sets" of file descriptors; in particular readfds, writefds, and exceptfds. If you want to see if you can read from standard input and some socket descriptor, sockfd, just add the file descriptors 0 and sockfd to the set readfds. The parameter numfds should be set to the value of the highest file descriptor plus one. In this example, it should be set to sockfd+1, since it is assuredly higher than standard input (0).

When select() returns, readfds will be modified to reflect which of the file descriptors you selected are ready for reading. You can test them with the macro FD_ISSET(), below.

Before progressing much further, I'll talk about how to manipulate these sets. Each set is of the type fd_set. The following macros operate on this type:

FD_SET(int fd, fd_set *set);    Add fd to the set.
FD_CLR(int fd, fd_set *set);    Remove fd from the set.
FD_ISSET(int fd, fd_set *set);  Return true if fd is in the set.
FD_ZERO(fd_set *set);           Clear all entries from the set.
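A small sketch of these macros in use: the hypothetical helper build_fd_set (not part of any library) fills a set from an array of descriptors and returns the numfds value that select() expects, namely the highest descriptor plus one.

```c
#include <assert.h>
#include <sys/select.h>

/* Fill *set with the n descriptors in fds and return the numfds argument
 * for select(): the highest descriptor plus one (0 if n == 0). */
int build_fd_set(const int *fds, int n, fd_set *set)
{
    int maxfd = -1;

    FD_ZERO(set);                       /* always start from an empty set */
    for (int i = 0; i < n; i++) {
        assert(fds[i] >= 0 && fds[i] < FD_SETSIZE);
        FD_SET(fds[i], set);
        if (fds[i] > maxfd)
            maxfd = fds[i];
    }
    return maxfd + 1;
}
```

Since select() overwrites the sets it is given, a program normally rebuilds the set (or copies it from a master set) before every call.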

Finally, what is this weirded out struct timeval? Well, sometimes you don't want to wait forever for someone to send you some data. Maybe every 96 seconds you want to print "Still Going" to the terminal even though nothing has happened. This time structure allows you to specify a timeout period. If the time is exceeded and select() still hasn't found any ready file descriptors, it'll return so you can continue processing.


The struct timeval has the following fields:

struct timeval {
    int tv_sec;     // seconds
    int tv_usec;    // microseconds
};

Just set tv_sec to the number of seconds to wait, and set tv_usec to the number of microseconds to wait. Yes, that's microseconds, not milliseconds. There are 1,000 microseconds in a millisecond, and 1,000 milliseconds in a second. Thus, there are 1,000,000 microseconds in a second. Why is it "usec"? The "u" is supposed to look like the Greek letter μ (Mu) that we use for "micro". Also, when the function returns, timeout might be updated to show the time still remaining. This depends on what flavor of Unix you're running.

Yay! We have a microsecond resolution timer! Well, don't count on it. You'll probably have to wait some part of your standard Unix timeslice no matter how small you set your struct timeval.

Other things of interest: If you set the fields in your struct timeval to 0, select() will timeout immediately, effectively polling all the file descriptors in your sets. If you set the parameter timeout to NULL, it will never timeout, and will wait until the first file descriptor is ready. Finally, if you don't care about waiting for a certain set, you can just set it to NULL in the call to select().
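The three timeout behaviors can be wrapped in one small helper. This is a sketch with a hypothetical name, wait_readable, and an assumed convention of passing the timeout in milliseconds: zero polls, a negative value passes NULL and waits forever, and a positive value is converted into tv_sec and tv_usec.

```c
#include <assert.h>
#include <sys/select.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Wait up to timeout_ms milliseconds for fd to become readable.
 * timeout_ms == 0 polls; timeout_ms < 0 waits forever (NULL timeout).
 * Returns 1 if fd is readable, 0 on timeout, -1 on error. */
int wait_readable(int fd, long timeout_ms)
{
    fd_set readfds;
    struct timeval tv;
    struct timeval *tvp = NULL;         /* NULL: never time out */
    int n;

    assert(fd >= 0 && fd < FD_SETSIZE);
    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);

    if (timeout_ms >= 0) {
        tv.tv_sec = timeout_ms / 1000;
        tv.tv_usec = (timeout_ms % 1000) * 1000;   /* ms to microseconds */
        tvp = &tv;
    }

    /* writefds and exceptfds are NULL: we do not care about them */
    n = select(fd + 1, &readfds, NULL, NULL, tvp);
    if (n < 0)
        return -1;
    return n > 0 && FD_ISSET(fd, &readfds);
}
```

For example, wait_readable(0, 2500) waits at most 2.5 seconds for something to appear on standard input.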

The following code snippet waits 2.5 seconds for something to appear on standard input:

/*
** select.c -- a select() demo
*/

#include <stdio.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

#define STDIN 0  // file descriptor for standard input

int main(void)
{
    struct timeval tv;
    fd_set readfds;

    tv.tv_sec = 2;
    tv.tv_usec = 500000;

    FD_ZERO(&readfds);
    FD_SET(STDIN, &readfds);

    // don't care about writefds and exceptfds:
    select(STDIN+1, &readfds, NULL, NULL, &tv);

    if (FD_ISSET(STDIN, &readfds))
        printf("A key was pressed!\n");
    else
        printf("Timed out.\n");

    return 0;
}

If you're on a line buffered terminal, the key you hit should be RETURN or it will time out anyway.

Now, some of you might think this is a great way to wait for data on a datagram socket, and you are right: it might be. Some Unices can use select in this manner, and some can't. You should see what your local man page says on the matter if you want to attempt it.

Some Unices update the time in your struct timeval to reflect the amount of time still remaining before a timeout. But others do not. Don't rely on that occurring if you want to be portable. (Use gettimeofday() if you need to track time elapsed. It's a bummer, I know, but that's the way it is.)

What happens if a socket in the read set closes the connection? Well, in that case, select() returns with that socket descriptor set as "ready to read". When you actually do recv() from it, recv() will return 0. That's how you know the client has closed the connection.

One more note of interest about select(): if you have a socket that is listen()ing, you can check to see if there is a new connection by putting that socket's file descriptor in the readfds set.

And that, my friends, is a quick overview of the almighty select() function.


But, by popular demand, here is an in-depth example. Unfortunately, the difference between the dirt-simple example, above, and this one here is significant. But have a look, then read the description that follows it.

This program acts like a simple multi-user chat server. Start it running in one window, then telnet to it ("telnet hostname 9034") from multiple other windows. When you type something in one telnet session, it should appear in all the others.

/*
** selectserver.c -- a cheezy multiperson chat server
*/

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <netdb.h>

#define PORT "9034"   // port we're listening on

// get sockaddr, IPv4 or IPv6:
void *get_in_addr(struct sockaddr *sa)
{
    if (sa->sa_family == AF_INET) {
        return &(((struct sockaddr_in*)sa)->sin_addr);
    }

    return &(((struct sockaddr_in6*)sa)->sin6_addr);
}

int main(void)
{
    fd_set master;    // master file descriptor list
    fd_set read_fds;  // temp file descriptor list for select()
    int fdmax;        // maximum file descriptor number

    int listener;     // listening socket descriptor
    int newfd;        // newly accept()ed socket descriptor
    struct sockaddr_storage remoteaddr; // client address
    socklen_t addrlen;

    char buf[256];    // buffer for client data
    int nbytes;


    char remoteIP[INET6_ADDRSTRLEN];

    int yes=1;        // for setsockopt() SO_REUSEADDR, below
    int i, j, rv;

    struct addrinfo hints, *ai, *p;

    FD_ZERO(&master);    // clear the master and temp sets
    FD_ZERO(&read_fds);

    // get us a socket and bind it
    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = AI_PASSIVE;
    if ((rv = getaddrinfo(NULL, PORT, &hints, &ai)) != 0) {
        fprintf(stderr, "selectserver: %s\n", gai_strerror(rv));
        exit(1);
    }

    for(p = ai; p != NULL; p = p->ai_next) {
        listener = socket(p->ai_family, p->ai_socktype, p->ai_protocol);
        if (listener < 0) {
            continue;
        }

        // lose the pesky "address already in use" error message
        setsockopt(listener, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof(int));

        if (bind(listener, p->ai_addr, p->ai_addrlen) < 0) {
            close(listener);
            continue;
        }

        break;
    }

    // if we got here, it means we didn't get bound
    if (p == NULL) {
        fprintf(stderr, "selectserver: failed to bind\n");
        exit(2);
    }

    freeaddrinfo(ai); // all done with this

    // listen
    if (listen(listener, 10) == -1) {
        perror("listen");
        exit(3);
    }

    // add the listener to the master set


    FD_SET(listener, &master);

    // keep track of the biggest file descriptor
    fdmax = listener; // so far, it's this one

    // main loop
    for(;;) {
        read_fds = master; // copy it
        if (select(fdmax+1, &read_fds, NULL, NULL, NULL) == -1) {
            perror("select");
            exit(4);
        }

        // run through the existing connections looking for data to read
        for(i = 0; i <= fdmax; i++) {
            if (FD_ISSET(i, &read_fds)) { // we got one!!
                if (i == listener) {
                    // handle new connections
                    addrlen = sizeof remoteaddr;
                    newfd = accept(listener,
                        (struct sockaddr *)&remoteaddr,
                        &addrlen);

                    if (newfd == -1) {
                        perror("accept");
                    } else {
                        FD_SET(newfd, &master); // add to master set
                        if (newfd > fdmax) {    // keep track of the max
                            fdmax = newfd;
                        }
                        printf("selectserver: new connection from %s on "
                            "socket %d\n",
                            inet_ntop(remoteaddr.ss_family,
                                get_in_addr((struct sockaddr*)&remoteaddr),
                                remoteIP, INET6_ADDRSTRLEN),
                            newfd);
                    }
                } else {
                    // handle data from a client
                    if ((nbytes = recv(i, buf, sizeof buf, 0)) <= 0) {
                        // got error or connection closed by client
                        if (nbytes == 0) {
                            // connection closed
                            printf("selectserver: socket %d hung up\n", i);
                        } else {
                            perror("recv");
                        }
                        close(i); // bye!
                        FD_CLR(i, &master); // remove from master set
                    } else {
                        // we got some data from a client
                        for(j = 0; j <= fdmax; j++) {


                            // send to everyone!
                            if (FD_ISSET(j, &master)) {
                                // except the listener and ourselves
                                if (j != listener && j != i) {
                                    if (send(j, buf, nbytes, 0) == -1) {
                                        perror("send");
                                    }
                                }
                            }
                        }
                    }
                } // END handle data from client
            } // END got new incoming connection
        } // END looping through file descriptors
    } // END for(;;)--and you thought it would never end!

    return 0;
}

Notice I have two file descriptor sets in the code: master and read_fds. The first, master, holds all the socket descriptors that are currently connected, as well as the socket descriptor that is listening for new connections.

The reason I have the master set is that select() actually changes the set you pass into it to reflect which sockets are ready to read. Since I have to keep track of the connections from one call of select() to the next, I must store these safely away somewhere. At the last minute, I copy the master into the read_fds, and then call select().

But doesn't this mean that every time I get a new connection, I have to add it to the master set? Yup! And every time a connection closes, I have to remove it from the master set? Yes, it does.

Notice I check to see when the listener socket is ready to read. When it is, it means I have a new connection pending, and I accept() it and add it to the master set. Similarly, when a client connection is ready to read, and recv() returns 0, I know the client has closed the connection, and I must remove it from the master set.

If the client recv() returns a positive value, though, I know some data has been received. So I get it, and then go through the master list and send that data to all the rest of the connected clients.
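The copy-before-select() habit described above can be isolated in a small helper. This is only a sketch: the name poll_master_set() and the zero timeout are assumptions chosen for illustration and are not part of selectserver.c.

```c
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Check the descriptors in *master without blocking.
 * select() is given a copy, so *master itself is never modified.
 * On return *ready holds only the descriptors that are readable.
 * Returns the number of ready descriptors, or -1 on error. */
int poll_master_set(const fd_set *master, int fdmax, fd_set *ready)
{
    struct timeval tv = {0, 0};   /* zero timeout: return immediately */

    *ready = *master;             /* the copy is what select() overwrites */
    return select(fdmax + 1, ready, NULL, NULL, &tv);
}
```

After the call, master still lists every tracked descriptor, while ready lists only those with data waiting, which is exactly the split between master and read_fds in the server.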


And that, my friends, is a less-than-simple overview of the almighty select() function.

In addition, here is a bonus afterthought: there is another function called poll() which behaves much the same way select() does, but with a different system for managing the file descriptor sets.

POLL FUNCTION

poll()

Test for events on multiple sockets simultaneously

Prototypes

#include <poll.h>

int poll(struct pollfd *ufds, unsigned int nfds, int timeout);

Description

This function is very similar to select() in that they both watch sets of file descriptors for events, such as incoming data ready to recv(), socket ready to send() data to, out-of-band data ready to recv(), errors, etc.

The basic idea is that you pass an array of nfds struct pollfds in ufds, along with a timeout in milliseconds (1000 milliseconds in a second.) The timeout can be negative if you want to wait forever. If no event happens on any of the socket descriptors by the timeout, poll() will return.

Each element in the array of struct pollfds represents one socket descriptor, and contains the following fields:

struct pollfd {
    int fd;         // the socket descriptor
    short events;   // bitmap of events we're interested in
    short revents;  // when poll() returns, bitmap of events that occurred
};


Before calling poll(), load fd with the socket descriptor (if you set fd to a negative number, this struct pollfd is ignored and its revents field is set to zero) and then construct the events field by bitwise-ORing the following macros:

POLLIN    Alert me when data is ready to recv() on this socket.

POLLOUT   Alert me when I can send() data to this socket without blocking.

POLLPRI   Alert me when out-of-band data is ready to recv() on this socket.

Once the poll() call returns, the revents field will be constructed as a bitwise-OR of the above fields, telling you which descriptors actually have had that event occur. Additionally, these other fields might be present:

POLLERR   An error has occurred on this socket.

POLLHUP   The remote side of the connection hung up.

POLLNVAL  Something was wrong with the socket descriptor fd; maybe it's uninitialized?

Return Value

Returns the number of elements in the ufds array that have had an event occur on them; this can be zero if the timeout occurred. Also returns -1 on error (and errno will be set accordingly.)

Example

int s1, s2;
int rv;
char buf1[256], buf2[256];
struct pollfd ufds[2];

s1 = socket(PF_INET, SOCK_STREAM, 0);
s2 = socket(PF_INET, SOCK_STREAM, 0);

// pretend we've connected both to a server at this point
//connect(s1, ...)
//connect(s2, ...)

// set up the array of file descriptors.
//
// in this example, we want to know when there's normal or out-of-band
// data ready to be recv()'d

ufds[0].fd = s1;


ufds[0].events = POLLIN | POLLPRI; // check for normal or out-of-band

ufds[1].fd = s2;
ufds[1].events = POLLIN; // check for just normal data

// wait for events on the sockets, 3.5 second timeout
rv = poll(ufds, 2, 3500);

if (rv == -1) {
    perror("poll"); // error occurred in poll()
} else if (rv == 0) {
    printf("Timeout occurred! No data after 3.5 seconds.\n");
} else {
    // check for events on s1:
    if (ufds[0].revents & POLLIN) {
        recv(s1, buf1, sizeof buf1, 0); // receive normal data
    }
    if (ufds[0].revents & POLLPRI) {
        recv(s1, buf1, sizeof buf1, MSG_OOB); // out-of-band data
    }

    // check for events on s2:
    if (ufds[1].revents & POLLIN) {
        recv(s2, buf2, sizeof buf2, 0); // receive normal data from s2
    }
}
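For the common case of waiting on a single descriptor, the same pattern can be wrapped in a helper. The sketch below is an illustration only; the name wait_readable() and its return convention are assumptions, not part of the poll() API.

```c
#include <poll.h>

/* Wait up to timeout_ms milliseconds for fd to become readable.
 * Returns 1 if a read would not block (data, hang-up, or socket error),
 * 0 on timeout, and -1 if poll() failed or fd is not a valid descriptor. */
int wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd;
    int rv;

    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    rv = poll(&pfd, 1, timeout_ms);
    if (rv <= 0) {
        return rv;                /* 0 = timeout, -1 = error */
    }
    if (pfd.revents & POLLNVAL) {
        return -1;                /* descriptor was not open */
    }
    return 1;
}
```

A caller would typically follow a return value of 1 with recv(), which then reports the data, the end of the connection, or the error.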

Socket Options

There are various ways to get and set the options that affect a socket:

The getsockopt and setsockopt functions

The fcntl function

The ioctl function

This chapter starts by covering the setsockopt and getsockopt functions, followed by an example that prints the default value of all the options, and then a detailed description of all the socket options. We divide the detailed descriptions into the following categories: generic, IPv4, IPv6, TCP, and SCTP. This detailed coverage can be skipped during a first reading of this chapter, and the individual sections referred to when needed. A few options are discussed in detail in a later chapter, such as the IPv4 and IPv6 multicasting options.


setsockopt(), getsockopt()

Set various options for a socket

Prototypes

#include <sys/types.h>
#include <sys/socket.h>

int getsockopt(int s, int level, int optname, void *optval, socklen_t *optlen);
int setsockopt(int s, int level, int optname, const void *optval, socklen_t optlen);

Description

Sockets are fairly configurable beasts. In fact, they are so configurable, I'm not even going to cover it all here. It's probably system-dependent anyway. But I will talk about the basics.

Obviously, these functions get and set certain options on a socket. On a Linux box, all the socket information is in the man page for socket in section 7. (Type: "man 7 socket" to get all these goodies.)

As for parameters, s is the socket you're talking about, level should be set to SOL_SOCKET. Then you set the optname to the name you're interested in. Again, see your man page for all the options, but here are some of the most fun ones:

SO_BINDTODEVICE

Bind this socket to a symbolic device name like eth0 instead of using bind() to bind it to an IP address. Type the command ifconfig under Unix to see the device names.

SO_REUSEADDR

Allows other sockets to bind() to this port, unless there is an active listening socket bound to the port already. This enables you to get around those "Address already in use" error messages when you try to restart your server after a crash.

SO_BROADCAST

Allows UDP datagram (SOCK_DGRAM) sockets to send and receive packets sent to and from the broadcast address. Does nothing (NOTHING!!) to TCP stream sockets! Hahaha!

As for the parameter optval, it's usually a pointer to an int indicating the value in question. For booleans, zero is false, and non-zero is true. And that's an absolute fact, unless it's different on your system. If there is no parameter to be passed, optval can be NULL.


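The final parameter, optlen, is a value-result argument for getsockopt(): set it to the size of the buffer that optval points to before the call, and getsockopt() updates it with the size of the value it stored. For setsockopt() you pass the size directly, where it will probably be sizeof(int).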

Warning: on some systems (notably Sun and Windows), the option can be a char instead of an int, and is set to, for example, a character value of '1' instead of an int value of 1. Again, check your own man pages for more info with "man setsockopt" and "man 7 socket"!

Return Value

Returns zero on success, or -1 on error (and errno will be set accordingly.)

Example

int optval;
socklen_t optlen;
char *optval2;

// set SO_REUSEADDR on a socket to true (1):
optval = 1;
setsockopt(s1, SOL_SOCKET, SO_REUSEADDR, &optval, sizeof optval);

// bind a socket to a device name (might not work on all systems):
optval2 = "eth1"; // 4 bytes long, so 4, below:
setsockopt(s2, SOL_SOCKET, SO_BINDTODEVICE, optval2, 4);

// see if the SO_BROADCAST flag is set:
optlen = sizeof optval; // size of optval goes in, size of the result comes out
getsockopt(s3, SOL_SOCKET, SO_BROADCAST, &optval, &optlen);
if (optval != 0) {
    printf("SO_BROADCAST enabled on s3!\n");
}
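Because most SOL_SOCKET options take a plain int, the set and get calls are often wrapped in a pair of helpers. The following is a minimal sketch; set_int_option() and get_int_option() are hypothetical names introduced here, not library functions.

```c
#include <sys/types.h>
#include <sys/socket.h>

/* Set an int-valued SOL_SOCKET option. Returns 0 on success, -1 on error. */
int set_int_option(int s, int optname, int value)
{
    return setsockopt(s, SOL_SOCKET, optname, &value, sizeof value);
}

/* Read an int-valued SOL_SOCKET option into *value.
 * Returns 0 on success, -1 on error. */
int get_int_option(int s, int optname, int *value)
{
    socklen_t len = sizeof *value;   /* value-result: size in, size out */
    return getsockopt(s, SOL_SOCKET, optname, value, &len);
}
```

For example, set_int_option(listener, SO_REUSEADDR, 1) would replace the raw setsockopt() call in a server's setup code.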

The following options are supported for setsockopt():

SO_DEBUG

Provides the ability to turn on recording of debugging information. This option takes an int value in the optval argument. This is a BOOL option.

SO_BROADCAST

Permits sending of broadcast messages, if this is supported by the protocol. This option takes an int value in the optval argument. This is a BOOL option.


SO_REUSEADDR

Specifies that the rules used in validating addresses supplied to bind() should allow reuse of local addresses, if this is supported by the protocol. This option takes an int value in the optval argument. This is a BOOL option.

SO_KEEPALIVE

Keeps connections active by enabling periodic transmission of messages, if this is supported by the protocol. If the connected socket fails to respond to these messages, the connection is broken and processes writing to that socket are notified with an ENETRESET errno. This option takes an int value in the optval argument. This is a BOOL option.

SO_LINGER

Specifies whether the socket lingers on close() if data is present. If SO_LINGER is set, the system blocks the process during close() until it can transmit the data or until the end of the interval indicated by the l_linger member, whichever comes first. If SO_LINGER is not specified, and close() is issued, the system handles the call in a way that allows the process to continue as quickly as possible. This option takes a linger structure in the optval argument.

SO_OOBINLINE

Specifies whether the socket leaves received out-of-band data (data marked urgent) in line. This option takes an int value in the optval argument. This is a BOOL option.

SO_SNDBUF

Sets send buffer size information. This option takes an int value in the optval argument.

SO_RCVBUF

Sets receive buffer size information. This option takes an int value in the optval argument.

SO_DONTROUTE

Specifies whether outgoing messages bypass the standard routing facilities. The destination must be on a directly-connected network, and messages are directed to the appropriate network interface according to the destination address. The effect, if any, of this option depends on what protocol is in use. This option takes an int value in the optval argument. This is a BOOL option.

TCP_NODELAY


Specifies whether the Nagle algorithm used by TCP for send coalescing is to be disabled. This option takes an int value in the optval argument. This is a BOOL option.

For boolean options, a zero value indicates that the option is disabled and a non-zero value indicates that the option is enabled.
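Two of the options above differ from the usual "int at level SOL_SOCKET" pattern: SO_LINGER takes a struct linger, and TCP_NODELAY is set at level IPPROTO_TCP rather than SOL_SOCKET. The sketch below shows both; the helper names enable_linger() and disable_nagle() are assumptions made for this illustration.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

/* Ask close() to wait up to `seconds` for unsent data (SO_LINGER).
 * Returns 0 on success, -1 on error. */
int enable_linger(int s, int seconds)
{
    struct linger lg;

    lg.l_onoff = 1;           /* non-zero: lingering is switched on */
    lg.l_linger = seconds;    /* how long close() may block */
    return setsockopt(s, SOL_SOCKET, SO_LINGER, &lg, sizeof lg);
}

/* Disable the Nagle algorithm on a TCP socket (TCP_NODELAY).
 * Note the level is IPPROTO_TCP, not SOL_SOCKET. */
int disable_nagle(int s)
{
    int on = 1;
    return setsockopt(s, IPPROTO_TCP, TCP_NODELAY, &on, sizeof on);
}
```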

The following options are supported for getsockopt():

SO_DEBUG

Reports whether debugging information is being recorded. This option stores an int value in the optval argument. This is a BOOL option.

SO_ACCEPTCONN

Reports whether socket listening is enabled. This option stores an int value in the optval argument. This is a BOOL option.

SO_BROADCAST

Reports whether transmission of broadcast messages is supported, if this is supported by the protocol. This option stores an int value in the optval argument. This is a BOOL option.

SO_REUSEADDR

Reports whether the rules used in validating addresses supplied to bind() should allow reuse of local addresses, if this is supported by the protocol. This option stores an int value in the optval argument. This is a BOOL option.

SO_KEEPALIVE

Reports whether connections are kept active with periodic transmission of messages, if this is supported by the protocol. If the connected socket fails to respond to these messages, the connection is broken and processes writing to that socket are notified with an ENETRESET errno. This option stores an int value in the optval argument. This is a BOOL option.

SO_LINGER

Reports whether the socket lingers on close() if data is present. If SO_LINGER is set, the system blocks the process during close() until it can transmit the data or until the end of the interval indicated by the l_linger member, whichever comes first. If SO_LINGER is not specified, and close() is issued, the system handles the call in a way that allows the process to continue as quickly as possible. This option stores a linger structure in the optval argument.


SO_OOBINLINE

Reports whether the socket leaves received out-of-band data (data marked urgent) in line. This option stores an int value in the optval argument. This is a BOOL option.

SO_SNDBUF

Reports send buffer size information. This option stores an int value in the optval argument.

SO_RCVBUF

Reports receive buffer size information. This option stores an int value in the optval argument.

SO_ERROR

Reports information about error status and clears it. This option stores an int value in the optval argument.

SO_TYPE

Reports the socket type. This option stores an int value in the optval argument.

SO_DONTROUTE

Reports whether outgoing messages bypass the standard routing facilities. The destination must be on a directly-connected network, and messages are directed to the appropriate network interface according to the destination address. The effect, if any, of this option depends on what protocol is in use. This option stores an int value in the optval argument. This is a BOOL option.

SO_MAX_MSG_SIZE

Maximum size of a message for message-oriented socket types (for example, SOCK_DGRAM). Has no meaning for stream-oriented sockets. This option stores an int value in the optval argument.

TCP_NODELAY

Specifies whether the Nagle algorithm used by TCP for send coalescing is disabled. This option stores an int value in the optval argument. This is a BOOL option.

For boolean options, a zero value indicates that the option is disabled and a non-zero value indicates that the option is enabled.
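SO_TYPE and SO_ERROR are read-only options, so they only make sense with getsockopt(). The sketch below shows how they might be queried; socket_type() and pending_error() are hypothetical helper names used for illustration.

```c
#include <sys/types.h>
#include <sys/socket.h>

/* Returns the socket type (SOCK_STREAM, SOCK_DGRAM, ...), or -1 on failure. */
int socket_type(int s)
{
    int type;
    socklen_t len = sizeof type;

    if (getsockopt(s, SOL_SOCKET, SO_TYPE, &type, &len) == -1) {
        return -1;
    }
    return type;
}

/* Fetches and clears the pending error on s.
 * Returns the errno-style value (0 if none), or -1 if getsockopt() failed. */
int pending_error(int s)
{
    int err;
    socklen_t len = sizeof err;

    if (getsockopt(s, SOL_SOCKET, SO_ERROR, &err, &len) == -1) {
        return -1;
    }
    return err;
}
```

Checking SO_ERROR this way is commonly done after a non-blocking connect() completes, to learn whether the connection actually succeeded.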


fcntl()

Control socket descriptors

Prototypes

#include <unistd.h>
#include <fcntl.h>

int fcntl(int s, int cmd, long arg);

Description

This function is typically used to do file locking and other file-oriented stuff, but it also has a couple socket-related functions that you might see or use from time to time.

Parameter s is the socket descriptor you wish to operate on, cmd should be set to F_SETFL, and arg can be one of the following commands. (Like I said, there's more to fcntl() than I'm letting on here, but I'm trying to stay socket-oriented.)

O_NONBLOCK

Set the socket to be non-blocking. See the section on blocking for more details.

O_ASYNC

Set the socket to do asynchronous I/O. When data is ready to be recv()'d on the socket, the signal SIGIO will be raised. This is rare to see, and beyond the scope of the guide. And I think it's only available on certain systems.

Return Value

Returns zero on success, or -1 on error (and errno will be set accordingly.)

Different uses of the fcntl() system call actually have different return values, but I haven't covered them here because they're not socket-related. See your local fcntl() man page for more information.

Example

int s = socket(PF_INET, SOCK_STREAM, 0);

fcntl(s, F_SETFL, O_NONBLOCK); // set to non-blocking
fcntl(s, F_SETFL, O_ASYNC);    // set to asynchronous I/O


UNIT-V

Elementary UDP Sockets

Introduction

There are some fundamental differences between applications written using TCP versus those that use UDP. These are because of the differences in the two transport layers: UDP is a connectionless, unreliable, datagram protocol, quite unlike the connection-oriented, reliable byte stream provided by TCP. Nevertheless, there are instances when it makes sense to use UDP instead of TCP. Some popular applications are built using UDP: DNS, NFS, and SNMP, for example.

The figure below shows the function calls for a typical UDP client/server. The client does not establish a connection with the server. Instead, the client just sends a datagram to the server using the sendto function, which requires the address of the destination (the server) as a parameter. Similarly, the server does not accept a connection from a client. Instead, the server just calls the recvfrom function, which waits until data arrives from some client. recvfrom returns the protocol address of the client, along with the datagram, so the server can send a response to the correct client.

The figure also shows a timeline of the typical scenario that takes place for a UDP client/server exchange. We can compare this to the typical TCP exchange. We will also describe the new functions that we use with UDP sockets, recvfrom and sendto, and redo our echo client/server to use UDP. We will also describe the use of the connect function with a UDP socket, and the concept of asynchronous errors.


[Figure: function calls and timeline for a typical UDP client/server exchange]

send(), sendto()

Send data out over a socket

Prototypes

#include <sys/types.h>
#include <sys/socket.h>

ssize_t send(int s, const void *buf, size_t len, int flags);
ssize_t sendto(int s, const void *buf, size_t len, int flags,
               const struct sockaddr *to, socklen_t tolen);


Description

These functions send data to a socket. Generally speaking, send() is used for TCP SOCK_STREAM connected sockets, and sendto() is used for UDP SOCK_DGRAM unconnected datagram sockets. With the unconnected sockets, you must specify the destination of a packet each time you send one, and that's why the last parameters of sendto() define where the packet is going.

With both send() and sendto(), the parameter s is the socket, buf is a pointer to the data you want to send, len is the number of bytes you want to send, and flags allows you to specify more information about how the data is to be sent. Set flags to zero if you want it to be "normal" data. Here are some of the commonly used flags, but check your local send() man pages for more details:

MSG_OOB

Send as "out of band" data. TCP supports this, and it's a way to tell the receiving system that this data has a higher priority than the normal data. The receiver will receive the signal SIGURG and it can then receive this data without first receiving all the rest of the normal data in the queue.

MSG_DONTROUTE

Don't send this data over a router, just keep it local.

MSG_DONTWAIT

If send() would block because outbound traffic is clogged, have it return EAGAIN. This is like an "enable non-blocking just for this send." See the section on blocking for more details.

MSG_NOSIGNAL

If you send() to a remote host which is no longer recv()ing, you'll typically get the signal SIGPIPE. Adding this flag prevents that signal from being raised.

Return Value

Returns the number of bytes actually sent, or -1 on error (and errno will be set accordingly.) Note that the number of bytes actually sent might be less than the number you asked it to send! See the section on handling partial send()s for a helper function to get around this.

Also, if the socket has been closed by either side, the process calling send() will get the signal SIGPIPE. (Unless send() was called with the MSG_NOSIGNAL flag.)
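Since the section on partial sends is not reproduced in these notes, here is one way such a helper could look. It is a sketch only: the name sendall() and its signature are assumptions for illustration, and it is meant for stream sockets, where looping over send() is the usual answer to a short count.

```c
#include <sys/types.h>
#include <sys/socket.h>

/* Keep calling send() until all len bytes are out or an error occurs.
 * Returns 0 on success, -1 on error; in both cases *sent is set to the
 * number of bytes that were actually handed to the kernel. */
int sendall(int s, const char *buf, size_t len, size_t *sent)
{
    size_t total = 0;

    while (total < len) {
        ssize_t n = send(s, buf + total, len - total, 0);
        if (n == -1) {
            *sent = total;
            return -1;
        }
        total += (size_t)n;   /* advance past the bytes just sent */
    }

    *sent = total;
    return 0;
}
```

If the call fails partway through, *sent tells the caller how much of the buffer went out before the error.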


Example

int spatula_count = 3490;
char *secret_message = "The Cheese is in The Toaster";

int stream_socket, dgram_socket;
struct sockaddr_in dest;
int temp;

// first with TCP stream sockets:

// assume sockets are made and connected
//stream_socket = socket(...
//connect(stream_socket, ...

// convert to network byte order
temp = htonl(spatula_count);
// send data normally:
send(stream_socket, &temp, sizeof temp, 0);

// send secret message out of band:
send(stream_socket, secret_message, strlen(secret_message)+1, MSG_OOB);

// now with UDP datagram sockets:
//getaddrinfo(...
//dest = ...
// assume "dest" holds the address of the destination
//dgram_socket = socket(...

// send secret message normally:
sendto(dgram_socket, secret_message, strlen(secret_message)+1, 0,
       (struct sockaddr*)&dest, sizeof dest);

recv(), recvfrom()

Receive data on a socket

Prototypes

#include <sys/types.h>
#include <sys/socket.h>

ssize_t recv(int s, void *buf, size_t len, int flags);
ssize_t recvfrom(int s, void *buf, size_t len, int flags,
                 struct sockaddr *from, socklen_t *fromlen);


Description

Once you have a socket up and connected, you can read incoming data from the remote side using the recv() (for TCP SOCK_STREAM sockets) and recvfrom() (for UDP SOCK_DGRAM sockets).

Both functions take the socket descriptor s, a pointer to the buffer buf, the size (in bytes) of the buffer len, and a set of flags that control how the functions work.

Additionally, recvfrom() takes a struct sockaddr*, from, that will tell you where the data came from, and will fill in fromlen with the size of the address it stored. (You must also initialize fromlen to be the size of from or struct sockaddr.)

So what wondrous flags can you pass into this function? Here are some of them, but you should check your local man pages for more information and what is actually supported on your system. You bitwise-or these together, or just set flags to 0 if you want it to be a regular vanilla recv().

MSG_OOB

Receive Out of Band data. This is how to get data that has been sent to you with the MSG_OOB flag in send(). As the receiving side, you will have had signal SIGURG raised telling you there is urgent data. In your handler for that signal, you could call recv() with this MSG_OOB flag.

MSG_PEEK

If you want to call recv() "just for pretend", you can call it with this flag. This will tell you what's waiting in the buffer for when you call recv() "for real" (i.e. without the MSG_PEEK flag). It's like a sneak preview into the next recv() call.

MSG_WAITALL

Tell recv() to not return until all the data you specified in the len parameter has arrived. It will ignore your wishes in extreme circumstances, however, like if a signal interrupts the call or if some error occurs or if the remote side closes the connection, etc. Don't be mad with it.

When you call recv(), it will block until there is some data to read. If you want to not block, set the socket to non-blocking or check with select() or poll() to see if there is incoming data before calling recv() or recvfrom().
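The effect of MSG_PEEK is easy to see in code: the same bytes come back twice, once from the peek and once from the real recv(). The wrapper below is a small sketch; peek_bytes() is a hypothetical name, not a library call.

```c
#include <sys/types.h>
#include <sys/socket.h>

/* Look at up to len bytes waiting on s without consuming them.
 * Returns the number of bytes copied into buf, or -1 on error. */
ssize_t peek_bytes(int s, void *buf, size_t len)
{
    return recv(s, buf, len, MSG_PEEK);
}
```

A later recv() without MSG_PEEK returns the same data and removes it from the socket's receive queue.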


Return Value

Returns the number of bytes actually received (which might be less than you requested in the len parameter), or -1 on error (and errno will be set accordingly.)

If the remote side has closed the connection, recv() will return 0. This is the normal method for determining if the remote side has closed the connection. Normality is good, rebel!

Example

// stream sockets and recv()

struct addrinfo hints, *res;
int sockfd;
char buf[512];
int byte_count;

// get host info, make socket, and connect it
memset(&hints, 0, sizeof hints);
hints.ai_family = AF_UNSPEC;  // use IPv4 or IPv6, whichever
hints.ai_socktype = SOCK_STREAM;
getaddrinfo("www.example.com", "3490", &hints, &res);
sockfd = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
connect(sockfd, res->ai_addr, res->ai_addrlen);

// all right! now that we're connected, we can receive some data!
byte_count = recv(sockfd, buf, sizeof buf, 0);
printf("recv()'d %d bytes of data in buf\n", byte_count);

// datagram sockets and recvfrom()

struct addrinfo hints, *res;
int sockfd;
int byte_count;
socklen_t fromlen;
struct sockaddr_storage addr;
char buf[512];
char ipstr[INET6_ADDRSTRLEN];

// get host info, make socket, bind it to port 4950
memset(&hints, 0, sizeof hints);
hints.ai_family = AF_UNSPEC;  // use IPv4 or IPv6, whichever
hints.ai_socktype = SOCK_DGRAM;
hints.ai_flags = AI_PASSIVE;
getaddrinfo(NULL, "4950", &hints, &res);
sockfd = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
bind(sockfd, res->ai_addr, res->ai_addrlen);

// no need to accept(), just recvfrom():


fromlen = sizeof addr;
byte_count = recvfrom(sockfd, buf, sizeof buf, 0,
                      (struct sockaddr *)&addr, &fromlen);

printf("recv()'d %d bytes of data in buf\n", byte_count);
// inet_ntop() wants a pointer to the in_addr or in6_addr, so take its address
printf("from IP address %s\n",
    inet_ntop(addr.ss_family,
        addr.ss_family == AF_INET ?
            (void *)&((struct sockaddr_in *)&addr)->sin_addr :
            (void *)&((struct sockaddr_in6 *)&addr)->sin6_addr,
        ipstr, sizeof ipstr));
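To see sendto() and recvfrom() working together without a second machine, both ends can live in one process on the loopback interface. The sketch below assumes IPv4 only and lets the kernel pick the ports; the helper names udp_loopback_socket() and udp_round_trip() are made up for this illustration.

```c
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Create a UDP socket bound to 127.0.0.1 on a kernel-chosen port.
 * The address actually bound is written to *addr.
 * Returns the socket descriptor, or -1 on error. */
int udp_loopback_socket(struct sockaddr_in *addr)
{
    socklen_t len = sizeof *addr;
    int s = socket(AF_INET, SOCK_DGRAM, 0);

    if (s == -1) {
        return -1;
    }
    memset(addr, 0, sizeof *addr);
    addr->sin_family = AF_INET;
    addr->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr->sin_port = htons(0);            /* 0 = let the kernel pick */

    if (bind(s, (struct sockaddr *)addr, sizeof *addr) == -1 ||
        getsockname(s, (struct sockaddr *)addr, &len) == -1) {
        close(s);
        return -1;
    }
    return s;
}

/* Send msg from a "client" socket to a "server" socket with sendto(),
 * then read it on the server with recvfrom() and check the sender's port.
 * Returns the number of bytes received, or -1 on any failure. */
ssize_t udp_round_trip(const char *msg, char *out, size_t outlen)
{
    struct sockaddr_in srv_addr, cli_addr, from;
    socklen_t fromlen = sizeof from;
    ssize_t n = -1;
    int srv = udp_loopback_socket(&srv_addr);
    int cli = udp_loopback_socket(&cli_addr);

    if (srv != -1 && cli != -1 &&
        sendto(cli, msg, strlen(msg), 0,
               (struct sockaddr *)&srv_addr, sizeof srv_addr) != -1) {
        n = recvfrom(srv, out, outlen, 0, (struct sockaddr *)&from, &fromlen);
        if (n != -1 && from.sin_port != cli_addr.sin_port) {
            n = -1;                       /* datagram came from someone else */
        }
    }
    if (srv != -1) close(srv);
    if (cli != -1) close(cli);
    return n;
}
```

No connect() or accept() appears anywhere: the server learns who sent the datagram only from the address that recvfrom() fills in.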

Lost Datagrams

Our UDP client/server example is not reliable. If a client datagram is lost (say it is discarded by some router between the client and server), the client will block forever in its call to recvfrom in the function dg_cli, waiting for a server reply that will never arrive. Similarly, if the client datagram arrives at the server but the server's reply is lost, the client will again block forever in its call to recvfrom. A typical way to prevent this is to place a timeout on the client's call to recvfrom.

Just placing a timeout on the recvfrom is not the entire solution. For example, if we do time out, we cannot tell whether our datagram never made it to the server, or if the server's reply never made it back. If the client's request was something like "transfer a certain amount of money from account A to account B" (instead of our simple echo server), it would make a big difference as to whether the request was lost or the reply was lost.


connect Function with UDP

An asynchronous error is not returned on a UDP socket unless the socket has been connected. Indeed, we are able to call connect for a UDP socket. But this does not result in anything like a TCP connection: There is no three-way handshake. Instead, the kernel just checks for any immediate errors (e.g., an obviously unreachable destination), records the IP address and port number of the peer (from the socket address structure passed to connect), and returns immediately to the calling process.

Overloading the connect function with this capability for UDP sockets is confusing. If the convention that sockname is the local protocol address and peername is the foreign protocol address is used, then a better name would have been setpeername. Similarly, a better name for the bind function would be setsockname. With this capability, we must now distinguish between

An unconnected UDP socket, the default when we create a UDP socket

A connected UDP socket, the result of calling connect on a UDP socket


With a connected UDP socket, three things change, compared to the default unconnected UDP socket:

1. We can no longer specify the destination IP address and port for an output operation. That is, we do not use sendto, but write or send instead. Anything written to a connected UDP socket is automatically sent to the protocol address (e.g., IP address and port) specified by connect.

2. We do not need to use recvfrom to learn the sender of a datagram; we use read, recv, or recvmsg instead. The only datagrams returned by the kernel for an input operation on a connected UDP socket are those arriving from the protocol address specified in connect. Datagrams destined to the connected UDP socket's local protocol address (e.g., IP address and port) but arriving from a protocol address other than the one to which the socket was connected are not passed to the connected socket. This limits a connected UDP socket to exchanging datagrams with one and only one peer.

3. Asynchronous errors are returned to the process for connected UDP sockets. The corollary, as we previously described, is that unconnected UDP sockets do not receive asynchronous errors.
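The first two changes can be sketched in code. This is a minimal sketch, assuming IPv4; the helper names udp_connected_socket and udp_exchange are hypothetical and do not appear in the original examples.

```c
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create a UDP socket connected to peer.
 * Returns the descriptor, or -1 on error. */
int udp_connected_socket(const struct sockaddr_in *peer)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;
    /* No handshake happens here: the kernel only records the peer address. */
    if (connect(fd, (const struct sockaddr *)peer, sizeof(*peer)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Hypothetical helper: one request/reply exchange on a connected UDP socket.
 * Returns the reply length, or -1 on error. */
ssize_t udp_exchange(int fd, const void *req, size_t reqlen,
                     void *reply, size_t replylen)
{
    /* write() instead of sendto(): the destination is the connected peer. */
    if (write(fd, req, reqlen) < 0)
        return -1;  /* an asynchronous error may be reported here */
    /* read() instead of recvfrom(): only the peer's datagrams are delivered. */
    return read(fd, reply, replylen);
}
```

Because the socket is connected, an error such as a refused port can be returned by write or read, which is the third change listed above.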

Lack of Flow Control with UDP

We consider two cases:

CASE 1: slow client, fast server

CASE 2: fast client, slow server

Recall the statement from TCP flow control: "At any moment, the sender will not overflow the receiver's buffer." UDP makes no such guarantee.

Based on this statement, "slow" and "fast" mean the following:

With respect to the client: slow means the bit rate is low; fast means the bit rate is high.

With respect to the server: slow means the receive buffer (window) is small; fast means the receive buffer (window) is large.

In CASE 2, most of the datagrams are lost, because the fast client overruns the slow server's receive buffer. This is the normal situation in UDP communication.

In CASE 1, the datagrams are queued and delivered to the receiver, because the server keeps up with the client and its buffer never fills (UDP itself still provides no flow control). Consider the following example for CASE 2:

The client sent 2,000 datagrams, but the server application received only 30 of these, for a 98% loss rate. There is no indication whatsoever to the server application or to the client application that these datagrams were lost. As we have said, UDP has no flow control and it is unreliable. It is trivial, as we have shown, for a UDP sender to overrun the receiver. If we look at the netstat output, the total number of datagrams received by the server host (not the server application) is 2,000 (73,208 - 71,208). The counter "dropped due to full socket buffers" indicates how many datagrams were received by UDP but were discarded because the receiving socket's receive queue was full (p. 775 of TCPv2). This value is 1,970 (3,941 - 1,971), which when added to the counter output by the application (30), equals the 2,000 datagrams received by the host.

The following netstat output shows this:

[Figure: netstat output on the server host, before and after the client run]

The first set of lines shows the counters before the client is run (before this communication).

The second set of lines shows the counters after the datagrams have been sent in this communication.

This shows clearly that the UDP service lacks flow control.

Determining Outgoing Interface with UDP

A connected UDP socket can also be used to determine the outgoing interface that will be used to reach a particular destination. This is because of a side effect of the connect function when applied to a UDP socket: the kernel chooses the local IP address (assuming the process has not already called bind to explicitly assign this). This local IP address is chosen by searching the routing table for the destination IP address, and then using the primary IP address for the resulting interface.

In the above figure, the UDP client connects to the UDP server using connect(). For the datagrams to travel from the UDP client to the UDP server, they must pass through intermediate routers, so the client's next hop is router R1, not the UDP server itself. Because connect() is called on the UDP socket, the kernel looks up this route and selects the local IP address of the interface that leads to R1.
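The side effect of connect described in this section can be sketched as follows. This is a minimal sketch, assuming IPv4; the helper name outgoing_address is hypothetical. No datagram is sent: connect only makes the kernel choose a route and a local address, which getsockname then reports.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: write into out the local IPv4 address the kernel
 * would use to reach dest_ip. Returns 0 on success, -1 on error. */
int outgoing_address(const char *dest_ip, unsigned short port,
                     char *out, socklen_t outlen)
{
    struct sockaddr_in dest, local;
    socklen_t len = sizeof(local);
    int rc = -1;

    memset(&dest, 0, sizeof(dest));
    dest.sin_family = AF_INET;
    dest.sin_port = htons(port);
    if (inet_pton(AF_INET, dest_ip, &dest.sin_addr) != 1)
        return -1;  /* not a dotted-decimal IPv4 address */

    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    /* connect() sends nothing; the kernel searches the routing table
     * and assigns the local address of the outgoing interface. */
    if (connect(fd, (struct sockaddr *)&dest, sizeof(dest)) == 0 &&
        getsockname(fd, (struct sockaddr *)&local, &len) == 0 &&
        inet_ntop(AF_INET, &local.sin_addr, out, outlen) != NULL)
        rc = 0;

    close(fd);
    return rc;
}
```

The port number passed in does not affect the route; any port can be used because nothing is transmitted.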

UNIT-VI

Elementary Name and Address Conversions

All the examples so far in this text have used numeric addresses for the hosts (e.g., 206.6.226.33) and numeric port numbers to identify the servers (e.g., port 13 for the standard daytime server and port 9877 for our echo server). We should, however, use names instead of numbers for numerous reasons: Names are easier to remember; the numeric address can change but the name can remain the same; and with the move to IPv6, numeric addresses become much longer, making it much more error-prone to enter an address by hand. This chapter describes the functions that convert between names and numeric values:

gethostbyname and gethostbyaddr to convert between hostnames and IPv4 addresses, and getservbyname and getservbyport to convert between service names and port numbers.

Domain Name System (DNS)

The DNS is used primarily to map between hostnames and IP addresses. A hostname can be either a simple name, such as solaris or freebsd, or a fully qualified domain name (FQDN), such as solaris.unpbook.com. Technically, an FQDN is also called an absolute name and must end with a period, but users often omit the ending period. The trailing period tells the resolver that this name is fully qualified and it doesn't need to search its list of possible domains.

Resource Records

Entries in the DNS are known as resource records (RRs). There are only a few types of RRs that we are interested in.

A

An A record maps a hostname into a 32-bit IPv4 address.

AAAA

A AAAA record, called a "quad A" record, maps a hostname into a 128-bit IPv6 address. The term "quad A" was chosen because a 128-bit address is four times larger than a 32-bit address.

PTR

PTR records (called "pointer records") map IP addresses into hostnames. For an IPv4 address, the 4 bytes of the 32-bit address are reversed, each byte is converted to its decimal ASCII value (0-255), and in-addr.arpa is then appended. The resulting string is used in the PTR query. For an IPv6 address, the 32 4-bit nibbles of the 128-bit address are reversed, each nibble is converted to its corresponding hexadecimal ASCII value (0-9, a-f), and ip6.arpa is appended.
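The IPv4 case can be sketched in code. This is a minimal sketch; the helper name ptr_query_name is hypothetical. It reverses the four bytes of the address and appends in-addr.arpa, using the example address 206.6.226.33 from these notes.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>

/* Hypothetical helper: build the in-addr.arpa name used in a PTR query.
 * Returns 0 on success, -1 if ip is not a dotted-decimal IPv4 address
 * or the output buffer is too small. */
int ptr_query_name(const char *ip, char *out, size_t outlen)
{
    struct in_addr addr;
    if (inet_pton(AF_INET, ip, &addr) != 1)
        return -1;

    /* s_addr is in network byte order, so b[0] is the leftmost byte. */
    const unsigned char *b = (const unsigned char *)&addr.s_addr;
    int n = snprintf(out, outlen, "%u.%u.%u.%u.in-addr.arpa",
                     (unsigned)b[3], (unsigned)b[2],
                     (unsigned)b[1], (unsigned)b[0]);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

For 206.6.226.33 the resulting query name is 33.226.6.206.in-addr.arpa.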

MX

An MX record specifies a host to act as a "mail exchanger" for the specified host. In the example for the host freebsd above, two MX records are provided: the first has a preference value of 5 and the second has a preference value of 10. When multiple MX records exist, they are used in order of preference, starting with the smallest value.

CNAME

CNAME stands for "canonical name." A common use is to assign CNAME records for common services, such as ftp and www. If people use these service names instead of the actual hostnames, it is transparent when a service is moved to another host. For example, the following could be CNAMEs for our host linux:

    ftp IN CNAME linux.unpbook.com.
    www IN CNAME linux.unpbook.com.

Resolvers and Name Servers

Organizations run one or more name servers, often the program known as BIND (Berkeley Internet Name Domain). Applications such as the clients and servers that we are writing in this text contact a DNS server by calling functions in a library known as the resolver. The common resolver functions are gethostbyname and gethostbyaddr, both of which are described in this chapter. The former maps a hostname into its IPv4 addresses, and the latter does the reverse mapping. The figure below shows a typical arrangement of applications, resolvers, and name servers. We now write the application code. On some systems, the resolver code is contained in a system library and is link-edited into the application when the application is built. On others, there is a centralized resolver daemon that all applications share, and the system library code performs RPCs to this daemon. In either case, application code calls the resolver code using normal function calls, typically calling the functions gethostbyname and gethostbyaddr.

The resolver code reads its system-dependent configuration files to determine the location of the organization's name servers. (We use the plural "name servers" because most organizations run multiple name servers, even though we show only one local server in the figure. Multiple name servers are absolutely required for reliability and redundancy.) The file /etc/resolv.conf normally contains the IP addresses of the local name servers.

It might be nice to use the names of the name servers in the /etc/resolv.conf file, since the names are easier to remember and configure, but this introduces a chicken-and-egg problem of where to go to do the name-to-address conversion for the server that will do the name and address conversion!

The resolver sends the query to the local name server using UDP. If the local name server does not know the answer, it will normally query other name servers across the Internet, also using UDP. If the answers are too large to fit in a UDP packet, the resolver will automatically switch to TCP.

gethostbyname Function (Returns: IPv4 Address)

Host computers are normally known by human-readable names. All the examples that we have shown so far in this book have intentionally used IP addresses instead of names, so we know exactly what goes into the socket address structures for functions such as connect and sendto, and what is returned by functions such as accept and recvfrom. But, most applications should deal with names, not addresses. This is especially true as we move to IPv6, since IPv6 addresses (hex strings) are much longer than IPv4 dotted-decimal numbers. (The example AAAA record and ip6.arpa PTR record in the previous section should make this obvious.) The most basic function that looks up a hostname is gethostbyname. If successful, it returns a pointer to a hostent structure that contains all the IPv4 addresses for the host. However, it is limited in that it can only return IPv4 addresses.

gethostbyname differs from the other socket functions that we have described in that it does not set errno when an error occurs. Instead, it sets the global integer h_errno to one of the following constants defined by including <netdb.h>:

HOST_NOT_FOUND

TRY_AGAIN

NO_RECOVERY

NO_DATA (identical to NO_ADDRESS)
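A call to gethostbyname with h_errno handling can be sketched as follows. This is a minimal sketch; the helper name first_ipv4 and its return convention are hypothetical. It relies on the standard hostent fields h_addrtype and h_addr_list.

```c
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: store the first IPv4 address of name in out as a
 * dotted-decimal string. Returns 0 on success, or a nonzero h_errno-style
 * value (HOST_NOT_FOUND, TRY_AGAIN, NO_RECOVERY, NO_DATA) on failure. */
int first_ipv4(const char *name, char *out, socklen_t outlen)
{
    struct hostent *hp = gethostbyname(name);
    if (hp == NULL)
        return h_errno != 0 ? h_errno : NO_RECOVERY;  /* errno is not set */

    if (hp->h_addrtype != AF_INET || hp->h_addr_list[0] == NULL)
        return NO_DATA;

    /* Each h_addr_list entry points to a binary 4-byte IPv4 address. */
    struct in_addr addr;
    memcpy(&addr, hp->h_addr_list[0], sizeof(addr));
    if (inet_ntop(AF_INET, &addr, out, outlen) == NULL)
        return NO_RECOVERY;
    return 0;
}
```

A caller would compare the return value against the constants listed above rather than inspecting errno.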

gethostbyaddr Function (Returns: Hostname)

The function gethostbyaddr takes a binary IPv4 address and tries to find the hostname corresponding to that address. This is the reverse of gethostbyname.

This function returns a pointer to the same hostent structure that we described with gethostbyname. The field of interest in this structure is normally h_name, the canonical hostname.

The addr argument is not a char*, but is really a pointer to an in_addr structure containing the IPv4 address. len is the size of this structure: 4 for an IPv4 address. The family argument is AF_INET. In terms of the DNS, gethostbyaddr queries a name server for a PTR record in the in-addr.arpa domain.
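The three arguments can be seen in a short reverse-lookup sketch. This is a minimal sketch; the helper name reverse_lookup and its return codes are hypothetical. Whether a name is found depends on the local resolver configuration.

```c
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>

/* Hypothetical helper: look up the canonical hostname for a dotted-decimal
 * IPv4 address. Returns 0 on success, -1 if the lookup fails (h_errno
 * tells why), or -2 if ip is not a valid IPv4 address string. */
int reverse_lookup(const char *ip, char *out, size_t outlen)
{
    struct in_addr addr;
    if (inet_pton(AF_INET, ip, &addr) != 1)
        return -2;

    /* addr points to an in_addr, len is its size (4), family is AF_INET. */
    struct hostent *hp = gethostbyaddr((const char *)&addr, sizeof(addr),
                                       AF_INET);
    if (hp == NULL)
        return -1;

    snprintf(out, outlen, "%s", hp->h_name);  /* canonical hostname */
    return 0;
}
```

The cast to const char * is only there to satisfy older prototypes; the pointer really refers to an in_addr structure, as stated above.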

getservbyname and getservbyport Functions (Returns: Port Number and Service Name)

Services, like hosts, are often known by names, too. If we refer to a service by its name in our code, instead of by its port number, and if the mapping from the name to port number is contained in a file (normally /etc/services), then if the port number changes, all we need to modify is one line in the /etc/services file instead of having to recompile the applications. The next function, getservbyname, looks up a service given its name.

The service name servname must be specified. If a protocol is also specified (protoname is a non-null pointer), then the entry must also have a matching protocol. Some Internet services
are provided using eithe